Intel® oneAPI Base Toolkit Release Notes

Intel® oneAPI Base Toolkit supports direct programming and API programming, and delivers a unified language and libraries that offer full native code support across a range of hardware including Intel® and compatible processors, Intel® Processor Graphics Gen9, Gen11, Gen12, Intel® Iris® Xe MAX graphics, Intel® Data Center GPU Max Series, and Intel® Arria® 10 or Intel® Stratix® 10 SX FPGAs. It also contains analysis & debug tools for development and performance tuning.

Sravani Konda

System Requirements

Please see Intel oneAPI Base Toolkit System Requirements

Intel® oneAPI Base Toolkit Major Component Versions

Please visit Intel® oneAPI Toolkit and Component Versioning Schema for semantic versioning schema detail.

The following table contains major versions of components in the latest oneAPI Base Toolkit 2024.1.0

Component Name	Version
Intel® oneAPI DPC++ Compiler	2024.1.0
Intel® oneAPI DPC++ Library	2022.5.0
Intel® DPC++ Compatibility Tool	2024.1.0
Intel® oneAPI Math Kernel Library	2024.1.0
Intel® Distribution for GDB*	2024.1.0
Intel® VTune™ Profiler	2024.1.0
Intel® Advisor	2024.1.0
Intel® oneAPI Threading Building Blocks	2021.12.0
Intel® Integrated Performance Primitives	2021.11.0
Intel® Integrated Performance Primitives Cryptography	2021.10.0
Intel® oneAPI Collective Communications Library	2021.12.0
Intel® oneAPI Data Analytics Library	2024.2.0
Intel® oneAPI Deep Neural Networks Library	2024.1.0

New in Intel® oneAPI Base Toolkit 2024.1.0

Toolkit Level Updates

Intel® oneAPI Base Toolkit 2024.1.0 now supports Fedora 39 for CPU.
The Intel® oneAPI DPC++/C++ Compiler is the industry's first compiler conformant with SYCL 2020, allowing developers to write code once in standard C++ and run it on a variety of processors, which reduces development time and effort.
Enhanced SYCL Graph lets developers use multi-threaded work generation, with thread-safe functions that integrate seamlessly with applications. SYCL ensures thread safety for all member functions, enhancing performance and reliability in parallel computing. SYCL Graph is now available on multiple SYCL backends, allowing developers to tune once and deploy anywhere. Additionally, our CUDA graph alternative offers an open, multi-platform solution, minimizing kernel dispatching overhead and ensuring adaptability across diverse software and hardware stacks.
Migrate to SYCL and build and deploy more easily: the Intel® DPC++ Compatibility Tool migrates more CUDA APIs and, as a technology preview, now also migrates the project CMake file.
Develop more future-proof code with the Data Parallel Control library (dpctl), which provides 100% conformance to the Python Array API standard and adds support for Nvidia* devices. New functions cover reduction, statistics, sorting, set, elementwise, linear algebra, and in-place elementwise operations.
Intel® oneAPI Math Kernel Library 2024.1.0 introduces several optimizations and new functionality that reduce data transfer between Intel GPUs and the host CPU, including batched Singular Value Decomposition, a batched solver for linear systems, and Bessel functions of the first and second kinds.
Unlock performance enhancements with the latest Intel® oneAPI Deep Neural Network Library (oneDNN) including improvements in graphics processing for Intel data center GPUs and Intel® Arc™ Graphics, perfect for complex models like Large Language Models and Stable Diffusion and increased performance for Intel Xeon Scalable processors.
Ensure the accuracy and consistency of your computations: results of BLAS level 3 operations on Intel GPUs are now reproducible using Conditional Numerical Reproducibility (CNR).
Speed up gradient boosting inference across XGBoost, LightGBM, and CatBoost* without sacrificing accuracy with new fast tree inference in Intel® oneAPI Data Analytics Library (oneDAL).
Enhance your security with Intel IPP Cryptography's compliance with FIPS 140-3, a U.S. government standard. This is ideal for government agencies and industries that handle sensitive data.

Intel® oneAPI DPC++ Compiler 2024.1.0

The Intel® oneAPI DPC++/C++ Compiler is the industry's first compiler conformant with SYCL 2020, allowing developers to write code once in standard C++ and run it on a variety of processors, which reduces development time and effort.
Enhanced SYCL Graph lets developers use multi-threaded work generation, with thread-safe functions that integrate seamlessly with applications. SYCL ensures thread safety for all member functions, enhancing performance and reliability in parallel computing. SYCL Graph is now available on multiple SYCL backends, allowing developers to tune once and deploy anywhere. Additionally, our CUDA graph alternative offers an open, multi-platform solution, minimizing kernel dispatching overhead and ensuring adaptability across diverse software and hardware stacks.
Intel® oneAPI DPC++/C++ Compiler enhances OpenMP 5.0, 5.1, 5.2, and TR12 standards compliance.

Intel® oneAPI DPC++ Library 2022.5.0

Intel® oneAPI DPC++ Library adds a specialized sort algorithm to improve application performance on Intel® Data Center GPU Max Series (formerly Ponte Vecchio, PVC)
Intel® oneAPI DPC++ Library adds transform_if variant with mask input for stencil computation needs
Intel® oneAPI DPC++ Library extends C++ STL style programming with histogram algorithms to accelerate scientific, AI & other apps
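
To make the last two items concrete, the following is a plain Python sketch of the semantics only. It does not use oneDPL, and the names `masked_transform` and `histogram_even` are hypothetical helpers introduced for illustration. The assumption that unmasked elements are left untouched is part of this sketch; check the oneDPL documentation for the exact behavior of the library algorithms.

```python
from typing import Callable, List, Sequence


def masked_transform(values: Sequence[float],
                     mask: Sequence[bool],
                     op: Callable[[float], float],
                     out: List[float]) -> List[float]:
    """Store op(values[i]) in out[i] only where mask[i] is true.

    Positions whose mask entry is false keep their previous value in `out`
    (an assumption of this sketch), which is what a stencil typically
    wants for boundary cells.
    """
    if not (len(values) == len(mask) == len(out)):
        raise ValueError("values, mask and out must have the same length")
    for i, (v, m) in enumerate(zip(values, mask)):
        if m:
            out[i] = op(v)
    return out


def histogram_even(values: Sequence[float], num_bins: int,
                   lo: float, hi: float) -> List[int]:
    """Count values into num_bins equal-width bins covering [lo, hi)."""
    if num_bins <= 0 or hi <= lo:
        raise ValueError("need num_bins > 0 and hi > lo")
    counts = [0] * num_bins
    width = (hi - lo) / num_bins
    for v in values:
        if lo <= v < hi:
            # min() guards against floating-point rounding at the top edge.
            counts[min(int((v - lo) / width), num_bins - 1)] += 1
    return counts


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0]
    interior = [False, True, True, False]  # skip the two boundary cells
    print(masked_transform(data, interior, lambda x: 10 * x, [0.0] * 4))
    print(histogram_even([0.5, 1.5, 1.7, 3.9, 4.0], 4, 0.0, 4.0))
```

In the library itself these operations run under a device execution policy; the sequential loops here only show what each output element means.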

Intel® DPC++ Compatibility Tool 2024.1.0

Automatically captures CUDA workload signature to validate migrated SYCL code using Intel® DPC++ Compatibility Tool “CodePin” technology preview
Migrate to SYCL and build and deploy more easily: the Intel® DPC++ Compatibility Tool migrates more CUDA APIs and, as a technology preview, now also migrates the project CMake file.

Intel® oneAPI Math Kernel Library 2024.1.0

Intel® oneAPI Math Kernel Library 2024.1.0 introduces several optimizations and new functionality that reduce data transfer between Intel GPUs and the host CPU, including batched Singular Value Decomposition, a batched solver for linear systems, and Bessel functions of the first and second kinds.
This release provides users the ability to reproduce results of BLAS level 3 operations on Intel GPUs from run to run through Conditional Numerical Reproducibility (CNR) that was previously available only for x86 CPUs. Users can configure Intel® oneMKL to ensure bitwise reproducible results.
Intel® oneMKL 2024.1.0 makes it easier to port CUDA applications to SYCL by adding multiple functions equivalent to those available in cuSolver*, cuBLAS* and CUDA Math Library*.
Improved performance of QR factorization, a key computation in LAPACK, by taking advantage of both the Intel Xeon Processor Family and the Intel® Data Center GPU Max Series.
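
As background for the reproducibility item above: floating-point addition is not associative, so a parallel reduction that combines partial sums in a different order from one run to the next can return results that differ in the last bits. The sketch below is a general illustration in plain Python and says nothing about how oneMKL implements CNR; `sum_in_order` is a hypothetical helper.

```python
from typing import List, Sequence


def sum_in_order(values: Sequence[float], order: Sequence[int]) -> float:
    """Accumulate values in exactly the index order given."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


values: List[float] = [0.1, 0.2, 0.3]

forward = sum_in_order(values, [0, 1, 2])   # (0.1 + 0.2) + 0.3
backward = sum_in_order(values, [2, 1, 0])  # (0.3 + 0.2) + 0.1

if __name__ == "__main__":
    # Same inputs, different order: the two sums are not bitwise identical.
    print(forward.hex(), backward.hex(), forward == backward)
    # Same inputs, same order: the result repeats bit for bit.
    print(sum_in_order(values, [0, 1, 2]).hex() == forward.hex())
```

Fixing the order of accumulation is what makes the second comparison hold; a reproducibility mode generally trades some scheduling freedom for that guarantee.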

Intel® Distribution for GDB* 2024.1.0

Intel Distribution for GDB* rebases to GDB* 14 staying current and aligned with the latest enhancements supporting effective application debug.
Intel Distribution for GDB* adds online page fault handling for GPUs allowing developers to monitor and troubleshoot memory access issues in real-time, while also providing insight into the behavior of the GPU driver, resulting in improved application performance and reliability.
Intel Distribution for GDB* adds large General Purpose Register File (GRF) debug mode support for GPUs providing developers with more visibility into the GPU's internal state and allowing for more comprehensive debugging and optimization of GPU-accelerated applications. This mode is particularly useful for debugging complex or performance-critical code.

Intel® VTune™ Profiler 2024.1.0

Intel® VTune™ Profiler 2024.1.0 adds the capability to identify and understand the reasons for implicit Unified Shared Memory data movements between host and GPU that cause performance inefficiencies in SYCL* applications. It also correlates the data transfers with the execution of compute tasks on the GPU.
Intel® VTune™ Profiler 2024.1.0 adds support for .NET 8, Ubuntu 23.10 and FreeBSD 14.0.

Intel® Advisor 2024.1.0

Intel® Advisor 2024.1.0 adds stability, quality improvements and better performance of CPU and GPU Roofline Analysis.

Intel® oneAPI Threading Building Blocks 2021.12.0

Intel® oneAPI Threading Building Blocks 2021.12.0 provides several improvements and bug fixes

Intel® Integrated Performance Primitives 2021.11.0

Added the verification part of post-quantum eXtended Merkle Signature Scheme (XMSS) algorithm as a tech preview feature.
Added FIPS-compliance mode for the library (open-source distribution). More information can be found in the Intel® IPP Cryptography FIPS Guide.
The version of LZ4 (lossless data compression algorithm) in IPP has been updated to v1.9.4.
IPP NuGet packages have been improved to support .NET Standard 2.0 which allows them to be used in .NET projects (useful for .NET developers.)

Intel® oneAPI Collective Communications Library 2021.12.0

The 2024.1 update to oneCCL delivers even more performance for distributed Deep Learning and Machine Learning Training and Inference workloads. All key communication patterns have been further optimized to not only speed up message passing but also to do so in a memory efficient manner. This release in particular improves Inference performance.

Intel® oneAPI Data Analytics Library 2024.2.0

Speed up gradient boosting inference across XGBoost, LightGBM, and CatBoost* without sacrificing accuracy with new fast tree inference in Intel® oneAPI Data Analytics Library (oneDAL).
oneDAL improves clustering by adding sparse K-Means support, which automatically identifies a subset of the features to use in clustering observations, and by improving K-Means performance.

Intel® oneAPI Deep Neural Networks Library 2024.1.0

Improved performance on Intel Architecture, Graphics, Aarch64-based processors.
Introduced GPT-Q support to improve Large Language Model (LLM) performance with compressed weights. An optimized implementation is available for Intel Graphics Products and supports matmul with int8 weight compression.
Introduced fp8 data type support in primitives and Graph API. Optimized implementation is available for Intel Data Center GPU Max Series (formerly Ponte Vecchio).
Introduced support for fp16 and bf16 scale and shift arguments for layer normalization. Optimized implementation is available for Intel Graphics Products.
Added opt-in deterministic mode support. Deterministic mode guarantees that results are bitwise identical between runs in a fixed environment.
For Intel Graphics Products, introduced PReLU post-op support for inner product and matmul primitives.
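
To illustrate what "matmul with int8 weight compression" means in general terms, here is a minimal plain Python sketch. It uses simple symmetric round-to-nearest quantization with one scale per output column, which is not the GPT-Q algorithm and not the oneDNN API; `quantize_columns` and `matmul_compressed` are hypothetical names.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def quantize_columns(weights: Matrix) -> Tuple[List[List[int]], List[float]]:
    """Symmetric per-column int8 quantization: w ~= q * scale, q in [-127, 127]."""
    rows, cols = len(weights), len(weights[0])
    scales = []
    for j in range(cols):
        max_abs = max(abs(weights[i][j]) for i in range(rows))
        # An all-zero column gets scale 1.0 so the division below is safe.
        scales.append(max_abs / 127.0 if max_abs > 0 else 1.0)
    q = [[int(round(weights[i][j] / scales[j])) for j in range(cols)]
         for i in range(rows)]
    return q, scales


def matmul_compressed(x: Matrix, q: List[List[int]],
                      scales: List[float]) -> Matrix:
    """Compute x @ W where W is stored as int8 codes plus per-column scales."""
    inner, cols = len(q), len(q[0])
    out = []
    for row in x:
        # Accumulate against the integer codes, then rescale once per column.
        acc = [sum(row[k] * q[k][j] for k in range(inner)) for j in range(cols)]
        out.append([acc[j] * scales[j] for j in range(cols)])
    return out


if __name__ == "__main__":
    w = [[0.5, -1.0], [0.25, 2.0]]
    codes, col_scales = quantize_columns(w)
    print(codes, col_scales)
    print(matmul_compressed([[1.0, 2.0]], codes, col_scales))
```

The weights occupy one byte each instead of four, at the cost of a small approximation error in the product.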

Deprecation Notices

Intel® Fortran Compiler Classic (ifort) is now deprecated and will be discontinued in late 2024. Intel recommends that customers transition now to using the LLVM-based Intel® Fortran Compiler (ifx) for continued Windows* and Linux* support, new language support, new language features, and optimizations.
For more information on ifx, see the Intel® Fortran Compiler Developer Guide and Reference and the Porting Guide for ifort Users to ifx.
Flow Graph Analyzer feature of Intel Advisor will be discontinued in 2025 or later. Customers who have purchased Intel® Priority Support will continue to receive support.
32-bit support for Intel® Analyzers is deprecated and will be discontinued in the 2024.2 release.
Fedora Linux version 38 support on CPU is deprecated with 2024.1 and will be discontinued in a future release.

Intel® oneAPI Base Toolkit 2024.0.1

Toolkit Level Updates

Intel oneAPI Base Toolkit 2024.0.1 now includes recent component patch releases.
Patches are built on top of previous patch releases as needed.

Intel® oneAPI DPC++ Compiler 2024.0.2

Minor bug fixes

Intel® Integrated Performance Primitives Cryptography 2021.9.1

This patch release fixes an algorithmic issue in the AES-XTS Intel® Advanced Vector Extensions 512 (Intel® AVX-512) code path.

Intel® oneAPI Collective Communications Library 2021.11.2

This update provides bug fixes to maintain driver compatibility for Intel® Data Center GPU Max Series.

Intel® oneAPI Data Analytics Library 2024.0.1

New features and bug fixes. See the Release Notes for more information.

Intel® oneAPI Base Toolkit 2024.0

Toolkit Level Updates

Directory layout is improved across all products to streamline installation and setup. The Unified Directory Layout is implemented in 2024.0. If you have multiple toolkit versions installed, the Unified layout ensures that your development environment contains the correct component versions for each installed version of the toolkit. The directory layout used before 2024.0, the Component Directory Layout, is still supported on new and existing installations.
For detailed information about the Unified layout, including how to initialize the environment and advantages with the Unified layout, refer to Use the setvars and oneapi-vars Scripts with Linux and Use the setvars and oneapi-vars Scripts with Windows.
Intel® developer tools now maximize performance on upcoming Intel 5th Generation Xeon Scalable Processors (formerly known as Emerald Rapids) and the Intel® Core™ Ultra processors (formerly known as Meteor Lake)
Improved installation directory layout provides faster environment setup.
Intel® oneAPI DPC++/C++ Compiler fully implements the SYCL 2020 specification, which improves developer productivity, boosts CPU and GPU offload performance, and enhances OpenMP 5.0, 5.1, and 5.2 standards compliance.
Intel® oneAPI DPC++/C++ Compiler adds popular LLVM sanitizers to catch address errors, memory leaks, uninitialized memory, thread data races, deadlocks, and undefined behavior in C++, SYCL, and OpenMP code on CPU.
Several advanced preview features are available for evaluation including C++ parallel STL for easy GPU offload, dynamic device selection to optimize compute node resource usage, SYCL graph for reduced GPU offload overhead, and thread composability to prevent thread oversubscription between oneTBB and OpenMP.
Starting with this release, the Intel Level Zero and OpenCL™ GPU driver exposes each GPU tile of the Intel® Data Center GPU Max Series differently, which also affects the way these devices are exposed in SYCL and OpenMP. Prior to this change, each card was exposed as a root device and tiles were exposed as sub-devices. Now, each tile is exposed as a root device by default. This also affects how root devices can be partitioned into sub-devices. The old behavior can be enabled via the ZE_FLAT_DEVICE_HIERARCHY environment variable. As a result, use of the environment variables ONEAPI_DEVICE_SELECTOR and ZE_AFFINITY_MASK may need to change because the number of root devices and the availability of sub-devices is different than in prior releases. Refer to the GPU optimization guide and the article Options for using a GPU Tile Hierarchy for more details.
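
The effect of this change on device enumeration can be modeled with a short Python sketch. This is a toy model, not a driver query: the mode names `COMPOSITE` (cards as root devices) and `FLAT` (tiles as root devices) are assumed here and are not stated in these notes, the card-by-card ordering of tiles is also an assumption, and both helper functions are hypothetical. Confirm the actual values and ordering in the GPU optimization guide.

```python
from typing import List, Tuple


def root_devices(num_cards: int, tiles_per_card: int,
                 hierarchy: str) -> List[str]:
    """Return labels for the root devices an application would enumerate."""
    if hierarchy == "COMPOSITE":
        # Earlier behavior: one root device per card, tiles are sub-devices.
        return [f"card{c}" for c in range(num_cards)]
    if hierarchy == "FLAT":
        # New default described above: every tile is its own root device.
        return [f"card{c}.tile{t}"
                for c in range(num_cards)
                for t in range(tiles_per_card)]
    raise ValueError(f"unsupported hierarchy mode: {hierarchy}")


def flat_index_to_tile(index: int, tiles_per_card: int) -> Tuple[int, int]:
    """Map a flat root-device index to (card, tile), assuming card-major order."""
    return divmod(index, tiles_per_card)


if __name__ == "__main__":
    for mode in ("COMPOSITE", "FLAT"):
        print(mode, root_devices(2, 2, mode))
    print(flat_index_to_tile(3, 2))
```

On a system with two two-tile cards, the model shows why a selector that previously addressed root device 1 (the second card) would address a different piece of hardware once each tile is a root device.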

Intel® oneAPI DPC++ Compiler 2024.0.1

SYCL bindless textures are fixed and now work correctly on NVIDIA® hardware via the Codeplay® NVIDIA® plugin.
The OpenMP* runtime library is updated to support Intel® Core™ Ultra devices.

Intel® oneAPI DPC++ Compiler 2024.0.0

Intel® oneAPI DPC++/C++ Compiler implements the SYCL 2020 specification, which improves developer productivity, boosts CPU and GPU offload performance, and enhances OpenMP 5.0, 5.1, and 5.2 standards compliance.
The compiler adds popular LLVM sanitizers to catch address errors, memory leaks, uninitialized memory, thread data races, deadlocks, and undefined behavior in C++, SYCL, and OpenMP code on CPU.
The compiler significantly improves developer productivity by adding an easier way to adapt C++ code that uses virtual functions to run with SYCL device offload, and by improving error messaging and error handling for SYCL and OpenMP code.

Intel® oneAPI DPC++ Library 2022.3.0.0

Dynamic device selection technology preview to choose between round robin, load based and auto tune policy to schedule work on available compute devices
Improved performance of the merge, sort, stable_sort, sort_by_key, reduce, min_element, max_element, minmax_element, is_partitioned, and lexicographical_compare algorithms with DPC++ execution policies.
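
The scheduling policies named in the dynamic device selection item can be pictured with a small Python sketch. This shows the general idea of round-robin and load-based selection only; it is not the oneDPL interface, and the class and method names are hypothetical.

```python
import itertools
from typing import Dict, Sequence


class RoundRobinPolicy:
    """Hand out devices in a fixed rotating order."""

    def __init__(self, devices: Sequence[str]) -> None:
        if not devices:
            raise ValueError("at least one device is required")
        self._cycle = itertools.cycle(list(devices))

    def select(self) -> str:
        return next(self._cycle)


class LeastLoadedPolicy:
    """Pick the device with the fewest tasks currently in flight."""

    def __init__(self, devices: Sequence[str]) -> None:
        if not devices:
            raise ValueError("at least one device is required")
        self._in_flight: Dict[str, int] = {d: 0 for d in devices}

    def acquire(self) -> str:
        # Ties resolve to the device listed first.
        device = min(self._in_flight, key=self._in_flight.get)
        self._in_flight[device] += 1
        return device

    def release(self, device: str) -> None:
        # Call when the task submitted to `device` has finished.
        self._in_flight[device] -= 1


if __name__ == "__main__":
    rr = RoundRobinPolicy(["cpu", "gpu0", "gpu1"])
    print([rr.select() for _ in range(5)])

    ll = LeastLoadedPolicy(["gpu0", "gpu1"])
    first, second = ll.acquire(), ll.acquire()
    ll.release(second)
    print(first, second, ll.acquire())
```

Round robin ignores how busy each device is, while the load-based variant needs completion feedback; an auto-tune policy would additionally measure which device runs a given task fastest.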

Intel® DPC++ Compatibility Tool 2024.0.0

Improved migration coverage for cuBLAS, cuSolver, cuDNN, NCCL, CUB, Thrust, and CUDA Math APIs.
Enhanced performance portability of auto-migrated SYCL code, demonstrated by the Velocity Bench Application Suite on Intel® Data Center GPU Max Series.

Intel® oneAPI Math Kernel Library 2024.0.0

Integrates Vector Math optimizations into Random Number Generators for high performance computer simulations, statistical sampling, and other areas on x86 CPUs and Intel GPUs.
Supports Vector Math for FP16 datatype on Intel GPUs.
Delivers high-performance benchmarks HPL and HPL-AI optimized for Intel® Xeon® CPU Max Series and Intel Data Center GPU Max Series.
The oneMKL SYCL library binary is partitioned, resulting in a smaller shared-object footprint for applications that use only subdomains of oneMKL.

Intel® Distribution for GDB* 2024.0.0

Intel Distribution for GDB* enhances the developer experience, both in the command line and when using Microsoft* Visual Studio and Visual Studio Code* by boosting the debugger performance, refining the user interface, and streamlining the debugging process. These improvements allow developers to efficiently debug code for CPUs and GPUs, improving their overall experience.
Intel Distribution for GDB* provides an improved scheduler locking mechanism with greater control, allowing users to fine-tune the scheduler lock. This enables developers to tailor the lock's behavior according to their preferences and specific debugging needs.

Intel® VTune™ Profiler 2024.0.0

Intel® VTune™ Profiler implements support for pinpointing performance bottlenecks for the applications running on Intel® Core™ Ultra and 5th Gen Intel® Xeon® processors.
Intel® VTune™ Profiler profiles code offloaded to the NPU (Neural Processing Unit) on Intel® Core™ Ultra processors. It helps you understand how much data is transferred from the NPU to DDR memory and identify the most time-consuming tasks running on the NPU. This feature is currently in technical preview.
Intel® VTune™ Profiler provides an understanding of cross-GPU traffic and bandwidth through Xe Link for each node using Application Performance Snapshot.

Intel® Advisor 2024.0.0

Intel® Advisor adds support for the 4th Gen Intel® Xeon® Scalable Processors with support for FP16 and BF16 extensions and AMX profiling.
Intel® Advisor adds the capability to profile Python code to understand application performance headroom against hardware limitations with an automated Roofline analysis.
Intel® Advisor supports application performance characterization, such as bandwidth sensitivity, instruction mix, and cache-line use, for multiple GPUs, multi-tile architectures, Vector Neural Network Instructions (VNNI) Support and Instruction Set Analysis or ISA support.

Intel® oneAPI Threading Building Blocks 2021.11.0

Intel® oneAPI Threading Building Blocks (oneTBB) can now be compiled for WebAssembly (WASM) using Emscripten. This facilitates the use of oneTBB by applications that run in a web browser.
The new oneTBB Thread Composability Manager feature provides higher flexibility and workload performance when nesting oneTBB and OpenMP threads.

Intel® Integrated Performance Primitives 2021.10.0

Faster performance in the Intel® IPP image domain with AVX512-VNNI based optimizations for RGB to XYZ color conversion
Faster performance with Intel® AVX-512 optimizations for the L2 Norm statistical function in the Intel® IPP signal processing domain
Other bug fixes & security enhancements
Intel® IPP Cryptography introduced VAES AVX2 optimizations for AES-GCM algorithm and AVX-512 optimizations for RSA algorithm that help users securely transmit data faster

Intel® oneAPI Collective Communications Library 2021.11.1

Provides improved stability for distributed Training and Inference workloads on Intel® Data Center GPU Max Series.

Intel® oneAPI Collective Communications Library 2021.11.0

Intel® oneAPI Collective Communications Library added point to point blocking communication operations for send and receive.
Implemented performance optimizations for Reduce-Scatter.
Improved profiling with Intel® Instrumentation and Tracing Technology (ITT) profiling level.
Directory layout changes for oneAPI V2
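
For readers unfamiliar with the Reduce-Scatter pattern mentioned above, the following plain Python sketch shows its semantics on in-memory lists. It is not the oneCCL API and performs no communication; `reduce_scatter` is a hypothetical helper, and sum is assumed as the reduction operation.

```python
from typing import List, Sequence


def reduce_scatter(send_buffers: Sequence[Sequence[float]]) -> List[List[float]]:
    """Element-wise sum across ranks, then give rank r the r-th chunk.

    send_buffers[r] is the full input buffer of rank r; the return value
    holds, at index r, the reduced chunk that rank r would receive.
    """
    num_ranks = len(send_buffers)
    if num_ranks == 0:
        raise ValueError("at least one rank is required")
    length = len(send_buffers[0])
    if any(len(buf) != length for buf in send_buffers):
        raise ValueError("all ranks must contribute buffers of equal length")
    if length % num_ranks != 0:
        raise ValueError("buffer length must be divisible by the rank count")
    chunk = length // num_ranks
    reduced = [sum(column) for column in zip(*send_buffers)]
    return [reduced[r * chunk:(r + 1) * chunk] for r in range(num_ranks)]


if __name__ == "__main__":
    rank0 = [1, 2, 3, 4]
    rank1 = [10, 20, 30, 40]
    print(reduce_scatter([rank0, rank1]))
```

Each rank ends up with only its share of the reduced data, which is why the pattern is often paired with an all-gather in distributed training.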

Intel® oneAPI Data Analytics Library 2024.0.0

oneDAL is now integrated with Microsoft®'s open source ML.Net machine learning framework to build and ship machine learning models.
oneDAL deprecates legacy daal_sycl (data analytics acceleration library) APIs.

Intel® oneAPI Deep Neural Networks Library 2024.0.0

Reduced footprint with the new directory layout.
4th Gen Intel Xeon support for s8s8 in AVX2-VNNI for fast conversion between fp32↔fp16/bf16 and next gen Intel Xeon support for fp16-AMX instruction set.
Default support for graph compiler features like fusion, dynamic shapes, and InstanceNorm/LayerNorm.
Compiler Xbyak backend which improves code generation.
Improved performance for sparse_tensor_dense_matmul() on Intel Xeon with TF2.5 and oneDNN

Deprecation Notices

Intel® Fortran Compiler Classic (ifort) is now deprecated and will be discontinued in late 2024. Intel recommends that customers transition now to using the LLVM-based Intel® Fortran Compiler (ifx) for continued Windows* and Linux* support, new language support, new language features, and optimizations.
For more information on ifx, see the Intel® Fortran Compiler Developer Guide and Reference and the Porting Guide for ifort Users to ifx.
The following OS support is deprecated with 2024.0 and will be discontinued in a future release.
- CPU:
  - SUSE Linux Enterprise Server (SLES) version 15 SP3
  - Ubuntu Linux version 20.04
  - Fedora Linux version 37
  - Debian Linux version 11
  - Amazon Linux version 2022
- GPU:
  - Red Hat Enterprise Linux (RHEL) version 8.6

Installation Instructions

Please visit Installation Guide for Intel oneAPI Toolkits

How to Start Using the Tools

Please reference:

Known Issues, Limitations and Workarounds

Known Issue: The modulefiles included with Intel® VTune™ Profiler and Intel® Advisor incorrectly derive the component root path as "/". Workarounds for this issue include:
- For Intel VTune Profiler:
  1. Click here to download the fixed tcl file and replace the existing 2024.0 file located in <install-dir>/vtune/2024.0/etc/modulefiles/vtune/
  2. Instead of using "module load" to set up the environment variables, run:
    
    $ source <install-dir>/vtune/latest/vtune-vars.sh
- For Intel Advisor:
  1. Click here to download the fixed tcl file and replace the existing 2024.0 file located in <install-dir>/advisor/2024.0/etc/modulefiles/advisor/
  2. Instead of using "module load" to set up the environment variables, run:
    
    $ source <install-dir>/advisor/latest/advisor-vars.sh
Known Issue: There is a known issue integrating Intel software developer tools (Intel® oneAPI Base Toolkit, Intel® HPC Toolkit, or their component products) into Microsoft Visual Studio* 2022 (17.7 or higher) on offline systems with the Windows Performance Toolkit (Win11SDK_WindowsPerformanceToolkit) installed. This results in an incomplete integration. To work around the issue, either enable an Internet connection during installation of the Intel developer tools, or uninstall the Windows Performance Toolkit before installing the Intel developer tools and reinstall it after they are installed.
Known Issue: When using Intel® oneAPI DPC++/C++ Compiler on a Linux machine, users may run into an issue if the highest version of GNU gcc detected doesn't have the equivalent g++ package installed. More details on the error and workarounds can be found here.
Please read the whitepaper on Challenges, tips, and known issues when debugging heterogeneous programs using DPC++ or OpenMP offload
Limitations
1. Running any GPU code on a Virtual Machine is not supported at this time.
2. If you have chosen to download the Get Started Guide to use offline, viewing it in Chrome may cause the text to disappear when the browser window is resized. To fix this problem, resize your browser window again, or use a different browser.
3. Eclipse* 4.12: the code sample project created by IDE plugin from Makefile will not build. It is a known issue with Eclipse 4.12. Please use Eclipse 4.9, 4.10 or 4.11.

Release Notes for All Tools included in Intel® oneAPI Base Toolkit

Previous oneAPI Releases

Notices and Disclaimers

Intel technologies may require enabled hardware, software or service activation.

No product or component can be absolutely secure.

Your costs and results may vary.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.
