Intel® oneAPI Math Kernel Library (oneMKL) Release Notes

Published: 07/03/2019

Last updated: 03/31/2022

By Khang T Nguyen, Abhinav Singh


Where to Find the Release

Intel® oneAPI Math Kernel Library

New in This Release

NOTE for NuGet Package Manager users: The oneMKL NuGet package for the 2021.4 release will be delayed while we work to bring the package size within NuGet size limits. As a result, oneMKL packages for 2021.4 will not be available at the oneAPI 2021.4 release. We hope to upload them soon; please check back for information on these packages. If you do not use the NuGet package manager, you are not affected.


  • Intel oneAPI Math Kernel Library, 2022.1.0 does not include the latest functional and security updates. Intel oneAPI Math Kernel Library, 2022.1.1 is targeted to be released in May 2022 and will include additional functional and security updates. Customers should update to the latest version as it becomes available.
  • After installing the oneAPI Base Toolkit 2022.1, compiling applications with Win32 platform settings that require oneAPI Math Kernel Library (oneMKL) will fail. 32-bit oneMKL on Windows* OS is provided separately as part of the Intel® oneAPI Base Toolkit 32-bit package, which can be downloaded as an add-on.



System Requirements  Bug Fix Log


  • BLAS

    • Extended C/Fortran OpenMP offload to support the OpenMP* 5.1 specification 
    • Enabled MKL_VERBOSE support for BLAS GPU functionality for DPC++ and OpenMP offload
  • Intel® Distribution for LINPACK* Benchmark

    • Continued performance enhancements for 3rd Generation Intel Xeon Scalable processors and the processor code-named “Sapphire Rapids”
  • Transpose

    • Added new DPC++ API for omatadd_batch function
    • Enabled MKL_VERBOSE support for CPU for transpose domain
  • LAPACK

    • Improved performance for LU, batch strided LU solve and inverse on Intel GPU.
    • Improved performance for real precision divide-and-conquer SVD ({S,D}GESDD) on CPU.
    • Introduced the multishift QZ algorithm from Netlib LAPACK 3.10.0; integrated other minor bug fixes from Netlib LAPACK 3.9.1 – 3.10.
    • Modified a set of ?LAQR[0-5] computational kernels used for solving non-symmetric eigenvalue problems (?GEEV, ?GEES, ?GEEVX and ?GEESX) as was done in Netlib LAPACK 3.10. 
  • Sparse

    • Improved performance for sparse::gemm with column-major layout on all GPUs.
    • Extended the support for C OpenMP offload for MKL_SPARSE_?_MM with column-major layout
  • DFT

    • Relaxed the padding requirement for complex-to-real out-of-place FFT on GPU.
  • Vector Math

    • Introduced _FTZDAZ_DEFAULT to accurately represent the default VML mode for the C interface.
    • Made CPU fallback configurable in the DPC++ interface. Fallback is enabled by default for compatibility with previous versions.
    • Improved cbrt and erf performance on Intel discrete GPUs
    • Improved accuracy for several functions (SPOW3O2/HA, DPOWX/EP, VSCOSD/LA, VSCOSD/EP, SLGAMMAF/HA, CDIV/EP, ZDIV/EP)
  • Vector Statistics

    • Introduced Device DPC++ APIs for Bernoulli distribution and mcg31m1 / mcg59 engines
    • Optimized Device DPC++ implementation for Gaussian distribution with sycl::vec<16> and mrg32k3a engine
    • Optimized CPU implementations of the exponential, lognormal, Cauchy, Weibull, Rayleigh, and Gumbel distributions for vector lengths of 1E5 and higher
  • Data Fitting

    • Introduced experimental DPC++ APIs with GPU support for linear / cubic Hermite splines, uniform / non-uniform partition hints, and construction / interpolation routines
  • Library Engineering

    • Introduced Single Dynamic Library linking mode support for applications using OpenMP offload for LAPACK, sparse BLAS (C only), Vector Statistics, DFTi and FFTW APIs.
    • Half precision is now dispatched by default; the enabling macro is MKL_ENABLE_INSTRUCTIONS=AVX512_E4.
    • The threading layer is now set to GNU OpenMP when the gomp library is loaded; otherwise it defaults to Intel threading.
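
To make the omatadd_batch item in the Transpose entries above concrete: omatadd computes a scaled sum of two optionally transposed matrices, C = alpha*op(A) + beta*op(B), and the batch form repeats that for matrices stored at a fixed stride from one another. The following is a plain C++ reference sketch of that arithmetic only, not the oneMKL DPC++ API. The names Op, op_at, and ref_omatadd_batch, the column-major layout, and the argument order are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names throughout; this is a reference loop, not the oneMKL API.
enum class Op { NoTrans, Trans };

// Element (i, j) of op(X), where X is stored column-major with leading dimension ld.
static double op_at(const double* x, std::size_t ld, Op op,
                    std::size_t i, std::size_t j) {
    return op == Op::NoTrans ? x[i + j * ld] : x[j + i * ld];
}

// For each matrix k in the batch: C_k = alpha * op(A_k) + beta * op(B_k),
// where C_k is m x n and consecutive matrices are stride_* elements apart.
void ref_omatadd_batch(Op op_a, Op op_b, std::size_t m, std::size_t n,
                       double alpha, const double* a, std::size_t lda, std::size_t stride_a,
                       double beta, const double* b, std::size_t ldb, std::size_t stride_b,
                       double* c, std::size_t ldc, std::size_t stride_c,
                       std::size_t batch_size) {
    // The leading dimension must cover the stored row count of each matrix.
    assert(lda >= (op_a == Op::NoTrans ? m : n));
    assert(ldb >= (op_b == Op::NoTrans ? m : n));
    assert(ldc >= m);

    for (std::size_t k = 0; k < batch_size; ++k) {
        const double* ak = a + k * stride_a;
        const double* bk = b + k * stride_b;
        double* ck = c + k * stride_c;
        for (std::size_t j = 0; j < n; ++j) {
            for (std::size_t i = 0; i < m; ++i) {
                ck[i + j * ldc] = alpha * op_at(ak, lda, op_a, i, j)
                                + beta * op_at(bk, ldb, op_b, i, j);
            }
        }
    }
}
```

A loop like this can serve as a correctness oracle for small inputs when checking results from the library routine; consult the oneMKL reference for the actual signature and supported layouts.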

Known Issues and Limitations

  • The sp2m offload example fails with asynchronous execution on the OpenCL backend. As a workaround, use the Level Zero backend (default) or run in synchronous mode with the OpenCL backend.
  • The trsv / sp2m offload APIs and sparse::trsv / sparse::matmat are non-functional in Debug mode on Win32e. As a workaround, use Release mode.
  • Experimental Data Fitting DPC++ APIs are not supported using mkl_sycld.dll on Windows OS.
  • On GPUs lacking native double-precision support, non-batched and batched LAPACK functions {c,s}getr{f,s} via OpenMP offload with static linking may fail with an error that double type is not supported on this platform. As a workaround, use dynamic linking or use the -fsycl-device-code-split=per_kernel compilation flag.
  • Real-to-complex and complex-to-real FFT with non-unit stride might return incorrect results if strides are flipped when switching from forward to backward transform as recommended by the oneMKL documentation. As a workaround, do not flip the strides when switching from forward to backward transform.
  • Due to a DPC++ issue, when calling LAPACK DPC++ routines on the CPU device and using an in-order queue, application-side kernels may not wait for the LAPACK calculations to finish and may produce incorrect results. As a workaround, call wait() on the queue after the LAPACK call for explicit synchronization.
  • The Lognormal<double> device random number distribution with the philox4x32x10 engine may produce wrong results on Gen9 GPU on Windows OS with the /Od option enabled.
  • Using multiple host threads with the Level Zero (L0) backend can cause segmentation faults, exceptions, or other unexpected behavior.
  • The Beta distribution with the mt2203 generator may produce the wrong random sequence on Xe HPG when using the C OpenMP offload API with the OpenCL backend on Linux OS. As a workaround, use the Level Zero backend.
  • The mrg32k3a generator may produce the wrong sequence on Xe HPG on Linux OS.


System Requirements  Bug Fix Log


  • BLAS

    • Extended DPC++ support for in-place and out-of-place matrix copy/transposition.
      • oneapi::mkl::blas::{i,o}matcopy_batch 
  • LAPACK

    • Enabled C/C++ OpenMP* offload support for getri_oop_batch. 
    • Improved performance of double precision, non-pivoting batch strided LU factorization on GPU. 
    • Improved performance of out-of-place batch strided LU inverse on GPU. 
    • Renamed the LAPACK DPC++ function getrfnp_batch_strided to getrfnp_batch.
  • Sparse

    • Enabled C/C++ OpenMP* offload support for mkl_sparse_sp2m and mkl_sparse_?_export_csr.
    • Improved performance of DPC++ oneapi::mkl::sparse::matmat for small to medium sizes.
  • DFT

    • Enabled MKL_VERBOSE support on GPU devices for DFT DPC++ and C/C++/Fortran OpenMP* offload.
  • Vector Math

    • Improved performance and stability.
  • Library Engineering

    • Enabled use of both lp64 and ilp64 BLAS and LAPACK interfaces in a single application.
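
The Library Engineering item above matters because the two interfaces differ in integer width: lp64 passes 32-bit integers for dimensions and indices, while ilp64 passes 64-bit integers. An application that mixes both needs some rule for deciding which interface a given call can use. The sketch below shows one conservative check in plain C++. The helper names fits_lp64 and to_lp64_int are hypothetical, and treating the total element count as a limit is an assumption made here for safety rather than a rule taken from these release notes.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Largest value representable by the 32-bit integers used by the lp64 interface.
constexpr std::int64_t kLp64Max = std::numeric_limits<std::int32_t>::max();

// Hypothetical helper: true when both dimensions and the total element count
// of a rows x cols matrix fit in a 32-bit signed integer.
constexpr bool fits_lp64(std::int64_t rows, std::int64_t cols) {
    return rows >= 0 && cols >= 0 && rows <= kLp64Max && cols <= kLp64Max
        && (cols == 0 || rows <= kLp64Max / cols);  // overflow-free product check
}

// Hypothetical helper: checked narrowing for values handed to an lp64 call.
inline std::int32_t to_lp64_int(std::int64_t v) {
    assert(v >= std::numeric_limits<std::int32_t>::min() && v <= kLp64Max);
    return static_cast<std::int32_t>(v);
}

// A 1000 x 1000 matrix is comfortably within 32-bit range.
static_assert(fits_lp64(1000, 1000), "small matrices fit lp64");
// 50000 x 50000 = 2.5e9 elements exceeds 2^31 - 1, so ilp64 would be needed.
static_assert(!fits_lp64(50000, 50000), "large matrices need ilp64");
```

With a check like this, small problems can stay on the lp64 entry points while only the large ones are routed to ilp64. See the oneMKL developer guide for how each interface's functions are named and linked.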

Known Issues and Limitations

  • LAPACK functions {sy,he}{ev,evd,gvd,gvx,trd} and gesvd for single precision may work incorrectly on Intel® Iris® Xe MAX Graphics / Intel® UHD Graphics for Intel® Processor Graphics Gen11.
  • In certain cases, to avoid crashes, oneMKL may force synchronization within OpenMP* offload functionality even when the nowait clause is provided.
  • DPC++ matmat examples can sporadically produce a segmentation fault due to an off-by-one memory allocation error in the example code when one-based indexing is used.
  • The random number generators uniform_with_host_helpers device example may fail on Gen9 and DG1 GPUs on Windows OS with the /Od option enabled.
  • Sparse BLAS C OpenMP* offload in asynchronous execution mode fails with the OpenCL backend. Use the Level Zero backend (default) instead.


  • Dropped support for Intel® Xeon Phi™ Processor x200 “Knights Landing (KNL)” and Intel® Xeon Phi™ Processors “Knights Mill (KNM)”. AVX2 is still supported for this architecture.
  • Dropped MPICH2 support on Windows*.  
  • Dropped SGI MPI support on Linux*. 
  • Deprecated support for Microsoft Visual Studio* 2017 with this release.
  • Removed cvf (stdcall) interface.
  • Renamed "cl::sycl::vector_class" and "sycl::vector_class" to "std::vector" for input events in DPC++ USM API.
  • Changed data type “half” to “sycl::half”.

Previous oneAPI Releases


Release Notes  System Requirements  Bug Fix Log


Notices and Disclaimers

Intel technologies may require enabled hardware, software or service activation.

No product or component can be absolutely secure.

Your costs and results may vary.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.



Performance varies by use, configuration, and other factors.