Release Notes for Intel® oneAPI Data Analytics Library

ID 763738
Updated 1/26/2024
Version 2024.1.0
Public

This article contains the Release Notes for Intel® oneAPI Data Analytics Library (oneDAL).

Version History

Document revision   Date         Change History
2024.1.0            2024-01-26   2024.1.0 Release Update
2024.0.1            2023-12-20   2024.0.1 Release Update
2024.0              2023-11-17   2024.0 Release Update
2023.2              2023-07-13   2023.2 Release Update
2023.1              2023-03-30   2023.1 Release Update
2023.0              2022-12-16   2023.0 Release Update
2022.3.1            2022-11-10   2022.3.1 Release Update
2022.3              2022-09-27   2022.3 Release Update
2022.2              2022-04-13   2022.2 Release Update
2022.1              2021-12-07   2022.1 Release Update
2021.4              2021-09-29   2021.4 Release Update
2021.3              2021-06-22   2021.3 Release Update
2021.2              2021-03-29   2021.2 Release Update
2021.1              2020-12-07   2021.1 Release Update

Overview

oneDAL is a library of building blocks optimized for Intel® architecture that covers all stages of compute-intensive data analytics: data acquisition from a data source, preprocessing, transformation, data mining, modeling, validation, and decision making.

System Requirements

Please see the dedicated system requirements article.

2024.1.0

What's New

  • New oneDAL functionality:
    • Enabled distributed computations for LogisticRegression algorithm
    • Basic statistics algorithm for sparse data
  • Added new parameters to oneDAL algorithms:
    • Bias parameter to Covariance algorithm
  • Improved oneDAL performance for the following algorithms:
    • DBSCAN
    • Distributed version of kNN
  • New Intel® Extension for Scikit-learn* functionality:
    • SHAP support for symmetric CatBoost models
    • Added oneDAL LinReg and Covariance hyperparameters API
    • Added LogisticRegression interface to preview section
    • Initial support of n_jobs parameter

What's New

  • Introduced new Intel® oneDAL functionality:
    • Distributed Linear Regression, kNN, PCA algorithms
  • Introduced new functionality for Intel® Extension for Scikit-learn:
    • Enabled PCA, Linear Regression, Random Forest algorithms and SPMD policy as preview
    • Scikit-learn 1.2 support
    • sklearn_is_patched() function added to validate status of algorithms patching
  • Improved performance for the following Intel® Extension for Scikit-learn algorithms:
    • t-SNE for “Barnes-Hut” algorithm
    • SVM algorithm for single row inference
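
As a hedged illustration of the sklearn_is_patched() item above, the sketch below applies patching and then queries its status. The helper name report_patching_status is hypothetical, and the guarded import is an assumption made so the script still runs where scikit-learn-intelex (or a release that provides sklearn_is_patched) is not installed.

```python
def report_patching_status():
    """Return the result of sklearn_is_patched(), or None if it is unavailable."""
    try:
        # Assumption: scikit-learn-intelex is installed and exports these names.
        from sklearnex import patch_sklearn, sklearn_is_patched
    except ImportError:
        # The extension (or this particular function) is not available here.
        return None
    patch_sklearn()              # swap in the accelerated estimators
    return sklearn_is_patched()  # validate that the patching took effect


status = report_patching_status()
print("patching status:", status)
```

A None result only means the check could not be performed in the current environment; it says nothing about whether the algorithms would have been patched.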

Known Issues 

  • Under certain conditions, the DAAL SYCL interface might hang with the Level Zero (L0) backend. Please use the oneDAL DPC++ interfaces instead. If the older interfaces are required, the OpenCL backend can be used as a workaround.

Library Engineering

  • Reduced the size of the Intel® oneDAL library by approximately 30%
  • Enabled the NuGet distribution channel for Intel® oneDAL on Linux and macOS

Support Materials

The following additional materials were created:  

Deprecation Notice 

  • DAAL data compression functionality is deprecated and will be removed in the 2024.0 release
  • oneDAL make and Visual Studio examples are deprecated. Please use the CMake-based examples instead
  • DAAL cpp_sycl interfaces are deprecated and will be removed in the 2024.0 release

What's New

  • Introduced new Intel® oneDAL functionality: 
    • DPC++ interface for Linear Regression algorithm

Known Issues

  • Intel® Extension for Scikit-learn SVC.fit and KNN.fit do not support GPU
  • Most Intel® Extension for Scikit-learn sycl examples fail when using GPU context
  • Running the Random Forest algorithm with versions 2021.7.1 and 2023.0 of scikit-learn-intelex on 2nd Generation Intel® Xeon® Scalable Processors (formerly Cascade Lake) may result in an 'Illegal instruction' error.
    • No workaround is currently available for this issue.
    • Recommendation: Use an older version of scikit-learn-intelex until the issue is fixed in a future release.

Deprecation Notice

  • The sequential version of oneDAL was deprecated starting with version 2023.0. Please use TBB capabilities to limit the thread count if execution on a single core is required
  • Intel® oneAPI Data Analytics Library KDB* Samples were deprecated in the open source distribution
  • Intel® oneAPI Data Analytics Library Hadoop* Samples on macOS were deprecated in the open source distribution
  • Intel® oneAPI Data Analytics Library Spark* Samples on macOS were deprecated in the open source distribution

What's New

  • Get more functionality and productivity for Intel® Extension for Scikit-learn with Minkowski and Chebyshev distances in kNN and acceleration of the t-SNE algorithm.
  • For oneDAL, take advantage of the new LinReg algorithm and distributed PCA algorithm.
  • This release is immediately available through the Intel® Developer Zone. It will be available through repositories at a later date.

Deprecation Notice

The zlib and bzip2 compression methods were deprecated. Starting with version 2022.3.1, they are dispatched to the lzo method.

There are no updates for the 2022.3 release. Please refer to the 2022.2 release notes.

Library Engineering  

  • Reduced the size of oneDAL python run-time package by approximately 8%
  • Added Python 3.10 support for daal4py and Intel® Extension for Scikit-learn packages

Support Materials  

Created Kaggle kernels for Intel® Extension for Scikit-learn:

What's New  

  • Improved performance of oneDAL algorithms:
    • Optimized data conversion for tables with column-major layout in host memory to tables with row-major layout in device memory
    • Optimized the computation of Minkowski distances in brute-force kNN on CPU
    • Optimized Covariance algorithm
    • Added DPC++ column-wise atomic reduction
  • Introduced new oneDAL functionality:
    • KMeans distributed random dense initialization
    • Distributed PcaCov
    • sendrecv_replace communicator method
  • Added new parameters to oneDAL algorithms:
    • Weights in Decision Forest for CPU
    • Cosine and Chebyshev distances for KNN on GPU
  • Improved performance for the following Intel® Extension for Scikit-learn algorithms:
    • t-SNE for “Barnes-Hut” algorithm
  • Introduced new functionality for Intel® Extension for Scikit-learn:
    • Manhattan, Minkowski, Chebyshev and Cosine distances for KNeighborsClassifier and NearestNeighbors with “brute” algorithm
  • Fixed the following issues in Intel® Extension for Scikit-learn:
    • An issue with the search of common data type in pandas DataFrame
    • Patching overhead of finiteness checker for specific small data sizes
    • Incorrect values in a tree visualization with plot_tree function in RandomForestClassifier
    • Unexpected error for device strings in {device}:{device_index} format while using config context
  • The sequential version of oneDAL will be deprecated starting in the next release

The release introduces the following changes: 

Library Engineering

  • Reduced the size of the oneDAL library by approximately 15%.

Support Materials

The following additional materials were created:

What's New

  • Introduced new oneDAL functionality: 
    • Distributed algorithms for Covariance, DBSCAN, Decision Forest, Low Order Moments
    • oneAPI interfaces for Linear Regression, DBSCAN, KNN
  • Improved error handling for distributed algorithms in oneDAL in case of compute node failures
  • Improved performance for the following oneDAL algorithms:
    • Louvain algorithm
    • KNN and SVM algorithms on GPU
  • Introduced new functionality for Intel® Extension for Scikit-learn: 
    • Scikit-learn 1.0 support
  • Fixed the following issues:
    • Stabilized the results of Linear Regression in oneDAL and Intel® Extension for Scikit-learn
    • Fixed an issue with RPATH on macOS

The release introduces the following changes: 

Library Engineering

  • Introduced new functionality for Intel® Extension for Scikit-learn*:
    • Enabled patching for all Scikit-learn applications at once:
    • Added support for Python 3.9 in both Intel® Extension for Scikit-learn and daal4py. The packages are available from PyPI and the Intel Channel on Anaconda Cloud.
  • Introduced new oneDAL functionality:
    • Added pkg-config support for Linux, macOS, Windows and for static/dynamic, thread/sequential configurations of oneDAL applications.
    • Reduced the size of the oneDAL library by approximately 30%.

Support Materials

The following additional materials were created:

What's New

  • Introduced new oneDAL functionality: 
    • General:
      • Basic statistics (Low order moments) algorithm in oneDAL interfaces
      • Result options for kNN Brute-force in oneDAL interfaces: using a single function call to return any combination of responses, indices, and distances
    • CPU:
      • Sigmoid kernel of SVM algorithm
      • Model converter from CatBoost to oneDAL representation
      • Louvain Community Detection algorithm technical preview
      • Connected Components algorithm technical preview
      • Search task and cosine distance for kNN Brute-force
    • GPU:
      • The full range support of Minkowski distances in kNN Brute-force
  • Improved oneDAL performance for the following algorithms:
    • CPU:
      • Decision Forest training and prediction
      • Brute-force kNN
      • KMeans
      • NuSVMs and SVR training
  • Introduced new functionality in Intel® Extension for Scikit-learn:
    • General:
      • Enabled the global patching of all Scikit-learn applications
      • Provided an integration with dpctl for heterogeneous computing (the support of dpctl.tensor.usm_ndarray for input and output)
      • Extended API with set_config and get_config methods. Added the support of target_offload and allow_fallback_to_host options for device offloading scenarios
      • Added the support of predict_proba in RandomForestClassifier estimator
    • CPU:
      • Added the support of Sigmoid kernel in SVM algorithms
    • GPU:
      • Added binary SVC support with Linear and RBF kernels
  • Improved the performance of the following scikit-learn estimators via scikit-learn patching:
    • SVR algorithm training
    • NuSVC and NuSVR algorithms training
    • RandomForestRegression and RandomForestClassifier algorithms training and prediction
    • KMeans
  • Fixed the following issues:
    • General:
      • Fixed an incorrectly raised exception during the patching of Random Forest algorithm when the number of trees was more than 7000.
    • CPU:
      • Fixed an accuracy issue in Random Forest algorithm caused by the exclusion of constant features.
      • Fixed an issue in NuSVC Multiclass.
      • Fixed an issue with KMeans convergence inconsistency.
      • Fixed incorrect behavior of train_test_split with specific subset sizes.
    • GPU:
      • Fixed incorrect bias calculation in SVM.

Known Issues

  • GPU:
    • For most algorithms, performance degradations were observed when the 2021.4 version of Intel® oneAPI DPC++ Compiler was used. 
    • Examples are failing when run with Visual Studio Solutions on hardware that does not support double precision floating-point operations.

The release introduces the following changes: 

Library Engineering

  • Introduced a new Python package, Intel® Extension for Scikit-learn*. The scikit-learn-intelex package contains scikit-learn patching functionality that was originally available in daal4py package. All future updates for the patches will be available only in Intel® Extension for Scikit-learn. We recommend using scikit-learn-intelex package instead of daal4py.
    • Download the extension using one of the following commands:
      • pip install scikit-learn-intelex
      • conda install scikit-learn-intelex -c conda-forge
    • Enable Scikit-learn patching:
      • from sklearnex import patch_sklearn
      • patch_sklearn()
  • Made the DPC++ runtime an optional dependency of daal4py. To enable the DPC++ backend, install the dpcpp_cpp_rt package. This change reduces the default package size with all dependencies from 1.2 GB to 400 MB.
  • Added support for building oneDAL-based applications with the /MD and /MDd options on Windows. The -d suffix is used in the names of oneDAL libraries that are built with the debug run-time (/MDd).
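
The install-and-patch steps listed above can be sketched as a short script. This is a minimal sketch, not the official usage pattern: the helper name enable_sklearn_acceleration is hypothetical, and the check for an installed sklearnex module is an assumption added so the script degrades gracefully when scikit-learn-intelex is absent.

```python
import importlib.util


def enable_sklearn_acceleration() -> bool:
    """Patch scikit-learn if scikit-learn-intelex is installed.

    Returns True when patching was applied and False when the package is missing.
    """
    if importlib.util.find_spec("sklearnex") is None:
        # Package not installed: stock scikit-learn is left untouched.
        return False
    from sklearnex import patch_sklearn

    # Call this before importing scikit-learn estimators so that the
    # accelerated implementations are the ones that get imported.
    patch_sklearn()
    return True


patched = enable_sklearn_acceleration()
print("scikit-learn patched:", patched)
```

Any scikit-learn imports in the application would follow the call, so the rest of the code stays unchanged whether or not the extension is present.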

Support Materials

The following additional materials were created:

What's New

  • Introduced new oneDAL and daal4py functionality: 
    • CPU:
      • SVM Regression algorithm
      • NuSVM algorithm for both Classification and Regression tasks
      • Polynomial kernel support for all SVM algorithms (SVC, SVR, NuSVC, NuSVR)
      • Minkowski and Chebyshev distances for kNN Brute-force
      • The brute-force method and the voting mode support for kNN algorithm in oneDAL interfaces
      • Multiclass support for SVM algorithms in oneDAL interfaces
      • CSR-matrix support for SVM algorithms in oneDAL interfaces
      • Subgraph Isomorphism algorithm technical preview
      • Single Source Shortest Path (SSSP) algorithm technical preview
  • Improved oneDAL and daal4py performance for the following algorithms:
    • CPU:
      • Support Vector Machines training and prediction
      • Linear, Ridge, ElasticNet, and LASSO regressions prediction
    • GPU:
      • Decision Forest training and prediction
      • Principal Components Analysis training 
  • Introduced the support of scikit-learn 1.0 version in Intel Extension for Scikit-learn. The 2021.3 release of Intel Extension for Scikit-learn supports the latest scikit-learn releases: 0.22.X, 0.23.X, 0.24.X and 1.0.X.
  • Introduced new functionality for Intel Extension for Scikit-learn:
    • General:
      • The support of patch_sklearn for all algorithms
    • CPU:
      • Acceleration of SVR estimator
      • Acceleration of NuSVC and NuSVR estimators
      • Polynomial kernel support in SVM algorithms
  • Improved the performance of the following scikit-learn estimators via scikit-learn patching:
    • SVM algorithms training and prediction
    • Linear, Ridge, ElasticNet, and Lasso regressions prediction
  • Fixed the following issues:
    • General:
      • Fixed binary incompatibility for the versions of numpy earlier than 1.19.4
      • Fixed an issue with a very large number of trees (> 7000) for Random Forest algorithm.
      • Fixed patch_sklearn to patch both fit and predict methods of Logistic Regression when the algorithm is given as a single parameter to patch_sklearn
    • CPU:
      • Improved numerical stability of training for Alternating Least Squares (ALS) and Linear and Ridge regressions with Normal Equations method
      • Reduced the memory consumption of SVM prediction
    • GPU:
      • Fixed an issue with kernel compilation on the platforms without hardware FP64 support

Known Issues

  • Intel® Extension for Scikit-learn and daal4py packages installed from the PyPI repository can’t be found on Debian systems (including Google Colab). Mitigation: add the “site-packages” folder to the Python package search path before importing the packages:

    import sys
    import os
    import site
    sys.path.append(os.path.join(os.path.dirname(site.getsitepackages()[0]), "site-packages"))

The release introduces the following changes: 

Library Engineering

  • Enabled new PyPI distribution channel for daal4py:
    • The four latest Python versions (3.6, 3.7, 3.8, 3.9) are supported on Linux, Windows and macOS.
    • Support of both CPU and GPU is included in the package.
    • You can download daal4py using the following command: pip install daal4py
  • Introduced CMake support for oneDAL examples

Support Materials

The following additional materials were created:

What's New

  • Introduced new oneDAL and daal4py functionality:
    • CPU:
      • Hist method for Decision Forest Classification and Regression, which outperforms the existing exact method
      • Bit-to-bit results reproducibility for: Linear and Ridge regressions, LASSO and ElasticNet, KMeans training and initialization, PCA, SVM, Logistic Regression, kNN Brute Force method, Decision Forest Classification and Regression
    • GPU:
      • Multi-node multi-GPU algorithms: K-means (batch and online), Covariance (batch and online), Low order moments (batch and online) and PCA
      • Sparsity support for SVM algorithm
  • Improved oneDAL and daal4py performance for the following algorithms:
    • CPU:
      • Decision Forest training Classification and Regression
      • Support Vector Machines training and prediction
      • Logistic Regression, Logistic Loss and Cross Entropy for non-homogeneous input types
    • GPU:
      • Decision Forest training Classification and Regression
      • All algorithms with GPU kernels (as a result of migration to Unified Shared Memory data management)
  • Reduced performance overhead for oneAPI C++ interfaces on CPU and oneAPI DPC++ interfaces on GPU
  • Added technical preview features in Graph Analytics:
    • CPU:
      • Local and Global Triangle Counting
  • Introduced new functionality for scikit-learn patching through daal4py:
    • CPU:
      • Patches for four latest scikit-learn releases: 0.21.X, 0.22.X, 0.23.X and 0.24.X
      • Acceleration of roc_auc_score function
      • Bit-to-bit results reproducibility for: LinearRegression, Ridge, SVC, KMeans, PCA, Lasso, ElasticNet, tSNE, KNeighborsClassifier, KNeighborsRegressor, NearestNeighbors, RandomForestClassifier, RandomForestRegressor
  • Improved performance of the following scikit-learn estimators via scikit-learn patching:
    • CPU:
      • RandomForestClassifier and RandomForestRegressor scikit-learn estimators: training and prediction
      • Principal Component Analysis (PCA) scikit-learn estimator: training 
      • Support Vector Classification (SVC) scikit-learn estimators: training and prediction
      • Support Vector Classification (SVC) scikit-learn estimator with the probability=True parameter: training and prediction
  • Fixed the following issues:
    • Scikit-learn patching:
      • Improved accuracy of RandomForestClassifier and RandomForestRegressor scikit-learn estimators
      • Fixed patching issues with pairwise_distances
      • Fixed the behavior of the patch_sklearn and unpatch_sklearn functions
      • Fixed unexpected behavior that made accelerated functionality unavailable through scikit-learn patching if the input was not of float32 or float64 data types. Scikit-learn patching now works with all numpy data types.
      • Fixed a memory leak that appeared when DataFrame from pandas was used as an input type
      • Fixed performance issue for interoperability with Modin
    • daal4py:
      • GPU:
        • Fixed the crash of SVM and kNN algorithms on Windows
    • oneDAL:
      • CPU:
        • Improved accuracy of Decision Forest Classification and Regression
      • GPU:
        • Improved accuracy of KMeans algorithm
        • Improved stability of Linear Regression and Logistic Regression algorithms

Known Issues

  • The oneDAL vars.sh script does not support KornShell

Getting Started Guide

Please refer to the oneDAL Getting Started Guide.

 

Notices and Disclaimers

Intel technologies may require enabled hardware, software or service activation.

No product or component can be absolutely secure.

Your costs and results may vary.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.