Intel® Data Analytics Acceleration Library Release Notes and New Features

ID 672212
Updated 3/20/2020
Version: Latest
Public

This page provides the current Release Notes for Intel® Data Analytics Acceleration Library (Intel® DAAL). The notes are categorized by year, from newest to oldest, with individual releases listed within each year.

Each version summarizes the new features, changes, and known issues introduced since the previous release. The Release Notes link under each major release provides important information such as prerequisites, software compatibility, and installation instructions.


To get product updates, log in to the Intel® Software Development Products Registration Center.
For questions or technical support, visit Intel® Software Developer Support.

2020

Installation Guide | System Requirements | Get Started

Update 3

What's New in Intel® DAAL 2020 Update 3:

  • Introduced new Intel® DAAL and daal4py functionality:
    • Brute Force method for the k-Nearest Neighbors classification algorithm, which outperforms the existing K-D tree method on datasets with more than 13 features
    • k-Nearest Neighbors search for the K-D tree and Brute Force methods, with computation of the distances to the nearest neighbors and their indices
  • Extended existing Intel® DAAL and daal4py functionality:
    • Voting methods based on inverse-distance and uniform weighting for prediction in k-Nearest Neighbors classification and search
    • New parameters in Decision Forest classification and regression: minObservationsInSplitNode, minWeightFractionInLeafNode, minImpurityDecreaseInSplitNode, maxLeafNodes with the best-first strategy, and sample weights
    • Support for the Support Vector Machine (SVM) decision function in the Multi-class Classifier
  • Improved Intel® DAAL and daal4py performance for the following algorithms: 
    • Support Vector Machine training and prediction 
    • Decision Forest classification training
    • RBF and Linear kernel functions
  • Introduced new daal4py functionality:
    • Conversion of trained XGBoost* and LightGBM* models into a daal4py Gradient Boosted Trees model for fast prediction
    • Support for Modin* DataFrame as input
  • Introduced new functionality for scikit-learn patching through daal4py:
    • Acceleration of KNeighborsClassifier scikit-learn estimator with Brute Force and K-D tree methods
    • Acceleration of RandomForestClassifier and RandomForestRegressor scikit-learn estimators
    • Sparse input support for KMeans and Support Vector Classification (SVC) scikit-learn estimators
    • Prediction of probabilities for SVC scikit-learn estimator
    • Support for the ‘normalize’ parameter in Lasso and ElasticNet scikit-learn estimators
  • Improved performance of the following Intel scikit-learn algorithms and functions:
    • train_test_split()
    • Support Vector Classification (SVC) fit and prediction
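To show how the two k-NN voting methods listed above can disagree, here is a minimal pure-Python sketch of brute-force k-NN classification. It is not the Intel® DAAL or daal4py API: the function name `knn_predict` and the `weights` values ("uniform", "distance") are hypothetical and only echo the scikit-learn convention.

```python
import math
from collections import defaultdict


def knn_predict(train_x, train_y, query, k, weights="uniform"):
    """Brute-force k-NN classification with uniform or inverse-distance voting."""
    # Brute force: measure the distance from the query to every training point.
    ranked = sorted(
        ((math.dist(x, query), label) for x, label in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )
    neighbors = ranked[:k]

    if weights == "distance" and neighbors[0][0] == 0.0:
        # Exact matches get all the weight (1/0 would be undefined).
        neighbors = [(d, label) for d, label in neighbors if d == 0.0]
        weights = "uniform"

    votes = defaultdict(float)
    for d, label in neighbors:
        if weights == "uniform":
            votes[label] += 1.0        # every neighbor counts equally
        elif weights == "distance":
            votes[label] += 1.0 / d    # closer neighbors count more
        else:
            raise ValueError(f"unknown weights: {weights!r}")
    return max(votes, key=votes.get)


# Three "a" points near the origin, two "b" points near (5, 5).
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6)]
y = ["a", "a", "a", "b", "b"]
print(knn_predict(X, y, (4, 4), k=5))                      # prints a
print(knn_predict(X, y, (4, 4), k=5, weights="distance"))  # prints b
```

With all five points as neighbors, uniform voting follows the majority class, while inverse-distance voting lets the two much closer "b" points win.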
       

Update 2

What's New in Intel® DAAL 2020 Update 2:

  • Introduced new functionality:
    • Thunder method for the Support Vector Machine (SVM) training algorithm, which trains faster than the existing sequential minimal optimization method
  • Extended existing functionality:
    • Training with the number of features greater than the number of observations for Linear Regression, Ridge Regression, and Principal Component Analysis
    • New sample_weights parameter for SVM algorithm
    • New resultsToEvaluate parameter in the K-Means algorithm, which controls the computation of centroids, assignments, and the exact objective function
  • Improved performance for the following: 
    • Support Vector Machine training and prediction, Elastic Net and LASSO training, Principal Component Analysis training and transform, K-D tree based k-Nearest Neighbors prediction
    • K-Means algorithm in batch computation mode
    • RBF kernel function
  • Deprecated 32-bit support:
    • The 2020 product line will be the last to support 32-bit platforms
  • Introduced improvements to daal4py library:
    • Performance optimizations for pandas input format
    • Scikit-learn compatible API for AdaBoost classifier, Decision Tree classifier, and Gradient Boosted Trees classifier and regressor
  • Improved performance of the following Intel scikit-learn algorithms and functions:
    • fit and prediction in K-Means and Support Vector Classification (SVC), fit in Elastic Net and LASSO, fit and transform in PCA
    • Support Vector Classification (SVC) with non-default weights of samples and classes
    • train_test_split() and assert_all_finite()
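To make the resultsToEvaluate idea concrete, the sketch below runs one Lloyd iteration of K-Means and returns only the requested outputs. It is plain Python, not Intel® DAAL's API: the `results` tuple of names is a hypothetical stand-in for the resultsToEvaluate flags, and the objective is assumed to be the sum of squared distances to the assigned centroids.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_step(data, centroids, results=("centroids", "assignments", "objective")):
    """One Lloyd iteration that returns only the requested results."""
    # Assign each point to its nearest centroid.
    assignments = [
        min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
        for p in data
    ]
    out = {}
    if "assignments" in results:
        out["assignments"] = assignments
    if "objective" in results:
        # Objective: sum of squared distances to the assigned (current) centroids.
        out["objective"] = sum(sq_dist(p, centroids[a]) for p, a in zip(data, assignments))
    if "centroids" in results:
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, a in zip(data, assignments) if a == j]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(tuple(old))  # keep an empty cluster's centroid
        out["centroids"] = new_centroids
    return out


data = [(0, 0), (0, 2), (10, 0), (10, 2)]
print(kmeans_step(data, [(0, 0), (10, 0)], results=("assignments", "objective")))
# prints {'assignments': [0, 0, 1, 1], 'objective': 8}
```

Skipping an output (for example, the centroid update) saves work when only assignments or the objective value are needed.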

Update 1

What's New in Intel® DAAL 2020 Update 1:

  • Introduced new functionality:
    • Elastic Net algorithm with L1 and L2 regularization in batch computation mode. The algorithm supports various optimization solvers that handle non-smooth functions.
    • Probabilistic classification for the Decision Forest Classification algorithm, with a choice of voting method for calculating probabilities.
  • Extended existing functionality:
    • Performance optimizations for distributed Spark* samples, the K-Means algorithm for some input dimensions, the Gradient Boosted Trees training stage for large datasets on multi-core platforms, and the Decision Forest prediction stage for datasets with a small number of observations on processors that support Intel® Advanced Vector Extensions 2 (Intel® AVX2) and Intel® Advanced Vector Extensions 512 (Intel® AVX-512)
    • Performance optimizations across algorithms that use SOA (Structure Of Arrays) NumericTable as an input on processors that support Intel® Advanced Vector Extensions 512 (Intel® AVX-512)
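For reference, Elastic Net combines the L1 (LASSO) and L2 (ridge) penalties. One common way to write its objective is shown below, assuming a squared-error loss averaged over n observations and penalty weights λ₁ and λ₂; the exact scaling of each term in Intel® DAAL may differ.

```latex
\min_{\beta}\; \frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i - x_i^{\top}\beta\bigr)^2
  + \lambda_1 \lVert \beta \rVert_1
  + \frac{\lambda_2}{2} \lVert \beta \rVert_2^2
```

Setting λ₂ = 0 recovers LASSO and λ₁ = 0 recovers ridge regression. The L1 term is not differentiable at zero, which is why solvers that handle non-smooth functions are required.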

Initial Release

What's New in Intel® DAAL 2020:

  • Introduced new functionality:
    • Probabilistic classification and variable importance computation for Gradient Boosted Trees.
    • Classification Stump with Information gain and Gini index split methods.
    • Regression Stump with MSE split method.
  • Extended existing functionality:
    • Decision Tree functionality supports weighted data.
    • AdaBoost algorithm now works with multiple classes.
    • The multiclass AdaBoost algorithm is available with the SAMME and SAMME.R methods. AdaBoost, BrownBoost, and LogitBoost work with weak learner algorithms that support weights.
  • Improved performance for LBFGS Optimization Solver.
  • Started deprecation of Neural Networks:
    • Starting from Intel® DAAL 2020, Neural Networks will receive no new features or functionality, and support will be discontinued completely in Intel® DAAL 2021. For more information, see the Deprecation Notes.
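The SAMME method mentioned above extends binary AdaBoost to K classes by adding a log(K − 1) term to each weak learner's weight. The sketch below shows one boosting round using the textbook SAMME update; it is plain Python, and the function name `samme_round` is hypothetical rather than part of Intel® DAAL.

```python
import math


def samme_round(weights, y_true, y_pred, n_classes):
    """One SAMME boosting round: learner weight and updated sample weights."""
    total = sum(weights)
    missed = [t != p for t, p in zip(y_true, y_pred)]
    # Weighted training error of the weak learner.
    err = sum(w for w, m in zip(weights, missed) if m) / total
    if not 0.0 < err < 1.0 - 1.0 / n_classes:
        raise ValueError("weak learner must beat random guessing and not be perfect")
    # The log(K - 1) term is what distinguishes SAMME from binary AdaBoost.
    alpha = math.log((1.0 - err) / err) + math.log(n_classes - 1)
    # Up-weight misclassified samples, then renormalize to sum to 1.
    updated = [w * math.exp(alpha) if m else w for w, m in zip(weights, missed)]
    norm = sum(updated)
    return alpha, [w / norm for w in updated]


# Four equally weighted samples, three classes, one mistake (the last sample).
alpha, w = samme_round([0.25] * 4, [0, 1, 2, 0], [0, 1, 2, 1], n_classes=3)
print(alpha, w)
```

Here the error is 0.25, so alpha = log(3) + log(2) = log(6), and the misclassified sample ends up carrying two thirds of the total weight.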

2019

Installation Guide | System Requirements | Get Started

Update 5

What's New in Intel® DAAL 2019 Update 5:

  • New algorithms were added:
    • DBSCAN (Density-Based Spatial Clustering of Applications with Noise), available in batch and distributed computation modes. Brute-force method for neighborhood computation is implemented.
    • LASSO (Least Absolute Shrinkage and Selection Operator) regression algorithm with L1 regularization in batch computation mode. The algorithm supports various optimization solvers that handle non-smooth functions.
    • Coordinate Descent optimization solver in batch computation mode, which handles non-smooth functions.
    • Model converters for SVM, Gradient Boosted Trees, Decision Forest Classification, Multiclass Classification, Linear Regression, and Logistic Regression algorithms. This functionality lets you use Intel® DAAL prediction directly, without Intel® DAAL training, by converting a model you have already trained.
  • Initial Apache Arrow* support: new types of Numeric Tables were added to Intel® DAAL.
  • Improved performance for the following algorithms:
    • Decision Forest Classification prediction for processors supporting Intel® Advanced Vector Extensions 2 (Intel® AVX2) and Intel® Advanced Vector Extensions 512 (Intel® AVX-512).
    • Gradient Boosted Trees algorithm (prediction stage and model size).
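LASSO and the Coordinate Descent solver introduced in this update fit together naturally: each coordinate step has a closed-form solution given by soft-thresholding. The sketch below is the textbook cyclic coordinate descent for the objective (1/(2n))·||y − Xw||² + λ·||w||₁, without an intercept; it is not Intel® DAAL's implementation, and the function names are hypothetical.

```python
def soft_threshold(z, t):
    """Proximal operator of t*|w|: shrinks z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Minimize (1/(2n))*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # y - Xw with w = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero feature never enters the model
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            # Update the residual incrementally for the changed coefficient.
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
            w[j] = new_wj
    return w


X = [[1, 0], [0, 1], [-1, 0], [0, -1]]
y = [3.0, 0.5, -3.0, -0.5]
print(lasso_coordinate_descent(X, y, lam=1.0))  # prints [1.0, 0.0]
```

The weak second feature is driven to exactly zero, which is the feature-selection behavior that makes the objective non-smooth and motivates a dedicated solver.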

Update 4

Release Notes

What's New in Intel® DAAL 2019 Update 4:

  • Introduced new distribution channel: NuGet* packages for Intel® DAAL.
  • Improved Gradient Boosted Trees training stage performance for large-dimensional data sets with inexact split mode.
  • Extended Z-Score normalization with a new parameter, "doScale". This feature applies to the PCA algorithm with the svdDense method.
  • Changed the minimum supported Java™ Development Kit (JDK™) version from 7 to 8.
  • Fixed the issue with building the open source version of Intel® DAAL using Java 11.
  • Replaced the "parameter" field with the "parameter()" method in the Batch class of the Z-score normalization algorithm. For more details, see this KB article.
  • Enabled building the open source version of Intel® DAAL with the Intel® Compiler from Intel® Parallel Studio XE 2019 Update 2 or later.
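A minimal sketch of Z-score normalization with an on/off scaling switch, assuming that doScale toggles the division by the standard deviation so that, when it is off, columns are only centered (as PCA needs). The function and its `do_scale` argument are hypothetical, and the use of the sample standard deviation (divisor n − 1) is an assumption, not a documented Intel® DAAL detail.

```python
import statistics


def zscore(rows, do_scale=True):
    """Center each column; optionally divide by its standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Sample standard deviation (n - 1); the divisor DAAL uses is an assumption.
    stds = [statistics.stdev(col) for col in columns]
    result = []
    for row in rows:
        new_row = []
        for x, m, s in zip(row, means, stds):
            value = x - m              # centering is always applied
            if do_scale and s > 0:     # scaling only when requested and defined
                value /= s
            new_row.append(value)
        result.append(new_row)
    return result


rows = [[1, 10], [3, 10], [5, 10]]
print(zscore(rows, do_scale=False))  # centered only
print(zscore(rows))                  # centered and scaled
```

The constant second column has zero spread, so this sketch leaves it centered but unscaled to avoid dividing by zero.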

Update 3

Release Notes

What's New in Intel® DAAL 2019 Update 3:

  • Improved Gradient Boosting training stage performance in inexact split mode.
  • Instructions for building a reduced-size library for resource-constrained devices can be found here.
  • Introduced a new parameter, nTrials, for K-means++ initialization. It sets the number of trials used to generate each initial cluster except the first one.
  • Improved performance for the Cholesky algorithm.
  • The SAGA optimization solver is available to optimize non-smooth objective functions, such as those used in L1-regularized Logistic Regression, LASSO, and ElasticNet algorithms.
  • Currently, Intel® DAAL Logistic Regression with the L1 penalty is supported through the SAGA solver.
  • Intel offers several AI software tools and libraries on Amazon Web Services* Marketplace, including some of the most popular algorithmic optimizations from the Intel® Data Analytics Acceleration Library (Intel® DAAL) and from BigDL, a distributed deep learning library for Apache Spark*.
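The nTrials parameter above can be pictured with the common "greedy" k-means++ variant: for every center after the first, draw several candidates with probability proportional to D(x)² and keep the one that lowers the total potential the most. This is a plain-Python sketch; the names are hypothetical, and assuming Intel® DAAL uses the trials exactly this way is an interpretation of the release note.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_init(data, k, n_trials=1, seed=None):
    """k-means++ seeding; each center after the first tries n_trials candidates."""
    rng = random.Random(seed)
    centers = [rng.choice(data)]  # the first center is a single uniform draw
    closest = [sq_dist(p, centers[0]) for p in data]  # D(x)^2 to nearest center
    # Stop early if every point already coincides with a center.
    while len(centers) < k and sum(closest) > 0:
        best = None
        for _ in range(n_trials):
            # Sample a candidate with probability proportional to D(x)^2.
            cand = rng.choices(data, weights=closest, k=1)[0]
            trial = [min(c, sq_dist(p, cand)) for c, p in zip(closest, data)]
            potential = sum(trial)
            # Keep the candidate that lowers the total potential the most.
            if best is None or potential < best[0]:
                best = (potential, cand, trial)
        _, cand, closest = best
        centers.append(cand)
    return centers


data = [(0, 0), (0, 1), (100, 100), (100, 101)]
print(kmeanspp_init(data, k=2, n_trials=3, seed=42))
```

With n_trials=1 this reduces to standard k-means++; more trials cost extra distance computations per center but make a poorly placed center less likely.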

Deprecation Notice:

  • With the introduction of daal4py, a package that supersedes PyDAAL, Intel is deprecating PyDAAL and will discontinue support starting with Intel® DAAL 2021 and Intel® Distribution for Python 2021. Until then, Intel will continue to provide compatible PyDAAL pip and conda packages for newer releases of Intel DAAL and make it available in open source. However, Intel will not add new Intel DAAL features to PyDAAL. Intel recommends that developers switch to daal4py.

Known Issues:

  • For the open source version of Intel® DAAL, it is highly recommended that customers build DAAL with an Intel Compiler version no later than the one in Intel® Parallel Studio XE 2019 Update 2. This will be fixed in the next release.
  • For the open source version of Intel® DAAL, building DAAL fails with Java 11. This will be fixed in the next release.

*Other names and brands may be claimed as the property of others.

Update 2

Intel® Data Analytics Acceleration Library (Intel® DAAL) Update 2 includes functional and security updates. Users should update to the latest version.

Update 1

Release Notes

What's New in Intel® DAAL 2019 Update 1:

  • Improved performance of CSV Data Source, Low Order Moments, the initialization stage of implicit ALS, and K-means for CSR datasets on processors supporting Intel® Advanced Vector Extensions 2 (Intel® AVX2) and Intel® Advanced Vector Extensions 512 (Intel® AVX-512).
  • The LBFGS algorithm now supports automatic step-length selection on each iteration, using a line search that satisfies the Wolfe conditions, when the following parameters are set:
    • Batch size b and correction pair batch size b_H are equal to the number of observations in the training set.
    • The L parameter is set to 1.
    • The α^t step-length sequence is not specified.
  • Introduced support for MT2203 random number generators. Decision Forest API changes include:
    • API to pass an instance of a random number generator to the algorithm.
    • API to retrieve the updated instance of the random number generator after the training stage, which can be used in other computations.
  • Introduced limited FreeBSD* support in the DAAL open source version.
  • The Windows* version of the DAAL static libraries is compatible with Universal Windows Drivers*.
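The Wolfe-condition line search described for LBFGS can be illustrated with a generic bisection search. This is a textbook sketch, not Intel® DAAL's LBFGS implementation; the helper names are hypothetical, and the constants c1 = 1e-4 and c2 = 0.9 are common defaults chosen here as assumptions.

```python
def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def satisfies_wolfe(f, grad, x, d, alpha, c1=1e-4, c2=0.9):
    """Weak Wolfe conditions for step length alpha along descent direction d."""
    slope = dot(grad(x), d)  # directional derivative at x (negative for descent)
    x_new = [xi + alpha * di for xi, di in zip(x, d)]
    sufficient_decrease = f(x_new) <= f(x) + c1 * alpha * slope
    curvature = dot(grad(x_new), d) >= c2 * slope
    return sufficient_decrease and curvature


def wolfe_line_search(f, grad, x, d, alpha=1.0, c1=1e-4, c2=0.9, max_iter=50):
    """Bisection search for a step length satisfying the weak Wolfe conditions."""
    lo, hi = 0.0, float("inf")
    fx, slope = f(x), dot(grad(x), d)
    for _ in range(max_iter):
        x_new = [xi + alpha * di for xi, di in zip(x, d)]
        if f(x_new) > fx + c1 * alpha * slope:
            hi = alpha  # step too long: not enough decrease
        elif dot(grad(x_new), d) < c2 * slope:
            lo = alpha  # step too short: function still falling steeply
        else:
            return alpha
        alpha = (lo + hi) / 2 if hi < float("inf") else 2 * lo
    raise RuntimeError("no Wolfe step found")


f = lambda v: sum(t * t for t in v)  # f(x) = ||x||^2
g = lambda v: [2 * t for t in v]     # its gradient
print(wolfe_line_search(f, g, [1.0], [-2.0]))  # prints 0.5
```

For f(x) = x² starting at x = 1 along the negative gradient, the full step overshoots to x = −1, and the search halves it to 0.5, which lands exactly on the minimum.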

Known Issues

  • On macOS* systems with Intel® AVX-512 processors, DAAL functions dispatch the Intel® AVX2 optimization code. The problem will be fixed in future releases. Please contact technical support if you need further help with this problem.

Initial Release

Release Notes

What's New in Intel® DAAL 2019:

  • Implemented logistic regression classification algorithm.
  • Implemented cross-entropy and logistic loss objective functions.
  • Added a new distribution model via Maven* that simplifies getting and building projects that use Intel® DAAL for Java* developers.
  • Extended the traversal API to return additional data from decision tree, decision forest, and gradient boosting models.
  • Boosted machine learning and data analytics performance in EM-GMM, sparse SVM training (linear kernel), LogitBoost training, gradient boosting prediction, and others.
  • Enabled support for user-defined data modification procedure in CSV and ODBC data sources. This functionality provides capability to implement a wide range of feature extraction and transformation techniques on the user side.

Known Issues:

  • For the standalone version of Intel® DAAL on Linux, if you expect to use Intel® TBB, you will need to install the standalone version of Intel® TBB. This is a workaround. For more information, see this article
  • Backward compatibility is unreliable in dynamic linkage mode between DAAL 2019 and DAAL 2018 Update 3. Workaround: rebuild your source code using DAAL 2019 (or newer versions).

Deprecation Notes:

  • Deprecation of the following features:
    • The 32-bit library for macOS* is deprecated in this release and will be removed in the next release.
  • Removal of support for:
    • Installation on 32-bit hosts is no longer supported for any OS. However, the 32-bit library (for Windows* and Linux*) continues to exist and can be used on 64-bit hosts.

2018

Installation Guide | System Requirements

Update 3

  • Bug fixes.

Update 2

  • Added a host application interface that enables algorithm-level computation cancelling through a user-defined callback. This interface is available in the Decision Forest and Gradient Boosting Trees algorithms. New example code is provided.
  • New technical preview of experimental Intel DAAL and the Intel DAAL extension library:
    1. Introduced distributed k-Nearest Neighbors classifiers for both training and prediction. Included a new sample that demonstrates how to use this algorithm with the Intel® MPI Library.
    2. Developed experimental extension library on top of existing Intel DAAL Python* APIs that provides easy to use API for Intel® DAAL neural networks. This extension library supports configuring and training neural network models in a few lines of code, and allows use of existing TensorFlow* and Caffe* inference models.
  • The Gradient Boosting Trees training algorithm has been extended with an inexact splits calculation mode. It applies to continuous features that are bucketed into discrete bins, and the possible splits are restricted to the bucket borders.
  • Intel® Threading Building Blocks (Intel® TBB) dependency is removed in library sequential mode.
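The inexact split mode described above trades a little split quality for speed: instead of testing every distinct feature value, only bucket borders are tried. The sketch below assumes simple quantile-style bucketing of one continuous feature and a squared-error (regression) split score; the helper names are hypothetical, and the bucketing rule Intel® DAAL actually uses may differ.

```python
def bin_borders(values, n_bins):
    """Quantile-style bucket borders: the only thresholds tried in inexact mode."""
    s = sorted(values)
    borders = []
    for b in range(1, n_bins):
        t = s[b * len(s) // n_bins]
        if not borders or t > borders[-1]:  # drop duplicate borders
            borders.append(t)
    return borders


def sse(values):
    """Sum of squared errors around the mean."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(xs, ys, thresholds):
    """Pick the threshold (x < t goes left) with the lowest total SSE."""
    best_t, best_cost = None, float("inf")
    for t in thresholds:
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        if not left or not right:
            continue
        cost = sse(left) + sse(right)
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost


xs = list(range(10))
ys = [0] * 5 + [10] * 5
exact = sorted(set(xs))[1:]   # exact mode: every distinct value is a candidate
inexact = bin_borders(xs, 4)  # inexact mode: only 3 bucket borders
print(best_split(xs, ys, exact), best_split(xs, ys, inexact))
```

Here both modes find the same split, but the inexact mode evaluates 3 candidate thresholds instead of 9.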

Known Issues

  • Online linear regression with the QR method incorrectly merges results from consecutive compute calls. The bug is expected to be fixed in a future release.
  • A bug in categorical feature processing in the Gradient Boosting Trees algorithm. The bug is expected to be fixed in a future release.

Update 1

  • Introduced the gradient boosted trees algorithm for classification and regression as a stochastic gradient boosting machine with regularization and second-order numerical optimization in the training procedure (XGBoost-like), with an exact splits mode. The implementation employs multiple levels of parallelization in tree construction and prediction.
  • Developed an experimental extension library on top of the existing pyDAAL package that provides an easy-to-use API for Intel® DAAL neural networks. The extension library allows you to configure and train a neural network model in a few lines of code, and to use existing TensorFlow* or Caffe* models at the inference stage.
  • Fixed an issue in the multi-class classifier so that it now supports other boosting binary classifiers in addition to SVM. The boosting algorithm now clones the weak learner before using it, so different threads in the multi-class classifier work with different weak learner objects.
  • Introduced new experimental distributed k-Nearest Neighbors classifiers for both training and prediction stages. Added a new sample that demonstrates how to use this algorithm together with MPI. The experimental distributed kNN is available here.
  • Added support in PCA algorithm for wide matrices (number of rows is less than the number of columns) with correlation method.
  • Introduced an option to calculate the means and variances of the input data set in the PCA algorithm. Added support for sign-deterministic output. The library is extended with a PCA Transformation algorithm, which transforms a dataset with optional data normalization and data whitening. Introduced quality metrics for PCA: explained variances, explained variance ratios, and noise variance.
  • Introduced new feature of optionally calculating results for means and variances of input data set in Zscore algorithm.
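The PCA quality metrics listed above can be derived from the eigenvalues of the covariance (or correlation) matrix. In this sketch, explained variances are the leading eigenvalues, ratios divide them by the total, and the noise variance is taken as the mean of the discarded eigenvalues, following the probabilistic PCA convention used by scikit-learn. Whether Intel® DAAL defines noise variance identically is an assumption, and `pca_quality` is a hypothetical name.

```python
def pca_quality(eigenvalues, n_components):
    """Explained variances, explained variance ratios, and noise variance."""
    ev = sorted(eigenvalues, reverse=True)
    total = sum(ev)
    explained = ev[:n_components]            # variances of the kept components
    ratios = [e / total for e in explained]  # share of the total variance
    discarded = ev[n_components:]
    # Noise variance: mean of the discarded eigenvalues (probabilistic PCA convention).
    noise = sum(discarded) / len(discarded) if discarded else 0.0
    return explained, ratios, noise


print(pca_quality([4.0, 1.0, 2.0, 1.0], n_components=2))
# prints ([4.0, 2.0], [0.5, 0.25], 1.0)
```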

Initial Release

  • Introduced API modifications to streamline library usage and enable consistency across functionality.
  • Introduced support for Decision Tree for both classification and regression. The feature includes calculation of Gini index and Information Gain for classification, and mean squared error (MSE) for regression split criteria, and Reduced Error Pruning.
  • Introduced support for Decision Forest for both classification and regression. The feature includes calculation of Gini index for classification, variance for regression split criteria, generalization error, and variable importance measures such as Mean Decrease Impurity and Mean Decrease Accuracy.
  • Introduced support for varying learning rate in the Stochastic Gradient Descent algorithm for neural network training.
  • Introduced support for filtering in the Data Source including loading selected features/columns from CSV data source and binary representation of the categorical features.
  • Extended Neural Network layers with Element Wise Add layer.
  • Introduced new samples that allow easy integration of the library with Spark* MLlib.
  • Introduced a service method for enabling thread pinning.
  • Performance improvements in various algorithms on Intel® Xeon® processors supporting Intel® Advanced Vector Extensions 512 (Intel® AVX-512) (codename Skylake Server).
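The Decision Tree split criteria named in this release (Gini index and Information Gain for classification) can be computed from class counts. The sketch below uses the standard textbook formulas, with entropy measured in bits as an assumption (the base does not change which split ranks best); the function names are hypothetical and not Intel® DAAL's API.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    return (entropy(parent)
            - len(left) / n * entropy(left)
            - len(right) / n * entropy(right))


parent = ["yes", "yes", "no", "no"]
print(gini(parent), information_gain(parent, ["yes", "yes"], ["no", "no"]))
# prints 0.5 1.0
```

A perfectly separating split removes all impurity, so its information gain equals the parent's full entropy of 1 bit.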

Known Issues:

  • Intel DAAL Python API (a.k.a. pyDAAL) is provided as source. When building it on Windows*, users may see warning messages. These warnings do not indicate critical issues and do not affect the library's functionality.
  • Intel DAAL Python API (a.k.a. pyDAAL) built from the source does not work on OS X* El Capitan (version 10.11). Workaround: Users can get the Intel® Distribution for Python* as an Anaconda package (http://anaconda.org/intel/), which contains a pre-built pyDAAL that works on OS X* El Capitan.

2017

Installation Guide | System Requirements | Bug Fix Log

Update 4

  • Small fixes for Python examples
  • Tuned the Microsoft Visual Studio* solution for C++ examples: disabled debug for release configurations, set the start point for relative paths, and added the ability to run examples from the IDE
  • Enabled support for macOS with Xcode 8.3
  • Performance tuning for a few algorithms to address previous performance degradation
  • Fixes in documentation

Update 3

  • Intel DAAL (on Linux and macOS) can now be installed directly from yum, apt, and conda repositories.
  • Bug fixes and performance improvements
  • Intel DAAL (for Linux and macOS) switched to the Apache License 2.0

Update 2

  • Lots of improvements for the neural networks API:
    • Added the transposed convolution layer
    • Added the reshape layer
    • Extended interface of loss softmax cross-entropy layer to support input tensors of arbitrary dimensions
    • Added sigmoid cross-entropy criterion
    • Added truncated Gaussian initializer for tensors
    • Extended support for distributed computing by adding the objective function with pre-computed characteristics
    • Improved performance of neural network layers used in topologies such as AlexNet
  • Added more samples to demonstrate the usage of this library. You can find and download the latest samples from: Intel® Data Analytics Acceleration Library Code Samples

Update 1

  • Added K-Nearest Neighbors (KNN) algorithm for batch computing mode
  • Added distributed processing mode for neural network training to support distributed parallel data processing
  • Introduced diagonal variance-covariance matrices in EM GMM and controls for handling degenerate covariance matrices
  • Introduced k-means++ and k-means|| initialization methods for K-Means clustering
  • Introduced the Gaussian initializer for neural network model parameters (weights and biases) initialization
  • Introduced min-max normalization algorithm
  • Added multiple ground truth tensors and multiple result tensors for neural networks training and inference stage, respectively
  • Added optional arguments and results in the SGD solver to enable computation resumption from a paused state
  • Added support for merging of the numeric tables by rows
  • Added support for symmetric and triangular packed numeric tables in Java
  • Performance improvements for the following functions:
    • Neural network training and inference, including support for batch mode on the inference stage
    • Local response normalization layer and 2D max pooling layer
    • Abs and Tanh backward layers
    • Cosine distance for result in lower triangular layout, correlation distance for result in full, lower- and upper triangular layouts
    • Lower order moments
    • z-score normalization
    • PCA
    • Kernel functions for CSR NumericTables
    • CSV feature manager
  • Bug fixes for the following components:
    • Multi-class classifier
    • LBFGS optimization solver
    • Documentation
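The min-max normalization algorithm introduced in this update rescales each feature linearly into a target range. Below is a minimal pure-Python sketch, assuming a configurable range [lower, upper] that defaults to [0, 1] and mapping constant columns to the lower bound; both choices and the function name are assumptions for illustration, not Intel® DAAL's documented behavior.

```python
def min_max_normalize(rows, lower=0.0, upper=1.0):
    """Linearly rescale each column of `rows` into [lower, upper]."""
    columns = list(zip(*rows))
    mins = [min(col) for col in columns]
    maxs = [max(col) for col in columns]
    result = []
    for row in rows:
        result.append([
            # A constant column has no range; map it to the lower bound.
            lower + (x - lo) * (upper - lower) / (hi - lo) if hi > lo else lower
            for x, lo, hi in zip(row, mins, maxs)
        ])
    return result


print(min_max_normalize([[0, 10], [5, 20], [10, 30]]))
# prints [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
```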

Initial Release

  • Introducing Python programming language API
  • Introducing Neural Networks functionality
    • Uniform and Xavier initialization methods
    • Layers
      • Two-dimensional convolutional
      • One-, two-, and three-dimensional max pooling
      • One-, two-, and three-dimensional average pooling
      • Spatial pyramid pooling, stochastic pooling and locally connected layers
      • Fully connected
      • Dropout
      • Logistic
      • Hyperbolic tangent
      • Rectified Linear Unit (ReLU)
      • Parametric Rectified Linear Unit (pReLU)
      • Smooth Rectified Linear Unit (smooth ReLU)
      • Softmax with cross-entropy loss
      • Absolute value (abs)
      • Batch normalization
      • Local response normalization
      • Local contrast normalization
      • Concat
      • Split
    • Optimization solvers
      • Stochastic gradient descent
      • Mini-batch stochastic gradient descent
      • Stochastic limited memory Broyden–Fletcher–Goldfarb–Shanno (lBFGS)
      • Mini-batch Adagrad optimization solver
    • Objective functions
      • Mean squared error (MSE)
    • Tensor: Support multiple data layouts, axes control, and computation of tensor size
    • Other: Support for user-defined memory allocation to store layer results in Neural Networks
  • Added Ridge Linear regression algorithm in batch/online/distributed processing mode
  • Added support for quality metrics for linear regression
  • Added z-score normalization
  • Improved performance for QR, SVD, PCA, variance-covariance, linear regression, Expectation Maximization (EM) for Gaussian Mixture Models (GMM), K-means, and the Naïve Bayes algorithms on the 2nd generation of Intel® Xeon Phi™ processors (codenamed Knights Landing), as well as on the Intel® Xeon® E5-xxxx v3 (codenamed Haswell) and the Intel® Xeon® E5-xxxx v4 (codenamed Broadwell) processors.
  • Bug fixes and other improvements in the library and its documentation
  • Intel DAAL User's Guide and the API documentation are available for online browsing, and are removed from the installer packages
  • Intel DAAL samples are now available as online download and removed from the installer packages
  • Support removed for installation on IA-32 architecture hosts. The 32-bit library continues to exist and can be used on Intel® 64 architecture hosts.

2016

Installation Guide | System Requirements | Bug Fix Log

Update 4

  • Fixed a bug related to the training dataset size in the SVM example
  • Fixed a bug in C++ examples on OS X* 10.11.4
  • Other minor bug fixes and improvements in the documentation

Update 3

  • Fixed a bug in the initialization of the Expectation-Maximization algorithm
  • Added examples of using the CSR format of sparse matrices with kernel functions
  • Fixed a bug in MPI samples of linear regression
  • Fixed a memory leak in AOS NumericTables
  • Other minor bug fixes and improvements in the documentation

Update 2

  • Improved numerical stability and error handling for EM GMM algorithm.
  • Performance improvements for multi-class classifiers, SVM, kernel functions, Apriori, and ALS algorithms.
  • Introduced support for Sorting algorithm in batch processing mode.
  • Introduced support for the CSR data layout format in the initialization phase of the K-Means algorithm.
  • Bug fixes and other improvements in the library and its documentation.

Update 1

  • Introduced support for Alternating Least Squares algorithm in batch and distributed processing modes.
  • Added support for compressed sparse row (CSR) sparse matrix storage format in Principal Component Analysis, Naïve Bayes and K-means algorithms.
  • Introduced new features in Data Management component:
    • Data loading from the Data Source into several numeric tables
    • Data loading with unknown number of feature vectors
    • Performance improvements in data serialization and deserialization
  • Bug fixes and other improvements in the library and its documentation. 

Initial Release

  • C++ and Java programming languages API.
  • Optimized performance for a range of Intel architectures, including Intel® Xeon®, Intel® Core™, and Intel® Atom™.
  • Data mining and analysis algorithms for
    • Computing correlation distance and cosine distance
    • PCA (Correlation, SVD)
    • Matrix decomposition (SVD, QR, Cholesky)
    • Computing statistical moments
    • Computing variance-covariance and correlation matrices
    • Computing quantiles
    • Univariate and multivariate outlier detection
    • Association rule mining
    • Linear and RBF kernel functions
  • Algorithms for supervised and unsupervised machine learning:
    • Linear regressions
    • Naïve Bayes classifier
    • AdaBoost, LogitBoost, and BrownBoost classifiers
    • SVM classifier
    • K-Means clustering
    • Expectation Maximization (EM) for Gaussian Mixture Models (GMM)
    • Support for validation metrics for classifiers, including Confusion Matrix, Accuracy, Precision, Recall, and F-score.
  • Support for batch, online, and distributed processing modes:
    • Algorithms supporting batch processing: All
    • Algorithms supporting online processing: Statistical moments, Variance-covariance matrix, Correlation matrix, SVD, QR, PCA, Linear regression, Naïve Bayes
    • Algorithms supporting distributed processing: Statistical moments, Variance-covariance matrix, Correlation matrix, SVD, QR, PCA, Linear regression, Naïve Bayes, K-Means
  • Support for local and distributed data sources:
    • In-file and in-memory CSV
    • MySQL
    • HDFS
    • Support for Resilient Distributed Dataset (RDD) objects for Apache Spark*.
  • Data compression and decompression:
    • ZLIB
    • LZO
    • RLE
    • BZIP2
  • Data serialization and deserialization.