Part 4: Accelerate Scikit-learn*

Intel® Distribution for Python* provides scikit-learn* for machine learning workflows. Learn how to accelerate key algorithms in this library.

Intel® Distribution for Python*: Forum 

Downloads: Docker Hub* | Anaconda*

Intel® Data Analytics Acceleration Library: Home | Forum | Overview

Intel® Math Kernel Library: Home | Forum

Intel® VTune™ Amplifier: Home | Forum

YouTube* Playlist

Hi, I'm Frank Schlimbach. I'm going to talk about how Intel makes your scikit-learn* faster with the Intel® Math Kernel Library and Intel® Data Analytics Acceleration Library. Stay here to learn more. Also, follow the links below for more information.

[MUSIC PLAYING]

With Intel® Distribution for Python*, we provide performance-optimized Python packages. In our latest release, scikit-learn* got another performance boost from our highly optimized compute engine, [Intel® Data Analytics Acceleration Library] Intel® DAAL. Previous versions of Intel scikit-learn [sic] already showed decent speedups over standard versions, such as packages delivered by [INAUDIBLE] Pythons. Scikit-learn uses NumPy and SciPy for its compute kernels, and by accelerating NumPy we were able to achieve significant performance gains in scikit-learn without even touching its code.

Our version of NumPy uses [Intel® Math Kernel Library] Intel® MKL internally, so it gets best-in-class performance. The speedups achievable with accelerated NumPy range from a few percent up to a factor of eight. In our latest release, we further optimized selected kernels from scikit-learn by using Intel DAAL, which is also a specialized performance library.
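To make the speedup figures above concrete, here is a minimal sketch of how such a factor could be measured. It is not Intel's benchmark. The two workloads (`baseline_sum` and `optimized_sum`) are hypothetical stand-ins for a stock kernel and an accelerated kernel, and the helper names are assumptions made for illustration.

```python
import timeit
from typing import Callable


def best_time(func: Callable[[], object], repeat: int = 5, number: int = 10) -> float:
    """Return the best per-call wall time in seconds over several repeats."""
    runs = timeit.repeat(func, repeat=repeat, number=number)
    return min(runs) / number


def speedup_factor(baseline_seconds: float, optimized_seconds: float) -> float:
    """Speedup of the optimized run relative to the baseline (1.0 means no gain)."""
    if baseline_seconds <= 0 or optimized_seconds <= 0:
        raise ValueError("timings must be positive")
    return baseline_seconds / optimized_seconds


# Hypothetical stand-in for a stock (unaccelerated) kernel.
def baseline_sum() -> int:
    total = 0
    for value in range(100_000):
        total += value
    return total


# Hypothetical stand-in for an accelerated kernel with the same result.
def optimized_sum() -> int:
    return sum(range(100_000))


if __name__ == "__main__":
    factor = speedup_factor(best_time(baseline_sum), best_time(optimized_sum))
    print(f"measured speedup: {factor:.2f}x")
```

With this definition, "a few percent" corresponds to a factor of roughly 1.05, and "a factor of eight" to 8.0. The same harness could time a scikit-learn call in two environments, provided both runs use identical data and parameters.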

Intel DAAL provides highly optimized building blocks needed to build your analytics pipeline and machine learning algorithms. It not only covers the core functionality like analysis, decision-making, and modeling, but also IO and data manipulation. The algorithms we currently support show extreme speedups over the previous version. The performance is now close to native DAAL [sic] performance, which can be considered best in class.

Scikit-learn is a mature Python package with hundreds of algorithms, each with its own configuration parameters. DAAL [sic] has a different set of algorithms, and sometimes its implementations use slightly different variants of an algorithm. To make sure the optimized DAAL code gives valid results, we use only mathematically equivalent implementations from DAAL [sic]. Configurations without an equivalent in DAAL [sic] will fall back to scikit-learn's own implementation. Additionally, we allow easy, on-the-fly enabling and disabling of these DAAL [sic] optimizations. This is done by simply calling enable or disable, and can be applied to each algorithm individually.
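The enable/disable and fallback behavior described above can be sketched in plain Python. This illustrates the dispatch idea only and is not the actual Intel API. The `Dispatcher` class, its method names, and the `kmeans` configuration check are all hypothetical, and the two implementations are stand-ins that simply report which path ran.

```python
from typing import Callable, Dict, Set


class Dispatcher:
    """Hypothetical per-algorithm switch between optimized and stock code."""

    def __init__(self) -> None:
        self._stock: Dict[str, Callable[..., str]] = {}
        self._optimized: Dict[str, Callable[..., str]] = {}
        self._supports: Dict[str, Callable[[dict], bool]] = {}
        self._enabled: Set[str] = set()

    def register(
        self,
        name: str,
        stock: Callable[..., str],
        optimized: Callable[..., str],
        supports: Callable[[dict], bool],
    ) -> None:
        """Register both implementations; the optimization starts enabled."""
        self._stock[name] = stock
        self._optimized[name] = optimized
        self._supports[name] = supports
        self._enabled.add(name)

    def enable(self, name: str) -> None:
        self._enabled.add(name)

    def disable(self, name: str) -> None:
        self._enabled.discard(name)

    def run(self, name: str, **params: object) -> str:
        """Use the optimized path only if enabled and the configuration is covered."""
        if name in self._enabled and self._supports[name](params):
            return self._optimized[name](**params)
        return self._stock[name](**params)  # fall back to the stock implementation


def build_demo_dispatcher() -> Dispatcher:
    """Build a dispatcher with one made-up algorithm entry for demonstration."""
    dispatcher = Dispatcher()
    dispatcher.register(
        "kmeans",
        stock=lambda **params: "stock",
        optimized=lambda **params: "optimized",
        # Assumed rule: only these initializations have an equivalent optimized variant.
        supports=lambda params: params.get("init") in {"k-means++", "random"},
    )
    return dispatcher


if __name__ == "__main__":
    d = build_demo_dispatcher()
    print(d.run("kmeans", init="k-means++"))     # supported configuration
    print(d.run("kmeans", init="custom-array"))  # unsupported, falls back
    d.disable("kmeans")
    print(d.run("kmeans", init="k-means++"))     # optimization switched off
```

Keeping the support check next to each registered algorithm means an unsupported configuration silently takes the stock path, so results stay valid whether or not the optimization is switched on.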

Last, but not least, I'd like to mention that DAAL [sic] also comes with its own Python API, which lets you utilize its full power directly. It interoperates with other Python packages through NumPy arrays, so you can easily combine it with anything that also works with NumPy arrays. Of course, scikit-learn is one of these.

Thanks for watching. To learn more, or access anything discussed in this video, follow the links below.