Benchmarking Intel® Extension for Scikit-learn*: How Much Faster Is It?

Scikit-learn is one of the most powerful Python libraries for machine learning (ML). It provides many ML tools, including mathematical, statistical, and general-purpose algorithms. While Scikit-learn is fast, there is still room to optimize it and reduce execution time. Furthermore, stock Scikit-learn doesn't natively support running on GPUs.

To address this, Intel developed the Intel® Extension for Scikit-learn*. It can speed up your programs by 10 to 100 times, depending on the algorithm and hardware. The acceleration comes from replacing Scikit-learn's stock algorithms with versions that use vector instructions such as Intel® Advanced Vector Extensions 512 (Intel® AVX-512), threading, and hardware-specific memory optimizations.

This extension provides better performance without relying on a different library, so you don’t need to change your code. Another benefit of using this extension is that it includes support for Intel’s oneAPI concepts, which means that your code can easily run on different devices like CPUs and GPUs.

To take advantage of these optimizations, just run the following command:

python -m sklearnex my_application.py

This will enable you to accelerate applications you’ve already made without editing the code or modifying your implementation.

This article will explore and compare the performance of the Intel Extension for Scikit-learn and benchmark it against the stock Scikit-learn library.

Benchmarking Intel Extension for Scikit-learn

The most direct way to assess an implementation's performance is to benchmark it, which we can do by timing the algorithm on a predefined data set. We will use Intel's Machine Learning Benchmarks library for benchmarking in this article.

Our test will time K-Means fitting. K-Means is one of the simplest unsupervised ML clustering algorithms. It iteratively partitions data into subgroups called clusters.
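To make the "iteratively partitions" idea concrete, here is a minimal pure-Python sketch of the classic assign-and-update loop (Lloyd's algorithm). It is only an illustration, not how Scikit-learn or the Intel extension implements K-Means. The function name `kmeans`, the six sample points, and the hand-picked starting centroids are all assumptions made for this example.

```python
def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm on 2-D tuples; returns (centroids, labels)."""
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance is enough for comparison).
        labels = [
            min(range(len(centroids)),
                key=lambda k: (p[0] - centroids[k][0]) ** 2
                              + (p[1] - centroids[k][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append((sum(m[0] for m in members) / len(members),
                                      sum(m[1] for m in members) / len(members)))
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[k])
        if new_centroids == centroids:
            break  # Converged: the centroids stopped moving.
        centroids = new_centroids
    return centroids, labels


# Illustrative data: two visually separate groups of three points each.
points = [(1, 3), (1, 4), (1, 0), (12, 2), (15, 4), (10, 3)]
# Hand-picked starting centroids (an assumption; real libraries use
# smarter initialization such as k-means++).
centroids, labels = kmeans(points, [(1, 3), (12, 2)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The loop stops as soon as an update leaves the centroids unchanged, which on this tiny data set happens on the second pass.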

We’ll repeat the fitting 100 times and then calculate the average time per fitting, first executing our test without using the Intel Extension for Scikit-learn. Then, we’ll perform the patching and compare the results.
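The repeat-and-average methodology can be sketched with the standard library alone. The helper below, `average_runtime`, is a hypothetical name introduced for illustration. It is not part of Intel's benchmark library and may differ from how that library measures time.

```python
import time


def average_runtime(func, *args, repeats=100, **kwargs):
    """Call func(*args, **kwargs) `repeats` times; return mean wall-clock seconds."""
    total = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args, **kwargs)
        total += time.perf_counter() - start
    # Divide once, after every run has completed.
    return total / repeats


if __name__ == "__main__":
    # Stand-in workload: sorting a reversed list (any callable works here).
    data = list(range(10_000, 0, -1))
    mean_seconds = average_runtime(sorted, data, repeats=100)
    print(f"Average time = {mean_seconds * 1000:.3f} ms")
```

Averaging over many runs smooths out one-off effects such as cold caches or background activity, which matters when a single fit takes only milliseconds.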

The environment setup is simple. First, install Scikit-learn:

pip install -U scikit-learn

Next, install Intel Extension for Scikit-learn using the following command:

pip install scikit-learn-intelex

After cloning the benchmark library, we can create a new Python file called my_app.py. Then, within this Python file, we need to import a few libraries:

import bench
import argparse
import numpy as np
from sklearn.cluster import KMeans
from tqdm import tqdm

The tqdm library displays a progress bar on the command-line interface. If you don’t already have this library, use the following command to install it:

pip install tqdm

Next, we need to define a Python function called Kmeans_test. This test function will be used to run the K-Means fitting 100 times. It then calculates the average function timings using the measure_function_time() function from Intel’s Machine Learning Benchmarks library:

def Kmeans_test(X):
  X_init = 'k-means++'
  parser = argparse.ArgumentParser(description='SciKit-Learn K-means benchmark')
  params = bench.parse_args(parser)
  total_time = 0
  runs = 100
  for _ in tqdm(range(runs)):
    # Time a single fit; the fitted model itself is not needed here.
    fit_time, kmeans = bench.measure_function_time(fit_kmeans, X,
      X_init, params=params)
    total_time += fit_time
  # Compute the average only after all runs have finished.
  return total_time / runs

The measure_function_time() function takes several arguments: the function to measure, the arguments for the measured function, and parameters for time measurements. As we want to measure the fit function performance, we will create a function called fit_kmeans(). This function is responsible for creating the K-Means instance and fitting the model to the given data:

def fit_kmeans(X, X_init):
  # Pass the initialization method through so the argument is actually used.
  alg = KMeans(n_clusters=2, init=X_init, random_state=0)
  alg.fit(X)
  return alg

Our first simple test uses six two-dimensional points as sample data. We will use the KMeans algorithm to split them into two clusters and measure the time needed to fit the model:

if __name__ == "__main__":
  X = np.array([[1,  3], [1,  4], [1,  0],
    [12, 2], [15, 4], [10, 3]])
  average_time = Kmeans_test(X)
  print(f"Average time ={average_time}")

First, run the code without the Intel extension. The output will resemble this:

Benchmarking without Intel Extension for Scikit-learn

This result means that the average execution time of this code is around 34 ms.

Now, use the following command to run the extension:

python -m sklearnex my_app.py

Note that we didn’t change anything within the code. We just used a different command to run it.

The program output with Intel’s extension is:

Benchmarking with Intel Extension for Scikit-learn

This shows that the average time to execute this code with the Intel Extension for Scikit-learn is around 1.3 ms, which is about 26 times faster than the original execution speed.

Now, we'll increase the data set size and observe how the times compare.

To do this, we'll modify our code to load the diabetes data set built into Scikit-learn, which contains 442 samples with 10 features each.

First, load the data using the following code:

from sklearn import datasets
diabetes = datasets.load_diabetes()

Then, pass the data to the Kmeans_test function:

if __name__ == "__main__":
  diabetes = datasets.load_diabetes()
  X = diabetes.data
  average_time = Kmeans_test(X)
  print(f"Average time ={average_time}")

When run without the Intel extension, the output of the code appears as follows:

Benchmarking without Intel Extension for Scikit-learn

This result means that the average execution time of the code without the Intel Extension for Scikit-learn is around 63 ms.

Executing the code with the extension provides the following result:

Benchmarking with Intel Extension for Scikit-learn

The average execution time of this code is now around 3.2 ms.

These results indicate a speed increase of nearly 20 times (63 ms versus 3.2 ms) when using the larger data set. You may see a different amount of speedup depending on your hardware platform and software versions. You can also try some of the other models in the Machine Learning Benchmarks library on your own using this methodology.
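The speedup figures follow directly from the measured averages. As a quick check, the snippet below divides the baseline time by the accelerated time for both runs. The function name `speedup` is just an illustrative helper, and the inputs are the rounded timings quoted in this article.

```python
def speedup(baseline_ms, optimized_ms):
    """Return how many times faster the optimized run is than the baseline."""
    return baseline_ms / optimized_ms


# Rounded averages reported above, in milliseconds.
small = speedup(34, 1.3)   # six-point array
large = speedup(63, 3.2)   # diabetes data set

print(f"Six-point array:   {small:.1f}x")   # 26.2x
print(f"Diabetes data set: {large:.1f}x")   # 19.7x
```

Because the inputs are themselves rounded, treat these ratios as approximate.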

These results show that Intel Extension for Scikit-learn can enhance the performance of programs written using the Scikit-learn library without modifying your code. If you want the extension enabled every time your code runs, without adding the -m sklearnex option to the Python command, add these two lines at the top of your script, before any import from sklearn:

from sklearnex import patch_sklearn
patch_sklearn()
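If the same script must also run on machines where the extension is not installed, one possible approach is to patch only when the package can be found. The sketch below is an assumption-laden example rather than an official recipe: `enable_sklearnex_if_available` is a hypothetical helper name, while `patch_sklearn` is the function shown above.

```python
import importlib.util


def enable_sklearnex_if_available():
    """Patch Scikit-learn when scikit-learn-intelex is installed.

    Returns True if patching was applied, False if the package is missing.
    """
    if importlib.util.find_spec("sklearnex") is None:
        return False  # Fall back to stock Scikit-learn.
    from sklearnex import patch_sklearn
    patch_sklearn()
    return True


# Call this before importing anything from sklearn so that the
# estimators imported afterwards are the patched versions.
patched = enable_sklearnex_if_available()
print("Intel extension enabled" if patched else "Using stock Scikit-learn")

# from sklearn.cluster import KMeans  # import estimators only after this point
```

This keeps a single code path for both environments, at the cost of silently running the slower stock algorithms when the extension is absent.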

Conclusion

In this article, we showed how Intel Extension for Scikit-learn improves the performance of applications that use the Scikit-learn Python library. The extension achieves this by patching the stock algorithms of the Scikit-learn library with optimized oneAPI versions. It also adds GPU support to your applications, and it enables you to speed up your application without rewriting any code.

We encourage you to try the extension for yourself to experience how it can easily optimize your applications without requiring refactoring. You can find the list of supported algorithms and parameters and submit an issue on GitHub if patching does not accommodate your scenarios.
