# Machine Learning Using oneAPI

Learn how to accelerate machine learning workloads using packages like scikit-learn*, XGBoost, NumPy, SciPy, and pandas, all powered by oneAPI.

Realize performance gains with a few extra lines of code on the latest Intel® CPUs and GPUs.

Use stock versions or Intel® Extension for Scikit-learn*, which is part of AI Tools from Intel.

## Overview

AI Tools give data scientists, AI developers, and researchers familiar Python* tools and frameworks to accelerate end-to-end data science and analytics pipelines on Intel architecture. The components are built using oneAPI libraries for low-level compute optimizations. AI Tools maximize performance from preprocessing through machine learning and provide interoperability for efficient model development.

This learning path enables you to:

- Achieve drop-in acceleration for data preprocessing and machine learning workflows with compute-intensive Python packages, scikit-learn*, and XGBoost, optimized for Intel.
- Gain direct access to analytics and AI optimizations from Intel to ensure that your software works together seamlessly.

**Who is this for?**

Data scientists, data engineers, and software developers who want to learn how to accelerate machine learning workloads.

**What will I be able to do?**

- Adapt common scikit-learn algorithms to offload computation to accelerator devices like CPUs and GPUs.
- Apply and describe how to engage XGBoost, powered by oneAPI.
- Analyze Python code to find low-performing Python loops and list comprehensions. Replace these slow methods with faster vectorized equivalents that are more readable, more performant, and easier to adapt to new Intel innovations in libraries and hardware instruction sets.
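
As a minimal sketch of the last point, the example below replaces a list comprehension with an equivalent vectorized NumPy expression. The array `values` and the arithmetic are illustrative assumptions, not taken from the course notebooks.

```python
import numpy as np

# Hypothetical input data for illustration.
values = np.arange(1, 10_001, dtype=np.float64)

# Slow: a Python-level loop written as a list comprehension.
slow = [v * v + 1.0 for v in values]

# Fast: the same arithmetic expressed as NumPy ufuncs over the whole array.
fast = values * values + 1.0

# Both approaches produce the same numbers.
assert np.allclose(slow, fast)
```

The vectorized form pushes the loop into compiled code, which is where an optimized NumPy build can apply hardware-specific optimizations.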

## Start Optimizing Machine Learning with oneAPI

Get hands-on practice with code samples in a Jupyter* Notebook running live on the Intel® Developer Cloud.

**Intel® Developer Cloud**

To get started:

- Sign in to Intel Developer Cloud, select **One Click Log In** for JupyterLab, and then select **Launch Terminal**.
- Follow the instructions in the GitHub* README.
- Select **TeacherKit.ipynb**.
- Refresh your browser.

## Modules

### Machine Learning Using a Notebook

Use a Jupyter Notebook to modify and run code as part of the exercises.

To begin, open the file: **TeacherKit.ipynb**. This is a hyperlink-driven course for the following modules.

**Program Structure**

- Describe the dataset and algorithms used.
- Use follow-me instructions in select cells in hands-on lab exercises to learn the basics.
- Secure your knowledge with practicums without the follow-me instructions.

## Intel® Extension for Scikit-learn* for CPUs

- Describe the basics of AI Tools components and where the Intel Extension for Scikit-learn fits in the broader package.
- Describe where to download and how to install the tools.
- Describe the advantages of Intel Extension for Scikit-learn, a component of AI Tools that is invoked via the **sklearnex** library.
- Apply the patch and unpatch functions at varying granularities in Python scripts and within Jupyter cells, from whole-file application to more surgical patches applied to a single algorithm.
- List the optimized scikit-learn algorithms.
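
The patch and unpatch workflow described above can be sketched as follows. This is a hedged illustration, not the course's own code: it assumes the `patch_sklearn` and `unpatch_sklearn` functions from `sklearnex`, falls back to stock scikit-learn when the extension is not installed, and uses synthetic data.

```python
import numpy as np

try:
    # Intel Extension for Scikit-learn is optional in this sketch.
    from sklearnex import patch_sklearn, unpatch_sklearn
    HAVE_SKLEARNEX = True
except ImportError:
    HAVE_SKLEARNEX = False

if HAVE_SKLEARNEX:
    # Patch before importing the estimators that should be accelerated.
    # Passing a list of names, for example patch_sklearn(["SVC", "DBSCAN"]),
    # restricts the patch to those algorithms.
    patch_sklearn()

from sklearn.cluster import KMeans

# Synthetic data (an assumption for illustration): two well-separated blobs.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.5, size=(100, 2)),
    rng.normal(5.0, 0.5, size=(100, 2)),
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_

if HAVE_SKLEARNEX:
    # Restore the stock scikit-learn implementations.
    unpatch_sklearn()
```

The estimator code is identical with or without the patch, which is what makes the acceleration drop-in.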

## Applied Patching for a CPU

- Build a scikit-learn implementation of K-means that targets a CPU using patching.
- Apply patching:
  - With dynamic versus lexical scope approaches
  - To the support vector classifier (SVC) algorithm
- Describe and apply the correct surgical patching method to patch **pairwise_distance**.
- Recall that Intel Extension for Scikit-learn does not optimize the Euclidean metric, but does optimize the cosine and correlation metrics.
- Describe the application of **pairwise_distance** to the problem of finding all time-series charts that are similar to a chosen pattern.
- Solidify your knowledge:
  - Apply code changes to try different classifiers optimized with Intel Extension for Scikit-learn.
  - Replace the target decision tree with a classifier of your choice (x2).
  - Apply patching to Principal Components Analysis (PCA) and K-means.
  - Synthesize your learning by applying patching to the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm.
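
To make the time-series use case concrete, here is a small sketch that ranks candidate series by correlation distance to a chosen pattern. It uses stock scikit-learn's `pairwise_distances` function (plural, as named in scikit-learn); the series and the pattern are made-up examples, and no patching is applied here.

```python
import numpy as np
from sklearn.metrics import pairwise_distances

t = np.linspace(0.0, 2.0 * np.pi, 50)
pattern = np.sin(t)  # hypothetical pattern to search for

# Hypothetical candidate series, one per row.
series = np.vstack([
    2.0 * np.sin(t) + 1.0,             # same shape, scaled and shifted
    np.cos(t),                         # phase-shifted
    -np.sin(t),                        # mirrored
    np.sin(t) + 0.05 * np.cos(3 * t),  # same shape with a small ripple
])

# Correlation distance is 1 minus the Pearson correlation, so it ignores
# scale and offset and compares only the shape of each series.
d = pairwise_distances(series, pattern.reshape(1, -1), metric="correlation").ravel()

# Indices of the series, from most to least similar to the pattern.
ranked = np.argsort(d)
print(ranked, np.round(d, 4))
```

Because the metric is passed as a string, switching between `"correlation"` and `"cosine"` requires no other code change.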

## Image Clustering for CPUs

- Perform image clustering using PCA, K-means, and DBSCAN:
  - Examine the following Jupyter Notebook to identify opportunities to apply **patching_sklearn()** to algorithms applied to tabular data.
  - Explore and interpret the image dataset.
  - Apply Intel Extension for Scikit-learn patches to PCA, K-means, and DBSCAN algorithms.
  - Synthesize your understanding by searching for ways to patch or unpatch any applicable cell to maximize the performance of each cell.
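
The PCA, K-means, and DBSCAN sequence can be sketched with stock scikit-learn as shown below. The data is a synthetic stand-in for flattened 8x8 images (an assumption, since the course dataset is not reproduced here), and the `eps` and `min_samples` values are illustrative choices for this data only.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Synthetic stand-in for flattened images: 300 samples with 64 features each.
X, _ = make_blobs(n_samples=300, n_features=64, centers=3,
                  cluster_std=1.0, random_state=0)

# Reduce dimensionality before clustering.
X_2d = PCA(n_components=2, random_state=0).fit_transform(X)

# Centroid-based clustering with a fixed number of clusters.
km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_2d)

# Density-based clustering; the label -1 marks noise points.
db_labels = DBSCAN(eps=3.0, min_samples=5).fit_predict(X_2d)
n_db_clusters = len(set(db_labels.tolist()) - {-1})

print(np.bincount(km_labels), n_db_clusters)
```

All three estimators appear in the text as candidates for patching, so this is the kind of cell where a patch call placed before the imports would take effect.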


## Galaxy Classification for a CPU

- Apply multiple classification algorithms on the CPU to determine the most accurate model for classifying the stars that belong to each galaxy within a combined super galaxy.
- Apply an Intel Extension for Scikit-learn patch and SYCL context to compute on the CPU.
- Synthesize your comprehension by searching for opportunities in each cell to maximize performance.
- Investigate adding pairwise distance as a means of finding all the stars that are within three light-years of each other.
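
Comparing several classifiers to find the most accurate one might look like the sketch below. The two "galaxies" are synthetic Gaussian clouds of 3D star positions (an assumption for illustration), and the set of models is an arbitrary choice rather than the notebook's list.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Hypothetical star positions for two overlapping galaxies.
rng = np.random.default_rng(0)
galaxy_a = rng.normal(loc=0.0, scale=1.0, size=(300, 3))
galaxy_b = rng.normal(loc=4.0, scale=1.0, size=(300, 3))
X = np.vstack([galaxy_a, galaxy_b])
y = np.repeat([0, 1], 300)  # label = galaxy of origin

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "k_neighbors": KNeighborsClassifier(n_neighbors=5),
    "svc": SVC(kernel="rbf"),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Fit each model and record its accuracy on the held-out stars.
scores = {name: model.fit(X_train, y_train).score(X_test, y_test)
          for name, model in models.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

Keeping the models in a dictionary makes it easy to time each one with and without patching.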

## Introduction to Using a GPU with Intel Extension for Scikit-learn

This notebook is not intended to demonstrate performance. Instead, it demonstrates how to target current and future Intel GPUs with scikit-learn algorithms powered by oneAPI.

- Learn how to apply patching while targeting an Intel GPU.
- Apply Intel Extension for Scikit-learn to a Random Forest classifier on an Intel GPU.
- Describe how to apply the compute-follows-data model of Data Parallel Control (dpctl) together with patching.
- Use the compute follows data methodology using the dpctl library from Intel to target an Intel GPU.
- Apply dpctl and patching to a variety of scikit-learn algorithms in a simple test harness structure.
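
A hedged sketch of targeting a GPU is shown below. It assumes `patch_sklearn` and `config_context(target_offload=...)` from `sklearnex` and `dpctl.has_gpu_devices()` from dpctl, and it falls back to stock scikit-learn on the CPU when those packages or a GPU are unavailable. This is the explicit-offload route; the compute-follows-data route instead places the input arrays on the device and is not shown here.

```python
import contextlib

try:
    import dpctl
    from sklearnex import config_context, patch_sklearn
    patch_sklearn()  # patch before importing the estimators
    USE_GPU = dpctl.has_gpu_devices()
except ImportError:
    # Neither dpctl nor sklearnex is required to run this sketch.
    USE_GPU = False

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data (an assumption for illustration).
X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=5, random_state=0)

# Offload to the first GPU when one is available; otherwise do nothing special.
offload = config_context(target_offload="gpu:0") if USE_GPU else contextlib.nullcontext()

with offload:
    clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
    acc = clf.score(X, y)

print("GPU offload:", USE_GPU, "training accuracy:", acc)
```

Wrapping the fit in a context manager keeps the device choice in one place, so the same cell runs unchanged on a CPU-only node.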

## Image Clustering for a GPU

- Explore and interpret the image dataset.
- Apply Intel Extension for Scikit-learn patches to Principal Components Analysis (PCA), K-means, and DBSCAN algorithms.
- Solidify your understanding by searching for ways to patch or unpatch any applicable cells to maximize the performance of each cell.
- Apply a **q.sh** script to submit a job to another node that has a GPU on the Intel Developer Cloud.

## Galaxy Classification for a GPU

- Apply multiple classification algorithms with a GPU to classify stars belonging to each galaxy within a combined super galaxy to determine the most accurate model.
- Apply an Intel Extension for Scikit-learn patch and SYCL context to compute on an available GPU resource.
- Synthesize your comprehension by searching for opportunities in each cell to maximize performance.
- Investigate adding pairwise distance as a means of finding all the stars that are within three light-years of each other.
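
The pairwise-distance idea in the last item can be sketched as a radius search. The star positions below are random placeholders in light-years (an assumption), and the example runs on stock scikit-learn without targeting any particular device.

```python
import numpy as np
from sklearn.metrics import pairwise_distances

# Hypothetical 3D star positions, in light-years.
rng = np.random.default_rng(0)
stars = rng.uniform(0.0, 20.0, size=(200, 3))

# Full matrix of star-to-star Euclidean distances.
dist = pairwise_distances(stars, metric="euclidean")

# Mark pairs closer than three light-years, ignoring each star's self-distance.
close = dist < 3.0
np.fill_diagonal(close, False)

# Keep each unordered pair once by reading only the upper triangle.
i, j = np.nonzero(np.triu(close))
pairs = list(zip(i.tolist(), j.tolist()))
print(len(pairs), "pairs within three light-years")
```

The distance matrix grows quadratically with the number of stars, so for large catalogs a neighbor-search structure may be a better fit than a dense matrix.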

## Introduction to NumPy Powered by oneAPI

- Describe why inefficient code (such as time-consuming loops) wastes resources and time, and why it is worth replacing.
- Describe why using Python for highly repetitive small tasks is inefficient.
- Describe the additive value of using packages such as NumPy, which are powered by oneAPI in a cloud.
- Describe the importance of keeping oneAPI and third-party packages such as NumPy or SciPy up to date.
- Describe and apply NumPy universal functions (ufuncs), aggregations, and broadcasting.
- Apply NumPy **Where** or **Select** clauses to conditional loops in a fast vectorized way.
- Describe several domain areas spanned by SciPy.
- Apply the SciPy implementation of the Floyd-Warshall algorithm to accelerate an all-pairs shortest-path task.
- Apply and compare various methods of accelerating matrix multiplication, including NumPy broadcasting, NumPy dot, NumPy matrix multiplication, and SciPy linear algebra.
- Apply and compare various methods of accelerating pairwise distances.
- Describe which subset of pairwise-distance metrics Intel Extension for Scikit-learn applies to.
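
Several of the techniques listed above fit in one short sketch: conditional logic with `np.where` and `np.select`, pairwise distances via broadcasting, matrix multiplication with `@` and `np.dot`, and all-pairs shortest paths with `scipy.sparse.csgraph.floyd_warshall`. All inputs are small made-up arrays chosen so the results are easy to check by hand.

```python
import numpy as np
from scipy.sparse.csgraph import floyd_warshall

# Conditional logic without a Python loop.
x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
sign = np.where(x < 0, -1.0, 1.0)                  # two-way choice
bucket = np.select([x < -1, x > 1], [0, 2], default=1)  # multi-way choice

# Pairwise Euclidean distances via broadcasting:
# (n, 1, d) - (1, n, d) gives an (n, n, d) array of differences.
pts = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
diff = pts[:, None, :] - pts[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))

# Two equivalent spellings of matrix multiplication.
a = np.arange(6.0).reshape(2, 3)
b = np.arange(12.0).reshape(3, 4)
assert np.allclose(a @ b, np.dot(a, b))

# All-pairs shortest paths on a small undirected graph.
# In a dense input, zero entries mean "no edge".
graph = np.array([
    [0.0, 1.0, 4.0],
    [1.0, 0.0, 2.0],
    [4.0, 2.0, 0.0],
])
shortest = floyd_warshall(graph, directed=False)

print(sign, bucket, dist, shortest, sep="\n")
```

In the graph example, the direct edge from node 0 to node 2 costs 4, but the route through node 1 costs 3, which is what the shortest-path matrix reports.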