Enhance Deep Learning Workloads on the Latest Intel® Xeon® Processors
Overview
The 4th gen Intel® Xeon® Scalable processors (formerly code named Sapphire Rapids) offer several built-in features for boosting performance and efficiency of deep learning applications.
This session focuses on one of them—Intel® Advanced Matrix Extensions (Intel® AMX)—and how to take advantage of its AI acceleration power to boost model training and inference using Intel optimizations for PyTorch* and TensorFlow*.
Topics covered include:
- An overview of the Intel optimizations, including performance and features on the latest Intel CPUs and how they compare to stock PyTorch and TensorFlow.
- How the optimizations reduce memory footprint and improve performance by automatically mixing precision using the bfloat16 or float16 data types (the first sketch after this list shows the PyTorch flow).
- Using Intel® oneAPI Deep Neural Network Library (oneDNN) with the Intel optimizations for PyTorch and TensorFlow to take advantage of other built-in acceleration features of 4th gen Intel Xeon processors, such as Intel® Advanced Vector Extensions 512 (Intel® AVX-512) and Vector Neural Network Instructions (VNNI); the second sketch below shows how to verify which instruction set oneDNN dispatches to.
- Reducing model inference time with quantization features in Intel® Extension for PyTorch* (the third sketch below walks through static int8 quantization).
- How speedups can be gained over stock PyTorch and TensorFlow on new Amazon Web Services* instances built on 4th gen Intel Xeon Scalable processors.
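As a concrete illustration of the mixed-precision point above, here is a minimal sketch of bfloat16 inference with Intel® Extension for PyTorch* (IPEX). It assumes IPEX and torchvision are installed alongside PyTorch; the ResNet-50 model and random input are placeholders, not part of the session material.

```python
# Minimal sketch: bfloat16 automatic mixed precision on CPU with
# Intel Extension for PyTorch (IPEX). Model and input are placeholders.
import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex

model = models.resnet50(weights=None).eval()
data = torch.rand(1, 3, 224, 224)

# ipex.optimize fuses operators and, with dtype=torch.bfloat16, prepares
# weights so eligible ops can run on Intel AMX tiles.
model = ipex.optimize(model, dtype=torch.bfloat16)

# CPU autocast runs eligible ops in bfloat16 and keeps
# precision-sensitive ops in float32.
with torch.no_grad(), torch.cpu.amp.autocast(dtype=torch.bfloat16):
    output = model(data)

print(output.shape)  # torch.Size([1, 1000])
```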
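To check that the oneDNN-backed kernels mentioned above actually dispatch to Intel AMX, AVX-512, or VNNI instructions, oneDNN's verbose logging and ISA cap can be used. This sketch assumes the environment variables are set before the framework loads; the exact log text varies by oneDNN version.

```python
# Minimal sketch: using oneDNN environment variables to log which
# instruction set its kernels dispatch to. Set them before importing
# the framework, since oneDNN reads them at initialization.
import os
os.environ["ONEDNN_VERBOSE"] = "1"                    # log each primitive execution
os.environ["ONEDNN_MAX_CPU_ISA"] = "AVX512_CORE_AMX"  # cap the ISA (e.g. AVX512_CORE_VNNI to exclude AMX)

import torch  # imported after the variables are set

# Running any convolution now prints lines naming the chosen
# implementation and ISA, e.g. "...convolution,...avx512_core...".
conv = torch.nn.Conv2d(3, 64, kernel_size=3)
with torch.no_grad():
    _ = conv(torch.rand(1, 3, 224, 224))
```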
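For the quantization topic, a minimal sketch of post-training static int8 quantization with Intel® Extension for PyTorch* follows; int8 convolutions and matmuls can then use VNNI or Intel AMX instructions. The model, calibration loop, and qconfig are illustrative placeholders matching recent IPEX releases; check the API of the version you install.

```python
# Minimal sketch: post-training static int8 quantization with Intel
# Extension for PyTorch. Model and calibration batches are placeholders.
import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex
from intel_extension_for_pytorch.quantization import prepare, convert

model = models.resnet50(weights=None).eval()
example_input = torch.rand(1, 3, 224, 224)

# Insert observers that record activation ranges during calibration.
qconfig = ipex.quantization.default_static_qconfig
prepared = prepare(model, qconfig, example_inputs=example_input, inplace=False)

# Calibrate on representative data (random tensors stand in here).
with torch.no_grad():
    for _ in range(10):
        prepared(torch.rand(1, 3, 224, 224))

# Convert to int8, then trace and freeze for deployment.
quantized = convert(prepared)
with torch.no_grad():
    traced = torch.jit.trace(quantized, example_input)
    traced = torch.jit.freeze(traced)
    output = traced(example_input)
```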
Skill level: Novice
Featured Software
- The Intel optimizations are available as part of the Intel® AI Analytics Toolkit, which accelerates data science and AI pipelines, from preprocessing through machine learning, and provides interoperability for efficient model development. Stand-alone versions are also available: PyTorch Optimization | TensorFlow Optimization.
- Get oneDNN, which improves deep learning (DL) application and framework performance on CPUs and GPUs with highly optimized implementations of DL building blocks, as a stand-alone download or as part of the Intel® oneAPI Base Toolkit.
Code Samples
Download a variety of samples on GitHub*, including:
- Get Started with Intel® Extension for PyTorch*
- Optimize PyTorch Models Using Quantization
- PyTorch Training Optimizations with bfloat16 for Intel AMX