Intel® Extension for PyTorch* Cheat Sheet

Get started with Intel® Extension for PyTorch* using the following commands.

This extension provides the most up-to-date features and optimizations on Intel hardware, most of which will eventually be upstreamed to stock PyTorch releases.

For additional installation methods, see the Installation Guide.

Note: This extension has version requirements for PyTorch.

For more information, see Intel® Extension for PyTorch*.

"

Basic CPU Installation Using PyPI*

python -m pip install intel_extension_for_pytorch

Basic CPU Installation Using Anaconda*

conda install -c intel intel-extension-for-pytorch

Basic GPU Installation Using PyPI

python -m pip install torch==1.13.0a0 -f https://developer.intel.com/ipex-whl-stable-xpu

python -m pip install intel_extension_for_pytorch==1.13.10+xpu -f https://developer.intel.com/ipex-whl-stable-xpu

Import Intel® Extension for PyTorch*

import intel_extension_for_pytorch as ipex

Set the Backend to Use the GPU (Default: CPU)

model = model.to('xpu')

data = data.to('xpu')
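
Putting it together, a minimal inference sketch on the GPU, assuming model and data are already defined and an XPU device is available:

import torch
import intel_extension_for_pytorch as ipex

# Move the model and input to the GPU, then apply the extension's optimizations
model = model.to('xpu')
data = data.to('xpu')
model.eval()
optimized_model = ipex.optimize(model)

# Run the forward pass on the GPU
with torch.no_grad():
    output = optimized_model(data)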

Capture a Verbose Log (Command Prompt)

export ONEDNN_VERBOSE=1

Capture a Verbose Log on Demand (in the Code)

import torch.backends.mkl as torch_mkl

import torch.backends.mkldnn as torch_mkldnn

 

with torch_mkl.verbose(torch_mkl.VERBOSE_ON), torch_mkldnn.verbose(torch_mkldnn.VERBOSE_ON):

    model(data)

Optimization During Training

model = …

optimizer = ...

model.train()

optimized_model, optimized_optimizer = ipex.optimize(model, optimizer=optimizer)
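
A minimal training-loop sketch using the returned objects; the loss function and data loader here are assumed placeholders:

import torch

criterion = torch.nn.CrossEntropyLoss()  # assumed loss function

for inputs, labels in train_loader:  # train_loader is an assumed DataLoader
    optimized_optimizer.zero_grad()
    outputs = optimized_model(inputs)
    loss = criterion(outputs, labels)
    loss.backward()
    optimized_optimizer.step()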

Optimization During Inference (Performed After Loading Weights)

model = ...

model.load_state_dict(torch.load(PATH))

model.eval()

optimized_model = ipex.optimize(model)
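
A short usage sketch for the optimized model; data is an assumed sample input:

import torch

with torch.no_grad():
    output = optimized_model(data)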

Optimization Using the Low-Precision Data Type bfloat16 During Training (Default FP32)

optimized_model, optimized_optimizer = ipex.optimize(model, optimizer=optimizer, dtype=torch.bfloat16)

 

with torch.no_grad():

    with torch.cpu.amp.autocast():

        optimized_model(data)
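
For the training loop itself, a minimal sketch of one bfloat16 step; loss_fn, data, and target are assumed placeholders:

import torch

# Keep the forward pass and loss inside autocast; run backward and the
# optimizer step outside it.
optimized_optimizer.zero_grad()
with torch.cpu.amp.autocast():
    output = optimized_model(data)
    loss = loss_fn(output, target)
loss.backward()
optimized_optimizer.step()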

Optimization Using the Low-Precision Data Type bfloat16 During Inference (Default FP32)

optimized_model = ipex.optimize(model, dtype=torch.bfloat16)

 

with torch.cpu.amp.autocast():

    optimized_model(data)

Run a Launch Script from a Command Prompt: Automate Configuration Settings for Performance Tuning

ipexrun [knobs] <your_pytorch_script> [args]

Run with Non-Uniform Memory Access (NUMA) Binding from a Command Prompt

numactl --cpunodebind N --membind N python <script>

Set the Number of Threads Using GNU* OpenMP*

export OMP_NUM_THREADS=<num threads>

Bind Threads to Specific CPUs Using GNU OpenMP

export GOMP_CPU_AFFINITY=<space- or comma-separated list of CPUs>

Specify Whether Threads May Move Between Processors Using GNU OpenMP

export OMP_PROC_BIND=<value>

Determine Thread Scheduling Using GNU OpenMP

export OMP_SCHEDULE=<value>

Switch to the Intel® OpenMP* Runtime (libiomp)

export LD_PRELOAD=<path>/libiomp5.so:$LD_PRELOAD

Bind Threads to Physical Processing Units Using Intel OpenMP

export KMP_AFFINITY=granularity=fine,compact,1,0

Use Intel OpenMP to Set the Wait Time (ms) After Completing a Parallel Region Before a Thread Sleeps

export KMP_BLOCKTIME=<time>

A value of 0 or 1 is recommended for convolutional neural network (CNN) based models.

Tune an Intel® oneAPI Deep Neural Network Library (oneDNN) Primitive Cache (Note the Increased Memory Use: Adjust as Needed)

export ONEDNN_PRIMITIVE_CACHE_CAPACITY=<tuning size>

Note: <tuning size> has an upper limit of 65536 cached primitives.

Flush Denormal Numbers (Extremely Small Numbers) to Zero to Boost Performance

torch.set_flush_denormal(True)

 

Import Quantization Functions

from intel_extension_for_pytorch.quantization import prepare, convert

Post-Training int8 Quantization (Static): Reduce Model Size and Memory Bandwidth While Speeding Up Inference Time by Quantizing Weights and Activations

model = …

model.eval()

data = …

qconfig = ipex.quantization.default_static_qconfig

prepared_model = prepare(model, qconfig, example_inputs=data, inplace=False)

for d in calibration_data_loader():

  prepared_model(d)

converted_model = convert(prepared_model)
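
For deployment, the converted model is commonly traced and frozen with TorchScript; a sketch assuming data is a representative sample input:

import torch

with torch.no_grad():
    traced_model = torch.jit.trace(converted_model, data)
    traced_model = torch.jit.freeze(traced_model)
    output = traced_model(data)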

Post-Training int8 Quantization (Dynamic): Reduce Model Size and Memory Bandwidth While Speeding Up Inference Time with On-the-Fly Quantization of Activations, Without Calibration

model = …

model.eval()

data = …

dynamic_qconfig = ipex.quantization.default_dynamic_qconfig

prepared_model = prepare(model, dynamic_qconfig, example_inputs=data)

converted_model = convert(prepared_model)

"

For more information and support, or to report any issues, see:

PyTorch Issues on GitHub*

Intel® AI Analytics Toolkit Forum

 

Sign up and try this extension for free using Intel® Developer Cloud for oneAPI.

"