Get started with Intel® Extension for PyTorch* using the following commands.
This extension provides the most up-to-date features and optimizations on Intel hardware, most of which will eventually be upstreamed to stock PyTorch releases.
For additional installation methods, see the Installation Guide.
Note: This extension has version requirements for PyTorch.
For more information, see Intel® Extension for PyTorch*.
"
Basic CPU Installation Using PyPI*
    python -m pip install intel_extension_for_pytorch

Basic CPU Installation Using Anaconda*
    conda install -c intel intel-extension-for-pytorch

Basic GPU Installation Using PyPI
    python -m pip install torch==1.13.0a0 -f https://developer.intel.com/ipex-whl-stable-xpu
    python -m pip install intel_extension_for_pytorch==1.13.10+xpu -f https://developer.intel.com/ipex-whl-stable-xpu

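To confirm a GPU (XPU) install is working, import both packages and query device availability. A minimal sketch, assuming the xpu build of the extension and a supported Intel GPU with drivers installed (the GPU build registers the torch.xpu namespace):

    import torch
    import intel_extension_for_pytorch as ipex

    print(torch.__version__)
    print(ipex.__version__)
    # Assumes the xpu build; torch.xpu is added when the GPU build is imported.
    print(torch.xpu.is_available())
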
Import Intel Extension for PyTorch
    import intel_extension_for_pytorch as ipex

Set Backend to Use GPU (Default: CPU)
    model = model.to('xpu')
    data = data.to('xpu')

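Putting these pieces together for a single forward pass on the GPU: a minimal sketch, assuming the xpu build and a recent torchvision; the model and input shape are illustrative:

    import torch
    import torchvision.models as models
    import intel_extension_for_pytorch as ipex

    model = models.resnet50(weights=None)   # illustrative model
    model.eval()
    data = torch.rand(1, 3, 224, 224)

    # Move model and data to the GPU, then apply the extension's optimizations.
    model = model.to('xpu')
    data = data.to('xpu')
    model = ipex.optimize(model)

    with torch.no_grad():
        output = model(data)
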
Capture a Verbose Log (Command Prompt)
    export ONEDNN_VERBOSE=1

Capture a Verbose Log on Demand (in the Code)
    import torch.backends.mkl as torch_mkl
    import torch.backends.mkldnn as torch_mkldnn

    with torch_mkl.verbose(torch_mkl.VERBOSE_ON), torch_mkldnn.verbose(torch_mkldnn.VERBOSE_ON):
        model(data)

Optimization During Training
    model = ...
    optimizer = ...
    model.train()
    optimized_model, optimized_optimizer = ipex.optimize(model, optimizer=optimizer)

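The returned pair is then used in the training loop exactly like the originals. A minimal FP32 sketch; the toy model, loss, and random batches are illustrative:

    import torch
    import intel_extension_for_pytorch as ipex

    model = torch.nn.Linear(64, 8)            # illustrative toy model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    criterion = torch.nn.MSELoss()
    model.train()

    model, optimizer = ipex.optimize(model, optimizer=optimizer)

    for _ in range(10):                       # illustrative random batches
        data, target = torch.rand(32, 64), torch.rand(32, 8)
        optimizer.zero_grad()
        loss = criterion(model(data), target)
        loss.backward()
        optimizer.step()
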
Optimization During Inference (Performed After Loading Weights)
    model = ...
    model.load_state_dict(torch.load(PATH))
    model.eval()
    optimized_model = ipex.optimize(model)

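On top of ipex.optimize, lowering the model to TorchScript with trace and freeze is a commonly recommended extra step for inference throughput. A minimal sketch; the torchvision model and input shape are illustrative:

    import torch
    import torchvision.models as models
    import intel_extension_for_pytorch as ipex

    model = models.resnet50(weights=None)     # illustrative model
    model.eval()
    data = torch.rand(1, 3, 224, 224)

    model = ipex.optimize(model)

    with torch.no_grad():
        traced = torch.jit.trace(model, data)
        traced = torch.jit.freeze(traced)     # folds constants, drops training-only code paths
        output = traced(data)
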
Optimization Using the Low-Precision Data Type bfloat16 During Training (Default: FP32)
    optimized_model, optimized_optimizer = ipex.optimize(model, optimizer=optimizer, dtype=torch.bfloat16)

    # Run the forward pass under autocast; torch.no_grad() is omitted because
    # gradients are needed during training.
    with torch.cpu.amp.autocast():
        optimized_model(data)

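In a complete training step, only the forward pass and the loss computation sit inside autocast; backward() and the optimizer update stay outside. A brief sketch continuing the names above (criterion and target are illustrative):

    criterion = torch.nn.MSELoss()                       # illustrative loss
    optimized_optimizer.zero_grad()
    with torch.cpu.amp.autocast():                       # forward + loss in bfloat16
        loss = criterion(optimized_model(data), target)
    loss.backward()                                      # backward runs outside autocast
    optimized_optimizer.step()
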
Optimization Using the Low-Precision Data Type bfloat16 During Inference (Default: FP32)
    optimized_model = ipex.optimize(model, dtype=torch.bfloat16)

    # Disable gradient tracking for inference and run under autocast.
    with torch.no_grad(), torch.cpu.amp.autocast():
        optimized_model(data)

Run a Launch Script from a Command Prompt to Automate Configuration Settings for Performance Tuning
    ipexrun [knobs] <your_pytorch_script> [args]

Bind a Run to a NUMA Node (Non-Uniform Memory Access) from a Command Prompt
    numactl --cpunodebind N --membind N python <script>

Set the Number of Threads Using GNU OpenMP*
    export OMP_NUM_THREADS=<num threads>

Bind Threads to Specific CPUs Using GNU OpenMP
    export GOMP_CPU_AFFINITY=<space- or comma-separated list of CPUs>

Specify Whether Threads May Move Between Processors Using GNU OpenMP
    export OMP_PROC_BIND=<value>

Determine Thread Scheduling Using GNU OpenMP
    export OMP_SCHEDULE=<value>

Switch to the Intel OpenMP Runtime (libiomp)
    export LD_PRELOAD=<path>/libiomp5.so:$LD_PRELOAD

Bind Threads to Physical Processing Units Using Intel OpenMP
    export KMP_AFFINITY=granularity=fine,compact,1,0

Set the Time (in ms) a Thread Waits After Finishing a Parallel Region Before Sleeping (Intel OpenMP)
    # A value of 0 or 1 is recommended for convolutional neural network (CNN) based models.
    export KMP_BLOCKTIME=<time>

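These environment variables are read by the OpenMP runtime at startup. For the thread count specifically, stock PyTorch also offers an in-code alternative; a minimal sketch (the value 8 is an arbitrary example):

    import torch

    # Programmatic counterpart to OMP_NUM_THREADS; call before heavy parallel work begins.
    torch.set_num_threads(8)
    print(torch.get_num_threads())
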
Tune the Intel® oneAPI Deep Neural Network Library (oneDNN) Primitive Cache (Note the Increased Memory Use; Adjust as Needed)
    # <tuning size> has an upper limit of 65536 cached primitives.
    export ONEDNN_PRIMITIVE_CACHE_CAPACITY=<tuning size>

Flush Denormal Numbers (Which Store Extremely Small Values) to Zero to Boost Performance
    torch.set_flush_denormal(True)

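The effect is visible with a float64 value in the denormal range, mirroring the example in the stock PyTorch documentation for torch.set_flush_denormal; hardware support varies by CPU:

    import torch

    # Returns True when the CPU supports flush-denormal mode.
    if torch.set_flush_denormal(True):
        # 1e-323 is denormal for float64 and gets flushed to zero.
        print(torch.tensor([1e-323], dtype=torch.float64))   # tensor([0.])
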
Import Quantization Functions
    from intel_extension_for_pytorch.quantization import prepare, convert

Post-Training int8 Quantization (Static): Reduce Model Size and Memory Bandwidth While Speeding Up Inference by Quantizing Weights and Activations
    model = ...
    model.eval()
    data = ...
    qconfig = ipex.quantization.default_static_qconfig
    prepared_model = prepare(model, qconfig, example_inputs=data, inplace=False)
    for d in calibration_data_loader():
        prepared_model(d)
    converted_model = convert(prepared_model)

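To deploy the statically quantized model, a common next step is to lower it to TorchScript and save it. A minimal sketch reusing converted_model and data from the recipe above (the file name is illustrative):

    import torch

    with torch.no_grad():
        traced_model = torch.jit.trace(converted_model, data)
        traced_model = torch.jit.freeze(traced_model)
    traced_model.save('quantized_int8_model.pt')
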
Post-Training int8 Quantization (Dynamic): Reduce Model Size and Memory Bandwidth While Speeding Up Inference by Quantizing Activations On the Fly, with No Calibration Required
    model = ...
    model.eval()
    data = ...
    dynamic_qconfig = ipex.quantization.default_dynamic_qconfig
    prepared_model = prepare(model, dynamic_qconfig, example_inputs=data)
    converted_model = convert(prepared_model)

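Because activation scales are computed on the fly, no calibration loop is needed; the converted model runs directly. A minimal usage sketch with the names from the recipe above:

    import torch

    with torch.no_grad():
        output = converted_model(data)
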
For more information and support, or to report any issues, see:
Intel® AI Analytics Toolkit Forum
Sign up and try this extension for free using Intel® Developer Cloud for oneAPI.
"