| Framework Version | Model | Usage | Precision | Throughput | Perf/Watt | Latency (ms) | Batch size | Config* |
|---|---|---|---|---|---|---|---|---|
| Intel PyTorch 2.1 DeepSpeed | GPT-J 6B Token size 1024/128 | text-generation, Beam Search, Width=4 | int8 | | | 40 | 1 | 1 instance per socket |
| Intel PyTorch 2.1 DeepSpeed | GPT-J 6B Token size 1024/128 | text-generation, Beam Search, Width=4 | int8 | 130.4 tokens/s | | 92 | 6 | 1 instance per socket |
| Intel PyTorch 2.1 DeepSpeed | GPT-J 6B Token size 1024/128 | text-generation, Beam Search, Width=4 | bf16 | | | 59.5 | 1 | 1 instance per socket |
| Intel PyTorch 2.1 DeepSpeed | GPT-J 6B Token size 1024/128 | text-generation, Beam Search, Width=4 | bf16 | 125 tokens/s | | 96 | 6 | 1 instance per socket |
| Intel PyTorch 2.1 DeepSpeed | LLaMA2-7B Token size 1024/128 | text-generation, Beam Search, Width=4 | int8 | | | 47 | 1 | 1 instance per socket |
| Intel PyTorch 2.1 DeepSpeed | LLaMA2-7B Token size 1024/128 | text-generation, Beam Search, Width=4 | int8 | 111.6 tokens/s | | 107.5 | 6 | 1 instance per socket |
| Intel PyTorch 2.1 DeepSpeed | LLaMA2-7B Token size 1024/128 | text-generation, Beam Search, Width=4 | bf16 | | | 68 | 1 | 1 instance per socket |
| Intel PyTorch 2.1 DeepSpeed | LLaMA2-7B Token size 1024/128 | text-generation, Beam Search, Width=4 | bf16 | 109.1 tokens/s | | 110 | 6 | 1 instance per socket |
| Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | int8 | 10,215.7 img/s | 9.98 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | int8 | 13,862.96 img/s | 14.09 | | 116 | 1 instance per socket |
| Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | bf16 | 6,210.69 img/s | 6.13 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | bf16 | 7,295.63 img/s | 7.33 | | 116 | 1 instance per socket |
| Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | fp32 | 1,319.52 img/s | 1.27 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | fp32 | 1,360.05 img/s | 1.28 | | 116 | 1 instance per socket |
| Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | bf32 | 1,659.37 img/s | 1.65 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | bf32 | 1,985.26 img/s | 2.02 | | 116 | 1 instance per socket |
| Intel TensorFlow 2.14 | ResNet50 v1.5 | Image Recognition | int8 | 7,440.61 img/s | 7.70 | | 1 | 4 cores per instance |
| Intel TensorFlow 2.14 | ResNet50 v1.5 | Image Recognition | int8 | 12,345.54 img/s | 11.80 | | 116 | 1 instance per socket |
| Intel TensorFlow 2.14 | ResNet50 v1.5 | Image Recognition | bf16 | 5,053.76 img/s | 5.01 | | 1 | 4 cores per instance |
| Intel TensorFlow 2.14 | ResNet50 v1.5 | Image Recognition | bf16 | 6,704.17 img/s | 6.34 | | 116 | 1 instance per socket |
| Intel TensorFlow 2.14 | ResNet50 v1.5 | Image Recognition | fp32 | 1,282.77 img/s | 1.17 | | 1 | 4 cores per instance |
| Intel TensorFlow 2.14 | ResNet50 v1.5 | Image Recognition | fp32 | 1,342.91 img/s | 1.27 | | 116 | 1 instance per socket |
| Intel TensorFlow 2.14 | ResNet50 v1.5 | Image Recognition | bf32 | 1,529.49 img/s | 1.41 | | 1 | 4 cores per instance |
| Intel TensorFlow 2.14 | ResNet50 v1.5 | Image Recognition | bf32 | 2,017.54 img/s | 1.89 | | 116 | 1 instance per socket |
| OpenVINO 2023.2 | ResNet50 v1.5 | Image Recognition | int8 | 8,819.657 img/s | 8.81 | | 1 | 4 cores per instance |
| OpenVINO 2023.2 | ResNet50 v1.5 | Image Recognition | bf16 | 5,915.793 img/s | 5.82 | | 1 | 4 cores per instance |
| OpenVINO 2023.2 | ResNet50 v1.5 | Image Recognition | fp32 | 1,281.337 img/s | 1.25 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | BERTLarge | Natural Language Processing | int8 | 335.1 sent/s | 0.35 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | BERTLarge | Natural Language Processing | int8 | 378.73 sent/s | 0.36 | | 56 | 1 instance per socket |
| Intel PyTorch 2.1 | BERTLarge | Natural Language Processing | bf16 | 204.52 sent/s | 0.21 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | BERTLarge | Natural Language Processing | bf16 | 201.44 sent/s | 0.21 | | 16 | 1 instance per socket |
| Intel PyTorch 2.1 | BERTLarge | Natural Language Processing | fp32 | 35.25 sent/s | 0.03 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | BERTLarge | Natural Language Processing | fp32 | 41.05 sent/s | 0.04 | | 56 | 1 instance per socket |
| Intel PyTorch 2.1 | BERTLarge | Natural Language Processing | bf32 | 72.42 sent/s | 0.07 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | BERTLarge | Natural Language Processing | bf32 | 71.63 sent/s | 0.07 | | 16 | 1 instance per socket |
| Intel TensorFlow 2.14 | BERTLarge | Natural Language Processing | int8 | 253.27 sent/s | 0.24 | | 1 | 4 cores per instance |
| Intel TensorFlow 2.14 | BERTLarge | Natural Language Processing | int8 | 239.89 sent/s | 0.25 | | 16 | 1 instance per socket |
| Intel TensorFlow 2.14 | BERTLarge | Natural Language Processing | bf16 | 181.02 sent/s | 0.18 | | 1 | 4 cores per instance |
| Intel TensorFlow 2.14 | BERTLarge | Natural Language Processing | bf16 | 184.06 sent/s | 0.17 | | 128 | 1 instance per socket |
| Intel TensorFlow 2.14 | BERTLarge | Natural Language Processing | fp32 | 44.73 sent/s | 0.04 | | 1 | 4 cores per instance |
| Intel TensorFlow 2.14 | BERTLarge | Natural Language Processing | fp32 | 38.58 sent/s | 0.04 | | 16 | 1 instance per socket |
| Intel TensorFlow 2.14 | BERTLarge | Natural Language Processing | bf32 | 72.78 sent/s | 0.07 | | 1 | 4 cores per instance |
| Intel TensorFlow 2.14 | BERTLarge | Natural Language Processing | bf32 | 71.77 sent/s | 0.07 | | 16 | 1 instance per socket |
| OpenVINO 2023.2 | BERTLarge | Natural Language Processing | int8 | 298.44 sent/s | 0.30 | | 1 | 4 cores per instance |
| OpenVINO 2023.2 | BERTLarge | Natural Language Processing | int8 | 285.68 sent/s | 0.28 | | 48 | 1 instance per socket |
| OpenVINO 2023.2 | BERTLarge | Natural Language Processing | bf16 | 202.48 sent/s | 0.20 | | 1 | 4 cores per instance |
| OpenVINO 2023.2 | BERTLarge | Natural Language Processing | bf16 | 191.2533 sent/s | 0.19 | | 32 | 1 instance per socket |
| OpenVINO 2023.2 | BERTLarge | Natural Language Processing | fp32 | 47.33667 sent/s | 0.05 | | 1 | 4 cores per instance |
| OpenVINO 2023.2 | BERTLarge | Natural Language Processing | fp32 | 44.23333 sent/s | 0.04 | | 48 | 1 instance per socket |
| Intel PyTorch 2.1 | DLRM Criteo Terabyte | Recommender | int8 | 23,444,587 rec/s | 23611.92 | | 128 | 1 instance per socket |
| Intel PyTorch 2.1 | DLRM Criteo Terabyte | Recommender | bf16 | 10,646,560 rec/s | 10238.88 | | 128 | 1 instance per socket |
| Intel PyTorch 2.1 | DLRM Criteo Terabyte | Recommender | fp32 | 2,278,228 rec/s | 2220.37 | | 128 | 1 instance per socket |
| Intel PyTorch 2.1 | DLRM Criteo Terabyte | Recommender | bf32 | 4,530,200 rec/s | 4427.38 | | 128 | 1 instance per socket |
| Intel PyTorch 2.1 | DistilBERT | Natural Language Processing | int8 | 4,726.15 sent/s | 4.94 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | DistilBERT | Natural Language Processing | int8 | 7,759.25 sent/s | 8.42 | | 168 | 1 instance per socket |
| Intel PyTorch 2.1 | DistilBERT | Natural Language Processing | bf16 | 3,306.46 sent/s | 3.35 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | DistilBERT | Natural Language Processing | bf16 | 5,057.47 sent/s | 5.50 | | 120 | 1 instance per socket |
| Intel PyTorch 2.1 | DistilBERT | Natural Language Processing | fp32 | 900.58 sent/s | 0.85 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | DistilBERT | Natural Language Processing | fp32 | 1,007.05 sent/s | 1.04 | | 56 | 1 instance per socket |
| Intel PyTorch 2.1 | DistilBERT | Natural Language Processing | bf32 | 1,513.66 sent/s | 1.49 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | DistilBERT | Natural Language Processing | bf32 | 1,926.1 sent/s | 1.77 | | 288 | 1 instance per socket |
| Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | int8 | 61.03 sent/s | 0.06 | | 1 | 4 cores per instance |
| Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | int8 | 245.66 sent/s | 0.24 | | 448 | 1 instance per socket |
| Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | bf16 | 41.44 sent/s | 0.04 | | 1 | 4 cores per instance |
| Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | bf16 | 278.81 sent/s | 0.28 | | 448 | 1 instance per socket |
| Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | fp32 | 20.27 sent/s | 0.02 | | 1 | 4 cores per instance |
| Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | fp32 | 102.48 sent/s | 0.10 | | 448 | 1 instance per socket |
| Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | bf32 | 20.28 sent/s | 0.02 | | 1 | 4 cores per instance |
| Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | bf32 | 114.08 sent/s | 0.11 | | 448 | 1 instance per socket |
| OpenVINO 2023.2 | 3D-Unet | Image Segmentation | int8 | 24.68333 samp/s | 0.02 | | 1 | 4 cores per instance |
| OpenVINO 2023.2 | 3D-Unet | Image Segmentation | int8 | 21.85667 samp/s | 0.02 | | 6 | 1 instance per socket |
| OpenVINO 2023.2 | 3D-Unet | Image Segmentation | bf16 | 13.05333 samp/s | 0.01 | | 1 | 4 cores per instance |
| OpenVINO 2023.2 | 3D-Unet | Image Segmentation | bf16 | 11.87 samp/s | 0.01 | | 6 | 1 instance per socket |
| OpenVINO 2023.2 | 3D-Unet | Image Segmentation | fp32 | 2.883333 samp/s | 0.00 | | 1 | 4 cores per instance |
| OpenVINO 2023.2 | 3D-Unet | Image Segmentation | fp32 | 2.62 samp/s | 0.00 | | 6 | 1 instance per socket |
| OpenVINO 2023.2 | SSD-ResNet34 COCO 2017 (1200 x 1200) | Object Detection | int8 | 459.3633 img/s | 0.44 | | 1 | 4 cores per instance |
| OpenVINO 2023.2 | SSD-ResNet34 COCO 2017 (1200 x 1200) | Object Detection | bf16 | 218.4133 img/s | 0.20 | | 1 | 4 cores per instance |
| OpenVINO 2023.2 | SSD-ResNet34 COCO 2017 (1200 x 1200) | Object Detection | fp32 | 31.17333 img/s | 0.03 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | ResNeXt101 32x16d ImageNet | Image Classification | int8 | 1289.95 fps | 1.35 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | ResNeXt101 32x16d ImageNet | Image Classification | int8 | 1923.77 fps | 1.83 | | 116 | 1 instance per socket |
| Intel PyTorch 2.1 | ResNeXt101 32x16d ImageNet | Image Classification | bf16 | 648.58 fps | 0.66 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | ResNeXt101 32x16d ImageNet | Image Classification | bf16 | 867.05 fps | 0.87 | | 64 | 1 instance per socket |
| Intel PyTorch 2.1 | ResNeXt101 32x16d ImageNet | Image Classification | fp32 | 151.29 fps | 0.14 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | ResNeXt101 32x16d ImageNet | Image Classification | fp32 | 160.93 fps | 0.15 | | 64 | 1 instance per socket |
| Intel PyTorch 2.1 | ResNeXt101 32x16d ImageNet | Image Classification | bf32 | 215.11 fps | 0.21 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | ResNeXt101 32x16d ImageNet | Image Classification | bf32 | 241.98 fps | 0.22 | | 116 | 1 instance per socket |
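
One way to read the inference figures above is as a speedup of each precision over fp32 for the same framework, model, and batch size. The sketch below does that arithmetic for the Intel PyTorch 2.1 ResNet50 v1.5 rows at batch size 1 (4 cores per instance). The throughput values are copied from the table; the names `RESNET50_PYTORCH_BS1` and `speedup_over_baseline` are hypothetical helpers introduced only for this illustration and are not part of any Intel tool.

```python
# Throughput (img/s) copied from the inference table:
# Intel PyTorch 2.1, ResNet50 v1.5, batch size 1, 4 cores per instance.
RESNET50_PYTORCH_BS1 = {
    "int8": 10215.7,
    "bf16": 6210.69,
    "bf32": 1659.37,
    "fp32": 1319.52,
}


def speedup_over_baseline(throughput, baseline="fp32"):
    """Return each precision's throughput divided by the baseline's.

    Hypothetical helper for illustration; it only divides numbers that
    were measured under the same framework, model, and batch size.
    """
    base = throughput[baseline]
    return {precision: value / base for precision, value in throughput.items()}


speedups = speedup_over_baseline(RESNET50_PYTORCH_BS1)

# Print the ratios from largest to smallest.
for precision, ratio in sorted(speedups.items(), key=lambda kv: -kv[1]):
    print(f"{precision}: {ratio:.2f}x")
```

For these rows the ratios come out to roughly 7.7x for int8, 4.7x for bf16, and 1.3x for bf32 relative to fp32. Comparing rows with different batch sizes or instance configurations mixes two effects, so the ratio is most meaningful when only the precision changes.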

 

| Framework Version | Model | Usage | Precision | TTT (minutes) | Accuracy | Batch Size | Ranks |
|---|---|---|---|---|---|---|---|
| Transformers 4.31, Intel Extension for PyTorch 2.0.1, PEFT 0.4.0 | GPT-J 6B (Glue MNLI dataset) | Fine-tuning, Text generation task | bf16 | 230.40 | 81.6 | 8 | 1 |
| Transformers 4.34.1, Intel PyTorch 2.1.0, PEFT 0.5.0, Intel® oneCCL v2.1.0 | BioGPT (1.5 billion parameters) (PubMedQA dataset) | Fine-tuning, Response generation | bf16 | 48.70 | 79.4 | 8 | 8 |
| Intel® TensorFlow 2.14, horovod 0.28, Open MPI 4.1.2, Python 3.10.0 | ResNet50 v1.50 (Colorectal histology dataset) | Fine-tuning, Colorectal cancer detection | fp32 | 8.83 | 94.3 | 32 | 64 |
| Intel® TensorFlow 2.14, horovod 0.28, Open MPI 4.1.2, Python 3.10.0 | ResNet50 v1.50 (Colorectal histology dataset) | Fine-tuning, Colorectal cancer detection | bf16 | 4.65 | 94.3 | 32 | 64 |
| Intel® TensorFlow 2.14, horovod 0.28, Open MPI 4.1.2, Python 3.10.0 | ResNet50 v1.50 (Colorectal histology dataset) | Fine-tuning, Colorectal cancer detection | fp32 | 6.04 | 93.8 | 32 | 128 |
| Intel® TensorFlow 2.14, horovod 0.28, Open MPI 4.1.2, Python 3.10.0 | ResNet50 v1.50 (Colorectal histology dataset) | Fine-tuning, Colorectal cancer detection | bf16 | 4.02 | 94.6 | 32 | 128 |
| Transformers 4.35.0, Intel PyTorch 2.0.100, Intel® oneCCL 2.0.100 | BERTLarge Uncased (IMDb dataset) | Fine-tuning, Sentiment Analysis | fp32 | 61.72 | 93.59 | 64 | 4 |
| Transformers 4.35.0, Intel PyTorch 2.0.100, Intel® oneCCL 2.0.100 | BERTLarge Uncased (IMDb dataset) | Fine-tuning, Sentiment Analysis | bf16 | 18.86 | 93.88 | 64 | 4 |
| Transformers 4.35.0, Intel PyTorch 2.0.100, Intel® oneCCL 2.0.100 | BERTLarge Uncased (GLUE SST2 dataset) | Fine-tuning, Sentiment Analysis | fp32 | 14.06 | 92.2 | 256 | 4 |
| Transformers 4.35.0, Intel PyTorch 2.0.100, Intel® oneCCL 2.0.100 | BERTLarge Uncased (GLUE SST2 dataset) | Fine-tuning, Sentiment Analysis | bf16 | 3.68 | 92.09 | 256 | 4 |

 

| Framework Version | Model/Dataset | Usage | Precision | Throughput | Perf/Watt | Batch size |
|---|---|---|---|---|---|---|
| Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | fp32 | 129.97 img/s | 0.16 | 128 |
| Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | bf16 | 327.96 img/s | 0.42 | 128 |
| Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | bf32 | 146.18 img/s | 0.18 | 128 |
| Intel TensorFlow 2.14 | ResNet50 v1.5 ImageNet (224 x 224) | Image Recognition | fp32 | 137.36 img/s | 0.16 | 1,024 |
| Intel TensorFlow 2.14 | ResNet50 v1.5 ImageNet (224 x 224) | Image Recognition | bf16 | 317.83 img/s | 0.38 | 1,024 |
| Intel TensorFlow 2.14 | ResNet50 v1.5 ImageNet (224 x 224) | Image Recognition | bf32 | 152 img/s | 0.18 | 1,024 |
| Intel PyTorch 2.1 | DLRM Criteo Terabyte | Recommender | fp32 | 265,503.91 rec/s | 323.99 | 32,768 |
| Intel PyTorch 2.1 | DLRM Criteo Terabyte | Recommender | bf16 | 783,058.09 rec/s | 980.37 | 32,768 |
| Intel PyTorch 2.1 | DLRM Criteo Terabyte | Recommender | bf32 | 369,848.15 rec/s | 447.84 | 32,768 |
| Intel TensorFlow 2.14 | SSD-ResNet34 COCO 2017 (1200 x 1200) | Object Detection | fp32 | 52.49 img/s | 0.07 | 896 |
| Intel TensorFlow 2.14 | SSD-ResNet34 COCO 2017 (1200 x 1200) | Object Detection | bf16 | 190.53 img/s | 0.25 | 896 |
| Intel TensorFlow 2.14 | SSD-ResNet34 COCO 2017 (1200 x 1200) | Object Detection | bf32 | 68.08 img/s | 0.09 | 896 |
| Intel PyTorch 2.1 | RNNT LibriSpeech | Speech Recognition | fp32 | 3.38 fps | 0.00 | 32 |
| Intel PyTorch 2.1 | RNNT LibriSpeech | Speech Recognition | bf16 | 27.32 fps | 0.03 | 64 |
| Intel PyTorch 2.1 | RNNT LibriSpeech | Speech Recognition | bf32 | 11.05 fps | 0.01 | 32 |
| Intel PyTorch 2.1 | MaskR-CNN COCO 2017 | Object Detection | fp32 | 3.76 img/s | 0.00 | 112 |
| Intel PyTorch 2.1 | MaskR-CNN COCO 2017 | Object Detection | bf16 | 10.04 img/s | 0.01 | 112 |
| Intel PyTorch 2.1 | MaskR-CNN COCO 2017 | Object Detection | bf32 | 3.94 img/s | 0.00 | 112 |
| Intel PyTorch 2.1 | BERTLarge Wikipedia 2020/01/01 seq len=512 | Natural Language Processing | fp32 | 3.76 sent/s | 0.00 | 28 |
| Intel PyTorch 1.13 | BERTLarge Wikipedia 2020/01/01 seq len=512 | Natural Language Processing | bf16 | 10.04 sent/s | 0.01 | 56 |
| Intel PyTorch 1.13 | BERTLarge Wikipedia 2020/01/01 seq len=512 | Natural Language Processing | bf32 | 3.94 sent/s | 0.00 | 56 |
| Intel TensorFlow 2.14 | BERTLarge Wikipedia 2020/01/01 seq len=512 | Natural Language Processing | fp32 | 4.28 sent/s | 0.01 | 128 |
| Intel TensorFlow 2.14 | BERTLarge Wikipedia 2020/01/01 seq len=512 | Natural Language Processing | bf16 | 9.75 sent/s | 0.01 | 128 |
| Intel TensorFlow 2.14 | BERTLarge Wikipedia 2020/01/01 seq len=512 | Natural Language Processing | bf32 | 4.79 sent/s | 0.01 | 128 |
| Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | fp32 | 12,072.19 sent/s | 11.53 | 42,000 |
| Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | bf16 | 28,757.83 sent/s | 28.89 | 42,000 |
| Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | bf32 | 11,995.37 sent/s | 11.78 | 42,000 |

Hardware and software configuration (measured October 24, 2023):

Deep Learning configuration:

  • Hardware configuration for Intel® Xeon® Platinum 8480+ processor (code named Sapphire Rapids): 2 sockets for inference, 1 socket for training, 56 cores, 350 watts, 1024 GB (16 x 64 GB) DDR5 4800 MT/s memory, operating system CentOS* Stream 8. Using Intel® Advanced Matrix Extensions (Intel® AMX) int8 and bf16 with Intel® oneAPI Deep Neural Network Library (oneDNN) optimized kernels integrated into Intel® Extension for PyTorch*, Intel® Extension for TensorFlow*, and Intel® Distribution of OpenVINO™ toolkit. Measurements may vary.
  • If the dataset is not listed, a synthetic dataset was used to measure performance. Accuracy (if listed) was validated with the specified dataset.

Transfer Learning configuration:

  • Hardware configuration for Intel® Xeon® Platinum 8480+ processor (code named Sapphire Rapids): DLSA single-node fine-tuning and Vision Transfer Learning on a single node, 56 cores, 350 watts, 16 x 64 GB DDR5 4800 MT/s memory, BIOS version EGSDREL1.SYS.8612.P03.2208120629, operating system Ubuntu 22.04.1 LTS. Using Intel® Advanced Matrix Extensions (Intel® AMX) int8 and bf16 with Intel® oneAPI Deep Neural Network Library (oneDNN) v2.6 optimized kernels integrated into Intel® Extension for PyTorch* v1.12, and Intel® oneAPI Collective Communications Library v2021.5.2. Measurements and some software configurations may vary.