Get More Performance for Every Dollar You Spend on AWS*, with Your Data on Intel

Discover the advantage of choosing instances based on Intel® Xeon® Scalable processors.

Are you getting the best value from your AWS* investment?

The cloud gives you the scalability, reliability, and flexibility you need, but not all cloud instances are created equal: some deliver much more value than others. To find the best choice for your workloads, you have to look at how those specific workloads perform. A generic performance statistic might not tell you much about the results you’ll actually see, especially if you’re running compute-intensive or data-intensive workloads. Similarly, knowing the price per instance doesn’t tell you much about the price per transaction, or any other real business or performance metric. You have to look closer.

For example, did you know that AWS* instances based on Intel® Xeon® Scalable processors can offer up to 4.15x higher performance per dollar for high-performance computing (HPC) workloads than instances based on AMD EPYC* processors, according to the High-Performance Linpack* benchmark?¹ They offer up to 2.19x higher performance per dollar according to the LAMMPS* benchmark.¹ For database workloads on AWS, Intel® Xeon® Scalable processors can deliver up to 2.84x higher performance per dollar;² and for memory-bandwidth-intensive workloads, they can enable up to 2.25x higher performance per dollar.³ If you’re running web-based workloads such as server-side Java* or WordPress* PHP/HHVM*, you might find they give you up to 1.74x higher performance per dollar running on Intel.⁴
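To make the performance-per-dollar arithmetic concrete, here is a minimal Python sketch. It uses the MongoDB* scores reported in footnote 2 (1,229,288 and 388,596 operations per second) together with the published 2.84x figure. The function names are illustrative only, and because the hourly prices are not stated in this document, the sketch derives only the price ratio that those published numbers imply.

```python
def relative_perf_per_dollar(score_a, price_a, score_b, price_b):
    """Performance per dollar of A relative to B (higher is better)."""
    return (score_a / price_a) / (score_b / price_b)


def implied_price_ratio(score_a, score_b, perf_per_dollar_ratio):
    """Price of A divided by price of B implied by a published ratio."""
    return (score_a / score_b) / perf_per_dollar_ratio


# Scores from footnote 2 (MongoDB*, operations per second).
intel_ops = 1_229_288
amd_ops = 388_596

raw_speedup = intel_ops / amd_ops
price_ratio = implied_price_ratio(intel_ops, amd_ops, 2.84)

print(f"raw throughput ratio: {raw_speedup:.2f}x")
print(f"implied hourly price ratio (Intel/AMD): {price_ratio:.2f}")
```

Under these inputs the raw throughput ratio is about 3.16x and the implied price ratio is about 1.11, which is why the per-dollar advantage is smaller than the raw throughput advantage. To evaluate your own workload, pass measured scores and your actual hourly prices to `relative_perf_per_dollar`.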

If you’re already using Intel® processors, you could also save money by moving to a more modern instance based on the Intel® Xeon® Scalable processor. TSO Logic, which delivers data-driven recommendations to right-size and right-cost compute across public and private cloud, studied millions of data points across its 100,000-instance repository of anonymized AWS customer data. The conclusion? 19 percent of current instances could save money by moving to newer, smaller Amazon EC2* instance types that offer equivalent performance at lower cost. For example, migrating from older C4.8xlarge to newer C5.4xlarge instances can save up to 50 percent of your cloud costs, over $3,000 per instance.⁵ What’s more, the savings can add up fast if you’re licensing software per core. TSO Logic found that one workload could be delivered using 40 fewer cores running on newer instances based on the 2nd Generation Intel® Xeon® Scalable processor.⁵ If you run a commercial database licensed at $1,800 per core, you could save $72,000 per year by cutting the core count by 40.⁵
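The per-core licensing example above is simple multiplication, sketched here in Python. The function name is hypothetical, and the sketch assumes the $1,800 fee is charged per core per year, as the "$72,000 per year" figure suggests.

```python
def annual_license_savings(cores_removed, license_cost_per_core):
    """Yearly saving from licensing fewer cores (assumes a per-core annual fee)."""
    return cores_removed * license_cost_per_core


# Figures quoted in the text: 40 fewer cores at $1,800 per core.
savings = annual_license_savings(cores_removed=40, license_cost_per_core=1_800)
print(f"${savings:,} per year")  # $72,000 per year
```

Substituting your own core counts and license terms gives a quick first estimate; actual licensing agreements may count cores differently.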

The Intel® Xeon® Scalable processors and 2nd Generation Intel® Xeon® Scalable processors have a number of optimizations built in to accelerate your workloads. The lower-precision INT8 number format discards detail that machine learning inference does not need, which speeds up processing; and Intel® Deep Learning Boost (Intel® DL Boost) provides a new processor instruction to speed up inference in applications such as image classification, speech recognition, language translation, and object detection. Intel® Advanced Vector Extensions 512 (Intel® AVX-512) provides 512-bit vector instructions to accelerate floating-point calculations, including scientific simulations. Intel® Turbo Boost Technology lets cores run faster than the base operating frequency to give you extra performance when you need it most. To help protect your data, Intel® Advanced Encryption Standard New Instructions (Intel® AES-NI) provides processor instructions to accelerate encryption and decryption.
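If you want to confirm which of these instruction-set features an instance actually exposes, one option on Linux is to read the CPU flags in `/proc/cpuinfo`. The following Python sketch is an illustration under that assumption, not an Intel or AWS tool; the flag names (`avx512f`, `avx512_vnni`, `aes`) are the ones the Linux kernel reports, and the helper names are hypothetical. Intel® Turbo Boost Technology is not covered because it is not reported as a simple flag here.

```python
from pathlib import Path

# Linux /proc/cpuinfo flag names for the features discussed above.
FEATURE_FLAGS = {
    "Intel AVX-512 (foundation)": "avx512f",
    "Intel DL Boost (VNNI)": "avx512_vnni",
    "Intel AES-NI": "aes",
}


def parse_cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def detect_features(cpuinfo_text):
    """Map each feature name to True/False depending on the CPU flags."""
    flags = parse_cpu_flags(cpuinfo_text)
    return {name: flag in flags for name, flag in FEATURE_FLAGS.items()}


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        for name, present in detect_features(cpuinfo.read_text()).items():
            print(f"{name}: {'yes' if present else 'no'}")
    else:
        print("/proc/cpuinfo not found; this sketch only works on Linux.")
```

A feature being present does not mean your software uses it; libraries and compilers generally need to be built or configured to take advantage of these instructions.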

Notices and Disclaimers:

Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark* and MobileMark*, are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit www.intel.cn/benchmarks.

Performance results are based on testing as of the date set forth in the Configurations and may not reflect all publicly available security updates. See configuration disclosure for details. No product or component can be absolutely secure.
Intel does not control or audit third-party data. You should review this content, consult other sources, and confirm whether referenced data are accurate.

Cost reduction scenarios described are intended as examples of how a given Intel®-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction.

Intel® technologies' features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. Check with your system manufacturer or retailer or learn more at intel.cn.

Intel, the Intel logo, and Xeon are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others. 
© Intel Corporation

Product and Performance Information

1

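Results calculated by Intel using AWS pricing ($/hour, standard 1-year term, no up-front payment) as of January 12, 2019.
Performance-per-dollar testing was done on AWS* EC2 M5 and M5a instances (https://aws.amazon.com/ec2/instance-types/), comparing 96 vCPU Intel® Xeon® Scalable processor performance per dollar to AMD EPYC* processor performance per dollar.

Workload: LAMMPS*
Results: AMD EPYC performance per dollar = baseline of 1; Intel® Xeon® Scalable processor performance per dollar = 2.19X (higher is better).
HPC Materials Science - LAMMPS (higher is better):
AWS M5.24xlarge (Intel) instance, LAMMPS version: 2018-08-22 (code: https://lammps.sandia.gov/download.html), workload: Water, 512K particles, Intel ICC 18.0.3.20180410, Intel® MPI Library for Linux* OS, Version 2018 Update 3 Build 20180411, 48 MPI ranks, Red Hat* Enterprise Linux 7.5, kernel 3.10.0-862.el7.x86_64, OMP_NUM_THREADS=2, score 137.5 timesteps/sec, measured by Intel on October 31, 2018.
AWS M5a.24xlarge (AMD) instance, LAMMPS version: 2018-08-22 (code: https://lammps.sandia.gov/download.html), workload: Water, 512K particles, Intel ICC 18.0.3.20180410, Intel® MPI Library for Linux* OS, Version 2018 Update 3 Build 20180411, 48 MPI ranks, Red Hat* Enterprise Linux 7.5, kernel 3.10.0-862.el7.x86_64, OMP_NUM_THREADS=2, score 55.8 timesteps/sec, measured by Intel on November 7, 2018.
Changes made for AMD to support AVX2 (AMD supports only AVX2, so these changes were necessary):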
sed -i 's/-xHost/-xCORE-AVX2/g' Makefile.intel_cpu_intelmpi
sed -i 's/-qopt-zmm-usage=high/-xCORE-AVX2/g' Makefile.intel_cpu_intelmpi

Workload: High-Performance Linpack*
Results: AMD EPYC performance per dollar = baseline of 1; Intel® Xeon® Scalable processor performance per dollar = 4.15X (higher is better).
HPC Linpack (higher is better):
AWS M5.24xlarge (Intel) instance, HP Linpack version 2.2 (https://software.intel.com/zh-cn/articles/intel-mkl-benchmarks-suite directory: benchmarks_2018.3.222/linux/mkl/benchmarks/mp_linpack/bin_intel/intel64), Intel ICC 18.0.3.20180410 with AVX512, Intel® MPI Library for Linux* OS, Version 2018 Update 3 Build 20180411, Red Hat* Enterprise Linux 7.5, kernel 3.10.0-862.el7.x86_64, OMP_NUM_THREADS=24, 2 MPI processes, score 3152 GB/s, measured by Intel on October 31, 2018.
AWS M5a.24xlarge (AMD) instance, HP Linpack version 2.2 (HPL source: http://www.netlib.org/benchmark/hpl/hpl-2.2.tar.gz, version 2.2; icc (ICC) 18.0.2 20180210 used to compile and link to BLIS library version 0.4.0; https://github.com/flame/blis; Addt’l compiler flags: -O3 -funroll-loops -W -Wall -qopenmp; make arch=zen OMP_NUM_THREADS=8; 6 MPI processes), Intel ICC 18.0.3.20180410 with AVX2, Intel® MPI Library for Linux* OS, Version 2018 Update 3 Build 20180411, Red Hat* Enterprise Linux 7.5, kernel 3.10.0-862.el7.x86_64, OMP_NUM_THREADS=8, 6 MPI processes, score 677.7 GB/s, measured by Intel on November 7, 2018.

2

Results calculated by Intel using AWS pricing ($/hour, standard 1-year term, no up-front payment) as of January 12, 2019.
Performance-per-dollar testing was done on AWS* EC2 R5 and R5a instances (https://aws.amazon.com/ec2/instance-types/), comparing 96 vCPU Intel® Xeon® Scalable processor performance per dollar to AMD EPYC* processor performance per dollar.

Workload: HammerDB* PostgreSQL*
Results: AMD EPYC performance per dollar = baseline of 1; Intel® Xeon® Scalable processor performance per dollar = 1.85X (higher is better).
Database: HammerDB - PostgreSQL (higher is better):
AWS R5.24xlarge (Intel) instance, HammerDB 3.0 PostgreSQL 10.2, memory: 768GB, hypervisor: KVM; storage type: EBS io1, disk volume 200GB, total storage 200GB, Docker version: 18.06.1-ce, Red Hat* Enterprise Linux 7.6, 3.10.0-957.el7.x86_64, 6400MB shared_buffer, 256 warehouses, 96 users. Score “NOPM” 439931, measured by Intel on December 11-14, 2018.
AWS R5a.24xlarge (AMD) instance, HammerDB 3.0 PostgreSQL 10.2, memory: 768GB, hypervisor: KVM; storage type: EBS io1, disk volume 200GB, total storage 200GB, Docker version: 18.06.1-ce, Red Hat* Enterprise Linux 7.6, 3.10.0-957.el7.x86_64, 6400MB shared_buffer, 256 warehouses, 96 users. Score “NOPM” 212903, measured by Intel on December 20, 2018.

Workload: MongoDB*
Results: AMD EPYC performance per dollar = baseline of 1; Intel® Xeon® Scalable processor performance per dollar = 2.84X (higher is better).
Database: MongoDB (higher is better):
AWS R5.24xlarge (Intel) instance, MongoDB v4.0, journal disabled, sync to filesystem disabled, wiredTigeCache=27GB, maxPoolSize = 256; 7 MongoDB instances, 14 client VMs, 1 YCSB client per VM, 96 threads per YCSB client, Red Hat* Enterprise Linux 7.5, kernel 3.10.0-862.el7.x86_64, score 1229288 ops/sec, measured by Intel on December 10, 2018.
AWS R5a.24xlarge (AMD) instance, MongoDB v4.0, journal disabled, sync to filesystem disabled, wiredTigeCache=27GB, maxPoolSize = 256; 7 MongoDB instances, 14 client VMs, 1 YCSB client per VM, 96 threads per YCSB client, Red Hat* Enterprise Linux 7.5, kernel 3.10.0-862.el7.x86_64, score 388596 ops/sec, measured by Intel on December 10, 2018.
For more details, visit www.intel.cn/benchmarks

3

AWS M5.4xlarge (Intel) instance, McCalpin Stream (OMP version), (source: https://www.cs.virginia.edu/stream/FTP/Code/stream.c); Intel ICC 18.0.3 20180410 with AVX512, -qopt-zmm-usage=high, -DSTREAM_ARRAY_SIZE=134217728 -DNTIMES=100 -DOFFSET=0 -qopenmp, -qopt-streaming-stores always -o $OUT stream.c, Red Hat* Enterprise Linux 7.5, kernel 3.10.0-862.el7.x86_64, OMP_NUM_THREADS: 8, KMP_AFFINITY: proclist=[0-7:1], granularity=thread, explicit, score 81216.7 MB/s, measured by Intel on December 6, 2018.
AWS M5a.4xlarge (AMD) instance, McCalpin Stream (OMP version), (source: https://www.cs.virginia.edu/stream/FTP/Code/stream.c); Intel ICC 18.0.3 20180410 with AVX2, -DSTREAM_ARRAY_SIZE=134217728, -DNTIMES=100 -DOFFSET=0 -qopenmp -qopt-streaming-stores always -o $OUT stream.c, Red Hat* Enterprise Linux 7.5, kernel 3.10.0-862.el7.x86_64, OMP_NUM_THREADS: 8, KMP_AFFINITY: proclist=[0-7:1], granularity=thread, explicit, score 32154.4 MB/s, measured by Intel on December 6, 2018.
OpenFOAM disclaimer: This offering is not approved or endorsed by OpenCFD Limited, producer and distributor of the OpenFOAM software via www.openfoam.com, and owner of the OpenFOAM® and OpenCFD® trademarks.

4

Results calculated by Intel using AWS pricing ($/hour, standard 1-year term, no up-front payment) as of January 12, 2019.
Performance-per-dollar testing was done on AWS* EC2 M5 and M5a instances (https://aws.amazon.com/ec2/instance-types/), comparing 96 vCPU Intel® Xeon® Scalable processor performance per dollar to AMD EPYC* processor performance per dollar.

Workload: Server-side Java* 1 JVM
Results: AMD EPYC performance per dollar = baseline of 1; Intel® Xeon® Scalable processor performance per dollar = 1.74X (higher is better).
Server-side Java (higher is better):
AWS M5.24xlarge (Intel) instance, Java server benchmark, no NUMA binding, 2JVM, OpenJDK 10.0.1, Red Hat* Enterprise Linux 7.5, kernel 3.10.0-862.el7.x86_64, score 101767 transactions/sec, measured by Intel on November 16, 2018.
AWS M5a.24xlarge (AMD) instance, Java server benchmark, no NUMA binding, 2JVM, OpenJDK 10.0.1, Red Hat* Enterprise Linux 7.5, kernel 3.10.0-862.el7.x86_64, score 52068 transactions/sec, measured by Intel on November 16, 2018.

Workload: WordPress* PHP/HHVM*
Results: AMD EPYC performance per dollar = baseline of 1; Intel® Xeon® Scalable processor performance per dollar = 1.75X (higher is better).
Web front end WordPress (higher is better):
AWS M5.24xlarge (Intel) instance, oss-performance/wordpress ver 4.2.0; Ver 10.2.19-MariaDB-1:10.2.19+maria~bionic; workload version: u'4.2.0; client threads: 200; PHP 7.2.12-1; perfkitbenchmarker_version="v1.12.0-944-g82392cc; Ubuntu 18.04, kernel Linux 4.15.0-1025-aws, score 3626.11 TPS, measured by Intel on November 16, 2018.
AWS M5a.24xlarge (AMD) instance, oss-performance/wordpress ver 4.2.0; Ver 10.2.19-MariaDB-1:10.2.19+maria~bionic; workload version: u'4.2.0; client threads: 200; PHP 7.2.12-1; perfkitbenchmarker_version="v1.12.0-944-g82392cc; Ubuntu 18.04, kernel Linux 4.15.0-1025-aws, score 1838.48 TPS, measured by Intel on November 16, 2018.
For more details, visit www.intel.cn/benchmarks

5

Source: TSO Logic/Intel research report: “New Advances from Intel and Amazon Web Services Deliver Significant Time and Cost Savings in the Cloud.”