Introduction
Intel® VTune™ Profiler is a performance profiling tool that delivers software and hardware performance analysis through its graphical and command line interfaces. It collects three general types of data:
- Software (user-mode hotspots and threading) - these collections are generally software-based and do not rely on availability of hardware events
- Hardware (event-based hotspots and threading, microarchitectural analysis, and HPC characteristics) - these collections are hardware-based and require the availability of some hardware events
- Memory (memory access and bandwidth analysis) - this collection is hardware-based and requires the availability of events that occur outside of the CPU (uncore events)
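As a rough illustration of how these three categories correspond to command line collections, the sketch below builds `vtune` argument lists per category. The grouping is an assumption made for illustration and is not exhaustive; `vtune_commands` and `ANALYSES` are hypothetical names, and `./my_app` is a placeholder for the application being profiled.

```python
# Partial, illustrative mapping from the data categories above to
# VTune Profiler CLI analysis types. The grouping is an assumption.
ANALYSES = {
    "software": [
        ["-collect", "hotspots", "-knob", "sampling-mode=sw"],
        ["-collect", "threading"],
    ],
    "hardware": [
        ["-collect", "hotspots", "-knob", "sampling-mode=hw"],
        ["-collect", "uarch-exploration"],
        ["-collect", "hpc-performance"],
    ],
    "memory": [
        ["-collect", "memory-access"],
    ],
}


def vtune_commands(categories, app_argv):
    """Return one vtune argument list per analysis in the given categories."""
    commands = []
    for category in categories:
        for options in ANALYSES[category]:
            # Everything after "--" is the application and its arguments.
            commands.append(["vtune", *options, "--", *app_argv])
    return commands


if __name__ == "__main__":
    # Print the commands for an instance that supports software and
    # hardware collections but not memory (uncore) collections.
    for command in vtune_commands(["software", "hardware"], ["./my_app"]):
        print(" ".join(command))
```

Each returned list can be passed to `subprocess.run` unchanged; only the categories supported by the instance should be requested.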
Amazon Web Services* (AWS*) provides a large variety of instance types and sizes for users in its Elastic Compute Cloud* (EC2*) service. Some VTune Profiler collection types are unavailable on certain instances because the hypervisor does not provide the necessary hardware counters.
Instances Tested
Compute Optimized (C5, C6i)
Instance | VTune Profiler Collections Supported | Intel® Xeon® Scalable Processor Code Name |
c5.4xlarge and smaller | Software only | Skylake or Cascade Lake |
c5.9xlarge | Software, Hardware | Skylake or Cascade Lake |
c5.12xlarge | Software, Hardware | Cascade Lake |
c5.18xlarge | Software, Hardware | Skylake or Cascade Lake |
c5.24xlarge | Software, Hardware | Cascade Lake |
c5.metal | All | Cascade Lake |
c6i.12xlarge and smaller | Software only | Ice Lake |
c6i.16xlarge | Software, Hardware | Ice Lake |
c6i.24xlarge | Software only | Ice Lake |
c6i.32xlarge | Software, Hardware | Ice Lake |
c6i.metal | All | Ice Lake |
General Purpose (M5, M6i)
Instance | VTune Profiler Collections Supported | Intel® Xeon® Scalable Processor Code Name |
m5.8xlarge and smaller | Software only | Skylake or Cascade Lake |
m5.12xlarge | Software, Hardware | Skylake or Cascade Lake |
m5.16xlarge | Software only | Skylake or Cascade Lake |
m5.24xlarge | Software, Hardware | Skylake or Cascade Lake |
m5.metal | All | Skylake or Cascade Lake |
m6i.12xlarge and smaller | Software only | Ice Lake |
m6i.16xlarge | Software, Hardware | Ice Lake |
m6i.24xlarge | Software only | Ice Lake |
m6i.32xlarge | Software, Hardware | Ice Lake |
m6i.metal | All | Ice Lake |
Memory Optimized (R5, R6i)
Instance | VTune Profiler Collections Supported | Intel® Xeon® Scalable Processor Code Name |
r5.8xlarge and smaller | Software only | Skylake or Cascade Lake |
r5.12xlarge | Software, Hardware | Skylake or Cascade Lake |
r5.16xlarge | Software only | Skylake or Cascade Lake |
r5.24xlarge | Software, Hardware | Skylake or Cascade Lake |
r5.metal | All | Skylake or Cascade Lake |
r6i.12xlarge and smaller | Software only | Ice Lake |
r6i.16xlarge | Software, Hardware | Ice Lake |
r6i.24xlarge | Software only | Ice Lake |
r6i.32xlarge | Software, Hardware | Ice Lake |
r6i.metal | All | Ice Lake |
Instance Description
The instances tested use Intel® Xeon® Scalable Processors (code names Skylake, Cascade Lake, and Ice Lake) of various sizes and configurations. For more information, see: https://aws.amazon.com/ec2/instance-types/
Performance Monitoring Unit (PMU)
The PMU is on-chip hardware that monitors microarchitectural events such as cache misses, cache hits, and elapsed cycles. It also analyzes how the operating system or application performs on the processor. The events fall into two main types, hardware and software. Hardware events include instructions, CPU cycles, and cache references; software events include context switches and page faults.
VTune Profiler has two ways to collect these events on Linux*:
- Linux Perf* tool - an interface that provides access to the PMU and its features. Perf also provides modes such as event-based sampling (EBS), which records a sample each time a threshold number of events is reached. Perf is already installed on the default kernel.
- VTune Profiler's sep driver - provided as part of the VTune Profiler package and installed if PMU access is detected. If VTune Profiler is unable to use the sep driver, it will collect using perf. The sep driver is only supported on metal instances at this time.
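The event-based sampling idea mentioned above can be illustrated with a toy model. This is a simplified simulation, not how Perf or the sep driver is implemented: the function name, the trace format, and the `sample_after` parameter are all hypothetical and chosen for illustration.

```python
from collections import Counter


def event_based_samples(trace, sample_after):
    """Simulate event-based sampling over (function, event_count) pairs.

    A running counter accumulates events. Each time it reaches
    `sample_after`, one sample is attributed to the function that was
    executing and the threshold is subtracted from the counter.
    """
    if sample_after <= 0:
        raise ValueError("sample_after must be positive")
    samples = Counter()
    counter = 0
    for function, events in trace:
        counter += events
        while counter >= sample_after:
            samples[function] += 1
            counter -= sample_after
    return samples


if __name__ == "__main__":
    # Hypothetical trace: function names with the events each burst generated.
    trace = [("parse", 250), ("compute", 900), ("parse", 50)]
    print(event_based_samples(trace, sample_after=100))
```

Functions that generate more events cross the threshold more often and therefore collect more samples, which is why the sample distribution approximates where the events occur.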
Instances without Full PMU Support
VTune Profiler analysis types such as the Additional Insights on Hotspot Analysis, Microarchitecture Exploration, and HPC Performance Characterization require access to PMU events in order to provide hardware data such as instructions retired and number of cycles. The PMU events accessible on AWS* instances depend largely on the instance size. The instances tested run on Intel Xeon Scalable Processors with two sockets. Only instance sizes that use one or both complete sockets allow PMU access, presumably because partial use of a socket results in shared CPU resources. Of the larger instances tested, the m5.16xlarge and r5.16xlarge instances do not support PMU events because they consume one complete socket and a portion of the second, so the hardware analyses cannot run on them.
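The whole-socket observation can be written as a small rule of thumb. The sketch below is an assumption-based illustration, not an AWS or VTune Profiler API: `pmu_access_expected` is a hypothetical helper, and the vCPUs-per-socket value must come from the actual host topology (for example, from `lscpu`). The figure of 48 used in the example is illustrative only.

```python
def pmu_access_expected(vcpus, vcpus_per_socket, sockets=2):
    """Return True if the instance occupies a whole number of sockets.

    Encodes only the rule of thumb described above: PMU events were
    observed to be available when an instance uses one or both
    complete sockets, and unavailable when it uses part of a socket.
    """
    if vcpus <= 0 or vcpus_per_socket <= 0:
        raise ValueError("vCPU counts must be positive")
    if vcpus > sockets * vcpus_per_socket:
        raise ValueError("instance is larger than the host")
    return vcpus % vcpus_per_socket == 0


if __name__ == "__main__":
    # Assumed host: two sockets with 48 vCPUs each (illustrative value).
    for size in (32, 48, 64, 96):
        print(size, pmu_access_expected(size, vcpus_per_socket=48))
```

With the assumed 48 vCPUs per socket, sizes of 48 and 96 vCPUs pass the check, while 32 and 64 do not. The rule predicts core PMU events only; it says nothing about uncore events.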
Intel VTune Profiler - Application Performance Snapshot
Application Performance Snapshot (APS) is a utility packaged with VTune Profiler for Linux*. It gives a quick view of MPI and OpenMP imbalances, memory access efficiency, floating point unit (FPU) utilization, and I/O and memory data in your application. After analyzing this data, it suggests ways to perform additional analysis with VTune Profiler.
APS has the same limitations as the VTune Profiler hardware analysis types. It can only run when PMU events are accessible.
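One way to check ahead of time whether hardware events are exposed on an instance is to count a couple of them with `perf stat` on a trivial command. The sketch below assumes `perf` is installed and that unavailable events are reported with the `<not supported>` or `<not counted>` markers in the `perf stat` report; the helper names are hypothetical, and a passing check does not guarantee that every analysis will work.

```python
import subprocess


def hardware_events_counted(perf_stat_output):
    """Return True if the perf stat report has no unsupported or uncounted events."""
    markers = ("<not supported>", "<not counted>")
    return not any(marker in perf_stat_output for marker in markers)


def probe_hardware_events(events=("cycles", "instructions")):
    """Run perf stat on the `true` command and inspect its report."""
    try:
        result = subprocess.run(
            ["perf", "stat", "-e", ",".join(events), "--", "true"],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        # perf is not installed or not on PATH.
        return False
    # perf stat writes its report to stderr by default.
    return result.returncode == 0 and hardware_events_counted(result.stderr)


if __name__ == "__main__":
    print("hardware events available:", probe_hardware_events())
```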
Intel VTune Profiler - Platform Profiler
The Platform Profiler utility is also packaged with VTune Profiler. It profiles at the system level to help identify hardware configuration issues in areas such as storage layout, memory and disk I/O, CPU frequency, cycles per instruction (CPI), and power consumption.
Platform Profiler is limited to use on metal instances only.
Metal versus Non-metal Instances
Some instance types have a metal offering that is the same size as the largest non-metal instance. For example, c5.24xlarge has the same number of vCPUs as c5.metal, and appears to use the same hardware. The main difference is that the 24xlarge instance still runs under a hypervisor, which prevents full access to the PMU, including the uncore events used in memory access analysis. As a result, VTune Profiler is still limited on the largest non-metal instance and fully functional on the metal equivalent.
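On x86 Linux, a guest running under a hypervisor typically shows a `hypervisor` entry in the CPU flags of `/proc/cpuinfo`. The sketch below uses that flag to distinguish the two cases; it assumes that metal instances do not expose the flag, which should be verified on the target instance. The function names are hypothetical.

```python
def runs_under_hypervisor(cpuinfo_text):
    """Return True if any 'flags' line in the cpuinfo text lists 'hypervisor'."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and "hypervisor" in value.split():
            return True
    return False


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the kernel's CPU description (Linux only)."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


if __name__ == "__main__":
    if runs_under_hypervisor(read_cpuinfo()):
        print("Hypervisor detected: expect limited PMU access.")
    else:
        print("No hypervisor flag: full PMU access may be available.")
```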