Monitoring Integrated Memory Controller Requests in the 2nd, 3rd, 4th, 5th, and 6th Generation Intel® Core™ Processors

ID 751706
Updated 2/23/2016
Version Latest
Public

Authors: Roman Dementiev and Angela D. Schmid

Dear Software Tuning, Performance Optimization & Platform Monitoring community,

The recent and upcoming 2nd, 3rd, 4th, 5th, and 6th generation Intel® Core™ processors (previously codenamed Sandy Bridge, Ivy Bridge, Haswell, Broadwell, and Skylake) expose model-specific counters that allow monitoring of requests to DRAM.

The counters employ circuitry residing in the memory controller and monitor transaction requests coming from various sources, e.g. the processor cores, the graphics engine, or other I/O agents. The counters are read through memory-mapped I/O: software reads physical memory at the offsets from the BAR specified in Table 1. Memory traffic metrics can be derived as follows:

  • Data read from DRAM in number of bytes:   UNC_IMC_DRAM_DATA_READS*64
  • Data written to DRAM in number of bytes:   UNC_IMC_DRAM_DATA_WRITES*64
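As a minimal sketch of the two formulas above, the following Python turns two successive samples of a counter into a byte count and an average bandwidth. The function names are illustrative. The 32-bit counter width is an assumption: it is suggested by the 4-byte spacing of the addresses in Table 1, but it is not stated in this article.

```python
CACHE_LINE_BYTES = 64        # each RdCAS/WrCAS moves one 64-byte cache line
COUNTER_MODULUS = 1 << 32    # ASSUMPTION: free-running 32-bit counters that wrap


def counter_delta(before, after, modulus=COUNTER_MODULUS):
    """Events counted between two samples, tolerating a single wraparound."""
    return (after - before) % modulus


def dram_bytes(before, after):
    """Bytes moved between two samples of DATA_READS or DATA_WRITES."""
    return counter_delta(before, after) * CACHE_LINE_BYTES


def bandwidth_bytes_per_s(before, after, seconds):
    """Average bandwidth over the sampling interval, in bytes per second."""
    if seconds <= 0:
        raise ValueError("sampling interval must be positive")
    return dram_bytes(before, after) / seconds
```

Under the 32-bit assumption, a counter wraps after 2^32 × 64 bytes = 256 GiB of traffic, so the sampling interval should be short enough that at most one wrap can occur between two samples.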

Users and developers may take advantage of Intel tools to easily access the counters or derived memory performance metrics.

Table 1. Addresses of DRAM Counters.

The DRAM counters below are model-specific, meaning they may change or may not be supported in future processors. The BAR is available in PCI configuration space at Bus 0, Device 0, Function 0, Offset 048H.
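The BAR lookup could be sketched as follows. This is an illustration under several assumptions that are not stated in this article: the register at offset 0x48 is read as a 64-bit little-endian value, its low 12 bits carry flags rather than address bits (so the base is 4 KiB aligned), and the system is Linux with the sysfs path shown. The names `decode_bar` and `read_bar_linux` are hypothetical.

```python
import struct

BAR_CONFIG_OFFSET = 0x48                 # from the text: Bus 0, Device 0, Function 0
BASE_MASK = 0xFFFFFFFFFFFFF000           # ASSUMPTION: low 12 bits are flags, not address


def decode_bar(config_space):
    """Extract the BAR base address from raw PCI configuration bytes."""
    if len(config_space) < BAR_CONFIG_OFFSET + 8:
        raise ValueError("configuration space dump is too short")
    # ASSUMPTION: the BAR is a 64-bit little-endian register at offset 0x48.
    (raw,) = struct.unpack_from("<Q", config_space, BAR_CONFIG_OFFSET)
    if raw == 0:
        raise ValueError("BAR reads as zero; the counters are not reachable")
    return raw & BASE_MASK


def read_bar_linux(path="/sys/bus/pci/devices/0000:00:00.0/config"):
    """Read the BAR through Linux sysfs (ASSUMPTION: Linux, run as root)."""
    with open(path, "rb") as f:
        return decode_bar(f.read(256))
```

On Linux, unprivileged reads of the sysfs `config` file typically return only the first 64 bytes, in which case `decode_bar` raises `ValueError` instead of returning a bogus address.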

Counter: UNC_IMC_DRAM_GT_REQUESTS
Address: BAR + 0x5040
Description: Counts every read/write request entering the Memory Controller to DRAM (sum of all channels) from the GT engine. Each partial write request increments this counter. However, partial write requests to the same cache line are combined into a single 64-byte data transfer to DRAM. Therefore, multiplying the number of requests by 64 bytes yields an inaccurate GT memory bandwidth. The inaccuracy is proportional to the number of same-cache-line partial writes combined.

Counter: UNC_IMC_DRAM_IA_REQUESTS
Address: BAR + 0x5044
Description: Counts every read/write request (demand and HW prefetch) entering the Memory Controller to DRAM (sum of all channels) from IA. Each partial write request increments this counter. However, partial write requests to the same cache line are combined into a single 64-byte data transfer to DRAM. Therefore, multiplying the number of requests by 64 bytes yields an inaccurate IA memory bandwidth. The inaccuracy is proportional to the number of same-cache-line partial writes combined.

Counter: UNC_IMC_DRAM_IO_REQUESTS
Address: BAR + 0x5048
Description: Counts every read/write request entering the Memory Controller to DRAM (sum of all channels) from all IO sources (e.g. PCIe, Display Engine, USB audio, etc.). Each partial write request increments this counter. However, partial write requests to the same cache line are combined into a single 64-byte data transfer to DRAM. Therefore, multiplying the number of requests by 64 bytes yields an inaccurate IO memory bandwidth. The inaccuracy is proportional to the number of same-cache-line partial writes combined.

Counter: UNC_IMC_DRAM_DATA_READS
Address: BAR + 0x5050
Description: Counts every read (RdCAS) issued by the Memory Controller to DRAM (sum of all channels). All requests result in 64-byte data transfers from DRAM. Use for accurate memory bandwidth calculations.

Counter: UNC_IMC_DRAM_DATA_WRITES
Address: BAR + 0x5054
Description: Counts every write (WrCAS) issued by the Memory Controller to DRAM (sum of all channels). All requests result in 64-byte data transfers to DRAM. Use for accurate memory bandwidth calculations.
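To illustrate how the addresses in Table 1 might be used, here is a hedged Python sketch that reads all five counters from a mapped view of the memory controller registers. The offsets come from Table 1. Everything else is an assumption made for illustration: the 32-bit little-endian register width, the choice of a 4 KiB window starting at BAR + 0x5000, the use of Linux `/dev/mem` (which requires root and may be restricted by the kernel configuration), and the helper names themselves.

```python
import mmap
import os
import struct

# Offsets relative to BAR, taken from Table 1.
COUNTER_OFFSETS = {
    "UNC_IMC_DRAM_GT_REQUESTS": 0x5040,
    "UNC_IMC_DRAM_IA_REQUESTS": 0x5044,
    "UNC_IMC_DRAM_IO_REQUESTS": 0x5048,
    "UNC_IMC_DRAM_DATA_READS": 0x5050,
    "UNC_IMC_DRAM_DATA_WRITES": 0x5054,
}
WINDOW_START = 0x5000    # page-aligned offset from BAR that covers all counters
WINDOW_SIZE = 0x1000     # one 4 KiB page


def read_counters(window, window_start=WINDOW_START):
    """Read every counter from a bytes-like view starting at BAR + window_start."""
    # ASSUMPTION: each counter is a 32-bit little-endian register.
    return {
        name: struct.unpack_from("<I", window, offset - window_start)[0]
        for name, offset in COUNTER_OFFSETS.items()
    }


def map_counter_window(bar_base, dev="/dev/mem"):
    """Map the counter page read-only (ASSUMPTION: Linux, root, /dev/mem usable)."""
    fd = os.open(dev, os.O_RDONLY | os.O_SYNC)
    try:
        return mmap.mmap(fd, WINDOW_SIZE, mmap.MAP_SHARED, mmap.PROT_READ,
                         offset=bar_base + WINDOW_START)
    finally:
        os.close(fd)  # the mapping stays valid after the descriptor is closed
```

A caller would pass a page-aligned BAR base address to `map_counter_window` and then call `read_counters` on the result twice, some interval apart, to obtain deltas. Python does not guarantee the width of the underlying memory access, so a production tool would more likely perform aligned 32-bit loads from C or C++.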

 

Regards,
Roman Dementiev
Staff Application Engineer
Intel Corporation

Angela D. Schmid
Performance Engineer
Intel Corporation
