Runtime Performance Optimization Blueprint: Intel® Architecture Optimization with Large Code Pages

ID 660157
Updated 3/19/2020
Version Latest
Public

Abstract

This Runtime Optimization Blueprint shows how large code pages can improve the performance of runtimes. The intended audience is runtime implementers, customers, and providers deploying runtimes at scale. The Overview section introduces the problem: runtimes suffer from high Instruction Translation Lookaside Buffer (ITLB) miss stalls, which on average account for 7% of CPU cycles across seven commonly used runtimes. The Diagnosis section shows how to diagnose this problem using Performance Monitoring Unit (PMU) counters on Intel® architecture processors and sample tools. The Solution section provides an Intel reference implementation and describes other approaches to solving the problem. The Solution Integration section describes how to integrate the reference implementation into runtimes. The Case Studies section details how this optimization improves performance and reduces ITLB misses (by up to 50%) in three applications in three environments. The last section summarizes the blueprint and provides a call to action for runtime developers and implementers.

Download Runtime Performance Optimization Blueprint: Large Code Pages (PDF 662K)