Configuring Intel® Optane™ DC Persistent Memory for Best Performance

This video describes how to configure a 2nd Gen Intel® Xeon® Scalable platform with Optane DC persistent memory, and the factors to consider in order to enjoy the best performance.

Hi. I'm Dave Larsen at Intel. And today I'll explain several key factors that will help you get the best performance from platforms equipped with Intel® Optane™ DC Persistent Memory. It is a revolutionary new class of server memory that offers DRAM-like performance with the capacity and data persistence of storage. It works with our new 2nd Generation Intel® Xeon® Scalable processor.

DRAM memory and Intel® Optane™ Persistent Memory work together in the platform. Exactly how depends on which operating mode you select for the Intel Optane Persistent Memory. In Memory mode, all the data in memory is volatile, just like today. But you can take advantage of the affordable capacity of up to three terabytes per processor.

In App Direct mode, the application and OS are aware there are two different types of memory and can use them independently. If you want to use the persistent data features, you have to use App Direct mode and an enabled operating system or applications. For software that uses App Direct mode, it's best to consult with the vendor on the configuration they recommend for their application.

Most of the rest of our talk will be about how to get the best performance in Memory mode. There are four factors that we'll consider for best performance: correct DRAM ratio, slot configuration, processor core count, and workload behavior. In Memory mode, the DRAM acts as a fast cache for the most frequently used data. You want a DRAM cache that's big enough so the requested data is in the cache most of the time. Intel recommends a ratio of one gigabyte of DRAM for every eight gigabytes of Optane Persistent Memory. This should provide an adequate DRAM cache for most workloads when using Memory mode. For example, if you have one terabyte of Optane Persistent Memory, you'll want to have 128 gigabytes of DRAM for an 8 to 1 ratio.
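The sizing arithmetic above can be sketched in a few lines of Python. This is only an illustration of the 8 to 1 guideline from the talk; the function name `recommended_dram_gb` and its parameters are made up for this example, and it assumes one terabyte is counted as 1024 gigabytes.

```python
def recommended_dram_gb(pmem_gb, pmem_per_dram=8):
    """Return the DRAM capacity (GB) suggested by the 8 to 1 guideline.

    pmem_gb: installed Optane Persistent Memory capacity in GB.
    pmem_per_dram: GB of persistent memory per GB of DRAM (8 in the talk).
    """
    if pmem_gb < 0 or pmem_per_dram <= 0:
        raise ValueError("capacities must be non-negative and the ratio positive")
    return pmem_gb / pmem_per_dram


# The example from the talk: 1 TB (1024 GB) of persistent memory.
print(recommended_dram_gb(1024))  # 128.0
```

Treat the result as a starting point for Memory mode only; a workload with a very large working set may want more DRAM than this guideline gives.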

The second factor is the memory slot configuration. Each 2nd Generation Intel Xeon Scalable processor has two integrated memory controllers. Each controller has three memory channels. And each channel can support up to two memory slots. The configuration type is designated by the number of slots on each of a memory controller's three channels.

Here is a 2-2-2 config where each controller has two slots on each of its three memory channels. That's a total of 12 slots per processor. A 2-2-1 configuration has two slots on two channels and one slot on the third channel. That's 10 slots per processor. Here's a 2-1-1 configuration with eight slots per processor, and finally, a 1-1-1 with six total slots.
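As a quick check of the slot totals just listed, here is a small Python sketch. It assumes only what the talk states: two memory controllers per processor, three channels per controller, and one or two slots per channel. The function name `slots_per_processor` is invented for this example.

```python
CONTROLLERS_PER_PROCESSOR = 2  # stated in the talk


def slots_per_processor(config):
    """Count memory slots per processor for a config string such as '2-2-1'."""
    slots_per_channel = [int(n) for n in config.split("-")]
    if len(slots_per_channel) != 3 or any(n not in (1, 2) for n in slots_per_channel):
        raise ValueError("expected three channels with 1 or 2 slots each")
    # Each controller has the same three-channel layout.
    return CONTROLLERS_PER_PROCESSOR * sum(slots_per_channel)


for config in ("2-2-2", "2-2-1", "2-1-1", "1-1-1"):
    print(config, slots_per_processor(config))
# 2-2-2 12
# 2-2-1 10
# 2-1-1 8
# 1-1-1 6
```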

For best performance, you want to maximize the total memory bandwidth and the probability of a DRAM cache hit. To do this, place an Optane Persistent Memory module on the slot nearest the processor and a DRAM DIMM on the slot furthest from the processor. That means that a fast DRAM DIMM is available no matter which memory channel is accessed. Follow this rule on 2-2-2, 2-2-1, and 2-1-1 configurations.
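The placement rule can be written down as a small Python sketch. It covers only the 2-2-2, 2-2-1, and 2-1-1 configurations named above. One point is an assumption rather than something stated at this spot in the talk: a channel with a single slot is given a DRAM DIMM, which is inferred from the later remark that these configurations let you place DRAM on every channel. The function name and the "PMem" and "DRAM" labels are invented for this example.

```python
def populate_channels(config):
    """Return the modules for each channel of one memory controller.

    Each inner list is ordered from the slot nearest the processor
    to the slot furthest from it.
    """
    slots_per_channel = [int(n) for n in config.split("-")]
    if len(slots_per_channel) != 3 or any(n not in (1, 2) for n in slots_per_channel):
        raise ValueError("expected three channels with 1 or 2 slots each")
    if 2 not in slots_per_channel:
        raise ValueError("this rule applies to 2-2-2, 2-2-1, and 2-1-1 only")

    layout = []
    for slots in slots_per_channel:
        if slots == 2:
            # Persistent memory nearest the processor, DRAM furthest.
            layout.append(["PMem", "DRAM"])
        else:
            # Assumption: a single-slot channel holds DRAM.
            layout.append(["DRAM"])
    return layout


print(populate_channels("2-2-1"))
# [['PMem', 'DRAM'], ['PMem', 'DRAM'], ['DRAM']]
```

Always confirm the actual slot order and population rules in your server manufacturer's documentation, since slot labeling differs between boards.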

In the actual platform, the population looks like this. I've placed a DRAM DIMM in the appropriate slot of each channel; these are the blue slots. The Optane Persistent Memory modules go in the black slots on each channel. Keep in mind that the color will depend on your server manufacturer. You can spot the Optane modules by the silver clips on the integrated heat spreader.

In a 1-1-1 configuration, you can only populate DRAM on two of the three channels for each memory controller. This limits the total bandwidth, so the 1-1-1 has the greatest risk of performance degradation. In App Direct mode, this configuration may be fine since the application has full control over memory behavior and can compensate for the lower bandwidth.

If the processor has more than 20 cores, use a configuration where you can place a DRAM DIMM on every channel. Those are the 2-2-2, 2-2-1, and 2-1-1 configurations. The 1-1-1 configuration should only be used with processors with 20 cores or fewer to make sure the cores aren't starved for memory services.
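The core-count rule is simple enough to express as a lookup. The Python sketch below encodes only the 20-core threshold given in the talk; the function name `suitable_configs` is invented for this example, and it says nothing about other platform constraints.

```python
ALL_CONFIGS = ("2-2-2", "2-2-1", "2-1-1", "1-1-1")


def suitable_configs(core_count):
    """List slot configurations that fit the core-count guideline."""
    if core_count < 1:
        raise ValueError("core_count must be at least 1")
    if core_count > 20:
        # More than 20 cores: only configs with DRAM on every channel.
        return [c for c in ALL_CONFIGS if c != "1-1-1"]
    return list(ALL_CONFIGS)


print(suitable_configs(28))  # ['2-2-2', '2-2-1', '2-1-1']
print(suitable_configs(16))  # ['2-2-2', '2-2-1', '2-1-1', '1-1-1']
```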

The final factor is application behavior. The Intel Xeon processor uses a prediction algorithm to cache the most frequently used data in DRAM. If the application has moderate to high levels of predictability, the probability of successful caching increases. When that happens, the response latency is the same as DRAM. But if the data isn't in DRAM cache, the processor must fetch it from Optane Persistent Memory, which has a somewhat higher latency. If the application's data access pattern is highly random, you get more cache misses, and performance will be lower than if the system had been populated with only DRAM.
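One rough way to reason about this is a weighted average of hit and miss latency. The Python sketch below is a simplified model, not Intel's caching algorithm, and the latency figures in the usage lines are made-up placeholders rather than measured values. The function name and parameters are invented for this example.

```python
def average_latency_ns(hit_rate, dram_ns, miss_ns):
    """Estimate average memory latency in Memory mode.

    hit_rate: fraction of accesses served from the DRAM cache (0.0 to 1.0).
    dram_ns: latency of a DRAM cache hit (placeholder value, in ns).
    miss_ns: latency when data comes from persistent memory (placeholder, in ns).
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * dram_ns + (1.0 - hit_rate) * miss_ns


# Hypothetical latencies, chosen only to show the trend.
for hit_rate in (1.0, 0.75, 0.5):
    print(hit_rate, average_latency_ns(hit_rate, dram_ns=100, miss_ns=300))
```

With a hit rate of 1.0 the estimate equals the DRAM latency, and it climbs toward the miss latency as access patterns become more random, which matches the behavior described above.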

Most applications are predictable enough that performance with Optane Persistent Memory is very close to an all-DRAM configuration. Performance on a few benchmarks and applications can fall off noticeably, but those situations should be the minority. As with any new technology, you should test your applications to make sure that you'll meet your service level agreements.

So that's it. You should enjoy great performance with Optane Persistent Memory with the proper configuration focused on these four factors. You'll get more affordable capacity and much faster performance than storing persistent data on SSDs or hard drives. For more information on Intel Optane DC Persistent Memory, please visit us at intel.com/optane and see the links provided.