Use ipmctl to Debug Intel® Optane™ DC Persistent Memory Modules

ID 689403
Updated 6/26/2019
Version Latest
Public

Introduction

This article describes how to debug or further configure your Intel® persistent memory devices with ipmctl. ipmctl is an open source tool maintained by Intel and is available for download on GitHub*. With ipmctl, you can select operating modes, create goals, provision capacities, create regions, and much more. The most common ipmctl calls are described in our Quick Start Guide.

This article assumes you have basic knowledge of ipmctl and persistent memory programming concepts. If you’re just getting started, check out the Quick Start Guide first, and come back to this article for debugging assistance.

Discover Configuration

Show Topology

To see available resources, use the show -topology command. It displays both the Intel® Optane™ DC persistent memory modules and the DDR4 dual in-line memory modules (DIMMs) discovered in the system by enumerating the SMBIOS Type 17 tables. For more information, refer to the ACPI Specification v6.0, or see the Advanced Configuration and Power Interface Tables section of this article for NFIT table information.

Platform Configuration Details

You can learn many details about your configuration from looking at the platform configuration details (PCD) with the following command:

# ipmctl show -dimm 0x0001 -pcd

The tables that are shown when this command is run are:

  • Configuration Header
  • Current Config
  • Interleave Information
  • Identification Information x6
  • Conf Input
  • Conf Output
  • Partition Size Change
  • Interleave Information
  • Identification Information x6
  • Label Storage Area—Current Index
  • Label Storage Area—Labels

Advanced Configuration and Power Interface Tables

The following Advanced Configuration and Power Interface (ACPI) tables are available:

  • NFIT: The nonvolatile dual in-line memory module (NVDIMM) Firmware Interface Table
  • PCAT: The Platform Configuration Attributes Table
  • PMTT: The Platform Memory Topology Table

Shortened output from each command is shown below:

NFIT

# ipmctl show -system NFIT


---NVDIMM Firmware Interface Table---

Signature: NFIT

Length: 3296 bytes

Revision: 0x1

Checksum: 0x32

OEMID: INTEL

OEMTableID: S2600WF

OEMRevision: 0x2

CreatorID: INTL

CreatorRevision: 0x20091013

BwRegionTablesNum: 0

ControlRegionTablesNum: 12

FlushHintTablesNum: 12

InterleaveTablesNum: 24

NVDIMMRegionTablesNum: 24

SmbiosTablesNum: 0

SpaRangeTablesNum: 3

PlatformCapabilitiesTablesNum: 1

Type: 0x4

Length: 32 bytes

TypeEquals: ControlRegion

ControlRegionDescriptorTableIndex: 0x1

VendorId: 0x8980

DeviceId: 0x4151

Rid: 0x0

SubsystemVendorId: 0x8980

SubsystemDeviceId: 0x97a

SubsystemRid: 0x18

ValidFields: 0x1

ManufacturingLocation: 0xa2

ManufacturingDate: 0x3718

SerialNumber: 0x63110000

RegionFormatInterfaceCode: 0x301

NumberOfBlockControlWindows: 0x0

...


Type: 0x2

Length: 80 bytes

TypeEquals: Interleave

InterleaveStructureIndex: 0x9

NumberOfLinesDescribed: 0x10

LineSize: 0x100

LineOffset 0: 0x0

LineOffset 1: 0x3

LineOffset 2: 0x6

LineOffset 3: 0x9

LineOffset 4: 0xc

LineOffset 5: 0x3f

LineOffset 6: 0x42

LineOffset 7: 0x45

LineOffset 8: 0x48

LineOffset 9: 0x4b

LineOffset 10: 0x7e

LineOffset 11: 0x81

LineOffset 12: 0x84

LineOffset 13: 0x87

LineOffset 14: 0x8a

LineOffset 15: 0x8d

...


Type: 0x1

Length: 48 bytes

TypeEquals: NvDimmRegion

NfitDeviceHandle: 0x0001

NfitDeviceHandle.DimmNumber: 0x1

NfitDeviceHandle.MemChannel: 0x0

NfitDeviceHandle.MemControllerId: 0x0

NfitDeviceHandle.SocketId: 0x0

NfitDeviceHandle.NodeControllerId: 0x0

NvDimmPhysicalId: 0x28

NvDimmRegionalId: 0x0

SpaRangeDescriptionTableIndex: 0x1

NvdimmControlRegionDescriptorTableIndex: 0x1

NvDimmRegionSize: 0x3f00000000

RegionOffset: 0x0

NvDimmPhysicalAddressRegionBase: 0x10000000

InterleaveStructureIndex: 0x1

InterleaveWays: 0x6

NvDimmStateFlags: 0x34

...


Type: 0x0

Length: 56 bytes

TypeEquals: SpaRange

AddressRangeType: 66f0d379-b4f3-4074-ac43-0d3318b78cdb

SpaRangeDescriptionTableIndex: 0x1

Flags: 0x2

ProximityDomain: 0x2

SystemPhysicalAddressRangeBase: 0x3060000000

SystemPhysicalAddressRangeLength: 0x17a00000000

MemoryMappingAttribute: 0x8008

...
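The handle and size fields in the NvDimmRegion and SpaRange structures are easier to sanity-check once decoded. The following Python sketch splits an NFIT device handle into its fields using the bit layout from the ACPI NFIT definition (DIMM number in bits 3:0, memory channel in bits 7:4, memory controller in bits 11:8, socket in bits 15:12, node controller in bits 27:16) and converts the sizes shown above to GiB. The function names are illustrative and not part of ipmctl, and the 0x1121 example assumes that the DimmID matches the NFIT device handle, as it does for DIMM 0x0001 in the sample output.

```python
GIB = 1 << 30  # bytes per GiB


def decode_nfit_handle(handle):
    """Split an NFIT device handle into its bit fields (ACPI NFIT layout)."""
    return {
        "DimmNumber": handle & 0xF,
        "MemChannel": (handle >> 4) & 0xF,
        "MemControllerId": (handle >> 8) & 0xF,
        "SocketId": (handle >> 12) & 0xF,
        "NodeControllerId": (handle >> 16) & 0xFFF,
    }


def to_gib(size_in_bytes):
    """Convert a byte count to GiB."""
    return size_in_bytes / GIB


# Handle from the NvDimmRegion structure above.
print(decode_nfit_handle(0x0001))

# Assumption: a DimmID such as 0x1121 uses the same layout.
print(decode_nfit_handle(0x1121))

region_gib = to_gib(0x3F00000000)   # NvDimmRegionSize
spa_gib = to_gib(0x17A00000000)     # SystemPhysicalAddressRangeLength
print(region_gib, spa_gib, spa_gib / region_gib)
```

The ratio of the SPA range length to the per-module region size comes out to 6, which agrees with the InterleaveWays: 0x6 value in the same output.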



PCAT

# ipmctl show -system PCAT


---Platform Configurations Attributes Table---

Signature: PCAT

Length: 136 bytes

Revision: 0x2

Checksum: 0xae

OEMID: INTEL

OEMTableID: S2600WF

OEMRevision: 0x2

CreatorID: INTL

CreatorRevision: 0x20091013



Type: 0x0

Length: 16 bytes

TypeEquals: PlatformCapabilityInfoTable

IntelNVDIMMManagementSWConfigInputSupport: 0x1

MemoryModeCapabilities: 0x27

CurrentMemoryMode: 0x14

PersistentMemoryRASCapability: 0x0



Type: 0x1

Length: 16 bytes

TypeEquals: MemoryInterleaveCapabilityTable

MemoryMode: 0x3

InterleaveAlignmentSize: 0x1e

NumberOfInterleaveFormatsSupported: 0x1

InterleaveFormatSupported(0): 0x801f4040



Type: 0x6

Length: 32 bytes

SocketSkuInfoTable

SocketID: 0x0

MappedMemorySizeLimit: 4947802324992

TotalMemorySizeMappedToSpa: 1828582326272

CachingMemorySize: 0

...

PMTT

# ipmctl show -system PMTT



---Platform Memory Topology Table---

Signature: PMTT

Length: 1336 bytes

Revision: 0x1

Checksum: 0x9f

OEMID: INTEL

OEMTableID: S2600WF

OEMRevision: 0x1

CreatorID: INTL

CreatorRevision: 0x20091013



--------------------------Socket--------------------------

Type: 0

Reserved1: 0

Length: 324

Flags:3

Reserved2:0

SocketId: 0

Reserved3: 0

-------------------iMC-------------------

Type: 1

Reserved1: 0

Length: 156

Flags:2

Reserved2:0

ReadLatency: 0

WriteLatency: 0

ReadBW: 0

WriteBW:0

OptimalAccessUnit:0

OptimalAccessAlignment:0

Reserved3:0

NoOfProximityDomains:0

ProximityDomainArray:1

----MODULE----

Type: 2

Reserved1: 0

Length: 20

Flags:2

Reserved2:0

PhysicalComponentId: 0

Reserved3: 0

SizeOfDimm: 32768

----MODULE----

...

Health Monitoring

Show DIMM Information

The show -dimm command displays the Intel Optane DC persistent memory modules discovered in the system and verifies that software can communicate with them. Among other information, this command outputs each DIMM’s ID, capacity, health state, and firmware version:

# ipmctl show -dimm

Sensor Health States

ipmctl can report the health state of the sensors located on each persistent memory module. The available sensors are:

  • Health
  • MediaTemperature
  • ControllerTemperature
  • PercentageRemaining
  • LatchedDirtyShutdownCount
  • PowerOnTime
  • UpTime
  • PowerCycles
  • FwErrorCount
  • UnlatchedDirtyShutdownCount

Use the following command to see sensor health for a specific module. To see health values for all modules, omit the DimmID.

# ipmctl show -sensor -dimm 0x0001



DimmID | Type | CurrentValue | CurrentState

====================================================================

0x0001 | Health | Healthy | Normal

0x0001 | MediaTemperature | 33C | Normal

0x0001 | ControllerTemperature | 35C | Normal

0x0001 | PercentageRemaining | 100% | Normal

0x0001 | LatchedDirtyShutdownCount | 2 | Normal

0x0001 | PowerOnTime | 12944539s | Normal

0x0001 | UpTime | 2728s | Normal

0x0001 | PowerCycles | 80 | Normal

0x0001 | FwErrorCount | 8 | Normal

0x0001 | UnlatchedDirtyShutdownCount | 34 | Normal
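If you want to post-process this output in a script, the pipe-delimited table is straightforward to parse. The sketch below is one possible approach in Python, run against a few rows copied from the sample output above rather than a live system. The parse_table helper is a hypothetical name, and the code assumes the table layout shown here, which may differ between ipmctl versions.

```python
SAMPLE = """\
DimmID | Type | CurrentValue | CurrentState
====================================================================
0x0001 | Health | Healthy | Normal
0x0001 | MediaTemperature | 33C | Normal
0x0001 | PowerOnTime | 12944539s | Normal
0x0001 | UpTime | 2728s | Normal
"""


def parse_table(text):
    """Parse a pipe-delimited ipmctl table into a list of row dictionaries."""
    lines = [
        ln for ln in text.splitlines()
        if ln.strip() and not ln.strip().startswith("=")  # skip blanks and the ==== rule
    ]
    header = [col.strip() for col in lines[0].split("|")]
    return [
        dict(zip(header, (cell.strip() for cell in ln.split("|"))))
        for ln in lines[1:]
    ]


rows = parse_table(SAMPLE)
sensors = {row["Type"]: row["CurrentValue"] for row in rows}

# PowerOnTime is reported in seconds with an "s" suffix.
power_on_days = int(sensors["PowerOnTime"].rstrip("s")) / 86400
print(f"Health: {sensors['Health']}, powered on for {power_on_days:.1f} days")
```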

Percentage Life Remaining

The remaining life of a persistent memory module is based on the number of reads/writes left in its lifetime. Use the following command to see the percentage of life remaining on each module. In the example below, you can see that DIMM 0x0101 has 45 percent life remaining, and the rest have 100 percent.

# ipmctl show -sensor PercentageRemaining



DimmID | Type | CurrentValue | CurrentState

============================================================

0x0001 | PercentageRemaining | 100% | Normal

0x0011 | PercentageRemaining | 100% | Normal

0x0021 | PercentageRemaining | 100% | Normal

0x0101 | PercentageRemaining | 45% | Normal

0x0111 | PercentageRemaining | 100% | Normal

0x0121 | PercentageRemaining | 100% | Normal

0x1001 | PercentageRemaining | 100% | Normal

0x1011 | PercentageRemaining | 100% | Normal

0x1021 | PercentageRemaining | 100% | Normal

0x1101 | PercentageRemaining | 100% | Normal

0x1111 | PercentageRemaining | 100% | Normal

0x1121 | PercentageRemaining | 100% | Normal

In the same way, you can replace PercentageRemaining in this command with any of the other sensor types to see that sensor's value for every DIMM.

The 45 percent value on DIMM 0x0101 is the result of an injected error. You can read more about error injection in the Debugging section.
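When many modules are installed, it can help to filter this table for modules that are running low. The Python sketch below flags rows whose PercentageRemaining falls under a threshold, using three rows copied from the sample output. The default of 50 mirrors the recommended threshold that the FW diagnostic reports later in this article. The low_life_dimms name is hypothetical, and the table layout is assumed to match the sample.

```python
SAMPLE = """\
0x0001 | PercentageRemaining | 100% | Normal
0x0101 | PercentageRemaining | 45% | Normal
0x1121 | PercentageRemaining | 100% | Normal
"""


def low_life_dimms(text, threshold=50):
    """Return (DimmID, percent) pairs whose PercentageRemaining is below threshold."""
    flagged = []
    for line in text.splitlines():
        cols = [c.strip() for c in line.split("|")]
        # Only data rows for this sensor type have four columns and this label.
        if len(cols) == 4 and cols[1] == "PercentageRemaining":
            percent = int(cols[2].rstrip("%"))
            if percent < threshold:
                flagged.append((cols[0], percent))
    return flagged


print(low_life_dimms(SAMPLE))
```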

Change Sensor Thresholds

Each sensor has a threshold that defines its Normal range. You can also set your own threshold on a module, called the NonCriticalThreshold. For example, if you set the MediaTemperature NonCriticalThreshold to a value lower than the Normal range limit, you get a warning whenever the temperature rises above the value you specified. Set a sensor's threshold with the following command:

# ipmctl set -sensor MediaTemperature -dimm 0x0001 NonCriticalThreshold=51 EnabledState=1


Modifying settings on DIMM (0x0001).

Do you want to continue? [y/n] y

Modify media temperature settings on DIMM 0x0001: Success

Performance

Show Sensor Performance Per DIMM

Performance indicators can be displayed per DIMM, per indicator, or all at once. To see all the performance indicators for a single DIMM, use this command:

# ipmctl show -dimm 0x0001 -performance


---DimmID=0x0001---

 MediaReads=0x0000000000000000000000011dd1d084

 MediaWrites=0x0000000000000000000000001e877cc0

 ReadRequests=0x000000000000000000000000000959b7

 WriteRequests=0x0000000000000000000000000000974f

 TotalMediaReads=0x00000000000000000000008c4c411278

 TotalMediaWrites=0x0000000000000000000000523e0292f8

 TotalReadRequests=0x000000000000000000000006b0fd3128

 TotalWriteRequests=0x000000000000000000000007dd265020

Here is the full list of performance indicators:

  • DimmID: The Intel Optane DC persistent memory module identifier.
  • MediaReads: Number of 64-byte reads from media on the Intel Optane DC persistent memory module since the last alternating current (AC) cycle.
  • MediaWrites: Number of 64-byte writes to media on the Intel Optane DC persistent memory module since the last AC cycle.
  • ReadRequests: Number of DDRT read transactions that the Intel Optane DC persistent memory module has serviced since the last AC cycle.
  • WriteRequests: Number of DDRT write transactions that the Intel Optane DC persistent memory module has serviced since the last AC cycle.
  • TotalMediaReads: Number of 64-byte reads from the media on the Intel Optane DC persistent memory module over its lifetime.
  • TotalMediaWrites: Number of 64-byte writes to media on the Intel Optane DC persistent memory module over its lifetime.
  • TotalReadRequests: Number of DDRT read transactions that the Intel Optane DC persistent memory module has serviced over its lifetime.
  • TotalWriteRequests: Number of DDRT write transactions that the Intel Optane DC persistent memory module has serviced over its lifetime.
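The counters are printed as zero-padded hexadecimal values, so they usually need converting before they are useful. The Python sketch below parses a few lines from the sample output above and converts the media counters to bytes, relying on the 64-byte unit given in the list. The parse_counters name is hypothetical, and the key=value layout is assumed to match the sample.

```python
SAMPLE = """\
---DimmID=0x0001---
 MediaReads=0x0000000000000000000000011dd1d084
 MediaWrites=0x0000000000000000000000001e877cc0
 ReadRequests=0x000000000000000000000000000959b7
 WriteRequests=0x0000000000000000000000000000974f
"""

MEDIA_ACCESS_BYTES = 64  # MediaReads and MediaWrites count 64-byte accesses


def parse_counters(text):
    """Parse 'Name=0x...' lines into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("---"):
            continue  # skip blank lines and the ---DimmID=...--- header
        name, sep, value = line.partition("=")
        if sep:
            counters[name] = int(value, 16)
    return counters


counters = parse_counters(SAMPLE)
read_gib = counters["MediaReads"] * MEDIA_ACCESS_BYTES / (1 << 30)
write_gib = counters["MediaWrites"] * MEDIA_ACCESS_BYTES / (1 << 30)
print(f"Media reads since last AC cycle:  {read_gib:.1f} GiB")
print(f"Media writes since last AC cycle: {write_gib:.1f} GiB")
```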

Debugging

Discover Errors

The following commands help you debug errors on your modules. To view the error log, use the show -error command:

# ipmctl show -dimm 0x1111 -error Thermal Level=High

No errors found on DIMM 0x1111

Show error executed successfully

If an error is present, the output will be similar to:

# ipmctl show -dimm 0x0001 -error Media Level=High

Media Error occurred on DIMM 0x0001:

System Timestamp : Thu Jan 01 00:45:32 UTC 1998

DPA : 0x00012880

PDA : 0x00000001

Range : 4B

Error Type : 4 - Locked/Illegal Access

Error Flags : DPA Valid

Transaction Type : 10 - CSR Read

Sequence Number : 20

The -error option accepts either Thermal or Media, with a severity level of either High or Low.
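If you call this command from a script, validating the two arguments first avoids sending a malformed request. The Python sketch below builds the argument list for the command form shown above and rejects anything other than the documented category and level values. The show_error_argv helper is a hypothetical name, and the sketch only prints the command rather than running it, because running ipmctl requires the tool to be installed and typically needs root privileges.

```python
import shlex

VALID_CATEGORIES = {"Thermal", "Media"}
VALID_LEVELS = {"High", "Low"}


def show_error_argv(dimm_id, category, level):
    """Build the argv for 'ipmctl show -dimm <id> -error <category> Level=<level>'."""
    if category not in VALID_CATEGORIES:
        raise ValueError(f"category must be one of {sorted(VALID_CATEGORIES)}")
    if level not in VALID_LEVELS:
        raise ValueError(f"level must be one of {sorted(VALID_LEVELS)}")
    return ["ipmctl", "show", "-dimm", dimm_id, "-error", category, f"Level={level}"]


argv = show_error_argv("0x0001", "Media", "High")
print(shlex.join(argv))

# To execute on a system with ipmctl installed, pass argv to
# subprocess.run(argv, capture_output=True, text=True, check=True).
```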

Inject an Error

For testing purposes, you may want to inject a mock error onto your persistent memory modules. Injectable errors include: Temperature, Poison, PoisonType, PackageSparing, PercentageRemaining, FatalMediaError, and DirtyShutdown. It is important to note that this command is only available when error injection is enabled on the Intel Optane DC persistent memory module in the BIOS. Examples of each of these can be seen in the ipmctl-inject-error man pages.

To change the PercentageRemaining:

# ipmctl set -dimm 0x1001 PercentageRemaining=84

Trigger a percentage remaining on DIMM 0x1001: Success

To change the Temperature (Celsius) variable:

# ipmctl set -dimm 0x1111 Temperature=12

Set temperature on DIMM 0x1111: Success

To clear injected errors, specify which injection property (Temperature, Poison, PoisonType, PackageSparing, PercentageRemaining, FatalMediaError, or DirtyShutdown), and add Clear=1. For example, the following call clears all DIMMs of any injected Temperature changes:

# ipmctl set -dimm Clear=1 Temperature=1

This call clears only DIMM 0x1001 of the injected PercentageRemaining change:

# ipmctl set -dimm 0x1001 PercentageRemaining=10 Clear=1

Diagnose Further Problems

Use the start -diagnostic command to get a quick health overview of your persistent memory modules. After the -diagnostic flag, you can specify any of the following tests. If you specify none, all of them run.

  • Quick - This test verifies that the Intel Optane DC persistent memory module host mailbox is accessible and that basic health indicators can be read and are currently reporting acceptable values.
  • Config - This test verifies that the BIOS platform configuration matches the installed hardware, and the platform configuration conforms to best-known practices.
  • Security - This test verifies that all Intel Optane DC persistent memory modules have a consistent security state. It is a best practice to enable security on all Intel Optane DC persistent memory modules, rather than just some.
  • FW - This test verifies that all Intel Optane DC persistent memory modules of a given model have consistent FW installed and other FW modifiable attributes are set in accordance with best practices.

Note that the test does not have a means of verifying that the installed FW is the optimal version for a given Intel Optane DC persistent memory module model, just that it has been consistently applied across the system.

For example, the following command runs all the diagnostic tests for DIMM 0x0001:

# ipmctl start -diagnostic -dimm 0x0001


---Diagnostic=Quick---

State=Ok

Message=The quick health check detected that the firmware on DIMM 0x0001 experienced a dirty shutdown before its latest restart.

The quick health check succeeded.

---Diagnostic=Config---

State=Ok

Message=The platform configuration check succeeded.

---Diagnostic=Security---

State=Ok

Message=The security check succeeded.

---Diagnostic=FW---

State=Warning

Message=The firmware consistency and settings check detected that DIMM 0x0001 is greater than system time by 21 seconds.

The firmware consistency and settings check detected that DIMM 0x0011 is greater than system time by 22 seconds.

The firmware consistency and settings check detected that DIMM 0x0021 is greater than system time by 23 seconds.

The firmware consistency and settings check detected that DIMM 0x0101 is reporting a percentage remaining of 45% which is below the recommended threshold 50%

The firmware consistency and settings check detected that DIMM 0x0101 is greater than system time by 22 seconds.

The firmware consistency and settings check detected that DIMM 0x0111 is greater than system time by 22 seconds.

The firmware consistency and settings check detected that DIMM 0x0121 is greater than system time by 22 seconds.

The firmware consistency and settings check detected that DIMM 0x1001 is greater than system time by 22 seconds.

The firmware consistency and settings check detected that DIMM 0x1011 is greater than system time by 22 seconds.

The firmware consistency and settings check detected that DIMM 0x1021 is greater than system time by 22 seconds.

The firmware consistency and settings check detected that DIMM 0x1101 is greater than system time by 22 seconds.

The firmware consistency and settings check detected that DIMM 0x1111 is greater than system time by 22 seconds.

The firmware consistency and settings check detected that DIMM 0x1121 is greater than system time by 23 seconds.
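Because the FW test can emit one message per DIMM, a short summary of the State of each test is often easier to scan. The Python sketch below extracts the test name and State from output in the format shown above, using an abbreviated copy of the sample. The diagnostic_states name is hypothetical, and the section header format is assumed to match the sample.

```python
import re

SAMPLE = """\
---Diagnostic=Quick---
State=Ok
Message=The quick health check detected that the firmware on DIMM 0x0001 experienced a dirty shutdown before its latest restart.
The quick health check succeeded.
---Diagnostic=Config---
State=Ok
Message=The platform configuration check succeeded.
---Diagnostic=Security---
State=Ok
Message=The security check succeeded.
---Diagnostic=FW---
State=Warning
Message=The firmware consistency and settings check detected that DIMM 0x0001 is greater than system time by 21 seconds.
"""

HEADER = re.compile(r"---Diagnostic=(\w+)---")


def diagnostic_states(text):
    """Map each diagnostic test name to its reported State."""
    states = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        header = HEADER.fullmatch(line)
        if header:
            current = header.group(1)
        elif current and line.startswith("State="):
            states[current] = line.split("=", 1)[1]
    return states


states = diagnostic_states(SAMPLE)
not_ok = [name for name, state in states.items() if state != "Ok"]
print(states)
print("Tests needing attention:", not_ok)
```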

Security

Firmware Version

Show information about the firmware on one or more DIMMs:

# ipmctl show -firmware

DimmID | ActiveFWVersion | StagedFWVersion

============================================

0x0001 | 01.02.00.5310 | N/A

0x0011 | 01.02.00.5310 | N/A

0x0021 | 01.02.00.5310 | N/A

0x0101 | 01.02.00.5310 | N/A

0x0111 | 01.02.00.5310 | N/A

0x0121 | 01.02.00.5310 | N/A

0x1001 | 01.02.00.5310 | N/A

0x1011 | 01.02.00.5310 | N/A

0x1021 | 01.02.00.5310 | N/A

0x1101 | 01.02.00.5310 | N/A

0x1111 | 01.02.00.5310 | N/A

0x1121 | 01.02.00.5310 | N/A
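A quick way to confirm that every module runs the same firmware is to group the DimmIDs by ActiveFWVersion. The Python sketch below does this for three rows copied from the sample output. The dimms_by_firmware name is hypothetical, and the three-column layout is assumed to match the sample.

```python
from collections import defaultdict

SAMPLE = """\
DimmID | ActiveFWVersion | StagedFWVersion
============================================
0x0001 | 01.02.00.5310 | N/A
0x0011 | 01.02.00.5310 | N/A
0x0101 | 01.02.00.5310 | N/A
"""


def dimms_by_firmware(text):
    """Group DimmIDs by their active firmware version."""
    groups = defaultdict(list)
    for line in text.splitlines():
        cols = [c.strip() for c in line.split("|")]
        # Data rows have three columns and start with a hexadecimal DimmID.
        if len(cols) == 3 and cols[0].startswith("0x"):
            groups[cols[1]].append(cols[0])
    return dict(groups)


groups = dimms_by_firmware(SAMPLE)
if len(groups) == 1:
    print("All DIMMs run the same firmware:", next(iter(groups)))
else:
    print("Mixed firmware versions:", groups)
```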

Update Firmware

Update firmware on one or more DIMMs with the following command. To update all DIMMs, omit the -dimm option so that no DIMM is specified.

# ipmctl load -source (path) -dimm 0x0101

Firmware Debug Log

Dump the firmware debug log to a specified file destination using the following command:

# ipmctl dump -destination (file) -debug -dimm 0x0001

Display CLI Version

Display the ipmctl command-line version with the following command:

# ipmctl version

Intel(R) Optane(TM) DC Persistent Memory Command Line Interface Version 01.00.00.3402

Conclusion

ipmctl is a powerful tool for configuring and managing Intel Optane DC persistent memory modules. This article outlines some of the most common ipmctl debugging and configuration commands for learning more about your modules. The full ipmctl command reference can be found in the man pages or by typing ipmctl help at any time.

Resources

Man pages

Quick Start Guide

ipmctl GitHub
