Machine Check Error Avoidance on Page Size Change

ID 662881
Updated 11/12/2019
Version 2.0
Public

Refer to the disclosure and list of affected processors for more information on CVE-2018-12207. 

Intel processors that support machine check architecture have a mechanism to detect and report hardware errors (such as system bus errors, ECC errors, parity errors, cache errors, and TLB errors) to system software. The machine check architecture enables the processor to generate a machine check exception to signal the detection of a machine check error. This allows system software and OS developers to gracefully shut down the system when hardware error conditions are detected on the platform.

Recently, Intel discovered that a previously published erratum on some Intel platforms can be exploited by malicious software to potentially cause a denial of service by triggering a machine check that will crash or hang the system. The error code logged for this machine check is 0150H.
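As a minimal sketch of how system software might recognize this error code in a logged machine check, the snippet below tests a raw IA32_MCi_STATUS value. It assumes the architectural layout in which bit 63 (VAL) marks the register as holding a valid error and bits 15:0 hold the MCA error code (MCACOD). The function name and constants are illustrative only and do not come from any Intel-provided API. A match should be treated as a hint rather than a definitive diagnosis.

```python
MCI_STATUS_VAL = 1 << 63           # bit 63: register contains a valid error
MCACOD_MASK = 0xFFFF               # bits 15:0: architectural MCA error code
PAGE_SIZE_CHANGE_MCACOD = 0x0150   # error code named in this article


def is_page_size_change_mce(mci_status: int) -> bool:
    """Return True if a raw IA32_MCi_STATUS value is valid and logs MCACOD 0150H."""
    if not mci_status & MCI_STATUS_VAL:
        return False               # nothing valid is logged in this bank
    return (mci_status & MCACOD_MASK) == PAGE_SIZE_CHANGE_MCACOD
```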

This article describes the conditions for the machine check error erratum, details how the 0150H error can be exploited to potentially enable a denial of service, and describes methods to mitigate this issue. Refer to the List of Affected Processors section for a full list of affected processors and errata.

CVE and CVSS

The CVE assigned for the page size change MCE vulnerability is CVE-2018-12207 with a CVSS score of 6.5.

Description of Vulnerability

Architectural Background

Modern OSes use paging for program execution and memory accesses. The processor translates the linear addresses of a program's memory accesses (for example, instruction fetches or data accesses) to physical addresses using page tables.

Page tables are data structures in memory used by the processor to translate linear addresses to physical addresses. Translation happens at a fixed granularity, known as the page size: every linear address within a page shares the same translation to a physical page. Intel processors support page sizes of 4 KB, 2 MB, 4 MB, and 1 GB. More details on paging can be found in the Intel® 64 and IA-32 Architectures Software Developer Manuals Vol. 3, Section 4.9 Paging and Memory Typing.
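To illustrate what page size means for an address, the following sketch splits a linear address into a page base and an offset for each supported page size. The helper name and the sample address are made up for illustration. The arithmetic relies only on page sizes being powers of two.

```python
PAGE_SIZES = {
    "4 KB": 4 << 10,
    "2 MB": 2 << 20,
    "4 MB": 4 << 20,
    "1 GB": 1 << 30,
}


def split_linear_address(addr: int, page_size: int) -> tuple[int, int]:
    """Return (page base, offset within page) for a power-of-two page size."""
    offset = addr & (page_size - 1)   # low bits select the byte within the page
    return addr - offset, offset      # remaining high bits identify the page


for name, size in PAGE_SIZES.items():
    base, offset = split_linear_address(0x12345678, size)
    print(f"{name}: base={base:#x} offset={offset:#x}")
```

The same linear address falls at a different offset in a larger page, which is why a translation cached for one page size cannot be reused for another.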

To save the performance cost of walking through the page tables on each linear memory access within the same page, the processor uses a hardware structure called the translation lookaside buffer (TLB) to cache resolved translations from linear addresses to physical addresses. The processor can use the cached translations instead of performing a page table walk on the next access to a linear address within the same page. The page size change MCE issue is related to the instruction TLB (ITLB) that contains the code fetch address translations.

Privileged system software, like the OS or VMM, is responsible for platform memory management and thus owns the page tables. The OS or VMM may modify the page tables to create a new translation or to alter an existing translation for a linear address. When the OS or VMM modifies an existing page table translation and the TLB already contains a translation for that page, the processor may use either translation (that is, the stale one cached in the TLB or the new one from the page table in memory). The implications of this software-induced inconsistency are described in the Intel® 64 and IA-32 Architectures Software Developer Manuals Vol. 3, Section 4.10.4.4 Delayed Invalidation.
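The inconsistency can be pictured with a toy model in which a dictionary stands in for the page table and another for the TLB. This is purely illustrative: the class and method names are hypothetical, and the model always prefers the cached entry, whereas a real processor may use either translation.

```python
class ToyMMU:
    """Toy translation model: page number -> frame number, with a cache."""

    def __init__(self):
        self.page_table = {}   # stands in for the in-memory paging structures
        self.tlb = {}          # stands in for cached translations

    def map(self, vpn: int, pfn: int) -> None:
        self.page_table[vpn] = pfn       # note: does NOT touch the TLB

    def translate(self, vpn: int) -> int:
        if vpn in self.tlb:              # cached translation wins in this model
            return self.tlb[vpn]
        pfn = self.page_table[vpn]       # "page walk"
        self.tlb[vpn] = pfn
        return pfn

    def invalidate(self, vpn: int) -> None:
        self.tlb.pop(vpn, None)


mmu = ToyMMU()
mmu.map(vpn=5, pfn=100)
first = mmu.translate(5)    # walks the table and caches 5 -> 100
mmu.map(vpn=5, pfn=200)     # software edits the table without invalidating
stale = mmu.translate(5)    # still served from the cached entry
mmu.invalidate(5)
fresh = mmu.translate(5)    # re-walked after invalidation
```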

Description of Errata

A series of errata has been filed for the instruction fetch unit in multiple generations of Intel processors. The full list is available in the List of Affected Processors section. An example for 6th generation Intel® Core™ processors is shown below.

Table 1. SKL002
SKL002: Instruction Fetch May Cause Machine Check if Page Size Was Changed Without Invalidation

Problem: This erratum may cause a machine-check error (IA32_MCi_STATUS.MCACOD=0150H) on the fetch of an instruction that crosses a 4 KB address boundary. It applies only if all of the following are true:
  1. The 4 KB linear region on which the instruction begins is originally translated using a 4 KB page with the WB memory type.
  2. The paging structures are later modified so that the linear region is translated using a large page (2 MB, 4 MB, or 1 GB) with the UC memory type.
  3. The instruction fetch occurs after the paging structure modification but before software invalidates any TLB entries for the linear region.

Implication: Due to this erratum an unexpected machine check with error code 0150H may occur, possibly resulting in a shutdown. Intel has not observed this erratum with any commercially available software.

Workaround: Software should not write to a paging-structure entry in a way that would change, for any linear address, both the page size and the memory type. It can instead use the following algorithm: first clear the P flag in the relevant paging-structure entry (for example, PDE); then invalidate any translations for the affected linear addresses; and then modify the relevant paging-structure entry to set the P flag and establish the new page size and memory type.
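The ordering in the workaround can be sketched on a toy paging-structure entry. This is a simplified illustration, not kernel code: the Entry class, its fields, and the invalidate_tlb callback are hypothetical stand-ins for a real PDE and for whatever invalidation primitive the system software uses.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Entry:
    """Toy stand-in for a paging-structure entry (for example, a PDE)."""
    present: bool
    page_size: str
    mem_type: str


def remap_safely(entry: Entry, invalidate_tlb: Callable[[], None],
                 new_size: str, new_type: str) -> None:
    entry.present = False         # 1. clear the P flag
    invalidate_tlb()              # 2. invalidate translations for the region
    entry.page_size = new_size    # 3. establish the new page size and
    entry.mem_type = new_type     #    memory type, then set the P flag
    entry.present = True
```

The point of the ordering is that no instruction fetch can observe the new page size while an old translation may still be cached, because the entry is not present during the window between the change and the invalidation.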

Software sequences that may lead to machine check error code 0150H can be summarized as follows:

  1. Code is fetched from a linear address translated using a 4 KB translation cached in the ITLB.
  2. Software modifies the paging structures so that the same linear address is translated using a large page (2 MB, 4 MB, or 1 GB) with a different physical address or memory type.
  3. After the paging structure modification, but before software invalidates any ITLB entries for the linear address, code fetch happens again on the same linear address.
  4. This may cause a machine-check error (IA32_MCi_STATUS.MCACOD=0150H), which can result in a system hang or shutdown.

These errata are applicable only to code pages. There is no known issue with the data pages in similar page size change conditions.
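The sequence summarized above can be written as a simple predicate, which may be useful when reasoning about a trace of paging operations. This is only a restatement of the listed conditions under assumed inputs. The function and parameter names are hypothetical, and a True result means the conditions for the erratum are met, not that a machine check will necessarily occur.

```python
LARGE_PAGE_SIZES = {"2 MB", "4 MB", "1 GB"}


def code_fetch_may_hit_erratum(itlb_page_size: str,
                               new_page_size: str,
                               mapping_changed: bool,
                               itlb_invalidated: bool) -> bool:
    """Check the conditions for a code fetch from one linear address.

    itlb_page_size   -- page size of the translation cached in the ITLB
    new_page_size    -- page size after the paging-structure modification
    mapping_changed  -- physical address or memory type differs after the change
    itlb_invalidated -- software invalidated the ITLB entry before the fetch
    """
    return (itlb_page_size == "4 KB"
            and new_page_size in LARGE_PAGE_SIZES
            and mapping_changed
            and not itlb_invalidated)
```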

Implications for Bare Metal OS

Applications cannot cause an OS to make changes to page tables that would trigger the conditions described in the erratum. Intel has worked with industry partners to ensure that OSes follow the guidelines documented in the Intel® 64 and IA-32 Architectures Software Developer Manuals. There is no known security vulnerability created by this erratum in bare metal OS environments.

Implications for VMM/Hypervisor

In virtualized environments, a malicious guest has direct access to its own page tables and could make changes to them in ways that would trigger this issue. Note that when an off-the-shelf commercial operating system is used in a Platform as a Service (PaaS) or Software as a Service (SaaS) environment, the applications in that environment do not have the necessary access to force the OS to make changes to page tables that would trigger the issue. Infrastructure as a Service (IaaS) cloud services allow users to provision their own operating system and other privileged software, like device drivers. In this case, the privileged guest software is outside of the trusted computing base (TCB) of the host cloud service provider and could potentially contain malicious code sequences.

Mitigation Strategy for VMM/Hypervisor

The page size change MCE issue can be mitigated by applying software algorithms to the VMM/hypervisor. This section describes:

  1. How to determine if a CPU is vulnerable.
  2. Mitigation algorithms that can be applied in VMM/hypervisor software.

Enumeration of Vulnerability

Intel has added a new bit in the IA32_ARCH_CAPABILITIES MSR to current and future generation CPUs to help VMMs and hypervisor software determine if the processor is vulnerable to the page size change MCE issue. Your system may need the latest microcode updates (MCUs) applied to correctly detect the vulnerability.

Table 2. IA32_ARCH_CAPABILITIES, MSR Address: 10AH. Presence enumerated with CPUID.(EAX=07H,ECX=0H).EDX[29]
Bit | Scope | Attribute | Description
6 | Thread | Read only | IF_PSCHANGE_MC_NO: The processor is not susceptible to a machine check error due to modifying the size of a code page without TLB invalidation.
  • If CPUID.(EAX=07H,ECX=0H).EDX[29] is set, then the IA32_ARCH_CAPABILITIES MSR (address 10AH) exists. If IA32_ARCH_CAPABILITIES[IF_PSCHANGE_MC_NO] (bit 6) is set, then the CPU is not vulnerable to the page size change MCE issue.
  • If the IA32_ARCH_CAPABILITIES MSR does not exist, or if IF_PSCHANGE_MC_NO is clear, then the CPU may be vulnerable to the page size change MCE issue.
    • Determining whether a CPU is vulnerable in this case requires checking the CPUID Family/Model/Stepping against the list of affected processors in the List of Affected Processors section.
    • Alternatively, a list of unaffected processors can be used to determine the vulnerability of a CPU. This list and a suggested algorithm are also shown in the Unaffected Processors section.

Pseudocode that represents the algorithms is shown below:

If (CPUID.(EAX=07H,ECX=0H).EDX[29] == 1 && IA32_ARCH_CAPABILITIES[IF_PSCHANGE_MC_NO] == 1) {
    CPU is not vulnerable; no mitigation needed
} else {
    CPU may be vulnerable
    Consult CPUID Family/Model/Stepping for further determination
    Use SW mitigation if needed
}
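The pseudocode above can be expressed as a small function over raw register values. This is a sketch under stated assumptions: the caller supplies the EDX value returned by CPUID.(EAX=07H,ECX=0H) and a callable that reads MSR 10AH. Both the function name and the read_arch_capabilities parameter are hypothetical, and how the registers are actually read (for example, from a kernel or hypervisor context) is outside the scope of the example.

```python
from typing import Callable

ARCH_CAP_ENUM_BIT = 29       # CPUID.(EAX=07H,ECX=0H).EDX[29]
IF_PSCHANGE_MC_NO_BIT = 6    # IA32_ARCH_CAPABILITIES bit 6


def may_be_vulnerable(cpuid_7_0_edx: int,
                      read_arch_capabilities: Callable[[], int]) -> bool:
    """Return False only when the CPU enumerates IF_PSCHANGE_MC_NO.

    The MSR reader is a callable so that the MSR is touched only after
    CPUID reports that IA32_ARCH_CAPABILITIES exists.
    """
    if (cpuid_7_0_edx >> ARCH_CAP_ENUM_BIT) & 1:
        caps = read_arch_capabilities()
        if (caps >> IF_PSCHANGE_MC_NO_BIT) & 1:
            return False     # not vulnerable; no mitigation needed
    # May be vulnerable: consult CPUID Family/Model/Stepping next.
    return True
```

A True result does not by itself mean the CPU is affected. It means the Family/Model/Stepping check against the list of affected processors is still required.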

Software Mitigation Algorithms

VMMs can mitigate the page size change MCE vulnerability by using the software techniques described below.

The VMM can use Extended Page Tables (EPT) to enforce that every guest physical address is mapped with a 4 KB page, so that guest software cannot change the hardware page size used for translations. Running guests with EPT configured in this way protects the hardware from the page size change MCE vulnerability, but guests may see performance impacts from longer page walks and reduced TLB coverage.

One approach to minimize these impacts is to limit the use of 4 KB entries to only those locations where they are strictly required to mitigate the MCE issue. Since only executable translations can be cached in the ITLB, 2 MB, 4 MB, and 1 GB page EPT entries may be used as long as they are always marked nonexecutable. The following example shows how VMMs can accomplish this using a 2 MB large page:

Figure 1. Converting large pages using EPT

  1. Clear the execute permission (bit 2) in the EPT paging-structure entries for large pages. Usually the VMM sets all the permissions there (bits 2:0), that is, execute access (X), write access (W), and read access (R).
  2. Intercept an EPT execution-permission violation caused by an attempt to execute an instruction in the guest.
  3. Convert the large page into 4 KB pages, adding a new EPT page table. The exit-qualification field contains sufficient information for the VMM to determine whether the guest access was an instruction fetch. The guest-physical address field holds the guest-physical address that caused the EPT violation, which should be the guest-physical address of the instruction.
  4. Add execution permission (bit 2) to the new EPT page table entry (PTE) and page directory entry (PDE) for those 4 KB pages where the instruction is located.
  5. Resume execution of the guest software (VM entry).

The sequence above requires bit 10 (execute access for user-mode linear addresses) to be treated in the same manner as bit 2 when mode-based execute control is active (that is, the mode-based execute control for EPT feature is present and the corresponding VM-execution control is set to 1).

As soon as the guest software attempts to execute code mapped in a 2 MB, 4 MB, or 1 GB page, the VMM gets a notification of an EPT violation due to lack of execution permission. The exception handler in the VMM can convert the large page into 4 KB pages. When the VMM resumes the guest software, the software can continue to run, and the page size information obtained by the ITLB will be 4 KB.
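The conversion steps above can be sketched on a toy EPT, modeled as a dictionary from page base address to a (size, permissions) pair. This is an illustration of the bookkeeping only. The data structure and function name are hypothetical, host-physical addresses are omitted, and the model ignores bit 10 (mode-based execute control) and any invalidation of cached EPT translations that a real VMM would need to handle.

```python
EPT_R, EPT_W, EPT_X = 1 << 0, 1 << 1, 1 << 2   # permission bits 0, 1, 2
SIZE_4K = 4 << 10
SIZE_2M = 2 << 20


def split_on_exec_violation(ept: dict, fault_gpa: int) -> None:
    """Split the non-executable 2 MB page containing fault_gpa into 4 KB pages.

    Only the 4 KB page that holds the faulting instruction gains execute
    permission; the other 511 pages keep the original permissions.
    """
    base = fault_gpa & ~(SIZE_2M - 1)
    size, perms = ept[base]
    if size != SIZE_2M or perms & EPT_X:
        raise ValueError("expected a non-executable 2 MB mapping")
    del ept[base]
    exec_page = fault_gpa & ~(SIZE_4K - 1)
    for gpa in range(base, base + SIZE_2M, SIZE_4K):
        new_perms = (perms | EPT_X) if gpa == exec_page else perms
        ept[gpa] = (SIZE_4K, new_perms)


ept = {0x200000: (SIZE_2M, EPT_R | EPT_W)}   # step 1: large page without X
split_on_exec_violation(ept, 0x201234)        # steps 2-4: handle the violation
```

After the call, the toy EPT holds 512 entries of 4 KB each, and only the entry at 0x201000 is executable, so any translation the ITLB caches for that code is a 4 KB translation.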

The large page used by the EPT paging structures can be a 1 GB or 2 MB page. To resolve such EPT violations, the VMM can convert the executable 2 MB area into 4 KB pages and add execution permission. If the large page is 1 GB, the VMM can, as an intermediate step, convert the nonexecutable areas into 2 MB pages without execute permission instead of converting the entire 1 GB region into 4 KB pages. Because conversion is driven by the EPT violations described above, only code pages are selectively converted to 4 KB pages.
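A two-level version of the split for a 1 GB mapping might look like the following sketch. It uses a toy EPT in which a dictionary maps a page base address to a (size, permissions) pair. The helper names are hypothetical, and host-physical addresses and invalidation are again left out.

```python
EPT_X = 1 << 2
SIZE_4K, SIZE_2M, SIZE_1G = 4 << 10, 2 << 20, 1 << 30


def split(ept: dict, base: int, child_size: int) -> None:
    """Replace the mapping at base with equally permissioned smaller pages."""
    size, perms = ept.pop(base)
    for gpa in range(base, base + size, child_size):
        ept[gpa] = (child_size, perms)


def handle_exec_violation_1g(ept: dict, fault_gpa: int) -> None:
    # Intermediate step: 1 GB -> 512 non-executable 2 MB pages.
    split(ept, fault_gpa & ~(SIZE_1G - 1), SIZE_2M)
    # Only the 2 MB area containing the code is broken down to 4 KB pages.
    split(ept, fault_gpa & ~(SIZE_2M - 1), SIZE_4K)
    # Grant execute permission on the single 4 KB page being executed.
    page = fault_gpa & ~(SIZE_4K - 1)
    size, perms = ept[page]
    ept[page] = (size, perms | EPT_X)


ept_1g = {0x0: (SIZE_1G, 0b011)}          # read/write, not executable
handle_exec_violation_1g(ept_1g, 0x601234)
```

In this model the 1 GB region ends up as 511 pages of 2 MB plus 512 pages of 4 KB, rather than 262,144 pages of 4 KB, which preserves most of the large-page TLB coverage for data.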

One notable consequence of the above algorithm is that the number of 4 KB pages can grow steadily as the guest software runs more workloads. If the pages used for code execution are scattered randomly across guest memory, large pages are converted to 4 KB pages probabilistically, which would eventually start to affect workload performance. To avoid this situation, the VMM can periodically scan the EPT paging structures at runtime to find 4 KB pages and clean them up by combining 512 of them into a 2 MB page with the execute permission bit cleared.
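The periodic cleanup can be sketched as a scan over a toy EPT (a dictionary from page base address to a (size, permissions) pair). The sketch below merges every fully populated, 2 MB-aligned run of 4 KB pages whose read/write permissions agree. The function name is hypothetical, and the policy for deciding which regions are worth collapsing (for example, code that has not run recently) is an assumption left to the VMM. Host-physical contiguity and invalidation are not modeled.

```python
EPT_X = 1 << 2
SIZE_4K, SIZE_2M = 4 << 10, 2 << 20


def collapse_4k_runs(ept: dict) -> int:
    """Merge complete runs of 512 4 KB pages into non-executable 2 MB pages.

    Returns the number of 2 MB pages created.
    """
    merged = 0
    bases = {gpa & ~(SIZE_2M - 1)
             for gpa, (size, _) in ept.items() if size == SIZE_4K}
    for base in sorted(bases):
        pages = range(base, base + SIZE_2M, SIZE_4K)
        if not all(ept.get(gpa, (0, 0))[0] == SIZE_4K for gpa in pages):
            continue                      # region is not fully 4 KB mapped
        perms = {ept[gpa][1] & ~EPT_X for gpa in pages}
        if len(perms) != 1:
            continue                      # R/W permissions differ; keep split
        for gpa in pages:
            del ept[gpa]
        ept[base] = (SIZE_2M, perms.pop())   # execute bit cleared
        merged += 1
    return merged
```

Because the merged page is non-executable, the next instruction fetch from it raises an EPT violation and the page is split again on demand.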

Mitigation Enabling Guidelines

Intel has been working with industry partners to make necessary software changes to enable software mitigations. It is possible that your latest system software or VMM/hypervisor already contains a security mitigation that is either enabled by default or can be optionally enabled. Intel strongly recommends that users keep their systems up-to-date with the latest software, and also recommends confirming the mitigation status of your VMM/hypervisor with your software provider to ensure that mitigations are properly enabled in exposed environments. Some VMMs/hypervisors may support configuring these mitigations on a per-VM basis. The guidelines below will help you evaluate your VM environment.

As described in earlier sections, it is important to note that the page size change MCE vulnerability applies only to certain environments. The following general guidelines can help you determine whether it is necessary to enable the mitigation in your environment. This decision can be highly dependent on the actual environment and requirements of your system.

  • In bare-metal environments:
    • While there were a few reports of commercial off-the-shelf software that triggered this erratum, Intel has worked with industry partners and OS developers to update software to follow Intel's software guidelines and to release fixes. Intel recommends users update their OSes with the latest patches. Intel is not aware of any security issues caused by the page size change MCE condition in bare-metal environments.
  • In virtualized environments:
    • Intel suggests enabling the VMM mitigation for those VM guests whose privileged software stacks and OS are not trusted.
    • Guest VMMs in nested virtualization environments don’t need to activate the software mitigations because the root VMM virtualizes EPT for the guest software. Accordingly, the root VMM would enforce the software mitigation using the real EPT, while advertising the IA32_ARCH_CAPABILITIES MSR with IF_PSCHANGE_MC_NO set. Enabling the software mitigation in the nested guest VMMs would be ineffective.

It is possible to unnecessarily enable the software mitigation due to incorrect diagnosis of the vulnerability (described in the Enumeration of Vulnerability section). However, the suggested software techniques in the Software Mitigation Algorithms section are compatible with non-vulnerable CPUs and should not cause functional issues.

Applying the software mitigation described here may result in reduced TLB coverage and increased TLB fill costs, which may affect system behavior.

Additional Considerations for CPU model 06_2DH

On certain processors (family model 06_2DH) with Intel® Virtualization Technology (Intel® VT) for Directed I/O (Intel® VT-d), Intel has observed performance degradation when using 2 MB pages for Intel® VT-d translations. Intel recommends only using 4 KB pages for Intel VT-d.

Furthermore, Intel has observed that when sharing page tables between EPT and Intel VT-d, breaking up 2 MB mappings into 4 KB mappings while DMA is in progress may result in spurious unrecoverable Intel VT-d faults on this processor family.

Therefore, Intel suggests an additional mitigation step for this processor family to ensure that EPT tables and Intel VT-d tables are not shared.

List of Affected Processors

Processors Affected: Machine Check Error Avoidance on Page Size Change

 
