Discover Persistent Memory Programming Errors with Pmemcheck

Updated 12/7/2018

Introduction

Persistent memory programming introduces new opportunities as well as new challenges. For example, developers need to be aware of errors due to improper handling of the data placed in persistent memory. This article covers potential pitfalls and the tools available for eliminating these errors.

Pmemcheck is a new Valgrind* tool developed by Intel, very similar to memcheck (the default tool in Valgrind to discover memory-related bugs) but adapted for persistent memory. All the libraries that are part of the Persistent Memory Developer Kit (PMDK) are already instrumented with pmemcheck. If you use PMDK for persistent memory programming, you will be able to easily check your code with pmemcheck without any code modification.

Note: Pmemcheck is not the only option for debugging persistent memory code. Intel also offers Persistence Inspector, a component of Intel® Inspector that is available as part of Intel® Parallel Studio XE and Intel® System Studio. To learn more, please refer to How to Detect Persistent Memory Programming Errors Using Intel® Inspector – Persistent Inspector.

This article assumes that you have a basic understanding of persistent memory concepts and are familiar with general PMDK features. If not, please visit Persistent Memory Programming on Intel® Developer Zone, where you will find the information you need to get started.

Valgrind* Framework

According to its website, Valgrind is:

“…an instrumentation framework for building dynamic analysis tools. There are Valgrind tools that can automatically detect many memory management and threading bugs, and profile your programs in detail. You can also use Valgrind to build new tools.”

The two simple examples below will give you a better idea of how Valgrind works.

Out-of-bound example

The out-of-bound bug is a special case of the stack/buffer overflow bug, where data is written or read beyond the capacity of the stack or array. Consider the following small code snippet:

#include <stdlib.h>
int main (void)
{
        int *stack = malloc (100*sizeof(int));
        stack[100] = 1234;
        free (stack);
        return 0;
}

As shown, there is an error in the second statement of the main() function, which assigns the value 1234 to position 100, outside the array (valid positions are 0-99). If you compile and run this code, it may not fail. Although we only requested 400 bytes (100 integers) for the array, memory is obtained from the operating system in whole pages, normally 4 KB each, so the stray write usually lands in memory that is still mapped. Run Valgrind to see what it discovers:

$ valgrind ./stackoverflow
==4188== Memcheck, a memory error detector
...
==4188== Invalid write of size 4
==4188==    at 0x400556: main (stackoverflow.c:5)
==4188==  Address 0x51f91d0 is 0 bytes after a block of size 400 alloc'd
==4188==    at 0x4C2EB37: malloc (vg_replace_malloc.c:299)
==4188==    by 0x400547: main (stackoverflow.c:4)
...
==4188== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

The snippet above shows only the relevant part of the output, which corresponds to the error Invalid write. When compiling code with symbol information (gcc -g), it’s easy to see the exact place in the code where the error is detected. In this case, in line 5 of the stackoverflow.c file.
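
One way to avoid this class of error is to route writes through a bounds-checked helper. The following is only a minimal sketch; checked_store() is a hypothetical name introduced here for illustration and is not part of the original example or of any library:

```c
#include <stddef.h>

/*
 * Hypothetical helper (not from the article's example): store `value`
 * at buf[index] only when the index lies inside the array.
 * Returns 0 on success and -1 when the buffer is NULL or the index
 * is out of range, so the caller can react instead of corrupting memory.
 */
int checked_store(int *buf, size_t len, size_t index, int value)
{
        if (buf == NULL || index >= len)
                return -1;
        buf[index] = value;
        return 0;
}
```

With an array of 100 integers, checked_store (stack, 100, 100, 1234) is rejected, while index 99 is accepted.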

Memory leak example

Consider the following code:

#include <stdlib.h>
void func (void) {
        int* stack = malloc (100 * sizeof(int));
}
int main (void) {
        func ();
        return 0;
}

The allocation now happens inside the function func(). The leak occurs because the pointer to the newly allocated memory is a local variable, which is lost when the function returns. When running Valgrind, the results are:

$ valgrind --leak-check=yes ./leak
==4413== Memcheck, a memory error detector
...
==4413== 400 bytes in 1 blocks are definitely lost in loss record 1 of 1
==4413==    at 0x4C2EB37: malloc (vg_replace_malloc.c:299)
==4413==    by 0x4004F7: func (leak.c:3)
==4413==    by 0x400507: main (leak.c:6)
==4413==
==4413== LEAK SUMMARY:
...
==4413== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

Valgrind shows a loss of 400 bytes of memory allocated at leak.c:3.
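
A possible fix, sketched here under the assumption that the caller should own the buffer, is to return the pointer so that it can be freed. The names make_stack() and use_stack() are hypothetical and only serve the illustration:

```c
#include <stdlib.h>

/*
 * Hand the allocation back to the caller instead of dropping the
 * pointer when the function returns. May return NULL on failure.
 */
int *make_stack(size_t count)
{
        return malloc(count * sizeof(int));
}

/*
 * Example caller: allocates, uses, and frees the buffer.
 * Returns 0 on success and -1 if the allocation failed.
 */
int use_stack(void)
{
        int *stack = make_stack(100);
        if (stack == NULL)
                return -1;
        stack[0] = 1234;
        free(stack);        /* the pointer is still known here, so no leak */
        return 0;
}
```

Every successful make_stack() call is now paired with a free() in the caller, so Valgrind should no longer have a lost block to report for this pattern.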

These are just two small examples. To learn more, please visit the official Valgrind documentation.

Debugging with Pmemcheck

To run pmemcheck, you need an enhanced version of Valgrind that supports the new CLFLUSHOPT and CLWB flushing instructions. This enhanced version, along with the pmemcheck tool, is available on GitHub*. Please follow the instructions there to install it.

Pmemcheck can detect the following:

  • Non-persistent stores: This refers to data written to persistent memory but not flushed explicitly. This is clearly a problem, since data not flushed may still sit on the CPU caches and could be lost if the process were to crash unexpectedly.
  • Stores not added into a transaction: When working within a transaction block, it is assumed that all the modified persistent memory locations have been added to it at the beginning (which also implies that their previous values are copied to an undo log). This allows the transaction to implicitly flush added locations at the end of the block or roll back to the old values in the event of an unexpected failure. A modification within a transaction to a location that has not been added to the transaction is almost certainly a bug, and pmemcheck will warn about it.
  • Memory added to two different transactions: In the case where one program can work with multiple transactions simultaneously, adding the same memory object to multiple transactions has the potential to corrupt data. This is the case in PMDK, for example, where the library maintains a different transaction per thread. If two threads write to the same object within transactions, there are some scenarios where one thread crashing can overwrite the modifications made by other, non-crashing threads.
  • Memory overwrites: This refers to the case where multiple modifications to the same persistent memory location occur before the location is made persistent. This issue is mostly related to performance, although it can uncover lack of flushing too. In general, it is always better to use volatile memory for short-lived data.
  • Unnecessary flushes: Flushing should be done carefully. Detecting unnecessary flushes (such as redundant ones) can help in improving code performance.

To instrument code for pmemcheck, we need to add a set of macros indicating which memory locations correspond to persistent memory and when transactions start and stop. As mentioned in the introduction of this article, PMDK libraries are already instrumented for pmemcheck. If you use them, there is no need to worry about this. Nevertheless, it is always good to understand how this process works.

Non-persistent stores

Consider the following code writing data to persistent memory (it is assumed that a persistent memory device—real or emulated using DRAM—is mounted at /mnt/pmem):

#include <stdio.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <valgrind/pmemcheck.h>
int main (int argc, char *argv[]) {
        int fd, *data;
        fd = open ("/mnt/pmem/file", O_CREAT|O_RDWR, 0666);
        posix_fallocate (fd, 0, sizeof (int));
        data = (int *)mmap (NULL, sizeof (int), PROT_READ|PROT_WRITE,
                            MAP_SHARED, fd, 0);
        VALGRIND_PMC_REGISTER_PMEM_MAPPING (data, sizeof (int));
        *data = 1234;        
        munmap (data, sizeof (int));
        VALGRIND_PMC_REMOVE_PMEM_MAPPING (data, sizeof (int));
        return 0;
}

After the file is opened calling open(), we make sure that there is enough space in the file to allocate an integer by calling posix_fallocate(). Then, the file is memory-mapped with mmap() and the pointer registered to pmemcheck with the macro VALGRIND_PMC_REGISTER_PMEM_MAPPING. Finally, data is written to persistent memory and the file unmapped. We tell pmemcheck that we are unmapping the file with the macro VALGRIND_PMC_REMOVE_PMEM_MAPPING.

If we run pmemcheck, it will show that the data is not being flushed after the write:

$ valgrind --tool=pmemcheck ./test1b
==8904== pmemcheck-1.0, a simple persistent store checker
...
==8904== Number of stores not made persistent: 1
==8904== Stores not made persistent properly:
==8904== [0]    at 0x4008B4: main (test1b.c:12)
==8904==        Address: 0x4027000      size: 4 state: DIRTY
==8904== Total memory not made persistent: 4
==8904== ERROR SUMMARY: 1 errors

To fix this, we add a new function flush(), which uses the CLFLUSH instruction to flush every cache line that holds any part of the data:

#include <emmintrin.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <valgrind/pmemcheck.h>
void flush (const void *addr, size_t len) {
        uintptr_t flush_align = 64, uptr;
        for (uptr = (uintptr_t)addr & ~(flush_align -1);
                uptr < (uintptr_t)addr + len; uptr += flush_align)
                _mm_clflush ((char *)uptr);
}
int main (int argc, char *argv[]) {
        int fd, *data;
        fd = open ("/mnt/pmem/file", O_CREAT|O_RDWR, 0666);
        posix_fallocate (fd, 0, sizeof (int));
        data = (int *)mmap (NULL, sizeof (int), PROT_READ|PROT_WRITE,
                            MAP_SHARED, fd, 0);
        VALGRIND_PMC_REGISTER_PMEM_MAPPING (data, sizeof (int));
        *data = 1234;
        flush ((void *)data, sizeof (int));
        munmap (data, sizeof(int));
        VALGRIND_PMC_REMOVE_PMEM_MAPPING (data, sizeof (int));
        return 0;
}

Running pmemcheck now shows:

$ valgrind --tool=pmemcheck ./test1bfixed
==9710== pmemcheck-1.0, a simple persistent store checker
...
==9710== Number of stores not made persistent: 0
==9710== ERROR SUMMARY: 0 errors
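
The address arithmetic inside flush() can be examined on its own. The sketch below mirrors the loop shown above without issuing any flush instruction; flush_start() and flush_line_count() are hypothetical helper names, and the 64-byte line size is the same assumption the article's flush() makes:

```c
#include <stddef.h>
#include <stdint.h>

#define FLUSH_ALIGN ((uintptr_t)64)  /* assumed cache-line size, as in flush() */

/* Round addr down to the start of the cache line that contains it. */
uintptr_t flush_start(uintptr_t addr)
{
        return addr & ~(FLUSH_ALIGN - 1);
}

/*
 * Count the cache lines that the loop in flush() would visit for the
 * byte range [addr, addr + len). Uses the same bounds as that loop.
 */
size_t flush_line_count(uintptr_t addr, size_t len)
{
        size_t lines = 0;
        uintptr_t uptr;

        for (uptr = flush_start(addr); uptr < addr + len; uptr += FLUSH_ALIGN)
                lines++;
        return lines;
}
```

A 4-byte integer at a line-aligned address needs a single line flush, but the same integer straddling a 64-byte boundary needs two, which is why the loop starts from the rounded-down address rather than from addr itself.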

Stores not added into a transaction

Consider the following code:

#include <emmintrin.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <valgrind/pmemcheck.h>
void flush (const void *addr, size_t len) {
        uintptr_t flush_align = 64, uptr;
        for (uptr = (uintptr_t)addr & ~(flush_align -1);
                uptr < (uintptr_t)addr + len; uptr += flush_align)
                _mm_clflush ((char *)uptr);
}
int main (int argc, char *argv[]) {
        int fd, *data;
        fd = open ("/mnt/pmem/file", O_CREAT|O_RDWR, 0666);
        posix_fallocate (fd, 0, sizeof (int));
        data = (int *)mmap (NULL, sizeof (int), PROT_READ|PROT_WRITE,
                            MAP_SHARED, fd, 0);
        VALGRIND_PMC_REGISTER_PMEM_MAPPING (data, sizeof (int));
        VALGRIND_PMC_START_TX;
        *data = 1234;
        flush ((void *)data, sizeof (int));
        VALGRIND_PMC_END_TX;
        munmap (data, sizeof(int));
        VALGRIND_PMC_REMOVE_PMEM_MAPPING (data, sizeof (int));
        return 0;
}

The only difference between this snippet and the previous one is two lines of code: one to “start” a transaction (VALGRIND_PMC_START_TX) and one to “end” it (VALGRIND_PMC_END_TX). The reason for the quotes is that we are not really creating a transaction here, but just telling pmemcheck we are doing so. Pmemcheck cannot detect transactions by itself, since what constitutes a transaction can vary between implementations. Usually this is hidden behind a library (such as libpmemobj from PMDK), and so it is the responsibility of the library’s developers to instrument the code appropriately.

Run pmemcheck to test:

$ valgrind --tool=pmemcheck ./test6
==23514== pmemcheck-1.0, a simple persistent store checker
...
==23514== Number of stores made without adding to transaction: 1
==23514== Stores made without adding to transactions:
==23514== [0]    at 0x400A8D: main (test6.c:21)
==23514==       Address: 0x4029000      size: 4
==23514== ERROR SUMMARY: 1 errors

Let’s “add” the store to the transaction (again, we do not have a real transaction here) by inserting the following line of code right after VALGRIND_PMC_START_TX:

...
        VALGRIND_PMC_ADD_TO_TX (data, sizeof (int));
...

Running pmemcheck again results in:

$ valgrind --tool=pmemcheck ./test6fixed
==23745== pmemcheck-1.0, a simple persistent store checker
...
==23745== Number of stores not made persistent: 0
==23745== ERROR SUMMARY: 0 errors

As shown, now that all writes to persistent memory within the transaction are properly “added” to the transaction, pmemcheck gives the OK.
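
To see why a location must be added to a transaction before it is modified, consider the toy undo log below. It is only an illustrative model that lives in ordinary volatile memory; all the toy_tx names and the fixed-size limits are hypothetical, and this is not how PMDK implements its transactions:

```c
#include <stddef.h>
#include <string.h>

#define UNDO_MAX_ENTRIES 8   /* arbitrary limits for the toy model */
#define UNDO_MAX_BYTES   16

struct undo_entry {
        void *addr;                          /* location that was added */
        size_t len;                          /* number of bytes saved   */
        unsigned char old[UNDO_MAX_BYTES];   /* snapshot of old value   */
};

struct toy_tx {
        struct undo_entry entries[UNDO_MAX_ENTRIES];
        size_t count;
};

void toy_tx_begin(struct toy_tx *tx)
{
        tx->count = 0;
}

/* Snapshot the current contents of [addr, addr + len) into the undo log. */
int toy_tx_add(struct toy_tx *tx, void *addr, size_t len)
{
        struct undo_entry *e;

        if (tx->count == UNDO_MAX_ENTRIES || len > UNDO_MAX_BYTES)
                return -1;
        e = &tx->entries[tx->count++];
        e->addr = addr;
        e->len = len;
        memcpy(e->old, addr, len);
        return 0;
}

/* Roll back: restore every snapshot, newest first. */
void toy_tx_abort(struct toy_tx *tx)
{
        while (tx->count > 0) {
                struct undo_entry *e = &tx->entries[--tx->count];
                memcpy(e->addr, e->old, e->len);
        }
}

/* Commit: the snapshots are no longer needed. */
void toy_tx_commit(struct toy_tx *tx)
{
        tx->count = 0;
}
```

In this model, a variable that was added with toy_tx_add() returns to its old value after toy_tx_abort(), while a variable modified without being added keeps its new value. That partial rollback is the inconsistency pmemcheck warns about when it reports a store made without adding to a transaction.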

Memory added to two different transactions

Consider the following code:

#include <emmintrin.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <valgrind/pmemcheck.h>
#include <pthread.h>
void flush (const void *addr, size_t len) {
        uintptr_t flush_align = 64, uptr;
        for (uptr = (uintptr_t)addr & ~(flush_align -1);
                uptr < (uintptr_t)addr + len; uptr += flush_align)
                _mm_clflush ((char *)uptr);
}
void *func (void *args) {
        int *data = (int *) args;
        VALGRIND_PMC_START_TX_N (1);
        VALGRIND_PMC_ADD_TO_TX_N (1, data, sizeof (int));
        *data = 4321;
        flush ((void *)data, sizeof (int));
        VALGRIND_PMC_END_TX_N (1);
}
int main (int argc, char *argv[]) {
        int fd, *data;
        pthread_t thread;
        fd = open ("/mnt/pmem/file", O_CREAT|O_RDWR, 0666);
        posix_fallocate (fd, 0, sizeof (int));
        data = (int *)mmap (NULL, sizeof (int), PROT_READ|PROT_WRITE,
                            MAP_SHARED, fd, 0);
        VALGRIND_PMC_REGISTER_PMEM_MAPPING (data, sizeof (int));
        VALGRIND_PMC_START_TX_N (0);
        VALGRIND_PMC_ADD_TO_TX_N (0, data, sizeof (int));
        pthread_create (&thread, NULL, func, (void *) data);
        *data = 1234;
        flush ((void *)data, sizeof (int));
        pthread_join (thread, NULL);
        VALGRIND_PMC_END_TX_N (0);
        munmap (data, sizeof(int));
        VALGRIND_PMC_REMOVE_PMEM_MAPPING (data, sizeof (int));
        return 0;
}

A couple of new things are added in this snippet. The first is the change of the transaction macros to include transaction IDs, so that multiple transactions can be created simultaneously. For example, VALGRIND_PMC_START_TX_N (n) is used to create transaction number n, instead of just VALGRIND_PMC_START_TX. The second thing is creating a thread that will run the function func(). This function creates a new transaction, with id 1, which does the same thing that transaction 0 does in main(). That is, it adds data to the transaction, writes to it, and then ends the transaction.

Pmemcheck will complain about this:

$ valgrind --tool=pmemcheck ./test8
==7015== pmemcheck-1.0, a simple persistent store checker
...
==7015== Number of stores not made persistent: 0
==7015==
==7015== Number of overlapping regions registered in different transactions: 1
==7015== Overlapping regions:
==7015== [0]    at 0x4009E5: func (test8.c:17)
==7015==    by 0x4C32593: start_thread (in /usr/lib64/libpthread-2.27.so)
==7015==    by 0x4F43E6E: clone (in /usr/lib64/libc-2.27.so)
==7015==        Address: 0x4027000      size: 4 tx_id: 1
==7015==    First registered here:
==7015== [0]'   at 0x400C17: main (test8.c:31)
==7015==        Address: 0x4027000      size: 4 tx_id: 0
==7015== ERROR SUMMARY: 1 errors

It is best to avoid this type of situation. However, if you must write to a persistent memory region from different threads, make sure to always use a locking mechanism, so that only one thread writes at a time. You should also make sure that the transaction and the lock end at the same time. If your lock ends before the transaction ends, there is a chance that a thread may fail right in that spot. If that is the case, the recovery mechanism can overwrite new changes made by other threads that acquired the lock after the failing thread released it but before it finished the transaction.
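
The ordering described above can be sketched with a POSIX mutex. The struct and function names are hypothetical, and the comments mark where the transaction calls of whichever library you use would go; only the lock ordering is shown, not a real transaction:

```c
#include <pthread.h>

/* Hypothetical wrapper pairing a persistent location with its lock. */
struct guarded_int {
        pthread_mutex_t lock;
        int *pmem_value;   /* assumed to point into persistent memory */
};

/*
 * Update the value while holding the lock for the whole transaction.
 * Returns 0 on success and -1 if locking or unlocking failed.
 */
int guarded_update(struct guarded_int *g, int value)
{
        if (pthread_mutex_lock(&g->lock) != 0)
                return -1;

        /* begin the transaction and add *pmem_value to it here */
        *g->pmem_value = value;
        /* flush and end the transaction here, BEFORE the unlock below */

        return pthread_mutex_unlock(&g->lock) == 0 ? 0 : -1;
}
```

Because the unlock is the last step, no other thread can modify the location between the end of the transaction and the release of the lock.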

Memory overwrites

Consider the following code:

#include <emmintrin.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <valgrind/pmemcheck.h>
void flush (const void *addr, size_t len) {
        uintptr_t flush_align = 64, uptr;
        for (uptr = (uintptr_t)addr & ~(flush_align -1);
                uptr < (uintptr_t)addr + len; uptr += flush_align)
                _mm_clflush ((char *)uptr);
}
int main (int argc, char *argv[]) {
        int fd, *data;
        fd = open ("/mnt/pmem/file", O_CREAT|O_RDWR, 0666);
        posix_fallocate (fd, 0, sizeof (int));
        data = (int *)mmap (NULL, sizeof (int), PROT_READ|PROT_WRITE,
                            MAP_SHARED, fd, 0);
        VALGRIND_PMC_REGISTER_PMEM_MAPPING (data, sizeof (int));
        *data = 1234;
        *data = 4321;
        flush ((void *)data, sizeof (int));
        munmap (data, sizeof(int));
        VALGRIND_PMC_REMOVE_PMEM_MAPPING (data, sizeof (int));
        return 0;
}

Note that the code above writes to data twice before flushing. To check for overwrites, run pmemcheck with the option mult-stores:

$ valgrind --tool=pmemcheck --mult-stores=yes ./test2b
==23985== pmemcheck-1.0, a simple persistent store checker
...
==23985== Number of overwritten stores: 1
==23985== Overwritten stores before they were made persistent:
==23985== [0]    at 0x40090A: main (test2b.c:20)
==23985==       Address: 0x4027000      size: 4 state: DIRTY
==23985== ERROR SUMMARY: 1 errors

Fix this by either inserting a flushing instruction between the writes or eliminating one of the writes. For example, if the last (and definitive) write depends on the results of some intermediate computation, it is better to keep those intermediate values in a volatile variable than in a persistent one.
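
As a small sketch of the second option, the function below accumulates its intermediate result in an ordinary local variable and writes to the persistent location only once. The name store_sum_once() is hypothetical, and the destination pointer is assumed to refer to persistent memory:

```c
#include <stddef.h>

/*
 * Sum `count` values and store the result with a single write.
 * persistent_total is assumed to point into a persistent memory
 * mapping; the caller is still responsible for flushing it afterwards.
 */
void store_sum_once(int *persistent_total, const int *values, size_t count)
{
        int total = 0;   /* volatile (DRAM or register) scratch variable */
        size_t i;

        for (i = 0; i < count; i++)
                total += values[i];

        *persistent_total = total;   /* the only store to persistent memory */
}
```

Accumulating directly into *persistent_total would instead issue one store per element before the flush, which is the pattern the mult-stores option reports.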

Unnecessary flushes

Consider the following code:

#include <emmintrin.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <valgrind/pmemcheck.h>
void flush (const void *addr, size_t len) {
        uintptr_t flush_align = 64, uptr;
        for (uptr = (uintptr_t)addr & ~(flush_align -1);
                uptr < (uintptr_t)addr + len; uptr += flush_align)
                _mm_clflush ((char *)uptr);
}
int main (int argc, char *argv[]) {
        int fd, *data;
        fd = open ("/mnt/pmem/file", O_CREAT|O_RDWR, 0666);
        posix_fallocate (fd, 0, sizeof (int));
        data = (int *)mmap (NULL, sizeof (int), PROT_READ|PROT_WRITE,
                            MAP_SHARED, fd, 0);
        VALGRIND_PMC_REGISTER_PMEM_MAPPING (data, sizeof (int));
        *data = 1234;
        flush ((void *)data, sizeof (int));
        flush ((void *)data, sizeof (int));
        munmap (data, sizeof(int));
        VALGRIND_PMC_REMOVE_PMEM_MAPPING (data, sizeof (int));
        return 0;
}

Note that we are flushing data twice. To check for unnecessary flushes, we need to run pmemcheck with the option flush-check:

$ valgrind --tool=pmemcheck --flush-check=yes ./test3b
==24165== pmemcheck-1.0, a simple persistent store checker
...
==24165== Number of stores not made persistent: 0
==24165==
==24165== Number of unnecessary flushes: 1
==24165== [0]    at 0x400819: flush (emmintrin.h:1485)
==24165==    by 0x400931: main (test3b.c:22)
==24165==       Address: 0x4027000      size: 64
==24165== ERROR SUMMARY: 1 errors

The tool shows that there is an unnecessary flush on line 22. Go ahead and delete it.

This is as much detail about pmemcheck as this article covers. For more information, please review the documentation section of the pmemcheck repository.

PMDK

As mentioned in the introduction, all the libraries that are part of the PMDK are already instrumented with pmemcheck. If you use PMDK for persistent memory programming, you will be able to automatically check your code with pmemcheck without any code modification.

Consider the following code using libpmemobj with the C++ bindings:

#include <libpmemobj++/persistent_ptr.hpp>
using namespace std;
namespace pobj = pmem::obj;

struct my_root {
        int value;
        int is_odd;
};
int main (int argc, char *argv[]) {
        pobj::pool<my_root> pop;
        pop = pobj::pool<my_root>::create ("/mnt/pmem/pool", "TEST7",
                                           (1024*1024*100), 0666);
        auto proot = pop.root ();
        proot->value = 1234;
        proot->is_odd = proot->value % 2;
        return 0;
}

Two variables are being modified here without flushing or using a transaction. Running pmemcheck shows:

$ valgrind --tool=pmemcheck ./test7
==2637== pmemcheck-1.0, a simple persistent store checker
...
==2637== Number of stores not made persistent: 2
==2637== Stores not made persistent properly:
==2637== [0]    at 0x400F49: main (test7.cpp:14)
==2637==        Address: 0x7fc0550      size: 4 state: DIRTY
==2637== [1]    at 0x400F84: main (test7.cpp:15)
==2637==        Address: 0x7fc0554      size: 4 state: DIRTY
==2637== Total memory not made persistent: 8
==2637== ERROR SUMMARY: 2 errors

To fix this problem, add the following two lines of code after the stores:

...
        pop.persist (&(proot->value), sizeof(int));
        pop.persist (&(proot->is_odd), sizeof(int));
...

Pmreorder

New tools continue to be developed to help with robust consistency checking. PMDK 1.5 introduced a tool called pmreorder, which is a set of Python* scripts that parse and replay operations logged by pmemcheck. Learn more about pmreorder by reading its manpage pmreorder(1).

Summary

This article reviewed new potential bugs affecting persistent memory programming and how to find them by instrumenting the code with pmemcheck. Pmemcheck is a new Valgrind tool developed by Intel and similar to memcheck, the default tool in Valgrind to discover memory-related bugs. The difference is that Pmemcheck is adapted for persistent memory. The article showed how to detect five potential bugs: non-persistent stores, stores not added into a transaction, memory added to two different transactions, memory overwrites, and unnecessary flushes. Moreover, all the libraries that are part of the Persistent Memory Developer Kit (PMDK) are already instrumented with pmemcheck. If you use PMDK for persistent memory programming, you will be able to automatically check your code with pmemcheck without any code modification.

About the Author

Eduardo Berrocal is a cloud software engineer at Intel working on persistent memory. He received his Ph.D. in computer science from the Illinois Institute of Technology (IIT) in Chicago, Illinois. His doctoral research focused on (but was not limited to) data analytics and fault tolerance for high-performance computing. In the past, he worked as a summer intern at Bell Labs (Nokia), as a research aide at Argonne National Laboratory, as a scientific programmer and web developer at the University of Chicago, and as an intern in the CESVIMA laboratory in Spain.
