Introduction
DEFLATE compression algorithms traditionally use either a dynamic or static compression table. Those who want the best compression results use a dynamic table at the cost of more processing time, while the algorithms focused on throughput will use static tables. The Intel® Intelligent Storage Acceleration Library (Intel® ISA-L) semi-dynamic compression comes close to getting the best of both worlds. In addition, Intel® ISA-L offers a version of the decompression (inflate) algorithm which substantially improves the decompression performance.
Testing shows the usage of semi-dynamic compression and decompression is only slightly slower than using a static table and almost as space-efficient as algorithms that use dynamic tables. This article's goal is to help you incorporate Intel ISA-L’s semi-dynamic compression and optimized decompression algorithms into your storage application. It describes prerequisites for using Intel ISA-L, and includes a downloadable code sample, with full build instructions. The code sample is a compression and decompression tool that can be used to compare the ration and performance of Intel ISA-L’s semi-dynamic compression algorithm on a public data set with the standard DEFLATE implementation, zlib*.
Hardware and Software Configuration
CPU and Chipset | Intel® Xeon® processor E5-2699 v4, 2.2 GHz
|
Platform | Platform: Intel® Server System R2000WT product family (code-named Wildcat Pass)
|
Memory | Memory size: 256 GB (16X16 GB) DDR4 2133P Brand/model: Micron – MTA36ASF2G72PZ2GATESIG |
Storage | Brand and model: 1 TB Western Digital (WD1002FAEX) Intel® SSD Data Center P3700 Series (SSDPEDMD400G4) |
Operating System | Ubuntu* 16.04 LTS (Xenial Xerus) Linux* kernel 4.4.0-21-generic |
Note: Depending on the platform capability, Intel ISA-L can run on various Intel® processor families. Improvements are obtained by speeding up the computations through the use of the following instruction sets:
- Intel® Advanced Encryption Standard New Instruction (Intel® AES-NI)
- Intel® Streaming SIMD Extensions (Intel® SSE)
- Intel® Advanced Vector Extensions (Intel® AVX)
- Intel® Advanced Vector Extensions 2 (Intel® AVX2)
Why Use Intel® ISA-L?
Intel ISA-L has the ability to compress and decompress faster than zlib* with only a small sacrifice in the compression ratio. This capability is well suited for high throughput storage applications. This article includes a sample application that simulates a compression and decompression scenario where the output will show the efficiency. Click on the button at the top of this article to download.
Prerequisites
Intel ISA-L supports Linux and Microsoft Windows*. A full list of prerequisite packages can be found here.
Building the INTEL ISA-L Sample application (for Linux):
- Install the dependencies:
- a c++14 compliant c++ compiler
- cmake >= 3.1
- git
- autogen
- autoconf
- automake
- yasm and/or nasm
- libtool
- boost's "Filesystem" library and headers
- boost's "Program Options" library and headers
-
boost's "String Algo" headers
>sudo apt-get update
>sudo apt-get install gcc g++ make cmake git zlib1g-dev autogen autoconf automake yasm nasm libtool libboost-all-dev
-
You also need the latest versions of isa-l and zlib. The get_libs.bash script can be used to get them. The script will download the two libraries from their official GitHub* repositories, build them, and then install them in `./libs/usr` directory.
>`bash ./libs/get_libs.bash`
- Build from the `ex1` directory:
- `mkdir <build-dir>`
- `cd <build-dir>`
- `cmake -DCMAKE_BUILD_TYPE=Release $OLDPWD`
- `make`
Getting Started with the Sample Application
The sample application contains the following files:
This example goes through the following steps at a high-level work flow and focuses on the “main.cpp”, “bm_isal.cpp”, and “bm_isal_semidyn.cpp" files:
Setup
1. In the “main.cpp” file, the program parses the command line and displays the options that are going to be performed.
int main(int argc, char* argv[])
{
options options = options::parse(argc, argv);
Parsing the option of the command line
2. In the options.cpp file, the program parses the command line arguments using `options::parse()`.
Create the benchmarks object
3. In the “main.cpp” file, the program will benchmark each raw file using a compression-level inside the benchmarks::add_benchmark() function. Since the benchmarks do not run concurrently, there is only one file “pointer” created.
benchmarks benchmarks;
// adding the benchmark for each files and libary/level combination
for (const auto& path : options.files)
{
auto compression = benchmark_info::Method::Compression;
auto decompression = benchmark_info::Method::Decompression;
auto isal_static = benchmark_info::Library::ISAL_STATIC;
auto isal_semidyn = benchmark_info::Library::ISAL_SEMIDYN;
auto zlib = benchmark_info::Library::ZLIB;
benchmarks.add_benchmark({compression, isal_static, 0, path});
benchmarks.add_benchmark({decompression, isal_static, 0, path});
if (options.isal_semidyn_stateful)
{
benchmarks.add_benchmark({compression, isal_semidyn, 0, path});
benchmarks.add_benchmark({decompression, isal_semidyn, 0, path});
}
if (options.isal_semidyn_stateless)
{
benchmarks.add_benchmark({compression, isal_semidyn, 1, path});
benchmarks.add_benchmark({decompression, isal_semidyn, 1, path});
}
for (auto level : options.zlib_levels)
{
if (level >= 1 && level <= 9)
{
benchmarks.add_benchmark({compression, zlib, level, path});
benchmarks.add_benchmark({decompression, zlib, level, path});
}
else
{
std::cout << "[Warning] zlib compression level " << level << "will be ignored\n";
}
}
}
Intel® ISA-L compression and decompression
4. In the “bm_isal.cpp” file, the program performs the static compression and decompression on the raw file using a single thread. The key functions to note are isal_deflate and isal_inflate. Both functions accept a stream as an argument, and this data structure holds the data about the input buffer, the length in bytes of the input buffer, and the output buffer and the size of the output buffer. end_of_stream indicates whether it will be last iteration.
std::string bm_isal::version()
{
return std::to_string(ISAL_MAJOR_VERSION) + "." + std::to_string(ISAL_MINOR_VERSION) + "." +
std::to_string(ISAL_PATCH_VERSION);
}
bm::raw_duration bm_isal::iter_deflate(file_wrapper* in_file, file_wrapper* out_file, int /*level*/)
{
raw_duration duration{};
struct isal_zstream stream;
uint8_t input_buffer[BUF_SIZE];
uint8_t output_buffer[BUF_SIZE];
isal_deflate_init(&stream);
stream.end_of_stream = 0;
stream.flush = NO_FLUSH;
do
{
stream.avail_in = static_cast<uint32_t>(in_file->read(input_buffer, BUF_SIZE));
stream.end_of_stream = static_cast<uint32_t>(in_file->eof());
stream.next_in = input_buffer;
do
{
stream.avail_out = BUF_SIZE;
stream.next_out = output_buffer;
auto begin = std::chrono::steady_clock::now();
isal_deflate(&stream);
auto end = std::chrono::steady_clock::now();
duration += (end - begin);
out_file->write(output_buffer, BUF_SIZE - stream.avail_out);
} while (stream.avail_out == 0);
} while (stream.internal_state.state != ZSTATE_END);
return duration;
}
bm::raw_duration bm_isal::iter_inflate(file_wrapper* in_file, file_wrapper* out_file)
{
raw_duration duration{};
int ret;
int eof;
struct inflate_state stream;
uint8_t input_buffer[BUF_SIZE];
uint8_t output_buffer[BUF_SIZE];
isal_inflate_init(&stream);
stream.avail_in = 0;
stream.next_in = nullptr;
do
{
stream.avail_in = static_cast<uint32_t>(in_file->read(input_buffer, BUF_SIZE));
eof = in_file->eof();
stream.next_in = input_buffer;
do
{
stream.avail_out = BUF_SIZE;
stream.next_out = output_buffer;
auto begin = std::chrono::steady_clock::now();
ret = isal_inflate(&stream);
auto end = std::chrono::steady_clock::now();
duration += (end - begin);
out_file->write(output_buffer, BUF_SIZE - stream.avail_out);
} while (stream.avail_out == 0);
} while (ret != ISAL_END_INPUT && eof == 0);
return duration;
}
5. In the “bm_isal_semidyn.cpp” file, the program performs the dynamic compression and decompression on the raw file using multiple threads.
std::string bm_isal_semidyn::version()
{
return std::to_string(ISAL_MAJOR_VERSION) + "." + std::to_string(ISAL_MINOR_VERSION) + "." +
std::to_string(ISAL_PATCH_VERSION);
}
bm::raw_duration
bm_isal_semidyn::iter_deflate(file_wrapper* in_file, file_wrapper* out_file, int config)
{
raw_duration duration{};
bool stateful = (config == 0);
struct isal_zstream stream;
struct isal_huff_histogram histogram;
struct isal_hufftables hufftable;
long in_file_size = in_file->size();
uint8_t* input_buffer = new (std::nothrow) uint8_t[in_file_size];
if (input_buffer == nullptr)
return raw_duration{0};
long out_buffer_size = std::max((int)(in_file_size * 1.30), 4 * 1024);
uint8_t* output_buffer = new (std::nothrow) uint8_t[out_buffer_size];
if (output_buffer == nullptr)
return raw_duration{0};
stream.avail_in = static_cast<uint32_t>(in_file->read(input_buffer, in_file_size));
if (stream.avail_in != in_file_size)
return raw_duration{0};
int segment_size = SEGMENT_SIZE;
int sample_size = SAMPLE_SIZE;
int hist_size = sample_size > segment_size ? segment_size : sample_size;
if (stateful)
isal_deflate_init(&stream);
else
isal_deflate_stateless_init(&stream);
stream.end_of_stream = 0;
stream.flush = stateful ? SYNC_FLUSH : FULL_FLUSH;
stream.next_in = input_buffer;
stream.next_out = output_buffer;
if (stateful)
stream.avail_out = out_buffer_size;
int remaining = in_file_size;
int chunk_size = segment_size;
while (remaining > 0)
{
auto step = std::chrono::steady_clock::now();
memset(&histogram, 0, sizeof(struct isal_huff_histogram));
duration += std::chrono::steady_clock::now() - step;
if (remaining < segment_size * 2)
{
chunk_size = remaining;
stream.end_of_stream = 1;
}
step = std::chrono::steady_clock::now();
int hist_rem = (hist_size > chunk_size) ? chunk_size : hist_size;
isal_update_histogram(stream.next_in, hist_rem, &histogram);
if (hist_rem == chunk_size)
isal_create_hufftables_subset(&hufftable, &histogram);
else
isal_create_hufftables(&hufftable, &histogram);
duration += std::chrono::steady_clock::now() - step;
stream.avail_in = chunk_size;
if (!stateful)
stream.avail_out = chunk_size + 8 * (1 + (chunk_size >> 16
stream.hufftables = &hufftable;
remaining -= chunk_size;
step = std::chrono::steady_clock::now();
if (stateful)
isal_deflate(&stream);
else
isal_deflate_stateless(&stream);
duration += std::chrono::steady_clock::now() - step;
if (stateful)
{
if (stream.internal_state.state != ZSTATE_NEW_HDR)
break;
}
else
{
if (stream.avail_in != 0)
break;
}
}
if (stream.avail_in != 0)
return raw_duration{0};
out_file->write(output_buffer, stream.total_out);
delete[] input_buffer;
delete[] output_buffer;
return duration;
}
bm::raw_duration bm_isal_semidyn::iter_inflate(file_wrapper* in_file, file_wrapper* out_file)
{
raw_duration duration{};
int ret;
int eof;
struct inflate_state stream;
uint8_t input_buffer[INFLATE_BUF_SIZE];
uint8_t output_buffer[INFLATE_BUF_SIZE];
isal_inflate_init(&stream);
stream.avail_in = 0;
stream.next_in = nullptr;
do
{
stream.avail_in = static_cast<uint32_t>(in_file->read(input_buffer, INFLATE_BUF_SIZE));
eof = in_file->eof();
stream.next_in = input_buffer;
do
{
stream.avail_out = INFLATE_BUF_SIZE;
stream.next_out = output_buffer;
auto begin = std::chrono::steady_clock::now();
ret = isal_inflate(&stream);
auto end = std::chrono::steady_clock::now();
duration += (end - begin);
out_file->write(output_buffer, INFLATE_BUF_SIZE - stream.avail_out);
} while (stream.avail_out == 0);
} while (ret != ISAL_END_INPUT && eof == 0);
return duration;
}
6. When all compression and decompression tasks are complete, the program displays the results on the screen. All temporary files are deleted using benchmarks.run().
Execute the sample application
In this example, the program will run through the compression and decompression functions of the Intel ISA-L and zlib. For Intel ISA-L functions, the results will show both static and semi-dynamic compression and decompression.
Run
From the ex1 directory:
cd <build-bir>/ex1
./ex1 --help
Usage
Usage: ./ex1 [--help] [--folder <path>]... [--file <path>]... :
--help display this message
--file path use the file at 'path'
--folder path use all the files in 'path'
--zlib-levels n,... coma-separated list of compression level [1-9]
--semidyn-config flag,... coma-separated list of flags for the semi-dynamic
compression ('stateful', 'stateless') [stateful]
• --file and --folder can be used multiple times to add more files to the benchmark
• --folder will look for files recursively
• the default --zlib-level is 6
Test corpuses are public data files designed to test the compression and decompression algorithms, which are available online (for example, Calgary and Silesia corpuses). The --folder option can be used to easily benchmark them:
./ex1 --folder /path/to/corpus/folder.
Running the example
Here is an example of how to run the application:
./ex1 –zlib-levels 4,6,8 –file corpuses/silesia/mozilla
Program output displays a column for the compression library, either ‘isa-l’ or ‘zlib’. The table shows the compression ratio (compressed file/raw file), and the system and processor time that it takes to perform the operation. For decompression, it just measures the elapsed time for the decompression operation. All the data was produced on the same system. Both results for Intel ISA-L results of static and semi-dynamic compression and decompression are displayed in the table.
Notes: 2x Intel® Xeon® processor E5-2699v4 (HT off), Intel® Speed Step enabled, Intel® Turbo Boost Technology disabled, 16x16GB DDR4 2133 MT/s, 1 DIMM per channel, Ubuntu* 16.04 LTS, Linux kernel 4.4.0-21-generic, 1 TB Western Digital* (WD1002FAEX), 1 Intel® SSD P3700 Series (SSDPEDMD400G4), 22x per CPU socket. Performance measured by the written sample application in this article.
Conclusion
This tutorial and its sample application demonstrates one method through which you can incorporate the Intel ISA-L static and semi-dynamic compression and decompression features into your storage application. The sample application’s output data shows there is a balancing act between processing time (CPU time) and disk space. It can assist you in determining which compression and decompression algorithm best suits your requirements, then help you to quickly adapt your application to take advantage of Intel® Architecture with the Intel ISA-L.
Other Useful Links
- Accelerating your Storage Algorithms using Intelligent Storage Acceleration Library (ISA-L) video
- Accelerating Data Deduplication with ISA-L blog post
Authors
Thai Le is a software engineer who focuses on cloud computing and performance computing analysis at Intel.
Steven Briscoe is an application engineer focusing on cloud computing within the Software Services Group at Intel Corporation (UK).
Notices
System configurations, SSD configurations and performance tests conducted are discussed in detail within the body of this paper. For more information go to http://www.intel.com/content/www/cn/zh/benchmarks/intel-product-performance.html.
This sample source code is released under the Intel Sample Source Code License Agreement.
"