Tuning Guide for ClickHouse with Intel® In-Memory Analytics Accelerator and Intel® Advanced Vector Extensions 512 with 4th Generation Intel® Xeon® Scalable Processors

Introduction

This guide is for users who are already familiar with ClickHouse. It provides recommendations for configuring hardware and software that provide the best performance in most situations. However, note that we rely on the users to carefully consider these settings for their specific scenarios, since ClickHouse can be deployed in multiple ways.

4th generation Intel® Xeon® Scalable processors deliver workload-optimized performance with built-in acceleration for AI, encryption, HPC, storage, database systems, and networking. They feature unique security technologies to help protect data on-premises or in the cloud.
 

  • New built-in accelerators for AI, HPC, networking, security, storage and analytics
  • Intel® Ultra Path Interconnect (Intel® UPI)
  • Intel® Speed Select Technology
  • Hardware-enhanced security
  • New flex bus I/O interface (PCIe* 5.0 and CXL)
  • New flexible I/O interface up to 20 high-speed I/O (HSIO) lanes (PCI 3.0)
  • Increased I/O bandwidth with PCIe 5.0 (up to 80 lanes)
  • Increased memory bandwidth with DDR5
  • Increased multi-socket bandwidth with Intel UPI 2.0 (up to 16 GT/s)
  • Support for Intel® Optane™ PMem 300 series

Intel® In-Memory Analytics Accelerator (Intel® IAA) is one of the new built-in accelerators. It supports compression and decompression in the Deflate format described in RFC 1951, as well as analytics operations. ClickHouse is a column-oriented database management system (DBMS) for online analytical processing (OLAP) queries. Applying Intel IAA features to ClickHouse improves its performance. The Star Schema Benchmark is used for benchmark testing.
     

Optimization on ClickHouse that can run on 4th generation Intel Xeon Scalable processors includes:
 

  • Deflate compression algorithm implemented by Intel® Query Processing Library (Intel® QPL) that includes Intel IAA and Intel® Advanced Vector Extensions (Intel® AVX-512)
  • Parquet file decoding by Intel IAA
  • ClickHouse filter operator optimized by Intel AVX-512 VBMI and VBMI2 extensions
 

Server Configuration

Hardware

The configuration described in this article is based on the 4th generation Intel Xeon processor. The server platform, memory, hard drives, and network interface cards can be determined according to your usage requirements.

Hardware         Model
CPU              4th generation Intel Xeon Scalable processor, base frequency 1.6 GHz
BIOS             EGSDCRB1.86B.0072.D01.2201101353
Memory           256 GB (16 x 16 GB DRAM, 4800 MT/s)
Storage/Disks    1 x 892 GB Intel SSDSC2KB960G8

 

Software

Software            Version
Operating System    CentOS* Stream 8
Kernel              Upstream kernel v6.0 or later, which includes a fully functional Intel DSA and Intel IAA host driver
ClickHouse          Version 22.10
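
The kernel requirement listed above can be checked before going further. The following is a minimal sketch, assuming that `uname -r` reports a version string beginning with major.minor; the function name kernel_at_least is hypothetical and not part of any Intel or ClickHouse tool.

```shell
# kernel_at_least VERSION MAJOR MINOR
# Succeeds when VERSION (for example, the output of `uname -r`)
# is at least MAJOR.MINOR.
kernel_at_least() {
    version="$1"; want_major="$2"; want_minor="$3"
    major="${version%%.*}"        # text before the first dot
    minor="${version#*.}"         # text after the first dot
    minor="${minor%%[!0-9]*}"     # keep only the leading digits
    [ -n "$major" ] && [ -n "$minor" ] || return 2
    if [ "$major" -gt "$want_major" ]; then
        return 0
    fi
    [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]
}

# Report whether the running kernel meets the v6.0 requirement.
if kernel_at_least "$(uname -r)" 6 0; then
    echo "kernel $(uname -r): meets the v6.0 requirement"
else
    echo "kernel $(uname -r): older than v6.0"
fi
```

A distribution kernel older than v6.0 may still carry backported drivers, so treat a negative result as a prompt to check the distribution's documentation rather than as a definitive answer.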

 

Hardware Tuning

Check BIOS configuration

Begin by resetting your BIOS to its default settings, and then apply the following changes:

Configuration Item                                                                        Recommended Value
EDKII > Socket Configuration > IIO Configuration > Interrupt Remapping Option: No         Yes
EDKII Menu > Socket Configuration > IIO Configuration > PCIe ENQCMD /ENQCMDS              Enable
EDKII Menu > Socket Configuration > Processor Configuration > VMX                         Enable

 

Enable an Intel IAA Device

Intel IAA is one of the accelerators integrated into the next-generation Intel Xeon Scalable processors. It aims to improve analytics performance while offloading work from the CPU cores. The Intel IAA hardware resources are as follows:

 

  • Work Queues: On-device storage that holds pending work submitted to the Intel IAA hardware.
  • Work Processing Engines: Operational units within the Intel IAA hardware.
  • Groups: A configurable set of work queues and engines.
     

Enable and Disable the Device with the accel-config Tool

For more information, see accel-config tool under Related Tools and Information.

This example shows how to enable a device (enable one Intel IAA device, and configure one Intel IAA work queue and one engine resource).

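# Assign engine1.0 of device iax1 to group 0
accel-config config-engine iax1/engine1.0 -g 0
# Configure wq1.0: group 0, size 16, priority 10, block on fault, threshold 15,
# shared mode, user type, name app1, user driver
accel-config config-wq iax1/wq1.0 -g 0 -s 16 -p 10 -b 1 -t 15 -m shared -y user -n app1 -d user
# Enable the device first, then the work queue
accel-config enable-device iax1
accel-config enable-wq iax1/wq1.0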
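
When several Intel IAA devices need the same setup, the sequence above can be generated per device. The sketch below only prints the commands. It assumes that the work queue and engine names follow the wq<id>.0 and engine<id>.0 pattern of the iax1 example. The function name iaa_enable_commands is hypothetical.

```shell
# iaa_enable_commands DEVICE_ID
# Prints the four accel-config commands from the example above for device
# iax<DEVICE_ID>, using work queue wq<DEVICE_ID>.0 and engine<DEVICE_ID>.0.
# The one-queue, one-engine layout mirrors the example and is illustrative.
iaa_enable_commands() {
    id="$1"
    dev="iax${id}"
    echo "accel-config config-engine ${dev}/engine${id}.0 -g 0"
    echo "accel-config config-wq ${dev}/wq${id}.0 -g 0 -s 16 -p 10 -b 1 -t 15 -m shared -y user -n app1 -d user"
    echo "accel-config enable-device ${dev}"
    echo "accel-config enable-wq ${dev}/wq${id}.0"
}

# Dry run: show what would be executed for device iax1.
iaa_enable_commands 1
```

To run the commands instead of printing them, the output can be piped into a shell that stops at the first failure, for example `iaa_enable_commands 1 | sudo sh -e`. Configuring devices typically requires root privileges.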

 

Run accel-config --help COMMAND for more information on a specific command, or accel-config --list-cmds to see all available commands.

Here’s an example of the accel-config --help config-engine output:

NAME
       accel-config-config-engine - configure individual attributes of an engine

SYNOPSIS
       accel-config config-engine <device name>/<engine name> [<options>]

EXAMPLE
       accel-config config-engine dsa0/engine1.2 --group-id=0

OPTIONS
       -g, --group-id=
           specify the group-id for this engine, group-id should be between 0 and the maximum number of groups per device
           shown in max_groups attribute under a device. A value of -1 disassociates the engine from any group.

The following commands disable work queue wq1.0 and then device iax1:

accel-config disable-wq iax1/wq1.0
accel-config disable-device iax1

Use the following command to show the enabled device:

accel-config list
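
In addition to the list output, an enabled user-type work queue is normally exposed as a character device that applications open. The sketch below checks for that device node. The path /dev/iax/wq1.0 is an assumption based on the iax1/wq1.0 example and may differ on your system. The function name wq_node_ready is hypothetical.

```shell
# wq_node_ready PATH
# Succeeds when PATH exists and is a character device.
wq_node_ready() {
    [ -c "$1" ]
}

# Assumed device node for the work queue enabled earlier (iax1/wq1.0).
wq_path="/dev/iax/wq1.0"
if wq_node_ready "$wq_path"; then
    echo "$wq_path is available"
else
    echo "$wq_path not found; check the accel-config list output"
fi
```

The process that runs ClickHouse also needs permission to open this node, so the check is most useful when it is run as that user.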

 

Software Tuning

Software configuration tuning is essential. Default settings, from the operating system to ClickHouse, are designed for general-purpose use and are rarely tuned for the best performance.

ClickHouse with Intel IAA and Intel AVX-512 VBMI and VBMI2 Architecture

ClickHouse supports multiple storage engines, including the MergeTree engine, File engine, Apache Hadoop* Distributed File System (HDFS) engine, and S3 engine. During query execution, data is first read from storage, and then decompressed or decoded into in-memory ClickHouse columns. Operations such as filtering and aggregation are then run according to the query plan. The following diagram shows the optimizations applied to ClickHouse with Intel IAA and Intel AVX-512 extensions:

 

 

ClickHouse Software Tuning

Deflate compression and the Parquet file reader optimization are based on Intel QPL. The index and filter operators are optimized with Intel AVX-512 VBMI and VBMI2. Enabling these features has some prerequisites:

The default release packages support the Intel AVX-512 feature, and the latest release already supports compression and decompression with Intel IAA. You can also build ClickHouse yourself. To do so, refer to the “Build Clickhouse with DEFLATE_QPL” documentation.

The configuration specific to each feature is described below.

1. Deflate (de)compression using Intel QPL and Intel IAA

This feature is supported after release v22.8. To use DEFLATE_QPL, see the ClickHouse documentation.

You can change the default compression method of a server configuration for the MergeTree engine family to DEFLATE_QPL.


<compression incl="clickhouse_compression">
    <case>
        <min_part_size>10000000000</min_part_size>
        <min_part_size_ratio>0.01</min_part_size_ratio>
        <method>deflate_qpl</method>
        <level>1</level>
    </case>
</compression>
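
One way to apply this fragment is to place it in its own file under the server's config.d directory instead of editing the main configuration file. The sketch below writes such a file. The <clickhouse> root element and the /etc/clickhouse-server/config.d location follow common ClickHouse conventions. The file name compression_deflate_qpl.xml and the function name write_deflate_qpl_config are assumptions made for illustration.

```shell
# write_deflate_qpl_config TARGET_DIR
# Writes the compression fragment shown above, wrapped in a <clickhouse>
# root element, to TARGET_DIR/compression_deflate_qpl.xml.
# The file name is illustrative, not required by ClickHouse.
write_deflate_qpl_config() {
    target_dir="$1"
    mkdir -p "$target_dir" || return 1
    cat > "$target_dir/compression_deflate_qpl.xml" <<'EOF'
<clickhouse>
    <compression incl="clickhouse_compression">
        <case>
            <min_part_size>10000000000</min_part_size>
            <min_part_size_ratio>0.01</min_part_size_ratio>
            <method>deflate_qpl</method>
            <level>1</level>
        </case>
    </compression>
</clickhouse>
EOF
}
```

A typical call would be `write_deflate_qpl_config /etc/clickhouse-server/config.d`, followed by a server restart as the conservative way to make sure the setting is picked up. Note that with the values shown, the case applies only to parts of at least 10000000000 bytes that also make up at least 1% of the table.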

 

Alternatively, you can define the compression method for each column in the CREATE TABLE query.

 

 

CREATE TABLE codec_example
(
   dt Date CODEC(DEFLATE_QPL),
   ts DateTime CODEC(DEFLATE_QPL),
   float_value Float32 CODEC(NONE),
   double_value Float64 CODEC(LZ4HC(9)),
   value Float32 CODEC(Delta, ZSTD)
)
ENGINE = <Engine>
...
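
After the table is created and populated, the codec applied to each column and its effect on size can be inspected through the system.columns table. The sketch below builds such a query for clickhouse-client. The function names are hypothetical, and the table name codec_example comes from the example above.

```shell
# codec_report_query TABLE
# Prints a query that lists each column's codec together with its
# compressed and uncompressed size, read from system.columns.
codec_report_query() {
    table="$1"
    printf "SELECT name, compression_codec, data_compressed_bytes, data_uncompressed_bytes FROM system.columns WHERE database = currentDatabase() AND table = '%s' FORMAT PrettyCompact\n" "$table"
}

# show_codecs TABLE
# Runs the query against the local server through clickhouse-client.
show_codecs() {
    clickhouse-client --query "$(codec_report_query "$1")"
}
```

Calling `show_codecs codec_example` on a host with a running server prints one row per column. Dividing data_uncompressed_bytes by data_compressed_bytes gives a per-column compression ratio that can be compared across codecs.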

 

2. Parquet Reader in Arrow Using Intel QPL and Intel IAA

This feature has been submitted to apache/arrow (#14585) but has not been merged yet. To integrate it, apply this pull request to your Arrow sources. When Apache Parquet format data is then selected, the bit-packing and run-length encoding (RLE) decoding work is offloaded to Intel IAA devices.

3. Enable Intel AVX-512 VBMI and VBMI2 in ClickHouse

This feature is supported after version v22.10. To use the two extensions to accelerate index and filter operations, update to a ClickHouse release that includes them. Dynamic dispatch is already enabled, so no extra build flags are required. ClickHouse uses the optimized path when it runs on platforms based on the 4th generation Intel Xeon Scalable processor.
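
Because the optimized path is selected at run time, it helps to confirm that the CPU reports both extensions. The sketch below checks the flags line of /proc/cpuinfo on Linux, where the two extensions appear as avx512vbmi and avx512_vbmi2. The function name has_cpu_flag is hypothetical.

```shell
# has_cpu_flag FLAG [CPUINFO_FILE]
# Succeeds when FLAG appears as a whole word on the first "flags" line
# of CPUINFO_FILE (default: /proc/cpuinfo).
has_cpu_flag() {
    flag="$1"
    cpuinfo="${2:-/proc/cpuinfo}"
    grep -m1 '^flags' "$cpuinfo" 2>/dev/null | grep -qw -- "$flag"
}

# Report the two extensions used by the ClickHouse optimizations.
for f in avx512vbmi avx512_vbmi2; do
    if has_cpu_flag "$f"; then
        echo "$f: present"
    else
        echo "$f: not reported"
    fi
done
```

Inside a virtual machine or container, the flags reflect what the hypervisor exposes, so a missing flag does not always mean that the physical processor lacks the extension.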

Conclusion

Intel IAA and Intel AVX-512 VBMI and VBMI2 are built into the 4th generation Intel Xeon Scalable processor. Several ClickHouse optimizations build on Intel IAA hardware and these Intel AVX-512 extensions. Intel IAA enables the high compression ratio that large datasets require. With the tools provided, integrating Intel IAA into an environment becomes easier.

Apache Parquet* is an open source, column-oriented data file format designed for efficient data storage and retrieval. It provides efficient data compression and encoding schemes with enhanced performance to handle complex data in bulk. Intel IAA can offload Parquet run-length encoding (RLE) decoding work from the CPU to the Intel IAA device. A test in our lab environment showed better performance with Intel IAA accelerators.

Index and filter operations are used frequently in ClickHouse and can be accelerated by Intel AVX-512 extensions. Dynamic dispatch is already enabled, and ClickHouse uses an optimized path when running on platforms based on the 4th generation Intel Xeon Scalable processor.

Related Tools and Information

accel-config tool

accel-config is a utility library for controlling and configuring the Intel IAA subsystem in the Linux* kernel. Use accel-config -h to check whether the tool works. If it does not, clone the GitHub repository, and then follow the README to build and install it.

References

  1. Intel QPL is an open source library that provides high-performance query processing operations on Intel CPUs. It supports the capabilities of the new Intel IAA available on 4th generation Intel Xeon Scalable processors, such as high-throughput compression and decompression combined with primitive analytic functions, and provides a highly optimized software fallback on other Intel CPUs.

     For more information about Intel QPL, see:

     GitHub Repository

     Documentation

  2. Intel IAA Specification

Feedback

We value your feedback. If you have comments (positive or negative) on this guide or are seeking something that is not part of this guide, let us know what you think.