Baidu ABC Storage: Redefining Object Storage

Baidu ABC Storage leverages Intel® Optane™ SSD and Intel® QLC 3D NAND SSD technology to drive performance and capacity.

Advanced workloads, such as Artificial Intelligence (AI) training, big data processing, and High Performance Computing (HPC), are shaping the development of private cloud storage services. Enterprises increasingly depend on storage systems for massive data, especially high-performance storage for massive quantities of unstructured small files. As a leading enterprise in IT and the internet industry, Baidu AI Cloud* applied its years of experience in public cloud storage technologies to a private cloud storage solution, a crucial component of its ABC (AI, Big Data, Cloud) strategy. In partnership with Intel, Baidu AI Cloud chose a combination of SSDs with Intel® Optane™ technology and Intel® QLC technology as the core hardware of ABC Storage’s all-flash object storage solution.

“Baidu AI Cloud expects its high performance all-flash object storage solution to help private cloud users tackle the challenges posed by massive unstructured small files. The combination of Intel® Optane™ Solid State Drives (SSD) and Intel® SSD based on Intel® QLC 3D NAND Technology has helped our solution yield optimum results in terms of stability and Input/Output Operations Per Second (IOPS).” - Baidu AI Cloud ABC Storage Team

Data Growth—Opportunity and Challenge
The volume of worldwide data is expected to swell to 163 zettabytes (ZB) by 2025 [1]. Massive data, and the explosive growth of unstructured data in particular, has become a driving force for the digitization of enterprises and for the rapid, continued evolution of related IT technologies. Data at this scale is expected to enable breakthroughs in areas such as computer vision, speech recognition, and financial risk control. Effective management, processing, and utilization of massive data has therefore become a key competitive factor for enterprises that want to maintain an edge in their industries.

However, storing massive unstructured data challenges traditional storage systems in four areas: file size and quantity, indexing, access patterns, and legacy storage media (i.e., spinning drives). Block storage and file storage systems are not ideal for small files, while AI and other new applications place higher demands on the read/write performance of storage systems.

File Size and Quantity: The performance of traditional file storage systems tends to become volatile and decline as file quantities grow rapidly. In AI training scenarios, such as image recognition, training datasets contain enormous numbers of files, typically small ones. Likewise, for popular internet applications, such as media asset management, unmanned vehicles, and video services, the number of files stored and processed usually reaches hundreds of millions. The resulting decline and volatility in IOPS is most pronounced in traditional file storage, such as Network Attached Storage (NAS) systems.

Indexing: File storage systems currently use hash tree and B+ tree structures to manage and index directories. The efficiency and performance of these algorithms tend to decline significantly when retrieving from directories that contain more than 100 million files.

Accessing: In certain application scenarios, “Write Once, Read Many” or “mixed read/write” access modes further exacerbate the performance challenges. A common file I/O sequence comprises “open”, “search”, “read/write”, and “close” operations, and the “open” that precedes each “read” or “write” takes up the most system time and resources. When handling “mixed read/write” access modes, the system therefore executes “open” operations repeatedly. Under massive concurrency, this wastes a large share of system resources and results in performance loss.
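The cost of repeated “open” operations can be made concrete with a small sketch. It counts how many opens are needed to read a set of small files one by one, and compares that with reading the same payloads from a single packed file through an offset index. Packing small files is a generic technique used here only for illustration; the source does not state that ABC Storage works this way, and all function and file names below are hypothetical.

```python
import os
import tempfile


def read_files_individually(paths):
    """Read each small file with its own open/read/close cycle."""
    opens = 0
    data = []
    for path in paths:
        with open(path, "rb") as f:  # one "open" per file
            opens += 1
            data.append(f.read())
    return data, opens


def pack_files(paths, packed_path):
    """Concatenate files into one file; return {name: (offset, length)}."""
    index = {}
    offset = 0
    with open(packed_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as src:
                payload = src.read()
            out.write(payload)
            index[os.path.basename(path)] = (offset, len(payload))
            offset += len(payload)
    return index


def read_from_pack(packed_path, index, names):
    """Read many payloads through a single open, using seek + read."""
    opens = 0
    data = []
    with open(packed_path, "rb") as packed:  # single "open" for all reads
        opens += 1
        for name in names:
            offset, length = index[name]
            packed.seek(offset)
            data.append(packed.read(length))
    return data, opens


def demo(file_count=100):
    """Return (payloads_match, opens_individual, opens_packed)."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i in range(file_count):
            path = os.path.join(tmp, f"sample_{i:04d}.bin")
            with open(path, "wb") as f:
                f.write(f"payload-{i}".encode())
            paths.append(path)

        individual, opens_individual = read_files_individually(paths)

        packed_path = os.path.join(tmp, "packed.bin")
        index = pack_files(paths, packed_path)
        names = [os.path.basename(p) for p in paths]
        packed, opens_packed = read_from_pack(packed_path, index, names)

        return individual == packed, opens_individual, opens_packed


if __name__ == "__main__":
    print(demo())
```

The number of opens grows linearly with the file count in the first approach and stays constant in the second, which is the kind of overhead the paragraph above describes.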

HDDs: The weaknesses of traditional HDDs in IOPS and random read/write performance have hindered performance upgrades of storage systems. Due to mechanical limitations, even higher-performance HDDs deliver random read/write IOPS only in the hundreds [2]. Efficiency is even lower when processing small files, because the HDD must continuously seek to files at different storage locations.
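As a rough worked example, the seek-plus-rotation estimate can be written as IOPS ≈ 1000 ms / (average seek time + average rotational latency), with both times in milliseconds and the rotational latency taken as half a revolution. The drive parameters below are assumed, illustrative values and do not come from the source.

```python
def rotational_latency_ms(rpm):
    """Average rotational latency in ms: the time for half a revolution."""
    return 0.5 * 60_000.0 / rpm


def estimated_hdd_iops(seek_ms, rpm):
    """Estimate random IOPS as 1000 ms / (seek time + rotational latency)."""
    return 1000.0 / (seek_ms + rotational_latency_ms(rpm))


if __name__ == "__main__":
    # Assumed example drives: (label, average seek time in ms, spindle speed).
    examples = [("7,200 RPM", 8.5, 7200), ("15,000 RPM", 3.5, 15000)]
    for label, seek_ms, rpm in examples:
        iops = estimated_hdd_iops(seek_ms, rpm)
        print(f"{label}: about {iops:.0f} random IOPS")
```

Under these assumptions the estimate stays in the tens to low hundreds of IOPS, because each random access pays for mechanical movement measured in milliseconds.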

Baidu ABC Storage’s High-Performance, All-Flash Storage Solution
Baidu has gained widespread recognition for its work in search technologies. With over 100 billion pages, 2,000 petabytes (PB) of data stored, and 100 PB of data processed per day [3], Baidu is well-versed in the technological challenges brought about by the storage of massive unstructured small files.

Baidu AI Cloud has attempted to tackle the above challenges through software improvements and Intel®-based hardware enhancements.

Figure 1. Performance stability test results for the Baidu AI Cloud ABC Storage object storage solution

Developers incorporated Baidu’s high-performance object storage engine into the new solution, which provides data life cycle management, data protection strategies, efficient retrieval, InfiniBand* Architecture network and RDMA support, and flexible rights management mechanisms. In addition, through flat deployment for object storage, high-efficiency retrieval, and exabyte scalability, the ABC Storage high-performance object storage engine gives private cloud users storage suited to massive numbers of unstructured small files.
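One common reading of “flat” in object storage is a flat key namespace, in which the full object key is resolved with a single index lookup instead of one lookup per directory level. The toy sketch below contrasts the two models under that assumption. It is not a description of Baidu’s engine, and the class names and object key are hypothetical.

```python
class DirectoryTree:
    """Toy hierarchical namespace: nested dicts, one lookup per path part."""

    def __init__(self):
        self.root = {}

    def put(self, path, value):
        parts = path.strip("/").split("/")
        node = self.root
        for part in parts[:-1]:
            node = node.setdefault(part, {})  # create directories as needed
        node[parts[-1]] = value

    def get(self, path):
        """Return (value, number of per-level lookups performed)."""
        node = self.root
        lookups = 0
        for part in path.strip("/").split("/"):
            node = node[part]
            lookups += 1
        return node, lookups


class FlatNamespace:
    """Toy flat namespace: the whole key maps directly to the object."""

    def __init__(self):
        self.objects = {}

    def put(self, key, value):
        self.objects[key] = value

    def get(self, key):
        """Return (value, number of lookups performed), always one lookup."""
        return self.objects[key], 1


if __name__ == "__main__":
    key = "datasets/images/train/cat_0001.jpg"  # hypothetical object key
    tree, flat = DirectoryTree(), FlatNamespace()
    tree.put(key, b"jpeg-bytes")
    flat.put(key, b"jpeg-bytes")
    print("hierarchical lookups:", tree.get(key)[1])
    print("flat lookups:", flat.get(key)[1])
```

In a real system each lookup is an index or metadata operation rather than an in-memory dictionary access, so the difference in lookup count matters more as directories grow.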

An AI training process comprises data collection, cleaning and labeling, resizing, modeling, training, evaluation, and prediction. Each step requires the storage system to perform read, write, and retrieve operations. Throughout training, the data is accessed iteratively and with high concurrency so that the training system receives enough data to run at full load.

Baidu’s object storage engine addresses the performance issues caused by massive file counts, enabling storage systems to deliver stable performance and boosting the data utilization efficiency of AI applications. The engine is also further optimized for the mixed read/write operations that occur during training, so that system performance remains unaffected in those scenarios.

Test results for the various optimizations show that the software alone maintains stable performance as file quantities increase. As shown in Figure 1, Queries Per Second (QPS) and latency fluctuated within a 5 percent range [4] as file quantities gradually increased from 100 million to 8 billion.

As described above, HDDs present several challenges for high-performance storage solutions. SSDs have virtually no seek time or rotational latency, thereby resulting in high IOPS performance compared to HDDs. Baidu AI Cloud uses a combination of Intel® Optane™ SSD and Intel® QLC 3D NAND SSD technology to make up the core hardware for the ABC Storage all-flash object storage solution. Intel Optane SSDs feature innovative Intel® 3D XPoint™ Storage Media and incorporate advanced system memory controllers, interface hardware, and software technology, delivering low latency and high stability. The Baidu solution uses the following devices:

Intel® Optane™ SSD DC P4800X is deployed in core areas of the storage system, such as the cache, MDS, and log system. This device offers up to 550,000 random read/write IOPS and read/write latency below 10 µs [5], enabling the solution to perform more effectively in multi-user and high-concurrency scenarios. Its drive writes per day (DWPD) rating also provides a longer lifespan and better economic value.

Intel® SSD D5-P4320, based on QLC technology, provides large-capacity data storage. Intel’s 64-layer 3D NAND technology enables a single QLC SSD capacity of up to 7.68 TB, enough to meet the storage requirements of massive data. The drive also delivers up to 427,000 random read IOPS [7] and, when paired with the Intel® Xeon® Gold 6142 processor, is especially well suited to the “Write Once, Read Many” (WORM) performance requirements of application scenarios such as AI training.

In the ABC Storage solution, each storage server is deployed with four SSDs, which together store up to 2 billion 15 KB files in 30 TB of capacity. More importantly, the price/performance ratio of the Intel QLC 3D NAND SSDs allows this combination of SSDs to deliver high performance while lowering the Total Cost of Ownership (TCO) of the system. Baidu testing has shown that the Baidu AI Cloud high-performance all-flash solution could lower TCO by 60 percent [6].
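The per-server figures can be checked with simple arithmetic. The sketch below assumes decimal units (1 TB = 10^9 KB) and that the four SSDs are the 7.68 TB QLC drives described above. It ignores metadata, replication, and over-provisioning, none of which the source quantifies.

```python
DRIVES_PER_SERVER = 4          # from the text
DRIVE_CAPACITY_TB = 7.68       # assumed: the 7.68 TB QLC drive described above
FILE_COUNT = 2_000_000_000     # from the text
FILE_SIZE_KB = 15              # from the text
KB_PER_TB = 1_000_000_000      # decimal units


def raw_capacity_tb(drives, drive_tb):
    """Raw capacity of one server's data drives, in TB."""
    return drives * drive_tb


def dataset_size_tb(file_count, file_kb):
    """Total payload size of the files, in TB."""
    return file_count * file_kb / KB_PER_TB


if __name__ == "__main__":
    raw_tb = raw_capacity_tb(DRIVES_PER_SERVER, DRIVE_CAPACITY_TB)
    data_tb = dataset_size_tb(FILE_COUNT, FILE_SIZE_KB)
    print(f"raw capacity per server: {raw_tb:.2f} TB")
    print(f"2 billion 15 KB files:   {data_tb:.2f} TB")
```

Under these assumptions, 2 billion 15 KB files occupy 30 TB, which fits within the 30.72 TB of raw capacity that four 7.68 TB drives provide.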

With the support of Intel, the Baidu AI Cloud team carried out a detailed evaluation and measurement of the ABC Storage all-flash storage solution. Figure 2 shows the benchmark test framework: a cluster of five servers, each configured with two Intel® Xeon® Gold 6142 processors and 256 GB of memory. One 750 GB Intel Optane SSD DC P4800X and four 7.68 TB Intel SSD D5-P4320 drives were used. A 40 GbE network connected the system to the computing platform.
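The test cluster can be summarized as a small configuration record. The sketch below assumes that the drive counts apply to each of the five servers, which the text implies but does not state outright. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageServer:
    cpu_model: str
    cpu_count: int
    memory_gb: int
    optane_drives: int   # assumed to be per server
    optane_gb: int
    qlc_drives: int      # assumed to be per server
    qlc_tb: float


def cluster_totals(server, server_count):
    """Aggregate raw resources across identical servers."""
    return {
        "cpus": server.cpu_count * server_count,
        "memory_gb": server.memory_gb * server_count,
        "optane_gb": server.optane_drives * server.optane_gb * server_count,
        "qlc_tb": server.qlc_drives * server.qlc_tb * server_count,
    }


BENCHMARK_SERVER = StorageServer(
    cpu_model="Intel Xeon Gold 6142",
    cpu_count=2,
    memory_gb=256,
    optane_drives=1,
    optane_gb=750,
    qlc_drives=4,
    qlc_tb=7.68,
)

if __name__ == "__main__":
    totals = cluster_totals(BENCHMARK_SERVER, server_count=5)
    for name, value in totals.items():
        print(f"{name}: {value:.2f}")
```

With these assumptions the cluster holds about 153.6 TB of raw QLC capacity and 3,750 GB of Intel Optane capacity. Usable capacity would be lower once data protection is accounted for.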

Testing showed that the combination of the Intel Optane SSD and Intel 3D NAND QLC SSD technology adequately meets the storage system performance requirements for AI training application scenarios. Table 1 shows the performance results of the basic ABC Storage version.

Figure 2. The Benchmark Test Framework for ABC Storage’s All-flash Storage Solution

Table 1. Benchmark performance test results for ABC Storage’s all-flash storage solution [4]

Future Prospects
As one of the crucial practical outcomes of the Baidu AI Cloud ABC strategy, the ABC Storage high-performance all-flash object storage solution provides strong and reliable support for private cloud application scenarios, such as AI training, big data analysis, and high-performance computing, through its improved storage performance and capacity.

Intel’s products and technologies are crucial factors in the success of the solution. In the future, both parties plan to work together further to optimize the performance of the existing solutions while incorporating more of Intel’s products and technologies. Both parties also plan to extend the all-flash high-performance object storage solution to more application scenarios, so that massive data becomes a driving force for the development of IT technologies and the digitization of enterprises.

The Advantages of the Baidu AI Cloud Solution

  • The ABC Storage high-performance object storage engine provides an integrated object storage interface for application scenarios, such as AI training and high performance computing, thereby providing stable performance output even with a rapid increase in file quantities.
  • With targeted optimizations, the ABC Storage high-performance object storage engine helps storage systems maintain good performance across the “read/write”, WORM, and “mixed read/write” access patterns that massive data requires.
  • The combination of the Intel® Optane™ SSD and the Intel® SSD based on Intel® QLC 3D NAND technology enables the ABC Storage all-flash object storage solution to maintain high performance, while drastically reducing TCO.




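Intel® technologies’ features and benefits depend on system configuration and may require enabled hardware, software, or service activation. Actual performance may vary depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer for more information. // Software and workloads used in performance tests were optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. // Performance results are based on testing as of the dates shown in the configurations and may not reflect all publicly available security updates. See the configuration disclosure for details. No product or component can be absolutely secure. // The cost reduction scenarios described are intended only as examples of how a given Intel®-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction. // Intel does not control or audit the third-party benchmark data or websites referenced in this document. You should visit the referenced websites and confirm whether the referenced data are accurate. // In some test cases, results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and are provided for informational purposes only. Any differences in your system hardware, software, or configuration may affect your actual performance.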


1 Data taken from the IDC report: “Data Age 2025: The Evolution of Data to Life-Critical.”
2 The data is a preliminary estimate based on the formula IOPS = 1000 ms / (seek time + rotational latency), with both times in milliseconds.
3 Data taken from Baidu AI Cloud’s product introduction: “Baidu AI Cloud ABC Storage’s distributed storage products.”
4 The results were provided by Baidu AI Cloud and were based on its internal tests. For more information, please contact Baidu AI Cloud. For the results shown in Figure 3, four storage nodes were configured and the servers were all configured with four Intel® Xeon® processors E5-2620 v4 at 2.10 GHz (with a total of 32 cores and 64 threads), 128 GB of DRAM, and seven 4 TB SATA SSDs. (Note: This test was mainly designed to verify the software solution and was not configured with combinations of the Intel® Optane™ SSD and the Intel® QLC 3D NAND SSD.) During the test, the team imported 4K files before executing “random read” operations at a concurrency of 500.