Twitter Boosts Performance and Cost Efficiency

Twitter increases Hadoop performance and cost efficiency with caching, fast SSDs and more compute.

Executive Overview
Storage I/O can be a significant performance bottleneck for Hadoop* clusters, especially in hyperscale deployments like those at Twitter, where a single cluster can have up to 10,000 nodes and nearly 100 PB of logical storage. The typical Hadoop cluster at Twitter contains over 100,000 hard disk drives (HDDs), but this configuration was reaching an I/O performance limit: HDD capacity has increased over time, while HDD performance has not significantly changed.² Simply adding more, bigger HDDs was therefore not going to solve Twitter's scaling challenges. In fact, it would make things worse, because I/O per GB decreases as drive capacity grows. Adding more spindles per node was not feasible due to space and power limitations.
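The point about I/O per unit of capacity can be illustrated with a short calculation. This is a minimal sketch, not a measurement: the per-drive IOPS figure and the helper name `iops_per_tb` are assumptions chosen for illustration, and only the idea that per-drive performance stays roughly flat while capacity grows comes from the text.

```python
def iops_per_tb(drive_iops: float, capacity_tb: float) -> float:
    """Random I/O operations per second available per terabyte stored."""
    return drive_iops / capacity_tb


# Assumed, illustrative figure for a single HDD; not taken from the source.
ASSUMED_HDD_IOPS = 150.0

# Per-drive IOPS is held constant while capacity grows, so the I/O
# available for each stored terabyte shrinks.
for capacity_tb in (2, 6, 12):
    density = iops_per_tb(ASSUMED_HDD_IOPS, capacity_tb)
    print(f"{capacity_tb:>2} TB drive: {density:.1f} IOPS per TB")
```

Under this assumption, every doubling of drive capacity halves the I/O available per stored terabyte, which is why larger drives alone deepen the bottleneck.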

Working in collaboration with an Intel engineering team, Twitter engineers conducted a series of experiments which revealed that storing temporary files managed by YARN* (Yet Another Resource Negotiator*) on a fast SSD enabled significant performance improvements on existing hardware (up to a 50 percent reduction in runtime).³ The team also discovered that removing the storage I/O bottleneck enabled them to use larger hard drives while simultaneously increasing processor utilization, which in turn made it possible to use higher-core-count processors. This improved storage performance and contributed to higher data center density by reducing the number of required HDDs.
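One common way to place YARN temporary files on a particular device is the NodeManager property `yarn.nodemanager.local-dirs` in yarn-site.xml, which lists the directories used for localized files and intermediate data. The sketch below builds such a property fragment. The mount point is hypothetical, and this overview does not say which mechanism Twitter used, so treat it as an illustration rather than a description of their setup.

```python
import xml.etree.ElementTree as ET


def local_dirs_property(paths):
    """Build a <property> element for yarn.nodemanager.local-dirs."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "yarn.nodemanager.local-dirs"
    # YARN expects a comma-separated list of directories.
    ET.SubElement(prop, "value").text = ",".join(paths)
    return prop


# Hypothetical mount point for the fast SSD; not taken from the source.
ssd_dirs = ["/mnt/fast-ssd/yarn/local"]

fragment = ET.tostring(local_dirs_property(ssd_dirs), encoding="unicode")
print(fragment)
```

The printed fragment would go inside the `<configuration>` element of yarn-site.xml on each NodeManager host.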

Higher density leads to total cost of ownership (TCO) savings through energy efficiency, fewer racks, and a smaller data center footprint. Overall, Twitter expects that caching temporary data and increasing core counts will result in approximately 30 percent lower TCO and over 50 percent faster runtimes compared to its legacy production cluster configuration.¹

Read the white paper: Boosting Hadoop* Performance and Cost Efficiency with Caching, Fast SSDs, and More Compute

Notices and Disclaimers

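Intel® technologies' features and benefits depend on system configuration and may require enabled hardware, software, or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer, or learn more at www.intel.cn. // Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information, visit www.intel.cn/benchmarks. // Performance results are based on testing as of the date set forth in the configurations and may not reflect all publicly available security updates. See the configuration disclosure for details. No product or component can be absolutely secure. // Cost reduction scenarios described are intended as examples of how a given Intel®-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction. // Intel does not control or audit third-party benchmark data or the websites referenced in this document. You should visit the referenced website and confirm whether the referenced data are accurate. // In some test cases, results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and are provided for informational purposes only. Any differences in your system hardware, software, or configuration may affect your actual performance.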

Product and Performance Information

1

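Baseline: single-socket Intel® Xeon® processor E3-1230 v6 (4 cores), 32 to 64 GB memory, one 1 TB or 2 TB HDD, Intel S4500 240 GB boot drive, 1 GbE to 10 GbE Ethernet; no caching. Test: single-socket Intel® Xeon® Gold 6262 processor (24 cores), 192 GB RAM, Intel S4500 240 GB boot drive, eight 6 TB HDDs, one Intel® SSD DC P4610 Series 6.4 TB, 25 GbE Ethernet, caching with Intel® Cache Acceleration Software (Intel® CAS). OS: Twitter CentOS* 6 Derivative, kernel version: 2.6.74-t1.el6.x86_64 (based on upstream 4.14.12 kernel), BIOS version: D3WWM11, microcode version: 0xb000021.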

3

Baseline: dual-socket Intel® Xeon® processor E5-2630 v4 @ 2.2 GHz (10 cores/20 threads per socket), 128 GB RAM, twelve 6 TB 7200 RPM SATA HDDs, one SATA SSD boot drive, 25 GbE Ethernet, 102 nodes across 6 racks. Workloads: Gridmix* and Terasort*. Gridmix score: 3309 seconds, Terasort score: 5504 seconds. Test: dual-socket Intel® Xeon® processor E5-2630 v4 @ 2.2 GHz (10 cores/20 threads per socket), 128 GB RAM, twelve 6 TB 7200 RPM SATA HDDs, one SATA SSD boot drive, one 750 GB Intel® Optane™ SSD DC P4800X (NVMe*), 25 GbE Ethernet, 102 nodes across 6 racks. Workloads: Gridmix and Terasort. Gridmix score: 2396 seconds, Terasort score: 2640 seconds. OS: Twitter CentOS* 6 Derivative, kernel.
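The Gridmix and Terasort run times listed in footnote 3 can be turned into relative reductions with a few lines of arithmetic. The sketch below uses only the four times given in the footnote; the function name `runtime_reduction` and the `RUNS` table are introduced here for illustration.

```python
def runtime_reduction(baseline_s: float, test_s: float) -> float:
    """Fraction of the baseline runtime that the test configuration saved."""
    return (baseline_s - test_s) / baseline_s


# (baseline seconds, test seconds), as listed in footnote 3.
RUNS = {
    "Gridmix": (3309, 2396),
    "Terasort": (5504, 2640),
}

for workload, (baseline_s, test_s) in RUNS.items():
    saved = runtime_reduction(baseline_s, test_s)
    print(f"{workload}: {saved:.1%} shorter runtime")
```

With these figures, the Terasort run takes roughly half as long as the baseline, consistent with the "up to a 50 percent reduction in runtime" quoted in the overview. The Gridmix reduction is smaller.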