Pachyzoom: Understanding and Optimizing Apache Hadoop* Servers With Intel® VTune™ Amplifier Platform Profiler

Twitter* collaborated with Intel to find ways to increase the storage density of Apache Hadoop* nodes. The project began with a focus on Intel® Cache Acceleration Software (Intel® CAS) and Intel® Optane™ Solid State Drives, but evolved into a deeper dive into Twitter's existing Apache Hadoop infrastructure using Intel® VTune™ Amplifier Platform Profiler and internal tooling. As bottlenecks were removed, new ones took their place, shifting the focus of our testing. By working with experts from both companies on Apache Hadoop, storage, caching, and telemetry, we were able to challenge several assumptions about Twitter's desired compute/storage balance. Many months of reconfiguration, benchmark testing, and analysis produced a clear direction for the shape of Twitter's next generation of Apache Hadoop hardware. The presentation will discuss the evolution of the project and the key results of the collaboration.