Descartes Labs: A Living Atlas

By upgrading to the latest generation Intel® processor, Descartes Labs was able to accelerate its data compression.

Descartes Labs helps companies get business insight from huge volumes of satellite and geographic data, using a combination of Software as a Service and custom development. Because the company handles petabytes of data, compression is hugely important for packaging the data in usefully sized files and for driving down storage costs. By upgrading to the latest generation Intel® processor, provided in the Google Cloud Platform*, Descartes Labs was able to accelerate its compression.

Challenge

  • Enable the storage and processing of petabytes of satellite and geographic data
  • Create an architecture that scales across storage, compute, and networking so it can keep ingesting data regularly as data volumes grow
  • Drive down the cost of the architecture

Solution

  • Descartes Labs chose Google Cloud Platform* for its linear scalability across storage, compute, and networking
  • The company uses preemptible VMs to drive down costs
  • The 96-core Intel® Xeon® Scalable processor is used to deliver the performance required, including for compression
  • Intel® VTune™ Amplifier is used to help identify performance bottlenecks and fine-tune the code

Results

  • Descartes Labs now counts Cargill and DARPA among its customers, which also include businesses in the agriculture, financial services, and utilities industries

Making Sense of Huge Volumes of Satellite Data
Over the last few decades, satellites have shrunk dramatically. Whereas early satellites were the size of a small bus and weighed a ton, today’s CubeSats are closer in size to a smartphone, weighing no more than 1.3 kg per unit. Costs have dropped from around USD 100 million to around USD 65,000.¹ The commercial space industry is working hard to find more affordable ways to launch and recover rockets too, further driving down the cost of launching satellites and acquiring data from space. Within five years, the boom in private satellites could be giving us continuous updates covering the whole planet, every 20 minutes.

For businesses, more timely satellite data represents a unique opportunity to understand and forecast change, both environmental and economic. For example, infrastructure can be measured as it is built, crop yields can be predicted based on imagery of farmland worldwide, and solar capacity can be measured to inform decisions in the energy industry.

Making sense of the huge volumes of satellite data is a big challenge, though. The Landsat 8 satellite captures 3.1 trillion pixels per color band (red, green, blue), totaling 70 trillion pixels between 2013 and 2017. That’s 320 terabytes of data, captured by just one satellite.¹ For a more complete picture, data from different satellites can be combined, but that presents challenges of its own because the data is unlikely to be consistently aligned and formatted.

Solution Details
Descartes Labs is building a digital twin of the world by applying machine learning to satellite imagery and other massive data sets, such as weather data, pricing and customer data. The solution is based in the cloud, which means it can scale storage for the massive data sets, and scale compute capability to enable analysis results and data to be returned more quickly.

The Descartes Labs data refinery offers geographic data including the entire library of satellite data from the NASA Landsat and ESA Sentinel missions, the entire Airbus OneAtlas* catalog, and NOAA’s Global Surface Summary of the Day weather dataset. The data has been combined and cleaned, so it is ready for machine learning analysis.

Customers with experience of machine learning can build their own applications and access Descartes Labs’ data using an application programming interface (API). Data available includes imagery and vector data describing features such as county boundaries. Using a short Python program, it’s possible to build applications that scale to thousands of processor cores in the cloud, enabling the huge volumes of data to be processed quickly. Customers can request geographic data covering a particular region and time period and receive the data back as imagery or a CSV file suitable for analysis in a spreadsheet.

Customers without the experience to write their own solutions can work with the team at Descartes Labs, who can combine customer data sets with Descartes Labs’ own geographic data, then build a machine learning model, and execute it on a subscription basis, with new data being continuously added.

“We were extremely impressed with how GCP scaled linearly across multiple components, not just compute, but how well the network, cloud storage, and Google Cloud PubSub* [used for messaging] all scaled linearly,” said Tim Kelton, co-founder and head of cloud operations at Descartes Labs. “When we began, we were just a few people above a pizza shop in New Mexico, with no physical servers. One of the first things we did was to clean and calibrate 43 years of satellite imagery from NASA, and using GCP we scaled that to 30,000 cores in the cloud.”

Figure 1. Descartes Labs ingests satellite data from multiple sources and writes vector data into a database as images are analyzed.

Technical Components of Solution

  • Google Cloud Platform*. To store the huge volumes of data it handles and to enable highly scalable compute capabilities, Descartes Labs uses Google Cloud Platform for both compute and storage.
  • Intel Xeon Scalable processor. The latest generation Intel processor increases performance, compared to the Intel® Xeon® processor E5 v3 family, which the company was using previously. In particular, the introduction of Intel® Advanced Vector Extensions 512 (Intel® AVX-512) accelerated compression operations, which are essential for optimizing storage costs and packaging data in usefully sized volumes.

Those 43 years of NASA imagery amounted to 1 petabyte. Processing that volume of data could be a weekly requirement within five years, Descartes Labs estimates, so its use of historical data is not only important for analyzing changes over time but also for testing the scalability of the cloud environment.
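To put a weekly petabyte in perspective, the short calculation below converts it into a sustained data rate. It is a back-of-the-envelope sketch only: it assumes a decimal petabyte (10^15 bytes) and perfectly even processing across the week, neither of which is stated in the source.

```python
# Back-of-the-envelope: sustained rate needed to process 1 PB per week.
# Assumption: decimal units (1 PB = 10**15 bytes, 1 GB = 10**9 bytes).
PETABYTE_BYTES = 10**15
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800 seconds

bytes_per_second = PETABYTE_BYTES / SECONDS_PER_WEEK
gb_per_second = bytes_per_second / 10**9

print(f"{gb_per_second:.2f} GB/s sustained")  # prints "1.65 GB/s sustained"
```

Real pipelines are bursty rather than even, so peak requirements would be higher than this average.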

Descartes Labs uses preemptible virtual machines (VMs), which are VMs that Google may withdraw at any time, and which will be available for no more than 24 hours. They are offered at a substantial discount, and have helped Descartes Labs to drive down its costs. The processing pipeline is an embarrassingly parallel problem, which means it can be easily divided up and distributed across multiple cores. Descartes Labs uses a Python queue called Celery* to manage tasks, and ensure they are all completed. Redis Stackdriver* is used for monitoring. Both the queuing and monitoring applications run on non-preemptible VMs to ensure continuity across the application.
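The pattern described above, independent tasks on a queue with any task lost to a withdrawn VM going back for another attempt, can be sketched with the Python standard library. This is an illustration only and not Descartes Labs’ code or Celery’s API: `process_tile`, `run_pipeline`, the `Preempted` exception, and the simulated preemption rate are all hypothetical, and the loop runs in a single process rather than across VMs.

```python
import queue
import random


class Preempted(Exception):
    """Raised when the simulated VM is withdrawn mid-task (hypothetical)."""


def process_tile(tile_id, rng, preempt_rate):
    """Hypothetical unit of work; fails if the simulated VM is preempted."""
    if rng.random() < preempt_rate:
        raise Preempted(tile_id)
    return tile_id * tile_id  # placeholder for real image processing


def run_pipeline(tile_ids, preempt_rate=0.3, seed=0):
    """Process every tile, re-queueing any task lost to preemption."""
    rng = random.Random(seed)
    pending = queue.Queue()
    for tile_id in tile_ids:
        pending.put(tile_id)

    results = {}
    retries = 0
    while not pending.empty():
        tile_id = pending.get()
        try:
            results[tile_id] = process_tile(tile_id, rng, preempt_rate)
        except Preempted:
            retries += 1
            pending.put(tile_id)  # the task goes back on the queue
    return results, retries


results, retries = run_pipeline(range(10))
print(len(results), "tiles completed after", retries, "retries")
```

Because the tasks do not depend on each other, the order in which they finish does not matter, which is what makes re-queueing safe. In this sketch the queue lives in the same process as the workers; keeping it on a non-preemptible machine, as the article describes, is what lets it survive a worker being withdrawn.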

As images are analyzed, information is captured and written to a PostGIS database for geospatial queries, using Google Cloud Pub/Sub for messaging. Google Kubernetes Engine* is used for managing and isolating the workloads of different customers.

Intel Processors Power the Cloud
The Descartes Labs solution runs today on the 96-core Intel Xeon Scalable processor, provided through Google Cloud Platform. The Intel Xeon Scalable processor introduces Intel Advanced Vector Extensions 512 (Intel AVX-512), doubling the amount of data that can be processed simultaneously using a single instruction, compared to the previous generation processor. “We chose the Intel Xeon Scalable platform for its performance,” said Kelton. “We found that we could recompile our code without needing to make any code changes to take advantage of Intel AVX-512.”
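The “doubling” follows from register width: Intel AVX-512 registers are 512 bits wide, while the Intel AVX2 registers on the preceding generation are 256 bits wide. The short calculation below shows how many values fit in one register for two common element types. The choice of element types is illustrative, and wider registers do not by themselves guarantee a twofold speedup for a given workload.

```python
def lanes(register_bits, element_bits):
    """Number of elements that fit in one vector register."""
    return register_bits // element_bits


AVX2_BITS = 256
AVX512_BITS = 512

for name, element_bits in (("float32", 32), ("float64", 64)):
    print(name, lanes(AVX2_BITS, element_bits), "->", lanes(AVX512_BITS, element_bits))
# prints:
# float32 8 -> 16
# float64 4 -> 8
```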

Google Compute Engine running on the Intel Xeon Scalable processor is used for the ingest processing pipelines, where compression is one of the requirements, and for the Software as a Service platform, where models are executed against imagery (which requires expanding the compressed imagery). The software is Descartes Labs’ proprietary stack, written in C, C++ and Python. Customers executing models on the platform often use libraries from the Python machine learning stack such as NumPy, SciPy, scikit-learn, TensorFlow and Keras.

Given the data volumes Descartes Labs is working with, compression is essential to minimize storage cost and to deliver data in usefully sized files. A satellite might capture 15 bands of light, for example, but a particular use case might only require the infrared band. The solution needs to be able to provide just the data required, in a compressed file for ease of use.
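The idea of delivering only the requested band, compressed, can be illustrated in a few lines of Python. This is a minimal sketch under stated assumptions rather than Descartes Labs’ implementation: the scene is synthetic, the band names (including `band_07` standing in for an infrared band) are hypothetical, and the standard library’s `zlib` is used as a stand-in because the source does not name the codec.

```python
import zlib


def make_scene(num_bands=15, pixels=4096):
    """Build a synthetic scene: band name -> raw 8-bit pixel bytes."""
    return {
        f"band_{i:02d}": bytes((i + p // 64) % 256 for p in range(pixels))
        for i in range(num_bands)
    }


def package_bands(scene, wanted):
    """Compress only the requested bands, leaving the rest out entirely."""
    return {name: zlib.compress(scene[name], level=6) for name in wanted}


scene = make_scene()
package = package_bands(scene, ["band_07"])  # hypothetical infrared band

# Expanding the delivered band recovers the original pixels exactly.
restored = zlib.decompress(package["band_07"])
assert restored == scene["band_07"]

full_size = sum(len(band) for band in scene.values())
delivered_size = len(package["band_07"])
print(full_size, "bytes in the full scene;", delivered_size, "bytes delivered")
```

Selecting the band first and compressing second means the work and the file size both scale with what the customer asked for, not with everything the satellite captured.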

The machine learning models can require 1000 iterations to train. “We see improved performance on the Intel Xeon Scalable processor, compared to the Intel Xeon processor E5 v3 family we had used previously,” said Kelton. “I love it when I can get an answer faster, or reduce my billed processing time. That’s pretty amazing! I’ll take either one of those!”

While most of the company’s developers work at the level of the algorithm, coding in C, C++ and Python, one of the engineers is engaged in performance tuning. “We used Intel VTune Amplifier to help optimize the early stages of image preprocessing,” said Kelton. “It helped us to see where our code was spending too much time on a particular operation, so we could debug and fine-tune the details that we couldn’t see in a regular integrated development environment (IDE). Intel makes some of the best tools because they understand the back end architecture and what’s going on in the processor.”

Intel has helped Descartes Labs with advice on isolating workloads in a multitenant environment, and Descartes Labs is exploring the open source project Kata Containers for container security, which Intel contributed to, and the Intel® Distribution for Python, which is tuned to optimize performance on Intel processors.

Winning New Business
Descartes Labs has secured new business from customers in the agriculture, energy, and financial services sectors, among others. “Previously, one company might own 70 percent of production, transportation, and the supply chain for a particular commodity,” said Kelton. “They could trade in the market with greater insight than anyone else. Now, using satellite imagery, there’s more transparency there. We’re starting to see more opportunities for disruption.”

For the grain trader Cargill, Descartes Labs combined Cargill’s data sets with its own to create a model that improved on both companies’ previous models for forecasting corn production in the United States.

The Defense Advanced Research Projects Agency (DARPA) in the US has commissioned Descartes Labs to build cloud infrastructure for its Geospatial Cloud Analytics program, which will integrate up to 75 different types of data. For that phase, Descartes Labs will help organizations to build sample projects on top of the new infrastructure. Potential applications include detecting illegal fishing and monitoring the construction of fracking sites.

Lessons Learned

  • By building a close relationship with its cloud provider, Descartes Labs has had an opportunity to get early access to technologies, including the Intel Xeon Scalable processor, and a chance to help shape Google’s own innovations.
  • Upgrading to the Intel Xeon Scalable processor and recompiling software to take advantage of new processor features can deliver significant performance improvements, depending on the workload.
  • The use of preemptible VMs can drive down costs significantly. The processing pipeline used in Descartes Labs’ workloads can be easily distributed across VMs, and the company has built a queue system to account for the possibility that a VM will be withdrawn at short notice.

Spotlight on Descartes Labs
Founded by a team from Los Alamos National Laboratory in 2014, Descartes Labs is building a digital twin of the world. Through its API and its custom services, it helps companies to use huge volumes of geographic data to inform business decisions. Its customers include Cargill and DARPA, and come from sectors including agriculture, financial services, and utilities.

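Notices and Disclaimers

Intel® technologies’ features and benefits depend on system configuration and may require enabled hardware, software, or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer, or learn more at www.intel.cn. // Software and workloads used in performance tests have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information, visit www.intel.cn/benchmarks. // Performance results are based on testing as of the dates shown in the configurations and may not reflect all publicly available security updates. See the configuration disclosure for details. No product or component can be absolutely secure. // Cost reduction scenarios described are intended only as examples of how a given Intel®-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction. // Intel does not control or audit third-party benchmark data or the websites referenced in this document. You should visit the referenced websites and confirm whether the referenced data are accurate. // In some test cases, results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and are provided for informational purposes only. Any differences in your system hardware, software, or configuration may affect your actual performance.

Product and Performance Information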

1. How Computers See the Earth: A ML Approach to Understanding Satellite Imagery (Cloud Next ’18), Kyle Story, Descartes Labs. https://www.youtube.com/watch?v=5PNnPagENxQ