Red Hat® and Intel® AI and Machine Learning: The Perfect Combination for Data Scientists

An Excellent Way to Optimize Developers' Data Science Workflows

Introduction

Red Hat and Intel are responding to the industry need for a cloud-based platform optimized for data science operations and built with open source components. Their joint solution integrates oneAPI-powered Intel AI solutions (the Intel® AI Analytics Toolkit and the OpenVINO™ toolkit), cnvrg.io, and Habana Gaudi* into Red Hat OpenShift* Data Science.

Solutions for Data Science Productivity

Red Hat OpenShift Data Science is a comprehensive end-to-end environment to accelerate time-to-market for container-based AI solutions. As a cloud service, Red Hat OpenShift Data Science gives data scientists and developers a powerful AI/ML platform for building intelligent applications. Intel’s integrated toolkits amplify these capabilities.

Red Hat OpenShift Data Science (RHODS) is designed to solve data science challenges by:

  • Choosing and deploying the right machine learning (ML) and deep learning (DL) tools (e.g., open source tooling such as Jupyter* Notebooks, TensorFlow*, PyTorch*, and Kubeflow, as well as commercial partner offerings).
  • Reducing the time required to train, test, select, and retrain ML models that provide the highest predictive accuracy.
  • Improving performance of model training and inference tasks using software and hardware acceleration.
  • Reducing reliance on IT operations to provision and manage infrastructure.
  • Improving collaboration between data engineers and software developers to build intelligent applications.

What is AI/ML on Red Hat OpenShift*?

Red Hat OpenShift is an enterprise-grade container-orchestration platform that takes advantage of open source technologies such as Docker*, Kubernetes*, Tekton*, and others.

Red Hat OpenShift Data Science and Intel® AI tools leverage the OpenShift platform to help organizations make the most out of their data—curating and ingesting it, creating models, and deploying them into production—utilizing business processes for data governance, quality assessment, and integration.

What is AI/ML on Intel?

Intel's approach to AI/ML is driven by three principles that are essential to the future of AI:

  1. developing intelligent systems
  2. optimizing hardware and software resources
  3. collaborating with industry-leading partners to offer development platforms

Intel teamed up with Red Hat to integrate its AI portfolio into Red Hat OpenShift Data Science. The service is available on Amazon Web Services (AWS), which runs on Intel hardware and software, and it includes integration with the Intel AI Analytics Toolkit powered by oneAPI. (The toolkit includes essential tools for analyzing, visualizing, and optimizing data sets for machine learning and deep learning workloads.)

Understanding Red Hat OpenShift Data Science

As established above, Red Hat OpenShift Data Science combines, in one common platform, the self-service experience that data scientists and developers want with the confidence that enterprise IT demands. It provides a set of widely used open source data science tools for building intelligent applications, and it lets developers take advantage of the latest Intel technologies in those applications.

Figure 1: AI workflows using Intel® AI toolkits, frameworks, and solutions on Red Hat OpenShift Data Science.

Red Hat OpenShift Data Science Components

The platform is built on widely used open source AI frameworks—JupyterLab*, PyTorch*, TensorFlow, and more—and integrates with a core set of Intel technologies such as the aforementioned AI Analytics Toolkit, plus the OpenVINO toolkit, cnvrg.io, and Habana Gaudi using Amazon EC2 DL1 instances (cnvrg.io and Habana to be available later this year).

Collectively, this makes it easier for data scientists to quickly get started without having to worry about managing the underlying infrastructure.

Let’s look at the key details of each Intel technology and how they work with Red Hat OpenShift Data Science.

Intel AI Analytics Toolkit accelerates end-to-end machine learning and data analytics pipelines with frameworks and libraries optimized for Intel architectures, including:

  • Intel® Distribution for Python*, a distribution of the Python language that provides drop-in performance enhancements for your existing Python code with minimal code changes
  • Intel® Optimizations for TensorFlow and PyTorch to accelerate DL training and inference
  • Model compression for DL inference with the Intel® Neural Compressor
  • Model Zoo for Intel® Architecture, which offers popular pre-trained DL models to run on Intel® Xeon® Scalable processors, and DL reference models on Habana GitHub
  • Optimizations for CPU-intensive, multicore workloads with Intel-optimized versions of scikit-learn* and XGBoost*, plus distributed DataFrame processing as a drop-in for pandas with Intel® Distribution of Modin*
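As an illustration of the drop-in approach, the following minimal sketch enables the Intel® Extension for Scikit-learn* when it is installed and otherwise leaves stock scikit-learn untouched. The helper name `enable_intel_sklearn` is hypothetical and exists only for this example; `patch_sklearn` is the entry point of the `sklearnex` package. Whether that package is preinstalled in a given notebook image is an assumption.

```python
import importlib.util


def enable_intel_sklearn() -> bool:
    """Patch scikit-learn with Intel optimizations when sklearnex is installed.

    Returns True if the patch was applied, False if the extension is missing.
    (Hypothetical helper for illustration; not part of any Intel package.)
    """
    if importlib.util.find_spec("sklearnex") is None:
        return False  # Extension not installed: keep stock scikit-learn.
    from sklearnex import patch_sklearn

    patch_sklearn()  # Swaps supported estimators for accelerated versions.
    return True


if __name__ == "__main__":
    # Call this before importing estimators from sklearn so that the
    # patched versions are the ones that get picked up.
    print("Intel-optimized scikit-learn enabled:", enable_intel_sklearn())
```

After the patch is applied, existing code such as `from sklearn.cluster import KMeans` stays unchanged, which is what makes the enhancement "drop-in."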

OpenVINO toolkit accelerates edge-to-cloud, high-performance model inference including:

  • Support for multiple deep learning frameworks—TensorFlow, Caffe*, PyTorch, MXNet*, Keras*, ONNX*, and more
  • Applicability across several DL tasks such as computer vision, speech recognition, and natural language processing
  • Easy deployment of model server at scale in OpenShift
  • Support for multiple storage options (Amazon S3, Azure Blob Storage, Google Cloud Storage, local)
  • Configurable resource restrictions and security context with OpenShift resource requirements
  • Quantization, filter pruning, and binarization to compress models
  • Configurable service options depending on infrastructure requirements
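Once a model server is deployed, client applications typically reach it over REST. The sketch below builds (without sending) a predict request in the TensorFlow Serving-style format that OpenVINO Model Server accepts. The service URL, port, and model name are hypothetical placeholders, and the response shape noted in the comment should be checked against the server version in use.

```python
import json
import urllib.request


def build_predict_request(base_url: str, model_name: str, instances: list) -> urllib.request.Request:
    """Build (but do not send) a TensorFlow Serving-style REST predict request."""
    url = f"{base_url.rstrip('/')}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical in-cluster service URL and model name.
    req = build_predict_request("http://ovms.example.svc:8080", "resnet", [[0.1, 0.2, 0.3]])
    print(req.get_method(), req.full_url)
    # To actually call a running server (assumed response shape):
    # with urllib.request.urlopen(req) as resp:
    #     predictions = json.load(resp)["predictions"]
```

Keeping request construction separate from the network call makes the client easy to test without a live model server.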

cnvrg.io will extend Red Hat OpenShift Data Science with enterprise-grade MLOps capabilities later this year through out-of-the-box, end-to-end MLOps tooling, including:

  • Advanced MLOps platform to automate the continuous training and deployment of AI and ML models
  • Management of the entire lifecycle: data preprocessing, experimentation, training, testing, versioning, deployment, monitoring, and automatic retraining 
  • Enablement to train and deploy on any infrastructure at scale
  • Managed Kubernetes deployment on any cloud or on-premises environment
  • Open and flexible data science platform, which integrates any open source tool

Habana Gaudi DL1 instances for DL workloads will be available through the Red Hat OpenShift Data Science platform later this year. Gaudi is designed to accelerate model delivery, reduce time-to-train and cost-to-train, and facilitate building new or migrating existing models to Gaudi solutions, as well as deploying them in production environments. Gaudi benefits include:

  • Easy access to Gaudi-based Amazon EC2 DL1 training instances from Red Hat OpenShift Data Science
  • Gaudi hardware accelerators that aim to reduce total cost of ownership (TCO) with a competitive price/performance ratio
  • Streamlined training and deployment for data scientists and developers with Habana GitHub and the Habana SynapseAI software stack, featuring integrated TensorFlow and PyTorch frameworks, documentation, tools, support, reference models, and a developer forum

Red Hat OpenShift Data Science Benefits for Data Scientists & Development Teams

  • Eliminates complex Kubernetes setup tasks – Includes support for a full-featured, managed OpenShift environment and is ready for rapid development, training, and testing.
  • Efficient management of software lifecycles – Through the managed cloud service, Red Hat updates the platform and integrated AI tooling like Jupyter Notebooks, PyTorch, and TensorFlow libraries. Kubernetes operators validate security provisions and automate management of components in the container stack, helping to avoid downtime and minimizing manual maintenance tasks.
  • Provides specialized components and partner support within Jupyter Notebooks – Data scientists can work with familiar tools or tap into a dynamic technology partner ecosystem for deeper AI/ML expertise, including the AI Analytics Toolkit and OpenVINO toolkit.
  • Streamlines development of data analytics solutions – Create models and refine them—from initial pilots to containerized deployments—on a shared, consistent platform. Data scientists can work efficiently with their choice of tools and access to a self-service infrastructure.
  • Publishes models as endpoints – Using the Source-to-Image (S2I) tool built into OpenShift, models are container-ready, which makes it easier to integrate them into an intelligent app. Models can be rebuilt and redeployed as part of a continuous integration/continuous delivery (CI/CD) process based on changes to the source notebook.
  • Harnesses the power of hardware acceleration for high-performance AI workloads – Intel AI tools and solutions unlock high-performance training and inference with the power of hardware acceleration via low-level libraries optimized for frameworks such as TensorFlow and PyTorch.
  • Optimizes and deploys DL models – Using the OpenVINO toolkit, deploy performant inference solutions for Intel XPUs, including various types of CPUs, GPUs, and special DL inference accelerators.
  • Improves productivity and deployment portability using best-in-class AI tools – Scale your code across multiple Intel architectures using the aforementioned tools, all powered by oneAPI, without code changes.
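To make the "models as endpoints" idea concrete, here is a minimal sketch of a prediction service written as a WSGI application using only the Python standard library. The `predict` function is a hypothetical stand-in for a trained model, and the `application` name follows the common WSGI convention. How a particular S2I Python builder image discovers and serves the app depends on that image and is an assumption here.

```python
import io
import json


def predict(features):
    """Hypothetical stand-in for a trained model: returns the mean of the inputs."""
    return sum(features) / len(features)


def application(environ, start_response):
    """WSGI entry point: accepts POSTed JSON {"features": [...]} and returns a prediction."""
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(length) or b"{}")
        body = json.dumps({"prediction": predict(payload["features"])})
        status = "200 OK"
    except (KeyError, TypeError, ValueError, ZeroDivisionError) as exc:
        # Malformed JSON, missing "features", or non-numeric/empty input.
        body = json.dumps({"error": str(exc)})
        status = "400 Bad Request"
    data = body.encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(data)))])
    return [data]


if __name__ == "__main__":
    # In-process smoke test: simulate one POST without opening a socket.
    raw = json.dumps({"features": [1.0, 2.0, 3.0]}).encode("utf-8")
    environ = {"CONTENT_LENGTH": str(len(raw)), "wsgi.input": io.BytesIO(raw)}
    response = application(environ, lambda status, headers: print(status))
    print(response[0].decode("utf-8"))
```

Because the service is a plain WSGI callable, the same file can be exercised in a notebook, run under any WSGI server, and rebuilt into a container image whenever the source changes.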

Conclusion

Red Hat and Intel responded to the industry need for a cloud-based platform that is optimized for data science operations and built with open source components. 

The solution: Red Hat OpenShift Data Science, a collection of open source components that help data scientists build, train, and deploy machine learning models with relative ease and speed.

Red Hat OpenShift Data Science is a managed cloud service provided as an add-on to the Red Hat OpenShift platform and is integrated with the latest Intel technologies to allow data scientists and application developers to quickly build and deploy intelligent applications across the hybrid cloud.

The platform’s design, including its integration with a variety of Intel AI tools, will enable users to leverage the latest Intel technologies and build data science applications using Red Hat OpenShift Cloud Services.

Acknowledgment

We would like to thank Team Red Hat (Christina Xu, Steven Huels, Will McGrath, Audrey Reznik, Leigh Blaylock, Kristin Anderson, Erin Britton, Jeff DeMoss) and Team Intel (Susan Lansing, Maya Perry, Ryan Loney, Jack Erikson, Navin Samuel, Renuke Mendis, Neil Dey, Tony Mongkolsmai, Raghu K Moorthy, Peter Velasquez, Rachel Oberman, Thomas Dewey) for their contributions to the blog, and Monique Torres, Katia Gondarenko, Dan Zloof, Leigh Rosenwald, and Keenan Connolly for their review and approval help.
