Python* Data Science at Scale: Speed Up Your End-to-End Workflow

Overview

Data scientists and AI developers need to explore and experiment with extremely large datasets as they converge on novel solutions for deployment in production applications. Exploration and experimentation mean a lot of iteration, which is only feasible with fast turnaround times. Model training performance is an important part of that, but the entire end-to-end process must be addressed. Loading, exploring, cleaning, and adding features to large datasets is often so time-consuming that it limits exploration and experimentation. Responsiveness during inference is also often crucial once a model is deployed.

Many solutions for large-scale AI development require installing new packages and rewriting code to use their APIs. For instance, data scientists and AI developers often use pandas to load data for machine learning applications. But once the dataset reaches about 100 MB or larger, loading and cleaning the data slows down considerably because pandas runs on a single core.

As a result, developers must change their workflow to use different data loading and preprocessing tools, such as Apache Spark*, which requires data scientists to learn the Spark API and overhaul their code to integrate it. This usually comes at an inopportune time and is not a good use of data scientists' and AI developers' skills.

Intel has been working to improve the performance of popular Python* libraries while maintaining the usability of Python by implementing the key underlying algorithms in native code using oneAPI performance libraries. This delivers concurrency at multiple levels, such as vectorization, multithreading, and multiprocessing, with minimal impact on existing code. For example:

  • Modin* scales pandas DataFrames to multiple cores with a single line of code change.
  • Intel® Optimization for PyTorch* and Intel® Optimization for TensorFlow* accelerate deep learning training and inference.
  • Intel® Extension for Scikit-learn* and XGBoost optimized for Intel architecture speed up machine learning algorithms with no code changes.
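As a rough illustration of how small these changes can be, the sketch below swaps the pandas import for Modin and enables Intel Extension for Scikit-learn through its `patch_sklearn()` call. It is a minimal sketch, not the code shown in the session. It assumes pandas and scikit-learn are installed, and it falls back to the stock libraries when Modin or the extension is missing. Modin also needs an execution engine such as Ray or Dask at run time. The column names and values are made up for illustration and are not taken from the New York City taxi fare dataset.

```python
# Optional accelerators: each falls back to the stock library if not installed.
try:
    from sklearnex import patch_sklearn  # Intel Extension for Scikit-learn

    patch_sklearn()  # call before importing scikit-learn estimators
    SKLEARN_BACKEND = "sklearnex"
except ImportError:
    SKLEARN_BACKEND = "stock scikit-learn"

try:
    import modin.pandas as pd  # the one-line change: was `import pandas as pd`

    DATAFRAME_BACKEND = "modin"
except ImportError:
    import pandas as pd

    DATAFRAME_BACKEND = "pandas"

# Imported after the optional patch so the accelerated estimator is picked up.
from sklearn.linear_model import LinearRegression

# Tiny synthetic table; the column names and values are hypothetical.
df = pd.DataFrame(
    {
        "trip_distance": [1.0, 2.0, 3.0, 4.0],
        "fare_amount": [6.0, 10.0, 14.0, 18.0],
    }
)

# Feature engineering uses the ordinary pandas API on either backend.
df["fare_per_mile"] = df["fare_amount"] / df["trip_distance"]

# Model code is unchanged whether or not the extension is active.
model = LinearRegression()
model.fit(df[["trip_distance"]].to_numpy(), df["fare_amount"].to_numpy())
predicted_fare = float(model.predict([[5.0]])[0])

print(DATAFRAME_BACKEND, SKLEARN_BACKEND, predicted_fare)
```

Everything after the imports is ordinary pandas and scikit-learn code, which is the point: the acceleration is opted into at import time and the rest of the workflow stays as written.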

In this session, see how to accelerate your end-to-end workflow with these technologies via a demonstration using the full New York City taxi fare dataset.

 

Presenters

  • Rachel Oberman, technical consulting engineer, Intel
  • Todd Tomashek, machine learning engineer, Intel
  • Albert DeFusco, principal data scientist, Anaconda*

 

Featured Software

Get these Intel-optimized versions of your Python libraries as part of the AI Tools, or download them as stand-alone components:

  • Modin
  • Intel Optimization for PyTorch
  • Intel Optimization for TensorFlow
  • Intel Extension for Scikit-learn
  • XGBoost Optimized for Intel Architecture

 

Additional Resources

AI Tools, Libraries, and Framework Optimizations

 

You May Also Like

Related Articles & Blogs

Optimize End-to-End AI Pipelines

Speed up Databricks* Runtime for Machine Learning with Intel®-optimized Libraries

Scale Your pandas Workflow with Modin—No Rewrite Required

One-Line Code Changes Boost Data Analytics Performance

Related Webinars

Optimize Deep Learning Workloads Using PyTorch Optimized by Intel

Achieve Up to 1.77x Boost Ratio for Your PyTorch AI Workloads

Seamlessly Scale pandas Workloads with a Single Code-Line Change

Drive 2x Performance into Your scikit-learn Machine Learning Tasks

Related Podcast

An Open Road to Swift DataFrame Scaling
