Intel® Neural Compressor

Speed Up AI Inference without Sacrificing Accuracy

Deploy More Efficient Deep Learning Models

Intel® Neural Compressor performs model compression to reduce the model size and increase the speed of deep learning inference for deployment on CPUs or GPUs. This open source Python* library automates popular model compression technologies, such as quantization, pruning, and knowledge distillation across multiple deep learning frameworks.

 

Using this library, you can:

  • Converge quickly on quantized models through automatic, accuracy-driven tuning strategies (see the sketch after this list).
  • Prune model weights by specifying predefined sparsity goals that drive pruning algorithms.
  • Distill knowledge from a larger network (“teacher”) to a smaller network (“student”), training the student to mimic the teacher’s behavior with minimal accuracy loss.
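
For illustration, here is a minimal sketch of accuracy-driven post-training quantization, assuming the Neural Compressor 2.x Python API with PyTorch installed; the toy model, calibration data, and evaluation function are placeholders for your own:

```python
# Minimal sketch: accuracy-driven post-training quantization (assumes the
# Neural Compressor 2.x API). The model, data, and metric are toy
# placeholders; substitute your own network, dataset, and accuracy test.
import torch
from neural_compressor import PostTrainingQuantConfig
from neural_compressor.quantization import fit

# Toy FP32 PyTorch model and random calibration data.
fp32_model = torch.nn.Sequential(
    torch.nn.Linear(64, 32), torch.nn.ReLU(), torch.nn.Linear(32, 10))
calib_loader = torch.utils.data.DataLoader(
    [(torch.randn(64), 0) for _ in range(128)], batch_size=16)

def eval_fn(model):
    # Return the metric the tuner must preserve; measure real accuracy here.
    return 1.0

# fit() tries quantization recipes until the accuracy criterion in the
# config is met, then returns the quantized model.
q_model = fit(model=fp32_model, conf=PostTrainingQuantConfig(),
              calib_dataloader=calib_loader, eval_func=eval_fn)
q_model.save("./int8_model")  # saves the tuned INT8 model and its config
```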

Demonstration of AI Performance and Productivity 

Download as Part of the Toolkit

Intel Neural Compressor is available in the Intel® AI Analytics Toolkit (AI Kit), which provides accelerated machine learning and data analytics pipelines with optimized deep learning frameworks and high-performing Python libraries.

Get It Now
Download the Stand-Alone Version

A stand-alone download of Intel Neural Compressor is available. You can download binaries from Intel or choose your preferred repository.

Download

      

Develop in the Free Intel® Cloud

Get what you need to build and optimize your oneAPI projects for free. With an Intel® Developer Cloud account, you get 120 days of access to the latest Intel® hardware—CPUs, GPUs, FPGAs—and Intel® oneAPI tools and frameworks. No software downloads. No configuration steps. No installations.

Get Access

Features

Model Compression Techniques

  • Quantize data and computation to INT8 or BF16, or to a mixture of FP32, BF16, and INT8, to reduce model size and speed up inference while minimizing precision loss. Quantize during training (quantization-aware training), after training (post-training quantization), or dynamically based on the runtime data range.
  • Prune parameters that have minimal effect on accuracy to reduce the size of a network. Discard weights in structured or unstructured sparsity patterns, or remove filters or layers according to specified rules (a pruning sketch follows this list).
  • Distill knowledge from a teacher network to a student network to improve the accuracy of the compressed model.
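
As an illustration of the pruning flow, here is a minimal sketch assuming the Neural Compressor 2.x `prepare_compression` training API; the model, optimizer, and data are toy placeholders:

```python
# Minimal sketch: gradual weight pruning during training (assumes the
# Neural Compressor 2.x training API). Model, optimizer, and data are
# toy placeholders for a real training setup.
import torch
from neural_compressor.config import WeightPruningConfig
from neural_compressor.training import prepare_compression

model = torch.nn.Sequential(torch.nn.Linear(64, 32), torch.nn.ReLU(),
                            torch.nn.Linear(32, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loader = torch.utils.data.DataLoader(
    [(torch.randn(64), torch.tensor(0)) for _ in range(128)], batch_size=16)

# Reach 80% sparsity gradually between training steps 0 and 20.
config = WeightPruningConfig(target_sparsity=0.8, start_step=0, end_step=20)
compression_manager = prepare_compression(model, config)
compression_manager.callbacks.on_train_begin()  # registers pruning hooks

for epoch in range(3):
    for step, (x, y) in enumerate(loader):
        compression_manager.callbacks.on_step_begin(step)
        loss = torch.nn.functional.cross_entropy(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        compression_manager.callbacks.on_before_optimizer_step()
        optimizer.step()
        compression_manager.callbacks.on_after_optimizer_step()
        compression_manager.callbacks.on_step_end()
compression_manager.callbacks.on_train_end()
```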

Built-in Strategies

  • Automatically optimize models using recipes of model compression techniques to achieve performance objectives while meeting expected accuracy criteria. A minimal configuration sketch follows.
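
For instance, the tuning loop can be bounded with explicit accuracy and trial criteria; a minimal sketch, assuming the 2.x `TuningCriterion` and `AccuracyCriterion` config classes:

```python
# Minimal sketch: bounding the automatic accuracy-driven tuning loop
# (assumes the Neural Compressor 2.x config classes).
from neural_compressor import PostTrainingQuantConfig
from neural_compressor.config import AccuracyCriterion, TuningCriterion

conf = PostTrainingQuantConfig(
    # Stop after 100 trial configurations or 30 minutes, whichever is first.
    tuning_criterion=TuningCriterion(max_trials=100, timeout=1800),
    # Accept a model only if the relative accuracy drop is at most 1%.
    accuracy_criterion=AccuracyCriterion(criterion="relative",
                                         tolerable_loss=0.01),
)
# Pass `conf` to quantization.fit() as in the earlier sketch.
```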
     

APIs for TensorFlow*, PyTorch*, Apache MXNet*, and Open Neural Network Exchange (ONNX) Runtime Frameworks

  • Get started quickly with built-in DataLoaders for popular datasets, or register your own (a dataloader sketch follows this list).
  • Preprocess input data using built-in methods such as resize, crop, normalize, transpose, flip, pad, and more.
  • Configure model objectives and evaluation metrics without writing framework-specific code.
  • Analyze the graph and tensors after each tuning run with TensorBoard*.
  • TensorFlow INT8 Quantization
  • PyTorch INT8 Post-training Quantization
  • PyTorch INT8 Quantization-aware Training
  • Intel® Extension for PyTorch* INT8 Quantization
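
As a sketch of the built-in data utilities, assuming the 2.x `neural_compressor.data` module (the synthetic "dummy" dataset stands in for a real one):

```python
# Minimal sketch: built-in dataset and DataLoader helpers (assumes the
# Neural Compressor 2.x neural_compressor.data module).
from neural_compressor.data import DataLoader, Datasets

# A synthetic dataset of 32 random 224x224x3 images with labels.
dataset = Datasets("tensorflow")["dummy"](shape=(32, 224, 224, 3))
dataloader = DataLoader(framework="tensorflow", dataset=dataset,
                        batch_size=8)

for inputs, labels in dataloader:
    print(inputs.shape)  # (8, 224, 224, 3)
    break
```

The same dataloader can be passed to `quantization.fit()` as `calib_dataloader`.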

Case Studies

Accelerating Alibaba* Transformer Model Performance

Alibaba Group* and Intel collaborated to explore and deploy INT8 AI models on platforms based on 3rd generation Intel® Xeon® Scalable processors.

Learn More

CERN Uses Intel® Deep Learning Boost and oneAPI to Juice Inference without Accuracy Loss

Researchers at CERN accelerated inference nearly twofold by using reduced precision, without compromising accuracy.

Learn More

A 3D Digital Face Reconstruction Solution Enabled by 3rd Generation Intel® Xeon® Scalable Processors

By quantizing the Position Map Regression Network from FP32 inference down to INT8, Tencent Games* improved inference efficiency and provided a practical solution for 3D digital face reconstruction.

Learn More

Demonstrations

Quantize ONNX* Models 

Learn how to quantize MobileNet* v2 in the ONNX* framework using Intel Neural Compressor, and see accuracy-versus-performance results for a variety of ONNX-based models. A minimal sketch of the flow follows.
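
For reference, here is a hedged sketch of quantizing an ONNX model, assuming the 2.x `fit()` API accepts a path to an .onnx file (the model path and dummy calibration data are placeholders):

```python
# Minimal sketch: post-training quantization of an ONNX model (assumes
# the Neural Compressor 2.x API; the model path is a placeholder).
from neural_compressor import PostTrainingQuantConfig
from neural_compressor.data import DataLoader, Datasets
from neural_compressor.quantization import fit

# Dummy calibration data shaped for MobileNet v2 (NCHW).
dataset = Datasets("onnxrt_qlinearops")["dummy"](shape=(32, 3, 224, 224))
calib_loader = DataLoader(framework="onnxruntime", dataset=dataset)

q_model = fit(model="mobilenetv2.onnx", conf=PostTrainingQuantConfig(),
              calib_dataloader=calib_loader)
q_model.save("mobilenetv2_int8.onnx")  # write the quantized graph back out
```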

Learn More

AI Inference Acceleration on CPUs

Deploying a trained model for inference often requires modification, optimization, and simplification based on where it is being deployed. This overview of Intel’s end-to-end solution includes a downloadable neural style transfer demonstration.

Learn More

Accelerate AI Inference without Sacrificing Accuracy

This webinar provides an overview of available model compression techniques and demonstrates an end-to-end quantization workflow.

Learn More

Documentation & Code Samples

Documentation

  • Installation Guide (All Operating Systems)
  • User Guide
  • Tuning Strategies
  • API Documentation
  • Release Notes
  • System Requirements

View All Documentation

Code Samples

  • Get Started
  • Model Compression: TensorFlow* | PyTorch* | MXNet* | ONNXRT
  • Boost Network Security AI Inference Performance in the Google Cloud Platform* Service

More Samples

Training

Use Low-Precision Optimizations for High-Performance Deep Learning Inference Applications

Quantize a Model for Text Classification Tasks

Quantize PyTorch for Ease of Use

Quantize an AI Model in the AI Kit on Alibaba Cloud*

Specifications

Processors:

  • Intel® Xeon® processors
  • Intel Xeon Scalable processors
  • Intel® Arc™ GPUs

Operating systems:

  • Linux*
  • Windows*

Languages:

  • Python

Get Help

Your success is our success. Access these support resources when you need assistance.

  • AI Kit Support Forum
  • Deep Learning Frameworks Support Forum

For additional help, see the general oneAPI Support.

Related Products

All AI Development Tools and Resources
