
Develop Solutions on Intel® Gaudi® AI Accelerators


Fine-Tune Use Case on Intel® Gaudi® 2 AI Accelerators

Learn how to run a typical model fine-tuning use case on the Intel® Gaudi® AI accelerator. Select a model, set up the environment, and run the workload. Intel Gaudi accelerators support PyTorch* as the main framework for fine-tuning.

Run Fine-Tuning

Fine-tuning on the Intel Gaudi AI accelerator is streamlined; the steps below take you through the following items:

  • Get access to a node for the Intel Gaudi AI accelerator on the Intel® Tiber™ AI Cloud.
  • Ensure that all the software is installed and configured properly by running the PyTorch* version of the Docker* image for the accelerator.
  • Select the model to run by loading the desired model repository and appropriate libraries for model acceleration.
  • Run the model and extract the details for evaluation.

Access Models

There are four main ways to access models for fine-tuning:

  1. Using Hugging Face* models with the Optimum for Intel Gaudi library at Hugging Face.
  2. Using the Intel Gaudi AI accelerator Model References repository for built-in PyTorch models.
  3. Using the GPU Migration toolkit to automatically convert GPU-based models to be compatible with Intel Gaudi AI accelerators.
  4. Manually migrating public-domain PyTorch models.

The Optimum for Intel Gaudi library at Hugging Face and the Model References repository contain fully optimized and fully documented model examples. Use them as a starting point for running a model.

This example shows model fine-tuning with Hugging Face by running the Meta* Llama-3-70B-Instruct model using the Optimum for Intel Gaudi library. Since Hugging Face models are used with an associated task, run fine-tuning with the language-modeling task.
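Under the hood, the Optimum for Intel Gaudi examples are built on the GaudiTrainer API. The following minimal sketch shows its general shape, assuming the optimum-habana package is installed and an HPU is available; the gpt2 model and the small WikiText slice are stand-ins chosen to keep the illustration light, not part of the Llama 3 example itself:

from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling)
from optimum.habana import GaudiConfig, GaudiTrainer, GaudiTrainingArguments

# Small stand-in model; gpt2 has no pad token, so reuse the EOS token.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Tokenize a 1% slice of WikiText purely for illustration.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"])

# GaudiTrainingArguments mirrors transformers.TrainingArguments and adds
# Gaudi-specific switches such as use_habana and use_lazy_mode.
args = GaudiTrainingArguments(
    output_dir="./sketch-clm",
    use_habana=True,
    use_lazy_mode=True,
    per_device_train_batch_size=4,
)

trainer = GaudiTrainer(
    model=model,
    gaudi_config=GaudiConfig.from_pretrained("Habana/gpt2"),
    args=args,
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()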


Runtime Instructions

The following are the run instructions needed to set up the node, the model infrastructure, and the full runtime for the model.

Accessing the Intel® Gaudi® Node 

To access an Intel® Gaudi® node in the Intel® Tiber™ AI Cloud, go to the Intel® Tiber™ AI Cloud console, select the Intel® Gaudi® 2 platform for deep learning from the hardware instances, and follow the steps to start and connect to the node.

The website provides an ssh command for logging in to the node. It is advisable to add local port forwarding to the command so that you can reach a local Jupyter Notebook; for example, add -L 8888:localhost:8888 to the ssh command.
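For illustration only, the full command has this shape, where the user and address are placeholders for the values shown in the console:

ssh -L 8888:localhost:8888 <user>@<node-address>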

Details about setting up Jupyter Notebooks on an Intel® Gaudi® Platform are available here.

Docker Setup  

With access to the node, use the latest Intel® Gaudi® Docker image by calling the docker run command, which automatically downloads the image and starts the container:

docker run -itd --name Gaudi_Docker \
  --runtime=habana \
  -e HABANA_VISIBLE_DEVICES=all \
  -e OMPI_MCA_btl_vader_single_copy_mechanism=none \
  --cap-add=sys_nice \
  --net=host \
  --ipc=host \
  vault.habana.ai/gaudi-docker/1.21.0/ubuntu22.04/habanalabs/pytorch-installer-2.6.0:latest

Enter the running container by issuing the following command:

docker exec -it Gaudi_Docker bash
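As a quick sanity check, the hl-smi utility inside the container should list the available Gaudi devices (this assumes the container started with the habana runtime as shown above):

hl-smi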

More information on Gaudi Docker setup and validation can be found here.

Model Setup  

Once the Docker environment is running, install the remaining libraries and model repositories.

Start in the root directory and install the DeepSpeed library. DeepSpeed reduces memory consumption on Intel® Gaudi® when running large language models.

cd ~ 
pip install git+https://github.com/HabanaAI/DeepSpeed.git@1.21.0

Now install the Hugging Face Optimum for Intel Gaudi library and clone the GitHub examples, selecting the latest validated release of optimum-habana:

pip install optimum-habana==1.16.0
git clone -b v1.16.0 https://github.com/huggingface/optimum-habana 

Finally, transition to the language-modeling example and install the final set of requirements to run the model: 

cd ~/optimum-habana/examples/language-modeling  
pip install -r requirements.txt 
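Optionally, verify that PyTorch can reach the HPU device before launching a long run. This is a minimal check, assuming the Habana PyTorch bridge that ships in the Docker image:

python3 -c "import torch, habana_frameworks.torch.core; print(torch.ones(2, device='hpu') + 1)"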

How to Access and Use the Llama 3 Model 

Use of the pre-trained model is subject to compliance with third-party licenses, including the “META LLAMA 3 COMMUNITY LICENSE AGREEMENT”. For guidance on the intended use of the Llama 3 model, what is considered misuse or out of scope, who the intended users are, and additional terms, please review and read the instructions. Users bear sole liability and responsibility for following and complying with any third-party licenses, and Habana Labs disclaims and will bear no liability with respect to users’ use of or compliance with third-party licenses. To run a gated model like Llama-3-70B, perform the following steps:

  • Have a Hugging Face account and agree to the terms of use of the model in its model card on the Hugging Face Hub.
  • Create a read token and request access to the Llama 3 model from meta-llama.
  • Log in to your account using the Hugging Face CLI:
huggingface-cli login --token <your_hugging_face_token_here>
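To confirm that the token grants access to the gated repository, you can fetch just the model configuration. This lightweight check downloads no weights and fails with a clear error if access has not been granted:

python3 -c "from transformers import AutoConfig; AutoConfig.from_pretrained('meta-llama/Meta-Llama-3-70B-Instruct')"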

To run these steps with the associated Jupyter Notebook for fine-tuning, see the running and fine-tuning addendum for Jupyter Notebook setup. You can then run the steps directly in the Jupyter interface.


Run and Fine-Tune

Fine-tuning a Simple GPT Model 

Start with a simple example of fine-tuning from the Hugging Face* language modeling page. This example uses the WikiText dataset to fine-tune the GPT-2 model. Fine-tuning this model takes only a few minutes, and the fine-tuned model output is placed in the test-clm folder.

python run_clm.py \
--model_name_or_path gpt2 \
--dataset_name wikitext \
--dataset_config_name wikitext-2-raw-v1 \
--per_device_train_batch_size 4 \
--per_device_eval_batch_size 4 \
--do_train \
--do_eval \
--overwrite_output_dir \
--report_to none \
--output_dir ./test-clm \
--gaudi_config_name Habana/gpt2 \
--use_habana \
--use_lazy_mode \
--use_hpu_graphs \
--throughput_warmup_steps 3
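Once the run finishes, you can sanity-check the fine-tuned checkpoint with a short generation on the CPU. This sketch assumes the script saved both the model and tokenizer to test-clm, which the example does by default; the prompt is arbitrary:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned checkpoint written by run_clm.py.
tokenizer = AutoTokenizer.from_pretrained("./test-clm")
model = AutoModelForCausalLM.from_pretrained("./test-clm")

# Greedy decoding of a short continuation as a smoke test.
inputs = tokenizer("The history of the valley begins", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))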

Fine-tuning the Llama 3 70B Model 

Once simple fine-tuning is complete, run the full Llama 3 70B model for fine-tuning. Since Llama 3 70B is a large model, employ the DeepSpeed* library to manage the local HBM memory on each Intel Gaudi card more efficiently. This example also deploys some additional techniques for fine-tuning:

  • Parameter-efficient fine-tuning (PEFT) is a strategy for adapting large pre-trained language models to specific tasks. Instead of fine-tuning the entire pre-trained model, PEFT adds a small set of task-specific parameters, such as LoRA adapter layers, on top of it; these additions are far smaller than the base model. The LoraConfig sketch after the command below illustrates the settings used here.

  • DeepSpeed significantly optimizes training efficiency, reducing both computational and memory requirements. It enables the handling of extremely large models through advanced parallelism techniques and memory optimization strategies.

  • Flash attention reduces memory usage and enhances computational speed through a fused implementation. This includes FusedSDPA, a fused scaled dot product attention that applies the same principles in the Intel Gaudi processor environment, reducing memory usage and improving performance while maintaining compatibility with standard PyTorch* functionality.

  • Setting epochs = 2 is enough to ensure that the training loss falls below 1.0; running more epochs is not needed.

PT_HPU_MAX_COMPOUND_OP_SIZE=10 DEEPSPEED_HPU_ZERO3_SYNC_MARK_STEP_REQUIRED=1 \
python3 ../gaudi_spawn.py --use_deepspeed --world_size 8 run_lora_clm.py \
--model_name_or_path meta-llama/Meta-Llama-3-70B-Instruct \
--deepspeed llama2_ds_zero3_config.json \
--dataset_name tatsu-lab/alpaca \
--bf16 True \
--output_dir ./llama3_fine_tuning_output \
--num_train_epochs 2 \
--max_seq_len 2048 \
--per_device_train_batch_size 10 \
--per_device_eval_batch_size 10 \
--gradient_checkpointing \
--evaluation_strategy epoch \
--eval_delay 2 \
--save_strategy no \
--learning_rate 0.0018 \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--dataset_concatenation \
--attn_softmax_bf16 True \
--do_train \
--do_eval \
--use_habana \
--use_lazy_mode \
--pipelining_fwd_bwd \
--throughput_warmup_steps 3 \
--report_to none \
--lora_rank 4 \
--lora_target_modules "q_proj" "v_proj" "k_proj" "o_proj" \
--validation_split_percentage 4 \
--use_flash_attention True \
--flash_attention_causal_mask True
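For reference, the --lora_rank and --lora_target_modules flags above correspond roughly to the following PEFT configuration. This is a sketch of what run_lora_clm.py sets up internally, using the small TinyLlama model as a stand-in so the snippet is cheap to run; the real script applies the configuration to the 70B model:

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Mirrors --lora_rank 4 and --lora_target_modules from the command above.
lora_config = LoraConfig(
    r=4,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Any Llama-architecture model exposes these projection modules;
# TinyLlama is a small stand-in for Meta-Llama-3-70B-Instruct.
model = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
model = get_peft_model(model, lora_config)

# Only the adapter weights train; the base model stays frozen.
model.print_trainable_parameters()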

The result of the run shows that the fine-tuning of the model required only 38 minutes and achieved 2.2 samples (or sentences) per second. 

***** train metrics ***** 
  epoch                       =        2.0 
  max_memory_allocated (GB)   =      94.53 
  memory_allocated (GB)       =      27.15 
  total_flos                  =  1037280GF 
  total_memory_available (GB) =      94.62 
  train_loss                  =     1.1525 
  train_runtime               = 0:38:47.30 
  train_samples_per_second    =      2.221 
  train_steps_per_second      =      0.028 

The output of the run is in the llama3_fine_tuning_output folder. The adapter_model.safetensors file contains the additional weights generated by parameter-efficient fine-tuning. These weights can be used for inference.
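As a sketch of how the adapter is consumed at inference time, load it on top of the frozen base model with the peft library. Note that loading the 70B base model in bf16 still requires substantial memory or sharding across devices, and the prompt below is arbitrary:

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-70B-Instruct"

# Load the frozen base model, then attach the fine-tuned LoRA adapter.
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, "./llama3_fine_tuning_output")

tokenizer = AutoTokenizer.from_pretrained(base_id)
inputs = tokenizer("Give three tips for staying healthy.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))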