Important
AI Runtime for single-node tasks is in Public Preview. The distributed training API for multi-GPU workloads remains in Beta.
These notebooks fine-tune large language models (LLMs) on AI Runtime. They cover parameter-efficient methods like Low-Rank Adaptation (LoRA) and full supervised fine-tuning across libraries including TRL, Unsloth, Axolotl, and LLM Foundry, with models from Qwen2 and Llama to GPT-OSS 120B.
| Tutorial | Description |
|---|---|
| Fine-tune Qwen3-4B model | Full-weight fine-tune the Qwen3-4B model on a single H100 GPU using Transformer Reinforcement Learning (TRL), with BF16 mixed precision and gradient checkpointing for memory-efficient training. |
| Fine-tune Llama-3.2-3B with Unsloth | Fine-tune Llama-3.2-3B using the Unsloth library. |
| Fine-tune GPT-OSS 20B | Fine-tune OpenAI's gpt-oss-20b model on 8 H100 GPUs using distributed data parallelism and LoRA for parameter-efficient fine-tuning. |
| Supervised fine-tuning using DeepSpeed and TRL | Use the Serverless GPU Python API to run supervised fine-tuning (SFT) using the Transformer Reinforcement Learning (TRL) library with DeepSpeed ZeRO Stage 3 optimization. |
| LoRA fine-tuning using Axolotl | Use the Serverless GPU Python API to LoRA fine-tune an Olmo3 7B model using the Axolotl library. |
| Distributed fine-tune Qwen2-0.5B | Fine-tune the Qwen2-0.5B model with distributed training, using LoRA to reduce the number of trainable parameters and Liger Kernels for memory efficiency. |
| Distributed fine-tune Llama-3.2-3B with Unsloth | Fine-tune Llama-3.2-3B using distributed training across multiple GPUs with the Unsloth library for optimized parameter-efficient training. |
| Fine-tune Llama 3.1 8B with LLM Foundry | Fine-tune the Llama 3.1 8B model using Mosaic LLM Foundry with distributed training strategies and model evaluation. |
| Fine-tune GPT-OSS 120B with DDP and FSDP | Fine-tune OpenAI's GPT-OSS 120B model using supervised fine-tuning on H100 GPUs with DDP and FSDP distributed training strategies. |
| Distributed training with PyTorch FSDP | Train Transformer models using PyTorch Fully Sharded Data Parallel (FSDP) to shard model parameters across multiple GPUs. |
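Several of the tutorials above rely on LoRA for parameter-efficient fine-tuning. As a rough illustration of why it reduces the trainable parameter count, the sketch below uses plain Python (no training library) to compare a full weight matrix with its low-rank adapter and to form the effective weight `W + (alpha / r) * B @ A`. The function names, the 4096 x 4096 layer size, and the rank and alpha values are illustrative assumptions, not values taken from any of the notebooks.

```python
def lora_param_counts(d_out, d_in, rank):
    """Return (full, lora) trainable-parameter counts for one linear layer.

    Full fine-tuning trains the whole d_out x d_in matrix. LoRA freezes it
    and trains two small factors: B (d_out x rank) and A (rank x d_in).
    """
    full = d_out * d_in
    lora = rank * (d_out + d_in)
    return full, lora


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, b, a, alpha):
    """Combine a frozen weight w with the scaled low-rank update B @ A."""
    rank = len(a)            # A has shape (rank, d_in)
    scale = alpha / rank     # conventional LoRA scaling factor
    delta = matmul(b, a)     # shape (d_out, d_in), same as w
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Hypothetical layer size and rank, chosen only for illustration.
full, lora = lora_param_counts(d_out=4096, d_in=4096, rank=16)
ratio = lora / full

# Tiny worked example: 2 x 2 frozen weight with a rank-1 adapter.
w = [[1, 0], [0, 1]]
b = [[1], [2]]       # d_out x rank
a = [[3, 4]]         # rank x d_in
effective = lora_effective_weight(w, b, a, alpha=2)
```

With these assumed sizes the adapter trains 131,072 parameters instead of 16,777,216 for that layer, which is under 1% of the full matrix. Libraries such as those used in the notebooks apply the same idea per targeted layer, with their own configuration options.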
Video demo
This video walks through the Fine-tune Llama-3.2-3B with Unsloth example notebook in detail (12 minutes).