Important
This feature is in Beta. Workspace admins can control access to this feature from the Previews page. See Manage Azure Databricks previews.
The following examples are complete, end-to-end workloads that you submit from the `air` CLI
with `air run -f train.yaml`. Each shows a real multi-GPU pattern on H100 GPUs and includes
the workload YAML, bootstrap commands, and code. If you haven't submitted a run before,
start with the quickstart.
| Example | Description |
|---|---|
| Multi-node LLM fine-tuning with FSDP | Supervised fine-tuning of Llama-3.1-8B across 16 H100 GPUs (2 nodes) using torchrun and PyTorch Fully Sharded Data Parallel (FSDP). Logs to MLflow and checkpoints to a Unity Catalog volume. |
| Distributed training with Ray Train | Distributed data-parallel fine-tuning with Ray Train's TorchTrainer across 8 H100 GPUs on a single node, with one worker per GPU. |
| Batch inference with Ray Data and vLLM | Offline LLM batch inference with Ray Data and vLLM across 8 H100 GPUs on a single node, running one vLLM replica per GPU and writing results to a Unity Catalog volume as Parquet. |
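
To make the GPU counts in the table concrete, the following is a minimal sketch of how worker ranks are typically laid out when one process runs per GPU, as with `torchrun`. It assumes 8 GPUs per node (implied by 16 GPUs across 2 nodes) and a contiguous rank assignment where the global rank equals `node_rank * gpus_per_node + local_rank`. The names `RankInfo` and `rank_layout` are hypothetical helpers for illustration and are not part of the air CLI, PyTorch, or Ray.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RankInfo:
    """Position of one worker process (hypothetical helper type)."""
    node_rank: int    # index of the node hosting the process
    local_rank: int   # GPU index on that node
    global_rank: int  # unique index across all nodes


def rank_layout(num_nodes: int, gpus_per_node: int) -> list[RankInfo]:
    """Enumerate one worker per GPU, assuming contiguous ranks per node."""
    return [
        RankInfo(node, local, node * gpus_per_node + local)
        for node in range(num_nodes)
        for local in range(gpus_per_node)
    ]


# Assumption: the 2-node FSDP example uses 8 GPUs per node (16 / 2).
layout = rank_layout(num_nodes=2, gpus_per_node=8)
world_size = len(layout)

# The single-node examples correspond to one node with 8 workers or replicas.
single_node = rank_layout(num_nodes=1, gpus_per_node=8)
```

In a real `torchrun` job these values are not computed by hand. Each process reads them from the `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` environment variables that `torchrun` sets.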