Domain
Machine Learning & AI
Skill profile
This skill defines expectations across roles and levels.
Roles: 1 (where this skill appears)
Levels: 5 (structured development path)
Required: 0 (the other 5 are optional)
LLM & Generative AI
22.2.2026
Find your current level and compare the expectations.
The tables show how depth grows from Junior to Principal.
| Role | Required | Description |
|---|---|---|
| LLM Engineer | No | Knows distributed training basics: DataParallel and model parallelism. Understands gradient synchronization concepts and runs simple multi-GPU training in PyTorch under mentor guidance. |
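The gradient synchronization mentioned at this level can be illustrated without any GPU. The sketch below is a toy, pure-Python simulation and not PyTorch code: two hypothetical workers each hold a replica of a single weight, compute a gradient on their own data shard, and average the gradients with a stand-in for an all-reduce before updating. The function names, the data, and the learning rate are all assumptions made for illustration.

```python
def local_gradient(w, shard):
    """Gradient of 0.5 * (w * x - y)^2, averaged over one worker's shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)


def all_reduce_mean(values):
    """Stand-in for an all-reduce: every worker receives the same mean."""
    mean = sum(values) / len(values)
    return [mean] * len(values)


def data_parallel_step(weights, shards, lr):
    """One synchronous data-parallel SGD step over all replicas."""
    grads = [local_gradient(w, s) for w, s in zip(weights, shards)]
    synced = all_reduce_mean(grads)  # the gradient synchronization point
    return [w - lr * g for w, g in zip(weights, synced)]


# Toy data for y = 2 * x, split into two equally sized shards.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
shards = [data[:2], data[2:]]
weights = [0.0, 0.0]  # one replica per worker, identical at start

for _ in range(200):
    weights = data_parallel_step(weights, shards, lr=0.05)
```

Because every replica applies the same averaged gradient, the replicas stay identical after each step, which is the property that synchronous data-parallel training relies on.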
| Role | Required | Description |
|---|---|---|
| LLM Engineer | No | Independently configures distributed training with DeepSpeed ZeRO and FSDP. Configures data, pipeline, and tensor parallelism for models of up to 7B parameters on GPU clusters. |
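Choosing a ZeRO stage usually starts from a rough memory estimate. The sketch below uses the common accounting for mixed-precision Adam: 2 bytes per parameter for fp16 weights, 2 for fp16 gradients, and 12 for fp32 optimizer states. It covers model states only; activations, temporary buffers, and fragmentation are deliberately ignored, and the function name and the 7B / 8 GPU figures are illustrative assumptions.

```python
def zero_model_state_bytes(num_params, num_gpus, stage):
    """Approximate per-GPU model-state memory for mixed-precision Adam.

    stage 0: plain data parallelism (everything replicated)
    stage 1: optimizer states partitioned across GPUs
    stage 2: optimizer states and gradients partitioned
    stage 3: optimizer states, gradients and parameters partitioned
    """
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2 or 3")
    params = 2 * num_params   # fp16 weights
    grads = 2 * num_params    # fp16 gradients
    optim = 12 * num_params   # fp32 weights, momentum, variance
    if stage >= 1:
        optim /= num_gpus
    if stage >= 2:
        grads /= num_gpus
    if stage >= 3:
        params /= num_gpus
    return params + grads + optim


# Illustrative case: a 7B-parameter model on 8 GPUs.
estimates_gb = {
    stage: zero_model_state_bytes(7_000_000_000, 8, stage) / 1e9
    for stage in range(4)
}
```

Under these assumptions the per-GPU model states shrink from 112 GB with plain data parallelism to 14 GB with stage 3, which gives a first indication of which stage fits a given GPU.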
| Role | Required | Description |
|---|---|---|
| LLM Engineer | No | Designs distributed training strategies for large LLMs: 3D parallelism, ZeRO-3 offloading, activation checkpointing. Optimizes communication overhead and GPU utilization on 100+ GPUs. |
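3D parallelism factors the world size into data-, pipeline-, and tensor-parallel degrees. A minimal sketch of one possible rank layout follows. It assumes tensor-parallel ranks are innermost (adjacent global ranks, so they can share a node), then pipeline stages, then data-parallel replicas. Real frameworks may order the dimensions differently, and the names `Coord`, `rank_to_coord`, and `tensor_parallel_group` are hypothetical.

```python
from typing import List, NamedTuple


class Coord(NamedTuple):
    dp: int  # data-parallel replica index
    pp: int  # pipeline stage index
    tp: int  # tensor-parallel shard index


def rank_to_coord(rank: int, dp_size: int, pp_size: int, tp_size: int) -> Coord:
    """Map a global rank to (dp, pp, tp) with tp varying fastest."""
    world_size = dp_size * pp_size * tp_size
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} outside world size {world_size}")
    tp = rank % tp_size
    pp = (rank // tp_size) % pp_size
    dp = rank // (tp_size * pp_size)
    return Coord(dp, pp, tp)


def tensor_parallel_group(rank: int, dp_size: int, pp_size: int,
                          tp_size: int) -> List[int]:
    """Global ranks that share this rank's dp and pp coordinates."""
    c = rank_to_coord(rank, dp_size, pp_size, tp_size)
    base = (c.dp * pp_size + c.pp) * tp_size
    return [base + i for i in range(tp_size)]


# Illustrative 16-GPU layout: 2 replicas x 4 pipeline stages x 2 tensor shards.
coord = rank_to_coord(13, dp_size=2, pp_size=4, tp_size=2)
group = tensor_parallel_group(13, dp_size=2, pp_size=4, tp_size=2)
```

Keeping the tensor-parallel dimension innermost is a common design choice because it is usually the most communication-intensive dimension and benefits most from fast intra-node links.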
| Role | Required | Description |
|---|---|---|
| LLM Engineer | No | Defines distributed training infrastructure for the LLM team. Establishes best practices for configuring multi-node training and for monitoring and debugging distributed jobs on GPU clusters. |
| Role | Required | Description |
|---|---|---|
| LLM Engineer | No | Shapes the enterprise distributed training strategy. Defines approaches for scaling to 1000+ GPUs, cost optimization, and GPU resource planning for pre-training and fine-tuning. |
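GPU resource planning at this level often begins with a back-of-envelope compute estimate. The sketch below uses the widely cited approximation that training a dense transformer costs about 6 FLOPs per parameter per token. Every input is an illustrative assumption rather than a recommendation: the model and dataset size, the peak throughput per GPU, the achieved utilization (MFU), the cluster size, and the hourly price are all hypothetical.

```python
def training_gpu_hours(num_params, num_tokens, peak_flops_per_gpu, mfu):
    """Estimate GPU-hours from the ~6 * N * D FLOPs approximation."""
    if not 0 < mfu <= 1:
        raise ValueError("mfu must be in (0, 1]")
    total_flops = 6 * num_params * num_tokens
    effective_flops_per_gpu = peak_flops_per_gpu * mfu
    return total_flops / effective_flops_per_gpu / 3600


# Hypothetical scenario: 7B parameters, 1T tokens,
# 312 TFLOP/s peak per GPU, 40% model FLOPs utilization.
gpu_hours = training_gpu_hours(
    num_params=7e9,
    num_tokens=1e12,
    peak_flops_per_gpu=312e12,
    mfu=0.40,
)

num_gpus = 1024            # assumed cluster size
price_per_gpu_hour = 2.0   # assumed price in arbitrary currency units

wall_clock_days = gpu_hours / num_gpus / 24
estimated_cost = gpu_hours * price_per_gpu_hour
```

The estimate ignores restarts, evaluation runs, and idle time, so it is best read as a lower bound that shows how utilization and cluster size trade off against wall-clock time and budget.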