Skill Profile: Distributed Training
Domain: Machine Learning & AI · Subdomain: LLM & Generative AI
This skill profile defines expectations across roles and levels.
Roles: 1 (where this skill appears)
Levels: 5 (structured growth path)
Mandatory requirements: 0 (all 5 level expectations are optional)
Choose your current level and compare expectations: each level below lists what to cover to advance to the next. The tables show how skill depth grows from Junior (Level 1) to Principal (Level 5); each level's table is followed by a short illustrative code sketch.
Level 1 (Junior)

| Role | Required | Description |
|---|---|---|
| LLM Engineer | Optional | Knows distributed training basics: DataParallel and model parallelism. Understands gradient synchronization concepts and runs simple multi-GPU training in PyTorch under mentor guidance. |
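A minimal sketch of the gradient-synchronization idea at this level, using `nn.DataParallel`: the input batch is split across visible GPUs, the module is replicated, and outputs and gradients are gathered on the default device. The layer sizes and batch size are illustrative placeholders.

```python
import torch
import torch.nn as nn

# Placeholder model; DataParallel replicates it on each visible GPU.
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))

device = "cuda" if torch.cuda.is_available() else "cpu"
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)  # scatter the batch, replicate the module
model = model.to(device)

x = torch.randn(64, 512, device=device)
logits = model(x)            # forward is scattered/gathered transparently
logits.sum().backward()      # gradients are summed back onto the base model
```

Note that in practice `DistributedDataParallel` is preferred over `DataParallel`; this sketch only illustrates the batch-splitting and gradient-gathering concept a junior engineer is expected to understand.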
Level 2

| Role | Required | Description |
|---|---|---|
| LLM Engineer | Optional | Independently configures distributed training with DeepSpeed ZeRO and FSDP. Sets up data, pipeline, and tensor parallelism for models up to 7B parameters on GPU clusters. |
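A minimal FSDP sketch for this level, assuming the script is launched with `torchrun` so that `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` are set in the environment (e.g. `torchrun --nproc_per_node=8 train.py`). The tiny Transformer is a placeholder, not a 7B-parameter model.

```python
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group(backend="nccl")          # reads rank info from env
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

# FSDP shards parameters, gradients, and optimizer state across ranks.
model = FSDP(torch.nn.Transformer(d_model=512, num_encoder_layers=6).cuda())
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

src = torch.randn(10, 4, 512, device="cuda")     # (seq, batch, d_model)
tgt = torch.randn(10, 4, 512, device="cuda")
loss = model(src, tgt).sum()                     # dummy loss for illustration
loss.backward()                                  # FSDP reduce-scatters gradients
optimizer.step()
dist.destroy_process_group()
```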
Level 3

| Role | Required | Description |
|---|---|---|
| LLM Engineer | Optional | Designs distributed training strategies for large LLMs: 3D parallelism, ZeRO-3 offloading, activation checkpointing. Optimizes communication overhead and GPU utilization on 100+ GPUs. |
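A hedged sketch of a DeepSpeed config covering two of the techniques named above: ZeRO-3 with CPU offloading and activation checkpointing. The field names follow DeepSpeed's documented config schema; the values are illustrative assumptions, not tuned recommendations.

```python
import json

ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 8,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,  # shard params, gradients, and optimizer state (ZeRO-3)
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "cpu", "pin_memory": True},
    },
    "activation_checkpointing": {
        "partition_activations": True,        # split checkpointed activations across GPUs
        "contiguous_memory_optimization": True,
    },
    "gradient_clipping": 1.0,
}

with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)  # consumed via deepspeed.initialize(config=...)
```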
Level 4

| Role | Required | Description |
|---|---|---|
| LLM Engineer | Optional | Defines distributed training infrastructure for the LLM team. Establishes best practices for configuring, monitoring, and debugging multi-node training jobs on GPU clusters. |
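A sketch of the kind of multi-node setup and debugging convention this level standardizes, assuming a `torchrun` rendezvous; the hostname `head-node` is hypothetical. Setting `NCCL_DEBUG=INFO` and `TORCH_DISTRIBUTED_DEBUG=DETAIL` in the job environment surfaces communicator setup and collective mismatches.

```python
# Assumed launch (one command per node):
#   torchrun --nnodes=4 --nproc_per_node=8 \
#            --rdzv_backend=c10d --rdzv_endpoint=head-node:29500 train.py
import os
import torch
import torch.distributed as dist

def setup() -> int:
    dist.init_process_group(backend="nccl")   # reads RANK/WORLD_SIZE from env
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    if dist.get_rank() == 0:
        print(f"world size: {dist.get_world_size()}")  # basic sanity log
    dist.barrier()  # cheap check that every rank reached this point
    return local_rank

if __name__ == "__main__":
    setup()
    dist.destroy_process_group()
```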
Level 5 (Principal)

| Role | Required | Description |
|---|---|---|
| LLM Engineer | Optional | Shapes the enterprise distributed training strategy. Defines approaches to scaling to 1000+ GPUs, cost optimization, and GPU resource planning for pre-training and fine-tuning. |
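Resource planning at this level often starts from back-of-envelope budgeting. The sketch below uses the common ~6·N·D rule of thumb for dense transformer training FLOPs (N parameters, D tokens); every number is an illustrative assumption, not a recommendation, and the peak throughput figure is a hypothetical GPU spec.

```python
# Back-of-envelope GPU budget: total FLOPs ~ 6 * params * tokens.
n_params = 7e9                  # assumed model size: 7B parameters
n_tokens = 1.4e12               # assumed training tokens: 1.4T
total_flops = 6 * n_params * n_tokens

peak_flops_per_gpu = 1.0e15     # hypothetical peak bf16 throughput (1 PFLOP/s)
mfu = 0.40                      # assumed model FLOPs utilization

gpu_hours = total_flops / (peak_flops_per_gpu * mfu) / 3600
print(f"~{gpu_hours:,.0f} GPU-hours")

for n_gpus in (256, 1024):      # wall-clock time at two cluster sizes
    print(f"{n_gpus} GPUs -> ~{gpu_hours / n_gpus / 24:.1f} days")
```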