Domain
Machine Learning & AI
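Skill profile
This skill defines expectations for each role and level.
Number of roles
1
Roles that include this skill
Number of levels
5
Structured growth path
Required
0
The remaining 5 are optional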
LLM & Generative AI
2026/2/22
The tables show how the depth of this skill changes from junior to principal level.
| Role | Description |
|---|---|
| LLM Engineer | Knows distributed training basics: DataParallel and model parallelism. Understands gradient synchronization concepts and runs simple multi-GPU training in PyTorch under mentor guidance. |
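
The gradient synchronization concept named at this level can be illustrated without any GPUs. The following is a toy sketch in plain Python, not PyTorch code: the one-parameter model, the data, and the helper names `local_gradient` and `all_reduce_mean` are all assumptions made for illustration. It shows that averaging per-worker gradients over equally sized shards reproduces the full-batch gradient.

```python
# Toy illustration of data-parallel gradient synchronization.
# Assumed model: y = w * x with mean squared error loss.

def local_gradient(w, xs, ys):
    """Gradient of the mean squared error w.r.t. w on one worker's shard."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def all_reduce_mean(values):
    """Stand-in for an all-reduce: every worker receives the same mean."""
    mean = sum(values) / len(values)
    return [mean] * len(values)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5

# Split the batch evenly across two simulated workers.
shards = [(xs[:2], ys[:2]), (xs[2:], ys[2:])]
local_grads = [local_gradient(w, sx, sy) for sx, sy in shards]

# After synchronization every worker holds the same averaged gradient.
synced_grads = all_reduce_mean(local_grads)

# Reference: gradient computed on the whole batch by a single worker.
full_batch_grad = local_gradient(w, xs, ys)
```

The equality only holds when the shards have the same size; with uneven shards the average would need to be weighted by shard length.
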
| Role | Description |
|---|---|
| LLM Engineer | Independently configures distributed training with DeepSpeed ZeRO and FSDP. Configures data, pipeline, and tensor parallelism for models of up to 7B parameters on GPU clusters. |
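
As a rough idea of what a DeepSpeed ZeRO configuration involves, the sketch below builds a config dictionary in Python and serializes it to JSON. The helper `build_zero_config` is hypothetical and every value (batch sizes, ZeRO stage, clipping threshold) is an illustrative assumption, not a recommendation from this profile. The keys are standard DeepSpeed config fields, and the batch size relation (global batch = micro-batch per GPU × gradient accumulation steps × number of GPUs) is the one DeepSpeed checks.

```python
import json


def build_zero_config(micro_batch_per_gpu, grad_accum_steps, world_size, zero_stage=2):
    """Return a DeepSpeed-style config dict (hypothetical helper, illustrative values)."""
    if zero_stage not in (0, 1, 2, 3):
        raise ValueError("ZeRO stage must be 0, 1, 2, or 3")
    return {
        # Global batch size must equal micro-batch * accumulation steps * GPUs.
        "train_batch_size": micro_batch_per_gpu * grad_accum_steps * world_size,
        "train_micro_batch_size_per_gpu": micro_batch_per_gpu,
        "gradient_accumulation_steps": grad_accum_steps,
        "gradient_clipping": 1.0,
        "bf16": {"enabled": True},
        "zero_optimization": {
            "stage": zero_stage,           # 2 = shard optimizer states and gradients
            "overlap_comm": True,          # overlap gradient reduction with backward pass
            "contiguous_gradients": True,  # reduce memory fragmentation
        },
    }


# Assumed setup: one node with 8 GPUs.
config = build_zero_config(micro_batch_per_gpu=4, grad_accum_steps=8, world_size=8)
config_json = json.dumps(config, indent=2)
```

The resulting JSON could be saved to a file and passed to a DeepSpeed launch; which stage and batch sizes are appropriate depends on the model and hardware.
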
| Role | Description |
|---|---|
| LLM Engineer | Designs distributed training strategies for large LLMs: 3D parallelism, ZeRO-3 offloading, and activation checkpointing. Optimizes communication overhead and GPU utilization on 100+ GPUs. |
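
To make the trade-off between ZeRO stages concrete, the sketch below estimates per-GPU memory for model states under mixed-precision Adam, following the accounting in the ZeRO paper: 2 bytes per parameter for half-precision weights, 2 for gradients, and 12 for optimizer states. The function name, the 7B model size, and the 128-GPU cluster are assumptions for illustration. Activations, buffers, and fragmentation are not included, so real usage will be higher.

```python
def model_state_bytes_per_gpu(num_params, num_gpus, zero_stage):
    """Approximate per-GPU bytes for model states with mixed-precision Adam."""
    params = 2.0 * num_params   # half-precision weights
    grads = 2.0 * num_params    # half-precision gradients
    optim = 12.0 * num_params   # fp32 weights copy + Adam momentum + variance

    if zero_stage >= 1:         # stage 1 shards optimizer states
        optim /= num_gpus
    if zero_stage >= 2:         # stage 2 also shards gradients
        grads /= num_gpus
    if zero_stage >= 3:         # stage 3 also shards parameters
        params /= num_gpus
    return params + grads + optim


# Assumed example: a 7B-parameter model on 128 GPUs.
NUM_PARAMS = 7e9
NUM_GPUS = 128
gigabytes_by_stage = {
    stage: model_state_bytes_per_gpu(NUM_PARAMS, NUM_GPUS, stage) / 1e9
    for stage in (0, 1, 2, 3)
}

for stage, gb in gigabytes_by_stage.items():
    print(f"ZeRO stage {stage}: {gb:.2f} GB of model states per GPU")
```

The estimate suggests why ZeRO-3 is attractive at scale, at the cost of extra communication to gather parameters during the forward and backward passes.
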
| Role | Description |
|---|---|
| LLM Engineer | Defines the distributed training infrastructure for the LLM team. Establishes best practices for configuring multi-node training and for monitoring and debugging distributed jobs on GPU clusters. |
| Role | Description |
|---|---|
| LLM Engineer | Shapes the enterprise distributed training strategy. Defines approaches to scaling to 1000+ GPUs, cost optimization, and GPU resource planning for pre-training and fine-tuning. |
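
GPU resource planning at this level often starts from a back-of-the-envelope compute estimate. The sketch below uses the widely cited approximation that training a dense transformer costs about 6 × parameters × tokens floating-point operations. Everything here is an assumption for illustration: the function name, the 7B model, the 1T-token budget, the round 1e15 FLOP/s peak per GPU, and the 40% utilization figure. It is a planning heuristic, not a measured result.

```python
def estimate_gpu_hours(num_params, num_tokens, peak_flops_per_gpu, utilization):
    """Rough GPU-hours for pre-training, using the ~6 * N * D FLOPs heuristic."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    total_flops = 6.0 * num_params * num_tokens
    effective_flops_per_second = peak_flops_per_gpu * utilization
    return total_flops / effective_flops_per_second / 3600.0


# Assumed scenario: 7B parameters, 1T tokens, hypothetical accelerator.
gpu_hours = estimate_gpu_hours(
    num_params=7e9,
    num_tokens=1e12,
    peak_flops_per_gpu=1e15,  # hypothetical round number, not a specific GPU
    utilization=0.4,          # assumed fraction of peak actually achieved
)

# Wall-clock time if the job scaled perfectly across 1000 GPUs (optimistic).
wall_clock_days = gpu_hours / 1000 / 24

print(f"{gpu_hours:,.0f} GPU-hours, about {wall_clock_days:.1f} days on 1000 GPUs")
```

Multiplying the GPU-hours by an hourly price gives a first cost figure; real plans would also account for restarts, evaluation runs, and scaling losses.
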