Domain: Machine Learning & AI
Skill profile: Benchmarks, BLEU/ROUGE metrics, human eval, LLM-as-judge, generation quality assessment
Roles: 2 (roles that include this skill)
Levels: 5 (structured growth path)
Required: 8 (the remaining 2 are optional)
LLM & Generative AI
2026/3/17
The five tables below, ordered from junior to principal, show how the expected depth of this skill changes by level.
| Role | Requirement | Description |
|---|---|---|
| AI Product Engineer | Optional | Understands the fundamentals of LLM Evaluation. Applies basic practices in daily work. Follows recommendations from the team and documentation. |
| LLM Engineer | Required | Knows basic LLM evaluation metrics: perplexity, BLEU, ROUGE. Runs standard benchmarks (MMLU, HellaSwag) under mentor guidance and interprets basic results. |
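
To make the entry-level metrics concrete, here is a minimal sketch of two of them: perplexity computed from per-token log-probabilities, and a simplified ROUGE-1 F1 score. The function names `perplexity` and `rouge1_f1` are illustrative, not taken from any library, and the ROUGE variant assumes lowercased whitespace tokenization with no stemming, so its numbers will not match a reference implementation exactly.

```python
import math
from collections import Counter


def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood per token).

    Assumes natural-log probabilities, one per generated token.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def rouge1_f1(candidate, reference):
    """Simplified ROUGE-1: F1 over clipped unigram overlap.

    Assumption: whitespace tokenization and lowercasing only.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per token (clipping).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

Lower perplexity means the model assigned higher probability to the text; a higher ROUGE-1 F1 means more unigram overlap with the reference. Neither number measures factual correctness on its own.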

| Role | Requirement | Description |
|---|---|---|
| AI Product Engineer | Optional | Independently applies LLM Evaluation in practice. Understands the trade-offs between different approaches. Solves typical tasks without assistance. |
| LLM Engineer | Required | Independently designs evaluation pipelines: custom benchmarks, domain-specific eval sets, human evaluation protocols. Compares models across multiple metrics for production decision-making. |

| Role | Requirement | Description |
|---|---|---|
| AI Product Engineer | Required | Has deep expertise in LLM Evaluation. Designs solutions for production systems, then optimizes and scales them. Mentors the team. |
| LLM Engineer | Required | Designs comprehensive evaluation frameworks: automated eval with LLM-as-judge, contamination detection, statistical significance testing. Develops domain-specific benchmarks for production tasks. |
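
As one illustration of the statistical significance testing mentioned at this level, the sketch below uses a paired bootstrap over per-example scores. This is one common choice, not a method prescribed by this profile; the name `paired_bootstrap` and its parameters are hypothetical, and it assumes both systems were scored on the same examples in the same order.

```python
import random


def paired_bootstrap(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Resample the eval set with replacement and count how often A beats B.

    scores_a, scores_b: per-example scores for two systems on the SAME
    examples, in the same order (assumption of the paired design).
    Returns (observed mean difference A - B, fraction of resamples in
    which A's mean score is strictly higher than B's).
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty, equal-length score lists")
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(scores_a)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = sum(diffs) / n
    wins = 0
    for _ in range(n_resamples):
        # Draw n example indices with replacement and sum their differences.
        resampled_total = sum(diffs[rng.randrange(n)] for _ in range(n))
        if resampled_total > 0:
            wins += 1
    return observed, wins / n_resamples
```

A win fraction close to 1.0 suggests the improvement is stable under resampling of the eval set; a value near 0.5 suggests the observed difference could easily be noise. Where to set the decision threshold is a team choice.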

| Role | Requirement | Description |
|---|---|---|
| AI Product Engineer | Required | Defines LLM Evaluation strategy at the team/product level. Establishes standards and best practices. Conducts reviews. |
| LLM Engineer | Required | Defines evaluation standards for the LLM team. Establishes model evaluation guidelines, regression testing, and benchmark management. Coordinates human evaluation processes and quality assurance. |
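
The regression testing named at this level could take the form of a simple metric gate, sketched below. The function `find_regressions`, the tolerance value, and the metric names in the usage example are all hypothetical; the sketch also assumes that higher is better for every metric, which would need inverting for metrics such as perplexity.

```python
def find_regressions(baseline, candidate, tolerance=0.01):
    """Return metrics where the candidate model regressed against baseline.

    baseline, candidate: dicts mapping metric name -> score.
    Assumption: higher is better for every metric.
    A metric missing from the candidate is reported with value None.
    """
    regressions = {}
    for name, base_value in baseline.items():
        if name not in candidate:
            regressions[name] = (base_value, None)
        elif candidate[name] < base_value - tolerance:
            regressions[name] = (base_value, candidate[name])
    return regressions


# Usage with made-up numbers: accuracy holds, rouge1 drops beyond tolerance.
report = find_regressions(
    {"accuracy": 0.80, "rouge1": 0.45},
    {"accuracy": 0.805, "rouge1": 0.40},
)
```

An empty result would let a release proceed, while any entry could block it or trigger a manual review, depending on the team's guidelines.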

| Role | Requirement | Description |
|---|---|---|
| AI Product Engineer | Required | Defines LLM Evaluation strategy at the organizational level. Establishes enterprise approaches. Mentors leads and architects. |
| LLM Engineer | Required | Shapes enterprise evaluation strategy. Defines approaches to continuous evaluation, model quality governance, and benchmark development. Ensures alignment between evaluation metrics and business objectives. |