LLM Evaluation
Domain: Machine Learning & AI / LLM & Generative AI
Skill profile: benchmarks, BLEU/ROUGE metrics, human evaluation, LLM-as-judge, generation quality assessment
Roles: 2 (roles where this skill appears)
Levels: 5 (structured growth path)
Requirements: 10 total (8 mandatory, 2 optional)
Updated: 3/17/2026
Find your current level and compare expectations: each level below lists what to cover to advance to the next one. Skill depth grows from Junior (Level 1) to Principal (Level 5).
Level 1 (Junior)

| Role | Required | Description |
|---|---|---|
| AI Product Engineer | Optional | Understands the fundamentals of LLM Evaluation. Applies basic practices in daily work. Follows recommendations from the team and documentation. |
| LLM Engineer | Required | Knows basic LLM evaluation metrics: perplexity, BLEU, ROUGE. Runs standard benchmarks (MMLU, HellaSwag) under mentor guidance and interprets basic results. |
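As a minimal sketch of the overlap metrics named above, ROUGE-L scores a candidate against a reference by longest common subsequence; the function names here are illustrative, not taken from any specific library:

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    # classic dynamic-programming longest common subsequence over token lists
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l(candidate: str, reference: str) -> float:
    # ROUGE-L F1: harmonic mean of LCS-based precision and recall
    c, r = candidate.split(), reference.split()
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)
```

For example, `rouge_l("the cat sat", "the cat sat on the mat")` gives precision 1.0 but recall 0.5, so the F1 penalizes the short candidate. Production work would use a maintained implementation; this only shows the mechanics.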
Level 2

| Role | Required | Description |
|---|---|---|
| AI Product Engineer | Optional | Independently applies LLM Evaluation in practice. Understands the trade-offs of different approaches. Solves typical tasks independently. |
| LLM Engineer | Required | Independently designs evaluation pipelines: custom benchmarks, domain-specific eval sets, human evaluation protocols. Compares models across multiple metrics for production decision-making. |
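An evaluation pipeline of the kind described at this level can be reduced to a small harness that runs an eval set through a model and averages per-metric scores. Everything here (the function signature, the `prompt`/`reference` field names) is a hypothetical sketch, not a specific framework's API:

```python
from statistics import mean
from typing import Callable

def evaluate(model: Callable[[str], str],
             eval_set: list[dict],
             metrics: dict[str, Callable[[str, str], float]]) -> dict[str, float]:
    # run each example through the model, score output against its reference
    # with every metric, then report the mean score per metric
    scores: dict[str, list[float]] = {name: [] for name in metrics}
    for example in eval_set:
        output = model(example["prompt"])
        for name, metric_fn in metrics.items():
            scores[name].append(metric_fn(output, example["reference"]))
    return {name: mean(vals) for name, vals in scores.items()}
```

The same harness compares two models by calling it twice on the same eval set, which keeps production decision-making grounded in identical examples and metrics.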
Level 3

| Role | Required | Description |
|---|---|---|
| AI Product Engineer | Required | Has deep expertise in LLM Evaluation. Designs solutions for production systems. Optimizes and scales. Mentors the team. |
| LLM Engineer | Required | Designs comprehensive evaluation frameworks: automated eval with LLM-as-judge, contamination detection, statistical significance testing. Develops domain-specific benchmarks for production tasks. |
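The statistical-significance testing mentioned at this level is often done with a paired bootstrap over per-example scores from two models. This is a from-scratch sketch of that common recipe, not code from any particular eval framework:

```python
import random

def paired_bootstrap(scores_a: list[float], scores_b: list[float],
                     n_resamples: int = 10_000, seed: int = 0) -> float:
    # resample example indices with replacement; return the fraction of
    # resamples in which model A's total score beats model B's.
    # Values near 1.0 (or 0.0) suggest the gap is not just sampling noise.
    rng = random.Random(seed)
    n = len(scores_a)
    wins = 0
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        if sum(scores_a[i] for i in idx) > sum(scores_b[i] for i in idx):
            wins += 1
    return wins / n_resamples
```

Pairing matters: both models are scored on the same resampled examples, so per-example difficulty cancels out of the comparison.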
Level 4

| Role | Required | Description |
|---|---|---|
| AI Product Engineer | Required | Defines LLM Evaluation strategy at the team/product level. Establishes standards and best practices. Conducts reviews. |
| LLM Engineer | Required | Defines evaluation standards for the LLM team. Establishes model evaluation guidelines, regression testing, benchmark management. Coordinates human evaluation processes and quality assurance. |
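Regression testing against a stored baseline, as mentioned above, can be gated with a check like this; the metric names and the 0.02 tolerance are illustrative assumptions:

```python
def check_regression(current: dict[str, float],
                     baseline: dict[str, float],
                     tolerance: float = 0.02) -> list[str]:
    # flag any baseline metric that is missing from the current run or that
    # dropped more than `tolerance` below its stored baseline value
    failures = []
    for metric, base in baseline.items():
        value = current.get(metric)
        if value is None:
            failures.append(f"{metric}: missing from current run")
        elif value < base - tolerance:
            failures.append(f"{metric}: {value:.3f} < baseline {base:.3f} - {tolerance}")
    return failures
```

Wired into CI, an empty return list lets a model update ship, while any failure string blocks it until the drop is explained or the baseline is deliberately updated.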
Level 5 (Principal)

| Role | Required | Description |
|---|---|---|
| AI Product Engineer | Required | Defines LLM Evaluation strategy at the organizational level. Establishes enterprise approaches. Mentors leads and architects. |
| LLM Engineer | Required | Shapes enterprise evaluation strategy. Defines approaches to continuous evaluation, model quality governance, and benchmark development. Ensures alignment between evaluation metrics and business objectives. |