Domain: Machine Learning & AI
Skill profile: Benchmarks, BLEU/ROUGE metrics, human eval, LLM-as-judge, generation quality assessment
Roles: 2 (where this skill appears)
Levels: 5 (structured development path)
Mandatory requirements: 8 (the other 2 are optional)
LLM & Generative AI
17.3.2026
Select your current level and compare the expectations.
The tables show how the required depth grows from Junior to Principal.
| Role | Required | Description |
|---|---|---|
| AI Product Engineer | Optional | Understands the fundamentals of LLM evaluation. Applies basic practices in daily work. Follows recommendations from the team and documentation. |
| LLM Engineer | Required | Knows basic LLM evaluation metrics: perplexity, BLEU, ROUGE. Runs standard benchmarks (MMLU, HellaSwag) under mentor guidance and interprets basic results. |
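
To make the basic metrics named in the LLM Engineer row above more concrete, here is a minimal sketch of perplexity and a ROUGE-1 style F1 score. It is illustrative only: the function names `perplexity` and `rouge1_f1` are hypothetical, tokenization is plain whitespace splitting, and production work would normally rely on an established metrics library rather than hand-rolled code.

```python
import math
from collections import Counter


def perplexity(token_logprobs):
    """Perplexity from natural-log token probabilities.

    Computed as exp of the mean negative log-likelihood per token.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def rouge1_f1(candidate, reference):
    """ROUGE-1 style F1 on lowercased whitespace tokens.

    Overlap is the clipped unigram count shared by both texts.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # per-token minimum of the two counts
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

A model that assigns probability 0.25 to every token has a perplexity of about 4, and a candidate that copies only part of the reference gets perfect precision but reduced recall, which is why the F1 value is usually reported alongside both.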
| Role | Required | Description |
|---|---|---|
| AI Product Engineer | Optional | Applies LLM evaluation in practice and solves typical tasks independently. Understands the trade-offs between different approaches. |
| LLM Engineer | Required | Independently designs evaluation pipelines: custom benchmarks, domain-specific eval sets, human evaluation protocols. Compares models across multiple metrics for production decision-making. |
| Role | Required | Description |
|---|---|---|
| AI Product Engineer | Required | Has deep expertise in LLM evaluation. Designs solutions for production systems, then optimizes and scales them. Mentors the team. |
| LLM Engineer | Required | Designs comprehensive evaluation frameworks: automated eval with LLM-as-judge, contamination detection, statistical significance testing. Develops domain-specific benchmarks for production tasks. |
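
As one possible illustration of the statistical significance testing mentioned in the LLM Engineer row above, the sketch below runs a paired bootstrap over per-example scores of two models on the same eval set. The function name `paired_bootstrap`, its parameters, and the choice of the bootstrap itself are assumptions made for this example; other tests (for instance a permutation test) are equally valid choices.

```python
import random


def paired_bootstrap(scores_a, scores_b, n_resamples=2000, seed=0):
    """Paired bootstrap comparison of two models on the same examples.

    Returns (observed_mean_diff, win_rate), where win_rate is the fraction
    of resampled eval sets on which model A's mean score exceeds model B's.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score lists must be non-empty and of equal length")
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    n = len(scores_a)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = sum(diffs) / n
    wins = 0
    for _ in range(n_resamples):
        # Resample example indices with replacement, keeping pairs aligned.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        if sum(sample) / n > 0:
            wins += 1
    return observed, wins / n_resamples
```

A win rate close to 1.0 suggests that A's advantage is stable under resampling of the eval set, while a value near 0.5 suggests the observed difference could easily be noise.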
| Role | Required | Description |
|---|---|---|
| AI Product Engineer | Required | Defines LLM evaluation strategy at the team/product level. Establishes standards and best practices. Conducts reviews. |
| LLM Engineer | Required | Defines evaluation standards for the LLM team. Establishes model evaluation guidelines, regression testing, and benchmark management. Coordinates human evaluation processes and quality assurance. |
| Role | Required | Description |
|---|---|---|
| AI Product Engineer | Required | Defines LLM evaluation strategy at the organizational level. Establishes enterprise approaches. Mentors leads and architects. |
| LLM Engineer | Required | Shapes enterprise evaluation strategy. Defines approaches to continuous evaluation, model quality governance, and benchmark development. Ensures alignment between evaluation metrics and business objectives. |