Domain: Machine Learning & AI
Skill Profile: Triton, BentoML, Seldon: model deployment, A/B testing, canary releases
Roles: 7 (where this skill appears)
Levels: 5 (structured development path)
Mandatory requirements: 25 (the other 10 are optional)
Category: Machine Learning & AI / MLOps
Date: 17.3.2026
Select your current level and compare the expectations.
The tables below show how depth grows from Junior to Principal.
Level 1 (Junior)

| Role | Required | Description |
|---|---|---|
| AI Product Engineer | Optional | Understands model serving basics for AI products: REST/gRPC inference endpoints, model versioning for A/B testing, and basic latency/throughput requirements. Follows team guidelines on integrating model predictions into product features. Understands the differences between batch and real-time inference. |
| Computer Vision Engineer | Optional | Understands model serving basics for CV systems: image/video inference pipeline setup, GPU resource allocation for inference, and model format conversion (ONNX, TensorRT). Follows team practices for deploying CV models to production endpoints. |
| Data Scientist | Optional | Understands model serving basics: exporting trained models (pickle, ONNX, SavedModel), basic API wrapper creation with Flask/FastAPI, and model input/output schema definition. Follows team practices for model packaging and deployment workflows. |
| LLM Engineer | Optional | Understands LLM serving basics: inference API setup (vLLM, TGI), prompt/completion endpoint configuration, and token-based billing considerations. Follows team practices for LLM deployment, including context window management and response streaming setup. |
| ML Engineer | Required | Deploys an ML model as a REST API through a web framework such as Flask. Understands the inference pipeline: preprocessing → prediction → postprocessing. Uses pickle/joblib for model serialization. |
| MLOps Engineer | Optional | Understands basic model serving concepts: the difference between batch and real-time inference, and the main model formats (ONNX, SavedModel, pickle). Can deploy a simple model via a Flask/FastAPI endpoint, load a model from file, and return predictions. Knows about specialized serving systems such as TensorFlow Serving, Triton, and Seldon. |
| NLP Engineer | Required | Knows NLP model serving basics: REST API endpoints, model loading, batching. Deploys simple NLP models as REST APIs for text classification and NER tasks. |
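The Junior-level inference pipeline (preprocessing → prediction → postprocessing, with pickle serialization) can be sketched without any web framework. `LinearModel`, its weights, and the request shape below are hypothetical stand-ins for a real exported estimator and a Flask/FastAPI route handler:

```python
import json
import pickle

# Stand-in for a trained model; in practice this would be a scikit-learn
# estimator exported with pickle/joblib (weights here are made up).
class LinearModel:
    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def predict(self, rows):
        return [sum(w * x for w, x in zip(self.weights, row)) + self.bias
                for row in rows]

# "Export" and reload the model, mirroring a save-then-serve workflow.
blob = pickle.dumps(LinearModel([0.5, -0.25], bias=1.0))
model = pickle.loads(blob)

def predict_endpoint(request_body: str) -> str:
    """Inference pipeline: preprocessing -> prediction -> postprocessing."""
    payload = json.loads(request_body)          # preprocessing: parse request
    features = payload["instances"]
    scores = model.predict(features)            # prediction
    return json.dumps({"predictions": scores})  # postprocessing: serialize
```

In a real deployment the same function body would sit behind a FastAPI route, with schema validation on `instances` before it reaches the model.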
Level 2

| Role | Required | Description |
|---|---|---|
| AI Product Engineer | Optional | Implements model serving for AI product features: multi-model inference pipelines, feature store integration for real-time enrichment, and A/B testing infrastructure for model comparison. Configures auto-scaling based on inference load patterns. Implements model fallback strategies for high-availability product features. |
| Computer Vision Engineer | Optional | Implements CV model serving pipelines: batch and real-time inference with GPU optimization, model ensemble strategies for accuracy improvement, and pre-/post-processing pipeline optimization. Configures TensorRT/ONNX Runtime for inference acceleration. Implements model caching and warm-up strategies for consistent latency. |
| Data Scientist | Optional | Implements model serving solutions: containerized model deployment (Docker/Kubernetes), monitoring for data drift and prediction quality, and canary deployment for safe model rollout. Uses MLflow/BentoML for model packaging and serving. Implements feature engineering in the serving pipeline consistent with training. |
| LLM Engineer | Optional | Implements LLM serving solutions: KV-cache optimization for throughput, batching strategies (continuous batching, dynamic batching), and quantization for cost-efficient inference (GPTQ, AWQ, GGUF). Configures vLLM/TGI for production workloads. Implements streaming response infrastructure and token-level latency monitoring. |
| ML Engineer | Required | Uses model serving frameworks: Triton, BentoML, Seldon. Configures batch and real-time inference. Optimizes inference latency (ONNX, model optimization). Configures A/B testing for models. |
| MLOps Engineer | Optional | Deploys models to production via specialized serving platforms: TensorFlow Serving for TF models, Triton Inference Server for multi-framework serving. Configures BentoML for packaging models with dependencies, implements batch inference via Spark/Ray, and configures model versioning for seamless production model updates. |
| NLP Engineer | Required | Independently designs NLP model serving: TorchServe, Triton Inference Server. Configures batching, model versioning, and A/B testing. Optimizes latency through model optimization. |
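The model A/B testing expected at this level rests on a sticky traffic split: each user is deterministically assigned to a variant so that repeat requests hit the same model version. A minimal sketch, assuming two hypothetical local model functions where production would route between deployed versions in Triton/BentoML/Seldon:

```python
import hashlib

# Hypothetical stand-ins for two deployed model versions under comparison.
def model_a(x):
    return x * 2

def model_b(x):
    return x * 2 + 1

def assign_variant(user_id: str, b_traffic: float = 0.1) -> str:
    """Sticky A/B assignment: hash the user id into [0, 1) and compare
    against the challenger's traffic share (10% here)."""
    digest = hashlib.sha256(user_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "b" if bucket < b_traffic else "a"

def predict(user_id: str, x):
    variant = assign_variant(user_id)
    result = (model_b if variant == "b" else model_a)(x)
    # Log the variant alongside the prediction so offline analysis can
    # compare quality metrics per model version.
    return {"variant": variant, "prediction": result}
```

Hashing rather than random sampling is the key design choice: it keeps assignments stable across stateless replicas without a shared store.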
Level 3

| Role | Required | Description |
|---|---|---|
| AI Product Engineer | Required | Designs model serving architecture for AI products: multi-model orchestration with routing logic, real-time feature computation for inference enrichment, and cost-optimized serving with tiered model selection. Implements serving observability: latency percentiles, prediction quality metrics, and cost-per-inference tracking. Creates model deployment governance for AI products. Mentors the team on production ML patterns. |
| Computer Vision Engineer | Required | Designs CV model serving architecture: edge-cloud hybrid inference for latency-critical applications, multi-GPU serving with dynamic batching, and model distillation pipelines for deployment optimization. Implements serving monitoring: inference latency, GPU utilization, and prediction accuracy tracking. Creates reference architectures for CV model deployment. Mentors the team on production CV system design. |
| Data Scientist | Required | Designs model serving architecture: scalable inference platforms, model registry integration with automated deployment, and online/offline feature consistency guarantees. Implements advanced monitoring: data drift detection, model performance degradation alerts, and automated retraining triggers. Creates serving best practices and model deployment standards. Mentors the team on MLOps patterns. |
| LLM Engineer | Required | Designs LLM serving architecture: multi-model gateway with intelligent routing, speculative decoding for latency optimization, and disaggregated serving (prefill/decode separation). Implements cost optimization: token budget management, caching layers for repeated prompts, and model cascade strategies. Creates LLM serving benchmarks and capacity planning models. Mentors the team on production LLM infrastructure. |
| ML Engineer | Required | Designs model serving architecture. Optimizes throughput (batching, GPU scheduling). Configures autoscaling for ML serving. Implements model fallback and canary deployment. |
| MLOps Engineer | Required | Architects model serving for complex scenarios: multi-model serving with dynamic loading, ensemble inference via Triton, model A/B testing. Optimizes latency through model optimization (TensorRT, ONNX Runtime), implements GPU sharing for efficient resource utilization, and designs autoscaling based on inference metrics. |
| NLP Engineer | Required | Designs high-performance serving infrastructure for NLP models. Optimizes through quantization, distillation, and model parallelism. Ensures latency and throughput SLAs are met. |
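The model fallback named in the ML Engineer row is the simplest of the high-availability patterns at this level: if the primary model errors out, serve a degraded baseline answer instead of failing the request. A minimal sketch, with both model functions as hypothetical stand-ins:

```python
import logging

logger = logging.getLogger("serving")

# Hypothetical stand-ins: a heavy primary model and a cheap baseline.
def primary_model(features):
    if features is None:  # simulates a real inference failure
        raise RuntimeError("primary inference failed")
    return {"score": sum(features) / len(features), "source": "primary"}

def fallback_model(features):
    # Degraded but always-available answer, e.g. a popularity baseline.
    return {"score": 0.5, "source": "fallback"}

def predict_with_fallback(features):
    """High-availability serving: on primary failure, degrade to the
    fallback model instead of surfacing an error to the caller."""
    try:
        return primary_model(features)
    except Exception:
        logger.warning("primary model failed, using fallback")
        return fallback_model(features)
```

In production the except clause would also cover timeouts, and the `source` field would feed monitoring so a rising fallback rate triggers an alert.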
Level 4

| Role | Required | Description |
|---|---|---|
| AI Product Engineer | Required | Defines the model serving strategy for the AI product platform. Establishes inference SLA targets, cost management frameworks, and model deployment governance. Conducts architecture reviews for AI serving infrastructure. Drives adoption of efficient model serving patterns across product teams. |
| Computer Vision Engineer | Required | Defines the model serving strategy for CV engineering teams. Establishes inference performance standards, GPU resource governance, and model deployment pipelines. Conducts architecture reviews for CV serving infrastructure. Drives adoption of optimized inference patterns for production CV systems. |
| Data Scientist | Required | Defines the model serving strategy for ML teams. Establishes model deployment standards, serving infrastructure requirements, and monitoring governance. Conducts reviews of serving architectures. Drives adoption of MLOps best practices for reliable model deployment across teams. |
| LLM Engineer | Required | Defines the LLM serving strategy for the organization. Establishes inference cost management policies, serving SLA targets, and GPU infrastructure governance. Evaluates serving frameworks (vLLM, TGI, TensorRT-LLM). Conducts architecture reviews for LLM infrastructure. Drives adoption of cost-efficient LLM serving patterns. |
| ML Engineer | Required | Defines the model serving strategy for the platform. Designs a unified serving layer. Optimizes serving costs. Coordinates with DevOps on infrastructure. |
| MLOps Engineer | Required | Defines the model serving strategy for the MLOps team: standard stack (KServe/Seldon Core on Kubernetes), deployment patterns (canary, shadow, blue-green). Implements a unified model rollout process with mandatory quality checks, configures SLA monitoring for latency, and defines runbooks for inference service incidents. |
| NLP Engineer | Required | Defines the model serving strategy for the NLP team. Establishes deployment standards, an SLA framework, and architectural decisions for scaling NLP inference infrastructure. |
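Of the deployment patterns the MLOps row lists (canary, shadow, blue-green), shadow deployment is easy to sketch: the candidate model receives a mirrored copy of live traffic, but only the stable model's answer is ever returned. Both model functions are hypothetical stand-ins for deployed versions behind, say, KServe:

```python
# Shadow deployment sketch: mirror traffic to the candidate, log the
# disagreement, never let it affect the live response.
shadow_log = []

def stable_model(x):
    return round(x * 0.9, 3)    # hypothetical stable version

def candidate_model(x):
    return round(x * 0.95, 3)   # hypothetical candidate under evaluation

def serve(x):
    live = stable_model(x)
    try:
        shadow = candidate_model(x)
        # Record disagreement for offline evaluation of the candidate.
        shadow_log.append({"input": x, "live": live, "shadow": shadow,
                           "delta": round(abs(live - shadow), 3)})
    except Exception:
        # A shadow failure must never break the live path.
        pass
    return live
```

The pattern's value is that the candidate is evaluated on real traffic with zero user-facing risk; in practice the shadow call would also be async so it cannot add latency.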
Level 5 (Principal)

| Role | Required | Description |
|---|---|---|
| AI Product Engineer | Required | Defines the organizational AI serving strategy: inference platform architecture, model deployment governance, and AI infrastructure investment decisions. Evaluates build-vs-buy for serving infrastructure (self-hosted vs. managed services). Drives adoption of production ML excellence across the organization. |
| Computer Vision Engineer | Required | Defines the organizational strategy for CV model serving infrastructure: edge/cloud inference architecture, GPU fleet management, and hardware selection for CV workloads. Evaluates emerging inference technologies (custom ASICs, neuromorphic computing). Drives adoption of production CV excellence across the organization. |
| Data Scientist | Required | Defines the organizational ML serving strategy: inference platform standardization, model deployment governance, and the ML infrastructure investment roadmap. Evaluates emerging serving technologies and hardware. Drives adoption of production ML best practices across all data science teams. |
| LLM Engineer | Required | Defines the organizational LLM serving strategy: inference infrastructure architecture, GPU/TPU procurement strategy, and cost governance for LLM operations. Evaluates build-vs-buy decisions for LLM infrastructure (self-hosted vs. API providers). Drives adoption of efficient LLM serving practices and shapes the technical vision for AI infrastructure at enterprise scale. |
| ML Engineer | Required | Defines the enterprise model serving strategy. Evaluates serving technologies. Designs the multi-model serving platform. |
| MLOps Engineer | Required | Shapes the model serving strategy at the organizational level: a unified serving platform for all model types (CV, NLP, tabular) and SLA standards. Designs the architecture for scaling to thousands of models: model mesh, serverless inference, edge deployment. Defines the GPU infrastructure cost optimization strategy for inference and the platform roadmap. |
| NLP Engineer | Required | Shapes the enterprise model serving strategy for the NLP platform. Defines inference infrastructure architecture, optimization standards, and cost management at the organizational level. |
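Cost governance and cost-per-inference tracking recur across the rows above. A back-of-the-envelope capacity-planning sketch; every figure and the utilization default are placeholder assumptions, not benchmarks:

```python
# Attribute GPU cost to requests for capacity planning. Utilization below
# 1.0 models headroom kept for traffic spikes (assumed value, not a norm).
def cost_per_1k_inferences(gpu_hourly_usd: float,
                           throughput_rps: float,
                           utilization: float = 0.6) -> float:
    """USD of GPU time consumed per 1000 requests, given the sustained
    per-GPU throughput and the average fleet utilization."""
    effective_rps = throughput_rps * utilization
    requests_per_hour = effective_rps * 3600
    return gpu_hourly_usd / requests_per_hour * 1000
```

Inverting the same formula gives the GPU count needed for a target traffic level, which is the usual starting point for a procurement discussion.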