Domain: Machine Learning & AI
Skill Profile: Triton, BentoML, Seldon: model deployment, A/B testing, canary releases
Roles: 7 (where this skill appears)
Levels: 5 (structured development path)
Mandatory requirements: 25 (the other 10 are optional)
Category: Machine Learning & AI / MLOps
Date: 17.3.2026
Select your current level and compare the expectations.
The tables below show how depth grows from Junior to Principal.
Level 1 (Junior)

| Role | Required | Description |
|---|---|---|
| AI Product Engineer | Optional | Understands model serving basics for AI products: REST/gRPC inference endpoints, model versioning for A/B testing, and basic latency/throughput requirements. Follows team guidelines on integrating model predictions into product features. Understands the differences between batch and real-time inference. |
| Computer Vision Engineer | Optional | Understands model serving basics for CV systems: image/video inference pipeline setup, GPU resource allocation for inference, and model format conversion (ONNX, TensorRT). Follows team practices for deploying CV models to production endpoints. |
| Data Scientist | Optional | Understands model serving basics: exporting trained models (pickle, ONNX, SavedModel), basic API wrapper creation with Flask/FastAPI, and model input/output schema definition. Follows team practices for model packaging and deployment workflows. |
| LLM Engineer | Optional | Understands LLM serving basics: inference API setup (vLLM, TGI), prompt/completion endpoint configuration, and token-based billing considerations. Follows team practices for LLM deployment, including context window management and response streaming setup. |
| ML Engineer | Required | Deploys an ML model as a REST API through a web framework such as Flask. Understands the inference pipeline: preprocessing → prediction → postprocessing. Uses pickle/joblib for model serialization. |
| MLOps Engineer | Optional | Understands basic model serving concepts: the difference between batch and real-time inference, and the main model formats (ONNX, SavedModel, pickle). Can deploy a simple model via a Flask/FastAPI endpoint, load a model from file, and return predictions. Knows about specialized serving systems such as TensorFlow Serving, Triton, and Seldon. |
| NLP Engineer | Required | Knows NLP model serving basics: REST API endpoints, model loading, batching. Deploys simple NLP models as REST APIs for text classification and NER tasks. |
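The Junior-level inference pipeline (preprocessing → prediction → postprocessing, with pickle serialization) can be sketched without any web framework. `LinearModel`, its weights, and the request shape below are hypothetical stand-ins for a real exported estimator and a Flask/FastAPI route handler:

```python
import json
import pickle

# Stand-in for a trained model; in practice this would be a scikit-learn
# estimator exported with pickle/joblib (weights here are made up).
class LinearModel:
    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def predict(self, rows):
        return [sum(w * x for w, x in zip(self.weights, row)) + self.bias
                for row in rows]

# "Export" and reload the model, mirroring a save-then-serve workflow.
blob = pickle.dumps(LinearModel([0.5, -0.25], bias=1.0))
model = pickle.loads(blob)

def predict_endpoint(request_body: str) -> str:
    """Inference pipeline: preprocessing -> prediction -> postprocessing."""
    payload = json.loads(request_body)          # preprocessing: parse request
    features = payload["instances"]
    scores = model.predict(features)            # prediction
    return json.dumps({"predictions": scores})  # postprocessing: serialize
```

In a real deployment the same function body would sit behind a FastAPI route, with schema validation on `instances` before it reaches the model.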
Level 2

| Role | Required | Description |
|---|---|---|
| AI Product Engineer | Optional | Implements model serving for AI product features: multi-model inference pipelines, feature store integration for real-time enrichment, and A/B testing infrastructure for model comparison. Configures auto-scaling based on inference load patterns. Implements model fallback strategies for high-availability product features. |
| Computer Vision Engineer | Optional | Implements CV model serving pipelines: batch and real-time inference with GPU optimization, model ensemble strategies for accuracy improvement, and pre-/post-processing pipeline optimization. Configures TensorRT/ONNX Runtime for inference acceleration. Implements model caching and warm-up strategies for consistent latency. |
| Data Scientist | Optional | Implements model serving solutions: containerized model deployment (Docker/Kubernetes), monitoring for data drift and prediction quality, and canary deployment for safe model rollout. Uses MLflow/BentoML for model packaging and serving. Implements feature engineering in the serving pipeline consistent with training. |
| LLM Engineer | Optional | Implements LLM serving solutions: KV-cache optimization for throughput, batching strategies (continuous batching, dynamic batching), and quantization for cost-efficient inference (GPTQ, AWQ, GGUF). Configures vLLM/TGI for production workloads. Implements streaming response infrastructure and token-level latency monitoring. |
| ML Engineer | Required | Uses model serving frameworks: Triton, BentoML, Seldon. Configures batch and real-time inference. Optimizes inference latency (ONNX, model optimization). Configures A/B testing for models. |
| MLOps Engineer | Optional | Deploys models to production via specialized serving platforms: TensorFlow Serving for TF models, Triton Inference Server for multi-framework serving. Configures BentoML for packaging models with dependencies, implements batch inference via Spark/Ray, and configures model versioning for seamless production model updates. |
| NLP Engineer | Required | Independently designs NLP model serving: TorchServe, Triton Inference Server. Configures batching, model versioning, and A/B testing. Optimizes latency through model optimization. |
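The model A/B testing expected at this level rests on a sticky traffic split: each user is deterministically assigned to a variant so that repeat requests hit the same model version. A minimal sketch, assuming two hypothetical local model functions where production would route between deployed versions in Triton/BentoML/Seldon:

```python
import hashlib

# Hypothetical stand-ins for two deployed model versions under comparison.
def model_a(x):
    return x * 2

def model_b(x):
    return x * 2 + 1

def assign_variant(user_id: str, b_traffic: float = 0.1) -> str:
    """Sticky A/B assignment: hash the user id into [0, 1) and compare
    against the challenger's traffic share (10% here)."""
    digest = hashlib.sha256(user_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "b" if bucket < b_traffic else "a"

def predict(user_id: str, x):
    variant = assign_variant(user_id)
    result = (model_b if variant == "b" else model_a)(x)
    # Log the variant alongside the prediction so offline analysis can
    # compare quality metrics per model version.
    return {"variant": variant, "prediction": result}
```

Hashing rather than random sampling is the key design choice: it keeps assignments stable across stateless replicas without a shared store.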
Level 3

| Role | Required | Description |
|---|---|---|
| AI Product Engineer | Required | Designs model serving architecture for AI products: multi-model orchestration with routing logic, real-time feature computation for inference enrichment, and cost-optimized serving with tiered model selection. Implements serving observability: latency percentiles, prediction quality metrics, and cost-per-inference tracking. Creates model deployment governance for AI products. Mentors the team on production ML patterns. |
| Computer Vision Engineer | Required | Designs CV model serving architecture: edge-cloud hybrid inference for latency-critical applications, multi-GPU serving with dynamic batching, and model distillation pipelines for deployment optimization. Implements serving monitoring: inference latency, GPU utilization, and prediction accuracy tracking. Creates reference architectures for CV model deployment. Mentors the team on production CV system design. |
| Data Scientist | Required | Designs model serving architecture: scalable inference platforms, model registry integration with automated deployment, and online/offline feature consistency guarantees. Implements advanced monitoring: data drift detection, model performance degradation alerts, and automated retraining triggers. Creates serving best practices and model deployment standards. Mentors the team on MLOps patterns. |
| LLM Engineer | Required | Designs LLM serving architecture: multi-model gateway with intelligent routing, speculative decoding for latency optimization, and disaggregated serving (prefill/decode separation). Implements cost optimization: token budget management, caching layers for repeated prompts, and model cascade strategies. Creates LLM serving benchmarks and capacity planning models. Mentors the team on production LLM infrastructure. |
| ML Engineer | Required | Designs model serving architecture. Optimizes throughput (batching, GPU scheduling). Configures autoscaling for ML serving. Implements model fallback and canary deployment. |
| MLOps Engineer | Required | Architects model serving for complex scenarios: multi-model serving with dynamic loading, ensemble inference via Triton, model A/B testing. Optimizes latency through model optimization (TensorRT, ONNX Runtime), implements GPU sharing for efficient resource utilization, and designs autoscaling based on inference metrics. |
| NLP Engineer | Required | Designs high-performance serving infrastructure for NLP models. Optimizes through quantization, distillation, and model parallelism. Ensures latency and throughput SLAs are met. |
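The model fallback named in the ML Engineer row is the simplest of the high-availability patterns at this level: if the primary model errors out, serve a degraded baseline answer instead of failing the request. A minimal sketch, with both model functions as hypothetical stand-ins:

```python
import logging

logger = logging.getLogger("serving")

# Hypothetical stand-ins: a heavy primary model and a cheap baseline.
def primary_model(features):
    if features is None:  # simulates a real inference failure
        raise RuntimeError("primary inference failed")
    return {"score": sum(features) / len(features), "source": "primary"}

def fallback_model(features):
    # Degraded but always-available answer, e.g. a popularity baseline.
    return {"score": 0.5, "source": "fallback"}

def predict_with_fallback(features):
    """High-availability serving: on primary failure, degrade to the
    fallback model instead of surfacing an error to the caller."""
    try:
        return primary_model(features)
    except Exception:
        logger.warning("primary model failed, using fallback")
        return fallback_model(features)
```

In production the except clause would also cover timeouts, and the `source` field would feed monitoring so a rising fallback rate triggers an alert.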
Level 4

| Role | Required | Description |
|---|---|---|
| AI Product Engineer | Required | Defines the model serving strategy for the AI product platform. Establishes inference SLA targets, cost management frameworks, and model deployment governance. Conducts architecture reviews for AI serving infrastructure. Drives adoption of efficient model serving patterns across product teams. |
| Computer Vision Engineer | Required | Defines the model serving strategy for CV engineering teams. Establishes inference performance standards, GPU resource governance, and model deployment pipelines. Conducts architecture reviews for CV serving infrastructure. Drives adoption of optimized inference patterns for production CV systems. |
| Data Scientist | Required | Defines the model serving strategy for ML teams. Establishes model deployment standards, serving infrastructure requirements, and monitoring governance. Conducts reviews of serving architectures. Drives adoption of MLOps best practices for reliable model deployment across teams. |
| LLM Engineer | Required | Defines the LLM serving strategy for the organization. Establishes inference cost management policies, serving SLA targets, and GPU infrastructure governance. Evaluates serving frameworks (vLLM, TGI, TensorRT-LLM). Conducts architecture reviews for LLM infrastructure. Drives adoption of cost-efficient LLM serving patterns. |
| ML Engineer | Required | Defines the model serving strategy for the platform. Designs a unified serving layer. Optimizes serving costs. Coordinates with DevOps on infrastructure. |
| MLOps Engineer | Required | Defines the model serving strategy for the MLOps team: standard stack (KServe/Seldon Core on Kubernetes), deployment patterns (canary, shadow, blue-green). Implements a unified model rollout process with mandatory quality checks, configures SLA monitoring for latency, and defines runbooks for inference service incidents. |
| NLP Engineer | Required | Defines the model serving strategy for the NLP team. Establishes deployment standards, an SLA framework, and architectural decisions for scaling NLP inference infrastructure. |
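Of the deployment patterns the MLOps row lists (canary, shadow, blue-green), shadow deployment is easy to sketch: the candidate model receives a mirrored copy of live traffic, but only the stable model's answer is ever returned. Both model functions are hypothetical stand-ins for deployed versions behind, say, KServe:

```python
# Shadow deployment sketch: mirror traffic to the candidate, log the
# disagreement, never let it affect the live response.
shadow_log = []

def stable_model(x):
    return round(x * 0.9, 3)    # hypothetical stable version

def candidate_model(x):
    return round(x * 0.95, 3)   # hypothetical candidate under evaluation

def serve(x):
    live = stable_model(x)
    try:
        shadow = candidate_model(x)
        # Record disagreement for offline evaluation of the candidate.
        shadow_log.append({"input": x, "live": live, "shadow": shadow,
                           "delta": round(abs(live - shadow), 3)})
    except Exception:
        # A shadow failure must never break the live path.
        pass
    return live
```

The pattern's value is that the candidate is evaluated on real traffic with zero user-facing risk; in practice the shadow call would also be async so it cannot add latency.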
Level 5 (Principal)

| Role | Required | Description |
|---|---|---|
| AI Product Engineer | Required | Defines the organizational AI serving strategy: inference platform architecture, model deployment governance, and AI infrastructure investment decisions. Evaluates build-vs-buy for serving infrastructure (self-hosted vs. managed services). Drives adoption of production ML excellence across the organization. |
| Computer Vision Engineer | Required | Defines the organizational strategy for CV model serving infrastructure: edge/cloud inference architecture, GPU fleet management, and hardware selection for CV workloads. Evaluates emerging inference technologies (custom ASICs, neuromorphic computing). Drives adoption of production CV excellence across the organization. |
| Data Scientist | Required | Defines the organizational ML serving strategy: inference platform standardization, model deployment governance, and the ML infrastructure investment roadmap. Evaluates emerging serving technologies and hardware. Drives adoption of production ML best practices across all data science teams. |
| LLM Engineer | Required | Defines the organizational LLM serving strategy: inference infrastructure architecture, GPU/TPU procurement strategy, and cost governance for LLM operations. Evaluates build-vs-buy decisions for LLM infrastructure (self-hosted vs. API providers). Drives adoption of efficient LLM serving practices and shapes the technical vision for AI infrastructure at enterprise scale. |
| ML Engineer | Required | Defines the enterprise model serving strategy. Evaluates serving technologies. Designs the multi-model serving platform. |
| MLOps Engineer | Required | Shapes the model serving strategy at the organizational level: a unified serving platform for all model types (CV, NLP, tabular) and SLA standards. Designs the architecture for scaling to thousands of models: model mesh, serverless inference, edge deployment. Defines the GPU infrastructure cost optimization strategy for inference and the platform roadmap. |
| NLP Engineer | Required | Shapes the enterprise model serving strategy for the NLP platform. Defines inference infrastructure architecture, optimization standards, and cost management at the organizational level. |
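Cost governance and cost-per-inference tracking recur across the rows above. A back-of-the-envelope capacity-planning sketch; every figure and the utilization default are placeholder assumptions, not benchmarks:

```python
# Attribute GPU cost to requests for capacity planning. Utilization below
# 1.0 models headroom kept for traffic spikes (assumed value, not a norm).
def cost_per_1k_inferences(gpu_hourly_usd: float,
                           throughput_rps: float,
                           utilization: float = 0.6) -> float:
    """USD of GPU time consumed per 1000 requests, given the sustained
    per-GPU throughput and the average fleet utilization."""
    effective_rps = throughput_rps * utilization
    requests_per_hour = effective_rps * 3600
    return gpu_hourly_usd / requests_per_hour * 1000
```

Inverting the same formula gives the GPU count needed for a target traffic level, which is the usual starting point for a procurement discussion.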