Domain: Observability & Monitoring
Skill profile: Service Level Indicators/Objectives/Agreements, error budgets, on-call rotation
Roles: 35 (where this skill appears)
Levels: 5 (structured development path)
Mandatory requirements: 52 (the other 113 are optional)
17.3.2026
The tables show how the expected depth grows from Junior to Principal.
| Role | Required | Description |
|---|---|---|
| Android Developer | | Understands SLI, SLO, and SLA concepts and their application to mobile services. Knows key quality indicators for Android backends: API availability, response latency, error rate. |
| Backend Developer (C#/.NET) | | Understands SLIs/SLOs for C# services: latency, error rate, throughput metrics. Monitors via Application Insights. |
| Backend Developer (Elixir) | | Understands SLI/SLO/SLA for Elixir services: availability metrics, latency percentiles. Monitors basic indicators through Phoenix telemetry. |
| Backend Developer (Go) | | Understands what an SLA (Service Level Agreement) is. Knows that 99.9% availability allows about 8.76 hours of downtime per year. Understands why monitoring is needed. |
| Backend Developer (Java/Kotlin) | | Understands what an SLA means for Java backend services: knows that 99.9% availability translates to about 8.76 hours of allowed downtime per year. Understands why monitoring JVM metrics, response times, and error rates matters for meeting service commitments. |
| Backend Developer (Node.js) | | Understands SLIs/SLOs for Node.js: latency, error rate, event loop lag. Monitors basic indicators via prom-client. |
| Backend Developer (PHP) | | Understands SLI, SLO, and SLA concepts and their significance for PHP service reliability. Knows basic metrics: availability, latency, error rate. Can check current SLIs via monitoring dashboards and understands how PHP-FPM response time and HTTP error codes affect SLOs. |
| Backend Developer (Python) | | Understands what an SLA means for Python backend services: knows that 99.9% availability translates to about 8.76 hours of allowed downtime per year. Understands why monitoring request latency, error rates, and worker process health matters for meeting service commitments. |
| Backend Developer (Rust) | | Understands SLIs/SLOs for Rust services: latency metrics, error rate, throughput. Monitors basic indicators through the Prometheus metrics crate. |
| Backend Developer (Scala) | | Understands SLI/SLO/SLA for Scala services: latency metrics, JVM-specific indicators. Monitors basic SLIs through Kamon/Prometheus. |
| Cloud Engineer | | Understands SLI, SLO, and SLA concepts and their application in cloud services. Knows the SLAs of major cloud providers (e.g., AWS: 99.99% for S3, 99.95% for EC2). Monitors basic SLIs (availability, latency, error rate) through CloudWatch dashboards. |
| Data Engineer | | Understands SLIs/SLOs for data: data freshness, completeness, accuracy metrics. Monitors basic data quality indicators. |
| Database Engineer / DBA | | Understands basic database SLIs: query latency (p50, p99), availability (uptime), error rate. Monitors SLIs via dashboards. Participates in incident response when database service SLOs are violated. |
| Desktop Developer (.NET WPF/WinUI/MAUI) | | Studies SLI, SLO, and SLA concepts for the .NET server components of the desktop ecosystem. Understands availability and latency metrics for backend services serving desktop applications. |
| Desktop Developer (Qt/C++) | | Studies SLI, SLO, and SLA concepts as applied to server components of the Qt ecosystem. Understands the difference between indicators, objectives, and service level agreements for backend services of desktop applications. |
| DevOps Engineer | | Understands the difference between SLI, SLO, and SLA: indicators, objectives, and agreements. Knows the main SLIs: availability, latency, throughput, error rate. Monitors SLO dashboards and escalates on error budget violations. |
| Flutter Developer | | Understands the difference between SLI, SLO, and SLA and their significance for mobile apps. Tracks basic indicators: API response time, error rate, and backend availability. |
| Fullstack Developer | | Understands SLIs/SLOs for fullstack work: Core Web Vitals, API latency, error rate. Monitors basic indicators. |
| Game Server Developer | | Understands SLI/SLO/SLA concepts for game server availability: player session success rate, matchmaking latency, and server uptime commitments. Follows team monitoring dashboards to track game service health indicators. |
| Infrastructure Engineer | | Understands the difference between SLI, SLO, and SLA in an infrastructure context: SLIs as specific metrics (uptime, latency p99, error rate), SLOs as target values, SLAs as contractual obligations. Can read SLO dashboards, understands the error budget concept, and knows how infrastructure issues affect SLIs. |
| iOS Developer | | Studies SLI, SLO, and SLA concepts as applied to server components of the mobile ecosystem. Understands availability and latency metrics for mobile APIs and their impact on the iOS application user experience. |
| IoT Engineer | | Understands the difference between SLI, SLO, and SLA. Knows key IoT platform reliability metrics: API availability, command delivery latency, telemetry message loss rate. |
| ML Engineer | | Understands SLI, SLO, and SLA concepts and their application to ML services. Knows typical SLIs for inference: latency p50/p95/p99, throughput, error rate, and model prediction quality. |
| MLOps Engineer | | Understands the difference between SLI, SLO, and SLA. Knows key ML service metrics: inference latency, prediction throughput, model serving endpoint availability, accuracy drift. |
| Network Engineer | | Knows basic SLI/SLO/SLA concepts for network engineering and can apply them in typical tasks. Uses standard tools and follows established team practices. Understands when and why this approach is used. |
| Platform Engineer | | Understands SLI, SLO, and SLA concepts and their differences for platform services. Monitors error budgets via Grafana dashboards. Creates basic SLI metrics: availability, latency, throughput. Participates in SLO review meetings and escalates when the burn rate is exceeded. |
| QA Automation Engineer | | Understands SLI, SLO, and SLA concepts and their significance for product quality. Knows how testing helps verify that the application meets target reliability indicators. |
| Release Engineer | Required | Knows basic SLI/SLO/SLA concepts for release engineering and can apply them in typical tasks. Uses standard tools and follows established team practices. Understands when and why this approach is applied. |
| Security Analyst | | Understands SLI/SLO/SLA concepts for security operations: incident response time targets, detection coverage SLOs, and security tool availability commitments. Follows team monitoring dashboards to track security service health and alert response metrics. |
| Site Reliability Engineer (SRE) | | Understands SLI/SLO/SLA: availability, latency, error rate as indicators. Monitors SLO dashboards. Understands error budgets. Responds to SLO burn rate alerts. |
| Technical Product Manager | Required | Understands the difference between SLI, SLO, and SLA and their role in product decisions. Knows basic SLIs: availability, latency, error rate. Understands how SLOs influence product decisions and engineering priorities. |
| Telecom Developer | | Understands SLI/SLO/SLA concepts for telecom service delivery: call setup success rate, voice quality MOS scores, and carrier-grade availability targets (five nines). Follows team monitoring dashboards to track network function health and service quality indicators. |
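
Several rows above translate an availability target into allowed downtime per year. The arithmetic can be sketched as follows; the function name and the 365-day year are assumptions made for illustration, not part of any role requirement.

```python
HOURS_PER_YEAR = 365 * 24  # 8760; assumes a non-leap year


def allowed_downtime_hours(availability: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the downtime (in hours) that an availability target permits."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * period_hours


if __name__ == "__main__":
    # Print the yearly downtime allowance for a few common targets.
    for target in (0.999, 0.9995, 0.9999, 0.99999):
        hours = allowed_downtime_hours(target)
        print(f"{target:.3%} -> {hours:.2f} h/year ({hours * 60:.1f} min)")
```

Under these assumptions, 99.9% works out to about 8.76 hours per year, and five nines to a little over 5 minutes.
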

| Role | Required | Description |
|---|---|---|
| Android Developer | | Defines SLIs/SLOs for Android application backend services. Configures error budget monitoring and creates dashboards for tracking compliance with service quality targets. |
| Backend Developer (C#/.NET) | | Defines SLIs for .NET: request duration, GC metrics, thread pool usage. Configures SLOs with alerting. |
| Backend Developer (Elixir) | | Defines SLIs for Elixir services: request latency, error rate, BEAM process metrics. Configures SLOs through Prometheus + Grafana with burn rate alerts. |
| Backend Developer (Go) | Required | Defines SLIs for Go services: p99 latency from middleware metrics, error rate from structured logs, and availability from health checks. Configures Prometheus-based SLI monitoring with recording rules. Understands error budgets and manages them for iterative feature releases. Participates in the Go service on-call rotation. |
| Backend Developer (Java/Kotlin) | Required | Defines SLIs for Java services: p99 latency from Micrometer metrics, error rate from exception tracking, and JVM health indicators (GC pauses, heap utilization). Configures SLI monitoring with Spring Boot Actuator and Prometheus. Understands error budgets and participates in the on-call rotation for Java service reliability. |
| Backend Developer (Node.js) | | Defines SLIs for Node.js: event loop lag, GC pause duration, request latency p99. Configures SLOs with alerting via Prometheus. |
| Backend Developer (PHP) | | Defines and measures SLIs for PHP services: latency percentiles (p50/p95/p99), availability, throughput. Configures metric collection via Prometheus with a PHP exporter, creates SLO dashboards in Grafana, and implements health-check endpoints for Laravel/Symfony applications with dependency verification. |
| Backend Developer (Python) | Required | Defines SLIs for Python services: p99 latency from middleware instrumentation, error rate from exception handlers, and worker pool health indicators. Configures SLI monitoring with the Prometheus client and custom metrics. Understands error budgets and participates in the on-call rotation for Python service reliability. |
| Backend Developer (Rust) | | Defines SLIs for Rust: p99 latency, allocation rate, connection pool metrics. Configures SLOs with alerting and uses tracing for diagnostics. |
| Backend Developer (Scala) | | Defines SLIs for Scala services: GC pause impact on latency, Akka actor mailbox metrics, error rates. Configures SLOs with alerting. |
| Cloud Engineer | | Defines SLIs/SLOs for cloud services: availability (successful requests / total), latency (p50, p95, p99), throughput. Configures error budget tracking and burn rate alerts in Prometheus/CloudWatch. Understands multi-window, multi-burn-rate alerts and their configuration. |
| Data Engineer | | Defines data SLIs: freshness SLO, completeness targets, pipeline latency. Configures alerting on data quality metrics. |
| Database Engineer / DBA | | Defines SLIs/SLOs for database services: query latency budgets, connection availability, replication lag thresholds. Configures SLO-based alerts: error budget burn rate, latency degradation. Participates in SLO reviews. |
| Desktop Developer (.NET WPF/WinUI/MAUI) | | Defines SLIs for .NET server components: update API availability, licensing latency, telemetry uptime. Configures SLO monitoring through Prometheus/Grafana for early detection of service degradation. |
| Desktop Developer (Qt/C++) | | Defines SLIs for Qt ecosystem server components: update service availability, API latency, licensing success rate. Configures SLO monitoring and alerts for early detection of backend service degradation. |
| DevOps Engineer | | Defines and implements SLIs/SLOs for services: selecting meaningful indicators, setting realistic targets, configuring error budget tracking. Creates SLO dashboards in Grafana with burn rate alerts and configures multi-window alerting. |
| Engineering Manager | | Configures SLI/SLO dashboards for team-owned services: defines availability, latency, and throughput indicators. Creates alerting rules for SLO breaches and error budget consumption. Participates in the on-call rotation and coordinates initial incident analysis for managed services. |
| Flutter Developer | | Defines SLIs and SLOs for the Flutter app backend: API latency, uptime, and error budget. Configures SLO compliance monitoring and alerting when approaching thresholds. |
| Fullstack Developer | | Defines SLIs: frontend performance (LCP, INP, CLS), backend latency p99, availability. Configures SLOs with alerting. |
| Game Server Developer | | Configures SLIs/SLOs for game server services: tracks matchmaking latency p99, game session crash rate, and server tick rate stability. Creates dashboards for player experience indicators and alerts for degradation patterns. Participates in game-specific on-call and analyzes gameplay-impacting incidents. |
| Infrastructure Engineer | | Configures SLI/SLO monitoring for infrastructure services: defining key SLIs (availability, latency, throughput) and implementing SLOs through Prometheus recording rules. Configures burn rate alerts for early warning, creates SLO dashboards with error budget tracking, and sets up multi-window alerting. |
| iOS Developer | | Defines SLIs for the mobile backend: API p99 latency, success rate, push delivery rate, and availability. Configures SLO monitoring and alerts for early detection of service degradation affecting iOS users. |
| IoT Engineer | | Defines SLIs for IoT services: telemetry delivery latency, gateway availability, command processing error rate. Configures monitoring for SLO tracking and alerting on violations. |
| ML Engineer | | Defines SLIs and SLOs for ML services: inference latency, model freshness, prediction accuracy, and availability. Configures SLI monitoring with alerts when production models approach an SLO violation. |
| MLOps Engineer | | Defines SLIs for ML services: p99 inference latency, prediction throughput, model freshness, data pipeline lag. Configures SLO monitoring and alerting on model degradation. |
| Network Engineer | | Confidently applies SLI/SLO/SLA for network engineering in non-standard tasks. Independently selects the optimal approach and tools. Analyzes trade-offs and proposes improvements to existing solutions. |
| Platform Engineer | | Configures SLO monitoring for platform services: multi-window burn rate alerts, error budget policies. Creates SLO dashboards with burn-down visualization. Defines SLIs for various service types (API, batch, streaming). Implements automated SLO reporting for stakeholders. |
| QA Automation Engineer | | Develops tests for SLO compliance verification: testing API response times, service availability, correct error handling. Monitors the error budget in test environments. |
| Release Engineer | Required | Confidently applies SLI/SLO/SLA for release engineering in non-standard tasks. Independently selects the optimal approach and tools. Analyzes trade-offs and proposes improvements to existing solutions. |
| Security Analyst | | Configures SLIs/SLOs for security operations: tracks mean time to detect (MTTD), alert triage completion rate, and SIEM query performance. Creates security operations dashboards and alerts for detection coverage gaps. Participates in the security on-call rotation and analyzes security incident timelines. |
| Site Reliability Engineer (SRE) | | Defines SLIs for services: availability (successful requests / total), latency (p99 < threshold), quality. Configures SLO tracking in Prometheus/Grafana. Calculates error budgets. |
| Technical Lead | | Configures SLIs/SLOs for team-owned microservices: defines latency percentile targets, error rate thresholds, and dependency health indicators. Creates comprehensive service dashboards with burn rate alerts. Participates in the on-call rotation and leads incident analysis with structured post-mortem follow-ups. |
| Technical Product Manager | Required | Defines SLIs and SLOs for their product together with engineering. Understands the error budget concept and its impact on the trade-off between feature velocity and reliability. Participates in SLO reviews and incident analysis. |
| Telecom Developer | | Configures SLIs/SLOs for telecom network functions: tracks call setup success rate, voice quality degradation, and signaling latency. Creates carrier-grade monitoring dashboards with multi-level alerting. Participates in network operations on-call and analyzes service-impacting network incidents. |
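
Many rows above mention an availability SLI (successful requests / total), burn rate alerts, and multi-window alerting. A minimal sketch of how those pieces relate is shown here. The names `Window`, `burn_rate`, and `should_page` are hypothetical, and the default threshold of 14.4 is an assumption: it corresponds to spending 2% of a 30-day error budget within one hour, and real policies choose their own values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Request counts observed in one look-back window (hypothetical type)."""
    good: int
    total: int

    @property
    def error_rate(self) -> float:
        # An empty window carries no evidence of failure, so report 0 errors.
        return 0.0 if self.total == 0 else 1.0 - self.good / self.total


def burn_rate(window: Window, slo: float) -> float:
    """Speed of error budget consumption; 1.0 means exactly on budget."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    budget = 1.0 - slo  # allowed error fraction, e.g. 0.001 for a 99.9% SLO
    return window.error_rate / budget


def should_page(long_w: Window, short_w: Window, slo: float,
                threshold: float = 14.4) -> bool:
    """Multi-window check: fire only if both windows exceed the threshold.

    The long window shows the burn is significant; the short window shows
    it is still happening, which helps the alert clear soon after recovery.
    """
    return (burn_rate(long_w, slo) >= threshold
            and burn_rate(short_w, slo) >= threshold)


if __name__ == "__main__":
    one_hour = Window(good=98_000, total=100_000)   # 2% errors
    five_min = Window(good=7_800, total=8_000)      # 2.5% errors
    print(should_page(one_hour, five_min, slo=0.999))
```

In practice the counts would come from a metrics backend such as Prometheus; the sketch only illustrates the arithmetic behind the alert condition.
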

| Role | Required | Description |
|---|---|---|
| Android Developer | | Designs a comprehensive SLI/SLO system for the mobile platform. Defines end-to-end metrics from client UX to server infrastructure and configures automatic alerting. |
| Backend Developer (C#/.NET) | | Designs an SLI/SLO framework: .NET-specific metrics, composite SLOs, error budgets. Defines reliability targets. |
| Backend Developer (Elixir) | | Designs an SLI/SLO framework for the Elixir platform: BEAM-specific metrics (scheduler utilization, message queue length), error budgets, composite SLOs. |
| Backend Developer (Go) | Required | Defines and implements comprehensive SLOs for Go service portfolios with multi-signal burn rate alerting. Creates SLO dashboards correlating latency, error rate, and saturation metrics through distributed tracing. Manages error budgets driving release velocity decisions. Conducts structured post-mortem analysis and coordinates cross-service incident response. Designs graceful degradation patterns with circuit breakers and load shedding. |
| Backend Developer (Java/Kotlin) | Required | Defines and implements comprehensive SLOs for Java service portfolios with burn rate alerts correlated to JVM health metrics. Creates SLO dashboards spanning Spring Boot services with distributed tracing through Sleuth/OpenTelemetry. Manages error budgets accounting for GC pauses and thread pool saturation. Conducts post-mortem analysis and designs graceful degradation with Resilience4j patterns. |
| Backend Developer (Node.js) | | Designs an SLI/SLO framework: Node.js-specific metrics (event loop utilization, heap usage), composite SLOs, error budgets. |
| Backend Developer (PHP) | | Designs an SLO framework for PHP microservices: selecting meaningful SLIs based on the user journey, establishing an error budget, automating burn rate alerts. Analyzes the impact of PHP-specific issues (OPcache invalidation, connection pool exhaustion) on SLOs and develops runbooks for typical incidents. |
| Backend Developer (Python) | Required | Defines and implements comprehensive SLOs for Python service portfolios with burn rate alerts correlated to worker health and event loop metrics. Creates SLO dashboards spanning Django/FastAPI services with distributed tracing through OpenTelemetry. Manages error budgets and conducts post-mortem analysis. Designs graceful degradation with circuit breaker patterns and async task prioritization. |
| Backend Developer (Rust) | | Designs an SLI/SLO framework: Rust-specific metrics (allocation-free hot paths), sub-millisecond SLOs, error budgets. Defines reliability targets. |
| Backend Developer (Scala) | | Designs an SLI/SLO framework: JVM-aware SLIs (GC, heap), Akka Cluster health metrics, composite SLOs. Defines error budgets. |
| Cloud Engineer | Required | Designs an SLO framework for the cloud platform: composite SLOs for distributed systems, dependency-aware SLOs, SLO-based deployment gates. Introduces error budget policies: automated rollback on budget exhaustion, feature freeze processes. Integrates SLOs with incident management. |
| Data Engineer | | Designs a data SLI/SLO framework: multi-dimensional data quality SLOs, error budgets for data, pipeline reliability targets. |
| Database Engineer / DBA | Required | Designs an SLI/SLO framework for the database tier: multi-tier SLOs (critical vs standard databases), SLIs by operation type (read vs write latency). Implements error budget policies and automated remediation on SLO breach. |
| Desktop Developer (.NET WPF/WinUI/MAUI) | | Designs an SLI/SLO system for the .NET desktop ecosystem server infrastructure with error budgets and automated alerting. Implements burn-rate monitoring and integrates SLOs into the release decision process. |
| Desktop Developer (Qt/C++) | | Designs a comprehensive SLI/SLO system for the entire Qt ecosystem server infrastructure with error budgets and automated management. Implements SLO-based alerting and burn-rate visualization for critical desktop platform services. |
| DevOps Engineer | Required | Designs an SLO framework for the organization: SLI definition standards for different service types, automated error budget calculation. Implements SLO-based alerting through Sloth/Pyrra and integrates it with incident management and capacity planning. |
| Engineering Manager | Required | Designs the observability strategy for engineering organization services: implements distributed tracing across team boundaries, defines multi-service SLI/SLO frameworks, and establishes cross-team incident response processes. Conducts blameless post-mortems and drives reliability improvement initiatives. |
| Flutter Developer | | Designs a comprehensive SLI/SLO system for the Flutter app that accounts for client metrics. Implements error budget policies and automatic release freezes on SLO violations. |
| Fullstack Developer | | Designs an SLI/SLO framework: end-to-end user experience SLIs, composite SLOs, error budgets. |
| Game Server Developer | Required | Designs the observability strategy for game server infrastructure: implements distributed tracing across matchmaking, game session, and player data services. Defines game-specific SLI/SLO frameworks covering player experience metrics, tick rate stability, and desync detection. Conducts gameplay-aware post-mortems and designs graceful degradation for peak player loads. |
| Infrastructure Engineer | Required | Designs an SLO framework for the infrastructure platform: cascading SLOs from infrastructure to services, composite SLIs for complex systems, automated error budget calculation. Implements SLO-as-code through Sloth or OpenSLO, configures automated incident creation on breach, and integrates SLOs with capacity planning. |
| iOS Developer | | Architects a comprehensive SLI/SLO system for mobile infrastructure with error budgets and automatic management. Implements SLO-based alerting that accounts for mobile-specific metrics: app startup time, sync latency, and offline recovery. |
| IoT Engineer | | Designs an SLI/SLO system for the IoT platform: multi-level metrics (device/gateway/cloud), error budgets for work planning, SLO correlation with device fleet business metrics. |
| ML Engineer | | Designs a comprehensive SLI/SLO system for the ML platform with error budgets and automated incident response. Introduces ML-specific SLIs: model drift rate, data freshness, feature availability, and retraining latency. |
| MLOps Engineer | | Architects an SLI/SLO system for the ML platform: metrics for the training pipeline (time-to-train, GPU utilization), serving (latency, availability), and data quality. Defines error budgets for ML releases. |
| Network Engineer | | Expertly applies SLI/SLO/SLA for network engineering to design complex systems. Optimizes existing solutions and prevents architectural mistakes. Conducts code reviews and trains colleagues on best practices. |
| Platform Engineer | Required | Designs an SLO framework for the IDP: automated SLO tracking (Sloth, Pyrra), error budget-driven release process. Creates self-service SLO configuration for teams. Implements SLO-based alerting instead of threshold-based alerting. Integrates SLO compliance into the deployment pipeline for gating. |
| QA Automation Engineer | | Designs an SLO-driven testing strategy: automatic SLI verification in CI, load tests with SLO validation, graceful degradation testing when the error budget is exhausted. |
| Release Engineer | Required | Expertly applies SLI/SLO/SLA for release engineering to design complex systems. Optimizes existing solutions and prevents architectural mistakes. Conducts code reviews and trains colleagues on best practices. |
| Security Analyst | Required | Designs the observability strategy for security operations platforms: implements security event correlation and threat detection pipeline monitoring. Defines security-specific SLI/SLO frameworks covering MTTD, MTTR, and detection coverage metrics. Conducts security incident post-mortems and designs resilient security monitoring architectures. |
| Site Reliability Engineer (SRE) | Required | Designs an SLO framework: multi-window burn rate alerting, SLO-based pages, error budget policies. Implements automated SLO reporting. Integrates SLOs with deployment decisions. |
| Solutions Architect | Required | Designs an enterprise observability strategy spanning microservice architectures: implements end-to-end distributed tracing, defines organizational SLI/SLO framework templates, and establishes cross-service dependency mapping. Conducts architectural post-mortems and designs multi-level graceful degradation strategies for complex distributed systems. |
| Technical Lead | Required | Designs an observability strategy built on SLI/SLO/SLA. Implements distributed tracing. Defines SLIs/SLOs. Conducts post-mortems. |
| Technical Product Manager | Required | Defines the SLI/SLO strategy for the product with a user-centric approach. Designs error budget policies: what happens when the budget is exhausted. Links SLOs with product decisions such as feature rollout, scaling, and architecture changes. |
| Telecom Developer | Required | Designs the observability strategy for carrier-grade telecom platforms: implements protocol-level distributed tracing across signaling and media planes. Defines telecom-specific SLI/SLO frameworks covering five-nines availability, call quality metrics, and regulatory compliance indicators. Conducts network incident post-mortems and designs carrier-grade failover architectures. |
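
Composite and dependency-aware SLOs recur in the rows above. One common way to reason about them is to combine per-component availabilities, as in the sketch below. It assumes component failures are statistically independent, which real systems often violate, and the function names are hypothetical.

```python
from math import prod


def serial_availability(dependencies: list[float]) -> float:
    """Availability of a request path that needs every dependency to succeed.

    Assumes independent failures, so the result is the product of the parts.
    """
    return prod(dependencies)


def redundant_availability(replicas: list[float]) -> float:
    """Availability when any one of several independent replicas suffices.

    The path fails only if every replica fails at the same time.
    """
    return 1.0 - prod(1.0 - a for a in replicas)


if __name__ == "__main__":
    # A request passing through a gateway, an API, and a database in sequence.
    path = serial_availability([0.999, 0.999, 0.9995])
    print(f"serial path: {path:.4%}")

    # Two replicas, each 99% available, behind a load balancer.
    pair = redundant_availability([0.99, 0.99])
    print(f"redundant pair: {pair:.4%}")
```

The serial result is always lower than the weakest dependency, which is one reason a service SLO cannot simply copy the target of its most reliable component.
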

| Role | Required | Description |
|---|---|---|
| Android Developer | | Defines the SLI/SLO/SLA strategy for all mobile products in the organization. Designs error budget management processes and bases release decisions on quality metrics. |
| Backend Developer (C#/.NET) | | Defines SLO standards: mandatory SLIs, error budget policies, performance regression detection. |
| Backend Developer (Elixir) | | Defines SLO standards for the Elixir team: mandatory SLIs per service, error budget policies, incident response for SLO breaches. Conducts SLO reviews. |
| Backend Developer (Node.js) | | Defines SLO standards: mandatory SLIs per service, error budget policies, performance regression detection. |
| Backend Developer (PHP) | | Manages SLO practices for the PHP platform: SLO alignment with business stakeholders, error budget policy, SLO review processes. Coordinates SLO-based decision making across teams: prioritizing reliability vs features based on the error budget, integrating SLOs into sprint planning and release processes. |
| Backend Developer (Rust) | | Defines SLO standards: mandatory SLIs per service, error budget policies, performance regression detection. Conducts SLO reviews. |
| Backend Developer (Scala) | | Defines SLO standards: mandatory SLIs per service, error budget policies, incident escalation. Conducts SLO reviews and capacity planning. |
| Cloud Engineer | Required | Defines the organizational SLO culture: SLO review process, error budget governance, SLA negotiations with clients. Introduces tooling (Sloth, Google SLO Generator) and standards for all cloud services. Balances reliability requirements and delivery speed based on error budgets. |
| Data Engineer | | Defines data SLO standards: mandatory quality SLIs, data freshness requirements, incident response for data issues. |
| Database Engineer / DBA | Required | Defines SLO standards for the data platform: SLO templates by DB tier, escalation policies, SLO review cadence. Coordinates SLO agreements between DBA and product teams. Establishes database reliability targets. |
| Desktop Developer (.NET WPF/WinUI/MAUI) | | Defines SLAs for .NET desktop ecosystem server components based on business requirements. Establishes an SLO-driven development culture and coordinates reliability target alignment between product and infrastructure teams. |
| Desktop Developer (Qt/C++) | | Defines SLAs for Qt ecosystem server components based on business requirements and fosters an SLO-driven development culture. Coordinates reliability target alignment between product teams and infrastructure. |
| DevOps Engineer | Required | Defines SRE culture through SLOs: standards for each service tier, error budget policies (feature freeze on exhaustion). Designs the organizational SLO dashboard, review and target revision processes, and integration with product management. |
| Engineering Manager | Required | Defines the product observability strategy across engineering teams: establishes an SLO-based reliability culture, coordinates cross-team incident management processes, and optimizes MTTD/MTTR metrics through tooling and process improvements. Drives error budget policy adoption and blameless post-mortem practices. |
| Flutter Developer | | Establishes SLI/SLO/SLA standards for all Flutter team projects. Implements a reliability engineering culture and trains the team on error budget management and decision-making. |
| Fullstack Developer | | Defines SLO standards: mandatory SLIs for frontend and backend, error budget policies, performance requirements. |
| Game Server Developer | Required | Defines the product observability strategy for game server platforms: establishes an SLO-based approach to player experience reliability, coordinates game-specific incident management with live operations teams, and optimizes MTTD/MTTR for gameplay-impacting issues. Drives reliability culture across game development teams. |
| Infrastructure Engineer | Required | Defines SLO standards for all infrastructure: standard SLIs for each component class (compute, storage, network, DB), SLO negotiation process with teams. Implements SLO-driven prioritization for engineering work, reviews team SLOs, and coordinates error budget policy with product management. |
| iOS Developer | | Defines SLAs for the mobile platform based on business requirements and builds an SLO-driven development culture. Coordinates reliability target alignment between the iOS team, backend development, and infrastructure. |
| IoT Engineer | | Defines SLAs for IoT products: formalizing availability and latency guarantees, SLO review processes, integrating error budgets into sprint planning and release decisions. |
| ML Engineer | | Defines SLI/SLO/SLA standards for organizational ML services, considering downstream business impact. Designs the reliability management architecture for the ML platform with error budget policies and capacity planning. |
| MLOps Engineer | | Defines SLAs for ML products: latency and availability guarantees for inference, SLOs for model freshness, error budget integration into retraining and deployment decision-making processes. |
| Network Engineer | | Establishes SLI/SLO/SLA standards for the network engineering team and makes architectural decisions. Defines the technical roadmap incorporating this skill. Mentors senior engineers and influences practices of adjacent teams. |
| Platform Engineer | Required | Defines the organizational SLO strategy: tiered SLO targets, error budget governance, SLA management process. Leads SRE practice adoption through the SLO framework. Designs the organizational error budget policy: freezing deployments and allocating engineering time on depletion. |
| QA Automation Engineer | | Defines SLO testing standards for the team. Integrates SLIs/SLOs into test reporting and ensures each release is verified against target indicators. |
| Release Engineer | Required | Establishes SLI/SLO/SLA standards for the release engineering team and makes architectural decisions. Defines the technical roadmap incorporating this skill. Mentors senior engineers and influences practices of adjacent teams. |
| Security Analyst | Required | Defines the observability strategy for security operations: establishes an SLO-based approach to detection and response capabilities, coordinates security incident management processes, and optimizes MTTD/MTTR for security events through tooling and automation improvements. |
| Site Reliability Engineer (SRE) | Required | Defines organizational SLO standards: SLO requirements per tier, error budget governance, SLO review cadence. Trains teams on SRE practices. Coordinates SLO adoption. |
| Solutions Architect | Required | Defines the enterprise observability strategy spanning product portfolios: establishes an SLO-based approach for multi-service architectures, coordinates cross-organizational incident management, and optimizes MTTD/MTTR through platform observability tooling. Drives reliability engineering culture and error budget governance. |
| Technical Lead | Required | Defines the product observability strategy for team-owned service portfolios: establishes an SLO-based approach to microservice reliability, coordinates incident management with dependent teams, and optimizes MTTD/MTTR through improved monitoring and on-call processes. |
| Technical Product Manager | Required | Defines SLI/SLO standards for the division. Introduces an error budget-driven development process. Coordinates SLA agreements with customers and partners. Shapes reliability culture in the product-engineering organization. |
| Telecom Developer | Required | Defines the product observability strategy for carrier-grade telecom platforms: establishes an SLO-based approach to five-nines service availability, coordinates network incident management with NOC teams, and optimizes MTTD/MTTR for carrier-scale service disruptions. Drives carrier-grade reliability culture. |

| Role | Required | Description |
|---|---|---|
| Android Developer | | Shapes the organizational culture of mobile service reliability management through SLI/SLO. Defines quality standards impacting all mobile products and server infrastructure. |
| Backend Developer (C#/.NET) | | Shapes reliability strategy: platform SLO framework, .NET performance baselines, reliability governance. |
| Backend Developer (Elixir) | | Shapes Elixir platform reliability strategy: platform-wide SLO framework, BEAM-specific reliability patterns, error budget governance for business decisions. |
| Backend Developer (Node.js) | | Shapes reliability strategy: platform SLO framework, Node.js performance baselines, error budget governance. |
| Backend Developer (PHP) | | Shapes corporate SLO culture for the PHP ecosystem: SLO hierarchy from infrastructure to business metrics, platform-level SLOs, SLOs as the basis for architectural decisions. Designs automated error budget management systems, defines SLAs for internal platforms, and builds incident management processes based on SLOs. |
| Backend Developer (Rust) | | Shapes reliability strategy: platform SLO framework, Rust performance guarantees, error budget governance. Defines reliability principles. |
| Backend Developer (Scala) | | Shapes reliability strategy: platform SLO framework, JVM tuning governance, error budget management. Defines reliability engineering principles. |
| Cloud Engineer | Required | Shapes reliability engineering strategy: platform-wide SLO framework, business-aligned reliability targets, cost-of-reliability analysis. Designs an SLO platform for automated tracking of hundreds of services and defines reliability investment priorities at the organizational level. |
| Data Engineer | | Shapes data reliability strategy: platform data SLO framework, data quality governance, reliability engineering for data. |
| Database Engineer / DBA | Required | Shapes organizational SLO strategy for the data tier: SLO framework covering all DBMSes, SLAs for internal database-as-a-service, SLO-driven investment decisions. Defines reliability culture for database engineering. |
| Desktop Developer (.NET WPF/WinUI/MAUI) | | Shapes corporate reliability management strategy for the server infrastructure of the .NET desktop ecosystem. Defines the architecture for automated error budget management and for balancing reliability with development velocity. |
| Desktop Developer (Qt/C++) | | Shapes corporate reliability management strategy for the server infrastructure of the desktop ecosystem through SLI/SLO/SLA. Defines the architecture for automated error budget management and for balancing reliability with development velocity. |
| DevOps Engineer | Required | Develops reliability engineering strategy based on SLOs: corporate reliability standards, SLO-driven development, and automated error budget management. Defines platform reliability architecture, from SLO definition to automatic scaling and disaster recovery (DR). |
| Engineering Manager | Required | Defines organizational observability strategy. Implements platform solutions. Builds reliability culture. Establishes an enterprise SLO framework. |
| Flutter Developer | | Defines organizational mobile product reliability strategy through SLI/SLO/SLA. Creates a framework for aligning reliability targets between mobile and backend development. |
| Fullstack Developer | | Shapes reliability strategy: platform SLO framework, fullstack performance baselines, reliability governance. |
| Game Server Developer | Required | Defines organizational observability strategy for global game platforms: implements platform-level monitoring solutions for game server fleets, builds a reliability culture that integrates player experience SLOs with infrastructure metrics, and establishes an enterprise SLO framework for real-time gaming services across titles. |
| Infrastructure Engineer | Required | Shapes SLO-driven infrastructure strategy for the company: framework for defining business-critical SLIs, SLA strategy for external and internal customers, SLO integration with FinOps. Defines the SLO approach for emerging technologies (AI/ML inference, edge), C-level reporting standards, and SLO correlation with business metrics. |
| iOS Developer | | Shapes the corporate reliability management strategy for the mobile ecosystem through SLI/SLO/SLA. Defines the architecture for automated error budget management and for balancing reliability with mobile development velocity. |
| IoT Engineer | | Shapes reliability engineering strategy for the IoT ecosystem: cascading SLOs for complex processing chains, reliability standards for industrial IoT, integration with compliance. |
| ML Engineer | | Shapes reliability strategy for the organizational ML platform, linking ML SLOs with business metrics. Defines reliability management approaches for compound AI systems with cascading SLOs between components. |
| MLOps Engineer | | Shapes the reliability strategy for the AI platform: cascading SLOs for ML pipelines, reliability standards for mission-critical ML systems, integration of ML-specific metrics into SRE practices. |
| Network Engineer | | Shapes SLI/SLO/SLA strategy for network engineering at the organizational level. Defines best practices and influences technology choices beyond their own team. Is a recognized expert in SLI/SLO/SLA practice. |
| Platform Engineer | Required | Shapes reliability culture through SLOs: SLO-driven architecture decisions, automated reliability scoring. Defines SLO strategy for distributed systems: end-to-end SLOs, dependency-aware budgets. Advises C-level on SLA strategy and customer reliability expectations. |
| QA Automation Engineer | | Shapes quality-driven SLO strategy for the organization. Creates an automated SLO verification platform for all services and integrates it with the release management process. |
| Release Engineer | Required | Shapes SLI/SLO/SLA strategy for release engineering at the organizational level. Defines best practices and influences technology choices beyond their own team. Is a recognized expert in SLI/SLO/SLA practice. |
| Security Analyst | Required | Defines organizational observability strategy for security operations: implements platform solutions for unified security monitoring, builds a security reliability culture that integrates detection SLOs with operational metrics, and establishes an enterprise SLO framework for security service availability and response effectiveness. |
| Site Reliability Engineer (SRE) | Required | Designs the SLO platform: organizational SLO framework, automated SLO management, SLO-driven architecture decisions. Defines reliability culture and error budget policy. |
| Solutions Architect | Required | Defines organizational observability strategy spanning all technology platforms: implements enterprise-grade monitoring solutions for hundreds of services, builds a reliability engineering culture with organization-wide error budget governance, and establishes an enterprise SLO framework that drives business-aligned service reliability standards. |
| Technical Lead | Required | Defines the organization's observability strategy. Implements platform solutions. Shapes reliability culture. Defines the enterprise SLO framework. |
| Technical Product Manager | Required | Shapes enterprise reliability strategy through an SLI/SLO/SLA framework. Defines organizational reliability targets and investment priorities. Aligns customer-facing SLAs with internal SLOs. Builds reliability as a competitive advantage. |
| Telecom Developer | Required | Defines organizational observability strategy for carrier-grade telecom platforms spanning multiple network generations: implements platform solutions for unified network monitoring, builds a carrier-grade reliability culture with regulatory compliance integration, and establishes an enterprise SLO framework for five-nines telecom service availability. |
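
Several rows above mention five-nines availability, end-to-end SLOs across dependencies, and error budget policies that freeze deployments once the budget is depleted. The Python sketch below shows the arithmetic behind those terms. It is illustrative only: the function names, the request-based budget model, the independence assumption for dependencies, and the "freeze when nothing remains" rule are assumptions, not something this skill profile prescribes.

```python
from dataclasses import dataclass

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def _check_target(slo_target: float) -> None:
    """Reject targets outside the open interval (0, 1)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("SLO target must be a fraction between 0 and 1")


def allowed_downtime_minutes(slo_target: float,
                             window_minutes: float = MINUTES_PER_YEAR) -> float:
    """Downtime an availability SLO permits over the given window."""
    _check_target(slo_target)
    return (1.0 - slo_target) * window_minutes


def serial_availability(*dependency_targets: float) -> float:
    """Best-case end-to-end availability when every dependency sits on the
    critical path. Assumes failures are independent (hypothetical model)."""
    result = 1.0
    for target in dependency_targets:
        _check_target(target)
        result *= target
    return result


@dataclass
class BudgetStatus:
    remaining_fraction: float  # 1.0 = untouched, <= 0.0 = exhausted
    freeze_deployments: bool


def evaluate_budget(slo_target: float, total_requests: int,
                    failed_requests: int) -> BudgetStatus:
    """Request-based error budget with a simple, assumed policy:
    freeze deployments once the budget is fully spent."""
    _check_target(slo_target)
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    allowed_failures = (1.0 - slo_target) * total_requests
    remaining = 1.0 - failed_requests / allowed_failures
    return BudgetStatus(remaining, remaining <= 0.0)
```

Under these assumptions, a 99.999% target allows a little over five minutes of downtime per year, and three 99.9% dependencies in series can deliver at most about 99.7% end to end, which is why dependency-aware budgets matter when setting tiered targets.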