Skill Profile

Chaos Engineering

Litmus, Gremlin, Chaos Monkey, fault injection, game days, steady-state hypothesis


Roles

where this skill occurs

Levels

structured development path

Mandatory requirements

the other 8 are optional

Domain

Testing & QA

Group

Specialized Testing

Last updated

17 March 2026

Usage

Select your current level and compare the expectations.

What is expected at each level

The tables show how the expected depth grows across the five levels: Junior, Middle, Senior, Lead / Staff, and Principal.

Junior

Role	Requirement	Description
DevOps Engineer	Optional	Understands chaos engineering principles: knows why failures are deliberately introduced in production and is familiar with the Principles of Chaos Engineering. Familiar with basic tools (Chaos Monkey, Gremlin). Understands the difference between chaos testing and regular fault injection.
Infrastructure Engineer	Optional	Understands infrastructure-level chaos: knows that server, disk, network, and DNS failures can be tested. Understands how redundancy (multi-AZ, replication) protects against infrastructure failures. Participates in disaster recovery testing.
Performance Testing Engineer	Mandatory	Understands the fundamentals of chaos engineering. Applies basic practices in daily work. Follows recommendations from the team and documentation.
Platform Engineer	Optional	Understands chaos engineering in a platform context: knows that the platform should provide chaos testing tools and understands how Kubernetes primitives (PodDisruptionBudget) relate to chaos resilience.
Site Reliability Engineer (SRE)	Optional	Understands chaos engineering as an SRE practice: knows the connection with error budgets (chaos experiments verify that the system stays within its SLO) and understands the game day format. Participates in experiments as an observer and helps document results.
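
The link between chaos experiments and error budgets mentioned in the SRE row can be illustrated with a small calculation. This is a minimal sketch, assuming a request-based availability SLO; the function names and the 25% reserve are hypothetical choices for illustration, not part of any tool named in this profile.

```python
def error_budget_remaining(slo_target: float, total_requests: int,
                           failed_requests: int) -> float:
    """Return the unspent fraction of the error budget (negative if overspent)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    # Failures the SLO tolerates over this window.
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures


def can_run_experiment(remaining: float, reserve: float = 0.25) -> bool:
    """Allow an experiment only while more than `reserve` of the budget is left.

    The reserve value is an assumption; teams would choose their own policy.
    """
    return remaining > reserve


if __name__ == "__main__":
    remaining = error_budget_remaining(0.999, 1_000_000, 250)
    print(f"budget remaining: {remaining:.2%}")
    print("experiment allowed:", can_run_experiment(remaining))
```

With a 99.9% SLO over one million requests, 1,000 failures are tolerated, so 250 observed failures leave roughly three quarters of the budget for experiments and real incidents.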

Middle

Role	Requirement	Description
DevOps Engineer	Mandatory	Conducts chaos experiments: uses LitmusChaos or Chaos Mesh for Kubernetes and organizes game days with the team. Implements basic experiments: pod kill, network delay, resource stress. Documents hypotheses, execution, and conclusions.
Infrastructure Engineer	Mandatory	Conducts infrastructure chaos experiments: tests database failover (RDS failover, Redis Sentinel), network partitions between AZs, and disk failure scenarios. Uses AWS Fault Injection Service (FIS, formerly Fault Injection Simulator) or Terraform-based fault injection for cloud infrastructure.
Performance Testing Engineer	Mandatory	Independently develops chaos engineering tests. Applies test design techniques. Integrates tests into CI/CD. Covers edge cases.
Platform Engineer	Mandatory	Integrates chaos engineering into the platform: installs and configures Chaos Mesh or Litmus as a platform service and creates experiment templates for developer self-service. Ensures isolation: chaos experiments do not escape the target namespace.
Site Reliability Engineer (SRE)	Optional	Conducts chaos experiments for SLO validation: creates hypothesis-driven experiments with clear steady-state metrics and uses Chaos Mesh or Litmus for Kubernetes failures. Analyzes the impact on SLIs and determines remediation actions based on the findings.
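
A hypothesis-driven experiment, as described in the rows above, checks the steady state, injects a fault, checks the steady state again, and always rolls back. The following is a minimal sketch of that loop; `probe`, `inject`, and `rollback` are hypothetical callables supplied by the caller and do not correspond to the API of Chaos Mesh, Litmus, or any other tool.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ExperimentResult:
    steady_before: bool
    steady_after: bool
    fault_injected: bool

    @property
    def hypothesis_held(self) -> bool:
        # The hypothesis holds only if the metric stayed within bounds
        # both before and during the injected fault.
        return self.steady_before and self.fault_injected and self.steady_after


def run_experiment(probe: Callable[[], float],
                   inject: Callable[[], None],
                   rollback: Callable[[], None],
                   threshold: float) -> ExperimentResult:
    """Run one experiment; `probe` returns a metric such as the error rate."""
    steady_before = probe() <= threshold
    if not steady_before:
        # Never inject faults into a system that is already unhealthy.
        return ExperimentResult(False, False, False)
    inject()
    try:
        steady_after = probe() <= threshold
    finally:
        rollback()  # always undo the fault, even if the probe raises
    return ExperimentResult(True, steady_after, True)
```

Documenting the returned result together with the hypothesis and the threshold gives the written record of hypothesis, execution, and conclusion that the DevOps row asks for.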

Senior

Role	Requirement	Description
DevOps Engineer	Mandatory	Designs a chaos engineering program: defines steady-state metrics, designs experiments of increasing scope (single pod → availability zone → region), and configures automated chaos runs in CI/CD. Integrates results with SLO/SLI monitoring to identify weaknesses.
Infrastructure Engineer	Mandatory	Designs infrastructure resilience testing: creates automated DR drills, tests backup and restore procedures under load, and implements region failover experiments. Configures infrastructure monitoring to detect chaos impact and trigger automatic rollback.
Performance Testing Engineer	Mandatory	Designs a test strategy that incorporates chaos engineering. Implements automated testing at all levels. Optimizes the test pyramid. Mentors the team.
Platform Engineer	Mandatory	Designs a chaos-as-a-service platform: creates an API for launching experiments programmatically, integrates with CI/CD for automated chaos testing, and implements RBAC to control who can run which experiments. Designs safety mechanisms: abort conditions and blast radius limits.
Site Reliability Engineer (SRE)	Mandatory	Designs a chaos program linked with SRE practices: integrates chaos experiments into post-mortem follow-ups and creates continuous verification for critical paths. Implements sophisticated experiments: clock skew, DNS failures, TLS certificate expiry, cascading failure scenarios.
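
The safety mechanisms named in the Platform Engineer row, blast radius limits and abort conditions, can be sketched as two small functions. This is an illustrative sketch only: the function names, the pod naming, and the default thresholds are assumptions, and a real platform would read live metrics rather than take them as arguments.

```python
import math
import random
from typing import List, Sequence


def select_targets(pods: Sequence[str], max_fraction: float,
                   rng: random.Random) -> List[str]:
    """Pick a random subset of pods no larger than max_fraction of the total."""
    if not 0.0 < max_fraction <= 1.0:
        raise ValueError("max_fraction must be in (0, 1]")
    # Round down so the blast radius never exceeds the configured limit.
    limit = math.floor(len(pods) * max_fraction)
    return rng.sample(list(pods), limit)


def should_abort(error_rate: float, p99_latency_ms: float, *,
                 max_error_rate: float = 0.05,
                 max_p99_latency_ms: float = 1500.0) -> bool:
    """Abort as soon as any guardrail metric leaves its allowed range."""
    return error_rate > max_error_rate or p99_latency_ms > max_p99_latency_ms


if __name__ == "__main__":
    pods = [f"pod-{i}" for i in range(10)]
    print(select_targets(pods, 0.3, random.Random()))
    print(should_abort(error_rate=0.08, p99_latency_ms=400.0))
```

Rounding the target count down means a very small deployment may yield no targets at all, which errs on the side of safety.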

Lead / Staff

Role	Requirement	Description
DevOps Engineer	Mandatory	Establishes a chaos engineering culture: trains teams on experiment design and creates a safety net for production chaos (abort conditions, blast radius control). Designs a chaos matrix covering all failure types: infrastructure, network, application, database.
Infrastructure Engineer	Mandatory	Defines the infrastructure resilience strategy: designs a multi-region failover architecture validated through chaos and creates an infrastructure chaos suite for continuous verification. Standardizes DR procedures and ensures RTO/RPO compliance through regular testing.
Performance Testing Engineer	Mandatory	Defines combined chaos and performance standards: performance degradation testing during failures and resilience testing under load. Runs game days for performance failures.
Platform Engineer	Mandatory	Standardizes chaos engineering at platform level: designs automated resilience scoring infrastructure and creates a chaos experiment marketplace for reuse. Defines platform-level chaos: testing the platform components themselves (control plane, etcd, ingress).
Site Reliability Engineer (SRE)	Mandatory	Defines the chaos engineering strategy for the SRE organization: creates a chaos maturity assessment and designs automated resilience scoring per service. Makes chaos experiments a prerequisite for the production readiness review and defines escalation procedures.
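
Automated resilience scoring per service could, for example, be a weighted pass rate over recorded experiment outcomes. The sketch below shows one such scheme; the `ExperimentOutcome` type, the weighting, and the 0 to 100 scale are assumptions made for illustration, not an established scoring standard.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class ExperimentOutcome:
    service: str
    passed: bool
    weight: float = 1.0  # hypothetical importance of the tested failure mode


def resilience_scores(outcomes: Iterable[ExperimentOutcome]) -> Dict[str, float]:
    """Return a 0-100 score per service: weighted share of passed experiments."""
    total: Dict[str, float] = {}
    passed: Dict[str, float] = {}
    for outcome in outcomes:
        total[outcome.service] = total.get(outcome.service, 0.0) + outcome.weight
        if outcome.passed:
            passed[outcome.service] = passed.get(outcome.service, 0.0) + outcome.weight
    return {
        service: round(100.0 * passed.get(service, 0.0) / weight_sum, 1)
        for service, weight_sum in total.items()
        if weight_sum > 0
    }
```

A production readiness review could then require a minimum score, with the threshold chosen by the organization.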

Principal

Role	Requirement	Description
DevOps Engineer	Optional	Shapes the enterprise chaos engineering strategy: designs a chaos-as-a-service platform for team self-service and defines the continuous verification pipeline. Influences resilience culture through executive buy-in and ROI demonstration (prevented incidents vs. cost of the chaos program).
Infrastructure Engineer	Optional	Shapes enterprise infrastructure resilience: designs chaos testing for multi-cloud and hybrid infrastructure and defines compliance requirements for business continuity. Influences industry standards for infrastructure resilience testing in regulated industries.
Performance Testing Engineer	Mandatory	Designs performance resilience testing: chaos engineering integrated with load testing, automated degradation detection, and a resilience SLO framework.
Platform Engineer	Optional	Shapes the enterprise chaos platform: designs multi-cluster chaos coordination and defines chaos governance (who, what, when, blast radius). Influences platform architecture through chaos-driven design decisions, ensuring that the platform itself is chaos-resilient.
Site Reliability Engineer (SRE)	Mandatory	Shapes the enterprise resilience strategy through chaos: designs an organization-wide chaos framework and defines compliance requirements for chaos testing (financial services, healthcare). Influences industry practices through publications and talks about chaos engineering ROI.
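
The ROI demonstration mentioned in the DevOps row compares the cost of incidents the program prevented with the cost of running the program. A minimal worked sketch follows; the function name and all figures are hypothetical, and estimating how many incidents were actually prevented is the hard part that this calculation takes as given.

```python
def chaos_program_roi(prevented_incidents: int, avg_incident_cost: float,
                      program_cost: float) -> float:
    """Return ROI as a ratio: (avoided incident cost - program cost) / program cost."""
    if program_cost <= 0:
        raise ValueError("program_cost must be positive")
    if prevented_incidents < 0 or avg_incident_cost < 0:
        raise ValueError("incident figures must be non-negative")
    avoided_cost = prevented_incidents * avg_incident_cost
    return (avoided_cost - program_cost) / program_cost


if __name__ == "__main__":
    # Hypothetical figures: 6 prevented incidents at 50,000 each,
    # against a yearly program cost of 120,000.
    roi = chaos_program_roi(6, 50_000.0, 120_000.0)
    print(f"ROI: {roi:.0%}")
```

A ratio above zero means the avoided incident cost exceeded the program cost for the period considered.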
