Skill Profile

Chaos Engineering

Litmus, Gremlin, Chaos Monkey, fault injection, game days, steady state hypothesis

Roles

5

where this skill appears

Levels

5

structured development path

Mandatory requirements

17

the other 8 are optional

Domain

Testing & QA

Group

Specialized Testing

Last updated

17 March 2026

Usage

Select your current level and compare the expectations.

What is expected at each level

The tables show how the required depth grows from Junior to Principal.

Role | Required | Description
DevOps Engineer | Optional | Understands chaos engineering principles: knows why failures are intentionally introduced in production and is familiar with the Principles of Chaos Engineering. Familiar with basic tools (Chaos Monkey, Gremlin). Understands the difference between chaos testing and regular fault injection.
Infrastructure Engineer | Optional | Understands infrastructure-level chaos: knows that server, disk, network, and DNS failures can be tested. Understands how redundancy (multi-AZ, replication) protects against infrastructure failures. Participates in disaster recovery testing.
Performance Testing Engineer | Required | Understands the fundamentals of chaos engineering. Applies basic practices in daily work. Follows recommendations from the team and documentation.
Platform Engineer | Optional | Understands chaos engineering in a platform context: knows that the platform should provide chaos testing tools and understands how Kubernetes primitives (such as PodDisruptionBudget) relate to chaos resilience.
Site Reliability Engineer (SRE) | Optional | Understands chaos engineering as an SRE practice: knows the connection with error budgets (chaos experiments verify that the system stays within its SLO) and understands the game day format. Participates in experiments as an observer and helps document results.
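
The link between error budgets and chaos experiments mentioned in the SRE row can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names, the 30-day window, and the idea of gating an experiment on the remaining budget are assumptions, not part of the profile.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability for an availability SLO over a window."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction between 0 and 1")
    return (1.0 - slo_target) * window_days * 24 * 60


def can_run_experiment(
    slo_target: float,
    downtime_so_far_min: float,
    expected_impact_min: float,
    window_days: int = 30,
) -> bool:
    """Hypothetical gate: run only if the worst-case impact fits the remaining budget."""
    remaining = error_budget_minutes(slo_target, window_days) - downtime_so_far_min
    return expected_impact_min <= remaining


if __name__ == "__main__":
    # A 99.9% SLO over 30 days leaves roughly 43 minutes of budget.
    print(round(error_budget_minutes(0.999), 1))
    print(can_run_experiment(0.999, downtime_so_far_min=30, expected_impact_min=5))
```

A team that has already consumed most of its budget would see the gate return `False` and postpone the experiment.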

Role | Required | Description
DevOps Engineer | Required | Conducts chaos experiments: uses Litmus Chaos or Chaos Mesh for Kubernetes and organizes game days with the team. Implements basic experiments: pod kill, network delay, resource stress. Documents hypotheses, execution, and conclusions.
Infrastructure Engineer | Required | Conducts infrastructure chaos experiments: tests database failover (RDS failover, Redis Sentinel), network partitions between AZs, and disk failure scenarios. Uses AWS Fault Injection Simulator or Terraform-based fault injection for cloud infrastructure.
Performance Testing Engineer | Required | Independently develops chaos engineering tests. Applies test design techniques. Integrates tests into CI/CD. Covers edge cases.
Platform Engineer | Required | Integrates chaos engineering into the platform: installs and configures Chaos Mesh/Litmus as a platform service and creates experiment templates for developer self-service. Ensures isolation: chaos experiments do not escape the target namespace.
Site Reliability Engineer (SRE) | Optional | Conducts chaos experiments for SLO validation: creates hypothesis-driven experiments with clear steady-state metrics and uses Chaos Mesh/Litmus for Kubernetes failures. Analyzes the impact on SLIs and determines remediation actions based on the findings.
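
The hypothesis-driven workflow described in this table (verify steady state, inject a fault, observe, roll back, document) can be sketched independently of any tool. The following is a minimal illustration, not the API of Litmus, Chaos Mesh, or Gremlin: `ChaosExperiment`, `run_experiment`, and the probe/inject/rollback callables are hypothetical names that a team would wire to its own metrics and tooling.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ChaosExperiment:
    name: str
    probe: Callable[[], float]     # returns the steady-state metric, e.g. success rate
    threshold: float               # hypothesis: metric stays >= threshold under fault
    inject: Callable[[], None]     # starts the fault (assumed to be supplied by the team)
    rollback: Callable[[], None]   # removes the fault


def run_experiment(exp: ChaosExperiment) -> dict:
    """Run one experiment and return a record suitable for documentation."""
    baseline = exp.probe()
    if baseline < exp.threshold:
        # The system is not in steady state, so injecting a fault would prove nothing.
        return {"name": exp.name, "status": "skipped",
                "baseline": baseline, "during": None}
    try:
        exp.inject()
        during = exp.probe()
    finally:
        exp.rollback()  # always remove the fault, even if the probe raises
    status = "hypothesis held" if during >= exp.threshold else "weakness found"
    return {"name": exp.name, "status": status,
            "baseline": baseline, "during": during}
```

Returning a plain record keeps the hypothesis, the baseline, and the observed value together, which matches the expectation that hypotheses, execution, and conclusions are documented.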

Role | Required | Description
DevOps Engineer | Required | Designs a chaos engineering program: defines steady-state metrics, designs experiments of increasing complexity (single pod → availability zone → region), and configures automated chaos runs in CI/CD. Integrates results with SLO/SLI monitoring to identify weaknesses.
Infrastructure Engineer | Required | Designs infrastructure resilience testing: creates automated DR drills, tests backup/restore procedures under load, and implements region failover experiments. Configures infrastructure monitoring for chaos impact detection and automatic rollback.
Performance Testing Engineer | Required | Designs a test strategy that includes chaos engineering. Implements automated testing at all levels. Optimizes the test pyramid. Mentors the team.
Platform Engineer | Required | Designs a chaos-as-a-service platform: creates an API for launching experiments programmatically, integrates with CI/CD for automated chaos testing, and implements RBAC to control who can run which experiments. Designs safety mechanisms: abort conditions, blast radius limits.
Site Reliability Engineer (SRE) | Required | Designs a chaos program linked to SRE practices: integrates chaos experiments into post-mortem follow-ups and creates continuous verification for critical paths. Implements sophisticated experiments: clock skew, DNS failures, TLS certificate expiry, cascading failure scenarios.
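
The safety mechanisms named in the Platform Engineer row, abort conditions and blast radius limits, reduce to two small checks. This sketch assumes hypothetical function names and default thresholds (5% error rate, 1000 ms p99 latency); real limits would come from the service's own SLOs.

```python
from __future__ import annotations

import math


def select_targets(pods: list[str], max_fraction: float, max_count: int) -> list[str]:
    """Limit blast radius: at most max_fraction of the pods and never more than max_count."""
    if not 0.0 <= max_fraction <= 1.0:
        raise ValueError("max_fraction must be between 0 and 1")
    limit = min(max_count, math.floor(len(pods) * max_fraction))
    # Sorting makes the selection deterministic; a real tool might sample randomly.
    return sorted(pods)[:limit]


def should_abort(
    error_rate: float,
    p99_latency_ms: float,
    max_error_rate: float = 0.05,
    max_p99_ms: float = 1000.0,
) -> bool:
    """Abort condition: stop the experiment as soon as either guardrail is breached."""
    return error_rate > max_error_rate or p99_latency_ms > max_p99_ms
```

Rounding down in `select_targets` is a deliberate choice in this sketch: a deployment with three pods and a 25% limit yields no targets, so small deployments are never fully disrupted by accident.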

Role | Required | Description
DevOps Engineer | Required | Establishes a chaos engineering culture: trains teams on experiment design and creates a safety net for production chaos (abort conditions, blast radius control). Designs a chaos matrix covering all failure types: infrastructure, network, application, database.
Infrastructure Engineer | Required | Defines the infrastructure resilience strategy: designs a multi-region failover architecture validated through chaos and creates an infrastructure chaos suite for continuous verification. Standardizes DR procedures and ensures RTO/RPO compliance through regular testing.
Performance Testing Engineer | Required | Defines chaos and performance standards: performance degradation testing during failures, resilience testing under load. Runs game days for performance failures.
Platform Engineer | Required | Standardizes chaos engineering at the platform level: designs automated resilience scoring infrastructure and creates a chaos experiment marketplace for reuse. Defines platform-level chaos: testing the platform components themselves (control plane, etcd, ingress).
Site Reliability Engineer (SRE) | Required | Defines the chaos engineering strategy for the SRE organization: creates a chaos maturity assessment and designs automated resilience scoring per service. Makes chaos experiments a prerequisite for the production readiness review and defines escalation procedures.
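
The profile does not say how "automated resilience scoring per service" is computed. One possible approach, shown here purely as an assumption, is a weighted pass rate over a service's chaos experiments, with heavier weights for larger failure scopes. The names `ExperimentResult` and `resilience_score` and the example weights are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ExperimentResult:
    name: str
    weight: float   # assumed importance of the failure scenario
    passed: bool    # True if the steady-state hypothesis held


def resilience_score(results: Iterable[ExperimentResult]) -> float:
    """Weighted percentage (0 to 100) of experiments whose hypothesis held."""
    results = list(results)
    total = sum(r.weight for r in results)
    if total <= 0:
        raise ValueError("at least one experiment with positive weight is required")
    passed = sum(r.weight for r in results if r.passed)
    return 100.0 * passed / total


if __name__ == "__main__":
    checkout = [
        ExperimentResult("pod-kill", weight=1, passed=True),
        ExperimentResult("az-outage", weight=3, passed=False),
        ExperimentResult("dns-failure", weight=2, passed=True),
    ]
    print(resilience_score(checkout))  # 3 of 6 weight units passed
```

A score like this could feed a production readiness review, for example by requiring a minimum value before launch; the threshold itself would be a policy decision.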

Role | Required | Description
DevOps Engineer | Optional | Shapes the enterprise chaos engineering strategy: designs a chaos-as-a-service platform for team self-service and defines a continuous verification pipeline. Influences resilience culture through executive buy-in and ROI demonstration (prevented incidents vs. cost of the chaos program).
Infrastructure Engineer | Optional | Shapes enterprise infrastructure resilience: designs chaos testing for multi-cloud and hybrid infrastructure and defines compliance requirements for business continuity. Influences industry standards for infrastructure resilience testing in regulated industries.
Performance Testing Engineer | Required | Designs performance resilience testing: chaos engineering integrated with load testing, automated degradation detection, and a resilience SLO framework.
Platform Engineer | Optional | Shapes the enterprise chaos platform: designs multi-cluster chaos coordination and defines chaos governance (who, what, when, blast radius). Influences platform architecture through chaos-driven design decisions, ensuring the platform itself is chaos-resilient.
Site Reliability Engineer (SRE) | Required | Shapes the enterprise resilience strategy through chaos: designs an organization-wide chaos framework and defines compliance requirements for chaos testing (financial services, healthcare). Influences industry practices through publications and talks about chaos engineering ROI.
