Skill profile

Chaos Engineering

Litmus, Gremlin, Chaos Monkey, fault injection, game days, steady state hypothesis


Roles: where this skill appears

Levels: structured growth path

Required requirements: the other 8 are optional

Domain: Testing & QA

Group: Specialized Testing

Last updated: 17/3/2026

How to use

Select your current level and compare the expectations.

What is expected at each level

The tables below, one per level, show how the expected depth grows from Junior to Principal.

Junior

Role	Required	Description
DevOps Engineer	Optional	Understands chaos engineering principles: knows why intentional failures are introduced in production and is familiar with the Principles of Chaos Engineering. Familiar with basic tools (Chaos Monkey, Gremlin). Understands the difference between chaos testing and regular fault injection.
Infrastructure Engineer	Optional	Understands infrastructure-level chaos: knows that server, disk, network, and DNS failures can be tested. Understands how redundancy (multi-AZ, replication) protects against infrastructure failures. Participates in disaster recovery testing.
Performance Testing Engineer	Required	Understands the fundamentals of chaos engineering. Applies basic practices in daily work. Follows recommendations from the team and documentation.
Platform Engineer	Optional	Understands chaos engineering in a platform context: knows the platform should provide chaos testing tools and understands how Kubernetes primitives (such as PodDisruptionBudget) relate to chaos resilience.
Site Reliability Engineer (SRE)	Optional	Understands chaos engineering as an SRE practice: knows the connection with error budgets (chaos experiments verify that the system stays within its SLO) and understands the game day format. Participates in experiments as an observer and helps document results.
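
The error-budget connection named in the SRE row can be made concrete with a little arithmetic. The following is a minimal sketch, not a formula taken from any particular tool: the function names, the 99.9% SLO, the request counts, and the 50% go/no-go threshold are all illustrative assumptions.

```python
def error_budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Return the unspent fraction of the error budget (negative if overspent)."""
    # The budget is the number of failures the SLO tolerates over the window.
    allowed_failures = (1.0 - slo_target) * total_requests
    if allowed_failures == 0:
        return 0.0  # a 100% SLO (or an empty window) leaves nothing to spend
    return 1.0 - failed_requests / allowed_failures


def can_run_experiment(remaining: float, min_remaining: float = 0.5) -> bool:
    """Hypothetical team policy: only inject faults while enough budget is left."""
    return remaining >= min_remaining


# Assumed numbers: 99.9% SLO, one million requests, 250 failures so far.
remaining = error_budget_remaining(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
print(f"budget remaining: {remaining:.0%}")
print("run experiment:", can_run_experiment(remaining))
```

A gate like this is one way a team might decide whether a game day should go ahead; the threshold itself is a policy choice rather than a technical constant.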

Middle

Role	Required	Description
DevOps Engineer	Required	Conducts chaos experiments: uses Litmus Chaos or Chaos Mesh for Kubernetes and organizes game days with the team. Implements basic experiments: pod kill, network delay, resource stress. Documents hypotheses, execution, and conclusions.
Infrastructure Engineer	Required	Conducts infrastructure chaos experiments: tests database failover (RDS failover, Redis Sentinel), network partitions between AZs, and disk failure scenarios. Uses AWS Fault Injection Simulator or Terraform-based fault injection for cloud infrastructure.
Performance Testing Engineer	Required	Independently develops chaos engineering tests. Applies test design techniques. Integrates tests into CI/CD. Covers edge cases.
Platform Engineer	Required	Integrates chaos engineering into the platform: installs and configures Chaos Mesh or Litmus as a platform service and creates experiment templates for developer self-service. Ensures isolation: chaos experiments do not escape the target namespace.
Site Reliability Engineer (SRE)	Optional	Conducts chaos experiments for SLO validation: creates hypothesis-driven experiments with clear steady-state metrics and uses Chaos Mesh or Litmus for Kubernetes failures. Analyzes the impact on SLIs and determines remediation actions based on the findings.
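
The hypothesis-driven workflow described at this level (state a hypothesis, check the steady state, inject a fault, document the conclusion) can be sketched as below. This is a toy illustration under stated assumptions: `ExperimentRecord`, `run_experiment`, and the in-memory `replicas` dictionary are hypothetical names, and no real chaos tool is called. In practice the injection step would be delegated to something like Litmus or Chaos Mesh.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ExperimentRecord:
    hypothesis: str
    observations: List[str] = field(default_factory=list)
    conclusion: str = ""


def run_experiment(
    hypothesis: str,
    steady_state: Callable[[], bool],
    inject: Callable[[], None],
    rollback: Callable[[], None],
) -> ExperimentRecord:
    record = ExperimentRecord(hypothesis)
    # 1. Verify the steady state before touching anything.
    if not steady_state():
        record.conclusion = "aborted: system not in steady state before injection"
        return record
    record.observations.append("steady state holds before injection")
    # 2. Inject the fault, observe, and always roll back.
    inject()
    try:
        held = steady_state()
    finally:
        rollback()
    record.observations.append(f"steady state during fault: {held}")
    # 3. Document the conclusion.
    record.conclusion = "hypothesis confirmed" if held else "hypothesis refuted"
    return record


# Toy system (assumption): three replicas, healthy while at least two are up.
replicas = {"pod-a": True, "pod-b": True, "pod-c": True}


def at_least_two_up() -> bool:
    return sum(replicas.values()) >= 2


def kill_pod_a() -> None:
    replicas["pod-a"] = False


def restore_pod_a() -> None:
    replicas["pod-a"] = True


record = run_experiment(
    "Killing one of three pods keeps at least two replicas serving",
    at_least_two_up,
    kill_pod_a,
    restore_pod_a,
)
print(record.conclusion)
```

Keeping the record as plain data makes it straightforward to store the hypothesis, observations, and conclusion alongside the experiment, which is the documentation habit this level asks for.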

Senior

Role	Required	Description
DevOps Engineer	Required	Designs a chaos engineering program: defines steady-state metrics, designs experiments with increasing complexity (single pod → availability zone → region), and configures automated chaos runs in CI/CD. Integrates results with SLO/SLI monitoring to identify weaknesses.
Infrastructure Engineer	Required	Designs infrastructure resilience testing: creates automated DR drills, tests backup/restore procedures under load, and implements region failover experiments. Configures infrastructure monitoring for chaos impact detection and automatic rollback.
Performance Testing Engineer	Required	Designs a test strategy that incorporates chaos engineering. Implements automated testing at all levels. Optimizes the test pyramid. Mentors the team.
Platform Engineer	Required	Designs a chaos-as-a-service platform: creates an API for launching experiments programmatically, integrates with CI/CD for automated chaos testing, and implements RBAC to control who can run which experiments. Designs safety mechanisms: abort conditions and blast radius limits.
Site Reliability Engineer (SRE)	Required	Designs a chaos program linked to SRE practices: integrates chaos experiments into post-mortem follow-ups and creates continuous verification for critical paths. Implements sophisticated experiments: clock skew, DNS failures, TLS certificate expiry, cascading failure scenarios.
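
The safety mechanisms named in the Platform Engineer row, abort conditions and blast radius limits, might look roughly like the following guard. This is a sketch under assumptions: `SafetyPolicy`, its two fields, and the 10% and 2% thresholds are hypothetical, and a real platform would read the error rate from its monitoring system rather than from a list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafetyPolicy:
    max_blast_radius: float  # largest fraction of instances an experiment may target
    max_error_rate: float    # abort once the observed error rate exceeds this value


def within_blast_radius(policy: SafetyPolicy, targeted: int, total: int) -> bool:
    """Reject experiments that would touch too large a share of the fleet."""
    if total <= 0:
        return False
    return targeted / total <= policy.max_blast_radius


def should_abort(policy: SafetyPolicy, error_rate: float) -> bool:
    """Abort condition evaluated against each metric sample during the run."""
    return error_rate > policy.max_error_rate


# Assumed policy: at most 10% of instances, abort above a 2% error rate.
policy = SafetyPolicy(max_blast_radius=0.10, max_error_rate=0.02)

print(within_blast_radius(policy, targeted=2, total=40))
print(within_blast_radius(policy, targeted=10, total=40))

# Simulated error-rate samples collected while the fault is active.
for sample in [0.001, 0.004, 0.035]:
    if should_abort(policy, sample):
        print(f"abort at error rate {sample:.1%}")
        break
```

Checking the blast radius before launch and the abort condition during the run keeps the two concerns separate, so each can be tightened independently as experiments escalate from a single pod to a zone or region.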

Lead / Staff

Role	Required	Description
DevOps Engineer	Required	Builds a chaos engineering culture: trains teams on experiment design and creates a safety net for production chaos (abort conditions, blast radius control). Designs a chaos matrix covering all failure types: infrastructure, network, application, database.
Infrastructure Engineer	Required	Defines the infrastructure resilience strategy: designs a multi-region failover architecture validated through chaos and creates an infrastructure chaos suite for continuous verification. Standardizes DR procedures and ensures RTO/RPO compliance through regular testing.
Performance Testing Engineer	Required	Defines chaos and performance standards: performance degradation testing during failures and resilience testing under load. Runs game days for performance failures.
Platform Engineer	Required	Standardizes chaos engineering at the platform level: designs automated resilience scoring infrastructure and creates a chaos experiment marketplace for reuse. Defines platform-level chaos: testing the platform components themselves (control plane, etcd, ingress).
Site Reliability Engineer (SRE)	Required	Defines the chaos engineering strategy for the SRE organization: creates a chaos maturity assessment and designs automated resilience scoring per service. Implements chaos experiments as a prerequisite for the production readiness review and defines escalation procedures.
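
One possible reading of "automated resilience scoring per service" is a weighted pass rate over the experiments each service has been through. The sketch below assumes exactly that; the weights, experiment names, and service names are invented for illustration, and an organization would define its own scoring model.

```python
from typing import Dict, List, Tuple

# (experiment name, weight, passed); the weights are hypothetical and would
# normally reflect how severe each failure type is considered to be.
Result = Tuple[str, float, bool]


def resilience_score(results: List[Result]) -> float:
    """Weighted share of passed experiments, on a 0-100 scale."""
    total = sum(weight for _, weight, _ in results)
    if total == 0:
        return 0.0  # no experiments run yet: no evidence of resilience
    passed = sum(weight for _, weight, ok in results if ok)
    return 100.0 * passed / total


services: Dict[str, List[Result]] = {
    "checkout": [("pod-kill", 1.0, True), ("network-delay", 2.0, True), ("az-outage", 3.0, False)],
    "search": [("pod-kill", 1.0, True), ("network-delay", 2.0, True)],
}

scores = {name: resilience_score(results) for name, results in services.items()}
for name, score in sorted(scores.items()):
    print(f"{name}: {score:.0f}/100")
```

A score like this could feed a production readiness review, for example by requiring a minimum value before launch, though note that it rewards services that have run fewer, easier experiments unless the required experiment set is fixed.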

Principal

Role	Required	Description
DevOps Engineer	Optional	Shapes the enterprise chaos engineering strategy: designs a chaos-as-a-service platform for team self-service and defines a continuous verification pipeline. Influences resilience culture through executive buy-in and ROI demonstration (prevented incidents vs. cost of the chaos program).
Infrastructure Engineer	Optional	Shapes enterprise infrastructure resilience: designs chaos testing for multi-cloud and hybrid infrastructure and defines compliance requirements for business continuity. Influences industry standards for infrastructure resilience testing in regulated industries.
Performance Testing Engineer	Required	Designs performance resilience testing: chaos engineering integrated with load testing, automated degradation detection, and a resilience SLO framework.
Platform Engineer	Optional	Shapes the enterprise chaos platform: designs multi-cluster chaos coordination and defines chaos governance (who, what, when, blast radius). Influences platform architecture through chaos-driven design decisions, ensuring the platform itself is chaos-resilient.
Site Reliability Engineer (SRE)	Required	Shapes the enterprise resilience strategy through chaos: designs an organization-wide chaos framework and defines compliance requirements for chaos testing (financial services, healthcare). Influences industry practices through publications and talks about chaos engineering ROI.
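
The ROI demonstration mentioned in the DevOps Engineer row (prevented incidents vs. cost of the chaos program) reduces to simple arithmetic once the inputs are estimated. The sketch below is illustrative only: `chaos_program_roi` is a hypothetical helper, and every figure is an assumed placeholder. Estimating how many incidents were actually prevented is the hard part and is not addressed here.

```python
def chaos_program_roi(prevented_incidents: int, avg_incident_cost: float, program_cost: float) -> float:
    """Return ROI as a ratio: (avoided cost - program cost) / program cost."""
    if program_cost <= 0:
        raise ValueError("program_cost must be positive")
    avoided_cost = prevented_incidents * avg_incident_cost
    return (avoided_cost - program_cost) / program_cost


# Assumed yearly figures, for illustration only.
roi = chaos_program_roi(
    prevented_incidents=6,
    avg_incident_cost=50_000.0,
    program_cost=120_000.0,
)
print(f"ROI: {roi:.0%}")
```

Presenting the result as a ratio alongside the raw inputs lets executives challenge the assumptions directly, which tends to be more persuasive than a single headline number.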

