Skill profile

Chaos Engineering

Litmus, Gremlin, Chaos Monkey, fault injection, game days, steady state hypothesis


Roles: 5 (roles where this skill appears)
Levels: 5 (structured growth path)
Mandatory requirements: 17 (the other 8 are optional)
Domain: Testing & QA
Skill group: Specialized Testing
Last updated: 17/3/2026

How to use

Find your current level and compare its expectations.

What is expected at each level

The tables show how the depth of expectations grows from Junior to Principal.

Level 1: Junior

Role | Requirement | Description
DevOps Engineer | Optional | Understands chaos engineering principles: knows why failures are intentionally introduced in production, as described in the Principles of Chaos Engineering. Familiar with basic tools (Chaos Monkey, Gremlin). Understands the difference between chaos testing and ordinary fault injection.
Infrastructure Engineer | Optional | Understands infrastructure-level chaos: knows that server, disk, network, and DNS failures can be tested. Understands how redundancy (multi-AZ, replication) protects against infrastructure failures. Participates in disaster recovery testing.
Performance Testing Engineer | Required | Understands the fundamentals of Chaos Engineering. Applies basic practices in daily work. Follows team guidance and documentation.
Platform Engineer | Optional | Understands chaos engineering in a platform context: knows that the platform should provide chaos testing tools; understands how Kubernetes primitives such as PodDisruptionBudget relate to chaos resilience.
Site Reliability Engineer (SRE) | Optional | Understands chaos engineering as an SRE practice: knows how chaos relates to error budgets (experiments verify that the system stays within its SLO) and understands the game day format. Participates in experiments as an observer and helps document results.
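The Level 1 SRE entry links chaos experiments to error budgets. As a minimal sketch, assuming a request-based SLO (good requests divided by total requests), the remaining error budget can be computed and used to decide whether an experiment may start at all. The function names and the 25% threshold in `can_run_experiment` are hypothetical illustrations, not part of any tool.

```python
def error_budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Return the unspent fraction of the error budget for a request-based SLO.

    1.0 means no budget has been spent; values below 0 mean the SLO is breached.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1, e.g. 0.999")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures


def can_run_experiment(remaining_budget: float, min_remaining: float = 0.25) -> bool:
    """Hypothetical gating policy: only start a chaos experiment while enough budget is left."""
    return remaining_budget >= min_remaining


# A 99.9% SLO over 1,000,000 requests allows about 1,000 failures;
# 250 failures spend roughly a quarter of the budget.
remaining = error_budget_remaining(0.999, 1_000_000, 250)
print(f"remaining budget: {remaining:.2f}, run experiment: {can_run_experiment(remaining)}")
```

Gating on remaining budget keeps experiments from pushing a service that is already close to its SLO over the edge.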
Level 2

Role | Requirement | Description
DevOps Engineer | Required | Conducts chaos experiments: uses LitmusChaos or Chaos Mesh for Kubernetes and runs game days with the team. Implements basic experiments: pod kill, network delay, resource stress. Documents hypotheses, execution, and conclusions.
Infrastructure Engineer | Required | Conducts infrastructure chaos experiments: tests database failover (RDS failover, Redis Sentinel), network partitions between AZs, and disk failure scenarios. Uses AWS Fault Injection Simulator or Terraform-based fault injection for cloud infrastructure.
Performance Testing Engineer | Required | Independently develops Chaos Engineering tests. Applies test design techniques. Integrates tests into CI/CD. Covers edge cases.
Platform Engineer | Required | Integrates chaos engineering into the platform: installs and configures Chaos Mesh or Litmus as a platform service and creates experiment templates for developer self-service. Ensures isolation: chaos experiments cannot affect resources outside their target namespace.
Site Reliability Engineer (SRE) | Optional | Conducts chaos experiments for SLO validation: creates hypothesis-driven experiments with clear steady-state metrics and uses Chaos Mesh or Litmus to inject Kubernetes failures. Analyzes the impact on SLIs and determines remediation actions based on the findings.
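Several Level 2 entries call for hypothesis-driven experiments with clear steady-state metrics and documented hypotheses, execution, and conclusions. The sketch below shows that loop in plain Python: verify steady state, inject the fault, observe, always revert, and record the outcome. `ChaosExperiment` and its callbacks are hypothetical; in practice the inject and revert steps would be delegated to a tool such as LitmusChaos or Chaos Mesh, whose APIs are not modeled here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Metrics = Dict[str, float]


@dataclass
class ChaosExperiment:
    """Hypothesis-driven experiment record: what we expect, what we did, what we observed."""
    name: str
    hypothesis: str
    fault: str  # label only, e.g. "pod-kill", "network-delay", "cpu-stress"
    steady_state: Callable[[Metrics], bool]  # True while the system behaves normally
    log: List[str] = field(default_factory=list)

    def run(self, observe: Callable[[], Metrics],
            inject: Callable[[], None], revert: Callable[[], None]) -> str:
        before = observe()
        if not self.steady_state(before):
            # Injecting into an already unhealthy system tells us nothing.
            self.log.append(f"{self.name}: not run, steady state not met {before}")
            return "not-run"
        inject()
        try:
            during = observe()
        finally:
            revert()  # always remove the fault, even if observation raises
        outcome = "held" if self.steady_state(during) else "refuted"
        self.log.append(f"{self.name}: hypothesis {outcome}; observed {during}")
        return outcome


# Usage against a simulated system: the "fault" only bumps the error rate.
state = {"error_rate": 0.001}


def inject_pod_kill() -> None:
    state["error_rate"] = 0.004  # simulated effect of losing one replica


def revert_pod_kill() -> None:
    state["error_rate"] = 0.001


experiment = ChaosExperiment(
    name="checkout-pod-kill",
    hypothesis="Killing one checkout pod keeps the error rate below 1%",
    fault="pod-kill",
    steady_state=lambda m: m["error_rate"] < 0.01,
)
result = experiment.run(lambda: dict(state), inject_pod_kill, revert_pod_kill)
print(result, experiment.log)
```

Returning "not-run" separately from "refuted" keeps the written conclusion honest: a skipped experiment is not evidence about the hypothesis.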
Level 3

Role | Requirement | Description
DevOps Engineer | Required | Designs a chaos engineering program: defines steady-state metrics, designs experiments of increasing scope (single pod → availability zone → region), and configures automated chaos runs in CI/CD. Integrates results with SLO/SLI monitoring to identify weaknesses.
Infrastructure Engineer | Required | Designs infrastructure resilience testing: creates automated DR drills, tests backup/restore procedures under load, and implements region failover experiments. Configures infrastructure monitoring to detect chaos impact and trigger automatic rollback.
Performance Testing Engineer | Required | Designs a test strategy that incorporates Chaos Engineering. Implements automated testing at all levels. Optimizes the test pyramid. Mentors the team.
Platform Engineer | Required | Designs a chaos-as-a-service platform: creates an API for launching experiments programmatically, integrates with CI/CD for automated chaos testing, and implements RBAC to control who can run which experiments. Designs safety mechanisms: abort conditions and blast radius limits.
Site Reliability Engineer (SRE) | Required | Designs a chaos program linked to SRE practices: integrates chaos experiments into post-mortem follow-ups and creates continuous verification for critical paths. Implements advanced experiments: clock skew, DNS failures, TLS certificate expiry, and cascading failure scenarios.
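The Level 3 entries mention blast radius limits, abort conditions, and experiments of increasing scope. A minimal sketch, assuming targets are plain names and the guard metric is an error rate read through a callback: `pick_targets` caps how much of a pool is affected, and `run_with_abort` checks the guard before each escalation step. Both functions, and modeling escalation as numbered steps, are illustrative assumptions rather than the behavior of any specific chaos tool.

```python
import math
import random
from typing import Callable, List, Sequence


def pick_targets(candidates: Sequence[str], max_fraction: float, seed: int = 0) -> List[str]:
    """Blast radius limit: randomly choose at most max_fraction of the candidate pool."""
    if not 0.0 < max_fraction <= 1.0:
        raise ValueError("max_fraction must be in (0, 1]")
    limit = math.floor(len(candidates) * max_fraction)
    rng = random.Random(seed)  # seeded so a run can be reproduced during review
    return rng.sample(list(candidates), limit)


def run_with_abort(steps: int, read_error_rate: Callable[[], float],
                   abort_threshold: float, advance: Callable[[int], None]) -> int:
    """Abort condition: check the guard metric before every step and stop once it is breached.

    Returns how many steps completed (equal to `steps` if never aborted).
    The caller is responsible for reverting the fault after this returns.
    """
    for step in range(steps):
        if read_error_rate() > abort_threshold:
            return step
        advance(step)  # e.g. widen the fault: one pod, then more pods, then a whole AZ
    return steps


pods = [f"checkout-{i}" for i in range(10)]
targets = pick_targets(pods, max_fraction=0.2)
print("targets:", targets)
```

Checking the guard before each widening step means the experiment stops at the smallest scope that already causes harm.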
Level 4

Role | Requirement | Description
DevOps Engineer | Required | Builds a chaos engineering culture: trains teams on experiment design and creates a safety net for production chaos (abort conditions, blast radius control). Designs a chaos matrix covering all failure types: infrastructure, network, application, and database.
Infrastructure Engineer | Required | Defines the infrastructure resilience strategy: designs a multi-region failover architecture validated through chaos and creates an infrastructure chaos suite for continuous verification. Standardizes DR procedures and ensures RTO/RPO compliance through regular testing.
Performance Testing Engineer | Required | Defines standards that combine chaos and performance testing: performance degradation testing during failures and resilience testing under load. Runs game days focused on performance failures.
Platform Engineer | Required | Standardizes chaos engineering at the platform level: designs infrastructure for automated resilience scoring and creates a chaos experiment marketplace for reuse. Defines platform-level chaos: testing the platform components themselves (control plane, etcd, ingress).
Site Reliability Engineer (SRE) | Required | Defines the chaos engineering strategy for the SRE organization: creates a chaos maturity assessment and designs automated resilience scoring per service. Makes chaos experiments a prerequisite for the production readiness review and defines escalation procedures.
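The Level 4 Infrastructure Engineer entry ties regular testing to RTO/RPO compliance. Assuming a DR drill records when the fault was injected, when service was restored, and the newest write that survived, compliance reduces to two comparisons: recovery time against the RTO and the data loss window against the RPO. `DrillResult`, `check_compliance`, and the timestamps are hypothetical and purely illustrative.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DrillResult:
    failure_at: datetime              # when the fault was injected (e.g. primary region cut off)
    restored_at: datetime             # when the service passed its health checks again
    last_recoverable_write: datetime  # newest write still present after recovery

    @property
    def recovery_time(self) -> timedelta:
        return self.restored_at - self.failure_at

    @property
    def data_loss_window(self) -> timedelta:
        # Clamp at zero: a surviving write newer than the failure means no data was lost.
        return max(timedelta(0), self.failure_at - self.last_recoverable_write)


def check_compliance(result: DrillResult, rto: timedelta, rpo: timedelta) -> dict:
    """Compare one drill against targets: RTO bounds downtime, RPO bounds data loss."""
    return {
        "rto_met": result.recovery_time <= rto,
        "rpo_met": result.data_loss_window <= rpo,
    }


# Illustrative drill: 12 minutes to recover, last surviving write 2 minutes before the failure.
drill = DrillResult(
    failure_at=datetime(2024, 1, 1, 10, 0),
    restored_at=datetime(2024, 1, 1, 10, 12),
    last_recoverable_write=datetime(2024, 1, 1, 9, 58),
)
print(check_compliance(drill, rto=timedelta(minutes=15), rpo=timedelta(minutes=1)))
```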
Level 5: Principal

Role | Requirement | Description
DevOps Engineer | Optional | Shapes the enterprise chaos engineering strategy: designs a chaos-as-a-service platform for team self-service and defines a continuous verification pipeline. Influences resilience culture by securing executive buy-in and demonstrating ROI (incidents prevented vs. the cost of the chaos program).
Infrastructure Engineer | Optional | Shapes enterprise infrastructure resilience: designs chaos testing for multi-cloud and hybrid infrastructure and defines compliance requirements for business continuity. Influences industry standards for infrastructure resilience testing in regulated industries.
Performance Testing Engineer | Required | Designs performance resilience testing: chaos engineering integrated with load testing, automated degradation detection, and a resilience SLO framework.
Platform Engineer | Optional | Shapes the enterprise chaos platform: designs multi-cluster chaos coordination and defines chaos governance (who, what, when, blast radius). Influences platform architecture through chaos-driven design decisions, ensuring the platform itself is resilient to chaos.
Site Reliability Engineer (SRE) | Required | Shapes the enterprise resilience strategy through chaos: designs an organization-wide chaos framework and defines compliance requirements for chaos testing (financial services, healthcare). Influences industry practices through publications and talks on the ROI of chaos engineering.
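At Level 5 the Performance Testing Engineer combines chaos with load testing and automated degradation detection. One simple way to sketch this, assuming latency samples are collected under the same load once without and once with a fault active, is to compare tail percentiles against a relative threshold and an optional absolute limit. The nearest-rank percentile, the 1.5× ratio, and the function names are assumptions for illustration, not a standard definition of a resilience SLO.

```python
import math
from typing import Optional, Sequence


def percentile(samples: Sequence[float], q: float) -> float:
    """Nearest-rank percentile for q in (0, 100]."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < q <= 100:
        raise ValueError("q must be in (0, 100]")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]


def degraded(baseline_ms: Sequence[float], during_fault_ms: Sequence[float],
             q: float = 99.0, max_ratio: float = 1.5,
             abs_limit_ms: Optional[float] = None) -> bool:
    """Flag a resilience regression when tail latency under load plus fault exceeds
    the baseline tail by more than max_ratio, or breaches an absolute limit."""
    base = percentile(baseline_ms, q)
    faulted = percentile(during_fault_ms, q)
    if abs_limit_ms is not None and faulted > abs_limit_ms:
        return True
    return faulted > base * max_ratio


baseline = [float(x) for x in range(1, 101)]  # synthetic latencies: 1..100 ms
print(degraded(baseline, [x * 2 for x in baseline]))    # tail latency doubled
print(degraded(baseline, [x * 1.2 for x in baseline]))  # tail latency +20%
```

Comparing against a baseline captured under the same load isolates the effect of the fault from the effect of the load itself.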
