Domain: Testing & QA
Skill profile: Litmus, Gremlin, Chaos Monkey, fault injection, game days, steady state hypothesis
Roles: 5 (where this skill appears)
Levels: 5 (structured growth path)
Mandatory requirements: 17 (the other 8 are optional)
Testing & QA / Specialized Testing
17/3/2026
Compare the expectations for your current level. The tables show how depth grows from Junior to Principal.

Level 1 (Junior)

| Role | Mandatory | Description |
|---|---|---|
| DevOps Engineer | | Understands chaos engineering principles: knows why failures are intentionally introduced in production (Principles of Chaos Engineering). Familiar with basic tools (Chaos Monkey, Gremlin). Understands the difference between chaos testing and ordinary fault injection. |
| Infrastructure Engineer | | Understands infrastructure-level chaos: knows that server, disk, network, and DNS failures can be tested. Understands how redundancy (multi-AZ, replication) protects against infrastructure failures. Participates in disaster recovery testing. |
| Performance Testing Engineer | Yes | Understands the fundamentals of Chaos Engineering. Applies basic practices in daily work. Follows recommendations from the team and documentation. |
| Platform Engineer | | Understands chaos engineering in the platform context: knows the platform should provide chaos testing tools and understands how Kubernetes primitives (PodDisruptionBudget) relate to chaos resilience. |
| Site Reliability Engineer (SRE) | | Understands chaos engineering as an SRE practice: knows its connection to error budgets (chaos experiments verify that the system stays within its SLO) and understands the game day format. Participates in experiments as an observer and helps document results. |
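
The rows above tie chaos work to the steady-state hypothesis and, for SREs, to error budgets. A minimal Python sketch, assuming request success rate is the SLI and that counts come from a hypothetical `WindowStats` record (not a real monitoring API), shows how both can be checked for a baseline window and for a window measured during a fault:

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Hypothetical request counts for one measurement window."""
    total: int
    failed: int

    @property
    def success_rate(self) -> float:
        if self.total <= 0:
            raise ValueError("window has no traffic")
        return 1 - self.failed / self.total


def error_budget_remaining(slo: float, stats: WindowStats) -> float:
    """Unspent share of the window's error budget (negative once it is exceeded)."""
    if not 0 < slo < 1 or stats.total <= 0:
        raise ValueError("need 0 < slo < 1 and total > 0")
    allowed_failures = (1 - slo) * stats.total
    return 1 - stats.failed / allowed_failures


def steady_state_holds(slo: float, stats: WindowStats) -> bool:
    """Steady-state hypothesis: the success-rate SLI stays at or above the SLO target."""
    return stats.success_rate >= slo


# Baseline window vs. a window measured while a fault was injected (illustrative numbers).
baseline = WindowStats(total=100_000, failed=50)
during_fault = WindowStats(total=100_000, failed=250)

for label, window in [("baseline", baseline), ("during fault", during_fault)]:
    print(label, steady_state_holds(0.999, window),
          round(error_budget_remaining(0.999, window), 2))
```

The 0.999 target and the request counts are placeholders; a negative remaining budget is the signal that the experiment pushed the system outside its SLO and should be stopped.
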

Level 2

| Role | Mandatory | Description |
|---|---|---|
| DevOps Engineer | Yes | Conducts chaos experiments: uses LitmusChaos or Chaos Mesh for Kubernetes and runs game days with the team. Implements basic experiments: pod kill, network delay, resource stress. Documents hypotheses, execution, and conclusions. |
| Infrastructure Engineer | Yes | Conducts infrastructure chaos experiments: tests database failover (RDS failover, Redis Sentinel), network partitions between AZs, and disk failure scenarios. Uses AWS Fault Injection Simulator or Terraform-based fault injection for cloud infrastructure. |
| Performance Testing Engineer | Yes | Independently develops Chaos Engineering tests. Applies test design techniques. Integrates tests into CI/CD. Covers edge cases. |
| Platform Engineer | Yes | Integrates chaos engineering into the platform: installs and configures Chaos Mesh/Litmus as a platform service and creates experiment templates for developer self-service. Ensures isolation: chaos experiments do not escape the target namespace. |
| Site Reliability Engineer (SRE) | | Conducts chaos experiments for SLO validation: creates hypothesis-driven experiments with clear steady-state metrics and uses Chaos Mesh/Litmus for Kubernetes failures. Analyzes the impact on SLIs and determines remediation actions based on the findings. |
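
The rows above describe basic experiments (pod kill, network delay, resource stress) and integrating chaos tests into CI/CD. Litmus and Chaos Mesh inject such faults at the Kubernetes level; as a lighter, in-process alternative for unit or CI tests, the Python sketch below wraps a function with added latency and random failures. `inject_faults`, `InjectedFault`, `fetch_price`, and `price_with_fallback` are hypothetical names, not part of any chaos tool's API:

```python
import functools
import random
import time
from typing import Callable, Optional


class InjectedFault(Exception):
    """Raised by the wrapper to simulate a failing dependency."""


def inject_faults(delay_s: float = 0.0, error_rate: float = 0.0,
                  rng: Optional[random.Random] = None,
                  sleep: Callable[[float], None] = time.sleep):
    """Decorator adding a fixed delay and random failures to each call (in-process only)."""
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    if rng is None:
        rng = random.Random()

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if delay_s > 0:
                sleep(delay_s)  # emulate extra network latency before the real call
            if rng.random() < error_rate:  # random() is in [0, 1), so 1.0 always fails
                raise InjectedFault(f"injected failure in {func.__name__}")
            return func(*args, **kwargs)
        return wrapper
    return decorator


# Usage: a hypothetical dependency that always fails after a 200 ms delay.
slept = []  # records requested delays instead of actually sleeping


@inject_faults(delay_s=0.2, error_rate=1.0, sleep=slept.append)
def fetch_price(item_id: str) -> float:
    return 9.99


def price_with_fallback(item_id: str) -> float:
    """Code under test: should degrade gracefully when the dependency fails."""
    try:
        return fetch_price(item_id)
    except InjectedFault:
        return 0.0  # hypothetical degraded default


print(price_with_fallback("sku-1"), slept)
```

Passing `sleep` and `rng` in as parameters keeps the test fast and reproducible in a CI pipeline; swap in `random.Random(seed)` for partial error rates.
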

Level 3

| Role | Mandatory | Description |
|---|---|---|
| DevOps Engineer | Yes | Designs a chaos engineering program: defines steady-state metrics, designs experiments of increasing scope (single pod → availability zone → region), and configures automated chaos runs in CI/CD. Integrates results with SLO/SLI monitoring to identify weaknesses. |
| Infrastructure Engineer | Yes | Designs infrastructure resilience testing: creates automated DR drills, tests backup/restore procedures under load, and implements region failover experiments. Configures infrastructure monitoring to detect chaos impact and trigger automatic rollback. |
| Performance Testing Engineer | Yes | Designs a test strategy that includes Chaos Engineering. Implements automated testing at all levels. Optimizes the test pyramid. Mentors the team. |
| Platform Engineer | Yes | Designs a chaos-as-a-service platform: creates an API for launching experiments programmatically, integrates with CI/CD for automated chaos testing, and implements RBAC to control who can run which experiments. Designs safety mechanisms: abort conditions and blast radius limits. |
| Site Reliability Engineer (SRE) | Yes | Designs a chaos program linked to SRE practices: integrates chaos experiments into post-mortem follow-ups and creates continuous verification for critical paths. Implements sophisticated experiments: clock skew, DNS failures, TLS certificate expiry, and cascading failure scenarios. |
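
The rows above describe experiments of increasing scope (single pod → availability zone → region) guarded by abort conditions and blast radius limits. A minimal Python sketch of that control loop follows; `inject`, `rollback`, and `measure_error_rate` are hypothetical callbacks you would wire to your chaos tool and monitoring, and the 1% abort threshold is an arbitrary example value:

```python
from typing import Callable, List, Sequence, Tuple

# Increasing blast radius, mirroring "single pod -> availability zone -> region".
STAGES: Tuple[str, ...] = ("single-pod", "availability-zone", "region")


def run_progressive_experiment(
    inject: Callable[[str], None],
    rollback: Callable[[str], None],
    measure_error_rate: Callable[[], float],
    abort_threshold: float,
    stages: Sequence[str] = STAGES,
) -> Tuple[str, List[Tuple[str, float]]]:
    """Widen the fault one stage at a time and stop once the SLI breaches the abort threshold.

    Returns ("completed" or "aborted", [(stage, observed_error_rate), ...]).
    """
    history: List[Tuple[str, float]] = []
    for stage in stages:
        inject(stage)
        try:
            error_rate = measure_error_rate()
        finally:
            rollback(stage)  # always undo the fault, even if measurement raises
        history.append((stage, error_rate))
        if error_rate > abort_threshold:  # abort condition: do not widen further
            return "aborted", history
    return "completed", history


# Usage with simulated measurements (illustrative numbers only).
observed = iter([0.001, 0.004, 0.05])
status, history = run_progressive_experiment(
    inject=lambda stage: print("inject", stage),
    rollback=lambda stage: print("rollback", stage),
    measure_error_rate=lambda: next(observed),
    abort_threshold=0.01,
)
print(status, history)
```

Rolling back after every stage keeps each step independent; a production runner would usually also poll the SLI continuously while the fault is active instead of sampling it once.
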

Level 4

| Role | Mandatory | Description |
|---|---|---|
| DevOps Engineer | Yes | Establishes a chaos engineering culture: trains teams on experiment design and creates a safety net for production chaos (abort conditions, blast radius control). Designs a chaos matrix covering all failure types: infrastructure, network, application, and database. |
| Infrastructure Engineer | Yes | Defines the infrastructure resilience strategy: designs a multi-region failover architecture validated through chaos and creates an infrastructure chaos suite for continuous verification. Standardizes DR procedures and ensures RTO/RPO compliance through regular testing. |
| Performance Testing Engineer | Yes | Defines chaos + performance standards: performance degradation testing during failures and resilience testing under load. Implements GameDays for performance failures. |
| Platform Engineer | Yes | Standardizes chaos engineering at the platform level: designs automated resilience scoring infrastructure and creates a chaos experiment marketplace for reuse. Defines platform-level chaos: testing the platform components themselves (control plane, etcd, ingress). |
| Site Reliability Engineer (SRE) | Yes | Defines the chaos engineering strategy for the SRE organization: creates a chaos maturity assessment and designs automated resilience scoring per service. Makes chaos experiments a prerequisite for the production readiness review and defines escalation procedures. |
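
The DevOps Engineer and SRE rows above describe a chaos matrix covering infrastructure, network, application, and database failures, and making chaos experiments a prerequisite for production readiness review. One way to combine the two is sketched in Python below, assuming experiment outcomes are available as simple (service, failure type, passed) records; the gate rule is a hypothetical policy, not a standard:

```python
from typing import Dict, Iterable, List, Set, Tuple

# Failure categories from the chaos matrix description above.
FAILURE_TYPES: Tuple[str, ...] = ("infrastructure", "network", "application", "database")


def coverage_gaps(results: Iterable[Tuple[str, str, bool]]) -> Dict[str, List[str]]:
    """Map each service to the failure types that lack a passing experiment.

    `results` holds (service, failure_type, passed) records from past experiments.
    """
    covered: Dict[str, Set[str]] = {}
    for service, failure_type, passed in results:
        if failure_type not in FAILURE_TYPES:
            raise ValueError(f"unknown failure type: {failure_type}")
        seen = covered.setdefault(service, set())
        if passed:
            seen.add(failure_type)
    return {
        service: [t for t in FAILURE_TYPES if t not in seen]
        for service, seen in covered.items()
    }


def ready_for_production(service: str, gaps: Dict[str, List[str]]) -> bool:
    """Hypothetical readiness gate: every failure type needs at least one passing experiment."""
    return service in gaps and not gaps[service]


# Illustrative experiment history for two hypothetical services.
results = [
    ("checkout", "infrastructure", True),
    ("checkout", "network", True),
    ("checkout", "application", True),
    ("checkout", "database", True),
    ("search", "network", True),
    ("search", "database", False),
]
gaps = coverage_gaps(results)
print(gaps)
print(ready_for_production("checkout", gaps), ready_for_production("search", gaps))
```

A service with no recorded experiments is treated as not ready, so the gate fails closed rather than open.
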

Level 5 (Principal)

| Role | Mandatory | Description |
|---|---|---|
| DevOps Engineer | | Shapes the enterprise chaos engineering strategy: designs a chaos-as-a-service platform for team self-service and defines a continuous verification pipeline. Influences resilience culture by securing executive buy-in and demonstrating ROI (prevented incidents vs. the cost of the chaos program). |
| Infrastructure Engineer | | Shapes enterprise infrastructure resilience: designs chaos testing for multi-cloud and hybrid infrastructure and defines compliance requirements for business continuity. Influences industry standards for infrastructure resilience testing in regulated industries. |
| Performance Testing Engineer | Yes | Designs performance resilience testing: chaos engineering integrated with load testing, automated degradation detection, and a resilience SLO framework. |
| Platform Engineer | | Shapes the enterprise chaos platform: designs multi-cluster chaos coordination and defines chaos governance (who, what, when, blast radius). Influences platform architecture through chaos-driven design decisions, ensuring the platform itself is chaos-resilient. |
| Site Reliability Engineer (SRE) | Yes | Shapes enterprise resilience strategy through chaos: designs an organization-wide chaos framework and defines compliance requirements for chaos testing (financial services, healthcare). Influences industry practices through publications and talks on chaos engineering ROI. |
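
The Platform Engineer row above defines chaos governance in terms of who, what, when, and blast radius. A minimal Python sketch of such a policy check follows; `ChaosPolicy`, `ChaosRequest`, `violations`, and the example team, experiment names, time window, and 10% limit are all hypothetical, and a real platform would enforce these rules through its own RBAC and admission controls:

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import FrozenSet, List


@dataclass(frozen=True)
class ChaosPolicy:
    """Hypothetical governance rule: who may run what, when, and how wide."""
    allowed_teams: FrozenSet[str]
    allowed_experiments: FrozenSet[str]
    window_start: time  # same-day window, start < end
    window_end: time
    max_blast_radius_pct: float  # share of targets that may be affected at once


@dataclass(frozen=True)
class ChaosRequest:
    team: str
    experiment: str
    start: datetime
    blast_radius_pct: float


def violations(policy: ChaosPolicy, req: ChaosRequest) -> List[str]:
    """Return every rule the request breaks; an empty list means it may run."""
    problems = []
    if req.team not in policy.allowed_teams:
        problems.append("who: team not authorized")
    if req.experiment not in policy.allowed_experiments:
        problems.append("what: experiment type not allowed")
    if not (policy.window_start <= req.start.time() < policy.window_end):
        problems.append("when: outside the allowed window")
    if req.blast_radius_pct > policy.max_blast_radius_pct:
        problems.append("blast radius: exceeds limit")
    return problems


policy = ChaosPolicy(
    allowed_teams=frozenset({"payments-sre"}),
    allowed_experiments=frozenset({"pod-kill", "network-delay"}),
    window_start=time(10, 0),
    window_end=time(16, 0),
    max_blast_radius_pct=10.0,
)
request = ChaosRequest("payments-sre", "pod-kill", datetime(2026, 3, 17, 11, 30), 5.0)
print(violations(policy, request))
```

Returning all violations instead of the first one gives requesters a complete picture in a single review round.
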