Skill Profile

Disaster Recovery Design

RPO/RTO, multi-region, failover strategies, backups, recovery plans


Roles

9

where this skill appears

Levels

5

structured growth path

Mandatory requirements

24

the other 18 optional

Domain

Architecture & System Design

Group

System Design

Last updated

3/17/2026

How to Use

Choose your current level and compare expectations. The items below show what to cover to advance to the next level.

What is Expected at Each Level

The tables below show how skill depth grows across the five levels, from Junior to Principal.

Role Required Description
Cloud Engineer Understands basic disaster recovery concepts for cloud: backup strategies, recovery point/time objectives (RPO/RTO), and availability zones. Follows team procedures for executing DR runbooks and verifying backup integrity.
Database Engineer / DBA Understands the basic architectural concepts and main patterns of disaster recovery design. Follows the team's architectural decisions.
DevOps Engineer Understands DR concepts: RPO, RTO, backup/restore, hot/warm/cold standby. Performs scheduled backups and verifies backup integrity as instructed. Knows the main risks and failure scenarios in cloud infrastructure.
Infrastructure Engineer Understands basic disaster recovery concepts: backup types (full, incremental, differential), recovery procedures, and failover basics. Follows team guidelines for backup verification, restore testing, and hardware redundancy checks.
Network Engineer Knows basic disaster recovery concepts for network engineering and can apply them in typical tasks. Uses standard tools and follows established team practices. Understands when and why DR measures are applied.
Platform Engineer Understands RPO/RTO metrics for platform services. Participates in DR drills following runbooks: verifies that backups are usable and tests restore procedures. Configures basic backup policies (Velero for K8s, AWS Backup). Documents DR testing results.
Site Reliability Engineer (SRE) Understands DR concepts: RPO, RTO, backup types. Follows DR procedures: failover runbooks, backup verification. Participates in DR drills.
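Several of the rows above mention verifying backup integrity. As a minimal sketch of one common approach, the snippet below compares a backup file against a SHA-256 checksum recorded when the backup was taken. The function names `sha256_of` and `backup_is_intact` are hypothetical, and the sketch assumes the expected checksum is stored somewhere trustworthy alongside the backup catalog; a matching checksum shows the file was not corrupted, not that it can actually be restored.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large backups need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_is_intact(path: Path, expected_sha256: str) -> bool:
    """Return True when the file matches the checksum recorded at backup time."""
    # Hex digests are case-insensitive, so normalise the recorded value.
    return sha256_of(path) == expected_sha256.lower()
```

A checksum match is only the first step of verification; a periodic test restore remains the stronger check.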
Role Required Description
Cloud Engineer Applies DR design for cloud workloads: multi-AZ deployments, cross-region replication, automated failover with Route 53/Traffic Manager. Implements backup automation with lifecycle policies and conducts regular DR drill exercises.
Database Engineer / DBA Participates in DR processes: performs backup verification, tests restore procedures per runbooks. Configures automated snapshots and cross-region backup copying. Monitors replication lag and backup status.
DevOps Engineer Implements DR solutions: automated backups (Velero for Kubernetes, AWS Backup), cross-region data replication. Configures automated database failover, tests restore procedures. Documents DR plans and runbooks.
Engineering Manager Applies DR planning in project risk assessments and business continuity discussions. Understands trade-offs between DR costs and recovery capabilities. Coordinates team participation in DR drills and incident response exercises.
Infrastructure Engineer Applies DR design for on-premise and hybrid infrastructure: RAID configurations, SAN replication, warm/hot standby servers. Implements automated backup schedules and conducts quarterly DR testing with documented recovery procedures.
Network Engineer Confidently applies disaster recovery for network engineering in non-standard tasks. Independently selects the optimal approach and tools. Analyzes trade-offs and proposes improvements to existing solutions.
Platform Engineer Configures DR infrastructure for the platform: Velero backup schedules, cross-region S3 replication, database replicas. Creates automated DR runbooks for key services. Conducts quarterly DR drills and analyzes gaps. Configures monitoring for RPO compliance.
Site Reliability Engineer (SRE) Implements DR solutions: automated backups, cross-region replication, failover testing. Documents DR plans. Configures backup monitoring and alerting on backup failures.
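The rows above refer to monitoring replication lag, backup status, and RPO compliance. The sketch below illustrates the idea under simple assumptions: each service has an RPO, a timestamp for its last successful backup, and a measured replication lag, and an alert is raised whenever either value exceeds the RPO. The `ServiceStatus` type and `rpo_alerts` function are hypothetical names, and real systems would pull these values from their backup tool and database metrics.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class ServiceStatus:
    name: str
    rpo: timedelta                     # maximum tolerable data loss
    last_successful_backup: datetime   # timezone-aware timestamp
    replication_lag: timedelta         # how far the replica trails the primary


def rpo_alerts(statuses: List[ServiceStatus], now: datetime) -> List[str]:
    """Return one alert per protection mechanism that currently exceeds the RPO."""
    alerts: List[str] = []
    for s in statuses:
        backup_age = now - s.last_successful_backup
        if backup_age > s.rpo:
            alerts.append(f"{s.name}: last backup is {backup_age} old, RPO is {s.rpo}")
        if s.replication_lag > s.rpo:
            alerts.append(f"{s.name}: replication lag {s.replication_lag} exceeds RPO {s.rpo}")
    return alerts


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    demo = [ServiceStatus("orders-db", timedelta(minutes=15),
                          now - timedelta(hours=2), timedelta(seconds=4))]
    for line in rpo_alerts(demo, now):
        print(line)
```

Checking backups and replication separately is a deliberate simplification: replication alone does not protect against logical corruption, so a stale backup is reported even when the replica is current.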
Role Required Description
Cloud Engineer Required Designs DR architecture for multi-cloud environments: pilot light, warm standby, and multi-site active-active patterns. Implements chaos engineering for DR validation. Records decisions on RPO/RTO trade-offs and resilience cost optimization in architecture decision records (ADRs).
Database Engineer / DBA Required Designs DR for the database tier: multi-region replication, automated failover (Patroni, MHA), backup verification through automated restores. Defines RPO/RTO for different tiers. Conducts DR drills.
DevOps Engineer Required Designs DR architecture: multi-region active-passive and active-active configurations, automated failover with DNS. Implements chaos engineering (Chaos Monkey, Litmus) for DR validation. Defines RPO/RTO for each service, automates DR testing.
Engineering Manager Required Designs DR processes aligned with business continuity requirements and compliance mandates. Evaluates DR investment ROI and negotiates RPO/RTO targets with stakeholders. Establishes regular DR testing cadence and post-mortem review practices.
Infrastructure Engineer Required Designs disaster recovery for critical infrastructure: multi-AZ architecture with automatic failover, backup strategy with cross-region replication, recovery runbooks. Configures automated DR testing through chaos engineering (Chaos Monkey, Litmus) and defines RPO/RTO for each component.
Network Engineer Expertly applies disaster recovery for network engineering to design complex systems. Optimizes existing solutions and prevents architectural mistakes. Conducts code reviews and trains colleagues on best practices.
Platform Engineer Required Designs DR architecture for the internal developer platform (IDP): multi-region active-passive, pilot light, and warm standby for platform components. Implements chaos engineering (Litmus, Gremlin) for DR plan validation. Creates automated failover with DNS-based switching and health-check-driven promotion.
Site Reliability Engineer (SRE) Required Designs DR architecture: active-passive vs active-active, pilot light, warm standby. Implements automated failover. Conducts chaos engineering for DR validation. Defines RTO/RPO by tier.
Solutions Architect Required Designs enterprise DR architecture with multi-region failover, data replication strategies, and automated recovery orchestration. Evaluates non-functional requirements for resilience. Records decisions on active-passive vs active-active patterns in architecture decision records (ADRs).
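Automated failover driven by health checks recurs throughout the rows above. The sketch below shows the core decision logic in an active-passive setup, assuming the standby is promoted only after a configurable number of consecutive failed checks so that a single transient failure does not trigger failover. `FailoverController` is a hypothetical name, and the actual promotion step (a DNS record change, a database promotion) is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class FailoverController:
    failure_threshold: int = 3      # consecutive failures required to fail over
    active: str = "primary"
    consecutive_failures: int = 0

    def observe(self, primary_healthy: bool) -> str:
        """Record one health-check result and return the site that should serve traffic."""
        if self.active != "primary":
            # Assumption: failback is a manual, deliberate step, never automatic.
            return self.active
        if primary_healthy:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                self.active = "standby"
        return self.active


if __name__ == "__main__":
    controller = FailoverController(failure_threshold=3)
    for healthy in [True, False, False, False, True]:
        print(healthy, "->", controller.observe(healthy))
```

Requiring consecutive failures trades a slightly longer detection time, which counts against the RTO, for fewer spurious failovers.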
Role Required Description
Cloud Engineer Required Defines the product's architectural strategy for disaster recovery design. Establishes architecture guidelines. Conducts architecture reviews.
Database Engineer / DBA Required Defines DR standards for the data platform: RPO/RTO by tier, DR testing schedule, failover procedures. Coordinates DR drills with cross-functional teams. Creates incident playbooks for database failures.
DevOps Engineer Required Defines organizational DR strategy: service classification by criticality, RPO/RTO standards for each tier. Designs automated DR testing platform, game day processes and tabletop exercises. Manages DR budget and prioritization.
Engineering Manager Required Defines the product's architectural strategy for disaster recovery design. Shapes architecture guidelines. Conducts architecture reviews.
Infrastructure Engineer Required Defines DR standards for organizational infrastructure: service classification by criticality (Tier 1-4), standard DR patterns for each tier, regular DR drills. Reviews team DR plans, implements automated failover testing and coordinates quarterly disaster recovery exercises.
Network Engineer Establishes disaster recovery standards for the network engineering team and makes architectural decisions. Defines the technical roadmap incorporating this skill. Mentors senior engineers and influences practices of adjacent teams.
Platform Engineer Required Defines organizational DR strategy: tiered RPO/RTO by service criticality, budget allocation, compliance requirements. Leads game days and tabletop exercises for DR plans. Designs organizational DR governance with regular review and improvement cycles.
Site Reliability Engineer (SRE) Required Defines organizational DR standards: tiered recovery model, mandatory DR testing schedule, communication plan. Coordinates cross-team DR drills. Implements DR metrics.
Solutions Architect Required Defines the product's architectural strategy for disaster recovery design. Establishes architecture guidelines. Conducts architecture reviews.
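The rows above describe classifying services by criticality and setting RPO/RTO standards per tier. A minimal sketch of such a tier table, and of checking DR drill results against it, might look like the following. The tier numbers follow the "Tier 1-4" wording above, but the specific RPO/RTO values are purely illustrative assumptions, as are the names `RecoveryTarget`, `TIER_TARGETS`, and `drill_gaps`.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Dict, List


@dataclass(frozen=True)
class RecoveryTarget:
    rpo: timedelta  # maximum tolerable data loss
    rto: timedelta  # maximum tolerable time to restore service


# Illustrative values only; real targets come from business impact analysis.
TIER_TARGETS: Dict[int, RecoveryTarget] = {
    1: RecoveryTarget(rpo=timedelta(minutes=5), rto=timedelta(minutes=15)),
    2: RecoveryTarget(rpo=timedelta(hours=1), rto=timedelta(hours=4)),
    3: RecoveryTarget(rpo=timedelta(hours=24), rto=timedelta(hours=24)),
    4: RecoveryTarget(rpo=timedelta(days=7), rto=timedelta(days=3)),
}


def drill_gaps(tier: int, observed_data_loss: timedelta,
               observed_recovery_time: timedelta) -> List[str]:
    """Compare what a DR drill measured against the targets for the service's tier."""
    target = TIER_TARGETS[tier]
    gaps: List[str] = []
    if observed_data_loss > target.rpo:
        gaps.append(f"RPO exceeded: lost {observed_data_loss}, target {target.rpo}")
    if observed_recovery_time > target.rto:
        gaps.append(f"RTO exceeded: took {observed_recovery_time}, target {target.rto}")
    return gaps


if __name__ == "__main__":
    print(drill_gaps(1, timedelta(minutes=2), timedelta(minutes=30)))
```

Keeping the targets in one table makes it straightforward to review every team's drill results against the same standard.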
Role Required Description
Cloud Engineer Required Shapes organizational DR strategy: multi-region active-active vs pilot light vs warm standby, RPO/RTO matrix by criticality. Designs automated failover through Route 53 health checks and cross-region replication. Organizes regular DR drills and game day exercises.
Database Engineer / DBA Required Shapes organizational disaster recovery strategy: multi-region active-active vs active-passive, cross-cloud DR, RTO/RPO frameworks. Defines DR governance, compliance requirements, and investments in database resilience.
DevOps Engineer Required Develops corporate business continuity and disaster recovery strategy: multi-cloud DR, automated failover for the entire platform. Defines resilience engineering architecture: chaos engineering culture, game day framework, continuous DR validation.
Engineering Manager Required Defines organizational architectural strategy. Designs reference architectures. Establishes architecture governance.
Infrastructure Engineer Required Shapes Business Continuity and Disaster Recovery strategy for the company: active-active multi-region architecture, DR for multi-cloud environments, compliance with regulatory DR requirements. Defines DR infrastructure investments, designs full region loss scenarios and coordinates DR strategy with company leadership.
Network Engineer Shapes disaster recovery strategy for network engineering at the organizational level. Defines best practices and influences technology choices beyond their own team. Is a recognized expert in this area.
Platform Engineer Required Shapes business continuity strategy for the platform: active-active multi-region, cell-based architecture for blast radius isolation. Defines architectural patterns for resilient distributed systems. Advises the board on risk management and compliance for the mission-critical platform.
Site Reliability Engineer (SRE) Required Designs organizational DR strategy: multi-region architecture, data sovereignty compliance, full-stack failover automation. Defines business continuity framework.
Solutions Architect Required Defines the organization's architectural strategy. Designs reference architectures. Establishes architecture governance.
