Site Reliability Engineer (SRE)

Ensuring reliability, scalability, and performance of production systems

61 skills, 5 levels, 305 requirements (139 mandatory)

Site Reliability Engineer (SRE) is a role in the DevOps & SRE family. It covers 61 skills across 5 levels (from Junior to Principal), with 305 requirements in total, of which 139 are mandatory. Key domains: Programming Fundamentals, Backend Development, Database Management.

Technology Stack

Junior: Linux, Prometheus/Grafana, PagerDuty/OpsGenie, Bash/Python scripting, Docker, Kubernetes basics
Middle: Kubernetes, Prometheus/Thanos, Grafana/Loki, OpenTelemetry, Terraform, Go/Python, Chaos Monkey basics, Runbook automation
Senior: Kubernetes advanced, Chaos Engineering (Litmus/Gremlin), eBPF tools, OpenTelemetry advanced, Custom exporters, Load testing (k6/Gatling)
Lead / Staff: SRE platform, Incident management automation, SLO automation, Multi-cluster monitoring, FinOps, Disaster Recovery testing
Principal: Enterprise SRE architecture, Multi-region, Global traffic management, Reliability at scale

Focus by Level

Junior

Monitoring SLI/SLO. Participating in on-call rotation. Writing runbooks. Automating routine operations. Incident analysis.

Middle

Defining SLI/SLO/SLA. Designing monitoring. Capacity planning. Automating incident response. Post-mortem analysis.

Senior

Designing highly available systems. Chaos engineering. Performance engineering. Error budgets. Coordination with development.
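
The error budget idea mentioned here can be illustrated with a little arithmetic. The following is a minimal sketch, not a prescribed method: it assumes a request-based availability SLI, and the function name, parameters, and sample numbers are all hypothetical.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarize the SLI and error budget consumption for one measurement window."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= failed_requests <= total_requests:
        raise ValueError("failed_requests must be between 0 and total_requests")

    # SLI: fraction of requests that succeeded in the window.
    sli = (total_requests - failed_requests) / total_requests
    # Error budget: number of failures the SLO tolerates in the window.
    allowed_failures = (1.0 - slo_target) * total_requests
    # Share of the budget already spent; above 1.0 means the SLO is breached.
    budget_consumed = failed_requests / allowed_failures
    return {
        "sli": sli,
        "allowed_failures": allowed_failures,
        "budget_consumed": budget_consumed,
        "slo_met": sli >= slo_target,
    }


# Hypothetical window: 99.9% SLO, 1,000,000 requests, 250 failures.
report = error_budget(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
print(report)
```

In this hypothetical window the SLO tolerates roughly 1,000 failed requests, so 250 failures spend about a quarter of the budget. A team might use the remaining share to decide how much release risk to accept, which is one place where coordination with development comes in.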

Lead / Staff

SRE strategy. Reliability culture. SLO standards. Incident management processes. Coordination with product.

Principal

Enterprise reliability strategy. Multi-region architecture. SRE culture at scale. Industry best practices.

Skill Matrix

61 skills × 5 levels.

Legend: A = Awareness, W = Working, V = Advanced, E = Expert

AI-Assisted Development

4 skills
Skills Jun Mid Sen Lead Princ
GitHub Copilot A W A E E
Cursor IDE A W A E E
ChatGPT / Claude A W A E E
Prompt Engineering for Code A W A E E

API & Integration

3 skills
Skills Jun Mid Sen Lead Princ
REST API Design A W A E E
GraphQL Design A W A E E
API Documentation A W A E E

Architecture & System Design

4 skills
Skills Jun Mid Sen Lead Princ
System Design Fundamentals A W A E E
High Load Architecture A W A E E
Capacity Planning A W A E E
Disaster Recovery Design A W A E E

Backend Development

3 skills
Skills Jun Mid Sen Lead Princ
Python Web Frameworks A W A E E
Apache Kafka A W A E E
Redis A W A E E

Cloud & Infrastructure

9 skills
Skills Jun Mid Sen Lead Princ
Docker A W A E E
Kubernetes Core A W A E E
Kubernetes Advanced A W A E E
Helm A W A E E
Terraform A W A E E
AWS A W A E E
Network Fundamentals A W A E E
Load Balancing A W A E E
VPN & Network Isolation A W A E E

Database Management

3 skills
Skills Jun Mid Sen Lead Princ
PostgreSQL A W A E E
Database Indexing A W A E E
Query Optimization A W A E E

DevOps & CI/CD

3 skills
Skills Jun Mid Sen Lead Princ
GitHub Actions / GitLab CI A W A E E
GitOps Practices A W A E E
ArgoCD A W A E E

Observability & Monitoring

11 skills
Skills Jun Mid Sen Lead Princ
Structured Logging A W A E E
ELK Stack A W A E E
Grafana Loki A W A E E
Prometheus & Grafana A W A E E
Custom Business Metrics A W A E E
OpenTelemetry A W A E E
Jaeger / Grafana Tempo A W A E E
Continuous Profiling A W A E E
APM Tools A W A E E
SLI / SLO / SLA A W A E E
On-Call Management A W A E E

Performance Engineering

1 skill
Skills Jun Mid Sen Lead Princ
Latency Optimization A W A E E

Programming Fundamentals

9 skills
Skills Jun Mid Sen Lead Princ
Algorithms & Complexity A W A E E
Data Structures A W A E E
OOP & SOLID Principles A W A E E
Design Patterns A W A E E
Multithreading A W A E E
Async Programming A W A E E
Code Quality & Refactoring A W A E E
Type Safety & Type Systems A W A E E
Memory Management A W A E E

Security

5 skills
Skills Jun Mid Sen Lead Princ
OWASP & Application Security A W A E E
Secure Coding Practices A W A E E
Secrets Management A W A E E
JWT / OAuth2 / OIDC A W A E E
Incident Response Process A W A E E

Testing & QA

4 skills
Skills Jun Mid Sen Lead Princ
Unit Testing A W A E E
Integration Testing A W A E E
E2E Testing A W A E E
Chaos Engineering A W A E E

Version Control & Collaboration

2 skills
Skills Jun Mid Sen Lead Princ
Git Advanced A W A E E
Code Review A W A E E

FAQ

What skills are needed for the Site Reliability Engineer (SRE) role?

The Site Reliability Engineer (SRE) role covers 61 skills and 305 requirements, of which 139 requirements are mandatory. Skills are distributed across 5 levels, from Junior to Principal.

How do I advance to the next level in the Site Reliability Engineer (SRE) role?

Use the Grade Calculator to assess your current level and get personalized recommendations. The system will show which skills need to be developed for the next level.

What tech stack is used in the Site Reliability Engineer (SRE) role?

The stack is specified separately for each of the 5 levels. It includes Linux, Prometheus/Grafana, PagerDuty/OpsGenie, Bash/Python scripting, Docker, Kubernetes basics, Kubernetes, Prometheus/Thanos, Grafana/Loki, OpenTelemetry, Terraform, Go/Python, Chaos Monkey basics, Runbook automation, Kubernetes advanced, Chaos Engineering (Litmus/Gremlin), eBPF tools, OpenTelemetry advanced, Custom exporters, Load testing (k6/Gatling)...

How does the community define requirements for the Site Reliability Engineer (SRE) role?

Role requirements are shaped by the community through a proposal system. Any member can suggest changes that go through voting and expert review.
