Site Reliability Engineer (SRE)
Ensuring reliability, scalability, and performance of production systems
Observability & Monitoring
11 skills · 55 requirements

| Skill | Junior | Middle | Senior | Lead | Principal |
|---|---|---|---|---|---|
| Logging | | | | | |
| Structured Logging | Awareness | Working | Advanced | Expert | Expert |
| ELK Stack | Awareness | Working | Advanced | Expert | Expert |
| Grafana Loki | Awareness | Working | Advanced | Expert | Expert |
| Metrics & Monitoring | | | | | |
| Prometheus & Grafana | Awareness | Working | Advanced | Expert | Expert |
| Custom Business Metrics | Awareness | Working | Advanced | Expert | Expert |
| Distributed Tracing | | | | | |
| OpenTelemetry | Awareness | Working | Advanced | Expert | Expert |
| Jaeger / Grafana Tempo | Awareness | Working | Advanced | Expert | Expert |
| Profiling | | | | | |
| Continuous Profiling | Awareness | Working | Advanced | Expert | Expert |
| APM Tools | Awareness | Working | Advanced | Expert | Expert |
| Alerting & On-Call | | | | | |
| SLI / SLO / SLA | Awareness | Working | Advanced | Expert | Expert |
| Incident Management | | | | | |
| On-Call Management | Awareness | Working | Advanced | Expert | Expert |