Domain: Observability & Monitoring
Skill Profile: Log aggregation, LogQL, labels, retention, multi-tenancy, Grafana integration
Roles: 2 (where this skill appears)
Levels: 5 (structured growth path)
Mandatory requirements: 6 (the other 4 optional)
3/17/2026
The tables below show how skill depth grows from Junior to Principal, one table per level. Each table lists what to cover to advance to the next level.
| Role | Required | Description |
|---|---|---|
| Platform Engineer | | Queries logs in Grafana Loki using basic LogQL syntax. Navigates Grafana dashboards to view application log streams. Understands label-based log filtering and basic log aggregation concepts. |
| Site Reliability Engineer (SRE) | | Uses Grafana Loki to search and filter logs during incident investigation. Understands log retention policies and storage concepts. Follows runbooks that reference Loki queries for common troubleshooting scenarios. |
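
To illustrate the label-based filtering expected at this level, here is a minimal Python sketch that assembles a LogQL log query (a stream selector plus a line filter) and the matching URL for Loki's `query_range` HTTP endpoint. The label names (`app`, `env`), the host, and both helper functions are hypothetical and chosen only for illustration. No request is sent, and label values are assumed to contain no quotes that would need escaping.

```python
from typing import Dict, Optional
from urllib.parse import urlencode


def build_logql(labels: Dict[str, str], contains: Optional[str] = None) -> str:
    """Build a LogQL log query: a stream selector plus an optional line filter."""
    # Stream selector: {label="value", ...} using exact-match label matchers.
    # Assumes values contain no double quotes (no escaping is done here).
    matchers = ", ".join(
        f'{name}="{value}"' for name, value in sorted(labels.items())
    )
    query = "{" + matchers + "}"
    if contains is not None:
        # |= keeps only the log lines that contain the given substring.
        query += f' |= "{contains}"'
    return query


def query_range_url(base_url: str, query: str, limit: int = 100) -> str:
    """Build a URL for Loki's /loki/api/v1/query_range endpoint (no request is made)."""
    params = urlencode({"query": query, "limit": limit})
    return f"{base_url}/loki/api/v1/query_range?{params}"


# Hypothetical labels and host, for illustration only.
query = build_logql({"app": "checkout", "env": "prod"}, contains="timeout")
url = query_range_url("http://loki.example.internal:3100", query)
```

The same query string can be pasted into Grafana's Explore view. Building it from a small set of labels first and a line filter second mirrors how Loki narrows the set of streams before scanning log content.
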
| Role | Required | Description |
|---|---|---|
| Platform Engineer | | Configures Loki ingestion pipelines with Promtail and structured metadata extraction. Builds Grafana dashboards combining Loki logs with Prometheus metrics for correlated observability. Sets up log-based alerting rules for platform health monitoring. |
| Site Reliability Engineer (SRE) | | Configures Loki for multi-tenant log aggregation across services. Creates advanced LogQL queries with metric extraction for SLI tracking. Builds alerting rules on log patterns and participates in on-call rotation using log-based diagnostics. |
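
As one possible reading of "LogQL queries with metric extraction for SLI tracking", the sketch below builds a LogQL metric query that divides the rate of error lines by the rate of all lines for a stream selector. The helper name, the `app="checkout"` selector, and the choice of a substring match on "error" are assumptions for illustration. A real SLI would likely filter on a parsed field rather than raw text.

```python
def error_ratio_query(selector: str, error_text: str, window: str = "5m") -> str:
    """Build a LogQL metric query: rate of matching lines over rate of all lines."""
    # rate(<log query> [window]) converts a log stream into lines per second.
    errors = f'sum(rate({selector} |= "{error_text}" [{window}]))'
    total = f"sum(rate({selector}[{window}]))"
    return f"{errors} / {total}"


# Hypothetical selector, for illustration only.
sli_query = error_ratio_query('{app="checkout"}', "error")
```

A query of this shape can back both a Grafana panel and a log-based alerting rule, since it yields a numeric series rather than log lines.
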
| Role | Required | Description |
|---|---|---|
| Platform Engineer | Required | Architects Loki deployment topology for high-throughput multi-cluster log aggregation. Designs log pipeline standards including labeling conventions, retention policies, and cost optimization. Integrates Loki into the platform observability stack alongside tracing and metrics. |
| Site Reliability Engineer (SRE) | Required | Designs the organization-wide logging strategy with Loki as the centralized log platform. Defines SLIs and SLOs based on log-derived metrics and automates error-budget alerting. Leads post-mortems leveraging Loki correlation with distributed traces and APM data. |
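
The error-budget alerting mentioned at this level is often expressed as a burn rate: the observed error ratio divided by the error budget the SLO allows. The sketch below shows that arithmetic in Python. The function names and the default threshold are hypothetical, and the error ratio is assumed to come from a log-derived metric measured over some window.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Return how fast the error budget is being consumed (1.0 = exactly on budget)."""
    budget = 1.0 - slo_target  # e.g. a 99.9% SLO leaves a 0.1% error budget
    if budget <= 0.0:
        raise ValueError("slo_target must be below 1.0 to leave an error budget")
    return error_ratio / budget


def should_alert(error_ratio: float, slo_target: float, threshold: float = 2.0) -> bool:
    """Alert when the budget burns faster than the (hypothetical) threshold."""
    return burn_rate(error_ratio, slo_target) > threshold


# Illustration: 0.5% of requests failing against a 99.9% SLO burns the
# budget roughly five times faster than the SLO allows.
rate = burn_rate(error_ratio=0.005, slo_target=0.999)
```

The threshold and the measurement window are policy decisions. Shorter windows with higher thresholds catch fast burns, while longer windows with lower thresholds catch slow ones.
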
| Role | Required | Description |
|---|---|---|
| Platform Engineer | Required | Adopts Grafana Loki as a cost-effective logging solution for the platform, including multi-tenant configuration and retention policies. Designs a label strategy for optimal query performance. Integrates with Grafana for unified observability (logs, metrics, and traces in a single UI). |
| Site Reliability Engineer (SRE) | Required | Defines Loki standards: label strategy (low cardinality), retention policies, query patterns. Implements Loki for cost-effective log aggregation. Compares Loki and ELK scenario by scenario. |
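
The emphasis on low-cardinality labels follows from how Loki stores data: each unique combination of label values forms its own stream, so the number of streams can grow as the product of the per-label value counts. The sketch below computes that upper bound and flags labels above a limit. The label names, the counts, and the limit of 100 are hypothetical figures chosen for illustration.

```python
from math import prod
from typing import Dict, List


def max_streams(label_values: Dict[str, int]) -> int:
    """Upper bound on streams: the product of distinct values per label."""
    return prod(label_values.values())


def high_cardinality_labels(label_values: Dict[str, int], limit: int = 100) -> List[str]:
    """Return labels whose distinct-value count exceeds the (hypothetical) limit."""
    return sorted(name for name, count in label_values.items() if count > limit)


# Hypothetical counts of distinct values per label.
base = {"app": 20, "env": 3, "level": 5}
with_user = {**base, "user_id": 10_000}

base_bound = max_streams(base)            # 20 * 3 * 5
user_bound = max_streams(with_user)       # the same, times 10_000
offenders = high_cardinality_labels(with_user)
```

Values such as user or request identifiers are usually better left in the log line and matched with a line filter or parser at query time than promoted to labels.
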
| Role | Required | Description |
|---|---|---|
| Platform Engineer | Required | Defines the logging strategy: Loki vs ELK vs managed solutions for various platform use cases. Designs Loki at scale: microservices mode, S3 backend, caching. Shapes the vision for a cost-efficient observability data platform with tiered storage. |
| Site Reliability Engineer (SRE) | Required | Designs the log aggregation strategy: Loki for Kubernetes-native logging, multi-tenant setup, long-term storage. Defines when to use Loki, ELK, or a managed solution (Datadog/Splunk). |