Domain: Observability & Monitoring
Skill profile: Log aggregation, LogQL, labels, retention, multi-tenancy, Grafana integration
Roles: 2 (where this skill appears)
Levels: 5 (structured development path)
Required entries: 6 (the other 4 are optional)
17.3.2026
The tables below, one per level, show how the expected depth grows from Junior to Principal.

| Role | Required | Description |
|---|---|---|
| Platform Engineer | | Queries logs in Grafana Loki using basic LogQL syntax. Navigates Grafana dashboards to view application log streams. Understands label-based log filtering and basic log aggregation concepts. |
| Site Reliability Engineer (SRE) | | Uses Grafana Loki to search and filter logs during incident investigation. Understands log retention policies and storage concepts. Follows runbooks that reference Loki queries for common troubleshooting scenarios. |
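
The label-based filtering described at this level can be illustrated with a minimal sketch. It assembles a LogQL stream selector plus a line filter and the URL for Loki's `query_range` HTTP endpoint. The label names, the host `loki.example:3100`, and the helper function names are hypothetical and chosen only for illustration; no request is sent.

```python
from urllib.parse import urlencode


def _escape(value: str) -> str:
    # Escape backslashes and double quotes so the value is a valid LogQL string.
    return value.replace("\\", "\\\\").replace('"', '\\"')


def stream_selector(labels: dict[str, str]) -> str:
    """Render a LogQL stream selector, e.g. {app="checkout", namespace="prod"}."""
    pairs = ", ".join(f'{k}="{_escape(v)}"' for k, v in sorted(labels.items()))
    return "{" + pairs + "}"


def line_filter_query(labels: dict[str, str], needle: str) -> str:
    """Select streams by label, then keep only lines containing `needle`."""
    return f'{stream_selector(labels)} |= "{_escape(needle)}"'


def query_range_url(base_url: str, query: str, start_ns: int, end_ns: int,
                    limit: int = 100) -> str:
    """Build the URL for Loki's range query endpoint (timestamps in nanoseconds)."""
    params = urlencode({"query": query, "start": start_ns, "end": end_ns, "limit": limit})
    return f"{base_url}/loki/api/v1/query_range?{params}"


if __name__ == "__main__":
    # Hypothetical labels and host, for illustration only.
    q = line_filter_query({"namespace": "prod", "app": "checkout"}, "timeout")
    print(q)
    print(query_range_url("http://loki.example:3100", q, 1, 2))
```

Labels narrow the set of streams first, and the line filter then scans only those streams, which is why the selector comes before the `|=` filter.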

| Role | Required | Description |
|---|---|---|
| Platform Engineer | | Configures Loki ingestion pipelines with Promtail and structured metadata extraction. Builds Grafana dashboards combining Loki logs with Prometheus metrics for correlated observability. Sets up log-based alerting rules for platform health monitoring. |
| Site Reliability Engineer (SRE) | | Configures Loki for multi-tenant log aggregation across services. Creates advanced LogQL queries with metric extraction for SLI tracking. Builds alerting rules on log patterns and participates in the on-call rotation using log-based diagnostics. |
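
As a rough illustration of metric extraction for SLI tracking and log-based alerting, the sketch below builds a LogQL error-ratio expression and wraps it in a Prometheus-style alerting rule of the kind the Loki ruler accepts. The selector, the `"error"` filter string, the threshold, and the rule name are assumptions made for the example, not values from this profile.

```python
def error_ratio_expr(selector: str, error_filter: str, window: str = "5m") -> str:
    """LogQL metric query: rate of matching error lines divided by rate of all lines."""
    errors = f'sum(rate({selector} |= "{error_filter}" [{window}]))'
    total = f"sum(rate({selector}[{window}]))"
    return f"{errors} / {total}"


def alert_rule(name: str, expr: str, threshold: float, hold: str = "10m") -> dict:
    """Describe an alerting rule in the Prometheus-compatible rule format."""
    return {
        "alert": name,
        "expr": f"({expr}) > {threshold}",
        "for": hold,  # the condition must hold this long before the alert fires
        "labels": {"severity": "page"},  # hypothetical routing label
    }


if __name__ == "__main__":
    # Hypothetical selector and threshold, for illustration only.
    expr = error_ratio_expr('{app="checkout"}', "error")
    print(alert_rule("CheckoutHighErrorRatio", expr, 0.01))
```

Dividing two `rate` aggregations yields a unitless ratio, which makes the same expression usable both as a dashboard panel and as an alert condition.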

| Role | Required | Description |
|---|---|---|
| Platform Engineer | Required | Architects the Loki deployment topology for high-throughput multi-cluster log aggregation. Designs log pipeline standards including labeling conventions, retention policies, and cost optimization. Integrates Loki into the platform observability stack alongside tracing and metrics. |
| Site Reliability Engineer (SRE) | Required | Designs the organization-wide logging strategy with Loki as the centralized log platform. Defines SLIs and SLOs based on log-derived metrics and automates error-budget alerting. Leads post-mortems that correlate Loki logs with distributed traces and APM data. |
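
The arithmetic behind error-budget alerting can be sketched as follows. This is a minimal illustration that assumes the good and bad event counts have already been derived from log-based metrics; the function names and the sample numbers are hypothetical.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How fast the budget is being spent: 1.0 means exactly on budget."""
    allowed = 1.0 - slo_target  # fraction of events allowed to fail
    return error_ratio / allowed


def budget_remaining(bad_events: int, total_events: int, slo_target: float) -> float:
    """Fraction of the error budget left for the period covered by the counts."""
    if total_events == 0:
        return 1.0  # no traffic, so nothing has been spent
    consumed = (bad_events / total_events) / (1.0 - slo_target)
    return 1.0 - consumed


if __name__ == "__main__":
    # Hypothetical numbers: a 99.9% SLO with 250 failures in 1,000,000 events.
    print(burn_rate(0.005, 0.999))
    print(budget_remaining(250, 1_000_000, 0.999))
```

A burn rate above 1.0 means the budget would run out before the SLO period ends, which is the usual trigger condition for an error-budget alert.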

| Role | Required | Description |
|---|---|---|
| Platform Engineer | Required | Adopts Grafana Loki as a cost-effective logging solution for the platform, including multi-tenant configuration and retention policies. Designs a label strategy for optimal query performance. Integrates with Grafana for unified observability (logs, metrics, and traces in a single UI). |
| Site Reliability Engineer (SRE) | Required | Defines Loki standards: label strategy (low cardinality), retention policies, query patterns. Implements Loki for cost-effective log aggregation. Compares Loki and ELK across usage scenarios. |
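
The reason a low-cardinality label strategy matters is that each distinct combination of label values forms its own log stream. The sketch below estimates the worst-case stream count from per-label value counts and flags labels above a limit. The limit of 100 and the sample labels are assumptions for illustration, not Loki defaults.

```python
from math import prod


def estimate_streams(values_per_label: dict[str, int]) -> int:
    """Worst-case stream count: the product of distinct values per label."""
    return prod(values_per_label.values())


def risky_labels(values_per_label: dict[str, int], max_values: int = 100) -> list[str]:
    """Labels whose distinct-value count exceeds the (hypothetical) limit."""
    return sorted(name for name, count in values_per_label.items() if count > max_values)


if __name__ == "__main__":
    # Hypothetical label inventory, for illustration only.
    labels = {"app": 20, "env": 3, "level": 5}
    print(estimate_streams(labels))
    print(risky_labels({**labels, "request_id": 50_000}))
```

Unbounded values such as request IDs are usually better kept in the log line or in structured metadata and filtered at query time, so that they do not multiply the number of streams.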

| Role | Required | Description |
|---|---|---|
| Platform Engineer | Required | Defines the logging strategy: Loki versus ELK versus managed solutions for various platform use cases. Designs Loki at scale: microservices mode, S3 backend, caching. Shapes the vision for a cost-efficient observability data platform with tiered storage. |
| Site Reliability Engineer (SRE) | Required | Designs the log aggregation strategy: Loki for Kubernetes-native logging, multi-tenant setup, long-term storage. Defines when to use Loki versus ELK versus managed solutions (Datadog/Splunk). |
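
For the multi-tenant setup mentioned at this level, Loki identifies the tenant through the `X-Scope-OrgID` HTTP header when multi-tenancy is enabled. The following sketch only constructs such a request without sending it. The host, the tenant ID `team-a`, and the helper name are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request


def tenant_query_request(base_url: str, tenant_id: str, query: str) -> Request:
    """Build an instant-query request scoped to one tenant (nothing is sent)."""
    params = urlencode({"query": query})
    url = f"{base_url}/loki/api/v1/query?{params}"
    # The tenant is carried in a header, so the same URL serves every tenant.
    return Request(url, headers={"X-Scope-OrgID": tenant_id})


if __name__ == "__main__":
    # Hypothetical host and tenant, for illustration only.
    req = tenant_query_request("http://loki.example:3100", "team-a", '{app="checkout"}')
    print(req.full_url)
    print(req.header_items())
```

Keeping the tenant in a header rather than in the query lets a gateway or proxy enforce tenant isolation independently of what users type into LogQL.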