Observability

5 skills · Category 19 of 20

This category connects metrics, logs, traces, and the people who respond to them. It covers SLO dashboards, structured logging and distributed tracing, alert severity and escalation, minimal reproductions with redaction, and parsing of async stack traces. Combined with PII redaction and runbooks, these skills support production resilience.

In the hub, this category opens the “observability & troubleshooting” band. The five entries below match those listed in the main hub.

Quick links

Monitoring dashboards

SLOs and alert noise.

Logging & tracing

Structured logs and traces.

Alerting & on-call

Severity and escalation.

Bug reproduction

Minimal cases and redaction.

Stack trace analysis

Async and wrapped errors.

In depth

Monitoring dashboards

Track RED/USE-style signals, SLOs and error budgets, and deduplicate noisy alerts. A dashboard should answer “is the user impacted?”, not only “is the CPU high?”
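As a rough illustration of the error-budget idea, the sketch below computes how much of a budget remains for a request-based SLO. The function name `error_budget_remaining` and its parameters are hypothetical, and the request-count formulation is an assumption; time-based SLOs would need a different calculation.

```python
def error_budget_remaining(slo_target: float, total: int, failed: int) -> float:
    """Return the unspent fraction of the error budget.

    slo_target is the success-rate objective (for example 0.999).
    A negative result means the budget is overspent.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total <= 0:
        return 1.0  # no traffic in the window, so nothing has been spent

    # Number of failures the SLO tolerates for this much traffic.
    allowed_failures = (1.0 - slo_target) * total
    return 1.0 - failed / allowed_failures
```

With a 99.9% target, one million requests, and 250 failures, about three quarters of the budget is still available. A dashboard panel built on a number like this speaks to user impact more directly than a CPU graph does.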

Logging & tracing

Use structured fields, propagate trace IDs across services, and sample deliberately, so that latency and error propagation can be located in a microservice architecture.
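One possible way to attach a trace ID to every structured log line, using only the Python standard library, is sketched here. The names `trace_id_var`, `JsonFormatter`, and `make_logger` are illustrative assumptions; a real deployment would more likely take the ID from its tracing library than set it by hand.

```python
import contextvars
import json
import logging

# Holds the trace ID for the current request or task; "-" means "no trace".
trace_id_var = contextvars.ContextVar("trace_id", default="-")


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with a trace_id field."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "trace_id": trace_id_var.get(),
        })


def make_logger(name: str, stream) -> logging.Logger:
    """Build a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate output via the root logger
    return logger
```

Setting `trace_id_var` at the edge of each service, from whatever header carries the incoming ID, lets log lines from different services be joined on the same field.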

Alerting & on-call

Define severities, rotations, escalation chains, and ticketing so that a single incident does not trigger duplicate pages.
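Duplicate pages are often suppressed by grouping alerts on a key and paging at most once per time window. The sketch below assumes a key of service plus alert name and a five-minute window; the `Pager` class and both choices are hypothetical, not a description of any particular paging product.

```python
from dataclasses import dataclass, field


@dataclass
class Pager:
    """Suppress repeat pages for the same (service, alert) within a window."""

    window_s: float = 300.0
    _last_paged: dict = field(default_factory=dict)

    def should_page(self, service: str, alert_name: str, now: float) -> bool:
        key = (service, alert_name)
        last = self._last_paged.get(key)
        if last is not None and now - last < self.window_s:
            return False  # already paged for this incident recently
        self._last_paged[key] = now
        return True
```

Passing `now` explicitly keeps the logic easy to test; production code would typically pass `time.monotonic()`.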

Bug reproduction

Shrink the failure to the minimal data and steps that still trigger it, and redact before sharing. Feed the resulting case into regression tests to prevent repeat failures.
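A minimal sketch of redacting a reproduction payload before it is shared might look like the following. The key list in `SENSITIVE_KEYS` and the email pattern are assumptions chosen for illustration; they are far from a complete PII policy.

```python
import re

# Illustrative only: real redaction rules should come from a reviewed policy.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SENSITIVE_KEYS = {"password", "token", "authorization", "ssn"}


def redact(value):
    """Return a copy of value with sensitive keys and email addresses masked."""
    if isinstance(value, dict):
        return {
            k: "[REDACTED]" if str(k).lower() in SENSITIVE_KEYS else redact(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [redact(v) for v in value]
    if isinstance(value, str):
        return EMAIL_RE.sub("[EMAIL]", value)
    return value  # numbers, booleans, None pass through unchanged
```

Because the function returns a new structure, the redacted payload can be checked into a regression test while the original stays out of version control.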

Stack trace analysis

Separate root-cause frames from async wrappers and align source maps, keeping in mind that trace formats and async behavior differ across Node, browsers, the JVM, and other runtimes.
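The text names Node, browsers, and the JVM. The same idea of peeling wrappers off to reach the root frame can be sketched in Python, used here only for illustration. The helpers `root_cause` and `innermost_frame` are hypothetical names; they rely on Python's standard exception-chaining attributes and the `traceback` module.

```python
import traceback


def root_cause(exc: BaseException) -> BaseException:
    """Follow the exception chain down to the original error."""
    seen = set()
    while True:
        seen.add(id(exc))
        # Prefer an explicit "raise ... from ..." link, then the implicit one.
        nxt = exc.__cause__ or (
            None if exc.__suppress_context__ else exc.__context__
        )
        if nxt is None or id(nxt) in seen:  # end of chain, or a cycle
            return exc
        exc = nxt


def innermost_frame(exc: BaseException):
    """Return the frame summary where exc was raised, or None."""
    frames = traceback.extract_tb(exc.__traceback__)
    return frames[-1] if frames else None
```

Given a wrapper error raised with `raise RuntimeError(...) from original`, `root_cause` returns `original`, and `innermost_frame` points at the function that raised it rather than at the wrapping handler.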