ring:creating-grafana-dashboards

Authoring Grafana dashboards for Lerian Go services from lib-observability telemetry (tracing, metrics, log), plus a reference mode for RED/USE panel patterns and Grafonnet templates. Sweep mode inventories telemetry, runs PM deliberation on themes/SLIs/alerts, authors Grafonnet libsonnet compiled to JSON, and installs a CI drift gate. Use when scaffolding dashboards. Skip when the service is non-Go or emits no telemetry.

# Creating Grafana Dashboards (lib-observability, PM-team)

## When to use

Sweep mode:
- "Create / scaffold Grafana dashboards for this service"
- "Inventory telemetry / build telemetry dictionary"
- "Audit observability before designing dashboards"
- "Produce dashboards as code for {service}"
- "PM wants visibility into {domain} — what dashboards do we need?"

Reference mode:
- "What's the right panel for HTTP request latency?"
- "RED vs USE methodology for this metric type?"
- "How do I compose Grafonnet panels?"
- "Which Grafonnet template fits a counter / histogram / gauge?"

## Skip when

- Service is not a Go project (lib-observability is Go-only at this skill's scope)
- Service emits no telemetry (pre-instrumentation; instrument the service before dashboard authoring, then use ring:implementing-tasks to verify observability checks pass)
- Task is purely Grafana folder organization or dashboard import (no authoring)
- Service is a consumer-only sidecar with no metrics surface

## Sequence

**Runs before:** ring:running-dev-cycle, ring:running-dev-cycle-frontend

## Related

**Complementary:** ring:implementing-tasks, ring:codebase-explorer, ring:mapping-streaming-events, ring:using-lib-observability, ring:using-tracing
**Similar:** ring:using-runtime, ring:using-assert

## Prerequisites

- Go service with lib-observability initialized in bootstrap (`tracing.NewTelemetry`, `metrics.NewFactory`, `zap.NewLogger`)
- At least one metric, span, or structured log emission point present
- docs/ directory writable
- Grafonnet toolchain available in CI (jsonnet + grafonnet-lib); installation instructions are in `sub-files/ci-drift-check.md`


Orchestrates a 3-phase, 8-gate workflow to produce Grafana dashboards grounded in real telemetry. You orchestrate. Agents explore. PM iterates. You NEVER read, write, or edit source code directly during the sweep.

**Announce at start:** "Using ring:creating-grafana-dashboards through 8 gates (0–7)."

## Mode Selection

| Request Shape | Mode |
|---|---|
| "Create / scaffold dashboards" / "build telemetry dictionary" | **Sweep** (run gates 0–7) |
| "Which panel for X?" / "RED vs USE?" / "Grafonnet template for Y?" | **Reference** (load `sub-files/reference.md`) |

---

# SWEEP MODE

## Telemetry Architecture (lib-observability)

Lerian Go services emit telemetry through `github.com/LerianStudio/lib-observability`:
- **Tracing** via `lib-observability/tracing` — `tracer.Start(ctx, name, opts...)` returning `context.Context, trace.Span`
- **Metrics** via `lib-observability/metrics` — fluent factory producing `meter.Int64Counter`, `meter.Float64Histogram`, `meter.Int64UpDownCounter`, `meter.Int64ObservableGauge`
- **Logs** via `lib-observability/log` (interface) and `lib-observability/zap` (implementation) — structured fields, automatically correlated with active span via `trace_id`/`span_id`
- **OTel attribute / metric / event names** via `lib-observability/constants` — canonical string constants; dashboards reference these for label and metric names
- **Cross-cutting** — `tenant_id` propagation through context, error attribution via `span.RecordError` + `span.SetStatus`

> **Deprecated shims:** `lib-commons/v5/commons/{opentelemetry,zap,log,metrics}` still compile but route through lib-observability. New emission sites MUST import lib-observability directly. The sweep detects both canonical and shim imports.
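
A minimal sketch of what shim detection could look like, assuming the shim paths are exactly the four listed above and that Go sources live under the directories passed in. The function name `list_shim_imports` is hypothetical and not part of the skill.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list Go source lines that still reference a deprecated
# lib-commons shim instead of importing lib-observability directly.
# Usage: list_shim_imports DIR...
# Prints file:line:text for each hit. Exit status is 0 when at least one shim
# reference exists and non-zero when none is found.
list_shim_imports() {
  grep -rnsE --include='*.go' \
    'lib-commons/v5/commons/(opentelemetry|zap|log|metrics)' "$@"
}
```

Calling `list_shim_imports internal cmd` from the service root would then give a migration worklist; `-s` keeps the output quiet when one of the directories does not exist.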

**WebFetch canonical docs (lib-observability — develop branch; main has only LICENSE + README):**
- Tracing: `https://raw.githubusercontent.com/LerianStudio/lib-observability/develop/tracing/doc.go`
- Metrics: `https://raw.githubusercontent.com/LerianStudio/lib-observability/develop/metrics/doc.go`
- Log: `https://raw.githubusercontent.com/LerianStudio/lib-observability/develop/log/doc.go`
- Constants: `https://raw.githubusercontent.com/LerianStudio/lib-observability/develop/constants/doc.go`

**WebFetch changelog:** `https://raw.githubusercontent.com/LerianStudio/lib-observability/develop/CHANGELOG.md`

## Authoring Format: Grafonnet (Mandatory)

Dashboards are authored in **Grafonnet** (a Jsonnet library for generating Grafana dashboards) and compiled to JSON in CI. Raw JSON dashboards are FORBIDDEN.

Reasons:
- Diffable in PR review (libsonnet is code-shaped, JSON is not)
- Composable via `import` and inheritance
- Templated panel patterns reusable across themes
- Single source of truth — JSON is a build artifact, not a checked-in source

Toolchain setup: `sub-files/ci-drift-check.md`. Panel templates: `sub-files/grafonnet-templates/`.
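
The compile step can be sketched as a small shell function. This is an illustration under assumptions, not the skill's actual build script: the function name `compile_dashboards`, the `.jsonnet` entry-point extension, the `vendor` library path, and the `JSONNET_BIN` / `GRAFONNET_JPATH` overrides are all hypothetical. Only the `jsonnet` flags `-J` (library search path) and `-o` (output file) come from the Jsonnet CLI.

```shell
#!/usr/bin/env bash
# Hypothetical build helper: compile every *.jsonnet entry point under SRC_DIR
# into a JSON file under OUT_DIR, preserving the theme sub-directory layout.
# Usage: compile_dashboards SRC_DIR OUT_DIR
# Returns non-zero as soon as one file fails to compile.
compile_dashboards() {
  local src="${1%/}" out="${2%/}" file rel target
  find "$src" -type f -name '*.jsonnet' | sort | while IFS= read -r file; do
    rel="${file#"$src"/}"                  # path relative to SRC_DIR
    target="$out/${rel%.jsonnet}.json"     # same layout, .json extension
    mkdir -p "$(dirname "$target")"
    "${JSONNET_BIN:-jsonnet}" -J "${GRAFONNET_JPATH:-vendor}" \
      -o "$target" "$file" || exit 1       # leaves the loop's subshell
  done
}
```

Writing the JSON to a separate output directory keeps it a build artifact, consistent with the rule that only the Grafonnet source is checked in.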

## Theme Taxonomy

**Free-form per service.** PM defines the theme directories under `docs/dashboards/{theme}/` during Gate 5. No enforced taxonomy — Lerian services are observability islands and theme naming reflects each service's domain.

Common-but-not-mandatory examples: `transactions/`, `auth/`, `ledger/`, `infrastructure/`, `business-kpis/`, `sla/`. The skill SUGGESTS themes from dictionary contents in Gate 4; PM ACCEPTS, RENAMES, MERGES, or SPLITS in Gate 5.

## Drift Gate Posture

CI drift detection is **BLOCKING from day 1**. Any divergence between the regenerated dictionary and the committed `telemetry-dictionary.md` fails the PR. This is a deliberate cold-start choice: the skill is greenfield, there is no installed base to retrofit, and every new metric is emitted under the strict regime.

Drift gate spec: `sub-files/ci-drift-check.md`.
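
The blocking comparison itself can be sketched as follows. This assumes the regeneration step has already written a fresh dictionary to some path; how that file is produced is defined in the drift gate spec, not here. The function name `check_drift` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical drift check: compare the committed dictionary with a freshly
# regenerated one.
# Usage: check_drift COMMITTED REGENERATED
# Returns 0 when the files are identical and 1 on any divergence, including a
# missing file, so a CI job calling it fails the PR.
check_drift() {
  local committed="$1" regenerated="$2"
  if [ ! -f "$committed" ]; then
    echo "drift: committed dictionary missing: $committed" >&2
    return 1
  fi
  if ! diff -u "$committed" "$regenerated"; then
    echo "drift: telemetry dictionary is out of date" >&2
    return 1
  fi
  return 0
}
```

A CI step could call it as `check_drift docs/dashboards/telemetry-dictionary.md "$regenerated_path"`, where `$regenerated_path` is a placeholder for wherever the regeneration step writes its output. The unified diff printed on failure shows reviewers which entries drifted.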

## Gate Overview

| Gate | Name | Agent | Cadence |
|------|------|-------|---------|
| 0 | Stack Detection | Orchestrator (grep + read) | Once per run |
| 1 | Telemetry Sweep (7 angles) | ring:codebase-explorer × 7 parallel | Once per run |
| 2 | Dictionary Assembly + Validation | Orchestrator (deterministic merge) | Once per run |
| 3 | Dictionary Rendering | Orchestrator → markdown writer | Once per run |
| 4 | Theme Proposal + Dashboard Plans | Orchestrator (LLM opinion via reference.md) | Once per run |
| 5 | PM Iteration — NEVER SKIPPABLE | User (PM team) | Loops until APPROVED |
| 6 | Grafonnet Authoring | ring:backend-go per theme | Per approved theme |
| 7 | CI Drift Gate Setup | Orchestrator | Once (idempotent) |

Gates execute sequentially. Gate 1 parallelizes internally across 7 angles. Gate 6 parallelizes per approved theme.

## Gate 0: Stack Detection

Orchestrator executes directly. Detect in parallel:

```
1. Go version:                grep "^go " go.mod | head -1
2. lib-observability version: grep "lib-observability" go.mod
3. lib-commons version:       grep "lib-commons" go.mod
4. Tracing package present:   grep -rn "lib-observability/tracing\|lib-commons/v5/commons/opentelemetry" internal/ cmd/   # canonical + deprecated shim
5. Metrics package present:   grep -rn "lib-observability/metrics\|lib-commons/v5/commons/opentelemetry\|lib-commons/v5/commons/metrics" internal/ cmd/   # canonical + deprecated shims
6. Meter init:                grep -rn "Meter(\|NewMeter\|meter.Int64Counter\|meter.Float64Histogram" internal/ cmd/
7. Tracer init:               grep -rn "Tracer(\|NewTracer\|tracer.Start" internal/ cmd/
8. Log emission:              grep -rn "lib-observability/log\|lib-observability/zap\|lib-commons/v5/commons/log\|lib-commons/v5/commons/zap" internal/ cmd/   # canonical + deprecated shim
9. HTTP framework:            grep -rn "gofiber/fiber\|labstack/echo\|gin-gonic" go.mod
10. gRPC server:              grep -rn "grpc.NewServer" internal/ cmd/
11. RabbitMQ command consumers: grep -rn "lib-commons/v5/commons/rabbitmq" internal/ cmd/   # command queues; event emission goes through lib-streaming
12. lib-streaming present:    grep "lib-streaming" go.mod
13. Tenant source:            grep -rn "tmcore.GetTenantIDContext\|GetTenantID" internal/
14. Existing dictionary:      test -f docs/dashboards/telemetry-dictionary.md
15. Existing dashboards:      ls docs/dashboards/ 2>/dev/null
16. Grafonnet in CI:          test -f .github/workflows/telemetry-drift.yml
17. Service identity:         grep "^module" go.mod
```
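
Checks 1 to 3 and 17 above print raw `go.mod` lines. A sketch of extracting just the values, assuming a conventionally formatted `go.mod`, might look like the following. Both helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for pulling single values out of go.mod.

# go_mod_field GO_MOD DIRECTIVE
# Prints the argument of a top-level directive such as "go" or "module".
go_mod_field() {
  awk -v key="$2" '$1 == key { print $2; exit }' "$1"
}

# dependency_version GO_MOD NAME
# Prints the version of the first dependency line whose text contains NAME;
# prints nothing when the dependency is absent.
dependency_version() {
  awk -v name="$2" '
    $1 != "module" && index($0, name) {
      for (i = 1; i <= NF; i++)
        if ($i ~ /^v[0-9]/) { print $i; exit }
    }' "$1"
}
```

For example, `go_mod_field go.mod module` yields the module path for the service identity, and `dependency_version go.mod lib-observability` yields an empty string when the library is not a dependency.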

Emit `/tmp/dashboards-recon.json`:
```json
{
  "service_name": "...",
  "go_version": "...",
 
