
observability-testing-patterns


Observability and monitoring validation patterns for dashboards, alerting, log aggregation, APM traces, and SLA/SLO verification. Use when testing monitoring infrastructure, dashboard accuracy, alert rules, or metric pipelines.

Tags: specialized-testing, observability, monitoring, kibana, elasticsearch, dashboards, alerting, metrics, logging, scripts


# Observability Testing Patterns

## Browser engine

Dashboard screenshot validation and alert-UI verification go through the **qe-browser** fleet skill (`.claude/skills/qe-browser/`). Vibium is installed by `aqe init`. Typical dashboard regression workflow:

```bash
vibium go "$GRAFANA_URL/d/api-latency"
vibium wait load
node .claude/skills/qe-browser/scripts/assert.js --checks '[
  {"kind": "selector_visible", "selector": ".panel-title"},
  {"kind": "no_console_errors"},
  {"kind": "no_failed_requests"},
  {"kind": "element_count", "selector": ".panel", "op": ">=", "count": 4}
]'
node .claude/skills/qe-browser/scripts/visual-diff.js --name "grafana-api-latency"
```

<default_to_action>
When testing observability infrastructure, dashboards, or monitoring:
1. VALIDATE data accuracy (source data matches what the dashboard displays)
2. TEST alert rules fire correctly at defined thresholds
3. VERIFY log aggregation completeness (no missing logs across services)
4. TRACE distributed requests end-to-end through APM
5. MEASURE dashboard performance (render time, query latency)
6. CONFIRM SLA/SLO compliance through synthetic monitoring
7. TEST metric pipeline integrity from collection to display

**Quick Pattern Selection:**
- Dashboard shows wrong numbers -> Data accuracy validation
- Alerts not firing -> Alert rule threshold testing
- Missing logs in Kibana -> Log aggregation completeness
- Slow dashboard -> Dashboard performance testing
- Broken traces -> APM trace validation
- SLA disputes -> SLO compliance validation

**Critical Success Factors:**
- Observability is only as good as the data it shows
- A dashboard that lies is worse than no dashboard
- Alert fatigue kills response times; test thresholds carefully
</default_to_action>
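Step 3 (log aggregation completeness) comes down to comparing the services you expect to be logging with the services that actually appear in the index. The following is a minimal sketch, assuming each log document carries a keyword field named `service` and that `EXPECTED_SERVICES` is a hypothetical list you maintain alongside your deployment manifest. Note that a `terms` aggregation returns 10 buckets by default, which is easy to outgrow with ten or more microservices.

```javascript
// Hypothetical list of services that are expected to ship logs.
const EXPECTED_SERVICES = ['auth-api', 'order-api', 'payment-api'];

// Search body: count log documents per service over a recent window.
// Assumes a keyword field named `service`. `size` must be at least the
// number of expected services, or low-volume services can fall outside
// the returned buckets and be reported as missing.
function buildServiceTermsQuery(windowExpr = 'now-15m', size = 100) {
  return {
    size: 0,
    query: { range: { '@timestamp': { gte: windowExpr } } },
    aggs: { services: { terms: { field: 'service', size } } }
  };
}

// Expected services with fewer than `minDocs` documents in the response.
function findMissingServices(expected, response, minDocs = 1) {
  const counts = new Map(
    response.aggregations.services.buckets.map(b => [b.key, b.doc_count])
  );
  return expected.filter(name => (counts.get(name) ?? 0) < minDocs);
}
```

In a test, pass the result of `buildServiceTermsQuery()` as the `body` of `esClient.search({ index: 'logs-*', ... })`, then expect `findMissingServices(EXPECTED_SERVICES, result)` to equal `[]`.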

## Quick Reference Card

### When to Use
- Validating dashboard data accuracy (Kibana, Grafana, Datadog)
- Testing alert rule thresholds and notification delivery
- Verifying log aggregation completeness across microservices
- Validating distributed tracing (APM) correctness
- Measuring SLA/SLO compliance
- Testing metric pipeline integrity (collection -> aggregation -> display)
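For the SLA/SLO item above, the error budget is the arithmetic at the heart of the check. A 99.9% availability target over 30 days leaves 0.1% of 43,200 minutes, or about 43.2 minutes of allowed downtime. The following is a minimal sketch of that calculation, assuming a time-based SLO measured in minutes of downtime. A request-based SLO would use request counts instead.

```javascript
// Error budget for a time-based availability SLO.
// slo: target as a fraction (0.999 for 99.9%); windowDays: rolling window.
function errorBudgetMinutes(slo, windowDays) {
  const windowMinutes = windowDays * 24 * 60; // 30 days -> 43,200 minutes
  return windowMinutes * (1 - slo);
}

// Share of the budget still unspent after `badMinutes` of downtime.
// 1 means untouched, 0 means exhausted, negative means the SLO is breached.
function remainingBudgetFraction(slo, windowDays, badMinutes) {
  const budget = errorBudgetMinutes(slo, windowDays);
  return (budget - badMinutes) / budget;
}
```

For example, 30 minutes of measured downtime against a 99.9% / 30-day SLO leaves about 31% of the budget. Compare with a tolerance rather than exact equality, because `1 - 0.999` is not exact in floating point.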

### Testing Levels
| Level | Purpose | Dependencies | Speed |
|-------|---------|--------------|-------|
| Query Validation | Elasticsearch/PromQL query accuracy | Data source | Fast |
| Dashboard Accuracy | Visual matches source data | Full stack | Medium |
| Alert Threshold | Trigger and notification testing | Alerting stack | Medium |
| Pipeline Integrity | End-to-end metric flow | Full pipeline | Slower |
| Performance | Dashboard render time, query latency | Full stack | Slower |
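At the Query Validation level, a PromQL check can go through the Prometheus HTTP API (`GET /api/v1/query`). That endpoint returns instant-vector results as `[timestamp, "value"]` pairs, with the sample value encoded as a string. The following is a minimal sketch, assuming Node 18+ (global `fetch`) and a reachable Prometheus server at a URL you supply. The helper names are hypothetical.

```javascript
// Run an instant query through the Prometheus HTTP API (Node 18+ global fetch).
// Assumes Prometheus is served at the root of `baseUrl`.
async function queryPrometheus(baseUrl, promql) {
  const url = new URL('/api/v1/query', baseUrl);
  url.searchParams.set('query', promql);
  const res = await fetch(url);
  if (!res.ok) throw new Error(`Prometheus returned HTTP ${res.status}`);
  return res.json();
}

// Sample values arrive as strings; Prometheus spells infinities "+Inf"/"-Inf",
// which Number() does not parse.
function parseSample(text) {
  if (text === '+Inf') return Infinity;
  if (text === '-Inf') return -Infinity;
  return Number(text); // "NaN" -> NaN
}

// Flatten an instant-vector response into [{ labels, value }].
function parseInstantVector(body) {
  if (body.status !== 'success' || body.data?.resultType !== 'vector') {
    throw new Error(`Expected a vector result, got ${body.status}/${body.data?.resultType}`);
  }
  return body.data.result.map(({ metric, value }) => ({
    labels: metric,
    value: parseSample(value[1])
  }));
}
```

Keeping the parsing separate from the network call lets you unit-test it against a recorded response. The parsed values can then be compared with a source-of-truth query, in the same way as the dashboard accuracy tests in this document.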

### Critical Test Scenarios
| Scenario | Must Test | Example |
|----------|----------|---------|
| Data Accuracy | Dashboard = source truth | Order count on dashboard = DB count |
| Alert Firing | Threshold triggers alert | Error rate > 5% fires PagerDuty |
| Alert Recovery | Auto-resolve when recovered | Error rate drops below 5% clears alert |
| Log Completeness | All services emit logs | 10 microservices, all logs in Kibana |
| Trace Integrity | Full request path visible | Auth -> API -> DB -> Cache spans |
| SLO Compliance | Error budget tracking | 99.9% availability over 30 days |
| Time Accuracy | Timestamps aligned | Log timestamp matches event time |
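You can exercise the Alert Firing and Alert Recovery rows without a live alerting stack by replaying a series of samples through a model of the rule. The following sketch is a simplified model in the spirit of a Prometheus-style rule with a `for` duration: the condition must hold for N consecutive evaluations before the alert fires, and the alert resolves on the first evaluation where the condition is false. It is not the actual logic of Alertmanager or PagerDuty. The strict `> 5%` threshold comes from the table above.

```javascript
// Replay error-rate samples (percent, one per evaluation interval) through a
// threshold rule and record every state transition.
// forEvaluations: consecutive breaching evaluations required before firing
// (1 means fire on the first breach).
function replayAlertRule(samples, { threshold = 5, forEvaluations = 3 } = {}) {
  let state = 'inactive';
  let breachStreak = 0;
  const transitions = [];

  samples.forEach((value, i) => {
    const breaching = value > threshold; // strict: exactly 5% does not alert
    breachStreak = breaching ? breachStreak + 1 : 0;

    let next;
    if (!breaching) {
      next = 'inactive'; // clears a pending alert or resolves a firing one
    } else if (breachStreak >= forEvaluations) {
      next = 'firing';
    } else {
      next = 'pending';
    }

    if (next !== state) {
      transitions.push({ at: i, from: state, to: next });
      state = next;
    }
  });

  return transitions;
}
```

Replaying the same series against the deployed rule in a staging environment, then comparing the transitions, is one way to confirm that the real rule matches the intended behavior, including the boundary value and short spikes that should never page.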

### Tools
- **Dashboards**: Kibana, Grafana, Datadog, New Relic
- **Search**: Elasticsearch, OpenSearch, Loki
- **Metrics**: Prometheus, InfluxDB, CloudWatch
- **Tracing**: Jaeger, Zipkin, Datadog APM, OpenTelemetry
- **Alerting**: PagerDuty, OpsGenie, Alertmanager
- **Synthetic**: Datadog Synthetics, Checkly, Playwright

### Agent Coordination
- `qe-integration-tester`: Validate data pipelines, query accuracy, log completeness
- `qe-performance-tester`: Dashboard render performance, query latency
- `qe-visual-tester`: Dashboard visual regression, layout accuracy

---

## Dashboard Data Accuracy Validation

### Compare Source Data to Dashboard
```javascript
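// Assumed setup: the @elastic/elasticsearch v8 client (responses are the parsed
// body, with no `.body` wrapper) and node-postgres for the source database.
// The environment variable names are placeholders. `kibanaApi`, used in the
// last test, is an assumed pre-configured Kibana HTTP client.
const { Client } = require('@elastic/elasticsearch');
const { Pool } = require('pg');

const esClient = new Client({ node: process.env.ES_URL });
const db = new Pool({ connectionString: process.env.DATABASE_URL });

describe('Dashboard Data Accuracy', () => {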
  it('order count on dashboard matches database', async () => {
    // Step 1: Get ground truth from source database
    const dbResult = await db.query(
      "SELECT COUNT(*) as count FROM orders WHERE created_at >= NOW() - INTERVAL '24 HOURS'"
    );
    // node-postgres returns COUNT(*) (bigint) as a string
    const dbCount = parseInt(dbResult.rows[0].count, 10);

    // Step 2: Query Elasticsearch (same data source as dashboard)
    const esResult = await esClient.search({
      index: 'orders-*',
      body: {
        query: {
          range: { created_at: { gte: 'now-24h' } }
        },
        size: 0,
        track_total_hits: true
      }
    });
    const esCount = esResult.hits.total.value;

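    // Step 3: Compare. Both windows are relative to "now", so writes that land
    // between the two queries (or are not yet indexed) can cause off-by-a-few
    // failures; pin an absolute time range when the source is busy.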
    expect(esCount).toBe(dbCount);
  });

  it('revenue metric on dashboard matches transaction totals', async () => {
    const dbRevenue = await db.query(
      "SELECT SUM(total) as revenue FROM orders WHERE status = 'COMPLETED' AND created_at >= NOW() - INTERVAL '24 HOURS'"
    );
    // SUM() returns NULL when no rows match; treat that as zero revenue
    const expectedRevenue = parseFloat(dbRevenue.rows[0].revenue ?? 0);

    const esResult = await esClient.search({
      index: 'orders-*',
      body: {
        query: {
          bool: {
            must: [
              { term: { status: 'COMPLETED' } },
              { range: { created_at: { gte: 'now-24h' } } }
            ]
          }
        },
        aggs: {
          total_revenue: { sum: { field: 'total' } }
        },
        size: 0
      }
    });
    const dashboardRevenue = esResult.aggregations.total_revenue.value;

    // Allow small floating point tolerance
    expect(Math.abs(dashboardRevenue - expectedRevenue)).toBeLessThan(0.01);
  });

  it('error rate percentage is calculated correctly', async () => {
    const esResult = await esClient.search({
      index: 'logs-*',
      body: {
        query: { range: { '@timestamp': { gte: 'now-1h' } } },
        aggs: {
          total: { value_count: { field: 'status_code' } },
          errors: {
            filter: { range: { status_code: { gte: 500 } } },
            aggs: { count: { value_count: { field: 'status_code' } } }
          }
        },
        size: 0
      }
    });

    const total = esResult.aggregations.total.value;
    const errors = esResult.aggregations.errors.count.value;
    // Guard against an empty window, which would make the rate NaN
    expect(total).toBeGreaterThan(0);
    const expectedErrorRate = (errors / total) * 100;

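    // Fetch what the dashboard shows via the Kibana API. The saved-objects
    // endpoint returns the visualization definition, not its rendered value,
    // so evaluateKibanaVisualization() (a hypothetical project helper) must
    // execute the panel's query to obtain the displayed number.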
    const dashboardPanel = await kibanaApi.get('/api/saved_objects/visualization/error-rate-gauge');
    const displayedErrorRate = await evaluateKibanaVisualization(dashboardPanel);

    expect(Math.abs(displayedErrorRate - expectedErrorRate)).toBeLessThan(0.1);
  });
});
```
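The error-rate test above derives the expected rate from the raw aggregation. Moving that arithmetic into a pure function makes the empty-window case explicit, so a test can tell "no traffic" apart from "no errors". The following is a minimal sketch that reads the same `total` / `errors.count` aggregation shape used above. The function names are hypothetical.

```javascript
// Error rate in percent from the `total` and `errors.count` value_count
// aggregations used in the error-rate test. Returns null when the window
// contains no requests, so "no traffic" is not mistaken for "0% errors".
function errorRatePercent(aggregations) {
  const total = aggregations.total.value;
  const errors = aggregations.errors.count.value;
  if (total === 0) return null;
  return (errors / total) * 100;
}

// True when two percentages agree within an absolute tolerance in
// percentage points (0.1 by default, as in the test above).
function ratesMatch(actual, expected, tolerancePoints = 0.1) {
  if (actual === null || expected === null) return actual === expected;
  return Math.abs(actual - expected) < tolerancePoints;
}
```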

---

## Elasticsearch Query Result Validation

```javascript
describe('Elasticsearch Query Validation', () => {
  it('validates date histogram aggregation returns correct buckets', async () => {
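    // Insert one known document per hour of the current day.
    // Assumes `test-logs` is empty before the test (delete or recreate it in setup).
    const testDocs = [];
    for (let hour = 0; hour < 24; hour++) {
      const timestamp = new Date();
      // Use UTC hours so each document starts its own bucket under
      // date_histogram's default UTC time zone (local hours break on DST days)
      timestamp.setUTCHours(hour, 0, 0, 0);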
      testDocs.push({
        '@timestamp': timestamp.toISOString(),
        service: 'order-api',
        status_code: hour % 5 === 0 ? 500 : 200,
        response_time: 100 + (hour * 10)
      });
    }

    await esClient.bulk({
      index: 'test-logs',
      body: testDocs.flatMap(doc => [{ index: {} }, doc])
    });
    await esClient.indices.refresh({ index: 'test-logs' });

    // Run the same query the dashboard uses
    const result = await esClient.search({
      index: 'test-logs',
      body: {
        query: { match_all: {} },
        aggs: {
          requests_over_time: {
            date_histogram: { field: '@timestamp', fixed_interval: '1h' },
            aggs: {
              avg_response: { avg: { field: 'response_time' } },
              error_count: {
                filter: { range: { status_code: { gte: 500 } } }
              }
            }
          }
        },
        size: 0
      }
    });

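    const buckets = result.aggregations.requests_over_time.buckets;

    // Expect one hourly bucket per inserted document, in ascending time order
    expect(buckets).toHaveLength(testDocs.length);
    buckets.forEach((bucket, i) => {
      expect(bucket.doc_count).toBe(1);
      expect(bucket.avg_response.value).toBe(testDocs[i].response_time);
      expect(bucket.error_count.doc_count).toBe(testDocs[i].status_code >= 500 ? 1 : 0);
    });
  });
});
```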
