observability-testing-patterns

Observability and monitoring validation patterns for dashboards, alerting, log aggregation, APM traces, and SLA/SLO verification. Use when testing monitoring infrastructure, dashboard accuracy, alert rules, or metric pipelines.


# Observability Testing Patterns

## Browser engine

Dashboard screenshot validation and alert-UI verification go through the **qe-browser** fleet skill (`.claude/skills/qe-browser/`). Its browser engine, Vibium, is installed by `aqe init`. A typical dashboard regression workflow:

```bash
vibium go "$GRAFANA_URL/d/api-latency"
vibium wait load
node .claude/skills/qe-browser/scripts/assert.js --checks '[
  {"kind": "selector_visible", "selector": ".panel-title"},
  {"kind": "no_console_errors"},
  {"kind": "no_failed_requests"},
  {"kind": "element_count", "selector": ".panel", "op": ">=", "count": 4}
]'
node .claude/skills/qe-browser/scripts/visual-diff.js --name "grafana-api-latency"
```

<default_to_action>
When testing observability infrastructure, dashboards, or monitoring:
1. VALIDATE data accuracy (source data matches what the dashboard displays)
2. TEST alert rules fire correctly at defined thresholds
3. VERIFY log aggregation completeness (no missing logs across services)
4. TRACE distributed requests end-to-end through APM
5. MEASURE dashboard performance (render time, query latency)
6. CONFIRM SLA/SLO compliance through synthetic monitoring
7. TEST metric pipeline integrity from collection to display

**Quick Pattern Selection:**
- Dashboard shows wrong numbers -> Data accuracy validation
- Alerts not firing -> Alert rule threshold testing
- Missing logs in Kibana -> Log aggregation completeness
- Slow dashboard -> Dashboard performance testing
- Broken traces -> APM trace validation
- SLA disputes -> SLO compliance validation

**Critical Success Factors:**
- Observability is only as good as the data it shows
- A dashboard that lies is worse than no dashboard
- Alert fatigue kills response times; test thresholds carefully
</default_to_action>
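
Threshold tests are most useful at the boundary: just below, exactly at, and just above the configured value, followed by a recovery. The sketch below models a threshold rule as a pure function so the expected transitions can be pinned down before they are checked against the real alerting stack. `evaluateAlert` and `alertTransitions` are hypothetical helpers, not part of any alerting product, and the model assumes a strict `>` comparison with no evaluation delay or "for" duration.

```javascript
// Hypothetical model of a threshold alert rule; not a vendor API.
// Assumption: the rule fires when the rate is strictly above the threshold
// and resolves as soon as the rate is at or below it.
function evaluateAlert(errorRatePercent, thresholdPercent) {
  return errorRatePercent > thresholdPercent ? 'firing' : 'resolved';
}

// Walk a series of error-rate samples and record each state change.
function alertTransitions(samples, thresholdPercent) {
  const transitions = [];
  let state = 'resolved';
  samples.forEach((rate, index) => {
    const next = evaluateAlert(rate, thresholdPercent);
    if (next !== state) {
      transitions.push({ index, from: state, to: next });
      state = next;
    }
  });
  return transitions;
}

// Boundary samples around a 5% threshold: below, at, above, still above, recovered.
const transitions = alertTransitions([4.9, 5.0, 5.1, 7.2, 4.0], 5);
console.log(transitions);
```

With these samples the model fires at index 2 and resolves at index 4. A real rule may behave differently at exactly 5%, or may wait for several evaluations before firing, so treat the model as the expectation to verify rather than a description of the alerting system.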

## Quick Reference Card

### When to Use
- Validating dashboard data accuracy (Kibana, Grafana, Datadog)
- Testing alert rule thresholds and notification delivery
- Verifying log aggregation completeness across microservices
- Validating distributed tracing (APM) correctness
- Measuring SLA/SLO compliance
- Testing metric pipeline integrity (collection -> aggregation -> display)

### Testing Levels
| Level | Purpose | Dependencies | Speed |
|-------|---------|--------------|-------|
| Query Validation | Elasticsearch/PromQL query accuracy | Data source | Fast |
| Dashboard Accuracy | Displayed values match source data | Full stack | Medium |
| Alert Threshold | Trigger and notification testing | Alerting stack | Medium |
| Pipeline Integrity | End-to-end metric flow | Full pipeline | Slower |
| Performance | Dashboard render time, query latency | Full stack | Slower |

### Critical Test Scenarios
| Scenario | Must Test | Example |
|----------|----------|---------|
| Data Accuracy | Dashboard = source truth | Order count on dashboard = DB count |
| Alert Firing | Threshold triggers alert | Error rate > 5% fires PagerDuty |
| Alert Recovery | Auto-resolve when recovered | Error rate drops below 5% clears alert |
| Log Completeness | All services emit logs | 10 microservices, all logs in Kibana |
| Trace Integrity | Full request path visible | Auth -> API -> DB -> Cache spans |
| SLO Compliance | Error budget tracking | 99.9% availability over 30 days |
| Time Accuracy | Timestamps aligned | Log timestamp matches event time |
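
The SLO Compliance row reduces to simple arithmetic: a 99.9% availability target over 30 days leaves 0.1% of the window as error budget. A minimal sketch of that calculation follows; `errorBudgetMinutes` and `budgetStatus` are illustrative names, and the 30 minutes of downtime is made-up input.

```javascript
// Allowed downtime in minutes for a given SLO target and window length.
function errorBudgetMinutes(sloPercent, windowDays) {
  const windowMinutes = windowDays * 24 * 60;
  return (windowMinutes * (100 - sloPercent)) / 100;
}

// Compare observed downtime against the budget.
function budgetStatus(sloPercent, windowDays, downtimeMinutes) {
  const budget = errorBudgetMinutes(sloPercent, windowDays);
  const remaining = budget - downtimeMinutes;
  return { budget, remaining, compliant: remaining >= 0 };
}

// 99.9% over 30 days, with 30 minutes of downtime observed so far (example input).
const status = budgetStatus(99.9, 30, 30);
console.log(status.budget.toFixed(1));    // "43.2"
console.log(status.remaining.toFixed(1)); // "13.2"
console.log(status.compliant);            // true
```

An SLO test can then assert that the downtime figure shown on the dashboard, fed through the same arithmetic, yields the same remaining budget the dashboard reports.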

### Tools
- **Dashboards**: Kibana, Grafana, Datadog, New Relic
- **Search**: Elasticsearch, OpenSearch, Loki
- **Metrics**: Prometheus, InfluxDB, CloudWatch
- **Tracing**: Jaeger, Zipkin, Datadog APM, OpenTelemetry
- **Alerting**: PagerDuty, OpsGenie, Alertmanager
- **Synthetic**: Datadog Synthetics, Checkly, Playwright

### Agent Coordination
- `qe-integration-tester`: Validate data pipelines, query accuracy, log completeness
- `qe-performance-tester`: Dashboard render performance, query latency
- `qe-visual-tester`: Dashboard visual regression, layout accuracy

---

## Dashboard Data Accuracy Validation

### Compare Source Data to Dashboard
```javascript
describe('Dashboard Data Accuracy', () => {
  it('order count on dashboard matches database', async () => {
    // Step 1: Get ground truth from source database
    const dbResult = await db.query(
      "SELECT COUNT(*) as count FROM orders WHERE created_at >= NOW() - INTERVAL '24 HOURS'"
    );
    // COUNT(*) may arrive as a string, so parse it with an explicit radix
    const dbCount = parseInt(dbResult.rows[0].count, 10);

    // Step 2: Query Elasticsearch (same data source as dashboard)
    const esResult = await esClient.search({
      index: 'orders-*',
      body: {
        query: {
          range: { created_at: { gte: 'now-24h' } }
        },
        size: 0,
        track_total_hits: true
      }
    });
    const esCount = esResult.hits.total.value;

    // Step 3: Compare
    expect(esCount).toBe(dbCount);
  });

  it('revenue metric on dashboard matches transaction totals', async () => {
    const dbRevenue = await db.query(
      "SELECT SUM(total) as revenue FROM orders WHERE status = 'COMPLETED' AND created_at >= NOW() - INTERVAL '24 HOURS'"
    );
    // SUM returns NULL when no rows match; treat that as zero revenue
    const expectedRevenue = parseFloat(dbRevenue.rows[0].revenue ?? '0');

    const esResult = await esClient.search({
      index: 'orders-*',
      body: {
        query: {
          bool: {
            must: [
              { term: { status: 'COMPLETED' } },
              { range: { created_at: { gte: 'now-24h' } } }
            ]
          }
        },
        aggs: {
          total_revenue: { sum: { field: 'total' } }
        },
        size: 0
      }
    });
    const dashboardRevenue = esResult.aggregations.total_revenue.value;

    // Allow small floating point tolerance
    expect(Math.abs(dashboardRevenue - expectedRevenue)).toBeLessThan(0.01);
  });

  it('error rate percentage is calculated correctly', async () => {
    const esResult = await esClient.search({
      index: 'logs-*',
      body: {
        query: { range: { '@timestamp': { gte: 'now-1h' } } },
        aggs: {
          total: { value_count: { field: 'status_code' } },
          errors: {
            filter: { range: { status_code: { gte: 500 } } },
            aggs: { count: { value_count: { field: 'status_code' } } }
          }
        },
        size: 0
      }
    });

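    const total = esResult.aggregations.total.value;
    const errors = esResult.aggregations.errors.count.value;
    // Guard against an empty window, where 0 / 0 would yield NaN
    expect(total).toBeGreaterThan(0);
    const expectedErrorRate = (errors / total) * 100;

    // Fetch what the dashboard shows via Kibana API
    // (kibanaApi and evaluateKibanaVisualization are helpers assumed to be defined elsewhere in the test suite)
    const dashboardPanel = await kibanaApi.get('/api/saved_objects/visualization/error-rate-gauge');
    const displayedErrorRate = await evaluateKibanaVisualization(dashboardPanel);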

    expect(Math.abs(displayedErrorRate - expectedErrorRate)).toBeLessThan(0.1);
  });
});
```
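
One source of flaky mismatches in comparisons like these is that `NOW()` in the database and `now-24h` in Elasticsearch are evaluated at slightly different moments, so rows written between the two queries are counted on only one side. A possible mitigation, sketched below, is to compute one explicit window and pass the same bounds to both systems. `buildWindow` is a hypothetical helper, and the parameterized SQL assumes a driver that accepts `$1`/`$2` placeholders.

```javascript
// Build a half-open time window [gte, lt) ending at `end` and spanning `hours`.
function buildWindow(end, hours) {
  const endMs = end.getTime();
  const start = new Date(endMs - hours * 60 * 60 * 1000);
  return { gte: start.toISOString(), lt: new Date(endMs).toISOString() };
}

// A fixed end time is used here so the example is reproducible;
// a real test would capture the current time once and reuse it.
const range = buildWindow(new Date('2024-01-02T00:00:00.000Z'), 24);

// The same bounds feed both queries.
const esQuery = { range: { created_at: { gte: range.gte, lt: range.lt } } };
const sql = {
  text: 'SELECT COUNT(*) AS count FROM orders WHERE created_at >= $1 AND created_at < $2',
  values: [range.gte, range.lt]
};

console.log(range);
// { gte: '2024-01-01T00:00:00.000Z', lt: '2024-01-02T00:00:00.000Z' }
```

Ending the window a few minutes in the past can also help, since it gives the indexing pipeline time to catch up with the database before the counts are compared.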

---

## Elasticsearch Query Result Validation

```javascript
describe('Elasticsearch Query Validation', () => {
  it('validates date histogram aggregation returns correct buckets', async () => {
    // Insert known test data
    const testDocs = [];
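    for (let hour = 0; hour < 24; hour++) {
      const timestamp = new Date();
      // Use UTC so each document lands in a predictable bucket (date_histogram buckets in UTC by default)
      timestamp.setUTCHours(hour, 0, 0, 0);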
      testDocs.push({
        '@timestamp': timestamp.toISOString(),
        service: 'order-api',
        status_code: hour % 5 === 0 ? 500 : 200,
        response_time: 100 + (hour * 10)
      });
    }

    await esClient.bulk({
      index: 'test-logs',
      body: testDocs.flatMap(doc => [{ index: {} }, doc])
    });
    await esClient.indices.refresh({ index: 'test-logs' });

    // Run the same query the dashboard uses
    const result = await esClient.search({
      index: 'test-logs',
      body: {
        query: { match_all: {} },
        aggs: {
          requests_over_time: {
            date_histogram: { field: '@timestamp', fixed_interval: '1h' },
            aggs: {
              avg_response: { avg: { field: 'response_time' } },
              error_count: {
                filter: { range: { status_code: { gte: 500 } } }
              }
            }
          }
        },
        size: 0
      }
    });

    const buckets = result.aggregations.requests_over
```


Source: https://github.com/proffesor-for-testing/agentic-qe/tree/main/assets/skills/observability-testing-patterns
