
bigconfig-generator


Use this skill when creating or updating Bigeye monitoring configurations (bigconfig.yml files) for BigQuery tables. Works with the metadata-manager skill.



# Bigconfig Generator

**Composable:** Works with metadata-manager (for schema/metadata generation) and bigquery-etl-core (for conventions)
**When to use:** Creating/updating Bigeye configurations, data quality monitoring

## Overview

Generate and manage Bigeye monitoring configurations for BigQuery tables in the Mozilla bigquery-etl repository. Bigeye is Mozilla's data quality monitoring platform that checks for freshness, volume anomalies, null values, uniqueness violations, and custom business logic validation.

This skill helps configure monitoring through:
1. **metadata.yaml** - High-level monitoring settings (freshness, volume, collections)
2. **bigconfig.yml** - Detailed metric definitions (auto-generated via bqetl CLI)
3. **bigeye_custom_rules.sql** - Custom SQL validation rules (optional, for complex business logic)

**Official Documentation:**
- **bigConfig Reference:** https://mozilla.github.io/bigquery-etl/reference/bigconfig/ (docs/reference/bigconfig.md)
- **Bigeye Intro:** https://mozilla.github.io/data-docs/cookbooks/data_monitoring/intro.html
- **Bigeye Official Docs:** https://docs.bigeye.com/docs/bigconfig

## 🚨 REQUIRED READING - Start Here

**BEFORE creating monitoring configurations, READ these resources:**

1. **Existing Collections:** READ `references/existing_collections.md`
   - Collections already in use across the repository
   - Notification channels by dataset/team
   - Helps maintain consistency and avoid creating duplicate collections

2. **Monitoring Patterns:** READ `references/monitoring_patterns.md`
   - Common monitoring scenarios
   - Freshness vs volume monitoring
   - When to use custom rules
   - Configuration workflow

## 📋 Templates - Copy These Structures

**When adding monitoring to metadata.yaml, READ and COPY from these templates:**

- **Basic monitoring (most tables)?** → READ `assets/metadata_monitoring_basic.yaml`
  - Standard freshness and volume checks
  - Collection assignment

- **Critical table (high priority)?** → READ `assets/metadata_monitoring_critical.yaml`
  - More aggressive monitoring settings
  - Faster alerting

- **View (non-partitioned)?** → READ `assets/metadata_monitoring_view.yaml`
  - Monitoring for views without partitions

**For custom validation rules:**
- **Custom SQL checks?** → READ `assets/custom_rules_template.sql`
  - Template for bigeye_custom_rules.sql
  - Shows how to write validation queries

## When to Use This Skill

Use this skill when:
- Creating new tables when the user wants to enable monitoring
- User explicitly requests "create a bigeye config for..."
- User asks about adding data quality monitoring
- Setting up freshness or volume checks
- Creating custom validation rules
- Troubleshooting monitoring configurations

**Integration with metadata-manager:**
When metadata-manager creates new tables, it should ask the user: "Would you like to enable Bigeye monitoring for this table?" If yes, invoke this skill.

## 🚨 IMPORTANT: Deployment Safety

**Manual deployment is BLOCKED for safety reasons.**

If a user asks to run `./bqetl monitoring deploy`, **warn them:**

> ⚠️ **Manual deployment can accidentally delete existing metrics.** The recommended workflow is to commit your changes and let the `bqetl_artifact_deployment` DAG deploy automatically. Manual deployment is disabled in this environment.
>
> If you need to manually deploy for testing purposes, you'll need to:
> 1. Ensure you have `BIGEYE_API_KEY` set
> 2. Understand that deploying only specific tables can remove metrics from other tables
> 3. Use `--dry-run` first to review changes
> 4. Contact Data Engineering if you're unsure
>
> **Proceed with caution - this can affect production monitoring.**

The standard workflow (update → validate → commit → push) is safe and recommended.
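
One way to apply the warning rule consistently is to screen a requested command before running it. The sketch below is illustrative only: `is_manual_deploy` and `review_command` are hypothetical helper names, not part of bqetl, and the warning text is condensed from the message above.

```python
import shlex
from typing import Optional

# Condensed from the warning above; the wording here is not an official message.
DEPLOY_WARNING = (
    "Manual deployment can accidentally delete existing metrics. "
    "Commit your changes and let the bqetl_artifact_deployment DAG deploy them."
)


def is_manual_deploy(command: str) -> bool:
    """Return True if the command line invokes `bqetl monitoring deploy`."""
    tokens = shlex.split(command)
    for i, token in enumerate(tokens):
        # Accept both `./bqetl` and a bare `bqetl` found on PATH.
        if token.rsplit("/", 1)[-1] == "bqetl":
            return tokens[i + 1 : i + 3] == ["monitoring", "deploy"]
    return False


def review_command(command: str) -> Optional[str]:
    """Return the warning for a manual deploy, or None for any other command."""
    return DEPLOY_WARNING if is_manual_deploy(command) else None
```

A `--dry-run` deploy is still flagged by this check, so the user sees the warning before deciding how to proceed.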

## Prerequisites

- Table must have a metadata.yaml file
- Table must be deployed to BigQuery
- Understanding of table's update schedule (daily, hourly, etc.)
- For manual deployment (discouraged): `BIGEYE_API_KEY` environment variable must be set
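
The prerequisites that can be verified locally can be checked with a short script. This is a minimal sketch: `missing_prerequisites` is a hypothetical helper, and it assumes metadata.yaml sits directly inside the table's directory. Whether the table is deployed to BigQuery and how often it updates still have to be confirmed separately.

```python
import os
from pathlib import Path


def missing_prerequisites(table_dir, manual_deploy=False, env=None):
    """List the locally checkable prerequisites that are not met."""
    env = os.environ if env is None else env
    problems = []
    # Assumption: metadata.yaml lives directly inside the table's directory.
    if not (Path(table_dir) / "metadata.yaml").is_file():
        problems.append("metadata.yaml not found")
    # The API key only matters for the discouraged manual deployment path.
    if manual_deploy and not env.get("BIGEYE_API_KEY"):
        problems.append("BIGEYE_API_KEY is not set")
    return problems
```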

## Staying Current with Documentation

**Always prefer official documentation over this skill's references:**

1. **For bigConfig syntax and structure:** Read docs/reference/bigconfig.md or use WebFetch on https://mozilla.github.io/bigquery-etl/reference/bigconfig/
2. **For available saved metrics:** Check sql/bigconfig.yml in the repository (source of truth)
3. **For Bigeye concepts:** Use WebFetch on https://mozilla.github.io/data-docs/cookbooks/data_monitoring/intro.html
4. **For bqetl CLI commands:** Check `./bqetl monitoring --help` or the monitoring.py source code

**When to use WebFetch:**
- User asks about specific bigConfig features not covered in this skill
- Need to verify current syntax or available options
- References in this skill seem outdated or incomplete
- Troubleshooting issues not covered in common patterns

This skill focuses on **workflow and decision-making** rather than being a comprehensive bigConfig reference.

## Workflow

### Step 1: Determine Monitoring Requirements

Ask the user what type of monitoring they need:

**For new tables created by metadata-manager:**
"Would you like to enable Bigeye monitoring for this table? This can check for:
- Freshness (when data was last updated)
- Volume (row count anomalies)
- Column-level validation (nulls, uniqueness, formats)
- Custom business logic validation"

**For existing tables:**
"What type of monitoring would you like to configure?
1. Basic (freshness + volume)
2. Critical (freshness + volume with blocking)
3. Column-level validation
4. Custom SQL rules
5. All of the above"

**After determining monitoring type, check existing collections:**

Before configuring metadata.yaml, READ `references/existing_collections.md` to:
- Find the dataset in "Collections by Dataset" section
- Check if there's an existing collection for this dataset/team
- Note the notification channels used by similar tables

Ask the user: "Based on existing configurations, would you like to use the [Collection Name] collection with [notification channels]? Or create a new collection?"

### Step 2: Configure metadata.yaml

Add a `monitoring` section to metadata.yaml based on table type:

- **Basic (most tables):** `assets/metadata_monitoring_basic.yaml` - Freshness + volume, non-blocking
- **Critical (production):** `assets/metadata_monitoring_critical.yaml` - Blocking failures, collection assignment
- **Views:** `assets/metadata_monitoring_view.yaml` - Requires explicit partition_column

**Key settings:**
- `blocking: true` - Failures block deployments (use for critical tables)
- `collection` - Groups related tables, configures alerts
- `partition_column` - Required for views (or null if non-partitioned)
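
The template choice and the view rule above can be sketched as two small helpers. Both function names are hypothetical. The sketch assumes the `monitoring` section has already been loaded into a dict and that `partition_column` is a direct key of that section; the templates under `assets/` remain the authoritative structure.

```python
# Template paths as listed above, keyed by table type.
TEMPLATES = {
    "basic": "assets/metadata_monitoring_basic.yaml",
    "critical": "assets/metadata_monitoring_critical.yaml",
    "view": "assets/metadata_monitoring_view.yaml",
}


def template_for(table_type):
    """Return the template to copy for 'basic', 'critical', or 'view'."""
    try:
        return TEMPLATES[table_type]
    except KeyError:
        raise ValueError(f"unknown table type: {table_type!r}") from None


def has_explicit_partition_column(monitoring):
    """Views must state partition_column explicitly; None (YAML null) is allowed."""
    # Assumption: partition_column is a direct key of the monitoring section.
    return "partition_column" in monitoring
```

Checking for the key rather than its value is deliberate: a non-partitioned view sets `partition_column` to null, which loads as `None` but still counts as explicit.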

### Step 3: Generate bigconfig.yml

Use the bqetl CLI to auto-generate bigconfig.yml from metadata.yaml:

```bash
./bqetl monitoring update <dataset>.<table>
```

This command:
- Reads monitoring settings from metadata.yaml
- Generates appropriate metric definitions in bigconfig.yml
- Adds freshness/volume checks based on configuration
- Uses saved metrics from sql/bigconfig.yml
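
When several tables need regenerating, the command can be built programmatically. The sketch below only constructs the argument list for the command shown above. `monitoring_update_command` is a hypothetical helper, the table names are placeholders, and the identifier check is an assumption that mirrors the `<dataset>.<table>` form rather than a rule enforced by bqetl.

```python
import re

# Assumption: mirrors the <dataset>.<table> form used in this step.
TABLE_ID = re.compile(r"[A-Za-z0-9_]+\.[A-Za-z0-9_]+")


def monitoring_update_command(table):
    """Return the argv list for `./bqetl monitoring update <dataset>.<table>`."""
    if not TABLE_ID.fullmatch(table):
        raise ValueError(f"expected <dataset>.<table>, got {table!r}")
    return ["./bqetl", "monitoring", "update", table]


# Placeholder table names, printed rather than executed.
for table in ("telemetry_derived.example_v1", "telemetry_derived.other_v1"):
    print(" ".join(monitoring_update_command(table)))
```

To run a command instead of printing it, pass the list to `subprocess.run(..., check=True)` from the repository root.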

**What gets generated:**
- If `freshness.enabled: true` → Adds freshness metric
- If `volume.enabled: true` → Adds volume metric
- If `blocking: true` → Uses `freshness_fail`/`volume_fail` variants
- If `collection` specified → Groups under that collection
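
The rules above can be written as a small function for sanity-checking a generated bigconfig.yml against the settings that produced it. This is a sketch under stated assumptions, not the bqetl implementation. `expected_saved_metrics` is a hypothetical name, `blocking` is assumed to be a direct key of the monitoring section, and the non-blocking metrics are assumed to be named `freshness` and `volume`. Check `sql/bigconfig.yml` for the actual saved metric names.

```python
def expected_saved_metrics(monitoring):
    """Return the saved metric names the rules above imply for a monitoring dict."""
    # Assumption: `blocking` sits at the top level of the monitoring section.
    blocking = bool(monitoring.get("blocking"))
    metrics = []
    for check in ("freshness", "volume"):
        settings = monitoring.get(check) or {}
        if settings.get("enabled"):
            # Blocking tables use the *_fail variants.
            metrics.append(f"{check}_fail" if blocking else check)
    return metrics
```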

### Step 4: Customize bigconfig.yml (Optional)

Manually edit the generated bigconfig.yml for advanced use cases:

**Column-level validation:** Add `tag_deployments` section with `column_selectors` and metrics (is_not_null, is_unique, is_valid_client_id, etc.). See `sql/bigconfig.yml` for all available saved metrics.

**Lookback windows:** Adjust how far back Bigeye scans data (0=latest partition, 7=last 7 days, 28=last 28 days). Use longer lookback for tables with sporadic updates.

**When to customize:** Column-specific validation, custom thresholds, infrequent updates, different notification channels per metr
