bigconfig-generator

Use this skill when creating or updating Bigeye monitoring configurations (bigconfig.yml files) for BigQuery tables. Works with the metadata-manager skill.

# Bigconfig Generator

**Composable:** Works with metadata-manager (for schema/metadata generation) and bigquery-etl-core (for conventions)
**When to use:** Creating/updating Bigeye configurations, data quality monitoring

## Overview

Generate and manage Bigeye monitoring configurations for BigQuery tables in the Mozilla bigquery-etl repository. Bigeye is the data quality monitoring platform Mozilla uses to check for freshness, volume anomalies, null values, uniqueness violations, and custom business-logic rules.

This skill helps configure monitoring through:
1. **metadata.yaml** - High-level monitoring settings (freshness, volume, collections)
2. **bigconfig.yml** - Detailed metric definitions (auto-generated via bqetl CLI)
3. **bigeye_custom_rules.sql** - Custom SQL validation rules (optional, for complex business logic)

**Official Documentation:**
- **bigConfig Reference:** https://mozilla.github.io/bigquery-etl/reference/bigconfig/ (docs/reference/bigconfig.md)
- **Bigeye Intro:** https://mozilla.github.io/data-docs/cookbooks/data_monitoring/intro.html
- **Bigeye Official Docs:** https://docs.bigeye.com/docs/bigconfig

## 🚨 REQUIRED READING - Start Here

**BEFORE creating monitoring configurations, READ these resources:**

1. **Existing Collections:** READ `references/existing_collections.md`
   - Collections already in use across the repository
   - Notification channels by dataset/team
   - Helps maintain consistency and avoid creating duplicate collections

2. **Monitoring Patterns:** READ `references/monitoring_patterns.md`
   - Common monitoring scenarios
   - Freshness vs volume monitoring
   - When to use custom rules
   - Configuration workflow

## 📋 Templates - Copy These Structures

**When adding monitoring to metadata.yaml, READ and COPY from these templates:**

- **Basic monitoring (most tables)?** → READ `assets/metadata_monitoring_basic.yaml`
  - Standard freshness and volume checks
  - Collection assignment

- **Critical table (high priority)?** → READ `assets/metadata_monitoring_critical.yaml`
  - More aggressive monitoring settings
  - Faster alerting

- **View (non-partitioned)?** → READ `assets/metadata_monitoring_view.yaml`
  - Monitoring for views without partitions

**For custom validation rules:**
- **Custom SQL checks?** → READ `assets/custom_rules_template.sql`
  - Template for bigeye_custom_rules.sql
  - Shows how to write validation queries

## When to Use This Skill

Use this skill when:
- Creating new tables and user wants to enable monitoring
- User explicitly requests "create a bigeye config for..."
- User asks about adding data quality monitoring
- Setting up freshness or volume checks
- Creating custom validation rules
- Troubleshooting monitoring configurations

**Integration with metadata-manager:**
When metadata-manager creates new tables, it should ask the user: "Would you like to enable Bigeye monitoring for this table?" If yes, invoke this skill.

## 🚨 IMPORTANT: Deployment Safety

**Manual deployment is BLOCKED for safety reasons.**

If a user asks to run `./bqetl monitoring deploy`, **warn them:**

> ⚠️ **Manual deployment can accidentally delete existing metrics.** The recommended workflow is to commit your changes and let the `bqetl_artifact_deployment` DAG deploy automatically. Manual deployment is disabled in this environment.
>
> If you need to manually deploy for testing purposes, you'll need to:
> 1. Ensure you have `BIGEYE_API_KEY` set
> 2. Understand that deploying only specific tables can remove metrics from other tables
> 3. Use `--dry-run` first to review changes
> 4. Contact Data Engineering if you're unsure
>
> **Proceed with caution - this can affect production monitoring.**

The standard workflow (update → validate → commit → push) is safe and recommended.
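If a manual dry run is ever needed for testing, the checklist above can be encoded so that the safe path is the default. This is a minimal Python sketch: `manual_deploy_command` is a hypothetical helper, and it assumes `./bqetl monitoring deploy` accepts a `<dataset>.<table>` argument together with the `--dry-run` flag mentioned above. It only builds the argument list and never runs it.

```python
import os


def manual_deploy_command(table: str, dry_run: bool = True) -> list[str]:
    """Build (but never run) the argv for a manual Bigeye deployment.

    Hypothetical helper: assumes `./bqetl monitoring deploy` takes a
    `<dataset>.<table>` argument and the `--dry-run` flag.
    """
    # Manual deployment needs an API key (see the checklist above).
    if not os.environ.get("BIGEYE_API_KEY"):
        raise RuntimeError(
            "BIGEYE_API_KEY is not set; prefer the bqetl_artifact_deployment DAG"
        )
    argv = ["./bqetl", "monitoring", "deploy", table]
    # Dry run is the default, so a real deploy has to be requested explicitly.
    if dry_run:
        argv.append("--dry-run")
    return argv
```

Defaulting `dry_run` to `True` means a caller has to opt out explicitly before a real deployment command is even produced.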

## Prerequisites

- The table must have a metadata.yaml file
- The table must be deployed to BigQuery
- You need to know the table's update schedule (daily, hourly, etc.)
- For manual deployment (discouraged): `BIGEYE_API_KEY` environment variable must be set
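The locally checkable prerequisites can be verified before running any bqetl command. A minimal Python sketch, assuming the repository's `sql/<project>/<dataset>/<table>/metadata.yaml` layout with `moz-fx-data-shared-prod` as the default project; `check_prerequisites` is a hypothetical helper, and it cannot confirm that the table is deployed to BigQuery or what its update schedule is.

```python
import os
from pathlib import Path

# Assumption: the default project directory under sql/.
DEFAULT_PROJECT = "moz-fx-data-shared-prod"


def check_prerequisites(repo_root, dataset, table, project=DEFAULT_PROJECT, manual_deploy=False):
    """Return unmet, locally checkable prerequisites (empty list = OK).

    Hypothetical helper. It cannot verify that the table is deployed to
    BigQuery or what its update schedule is.
    """
    problems = []
    metadata = Path(repo_root) / "sql" / project / dataset / table / "metadata.yaml"
    if not metadata.is_file():
        problems.append(f"missing {metadata}")
    # The API key only matters for the (discouraged) manual deployment path.
    if manual_deploy and not os.environ.get("BIGEYE_API_KEY"):
        problems.append("BIGEYE_API_KEY is not set")
    return problems
```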

## Staying Current with Documentation

**Always prefer official documentation over this skill's references:**

1. **For bigConfig syntax and structure:** Read docs/reference/bigconfig.md or use WebFetch on https://mozilla.github.io/bigquery-etl/reference/bigconfig/
2. **For available saved metrics:** Check sql/bigconfig.yml in the repository (source of truth)
3. **For Bigeye concepts:** Use WebFetch on https://mozilla.github.io/data-docs/cookbooks/data_monitoring/intro.html
4. **For bqetl CLI commands:** Check `./bqetl monitoring --help` or the monitoring.py source code

**When to use WebFetch:**
- User asks about specific bigConfig features not covered in this skill
- Need to verify current syntax or available options
- References in this skill seem outdated or incomplete
- Troubleshooting issues not covered in common patterns

This skill focuses on **workflow and decision-making** rather than being a comprehensive bigConfig reference.

## Workflow

### Step 1: Determine Monitoring Requirements

Ask the user what type of monitoring they need:

**For new tables created by metadata-manager:**
"Would you like to enable Bigeye monitoring for this table? This can check for:
- Freshness (when data was last updated)
- Volume (row count anomalies)
- Column-level validation (nulls, uniqueness, formats)
- Custom business logic validation"

**For existing tables:**
"What type of monitoring would you like to configure?
1. Basic (freshness + volume)
2. Critical (freshness + volume with blocking)
3. Column-level validation
4. Custom SQL rules
5. All of the above"

**After determining monitoring type, check existing collections:**

Before configuring metadata.yaml, READ `references/existing_collections.md` to:
- Find the dataset in "Collections by Dataset" section
- Check if there's an existing collection for this dataset/team
- Note the notification channels used by similar tables

Ask the user: "Based on existing configurations, would you like to use the [Collection Name] collection with [notification channels]? Or create a new collection?"

### Step 2: Configure metadata.yaml

Add a `monitoring` section to metadata.yaml based on table type:

- **Basic (most tables):** `assets/metadata_monitoring_basic.yaml` - Freshness + volume, non-blocking
- **Critical (production):** `assets/metadata_monitoring_critical.yaml` - Blocking failures, collection assignment
- **Views:** `assets/metadata_monitoring_view.yaml` - Requires explicit partition_column

**Key settings:**
- `blocking: true` - Failures block deployments (use for critical tables)
- `collection` - Groups related tables, configures alerts
- `partition_column` - Required for views (or null if non-partitioned)
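A `monitoring` section parsed from metadata.yaml is just a mapping, so the key settings above can be sanity-checked before running bqetl. This Python sketch is hypothetical: it assumes an `enabled` flag and per-check `freshness`/`volume` mappings with `enabled`/`blocking` keys, so compare it against the asset templates for the exact layout. The rule it mainly encodes is that views must set `partition_column` explicitly, where `None` (YAML `null`) means non-partitioned.

```python
def validate_monitoring(monitoring: dict, is_view: bool) -> list[str]:
    """Return problems found in a parsed metadata.yaml `monitoring` section.

    Hypothetical checker; the key layout is assumed, not taken from bqetl.
    """
    problems = []
    if monitoring.get("enabled") is not True:
        problems.append("monitoring.enabled should be true")
    # Views need an explicit partition_column; None (YAML null) is allowed
    # and means the view is not partitioned.
    if is_view and "partition_column" not in monitoring:
        problems.append("views must set partition_column (null if non-partitioned)")
    # freshness/volume are expected to be mappings such as {"enabled": true}.
    for check in ("freshness", "volume"):
        if not isinstance(monitoring.get(check, {}), dict):
            problems.append(f"{check} should be a mapping")
    return problems


# Example: the shape of a basic, non-blocking configuration.
basic = {
    "enabled": True,
    "collection": "Example Collection",  # placeholder name
    "freshness": {"enabled": True, "blocking": False},
    "volume": {"enabled": True, "blocking": False},
}
```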

### Step 3: Generate bigconfig.yml

Use the bqetl CLI to auto-generate bigconfig.yml from metadata.yaml:

```bash
./bqetl monitoring update <dataset>.<table>
```

This command:
- Reads monitoring settings from metadata.yaml
- Generates appropriate metric definitions in bigconfig.yml
- Adds freshness/volume checks based on configuration
- Uses saved metrics from sql/bigconfig.yml

**What gets generated:**
- If `freshness.enabled: true` → Adds freshness metric
- If `volume.enabled: true` → Adds volume metric
- If `blocking: true` → Uses `freshness_fail`/`volume_fail` variants
- If `collection` specified → Groups under that collection
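The generation rules above can be modelled as a small function, which helps predict what a diff to bigconfig.yml should contain. This is a simplified Python sketch, not the logic in bqetl's monitoring.py; it assumes `blocking` is set per check (as in `freshness.blocking`), and the metric IDs are the saved metrics named above.

```python
def expected_generation(monitoring: dict) -> dict:
    """Predict what `bqetl monitoring update` adds for a monitoring section.

    Simplified model of the documented rules, not the bqetl implementation.
    Assumes `blocking` is set per check, e.g. freshness.blocking.
    """
    saved_metric_ids = []
    for check in ("freshness", "volume"):
        settings = monitoring.get(check, {})
        if settings.get("enabled"):
            # Blocking checks use the *_fail variants of the saved metrics.
            saved_metric_ids.append(f"{check}_fail" if settings.get("blocking") else check)
    return {
        # None: no collection was specified in metadata.yaml.
        "collection": monitoring.get("collection"),
        "saved_metric_ids": saved_metric_ids,
    }
```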

### Step 4: Customize bigconfig.yml (Optional)

Manually edit the generated bigconfig.yml for advanced use cases:

**Column-level validation:** Add `tag_deployments` section with `column_selectors` and metrics (is_not_null, is_unique, is_valid_client_id, etc.). See `sql/bigconfig.yml` for all available saved metrics.
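Column-level checks are easier to review when generated from a simple column-to-metrics mapping. The Python sketch below builds a hypothetical `tag_deployments` entry. The key names (`type: BIGCONFIG_FILE`, `column_selectors`, `saved_metric_id`, `collection.name`) are assumed from Bigeye's bigConfig format, and the `<source>.<project>.<dataset>.<table>.<column>` selector format is an assumption to verify against existing bigconfig.yml files in the repository.

```python
import json


def column_checks_config(project, dataset, table, column_metrics, collection=None):
    """Build a hypothetical bigConfig dict with column-level saved metrics.

    column_metrics maps a column name to saved metric IDs, for example
    {"client_id": ["is_not_null", "is_valid_client_id"]}.
    """
    deployments = []
    for column, metric_ids in column_metrics.items():
        deployments.append({
            # Assumed selector format: <source>.<project>.<dataset>.<table>.<column>,
            # where the Bigeye source is named after the GCP project.
            "column_selectors": [{"name": f"{project}.{project}.{dataset}.{table}.{column}"}],
            "metrics": [{"saved_metric_id": metric_id} for metric_id in metric_ids],
        })
    tag_deployment = {"deployments": deployments}
    if collection is not None:
        tag_deployment["collection"] = {"name": collection}
    return {"type": "BIGCONFIG_FILE", "tag_deployments": [tag_deployment]}


if __name__ == "__main__":
    config = column_checks_config(
        "moz-fx-data-shared-prod", "telemetry_derived", "example_v1",
        {"client_id": ["is_not_null", "is_valid_client_id"], "document_id": ["is_unique"]},
    )
    # Printed as JSON only for inspection; write YAML by hand in the repo.
    print(json.dumps(config, indent=2))
```

Keeping one deployment per column makes each metric's selector obvious in review, at the cost of a longer file.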

**Lookback windows:** Adjust how far back Bigeye scans data (0 = latest partition, 7 = last 7 days, 28 = last 28 days). Use a longer lookback for tables with sporadic updates.
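To see why sporadically updated tables need a longer window, the sketch below models which daily partitions a lookback covers. It is a hypothetical Python model, assuming 0 means only the latest partition and N means the N daily partitions ending at the latest one; Bigeye's own windowing may differ in detail.

```python
from datetime import date, timedelta


def partitions_in_window(latest: date, lookback_days: int) -> list[date]:
    """Daily partitions covered by a lookback window (hypothetical model).

    0 means only the latest partition; N > 0 means the N daily partitions
    ending at `latest`, oldest first.
    """
    if lookback_days < 0:
        raise ValueError("lookback_days must be >= 0")
    count = max(lookback_days, 1)
    return [latest - timedelta(days=offset) for offset in range(count - 1, -1, -1)]


def window_sees_data(loaded_dates, latest: date, lookback_days: int) -> bool:
    """True if any partition that actually received data is inside the window."""
    loaded = set(loaded_dates)
    return any(day in loaded for day in partitions_in_window(latest, lookback_days))
```

For a table loaded once a week, a lookback of 0 can miss the last load entirely, while a lookback of 7 still covers it.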

**When to customize:** Column-specific validation, custom thresholds, infrequent updates, different notification channels per metric.

Source: https://github.com/mozilla/bigquery-etl-skills/tree/main/skills/bigconfig-generator
