mcp-developer-analysis

Included with Lifetime

$97 forever

Analyze a Materialize environment via the MCP Developer endpoint, and/or configure an MCP client (Claude Code, Cursor, VS Code, Zed, Continue, Windsurf, Claude Desktop) to connect to the materialize-developer server. For analysis: check environment health, investigate performance, troubleshoot stale materialized views, diagnose memory pressure, audit resource utilization, get optimization recommendations. For client connection: configure/connect/set-up an MCP client to materialize-developer (Emulator, Cloud, or self-managed), control which user/role is used, switch between identities. Trigger even if user just says "check my environment", "why is my MV stale", "why is my cluster slow", "what can I optimize", "how do I connect Claude Code to materialize-developer", or "configure Cursor for the Materialize MCP".

Cloud & DevOps

What this skill does


# Materialize Developer Analysis

Analyze a Materialize environment by querying system catalog tables via the MCP
Developer endpoint (`query_system_catalog` tool), and produce a structured
report with health status, performance findings, and optimization recommendations.

## Connecting an MCP client to `materialize-developer`

If the user is asking how to **configure**, **connect**, or **set up** an MCP
client (Claude Code, Cursor, VS Code, Zed, Continue, Windsurf, Claude Desktop)
to talk to the `materialize-developer` server — including how to control which
user or role the connection uses, or how to switch between identities — see
[`mcp-client-connect.md`](mcp-client-connect.md). It covers the Emulator,
Materialize Cloud, and self-managed deployments, with per-client snippets,
authentication patterns, and verification recipes.

The rest of this `SKILL.md` covers what to do *once you are connected* and want
to analyze the environment.

## Discovering Tables and Columns

**Do NOT guess column names.** Before writing queries, check if the `mz_ontology`
schema is available by running:

```sql
SHOW TABLES FROM mz_ontology
```

### If mz_ontology is available

Use it to discover the correct tables, columns, join paths, and ID types:

| Table | What it tells you |
|-------|-------------------|
| `mz_ontology.mz_ontology_entity_types` | What catalog entities exist and which `mz_*` table they map to. |
| `mz_ontology.mz_ontology_link_types` | Relationships between entities (foreign keys, metrics, etc.). |
| `mz_ontology.mz_ontology_properties` | Column names, types, and descriptions for each entity. |
| `mz_ontology.mz_ontology_semantic_types` | Typed ID domains (CatalogItemId, ReplicaId, etc.). |

Example queries:
```sql
-- Find the right table for an entity
SELECT name, relation, description
FROM mz_ontology.mz_ontology_entity_types
WHERE name LIKE '%source%'

-- Find join paths between entities
SELECT name, source_entity, target_entity, properties, description
FROM mz_ontology.mz_ontology_link_types
WHERE source_entity = 'source' OR target_entity = 'source'

-- Find columns for a table
SELECT column_name, semantic_type, description
FROM mz_ontology.mz_ontology_properties
WHERE entity_type = 'source_status'
```

### If mz_ontology is NOT available

Use `SHOW COLUMNS FROM <schema>.<table>` to verify column names before querying.
Refer to the Critical Rules below for known pitfalls.

## Critical Rules

### Known column name pitfalls

Even with the ontology, be aware of these common mistakes:

| Wrong | Correct | Table |
|-------|---------|-------|
| `updated_at` | `last_status_change_at` | `mz_source_statuses`, `mz_sink_statuses` |
| `cluster_name` | Must JOIN through `replica_id` to `mz_cluster_replicas` then `mz_clusters` | `mz_cluster_replica_utilization` |

When unsure, run `SHOW COLUMNS FROM <schema>.<table>` to verify.

### Do NOT query mz_dataflow_arrangement_sizes

**NEVER query `mz_introspection.mz_dataflow_arrangement_sizes`** via MCP. It
fails for two reasons:

1. **Cluster-scoped**: Only returns data for the session's current cluster,
   and the MCP tool does not support `SET cluster = ...` to switch clusters.
2. **Type mismatch**: Its `id` column is `uint8`, not `text` like
   `mz_catalog.mz_objects.id`. JOINs fail with `operator does not exist: uint8 = text`.

Instead, use:
- `mz_internal.mz_cluster_replica_utilization` — memory/CPU/disk percentage
- `mz_internal.mz_cluster_replica_metrics` — raw memory bytes
- `mz_internal.mz_index_advice` — find MVs/indexes that can be removed

### Type casting notes

Some `mz_introspection` views use `uint8` for ID columns instead of `text`.
**Avoid JOINing `mz_introspection` views with `mz_catalog` views unless you
cast IDs explicitly.** The `mz_internal` views all use `text` IDs and are safe
to JOIN with `mz_catalog`.

### Discovering tables without the ontology

If `mz_ontology` is not available, use these fallbacks:
- `SHOW COLUMNS FROM <schema>.<table>` to check a table's columns
- **Do NOT use `SHOW TABLES FROM mz_internal LIKE '...'`** — this only shows
  tables, not views. Most system catalog objects are views.

## Workflow Overview

1. **Connect** — Verify the MCP Developer tools are available
2. **Discover** — Use the ontology + catalog queries to inventory all deployed objects
3. **Analyze** — Assess performance metrics: freshness, hydration, memory, utilization
4. **Report** — Produce a structured markdown report with findings and recommendations

## Step 1: Verify MCP Connection

Confirm you have access to the `query_system_catalog` tool. Run a quick test:

```
query_system_catalog: SELECT mz_version()
```

If this fails, check:
- The MCP server is configured in `.mcp.json`
- The `enable_mcp_developer` feature flag is enabled on the environment
- Your authentication credentials are valid

### Running Queries

All queries are run via the `query_system_catalog` MCP tool. Constraints:
- One statement per call (no semicolons)
- Read-only: SELECT, SHOW, EXPLAIN only
- System tables only: no access to user tables
- No `SET` statements

When filtering out system schemas, always exclude: `mz_catalog`, `mz_internal`,
`pg_catalog`, `information_schema`, and `mz_introspection`.

## Step 2: Discover — Inventory the Environment

Run the discovery queries to understand what is deployed. See
`references/queries.md` for the full query set. The discovery phase covers:

### Environment Overview
- Materialize version (`SELECT mz_version()`)
- Clusters and replicas — names, sizes, and replica counts
- Schemas in use

### Deployed Objects Inventory
- **Sources**: type (Kafka, Postgres, MySQL, Webhook, etc.), cluster assignment, status
- **Materialized Views**: cluster assignment, indexes, dependencies
- **Views**: (non-materialized) and their usage patterns
- **Sinks**: type, destination, cluster assignment
- **Indexes**: what they're on, cluster assignment
- **Connections**: external system connections configured

Build a mental model of the data pipeline: what data comes in (sources), how it's
transformed (views/MVs), and where it goes out (sinks).

### Object Definitions

Retrieve SQL definitions for materialized views, views, indexes, and sources
using `references/queries.md`. This is critical for optimization analysis —
the SQL definitions tell you *how* things are computed:
- Join patterns and join order
- Filter predicates (or lack thereof — missing temporal filters are a common issue)
- Aggregation strategies
- Whether MVs duplicate logic that could be shared

## Step 3: Analyze — Performance and Resource Metrics

### Freshness (Lag Analysis)
Query `mz_internal.mz_materialization_lag` for per-object lag.

**Important**: The `write_frontier` column is of type `mz_timestamp` (a uint8),
not a standard timestamp. You cannot subtract it from `now()` directly. Cast to
get a human-readable time: `to_timestamp(write_frontier::bigint / 1000)`.

### Hydration Status
Query `mz_internal.mz_hydration_statuses` to check whether all dataflows are
hydrated. Non-hydrated objects after initial startup may indicate resource
pressure or configuration issues.

### Memory and Resource Consumption
- `mz_internal.mz_cluster_replica_utilization` for memory/CPU percentage per replica
- `mz_internal.mz_cluster_replica_metrics` for raw memory metrics
- `mz_internal.mz_index_advice` to identify which MVs/indexes can be optimized

### Index Advice
Query `mz_internal.mz_index_advice` — Materialize's built-in advisor. Hint types:
- **"keep"** — the MV/index is needed as-is
- **"drop unless queried directly"** — no structural dependencies; only useful for direct SELECT queries
- **"convert to a view"** — MV can be dematerialized entirely, saving all arrangement memory
- **"convert to a view with an index"** — convert MV to a view but keep its indexes
- **"add index"** — object would benefit from an index

### Cost Analysis (optional)
Query `mz_catalog.mz_cluster_replica_sizes` to get credit rates per cluster
size, then calculate: `credi

Files: 4

Size: 52.6 KB

Complexity: 52/100

Category: Cloud & DevOps

Source: https://github.com/materializeinc/agent-skills/tree/main/skills/mcp-developer-analysis

Related in Cloud & DevOps

appbuilder-action-scaffolder

Included

Create, implement, deploy, and debug Adobe Runtime actions with consistent layout, validation, and error handling. Use this skill whenever the user needs to add actions to an App Builder project, understand action structure (params, response format, web/raw actions), configure actions in the manifest, use App Builder SDKs (State, Files, Events, database), deploy and invoke actions via CLI, debug action issues, or implement patterns such as webhook receivers, custom event providers, journaling consumers, large payload redirects, action sequence pipelines, and Asset Compute workers. Also trigger when users mention serverless functions in Adobe context, action logging, IMS authentication for actions, or cron-style scheduled actions.

Cloud & DevOpsscripts

orchestrating-datacloud

Included

Salesforce Data Cloud product orchestrator for connect→prepare→harmonize→segment→act workflows. Use this skill when the user needs a multi-step Data Cloud pipeline, cross-phase troubleshooting, or data space and data kit management. TRIGGER when: user needs a multi-step Data Cloud pipeline, asks to set up or troubleshoot Data Cloud across phases, manages data spaces or data kits, or wants a cross-phase sf data360 workflow. DO NOT TRIGGER when: work is isolated to a single phase (use the matching phase-specific skill), the task is STDM/session tracing/parquet telemetry (use observing-agentforce), standard CRM SOQL (use querying-soql), or Apex implementation (use generating-apex).

Cloud & DevOpsscripts

github-project-automation

Included

Automate GitHub repository setup with CI/CD workflows, issue templates, Dependabot, and CodeQL security scanning. Includes 12 production-tested workflows and prevents 18 errors: YAML syntax, action pinning, and configuration. Use when: setting up GitHub Actions CI/CD, creating issue/PR templates, enabling Dependabot or CodeQL scanning, deploying to Cloudflare Workers, implementing matrix testing, or troubleshooting YAML indentation, action version pinning, secrets syntax, runner versions, or CodeQL configuration. Keywords: github actions, github workflow, ci/cd, issue templates, pull request templates, dependabot, codeql, security scanning, yaml syntax, github automation, repository setup, workflow templates, github actions matrix, secrets management, branch protection, codeowners, github projects, continuous integration, continuous deployment, workflow syntax error, action version pinning, runner version, github context, yaml indentation error

Cloud & DevOpsscripts

sf-datacloud

Included

Salesforce Data Cloud product orchestrator for connect→prepare→harmonize→segment→act workflows. TRIGGER when: user needs a multi-step Data Cloud pipeline, asks to set up or troubleshoot Data Cloud across phases, manages data spaces or data kits, or wants a cross-phase `sf data360` workflow. DO NOT TRIGGER when: work is isolated to a single phase (use the matching sf-datacloud-* skill), the task is STDM/session tracing/parquet telemetry (use sf-ai-agentforce-observability), standard CRM SOQL (use sf-soql), or Apex implementation (use sf-apex).

Cloud & DevOpsscripts

fabric-cli

Included

Use this skill for Fabric.so CLI workflows with the `fabric` terminal command: diagnose/install/login, search or browse a Fabric library, save notes/links/files, create folders, ask the Fabric AI assistant, manage tasks/workspaces, generate shell completion, check subscription usage, produce JSON output, and use Fabric as persistent agent memory. Do not use for Microsoft Fabric/Azure/Power BI `fab`, Daniel Miessler's Fabric framework, Python Fabric SSH, Fabric.js, or textile/fashion fabric.

Cloud & DevOpsscripts

lark

Included

Lark/Feishu CLI skills: lark-cli operations for docs, markdown, sheets, base, calendar, im, mail, task, okr, drive, wiki, slides, whiteboard, apps, approval, attendance, contact, vc, minutes, event. Use when the user needs to operate Lark/Feishu resources via lark-cli, send messages, manage documents, spreadsheets, calendars, tasks, OKRs, deploy web pages, or any Feishu/Lark workspace operations.

Cloud & DevOpsscripts