
code-to-catalog


Turns a codebase into EventCatalog documentation through an evidence-based interview. Scans the code first, proposes an architectural model (domains, services, agents, messages, channels), grills the user on the structural decisions, produces a reviewable plan file, then hands off to catalog-documentation-creator. Use when user says "document my codebase in EventCatalog", "turn this repo into a catalog", "model my code as a catalog", "document my agents", "document my AI agents", "grill me on my architecture", "update my catalog from the code", "reconcile my catalog with my code", or "I don't know where to start documenting this codebase". Works for brand-new catalogs AND for updating existing catalogs that have drifted from the code.


What this skill does


# Code to Catalog

Turn a codebase into EventCatalog documentation through a guided, evidence-based interview. Works for two situations:

1. **No catalog yet** — document an unfamiliar or undocumented codebase from scratch.
2. **Existing catalog** — reconcile the catalog with the current code (add new resources, flag drift, surface stale entries).

This skill does not write catalog files itself. It produces a **plan file** (`.catalog-plan.md`) that captures the agreed architectural model, then hands off to the `catalog-documentation-creator` skill to generate the actual documentation.

## How this skill works

The skill runs in six phases. Follow them in order — later phases depend on earlier ones.

1. **Locate & inventory** — find the code directory and any existing catalog
2. **Discovery scan** — read the code, form a hypothesis
3. **Reconcile with existing catalog** — categorize findings as `new` / `update` / `unchanged` / `investigate`
4. **Tiered grilling** — interview the user on structural decisions only
5. **Produce the plan file** — write `.catalog-plan.md`, get approval
6. **Handoff** — ask whether to generate the catalog now or stop at the plan

## Conversational style (applies throughout)

- **One question at a time.** Never batch questions. Wait for the user's answer, then move on.
- **Always provide a recommended answer.** Every question includes what you think is true, with evidence (file path + line number). The user confirms, corrects, or overrides.
- **Cite the code.** When you present a finding, point at the file where you saw it — e.g., `src/orders/events.ts:42`. The user should be able to verify without trusting you.
- **Be honest about uncertainty.** If the code does not tell you whether something is an event or a command, say so. Do not guess silently.
- **Surface conflicts, do not pick silently.** When catalog and code disagree, the user decides. Never overwrite without confirmation.
- **Respect the user's time.** Grilling is tiered on purpose — structural decisions only. Do not grill on per-resource fields (summaries, owners, schemas).
- **No catalog deletions.** Resources in the catalog that you cannot find in code are flagged `investigate` — never removed automatically.

## Phase 1: Locate & inventory

### Find the codebase

Ask the user: **"Which code directory should I analyze?"**

Verify the directory exists and looks like a code project (has a `package.json`, `pom.xml`, `go.mod`, `Cargo.toml`, `pyproject.toml`, source directories, etc.). If the directory is ambiguous (e.g., a monorepo), confirm the scope: the whole repo, or a specific subdirectory.
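
The manifest check above can be sketched as follows. This is a minimal illustration, not part of the skill: the function names are hypothetical, and it only looks at the top level of the directory, so a monorepo whose manifests live in subdirectories would still need the scope question.

```python
from pathlib import Path

# Manifest names taken from the list above; extend as needed.
MANIFESTS = ("package.json", "pom.xml", "go.mod", "Cargo.toml", "pyproject.toml")


def find_manifests(root: str) -> list[str]:
    """Return the known manifest files present directly under `root`, sorted."""
    base = Path(root)
    if not base.is_dir():
        return []
    return sorted(name for name in MANIFESTS if (base / name).is_file())


def looks_like_code_project(root: str) -> bool:
    """True when at least one known manifest exists at the top level of `root`."""
    return bool(find_manifests(root))
```

An empty result is a prompt to ask the user about scope, not proof that the directory contains no code.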

### Find the catalog

Ask the user: **"Do you already have an EventCatalog project, or do you want to start fresh?"**

**If they already have one:**

- Ask for the path. Verify it's an EventCatalog project by checking for `eventcatalog.config.js` or the standard directories (`services/`, `agents/`, `events/`, `commands/`, `queries/`, `domains/`, `channels/`, `flows/`).
- Build an inventory of what already exists. If the EventCatalog MCP server is connected, use `getResources`, `getResource`, `findResourcesByOwner`. Otherwise read the filesystem directly and parse the frontmatter of each `index.md`/`index.mdx`.
- Record for each resource: `id`, `name`, `version`, `type`, `summary`, and (for services and agents) `sends` / `receives` relationships.
- Note the catalog's conventions: nested (`domains/X/services/Y/events/Z`) vs flat, PascalCase vs kebab-case IDs, existing owners, schema formats in use.
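
When the MCP server is not connected, the filesystem fallback might look like the sketch below. It is an assumption-laden illustration: the function names are hypothetical, the frontmatter reader handles only top-level scalar `key: value` pairs, and nested fields such as `sends` / `receives` would need a real YAML parser.

```python
from pathlib import Path

# Directory names taken from the list above.
STANDARD_DIRS = ("services", "agents", "events", "commands",
                 "queries", "domains", "channels", "flows")


def is_eventcatalog_project(root: str) -> bool:
    """True if the config file or any standard directory exists under `root`."""
    base = Path(root)
    if (base / "eventcatalog.config.js").is_file():
        return True
    return any((base / d).is_dir() for d in STANDARD_DIRS)


def read_frontmatter(text: str) -> dict[str, str]:
    """Parse top-level scalar fields between the leading `---` fences.

    Indented lines, list items, and keys without an inline value are skipped.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        if line.startswith((" ", "\t", "-")) or ":" not in line:
            continue
        key, _, value = line.partition(":")
        value = value.strip().strip("'\"")
        if value:
            fields[key.strip()] = value
    return fields


def inventory(root: str) -> list[dict[str, str]]:
    """Collect frontmatter from every index.md / index.mdx that declares an id."""
    records = []
    for pattern in ("index.md", "index.mdx"):
        for path in sorted(Path(root).rglob(pattern)):
            fields = read_frontmatter(path.read_text(encoding="utf-8"))
            if "id" in fields:
                records.append({"path": str(path), **fields})
    return records
```

The recorded `path` also reveals whether the catalog is nested or flat, which is one of the conventions worth noting.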

**If they do not have a catalog:**

- That's fine. Note that scaffolding will happen at handoff time through `catalog-documentation-creator` (which runs `npx @eventcatalog/create-eventcatalog@latest <name> --empty`).
- Phase 3 (reconciliation) becomes a no-op — everything discovered will be `new`.
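
To make the four reconciliation categories concrete, here is a hedged sketch. The comparison rule is an assumption made for illustration: a resource counts as `update` when its discovered record differs from the catalog record in any field, and the record shape is hypothetical.

```python
def reconcile(discovered: dict[str, dict], catalog: dict[str, dict]) -> dict[str, str]:
    """Map each resource id to one of: new, update, unchanged, investigate."""
    status: dict[str, str] = {}
    for rid, found in discovered.items():
        if rid not in catalog:
            status[rid] = "new"
        elif found == catalog[rid]:
            status[rid] = "unchanged"
        else:
            status[rid] = "update"
    for rid in catalog:
        if rid not in discovered:
            # In the catalog but not found in code: flag it, never delete it.
            status[rid] = "investigate"
    return status
```

With an empty catalog, every discovered resource falls through to `new`, which is why reconciliation is a no-op for fresh catalogs.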

## Phase 2: Discovery scan

Read the codebase and form a hypothesis. Do **not** show the user your findings yet — you'll present them as questions in Phase 4, backed by evidence.

For detailed detection heuristics per language/framework (Node.js, Python, Go, Java, .NET), see `references/discovery.md`. Read that file now if the codebase uses a stack you need guidance on.

Detect:

### Project structure

- Monorepo vs single service (look for workspace configs: `pnpm-workspace.yaml`, `package.json` with `workspaces`, `nx.json`, `turbo.json`, `lerna.json`, multiple top-level service directories with their own manifests).
- Language and framework per service or agent.
- Build/deploy units (Dockerfiles, Helm charts, `serverless.yml`, `cdk` stacks, k8s manifests).
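
The workspace-config check can be sketched like this. The function name and the `package.json#workspaces` label are hypothetical, and the sketch treats each file as a signal only: a `turbo.json` or `nx.json` can also appear in a single-package repo, so the result still needs confirming with the user.

```python
import json
from pathlib import Path

# Workspace config files taken from the list above.
WORKSPACE_FILES = ("pnpm-workspace.yaml", "nx.json", "turbo.json", "lerna.json")


def workspace_signals(root: str) -> list[str]:
    """Return the monorepo signals found at the top level of `root`."""
    base = Path(root)
    found = [name for name in WORKSPACE_FILES if (base / name).is_file()]
    pkg = base / "package.json"
    if pkg.is_file():
        try:
            data = json.loads(pkg.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            data = {}  # An unreadable manifest is simply not a signal.
        if isinstance(data, dict) and data.get("workspaces"):
            found.append("package.json#workspaces")
    return found
```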

### Service boundaries

A service is an independently-deployable, independently-ownable unit. Signals:

- Separate package with its own manifest
- Separate Dockerfile / deployment config
- Its own entrypoint (`main.ts`, `main.go`, `app.py`, etc.)
- Consumed by others over a network boundary (HTTP, message bus)

When in doubt, mark it as a **candidate** and grill the user in Phase 4.

### Agent boundaries

An agent is an AI/LLM-powered runtime or worker that reasons, calls tools, or automates decisions. Signals:

- Explicit names: `*Agent`, `*Assistant`, `*Copilot`, `*Worker` with LLM/tool orchestration
- LLM SDK usage: OpenAI, Anthropic, Gemini, Vercel AI SDK, LangChain, LlamaIndex, Mastra, CrewAI, AutoGen
- Tool registries or callable tools: MCP clients/servers, `tools`, `function_call`, `tool_choice`, `executeTool`
- Agent frameworks: `Agent`, `createAgent`, `runAgent`, graph/workflow nodes using an LLM
- Memory/state stores used by the agent: vector DBs, Redis, Postgres, Supabase, Pinecone, Qdrant, Chroma

Do not classify a plain service as an agent just because it calls an LLM once. Treat it as an agent when the code owns a durable assistant/worker boundary, tool set, model policy, memory, or autonomous workflow.
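
One way to encode the "not just a single LLM call" rule is to require an LLM SDK signal plus at least one other signal category. The sketch below is illustrative only: the regular expressions are a small subset of the signals listed above, the two-category threshold is an assumption, and the function names are hypothetical.

```python
import re

# A reduced set of the signals listed above, grouped by category.
SIGNALS = {
    "llm_sdk": re.compile(
        r"openai|anthropic|langchain|llamaindex|llama_index|mastra|crewai|autogen",
        re.IGNORECASE,
    ),
    "tools": re.compile(r"\btools\b|function_call|tool_choice|executeTool"),
    "framework": re.compile(r"\bcreateAgent\b|\brunAgent\b|\bAgent\b"),
    "memory": re.compile(r"pinecone|qdrant|chroma", re.IGNORECASE),
}


def agent_signals(source: str) -> set[str]:
    """Return the signal categories that appear anywhere in `source`."""
    return {name for name, pattern in SIGNALS.items() if pattern.search(source)}


def classify_runtime(source: str) -> str:
    """'agent-candidate' needs an LLM SDK plus one more category; else 'service'."""
    found = agent_signals(source)
    if "llm_sdk" not in found:
        return "service"
    return "agent-candidate" if len(found) >= 2 else "service"
```

A result of `agent-candidate` is still only a hypothesis to confirm in Phase 4, since text matching cannot tell whether the code owns a durable agent boundary.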

### Messages (events, commands, queries)

Candidates come from:

- **Naming patterns** — `*Created`, `*Placed`, `*Updated`, `*Deleted` (likely events); `Place*`, `Create*`, `Cancel*`, `Process*` (likely commands); `Get*`, `Find*`, `List*` (likely queries).
- **Message bus clients** — Kafka (`kafkajs`, `confluent-kafka`, `sarama`), RabbitMQ (`amqplib`, `pika`), NATS, AWS SNS/SQS/EventBridge, GCP Pub/Sub, Azure Service Bus.
- **Schema files** — JSON Schema (`.schema.json`), Avro (`.avsc`), Protobuf (`.proto`). These are strong signals of a message contract.
- **DTO / type definitions** — especially if they look like payloads (flat, data-only, named after a domain event).

Classify each candidate as **event**, **command**, or **query** based on evidence:

- Event: past tense, published to a topic/exchange, multiple consumers possible, no direct reply expected.
- Command: imperative, sent to one specific handler; the sender expects it to be acted on.
- Query: read-only, expects a response.

If evidence is ambiguous (common), mark it as **uncertain** and grill in Phase 4 — do not silently pick.
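
The naming patterns can be turned into a first-pass classifier that returns `uncertain` whenever zero or several patterns match, instead of picking one. This is a sketch using only the prefixes and suffixes listed above; the function names are hypothetical, and naming is just one kind of evidence.

```python
import re

EVENT_SUFFIXES = ("Created", "Placed", "Updated", "Deleted")
COMMAND_PREFIXES = ("Place", "Create", "Cancel", "Process")
QUERY_PREFIXES = ("Get", "Find", "List")


def _has_prefix(name: str, prefixes: tuple[str, ...]) -> bool:
    # Require a capitalised word after the prefix so that "Listing" or
    # "Placement" do not match "List" or "Place".
    return any(re.match(rf"{p}[A-Z]", name) for p in prefixes)


def classify_message(name: str) -> str:
    """Return 'event', 'command', 'query', or 'uncertain' from the name alone."""
    matches = set()
    if name.endswith(EVENT_SUFFIXES):
        matches.add("event")
    if _has_prefix(name, COMMAND_PREFIXES):
        matches.add("command")
    if _has_prefix(name, QUERY_PREFIXES):
        matches.add("query")
    # Exactly one match is a usable hypothesis; anything else goes to Phase 4.
    return matches.pop() if len(matches) == 1 else "uncertain"
```

A name such as `ProcessOrderCreated` matches both the command prefix and the event suffix, so it comes back as `uncertain` and becomes a Phase 4 question.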

### Channels

Anywhere messages flow through named infrastructure:

- Kafka topics (string literals passed to `producer.send({ topic: '...' })`)
- RabbitMQ queues/exchanges
- SNS topics, SQS queues
- HTTP endpoints for query services (`GET /users/:id`)
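
For the Kafka case, topic literals can be pulled out together with a `path:line` citation, which is the evidence format the conversational rules ask for. This is a minimal sketch: the function name is hypothetical, and it only catches string literals written on the same line as `topic:`, so topics passed through constants or config will be missed.

```python
import re

# Matches topic: 'name' or topic: "name" with a literal string value.
TOPIC_LITERAL = re.compile(r"""topic\s*:\s*(['"])([^'"]+)\1""")


def find_topic_literals(source: str, path: str) -> list[tuple[str, str]]:
    """Return (topic, 'path:line') pairs for each literal topic in `source`."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in TOPIC_LITERAL.finditer(line):
            hits.append((match.group(2), f"{path}:{lineno}"))
    return hits
```

Topics referenced through a constant, such as `topic: TOPIC_NAME`, produce no hit here and are worth a follow-up search for where the constant is defined.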

### Domains (candidates)

Strong signals:

- Top-level folder grouping (`src/orders/`, `src/payments/`, `src/shipping/`)
- Bounded-context hints in READMEs, module docs
- Package namespaces (`com.company.orders.*`)
- Ownership files (`CODEOWNERS`, `.codeowners`)

Do not guess domains with low confidence. If unclear, propose "single domain = whole codebase" and let the user split in Phase 4.
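
The folder-grouping signal and the single-domain fallback can be sketched together. Everything here is illustrative: the function name, the ignore list of generic folder names, and the rule that fewer than two candidate folders triggers the fallback are all assumptions.

```python
from pathlib import Path

# Hypothetical list of folder names that rarely represent a business domain.
GENERIC_FOLDERS = {"utils", "lib", "common", "shared"}


def propose_domains(src_root: str, repo_name: str) -> list[str]:
    """Propose domain names from top-level folders, or fall back to one domain."""
    base = Path(src_root)
    folders = []
    if base.is_dir():
        folders = sorted(
            p.name
            for p in base.iterdir()
            if p.is_dir()
            and not p.name.startswith(".")
            and p.name not in GENERIC_FOLDERS
        )
    # Too little structure to infer domains: propose the whole codebase as one.
    if len(folders) < 2:
        return [repo_name]
    return folders
```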

### Containers

Databases, caches, and object stores referenced in config, env vars, or client instantiation:

- Postgres / MySQL / SQLite / Mongo / DynamoDB / Cassandra
- Redis / Memcached
- S3 buckets, GCS buckets
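
Connection URLs in env vars are one of the easier places to spot containers, because the URL scheme usually names the technology. The sketch below covers only some of the stores listed above; the function name and the example variable names are hypothetical.

```python
from urllib.parse import urlsplit

# Common connection-URL schemes mapped to the container types listed above.
SCHEME_TO_CONTAINER = {
    "postgres": "Postgres",
    "postgresql": "Postgres",
    "mysql": "MySQL",
    "mongodb": "Mongo",
    "mongodb+srv": "Mongo",
    "redis": "Redis",
    "rediss": "Redis",
    "s3": "S3 bucket",
    "gs": "GCS bucket",
}


def containers_from_env(env: dict[str, str]) -> dict[str, str]:
    """Map env var names to a container type when the value is a known URL."""
    found: dict[str, str] = {}
    for key, value in env.items():
        try:
            scheme = urlsplit(value).scheme.lower()
        except ValueError:
            continue  # Not a parseable URL; ignore it.
        if scheme in SCHEME_TO_CONTAINER:
            found[key] = SCHEME_TO_CONTAINER[scheme]
    return found
```

For example, `containers_from_env({"DATABASE_URL": "postgres://db:5432/orders", "PORT": "3000"})` reports only `DATABASE_URL`, as a Postgres container.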

### Output of Phase 2

An internal draft map:

```
domains:
  - name: <candidate>
    confidence: high|medium|low
    services: [...]
    agents: [...]
services:
  - name: <candidate>
    path: <dir>
    sends: [...]

```