doc-snapshot-agent
Automatically illustrate Markdown documents by turning image markers into browser screenshots or AI-generated images, then writing an image-enriched Markdown output. Use when a document needs screenshots, generated visuals, semantic image placement, or end-to-end document illustration automation.
## When to Use
Load this skill when a Markdown document needs real images — screenshots of live web pages, AI-generated editorial illustrations, or a rerun that only fixes image placement in an already-processed file.
Use it when the user asks to:
- add images to a Markdown article
- process a case file with image markers
- capture screenshots for documentation
- generate article visuals and insert them into a document
- rerun or fix image placement in an already processed document
Do not use it for pure text editing, proofreading, or translation — those tasks do not benefit from browser automation or image generation and should be handled directly.
## Architecture
This skill has a single entry point (this file) plus four sibling references for depth. It does not create hidden memory folders, does not persist browser state, and does not send any data beyond what the target workflow requires.
All paths (input cases, output images, illustrated Markdown, cache) resolve under one `{project-root}` the user names at the start of the run. Browser work routes exclusively through the Playwright MCP server; generated images route through a bundled Python script that calls OpenRouter.
## Quick Start
1. **Check Playwright MCP tools** — confirm `mcp__playwright__browser_navigate` and other `mcp__playwright__*` tools are available. If missing, send the user the install snippet from `references/mcp-setup.md` and stop.
2. **Confirm the project root** — ask once; default to `/tmp/doc-snapshot-agent` if the user has no preference.
3. **Inspect existing artifacts** — reuse anything already on disk (see Incremental Execution).
4. **Parse the case file** — merge markers from heading form, HTML-comment form, and the Image Summary table.
5. **Capture, generate, place, write README** — follow the Workflow section.
## Quick Reference
| Topic | File |
|-------|------|
| Install Playwright MCP for each client, grant permissions, runtime setup | `references/mcp-setup.md` |
| Navigate, snapshot, login, capture, verify — full browser loop and tool patterns | `references/browser-capture.md` |
| Build and maintain site-specific navigation knowledge | `references/site-explorer.md` |
| Prompt construction and script usage for generated images | `references/image-generation.md` |
| Image generation CLI | `scripts/generate_image.py` |
## Approach Selection
| Situation | Best path | Why |
|-----------|-----------|-----|
| Article already has screenshots in `output/{article-id}/raw/` and the user only wants the Markdown rebuilt | Skip capture, rerun Step 5 (Illustrated Markdown) | Browser work is expensive; Markdown regeneration is cheap |
| Marker type is `screenshot` and the page is publicly reachable | Playwright MCP navigate → snapshot → capture | Reliable, inspectable, handles JS rendering |
| Marker type is `screenshot` and the page is behind auth | Playwright MCP with `PLAYWRIGHT_CRED_*` env vars | Keeps secrets out of prompts and the transcript |
| Marker type is `generated` (editorial, hero, conceptual) | `scripts/generate_image.py` via OpenRouter | Screenshots cannot render conceptual imagery |
| Marker landed on the wrong paragraph | Reparse case file, reapply semantic placement | Re-capturing won't fix placement bugs |
| Required MCP tools are missing in the runtime | Stop, point user to `references/mcp-setup.md` | Workflow cannot proceed without MCP |
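The table above can be read as a small decision function. The sketch below is illustrative only: the function name, the parameter names, the returned labels, and the order in which the rows are checked are assumptions, not part of the skill.

```python
def choose_path(marker_type, *, mcp_available=True, rebuild_only=False,
                placement_only=False, needs_auth=False):
    """Map one situation from the Approach Selection table to a path label.

    All labels and the precedence of the checks are hypothetical.
    """
    # Step 0 runs on every execution, so a missing MCP server stops the run first.
    if not mcp_available:
        return "stop-and-setup-mcp"
    # Placement bugs are fixed by reparsing, never by re-capturing.
    if placement_only:
        return "reparse-and-replace"
    # Screenshots already exist in raw/ and only the Markdown is wanted.
    if rebuild_only:
        return "rebuild-markdown"
    if marker_type == "generated":
        return "generate-image"
    if marker_type == "screenshot":
        return "capture-with-credentials" if needs_auth else "capture-public"
    raise ValueError(f"unknown marker type: {marker_type!r}")
```

For example, `choose_path("screenshot", needs_auth=True)` selects the credentialed capture path, while any call with `mcp_available=False` stops before other options are considered.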
## Workflow
### Step 0: Verify Playwright MCP
Run this check at the start of **every** execution, not just the first time.
1. Detect tools whose names start with `mcp__playwright__`. Required, each with that prefix: `browser_navigate`, `browser_snapshot`, `browser_take_screenshot`.
2. If they are missing, stop and hand the user the matching install snippet from `references/mcp-setup.md` (Claude Code, Codex, VS Code/Cursor/Kiro, Claude Desktop, or standalone). Include the `permissions.allow: ["mcp__playwright__*"]` note for Claude Code and Codex.
3. After the user installs and restarts the client, resume from here rather than restarting the run.
Do **not** substitute direct Playwright library calls or any browser tool that lacks the `mcp__playwright__` prefix. If the prefix is missing, the call does not go through the MCP server.
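As a rough illustration of the check, the sketch below filters a list of tool names by prefix and reports which required tools are absent. How the runtime exposes its tool list is client-specific; `available_tools` and the function name are hypothetical.

```python
PREFIX = "mcp__playwright__"
REQUIRED_TOOLS = ("browser_navigate", "browser_snapshot", "browser_take_screenshot")


def missing_playwright_tools(available_tools):
    """Return the required tool names (without prefix) that are not available.

    Only names carrying the mcp__playwright__ prefix count; a bare
    "browser_navigate" does not go through the MCP server.
    """
    present = {
        name[len(PREFIX):] for name in available_tools if name.startswith(PREFIX)
    }
    return [tool for tool in REQUIRED_TOOLS if tool not in present]
```

An empty result means the run can continue; a non-empty result means stopping and pointing the user to `references/mcp-setup.md`.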
### Step 0.5: Confirm the project root
Ask once:
> Which directory should I use as the project root for this run?
- If the user provides a path, use it as `{project-root}`.
- If the user says "no preference", skips, or does not answer, default to `/tmp/doc-snapshot-agent`.
- Create the directory if it does not exist.
All subsequent paths (`cases/`, `output/`, `.cache/`, `scripts/`, `references/`) resolve under `{project-root}/`.
Recommended layout inside `{project-root}/`:
```text
{project-root}/
├── cases/
│   └── {article-id}.md
├── output/
│   ├── {article-id}/
│   │   ├── raw/
│   │   │   ├── A1_example.png
│   │   │   └── A2_example.png
│   │   ├── A1_example.png
│   │   ├── A2_example.png
│   │   └── README.md
│   └── markdowns/
│       └── {article-id}.md
└── .cache/
    └── screenshots/
        └── {article-id}/
```
Conventions:
- `cases/` holds the source Markdown.
- `output/{article-id}/raw/` holds original browser screenshots — **never overwrite** files here.
- `output/{article-id}/` holds post-processed assets that the final Markdown references.
- `output/markdowns/` holds the final illustrated Markdown.
- `.cache/screenshots/` holds reusable screenshot cache entries.
If the user specifies a different layout, follow their instruction.
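A minimal sketch of the default layout, assuming the user's answer arrives as a plain string. The function names and dictionary keys are hypothetical; only the default path and the directory names come from this document.

```python
from pathlib import Path

DEFAULT_ROOT = Path("/tmp/doc-snapshot-agent")


def resolve_project_root(answer):
    """Turn the user's answer into a project root, falling back to the default."""
    text = (answer or "").strip()
    if not text or text.lower() == "no preference":
        return DEFAULT_ROOT
    return Path(text).expanduser()


def layout_paths(root, article_id):
    """Return the recommended directories for one article under the root."""
    output = root / "output"
    return {
        "cases": root / "cases",
        "raw": output / article_id / "raw",        # original screenshots, never overwritten
        "assets": output / article_id,             # post-processed images
        "markdowns": output / "markdowns",         # final illustrated Markdown
        "cache": root / ".cache" / "screenshots" / article_id,
    }


def create_layout(paths):
    """Create every directory in the layout if it does not exist yet."""
    for path in paths.values():
        path.mkdir(parents=True, exist_ok=True)
```

Keeping path computation separate from directory creation makes it easy to swap in a user-specified layout before anything touches the disk.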
### Step 1: Parse the case file
Merge image requirements from three sources:
1. inline heading-based screenshot markers
2. inline `<!-- IMAGE: ... -->` markers
3. the `Image Summary` table
For each image, record: type (`screenshot` or `generated`), filename, marker id if present, description or purpose, source URL if present, post-processing instruction if present, exact inline location if present, and whether semantic placement is still required. Also detect the target websites referenced by the article.
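The record described above could be modeled roughly as follows. This is a sketch under assumptions: the field names are hypothetical, and because the grammar inside `<!-- IMAGE: ... -->` is not specified here, the helper only locates markers and returns their raw body text rather than parsing it.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches <!-- IMAGE: ... --> and captures the body without surrounding whitespace.
IMAGE_COMMENT = re.compile(r"<!--\s*IMAGE:\s*(.*?)\s*-->", re.DOTALL)


@dataclass
class ImageRequirement:
    kind: str                               # "screenshot" or "generated"
    filename: str
    marker_id: Optional[str] = None
    description: Optional[str] = None
    source_url: Optional[str] = None
    post_processing: Optional[str] = None
    inline_line: Optional[int] = None       # exact inline location, if known
    needs_semantic_placement: bool = True   # cleared once a location is fixed


def find_comment_markers(markdown):
    """Return (line_number, raw_body) for each HTML-comment image marker."""
    results = []
    for match in IMAGE_COMMENT.finditer(markdown):
        line_number = markdown.count("\n", 0, match.start()) + 1
        results.append((line_number, match.group(1)))
    return results
```

Markers found this way would still need to be merged with heading-based markers and the `Image Summary` table before capture begins.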
### Step 2: Prepare the environment
- create output directories
- check the screenshot cache for reusable entries
- load credentials from environment variables (pattern: `PLAYWRIGHT_CRED_{SERVICE}_{FIELD}`)
- re-confirm Playwright MCP tools are present
- if the Chromium runtime is missing, run `npx playwright install chromium` (see `references/mcp-setup.md`)
- if the target flow needs login/signup/invite/verification and the required information is not already supplied, pause and ask the user before taking any account-specific action
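Credential loading from the `PLAYWRIGHT_CRED_{SERVICE}_{FIELD}` pattern might look like the sketch below. The function name is hypothetical, and treating the service as upper case and the returned field names as lower case is an assumption about the convention.

```python
import os

CRED_PREFIX = "PLAYWRIGHT_CRED_"


def load_credentials(service, environ=None):
    """Collect fields for one service from PLAYWRIGHT_CRED_{SERVICE}_{FIELD} variables.

    Returns a dict such as {"email": ..., "password": ...}. Values should be
    passed straight to the browser tool and never echoed into prompts or logs.
    """
    env = os.environ if environ is None else environ
    prefix = f"{CRED_PREFIX}{service.upper()}_"
    return {
        key[len(prefix):].lower(): value
        for key, value in env.items()
        if key.startswith(prefix)
    }
```

An empty dictionary signals that the required information was not supplied, which is the point at which the workflow pauses and asks the user.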
### Step 2.5: Understand the target site
Bad screenshots usually come from landing on the wrong page, not from the wrong capture command. Before capturing:
1. Check for existing site knowledge under `$IMAGE_AGENT_SITE_KNOWLEDGE_DIR/` and `$IMAGE_AGENT_SITE_LEARNING_DIR/`.
2. Derive a stable `site-key` from the domain (`memclaw.me` → `memclaw`, `app.felo.ai` → `felo`).
3. If `{site-key}.md` exists and is recent, read it before browsing.
4. If knowledge is missing or stale, run a structured site exploration — see `references/site-explorer.md` — and save findings for reuse.
5. Map every screenshot description to a specific page or UI state: target URL or click path, required visible elements, scroll/tab/expand actions needed.
6. Append new knowledge to the site knowledge files whenever browsing discovers something worth remembering.
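One way to derive the `site-key` in step 2 is to take the label just before the top-level domain. This sketch reproduces the two examples given above; the function name is hypothetical, and multi-part suffixes such as `co.uk` are not handled.

```python
from urllib.parse import urlparse


def site_key(url_or_domain):
    """Derive a stable site key from a URL or bare domain.

    Assumes a single-label top-level domain; "example.co.uk" would yield "co".
    """
    if "://" in url_or_domain:
        host = urlparse(url_or_domain).hostname or ""
    else:
        # Bare domain: drop any path or port that may follow it.
        host = url_or_domain.split("/")[0].split(":")[0]
    labels = [label for label in host.lower().split(".") if label]
    if not labels:
        raise ValueError(f"cannot derive a site key from {url_or_domain!r}")
    return labels[-2] if len(labels) >= 2 else labels[0]
```

Because the key depends only on the host, every URL on the same site maps to the same `{site-key}.md` file.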
### Step 3: Capture browser screenshots
Follow `references/browser-capture.md` for the full navigate → snapshot → act → wait → capture → verify loop and the concrete tool patterns.
Typical flow:
- open the target website
- log in if required (credentials from env)
- navigate to the correct page or UI state
- wait for key content to load
- resize the viewport if the requested layout needs it
- save screenshots to `{project-root}/output/{article-id}/raw/`
Naming rule:
- if a marker id exists, save as `{marker-id}_{filename}` (e.g. `A1_workspace-dashboard.png`)
- otherwise use the original filename
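The naming rule can be sketched as a one-line helper. The function name is hypothetical, and the fallback for a missing marker id (using the filename unchanged) is an assumption based on the rule as far as it is stated here.

```python
def screenshot_filename(filename, marker_id=None):
    """Prefix the filename with the marker id when one exists.

    The no-marker branch (return the filename as-is) is an assumption.
    """
    return f"{marker_id}_{filename}" if marker_id else filename
```

With marker id `A1` and filename `workspace-dashboard.png`, this yields `A1_workspace-dashboard.png`, matching the example in the rule.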