agentic-system-design
Prescriptive Q&A workflow for designing agentic pipelines, multi-model councils, sub-agent hierarchies, and tool-loop hardening for any domain. Use when the user asks to "design an agent", "design a multi-agent system", "should I use a council/debate", "build a [domain] review agent" (HAZOP, finance, tutorial, marketing, compliance, accounting), "real agency vs workflow", "how to add sub-agents", "AI for [domain] review", or names patterns like "orchestrator-worker", "evaluator-optimizer", "Magentic", "ReAct", "plan-and-execute", "handoffs". Walks the user through 12 stages one question at a time and emits a buildable design doc with citations. Do NOT use for general coding questions, single-shot prompt tuning, or bare "use Claude to do X" requests with no agency requirement.
What this skill does
# Agentic System Design

Prescriptive design partner for any agentic system: tool-loop agents, multi-model councils, sub-agent hierarchies, plan-execute pipelines, handoff networks. Built around 2026 SOTA practice from Anthropic, OpenAI, Microsoft, and the multi-agent-debate literature. Outputs a buildable design doc.

Opinionated by design: most "agent" requests are workflows; most "council" requests are wasteful; most "depth-3" hierarchies are depth-2 with a tool that needed renaming. The skill filters ruthlessly *before* the user starts building.

---

## Quick Start

**User just asks:**

```
"Design an agent that does HAZOP analysis"
"Should I use a multi-model council for finance review?"
"Help me design an AI tutor pipeline"
"I want to build an AI brand strategist — orchestrator-worker or handoff?"
"Add sub-agents to my research pipeline"
"Real agency or workflow?"
```

**Claude Code will:**

1. Run the 12-stage Q&A flow, **one question at a time** (Socratic, à la `superpowers:brainstorming`)
2. Filter through the agent-washing rubric, council-decision test, and depth-3 sanity check before committing the user to anything expensive
3. Pick a pattern (1 of 7), a council shape (0 or 1 of 7), persona roster, model routing, and tool-loop config
4. Emit a design doc with citations, anti-patterns surfaced, and a build order

**You do not write code in this skill.** The output is a design doc. Implementation lives in `agentic-toolkit` (the companion plugin) and the user's repo.

---

## Critical Rules

### 1. One Question at a Time

This is a brainstorming skill, not a form. Ask one question, wait for the answer, then ask the next. Use multiple-choice when possible. No question dumps.

If the user pastes a wall of context, extract the answers they've implicitly given, **summarize them back**, and ask only the missing ones.

### 2. No Premature Implementation

Do **not** write code, scaffolding, prompt templates, or pseudo-code during the Q&A flow.
The skill's value is the discovery loop. Code goes in the design doc's "build order" section as a checklist, not a draft.

If the user says "just write the code" before stage 12, push back: *"Let me lock the pattern and roster first — implementations are 10× harder to fix than designs."*

### 3. Filter Before Designing

Three hard filters fire **early** and **explicitly**:

- **Stage 4** — Real-agency-vs-workflow rubric (`≥4` yes → real agency; `≤2` → it's a workflow, stop calling it an agent)
- **Stage 6** — Council-decision test (4 conditions; if zero hold, single agent + retry)
- **Stage 11** — Depth-3 sanity check (3 named cases only)

If the user "fails" a filter, the skill **does not lecture**. It pivots cleanly: *"Looks like a workflow. Here's the right shape for that, and how to add agentic frosting later if it pays off."*

### 4. Prescriptive Output, Not a Catalog

The output is a **design doc** the user can hand to an engineer. Every recommendation has a rationale and a citation. No "here are seven options, pick one": the skill picks, and the user pushes back if they disagree.

### 5. Cite Primary Sources

Every non-trivial claim ends in an inline markdown URL pointing to a primary source: Anthropic, OpenAI, Microsoft, arXiv, OpenReview. No LinkedIn thought leadership, no Medium summaries, no "as everyone knows."

---

## The 12-Stage Q&A Flow

Each stage is one question (or a tight cluster). Pause between stages. Score, branch, then move on.

### Stage 1 — Use-case capture

**Ask:**

> What does the system do, what's the output, and what's the blast radius of one bad output?

"Bad output = bad tweet" and "bad output = LOPA mis-scoping that misses a hazard" demand entirely different designs. Blast radius gates everything downstream: councils, judges, human gating.

**Listen for:** the noun (tweet / journal entry / HAZOP cause / lesson script), the verb (generate / classify / review / debate), and the consequence (visible to whom, reversible or not, regulated or not).
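The Stage 1 signals can be captured as a small structured record before the one-paragraph statement is written. The sketch below is minimal and rests on stated assumptions. `UseCase` and `tag_blast_radius` are hypothetical names. The tagging rule is also a hypothetical heuristic that the skill does not prescribe: regulated outranks irreversible, which outranks externally visible. Only the four tag values come from the stage's output definition.

```python
from dataclasses import dataclass
from enum import Enum


class BlastRadius(Enum):
    """The four blast-radius tags named in the Stage 1 output."""
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    REGULATED = "regulated"


@dataclass(frozen=True)
class UseCase:
    """Hypothetical record of the Stage 1 'Listen for' signals."""
    noun: str                 # e.g. "tweet", "journal entry", "HAZOP cause"
    verb: str                 # e.g. "generate", "classify", "review", "debate"
    externally_visible: bool  # visible to whom: outside the team?
    reversible: bool          # can a bad output be cleanly undone?
    regulated: bool           # does a regulator care about this output?


def tag_blast_radius(uc: UseCase) -> BlastRadius:
    """Hypothetical heuristic: the most severe consequence wins."""
    if uc.regulated:
        return BlastRadius.REGULATED
    if not uc.reversible:
        return BlastRadius.HIGH
    if uc.externally_visible:
        return BlastRadius.MEDIUM
    return BlastRadius.LOW


if __name__ == "__main__":
    # Illustrative inputs; the boolean flags are assumptions, not source facts.
    tweet = UseCase("tweet", "generate", externally_visible=True,
                    reversible=True, regulated=False)
    hazop = UseCase("HAZOP cause", "review", externally_visible=False,
                    reversible=False, regulated=True)
    print(tag_blast_radius(tweet).value)  # medium
    print(tag_blast_radius(hazop).value)  # regulated
```

The checks run from most to least severe, so a single regulated or irreversible consequence is enough to raise the tag. That ordering fits the stage's premise that blast radius gates councils, judges, and human gating downstream.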
**Output:** one-paragraph use-case statement + blast-radius tag (low / medium / high / regulated). See `references/case-*.md` for shape templates.

### Stage 2 — Operational mode

**Ask:**

> How does the system get triggered to run?

| Mode | Description | Examples |
|------|-------------|----------|
| **A. Synchronous request-response** | User/API call → council runs → returns one answer | Brandling caption gen, HAZOP review on demand, finance audit on a specific entity |
| **B. Batch** | Process N items offline; returns aggregated results | Review 1,000 contracts overnight, score a quarter's transactions |
| **C. Event-driven** | System wakes on an external signal | CVE feed, customer complaint email, log anomaly, deploy event |
| **D. Continuous / scheduled** | Runs on a cadence with persistent state between runs | Daily cost-anomaly scan, weekly compliance sweep, ongoing telemetry watch |

This question is asked **early** because the trigger model determines a separate infrastructure layer (scheduler / queue / listener / state persistence) that the rest of the skill **does not cover**. Catching it now prevents users from designing a beautiful council, only to discover at implementation time that they have no answer for "how does it actually wake up?"

**Branching:**

| Mode | Skill behavior |
|------|----------------|
| **A** | Continue normally to Stage 3. The rest of this skill assumes sync request-response and is fully sufficient. |
| **B** | Continue, but flag in the design doc: "needs queue + idempotency keys + rate-limiting layer; design separately." |
| **C** or **D** | Pause. The council/persona design from this skill applies, but the user also needs a **sensor/listener layer** (C) or a **scheduler + state-persistence layer** (D), plus dedup, backpressure, and dead-letter handling. **These are out of scope here.** Continue with the council half if the user wants it; flag explicitly that the trigger/infra half needs separate design (future `agentic-platforms` skill). |

**Output of stage:** operational-mode tag (A/B/C/D) + trigger surface (HTTP endpoint / cron / queue subscriber / webhook listener / etc.) + list of out-of-scope infrastructure layers (if B/C/D).

**Anti-pattern to surface:** a "council that runs continuously" without addressing where the trigger comes from, how state persists between runs, or how to dedup signals. This is half a system. If the user can't name the trigger surface, they're not ready to build mode C or D. Recommend they ship mode A first against a manual trigger, then upgrade.

### Stage 3 — Boundary identification

**Ask:**

> What can the system **not** touch?

Existing safety guards, deterministic computations, regulatory constraints, segregation of duties. A council blowing through these breaks the safety guarantees the rest of the system depends on. The yf-hazop V3 state-masking + V4 LOPA-independence lesson applies here: agents consume *masked* slices and stop short of deterministic math.

**Listen for:** "LOPA math is pure-Python", "GAAP forbids the model from inventing account codes", "incident DB is read-only to the agent".

**Output:** explicit boundary list: input fields off-limits, outputs off-limits, deterministic ops not to replace.

**Branch:** if boundaries dominate (80%+ of the work is deterministic), warn that "agent" may be overkill. The user may want a thin LLM cap on a deterministic core.

### Stage 4 — Real-agency-vs-workflow rubric (the FILTER)

**Ask all six. Score 1 point per yes:**

1. **Tool order is unknown at design time.** Does the model decide which tool to call next from observations, or is the sequence hardcoded in code/graph edges?
2. **Tool count varies per run.** Can the same input produce 2 tool calls in one run and 14 in another?
3. **The model can spawn new sub-agents at runtime.** Not pre-declared workers, but actual runtime delegation of an unspecified subtask.
4. **Environmental feedback shapes the next decision.** Tool outputs influence the next step; the model isn't running a stale plan.
5. **A real sto