
agentic-system-design


Prescriptive Q&A workflow for designing agentic pipelines, multi-model councils, sub-agent hierarchies, and tool-loop hardening for any domain. Use when the user asks to "design an agent", "design a multi-agent system", "should I use a council/debate", "build a [domain] review agent" (HAZOP, finance, tutorial, marketing, compliance, accounting), "real agency vs workflow", "how to add sub-agents", "AI for [domain] review", or names patterns like "orchestrator-worker", "evaluator-optimizer", "Magentic", "ReAct", "plan-and-execute", "handoffs". Walks the user through 12 stages one question at a time and emits a buildable design doc with citations. Do NOT use for general coding questions, single-shot prompt tuning, or bare "use Claude to do X" requests with no agency requirement.

Design

What this skill does


# Agentic System Design

Prescriptive design partner for any agentic system: tool-loop agents, multi-model councils, sub-agent hierarchies, plan-execute pipelines, handoff networks. Built around 2026 SOTA practice from Anthropic, OpenAI, Microsoft, and the multi-agent-debate literature. Outputs a buildable design doc.

Opinionated by design: most "agent" requests are workflows; most "council" requests are wasteful; most "depth-3" hierarchies are depth-2 with a tool that needed renaming. The skill filters ruthlessly *before* the user starts building.

---

## Quick Start

**User just asks:**

```
"Design an agent that does HAZOP analysis"
"Should I use a multi-model council for finance review?"
"Help me design an AI tutor pipeline"
"I want to build an AI brand strategist — orchestrator-worker or handoff?"
"Add sub-agents to my research pipeline"
"Real agency or workflow?"
```

**Claude Code will:**

1. Run the 12-stage Q&A flow, **one question at a time** (Socratic, à la `superpowers:brainstorming`)
2. Filter through the agent-washing rubric, council-decision test, and depth-3 sanity check before committing the user to anything expensive
3. Pick a pattern (1 of 7), a council shape (0 or 1 of 7), persona roster, model routing, and tool-loop config
4. Emit a design doc with citations, anti-patterns surfaced, and a build order

**You do not write code in this skill.** The output is a design doc. Implementation lives in `agentic-toolkit` (the companion plugin) and the user's repo.

---

## Critical Rules

### 1. One Question at a Time

This is a brainstorming skill, not a form. Ask one question, wait for the answer, then ask the next. Multiple-choice when possible. No question dumps.

If the user pastes a wall of context, extract the answers they've implicitly given, **summarize them back**, and ask only the missing ones.

### 2. No Premature Implementation

Do **not** write code, scaffolding, prompt templates, or pseudo-code during the Q&A flow. The skill's value is the discovery loop. Code goes in the design doc's "build order" section as a checklist, not a draft.

If the user says "just write the code" before stage 12, push back: *"Let me lock the pattern and roster first — implementations are 10× harder to fix than designs."*

### 3. Filter Before Designing

Three hard filters fire **early** and **explicitly**:

- **Stage 4** — Real-agency-vs-workflow rubric (`≥4` yes → real agency; `≤2` → it's a workflow, stop calling it an agent)
- **Stage 6** — Council-decision test (4 conditions; if zero hold, single agent + retry)
- **Stage 11** — Depth-3 sanity check (3 named cases only)

If the user "fails" a filter, the skill **does not lecture** — it pivots cleanly: *"Looks like a workflow. Here's the right shape for that, and how to add agentic frosting later if it pays off."*
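The Stage 4 thresholds can be sketched as a small scoring function. This is a minimal illustration under stated assumptions, not part of the skill itself: the name `classify_agency` is hypothetical, and the `"borderline"` label for a score of exactly 3 is an assumption, since the rule above only defines the `≥4` and `≤2` cases.

```python
def classify_agency(answers: list[bool]) -> str:
    """Score the six Stage 4 rubric answers: one point per 'yes'."""
    if len(answers) != 6:
        raise ValueError("the Stage 4 rubric has exactly six questions")
    score = sum(answers)  # True counts as 1, False as 0
    if score >= 4:
        return "real agency"
    if score <= 2:
        return "workflow"
    # Assumption: the rubric does not define a score of exactly 3,
    # so this sketch flags it for a follow-up question.
    return "borderline"


if __name__ == "__main__":
    print(classify_agency([True, True, True, True, False, False]))
```

A caller would collect the six yes/no answers one at a time, per Critical Rule 1, and only then compute the verdict.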

### 4. Prescriptive Output, Not a Catalog

The output is a **design doc** the user can hand to an engineer. Every recommendation has a rationale and a citation. No "here are seven options, pick one" — the skill picks, the user pushes back if they disagree.

### 5. Cite Primary Sources

Every non-trivial claim ends in an inline markdown URL. Anthropic, OpenAI, Microsoft, arXiv, OpenReview. No LinkedIn-thought-leadership, no Medium summaries, no "as everyone knows."

---

## The 12-Stage Q&A Flow

Each stage is one question (or a tight cluster). Pause between stages. Score, branch, then move on.

### Stage 1 — Use-case capture

**Ask:**
> What does the system do, what's the output, and what's the blast radius of one bad output?

"Bad output = bad tweet" and "bad output = LOPA mis-scoping that misses a hazard" demand entirely different designs. Blast radius gates everything downstream — councils, judges, human gating.

**Listen for:** the noun (tweet / journal entry / HAZOP cause / lesson script), the verb (generate / classify / review / debate), and the consequence (visible to whom, reversible or not, regulated or not).

**Output:** one-paragraph use-case statement + blast-radius tag (low / medium / high / regulated). See `references/case-*.md` for shape templates.

### Stage 2 — Operational mode

**Ask:**
> How does the system get triggered to run?

| Mode | Description | Examples |
|------|-------------|----------|
| **A. Synchronous request-response** | User/API call → council runs → returns one answer | Brandling caption gen, HAZOP review on demand, finance audit on a specific entity |
| **B. Batch** | Process N items offline; returns aggregated results | Review 1,000 contracts overnight, score a quarter's transactions |
| **C. Event-driven** | System wakes on an external signal | CVE feed, customer complaint email, log anomaly, deploy event |
| **D. Continuous / scheduled** | Runs on a cadence with persistent state between runs | Daily cost-anomaly scan, weekly compliance sweep, ongoing telemetry watch |

This question is asked **early** because the trigger model determines a separate infrastructure layer (scheduler / queue / listener / state persistence) that the rest of the skill **does not cover**. Catching it now prevents users from designing a beautiful council and discovering at implementation time that they have no answer for "how does it actually wake up?"

**Branching:**

| Mode | Skill behavior |
|------|----------------|
| **A** | Continue normally to Stage 3. The rest of this skill assumes sync request-response and is fully sufficient. |
| **B** | Continue, but flag in the design doc: "needs queue + idempotency keys + rate limiting layer — design separately." |
| **C** or **D** | Pause. The council/persona design from this skill applies, but the user also needs a **sensor/listener layer** (C) or **scheduler + state-persistence layer** (D), plus dedup, backpressure, and dead-letter handling. **These are out of scope here.** Continue with the council half if the user wants it; flag explicitly that the trigger/infra half needs separate design (future `agentic-platforms` skill). |

**Output of stage:** operational-mode tag (A/B/C/D) + trigger surface (HTTP endpoint / cron / queue subscriber / webhook listener / etc.) + list of out-of-scope infrastructure layers (if B/C/D).
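As a rough sketch, the Stage 2 output can be modeled as a small record built from the branching table. The names `StageTwoOutput`, `OUT_OF_SCOPE`, and `stage_two_output` are hypothetical; the layer lists are copied from the table above and are not an exhaustive infrastructure checklist.

```python
from dataclasses import dataclass, field

# Out-of-scope infrastructure layers per operational mode,
# taken from the Stage 2 branching table.
OUT_OF_SCOPE = {
    "A": [],
    "B": ["queue", "idempotency keys", "rate limiting"],
    "C": ["sensor/listener layer", "dedup", "backpressure",
          "dead-letter handling"],
    "D": ["scheduler + state-persistence layer", "dedup", "backpressure",
          "dead-letter handling"],
}


@dataclass
class StageTwoOutput:
    mode: str                      # operational-mode tag: A, B, C, or D
    trigger_surface: str           # e.g. "HTTP endpoint", "cron"
    out_of_scope: list[str] = field(default_factory=list)


def stage_two_output(mode: str, trigger_surface: str) -> StageTwoOutput:
    """Build the Stage 2 output record for a given mode and trigger."""
    tag = mode.upper()
    if tag not in OUT_OF_SCOPE:
        raise ValueError(f"unknown operational mode: {mode!r}")
    # Copy the list so callers cannot mutate the shared table.
    return StageTwoOutput(tag, trigger_surface, list(OUT_OF_SCOPE[tag]))


if __name__ == "__main__":
    print(stage_two_output("d", "cron"))
```

Mode A yields an empty out-of-scope list, which matches the table's note that the rest of the skill is sufficient for synchronous request-response.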

**Anti-pattern to surface:** a "council that runs continuously" without addressing where the trigger comes from, how state persists between runs, or how to dedup signals. This is half a system. If the user can't name the trigger surface, they're not ready to build mode C or D — recommend they ship mode A first against a manual trigger, then upgrade.

### Stage 3 — Boundary identification

**Ask:**
> What can the system **not** touch? Existing safety guards, deterministic computations, regulatory constraints, segregation-of-duties.

A council blowing through these breaks the safety guarantees the rest of the system depends on (the yf-hazop V3 state-masking + V4 LOPA-independence lesson: agents consume *masked* slices and stop short of deterministic math).

**Listen for:** "LOPA math is pure-Python", "GAAP forbids the model from inventing account codes", "incident DB is read-only to the agent".

**Output:** explicit boundary list — input fields off-limits, outputs off-limits, deterministic ops not to replace.

**Branch:** if boundaries dominate (80%+ of the work is deterministic), warn the user that "agent" may be overkill; a thin LLM cap on a deterministic core may be the better fit.
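The Stage 3 branch condition can be expressed as a one-line check. This is only a sketch: the function name `agent_may_be_overkill` is hypothetical, and how the deterministic share of the work is estimated (here a fraction between 0 and 1 supplied by the caller) is an assumption the source does not specify.

```python
def agent_may_be_overkill(deterministic_fraction: float,
                          threshold: float = 0.8) -> bool:
    """Return True when deterministic work dominates (80%+ by default)."""
    if not 0.0 <= deterministic_fraction <= 1.0:
        raise ValueError("deterministic_fraction must be between 0 and 1")
    return deterministic_fraction >= threshold


if __name__ == "__main__":
    # Hypothetical estimate: 85% of the pipeline is deterministic math.
    print(agent_may_be_overkill(0.85))
```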

### Stage 4 — Real-agency-vs-workflow rubric (the FILTER)

**Ask all six. Score 1 point per yes:**

1. **Tool order is unknown at design time.** Does the model decide which tool to call next from observations, or is the sequence hardcoded in code/graph edges?
2. **Tool count varies per run.** Can the same input produce 2 tool calls in one run and 14 in another?
3. **The model can spawn new sub-agents at runtime.** Not pre-declared workers — actual runtime delegation of an unspecified subtask.
4. **Environmental feedback shapes the next decision.** Tool outputs influence the next step; the model isn't running a stale plan.
5. **A real sto
Files: 13
Size: 113.5 KB
Complexity: 63/100
Category: Design
