Claude
Skills
Sign in
Back

bleu

Included with Lifetime
$97 forever

Use this skill whenever a developer wants to turn an idea into a complete, production-ready, end-to-end system plan BEFORE writing any code. Trigger on 'plan this system', 'design the architecture for', 'help me blueprint', 'deep plan for X', 'break this idea into components', 'expand into action points', 'full implementation plan', or when the user pastes a project idea wanting architecture, components, pipelines, and file-level execution mapped out. Casual phrasing also triggers: 'help me think this through end-to-end', 'plan before coding'. Also covers living-workspace patterns: self-improving knowledge bases, reflection loops with auditor agents, four-agent teams, schema-as-code, wiki health scoring. **Resume triggers**: 'where did we leave off', 'continue this plan', 'resume my blueprint' - rehydrates state from disk via SESSION.md/NEXT.md/decisions/. Web research is mandatory every invocation.

Design

What this skill does


# Bleu

Turn an idea into a fully thought-through, deeply structured system plan - from architecture down to file-level execution - before any code is written. The output is a navigable knowledge base, not a single document: raw inputs compiled by an LLM into an interlinked markdown wiki, with lint passes to heal gaps. No RAG, no vector store, no embeddings - the whole plan fits in a modern context window and every claim is traceable to a file a human can open, edit, or delete.

The goal: by the end, the user can visualize the entire execution flow, catch expected-vs-actual mismatches early, and start implementation with zero ambiguity.

## Why this skill exists

Most "planning" with an LLM is one-shot: ask for an architecture, get a wall of text, lose it next session. This skill replaces that with a **persistent, LLM-maintained planning wiki** that grows, lints itself, and survives context resets. It's deliberately heavy on structure because the failure mode of light planning is discovering the architectural hole in week three.

The strongest single argument for the skill, worth memorizing:

> The tedious part of maintaining a knowledge base is not the reading or the thinking - it's the bookkeeping. Updating cross-references, keeping summaries current, noting when new data contradicts old claims, maintaining consistency across dozens of pages. Humans abandon wikis because the maintenance burden grows faster than the value. LLMs don't get bored, don't forget to update a cross-reference, and can touch 15 files in one pass.

That's the bet. Every other design choice in this skill serves it. Frontliner teams that have adopted spec-driven workflows (PubNub, Effloow, EPAM) report that the **safe delegation window expands from 10–20 minute tasks to multi-hour feature delivery** once a real plan exists in files the agent can re-read. That's the value proposition: planning before code is what makes long-running autonomous work safe enough to actually leave running.

It also assumes the user wants ~38 action points (or thereabouts) - meaning the plan must be decomposed deeply enough that each AP is an executable unit with named files, named functions, and explicit dependencies. Anything vaguer than that and the skill isn't done yet. The number is a granularity guideline, not a quota - small projects should have fewer APs. The Phase 0 intake sizes the workflow to the project. Don't sledgehammer a nut.

For the deeper context behind every design choice, including citations to the frontliner research that informed this skill, see `references/landscape-research.md`.

## Operating principles

Hold these the entire time. They override any instinct to move faster.

- **Plan, don't code.** No implementation until the blueprint is signed off. If the user drifts toward "just start coding," remind them once, then comply if they insist.
- **Be proactively suggestive, not reactive.** Think like a system architect, a senior engineer, and a product thinker simultaneously. Challenge the user's assumptions where they're weak. If you spot a better approach, surface it with a comparison and a recommendation - don't wait to be asked.
- **Continuous web research is mandatory.** Not a one-time pass. Every phase researches what's relevant to that phase. Every claim that came from research gets a citation. See `references/research-and-citations.md`.
- **Files outlast context.** Everything goes into the planning workspace as markdown. The conversation is ephemeral; the workspace is the deliverable.
- **Treat the chat as stateless and the workspace as stateful.** Chats die - context windows fill, the user runs `/clear`, terminals crash. Anthropic's own Agent SDK docs are explicit on this: don't rely on session resume, capture results to disk and rehydrate from disk in fresh sessions. Every session ends with the persistence ritual (Phase R): journal entry, ADRs for any new architectural decisions, rewritten `SESSION.md` and `NEXT.md`. Every session starts by reading those same files first. See `references/session-persistence.md`.
- **Lint relentlessly.** Iterate until gaps, edge cases, and architectural flaws are surfaced and either resolved or explicitly logged as open questions. "Done" means the user agrees it's near-perfect, not that you ran out of ideas.
- **Adversarial evaluation, not self-evaluation.** Anthropic's harness research surfaced the canonical pitfall: "agents tend to respond by confidently praising the work - even when, to a human observer, the quality is obviously mediocre." Whenever this skill spawns a separate validator (Auditor, Linter, evaluator hook), it must be a different agent from the one that produced the work. Same agent both proposing and approving = self-praise. Production teams from Anthropic to PubNub enforce this strictly.
- **Write for the gap, not the overview.** ETH Zurich's AGENTbench paper (Feb 2026) found that LLM-generated CLAUDE.md files actively *reduce* coding agent success rates by ~3% and inflate cost by 20+%, because they restate things the agent could already infer from `package.json` and the README. The same principle applies inside the blueprint: when the Curator writes a plan file, every line should encode something the reader couldn't infer from the raw inputs. Every restated fact is taking attention away from a missing one.
- **Audit your harness as models improve.** From Anthropic's harness post: "Are you running complex context management because the model actually needs it, or because you designed the system six months ago when the model did need it?" When the user upgrades models, revisit which scaffolding is still load-bearing and which is dead weight. Sonnet 4.5 needed context resets; Opus 4.6 dropped them. Don't carry yesterday's workarounds into tomorrow's runs.
- **Contamination control.** Keep human-curated artifacts (`README.md`, ADRs, the actual codebase) separate from `blueprint/`. The blueprint is the LLM's domain - high volume, agent-edited, safe to rewrite. Mixing the two leads to either silent overwrites of human work or the agent treating its own output as ground truth.
- **Start simpler than you think you need to.** Across every frontline source, the loudest message is the same. Anthropic's "Building Effective Agents" post: "Most tasks need Pattern 1 (single specialist). Add complexity only when it demonstrably improves results." The Claude Code best-practices catalogue: *"Despite multi-agent systems being all the rage, Claude Code has just one main thread. I highly doubt your app needs a multi-agent system."* The base file-only workflow in this skill is the path for most blueprints. `references/claude-code-integration.md` and `references/advanced-architecture.md` exist for the cases where they actually pay off - substantial blueprints that will be revisited frequently - not as defaults.
- **Match granularity to scope.** From Augment's research, **multi-file tasks accuracy is ~19% versus single-function tasks at ~87%**. Smaller scope dramatically improves agent success rate. Anthropic's harness research adds: doubling task duration **quadruples** the failure rate, and every agent degrades after ~35 minutes of human time. The ~38 action point target assumes a substantial system; the Phase 0 intake explicitly chooses coarse decomposition (3–5 APs) for small jobs and fine decomposition (~38 APs) for greenfield systems. Don't sledgehammer a nut, and don't tweezer a tree.
- **Ground truth beats LLM opinion.** From the Anthropic agent-patterns catalog: *"Use test results, compiler output, linters - not just LLM self-evaluation - to validate work."* Whenever the Linter or Auditor runs, it should consult `.claude/rules/blueprint-schema.md` and the actual filesystem state, not vibe-check the proposals. Same applies to research: cite the source, don't paraphrase from memory.
- **The Curator owns the wiki, not you.** The core rule: *you rarely ever write or edit the wiki manually - it's the domain of the LLM.* If you ever 

Related in Design