
quality-playbook


Run a complete quality engineering audit on any codebase. Derives behavioral requirements from the code, generates spec-traced functional tests, runs a three-pass code review with regression tests, executes a multi-model spec audit (Council of Three), and produces a consolidated bug report with TDD-verified patches. Finds the 35% of real defects that structural code review alone cannot catch. Works with any language. Trigger on 'quality playbook', 'spec audit', 'Council of Three', 'fitness-to-purpose', or 'coverage theater'.


What this skill does


# Quality Playbook Generator

## Plan Overview — read this first, then explain it to the user

Before reading any other section of this skill, understand the plan and its dependencies. Each phase produces artifacts that the next phase depends on. Skipping or rushing a phase means every downstream phase works from incomplete information.

**Phase 0 (Prior Run Analysis):** If previous quality runs exist, load their findings as seed data. This is automatic and only applies to re-runs.

**Phase 1 (Explore):** Run the v1.5.3 documentation intake first (`python -m bin.reference_docs_ingest <target>` to walk `reference_docs/` — `cite/` files produce `quality/formal_docs_manifest.json` records; top-level files are loaded as Tier 4 context via `reference_docs_ingest.load_tier4_context(<target>)`). Then explore the codebase in three stages: open exploration driven by domain knowledge, domain-knowledge risk analysis, and selected structured exploration patterns. Write all findings to `quality/EXPLORATION.md`. This file is the foundation — Phase 2 reads it as its primary input.

**Phase 2 (Generate):** Read EXPLORATION.md and produce the quality artifacts: requirements, constitution, functional tests, code review protocol, integration tests, spec audit protocol, TDD protocol. (`AGENTS.md` at the target's repo root is generated by the orchestrator AFTER Phase 6, not by you in Phase 2 — see "File 6" below for the contract.)

**Phase 3 (Code Review):** Run the three-pass code review against HEAD. Write regression tests for every confirmed bug. Generate patches.

**Phase 4 (Spec Audit):** Three independent AI auditors review the code against requirements. Triage with verification probes. After triage, the same Council runs the v1.5.3 Layer-2 semantic citation check — one prompt per reviewer, structured per-REQ verdicts for every Tier 1/2 citation, output to `quality/citation_semantic_check.json`. Write regression tests for net-new findings.

**Phase 5 (Reconciliation):** Close the loop — every bug from code review and spec audit is tracked, regression-tested or explicitly exempted. Run TDD red-green cycle. Finalize the completeness report.

**Phase 6 (Verify):** Run self-check benchmarks against all generated artifacts. Check for internal consistency, version stamp correctness, and convergence.

**Phase 7 (Present, Explore, Improve):** Present results to the user with a scannable summary table, offer drill-down on any artifact, and provide a menu of improvement paths (iteration strategies, requirement refinement, integration test tuning). This is the interactive phase where the user takes ownership of the quality system.

Every bug found traces back to a requirement, and every requirement traces back to an exploration finding.

**The critical dependency chain:** Exploration findings → EXPLORATION.md → Requirements → Code review + Spec audit → Bug discovery. A shallow exploration produces abstract requirements. Abstract requirements miss bugs. The exploration phase is where bugs are won or lost.
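The traceability rule above (every bug traces to a requirement, every requirement traces to an exploration finding) can be sketched as a small consistency check. This is a minimal illustration only: the `Requirement` and `Bug` classes, the `broken_traces` function, and the ID formats are hypothetical and are not part of the playbook's actual artifacts or tooling.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Requirement:
    req_id: str
    finding_id: Optional[str]  # exploration finding this requirement came from


@dataclass(frozen=True)
class Bug:
    bug_id: str
    req_id: Optional[str]  # requirement this bug violates


def broken_traces(
    bugs: Iterable[Bug],
    requirements: Iterable[Requirement],
    finding_ids: Set[str],
) -> List[str]:
    """Describe every bug or requirement whose upstream link is missing."""
    requirements = list(requirements)
    known_reqs = {req.req_id for req in requirements}
    problems = []
    for bug in bugs:
        if bug.req_id not in known_reqs:
            problems.append(f"{bug.bug_id}: no matching requirement")
    for req in requirements:
        if req.finding_id not in finding_ids:
            problems.append(f"{req.req_id}: no matching exploration finding")
    return problems


# Hypothetical data: one clean chain, one orphan bug, one orphan requirement.
findings = {"F-1"}
reqs = [Requirement("REQ-1", "F-1"), Requirement("REQ-2", None)]
bugs = [Bug("BUG-1", "REQ-1"), Bug("BUG-2", "REQ-9")]
print(broken_traces(bugs, reqs, findings))
```

An empty result means the chain is intact. Any entry points at a link that was skipped upstream, which is usually a sign that exploration or requirement derivation was too shallow.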

**MANDATORY FIRST ACTION:** After reading and understanding the plan above, print the following message to the user, then explain the plan in your own words — what you'll do, what each phase produces, and why the exploration phase matters most. Emphasize that exploration starts with open-ended domain-driven investigation, followed by domain-knowledge risk analysis that reasons about what goes wrong in systems like this, then supplemented by selected structured patterns. Do not copy the plan verbatim; paraphrase it to demonstrate understanding.

> Quality Playbook v1.5.6 — by Andrew Stellman
> https://github.com/andrewstellman/quality-playbook

Generate a complete quality system tailored to a specific codebase. Unlike test stub generators that work mechanically from source code, this skill explores the project first — understanding its domain, architecture, specifications, and failure history — then produces a quality playbook grounded in what it finds.

## How to run this — v1.5.4 self-encoded invocation contract

If the operator hands you this skill (or points you at any QPB-installed target) and says **"Run the Quality Playbook"** — possibly with a hint like "this is a bootstrap run" or "run on itself" or "self-audit" — this section tells you exactly what to do. The operator should not need to provide additional instructions; the canonical invocation, the defaults, the guardrails, and the output contract all live here.

### Pick your execution mode

QPB ships in two execution shapes. Pick the one that matches your runtime — the wrong choice produces the codex-on-codex indirection pathology surfaced by the 2026-04-30 bootstrap test.

| Mode | When this is you | What you do |
|------|------------------|-------------|
| **A. Skill-direct (UI-context)** | You are a coding agent (Claude Code, Cursor, Copilot, Codex desktop, etc.) handed this skill in your own chat. Your runtime IS the reasoning loop — you read files, you write files, you decide. | Walk through Phase 1 → Phase 6 yourself using the externalized phase prompts in `phase_prompts/`. Write artifacts into the target's `quality/` directory directly. No subprocess, no runner. |
| **B. Runner-driven (CLI-automation)** | The operator is invoking `python3 -m bin.run_playbook` deliberately — to batch across multiple targets, drive a headless CI run, or fan out per-phase work to a different model than the one reading this prose. | The orchestrator spawns a CLI agent (`claude`, `copilot`, `codex`, or `cursor`) per phase. You (or whoever is reading this) are the operator-side control loop, not the per-phase reasoner. |

**Both modes use the same phase prompt content.** The `phase_prompts/*.md` files at the repo root are the single source of truth: Mode B loads them through `bin/run_playbook.py::_load_phase_prompt`, and Mode A walkthroughs read them directly. The two modes differ only in WHO drives: you (Mode A), or the orchestrator, which spawns a CLI agent as a subprocess (Mode B).

**When in doubt, default to Mode A.** If the operator wanted runner-driven invocation they would have run the runner themselves; if they pasted "Run the Quality Playbook" into your chat, they want you to drive. The Mode B section below tells you what to do *if* the operator explicitly invokes the runner.

### Mode A — skill-direct walkthrough (UI-context)

The operator's prompt is just **"Run the Quality Playbook"** (or "run on itself", "self-audit", etc.). You drive every phase inline.

For each of Phases 1 through 6, in order:

1. **Load the phase prompt.** Read `phase_prompts/phaseN.md` (resolve via the same install-location fallback list documented for `references/` below). For `phase1.md`, substitute `{seed_instruction}` (the prelude that says "skip Phase 0/0b" — empty string when seeds are allowed) and `{role_taxonomy}` (the taxonomy block rendered from the role taxonomy below). For `phase2.md` through `phase6.md`, the file is pure-literal — read it verbatim.
2. **Execute the phase per the prompt.** Read the inputs the prompt names, do the analysis, write outputs into the target's `quality/` directory.
3. **STOP at the end-of-phase boundary.** Every phase prompt ends with an "IMPORTANT: Do NOT proceed to Phase N+1" instruction. Honor it. The operator advances to the next phase by saying so.
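The placeholder handling in step 1 can be sketched as follows. This is an assumption-laden illustration, not the runner's `_load_phase_prompt`: the `render_phase_prompt` helper and its signature are hypothetical. It uses plain string replacement rather than `str.format` so that any other braces in a prompt file are left untouched.

```python
def render_phase_prompt(
    template: str,
    phase: int,
    seed_instruction: str = "",
    role_taxonomy: str = "",
) -> str:
    """Fill the two Phase 1 placeholders; return later phases verbatim."""
    if phase != 1:
        # phase2.md through phase6.md are pure-literal: no substitution.
        return template
    return (
        template
        .replace("{seed_instruction}", seed_instruction)
        .replace("{role_taxonomy}", role_taxonomy)
    )


# Hypothetical prompt text, standing in for the contents of phase1.md.
phase1_text = "{seed_instruction}Explore the target.\n{role_taxonomy}"
print(render_phase_prompt(phase1_text, 1, role_taxonomy="(taxonomy block)"))
```

Leaving `seed_instruction` at its default reproduces the "empty string when seeds are allowed" case described in step 1.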

You are responsible — without the orchestrator's structural backstop — for the same source-unchanged invariant the runner enforces: **do NOT modify any file outside the target's `quality/` directory**. In Mode B the gate would catch this; in Mode A you are the gate. The 2026-04-30 bootstrap test specifically failed on a Phase 2 LLM modifying the target's root `AGENTS.md` — the same failure mode applies in Mode A.
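One way to act as your own gate in Mode A is to check every intended write path before touching the filesystem. The sketch below is a hypothetical helper (`is_inside_quality` is not part of QPB) and assumes the invariant is simply "the resolved path must sit under `<target>/quality/`"; the runner's real gate may work differently.

```python
from pathlib import Path


def is_inside_quality(target: str, candidate: str) -> bool:
    """Return True only if candidate resolves to a path under <target>/quality/."""
    quality_dir = (Path(target) / "quality").resolve()
    # Joining first lets relative paths be interpreted against the target;
    # an absolute candidate replaces the target prefix entirely.
    resolved = (Path(target) / candidate).resolve()
    try:
        resolved.relative_to(quality_dir)
    except ValueError:
        return False
    return True


# Hypothetical target path, for illustration only.
print(is_inside_quality("/repo", "quality/EXPLORATION.md"))
print(is_inside_quality("/repo", "AGENTS.md"))
print(is_inside_quality("/repo", "quality/../AGENTS.md"))
```

Resolving before comparing matters: a path such as `quality/../AGENTS.md` looks like it lives under `quality/` but actually targets the repo root, which is the kind of write the invariant forbids.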

For the bootstrap-run (self-audit) variant of Mode A, see "Bootstrap mode" below — the only delta is that the target IS the QPB repo, so cite the same `phase_prompts/` files you read from.

#### Mode A scope — what's covered, what's Mode-B-only

Council 2026-04-30 P1-
Files: 31
Size: 769.7 KB
Complexity: 75/100
Category: Security
