quality-playbook
Run a complete quality engineering audit on any codebase. Derives behavioral requirements from the code, generates spec-traced functional tests, runs a three-pass code review with regression tests, executes a multi-model spec audit (Council of Three), and produces a consolidated bug report with TDD-verified patches. Finds the 35% of real defects that structural code review alone cannot catch. Works with any language. Trigger on 'quality playbook', 'spec audit', 'Council of Three', 'fitness-to-purpose', or 'coverage theater'.
What this skill does
# Quality Playbook Generator
## Plan Overview — read this first, then explain it to the user
Before reading any other section of this skill, understand the plan and its dependencies. Each phase produces artifacts that the next phase depends on. Skipping or rushing a phase means every downstream phase works from incomplete information.
**Phase 0 (Prior Run Analysis):** If previous quality runs exist, load their findings as seed data. This is automatic and only applies to re-runs.
**Phase 1 (Explore):** Run the v1.5.3 documentation intake first (`python -m bin.reference_docs_ingest <target>` to walk `reference_docs/` — `cite/` files produce `quality/formal_docs_manifest.json` records; top-level files are loaded as Tier 4 context via `reference_docs_ingest.load_tier4_context(<target>)`). Then explore the codebase in three stages: open exploration driven by domain knowledge, domain-knowledge risk analysis, and selected structured exploration patterns. Write all findings to `quality/EXPLORATION.md`. This file is the foundation — Phase 2 reads it as its primary input.
**Phase 2 (Generate):** Read EXPLORATION.md and produce the quality artifacts: requirements, constitution, functional tests, code review protocol, integration tests, spec audit protocol, TDD protocol. (`AGENTS.md` at the target's repo root is generated by the orchestrator AFTER Phase 6, not by you in Phase 2 — see "File 6" below for the contract.)
**Phase 3 (Code Review):** Run the three-pass code review against HEAD. Write regression tests for every confirmed bug. Generate patches.
**Phase 4 (Spec Audit):** Three independent AI auditors review the code against requirements. Triage with verification probes. After triage, the same Council runs the v1.5.3 Layer-2 semantic citation check — one prompt per reviewer, structured per-REQ verdicts for every Tier 1/2 citation, output to `quality/citation_semantic_check.json`. Write regression tests for net-new findings.
**Phase 5 (Reconciliation):** Close the loop — every bug from code review and spec audit is tracked, regression-tested or explicitly exempted. Run TDD red-green cycle. Finalize the completeness report.
**Phase 6 (Verify):** Run self-check benchmarks against all generated artifacts. Check for internal consistency, version stamp correctness, and convergence.
**Phase 7 (Present, Explore, Improve):** Present results to the user with a scannable summary table, offer drill-down on any artifact, and provide a menu of improvement paths (iteration strategies, requirement refinement, integration test tuning). This is the interactive phase where the user takes ownership of the quality system.
Every bug found traces back to a requirement, and every requirement traces back to an exploration finding.
**The critical dependency chain:** Exploration findings → EXPLORATION.md → Requirements → Code review + Spec audit → Bug discovery. A shallow exploration produces abstract requirements. Abstract requirements miss bugs. The exploration phase is where bugs are won or lost.
**MANDATORY FIRST ACTION:** After reading and understanding the plan above, print the following message to the user, then explain the plan in your own words — what you'll do, what each phase produces, and why the exploration phase matters most. Emphasize that exploration starts with open-ended domain-driven investigation, followed by domain-knowledge risk analysis that reasons about what goes wrong in systems like this, then supplemented by selected structured patterns. Do not copy the plan verbatim; paraphrase it to demonstrate understanding.
> Quality Playbook v1.5.6 — by Andrew Stellman
> https://github.com/andrewstellman/quality-playbook
Generate a complete quality system tailored to a specific codebase. Unlike test stub generators that work mechanically from source code, this skill explores the project first — understanding its domain, architecture, specifications, and failure history — then produces a quality playbook grounded in what it finds.
## How to run this — v1.5.4 self-encoded invocation contract
If the operator hands you this skill (or points you at any QPB-installed target) and says **"Run the Quality Playbook"** — possibly with a hint like "this is a bootstrap run" or "run on itself" or "self-audit" — this section tells you exactly what to do. The operator should not need to provide additional instructions; the canonical invocation, the defaults, the guardrails, and the output contract all live here.
### Pick your execution mode
QPB ships in two execution shapes. Pick the one that matches your runtime — the wrong choice produces the codex-on-codex indirection pathology surfaced by the 2026-04-30 bootstrap test.
| Mode | When this is you | What you do |
|------|------------------|-------------|
| **A. Skill-direct (UI-context)** | You are a coding agent (Claude Code, Cursor, Copilot, Codex desktop, etc.) handed this skill in your own chat. Your runtime IS the reasoning loop — you read files, you write files, you decide. | Walk through Phase 1 → Phase 6 yourself using the externalized phase prompts in `phase_prompts/`. Write artifacts into the target's `quality/` directory directly. No subprocess, no runner. |
| **B. Runner-driven (CLI-automation)** | The operator is invoking `python3 -m bin.run_playbook` deliberately — to batch across multiple targets, drive a headless CI run, or fan out per-phase work to a different model than the one reading this prose. | The orchestrator spawns a CLI agent (`claude`, `copilot`, `codex`, or `cursor`) per phase. You (or whoever is reading this) are the operator-side control loop, not the per-phase reasoner. |
**Both modes use the same phase prompt content** — the `phase_prompts/*.md` files at the repo root are the single source of truth, loaded by `bin/run_playbook.py::_load_phase_prompt` and read directly by Mode A walkthroughs. The only thing the two modes differ on is WHO drives — you (Mode A) or the orchestrator subprocess-spawning a CLI agent (Mode B).
**When in doubt, default to Mode A.** If the operator wanted runner-driven invocation they would have run the runner themselves; if they pasted "Run the Quality Playbook" into your chat, they want you to drive. The Mode B section below tells you what to do *if* the operator explicitly invokes the runner.
### Mode A — skill-direct walkthrough (UI-context)
The operator's prompt is just **"Run the Quality Playbook"** (or "run on itself", "self-audit", etc.). You drive every phase inline.
For each phase 1..6, in order:
1. **Load the phase prompt.** Read `phase_prompts/phaseN.md` (resolve via the same install-location fallback list documented for `references/` below). For `phase1.md`, substitute `{seed_instruction}` (the prelude that says "skip Phase 0/0b" — empty string when seeds are allowed) and `{role_taxonomy}` (the taxonomy block rendered from the role taxonomy below). For `phase2.md` through `phase6.md`, the file is pure-literal — read it verbatim.
2. **Execute the phase per the prompt.** Read the inputs the prompt names, do the analysis, write outputs into the target's `quality/` directory.
3. **STOP at the end-of-phase boundary.** Every phase prompt ends with an "IMPORTANT: Do NOT proceed to Phase N+1" instruction. Honor it. The operator advances to the next phase by saying so.
You are responsible — without the orchestrator's structural backstop — for the same source-unchanged invariant the runner enforces: **do NOT modify any file outside the target's `quality/` directory**. In Mode B the gate would catch this; in Mode A you are the gate. The 2026-04-30 bootstrap test specifically failed on a Phase 2 LLM modifying the target's root `AGENTS.md` — the same failure mode applies in Mode A.
For the bootstrap-run (self-audit) variant of Mode A, see "Bootstrap mode" below — the only delta is that the target IS the QPB repo, so cite the same `phase_prompts/` files you read from.
#### Mode A scope — what's covered, what's Mode-B-only
Council 2026-04-30 P1-Related in Security
mac-ops
IncludedComprehensive macOS workstation operations — diagnose kernel panics, identify failing drives, audit launchd startup items, decode wake reasons, triage TCC permission denials, manage APFS snapshots, recover from no-boot. Use for: Mac is slow, slow bootup, won't boot, kernel panic, kernel_task hot, mds_stores CPU, photoanalysisd, cloudd, login loop, gray screen, sleep wake failure, drive failing, IO errors, APFS snapshots eating space, Time Machine local snapshots, Spotlight indexing, launchd, LaunchAgent, LaunchDaemon, login items, TCC permissions, Full Disk Access, Screen Recording denied, Gatekeeper, quarantine, com.apple.quarantine, app is damaged, helper tool, /Library/PrivilegedHelperTools, pmset, wake reasons, dark wake, sysdiagnose, panic.ips, DiagnosticReports, configuration profile, MDM profile, remote diagnostics over SSH.
a11y-audit
IncludedRun accessibility audits on web projects combining automated scanning (axe-core, Lighthouse) with WCAG 2.1 AA compliance mapping, manual check guidance, and structured reporting. Output is configurable: markdown report only, markdown plus machine-readable JSON, or markdown plus issue tracker integration. Use this skill whenever the user mentions "accessibility audit", "a11y audit", "WCAG audit", "accessibility check", "compliance scan", or asks to check a web project for accessibility issues. Also trigger when the user wants to verify WCAG conformance or map findings to a specific standard (CAN-ASC-6.2, EN 301 549, ADA/AODA).
erpclaw
IncludedAI-native ERP system with self-extending OS. Full accounting, invoicing, inventory, purchasing, tax, billing, HR, payroll, advanced accounting (ASC 606/842, intercompany, consolidation), and financial reporting. 413 actions across 14 domains, 43 expansion modules. Constitutional guardrails, adversarial audit, schema migration. Double-entry GL, immutable audit trail, US GAAP.
assess
IncludedAssesses and rates quality 0-10 across multiple dimensions (correctness, maintainability, security, performance, testability, simplicity) with pros/cons analysis. Compares against project conventions and prior decisions from memory. Produces structured evaluation reports with actionable improvement suggestions. Use when evaluating code, designs, architectures, or comparing alternative approaches.
spring-boot-security-jwt
IncludedProvides JWT authentication and authorization patterns for Spring Boot 3.5.x covering token generation with JJWT, Bearer/cookie authentication, database/OAuth2 integration, and RBAC/permission-based access control using Spring Security 6.x. Use when implementing authentication or authorization in Spring Boot applications.
code-hardcode-audit
IncludedDetect hardcoded values, magic numbers, and leaked secrets. TRIGGERS - hardcode audit, magic numbers, PLR2004, secret scanning.