
discovery-to-determinism


Put the bulk of acceptance coverage below the UI through a fast, deterministic headless client driving an operator seam, and reserve a surgical UI state-graph tier for defects that only manifest through the real GUI. Use when designing test/QA or acceptance-testing strategy, automating acceptance, end-to-end (E2E), or QA testing of a running app, deciding what to cover with fast headless tests vs slow UI/E2E, building agent-driven exploration or automation of a running app, building a below-UI operator seam (interaction layer) or headless client, or crystallising agent-discovered knowledge into reusable deterministic artifacts (maps, graphs, scripts, tests). Covers the Discovery⇄Determinism flywheel, the operator-seam architecture (one seam serving both a headless test client and AI-agent tools), and layered headless-first acceptance testing with a surgical UI state-graph tier for GUI-only defects.


What this skill does


<objective>
There is an asymmetry at the heart of agent-driven engineering. **Discovery** — perceiving an unknown system, exercising judgement, handling the open-ended — is what *agents* are uniquely good at; it is also expensive and non-deterministic. **Exact repetition** — doing the same thing the same way, fast, every time — is what *code* is uniquely good at; it is cheap and deterministic but discovers nothing new.

This skill teaches how to run that asymmetry as a **flywheel** (discovery hardens into determinism; determinism makes the next discovery cheaper and sharper), and how to cash it out architecturally: by carving a clean **operator seam** below the UI so most behaviour is driven by fast deterministic code instead of the slow, flaky GUI — a seam that, built well, serves both a headless test client and AI-agent tools, because tests and agents are both *non-human operators* needing scriptable access to the real logic.

The principle leads; testing is its first worked embodiment. Keep it general; the worked example illustrates, it does not define.
</objective>

<quick_start>
1. **Run the applicability gate first** (`<when-this-applies>`) — and STOP if it fails (the UI *is* the logic, a programmatic seam already exists, the app is too small, or no failing test needs the seam yet).
2. **Default acceptance coverage below the UI** — drive the real logic, persistence, and backend through an **operator seam** with a fast, deterministic headless client; that tier carries the bulk.
3. **Extract the seam test-first** — RED before any seam code; reuse an existing API / service-layer / CLI / SDK / MCP in preference to inventing one.
4. **Reserve a small, surgical UI tier** (`references/ui-state-graph-edt.md`) only for defects that *only* surface through the real GUI; never let it re-test what headless covers.
5. **Crystallise every worthwhile discovery** into self-checking code as part of the same effort — an un-crystallised discovery is a missing feedback loop.
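
Steps 2 and 3 can be sketched roughly as follows. This is a minimal illustration, assuming a hypothetical to-do application: `TodoSeam`, `InProcessTodoSeam`, and `HeadlessClient` are invented names for this example, not part of this skill or of any real library. The test function is the kind of check you would write first (RED) before the seam exists.

```python
from dataclasses import dataclass, field
from typing import Protocol


class TodoSeam(Protocol):
    """Hypothetical operator seam: scriptable access to the real logic, no GUI."""

    def add_item(self, title: str) -> int: ...
    def complete_item(self, item_id: int) -> None: ...
    def open_titles(self) -> list[str]: ...


@dataclass
class InProcessTodoSeam:
    """Stand-in for the real application logic that sits behind the seam."""

    _items: dict[int, tuple[str, bool]] = field(default_factory=dict)
    _next_id: int = 1

    def add_item(self, title: str) -> int:
        if not title.strip():
            raise ValueError("title must not be blank")
        item_id = self._next_id
        self._items[item_id] = (title, False)
        self._next_id += 1
        return item_id

    def complete_item(self, item_id: int) -> None:
        title, _ = self._items[item_id]
        self._items[item_id] = (title, True)

    def open_titles(self) -> list[str]:
        return [title for title, done in self._items.values() if not done]


class HeadlessClient:
    """Drives the seam the way a non-human operator (test or agent) would."""

    def __init__(self, seam: TodoSeam) -> None:
        self.seam = seam

    def add_and_complete(self, title: str) -> None:
        self.seam.complete_item(self.seam.add_item(title))


def test_completed_items_leave_the_open_list() -> None:
    # Acceptance check below the UI: fast, deterministic, no browser involved.
    client = HeadlessClient(InProcessTodoSeam())
    client.seam.add_item("write report")
    client.add_and_complete("file expenses")
    assert client.seam.open_titles() == ["write report"]
```

In a real project the in-process stand-in would likely be replaced by whatever API, service layer, CLI, SDK, or MCP already exists; the protocol only names the operations the tests and agent tools need.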
</quick_start>

<the-flywheel>
Run discovery and determinism as a loop with **two arrows**, not a one-way pipeline.

**Arrow 1 — Discovery → Determinism (crystallise).** An agent explores the unknown; the finding is hardened *immediately* into deterministic, self-checking code — a map, a script, a graph, a test, a recognizer. The artifact is a *cache of discovered structure*. A discovery left as prose or a transcript is a finding you will pay full price to discover again.

**Arrow 2 — Determinism → Discovery (bootstrap & target).** ← the half people miss. The crystallised artifacts make the *next* discovery cheaper, safer, and self-targeting:
1. **Launch from the frontier.** To explore new territory, deterministically *traverse the known map to its edge*, then explore only the delta — "arrive in seconds, explore the one new thing," not "redo the whole expensive prefix every run."
2. **A reliable harness, not flailing.** Deterministic *arrange* + *reset* gives exploration a repeatable scaffold: reach a precondition exactly, poke around, reset and re-arrive if lost. Discovery becomes a controlled experiment.
3. **Drift becomes a targeted discovery prompt.** When the system changes, a *precise, localised* failure in the deterministic layer ("expected marker X missing at step A→B") **is the instruction for where to re-discover.** Coarse failures cannot do this.
4. **Orientation keeps findings integrated.** A deterministic "where am I?" check lets an exploring agent locate itself against the known model, so new findings *slot into* the existing structure instead of forking a duplicate.
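
Points 1 and 4 above can be sketched together: orient against the known map, then compute the deterministic route to its edge. This is a hedged sketch, assuming the crystallised map is stored as a plain dictionary of states and actions; `KNOWN_MAP`, `STATE_MARKERS`, and every state, action, and marker name are hypothetical.

```python
from collections import deque
from typing import Optional

# Hypothetical crystallised map: state -> {action: next_state}.
KNOWN_MAP: dict[str, dict[str, str]] = {
    "login": {"submit_credentials": "dashboard"},
    "dashboard": {"open_settings": "settings", "open_reports": "reports"},
    "settings": {},
    "reports": {"open_export": "export_dialog"},
}

# Hypothetical markers that identify each mapped state.
STATE_MARKERS: dict[str, frozenset[str]] = {
    "login": frozenset({"username_field", "password_field"}),
    "dashboard": frozenset({"welcome_banner", "nav_menu"}),
    "settings": frozenset({"save_button", "nav_menu"}),
    "reports": frozenset({"report_table", "nav_menu"}),
}


def where_am_i(observed: set[str]) -> Optional[str]:
    """Return the single mapped state whose markers are all present, else None."""
    matches = [s for s, markers in STATE_MARKERS.items() if markers <= observed]
    return matches[0] if len(matches) == 1 else None


def frontier() -> set[str]:
    """States that a known transition reaches but that are not mapped yet."""
    targets = {nxt for edges in KNOWN_MAP.values() for nxt in edges.values()}
    return targets - set(KNOWN_MAP)


def path_to(start: str, target: str) -> list[str]:
    """Shortest action sequence from start to target over the known map (BFS)."""
    queue: deque[tuple[str, list[str]]] = deque([(start, [])])
    seen = {start}
    while queue:
        state, actions = queue.popleft()
        if state == target:
            return actions
        for action, nxt in KNOWN_MAP.get(state, {}).items():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, actions + [action]))
    raise LookupError(f"no known path from {start!r} to {target!r}")
```

With this map, `frontier()` returns `{"export_dialog"}` and `path_to("login", "export_dialog")` returns `["submit_credentials", "open_reports", "open_export"]`: the agent replays those three actions deterministically and spends inference only on the unmapped dialog. Returning `None` from `where_am_i` when zero or several states match is a design choice in this sketch, so that an ambiguous position is surfaced rather than guessed.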

**The ratchet:** discovery produces determinism; determinism lowers the cost and raises the precision of the next discovery; repeat. This is a *model and a discipline*, not a measured law — there is no promised "cost falls by N%," and a **stale** crystallised map inverts the benefit until re-crystallised.

**Promode already runs this flywheel once.** The `CLAUDE.md`-rooted agent-knowledge graph *is* its first instance: a knowledge node is a crystallised discovery about the *repo*; "orient before you act" is the where-am-I check. Everything below aims the same loop at a *running app* instead of a codebase.
</the-flywheel>

<closing-the-loop>
**Why this is a loop, not two phases — and why it's the whole point.** The naive reading is "discover once, crystallise, then replay forever" — a one-way pipeline. The leverage is in wiring determinism's *output* back to inference's *input*: build the deterministic layer as an **instrument whose failures are designed to summon inference**, because a deterministic check that fails is asking a question only judgement can answer. Use each side for what only it can do:
- **Inference (agents) for discovery and judgement** — perceiving an unknown system, and deciding what a failure *means*. Expensive and non-deterministic; spend it where judgement is unavoidable.
- **Determinism (code) for efficient repetition** — replaying the known, fast and identically, for free. It discovers nothing, but it is the only thing that can *watch continuously at no cost*.

**When a crystallised artifact fails, it has asked a question. Triage it — only inference can, because code cannot know intent:**
1. **Flake** — the check itself is non-deterministic. Response: *eliminate the non-determinism* (pin time, seed RNG, isolate state, fix the unstable selector). A flaky deterministic check is worse than none — it trains everyone to ignore red; hardening it feeds more determinism back into the loop.
2. **Legitimate change** — the system moved on purpose and the artifact is now stale. Response: *re-discover the delta and re-crystallise* — update the map/recognizer/expected value so the deterministic layer tracks reality again. This is the flywheel turning: launch from the frontier, explore only what changed.
3. **Regression** — the system broke by accident. Response: *raise it* — the deterministic layer just did its job as a regression alarm.

This triage **is** the feedback channel, and it is why coverage compounds instead of rotting: every failure either **hardens** the suite (flake → more determinism), **advances** it (change → re-crystallise), or **protects** the system (regression → alarm). The fail-fast, localised-failure requirement exists to make the triage cheap — a precise break tells inference *where* to look and often *which* of the three it is; a vague "it went red" forces a fresh investigation every time, and a suite that fails imprecisely cannot drive its own repair.
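
The flake response ("pin time, seed RNG") can be illustrated with a small sketch. It assumes a hypothetical `issue_coupon` function as the logic under test; the `outcomes` helper is an added heuristic, not something this skill prescribes: it re-runs a check under identical conditions, and mixed results suggest a flake. A consistent failure still needs judgement to tell legitimate change from regression.

```python
import random
from datetime import datetime, timedelta, timezone
from typing import Callable


def issue_coupon(now: Callable[[], datetime], rng: random.Random) -> dict:
    """Hypothetical logic under test: depends on both time and randomness."""
    return {
        "code": f"SAVE-{rng.randrange(10_000):04d}",
        "expires": now() + timedelta(days=7),
    }


def fixed_clock() -> datetime:
    # Pinned time: the check no longer depends on when it runs.
    return datetime(2024, 1, 1, tzinfo=timezone.utc)


def check_coupon_is_repeatable() -> None:
    # Seeded RNG: two runs with the same seed must agree exactly.
    first = issue_coupon(fixed_clock, random.Random(42))
    second = issue_coupon(fixed_clock, random.Random(42))
    assert first == second
    assert first["expires"] == datetime(2024, 1, 8, tzinfo=timezone.utc)


def outcomes(check: Callable[[], None], runs: int = 5) -> set[bool]:
    """Re-run a check; {True, False} means it is non-deterministic (a flake)."""
    results: set[bool] = set()
    for _ in range(runs):
        try:
            check()
            results.add(True)
        except AssertionError:
            results.add(False)
    return results
```

Injecting the clock and the random source as parameters is one way to remove hidden inputs; freezing them globally in the test harness would be another.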
</closing-the-loop>

<disciplines>
The methodology is not "use a graph." It is the set of disciplines that keep the loop turning:

- **Always crystallise.** Harden every worthwhile discovery into deterministic, version-controlled code *as part of the same effort*. An un-crystallised discovery is a missing feedback loop.
- **Explore from the frontier.** Forbid re-discovering already-mapped territory; new exploration begins by deterministically traversing the existing map to its edge. Applies identically to a repo (knowledge graph) and a running app (state graph).
- **Make determinism break precisely.** Localised, fail-fast errors are a *first-class build requirement*, not a test-quality nicety — the precise break is the re-discovery signal, and the thing that lets inference triage a failure cheaply (flake vs legitimate change vs regression — see `<closing-the-loop>`). Verify the property by perturbation (deliberately break one check; confirm it halts exactly there and reports precisely).
- **Keep the map orientable.** Maintain a cheap "where am I?" check and stable identifiers, so discoveries integrate rather than fork.
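
The "break precisely" discipline and its perturbation check can be sketched as below. This is an illustrative sketch only: `Step`, `run_path`, `perturbed_failure`, and the marker names are hypothetical, and each step's `act` callable stands in for whatever drives the real system and reports what it observed.

```python
from dataclasses import dataclass, replace
from typing import Callable


class StepFailure(AssertionError):
    """Raised at the first step whose expected marker is missing."""


@dataclass(frozen=True)
class Step:
    source: str
    target: str
    act: Callable[[], set[str]]  # performs the transition, returns observed markers
    expected_marker: str


def run_path(steps: list[Step]) -> None:
    """Replay steps in order; halt at the first mismatch with a localised error."""
    for index, step in enumerate(steps, start=1):
        observed = step.act()
        if step.expected_marker not in observed:
            raise StepFailure(
                f"step {index}: expected marker {step.expected_marker!r} "
                f"missing at {step.source}->{step.target}; "
                f"observed {sorted(observed)}"
            )


def perturbed_failure(steps: list[Step], index: int) -> str:
    """Deliberately break one check and return the failure message it produces."""
    broken = list(steps)
    broken[index] = replace(broken[index], expected_marker="__deliberately_wrong__")
    try:
        run_path(broken)
    except StepFailure as failure:
        return str(failure)
    raise RuntimeError("perturbed path did not fail; the check is not precise")
```

A usage sketch: build a two-step path, confirm `run_path` passes, then call `perturbed_failure(steps, 1)` and confirm the message names step 2 and the `dashboard->reports` transition. If the perturbed run fails somewhere else, or not at all, the deterministic layer cannot yet serve as a re-discovery signal.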
</disciplines>

<the-operator-seam>
**The architectural move that cashes the flywheel out.** When real logic sits behind a UI, most behaviour lives *below* the UI. Carve a clean **operator seam** there — an observable, scriptable interface to the real logic, persistence, and backend, with the GUI removed — and drive 
