Claude
Skills
Sign in
Back

swain-search

Included with Lifetime
$97 forever

Trove collection and normalization for swain-design artifacts. Collects sources from the web, local files, and media (video/audio), normalizes them to markdown, and caches them in reusable troves. Use when researching a topic for a spike, ADR, vision, or any artifact that needs structured research. Also use to refresh stale troves or extend existing ones with new sources. Triggers on: 'research X', 'gather sources for', 'compile research on', 'search for sources about', 'refresh the trove', 'find existing research on X', or when swain-design needs research inputs for a spike or ADR.

Designscripts

What this skill does

<!-- swain-model-hint: opus, effort: high -->

# swain-search

Collect, normalize, and cache source materials into reusable troves that swain-design artifacts can reference.

## Script invocation convention

Scripts live under this skill's `scripts/` directory. Use the `<SKILL_DIR>` placeholder to mean the folder holding this SKILL.md. Resolve it at run time. In an installed skill, that is `.claude/skills/swain-search/`. In the swain repo, it is `skills/swain-search/`.

Run the bootstrap once per session before the media or X-thread flows:

```bash
bash "<SKILL_DIR>/scripts/bootstrap.sh"
```

The script checks that `uv` is on `PATH`. After the first run, a marker file at `~/.local/share/swain-search/.bootstrapped` short-circuits later runs. If it exits non-zero, stop and tell the operator what is missing.

## Mode detection

| Signal | Mode |
|--------|------|
| No trove exists for the topic, or user says "research X" / "gather sources" | **Create** — new trove |
| Trove exists and user provides new sources or says "add to" / "extend" | **Extend** — add sources to existing trove |
| Trove exists and user says "refresh" or sources are past TTL | **Refresh** — re-fetch stale sources |
| User asks "what troves do we have" or "find sources about X" | **Discover** — search existing troves by tag |

## Prior art check

Before creating a new trove or running web searches, scan existing troves for relevant content. This avoids duplicating research and surfaces connections to prior work.

### Phase 1 — Literal keyword match

Search for the source name, URL fragments, and author name:

```bash
# Search trove manifests by tag
grep -rl "<keyword>" docs/troves/*/manifest.yaml 2>/dev/null

# Search trove source content
grep -rl "<keyword>" docs/troves/*/sources/**/*.md 2>/dev/null

# Search trove syntheses
grep -rl "<keyword>" docs/troves/*/synthesis.md 2>/dev/null
```

### Phase 2 — Semantic topic match

After fetching the source and understanding what it's about, extract 3-5 topic keywords from the source's *content* (not just its name or URL). Then search existing troves by topic:

```bash
# Search trove tags for topic keywords
grep -l "<topic-keyword-1>\|<topic-keyword-2>\|<topic-keyword-3>" docs/troves/*/manifest.yaml 2>/dev/null

# Search synthesis summaries for topic keywords
grep -l "<topic-keyword-1>\|<topic-keyword-2>\|<topic-keyword-3>" docs/troves/*/synthesis.md 2>/dev/null
```

Topic keywords should describe what the source *is about*, not what it's *called*. For example, a repo named "Cog" that implements a memory system for Claude Code should generate topic keywords like `agent-memory`, `memory-architecture`, `claude-code`, `persistent-memory` — not `cog` or `marciopuga`.

If the source has not been fetched yet (URL-only invocation), use whatever topic information is available from the URL or title and defer full topic matching until after the source is fetched.

### Decision gate

Before proceeding to Create or Extend mode, output a visible routing decision:

> **Prior art check:** Phase 1 found [N matches / no matches]. Phase 2 found [N matches / no matches]: [trove-id (tags: x, y), ...].
> **Decision:** Extending [trove-id] / Creating new trove [slug] because [reason].

This makes the trove routing decision auditable. If any trove matches on 2+ topic keywords, default to Extend mode unless the topic is genuinely distinct (adjacent but different subject matter).

### Action on matches

If existing troves contain relevant sources:
1. **Report what was found** — show the trove ID, matching source titles, and relevant excerpts
2. **Suggest extend over create** — if an existing trove covers the same topic, extend it rather than creating a parallel trove
3. **Cross-link** — if the topic is adjacent but distinct, create a new trove but note the related trove in synthesis.md

This step runs in all modes (Create, Extend, Discover) and before any web searches. Existing trove content is always checked first.

## Snapshot evidence gate (SPEC-220)

Before a remote source can be treated as collected evidence, the run must produce a raw snapshot and a metadata ledger entry in `.agents/search-snapshots/metadata.jsonl`.

Required flow for remote sources:
1. Export/download the raw snapshot first:
   - `bash "<SKILL_DIR>/scripts/export-snapshot.sh" --url "<source-url>" --out-dir ".agents/search-snapshots/raw"`
2. Normalize the downloaded file using `writing-skills` or `skill-creator` (never summary-only browser notes).
3. Log metadata:
   - `bash "<SKILL_DIR>/scripts/log-snapshot-metadata.sh" --source-url "<source-url>" --export-mode "<mode>" --raw-path "<raw-path>" --normalized-path "<normalized-path>" --normalization-skill "<writing-skills|skill-creator>"`
4. Verify before publication:
   - `bash "<SKILL_DIR>/scripts/verify-snapshot-evidence.sh" --source-url "<source-url>"`

If verification fails, mark the source unverified, do not publish it downstream, and report the warning to the operator.

## Create mode

Build a new trove from scratch.

### Step 1 — Gather inputs

Ask the user (or infer from context) for:

1. **Trove ID** — a slug for the topic (e.g., `websocket-vs-sse`). Suggest one if the context is clear.
2. **Tags** — keywords for discovery (e.g., `real-time`, `websocket`, `sse`)
3. **Sources** — any combination of:
   - Web search queries ("search for WebSocket vs SSE comparisons")
   - URLs (web pages, forum threads, docs)
   - Video/audio URLs
   - Local file paths
4. **Freshness TTL overrides** — optional, defaults are fine for most troves

If invoked from swain-design (e.g., spike entering Active), the artifact context provides the topic, tags, and sometimes initial sources.

### Step 2 — Collect and normalize

For each source, use the appropriate capability. Read `references/normalization-formats.md` for the exact markdown structure per source type.

**Web search queries:**
1. Use a web search capability to find relevant results
2. Select the top 3-5 most relevant results
3. For each: fetch the page, normalize to markdown per the web page format
4. If no web search capability is available, tell the user and skip

**Web page URLs:**
1. Fetch the page using a browser or page-fetching capability
2. Strip boilerplate (nav, ads, sidebars, cookie banners)
3. Normalize to markdown per the web page format
4. If fetch fails, record the URL in manifest with a `failed: true` flag and move on

**Sites needing authentication (cookie support):**

When a target site requires authentication or subscription access, supply browser-exported cookies:

1. Export cookies from a browser where you are logged in to the target site:
   - Firefox: use the "cookies.txt" extension, or export from DevTools → Storage → Cookies → right-click → "Export All".
   - Chrome: use the "EditThisCookie" extension or DevTools → Application → Cookies → right-click → "Export".
   - Any tool that produces a JSON array of cookie objects with `Host raw`, `Name raw`, `Content raw`, `Path raw`, `Expires raw`, `Send for raw`, and `This domain only raw` fields.

2. Pass the exported JSON file to `export-snapshot.sh`:
   ```bash
   bash "<SKILL_DIR>/scripts/export-snapshot.sh" \
     --url "<source-url>" \
     --out-dir ".agents/search-snapshots/raw" \
     --cookies "path/to/cookies.json"
   ```
   The `--cookies` flag triggers conversion to Netscape format via `convert-cookies.py` and attaches the cookie jar to the curl request. The export mode is recorded as `<mode>-with-cookies`.

3. If no cookies are provided, the fetch proceeds without authentication (same behaviour as before).

4. For the browser-based page-fetching path (MCP browser tools), cookies from the browser's own session are available automatically — no explicit cookies file is needed. This flag is useful for curl-based snapshot export.

**Google Docs / Drive-like documents:**
1. Export raw content first (required):
   - `bash "<SKILL_DIR>/scripts/export-snapshot.sh" --url "<source-url>" --out-dir ".agents/search-snapshots/raw"`
Files: 24
Size: 137.1 KB
Complexity: 90/100
Category: Design

Related in Design