kb-builder

Build an Obsidian-compatible knowledge base from public web sources using the TinyFish CLI. Use this skill when a user wants a builder-grade markdown knowledge base on a technical topic, asks for a structured research vault, or wants a topic compiled from live public sources into interlinked markdown files. Supports two input modes: topic only, or topic plus starter URLs. Supports both first-build and update workflows. Always generates `index.md`, `sources.md`, `audit.md`, and `manifest.json`. Creates additional files only when the evidence supports them. The output must synthesize the topic into a usable mental model, not just summarize pages. Uses explicit `tinyfish agent run` commands and public web sources only. Optional `--trace` mode saves raw TinyFish outputs under `_trace/` for debugging.

AI Agents

What this skill does


# KB Builder

Build a topic-specific markdown knowledge base by using TinyFish to browse public web sources and extract structured evidence.

This skill is for **builder knowledge bases**, not personal journals and not direct code generation.

The output is a folder you can drop into Obsidian immediately, and update later without starting over.

## Core principle

Do not produce a pile of source summaries.

The KB should help the reader understand:

- the core mental model
- the main approaches or schools of thought
- what is foundational vs derivative
- what actually matters
- what is unresolved
- what to read first if they want genuine understanding

If the output only says what each source said, the skill has failed.

## Pre-flight check

Run both checks before any TinyFish call:

```bash
which tinyfish && tinyfish --version || echo "TINYFISH_CLI_NOT_INSTALLED"
tinyfish auth status
```

If TinyFish is not installed, stop and tell the user to install it:

```bash
npm install -g @tiny-fish/cli
```

If TinyFish is not authenticated, stop and tell the user to log in:

```bash
tinyfish auth login
```

Do not continue until both checks pass.
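
The gate can be sketched as a single function that returns non-zero until both checks pass. This is only an illustration: `preflight` is a hypothetical helper name, and it assumes `tinyfish auth status` exits non-zero when no session is active, which is not confirmed here.

```bash
# Returns 0 only when the CLI is on PATH and reports an authenticated session.
# Assumption: `tinyfish auth status` exits non-zero when not logged in;
# its real exit codes are not documented in this skill.
preflight() {
  if ! command -v tinyfish >/dev/null 2>&1; then
    echo "TinyFish CLI not installed. Run: npm install -g @tiny-fish/cli" >&2
    return 1
  fi
  if ! tinyfish auth status >/dev/null 2>&1; then
    echo "TinyFish not authenticated. Run: tinyfish auth login" >&2
    return 2
  fi
  return 0
}
```

A caller would run `preflight || exit 1` before the first TinyFish command.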

## Scope

- **Allowed:** public web pages, public GitHub repos, public papers, public docs, public datasets, public blog posts
- **Not allowed:** private sources, local private files, authenticated dashboards, chat logs, email, Slack, or anything the user cannot access publicly
- **Primitive:** use explicit `tinyfish agent run` commands
- **Output shape:** always `index.md`, `sources.md`, `audit.md`, and `manifest.json`; everything else is dynamic

## Input modes

You support two input modes, plus an update workflow and an optional trace flag:

1. **Topic only**
   - Example: `Build me a knowledge base on web agent frameworks`
2. **Topic + starter URLs**
   - Example: `Build me a knowledge base on web agent frameworks and start from these URLs: ...`
3. **Update an existing KB**
   - Example: `Update my knowledge base on Kolmogorov-Arnold Networks with these new URLs: ...`
4. **Trace mode**
   - Example: `Build me a knowledge base on browser agents --trace`

If the topic is missing, ask for it before proceeding.

If starter URLs are present:
- use them first
- deduplicate them
- keep only public URLs
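
A minimal sketch of the starter-URL cleanup, assuming one URL per line on stdin. `clean_starter_urls` is a hypothetical helper, and filtering on the `http(s)` scheme is only a rough proxy for "public": it cannot detect login walls.

```bash
# Reads one URL per line on stdin; prints unique http(s) URLs in first-seen order.
# Assumption: an http(s) scheme is a crude stand-in for "public". Pages behind
# a login still have to be caught later and logged as blocked.
clean_starter_urls() {
  grep -E '^https?://' | awk '!seen[$0]++'
}
```

For example, `printf '%s\n' "$@" | clean_starter_urls` would drop repeated links and non-web schemes such as `file://` while keeping the user's original order.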

If the user explicitly says `update`, `refresh`, `add these sources`, or clearly wants to add to an existing KB, switch into update mode.

If the user includes `--trace`, `trace`, `debug`, or explicitly asks for raw outputs:
- enable trace mode
- save raw TinyFish outputs under `_trace/`
- keep `_trace/` out of the main page navigation unless the user asks for it

## Output directory

Create a folder named:

```text
kb-{topic-slug}/
```

Examples:
- `kb-web-agent-frameworks/`
- `kb-kolmogorov-arnold-networks/`
- `kb-landing-page-design-patterns/`

When trace mode is enabled, also create:

```text
kb-{topic-slug}/_trace/
```
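
One way to derive the folder name and create it is sketched below. The slug rule (lowercase, runs of non-alphanumerics collapsed to a single hyphen) is an assumption inferred from the example folder names, and both `slugify` and `make_kb_dir` are hypothetical helpers.

```bash
# Lowercase the topic and collapse every run of non-alphanumerics into one hyphen.
# Assumption: this rule is inferred from the example folder names, not specified.
slugify() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//'
}

# Create kb-{topic-slug}/ under a parent directory, plus _trace/ when requested.
# Prints the path of the KB folder.
make_kb_dir() {
  local parent="$1" topic="$2" trace="$3" kb_dir
  kb_dir="$parent/kb-$(slugify "$topic")"
  mkdir -p "$kb_dir"
  if [ "$trace" = "true" ]; then
    mkdir -p "$kb_dir/_trace"
  fi
  printf '%s\n' "$kb_dir"
}
```

Because `mkdir -p` is idempotent, the same call is safe in update mode, where the folder already exists.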

## Always-generated files

### `index.md`

This file is always required. It should contain:
- a short topic overview
- what the knowledge base covers
- a list of generated pages using `[[wikilinks]]`
- 3-7 key takeaways
- open questions or evidence gaps
- a **mental model** section
- a **what matters** section
- a **reading order** section for the strongest sources or pages

### `sources.md`

This file is always required. It should log **every URL visited** with:
- stable source ID
- timestamp
- URL
- source label
- reason it was opened
- result status: useful, partial, irrelevant, blocked, or conflicting

Use ISO 8601 timestamps.

Each source entry must use a stable source ID such as `S001`, `S002`, `S003`.

Example:

```markdown
## [S001] 2026-04-06T08:49:24.014Z | useful

- URL: https://example.com
- Label: Official docs
- Reason opened: discovery pass for {TOPIC}
- Notes: yielded 4 good follow-up links
```
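
A sketch of how entries in this shape could be appended while keeping IDs stable. `next_source_id` and `append_source` are hypothetical helpers. They assume every entry heading starts with `## [S…]` as in the example, and they omit the optional `Notes` line.

```bash
# Print the next unused source ID (S001, S002, ...) for a sources.md file.
# Existing IDs are never renumbered; the new ID is the current maximum plus one.
next_source_id() {
  local file="$1" last
  last="$(grep -oE '^## \[S[0-9]+\]' "$file" 2>/dev/null \
    | grep -oE '[0-9]+' | sort -n | tail -n 1 | sed 's/^0*//')"
  printf 'S%03d\n' "$(( ${last:-0} + 1 ))"
}

# Append one entry with a UTC ISO 8601 timestamp and print the ID it received.
# Usage: append_source <file> <status> <url> <label> <reason>
append_source() {
  local file="$1" status="$2" url="$3" label="$4" reason="$5" id ts
  id="$(next_source_id "$file")"
  ts="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  {
    printf '## [%s] %s | %s\n\n' "$id" "$ts" "$status"
    printf '%s\n' "- URL: $url" "- Label: $label" "- Reason opened: $reason" ""
  } >> "$file"
  printf '%s\n' "$id"
}
```

Taking the maximum existing ID rather than counting entries keeps numbering correct even if an old entry was removed by hand.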

### `audit.md`

This file is always required. It is the trust layer for the KB.

It must contain four sections:

- `FOUND`
- `INFERRED`
- `CONFLICTING`
- `MISSING`

Example:

```markdown
# Audit

## FOUND
- [FOUND | S003] Pikachu is an Electric-type Mouse Pokemon.

## INFERRED
- [INFERRED | S003,S004] Pikachu's mascot role is reinforced across both official canon and encyclopedia framing.

## CONFLICTING
- [CONFLICTING | S004,S009] Source A says X while source B frames Y.

## MISSING
- [MISSING] No dedicated benchmark source was read in this run.
```

Rules:

- `FOUND` requires at least one direct source ID
- `INFERRED` should usually reference at least two source IDs
- `CONFLICTING` must name the disagreement explicitly
- Use `MISSING` whenever the KB lacks evidence; do not paper over the gap with hand-waving
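
The two mechanical rules (four sections present, every `FOUND` entry cites a source ID) can be checked with a small lint. This is a sketch that assumes the bracketed tag format from the example. `audit_lint` is a hypothetical helper, and it does not try to judge the `INFERRED` or `CONFLICTING` rules, which need a human or model reading.

```bash
# Check an audit.md file. Prints each problem and returns 1 if any was found.
audit_lint() {
  local file="$1" section rc=0
  for section in FOUND INFERRED CONFLICTING MISSING; do
    if ! grep -qx "## $section" "$file"; then
      echo "missing section: $section"
      rc=1
    fi
  done
  # Every FOUND entry must carry at least one source ID, e.g. [FOUND | S003].
  if grep -E '^- \[FOUND' "$file" \
      | grep -qvE '^- \[FOUND [|] S[0-9]+(, ?S[0-9]+)*\]'; then
    echo "FOUND entry without a source ID"
    rc=1
  fi
  return "$rc"
}
```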

### `manifest.json`

This file is always required. It stores:

- topic
- topic slug
- build or update mode
- created timestamp
- last updated timestamp
- page list
- run history
- simple run bookkeeping like URLs visited and pages generated
- whether trace mode was enabled
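
The skill does not fix a schema for this file, so the following is only one possible shape for a first build. Every key name (`topic_slug`, `created_at`, `runs`, and so on) is an assumption chosen for illustration, as are the helpers `json_escape` and `write_manifest`.

```bash
# Escape backslashes and double quotes so a string is safe inside JSON quotes.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Write a first-build manifest. All key names below are hypothetical.
# Usage: write_manifest <kb-dir> <topic> <slug> <build|update> <true|false>
write_manifest() {
  local dir="$1" topic="$2" slug="$3" mode="$4" trace="$5" now
  now="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  cat > "$dir/manifest.json" <<EOF
{
  "topic": "$(json_escape "$topic")",
  "topic_slug": "$slug",
  "mode": "$mode",
  "created_at": "$now",
  "updated_at": "$now",
  "trace": $trace,
  "pages": ["index.md", "sources.md", "audit.md"],
  "runs": [
    { "at": "$now", "mode": "$mode", "urls_visited": 0, "pages_generated": 3 }
  ]
}
EOF
}
```

In update mode the existing `created_at` value and earlier `runs` entries would need to be preserved rather than overwritten; this sketch covers the first build only.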

## Dynamic files

Do **not** hardcode a fixed set like `papers.md` or `repos.md` for every topic.

Create additional files only when the topic actually supports them.

Common examples:
- `papers.md`
- `repos.md`
- `docs.md`
- `articles.md`
- `datasets.md`
- `benchmarks.md`
- `people.md`
- `glossary.md`
- `timeline.md`
- `landscape.md`
- `reading-order.md`
- `disagreements.md`
- `what-matters.md`

Rules:
- if a category has meaningful evidence, create its file
- if it does not, skip it
- do not create empty placeholder files
- if a category only has 1-2 minor findings, fold it into `index.md` instead
- create `updates.md` when the KB is refreshed in update mode
- if the topic is broad enough to have multiple camps, phases, or implementation styles, create `landscape.md`
- if the sources disagree in meaningful ways, create `disagreements.md`
- if the reader would benefit from a guided path, create `reading-order.md`

All generated markdown files should use `[[wikilinks]]` when linking to other local pages.

Trace mode exception:
- files under `_trace/` are debugging artifacts, not user-facing KB pages
- do not clutter `index.md` with `_trace/` links unless the user explicitly asks
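
A sketch of building the page list for `index.md` from whatever files actually exist, so that skipped categories never appear as dead links. `page_links` is a hypothetical helper; it assumes pages sit directly in the KB folder.

```bash
# Print one wikilink bullet per top-level markdown page, skipping index.md.
# The glob does not descend into _trace/, so trace artifacts stay out of the list.
page_links() {
  local dir="$1" f name
  for f in "$dir"/*.md; do
    [ -e "$f" ] || continue            # the glob matched nothing
    name="$(basename "$f" .md)"
    [ "$name" = "index" ] && continue
    printf '%s\n' "- [[$name]]"
  done
}
```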

## Operating model

Use a **two-pass workflow**:

1. **Discovery pass**
   - find high-value URLs
2. **Reading pass**
   - extract structured information from the selected URLs

Use **one TinyFish run per URL**.

Do not ask one TinyFish agent to cover multiple independent sites in a single command.

Run independent URLs in parallel where possible using background jobs and `wait`.
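
The background-job pattern might look like the sketch below. `run_for_url` is a hypothetical stand-in whose body should be replaced with the real `tinyfish agent run` invocation for one URL. Its arguments are deliberately not shown, because they are not specified here.

```bash
# Hypothetical stand-in for one TinyFish run. Replace the body with the real
# `tinyfish agent run` command for a single URL.
run_for_url() {
  echo "placeholder run for: $1"
}

# Start one background job per URL, then wait on each PID and count failures.
# Usage: run_parallel <output-dir> <url>...
run_parallel() {
  local out_dir="$1" url out pid pids="" i=0 failed=0
  shift
  mkdir -p "$out_dir"
  for url in "$@"; do
    i=$((i + 1))
    out="$out_dir/$(printf 'run-%03d.txt' "$i")"
    run_for_url "$url" > "$out" 2>&1 &
    pids="$pids $!"
  done
  for pid in $pids; do
    wait "$pid" || failed=$((failed + 1))
  done
  [ "$failed" -eq 0 ]
}
```

Waiting on each PID individually, rather than calling a bare `wait`, lets the caller see that a run failed. Pointing the output directory at `_trace/` would also cover the trace-mode requirement to keep raw outputs.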

## Step 0 — Decide build mode

Determine whether this run is:

- `build` — creating a KB from scratch
- `update` — adding or refreshing sources in an existing KB

Also determine:

- `TRACE` = `true` or `false`

Use `update` mode when:

- the user explicitly says update or refresh
- the target KB folder already exists and the user's intent is additive

In update mode:

- read the existing `index.md`
- read the existing `sources.md`
- read the existing `audit.md`
- read the existing `manifest.json`
- do not renumber old source IDs
- only rewrite the pages whose evidence changed

## Step 1 — Normalize the task

Write down:
- `TOPIC`
- `TOPIC_SLUG`
- `STARTER_URLS` if provided
- `MODE` = `build` or `update`
- `TRACE` = `true` or `false`

Keep the topic human-readable in the markdown output.
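
The normalization step can be sketched as a function that fills the five variables. This is a rough illustration only: `normalize_task` is a hypothetical helper, and plain substring matching stands in for intent detection. A topic or URL that itself contains words like "update" or "trace" would be misclassified and needs a manual override.

```bash
# Sets TOPIC, TOPIC_SLUG, STARTER_URLS, MODE, and TRACE as shell variables.
# Usage: normalize_task "<topic>" "<raw user request>"
# Assumption: keyword matching is a crude proxy for the user's actual intent.
normalize_task() {
  local lower
  lower="$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')"
  TOPIC="$1"   # kept human-readable for the markdown output
  TOPIC_SLUG="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//')"
  # Unique URLs found in the request, one per line, in first-seen order.
  STARTER_URLS="$(printf '%s\n' "$2" \
    | grep -oE 'https?://[^[:space:]]+' | awk '!seen[$0]++')"
  MODE=build
  case "$lower" in
    *update*|*refresh*|*"add these sources"*) MODE=update ;;
  esac
  TRACE=false
  case "$lower" in
    *trace*|*debug*) TRACE=true ;;
  esac
}
```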

## Step 2 — Build the starting URL set

If the user gave starter URLs:
- start with those

If a starter URL is a direct arXiv paper page such as `/abs/...`, `/pdf/...`, or an arXiv HTML render:
- treat it as a **reading target**
- do not send it through the discovery-search workflow first
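
A sketch of the routing check for direct arXiv paper pages. `is_arxiv_reading_target` is a hypothetical helper, and it assumes HTML renders are served under an `/html/` path on the same host.

```bash
# Succeeds when the URL points at a specific arXiv paper rather than a search page.
# Assumption: arXiv HTML renders live under /html/ on arxiv.org.
is_arxiv_reading_target() {
  case "${1#*://}" in          # strip the scheme before matching
    arxiv.org/abs/*|arxiv.org/pdf/*|arxiv.org/html/*) return 0 ;;
    www.arxiv.org/abs/*|www.arxiv.org/pdf/*|www.arxiv.org/html/*) return 0 ;;
    *) return 1 ;;
  esac
}
```

URLs that pass this check would go straight to the reading pass; everything else stays eligible for discovery.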

Then expand with a small set of public discovery URLs relevant to the topic. Choose from these patterns when relevant:

- GitHub repo search:
  ```text
  https://github.com/search?q={TOPIC}&type=repositories
  ```
- arXiv search:
  ```text
  https://arxiv.org/search/?query={TOPIC}&searchtype=all
  ```
- Hugging Face models search:
  ```text
  https://huggingface.co/models?search={TOPIC}
  ```
- Hugging Face datasets search:
  ```text
  https://huggingface.co/datasets?search={TOPIC