darwinian-evolver

Included with Lifetime

$97 forever

Evolve prompts/regex/SQL/code with Imbue's evolution loop.

Backend & APIsscripts

What this skill does


# Darwinian Evolver

Run Imbue's [darwinian_evolver](https://github.com/imbue-ai/darwinian_evolver) — an
LLM-driven evolutionary search loop — to optimize a **prompt, regex, SQL query,
or small code snippet** against a fitness function.

Status: thin wrapper around the upstream tool. The skill installs it, walks the
agent through writing a `Problem` definition (organism + evaluator + mutator),
and drives the loop via the upstream CLI or a small custom Python driver.

**License:** the upstream tool is **AGPL-3.0**. The skill ONLY ever invokes it
via the upstream CLI or a `subprocess`/`uv run` call (mere aggregation). Do NOT
import upstream classes into Hermes itself.

## When to Use

- User says "optimize this prompt", "evolve a regex for X", "auto-improve this
  code/SQL", "search for a better instruction".
- You have a scorer (exact match, regex pass-rate, unit test, LLM-judge, runtime
  metric) AND a starting candidate (organism). If you don't have a scorer, stop
  and define one first — that's the hard part.
- Cost is OK: a typical run is 50–500 LLM calls. On gpt-4o-mini that's pennies;
  on Claude Sonnet it can be a few dollars.

Do **not** use this when:
- The optimization target is differentiable (use gradient descent / DSPy).
- You only need to try 2–3 variants — just write them by hand.
- The fitness signal is purely subjective with no measurable criterion.

## Prerequisites

- Python ≥3.11
- `git`, `uv` (or `pip`)
- One of: `OPENROUTER_API_KEY`, `ANTHROPIC_API_KEY`, or `OPENAI_API_KEY`

The skill ships a small `parrot_openrouter.py` driver that uses `OPENROUTER_API_KEY`
via the OpenAI SDK, so any model on OpenRouter works. The upstream CLI itself
hardcodes Anthropic and needs `ANTHROPIC_API_KEY`.

## Install (One-Time)

Run via the `terminal` tool:

```bash
mkdir -p ~/.hermes/cache/darwinian-evolver && cd ~/.hermes/cache/darwinian-evolver
[ -d darwinian_evolver ] || git clone --depth 1 https://github.com/imbue-ai/darwinian_evolver.git
cd darwinian_evolver && uv sync
```

Verify:

```bash
cd ~/.hermes/cache/darwinian-evolver/darwinian_evolver \
  && uv run darwinian_evolver --help | head -5
```

## Quick Start — The Built-In Parrot Example

Tiny smoke test (requires `ANTHROPIC_API_KEY`):

```bash
cd ~/.hermes/cache/darwinian-evolver/darwinian_evolver
uv run darwinian_evolver parrot \
  --num_iterations 2 \
  --num_parents_per_iteration 2 \
  --mutator_concurrency 2 --evaluator_concurrency 2 \
  --output_dir /tmp/parrot_demo
```

Outputs:
- `/tmp/parrot_demo/snapshots/iteration_N.pkl` — pickled population per iteration
- `/tmp/parrot_demo/<jsonl>` — per-iteration JSON log (path printed at end)

Open `~/.hermes/cache/darwinian-evolver/darwinian_evolver/darwinian_evolver/lineage_visualizer.html`
in a browser and load the JSON log to see the evolutionary tree.

## Quick Start — OpenRouter Driver (No Anthropic Key)

The skill ships `scripts/parrot_openrouter.py` — same parrot problem, but the
LLM call goes through OpenRouter so any provider works.

```bash
# From wherever the skill is installed:
SKILL_DIR=~/.hermes/skills/research/darwinian-evolver
DE_DIR=~/.hermes/cache/darwinian-evolver/darwinian_evolver

cd "$DE_DIR" && \
  EVOLVER_MODEL='openai/gpt-4o-mini' \
  uv run --with openai python "$SKILL_DIR/scripts/parrot_openrouter.py" \
    --num_iterations 3 --num_parents_per_iteration 2 \
    --output_dir /tmp/parrot_or
```

Inspect the result with `scripts/show_snapshot.py`:

```bash
uv run --with openai python "$SKILL_DIR/scripts/show_snapshot.py" \
  /tmp/parrot_or/snapshots/iteration_3.pkl
```

Expected output: 7 evolved prompt templates ranked by score, with the best
landing around 0.6–0.8 (the seed `Say {{ phrase }}` scored 0.000).

## Defining a Custom Problem

The skill ships `templates/custom_problem_template.py` — copy, edit, run.
Three things you must define:

1. **`Organism`** — a Pydantic `BaseModel` subclass holding the artifact being
   evolved (`prompt_template: str`, `regex_pattern: str`, `sql_query: str`,
   `code_block: str`, etc.). Add a `run(*args)` method that exercises it.

2. **`Evaluator`** — `.evaluate(organism) -> EvaluationResult(score=..., trainable_failure_cases=[...], holdout_failure_cases=[...], is_viable=True)`.
   - **`score`** is in `[0, 1]`. Higher is better.
   - **`trainable_failure_cases`** — what the mutator sees. Include enough
     context (input, expected, actual) for the LLM to diagnose.
   - **`holdout_failure_cases`** — kept out of the mutator's view. Use these
     to detect overfitting.
   - **`is_viable=True`** unless the organism is completely broken (raises,
     returns None, etc.). A 0-score viable organism is fine — it just gets
     down-weighted in parent selection.

3. **`Mutator`** — `.mutate(organism, failure_cases, learning_log_entries) -> list[Organism]`.
   Typically: build an LLM prompt that includes the current organism + a
   failure case + an ask to propose a fix; parse the LLM's response; return
   a new `Organism`. Return `[]` on parse failure — the loop handles it.

Then write a driver script that wires `Problem(initial_organism, evaluator, [mutators])`
into `EvolveProblemLoop` and iterates over `loop.run(num_iterations=N)` — the
shipped `scripts/parrot_openrouter.py` is the reference.

## Hyperparameters That Actually Matter

| flag | default | when to change |
|---|---|---|
| `--num_iterations` | 5 | bump to 10–20 once you trust the evaluator |
| `--num_parents_per_iteration` | 4 | drop to 2 for cheap exploration |
| `--mutator_concurrency` | 10 | drop to 2–4 to avoid rate limits |
| `--evaluator_concurrency` | 10 | same; evaluator hits the LLM too |
| `--batch_size` | 1 | raise to 3–5 once your mutator handles multiple failures |
| `--verify_mutations` | off | turn on once mutator is wasteful (>10× cost saving on later runs per Imbue) |
| `--midpoint_score` | `p75` | leave alone unless scores cluster |
| `--sharpness` | 10 | leave alone |

## Pitfalls

1. **`Initial organism must be viable`** — set `is_viable=True` in your
   `EvaluationResult` even on a 0-score seed. The loop refuses non-viable
   organisms because they imply the loop has nothing to evolve from.
2. **Provider content filters kill runs.** Azure-backed OpenRouter models
   reject phrases like "ignore previous instructions" with HTTP 400. Wrap
   the LLM call in `try/except` and return `f"<LLM_ERROR: {e}>"` — the
   evolver will just score that organism 0 and move on.
3. **`loop.run()` is a generator** — calling it doesn't run anything until
   you iterate. Use `for snap in loop.run(num_iterations=N):`.
4. **Snapshots are nested pickles.** `iteration_N.pkl` contains a dict with
   `population_snapshot` (more pickled bytes). To unpickle you must have the
   `Organism` class importable under the same dotted path it was pickled at.
5. **Concurrency defaults are aggressive.** 10/10 will hit rate limits on
   most providers. Start with 2/2.
6. **CLI is hardcoded to Anthropic.** `uv run darwinian_evolver <problem>`
   reaches for `ANTHROPIC_API_KEY` and uses Claude Sonnet. To use any other
   provider, write a driver like `parrot_openrouter.py`.
7. **AGPL.** Never `from darwinian_evolver import ...` inside Hermes core.
   Custom driver scripts under `~/.hermes/skills/...` are user-side and fine.
8. **No PyPI package.** `pip install darwinian-evolver` will pull the wrong
   thing. Always install from the GitHub repo.

## Verification

After install + a parrot run, exit code 0 from this is sufficient:

```bash
DE_DIR=~/.hermes/cache/darwinian-evolver/darwinian_evolver
ls "$DE_DIR/darwinian_evolver/lineage_visualizer.html" >/dev/null && \
cd "$DE_DIR" && uv run darwinian_evolver --help >/dev/null && \
echo "darwinian-evolver: OK"
```

## References

- [Imbue research post](https://imbue.com/research/2026-02-27-darwinian-evolver/)
- [ARC-AGI-2 results](https://imbue.com/research/2026-02-27-arc-agi-2-evolution/)
- [imbue-ai/darwinian_evolver](https://github.com/imbue-ai/darwinian_evolver) (AGPL-3.0)
- [Darwin Gödel Mac

Files: 4

Size: 27.7 KB

Complexity: 53/100

Category: Backend & APIs

Source: https://github.com/nousresearch/hermes-agent/tree/main/optional-skills/research/darwinian-evolver

Related in Backend & APIs

jfrog

Included

Interact with the JFrog Platform via the JFrog CLI and REST/GraphQL APIs. Use this skill when the user wants to manage Artifactory repositories, upload or download artifacts, manage builds, configure permissions, manage users and groups, work with access tokens, configure JFrog CLI servers, search artifacts, manage properties, set up replication, manage JFrog Projects, run security audits or scans, look up CVE details, query exposures scan results from JFrog Advanced Security, manage release bundles and lifecycle operations, aggregate or export platform data, or perform any JFrog Platform administration task. Also use when the user mentions jf, jfrog, artifactory, xray, distribution, evidence, apptrust, onemodel, graphql, workers, mission control, curation, advanced security, exposures, or any JFrog product name.

Backend & APIsscripts

cupynumeric-migration-readiness

Included

Pre-migration readiness assessor for porting NumPy to cuPyNumeric. Use BEFORE substantial porting work begins when the user asks whether code will scale on GPU, whether they should migrate to cuPyNumeric, which NumPy patterns transfer cleanly, what must be refactored before porting, or mentions pre-port assessment, scaling analysis, or refactor planning. Inspect the user's source code, look up NumPy usage, cross-reference the cuPyNumeric API support manifest, and distinguish distributed-scaling-friendly patterns from blockers such as unsupported APIs, scalar synchronization, host round-trips, Python/object-heavy control flow, shape/data-dependent branching, and in-place mutation hazards. Produce a verdict of READY, LIGHT REFACTOR, SIGNIFICANT REFACTOR, or NOT RECOMMENDED, with concrete refactor pointers.

Backend & APIsscripts

alibabacloud-data-agent-skill

Included

Invoke Alibaba Cloud Apsara Data Agent for Analytics via CLI to perform natural language-driven data analysis on enterprise databases. Data Agent for Analytics is an intelligent data analysis agent developed by Alibaba Cloud Database team for enterprise users. It automatically completes requirement analysis, data understanding, analysis insights, and report generation based on natural language descriptions. This tool supports: discovering data resources (instances/databases/tables) managed in DMS, initiating query or deep analysis sessions, real-time progress tracking, and retrieving analysis conclusions and generated reports. Use this Skill when users need to query databases, analyze data trends, generate data reports, ask questions in natural language, or mention "Data Agent", "data analysis", "database query", "SQL analysis", "data insights".

Backend & APIsscripts

token-optimizer

Included

Reduce OpenClaw token usage and API costs through smart model routing, heartbeat optimization, budget tracking, and native 2026.2.15 features (session pruning, bootstrap size limits, cache TTL alignment). Use when token costs are high, API rate limits are being hit, or hosting multiple agents at scale. The 4 executable scripts (context_optimizer, model_router, heartbeat_optimizer, token_tracker) are local-only — no network requests, no subprocess calls, no system modifications. Reference files (PROVIDERS.md, config-patches.json) document optional multi-provider strategies that require external API keys and network access if you choose to use them. See SECURITY.md for full breakdown.

Backend & APIsscripts

resend-cli

Included

Use this skill when the task is specifically about operating Resend from an AI agent, terminal session, or CI job via the official resend CLI: installing/authenticating the CLI, sending/listing/updating/cancelling emails, batch sends, domains and DNS, webhooks and local listeners, inbound receiving, contacts, topics, segments, broadcasts, templates, API keys, profiles, or debugging Resend CLI/API failures. Trigger on mentions of Resend CLI, `resend`, `resend doctor`, `resend emails send`, `resend domains`, `resend webhooks listen`, `resend emails receiving`, or agent-friendly terminal automation.

Backend & APIsscripts

alibabacloud-odps-maxframe-coding

Included

Use this skill for MaxFrame SDK development and documentation navigation on Alibaba Cloud MaxCompute (ODPS). Helps answer MaxFrame API, concept, official example, and supported pandas API questions; create data processing programs; read/write MaxCompute tables; debug jobs (remote or local); and build custom DPE runtime images. Trigger when users mention MaxFrame, MaxCompute with MaxFrame, ODPS table processing, DPE runtime, MaxFrame docs/examples, DataFrame/Tensor operations, or GPU runtime setup. Works for both English and Chinese queries about Alibaba Cloud data processing with MaxFrame.

Backend & APIsscripts