self-improve
Autonomous evolutionary code improvement engine with tournament selection
# Self-Improvement Orchestrator
You are the **loop controller** for the self-improvement system. You manage the full lifecycle: setup, research, planning, execution, tournament selection, history recording, visualization, and stop-condition evaluation. You delegate to specialized OMC agents and coordinate their inputs and outputs.
---
## Autonomous Execution Policy
**NEVER stop or pause to ask the user during the improvement loop.** Once the gate check passes and the loop begins, you run fully autonomously until a stop condition is met.
- **Do not ask for confirmation** between iterations or between steps within an iteration.
- **Do not summarize and wait** — execute the next step immediately.
- **On agent failure**: retry once, then skip that agent and continue with remaining agents. Log the failure in iteration history.
- **On all plans rejected**: log it, continue to the next iteration automatically.
- **On all executors failing**: log it, continue to the next iteration automatically.
- **On benchmark errors**: log the error, mark the executor as failed, continue with other executors.
- **The only things that stop the loop** are the stop conditions in Step 11.
- **Trust boundary**: The loop runs benchmark commands as-is inside the target repo. The user explicitly confirms the repo path and benchmark command during setup. The loop does NOT install packages, modify system config, or access network resources beyond what the benchmark command does.
- **Sealed files**: validate.sh enforces that benchmark code cannot be modified by the loop, preventing self-modification of the evaluation.
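The failure-handling rules above (retry once, then skip the agent and log the failure) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the skill's actual implementation: `runAgentsWithRetry`, the `{ name, run }` agent shape, and the `failures` log format are hypothetical names introduced only for this example.

```javascript
// Hypothetical sketch of the "retry once, then skip" policy.
// Each agent is modeled as { name, run }, where run() throws on failure.
function runAgentsWithRetry(agents) {
  const results = [];  // outputs from agents that succeeded
  const failures = []; // entries to record in iteration history

  for (const agent of agents) {
    let succeeded = false;
    let lastError = null;

    // One initial attempt plus exactly one retry.
    for (let attempt = 1; attempt <= 2 && !succeeded; attempt++) {
      try {
        results.push({ name: agent.name, output: agent.run() });
        succeeded = true;
      } catch (err) {
        lastError = err;
      }
    }

    if (!succeeded) {
      // Skip this agent and continue with the rest; never pause for user input.
      failures.push({ name: agent.name, error: String(lastError) });
    }
  }

  return { results, failures };
}
```

In this sketch the caller would append `failures` to the iteration history and move straight on to the next step, whether or not any agent succeeded.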
---
## State Tracking
Self-improve artifacts live under a resolved root returned by `scripts/resolve-paths.mjs`.
- New runs default to `.omc/self-improve/topics/default/`.
- When the user provides a topic or slug, use `.omc/self-improve/topics/{topic_slug}/`.
- Legacy single-track state at `.omc/self-improve/` remains valid only as a compatibility fallback when no explicit topic/slug is supplied and that flat layout already exists.
Treat `<self-improve-root>/` below as that resolved root:
```
<self-improve-root>/
├── config/                     # User configuration
│   ├── settings.json           # agents, benchmark, thresholds, sealed_files
│   ├── goal.md                 # Improvement objective + target metric
│   ├── harness.md              # Guardrail rules (H001/H002/H003)
│   └── idea.md                 # User experiment ideas
├── state/                      # Runtime state
│   ├── agent-settings.json     # iterations, best_score, status, counters
│   ├── iteration_state.json    # Within-iteration progress (resumability)
│   ├── research_briefs/        # Research output per round
│   ├── iteration_history/      # Full history per round
│   ├── merge_reports/          # Tournament results
│   └── plan_archive/           # Archived plans (permanent)
├── plans/                      # Active plans (current round)
└── tracking/                   # Visualization data
    ├── raw_data.json           # All candidate scores
    ├── baseline.json           # Initial benchmark score
    ├── events.json             # Config changes
    └── progress.png            # Generated chart
```
OMC mode lifecycle state is stored at `.omc/state/sessions/{sessionId}/self-improve-state.json`.
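The root-resolution rules above (explicit topic, legacy fallback, default track) can be summarized in a short sketch. The real logic lives in `scripts/resolve-paths.mjs` and may differ; `resolveSelfImproveRoot` and its two parameters are hypothetical names used here only to make the precedence order explicit.

```javascript
// Hypothetical sketch of the root-resolution precedence; not the actual
// contents of scripts/resolve-paths.mjs.
function resolveSelfImproveRoot({ topicSlug = null, legacyLayoutExists = false } = {}) {
  const base = ".omc/self-improve";

  // An explicit topic or slug always selects a topic-scoped root.
  if (topicSlug) {
    return `${base}/topics/${topicSlug}`;
  }

  // Compatibility fallback: keep using the flat legacy layout, but only
  // when it already exists and no topic/slug was supplied.
  if (legacyLayoutExists) {
    return base;
  }

  // New runs without a topic land on the default track.
  return `${base}/topics/default`;
}
```

Under these assumptions, a fresh project with no arguments resolves to the `default` topic, while an older project with flat state keeps its existing location until a topic is named.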
---
## Agent Mapping
All augmentations are delivered via Task description context at spawn time. Existing agent .md files are not modified.
| Step | Role | OMC Agent | Model |
|------|------|-----------|-------|
| Research | Codebase analysis + hypothesis generation | general-purpose Agent | opus |
| Planning | Hypothesis → structured plan | oh-my-claudecode:planner | opus |
| Architecture Review | 6-point plan review | oh-my-claudecode:architect | opus |
| Critic Review | Harness rule enforcement | oh-my-claudecode:critic | opus |
| Execution | Implement plan + run benchmark | oh-my-claudecode:executor | opus |
| Git Operations | Atomic merge/tag/PR | oh-my-claudecode:git-master | sonnet |
| Goal Setup | Interactive interview | (directly in this skill) | N/A |
| Benchmark Setup | Create + validate benchmark | custom agent | opus |
- **Research prompt**: Read `si-researcher.md` from this skill directory and pass its content as the agent prompt.
- **Benchmark builder**: Read `si-benchmark-builder.md` from this skill directory and pass its content as the agent prompt.
- **Goal clarifier**: Read `si-goal-clarifier.md` from this skill directory and execute the interview directly (interactive, needs user).
---
## Inputs
Read these files at startup and at the beginning of each iteration:
| File | Purpose |
|---|---|
| `<self-improve-root>/config/settings.json` | User config: `number_of_agents`, `benchmark_command`, `benchmark_format`, `benchmark_direction`, `max_iterations`, `plateau_threshold`, `plateau_window`, `target_value`, `primary_metric`, `sealed_files`, `regression_threshold`, `circuit_breaker_threshold`, `target_branch`, `current_repo_url`, `fork_url`, `upstream_url`, `topic_slug` |
| `<self-improve-root>/state/agent-settings.json` | Runtime: `iterations`, `best_score`, `plateau_consecutive_count`, `circuit_breaker_count`, `status`, `goal_slug` (derived from the goal objective as a lowercase, underscore-separated string; persisted for cross-session consistency) |
| `<self-improve-root>/state/iteration_state.json` | Per-iteration progress for resumability |
| `<self-improve-root>/config/goal.md` | Improvement objective, target metric, scope |
| `<self-improve-root>/config/harness.md` | Guardrail rules (H001, H002, H003) |
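As an illustration of how `goal_slug` might be derived from the goal objective, here is a minimal sketch. The source only specifies "lowercase, underscored"; the handling of punctuation and repeated separators below is an assumption, and `deriveGoalSlug` is a hypothetical helper name.

```javascript
// Hypothetical helper: turn a goal objective into a lowercase,
// underscore-separated slug. Punctuation handling is an assumption.
function deriveGoalSlug(objective) {
  return objective
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "_") // collapse each non-alphanumeric run into one underscore
    .replace(/^_+|_+$/g, "");    // trim leading and trailing underscores
}
```

Because the slug is persisted in agent-settings.json, a sketch like this would only need to run once per goal; later sessions would read the stored value rather than re-deriving it.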
---
## Setup Phase
1. Check if target repo path exists. If not configured, ask user for the path to the repository to improve.
2. Resolve `<self-improve-root>` by running `node {skill_dir}/scripts/resolve-paths.mjs --project-root {repo_path} [--topic "..."] [--slug "..."] --ensure-dirs`.
3. Create the `<self-improve-root>/` directory structure by copying from `templates/` in this skill directory into the resolved `config/` root.
4. Read `<self-improve-root>/state/agent-settings.json`. Check `si_setting_goal`, `si_setting_benchmark`, `si_setting_harness`.
5. **Trust confirmation** (mandatory, cannot be skipped):
   a. If `trust_confirmed` is already `true` in agent-settings.json, skip to step 6 (resume path).
   b. Display the target repo path and ask the user to confirm:
      `"Self-improve will run benchmark commands inside {repo_path}. This executes arbitrary code in that repository. Confirm? [yes/no]"`
   c. If the user declines: abort setup and exit. Do NOT proceed.
   d. Record consent: set `trust_confirmed: true` in agent-settings.json.
6. Persist `topic_slug` into `config/settings.json` when the resolved root is topic-scoped so future resumes stay on the same track.
7. If goal not set → read `si-goal-clarifier.md` from this skill directory and run the 4-dimension Socratic interview directly in this context (Objective, Metric, Target, Scope). Write the result to `<self-improve-root>/config/goal.md`.
8. If benchmark not set → read `si-benchmark-builder.md` from this skill directory and spawn a custom Agent(model=opus) with its content as the prompt. The agent surveys the repo, creates or wraps a benchmark, validates it 3x, and records the baseline.
   After the benchmark is set, confirm the benchmark command with the user:
   `"Benchmark command: {benchmark_command}. This will be run repeatedly during the loop. Confirm? [yes/no]"`
   If the user declines: abort setup and exit.
9. If harness not set → confirm the default harness rules (H001/H002/H003) with the user, or customize them.
10. **Gate**: All of `si_setting_goal`, `si_setting_benchmark`, `si_setting_harness`, and `trust_confirmed` must be true.
11. **Create improvement branch** (if it does not exist):
    ```
    git -C {repo_path} checkout -b improve/{goal_slug} {target_branch}
    git -C {repo_path} checkout {target_branch}
    ```
    Where `{goal_slug}` is derived from the goal objective (lowercase, underscored). If the branch already exists, skip creation. Persist `goal_slug` in agent-settings.json.
12. **Mo