Claude
Skills
Sign in
Back

skill-optimizer

Included with Lifetime
$97 forever

Diagnose and optimize Agent Skills (SKILL.md) with real session data and research-backed static analysis. Works with Claude Code, Codex, and any Agent Skills-compatible agent.

AI Agents

What this skill does


## When to Use This Skill

- Use when skills are not triggering as expected or seem broken
- Use when you want to audit and improve your skill library's quality
- Use when you want to understand which skills are underperforming or wasting context tokens

## Rules

- **Read-only**: never modify skill files. Only output report.
- **All 8 dimensions**: do not skip any. If data is insufficient, report "N/A — insufficient session data" rather than omitting.
- **Quantify**: "you had 12 research tasks last week but the skill never triggered" beats "you often do research".
- **Suggest, don't prescribe**: give specific wording suggestions for description improvements, but frame as suggestions.
- **Show evidence**: for undertrigger claims, quote the actual user message that should have triggered the skill.
- **Evidence-based suggestions**: when suggesting description rewrites, cite the specific research finding that motivates the change (e.g., "front-load trigger keywords — MCP study shows 3.6x selection rate improvement").

## Overview

Analyze skills using **historical session data + static quality checks**, output a diagnostic report with P0/P1/P2 prioritized fixes. Scores each skill on a 5-point composite scale across 8 dimensions.

CSO (Claude/Agent Search Optimization) = writing skill descriptions so agents select the right skill at the right time. This skill checks for CSO violations.

## Usage

- `/optimize-skill` → scan all skills
- `/optimize-skill my-skill` → single skill
- `/optimize-skill skill-a skill-b` → multiple specified skills

## Data Sources

Auto-detect the current agent platform and scan the corresponding paths:

| Source | Claude Code | Codex | Shared |
|--------|------------|-------|--------|
| Session transcripts | `~/.claude/projects/**/*.jsonl` | `~/.codex/sessions/**/*.jsonl` | — |
| Skill files | `~/.claude/skills/*/SKILL.md` | `~/.codex/skills/*/SKILL.md` | `~/.agents/skills/*/SKILL.md` |

**Platform detection:** Check which directories exist. Scan all available sources — a user may have both Claude Code and Codex installed.

## Workflow

```
Identify target skills
        ↓
Collect session data (python3 scripts scan JSONL transcripts)
        ↓
Run 8 analysis dimensions
        ↓
Compute composite scores
        ↓
Output report with P0/P1/P2
```

### Step 1: Identify Target Skills

Scan skill directories in order: `~/.claude/skills/`, `~/.codex/skills/`, `~/.agents/skills/`. Deduplicate by skill name (same name in multiple locations = same skill). For each, read `SKILL.md` and extract:
- name, description (from YAML frontmatter)
- trigger keywords (from description field)
- defined workflow steps (Step 1/2/3... or ### sections under Workflow)
- word count

If user specified skill names, filter to only those.

### Step 2: Collect Session Data

Use python3 scripts via Bash to scan session JSONL files. Extract:

**Claude Code sessions** (`~/.claude/projects/**/*.jsonl`):
- `Skill` tool_use calls (which skills were invoked)
- User messages (full text)
- Assistant messages after skill invocation (for workflow tracking)
- User messages after skill invocation (for reaction analysis)

**Codex sessions** (`~/.codex/sessions/**/*.jsonl`):
- `session_meta` events → extract `base_instructions` for skill loading evidence
- `response_item` events → assistant outputs (workflow tracking)
- `event_msg` events → tool execution and skill-related events
- User messages from `turn_context` events (for reaction analysis)

**Note:** Codex injects skills via context rather than explicit `Skill` tool calls. Skill loading (present in `base_instructions`) does NOT equal active invocation. To detect actual use, search for skill-specific workflow markers (step headers, output formats) in `response_item` content within that session. A skill is "invoked" only if the agent produced output following the skill's defined workflow.

**Aggregated:**
- Per-skill: invocation count, trigger keyword match count
- Per-skill: user reaction sentiment after invocation
- Per-skill: workflow step completion markers

### Step 3: Run 8 Analysis Dimensions

**You MUST run ALL 8 dimensions.** The baseline behavior without this skill is to skip dimensions 4.2, 4.3, 4.5b, and 4.8. These are the most valuable dimensions — do not skip them.

#### 4.1 Trigger Rate

Count how many times each skill was actually invoked vs how many times its trigger keywords appeared in user messages.

**Claude Code:** count `Skill` tool_use calls in transcripts.
**Codex:** count sessions where the agent produced output following the skill's workflow markers (not merely loaded in context).

**Diagnose:**
- Never triggered → skill may be useless or trigger words wrong
- Keywords match >> actual invocations → undertrigger problem, description needs work
- High frequency → core skill, worth optimizing

#### 4.2 Post-Invocation User Reaction

**This dimension is critical and easy to skip. Do not skip it.**

After a skill is invoked in a session, read the user's next 3 messages. Classify:
- **Negative**: "no", "wrong", "never mind", "not what I wanted", user interrupts
- **Correction**: user re-describes their intent, manually overrides skill output
- **Positive**: "good", "ok", "continue", "nice", user follows the workflow
- **Silent switch**: user changes topic entirely (likely false positive trigger)

Report per-skill satisfaction rate.

#### 4.3 Workflow Completion Rate

**This dimension is critical and easy to skip. Do not skip it.**

For each skill invocation found in session data:
1. Extract the skill's defined steps from SKILL.md
2. Search the assistant messages in that session for step markers (Step N, specific output formats defined in the skill)
3. Calculate: how far did execution get?

Report: `{skill-name} (N steps): avg completed Step X/N (Y%)`

If a specific step is frequently where execution stops, flag it.

#### 4.4 Static Quality Analysis

Check each SKILL.md against these 14 rules:

| Check | Pass Criteria |
|-------|--------------|
| Frontmatter format | Only `name` + `description`, total < 1024 chars |
| Name format | Letters, numbers, hyphens only |
| Description trigger | Starts with "Use when..." or has explicit trigger conditions |
| Description workflow leak | Description does NOT summarize the skill's workflow steps (CSO violation) |
| Description pushiness | Description actively claims scenarios where it should be used, not just passive |
| Overview section | Present |
| Rules section | Present |
| MUST/NEVER density | Count ALL-CAPS directive words; >5 per 100 words = flag |
| Word count | < 500 words (flag if over) |
| Narrative anti-pattern | No "In session X, we found..." storytelling |
| YAML quoting safety | description containing `: ` must be wrapped in double quotes |
| Critical info position | Core trigger conditions and primary actions must be in the first 20% of SKILL.md |
| Description 250-char check | Primary trigger keywords must appear within the first 250 characters of description |
| Trigger condition count | ≤ 2 trigger conditions in description is ideal |

#### 4.5a False Positive Rate (Overtrigger)

Skill was invoked but user immediately rejected or ignored it.

#### 4.5b Undertrigger Detection

**This is the highest-value dimension.** For each skill, extract its **capability keywords** (not just trigger keywords — what the skill CAN do). Then scan user messages for tasks that match those capabilities but where the skill was NOT invoked.

Report: which user messages SHOULD have triggered the skill but didn't, and suggest description improvements.

**Compounding Risk Assessment:**
For skills with chronic undertriggering (0 triggers across 5+ sessions where relevant tasks appeared), flag as "compounding risk" — undertriggered skills cannot self-improve through usage feedback, causing the gap to widen over time. Recommend immediate description rewrite as P0.

#### 4.6 Cross-Skill Conflicts

Compare all skill pairs:
- Trigger keyword overlap (same keywords in tw

Related in AI Agents