Claude
Skills
Sign in
Back

skill-reviewer

Included with Lifetime
$97 forever

Evaluate any skill's design quality against best practices and flag anti-patterns. Use when: "/skill-reviewer", "review this skill", "evaluate skill quality", "audit skill", "how good is this skill", "rate this skill". NOT for: conversation-level reflection (use /reflect), code review, or PR review.

Designscripts

What this skill does


# /skill-reviewer - Skill Design Evaluation

Evaluate a skill's design quality against best practices from Anthropic's skill-building guidance and real-world patterns from high-quality skills.

## How It Differs from /reflect

| | /reflect | /skill-reviewer |
|---|---------|---------------|
| **Input** | Current conversation history | Skill files on disk |
| **Evaluates** | What worked/failed during usage | Design, structure, documentation quality |
| **When** | After using a skill | Any time, even before first use |
| **Output** | Conversation learnings, SKILL.md patches | Scored evaluation with actionable recommendations |

## Arguments

```
/skill-reviewer publisher-lookup          → Review by skill name (auto-discovers path)
/skill-reviewer ./skills/my-skill/        → Review by path
/skill-reviewer --fix publisher-lookup    → Review + apply quick fixes automatically
/skill-reviewer --compare reflect taboolar → Side-by-side comparison of two skills
/skill-reviewer --with-logs my-skill      → Review + include transcript usage evidence
```

**$ARGUMENTS** = `[--fix] [--with-logs] <skill-path-or-name>` OR `--compare <left-skill> <right-skill>`

Parse the argument before starting:
- If first token is `--compare`, expect exactly two skill names following it. Run comparison mode (Phase 7).
- Recognize `--with-logs` anywhere in the args as an opt-in modifier; pass it through to `scripts/cleaner_checks.py` in Phase 5.
- If `--fix` is present, the remaining positional token is the skill name. Run normal evaluation then apply fixes (Phase 6).
- Otherwise, the positional token is the skill name. Run normal evaluation only.
- If no argument is provided, ask: "Which skill should I review?"

## Workflow

### Phase 1: Discover & Inventory

**Step 1 - Locate the skill.** Search these paths in order, stop at first match:

1. Exact path if argument looks like a path (`./`, `/`, `~/`)
2. `~/.claude/skills/<name>/SKILL.md`
3. `~/.claude/plugins/*/skills/<name>/SKILL.md`
4. `~/Code/taboola-pm-skills/plugins/<name>/skills/<name>/SKILL.md`
5. Current project's `.claude/skills/<name>/SKILL.md`
6. Glob `**/skills/<name>/SKILL.md` (exclude `.git/`, `node_modules/`, `.venv/`)

**If no SKILL.md found**: Stop. Report "Skill not found: <name>. Searched [list paths checked]. Try providing the full path."

**If multiple matches**: List all candidates and ask "Which skill did you mean?" before proceeding.

**Step 2 - Resolve symlinks.** If the located path is a symlink, resolve it once to the real path. Work from the real path for all subsequent steps. Do not follow further symlinks inside the skill directory.

**Step 3 - Inventory the skill root.** Check for these standard owned subfolders only:

- `SKILL.md` (required)
- `LEARNINGS.md`
- `CLAUDE.md`
- `references/` - list files
- `scripts/` - list files
- `templates/` - list files
- `data/` - list files
- `examples/` - list files

Do NOT recurse into nested `plugins/`, `.git/`, `node_modules/`, `vendor/`, `.venv/`, or additional `SKILL.md` files beyond the root unless the main SKILL.md explicitly delegates to them.

**Step 4 - Read SKILL.md** fully. Extract frontmatter fields and note body structure (sections, line count).

**Step 5 - Read supporting files** referenced by SKILL.md or needed to score a dimension. Do not read files speculatively.

**Step 6 - Count metrics**: SKILL.md line count, number of files per subfolder.

### Phase 1b: Check Prior Reviews

Before scoring, check if this skill has been reviewed before:

```bash
python3 scripts/history.py --skill <skill-name>
```

If prior reviews exist, note the previous score and grade. A re-review should explain what changed since the last evaluation.

### Phase 2: Classify

Read `references/evaluation-framework.md` for the 9 categories. Use the signal words table as **hints only** — confirm classification semantically from the surrounding text, not keyword presence alone.

A skill should fit ONE category cleanly. If evidence is mixed, mark confidence as `low` and explain why. Do not force a category.

### Phase 3: Score (10 Dimensions)

Read `references/evaluation-framework.md` for the full rubric with scoring anchors.

For each dimension, use keyword patterns as hints — confirm semantically before assigning a score. When evidence is ambiguous, round down and note what would raise the score.

**D1: Progressive Disclosure (0-3)** - Does it use the file system for context engineering?
- Check: references/ or scripts/ populated? Does SKILL.md point to them appropriately?
- Red flag: SKILL.md > 200 lines AND no sub-files at all

**D2: Description Quality (0-3)** - Is the frontmatter description trigger-focused?
- Check: Does it say when to use AND when NOT to use? Includes specific trigger phrases?
- Red flag: Generic "Does X" with no trigger guidance

**D3: Gotchas Section (0-3)** - Is there a section documenting known failures?
- Check: Look for failure documentation semantically - specific errors with root causes and fixes
- Red flag: No failure documentation anywhere in the skill

**D4: Don't State the Obvious (0-3)** - Is content non-obvious to Claude?
- Check: Would Claude already know this without the skill? Is it org-specific tribal knowledge?
- Red flag: Skill is mostly generic frameworks Claude knows (STAR method, agile, product frameworks)

**D5: Avoid Railroading (0-3)** - Is there flexibility for Claude to adapt?
- Check: Decision branches, "if X then Y" conditionals, room for judgment?
- Red flag: Every step hardcoded, no conditionals, no graceful degradation

**D6: Setup & Configuration (0-3)** - Are user-specific values configurable?
- Check: config.json pattern? Arguments? Hardcoded absolute paths or usernames?
- Red flag: `/Users/specific-name/`, hardcoded emails or IDs baked into the skill

**D7: Memory & Data Storage (0-3)** - Does it retain useful state across runs?
- **Important**: Score 0 is correct and acceptable for intentionally stateless skills. Only score higher if the skill produces outputs where retention would genuinely benefit future runs.
- Check: LEARNINGS.md? Log files? Structured persistence?
- Red flag: Skill produces outputs that could inform future runs but discards everything

**D8: Scripts & Composable Code (0-3)** - Does it provide reusable scripts?
- Check: scripts/ with SQL, shell, or Python utilities Claude can compose on top of?
- Red flag: Complex multi-step logic described only in prose with no code assets

**D9: Frontmatter Quality (0-3)** - Is the YAML frontmatter well-configured?
- Check: name, description, allowed-tools (minimally scoped), user-invocable, argument-hint
- Red flag: Missing fields, `Bash` without subcommand scope, no argument-hint

**D10: Hooks Integration (0-2)** - Does it use on-demand hooks?
- Note: Most skills legitimately don't need hooks. Score 0 is fine. Only flag if hooks would clearly prevent a real risk (e.g., destructive pipeline with no guardrails).
- Check: hooks config, safety guardrails tied to skill invocation

### Phase 4: Detect Anti-Patterns

`score.py` auto-detects the structural anti-patterns (persona-only, script-wrapper, monolithic, hardcoded-paths, no-error-handling, stale-references, overpermissioned unscoped Bash, no-output-format). Trust its output for those.

The following three **require your semantic judgment** — score.py does not detect them:

| Anti-Pattern | How to Detect (Claude only) |
|-------------|----------------------------|
| **Redundant knowledge** | Read the content — is this generic best-practice Claude already knows, or org-specific? |
| **Category straddling** | After classifying in Phase 2, does the skill pull equally hard in 2+ directions? |
| **Overpermissioned (unused tools)** | Do the listed `allowed-tools` all appear in the workflow? Score.py only catches unscoped `Bash`. |

### Phase 5: Generate Report

**Step 1 — Skill-cleaner signals.** Run `python3 scripts/cleaner_checks.py <skill-md-path>` (append `--with-logs` if the user passed it). Include the script

Related in Design