writing-skills
Use when creating new skills, editing existing skills, or verifying skills work - applies TDD to documentation by testing with subagents before writing
What this skill does
<skill_overview>
Writing skills IS test-driven development applied to process documentation; write test (pressure scenario), watch fail (baseline), write skill, watch pass, refactor (close loopholes).
</skill_overview>
<rigidity_level>
LOW FREEDOM - Follow the RED-GREEN-REFACTOR cycle exactly when creating skills. No skill without failing test first. Same Iron Law as TDD.
</rigidity_level>
<quick_reference>
| Phase | Action | Verify |
|-------|--------|--------|
| **RED** | Create pressure scenarios | Document baseline failures |
| **RED** | Run WITHOUT skill | Agent violates rule |
| **GREEN** | Write minimal skill | Addresses baseline failures |
| **GREEN** | Run WITH skill | Agent now complies |
| **REFACTOR** | Find new rationalizations | Agent still complies |
| **REFACTOR** | Add explicit counters | Bulletproof against excuses |
| **DEPLOY** | Commit and optionally PR | Skill ready for use |
**Iron Law:** NO SKILL WITHOUT FAILING TEST FIRST (applies to new skills AND edits)
</quick_reference>
<when_to_use>
**Create skill when:**
- Technique wasn't intuitively obvious to you
- You'd reference this again across projects
- Pattern applies broadly (not project-specific)
- Others would benefit from this knowledge
**Never create for:**
- One-off solutions
- Standard practices well-documented elsewhere
- Project-specific conventions (put in CLAUDE.md instead)
**Edit existing skill when:**
- Found new rationalization agents use
- Discovered loophole in current guidance
- Need to add clarifying examples
**ALWAYS test before writing or editing. No exceptions.**
</when_to_use>
<tdd_mapping>
Skills use the exact same TDD cycle as code:
| TDD Concept | Skill Creation |
|-------------|----------------|
| **Test case** | Pressure scenario with subagent |
| **Production code** | Skill document (SKILL.md) |
| **Test fails (RED)** | Agent violates rule without skill |
| **Test passes (GREEN)** | Agent complies with skill present |
| **Refactor** | Close loopholes while maintaining compliance |
| **Write test first** | Run baseline scenario BEFORE writing skill |
| **Watch it fail** | Document exact rationalizations agent uses |
| **Minimal code** | Write skill addressing those specific violations |
| **Watch it pass** | Verify agent now complies |
| **Refactor cycle** | Find new rationalizations → plug → re-verify |
**REQUIRED BACKGROUND:** You MUST understand hyperpowers:test-driven-development before using this skill.
</tdd_mapping>
<the_process>
## 1. RED Phase - Create Failing Test
**Create pressure scenarios for subagent:**
```
Task tool with general-purpose agent:
"You are implementing a payment processing feature. User requirements:
- Process credit card payments
- Handle retries on failure
- Log all transactions
[PRESSURE 1: Time] You have 10 minutes before deployment.
[PRESSURE 2: Sunk Cost] You've already written 200 lines of code.
[PRESSURE 3: Authority] Senior engineer said 'just make it work, tests can wait.'
Implement this feature."
```
**Run WITHOUT skill present.**
**Document baseline behavior:**
- Exact rationalizations agent uses ("tests can wait," "simple feature," etc.)
- What agent skips (tests, verification, task-doc updates, etc.)
- Patterns in failure modes
**Example baseline result:**
```
Agent response:
"I'll implement the payment processing quickly since time is tight..."
[Skips TDD]
[Skips verification-before-completion]
[Claims done without evidence]
```
**This is your failing test.** Agent doesn't follow the workflow without guidance.
---
## 2. GREEN Phase - Write Minimal Skill
Write skill that addresses the SPECIFIC failures from baseline:
**Structure:**
```markdown
---
name: skill-name-with-hyphens
description: Use when [specific triggers] - [what skill does]
---
<skill_overview>
One sentence core principle
</skill_overview>
<rigidity_level>
LOW | MEDIUM | HIGH FREEDOM - [What this means]
</rigidity_level>
[Rest of standard XML structure]
```
**Frontmatter rules:**
- Only `name` and `description` fields (max 1024 chars total)
- Name: letters, numbers, hyphens only (no parentheses/special chars)
- Description: Start with "Use when...", third person, includes triggers
**Description format:**
```yaml
# ❌ BAD: Too abstract, first person
description: I can help with async tests when they're flaky
# ✅ GOOD: Starts with "Use when", describes problem
description: Use when tests have race conditions or pass/fail inconsistently - replaces arbitrary timeouts with condition polling for reliable async tests
```
**Write skill addressing baseline failures:**
- Add explicit counters for rationalizations ("tests can wait" → "NO EXCEPTIONS: tests first")
- Create quick reference table for scanning
- Add concrete examples showing failure modes
- Use XML structure for all sections
**Run WITH skill present.**
**Verify agent now complies:**
- Same pressure scenario
- Agent now follows workflow
- No rationalizations from baseline appear
**This is your passing test.**
---
## 3. REFACTOR Phase - Close Loopholes
**Find NEW rationalizations:**
Run skill with DIFFERENT pressures:
- Combine 3+ pressures (time + sunk cost + exhaustion)
- Try meta-rationalizations ("this skill doesn't apply because...")
- Test with edge cases
**Document new failures:**
- What rationalizations appear NOW?
- What loopholes did agent find?
- What explicit counters are needed?
**Add counters to skill:**
```markdown
<critical_rules>
## Common Excuses
All of these mean: [Action to take]
- "Test can wait" (NO, test first always)
- "Simple feature" (Simple breaks too, test first)
- "Time pressure" (Broken code wastes more time)
[Add ALL rationalizations found during testing]
</critical_rules>
```
**Re-test until bulletproof:**
- Run scenarios again
- Verify new counters work
- Agent complies even under combined pressures
---
## 4. Quality Checks
Before deployment, verify:
- [ ] Has `<quick_reference>` section (scannable table)
- [ ] Has `<rigidity_level>` explicit
- [ ] Has 2-3 `<example>` tags showing failure modes
- [ ] Description <500 chars, starts with "Use when..."
- [ ] Keywords throughout for search (error messages, symptoms, tools)
- [ ] One excellent code example (not multi-language)
- [ ] Supporting files only for tools or heavy reference (>100 lines)
**Token efficiency:**
- Frequently-loaded skills: <200 words ideally
- Other skills: <500 words
- Move heavy content to resources/ files
---
## 5. Deploy
**Commit to git:**
```bash
git add skills/skill-name/
git commit -m "feat: add [skill-name] skill
Tested with subagents under [pressures used].
Addresses [baseline failures found].
Closes rationalizations:
- [Rationalization 1]
- [Rationalization 2]"
```
**Personal skills:** Write to `~/.claude/skills/` for cross-project use
**Plugin skills:** PR to plugin repository if broadly useful
**STOP:** Before moving to next skill, complete this entire process. No batching untested skills.
</the_process>
<examples>
<example>
<scenario>Developer writes skill without testing first</scenario>
<code>
# Developer writes skill:
"---
name: always-use-tdd
description: Always write tests first
---
Write tests first. No exceptions."
# Then tries to deploy it
</code>
<why_it_fails>
- No baseline behavior documented (don't know what agent does WITHOUT skill)
- No verification skill actually works (might not address real rationalizations)
- Generic guidance ("no exceptions") without specific counters
- Will likely miss common excuses agents use
- Violates Iron Law: no skill without failing test first
</why_it_fails>
<correction>
**Correct approach (RED-GREEN-REFACTOR):**
**RED Phase:**
1. Create pressure scenario (time + sunk cost)
2. Run WITHOUT skill
3. Document baseline: Agent says "I'll test after since time is tight"
**GREEN Phase:**
1. Write skill with explicit counter to that rationalization
2. Add: "Common excuses: 'Time is tight' → Wrong. Broken code wastes more time. Write test first."
3. Run WITH skill → agent now writeRelated in Writing & Docs
jax-development
IncludedUse this skill when the user is writing, debugging, profiling, refactoring, reviewing, benchmarking, parallelising, exporting, or explaining JAX code, or when they mention JAX, jax.numpy, jit, grad, value_and_grad, vmap, scan, lax, random keys, pytrees, jax.Array, sharding, Mesh, PartitionSpec, NamedSharding, pmap, shard_map, Pallas, XLA, StableHLO, checkify, profiler, or the JAX repo. It helps turn NumPy or PyTorch-style code into pure functional JAX, fix tracer/control-flow/shape/PRNG bugs, remove recompiles and host-device syncs, choose transforms and sharding strategies, inspect jaxpr/lowering/IR, and benchmark compiled code correctly.
nature-article-writer
IncludedDrafts, rewrites, diagnostically critiques, and style-calibrates primary research manuscripts for Nature and Nature Portfolio journals. Use when the user wants a Nature-style title, summary paragraph or abstract, introduction, results, discussion, methods, figure legends, presubmission enquiry, cover letter, reviewer response, or when a scientific draft sounds generic, jargon-heavy, structurally weak, or AI-ish and needs precise, broad-reader-friendly prose without inventing data, analyses, or references. Best for primary research articles and letters rather than reviews or press releases unless explicitly adapting one.
deckrd
IncludedDocument-driven framework that derives requirements, specifications, implementation plans, and executable tasks from goals through structured AI dialogue. Use when user says "write requirements", "create spec", "plan implementation", "derive tasks", "structure this feature", "break down into tasks", or "document this module". Also use for reverse engineering existing code into docs (/deckrd rev). Do NOT use for direct code writing — use /deckrd-coder after tasks are generated. Do NOT use when the user only wants to run or fix existing code without planning.
clinical-decision-support
IncludedGenerate professional clinical decision support (CDS) documents for pharmaceutical and clinical research settings, including patient cohort analyses (biomarker-stratified with outcomes) and treatment recommendation reports (evidence-based guidelines with decision algorithms). Supports GRADE evidence grading, statistical analysis (hazard ratios, survival curves, waterfall plots), biomarker integration, and regulatory compliance. Outputs publication-ready LaTeX/PDF format optimized for drug development, clinical research, and evidence synthesis.
handling-sf-data
IncludedSalesforce data operations with 130-point scoring. Use this skill to create, update, delete, bulk import/export, generate test data, and clean up org records using sf CLI and anonymous Apex. TRIGGER when: user creates test data, performs bulk import/export, uses sf data CLI commands, needs data factory patterns for Apex tests, or needs to seed/clean records in a Salesforce org. DO NOT TRIGGER when: SOQL query writing only (use querying-soql), Apex test execution (use running-apex-tests), or metadata deployment (use deploying-metadata).
accelint-ac-to-playwright
IncludedConvert and validate acceptance criteria for Playwright test automation. Use when user asks to (1) review/evaluate/check if AC are ready for automation, (2) assess if AC can be converted as-is, (3) validate AC quality for Playwright, (4) turn AC into tests, (5) generate tests from acceptance criteria, (6) convert .md bullets or .feature Gherkin files to Playwright specs, (7) create test automation from requirements. Handles both bullet-style markdown and Gherkin syntax with JSON test plan generation and validation.