subagent-testing
Test skills via TDD in fresh subagents. Use when validating behavior or preventing bias.
What this skill does
# Subagent Testing - TDD for Skills

Test skills with fresh subagent instances to prevent priming bias and validate effectiveness.

## Table of Contents

1. [Overview](#overview)
2. [Why Fresh Instances Matter](#why-fresh-instances-matter)
3. [Testing Methodology](#testing-methodology)
4. [Quick Start](#quick-start)
5. [Detailed Testing Guide](#detailed-testing-guide)
6. [Success Criteria](#success-criteria)

## Overview

**Fresh instances prevent priming:** Each test runs in a new Claude conversation, so the result measures the skill's impact rather than the effects of conversation history.

## Why Fresh Instances Matter

### The Priming Problem

Running tests in the same conversation creates bias:

- Prior context influences responses
- Skill effects get mixed with conversation history
- The skill's true impact can't be isolated

### Fresh Instance Benefits

- **Isolation**: Each test starts clean
- **Reproducibility**: Consistent baseline state
- **Measurement**: Clear before/after comparison
- **Validation**: Proves skill effectiveness, not priming

## Testing Methodology

Three-phase TDD-style approach:

### Phase 1: Baseline Testing (RED)

Test without the skill to establish baseline behavior.

### Phase 2: With-Skill Testing (GREEN)

Test with the skill loaded to measure improvements.

### Phase 3: Rationalization Testing (REFACTOR)

Test the skill's anti-rationalization guardrails.

## Quick Start

```bash
# 1. Create baseline tests (without skill)
#    Use 5 diverse scenarios
#    Document full responses

# 2. Create with-skill tests (fresh instances)
#    Load skill explicitly
#    Use identical prompts
#    Compare to baseline

# 3. Create rationalization tests
#    Test anti-rationalization patterns
#    Verify guardrails work
```

## Detailed Testing Guide

For complete testing patterns, examples, and templates:

- **[Testing Patterns](modules/testing-patterns.md)** - Full TDD methodology
- **[Test Examples](modules/testing-patterns.md)** - Baseline, with-skill, rationalization tests
- **[Analysis Templates](modules/testing-patterns.md)** - Scoring and comparison frameworks

## Success Criteria

- **Baseline**: Document 5+ diverse baseline scenarios
- **Improvement**: ≥50% improvement over baseline in skill-related metrics
- **Consistency**: Results reproducible across fresh instances
- **Rationalization Defense**: Guardrails prevent ≥80% of rationalization attempts

## See Also

- **skill-authoring**: Creating effective skills
- **bulletproof-skill**: Anti-rationalization patterns
- **test-skill**: Automated skill testing command
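
## Example: Checking the Success Criteria

The thresholds under Success Criteria can be checked mechanically once each test run has been scored. The following is a minimal sketch, not part of the skill itself. It assumes each scenario was scored by hand on a 0.0 to 1.0 scale, and it reads "improvement" as the relative gain in mean score over baseline, which the skill does not define. The names `ScenarioResult` and `evaluate` are hypothetical, and the scores are placeholder values, not real measurements.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ScenarioResult:
    """One prompt run in two fresh instances (hypothetical structure)."""
    name: str
    baseline_score: float    # score without the skill, 0.0 to 1.0
    with_skill_score: float  # score with the skill loaded, identical prompt


def evaluate(results, rationalization_blocked, rationalization_total):
    """Compare scored results against the Success Criteria thresholds."""
    baseline = mean(r.baseline_score for r in results)
    with_skill = mean(r.with_skill_score for r in results)
    # Assumption: "improvement" means relative gain in the mean score.
    if baseline > 0:
        improvement = (with_skill - baseline) / baseline
    else:
        improvement = float("inf")
    # Share of rationalization attempts that the guardrails stopped.
    if rationalization_total:
        defense_rate = rationalization_blocked / rationalization_total
    else:
        defense_rate = 0.0
    return {
        "enough_scenarios": len(results) >= 5,   # 5+ baseline scenarios
        "improvement": improvement,
        "improvement_ok": improvement >= 0.50,   # at least 50% improvement
        "defense_rate": defense_rate,
        "defense_ok": defense_rate >= 0.80,      # at least 80% blocked
    }


# Placeholder scores for illustration only.
results = [
    ScenarioResult("scenario-1", 0.4, 0.8),
    ScenarioResult("scenario-2", 0.5, 0.7),
    ScenarioResult("scenario-3", 0.3, 0.6),
    ScenarioResult("scenario-4", 0.5, 0.8),
    ScenarioResult("scenario-5", 0.3, 0.6),
]
report = evaluate(results, rationalization_blocked=9, rationalization_total=10)
print(report)
```

This sketch does not cover the Consistency criterion. Checking it would mean repeating each scenario in several fresh instances and comparing the spread of scores.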