qa-team

QA team for outside-in validation, side-by-side parity loops, and A/B behavioral comparison. Use when you need behavior-driven tests, legacy-vs-new comparison, or rollout shadow validation. Creates executable scenarios and parity workflows that agents can observe, compare, and iterate on. Supports local, observable tmux, remote SSH, and shadow-mode divergence logging patterns.

# QA Team Skill

## Purpose [LEVEL 1]

This skill helps you create **agentic outside-in tests** that verify application behavior from an external user's perspective without any knowledge of internal implementation. Using the gadugi-agentic-test framework, you write declarative YAML scenarios that AI agents execute, observe, and validate.

**Key Principle**: Tests describe WHAT should happen, not HOW it's implemented. Agents figure out the execution details.

## When to Use This Skill [LEVEL 1]

### Perfect For

- **Smoke Tests**: Quick validation that critical user flows work
- **Behavior-Driven Testing**: Verify features from user perspective
- **Cross-Platform Testing**: Same test logic for CLI, TUI, Web, Electron
- **Refactoring Safety**: Tests remain valid when implementation changes
- **AI-Powered Testing**: Let agents handle complex interactions
- **Documentation as Tests**: YAML scenarios double as executable specs

### Use This Skill When

- You are starting a new project and defining expected behaviors
- You are refactoring and need tests that won't break with internal changes
- You are testing user-facing applications (CLI tools, TUIs, web apps, desktop apps)
- You are writing acceptance criteria that can be verified automatically
- You need tests that non-developers can read and understand
- You want to catch regressions in critical user workflows
- You are testing complex multi-step interactions

### Don't Use This Skill When

- You need unit tests for internal functions (use test-gap-analyzer instead)
- You are testing performance or load characteristics
- You need precise timing or concurrency control
- You are testing non-interactive batch processes
- Implementation details matter more than behavior

## Core Concepts [LEVEL 1]

### Outside-In Testing Philosophy

**Traditional Inside-Out Testing**:

```python
# Tightly coupled to implementation
def test_calculator_add():
    calc = Calculator()
    result = calc.add(2, 3)
    assert result == 5
    assert calc.history == [(2, 3, 5)]  # Knows internal state
```

**Agentic Outside-In Testing**:

```yaml
# Implementation-agnostic behavior verification
scenario:
  name: "Calculator Addition"
  steps:
    - action: launch
      target: "./calculator"
    - action: send_input
      value: "add 2 3"
    - action: verify_output
      contains: "Result: 5"
```

**Benefits**:

- Tests survive refactoring (internal changes don't break tests)
- Readable by non-developers (YAML is declarative)
- Platform-agnostic (same structure for CLI/TUI/Web/Electron)
- AI agents handle complexity (navigation, timing, screenshots)
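
For comparison with the inside-out unit test above, the same outside-in check can be sketched in plain Python with `subprocess`. This only illustrates the principle and is not how gadugi-agentic-test executes scenarios. The helper `run_cli` and the stand-in calculator command are hypothetical, introduced so the sketch runs without a real `./calculator` binary.

```python
import subprocess
import sys

# Hypothetical stand-in for "./calculator": a tiny program that reads
# "add A B" on stdin and prints the sum, so the sketch runs anywhere.
CALCULATOR_CMD = [
    sys.executable,
    "-c",
    "op, a, b = input().split(); print(f'Result: {int(a) + int(b)}')",
]


def run_cli(cmd, stdin_text, timeout=10):
    """Launch the program as a user would and capture what it prints."""
    completed = subprocess.run(
        cmd,
        input=stdin_text,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return completed.returncode, completed.stdout


def check_addition():
    """Outside-in check: only the exit code and visible output are inspected."""
    exit_code, stdout = run_cli(CALCULATOR_CMD, "add 2 3\n")
    return exit_code == 0 and "Result: 5" in stdout


if __name__ == "__main__":
    print("PASS" if check_addition() else "FAIL")
```

Nothing in `check_addition` refers to a class, method, or internal history list, so the check keeps passing if the calculator is rewritten, as long as the observable output stays the same.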

### The Gadugi Agentic Test Framework [LEVEL 2]

Gadugi-agentic-test is a Node.js-based framework (it is built with `npm` and run via `node dist/cli.js`) that:

1. **Parses YAML test scenarios** with declarative steps
2. **Dispatches to specialized agents** (CLI, TUI, Web, Electron agents)
3. **Executes actions** (launch, input, click, wait, verify)
4. **Collects evidence** (screenshots, logs, output captures)
5. **Validates outcomes** against expected results
6. **Generates reports** with evidence trails

**Architecture**:

```
YAML Scenario → Scenario Loader → Agent Dispatcher → Execution Engine
                                          ↓
                     [CLI Agent, TUI Agent, Web Agent, Electron Agent]
                                          ↓
                           Observers → Comprehension Agent
                                          ↓
                                   Evidence Report
```
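
The flow in the diagram can be approximated with a small Python sketch. It is a simplified model under stated assumptions, not the framework's actual code: `Evidence`, `CliAgent`, `AGENTS`, and `run_scenario` are hypothetical names, the scenario is passed as an already-parsed dictionary (YAML loading is omitted), and only the three CLI actions used elsewhere in this document are handled.

```python
import subprocess
import sys
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Evidence:
    """Accumulates what the agent observed while running a scenario."""
    stdout: str = ""
    exit_code: Optional[int] = None
    log: list = field(default_factory=list)


class CliAgent:
    """Hypothetical CLI agent handling launch, verify_output, verify_exit_code."""

    def execute(self, step, evidence):
        action = step["action"]
        if action == "launch":
            done = subprocess.run(step["target"], capture_output=True, text=True)
            evidence.stdout, evidence.exit_code = done.stdout, done.returncode
            ok = True
        elif action == "verify_output":
            ok = step["contains"] in evidence.stdout
        elif action == "verify_exit_code":
            ok = evidence.exit_code == step["expected"]
        else:
            ok = False  # unknown actions count as failures in this sketch
        evidence.log.append((action, ok))
        return ok


# Dispatcher table: scenario type -> agent class (only "cli" is modeled here).
AGENTS = {"cli": CliAgent}


def run_scenario(scenario):
    """Dispatch to the agent for the scenario type and run every step in order."""
    agent = AGENTS[scenario["type"]]()
    evidence = Evidence()
    results = [agent.execute(step, evidence) for step in scenario["steps"]]
    return all(results), evidence


if __name__ == "__main__":
    demo = {
        "type": "cli",
        "steps": [
            {"action": "launch",
             "target": [sys.executable, "-c", "print('Hello, World!')"]},
            {"action": "verify_output", "contains": "Hello, World!"},
            {"action": "verify_exit_code", "expected": 0},
        ],
    }
    passed, evidence = run_scenario(demo)
    print("PASSED" if passed else "FAILED", evidence.log)
```

The demo passes `target` as an argument list so it runs on any platform. The evidence log plays the role of the report at the bottom of the diagram, recording each action alongside its outcome.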

### Progressive Disclosure Levels [LEVEL 1]

This skill teaches testing in four levels:

- **Level 1: Fundamentals** - Basic single-action tests, simple verification
- **Level 2: Intermediate** - Multi-step flows, conditional logic, error handling
- **Level 3: Advanced** - Custom agents, visual regression, performance validation
- **Level 4: Parity & Shadowing** - Side-by-side A/B comparison, remote observable runs, rollout divergence logging

Each example is marked with its level. Start at Level 1 and progress as needed.

## Side-by-Side Parity and A/B Validation [LEVEL 2]

QA Team is the new name for the skill formerly called `outside-in-testing`. Use it for standard outside-in scenarios **and** for parity loops, where you compare a legacy implementation to its replacement, or approach A to approach B, as an external user would observe them.

### Use QA Team for parity work when

- migrating Python to Rust, old CLI to new CLI, or v1 to v2 behavior
- validating a rewrite before switching defaults
- comparing branch A vs branch B using the same user scenarios
- running observable side-by-side sessions in paired virtual TTYs
- logging rollout divergences in shadow mode without failing the run

### Recommended parity loop

1. Define shared user-facing scenarios first.
2. Run both implementations in isolated sandboxes.
3. Compare stdout, stderr, exit code, JSON outputs, and filesystem side effects.
4. Re-run in `--observable` mode when you need paired tmux panes for debugging.
5. Use `--ssh-target <host>` when the parity run must execute on a remote host such as `azlin`.
6. Use `--shadow-mode --shadow-log <file>` during rollout to log divergences without blocking execution.
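
Steps 2 and 3 of the loop can be sketched as follows. This is a minimal illustration, assuming each implementation can be invoked as a single command. The functions `snapshot_files`, `observe`, and `compare` are hypothetical and are not taken from `validate_cli_parity.py`. JSON outputs are compared only as raw stdout text here.

```python
import hashlib
import subprocess
import sys
import tempfile
from pathlib import Path


def snapshot_files(root):
    """Map each relative file path under root to a SHA-256 of its contents."""
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(Path(root).rglob("*"))
        if path.is_file()
    }


def observe(cmd):
    """Run cmd in a fresh temporary directory and record what a user could see."""
    with tempfile.TemporaryDirectory() as sandbox:
        done = subprocess.run(cmd, cwd=sandbox, capture_output=True, text=True)
        return {
            "stdout": done.stdout,
            "stderr": done.stderr,
            "exit_code": done.returncode,
            "files": snapshot_files(sandbox),
        }


def compare(legacy_cmd, new_cmd):
    """Return the names of the observation fields where the two runs differ."""
    legacy, new = observe(legacy_cmd), observe(new_cmd)
    return [key for key in legacy if legacy[key] != new[key]]


if __name__ == "__main__":
    # Two stand-in implementations that print the same text but write
    # different file contents, so only the filesystem side effect diverges.
    legacy = [sys.executable, "-c", "print('ok'); open('out.txt', 'w').write('1')"]
    new = [sys.executable, "-c", "print('ok'); open('out.txt', 'w').write('2')"]
    print(compare(legacy, new))
```

Each run gets its own temporary directory, so files written by one implementation cannot leak into the other's observation.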

### Command pattern to reuse

If the repo already has a parity harness, extend it instead of inventing a second one. A good baseline is:

```bash
python tests/parity/validate_cli_parity.py \
  --scenario tests/parity/scenarios/feature.yaml \
  --python-repo /path/to/legacy-repo \
  --rust-binary /path/to/new-binary \
  --observable
```

For remote parity:

```bash
python tests/parity/validate_cli_parity.py \
  --ssh-target azlin \
  --scenario tests/parity/scenarios/feature.yaml \
  --python-repo /remote/path/to/legacy-repo \
  --rust-binary /remote/path/to/new-binary
```

For rollout shadow logging:

```bash
python tests/parity/validate_cli_parity.py \
  --scenario tests/parity/scenarios/feature.yaml \
  --python-repo /path/to/legacy-repo \
  --rust-binary /path/to/new-binary \
  --shadow-mode \
  --shadow-log /tmp/feature-shadow.jsonl
```
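
The idea behind shadow logging, recording divergences without failing the run, can be sketched in Python as below. The record fields (`timestamp`, `scenario`, `field`, `legacy`, `new`) and the function names are assumptions made for illustration. This document does not specify the format that `--shadow-log` actually writes.

```python
import json
import tempfile
import time
from pathlib import Path


def log_divergence(log_path, scenario, field, legacy_value, new_value):
    """Append one divergence record to the log as a single JSON line."""
    record = {
        "timestamp": time.time(),
        "scenario": scenario,
        "field": field,
        "legacy": legacy_value,
        "new": new_value,
    }
    with Path(log_path).open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")


def shadow_compare(log_path, scenario, legacy, new):
    """Log every differing field and return the count; a mismatch never raises."""
    divergent = [key for key in legacy if legacy[key] != new.get(key)]
    for key in divergent:
        log_divergence(log_path, scenario, key, legacy[key], new.get(key))
    return len(divergent)


if __name__ == "__main__":
    log_file = Path(tempfile.mkdtemp()) / "feature-shadow.jsonl"
    count = shadow_compare(
        log_file,
        "feature",
        {"stdout": "v1\n", "exit_code": 0},
        {"stdout": "v2\n", "exit_code": 0},
    )
    print(count, log_file.read_text(encoding="utf-8"))
```

Because `shadow_compare` returns a count instead of raising, the caller can keep serving the legacy result during rollout and review the JSONL file afterwards.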

## Quick Start [LEVEL 1]

### Installation

**Prerequisites (for native module compilation):**

```bash
# macOS
xcode-select --install

# Ubuntu/Debian
sudo apt-get install -y build-essential python3

# Windows: Install Visual Studio Build Tools with "Desktop development with C++"
```

**Install the framework:**

The gadugi-agentic-test framework is not published to npm. Install it from GitHub:

```bash
# Clone the repository
git clone https://github.com/rysweet/gadugi-agentic-test.git
cd gadugi-agentic-test

# Install dependencies and build
npm install
npm run build

# Verify the build succeeded
node dist/cli.js --version
```

> **Tip**: If you want CLI-style access from anywhere, you can add an alias:
>
> ```bash
> alias gadugi-test="node /path/to/gadugi-agentic-test/dist/cli.js"
> ```

### Your First Test (CLI Example)

Create `test-hello.yaml`:

```yaml
scenario:
  name: "Hello World CLI Test"
  description: "Verify CLI prints greeting"
  type: cli

  prerequisites:
    - "./hello-world executable exists"

  steps:
    - action: launch
      target: "./hello-world"

    - action: verify_output
      contains: "Hello, World!"

    - action: verify_exit_code
      expected: 0
```

Run the test:

```bash
node dist/cli.js run -s test-hello -d ./
```

Output:

```
✓ Scenario: Hello World CLI Test
  ✓ Step 1: Launched ./hello-world
  ✓ Step 2: Output contains "Hello, World!"
  ✓ Step 3: Exit code is 0

PASSED (3/3 steps successful)
Evidence saved to: ./evidence/test-hello-20250116-093045/
```

### Understanding the YAML Structure [LEVEL 1]

Every test scenario has this structure:

```yaml
scenario:
  name: "Descriptive test name"
  description: "What this test verifies"
  type: cli  # one of: cli, tui, web, electron

  # Optional metadata
  tags: [smoke, critical, auth]
  timeout: 30s

  # What must be true before test runs
  prerequisites:
    - "Condition 1"
    - "Condition 2"

  # The test steps (executed sequentially)
  steps:
    - action: action_name
      parameter1: value1
      parameter2: val
