
multi-model-validation

Run multiple AI models in parallel for a 3-5x speedup with ENFORCED performance statistics tracking. Use when validating with Grok, Gemini, GPT-5, DeepSeek, MiniMax, Kimi, GLM, or Claudish proxy for code review, consensus analysis, or multi-expert validation. NEW in v3.2.0 - Direct API prefixes (mmax/, kimi/, glm/) for cost savings. Includes dynamic model discovery via `claudish --top-models` and `claudish --free`, session-based workspaces, and Patterns 7-8 for tracking model performance. Trigger keywords - "grok", "gemini", "gpt-5", "deepseek", "minimax", "kimi", "glm", "claudish", "multiple models", "parallel review", "external AI", "consensus", "multi-model", "model performance", "statistics", "free models".

Backend & APIs · orchestration · claudish · parallel · consensus · multi-model · grok · gemini · external-ai


# Multi-Model Validation

**Version:** 3.3.0
**Purpose:** Patterns for running multiple AI models in parallel via Claudish proxy with **context-aware preferences**, dynamic model discovery, session-based workspaces, and performance statistics
**Status:** Production Ready

## Overview

Multi-model validation is the practice of running multiple AI models (Grok, Gemini, GPT-5, DeepSeek, etc.) in parallel to validate code, designs, or implementations from different perspectives. This achieves:

- **3-5x speedup** via parallel execution (15 minutes → 5 minutes)
- **Consensus-based prioritization** (issues flagged by all models are CRITICAL)
- **Diverse perspectives** (different models catch different issues)
- **Cost transparency** (know before you spend)
- **Free model discovery** (NEW v3.0) - find high-quality free models from trusted providers
- **Performance tracking** - identify slow/failing models for future exclusion
- **Data-driven recommendations** - optimize model shortlist based on historical performance
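
The consensus rule can be sketched in bash. This is a minimal illustration under assumptions, not part of Claudish: each model's findings are assumed to be normalized already into one text file per model with one issue key per line, and `consensus_report` is a hypothetical helper name. An issue present in every file is labeled CRITICAL, matching the rule above; every other issue is tagged with how many models flagged it.

```bash
# Sketch: consensus-based prioritization across per-model findings.
# ASSUMPTION: one file per model, one normalized issue key per line.
# consensus_report is a hypothetical helper, not a Claudish command.
consensus_report() {
  local total=$#   # number of models = number of findings files
  # Deduplicate within each file so a model counts at most once per issue,
  # count how many models flagged each issue, highest count first.
  for f in "$@"; do sort -u "$f"; done \
    | sort | uniq -c | sort -k1,1nr -k2 \
    | while read -r count issue; do
        if [ "$count" -eq "$total" ]; then
          printf 'CRITICAL\t%s\n' "$issue"            # flagged by all models
        else
          printf '%s/%s\t%s\n' "$count" "$total" "$issue"
        fi
      done
}
```

For example, `consensus_report "$SESSION_DIR"/findings/*.txt` (hypothetical layout). The sketch relies on findings being mapped to identical keys first; exact string matching will not recognize two differently worded reports of the same issue.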

**Key Innovations:**

1. **Context-Aware Preferences** (NEW v3.3.0) - Automatically use saved model preferences per task type (debug/research/coding/review) from `.claude/multimodel-team.json`
2. **Dynamic Model Discovery** (v3.0) - Use `claudish --top-models` and `claudish --free` to get current available models with pricing
3. **Session-Based Workspaces** (v3.0) - Each validation session gets a unique directory to prevent conflicts
4. **4-Message Pattern** - Ensures true parallel execution by using only Task tool calls in a single message
5. **Patterns 7-8** - Statistics collection and data-driven model recommendations
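
Inside Claude Code, parallelism comes from the Task tool calls of the 4-Message Pattern. For illustration only, the same fan-out/fan-in shape can be sketched in plain bash: one background job per model, a single `wait`, and one stats row per model (status and wall-clock seconds) of the kind Patterns 7-8 build on. `run_model` is a hypothetical stub standing in for the real invocation (this sketch deliberately does not guess at Claudish's command-line flags), and the per-model `.stats` files and the `stats.tsv` layout are assumptions.

```bash
# Sketch: run one job per model in parallel, then collect per-model stats.
# run_model is a HYPOTHETICAL placeholder; replace it with the real call.
run_model() {
  echo "review from $1"
}

parallel_validate() {
  local session_dir=$1; shift
  mkdir -p "$session_dir"
  local model slug
  for model in "$@"; do
    slug=$(printf '%s' "$model" | tr '/:' '__')   # filesystem-safe file name
    (
      start=$(date +%s)
      if run_model "$model" > "$session_dir/$slug.md" 2>&1; then
        status=ok
      else
        status=failed
      fi
      end=$(date +%s)
      # One row per model: model, status, duration in seconds
      printf '%s\t%s\t%s\n' "$model" "$status" "$((end - start))" \
        > "$session_dir/$slug.stats"
    ) &
  done
  wait   # block until every background job has finished
  cat "$session_dir"/*.stats > "$session_dir/stats.tsv"
}
```

Each job writes its own `.stats` file, so no two jobs append to the same file concurrently. Sorting `stats.tsv` numerically on the third column (e.g. `sort -t "$(printf '\t')" -k3,3nr "$SESSION_DIR/stats.tsv"`) lists the slowest models first, and `failed` rows point to candidates for exclusion.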

This skill is extracted from the `/review` command and generalized for use in any multi-model workflow.

---

## ⚠️ MANDATORY: Learn and Reuse User Preferences

> **Model preferences are learned per context and reused automatically.**
>
> - First time a context is used → ASK user → SAVE to that context
> - Next time same context → USE saved models automatically (no asking)
> - User explicitly says "change models" or "different models" → ASK and UPDATE

```bash
# FIRST STEP - Read preferences file
cat .claude/multimodel-team.json 2>/dev/null
```

**Flow:**

```
1. Detect context from task keywords
   - "debug", "error", "bug", "fix" → debug
   - "research", "analyze", "investigate" → research
   - "implement", "build", "create", "code" → coding
   - "review", "audit", "check" → review

2. Check if contextPreferences[context] exists and is non-empty

   IF EXISTS (has models saved):
   → Use those models directly
   → DO NOT ask user
   → Proceed with validation

   IF EMPTY/MISSING (first time for this context):
   → Run: claudish --top-models
   → Ask user to select models (AskUserQuestion)
   → Save to contextPreferences[context]
   → Proceed with validation

3. User override triggers (explicit request to change):
   - "use different models"
   - "change models"
   - "update model preferences"
   → Ask user to select new models
   → Update contextPreferences[context]
```
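
Steps 1 and 3 of the flow can be sketched as a pair of bash functions. The keyword lists and override phrases are the ones above; the function names, the case-insensitive substring matching, and the `unknown` fallback for tasks that match no keyword are assumptions. Because the patterns are checked in order, a task such as "fix the code review script" resolves to `debug`, and substring matching is naive ("prefix" contains "fix").

```bash
# Sketch: map task wording to a preference context (first match wins).
# Function names and the "unknown" fallback are hypothetical.
detect_context() {
  local task
  task=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$task" in
    *debug*|*error*|*bug*|*fix*)        echo debug ;;
    *research*|*analyze*|*investigate*) echo research ;;
    *implement*|*build*|*create*|*code*) echo coding ;;
    *review*|*audit*|*check*)           echo review ;;
    *)                                  echo unknown ;;
  esac
}

# Sketch: succeed only when the task explicitly asks to change models.
wants_model_change() {
  local task
  task=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$task" in
    *"different models"*|*"change models"*|*"update model preferences"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

`wants_model_change` gates step 3: when it succeeds, the saved models are ignored and the user is asked again.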

**Example - Learning Flow:**

```
# First debug task ever:
Task: "Debug this authentication error"
→ Context: debug
→ contextPreferences.debug is empty
→ ASK: "Which models for debug tasks?"
→ User selects: grok, glm, minimax
→ SAVE to contextPreferences.debug
→ Run with those models

# Second debug task:
Task: "Debug the API timeout"
→ Context: debug
→ contextPreferences.debug = ["grok", "glm", "minimax"]
→ USE directly (no asking)
→ Run with saved models

# User wants to change:
Task: "Debug this error, use different models"
→ Detected: "different models" override trigger
→ ASK: "Which models for debug tasks?"
→ User selects: gemini, gpt-5-codex
→ UPDATE contextPreferences.debug
→ Run with new models
```
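
Steps 2 and 3 need to read and update `.claude/multimodel-team.json`. Below is a minimal sketch with `jq`, assuming the file stores each context as an array of model names under `contextPreferences` (as in the examples above). The helper names and the `PREFS_FILE` override variable are hypothetical, and any other keys in the file are left untouched by the update.

```bash
# Sketch: read/update contextPreferences with jq.
# ASSUMED shape: {"contextPreferences": {"<context>": ["model", ...]}}
PREFS_FILE=${PREFS_FILE:-.claude/multimodel-team.json}

# Print saved models for a context, one per line (nothing if none saved).
get_context_models() {
  [ -f "$PREFS_FILE" ] || return 0
  jq -r --arg ctx "$1" '.contextPreferences[$ctx] // [] | .[]' "$PREFS_FILE"
}

# Save models for a context, e.g.: save_context_models debug grok glm minimax
save_context_models() {
  local ctx=$1; shift
  [ "$#" -gt 0 ] || { echo "usage: save_context_models CONTEXT MODEL..." >&2; return 1; }
  local models tmp
  # Turn the remaining arguments into a JSON array of strings
  models=$(printf '%s\n' "$@" | jq -R -n -c '[inputs]')
  mkdir -p "$(dirname "$PREFS_FILE")"
  [ -f "$PREFS_FILE" ] || echo '{}' > "$PREFS_FILE"
  # Write to a temp file, then rename, so a failed jq run never truncates the file
  tmp="$PREFS_FILE.tmp.$$"
  jq --arg ctx "$ctx" --argjson models "$models" \
    '.contextPreferences[$ctx] = $models' "$PREFS_FILE" > "$tmp" \
    && mv "$tmp" "$PREFS_FILE" || { rm -f "$tmp"; return 1; }
}
```

A context counts as saved only when the lookup prints something, e.g. `[ -n "$(get_context_models debug)" ]`; an empty array is treated like a missing key, matching the "non-empty" check in step 2.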

---

## Related Skills

> **CRITICAL: Tracking Protocol Required**
>
> Before using any patterns in this skill, ensure you have completed the
> pre-launch setup from `orchestration:model-tracking-protocol`.
>
> Launching models without tracking setup = INCOMPLETE validation.

**Cross-References:**

- **orchestration:model-tracking-protocol** - MANDATORY tracking templates and protocols (NEW in v0.6.0)
  - Pre-launch checklist (8 required items)
  - Tracking table templates
  - Failure documentation format
  - Results presentation template
- **orchestration:quality-gates** - Approval gates and severity classification
- **orchestration:task-orchestration** - Progress tracking during execution
- **orchestration:error-recovery** - Handling failures and retries

**Skill Integration:**

This skill (`multi-model-validation`) defines **execution patterns** (how to run models in parallel).
The `model-tracking-protocol` skill defines **tracking infrastructure** (how to collect and present results).

**Use both together:**
```yaml
skills: orchestration:multi-model-validation, orchestration:model-tracking-protocol
```

---

## Core Patterns

### Pattern 0: Session Setup and Model Discovery (NEW v3.0)

**Purpose:** Create isolated session workspace and discover available models dynamically.

**Why Session-Based Workspaces:**

Using a fixed directory like `ai-docs/reviews/` causes problems:
- ❌ Multiple sessions overwrite each other's files
- ❌ Stale data from previous sessions pollutes results
- ❌ Hard to track which files belong to which session

Instead, create a **unique session directory** for each validation:

```bash
# Generate unique session ID (TASK_NAME is supplied by the caller; defaults to "review")
TARGET_SLUG=$(echo "${TASK_NAME:-review}" | tr '[:upper:] ' '[:lower:]-' | sed 's/[^a-z0-9-]//g' | head -c20)
# Timestamp plus 2 random bytes (4 hex characters) keeps concurrent sessions apart
SESSION_ID="review-${TARGET_SLUG}-$(date +%Y%m%d-%H%M%S)-$(head -c 2 /dev/urandom | xxd -p)"
SESSION_DIR="ai-docs/sessions/${SESSION_ID}"

# Create session workspace
mkdir -p "$SESSION_DIR"

echo "Session: $SESSION_ID"
echo "Directory: $SESSION_DIR"

# Example output (TASK_NAME="auth impl"):
# Session: review-auth-impl-20251212-143052-a3f2
# Directory: ai-docs/sessions/review-auth-impl-20251212-143052-a3f2
```

**Benefits:**
- ✅ Each session is isolated (no cross-contamination)
- ✅ Traceable - can associate files with a specific session
- ✅ Session ID can be used for tracking in statistics
- ✅ Parallel sessions don't conflict
- ✅ Aligned with `dev:feature` session pattern
- ✅ Can be committed to git as an audit trail (unlike `/tmp/`)

> **⚠️ Do NOT use `/tmp/` for session directories.** Files in `/tmp/` are not
> traceable, not committable, and parallel runs will overwrite each other.

---

**Dynamic Model Discovery:**

**NEVER hardcode model lists.** Models change frequently: new ones appear, old ones are deprecated, and pricing changes. Instead, use `claudish` to get the currently available models:

```bash
# Get top paid models (best value for money)
claudish --top-models

# Example output:
#   google/gemini-3-pro-preview    Google     $7.00/1M   1048K   🔧 🧠 👁️
#   openai/gpt-5.2-codex           Openai     $5.63/1M   400K    🔧 🧠 👁️
#   x-ai/grok-code-fast-1          X-ai       $0.85/1M   256K    🔧 🧠
#   minimax/minimax-m2.5           Minimax    $0.64/1M   262K    🔧 🧠
#   z-ai/glm-4.7                   Z-ai       $1.07/1M   202K    🔧 🧠
#   qwen/qwen3-vl-235b-a22b-ins... Qwen       $0.70/1M   262K    🔧    👁️

# Get free models from trusted providers
claudish --free

# Example output:
#   google/gemini-2.0-flash-exp:free  Google     FREE      1049K   ✓ · ✓
#   mistralai/devstral-2512:free      Mistralai  FREE      262K    ✓ · ·
#   qwen/qwen3-coder:free             Qwen       FREE      262K    ✓ · ·
#   qwen/qwen3-235b-a22b:free         Qwen       FREE      131K    ✓ ✓ ·
#   openai/gpt-oss-120b:free          Openai     FREE      131K    ✓ ✓ ·
```
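
To offer these models in a selection prompt, the IDs can be pulled out of the listing. This sketch assumes the layout shown above, where the first column is the model ID and always contains a `/`; claudish's real output format may differ, so treat the pattern as something to adjust. Truncated IDs ending in `...` (such as the Qwen VL entry) are skipped because they are not usable as model names. The function name is hypothetical.

```bash
# Sketch: extract model IDs (first column, "provider/model") from a listing.
# ASSUMES the table layout shown above; adjust the pattern if it differs.
extract_model_ids() {
  awk '$1 ~ /^[a-z0-9._-]+\/[A-Za-z0-9._:-]+$/ && $1 !~ /\.\.\.$/ { print $1 }'
}
```

Usage: `claudish --top-models | extract_model_ids` or `claudish --free | extract_model_ids`.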

**Recommended Free Models for Code Review:**

| Model | Provider | Context | Capabilities | Why Good |
|-------|----------|---------|--------------|----------|
| `qwen/qwen3-coder:free` | Qwen | 262K | Tools ✓ | Coding-specialized, large context |
| `mistralai/devstral-2512:free` | Mistral | 262K | Tools ✓ | Dev-focused, excellent for code |
| `qwen/qwen3-235b-a22b:free` | Qwen | 131K | Tools ✓ Reas
