multi-model-validation

Run multiple AI models in parallel for 3-5x speedup with ENFORCED performance statistics tracking. Use when validating with Grok, Gemini, GPT-5, DeepSeek, MiniMax, Kimi, GLM, or Claudish proxy for code review, consensus analysis, or multi-expert validation. NEW in v3.2.0 - Direct API prefixes (mmax/, kimi/, glm/) for cost savings. Includes dynamic model discovery via `claudish --top-models` and `claudish --free`, session-based workspaces, and Pattern 7-8 for tracking model performance. Trigger keywords - "grok", "gemini", "gpt-5", "deepseek", "minimax", "kimi", "glm", "claudish", "multiple models", "parallel review", "external AI", "consensus", "multi-model", "model performance", "statistics", "free models".

# Multi-Model Validation

**Version:** 3.3.0
**Purpose:** Patterns for running multiple AI models in parallel via Claudish proxy with **context-aware preferences**, dynamic model discovery, session-based workspaces, and performance statistics
**Status:** Production Ready

## Overview

Multi-model validation is the practice of running multiple AI models (Grok, Gemini, GPT-5, DeepSeek, etc.) in parallel to validate code, designs, or implementations from different perspectives. This achieves:

- **3-5x speedup** via parallel execution (for example, three 5-minute reviews finish in about 5 minutes instead of 15)
- **Consensus-based prioritization** (issues flagged by all models are CRITICAL)
- **Diverse perspectives** (different models catch different issues)
- **Cost transparency** (know before you spend)
- **Free model discovery** (NEW v3.0) - find high-quality free models from trusted providers
- **Performance tracking** - identify slow/failing models for future exclusion
- **Data-driven recommendations** - optimize model shortlist based on historical performance
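
As a rough illustration of consensus-based prioritization, the sketch below counts how many models flagged each issue and marks unanimous findings as critical. The function name `rank_by_consensus`, the issue identifiers, and the input shape (model name mapped to issue IDs) are hypothetical; only the rule "flagged by all models means CRITICAL" comes from this skill.

```python
from collections import Counter


def rank_by_consensus(findings):
    """Rank issues by how many models reported them.

    findings: dict mapping a model name to an iterable of issue IDs
    (hypothetical shape, chosen for illustration).
    """
    total_models = len(findings)
    # set() ensures a model that repeats an issue is only counted once.
    counts = Counter(
        issue for issues in findings.values() for issue in set(issues)
    )
    ranked = []
    # Most-agreed issues first; ties broken alphabetically for stable output.
    for issue, flagged_by in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        ranked.append(
            {
                "issue": issue,
                "flagged_by": flagged_by,
                "of": total_models,
                # Unanimous agreement is the only level this skill defines.
                "critical": flagged_by == total_models,
            }
        )
    return ranked
```

Any grading below "unanimous" (for example, a majority tier) would be an additional design choice and is deliberately left out here.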

**Key Innovations:**

1. **Context-Aware Preferences** (NEW v3.3.0) - Automatically use saved model preferences per task type (debug/research/coding/review) from `.claude/multimodel-team.json`
2. **Dynamic Model Discovery** (v3.0) - Use `claudish --top-models` and `claudish --free` to get current available models with pricing
3. **Session-Based Workspaces** (v3.0) - Each validation session gets a unique directory to prevent conflicts
4. **4-Message Pattern** - Ensures true parallel execution by using only Task tool calls in a single message
5. **Patterns 7-8** - Statistics collection and data-driven model recommendations

This skill is extracted from the `/review` command and generalized for use in any multi-model workflow.
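
The speedup from parallel execution comes from wall-clock time approaching the slowest model rather than the sum of all models. The sketch below shows that idea with a thread pool. It is not the 4-Message Pattern itself (which relies on Task tool calls), and `run_reviews_in_parallel` plus the reviewer callables are hypothetical stand-ins for real model invocations.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_reviews_in_parallel(reviewers, target):
    """Run every reviewer on the same target at once.

    reviewers: dict mapping a model name to a callable(target) -> result
    (hypothetical stand-ins for real model calls).
    Returns (results, elapsed_seconds).
    """
    started = time.monotonic()
    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(reviewers))) as pool:
        # Submit everything first so all reviewers start before we wait on any.
        futures = {name: pool.submit(fn, target) for name, fn in reviewers.items()}
        for name, future in futures.items():
            try:
                results[name] = {"ok": True, "output": future.result()}
            except Exception as exc:
                # One failing model should not discard the other results.
                results[name] = {"ok": False, "error": str(exc)}
    return results, time.monotonic() - started
```

With three reviewers that each take about the same time, `elapsed_seconds` stays close to a single reviewer's duration, which is where the 15 minutes → 5 minutes figure comes from.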

---

## ⚠️ MANDATORY: Learn and Reuse User Preferences

> **Model preferences are learned per context and reused automatically.**
>
> - First time a context is used → ASK user → SAVE to that context
> - Next time same context → USE saved models automatically (no asking)
> - User explicitly says "change models" or "different models" → ASK and UPDATE

```bash
# FIRST STEP - Read preferences file
cat .claude/multimodel-team.json 2>/dev/null
```

**Flow:**

```
1. Detect context from task keywords
   - "debug", "error", "bug", "fix" → debug
   - "research", "analyze", "investigate" → research
   - "implement", "build", "create", "code" → coding
   - "review", "audit", "check" → review

2. Check if contextPreferences[context] exists and is non-empty

   IF EXISTS (has models saved):
   → Use those models directly
   → DO NOT ask user
   → Proceed with validation

   IF EMPTY/MISSING (first time for this context):
   → Run: claudish --top-models
   → Ask user to select models (AskUserQuestion)
   → Save to contextPreferences[context]
   → Proceed with validation

3. User override triggers (explicit request to change):
   - "use different models"
   - "change models"
   - "update model preferences"
   → Ask user to select new models
   → Update contextPreferences[context]
```
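
The flow above can be sketched as two small functions: one maps task keywords to a context, the other decides whether to ask the user or reuse saved models. The keyword table and override phrases are taken from the flow; the function names, the whole-word matching, and the "first matching context wins" tie-break are assumptions made for illustration.

```python
import re

# Keyword table copied from the flow; dict order doubles as the tie-break
# order when a task matches more than one context (an assumption).
CONTEXT_KEYWORDS = {
    "debug": ("debug", "error", "bug", "fix"),
    "research": ("research", "analyze", "investigate"),
    "coding": ("implement", "build", "create", "code"),
    "review": ("review", "audit", "check"),
}

# Phrases that force a fresh model selection even when preferences exist.
OVERRIDE_TRIGGERS = ("different models", "change models", "update model preferences")


def detect_context(task):
    """Return the first context whose keyword appears as a whole word, else None."""
    words = set(re.findall(r"[a-z0-9-]+", task.lower()))
    for context, keywords in CONTEXT_KEYWORDS.items():
        if words & set(keywords):
            return context
    return None


def plan_model_selection(task, preferences):
    """Return (context, saved_models, must_ask_user) for a task.

    preferences: parsed content of .claude/multimodel-team.json, assumed to
    hold a "contextPreferences" object mapping context -> list of models.
    """
    context = detect_context(task)
    saved = list(preferences.get("contextPreferences", {}).get(context) or [])
    override = any(trigger in task.lower() for trigger in OVERRIDE_TRIGGERS)
    # Ask when the user requested a change or nothing is saved for this context.
    return context, saved, override or not saved
```

Whole-word matching avoids false positives such as "prefix" triggering the `fix` keyword; a task that matches no keyword yields `None` and falls through to asking the user.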

**Example - Learning Flow:**

```
# First debug task ever:
Task: "Debug this authentication error"
→ Context: debug
→ contextPreferences.debug is empty
→ ASK: "Which models for debug tasks?"
→ User selects: grok, glm, minimax
→ SAVE to contextPreferences.debug
→ Run with those models

# Second debug task:
Task: "Debug the API timeout"
→ Context: debug
→ contextPreferences.debug = ["grok", "glm", "minimax"]
→ USE directly (no asking)
→ Run with saved models

# User wants to change:
Task: "Debug this error, use different models"
→ Detected: "different models" override trigger
→ ASK: "Which models for debug tasks?"
→ User selects: gemini, gpt-5-codex
→ UPDATE contextPreferences.debug
→ Run with new models
```
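
The SAVE and UPDATE steps in this example amount to a read-modify-write on the preferences file. A minimal sketch follows, assuming the file is a JSON object with a top-level `contextPreferences` key that maps each context to a list of model names; the helper names are hypothetical and any other keys the real file may hold are preserved untouched.

```python
import json
from pathlib import Path


def load_preferences(path):
    """Read the preferences file, falling back to an empty structure."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        data = {}
    if not isinstance(data, dict):
        data = {}
    # Guarantee the key the flow relies on, whatever the file contained.
    if not isinstance(data.get("contextPreferences"), dict):
        data["contextPreferences"] = {}
    return data


def save_context_models(path, context, models):
    """Store the chosen models for one context, leaving other contexts as-is."""
    prefs = load_preferences(path)
    prefs["contextPreferences"][context] = list(models)
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(prefs, indent=2) + "\n", encoding="utf-8")
    return prefs
```

Calling `save_context_models(".claude/multimodel-team.json", "debug", ["grok", "glm", "minimax"])` corresponds to the first debug task; calling it again with a new list corresponds to the "use different models" case.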

---

## Related Skills

> **CRITICAL: Tracking Protocol Required**
>
> Before using any patterns in this skill, ensure you have completed the
> pre-launch setup from `orchestration:model-tracking-protocol`.
>
> Launching models without tracking setup = INCOMPLETE validation.

**Cross-References:**

- **orchestration:model-tracking-protocol** - MANDATORY tracking templates and protocols (NEW in v0.6.0)
  - Pre-launch checklist (8 required items)
  - Tracking table templates
  - Failure documentation format
  - Results presentation template
- **orchestration:quality-gates** - Approval gates and severity classification
- **orchestration:task-orchestration** - Progress tracking during execution
- **orchestration:error-recovery** - Handling failures and retries

**Skill Integration:**

This skill (`multi-model-validation`) defines **execution patterns** (how to run models in parallel).
The `model-tracking-protocol` skill defines **tracking infrastructure** (how to collect and present results).

**Use both together:**
```yaml
skills: orchestration:multi-model-validation, orchestration:model-tracking-protocol
```

---

## Core Patterns

### Pattern 0: Session Setup and Model Discovery (NEW v3.0)

**Purpose:** Create isolated session workspace and discover available models dynamically.

**Why Session-Based Workspaces:**

Using a fixed directory like `ai-docs/reviews/` causes problems:
- ❌ Multiple sessions overwrite each other's files
- ❌ Stale data from previous sessions pollutes results
- ❌ Hard to track which files belong to which session

Instead, create a **unique session directory** for each validation:

```bash
# Generate unique session ID
TARGET_SLUG=$(echo "${TASK_NAME:-review}" | tr '[:upper:] ' '[:lower:]-' | sed 's/[^a-z0-9-]//g' | head -c20)
# 2 random bytes -> 4 hex characters, matching the example output below
SESSION_ID="review-${TARGET_SLUG}-$(date +%Y%m%d-%H%M%S)-$(head -c 2 /dev/urandom | xxd -p)"
SESSION_DIR="ai-docs/sessions/${SESSION_ID}"

# Create session workspace
mkdir -p "$SESSION_DIR"

echo "Session: $SESSION_ID"
echo "Directory: $SESSION_DIR"

# Example output:
# Session: review-auth-impl-20251212-143052-a3f2
# Directory: ai-docs/sessions/review-auth-impl-20251212-143052-a3f2
```

**Benefits:**
- ✅ Each session is isolated (no cross-contamination)
- ✅ Traceable - can associate files with a specific session
- ✅ Session ID can be used for tracking in statistics
- ✅ Parallel sessions don't conflict
- ✅ Aligned with `dev:feature` session pattern
- ✅ Can be committed to git for an audit trail (unlike `/tmp/`)

> **⚠️ Do NOT use `/tmp/` for session directories.** Files in `/tmp/` are not
> traceable, not committable, and parallel runs will overwrite each other.

---

**Dynamic Model Discovery:**

**NEVER hardcode model lists.** Models change frequently - new ones appear, old ones deprecate, pricing updates. Instead, use `claudish` to get current available models:

```bash
# Get top paid models (best value for money)
claudish --top-models

# Example output:
#   google/gemini-3-pro-preview    Google     $7.00/1M   1048K   🔧 🧠 👁️
#   openai/gpt-5.2-codex           Openai     $5.63/1M   400K    🔧 🧠 👁️
#   x-ai/grok-code-fast-1          X-ai       $0.85/1M   256K    🔧 🧠
#   minimax/minimax-m2.5           Minimax    $0.64/1M   262K    🔧 🧠
#   z-ai/glm-4.7                   Z-ai       $1.07/1M   202K    🔧 🧠
#   qwen/qwen3-vl-235b-a22b-ins... Qwen       $0.70/1M   262K    🔧    👁️

# Get free models from trusted providers
claudish --free

# Example output:
#   google/gemini-2.0-flash-exp:free  Google     FREE      1049K   ✓ · ✓
#   mistralai/devstral-2512:free      Mistralai  FREE      262K    ✓ · ·
#   qwen/qwen3-coder:free             Qwen       FREE      262K    ✓ · ·
#   qwen/qwen3-235b-a22b:free         Qwen       FREE      131K    ✓ ✓ ·
#   openai/gpt-oss-120b:free          Openai     FREE      131K    ✓ ✓ ·
```
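
To turn a listing like the one above into data a script can filter, one option is to split each row on runs of two or more spaces. The sketch below assumes the column layout shown in the example output (model ID, provider, price, context size); that layout is not a documented format and may change, and the function names and dictionary keys are hypothetical. The text is passed in as a string, so nothing here invokes `claudish`.

```python
import re

PRICE_RE = re.compile(r"^\$(\d+(?:\.\d+)?)/1M$")
CONTEXT_RE = re.compile(r"^(\d+)K$")


def parse_model_line(line):
    """Parse one listing row; return None for anything that does not fit."""
    text = line.strip()
    if text.startswith("#"):
        text = text[1:].strip()
    # Columns are assumed to be separated by at least two spaces.
    fields = re.split(r"\s{2,}", text)
    if len(fields) < 4 or "/" not in fields[0]:
        return None
    model_id, provider, price, context = fields[:4]
    if price.upper() == "FREE":
        price_usd = 0.0
    else:
        price_match = PRICE_RE.match(price)
        if price_match is None:
            return None
        price_usd = float(price_match.group(1))
    context_match = CONTEXT_RE.match(context)
    if context_match is None:
        return None
    return {
        "id": model_id,
        "provider": provider,
        "price_per_1m_usd": price_usd,
        "context_k": int(context_match.group(1)),
    }


def parse_model_listing(text):
    """Parse every recognizable row in a multi-line listing."""
    rows = (parse_model_line(line) for line in text.splitlines())
    return [row for row in rows if row is not None]
```

Rows whose model ID is truncated and runs into the provider column fail the price check and are skipped rather than guessed at.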

**Recommended Free Models for Code Review:**

| Model | Provider | Context | Capabilities | Why Good |
|-------|----------|---------|--------------|----------|
| `qwen/qwen3-coder:free` | Qwen | 262K | Tools ✓ | Coding-specialized, large context |
| `mistralai/devstral-2512:free` | Mistral | 262K | Tools ✓ | Dev-focused, excellent for code |
| `qwen/qwen3-235b-a22b:free` | Qwen | 131K | Tools ✓ Reas

Source: https://github.com/madappgang/claude-code/tree/main/plugins/multimodel/skills/multi-model-validation
