comfyui-video-production

Plan and orchestrate end-to-end video production pipelines in ComfyUI with validation gates and error recovery. Handles img2vid, txt2vid, vid2vid, and multi-shot video production. Produces pipeline plans with correct step ordering (generate, validate, animate, validate, concat), model selection, retry strategies (seed randomization, parameter adjustment, model fallback), and VRAM-aware resource management. Use when asked to make a video, animate images, create a multi-shot video, set up a video pipeline, or orchestrate video production in ComfyUI. Does NOT cover still image generation, prompt writing, workflow building for non-video tasks, video editing in external tools, model training, installation, or hardware recommendations.

# ComfyUI Video Production Pipeline

End-to-end video production orchestration for ComfyUI with automatic error recovery, quality validation, and instance management.

## Quick Start: Which Pipeline?

**Creating a multi-shot narrative video?**
→ **Keyframe Pipeline** - Generate keyframes → Animate → Stitch with transitions

**Animating existing images?**
→ **I2V Batch Pipeline** - Load images → Queue I2V jobs → Auto-validate → Combine

**Need smooth transitions between scenes?**
→ **Transition Pipeline** - Crossfades, motion blur, zoom effects via FFmpeg

**ComfyUI stuck or crashed?**
→ **Instance Manager** - Auto-restart, health checks, queue monitoring

**Debugging video issues?**
→ **Validation Suite** - Check resolution, FPS, codec, face consistency, color grading

---

## Core Pipelines

### Pipeline 1: Keyframe-to-Video (Complete Narrative)

**Use when:** Creating story-driven videos with multiple distinct shots

```
1. Keyframe Generation Phase
   - Generate consistent keyframes with IP-Adapter/LoRA
   - Validate face consistency, lighting, pose progression
   - Save to organized directory structure
   - Auto-retry failed generations

2. I2V Animation Phase
   - Queue each keyframe to I2V model (Wan 2.2, LTX-2, AnimateDiff)
   - Monitor progress via ComfyUI API
   - Validate each clip (resolution, fps, duration)
   - Auto-retry with different seeds if failed

3. Concatenation Phase
   - Pre-flight validation (ensure all clips match)
   - Apply transition effects (crossfade, motion blur)
   - FFmpeg encoding with proper codec
   - Export final video with metadata

4. Quality Assurance
   - Face consistency check across clips
   - Color grading consistency
   - Audio sync validation (if applicable)
   - Generate QA report
```

**Expected output:** Single cohesive video with smooth transitions
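
The phase ordering above can be sketched as a flat plan in which every artifact is validated before a later phase consumes it. This is a minimal sketch, not the skill's own implementation: `Step`, `build_keyframe_plan`, and the phase names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    phase: str   # hypothetical phase name, e.g. "generate" or "animate"
    target: str  # shot id, or "final" for whole-video steps


def build_keyframe_plan(shot_ids: list[str]) -> list[Step]:
    """Order steps so each output is validated before the next phase uses it."""
    plan: list[Step] = []
    # Phase 1: generate and validate every keyframe.
    for shot in shot_ids:
        plan.append(Step("generate", shot))
        plan.append(Step("validate_keyframe", shot))
    # Phase 2: animate each validated keyframe, then validate the clip.
    for shot in shot_ids:
        plan.append(Step("animate", shot))
        plan.append(Step("validate_clip", shot))
    # Phases 3 and 4: stitch once, then run QA on the final video.
    plan.append(Step("concat", "final"))
    plan.append(Step("qa_report", "final"))
    return plan
```

Keeping the plan as plain data makes it easy to checkpoint and to resume from the first step that has not yet succeeded.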

---

### Pipeline 2: Batch I2V Processing

**Use when:** You have multiple images to animate independently

```
1. Image Discovery
   - Scan directory for source images
   - Validate image specs (resolution, format)
   - Generate processing manifest

2. Parallel I2V Queue
   - Queue all images to ComfyUI with appropriate prompts
   - Stagger submissions to avoid overload
   - Monitor queue depth and ETA

3. Progressive Validation
   - Check each completed video immediately
   - Flag issues (wrong resolution, fps, corruption)
   - Auto-retry flagged videos

4. Export & Organize
   - Move validated videos to output directory
   - Generate index with metadata
   - Create contact sheet (thumbnail preview grid)
```

**Expected output:** Directory of validated animated clips
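
The image discovery step could look roughly like the following. It is a sketch under assumptions: the accepted suffixes, the manifest fields, and the function names are illustrative, and resolution checks are left out because they need an image library.

```python
import json
from pathlib import Path

# Assumed set of accepted formats; adjust to whatever the I2V workflow can load.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def build_manifest(source_dir: str) -> list[dict]:
    """Return one entry per image, sorted for a stable processing order."""
    entries = []
    for path in sorted(Path(source_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            entries.append({
                "image": path.name,
                "bytes": path.stat().st_size,
                "status": "pending",  # later updated to e.g. "done" or "failed"
            })
    return entries


def write_manifest(entries: list[dict], manifest_path: str) -> None:
    """Persist the manifest so an interrupted batch can be resumed."""
    Path(manifest_path).write_text(json.dumps(entries, indent=2), encoding="utf-8")
```

A per-image `status` field gives the later validation and retry steps one place to record progress.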

---

### Pipeline 3: Video Concatenation with Transitions

**Use when:** Combining existing video clips with professional transitions

```
1. Clip Validation
   - Verify all clips exist and are readable
   - Check resolution, fps, codec consistency
   - Report mismatches with fix suggestions

2. Transition Planning
   - Detect scene changes (cut detection)
   - Recommend transition types (crossfade, zoom, pan)
   - Calculate transition timing

3. FFmpeg Pipeline
   - Apply transitions between clips
   - Re-encode with consistent settings
   - Preserve quality (high bitrate, proper codec)

4. Audio Handling
   - Extract audio from clips (if present)
   - Crossfade audio at transitions
   - Sync to final video timeline
```

**Expected output:** Polished video with seamless transitions
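
For crossfades, FFmpeg's `xfade` filter takes `transition`, `duration`, and `offset` options, where each offset is measured on the output timeline built so far. The sketch below computes those offsets and chains the filters; the function names are hypothetical, and it assumes all clips already share resolution, frame rate, and pixel format, as the clip validation step requires.

```python
def xfade_offsets(durations: list[float], fade: float) -> list[float]:
    """Start time (seconds) of each crossfade on the running output timeline."""
    if any(d <= fade for d in durations):
        raise ValueError("every clip must be longer than the fade duration")
    offsets, elapsed = [], 0.0
    for d in durations[:-1]:
        elapsed += d - fade  # each fade overlaps the tail of the output so far
        offsets.append(round(elapsed, 3))
    return offsets


def xfade_filter(durations: list[float], fade: float = 0.5) -> str:
    """Build a -filter_complex string chaining xfade across N video inputs."""
    parts, prev = [], "[0:v]"
    for i, offset in enumerate(xfade_offsets(durations, fade), start=1):
        out = f"[v{i}]"
        parts.append(
            f"{prev}[{i}:v]xfade=transition=fade:duration={fade}:offset={offset}{out}"
        )
        prev = out
    return ";".join(parts)
```

The resulting string would be passed to `ffmpeg -filter_complex`, with the last label (for example `[v2]` for three clips) selected via `-map`. `xfade` handles video only; audio crossfades need a separate filter such as `acrossfade`.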

---

## Model Support (2026)

### Image-to-Video Models

| Model | Quality | Speed | VRAM | Best For | Notes |
|-------|---------|-------|------|----------|-------|
| **LTX-2** | ★★★★★ | Medium | 16GB+ | **Production 4K video** | Native 4K, audio+video |
| **Wan 2.2 MoE** | ★★★★★ | Slow | 24GB+ | **Film-quality aesthetics** | First+last frame control |
| Wan 2.1 14B | ★★★★ | Slow | 24GB | High quality | Proven, stable |
| Wan 2.1 1.3B | ★★★ | Fast | 8GB | **Quick iteration** | Consumer-friendly |
| AnimateDiff V3 | ★★★ | Fast | 8GB | Long clips (sliding context windows) | Motion LoRAs |
| SVD (Stable Video Diffusion) | ★★★ | Medium | 12GB | Short clips | 14 frames (SVD) or 25 (SVD-XT) |

### Transition Effects

| Effect | Use Case | Encoding Cost |
|--------|----------|---------------|
| **Crossfade** | General purpose | Low |
| **Motion blur** | High-motion scenes | Medium |
| **Zoom in/out** | Dramatic emphasis | Medium |
| **Pan left/right** | Scene establishment | Medium |
| **Fade to/from black** | Chapter breaks | Low |
| **Custom LUT** | Color grading | Low |

---

## ComfyUI Instance Management

### Health Monitoring

```
# Auto-detected issues:
- Queue stalled (no progress for 5+ minutes)
- Memory leak (VRAM usage climbing)
- Process crashed (connection refused)
- API unresponsive (timeout on /queue endpoint)
- Disk full (output directory at capacity)
```
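
A stalled queue can be approximated by polling `GET /queue` and checking whether the snapshot changes. In this sketch the response is assumed to carry `queue_running` and `queue_pending` lists, and `StallDetector` is a hypothetical helper. An unchanged queue is only a proxy for "no progress": a single long render may need a larger threshold or progress events from the websocket instead.

```python
from __future__ import annotations

import json
import time
import urllib.request


def fetch_queue(base_url: str = "http://localhost:8188", timeout: float = 10.0) -> dict:
    """GET /queue; raises URLError or a timeout if the server is unreachable."""
    with urllib.request.urlopen(f"{base_url}/queue", timeout=timeout) as resp:
        return json.load(resp)


class StallDetector:
    """Flags a stall when a non-empty queue is unchanged for too long."""

    def __init__(self, stall_seconds: float = 300.0):
        self.stall_seconds = stall_seconds
        self._last_signature: str | None = None
        self._last_change = 0.0

    def update(self, queue: dict, now: float | None = None) -> bool:
        now = time.monotonic() if now is None else now
        running = queue.get("queue_running", [])
        pending = queue.get("queue_pending", [])
        signature = json.dumps([running, pending], sort_keys=True, default=str)
        if signature != self._last_signature:
            # Something moved: remember the new snapshot and reset the timer.
            self._last_signature, self._last_change = signature, now
            return False
        busy = bool(running or pending)
        return busy and (now - self._last_change) >= self.stall_seconds
```

A connection error from `fetch_queue` maps to the "process crashed" or "API unresponsive" cases and can be handled separately from a stall.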

### Auto-Recovery Actions

```
1. Soft Recovery (no restart)
   - Clear stuck queue items
   - Force garbage collection
   - Unload models from VRAM

2. Hard Recovery (restart required)
   - Save current queue state
   - Stop the ComfyUI process gracefully
   - Wait for port release
   - Restart with same config
   - Restore queue from saved state

3. Emergency Fallback
   - Switch to backup ComfyUI instance
   - Redirect queue to instance on different port
   - Continue processing without data loss
```
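
The escalation from soft to hard recovery might be sketched as below. The three endpoints (`POST /interrupt`, `POST /queue` with `{"clear": true}`, and `POST /free` with `unload_models` and `free_memory`) are assumptions about the ComfyUI server routes and should be checked against the installed version. The function names and the failure thresholds are hypothetical.

```python
import json
import urllib.request


def soft_recovery_requests(base_url: str) -> list[urllib.request.Request]:
    """Build (but do not send) the requests for a no-restart recovery."""
    calls = [
        ("/interrupt", {}),                                      # stop the running job
        ("/queue", {"clear": True}),                             # drop pending items
        ("/free", {"unload_models": True, "free_memory": True}), # release VRAM
    ]
    return [
        urllib.request.Request(
            f"{base_url}{path}",
            data=json.dumps(body).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        for path, body in calls
    ]


def next_recovery_level(failed_soft: int, failed_hard: int,
                        max_soft: int = 2, max_hard: int = 1) -> str:
    """Pick the mildest recovery level that has not yet been exhausted."""
    if failed_soft < max_soft:
        return "soft"
    if failed_hard < max_hard:
        return "hard"
    return "fallback"
```

Each request can be sent with `urllib.request.urlopen(req, timeout=...)`. Building them separately keeps the escalation logic testable without a running server.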

### Multi-Instance Support

```
# Run multiple ComfyUI instances for parallel processing
Instance 1: localhost:8188 (primary - I2V generation)
Instance 2: localhost:8189 (secondary - upscaling/post-processing)
Instance 3: localhost:8190 (backup - standby for failover)

# Load balancing strategy:
- Round-robin for equal workloads
- Priority-based for mixed tasks
- Failover for crashed instances
```
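
Round-robin selection with failover can be expressed in a few lines. `InstancePool` is a hypothetical helper; how an instance gets marked down (for example, after a failed health check) is left to the caller.

```python
from itertools import cycle


class InstancePool:
    """Round-robin over ComfyUI base URLs, skipping instances marked down."""

    def __init__(self, urls: list[str]):
        self.urls = list(urls)
        self.healthy = set(self.urls)
        self._cycle = cycle(self.urls)

    def mark_down(self, url: str) -> None:
        self.healthy.discard(url)

    def mark_up(self, url: str) -> None:
        if url in self.urls:
            self.healthy.add(url)

    def next_instance(self) -> str:
        # Look at each instance at most once per call so we never spin forever.
        for _ in range(len(self.urls)):
            url = next(self._cycle)
            if url in self.healthy:
                return url
        raise RuntimeError("no healthy ComfyUI instances")
```

Priority-based routing would replace the cycle with a lookup keyed on task type. The health bookkeeping stays the same.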

---

## Validation Suite

### Pre-Generation Validation

```
✓ Check ComfyUI is running and responsive
✓ Verify required model files are available (UNET, VAE, CLIP)
✓ Confirm output directory has sufficient space
✓ Validate source images exist and are readable
✓ Check prompts are non-empty and formatted correctly
✓ Verify workflow JSON is valid
```
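
The local parts of this checklist (workflow JSON, source images, prompts, disk space) can be checked without touching the server. The sketch below assumes a 5 GiB free-space floor and a hypothetical `preflight` function. Checking that ComfyUI responds and that model files exist is not covered here.

```python
import json
import shutil
from pathlib import Path


def preflight(workflow_path: str, source_images: list[str], output_dir: str,
              prompts: list[str], min_free_bytes: int = 5 * 1024**3) -> list[str]:
    """Return a list of problems; an empty list means the run may start."""
    problems = []
    try:
        workflow = json.loads(Path(workflow_path).read_text(encoding="utf-8"))
        if not isinstance(workflow, dict) or not workflow:
            problems.append("workflow JSON is not a non-empty object")
    except (OSError, ValueError) as exc:  # ValueError covers invalid JSON
        problems.append(f"workflow unreadable or invalid JSON: {exc}")
    for image in source_images:
        if not Path(image).is_file():
            problems.append(f"missing source image: {image}")
    for index, prompt in enumerate(prompts):
        if not prompt.strip():
            problems.append(f"prompt {index} is empty")
    if not Path(output_dir).is_dir():
        problems.append(f"output directory does not exist: {output_dir}")
    elif shutil.disk_usage(output_dir).free < min_free_bytes:
        problems.append("output directory is low on free space")
    return problems
```

Returning every problem at once, rather than failing on the first, lets the user fix a batch of issues before queueing anything.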

### Post-Generation Validation

```
✓ Video file exists and is non-zero size
✓ Resolution matches expected (e.g., 768x1024)
✓ FPS matches expected (e.g., 16 or 25)
✓ Duration matches expected (e.g., 3-5 seconds)
✓ Codec is compatible (h264, h265)
✓ No corruption (can read all frames)
✓ Face consistency score >0.85 (if character video)
✓ Color histogram within expected range
```
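
Resolution, frame rate, codec, and duration can all be read with a single `ffprobe` call. The sketch below separates probing from checking so the checks can run on a saved report. `probe` and `check_clip` are hypothetical names, and the expected values are supplied by the caller. Note that `ffprobe` reports H.265 under the codec name `hevc`.

```python
import json
import subprocess
from fractions import Fraction


def probe(path: str) -> dict:
    """Run ffprobe on the first video stream and return its JSON report."""
    cmd = [
        "ffprobe", "-v", "error", "-select_streams", "v:0",
        "-show_entries", "stream=codec_name,width,height,r_frame_rate:format=duration",
        "-of", "json", path,
    ]
    done = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(done.stdout)


def check_clip(info: dict, width: int, height: int, fps: float,
               min_seconds: float, max_seconds: float,
               codecs: tuple = ("h264", "hevc")) -> list[str]:
    """Return a list of problems; an empty list means the clip passed."""
    streams = info.get("streams") or []
    if not streams:
        return ["no video stream found"]
    stream, problems = streams[0], []
    if (stream.get("width"), stream.get("height")) != (width, height):
        problems.append(f"resolution {stream.get('width')}x{stream.get('height')}")
    try:
        actual_fps = float(Fraction(stream.get("r_frame_rate", "0/1")))
    except (ValueError, ZeroDivisionError):  # e.g. "0/0" from a broken stream
        actual_fps = 0.0
    if abs(actual_fps - fps) > 0.01:
        problems.append(f"fps {actual_fps:g}")
    if stream.get("codec_name") not in codecs:
        problems.append(f"codec {stream.get('codec_name')}")
    duration = float(info.get("format", {}).get("duration", 0.0))
    if not min_seconds <= duration <= max_seconds:
        problems.append(f"duration {duration:.2f}s")
    return problems
```

For example, `check_clip(probe("clip.mp4"), 768, 1024, 16, 3, 5)` would return an empty list for a passing clip. A full-decode corruption check and the face and color checks are outside this sketch.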

### Quality Metrics

```
Metrics tracked:
- Face embedding distance (identity consistency)
- Optical flow magnitude (motion smoothness)
- Frame PSNR/SSIM (interpolation quality)
- Color histogram deviation (lighting consistency)
- Audio sync offset (if audio present)
```
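
Two of these metrics reduce to short formulas. The sketch below uses cosine similarity as one common way to score identity consistency between face embeddings, and PSNR as 10·log10(MAX² / MSE) between two frames. It is pure Python for clarity and assumes the embeddings and flattened pixel values come from elsewhere (a face-recognition model and a frame decoder, neither shown here).

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity of two embeddings: 1.0 is identical direction, 0.0 is orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / norm


def psnr(frame_a: Sequence[float], frame_b: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized frames."""
    if len(frame_a) != len(frame_b) or not frame_a:
        raise ValueError("frames must be non-empty and the same size")
    mse = sum((x - y) ** 2 for x, y in zip(frame_a, frame_b)) / len(frame_a)
    return math.inf if mse == 0 else 10 * math.log10(max_value ** 2 / mse)
```

With these definitions, the `>0.85` face consistency threshold from the post-generation checks would be applied to `cosine_similarity` of a clip's face embedding against a reference embedding. Real pipelines would typically vectorize both functions with an array library.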

---

## Error Handling & Recovery

### Retry Strategies

```
1. Seed Randomization Retry
   - Failed generation? Try different seed
   - Max 3 attempts per keyframe
   - Track seeds that fail (avoid reuse)

2. Parameter Adjustment Retry
   - CFG too high causing artifacts? Lower it
   - Steps too low causing incompleteness? Increase
   - Resolution too high OOM? Downscale

3. Model Fallback Retry
   - Wan 2.2 MoE OOM? Fall back to Wan 2.1 1.3B
   - LTX-2 unavailable? Fall back to Wan 2.1
   - AnimateDiff motion broken? Switch motion LoRA

4. Checkpoint Resume
   - Save progress after each successful clip
   - Resume from last successful checkpoint
   - Skip already-generated clips
```
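
Seed randomization and checkpoint resume can be sketched as follows. The `generate` callable is hypothetical: it is assumed to take a seed and return a result, or `None` when generation or validation fails. The three-attempt limit mirrors the list above.

```python
import random
from typing import Callable, Optional


def generate_with_retries(generate: Callable[[int], Optional[str]],
                          max_attempts: int = 3,
                          rng: Optional[random.Random] = None,
                          failed_seeds: Optional[set] = None):
    """Try fresh seeds until one succeeds; return (seed, result) or None."""
    rng = rng or random.Random()
    failed_seeds = set() if failed_seeds is None else failed_seeds
    for _ in range(max_attempts):
        seed = rng.randrange(2**32)
        while seed in failed_seeds:  # never reuse a seed that already failed
            seed = rng.randrange(2**32)
        result = generate(seed)
        if result is not None:
            return seed, result
        failed_seeds.add(seed)
    return None


def remaining_clips(clip_ids: list[str], completed: set) -> list[str]:
    """Checkpoint resume: keep original order, skip clips already done."""
    return [clip for clip in clip_ids if clip not in completed]
```

When `generate_with_retries` returns `None`, the caller can move on to the next strategy in the list, such as adjusting parameters or falling back to a smaller model.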

### Failure Logging

```
logs/
├── 2026-02-16_pipeline.log       # Main pipeline log
├── 2026-02-16_comfyui.log        # ComfyUI stdout/stderr
├── 2026-02-16_validation.json    # Validation results
├── 2026-02-16_failures.json      # Failed attempts with reasons
└── 2026-02-16_recovery.json      # Recovery actions taken
```
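
Appending to the dated failures file might look like this. The record fields (`clip`, `reason`, `attempt`) and the `log_failure` name are assumptions. Only the `YYYY-MM-DD_failures.json` naming comes from the layout above.

```python
from __future__ import annotations

import json
from datetime import date
from pathlib import Path


def log_failure(log_dir: str, clip_id: str, reason: str, attempt: int,
                day: date | None = None) -> Path:
    """Append one failure record to <log_dir>/<YYYY-MM-DD>_failures.json."""
    day = day or date.today()
    path = Path(log_dir) / f"{day.isoformat()}_failures.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    # Load existing records for the day, if any, so the file stays one JSON list.
    records = json.loads(path.read_text(encoding="utf-8")) if path.exists() else []
    records.append({"clip": clip_id, "reason": reason, "attempt": attempt})
    path.write_text(json.dumps(records, indent=2), encoding="utf-8")
    return path
```

Rewriting the whole file on each failure is simple but not safe under concurrent writers. A multi-instance setup would want one file per instance or a line-delimited format.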

---

## Directory Structure

### Organized Output

```
project_name/
├── 00_keyframes/                 # Source keyframe images
│   ├── kf01_scene_description.png
│   ├── kf02_scene_description.png
│   └── ...
├── 01_clips/                     # Individual animated clips
│   ├── clip_0
```

Files: 12

Size: 95.1 KB

Complexity: 61/100

Category: Image & Video

Source: https://github.com/mckruz/comfyui-expert/tree/main/skills/comfyui-video-production
