
comfyui-character-gen


Build identity-preserving character generation workflows and pipelines in ComfyUI. Selects the optimal identity method (InfiniteYou, FLUX Kontext, PuLID, InstantID, IP-Adapter) based on use case requirements. Handles face preservation, likeness transfer, cross-domain conversion (3D to photo), multi-reference consistency, iterative character editing, and character variation generation. Triggers on requests to generate consistent characters, preserve identity across images, create face-swapping workflows, or convert 3D renders to photorealistic portraits. Does NOT cover general image generation without identity preservation, model training/LoRA fine-tuning, animation, technical explanations, or workflow debugging.



# ComfyUI Character Generation Expert

Build production-ready ComfyUI workflows for consistent character generation across image, video, and voice modalities.

## Quick Decision: Which Approach?

**Starting from reference images (like 3D renders)?**
→ **InfiniteYou** (state-of-the-art 2025) or InstantID + IP-Adapter (proven, lower VRAM)

**Need highest identity fidelity?**
→ **FLUX.2** (NEW 2026: up to 10 ref images) or **PuLID Flux II** (no model pollution)

**Want iterative editing without retraining?**
→ **FLUX Kontext** (context-aware, maintains consistency across edits)

**Creating video content?**
→ **LTX-2** (NEW 2026: 4K production-ready), **Wan 2.2 MoE** (film-level), or **FramePack** (60-sec on 6GB!)

**Need voice for character?**
→ **TTS Audio Suite** (unified platform, 23 languages) or **F5-TTS Cross-Lingual** (NEW 2026)
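
The questions above can be read as a small decision function. The sketch below is illustrative only: the `Request` fields and the `low_vram` tie-breaker are hypothetical names introduced here, and the returned strings simply repeat the recommendations listed above.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical description of what the user needs."""
    from_reference_images: bool = False
    highest_fidelity: bool = False
    iterative_editing: bool = False
    video: bool = False
    voice: bool = False
    low_vram: bool = False  # assumption: used only to pick the lighter option


def recommend(req: Request) -> list[str]:
    """Return recommendations in the same order as the questions above."""
    picks: list[str] = []
    if req.from_reference_images:
        picks.append("InstantID + IP-Adapter" if req.low_vram else "InfiniteYou")
    if req.highest_fidelity:
        picks.append("FLUX.2 or PuLID Flux II")
    if req.iterative_editing:
        picks.append("FLUX Kontext")
    if req.video:
        picks.append("FramePack" if req.low_vram else "LTX-2 or Wan 2.2 MoE")
    if req.voice:
        picks.append("TTS Audio Suite or F5-TTS Cross-Lingual")
    return picks
```

For example, `recommend(Request(from_reference_images=True, video=True, low_vram=True))` yields the lower-VRAM options for both questions.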

## Core Workflow Patterns

### Pattern 1: Zero-Shot Character Generation (No Training)

Best for: Quick iteration, 3D-to-photorealism conversion, limited reference images

```
Load Reference Face → InstantID + IP-Adapter FaceID → ControlNet Pose → KSampler → FaceDetailer → Upscale
```

**Critical settings:**
- CFG: 4-5 (prevents burning with InstantID)
- Resolution: 1016×1016 (avoids watermark artifacts)
- IP-Adapter weight: 0.6-0.8
- InstantID noise injection: 35% to negative

See `references/workflows.md` for complete node configurations.
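
As a rough illustration, the critical settings above can be captured in a small settings object with a sanity check. This is a sketch, not part of any ComfyUI node API: the class name, field names, and default midpoints (CFG 4.5, weight 0.7) are assumptions chosen to fall inside the ranges listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZeroShotSettings:
    """Hypothetical container for the Pattern 1 settings."""
    cfg: float = 4.5                # inside the 4-5 range above
    width: int = 1016
    height: int = 1016
    ip_adapter_weight: float = 0.7  # inside the 0.6-0.8 range above
    negative_noise: float = 0.35    # 35% noise injected into the negative


def check(settings: ZeroShotSettings) -> list[str]:
    """Return warnings for values outside the ranges recommended above."""
    warnings = []
    if not 4.0 <= settings.cfg <= 5.0:
        warnings.append(f"CFG {settings.cfg} is outside 4-5")
    if (settings.width, settings.height) != (1016, 1016):
        warnings.append(
            f"resolution {settings.width}x{settings.height} differs from 1016x1016"
        )
    if not 0.6 <= settings.ip_adapter_weight <= 0.8:
        warnings.append(
            f"IP-Adapter weight {settings.ip_adapter_weight} is outside 0.6-0.8"
        )
    return warnings
```

An empty list from `check(ZeroShotSettings())` means the defaults match the recommendations; the warnings are advisory rather than hard limits.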

### Pattern 2: LoRA + Identity Methods (Maximum Consistency)

Best for: Production work, character series, video generation base

```
Train LoRA → Load LoRA + Checkpoint → Add InstantID/PuLID → Generate → FaceDetailer → ReActor (optional) → Upscale
```

**Training requirements:**
- 15-30 images, varied poses/expressions/lighting
- Unique trigger word (e.g., "sage_character")
- See `references/lora-training.md` for full parameters
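
Before training, the image count and trigger word can be checked mechanically. The sketch below assumes captions are stored as sidecar `.txt` files next to each image, which is a common convention but an assumption here; `check_dataset` and `load_captions` are hypothetical helper names. Variety of poses, expressions, and lighting still needs a visual review.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def check_dataset(captions: dict[str, str], trigger_word: str,
                  min_images: int = 15, max_images: int = 30) -> list[str]:
    """Return a list of problems; an empty list means the basic checks pass.

    `captions` maps image filename -> caption text.
    """
    problems = []
    if not min_images <= len(captions) <= max_images:
        problems.append(
            f"expected {min_images}-{max_images} images, found {len(captions)}"
        )
    for name, caption in sorted(captions.items()):
        if trigger_word not in caption:
            problems.append(f"{name}: caption is missing '{trigger_word}'")
    return problems


def load_captions(folder: Path) -> dict[str, str]:
    """Read captions from sidecar files (assumed layout: img.png + img.txt)."""
    captions = {}
    for image in sorted(folder.iterdir()):
        if image.suffix.lower() in IMAGE_SUFFIXES:
            sidecar = image.with_suffix(".txt")
            text = sidecar.read_text(encoding="utf-8") if sidecar.exists() else ""
            captions[image.name] = text
    return captions
```

A typical call would be `check_dataset(load_captions(Path("dataset")), "sage_character")`, where `dataset` is a placeholder folder name.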

### Pattern 3: Video Generation Pipeline

Best for: Talking heads, character animation, promotional content

```
Generate/Load Hero Image → Wan 2.1 I2V OR AnimateDiff → FaceDetailer per frame → Frame Interpolation → Video Combine
```

**Model selection:**
- Wan 2.1 14B: Best quality, 24GB+ VRAM, slower
- Wan 2.1 1.3B: 8GB VRAM, good quality, faster
- AnimateDiff Lightning: Fastest, best for iteration
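
The model selection above reduces to a VRAM threshold check. In this sketch the function name and the `fast_iteration` flag are hypothetical; the thresholds are the ones listed above, and because the list gives no option below 8GB the function raises instead of guessing.

```python
def pick_video_model(vram_gb: float, fast_iteration: bool = False) -> str:
    """Pick a Pattern 3 video model from the list above."""
    if fast_iteration:
        # Fastest option, recommended above for iteration.
        return "AnimateDiff Lightning"
    if vram_gb >= 24:
        return "Wan 2.1 14B"   # best quality, slower
    if vram_gb >= 8:
        return "Wan 2.1 1.3B"  # good quality, faster
    raise ValueError(f"no model in the list above is rated for {vram_gb}GB VRAM")
```

For instance, `pick_video_model(12)` selects the 1.3B model, while `pick_video_model(12, fast_iteration=True)` prefers AnimateDiff Lightning.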

### Pattern 4: Talking Head with Voice

Best for: Character dialogue, presentations, social content

**Two approaches available:**

```
Approach 1 (Image → Talking Head):
Character Portrait → Generate Audio → SadTalker/LivePortrait → CodeFormer Enhancement → Final Video

Approach 2 (Video → Add Voice):
Existing Video → Generate Audio → Wav2Lip Lip-Sync → CodeFormer Enhancement → Final Video
```

See `references/talking-head-workflows.md` for complete workflows and `references/voice-synthesis.md` for voice creation options.
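
To make the choice between the two approaches explicit, the stage lists can be generated from the type of source material. This is only a planning sketch: `talking_head_stages` is a hypothetical helper that returns stage names, and it does not invoke any of the tools.

```python
def talking_head_stages(source: str, animator: str = "LivePortrait") -> list[str]:
    """Return the ordered stages for Approach 1 ("image") or Approach 2 ("video")."""
    if source == "image":
        if animator not in ("SadTalker", "LivePortrait"):
            raise ValueError("animator must be 'SadTalker' or 'LivePortrait'")
        return ["Character Portrait", "Generate Audio", animator,
                "CodeFormer Enhancement", "Final Video"]
    if source == "video":
        return ["Existing Video", "Generate Audio", "Wav2Lip Lip-Sync",
                "CodeFormer Enhancement", "Final Video"]
    raise ValueError("source must be 'image' or 'video'")
```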

## Model Recommendations (2026 Updated)

### Image Generation
| Use Case | Model | Notes |
|----------|-------|-------|
| Best photorealism | FLUX.1-dev | Slow but superior quality |
| Multi-reference consistency | **FLUX.2** | **NEW 2026**: Up to 10 ref images, strong identity preservation |
| Fast iteration | RealVisXL V5.0 | Good balance of speed and quality |
| Character editing | **FLUX Kontext** | Context-aware, maintains consistency across edits |
| Iterative refinement | FLUX Kontext Pro/Max | 8x faster than GPT-Image (API) |

### Identity Preservation (2026 State of the Art)
| Method | Best For | VRAM | Notes |
|--------|----------|------|-------|
| **FLUX.2** | Multi-reference consistency | 24GB+ | **NEW 2026**: Up to 10 ref images, branded content |
| **InfiniteYou** | Highest identity match | 24GB | ICCV 2025 Highlight, SIM/AES variants |
| **FLUX Kontext** | Iterative editing | 12-32GB | Built-in consistency, no retraining |
| **PuLID Flux II** | Dual characters, no pollution | 24-40GB | Contrastive alignment solves model pollution |
| **AuraFace** | Commercial identity encoding | 12GB | **NEW 2026**: Open-source ArcFace alternative |
| InstantID | Style transfer, 3D→realistic | 12GB | Maintenance mode but still excellent |
| IP-Adapter FaceID | Speed, lower VRAM | 6GB+ | Good baseline approach |
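
One way to use the table above is to filter it by available VRAM. The sketch below copies the table's values and, as an assumption, treats the lower bound of each VRAM range as the minimum requirement; `methods_that_fit` is a hypothetical helper name.

```python
# (method, minimum VRAM in GB, best for), taken from the table above.
# Assumption: for ranges such as "12-32GB", the lower bound is the minimum.
IDENTITY_METHODS = [
    ("FLUX.2", 24, "Multi-reference consistency"),
    ("InfiniteYou", 24, "Highest identity match"),
    ("FLUX Kontext", 12, "Iterative editing"),
    ("PuLID Flux II", 24, "Dual characters, no pollution"),
    ("AuraFace", 12, "Commercial identity encoding"),
    ("InstantID", 12, "Style transfer, 3D to realistic"),
    ("IP-Adapter FaceID", 6, "Speed, lower VRAM"),
]


def methods_that_fit(vram_gb: float) -> list[str]:
    """Return method names whose minimum VRAM fits the given budget."""
    return [name for name, min_vram, _ in IDENTITY_METHODS if vram_gb >= min_vram]
```

On a 12GB card this returns FLUX Kontext, AuraFace, InstantID, and IP-Adapter FaceID, in table order.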

### Video Generation
| Model | Quality | Speed | VRAM | Notes |
|-------|---------|-------|------|-------|
| **LTX-2** | ★★★★★ | Medium | 16GB+ | **NEW 2026**: First open-source 4K audio+video, production-ready |
| **Wan 2.2 MoE** | ★★★★★ | Slow | 24GB+ | Film-level aesthetics, first+last frame control |
| **FramePack** | ★★★★★ | Medium | **6GB** | 60-sec videos, VRAM-invariant breakthrough |
| Wan 2.1 1.3B | ★★★★ | Medium | 8GB+ | Consumer-friendly |
| AnimateDiff V3 | ★★★ | Fast | 8GB | Motion/camera LoRAs, infinite length |

### Voice/TTS
| Tool | License | Quality | Features |
|------|---------|---------|----------|
| **TTS Audio Suite** | Multi | ★★★★★ | Unified platform, 23 languages, emotion control |
| **F5-TTS** | MIT | ★★★★ | Zero-shot from <15 sec samples, **Cross-Lingual 2026** |
| Chatterbox | MIT | ★★★★★ | Paralinguistic tags (`[laugh]`, `[sigh]`), 4 voices |
| IndexTTS-2 | MIT | ★★★★ | 8-emotion vector control |
| ElevenLabs | Commercial | ★★★★★ | Production quality (API) |

## Essential Custom Nodes

Install via ComfyUI-Manager:

```
ComfyUI-Manager              # Must install first
ComfyUI_IPAdapter_plus       # IP-Adapter and FaceID
ComfyUI_InstantID            # InstantID workflow
ComfyUI-Impact-Pack          # FaceDetailer
ComfyUI-ReActor              # Face swapping
ComfyUI-AnimateDiff-Evolved  # Video generation
ComfyUI-VideoHelperSuite     # Video I/O
comfyui_controlnet_aux       # Pose/depth preprocessors
ComfyUI_UltimateSDUpscale    # Tiled upscaling
ComfyUI-Frame-Interpolation  # Smooth video
```
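
A quick way to see which of these packages are still missing is to compare the list against the folders in ComfyUI's `custom_nodes` directory. This sketch assumes each package is installed in a folder named exactly as listed above, which may not hold for every install method; `missing_nodes` is a hypothetical helper.

```python
from pathlib import Path

REQUIRED_NODES = [
    "ComfyUI-Manager",
    "ComfyUI_IPAdapter_plus",
    "ComfyUI_InstantID",
    "ComfyUI-Impact-Pack",
    "ComfyUI-ReActor",
    "ComfyUI-AnimateDiff-Evolved",
    "ComfyUI-VideoHelperSuite",
    "comfyui_controlnet_aux",
    "ComfyUI_UltimateSDUpscale",
    "ComfyUI-Frame-Interpolation",
]


def missing_nodes(custom_nodes_dir: Path) -> list[str]:
    """Return required packages with no matching folder (case-insensitive)."""
    if custom_nodes_dir.is_dir():
        installed = {p.name.lower() for p in custom_nodes_dir.iterdir() if p.is_dir()}
    else:
        installed = set()
    return [name for name in REQUIRED_NODES if name.lower() not in installed]
```

For example, `missing_nodes(Path("ComfyUI/custom_nodes"))` lists what remains to be installed, where the path is a placeholder for the local install location.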

## RTX 50 Series Optimization (NEW 2026)

With the 32GB of VRAM on an RTX 5090, most workflows run without optimization. **ComfyUI v0.8.1** adds major RTX 50 Series enhancements:

```
Launch flags: --highvram --fp8_e4m3fn-unet
```
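
If ComfyUI is started from a script, the flags above can be assembled into a command line. This sketch assumes a source install launched through `main.py`; the `comfyui_command` helper and its parameters are hypothetical.

```python
import sys


def comfyui_command(main_py: str = "main.py", extra_flags: tuple[str, ...] = ()) -> list[str]:
    """Build the argv list for launching ComfyUI with the flags above."""
    return [sys.executable, main_py, "--highvram", "--fp8_e4m3fn-unet", *extra_flags]
```

The result can be passed to `subprocess.run(comfyui_command("ComfyUI/main.py"), check=True)`, with the path adjusted to the local install.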

**NEW v0.8.1 Features:**
- **NVFP4/NVFP8 precision formats**: 3x faster performance, 60% VRAM reduction on RTX 50 Series
- **Weight streaming**: Uses system RAM when VRAM is exhausted, enabling larger models on mid-range GPUs

**Recommended settings and workloads:**
- Enable tiled VAE for 8K+ upscaling
- Batch 4× 1024×1024 generations in parallel
- Run Wan 2.2 14B + LTX-2 natively
- Use FP8 quantization for FLUX (50% VRAM reduction)
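
The 50% figure for FP8 follows from bytes per parameter: 16-bit weights take 2 bytes each and 8-bit weights take 1. The back-of-the-envelope sketch below covers model weights only and ignores activations, text encoders, and the VAE, so real usage will be higher; the 12B parameter count is that of FLUX.1-dev.

```python
def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes), weights only."""
    return params_billion * bytes_per_param


fp16_gb = weight_memory_gb(12, 2)  # 16-bit weights: 2 bytes per parameter
fp8_gb = weight_memory_gb(12, 1)   # 8-bit weights: 1 byte per parameter
print(f"FP16: {fp16_gb} GB, FP8: {fp8_gb} GB, saving: {1 - fp8_gb / fp16_gb:.0%}")
```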

## Workflow Generation Process

When building a workflow for a user:

1. **Clarify the goal**: Image only? Video? With voice? What's the source material?

2. **Select the pipeline pattern** from above based on requirements

3. **Generate the workflow** following node configurations in `references/workflows.md`

4. **Include model downloads** with exact filenames and paths from `references/models.md`

5. **Provide parameter recommendations** specific to their hardware/use case
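
As an illustration of step 3, a generated workflow can be expressed in ComfyUI's API format: a JSON object keyed by node ID, where each node has a `class_type` and `inputs`, and links are written as `[node_id, output_index]`. The sketch below builds a bare text-to-image skeleton from core nodes and prepares a request for the server's `/prompt` endpoint on the default port. It deliberately omits the identity nodes, whose configurations live in `references/workflows.md`. The checkpoint filename, prompts, step count, and sampler choice are placeholders; CFG and resolution follow Pattern 1.

```python
import json
import urllib.request


def build_workflow(ckpt_name: str, positive: str, negative: str, seed: int = 0) -> dict:
    """Return a minimal API-format graph (no identity nodes)."""
    return {
        "1": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": ckpt_name}},
        "2": {"class_type": "CLIPTextEncode",
              "inputs": {"text": positive, "clip": ["1", 1]}},
        "3": {"class_type": "CLIPTextEncode",
              "inputs": {"text": negative, "clip": ["1", 1]}},
        "4": {"class_type": "EmptyLatentImage",
              "inputs": {"width": 1016, "height": 1016, "batch_size": 1}},
        "5": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["2", 0],
                         "negative": ["3", 0], "latent_image": ["4", 0],
                         "seed": seed, "steps": 30, "cfg": 4.5,
                         "sampler_name": "euler", "scheduler": "normal",
                         "denoise": 1.0}},
        "6": {"class_type": "VAEDecode",
              "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
        "7": {"class_type": "SaveImage",
              "inputs": {"filename_prefix": "character", "images": ["6", 0]}},
    }


def queue_request(workflow: dict,
                  server: str = "http://127.0.0.1:8188") -> urllib.request.Request:
    """Prepare (but do not send) the POST request that queues the workflow."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"{server}/prompt", data=body,
        headers={"Content-Type": "application/json"},
    )
```

Sending is then a matter of `urllib.request.urlopen(queue_request(workflow))` against a running ComfyUI instance.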

## Reference Files

- `references/research-log.md` - **Latest techniques**: InfiniteYou, FLUX Kontext, PuLID Flux II, Wan 2.2 MoE, FramePack, FLUX.2, LTX-2.3, Wan 2.6, Qwen3-TTS, and more
- `references/models.md` - Complete model list with HuggingFace/Civitai links, file paths, and compatibility notes
- `references/workflows.md` - Detailed node-by-node workflow templates for each pattern
- `references/lora-training.md` - LoRA training guide with Kohya/AI-Toolkit parameters
- `references/voice-synthesis.md` - Voice cloning, TTS, and lip-sync pipeline details
- `references/talking-head-workflows.md` - **Complete talking head workflows**: Image→Talking Head (SadTalker, LivePortrait) and Video→Add Voice (Wav2Lip) with production scripts
- `references/evolution.md` - Update sources, changelog, and user-specific learnings

## Skill Evolution

This skill is designed to evolve. When helping the user:

**Before starting a workflow:**
- Check whether newer models have been released that might be a better fit (search HuggingFace/Civitai if uncertain)
- Consider whether the user's past successes or failures should inform the approach

**After completing a workflow:**
- Note what worked well or poorly for future reference
- If the user discovers better settings, update the relevant reference file

**Pr
