Claude
Skills
Sign in
Back

realistic-ugc-video

Included with Lifetime
$97 forever

Create realistic long-form AI talking head/UGC videos that don't look AI-generated. Use when user wants to make "realistic AI video", "UGC video", "talking head video", "AI spokesperson", "AI ad content", "video that looks real", "human-looking AI video". Orchestrates Nano Banana for base images and Kling AI for video generation with natural pacing.

Ads & Marketing

What this skill does


# Realistic UGC Video Production

Create long-form AI videos that look and sound authentically human. This skill orchestrates a multi-step workflow using Nano Banana for realistic base images and Kling AI for video generation, with specific techniques to avoid the "AI look".

## Why Videos Look AI (And How to Fix It)

| Problem | Solution |
|---------|----------|
| Too perfect/clean skin | Add imperfections: micro-pores, natural oils, fine lines |
| Studio lighting | Use available/natural light, mixed color temps |
| Character too still | Add micro-movements, head tilts, natural sway |
| Inconsistent pacing | Use 55-60 syllables per clip |
| Robotic voice | Process through Adobe Podcast or Resemble AI |
| Obvious jump cuts | Cover with B-roll or animations |
| **Weird AI hands** | **Crop hands out of frame or keep completely static** |

## Known Limitation: Hands

AI video models (Kling, Veo, etc.) struggle with realistic hand movement. Fingers morph, gestures look unnatural, and hands are often the biggest tell.

**Best practice: Keep hands OUT of frame or completely static.**

Options:
1. **Head/shoulders framing** - Crop base image to exclude hands entirely
2. **Arms crossed** - Static pose, no finger movement needed
3. **Hands below frame** - Desk edge cuts off at wrists
4. **Cover with B-roll** - Cut away during any hand weirdness in post

## Complete Workflow

### Phase 1: Collect Requirements

Before starting, gather from the user:

1. **Character description** - Age, ethnicity, features, clothing
2. **Setting/background** - Office, home, studio, outdoor
3. **Script** - The full text the character will speak
4. **Tone** - Conversational, urgent, professional, friendly
5. **Video length target** - This determines how many clips needed

### Phase 2: Generate Base Image (Nano Banana)

Use the Nano Banana skill to generate the character image. **Critical:** Apply the imperfection techniques from [CHARACTER-PROMPTING.md](CHARACTER-PROMPTING.md).

**Key elements for realistic UGC:**
- iPhone capture aesthetic (26mm equivalent lens)
- Available/mixed lighting (NOT studio)
- Visible skin texture (pores, oils, fine lines)
- Minor imperfections (stubble, dark circles)
- Computational depth artifacts
- ISO noise (500-900 range)

**Command:**
```bash
~/.claude/skills/nano-banana/scripts/generate.sh "[enhanced prompt]" --aspect 9:16 --size 2K
```

Then optionally upscale through Enhancor AI for additional texture.

### Phase 3: Chunk the Script (Critical for Pacing)

This is the **most important step** for natural pacing. See [SCRIPT-CHUNKING.md](SCRIPT-CHUNKING.md).

**The 55-60 Syllable Rule:**
- Count syllables, not words
- Each video generation = 55-60 syllables
- Never cut mid-sentence
- Add filler sentences to reach target if needed

**Example chunking:**
```
Chunk 1 (58 syllables):
"Hey everyone, I wanted to share something that completely changed how I think about productivity. It's not another app or system."

Chunk 2 (56 syllables):
"It's actually about understanding your own energy patterns throughout the day. Once I figured this out, everything clicked into place."
```

### Phase 4: Generate Video Clips (Kling AI)

For each script chunk, generate a 10-second video clip. Use the Kling AI skill with **movement prompts** from [MOVEMENT-PROMPTING.md](MOVEMENT-PROMPTING.md).

**Key elements:**
- Hand "Home Base" Protocol
- Timestamped movement clusters
- Natural blinks, head tilts, micro-sways
- Consistent base image reference

**Spawn background agent for each clip:**
```
Task tool:
- subagent_type: "general-purpose"
- run_in_background: true
- prompt: [Include image URL, script chunk, movement prompt, output path]
```

### Phase 5: Post-Production

See [POST-PRODUCTION.md](POST-PRODUCTION.md) for detailed guidance.

1. **Assemble clips** in CapCut or similar editor
2. **Fix audio** with Adobe Podcast (minimum) or Resemble AI (for voice swap)
3. **Cover jump cuts** with B-roll or animations
4. **Remove filler sentences** if they feel awkward
5. **Export** in final resolution

## Quick Start: Single Clip Test

Before generating full video, test the workflow with one clip:

1. Generate base image with full imperfections
2. Create 10-second test video with first script chunk
3. Evaluate pacing and movement
4. Adjust prompts if needed
5. Proceed with full production

## Example Full Prompt for Base Image

```
A vertical 9:16 UGC-style video frame captured on an iPhone 11 resting on a tripod.
Medium-wide portrait at true eye-level with slightly forward-leaning posture.

[CHARACTER]: A [age] [gender] with [ethnicity] complexion, [eye color] eyes beneath
[eyebrow description], [nose description]. [Jawline/facial hair]. [Hair description].
Expression is [emotion]—[specific expression details].

[CLOTHING]: [Fitted/casual garment], [collar detail].

[SKIN TEXTURE]: Visible pores across T-zone, faint smile lines, natural oils catching
light on forehead and nose. [Age-appropriate details]. No filter, no foundation.

[FOREGROUND]: Hands rest naturally on [surface], fingers relaxed, visible veins and
knuckle texture. Nearby: [everyday objects like water bottle, phone, notebook].

[CAMERA]: Native iPhone 11 lens (26mm equivalent), slightly wide perspective, mild
barrel softness at edges. Only tiny pockets of neural blur around hair edges.

[LIGHTING]: Available light mix—cool overcast daylight from window left, warm tungsten
from desk lamp right. Soft asymmetric shadows, natural falloff. ISO noise 500-900.

[BACKGROUND]: [Realistic home/office elements]—bookshelf, [furniture], clearly visible
not heavily blurred.

[REALITY DETAILS]: Gentle 35mm film grain, light fingerprint smudge on lens, tiny dust
haze in air. No cinematic bloom, no studio finish.

Styling: raw UGC realism, available indoor light, mixed color temperature, minimal
depth blur, visible ISO noise, emphasis on authenticity.
```

## Example Movement Prompt for Video

```
Hand "Home Base" Protocol: Hands default to Active Idle. Fingers shift, thumbs rub,
wrists rotate slightly while anchored. Gestures only for key emphasis.

[0.0s-0.5s] Pre-roll: Sharp inhale, eyes lock to lens, head still
[0.5s-3.0s] Hands in Active Idle (fingers interlocked), head tilts slightly right,
           brows furrow in seriousness
[3.0s-6.0s] Hands break clasp for quick open-palm rotation then return, head drifts
           forward, natural blink
[6.0s-8.0s] Hands return to Active Idle (loose clasp, thumbs tapping), head nods
           encouragingly, cheeks lift in natural smile
[8.0s-10.0s] Hands anchored (wrist shifts), chin lifts in quick final nod, natural blink

[Script]: "[CHUNK TEXT HERE]"
[Tone]: [Urgent/Conversational/Professional/etc.]
[Pacing]: Rapid fire delivery, high energy, viral UGC style, confident, 2x speed
```

## Reference Files

- **[CHARACTER-PROMPTING.md](CHARACTER-PROMPTING.md)** - Imperfection techniques for realistic characters
- **[SCRIPT-CHUNKING.md](SCRIPT-CHUNKING.md)** - The syllable method for consistent pacing
- **[MOVEMENT-PROMPTING.md](MOVEMENT-PROMPTING.md)** - Natural movement choreography
- **[POST-PRODUCTION.md](POST-PRODUCTION.md)** - Audio fixing and editing tips

## Alternative: InfiniteTalk

For simpler long-form videos, consider InfiniteTalk (infinitetalk.ai).

**Pros:** Single generation for longer videos
**Cons:** Less control over pacing (no timer/duration control), charged by output length

Use the syllable method above when precise pacing control is needed.

## Checklist Before Generating

- [ ] Character prompt includes skin imperfections
- [ ] Lighting is available/natural, NOT studio
- [ ] Script chunked into 55-60 syllable segments
- [ ] Each chunk is complete sentences
- [ ] Movement prompt includes hand base, head movements, blinks
- [ ] Post-production plan for audio and jump cuts

Related in Ads & Marketing