realistic-ugc-video
Create realistic long-form AI talking head/UGC videos that don't look AI-generated. Use when user wants to make "realistic AI video", "UGC video", "talking head video", "AI spokesperson", "AI ad content", "video that looks real", "human-looking AI video". Orchestrates Nano Banana for base images and Kling AI for video generation with natural pacing.
What this skill does
# Realistic UGC Video Production
Create long-form AI videos that look and sound authentically human. This skill orchestrates a multi-step workflow using Nano Banana for realistic base images and Kling AI for video generation, with specific techniques to avoid the "AI look".
## Why Videos Look AI (And How to Fix It)
| Problem | Solution |
|---------|----------|
| Too perfect/clean skin | Add imperfections: micro-pores, natural oils, fine lines |
| Studio lighting | Use available/natural light, mixed color temps |
| Character too still | Add micro-movements, head tilts, natural sway |
| Inconsistent pacing | Use 55-60 syllables per clip |
| Robotic voice | Process through Adobe Podcast or Resemble AI |
| Obvious jump cuts | Cover with B-roll or animations |
| **Weird AI hands** | **Crop hands out of frame or keep completely static** |
## Known Limitation: Hands
AI video models (Kling, Veo, etc.) struggle with realistic hand movement. Fingers morph, gestures look unnatural, and hands are often the biggest tell.
**Best practice: Keep hands OUT of frame or completely static.**
Options:
1. **Head/shoulders framing** - Crop base image to exclude hands entirely
2. **Arms crossed** - Static pose, no finger movement needed
3. **Hands below frame** - Desk edge cuts off at wrists
4. **Cover with B-roll** - Cut away during any hand weirdness in post
## Complete Workflow
### Phase 1: Collect Requirements
Before starting, gather from the user:
1. **Character description** - Age, ethnicity, features, clothing
2. **Setting/background** - Office, home, studio, outdoor
3. **Script** - The full text the character will speak
4. **Tone** - Conversational, urgent, professional, friendly
5. **Video length target** - This determines how many clips needed
### Phase 2: Generate Base Image (Nano Banana)
Use the Nano Banana skill to generate the character image. **Critical:** Apply the imperfection techniques from [CHARACTER-PROMPTING.md](CHARACTER-PROMPTING.md).
**Key elements for realistic UGC:**
- iPhone capture aesthetic (26mm equivalent lens)
- Available/mixed lighting (NOT studio)
- Visible skin texture (pores, oils, fine lines)
- Minor imperfections (stubble, dark circles)
- Computational depth artifacts
- ISO noise (500-900 range)
**Command:**
```bash
~/.claude/skills/nano-banana/scripts/generate.sh "[enhanced prompt]" --aspect 9:16 --size 2K
```
Then optionally upscale through Enhancor AI for additional texture.
### Phase 3: Chunk the Script (Critical for Pacing)
This is the **most important step** for natural pacing. See [SCRIPT-CHUNKING.md](SCRIPT-CHUNKING.md).
**The 55-60 Syllable Rule:**
- Count syllables, not words
- Each video generation = 55-60 syllables
- Never cut mid-sentence
- Add filler sentences to reach target if needed
**Example chunking:**
```
Chunk 1 (58 syllables):
"Hey everyone, I wanted to share something that completely changed how I think about productivity. It's not another app or system."
Chunk 2 (56 syllables):
"It's actually about understanding your own energy patterns throughout the day. Once I figured this out, everything clicked into place."
```
### Phase 4: Generate Video Clips (Kling AI)
For each script chunk, generate a 10-second video clip. Use the Kling AI skill with **movement prompts** from [MOVEMENT-PROMPTING.md](MOVEMENT-PROMPTING.md).
**Key elements:**
- Hand "Home Base" Protocol
- Timestamped movement clusters
- Natural blinks, head tilts, micro-sways
- Consistent base image reference
**Spawn background agent for each clip:**
```
Task tool:
- subagent_type: "general-purpose"
- run_in_background: true
- prompt: [Include image URL, script chunk, movement prompt, output path]
```
### Phase 5: Post-Production
See [POST-PRODUCTION.md](POST-PRODUCTION.md) for detailed guidance.
1. **Assemble clips** in CapCut or similar editor
2. **Fix audio** with Adobe Podcast (minimum) or Resemble AI (for voice swap)
3. **Cover jump cuts** with B-roll or animations
4. **Remove filler sentences** if they feel awkward
5. **Export** in final resolution
## Quick Start: Single Clip Test
Before generating full video, test the workflow with one clip:
1. Generate base image with full imperfections
2. Create 10-second test video with first script chunk
3. Evaluate pacing and movement
4. Adjust prompts if needed
5. Proceed with full production
## Example Full Prompt for Base Image
```
A vertical 9:16 UGC-style video frame captured on an iPhone 11 resting on a tripod.
Medium-wide portrait at true eye-level with slightly forward-leaning posture.
[CHARACTER]: A [age] [gender] with [ethnicity] complexion, [eye color] eyes beneath
[eyebrow description], [nose description]. [Jawline/facial hair]. [Hair description].
Expression is [emotion]—[specific expression details].
[CLOTHING]: [Fitted/casual garment], [collar detail].
[SKIN TEXTURE]: Visible pores across T-zone, faint smile lines, natural oils catching
light on forehead and nose. [Age-appropriate details]. No filter, no foundation.
[FOREGROUND]: Hands rest naturally on [surface], fingers relaxed, visible veins and
knuckle texture. Nearby: [everyday objects like water bottle, phone, notebook].
[CAMERA]: Native iPhone 11 lens (26mm equivalent), slightly wide perspective, mild
barrel softness at edges. Only tiny pockets of neural blur around hair edges.
[LIGHTING]: Available light mix—cool overcast daylight from window left, warm tungsten
from desk lamp right. Soft asymmetric shadows, natural falloff. ISO noise 500-900.
[BACKGROUND]: [Realistic home/office elements]—bookshelf, [furniture], clearly visible
not heavily blurred.
[REALITY DETAILS]: Gentle 35mm film grain, light fingerprint smudge on lens, tiny dust
haze in air. No cinematic bloom, no studio finish.
Styling: raw UGC realism, available indoor light, mixed color temperature, minimal
depth blur, visible ISO noise, emphasis on authenticity.
```
## Example Movement Prompt for Video
```
Hand "Home Base" Protocol: Hands default to Active Idle. Fingers shift, thumbs rub,
wrists rotate slightly while anchored. Gestures only for key emphasis.
[0.0s-0.5s] Pre-roll: Sharp inhale, eyes lock to lens, head still
[0.5s-3.0s] Hands in Active Idle (fingers interlocked), head tilts slightly right,
brows furrow in seriousness
[3.0s-6.0s] Hands break clasp for quick open-palm rotation then return, head drifts
forward, natural blink
[6.0s-8.0s] Hands return to Active Idle (loose clasp, thumbs tapping), head nods
encouragingly, cheeks lift in natural smile
[8.0s-10.0s] Hands anchored (wrist shifts), chin lifts in quick final nod, natural blink
[Script]: "[CHUNK TEXT HERE]"
[Tone]: [Urgent/Conversational/Professional/etc.]
[Pacing]: Rapid fire delivery, high energy, viral UGC style, confident, 2x speed
```
## Reference Files
- **[CHARACTER-PROMPTING.md](CHARACTER-PROMPTING.md)** - Imperfection techniques for realistic characters
- **[SCRIPT-CHUNKING.md](SCRIPT-CHUNKING.md)** - The syllable method for consistent pacing
- **[MOVEMENT-PROMPTING.md](MOVEMENT-PROMPTING.md)** - Natural movement choreography
- **[POST-PRODUCTION.md](POST-PRODUCTION.md)** - Audio fixing and editing tips
## Alternative: InfiniteTalk
For simpler long-form videos, consider InfiniteTalk (infinitetalk.ai).
**Pros:** Single generation for longer videos
**Cons:** Less control over pacing (no timer/duration control), charged by output length
Use the syllable method above when precise pacing control is needed.
## Checklist Before Generating
- [ ] Character prompt includes skin imperfections
- [ ] Lighting is available/natural, NOT studio
- [ ] Script chunked into 55-60 syllable segments
- [ ] Each chunk is complete sentences
- [ ] Movement prompt includes hand base, head movements, blinks
- [ ] Post-production plan for audio and jump cuts
Related in Ads & Marketing
ads
IncludedMulti-platform paid advertising audit and optimization skill. Analyzes Google, Meta, YouTube, LinkedIn, TikTok, Microsoft, and Apple Ads. 250+ checks with scoring, parallel agents, industry templates, and AI creative generation.
banana
IncludedAI image generation Creative Director powered by Google Gemini Nano Banana models. Use this skill for ANY request involving image creation, editing, visual asset production, or creative direction. Triggers on: generate an image, create a photo, edit this picture, design a logo, make a banner, visual for my anything, and all /banana commands. Handles text-to-image, image editing, multi-turn creative sessions, batch workflows, and brand presets.
rpg-migration-analyzer
IncludedAnalyzes legacy RPG (Report Program Generator) programs from AS/400 and IBM i systems for migration to modern Java applications. Extracts business logic from RPG III/IV/ILE source code, identifies data structures (D-specs), file operations (F-specs), program dependencies (CALLB/CALLP), and converts RPG constructs to Java equivalents. Generates migration reports, complexity estimates, and Java implementation strategies with POJO classes, JPA entities, and service methods. Use when modernizing AS/400 or IBM i legacy systems, analyzing RPG source files (.rpg, .rpgle, .RPGLE), converting RPG to Java, mapping data specifications to Java classes, planning legacy system migration, or when user mentions RPG analysis, Report Program Generator, RPG III/IV/ILE, AS/400 modernization, IBM i migration, packed decimal conversion, or mainframe application rewrite.
brand-library-architect
IncludedBuild a complete brand library for a product — visual asset render pipeline, brand documentation set (BRAND, COPY, MANIFESTO, BIOS, FAQ, GLOSSARY, TONE, PRICING), open-source convention files (README, CONTRIBUTING, SECURITY, CODE_OF_CONDUCT), and a self-contained press kit. This skill should be used when the user asks to "build a brand library / brand kit / press kit / brand assets" for a product, "set up a brand library workflow," "create a positioning manifesto plus visual identity," or any combination of brand documentation + visual asset pipeline. Apply phase-by-phase or run end-to-end. Templates are product-agnostic and use {{TOKEN}} placeholders the skill prompts the user to fill.
writing-tech-post
IncludedAuthors engineering blog posts end-to-end: launch deep-dives, incident postmortems, architecture migrations, performance case studies, tutorials, AI/agent system writeups, security disclosures, and research-to-product translations. Picks the correct archetype, plans the abstraction ladder, enforces an evidence cadence (diagrams, benchmarks, profiles, traces, code, ablations), tunes voice against publisher house styles (Datadog, Vercel, GitHub, AWS, Meta, Cloudflare, Jane Street), and runs a pre-publish gate for narrative momentum and disclosure ethics. Use when drafting a new engineering post, restructuring a draft that feels flat, deciding which evidence form belongs where, validating that depth and product context are balanced, or preparing a postmortem, migration, or performance narrative for external publication. Do not use for API reference documentation, README authoring, marketing copy, release notes, generic SEO content, ghost-written executive thought leadership, or non-engineering long-form essays.
blog-google
IncludedGoogle API integration for blog performance: PageSpeed Insights, CrUX Core Web Vitals with 25-week history, Search Console performance, URL Inspection, Indexing API, GA4 organic traffic, NLP entity analysis for E-E-A-T, YouTube video search for embedding, and Google Ads Keyword Planner. Progressive feature availability based on credential tier (API key, OAuth/service account, GA4, Ads). Shares config with claude-seo at ~/.config/claude-seo/google-api.json. Use when user says "google data", "page speed", "core web vitals", "search console", "indexation", "GA4", "keyword research", "nlp entities", "blog performance", "youtube search", "google api setup".