ai-video-generation

Generate AI videos on RunComfy via the `runcomfy` CLI, routing across the full video-model catalog: HappyHorse 1.0 (#1 on the Artificial Analysis Video Arena, native in-pass audio), Wan-AI Wan 2-7 (open weights, audio-driven lip-sync), ByteDance Seedance v2 / 1-5 / 1-0 (multi-modal, cinematic), Kling 3.0 / 2-6, Google Veo 3-1, MiniMax Hailuo 2-3, and ByteDance Dreamina 3-0. Covers text-to-video (t2v), image-to-video (i2v), and Veo's video-extend endpoint. The skill picks the right model for the user's intent (Arena-#1 quality, multi-shot character identity, in-pass audio, cinematic motion, fastest path, sub-15s clip, longest duration) and provides each model's documented prompting patterns plus the minimal `runcomfy run` invocation. Triggers on "generate video", "make a video", "text to video", "t2v", "image to video", "i2v", "animate", "AI video", "make X move", "video from prompt", "video from image", or any explicit request to produce a video clip from a prompt or a still.

# AI Video Generation

Generate videos with the full RunComfy video-model catalog through one CLI: text-to-video, image-to-video, and Veo's video-extend. This skill picks the right model for the user's intent and provides each model's documented prompt patterns plus the exact `runcomfy run` invocation.

[runcomfy.com](https://www.runcomfy.com/?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-video-generation) · [Video models](https://www.runcomfy.com/models?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-video-generation) · [CLI docs](https://docs.runcomfy.com/cli/introduction?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-video-generation)

## Powered by the RunComfy CLI

```bash
# 1. Install (see runcomfy-cli skill for details)
npm i -g @runcomfy/cli      # or:  npx -y @runcomfy/cli --version

# 2. Sign in
runcomfy login              # or in CI: export RUNCOMFY_TOKEN=<token>

# 3. Generate
runcomfy run <vendor>/<model>/<endpoint> \
  --input '{"prompt": "..."}' \
  --output-dir ./out
```

CLI deep dive: [`runcomfy-cli`](https://www.skills.sh/agentspace-so/runcomfy-agent-skills/runcomfy-cli) skill.
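
Hand-writing the `--input` JSON breaks as soon as a prompt contains a double quote, backslash, or newline. Below is a minimal pure-bash sketch that escapes those characters before building the payload. `json_escape` and `build_input` are hypothetical helpers, not part of the `runcomfy` CLI; `prompt` is the field shown in the generic invocation above, and the optional second argument maps to the `audio_url` field named in the Wan 2-7 entry. Only backslash, double quote, newline, carriage return, and tab are escaped, so other control characters would still need handling (or a tool such as `jq`).

```bash
#!/usr/bin/env bash
# Sketch: build a JSON payload for `runcomfy run --input` from shell values.
# json_escape/build_input are hypothetical helpers, not RunComfy CLI features.

# Escape the characters most likely to break a JSON string literal.
# Backslashes are escaped first so the later escapes are not doubled.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}      # backslash -> \\
  s=${s//\"/\\\"}      # double quote -> \"
  s=${s//$'\n'/\\n}    # newline -> \n
  s=${s//$'\r'/\\r}    # carriage return -> \r
  s=${s//$'\t'/\\t}    # tab -> \t
  printf '%s' "$s"
}

# build_input PROMPT [AUDIO_URL]
# Emits {"prompt": ...}, plus an "audio_url" field when a second arg is given.
build_input() {
  local json
  json=$(printf '{"prompt": "%s"' "$(json_escape "$1")")
  if [[ -n ${2:-} ]]; then
    json+=$(printf ', "audio_url": "%s"' "$(json_escape "$2")")
  fi
  printf '%s}' "$json"
}

# Example (not executed here):
#   runcomfy run happyhorse/happyhorse-1-0/text-to-video \
#     --input "$(build_input 'A neon sign flickers: "OPEN"')" \
#     --output-dir ./out
```

Escaping in a function keeps the `runcomfy run` line readable and avoids nesting shell quotes inside JSON quotes by hand.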

## Install this skill

```bash
npx skills add agentspace-so/runcomfy-agent-skills --skill ai-video-generation -g
```

---

## Pick the right model for the user's intent

### Text-to-video (t2v) — newest first

**HappyHorse 1.0** — `happyhorse/happyhorse-1-0/text-to-video` *(default)*
> Currently #1 on the Artificial Analysis Video Arena. Generates synchronized audio natively in the same pass (no separate Foley step). Native 1080p, up to ~15s, strong multi-shot character consistency.
> Pick for: general-purpose t2v, ad creative with audio, social-media clips, multi-shot narratives.
> Avoid for: audio-driven lip-sync to a specific voiceover MP3 — use **Wan 2-7**.

**Kling 3.0 4K** — [`kling/kling-3.0/4k/text-to-video`](https://www.runcomfy.com/models/kling/kling-3.0/4k/text-to-video?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-video-generation)
> Kling's latest, 4K output, strong multi-shot character identity, premium camera language.
> Pick for: hero shots, final-delivery 4K cuts, multi-shot character narratives.
> Avoid for: cost-sensitive iteration; drop to **Kling 2-6 Pro**, or to **Kling 3.0 Standard** i2v when starting from a still.

**Seedance v2 Pro** — `bytedance/seedance-v2/pro`
> ByteDance flagship: multi-modal input (up to 9 reference images, 3 reference videos, and 3 reference audio clips), in-pass synchronized audio, cinematic motion refinement, and adherence to lens language.
> Pick for: cinematic ad frames, multi-reference composition (subject + scene + audio refs), 21:9 anamorphic looks.
> Avoid for: simple "single prompt → clip" jobs; it is overkill there and slower.

**Seedance v2 Fast** — [`bytedance/seedance-v2/fast`](https://www.runcomfy.com/models/bytedance/seedance-v2/fast?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-video-generation)
> Faster variant of Seedance v2 Pro, same multi-modal capabilities.
> Pick for: iteration on Seedance v2 compositions before locking a final on Pro.
> Avoid for: hero-shot final delivery.

**Wan 2-7** — `wan-ai/wan-2-7/text-to-video`
> Open-weights flagship, `audio_url` field for audio-driven lip-sync, pairs natively with Wan image models.
> Pick for: dialog scenes where mouth must sync to a specific voiceover file; open-weights pipeline requirement.
> Avoid for: jobs with no voiceover file that need audio generated in-pass; use **HappyHorse 1.0**.

**Kling 2-6 Pro** — [`kling/kling-2-6/pro/text-to-video`](https://www.runcomfy.com/models/kling/kling-2-6/pro/text-to-video?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-video-generation)
> Previous Kling tier — still strong quality at much lower cost than 3.0 4K.
> Pick for: production at scale where 3.0 4K is too expensive.
> Avoid for: top-tier hero shots — use **Kling 3.0 4K**.

**Seedance 1-5 Pro** — [`bytedance/seedance-1-5/pro/text-to-video`](https://www.runcomfy.com/models/bytedance/seedance-1-5/pro/text-to-video?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-video-generation)
> Previous Seedance generation, cheaper.
> Pick for: keeping identity stable across batches already generated on 1-5; cost-sensitive baseline.
> Avoid for: new work — prefer **Seedance v2 Pro** or **Fast**.
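
The t2v picks above, condensed into a lookup. This is a minimal bash sketch: the endpoints are copied from the entries above, but the intent keywords (`lipsync`, `budget`, and so on) are illustrative labels invented here, not RunComfy terms, and `pick_t2v_endpoint` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical lookup: coarse intent keyword -> t2v endpoint from the catalog.
# Keywords are illustrative labels, not RunComfy terminology.
pick_t2v_endpoint() {
  case "$1" in
    default|general|audio) echo "happyhorse/happyhorse-1-0/text-to-video" ;;
    4k|hero)               echo "kling/kling-3.0/4k/text-to-video" ;;
    multi-ref|cinematic)   echo "bytedance/seedance-v2/pro" ;;
    seedance-draft)        echo "bytedance/seedance-v2/fast" ;;
    lipsync|open-weights)  echo "wan-ai/wan-2-7/text-to-video" ;;
    scale|budget)          echo "kling/kling-2-6/pro/text-to-video" ;;
    seedance-legacy)       echo "bytedance/seedance-1-5/pro/text-to-video" ;;
    *)
      echo "unknown t2v intent: $1" >&2
      return 1
      ;;
  esac
}

# Example (not executed here):
#   runcomfy run "$(pick_t2v_endpoint 4k)" \
#     --input '{"prompt": "..."}' --output-dir ./out
```

Returning non-zero for an unknown intent lets a calling script stop instead of invoking the CLI with an empty endpoint.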

### Image-to-video (i2v) — newest first

**HappyHorse 1.0 I2V** — `happyhorse/happyhorse-1-0/image-to-video` *(default)*
> Animates any still with in-pass audio described in the prompt, with strong identity preservation.
> Pick for: animating a generated portrait or product still, vertical social clips, and voiceover or other audio described in the prompt.
> Avoid for: physics-accurate object motion — use **Veo 3-1**.

**Veo 3-1** — [`google-deepmind/veo-3-1/image-to-video`](https://www.runcomfy.com/models/google-deepmind/veo-3-1/image-to-video?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-video-generation)
> Google's flagship — physics-respecting motion, strong object permanence ("rotates 180 degrees" = 180°), pairs with `extend-video` for longer clips.
> Pick for: product spins, physics-accurate motion, scenes where "no other motion" must hold.
> Avoid for: audio-driven dialog — use **Wan 2-7** or **HappyHorse**.

**Veo 3-1 Fast** — [`google-deepmind/veo-3-1/fast/image-to-video`](https://www.runcomfy.com/models/google-deepmind/veo-3-1/fast/image-to-video?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-video-generation)
> Faster Veo 3-1 variant.
> Pick for: iteration on Veo compositions.
> Avoid for: hero delivery — use full **Veo 3-1**.

**Kling 3.0 4K I2V** — [`kling/kling-3.0/4k/image-to-video`](https://www.runcomfy.com/models/kling/kling-3.0/4k/image-to-video?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-video-generation)
> Multi-shot character identity, 4K output from a still.
> Pick for: 4K hero shots, character-narrative cuts.
> Avoid for: cost-sensitive iteration; drop to **Kling 3.0 Pro** or **Standard**.

**Kling 3.0 Pro I2V** — [`kling/kling-3.0/pro/image-to-video`](https://www.runcomfy.com/models/kling/kling-3.0/pro/image-to-video?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-video-generation)
> Default Kling 3.0 quality tier.
> Pick for: high-quality i2v at moderate cost.
> Avoid for: 4K final delivery.

**Kling 3.0 Standard I2V** — [`kling/kling-3.0/standard/image-to-video`](https://www.runcomfy.com/models/kling/kling-3.0/standard/image-to-video?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-video-generation)
> Cheapest 3.0 i2v tier.
> Pick for: concepting / drafts on Kling 3.0.
> Avoid for: final delivery.

**Hailuo 2-3 Pro** — [`minimax/hailuo-2-3/pro/image-to-video`](https://www.runcomfy.com/models/minimax/hailuo-2-3/pro/image-to-video?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-video-generation)
> MiniMax's latest Hailuo: natural motion, strong on real-world subjects.
> Pick for: lifelike motion of real people or real products.
> Avoid for: stylized characters — use Kling or Dreamina.

**Dreamina 3-0 Pro** — [`bytedance/dreamina-3-0/pro/image-to-video`](https://www.runcomfy.com/models/bytedance/dreamina-3-0/pro/image-to-video?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-video-generation)
> ByteDance Dreamina i2v: leans toward illustrated and stylized characters.
> Pick for: animating illustrated heroes, painterly stills.
> Avoid for: photoreal motion.

**Seedance 1-0 Pro Fast** — [`bytedance/seedance-1-0/pro/fast/image-to-video`](https://www.runcomfy.com/models/bytedance/seedance-1-0/pro/fast/image-to-video?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-video-generation)
> Older Seedance i2v generation, cheap.
> Pick for: cost-sensitive batch i2v on Seedance.
> Avoid for: new work — Seedance v2 Pro is more capable (t2v + i2v + multi-modal).
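
The i2v picks above, condensed into a lookup. Again a minimal bash sketch: endpoints are copied from the entries above, while the intent keywords and the `pick_i2v_endpoint` name are hypothetical. The input field that carries the source image is not documented in this section, so the sketch only resolves the endpoint.

```bash
#!/usr/bin/env bash
# Hypothetical lookup: coarse intent keyword -> i2v endpoint from the catalog.
# Keywords are illustrative labels, not RunComfy terminology.
pick_i2v_endpoint() {
  case "$1" in
    default|portrait|social) echo "happyhorse/happyhorse-1-0/image-to-video" ;;
    physics|product-spin)    echo "google-deepmind/veo-3-1/image-to-video" ;;
    veo-draft)               echo "google-deepmind/veo-3-1/fast/image-to-video" ;;
    4k)                      echo "kling/kling-3.0/4k/image-to-video" ;;
    kling)                   echo "kling/kling-3.0/pro/image-to-video" ;;
    kling-draft)             echo "kling/kling-3.0/standard/image-to-video" ;;
    lifelike|real-subject)   echo "minimax/hailuo-2-3/pro/image-to-video" ;;
    stylized|illustration)   echo "bytedance/dreamina-3-0/pro/image-to-video" ;;
    seedance-batch)          echo "bytedance/seedance-1-0/pro/fast/image-to-video" ;;
    *)
      echo "unknown i2v intent: $1" >&2
      return 1
      ;;
  esac
}

# The source-image field name is not given in this section; check the model
# page before building --input for the endpoint this function returns.
```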

### Extend an existing video — newest first

**Veo 3-1 Extend** — [`google-deepmind/veo-3-1/extend-video`](https://www.runcomfy.com/models/google-deepmind/veo-3-1/extend-video?utm_source=skills.sh&utm_medium=skill&utm_campaign=ai-video-generation)
> Continue an existing Veo clip with consistent motion / lighting / identity.
> Pick for: extending a video past Veo's per-call duration cap; chained narratives.
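
Planning a chain is a ceiling division: if a clip is currently C seconds long and each extend call adds at most E seconds, reaching T seconds takes ceil((T - C) / E) calls. A minimal bash sketch; `extend_calls_needed` is a hypothetical helper, the durations in the examples are made-up numbers rather than Veo's documented limits, and the `video_url` field in the commented loop is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical planner: number of extend-video calls needed to reach a target
# duration. All arguments are whole seconds that you supply; look up the real
# per-call limits on the model page.
extend_calls_needed() {
  local target=$1 current=$2 per_call=$3
  if (( per_call <= 0 )); then
    echo "per-call extension must be > 0 seconds" >&2
    return 1
  fi
  if (( current >= target )); then
    echo 0
    return 0
  fi
  # Integer ceiling division: (a + b - 1) / b
  echo $(( (target - current + per_call - 1) / per_call ))
}

# Chaining sketch (not executed; "video_url" is an assumed field name):
#   n=$(extend_calls_needed 30 8 7)
#   for ((i = 1; i <= n; i++)); do
#     runcomfy run google-deepmind/veo-3-1/extend-video \
#       --input "{\"video_url\": \"$prev_url\", \"prompt\": \"...\"}" \
#       --output-dir "./out/extend-$i"
#     # point prev_url at the clip this call produced before looping again
#   done
```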

Category: Image & Video

Source: https://github.com/runcomfy-com/skills/tree/main/ai-video-generation
