qwencloud-video-generation


[QwenCloud] Generate videos using Wan models. Supports text-to-video, image-to-video, first+last frame, reference-based role-play, and video editing (VACE). TRIGGER when: the user wants to create, generate, or edit video content; mentions video generation, animation, video clips, or Wan models; or explicitly invokes this skill by name (e.g. use qwencloud-video-generation). DO NOT TRIGGER when: the user wants to generate images (use qwencloud-image-generation), wants to understand or analyze existing videos (use qwencloud-vision), or the task is text-only.



> **Agent setup**: If your agent doesn't auto-load skills (e.g. Claude Code),
> see [agent-compatibility.md](references/agent-compatibility.md) once per session.

# Qwen Video Generation

Generate videos using Wan models. All tasks are **asynchronous** — submit, then poll until
completion.
This skill is part of **qwencloud/qwencloud-ai**.
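
The submit-then-poll cycle can be sketched as a generic loop. This is a minimal illustration, not the code in `scripts/video.py`: `poll_until_done` and the `fetch_status` callable are hypothetical, `fetch_status` is assumed to return a dict with a `task_status` key, and the terminal status names are assumptions to verify against `references/polling-guide.md`.

```python
import time
from typing import Any, Callable, Dict

# Assumed terminal status names; verify against references/polling-guide.md.
SUCCESS_STATES = {"SUCCEEDED"}
FAILURE_STATES = {"FAILED", "CANCELED", "UNKNOWN"}


def poll_until_done(
    fetch_status: Callable[[], Dict[str, Any]],
    interval: float = 15.0,
    timeout: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Call fetch_status() until the task succeeds, fails, or the timeout passes.

    fetch_status is a hypothetical callable returning a dict with a
    "task_status" key; sleep and clock are injectable for testing.
    """
    deadline = clock() + timeout
    while True:
        result = fetch_status()
        status = result.get("task_status")
        if status in SUCCESS_STATES:
            return result
        if status in FAILURE_STATES:
            raise RuntimeError(f"task ended with status {status}")
        if clock() >= deadline:
            raise TimeoutError(f"task still {status} after {timeout} seconds")
        sleep(interval)  # wait before asking again
```

Passing `sleep` and `clock` in as arguments keeps the loop testable without real waiting. The 15-second interval and 15-minute timeout are placeholder values, not documented limits.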

> **⚠️ Critical Parameter Differences by Mode:**
> - **kf2v (First+Last Frame)**: Duration is **fixed at 5 seconds** — other values will fail. Output is **silent only**.
> - **Resolution parameter varies**: t2v/r2v/vace use `size` (e.g. `"1280*720"`); i2v/kf2v use `resolution` (e.g. `"720P"`). Exception: `wan2.7-t2v` uses `resolution` + `ratio` instead of `size`.
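
The two rules above can be enforced before submission with a small check. A minimal sketch, assuming request parameters are collected in a plain dict: `build_parameters` is a hypothetical helper, the `duration` key name is an assumption (see `references/request-fields.md` for the real field tables), and model-specific exceptions such as `wan2.7-t2v`, which the t2v model table lists with `resolution` + `ratio`, are not handled.

```python
from typing import Any, Dict, Optional

SIZE_MODES = {"t2v", "r2v", "vace"}      # e.g. size="1280*720"
RESOLUTION_MODES = {"i2v", "kf2v"}       # e.g. resolution="720P"
KF2V_FIXED_SECONDS = 5


def build_parameters(
    mode: str,
    *,
    size: Optional[str] = None,
    resolution: Optional[str] = None,
    duration: Optional[int] = None,
) -> Dict[str, Any]:
    """Return a parameters dict, rejecting the wrong resolution field for the mode."""
    params: Dict[str, Any] = {}
    if mode in SIZE_MODES:
        if resolution is not None:
            raise ValueError(f"{mode} uses size (e.g. '1280*720'), not resolution")
        if size is not None:
            params["size"] = size
    elif mode in RESOLUTION_MODES:
        if size is not None:
            raise ValueError(f"{mode} uses resolution (e.g. '720P'), not size")
        if resolution is not None:
            params["resolution"] = resolution
    else:
        raise ValueError(f"unknown mode: {mode}")

    # kf2v only accepts a 5-second duration; any other value would fail server-side.
    if mode == "kf2v" and duration not in (None, KF2V_FIXED_SECONDS):
        raise ValueError("kf2v duration is fixed at 5 seconds")
    if duration is not None:
        params["duration"] = duration  # key name assumed
    return params
```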

## Skill directory

Use the files bundled with this skill for execution and reference. Load a reference file only when the default path fails or you need the details it covers.

| Location | Purpose |
|----------|---------|
| `scripts/video.py` | Default execution — mode auto-detect, submit, poll, download |
| `references/execution-guide.md` | Fallback: curl for all 5 modes, code generation |
| `references/request-fields.md` | Field tables and audio handling by mode |
| `references/workflows.md` | Duration extensions, multi-shot, VACE pipelines |
| `references/polling-guide.md` | Polling patterns and timing |
| `references/merge-media.md` | Concat, trim, audio overlay — ffmpeg/moviepy recipes |
| `references/prompt-guide.md` | Per-mode prompt formulas, sound description, multi-shot structure |
| `references/examples.md` | Full script examples per mode |
| `references/sources.md` | Official documentation URLs |
| `references/agent-compatibility.md` | Agent self-check: register skills in project config for agents that don't auto-load |

## Security

**NEVER output any API key or credential in plaintext.** Always use variable references (`$DASHSCOPE_API_KEY` in shell, `os.environ["DASHSCOPE_API_KEY"]` in Python). Any check or detection of credentials must be **non-plaintext**: report only status (e.g. "set" / "not set", "valid" / "invalid"), never the value. Never display contents of `.env` or config files that may contain secrets.

**When the API key is not configured, NEVER ask the user to provide it directly.** Instead, help create a `.env` file with a placeholder (`DASHSCOPE_API_KEY=sk-your-key-here`) and instruct the user to replace it with their actual key from the [QwenCloud Console](https://home.qwencloud.com/api-keys). Only write the actual key value if the user explicitly requests it.
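
A minimal sketch of both rules using only the Python standard library. `key_status` and `ensure_env_placeholder` are hypothetical helper names. The first mirrors the shell check `[ -n "$DASHSCOPE_API_KEY" ]` (also accepting `QWEN_API_KEY`, as the Prerequisites allow) and returns only a status string; the second writes the placeholder without reading a key back to the user or overwriting an existing one.

```python
import os
from pathlib import Path

PLACEHOLDER_LINE = "DASHSCOPE_API_KEY=sk-your-key-here"


def key_status() -> str:
    """Return "set" or "not set"; the key value itself is never returned or printed."""
    for name in ("DASHSCOPE_API_KEY", "QWEN_API_KEY"):
        if os.environ.get(name):
            return "set"
    return "not set"


def ensure_env_placeholder(env_path: Path) -> bool:
    """Append the placeholder unless the file already defines DASHSCOPE_API_KEY.

    Returns True if the file was changed. An existing key is never overwritten.
    """
    existing = env_path.read_text(encoding="utf-8") if env_path.exists() else ""
    if any(line.strip().startswith("DASHSCOPE_API_KEY=") for line in existing.splitlines()):
        return False
    with env_path.open("a", encoding="utf-8") as fh:
        if existing and not existing.endswith("\n"):
            fh.write("\n")  # keep the placeholder on its own line
        fh.write(PLACEHOLDER_LINE + "\n")
    return True
```

The user then replaces `sk-your-key-here` in the file themselves, so the real value never passes through the conversation.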

## Key Compatibility

Scripts require a **standard QwenCloud API key** (`sk-...`). Coding Plan keys (`sk-sp-...`) cannot be used — video generation models are not available on Coding Plan, and Coding Plan does not support the native QwenCloud API. Video generation incurs per-second charges on standard keys. The script detects `sk-sp-` keys at startup and prints a warning. If qwencloud-ops-auth is installed, see its `references/codingplan.md` for full details.
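
Because the two key types differ by prefix, a startup check can classify a key without displaying it. A sketch under that assumption; `key_kind` and `env_key_report` are hypothetical names, not the script's actual functions.

```python
import os


def key_kind(key: str) -> str:
    """Classify a key by prefix only, so the value never has to be shown."""
    if key.startswith("sk-sp-"):  # Coding Plan; check first, since it also starts with "sk-"
        return "coding-plan"
    if key.startswith("sk-"):
        return "standard"
    return "unrecognized"


def env_key_report(var: str = "DASHSCOPE_API_KEY") -> str:
    """Return a status line that is safe to print (it never contains the key)."""
    value = os.environ.get(var, "")
    if not value:
        return f"{var}: not set"
    kind = key_kind(value)
    if kind == "coding-plan":
        return f"{var}: set (Coding Plan key; video generation is not available on it)"
    return f"{var}: set ({kind})"
```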

## Mode Selection Guide

| User Goal | Mode | Key Field |
|-----------|------|-----------|
| Generate video from text description only | **t2v** | `prompt` only |
| Animate a single image | **i2v** | `img_url` or `reference_image` |
| wan2.7 unified i2v: first frame, first+last frame, video continuation, audio sync | **i2v** | `media[]`, `first_frame_url`, `first_clip_url`, `driving_audio_url` |
| Transition between two images (**⚠️ 5s fixed, silent only**) | **kf2v** | `first_frame_url` + `last_frame_url` |
| Role-play: make characters act a new script | **r2v** | `reference_urls` (up to 5) |
| Video editing: multi-image ref, repainting, local edit, extend, outpaint | **vace** | `function` |
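
The table can be read as a precedence check over whichever inputs the user supplied. A simplified sketch: `select_mode` and its keyword arguments are hypothetical, and it ignores the wan2.7 unified i2v row (which also accepts first+last frame), so requests meant for `wan2.7-i2v` need separate handling.

```python
from typing import Optional, Sequence

MAX_REFERENCE_URLS = 5  # r2v limit from the table


def select_mode(
    *,
    first_image: Optional[str] = None,
    last_image: Optional[str] = None,
    reference_urls: Sequence[str] = (),
    vace_function: Optional[str] = None,
) -> str:
    """Pick a mode from which inputs were supplied (simplified heuristic)."""
    if vace_function:
        return "vace"
    if reference_urls:
        if len(reference_urls) > MAX_REFERENCE_URLS:
            raise ValueError(f"r2v accepts at most {MAX_REFERENCE_URLS} reference_urls")
        return "r2v"
    if last_image and not first_image:
        raise ValueError("a transition needs both a first and a last image")
    if first_image and last_image:
        return "kf2v"  # remember: 5 s fixed, silent only
    if first_image:
        return "i2v"
    return "t2v"
```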

### Model Selection

1. **User specified a model** → use directly.
2. **Consult the qwencloud-model-selector skill** when model choice depends on capability, scenario, or pricing.
3. **No signal, clear task** → defaults: t2v → `wan2.6-t2v`, i2v → `wan2.6-i2v-flash`, kf2v → `wan2.2-kf2v-flash`, r2v → `wan2.6-r2v-flash`, vace → `wan2.1-vace-plus`. For wan2.7 features, explicitly set `--model wan2.7-t2v` or `--model wan2.7-i2v`.
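
Rules 1 and 3 reduce to a lookup table with an override. A minimal sketch with a hypothetical `pick_model` helper; rule 2 (consulting qwencloud-model-selector) is a judgment step and is not modeled, and the defaults are copied from the list above, so they may go stale as models change.

```python
from typing import Dict, Optional

# Defaults copied from rule 3; check the official model list before relying on them.
DEFAULT_MODELS: Dict[str, str] = {
    "t2v": "wan2.6-t2v",
    "i2v": "wan2.6-i2v-flash",
    "kf2v": "wan2.2-kf2v-flash",
    "r2v": "wan2.6-r2v-flash",
    "vace": "wan2.1-vace-plus",
}


def pick_model(mode: str, user_model: Optional[str] = None) -> str:
    """Rule 1: an explicit user choice is used verbatim. Rule 3: fall back to the default."""
    if user_model:
        return user_model
    if mode not in DEFAULT_MODELS:
        raise ValueError(f"unknown mode: {mode}")
    return DEFAULT_MODELS[mode]
```

For wan2.7 features, pass the model explicitly, e.g. `pick_model("t2v", "wan2.7-t2v")`.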

## Models

### t2v (Text-to-Video)

| Model | Features |
|-------|----------|
| `wan2.7-t2v` | Ratio control, auto-dubbing, 5000 char prompt, 720P/1080P. Use `resolution` + `ratio` params. |
| `wan2.6-t2v` **default** | Audio, multi-shot, 2–15s, 720P/1080P. Use `size` param. |
| `wan2.5-t2v-preview` | Audio, 5s/10s, 480P/720P/1080P |
| `wan2.2-t2v-plus` | Silent, 5s, 480P/1080P |

### i2v (Image-to-Video)

| Model | Features |
|-------|----------|
| `wan2.7-i2v` | Unified protocol: first frame, first+last frame, video continuation, audio sync. Uses `media[]` array. |
| `wan2.6-i2v-flash` **default** | Audio/silent, multi-shot, 2–15s, 720P/1080P. Uses `img_url`. |
| `wan2.6-i2v` | Audio, multi-shot, 2–15s, 720P/1080P |
| `wan2.5-i2v-preview` | Audio, 5s/10s, 480P/720P/1080P |

### kf2v / r2v / vace

| Model                                  | Features                                           |
|----------------------------------------|----------------------------------------------------|
| `wan2.2-kf2v-flash` **(kf2v default)** | Silent, 5s, 480P/720P/1080P                        |
| `wan2.6-r2v`                           | Audio, single/multi character, 2–10s, 720P/1080P   |
| `wan2.6-r2v-flash` **(r2v default)**   | Audio/silent, multi-character, 2–10s, 720P/1080P   |
| `wan2.1-vace-plus` **(vace default)**  | Multi-image ref, repainting, local edit, ≤5s, 720P |

> **⚠️ Important**: The model list above is a **point-in-time snapshot** and may be outdated. Model availability
> changes frequently. **Always check the [official model list](https://www.qwencloud.com/models)
> for the authoritative, up-to-date catalog before making model decisions.**
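
The duration columns in these tables can be turned into a pre-submission check. A sketch only: `DURATION_RULES` is transcribed from the snapshot tables above and will go stale, the wan2.7 models are omitted because their durations are not listed, and the 1-second lower bound for `wan2.1-vace-plus` is an assumption (the table only says ≤5s).

```python
from typing import Dict, FrozenSet, Tuple, Union

# Allowed durations in seconds, transcribed from the snapshot tables above.
# A tuple (low, high) is an inclusive range; a frozenset lists the only allowed values.
DurationRule = Union[Tuple[int, int], FrozenSet[int]]

DURATION_RULES: Dict[str, DurationRule] = {
    "wan2.6-t2v": (2, 15),
    "wan2.5-t2v-preview": frozenset({5, 10}),
    "wan2.2-t2v-plus": frozenset({5}),
    "wan2.6-i2v-flash": (2, 15),
    "wan2.6-i2v": (2, 15),
    "wan2.5-i2v-preview": frozenset({5, 10}),
    "wan2.2-kf2v-flash": frozenset({5}),
    "wan2.6-r2v": (2, 10),
    "wan2.6-r2v-flash": (2, 10),
    "wan2.1-vace-plus": (1, 5),  # table says "<=5s"; the lower bound of 1 is assumed
}


def duration_ok(model: str, seconds: int) -> bool:
    """Return True if `seconds` fits the recorded rule for `model`."""
    rule = DURATION_RULES.get(model)
    if rule is None:
        raise KeyError(f"no duration rule recorded for {model}; check its model page")
    if isinstance(rule, tuple):
        low, high = rule
        return low <= seconds <= high
    return seconds in rule
```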

> **Model details**: For more information about a specific model, direct the user to its detail page: `https://www.qwencloud.com/models/<model-name>` (replace `<model-name>` with the exact model ID, e.g. `wan2.7-t2v` → https://www.qwencloud.com/models/wan2.7-t2v). NEVER modify or guess the model name in the URL.

> **Dynamic model queries**: If the **qwencloud-model-selector** skill or **QwenCloud CLI** (`qwencloud models info <model>`) is available, use it for real-time model data. CLI requires authentication — see the **qwencloud-usage** skill for login flow.

## Execution

> **⚠️ Multiple artifacts**: When generating multiple files in a single session, you MUST append a numeric suffix to each filename (e.g. `out_1.mp4`, `out_2.mp4`) to prevent overwrites.
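
One way to follow this rule is to pick the first unused numbered name before each download. A minimal sketch; `next_output_path` is a hypothetical helper, and it does not guard against two processes claiming the same name at once.

```python
from pathlib import Path
from typing import Union


def next_output_path(stem: str = "out", suffix: str = ".mp4",
                     directory: Union[str, Path] = ".") -> Path:
    """Return the first <stem>_<n><suffix> in directory that does not exist yet (n starts at 1)."""
    n = 1
    while True:
        candidate = Path(directory) / f"{stem}_{n}{suffix}"
        if not candidate.exists():
            return candidate
        n += 1
```

Call it once per artifact, right before writing the file, so names are assigned in order (`out_1.mp4`, then `out_2.mp4`).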

### Prerequisites

- **API Key**: Check that `DASHSCOPE_API_KEY` (or `QWEN_API_KEY`) is set using a **non-plaintext** check only (e.g. in shell: `[ -n "$DASHSCOPE_API_KEY" ]`; report only "set" or "not set", never the key value). If it is not set, run the **qwencloud-ops-auth** skill if available; otherwise guide the user to obtain a key from the [QwenCloud Console](https://home.qwencloud.com/api-keys) and set it via a `.env` file (`echo 'DASHSCOPE_API_KEY=sk-your-key-here' >> .env` in the project root or current directory) or an environment variable. The script searches for `.env` in the current working directory and the project root. Skills may be installed independently, so do not assume qwencloud-ops-auth is present.
- Python 3.9+ (stdlib only, **no pip install needed**)
- For media merging (concat, trim, audio overlay): see [merge-media.md](references/merge-media.md) for ffmpeg/moviepy recipes suited to the user's environment
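
The `.env` lookup described above can be sketched as follows. This is an assumption-laden sketch, not the script's actual loader: the search order (directories tried in the order given), the simple `KEY=value` parsing (no `export` prefix or multi-line values), and treating the `sk-your-key-here` placeholder as "not set" are choices made here for illustration.

```python
import os
from pathlib import Path
from typing import Dict, Iterable

PLACEHOLDER_VALUE = "sk-your-key-here"


def parse_env_file(path: Path) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")  # drop surrounding quotes
    return values


def load_api_key_from_env_files(directories: Iterable[Path]) -> bool:
    """Export DASHSCOPE_API_KEY from the first .env that defines a real value.

    Returns only whether a key is now available; the value is never printed.
    An already-exported key wins over any .env file.
    """
    if os.environ.get("DASHSCOPE_API_KEY"):
        return True
    for directory in directories:
        env_file = Path(directory) / ".env"
        if not env_file.is_file():
            continue
        value = parse_env_file(env_file).get("DASHSCOPE_API_KEY", "")
        if value and value != PLACEHOLDER_VALUE:
            os.environ["DASHSCOPE_API_KEY"] = value
            return True
    return False
```

A typical call would be `load_api_key_from_env_files([Path.cwd(), project_root])`, where `project_root` is a hypothetical variable you resolve yourself.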

### Environment Check

Before first execution, verify Python is available:

```bash
python3 --version  # must be 3.9+
```

If `python3` is not found, try `python --version` or `py -3 --version`. If Python is unavailable or below 3.9, skip to **Path 2 (curl)** in [execution-guide.md](references/execution-guide.md).
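
The same version gate can be expressed inside Python, for example in a wrapper that decides between the script and the curl fallback. `execution_path` is a hypothetical helper; it only encodes the 3.9 threshold stated above.

```python
import sys
from typing import Tuple

MIN_PYTHON = (3, 9)


def execution_path(version: Tuple[int, ...] = tuple(sys.version_info[:2])) -> str:
    """Return "script" when the interpreter meets the 3.9 minimum, else "curl"."""
    return "script" if tuple(version[:2]) >= MIN_PYTHON else "curl"
```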

### Default: Run Script

**Script path**: Scripts are in the `scripts/` subdirectory **of this skill's directory** (the directory containing this SKILL.md). **You MUST first locate this skill's installation directory, then ALWAYS use the full absolute path to exe


Source: https://github.com/qwencloud/qwencloud-ai/tree/main/skills/video/qwencloud-video-generation
