qwencloud-video-generation
[QwenCloud] Generate videos using Wan models. Supports text-to-video, image-to-video, first+last frame, reference-based role-play, and video editing (VACE). TRIGGER when: user wants to create, generate, or edit video content, mentions video generation/animation/video clips/Wan models, or explicitly invokes this skill by name (e.g. use qwencloud-video-generation). DO NOT TRIGGER when: user wants to generate images (use qwencloud-image-generation), understand/analyze existing videos (use qwencloud-vision), or perform text-only tasks.
What this skill does
> **Agent setup**: If your agent doesn't auto-load skills (e.g. Claude Code),
> see [agent-compatibility.md](references/agent-compatibility.md) once per session.

# Qwen Video Generation

Generate videos using Wan models. All tasks are **asynchronous**: submit, then poll until completion.

This skill is part of **qwencloud/qwencloud-ai**.

> **⚠️ Critical Parameter Differences by Mode:**
> - **kf2v (First+Last Frame)**: Duration is **fixed at 5 seconds**; other values will fail. Output is **silent only**.
> - **Resolution parameter varies**: t2v/r2v/vace use `size` (e.g. `"1280*720"`); i2v/kf2v use `resolution` (e.g. `"720P"`). Exception: `wan2.7-t2v` uses `resolution` + `ratio` instead of `size`.

## Skill directory

Use this skill's internal files to execute and learn. Load reference files on demand when the default path fails or you need details.

| Location | Purpose |
|----------|---------|
| `scripts/video.py` | Default execution: mode auto-detect, submit, poll, download |
| `references/execution-guide.md` | Fallback: curl for all 5 modes, code generation |
| `references/request-fields.md` | Field tables and audio handling by mode |
| `references/workflows.md` | Duration extensions, multi-shot, VACE pipelines |
| `references/polling-guide.md` | Polling patterns and timing |
| `references/merge-media.md` | Concat, trim, audio overlay: ffmpeg/moviepy recipes |
| `references/prompt-guide.md` | Per-mode prompt formulas, sound description, multi-shot structure |
| `references/examples.md` | Full script examples per mode |
| `references/sources.md` | Official documentation URLs |
| `references/agent-compatibility.md` | Agent self-check: register skills in project config for agents that don't auto-load |

## Security

**NEVER output any API key or credential in plaintext.** Always use variable references (`$DASHSCOPE_API_KEY` in shell, `os.environ["DASHSCOPE_API_KEY"]` in Python). Any check or detection of credentials must be **non-plaintext**: report only status (e.g. "set" / "not set", "valid" / "invalid"), never the value.
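The non-plaintext check described above can be done in Python with a helper that only ever returns a status string. This is a minimal sketch, not code from `scripts/video.py`: the helper name `credential_status` is hypothetical, and checking `DASHSCOPE_API_KEY` before `QWEN_API_KEY` is an assumed precedence. The `sk-sp-` prefix test mirrors the Coding Plan rule in Key Compatibility.

```python
import os
from typing import Mapping, Optional


def credential_status(env: Optional[Mapping[str, str]] = None) -> str:
    """Report whether an API key is configured, without exposing its value.

    Hypothetical helper: checks DASHSCOPE_API_KEY first, then QWEN_API_KEY
    (assumed precedence). Returns only a status string, never the key.
    """
    env = os.environ if env is None else env
    key = ""
    for name in ("DASHSCOPE_API_KEY", "QWEN_API_KEY"):
        value = (env.get(name) or "").strip()
        if value:
            key = value
            break
    if not key:
        return "not set"
    # Coding Plan keys start with "sk-sp-" and cannot call video models.
    if key.startswith("sk-sp-"):
        return "set (Coding Plan key, unsupported)"
    return "set"


if __name__ == "__main__":
    # Prints only the status string, never the key itself.
    print(f"API key: {credential_status()}")
```

Passing a plain dict as `env` makes the helper easy to test without touching the real environment.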
Never display the contents of `.env` or config files that may contain secrets.

**When the API key is not configured, NEVER ask the user to provide it directly.** Instead, help create a `.env` file with a placeholder (`DASHSCOPE_API_KEY=sk-your-key-here`) and instruct the user to replace it with their actual key from the [QwenCloud Console](https://home.qwencloud.com/api-keys). Only write the actual key value if the user explicitly requests it.

## Key Compatibility

Scripts require a **standard QwenCloud API key** (`sk-...`). Coding Plan keys (`sk-sp-...`) cannot be used: video generation models are not available on Coding Plan, and Coding Plan does not support the native QwenCloud API. Video generation incurs per-second charges on standard keys. The script detects `sk-sp-` keys at startup and prints a warning. If qwencloud-ops-auth is installed, see its `references/codingplan.md` for full details.

## Mode Selection Guide

| User Goal | Mode | Key Field |
|-----------|------|-----------|
| Generate video from text description only | **t2v** | `prompt` only |
| Animate a single image | **i2v** | `img_url` or `reference_image` |
| wan2.7 unified i2v: first frame, first+last frame, video continuation, audio sync | **i2v** | `media[]`, `first_frame_url`, `first_clip_url`, `driving_audio_url` |
| Transition between two images (**⚠️ 5s fixed, silent only**) | **kf2v** | `first_frame_url` + `last_frame_url` |
| Role-play: make characters act a new script | **r2v** | `reference_urls` (up to 5) |
| Video editing: multi-image ref, repainting, local edit, extend, outpaint | **vace** | `function` |

### Model Selection

1. **User specified a model** → use it directly.
2. **Consult the qwencloud-model-selector skill** when model choice depends on capability, scenario, or pricing.
3. **No signal, clear task** → defaults: t2v → `wan2.6-t2v`, i2v → `wan2.6-i2v-flash`, kf2v → `wan2.2-kf2v-flash`, r2v → `wan2.6-r2v-flash`, vace → `wan2.1-vace-plus`.
   For wan2.7 features, explicitly set `--model wan2.7-t2v` or `--model wan2.7-i2v`.

## Models

### t2v (Text-to-Video)

| Model | Features |
|-------|----------|
| `wan2.7-t2v` | Ratio control, auto-dubbing, 5000-char prompt, 720P/1080P. Uses `resolution` + `ratio` params. |
| `wan2.6-t2v` **(default)** | Audio, multi-shot, 2–15s, 720P/1080P. Uses `size` param. |
| `wan2.5-t2v-preview` | Audio, 5s/10s, 480P/720P/1080P |
| `wan2.2-t2v-plus` | Silent, 5s, 480P/1080P |

### i2v (Image-to-Video)

| Model | Features |
|-------|----------|
| `wan2.7-i2v` | Unified protocol: first frame, first+last frame, video continuation, audio sync. Uses `media[]` array. |
| `wan2.6-i2v-flash` **(default)** | Audio/silent, multi-shot, 2–15s, 720P/1080P. Uses `img_url`. |
| `wan2.6-i2v` | Audio, multi-shot, 2–15s, 720P/1080P |
| `wan2.5-i2v-preview` | Audio, 5s/10s, 480P/720P/1080P |

### kf2v / r2v / vace

| Model | Features |
|-------|----------|
| `wan2.2-kf2v-flash` **(kf2v default)** | Silent, 5s, 480P/720P/1080P |
| `wan2.6-r2v` | Audio, single/multi character, 2–10s, 720P/1080P |
| `wan2.6-r2v-flash` **(r2v default)** | Audio/silent, multi-character, 2–10s, 720P/1080P |
| `wan2.1-vace-plus` **(vace default)** | Multi-image ref, repainting, local edit, ≤5s, 720P |

> **⚠️ Important**: The model list above is a **point-in-time snapshot** and may be outdated. Model availability
> changes frequently. **Always check the [official model list](https://www.qwencloud.com/models)
> for the authoritative, up-to-date catalog before making model decisions.**

> **Model details**: For more information about a specific model, direct the user to its detail page: `https://www.qwencloud.com/models/<model-name>` (replace `<model-name>` with the exact model ID, e.g. `wan2.7-t2v` → https://www.qwencloud.com/models/wan2.7-t2v). NEVER modify or guess the model name in the URL.
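The mode, default-model, and parameter rules in the tables above can be condensed into a small lookup. This is a hypothetical sketch, not the logic inside `scripts/video.py`: the helper `plan_request` and its return shape are assumptions, the values are copied from this document's point-in-time tables, and it does not model `wan2.7-i2v`'s `media[]` protocol.

```python
from typing import Dict, Optional

# Defaults copied from the Model Selection list (point-in-time snapshot).
DEFAULT_MODELS: Dict[str, str] = {
    "t2v": "wan2.6-t2v",
    "i2v": "wan2.6-i2v-flash",
    "kf2v": "wan2.2-kf2v-flash",
    "r2v": "wan2.6-r2v-flash",
    "vace": "wan2.1-vace-plus",
}

# t2v/r2v/vace use `size` (e.g. "1280*720"); i2v/kf2v use `resolution` (e.g. "720P").
SIZE_MODES = {"t2v", "r2v", "vace"}


def plan_request(mode: str, model: Optional[str] = None,
                 duration: Optional[int] = None) -> Dict[str, object]:
    """Pick a model and the resolution-related parameter names (hypothetical helper)."""
    if mode not in DEFAULT_MODELS:
        raise ValueError(f"unknown mode: {mode!r}")
    # Rule 1: a user-specified model wins; otherwise fall back to the default.
    chosen = model or DEFAULT_MODELS[mode]
    # wan2.7-t2v is the documented exception: `resolution` + `ratio`, not `size`.
    if chosen == "wan2.7-t2v":
        res_params = ["resolution", "ratio"]
    elif mode in SIZE_MODES:
        res_params = ["size"]
    else:
        res_params = ["resolution"]
    # kf2v duration is fixed at 5 seconds; reject anything else before submitting.
    if mode == "kf2v" and duration not in (None, 5):
        raise ValueError("kf2v duration is fixed at 5 seconds")
    return {"model": chosen, "resolution_params": res_params}
```

For example, `plan_request("t2v", model="wan2.7-t2v")` selects `resolution` + `ratio`, while `plan_request("kf2v", duration=10)` fails fast instead of wasting a paid submission.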
> **Dynamic model queries**: If the **qwencloud-model-selector** skill or **QwenCloud CLI** (`qwencloud models info <model>`) is available, use it for real-time model data. The CLI requires authentication; see the **qwencloud-usage** skill for the login flow.

## Execution

> **⚠️ Multiple artifacts**: When generating multiple files in a single session, you MUST append a numeric suffix to each filename (e.g. `out_1.mp4`, `out_2.mp4`) to prevent overwrites.

### Prerequisites

- **API Key**: Check that `DASHSCOPE_API_KEY` (or `QWEN_API_KEY`) is set using a **non-plaintext** check only (e.g. in shell: `[ -n "$DASHSCOPE_API_KEY" ]`; report only "set" or "not set", never the key value). If not set: run the **qwencloud-ops-auth** skill if available; otherwise guide the user to obtain a key from the [QwenCloud Console](https://home.qwencloud.com/api-keys) and set it via a `.env` file (`echo 'DASHSCOPE_API_KEY=sk-your-key-here' >> .env` in the project root or current directory) or an environment variable. The script searches for `.env` in the current working directory and the project root. Skills may be installed independently; do not assume qwencloud-ops-auth is present.
- Python 3.9+ (stdlib only, **no pip install needed**)
- For media merging (concat, trim, audio overlay): see [merge-media.md](references/merge-media.md) for ffmpeg/moviepy recipes suited to the user's environment.

### Environment Check

Before first execution, verify Python is available:

```bash
python3 --version  # must be 3.9+
```

If `python3` is not found, try `python --version` or `py -3 --version`. If Python is unavailable or below 3.9, skip to **Path 2 (curl)** in [execution-guide.md](references/execution-guide.md).

### Default: Run Script

**Script path**: Scripts are in the `scripts/` subdirectory **of this skill's directory** (the directory containing this SKILL.md). **You MUST first locate this skill's installation directory, then ALWAYS use the full absolute path to exe