pixverse-ai-image-and-video-generator
PixVerse CLI — generate AI videos and images from the command line. Supports PixVerse V6, Veo, Sora, Grok, Seedance, Kling, Happy Horse video models; Nano Banana (Gemini), Seedream, Qwen, Kling, GPT Image image models; and PixVerse's rich effect template library. Start here.
What this skill does
# PixVerse CLI: Master Skill

## What is PixVerse CLI

PixVerse CLI is the official command-line interface for [PixVerse](https://pixverse.ai), an AI-powered creative platform for generating videos and images. It is essentially **a UI-free version of the PixVerse website**: all features, models, and parameters are aligned with the web experience at [app.pixverse.ai](https://app.pixverse.ai).

It is designed for:

- **AI agents** (primary): structured JSON output, deterministic exit codes, and pipeable commands for autonomous workflows (Claude Code, Cursor, Codex, custom agents)
- **Developers & power users**: scriptable video/image generation without leaving the terminal
- **Automation**: batch processing, CI/CD pipelines, content production workflows

Key facts:

- Generating content **consumes credits** from the user's PixVerse account (same pricing as the website)
- **Only subscribed users** can use the CLI; see [subscription plans](https://app.pixverse.ai/subscribe)
- All output can be returned as structured JSON via the `--json` flag
- English only

---

## Installation

```bash
npm install -g pixverse
```

Or run without installing:

```bash
npx pixverse
```

Verify:

```bash
pixverse --version
```

**Requires Node.js >= 20.**

---

## Quick Start

```bash
# 1. Install
npm install -g pixverse

# 2. Authenticate (OAuth device flow; opens browser)
pixverse auth login --json

# 3. Create a video (waits for completion by default)
RESULT=$(pixverse create video --prompt "A cat astronaut floating in space" --json)
VIDEO_ID=$(echo "$RESULT" | jq -r '.video_id')

# 4. Download the result
pixverse asset download $VIDEO_ID --json
```

To skip waiting and poll later:

```bash
RESULT=$(pixverse create video --prompt "A cat astronaut floating in space" --no-wait --json)
VIDEO_ID=$(echo "$RESULT" | jq -r '.video_id')
pixverse task wait $VIDEO_ID --json
pixverse asset download $VIDEO_ID --json
```

> **Windows users**: For a full PowerShell pipeline example (T2I → I2V → upscale → download), see `skills/examples/windows/powershell-text-to-video.ps1`.

---

## Authentication

PixVerse CLI uses **OAuth device flow**, so there is no need to copy tokens manually:

1. Run `pixverse auth login --json`
2. The CLI prints an authorization URL
3. Open the URL in your browser and authorize
4. The token is stored automatically in `~/.pixverse/`

Details:

- The token is valid for 30 days
- CLI sessions are independent from your web/app sessions
- If the token expires (exit code 3), re-run `pixverse auth login --json`
- Run `pixverse auth status --json` to check login state and credits

---

## Capabilities Overview

| I want to... | Use skill |
|:---|:---|
| Create a video from text or image | `pixverse:create-video` |
| Enhance a video prompt for better results (V6 / generic) | `pixverse:prompt-enhance` |
| Optimize a prompt for Seedance 2.0 (auto-triggers when prompt has clear optimization headroom; skipped when prompt is already clean) | `pixverse:seedance-prompt-optimize` |
| Edit video content with AI (replace subjects, swap outfits, change backgrounds) | `pixverse:modify-video` |
| Animate a character with motion from a reference video | `pixverse:motion-control` |
| Create or edit an image | `pixverse:create-and-edit-image` |
| Extend, upscale, or add audio to a video | `pixverse:post-process-video` |
| Create transition animation between frames | `pixverse:transition` |
| Check generation progress | `pixverse:task-management` |
| Browse, download, upload, or delete assets | `pixverse:asset-management` |
| Organize assets into named folders | `pixverse:saved-folders` |
| Set up auth or check account | `pixverse:auth-and-account` |
| Browse and create from effect templates | `pixverse:template` |
| Manage workspaces (list, switch, status) | `pixverse:workspace` |
| Generate Mondo-style posters and covers | `pixverse:mondo-poster-design` |
| Design and reuse persistent characters across a story | `pixverse:character-design` |
| Design and reuse persistent key items / props / objects | `pixverse:item-design` |

> **Looking up models or parameters?** Don't wait until you're generating; read the relevant capabilities file directly:
> - Video models & constraints → `skills/capabilities/create-video.md` (Model Reference section)
> - Image models & constraints → `skills/capabilities/create-and-edit-image.md` (Model Reference section)

---

## Model Quick Reference

Use this to pick a model before diving into a sub-skill.
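As a rough sketch of how the duration limits in the Video Models table in this section might be used in a script, the helper below checks a requested duration before any credits are spent. The function name `pixverse_duration_ok` and its return codes are assumptions made for illustration; they are not part of the PixVerse CLI, and only a few rows of the table are covered.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the PixVerse CLI): check a requested
# duration in seconds against limits copied from the Video Models table.
# Returns 0 if allowed, 1 if not allowed, 2 if the model is not covered here.
pixverse_duration_ok() {
  local model="$1" seconds="$2"

  # Reject empty input and anything containing a non-digit character.
  case "$seconds" in
    ''|*[!0-9]*) return 1 ;;
  esac

  case "$model" in
    v6|pixverse-c1|grok-imagine)
      # Continuous range of 1 to 15 seconds.
      [ "$seconds" -ge 1 ] && [ "$seconds" -le 15 ] ;;
    v5.6)
      # Continuous range of 1 to 10 seconds.
      [ "$seconds" -ge 1 ] && [ "$seconds" -le 10 ] ;;
    sora-2|sora-2-pro)
      # Only the discrete values 4, 8, and 12 are listed.
      case "$seconds" in 4|8|12) return 0 ;; *) return 1 ;; esac ;;
    veo-3.1-standard|veo-3.1-fast)
      # Only the discrete values 4, 6, and 8 are listed.
      case "$seconds" in 4|6|8) return 0 ;; *) return 1 ;; esac ;;
    *)
      # Model not covered by this sketch.
      return 2 ;;
  esac
}
```

A script could call `pixverse_duration_ok sora-2 8` before running `pixverse create video --model sora-2` and stop early on a non-zero status. The limits are hard-coded from the table, so they would need updating whenever the table changes; the capabilities files remain the authoritative source.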
### Video Models (`pixverse create video --model <value>`)

| Model | `--model` value | Max Quality | Duration |
|:---|:---|:---|:---|
| PixVerse V6 *(default)* | `v6` | `1080p` | `1`–`15`s |
| PixVerse C1 | `pixverse-c1` | `1080p` | `1`–`15`s |
| PixVerse v5.6 | `v5.6` | `1080p` | `1`–`10`s |
| Sora 2 | `sora-2` | `720p` | `4`, `8`, `12`s |
| Sora 2 Pro | `sora-2-pro` | `1080p` | `4`, `8`, `12`s |
| Veo 3.1 Standard | `veo-3.1-standard` | `1080p` | `4`, `6`, `8`s |
| Veo 3.1 Fast | `veo-3.1-fast` | `1080p` | `4`, `6`, `8`s |
| Veo 3.1 Lite | `veo-3.1-lite` | `1080p` | `4`–`6`s |
| Grok Imagine | `grok-imagine` | `720p` | `1`–`15`s |
| Happy Horse 1.0 | `happyhorse-1.0` | `1080p` | `3`–`15`s |
| Seedance 2.0 Standard | `seedance-2.0-standard` | `1080p` | `4`–`15`s |
| Seedance 2.0 Fast | `seedance-2.0-fast` | `720p` | `4`–`15`s |
| Kling O3 Pro | `kling-o3-pro` | `720p` | `3`–`15`s |
| Kling O3 Standard | `kling-o3-standard` | `720p` | `3`–`15`s |
| Kling 3.0 Pro | `kling-3.0-pro` | `720p` | `3`–`15`s |
| Kling 3.0 Standard | `kling-3.0-standard` | `720p` | `3`–`15`s |

### Image Models (`pixverse create image --model <value>`)

| Model | `--model` value | Max Quality |
|:---|:---|:---|
| Qwen Image *(default)* | `qwen-image` | `1080p` |
| GPT Image 2 | `gpt-image-2.0` | `2160p` |
| Seedream 5.0 Lite | `seedream-5.0-lite` | `2160p` |
| Seedream 4.5 | `seedream-4.5` | `2160p` |
| Seedream 4.0 | `seedream-4.0` | `2160p` |
| Gemini 2.5 Flash (Nano Banana) | `gemini-2.5-flash` | `1080p` |
| Gemini 3.0 (Nano Banana Pro) | `gemini-3.0` | `2160p` |
| Gemini 3.1 Flash (Nano Banana 2) | `gemini-3.1-flash` | `2160p` |
| Kling Image O3 | `kling-image-o3` | `2160p` |
| Kling Image V3 | `kling-image-v3` | `1440p` |

For full parameter constraints (aspect ratios, quality per model, mode support), read the capabilities files listed above.

---

## Workflow Skills

| I want to... | Use skill |
|:---|:---|
| Generate video from text end-to-end | `pixverse:text-to-video-pipeline` |
| Animate an image into video | `pixverse:image-to-video-pipeline` |
| Generate image then animate it | `pixverse:text-to-image-to-video` |
| Iteratively edit an image | `pixverse:image-editing-pipeline` |
| Modify a video and enhance it | `pixverse:modify-video-pipeline` |
| Full video production (create + extend + audio + upscale) | `pixverse:video-production` |
| Animate a character with a motion reference | `pixverse:motion-control-pipeline` |
| Create multiple items in parallel | `pixverse:batch-creation` |
| Generate a Mondo-style poster end-to-end | `pixverse:mondo-poster-pipeline` |
| Generate poster then animate into video | `pixverse:mondo-poster-to-video-pipeline` |
| Storyboard → 4-shot video from a single prompt | `pixverse:storyboard-to-video` |

---

## Reference Materials

Located in `skills/references/`. These are read-only knowledge bases that capabilities and workflows draw from: no CLI commands, just curated design knowledge.

| Reference | Path | Content |
|:---|:---|:---|
| Mondo Artist Styles | `references/mondo-poster/artist-styles.md` | 37 artist styles with prompt keywords across 7 categories |
| Mondo Composition Patterns | `references/mondo-poster/composition-patterns.md` | 8 composition techniques (negative space, silhouette, geometric framing, etc.) |
| Mondo Genre Templates | `references/mondo-poster/genre-templates.md` | Genre-specific prompt templates for film, book covers, and album covers |

---

## All Commands

| Command | Description |
|:---|:---|
| `auth login` | Login via browser (