ponyflash
Generate images, videos, speech audio, and music using the PonyFlash Python SDK. Also handle local media editing with FFmpeg, including clip, concat, transcode, extract audio, frame capture, subtitle capability checks, and ASS subtitle prep. Use when the user asks to create, generate, produce, edit, trim, merge, concatenate, transcode, subtitle, or render AI-generated media content.
# PonyFlash Skill
## Step 0: Decide Which Capability Path Applies
This skill contains **two capability families**:
1. **Cloud generation via PonyFlash Python SDK**
- image generation
- video generation
- speech synthesis
- music generation
- model listing
- file management
- account / credits
- These tasks **require a valid PonyFlash API key**.
2. **Local media editing via FFmpeg toolchain**
- ffmpeg / ffprobe detection
- installation planning
- clip / concat / transcode
- extract audio / capture frame
- subtitle capability checks
- ASS subtitle generation and burn-in workflow
- These tasks **do NOT require a PonyFlash API key**, but they **do require local `ffmpeg` / `ffprobe` support**.
Before doing anything, classify the request:
- If the user is asking to **generate** media with PonyFlash models, follow the SDK path and require API key setup.
- If the user is asking to **edit or process local media**, follow the FFmpeg path and do dependency checks first.
- If the user wants an **end-to-end production workflow**, you may use both: generate assets with PonyFlash, then assemble or export with FFmpeg.
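The classification rule above can be sketched as a small keyword check. This is only an illustration: `CapabilityPath`, `classify_request`, and both keyword sets are hypothetical names that are not part of the PonyFlash SDK, and a real request usually needs more judgment than keyword matching.

```python
from enum import Enum
from typing import Optional


class CapabilityPath(Enum):
    SDK = "sdk"        # cloud generation, requires an API key
    FFMPEG = "ffmpeg"  # local editing, requires ffmpeg / ffprobe
    BOTH = "both"      # end-to-end: generate assets, then assemble locally


# Hypothetical keyword lists, chosen only for illustration.
GENERATE_WORDS = {"generate", "create", "produce", "synthesize", "compose"}
EDIT_WORDS = {"edit", "trim", "clip", "merge", "concat", "concatenate",
              "transcode", "subtitle", "extract"}


def classify_request(text: str) -> Optional[CapabilityPath]:
    """Map a user request to a capability path, or None when it is unclear."""
    words = set(text.lower().replace(",", " ").split())
    wants_generation = bool(words & GENERATE_WORDS)
    wants_editing = bool(words & EDIT_WORDS)
    if wants_generation and wants_editing:
        return CapabilityPath.BOTH
    if wants_generation:
        return CapabilityPath.SDK
    if wants_editing:
        return CapabilityPath.FFMPEG
    return None  # unclear: ask the user before proceeding
```

A `None` result is the cue to ask the user what they want rather than guess a path.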
## Step 1A: API Key Setup for PonyFlash SDK Tasks
Only do this section when the request needs PonyFlash cloud capabilities.
**The FIRST time this skill is activated for a cloud generation task**, tell the user the following in your own words:
1. PonyFlash skill is ready to use.
2. It can handle:
- image generation
- video generation
- speech synthesis
- music generation
- local media editing with FFmpeg
3. For complex multi-step productions, there are **Creative Playbooks** in the `playbooks/` directory.
4. To use PonyFlash cloud generation, the user needs an API key:
- Register / log in at **https://www.ponyflash.com**
- Get API key at **https://www.ponyflash.com/api-key** (starts with `rk_`)
- Check credits at **https://www.ponyflash.com/usage**
- Paste the key back in the chat
**On subsequent SDK activations**, check whether `PONYFLASH_API_KEY` is set in the environment. If not, ask the user for the key again.
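A minimal sketch of that environment check, using only the standard library. The helper names `find_api_key` and `looks_like_ponyflash_key` are hypothetical; the only facts taken from this document are the variable name `PONYFLASH_API_KEY` and the `rk_` prefix.

```python
import os


def find_api_key(env=None):
    """Return the key stored in PONYFLASH_API_KEY, or None if unset or blank."""
    env = os.environ if env is None else env
    key = env.get("PONYFLASH_API_KEY", "").strip()
    return key or None


def looks_like_ponyflash_key(key):
    """Cheap format check only: the key should start with 'rk_'."""
    return key.startswith("rk_") and len(key) > len("rk_")


if __name__ == "__main__":
    key = find_api_key()
    if key is None:
        print("PONYFLASH_API_KEY is not set; ask the user for the key.")
    elif not looks_like_ponyflash_key(key):
        print("The key does not start with 'rk_'; ask the user to re-check it.")
    else:
        print("Key found in environment.")
```

The prefix check does not prove the key is valid; it only catches obvious paste mistakes before the real verification call.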
Once received, set it up:
```bash
export PONYFLASH_API_KEY="rk_xxx"
```
Then install the SDK:
```bash
pip install ponyflash
```
**Always verify the key works before any generation task:**
```python
from ponyflash import PonyFlash
pony_flash = PonyFlash(api_key="<key from user>")
balance = pony_flash.account.credits()
print(f"Balance: {balance.balance} {balance.currency}")
```
If verification fails:
- **Key invalid or missing** → direct user to https://www.ponyflash.com/api-key
- **Balance is zero** → direct user to https://www.ponyflash.com/usage to top up credits
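The two failure cases can be handled in one place. The sketch below is an assumption-laden illustration: `diagnose_key` is a hypothetical helper, it takes any zero-argument callable (for example `pony_flash.account.credits`) so that it does not depend on the SDK, it assumes `balance.balance` is numeric, and it catches a broad `Exception` because the SDK's exception types are not described in this document.

```python
def diagnose_key(fetch_credits):
    """Return a guidance message if verification fails, or None if all is well.

    fetch_credits: zero-argument callable returning an object with a numeric
    `balance` attribute (assumed shape, e.g. pony_flash.account.credits).
    """
    try:
        balance = fetch_credits()
    except Exception as exc:  # broad on purpose: SDK error types are unknown here
        return (f"Key invalid or missing ({exc}). "
                "Get a key at https://www.ponyflash.com/api-key")
    if balance.balance <= 0:
        return "Balance is zero. Top up credits at https://www.ponyflash.com/usage"
    return None
```

With a real client this might be called as `diagnose_key(pony_flash.account.credits)`; a `None` result means generation can proceed.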
## Step 1B: Local Dependency Setup for FFmpeg Tasks
Only do this section when the request needs local editing, subtitle, or export work.
1. First check local dependencies:
```bash
bash "{baseDir}/scripts/check_ffmpeg.sh"
```
2. If the task involves subtitles, do **capability checks**, not just existence checks:
```bash
bash "{baseDir}/scripts/check_ffmpeg.sh" --require-subtitles-filter
```
3. If `ffmpeg` / `ffprobe` or required filters are missing:
- Tell the user what is missing.
- Ask whether the user wants platform-appropriate FFmpeg installation guidance.
- After the user installs FFmpeg, rerun the dependency checks before continuing.
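To show the difference between an existence check and a capability check, here is a rough Python equivalent. It is not the contents of `scripts/check_ffmpeg.sh`, which this document does not show; `missing_tools` and `has_filter` are hypothetical helpers, and the parsing assumes the usual `ffmpeg -filters` layout of a flags column followed by the filter name.

```python
import shutil
import subprocess


def missing_tools(tools=("ffmpeg", "ffprobe")):
    """Existence check: return the tools that are not on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def has_filter(name, ffmpeg="ffmpeg"):
    """Capability check: True if `ffmpeg -filters` lists a filter called `name`."""
    if shutil.which(ffmpeg) is None:
        return False
    result = subprocess.run(
        [ffmpeg, "-hide_banner", "-filters"],
        capture_output=True, text=True, check=False,
    )
    for line in result.stdout.splitlines():
        parts = line.split()
        # Assumed row layout: "<flags> <name> <io> <description...>"
        if len(parts) >= 2 and parts[1] == name:
            return True
    return False


if __name__ == "__main__":
    print("Missing tools:", missing_tools() or "none")
    print("subtitles filter available:", has_filter("subtitles"))
```

An FFmpeg build can be present yet lack the `subtitles` filter, which is why subtitle tasks need the second check.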
## What this Skill Can Do
| Capability | Resource | Description |
|---|---|---|
| Image generation | `pony_flash.images` | Text-to-image, image editing with mask/reference images |
| Video generation | `pony_flash.video` | Text-to-video, first-frame-to-video, OmniHuman, Motion Transfer |
| Speech synthesis | `pony_flash.speech` | Text-to-speech with voice cloning, emotion control, speed, pitch |
| Music generation | `pony_flash.music` | Text-to-music with lyrics, style, instrumental mode, continuation |
| Model listing | `pony_flash.models` | List available models, get model details and supported modes |
| File management | `pony_flash.files` | Upload, list, get, delete files |
| Account | `pony_flash.account` | Check credit balance, get recharge link |
| Local media editing | `scripts/media_ops.sh` | Clip, concat, transcode, extract audio, frame capture |
| FFmpeg environment checks | `scripts/check_ffmpeg.sh` | Detect ffmpeg / ffprobe and subtitle capabilities |
| Subtitle font prep | `scripts/ensure_subtitle_fonts.sh` | Keep a reusable local copy of the default subtitle font when explicitly requested |
| ASS subtitle prep | `scripts/build_ass_subtitles.py` | Adaptive ASS subtitle generation with pre-wrapping |
## Creative Playbooks (production workflows)
The `playbooks/` directory contains **Creative Playbooks** — step-by-step production workflow guides for specific content types. Playbooks act as a director layer: they tell you **what to create and in what order**, while this SKILL.md tells you **how to execute generation and editing**.
### When to use a playbook
1. **User explicitly requests a playbook by name** → Read the corresponding file from `playbooks/` and follow its workflow.
2. **User asks to see available playbooks** → Read [playbooks/INDEX.md](playbooks/INDEX.md) and display the full list.
3. **User's request is clearly a multi-step production task** → Suggest a matching playbook from [playbooks/INDEX.md](playbooks/INDEX.md) and ask whether to use it.
4. **User's request is a single-step generation or editing task** → Proceed directly with the relevant SDK or FFmpeg capability. No playbook needed.
### How to execute a playbook
Once a playbook is loaded:
- Follow its workflow (asset prep → content generation → voice / music → editing → output).
- Use PonyFlash SDK for generation tasks and FFmpeg scripts for local assembly / export tasks.
- Confirm key creative decisions with the user before expensive generation.
- Adapt prompts, durations, output format, and export strategy to the user's actual goal.
### Creating custom playbooks
When the user asks to create a new playbook, generate a markdown file in `playbooks/` following this template:
```markdown
---
name: Playbook Name
description: One-line summary of what this playbook produces
tags: [keyword1, keyword2, keyword3]
difficulty: beginner | intermediate | advanced
estimated_credits: credit range estimate
output_format: format description (e.g., "vertical 9:16 MP4")
---
# Playbook Name
## Use Cases
When to use this playbook.
## Workflow
### Step 1: Asset Preparation
What the user needs to provide; how to generate missing assets.
### Step 2: Visual Content Generation
Which models to use, recommended parameters, prompt guidance.
### Step 3: Voice / Music
Speech synthesis + background music guidance.
### Step 4: Editing / Assembly
How to assemble, trim, subtitle, transcode, and export with the local FFmpeg workflow.
### Step 5: Output / Optimization
Render settings, format recommendations.
## Prompt Templates
Reusable prompt examples for this content type.
## Notes
Best practices, common pitfalls.
```
After creating the file, update [playbooks/INDEX.md](playbooks/INDEX.md) to include the new playbook.
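The two steps (write the playbook file, then register it in the index) could be scripted roughly as follows. This is a sketch under assumptions: `create_playbook` is a hypothetical helper, the section bodies are left empty, and the one-line bullet format appended to `INDEX.md` is a guess because the index layout is not shown here.

```python
from pathlib import Path

# Front matter keys and top-level sections taken from the template above.
FRONT_MATTER_KEYS = ("name", "description", "tags", "difficulty",
                     "estimated_credits", "output_format")
SECTIONS = ("Use Cases", "Workflow", "Prompt Templates", "Notes")


def create_playbook(playbooks_dir, slug, meta):
    """Write playbooks/<slug>.md from `meta` and append an entry to INDEX.md."""
    missing = [key for key in FRONT_MATTER_KEYS if key not in meta]
    if missing:
        raise ValueError(f"missing front matter fields: {missing}")

    lines = ["---"]
    for key in FRONT_MATTER_KEYS:
        value = meta[key]
        if isinstance(value, (list, tuple)):
            value = "[" + ", ".join(value) + "]"
        lines.append(f"{key}: {value}")
    lines += ["---", f"# {meta['name']}"]
    lines += [f"## {title}" for title in SECTIONS]

    root = Path(playbooks_dir)
    root.mkdir(parents=True, exist_ok=True)
    path = root / f"{slug}.md"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")

    # Assumed index format: one markdown bullet per playbook.
    with (root / "INDEX.md").open("a", encoding="utf-8") as index:
        index.write(f"- [{meta['name']}]({slug}.md): {meta['description']}\n")
    return path
```

The workflow steps and prompt templates still need to be written by hand; the helper only guarantees that the required front matter fields are present and that the index entry is not forgotten.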
## PonyFlash SDK Core Concepts
### Client initialization
```python
from ponyflash import PonyFlash
pony_flash = PonyFlash(api_key="rk_xxx")
```
If `api_key` is omitted, the client reads `PONYFLASH_API_KEY` from the environment.
### FileInput — zero-friction file handling
All file parameters accept any of these types:
| Input type | Example | Behavior |
|---|---|---|
| URL string | `"https://example.com/photo.jpg"` | Passed directly to API |
| file_id string | `"file_abc123"` | Passed directly to API |
| `Path` object | `Path("photo.jpg")` | Auto-uploaded via presigned URL |
| `open()` file | `open("photo.jpg", "rb")` | Auto-uploaded via presigned URL |
| `bytes` | `image_bytes` | Auto-uploaded via presigned URL |
| `(filename, bytes)` tuple | `("photo.jpg", data)` | Auto-uploaded with filename |
Temp uploads are cleaned up automatically after `generate()` completes.
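To make the table concrete, the sketch below sorts a value into the categories listed above. It is not the SDK's implementation: `classify_file_input` is a hypothetical function, and treating every non-URL string as a file_id is an assumption made for illustration.

```python
import io
from pathlib import Path


def classify_file_input(value):
    """Return which FileInput category `value` falls into (illustrative only)."""
    if isinstance(value, str):
        if value.startswith(("http://", "https://")):
            return "url"      # passed directly to the API
        return "file_id"      # assumption: any other string is a file_id
    if isinstance(value, Path):
        return "path"         # auto-uploaded
    if isinstance(value, (bytes, bytearray)):
        return "bytes"        # auto-uploaded
    if (isinstance(value, tuple) and len(value) == 2
            and isinstance(value[0], str)
            and isinstance(value[1], (bytes, bytearray))):
        return "named_bytes"  # auto-uploaded with filename
    if hasattr(value, "read"):
        return "file_object"  # e.g. open("photo.jpg", "rb"), auto-uploaded
    raise TypeError(f"unsupported file input: {type(value).__name__}")
```

Only the first two categories are sent as-is; every other category implies an upload before the generation request.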