deAPI AI Media Suite (Community)
The cheapest AI media API on the market. Generate images (Flux), music (AceStep), and speech with voice cloning; transcribe video and audio; run OCR, video generation, background removal, upscaling, style transfer, and prompt enhancement, all through one unified API. Free $5 credit on signup.
# deAPI Media Generation
AI-powered media tools running on a decentralized GPU network. Get your API key at [deapi.ai](https://deapi.ai) (free $5 credit on signup).
## Setup
```bash
export DEAPI_API_KEY=your_api_key_here
```
## Available Functions
| Function | Use when user wants to... |
|----------|---------------------------|
| Transcribe (URL) | Transcribe YouTube, Twitch, Kick, X videos, or audio URLs |
| Transcribe (File) | Transcribe an uploaded local audio or video file |
| Generate Image | Generate images from text descriptions (Flux models) |
| Generate Audio | Convert text to speech (TTS, 54+ voices, 8 languages) |
| Clone Voice | Clone a voice from a short audio sample (3-10s) |
| Design Voice | Create a new voice from a text description |
| Generate Music | Generate music tracks, jingles, songs with vocals (AceStep) |
| Generate Video | Create video from text or animate images |
| Boost Prompt | Improve prompt quality before generation |
| OCR | Extract text from images |
| Remove Background | Remove background from images |
| Upscale | Upscale image resolution (2x/4x) |
| Transform Image | Apply style transfer to images (multi-image support) |
| Embeddings | Generate text embeddings for semantic search |
| Check Balance | Check account balance |
| Discover Models | List available models dynamically |
---
## Agent Safety: Input Sanitization
All curl examples use placeholders. Before substituting user input into shell commands:
1. **JSON payloads** — build JSON safely with `jq`, never inline raw strings:
```bash
# ❌ UNSAFE — shell injection risk
curl -d '{"prompt": "{USER_INPUT}"}'
# ✅ SAFE — jq handles all escaping
JSON=$(jq -n --arg p "$USER_INPUT" '{"prompt": $p}')
curl -d "$JSON"
```
2. **URLs** — validate format before use:
```bash
if [[ ! "$URL" =~ ^https?:// ]]; then
  echo "Invalid URL"; exit 1
fi
```
3. **File paths** — verify file exists, use `@` prefix only with validated local paths:
```bash
[[ -f "$FILE_PATH" ]] && curl -F "image=@$FILE_PATH"
```
4. **Never** pass raw user input directly into shell strings without escaping.
---
## Async Pattern (Important!)
**All deAPI requests are asynchronous.** Follow this pattern for every operation:
### 1. Submit Request
```bash
curl -s -X POST "https://api.deapi.ai/api/v1/client/{endpoint}" \
  -H "Authorization: Bearer $DEAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d "$JSON"
```
Response contains `request_id`.
### 2. Poll Status (loop every 10 seconds)
```bash
curl -s "https://api.deapi.ai/api/v1/client/request-status/{request_id}" \
  -H "Authorization: Bearer $DEAPI_API_KEY"
```
### 3. Handle Status
- `processing` → wait 10s, poll again
- `done` → fetch result from `result_url`
- `failed` → report error to user
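
Steps 1 to 3 can be tied together in a single loop. The following is a minimal sketch, not a reference implementation: `deapi_next_action` and `deapi_wait_for` are hypothetical helper names, the `max_polls` cap is an added safeguard, and the jq paths (`.data.status`, `.data.result_url`, with a top-level fallback) are guesses at the response layout, which this document does not specify.

```bash
# Hypothetical helpers; the JSON field paths below are assumptions.

# Map a status string to the action listed under "Handle Status".
deapi_next_action() {
  case "$1" in
    "processing") echo "wait" ;;
    "done")       echo "fetch" ;;
    "failed")     echo "fail" ;;
    *)            echo "unknown" ;;
  esac
}

# Poll request-status every 10 seconds until the job finishes or max_polls is hit.
# Usage: deapi_wait_for "$REQUEST_ID" [max_polls]
deapi_wait_for() {
  local request_id="$1" max_polls="${2:-60}" i=0 body status
  while [ "$i" -lt "$max_polls" ]; do
    body=$(curl -s "https://api.deapi.ai/api/v1/client/request-status/${request_id}" \
      -H "Authorization: Bearer $DEAPI_API_KEY") || return 1
    # Assumed layout: fields sit under .data or at the top level
    status=$(printf '%s' "$body" | jq -r '.data.status // .status // empty')
    case "$(deapi_next_action "$status")" in
      fetch) printf '%s' "$body" | jq -r '.data.result_url // .result_url'; return 0 ;;
      fail)  echo "Request $request_id failed" >&2; return 1 ;;
      wait)  sleep 10 ;;
      *)     echo "Unexpected status: $status" >&2; return 1 ;;
    esac
    i=$((i + 1))
  done
  echo "Timed out waiting for $request_id" >&2
  return 1
}
```

On success the function prints the result URL on stdout, so a caller could write `RESULT_URL=$(deapi_wait_for "$REQUEST_ID")` and download from there.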
### Common Error Handling
| Error | Action |
|-------|--------|
| 401 Unauthorized | Check DEAPI_API_KEY |
| 429 Rate Limited | Wait 60s and retry |
| 500 Server Error | Wait 30s and retry once |
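
The table above can be expressed as a small retry wrapper. This is a sketch under stated assumptions: `deapi_retry_policy` and `deapi_post` are hypothetical names, the limit of three retries for 429 is an assumption (the table gives only the wait time), and a single attempt counter is shared across error types for simplicity.

```bash
# Hypothetical helpers. Retry counts are assumptions; the table only gives waits.

# Print "<delay_seconds> <max_retries>" for an HTTP status; "0 0" means no retry.
deapi_retry_policy() {
  case "$1" in
    429) echo "60 3" ;;   # rate limited: wait 60s (3 retries is an assumption)
    500) echo "30 1" ;;   # server error: wait 30s, retry once
    *)   echo "0 0" ;;    # 401 and anything else: fix the cause, do not retry
  esac
}

# POST a JSON payload and retry according to deapi_retry_policy.
# Usage: deapi_post <endpoint> <json>
deapi_post() {
  local endpoint="$1" json="$2" attempt=0 out code policy delay max
  while :; do
    out=$(curl -s -w '\n%{http_code}' -X POST \
      "https://api.deapi.ai/api/v1/client/${endpoint}" \
      -H "Authorization: Bearer $DEAPI_API_KEY" \
      -H "Content-Type: application/json" \
      -d "$json") || return 1
    code=$(printf '%s\n' "$out" | tail -n 1)   # last line is the HTTP status
    case "$code" in
      2*) printf '%s\n' "$out" | sed '$d'; return 0 ;;   # body without status line
    esac
    policy=$(deapi_retry_policy "$code")
    delay=${policy% *}
    max=${policy#* }
    if [ "$attempt" -ge "$max" ]; then
      echo "HTTP $code: giving up (on 401, check DEAPI_API_KEY)" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}
```

Keeping the policy in its own function means the waits can be adjusted in one place if the API's limits turn out to differ.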
---
## Model Selection Guide
**Image generation (txt2img):**
- Quick drafts / iterations → Klein (fastest)
- Photorealistic / detailed scenes → Flux1schnell (steps=8)
- Speed critical → ZImageTurbo
**Image transformation (img2img):**
- Logo/brand placement on objects → Qwen (preserves source better)
- Style transfer / artistic → Klein (faster, creative freedom)
- Combining multiple images → Klein (supports up to 3 images)
**Video generation:**
- Best quality → LTX-2 19B (no steps/guidance needed)
- Image animation → LTXv 13B (supports first_frame_image)
**TTS:**
- Quick narration → custom_voice + Kokoro
- Clone specific voice → voice_clone + reference audio
- Create new voice from description → voice_design
**Music:**
- Fast iteration → ACE-Step-v1.5-turbo (8 steps)
- Production quality → ACE-Step-v1.5 (32+ steps)
**Tip:** Model slugs change. When in doubt, call `GET /api/v1/client/models` to get the current list.
---
## Discover Available Models
Models change over time. Query the live list:
```bash
curl -s "https://api.deapi.ai/api/v1/client/models" \
  -H "Authorization: Bearer $DEAPI_API_KEY" \
  -H "Accept: application/json"
```
Filter by task type:
```bash
# Only txt2img models
# -g (--globoff) stops curl from treating the [ ] in the URL as a glob range
curl -s -g "https://api.deapi.ai/api/v1/client/models?filter[inference_types]=txt2img" \
  -H "Authorization: Bearer $DEAPI_API_KEY"
```
Each model entry includes: `slug` (use in requests), `inference_types`, `info.limits`, `info.defaults`, `languages` (TTS), `loras` (image).
---
## Transcription (URL — YouTube, Audio, Video)
**Use when:** user wants to transcribe video from YouTube, X, Twitch, Kick or audio URLs.
**Endpoints:**
- Video (YouTube, mp4, webm): `vid2txt`
- Audio (mp3, wav, m4a, flac, ogg): `aud2txt`
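
When the media type is not known in advance, the endpoint can be picked from the URL's file extension. A minimal sketch, assuming a hypothetical helper named `deapi_transcribe_endpoint`; treating every URL without a listed audio extension (for example a YouTube watch link) as video is also an assumption.

```bash
# Hypothetical helper: choose vid2txt or aud2txt from the URL's file extension.
# The audio extensions come from the endpoint list above; the vid2txt fallback
# for everything else is an assumption.
deapi_transcribe_endpoint() {
  local url="$1" path ext
  path="${url%%[?#]*}"   # drop query string and fragment
  ext="${path##*.}"      # text after the last dot
  ext=$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    mp3|wav|m4a|flac|ogg) echo "aud2txt" ;;
    *)                    echo "vid2txt" ;;
  esac
}
```

Note that the payload key differs as well: `vid2txt` takes `video_url`, while `aud2txt` takes `audio_url`.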
**Request (video):**
```bash
JSON=$(jq -n --arg url "$VIDEO_URL" '{
  video_url: $url,
  include_ts: true,
  model: "WhisperLargeV3"
}')
curl -s -X POST "https://api.deapi.ai/api/v1/client/vid2txt" \
  -H "Authorization: Bearer $DEAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d "$JSON"
```
**Request (audio):**
```bash
JSON=$(jq -n --arg url "$AUDIO_URL" '{
  audio_url: $url,
  include_ts: true,
  model: "WhisperLargeV3"
}')
curl -s -X POST "https://api.deapi.ai/api/v1/client/aud2txt" \
  -H "Authorization: Bearer $DEAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d "$JSON"
```
**After polling:** Present the transcription with its timestamps in a readable format.
---
## Transcription (File Upload)
**Use when:** user has a local audio/video file to transcribe (not a URL).
**Endpoints:**
- Video file: `videofile2txt` (multipart/form-data)
- Audio file: `audiofile2txt` (multipart/form-data)
**Request (audio file):**
```bash
[[ -f "$AUDIO_PATH" ]] || { echo "File not found"; exit 1; }
curl -s -X POST "https://api.deapi.ai/api/v1/client/audiofile2txt" \
  -H "Authorization: Bearer $DEAPI_API_KEY" \
  -F "audio=@$AUDIO_PATH" \
  -F "include_ts=true" \
  -F "model=WhisperLargeV3"
```
**Request (video file):**
```bash
[[ -f "$VIDEO_PATH" ]] || { echo "File not found"; exit 1; }
curl -s -X POST "https://api.deapi.ai/api/v1/client/videofile2txt" \
  -H "Authorization: Bearer $DEAPI_API_KEY" \
  -F "video=@$VIDEO_PATH" \
  -F "include_ts=true" \
  -F "model=WhisperLargeV3"
```
---
## Image Generation (Flux)
**Use when:** user wants to generate images from text descriptions.
**Endpoint:** `txt2img`
**Models:**
| Model | API Name | Steps | Max Size | Notes |
|-------|----------|-------|----------|-------|
| Klein (default) | `Flux_2_Klein_4B_BF16` | 4 (fixed) | 1536px | Fastest, recommended |
| Flux | `Flux1schnell` | 4-10 | 2048px | Higher resolution |
| Turbo | `ZImageTurbo_INT8` | 4-10 | 1024px | Fastest inference |
**Request:**
```bash
JSON=$(jq -n --arg prompt "$PROMPT" --argjson seed "$RANDOM" '{
  prompt: $prompt,
  model: "Flux_2_Klein_4B_BF16",
  width: 1024,
  height: 1024,
  steps: 4,
  seed: ($seed % 1000000)
}')
curl -s -X POST "https://api.deapi.ai/api/v1/client/txt2img" \
  -H "Authorization: Bearer $DEAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d "$JSON"
```
**Note:** The Klein model does not support the `guidance` parameter; omit it.
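
The limits in the model table can be checked locally before submitting a request. The sketch below uses a hypothetical helper, `deapi_check_txt2img`, and assumes that "Max Size" bounds width and height independently; the live limits are available from `info.limits` in the models endpoint and may differ.

```bash
# Hypothetical pre-flight check based on the model table above.
# Assumes "Max Size" applies to width and height separately.
# Usage: deapi_check_txt2img <model> <width> <height> <steps>
deapi_check_txt2img() {
  local model="$1" width="$2" height="$3" steps="$4" v min_steps max_steps max_px
  # Reject anything that is not a plain non-negative integer
  for v in "$width" "$height" "$steps"; do
    case "$v" in
      ''|*[!0-9]*) echo "width, height and steps must be whole numbers" >&2; return 2 ;;
    esac
  done
  case "$model" in
    Flux_2_Klein_4B_BF16) min_steps=4; max_steps=4;  max_px=1536 ;;
    Flux1schnell)         min_steps=4; max_steps=10; max_px=2048 ;;
    ZImageTurbo_INT8)     min_steps=4; max_steps=10; max_px=1024 ;;
    *) echo "unknown model: $model" >&2; return 2 ;;
  esac
  if [ "$steps" -lt "$min_steps" ] || [ "$steps" -gt "$max_steps" ]; then
    echo "steps must be $min_steps-$max_steps for $model" >&2; return 1
  fi
  if [ "$width" -gt "$max_px" ] || [ "$height" -gt "$max_px" ]; then
    echo "max size for $model is ${max_px}px" >&2; return 1
  fi
  return 0
}
```

A caller might run `deapi_check_txt2img "$MODEL" "$WIDTH" "$HEIGHT" "$STEPS" || exit 1` before building the JSON payload.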
---
## Text-to-Speech (54+ Voices)
**Use when:** user wants to convert text to speech.
**Endpoint:** `txt2audio`
**Popular Voices:**
| Voice ID | Language | Description |
|----------|----------|-------------|
| `af_bella` | American EN | Warm, friendly (best quality) |
| `af_heart` | American EN | Expressive, emotional |
| `am_adam` | American EN | Deep, authoritative |
| `bf_emma` | British EN | Elegant (best British) |
| `jf_alpha` | Japanese | Natural Japanese female |
| `zf_xiaobei` | Chinese | Mandarin female |
| `ef_dora` | Spanish | Spanish female |
| `ff_siwis` | French | French female (best quality) |
Voice format: `{lang}{gender}_{name}` (e.g., `af_bella` = American Female Bella)
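
The naming scheme can be decoded mechanically. A small sketch, assuming a hypothetical helper named `deapi_describe_voice`; the language letters are taken from the table above, and reading `m` as male is an inference from `am_adam`, not something the document states.

```bash
# Hypothetical helper: split a voice ID of the form {lang}{gender}_{name}.
# Language letters come from the voice table; "m" = male is an assumption.
deapi_describe_voice() {
  local id="$1" prefix name lang gender
  case "$id" in
    [a-z][a-z]_?*) ;;
    *) echo "bad voice id: $id" >&2; return 1 ;;
  esac
  prefix=${id%%_*}   # e.g. "af"
  name=${id#*_}      # e.g. "bella"
  case "$prefix" in
    a?) lang="American EN" ;;
    b?) lang="British EN" ;;
    j?) lang="Japanese" ;;
    z?) lang="Chinese" ;;
    e?) lang="Spanish" ;;
    f?) lang="French" ;;
    *)  lang="unknown" ;;
  esac
  case "$prefix" in
    ?f) gender="female" ;;
    ?m) gender="male" ;;
    *)  gender="unknown" ;;
  esac
  echo "$lang $gender $name"
}
```

Languages not shown in the table fall through to `unknown`, so the helper stays usable if the voice list grows.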
### TTS Mode 1: Custom Voice (default)
Use a predefined voice from the list above.
```bash
JSON=$(jq -n --arg text "$TEXT" '{
text: $text,
voice: "af_bella",
mod