deAPI AI Media Suite (Community)

Included with Lifetime

$97 forever

The cheapest AI media API on the market. Generate images (Flux), music (AceStep), speech with voice cloning, transcribe video/audio, OCR, video generation, background removal, upscale, style transfer, and prompt enhancement — all through one unified API. Free $5 credit on signup.

Image & Videomediatranscriptionimage-generationttsvoice-cloningmusic-generationocrvideo

What this skill does


# deAPI Media Generation

AI-powered media tools via decentralized GPU network. Get your API key at [deapi.ai](https://deapi.ai) (free $5 credit on signup).

## Setup

```bash
export DEAPI_API_KEY=your_api_key_here
```

## Available Functions

| Function | Use when user wants to... |
|----------|---------------------------|
| Transcribe (URL) | Transcribe YouTube, Twitch, Kick, X videos, or audio URLs |
| Transcribe (File) | Transcribe uploaded local audio/video file |
| Generate Image | Generate images from text descriptions (Flux models) |
| Generate Audio | Convert text to speech (TTS, 54+ voices, 8 languages) |
| Clone Voice | Clone a voice from short audio sample (3-10s) |
| Design Voice | Create new voice from text description |
| Generate Music | Generate music tracks, jingles, songs with vocals (AceStep) |
| Generate Video | Create video from text or animate images |
| Boost Prompt | Improve prompt quality before generation |
| OCR | Extract text from images |
| Remove Background | Remove background from images |
| Upscale | Upscale image resolution (2x/4x) |
| Transform Image | Apply style transfer to images (multi-image support) |
| Embeddings | Generate text embeddings for semantic search |
| Check Balance | Check account balance |
| Discover Models | List available models dynamically |

---

## Agent Safety: Input Sanitization

All curl examples use placeholders. Before substituting user input into shell commands:

1. **JSON payloads** — build JSON safely with `jq`, never inline raw strings:
   ```bash
   # ❌ UNSAFE — shell injection risk
   curl -d '{"prompt": "{USER_INPUT}"}'

   # ✅ SAFE — jq handles all escaping
   JSON=$(jq -n --arg p "$USER_INPUT" '{"prompt": $p}')
   curl -d "$JSON"
   ```

2. **URLs** — validate format before use:
   ```bash
   if [[ ! "$URL" =~ ^https?:// ]]; then
     echo "Invalid URL"; exit 1
   fi
   ```

3. **File paths** — verify file exists, use `@` prefix only with validated local paths:
   ```bash
   [[ -f "$FILE_PATH" ]] && curl -F "image=@$FILE_PATH"
   ```

4. **Never** pass raw user input directly into shell strings without escaping.

---

## Async Pattern (Important!)

**All deAPI requests are asynchronous.** Follow this pattern for every operation:

### 1. Submit Request
```bash
curl -s -X POST "https://api.deapi.ai/api/v1/client/{endpoint}" \
  -H "Authorization: Bearer $DEAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d "$JSON"
```

Response contains `request_id`.

### 2. Poll Status (loop every 10 seconds)
```bash
curl -s "https://api.deapi.ai/api/v1/client/request-status/{request_id}" \
  -H "Authorization: Bearer $DEAPI_API_KEY"
```

### 3. Handle Status
- `processing` → wait 10s, poll again
- `done` → fetch result from `result_url`
- `failed` → report error to user

### Common Error Handling
| Error | Action |
|-------|--------|
| 401 Unauthorized | Check DEAPI_API_KEY |
| 429 Rate Limited | Wait 60s and retry |
| 500 Server Error | Wait 30s and retry once |

---

## Model Selection Guide

**Image generation (txt2img):**
- Quick drafts / iterations → Klein (fastest)
- Photorealistic / detailed scenes → Flux1schnell (steps=8)
- Speed critical → ZImageTurbo

**Image transformation (img2img):**
- Logo/brand placement on objects → Qwen (preserves source better)
- Style transfer / artistic → Klein (faster, creative freedom)
- Combining multiple images → Klein (supports up to 3 images)

**Video generation:**
- Best quality → LTX-2 19B (no steps/guidance needed)
- Image animation → LTXv 13B (supports first_frame_image)

**TTS:**
- Quick narration → custom_voice + Kokoro
- Clone specific voice → voice_clone + reference audio
- Create new voice from description → voice_design

**Music:**
- Fast iteration → ACE-Step-v1.5-turbo (8 steps)
- Production quality → ACE-Step-v1.5 (32+ steps)

**Tip:** Model slugs change. When in doubt, call `GET /api/v1/client/models` to get the current list.

---

## Discover Available Models

Models change over time. Query the live list:

```bash
curl -s "https://api.deapi.ai/api/v1/client/models" \
  -H "Authorization: Bearer $DEAPI_API_KEY" \
  -H "Accept: application/json"
```

Filter by task type:
```bash
# Only txt2img models
curl -s "https://api.deapi.ai/api/v1/client/models?filter[inference_types]=txt2img" \
  -H "Authorization: Bearer $DEAPI_API_KEY"
```

Each model returns: `slug` (use in requests), `inference_types`, `info.limits`, `info.defaults`, `languages` (TTS), `loras` (image).

---

## Transcription (URL — YouTube, Audio, Video)

**Use when:** user wants to transcribe video from YouTube, X, Twitch, Kick or audio URLs.

**Endpoints:**
- Video (YouTube, mp4, webm): `vid2txt`
- Audio (mp3, wav, m4a, flac, ogg): `aud2txt`

**Request (video):**
```bash
JSON=$(jq -n --arg url "$VIDEO_URL" '{
  video_url: $url,
  include_ts: true,
  model: "WhisperLargeV3"
}')
curl -s -X POST "https://api.deapi.ai/api/v1/client/vid2txt" \
  -H "Authorization: Bearer $DEAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d "$JSON"
```

**Request (audio):**
```bash
JSON=$(jq -n --arg url "$AUDIO_URL" '{
  audio_url: $url,
  include_ts: true,
  model: "WhisperLargeV3"
}')
curl -s -X POST "https://api.deapi.ai/api/v1/client/aud2txt" \
  -H "Authorization: Bearer $DEAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d "$JSON"
```

**After polling:** Present transcription with timestamps in readable format.

---

## Transcription (File Upload)

**Use when:** user has a local audio/video file to transcribe (not a URL).

**Endpoints:**
- Video file: `videofile2txt` (multipart/form-data)
- Audio file: `audiofile2txt` (multipart/form-data)

**Request (audio file):**
```bash
[[ -f "$AUDIO_PATH" ]] || { echo "File not found"; exit 1; }
curl -s -X POST "https://api.deapi.ai/api/v1/client/audiofile2txt" \
  -H "Authorization: Bearer $DEAPI_API_KEY" \
  -F "audio=@$AUDIO_PATH" \
  -F "include_ts=true" \
  -F "model=WhisperLargeV3"
```

**Request (video file):**
```bash
[[ -f "$VIDEO_PATH" ]] || { echo "File not found"; exit 1; }
curl -s -X POST "https://api.deapi.ai/api/v1/client/videofile2txt" \
  -H "Authorization: Bearer $DEAPI_API_KEY" \
  -F "video=@$VIDEO_PATH" \
  -F "include_ts=true" \
  -F "model=WhisperLargeV3"
```

---

## Image Generation (Flux)

**Use when:** user wants to generate images from text descriptions.

**Endpoint:** `txt2img`

**Models:**
| Model | API Name | Steps | Max Size | Notes |
|-------|----------|-------|----------|-------|
| Klein (default) | `Flux_2_Klein_4B_BF16` | 4 (fixed) | 1536px | Fastest, recommended |
| Flux | `Flux1schnell` | 4-10 | 2048px | Higher resolution |
| Turbo | `ZImageTurbo_INT8` | 4-10 | 1024px | Fastest inference |

**Request:**
```bash
JSON=$(jq -n --arg prompt "$PROMPT" --argjson seed "$RANDOM" '{
  prompt: $prompt,
  model: "Flux_2_Klein_4B_BF16",
  width: 1024,
  height: 1024,
  steps: 4,
  seed: ($seed % 1000000)
}')
curl -s -X POST "https://api.deapi.ai/api/v1/client/txt2img" \
  -H "Authorization: Bearer $DEAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d "$JSON"
```

**Note:** Klein model does NOT support `guidance` parameter — omit it.

---

## Text-to-Speech (54+ Voices)

**Use when:** user wants to convert text to speech.

**Endpoint:** `txt2audio`

**Popular Voices:**
| Voice ID | Language | Description |
|----------|----------|-------------|
| `af_bella` | American EN | Warm, friendly (best quality) |
| `af_heart` | American EN | Expressive, emotional |
| `am_adam` | American EN | Deep, authoritative |
| `bf_emma` | British EN | Elegant (best British) |
| `jf_alpha` | Japanese | Natural Japanese female |
| `zf_xiaobei` | Chinese | Mandarin female |
| `ef_dora` | Spanish | Spanish female |
| `ff_siwis` | French | French female (best quality) |

Voice format: `{lang}{gender}_{name}` (e.g., `af_bella` = American Female Bella)

### TTS Mode 1: Custom Voice (default)

Use a predefined voice from the list above.

```bash
JSON=$(jq -n --arg text "$TEXT" '{
  text: $text,
  voice: "af_bella",
  mod

Files: 3

Size: 23.9 KB

Complexity: 36/100

Category: Image & Video

Source: https://github.com/leoyeai/openclaw-master-skills/tree/main/skills/deapi-community

Related in Image & Video

watch

Included

Watch a video (URL or local path). Downloads with yt-dlp, extracts auto-scaled frames with ffmpeg, pulls the transcript from captions (or Whisper API fallback), and hands the result to Claude so it can answer questions about what's in the video.

Image & Videoscriptsfeatured

physical-ai-defect-image-generation

Included

Use when the user wants to orchestrate defect image generation, run associated setup, or handle outputs on OSMO. The Day 0 path handles cold-start with USD-to-ROI, image-edit augmentation, and AnomalyGen to create initial PCBA datasets. The Day 1 path performs inference and labeling on real images. This skill helps with first-time asset setup, creation of finetuning checkpoints, and configuring deployment. Trigger keywords: defect image generation, dig workflow, dig pipeline, defect image detection workflow, aoi pipeline, aoi anomalygen, usd2roi anomalygen, day 0 pcba, day 1 pcba, day 1 real-photo alignment, day 1 manual roi, metal surface anomaly, glass defect, anomalygen finetune, setup_pcb, setup_metal, setup_glass, setup_pretrained, dig setup, dig datasets, dig pretrained checkpoint, dig image-edit endpoint.

Image & Videoscripts

accelint-react-best-practices

Included

React performance optimization and best practices. ALWAYS use this skill when working with any React code - writing components, hooks, JSX; refactoring; optimizing re-renders, memoization, state management; reviewing for performance; fixing hydration mismatches; debugging infinite re-renders, stale closures, input focus loss, animations restarting; preventing remounting; implementing transitions, lazy initialization, effect dependencies. Even simple React tasks benefit from these patterns. Covers React 19+ (useEffectEvent, Activity, ref props). Triggers - useEffect, useState, useMemo, useCallback, memo, inline components, nested components, components inside components, re-render, performance, hydration, SSR, Next.js, useDeferredValue, combined hooks.

Image & Videoscripts

elevenlabs-agents

Included

Build conversational AI voice agents with ElevenLabs Platform using React, JavaScript, React Native, or Swift SDKs. Configure agents, tools (client/server/MCP), RAG knowledge bases, multi-voice, and Scribe real-time STT. Use when: building voice chat interfaces, implementing AI phone agents with Twilio, configuring agent workflows or tools, adding RAG knowledge bases, testing with CLI "agents as code", or troubleshooting deprecated @11labs packages, Android audio cutoff, CSP violations, dynamic variables, or WebRTC config. Keywords: ElevenLabs Agents, ElevenLabs voice agents, AI voice agents, conversational AI, @elevenlabs/react, @elevenlabs/client, @elevenlabs/react-native, @elevenlabs/elevenlabs-js, @elevenlabs/agents-cli, elevenlabs SDK, voice AI, TTS, text-to-speech, ASR, speech recognition, turn-taking model, WebRTC voice, WebSocket voice, ElevenLabs conversation, agent system prompt, agent tools, agent knowledge base, RAG voice agents, multi-voice agents, pronunciation dictionary, voice speed control, elevenlabs scribe, @11labs deprecated, Android audio cutoff, CSP violation elevenlabs, dynamic variables elevenlabs, case-sensitive tool names, webhook authentication

Image & Videoscripts

humanizer

Included

Humanize AI-generated text by detecting and removing patterns typical of LLM output. Rewrites text to sound natural, specific, and human. Uses 28 pattern detectors, 560+ AI vocabulary terms across 3 tiers, and statistical analysis (burstiness, type-token ratio, readability) for comprehensive detection. Use when asked to humanize text, de-AI writing, make content sound more natural/human, review writing for AI patterns, score text for AI detection, or improve AI-generated drafts. Covers content, language, style, communication, and filler categories.

Image & Videoscripts

generating-mermaid-diagrams

Included

Salesforce architecture diagrams using Mermaid with ASCII fallback. Use this skill when generating text-based diagrams for Salesforce architecture, OAuth flows, ERDs, integration sequences, or Agentforce structure. TRIGGER when: user says "diagram", "visualize", "ERD", or asks for sequence diagrams, flowcharts, class diagrams, or architecture visualizations in Mermaid. DO NOT TRIGGER when: user wants PNG/SVG image output (use generating-visual-diagrams), or asks about non-Salesforce systems.

Image & Videoscripts