
deAPI AI Media Suite (Community)


The cheapest AI media API on the market. One unified API covers image generation (Flux), music generation (AceStep), text-to-speech with voice cloning, video/audio transcription, OCR, video generation, background removal, upscaling, style transfer, and prompt enhancement. Free $5 credit on signup.

Category: Image & Video

Tags: media, transcription, image-generation, tts, voice-cloning, music-generation, ocr, video


# deAPI Media Generation

AI-powered media tools running on a decentralized GPU network. Get your API key at [deapi.ai](https://deapi.ai) (free $5 credit on signup).

## Setup

```bash
export DEAPI_API_KEY=your_api_key_here
```

## Available Functions

| Function | Use when user wants to... |
|----------|---------------------------|
| Transcribe (URL) | Transcribe YouTube, Twitch, Kick, X videos, or audio URLs |
| Transcribe (File) | Transcribe an uploaded local audio/video file |
| Generate Image | Generate images from text descriptions (Flux models) |
| Generate Audio | Convert text to speech (TTS, 54+ voices, 8 languages) |
| Clone Voice | Clone a voice from a short audio sample (3-10s) |
| Design Voice | Create a new voice from a text description |
| Generate Music | Generate music tracks, jingles, songs with vocals (AceStep) |
| Generate Video | Create video from text or animate images |
| Boost Prompt | Improve prompt quality before generation |
| OCR | Extract text from images |
| Remove Background | Remove background from images |
| Upscale | Upscale image resolution (2x/4x) |
| Transform Image | Apply style transfer to images (multi-image support) |
| Embeddings | Generate text embeddings for semantic search |
| Check Balance | Check account balance |
| Discover Models | List available models dynamically |

---

## Agent Safety: Input Sanitization

All curl examples use placeholders. Before substituting user input into shell commands:

1. **JSON payloads** — build JSON safely with `jq`, never inline raw strings:
   ```bash
   # ❌ UNSAFE — shell injection risk
   curl -d '{"prompt": "{USER_INPUT}"}'

   # ✅ SAFE — jq handles all escaping
   JSON=$(jq -n --arg p "$USER_INPUT" '{"prompt": $p}')
   curl -d "$JSON"
   ```

2. **URLs** — validate format before use:
   ```bash
   if [[ ! "$URL" =~ ^https?:// ]]; then
     echo "Invalid URL"; exit 1
   fi
   ```

3. **File paths** — verify file exists, use `@` prefix only with validated local paths:
   ```bash
   [[ -f "$FILE_PATH" ]] && curl -F "image=@$FILE_PATH"
   ```

4. **Never** pass raw user input directly into shell strings without escaping.

---

## Async Pattern (Important!)

**All deAPI requests are asynchronous.** Follow this pattern for every operation:

### 1. Submit Request
```bash
curl -s -X POST "https://api.deapi.ai/api/v1/client/{endpoint}" \
  -H "Authorization: Bearer $DEAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d "$JSON"
```

Response contains `request_id`.
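
The `request_id` can be pulled out of the submit response with `jq`. The following is a minimal sketch, not the documented response schema: it assumes the ID sits either at the top level or under a `data` object, and `extract_request_id` is a hypothetical helper name.

```bash
# Hypothetical helper: read a submit response on stdin, print its request_id.
# ASSUMPTION: the ID is at .request_id or .data.request_id; adjust the jq
# paths if the live response nests it elsewhere.
# Exits non-zero (via jq -e) when no ID is found.
extract_request_id() {
  jq -er '.request_id // .data.request_id // empty'
}

# Usage (not executed here):
#   REQUEST_ID=$(curl -s -X POST "https://api.deapi.ai/api/v1/client/txt2img" \
#     -H "Authorization: Bearer $DEAPI_API_KEY" \
#     -H "Content-Type: application/json" \
#     -d "$JSON" | extract_request_id) || echo "no request_id in response" >&2
```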

### 2. Poll Status (loop every 10 seconds)
```bash
curl -s "https://api.deapi.ai/api/v1/client/request-status/{request_id}" \
  -H "Authorization: Bearer $DEAPI_API_KEY"
```

### 3. Handle Status
- `processing` → wait 10s, poll again
- `done` → fetch result from `result_url`
- `failed` → report error to user
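
The three steps can be tied together in a small polling loop. This is a sketch under stated assumptions rather than an official client: it assumes the status response carries `status` and `result_url` either at the top level or under `data`, and the names `fetch_status` and `poll_request`, the attempt cap, and the exit codes are all hypothetical.

```bash
# fetch_status REQUEST_ID -> prints the raw JSON status response.
fetch_status() {
  curl -s "https://api.deapi.ai/api/v1/client/request-status/$1" \
    -H "Authorization: Bearer $DEAPI_API_KEY"
}

# poll_request REQUEST_ID [INTERVAL_SECONDS] [MAX_ATTEMPTS]
# Prints result_url when the request is done.
# Returns 1 if the request failed, 2 if MAX_ATTEMPTS polls were used up.
# ASSUMPTION: fields live at .status/.result_url or .data.status/.data.result_url.
poll_request() {
  local id="$1" interval="${2:-10}" max="${3:-60}" body status i
  for ((i = 0; i < max; i++)); do
    body=$(fetch_status "$id")
    status=$(jq -r '.status // .data.status // empty' <<<"$body")
    case "$status" in
      done)
        jq -r '.result_url // .data.result_url // empty' <<<"$body"
        return 0 ;;
      failed)
        echo "request $id failed" >&2
        return 1 ;;
      *)
        # "processing" (or an unreadable response): wait, then poll again
        sleep "$interval" ;;
    esac
  done
  echo "request $id still not finished after $max polls" >&2
  return 2
}
```

Called as `RESULT_URL=$(poll_request "$REQUEST_ID")`, the loop uses the 10-second interval from step 2 by default; the cap on attempts keeps an agent from polling forever if a job never leaves `processing`.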

### Common Error Handling
| Error | Action |
|-------|--------|
| 401 Unauthorized | Check DEAPI_API_KEY |
| 429 Rate Limited | Wait 60s and retry |
| 500 Server Error | Wait 30s and retry once |
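
The table above can be expressed as a retry policy around the submit call. This is one possible sketch: `retry_delay`, `deapi_post`, and the cap on 429 retries are hypothetical (the table gives no upper bound), and a single attempt counter is shared across error types for simplicity.

```bash
# ASSUMPTION: cap 429 retries so an agent cannot loop forever.
MAX_429_RETRIES=3

# retry_delay HTTP_CODE ATTEMPT -> prints seconds to wait before retrying.
# Returns 1 when the request should not be retried.
retry_delay() {
  local code="$1" attempt="${2:-1}"
  case "$code" in
    429) (( attempt <= MAX_429_RETRIES )) || return 1; echo 60 ;;
    500) (( attempt <= 1 )) || return 1; echo 30 ;;   # retry once only
    *)   return 1 ;;   # 401 and anything else: fix the request, do not retry
  esac
}

# deapi_post ENDPOINT JSON -> prints the response body on a 2xx status.
deapi_post() {
  local endpoint="$1" json="$2" attempt=1 code delay body
  body=$(mktemp)
  while :; do
    # -o writes the body to a temp file; -w prints only the HTTP status code
    code=$(curl -s -o "$body" -w '%{http_code}' -X POST \
      "https://api.deapi.ai/api/v1/client/$endpoint" \
      -H "Authorization: Bearer $DEAPI_API_KEY" \
      -H "Content-Type: application/json" \
      -d "$json")
    if [[ "$code" == 2?? ]]; then cat "$body"; rm -f "$body"; return 0; fi
    if ! delay=$(retry_delay "$code" "$attempt"); then
      echo "HTTP $code: giving up" >&2; rm -f "$body"; return 1
    fi
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}
```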

---

## Model Selection Guide

**Image generation (txt2img):**
- Quick drafts / iterations → Klein (fastest)
- Photorealistic / detailed scenes → Flux1schnell (steps=8)
- Speed critical → ZImageTurbo

**Image transformation (img2img):**
- Logo/brand placement on objects → Qwen (preserves source better)
- Style transfer / artistic → Klein (faster, creative freedom)
- Combining multiple images → Klein (supports up to 3 images)

**Video generation:**
- Best quality → LTX-2 19B (no steps/guidance needed)
- Image animation → LTXv 13B (supports first_frame_image)

**TTS:**
- Quick narration → custom_voice + Kokoro
- Clone specific voice → voice_clone + reference audio
- Create new voice from description → voice_design

**Music:**
- Fast iteration → ACE-Step-v1.5-turbo (8 steps)
- Production quality → ACE-Step-v1.5 (32+ steps)

**Tip:** Model slugs change. When in doubt, call `GET /api/v1/client/models` to get the current list.

---

## Discover Available Models

Models change over time. Query the live list:

```bash
curl -s "https://api.deapi.ai/api/v1/client/models" \
  -H "Authorization: Bearer $DEAPI_API_KEY" \
  -H "Accept: application/json"
```

Filter by task type:
```bash
# Only txt2img models
curl -s "https://api.deapi.ai/api/v1/client/models?filter[inference_types]=txt2img" \
  -H "Authorization: Bearer $DEAPI_API_KEY"
```

Each model entry includes: `slug` (use in requests), `inference_types`, `info.limits`, `info.defaults`, `languages` (TTS), `loras` (image).
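
To turn the model list into something an agent can pick from, the slugs can be extracted with `jq`. This sketch assumes the response is either a bare JSON array of models or an object wrapping that array under `data`; the envelope is not specified above, and `list_slugs` is a hypothetical helper name.

```bash
# Hypothetical helper: read a /models response on stdin, print one slug per line.
# ASSUMPTION: models are a top-level array or an array under "data".
list_slugs() {
  jq -r '(.data? // .)[] | .slug'
}

# Usage (not executed here):
#   curl -s "https://api.deapi.ai/api/v1/client/models" \
#     -H "Authorization: Bearer $DEAPI_API_KEY" \
#     -H "Accept: application/json" | list_slugs
```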

---

## Transcription (URL — YouTube, Audio, Video)

**Use when:** user wants to transcribe a video from YouTube, X, Twitch, or Kick, or audio from a direct URL.

**Endpoints:**
- Video (YouTube, mp4, webm): `vid2txt`
- Audio (mp3, wav, m4a, flac, ogg): `aud2txt`
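
Choosing between the two endpoints can be automated from the URL's file extension. A minimal sketch: `transcribe_endpoint` is a hypothetical helper, and treating every URL without a listed audio extension as video is an assumption.

```bash
# Hypothetical helper: print the endpoint name for a media URL.
# URLs ending in a listed audio extension go to aud2txt.
# ASSUMPTION: everything else (YouTube links, mp4, webm, ...) is video.
transcribe_endpoint() {
  local url="$1" path ext
  path="${url%%[?#]*}"    # drop query string and fragment
  # text after the last dot, lowercased
  ext=$(printf '%s' "${path##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    mp3|wav|m4a|flac|ogg) echo "aud2txt" ;;
    *)                    echo "vid2txt" ;;
  esac
}
```

Note that the JSON key differs by endpoint as well: `vid2txt` takes `video_url`, while `aud2txt` takes `audio_url`.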

**Request (video):**
```bash
JSON=$(jq -n --arg url "$VIDEO_URL" '{
  video_url: $url,
  include_ts: true,
  model: "WhisperLargeV3"
}')
curl -s -X POST "https://api.deapi.ai/api/v1/client/vid2txt" \
  -H "Authorization: Bearer $DEAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d "$JSON"
```

**Request (audio):**
```bash
JSON=$(jq -n --arg url "$AUDIO_URL" '{
  audio_url: $url,
  include_ts: true,
  model: "WhisperLargeV3"
}')
curl -s -X POST "https://api.deapi.ai/api/v1/client/aud2txt" \
  -H "Authorization: Bearer $DEAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d "$JSON"
```

**After polling:** Present the transcription with timestamps in a readable format.

---

## Transcription (File Upload)

**Use when:** user has a local audio/video file to transcribe (not a URL).

**Endpoints:**
- Video file: `videofile2txt` (multipart/form-data)
- Audio file: `audiofile2txt` (multipart/form-data)

**Request (audio file):**
```bash
[[ -f "$AUDIO_PATH" ]] || { echo "File not found"; exit 1; }
curl -s -X POST "https://api.deapi.ai/api/v1/client/audiofile2txt" \
  -H "Authorization: Bearer $DEAPI_API_KEY" \
  -F "audio=@$AUDIO_PATH" \
  -F "include_ts=true" \
  -F "model=WhisperLargeV3"
```

**Request (video file):**
```bash
[[ -f "$VIDEO_PATH" ]] || { echo "File not found"; exit 1; }
curl -s -X POST "https://api.deapi.ai/api/v1/client/videofile2txt" \
  -H "Authorization: Bearer $DEAPI_API_KEY" \
  -F "video=@$VIDEO_PATH" \
  -F "include_ts=true" \
  -F "model=WhisperLargeV3"
```

---

## Image Generation (Flux)

**Use when:** user wants to generate images from text descriptions.

**Endpoint:** `txt2img`

**Models:**
| Model | API Name | Steps | Max Size | Notes |
|-------|----------|-------|----------|-------|
| Klein (default) | `Flux_2_Klein_4B_BF16` | 4 (fixed) | 1536px | Fastest, recommended |
| Flux | `Flux1schnell` | 4-10 | 2048px | Higher resolution |
| Turbo | `ZImageTurbo_INT8` | 4-10 | 1024px | Fastest inference |

**Request:**
```bash
JSON=$(jq -n --arg prompt "$PROMPT" --argjson seed "$RANDOM" '{
  prompt: $prompt,
  model: "Flux_2_Klein_4B_BF16",
  width: 1024,
  height: 1024,
  steps: 4,
  seed: ($seed % 1000000)
}')
curl -s -X POST "https://api.deapi.ai/api/v1/client/txt2img" \
  -H "Authorization: Bearer $DEAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d "$JSON"
```

**Note:** The Klein model does NOT support the `guidance` parameter; omit it.
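
Checking parameters against the model table before submitting avoids spending a request on values the model will reject. The sketch below hard-codes the limits from the table; `validate_txt2img` is a hypothetical helper, and it assumes "Max Size" applies to both width and height. Since model limits change, the live values from the models endpoint take precedence.

```bash
# Hypothetical helper: check txt2img parameters against the model table.
# Returns 0 when valid; prints a reason and returns 1 otherwise.
# ASSUMPTION: "Max Size" is a per-side limit for both width and height.
validate_txt2img() {
  local model="$1" width="$2" height="$3" steps="$4"
  local max_size min_steps max_steps n
  # Reject anything that is not a plain positive integer before doing arithmetic
  for n in "$width" "$height" "$steps"; do
    [[ "$n" =~ ^[1-9][0-9]*$ ]] || { echo "not a positive integer: $n"; return 1; }
  done
  case "$model" in
    Flux_2_Klein_4B_BF16) max_size=1536; min_steps=4; max_steps=4 ;;
    Flux1schnell)         max_size=2048; min_steps=4; max_steps=10 ;;
    ZImageTurbo_INT8)     max_size=1024; min_steps=4; max_steps=10 ;;
    *) echo "unknown model: $model"; return 1 ;;
  esac
  if (( width > max_size || height > max_size )); then
    echo "size exceeds ${max_size}px for $model"; return 1
  fi
  if (( steps < min_steps || steps > max_steps )); then
    echo "steps out of range ($min_steps to $max_steps) for $model"; return 1
  fi
}
```

For example, `validate_txt2img Flux_2_Klein_4B_BF16 1024 1024 4` succeeds, while the same call with `steps` set to 8 is rejected because Klein's step count is fixed.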

---

## Text-to-Speech (54+ Voices)

**Use when:** user wants to convert text to speech.

**Endpoint:** `txt2audio`

**Popular Voices:**
| Voice ID | Language | Description |
|----------|----------|-------------|
| `af_bella` | American EN | Warm, friendly (best quality) |
| `af_heart` | American EN | Expressive, emotional |
| `am_adam` | American EN | Deep, authoritative |
| `bf_emma` | British EN | Elegant (best British) |
| `jf_alpha` | Japanese | Natural Japanese female |
| `zf_xiaobei` | Chinese | Mandarin female |
| `ef_dora` | Spanish | Spanish female |
| `ff_siwis` | French | French female (best quality) |

Voice format: `{lang}{gender}_{name}` (e.g., `af_bella` = American Female Bella)
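
The naming scheme can be decoded mechanically, which also gives a cheap validity check before a voice ID is sent. In this sketch `describe_voice` is a hypothetical helper; the language letters are inferred from the table above, and reading `m` as male is an assumption based on `am_adam`.

```bash
# Hypothetical helper: expand a voice ID of the form {lang}{gender}_{name}.
# ASSUMPTION: language letters (a, b, j, z, e, f) are inferred from the
# Popular Voices table; any other letter is reported as "unknown".
describe_voice() {
  local id="$1" lang gender name
  [[ "$id" =~ ^([a-z])([fm])_([a-z]+)$ ]] || {
    echo "invalid voice id: $id" >&2
    return 1
  }
  case "${BASH_REMATCH[1]}" in
    a) lang="American EN" ;;
    b) lang="British EN" ;;
    j) lang="Japanese" ;;
    z) lang="Chinese" ;;
    e) lang="Spanish" ;;
    f) lang="French" ;;
    *) lang="unknown" ;;
  esac
  case "${BASH_REMATCH[2]}" in
    f) gender="female" ;;
    m) gender="male" ;;   # ASSUMPTION: "m" means male
  esac
  name="${BASH_REMATCH[3]}"
  echo "$lang, $gender, $name"
}
```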

### TTS Mode 1: Custom Voice (default)

Use a predefined voice from the list above.

```bash
JSON=$(jq -n --arg text "$TEXT" '{
  text: $text,
  voice: "af_bella",
  mod
