elevenlabs-remotion

Generate professional voiceovers using ElevenLabs AI. Use when the user needs to create voiceovers for videos, audio narration, or text-to-speech content. Supports multiple voices with character presets (narrator, salesperson, expert) for natural delivery. Includes single scene regeneration for fine-tuning.

# ElevenLabs Voiceover Generation

Generate professional AI voiceovers for Remotion videos using the ElevenLabs API.

## Prerequisites

- `ELEVENLABS_API_KEY` in `.env.local`
- `ffmpeg` and `ffprobe` installed (used for thumbnail embedding and timing validation)

## Quick Start

```bash
# Generate voiceover from text
node .claude/skills/elevenlabs-remotion-skill/generate.js --text "Your text here" --output public/audio/voiceover.mp3

# Generate with narrator style (more natural)
node .claude/skills/elevenlabs-remotion-skill/generate.js --text "Your text" --character narrator --output voiceover.mp3

# Generate scenes with request stitching
node .claude/skills/elevenlabs-remotion-skill/generate.js --scenes remotion/scenes.json --output-dir public/audio/project/

# Regenerate a single scene
node .claude/skills/elevenlabs-remotion-skill/generate.js --scenes scenes.json --scene scene2 --new-text "Updated text"

# List available voices and character presets
node .claude/skills/elevenlabs-remotion-skill/generate.js --list-voices
node .claude/skills/elevenlabs-remotion-skill/generate.js --list-characters
```

## Character Presets

Character presets shape the delivery so the voiceover sounds natural rather than like a literal reading of on-screen text:

| Character | Description | Best For |
|-----------|-------------|----------|
| `literal` | Reads text exactly as written | Screen text, quotes |
| `narrator` | Professional storyteller, smooth, engaging | Explainers, documentaries |
| `salesperson` | Enthusiastic, persuasive, energetic | Marketing, ads |
| `expert` | Authoritative, confident, knowledgeable | Legal content, tutorials |
| `conversational` | Casual, friendly, natural | Social media, casual content |
| `dramatic` | Intense, emotional, impactful | Hooks, problem statements |
| `calm` | Soothing, reassuring, gentle | Trust-building, conclusions |

```bash
# Use narrator style globally
node .claude/skills/elevenlabs-remotion-skill/generate.js --scenes scenes.json --character narrator --output-dir public/audio/
```

Or set the character per scene in `scenes.json`:

```json
{
  "scenes": [
    { "id": "scene1", "text": "Problem statement", "character": "dramatic" },
    { "id": "scene2", "text": "Solution", "character": "calm" }
  ]
}
```
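The per-scene override can be read as a fallback chain. The sketch below is one plausible reading, not the skill's actual logic: `resolveCharacter` is a hypothetical helper, and the precedence (scene value, then `--character`, then the file's top-level `character`, then `literal`) is an assumption.

```javascript
// Hypothetical helper: picks the character preset for one scene.
// Assumed precedence: scene.character > CLI --character > file-level character > "literal".
const KNOWN_CHARACTERS = [
  "literal", "narrator", "salesperson", "expert",
  "conversational", "dramatic", "calm",
];

function resolveCharacter(scene, fileConfig = {}, cliCharacter) {
  const choice =
    scene.character ?? cliCharacter ?? fileConfig.character ?? "literal";
  if (!KNOWN_CHARACTERS.includes(choice)) {
    throw new Error(`Unknown character preset: ${choice}`);
  }
  return choice;
}

const config = {
  character: "narrator",
  scenes: [
    { id: "scene1", text: "Problem statement", character: "dramatic" },
    { id: "scene2", text: "Solution" },
  ],
};

for (const scene of config.scenes) {
  console.log(scene.id, resolveCharacter(scene, config));
}
// scene1 dramatic
// scene2 narrator
```

Rejecting unknown names early keeps a typo such as `"narator"` from silently falling back to a different delivery style.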

## Scene-Based Generation with Request Stitching

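Generate multiple scenes with consistent prosody using ElevenLabs request stitching, which conditions each request on neighbouring requests so that tone and pacing carry over between clips: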

### scenes.json Format

```json
{
  "name": "product-demo",
  "voice": "George",
  "character": "narrator",
  "scenes": [
    {
      "id": "scene1",
      "text": "Generic text-to-speech sounds robotic. Your brand deserves better.",
      "duration": 4.5,
      "character": "dramatic"
    },
    {
      "id": "scene2",
      "text": "With voice cloning, you can use your own voice for unlimited content.",
      "duration": 5.5
    },
    {
      "id": "scene3",
      "text": "Record a short sample. Clone it. Create professional voiceovers in minutes.",
      "duration": 6,
      "delay": 0.3
    }
  ]
}
```
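Before spending API credits, it can help to sanity-check the file. The sketch below validates the shape shown above; `validateScenesConfig` is a hypothetical helper, and which fields `generate.js` actually requires is an assumption (here `id` and `text` are treated as required, `duration` and `delay` as optional non-negative numbers).

```javascript
// Hypothetical validator for the scenes.json shape shown above.
// Returns a list of human-readable problems; an empty list means "looks fine".
function validateScenesConfig(config) {
  if (!Array.isArray(config.scenes) || config.scenes.length === 0) {
    return ['"scenes" must be a non-empty array'];
  }
  const errors = [];
  const seen = new Set();
  config.scenes.forEach((scene, index) => {
    const label = scene.id || `scenes[${index}]`;
    if (typeof scene.id !== "string" || scene.id === "") {
      errors.push(`${label}: missing "id"`);
    } else if (seen.has(scene.id)) {
      errors.push(`${label}: duplicate "id"`);
    } else {
      seen.add(scene.id);
    }
    if (typeof scene.text !== "string" || scene.text.trim() === "") {
      errors.push(`${label}: missing "text"`);
    }
    for (const key of ["duration", "delay"]) {
      const value = scene[key];
      if (value !== undefined && !(typeof value === "number" && value >= 0)) {
        errors.push(`${label}: "${key}" must be a non-negative number`);
      }
    }
  });
  return errors;
}

const problems = validateScenesConfig({
  name: "product-demo",
  scenes: [
    { id: "scene1", text: "Your brand deserves better.", duration: 4.5 },
    { id: "scene1", text: "", delay: -1 },
  ],
});
console.log(problems);
```

Unique scene IDs matter because `--scene <id>` selects a scene by ID when regenerating.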

### Generate All Scenes

```bash
node .claude/skills/elevenlabs-remotion-skill/generate.js \
  --scenes remotion/product-demo-scenes.json \
  --output-dir public/audio/product-demo/
```

This creates:
- `product-demo-scene1.mp3` through `product-demo-sceneN.mp3` (one file per scene)
- `product-demo-combined.mp3` (all scenes stitched)
- `product-demo-info.json` (metadata with durations)
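
To confirm that a run produced everything, the expected names can be derived from the project name and scene IDs. This is a sketch only: both helpers are hypothetical, and the `<name>-<sceneId>.mp3` pattern is inferred from the file names listed above.

```javascript
// Hypothetical helpers mirroring the file names listed above.
function expectedOutputs(projectName, sceneIds) {
  return {
    scenes: sceneIds.map((id) => `${projectName}-${id}.mp3`),
    combined: `${projectName}-combined.mp3`,
    info: `${projectName}-info.json`,
  };
}

// Compares the expected names with a directory listing (plain file names).
function missingOutputs(projectName, sceneIds, filesInDir) {
  const { scenes, combined, info } = expectedOutputs(projectName, sceneIds);
  const present = new Set(filesInDir);
  return [...scenes, combined, info].filter((file) => !present.has(file));
}

console.log(
  missingOutputs(
    "product-demo",
    ["scene1", "scene2"],
    ["product-demo-scene1.mp3", "product-demo-info.json"],
  ),
);
// [ 'product-demo-scene2.mp3', 'product-demo-combined.mp3' ]
```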

### Single Scene Regeneration

If a scene starts too early, has wrong timing, or needs different text:

```bash
# Regenerate scene2 with new text
node .claude/skills/elevenlabs-remotion-skill/generate.js \
  --scenes remotion/scenes.json \
  --scene scene2 \
  --new-text "Updated scene 2 text" \
  --output-dir public/audio/project/

# Regenerate scene3 with different character
node .claude/skills/elevenlabs-remotion-skill/generate.js \
  --scenes remotion/scenes.json \
  --scene scene3 \
  --character salesperson \
  --output-dir public/audio/project/

# Just regenerate (same text, same character)
node .claude/skills/elevenlabs-remotion-skill/generate.js \
  --scenes remotion/scenes.json \
  --scene scene1 \
  --output-dir public/audio/project/

```

The tool automatically:
- Uses request stitching from previous scenes for consistent prosody
- Updates the info.json file with new metadata
- Updates scenes.json if `--new-text` is provided
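
For orientation, a request-stitched call might be assembled as sketched below. How `generate.js` does this internally is not shown here, so `buildStitchedRequest` is hypothetical. The endpoint, the `xi-api-key` header, and the `previous_request_ids` field come from the public ElevenLabs text-to-speech API as I understand it; the voice ID, API key, and request IDs are placeholders.

```javascript
// Sketch of a request-stitched text-to-speech call (not the skill's own code).
const API_BASE = "https://api.elevenlabs.io/v1/text-to-speech";

function buildStitchedRequest({ voiceId, text, modelId, previousRequestIds = [], apiKey }) {
  return {
    url: `${API_BASE}/${encodeURIComponent(voiceId)}`,
    options: {
      method: "POST",
      headers: { "xi-api-key": apiKey, "Content-Type": "application/json" },
      body: JSON.stringify({
        text,
        model_id: modelId,
        // The API documents a limit of three previous request IDs; keep the latest.
        previous_request_ids: previousRequestIds.slice(-3),
      }),
    },
  };
}

const request = buildStitchedRequest({
  voiceId: "VOICE_ID_PLACEHOLDER",            // placeholder, not a real voice ID
  text: "Updated scene 2 text",
  modelId: "eleven_multilingual_v2",
  previousRequestIds: ["req-scene1"],         // placeholder ID from the previous scene
  apiKey: "API_KEY_PLACEHOLDER",              // read from .env.local in practice
});
console.log(request.url);

// Sending it (Node 18+), left commented out so the sketch makes no network call:
// const response = await fetch(request.url, request.options);
// const requestId = response.headers.get("request-id"); // pass to the next scene
```

Keeping the payload builder separate from the network call makes it easy to inspect what would be sent before any credits are used.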

## Thumbnail Embedding

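Embed a thumbnail image into MP4 videos so that video players and platforms such as Twitter and YouTube can display your custom thumbnail instead of the first frame. Support for embedded thumbnails varies by platform, so check the result where you publish.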

### Embed Thumbnail into Video

```bash
# Basic usage - outputs to video-thumb.mp4
node .claude/skills/elevenlabs-remotion-skill/generate.js \
  --embed-thumbnail public/videos/promo.mp4 \
  --thumbnail public/videos/thumbnail.png

# Custom output path
node .claude/skills/elevenlabs-remotion-skill/generate.js \
  --embed-thumbnail public/videos/promo.mp4 \
  --thumbnail public/videos/thumbnail.png \
  --output public/videos/promo-final.mp4
```

### Workflow with Remotion

```bash
# 1. Render your video
npx remotion render MyVideo public/videos/my-video.mp4

# 2. Render your thumbnail (use Still composition)
npx remotion still MyVideoThumbnail public/videos/my-thumbnail.png

# 3. Embed the thumbnail
node .claude/skills/elevenlabs-remotion-skill/generate.js \
  --embed-thumbnail public/videos/my-video.mp4 \
  --thumbnail public/videos/my-thumbnail.png \
  --output public/videos/my-video-final.mp4
```

### Supported Formats

- **Video**: MP4 (H.264/H.265)
- **Thumbnail**: PNG, JPG, JPEG

The embedding uses ffmpeg's `-disposition:v:1 attached_pic` flag to set the thumbnail as an attached picture, which most video players and platforms recognize.
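
A minimal sketch of an equivalent ffmpeg invocation follows. `buildEmbedThumbnailArgs` is a hypothetical helper, not the skill's code, and the default output name (`<video>-thumb.mp4`) is an assumption. Note that `-disposition:v:1` targets the second video stream of the output, which is the image only when the source video has a single video stream.

```javascript
// Hypothetical helper: builds an ffmpeg argument list that attaches a cover image.
function buildEmbedThumbnailArgs(videoPath, thumbnailPath, outputPath) {
  // Assumed default: promo.mp4 -> promo-thumb.mp4
  const output = outputPath ?? videoPath.replace(/\.mp4$/i, "-thumb.mp4");
  if (output === videoPath) {
    throw new Error("Output path must differ from the input video");
  }
  return [
    "-i", videoPath,      // input 0: the rendered video
    "-i", thumbnailPath,  // input 1: the thumbnail image
    "-map", "0",          // keep every stream from the video
    "-map", "1",          // add the image as an extra video stream
    "-c", "copy",         // copy streams without re-encoding
    "-disposition:v:1", "attached_pic", // mark the image stream as cover art
    output,
  ];
}

const args = buildEmbedThumbnailArgs(
  "public/videos/promo.mp4",
  "public/videos/thumbnail.png",
);
console.log(["ffmpeg", ...args].join(" "));

// To run it for real:
// const { spawnSync } = require("node:child_process");
// spawnSync("ffmpeg", args, { stdio: "inherit" });
```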

## Timing Validation

The skill automatically validates timing after generation using `ffprobe`:

### What It Checks

| Check | Threshold | Description |
|-------|-----------|-------------|
| Duration mismatch | >15% | Warns if the actual duration differs from the expected duration |
| Leading silence | >200ms | Audio starts late (voiceover delayed) |
| Trailing silence | >500ms | Unnecessary silence at end |
| Speaking rate | outside 2-4.5 words/sec | Optimal is about 3 words/second |
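
The checks in the table can be sketched as a pure function over measured values. The thresholds come from the table; the function, its field names, and the choice to exclude silence when computing the speaking rate are assumptions, so the numbers may not match the tool's own report exactly.

```javascript
// Thresholds from the table above; everything else here is hypothetical.
const LIMITS = {
  durationMismatch: 0.15, // fraction of the expected duration
  leadingSilence: 0.2,    // seconds
  trailingSilence: 0.5,   // seconds
  minWps: 2,
  maxWps: 4.5,
};

function checkTiming({ expected, actual, leadingSilence = 0, trailingSilence = 0, wordCount }) {
  const warnings = [];
  const diff = Math.abs(actual - expected);
  if (diff / expected > LIMITS.durationMismatch) {
    const direction = actual < expected ? "shorter" : "longer";
    warnings.push(`Audio ${diff.toFixed(2)}s ${direction} than expected`);
  }
  if (leadingSilence > LIMITS.leadingSilence) {
    warnings.push(`Leading silence: ${Math.round(leadingSilence * 1000)}ms`);
  }
  if (trailingSilence > LIMITS.trailingSilence) {
    warnings.push(`Trailing silence: ${Math.round(trailingSilence * 1000)}ms`);
  }
  // Assumption: the rate is measured over the spoken part only.
  const spoken = actual - leadingSilence - trailingSilence;
  const wordsPerSecond = spoken > 0 ? wordCount / spoken : 0;
  if (wordsPerSecond < LIMITS.minWps || wordsPerSecond > LIMITS.maxWps) {
    warnings.push(`Speaking rate ${wordsPerSecond.toFixed(1)} words/sec is out of range`);
  }
  return { wordsPerSecond, warnings };
}

const report = checkTiming({
  expected: 4.5,
  actual: 3.0,
  leadingSilence: 0.05,
  wordCount: 8,
});
console.log(report.warnings);
// [ 'Audio 1.50s shorter than expected' ]
```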

### Validate Existing Audio

```bash
# Validate all scenes in a project
node .claude/skills/elevenlabs-remotion-skill/generate.js --validate public/audio/product-demo/
```

Output example:
```
🔍 Validating product-demo (6 scenes)

❌ scene1: 3.00s (expected: 4.5s)
   ❌ Audio 1.50s shorter than expected
   👍 8 words @ 3.1 words/sec
⚠️ scene2: 6.35s (expected: 5.5s)
   ⚠️ Leading silence: 235ms (may start late)
   🐢 10 words @ 1.8 words/sec
✅ scene4: 4.36s (expected: 4s)
   👍 9 words @ 2.3 words/sec

📊 Total duration: 30.80s (expected: 30.00s)
```

### Updated info.json

After validation, the info.json includes actual measurements:
```json
{
  "scenes": [
    {
      "id": "scene1",
      "duration": 4.5,
      "actualDuration": 3.0,
      "leadingSilence": 0.05,
      "wordsPerSecond": 3.1
    }
  ]
}
```

Use `actualDuration` in your Remotion composition for precise sync.
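
One way to do that is to convert the measured durations into frame ranges. The sketch below is illustrative: `toFrameRanges` is a hypothetical helper, it places scenes back to back, and it ignores the optional `delay` field. The resulting values could be passed to the `from` and `durationInFrames` props of Remotion's `<Sequence>`.

```javascript
// Hypothetical helper: turns info.json scene entries into frame ranges.
function toFrameRanges(scenes, fps) {
  let cursor = 0;
  return scenes.map((scene) => {
    // Prefer the measured length; fall back to the planned one.
    const seconds = scene.actualDuration ?? scene.duration;
    // Round up so the last fraction of a second of audio is not cut off.
    const durationInFrames = Math.ceil(seconds * fps);
    const range = { id: scene.id, from: cursor, durationInFrames };
    cursor += durationInFrames;
    return range;
  });
}

const ranges = toFrameRanges(
  [
    { id: "scene1", duration: 4.5, actualDuration: 3.0 },
    { id: "scene2", duration: 5.5 },
  ],
  30,
);
console.log(ranges);
// [
//   { id: 'scene1', from: 0, durationInFrames: 90 },
//   { id: 'scene2', from: 90, durationInFrames: 165 }
// ]
```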

## Options

| Option | Description | Default |
|--------|-------------|---------|
| `--text`, `-t` | Text to convert to speech | Required (or --file/--scenes) |
| `--file`, `-f` | Read text from file | - |
| `--output`, `-o` | Output file path | `output.mp3` |
| `--output-dir` | Output directory for scenes | `public/audio` |
| `--voice`, `-v` | Voice name or ID | `George` |
| `--model`, `-m` | Model ID | `eleven_multilingual_v2` |
| `--character`, `-c` | Character preset | `literal` |
| `--scenes` | JSON file with scenes | - |
| `--scene` | Regenerate single scene ID | - |
| `--new-text` | New text for scene regen | - |
| `--validate` | Validate existing audio dir | - |
| `--skip-validation` | Skip auto-validation | false |
| `--embed-thumbnail` | Video file to embed thumbnail into | - |
| `--thumbnail` | Thumbnail image file (PNG/JPG) | - |
| `--stability` | Voice stability (

Files: 7

Size: 80.7 KB

Complexity: 47/100

Category: Image & Video

Source: https://github.com/maartenlouis/elevenlabs-remotion-skill
