elevenlabs-core-workflow-b

Implement ElevenLabs speech-to-speech, sound effects, audio isolation, and speech-to-text. Use when converting voice to another voice, generating sound effects from text, removing background noise, or transcribing audio. Trigger: "elevenlabs speech to speech", "voice changer", "sound effects", "audio isolation", "remove background noise", "elevenlabs transcribe".

# ElevenLabs Core Workflow B: Speech-to-Speech, Sound Effects, Audio Isolation & Speech-to-Text

## Overview

Secondary ElevenLabs workflows beyond TTS: (1) Speech-to-Speech voice conversion, (2) Sound Effects generation from text descriptions, (3) Audio Isolation for noise removal, and (4) Speech-to-Text transcription.

## Prerequisites

- Completed `elevenlabs-install-auth` setup
- For Speech-to-Speech (STS): a source audio file in MP3, WAV, or M4A format
- For audio isolation: the noisy audio file to clean

## Instructions

### Step 1: Speech-to-Speech (Voice Changer)

Transform audio from one voice to another using `POST /v1/speech-to-speech/{voice_id}`:

```typescript
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";
import { createReadStream, createWriteStream } from "fs";
import { Readable } from "stream";
import { pipeline } from "stream/promises";

const client = new ElevenLabsClient();

async function speechToSpeech(
  sourceAudioPath: string,
  targetVoiceId: string,
  outputPath: string
) {
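  // The @elevenlabs/elevenlabs-js SDK takes camelCase request fields;
  // the REST API (see the cURL equivalent) uses snake_case.
  const audio = await client.speechToSpeech.convert(targetVoiceId, {
    audio: createReadStream(sourceAudioPath),
    modelId: "eleven_english_sts_v2",  // STS-specific model
    // voice_settings is sent as a JSON-encoded string, so its keys stay snake_case
    voiceSettings: JSON.stringify({
      stability: 0.5,
      similarity_boost: 0.8,
      style: 0.0,
    }),
    removeBackgroundNoise: true,  // Built-in noise removal
  });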

  await pipeline(Readable.fromWeb(audio as any), createWriteStream(outputPath));
  console.log(`Voice-converted audio saved to ${outputPath}`);
}

// Convert your voice recording to sound like "Rachel"
await speechToSpeech(
  "my_recording.mp3",
  "21m00Tcm4TlvDq8ikWAM",
  "converted.mp3"
);
```

**cURL equivalent:**

```bash
curl -X POST "https://api.elevenlabs.io/v1/speech-to-speech/21m00Tcm4TlvDq8ikWAM" \
  -H "xi-api-key: ${ELEVENLABS_API_KEY}" \
  -F "audio=@my_recording.mp3" \
  -F "model_id=eleven_english_sts_v2" \
  -F 'voice_settings={"stability":0.5,"similarity_boost":0.8}' \
  -F "remove_background_noise=true" \
  --output converted.mp3
```
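
Because `voice_settings` travels as a JSON-encoded string, an out-of-range value only shows up as an API error. The following is a minimal sketch of a local check before serializing. `buildVoiceSettings` is a hypothetical helper, not part of the SDK, and the 0 to 1 range for each field is an assumption based on the ElevenLabs voice settings documentation rather than on this skill.

```typescript
// Hypothetical helper: not part of the ElevenLabs SDK.
interface VoiceSettingsInput {
  stability?: number;
  similarityBoost?: number;
  style?: number;
}

// Assumption: each setting is expected to lie in the range [0, 1].
function buildVoiceSettings(input: VoiceSettingsInput): string {
  const fields: Array<[string, number | undefined]> = [
    ["stability", input.stability],
    ["similarity_boost", input.similarityBoost],
    ["style", input.style],
  ];
  const out: Record<string, number> = {};
  for (const [key, value] of fields) {
    if (value === undefined) continue; // leave unset fields to the API defaults
    if (!Number.isFinite(value) || value < 0 || value > 1) {
      throw new RangeError(`${key} must be between 0 and 1, got ${value}`);
    }
    out[key] = value;
  }
  // Keys are emitted in snake_case because the string is forwarded to the REST API as-is.
  return JSON.stringify(out);
}

const exampleSettings = buildVoiceSettings({ stability: 0.5, similarityBoost: 0.8 });
console.log(exampleSettings); // {"stability":0.5,"similarity_boost":0.8}
```

The returned string can be passed as the `voice_settings` form field in the cURL call or as the settings argument in the SDK call.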

### Step 2: Sound Effects Generation

Generate cinematic sound effects from text descriptions using `POST /v1/sound-generation`:

```typescript
async function generateSoundEffect(
  description: string,
  outputPath: string,
  options?: {
    duration?: number;      // 0.5-30 seconds (null = auto)
    promptInfluence?: number; // 0-1 (default 0.3, higher = follows prompt more closely)
    loop?: boolean;          // Seamless looping (default false)
  }
) {
  const audio = await client.textToSoundEffects.convert({
    text: description,
    durationSeconds: options?.duration,      // omit to let the API pick a length
    promptInfluence: options?.promptInfluence ?? 0.3,
    loop: options?.loop ?? false,            // pass the loop flag through to the API
    // modelId: "eleven_text_to_sound_v2",   // default
  });

  await pipeline(Readable.fromWeb(audio as any), createWriteStream(outputPath));
  console.log(`Sound effect saved to ${outputPath}`);
}

// Generate various sound effects
await generateSoundEffect(
  "Heavy rain on a tin roof with distant thunder",
  "rain.mp3",
  { duration: 10, promptInfluence: 0.6 }
);

await generateSoundEffect(
  "Sci-fi laser gun firing three quick bursts",
  "laser.mp3",
  { duration: 3, promptInfluence: 0.8 }
);

await generateSoundEffect(
  "Gentle forest ambiance with birds chirping",
  "forest_loop.mp3",
  { duration: 15, loop: true }  // Seamless loop for background audio
);
```

**cURL equivalent:**

```bash
curl -X POST "https://api.elevenlabs.io/v1/sound-generation" \
  -H "xi-api-key: ${ELEVENLABS_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Heavy rain on a tin roof with distant thunder",
    "duration_seconds": 10,
    "prompt_influence": 0.6
  }' \
  --output rain.mp3
```

### Step 3: Audio Isolation (Voice Isolator)

Remove background noise from audio using `POST /v1/audio-isolation`:

```typescript
async function isolateVoice(
  noisyAudioPath: string,
  cleanOutputPath: string
) {
  const cleanAudio = await client.audioIsolation.convert({
    audio: createReadStream(noisyAudioPath),
  });

  await pipeline(
    Readable.fromWeb(cleanAudio as any),
    createWriteStream(cleanOutputPath)
  );
  console.log(`Clean audio saved to ${cleanOutputPath}`);
}

// Remove background noise from a recording
await isolateVoice("noisy_interview.mp3", "clean_interview.mp3");
```

**Streaming variant** for large files (`POST /v1/audio-isolation/stream`):

```typescript
async function isolateVoiceStreaming(
  noisyAudioPath: string,
  cleanOutputPath: string
) {
  const stream = await client.audioIsolation.stream({
    audio: createReadStream(noisyAudioPath),
  });

  // pipeline() respects backpressure and resolves once the file is fully flushed
  await pipeline(Readable.fromWeb(stream as any), createWriteStream(cleanOutputPath));
}
```

**cURL equivalent:**

```bash
curl -X POST "https://api.elevenlabs.io/v1/audio-isolation" \
  -H "xi-api-key: ${ELEVENLABS_API_KEY}" \
  -F "audio=@noisy_interview.mp3" \
  --output clean_interview.mp3
```

### Step 4: Speech-to-Text (Transcription)

Transcribe audio, with optional speaker diarization and word-level timestamps, using `POST /v1/speech-to-text`:

```typescript
async function transcribeAudio(audioPath: string) {
  const result = await client.speechToText.convert({
    file: createReadStream(audioPath),  // the upload field is `file`, not `audio`
    modelId: "scribe_v1",  // ElevenLabs' STT model
    // languageCode: "en",  // Optional: force language
    // diarize: true,       // Enable speaker detection
    // timestampsGranularity: "word",  // "word" or "character"
  });

  console.log("Transcription:", result.text);

  // Word-level timestamps (skip entries without timing data)
  if (result.words) {
    for (const word of result.words) {
      if (word.start == null || word.end == null) continue;
      console.log(`[${word.start.toFixed(2)}-${word.end.toFixed(2)}] ${word.text}`);
    }
  }

  return result;
}

await transcribeAudio("podcast_episode.mp3");
```
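
With diarization enabled, consecutive words from the same speaker can be folded into readable turns. The sketch below shows one way to do that. The `DiarizedWord` shape, including the `speakerId` and `type` field names, is an assumption about what each `result.words` entry looks like, and `groupBySpeaker` is a hypothetical helper.

```typescript
// Assumed shape of one entry in `result.words`; field names are illustrative.
interface DiarizedWord {
  text: string;
  type?: string;       // e.g. "word" or "spacing" (assumption)
  speakerId?: string;  // expected when diarization is enabled (assumption)
}

interface SpeakerTurn {
  speaker: string;
  text: string;
}

// Hypothetical helper: merges consecutive words from one speaker into a single turn.
function groupBySpeaker(words: DiarizedWord[]): SpeakerTurn[] {
  const turns: SpeakerTurn[] = [];
  for (const word of words) {
    // Skip whitespace-only entries; words are re-joined with single spaces.
    if (word.type === "spacing" || word.text.trim() === "") continue;
    const speaker = word.speakerId ?? "unknown";
    const last = turns[turns.length - 1];
    if (last && last.speaker === speaker) {
      last.text += ` ${word.text}`;
    } else {
      turns.push({ speaker, text: word.text });
    }
  }
  return turns;
}

const sampleWords: DiarizedWord[] = [
  { text: "Hello", type: "word", speakerId: "speaker_0" },
  { text: " ", type: "spacing", speakerId: "speaker_0" },
  { text: "there", type: "word", speakerId: "speaker_0" },
  { text: "Hi", type: "word", speakerId: "speaker_1" },
];

for (const turn of groupBySpeaker(sampleWords)) {
  console.log(`${turn.speaker}: ${turn.text}`);
}
```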

## API Endpoint Summary

| Feature | Method | Endpoint | Billing |
|---------|--------|----------|---------|
| Speech-to-Speech | POST | `/v1/speech-to-speech/{voice_id}` | Per character |
| Sound Effects | POST | `/v1/sound-generation` | Per generation |
| Audio Isolation | POST | `/v1/audio-isolation` | 1,000 chars/min of audio |
| Audio Isolation Stream | POST | `/v1/audio-isolation/stream` | 1,000 chars/min of audio |
| Speech-to-Text | POST | `/v1/speech-to-text` | Per audio minute |
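
For code that calls the REST API directly instead of going through the SDK, the table above can be captured as a small typed lookup. This is a sketch only: the `Feature` names and the `endpointUrl` helper are hypothetical, and the base URL is taken from the cURL examples in this document.

```typescript
const API_BASE = "https://api.elevenlabs.io";

// Hypothetical feature keys, one per row of the endpoint table.
type Feature =
  | "speechToSpeech"
  | "soundEffects"
  | "audioIsolation"
  | "audioIsolationStream"
  | "speechToText";

const ENDPOINTS: Record<Feature, string> = {
  speechToSpeech: "/v1/speech-to-speech/{voice_id}",
  soundEffects: "/v1/sound-generation",
  audioIsolation: "/v1/audio-isolation",
  audioIsolationStream: "/v1/audio-isolation/stream",
  speechToText: "/v1/speech-to-text",
};

// Builds the full URL; only speech-to-speech needs a voice ID in the path.
function endpointUrl(feature: Feature, voiceId?: string): string {
  const path = ENDPOINTS[feature];
  if (path.includes("{voice_id}")) {
    if (!voiceId) {
      throw new Error(`${feature} requires a voice ID`);
    }
    return API_BASE + path.replace("{voice_id}", encodeURIComponent(voiceId));
  }
  return API_BASE + path;
}

console.log(endpointUrl("speechToSpeech", "21m00Tcm4TlvDq8ikWAM"));
console.log(endpointUrl("soundEffects"));
```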

## Sound Effect Tips

- Be specific: "wooden door creaking slowly open in a quiet room" beats "door sound"
- Specify quantity: "three quick gunshots" vs "gunshots"
- Set mood: "eerie", "cheerful", "aggressive" changes the output character
- Use `prompt_influence: 0.6-0.8` for precise results, `0.2-0.4` for creative variation
- Max duration: 30 seconds per generation
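
The numeric limits in these tips can be checked before a request is sent. Here is a minimal sketch that builds the JSON body used in the cURL example. `buildSoundEffectRequest` is a hypothetical helper, and the 0.5 to 30 second and 0 to 1 ranges are the ones stated in this document.

```typescript
// JSON body for POST /v1/sound-generation, matching the cURL example's field names.
interface SoundEffectRequest {
  text: string;
  duration_seconds?: number;
  prompt_influence: number;
}

// Hypothetical helper: validates inputs against the documented ranges.
function buildSoundEffectRequest(
  description: string,
  durationSeconds?: number,
  promptInfluence: number = 0.3
): SoundEffectRequest {
  const text = description.trim();
  if (text === "") {
    throw new Error("description must not be empty");
  }
  // The negated range checks also reject NaN.
  if (durationSeconds !== undefined && !(durationSeconds >= 0.5 && durationSeconds <= 30)) {
    throw new RangeError("duration must be between 0.5 and 30 seconds");
  }
  if (!(promptInfluence >= 0 && promptInfluence <= 1)) {
    throw new RangeError("prompt influence must be between 0 and 1");
  }
  const body: SoundEffectRequest = { text, prompt_influence: promptInfluence };
  if (durationSeconds !== undefined) {
    body.duration_seconds = durationSeconds; // omitted means the API picks a length
  }
  return body;
}

console.log(JSON.stringify(buildSoundEffectRequest("three quick gunshots", 3, 0.8)));
```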

## Audio Isolation Limits

| Aspect | Limit |
|--------|-------|
| Max file size | 500 MB |
| Max duration | 1 hour |
| Supported formats | MP3, WAV, M4A, FLAC, OGG, WEBM |
| PCM optimization | Set `file_format: "pcm_s16le_16"` when the input is 16-bit PCM, 16 kHz, mono, little-endian; latency is lower than with encoded audio |
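
The size and format limits above can be checked locally before uploading. The sketch below is one possible pre-flight check. `checkIsolationInput` is a hypothetical helper, it treats "500 MB" as 500 × 1024 × 1024 bytes (an assumption), and it judges format by file extension only. The one-hour duration limit is not checked because that would require decoding the audio.

```typescript
// Assumption: "500 MB" is interpreted as 500 * 1024 * 1024 bytes.
const MAX_ISOLATION_BYTES = 500 * 1024 * 1024;
const SUPPORTED_EXTENSIONS: string[] = ["mp3", "wav", "m4a", "flac", "ogg", "webm"];

// Hypothetical helper: returns a list of problems, empty if the file looks acceptable.
function checkIsolationInput(fileName: string, sizeBytes: number): string[] {
  const problems: string[] = [];
  const dot = fileName.lastIndexOf(".");
  const ext = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : "";
  if (!SUPPORTED_EXTENSIONS.includes(ext)) {
    problems.push(`unsupported format: ${ext || "(none)"}`);
  }
  if (sizeBytes > MAX_ISOLATION_BYTES) {
    problems.push(`file is ${sizeBytes} bytes, limit is ${MAX_ISOLATION_BYTES}`);
  }
  return problems;
}

console.log(checkIsolationInput("noisy_interview.mp3", 12_000_000)); // []
console.log(checkIsolationInput("notes.txt", 600 * 1024 * 1024));
```

In a Node script, the size would typically come from `fs.statSync(path).size`.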

## Error Handling

| Error | HTTP | Cause | Solution |
|-------|------|-------|----------|
| `model_can_not_do_voice_conversion` | 400 | Wrong model for STS | Use `eleven_english_sts_v2` |
| `audio_too_short` | 400 | STS input under 1 second | Use longer audio clip |
| `audio_too_long` | 400 | STS input over limit | Trim to under 5 minutes |
| `invalid_sound_prompt` | 400 | Nonsensical SFX description | Write descriptive, specific prompts |
| `file_too_large` | 413 | Audio isolation over 500MB | Compress or split the file |
| `quota_exceeded` | 401 | Character/generation limit hit | Check usage dashboard |
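
The table above can also be kept as a lookup so a script can print the suggested fix next to a failure. This is a sketch under the assumption that the caller has already extracted the error code string from the API response. `adviseOnError` is a hypothetical helper, and how the code is exposed by the SDK's error object is not covered here.

```typescript
interface ErrorAdvice {
  http: number;
  action: string;
}

// Entries copied from the error handling table.
const ERROR_ADVICE: Record<string, ErrorAdvice | undefined> = {
  model_can_not_do_voice_conversion: { http: 400, action: "Use eleven_english_sts_v2" },
  audio_too_short: { http: 400, action: "Use longer audio clip" },
  audio_too_long: { http: 400, action: "Trim to under 5 minutes" },
  invalid_sound_prompt: { http: 400, action: "Write descriptive, specific prompts" },
  file_too_large: { http: 413, action: "Compress or split the file" },
  quota_exceeded: { http: 401, action: "Check usage dashboard" },
};

// Hypothetical helper: turns an error code into a one-line hint.
function adviseOnError(code: string): string {
  const advice = ERROR_ADVICE[code];
  if (!advice) {
    return `${code}: no specific guidance, inspect the API response body`;
  }
  return `${code} (HTTP ${advice.http}): ${advice.action}`;
}

console.log(adviseOnError("file_too_large"));
// file_too_large (HTTP 413): Compress or split the file
```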

## Resources

- [Speech-to-Speech API](https://elevenlabs.io/docs/api-reference/speech-to-speech/convert)
- [Sound Effects API](https://elevenlabs.io/docs/api-reference/text-to-sound-effects/convert)
- [Audio Isolation API](https://elevenlabs.io/docs/api-reference/audio-isolation/convert)
- [Speech-to-Text API](https://elevenlabs.io/docs/api-reference/speech-to-text/convert)

## Next Steps

For common errors, see `elevenlabs-common-errors`. For SDK patterns, see `elevenlabs-sdk-patterns`.

Source: https://github.com/jeremylongshore/claude-code-plugins-plus-skills/tree/main/plugins/saas-packs/elevenlabs-pack/skills/elevenlabs-core-workflow-b