skill-elevenlabs-tts-tool
ElevenLabs text-to-speech CLI tool guide
What this skill does
# When to use

- Converting text to speech with ElevenLabs API
- Exploring available voices and models
- Managing TTS subscriptions and usage
- Integrating TTS into workflows and pipelines

# ElevenLabs TTS Tool Skill

## Purpose

Comprehensive guide for the `elevenlabs-tts-tool` CLI - a professional command-line interface for ElevenLabs text-to-speech synthesis. Provides both direct audio playback and file output with support for 42+ premium voices and multiple models.

## When to Use This Skill

**Use this skill when:**

- Converting text to speech for notifications, audiobooks, or content creation
- Exploring and comparing different voice characteristics
- Managing ElevenLabs subscription quotas and usage
- Building voice-enabled workflows and automation
- Integrating TTS into Claude Code hooks or other tools

**Do NOT use this skill for:**

- Direct ElevenLabs API programming (use SDK docs instead)
- Custom voice cloning (requires ElevenLabs web interface)
- Real-time streaming TTS (tool focuses on file/playback generation)

## CLI Tool: elevenlabs-tts-tool

Professional text-to-speech CLI tool built with Python 3.13+, uv, and the ElevenLabs SDK.

### Installation

```bash
# Clone repository
git clone https://github.com/dnvriend/elevenlabs-tts-tool.git
cd elevenlabs-tts-tool

# Install globally with uv
uv tool install .

# Verify installation
elevenlabs-tts-tool --version
```

### Prerequisites

- **Python**: 3.13 or higher
- **API Key**: ElevenLabs API key (get from https://elevenlabs.io/app/settings/api-keys)
- **Environment Variable**: `export ELEVENLABS_API_KEY='your-api-key'`

### Quick Start

```bash
# Set API key
export ELEVENLABS_API_KEY='your-api-key'

# Basic text-to-speech
elevenlabs-tts-tool synthesize "Hello world"

# Use different voice
elevenlabs-tts-tool synthesize "Hello" --voice adam

# Save to file
elevenlabs-tts-tool synthesize "Text" --output speech.mp3
```

## Progressive Disclosure

<details>
<summary><strong>Core Commands (Click to expand)</strong></summary>

### synthesize - Convert Text to Speech

Convert text to speech using ElevenLabs API. Supports direct playback or file output.

**Usage:**

```bash
elevenlabs-tts-tool synthesize [TEXT] [OPTIONS]
```

**Arguments:**

- `TEXT`: Text to synthesize (optional if --stdin used)
- `--stdin, -s`: Read text from stdin instead of argument
- `--voice, -v NAME`: Voice name or ID (default: rachel)
- `--model, -m ID`: Model ID (default: eleven_turbo_v2_5)
- `--output, -o PATH`: Save to audio file instead of playing
- `--format, -f FORMAT`: Output format (default: mp3_44100_128)

**Examples:**

```bash
# Basic usage - play through speakers
elevenlabs-tts-tool synthesize "Hello world"

# Use different voice
elevenlabs-tts-tool synthesize "Hello" --voice adam

# Use specific model
elevenlabs-tts-tool synthesize "Hello" --model eleven_multilingual_v2

# Emotional expression (requires eleven_v3 model)
elevenlabs-tts-tool synthesize "[happy] Welcome to our service!" --model eleven_v3

# Multiple emotions
elevenlabs-tts-tool synthesize "[excited] Great news! [cheerfully] Your project is approved!" --model eleven_v3

# Add pauses with SSML
elevenlabs-tts-tool synthesize "Point one <break time=\"0.5s\" /> Point two <break time=\"0.5s\" /> Point three."

# Read from stdin
echo "Text from pipeline" | elevenlabs-tts-tool synthesize --stdin

# Save to file
elevenlabs-tts-tool synthesize "Text" --output speech.mp3

# Pipeline integration
cat document.txt | elevenlabs-tts-tool synthesize --stdin --output audiobook.mp3
```

**Output:** Plays audio through default speakers or saves to specified file format.

**Available Formats:**

- `mp3_44100_128` (default): MP3, 44.1kHz, 128kbps
- `mp3_44100_64`: MP3, 44.1kHz, 64kbps
- `mp3_22050_32`: MP3, 22.05kHz, 32kbps
- `pcm_44100`: PCM WAV, 44.1kHz (requires Pro tier)

---

### list-voices - Show Available Voices

List all available ElevenLabs voices with characteristics.

**Usage:**

```bash
elevenlabs-tts-tool list-voices
```

**Examples:**

```bash
# List all voices
elevenlabs-tts-tool list-voices

# Filter by gender
elevenlabs-tts-tool list-voices | grep female
# -w matches "male" as a whole word, so "female" rows are excluded
elevenlabs-tts-tool list-voices | grep -w male

# Filter by accent
elevenlabs-tts-tool list-voices | grep British
elevenlabs-tts-tool list-voices | grep American

# Filter by age
elevenlabs-tts-tool list-voices | grep young
elevenlabs-tts-tool list-voices | grep middle_aged

# Combine filters
elevenlabs-tts-tool list-voices | grep "female.*young.*British"
```

**Output:**

```
Voice       Gender   Age           Accent     Description
====================================================================================================
rachel      female   young         American   Calm and friendly American voice...
adam        male     middle_aged   American   Deep, authoritative American male...
charlotte   female   middle_aged   British    Smooth, professional British voice...
...
====================================================================================================
Total: 42 voices available
```

**Popular Voices:**

- **rachel**: Calm, friendly American female (default)
- **adam**: Deep, authoritative American male
- **charlotte**: Professional British female
- **josh**: Young, casual American male
- **bella**: Expressive Italian female

---

### list-models - Show TTS Models

List all available ElevenLabs TTS models with characteristics and use cases.

**Usage:**

```bash
elevenlabs-tts-tool list-models
```

**Examples:**

```bash
# List all models
elevenlabs-tts-tool list-models

# Filter by status
elevenlabs-tts-tool list-models | grep stable
elevenlabs-tts-tool list-models | grep deprecated

# Find low-latency models
elevenlabs-tts-tool list-models | grep -i "ultra-low"

# Find multilingual models
elevenlabs-tts-tool list-models | grep -i "multilingual"
```

**Output:** Comprehensive model information including:

- Model ID and version
- Quality and latency characteristics
- Language support (mono vs multilingual)
- Character limits
- Best use cases
- Special features (emotions, etc.)

**Key Models:**

- **eleven_turbo_v2_5**: Fast, high-quality (default, best value)
- **eleven_flash_v2_5**: Ultra-low latency (real-time applications)
- **eleven_multilingual_v2**: 29 languages, production quality
- **eleven_v3**: Most expressive with emotion tags (alpha, 2x cost)

**Cost Multipliers:**

- Turbo/Flash models: 1x cost
- Multilingual v2: 1x cost
- v3 models: 2x cost (half the minutes/tokens)

---

### info - Show Subscription Info

Display subscription tier, character usage, quota limits, and historical usage.
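
The cost multipliers listed for the models interact with the character quota that `info` reports. As a rough sketch, assuming quota consumption equals the character count times the model multiplier (an assumption; this guide does not state the exact billing formula), a hypothetical shell helper named `estimate_cost` (not part of the CLI) could estimate what a text would consume before it is synthesized:

```bash
# Hypothetical helper, not part of elevenlabs-tts-tool.
# Assumption: cost = character count x model multiplier,
# with 2x for eleven_v3 and 1x for every other model.
estimate_cost() (
  text=$1
  model=${2:-eleven_turbo_v2_5}   # default model named in this guide
  multiplier=1
  case "$model" in
    eleven_v3*) multiplier=2 ;;
  esac
  # ${#text} is the length of the text in characters
  echo $(( ${#text} * $multiplier ))
)

estimate_cost "Hello world"             # prints 11
estimate_cost "Hello world" eleven_v3   # prints 22
```

Comparing the estimate with the remaining characters shown by `elevenlabs-tts-tool info` gives a quick sanity check before a long generation.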

**Usage:**

```bash
elevenlabs-tts-tool info [--days N]
```

**Arguments:**

- `--days, -d N`: Number of days of historical usage to display (default: 7)

**Examples:**

```bash
# View subscription with last 7 days of usage
elevenlabs-tts-tool info

# View last 30 days of usage
elevenlabs-tts-tool info --days 30

# Quick quota check (1 day)
elevenlabs-tts-tool info --days 1

# Check usage before long generation
elevenlabs-tts-tool info --days 1 && elevenlabs-tts-tool synthesize "Long text..."
```

**Output Information:**

- Subscription tier and status
- Character usage (used/limit/remaining)
- Quota reset date
- Historical usage breakdown by day
- Average daily usage
- Projected monthly usage
- Warnings when approaching quota limits

**Use Cases:**

- Monitor character quota consumption
- Track usage patterns over time
- Plan when to upgrade subscription tier
- Avoid hitting quota limits unexpectedly
- Identify high-usage periods

---

### update-voices - Update Voice Table

Fetch latest voices from ElevenLabs API and update local lookup table.

**Usage:**

```bash
elevenlabs-tts-tool update-voices [--output PATH]
```

**Arguments:**

- `--output, -o PATH`: Output file path (default: ~/.config/elevenlabs-tts-tool/voices_lookup.json)

**Examples:**

```bash
# Update default voice lookup (user config directory)
elevenlabs-tts-tool update-voices

# Save to