speakturbo-tts
Give your agent the ability to speak to you real-time. Talk to your Claude! Ultra-fast TTS, text-to-speech, voice synthesis, audio output with ~90ms latency. 8 built-in voices for instant voice responses. For voice cloning, use the speak skill.
What this skill does
# speakturbo - Talk to your Claude!
Give your agent the ability to speak to you real-time. Ultra-fast text-to-speech with ~90ms latency and 8 built-in voices.
## Quick Start
```bash
# Play immediately - you should hear "Hello world" through your speakers
speakturbo "Hello world"
# Output: ⚡ 92ms → ▶ 93ms → ✓ 1245ms
# Verify it's working by saving to file
speakturbo "Hello world" -o test.wav
ls -lh test.wav # Should show ~50-100KB file
```
**Output explained:** `⚡` = first audio received, `▶` = playback started, `✓` = done
## First Run
The **first execution takes 2-5 seconds** while the daemon starts and loads the model into memory. Subsequent calls are ~90ms to first sound.
```bash
# First run (slow - daemon starting)
speakturbo "Starting up" # ~2-5 seconds
# Second run (fast - daemon already running)
speakturbo "Now I'm fast" # ~90ms
```
## Usage
```bash
# Basic - plays immediately (default voice: alba)
speakturbo "Hello world"
# Save to file (no audio playback)
speakturbo "Hello" -o output.wav
# Save to specific file
speakturbo "Goodbye" -o goodbye.wav
# Quiet mode (suppress status messages, still plays audio)
speakturbo "Hello" -q
# List available voices
speakturbo --list-voices
```
## Available Voices
| Voice | Type |
|-------|------|
| `alba` | Female (default) |
| `marius` | Male |
| `javert` | Male |
| `jean` | Male |
| `fantine` | Female |
| `cosette` | Female |
| `eponine` | Female |
| `azelma` | Female |
## Performance
| Metric | Value |
|--------|-------|
| Time to first sound | ~90ms (daemon warm) |
| First run | 2-5s (daemon startup) |
| Real-time factor | ~4x faster |
| Sample rate | 24kHz mono |
## Architecture
```
speakturbo (Rust CLI, 2.2MB)
│
│ HTTP streaming (port 7125)
▼
speakturbo-daemon (Python + pocket-tts)
│
│ Model in memory, auto-shutdown after 1hr idle
▼
Audio playback (rodio)
```
## Text Input
- **Encoding:** UTF-8
- **Quotes in text:** Use escaping: `speakturbo "She said \"hello\""`
- **Long text:** Supported, streams as it generates
## Output Path Security
The `-o` flag only writes to directories that are on the allowlist. By default, these are:
- `/tmp` and system temp directories
- Your current working directory
- `~/.speakturbo/`
If you need to write elsewhere, use `--allow-dir`:
```bash
speakturbo "Hello" -o /custom/path/audio.wav --allow-dir /custom/path
```
To permanently allow a directory, add it to `~/.speakturbo/config`:
```bash
mkdir -p ~/.speakturbo && echo "/custom/path" >> ~/.speakturbo/config
```
The config file is one directory per line. Lines starting with `#` are comments.
## Exit Codes
| Code | Meaning |
|------|---------|
| 0 | Success (audio played/saved) |
| 1 | Error (daemon connection failed, invalid args) |
## When to Use
**Use speakturbo when:**
- You need instant audio feedback (~90ms)
- Speed matters more than voice variety
- Built-in voices are sufficient
**Use `speak` instead when:**
- You need custom voice cloning (Morgan Freeman, etc.)
→ `speak "text" --voice ~/.chatter/voices/morgan_freeman.wav`
- You need emotion tags like `[laugh]`, `[sigh]`
- Quality/variety matters more than speed
See the `speak` skill documentation for full usage.
## Troubleshooting
**No audio plays:**
```bash
# Check daemon is running
curl http://127.0.0.1:7125/health
# Expected: {"status":"ready","voices":["alba","marius",...]}
# Verify by saving to file and playing manually
speakturbo "test" -o /tmp/test.wav
afplay /tmp/test.wav # macOS
aplay /tmp/test.wav # Linux
```
**Daemon won't start:**
```bash
# Check port availability
lsof -i :7125
# Manually kill and restart
pkill -f "daemon_streaming"
speakturbo "test" # Auto-restarts daemon
```
**First run is slow:**
This is expected. The daemon needs to load the ~100MB model into memory. Subsequent calls will be fast (~90ms).
## Daemon Management
The daemon auto-starts on first use and **auto-shuts down after 1 hour idle**.
```bash
# Check status
curl http://127.0.0.1:7125/health
# Manual stop
pkill -f "daemon_streaming"
# View logs
cat /tmp/speakturbo.log
```
## Comparison with speak
| Feature | speakturbo | speak |
|---------|------------|-------|
| Time to first sound | ~90ms | ~4-8s |
| Voice cloning | ❌ | ✅ |
| Emotion tags | ❌ | ✅ |
| Voices | 8 built-in | Custom wav files |
| Engine | pocket-tts | Chatterbox |
Related in Image & Video
watch
IncludedWatch a video (URL or local path). Downloads with yt-dlp, extracts auto-scaled frames with ffmpeg, pulls the transcript from captions (or Whisper API fallback), and hands the result to Claude so it can answer questions about what's in the video.
physical-ai-defect-image-generation
IncludedUse when the user wants to orchestrate defect image generation, run associated setup, or handle outputs on OSMO. The Day 0 path handles cold-start with USD-to-ROI, image-edit augmentation, and AnomalyGen to create initial PCBA datasets. The Day 1 path performs inference and labeling on real images. This skill helps with first-time asset setup, creation of finetuning checkpoints, and configuring deployment. Trigger keywords: defect image generation, dig workflow, dig pipeline, defect image detection workflow, aoi pipeline, aoi anomalygen, usd2roi anomalygen, day 0 pcba, day 1 pcba, day 1 real-photo alignment, day 1 manual roi, metal surface anomaly, glass defect, anomalygen finetune, setup_pcb, setup_metal, setup_glass, setup_pretrained, dig setup, dig datasets, dig pretrained checkpoint, dig image-edit endpoint.
accelint-react-best-practices
IncludedReact performance optimization and best practices. ALWAYS use this skill when working with any React code - writing components, hooks, JSX; refactoring; optimizing re-renders, memoization, state management; reviewing for performance; fixing hydration mismatches; debugging infinite re-renders, stale closures, input focus loss, animations restarting; preventing remounting; implementing transitions, lazy initialization, effect dependencies. Even simple React tasks benefit from these patterns. Covers React 19+ (useEffectEvent, Activity, ref props). Triggers - useEffect, useState, useMemo, useCallback, memo, inline components, nested components, components inside components, re-render, performance, hydration, SSR, Next.js, useDeferredValue, combined hooks.
elevenlabs-agents
IncludedBuild conversational AI voice agents with ElevenLabs Platform using React, JavaScript, React Native, or Swift SDKs. Configure agents, tools (client/server/MCP), RAG knowledge bases, multi-voice, and Scribe real-time STT. Use when: building voice chat interfaces, implementing AI phone agents with Twilio, configuring agent workflows or tools, adding RAG knowledge bases, testing with CLI "agents as code", or troubleshooting deprecated @11labs packages, Android audio cutoff, CSP violations, dynamic variables, or WebRTC config. Keywords: ElevenLabs Agents, ElevenLabs voice agents, AI voice agents, conversational AI, @elevenlabs/react, @elevenlabs/client, @elevenlabs/react-native, @elevenlabs/elevenlabs-js, @elevenlabs/agents-cli, elevenlabs SDK, voice AI, TTS, text-to-speech, ASR, speech recognition, turn-taking model, WebRTC voice, WebSocket voice, ElevenLabs conversation, agent system prompt, agent tools, agent knowledge base, RAG voice agents, multi-voice agents, pronunciation dictionary, voice speed control, elevenlabs scribe, @11labs deprecated, Android audio cutoff, CSP violation elevenlabs, dynamic variables elevenlabs, case-sensitive tool names, webhook authentication
humanizer
IncludedHumanize AI-generated text by detecting and removing patterns typical of LLM output. Rewrites text to sound natural, specific, and human. Uses 28 pattern detectors, 560+ AI vocabulary terms across 3 tiers, and statistical analysis (burstiness, type-token ratio, readability) for comprehensive detection. Use when asked to humanize text, de-AI writing, make content sound more natural/human, review writing for AI patterns, score text for AI detection, or improve AI-generated drafts. Covers content, language, style, communication, and filler categories.
generating-mermaid-diagrams
IncludedSalesforce architecture diagrams using Mermaid with ASCII fallback. Use this skill when generating text-based diagrams for Salesforce architecture, OAuth flows, ERDs, integration sequences, or Agentforce structure. TRIGGER when: user says "diagram", "visualize", "ERD", or asks for sequence diagrams, flowcharts, class diagrams, or architecture visualizations in Mermaid. DO NOT TRIGGER when: user wants PNG/SVG image output (use generating-visual-diagrams), or asks about non-Salesforce systems.