elevenlabs-rate-limits
Implement ElevenLabs rate limiting, concurrency queuing, and backoff patterns. Use when handling 429 errors, implementing retry logic, or managing concurrent TTS request throughput. Trigger: "elevenlabs rate limit", "elevenlabs throttling", "elevenlabs 429", "elevenlabs retry", "elevenlabs backoff", "elevenlabs concurrent requests".
# ElevenLabs Rate Limits
## Overview
Handle ElevenLabs rate limits with plan-aware concurrency queuing, exponential backoff, and quota monitoring. ElevenLabs uses two rate limit mechanisms: concurrent request limits (per plan) and system-level throttling.
## Prerequisites
- ElevenLabs JS SDK installed: `npm install @elevenlabs/elevenlabs-js`
- Understanding of your subscription plan's limits
- `p-queue` package (recommended): `npm install p-queue`
## Instructions
### Step 1: Understand the Two 429 Error Types
ElevenLabs returns HTTP 429 for two different reasons:
| 429 Variant | Response Body | Cause | Strategy |
|-------------|--------------|-------|----------|
| `too_many_concurrent_requests` | `{"detail":{"status":"too_many_concurrent_requests"}}` | Exceeded plan concurrency | Queue requests; don't back off |
| `system_busy` | `{"detail":{"status":"system_busy"}}` | Server overload | Exponential backoff |
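Both variants share the same HTTP status, so retry logic has to inspect `detail.status` in the body. A minimal sketch of that check, assuming the body arrives either already parsed (as Step 4's code assumes) or as raw JSON text from a plain `fetch`. `getDetailStatus` is a hypothetical helper name, not an SDK function:

```typescript
// Hypothetical helper: pull `detail.status` out of an ElevenLabs error body,
// whether it is already parsed or still a raw JSON string.
export function getDetailStatus(body: unknown): string | undefined {
  let parsed: unknown = body;
  if (typeof body === "string") {
    try {
      parsed = JSON.parse(body);
    } catch {
      return undefined; // not JSON, e.g. an HTML error page from a proxy
    }
  }
  if (typeof parsed !== "object" || parsed === null) return undefined;
  const detail = (parsed as { detail?: unknown }).detail;
  if (typeof detail !== "object" || detail === null) return undefined;
  const status = (detail as { status?: unknown }).status;
  return typeof status === "string" ? status : undefined;
}

// Usage: pick the strategy from the table
const body = '{"detail":{"status":"too_many_concurrent_requests"}}';
console.log(getDetailStatus(body) === "too_many_concurrent_requests" ? "queue" : "backoff"); // prints queue
```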
### Step 2: Plan Concurrency Limits
| Plan | Max Concurrent Requests | Characters/Month |
|------|------------------------|-------------------|
| Free | 2 | 10,000 |
| Starter | 3 | 30,000 |
| Creator | 5 | 100,000 |
| Pro | 10 | 500,000 |
| Scale | 15 | 2,000,000 |
| Business | 15 | Custom |
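The concurrency cap sets how long a batch takes. As a back-of-envelope sketch, assuming every request takes roughly the same time (the 3-second latency below is an illustrative assumption, not a measured ElevenLabs figure), N requests on a plan with C slots run in about ceil(N / C) sequential waves. `estimateBatchMs` is a hypothetical helper:

```typescript
// Rough batch-duration estimate: with C slots and uniform per-request
// latency, N requests run in ceil(N / C) waves. Real latency varies by
// model and text length, so treat the result as an order of magnitude.
const PLAN_CONCURRENCY = {
  free: 2,
  starter: 3,
  creator: 5,
  pro: 10,
  scale: 15,
  business: 15,
} as const;

type Plan = keyof typeof PLAN_CONCURRENCY;

export function estimateBatchMs(plan: Plan, requests: number, avgLatencyMs: number): number {
  const waves = Math.ceil(requests / PLAN_CONCURRENCY[plan]);
  return waves * avgLatencyMs;
}

// 100 requests at an assumed 3 s each
for (const plan of ["free", "creator", "pro"] as Plan[]) {
  console.log(plan, estimateBatchMs(plan, 100, 3000) / 1000, "s");
}
// free: 50 waves -> 150 s; creator: 20 waves -> 60 s; pro: 10 waves -> 30 s
```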
### Step 3: Concurrency-Aware Request Queue
```typescript
// src/elevenlabs/rate-limiter.ts
import PQueue from "p-queue";
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";

export type ElevenLabsPlan = "free" | "starter" | "creator" | "pro" | "scale" | "business";

const CONCURRENCY_LIMITS: Record<ElevenLabsPlan, number> = {
  free: 2,
  starter: 3,
  creator: 5,
  pro: 10,
  scale: 15,
  business: 15,
};

export function createRequestQueue(plan: ElevenLabsPlan) {
  const concurrency = CONCURRENCY_LIMITS[plan];
  const queue = new PQueue({
    concurrency,
    // Each queued request adds ~50ms to response time,
    // so keep queue depth reasonable
    timeout: 120_000, // 2 minute timeout per request
    throwOnTimeout: true,
  });
  queue.on("error", (error: unknown) => {
    const message = error instanceof Error ? error.message : String(error);
    console.error("[ElevenLabs Queue] Request failed:", message);
  });
  return queue;
}

// --- Usage, e.g. in a separate script (client reads ELEVENLABS_API_KEY) ---
const client = new ElevenLabsClient();
const queue = createRequestQueue("pro"); // 10 concurrent

async function generateWithQueue(voiceId: string, text: string) {
  return queue.add(() =>
    client.textToSpeech.convert(voiceId, {
      text,
      modelId: "eleven_flash_v2_5",
    })
  );
}

// With 20 texts, at most 10 requests are in flight at any moment
const texts: string[] = [/* ... 20 strings ... */];
const results = await Promise.all(
  texts.map((text) => generateWithQueue("21m00Tcm4TlvDq8ikWAM", text))
);
```
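If you would rather not add `p-queue` (recent major versions are ESM-only, which can be awkward in CommonJS projects), the same plan-sized cap can be sketched as a small semaphore. `limitConcurrency` is a hypothetical helper, not part of the ElevenLabs SDK, and it omits p-queue's per-task timeout and events:

```typescript
// Minimal concurrency limiter: at most `limit` tasks run at once; extra
// callers wait in FIFO order until a running task finishes.
export function limitConcurrency(limit: number) {
  let active = 0;
  const waiting: Array<() => void> = [];

  const release = () => {
    const next = waiting.shift();
    if (next) {
      next(); // hand this slot directly to the next waiter; `active` is unchanged
    } else {
      active--;
    }
  };

  return async function run<T>(task: () => Promise<T>): Promise<T> {
    if (active >= limit) {
      // Wait for release() to transfer a slot to us
      await new Promise<void>((resolve) => waiting.push(resolve));
    } else {
      active++;
    }
    try {
      return await task();
    } finally {
      release(); // free the slot even if the task threw
    }
  };
}
```

Usage mirrors Step 3: create `const run = limitConcurrency(10)` for a Pro plan and wrap each `convert` call as `run(() => client.textToSpeech.convert(...))`.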
### Step 4: Exponential Backoff for system_busy
```typescript
// src/elevenlabs/backoff.ts
export interface BackoffConfig {
  maxRetries: number;
  baseDelayMs: number;
  maxDelayMs: number;
  jitterMs: number;
}

const DEFAULT_BACKOFF: BackoffConfig = {
  maxRetries: 5,
  baseDelayMs: 1000,
  maxDelayMs: 32_000,
  jitterMs: 500,
};

const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

export async function withBackoff<T>(
  operation: () => Promise<T>,
  overrides: Partial<BackoffConfig> = {}
): Promise<T> {
  // Merge so callers can override a single field without losing the defaults
  const config = { ...DEFAULT_BACKOFF, ...overrides };
  for (let attempt = 0; attempt <= config.maxRetries; attempt++) {
    try {
      return await operation();
    } catch (error: any) {
      const status: number | undefined = error?.statusCode ?? error?.status;
      const errorType: string | undefined = error?.body?.detail?.status;

      // Retry only 429, 5xx, and errors with no HTTP status (network failures).
      // Other 4xx (400, 401 incl. quota_exceeded, 404, 422, ...) will not succeed on retry.
      const retryable = status === undefined || status === 429 || (status >= 500 && status <= 599);
      if (!retryable || attempt === config.maxRetries) throw error;

      if (errorType === "too_many_concurrent_requests") {
        // The queue already caps concurrency, so a short, linearly growing
        // pause is enough for a slot to free up
        await sleep(50 * (attempt + 1));
        continue;
      }

      // system_busy or 5xx: exponential backoff with jitter
      const exponentialDelay = config.baseDelayMs * 2 ** attempt;
      const jitter = Math.random() * config.jitterMs;
      const delay = Math.min(exponentialDelay + jitter, config.maxDelayMs);
      console.warn(
        `[ElevenLabs] ${errorType ?? status ?? "network error"}. Retry ${attempt + 1}/${config.maxRetries} in ${delay.toFixed(0)}ms`
      );
      await sleep(delay);
    }
  }
  throw new Error("Unreachable");
}
```
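To see what this configuration produces, here is the delay formula from `withBackoff` pulled out as a pure function with the random source injected, so the schedule can be checked deterministically. `backoffDelayMs` is a hypothetical name:

```typescript
// Delay before retry number `attempt + 1`, using the same formula as
// withBackoff: base * 2^attempt plus up to `jitterMs` of random jitter,
// capped at maxDelayMs.
export function backoffDelayMs(
  attempt: number,
  baseDelayMs = 1000,
  maxDelayMs = 32_000,
  jitterMs = 500,
  random: () => number = Math.random
): number {
  const exponential = baseDelayMs * 2 ** attempt;
  return Math.min(exponential + random() * jitterMs, maxDelayMs);
}

// Jitter disabled: the waits before each of the 5 retries when maxRetries = 5
const schedule = [0, 1, 2, 3, 4].map((a) => backoffDelayMs(a, 1000, 32_000, 500, () => 0));
console.log(schedule); // [ 1000, 2000, 4000, 8000, 16000 ]
console.log(schedule.reduce((sum, d) => sum + d, 0)); // 31000
```

So a request that keeps hitting `system_busy` sleeps about 31 s in total (just under 33.5 s with maximum jitter) before `withBackoff` gives up. That fits inside Step 3's 120 s per-task timeout, but the request durations count toward it too, and because Step 6 runs the backoff inside `queue.add`, the retrying request holds a concurrency slot the whole time.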
### Step 5: Quota Monitor
```typescript
// src/elevenlabs/quota-monitor.ts
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";

export interface QuotaStatus {
  used: number;
  limit: number;
  remaining: number;
  pctUsed: number;
  warning: boolean;
}

export class QuotaMonitor {
  private characterCount = 0;
  private characterLimit = 0;
  private lastCheck = 0;

  constructor(
    private client: ElevenLabsClient,
    private warningThresholdPct = 80,
    private checkIntervalMs = 60_000
  ) {}

  /** Returns usage, refreshing from the API at most once per checkIntervalMs. */
  async check(): Promise<QuotaStatus> {
    const now = Date.now();
    if (now - this.lastCheck > this.checkIntervalMs) {
      const user = await this.client.user.get();
      this.characterCount = user.subscription.characterCount;
      this.characterLimit = user.subscription.characterLimit;
      this.lastCheck = now;
    }
    const remaining = this.characterLimit - this.characterCount;
    // Treat a zero limit as fully used instead of dividing by zero
    const pctUsed =
      this.characterLimit > 0 ? (this.characterCount / this.characterLimit) * 100 : 100;
    return {
      used: this.characterCount,
      limit: this.characterLimit,
      remaining,
      pctUsed: Math.round(pctUsed * 10) / 10,
      warning: pctUsed >= this.warningThresholdPct,
    };
  }

  /** Throws if textLength exceeds remaining quota (which may be up to checkIntervalMs stale). */
  async guardRequest(textLength: number): Promise<void> {
    const quota = await this.check();
    if (textLength > quota.remaining) {
      throw new Error(
        `Insufficient quota: need ${textLength} chars, have ${quota.remaining} remaining (${quota.pctUsed}% used)`
      );
    }
    if (quota.warning) {
      console.warn(`[ElevenLabs] Quota warning: ${quota.pctUsed}% used (${quota.remaining} chars remaining)`);
    }
  }
}
```
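Between refreshes, every caller sees the same cached `remaining`, so a burst of concurrent requests can all pass `guardRequest` and together overshoot the quota. One possible fix, sketched under the assumption that a fresh API count already includes earlier requests, is to reserve characters locally and reset the reservations on each sync. `QuotaLedger` and its methods are hypothetical, not SDK APIs:

```typescript
// Tracks characters reserved locally since the last API sync, so concurrent
// guard checks see each other's usage before the server count catches up.
export class QuotaLedger {
  private used = 0;
  private limit = 0;
  private reserved = 0;

  /** Call with fresh values from the subscription endpoint; clears reservations. */
  sync(used: number, limit: number): void {
    this.used = used;
    this.limit = limit;
    this.reserved = 0;
  }

  get remaining(): number {
    return this.limit - this.used - this.reserved;
  }

  /** Reserve `chars`, or throw if that would exceed the quota. */
  reserve(chars: number): void {
    if (chars > this.remaining) {
      throw new Error(`Insufficient quota: need ${chars}, have ${this.remaining}`);
    }
    this.reserved += chars;
  }

  /** Return characters, e.g. for a request cancelled before it was sent. */
  release(chars: number): void {
    this.reserved = Math.max(0, this.reserved - chars);
  }
}
```

Clearing reservations on `sync` is a simplification: requests still in flight at sync time are briefly uncounted. A stricter version would drop each reservation only once its request completes and the server count reflects it.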
### Step 6: Combined Rate-Limited Client
```typescript
// src/elevenlabs/resilient-client.ts
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";
import { createRequestQueue } from "./rate-limiter";
import { withBackoff } from "./backoff";
import { QuotaMonitor } from "./quota-monitor";

// Reuse the queue's plan type so every plan, including "business", is accepted
type Plan = Parameters<typeof createRequestQueue>[0];

export function createResilientClient(plan: Plan = "pro") {
  const client = new ElevenLabsClient({ maxRetries: 0 }); // withBackoff handles retries
  const queue = createRequestQueue(plan);
  const quota = new QuotaMonitor(client);

  return {
    async generateSpeech(voiceId: string, text: string, modelId = "eleven_multilingual_v2") {
      // Fail fast on quota before taking a queue slot
      await quota.guardRequest(text.length);
      // Backoff runs inside the queued task, so a retrying request keeps its slot
      return queue.add(() =>
        withBackoff(() =>
          client.textToSpeech.convert(voiceId, {
            text,
            modelId,
          })
        )
      );
    },
    getQueueStats() {
      return {
        pending: queue.pending, // requests currently running
        size: queue.size, // requests waiting to start
      };
    },
    checkQuota: () => quota.check(),
  };
}
```
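`getQueueStats()` exposes `pending` (running) and `size` (waiting). The Step 3 comments advise keeping queue depth reasonable; one way to act on that is to shed load before enqueueing when the estimated wait is too long. This sketch assumes uniform request latency and treats in-flight requests as if they had just started, so it errs on the pessimistic side. `admissionCheck` and its parameters are hypothetical:

```typescript
export interface QueueStats {
  pending: number; // requests running now
  size: number; // requests waiting to start
}

// Estimate how long a newly added request would wait for a slot,
// and accept it only if that wait is within maxWaitMs.
export function admissionCheck(
  stats: QueueStats,
  concurrency: number,
  avgLatencyMs: number,
  maxWaitMs: number
): { accept: boolean; estimatedWaitMs: number } {
  // A free slot and an empty queue means the request starts immediately
  if (stats.pending < concurrency && stats.size === 0) {
    return { accept: true, estimatedWaitMs: 0 };
  }
  // Otherwise it waits behind `size` queued requests: it is number size + 1
  // in line, and each wave of `concurrency` slots takes ~avgLatencyMs
  const waves = Math.ceil((stats.size + 1) / concurrency);
  const estimatedWaitMs = waves * avgLatencyMs;
  return { accept: estimatedWaitMs <= maxWaitMs, estimatedWaitMs };
}
```

In an HTTP handler you might answer with 503 and a `Retry-After` header when `accept` is false, rather than letting callers pile up behind the 120 s queue timeout.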
## Model Cost Impact on Quota
| Model | Credits per Character | 10,000 Chars Cost |
|-------|-----------------------|-------------------|
| `eleven_v3` | 1.0 | 10,000 credits |
| `eleven_multilingual_v2` | 1.0 | 10,000 credits |
| `eleven_flash_v2_5` | 0.5 | 5,000 credits |
| `eleven_turbo_v2_5` | 0.5 | 5,000 credits |
Use Flash/Turbo models during development to conserve quota.
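These rates translate directly into a pre-flight estimate. `QuotaMonitor.guardRequest` compares raw `text.length` against the remaining quota, which overstates usage for Flash/Turbo if your quota is counted in credits, as the table implies. A sketch that applies the per-character rates instead; the rates are copied from the table above, so check current pricing before relying on them. `estimateCredits` is a hypothetical helper:

```typescript
// Per-character credit rates from the table above (verify against current pricing)
const CREDITS_PER_CHAR = {
  eleven_v3: 1.0,
  eleven_multilingual_v2: 1.0,
  eleven_flash_v2_5: 0.5,
  eleven_turbo_v2_5: 0.5,
} as const;

export type ModelId = keyof typeof CREDITS_PER_CHAR;

// Round up so the estimate never undercounts a partial credit
export function estimateCredits(text: string, modelId: ModelId): number {
  return Math.ceil(text.length * CREDITS_PER_CHAR[modelId]);
}

const sample = "x".repeat(10_000);
console.log(estimateCredits(sample, "eleven_multilingual_v2")); // 10000
console.log(estimateCredits(sample, "eleven_flash_v2_5")); // 5000
```

`text.length` counts UTF-16 code units, so emoji and some rare CJK characters count as two; treat the result as an estimate.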
## Error Handling
| Scenario | Detection | Response |
|----------|-----------|----------|
| Concurrent limit hit | 429 + `too_many_concurrent_requests` | Queue; retry after a short linear pause (50 ms, 100 ms, ...) |
| System busy | 429 + `system_busy` | Exponential backoff (1s, 2s, 4s, 8s...) |
| Quota exhausted | 401 + `quota_exceeded` | Stop requests; alert; wait for reset |
| Server error | 500-599 | Exponential backoff; max 5 retries |
| Client error | 400, 404, 422, or 401 without `quota_exceeded` | Fail immediately; retrying will not help |
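The table can be encoded as one decision function for retry code to consult. A sketch, assuming the HTTP status and the parsed `detail.status` string are already available; the action names are hypothetical, and errors with no HTTP status (network failures, which the table does not cover) are treated as transient, matching Step 4:

```typescript
export type RetryAction = "queue" | "backoff" | "stop" | "fail";

// Map an ElevenLabs error to the response from the Error Handling table
export function actionFor(httpStatus: number | undefined, detailStatus?: string): RetryAction {
  if (httpStatus === undefined) return "backoff"; // no response: likely a network failure
  if (httpStatus === 429) {
    return detailStatus === "too_many_concurrent_requests" ? "queue" : "backoff";
  }
  if (httpStatus === 401 && detailStatus === "quota_exceeded") return "stop"; // alert; wait for reset
  if (httpStatus >= 500 && httpStatus <= 599) return "backoff";
  return "fail"; // bad key, bad request, unknown voice: retrying will not help
}
```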
## Resources
- [ElevenLabs Rate Limits Help](https://help.elevenlabs.io/hc/en-us/articles/19571824571921)
- [ElevenLabs Pricing](https://elevenlabs.io/pricing)
- [p-queue Documentation](https://github.com/sindresorhus/p-queue)
## Next Steps
For security configuration, see `elevenlabs-security-basics`.