deepgram-performance-tuning
Optimize Deepgram API performance for faster transcription and lower latency. Use when improving transcription speed, reducing latency, or optimizing audio processing pipelines. Trigger: "deepgram performance", "speed up deepgram", "optimize transcription", "deepgram latency", "deepgram faster", "deepgram throughput".
# Deepgram Performance Tuning
## Overview
Optimize Deepgram transcription performance through audio preprocessing with ffmpeg, model selection for speed vs accuracy, streaming for large files, parallel processing, result caching, and connection reuse. Targets: <2s latency for short files, 100+ files/minute batch throughput.
## Performance Levers
| Factor | Impact | Default | Optimized |
|--------|--------|---------|-----------|
| Audio format | High | Any format | 16kHz mono WAV |
| Model | High | nova-3 | base (speed) or nova-3 (accuracy) |
| File size | High | Full file sync | Stream >60s, callback >5min |
| Concurrency | Medium | Sequential | 50 parallel (p-limit) |
| Caching | Medium | None | Redis, keyed by hash of audio URL + options |
| Features | Medium | All enabled | Disable unused (diarize, utterances) |
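
The "File size" row can be read as a simple routing rule. The sketch below is one hedged way to encode it; the function name `chooseStrategy` and the constant names are illustrative and not part of the Deepgram SDK, and the thresholds are taken directly from the table.

```typescript
type Strategy = 'sync' | 'stream' | 'callback';

// Thresholds from the "File size" row: stream above 60 s, callback above 5 min.
const STREAM_THRESHOLD_SEC = 60;
const CALLBACK_THRESHOLD_SEC = 5 * 60;

// Hypothetical helper: pick a processing strategy from the audio duration in seconds.
function chooseStrategy(durationSec: number): Strategy {
  if (durationSec > CALLBACK_THRESHOLD_SEC) return 'callback';
  if (durationSec > STREAM_THRESHOLD_SEC) return 'stream';
  return 'sync';
}

console.log(chooseStrategy(30));  // short clip
console.log(chooseStrategy(120)); // a few minutes
console.log(chooseStrategy(900)); // long recording
```

Keeping the thresholds in named constants makes them easy to adjust once you have measured latency on your own audio.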
## Instructions
### Step 1: Audio Preprocessing with ffmpeg
```bash
# Optimal format for Deepgram: 16kHz, 16-bit, mono, WAV
#   -ar 16000          16kHz sample rate (ideal for speech)
#   -ac 1              Mono channel
#   -acodec pcm_s16le  16-bit signed LE PCM
# (Comments must not follow a trailing backslash, or the line continuation breaks.)
ffmpeg -i input.mp3 \
  -ar 16000 \
  -ac 1 \
  -acodec pcm_s16le \
  -f wav \
  output.wav

# Remove silence (saves API cost + processing time)
ffmpeg -i input.wav \
  -af "silenceremove=stop_periods=-1:stop_duration=0.5:stop_threshold=-30dB" \
  -ar 16000 -ac 1 -acodec pcm_s16le \
  trimmed.wav

# Band-pass filtering (crude noise reduction) + loudness normalization
ffmpeg -i input.wav \
  -af "highpass=f=200,lowpass=f=3000,loudnorm=I=-16:TP=-1.5:LRA=11" \
  -ar 16000 -ac 1 -acodec pcm_s16le \
  clean.wav
```
```typescript
import { execFileSync } from 'child_process';
import { statSync } from 'fs';

function preprocessAudio(inputPath: string, outputPath: string): {
  originalSize: number;
  optimizedSize: number;
  savings: string;
} {
  const originalSize = statSync(inputPath).size;

  // Pass arguments as an array (no shell) so paths with spaces or quotes cannot break the command
  execFileSync('ffmpeg', [
    '-y', '-i', inputPath,
    '-af', 'silenceremove=stop_periods=-1:stop_duration=0.5:stop_threshold=-30dB,highpass=f=200,lowpass=f=3000',
    '-ar', '16000', '-ac', '1', '-acodec', 'pcm_s16le',
    outputPath,
  ], { stdio: 'ignore' }); // Discard ffmpeg's console output; a non-zero exit still throws

  const optimizedSize = statSync(outputPath).size;
  // Can be negative when the input was a compressed format such as MP3
  const savings = ((1 - optimizedSize / originalSize) * 100).toFixed(1);

  console.log(`Preprocessed: ${inputPath}`);
  console.log(`  Original: ${(originalSize / 1024).toFixed(0)}KB`);
  console.log(`  Optimized: ${(optimizedSize / 1024).toFixed(0)}KB (${savings}% smaller)`);
  return { originalSize, optimizedSize, savings };
}
```
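
To sanity-check the size figures that `preprocessAudio` reports, it can help to work out how large a 16kHz, 16-bit, mono WAV should be. The sketch below is plain arithmetic under stated assumptions: the helper names are hypothetical, the header is assumed to be the canonical 44 bytes (ffmpeg may add metadata chunks), and the 128 kbps MP3 is only an example bitrate.

```typescript
// Size of an uncompressed PCM WAV file, assuming the canonical 44-byte header.
// Real files written by ffmpeg can be slightly larger because of extra metadata chunks.
const WAV_HEADER_BYTES = 44;

function pcmWavBytes(
  durationSec: number,
  sampleRate = 16000,
  channels = 1,
  bitsPerSample = 16,
): number {
  const bytesPerSecond = sampleRate * channels * (bitsPerSample / 8);
  return WAV_HEADER_BYTES + Math.round(durationSec * bytesPerSecond);
}

// Approximate size of a constant-bitrate compressed file (e.g. a 128 kbps MP3).
function compressedBytes(durationSec: number, kbps: number): number {
  return Math.round((durationSec * kbps * 1000) / 8);
}

const wavSize = pcmWavBytes(60);          // 1,920,044 bytes
const mp3Size = compressedBytes(60, 128); // 960,000 bytes
console.log(`WAV: ${wavSize} bytes, 128 kbps MP3: ${mp3Size} bytes`);
```

One minute of audio in the target format is roughly 1.9 MB, so converting from a compressed source usually increases file size. Any reduction in upload size comes from silence removal or from downsampling an uncompressed source, not from the WAV conversion itself.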
### Step 2: Model Selection Strategy
```typescript
import { createClient } from '@deepgram/sdk';

type Priority = 'accuracy' | 'speed' | 'cost';

// audioDurationSec: length of the audio in seconds
function selectModel(priority: Priority, audioDurationSec: number): string {
  // Nova-3: Best accuracy, fast, $0.0043/min (STT)
  // Nova-2: Proven stable, fast, $0.0043/min
  // Base: Fastest, lower accuracy, $0.0048/min
  // Whisper: Multilingual (100+ langs), slower, $0.0048/min
  switch (priority) {
    case 'accuracy':
      return 'nova-3';
    case 'speed':
      return audioDurationSec > 300 ? 'base' : 'nova-2'; // Base for files longer than 5 min
    case 'cost':
      return 'nova-2'; // Same price as Nova-3, slightly faster
    default:
      return 'nova-3';
  }
}

// Feature cost: disable what you don't need
// Pass the real duration so the 'speed' branch can pick 'base' for long files
function optimizedOptions(priority: Priority, audioDurationSec = 0) {
  return {
    model: selectModel(priority, audioDurationSec),
    smart_format: true,  // Free; always enable
    punctuate: true,     // Free; always enable
    // These add processing time:
    diarize: priority === 'accuracy',     // Adds latency
    utterances: priority === 'accuracy',
    paragraphs: priority === 'accuracy',
    summarize: false,      // Only when needed
    detect_topics: false,  // Only when needed
    sentiment: false,      // Only when needed
  };
}
```
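
The per-minute prices quoted in the comments of `selectModel` can be turned into a rough cost estimate. This is a minimal sketch, not a billing calculator: `estimateCostUsd` and the price table are hypothetical names, the keys are illustrative labels, and the figures are copied from the comments above, so check them against current Deepgram pricing before relying on them.

```typescript
// Per-minute prices copied from the comments in selectModel; treat them as assumptions
// and verify against current pricing.
const PRICE_PER_MIN_USD: Record<string, number> = {
  'nova-3': 0.0043,
  'nova-2': 0.0043,
  base: 0.0048,
  whisper: 0.0048,
};

// Hypothetical helper: estimated cost in USD for a total amount of audio in seconds.
function estimateCostUsd(model: string, totalAudioSec: number): number {
  const price = PRICE_PER_MIN_USD[model];
  if (price === undefined) throw new Error(`Unknown model: ${model}`);
  return (totalAudioSec / 60) * price;
}

// 100 minutes of audio on nova-3
console.log(estimateCostUsd('nova-3', 100 * 60).toFixed(4));
```

Because billing is per minute of audio, trimming silence in Step 1 lowers this estimate directly, while switching between models at the same price does not.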
### Step 3: Streaming for Large Files
```typescript
import { createClient, LiveTranscriptionEvents } from '@deepgram/sdk';
import { createReadStream } from 'fs';

async function streamLargeFile(filePath: string): Promise<string> {
  const deepgram = createClient(process.env.DEEPGRAM_API_KEY!);
  const transcripts: string[] = [];

  return new Promise((resolve, reject) => {
    const connection = deepgram.listen.live({
      model: 'nova-3',
      smart_format: true,
      encoding: 'linear16',
      sample_rate: 16000,
      channels: 1,
    });

    connection.on(LiveTranscriptionEvents.Open, () => {
      // Stream file in 32KB chunks
      const stream = createReadStream(filePath, { highWaterMark: 32 * 1024 });
      stream.on('data', (chunk: Buffer) => {
        connection.send(chunk);
      });
      stream.on('end', () => {
        // Signal end of audio
        connection.finish();
      });
      stream.on('error', reject);
    });

    connection.on(LiveTranscriptionEvents.Transcript, (data) => {
      if (data.is_final) {
        const text = data.channel.alternatives[0]?.transcript;
        if (text) transcripts.push(text);
      }
    });

    connection.on(LiveTranscriptionEvents.Close, () => {
      resolve(transcripts.join(' '));
    });

    connection.on(LiveTranscriptionEvents.Error, reject);
  });
}
```
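
The 32KB chunk size used above maps to a fixed amount of audio at the stated format (16kHz, 16-bit, mono). The sketch below works that out; the helper names are hypothetical and the code only does arithmetic, it does not talk to the API.

```typescript
// How much audio (in milliseconds) one chunk of raw PCM represents.
function chunkDurationMs(
  chunkBytes: number,
  sampleRate = 16000,
  channels = 1,
  bytesPerSample = 2,
): number {
  const bytesPerSecond = sampleRate * channels * bytesPerSample;
  return (chunkBytes * 1000) / bytesPerSecond;
}

// How many chunks a file of a given size is split into.
function chunkCount(fileBytes: number, chunkBytes: number): number {
  return Math.ceil(fileBytes / chunkBytes);
}

const chunkBytes = 32 * 1024;
console.log(chunkDurationMs(chunkBytes));     // 1024 ms of audio per chunk
console.log(chunkCount(1920044, chunkBytes)); // 59 chunks for about one minute of audio
```

Each chunk therefore carries about one second of audio. If you want to pace sends at roughly real time instead of as fast as the disk allows, this duration is a reasonable delay between chunks; whether pacing is needed depends on your plan and workload.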
### Step 4: Parallel Batch Processing
```typescript
import { readFile } from 'fs/promises';
import pLimit from 'p-limit';
import { createClient } from '@deepgram/sdk';

async function batchTranscribe(
  files: string[],
  concurrency = 50, // Stay under your plan's concurrency limit
  model = 'nova-3'
) {
  const client = createClient(process.env.DEEPGRAM_API_KEY!);
  const limit = pLimit(concurrency);
  const startTime = Date.now();

  const results = await Promise.allSettled(
    files.map((file, i) =>
      limit(async () => {
        const fileStart = Date.now();
        // Async read keeps the event loop free while other requests are in flight
        const { result, error } = await client.listen.prerecorded.transcribeFile(
          await readFile(file),
          { model, smart_format: true, mimetype: 'audio/wav' }
        );
        if (error) throw error;
        const elapsed = Date.now() - fileStart;
        console.log(`[${i + 1}/${files.length}] ${file}: ${elapsed}ms (${result.metadata.duration}s audio)`);
        return { file, result, elapsed };
      })
    )
  );

  const totalTime = Date.now() - startTime;
  const succeeded = results.filter(r => r.status === 'fulfilled').length;
  console.log(`\nBatch: ${succeeded}/${files.length} in ${totalTime}ms`);
  // Count only successful files so failures do not inflate throughput
  console.log(`Throughput: ${(succeeded / (totalTime / 60000)).toFixed(1)} files/min`);
  return results;
}
```
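
`batchTranscribe` returns the raw `Promise.allSettled` results, so callers still need to separate successes from failures. A minimal sketch of that step follows; `summarizeSettled` and `BatchSummary` are hypothetical names introduced for illustration, and the failed indices map back to positions in the `files` array.

```typescript
interface BatchSummary<T> {
  succeeded: T[];
  failed: { index: number; reason: unknown }[];
}

// Hypothetical helper: split settled results into values and failures (with their input index).
function summarizeSettled<T>(results: PromiseSettledResult<T>[]): BatchSummary<T> {
  const summary: BatchSummary<T> = { succeeded: [], failed: [] };
  results.forEach((r, index) => {
    if (r.status === 'fulfilled') {
      summary.succeeded.push(r.value);
    } else {
      summary.failed.push({ index, reason: r.reason });
    }
  });
  return summary;
}

// Example with hand-built results standing in for real transcription outcomes
const demo: PromiseSettledResult<string>[] = [
  { status: 'fulfilled', value: 'a.wav' },
  { status: 'rejected', reason: new Error('rate limited') },
  { status: 'fulfilled', value: 'c.wav' },
];
const demoSummary = summarizeSettled(demo);
console.log(demoSummary.succeeded.length, demoSummary.failed.map(f => f.index));
```

Keeping the original index for each failure makes it straightforward to retry only those files, for example with a lower concurrency.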
### Step 5: Result Caching
```typescript
import { createHash } from 'crypto';
import Redis from 'ioredis';
import { createClient } from '@deepgram/sdk'; // Needed for the client type below

const redis = new Redis(process.env.REDIS_URL ?? 'redis://localhost:6379');

function cacheKey(audioUrl: string, options: Record<string, any>): string {
  const hash = createHash('sha256')
    .update(audioUrl + JSON.stringify(options))
    .digest('hex');
  return `dg:cache:${hash}`;
}

async function cachedTranscribe(
  client: ReturnType<typeof createClient>,
  url: string,
  options: Record<string, any>,
  ttlSeconds = 3600 // 1 hour default
) {
  const key = cacheKey(url, options);

  // Check cache
  const cached = await redis.get(key);
  if (cached) {
    console.log('Cache hit:', url.substring(0, 60));
    return JSON.parse(cached);
  }

  // Transcribe and cache
  const { result, error } = await client.listen.prerecorded.transcribeUrl(
    { url }, options
  );
  if (error) throw error;

  await redis.setex(key, ttlSeconds, JSON.stringify(result));
  console.log('Cached result:', url.substring(0, 60));
  return result;
}
```
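
One caveat with `cacheKey` as written: `JSON.stringify` preserves property insertion order, so `{ model, smart_format }` and `{ smart_format, model }` hash to different keys and miss each other's cache entries. The sketch below shows one possible fix, serializing options with sorted keys before hashing. `stableStringify` and `stableCacheKey` are hypothetical helpers, not part of any library, and the `dg:cache:` prefix simply mirrors the code above.

```typescript
import { createHash } from 'node:crypto';

// Serialize a JSON-like value with object keys sorted, so key order does not affect the output.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map(k => `${JSON.stringify(k)}:${stableStringify(obj[k])}`);
    return `{${entries.join(',')}}`;
  }
  // Primitives; undefined is mapped to null so the result is always a string
  return JSON.stringify(value) ?? 'null';
}

// Hypothetical order-insensitive variant of cacheKey.
function stableCacheKey(audioUrl: string, options: Record<string, unknown>): string {
  const hash = createHash('sha256')
    .update(audioUrl)
    .update('\n') // Separator so the URL and options cannot run together
    .update(stableStringify(options))
    .digest('hex');
  return `dg:cache:${hash}`;
}

const k1 = stableCacheKey('https://example.com/a.wav', { model: 'nova-3', smart_format: true });
const k2 = stableCacheKey('https://example.com/a.wav', { smart_format: true, model: 'nova-3' });
console.log(k1 === k2); // true
```

Note that both variants key on the URL, not the audio bytes, so a URL whose content changes will keep returning the old transcript until the TTL expires.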
### Step 6: Performance Benchmarking
```typescript
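import { createClient } from '@deepgram/sdk';

async function benchmark(audioUrl: string) {
  const client = createClient(process.env.DEEPGRAM_API_KEY!);
  const models = ['nova-3', 'nova-2', 'base'] as const;

  console.log('Performance Benchmark');
  console.log('='.repeat(60));

  for (const model of models) {
    const times: number[] = [];
    for (let i = 0; i < 3; i++) {
      const start = Date.now();
      const { result, error } = await client.listen.prerecorded.transcribeUrl(
        { url: audioUrl }, { model, smart_format: true }
      );
      // The source is cut off after this call; the rest is an assumed minimal completion.
      if (error) throw error;
      times.push(Date.now() - start);
    }
    const avg = times.reduce((a, b) => a + b, 0) / times.length;
    console.log(`${model}: avg ${avg.toFixed(0)}ms over ${times.length} runs`);
  }
}
```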