deepgram-core-workflow-a
Implement production pre-recorded speech-to-text with Deepgram. Use when building audio transcription, batch processing, or implementing diarization and intelligence features. Trigger: "deepgram transcription", "speech to text", "transcribe audio", "batch transcription", "deepgram nova", "diarize audio".
What this skill does
# Deepgram Core Workflow A: Pre-recorded Transcription
## Overview
Production pre-recorded transcription service using Deepgram's REST API. Covers `transcribeUrl` and `transcribeFile`, speaker diarization, audio intelligence (summarization, topic detection, sentiment, intent), batch processing with concurrency control, and callback-based async transcription for large files.
## Prerequisites
- `@deepgram/sdk` installed, `DEEPGRAM_API_KEY` configured
- Audio files: WAV, MP3, FLAC, OGG, M4A, or WebM
- For batch: `p-limit` package (`npm install p-limit`)
## Instructions
### Step 1: Transcription Service Class
```typescript
import { createClient, DeepgramClient } from '@deepgram/sdk';
import { readFileSync } from 'fs';
interface TranscribeOptions {
model?: 'nova-3' | 'nova-2' | 'nova-2-meeting' | 'nova-2-phonecall' | 'base';
language?: string;
diarize?: boolean;
utterances?: boolean;
paragraphs?: boolean;
smart_format?: boolean;
summarize?: boolean; // Audio intelligence
detect_topics?: boolean; // Topic detection
sentiment?: boolean; // Sentiment analysis
intents?: boolean; // Intent recognition
keywords?: string[]; // Keyword boosting: ["term:weight"]
callback?: string; // Async callback URL
}
class DeepgramTranscriber {
private client: DeepgramClient;
constructor(apiKey: string) {
this.client = createClient(apiKey);
}
async transcribeUrl(url: string, opts: TranscribeOptions = {}) {
const { result, error } = await this.client.listen.prerecorded.transcribeUrl(
{ url },
{
model: opts.model ?? 'nova-3',
language: opts.language ?? 'en',
smart_format: opts.smart_format ?? true,
diarize: opts.diarize ?? false,
utterances: opts.utterances ?? false,
paragraphs: opts.paragraphs ?? false,
summarize: opts.summarize ? 'v2' : undefined,
detect_topics: opts.detect_topics ?? false,
sentiment: opts.sentiment ?? false,
intents: opts.intents ?? false,
keywords: opts.keywords,
callback: opts.callback,
}
);
if (error) throw new Error(`Transcription failed: ${error.message}`);
return result;
}
async transcribeFile(filePath: string, opts: TranscribeOptions = {}) {
const audio = readFileSync(filePath);
const mimetype = this.detectMimetype(filePath);
const { result, error } = await this.client.listen.prerecorded.transcribeFile(
audio,
{
model: opts.model ?? 'nova-3',
smart_format: opts.smart_format ?? true,
mimetype,
diarize: opts.diarize ?? false,
utterances: opts.utterances ?? false,
summarize: opts.summarize ? 'v2' : undefined,
detect_topics: opts.detect_topics ?? false,
sentiment: opts.sentiment ?? false,
}
);
if (error) throw new Error(`File transcription failed: ${error.message}`);
return result;
}
private detectMimetype(path: string): string {
const ext = path.split('.').pop()?.toLowerCase();
const map: Record<string, string> = {
wav: 'audio/wav', mp3: 'audio/mpeg', flac: 'audio/flac',
ogg: 'audio/ogg', m4a: 'audio/mp4', webm: 'audio/webm',
};
return map[ext ?? ''] ?? 'audio/wav';
}
}
```
### Step 2: Extract Structured Results
```typescript
function formatResult(result: any) {
const channel = result.results.channels[0];
const alt = channel.alternatives[0];
return {
transcript: alt.transcript,
confidence: alt.confidence,
words: alt.words?.map((w: any) => ({
word: w.word,
start: w.start,
end: w.end,
confidence: w.confidence,
speaker: w.speaker, // Only if diarize: true
punctuated_word: w.punctuated_word,
})),
// Speaker segments (requires utterances: true + diarize: true)
utterances: result.results.utterances?.map((u: any) => ({
speaker: u.speaker,
text: u.transcript,
start: u.start,
end: u.end,
confidence: u.confidence,
})),
// Audio intelligence results
summary: result.results.summary?.short,
topics: result.results.topics?.segments,
sentiments: result.results.sentiments?.segments,
intents: result.results.intents?.segments,
metadata: {
duration: result.metadata.duration,
channels: result.metadata.channels,
model: result.metadata.model_info,
request_id: result.metadata.request_id,
},
};
}
```
### Step 3: Batch Processing
```typescript
import pLimit from 'p-limit';
async function batchTranscribe(
files: string[],
opts: TranscribeOptions = {},
concurrency = 5
) {
const transcriber = new DeepgramTranscriber(process.env.DEEPGRAM_API_KEY!);
const limit = pLimit(concurrency);
const results = await Promise.allSettled(
files.map(file =>
limit(async () => {
const result = await transcriber.transcribeFile(file, opts);
console.log(`Done: ${file} (${result.metadata.duration}s)`);
return { file, result: formatResult(result) };
})
)
);
const succeeded = results.filter(r => r.status === 'fulfilled');
const failed = results.filter(r => r.status === 'rejected');
console.log(`Batch complete: ${succeeded.length} ok, ${failed.length} failed`);
return results;
}
```
### Step 4: Async Callback Transcription (Large Files)
```typescript
// For files >2 hours or when you don't want to hold a connection open,
// use Deepgram's callback feature. Deepgram POSTs results to your URL.
async function submitAsync(audioUrl: string, callbackUrl: string) {
const transcriber = new DeepgramTranscriber(process.env.DEEPGRAM_API_KEY!);
// Deepgram returns a request_id immediately, processes in background
const result = await transcriber.transcribeUrl(audioUrl, {
model: 'nova-3',
diarize: true,
callback: callbackUrl, // Your HTTPS endpoint
});
console.log('Submitted. Request ID:', result.metadata.request_id);
// Deepgram will POST results to callbackUrl when done
// Retries up to 10 times with 30s delay on failure
}
```
### Step 5: Keyword Boosting
```typescript
// Boost domain-specific terms for higher accuracy
const result = await transcriber.transcribeUrl(audioUrl, {
model: 'nova-3',
keywords: [
'Kubernetes:1.5', // Boost weight 1.0-2.0
'PostgreSQL:1.5',
'microservices:1.3',
],
});
```
## Output
- `DeepgramTranscriber` class with URL and file transcription
- Structured result extraction with word-level timing, speakers, and intelligence
- Batch processing with configurable concurrency via `p-limit`
- Async callback pattern for large files
- Keyword boosting for domain vocabulary
## Error Handling
| Error | Cause | Solution |
|-------|-------|----------|
| `400 Bad Request` | Invalid audio format | Verify file header bytes (WAV: `RIFF`, MP3: `0xFFF3/0xFFFB`) |
| `413 Payload Too Large` | File exceeds limit | Use callback URL for async processing |
| Empty transcript | No speech in audio | Check audio volume, try `alternatives: 3` for confidence |
| `408 Timeout` | Long file, sync mode | Switch to callback-based async |
| Low confidence | Background noise | Preprocess: `ffmpeg -i input.wav -af "highpass=f=200,lowpass=f=3000" clean.wav` |
## Resources
- [Pre-recorded API](https://developers.deepgram.com/docs/pre-recorded-audio)
- [Speaker Diarization](https://developers.deepgram.com/docs/diarization)
- [Audio Intelligence](https://deepgram.com/product/audio-intelligence)
- [Summarization](https://developers.deepgram.com/docs/summarization)
- [Callback Feature](https://developers.deepgram.com/docs/callback)
## Next Steps
Proceed to `deepgram-core-workflow-b` for real-time streaming transcription.
Related in Image & Video
watch
IncludedWatch a video (URL or local path). Downloads with yt-dlp, extracts auto-scaled frames with ffmpeg, pulls the transcript from captions (or Whisper API fallback), and hands the result to Claude so it can answer questions about what's in the video.
physical-ai-defect-image-generation
IncludedUse when the user wants to orchestrate defect image generation, run associated setup, or handle outputs on OSMO. The Day 0 path handles cold-start with USD-to-ROI, image-edit augmentation, and AnomalyGen to create initial PCBA datasets. The Day 1 path performs inference and labeling on real images. This skill helps with first-time asset setup, creation of finetuning checkpoints, and configuring deployment. Trigger keywords: defect image generation, dig workflow, dig pipeline, defect image detection workflow, aoi pipeline, aoi anomalygen, usd2roi anomalygen, day 0 pcba, day 1 pcba, day 1 real-photo alignment, day 1 manual roi, metal surface anomaly, glass defect, anomalygen finetune, setup_pcb, setup_metal, setup_glass, setup_pretrained, dig setup, dig datasets, dig pretrained checkpoint, dig image-edit endpoint.
accelint-react-best-practices
IncludedReact performance optimization and best practices. ALWAYS use this skill when working with any React code - writing components, hooks, JSX; refactoring; optimizing re-renders, memoization, state management; reviewing for performance; fixing hydration mismatches; debugging infinite re-renders, stale closures, input focus loss, animations restarting; preventing remounting; implementing transitions, lazy initialization, effect dependencies. Even simple React tasks benefit from these patterns. Covers React 19+ (useEffectEvent, Activity, ref props). Triggers - useEffect, useState, useMemo, useCallback, memo, inline components, nested components, components inside components, re-render, performance, hydration, SSR, Next.js, useDeferredValue, combined hooks.
elevenlabs-agents
IncludedBuild conversational AI voice agents with ElevenLabs Platform using React, JavaScript, React Native, or Swift SDKs. Configure agents, tools (client/server/MCP), RAG knowledge bases, multi-voice, and Scribe real-time STT. Use when: building voice chat interfaces, implementing AI phone agents with Twilio, configuring agent workflows or tools, adding RAG knowledge bases, testing with CLI "agents as code", or troubleshooting deprecated @11labs packages, Android audio cutoff, CSP violations, dynamic variables, or WebRTC config. Keywords: ElevenLabs Agents, ElevenLabs voice agents, AI voice agents, conversational AI, @elevenlabs/react, @elevenlabs/client, @elevenlabs/react-native, @elevenlabs/elevenlabs-js, @elevenlabs/agents-cli, elevenlabs SDK, voice AI, TTS, text-to-speech, ASR, speech recognition, turn-taking model, WebRTC voice, WebSocket voice, ElevenLabs conversation, agent system prompt, agent tools, agent knowledge base, RAG voice agents, multi-voice agents, pronunciation dictionary, voice speed control, elevenlabs scribe, @11labs deprecated, Android audio cutoff, CSP violation elevenlabs, dynamic variables elevenlabs, case-sensitive tool names, webhook authentication
humanizer
IncludedHumanize AI-generated text by detecting and removing patterns typical of LLM output. Rewrites text to sound natural, specific, and human. Uses 28 pattern detectors, 560+ AI vocabulary terms across 3 tiers, and statistical analysis (burstiness, type-token ratio, readability) for comprehensive detection. Use when asked to humanize text, de-AI writing, make content sound more natural/human, review writing for AI patterns, score text for AI detection, or improve AI-generated drafts. Covers content, language, style, communication, and filler categories.
generating-mermaid-diagrams
IncludedSalesforce architecture diagrams using Mermaid with ASCII fallback. Use this skill when generating text-based diagrams for Salesforce architecture, OAuth flows, ERDs, integration sequences, or Agentforce structure. TRIGGER when: user says "diagram", "visualize", "ERD", or asks for sequence diagrams, flowcharts, class diagrams, or architecture visualizations in Mermaid. DO NOT TRIGGER when: user wants PNG/SVG image output (use generating-visual-diagrams), or asks about non-Salesforce systems.