livekit-voice-agent
Guide for building production-ready LiveKit voice AI agents with multi-agent workflows and intelligent handoffs. Use when creating real-time voice agents that need to transfer control between specialized agents, implement supervisor escalation, or build complex conversational systems.
# LiveKit Voice Agent with Multi-Agent Handoffs
Build production-ready voice AI agents using the LiveKit Agents framework, with support for multi-agent workflows, intelligent handoffs, and specialized agent capabilities.
---
## Overview
LiveKit Agents enables building real-time multimodal AI agents with voice capabilities. This skill helps you create sophisticated voice systems where multiple specialized agents can seamlessly hand off conversations based on context, user needs, or business logic.
### Key Capabilities
- **Multi-Agent Workflows**: Chain multiple specialized agents with different instructions, tools, and models
- **Intelligent Handoffs**: Transfer control between agents using function tools
- **Context Preservation**: Maintain conversation state and user data across agent transitions
- **Flexible Architecture**: Support for lateral handoffs (between peer agents), escalations (to human operators), and returns (back to a previous agent)
- **Production Ready**: Built-in testing, Docker deployment, and monitoring support
---
## Architecture Patterns
### Core Components
1. **AgentSession**: Orchestrates the overall interaction, manages shared services (VAD, STT, LLM, TTS), and holds shared userdata
2. **Agent Classes**: Individual agents with specific instructions, function tools, and optional model overrides
3. **Handoff Mechanism**: Function tools that return new agent instances to transfer control
4. **Shared Context**: UserData dataclass that persists information across agent handoffs
### Workflow Structure
```
┌──────────────────────────────────────────────────────┐
│ AgentSession (Orchestrator)                          │
│ ├─ Shared VAD, STT, TTS, LLM services                │
│ ├─ Shared UserData context                           │
│ └─ Agent lifecycle management                        │
└──────────────────────────────────────────────────────┘
                           │
        ┌──────────────────┼──────────────────┐
        ▼                  ▼                  ▼
┌────────────────┐ ┌────────────────┐ ┌────────────────┐
│ Agent A        │ │ Agent B        │ │ Agent C        │
│ ├─Instructions │ │ ├─Instructions │ │ ├─Instructions │
│ ├─Tools        │ │ ├─Tools        │ │ ├─Tools        │
│ └─Handoff      │ │ └─Handoff      │ │ └─Handoff      │
└────────────────┘ └────────────────┘ └────────────────┘
```
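The relationship in the diagram can be illustrated without the framework. The sketch below is a plain-Python simulation, not LiveKit API: `SharedData`, `SketchAgent`, `SketchSession`, and `handoff_tool` are hypothetical names chosen for illustration. It only shows the idea that a tool returns the next agent, the orchestrator swaps it in, and the shared context survives the swap.

```python
from dataclasses import dataclass, field


@dataclass
class SharedData:
    """Stand-in for the shared userdata held by the session (hypothetical)."""
    notes: list[str] = field(default_factory=list)


class SketchAgent:
    """Stand-in for an agent: a name plus one handoff tool (hypothetical)."""

    def __init__(self, name: str, next_agent: "SketchAgent | None" = None):
        self.name = name
        self.next_agent = next_agent

    def handoff_tool(self, data: SharedData) -> "SketchAgent | None":
        # Record context in the shared data, then return the agent
        # that should take over (None means "stay with this agent").
        data.notes.append(f"{self.name} finished")
        return self.next_agent


class SketchSession:
    """Stand-in for the orchestrator: owns shared data and the active agent."""

    def __init__(self, data: SharedData):
        self.data = data
        self.current: "SketchAgent | None" = None

    def start(self, agent: SketchAgent) -> None:
        self.current = agent

    def call_handoff(self) -> bool:
        # If the tool returns an agent, that agent becomes the active one.
        new_agent = self.current.handoff_tool(self.data)
        if new_agent is None:
            return False
        self.current = new_agent
        return True


agent_c = SketchAgent("Agent C")
agent_b = SketchAgent("Agent B", next_agent=agent_c)
agent_a = SketchAgent("Agent A", next_agent=agent_b)

session = SketchSession(SharedData())
session.start(agent_a)
while session.call_handoff():
    pass
```

After the loop, `session.current` is the last agent in the chain and `session.data.notes` holds one entry per agent, written to the same shared object throughout.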
---
## Implementation Process
### Phase 1: Research and Planning
#### 1.1 Study LiveKit Documentation
**Load core documentation (use WebFetch for each URL):**
- LiveKit Agents Overview: `https://docs.livekit.io/agents/`
- Building Voice Agents: `https://docs.livekit.io/agents/build/`
- Workflows Guide: `https://docs.livekit.io/agents/build/workflows/`
- Testing Framework: `https://docs.livekit.io/agents/build/testing/`
**Study example implementations:**
- Agent Starter Template: `https://github.com/livekit-examples/agent-starter-python`
- Multi-Agent Example: `https://github.com/livekit-examples/multi-agent-python`
- Voice Agent Examples: `https://github.com/livekit/agents/tree/main/examples/voice_agents`
**Load reference documentation:**
- [📋 Agent Best Practices](./reference/agent_best_practices.md)
- [🏗️ Multi-Agent Patterns](./reference/multi_agent_patterns.md)
- [🧪 Testing Guide](./reference/testing_guide.md)
#### 1.2 Define Your Use Case
Determine your agent workflow:
**Customer Support Pattern:**
```
Greeting Agent → Triage Agent → Technical Support → Escalation Agent
```
**Sales Pipeline Pattern:**
```
Intro Agent → Qualification Agent → Demo Agent → Account Executive Handoff
```
**Service Workflow Pattern:**
```
Reception Agent → Information Gathering → Specialist Agent → Confirmation Agent
```
**Plan your agents:**
- List each agent needed
- Define the role and instructions for each
- Identify handoff triggers and conditions
- Specify tools needed per agent
- Determine if agents need different models (STT/LLM/TTS)
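One way to make the planning checklist concrete is to write the plan down as data before any agent code exists. The sketch below is only an illustration: `AgentPlan`, `find_unknown_targets`, and the agent and tool names are hypothetical, loosely following the Customer Support pattern above. It checks that every handoff target refers to an agent that is actually in the plan.

```python
from dataclasses import dataclass, field


@dataclass
class AgentPlan:
    """Planning record for one agent (hypothetical structure)."""
    role: str
    handoff_targets: list[str] = field(default_factory=list)
    tools: list[str] = field(default_factory=list)


def find_unknown_targets(plan: dict[str, AgentPlan]) -> list[tuple[str, str]]:
    """Return (agent, target) pairs whose target is not a planned agent."""
    return [
        (name, target)
        for name, agent in plan.items()
        for target in agent.handoff_targets
        if target not in plan
    ]


# Illustrative plan; names and tools are placeholders.
SUPPORT_PLAN = {
    "greeting": AgentPlan("Welcome the caller", ["triage"]),
    "triage": AgentPlan("Classify the issue", ["technical_support", "escalation"]),
    "technical_support": AgentPlan(
        "Resolve technical issues", ["escalation"], ["lookup_ticket"]
    ),
    "escalation": AgentPlan("Hand over to a human operator"),
}

unknown = find_unknown_targets(SUPPORT_PLAN)
```

An empty `unknown` list means every handoff in the plan points at a defined agent; a typo in a target name shows up as an `(agent, target)` pair.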
#### 1.3 Design Shared Context
Create a dataclass to store information that persists across agents:
```python
from dataclasses import dataclass, field


@dataclass
class ConversationData:
    """Shared context across all agents"""
    user_name: str = ""
    user_email: str = ""
    issue_category: str = ""
    collected_details: list[str] = field(default_factory=list)
    escalation_needed: bool = False
    # Add fields relevant to your use case
```
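A short usage sketch may help show why the list field uses `default_factory`: each session gets its own list, and anything one agent writes is visible to the next agent that receives the same object. The dataclass is repeated so the block runs on its own; `intake_step` and `summarize_for_specialist` are hypothetical helpers, not part of LiveKit.

```python
from dataclasses import dataclass, field


@dataclass
class ConversationData:
    """Same fields as the dataclass above, repeated so the sketch runs alone."""
    user_name: str = ""
    user_email: str = ""
    issue_category: str = ""
    collected_details: list[str] = field(default_factory=list)
    escalation_needed: bool = False


def intake_step(data: ConversationData, name: str, detail: str) -> None:
    """Hypothetical first agent: records who is calling and what they said."""
    data.user_name = name
    data.collected_details.append(detail)


def summarize_for_specialist(data: ConversationData) -> str:
    """Hypothetical later agent: reads what earlier agents collected."""
    details = "; ".join(data.collected_details) or "no details yet"
    return f"{data.user_name or 'Unknown caller'}: {details}"


# Two sessions: each gets an independent collected_details list.
session_a = ConversationData()
session_b = ConversationData()
intake_step(session_a, "Ada", "router keeps rebooting")

summary_a = summarize_for_specialist(session_a)
summary_b = summarize_for_specialist(session_b)
```

Writing to `session_a` leaves `session_b` untouched, which is the behavior a per-session context needs.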
---
### Phase 2: Implementation
#### 2.1 Set Up Project Structure
Use the provided template as a starting point:
```
your-agent-project/
├── src/
│   ├── agent.py                  # Main entry point
│   ├── agents/
│   │   ├── __init__.py
│   │   ├── intro_agent.py        # Initial agent
│   │   ├── specialist_agent.py
│   │   └── escalation_agent.py
│   ├── models/
│   │   └── shared_data.py        # UserData dataclass
│   └── tools/
│       └── custom_tools.py       # Business-specific tools
├── tests/
│   └── test_agent.py             # pytest tests
├── pyproject.toml                # Dependencies with uv
├── .env.example                  # Environment variables template
├── Dockerfile                    # Container definition
└── README.md
```
**Use the quick start script or copy template files:**
- See [⚡ Quick Start Script](./scripts/quickstart.sh) for automated setup
- Or manually copy files from `./templates/` directory
#### 2.2 Initialize Project
**Install uv package manager:**
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```
**Create project with dependencies:**
```bash
# Initialize project
uv init your-agent-project
cd your-agent-project
# Add dependencies
uv add "livekit-agents>=1.3.3"
uv add "livekit-plugins-openai" # For OpenAI LLM & TTS
uv add "livekit-plugins-deepgram" # For Deepgram STT
uv add "livekit-plugins-silero" # For Silero VAD
uv add "python-dotenv" # For environment variables
# Add testing dependencies
uv add --dev "pytest"
uv add --dev "pytest-asyncio"
```
**Set up environment variables:**
```bash
# Copy from template
cp .env.example .env
# Edit with your credentials
# LIVEKIT_URL=wss://your-livekit-server.com
# LIVEKIT_API_KEY=your-api-key
# LIVEKIT_API_SECRET=your-api-secret
# OPENAI_API_KEY=your-openai-key
# DEEPGRAM_API_KEY=your-deepgram-key
```
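A missing credential tends to surface later as a confusing connection or authentication error, so it can help to check the variables at startup. The following is a minimal sketch using only the standard library; `REQUIRED_ENV_VARS` and `missing_env_vars` are hypothetical names, and the list simply mirrors the variables shown above.

```python
import os

# Variable names taken from the .env template above.
REQUIRED_ENV_VARS = (
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "OPENAI_API_KEY",
    "DEEPGRAM_API_KEY",
)


def missing_env_vars(env=None) -> list[str]:
    """Return the required variables that are unset or empty.

    `env` defaults to os.environ; passing a dict makes the check testable.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


example_missing = missing_env_vars({"LIVEKIT_URL": "wss://your-livekit-server.com"})
```

In an entry point this could run right after `load_dotenv()`, raising an error that lists the missing names if the result is non-empty.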
#### 2.3 Implement Core Infrastructure
**Create main entry point (src/agent.py):**
Load the complete template: [🚀 Main Entry Point Template](./templates/main_entry_point.py)
Key patterns:
- Use `prewarm()` to load static resources (VAD models) before sessions start
- Initialize `AgentSession[YourDataClass]` with shared services
- Start with your initial agent in the entrypoint
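- Register the main handler with the worker: the example below passes `entrypoint` and `prewarm` to `WorkerOptions` and starts it with `cli.run_app()`; where the `@server.rtc_session()` decorator is available, it can be used to register the handler instead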
**Example structure:**
```python
import logging

from dotenv import load_dotenv
from livekit.agents import (
    AgentSession,
    JobContext,
    JobProcess,
    WorkerOptions,
    cli,
)
from livekit.plugins import deepgram, openai, silero

from agents.intro_agent import IntroAgent
from models.shared_data import ConversationData

load_dotenv()

logger = logging.getLogger("voice-agent")


def prewarm(proc: JobProcess):
    """Load static resources before sessions start"""
    # Load VAD model once and reuse across sessions
    proc.userdata["vad"] = silero.VAD.load()


async def entrypoint(ctx: JobContext):
    """Main agent entry point"""
    logger.info("Starting voice agent session")

    # Get prewarmed VAD
    vad = ctx.proc.userdata["vad"]

    # Initialize session with shared services
    session = AgentSession[ConversationData](
        vad=vad,
        stt=deepgram.STT(model="nova-2-general"),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=openai.TTS(voice="alloy"),
        userdata=ConversationData(),
    )

    # Connect to room
    await ctx.connect()

    # Start with intro agent
    intro_agent = IntroAgent()

    # Run session (handles all handoffs automatically)
    await session.start(agent=intro_agent, room=ctx.room)


if __name__ == "__main__":
    cli.run_app(
        WorkerOptions(
            entrypoint_fnc=entrypoint,
            prewarm_fnc=prewarm,
        )
    )
```
#### 2.4 Implement Agent Classes
**Agent structure:**
Eac