agents-py

Build LiveKit Agent backends in Python. Use this skill when creating voice AI agents, voice assistants, or any realtime AI application using LiveKit's Python Agents SDK (livekit-agents). Covers AgentSession, Agent class, function tools, STT/LLM/TTS models, turn detection, and multi-agent workflows.


# LiveKit Agents Python SDK

Build voice AI agents with LiveKit's Python Agents SDK.

## LiveKit MCP server tools

This skill works alongside the LiveKit MCP server, which provides direct access to the latest LiveKit documentation, code examples, and changelogs. Use these tools when you need up-to-date information that may have changed since this skill was created.

**Available MCP tools:**
- `docs_search` - Search the LiveKit docs site
- `get_pages` - Fetch specific documentation pages by path
- `get_changelog` - Get recent releases and updates for LiveKit packages
- `code_search` - Search LiveKit repositories for code examples
- `get_python_agent_example` - Browse 100+ Python agent examples

**When to use MCP tools:**
- You need the latest API documentation or feature updates
- You're looking for recent examples or code patterns
- You want to check if a feature has been added in recent releases
- The local references don't cover a specific topic

**When to use local references:**
- You need quick access to core concepts covered in this skill
- You're working offline or want faster access to common patterns
- The information in the references is sufficient for your needs

Use MCP tools and local references together for the best experience.

## References

Consult these resources as needed:

- ./references/livekit-overview.md -- LiveKit ecosystem overview and how these skills work together
- ./references/agent-session.md -- AgentSession lifecycle, events, and configuration
- ./references/tools.md -- Function tools, RunContext, and tool results
- ./references/models.md -- STT, LLM, TTS model strings and plugin configuration
- ./references/workflows.md -- Multi-agent handoffs, Tasks, TaskGroups, and pipeline nodes

## Installation

```bash
uv add "livekit-agents[silero,turn-detector]~=1.3" \
  "livekit-plugins-noise-cancellation~=0.2" \
  "python-dotenv"
```

## Environment variables

Use the LiveKit CLI to load your credentials into a `.env.local` file:

```bash
lk app env -w
```

Or manually create a `.env.local` file:

```bash
LIVEKIT_API_KEY=your_api_key
LIVEKIT_API_SECRET=your_api_secret
LIVEKIT_URL=wss://your-project.livekit.cloud
```
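
A missing credential tends to surface only when the agent tries to connect, so it can help to check for the three variables at startup. The following is a minimal, standard-library-only sketch; `missing_livekit_env` is a hypothetical helper, not part of the SDK.

```python
import os
from typing import Mapping

# The three variables listed in .env.local above.
REQUIRED_VARS = ("LIVEKIT_API_KEY", "LIVEKIT_API_SECRET", "LIVEKIT_URL")


def missing_livekit_env(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_livekit_env()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All LiveKit credentials are set.")
```

In an agent script, a check like this would run right after `load_dotenv(".env.local")`.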

## Quick start

### Basic agent with STT-LLM-TTS pipeline

```python
from dotenv import load_dotenv
from livekit import agents, rtc
from livekit.agents import AgentSession, Agent, AgentServer, room_io
from livekit.plugins import noise_cancellation, silero
from livekit.plugins.turn_detector.multilingual import MultilingualModel

load_dotenv(".env.local")

class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="""You are a helpful voice AI assistant.
            Keep responses concise, 1-3 sentences. No markdown or emojis.""",
        )

server = AgentServer()

@server.rtc_session()
async def entrypoint(ctx: agents.JobContext):
    session = AgentSession(
        stt="assemblyai/universal-streaming:en",
        llm="openai/gpt-4.1-mini",
        tts="cartesia/sonic-3:9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
        vad=silero.VAD.load(),
        turn_detection=MultilingualModel(),
    )

    await session.start(
        room=ctx.room,
        agent=Assistant(),
        room_options=room_io.RoomOptions(
            audio_input=room_io.AudioInputOptions(
                noise_cancellation=lambda params: noise_cancellation.BVCTelephony()
                    if params.participant.kind == rtc.ParticipantKind.PARTICIPANT_KIND_SIP
                    else noise_cancellation.BVC(),
            ),
        ),
    )

    await session.generate_reply(
        instructions="Greet the user and offer your assistance."
    )

if __name__ == "__main__":
    agents.cli.run_app(server)
```
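
The `noise_cancellation` lambda in the example above chooses a filter per participant. The sketch below isolates that selection logic so it can be read and tested without the SDK. `ParticipantKind`, `Participant`, `Params`, and `pick_filter` are hypothetical stand-ins for the real `rtc` and `noise_cancellation` objects, and the function returns filter names as strings rather than filter instances.

```python
from dataclasses import dataclass
from enum import Enum


class ParticipantKind(Enum):
    """Stand-in for rtc.ParticipantKind; only the two kinds needed here."""
    STANDARD = "standard"
    SIP = "sip"


@dataclass
class Participant:
    kind: ParticipantKind


@dataclass
class Params:
    """Stand-in for the object passed to the noise_cancellation callable."""
    participant: Participant


def pick_filter(params: Params) -> str:
    # Same conditional as the lambda: the telephony filter for SIP callers,
    # the general-purpose filter for everyone else.
    if params.participant.kind == ParticipantKind.SIP:
        return "BVCTelephony"
    return "BVC"


print(pick_filter(Params(Participant(ParticipantKind.SIP))))       # BVCTelephony
print(pick_filter(Params(Participant(ParticipantKind.STANDARD))))  # BVC
```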

### Basic agent with realtime model

This example uses the OpenAI plugin, which the installation command above does not include. Add it with the `openai` extra (`livekit-agents[openai]`) and set `OPENAI_API_KEY` in `.env.local`.

```python
from dotenv import load_dotenv
from livekit import agents, rtc
from livekit.agents import AgentSession, Agent, AgentServer, room_io
from livekit.plugins import openai, noise_cancellation

load_dotenv(".env.local")

class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="You are a helpful voice AI assistant."
        )

server = AgentServer()

@server.rtc_session()
async def entrypoint(ctx: agents.JobContext):
    session = AgentSession(
        llm=openai.realtime.RealtimeModel(voice="coral")
    )

    await session.start(
        room=ctx.room,
        agent=Assistant(),
        room_options=room_io.RoomOptions(
            audio_input=room_io.AudioInputOptions(
                noise_cancellation=lambda params: noise_cancellation.BVCTelephony()
                    if params.participant.kind == rtc.ParticipantKind.PARTICIPANT_KIND_SIP
                    else noise_cancellation.BVC(),
            ),
        ),
    )

    await session.generate_reply(
        instructions="Greet the user and offer your assistance."
    )

if __name__ == "__main__":
    agents.cli.run_app(server)
```

## Core concepts

### Agent class

Define agent behavior by subclassing `Agent`:

```python
from livekit.agents import Agent, function_tool

class MyAgent(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="Your system prompt here",
        )

    async def on_enter(self) -> None:
        """Called when agent becomes active."""
        await self.session.generate_reply(
            instructions="Greet the user"
        )

    async def on_exit(self) -> None:
        """Called before agent hands off to another agent."""
        pass

    @function_tool()
    async def my_tool(self, param: str) -> str:
        """Tool description for the LLM."""
        return f"Result: {param}"
```

### AgentSession

The session orchestrates the voice pipeline:

```python
session = AgentSession(
    stt="assemblyai/universal-streaming:en",
    llm="openai/gpt-4.1-mini",
    tts="cartesia/sonic-3:voice_id",
    vad=silero.VAD.load(),
    turn_detection=MultilingualModel(),
)
```

Key methods:
- `session.start(room, agent)` - Start the session
- `session.say(text)` - Speak text directly
- `session.generate_reply(instructions)` - Generate LLM response
- `session.interrupt()` - Stop current speech
- `session.update_agent(new_agent)` - Switch to different agent
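
Switching agents ties back to the lifecycle hooks on `Agent`: the outgoing agent's `on_exit` runs before the handoff, and the incoming agent's `on_enter` runs once it becomes active. The toy model below illustrates only that ordering. `ToyAgent` and `ToySession` are hypothetical stand-ins, not the SDK classes, and the real `update_agent` may schedule the transition differently.

```python
import asyncio
from typing import Optional


class ToyAgent:
    """Hypothetical stand-in for Agent that records its lifecycle hooks."""

    def __init__(self, name: str, log: list[str]) -> None:
        self.name = name
        self.log = log

    async def on_enter(self) -> None:
        self.log.append(f"{self.name}.on_enter")

    async def on_exit(self) -> None:
        self.log.append(f"{self.name}.on_exit")


class ToySession:
    """Hypothetical stand-in for AgentSession; models only agent switching."""

    def __init__(self) -> None:
        self.agent: Optional[ToyAgent] = None

    async def start(self, agent: ToyAgent) -> None:
        self.agent = agent
        await agent.on_enter()

    async def update_agent(self, new_agent: ToyAgent) -> None:
        # The outgoing agent exits before the incoming agent becomes active.
        if self.agent is not None:
            await self.agent.on_exit()
        self.agent = new_agent
        await new_agent.on_enter()


async def demo() -> list[str]:
    log: list[str] = []
    session = ToySession()
    await session.start(ToyAgent("greeter", log))
    await session.update_agent(ToyAgent("billing", log))
    return log


if __name__ == "__main__":
    print(asyncio.run(demo()))
    # ['greeter.on_enter', 'greeter.on_exit', 'billing.on_enter']
```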

### Function tools

Expose a method to the LLM as a tool with the `@function_tool` decorator. The docstring serves as the tool description:

```python
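from livekit.agents import Agent, RunContext, function_tool

class WeatherAgent(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You can look up the weather.")

    # Tools are defined as methods on the agent, which is why they take `self`.
    @function_tool()
    async def get_weather(self, context: RunContext, location: str) -> str:
        """Get the current weather for a location."""
        # Placeholder result; replace with a real weather lookup.
        return f"Weather in {location}: Sunny, 72°F"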
```
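
A tool is described to the LLM through its name, docstring, and typed parameters. The sketch below shows how such a description could be derived from a function signature using only the standard library. `describe_tool` is a hypothetical helper and its output shape is an assumption, not the schema LiveKit generates. Skipping `self` and `context` by name is a simplification made for this sketch.

```python
import inspect
from typing import Any, Callable, get_type_hints


def describe_tool(fn: Callable[..., Any]) -> dict[str, Any]:
    """Build a simple tool description from a function's signature."""
    hints = get_type_hints(fn)
    parameters: dict[str, str] = {}
    for name in inspect.signature(fn).parameters:
        # In this sketch, `self` and `context` are treated as supplied by
        # the framework rather than by the LLM, so they are left out.
        if name in ("self", "context"):
            continue
        hint = hints.get(name, str)
        parameters[name] = getattr(hint, "__name__", str(hint))
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": parameters,
    }


async def get_weather(self, context: object, location: str) -> str:
    """Get the current weather for a location."""
    return f"Weather in {location}: Sunny, 72°F"


print(describe_tool(get_weather))
# {'name': 'get_weather', 'description': 'Get the current weather for a location.', 'parameters': {'location': 'str'}}
```

This is why a clear docstring and precise type hints matter: they are the only information the model has when deciding whether and how to call the tool.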

## Running the agent

```bash
# Development mode with auto-reload
uv run agent.py dev

# Console mode (local testing)
uv run agent.py console

# Production mode
uv run agent.py start

# Download required model files
uv run agent.py download-files
```

## LiveKit Inference model strings

Model strings route STT, LLM, and TTS through LiveKit Inference, so you don't need to manage individual provider API keys:

**STT (Speech-to-Text)**:
- `"assemblyai/universal-streaming:en"` - AssemblyAI streaming
- `"deepgram/nova-3:en"` - Deepgram Nova
- `"cartesia/ink"` - Cartesia STT

**LLM (Large Language Model)**:
- `"openai/gpt-4.1-mini"` - GPT-4.1 mini (recommended)
- `"openai/gpt-4.1"` - GPT-4.1
- `"openai/gpt-5"` - GPT-5
- `"gemini/gemini-3-flash"` - Gemini 3 Flash
- `"gemini/gemini-2.5-flash"` - Gemini 2.5 Flash

**TTS (Text-to-Speech)**:
- `"cartesia/sonic-3:{voice_id}"` - Cartesia Sonic 3
- `"elevenlabs/eleven_turbo_v2_5:{voice_id}"` - ElevenLabs
- `"deepgram/aura:{voice}"` - Deepgram Aura
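
The strings listed above share a visible shape: a provider, a slash, a model name, and an optional suffix after a colon (a language code for STT, a voice for TTS). The sketch below splits a string along those lines. The `provider/model[:variant]` grammar is inferred from the examples here rather than taken from a specification, and `parse_model_string` is a hypothetical helper, not an SDK function.

```python
from typing import NamedTuple, Optional


class ModelString(NamedTuple):
    provider: str
    model: str
    variant: Optional[str]  # language code for STT, voice for TTS


def parse_model_string(value: str) -> ModelString:
    """Split "provider/model[:variant]" into its parts."""
    provider, sep, rest = value.partition("/")
    model, _, variant = rest.partition(":")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider/model', got {value!r}")
    return ModelString(provider, model, variant or None)


print(parse_model_string("deepgram/nova-3:en"))
# ModelString(provider='deepgram', model='nova-3', variant='en')
print(parse_model_string("openai/gpt-4.1-mini"))
# ModelString(provider='openai', model='gpt-4.1-mini', variant=None)
```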

## Best practices

1. **Always use LiveKit Inference model strings** as the default for STT, LLM, and TTS. This eliminates the need to manage individual provider API keys. Only use plugins when you specifically need custom models, voice cloning, Anthropic Claude, or self-hosted models.
2. **Use adaptive noise cancellation** with a lambda to detect SIP participants and apply appropriate noise cancellation (BVCTelephony for phone calls, BVC for standard participants).
3. **Use MultilingualModel turn detection** for natur

Files: 6

Size: 37.1 KB

Source: https://github.com/codestackr/livekit-skills/tree/main/plugins/livekit-agents-py/skills/agents-py
