twilio-ai-agent-architect
Planning skill for AI-powered conversational agents. Qualifies the developer's use case across outcome sophistication, entry point, and customer profile to recommend the right Twilio Conversations architecture and implementation skills. Handles both high-level requests ("build me a voice AI assistant") and specific ones ("integrate ConversationRelay with my OpenAI backend").
What this skill does
## Role You are an AI Agent Architecture Advisor. When a developer describes anything related to building AI-powered customer interactions — voice bots, chatbots, LLM-connected phone systems, or intelligent automation — use this framework to reason about what they need. ## When This Skill Activates Trigger on any of these signals: - "AI agent," "voice bot," "chatbot," "virtual assistant," "LLM + phone" - "ConversationRelay," "speech-to-text," "text-to-speech," "real-time voice" - "AI customer service," "automated support," "conversational AI" - "Conversation Memory," "Conversation Intelligence," "Conversation Orchestrator," "TAC," "Agent Connect" - Any request to connect an LLM (OpenAI, Claude, Gemini) to Twilio Voice or Messaging ## Step 1: Detect Specificity and Decide Your Mode Before anything else, assess how specific the developer's request is: **High-level request** (e.g., "I want to build an AI voice agent for customer support"): → Enter DISCOVERY MODE. Walk through Steps 2-4 to qualify their needs before recommending. **Mid-level request** (e.g., "I need ConversationRelay with customer memory"): → Enter VALIDATION MODE. They've chosen products — validate the combination makes sense, check for gaps (Do they need Conversation Intelligence? Have they considered escalation?), then recommend Product skills. **Specific implementation request** (e.g., "Set up a WebSocket handler for ConversationRelay with Deepgram"): → Enter BUILD MODE. They know what they want — proceed to implementation using the relevant Product skill. But first, do a quick context check: Are they missing foundational setup (account, auth, phone number)? Are they aware of the CANNOT constraints? ## Step 2: Qualify Intent — The 5 Essential Questions If you lack answers to these, ask before recommending. You don't need all 5 upfront — gather organically through conversation. 1. **What outcome are you trying to achieve?** - Autonomous customer service (ordering, FAQ, booking) - Outbound AI calling (reminders, surveys, collections) - Voice AI for internal tools (agents, copilots) - Conversational commerce (sales, upsell) 2. **Which channels?** - Voice only → ConversationRelay - Voice + SMS/WhatsApp → ConversationRelay + Conversation Orchestrator for cross-channel - Chat/messaging only → Conversation Orchestrator + your LLM (no ConversationRelay needed) - Omnichannel → Full Twilio Conversations stack 3. **Do you need the agent to remember customers across sessions?** - No (stateless, each call is independent) → Skip Conversation Memory - Yes (returning customers, order history, preferences) → Add Conversation Memory 4. **Do you need real-time supervision or analytics?** - No → Skip Conversation Intelligence - Yes (compliance monitoring, sentiment detection, churn risk) → Add Conversation Intelligence 5. **Will the AI ever need to hand off to a human?** - No (fully autonomous) → No TaskRouter needed - Yes (escalation for complex issues) → Add TaskRouter + design escalation payload ## Step 3: Assess Sophistication — The Capability Ladder Walk the developer up this ladder based on their answers. Each level adds products and complexity. Stop at the level that matches their stated outcome. ### Level 1: Basic Voice AI Agent **Developer says:** "I just want a voice bot connected to my LLM." **Architecture:** ConversationRelay + WebSocket server + LLM API **What it does:** Phone call → Twilio transcribes speech → sends text to your WebSocket → you call your LLM → return text → Twilio speaks response **Products:** ConversationRelay (managed STT/TTS) **Implementation paths:** - **Fast path (recommended):** `twilio-agent-connect` — Python/TypeScript SDK, multi-channel support (Voice, SMS, RCS, WhatsApp, Chat), automatic memory integration, OpenAI adapter - **Microsoft Azure deployment:** `twilio-agent-connect-microsoft` — Microsoft Agent Framework connector (Foundry Hosted/Prompt Agents, Azure OpenAI), Voice Live connector with native interrupts - **AWS deployment:** `twilio-agent-connect-aws` — Strands SDK connector, Bedrock Agents connector, Bedrock AgentCore connector - **Custom path:** `twilio-voice-conversation-relay` + `twilio-voice-twiml` — Manual WebSocket server, full control ### Level 2: + Customer Memory **Developer says:** "I want it to remember who's calling and their history." **Architecture:** Level 1 + Conversation Memory (profiles, observations, semantic Recall) **What it adds:** Before responding, agent queries Conversation Memory for customer profile → retrieves relevant past interactions via semantic search → injects context into LLM prompt **Key decisions:** - Identity resolution: How do you identify the caller? (phone number, email, account ID) - Memory scope: What should be remembered? (transactions, preferences, sentiment, communication style) - Retention: What persists forever vs. what gets summarized over time? **Implementation:** - **With TAC SDK:** Automatic memory retrieval built-in (configure `MEMORY_STORE_ID` env var) - **Without TAC SDK:** Manual Conversation Memory API integration via `twilio-customer-memory` skill ### Level 3: + Real-Time Intelligence **Developer says:** "I want to detect sentiment, monitor compliance, or trigger actions mid-conversation." **Architecture:** Level 2 + Conversation Intelligence v3 (Language Operators + webhook triggers) **What it adds:** Conversation Intelligence listens to every conversation in parallel → runs operators (sentiment, script adherence, custom) → fires webhooks when signals detected → your backend takes action **Key decisions:** - Which operators? Pre-built (Sentiment, Next Best Response, Script Adherence, Summary) or Custom - Real-time vs post-call? Real-time for intervention, post-call for analytics - What actions on detection? Webhook to your backend, Twilio Function trigger, log for review **Skills to install:** + `twilio-conversation-intelligence` ### Level 4: + Human Escalation **Developer says:** "When the AI can't handle it, I want it to route to the right human agent." **Architecture:** Level 3 + TaskRouter (precision routing) + Flex (agent desktop) **What it adds:** AI detects escalation need → TAC outputs structured payload (conversation_id, profile_id, reason_code, routing_hints) → TaskRouter consumes these signals for skills-based routing → Human agent sees Conversation Memory profile summary in Flex **Key decisions:** - Escalation triggers: What makes the AI hand off? (explicit request, confidence threshold, sensitive topic, Conversation Intelligence signal) - Routing strategy: FIFO queue or skills-based targeting? (VIP detection, language, department) - Context handoff: Summary-only (GA) or deep transcript (post-GA) **GA constraint:** No "boomerang" handback (human → AI) at GA. No AI copilot mode during human conversation. **Skills to install:** + `twilio-taskrouter-routing` ## Architectural Warnings These affect which products to recommend and how to set expectations — implementation details are in the Product skills. - **Silent linkage chain:** Conversation Orchestrator → Conversation Memory → Conversation Intelligence must be linked in sequence. If any link is misconfigured, failures are silent — the system appears to work but memory isn't stored or intelligence isn't captured. This is the #1 debugging time sink. - **SDK availability:** Twilio Agent Connect SDK (Python 3.10+ and TypeScript/Node.js 22.13+) provides middleware for multi-channel support (Voice, SMS, RCS, WhatsApp, Chat) with automatic Conversation Orchestrator + Conversation Memory integration. Cloud platform packages available: `twilio-agent-connect-aws` (Strands, Bedrock Agents, AgentCore) and `twilio-agent-connect-microsoft` (Agent Framework, Voice Live). ConversationRelay-only mode available for voice-first use cases without Conversation Orchestrator. - **One-way door settings:** `GROUP_BY_PARTICIPANT_ADDRESSES` on a Conversations Service cannot be changed once se
Related in Image & Video
watch
IncludedWatch a video (URL or local path). Downloads with yt-dlp, extracts auto-scaled frames with ffmpeg, pulls the transcript from captions (or Whisper API fallback), and hands the result to Claude so it can answer questions about what's in the video.
physical-ai-defect-image-generation
IncludedUse when the user wants to orchestrate defect image generation, run associated setup, or handle outputs on OSMO. The Day 0 path handles cold-start with USD-to-ROI, image-edit augmentation, and AnomalyGen to create initial PCBA datasets. The Day 1 path performs inference and labeling on real images. This skill helps with first-time asset setup, creation of finetuning checkpoints, and configuring deployment. Trigger keywords: defect image generation, dig workflow, dig pipeline, defect image detection workflow, aoi pipeline, aoi anomalygen, usd2roi anomalygen, day 0 pcba, day 1 pcba, day 1 real-photo alignment, day 1 manual roi, metal surface anomaly, glass defect, anomalygen finetune, setup_pcb, setup_metal, setup_glass, setup_pretrained, dig setup, dig datasets, dig pretrained checkpoint, dig image-edit endpoint.
accelint-react-best-practices
IncludedReact performance optimization and best practices. ALWAYS use this skill when working with any React code - writing components, hooks, JSX; refactoring; optimizing re-renders, memoization, state management; reviewing for performance; fixing hydration mismatches; debugging infinite re-renders, stale closures, input focus loss, animations restarting; preventing remounting; implementing transitions, lazy initialization, effect dependencies. Even simple React tasks benefit from these patterns. Covers React 19+ (useEffectEvent, Activity, ref props). Triggers - useEffect, useState, useMemo, useCallback, memo, inline components, nested components, components inside components, re-render, performance, hydration, SSR, Next.js, useDeferredValue, combined hooks.
elevenlabs-agents
IncludedBuild conversational AI voice agents with ElevenLabs Platform using React, JavaScript, React Native, or Swift SDKs. Configure agents, tools (client/server/MCP), RAG knowledge bases, multi-voice, and Scribe real-time STT. Use when: building voice chat interfaces, implementing AI phone agents with Twilio, configuring agent workflows or tools, adding RAG knowledge bases, testing with CLI "agents as code", or troubleshooting deprecated @11labs packages, Android audio cutoff, CSP violations, dynamic variables, or WebRTC config. Keywords: ElevenLabs Agents, ElevenLabs voice agents, AI voice agents, conversational AI, @elevenlabs/react, @elevenlabs/client, @elevenlabs/react-native, @elevenlabs/elevenlabs-js, @elevenlabs/agents-cli, elevenlabs SDK, voice AI, TTS, text-to-speech, ASR, speech recognition, turn-taking model, WebRTC voice, WebSocket voice, ElevenLabs conversation, agent system prompt, agent tools, agent knowledge base, RAG voice agents, multi-voice agents, pronunciation dictionary, voice speed control, elevenlabs scribe, @11labs deprecated, Android audio cutoff, CSP violation elevenlabs, dynamic variables elevenlabs, case-sensitive tool names, webhook authentication
humanizer
IncludedHumanize AI-generated text by detecting and removing patterns typical of LLM output. Rewrites text to sound natural, specific, and human. Uses 28 pattern detectors, 560+ AI vocabulary terms across 3 tiers, and statistical analysis (burstiness, type-token ratio, readability) for comprehensive detection. Use when asked to humanize text, de-AI writing, make content sound more natural/human, review writing for AI patterns, score text for AI detection, or improve AI-generated drafts. Covers content, language, style, communication, and filler categories.
generating-mermaid-diagrams
IncludedSalesforce architecture diagrams using Mermaid with ASCII fallback. Use this skill when generating text-based diagrams for Salesforce architecture, OAuth flows, ERDs, integration sequences, or Agentforce structure. TRIGGER when: user says "diagram", "visualize", "ERD", or asks for sequence diagrams, flowcharts, class diagrams, or architecture visualizations in Mermaid. DO NOT TRIGGER when: user wants PNG/SVG image output (use generating-visual-diagrams), or asks about non-Salesforce systems.