zoom-rtms
Reference skill for Zoom RTMS. Use after routing to a live-media workflow when processing real-time audio, video, chat, transcripts, screen share, or contact-center voice streams.
# Zoom Realtime Media Streams (RTMS)
Background reference for live Zoom media pipelines. Prefer `build-zoom-bot` first, then use this skill for stream types, capabilities, and RTMS-specific implementation constraints.
Expert guidance for accessing live audio, video, transcript, chat, and screen share data in real time from Zoom meetings, webinars, Video SDK sessions, and Zoom Contact Center Voice. RTMS uses a WebSocket-based protocol built on open standards and does not require a meeting bot to capture the media plane.
## Read This First (Critical)
RTMS is primarily a **backend media ingestion service**.
- Your backend receives and processes live media: **audio, video, screen share, chat, transcript**.
- RTMS is not a frontend UI SDK by itself.
- Processing is **event-triggered**: your backend waits for an RTMS start webhook event before it begins handling a stream.
Optional architecture (common):
- Add a **Zoom App SDK** frontend for in-client UI/controls.
- Stream backend RTMS outputs to frontend via **WebSocket** (or SSE, gRPC, queue workers, etc.).
Use RTMS for the media/data plane, and use frontend frameworks or Zoom Apps for presentation and user interaction.
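One way to handle the backend-to-frontend hop is a small fan-out layer that forwards each processed RTMS event (for example, a transcript line) to every connected browser socket. This is a sketch, not part of the RTMS SDK: `TranscriptRelay` is a hypothetical name, and it accepts any socket object with a `send()` method and a `readyState` where `1` means open (the convention used by the `ws` package and browser WebSockets).

```javascript
// Hypothetical fan-out relay: forwards backend RTMS events to frontend sockets.
// Works with any object exposing send(string) and readyState (1 === OPEN),
// which matches the WebSocket convention used by the `ws` package.
const OPEN = 1;

class TranscriptRelay {
  constructor() {
    this.clients = new Set();
  }

  // Register a frontend socket; returns a function that unregisters it.
  add(socket) {
    this.clients.add(socket);
    return () => this.clients.delete(socket);
  }

  // Serialize once, then send to every socket that is still open.
  // Returns how many sockets received the message.
  broadcast(type, data) {
    const message = JSON.stringify({ type, data, sentAt: Date.now() });
    let delivered = 0;
    for (const socket of this.clients) {
      if (socket.readyState === OPEN) {
        socket.send(message);
        delivered += 1;
      } else {
        // Drop sockets that are no longer open so the set does not grow forever.
        this.clients.delete(socket);
      }
    }
    return delivered;
  }
}
```

With the SDK, you might call `relay.broadcast('transcript', { user: metadata.userName, text })` inside `onTranscriptData`, and call `relay.add(socket)` from your WebSocket server's connection handler. Serializing once per event keeps the cost independent of how many viewers are attached.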
**Official Documentation**: https://developers.zoom.us/docs/rtms/
**SDK Reference (JS)**: https://zoom.github.io/rtms/js/
**SDK Reference (Python)**: https://zoom.github.io/rtms/py/
**Sample Repository**: https://github.com/zoom/rtms-samples
## Quick Links
**New to RTMS? Follow this path:**
1. **[Connection Architecture](concepts/connection-architecture.md)** - Two-phase WebSocket design
2. **[SDK Quickstart](examples/sdk-quickstart.md)** - Fastest way to receive media (recommended)
3. **[Manual WebSocket](examples/manual-websocket.md)** - Full protocol control without SDK
4. **[Media Types](references/media-types.md)** - Audio, video, transcript, chat, screen share
**Complete Implementation:**
- **[RTMS Bot](examples/rtms-bot.md)** - End-to-end bot implementation guide
**Reference:**
- **[Lifecycle Flow](concepts/lifecycle-flow.md)** - Complete webhook-to-streaming flow
- **[Data Types](references/data-types.md)** - All enums and constants
- **[Webhooks](references/webhooks.md)** - Event subscription details
- **[Environment Variables](references/environment-variables.md)** - Credential modes and runtime settings
- **[Quickstart Notes](references/quickstart.md)** - Secondary quickstart guide
- **Integrated Index** - see the section below in this file
**Having issues?**
- Connection fails -> [Common Issues](troubleshooting/common-issues.md)
- Duplicate connections -> [Webhook Gotchas](troubleshooting/common-issues.md#webhook-response-timing)
- No audio/video -> [Media Configuration](references/media-types.md)
- Start with preflight checks -> [5-Minute Runbook](RUNBOOK.md)
## Supported Products
| Product | Webhook Event | Payload ID | App Type |
|---------|--------------|------------|----------|
| **Meetings** | `meeting.rtms_started` / `meeting.rtms_stopped` | `meeting_uuid` | General App |
| **Webinars** | `webinar.rtms_started` / `webinar.rtms_stopped` | `meeting_uuid` (same field as meetings) | General App |
| **Video SDK** | `session.rtms_started` / `session.rtms_stopped` | `session_id` | Video SDK App |
| **Zoom Contact Center Voice** | Product-specific RTMS/ZCC Voice events | Product-specific stream/session identifiers | Contact Center / approved RTMS integration |
Once connected, the core signaling/media socket model is shared across products. Meetings, webinars, and Video SDK sessions use the familiar start/stop webhooks. Zoom Contact Center Voice adds its own RTMS/ZCC Voice event family and should be treated as the same transport model with product-specific event payloads.
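Because each product uses a different payload identifier, a manual integration usually normalizes the start/stop webhook into one shape before signing or connecting. The helper below is a hypothetical sketch that covers only the three event families named in the table. It leaves out Zoom Contact Center Voice because those event names and identifier fields are product-specific and not listed here. It also assumes `rtms_stream_id` is present on the payload, as in the manual example later in this file.

```javascript
// Hypothetical normalizer for RTMS start/stop webhooks (meetings, webinars, Video SDK).
// Contact Center Voice is intentionally not handled: its event names differ.
const RTMS_EVENT_MAP = {
  "meeting.rtms_started": { product: "meeting", action: "started", idField: "meeting_uuid" },
  "meeting.rtms_stopped": { product: "meeting", action: "stopped", idField: "meeting_uuid" },
  "webinar.rtms_started": { product: "webinar", action: "started", idField: "meeting_uuid" },
  "webinar.rtms_stopped": { product: "webinar", action: "stopped", idField: "meeting_uuid" },
  "session.rtms_started": { product: "video_sdk", action: "started", idField: "session_id" },
  "session.rtms_stopped": { product: "video_sdk", action: "stopped", idField: "session_id" },
};

// Returns { product, action, idField, idValue, streamId }, or null for non-RTMS events.
function normalizeRtmsEvent(event, payload) {
  const entry = RTMS_EVENT_MAP[event];
  if (!entry) return null;
  const idValue = payload[entry.idField];
  if (!idValue) {
    throw new Error(`${event} payload is missing ${entry.idField}`);
  }
  return { ...entry, idValue, streamId: payload.rtms_stream_id };
}
```

Webinars reuse `meeting_uuid`, so code that branches only on the ID field name cannot tell a meeting from a webinar. Keep the event name (here, `product`) if that distinction matters downstream.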
## RTMS Overview
RTMS is a data pipeline that gives your app access to live media from Zoom meetings, webinars, and Video SDK sessions **without participant bots**. Instead of having automated clients join meetings, use RTMS to collect media data directly from Zoom's infrastructure.
### What RTMS Provides
| Media Type | Format | Use Cases |
|------------|--------|-----------|
| **Audio** | PCM (L16), G.711, G.722, Opus | Transcription, voice analysis, recording |
| **Video** | H.264, JPG, PNG | Recording, AI vision, thumbnails, active participant selection |
| **Screen Share** | H.264, JPG, PNG | Content capture, slide extraction |
| **Transcript** | JSON text | Meeting notes, search, compliance |
| **Chat** | JSON text | Archive, sentiment analysis |
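For raw PCM (L16) audio, chunk duration follows directly from the byte count. Each sample is 2 bytes per channel, so duration = bytes / (sampleRate × channels × 2). The helper below is a sketch. Its 16 kHz mono defaults are assumptions for illustration, so read the actual sample rate and channel count from the audio parameters you negotiated. It does not apply to G.711, G.722, or Opus, which use different encodings.

```javascript
// Bytes per sample for 16-bit linear PCM (L16).
const L16_BYTES_PER_SAMPLE = 2;

// Duration in milliseconds of a raw L16 chunk.
// sampleRate and channels default to 16 kHz mono as an assumption; pass the
// values from your negotiated audio parameters instead.
function pcmChunkDurationMs(byteLength, sampleRate = 16000, channels = 1) {
  const frameBytes = channels * L16_BYTES_PER_SAMPLE;
  if (byteLength % frameBytes !== 0) {
    throw new Error("byteLength is not a whole number of sample frames");
  }
  const bytesPerSecond = sampleRate * frameBytes;
  // Multiply first so common sizes divide exactly.
  return (byteLength * 1000) / bytesPerSecond;
}
```

For example, a 3,200-byte chunk at 16 kHz mono is 3200 / 32000 s, so `pcmChunkDurationMs(3200)` returns `100`.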
### March 2026 Protocol Changes
- **Zoom Contact Center Voice support**: RTMS now covers Contact Center Voice audio and transcript scenarios.
- **Transcript Language Identification control**: transcript media handshakes now support `src_language` and `enable_lid`. LID is enabled by default. Set `enable_lid: false` to force a fixed language.
- **Single individual video stream subscription**: RTMS can now stream one participant's camera feed at a time when `data_opt` is set to `VIDEO_SINGLE_INDIVIDUAL_STREAM`.
- **Graceful client-initiated shutdown**: backends can send `STREAM_CLOSE_REQ` over the signaling socket and wait for `STREAM_CLOSE_RESP`.
- **Media keep-alive tolerance increased**: media socket keep-alive timeout is now **65 seconds**, not 35.
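The longer keep-alive tolerance matters mostly for client-side liveness checks. If your own code declares a media socket dead after 35 seconds of silence, it may tear down connections the server still considers healthy. Below is a minimal sketch of a watchdog that uses the 65-second figure. `KeepAliveWatchdog` is a hypothetical helper, and treating any received message as proof of liveness is an assumption of this sketch. The clock is injected so the logic can be exercised without real timers.

```javascript
// Media socket keep-alive timeout from the March 2026 protocol changes (was 35 s).
const MEDIA_KEEPALIVE_TIMEOUT_MS = 65_000;

// Hypothetical liveness tracker for one media socket.
// Call touch() whenever a message (keep-alive or media) arrives.
class KeepAliveWatchdog {
  constructor(timeoutMs = MEDIA_KEEPALIVE_TIMEOUT_MS, now = () => Date.now()) {
    this.timeoutMs = timeoutMs;
    this.now = now;
    this.lastSeen = now();
  }

  touch() {
    this.lastSeen = this.now();
  }

  // True once no message has arrived for longer than the timeout.
  isExpired() {
    return this.now() - this.lastSeen > this.timeoutMs;
  }
}
```

In practice you would call `touch()` from the socket's message handler and check `isExpired()` on a short interval, reconnecting or closing when it returns `true`.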
### Two Approaches
| Approach | Best For | Complexity |
|----------|----------|------------|
| **SDK** (`@zoom/rtms`) | Most use cases | Low - handles WebSocket complexity |
| **Manual WebSocket** | Custom protocols, other languages | High - full protocol implementation |
## Prerequisites
- **Node.js 20.3.0+** (24 LTS recommended) for JavaScript SDK
- **Python 3.10+** for Python SDK
- Zoom General App (for meetings/webinars) or Video SDK App (for Video SDK) with RTMS feature enabled
- Webhook endpoint for RTMS events
- Server to receive WebSocket streams
> **Need RTMS access?** Post in [Zoom Developer Forum](https://devforum.zoom.us/) requesting RTMS access with your use case.
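Since the JavaScript SDK needs Node.js 20.3.0 or newer, a startup check can fail fast with a clear message instead of failing later in a less obvious way. The comparison below is a small sketch written for this guide, not part of `@zoom/rtms`. It compares only the numeric major.minor.patch parts of `process.versions.node`.

```javascript
// Minimum Node.js version for the @zoom/rtms JavaScript SDK (from Prerequisites).
const MIN_NODE = "20.3.0";

// Compare "a.b.c" version strings numerically. Returns -1, 0, or 1.
function compareVersions(a, b) {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < 3; i++) {
    const x = pa[i] || 0;
    const y = pb[i] || 0;
    if (x !== y) return x < y ? -1 : 1;
  }
  return 0;
}

// Throws if the running (or given) Node.js version is older than MIN_NODE.
function assertSupportedNode(current = process.versions.node) {
  if (compareVersions(current, MIN_NODE) < 0) {
    throw new Error(`Node.js ${MIN_NODE}+ is required for @zoom/rtms; found ${current}`);
  }
}
```

Numeric comparison matters here: a plain string comparison would rank `"20.10.0"` below `"20.3.0"`.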
## Quick Start (SDK - Recommended)
```javascript
import rtms from "@zoom/rtms";

// RTMS start events across products
const RTMS_EVENTS = ["meeting.rtms_started", "webinar.rtms_started", "session.rtms_started"];

// Handle webhook events
rtms.onWebhookEvent(({ event, payload }) => {
  if (!RTMS_EVENTS.includes(event)) return;

  const client = new rtms.Client();

  client.onAudioData((data, timestamp, metadata) => {
    console.log(`Audio from ${metadata.userName}: ${data.length} bytes`);
  });

  client.onTranscriptData((data, timestamp, metadata) => {
    const text = data.toString('utf8');
    console.log(`${metadata.userName}: ${text}`);
  });

  client.onJoinConfirm((reason) => {
    console.log(`Joined session: ${reason}`);
  });

  // SDK handles all WebSocket connections automatically
  // Accepts both meeting_uuid and session_id transparently
  client.join(payload);
});
```
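The quick start creates a new client for every start event and never cleans up. Zoom can retry a webhook when your endpoint does not acknowledge it quickly, which is one cause of the duplicate connections listed under **Having issues?**. One guard is to key clients by `rtms_stream_id` and release them on the matching stop event. The registry below is a hypothetical sketch. The client factory is injected, so you could pass `() => new rtms.Client()`. Calling `leave()` on stop is an assumption to check against the SDK reference.

```javascript
// Hypothetical registry that keeps at most one RTMS client per stream.
// createClient is injected, e.g. () => new rtms.Client() with the @zoom/rtms SDK.
const START_EVENTS = new Set(["meeting.rtms_started", "webinar.rtms_started", "session.rtms_started"]);
const STOP_EVENTS = new Set(["meeting.rtms_stopped", "webinar.rtms_stopped", "session.rtms_stopped"]);

class RtmsStreamRegistry {
  constructor(createClient) {
    this.createClient = createClient;
    this.clients = new Map(); // rtms_stream_id -> client
  }

  // Returns the new client for a first start event, otherwise null.
  handle(event, payload) {
    const streamId = payload.rtms_stream_id;
    if (START_EVENTS.has(event)) {
      if (this.clients.has(streamId)) return null; // duplicate delivery: ignore
      const client = this.createClient();
      this.clients.set(streamId, client);
      client.join(payload);
      return client;
    }
    if (STOP_EVENTS.has(event)) {
      const client = this.clients.get(streamId);
      if (client) {
        this.clients.delete(streamId);
        client.leave(); // assumed cleanup method; confirm in the SDK reference
      }
    }
    return null;
  }
}
```

Because `handle()` calls `join()` immediately, register callbacks such as `onAudioData` inside the factory before it returns the client.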
## Quick Start (Manual WebSocket)
For full control or non-SDK languages, implement the two-phase WebSocket protocol:
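```javascript
const express = require('express'); // assumes an Express app for the webhook endpoint
const WebSocket = require('ws');
const crypto = require('crypto');

// App credentials; these environment variable names are placeholders
const CLIENT_ID = process.env.ZOOM_CLIENT_ID;
const CLIENT_SECRET = process.env.ZOOM_CLIENT_SECRET;

const RTMS_EVENTS = ['meeting.rtms_started', 'webinar.rtms_started', 'session.rtms_started'];

const app = express();
app.use(express.json()); // parse JSON webhook bodies into req.body

// 1. Generate signature: hex HMAC-SHA256 of "clientId,idValue,streamId" keyed with the client secret
// For meetings/webinars: uses meeting_uuid. For Video SDK: uses session_id.
function generateSignature(clientId, idValue, streamId, clientSecret) {
  const message = `${clientId},${idValue},${streamId}`;
  return crypto.createHmac('sha256', clientSecret).update(message).digest('hex');
}

// 2. Handle webhook
app.post('/webhook', (req, res) => {
  res.status(200).send(); // CRITICAL: Respond immediately!
  const { event, payload } = req.body;
  if (RTMS_EVENTS.includes(event)) {
    connectToRTMS(payload);
  }
});

// 3. Connect to signaling WebSocket
function connectToRTMS(payload) {
  const { server_urls, rtms_stream_id } = payload;
  // meeting_uuid for meetings/webinars, session_id for Video SDK
  const idValue = payload.meeting_uuid || payload.session_id;
  const signature = generateSignature(CLIENT_ID, idValue, rtms_stream_id, CLIENT_SECRET);
  // ... open a WebSocket to server_urls and send the signaling handshake with this signature
}

app.listen(3000); // port is an arbitrary example
```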
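In step 3, the signature travels to the signaling server inside a handshake message. The sketch below only builds that message, so it can be checked without a live Zoom stream. `buildSignalingHandshake` is a hypothetical helper. The field names, the `msg_type` value `1`, `protocol_version: 1`, and the random `sequence` are assumptions based on the public RTMS sample code, as is using `session_id` as the key for Video SDK sessions. Confirm them against the [Manual WebSocket](examples/manual-websocket.md) guide before relying on them.

```javascript
const crypto = require("crypto");

// Hex HMAC-SHA256 over "clientId,idValue,streamId" (same scheme as step 1).
function generateSignature(clientId, idValue, streamId, clientSecret) {
  const message = `${clientId},${idValue},${streamId}`;
  return crypto.createHmac("sha256", clientSecret).update(message).digest("hex");
}

// Assumed message type for the signaling handshake request; verify in the protocol docs.
const SIGNALING_HAND_SHAKE_REQ = 1;

// Hypothetical builder for the handshake sent right after the signaling socket opens.
function buildSignalingHandshake(payload, clientId, clientSecret) {
  const streamId = payload.rtms_stream_id;
  // Meetings/webinars carry meeting_uuid; Video SDK sessions carry session_id.
  const idKey = payload.meeting_uuid ? "meeting_uuid" : "session_id";
  const idValue = payload[idKey];
  if (!idValue || !streamId) {
    throw new Error("payload needs rtms_stream_id and meeting_uuid or session_id");
  }
  return {
    msg_type: SIGNALING_HAND_SHAKE_REQ,
    protocol_version: 1, // assumed protocol version
    [idKey]: idValue,
    rtms_stream_id: streamId,
    sequence: Math.floor(Math.random() * 1e9), // assumed request correlation field
    signature: generateSignature(clientId, idValue, streamId, clientSecret),
  };
}
```

Assuming `server_urls` holds a single WebSocket URL, `connectToRTMS` could open `new WebSocket(server_urls)` and, on `open`, send `JSON.stringify(buildSignalingHandshake(payload, CLIENT_ID, CLIENT_SECRET))`, then wait for the handshake response before opening the media socket.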