Zoom Video SDK for Windows - C++ integration for video sessions, raw audio/video capture, screen sharing, recording, and real-time communication
# Zoom Video SDK - Windows Development
Expert guidance for developing with the Zoom Video SDK on Windows. This SDK enables custom video applications, raw media capture/injection, cloud recording, live streaming, and real-time transcription on Windows platforms.
**Official Documentation**: https://developers.zoom.us/docs/video-sdk/windows/
**API Reference**: https://marketplacefront.zoom.us/sdk/custom/windows/
**Sample Repository**: https://github.com/zoom/videosdk-windows-rawdata-sample
## Quick Links
**New to Video SDK? Follow this path:**
1. **[SDK Architecture Pattern](concepts/sdk-architecture-pattern.md)** - Universal 3-step pattern for ANY feature
2. **[Session Join Pattern](examples/session-join-pattern.md)** - Complete working code to join a session
3. **[Windows Message Loop](troubleshooting/windows-message-loop.md)** - **CRITICAL**: Fix callbacks not firing
4. **[Video Rendering](examples/video-rendering.md)** - Display video with Canvas API
**Reference:**
- **[Singleton Hierarchy](concepts/singleton-hierarchy.md)** - 5-level SDK navigation map
- **[API Reference](references/windows-reference.md)** - Methods, error codes, timing rules
- **[Delegate Methods](references/delegate-methods.md)** - All 80+ callback methods
- **[Sample Applications](references/samples.md)** - Official samples guide
- **[windows.md](windows.md)** - Secondary overview doc (pointer-style)
- **[SKILL.md](SKILL.md)** - Complete documentation navigation
**Having issues?**
- Callbacks not firing → [Windows Message Loop](troubleshooting/windows-message-loop.md)
- Build errors → [Build Errors Guide](troubleshooting/build-errors.md)
- Video subscribe fails → [Video Rendering](examples/video-rendering.md) (subscribe in `onUserVideoStatusChanged`)
- Quick diagnostics → [Common Issues](troubleshooting/common-issues.md)
**Building a Custom UI?**
- [Canvas vs Raw Data](concepts/canvas-vs-raw-data.md) - Choose your rendering approach
- [Raw Video Capture](examples/raw-video-capture.md) - YUV420 frame processing
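If you render raw YUV420 frames yourself instead of using the Canvas API, each pixel's Y, U and V samples must be converted to RGB before display. The sketch below is a minimal version that assumes BT.601 limited-range coefficients. Confirm the actual color space of your frames against the SDK documentation. `Rgb` and `yuvToRgb` are hypothetical names, not SDK APIs.

```cpp
#include <algorithm>
#include <cstdint>

struct Rgb { std::uint8_t r, g, b; };

// Clamp a fixed-point (x256) value to a byte without right-shifting negatives.
static std::uint8_t toByte(int fixed) {
    if (fixed < 0) return 0;
    return static_cast<std::uint8_t>(std::min(fixed >> 8, 255));
}

// Hypothetical helper (not an SDK API): BT.601 limited-range YUV -> RGB,
// integer approximation. Assumes Y in [16, 235] and U/V centered on 128.
Rgb yuvToRgb(std::uint8_t y, std::uint8_t u, std::uint8_t v) {
    const int c = static_cast<int>(y) - 16;
    const int d = static_cast<int>(u) - 128;
    const int e = static_cast<int>(v) - 128;
    return Rgb{
        toByte(298 * c + 409 * e + 128),           // R
        toByte(298 * c - 100 * d - 208 * e + 128), // G
        toByte(298 * c + 516 * d + 128),           // B
    };
}
```

In I420, one U sample and one V sample are shared by each 2x2 block of luma pixels. For luma pixel (x, y), use the chroma samples at (x/2, y/2).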
## SDK Overview
The Zoom Video SDK for Windows is a C++ library that provides:
- **Session Management**: Join/leave video SDK sessions
- **Raw Data Access**: Capture raw audio/video frames (YUV420, PCM)
- **Raw Data Injection**: Send custom audio/video into sessions
- **Screen Sharing**: Share screens or inject custom share sources
- **Cloud Recording**: Record sessions to Zoom cloud
- **Live Streaming**: Stream to RTMP endpoints (YouTube, etc.)
- **Chat & Commands**: In-session messaging and command channels
- **Live Transcription**: Real-time speech-to-text
- **Subsessions**: Breakout room support
- **Whiteboard**: Collaborative whiteboard features
- **Annotations**: Screen share annotations
- **C# Integration**: C++/CLI wrapper for .NET applications
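Raw audio access delivers PCM buffers. A level meter is a common first use of those buffers. The sketch below is minimal and assumes 16-bit signed, interleaved samples. Verify the sample format, sample rate and channel count against the raw audio object the SDK hands you. `pcm16LevelDbfs` is a hypothetical name.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical helper (not an SDK API): RMS level of 16-bit signed PCM in dBFS.
// Assumes interleaved int16 samples; the RMS is computed over all channels together.
// Returns -infinity for silence, a null buffer, or an empty buffer.
double pcm16LevelDbfs(const std::int16_t* samples, std::size_t count) {
    const double silence = -std::numeric_limits<double>::infinity();
    if (samples == nullptr || count == 0) return silence;

    double sumSquares = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        const double s = samples[i] / 32768.0;  // normalize to [-1.0, 1.0)
        sumSquares += s * s;
    }
    const double rms = std::sqrt(sumSquares / static_cast<double>(count));
    return rms > 0.0 ? 20.0 * std::log10(rms) : silence;
}
```

If the buffer arrives as raw bytes with a length in bytes, the sample count is the byte length divided by 2. Copy the bytes into an `int16_t` array with `memcpy` rather than casting the pointer, which avoids alignment and aliasing problems. A half-scale signal reads about -6 dBFS.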
## Prerequisites
### System Requirements
- **OS**: Windows 10 (1903 or later) or Windows 11
- **Architecture**: x64 (recommended), x86, or ARM64
- **Visual Studio**: 2019 or 2022 (Community, Professional, or Enterprise)
- **Windows SDK**: 10.0.19041.0 or later
- **.NET Framework**: 4.8 or later (for C# applications)
### Visual Studio Workloads
Install these workloads via Visual Studio Installer:
1. **Desktop development with C++**
   - MSVC v142 or v143 compiler
   - Windows 10/11 SDK
   - C++ CMake tools (optional)
2. **.NET desktop development** (for C# applications)
   - .NET Framework 4.8 targeting pack
   - C++/CLI support
## Quick Start
### C++ Application
```cpp
#include <windows.h>
#include "zoom_video_sdk_api.h"
#include "zoom_video_sdk_interface.h"
#include "zoom_video_sdk_delegate_interface.h"

USING_ZOOM_VIDEO_SDK_NAMESPACE

// Defined elsewhere: an instance of your class that implements IZoomVideoSDKDelegate
extern IZoomVideoSDKDelegate* myDelegate;

int main() {
    // 1. Create SDK object
    IZoomVideoSDK* video_sdk_obj = CreateZoomVideoSDKObj();
    if (!video_sdk_obj) return 1;

    // 2. Initialize
    ZoomVideoSDKInitParams init_params;
    init_params.domain = L"https://zoom.us";
    init_params.enableLog = true;
    init_params.logFilePrefix = L"zoom_win_video";
    init_params.videoRawDataMemoryMode = ZoomVideoSDKRawDataMemoryModeHeap;
    init_params.shareRawDataMemoryMode = ZoomVideoSDKRawDataMemoryModeHeap;
    init_params.audioRawDataMemoryMode = ZoomVideoSDKRawDataMemoryModeHeap;
    ZoomVideoSDKErrors err = video_sdk_obj->initialize(init_params);
    if (err != ZoomVideoSDKErrors_Success) return 1;

    // 3. Add event listener
    video_sdk_obj->addListener(myDelegate);

    // 4. Join session (IMPORTANT: set audioOption.connect = false)
    ZoomVideoSDKSessionContext session_context;
    session_context.sessionName = L"my-session";
    session_context.userName = L"Windows User";
    session_context.token = L"your-jwt-token";
    session_context.videoOption.localVideoOn = false;
    session_context.audioOption.connect = false; // Connect audio after join
    session_context.audioOption.mute = true;
    IZoomVideoSDKSession* session = video_sdk_obj->joinSession(session_context);
    if (!session) return 1; // nullptr means the join request failed

    // 5. CRITICAL: Add Windows message pump for callbacks to work
    bool running = true;
    while (running) {
        // Process Windows messages (required for SDK callbacks)
        MSG msg;
        while (PeekMessage(&msg, NULL, 0, 0, PM_REMOVE)) {
            if (msg.message == WM_QUIT) running = false;
            TranslateMessage(&msg);
            DispatchMessage(&msg);
        }
        // Your application logic here
        Sleep(10);
    }
    return 0;
}
```
### C# Application
```csharp
using ZoomVideoSDK; // namespace exposed by your C++/CLI wrapper assembly
var sdkManager = new ZoomSDKManager(); // wrapper class around the native C++ SDK
sdkManager.Initialize();
sdkManager.JoinSession("my-session", "jwt-token", "User Name", "");
```
## Key Features
| Feature | Description |
|---------|-------------|
| **Session Management** | Join, leave, and manage video sessions |
| **Raw Video (YUV I420)** | Capture and inject raw video frames |
| **Raw Audio (PCM)** | Capture and inject raw audio data |
| **Screen Sharing** | Share screens or custom content |
| **Cloud Recording** | Record sessions to Zoom cloud |
| **Live Streaming** | Stream to RTMP endpoints |
| **Chat** | Send/receive chat messages |
| **Command Channel** | Custom command messaging |
| **Live Transcription** | Real-time speech-to-text |
| **C# Support** | .NET Framework integration via a C++/CLI wrapper |
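Raw video frames use the YUV I420 layout: a full-resolution Y (luma) plane, followed by U and V planes at half width and half height. The sketch below shows the size and offset arithmetic. It assumes tightly packed planes with no row padding and even frame dimensions. Real frame objects may expose separate plane pointers and strides, so check the raw data API before relying on contiguity. `I420Layout`, `i420Layout` and `i420ChromaIndex` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical struct (not an SDK type): byte layout of one packed I420 frame.
struct I420Layout {
    std::size_t ySize, uvSize;     // bytes in the Y plane and in each chroma plane
    std::size_t uOffset, vOffset;  // byte offsets of U and V from the frame start
    std::size_t totalSize;         // width * height * 3 / 2
};

// Assumes no row padding (stride == width) and even dimensions.
I420Layout i420Layout(std::size_t width, std::size_t height) {
    assert(width % 2 == 0 && height % 2 == 0);  // chroma is subsampled 2x2
    I420Layout l{};
    l.ySize = width * height;
    l.uvSize = (width / 2) * (height / 2);
    l.uOffset = l.ySize;
    l.vOffset = l.ySize + l.uvSize;
    l.totalSize = l.ySize + 2 * l.uvSize;
    return l;
}

// Index within the U (or V) plane of the chroma sample shared by luma pixel (x, y).
std::size_t i420ChromaIndex(std::size_t x, std::size_t y, std::size_t width) {
    return (y / 2) * (width / 2) + (x / 2);
}
```

For a 640x480 frame, this gives a 307,200-byte Y plane, two 76,800-byte chroma planes and 460,800 bytes in total.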
## Sample Applications
**Official Repository**: https://github.com/zoom/videosdk-windows-rawdata-sample
| Sample | Description |
|--------|-------------|
| VSDK_SkeletonDemo | Minimal session join - **start here** |
| VSDK_getRawVideo | Capture YUV420 video frames |
| VSDK_getRawAudio | Capture PCM audio |
| VSDK_sendRawVideo | Inject custom video (virtual camera) |
| VSDK_sendRawAudio | Inject custom audio (virtual mic) |
| VSDK_CloudRecording | Cloud recording control |
| VSDK_CommandChannel | Custom command messaging |
| VSDK_TranscriptionAndTranslation | Live captions |
**See complete guide**: [Sample Applications Reference](references/samples.md)
## Critical Gotchas and Best Practices
### ⚠️ CRITICAL: Windows Message Pump Required
**The #1 issue that causes session joins to hang with no callbacks:**
All Windows applications using the Zoom SDK **MUST** process Windows messages. The SDK uses Windows messages to deliver callbacks like `onSessionJoin()`, `onError()`, etc.
**Problem**: Without a message pump, `joinSession()` appears to succeed but callbacks never fire.
**Solution**: Add this to your main loop:
```cpp
while (running) {
    // REQUIRED: Process Windows messages
    MSG msg;
    while (PeekMessage(&msg, NULL, 0, 0, PM_REMOVE)) {
        if (msg.message == WM_QUIT) running = false; // allow the loop to exit
        TranslateMessage(&msg);
        DispatchMessage(&msg);
    }
    // Your application logic
    Sleep(10);
}
```
**Applies to**:
- Console applications (no automatic message pump)
- Custom main loops
- Applications that don't use standard WinMain/WndProc
**GUI applications** built on WinMain with a standard message loop already pump messages and need no extra code.
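The failure mode can be modeled without any Windows or SDK code. Callbacks are queued as messages for the thread that owns them, and they run only when that thread drains its queue. The sketch below illustrates that idea under this assumption. It is not the SDK's actual implementation. `MessageQueue`, `post` and `pump` are hypothetical names standing in for `PostMessage` and the `PeekMessage`/`DispatchMessage` loop.

```cpp
#include <functional>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical stand-in for a thread's Windows message queue (illustration only).
class MessageQueue {
public:
    // Producer side (analogous to PostMessage): enqueue a callback, do not run it.
    void post(std::function<void()> callback) {
        std::lock_guard<std::mutex> lock(mutex_);
        queue_.push(std::move(callback));
    }

    // Consumer side (analogous to the PeekMessage/DispatchMessage loop):
    // run everything queued so far and return how many callbacks ran.
    int pump() {
        int dispatched = 0;
        for (;;) {
            std::function<void()> next;
            {
                std::lock_guard<std::mutex> lock(mutex_);
                if (queue_.empty()) break;
                next = std::move(queue_.front());
                queue_.pop();
            }
            next();  // run outside the lock so a callback may post again
            ++dispatched;
        }
        return dispatched;
    }

private:
    std::mutex mutex_;
    std::queue<std::function<void()>> queue_;
};
```

In this model, a posted "joined" callback sits in the queue indefinitely if `pump()` is never called. This matches the symptom described above: `joinSession()` returns, but `onSessionJoin()` never fires.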
### Audio Connection Strategy
**Best Practice**: Set `audioOption.connect = false` when joining, then connect audio in the `onSessionJoin()` callback.
```cpp
// During join
session_context.audioOption.connect = false; // Don't connect yet
session_context.audioOption.mute = true;

// In your IZoomVideoSDKDelegate implementation
// (video_sdk_obj is the IZoomVideoSDK* created at startup)
void onSessionJoin() override {
    IZoomVideoSDKAudioHelper* audioHelper = video_sdk_obj->getAudioHelper();
    if (audioHelper) {
        audioHelper->startAudio(); // Connect audio now that the session is joined
    }
}
```