videodb
Video and audio perception, indexing, and editing. Ingest files/URLs/live streams, build visual/spoken indexes, search with timestamps, edit timelines, add overlays/subtitles, generate media, and create real-time alerts.
What this skill does
# VideoDB Skill
**Perception + memory + actions for video, live streams, and desktop sessions.**
Use this skill when you need to:
## When to Use
- You need video or audio perception, indexing, search, or timeline editing from files, URLs, desktop sessions, or live streams.
- The task involves timestamps, searchable evidence, subtitles, clips, overlays, or real-time monitoring alerts.
- You want one workflow that combines ingestion, understanding, retrieval, and media actions.
## 1) Desktop Perception
- Start/stop a **desktop session** capturing **screen, mic, and system audio**
- Stream **live context** and store **episodic session memory**
- Run **real-time alerts/triggers** on what's spoken and what's happening on screen
- Produce **session summaries**, a searchable timeline, and **playable evidence links**
## 2) Video ingest + stream
- Ingest a **file or URL** and return a **playable web stream link**
- Transcode/normalize: **codec, bitrate, fps, resolution, aspect ratio**
## 3) Index + search (timestamps + evidence)
- Build **visual**, **spoken**, and **keyword** indexes
- Search and return exact moments with **timestamps** and **playable evidence**
- Auto-create **clips** from search results
## 4) Timeline editing + generation
- Subtitles: **generate**, **translate**, **burn-in**
- Overlays: **text/image/branding**, motion captions
- Audio: **background music**, **voiceover**, **dubbing**
- Programmatic composition and exports via **timeline operations**
## 5) Live streams (RTSP) + monitoring
- Connect **RTSP/live feeds**
- Run **real-time visual and spoken understanding** and emit **events/alerts** for monitoring workflows
---
## Common inputs
- Local **file path**, public **URL**, or **RTSP URL**
- Desktop capture request: **start / stop / summarize session**
- Desired operations: get context for understanding, transcode spec, index spec, search query, clip ranges, timeline edits, alert rules
## Common outputs
- **Stream URL**
- Search results with **timestamps** and **evidence links**
- Generated assets: subtitles, audio, images, clips
- **Event/alert payloads** for live streams
- Desktop **session summaries** and memory entries
---
## Canonical prompts (examples)
- "Start desktop capture and alert when a password field appears."
- "Record my session and produce an actionable summary when it ends."
- "Ingest this file and return a playable stream link."
- "Index this folder and find every scene with people, return timestamps."
- "Generate subtitles, burn them in, and add light background music."
- "Connect this RTSP URL and alert when a person enters the zone."
## Running Python code
Before running any VideoDB code, change to the project directory and load environment variables:
```python
from dotenv import load_dotenv
load_dotenv(".env")
import videodb
conn = videodb.connect()
```
This reads `VIDEO_DB_API_KEY` from:
1. Environment (if already exported)
2. Project's `.env` file in current directory
If the key is missing, `videodb.connect()` raises `AuthenticationError` automatically.
Do NOT write a script file when a short inline command works.
When writing inline Python (`python -c "..."`), always use properly formatted code — use semicolons to separate statements and keep it readable. For anything longer than ~3 statements, use a heredoc instead:
```bash
python << 'EOF'
from dotenv import load_dotenv
load_dotenv(".env")
import videodb
conn = videodb.connect()
coll = conn.get_collection()
print(f"Videos: {len(coll.get_videos())}")
EOF
```
## Setup
When the user asks to "setup videodb" or similar:
### 1. Install SDK
```bash
pip install "videodb[capture]" python-dotenv
```
If `videodb[capture]` fails on Linux, install without the capture extra:
```bash
pip install videodb python-dotenv
```
### 2. Configure API key
The user must set `VIDEO_DB_API_KEY` using **either** method:
- **Export in terminal** (before starting Claude): `export VIDEO_DB_API_KEY=your-key`
- **Project `.env` file**: Save `VIDEO_DB_API_KEY=your-key` in the project's `.env` file
Get a free API key at https://console.videodb.io (50 free uploads, no credit card).
**Do NOT** read, write, or handle the API key yourself. Always let the user set it.
## Quick Reference
### Upload media
```python
# URL
video = coll.upload(url="https://example.com/video.mp4")
# YouTube
video = coll.upload(url="https://www.youtube.com/watch?v=VIDEO_ID")
# Local file
video = coll.upload(file_path="/path/to/video.mp4")
```
### Transcript + subtitle
```python
# force=True skips the error if the video is already indexed
video.index_spoken_words(force=True)
text = video.get_transcript_text()
stream_url = video.add_subtitle()
```
### Search inside videos
```python
from videodb.exceptions import InvalidRequestError
video.index_spoken_words(force=True)
# search() raises InvalidRequestError when no results are found.
# Always wrap in try/except and treat "No results found" as empty.
try:
results = video.search("product demo")
shots = results.get_shots()
stream_url = results.compile()
except InvalidRequestError as e:
if "No results found" in str(e):
shots = []
else:
raise
```
### Scene search
```python
import re
from videodb import SearchType, IndexType, SceneExtractionType
from videodb.exceptions import InvalidRequestError
# index_scenes() has no force parameter — it raises an error if a scene
# index already exists. Extract the existing index ID from the error.
try:
scene_index_id = video.index_scenes(
extraction_type=SceneExtractionType.shot_based,
prompt="Describe the visual content in this scene.",
)
except Exception as e:
match = re.search(r"id\s+([a-f0-9]+)", str(e))
if match:
scene_index_id = match.group(1)
else:
raise
# Use score_threshold to filter low-relevance noise (recommended: 0.3+)
try:
results = video.search(
query="person writing on a whiteboard",
search_type=SearchType.semantic,
index_type=IndexType.scene,
scene_index_id=scene_index_id,
score_threshold=0.3,
)
shots = results.get_shots()
stream_url = results.compile()
except InvalidRequestError as e:
if "No results found" in str(e):
shots = []
else:
raise
```
### Timeline editing
**Important:** Always validate timestamps before building a timeline:
- `start` must be >= 0 (negative values are silently accepted but produce broken output)
- `start` must be < `end`
- `end` must be <= `video.length`
```python
from videodb.timeline import Timeline
from videodb.asset import VideoAsset, TextAsset, TextStyle
timeline = Timeline(conn)
timeline.add_inline(VideoAsset(asset_id=video.id, start=10, end=30))
timeline.add_overlay(0, TextAsset(text="The End", duration=3, style=TextStyle(fontsize=36)))
stream_url = timeline.generate_stream()
```
### Transcode video (resolution / quality change)
```python
from videodb import TranscodeMode, VideoConfig, AudioConfig
# Change resolution, quality, or aspect ratio server-side
job_id = conn.transcode(
source="https://example.com/video.mp4",
callback_url="https://example.com/webhook",
mode=TranscodeMode.economy,
video_config=VideoConfig(resolution=720, quality=23, aspect_ratio="16:9"),
audio_config=AudioConfig(mute=False),
)
```
### Reframe aspect ratio (for social platforms)
**Warning:** `reframe()` is a slow server-side operation. For long videos it can take
several minutes and may time out. Best practices:
- Always limit to a short segment using `start`/`end` when possible
- For full-length videos, use `callback_url` for async processing
- Trim the video on a `Timeline` first, then reframe the shorter result
```python
from videodb import ReframeMode
# Always prefer reframing a short segment:
reframed = video.reframe(start=0, end=60, target="vertical", mode=ReframeMode.smart)
# Async reframe for full-length videos (returns None, result via webhook):
video.reframe(target="Related in media
youtube-music-cast
IncludedDownload music from YouTube/YouTube Music and stream to Chromecast via Home Assistant. Complete CLI toolset with web server integration, configuration wizard, and playback controls.
faceswap
IncludedAI Face Swap - Swap face in video using verging.ai API. Supports local video files, YouTube/Bilibili URLs, local and remote face images. Auto-download, trimming, real-time progress.
background-remover
IncludedAI Background Removal - Remove background from images, create transparent PNG, one-click background remover for e-commerce product photos, portrait headshots, design materials. Supports JPG, PNG, WebP local files and remote URLs.
Spotify Automation
IncludedAutomate Spotify music playback, playlist management, and audio analysis workflows
videodb-skills
IncludedUpload, stream, search, edit, transcribe, and generate AI video and audio using the VideoDB SDK.