omnicaptions-download

Use when downloading videos, audio, or captions from YouTube and other video platforms. Supports quality selection.

# Download from Video Platforms

Download videos, audio, and captions from YouTube and 1000+ video platforms using yt-dlp.

## Confirmation Required

**IMPORTANT**: Before running any download command, you MUST get the user's confirmation with `AskUserQuestion`:

1. Show the URL to download
2. Show the quality setting (audio only, or video with its resolution)
3. Show the output directory
4. Ask for confirmation

Example confirmation:
```
Ready to download:
- URL: https://youtube.com/watch?v=xxx
- Type: Audio only / Video (1080p)
- Save to: Current directory

Confirm download?
```

Run the download command only after the user confirms.
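
The four confirmation steps above can be sketched as a small helper that renders the prompt text. This is an illustrative sketch only: `DownloadRequest` and `format_confirmation` are hypothetical names, not part of omnicaptions, and the question itself would still be put to the user through AskUserQuestion.

```python
from dataclasses import dataclass


@dataclass
class DownloadRequest:
    """Hypothetical container for the details shown to the user."""
    url: str
    quality: str = "audio"   # documented CLI default
    output_dir: str = ""     # empty means the current directory


def format_confirmation(req: DownloadRequest) -> str:
    """Render the confirmation text: URL, type, destination, question."""
    if req.quality == "audio":
        kind = "Audio only"
    else:
        kind = f"Video ({req.quality})"
    destination = req.output_dir or "Current directory"
    return "\n".join([
        "Ready to download:",
        f"- URL: {req.url}",
        f"- Type: {kind}",
        f"- Save to: {destination}",
        "",
        "Confirm download?",
    ])


print(format_confirmation(DownloadRequest("https://youtube.com/watch?v=xxx", "1080p")))
```

Keeping the rendering separate from the download call makes it harder to run a command whose parameters differ from what the user approved.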

## When to Use

- Download YouTube videos/audio for offline use
- Extract captions from video platforms
- Get audio for local transcription or editing

## When NOT to Use

- Only a transcription is needed (use `/omnicaptions:transcribe`; Gemini handles URLs directly)
- Converting between existing caption formats (use `/omnicaptions:convert`)

## Setup

```bash
pip install omni-captions-skills --extra-index-url https://lattifai.github.io/pypi/simple/
```

## CLI Usage

**Note**: By default, files are saved to the current working directory. Do not specify `-o` unless the user explicitly requests a different location.

```bash
# Download audio only (default, saves to current directory)
omnicaptions download "https://www.youtube.com/watch?v=VIDEO_ID"

# A bare YouTube video ID also works (validated automatically via yt-dlp)
omnicaptions download e882eXLtwkI

# Download video (1080p recommended)
omnicaptions download "https://youtube.com/watch?v=VIDEO_ID" -q 1080p

# Only use -o when user explicitly requests a different location
omnicaptions download "https://youtube.com/watch?v=VIDEO_ID" -o ./downloads/
```
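
To illustrate the difference between a full URL and a bare video ID, here is a rough pre-check. It assumes the common 11-character YouTube ID shape (letters, digits, `-`, `_`), which is a convention rather than a guarantee. `looks_like_video_id` is a hypothetical helper; as noted in the example above, the real validation is done through yt-dlp.

```python
import re

# Assumption: YouTube IDs are conventionally 11 URL-safe characters.
_VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")


def looks_like_video_id(target: str) -> bool:
    """Return True if target has the shape of a bare YouTube video ID."""
    return _VIDEO_ID.fullmatch(target) is not None


for target in ("e882eXLtwkI", "https://www.youtube.com/watch?v=VIDEO_ID"):
    kind = "bare ID" if looks_like_video_id(target) else "URL or other"
    print(f"{target}: {kind}")
```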

| Option | Description |
|--------|-------------|
| `-o, --output` | Output directory (default: current) |
| `-q, --quality` | Quality: `audio` (default), `best`, `1080p`, `720p`, `480p`, `360p` |
| `-v, --verbose` | Verbose output |
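
As a minimal sketch of how an agent or script might assemble the command, the helper below builds an argument list using only the three flags listed in the table. `build_download_argv` is a hypothetical name, not part of the package. It follows the rule stated above: `-o` is added only when a location was explicitly requested.

```python
import shlex
from typing import List, Optional

QUALITIES = ("audio", "best", "1080p", "720p", "480p", "360p")


def build_download_argv(target: str,
                        quality: str = "audio",
                        output_dir: Optional[str] = None,
                        verbose: bool = False) -> List[str]:
    """Build the argument list for `omnicaptions download`."""
    if quality not in QUALITIES:
        raise ValueError(f"unknown quality preset: {quality!r}")
    argv = ["omnicaptions", "download", target]
    if quality != "audio":       # audio is the default, so -q is omitted
        argv += ["-q", quality]
    if output_dir is not None:   # only when the user asked for a location
        argv += ["-o", output_dir]
    if verbose:
        argv.append("-v")
    return argv


argv = build_download_argv("https://youtube.com/watch?v=VIDEO_ID", quality="1080p")
print(shlex.join(argv))  # shell-quoted form, handy for the confirmation step
```

Once the user has confirmed, the list can be handed to `subprocess.run(argv, check=True)`; passing a list rather than a single string avoids shell-quoting problems with URLs.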

## Quality Presets

| Preset | Description |
|--------|-------------|
| `audio` | Audio only (m4a/mp3), smallest size |
| `1080p` | 1080p video + audio (recommended for video) |
| `720p` | 720p video + audio |
| `480p` | 480p video + audio |
| `360p` | 360p video + audio |
| `best` | Best available quality (may be 4K+, very large) |
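
One way to choose among these presets is sketched below. `pick_preset` is a hypothetical helper, and the selection policy is an assumption based on the table: audio unless video is wanted, 1080p as the recommended video default, otherwise the largest fixed preset that fits a height limit. It never returns `best`, since that preset can produce very large files and is better left as an explicit user choice.

```python
from typing import Optional

# Fixed-resolution presets from the table, largest first.
VIDEO_HEIGHTS = (1080, 720, 480, 360)


def pick_preset(want_video: bool, max_height: Optional[int] = None) -> str:
    """Pick a quality preset name for the -q option."""
    if not want_video:
        return "audio"
    if max_height is None:
        return "1080p"           # recommended for video
    for height in VIDEO_HEIGHTS:
        if height <= max_height:
            return f"{height}p"
    return "360p"                # smallest preset when the limit is lower still


print(pick_preset(want_video=True, max_height=800))
```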

## Output

Downloads produce:
- **Audio/Video file**: `.m4a`, `.mp4`, etc.
- **Captions** (if available): `.vtt` or `.srt`
- **Metadata**: `.meta.json` (video resolution, title, etc. for ASS font scaling)

```
Video: ./VIDEO_ID.mp4
Audio: ./VIDEO_ID.m4a
Caption: ./VIDEO_ID.en.vtt
Metadata: ./VIDEO_ID.meta.json  # Used by convert for auto font size
Title: Video Title Here
```

The `.meta.json` file stores video resolution, which `omnicaptions convert` uses to auto-calculate font size for ASS karaoke output.
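
If a script needs the resulting paths, the summary can be parsed line by line. The sketch below assumes the CLI prints `Label: value` lines in the shape shown above, which is not confirmed here. `parse_summary` is a hypothetical helper, and the sample text is copied from the example rather than from a real run.

```python
from typing import Dict

SAMPLE = """\
Video: ./VIDEO_ID.mp4
Audio: ./VIDEO_ID.m4a
Caption: ./VIDEO_ID.en.vtt
Metadata: ./VIDEO_ID.meta.json
Title: Video Title Here
"""


def parse_summary(text: str) -> Dict[str, str]:
    """Split `Label: value` lines into a dict keyed by lower-cased label."""
    result = {}
    for line in text.splitlines():
        # Split on the first ": " only, so titles containing colons survive.
        label, sep, value = line.partition(": ")
        if sep:  # skip lines that do not have the `Label: value` shape
            result[label.strip().lower()] = value.strip()
    return result


summary = parse_summary(SAMPLE)
print(summary["caption"])
```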

## Supported Platforms

YouTube, Bilibili, Vimeo, Twitter/X, and [1000+ sites](https://github.com/yt-dlp/yt-dlp/blob/master/supportedsites.md).

## Related Skills

| Skill | Use When |
|-------|----------|
| `/omnicaptions:transcribe` | Transcribe downloaded audio/video |
| `/omnicaptions:translate` | Translate captions with Gemini, or with Claude (no API) |
| `/omnicaptions:convert` | Convert caption format |

### Workflow Examples

**Important**: Generate bilingual captions only AFTER LaiCut alignment. Preserve the language tag in the filename (e.g. `xxx.en.vtt`).

```bash
# Has caption: download → LaiCut align (JSON) → convert → translate
omnicaptions download "https://youtube.com/watch?v=xxx"
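# → xxx.en.vtt
# xxx.mp4 exists only after a video download (e.g. -q 1080p); the default audio download produces an audio file such as xxx.m4a
omnicaptions LaiCut xxx.mp4 xxx.en.vtt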
# → xxx.en_LaiCut.json
omnicaptions convert xxx.en_LaiCut.json -o xxx.en_LaiCut.srt
# → xxx.en_LaiCut_Claude_zh.srt (after translate)

# No caption: download → transcribe → LaiCut align (JSON) → convert → translate
omnicaptions download "https://youtube.com/watch?v=xxx"
omnicaptions transcribe xxx.mp4
# → xxx_GeminiUnd.md
omnicaptions LaiCut xxx.mp4 xxx_GeminiUnd.md
# → xxx_GeminiUnd_LaiCut.json
omnicaptions convert xxx_GeminiUnd_LaiCut.json -o xxx_GeminiUnd_LaiCut.srt
# → xxx_GeminiUnd_LaiCut_Claude_zh.srt (after translate)
```
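
The filename chain in the workflows above can be sketched with a few path helpers. These functions are hypothetical and only mirror the names shown in the examples (the `_LaiCut` suffix, then `_Claude_zh` after translation). The actual tools choose their own output names, so treat this as a way to predict filenames, not as the tools' logic.

```python
from pathlib import Path


def aligned_json(caption: str) -> str:
    """xxx.en.vtt -> xxx.en_LaiCut.json (the language tag stays in the stem)."""
    p = Path(caption)
    return str(p.with_name(f"{p.stem}_LaiCut.json"))


def converted_srt(aligned: str) -> str:
    """xxx.en_LaiCut.json -> xxx.en_LaiCut.srt"""
    return str(Path(aligned).with_suffix(".srt"))


def translated_srt(srt: str, engine: str = "Claude", lang: str = "zh") -> str:
    """xxx.en_LaiCut.srt -> xxx.en_LaiCut_Claude_zh.srt"""
    p = Path(srt)
    return str(p.with_name(f"{p.stem}_{engine}_{lang}{p.suffix}"))


for source in ("xxx.en.vtt", "xxx_GeminiUnd.md"):
    step1 = aligned_json(source)
    step2 = converted_srt(step1)
    step3 = translated_srt(step2)
    print(" -> ".join([source, step1, step2, step3]))
```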

Files: 1

Size: 4.2 KB

Complexity: 12/100

Category: Image & Video

Source: https://github.com/lattifai/omni-captions-skills/tree/main/skills/omnicaptions-download
