omnicaptions-download
Use when downloading videos, audio, or captions from YouTube and other video platforms. Supports quality selection.
# Download from Video Platforms

Download videos, audio, and captions from YouTube and 1000+ video platforms using yt-dlp.

## Confirmation Required

**IMPORTANT**: Before executing any download, you MUST confirm with the user using AskUserQuestion:

1. Show the URL to download
2. Show the quality setting (audio/video)
3. Show the output directory
4. Ask for confirmation

Example confirmation:

```
Ready to download:
- URL: https://youtube.com/watch?v=xxx
- Type: Audio only / Video (1080p)
- Save to: Current directory

Confirm download?
```

Only proceed with the download command after the user confirms.

## When to Use

- Download YouTube videos/audio for offline use
- Extract captions from video platforms
- Get audio for local transcription or editing

## When NOT to Use

- Just need transcription (use `/omnicaptions:transcribe`; Gemini handles URLs directly)
- Converting existing caption formats (use `/omnicaptions:convert`)

## Setup

```bash
pip install omni-captions-skills --extra-index-url https://lattifai.github.io/pypi/simple/
```

## CLI Usage

**Note**: By default, files are saved to the current working directory. Do not specify `-o` unless the user explicitly requests a different location.
```bash
# Download audio only (default, saves to current directory)
omnicaptions download "https://www.youtube.com/watch?v=VIDEO_ID"

# Supports bare YouTube video ID (auto-validates via yt-dlp)
omnicaptions download e882eXLtwkI

# Download video (1080p recommended)
omnicaptions download "https://youtube.com/watch?v=VIDEO_ID" -q 1080p

# Only use -o when user explicitly requests a different location
omnicaptions download "https://youtube.com/watch?v=VIDEO_ID" -o ./downloads/
```

| Option | Description |
|--------|-------------|
| `-o, --output` | Output directory (default: current) |
| `-q, --quality` | Quality: `audio` (default), `best`, `1080p`, `720p`, `480p`, `360p` |
| `-v, --verbose` | Verbose output |

## Quality Presets

| Preset | Description |
|--------|-------------|
| `audio` | Audio only (m4a/mp3), smallest size |
| `1080p` | 1080p video + audio (recommended for video) |
| `720p` | 720p video + audio |
| `480p` | 480p video + audio |
| `360p` | 360p video + audio |
| `best` | Best available quality (may be 4K+, very large) |

## Output

Downloads produce:

- **Audio/Video file**: `.m4a`, `.mp4`, etc.
- **Captions** (if available): `.vtt` or `.srt`
- **Metadata**: `.meta.json` (video resolution, title, etc. for ASS font scaling)

```
Video: ./VIDEO_ID.mp4
Audio: ./VIDEO_ID.m4a
Caption: ./VIDEO_ID.en.vtt
Metadata: ./VIDEO_ID.meta.json  # Used by convert for auto font size
Title: Video Title Here
```

The `.meta.json` file stores video resolution, which `omnicaptions convert` uses to auto-calculate font size for ASS karaoke output.

## Supported Platforms

YouTube, Bilibili, Vimeo, Twitter/X, and [1000+ sites](https://github.com/yt-dlp/yt-dlp/blob/master/supportedsites.md).
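The Quality Presets table does not say how each preset is handed to yt-dlp. As a rough, hypothetical illustration only, the helper below maps each preset name to a standard yt-dlp `-f` format selector. The function name `preset_to_format` and the exact selectors are assumptions, not omnicaptions' actual internals; they are simply a plausible yt-dlp equivalent if you ever need to reproduce a preset with yt-dlp directly.

```bash
# Hypothetical helper: map an omnicaptions quality preset name to a
# yt-dlp format selector. Illustration only; not omnicaptions' own code.
preset_to_format() {
  case "$1" in
    audio)
      # Prefer an m4a audio stream, fall back to any best audio
      echo "bestaudio[ext=m4a]/bestaudio" ;;
    best)
      # Best video + best audio, or best single pre-merged file
      echo "bestvideo+bestaudio/best" ;;
    1080p|720p|480p|360p)
      local h="${1%p}"  # strip the trailing "p" to get the height cap
      echo "bestvideo[height<=${h}]+bestaudio/best[height<=${h}]" ;;
    *)
      echo "unknown preset: $1" >&2
      return 1 ;;
  esac
}

# Example (assumes yt-dlp is installed):
# yt-dlp -f "$(preset_to_format 720p)" "https://www.youtube.com/watch?v=VIDEO_ID"
```

Capping height with `height<=N` rather than requiring an exact match lets the selector fall back to a lower resolution when a video has no stream at exactly that height.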
## Related Skills

| Skill | Use When |
|-------|----------|
| `/omnicaptions:transcribe` | Transcribe downloaded audio/video |
| `/omnicaptions:translate` | Translate captions with Gemini, or with Claude (no API) |
| `/omnicaptions:convert` | Convert caption format |

### Workflow Examples

**Important**: Generate bilingual captions AFTER LaiCut alignment. Preserve the language tag in the filename.

The download commands below use the default `audio` quality, which saves `xxx.m4a`, while the later steps reference `xxx.mp4`. Either add `-q 1080p` to the download or substitute the file you actually downloaded.

```bash
# Has caption: download → LaiCut align (JSON) → convert → translate
omnicaptions download "https://youtube.com/watch?v=xxx"           # → xxx.en.vtt
omnicaptions LaiCut xxx.mp4 xxx.en.vtt                            # → xxx.en_LaiCut.json
omnicaptions convert xxx.en_LaiCut.json -o xxx.en_LaiCut.srt
# → xxx.en_LaiCut_Claude_zh.srt (after translate)

# No caption: download → transcribe → LaiCut align (JSON) → convert → translate
omnicaptions download "https://youtube.com/watch?v=xxx"
omnicaptions transcribe xxx.mp4                                   # → xxx_GeminiUnd.md
omnicaptions LaiCut xxx.mp4 xxx_GeminiUnd.md                      # → xxx_GeminiUnd_LaiCut.json
omnicaptions convert xxx_GeminiUnd_LaiCut.json -o xxx_GeminiUnd_LaiCut.srt
# → xxx_GeminiUnd_LaiCut_Claude_zh.srt (after translate)
```
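In the workflow examples, each step appends a suffix after removing only the final extension, which is why the `.en` language tag survives from `xxx.en.vtt` through to the translated SRT. The sketch below reproduces that naming rule in bash. The helper names (`stem`, `laicut_name`, `srt_name`, `translated_name`) are hypothetical, and the pattern is inferred solely from the example filenames; the helpers only compute names and do not call omnicaptions.

```bash
# Hypothetical helpers reproducing the filename pattern from the
# workflow examples. They compute names only; nothing is executed.

# Strip the last extension only, so "xxx.en.vtt" keeps its ".en" tag.
stem() { printf '%s\n' "${1%.*}"; }

# Caption or transcript → LaiCut alignment JSON
laicut_name() { printf '%s_LaiCut.json\n' "$(stem "$1")"; }

# Alignment JSON → SRT (as passed to `convert -o`)
srt_name() { printf '%s.srt\n' "$(stem "$1")"; }

# SRT → translated SRT; $2 = translator tag, $3 = target language
translated_name() { printf '%s_%s_%s.srt\n' "$(stem "$1")" "$2" "$3"; }

caption="xxx.en.vtt"
aligned=$(laicut_name "$caption")  # xxx.en_LaiCut.json
srt=$(srt_name "$aligned")         # xxx.en_LaiCut.srt
translated_name "$srt" Claude zh   # prints xxx.en_LaiCut_Claude_zh.srt
```

The same helpers yield `xxx_GeminiUnd_LaiCut.json`, `xxx_GeminiUnd_LaiCut.srt`, and `xxx_GeminiUnd_LaiCut_Claude_zh.srt` when started from `xxx_GeminiUnd.md`, matching the no-caption workflow.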