video-insight
Extract transcripts, generate summaries, create Q&A highlights, and perform deep research from YouTube videos or local media files. Use when the user provides a YouTube URL or local video/audio file path and asks to summarize, digest, analyze, or transcribe media content. Triggers: "video insight", "summarize video", "transcribe audio" + URL or file path.
# Video Insight
Analyzes YouTube videos or local media files to generate summaries,
insights, and optionally Q&A highlights to reinforce key learning points.
## Architecture
```mermaid
flowchart TB
subgraph Main["Main Session"]
SKILL[SKILL.md<br/>Orchestrator]
end
subgraph Agents["Subagents"]
subgraph Haiku["Haiku Models"]
QM[qa-generator<br/>Q&A Generation]
end
subgraph Sonnet["Sonnet Models"]
TA[transcript-analyzer<br/>Transcript Analysis]
DW[digest-writer<br/>Digest Writing]
DR[deep-researcher<br/>Deep Research]
end
end
SKILL --> TA
SKILL --> DW
SKILL --> QM
SKILL --> DR
TA -.->|Return Summary| SKILL
DW -.->|Save Document| SKILL
QM -.->|Q&A Section| SKILL
DR -.->|Research Results| SKILL
style Main fill:#f5f5f5,stroke:#333
style Haiku fill:#e1f5fe,stroke:#0288d1
style Sonnet fill:#fff3e0,stroke:#f57c00
```
**Context Management**: The main session handles orchestration only.
Long transcripts are processed by subagents, which keeps them out of the main session's context.
## Prerequisites
**YouTube URL Processing:**
- Requires `yt-dlp` (`brew install yt-dlp`)
**Local File Processing:**
- Requires `whisper-cpp` (`brew install whisper-cpp`)
- Requires `ffmpeg` (`brew install ffmpeg`)
- Whisper model download (automatic on first run)
Check dependencies: `./scripts/check_dependencies.sh`
## Supported Input Types
| Type          | Pattern                                      | Processing Method            |
|---------------|----------------------------------------------|------------------------------|
| YouTube URL   | `https://youtube.com/`, `https://youtu.be/`  | Subtitle extraction (yt-dlp) |
| Video File    | `*.mp4`, `*.mov`                             | whisper.cpp STT              |
| Audio File    | `*.mp3`, `*.m4a`                             | whisper.cpp STT              |
| Subtitle File | `*.srt`, `*.vtt`                             | Use directly                 |
## Workflow
### Dependency Check (Before Starting)
**CRITICAL**: Check required dependencies before processing.
If any are missing, show the installation guide and **stop immediately** (do not retry).
**For YouTube URL:**
```bash
./scripts/check_dependencies.sh --youtube
```
**For Local Media File:**
```bash
./scripts/check_dependencies.sh --local
```
**If exit code is 1 (missing dependencies):**
1. Display the script output (shows missing tools and install commands)
2. Inform user: "Please install the required dependencies and try again."
3. Reference: `references/prerequisites.md` for detailed installation guide
4. **Stop processing** - do not attempt to continue or retry
**Important**: Do not repeatedly check or retry installation.
The user must manually install dependencies and re-run the command.
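The contents of `check_dependencies.sh` are not shown here, so the following is only a minimal sketch of the behaviour described above: report each missing tool and return exit code 1 if any is missing. The function name `missing_tools` is hypothetical, and the tool names passed to it are whatever the chosen path requires.

```bash
# Hypothetical helper (not the actual script): print each missing tool
# and return 1 if at least one of the given tools is not on PATH.
missing_tools() {
  status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      status=1
    fi
  done
  return "$status"
}

# Possible use for the YouTube path: stop once, without retrying.
# missing_tools yt-dlp || { echo "Please install the required dependencies and try again."; exit 1; }
```

Returning a non-zero status rather than exiting inside the function leaves the "stop immediately" decision to the caller.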
### Step 0: Detect Input Type
Determine whether the input is a YouTube URL or a local file:
**YouTube URL Pattern:**
```regex
^https?://(www\.)?(youtube\.com|youtu\.be)
```
**Local File:**
- Check file existence (`[ -f "$INPUT" ]`)
- Determine type by extension
**Branching:**
- YouTube URL → Step 1A (YouTube metadata)
- Local media file → Step 1B (Local metadata)
- Subtitle file (srt/vtt) → Go directly to Step 3
- Invalid input → Error message
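The branching above could be sketched roughly as follows. This is an illustration only: `classify_input` and its output labels are hypothetical names, the extension list is taken from the Supported Input Types table, and matching is case-sensitive in this sketch.

```bash
# Hypothetical helper: classify an input following the branching above.
# Prints "youtube", "media", "subtitle", or "invalid".
classify_input() {
  input=$1
  # Same pattern as the YouTube URL regex in this step.
  if printf '%s\n' "$input" | grep -Eq '^https?://(www\.)?(youtube\.com|youtu\.be)'; then
    echo "youtube"
    return 0
  fi
  # Anything else must be an existing regular file.
  if [ ! -f "$input" ]; then
    echo "invalid"
    return 1
  fi
  case "$input" in
    *.mp4|*.mov|*.mp3|*.m4a) echo "media" ;;
    *.srt|*.vtt)             echo "subtitle" ;;
    *)                       echo "invalid"; return 1 ;;
  esac
}
```

A caller would then route `youtube` to Step 1A, `media` to Step 1B, and `subtitle` straight to Step 3.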
### Step 1A: Extract YouTube Metadata
```bash
./scripts/extract_metadata.sh "{youtube_url}"
```
Extract from JSON result:
- `title`, `channel`, `upload_date`, `duration`, `description`
- `chapters` (if available)
- `subtitles`, `automatic_captions` (subtitle availability)
### Step 1B: Extract Local File Metadata
```bash
./scripts/extract_local_metadata.sh "{file_path}"
```
Extract from JSON result:
- `title` (extracted from filename)
- `duration` (extracted with ffprobe)
- `format` (file format)
- `source: "local"` (local file indicator)
### Step 2: Check Video Duration
**If over 60 minutes**, present options with AskUserQuestion:
```yaml
question: "Video duration is {duration}. How would you like to proceed?"
options:
  - label: "Process entire video"
    description: "Process the full video (may take longer)"
  - label: "First 30 minutes only"
    description: "Process only the first 30 minutes"
  - label: "Cancel"
    description: "Cancel video processing"
```
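The 60-minute threshold can be checked with a small helper like the one below. It is a sketch under the assumption that `duration` is given in seconds, possibly with a fractional part; `is_long_video` is a hypothetical name.

```bash
# Hypothetical helper: succeed (exit 0) when the duration exceeds 60 minutes.
# Assumes the duration is in seconds; any fractional part is truncated,
# so 3600.9 is treated as exactly 60 minutes.
is_long_video() {
  seconds=${1%.*}   # drop the fractional part, e.g. 3725.48 becomes 3725
  [ "${seconds:-0}" -gt 3600 ]
}

# Possible use:
# if is_long_video "$duration"; then
#   ...present the options above...
# fi
```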
### Step 3: Extract Transcript
**For YouTube URL:**
```bash
./scripts/extract_transcript.sh "{youtube_url}" "/tmp/video-insight"
```
Subtitle priority:
Korean manual > English manual > Korean auto > English auto
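One way to express this priority order is a nested loop over the preferred and the available tracks. The track labels (`ko-manual`, `en-auto`, and so on) and the function name `pick_subtitle` are invented for this sketch and are not the identifiers used by `extract_transcript.sh`.

```bash
# Hypothetical helper: pick the best available track by the priority above.
# Arguments are the available tracks; labels are made up for illustration.
pick_subtitle() {
  for wanted in ko-manual en-manual ko-auto en-auto; do
    for available in "$@"; do
      if [ "$available" = "$wanted" ]; then
        echo "$wanted"
        return 0
      fi
    done
  done
  return 1   # nothing usable was found
}

# pick_subtitle en-auto en-manual   # prints "en-manual"
```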
**If no subtitles available**, present options with AskUserQuestion:
```yaml
question: "No subtitles found. How would you like to proceed?"
options:
  - label: "Summarize description only"
    description: "Create a brief summary from the video description"
  - label: "Cancel"
    description: "Cancel video processing"
```
**For local media file:**
```bash
./scripts/extract_local_transcript.sh "{file_path}" "/tmp/video-insight"
```
Speech is converted to text with whisper.cpp (default language: Korean).
**For existing subtitle file:**
Copy the srt/vtt file to `/tmp/video-insight/` and use it as the transcript.
### Step 4: Analyze Transcript (Subagent)
Call **transcript-analyzer** (Sonnet):
```markdown
Using Task tool:
- subagent_type: "transcript-analyzer"
- model: sonnet
- prompt: |
    Analyze the transcript file.
    - transcript_path: /tmp/video-insight/{title}.ko.srt
    - metadata: {metadata JSON}
    - language: ko
    Extract key content, timeline, and important quotes.
```
**Result**: Only the analysis results are returned to the main session, not the entire transcript.
### Step 5: Confirm Save Path
Confirm save path with AskUserQuestion:
```yaml
question: "Where would you like to save the digest file?"
header: "Save path"
options:
  - label: "Default path"
    description: "outputs/video/{YYYY-MM-DD}__{title}.md"
  - label: "Current folder"
    description: "./{YYYY-MM-DD}__{title}.md"
  - label: "Custom path"
    description: "Specify a custom path"
```
**If custom path selected**: Request path input from user
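For illustration, the default path pattern `outputs/video/{YYYY-MM-DD}__{title}.md` could be built as shown below. `default_digest_path` is a hypothetical helper, and replacing `/` in the title is an assumption of this sketch rather than documented behaviour.

```bash
# Hypothetical helper: build the default digest path from a title and an
# optional date (defaults to today in YYYY-MM-DD form).
default_digest_path() {
  # Replace "/" so the title cannot introduce extra path segments
  # (an assumption of this sketch).
  safe_title=$(printf '%s' "$1" | tr '/' '-')
  day=${2:-$(date +%Y-%m-%d)}
  printf 'outputs/video/%s__%s.md\n' "$day" "$safe_title"
}

# default_digest_path "Intro to Rust" 2024-05-01
# prints "outputs/video/2024-05-01__Intro to Rust.md"
```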
### Step 6: Write Digest (Subagent)
Call **digest-writer** (Sonnet):
```markdown
Using Task tool:
- subagent_type: "digest-writer"
- model: sonnet
- prompt: |
    Write a digest document.
    - analysis_result: {Step 4 result}
    - metadata: {metadata}
    - output_path: {path confirmed in Step 5}
    - template_path: templates/video-insight.md
    Also perform proper noun correction and add background information.
```
**Result**: A confirmation message that the Markdown file was saved.
### Step 7: Additional Content Options
Present options with AskUserQuestion (multiSelect enabled):
```yaml
question: "Would you like to add additional sections?"
header: "Options"
multiSelect: true
options:
  - label: "Q&A Section"
    description: "Add Q&A highlights (1-5 pairs based on content length)"
  - label: "Deep Research"
    description: "Conduct in-depth research with web search"
  - label: "Skip all"
    description: "Generate digest only without additional sections"
```
### Step 8: Generate Additional Content (Parallel Execution)
Based on user selection, execute agents in parallel.
Each agent returns content only (does not write to file).
**If Q&A selected**, call **qa-generator** (Haiku):
```markdown
Using Task tool:
- subagent_type: "qa-generator"
- model: haiku
- prompt: |
    Generate Q&A section content.
    - digest_path: {file path from Step 6}
    - qa_patterns_path: references/qa-patterns.md
    Create 1-5 Q&A pairs (based on content length)
    highlighting key information from the video.
    Return the Q&A section content in markdown format
    (do not write to file).
```
**If Deep Research selected**, call **deep-researcher** (Sonnet):
```markdown
Using Task tool:
- subagent_type: "deep-researcher"
- model: sonnet
- prompt: |
    Perform deep research.
    - digest_path: {file path from Step 6}
    - deep_research_reference: references/deep-research.md
    Collect related materials via web search.
    Return the Deep Research section content in markdown format
    (do not write to file).
```
**Parallel Execution**: If both options are selected,
launch both Task tools in a single message for parallel execution.
### Step 9: Append R…