video-insight

Extract transcripts, generate summaries, create Q&A highlights, and perform deep research from YouTube videos or local media files. Use when the user provides a YouTube URL or local video/audio file path and asks to summarize, digest, analyze, or transcribe media content. Triggers: "video insight", "summarize video", "transcribe audio" + URL or file path.

# Video Insight

Analyzes YouTube videos or local media files to generate summaries,
insights, and optionally Q&A highlights to reinforce key learning points.

## Architecture

```mermaid
flowchart TB
    subgraph Main["Main Session"]
        SKILL[SKILL.md<br/>Orchestrator]
    end

    subgraph Agents["Subagents"]
        subgraph Haiku["Haiku Models"]
            QM[qa-generator<br/>Q&A Generation]
        end
        subgraph Sonnet["Sonnet Models"]
            TA[transcript-analyzer<br/>Transcript Analysis]
            DW[digest-writer<br/>Digest Writing]
            DR[deep-researcher<br/>Deep Research]
        end
    end

    SKILL --> TA
    SKILL --> DW
    SKILL --> QM
    SKILL --> DR

    TA -.->|Return Summary| SKILL
    DW -.->|Save Document| SKILL
    QM -.->|Q&A Section| SKILL
    DR -.->|Research Results| SKILL

    style Main fill:#f5f5f5,stroke:#333
    style Haiku fill:#e1f5fe,stroke:#0288d1
    style Sonnet fill:#fff3e0,stroke:#f57c00
```

**Context Management**: The main session handles orchestration only.
Subagents process the long transcripts, which keeps the transcript text out of the main session's context.

## Prerequisites

**YouTube URL Processing:**

- Requires `yt-dlp` (`brew install yt-dlp`)

**Local File Processing:**

- Requires `whisper-cpp` (`brew install whisper-cpp`)
- Requires `ffmpeg` (`brew install ffmpeg`)
- Whisper model download (automatic on first run)

Check dependencies: `./scripts/check_dependencies.sh`

## Supported Input Types

| Type          | Pattern                                      | Processing Method            |
|---------------|----------------------------------------------|------------------------------|
| YouTube URL   | `https://youtu.be/`, `https://youtube.com/`  | Subtitle extraction (yt-dlp) |
| Video File    | `*.mp4`, `*.mov`                             | whisper.cpp STT              |
| Audio File    | `*.mp3`, `*.m4a`                             | whisper.cpp STT              |
| Subtitle File | `*.srt`, `*.vtt`                             | Use directly                 |

## Workflow

### Dependency Check (Before Starting)

**CRITICAL**: Check required dependencies before processing.
If missing, show installation guide and **stop immediately** (do not retry).

**For YouTube URL:**

```bash
./scripts/check_dependencies.sh --youtube
```

**For Local Media File:**

```bash
./scripts/check_dependencies.sh --local
```

**If exit code is 1 (missing dependencies):**

1. Display the script output (shows missing tools and install commands)
2. Inform user: "Please install the required dependencies and try again."
3. Reference: `references/prerequisites.md` for detailed installation guide
4. **Stop processing** - do not attempt to continue or retry

**Important**: Do not repeatedly check or retry installation.
The user must manually install dependencies and re-run the command.
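The check-and-stop behavior above can be sketched with `command -v`. This is a minimal illustration, not the contents of `scripts/check_dependencies.sh`: the function name `check_tools` is hypothetical, and the install hint assumes the Homebrew package name equals the command name (true for `yt-dlp` and `ffmpeg`). The whisper.cpp binary name depends on the installed version, so it is left out here.

```bash
# Hypothetical sketch; the real scripts/check_dependencies.sh may differ.

# Usage: check_tools <command>...
# Prints one line per missing command with an install hint.
# Returns 1 if any command is missing, 0 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            # Assumption: the Homebrew package is named after the command.
            echo "Missing: $tool (brew install $tool)"
            missing=1
        fi
    done
    return "$missing"
}

# Example: check once, report, and stop without retrying.
# check_tools yt-dlp || { echo "Please install the required dependencies and try again."; exit 1; }
```

The caller checks the exit status once and stops, which matches the "do not retry" rule above.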

### Step 0: Detect Input Type

Determine whether the input is a YouTube URL or a local file:

**YouTube URL Pattern:**

```regex
^https?://(www\.)?(youtube\.com|youtu\.be)
```

**Local File:**

- Check file existence (`[ -f "$INPUT" ]`)
- Determine type by extension

**Branching:**

- YouTube URL → Step 1A (YouTube metadata)
- Local media file → Step 1B (Local metadata)
- Subtitle file (srt/vtt) → Go directly to Step 3
- Invalid input → Error message
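The detection and branching rules above could look roughly like the following. It is a sketch under stated assumptions: the function name `detect_input_type` and the labels it prints (`youtube`, `media`, `subtitle`, `invalid`) are illustrative, and the extension list is taken from the Supported Input Types table.

```bash
# Sketch of Step 0. Names and printed labels are hypothetical.

# Usage: detect_input_type <url-or-path>
# Prints youtube, media, subtitle, or invalid; returns 1 for invalid input.
detect_input_type() {
    input=$1

    # YouTube URL: the pattern from this step, used as an extended regex.
    if printf '%s\n' "$input" | grep -Eq '^https?://(www\.)?(youtube\.com|youtu\.be)'; then
        echo "youtube"
        return 0
    fi

    # Anything else must be an existing regular file.
    if [ ! -f "$input" ]; then
        echo "invalid"
        return 1
    fi

    # Match the extension case-insensitively.
    lower=$(printf '%s' "$input" | tr '[:upper:]' '[:lower:]')
    case "$lower" in
        *.mp4|*.mov|*.mp3|*.m4a) echo "media" ;;
        *.srt|*.vtt)             echo "subtitle" ;;
        *)                       echo "invalid"; return 1 ;;
    esac
}
```

Checking the URL pattern before the file test means a URL is never mistaken for a path, and an unsupported extension is reported as invalid rather than passed on to transcription.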

### Step 1A: Extract YouTube Metadata

```bash
./scripts/extract_metadata.sh "{youtube_url}"
```

Extract from JSON result:

- `title`, `channel`, `upload_date`, `duration`, `description`
- `chapters` (if available)
- `subtitles`, `automatic_captions` (subtitle availability)

### Step 1B: Extract Local File Metadata

```bash
./scripts/extract_local_metadata.sh "{file_path}"
```

Extract from JSON result:

- `title` (extracted from filename)
- `duration` (extracted with ffprobe)
- `format` (file format)
- `source: "local"` (local file indicator)

### Step 2: Check Video Duration

**If over 60 minutes**, present options with AskUserQuestion:

```yaml
question: "Video duration is {duration}. How would you like to proceed?"
options:
  - label: "Process entire video"
    description: "Process the full video (may take longer)"
  - label: "First 30 minutes only"
    description: "Process only the first 30 minutes"
  - label: "Cancel"
    description: "Cancel video processing"
```
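The 60-minute threshold can be tested as below. This sketch assumes the `duration` value from the metadata is expressed in seconds and may be fractional; the function name `is_long_video` is hypothetical.

```bash
# Sketch of the Step 2 threshold check.
# Assumption: duration is given in seconds, possibly with a fraction.

LONG_VIDEO_THRESHOLD=3600  # 60 minutes, in seconds

# Usage: is_long_video <duration_seconds>
# Returns 0 when the duration is strictly over the threshold.
is_long_video() {
    # awk compares fractional values, which POSIX shell arithmetic cannot.
    awk -v d="$1" -v limit="$LONG_VIDEO_THRESHOLD" 'BEGIN { exit !(d > limit) }'
}

# Example:
# if is_long_video "$duration"; then echo "ask the user how to proceed"; fi
```

A video of exactly 60 minutes does not trigger the prompt, since the rule says "over 60 minutes".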

### Step 3: Extract Transcript

**For YouTube URL:**

```bash
./scripts/extract_transcript.sh "{youtube_url}" "/tmp/video-insight"
```

Subtitle priority:
Korean manual > English manual > Korean auto > English auto
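One way to express this priority order is a fixed list that is scanned from highest to lowest priority. The track labels (`ko-manual`, `en-auto`, and so on) and the function name `pick_subtitle` are hypothetical; the actual script may identify tracks differently.

```bash
# Sketch of the subtitle priority order. Labels are illustrative.

SUBTITLE_PRIORITY="ko-manual en-manual ko-auto en-auto"

# Usage: pick_subtitle <available track>...
# Prints the highest-priority track that is available.
# Returns 1 when none of the available tracks is in the priority list.
pick_subtitle() {
    for wanted in $SUBTITLE_PRIORITY; do
        for available in "$@"; do
            if [ "$available" = "$wanted" ]; then
                echo "$wanted"
                return 0
            fi
        done
    done
    return 1
}
```

A non-zero return corresponds to the "no subtitles available" case handled next.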

**If no subtitles available**, present options with AskUserQuestion:

```yaml
question: "No subtitles found. How would you like to proceed?"
options:
  - label: "Summarize description only"
    description: "Create a brief summary from the video description"
  - label: "Cancel"
    description: "Cancel video processing"
```

**For local media file:**

```bash
./scripts/extract_local_transcript.sh "{file_path}" "/tmp/video-insight"
```

Transcribe speech to text with whisper.cpp (the default language is Korean).

**For existing subtitle file:**

Copy the srt/vtt file to `/tmp/video-insight/` and use it as the transcript.

### Step 4: Analyze Transcript (Subagent)

Call **transcript-analyzer** (Sonnet):

```markdown
Using Task tool:
- subagent_type: "transcript-analyzer"
- model: sonnet
- prompt: |
    Analyze the transcript file.

    - transcript_path: /tmp/video-insight/{title}.ko.srt
    - metadata: {metadata JSON}
    - language: ko

    Extract key content, timeline, and important quotes.
```

**Result**: The subagent returns only the analysis results to the main session, not the entire transcript.

### Step 5: Confirm Save Path

Confirm save path with AskUserQuestion:

```yaml
question: "Where would you like to save the digest file?"
header: "Save path"
options:
  - label: "Default path"
    description: "outputs/video/{YYYY-MM-DD}__{title}.md"
  - label: "Current folder"
    description: "./{YYYY-MM-DD}__{title}.md"
  - label: "Custom path"
    description: "Specify a custom path"
```

**If custom path selected**: Ask the user to enter the path.
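The first two options share the `{YYYY-MM-DD}__{title}.md` file name and differ only in the base directory. A sketch of building that path follows; the function name `digest_path` is hypothetical, and replacing `/` in the title is an assumption made here so the title stays a single path component.

```bash
# Sketch: build the digest path for the "Default path" and
# "Current folder" options.

# Usage: digest_path <base_dir> <title> [YYYY-MM-DD]
# The date defaults to today.
digest_path() {
    base=$1
    # Assumption: "/" in a title is replaced so it cannot add directories.
    title=$(printf '%s' "$2" | tr '/' '-')
    day=${3:-$(date +%Y-%m-%d)}
    printf '%s/%s__%s.md\n' "$base" "$day" "$title"
}

# digest_path outputs/video "Intro to Whisper"   # default path
# digest_path . "Intro to Whisper"               # current folder
```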

### Step 6: Write Digest (Subagent)

Call **digest-writer** (Sonnet):

```markdown
Using Task tool:
- subagent_type: "digest-writer"
- model: sonnet
- prompt: |
    Write a digest document.

    - analysis_result: {Step 4 result}
    - metadata: {metadata}
    - output_path: {path confirmed in Step 5}
    - template_path: templates/video-insight.md

    Also perform proper noun correction and add background information.
```

**Result**: A confirmation message that the Markdown file was saved.

### Step 7: Additional Content Options

Present options with AskUserQuestion (multiSelect enabled):

```yaml
question: "Would you like to add additional sections?"
header: "Options"
multiSelect: true
options:
  - label: "Q&A Section"
    description: "Add Q&A highlights (1-5 pairs based on content length)"
  - label: "Deep Research"
    description: "Conduct in-depth research with web search"
  - label: "Skip all"
    description: "Generate digest only without additional sections"
```

### Step 8: Generate Additional Content (Parallel Execution)

Based on user selection, execute agents in parallel.
Each agent returns content only (does not write to file).

**If Q&A selected**, call **qa-generator** (Haiku):

```markdown
Using Task tool:
- subagent_type: "qa-generator"
- model: haiku
- prompt: |
    Generate Q&A section content.

    - digest_path: {file path from Step 6}
    - qa_patterns_path: references/qa-patterns.md

    Create 1-5 Q&A pairs (based on content length)
    highlighting key information from the video.
    Return the Q&A section content in markdown format
    (do not write to file).
```

**If Deep Research selected**, call **deep-researcher** (Sonnet):

```markdown
Using Task tool:
- subagent_type: "deep-researcher"
- model: sonnet
- prompt: |
    Perform deep research.

    - digest_path: {file path from Step 6}
    - deep_research_reference: references/deep-research.md

    Collect related materials via web search.
    Return the Deep Research section content in markdown format
    (do not write to file).
```

**Parallel Execution**: If both options are selected,
launch both Task tools in a single message for parallel execution.

### Step 9: Append R

Files: 11

Size: 32.0 KB

Complexity: 76/100

Category: Image & Video

Source: https://github.com/gzupark/claude-plugin-pack/tree/main/plugins/task-forge/skills/video-insight
