video-processor


Download and process videos from YouTube and other platforms. Supports video download, audio extraction, format conversion (mp4, webm), and Whisper transcription. Use when user mentions YouTube download, video conversion, audio extraction, transcription, mp4, webm, ffmpeg, yt-dlp, or whisper transcription.


# Video Processor

## Instructions

This skill provides comprehensive video processing utilities including YouTube video download, audio extraction, format conversion, and audio transcription using yt-dlp, FFmpeg, and OpenAI's Whisper model.

### Prerequisites

**Required tools** (must be installed in your environment):
- **yt-dlp**: Video downloader for YouTube and thousands of other sites
  ```bash
  # Install via pip
  pip install -U yt-dlp

  # Verify installation
  yt-dlp --version
  ```

- **FFmpeg**: Multimedia framework for video/audio processing
  ```bash
  # macOS
  brew install ffmpeg

  # Ubuntu/Debian
  sudo apt-get install ffmpeg

  # Verify installation
  ffmpeg -version
  ```

- **OpenAI Whisper**: Speech-to-text transcription model
  ```bash
  # Install via pip
  pip install -U openai-whisper

  # Verify installation
  whisper --help
  ```

**Python packages** (declared in the script as PEP 723 inline metadata, so `uv run` installs them automatically):
- click (CLI framework)
- ffmpeg-python (Python wrapper for FFmpeg)
- yt-dlp (video downloader)

### Workflow

Use the `scripts/video_processor.py` script for all video processing tasks. The script provides a simple CLI with the following commands:

#### 0. **Download Video from YouTube or Other Platforms** (NEW!)

Download videos from YouTube and thousands of other supported websites:

```bash
# Download video
uv run .claude/skills/video-processor/scripts/video_processor.py download "https://youtube.com/watch?v=..." output.mp4

# Download audio only (as MP3)
uv run .claude/skills/video-processor/scripts/video_processor.py download "https://youtube.com/watch?v=..." --audio-only

# Show video info without downloading
uv run .claude/skills/video-processor/scripts/video_processor.py download "https://youtube.com/watch?v=..." --info

# Download with subtitles
uv run .claude/skills/video-processor/scripts/video_processor.py download "https://youtube.com/watch?v=..." output.mp4 --subtitle
```

Options:
- `--audio-only`: Download audio only (extracts to MP3)
- `--subtitle`: Download and embed subtitles (supports en, zh-Hans, zh-Hant)
- `--info`: Show video information without downloading
- `--format`: Specify video format preference (default: best quality)

#### 1. **Extract Audio from Video**

Extract the audio track from a video file:

```bash
uv run .claude/skills/video-processor/scripts/video_processor.py extract-audio input.mp4 output.wav
```

Options:
- `--format`: Output audio format (default: wav). Supports: wav, mp3, aac, flac
- Output is suitable for transcription or standalone audio use
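
For readers curious what an extraction like this amounts to at the FFmpeg level, here is a minimal sketch that builds the command line for each supported format. It uses a plain argument list rather than the ffmpeg-python wrapper, and the codec chosen for each format and the function name `extract_audio_command` are assumptions, not the script's confirmed behavior.

```python
# Assumed mapping from the --format value to an FFmpeg audio encoder.
AUDIO_CODECS = {
    "wav": "pcm_s16le",   # uncompressed 16-bit PCM
    "mp3": "libmp3lame",
    "aac": "aac",
    "flac": "flac",
}


def extract_audio_command(src: str, dst: str, fmt: str = "wav") -> list[str]:
    """Build an ffmpeg argv that drops the video stream and encodes the audio."""
    if fmt not in AUDIO_CODECS:
        raise ValueError(f"unsupported audio format: {fmt}")
    # -vn disables video; -y overwrites an existing output file.
    return ["ffmpeg", "-y", "-i", src, "-vn", "-c:a", AUDIO_CODECS[fmt], dst]


print(" ".join(extract_audio_command("input.mp4", "output.wav")))
```

The resulting list can be passed to `subprocess.run(..., check=True)` to perform the extraction.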

#### 2. **Convert Video to MP4**

Convert any video file to MP4 format:

```bash
uv run .claude/skills/video-processor/scripts/video_processor.py to-mp4 input.avi output.mp4
```

Options:
- `--codec`: Video codec (default: libx264). Common options: libx264, libx265, h264
- `--preset`: Encoding speed/quality preset (default: medium). Options: ultrafast, fast, medium, slow, veryslow
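
A sketch of how `--codec` and `--preset` could translate into FFmpeg arguments follows. The AAC audio codec and the `+faststart` flag (which moves the index to the front of the file so playback can begin before the download finishes) are assumptions added for illustration; the source does not say what the script does with audio.

```python
PRESETS = ("ultrafast", "fast", "medium", "slow", "veryslow")  # values listed above


def to_mp4_command(src: str, dst: str, codec: str = "libx264",
                   preset: str = "medium") -> list[str]:
    """Build an ffmpeg argv for an MP4 conversion (hypothetical helper)."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset}")
    return [
        "ffmpeg", "-y", "-i", src,
        "-c:v", codec, "-preset", preset,   # slower presets compress better
        "-c:a", "aac",                      # assumed audio codec
        "-movflags", "+faststart",          # assumed web-friendly layout
        dst,
    ]


print(" ".join(to_mp4_command("input.avi", "output.mp4")))
```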

#### 3. **Convert Video to WebM**

Convert any video file to WebM format (web-optimized):

```bash
uv run .claude/skills/video-processor/scripts/video_processor.py to-webm input.mp4 output.webm
```

Options:
- `--codec`: Video codec (default: libvpx-vp9). Options: libvpx, libvpx-vp9
- WebM is optimized for web playback and streaming
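
The two WebM codecs are usually driven differently, which the sketch below tries to show: VP9 is commonly run in constant-quality mode (`-crf` together with `-b:v 0`), while VP8 takes a target bitrate. The CRF value, the 1M bitrate, the Opus audio codec, and the function name are all assumptions.

```python
def to_webm_command(src: str, dst: str, codec: str = "libvpx-vp9",
                    crf: int = 32) -> list[str]:
    """Build an ffmpeg argv for a WebM conversion (hypothetical helper)."""
    if codec not in ("libvpx", "libvpx-vp9"):
        raise ValueError(f"unsupported WebM codec: {codec}")
    cmd = ["ffmpeg", "-y", "-i", src, "-c:v", codec]
    if codec == "libvpx-vp9":
        # Constant-quality mode: -b:v 0 lets -crf alone control quality.
        cmd += ["-crf", str(crf), "-b:v", "0"]
    else:
        # VP8: assumed target bitrate, chosen only for illustration.
        cmd += ["-b:v", "1M"]
    return cmd + ["-c:a", "libopus", dst]


print(" ".join(to_webm_command("input.mp4", "output.webm")))
```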

#### 4. **Transcribe Audio with Whisper**

Transcribe audio or video files to text using OpenAI's Whisper model:

```bash
# Transcribe video file (audio will be extracted automatically)
uv run .claude/skills/video-processor/scripts/video_processor.py transcribe input.mp4 transcript.txt

# Transcribe audio file directly
uv run .claude/skills/video-processor/scripts/video_processor.py transcribe audio.wav transcript.txt
```

Options:
- `--model`: Whisper model size (default: base). Options:
  - `tiny`: Fastest, lowest accuracy (~1 GB VRAM)
  - `base`: Fast, good accuracy (~1 GB VRAM) **[DEFAULT]**
  - `small`: Balanced (~2 GB VRAM)
  - `medium`: High accuracy (~5 GB VRAM)
  - `large`: Best accuracy, slowest (~10 GB VRAM)
- `--language`: Language code (default: auto-detect). Examples: en, es, fr, de, zh
- `--format`: Output format (default: txt). Options: txt, srt, vtt, json

**Transcription workflow:**
1. If the input is a video, FFmpeg extracts the audio to a temporary WAV file
2. Whisper processes the audio file
3. Transcription is saved in requested format
4. Temporary files are cleaned up automatically

#### 5. **Combined Workflow Example**

Process a video end-to-end:

```bash
# 1. Extract audio for analysis
uv run .claude/skills/video-processor/scripts/video_processor.py extract-audio lecture.mp4 lecture.wav

# 2. Transcribe to SRT subtitles
uv run .claude/skills/video-processor/scripts/video_processor.py transcribe lecture.mp4 lecture.srt --format srt --model small

# 3. Convert to web format
uv run .claude/skills/video-processor/scripts/video_processor.py to-webm lecture.mp4 lecture.webm
```

### Key Technical Details

**FFmpeg and Whisper Integration:**
- FFmpeg doesn't transcribe audio itself - it prepares audio for external transcription
- The workflow is: Extract audio (FFmpeg) → Transcribe (Whisper) → Optional: Re-integrate with video
- FFmpeg can pipe decoded audio to a process that runs Whisper, avoiding an intermediate file (advanced use case; Whisper itself works on 30-second windows rather than a live stream)

**Audio Format for Transcription:**
- Whisper reads audio through FFmpeg, so WAV, MP3, and other common formats all work
- Sample rate: Whisper operates on 16 kHz mono audio (the script handles conversion automatically)
- The script extracts audio with settings suited to Whisper

**Output Formats:**
- **txt**: Plain text transcript
- **srt**: SubRip subtitle format (includes timestamps)
- **vtt**: WebVTT subtitle format (web standard)
- **json**: Detailed JSON with word-level timestamps
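
To make the subtitle formats concrete, here is a small sketch that renders timed segments as SRT. It assumes segments shaped like the `start`/`end`/`text` dictionaries in Whisper's transcription result; the function names are hypothetical.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm (SRT uses a comma before milliseconds)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[dict]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(cues)


print(to_srt([
    {"start": 0.0, "end": 2.5, "text": " Welcome to the lecture."},
    {"start": 2.5, "end": 6.0, "text": " Today we cover video processing."},
]))
```

WebVTT differs mainly in starting with a `WEBVTT` header line and using a period instead of a comma in timestamps.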

### Error Handling

The script includes comprehensive error handling:
- Validates input files exist
- Checks FFmpeg and Whisper are installed
- Provides clear error messages for missing dependencies
- Handles temporary file cleanup on errors
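
Checks of this kind are commonly written with `shutil.which` and `pathlib`. The sketch below is one possible shape, with hypothetical function names and message wording; the lookup function is injectable so the logic can be tested on a machine without the tools.

```python
import shutil
from pathlib import Path
from typing import Callable, Iterable, Optional


def missing_tools(tools: Iterable[str] = ("ffmpeg", "whisper"),
                  which: Callable[[str], Optional[str]] = shutil.which) -> list[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def preflight_errors(input_path: str,
                     tools: Iterable[str] = ("ffmpeg", "whisper"),
                     which: Callable[[str], Optional[str]] = shutil.which) -> list[str]:
    """Collect human-readable problems before any processing starts."""
    errors = []
    if not Path(input_path).is_file():
        errors.append(f"Input file not found: {input_path}")
    for tool in missing_tools(tools, which):
        errors.append(f"Required tool not found on PATH: {tool}")
    return errors


for problem in preflight_errors("input.mp4"):
    print(problem)
```

Collecting every problem before exiting lets the user fix a missing file and a missing tool in one pass rather than one error at a time.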

### Performance Tips

- Use `tiny` or `base` models for quick drafts
- Use `small` or `medium` for production transcriptions
- Use `large` only when maximum accuracy is required
- For long videos, consider extracting the audio first, then transcribing it in segments
- WebM conversion with VP9 takes longer but produces smaller files

## Examples

### Example 1: Quick Video to MP4 Conversion

User request:
```
I have an AVI file from my old camera. Can you convert it to MP4?
```

You would:
1. Use the to-mp4 command with default settings:
   ```bash
   uv run .claude/skills/video-processor/scripts/video_processor.py to-mp4 old_video.avi output.mp4
   ```
2. Confirm the conversion completed successfully
3. Inform the user about the output file location

### Example 2: Extract Audio and Transcribe

User request:
```
I recorded a lecture video and need a transcript. Can you extract the audio and transcribe it?
```

You would:
1. First extract the audio:
   ```bash
   uv run .claude/skills/video-processor/scripts/video_processor.py extract-audio lecture.mp4 lecture.wav
   ```
2. Then transcribe using the base model (good balance of speed/accuracy):
   ```bash
   uv run .claude/skills/video-processor/scripts/video_processor.py transcribe lecture.wav transcript.txt --model base
   ```
3. Share the transcript.txt file with the user

### Example 3: Create Web-Optimized Video with Subtitles

User request:
```
I need to put this video on my website with subtitles. Can you help?
```

You would:
1. Convert to WebM for web optimization:
   ```bash
   uv run .claude/skills/video-processor/scripts/video_processor.py to-webm presentation.mp4 presentation.webm
   ```
2. Generate SRT subtitle file:
   ```bash
   uv run .claude/skills/video-processor/scripts/video_processor.py transcribe presentation.mp4 subtitles.srt --format srt --model small
   ```
3. Inform user they now have:
   - presentation.webm (web-optimized video)
   - subtitles.srt (subtitle file for embedding)

### Example 4: High-Quality Transcription with Language Specification

User request:
```
I have a Spanish interview video that needs an accura
```

Files: 2

Size: 28.3 KB

Complexity: 54/100

Category: Ads & Marketing

Source: https://github.com/iamzhihuix/happy-claude-skills/tree/main/skills/video-processor
