video-content-extractor
Extract key frames from MP4 videos at configurable intervals, run Tesseract OCR, and generate structured Markdown reports with video metadata and timestamped text transcripts.
media-processing, video, ocr, ffmpeg, tesseract, frame-extraction, media
What this skill does
# Video Content Extractor

## Overview

Automatically extracts key frames from MP4 video files at configurable time intervals, performs OCR text recognition on each frame, and generates a structured Markdown report. The report includes video metadata (duration, resolution, codecs) and frame-by-frame OCR transcripts with timestamp references. This skill is designed for Codex CLI and requires FFmpeg and Tesseract OCR installed on the local machine.

## When to Use This Skill

- Use when you need to extract text content from video presentations, lectures, or screencasts.
- Use when you want to create searchable transcripts from video files without embedded subtitles.
- Use when you need to analyze video content programmatically and generate structured summaries.
- Use when the user asks to "read what is on screen" or "extract the content from this video."

## How It Works

### Step 1: Analyze Video Metadata

The skill uses `ffprobe` to extract video metadata: duration, resolution, frame rate, codec information, and file size.

### Step 2: Extract Key Frames

Using FFmpeg, the skill captures frames at the configured interval (default: every 30 seconds). Each frame is saved as a timestamped JPEG image.

### Step 3: OCR Text Recognition

Each extracted frame is processed by Tesseract OCR. If the default page segmentation mode (PSM) returns no meaningful text, it falls back to fully automatic page segmentation.

### Step 4: Generate Markdown Report

All extracted data is assembled into a structured Markdown document.

## Examples

### Example 1: Basic Extraction

Agent prompt: Use the video-content-extractor skill to extract content from `lecture.mp4`

Output: generates `lecture.md` and the `lecture_frames/` directory.
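
The four steps under How It Works, together with the output naming from Example 1, could be sketched roughly as follows. This is not the skill's actual `scripts/extract_video.py`; it is a minimal illustration under stated assumptions. All function names are hypothetical, and so are the frame file naming (`frame_HH-MM-SS.jpg`), the report layout, the "meaningful text" threshold, and the choice of PSM 6 for the first OCR pass with PSM 3 as the fallback. The source does not say which modes the skill uses.

```python
import json
import math
import subprocess
from pathlib import Path

MIN_TEXT_CHARS = 3  # hypothetical threshold for "meaningful" OCR output


def run(cmd: list[str]) -> str:
    """Run a command and return its stdout, raising on a non-zero exit."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def probe_command(video: str) -> list[str]:
    # Step 1: ask ffprobe for container and stream metadata as JSON.
    return ["ffprobe", "-v", "error", "-print_format", "json",
            "-show_format", "-show_streams", video]


def frame_timestamps(duration: float, interval: int = 30) -> list[int]:
    # Step 2: one capture point every `interval` seconds, starting at 0.
    return list(range(0, math.ceil(duration), interval))


def format_timestamp(seconds: int) -> str:
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def frame_command(video: str, seconds: int, frames_dir: Path) -> tuple[list[str], Path]:
    # Seek to the timestamp, then write exactly one JPEG named after it.
    out = frames_dir / f"frame_{format_timestamp(seconds).replace(':', '-')}.jpg"
    cmd = ["ffmpeg", "-y", "-ss", str(seconds), "-i", video,
           "-frames:v", "1", "-q:v", "2", str(out)]
    return cmd, out


def ocr_command(image: Path, lang: str, psm: int) -> list[str]:
    # Step 3: "stdout" as the output base makes tesseract print the text.
    return ["tesseract", str(image), "stdout", "-l", lang, "--psm", str(psm)]


def ocr_with_fallback(image: Path, lang: str, ocr=None) -> str:
    # Assumption: PSM 6 first, PSM 3 (fully automatic) as the fallback.
    # `ocr` can be swapped for a stub that maps a PSM number to text.
    ocr = ocr or (lambda psm: run(ocr_command(image, lang, psm)))
    text = ocr(6).strip()
    if sum(ch.isalnum() for ch in text) < MIN_TEXT_CHARS:
        text = ocr(3).strip()
    return text


def build_report(title: str, metadata: dict, frames: list[tuple[int, str]]) -> str:
    # Step 4: metadata list first, then one timestamped section per frame.
    lines = [f"# {title}", "", "## Video Metadata", ""]
    lines += [f"- {key}: {value}" for key, value in metadata.items()]
    lines += ["", "## Frame Transcripts"]
    for seconds, text in frames:
        lines += ["", f"### {format_timestamp(seconds)}", "", text or "(no text detected)"]
    return "\n".join(lines) + "\n"


def extract(video: str, output_dir: str = ".", interval: int = 30,
            lang: str = "chi_sim+eng") -> Path:
    stem = Path(video).stem
    frames_dir = Path(output_dir) / f"{stem}_frames"
    frames_dir.mkdir(parents=True, exist_ok=True)
    probe = json.loads(run(probe_command(video)))
    duration = float(probe["format"]["duration"])
    frames = []
    for seconds in frame_timestamps(duration, interval):
        cmd, image = frame_command(video, seconds, frames_dir)
        run(cmd)
        # Skip OCR if ffmpeg produced no frame for this timestamp.
        text = ocr_with_fallback(image, lang) if image.exists() else ""
        frames.append((seconds, text))
    report = Path(output_dir) / f"{stem}.md"
    report.write_text(build_report(stem, {"duration_seconds": duration}, frames),
                      encoding="utf-8")
    return report
```

With FFmpeg and Tesseract on PATH, calling `extract("lecture.mp4")` in this sketch would write `lecture.md` next to a `lecture_frames/` directory. The command builders are kept separate from `run` so the argument lists can be inspected without invoking either tool.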
### Example 2: Custom Interval

Parameters: `video_path`, `output_dir`, `interval` (seconds), `lang`

Extract every 60 seconds with English-only OCR:

`python scripts/extract_video.py recording.mp4 ./output 60 eng`

### Example 3: Bilingual Content

Extract with default Chinese + English OCR:

`python scripts/extract_video.py lecture.mp4 . 15 chi_sim+eng`

## Best Practices

- Use shorter intervals (10-15s) for fast-paced content with frequent text changes.
- Use longer intervals (30-60s) for presentation slides or slow lectures to reduce duplicate frames.
- For Chinese content, ensure the Tesseract Chinese language pack (`chi_sim`) is installed.

## Limitations

- Requires FFmpeg and Tesseract OCR to be installed and accessible via PATH.
- Tesseract OCR accuracy depends on video quality, text size, and font clarity.
- Does not extract audio or perform speech-to-text transcription.
- Frame extraction is time-based (not scene-change-based), which may produce near-duplicate frames.
- Large videos with short intervals can generate many frames; ensure sufficient disk space.

## Security and Safety Notes

- This skill only reads video files and writes extracted frames and Markdown reports.
- It does NOT send any data over the network; all processing is local.
- FFmpeg and Tesseract are invoked with fixed, pre-vetted arguments.
- The skill does not modify or delete the original video file.

## Common Pitfalls

- Problem: Tesseract returns garbled text.
  Solution: Ensure the correct language pack is installed. Run `tesseract --list-langs` to verify.
- Problem: FFmpeg fails with "not found".
  Solution: Make sure FFmpeg is on PATH. Run `ffmpeg -version` to verify.
- Problem: OCR is slow on large videos.
  Solution: Increase the interval parameter to reduce the number of frames processed.

## Related Skills

- @media-summarizer - For summarizing video content using visual and audio cues.
- @document-ocr - For OCR on static images or scanned documents without video processing.
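
The first two checks under Common Pitfalls (tools on PATH, language packs installed) can be run before a long extraction. The sketch below is one possible preflight helper, not part of the skill itself. The function names are hypothetical, and the parser assumes `tesseract --list-langs` prints a header line starting with "List of" followed by one language code per line.

```python
import shutil
import subprocess


def missing_tools(tools=("ffmpeg", "ffprobe", "tesseract"), which=shutil.which) -> list[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def parse_langs(listing: str) -> set[str]:
    """Extract language codes from `tesseract --list-langs` output."""
    lines = [line.strip() for line in listing.splitlines() if line.strip()]
    # Assumption: the only non-language line is the "List of ..." header.
    return {line for line in lines if not line.lower().startswith("list of")}


def installed_langs() -> set[str]:
    result = subprocess.run(["tesseract", "--list-langs"],
                            capture_output=True, text=True, check=True)
    # Some older builds print the listing to stderr, so both streams are read.
    return parse_langs(result.stdout + "\n" + result.stderr)


def missing_langs(lang: str, installed: set[str]) -> list[str]:
    """Split a spec such as "chi_sim+eng" and report packs that are absent."""
    return [code for code in lang.split("+") if code not in installed]
```

A caller could stop early when `missing_tools()` is non-empty, and otherwise pass the `lang` argument and `installed_langs()` to `missing_langs` before extracting any frames.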