youtube-summarizer
Automatically fetch YouTube video transcripts, generate structured summaries, and send full transcripts to messaging platforms. Detects YouTube URLs and provides metadata, key insights, and downloadable transcripts.
What this skill does
# YouTube Summarizer Skill
Automatically fetch transcripts from YouTube videos, generate structured summaries, and deliver full transcripts to messaging platforms.
## When to Use
Activate this skill when:
- User shares a YouTube URL (youtube.com/watch, youtu.be, youtube.com/shorts)
- User asks to summarize or transcribe a YouTube video
- User requests information about a YouTube video's content
## Dependencies
**Required:** MCP YouTube Transcript server must be installed at:
`/root/clawd/mcp-server-youtube-transcript`
If not present, install it:
```bash
cd /root/clawd
git clone https://github.com/kimtaeyoon83/mcp-server-youtube-transcript.git
cd mcp-server-youtube-transcript
npm install && npm run build
```
## Workflow
### 1. Detect YouTube URL
Extract video ID from these patterns:
- `https://www.youtube.com/watch?v=VIDEO_ID`
- `https://youtu.be/VIDEO_ID`
- `https://www.youtube.com/shorts/VIDEO_ID`
- Direct video ID: `VIDEO_ID` (11 characters)
### 2. Fetch Transcript
Run this command to get the transcript:
```bash
cd /root/clawd/mcp-server-youtube-transcript && node --input-type=module -e "
import { getSubtitles } from './dist/youtube-fetcher.js';
const result = await getSubtitles({ videoID: 'VIDEO_ID', lang: 'en' });
console.log(JSON.stringify(result, null, 2));
" > /tmp/yt-transcript.json
```
Replace `VIDEO_ID` with the extracted ID. Read the output from `/tmp/yt-transcript.json`.
### 3. Process the Data
Parse the JSON to extract:
- `result.metadata.title` - Video title
- `result.metadata.author` - Channel name
- `result.metadata.viewCount` - Formatted view count
- `result.metadata.publishDate` - Publication date
- `result.actualLang` - Language used
- `result.lines` - Array of transcript segments
Full text: `result.lines.map(l => l.text).join(' ')`
### 4. Generate Summary
Create a structured summary using this template:
```markdown
๐น **Video:** [title]
๐ค **Channel:** [author] | ๐๏ธ **Views:** [views] | ๐
**Published:** [date]
**๐ฏ Main Thesis:**
[1-2 sentence core argument/message]
**๐ก Key Insights:**
- [insight 1]
- [insight 2]
- [insight 3]
- [insight 4]
- [insight 5]
**๐ Notable Points:**
- [additional point 1]
- [additional point 2]
**๐ Takeaway:**
[Practical application or conclusion]
```
Aim for:
- Main thesis: 1-2 sentences maximum
- Key insights: 3-5 bullets, each 1-2 sentences
- Notable points: 2-4 supporting details
- Takeaway: Actionable conclusion
### 5. Save Full Transcript
Save the complete transcript to a timestamped file:
```
/root/clawd/transcripts/YYYY-MM-DD_VIDEO_ID.txt
```
Include in the file:
- Video metadata header
- Full transcript text
- URL reference
### 6. Platform-Specific Delivery
**If channel is Telegram:**
```bash
message --action send --channel telegram --target CHAT_ID \
--filePath /root/clawd/transcripts/YYYY-MM-DD_VIDEO_ID.txt \
--caption "๐ YouTube Transcript: [title]"
```
**If channel is other/webchat:**
Just reply with the summary (no file attachment).
### 7. Reply with Summary
Send the structured summary as your response to the user.
## Error Handling
**If transcript fetch fails:**
- Check if video has captions enabled
- Try with `lang: 'en'` fallback if requested language unavailable
- Inform user that transcript is not available and suggest alternatives:
- Manual YouTube transcript feature
- Video may not have captions
- Try a different video
**If MCP server not installed:**
- Provide installation instructions
- Offer to install it automatically if in appropriate context
**If video ID extraction fails:**
- Ask user to provide the full YouTube URL or video ID
## Examples
See `examples/` directory for sample outputs.
## Quality Guidelines
- **Be concise:** Summary should be scannable in 30 seconds
- **Be accurate:** Don't add information not in the transcript
- **Be structured:** Use consistent formatting for easy reading
- **Be contextual:** Adjust detail level based on video length
- Short videos (<5 min): Brief summary
- Long videos (>30 min): More detailed breakdown
## Notes
- MCP server uses Android client emulation to bypass YouTube's cloud IP blocking
- Works reliably from VPS/cloud environments where yt-dlp often fails
- Supports multiple languages with automatic fallback to English
- Transcript quality depends on YouTube's auto-generated captions or manual captions
Related in Image & Video
watch
IncludedWatch a video (URL or local path). Downloads with yt-dlp, extracts auto-scaled frames with ffmpeg, pulls the transcript from captions (or Whisper API fallback), and hands the result to Claude so it can answer questions about what's in the video.
physical-ai-defect-image-generation
IncludedUse when the user wants to orchestrate defect image generation, run associated setup, or handle outputs on OSMO. The Day 0 path handles cold-start with USD-to-ROI, image-edit augmentation, and AnomalyGen to create initial PCBA datasets. The Day 1 path performs inference and labeling on real images. This skill helps with first-time asset setup, creation of finetuning checkpoints, and configuring deployment. Trigger keywords: defect image generation, dig workflow, dig pipeline, defect image detection workflow, aoi pipeline, aoi anomalygen, usd2roi anomalygen, day 0 pcba, day 1 pcba, day 1 real-photo alignment, day 1 manual roi, metal surface anomaly, glass defect, anomalygen finetune, setup_pcb, setup_metal, setup_glass, setup_pretrained, dig setup, dig datasets, dig pretrained checkpoint, dig image-edit endpoint.
accelint-react-best-practices
IncludedReact performance optimization and best practices. ALWAYS use this skill when working with any React code - writing components, hooks, JSX; refactoring; optimizing re-renders, memoization, state management; reviewing for performance; fixing hydration mismatches; debugging infinite re-renders, stale closures, input focus loss, animations restarting; preventing remounting; implementing transitions, lazy initialization, effect dependencies. Even simple React tasks benefit from these patterns. Covers React 19+ (useEffectEvent, Activity, ref props). Triggers - useEffect, useState, useMemo, useCallback, memo, inline components, nested components, components inside components, re-render, performance, hydration, SSR, Next.js, useDeferredValue, combined hooks.
elevenlabs-agents
IncludedBuild conversational AI voice agents with ElevenLabs Platform using React, JavaScript, React Native, or Swift SDKs. Configure agents, tools (client/server/MCP), RAG knowledge bases, multi-voice, and Scribe real-time STT. Use when: building voice chat interfaces, implementing AI phone agents with Twilio, configuring agent workflows or tools, adding RAG knowledge bases, testing with CLI "agents as code", or troubleshooting deprecated @11labs packages, Android audio cutoff, CSP violations, dynamic variables, or WebRTC config. Keywords: ElevenLabs Agents, ElevenLabs voice agents, AI voice agents, conversational AI, @elevenlabs/react, @elevenlabs/client, @elevenlabs/react-native, @elevenlabs/elevenlabs-js, @elevenlabs/agents-cli, elevenlabs SDK, voice AI, TTS, text-to-speech, ASR, speech recognition, turn-taking model, WebRTC voice, WebSocket voice, ElevenLabs conversation, agent system prompt, agent tools, agent knowledge base, RAG voice agents, multi-voice agents, pronunciation dictionary, voice speed control, elevenlabs scribe, @11labs deprecated, Android audio cutoff, CSP violation elevenlabs, dynamic variables elevenlabs, case-sensitive tool names, webhook authentication
humanizer
IncludedHumanize AI-generated text by detecting and removing patterns typical of LLM output. Rewrites text to sound natural, specific, and human. Uses 28 pattern detectors, 560+ AI vocabulary terms across 3 tiers, and statistical analysis (burstiness, type-token ratio, readability) for comprehensive detection. Use when asked to humanize text, de-AI writing, make content sound more natural/human, review writing for AI patterns, score text for AI detection, or improve AI-generated drafts. Covers content, language, style, communication, and filler categories.
generating-mermaid-diagrams
IncludedSalesforce architecture diagrams using Mermaid with ASCII fallback. Use this skill when generating text-based diagrams for Salesforce architecture, OAuth flows, ERDs, integration sequences, or Agentforce structure. TRIGGER when: user says "diagram", "visualize", "ERD", or asks for sequence diagrams, flowcharts, class diagrams, or architecture visualizations in Mermaid. DO NOT TRIGGER when: user wants PNG/SVG image output (use generating-visual-diagrams), or asks about non-Salesforce systems.