tech-news-digest
Generate tech news digests with unified source model, quality scoring, and multi-format output. Six-source data collection from RSS feeds, Twitter/X KOLs, GitHub releases, GitHub Trending, Reddit, and web search. Pipeline-based scripts with retry mechanisms and deduplication. Supports Discord, email, and markdown templates.
What this skill does
# Tech News Digest
Automated tech news digest system with unified data source model, quality scoring pipeline, and template-based output generation.
## Quick Start
1. **Configuration Setup**: Default configs are in `config/defaults/`. Copy to workspace for customization:
```bash
mkdir -p workspace/config
cp config/defaults/sources.json workspace/config/tech-news-digest-sources.json
cp config/defaults/topics.json workspace/config/tech-news-digest-topics.json
```
2. **Environment Variables**:
- `TWITTERAPI_IO_KEY` - twitterapi.io API key (optional, preferred)
- `X_BEARER_TOKEN` - Twitter/X official API bearer token (optional, fallback)
- `TAVILY_API_KEY` - Tavily Search API key, alternative to Brave (optional)
- `WEB_SEARCH_BACKEND` - Web search backend: auto|brave|tavily (optional, default: auto)
- `BRAVE_API_KEYS` - Brave Search API keys, comma-separated for rotation (optional)
- `BRAVE_API_KEY` - Single Brave key fallback (optional)
- `GITHUB_TOKEN` - GitHub personal access token (optional, improves rate limits)
3. **Generate Digest**:
```bash
# Unified pipeline (recommended) โ runs all 6 sources in parallel + merge
python3 scripts/run-pipeline.py \
--defaults config/defaults \
--config workspace/config \
--hours 48 --freshness pd \
--archive-dir workspace/archive/tech-news-digest/ \
--output /tmp/td-merged.json --verbose --force
```
4. **Use Templates**: Apply Discord, email, or PDF templates to merged output
## Configuration Files
### `sources.json` - Unified Data Sources
```json
{
"sources": [
{
"id": "openai-rss",
"type": "rss",
"name": "OpenAI Blog",
"url": "https://openai.com/blog/rss.xml",
"enabled": true,
"priority": true,
"topics": ["llm", "ai-agent"],
"note": "Official OpenAI updates"
},
{
"id": "sama-twitter",
"type": "twitter",
"name": "Sam Altman",
"handle": "sama",
"enabled": true,
"priority": true,
"topics": ["llm", "frontier-tech"],
"note": "OpenAI CEO"
}
]
}
```
### `topics.json` - Enhanced Topic Definitions
```json
{
"topics": [
{
"id": "llm",
"emoji": "๐ง ",
"label": "LLM / Large Models",
"description": "Large Language Models, foundation models, breakthroughs",
"search": {
"queries": ["LLM latest news", "large language model breakthroughs"],
"must_include": ["LLM", "large language model", "foundation model"],
"exclude": ["tutorial", "beginner guide"]
},
"display": {
"max_items": 8,
"style": "detailed"
}
}
]
}
```
## Scripts Pipeline
### `run-pipeline.py` - Unified Pipeline (Recommended)
```bash
python3 scripts/run-pipeline.py \
--defaults config/defaults [--config CONFIG_DIR] \
--hours 48 --freshness pd \
--archive-dir workspace/archive/tech-news-digest/ \
--output /tmp/td-merged.json --verbose --force
```
- **Features**: Runs all 6 fetch steps in parallel, then merges + deduplicates + scores
- **Output**: Final merged JSON ready for report generation (~30s total)
- **Metadata**: Saves per-step timing and counts to `*.meta.json`
- **GitHub Auth**: Auto-generates GitHub App token if `$GITHUB_TOKEN` not set
- **Fallback**: If this fails, run individual scripts below
### Individual Scripts (Fallback)
#### `fetch-rss.py` - RSS Feed Fetcher
```bash
python3 scripts/fetch-rss.py [--defaults DIR] [--config DIR] [--hours 48] [--output FILE] [--verbose]
```
- Parallel fetching (10 workers), retry with backoff, feedparser + regex fallback
- Timeout: 30s per feed, ETag/Last-Modified caching
#### `fetch-twitter.py` - Twitter/X KOL Monitor
```bash
python3 scripts/fetch-twitter.py [--defaults DIR] [--config DIR] [--hours 48] [--output FILE] [--backend auto|official|twitterapiio]
```
- Backend auto-detection: uses twitterapi.io if `TWITTERAPI_IO_KEY` set, else official X API v2 if `X_BEARER_TOKEN` set
- Rate limit handling, engagement metrics, retry with backoff
#### `fetch-web.py` - Web Search Engine
```bash
python3 scripts/fetch-web.py [--defaults DIR] [--config DIR] [--freshness pd] [--output FILE]
```
- Auto-detects Brave API rate limit: paid plans โ parallel queries, free โ sequential
- Without API: generates search interface for agents
#### `fetch-github.py` - GitHub Releases Monitor
```bash
python3 scripts/fetch-github.py [--defaults DIR] [--config DIR] [--hours 168] [--output FILE]
```
- Parallel fetching (10 workers), 30s timeout
- Auth priority: `$GITHUB_TOKEN` โ GitHub App auto-generate โ `gh` CLI โ unauthenticated (60 req/hr)
#### `fetch-github.py --trending` - GitHub Trending Repos
```bash
python3 scripts/fetch-github.py --trending [--hours 48] [--output FILE] [--verbose]
```
- Searches GitHub API for trending repos across 4 topics (LLM, AI Agent, Crypto, Frontier Tech)
- Quality scoring: base 5 + daily_stars_est / 10, max 15
#### `fetch-reddit.py` - Reddit Posts Fetcher
```bash
python3 scripts/fetch-reddit.py [--defaults DIR] [--config DIR] [--hours 48] [--output FILE]
```
- Parallel fetching (4 workers), public JSON API (no auth required)
- 13 subreddits with score filtering
#### `enrich-articles.py` - Article Full-Text Enrichment
```bash
python3 scripts/enrich-articles.py --input merged.json --output enriched.json [--min-score 10] [--max-articles 15] [--verbose]
```
- Fetches full article text for high-scoring articles
- Cloudflare Markdown for Agents (preferred) โ HTML extraction (fallback) โ Skip (paywalled/social)
- Blog domain whitelist with lower score threshold (โฅ3)
- Parallel fetching (5 workers, 10s timeout)
#### `merge-sources.py` - Quality Scoring & Deduplication
```bash
python3 scripts/merge-sources.py --rss FILE --twitter FILE --web FILE --github FILE --reddit FILE
```
- Quality scoring, title similarity dedup (85%), previous digest penalty
- Output: topic-grouped articles sorted by score
#### `validate-config.py` - Configuration Validator
```bash
python3 scripts/validate-config.py [--defaults DIR] [--config DIR] [--verbose]
```
- JSON schema validation, topic reference checks, duplicate ID detection
#### `generate-pdf.py` - PDF Report Generator
```bash
python3 scripts/generate-pdf.py --input report.md --output digest.pdf [--verbose]
```
- Converts markdown digest to styled A4 PDF with Chinese typography (Noto Sans CJK SC)
- Emoji icons, page headers/footers, blue accent theme. Requires `weasyprint`.
#### `sanitize-html.py` - Safe HTML Email Converter
```bash
python3 scripts/sanitize-html.py --input report.md --output email.html [--verbose]
```
- Converts markdown to XSS-safe HTML email with inline CSS
- URL whitelist (http/https only), HTML-escaped text content
#### `source-health.py` - Source Health Monitor
```bash
python3 scripts/source-health.py --rss FILE --twitter FILE --github FILE --reddit FILE --web FILE [--verbose]
```
- Tracks per-source success/failure history over 7 days
- Reports unhealthy sources (>50% failure rate)
#### `summarize-merged.py` - Merged Data Summary
```bash
python3 scripts/summarize-merged.py --input merged.json [--top N] [--topic TOPIC]
```
- Human-readable summary of merged data for LLM consumption
- Shows top articles per topic with scores and metrics
## User Customization
### Workspace Configuration Override
Place custom configs in `workspace/config/` to override defaults:
- **Sources**: Append new sources, disable defaults with `"enabled": false`
- **Topics**: Override topic definitions, search queries, display settings
- **Merge Logic**:
- Sources with same `id` โ user version takes precedence
- Sources with new `id` โ appended to defaults
- Topics with same `id` โ user version completely replaces default
### Example Workspace Override
```json
// workspace/config/tech-news-digest-sources.json
{
"sources": [
{
"id": "simonwillison-rss",
"enabled": false,
"note": "Disabled: too noisy for my use case"
},
{
"id": "my-Related in Productivity
gitea-workflow
IncludedOrchestrate agile development workflows for Gitea repositories using the tea CLI. Use when working with Gitea-hosted repos and asking to 'run the workflow', 'continue working', 'what's next', 'complete the task cycle', 'start my day', 'end the sprint', 'implement the next task', or wanting guided step-by-step development assistance. Keywords: workflow, orchestrate, agile, task cycle, sprint, daily, implement, review, PR, standup, retrospective, gitea, tea.
microsoft-graph-gateway
IncludedRoute Microsoft Graph work in this workspace. Use when users want to read or write Outlook mail, calendar events, contacts, OneDrive or SharePoint files, Teams, Planner, To Do, users, groups, directory data, or arbitrary Microsoft Graph endpoints from VS Code. Prefer WorkIQ for common read scenarios. Use Microsoft Graph for write actions and gap-read scenarios that need exact Graph properties, filters, permissions, or endpoints.
copilotkit
IncludedUse when building with CopilotKit โ setup, development, integrations, debugging, upgrading, or contributing. Routes to the appropriate specialized skill based on the task.
wordly-wisdom
IncludedProvides calibrated decision analysis using Charlie Munger-style multiple mental models, inversion, incentive mapping, circle-of-competence checks, misjudgment audits, second-order effects, and forecast updates. Use when the user asks for an oracle take, a hard call, a decision memo, a premortem, an outside view, a red-team, a sanity-check, what am I missing, think this through, or wants a strategy, hire, investment, plan, product, partnership, or major life choice analysed. Avoid for simple factual lookups or time-sensitive legal, medical, or market questions without fresh evidence.
swain-session
IncludedSession management and project status dashboard. Owns the full session lifecycle (start/work/close/resume), focus lane, bookmarks, worktree detection, and tab naming. Also serves as the project status dashboard โ shows active epics, progress, actionable next steps, blocked items, tasks, GitHub issues, and recommendations. Worktree creation is deferred to swain-do task dispatch (SPEC-195). Triggers on: 'session', 'status', 'what's next', 'dashboard', 'overview', 'where are we', 'what should I work on', 'show me priorities', 'bookmark', 'focus on', 'session info'.
gandi
IncludedComprehensive Gandi domain registrar integration for domain and DNS management. Register and manage domains, create/update/delete DNS records (A, AAAA, CNAME, MX, TXT, SRV, and more), configure email forwarding and aliases, check SSL certificate status, create DNS snapshots for safe rollback, bulk update zone files, and monitor domain expiration. Supports multi-domain management, zone file import/export, and automated DNS backups. Includes both read-only and destructive operations with safety controls.