daily-news-report
Scrapes content based on a preset URL list, filters high-quality technical information, and generates daily Markdown reports.
What this skill does
# Daily News Report v3.0
> **Architecture Upgrade**: Main Agent Orchestration + SubAgent Execution + Browser Scraping + Smart Caching
## Core Architecture
```
┌─────────────────────────────────────────────────────────────────────┐
│ Main Agent (Orchestrator) │
│ Role: Scheduling, Monitoring, Evaluation, Decision, Aggregation │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ 1. Init │ → │ 2. Dispatch │ → │ 3. Monitor │ → │ 4. Evaluate │ │
│ │ Read Config │ │ Assign Tasks│ │ Collect Res │ │ Filter/Sort │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │ │ │ │ │
│ ▼ ▼ ▼ ▼ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ 5. Decision │ ← │ Enough 20? │ │ 6. Generate │ → │ 7. Update │ │
│ │ Cont/Stop │ │ Y/N │ │ Report File │ │ Cache Stats │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │
└──────────────────────────────────────────────────────────────────────┘
↓ Dispatch ↑ Return Results
┌─────────────────────────────────────────────────────────────────────┐
│ SubAgent Execution Layer │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Worker A │ │ Worker B │ │ Browser │ │
│ │ (WebFetch) │ │ (WebFetch) │ │ (Headless) │ │
│ │ Tier1 Batch │ │ Tier2 Batch │ │ JS Render │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ ↓ ↓ ↓ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Structured Result Return │ │
│ │ { status, data: [...], errors: [...], metadata: {...} } │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
```
## Configuration Files
This skill uses the following configuration files:
| File | Purpose |
|------|---------|
| `sources.json` | Source configuration, priorities, scrape methods |
| `cache.json` | Cached data, historical stats, deduplication fingerprints |
## Execution Process Details
### Phase 1: Initialization
```yaml
Steps:
1. Determine date (user argument or current date)
2. Read sources.json for source configurations
3. Read cache.json for historical data
4. Create output directory NewsReport/
5. Check if a partial report exists for today (append mode)
```
### Phase 2: Dispatch SubAgents
**Strategy**: Parallel dispatch, batch execution, early stopping mechanism
```yaml
Wave 1 (Parallel):
- Worker A: Tier1 Batch A (HN, HuggingFace Papers)
- Worker B: Tier1 Batch B (OneUsefulThing, Paul Graham)
Wait for results → Evaluate count
If < 15 high-quality items:
Wave 2 (Parallel):
- Worker C: Tier2 Batch A (James Clear, FS Blog)
- Worker D: Tier2 Batch B (HackerNoon, Scott Young)
If still < 20 items:
Wave 3 (Browser):
- Browser Worker: ProductHunt, Latent Space (Require JS rendering)
```
### Phase 3: SubAgent Task Format
Task format received by each SubAgent:
```yaml
task: fetch_and_extract
sources:
- id: hn
url: https://news.ycombinator.com
extract: top_10
- id: hf_papers
url: https://huggingface.co/papers
extract: top_voted
output_schema:
items:
- source_id: string # Source Identifier
title: string # Title
summary: string # 2-4 sentence summary
key_points: string[] # Max 3 key points
url: string # Original URL
keywords: string[] # Keywords
quality_score: 1-5 # Quality Score
constraints:
filter: "Cutting-edge Tech/Deep Tech/Productivity/Practical Info"
exclude: "General Science/Marketing Puff/Overly Academic/Job Posts"
max_items_per_source: 10
skip_on_error: true
return_format: JSON
```
### Phase 4: Main Agent Monitoring & Feedback
Main Agent Responsibilities:
```yaml
Monitoring:
- Check SubAgent return status (success/partial/failed)
- Count collected items
- Record success rate per source
Feedback Loop:
- If a SubAgent fails, decide whether to retry or skip
- If a source fails persistently, mark as disabled
- Dynamically adjust source selection for subsequent batches
Decision:
- Items >= 25 AND HighQuality >= 20 → Stop scraping
- Items < 15 → Continue to next batch
- All batches done but < 20 → Generate with available content (Quality over Quantity)
```
### Phase 5: Evaluation & Filtering
```yaml
Deduplication:
- Exact URL match
- Title similarity (>80% considered duplicate)
- Check cache.json to avoid history duplicates
Score Calibration:
- Unify scoring standards across SubAgents
- Adjust weights based on source credibility
- Bonus points for manually curated high-quality sources
Sorting:
- Descending order by quality_score
- Sort by source priority if scores are equal
- Take Top 20
```
### Phase 6: Browser Scraping (MCP Chrome DevTools)
For pages requiring JS rendering, use a headless browser:
```yaml
Process:
1. Call mcp__chrome-devtools__new_page to open page
2. Call mcp__chrome-devtools__wait_for to wait for content load
3. Call mcp__chrome-devtools__take_snapshot to get page structure
4. Parse snapshot to extract required content
5. Call mcp__chrome-devtools__close_page to close page
Applicable Scenarios:
- ProductHunt (403 on WebFetch)
- Latent Space (Substack JS rendering)
- Other SPA applications
```
### Phase 7: Generate Report
```yaml
Output:
- Directory: NewsReport/
- Filename: YYYY-MM-DD-news-report.md
- Format: Standard Markdown
Content Structure:
- Title + Date
- Statistical Summary (Source count, items collected)
- 20 High-Quality Items (Template based)
- Generation Info (Version, Timestamps)
```
### Phase 8: Update Cache
```yaml
Update cache.json:
- last_run: Record this run info
- source_stats: Update stats per source
- url_cache: Add processed URLs
- content_hashes: Add content fingerprints
- article_history: Record included articles
```
## SubAgent Call Examples
### Using general-purpose Agent
Since custom agents require session restart to be discovered, use general-purpose and inject worker prompts:
```
Task Call:
subagent_type: general-purpose
model: haiku
prompt: |
You are a stateless execution unit. Only do the assigned task and return structured JSON.
Task: Scrape the following URLs and extract content
URLs:
- https://news.ycombinator.com (Extract Top 10)
- https://huggingface.co/papers (Extract top voted papers)
Output Format:
{
"status": "success" | "partial" | "failed",
"data": [
{
"source_id": "hn",
"title": "...",
"summary": "...",
"key_points": ["...", "...", "..."],
"url": "...",
"keywords": ["...", "..."],
"quality_score": 4
}
],
"errors": [],
"metadata": { "processed": 2, "failed": 0 }
}
Filter Criteria:
- Keep: Cutting-edge Tech/Deep Tech/Productivity/Practical Info
- Exclude: General Science/Marketing Puff/Overly Academic/Job Posts
Related in Writing & Docs
jax-development
IncludedUse this skill when the user is writing, debugging, profiling, refactoring, reviewing, benchmarking, parallelising, exporting, or explaining JAX code, or when they mention JAX, jax.numpy, jit, grad, value_and_grad, vmap, scan, lax, random keys, pytrees, jax.Array, sharding, Mesh, PartitionSpec, NamedSharding, pmap, shard_map, Pallas, XLA, StableHLO, checkify, profiler, or the JAX repo. It helps turn NumPy or PyTorch-style code into pure functional JAX, fix tracer/control-flow/shape/PRNG bugs, remove recompiles and host-device syncs, choose transforms and sharding strategies, inspect jaxpr/lowering/IR, and benchmark compiled code correctly.
nature-article-writer
IncludedDrafts, rewrites, diagnostically critiques, and style-calibrates primary research manuscripts for Nature and Nature Portfolio journals. Use when the user wants a Nature-style title, summary paragraph or abstract, introduction, results, discussion, methods, figure legends, presubmission enquiry, cover letter, reviewer response, or when a scientific draft sounds generic, jargon-heavy, structurally weak, or AI-ish and needs precise, broad-reader-friendly prose without inventing data, analyses, or references. Best for primary research articles and letters rather than reviews or press releases unless explicitly adapting one.
deckrd
IncludedDocument-driven framework that derives requirements, specifications, implementation plans, and executable tasks from goals through structured AI dialogue. Use when user says "write requirements", "create spec", "plan implementation", "derive tasks", "structure this feature", "break down into tasks", or "document this module". Also use for reverse engineering existing code into docs (/deckrd rev). Do NOT use for direct code writing — use /deckrd-coder after tasks are generated. Do NOT use when the user only wants to run or fix existing code without planning.
clinical-decision-support
IncludedGenerate professional clinical decision support (CDS) documents for pharmaceutical and clinical research settings, including patient cohort analyses (biomarker-stratified with outcomes) and treatment recommendation reports (evidence-based guidelines with decision algorithms). Supports GRADE evidence grading, statistical analysis (hazard ratios, survival curves, waterfall plots), biomarker integration, and regulatory compliance. Outputs publication-ready LaTeX/PDF format optimized for drug development, clinical research, and evidence synthesis.
handling-sf-data
IncludedSalesforce data operations with 130-point scoring. Use this skill to create, update, delete, bulk import/export, generate test data, and clean up org records using sf CLI and anonymous Apex. TRIGGER when: user creates test data, performs bulk import/export, uses sf data CLI commands, needs data factory patterns for Apex tests, or needs to seed/clean records in a Salesforce org. DO NOT TRIGGER when: SOQL query writing only (use querying-soql), Apex test execution (use running-apex-tests), or metadata deployment (use deploying-metadata).
accelint-ac-to-playwright
IncludedConvert and validate acceptance criteria for Playwright test automation. Use when user asks to (1) review/evaluate/check if AC are ready for automation, (2) assess if AC can be converted as-is, (3) validate AC quality for Playwright, (4) turn AC into tests, (5) generate tests from acceptance criteria, (6) convert .md bullets or .feature Gherkin files to Playwright specs, (7) create test automation from requirements. Handles both bullet-style markdown and Gherkin syntax with JSON test plan generation and validation.