apify-financial-news
Discover and extract financial news for tracked portfolio companies across 33 verified Tier 1 sources (Bloomberg, Reuters, FT, WSJ, IntelliNews, ČTK, PAP, BTA, TASR, ING Think, ECB, EC Press Corner, ...) plus broad Google News fallback. Use when the user asks to find news about a company, get press coverage, monitor financial press, run a news scan, or check headlines for a portfolio company. Reads tracked companies from data/companies.json. Do NOT use for marketing/social-listening (use apify/awesome-skills) or for morning-briefing formatting (out of scope).
What this skill does
# Financial News Intelligence
Discover and extract financial news for portfolio companies via Apify Actors. Two modes:
- **Single company** — news scan for one company.
- **Portfolio scan** — same pipeline run across multiple companies.
33 Tier 1 sources organized in 4 categories (Global / Pan-European / Institutional / CEE Local). Tier 2 = broad Google News fallback for unverified domains.
**Estimated cost**: $0.10–0.50 per single company, $1–5 per full portfolio scan.
## Prerequisites
- Apify access — preferred: `apify` CLI (`npm install -g apify-cli && apify login`); fallback: Apify MCP connector (`call-actor` tool). CLI is faster and preferred when both are available.
- Python 3 + `pip install readability-lxml lxml` (for the `extract_and_clean.py` post-processor)
- Companies data at `${CLAUDE_PLUGIN_ROOT}/data/companies.json`
`${CLAUDE_PLUGIN_ROOT}` is the plugin's root directory (where `.claude-plugin/` lives). It is resolved automatically by Claude Code when the plugin is installed, or set to the `--plugin-dir` path during development.
## Workflow checklist
Copy this and tick boxes as you progress:
```
Task Progress:
- [ ] Step 0: Verify prerequisites — try `apify --version && apify info`; if unavailable, check for `call-actor` MCP tool; if neither, tell user to install apify CLI or Apify MCP connector. Also verify: `python3 -c "from readability import Document; print('OK')"` (install: `pip install readability-lxml lxml`)
- [ ] Step 1: Build queries (look up company in data/companies.json or construct manually)
- [ ] Step 2: Discovery — pick 8-12 sources by region, run 2-phase Google News
- [ ] Step 3: Dedup + route (Tier 1 = whitelisted domain → verified extractor; Tier 2 = broad → rag-web-browser)
- [ ] Step 4: Extract & clean (run extractor, then extract_and_clean.py)
- [ ] Step 5: Output Tier 1 + Tier 2 tables to user
```
## Constraints
### Allowed Apify Actors (exhaustive — do NOT use others)
| Actor | Purpose |
|-------|---------|
| `data_xplorer/google-news-scraper-fast` | Google News discovery |
| `louvre/rss-news-aggregator` | RSS discovery |
| `rodrigo_pacelli/headline-news-scraper` | Headline discovery |
| `jamie_tran/bloomberg-article-scraper` | Bloomberg extraction |
| `romy/bloomberg-news-scraper` | Bloomberg fallback |
| `workhard3000/news-intelligence-rag-extractor` | Paywall extraction |
| `apify/rag-web-browser` | Free/soft-paywall + Tier 2 |
| `stanvanrooy6/universal-ai-web-scraper` | Hard paywall (Barron's, MarketWatch) — $0.25/page |
Do NOT use any other actor. Do NOT use WebSearch, WebFetch, or browser tools.
### Extractor routing (mandatory — do NOT substitute)
| Extractor | Domains |
|-----------|---------|
| `jamie_tran/bloomberg-article-scraper` | bloomberg.com |
| `workhard3000/news-intelligence-rag-extractor` | ft.com, wsj.com, economist.com, morningstar.com, asia.nikkei.com, caixinglobal.com, zawya.com, euobserver.com, reuters.com |
| `apify/rag-web-browser` | cnbc.com, forbes.com, investors.com, lesechos.fr, afr.com, scmp.com, euronews.com, intellinews.com, handelsblatt.com, politico.eu, eubusiness.com, eureporter.co, ecb.europa.eu, + all 7 CEE Local |
| `stanvanrooy6/universal-ai-web-scraper` | barrons.com, marketwatch.com |
| REST API (presscorner) | ec.europa.eu |
Full per-source config: [reference/SOURCE_CONFIGS.md](reference/SOURCE_CONFIGS.md). Machine-readable: [data/sources.json](data/sources.json).
## Pipeline
### Step 1: Build queries
Look up company in `${CLAUDE_PLUGIN_ROOT}/data/companies.json`. Key fields under `queries`: `gnews_en`, `gnews_cz`, `bloomberg`.
For non-portfolio companies, construct manually: quoted full legal name + ticker OR variant + geographic qualifier.
**Query rules:**
- Use `"InPost SA"` not `InPost` (quoted full names avoid false positives — see [reference/EUROPEAN_COMPANIES_GUIDE.md](reference/EUROPEAN_COMPANIES_GUIDE.md))
- Per-source `site:` operator: `site:bloomberg.com "InPost SA" OR "INPST"`
- FT tip: use `site:ft.com/content/` (bare `ft.com` returns stock-data pages)
- Valid timeframes: `"1h"`, `"1d"`, `"7d"`, `"1y"`, `"all"` (NOT `"30d"`)
- Tickers < 4 chars: always pair with full company name
- ALWAYS set `decodeUrls: true` on Google News input
### Step 2: Discovery
Do NOT search all 33 sources. Pick 8–12 based on company region.
#### Regional priorities
| Region | Priority Sources |
|--------|------------------|
| **CZ** | ČTK, IntelliNews, Reuters, Bloomberg, POLITICO EU, FT, Handelsblatt |
| **PL** | PAP, IntelliNews, Reuters, Bloomberg, POLITICO EU, FT |
| **HU** | Telex.hu, HVG.hu, VG.hu, IntelliNews, Reuters, Bloomberg |
| **BG** | BTA, IntelliNews, Reuters, Bloomberg, Euronews |
| **SK** | TASR, IntelliNews, Reuters, Bloomberg, Handelsblatt |
| **Western Europe** | Bloomberg, Reuters, FT, Handelsblatt, Les Echos, POLITICO EU |
| **US / Global** | Bloomberg, Reuters, WSJ, FT, CNBC, Forbes, Barron's |
| **Asia / MENA** | Bloomberg, Reuters, SCMP, Nikkei, Caixin, Zawya |
| **EU Regulatory** | POLITICO EU, EUobserver, EUbusiness, EU Reporter, EC Press Corner, ECB |
#### Two-phase Google News strategy
**Phase 1 — Targeted** (per priority source, with `site:`):
```bash
apify call data_xplorer/google-news-scraper-fast \
--input '{"keywords":["site:bloomberg.com \"InPost SA\" OR \"INPST\""],"maxArticles":10,"timeframe":"7d","region_language":"US:en","decodeUrls":true,"proxyConfiguration":{"useApifyProxy":true,"apifyProxyGroups":["RESIDENTIAL"]}}' \
--user-agent apify-awesome-skills/apify-financial-news \
--output-dataset > discovery_bloomberg.json
```
**Phase 2 — Broad** (always run, no `site:` operator). Classify results by domain in Step 3 — whitelisted → Tier 1, others → Tier 2.
**CEE local-language discovery**: For CEE companies, run additional queries with `region_language` set to `CZ:cs`, `PL:pl`, `HU:hu`, `BG:bg`, `SK:sk`. See `region_language` field per source in [data/sources.json](data/sources.json).
#### EC Press Corner — direct REST API (no Actor)
```bash
curl -s "https://ec.europa.eu/commission/presscorner/api/documents?reference=IP/26/614&language=en"
```
Parse `IP_XX_NNN` reference IDs from Google News titles to construct API calls.
#### RSS / Headline discovery (optional)
For sources with RSS, you can supplement GNews with `louvre/rss-news-aggregator` (max 10 feeds per run; split into batches). For 6 sources, `rodrigo_pacelli/headline-news-scraper` works (CNBC, SCMP, Nikkei, Caixin, Zawya tag-pages-only, Handelsblatt). RSS/headline output is unfiltered — filter client-side by company name in title/description. Full feed list: [reference/PIPELINE_DETAIL.md](reference/PIPELINE_DETAIL.md).
### Step 3: Dedup & route
1. Collect URLs from all discovery runs.
2. Classify by domain: whitelist (33 Tier 1 sources) → Tier 1; other → Tier 2.
3. Filter non-article URLs: `/quote/`, `/stock/`, `/sitemap`, `/author/`, `/tag/`, `/key-metrics/`, `/newsletters/`, `/topic/`, `/profile/`, `redirectUrl=`.
4. Filter by company name/ticker in title.
5. Deduplicate (URL normalize).
6. Route Tier 1 URLs to verified extractor per routing table.
7. Route Tier 2 URLs to `apify/rag-web-browser`.
**Low-coverage fallback** (< 3 articles): broaden timeframe to `"1y"`, run Phase 2 if skipped, try local-language queries.
### Step 4: Extract & clean
Run the extractor per the routing table. For `rag-web-browser` calls, use `outputFormats: ["html"]`.
After extraction, run `extract_and_clean.py` on the dataset to strip nav/menus/footers via readability-lxml. The script auto-detects format: HTML cleaned via readability-lxml, already-clean output (Bloomberg scraper, workhard3000) passes through.
```bash
DATASET_ID=$(apify call apify/rag-web-browser \
--input '{"query":"<ARTICLE_URL>","maxResults":1,"outputFormats":["html"],"requestTimeoutSecs":40,"proxyConfiguration":{"useApifyProxy":true,"apifyProxyGroups":["RESIDENTIAL"]},"removeCookieWarnings":true}' \
--user-agent apify-awesome-skills/apify-financial-news \
--json | jq -r '.defaultDataseRelated in General
modeling-omnistudio-epc-catalog
IncludedSalesforce Industries CME EPC product-modeling skill for Product2-based catalog creation. Use when creating EPC products, configuring product attributes, building offer bundles with Product Child Items, or reviewing EPC DataPack JSON metadata for product catalog changes. TRIGGER when: user creates or updates Product2 EPC records, AttributeAssignment payloads, AttributeMetadata/AttributeDefaultValues, Offer bundles, or ProductChildItem relationships. DO NOT TRIGGER when: designing OmniScripts/FlexCards/Integration Procedures (use building-omnistudio-omniscript, building-omnistudio-flexcard, or building-omnistudio-integration-procedure), implementing Apex business logic (use generating-apex), or troubleshooting deployment pipelines (use deploying-metadata).
relationship-science-coach
IncludedUse this skill for direct, practical adult relationship coaching: couples conflict, repair, trust, marriage, dating, flirting, attachment patterns, emotional connection, sex, desire differences, eroticism, kink negotiation, affection, love languages, breakups, and long-term passion. Draw on Gottman, EFT and Hold Me Tight, attachment science, modern sex research, Perel, Nagoski, Kerner, Schnarch, Love and Stosny, and flexible love-language tools. Be concrete and low-hedge. Redirect only for imminent danger, abuse, coercive control, minors, non-consent, self-harm, stalking, or medical/legal/psychiatric decisions.
building-sf-integrations
IncludedSalesforce integration architecture and runtime plumbing with 120-point scoring. Use this skill to set up Named Credentials, External Credentials, External Services, REST/SOAP callout patterns, Platform Events, and Change Data Capture. TRIGGER when: user sets up Named Credentials, External Services, REST/SOAP callouts, Platform Events, CDC, or touches .namedCredential-meta.xml files. DO NOT TRIGGER when: Connected App/OAuth config (use configuring-connected-apps), Apex-only logic (use generating-apex), or data import/export (use handling-sf-data).
venue-templates
IncludedAccess comprehensive LaTeX templates, formatting requirements, and submission guidelines for major scientific publication venues (Nature, Science, PLOS, IEEE, ACM), academic conferences (NeurIPS, ICML, CVPR, CHI), research posters, and grant proposals (NSF, NIH, DOE, DARPA). This skill should be used when preparing manuscripts for journal submission, conference papers, research posters, or grant proposals and need venue-specific formatting requirements and templates.
let-fate-decide
IncludedDraws the 12 Houses of the Zodiac Tarot spread to inject entropy into planning when prompts are vague, ambiguous, or casually delegated. Interprets the spread to guide next steps. Use when the user says 'let fate decide', 'YOLO', 'whatever', 'idk', or other nonchalant phrases, makes Yu-Gi-Oh references, or when you are about to arbitrarily pick between multiple reasonable approaches. Prefer over ask-questions-if-underspecified when the user's tone is casual or playful rather than precision-seeking.
net-ops
IncludedCross-platform network troubleshooting (Windows, macOS, Linux) via local or remote shell. Use for: DNS broken, can't resolve hostnames, nslookup/dig works but apps fail, NRPT, WFP, scutil, /etc/resolver, systemd-resolved, /etc/resolv.conf, NetworkManager, VPN DNS leak residue (ProtonVPN/Mullvad/WireGuard/AnyConnect), AV/firewall blocking DNS or DoH, Tailscale DNS interaction, intermittent connectivity, remote diagnostics over SSH.