apify-scrapers
Social media and web scraping using Apify actors. Use this skill when scraping Twitter/X tweets, Reddit posts, LinkedIn posts, Instagram profiles/posts/reels, Facebook pages/posts/groups, TikTok videos, YouTube content, Google Maps businesses/reviews, contact enrichment (emails/phones from websites), or when auto-detecting URL type to scrape. Triggers on requests to scrape social media, get trending posts, extract business info, find contact details, or extract content from social URLs.
# Apify Scrapers
## Overview
Scrape content from major social platforms, Google Maps, and business websites using Apify actors. Each platform uses settings tuned to balance cost and quality.
## Quick Decision Tree
```
What do you want to scrape?
│
├── Social Media Posts
│   ├── Twitter/X → references/twitter.md
│   │   └── Script: scripts/scrape_twitter_ai_trends.py
│   │
│   ├── Reddit → references/reddit.md
│   │   └── Script: scripts/scrape_reddit_ai_tech.py
│   │
│   ├── LinkedIn → references/linkedin.md
│   │   └── Script: scripts/scrape_linkedin_posts.py
│   │
│   ├── Instagram → references/instagram.md
│   │   ├── Script: scripts/scrape_instagram.py
│   │   └── Modes: profile, posts, hashtag, reels, comments
│   │
│   ├── Facebook → references/facebook.md
│   │   ├── Script: scripts/scrape_facebook.py
│   │   └── Modes: page, posts, reviews, groups, marketplace
│   │
│   ├── TikTok → references/multi-platform.md
│   │   └── Script: scripts/scrape_multi_platform.py
│   │
│   └── YouTube → references/multi-platform.md
│       └── Script: scripts/scrape_multi_platform.py
│
├── Business/Places
│   ├── Google Maps businesses → references/google-maps.md
│   │   ├── Script: scripts/scrape_google_maps.py
│   │   └── Modes: search, place, reviews
│   │
│   └── Contact info from websites → references/contact-enrichment.md
│       ├── Script: scripts/scrape_contact_info.py
│       └── Extract: emails, phone numbers, social profiles
│
├── Auto-detect URL type → references/url-detect.md
│   └── Script: scripts/scrape_content_by_url.py
│
├── Trend Analysis (NEW)
│   └── Enriched trend analysis → workflows/trend-analysis.md
│       ├── Script: scripts/analyze_trends.py
│       └── Features: velocity scoring, lifecycle staging, opportunity scoring
│
└── Workflows (multi-step)
    ├── Lead generation → workflows/lead-generation.md
    ├── Influencer discovery → workflows/influencer-discovery.md
    ├── Competitor analysis → workflows/competitor-intel.md
    ├── Trend analysis → workflows/trend-analysis.md
    └── Competitor Ads Intelligence (NEW) → workflows/competitor-ads.md
        ├── Script: scripts/scrape_competitor_ads.py
        ├── Platforms: Facebook Ads Library, Google Ads Transparency
        └── Features: Spend estimates, creative analysis, benchmarking
```
## Environment Setup
```bash
# Required in .env
APIFY_TOKEN=apify_api_xxxxx
```
Get your API token from the Apify Console: https://console.apify.com/account/integrations
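A minimal sketch of how a script might read the token at startup and fail early when it is missing. It assumes `APIFY_TOKEN` has already been exported into the process environment (for example by a dotenv loader); `load_apify_token` is a hypothetical helper for illustration, not a function taken from the bundled scripts.

```python
import os


def load_apify_token(env=None):
    """Return APIFY_TOKEN from the environment, or raise a clear error.

    Hypothetical helper: the bundled scripts may load the token differently.
    `env` defaults to os.environ and exists only to make testing easy.
    """
    env = os.environ if env is None else env
    token = env.get("APIFY_TOKEN", "").strip()
    if not token:
        raise RuntimeError("APIFY_TOKEN is not set; add it to .env or export it")
    return token
```

Failing before any actor is started avoids a confusing authentication error partway through a run.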
## Common Usage Patterns
### Scrape Twitter Trends
```bash
python scripts/scrape_twitter_ai_trends.py --query "AI agents" --max-tweets 50
```
### Scrape Reddit Discussions
```bash
python scripts/scrape_reddit_ai_tech.py --subreddits "MachineLearning,LocalLLaMA" --max-posts 100
```
### Scrape LinkedIn Author
```bash
python scripts/scrape_linkedin_posts.py author "https://linkedin.com/in/username" --max-posts 30
```
### Auto-detect and Scrape URL
```bash
python scripts/scrape_content_by_url.py "https://x.com/user/status/123456"
```
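To illustrate what URL auto-detection could look like, here is a rough sketch that maps a URL's host to a platform name. The host table and the `detect_platform` function are assumptions made for this example; `scripts/scrape_content_by_url.py` may use different rules (see `references/url-detect.md`).

```python
from urllib.parse import urlparse

# Hypothetical host-to-platform table; the real script may use other rules.
PLATFORM_HOSTS = {
    "x.com": "twitter",
    "twitter.com": "twitter",
    "reddit.com": "reddit",
    "linkedin.com": "linkedin",
    "instagram.com": "instagram",
    "facebook.com": "facebook",
    "tiktok.com": "tiktok",
    "youtube.com": "youtube",
    "youtu.be": "youtube",
}


def detect_platform(url):
    """Return a platform name for a URL, or None if the host is unknown."""
    host = (urlparse(url).hostname or "").lower()
    for domain, platform in PLATFORM_HOSTS.items():
        # Match the bare domain and any subdomain (www., m., old., ...).
        if host == domain or host.endswith("." + domain):
            return platform
    return None
```

Matching on the parsed hostname rather than a substring of the whole URL keeps look-alike domains and query strings that merely mention a platform from being misclassified.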
### Scrape Instagram Profile
```bash
python scripts/scrape_instagram.py profile "https://instagram.com/username" --max-posts 20
```
### Scrape Instagram Hashtag
```bash
python scripts/scrape_instagram.py hashtag "#artificialintelligence" --max-posts 50
```
### Scrape Instagram Reels
```bash
python scripts/scrape_instagram.py reels "https://instagram.com/username" --max-reels 30
```
### Scrape Facebook Page
```bash
python scripts/scrape_facebook.py page "https://facebook.com/pagename" --max-posts 50
```
### Scrape Facebook Reviews
```bash
python scripts/scrape_facebook.py reviews "https://facebook.com/pagename" --max-reviews 100
```
### Scrape Facebook Marketplace
```bash
python scripts/scrape_facebook.py marketplace "laptops in san francisco" --max-items 30
```
### Scrape Google Maps Businesses
```bash
python scripts/scrape_google_maps.py search "AI consulting firms in New York" --max-results 50
```
### Scrape Google Maps Reviews
```bash
python scripts/scrape_google_maps.py reviews "ChIJN1t_tDeuEmsRUsoyG83frY4" --max-reviews 100
```
### Extract Contact Info from Websites
```bash
python scripts/scrape_contact_info.py "https://example.com" --depth 2
```
### Bulk Contact Enrichment
```bash
python scripts/scrape_contact_info.py --urls-file companies.txt --output contacts.json
```
### Scrape Competitor Ads (Single Competitor)
```bash
python scripts/scrape_competitor_ads.py "Nike" --platforms facebook google --country US --days 30
```
### Compare Multiple Competitors' Ads
```bash
python scripts/scrape_competitor_ads.py "Nike" "Adidas" "Puma" --compare --output comparison.json
```
### Discover Advertisers by Keyword
```bash
python scripts/scrape_competitor_ads.py --search "running shoes" --country US --max-ads 200
```
### Filter Competitor Ads by Media Type
```bash
python scripts/scrape_competitor_ads.py "Netflix" "Disney+" --platforms facebook --media-types video --days 7
```
### Analyze Trends (NEW)
```bash
# Analyze specific topic with enrichments
python scripts/analyze_trends.py "artificial intelligence" --sources google instagram tiktok --days 90
# Discover trending topics in category
python scripts/analyze_trends.py --category technology --discover --top 50
# Compare multiple trends
python scripts/analyze_trends.py "AI" "blockchain" "metaverse" --compare
# Export HTML trend report
python scripts/analyze_trends.py "sustainable fashion" --format html --output trend_report.html
```
## Cost Estimates
| Platform | Actor | Approx. Cost (USD, per item unless noted) |
|----------|-------|---------------|
| Twitter | kaitoeasyapi/twitter-x-data-tweet-scraper | ~$0.00025 |
| Reddit | trudax/reddit-scraper | ~$0.001-0.005 |
| LinkedIn | harvestapi/linkedin-post-search | ~$0.01-0.05 |
| YouTube | streamers/youtube-scraper | ~$0.01-0.05 |
| TikTok | clockworks/tiktok-scraper | ~$0.005 |
| Instagram (profile) | apify/instagram-profile-scraper | ~$0.005 |
| Instagram (posts) | apify/instagram-post-scraper | ~$0.002-0.005 |
| Instagram (hashtag) | apify/instagram-hashtag-scraper | ~$0.002-0.005 |
| Instagram (reels) | apify/instagram-reel-scraper | ~$0.005-0.01 |
| Instagram (comments) | apify/instagram-comment-scraper | ~$0.001-0.003 |
| Facebook (page) | apify/facebook-pages-scraper | ~$0.005-0.01 |
| Facebook (posts) | apify/facebook-posts-scraper | ~$0.003-0.005 |
| Facebook (reviews) | apify/facebook-reviews-scraper | ~$0.002-0.005 |
| Facebook (groups) | apify/facebook-groups-scraper | ~$0.005-0.01 |
| Facebook (marketplace) | apify/facebook-marketplace-scraper | ~$0.005-0.01 |
| Google Maps (search) | compass/crawler-google-places | ~$0.01-0.02 |
| Google Maps (place) | compass/google-maps-business-scraper | ~$0.01 |
| Google Maps (reviews) | compass/google-maps-reviews-scraper | ~$0.003-0.005 |
| Contact Enrichment | lukaskrivka/contact-info-scraper | ~$0.01-0.03 |
| Google Trends | apify/google-trends-scraper | ~$0.01 |
| Trend Analysis (multi) | Multiple actors | ~$0.50-1.50/run |
| Facebook Ads Library | apify/facebook-ads-scraper | ~$0.75/1K ads |
| Facebook Ads (alt) | curious_coder/facebook-ads-library-scraper | ~$0.50/1K ads |
| Google Ads Transparency | lexis-solutions/google-ads-scraper | ~$1.00/1K ads |
| Google Ads (alt) | xtech/google-ad-transparency-scraper | ~$0.80/1K ads |
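The per-item figures above can be turned into a quick budget check before a run. The sketch below is illustrative arithmetic only: `estimate_cost` is a hypothetical helper, and the rates are the approximate values from the table, not billing guarantees.

```python
def estimate_cost(items, low_per_item, high_per_item=None):
    """Return a (low, high) USD estimate for scraping `items` items.

    Pass a single rate when the table lists one value instead of a range.
    """
    if items < 0:
        raise ValueError("items must be non-negative")
    high_per_item = low_per_item if high_per_item is None else high_per_item
    return (round(items * low_per_item, 4), round(items * high_per_item, 4))


# 100 Reddit posts at ~$0.001-0.005 per item
reddit_low, reddit_high = estimate_cost(100, 0.001, 0.005)

# 50 tweets at ~$0.00025 per item
tweets_low, tweets_high = estimate_cost(50, 0.00025)
```

With these rates, 100 Reddit posts come to roughly $0.10 to $0.50, and 50 tweets to about $0.0125.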
## Output Location
All scraped data is saved to `.tmp/` with timestamped filenames, for example:
- `.tmp/twitter_ai_trends_YYYYMMDD.json`
- `.tmp/reddit_ai_tech_YYYYMMDD.json`
- `.tmp/linkedin_posts_YYYYMMDD_HHMMSS.json`
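The filename patterns above can be reproduced with a small helper. This is a sketch under the assumption that the timestamp comes from local time at the moment of saving; `output_path` and its parameters are hypothetical names, not part of the bundled scripts.

```python
from datetime import datetime
from pathlib import Path


def output_path(prefix, now=None, with_time=False, base_dir=".tmp"):
    """Build a path such as .tmp/<prefix>_YYYYMMDD.json.

    Set with_time=True for the YYYYMMDD_HHMMSS variant.
    """
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d_%H%M%S" if with_time else "%Y%m%d")
    return Path(base_dir) / f"{prefix}_{stamp}.json"
```

Note that a date-only stamp means a second run on the same day would target the same filename, while the date-and-time variant keeps each run separate.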
## Security Notes
### Credential Handling
- Store `APIFY_TOKEN` in `.env` file (never commit to git)
- Rotate API tokens periodically via Apify Console
- Never log or print API tokens in script output
- Use environment variables, not hardcoded values
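One way to honor the "never log or print API tokens" rule is to scrub the token from any message before it is written out. The `redact` function below is a hypothetical helper shown as a sketch, not code from the skill.

```python
def redact(text, secret, placeholder="[REDACTED]"):
    """Return `text` with every occurrence of `secret` replaced.

    An empty secret is ignored so that unset tokens do not mangle output.
    """
    if not secret:
        return text
    return text.replace(secret, placeholder)


# Example: scrub a token from an error message before logging it.
message = redact("request failed for token apify_api_xxxxx", "apify_api_xxxxx")
```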
### Data Privacy
- Scraped data contains only publicly available content
- Social media posts may include PII (names, handles, profile info)
- Data is stored locally in `.tmp/` directory
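- Apify keeps run output (datasets, key-value stores) on its platform until it is deleted or the account's data-retention period expires; delete results you do not need to keep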
- Consider data minimization - only scrape what you need
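Data minimization can also be applied after scraping, by dropping fields you do not need before writing results to disk. The sketch below assumes scraped items are plain dictionaries; the field names in the example are hypothetical and will differ per actor.

```python
def keep_fields(items, fields):
    """Return copies of `items` containing only the allowed `fields`."""
    return [{key: item[key] for key in fields if key in item} for item in items]


# Hypothetical item shape; real actor output fields will differ.
raw = [{"text": "hello", "likes": 3, "author_email": "a@example.com"}]
minimal = keep_fields(raw, ["text", "likes"])
```

An allow-list is safer than a block-list here, because new fields added by an actor are excluded by default.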
### Access Scopes
- Apify tokens