data-feeds
Extract structured data from 40+ websites including Amazon, LinkedIn, Instagram, TikTok, Facebook, YouTube, and more. Uses Bright Data's Web Data APIs with automatic polling. Returns clean JSON with product details, profiles, reviews, posts, and comments.
# Bright Data - Structured Data Feeds
Extract structured data from major websites with automatic parsing. No scraping logic is needed: provide a URL and get clean JSON back.
## Setup
### Environment Variables (Required)
```bash
export BRIGHTDATA_API_KEY="your-api-key"
```
### Environment Variables (Optional)
```bash
export BRIGHTDATA_POLLING_TIMEOUT=600 # Max seconds to wait (default: 600)
```
Get your API key from [Bright Data Dashboard](https://brightdata.com/cp).
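Before running any dataset command, it can help to fail fast when the key is missing. The following is a minimal sketch, assuming only that the scripts read `BRIGHTDATA_API_KEY` from the environment. The function name `require_api_key` is hypothetical and is not part of the skill's scripts.

```bash
# Hypothetical helper, not part of scripts/datasets.sh.
# Returns 0 if BRIGHTDATA_API_KEY is set and non-empty, 1 otherwise.
require_api_key() {
  if [ -z "${BRIGHTDATA_API_KEY:-}" ]; then
    echo "BRIGHTDATA_API_KEY is not set; export it before running datasets.sh" >&2
    return 1
  fi
  return 0
}
```

It could be chained in front of a call, for example `require_api_key && bash scripts/datasets.sh amazon_product "<url>"`.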
## Usage
```bash
bash scripts/datasets.sh <dataset_type> <url> [additional_params...]
```
## Available Datasets
### E-Commerce
| Dataset | Command | Description |
|---------|---------|-------------|
| Amazon Product | `datasets.sh amazon_product <url>` | Product details, pricing, ratings |
| Amazon Reviews | `datasets.sh amazon_product_reviews <url>` | Customer reviews for a product |
| Amazon Search | `datasets.sh amazon_product_search <keyword> <domain_url>` | Search results |
| Walmart Product | `datasets.sh walmart_product <url>` | Product details from Walmart |
| Walmart Seller | `datasets.sh walmart_seller <url>` | Seller information |
| eBay Product | `datasets.sh ebay_product <url>` | eBay listing details |
| Home Depot | `datasets.sh homedepot_products <url>` | Home Depot product data |
| Zara | `datasets.sh zara_products <url>` | Zara product details |
| Etsy | `datasets.sh etsy_products <url>` | Etsy listing data |
| Best Buy | `datasets.sh bestbuy_products <url>` | Best Buy product info |
### Professional Networks
| Dataset | Command | Description |
|---------|---------|-------------|
| LinkedIn Person | `datasets.sh linkedin_person_profile <url>` | Profile data (experience, skills) |
| LinkedIn Company | `datasets.sh linkedin_company_profile <url>` | Company page data |
| LinkedIn Jobs | `datasets.sh linkedin_job_listings <url>` | Job posting details |
| LinkedIn Posts | `datasets.sh linkedin_posts <url>` | Post content and engagement |
| LinkedIn Search | `datasets.sh linkedin_people_search <url> <first> <last>` | Find people by first and last name |
| Crunchbase | `datasets.sh crunchbase_company <url>` | Company funding, employees |
| ZoomInfo | `datasets.sh zoominfo_company_profile <url>` | Company profile data |
### Instagram
| Dataset | Command | Description |
|---------|---------|-------------|
| Profiles | `datasets.sh instagram_profiles <url>` | Bio, followers, following |
| Posts | `datasets.sh instagram_posts <url>` | Post details, likes, captions |
| Reels | `datasets.sh instagram_reels <url>` | Reel data and metrics |
| Comments | `datasets.sh instagram_comments <url>` | Post comments |
### Facebook
| Dataset | Command | Description |
|---------|---------|-------------|
| Posts | `datasets.sh facebook_posts <url>` | Post content and reactions |
| Marketplace | `datasets.sh facebook_marketplace_listings <url>` | Listing details |
| Reviews | `datasets.sh facebook_company_reviews <url> [num]` | Company reviews |
| Events | `datasets.sh facebook_events <url>` | Event details |
### TikTok
| Dataset | Command | Description |
|---------|---------|-------------|
| Profiles | `datasets.sh tiktok_profiles <url>` | Creator profile data |
| Posts | `datasets.sh tiktok_posts <url>` | Video details and metrics |
| Shop | `datasets.sh tiktok_shop <url>` | TikTok Shop product data |
| Comments | `datasets.sh tiktok_comments <url>` | Video comments |
### YouTube
| Dataset | Command | Description |
|---------|---------|-------------|
| Profiles | `datasets.sh youtube_profiles <url>` | Channel data |
| Videos | `datasets.sh youtube_videos <url>` | Video details and stats |
| Comments | `datasets.sh youtube_comments <url> [num]` | Video comments (default: 10) |
### Other Social
| Dataset | Command | Description |
|---------|---------|-------------|
| X (Twitter) | `datasets.sh x_posts <url>` | Tweet data |
| Reddit | `datasets.sh reddit_posts <url>` | Post and comment data |
### Google Services
| Dataset | Command | Description |
|---------|---------|-------------|
| Maps Reviews | `datasets.sh google_maps_reviews <url> [days]` | Business reviews (default: 3 days) |
| Shopping | `datasets.sh google_shopping <url>` | Product comparison data |
| Play Store | `datasets.sh google_play_store <url>` | App details and reviews |
### Other
| Dataset | Command | Description |
|---------|---------|-------------|
| Apple App Store | `datasets.sh apple_app_store <url>` | iOS app data |
| Reuters News | `datasets.sh reuter_news <url>` | News article content |
| GitHub | `datasets.sh github_repository_file <url>` | Repository file data |
| Yahoo Finance | `datasets.sh yahoo_finance_business <url>` | Stock and company data |
| Zillow | `datasets.sh zillow_properties_listing <url>` | Property listing details |
| Booking.com | `datasets.sh booking_hotel_listings <url>` | Hotel listing data |
## Examples
### Get LinkedIn Profile
```bash
bash scripts/datasets.sh linkedin_person_profile "https://www.linkedin.com/in/satyanadella/"
```
### Get Amazon Product
```bash
bash scripts/datasets.sh amazon_product "https://www.amazon.com/dp/B09V3KXJPB"
```
### Get Instagram Profile
```bash
bash scripts/datasets.sh instagram_profiles "https://www.instagram.com/natgeo/"
```
### Get YouTube Comments
```bash
bash scripts/datasets.sh youtube_comments "https://www.youtube.com/watch?v=dQw4w9WgXcQ" 20
```
### Search Amazon
```bash
bash scripts/datasets.sh amazon_product_search "wireless headphones" "https://www.amazon.com"
```
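The examples above each fetch a single URL. To run the same dataset type over several URLs, a loop along the following lines may work. This is a sketch under assumptions: the function name `fetch_all` and the output directory layout are hypothetical, and it assumes `datasets.sh` writes its JSON to standard output and exits non-zero on failure.

```bash
# Hypothetical helper: run one dataset type over URLs read from stdin
# (one per line) and write each result to <out_dir>/<n>.json.
fetch_all() {
  local dataset_type="$1"
  local out_dir="${2:-results}"
  local url
  local n=0
  mkdir -p "$out_dir"
  while IFS= read -r url; do
    # Skip blank lines in the input list.
    if [ -z "$url" ]; then
      continue
    fi
    n=$((n + 1))
    # </dev/null keeps the child script from consuming the URL list on stdin.
    if ! bash scripts/datasets.sh "$dataset_type" "$url" \
        > "$out_dir/$n.json" </dev/null; then
      echo "failed: $url" >&2
    fi
  done
}
```

Usage, from the directory that contains `scripts/`: `fetch_all instagram_profiles out < urls.txt`. Requests run one at a time, so total run time grows with the number of URLs.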
## Output Format
Returns structured JSON with website-specific fields. Example for a LinkedIn profile (array contents elided):
```json
{
  "name": "Satya Nadella",
  "headline": "Chairman and CEO at Microsoft",
  "location": "Greater Seattle Area",
  "connections": "500+",
  "experience": [...],
  "education": [...],
  "skills": [...]
}
```
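Individual fields can be pulled out of the JSON with a tool such as `jq`. The sketch below assumes `jq` is installed and that the output carries the `name`, `headline`, and `location` fields shown in the sample; actual field names may differ per dataset. The function name `profile_summary` is hypothetical. It accepts either a single object or an array of objects, since the exact top-level shape is not stated here.

```bash
# Hypothetical helper: print "name | headline | location" for each
# profile in the JSON read from stdin. Missing fields become "n/a".
profile_summary() {
  jq -r '
    (if type == "array" then .[] else . end)
    | [.name, .headline, .location]
    | map(. // "n/a")
    | join(" | ")
  '
}
```

If `datasets.sh` prints its JSON to standard output, the two can be combined as `bash scripts/datasets.sh linkedin_person_profile "<url>" | profile_summary`.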
## How It Works
1. **Trigger**: Sends URL to Bright Data's Web Data API
2. **Poll**: Waits for data collection to complete (checks every second)
3. **Return**: Outputs structured JSON when ready
Polling waits for the full extraction to finish before returning, and stops waiting after `BRIGHTDATA_POLLING_TIMEOUT` seconds (default: 600). The polling mechanism also handles rate limits.
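The trigger, poll, return steps follow a common poll-with-timeout pattern. The sketch below illustrates that pattern only; it is not the code in `datasets.sh`. The function name `poll_until_ready` is hypothetical, and the readiness check is whatever command the caller passes in, standing in for the real status request, which is not shown in this document.

```bash
# Generic poll-with-timeout sketch (not the skill's actual implementation).
# "$@" is any command that exits 0 once the data is ready.
poll_until_ready() {
  local timeout="${BRIGHTDATA_POLLING_TIMEOUT:-600}"  # max seconds to wait
  local interval=1                                    # check every second
  local elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    if "$@"; then
      return 0   # ready: the caller can now read the result
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "Timed out after ${timeout}s waiting for data" >&2
  return 1
}
```

A caller would pass its own check, for example `poll_until_ready my_status_check`, where `my_status_check` is a placeholder for a function that returns success once collection has completed.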
## Advanced: Direct Fetch
For custom dataset IDs or advanced use cases, call `fetch.sh` directly with a dataset ID and a JSON input:
```bash
bash scripts/fetch.sh <dataset_id> '<json_input>'
```
Example:
```bash
bash scripts/fetch.sh gd_l1viktl72bvl7bjuj0 '{"url":"https://linkedin.com/in/someone"}'
```