comp-scout-scrape

Scrape competition websites, extract structured data, and auto-persist to GitHub issues. Creates issues for new competitions, adds comments for duplicates.

# Competition Scraper

Scrape creative writing competitions from Australian aggregator sites and **automatically persist to GitHub**.

## What This Skill Does

1. Scrapes competitions.com.au and netrewards.com.au
2. Extracts structured data (dates, prompts, prizes)
3. **Checks for duplicates** against existing GitHub issues (by URL and title similarity)
4. Creates issues for **NEW** competitions only
5. Adds comments to existing issues when same competition found on another site
6. Skips competitions that are already tracked

**The scraper already filters out sponsored/lottery ads. Your job is to check for duplicates, then persist only new competitions.**

## What Counts as "New"

A competition is NEW if:
- Its URL is not found in any existing issue body (check the full body text, not just the primary URL field)
- AND its normalized title is at most 80% similar to every existing issue title

A competition is a DUPLICATE if:
- Its URL appears anywhere in an existing issue (body text, comments) → already tracked, skip
- Its normalized title is >80% similar to an existing issue title → likely same competition, skip
- Same competition found on a different aggregator site → add comment to existing issue noting the alternate URL

**Note:** An issue body may contain multiple URLs (one per aggregator site). When checking for duplicates, search the entire issue body for the scraped URL, not just a specific field.
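
The rules above can be sketched in Python as a small classifier. This is a minimal sketch under stated assumptions, not the skill's actual matching code: `difflib.SequenceMatcher` stands in for the similarity metric (the skill does not say which one is used), the names `similarity`, `issue_text`, and `classify` are hypothetical, and issues are assumed to be dicts shaped like `gh issue list --json number,title,body` output, optionally with a `comments` list whose items carry a `body`. It also assumes that a title match counts as a cross-site sighting only when the issue text does not yet mention the scraped site.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.80  # the 80% cut-off from the rules above


def similarity(a, b):
    """Ratio in [0, 1]. SequenceMatcher is an assumed stand-in metric."""
    return SequenceMatcher(None, a, b).ratio()


def issue_text(issue):
    """Issue body plus any comment bodies, joined into one searchable string."""
    parts = [issue.get("body") or ""]
    parts += [c.get("body") or "" for c in issue.get("comments") or []]
    return "\n".join(parts)


def classify(comp, issues, normalize=str.lower):
    """Return (verdict, issue_number) for one scraped competition.

    verdict is "tracked" (skip), "cross_site" (comment on the issue) or "new".
    `normalize` should mirror the scraper's title normalization; plain
    lowercasing is only a placeholder default.
    """
    # Pass 1: the exact URL anywhere in an issue means it is already tracked.
    for issue in issues:
        if comp["url"] in issue_text(issue):
            return "tracked", issue["number"]
    # Pass 2: a sufficiently similar title means the same competition.
    for issue in issues:
        score = similarity(comp["normalized_title"], normalize(issue["title"]))
        if score > THRESHOLD:
            # Assumption: an unseen site on a known competition is a
            # cross-site sighting; otherwise it is simply already tracked.
            if comp["site"] not in issue_text(issue):
                return "cross_site", issue["number"]
            return "tracked", issue["number"]
    return "new", None
```

Checking URLs across all issues before looking at any title keeps an exact URL hit from being shadowed by a fuzzy title match on a different issue.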

## Word Limit Clarification

**"25WOL" is a category name, NOT a filter.** Competitions with 25, 50, or 100 word limits are all valid creative writing competitions; persist every one that is new.

## Prerequisites

```bash
pip install playwright
playwright install chromium
```

Also requires:
- `gh` CLI authenticated
- Target repository for competition data (not this skills repo)

## Workflow

### Step 1: Determine Target Repository

The target repo stores the competition issues. Specify it explicitly or read it from the workspace config:

```bash
# From workspace config (if hiivmind-pulse-gh initialized)
TARGET_REPO=$(yq '.repositories[0].full_name' .hiivmind/github/config.yaml 2>/dev/null)

# Or use default/specified
TARGET_REPO="${TARGET_REPO:-discreteds/competition-data}"
```

### Step 2: Scrape Listings

Run the scraper to get structured competition data:

```bash
python skills/comp-scout-scrape/scraper.py listings
```

**Output:**
```json
{
  "competitions": [
    {
      "url": "https://competitions.com.au/win-example/",
      "site": "competitions.com.au",
      "title": "Win a $500 Gift Card",
      "normalized_title": "500 gift card",
      "brand": "Example Brand",
      "prize_summary": "$500",
      "prize_value": 500,
      "closing_date": "2024-12-31"
    }
  ],
  "scrape_date": "2024-12-09",
  "errors": []
}
```
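
A minimal Python sketch of consuming this output, assuming the scraper prints the JSON document shown above to stdout. `parse_listings` is a hypothetical helper, not part of the skill.

```python
import json


def parse_listings(raw):
    """Split the scraper's JSON output into (competitions, errors).

    Entries without a URL are dropped because duplicate detection
    depends on the URL.
    """
    payload = json.loads(raw)
    competitions = [c for c in payload.get("competitions", []) if c.get("url")]
    errors = payload.get("errors", [])
    return competitions, errors
```

The raw text could come from `subprocess.run(["python", "skills/comp-scout-scrape/scraper.py", "listings"], capture_output=True, text=True, check=True).stdout`. Returning `errors` separately lets the caller report partial failures without discarding the competitions that were scraped successfully.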

### Step 3: Check for Existing Issues

For each scraped competition, check if it already exists:

```bash
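# Get all competition issues, open and closed. Auto-tagged issues are closed
# on creation (Step 5), so an open-only query would miss them and re-create them.
# `comments` is included because alternate URLs are recorded as comments.
gh issue list -R "$TARGET_REPO" \
  --label "competition" \
  --state all \
  --json number,title,body,comments \
  --limit 200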
```

**Match by:**
1. URL in issue body (exact match = definite duplicate)
2. Normalized title similarity (>80% = likely duplicate)

### Step 4: Fetch Details for New Competitions

For competitions not already tracked, get full details:

```bash
python skills/comp-scout-scrape/scraper.py detail "https://competitions.com.au/win-example/"
```

For multiple new competitions, use batch mode:

```bash
echo '{"urls": ["url1", "url2", ...]}' | python skills/comp-scout-scrape/scraper.py details-batch
```
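
The batch payload can also be built programmatically, which avoids hand-writing JSON. This is a small sketch that assumes `details-batch` reads exactly the `{"urls": [...]}` object shown above from stdin. `batch_payload` is a hypothetical helper.

```python
import json


def batch_payload(urls):
    """Serialize URLs into the {"urls": [...]} object, dropping repeats.

    dict.fromkeys keeps the first occurrence of each URL and preserves order.
    """
    unique = list(dict.fromkeys(urls))
    return json.dumps({"urls": unique})
```

The resulting string could then be piped to the scraper with `subprocess.run([...], input=payload, text=True, capture_output=True)`.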

### Step 4.5: Apply Auto-Tagging Rules (NOT Filtering)

**IMPORTANT: Auto-tagging is for LABELING issues, not for skipping/excluding competitions.**

Check competitions against user preferences from the data repo's CLAUDE.md to determine which labels to apply.

1. Fetch preferences:
```bash
gh api repos/$TARGET_REPO/contents/CLAUDE.md -H "Accept: application/vnd.github.raw" 2>/dev/null
```

2. Parse the Detection Keywords section for tagging rules

3. For each competition, check if title/prize matches any keywords:
```
For each tag_rule in [for-kids, cruise]:
  For each keyword in tag_rule.keywords:
    If keyword.lower() in (competition.title + competition.prize_summary).lower():
      Add tag_rule.label to issue labels
```

4. **ALL competitions are ALWAYS persisted as issues.** Tagged competitions:
   - Get the relevant label applied (e.g., `for-kids`, `cruise`)
   - Are closed immediately with explanation comment
   - But they ARE STILL CREATED as issues (for record-keeping and potential review)
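
The matching loop in step 3 could look like this in Python. It is a sketch only: `match_tags` and the shape of `tag_rules` (label mapped to a keyword list) are assumptions, since the format of the Detection Keywords section in CLAUDE.md is not shown here. Unlike the pseudocode, it joins title and prize with a space so that a keyword cannot match across the boundary between the two fields.

```python
def match_tags(competition, tag_rules):
    """Return {label: first_matching_keyword} for one competition.

    Matching is a case-insensitive substring test on title + prize_summary.
    """
    title = competition.get("title") or ""
    prize = competition.get("prize_summary") or ""
    haystack = f"{title} {prize}".lower()
    matches = {}
    for label, keywords in tag_rules.items():
        for keyword in keywords:
            if keyword.lower() in haystack:
                matches[label] = keyword
                break  # one hit per label is enough
    return matches


# Hypothetical rules; the real keywords live in the data repo's CLAUDE.md.
EXAMPLE_RULES = {"for-kids": ["Lego"], "cruise": ["P&O", "cruise"]}
```

Keeping the matched keyword alongside the label makes it available for the explanation comment when the issue is closed. Substring matching is deliberately loose, so a keyword such as `cruise` would also match `cruiser`.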

### Step 5: Auto-Persist Results

#### For New Competitions → Create Issue

```bash
gh issue create -R "$TARGET_REPO" \
  --title "$TITLE" \
  --label "competition" \
  --label "25wol" \
  --body "$(cat <<'EOF'
## Competition Details

**URL:** {url}
**Brand:** {brand}
**Prize:** {prize_summary}
**Word Limit:** {word_limit} words
**Closes:** {closing_date}
**Draw Date:** {draw_date}
**Winners Notified:** {notification_info}

## Prompt

> {prompt}

---
*Scraped from {site} on {scrape_date}*
EOF
)"
```

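Then set the milestone to the competition's closing month (`$ISSUE_NUMBER` is the number at the end of the URL printed by `gh issue create`):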
```bash
gh issue edit $ISSUE_NUMBER -R "$TARGET_REPO" --milestone "December 2024"
```
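
The milestone name can be derived from the `closing_date` field. This is a minimal sketch assuming milestones are named `<Month> <Year>` as in the example above. `milestone_for` is a hypothetical helper, and the month names are hard-coded so the result does not depend on the system locale.

```python
from datetime import date

MONTHS = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def milestone_for(closing_date):
    """Map a YYYY-MM-DD closing date to a milestone title, or None if unknown."""
    if not closing_date:
        return None  # closing_date may be null in the scraper output
    parsed = date.fromisoformat(closing_date)
    return f"{MONTHS[parsed.month - 1]} {parsed.year}"
```

For the sample listing, `milestone_for("2024-12-31")` returns `"December 2024"`.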

#### For Duplicates → Add Comment

If the same competition is found on another site:

```bash
gh issue comment $EXISTING_ISSUE -R "$TARGET_REPO" --body "$(cat <<'EOF'
### Also found on {other_site}

**URL:** {url}
**Title on this site:** {title}
*Discovered: {date}*
EOF
)"
```

#### For Filtered Competitions → Create Issue + Close

If a competition matched the auto-tagging keywords from Step 4.5:

```bash
# Create the issue first (for record-keeping)
ISSUE_URL=$(gh issue create -R "$TARGET_REPO" \
  --title "$TITLE" \
  --label "competition" \
  --label "25wol" \
  --label "$FILTER_LABEL" \
  --body "...")

# Extract issue number
ISSUE_NUMBER=$(echo "$ISSUE_URL" | grep -oE '[0-9]+$')

# Close with explanation (unquoted EOF so $KEYWORD and $FILTER_RULE expand)
gh issue close "$ISSUE_NUMBER" -R "$TARGET_REPO" --comment "$(cat <<EOF
Auto-filtered: matches '$KEYWORD' in $FILTER_RULE preferences.

See CLAUDE.md in this repository for filter settings.
EOF
)"
```

### Step 6: Report Results

Present a confirmation to the user:

```
✅ Scrape complete!

**Created 3 new issues:**
- #42: Win a $500 Coles Gift Card (closes Dec 31)
- #43: Win a Trip to Bali (closes Jan 15)
- #44: Win a Year's Supply of Coffee (closes Dec 20)

**Auto-filtered 2 (created + closed):**
- #45: Win Lego Set (for-kids: matched "Lego")
- #46: Win P&O Cruise (cruise: matched "P&O")

**Found 2 duplicates (added as comments):**
- #38: Win Woolworths Gift Cards (also on netrewards.com.au)
- #39: Win Dreamworld Experience (also on netrewards.com.au)

**Skipped 7 already tracked**
```

**IMPORTANT:** Do NOT ask "Would you like me to analyze these?" at the end. When invoked by `comp-scout-daily`, the workflow will automatically invoke analyze/compose skills next. Report results and stop.

## Output Fields

### Listing Output

| Field | Type | Description |
|-------|------|-------------|
| url | string | Full URL to competition detail page |
| site | string | Source site (competitions.com.au or netrewards.com.au) |
| title | string | Competition title as displayed |
| normalized_title | string | Lowercase, prefixes stripped, for matching |
| brand | string | Sponsor/brand name (if available) |
| prize_summary | string | Prize description or value badge |
| prize_value | int/null | Numeric value in dollars |
| closing_date | string/null | YYYY-MM-DD format |
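
The scraper's exact normalization rules are not documented here. The following Python sketch is only one plausible reading of "lowercase, prefixes stripped" that reproduces the sample pair above (`Win a $500 Gift Card` becomes `500 gift card`). `normalize_title` and the choice of prefixes are assumptions.

```python
import re

# Assumed prefix: a leading "win", optionally followed by an article.
_PREFIX = re.compile(r"^\s*win\s+(?:(?:a|an|the)\s+)?")


def normalize_title(title):
    """Lowercase, strip a leading 'win (a|an|the)', drop punctuation."""
    text = _PREFIX.sub("", title.lower())
    text = re.sub(r"[^a-z0-9\s]", "", text)   # remove symbols such as $ and '
    return re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace
```

Requiring whitespace after the article keeps titles such as `Win Apple Watch` from losing the leading `a` of `apple`.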

### Detail Output

All listing fields plus:

| Field | Type | Description |
|-------|------|-------------|
| prompt | string | The actual competition question/prompt |
| word_limit | int | Maximum words (default 25) |
| entry_method | string | How to submit entry |
| winner_notification | object/null | Notification details from JSON-LD |
| scraped_at | string | ISO timestamp of scrape |

### Winner Notification Object

| Field | Type | Description |
|-------|------|-------------|
| notification_text | string | Raw notification text |
| notification_date | string/null | Specific dat

Source: https://github.com/discreteds/competition-scout/tree/main/skills/comp-scout-scrape
