
performing-ai-driven-osint-correlation


Use AI and LLM-based reasoning to correlate findings across multiple OSINT sources—username enumeration, email lookups, social media profiles, domain records, breach databases, and dark-web mentions—into unified intelligence profiles with confidence scoring and link analysis.

Category: AI Agents | Tags: osint, ai-correlation, threat-intelligence, reconnaissance, link-analysis, target-profiling, sherlock, theharvester, scripts



# Performing AI-Driven OSINT Correlation

## When to Use

- You have collected raw OSINT data from multiple tools and sources but need to identify connections, contradictions, and patterns across them.
- You need to build a unified intelligence profile for a target entity (person, organization, or infrastructure) from fragmented data.
- Traditional manual correlation is too slow or error-prone for the volume of data collected.
- You want confidence-scored assessments of identity linkage across platforms rather than simple keyword matching.

## Prerequisites

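- Python 3.10+ with the third-party `requests` and `openai` packages (`pip install requests openai`, or your LLM provider's SDK); `json`, `csv`, and `os` are part of the standard library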
- [Sherlock](https://github.com/sherlock-project/sherlock) installed (`pip install sherlock-project`)
- [theHarvester](https://github.com/laramies/theHarvester) installed (`pip install theHarvester`)
- [SpiderFoot](https://github.com/smicallef/spiderfoot) 4.0+ running on localhost:5001
- Access to an LLM API (OpenAI, Anthropic, or local model via Ollama)
- Optional: Maltego CE for graph visualization of correlation results
- Optional: API keys for Shodan, VirusTotal, HaveIBeenPwned, Hunter.io
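
A quick preflight check can catch a missing tool or key before Phase 1 starts. This is a minimal sketch: `preflight` is a hypothetical helper, the tool list covers the CLIs the workflow invokes (`sherlock`, `theHarvester`, `curl`, `jq`), and `OPENAI_API_KEY` and `HIBP_KEY` are the environment variables the later steps read:

```python
import os
import shutil

# CLIs invoked by the workflow steps; adjust if you swap tools
REQUIRED_TOOLS = ["sherlock", "theHarvester", "curl", "jq"]
# Environment variables read by correlate.py and the HIBP query
REQUIRED_KEYS = ["OPENAI_API_KEY", "HIBP_KEY"]


def preflight(tools=REQUIRED_TOOLS, keys=REQUIRED_KEYS):
    """Return (missing_tools, missing_keys) so the caller decides how to react."""
    missing_tools = [tool for tool in tools if shutil.which(tool) is None]
    missing_keys = [key for key in keys if not os.environ.get(key)]
    return missing_tools, missing_keys


if __name__ == "__main__":
    tools, keys = preflight()
    for tool in tools:
        print(f"missing tool on PATH: {tool}")
    for key in keys:
        print(f"missing environment variable: {key}")
```

Returning both lists instead of exiting lets you treat the optional API keys (Shodan, VirusTotal, Hunter.io) as warnings rather than hard failures.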

## Workflow

### Legal & Ethical Requirements

- Obtain written authorization before starting any investigation and keep it on file
- Establish lawful basis for data processing (law enforcement, corporate policy, etc.)
- Define PII retention limits and data handling procedures
- Comply with local privacy regulations (GDPR, CCPA, etc.)
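
Two of these requirements map onto concrete controls for the `/tmp/osint` working directory used below. A minimal sketch, assuming POSIX permissions: `make_private_dir` restricts the directory to its owner (a plain `mkdir -p` is subject to the umask and is typically readable by other local users), and `purge_expired` deletes files older than a retention window, using file modification time as a stand-in for collection time. Both helper names and the 30-day value are hypothetical; substitute the limit your data-handling procedure defines:

```python
import os
import stat
import time
from pathlib import Path

OSINT_DIR = "/tmp/osint"  # working directory used throughout the workflow
RETENTION_DAYS = 30       # hypothetical retention limit; use your documented value


def make_private_dir(path=OSINT_DIR):
    """Create the directory if needed and restrict it to the owner (mode 0o700)."""
    os.makedirs(path, exist_ok=True)
    os.chmod(path, 0o700)  # set explicitly: makedirs' mode argument is masked by umask
    return stat.S_IMODE(os.stat(path).st_mode)


def purge_expired(path=OSINT_DIR, retention_days=RETENTION_DAYS, now=None):
    """Delete files whose last modification is older than the retention window."""
    now = time.time() if now is None else now
    cutoff = now - retention_days * 86400
    deleted = []
    # Materialize the listing first so deletions don't disturb the directory walk
    for file in sorted(Path(path).rglob("*")):
        if file.is_file() and file.stat().st_mtime < cutoff:
            file.unlink()
            deleted.append(file)
    return deleted
```

Call `make_private_dir()` before collection begins and `purge_expired()` on a schedule (for example from cron) so raw PII does not outlive the agreed retention period.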

### Phase 1 — Multi-Source OSINT Collection

0. **Create the working directory for all OSINT outputs:**

   ```bash
   mkdir -p /tmp/osint
   ```

1. **Enumerate usernames across platforms with Sherlock:**

   ```bash
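   # --csv writes <username>.csv into the --folderoutput directory
   sherlock "targetusername" --folderoutput /tmp/osint --csv
   # normalize.py (Phase 2) parses the CSV from this path
   mv /tmp/osint/targetusername.csv /tmp/osint/sherlock-results.txt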
   ```

2. **Harvest emails, subdomains, and hosts with theHarvester:**

   ```bash
   theHarvester -d targetdomain.com -b all -f /tmp/osint/harvester-results.json
   ```

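3. **Start a SpiderFoot passive scan through its web API:**

   ```bash
   # With "Accept: application/json", /startscan replies ["SUCCESS", "<scan id>"]
   curl -s http://localhost:5001/startscan \
     -H "Accept: application/json" \
     --data-urlencode "scanname=target-recon" \
     --data-urlencode "scantarget=targetdomain.com" \
     --data-urlencode "usecase=Passive" \
     --data-urlencode "modulelist=" \
     --data-urlencode "typelist=" \
     | jq -r '.[1]'
   ```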

4. **Export SpiderFoot results when the scan completes:**

   ```bash
   SCAN_ID="<scanid_from_step_3>"
   curl -s "http://localhost:5001/scanexportjsonmulti?ids=${SCAN_ID}" \
     -o /tmp/osint/spiderfoot-results.json
   ```

5. **Query breach databases for email exposure (example with HIBP API):**

   ```bash
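   TARGET_EMAIL="user@targetdomain.com"  # placeholder: the address under investigation
   curl -s -H "hibp-api-key: ${HIBP_KEY}" \
     -H "User-Agent: OSINT-Correlation-Skill" \
     "https://haveibeenpwned.com/api/v3/breachedaccount/${TARGET_EMAIL}?truncateResponse=false" \
     -o /tmp/osint/breach-results.json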
   ```
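
`normalize.py` in the next phase reads the Sherlock, theHarvester, and SpiderFoot output but not `breach-results.json`, so breach exposure never reaches the correlation step. A minimal sketch of an extra normalizer, assuming the HIBP v3 response fields `Name`, `BreachDate`, and `DataClasses` (the latter two are only present when `truncateResponse=false` is set); `normalize_hibp` is a hypothetical helper:

```python
import json
import os
from datetime import datetime, timezone

BREACH_PATH = "/tmp/osint/breach-results.json"


def normalize_hibp(path=BREACH_PATH, account=""):
    """Convert an HIBP breachedaccount response into common-schema findings.

    HIBP signals "no breaches" with HTTP 404, which leaves an empty or
    non-JSON file behind; both cases are treated as "no findings".
    """
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as f:
        raw = f.read().strip()
    try:
        breaches = json.loads(raw) if raw else []
    except json.JSONDecodeError:
        return []
    if not isinstance(breaches, list):
        return []  # e.g. an error object such as a 401 response body
    collected_at = datetime.now(timezone.utc).isoformat()
    return [
        {
            "source": "hibp",
            "type": "breach",
            "value": breach.get("Name", ""),
            "account": account,
            "breach_date": breach.get("BreachDate", ""),
            "data_classes": breach.get("DataClasses", []),
            "collected_at": collected_at,
        }
        for breach in breaches
    ]
```

Extending the `findings` list in `normalize.py` with `normalize_hibp(account=...)` before it writes `normalized-findings.json` gives the LLM the breach evidence its confidence rubric refers to.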

### Phase 2 — Data Normalization

6. **Normalize all collected data into a common schema.** Create a unified JSON structure that tags each finding with its source, timestamp, and data type:

   ```bash
   cat > /tmp/osint/normalize.py << 'EOF'
   import csv
   import json
   import os
   from datetime import datetime, timezone

   # One UTC timestamp per run (datetime.utcnow() is deprecated since Python 3.12)
   collected_at = datetime.now(timezone.utc).isoformat()
   findings = []

   # Normalize Sherlock CSV results
   # (CSV columns: username, name, url_main, url_user, exists, http_status, response_time_s)
   sherlock_path = "/tmp/osint/sherlock-results.txt"
   if os.path.exists(sherlock_path):
       with open(sherlock_path, newline="", encoding="utf-8") as f:
           for row in csv.DictReader(f):
               findings.append({
                   "source": "sherlock",
                   "type": "social_profile",
                   "platform": row.get("name", ""),
                   "url": row.get("url_user", ""),
                   "username": row.get("username", ""),
                   "status": row.get("exists", ""),  # e.g. "Claimed" or "Available"
                   "collected_at": collected_at,
               })

   # Normalize theHarvester JSON results
   harvester_path = "/tmp/osint/harvester-results.json"
   if os.path.exists(harvester_path):
       with open(harvester_path, encoding="utf-8") as f:
           data = json.load(f)
       for email in data.get("emails", []):
           findings.append({"source": "theHarvester", "type": "email",
                            "value": email, "collected_at": collected_at})
       for host in data.get("hosts", []):
           findings.append({"source": "theHarvester", "type": "hostname",
                            "value": host, "collected_at": collected_at})

   # Normalize SpiderFoot results (the JSON export names the field "event_type")
   sf_path = "/tmp/osint/spiderfoot-results.json"
   if os.path.exists(sf_path):
       with open(sf_path, encoding="utf-8") as f:
           for item in json.load(f):
               findings.append({
                   "source": "spiderfoot",
                   "type": item.get("event_type", item.get("type", "unknown")),
                   "value": item.get("data", ""),
                   "module": item.get("module", ""),
                   "collected_at": collected_at,
               })

   with open("/tmp/osint/normalized-findings.json", "w", encoding="utf-8") as f:
       json.dump(findings, f, indent=2)

   sources = {finding["source"] for finding in findings}
   print(f"Normalized {len(findings)} findings from {len(sources)} sources")
   EOF
   python3 /tmp/osint/normalize.py
   ```
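
`normalize.py` appends every row as-is, so the same email reported by both theHarvester and SpiderFoot reaches the LLM twice and uses up prompt space. A minimal sketch of a dedupe pass to run before step 7: `dedupe_findings` is a hypothetical helper, and the alias table mapping SpiderFoot's `EMAILADDR` and `INTERNET_NAME` event types onto this workflow's `email` and `hostname` labels is an assumption to extend for other types. Recording which sources agree also gives a corroboration count:

```python
# Assumed aliases so SpiderFoot event types line up with this workflow's labels
TYPE_ALIASES = {"EMAILADDR": "email", "INTERNET_NAME": "hostname"}
# Only these types are case-folded; URL paths can be case-sensitive
CASE_INSENSITIVE_TYPES = {"email", "hostname"}


def dedupe_findings(findings):
    """Merge findings sharing a (type, value) key and record every source that reported it."""
    merged = {}
    for item in findings:
        kind = TYPE_ALIASES.get(item.get("type", ""), item.get("type", ""))
        # Sherlock rows carry "url"; the other normalizers use "value"
        value = (item.get("value") or item.get("url") or "").strip()
        if not value:
            continue  # nothing to correlate on
        key_value = value.lower() if kind in CASE_INSENSITIVE_TYPES else value
        key = (kind, key_value)
        if key not in merged:
            # Keep the first-seen record, with the aliased type and an empty sources list
            merged[key] = {**item, "type": kind, "sources": []}
        source = item.get("source", "unknown")
        if source not in merged[key]["sources"]:
            merged[key]["sources"].append(source)
    return list(merged.values())
```

Load `normalized-findings.json`, pass the list through `dedupe_findings`, and write it back; a finding with several entries in `sources` is independently corroborated.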

### Phase 3 — AI-Driven Correlation

7. **Send normalized findings to an LLM for cross-source correlation analysis:**

   ```bash
   cat > /tmp/osint/correlate.py << 'PYEOF'
   import json, os
   from openai import OpenAI  # or anthropic, ollama, etc.

   client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

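   with open("/tmp/osint/normalized-findings.json") as f:
       findings = json.load(f)

   # Cap the prompt size; findings beyond this cap are NOT analyzed, so batch
   # large datasets across several calls instead of silently dropping them.
   MAX_FINDINGS = 500
   if len(findings) > MAX_FINDINGS:
       print(f"Warning: sending only {MAX_FINDINGS} of {len(findings)} findings")

   correlation_prompt = f"""You are an OSINT analyst. Analyze these findings collected
   from multiple sources and produce a correlation report.

   For each identity or entity you detect:
   1. List all linked accounts/profiles with the evidence connecting them.
   2. Assign a confidence score (0.0-1.0) for each linkage based on:
      - Exact username match across platforms (high)
      - Similar usernames with shared metadata (medium)
      - Same email in breach data and registration (high)
      - Co-occurring infrastructure (IP, domain) (medium)
      - Temporal correlation of account creation dates (low-medium)
   3. Identify contradictions or potential false positives.
   4. Flag high-risk exposures (breached credentials, PII leaks, infrastructure overlaps).
   5. Return a single JSON object with a top-level "entities" array. Give each entity
      "identifier" (string), "confidence" (number, 0.0-1.0), "linked_accounts"
      (array of strings), "evidence" (array of strings), and "risk_level"
      ("low", "medium", or "high"). Put contradictions and high-risk exposures in
      top-level "contradictions" and "high_risk_exposures" arrays.

   Raw findings:
   {json.dumps(findings[:MAX_FINDINGS], indent=2)}
   """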

   response = client.chat.completions.create(
       model="gpt-4o",
       messages=[
           {"role": "system", "content": "You are an expert OSINT analyst specializing in identity correlation and link analysis."},
           {"role": "user", "content": correlation_prompt}
       ],
       temperature=0.1,
       response_format={"type": "json_object"}
   )

   report = json.loads(response.choices[0].message.content)

   with open("/tmp/osint/correlation-report.json", "w") as f:
       json.dump(report, f, indent=2)

   print(json.dumps(report, indent=2))
   PYEOF
   python3 /tmp/osint/correlate.py
   ```

8. **Review entity resolution: summarize the identities the LLM merged in step 7:**

   ```bash
   cat > /tmp/osint/resolve.py << 'PYEOF'
   import json

   with open("/tmp/osint/correlation-report.json", encoding="utf-8") as f:
       report = json.load(f)

   # Summarize the entities the LLM resolved, highest confidence first
   entities = report.get("entities", [])
   print(f"Identified {len(entities)} distinct entities")

   def confidence_of(entity):
       # LLM output may use strings or omit the field; coerce defensively
       try:
           return float(entity.get("confidence", 0))
       except (TypeError, ValueError):
           return 0.0

   for entity in sorted(entities, key=confidence_of, reverse=True):
       name = entity.get("identifier", "unknown")
       links = entity.get("linked_accounts", [])
       risk = entity.get("risk_level", "unknown")
       print(f"  [{confidence_of(entity):.0%}] {name}: {len(links)} linked accounts, risk: {risk}")
   PYEOF
   python3 /tmp/osint/resolve.py
   ```
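
`resolve.py` only reports what the LLM returned; it does not merge anything itself. When `correlate.py` runs over several batches, or the model emits two entities that share an account, a deterministic pass can merge them. A minimal sketch using union-find (connected components) over case-folded identifiers and accounts, assuming each entity carries the `identifier`, `confidence`, and string-valued `linked_accounts` keys that `resolve.py` reads; `merge_entities` is a hypothetical helper, not part of the original skill:

```python
def merge_entities(entities):
    """Group entities that share an identifier or linked account (connected components)."""
    parent = list(range(len(entities)))

    def find(i):
        # Walk to the root, halving the path as we go
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # normalized identifier/account -> index of the first entity using it
    for idx, entity in enumerate(entities):
        keys = [entity.get("identifier", "")] + list(entity.get("linked_accounts", []))
        for key in keys:
            key = str(key).strip().lower()
            if not key:
                continue
            if key in first_seen:
                union(first_seen[key], idx)
            else:
                first_seen[key] = idx

    groups = {}
    for idx, entity in enumerate(entities):
        groups.setdefault(find(idx), []).append(entity)

    merged = []
    for members in groups.values():
        accounts = []
        for member in members:
            for account in member.get("linked_accounts", []):
                if account not in accounts:
                    accounts.append(account)
        merged.append({
            "identifier": members[0].get("identifier", "unknown"),
            "aliases": [m.get("identifier", "") for m in members[1:]],
            "linked_accounts": accounts,
            # Conservative choice: a merged entity is only as certain as its weakest member
            "confidence": min(float(m.get("confidence", 0) or 0) for m in members),
        })
    return merged
```

Taking the minimum confidence is a deliberately cautious design choice; a merged profile should be re-reviewed by an analyst rather than trusted at the higher score.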

### Phase 4 — Reporting and Visualization

9. **Generate a final intelligence profile in Markdown:**

   ```bash
   cat > /tmp/osint/report.py << 'PYEOF'
   import json
   from datetime import datetime

   with open("/tmp/
