performing-malware-hash-enrichment-with-virustotal

Enrich malware file hashes using the VirusTotal API to retrieve detection rates, behavioral analysis, YARA matches, and contextual threat intelligence for incident triage and IOC validation.

What this skill does

# Performing Malware Hash Enrichment with VirusTotal

## Overview

VirusTotal is the world's largest crowdsourced malware corpus, scanning files with 70+ antivirus engines and providing behavioral analysis, YARA rule matches, network indicators, and community intelligence. This skill covers using the VirusTotal API v3 to enrich file hashes (MD5, SHA-1, SHA-256) with detection verdicts, sandbox reports, related indicators, and contextual intelligence for SOC triage, incident response, and threat intelligence enrichment workflows.


## When to Use

- When triaging an alert or EDR detection that includes a file hash and you need a quick detection verdict
- When validating hash IOCs (MD5, SHA-1, or SHA-256) before they are added to blocklists or shared as threat intelligence
- When an incident response investigation needs sandbox behavior, related indicators, or other context for a suspicious file
- When enriching batches of hashes as part of a threat intelligence workflow

## Prerequisites

- Python 3.9+ with `vt-py` (official VirusTotal Python client) or `requests`
- VirusTotal API key (free tier: 4 requests/minute, 500/day; premium for higher limits)
- Understanding of file hash types: MD5, SHA-1, SHA-256
- Familiarity with AV detection naming conventions
- STIX 2.1 knowledge for IOC representation
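
As a minimal sketch of the last three prerequisites, the hash type can be inferred from the length of the hex digest (32, 40, or 64 characters) and the hash expressed as a STIX 2.1 pattern string. The helper names `identify_hash_type` and `stix_file_hash_pattern` are hypothetical and not part of any library.

```python
import re

# Hex digest lengths for the three hash types listed above.
_HASH_LENGTHS = {32: "MD5", 40: "SHA-1", 64: "SHA-256"}
_HEX_RE = re.compile(r"[0-9a-fA-F]+")


def identify_hash_type(value):
    """Return 'MD5', 'SHA-1', or 'SHA-256', or None if value is not such a digest."""
    candidate = value.strip()
    if not _HEX_RE.fullmatch(candidate):
        return None
    return _HASH_LENGTHS.get(len(candidate))


def stix_file_hash_pattern(value):
    """Build a STIX 2.1 pattern string for a file hash (hypothetical helper)."""
    hash_type = identify_hash_type(value)
    if hash_type is None:
        raise ValueError(f"not an MD5, SHA-1, or SHA-256 digest: {value!r}")
    # Hash names containing a hyphen must be quoted in STIX patterns;
    # quoting all three keeps the output uniform.
    return f"[file:hashes.'{hash_type}' = '{value.strip().lower()}']"
```

Validating the hash before calling the API also avoids spending quota on malformed input.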

## Key Concepts

### VirusTotal API v3

The API provides RESTful endpoints for file reports (`/files/{hash}`), URL scanning, domain reports, IP address intelligence, and advanced hunting with VirusTotal Intelligence (VTI). Each file report includes detection results from 70+ AV engines, behavioral analysis from sandboxes, YARA and Sigma rule matches, file metadata (PE headers, imports, sections), network indicators (contacted IPs, domains, URLs), and community votes and comments.
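
For readers who prefer not to depend on `vt-py`, the file report endpoint can also be called directly. The sketch below uses only the standard library rather than `requests`, and assumes the public base URL `https://www.virustotal.com/api/v3` with the API key sent in the `x-apikey` header. The function names are illustrative.

```python
import json
import urllib.request

VT_API_BASE = "https://www.virustotal.com/api/v3"


def build_file_report_request(file_hash, api_key):
    """Build the GET request for a file report without sending it."""
    return urllib.request.Request(
        f"{VT_API_BASE}/files/{file_hash}",
        headers={"x-apikey": api_key, "accept": "application/json"},
        method="GET",
    )


def fetch_file_attributes(file_hash, api_key, timeout=30):
    """Send the request and return the report's data.attributes dict."""
    request = build_file_report_request(file_hash, api_key)
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses,
    # for example 404 when VirusTotal has never seen the hash.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return payload["data"]["attributes"]


# Example (needs a real key and network access):
# attrs = fetch_file_attributes("<sha256>", "YOUR_VT_API_KEY")
# print(attrs["last_analysis_stats"])
```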

### Hash Enrichment Workflow

The typical enrichment flow is: receive a hash from an alert or EDR -> query the VT API -> parse the detection ratio -> extract behavioral indicators -> correlate with existing intelligence -> make a triage decision. The API returns a `last_analysis_stats` object with `malicious`, `suspicious`, `undetected`, and `harmless` counts, alongside counts for engines that produced no verdict (such as `timeout`, `failure`, and `type-unsupported`).
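
The "parse detection ratio" and "make triage decision" steps can be sketched as a pure function over the stats dictionary. The thresholds and decision labels below are illustrative assumptions to be tuned per environment, not values defined by VirusTotal. The denominator counts only engines that returned a verdict, so categories such as timeouts or unsupported file types do not dilute the ratio.

```python
def triage_from_stats(stats, escalate_at=0.4, investigate_at=0.1):
    """Map a last_analysis_stats dict to a ratio and a decision (illustrative thresholds)."""
    malicious = stats.get("malicious", 0)
    suspicious = stats.get("suspicious", 0)
    # Engines that actually produced a verdict for this file.
    responded = (
        malicious
        + suspicious
        + stats.get("undetected", 0)
        + stats.get("harmless", 0)
    )
    if responded == 0:
        return {"detection_ratio": "0/0", "decision": "no_verdict"}

    share = malicious / responded
    if share >= escalate_at:
        decision = "escalate"
    elif share >= investigate_at:
        decision = "investigate"
    elif malicious > 0 or suspicious > 0:
        decision = "review"
    else:
        decision = "no_detections"
    return {"detection_ratio": f"{malicious}/{responded}", "decision": decision}
```

A result of `no_detections` is not proof that a file is benign; a new or targeted sample may simply not be detected yet.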

### Pivoting from Hashes

VirusTotal enables pivoting from a single hash to related intelligence: similar files (ITW/in-the-wild samples), contacted domains and IPs (C2 infrastructure), dropped files, embedded URLs, YARA rule matches, and threat actor attribution through crowdsourced intelligence.
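
One way to structure pivoting is to walk a set of relationships for a hash and collect the identifiers each one returns. The sketch below assumes relationship endpoints of the form `/files/{hash}/{relationship}` and takes the fetching step as a caller-supplied function, so the collection logic can be exercised without network access. `collect_pivots` and `fetch_ids` are hypothetical names, and some relationships may require a premium API key.

```python
def collect_pivots(fetch_ids, file_hash,
                   relationships=("contacted_domains", "contacted_ips", "dropped_files")):
    """Collect de-duplicated identifiers for each relationship of a file hash.

    fetch_ids is any callable that takes an API path and returns an
    iterable of identifier strings (domains, IPs, hashes).
    """
    pivots = {}
    for relationship in relationships:
        path = f"/files/{file_hash}/{relationship}"
        # dict.fromkeys removes duplicates while keeping first-seen order.
        pivots[relationship] = list(dict.fromkeys(fetch_ids(path)))
    return pivots


# With vt-py, fetch_ids could be wired up roughly like this (assumption):
# client = vt.Client("YOUR_VT_API_KEY")
# fetch = lambda path: (obj.id for obj in client.iterator(path, limit=20))
# pivots = collect_pivots(fetch, "<sha256>")
```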

## Workflow

### Step 1: Query VirusTotal for Hash Report

```python
import json

import vt

class VTEnricher:
    def __init__(self, api_key):
        self.client = vt.Client(api_key)

    def enrich_hash(self, file_hash):
        """Enrich a file hash with VirusTotal intelligence."""
        try:
            file_obj = self.client.get_object(f"/files/{file_hash}")
            stats = file_obj.last_analysis_stats
            report = {
                "hash": file_hash,
                "sha256": file_obj.sha256,
                "sha1": file_obj.sha1,
                "md5": file_obj.md5,
                "file_type": getattr(file_obj, "type_description", "Unknown"),
                "file_size": getattr(file_obj, "size", 0),
                "first_submission": str(getattr(file_obj, "first_submission_date", "")),
                "last_analysis_date": str(getattr(file_obj, "last_analysis_date", "")),
                "detection_stats": {
                    "malicious": stats.get("malicious", 0),
                    "suspicious": stats.get("suspicious", 0),
                    "undetected": stats.get("undetected", 0),
                    "harmless": stats.get("harmless", 0),
                },
                "detection_ratio": f"{stats.get('malicious', 0)}/{sum(stats.values())}",
                "popular_threat_names": getattr(file_obj, "popular_threat_classification", {}),
                "tags": getattr(file_obj, "tags", []),
                "names": getattr(file_obj, "names", []),
            }
            total_engines = sum(stats.values())
            mal_count = stats.get("malicious", 0)
            report["threat_level"] = (
                "critical" if mal_count > total_engines * 0.7
                else "high" if mal_count > total_engines * 0.4
                else "medium" if mal_count > total_engines * 0.1
                else "low" if mal_count > 0
                else "clean"
            )
            print(f"[+] {file_hash[:16]}... -> {report['detection_ratio']} "
                  f"({report['threat_level'].upper()})")
            return report
        except vt.error.APIError as e:
            print(f"[-] VT API error for {file_hash}: {e}")
            return None

    def get_behavior_report(self, file_hash):
        """Get sandbox behavioral analysis for a file."""
        behavior_data = {
            "processes_created": [],
            "files_written": [],
            "registry_keys_set": [],
            "dns_lookups": [],
            "http_conversations": [],
            "mutexes_created": [],
            "commands_executed": [],
        }
        try:
            # /behaviours is a collection (one object per sandbox run), so
            # iterate over it instead of fetching it with get_object().
            for sandbox in self.client.iterator(f"/files/{file_hash}/behaviours"):
                behavior_data["processes_created"].extend(
                    sandbox.get("processes_created", []))
                # Entries are normally plain path strings; accept dicts defensively.
                behavior_data["files_written"].extend(
                    [f if isinstance(f, str) else f.get("path", "")
                     for f in sandbox.get("files_written", [])])
                behavior_data["registry_keys_set"].extend(
                    [r.get("key", "") for r in sandbox.get("registry_keys_set", [])])
                behavior_data["dns_lookups"].extend(
                    [d.get("hostname", "") for d in sandbox.get("dns_lookups", [])])
                behavior_data["http_conversations"].extend(
                    [h.get("url", "") for h in sandbox.get("http_conversations", [])])
                behavior_data["mutexes_created"].extend(
                    sandbox.get("mutexes_created", []))
                behavior_data["commands_executed"].extend(
                    sandbox.get("command_executions", []))
            return behavior_data
        except vt.error.APIError as e:
            print(f"[-] Behavior report error: {e}")
            return {}

    def close(self):
        self.client.close()

# Usage
enricher = VTEnricher("YOUR_VT_API_KEY")
report = enricher.enrich_hash("275a021bbfb6489e54d471899f7db9d1663fc695ec2fe2a2c4538aabf651fd0f")
print(json.dumps(report, indent=2, default=str))
enricher.close()
```
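
`enrich_hash` returns `None` both when VirusTotal has never seen a hash and when the call fails for another reason, yet the two cases call for different triage handling. A minimal sketch of separating them is shown below. It assumes the `code` attribute of `vt.error.APIError` carries the API's error code strings (for example `NotFoundError` or `QuotaExceededError`); the `classify_vt_error` helper and its return labels are hypothetical.

```python
# Error codes grouped by how a triage script might react (assumed groupings).
_RETRY_LATER = {
    "QuotaExceededError",
    "TooManyRequestsError",
    "TransientError",
    "DeadlineExceededError",
}
_CHECK_CREDENTIALS = {
    "AuthenticationRequiredError",
    "WrongCredentialsError",
    "ForbiddenError",
    "UserNotActiveError",
}


def classify_vt_error(code):
    """Map a VirusTotal API error code string to a handling hint."""
    if code == "NotFoundError":
        # The hash is unknown to VirusTotal: a finding in itself, not a failure.
        return "unknown_to_vt"
    if code in _RETRY_LATER:
        return "retry_later"
    if code in _CHECK_CREDENTIALS:
        return "check_api_key"
    return "error"


# Inside an except block this might be used as:
# except vt.error.APIError as e:
#     outcome = classify_vt_error(e.code)
```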

### Step 2: Batch Hash Enrichment with Rate Limiting

```python
import time
import csv

def batch_enrich(api_key, hash_file, output_file, rate_limit=4):
    """Enrich a list of hashes from a file with rate limiting."""
    enricher = VTEnricher(api_key)
    results = []

    with open(hash_file, "r") as f:
        hashes = [line.strip() for line in f if line.strip()]

    print(f"[*] Enriching {len(hashes)} hashes (rate: {rate_limit}/min)")
    for i, file_hash in enumerate(hashes):
        report = enricher.enrich_hash(file_hash)
        if report:
            results.append(report)
        # Pause after each full batch, but not after the final hash
        if (i + 1) % rate_limit == 0 and (i + 1) < len(hashes):
            print(f"  [{i+1}/{len(hashes)}] Rate limit pause (60s)...")
            time.sleep(60)

    # Export to CSV
    with open(output_file, "w", newline="") as f:
        if results:
            writer = csv.DictWriter(f, fieldnames=results[0].keys())
            writer.writeheader()
            for r in results:
                flat = {k: str(v) for k, v in r.items()}
                writer.writerow(flat)

    print(f"[+] Enrichment complete: {len(results)}/{len(hashes)} hashes")
    print(f"[+] Results saved to {output_file}")
    enricher.close()
    return results

batch_enrich("YOUR_API_KEY", "hashes.txt", "enrichment_results.csv")
```
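
The fixed 60-second pause in `batch_enrich` is simple but can wait longer than necessary when requests are slow. An alternative is a sliding-window limiter that sleeps only as long as needed to stay under the per-minute limit. This is a sketch under the free-tier assumption of 4 requests per minute. The class name is hypothetical, and the injectable `clock` and `sleep` parameters exist only to make it testable.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any rolling window of `period` seconds."""

    def __init__(self, max_calls=4, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self._max_calls = max_calls
        self._period = period
        self._clock = clock
        self._sleep = sleep
        self._calls = deque()  # timestamps of recent calls, oldest first

    def wait(self):
        """Block until another call is allowed, then record it."""
        while True:
            now = self._clock()
            # Forget calls that have left the rolling window.
            while self._calls and now - self._calls[0] >= self._period:
                self._calls.popleft()
            if len(self._calls) < self._max_calls:
                self._calls.append(now)
                return
            # Sleep until the oldest recorded call expires, then re-check.
            self._sleep(self._period - (now - self._calls[0]))


# Usage sketch: call limiter.wait() immediately before each API request.
# limiter = SlidingWindowLimiter(max_calls=4, period=60.0)
# for file_hash in hashes:
#     limiter.wait()
#     report = enricher.enrich_hash(file_hash)
```

The daily quota is not handled here and would need a separate counter.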

### Step 3: Extract Network Indicators for Pivoting

```python
def extract_network_iocs(api_key, file_hash):
    """Extract network-based IOCs from VT for C2 identification."""
    client = vt.Client(api_key)
    network_iocs = {
        "contacted_ips": [],
        "contacted_domains": [],
        "contacted_urls": [],
        "embedded_urls": [],
    }

    try:
        # Get contacted IPs
        it = client.iterator(f"/file

```