Claude
Skills
Sign in
Back

exploiting-excessive-data-exposure-in-api

Included with Lifetime
$97 forever

Tests APIs for excessive data exposure where endpoints return more data than the client application needs, relying on the frontend to filter sensitive fields. The tester intercepts API responses and analyzes them for leaked PII, internal identifiers, debug information, or sensitive business data that the UI does not display but the API transmits. This maps to OWASP API3:2023 Broken Object Property Level Authorization. Activates for requests involving API data leakage testing, excessive data exposure, response filtering bypass, or API over-fetching.

Designapi-securityowaspdata-exposurerest-securitypii-leakagescripts

What this skill does

# Exploiting Excessive Data Exposure in API

## When to Use

- Testing APIs where the frontend displays a subset of data but the API response includes additional fields
- Assessing mobile application APIs where responses are designed for multiple client types and may contain excess data
- Identifying PII leakage in API responses that include email addresses, phone numbers, SSNs, or payment data not shown in the UI
- Testing GraphQL APIs where clients can request arbitrary fields including sensitive attributes
- Evaluating APIs after microservice refactoring where internal service-to-service data leaks into public endpoints

**Do not use** without written authorization. Data exposure testing involves capturing and analyzing potentially sensitive personal data.

## Prerequisites

- Written authorization specifying target API endpoints and scope
- Burp Suite Professional or mitmproxy configured as intercepting proxy
- Two test accounts at different privilege levels (regular user and admin)
- Browser developer tools or mobile proxy setup for traffic capture
- Python 3.10+ with `requests` and `json` libraries
- API documentation (OpenAPI spec) for comparison against actual responses


> **Legal Notice:** This skill is for authorized security testing and educational purposes only. Unauthorized use against systems you do not own or have written permission to test is illegal and may violate computer fraud laws.

## Workflow

### Step 1: Response Schema Discovery

Compare documented API responses with actual responses:

```python
import requests
import json

BASE_URL = "https://target-api.example.com/api/v1"
headers = {"Authorization": "Bearer <user_token>", "Content-Type": "application/json"}

# Fetch a resource and analyze all returned fields
endpoints_to_test = [
    ("GET", "/users/me", None),
    ("GET", "/users/me/orders", None),
    ("GET", "/products", None),
    ("GET", "/users/me/settings", None),
    ("GET", "/transactions", None),
]

for method, path, body in endpoints_to_test:
    resp = requests.request(method, f"{BASE_URL}{path}", headers=headers, json=body)
    if resp.status_code == 200:
        data = resp.json()
        # Recursively extract all field names
        def extract_fields(obj, prefix=""):
            fields = []
            if isinstance(obj, dict):
                for k, v in obj.items():
                    full_key = f"{prefix}.{k}" if prefix else k
                    fields.append(full_key)
                    fields.extend(extract_fields(v, full_key))
            elif isinstance(obj, list) and obj:
                fields.extend(extract_fields(obj[0], f"{prefix}[]"))
            return fields

        all_fields = extract_fields(data)
        print(f"\n{method} {path} - {len(all_fields)} fields returned:")
        for f in sorted(all_fields):
            print(f"  {f}")
```

### Step 2: Sensitive Data Pattern Detection

Scan API responses for sensitive data patterns:

```python
import re

SENSITIVE_PATTERNS = {
    "email": r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}',
    "phone": r'(\+?1?\s?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4})',
    "ssn": r'\b\d{3}-\d{2}-\d{4}\b',
    "credit_card": r'\b(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|3[47][0-9]{13})\b',
    "password_hash": r'\$2[aby]?\$\d{2}\$[./A-Za-z0-9]{53}',
    "api_key": r'(?:api[_-]?key|apikey)["\s:=]+["\']?([a-zA-Z0-9_\-]{20,})',
    "internal_ip": r'\b(?:10\.\d{1,3}|172\.(?:1[6-9]|2\d|3[01])|192\.168)\.\d{1,3}\.\d{1,3}\b',
    "aws_key": r'AKIA[0-9A-Z]{16}',
    "jwt_token": r'eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+',
    "uuid": r'[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}',
}

SENSITIVE_FIELD_NAMES = [
    "password", "password_hash", "secret", "token", "ssn", "social_security",
    "credit_card", "card_number", "cvv", "pin", "private_key", "api_key",
    "internal_id", "debug", "trace", "stack_trace", "created_by_ip",
    "last_login_ip", "salt", "session_id", "refresh_token", "mfa_secret",
    "date_of_birth", "bank_account", "routing_number", "tax_id"
]

def scan_response(endpoint, response_text):
    findings = []
    # Check for sensitive data patterns in values
    for pattern_name, pattern in SENSITIVE_PATTERNS.items():
        matches = re.findall(pattern, response_text)
        if matches:
            findings.append({
                "endpoint": endpoint,
                "type": "sensitive_value",
                "pattern": pattern_name,
                "count": len(matches),
                "sample": matches[0][:20] + "..." if len(matches[0]) > 20 else matches[0]
            })

    # Check for sensitive field names
    response_lower = response_text.lower()
    for field in SENSITIVE_FIELD_NAMES:
        if f'"{field}"' in response_lower or f"'{field}'" in response_lower:
            findings.append({
                "endpoint": endpoint,
                "type": "sensitive_field",
                "field_name": field
            })

    return findings

# Scan all endpoint responses
for method, path, body in endpoints_to_test:
    resp = requests.request(method, f"{BASE_URL}{path}", headers=headers, json=body)
    if resp.status_code == 200:
        findings = scan_response(f"{method} {path}", resp.text)
        for f in findings:
            print(f"[FINDING] {f['endpoint']}: {f['type']} - {f.get('pattern', f.get('field_name'))}")
```

### Step 3: Compare UI Display vs API Response

```python
# Fields the UI shows (observed from the frontend application)
ui_displayed_fields = {
    "/users/me": {"name", "email", "avatar_url", "role"},
    "/users/me/orders": {"order_id", "date", "status", "total"},
    "/products": {"id", "name", "price", "image_url", "description"},
}

# Fields the API actually returns
for method, path, body in endpoints_to_test:
    resp = requests.request(method, f"{BASE_URL}{path}", headers=headers, json=body)
    if resp.status_code == 200:
        data = resp.json()
        if isinstance(data, list):
            actual_fields = set(data[0].keys()) if data else set()
        elif isinstance(data, dict):
            # Handle paginated responses
            items_key = next((k for k in data if isinstance(data[k], list)), None)
            if items_key and data[items_key]:
                actual_fields = set(data[items_key][0].keys())
            else:
                actual_fields = set(data.keys())
        else:
            continue

        expected = ui_displayed_fields.get(path, set())
        excess = actual_fields - expected
        if excess:
            print(f"\n{method} {path} - EXCESS FIELDS (not shown in UI):")
            for field in sorted(excess):
                print(f"  - {field}")
```

### Step 4: Test User Object Exposure in Related Endpoints

```python
# Many APIs embed full user objects in responses for orders, comments, etc.
endpoints_with_user_objects = [
    "/orders",          # Each order may include full seller/buyer profile
    "/comments",        # Comments may include full author profile
    "/reviews",         # Reviews may expose reviewer details
    "/transactions",    # Transactions may include counterparty info
    "/team/members",    # Team listing may expose excessive member data
]

for path in endpoints_with_user_objects:
    resp = requests.get(f"{BASE_URL}{path}", headers=headers)
    if resp.status_code == 200:
        text = resp.text
        # Check for user data leakage in nested objects
        user_fields_found = []
        for field in ["password_hash", "last_login_ip", "mfa_enabled", "phone_number",
                      "date_of_birth", "ssn", "internal_notes", "salary", "address"]:
            if f'"{field}"' in text:
                user_fields_found.append(field)
        if user_fields_found:
            print(f"[EXCESSIVE] {path} exposes user fields: {user_fields_found}")
```

### Step 5: GraphQL Over-Fetching Analysis

```python
# GraphQL allows clients to request any available field
GRAPHQL_URL = f"{BASE_URL}/graphql"

# Introspection quer

Related in Design