# Detecting DNS Exfiltration with DNS Query Analysis
Detect data exfiltration through DNS tunneling by analyzing query entropy, subdomain length, query volume, TXT record abuse, and response payload sizes using passive DNS monitoring.
## Overview
DNS exfiltration exploits the Domain Name System as a covert channel to extract data from compromised networks. Attackers encode stolen data into DNS query names (subdomains) or DNS response records (TXT, CNAME, NULL). This bypasses security controls in networks that allow outbound DNS with little or no inspection. Tools such as iodine (IP-over-DNS), dnscat2, and dns2tcp build full bidirectional tunnels over DNS. Detection requires analyzing DNS query patterns for anomalies, including unusually long query names, high-entropy subdomain strings, abnormal query volumes to a single domain, and oversized TXT record responses. This skill covers building a DNS exfiltration detection capability using passive DNS analysis, statistical methods, and machine learning approaches.
## When to Use
- When investigating security incidents that may involve data exfiltration or command-and-control over DNS
- When building detection rules or threat-hunting queries for DNS tunneling
- When SOC analysts need a structured procedure for DNS query analysis
- When validating security monitoring coverage for related attack techniques
## Prerequisites
- Access to DNS query logs (passive DNS capture, DNS server logs, or PCAP)
- Zeek, Suricata, or tcpdump for DNS traffic capture
- Python 3.8+ with scipy, numpy, pandas, and scikit-learn
- SIEM platform for alert correlation
- Baseline of normal DNS traffic patterns for the environment
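What counts as a baseline will vary by environment. One hypothetical starting point is shown below. The sketch summarizes per-minute query counts for each base domain over a period believed to be clean. It then flags a later minute whose count exceeds mean + k·stdev. The function names are illustrative assumptions, not tuned values, as are the default `k=3.0`, the `min_sigma=1.0` floor, and the `default_limit=100` fallback for never-seen domains.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev
from typing import Dict, Iterable, Tuple

Baseline = Dict[str, Tuple[float, float]]


def build_baseline(events: Iterable[Tuple[float, str]]) -> Baseline:
    """Summarize per-minute query counts for each base domain.

    `events` holds (epoch_seconds, base_domain) pairs from a known-clean period.
    Only minutes in which a domain was queried are counted, so the result
    describes the domain's usual burst size rather than its long-run average.
    """
    per_minute = Counter()
    for ts, domain in events:
        per_minute[(domain, int(ts // 60))] += 1

    series = defaultdict(list)
    for (domain, _minute), count in per_minute.items():
        series[domain].append(count)
    return {d: (float(mean(c)), float(pstdev(c))) for d, c in series.items()}


def exceeds_baseline(domain: str, count: int, baseline: Baseline,
                     k: float = 3.0, min_sigma: float = 1.0,
                     default_limit: int = 100) -> bool:
    """Return True if a per-minute count is above mean + k * stdev.

    The stdev is floored at min_sigma so a perfectly steady baseline does not
    flag every small increase; unseen domains fall back to default_limit.
    """
    if domain not in baseline:
        return count > default_limit
    mu, sigma = baseline[domain]
    return count > mu + k * max(sigma, min_sigma)
```

Per-minute counts are a coarse signal. Slow, low-volume tunnels can stay under any rate threshold, which is why the other indicators below matter.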
## Core Concepts
### DNS Tunneling Mechanics
DNS exfiltration encodes data in different parts of DNS messages:
**Outbound (Query-based exfiltration):**
```
Encoded data as subdomain labels:
dGhlIHNlY3JldCBkYXRh.exfil.attacker.com
[base64-encoded data].[tunnel domain]
Query types used: A, AAAA, CNAME, MX, TXT, NULL
```
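A hypothetical sketch makes the query-based layout concrete. It shows how a tunnel client might split data into query names and how the server side reverses the process. It assumes the following:

- Encoding is lowercase base32 without padding. This is a common choice because resolvers may not preserve letter case, which would corrupt base64.
- A sequence-number label sits just before the tunnel domain.
- Names respect the RFC 1035 limits: 63 octets per label and 253 characters for the dotted name (255 octets on the wire).

Real tools use their own framing; the function names and chunk size here are illustrative.

```python
import base64
from typing import List, Tuple

MAX_LABEL = 63   # max octets per label (RFC 1035)
MAX_NAME = 253   # max dotted-name length (255 octets on the wire)


def encode_query_names(data: bytes, tunnel_domain: str, chunk_bytes: int = 100) -> List[str]:
    """Split data into query names of the form <b32 labels>.<seq>.<tunnel_domain>."""
    names = []
    for seq, start in enumerate(range(0, len(data), chunk_bytes)):
        chunk = data[start:start + chunk_bytes]
        # Lowercase base32 without '=' padding survives case-insensitive handling
        encoded = base64.b32encode(chunk).decode('ascii').rstrip('=').lower()
        labels = [encoded[i:i + MAX_LABEL] for i in range(0, len(encoded), MAX_LABEL)]
        name = '.'.join(labels + [str(seq), tunnel_domain])
        if len(name) > MAX_NAME:
            raise ValueError('chunk_bytes too large for a single query name')
        names.append(name)
    return names


def decode_query_name(name: str, tunnel_domain: str) -> Tuple[int, bytes]:
    """Reverse encode_query_names for one name: return (sequence, chunk)."""
    prefix = name[:-(len(tunnel_domain) + 1)]  # drop ".<tunnel_domain>"
    *data_labels, seq = prefix.split('.')
    encoded = ''.join(data_labels).upper()
    encoded += '=' * (-len(encoded) % 8)  # restore base32 padding
    return int(seq), base64.b32decode(encoded)
```

With the default chunk size, each name carries 100 bytes of data as 160 base32 characters. This is why the detection table below treats long, high-entropy subdomains as a signal.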
**Inbound (Response-based command channel):**
```
TXT records carry encoded commands/data in responses
CNAME records chain encoded data through multiple labels
NULL records carry arbitrary binary data
```
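On the response side, TXT RDATA is a sequence of character-strings (RFC 1035). Each string is a one-byte length followed by at most 255 bytes. The hypothetical sketch below shows how a tunnel server could pack a base64-encoded payload into TXT RDATA, which illustrates why tunneling responses run to hundreds or thousands of bytes. The function name and the choice of base64 are assumptions.

```python
import base64


def pack_txt_rdata(payload: bytes) -> bytes:
    """Base64-encode payload and pack it as TXT RDATA (length-prefixed strings)."""
    text = base64.b64encode(payload)
    if not text:
        return b'\x00'  # TXT RDATA needs at least one (possibly empty) string
    rdata = bytearray()
    for i in range(0, len(text), 255):
        piece = text[i:i + 255]
        rdata.append(len(piece))  # one-byte length prefix
        rdata += piece
    return bytes(rdata)
```

A 1,000-byte payload becomes 1,336 base64 characters split across six strings, or 1,342 bytes of RDATA. That is far above the sub-100-byte TXT answers listed as normal in the table below.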
### Detection Indicators
| Indicator | Normal DNS | DNS Tunneling |
|-----------|-----------|---------------|
| Subdomain length | 5-20 chars | 40-253 chars |
| Label count | 2-4 labels | 5-10+ labels |
| Shannon entropy (bits/char) | 2.5-3.5 | 4.0-5.5 |
| Query volume (per domain) | Variable | 100s-1000s/min |
| TXT response size | < 100 bytes | 200-4000+ bytes |
| Unique subdomains | Low | Very high |
| Query type distribution | Mostly A/AAAA | Heavy TXT, NULL, CNAME |
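The per-query indicators in the table can be computed directly from a query name. The sketch below is a simplified version under stated assumptions:

- It treats the last two labels as the registered domain, the same naive split used in Step 2.
- It computes entropy over the subdomain with dots removed.
- It uses the lower bounds of the "DNS Tunneling" column (40 characters, 5 labels, 4.0 bits per character) as hypothetical cutoffs.

```python
import math
from collections import Counter
from typing import Dict, List


def shannon_entropy(s: str) -> float:
    """Shannon entropy of s in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


def query_features(qname: str) -> Dict[str, float]:
    """Length, label, and entropy features for one query name."""
    labels = qname.rstrip('.').split('.')
    sub_labels = labels[:-2]  # everything left of the last two labels
    return {
        'subdomain_length': len('.'.join(sub_labels)),
        'label_count': len(labels),
        'longest_label': max(len(label) for label in labels),
        'entropy': shannon_entropy(''.join(sub_labels)),  # dots excluded
    }


def tunnel_indicators(f: Dict[str, float]) -> List[str]:
    """Return the names of the table's tunneling indicators that f exceeds."""
    hits = []
    if f['subdomain_length'] >= 40:
        hits.append('long_subdomain')
    if f['label_count'] >= 5:
        hits.append('many_labels')
    if f['entropy'] >= 4.0:
        hits.append('high_entropy')
    return hits
```

Single-query features are noisy. Some legitimate services, such as certain CDN or security-vendor lookups, also produce long, random-looking names. For that reason, the detector in Step 2 aggregates these features per base domain.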
### Common Tunneling Tools
| Tool | Protocol | Encoding | Detection Difficulty |
|------|----------|----------|---------------------|
| iodine | IP-over-DNS | Base32/Base64/Raw | Medium |
| dnscat2 | TCP-over-DNS | Hex encoding | Medium |
| dns2tcp | TCP-over-DNS | Base64 | Medium |
| DNSExfiltrator | Custom | Base64 | Low |
| Cobalt Strike DNS | C2 over DNS | Custom encoding | High |
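The encoding column suggests a cheap secondary check: the character set of a long label hints at how it was encoded. The heuristic below is a hypothetical sketch with these rules:

- Alphabets are tested from narrowest (hex) to widest (base64, including the URL-safe `-` and `_`).
- Base32 requires a single letter case, since RFC 4648 base32 uses one case.
- The 16-character minimum mirrors `is_hex_encoded` in Step 2 and is otherwise arbitrary.

Ordinary words also fit the base32 alphabet, so the result is only meaningful alongside the entropy and length checks.

```python
import re

HEX_RE = re.compile(r'[0-9a-f]+', re.IGNORECASE)
BASE32_RE = re.compile(r'[a-z2-7]+=*', re.IGNORECASE)
BASE64_RE = re.compile(r'[A-Za-z0-9+/_-]+=*')


def guess_encoding(label: str, min_length: int = 16) -> str:
    """Guess the encoding of one DNS label from its character set."""
    if len(label) < min_length:
        return 'too_short'
    if HEX_RE.fullmatch(label) and len(label) % 2 == 0:
        return 'hex'
    # Base32 output is single-case; mixed case points to base64 instead
    single_case = label == label.lower() or label == label.upper()
    if single_case and BASE32_RE.fullmatch(label):
        return 'base32'
    if BASE64_RE.fullmatch(label):
        return 'base64'
    return 'unknown'
```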
## Workflow
### Step 1: Capture DNS Traffic
**Using Zeek:**
```bash
# Live capture (-C ignores bad checksums, common with NIC checksum offloading)
zeek -i eth0 -C base/protocols/dns
# Offline PCAP analysis
zeek -r traffic.pcap base/protocols/dns
# Output: dns.log with fields such as query, qtype_name, answers, and TTLs
```
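Zeek's default ASCII logs are tab-separated. Header lines prefixed with `#` declare the separator, the unset-field marker, and the column names. A minimal stdlib reader could look like the sketch below. It assumes the default ASCII writer rather than JSON output, and `read_zeek_log` is a hypothetical helper name.

```python
from typing import Dict, Iterable, Iterator, Optional


def read_zeek_log(lines: Iterable[str]) -> Iterator[Dict[str, Optional[str]]]:
    """Yield one dict per record from a Zeek ASCII log, keyed by the #fields header."""
    sep, unset, fields = '\t', '-', None
    for raw in lines:
        line = raw.rstrip('\n')
        if not line:
            continue
        if line.startswith('#separator'):
            # The value is escaped, e.g. "\x09"; decode it to the real character
            sep = line.split(' ', 1)[1].encode('ascii').decode('unicode_escape')
        elif line.startswith('#unset_field'):
            unset = line.split(sep, 1)[1]
        elif line.startswith('#fields'):
            fields = line.split(sep)[1:]
        elif line.startswith('#'):
            continue  # #path, #open, #close, #types, ...
        else:
            if fields is None:
                raise ValueError('data line before #fields header')
            values = line.split(sep)
            # Map Zeek's unset marker to None
            yield {k: (None if v == unset else v) for k, v in zip(fields, values)}
```

Each row's `ts`, `id.orig_h`, `query`, and `qtype_name` values can then be passed to `DNSExfiltrationDetector.process_query` from Step 2. The default `dns.log` has no response-size column, so TXT sizing needs another source such as PCAP.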
**Using tcpdump:**
```bash
# Capture all DNS traffic
tcpdump -i eth0 -w dns_capture.pcap port 53
# Capture only packets of 512 bytes or more ('greater' tests total packet length, headers included)
tcpdump -i eth0 -w large_dns.pcap 'port 53 and greater 512'
```
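Inside each captured UDP payload, the question name follows the 12-byte DNS header as a sequence of length-prefixed labels (RFC 1035). The sketch below extracts the first question's name and QTYPE from a raw DNS message. It assumes the Ethernet/IP/UDP headers have already been stripped, for example by a pcap library (not shown). It also rejects compression pointers, which a single-question query does not need. `parse_dns_question` is a hypothetical name.

```python
import struct
from typing import Tuple


def parse_dns_question(msg: bytes) -> Tuple[str, int]:
    """Return (qname, qtype) for the first question in a raw DNS message."""
    if len(msg) < 12:
        raise ValueError('shorter than a DNS header')
    (qdcount,) = struct.unpack('!H', msg[4:6])  # QDCOUNT field of the header
    if qdcount == 0:
        raise ValueError('message has no question')
    labels, pos = [], 12
    while True:
        if pos >= len(msg):
            raise ValueError('truncated question name')
        length = msg[pos]
        if length == 0:  # root label terminates the name
            pos += 1
            break
        if length & 0xC0:
            raise ValueError('compressed or extended label not handled')
        labels.append(msg[pos + 1:pos + 1 + length].decode('ascii', errors='replace'))
        pos += 1 + length
    if pos + 4 > len(msg):
        raise ValueError('truncated question')
    qtype, _qclass = struct.unpack('!HH', msg[pos:pos + 4])
    return '.'.join(labels), qtype
```

QTYPE values of interest include 1 (A), 16 (TXT), and 10 (NULL). The label and name lengths recovered here feed directly into the length indicators in the table above, and `len(msg)` gives the payload size for oversized TXT answers.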
**Using Suricata:**
```yaml
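# In suricata.yaml, enable DNS logging in the EVE JSON output
outputs:
  - eve-log:
      enabled: yes
      types:
        - dns:
            # EVE DNS v2 keys; older releases used query/answer instead
            requests: yes
            responses: yes
            formats: [detailed]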
```
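With EVE DNS logging enabled, each DNS request becomes one JSON line in `eve.json`. The minimal sketch below pulls out the query records. It assumes the version 2 layout, in which request records carry `dns.type == "query"` together with `dns.rrname` and `dns.rrtype`. Field layout differs between EVE DNS versions, so check the Suricata documentation for your release. `iter_eve_dns_queries` is a hypothetical helper.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def iter_eve_dns_queries(lines: Iterable[str]) -> Iterator[Tuple[Optional[str], ...]]:
    """Yield (timestamp, src_ip, rrname, rrtype) for EVE DNS request records."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get('event_type') != 'dns':
            continue  # alerts, flows, etc.
        dns = event.get('dns', {})
        if dns.get('type') != 'query':
            continue  # skip answer records
        yield (event.get('timestamp'), event.get('src_ip'),
               dns.get('rrname'), dns.get('rrtype'))
```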
### Step 2: Analyze Query Characteristics
Python script for DNS exfiltration detection:
```python
#!/usr/bin/env python3
"""DNS Exfiltration Detector - Analyzes DNS logs for tunneling indicators."""
import json
import math
import re
import sys
from collections import defaultdict
from datetime import datetime, timedelta

import pandas as pd


def calculate_entropy(domain: str) -> float:
    """Calculate the Shannon entropy of a string in bits per character."""
    if not domain:
        return 0.0
    freq = defaultdict(int)
    for char in domain:
        freq[char] += 1
    length = len(domain)
    entropy = -sum(
        (count / length) * math.log2(count / length)
        for count in freq.values()
    )
    return entropy


def extract_subdomain(query: str) -> str:
    """Extract the subdomain portion of an FQDN.

    Naive split: treats the last two labels as the registered domain, so
    names under multi-label suffixes such as co.uk are split incorrectly.
    """
    parts = query.rstrip('.').split('.')
    if len(parts) > 2:
        return '.'.join(parts[:-2])
    return ''


def get_base_domain(query: str) -> str:
    """Extract the registered domain (last two labels) from an FQDN."""
    parts = query.rstrip('.').split('.')
    if len(parts) >= 2:
        return '.'.join(parts[-2:])
    return query


def is_base64_like(s: str) -> bool:
    """Check whether a string resembles base64 encoding."""
    b64_chars = set('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=')
    if len(s) < 10:
        return False
    char_ratio = sum(1 for c in s if c in b64_chars) / len(s)
    return char_ratio > 0.9 and calculate_entropy(s) > 4.0


def is_hex_encoded(s: str) -> bool:
    """Check whether a string appears hex-encoded (dots and hyphens ignored)."""
    hex_chars = set('0123456789abcdefABCDEF')
    clean = s.replace('.', '').replace('-', '')
    # Check length after stripping separators so a string of only dots or
    # hyphens is not reported as (empty) hex
    if len(clean) < 16:
        return False
    return all(c in hex_chars for c in clean) and len(clean) % 2 == 0


class DNSExfiltrationDetector:
    def __init__(self):
        # Per-base-domain accumulators, created on first access
        self.domain_stats = defaultdict(lambda: {
            'query_count': 0,
            'unique_subdomains': set(),
            'total_subdomain_length': 0,
            'entropy_sum': 0.0,
            'query_types': defaultdict(int),
            'source_ips': set(),
            'first_seen': None,
            'last_seen': None,
            'txt_response_sizes': [],
        })
        # Detection thresholds
        self.thresholds = {
            'min_query_count': 50,
            'min_unique_subdomains': 30,
            'avg_subdomain_length': 30,
            'avg_entropy': 3.8,
            'unique_ratio': 0.7,
            'txt_query_ratio': 0.3,
            'max_label_length': 63,
            'max_subdomain_labels': 5,
        }

    def process_query(self, timestamp, src_ip, query, qtype, response_size=0):
        """Process a single DNS query and update per-domain statistics."""
        base_domain = get_base_domain(query)
        subdomain = extract_subdomain(query)
        stats = self.domain_stats[base_domain]
        stats['query_count'] += 1
        stats['unique_subdomains'].add(subdomain)
        stats['total_subdomain_length'] += len(subdomain)
        stats['entropy_sum'] += calculate_entropy(subdomain)
        stats['query_types'][qtype] += 1
        stats['source_ips'].add(src_ip)
        if stats['first_seen'] is None:
            stats['first_seen'] = timestamp
        stats['last_seen'] = timestamp
        # Track response sizes for record types that can carry bulk payloads
        if qtype in ('TXT', 'NULL') and response_size > 0:
            stats['txt_response_sizes'].append(response_size)

    def analyze(self):
        """Analyze accumulated statistics and return suspicious domains."""
        alerts = []
        for domain, stats in self.domain_stats.items():
            # Skip low-volume domains; averages over a few queries are noisy
            if stats['query_count'] < self.thresholds['min_query_count']:
                continue
            unique_count = len(stats['unique_subdomains'])
            avg_length = stats['total_subdomain_length'] / stats['query_count']
            avg_entropy = stats['entropy_sum'] / stats['query_count']
            unique_ratio = unique_count / stats['query_count']
            txt_queries = stats['query_types'].get('TXT', 0) + stats['query_types'].get('NULL', 0)
            txt_ratio = txt_queries / stats['query_count']
            score = 0
            indicators = []
            if avg_length > self.thresholds['avg_subdomain_length']:
                score += 25
                indicators.append(f"high_avg_subdomain_length={avg_length:.1f}")
            if avg_entropy > self.thresholds['avg_entropy']:
                score += 25
                indicators.append(f"high_entropy={avg_entropy:.2f}")
            # ...
```