performing-malware-triage-with-yara
Performs rapid malware triage and classification using YARA rules to match file patterns, strings, byte sequences, and structural characteristics against known malware families and suspicious indicators. Covers rule writing, scanning, and integration with analysis pipelines. Activates for requests involving YARA rule creation, malware classification, pattern matching, sample triage, or signature-based detection.
# Performing Malware Triage with YARA
## When to Use
- Rapidly classifying a large batch of malware samples against known family signatures
- Writing detection rules for a newly analyzed malware family based on unique byte patterns
- Scanning file shares, endpoints, or memory dumps for indicators of a specific threat
- Building automated triage pipelines that classify samples before manual analysis
- Hunting for variants of a known threat across an enterprise using YARA scans
**Do not use** as the sole analysis method; YARA triage identifies known patterns but does not reveal new or unknown malware behaviors.
## Prerequisites
- YARA 4.x installed (`apt install yara` for the command-line tools; `pip install yara-python` for the Python bindings)
- YARA rule repositories (YARA-Rules, awesome-yara, Malpedia rules, Florian Roth's signature-base)
- Python 3.8+ with `yara-python` for scripted scanning
- Sample collection organized in a directory structure for batch scanning
- Understanding of PE file format, hex patterns, and regular expressions for rule writing
## Workflow
### Step 1: Scan Samples with Existing Rule Sets
Apply community and commercial YARA rules to classify samples:
```bash
# Scan a single file
yara malware_rules.yar suspect.exe
# Scan a directory of samples
yara -r malware_rules.yar /path/to/samples/
# Scan with multiple rule files
yara -r rules/apt_rules.yar rules/ransomware_rules.yar rules/trojan_rules.yar suspect.exe
# Scan with a 30-second timeout (prevents hanging on large files; -t selects a tag, not a timeout)
yara -a 30 malware_rules.yar suspect.exe
# Scan and show matching strings
yara -s -r malware_rules.yar suspect.exe
# Scan with compiled rules (faster for repeated scans); -C tells yara the rules file is precompiled
yarac malware_rules.yar compiled_rules.yarc
yara -C compiled_rules.yarc suspect.exe
```
```bash
# Download community rule sets
git clone https://github.com/Yara-Rules/rules.git yara-community-rules
git clone https://github.com/Neo23x0/signature-base.git signature-base
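# Scan with signature-base
# Some signature-base rules reference external variables (such as filename) and may
# fail to compile unless those are defined with -d or the affected files are excluded
yara -r signature-base/yara/*.yar suspect.exe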
```
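When scan output is fed into a pipeline, it usually has to be regrouped per sample. The following is a minimal sketch, assuming the default CLI output of one `RULE_NAME PATH` pair per line; `group_hits_by_sample` and the example output are hypothetical and only illustrate the idea.

```python
from collections import defaultdict


def group_hits_by_sample(yara_output: str) -> dict:
    """Map each scanned path to the list of rule names that matched it.

    Assumes one "RULE_NAME PATH" pair per line. Lines that begin with "0x"
    (string matches printed by -s) are skipped.
    """
    hits = defaultdict(list)
    for line in yara_output.splitlines():
        line = line.strip()
        if not line or line.startswith("0x"):
            continue
        rule, _, path = line.partition(" ")
        if path:
            hits[path].append(rule)
    return dict(hits)


# Hypothetical output captured from: yara -s -r malware_rules.yar /path/to/samples/
example_output = """\
MalwareX_Strings /path/to/samples/a.exe
0x4f2:$url1: /gate.php?id=
MalwareX_Decryptor /path/to/samples/a.exe
Generic_Packer /path/to/samples/b.dll
"""

for path, rule_names in group_hits_by_sample(example_output).items():
    print(path, "->", ", ".join(rule_names))
```

Output produced with other flags (for example tag or namespace printing) uses a different line layout and would need a different parser.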
### Step 2: Write Rules for Unique String Patterns
Create YARA rules based on strings extracted during malware analysis:
```
rule MalwareX_Strings {
    meta:
        description = "Detects MalwareX based on unique strings"
        author = "analyst"
        date = "2025-09-15"
        reference = "Internal Analysis Report #1547"
        hash = "e3b0c44298fc1c149afbf4c8996fb924"
        tlp = "WHITE"
    strings:
        // C2 URL pattern
        $url1 = "/gate.php?id=" ascii
        $url2 = "/panel/connect.php" ascii
        // Unique mutex name
        $mutex = "Global\\CryptLocker_2025" ascii wide
        // User-Agent string
        $ua = "Mozilla/5.0 (compatible; MSIE 10.0)" ascii
        // Registry persistence path
        $reg = "Software\\Microsoft\\Windows\\CurrentVersion\\Run\\WindowsUpdate" ascii
        // Campaign identifier
        $campaign = "campaign_2025_q3" ascii
    condition:
        uint16(0) == 0x5A4D and                 // PE file (MZ header)
        filesize < 500KB and                    // Size constraint
        ($url1 or $url2) and                    // At least one C2 URL
        ($mutex or $reg or $campaign) and       // Mutex, persistence key, or campaign identifier
        $ua                                     // Specific User-Agent
}
```
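Candidate strings for a rule like the one above are typically pulled from the sample first. The sketch below is one possible way to do that with the standard library; the `MIN_LEN` threshold, the function name, and the sample bytes are illustrative assumptions, not part of YARA. It reports printable ASCII runs and UTF-16LE runs separately, which corresponds to the `ascii` and `wide` modifiers.

```python
import re

MIN_LEN = 6  # assumed cutoff; shorter runs are usually noise

# Runs of printable ASCII, and the same characters interleaved with NUL bytes (UTF-16LE)
ASCII_RE = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_LEN)
WIDE_RE = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % MIN_LEN)


def candidate_strings(data: bytes):
    """Yield (offset, encoding, text) for printable ASCII and UTF-16LE runs."""
    for m in ASCII_RE.finditer(data):
        yield m.start(), "ascii", m.group().decode("ascii")
    for m in WIDE_RE.finditer(data):
        yield m.start(), "wide", m.group().decode("utf-16-le")


# Hypothetical sample bytes: one ASCII URL fragment and one wide mutex name
sample = (
    b"\x00\x01/gate.php?id=\x00\xff"
    + "Global\\CryptLocker_2025".encode("utf-16-le")
    + b"\x00\x00"
)

for offset, encoding, text in candidate_strings(sample):
    print(f"{offset:#06x} {encoding:5} {text}")
```

Strings that also appear in benign software (common API names, compiler artifacts) make poor signatures, so the extracted list still needs manual review.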
### Step 3: Write Rules for Byte Patterns
Create rules matching specific code sequences:
```
rule MalwareX_Decryptor {
    meta:
        description = "Detects MalwareX XOR decryption routine"
        author = "analyst"
        date = "2025-09-15"
    strings:
        // XOR decryption loop (x86 assembly)
        // mov al, [esi+ecx]
        // xor al, [edi+ecx]
        // mov [esi+ecx], al
        // inc ecx
        // cmp ecx, edx
        // jl loop            (rel8 = -14, back to the first mov)
        $xor_loop = { 8A 04 0E 32 04 0F 88 04 0E 41 3B CA 7C F2 }
        // RC4 KSA initialization (256-byte loop)
        $rc4_ksa = { 33 C0 88 04 ?8 40 3D 00 01 00 00 7? }
        // Embedded RSA public key marker
        $rsa_key = { 06 02 00 00 00 A4 00 00 52 53 41 31 }  // PUBLICKEYBLOB
    condition:
        uint16(0) == 0x5A4D and
        ($xor_loop or $rc4_ksa) and
        $rsa_key
}
```
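The nibble wildcards in `$rc4_ksa` (`?8`, `7?`) can be checked against known bytes before committing a rule. The following sketch emulates that matching with a regular expression; it is an approximation written for illustration, not YARA's own matcher, and it handles only plain bytes and `?` wildcards (no jumps or alternations). The function name is hypothetical.

```python
import re

HEX_DIGITS = "0123456789ABCDEF"


def hex_pattern_to_regex(pattern: str):
    """Translate a simple YARA hex string into a compiled bytes regex."""
    parts = []
    for token in pattern.upper().split():
        hi, lo = token  # each token is exactly two characters
        if hi == "?" and lo == "?":
            parts.append(".")  # any byte (DOTALL makes "." match 0x0A too)
        elif hi == "?":
            # Any high nibble with a fixed low nibble: 16 candidate values
            parts.append("[" + "".join(f"\\x{h}{lo}" for h in HEX_DIGITS) + "]")
        elif lo == "?":
            # Fixed high nibble with any low nibble: a contiguous range
            parts.append(f"[\\x{hi}0-\\x{hi}F]")
        else:
            parts.append(f"\\x{hi}{lo}")
    return re.compile("".join(parts).encode("ascii"), re.DOTALL)


rc4_ksa = hex_pattern_to_regex("33 C0 88 04 ?8 40 3D 00 01 00 00 7?")

# Two NOPs followed by bytes that fit the pattern (0x08 and 0x7C fill the wildcards)
code = b"\x90\x90" + bytes.fromhex("33C0880408403D000100007C")
match = rc4_ksa.search(code)
print("match offset:", match.start() if match else None)
```

A byte such as `0x09` in the `?8` position does not match, which is a quick way to confirm that a wildcard is no looser than intended.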
### Step 4: Write Rules with PE Module
Leverage YARA's PE module for structural detection:
```
import "pe"
import "hash"
import "math"

rule MalwareX_PE_Characteristics {
    meta:
        description = "Detects MalwareX by PE structure and imports"
        author = "analyst"
    condition:
        pe.is_pe and
        // Compiled within specific timeframe
        pe.timestamp > 1693526400 and   // After 2023-09-01
        pe.timestamp < 1727740800 and   // Before 2024-10-01
        // Grouped explicitly because "and" binds tighter than "or"
        (
            // Specific import hash
            pe.imphash() == "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6" or
            // Suspicious import combination
            (
                pe.imports("kernel32.dll", "VirtualAllocEx") and
                pe.imports("kernel32.dll", "WriteProcessMemory") and
                pe.imports("kernel32.dll", "CreateRemoteThread") and
                pe.imports("wininet.dll", "InternetOpenA")
            ) or
            // High entropy .text section (packed)
            (
                for any section in pe.sections : (
                    section.name == ".text" and
                    math.entropy(section.raw_data_offset, section.raw_data_size) > 7.0
                )
            )
        )
}

rule MalwareX_Rich_Header {
    meta:
        description = "Detects MalwareX by Rich header hash"
    condition:
        pe.is_pe and
        hash.md5(pe.rich_signature.clear_data) == "abc123def456abc123def456abc123de"
}
```
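The 7.0 entropy threshold is easier to reason about with the calculation written out. The sketch below computes Shannon entropy in bits per byte, which is the 0 to 8 scale the threshold assumes; it is an illustrative reimplementation under that assumption, not the `math` module's source.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    # Sum -p * log2(p) over the observed byte values
    return sum(-(n / total) * math.log2(n / total) for n in Counter(data).values())


# A single repeated byte carries no information; a uniform distribution is maximal
print(f"repeated byte: {shannon_entropy(b'A' * 1024):.2f}")
print(f"uniform bytes: {shannon_entropy(bytes(range(256)) * 4):.2f}")
```

Compressed or encrypted data sits close to the maximum, while ordinary compiled code is usually noticeably lower, which is why a high-entropy `.text` section suggests packing rather than proving it.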
### Step 5: Batch Triage with Python
Automate scanning of sample collections:
```python
import hashlib
import json
import os

import yara

# Compile all rule files; each key becomes the namespace of the rules in that file
rule_files = {
    "apt": "rules/apt_rules.yar",
    "ransomware": "rules/ransomware_rules.yar",
    "trojan": "rules/trojan_rules.yar",
    "custom": "rules/custom_rules.yar",
}
rules = yara.compile(filepaths=rule_files)


def matched_strings(match):
    """Return (offset, identifier, data) for each string hit in a match."""
    hits = []
    for s in match.strings:
        if hasattr(s, "instances"):
            # yara-python 4.3+: StringMatch objects holding match instances
            hits.extend(
                (hex(i.offset), s.identifier,
                 i.matched_data.decode("utf-8", errors="replace")[:100])
                for i in s.instances
            )
        else:
            # Older yara-python: (offset, identifier, data) tuples
            hits.append((hex(s[0]), s[1], s[2].decode("utf-8", errors="replace")[:100]))
    return hits


# Scan sample directory
results = []
sample_dir = "/path/to/samples"
for filename in sorted(os.listdir(sample_dir)):
    filepath = os.path.join(sample_dir, filename)
    if not os.path.isfile(filepath):
        continue
    with open(filepath, "rb") as f:
        data = f.read()
    sha256 = hashlib.sha256(data).hexdigest()
    # Scan the bytes already in memory instead of reading the file a second time
    matches = rules.match(data=data)
    result = {
        "filename": filename,
        "sha256": sha256,
        "size": len(data),
        "matches": [],
        "classification": "UNKNOWN",
    }
    for match in matches:
        result["matches"].append({
            "rule": match.rule,
            "namespace": match.namespace,
            "tags": match.tags,
            "strings": matched_strings(match),
        })
    # Classify by the namespace of the first matching rule
    if result["matches"]:
        result["classification"] = result["matches"][0]["namespace"].upper()
    results.append(result)

# Summary (guard against an empty sample directory)
classified = sum(1 for r in results if r["classification"] != "UNKNOWN")
print(f"Scanned: {len(results)} samples")
if results:
    print(f"Classified: {classified} ({classified / len(results) * 100:.1f}%)")
print(f"Unknown: {len(results) - classified}")

# Export results
with open("triage_results.json", "w") as f:
    json.dump(results, f, indent=2)
```
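Once results are exported, a per-family summary helps decide which samples go to manual analysis first. The following is a small sketch that assumes records shaped like the ones written to `triage_results.json` above; the `summarize` helper and the example records are hypothetical.

```python
from collections import Counter


def summarize(results: list) -> dict:
    """Count samples per classification and hits per rule name."""
    by_class = Counter(r["classification"] for r in results)
    by_rule = Counter(m["rule"] for r in results for m in r["matches"])
    return {
        "by_classification": dict(by_class),
        "top_rules": by_rule.most_common(5),
    }


# Hypothetical records; in practice these would come from json.load() on triage_results.json
records = [
    {"filename": "a.exe", "classification": "RANSOMWARE",
     "matches": [{"rule": "MalwareX_Strings"}, {"rule": "MalwareX_Decryptor"}]},
    {"filename": "b.dll", "classification": "RANSOMWARE",
     "matches": [{"rule": "MalwareX_Strings"}]},
    {"filename": "c.bin", "classification": "UNKNOWN", "matches": []},
]

summary = summarize(records)
print(summary["by_classification"])
print(summary["top_rules"])
```

Samples left as `UNKNOWN` are the natural candidates for sandboxing or manual reverse engineering, since no existing signature covers them.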
### Step 6: Validate and Optimize Rules
Test rules for false positives and performance:
```bash
# Test rule syntax by compiling and discarding the output (-C is for loading compiled rules)
yarac custom_rules.yar /dev/null
# Scan known-clean directory to check false positives
yara -r custom_rules.yar /path/to/clean_files/ > false_positives.txt
wc -l false_positives.txt
# Benchmark rule performance
time yara -r custom_rules.yar /path/to/large_sample_collection/
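# Print rule statistics (-p sets the thread count, it does not profile);
# per-rule timing needs a YARA build configured with --enable-profiling
yara -S custom_rules.yar suspect.exe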
```
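A raw line count hides which rule is responsible for the false positives. The sketch below turns the captured output into a per-rule rate; it assumes the default `RULE_NAME PATH` output format (no `-s`) and a known number of files in the clean corpus. The function name and the example numbers are hypothetical.

```python
from collections import Counter


def false_positive_rates(yara_output: str, clean_file_count: int) -> dict:
    """Return {rule_name: fraction of clean files that the rule matched}."""
    if clean_file_count <= 0:
        raise ValueError("clean_file_count must be positive")
    counts = Counter(
        line.split(" ", 1)[0]
        for line in yara_output.splitlines()
        if line.strip()
    )
    return {rule: n / clean_file_count for rule, n in counts.items()}


# Hypothetical contents of false_positives.txt from a 200-file clean corpus
fp_output = """\
MalwareX_PE_Characteristics /path/to/clean_files/setup.exe
MalwareX_PE_Characteristics /path/to/clean_files/updater.exe
MalwareX_Strings /path/to/clean_files/browser.exe
"""

for rule, rate in sorted(false_positive_rates(fp_output, 200).items()):
    print(f"{rule}: {rate:.2%}")
```

Rules that match clean files at all are worth tightening before deployment, for example by requiring more strings or adding a file size constraint.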
## Key Concepts
| Term | Definition |
|------|------------|
| **YARA Rule** | Pattern matching rule defining strings, byte sequences, and conditions that identify a specific file or malware family |
| **Condition** | Boolean expression combining string matches, file properties, and module functions to determine whether the rule matches |