performing-yara-rule-development-for-detection

Develop precise YARA rules for malware detection by identifying unique byte patterns, strings, and behavioral indicators in executable files while minimizing false positives.

Tags: General, yara, malware-detection, signature-development, threat-hunting, pattern-matching, yara-x, indicator-development, scripts, assets


# Performing YARA Rule Development for Detection

## Overview

YARA is the pattern-matching Swiss knife for malware researchers: it identifies and classifies malware based on textual or binary patterns. Effective YARA rules combine unique string patterns, byte sequences, PE header characteristics, import table analysis, and conditional logic to detect malware families while avoiding false positives. YARA-X, the Rust rewrite of YARA (stable since June 2025), brings improved performance and new modules. Rules should target artifacts of the unpacked malware, such as stack strings, hardcoded C2 URLs, mutex names, encryption constants, and unique code sequences, rather than packer signatures.


## When to Use

- When conducting security assessments that involve developing YARA rules for detection
- When following incident response procedures for related security events
- When performing scheduled security testing or auditing activities
- When validating security controls through hands-on testing

## Prerequisites

- Python 3.9+ with `yara-python` library
- YARA 4.5+ or YARA-X 0.10+
- PE analysis tools (`pefile`, `pestudio`)
- Hex editor for identifying unique byte patterns
- Access to malware samples (VirusTotal, MalwareBazaar)
- Understanding of PE file format, strings, and import tables

## Key Concepts

### Rule Structure

A YARA rule has up to three sections: `meta` (optional descriptive metadata), `strings` (pattern definitions, optional when the condition references no strings), and `condition` (matching logic, always required). String types include text strings (with modifiers such as `ascii`, `wide`, and `nocase`), hex patterns with wildcards and jumps, and regular expressions. Conditions combine string matches with file properties using boolean operators.
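A minimal sketch of this layout, embedded in Python so it can be compiled with `yara-python` from the prerequisites. The rule name, metadata values, and every string below are hypothetical placeholders chosen for illustration, not indicators from a real malware family.

```python
# Hypothetical example rule: every name and pattern below is a placeholder.
EXAMPLE_RULE = r'''
rule Example_Family_Sketch
{
    meta:
        description = "Illustrative rule layout only"
        author = "example"

    strings:
        $text = "example-c2.invalid/gate.php" ascii wide nocase
        $hex  = { 8B 45 FC 33 C1 [2-6] C1 E8 ?? 89 45 }
        $re   = /ExampleMutex_[0-9a-f]{4}/

    condition:
        uint16(0) == 0x5A4D and 2 of them
}
'''

try:
    import yara  # yara-python; optional so the sketch still imports without it
except ImportError:
    yara = None


def scan_bytes(rule_source: str, data: bytes) -> list[str]:
    """Compile rule_source and return the names of the rules that match data."""
    if yara is None:
        raise RuntimeError("yara-python is not installed")
    rules = yara.compile(source=rule_source)
    return [match.rule for match in rules.match(data=data)]
```

Here `$text` is a text string with modifiers, `$hex` uses a jump (`[2-6]`) and a wildcard (`??`), and `$re` is a regular expression. The condition requires an `MZ` header plus any two of the three strings.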

### String Selection Strategy

Effective rules target patterns that are unique to the malware family and survive recompilation. Stack strings, which the malware assembles byte by byte at runtime, are good candidates: the string never appears contiguously in the file, but the instruction sequence that builds it can be matched as a hex pattern. C2 domain patterns, custom encryption routines, unique error messages, and specific API call sequences provide stable detection anchors. Avoid compiler-generated boilerplate and common library strings.
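As an illustration of anchoring on a stack string, the sketch below turns a string into a YARA hex pattern for code that writes it one byte at a time. It assumes the compiler emitted consecutive `mov byte ptr [ebp+disp8], imm8` instructions (encoded as `C6 45 <disp8> <imm8>`). Real samples may instead use dword moves, `rsp`-relative addressing, or interleaved instructions, so check the disassembly before relying on it. The function name `stack_string_hex_pattern` is hypothetical.

```python
def stack_string_hex_pattern(text: str) -> str:
    """Build a YARA hex pattern for a string assembled byte by byte on the stack.

    Assumes back-to-back `mov byte ptr [ebp+disp8], imm8` instructions
    (C6 45 <disp8> <imm8>); this is one common encoding, not the only one.
    """
    parts = []
    for byte in text.encode("ascii"):
        # C6 45 selects the [ebp+disp8] form; ?? wildcards the stack offset,
        # which changes between builds, while the immediate byte stays fixed.
        parts.append(f"C6 45 ?? {byte:02X}")
    return "{ " + " ".join(parts) + " }"


if __name__ == "__main__":
    print(f"$stack_cmd = {stack_string_hex_pattern('cmd')}")
    # prints: $stack_cmd = { C6 45 ?? 63 C6 45 ?? 6D C6 45 ?? 64 }
```

Wildcarding only the displacement keeps the pattern specific to the string while tolerating changes in stack layout.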

### Performance Optimization

YARA evaluates conditions with short-circuit logic, so place the cheapest and most discriminating checks first: `filesize` limits and magic-byte tests such as `uint16(0) == 0x5A4D` reject irrelevant files before costlier expressions (module calls, loops) run. Prefer hex patterns over regular expressions where possible. Use `private` rules as building blocks for complex detection logic; they do not generate standalone matches.
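The ordering advice and the `private` building block can be sketched as follows. The rule names, the 2MB size limit, and the `$marker` string are hypothetical placeholders. The helper compiles the source with `yara-python` when it is installed.

```python
# Hypothetical rules illustrating condition ordering and a private helper rule.
ORDERED_RULES = r'''
private rule Example_IsPE
{
    condition:
        uint16(0) == 0x5A4D
}

rule Example_Ordered_Conditions
{
    strings:
        $marker = "ExampleMutex_7f3a" ascii

    condition:
        filesize < 2MB and      // cheapest check first
        Example_IsPE and        // reusable private building block
        $marker                 // string match last
}
'''

try:
    import yara  # yara-python; optional so the sketch still imports without it
except ImportError:
    yara = None


def matching_rule_names(rule_source: str, data: bytes) -> list[str]:
    """Return the names of the non-private rules that match data."""
    if yara is None:
        raise RuntimeError("yara-python is not installed")
    rules = yara.compile(source=rule_source)
    return [match.rule for match in rules.match(data=data)]
```

Because `Example_IsPE` is private, only `Example_Ordered_Conditions` appears in the results, even though both rules evaluate to true for a matching sample.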

## Workflow

### Step 1: Analyze Sample for Unique Patterns

```python
#!/usr/bin/env python3
"""Extract candidate strings and byte patterns for YARA rule creation."""
import pefile
import re
import sys
from collections import Counter


def extract_strings(filepath, min_length=6):
    """Extract ASCII and wide strings from binary."""
    with open(filepath, 'rb') as f:
        data = f.read()

    # ASCII strings
    ascii_strings = re.findall(
        rb'[\x20-\x7e]{' + str(min_length).encode() + rb',}', data
    )

    # Wide (UTF-16LE) strings
    wide_strings = re.findall(
        rb'(?:[\x20-\x7e]\x00){' + str(min_length).encode() + rb',}', data
    )

    return {
        'ascii': [s.decode('ascii') for s in ascii_strings],
        'wide': [s.decode('utf-16-le') for s in wide_strings],
    }


def analyze_pe_imports(filepath):
    """Extract import table for API-based detection."""
    try:
        pe = pefile.PE(filepath)
    except pefile.PEFormatError:
        return []

    imports = []
    if hasattr(pe, 'DIRECTORY_ENTRY_IMPORT'):
        for entry in pe.DIRECTORY_ENTRY_IMPORT:
            dll_name = entry.dll.decode('utf-8', errors='replace')
            for imp in entry.imports:
                if imp.name:
                    func_name = imp.name.decode('utf-8', errors='replace')
                    imports.append(f"{dll_name}!{func_name}")
    return imports


def find_unique_byte_patterns(filepath, pattern_length=16):
    """Find unique byte sequences suitable for YARA hex patterns."""
    with open(filepath, 'rb') as f:
        data = f.read()

    try:
        pe = pefile.PE(filepath)
        # Focus on code section
        for section in pe.sections:
            if section.Characteristics & 0x20000000:  # IMAGE_SCN_MEM_EXECUTE
                code_start = section.PointerToRawData
                code_end = code_start + section.SizeOfRawData
                code_data = data[code_start:code_end]
                break
        else:
            code_data = data
    except Exception:
        code_data = data

    # Sample pattern_length-byte windows every 4 bytes, skipping null-heavy ones
    patterns = []
    for i in range(0, len(code_data) - pattern_length + 1, 4):
        pattern = code_data[i:i+pattern_length]
        if pattern.count(b'\x00') < pattern_length // 3:  # Skip null-heavy
            hex_pattern = ' '.join(f'{b:02X}' for b in pattern)
            patterns.append(hex_pattern)

    # Keep windows that occur only once among the sampled offsets
    freq = Counter(patterns)
    unique = [p for p, count in freq.items() if count == 1]

    return unique[:20]  # First 20 candidates in file order


def suggest_rule_strings(filepath):
    """Suggest strings and patterns for YARA rule."""
    print(f"[+] Analyzing: {filepath}")

    # Extract strings
    strings = extract_strings(filepath)

    # Filter for suspicious/unique strings
    suspicious_keywords = [
        'http', 'https', 'cmd', 'powershell', 'mutex', 'pipe',
        'password', 'credential', 'inject', 'hook', 'debug',
        'sandbox', 'virtual', 'vmware', 'vbox',
    ]

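    def yara_escape(s):
        """Escape backslashes and double quotes so the string is valid YARA."""
        return s.replace('\\', '\\\\').replace('"', '\\"')

    print("\n[+] Suspicious ASCII strings:")
    for s in strings['ascii']:
        if any(kw in s.lower() for kw in suspicious_keywords):
            print(f"  $ = \"{yara_escape(s)}\" ascii")

    print("\n[+] Suspicious wide strings:")
    for s in strings['wide']:
        if any(kw in s.lower() for kw in suspicious_keywords):
            print(f"  $ = \"{yara_escape(s)}\" wide")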

    # Import analysis
    imports = analyze_pe_imports(filepath)
    suspicious_apis = [
        'VirtualAlloc', 'VirtualProtect', 'WriteProcessMemory',
        'CreateRemoteThread', 'NtUnmapViewOfSection', 'RtlMoveMemory',
        'OpenProcess', 'CreateToolhelp32Snapshot',
        'InternetOpenA', 'HttpSendRequestA',
        'CryptEncrypt', 'CryptDecrypt',
    ]

    print("\n[+] Suspicious imports:")
    for imp in imports:
        func = imp.split('!')[-1]
        if func in suspicious_apis:
            print(f"  {imp}")

    # Byte patterns
    print("\n[+] Candidate hex patterns:")
    patterns = find_unique_byte_patterns(filepath)
    # Number the identifiers so the suggestions can be pasted into one rule
    for i, p in enumerate(patterns[:5], 1):
        print(f"  $hex{i} = {{ {p} }}")


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print(f"Usage: {sys.argv[0]} <sample_path>")
        sys.exit(1)
    suggest_rule_strings(sys.argv[1])
```
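The candidates printed by the script above still need to be checked against clean files, because a string that also occurs in legitimate software will cause false positives. The sketch below assumes a local directory of known-good binaries and drops every candidate found in any of them, in either ASCII or UTF-16LE form. The function name `filter_against_goodware` and the directory layout are hypothetical.

```python
from pathlib import Path


def filter_against_goodware(candidates, goodware_dir):
    """Return the candidate strings that occur in none of the clean files.

    goodware_dir is assumed to hold known-good binaries; each file is read
    fully into memory, so this suits small reference sets only.
    """
    survivors = list(candidates)
    for path in sorted(Path(goodware_dir).iterdir()):
        if not path.is_file():
            continue
        data = path.read_bytes()
        survivors = [
            s for s in survivors
            # Check both encodings, mirroring YARA's ascii and wide modifiers.
            if s.encode("utf-8") not in data
            and s.encode("utf-16-le") not in data
        ]
    return survivors
```

A candidate that survives this filter is not guaranteed to be unique, since the result is only as good as the goodware set it was checked against.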

### Step 2: Write and Test YARA Rules

```python
import yara
import os

def create_yara_rule(rule_name, meta, strings, condition):
    """Generate a YARA rule from components."""
    meta_str = "\n".join(f'        {k} = "{v}"' for k, v in meta.items())
    strings_str = "\n".join(f"        {s}" for s in strings)

    rule = f"""rule {rule_name} {{
    meta:
{meta_str}

    strings:
{strings_str}

    condition:
        {condition}
}}"""
    return rule


def test_yara_rule(rule_text, test_dir):
    """Compile and test YARA rule against sample directory."""
    try:
        rules = yara.compile(source=rule_text)
    except yara.SyntaxError as e:
        print(f"[-] YARA syntax error: {e}")
        return None

    results = {"matches": [], "no_match": []}

    for filename in os.listdir(test_dir):
        filepath = os.path.join(test_dir, filename)
        if not os.path.isfile(filepath):
            continue

        matches = rules.match(filepath)
        if matches:
            results["matches"].append({
                "file": filename,
                "rules": [m.rule for m in matches],
            })
        else:
            results["no_match"].append(filename)

    print(f"[+] Matches: {len(results['matches'])}")
    return results
```


Source: https://github.com/mukul975/anthropic-cybersecurity-skills/tree/main/skills/performing-yara-rule-development-for-detection
