testing-ransomware-recovery-procedures
Test and validate ransomware recovery procedures including backup restore operations, RTO/RPO target verification, recovery sequencing, and clean restore validation to ensure organizational resilience against destructive ransomware attacks.
What this skill does
# Testing Ransomware Recovery Procedures
## When to Use
Use this skill when:
- Validating that ransomware recovery plans actually work under realistic conditions
- Measuring RTO (Recovery Time Objective) and RPO (Recovery Point Objective) against business requirements
- Testing backup restore operations to confirm data integrity and completeness after simulated encryption
- Conducting tabletop exercises or live recovery drills for ransomware scenarios
- Auditing disaster recovery readiness as part of compliance or cyber insurance requirements
**Do not use** for active incident response during a live ransomware attack. Use dedicated IR playbooks instead.
## Prerequisites
- Isolated recovery test environment (air-gapped or network-segmented lab)
- Access to backup infrastructure (Veeam, Commvault, Rubrik, AWS Backup, Azure Backup)
- Documented RTO/RPO targets per application tier from business impact analysis
- Backup copies available for restore testing (production replicas or test snapshots)
- Recovery runbooks with step-by-step procedures for each critical system
## Workflow
### Step 1: Define Recovery Test Scope
Identify critical systems and their tiered recovery targets:
| Tier | System Type | RTO Target | RPO Target | Example |
|------|------------|------------|------------|---------|
| Tier 1 | Mission-critical | < 1 hour | < 15 min | Active Directory, core database |
| Tier 2 | Business-critical | < 4 hours | < 1 hour | ERP, email, CRM |
| Tier 3 | Business-operational | < 24 hours | < 4 hours | File shares, internal apps |
| Tier 4 | Non-critical | < 72 hours | < 24 hours | Dev/test, analytics |
### Step 2: Prepare Test Environment
```bash
# Verify isolated recovery network is segmented
# No routes to production should exist
ip route show | grep -v "192.168.100.0/24" # recovery VLAN only
# Verify backup catalog is accessible
restic snapshots --repo s3:s3.amazonaws.com/backup-bucket --password-file /etc/restic/pw
# Or for Veeam:
# Get-VBRBackup | Where-Object {$_.JobType -eq "Backup"} | Select Name, LastPointCreationTime
```
### Step 3: Execute Restore and Measure RTO
For each tiered system, measure the full recovery timeline:
1. **Detection to Decision** - Time from simulated alert to restore decision
2. **Backup Locate** - Time to identify and select the correct clean restore point
3. **Restore Execution** - Time to restore data/VM/application from backup
4. **Validation** - Time to verify data integrity and application functionality
5. **Service Restoration** - Time until the system is fully operational
```
Recovery Timeline Measurement:
T0: Incident declared (simulated ransomware detection)
T1: Recovery team assembled and backup identified
T2: Restore initiated from clean backup
T3: Restore completed, integrity checks passed
T4: Application validated and service restored
Actual RTO = T4 - T0
Actual RPO = T0 - backup_timestamp
```
### Step 4: Validate Data Integrity Post-Restore
```bash
# Compare file counts between backup manifest and restored data
find /restored/data -type f | wc -l
# Compare against pre-backup manifest
# Verify database consistency after restore
pg_isready -h localhost -p 5432
psql -c "SELECT count(*) FROM critical_table;" -d restored_db
# Hash verification of critical files
sha256sum /restored/data/critical_config.xml
# Compare against known-good hash from backup manifest
```
### Step 5: Test Credential Rotation and Security Hardening
After restore, validate that security controls are re-established:
1. Rotate all service account passwords and API keys
2. Verify MFA is enabled on all administrative accounts
3. Confirm EDR/AV agents are running and reporting to management console
4. Validate firewall rules block known C2 indicators
5. Check that restored systems have latest security patches
### Step 6: Document Results and Calculate Gap
```
Recovery Test Report:
System: [Name]
Tier: [1-4]
RTO Target: [target] Actual RTO: [measured] Gap: [delta]
RPO Target: [target] Actual RPO: [measured] Gap: [delta]
Data Integrity: [PASS/FAIL]
Application Validation: [PASS/FAIL]
Security Controls Restored: [PASS/FAIL]
Status: [MEETS TARGET / EXCEEDS TARGET / FAILS TARGET]
Remediation Required: [description if FAILS]
```
## Key Concepts
| Term | Definition |
|------|-----------|
| **RTO** | Recovery Time Objective: maximum acceptable downtime for a system after a disaster |
| **RPO** | Recovery Point Objective: maximum acceptable data loss measured in time |
| **WRT** | Work Recovery Time: time to verify system integrity after restore completes |
| **MTD** | Maximum Tolerable Downtime: absolute limit before unacceptable business impact |
| **Clean Restore Point** | A backup verified to be free of ransomware artifacts or encryption |
| **Recovery Sequencing** | The order in which interdependent systems must be restored |
| **Air-Gapped Backup** | Backup stored on media physically disconnected from the network |
## Tools & Systems
| Tool | Purpose |
|------|---------|
| Veeam Backup & Replication | VM and physical server backup and restore |
| Commvault | Enterprise data protection and recovery orchestration |
| Rubrik | Cloud-native backup with ransomware recovery SLA |
| AWS Backup | Centralized backup for AWS services |
| Azure Backup | Microsoft cloud backup with immutable vault |
| Restic | Open-source encrypted backup tool |
| Velero | Kubernetes cluster backup and restore |
## Common Pitfalls
- **Not testing restores regularly**: Backups that are never tested often fail when needed. Test quarterly at minimum.
- **Ignoring recovery sequencing**: Restoring an application before its database dependency causes cascading failures.
- **Skipping credential rotation**: Restored systems may contain compromised credentials that allow re-infection.
- **Using production network for testing**: Recovery tests on production networks risk spreading simulated or real infections.
- **Measuring RTO without WRT**: Restore completion is not recovery completion. Include validation and hardening time.
- **No immutable backups**: If ransomware can encrypt or delete backups, recovery is impossible. Use air-gapped or immutable storage.
## References
- NIST SP 800-184: Guide for Cybersecurity Event Recovery
- CISA Ransomware Guide: https://www.cisa.gov/stopransomware
- Veeam RTO/RPO Best Practices: https://www.veeam.com/blog/recovery-time-recovery-point-objectives.html
- NIST CSF 2.0 RC.RP (Recovery Planning)
Related in Code Review
gstack
IncludedFast headless browser for QA testing and site dogfooding. Navigate pages, interact with elements, verify state, diff before/after, take annotated screenshots, test responsive layouts, forms, uploads, dialogs, and capture bug evidence. Use when asked to open or test a site, verify a deployment, dogfood a user flow, or file a bug with screenshots. (gstack)
startup-due-diligence
IncludedLegal due diligence review for seed-stage and Series A startups (US, Delaware C-Corp focus). Supports both investor and founder perspectives. Capabilities include: (1) Interactive document review and issue spotting; (2) Document request list generation; (3) Cap table and SAFE/convertible note analysis; (4) Red flag identification with severity ratings; (5) Diligence report generation. TRIGGERS: due diligence, DD, startup investment, cap table review, Series A, seed round, investor diligence, legal review startup, SAFE analysis, convertible note, 409A, founder vesting.
interview-master
IncludedThis skill should be used when the user asks to "generate interview questions", "prepare for interview", "optimize resume", "conduct mock interview", "analyze git commits for resume", "generate resume from code", "review my resume", or mentions interview preparation, career assistance, or extracting project experience from git history. Provides comprehensive interview and career development guidance for both job seekers and interviewers.
fix-issue
IncludedFixes GitHub issues using parallel analysis agents for root cause investigation, code exploration, and regression detection. Reads issue context from gh CLI, searches codebase and memory for related patterns, generates a fix with tests, and links the resolution back to the issue via PR. Includes prevention analysis to avoid recurrence. Use when debugging errors, resolving regressions, fixing bugs, or triaging issues.
sf-apex
IncludedGenerates and reviews Salesforce Apex code with 150-point scoring. TRIGGER when: user writes, reviews, or fixes Apex classes, triggers, test classes, batch/queueable/schedulable jobs, or touches .cls/.trigger files. DO NOT TRIGGER when: LWC JavaScript (use sf-lwc), Flow XML (use sf-flow), SOQL-only queries (use sf-soql), or non-Salesforce code.
swift-development
IncludedComprehensive Swift development for building, testing, and deploying iOS/macOS applications. Use when Claude needs to: (1) Build Swift packages or Xcode projects from command line, (2) Run tests with XCTest or Swift Testing framework, (3) Manage iOS simulators with simctl, (4) Handle code signing, provisioning profiles, and app distribution, (5) Format or lint Swift code with SwiftFormat/SwiftLint, (6) Work with Swift Package Manager (SPM), (7) Implement Swift 6 concurrency patterns (async/await, actors, Sendable), (8) Create SwiftUI views with MVVM architecture, (9) Set up Core Data or SwiftData persistence, or any other Swift/iOS/macOS development tasks.