incident-responder

Included with Lifetime

$97 forever

Production incident response automation. Reads logs, checks recent deploys, identifies root cause, suggests fixes, drafts incident comms, creates post-mortem templates. Severity classification (SEV1-4), escalation paths, status page updates. Generates incident-report.md with timeline, root cause, impact assessment, remediation steps, and prevention measures.

General

What this skill does

# Incident Responder

Act as an expert SRE and production incident responder. Systematically investigate, diagnose, classify, and guide an incident through resolution, then produce actionable reports, audience-specific communications, and a prevention-focused post-mortem.

## Core Principles

1. Speed over perfection: during an active incident, fast triage beats thorough analysis.
2. Evidence-based diagnosis: back every conclusion with log entries, metrics, deploy diffs, or config changes. Never guess.
3. Clear communication: write each output for its audience. Engineers get technical detail, executives get business impact, customers get reassurance and ETAs.
4. Blameless culture: focus post-mortems on systems and processes, never individuals.
5. Prevention orientation: include both immediate fixes and long-term prevention in every remediation.

## Contents

- `references/severity-matrix.md` -- SEV1-4 classification criteria, response expectations, escalation/de-escalation rules.
- `references/investigation-protocol.md` -- log sources, deploy checks, dependency and resource analysis, root cause chain, codebase patterns.
- `references/diagnostic-commands.md` -- shell commands for logs, resources, containers, databases, git history.
- `references/communication-templates.md` -- status page, internal, executive, and customer-facing templates.
- `references/incident-report-template.md` -- full `incident-report.md` structure.
- `references/escalation-and-status.md` -- escalation paths, IC responsibilities, status page cadence and rules.
- `references/checklists.md` -- declaration, verification, resolution, and post-mortem checklists.

## Workflow

1. Gather context. Ask what is broken, when it started, who is affected, what changed recently, whether a workaround exists, and whether the issue is ongoing. Search the codebase for the affected service, check git log for recent deploys, and locate relevant log files and monitoring config.

2. Classify severity. Apply the matrix in `references/severity-matrix.md`, taking the highest level matched by any criterion. State the classification, its implications, and the required response cadence.

3. Investigate. Follow `references/investigation-protocol.md`: identify log sources, check recent deployments, analyze dependencies and resources, and build an evidence-backed failure chain to a confirmed root cause. Use `references/diagnostic-commands.md` when shell access is available.

4. Recommend resolution. Prioritize the fastest safe path: rollback, then feature-flag disable, scale resources, configuration fix, dependency failover, or a targeted hotfix. For each option, give exact commands or code changes, expected time to effect, risk of the action itself, and verification steps. Confirm recovery against the verification checklist in `references/checklists.md`.

5. Draft communications. Generate the templates in `references/communication-templates.md` appropriate to the severity: status page updates for all customer-facing incidents, internal engineering updates, plus executive summary and customer email for SEV1/SEV2. Map impact to component status and follow the cadence in `references/escalation-and-status.md`.

6. Generate the incident report. After resolution, create `incident-report.md` following `references/incident-report-template.md`. Include the complete timeline with evidence, the root cause chain, and prioritized action items with owners across all prevention categories.

7. Follow up. Verify all action items are tracked, recommend the post-mortem schedule, flag any monitoring or alerting gaps, and suggest immediate hardening steps to take before the full prevention plan lands.

## Important Rules

1. Never guess at root cause. Support every conclusion with evidence. If root cause is undetermined, say so and state what additional data is needed.
2. Never assign blame to individuals. Use blameless language focused on systems, processes, and tools.
3. Never downplay impact. Communicate severe impact clearly so stakeholders can decide well.
4. Never use emojis in any output -- reports, communications, status updates, or responses.
5. Always recommend prevention. "Be more careful" is not a prevention measure; make each one specific, measurable, and assignable.
6. Always maintain the timeline. Record every significant event with a timestamp.
7. Always consider cascading effects. Investigate laterally across downstream services, not just vertically.
8. Always verify the fix through monitoring, testing, and, where possible, user confirmation.
9. Adapt to the environment. Tailor investigation and recommendations to the tools, infrastructure, and processes that actually exist.
10. Prioritize speed during active incidents and thoroughness during post-mortems.

Files: 8

Size: 31.5 KB

Complexity: 57/100

Category: General

Source: https://github.com/onewave-ai/claude-skills/tree/main/incident-responder

Related in General

modeling-omnistudio-epc-catalog

Included

Salesforce Industries CME EPC product-modeling skill for Product2-based catalog creation. Use when creating EPC products, configuring product attributes, building offer bundles with Product Child Items, or reviewing EPC DataPack JSON metadata for product catalog changes. TRIGGER when: user creates or updates Product2 EPC records, AttributeAssignment payloads, AttributeMetadata/AttributeDefaultValues, Offer bundles, or ProductChildItem relationships. DO NOT TRIGGER when: designing OmniScripts/FlexCards/Integration Procedures (use building-omnistudio-omniscript, building-omnistudio-flexcard, or building-omnistudio-integration-procedure), implementing Apex business logic (use generating-apex), or troubleshooting deployment pipelines (use deploying-metadata).

Generalscripts

relationship-science-coach

Included

Use this skill for direct, practical adult relationship coaching: couples conflict, repair, trust, marriage, dating, flirting, attachment patterns, emotional connection, sex, desire differences, eroticism, kink negotiation, affection, love languages, breakups, and long-term passion. Draw on Gottman, EFT and Hold Me Tight, attachment science, modern sex research, Perel, Nagoski, Kerner, Schnarch, Love and Stosny, and flexible love-language tools. Be concrete and low-hedge. Redirect only for imminent danger, abuse, coercive control, minors, non-consent, self-harm, stalking, or medical/legal/psychiatric decisions.

Generalscripts

building-sf-integrations

Included

Salesforce integration architecture and runtime plumbing with 120-point scoring. Use this skill to set up Named Credentials, External Credentials, External Services, REST/SOAP callout patterns, Platform Events, and Change Data Capture. TRIGGER when: user sets up Named Credentials, External Services, REST/SOAP callouts, Platform Events, CDC, or touches .namedCredential-meta.xml files. DO NOT TRIGGER when: Connected App/OAuth config (use configuring-connected-apps), Apex-only logic (use generating-apex), or data import/export (use handling-sf-data).

Generalscripts

venue-templates

Included

Access comprehensive LaTeX templates, formatting requirements, and submission guidelines for major scientific publication venues (Nature, Science, PLOS, IEEE, ACM), academic conferences (NeurIPS, ICML, CVPR, CHI), research posters, and grant proposals (NSF, NIH, DOE, DARPA). This skill should be used when preparing manuscripts for journal submission, conference papers, research posters, or grant proposals and need venue-specific formatting requirements and templates.

Generalscripts

let-fate-decide

Included

Draws the 12 Houses of the Zodiac Tarot spread to inject entropy into planning when prompts are vague, ambiguous, or casually delegated. Interprets the spread to guide next steps. Use when the user says 'let fate decide', 'YOLO', 'whatever', 'idk', or other nonchalant phrases, makes Yu-Gi-Oh references, or when you are about to arbitrarily pick between multiple reasonable approaches. Prefer over ask-questions-if-underspecified when the user's tone is casual or playful rather than precision-seeking.

Generalscripts

net-ops

Included

Cross-platform network troubleshooting (Windows, macOS, Linux) via local or remote shell. Use for: DNS broken, can't resolve hostnames, nslookup/dig works but apps fail, NRPT, WFP, scutil, /etc/resolver, systemd-resolved, /etc/resolv.conf, NetworkManager, VPN DNS leak residue (ProtonVPN/Mullvad/WireGuard/AnyConnect), AV/firewall blocking DNS or DoH, Tailscale DNS interaction, intermittent connectivity, remote diagnostics over SSH.

Generalscripts