windsurf-incident-runbook
Execute Windsurf incident response when AI features fail or cause production issues. Use when Cascade breaks code, Windsurf service is down, AI-generated code causes production incidents, or team needs emergency Windsurf troubleshooting. Trigger with phrases like "windsurf incident", "windsurf outage", "windsurf broke production", "cascade caused bug", "windsurf emergency".
What this skill does
# Windsurf Incident Runbook
## Overview
Incident response procedures for Windsurf-related issues: Cascade service outages, AI-generated code causing bugs, and team workflow disruptions.
## Prerequisites
- Access to Windsurf dashboard and status page
- Git access to affected repositories
- Team communication channel (Slack, Teams)
## Severity Levels
| Level | Definition | Response Time | Examples |
|-------|------------|---------------|----------|
| P1 | Production broken by AI code | < 15 min | Cascade-generated code deployed with critical bug |
| P2 | Team workflow blocked | < 1 hour | Windsurf service outage, all Cascade down |
| P3 | Degraded AI features | < 4 hours | Slow Cascade, Supercomplete intermittent |
| P4 | Minor inconvenience | Next business day | Specific model unavailable, feature regression |
## Quick Triage Decision Tree
```
Is Windsurf service itself down?
├─ YES: Check https://status.windsurf.com
│ ├─ Status page shows incident → WAIT for Windsurf to resolve
│ │ Action: Switch to manual coding, notify team
│ └─ Status page green → Local issue
│ Action: Restart Windsurf, check internet, re-authenticate
│
└─ NO: Did AI-generated code cause a production issue?
├─ YES → P1 INCIDENT
│ 1. Revert the deployment immediately
│ 2. Identify the Cascade-generated commit(s)
│ 3. Fix manually or with targeted Cascade prompt
│ 4. Post-incident: update review policy
│
└─ NO: Is Cascade giving bad suggestions?
├─ YES → Check .windsurfrules, start fresh Cascade session
└─ NO → See windsurf-common-errors
```
## P1 Playbook: AI Code Caused Production Bug
### Step 1: Immediate Mitigation
```bash
set -euo pipefail
# Revert the deployment
git log --oneline -10 # Find the bad commit(s)
# If tagged with [cascade]:
git revert HEAD --no-edit # Revert most recent commit
git push origin main # Deploy revert
# If multiple Cascade commits:
git revert --no-commit HEAD~3..HEAD # Revert last 3 commits
git commit -m "revert: undo cascade changes causing [issue]"
git push origin main
```
### Step 2: Identify Root Cause
```bash
# Find all Cascade-generated commits
git log --all --oneline --grep="cascade" --since="1 week ago"
git log --all --oneline --grep="\[cascade\]" --since="1 week ago"
# Compare before/after
git diff [last-good-commit]..HEAD -- src/
# Common root causes:
# 1. Cascade modified shared utility used by many modules
# 2. Cascade changed error handling (swallowed exceptions)
# 3. Cascade "optimized" code that had intentional behavior
# 4. Cascade introduced dependency on newer API version
```
### Step 3: Fix and Validate
```bash
set -euo pipefail
git checkout -b fix/cascade-revert
# Make targeted fix
npm test
npm run typecheck
# Deploy to staging first
```
## P2 Playbook: Windsurf Service Outage
### Step 1: Confirm and Communicate
```bash
# Check Windsurf status
curl -sf https://status.windsurf.com || echo "Status page unreachable"
```
### Step 2: Team Notification
```
Team notification template:
Windsurf AI features are currently unavailable.
Status: https://status.windsurf.com
Impact: Cascade and Supercomplete are not working.
Workaround: Continue coding manually. Windsurf still works as a
standard VS Code editor — only AI features are affected.
ETA: Monitoring status page for updates.
```
### Step 3: Workarounds During Outage
```markdown
1. Windsurf still works as VS Code (file editing, terminal, git)
2. Extensions still work (ESLint, Prettier, debugger)
3. Only Cascade, Supercomplete, and Command mode are down
4. Continue coding manually until service restores
5. Do NOT switch to a different editor mid-task (context loss)
```
## P3 Playbook: Degraded AI Features
```markdown
Symptoms and fixes:
Slow Cascade → Start fresh session, reduce workspace size
No Supercomplete → Check status bar widget, verify enabled
Wrong model → Check credit balance, switch to available model
MCP disconnected → Restart MCP servers (Command Palette)
Indexing stuck → Reset indexing (Command Palette > "Codeium: Reset Indexing")
```
## Post-Incident Actions
### Evidence Collection
```bash
set -euo pipefail
# Collect relevant data
mkdir incident-$(date +%Y%m%d)
git log --since="1 day ago" --stat > incident-$(date +%Y%m%d)/commits.txt
cp .windsurfrules incident-$(date +%Y%m%d)/ 2>/dev/null || true
# See windsurf-debug-bundle for full diagnostic collection
```
### Postmortem Template
```markdown
## Incident: [Title]
**Date:** YYYY-MM-DD
**Duration:** X hours Y minutes
**Severity:** P[1-4]
### Summary
[1-2 sentence description]
### Timeline
- HH:MM — [Event]
- HH:MM — [Event]
### Root Cause
[Was this an AI-generated code issue? Windsurf service issue? Config issue?]
### What Went Wrong
- [ ] AI-generated code not reviewed thoroughly
- [ ] Missing tests for AI-modified code
- [ ] .windsurfrules didn't prevent the bad pattern
- [ ] Cascade modified shared code without constraint
### Action Items
- [ ] Update .windsurfrules to prevent this pattern
- [ ] Add test coverage for affected module
- [ ] Update team Cascade usage policy
- [ ] Add CI gate for AI-modified code
```
## Error Handling
| Issue | Immediate Action | Long-Term Fix |
|-------|-----------------|---------------|
| AI code in prod broke feature | Git revert + redeploy | Enforce test gates for Cascade commits |
| Windsurf service down | Code manually | No action needed — external service |
| AI modified protected files | Git revert those files | Add to .codeiumignore |
| Team lost work from Cascade | Recover from git history | Enforce pre-Cascade git commit policy |
## Examples
### Quick Health Check
```bash
curl -sf https://status.windsurf.com | head -5 || echo "WINDSURF STATUS UNREACHABLE"
```
### Find Recent Cascade Commits
```bash
git log --all --oneline --since="7 days ago" | grep -i cascade
```
## Resources
- [Windsurf Status Page](https://status.windsurf.com)
- [Windsurf GitHub Issues](https://github.com/Exafunction/codeium/issues)
## Next Steps
For data handling compliance, see `windsurf-data-handling`.
Related in General
modeling-omnistudio-epc-catalog
IncludedSalesforce Industries CME EPC product-modeling skill for Product2-based catalog creation. Use when creating EPC products, configuring product attributes, building offer bundles with Product Child Items, or reviewing EPC DataPack JSON metadata for product catalog changes. TRIGGER when: user creates or updates Product2 EPC records, AttributeAssignment payloads, AttributeMetadata/AttributeDefaultValues, Offer bundles, or ProductChildItem relationships. DO NOT TRIGGER when: designing OmniScripts/FlexCards/Integration Procedures (use building-omnistudio-omniscript, building-omnistudio-flexcard, or building-omnistudio-integration-procedure), implementing Apex business logic (use generating-apex), or troubleshooting deployment pipelines (use deploying-metadata).
relationship-science-coach
IncludedUse this skill for direct, practical adult relationship coaching: couples conflict, repair, trust, marriage, dating, flirting, attachment patterns, emotional connection, sex, desire differences, eroticism, kink negotiation, affection, love languages, breakups, and long-term passion. Draw on Gottman, EFT and Hold Me Tight, attachment science, modern sex research, Perel, Nagoski, Kerner, Schnarch, Love and Stosny, and flexible love-language tools. Be concrete and low-hedge. Redirect only for imminent danger, abuse, coercive control, minors, non-consent, self-harm, stalking, or medical/legal/psychiatric decisions.
building-sf-integrations
IncludedSalesforce integration architecture and runtime plumbing with 120-point scoring. Use this skill to set up Named Credentials, External Credentials, External Services, REST/SOAP callout patterns, Platform Events, and Change Data Capture. TRIGGER when: user sets up Named Credentials, External Services, REST/SOAP callouts, Platform Events, CDC, or touches .namedCredential-meta.xml files. DO NOT TRIGGER when: Connected App/OAuth config (use configuring-connected-apps), Apex-only logic (use generating-apex), or data import/export (use handling-sf-data).
venue-templates
IncludedAccess comprehensive LaTeX templates, formatting requirements, and submission guidelines for major scientific publication venues (Nature, Science, PLOS, IEEE, ACM), academic conferences (NeurIPS, ICML, CVPR, CHI), research posters, and grant proposals (NSF, NIH, DOE, DARPA). This skill should be used when preparing manuscripts for journal submission, conference papers, research posters, or grant proposals and need venue-specific formatting requirements and templates.
let-fate-decide
IncludedDraws the 12 Houses of the Zodiac Tarot spread to inject entropy into planning when prompts are vague, ambiguous, or casually delegated. Interprets the spread to guide next steps. Use when the user says 'let fate decide', 'YOLO', 'whatever', 'idk', or other nonchalant phrases, makes Yu-Gi-Oh references, or when you are about to arbitrarily pick between multiple reasonable approaches. Prefer over ask-questions-if-underspecified when the user's tone is casual or playful rather than precision-seeking.
net-ops
IncludedCross-platform network troubleshooting (Windows, macOS, Linux) via local or remote shell. Use for: DNS broken, can't resolve hostnames, nslookup/dig works but apps fail, NRPT, WFP, scutil, /etc/resolver, systemd-resolved, /etc/resolv.conf, NetworkManager, VPN DNS leak residue (ProtonVPN/Mullvad/WireGuard/AnyConnect), AV/firewall blocking DNS or DoH, Tailscale DNS interaction, intermittent connectivity, remote diagnostics over SSH.