fortify
Included with Lifetime
$97 forever
Use when the workflow lacks error handling, has been failing in production, or needs retry logic, fallback strategies, and circuit breakers.
fix
What this skill does
## MANDATORY PREPARATION Invoke /agent-workflow — it contains workflow principles, anti-patterns, and the **Context Gathering Protocol**. Follow the protocol before proceeding — if no workflow context exists yet, you MUST run /teach-maestro first. Consult the guardrails-safety reference in the agent-workflow skill for defense-in-depth patterns and error boundary design. --- Make the workflow resilient. Every external call will fail eventually — model APIs, tools, databases, third-party services. Fortify ensures the workflow handles failure gracefully. ### Fortification Layers **Layer 1: Input Validation** - Validate all inputs before processing - Return clear error messages for invalid input - Set size limits on all input fields **Layer 2: Retry with Backoff** For transient failures (network errors, rate limits, timeouts): ```yaml Retry strategy: max_retries: 3 initial_delay: 1s backoff_multiplier: 2 max_delay: 30s retryable_errors: [429, 500, 502, 503, 504, TIMEOUT, CONNECTION_ERROR] non_retryable_errors: [400, 401, 403, 404] ``` **Layer 3: Fallback Responses** When retries are exhausted: - Use a cached previous response (if applicable) - Use a simpler/cheaper model as fallback - Return a graceful degradation response - Escalate to human review **Layer 4: Circuit Breakers** When a service is consistently failing: ```yaml Circuit breaker: failure_threshold: 5 consecutive failures state: CLOSED → OPEN (after threshold) → HALF_OPEN (after cooldown) cooldown: 60 seconds half_open_max_requests: 1 ``` **Layer 5: Timeout Controls** Every external call needs a timeout: - Model API calls: 30-120s depending on task - Tool executions: 10-60s depending on tool - Database queries: 5-15s - Third-party APIs: 10-30s ### Fortification Audit For each component, verify: - [ ] Input validation present - [ ] Retry logic for transient failures - [ ] Fallback for when retries fail - [ ] Timeout set - [ ] Error logged with context - [ ] User gets a meaningful error (not a stack trace) ### Recommended Next Step After fortification, run `/evaluate` to verify error handling works under realistic failure scenarios. **NEVER**: - Retry non-retryable errors (authentication failures, validation errors) - Retry without backoff (you'll make the problem worse) - Swallow errors silently (log and handle, don't ignore) - Set infinite timeouts (they'll hang forever) - Skip the fallback (retries exhausted with no fallback = user sees an error)
Related in fix
zero-defect
IncludedUse when you need maximum precision on a critical task — production deployments, security-sensitive code, financial calculations, or any work where mistakes are unacceptable.
fix
calibrate
IncludedUse when workflow components are inconsistent, naming conventions vary, or a new team member's work needs alignment to project standards.
fix
refine
IncludedUse when the workflow works but needs polish, or as the final step in a diagnose → fix → refine cycle before shipping.
fix
streamline
IncludedUse when the workflow feels too complex, has accumulated cruft, or has redundant steps and overlapping tools that need consolidation.
fix