runaway-guard
Cost-safety discipline for paid AI / inference APIs: treat $-cost as a third complexity dimension alongside time and space. Forces a written per-run $-cap, per-day $-cap, max-iterations bound, concurrency limit, and a matching provider-dashboard hard cap BEFORE any call site is written.
# runaway-guard — $-Cost is the Third Complexity Dimension
Every loop has time complexity and space complexity. A loop that calls a paid API has a third: **dollars per execution**. The model tracks the first two automatically. It does not track the third, so it ships code where a single bug — a retry without bound, a stream reconnect storm, an agent that re-queues itself, a webhook that fires the same job twice — silently spends real money.
The canonical incident: a developer writes a Fal.ai image-generation loop. The loop "obviously terminates" because it iterates over a fixed list. But the list comes from a callback that fires on every Inngest retry, and each retry doubles the list. By morning, the bill is **$200**. Tests passed. Code review passed. The bug is not in the loop body. The bug is that **no one stated the wallet invariant**.
runaway-guard fixes this. State the max calls. State the max dollars per run. State the max dollars per day. Set the same caps in the provider dashboard so a code bug cannot bypass them. Then write the code.
**Violating the letter of these rules is violating the spirit of the skill.** "I'm only testing locally" is the exact rationalization that ships the $200 bill — local code hits the same paid API as production.
## When to Use This Skill
Use **runaway-guard** when:
- Writing or reviewing code that calls a paid AI / inference API in a loop, queue, retry path, agent step, webhook handler, or background job.
- Importing or wrapping any paid-inference SDK: `@fal-ai/*`, `fal-client`, `@anthropic-ai/sdk`, `anthropic`, `openai`, `replicate`, `elevenlabs`, `together-ai`, `groq-sdk`, `cohere-ai`, `@mistralai/*`.
- Designing an agent loop, fan-out pipeline, retry wrapper, polling job, stream reconnect, or self-rescheduling job that may call a billed endpoint.
- Auditing a codebase / PR for unbounded fan-out, unbounded retries, missing idempotency keys, or missing provider-side spend caps.
- Diagnosing an unexpected bill, runaway loop incident, or surprise overage.
## The Iron Law
```text
NO CALL TO A PAID API WITHOUT A WRITTEN $-CAP AT BOTH THE CODE AND PROVIDER LEVEL
```
A cap only in code can be bypassed by a bug in that code. A cap only at the provider can be hit during normal usage and degrade the product. You need both. If you cannot state both in one sentence each, you have not designed the call site — you have written a wish.
## Non-negotiable rules
1. **Every call site gets a one-line cost contract.** Before writing any paid-API call, state in one sentence:
- **Max calls per run:** the strict upper bound on invocations in a single execution of this code path.
- **Max $ per run:** `max_calls × unit_cost` — compute it, don't estimate.
- **Max $ per day:** the provider-side hard cap that backstops the code-side bound.
Examples:
- "Fal flux-pro at $0.05/image; max 20 images per job; max $1 per job; provider Spend Limit $50/day."
- "Anthropic Sonnet at ~$0.015 per request (cached); max 50 requests per agent run; max $0.75 per run; Workspace Budget hard cap $30/day."
If you cannot fill in all three numbers, you have not designed the call site.
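One way to keep the three numbers honest is to hold them in code and compute the per-run figure rather than typing it by hand. The sketch below is illustrative only: the `CostContract` shape, its field names, and the helper functions are hypothetical, and prices are stored as integer micro-dollars (an assumption made here to avoid floating-point drift).

```typescript
// Hypothetical cost-contract shape; field names are illustrative, not from any SDK.
interface CostContract {
  provider: string;
  unitCostMicroUsd: number;    // price per call, in millionths of a dollar
  maxCallsPerRun: number;      // strict upper bound on invocations per run
  providerDailyCapUsd: number; // hard cap configured in the provider dashboard
}

// Max $ per run is computed as max_calls × unit_cost, never estimated.
function maxUsdPerRun(c: CostContract): number {
  if (!Number.isInteger(c.maxCallsPerRun) || c.maxCallsPerRun <= 0) {
    throw new Error("maxCallsPerRun must be a positive integer");
  }
  return (c.maxCallsPerRun * c.unitCostMicroUsd) / 1000000;
}

// Renders the one-line contract that goes in a comment above the call site.
function describeContract(c: CostContract): string {
  return (
    `${c.provider}: max ${c.maxCallsPerRun} calls per run; ` +
    `max $${maxUsdPerRun(c).toFixed(2)} per run; ` +
    `provider hard cap $${c.providerDailyCapUsd}/day.`
  );
}

// The Fal example from the text: $0.05/image, 20 images, $50/day cap.
const falContract: CostContract = {
  provider: "Fal flux-pro",
  unitCostMicroUsd: 50000,
  maxCallsPerRun: 20,
  providerDailyCapUsd: 50,
};
```

Calling `describeContract(falContract)` yields a sentence in the same shape as the examples above, with the $1 per-run figure derived from the other two numbers.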
2. **Every loop calling a paid API gets an explicit iteration bound, not just a termination argument.** `invariant-guard` requires a termination measure. runaway-guard requires the bound to be a **concrete integer in code**, not just "eventually terminates":
```ts
// ❌ Terminates in theory. Bills $200 in practice.
while (job.status !== 'done') {
  await fal.run(...);
}

// ✅ Concrete bound: wallet invariant explicit.
const MAX_CALLS = 20;
for (let i = 0; i < MAX_CALLS && job.status !== 'done'; i++) {
  await fal.run(...);
}
if (job.status !== 'done') throw new Error('exceeded MAX_CALLS budget');
```
3. **Every retry path is bounded by attempts AND total elapsed cost, not by time alone.** Exponential backoff with no attempt cap is a wallet attack on yourself.
- Max attempts: a small integer (3–5 total attempts for transient errors; exactly 1 for 4xx, meaning no retry).
- Cap the attempt count across the whole pipeline, not just within one library: Inngest retries × SDK retries × your own retry wrapper multiply.
- 4xx errors do not retry. Period. They will not become 2xx; they will just bill again.
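The retry rules above can be sketched as a small wrapper. This is a minimal illustration, not any SDK's retry API: `callWithBoundedRetry`, `statusOf`, and `worstCaseAttempts` are hypothetical names, and the sketch assumes errors carry a numeric `status` property, which real SDKs may expose differently.

```typescript
// Assumption: failed calls throw an object with a numeric `status` field.
function statusOf(err: unknown): number | undefined {
  if (typeof err === "object" && err !== null && "status" in err) {
    const status = (err as { status: unknown }).status;
    return typeof status === "number" ? status : undefined;
  }
  return undefined;
}

// Retries are bounded by an attempt count, not by time alone.
async function callWithBoundedRetry<T>(
  call: () => Promise<T>,
  maxAttempts: number = 3,
  baseDelayMs: number = 100,
): Promise<T> {
  if (!Number.isInteger(maxAttempts) || maxAttempts < 1) {
    throw new Error("maxAttempts must be a positive integer");
  }
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await call();
    } catch (err) {
      lastError = err;
      const status = statusOf(err);
      // 4xx: per the rule above, never retry; it would only bill again.
      if (status !== undefined && status >= 400 && status < 500) throw err;
      if (attempt < maxAttempts) {
        // Exponential backoff between attempts, still under the attempt cap.
        const delayMs = baseDelayMs * 2 ** (attempt - 1);
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}

// Retry layers multiply: worst-case billed attempts is the product of all layers.
function worstCaseAttempts(...attemptsPerLayer: number[]): number {
  return attemptsPerLayer.reduce((product, n) => product * n, 1);
}
```

For instance, a queue configured for 4 attempts wrapping an SDK that makes 3 attempts wrapping a 3-attempt wrapper like this one gives `worstCaseAttempts(4, 3, 3)`, which is 36 billed calls for one logical request. That product, not any single layer's setting, is the number that belongs in the cost contract.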
4. **Every fan-out path declares a concurrency limit.** Parallel calls multiply cost per wall-clock second. State the limit in code, at the queue (Inngest `concurrency`), and at the provider where supported:
- Inngest: `concurrency: { limit: N }` on the function.
- BullMQ / Sidekiq / Cloud Tasks: queue-level concurrency.
- In-process: `p-limit`, semaphore, or batched `Promise.all` chunks — never an unbounded `Promise.all(items.map(...))` on a paid API.
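For the in-process case, a dependency-free limiter might look like the sketch below. `mapWithConcurrency` is a hypothetical helper written for illustration; libraries such as `p-limit` cover the same ground. It also stops handing out new work after the first failure, on the assumption that a failing batch should not keep billing.

```typescript
// Runs `worker` over `items` with at most `limit` calls in flight at once.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new Error("limit must be a positive integer");
  }
  const results: R[] = new Array(items.length);
  let next = 0;       // index of the next unclaimed item
  let failed = false; // set on first error so no further calls are started

  // Each lane claims one item at a time, so `limit` lanes means `limit` calls in flight.
  async function lane(): Promise<void> {
    while (!failed && next < items.length) {
      const i = next++;
      try {
        results[i] = await worker(items[i], i);
      } catch (err) {
        failed = true;
        throw err;
      }
    }
  }

  const laneCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

Replacing `Promise.all(items.map(callApi))` with `mapWithConcurrency(items, 3, callApi)` keeps result order while holding the in-flight count at 3. Calls already in flight when an error occurs still complete and still bill.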
5. **Every paid API has a matching provider-side hard cap, configured out of band.** Defense in depth: if the code is wrong, the provider stops the bleeding. Document the cap in the same file as the call site so future readers know it exists.
| Provider | Where to set the hard cap |
|---|---|
| **Fal.ai** | Dashboard → Billing → **Spend Limit** (e.g. $50/day). Hard stop on exceed. |
| **Anthropic** | Console → Workspaces → **Workspace Budget** with hard limit. Per-workspace, per-month. |
| **OpenAI** | Org → Settings → **Usage limits** (org-level hard limit blocks requests). ⚠️ Per-*project* monthly budgets are **soft thresholds only** — they alert but do not block. For a real hard cap use the org-level Usage limit, a billing gateway, or your own fail-closed budget check. |
| **Replicate** | Account → Billing → **Spend limit**. Per account. |
| **ElevenLabs** | Workspace → **Usage limits** per workspace / API key. |
| **Together / Groq / Cohere / Mistral** | Each has a billing dashboard with a monthly spend cap — set it before first deploy, not after. |
No hard cap, no call site. Set the cap before the first request, not after the first incident.
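Where the provider offers only soft thresholds, the table mentions a fail-closed budget check of your own. A minimal sketch of that idea follows. `DailyBudget` is a hypothetical class, amounts are integer micro-dollars, and the counter lives in process memory, which is an assumption made to keep the example short; a multi-process deployment would need a shared store. It complements the provider-side cap and does not replace it.

```typescript
// Hypothetical in-memory daily budget; fails closed when the cap would be exceeded.
class DailyBudget {
  private readonly capMicroUsd: number;
  private readonly now: () => Date;
  private day: string;
  private spentMicroUsd = 0;

  // `now` is injectable so the day rollover can be tested.
  constructor(capMicroUsd: number, now: () => Date = () => new Date()) {
    this.capMicroUsd = capMicroUsd;
    this.now = now;
    this.day = this.today();
  }

  private today(): string {
    return this.now().toISOString().slice(0, 10); // UTC calendar day, YYYY-MM-DD
  }

  // Reserve the cost BEFORE making the call; throw if it would break the cap.
  reserve(costMicroUsd: number): void {
    const today = this.today();
    if (today !== this.day) {
      this.day = today;
      this.spentMicroUsd = 0;
    }
    if (this.spentMicroUsd + costMicroUsd > this.capMicroUsd) {
      throw new Error(
        `daily budget exceeded: ${this.spentMicroUsd} + ${costMicroUsd} > ${this.capMicroUsd} micro-USD`,
      );
    }
    this.spentMicroUsd += costMicroUsd;
  }

  get remainingMicroUsd(): number {
    return this.capMicroUsd - this.spentMicroUsd;
  }
}
```

Reserving before the call, rather than recording after it, is the fail-closed part: a crash between call and bookkeeping then over-counts spend instead of under-counting it.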
6. **Idempotency keys on every mutating or charging call.** A webhook that fires twice should bill once. Without an idempotency key, retry policies you cannot see (load balancer, framework, gateway) silently double-charge.
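As a rough illustration of "fires twice, bills once", the sketch below derives a key from stable business identifiers and deduplicates on it. Both `idempotencyKey` and `chargeOnce` are hypothetical helpers, and the in-memory `Map` is an assumption for brevity; a real deployment would use a durable store, plus the provider's own idempotency mechanism where one exists.

```typescript
// Build the key from stable identifiers (job id, step name), never from
// timestamps or random values, so a redelivery maps to the same key.
function idempotencyKey(...parts: string[]): string {
  return parts.map((p) => encodeURIComponent(p)).join(":");
}

// Keys that are in flight or already completed. In-memory for illustration only.
const charged = new Map<string, Promise<unknown>>();

// Runs `charge` at most once per key; duplicate deliveries share the first result.
function chargeOnce<T>(key: string, charge: () => Promise<T>): Promise<T> {
  const prior = charged.get(key);
  if (prior !== undefined) return prior as Promise<T>;
  const attempt = charge();
  charged.set(key, attempt);
  // Assumption: a rejected call was not billed, so a later delivery may try again.
  // If the provider may bill on failure, keep the key instead of deleting it.
  attempt.catch(() => charged.delete(key));
  return attempt;
}
```

A webhook handler would then wrap its billed call as `chargeOnce(idempotencyKey(jobId, "render"), () => callApi(job))`, where `callApi` stands in for whatever paid call the handler makes.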
7. **Make the "amplifier" patterns explicit and forbidden by default.** These are the shapes that turn small bugs into large bills:
- **Self-rescheduling jobs.** A job that re-enqueues itself with no decrementing measure is an unbounded loop with extra steps.
- **Webhook handlers that call the API that called the webhook.** Cycle detection or it will cycle.
- **Recursion over LLM output.** "Ask the model what to do next" with no depth cap is a depth-unbounded recursion in dollars.
- **Polling without a deadline.** `while (!done) await poll()` with no `maxWaitMs` is a wallet leak.
- **Streaming reconnect storms.** A WebSocket / SSE reconnect with no backoff and no attempt cap can hammer a billed endpoint thousands of times per minute.
- **Cache-miss stampede on a paid call.** N concurrent requests for the same uncached key → N billed calls. Use `singleflight` / request coalescing.
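To make one of these amplifiers concrete, here is a sketch of polling with both a deadline and a poll-count cap, the bounded counterpart to `while (!done) await poll()`. `pollUntilDone` and its option names are hypothetical; the sketch assumes `poll` resolves to `undefined` while the job is still pending.

```typescript
interface PollOptions {
  maxWaitMs: number;  // wall-clock deadline for the whole wait
  intervalMs: number; // pause between polls
  maxPolls: number;   // hard cap on (possibly billed) poll calls
}

// Polls until a result arrives, the poll budget is spent, or the deadline would pass.
async function pollUntilDone<T>(
  poll: () => Promise<T | undefined>,
  opts: PollOptions,
): Promise<T> {
  const deadline = Date.now() + opts.maxWaitMs;
  for (let i = 0; i < opts.maxPolls; i++) {
    const result = await poll();
    if (result !== undefined) return result;
    // Stop early if sleeping another interval would overshoot the deadline.
    if (Date.now() + opts.intervalMs > deadline) break;
    await new Promise((resolve) => setTimeout(resolve, opts.intervalMs));
  }
  throw new Error("polling budget exhausted before the job finished");
}
```

Two bounds are used because they fail differently: `maxPolls` caps the number of calls even if the clock misbehaves, while `maxWaitMs` caps wall-clock time even if each poll is slow.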
## The pre-write protocol
Before producing code that calls a paid API, your message must contain — in this order:
1. **Provider + unit cost.** "Fal flux-pro: $0.05/image, billed per success."
2. **Max calls per run.** A literal integer that will appear as a constant in the code.
3. **Max $ per run.** `max_calls × unit_cost`. Compute it.
4. **Max $ per day (provider hard cap).** The dashboard setting that backstops the code.
5. **Concurrency limit.** In code, at the queue, at the provider.
6. **Retry policy.** Max attempts, which error codes retry, idempotency key strategy.
7. **Amplifier audit.** Walk the list in rule 7; declare "none apply" or address each that does.
8. **The code** — with the cost contract in a comment above the call site.
9. **Self-check.** One line: "in the worst case, this code bills $X and the provider cap stops it at $Y."
If any of 1–7 is missing, do not emit code.
## Worked trap — the Inngest + Fal $200 n