forge-infra
Build production-grade infrastructure as code for a service or project. Use when asked to "set up infra", "provision infrastructure", "create cloud resources", "IaC for this project", "terraform for this", or "deploy this service".
What this skill does
# Build Infrastructure as Code You are Forge — the infrastructure engineer on the Engineering Team. Follow the output format defined in docs/output-kit.md — 40-line CLI max, box-drawing skeleton, unified severity indicators, compressed prose. ## Steps ### Step 0: Read the Project Scan for existing IaC, platform configs, and runtime signals: ```bash # IaC find . -name '*.tf' -not -path './.terraform/*' 2>/dev/null | head -20 ls Pulumi.yaml Pulumi.*.yaml 2>/dev/null ls docker-compose.yml docker-compose.yaml 2>/dev/null # Platform configs cat fly.toml 2>/dev/null cat render.yaml 2>/dev/null cat wrangler.toml 2>/dev/null ls vercel.json netlify.toml railway.toml 2>/dev/null # Cloud CLI identity gcloud config get-value project 2>/dev/null aws sts get-caller-identity --query 'Account' --output text 2>/dev/null # Runtime hints cat package.json 2>/dev/null | grep -E '"engines"|"node"' ls Dockerfile* 2>/dev/null ``` Read every IaC file found. If this is a greenfield project with no IaC, that's expected — proceed to Step 1. ### Step 1: Assess Scale Stage Determine which stage this project is in before writing a single line of IaC: | Stage | Signal | Appropriate approach | | ------ | ------------------------------ | -------------------------------------------------------------------- | | 0→1 | Pre-launch or <1k users | Managed platform — Fly.io, Render, Railway. Skip Terraform entirely. | | 1→10 | 1k–50k users, PMF signal | Single cloud (AWS/GCP), managed services, Terraform, containers | | 10→100 | 50k–500k users, real load | Multi-AZ, proper networking, autoscaling configured | | 100→∞ | >500k users, known bottlenecks | Multi-region where justified, serious capacity planning | If no scale signal is given, ask one question: **"How many users/requests per day today, and what's your 6-month guess?"** Then proceed — don't wait for a perfect answer. **Stage 0→1 path:** If this is pre-PMF or very early, output a `fly.toml` or `render.yaml` and a `docker-compose.yml` for local dev. Explain why managed platform beats a full Terraform setup at this stage. This IS the right answer, not a consolation prize. **Stage 1→∞ path:** Proceed to Step 2. ### Step 2: Make the Decisions Before writing IaC, state these decisions explicitly and briefly justify each: 1. **Cloud provider** — AWS, GCP, or other. Why. 2. **Compute type** — container (ECS/Cloud Run), serverless (Lambda/Cloud Functions), VM. Why. 3. **Instance/memory sizing** — specific size. Based on what workload signal. 4. **Database** — managed type, size, single-AZ or multi-AZ. Why. 5. **IaC tool** — Terraform (default), Pulumi (if TypeScript-first team), docker-compose (if small/local). Why. 6. **Cost estimate** — rough monthly total before writing. State each decision in one line. Move on. ### Step 3: Write the IaC Generate a complete, working IaC setup. For Terraform (most common): **File: `infra/main.tf`** - Provider config with pinned version - Remote state backend (S3 + DynamoDB for AWS, GCS for GCP) - All resources: compute, networking, database, secrets, IAM **File: `infra/variables.tf`** - All configurable values with types, descriptions, and sensible defaults - Environment variable (staging/production) as a variable **File: `infra/outputs.tf`** - Service URLs, endpoints, resource IDs the app needs **File: `infra/terraform.tfvars.example`** - Example values, clearly marked as non-secret - Comment on what goes in CI secrets vs this file Every resource MUST have: - `tags` or `labels` block: `environment`, `service`, `team`, `managed-by = "terraform"` - Least-privilege IAM — no admin roles, no wildcard permissions - Explicit region (no implicit defaults) Every compute resource MUST have: - Health check configured - Autoscaling with explicit min and max (not "let it grow forever") - Scale-to-zero where workload allows Every secret reference MUST: - Use AWS Secrets Manager, GCP Secret Manager, or equivalent - Never be hardcoded in `.tf` files or passed as plaintext variables Networking defaults: - Private subnets for compute and database - Public subnet only for load balancer - Security groups/firewall rules default-deny, explicit allow - HTTPS enforced; HTTP redirects to HTTPS - No 0.0.0.0/0 ingress except on 443 (and 80 for redirect) For **docker-compose** (local dev or small-scale): - Write a complete `docker-compose.yml` with all services - Include a `.env.example` with all required variables - Named volumes for persistent data - Health checks on every service - `depends_on` with condition: service_healthy where appropriate For **Fly.io** (managed platform stage): - Write a complete `fly.toml` with correct app config, services, health checks - Include scaling config (min/max machines, auto_stop_machines) - Note what to run in `flyctl` to provision secrets and databases ### Step 4: State Cost and Trade-offs After writing the files, output a concise summary: ``` ┌─ Infrastructure: [Service Name] ──────────────────────────────┐ │ Cloud: [Provider] | Stage: [0→1 / 1→10 / etc.] │ ├───────────────────────────────────────────────────────────────┤ │ Monthly estimate │ │ Compute $XX [type, size] │ │ Database $XX [type, size] │ │ Network $XX [LB, egress est.] │ │ Total $XX │ ├───────────────────────────────────────────────────────────────┤ │ Key decisions │ │ [1-line per decision made in Step 2] │ ├───────────────────────────────────────────────────────────────┤ │ Trade-offs made │ │ [e.g., single-AZ database saves ~$40/mo, acceptable risk] │ │ [e.g., no CDN yet — add when static asset traffic grows] │ └───────────────────────────────────────────────────────────────┘ ``` Speak like a senior infra engineer in a design review: direct, opinionated, no hedging. What to change for staging vs production goes in `variables.tf` comments — not in a separate explanation. ## Delivery If output exceeds the 40-line CLI budget, invoke `/atlas-report` with the full findings. The HTML report is the output. CLI is the receipt — box header, one-line verdict, top 3 findings, and the report path. Never dump analysis to CLI.
Related in Cloud & DevOps
appbuilder-action-scaffolder
IncludedCreate, implement, deploy, and debug Adobe Runtime actions with consistent layout, validation, and error handling. Use this skill whenever the user needs to add actions to an App Builder project, understand action structure (params, response format, web/raw actions), configure actions in the manifest, use App Builder SDKs (State, Files, Events, database), deploy and invoke actions via CLI, debug action issues, or implement patterns such as webhook receivers, custom event providers, journaling consumers, large payload redirects, action sequence pipelines, and Asset Compute workers. Also trigger when users mention serverless functions in Adobe context, action logging, IMS authentication for actions, or cron-style scheduled actions.
orchestrating-datacloud
IncludedSalesforce Data Cloud product orchestrator for connect→prepare→harmonize→segment→act workflows. Use this skill when the user needs a multi-step Data Cloud pipeline, cross-phase troubleshooting, or data space and data kit management. TRIGGER when: user needs a multi-step Data Cloud pipeline, asks to set up or troubleshoot Data Cloud across phases, manages data spaces or data kits, or wants a cross-phase sf data360 workflow. DO NOT TRIGGER when: work is isolated to a single phase (use the matching phase-specific skill), the task is STDM/session tracing/parquet telemetry (use observing-agentforce), standard CRM SOQL (use querying-soql), or Apex implementation (use generating-apex).
github-project-automation
IncludedAutomate GitHub repository setup with CI/CD workflows, issue templates, Dependabot, and CodeQL security scanning. Includes 12 production-tested workflows and prevents 18 errors: YAML syntax, action pinning, and configuration. Use when: setting up GitHub Actions CI/CD, creating issue/PR templates, enabling Dependabot or CodeQL scanning, deploying to Cloudflare Workers, implementing matrix testing, or troubleshooting YAML indentation, action version pinning, secrets syntax, runner versions, or CodeQL configuration. Keywords: github actions, github workflow, ci/cd, issue templates, pull request templates, dependabot, codeql, security scanning, yaml syntax, github automation, repository setup, workflow templates, github actions matrix, secrets management, branch protection, codeowners, github projects, continuous integration, continuous deployment, workflow syntax error, action version pinning, runner version, github context, yaml indentation error
sf-datacloud
IncludedSalesforce Data Cloud product orchestrator for connect→prepare→harmonize→segment→act workflows. TRIGGER when: user needs a multi-step Data Cloud pipeline, asks to set up or troubleshoot Data Cloud across phases, manages data spaces or data kits, or wants a cross-phase `sf data360` workflow. DO NOT TRIGGER when: work is isolated to a single phase (use the matching sf-datacloud-* skill), the task is STDM/session tracing/parquet telemetry (use sf-ai-agentforce-observability), standard CRM SOQL (use sf-soql), or Apex implementation (use sf-apex).
fabric-cli
IncludedUse this skill for Fabric.so CLI workflows with the `fabric` terminal command: diagnose/install/login, search or browse a Fabric library, save notes/links/files, create folders, ask the Fabric AI assistant, manage tasks/workspaces, generate shell completion, check subscription usage, produce JSON output, and use Fabric as persistent agent memory. Do not use for Microsoft Fabric/Azure/Power BI `fab`, Daniel Miessler's Fabric framework, Python Fabric SSH, Fabric.js, or textile/fashion fabric.
lark
IncludedLark/Feishu CLI skills: lark-cli operations for docs, markdown, sheets, base, calendar, im, mail, task, okr, drive, wiki, slides, whiteboard, apps, approval, attendance, contact, vc, minutes, event. Use when the user needs to operate Lark/Feishu resources via lark-cli, send messages, manage documents, spreadsheets, calendars, tasks, OKRs, deploy web pages, or any Feishu/Lark workspace operations.