chief-ai-officer-advisor
Chief AI Officer advisory for startups: model build-vs-buy decisions (API vs fine-tune vs in-house), AI risk classification under EU AI Act + US state patchwork, AI cost economics (API-to-self-hosted breakeven), and AI team org evolution. Use when deciding whether to call an API or fine-tune, classifying AI use cases for regulatory risk, calculating when self-hosting pays off, sequencing AI hires, or when user mentions CAIO, AI strategy, model selection, foundation model, fine-tuning, EU AI Act, NIST AI RMF, AI governance, model risk, or AI economics. Strategic only — does not duplicate engineering AI/ML skills.
What this skill does
# Chief AI Officer Advisor

Strategic AI leadership for startup CAIOs and founders without one.

**Four decisions, no AI hype:**

1. **Should we use an API, fine-tune, or build our own?** Model build-vs-buy with 3-year TCO.
2. **Is this AI use case high-risk under regulation, and how do we govern it?** EU AI Act + NIST AI RMF + US state patchwork.
3. **When do we switch from API to self-hosted, and at what cost?** Token economics with breakeven analysis.
4. **What AI role do we hire next?** Stage-to-role map (AI engineer ≠ ML engineer ≠ research scientist).

This skill does **not** cover tactical AI/ML engineering. For RAG implementation, agent design, prompt engineering, eval infrastructure, model deployment, or cost optimization, see `engineering/rag-architect/`, `engineering/agent-designer/`, `engineering/prompt-governance/`, `engineering/self-eval/`, `engineering/llm-cost-optimizer/`.

## Keywords

CAIO, chief AI officer, AI strategy, model selection, foundation model, fine-tuning, RLHF, DPO, LoRA, QLoRA, build vs buy, AI build-vs-buy, model risk tier, EU AI Act, AI Act Article 6, Article 9, Article 10, Annex III, prohibited AI, high-risk AI, NIST AI RMF, AI risk management framework, NYC Local Law 144, Colorado SB 21-169, Illinois HB 53, model card, eval set, eval harness, hallucination rate, jailbreak risk, prompt injection, AI red team, AI safety, alignment, model lifecycle, model registry, API-to-self-hosted breakeven, GPU economics, A100, H100, inference cost, fine-tuning cost, AI team, AI engineer, ML engineer, research scientist, MLOps, AI platform

## Quick Start

```bash
# Decision A: API vs fine-tune vs build
python scripts/model_buildvsbuy_calculator.py  # embedded customer-support sample
python scripts/model_buildvsbuy_calculator.py path/to/use_case.json

# Decision B: Risk classification under EU AI Act + US state laws
python scripts/ai_risk_classifier.py  # embedded hiring-AI sample
python scripts/ai_risk_classifier.py path/to/use_case.json

# Decision C: API vs self-hosted economics
python scripts/ai_cost_economics.py  # embedded 5M tokens/day sample
python scripts/ai_cost_economics.py path/to/workload.json
```

## Key Questions (ask these first)

- **What does this AI need to be good at, and how would you measure it?** (If no eval set, no ship.)
- **What's the SLO on hallucination / error rate?** (Without one, "AI quality" is a vibe.)
- **What happens when the model is wrong?** (Fallback behavior, human-in-the-loop, blast radius.)
- **What's the risk tier under EU AI Act, and is conformity assessment required?** (Determines product launch timeline.)
- **At what monthly token volume does self-hosting beat API?** (Almost never below 100M tokens/month at frontier quality.)
- **Are we hiring an AI engineer or an ML research scientist?** (Different jobs; founders confuse them.)

## Core Responsibilities

### 1. Model Build-vs-Buy

The decision is not "use AI or not"; it's **API vs fine-tune vs in-house** for each use case. Each path has a different TCO curve, latency profile, and capability ceiling.

**Default path: API (frontier model)**
- Use when: well-served by frontier (Claude, GPT, Gemini), QPS < 100, latency budget > 1s, cost < $50K/month
- Why: frontier APIs are 10-100x more capable than what most teams can fine-tune in-house
- Failure mode: API rate limits at scale, vendor lock-in, capability drift between model versions

**Fine-tune a smaller model**
- Use when: domain-specific behavior the API can't be prompted into (medical coding, legal redlining), high volume reducing API cost, latency budget < 500ms, specific style/format consistency required
- Approaches: full fine-tune (rare), LoRA/QLoRA (common), RLHF/DPO (when alignment matters)
- Failure mode: fine-tuned model lags frontier capability within 6-12 months; ongoing retraining cost

**Build from scratch / pre-train**
- Use when: almost never. You're a foundation-model company, OR you have a unique data corpus, $50M+ funding, and 18+ month patience.
- Failure mode: by the time you ship, frontier models have caught up and your sunk cost is unrecoverable

**Run** `model_buildvsbuy_calculator.py` for a use-case-specific recommendation with 3-year TCO. See `references/model_buildvsbuy_strategy.md` for the full decision tree.

### 2. AI Risk Classification & Governance

The 2026 question every founder is facing: **does this AI use case trigger high-risk regulatory obligations?**

**EU AI Act tiers (the Act entered into force in August 2024; obligations phase in, with most high-risk requirements scheduled for August 2026):**

| Tier | Examples | Obligations |
|---|---|---|
| **Prohibited** | Social scoring, real-time biometric surveillance, manipulative AI | Cannot deploy in EU |
| **High-risk** | Employment screening, credit scoring, education access, critical infrastructure, law enforcement, biometric ID | Conformity assessment, registration, post-market monitoring, transparency, human oversight |
| **Limited-risk** | Chatbots, deepfakes, emotion recognition | Transparency: user must know they're interacting with AI |
| **Minimal-risk** | Recommendation systems, spam filters, most B2B SaaS internals | No specific obligations |

**Run** `ai_risk_classifier.py` to classify a use case and get the required-controls list.
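
The tier table above can be read as a lookup where a use case that touches several categories inherits the strictest tier. The sketch below illustrates that rule only; it is not the logic of `ai_risk_classifier.py`. The category labels, the `classify` function, and the choice to reject unmapped categories are all assumptions made for illustration, and none of this is legal advice.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Tiers from the table above, ordered so that max() picks the strictest."""
    MINIMAL = 0
    LIMITED = 1
    HIGH = 2
    PROHIBITED = 3


# Hypothetical category labels, one per example listed in the tier table.
TIER_BY_CATEGORY = {
    "social_scoring": Tier.PROHIBITED,
    "realtime_biometric_surveillance": Tier.PROHIBITED,
    "manipulative_ai": Tier.PROHIBITED,
    "employment_screening": Tier.HIGH,
    "credit_scoring": Tier.HIGH,
    "education_access": Tier.HIGH,
    "critical_infrastructure": Tier.HIGH,
    "law_enforcement": Tier.HIGH,
    "biometric_id": Tier.HIGH,
    "chatbot": Tier.LIMITED,
    "deepfake": Tier.LIMITED,
    "emotion_recognition": Tier.LIMITED,
    "recommendation_system": Tier.MINIMAL,
    "spam_filter": Tier.MINIMAL,
}

# Obligations column of the same table, abbreviated.
OBLIGATIONS = {
    Tier.PROHIBITED: ["cannot deploy in EU"],
    Tier.HIGH: [
        "conformity assessment",
        "registration",
        "post-market monitoring",
        "transparency",
        "human oversight",
    ],
    Tier.LIMITED: ["transparency: disclose AI interaction"],
    Tier.MINIMAL: [],
}


def classify(categories):
    """Return the strictest tier across all categories a use case touches."""
    if not categories:
        raise ValueError("at least one category is required")
    unknown = [c for c in categories if c not in TIER_BY_CATEGORY]
    if unknown:
        # Assumption: unmapped categories go to manual review, never to "minimal".
        raise ValueError(f"unmapped categories need manual review: {unknown}")
    return max(TIER_BY_CATEGORY[c] for c in categories)


if __name__ == "__main__":
    tier = classify(["chatbot", "employment_screening"])
    print(tier.name, OBLIGATIONS[tier])
```

Failing loudly on an unknown category is a deliberate choice in this sketch: silently defaulting to minimal-risk is the more expensive mistake.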
**US state patchwork (non-exhaustive):**
- NYC LL 144: Automated Employment Decision Tools (AEDTs) require annual bias audit + candidate notice
- Colorado AI Act (SB 24-205): AI in consequential consumer decisions (credit, insurance, employment, housing). The separate SB 21-169 covers insurers' use of algorithms and external data.
- Illinois HB 53: AI in interview/hiring
- California SB 1001: Bot disclosure
- Texas CUBI: Biometric identifier capture
- Federal NIST AI RMF: voluntary; increasingly referenced in contracts

**Industry-specific overlays:**
- Healthcare: FDA AI/ML guidance (2023), MDR (EU) for medical-device AI, 510(k) pathway for AI/ML-enabled medical devices
- Financial: NYDFS Reg 23, FTC Section 5, ECOA for credit decisions
- Insurance: NAIC model bulletin, state insurance commissioner rules

See `references/ai_risk_governance.md` for the full regulatory landscape + governance program checklist.

### 3. AI Cost Economics

**The breakeven question:** at what monthly token volume does self-hosted inference beat API costs?

**Key components:**
- **API cost**: variable, per-token. Frontier models 2026: Claude Sonnet 4.6 ~$3/$15 per M tokens (input/output), GPT-4o ~$2.50/$10, Gemini 2.5 ~$1.25/$5
- **Self-hosted cost**: fixed (GPU commitment) + variable (electricity). H100 spot ~$2-5/hour, A100 spot ~$1-3/hour. Llama 3.1 70B / Qwen 2.5 72B: ~$0.50-2.00 per million output tokens at 70% utilization
- **Hidden costs of self-hosting**: ops on-call, monitoring, model updates, scaling overhead, idle time penalty
- **Hidden costs of API**: rate limits requiring multi-vendor failover, vendor lock-in, capability drift between versions, data residency

**Typical breakeven (frontier-quality):** 100M–500M tokens/month, depending on model size and acceptable quality tradeoff. Below this, API wins. Above this, run the calculator.

**Run** `ai_cost_economics.py` with workload characteristics for a breakeven point + sensitivity to GPU rates and model size.

See `references/ai_cost_economics.md` for the full economics model and operational considerations.
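
The breakeven question above reduces to simple arithmetic: the monthly volume at which the fixed self-hosting bill equals what the same tokens would cost through an API. The sketch below is a simplified illustration under stated assumptions, not the model in `ai_cost_economics.py`. The function names, the 25% output-token share, and the two-GPU example are hypothetical; the $3/$15 prices and the $2/hour H100 rate come from the ranges quoted above. It ignores whether the GPUs can actually serve that volume and any quality gap between models.

```python
HOURS_PER_MONTH = 730  # average hours in a month


def blended_api_price(price_in, price_out, output_share):
    """API price per million tokens, weighted by the share of output tokens."""
    return (1 - output_share) * price_in + output_share * price_out


def self_hosted_fixed_monthly(gpu_count, gpu_hourly, ops_overhead=0.0):
    """Fixed monthly cost: always-on GPUs plus an assumed ops overhead in dollars."""
    return gpu_count * gpu_hourly * HOURS_PER_MONTH + ops_overhead


def breakeven_million_tokens(fixed_monthly, api_price_per_m, self_variable_per_m=0.0):
    """Monthly volume (millions of tokens) where self-hosting matches the API bill.

    Returns None when self-hosting never breaks even, i.e. its variable cost
    per million tokens is at least the API price.
    """
    margin = api_price_per_m - self_variable_per_m
    if margin <= 0:
        return None
    return fixed_monthly / margin


if __name__ == "__main__":
    # Illustrative inputs only: $3/$15 per M tokens, 25% of tokens are output,
    # two H100s at $2/hour, no ops overhead.
    api_price = blended_api_price(3.0, 15.0, output_share=0.25)
    fixed = self_hosted_fixed_monthly(gpu_count=2, gpu_hourly=2.0)
    volume = breakeven_million_tokens(fixed, api_price)
    print(f"blended API price: ${api_price:.2f}/M tokens")
    print(f"fixed self-hosted cost: ${fixed:,.0f}/month")
    print(f"breakeven: {volume:.0f}M tokens/month")
```

With these inputs the breakeven lands near the top of the 100M–500M range. Adding any ops overhead (on-call, monitoring) raises `fixed_monthly` and pushes the breakeven higher, which is why the hidden costs listed above matter.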
### 4. AI Team Org Evolution

**The wrong question:** "Should we hire an ML engineer or a research scientist?"

**The right question:** "What's the next AI capability we need to ship, and what role unblocks that?"

Stage-to-role map:

| Stage | First AI hire | Then | Then |
|---|---|---|---|
| Pre-PMF | Founder + 1 ML-curious engineer playing with prompts | - | - |
| Series A | **AI engineer** (applied, full-stack; owns prompts/evals/deployment) | Second AI engineer for evals/quality | - |
| Series B | AI/ML platform engineer (inference, evals, observability) | Third AI engineer for production reliability | Data scientist if model is core IP |
| Series C | Manager of AI | ML research scientist (only if model IS