promptfoo-provider-setup
Configure promptfoo providers or redteam targets for hosted models, live HTTP APIs, Python/JavaScript local scripts, agent SDKs, or multi-input systems. Use when connecting promptfoo to the system under test, mapping vars, auth env vars, request bodies, response transforms, or static-code-derived provider wrappers. Do not use for choosing eval assertions or red team plugins unless a smoke test is needed to verify the connection.
What this skill does
# Promptfoo Provider Setup
Connect Promptfoo to the system under test with the smallest reliable provider
or target configuration. Prefer a working smoke test over a clever abstraction.
Read `references/provider-patterns.md` when you need concrete YAML or provider
wrapper examples.
For OpenAPI specs, you can run the bundled
`scripts/openapi-operation-to-config.mjs` to draft a one-operation HTTP smoke
config, then inspect and edit the result before probing. The script ships in this
skill's `scripts/` directory; when the skill is installed as a plugin it lives in
the plugin cache, not your project, so run it by its absolute path (or copy it in)
rather than a bare `scripts/...` path. With `--token-env`, it
infers Bearer/OAuth2/OpenID and header/query/cookie API-key auth; use
`--auth-header`/`--auth-prefix` to override.
## Inputs
Infer from the repo or user prompt when possible:
- Target surface: hosted model, live HTTP endpoint, local function/script, agent
harness, MCP/tool agent, or redteam target.
- Invocation shape: method, URL/path, headers, request body, input vars, auth,
streaming/statefulness, and expected response field.
- Safety boundary: whether it is okay to call the live endpoint and which sample
payload is safe.
- Output goal: eval provider block, redteam `targets` block, local provider
wrapper, or a minimal smoke-test suite.
If the contract is unclear, create a conservative TODO-marked starter and state
exactly what must be verified before using it against production.
## Workflow
### 1. Pick discovery mode
Use one of these modes, or combine them:
- **Live HTTP endpoint**: probe an already-running endpoint with safe requests.
- **Static code discovery**: inspect route handlers, OpenAPI specs, tests, SDK
clients, or existing fetch/axios calls.
- **Hybrid**: compare static contract assumptions with a live probe.
- **Wrapper mode**: write `provider.js` or `provider.py` when built-in providers
cannot express auth, signing, streaming, multi-step calls, or custom parsing.
Do not send secrets to unknown endpoints. Use `{{env.VAR}}` placeholders in
configs and local environment variables only in shell commands.
### 2. Discover the contract
For live HTTP endpoints:
1. Start with non-mutating checks: docs URL, OpenAPI URL, health endpoint,
`OPTIONS`, or a safe `GET`.
2. Make at most one safe representative call before writing config.
3. Capture the response shape and status/error behavior.
4. Prefer explicit JSON paths in `transformResponse`, such as `json.output`.
5. Use `queryParams` for query-string fields on any HTTP method, and use the
`text` variable in `transformResponse` for plain-text responses.
6. Set `stateful: false` for stateless endpoints; otherwise `validate target`
will run a session-memory check. For stateful apps, include `{{sessionId}}`
in the request or configure server-side session parsing.
For static code discovery:
1. Search for route definitions, tests, and clients with `rg`.
2. Identify method, path, required headers, request schema, response schema, and
authentication source.
3. If the app constructs prompts dynamically, wrap the real code instead of
duplicating business logic in YAML.
4. For agents/tools, identify whether Promptfoo should send one string input or
a structured object with named fields.
### 3. Choose the provider pattern
- Use `id: https` for straightforward JSON HTTP APIs.
- Use `file://provider.js` or `file://provider.py` for custom auth, request
signing, streaming, retries, multi-step setup, local code, Python agent SDKs,
or complex parsing.
- Use native model providers for direct model comparisons.
- Use `targets` with `inputs` for redteam multi-input systems. Do not invent a
single `prompt` field when the real app accepts named inputs.
### 4. Implement the minimal smoke test
Add or update a config with:
- `# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json`
- A short `description`
- Provider or `targets` config with `{{env.VAR}}` for secrets
- One or two smoke tests that verify the request reaches the target and the
response transform extracts the right field
- `--no-cache` run commands
When writing a local wrapper, return `{ output }` and include structured errors
when the target response is malformed. JavaScript providers receive config in
constructor `options.config` and expose `callApi(prompt, context)`; read named
inputs from `context.vars`. Python providers use `file://provider.py` or
`file://provider.py:function_name`; the function takes `(prompt, options,
context)` and reads named inputs from `context.get("vars", {})`. Set
`config.workers: 1` for non-thread-safe SDKs, `config.timeout` for slow calls,
and `config.pythonExecutable`/`PROMPTFOO_PYTHON` for venvs. Add harmless
defaults because `validate target` may call providers without test-case vars.
### 5. Validate and run
From the promptfoo repo, use the local build:
```bash
npm run local -- validate config -c path/to/promptfooconfig.yaml
npm run local -- validate target -c path/to/promptfooconfig.yaml
npm run local -- eval -c path/to/promptfooconfig.yaml -o output.json --no-cache --no-share
```
Outside the promptfoo repo, use:
```bash
npx promptfoo@latest validate config -c path/to/promptfooconfig.yaml
npx promptfoo@latest validate target -c path/to/promptfooconfig.yaml
npx promptfoo@latest eval -c path/to/promptfooconfig.yaml -o output.json --no-cache --no-share
```
Inspect the output file for `results.stats`, `response.output`, `score`, and
`error`; do not rely only on the process exit code.
Use `--no-share` by default while probing live or internal systems. Remove it
only when the user explicitly wants a cloud share URL.
## Common Mistakes
```yaml
# WRONG: shell-style env vars are literal strings in YAML
apiKey: $API_KEY
# CORRECT: promptfoo renders Nunjucks env references
apiKey: '{{env.API_KEY}}'
```
```yaml
# WRONG: flattening a multi-input target into prompt loses attack surface
body:
prompt: '{{prompt}}'
# BETTER: preserve the real app fields
body:
user_id: '{{user_id}}'
message: '{{message}}'
```
## Output Contract
When done, state:
- Connection mode used: live, static, hybrid, or wrapper
- Files created or modified
- Required environment variables
- Safe smoke command with `--no-cache --no-share`
- What was actually verified, and what remains a TODO
Related in AI Agents
skill-development
IncludedComprehensive meta-skill for creating, managing, validating, auditing, and distributing Claude Code skills and slash commands (unified in v2.1.3+). Provides skill templates, creation workflows, validation patterns, audit checklists, naming conventions, YAML frontmatter guidance, progressive disclosure examples, and best practices lookup. Use when creating new skills, validating existing skills, auditing skill quality, understanding skill architecture, needing skill templates, learning about YAML frontmatter requirements, progressive disclosure patterns, tool restrictions (allowed-tools), skill composition, skill naming conventions, troubleshooting skill activation issues, creating custom slash commands, configuring command frontmatter, using command arguments ($ARGUMENTS, $1, $2), bash execution in commands, file references in commands, command namespacing, plugin commands, MCP slash commands, Skill tool configuration, or deciding between skills vs slash commands. Delegates to docs-management skill for official documentation.
reprompter
IncludedTransform messy prompts into well-structured, effective prompts — single or multi-agent. Use when: "reprompt", "reprompt this", "clean up this prompt", "structure my prompt", rough text needing XML tags and best practices, "reprompter teams", "repromptception", "run with quality", "smart run", "smart agents", multi-agent tasks, audits, parallel work, anything going to agent teams. Don't use when: simple Q&A, pure chat, immediate execution-only tasks. See "Don't Use When" section for details. Outputs: Structured XML/Markdown prompt, quality score (before/after), optional team brief + per-agent sub-prompts, agent team output files. Success criteria: Single mode quality score ≥ 7/10; Repromptception per-agent prompt quality score 8+/10; all required sections present, actionable and specific.
adaptive-compaction
IncludedAdaptive add-on policy and recovery layer that decides WHEN to compact, prune, snapshot, or fork -- replacing fixed-percent auto-compaction across Claude Code, Codex, and MCP-capable hosts. Trigger on auto-compact timing or damage: "when should I compact", "is it safe to compact now or start a fresh session", "auto-compact fires too early/mid-task", "switching to an unrelated task but the window still has space", "context rot", "answers get worse the longer the session runs", "the agent forgot the plan or my decisions after it summarized", "add a layer on top that manages context without changing the agent", raising autoCompactWindow to give the policy room, or installing/tuning a cross-tool compaction policy or PreCompact hook -- even when "compaction" is never said but the problem is context-window pressure or post-summarization memory loss. Do NOT use to summarize a conversation, build RAG, write a summarization prompt (decides WHEN not HOW), or answer max-context-length trivia.
agent-skill-creator
IncludedCreate cross-platform agent skills from workflow descriptions. Activates when users ask to create an agent, automate a repetitive workflow, create a custom skill, or need advanced agent creation. Triggers on phrases like create agent for, automate workflow, create skill for, every day I have to, daily I need to, turn process into agent, need to automate, create a cross-platform skill, validate this skill, export this skill, migrate this skill. Supports single skills, multi-agent suites, transcript processing, template-based creation, interactive configuration, cross-platform export, and spec validation.
llm-wiki
IncludedUse when building or maintaining a persistent personal knowledge base (second brain) in Obsidian where an LLM incrementally ingests sources, updates entity/concept pages, maintains cross-references, and keeps a synthesis current. Triggers include "second brain", "Obsidian wiki", "personal knowledge management", "ingest this paper/article/book", "build a research wiki", "compound knowledge", "Memex", or whenever the user wants knowledge to accumulate across sessions instead of being re-derived by RAG on every query.
skill-master
IncludedAgent Skills authoring, evaluation, and optimization. Create, edit, validate, benchmark, and improve skills following the agentskills.io specification. Use when designing SKILL.md files, structuring skill folders (references, scripts, assets), ingesting external documentation into skills, running trigger evals, benchmarking skill quality, optimizing descriptions, or performing blind A/B comparisons. Keywords: agentskills.io, SKILL.md, skill authoring, eval, benchmark, trigger optimization.