promptfoo-redteam-setup

Included with Lifetime

$97 forever

Create or refine promptfoo redteam setup configs: purpose, targets, plugins, strategies, frameworks, multi-input target inputs, policy text, grader guidance, contexts, and static-code-derived target/threat mapping. Use when preparing a red team scan plan from live probes, code evidence, or provider configs, or when generating adversarial test cases for QA. Do not use for basic provider wiring alone or for running/evaluating an already-generated redteam scan.

Securityscripts

What this skill does

# Promptfoo Redteam Setup

Build a small, explicit redteam config that matches the real app threat model.
Start with a narrow scan that can be generated and inspected, then expand.

Read `references/redteam-setup-patterns.md` when you need concrete YAML
patterns.
For OpenAPI specs, you can run the bundled
`scripts/openapi-operation-to-redteam-config.mjs` to draft a one-operation
redteam setup config, then inspect the inferred inputs, policy, and plugins. The
script ships in this skill's `scripts/` directory; when the skill is installed as
a plugin it lives in the plugin cache, not your project, so run it by its absolute
path (or copy it in) rather than a bare `scripts/...` path.
With `--token-env`, it infers Bearer/OAuth2/OpenID and header/query/cookie API-key auth; use
`--auth-header`/`--auth-prefix` to override.
For live connectivity QA, add `--smoke-test true` to include one deterministic
`tests` row that can be run with `npm run local -- eval -c ... --no-cache`
before redteam generation.

## Inputs

Infer these from the repo, docs, or user prompt:

- Target shape: HTTP/API, model provider, custom provider, agent, RAG, MCP/tool
system, or multi-input app.
- Purpose: who uses the system, what it may access or do, and what it must
refuse or protect.
- Trust boundaries: identities, object IDs, documents, tools, secrets,
permissions, and external content.
- Discovery evidence: live probe trace, route/controller files, OpenAPI specs,
existing tests, SDK clients, or provider wrappers.
- First-pass scope: risk categories the user cares about most.

If target wiring is missing, use `promptfoo-provider-setup` first or create a
TODO-marked target block and validate it before generation.

## Workflow

### 1. Derive target facts from live or static evidence

- For live endpoints, use only safe probes and keep the request/response trace
that proves method, path, auth, body/query fields, and response path.
- For static code, search route handlers, API clients, tests, and auth/object
checks with `rg`; capture file paths and line numbers for the setup notes.
- Preserve identity, tenant, role, object, document, and tool/action fields as
target `inputs`; these are the attack surface for authorization plugins.
- Convert evidence into risks: object IDs imply `bola`, role/permission checks
imply `rbac`/`bfla`, free-form instructions imply prompt-boundary plugins,
tool URLs or shell/database calls imply SSRF/injection/tool plugins.
- Use a JavaScript or Python local wrapper when static code is easier and safer
to exercise than the deployed endpoint; otherwise map the live HTTP contract
directly.

### 2. Write the target and purpose

- Prefer `targets` for redteam configs.
- Add stable `label`; reports and generated files use it for continuity.
- For single-input targets, include a prompt template or set `redteam.injectVar`
so generation lands in the variable the target actually uses.
- For multi-input targets, define `inputs` on the target. Do not set
`redteam.injectVar` or invent a synthetic `prompt` field.
- Write `redteam.purpose` as security-relevant behavior: allowed users/actions,
forbidden data/actions, and domain-specific constraints.

### 3. Choose a small plugin set

Avoid `plugins: default` for an initial scan unless the user explicitly wants a
broad run.

Pick 2-5 plugins from the app's real risks:

- Policy/business rules: `policy`
- Authorization and object access: `bola`, `bfla`, `rbac`
- Prompt boundaries: `hijacking`, `prompt-extraction`, `system-prompt-override`
- RAG/document workflows: `indirect-prompt-injection`,
`rag-document-exfiltration`, `rag-poisoning`, `rag-source-attribution`
- Tool/agent systems: `excessive-agency`, `tool-discovery`, `debug-access`,
`shell-injection`, `sql-injection`, `ssrf`
- Privacy: `pii:direct`, `pii:session`, `pii:social`
- Domain packs: use finance, medical, insurance, ecommerce, real estate,
telecom, teen-safety, or pharmacy plugins only when that domain is real.

For `policy`, include inline policy text unless the user intentionally references
a resolved Promptfoo Cloud policy object.

### 4. Choose strategies conservatively

- Use `jailbreak:meta` for the default first setup/generation pass.
- Use `jailbreak:hydra` instead when the target is stateful, supports
multi-turn conversations, and sessions are configured.
- Add broader follow-up strategies such as `jailbreak:composite` only after the
first generated cases look sane.

### 5. Configure generation and grading

- Use Promptfoo's default redteam generation unless a specific generator or
model is needed for reproducibility, cost, or fixture QA.
- When using `redteam.provider: file://...`, make the path valid from the
command working directory; JavaScript providers expose `callApi`, while Python
providers expose `call_api` or the function named in a `file://x.py:name`
suffix. Run commands from the repo root unless the project convention says
otherwise.
- For deterministic QA, use a small local file provider that returns Promptfoo's
expected prompt format.
- Use high-value plugins such as `bola` and `bfla` whenever target evidence
shows object IDs, ownership checks, or authorization boundaries.
- Use `redteam.maxConcurrency: 1` for fragile local providers or rate-limited
targets.
- Add plugin-level `graderGuidance` and `graderExamples` only when default
grading would misunderstand domain-specific allowed behavior.

### 6. Validate and generate

From the promptfoo repo:

```bash
npm run local -- validate config -c path/to/promptfooconfig.yaml
npm run local -- validate target -c path/to/promptfooconfig.yaml
npm run local -- redteam generate -c path/to/promptfooconfig.yaml -o /tmp/redteam.yaml --no-cache --force --no-progress-bar --strict
```

Outside the repo (installed plugin or your own project), use the published CLI:

```bash
npx promptfoo@latest validate config -c path/to/promptfooconfig.yaml
npx promptfoo@latest redteam generate -c path/to/promptfooconfig.yaml -o /tmp/redteam.yaml --no-cache --force --no-progress-bar --strict
```

Use a non-precreated output path or keep `--force`; `redteam generate` reads an
existing output file to compare metadata and an empty temp file can fail before
generation. Inspect generated YAML for `tests`, `assert`,
per-test `metadata.pluginId`, `defaultTest.metadata.purpose`, and preserved
multi-input vars. Do not proceed to `redteam run` until generated cases are
plausible.

If the target uses config-relative `file://./target.js` or `file://./target.py`,
write generated YAML next to the source config or switch the target to a stable
absolute/repo-root path before validating; `/tmp/redteam.yaml` makes relative
file targets resolve under `/tmp`.

## Common Mistakes

```yaml
# WRONG: too broad for a first pass
redteam:
plugins:
- default

# BETTER: risk-led starter
redteam:
plugins:
- id: policy
config:
policy: The assistant must not disclose another user's records.
- bola
- rbac
```

```yaml
# WRONG: multi-input mode configured under redteam
redteam:
injectVar: message

# BETTER: define real input variables on the target
targets:
- id: https
inputs:
user_id: Signed-in user identifier.
record_id: Record being requested.
message: User message.
```

## Output Contract

When done, state:

- Target mode and whether `promptfoo-provider-setup` was needed
- Purpose summary and selected plugin/strategy rationale
- Files created or changed
- Validation/generation commands run and generated test count
- Risks intentionally deferred to a later scan

Files: 4

Size: 50.9 KB

Complexity: 64/100

Category: Security

Source: https://github.com/promptfoo/promptfoo/tree/main/plugins/promptfoo/skills/promptfoo-redteam-setup

Related in Security

mac-ops

Included

Comprehensive macOS workstation operations — diagnose kernel panics, identify failing drives, audit launchd startup items, decode wake reasons, triage TCC permission denials, manage APFS snapshots, recover from no-boot. Use for: Mac is slow, slow bootup, won't boot, kernel panic, kernel_task hot, mds_stores CPU, photoanalysisd, cloudd, login loop, gray screen, sleep wake failure, drive failing, IO errors, APFS snapshots eating space, Time Machine local snapshots, Spotlight indexing, launchd, LaunchAgent, LaunchDaemon, login items, TCC permissions, Full Disk Access, Screen Recording denied, Gatekeeper, quarantine, com.apple.quarantine, app is damaged, helper tool, /Library/PrivilegedHelperTools, pmset, wake reasons, dark wake, sysdiagnose, panic.ips, DiagnosticReports, configuration profile, MDM profile, remote diagnostics over SSH.

Securityscripts

a11y-audit

Included

Run accessibility audits on web projects combining automated scanning (axe-core, Lighthouse) with WCAG 2.1 AA compliance mapping, manual check guidance, and structured reporting. Output is configurable: markdown report only, markdown plus machine-readable JSON, or markdown plus issue tracker integration. Use this skill whenever the user mentions "accessibility audit", "a11y audit", "WCAG audit", "accessibility check", "compliance scan", or asks to check a web project for accessibility issues. Also trigger when the user wants to verify WCAG conformance or map findings to a specific standard (CAN-ASC-6.2, EN 301 549, ADA/AODA).

Securityscripts

erpclaw

Included

AI-native ERP system with self-extending OS. Full accounting, invoicing, inventory, purchasing, tax, billing, HR, payroll, advanced accounting (ASC 606/842, intercompany, consolidation), and financial reporting. 413 actions across 14 domains, 43 expansion modules. Constitutional guardrails, adversarial audit, schema migration. Double-entry GL, immutable audit trail, US GAAP.

Securityscripts

assess

Included

Assesses and rates quality 0-10 across multiple dimensions (correctness, maintainability, security, performance, testability, simplicity) with pros/cons analysis. Compares against project conventions and prior decisions from memory. Produces structured evaluation reports with actionable improvement suggestions. Use when evaluating code, designs, architectures, or comparing alternative approaches.

Securityscripts

spring-boot-security-jwt

Included

Provides JWT authentication and authorization patterns for Spring Boot 3.5.x covering token generation with JJWT, Bearer/cookie authentication, database/OAuth2 integration, and RBAC/permission-based access control using Spring Security 6.x. Use when implementing authentication or authorization in Spring Boot applications.

Securityscripts

code-hardcode-audit

Included

Detect hardcoded values, magic numbers, and leaked secrets. TRIGGERS - hardcode audit, magic numbers, PLR2004, secret scanning.

Securityscripts

Detect hardcoded values, magic numbers, and leaked secrets. TRIGGERS - hardcode audit, magic numbers, PLR2004, secret scanning.

Securityscripts