This skill should be used when the user asks about "Workers AI", "AI models", "text generation", "embeddings", "semantic search", "RAG", "Retrieval Augmented Generation", "AI inference", "LLaMA", "Llama", "bge embeddings", "@cf/ models", "AI Gateway", or discusses implementing AI features, choosing AI models, generating embeddings, or building RAG systems on Cloudflare Workers.
# Workers AI
## Purpose
This skill provides comprehensive guidance for using Workers AI, Cloudflare's AI inference platform. It covers available models, inference patterns, embedding generation, RAG (Retrieval Augmented Generation) architectures, AI Gateway integration, and best practices for AI workloads. Use this skill when implementing AI features, selecting models, building RAG systems, or optimizing AI inference on Workers.
## Workers AI Overview
Workers AI provides serverless AI inference at the edge with:
- **Text Generation**: LLMs for chat, completion, summarization
- **Embeddings**: Vector representations for semantic search
- **Image Generation**: Text-to-image models
- **Vision**: Image classification and object detection
- **Speech**: Text-to-speech and automatic speech recognition
- **Translation**: Language translation models
### Key Benefits
- **Edge deployment**: Low latency inference globally
- **No infrastructure**: Serverless, auto-scaling
- **Integrated**: Native integration with Workers, Vectorize, D1
- **Cost-effective**: Pay per inference, no minimum
- **Open-weight models**: Llama 3.1, Mistral, BAAI (BGE) embeddings
## Project-Specific Model Decisions
Before recommending a model:
1. Check `.claude/cloudflare-expert.local.md` for existing decisions in the "AI Model Decisions" section
2. If found, use the saved decision and mention: "Based on your project's saved configuration..."
3. If not found, describe options with trade-offs and let the user decide
4. After user decides, offer to save the decision to memory with rationale
## Model Information Freshness
**Fetch fresh info via Docs MCP when**:
- User asks for "latest" or "current" models
- A decision saved in memory is older than 90 days
- Starting a new project
- User mentions an unknown model
**Use skill knowledge when**:
- Explaining patterns (RAG workflow, chunking)
- Showing code patterns (API usage)
- Teaching concepts (temperature, top-k)
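The 90-day freshness rule above can be expressed as a small check. This is a minimal sketch under assumptions: the memory file's format is not specified here, so the hypothetical `isDecisionStale` helper assumes the decision's save date has already been parsed into a `Date`.

```typescript
// Hypothetical helper: assumes the saved decision's date is already a Date.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

function isDecisionStale(
  savedOn: Date,
  now: Date = new Date(),
  maxAgeDays = 90,
): boolean {
  // An unparseable date cannot be trusted, so treat it as stale and refetch.
  if (Number.isNaN(savedOn.getTime())) return true;
  const ageDays = (now.getTime() - savedOn.getTime()) / MS_PER_DAY;
  return ageDays > maxAgeDays;
}

// Usage: decide whether to fetch fresh model info before recommending.
const saved = new Date("2024-01-01");
console.log(isDecisionStale(saved, new Date("2024-04-15"))); // true (105 days old)
```

Treating an invalid date as stale errs toward fetching fresh documentation, which is the safer failure mode for model recommendations.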
## Model Categories
### Text Generation Models
**Llama 3.1** (Long context, multilingual):
- `@cf/meta/llama-3.1-8b-instruct` - Chat and instruction following
- Best for: Conversational AI, Q&A, summarization, general text generation
- Context window: up to 128K tokens in the base model (the limit enforced on Workers AI may be lower; confirm on the model's catalog page)
- Multilingual support
**Mistral** (Fast, efficient):
- `@cf/mistral/mistral-7b-instruct-v0.2` - Fast instruction following
- Best for: Quick responses, simpler tasks
- Context window: 32K tokens
**Qwen** (Balanced efficiency):
- `@cf/qwen/qwen1.5-14b-chat-awq` - Quantized for efficiency
- Best for: Balance between speed and quality
See `references/model-selection-framework.md` for decision criteria and `references/workers-ai-models.md` for complete model catalog.
### Embedding Models
**BGE Base** (English, balanced):
- `@cf/baai/bge-base-en-v1.5` - High-quality English embeddings
- Dimensions: 768
- Best for: RAG, semantic search, English content
**BGE Large** (Higher quality, slower):
- `@cf/baai/bge-large-en-v1.5` - Higher quality, more compute
- Dimensions: 1024
- Best for: When quality is critical
**BGE Small** (Fast, compact):
- `@cf/baai/bge-small-en-v1.5` - Faster, smaller model
- Dimensions: 384
- Best for: When speed is critical, large volumes
**BGE M3** (Multilingual):
- `@cf/baai/bge-m3` - Multilingual support
- Dimensions: 1024
- Best for: Multi-language content
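Because a Vectorize index is created with a fixed dimension (for example `npx wrangler vectorize create my-index --dimensions=768 --metric=cosine`), the embedding model and the index have to agree. The sketch below records the dimensions listed above in one place and fails fast on a mismatch. The constant and function names are hypothetical, and the values should be re-checked against the current model catalog.

```typescript
// Output dimensions of the embedding models listed above.
const EMBEDDING_DIMENSIONS = {
  "@cf/baai/bge-small-en-v1.5": 384,
  "@cf/baai/bge-base-en-v1.5": 768,
  "@cf/baai/bge-large-en-v1.5": 1024,
  "@cf/baai/bge-m3": 1024,
} as const;

type EmbeddingModel = keyof typeof EMBEDDING_DIMENSIONS;

// Throws if the index was created with a dimension the model does not produce.
function assertIndexMatchesModel(model: EmbeddingModel, indexDimensions: number): void {
  const expected = EMBEDDING_DIMENSIONS[model];
  if (expected !== indexDimensions) {
    throw new Error(
      `${model} produces ${expected}-dim vectors, but the index expects ${indexDimensions}`,
    );
  }
}
```

Switching models later (say, from BGE Base to BGE Small) therefore means creating a new index and re-embedding the content, not just changing the model ID.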
### Image Generation
**Stable Diffusion**:
- `@cf/stabilityai/stable-diffusion-xl-base-1.0` - Text-to-image
- `@cf/bytedance/stable-diffusion-xl-lightning` - Faster generation
- Best for: Creating images from text descriptions
### Vision Models
**Image Classification**:
- `@cf/microsoft/resnet-50` - Object recognition
- Best for: Classifying image content
### Speech Models
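**Automatic Speech Recognition**:
- `@cf/openai/whisper` - Speech-to-text
- Best for: Transcribing audio

**Text-to-Speech**: check the current Workers AI model catalog for available text-to-speech models.
### Translation Models
**M2M100**:
- `@cf/meta/m2m100-1.2b` - Multilingual text translation (not speech synthesis)
- Best for: Translating text between languages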
## Text Generation
### Basic Inference
```javascript
export default {
  async fetch(request, env, ctx) {
    // env.AI is the Workers AI binding declared in the Wrangler config
    const response = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
      messages: [
        { role: 'system', content: 'You are a helpful assistant.' },
        { role: 'user', content: 'What is Cloudflare Workers?' }
      ]
    });
    // The generated text is in response.response; Response.json sets the JSON content type
    return Response.json(response);
  }
};
```
### Streaming Responses
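```javascript
// Inside the fetch handler: with stream: true, run() returns a ReadableStream
// of server-sent events instead of a JSON object
const stream = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
  messages: [
    { role: 'user', content: 'Write a story about...' }
  ],
  stream: true
});

return new Response(stream, {
  headers: { 'Content-Type': 'text/event-stream' }
});
```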
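On the consuming side, the event stream has to be split into lines and decoded. The sketch below is an incremental parser that tolerates chunks ending mid-line. It assumes, rather than guarantees, that each event is a line of the form `data: {"response":"<token>"}` and that the stream ends with `data: [DONE]`. The class name is hypothetical.

```typescript
// Incremental parser for text-generation server-sent events.
// Assumed format: `data: {"response":"<token>"}` lines, ending with `data: [DONE]`.
class SseTokenParser {
  private buffer = "";
  done = false;

  // Feed decoded text; returns the tokens from every complete line received so far.
  push(chunk: string): string[] {
    this.buffer += chunk;
    const tokens: string[] = [];
    let newline = this.buffer.indexOf("\n");
    while (newline !== -1) {
      const line = this.buffer.slice(0, newline).trim();
      this.buffer = this.buffer.slice(newline + 1); // keep any partial line for the next push
      newline = this.buffer.indexOf("\n");
      if (!line.startsWith("data:")) continue; // blank separators and non-data fields
      const payload = line.slice("data:".length).trim();
      if (payload === "[DONE]") {
        this.done = true;
        continue;
      }
      try {
        const event = JSON.parse(payload) as { response?: unknown };
        if (typeof event.response === "string") tokens.push(event.response);
      } catch {
        // Skip a malformed event rather than aborting the whole stream.
      }
    }
    return tokens;
  }
}
```

In a Worker, one way to feed it is to iterate over `stream.pipeThrough(new TextDecoderStream())` and call `push` with each decoded string chunk.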
### Model Parameters
```javascript
const response = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
  messages: [/* messages */],
  max_tokens: 512,        // Maximum number of tokens to generate
  temperature: 0.7,       // Randomness: higher = more varied output
  top_p: 0.9,             // Nucleus sampling: sample from the smallest token set whose probability sums to top_p
  top_k: 40,              // Sample only from the k most likely tokens
  repetition_penalty: 1.2 // Values above 1 discourage repeated tokens
});
```
**Parameter guidelines**:
- **temperature**: 0.1-0.3 for factual, 0.7-0.9 for creative
- **max_tokens**: Set based on expected response length
- **top_p/top_k**: Usually leave at their defaults unless you need to adjust sampling behavior
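One way to apply these guidelines consistently is to keep named presets and spread them into the request. This is a sketch: the preset names and `max_tokens` values are hypothetical choices, and only the temperature ranges come from the guidelines above.

```typescript
type Message = { role: "system" | "user" | "assistant"; content: string };

interface GenerationOptions {
  temperature: number;
  max_tokens: number;
}

// Hypothetical presets within the guideline ranges; tune per use case.
const PRESETS: Record<"factual" | "creative", GenerationOptions> = {
  factual: { temperature: 0.2, max_tokens: 256 },   // low randomness for Q&A and extraction
  creative: { temperature: 0.8, max_tokens: 1024 }, // more varied output for stories and brainstorming
};

// top_p / top_k are omitted, so the model's defaults apply.
function buildRequest(kind: keyof typeof PRESETS, messages: Message[]) {
  return { messages, ...PRESETS[kind] };
}
```

Usage would look like `env.AI.run('@cf/meta/llama-3.1-8b-instruct', buildRequest('factual', messages))`.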
## Embeddings
### Generating Embeddings
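```typescript
const embeddings = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
  text: ['Hello world', 'Another sentence']
}) as { data: number[][] };

// One vector per input string, in input order
const vector1 = embeddings.data[0]; // e.g. [0.123, -0.456, ...] (768 numbers for bge-base)
const vector2 = embeddings.data[1];
```
**Important TypeScript note**: these examples add an `as { data: number[][] }` type assertion because the inferred return type of `env.AI.run` does not always match the embedding output shape. The assertion only changes compile-time types and performs no runtime check; omit it in plain JavaScript, where `as` is a syntax error.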
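Two small helpers illustrate what happens next with vectors like `vector1` and `vector2`. A runtime type guard can stand in for the blind assertion, and cosine similarity is the usual way to compare two embeddings (Vectorize performs this comparison itself when the index metric is `cosine`). Both function names are hypothetical, and the in-memory comparison is meant for understanding or small sets, not as a replacement for a vector index.

```typescript
// Runtime check instead of a blind `as` assertion: confirms the response
// really has a `data` array of numeric vectors.
function isEmbeddingResult(value: unknown): value is { data: number[][] } {
  if (typeof value !== "object" || value === null) return false;
  const data = (value as { data?: unknown }).data;
  return (
    Array.isArray(data) &&
    data.every((v) => Array.isArray(v) && v.every((x) => typeof x === "number"))
  );
}

// Cosine similarity: 1 = same direction, 0 = orthogonal, -1 = opposite.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // undefined for zero vectors; treat as no similarity
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```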
### Batch Processing
```typescript
// Batch multiple texts per request for efficiency
// (`documents` stands for your own array of { content } records)
const texts = documents.map(d => d.content);

// Process in batches of 100 (recommended batch size)
const batchSize = 100;
const allEmbeddings: number[][] = [];
for (let i = 0; i < texts.length; i += batchSize) {
  const batch = texts.slice(i, i + batchSize);
  const result = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
    text: batch
  }) as { data: number[][] };
  allEmbeddings.push(...result.data);
}
```
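The same loop can be factored into a reusable helper that also verifies each response returns exactly one vector per input, so vectors never drift out of alignment with their texts. This is a sketch: `toBatches`, `embedAll`, and the `EmbedFn` type are hypothetical names, and the model call is injected so the helper does not depend on a particular binding.

```typescript
// Split items into consecutive batches of at most `size` elements.
function toBatches<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) throw new RangeError("size must be a positive integer");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) batches.push(items.slice(i, i + size));
  return batches;
}

// Hypothetical signature for the model call, for example:
// (texts) => env.AI.run('@cf/baai/bge-base-en-v1.5', { text: texts })
type EmbedFn = (texts: string[]) => Promise<unknown>;

// Embeds batch by batch and rejects if any response has the wrong vector count.
async function embedAll(texts: string[], embed: EmbedFn, batchSize = 100): Promise<number[][]> {
  const vectors: number[][] = [];
  for (const batch of toBatches(texts, batchSize)) {
    const result = await embed(batch);
    const data = (result as { data?: unknown } | null)?.data;
    if (!Array.isArray(data) || data.length !== batch.length) {
      const got = Array.isArray(data) ? data.length : "none";
      throw new Error(`expected ${batch.length} vectors, got ${got}`);
    }
    vectors.push(...(data as number[][]));
  }
  return vectors;
}
```

Batches run sequentially here, which keeps request volume predictable; issuing them concurrently would be faster but may run into rate limits.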
### Text Chunking for Embeddings
For long documents, split into chunks before embedding:
```typescript
// Newer LangChain.js releases ship the splitter in @langchain/textsplitters;
// older releases exposed it as 'langchain/text_splitter'
import { RecursiveCharacterTextSplitter } from '@langchain/textsplitters';

const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 500,  // Target characters per chunk
  chunkOverlap: 50 // Characters shared between neighboring chunks
});
const chunks = await splitter.splitText(longDocument);

// Generate one embedding per chunk (split into batches if there are more
// chunks than a single request accepts)
const embeddings = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
  text: chunks
}) as { data: number[][] };

// Store every chunk with its embedding in a single insert call
await env.VECTOR_INDEX.insert(
  chunks.map((chunk, i) => ({
    id: `${docId}-chunk-${i}`,
    values: embeddings.data[i],
    metadata: { text: chunk, docId, chunkIndex: i }
  }))
);
```
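If pulling in LangChain is not desirable, a dependency-free fixed-window chunker covers the basic case. This is a simpler technique than `RecursiveCharacterTextSplitter`: it cuts at fixed character offsets with overlap and does not try to break on paragraph or sentence boundaries. The function name is hypothetical.

```typescript
// Fixed-size character windows with overlap (not boundary-aware).
function chunkText(text: string, chunkSize = 500, overlap = 50): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("require chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const step = chunkSize - overlap; // how far each window advances
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this window reached the end
  }
  return chunks;
}

// Usage: chunkText("abcdefghij", 4, 1) yields ["abcd", "defg", "ghij"]
```

Offsets are UTF-16 code units, so a window edge can split an emoji or other surrogate pair; a boundary-aware splitter avoids that and usually retrieves better.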
See `references/rag-architecture-patterns.md` for complete RAG implementation patterns.
## RAG (Retrieval Augmented Generation)
### Basic RAG Pattern
```typescript
// Env: your Worker's binding types (for example, generated by `wrangler types`)
async function answerQuestion(question: string, env: Env) {
  // 1. Generate question embedding
  const questionEmbedding = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
    text: [question]
  }) as { data: number[][] };

  // 2. Find similar documents
  const similar = await env.VECTOR_INDEX.query(questionEmbedding.data[0], {
    topK: 3,
    returnMetadata: true
  });

  // 3. Build context from retrieved documents
  const context = similar.matches
    .map(match => match.metadata?.text)
    .join('\n\n');

  // 4. Generate answer with context
  const answer = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
    messages: [
      {
        role: 'system',
        content: 'Answer the question using only the provided context. If the answer is not in the context, say "I don\'t have enough information."'
      },
      {
        role: 'user',
        content: `Context:\n${context}\n\nQuestion: ${question}`
      }
    ]
  });

  return {
    answer: answer.response
  };
}
```