langfuse-rate-limits
Implement Langfuse rate limiting, batching, and backoff patterns. Use when handling rate limit errors, optimizing trace ingestion, or managing high-volume LLM observability workloads. Trigger with phrases like "langfuse rate limit", "langfuse throttling", "langfuse 429", "langfuse batching", "langfuse high volume".
# Langfuse Rate Limits
## Overview
Handle Langfuse API rate limits with optimized SDK batching, exponential backoff with jitter, concurrent request limiting, and configurable sampling for ultra-high-volume workloads.
## Prerequisites
- Langfuse SDK installed and configured
- High-volume trace workload (1,000+ events/minute)
## Instructions
### Step 1: Optimize SDK Batching Configuration
The Langfuse SDK batches events internally before sending. Tuning batch settings is the first defense against rate limits.
```typescript
// v3 (legacy `langfuse` package): batching is configured on the client
import { Langfuse } from "langfuse";

const langfuse = new Langfuse({
  flushAt: 50,           // Events per batch (default: 15, max ~200)
  flushInterval: 10000,  // Milliseconds between flushes (default: 10000)
  requestTimeout: 30000, // Milliseconds before a batch request times out
});

// v4+ (OpenTelemetry-based SDK): batching is configured on the span processor.
// Use one setup or the other in a given service, not both.
import { LangfuseSpanProcessor } from "@langfuse/otel";
import { NodeSDK } from "@opentelemetry/sdk-node";

const processor = new LangfuseSpanProcessor({
  flushAt: 50,       // Spans per export batch
  flushInterval: 10, // Seconds (not milliseconds) between flushes
});

const sdk = new NodeSDK({ spanProcessors: [processor] });
sdk.start();
```
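Both SDK generations follow the same basic idea: buffer events and send them when the batch is full or the flush interval elapses, whichever comes first. The sketch below illustrates that size-or-timer pattern under that assumption; `BatchQueue` is a hypothetical class, not the SDK's internal implementation, and it omits retries and payload-size limits.

```typescript
// Hypothetical illustration of size-or-interval batching. This is NOT the
// Langfuse SDK's internal implementation; it omits retries and size limits.
type SendBatch<T> = (batch: T[]) => Promise<void>;

class BatchQueue<T> {
  private buffer: T[] = [];
  private readonly send: SendBatch<T>;
  private readonly flushAt: number;
  private readonly timer: ReturnType<typeof setInterval>;

  constructor(send: SendBatch<T>, flushAt: number, flushIntervalMs: number) {
    this.send = send;
    this.flushAt = flushAt; // flush as soon as this many events are buffered...
    // ...or every flushIntervalMs, whichever comes first
    this.timer = setInterval(() => this.flushInBackground(), flushIntervalMs);
  }

  enqueue(event: T): void {
    this.buffer.push(event);
    if (this.buffer.length >= this.flushAt) {
      this.flushInBackground();
    }
  }

  // Send up to `flushAt` buffered events as a single request.
  async flush(): Promise<void> {
    if (this.buffer.length === 0) return;
    const batch = this.buffer.splice(0, this.flushAt);
    await this.send(batch);
  }

  // Stop the timer and drain everything, e.g. before process exit.
  async shutdown(): Promise<void> {
    clearInterval(this.timer);
    while (this.buffer.length > 0) {
      await this.flush();
    }
  }

  private flushInBackground(): void {
    this.flush().catch((err) => console.error("batch send failed:", err));
  }
}
```

With `flushAt = 2`, enqueuing three events sends `[1, 2]` immediately and `[3]` on the next timer tick or at shutdown. The same trade-off applies to the real SDK settings: a larger `flushAt` means fewer requests, while a shorter interval means less delay before events appear.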
### Step 2: Implement Retry with Exponential Backoff
For custom API calls (scores, datasets, prompts) that hit rate limits:
```typescript
import { LangfuseClient } from "@langfuse/client";

async function withRetry<T>(
  fn: () => Promise<T>,
  options: { maxRetries?: number; baseDelayMs?: number; maxDelayMs?: number } = {}
): Promise<T> {
  const { maxRetries = 5, baseDelayMs = 1000, maxDelayMs = 30000 } = options;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error: any) {
      // The status property name varies by HTTP client
      const status: number | undefined =
        error?.status ?? error?.statusCode ?? error?.response?.status;
      // Retry rate limits (429), server errors (5xx), and errors with no HTTP
      // status (e.g. network failures); rethrow everything else immediately
      const retryable = status === undefined || status === 429 || status >= 500;
      if (attempt === maxRetries || !retryable) {
        throw error;
      }
      // Honor a Retry-After header given in seconds; headers may be a plain
      // object or a fetch-style Headers instance
      const headers = error?.response?.headers;
      const retryAfter =
        typeof headers?.get === "function" ? headers.get("retry-after") : headers?.["retry-after"];
      const retryAfterSeconds = Number.parseInt(retryAfter ?? "", 10);
      let delay: number;
      if (Number.isFinite(retryAfterSeconds) && retryAfterSeconds >= 0) {
        delay = retryAfterSeconds * 1000;
      } else {
        // Exponential backoff (capped) plus up to 500 ms of random jitter
        delay = Math.min(baseDelayMs * 2 ** attempt, maxDelayMs) + Math.random() * 500;
      }
      console.warn(
        `Request failed (status ${status ?? "none"}). Retry ${attempt + 1}/${maxRetries} in ${Math.round(delay)}ms`
      );
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw new Error("Unreachable");
}

// Usage with Langfuse client operations
const langfuse = new LangfuseClient();
await withRetry(async () =>
  langfuse.score.create({
    traceId: "trace-123",
    name: "quality",
    value: 0.95,
    dataType: "NUMERIC",
  })
);
```
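Two details of a retry helper like `withRetry` are easy to get wrong. First, per RFC 9110 a `Retry-After` value is either a number of seconds or an HTTP-date, and `parseInt` on a date yields `NaN`. Second, a fixed 500 ms of jitter becomes small relative to long delays, so clients throttled together tend to retry close together. The helpers below are a hypothetical sketch, not part of any Langfuse SDK: `parseRetryAfterMs` handles both header forms, and `backoffDelayMs` swaps in the "full jitter" variant, which draws the whole delay uniformly from `[0, min(cap, base * 2^attempt))`.

```typescript
// Hypothetical helpers, not part of any Langfuse SDK.

// Parse a Retry-After value into milliseconds. Per RFC 9110 it is either a
// non-negative integer number of seconds or an HTTP-date.
function parseRetryAfterMs(
  value: string | null | undefined,
  nowMs: number = Date.now(),
): number | undefined {
  if (!value) return undefined;
  const trimmed = value.trim();
  if (/^\d+$/.test(trimmed)) {
    return Number(trimmed) * 1000; // delay-seconds form, e.g. "120"
  }
  // HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return undefined;
  return Math.max(0, dateMs - nowMs); // a date in the past means "retry now"
}

// "Full jitter" backoff: draw the whole delay uniformly from
// [0, min(capMs, baseMs * 2^attempt)). `random` is injectable for testing.
function backoffDelayMs(
  attempt: number,
  baseMs: number,
  capMs: number,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}
```

Inside `withRetry`, the delay computation could then become `parseRetryAfterMs(retryAfter) ?? backoffDelayMs(attempt, baseDelayMs, maxDelayMs)`, which falls back to backoff whenever the header is missing or unparseable.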
### Step 3: Queue-Based Concurrency Limiting
Use `p-queue` to cap both concurrent Langfuse API calls and the number of calls started per second:
```typescript
import PQueue from "p-queue";
import { LangfuseClient } from "@langfuse/client";

const langfuse = new LangfuseClient();

// At most 10 API calls in flight, and at most 50 started per second
const queue = new PQueue({
  concurrency: 10,
  interval: 1000,
  intervalCap: 50,
});

// Queue score submissions
async function queueScore(params: { traceId: string; name: string; value: number }) {
  return queue.add(() =>
    langfuse.score.create({
      ...params,
      dataType: "NUMERIC",
    })
  );
}

// Queue dataset item creation
async function queueDatasetItem(
  datasetName: string,
  item: { input: unknown; expectedOutput?: unknown }
) {
  return queue.add(() =>
    langfuse.api.datasetItems.create({
      datasetName,
      input: item.input,
      expectedOutput: item.expectedOutput,
    })
  );
}

// Monitor queue health: `pending` = running now, `size` = waiting to start
const monitor = setInterval(() => {
  console.log(`Queue: ${queue.pending} running, ${queue.size} waiting`);
}, 10000);
monitor.unref(); // don't keep the process alive just for this log
```
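p-queue's `concurrency` option behaves like a counting semaphore: at most N tasks run, and the rest wait for a free slot. The dependency-free sketch below shows that mechanism plus a bounded backlog that rejects new work instead of letting memory grow. `ConcurrencyLimiter` and its `maxQueued` parameter are hypothetical, and the sketch does not reproduce p-queue's `interval`/`intervalCap` rate limiting.

```typescript
// Hypothetical dependency-free limiter: a counting semaphore with a bounded
// backlog. It mirrors only p-queue's `concurrency`, `pending`, and `size`.
class ConcurrencyLimiter {
  private running = 0;
  private readonly waiters: Array<() => void> = [];
  private readonly concurrency: number;
  private readonly maxQueued: number;

  constructor(concurrency: number, maxQueued: number = Infinity) {
    this.concurrency = concurrency;
    this.maxQueued = maxQueued;
  }

  /** Tasks currently running. */
  get pending(): number {
    return this.running;
  }

  /** Tasks waiting for a free slot. */
  get size(): number {
    return this.waiters.length;
  }

  async add<T>(task: () => Promise<T>): Promise<T> {
    if (this.running < this.concurrency) {
      this.running++;
    } else if (this.waiters.length >= this.maxQueued) {
      // Load shedding: refuse new work instead of letting the backlog grow
      throw new Error("ConcurrencyLimiter: backlog full");
    } else {
      // Wait until a finishing task hands its slot over
      await new Promise<void>((resolve) => this.waiters.push(resolve));
    }
    try {
      return await task();
    } finally {
      const next = this.waiters.shift();
      if (next) {
        next(); // transfer the slot directly; `running` stays unchanged
      } else {
        this.running--;
      }
    }
  }
}
```

Handing the slot directly to the next waiter, rather than decrementing and re-checking, prevents a newly arriving task from jumping ahead of tasks that are already waiting.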
### Step 4: Configurable Sampling for Ultra-High Volume
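When tracing volume exceeds your rate limit, sample traces deliberately, keeping a representative subset plus anything flagged as an error, rather than letting throttling decide which events get through: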
```typescript
import { startActiveObservation, updateActiveObservation } from "@langfuse/tracing";

class TraceSampler {
  private rate: number;
  private recentSamples: number[] = []; // timestamps (ms) of sampled traces
  private windowMs = 60000; // 1 minute sliding window
  private maxPerWindow: number;

  constructor(sampleRate: number, maxPerMinute: number) {
    this.rate = sampleRate;
    this.maxPerWindow = maxPerMinute;
  }

  shouldSample(tags?: string[]): boolean {
    // Always sample operations tagged as errors (tags must be known up front)
    if (tags?.includes("error") || tags?.includes("critical")) {
      return true;
    }
    // Enforce the per-window cap
    const now = Date.now();
    this.recentSamples = this.recentSamples.filter((t) => t > now - this.windowMs);
    if (this.recentSamples.length >= this.maxPerWindow) {
      return false;
    }
    // Probabilistic sampling: keep roughly `rate` of the remaining calls
    if (Math.random() >= this.rate) {
      return false;
    }
    this.recentSamples.push(now);
    return true;
  }
}

// 10% sampling, max 1000 traces/minute
const sampler = new TraceSampler(0.1, 1000);

async function sampledOperation<T>(
  name: string,
  fn: () => Promise<T>,
  tags?: string[]
): Promise<T> {
  if (!sampler.shouldSample(tags)) {
    return fn(); // Run without tracing
  }
  return startActiveObservation(name, async () => {
    updateActiveObservation({ metadata: { sampled: true } });
    return fn();
  });
}
```
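`Math.random()` gives every call an independent decision, so related operations (for example, several calls made while handling one request) can end up only partly traced. A common alternative is deterministic sampling keyed on an ID: hash it to a number in [0, 1) and keep it if that number falls below the rate. This is a minimal sketch, assuming a stable string ID is available before tracing starts. `hashTraceId` and `shouldSampleTrace` are hypothetical helpers that use a 32-bit FNV-1a hash followed by the MurmurHash3 finalizer.

```typescript
// Hypothetical helpers: deterministic, hash-based sampling keyed on an ID.

// 32-bit FNV-1a over UTF-16 code units, then the MurmurHash3 finalizer
// (fmix32) to spread similar IDs (e.g. "trace-1", "trace-2") across the range.
function hashTraceId(id: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < id.length; i++) {
    h ^= id.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0; // unsigned 32-bit integer
}

// Keep the trace if its hash, mapped to [0, 1), falls below `rate`.
// The same ID always gets the same decision.
function shouldSampleTrace(traceId: string, rate: number): boolean {
  if (rate <= 0) return false;
  if (rate >= 1) return true;
  return hashTraceId(traceId) / 2 ** 32 < rate;
}
```

In `sampledOperation`, a request-scoped ID could be checked with `shouldSampleTrace(requestId, 0.1)` in place of the random check. OpenTelemetry's `TraceIdRatioBasedSampler` applies the same idea to OTel trace IDs and may be the simpler route with the v4 SDK.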
## Rate Limit Reference
| Tier | Approx. traces/min | Suggested batch size (`flushAt`) | Strategy |
|------|------------|------------|----------|
| Hobby | ~500 | 15 | Default settings |
| Pro | ~5,000 | 50 | Increase `flushAt` |
| Team | ~10,000 | 100 | + Queue-based limiting |
| Enterprise | Custom | Custom | + Sampling |
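As rough arithmetic: once volume is high enough that batches fill before the flush interval fires, the SDK sends about one request per `flushAt` events, so raising the batch size cuts request count roughly in proportion. This only helps to the extent that a limit is counted per request rather than per event. `estimateRequestsPerMinute` is a hypothetical helper that ignores timer-triggered partial batches, retries, and payload-size limits.

```typescript
// Hypothetical helper: approximate ingestion requests per minute once batches
// fill before the flush interval elapses. Ignores timer-triggered partial
// batches, retries, and payload-size limits.
function estimateRequestsPerMinute(eventsPerMinute: number, flushAt: number): number {
  if (!(flushAt > 0)) {
    throw new RangeError("flushAt must be a positive number");
  }
  return Math.ceil(eventsPerMinute / flushAt);
}

// Rows from the table above, assuming one ingestion event per trace
// (real traces usually produce several events, e.g. one per observation).
console.log(estimateRequestsPerMinute(500, 15));     // 34
console.log(estimateRequestsPerMinute(5_000, 50));   // 100
console.log(estimateRequestsPerMinute(10_000, 100)); // 100
```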
## Error Handling
| Error | Response | Action |
|-------|----------|--------|
| `429 Too Many Requests` | `Retry-After: N` | Backoff for N seconds |
| `503 Service Unavailable` | Server overloaded | Backoff 30s+ |
| Flush timeout | Large batch | Reduce `flushAt`, increase `requestTimeout` |
| Memory growth | Queue backlog | Cap the backlog: check `queue.size`, or `await queue.onSizeLessThan(n)` before calling `queue.add()` |
## Resources
- [Event Queuing/Batching](https://langfuse.com/docs/observability/features/queuing-batching)
- [Advanced SDK Configuration](https://langfuse.com/docs/observability/sdk/typescript/advanced-usage)
- [p-queue](https://github.com/sindresorhus/p-queue)