firecrawl-reliability-patterns
Implement Firecrawl reliability patterns: circuit breakers, crawl fallbacks, and content validation. Use when building fault-tolerant scraping pipelines, implementing crawl-to-scrape fallback, or adding content quality gates to Firecrawl integrations. Trigger with phrases like "firecrawl reliability", "firecrawl circuit breaker", "firecrawl fallback", "firecrawl resilience", "firecrawl fault tolerant".
# Firecrawl Reliability Patterns
## Overview
Production reliability patterns for Firecrawl scraping pipelines. Firecrawl's async crawl model, JS rendering, and credit-based pricing create specific reliability challenges: crawl jobs may time out, scraped content may be empty (bot detection, JS failures), and runaway crawls can burn credits. This skill covers a pattern for each: bounded polling, content validation, crawl-to-scrape fallback, a circuit breaker, and a credit budget.
## Instructions
### Step 1: Robust Crawl with Timeout and Backoff
```typescript
import FirecrawlApp from "@mendable/firecrawl-js";

const firecrawl = new FirecrawlApp({
  apiKey: process.env.FIRECRAWL_API_KEY!,
});

async function reliableCrawl(
  url: string,
  opts: { limit: number; paths?: string[] },
  timeoutMs = 600000 // 10 minutes
) {
  // Start the crawl without blocking; the job is polled below
  const job = await firecrawl.asyncCrawlUrl(url, {
    limit: opts.limit,
    includePaths: opts.paths,
    scrapeOptions: { formats: ["markdown"], onlyMainContent: true },
  });
  // Without this guard, a rejected job would be polled with an undefined id
  if (!job.success || !job.id) {
    throw new Error(`Crawl failed to start: ${job.error ?? "no job id returned"}`);
  }

  const deadline = Date.now() + timeoutMs;
  let pollInterval = 2000;
  while (Date.now() < deadline) {
    const status = await firecrawl.checkCrawlStatus(job.id);
    if (status.status === "completed") return status;
    if (status.status === "failed") {
      throw new Error(`Crawl failed: ${status.error}`);
    }
    await new Promise(r => setTimeout(r, pollInterval));
    pollInterval = Math.min(pollInterval * 1.5, 30000); // back off to 30s max
  }
  // The job may still be running server-side; the id is included for follow-up
  throw new Error(`Crawl timed out after ${timeoutMs}ms (job: ${job.id})`);
}
```
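To get a feel for how often the loop above polls, the schedule can be simulated offline. The following is a minimal sketch, not part of the Firecrawl SDK: `pollSchedule` is a hypothetical helper that mirrors the 2 s start, 1.5x growth, and 30 s cap used in `reliableCrawl`, and it ignores the time each status request takes.

```typescript
// Hypothetical helper: lists the sleep intervals (in ms) the polling loop
// would use before the deadline passes, assuming zero request latency.
function pollSchedule(
  timeoutMs: number,
  initialMs = 2000,
  factor = 1.5,
  capMs = 30000
): number[] {
  const delays: number[] = [];
  let elapsed = 0;
  let interval = initialMs;
  while (elapsed < timeoutMs) {
    delays.push(interval);
    elapsed += interval;
    // Same growth rule as the crawl loop: multiply, then clamp to the cap
    interval = Math.min(interval * factor, capMs);
  }
  return delays;
}

const schedule = pollSchedule(600000);
console.log(schedule.length, schedule.slice(0, 8));
```

Under these assumptions the default 10-minute timeout yields 25 sleeps: the first seven grow from 2 s to about 22.8 s, and the rest stay at the 30 s cap. Real runs poll slightly fewer times because each status request adds latency.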
### Step 2: Content Quality Validation
```typescript
interface ScrapedPage {
  url: string;
  markdown: string;
  metadata: { title?: string; statusCode?: number };
}

function validateContent(page: ScrapedPage): {
  valid: boolean;
  reason?: string;
} {
  if (!page.markdown || page.markdown.length < 100) {
    return { valid: false, reason: "Content too short" };
  }
  if (page.metadata.statusCode && page.metadata.statusCode >= 400) {
    return { valid: false, reason: `HTTP ${page.metadata.statusCode}` };
  }
  // Substring checks are a heuristic: a page that legitimately discusses
  // these terms (e.g. CAPTCHA docs) will also be rejected
  const errorPatterns = [
    "access denied", "403 forbidden", "page not found",
    "captcha", "please verify", "enable javascript",
  ];
  const lower = page.markdown.toLowerCase();
  for (const pattern of errorPatterns) {
    if (lower.includes(pattern)) {
      return { valid: false, reason: `Error page detected: "${pattern}"` };
    }
  }
  return { valid: true };
}
```
### Step 3: Crawl-to-Scrape Fallback
```typescript
// Try a batch scrape first; if it fails or yields too few valid pages,
// fall back to scraping each URL individually
async function resilientFetch(urls: string[]): Promise<any[]> {
  // Batch scrape is the most efficient path when it works
  try {
    const batch = await firecrawl.batchScrapeUrls(urls, {
      formats: ["markdown"],
      onlyMainContent: true,
    });
    const results = (batch.data || []).filter(page => {
      const { valid } = validateContent({
        url: page.metadata?.sourceURL || "",
        markdown: page.markdown || "",
        metadata: page.metadata || {},
      });
      return valid;
    });
    if (results.length >= urls.length * 0.5) {
      return results; // batch succeeded (at least 50% valid)
    }
    // Below the threshold, the batch results are discarded and every URL
    // is scraped again, which spends credits a second time
  } catch (batchError) {
    console.warn("Batch scrape failed, falling back to individual scrapes");
  }

  // Fallback: scrape individually, one attempt per URL
  const results: any[] = [];
  for (const url of urls) {
    try {
      const result = await firecrawl.scrapeUrl(url, {
        formats: ["markdown"],
        onlyMainContent: true,
        waitFor: 5000, // give JS-heavy pages time to render
      });
      if (validateContent({ url, markdown: result.markdown || "", metadata: result.metadata || {} }).valid) {
        results.push(result);
      }
    } catch (e) {
      console.error(`Failed to scrape ${url}: ${(e as Error).message}`);
    }
    // Delay between individual scrapes to avoid rate limits
    await new Promise(r => setTimeout(r, 1000));
  }
  return results;
}
```
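The fallback loop above makes a single attempt per URL. If transient failures are common, a small retry wrapper with exponential backoff can be placed around each call. This is a minimal sketch under stated assumptions: `withRetry` is a hypothetical helper, not a Firecrawl SDK function, and the attempt count and delays are illustrative defaults.

```typescript
// Hypothetical helper: retries an async operation with exponential backoff.
// Only thrown errors trigger a retry; a resolved value is returned as-is.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 1000
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        // Wait baseDelayMs, then 2x, then 4x, ... before the next attempt
        const delay = baseDelayMs * 2 ** (attempt - 1);
        await new Promise<void>(resolve => setTimeout(resolve, delay));
      }
    }
  }
  // All attempts failed: surface the most recent error to the caller
  throw lastError;
}
```

Inside the fallback loop, the scrape call could then be wrapped as `withRetry(() => firecrawl.scrapeUrl(url, { formats: ["markdown"] }))`. Each retry is another request, so it is worth keeping `maxAttempts` small when credits are tight.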
### Step 4: Circuit Breaker for Firecrawl
```typescript
class FirecrawlCircuitBreaker {
  private failures = 0;
  private lastFailure = 0;
  private state: "closed" | "open" | "half-open" = "closed";
  private threshold: number;
  private resetTimeMs: number;

  constructor(threshold = 5, resetTimeMs = 60000) {
    this.threshold = threshold;
    this.resetTimeMs = resetTimeMs;
  }

  // The fallback may return a different shape than the operation (e.g. cached content)
  async execute<T, F = T>(operation: () => Promise<T>, fallback?: () => F): Promise<T | F> {
    // After the cool-down, let one trial request through
    if (this.state === "open" && Date.now() - this.lastFailure > this.resetTimeMs) {
      this.state = "half-open";
    }
    if (this.state === "open") {
      console.warn("Circuit breaker OPEN: using fallback");
      if (fallback) return fallback();
      throw new Error("Firecrawl circuit breaker is open");
    }
    try {
      const result = await operation();
      // Any success closes the circuit and clears the count, so only
      // consecutive failures trip the breaker
      this.state = "closed";
      this.failures = 0;
      return result;
    } catch (error) {
      this.failures++;
      this.lastFailure = Date.now();
      if (this.failures >= this.threshold) {
        this.state = "open";
        console.error(`Circuit breaker OPENED after ${this.failures} failures`);
      }
      throw error;
    }
  }
}

// getCachedContent is not defined in this skill; supply your own cache lookup
declare function getCachedContent(url: string): string;

const breaker = new FirecrawlCircuitBreaker(5, 60000);

async function protectedScrape(url: string) {
  return breaker.execute(
    () => firecrawl.scrapeUrl(url, { formats: ["markdown"] }),
    () => ({ markdown: getCachedContent(url), metadata: { fromCache: true } })
  );
}
```
### Step 5: Credit-Aware Processing
```typescript
// In-memory daily budget; usage resets on process restart and days are UTC
class CreditGuard {
  private dailyUsage = new Map<string, number>();
  private dailyLimit: number;

  constructor(dailyLimit = 5000) {
    this.dailyLimit = dailyLimit;
  }

  // Usage is keyed by UTC date, e.g. "2025-01-31"
  private today(): string {
    return new Date().toISOString().split("T")[0];
  }

  canAfford(credits: number): boolean {
    return (this.dailyUsage.get(this.today()) || 0) + credits <= this.dailyLimit;
  }

  record(credits: number) {
    const today = this.today();
    this.dailyUsage.set(today, (this.dailyUsage.get(today) || 0) + credits);
  }

  remaining(): number {
    return this.dailyLimit - (this.dailyUsage.get(this.today()) || 0);
  }
}

const creditGuard = new CreditGuard(5000);

async function budgetedCrawl(url: string, limit: number) {
  // Check against the worst case (every page up to `limit` is crawled)
  if (!creditGuard.canAfford(limit)) {
    throw new Error(`Budget exceeded: ${creditGuard.remaining()} credits remaining`);
  }
  const result = await reliableCrawl(url, { limit });
  // Record the number of pages actually returned
  creditGuard.record(result.data?.length || 0);
  return result;
}
```
## Error Handling
| Issue | Cause | Solution |
|-------|-------|----------|
| Crawl timeout | Large site, slow rendering | Set timeout, reduce limit |
| Empty markdown | Bot detection or JS failure | Increase `waitFor`, use `actions` |
| Credit overrun | No budget tracking | Implement credit guard |
| Cascade failures | Single scrape failure crashes pipeline | Circuit breaker + fallback |
| Partial crawl results | Some pages blocked | Validate content, retry failed URLs |
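
The last row (validate content, then retry failed URLs) can be sketched as a partition step that separates accepted pages from URLs still worth retrying. This is an illustrative sketch: `partitionPages` and the `PageLike` shape are hypothetical names, and the validity check is passed in so that any validator, such as `validateContent` from Step 2, can be used.

```typescript
// Hypothetical minimal page shape for this sketch
interface PageLike {
  url: string;
  markdown?: string;
}

// Splits results into pages that passed validation and requested URLs
// that are either missing from the results or failed validation.
function partitionPages<P extends PageLike>(
  requested: string[],
  pages: P[],
  isValid: (page: P) => boolean
): { accepted: P[]; retryUrls: string[] } {
  const accepted = pages.filter(isValid);
  const acceptedUrls = new Set(accepted.map(p => p.url));
  const retryUrls = requested.filter(u => !acceptedUrls.has(u));
  return { accepted, retryUrls };
}

const { accepted, retryUrls } = partitionPages(
  ["https://example.com/a", "https://example.com/b"],
  [{ url: "https://example.com/a", markdown: "# Title\nBody text" }],
  p => (p.markdown ?? "").length > 0
);
console.log(accepted.length, retryUrls);
```

Only `retryUrls` needs a second pass, which avoids paying again for pages that already validated.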
## Examples
### Full Resilient Pipeline
```typescript
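async function resilientPipeline(url: string) {
  const map = await firecrawl.mapUrl(url);
  if (!map.success) {
    throw new Error(`Map failed: ${map.error}`);
  }
  let urls = (map.links || []).filter(u => u.includes("/docs/")).slice(0, 50);
  if (!creditGuard.canAfford(urls.length)) {
    console.warn("Budget tight: reducing scope");
    // Trim to at most 20 URLs, and never more than the remaining budget
    urls = urls.slice(0, Math.max(0, Math.min(20, creditGuard.remaining())));
  }
  // resilientFetch returns only pages that already passed validateContent
  const pages = await resilientFetch(urls);
  // Approximation: the individual-scrape fallback can spend additional credits
  creditGuard.record(urls.length);
  return { scraped: urls.length, valid: pages.length, remaining: creditGuard.remaining() };
}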
```
## Resources
- [Firecrawl API Reference](https://docs.firecrawl.dev/api-reference/introduction)
- [Firecrawl Rate Limits](https://docs.firecrawl.dev/rate-limits)
## Next Steps
For policy enforcement, see `firecrawl-policy-guardrails`.