firecrawl-advanced-troubleshooting
Debug hard-to-diagnose Firecrawl issues with systematic isolation and evidence collection. Use when standard troubleshooting fails, investigating why scrapes return empty content, crawl jobs hang, or webhooks don't fire. Trigger with phrases like "firecrawl hard bug", "firecrawl mystery error", "firecrawl impossible to debug", "firecrawl deep debug", "firecrawl not scraping".
What this skill does
# Firecrawl Advanced Troubleshooting
## Overview
Deep debugging techniques for complex Firecrawl issues: empty scrapes on certain domains, crawl jobs that never complete, inconsistent extraction results, and webhook delivery failures. Uses systematic layer-by-layer isolation.
## Instructions
### Step 1: Minimal Reproduction
```typescript
import FirecrawlApp from "@mendable/firecrawl-js";
// Strip everything down to the simplest failing case
async function minimalRepro() {
const firecrawl = new FirecrawlApp({
apiKey: process.env.FIRECRAWL_API_KEY!,
});
// Test 1: Can we scrape at all?
console.log("Test 1: Basic scrape");
const basic = await firecrawl.scrapeUrl("https://example.com", {
formats: ["markdown"],
});
console.log(` Success: ${basic.success}, Length: ${basic.markdown?.length}`);
// Test 2: Does the target URL work?
console.log("Test 2: Target URL");
const target = await firecrawl.scrapeUrl("https://YOUR-FAILING-URL.com", {
formats: ["markdown"],
});
console.log(` Success: ${target.success}, Length: ${target.markdown?.length}`);
// Test 3: With waitFor for JS rendering
console.log("Test 3: With JS wait");
const withWait = await firecrawl.scrapeUrl("https://YOUR-FAILING-URL.com", {
formats: ["markdown"],
waitFor: 10000,
onlyMainContent: true,
});
console.log(` Success: ${withWait.success}, Length: ${withWait.markdown?.length}`);
// Test 4: With actions
console.log("Test 4: With actions");
const withActions = await firecrawl.scrapeUrl("https://YOUR-FAILING-URL.com", {
formats: ["markdown", "screenshot"],
actions: [
{ type: "wait", milliseconds: 3000 },
{ type: "scroll", direction: "down" },
{ type: "wait", milliseconds: 2000 },
],
});
console.log(` Success: ${withActions.success}, Length: ${withActions.markdown?.length}`);
// Screenshot will show what Firecrawl actually sees
}
```
### Step 2: Layer-by-Layer Isolation
```typescript
async function diagnose(url: string) {
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY! });
const results: Array<{ test: string; pass: boolean; detail: string }> = [];
// Layer 1: API connectivity
try {
await firecrawl.scrapeUrl("https://example.com", { formats: ["markdown"] });
results.push({ test: "API connectivity", pass: true, detail: "OK" });
} catch (e: any) {
results.push({ test: "API connectivity", pass: false, detail: `${e.statusCode}: ${e.message}` });
return results; // can't continue
}
// Layer 2: Target URL accessibility
try {
const result = await firecrawl.scrapeUrl(url, { formats: ["markdown"] });
const hasContent = (result.markdown?.length || 0) > 50;
results.push({
test: "Target scrape",
pass: result.success && hasContent,
detail: `Success: ${result.success}, Chars: ${result.markdown?.length}, Status: ${result.metadata?.statusCode}`,
});
} catch (e: any) {
results.push({ test: "Target scrape", pass: false, detail: e.message });
}
// Layer 3: Content quality
try {
const result = await firecrawl.scrapeUrl(url, {
formats: ["markdown", "html"],
onlyMainContent: true,
waitFor: 5000,
});
const md = result.markdown || "";
const isErrorPage = /404|403|access denied|captcha|blocked/i.test(md);
results.push({
test: "Content quality",
pass: md.length > 100 && !isErrorPage,
detail: `Chars: ${md.length}, Error page: ${isErrorPage}, Has headings: ${/^#{1,3}\s/m.test(md)}`,
});
} catch (e: any) {
results.push({ test: "Content quality", pass: false, detail: e.message });
}
// Layer 4: Map endpoint (URL discovery)
try {
const map = await firecrawl.mapUrl(url);
results.push({
test: "Map endpoint",
pass: (map.links?.length || 0) > 0,
detail: `Found ${map.links?.length} URLs`,
});
} catch (e: any) {
results.push({ test: "Map endpoint", pass: false, detail: e.message });
}
return results;
}
// Run diagnosis
const results = await diagnose("https://YOUR-URL.com");
console.table(results);
```
### Step 3: Debug Empty Scrapes
```typescript
// When scrapeUrl returns empty or thin markdown:
async function debugEmptyScrape(url: string) {
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY! });
// Get all formats to understand what Firecrawl sees
const result = await firecrawl.scrapeUrl(url, {
formats: ["markdown", "html", "screenshot"],
waitFor: 10000,
});
console.log("=== Scrape Debug ===");
console.log(`URL: ${result.metadata?.sourceURL}`);
console.log(`Status: ${result.metadata?.statusCode}`);
console.log(`Markdown length: ${result.markdown?.length || 0}`);
console.log(`HTML length: ${result.html?.length || 0}`);
console.log(`Title: ${result.metadata?.title}`);
// Check if HTML has content but markdown doesn't
if ((result.html?.length || 0) > 1000 && (result.markdown?.length || 0) < 100) {
console.log("DIAGNOSIS: HTML has content but markdown extraction failed");
console.log("FIX: Content may be in iframes or shadow DOM. Try with actions.");
}
// Check for bot detection
if (/captcha|cloudflare|access denied|please verify/i.test(result.html || "")) {
console.log("DIAGNOSIS: Bot detection / CAPTCHA detected");
console.log("FIX: Site blocks automated scraping. Contact Firecrawl support.");
}
return result;
}
```
### Step 4: Debug Stuck Crawl Jobs
```typescript
async function debugCrawlJob(jobId: string) {
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY! });
const status = await firecrawl.checkCrawlStatus(jobId);
console.log("=== Crawl Job Debug ===");
console.log(`Status: ${status.status}`);
console.log(`Completed: ${status.completed}/${status.total}`);
console.log(`Error: ${status.error || "none"}`);
if (status.status === "scraping" && status.completed === status.total) {
console.log("DIAGNOSIS: All pages scraped but job not marked complete");
console.log("FIX: This is a Firecrawl backend issue. Wait or start a new crawl.");
}
if (status.completed === 0 && status.status === "scraping") {
console.log("DIAGNOSIS: Crawl started but no pages scraped");
console.log("FIX: Check if start URL returns content. Try scrapeUrl first.");
}
}
```
### Step 5: Timing Analysis
```typescript
async function timeScrape(url: string, iterations = 5) {
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY! });
const times: number[] = [];
for (let i = 0; i < iterations; i++) {
const start = Date.now();
await firecrawl.scrapeUrl(url, { formats: ["markdown"] });
times.push(Date.now() - start);
}
times.sort((a, b) => a - b);
console.log(`p50: ${times[Math.floor(times.length * 0.5)]}ms`);
console.log(`p95: ${times[Math.floor(times.length * 0.95)]}ms`);
console.log(`min: ${times[0]}ms, max: ${times[times.length - 1]}ms`);
}
```
## Error Handling
| Issue | Cause | Solution |
|-------|-------|----------|
| Empty markdown, HTML exists | Shadow DOM or iframes | Use `actions` to interact with page |
| Scrape returns CAPTCHA | Bot detection | Try with `mobile: true`, contact Firecrawl |
| Crawl stuck at 0 pages | Start URL blocked | Verify URL loads in browser first |
| Inconsistent results | JS rendering timing | Increase `waitFor`, use selector-based wait |
| Webhook never fires | URL unreachable | Test with `curl` to your endpoint first |
## Support Escalation Template
```
Subject: [P1/P2/P3] [Brief description]
URL: [failing URL]
API Key prefix: fc-xxx (first 6 chars)
Timestamp: [ISO 8601]
Expected: [what should happen]
Actual: [what happens]
Diagnostic output: [paste from diagnose() above]
Screenshot: [if available from screenshot format]
Workarounds tried:
1. [what you tried] — result: [outcome]
```
## Resources
- [Firecrawl Advanced Scraping](https://docs.firecrawl.dev/advanced-scraping-guide)
- [GitHub IssuRelated in Writing & Docs
jax-development
IncludedUse this skill when the user is writing, debugging, profiling, refactoring, reviewing, benchmarking, parallelising, exporting, or explaining JAX code, or when they mention JAX, jax.numpy, jit, grad, value_and_grad, vmap, scan, lax, random keys, pytrees, jax.Array, sharding, Mesh, PartitionSpec, NamedSharding, pmap, shard_map, Pallas, XLA, StableHLO, checkify, profiler, or the JAX repo. It helps turn NumPy or PyTorch-style code into pure functional JAX, fix tracer/control-flow/shape/PRNG bugs, remove recompiles and host-device syncs, choose transforms and sharding strategies, inspect jaxpr/lowering/IR, and benchmark compiled code correctly.
nature-article-writer
IncludedDrafts, rewrites, diagnostically critiques, and style-calibrates primary research manuscripts for Nature and Nature Portfolio journals. Use when the user wants a Nature-style title, summary paragraph or abstract, introduction, results, discussion, methods, figure legends, presubmission enquiry, cover letter, reviewer response, or when a scientific draft sounds generic, jargon-heavy, structurally weak, or AI-ish and needs precise, broad-reader-friendly prose without inventing data, analyses, or references. Best for primary research articles and letters rather than reviews or press releases unless explicitly adapting one.
deckrd
IncludedDocument-driven framework that derives requirements, specifications, implementation plans, and executable tasks from goals through structured AI dialogue. Use when user says "write requirements", "create spec", "plan implementation", "derive tasks", "structure this feature", "break down into tasks", or "document this module". Also use for reverse engineering existing code into docs (/deckrd rev). Do NOT use for direct code writing — use /deckrd-coder after tasks are generated. Do NOT use when the user only wants to run or fix existing code without planning.
clinical-decision-support
IncludedGenerate professional clinical decision support (CDS) documents for pharmaceutical and clinical research settings, including patient cohort analyses (biomarker-stratified with outcomes) and treatment recommendation reports (evidence-based guidelines with decision algorithms). Supports GRADE evidence grading, statistical analysis (hazard ratios, survival curves, waterfall plots), biomarker integration, and regulatory compliance. Outputs publication-ready LaTeX/PDF format optimized for drug development, clinical research, and evidence synthesis.
handling-sf-data
IncludedSalesforce data operations with 130-point scoring. Use this skill to create, update, delete, bulk import/export, generate test data, and clean up org records using sf CLI and anonymous Apex. TRIGGER when: user creates test data, performs bulk import/export, uses sf data CLI commands, needs data factory patterns for Apex tests, or needs to seed/clean records in a Salesforce org. DO NOT TRIGGER when: SOQL query writing only (use querying-soql), Apex test execution (use running-apex-tests), or metadata deployment (use deploying-metadata).
accelint-ac-to-playwright
IncludedConvert and validate acceptance criteria for Playwright test automation. Use when user asks to (1) review/evaluate/check if AC are ready for automation, (2) assess if AC can be converted as-is, (3) validate AC quality for Playwright, (4) turn AC into tests, (5) generate tests from acceptance criteria, (6) convert .md bullets or .feature Gherkin files to Playwright specs, (7) create test automation from requirements. Handles both bullet-style markdown and Gherkin syntax with JSON test plan generation and validation.