document-workflows

Use this skill for building end-to-end document processing workflows and pipelines using LandingAI ADE. Trigger when users need to: (1) Process batches of documents in parallel or async, (2) Build classify-then-extract pipelines for mixed document types, (3) Prepare parsed documents for RAG systems with chunking and vector DB ingestion, (4) Load extraction results into databases like Snowflake or export to CSV/DataFrames, (5) Visualize extraction results: draw bounding box overlays on pages, crop chunk images, or highlight/annotate specific words or phrases found in documents, (6) Build Streamlit or web UIs for document processing, (7) Find and highlight specific terms within document sections using word-level grounding (e.g. highlight "L2S" in the Introduction, redact PII, annotate extracted values on the original page). This skill complements the document-extraction skill which covers ADE SDK basics. Use document-extraction to write code that executes parse/extract/split operations with more precision and less cost than adding the document image to the prompt and asking the LLM to find the relevant info. Use document-workflows when composing those operations into pipelines, or when you need visualization, annotation, or word-level grounding on parsed documents.


# Document Workflows — ADE Pipeline Patterns

## Overview

This skill provides **reusable building blocks** for composing LandingAI ADE
primitives (parse, extract, split) into production-ready document processing
pipelines. It complements the `document-extraction` skill:

| Concern | `document-extraction` | `document-workflows` |
|---------|----------------------|---------------------|
| Scope | ADE SDK API: parse, extract, split, grounding | End-to-end pipelines: batch, RAG, DB, classify-route |
| When | Need to call a single ADE operation | Need to compose operations into a workflow |
| Code | SDK method calls with parameters | Complete functions with error handling, parallelism |
| Deps | `landingai-ade` only | + workflow-specific libs (pandas, chromadb, etc.) |

**Philosophy:** Organize by *workflow pattern* (batch, RAG, DB insertion),
not by document type. The same pattern applies whether documents are invoices,
utility bills, or medical forms.

---

## Step 0 (mandatory) — Pre-Flight Document Exploration {#pre-flight}

**Run this before writing any pipeline code** whenever working with documents
whose internal structure has not already been inspected in this session.

> **Rule: never write section-detection, heading-matching, or text-search code
> without first running Tool 2 (diagnostic parse) on the sample document.
> Heading format is document-specific and cannot be inferred from the task
> description or document type alone — the only reliable way to know it is to
> look at the actual ADE output.**
>
> Common surprises: a paper's "Introduction" heading may appear as
> `1. Introduction` (plain text, no `#`), `## Introduction`, `INTRODUCTION`
> (all-caps), or embedded inside a text chunk with body copy. Getting this
> wrong means a silent failure (zero chunks matched) that requires a full
> re-parse to debug.

Run Tool 1 (visual render) and Tool 2 (diagnostic parse) on 1–3 representative
sample documents before writing any code. This takes under a minute and
avoids debugging iterations caused by wrong assumptions about document structure.
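
The heading variants listed in the rule above can be told apart mechanically once the markdown from a diagnostic parse is in hand. The following is a minimal inspection sketch, not part of the ADE SDK: `heading_style` and `find_heading_candidates` are hypothetical helper names, and the patterns are rough heuristics meant for looking at a sample, not for production section detection.

```python
import re
from typing import Iterator, Tuple


def heading_style(line: str) -> str:
    """Label how a candidate heading line is formatted (heuristic only)."""
    text = line.strip()
    if re.match(r"^#{1,6}\s+\S", text):
        return "atx"          # e.g. "## Introduction"
    if re.match(r"^\d+(\.\d+)*\.?\s+\S", text):
        return "numbered"     # e.g. "1. Introduction"
    letters = [c for c in text if c.isalpha()]
    if letters and all(c.isupper() for c in letters):
        return "all-caps"     # e.g. "INTRODUCTION"
    return "plain"            # e.g. a heading embedded in body copy


def find_heading_candidates(markdown: str, keyword: str) -> Iterator[Tuple[int, str, str]]:
    """Yield (line_number, style, line) for every line mentioning the keyword."""
    for number, line in enumerate(markdown.splitlines(), 1):
        if keyword.lower() in line.lower():
            yield number, heading_style(line), line.strip()
```

Running `find_heading_candidates(pr.markdown, "introduction")` on one sample shows which format the document actually uses before any matching logic is written.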

### Tool 1 — Visual page render

Render 1–2 pages as PNG and read them as visual context. No ADE credits used,
but each PNG consumes context tokens. Use when layout is ambiguous or document
origin is unknown (handwriting? scan? form?).

```bash
.venv/bin/python - << 'EOF'
import pymupdf
from pathlib import Path
from PIL import Image

pdf = Path('path/to/sample.pdf')
out_dir = Path('/tmp/ade_preflight'); out_dir.mkdir(exist_ok=True)
doc = pymupdf.open(pdf)
for pg in range(min(2, len(doc))):   # first 2 pages only
    pix = doc[pg].get_pixmap(matrix=pymupdf.Matrix(1.5, 1.5))   # 108 DPI
    img = Image.frombytes("RGB", [pix.width, pix.height], pix.samples)
    out = out_dir / f"{pdf.stem}_page{pg + 1}.png"
    img.save(out)
    print(out)
doc.close()
EOF
```

Then read the saved PNGs. They immediately answer:
- Are headings **bold text**? → ADE may output a plain-text heading, not `# Heading`
- Is the document handwritten or scanned? → Tesseract OCR needed, not PyMuPDF
- Single-column or two-column layout?
- Any noise: running headers, page numbers, watermarks, stamps?

### Tool 2 — ADE diagnostic parse

Parses 1 sample and prints markdown structure + chunk inventory. Uses ADE
credits — keep to **1–3 samples only**, never the full corpus.

```bash
.venv/bin/python - << 'EOF'
import os
from pathlib import Path
from collections import Counter
from dotenv import load_dotenv

# Load the API key: an existing environment variable wins; otherwise fall back to .env
load_dotenv()  # pass a path argument if the .env file lives elsewhere

from landingai_ade import LandingAIADE
client = LandingAIADE()
pr = client.parse(document=Path('path/to/sample.pdf'))

print("=== MARKDOWN (first 80 lines) ===")
for i, ln in enumerate(pr.markdown.splitlines()[:80], 1):
    print(f"{i:3}: {ln}")

print("\n=== CHUNKS ===")
for ch in pr.chunks:
    txt = (ch.markdown or '').replace('\n', ' ')[:70]
    b = ch.grounding.box
    print(f"p{ch.grounding.page} {ch.type:12} "
          f"l={b.left:.2f} t={b.top:.2f} r={b.right:.2f} b={b.bottom:.2f} | {txt}")

print(f"\nPages: {pr.metadata.page_count}  "
      f"Chunks: {len(pr.chunks)}  "
      f"Types: {dict(Counter(ch.type for ch in pr.chunks))}")
EOF
```

> **Cost note:** Save the parse result with `pr.model_dump()` to a JSON file
> after the first run. Load it for later development instead of calling
> `client.parse()` again. Only re-parse when the document set changes.
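
The caching advice in the cost note can be sketched as a small wrapper. This is an illustration under stated assumptions: `load_or_parse` is a hypothetical helper, `parse_fn` stands for any zero-argument callable that returns an object with `model_dump()`, and `default=str` is a defensive guess in case some field is not JSON-native.

```python
import json
from pathlib import Path
from typing import Any, Callable


def load_or_parse(cache_path: Path, parse_fn: Callable[[], Any]) -> dict:
    """Return the parse result as a dict, calling parse_fn only on a cache miss."""
    if cache_path.exists():
        # Cache hit: no API call, no credits spent
        return json.loads(cache_path.read_text(encoding="utf-8"))
    result = parse_fn()  # the only line that reaches the API
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    # default=str is an assumption: it stringifies any value json cannot encode
    payload = json.dumps(result.model_dump(), default=str, indent=2)
    cache_path.write_text(payload, encoding="utf-8")
    # Re-read so both code paths return exactly the same shape
    return json.loads(payload)
```

A call might look like `load_or_parse(Path('/tmp/ade_preflight/sample.json'), lambda: client.parse(document=Path('path/to/sample.pdf')))`. The return value is a plain dict, so fields are read as `data["chunks"]` rather than `pr.chunks`.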

### What to look for

| Observation | Implication |
|-------------|-------------|
| Heading is `1. Introduction` (plain text, no `#`) | ADE markdown won't use an ATX header → use ADE extract, not regex |
| Heading format varies across docs (`# INTRO` in one, `1. Intro` in another) | Regex will break on some docs → use ADE extract for robustness |
| Every `ch.markdown` starts with `<a id='...'></a>` | Strip anchor before string matching or display |
| Two-column: chunks on same page with `l=0.07` vs `l=0.50` | Text order is left column then right; sections may span both |
| Chunk text cut mid-word at page break | Section spans pages; collect chunks from multiple pages |
| `marginalia` chunks at `t<0.08` or `t>0.90` | Running headers / page numbers → exclude from content extraction |
| Scanned / handwritten content visible in page image | PyMuPDF text extraction won't work → use Tesseract OCR |
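
Two rows of the table, the anchor prefix and the `marginalia` noise, translate into small helpers. The sketch below is illustrative only: `strip_anchor` and `is_margin_noise` are hypothetical names, and the thresholds are simply the `t<0.08` / `t>0.90` values from the table, which may need adjusting per document.

```python
import re
from typing import Optional

# Matches a leading anchor such as <a id='abc123'></a> plus surrounding whitespace
ANCHOR_RE = re.compile(r"^\s*<a id=['\"][^'\"]*['\"]></a>\s*")


def strip_anchor(markdown: Optional[str]) -> str:
    """Remove a leading <a id='...'></a> anchor from a chunk's markdown."""
    return ANCHOR_RE.sub("", markdown or "", count=1)


def is_margin_noise(chunk_type: str, top: float) -> bool:
    """True for marginalia chunks near the top or bottom edge of the page."""
    return chunk_type == "marginalia" and (top < 0.08 or top > 0.90)


# Usage against a parse result `pr`, with attribute names as in the
# diagnostic script above:
# body = [strip_anchor(ch.markdown) for ch in pr.chunks
#         if not is_margin_noise(ch.type, ch.grounding.box.top)]
```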

### Tool 3 — Post-Crop Visual Verification (mandatory for bounding-box workflows) {#post-crop-verification}

After producing any bounding-box crop or overlay (figure extraction, chunk
cropping, table cell extraction, word-level grounding), **read back at least
one output PNG as an image** and describe what you see. Compare your
description against the user's request. This catches:

- **Wrong-page bugs** — ADE page numbers are 0-indexed; an off-by-one error
  lands the crop on an adjacent page with completely different content
- **Wrong-region bugs** — coordinate system mismatches that crop blank space
  or an unrelated section

> **Rule: never declare a crop workflow complete without visually reading at
> least one output PNG and confirming its content matches the user's request.**
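
Both bug classes can be narrowed down with explicit checks before cropping. The sketch below assumes box coordinates are normalized to the 0–1 range, which is what the diagnostic output (`l=0.07`, `t=0.90`) suggests but should be confirmed on a real parse result. `check_page_index` and `box_to_pixels` are hypothetical helper names.

```python
from typing import Tuple


def check_page_index(page: int, page_count: int) -> int:
    """Validate a 0-indexed page number before using it to select a page."""
    if not 0 <= page < page_count:
        raise IndexError(f"page {page} outside 0..{page_count - 1} (0-indexed)")
    return page


def box_to_pixels(left: float, top: float, right: float, bottom: float,
                  page_width: int, page_height: int) -> Tuple[int, int, int, int]:
    """Convert a normalized box to a pixel rectangle (x0, y0, x1, y1)."""
    for value in (left, top, right, bottom):
        # Assumption: coordinates are fractions of the page size
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"expected a coordinate in [0, 1], got {value}")
    if right <= left or bottom <= top:
        raise ValueError("box has no area")
    return (round(left * page_width), round(top * page_height),
            round(right * page_width), round(bottom * page_height))
```

The returned tuple has the (left, upper, right, lower) order that Pillow's `Image.crop` expects, provided `page_width` and `page_height` are the pixel dimensions of the rendered page image. These checks catch malformed input only; a crop of valid content from the wrong page still needs the visual read-back.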

#### Verification steps

1. Save the first crop as PNG (the workflow already does this)
2. Read the PNG file as an image (use the `read_file` tool on the PNG path)
3. Describe what you see: what content, table, figure, or text appears?
4. Compare against the user's request:
   - User asked for "the Events table" → does the crop show an Events table?
   - User asked for "Figure 3" → does the crop show a chart/diagram?
   - User asked for "Introduction section" → does the crop show intro text?
5. If the description doesn't match → investigate page indexing and
   bounding-box coordinates before continuing
6. Only proceed with remaining crops after the first one is verified

#### Why LLM vision, not heuristics

A blank-check heuristic (e.g. "mean brightness > 250 → blank") catches only
the most obvious failures. The agent's own vision capability can semantically
verify: "this crop shows a bar chart" vs "the user asked for a data table."
This catches wrong-page errors even when the crop contains valid content from
the wrong section.
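
For comparison, the blank-check heuristic mentioned above might look like the following. It is a sketch with a hypothetical name (`looks_blank`) that works on a flat sequence of 0–255 grayscale values, so it runs without an imaging library. It is useful as a cheap first filter and shows why it cannot replace the visual check.

```python
from statistics import fmean
from typing import Iterable


def looks_blank(gray_pixels: Iterable[int], threshold: float = 250.0) -> bool:
    """Return True when mean brightness (0-255 grayscale) exceeds the threshold."""
    values = list(gray_pixels)
    if not values:
        raise ValueError("no pixels supplied")
    return fmean(values) > threshold


# A crop full of text from the wrong page is dark enough to pass this test,
# which is exactly the failure the heuristic cannot detect.
wrong_page_crop = [255] * 80 + [0] * 20   # synthetic stand-in: mostly white with dark text
blank_crop = [255] * 100                  # synthetic stand-in: empty region
```

With Pillow installed, the pixel values for a saved crop could be obtained with `Image.open(path).convert("L").getdata()`.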

---

## Quick Reference — Building Blocks

| # | Block | Pattern | Reference |
|---|-------|---------|-----------|
| 0 | Pre-flight (mandatory) | Render pages + diagnostic parse before building | [Above](#pre-flight) |
| 1 | Parse + Save | Single doc → JSON + markdown | [Below](#core-workflow) |
| 2 | Parse + Extract + Save | Single doc → structured data | [Below](#core-workflow) |
| 3 | Batch (sync) | ThreadPoolExecutor + tqdm | [batch-processing.md](references/batch-processing.md) |
| 4 | Batch (async) | AsyncLandingAIADE + aiolimiter | [batch-processing.md](references/batch-processing.md) |
| 5 | Large files | Parse Jobs API (async polling) | [batch-processing.md](references/batch-processing.md) |
| 6 | Classify → Extract | Enum classification + schema routing | [Be

Files: 7

Size: 124.3 KB

Complexity: 55/100

Category: Image & Video

Source: https://github.com/andrewyng/context-hub/tree/main/content/landingai/skills/ade/document-workflows
