liteparse
Use this skill when the user asks to parse, perform multi-format document conversion or spatially extract text from an unstructured file (PDF, DOCX, PPTX, XLSX, images, etc.) locally without cloud dependencies.
What this skill does
# LiteParse Skill
Parse unstructured documents (PDF, DOCX, PPTX, XLSX, images, and more) locally with LiteParse: fast, lightweight, no cloud dependencies or LLM required.
## Initial Setup
When this skill is invoked, respond with:
```
I'm ready to use LiteParse to parse files locally. Before we begin, please confirm that:
- `@llamaindex/liteparse` is installed globally (`npm i -g @llamaindex/liteparse`)
- The `lit` CLI command is available in your terminal
If both are set, please provide:
1. One or more files to parse (PDF, DOCX, PPTX, XLSX, images, etc.)
2. Any specific options: output format (json/text), page ranges, OCR preferences, DPI, etc.
3. What you'd like to do with the parsed content.
I will produce the appropriate `lit` CLI command or TypeScript script, and once approved, report the results.
```
Then wait for the user's input.
---
## Step 0 — Install LiteParse (if needed)
If `liteparse` is not yet installed, install it globally:
```bash
npm i -g @llamaindex/liteparse
```
Verify installation:
```bash
lit --version
```
For Office document support (DOCX, PPTX, XLSX), LibreOffice is required:
```bash
# macOS
brew install --cask libreoffice
# Ubuntu/Debian
apt-get install libreoffice
```
For image parsing, ImageMagick is required:
```bash
# macOS
brew install imagemagick
# Ubuntu/Debian
apt-get install imagemagick
```
---
## Step 1 — Produce the CLI Command or Script
### Parse a Single File
```bash
# Basic text extraction
lit parse document.pdf
# JSON output saved to a file
lit parse document.pdf --format json -o output.json
# Specific page range
lit parse document.pdf --target-pages "1-5,10,15-20"
# Disable OCR (faster, text-only PDFs)
lit parse document.pdf --no-ocr
# Use an external HTTP OCR server for higher accuracy
lit parse document.pdf --ocr-server-url http://localhost:8828/ocr
# Higher DPI for better quality
lit parse document.pdf --dpi 300
```
### Batch Parse a Directory
```bash
lit batch-parse ./input-directory ./output-directory
# Only process PDFs, recursively
lit batch-parse ./input ./output --extension .pdf --recursive
```
### Generate Page Screenshots
Screenshots are useful for LLM agents that need to see visual layout.
```bash
# All pages
lit screenshot document.pdf -o ./screenshots
# Specific pages
lit screenshot document.pdf --pages "1,3,5" -o ./screenshots
# High-DPI PNG
lit screenshot document.pdf --dpi 300 --format png -o ./screenshots
# Page range
lit screenshot document.pdf --pages "1-10" -o ./screenshots
```
---
## Step 3 — Key Options Reference
### OCR Options
| Option | Description |
|--------|-------------|
| (default) | Tesseract.js — zero setup, built-in |
| `--ocr-language fra` | Set OCR language (ISO code) |
| `--ocr-server-url <url>` | Use external HTTP OCR server (EasyOCR, PaddleOCR, custom) |
| `--no-ocr` | Disable OCR entirely |
### Output Options
| Option | Description |
|--------|-------------|
| `--format json` | Structured JSON with bounding boxes |
| `--format text` | Plain text (default) |
| `-o <file>` | Save output to file |
### Performance / Quality Options
| Option | Description |
|--------|-------------|
| `--dpi <n>` | Rendering DPI (default: 150; use 300 for high quality) |
| `--max-pages <n>` | Limit pages parsed |
| `--target-pages <pages>` | Parse specific pages (e.g. `"1-5,10"`) |
| `--no-precise-bbox` | Disable precise bounding boxes (faster) |
| `--skip-diagonal-text` | Ignore rotated/diagonal text |
| `--preserve-small-text` | Keep very small text that would otherwise be dropped |
---
## Step 4 — Using a Config File
For repeated use with consistent options, generate a `liteparse.config.json`:
```json
{
"ocrLanguage": "en",
"ocrEnabled": true,
"maxPages": 1000,
"dpi": 150,
"outputFormat": "json",
"preciseBoundingBox": true,
"skipDiagonalText": false,
"preserveVerySmallText": false
}
```
For an HTTP OCR server:
```json
{
"ocrServerUrl": "http://localhost:8828/ocr",
"ocrLanguage": "en",
"outputFormat": "json"
}
```
Use with:
```bash
lit parse document.pdf --config liteparse.config.json
```
---
## Step 5 — HTTP OCR Server API (Advanced)
If the user wants to plug in a custom OCR backend, the server must implement:
- **Endpoint**: `POST /ocr`
- **Accepts**: `file` (multipart) and `language` (string) parameters
- **Returns**:
```json
{
"results": [
{ "text": "Hello", "bbox": [x1, y1, x2, y2], "confidence": 0.98 }
]
}
```
Ready-to-use wrappers exist for EasyOCR and PaddleOCR in the LiteParse repo.
---
## Supported Input Formats
| Category | Formats |
|----------|---------|
| PDF | `.pdf` |
| Word | `.doc`, `.docx`, `.docm`, `.odt`, `.rtf` |
| PowerPoint | `.ppt`, `.pptx`, `.pptm`, `.odp` |
| Spreadsheets | `.xls`, `.xlsx`, `.xlsm`, `.ods`, `.csv`, `.tsv` |
| Images | `.jpg`, `.jpeg`, `.png`, `.gif`, `.bmp`, `.tiff`, `.webp`, `.svg` |
Office documents require LibreOffice; images require ImageMagick. LiteParse auto-converts these formats to PDF before parsing.
Related in Ads & Marketing
ads
IncludedMulti-platform paid advertising audit and optimization skill. Analyzes Google, Meta, YouTube, LinkedIn, TikTok, Microsoft, and Apple Ads. 250+ checks with scoring, parallel agents, industry templates, and AI creative generation.
banana
IncludedAI image generation Creative Director powered by Google Gemini Nano Banana models. Use this skill for ANY request involving image creation, editing, visual asset production, or creative direction. Triggers on: generate an image, create a photo, edit this picture, design a logo, make a banner, visual for my anything, and all /banana commands. Handles text-to-image, image editing, multi-turn creative sessions, batch workflows, and brand presets.
rpg-migration-analyzer
IncludedAnalyzes legacy RPG (Report Program Generator) programs from AS/400 and IBM i systems for migration to modern Java applications. Extracts business logic from RPG III/IV/ILE source code, identifies data structures (D-specs), file operations (F-specs), program dependencies (CALLB/CALLP), and converts RPG constructs to Java equivalents. Generates migration reports, complexity estimates, and Java implementation strategies with POJO classes, JPA entities, and service methods. Use when modernizing AS/400 or IBM i legacy systems, analyzing RPG source files (.rpg, .rpgle, .RPGLE), converting RPG to Java, mapping data specifications to Java classes, planning legacy system migration, or when user mentions RPG analysis, Report Program Generator, RPG III/IV/ILE, AS/400 modernization, IBM i migration, packed decimal conversion, or mainframe application rewrite.
brand-library-architect
IncludedBuild a complete brand library for a product — visual asset render pipeline, brand documentation set (BRAND, COPY, MANIFESTO, BIOS, FAQ, GLOSSARY, TONE, PRICING), open-source convention files (README, CONTRIBUTING, SECURITY, CODE_OF_CONDUCT), and a self-contained press kit. This skill should be used when the user asks to "build a brand library / brand kit / press kit / brand assets" for a product, "set up a brand library workflow," "create a positioning manifesto plus visual identity," or any combination of brand documentation + visual asset pipeline. Apply phase-by-phase or run end-to-end. Templates are product-agnostic and use {{TOKEN}} placeholders the skill prompts the user to fill.
writing-tech-post
IncludedAuthors engineering blog posts end-to-end: launch deep-dives, incident postmortems, architecture migrations, performance case studies, tutorials, AI/agent system writeups, security disclosures, and research-to-product translations. Picks the correct archetype, plans the abstraction ladder, enforces an evidence cadence (diagrams, benchmarks, profiles, traces, code, ablations), tunes voice against publisher house styles (Datadog, Vercel, GitHub, AWS, Meta, Cloudflare, Jane Street), and runs a pre-publish gate for narrative momentum and disclosure ethics. Use when drafting a new engineering post, restructuring a draft that feels flat, deciding which evidence form belongs where, validating that depth and product context are balanced, or preparing a postmortem, migration, or performance narrative for external publication. Do not use for API reference documentation, README authoring, marketing copy, release notes, generic SEO content, ghost-written executive thought leadership, or non-engineering long-form essays.
blog-google
IncludedGoogle API integration for blog performance: PageSpeed Insights, CrUX Core Web Vitals with 25-week history, Search Console performance, URL Inspection, Indexing API, GA4 organic traffic, NLP entity analysis for E-E-A-T, YouTube video search for embedding, and Google Ads Keyword Planner. Progressive feature availability based on credential tier (API key, OAuth/service account, GA4, Ads). Shares config with claude-seo at ~/.config/claude-seo/google-api.json. Use when user says "google data", "page speed", "core web vitals", "search console", "indexation", "GA4", "keyword research", "nlp entities", "blog performance", "youtube search", "google api setup".