nutrient-document-processing

Included with Lifetime

$97 forever

Process documents with Nutrient DWS. Use when the user wants to generate PDFs from HTML or URLs, convert Office/images/PDFs, assemble or split packets, OCR scans, extract text/tables/key-value pairs, redact PII, watermark, sign, fill forms, optimize PDFs, or produce compliance outputs like PDF/A or PDF/UA. Triggers include convert to PDF, merge these PDFs, OCR this scan, extract tables, redact PII, sign this PDF, make this PDF/A, or linearize for web delivery.

Web Devscriptsassets

What this skill does


# Nutrient Document Processing

Use Nutrient DWS for managed document workflows where fidelity, compliance, or multi-step processing matters more than local-tool convenience.

## Setup
- Get a Nutrient DWS API key at <https://dashboard.nutrient.io/sign_up/?product=processor>.
- Direct API calls use `Authorization: Bearer $NUTRIENT_API_KEY`.
  ```bash
  export NUTRIENT_API_KEY="nutr_sk_..."
  ```
- MCP setups commonly use `@nutrient-sdk/dws-mcp-server` with `NUTRIENT_DWS_API_KEY`.
- Scripts live in `scripts/` relative to this SKILL.md. Use the directory containing this SKILL.md as the working directory:
  ```bash
  cd <directory containing this SKILL.md> && uv run scripts/<script>.py --help
  ```
- Page ranges use `start:end` with 0-based indexes and end-exclusive semantics. Negative indexes count from the end.

## When to use
- Generate PDFs from HTML templates, uploaded assets, or remote URLs.
- Convert Office, HTML, image, and PDF files between supported formats.
- OCR scans and extract text, tables, or key-value pairs.
- Redact PII, watermark, sign, fill forms, merge, split, rotate, flatten, or encrypt PDFs.
- Produce delivery targets like PDF/A, PDF/UA, optimized PDFs, or linearized PDFs.
- Check credits before large, batch, or AI-heavy runs.

## Tool preference
1. Prefer `scripts/*.py` for covered single-operation workflows.
2. Use `assets/templates/custom-workflow-template.py` for multi-step jobs that should still run through the Python client.
3. Use the modular `references/` docs and direct API payloads for capabilities that do not yet have a dedicated helper script, especially HTML/URL generation and compliance tuning.
4. Use local PDF utilities only for lightweight inspection. Use Nutrient when output fidelity or compliance matters.

## Single-operation scripts
- `convert.py` -> convert between `pdf`, `pdfa`, `pdfua`, `docx`, `xlsx`, `pptx`, `png`, `jpeg`, `webp`, `html`, and `markdown`
- `merge.py` -> merge multiple files into one PDF
- `split.py` -> split one PDF into multiple PDFs by page ranges
- `add-pages.py` -> append blank pages
- `delete-pages.py` -> remove specific pages
- `duplicate-pages.py` -> reorder or duplicate pages into a new PDF
- `rotate.py` -> rotate selected pages
- `ocr.py` -> OCR scanned PDFs or images
- `extract-text.py` -> extract text to JSON
- `extract-table.py` -> extract tables
- `extract-key-value-pairs.py` -> extract key-value pairs
- `watermark-text.py` -> apply a text watermark
- `redact-ai.py` -> detect and apply AI-powered redactions
- `sign.py` -> digitally sign a local PDF
- `password-protect.py` -> write encrypted output PDFs
- `optimize.py` -> apply optimization and linearization-style options via JSON

## Multi-Step Workflow Rule
Do not add new committed pipeline scripts under `scripts/`.

When the user asks for multiple operations in one run:
1. Copy `assets/templates/custom-workflow-template.py` to a temporary location such as `/tmp/ndp-workflow-<task>.py`.
2. Implement the combined workflow in that temporary script.
3. Run it with `uv run /tmp/ndp-workflow-<task>.py ...`.
4. Return generated output files.
5. Delete the temporary script unless the user explicitly asks to keep it.

## PDF Requirements
- `split.py` requires a multi-page PDF and cannot extract ranges from a single-page document.
- `delete-pages.py` must retain at least one page and cannot delete the entire document.
- `sign.py` only accepts local file paths for the main PDF.

## Decision rules
- Prefer a helper script when one already covers the requested operation cleanly.
- If you control the source markup, prefer HTML generation over browser print workflows.
- Use remote `file.url` inputs when the source already lives at a stable URL and you want to avoid local uploads.
- Use `output.type` for conversion and finalization targets. Use `actions` for transformations when building direct API payloads.
- OCR before text extraction, key-value extraction, or semantic redaction on scans.
- Prefer preset or regex redaction when the target is explicit. Use AI redaction only for contextual or natural-language requests.
- Use the PDF manipulation reference for merge, split, rotate, flatten, and page-range workflows instead of inferring those payloads from conversion examples.
- Treat PDF/A and PDF/UA as compliance targets, not cosmetic export formats. Choose the target up front and validate final artifacts when requirements are contractual.
- For PDF/UA, clean born-digital inputs and structured HTML usually tag better than rasterized or flattened source PDFs.
- For delivery optimization, linearize or optimize unsigned output artifacts instead of mutating already signed files.
- When the user asks for multiple steps, keep destructive or final steps late in the sequence. Use the workflow recipes when ordering is ambiguous.

## Anti-patterns
- Do not OCR born-digital PDFs just because the task mentions extraction. Extract first and OCR only if the text layer is missing.
- Do not flatten forms or annotations until the user confirms the artifact no longer needs to stay editable.
- Do not sign, archive, or linearize intermediate working files. Keep those as final-delivery steps.
- Do not promise PDF/A or PDF/UA compliance without a validation step when the requirement is contractual.
- Do not commit temporary workflow scripts under `scripts/`.

## Reference map
Read only what you need:

- `references/request-basics.md` -> endpoint model, auth, multipart vs JSON, credits, limits, and errors
- `references/generation-and-conversion.md` -> HTML/URL generation and format conversion
- `references/pdf-manipulation.md` -> merge, split, page-range, rotate, and flatten workflows
- `references/extraction-and-ocr.md` -> OCR, text extraction, tables, and key-value workflows
- `references/security-signing-and-forms.md` -> redaction, watermarking, signatures, forms, and passwords
- `references/compliance-and-optimization.md` -> PDF/A, PDF/UA, optimization, and linearization
- `references/workflow-recipes.md` -> end-to-end sequencing patterns for common business document workflows

## Rules
- Fail fast when required arguments are missing.
- Write outputs to explicit paths and print created files.
- Do not log secrets.
- All client methods are async and should run via `asyncio.run(main())`.
- If import fails, install dependency with `uv add nutrient-dws`.

## Security Hardening Addendum

- Prefer a pinned, preinstalled MCP server binary over runtime package fetches.
  - Preferred: `npm i -g @nutrient-sdk/dws-mcp-server@<pinned-version>`
  - Avoid unpinned runtime fetch in production paths.
- Never store `NUTRIENT_DWS_API_KEY` in committed JSON config files.
  - Use process env injection at runtime (shell/export, secrets manager, or host env).
- Restrict file access with `SANDBOX_PATH` to the minimum required working directory.
- Before enabling MCP mode in production, verify package provenance and lock version.

Files: 31

Size: 73.6 KB

Complexity: 97/100

Category: Web Dev

Source: https://github.com/pspdfkit-labs/nutrient-agent-skill/tree/main/nutrient-document-processing

Related in Web Dev

generating-lwc-components

Included

Lightning Web Components with PICKLES methodology and 165-point scoring. Use this skill when the user creates or edits LWC components, builds wire service patterns, or writes Jest tests for LWC. TRIGGER when: user creates/edits LWC components, touches lwc/**/*.js, .html, .css, .js-meta.xml files, or asks about wire service, SLDS, or Jest LWC tests. DO NOT TRIGGER when: Apex classes (use generating-apex), Aura components, or Visualforce.

Web Devscripts

tanstack-query

Included

Manage server state in React with TanStack Query v5. Set up queries with useQuery, mutations with useMutation, configure QueryClient caching strategies, implement optimistic updates, and handle infinite scroll with useInfiniteQuery. Use when: setting up data fetching in React projects, migrating from v4 to v5, or fixing object syntax required errors, query callbacks removed issues, cacheTime renamed to gcTime, isPending vs isLoading confusion, keepPreviousData removed problems.

Web Devscripts

document-processor-api

Included

Web Devscripts

tanstack-query

Included

Manage server state in React with TanStack Query v5. Covers useMutationState, simplified optimistic updates, throwOnError, network mode (offline/PWA), and infiniteQueryOptions. Use when setting up data fetching, fixing v4→v5 migration errors (object syntax, gcTime, isPending, keepPreviousData), or debugging SSR/hydration issues with streaming server components.

Web Devscripts

accelint-nextjs-best-practices

Included

Next.js performance optimization and best practices. Use when writing Next.js code (App Router or Pages Router); implementing Server Components, Server Actions, or API routes; optimizing RSC serialization, data fetching, or server-side rendering; reviewing Next.js code for performance issues; fixing authentication in Server Actions; or implementing Suspense boundaries, parallel data fetching, or request deduplication.

Web Devscripts

clerk-auth

Included

Clerk auth with API version 2025-11-10 breaking changes (billing endpoints, payment_source→payment_method), Next.js v6 async auth(), PKCE for custom OAuth, credential stuffing defense. Use when: troubleshooting "Missing Clerk Secret Key", JWKS errors, authorizedParties CSRF, JWT size limits (1.2KB), 431 header errors (Vite dev mode), or testing with 424242 OTP.

Web Devscripts