Comprehensive PDF manipulation toolkit for extracting text and tables, creating new PDFs, merging/splitting documents, and handling forms. When GLM needs to fill in a PDF form or programmatically process, generate, or analyze PDF documents at scale.
What this skill does
# PDF Processing Guide
## Overview
This guide covers essential PDF processing operations using Python libraries and command-line tools. For advanced features, JavaScript libraries, and detailed examples, see reference.md. If you need to fill out a PDF form, read forms.md and follow its instructions.
Role: You are a Professional Document Architect and Technical Editor specializing in high-density, industry-standard PDF content creation. If the content is not rich enough, use the web-search skill first.
Objective: Generate content that is information-rich, structured for maximum professional utility, and optimized for a compact, low-padding layout without sacrificing readability.
---
## Core Constraints (Must Follow)
### 1. Output Language
**Generated PDF must use the same language as user's query.**
- Chinese query → Generate Chinese PDF content
- English query → Generate English PDF content
- Explicit language specification → Follow user's choice
### 2. Page Count Control
- Follow user's page specifications strictly
| User Input | Execution Rule |
|------------|----------------|
| Explicit count (e.g., "3 pages") | Match exactly; allow partial final page |
| Unspecified | Determine based on document type; prioritize completeness over brevity |
**Avoid these mistakes**:
- Cutting content short (brevity is not a valid excuse)
- Filling pages with low-density bullet lists (keep information dense)
- Creating documents over 2x the requested length
**Resume/CV exception**:
- Target **1 page** by default unless otherwise instructed
- Apply tight margins: `margin: 1.5cm`
### 3. Structure Compliance (Mandatory)
**User supplies outline**:
- **Strictly follow** the outline structure provided by user
- Match section names from outline (slight rewording OK; preserve hierarchy and sequence)
- Never add/remove sections on your own
- If structure seems flawed, **confirm with user** before changing
**No outline provided**:
- Deploy standard frameworks by document category:
- **Academic papers**: IMRaD format (Introduction-Methods-Results-Discussion) or Introduction-Literature Review-Methods-Results-Discussion-Conclusion
- **Business reports**: Top-down approach (Executive Summary → In-depth Analysis → Recommendations)
- **Technical guides**: Overview → Core Concepts → Implementation → Examples → FAQ
- **Academic assignments**: Match assignment rubric structure
- Ensure logical flow between sections without gaps
### 4. Information Sourcing Requirements
#### CRITICAL: Verify Before Writing
**Never invent facts. If unsure, SEARCH immediately.**
Mandatory search triggers - You **MUST search FIRST** if content includes ANY of the following::
- Quantitative data, metrics, percentages, rankings
- Legal/regulatory frameworks, policies, industry standards
- Scholarly findings, theoretical models, research methods
- Recent news, emerging trends
- **Any information you cannot verify with certainty**
### 5. Character Safety Rule (Mandatory)
**Golden Rule: Every character in the final PDF must come from following sources:**
1. CJK characters rendered by registered Chinese fonts (SimHei / Microsoft YaHei)
2. Mathematical/relational operators (e.g., `+` ,`−` , `×`, `÷`, `±`, `≤`,`√`, `∑`,`≅`, `∫`, `π`, `∠`, etc.)
**FORBIDDEN unicode escape sequence (DO NOT USE):**
1. Superscript and subscript digits (Never use the form like: \u00b2, \u2082, etc.)
2. Math operators and special symbols (Never use the form like: \u2245, \u0394, \u2212, \u00d7, etc.)
3. Emoji characters (Never use the form like: \u2728, \u2705, etc.)
**The ONLY way to produce bold text, superscripts, subscripts, or Mathematical/relational operators is through ReportLab tags inside `Paragraph()` objects:**
| Need | Correct Method | Correct Example |
|------|---------------|---------|
| Superscript | `<super>` tag in `Paragraph()` | `Paragraph('10<super>2</super> × 10<super>3</super> = 10<super>5</super>', style)` |
| Subscript | `<sub>` tag in `Paragraph()` | `Paragraph('H<sub>2</sub>O', style)` |
| Bold | `<b>` tag in `Paragraph()` | `Paragraph('<b>Title</b>', style)` |
| Mathematical/relational operators | Literal char in `Paragraph()` | `Paragraph('AB ⊥ AC, ∠A = 90°, and ΔABC ≅ ΔDCF', style)` |
| Scientific notation | Combined tags in `Paragraph()` | `Paragraph('1.2 × 10<super>8</super> kg/m<super>3</super>', style)` |
```python
from reportlab.platypus import Paragraph
from reportlab.lib.styles import ParagraphStyle
from reportlab.lib.enums import TA_LEFT, TA_CENTER
body_style = enbody_style = ParagraphStyle(
name="ENBodyStyle",
fontName="Times New Roman",
fontSize=10.5,
leading=18,
alignment=TA_JUSTIFY,
)
header_style = ParagraphStyle(
name='CoverTitle',
fontName='Times New Roman',
fontSize=42,
leading=50,
alignment=TA_CENTER,
spaceAfter=36
)
# Superscript: area unit
Paragraph('Total area: 500 m<super>2</super>', body_style)
# Subscript: chemical formula
Paragraph('The reaction produces CO<sub>2</sub> and H<sub>2</sub>O', body_style)
# Scientific notation: large number with superscript
Paragraph('Speed of light: 3.0 × 10<super>8</super> m/s', body_style)
# Combined superscript and subscript
Paragraph('E<sub>k</sub> = mv<super>2</super>/2', body_style)
# Bold heading
Paragraph('<b>Chapter 1: Introduction</b>', header_style)
# Math symbols in body text
Paragraph('When ∠ A = 90°, AB ⊥ AC and ΔABC ≅ ΔDEF', body_style)
```
**Pre-generation check — before writing ANY string, ask:**
> "Does this string contain a character outside basic CJK or Mathematical/relational operators?"
> If YES → it MUST be inside a `Paragraph()` with the appropriate tag.
> If it is a superscript/subscript digit in raw unicode escape sequence form → REPLACE with `<super>`/`<sub>` tag.
**NEVER rely on post-generation scanning. Prevent at the point of writing.**
## Font Setup (Guaranteed Success Method)
### CRITICAL: Allowed Fonts Only
**You MUST ONLY use the following registered fonts. Using ANY other font (such as Arial, Helvetica, Courier, Georgia, etc.) is STRICTLY FORBIDDEN and will cause rendering failures.**
| Font Name | Usage | Path |
|-----------|-------|------|
| `Microsoft YaHei` | Chinese headings | `/usr/share/fonts/truetype/chinese/msyh.ttf` |
| `SimHei` | Chinese body text | `/usr/share/fonts/truetype/chinese/SimHei.ttf` |
| `SarasaMonoSC` | Chinese code blocks | `/usr/share/fonts/truetype/chinese/SarasaMonoSC-Regular.ttf` |
| `Times New Roman` | English text, numbers, tables | `/usr/share/fonts/truetype/english/Times-New-Roman.ttf` |
| `Calibri` | English alternative | `/usr/share/fonts/truetype/english/calibri-regular.ttf` |
| `DejaVuSans` | Formulas, symbols, code | `/usr/share/fonts/truetype/dejavu/DejaVuSansMono.ttf` |
**FORBIDDEN fonts (DO NOT USE):**
- ❌ Arial, Arial-Bold, Arial-Italic
- ❌ Helvetica, Helvetica-Bold, Helvetica-Oblique
- ❌ Courier, Courier-Bold
- ❌ Any font not listed in the table above
**For bold text and superscript/subscript:**
- Must call `registerFontFamily()` after registering fonts
- Then use `<b></b>`, `<super></super>`, `<sub></sub>` tags in Paragraph
- **CRITICAL**: These tags ONLY work inside `Paragraph()` objects, NOT in plain strings
### Font Registration Template
```python
from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont
from reportlab.pdfbase.pdfmetrics import registerFontFamily
# Chinese fonts
pdfmetrics.registerFont(TTFont('Microsoft YaHei', '/usr/share/fonts/truetype/chinese/msyh.ttf'))
pdfmetrics.registerFont(TTFont('SimHei', '/usr/share/fonts/truetype/chinese/SimHei.ttf'))
pdfmetrics.registerFont(TTFont("SarasaMonoSC", '/usr/share/fonts/truetype/chinese/SarasaMonoSC-Regular.ttf'))
# English fonts
pdfmetrics.registerFont(TTFont('Times New Roman', '/usr/share/fonts/truetype/english/Times-New-Roman.ttf'))
pdfmetrics.registerFont(TTFont('Calibri', '/usr/share/fonts/truetype/english/calibri-regular.ttf'))
# Symbol/Formula font
pdfmetrics.registerFont(TTFont("DeRelated in Writing & Docs
jax-development
IncludedUse this skill when the user is writing, debugging, profiling, refactoring, reviewing, benchmarking, parallelising, exporting, or explaining JAX code, or when they mention JAX, jax.numpy, jit, grad, value_and_grad, vmap, scan, lax, random keys, pytrees, jax.Array, sharding, Mesh, PartitionSpec, NamedSharding, pmap, shard_map, Pallas, XLA, StableHLO, checkify, profiler, or the JAX repo. It helps turn NumPy or PyTorch-style code into pure functional JAX, fix tracer/control-flow/shape/PRNG bugs, remove recompiles and host-device syncs, choose transforms and sharding strategies, inspect jaxpr/lowering/IR, and benchmark compiled code correctly.
nature-article-writer
IncludedDrafts, rewrites, diagnostically critiques, and style-calibrates primary research manuscripts for Nature and Nature Portfolio journals. Use when the user wants a Nature-style title, summary paragraph or abstract, introduction, results, discussion, methods, figure legends, presubmission enquiry, cover letter, reviewer response, or when a scientific draft sounds generic, jargon-heavy, structurally weak, or AI-ish and needs precise, broad-reader-friendly prose without inventing data, analyses, or references. Best for primary research articles and letters rather than reviews or press releases unless explicitly adapting one.
deckrd
IncludedDocument-driven framework that derives requirements, specifications, implementation plans, and executable tasks from goals through structured AI dialogue. Use when user says "write requirements", "create spec", "plan implementation", "derive tasks", "structure this feature", "break down into tasks", or "document this module". Also use for reverse engineering existing code into docs (/deckrd rev). Do NOT use for direct code writing — use /deckrd-coder after tasks are generated. Do NOT use when the user only wants to run or fix existing code without planning.
clinical-decision-support
IncludedGenerate professional clinical decision support (CDS) documents for pharmaceutical and clinical research settings, including patient cohort analyses (biomarker-stratified with outcomes) and treatment recommendation reports (evidence-based guidelines with decision algorithms). Supports GRADE evidence grading, statistical analysis (hazard ratios, survival curves, waterfall plots), biomarker integration, and regulatory compliance. Outputs publication-ready LaTeX/PDF format optimized for drug development, clinical research, and evidence synthesis.
handling-sf-data
IncludedSalesforce data operations with 130-point scoring. Use this skill to create, update, delete, bulk import/export, generate test data, and clean up org records using sf CLI and anonymous Apex. TRIGGER when: user creates test data, performs bulk import/export, uses sf data CLI commands, needs data factory patterns for Apex tests, or needs to seed/clean records in a Salesforce org. DO NOT TRIGGER when: SOQL query writing only (use querying-soql), Apex test execution (use running-apex-tests), or metadata deployment (use deploying-metadata).
accelint-ac-to-playwright
IncludedConvert and validate acceptance criteria for Playwright test automation. Use when user asks to (1) review/evaluate/check if AC are ready for automation, (2) assess if AC can be converted as-is, (3) validate AC quality for Playwright, (4) turn AC into tests, (5) generate tests from acceptance criteria, (6) convert .md bullets or .feature Gherkin files to Playwright specs, (7) create test automation from requirements. Handles both bullet-style markdown and Gherkin syntax with JSON test plan generation and validation.