
pdf


Comprehensive PDF manipulation toolkit for extracting text and tables, creating new PDFs, merging/splitting documents, and handling forms. Use it when GLM needs to fill in a PDF form or to programmatically process, generate, or analyze PDF documents at scale.




# PDF Processing Guide

## Overview

This guide covers essential PDF processing operations using Python libraries and command-line tools. For advanced features, JavaScript libraries, and detailed examples, see reference.md. If you need to fill out a PDF form, read forms.md and follow its instructions.

Role: You are a Professional Document Architect and Technical Editor specializing in high-density, industry-standard PDF content creation. If the content is not rich enough, use the web-search skill first.

Objective: Generate content that is information-rich, structured for maximum professional utility, and optimized for a compact, low-padding layout without sacrificing readability.

---


## Core Constraints (Must Follow)

### 1. Output Language
**The generated PDF must use the same language as the user's query.**
- Chinese query → Generate Chinese PDF content
- English query → Generate English PDF content
- Explicit language specification → Follow user's choice
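The language rule can be applied mechanically before any content is written. Below is a minimal sketch. It assumes a query counts as "Chinese" when it contains at least one CJK ideograph (U+4E00 to U+9FFF, or Extension A U+3400 to U+4DBF). That is a simplification, so a mostly English query that quotes one Chinese term would also be classified as Chinese. The helpers `is_chinese` and `pick_fonts` and the dictionary keys they return are hypothetical. The font names come from the Font Setup table.

```python
from typing import Dict, Optional


def is_chinese(text: str) -> bool:
    """True if the text contains at least one CJK ideograph (simplified heuristic)."""
    return any(
        0x4E00 <= ord(ch) <= 0x9FFF or 0x3400 <= ord(ch) <= 0x4DBF for ch in text
    )


def pick_fonts(query: str, explicit_language: Optional[str] = None) -> Dict[str, str]:
    """Choose the output language and fonts; an explicit user choice always wins."""
    language = explicit_language or ("zh" if is_chinese(query) else "en")
    if language == "zh":
        return {"language": "zh", "heading": "Microsoft YaHei",
                "body": "SimHei", "code": "SarasaMonoSC"}
    # Assumption: every non-Chinese language falls back to the English fonts.
    return {"language": language, "heading": "Times New Roman",
            "body": "Times New Roman", "code": "DejaVuSans"}
```

For example, `pick_fonts("Write a report", explicit_language="zh")` returns the Chinese font set, which honors the third rule above.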

### 2. Page Count Control
- Follow the user's page specifications strictly

| User Input | Execution Rule |
|------------|----------------|
| Explicit count (e.g., "3 pages") | Match exactly; allow partial final page |
| Unspecified | Determine based on document type; prioritize completeness over brevity |

**Avoid these mistakes**:
- Cutting content short (brevity is not a valid excuse)
- Filling pages with low-density bullet lists (keep information dense)
- Creating documents over 2x the requested length
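Once the PDF is rendered, its page count can be checked against the table above. Below is a minimal sketch of that check. The `page_count_status` helper and its status strings are hypothetical. The actual count could come from a PDF reader, for instance `len(pypdf.PdfReader(path).pages)` if pypdf is available.

```python
from typing import Optional


def page_count_status(actual: int, requested: Optional[int] = None) -> str:
    """Classify a rendered page count against the page-count rules (hypothetical helper)."""
    if actual < 1:
        raise ValueError("a PDF must have at least one page")
    if requested is None:
        # Unspecified: length depends on document type, so no hard limit here.
        return "ok"
    if actual < requested:
        return "too short"
    if actual > requested:
        # Exceeding 2x the requested length is the worst case listed above.
        return "far too long" if actual > 2 * requested else "too long"
    # Exact match; the final page is allowed to be only partly filled.
    return "ok"
```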

**Resume/CV exception**:
- Target **1 page** by default unless otherwise instructed
- Apply tight margins: `margin: 1.5cm`
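ReportLab measures layout in points (1 inch = 72 pt), so a 1.5 cm margin is about 42.52 pt. In ReportLab itself this would typically be written as `SimpleDocTemplate(path, pagesize=A4, leftMargin=1.5*cm, rightMargin=1.5*cm, topMargin=1.5*cm, bottomMargin=1.5*cm)`, using `cm` from `reportlab.lib.units`.

The stdlib-only arithmetic below is a sketch showing how much content area those margins leave on an A4 page. The helper names are illustrative.

```python
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4

# A4 is 210 mm x 297 mm; convert to points.
A4_PT = (210 / MM_PER_INCH * POINTS_PER_INCH, 297 / MM_PER_INCH * POINTS_PER_INCH)


def cm_to_pt(value_cm: float) -> float:
    """Convert centimetres to PDF points."""
    return value_cm * 10 / MM_PER_INCH * POINTS_PER_INCH


def frame_size(page_pt, margin_cm: float):
    """Width and height left for content after applying equal margins on all sides."""
    margin = cm_to_pt(margin_cm)
    width, height = page_pt
    return width - 2 * margin, height - 2 * margin


# Resume layout: about 510.24 pt x 756.85 pt of usable space on A4.
usable_width, usable_height = frame_size(A4_PT, 1.5)
```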

### 3. Structure Compliance (Mandatory)
**User supplies outline**:
- **Strictly follow** the outline structure provided by user
- Match section names from outline (slight rewording OK; preserve hierarchy and sequence)
- Never add/remove sections on your own
- If structure seems flawed, **confirm with user** before changing

**No outline provided**:
- Deploy standard frameworks by document category:
  - **Academic papers**: IMRaD format (Introduction-Methods-Results-Discussion) or Introduction-Literature Review-Methods-Results-Discussion-Conclusion
  - **Business reports**: Top-down approach (Executive Summary → In-depth Analysis → Recommendations)
  - **Technical guides**: Overview → Core Concepts → Implementation → Examples → FAQ
  - **Academic assignments**: Match assignment rubric structure
- Ensure logical flow between sections without gaps
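The precedence above can be captured as a small lookup. A user-supplied outline always wins, and academic assignments follow their rubric. Otherwise a default framework is used.

This is a sketch only. The category keys, `DEFAULT_OUTLINES`, and `choose_outline` are hypothetical. For academic papers it uses the basic IMRaD variant of the two structures listed.

```python
from typing import Dict, List, Optional, Sequence

DEFAULT_OUTLINES: Dict[str, List[str]] = {
    "academic_paper": ["Introduction", "Methods", "Results", "Discussion"],
    "business_report": ["Executive Summary", "In-depth Analysis", "Recommendations"],
    "technical_guide": ["Overview", "Core Concepts", "Implementation", "Examples", "FAQ"],
}


def choose_outline(category: str,
                   user_outline: Optional[Sequence[str]] = None,
                   rubric: Optional[Sequence[str]] = None) -> List[str]:
    """Return the section list to write, applying the structure rules in order."""
    if user_outline:
        # Follow the user's outline exactly; never add or drop sections.
        return list(user_outline)
    if category == "academic_assignment":
        if not rubric:
            raise ValueError("assignments must follow the rubric; none was supplied")
        return list(rubric)
    try:
        return list(DEFAULT_OUTLINES[category])
    except KeyError:
        raise ValueError(f"no default outline for category {category!r}") from None
```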

### 4. Information Sourcing Requirements

#### CRITICAL: Verify Before Writing
**Never invent facts. If unsure, SEARCH immediately.**

Mandatory search triggers: you **MUST search FIRST** if the content includes ANY of the following:
- Quantitative data, metrics, percentages, rankings
- Legal/regulatory frameworks, policies, industry standards
- Scholarly findings, theoretical models, research methods
- Recent news, emerging trends
- **Any information you cannot verify with certainty**
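Some of these triggers are easy to spot in a draft before writing begins, while others are not. Below is a minimal, hypothetical regex heuristic that flags quantitative, time-sensitive, and regulatory wording. The patterns and category names are assumptions, and they cover only part of the list above. An empty result does NOT mean a draft is safe, because the last trigger (anything you cannot verify) cannot be detected by pattern matching.

```python
import re
from typing import List

# Hypothetical patterns covering only some of the mandatory triggers.
TRIGGER_PATTERNS = {
    "quantitative": re.compile(
        r"\d+(?:\.\d+)?\s*%|\b\d+(?:\.\d+)?\s*(?:million|billion|percent)\b", re.I),
    "year_or_recent": re.compile(
        r"\b(?:19|20)\d{2}\b|\b(?:latest|recent|emerging)\b", re.I),
    "regulatory": re.compile(
        r"\b(?:regulation|law|policy|standard|ISO\s?\d+)\b", re.I),
}


def search_triggers(draft: str) -> List[str]:
    """Return the names of trigger categories found in the draft text."""
    return [name for name, pattern in TRIGGER_PATTERNS.items() if pattern.search(draft)]
```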

### 5. Character Safety Rule (Mandatory)

**Golden Rule: Apart from plain ASCII text, every character in the final PDF must come from one of the following sources:**
1. CJK characters rendered by the registered Chinese fonts (SimHei / Microsoft YaHei)
2. Mathematical/relational operators (e.g., `+`, `−`, `×`, `÷`, `±`, `≤`, `√`, `∑`, `≅`, `∫`, `π`, `∠`)

**FORBIDDEN Unicode escape sequences (DO NOT USE):**
1. Superscript and subscript digits (never write forms such as `\u00b2` or `\u2082`)
2. Math operators and special symbols (never write forms such as `\u2245`, `\u0394`, `\u2212`, or `\u00d7`)
3. Emoji characters (never write forms such as `\u2728` or `\u2705`)

**Bold text, superscripts, and subscripts can ONLY be produced with ReportLab tags inside `Paragraph()` objects. Mathematical/relational operators are written as literal characters inside `Paragraph()`:**

| Need | Correct Method | Correct Example |
|------|---------------|---------|
| Superscript | `<super>` tag in `Paragraph()` | `Paragraph('10<super>2</super> × 10<super>3</super> = 10<super>5</super>', style)` |
| Subscript | `<sub>` tag in `Paragraph()` | `Paragraph('H<sub>2</sub>O', style)` |
| Bold | `<b>` tag in `Paragraph()` | `Paragraph('<b>Title</b>', style)` |
| Mathematical/relational operators | Literal char in `Paragraph()` | `Paragraph('AB ⊥ AC, ∠A = 90°, and ΔABC ≅ ΔDCF', style)` |
| Scientific notation | Combined tags in `Paragraph()` | `Paragraph('1.2 × 10<super>8</super> kg/m<super>3</super>', style)` |

```python
from reportlab.platypus import Paragraph
from reportlab.lib.styles import ParagraphStyle
from reportlab.lib.enums import TA_JUSTIFY, TA_CENTER

# Assumes 'Times New Roman' is already registered (see Font Setup below);
# ReportLab cannot resolve an unregistered fontName.
body_style = ParagraphStyle(
    name="ENBodyStyle",
    fontName="Times New Roman",
    fontSize=10.5,
    leading=18,
    alignment=TA_JUSTIFY,
)
header_style = ParagraphStyle(
    name="CoverTitle",
    fontName="Times New Roman",
    fontSize=42,
    leading=50,
    alignment=TA_CENTER,
    spaceAfter=36,
)

# Collect the flowables so they can later be passed to doc.build(story)
story = [
    # Superscript: area unit
    Paragraph('Total area: 500 m<super>2</super>', body_style),
    # Subscript: chemical formula
    Paragraph('The reaction produces CO<sub>2</sub> and H<sub>2</sub>O', body_style),
    # Scientific notation: large number with superscript
    Paragraph('Speed of light: 3.0 × 10<super>8</super> m/s', body_style),
    # Combined superscript and subscript
    Paragraph('E<sub>k</sub> = mv<super>2</super>/2', body_style),
    # Bold heading
    Paragraph('<b>Chapter 1: Introduction</b>', header_style),
    # Math symbols in body text (literal characters)
    Paragraph('When ∠A = 90°, AB ⊥ AC and ΔABC ≅ ΔDEF', body_style),
]
```

**Pre-generation check: before writing ANY string, ask:**
> "Does this string contain a character outside ASCII, basic CJK, or mathematical/relational operators?"
> If YES → it MUST be inside a `Paragraph()` with the appropriate tag.
> If it is a superscript/subscript digit in raw unicode escape sequence form → REPLACE with `<super>`/`<sub>` tag.

**NEVER rely on post-generation scanning. Prevent at the point of writing.**
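The check can be applied at the point of writing. Below is a minimal sketch that converts plain text into `Paragraph()`-safe markup. It first escapes `&`, `<`, and `>`, because `Paragraph()` parses its input as markup. It then rewrites runs of Unicode superscript/subscript digits as `<super>`/`<sub>` tags.

The function name `to_reportlab_markup` is hypothetical. It assumes the input is plain text, not text that already contains ReportLab tags, since escaping would break existing tags.

```python
import re
from xml.sax.saxutils import escape

# Unicode superscript/subscript digits, ordered 0-9 so the index is the digit.
SUPERSCRIPT_DIGITS = "⁰¹²³⁴⁵⁶⁷⁸⁹"
SUBSCRIPT_DIGITS = "₀₁₂₃₄₅₆₇₈₉"

_SUP_TO_DIGIT = {ch: str(i) for i, ch in enumerate(SUPERSCRIPT_DIGITS)}
_SUB_TO_DIGIT = {ch: str(i) for i, ch in enumerate(SUBSCRIPT_DIGITS)}

# Match whole runs so "10¹²" becomes one <super>12</super> tag, not two.
_SUP_RUN = re.compile("[" + SUPERSCRIPT_DIGITS + "]+")
_SUB_RUN = re.compile("[" + SUBSCRIPT_DIGITS + "]+")


def to_reportlab_markup(plain_text: str) -> str:
    """Escape markup characters, then replace Unicode super/subscript digits with tags."""
    text = escape(plain_text)  # & -> &amp;, < -> &lt;, > -> &gt;
    text = _SUP_RUN.sub(
        lambda m: "<super>" + "".join(_SUP_TO_DIGIT[c] for c in m.group()) + "</super>",
        text,
    )
    text = _SUB_RUN.sub(
        lambda m: "<sub>" + "".join(_SUB_TO_DIGIT[c] for c in m.group()) + "</sub>",
        text,
    )
    return text
```

The result is meant to be passed straight into a paragraph, e.g. `Paragraph(to_reportlab_markup(raw_text), body_style)`.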

## Font Setup (Guaranteed Success Method)

### CRITICAL: Allowed Fonts Only
**You MUST ONLY use the following registered fonts. Using ANY other font (such as Arial, Helvetica, Courier, Georgia, etc.) is STRICTLY FORBIDDEN and will cause rendering failures.**

| Font Name | Usage | Path |
|-----------|-------|------|
| `Microsoft YaHei` | Chinese headings | `/usr/share/fonts/truetype/chinese/msyh.ttf` |
| `SimHei` | Chinese body text | `/usr/share/fonts/truetype/chinese/SimHei.ttf` |
| `SarasaMonoSC` | Chinese code blocks | `/usr/share/fonts/truetype/chinese/SarasaMonoSC-Regular.ttf` |
| `Times New Roman` | English text, numbers, tables | `/usr/share/fonts/truetype/english/Times-New-Roman.ttf` |
| `Calibri` | English alternative | `/usr/share/fonts/truetype/english/calibri-regular.ttf` |
| `DejaVuSans` | Formulas, symbols, code | `/usr/share/fonts/truetype/dejavu/DejaVuSansMono.ttf` |

**FORBIDDEN fonts (DO NOT USE):**
- ❌ Arial, Arial-Bold, Arial-Italic
- ❌ Helvetica, Helvetica-Bold, Helvetica-Oblique
- ❌ Courier, Courier-Bold
- ❌ Any font not listed in the table above
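An allowlist makes this rule easy to enforce before any style is created. Below is a minimal sketch whose name-to-path mapping is copied from the table above. `require_allowed_font` and `missing_font_files` are hypothetical helpers. The second one only reports which listed files are absent on the current machine; it does not register anything.

```python
import os
from typing import Dict, List

# Copied from the allowed-fonts table; any other name is rejected.
ALLOWED_FONTS: Dict[str, str] = {
    "Microsoft YaHei": "/usr/share/fonts/truetype/chinese/msyh.ttf",
    "SimHei": "/usr/share/fonts/truetype/chinese/SimHei.ttf",
    "SarasaMonoSC": "/usr/share/fonts/truetype/chinese/SarasaMonoSC-Regular.ttf",
    "Times New Roman": "/usr/share/fonts/truetype/english/Times-New-Roman.ttf",
    "Calibri": "/usr/share/fonts/truetype/english/calibri-regular.ttf",
    "DejaVuSans": "/usr/share/fonts/truetype/dejavu/DejaVuSansMono.ttf",
}


def require_allowed_font(name: str) -> str:
    """Return the font file path, or raise ValueError for a forbidden font."""
    try:
        return ALLOWED_FONTS[name]
    except KeyError:
        raise ValueError(
            f"font {name!r} is not allowed; use one of {sorted(ALLOWED_FONTS)}"
        ) from None


def missing_font_files() -> List[str]:
    """List allowlisted font files that do not exist on this machine."""
    return [path for path in ALLOWED_FONTS.values() if not os.path.isfile(path)]
```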

**For bold text and superscript/subscript:** 
- Must call `registerFontFamily()` after registering fonts
- Then use `<b></b>`, `<super></super>`, `<sub></sub>` tags in Paragraph
- **CRITICAL**: These tags ONLY work inside `Paragraph()` objects, NOT in plain strings

### Font Registration Template
```python
from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont
from reportlab.pdfbase.pdfmetrics import registerFontFamily

# Chinese fonts
pdfmetrics.registerFont(TTFont('Microsoft YaHei', '/usr/share/fonts/truetype/chinese/msyh.ttf'))
pdfmetrics.registerFont(TTFont('SimHei', '/usr/share/fonts/truetype/chinese/SimHei.ttf'))
pdfmetrics.registerFont(TTFont("SarasaMonoSC", '/usr/share/fonts/truetype/chinese/SarasaMonoSC-Regular.ttf'))

# English fonts
pdfmetrics.registerFont(TTFont('Times New Roman', '/usr/share/fonts/truetype/english/Times-New-Roman.ttf'))
pdfmetrics.registerFont(TTFont('Calibri', '/usr/share/fonts/truetype/english/calibri-regular.ttf'))

# Symbol/Formula font
pdfmetrics.registerFont(TTFont("DejaVuSans", '/usr/share/fonts/truetype/dejavu/DejaVuSansMono.ttf'))
```
Files: 14
Size: 126.2 KB
Complexity: 70/100
Category: Writing & Docs