web-to-markdown

Convert a web URL into cleaned Markdown with deterministic routing. Use when Codex needs to read article-like content from links and should apply source-aware fetch strategies: default to r.jina.ai for general pages (including X/Twitter), use defuddle.md for YouTube links, and use browser-impersonated extraction for WeChat/Zhihu/Feishu pages with Mozilla Readability cleanup.

# Web To Markdown

Convert URLs into usable Markdown by applying domain-aware fetching routes, then return the cleaned content directly.

## Quick Workflow

1. Normalize and validate the input URL.
2. Select route:
   - `r.jina.ai`: general web + X/Twitter.
   - `defuddle.md`: YouTube transcript/content extraction.
   - `special-browser-fetch`: WeChat/Zhihu/Feishu.
3. Return markdown text (or JSON metadata if needed).

For generic URLs (non-YouTube, non-WeChat/Zhihu/Feishu), use this fallback chain:

- try `r.jina.ai` first,
- if it fails, fall back to direct HTTP fetch + Readability,
- if direct fetch still fails or returns shell-like content, fall back to browser extraction.
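
The fallback chain above could be sketched roughly as follows. This is a minimal illustration, not the code in `scripts/url_to_markdown.mjs`: the names `looksLikeShell` and `fetchWithFallback` and the 200-character threshold are assumptions, and for simplicity the sketch applies the shell-content check to every stage rather than only to the direct fetch.

```javascript
// Hypothetical heuristic: treat missing or very short output as "shell-like".
// The 200-character threshold is an assumption, not a value from the skill.
function looksLikeShell(markdown, minChars = 200) {
  return typeof markdown !== "string" || markdown.trim().length < minChars;
}

// Try each stage in order and return the first usable result.
// `stages` is an array of { name, run }, where run(url) resolves to markdown.
async function fetchWithFallback(url, stages) {
  const errors = [];
  for (const { name, run } of stages) {
    try {
      const markdown = await run(url);
      if (!looksLikeShell(markdown)) return { strategy: name, markdown };
      errors.push(`${name}: shell-like content`);
    } catch (err) {
      errors.push(`${name}: ${err.message}`);
    }
  }
  // Every stage failed, so surface all reasons in a single error.
  throw new Error(`All stages failed for ${url}: ${errors.join("; ")}`);
}
```

Passing the stages in as functions keeps the ordering logic separate from the individual fetchers, so each fetcher can be stubbed out when testing.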

## Commands

Run from this skill directory (`skills/web-to-markdown`):

```bash
npm install
node scripts/url_to_markdown.mjs <url>
```

Return metadata with markdown:

```bash
node scripts/url_to_markdown.mjs <url> --json
```

Force special-site browser extraction:

```bash
node scripts/fetch_special_sites.mjs <url> --json
```

## Routing Policy

- Default route: `https://r.jina.ai/<url>`.
- YouTube (`youtube.com`, `youtu.be`): `https://defuddle.md/<url>`.
- X/Twitter (`x.com`, `twitter.com`): `https://r.jina.ai/<url>`.
- WeChat/Zhihu/Feishu: run `scripts/fetch_special_sites.mjs`.
- If input is already proxy-formatted (`https://defuddle.md/https://...` or `https://r.jina.ai/https://...`), normalize back to the original URL and re-apply routing.
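
A minimal sketch of this routing policy, under stated assumptions: the function names are hypothetical, and the special-site hostnames (`mp.weixin.qq.com`, `zhihu.com`, `feishu.cn`) are guesses, since the policy names the sites but not their exact hosts. The strategy labels reuse the values listed in the output contract.

```javascript
const PROXY_PREFIXES = ["https://r.jina.ai/", "https://defuddle.md/"];
const YOUTUBE_HOSTS = ["youtube.com", "youtu.be"];
// Assumed hostnames for WeChat/Zhihu/Feishu; not taken from the skill.
const SPECIAL_HOSTS = ["mp.weixin.qq.com", "zhihu.com", "feishu.cn"];

// True when hostname equals a listed host or is a subdomain of one.
function hostMatches(hostname, hosts) {
  return hosts.some((h) => hostname === h || hostname.endsWith("." + h));
}

// Strip an already-applied proxy prefix so routing starts from the original URL.
function normalizeUrl(input) {
  let url = input.trim();
  for (const prefix of PROXY_PREFIXES) {
    if (url.startsWith(prefix + "http://") || url.startsWith(prefix + "https://")) {
      url = url.slice(prefix.length);
      break;
    }
  }
  return new URL(url).href; // throws a TypeError on invalid input
}

// Pick a route for the normalized URL; the default is r.jina.ai.
function selectRoute(input) {
  const resolvedUrl = normalizeUrl(input);
  const { hostname } = new URL(resolvedUrl);
  if (hostMatches(hostname, YOUTUBE_HOSTS)) {
    return { strategy: "defuddle", resolvedUrl, fetchUrl: "https://defuddle.md/" + resolvedUrl };
  }
  if (hostMatches(hostname, SPECIAL_HOSTS)) {
    // Special sites are handled by a local script, so there is no proxy URL.
    return { strategy: "special-http-fetch", resolvedUrl, fetchUrl: null };
  }
  return { strategy: "r-jina", resolvedUrl, fetchUrl: "https://r.jina.ai/" + resolvedUrl };
}
```

X/Twitter needs no branch of its own here because it falls through to the default route.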

## Special-Site Extraction Behavior

Use a two-stage strategy for WeChat/Zhihu/Feishu:

1. Try `cuimp` HTTP/TLS impersonation first, then clean HTML with Mozilla Readability.
2. If stage 1 fails or returns blocked/shell content, fall back to `puppeteer-extra` browser impersonation.

- HTTP stage impersonates a modern Chrome TLS/HTTP profile via `cuimp`.
- Browser stage impersonates a modern Chrome user agent and standard `sec-ch-ua` headers.
- Remove known login modals and backdrop overlays (best effort).
- Scroll the page to trigger lazy-loaded article blocks.
- Parse cleaned document with Mozilla Readability.
- Convert extracted HTML body to Markdown via Turndown.
- Resolve browser executable from `CHROME_PATH` first, then system Chrome/Chromium/Edge paths.

If special-site extraction fails due to anti-bot checks, account-only pages, or network limits, report failure clearly and ask for fallback input (for example raw page text).
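
The browser-executable lookup (`CHROME_PATH` first, then system paths) might look something like the sketch below. The function name and the candidate paths are illustrative assumptions; the actual list used by `scripts/fetch_special_sites.mjs` is not shown in this document.

```javascript
// Example system locations only; these paths are assumptions.
const DEFAULT_CANDIDATES = [
  "/usr/bin/google-chrome",
  "/usr/bin/chromium",
  "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
];

// `exists` is injected (for example fs.existsSync) so the lookup stays testable.
function resolveBrowserPath(env, exists, candidates = DEFAULT_CANDIDATES) {
  const override = (env.CHROME_PATH || "").trim();
  if (override) return override; // explicit configuration wins
  // Otherwise return the first candidate that exists, or null if none do.
  return candidates.find((p) => exists(p)) ?? null;
}
```

In a Node script this would typically be called as `resolveBrowserPath(process.env, fs.existsSync)`, with a `null` result reported as a clear failure rather than passed on to the browser launcher.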

## Output Contract

For normal usage, output markdown only.

When `--json` is used, return:

- `source`: backend source (`r.jina.ai`, `defuddle`, `cuimp`, `browser-readability`).
- `strategy`: selected route (`r-jina`, `defuddle`, `special-http-fetch`, `special-browser-fetch-fallback`).
- `requestedUrl`: original input.
- `resolvedUrl`: normalized/final URL.
- `markdown`: extracted markdown body.
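
A caller consuming the `--json` output may want to check it against this contract before using it. The sketch below assumes all five fields are strings, which the contract implies but does not state; `missingFields` is a hypothetical helper, not part of the skill.

```javascript
const REQUIRED_FIELDS = ["source", "strategy", "requestedUrl", "resolvedUrl", "markdown"];

// Return the names of contract fields that are absent or not strings.
function missingFields(result) {
  return REQUIRED_FIELDS.filter((key) => typeof result?.[key] !== "string");
}

// Illustrative object only; the values are made up for the example.
const example = {
  source: "r.jina.ai",
  strategy: "r-jina",
  requestedUrl: "https://example.com/post",
  resolvedUrl: "https://example.com/post",
  markdown: "# Example",
};
console.log(missingFields(example).length === 0 ? "contract ok" : "contract violated");
```

An empty array means the object satisfies the contract as described here; any returned names identify which fields to report as missing.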

## Resources

- [references/routing-and-notes.md](references/routing-and-notes.md): domain routing rules and operational caveats.
- `scripts/url_to_markdown.mjs`: primary entrypoint.
- `scripts/fetch_special_sites_http.mjs`: WeChat/Zhihu/Feishu HTTP impersonation fetcher (`cuimp` JS).
- `scripts/fetch_special_sites.mjs`: two-stage extractor (HTTP-first, browser-fallback).

Files: 9

Size: 128.0 KB

Complexity: 69/100

Category: Writing & Docs

Source: https://github.com/rookie-ricardo/erduo-skills/tree/main/skills/web-to-markdown
