web-to-markdown
Convert a web URL into cleaned Markdown with deterministic routing. Use when Codex needs to read article-like content from links and should apply source-aware fetch strategies: default to r.jina.ai for general pages (including X/Twitter), use defuddle.md for YouTube links, and use browser-impersonated extraction for WeChat/Zhihu/Feishu pages with Mozilla Readability cleanup.
What this skill does
# Web To Markdown

Convert URLs into usable Markdown by applying domain-aware fetching routes, then return the cleaned content directly.

## Quick Workflow

1. Normalize and validate the input URL.
2. Select route:
   - `r.jina.ai`: general web + X/Twitter.
   - `defuddle.md`: YouTube transcript/content extraction.
   - `special-browser-fetch`: WeChat/Zhihu/Feishu.
3. Return markdown text (or JSON metadata if needed).

For generic URLs (non-YouTube, non-WeChat/Zhihu/Feishu), use this fallback chain:

- Try `r.jina.ai` first.
- If it fails, fall back to direct HTTP fetch + Readability.
- If direct fetch still fails or returns shell-like content, fall back to browser extraction.

## Commands

Run from this skill directory (`skills/web-to-markdown`):

```bash
npm install
node scripts/url_to_markdown.mjs <url>
```

Return metadata with markdown:

```bash
node scripts/url_to_markdown.mjs <url> --json
```

Force special-site browser extraction:

```bash
node scripts/fetch_special_sites.mjs <url> --json
```

## Routing Policy

- Default route: `https://r.jina.ai/<url>`.
- YouTube (`youtube.com`, `youtu.be`): `https://defuddle.md/<url>`.
- X/Twitter (`x.com`, `twitter.com`): `https://r.jina.ai/<url>`.
- WeChat/Zhihu/Feishu: run `scripts/fetch_special_sites.mjs`.
- If input is already proxy-formatted (`https://defuddle.md/https://...` or `https://r.jina.ai/https://...`), normalize back to the original URL and re-apply routing.

## Special-Site Extraction Behavior

Use a two-stage strategy for WeChat/Zhihu/Feishu:

1. Try `cuimp` HTTP/TLS impersonation first, then clean HTML with Mozilla Readability.
2. If stage 1 fails or returns blocked/shell content, fall back to `puppeteer-extra` browser impersonation.

Stage details:

- HTTP stage impersonates a modern Chrome TLS/HTTP profile via `cuimp`.
- Browser stage impersonates a modern Chrome user agent and standard `sec-ch-ua` headers.
- Remove known login modals and backdrop overlays (best effort).
- Scroll the page to trigger lazy-loaded article blocks.
- Parse the cleaned document with Mozilla Readability.
- Convert the extracted HTML body to Markdown via Turndown.
- Resolve the browser executable from `CHROME_PATH` first, then system Chrome/Chromium/Edge paths.

If special-site extraction fails due to anti-bot checks, account-only pages, or network limits, report the failure clearly and ask for fallback input (for example, raw page text).

## Output Contract

For normal usage, output markdown only.

When `--json` is used, return:

- `source`: backend source (`r.jina.ai`, `defuddle`, `cuimp`, `browser-readability`).
- `strategy`: selected route (`r-jina`, `defuddle`, `special-http-fetch`, `special-browser-fetch-fallback`).
- `requestedUrl`: original input.
- `resolvedUrl`: normalized/final URL.
- `markdown`: extracted markdown body.

## Resources

- [references/routing-and-notes.md](references/routing-and-notes.md): domain routing rules and operational caveats.
- `scripts/url_to_markdown.mjs`: primary entrypoint.
- `scripts/fetch_special_sites_http.mjs`: WeChat/Zhihu/Feishu HTTP impersonation fetcher (`cuimp` JS).
- `scripts/fetch_special_sites.mjs`: two-stage extractor (HTTP-first, browser-fallback).
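
The routing policy described above (strip any proxy prefix, then pick a route by hostname) might be sketched roughly as follows. This is an illustrative sketch only, not the contents of `scripts/url_to_markdown.mjs`: the function names `normalizeUrl`, `selectRoute`, and `hostMatches` are hypothetical, and the special-site hostnames (`mp.weixin.qq.com`, `zhihu.com`, `feishu.cn`) are assumptions, since the skill names the sites but not their exact domains.

```javascript
// Hypothetical sketch of the routing policy; names below are illustrative,
// not taken from the real scripts.
const PROXY_PREFIXES = ['https://r.jina.ai/', 'https://defuddle.md/'];
const YOUTUBE_HOSTS = ['youtube.com', 'youtu.be'];
// Assumed hostnames: the skill lists WeChat/Zhihu/Feishu without exact domains.
const SPECIAL_HOSTS = ['mp.weixin.qq.com', 'zhihu.com', 'feishu.cn'];

// True when hostname equals the domain or is a subdomain of it.
function hostMatches(hostname, domain) {
  return hostname === domain || hostname.endsWith('.' + domain);
}

// Strip proxy prefixes so routing is re-applied to the original URL,
// then validate that the result is an http(s) URL.
function normalizeUrl(input) {
  let url = String(input).trim();
  let stripped = true;
  while (stripped) {
    stripped = false;
    for (const prefix of PROXY_PREFIXES) {
      const rest = url.slice(prefix.length);
      if (url.startsWith(prefix) && /^https?:\/\//.test(rest)) {
        url = rest;
        stripped = true;
      }
    }
  }
  const parsed = new URL(url); // throws TypeError on malformed input
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    throw new Error('Unsupported protocol: ' + parsed.protocol);
  }
  return parsed.href;
}

// Pick a route label and, for proxy routes, the URL to fetch.
function selectRoute(input) {
  const resolvedUrl = normalizeUrl(input);
  const { hostname } = new URL(resolvedUrl);
  if (YOUTUBE_HOSTS.some((d) => hostMatches(hostname, d))) {
    return { route: 'defuddle.md', resolvedUrl, fetchUrl: 'https://defuddle.md/' + resolvedUrl };
  }
  if (SPECIAL_HOSTS.some((d) => hostMatches(hostname, d))) {
    // Special sites are handled by a local script rather than a proxy URL.
    return { route: 'special-browser-fetch', resolvedUrl, fetchUrl: null };
  }
  // Default route, which also covers x.com and twitter.com.
  return { route: 'r.jina.ai', resolvedUrl, fetchUrl: 'https://r.jina.ai/' + resolvedUrl };
}
```

Under these assumptions, `selectRoute('https://r.jina.ai/https://youtu.be/abc')` would first recover `https://youtu.be/abc` and then choose the `defuddle.md` route, which matches the "normalize back to the original URL and re-apply routing" rule. The generic fallback chain (direct fetch, then browser extraction) is deliberately left out of this sketch.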