article-exporter

Export any web article to a local Obsidian-ready Markdown directory. Fetches page content via the actionbook CLI, downloads images locally, rewrites image references to relative paths, and optionally translates the article using AI. Produces a self-contained folder with README.md, images/, and an index.md navigation file.


# Article Exporter - Export Articles to Obsidian

> **Version:** 0.5.0 | **Last Updated:** 2026-03-13

You are an expert at web content archiving and Obsidian workflow automation.

## Lessons from Failed Exports

These rules were extracted from real export failures. Each one prevents a specific class of error:

1. **Twitter/X needs extra care in AI reformatting** — `fetch` returns flat text because Twitter uses a custom UI without semantic HTML, so the reformatting step has to rebuild headings, lists, and code blocks almost entirely. See `references/twitter-handling.md`.
2. **Ask for output path first** — users have different vault locations. Assuming a default creates files in the wrong place and wastes time moving them.
3. **Check actionbook version >= 0.9.1** — the `--wait-hint` parameter was added in 0.9.1. Without it, dynamic content (SPAs, lazy-loaded pages) returns empty or partial results.
4. **Wait after navigation** — use `--wait-hint heavy` for Twitter, Medium, and other dynamic sites. Without it, the page hasn't finished rendering when content is extracted.
5. **Rate limit batch exports** — a 3-5 s delay between requests reduces the risk of being flagged as a bot and helps respect site terms of service (ToS).
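
For batch runs, lesson 5 can be sketched as a small driver loop. This is a minimal sketch under stated assumptions. `export_article` is a hypothetical function that runs Steps 1-8 for a single URL. The 3-5 s delay is drawn from bash's `$RANDOM`, so requests are not evenly spaced, and no delay is added before the first request.

```bash
# Print a whole-second delay between 3 and 5 inclusive.
pick_delay() {
    echo $(( RANDOM % 3 + 3 ))
}

# Export each URL in turn, sleeping between requests (not before the first).
# export_article is hypothetical: it should run Steps 1-8 for one URL.
batch_export() {
    local i=0 url
    for url in "$@"; do
        if [ "$i" -gt 0 ]; then
            sleep "$(pick_delay)"
        fi
        export_article "$url" || echo "export failed: $url" >&2
        i=$((i + 1))
    done
}
```

For example, `batch_export "$URL_1" "$URL_2" "$URL_3"` pauses twice in total. A failed article is reported, and the loop moves on to the next one.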

## Quick Reference

| Task | Command | Success Criteria |
|------|---------|------------------|
| Check deps | `actionbook --version` | Shows version >= 0.9.1 |
| Fetch article | `actionbook browser fetch <url> --wait-hint heavy` | Returns plain text (AI reformats to Markdown in Step 1b) |
| Translate | AI session directly | README_CN.md created |
| Open in Obsidian | `obsidian-cli open "path/index.md"` | File opens in Obsidian |
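
The version check in the first row can be scripted. This is a minimal sketch, assuming `actionbook --version` prints a banner that contains a dotted version number such as `0.9.1`; the exact output format is an assumption. `version_ge`, `extract_version`, and `check_actionbook` are hypothetical helper names. Each component is compared numerically, so `0.10.0` counts as newer than `0.9.1`. A plain string comparison would get that wrong.

```bash
# Succeed if dotted version $1 >= $2, comparing each component numerically.
version_ge() {
    awk -v have="$1" -v need="$2" 'BEGIN {
        nh = split(have, h, "."); nn = split(need, n, ".")
        len = (nh > nn) ? nh : nn
        for (i = 1; i <= len; i++) {
            if (h[i] + 0 > n[i] + 0) exit 0
            if (h[i] + 0 < n[i] + 0) exit 1
        }
        exit 0
    }'
}

# Pull the first dotted number (e.g. 0.9.1) out of a version banner.
extract_version() {
    printf '%s\n' "$1" | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1
}

# Hypothetical pre-flight check: actionbook must be installed and >= 0.9.1,
# the release that added --wait-hint.
check_actionbook() {
    local banner ver
    banner=$(actionbook --version 2>/dev/null) || {
        echo "actionbook not found on PATH" >&2
        return 1
    }
    ver=$(extract_version "$banner")
    if [ -z "$ver" ] || ! version_ge "$ver" 0.9.1; then
        echo "actionbook ${ver:-unknown} is too old: --wait-hint needs >= 0.9.1" >&2
        return 1
    fi
}
```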

---

## Complete Export Workflow

**Goal:** Export web article to Obsidian directory with images and optional translation

**Success criteria:**
- Article directory created with README.md
- All images downloaded to images/
- index.md navigation file created
- Optional: README_CN.md translation
- Opened in Obsidian (if obsidian-cli available)

---

### Step 1: Fetch Article Content

**Execution:** Direct (Bash)

```bash
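# Fetch article as readability text, then drop log noise: blank lines,
# lines starting with an ANSI escape, and INFO lines. printf '\033' yields a
# literal ESC byte, which GNU and BSD (macOS) sed both match; "\x1b" in a
# sed regex is a GNU extension.
ESC=$(printf '\033')
actionbook browser fetch "$URL" --wait-hint heavy 2>/dev/null | \
  sed "/^[[:space:]]*\$/d;/^${ESC}\[/d;/^INFO/d" > /tmp/article_raw.txt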
```

**Success criteria:**
- `/tmp/article_raw.txt` exists and size > 0 bytes
- Content contains the article's main text

The fetch command returns readability-extracted **plain text** (not Markdown).
AI reformatting in Step 1b is always needed to produce proper Markdown.

**Rules:**
- Use `--wait-hint heavy` for Twitter, Medium, dynamic content
- Use `--wait-hint light` for static blogs
- `2>/dev/null` suppresses stderr logs
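- `sed` deletes blank lines, lines that begin with an ANSI escape sequence, and `INFO` log lines (escape codes in the middle of a line are left in place)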

**Twitter/X Special Handling**

Twitter uses non-semantic HTML, so `fetch` output loses all structure (headings become flat text, code blocks disappear). If the URL's host is `x.com` or `twitter.com`, pay extra attention to structure reconstruction in Step 1b. See `references/twitter-handling.md`.
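
A plain substring test for `x.com` would also match hosts such as `dropbox.com`, so it is safer to compare the host itself. This is a minimal sketch. `url_host` and `is_twitter_url` are hypothetical helper names, and the parsing assumes ordinary `scheme://host[:port]/path` URLs without embedded credentials.

```bash
# Print the lowercased host of a URL: scheme, path/query/fragment, and port removed.
url_host() {
    printf '%s\n' "$1" |
        sed -E 's|^[a-zA-Z][a-zA-Z0-9+.-]*://||; s|[/?#].*$||; s|:[0-9]+$||' |
        tr 'A-Z' 'a-z'
}

# True for x.com / twitter.com and their subdomains (e.g. mobile.twitter.com),
# but not for hosts that merely end in "x.com", such as dropbox.com.
is_twitter_url() {
    case "$(url_host "$1")" in
        x.com|*.x.com|twitter.com|*.twitter.com) return 0 ;;
        *) return 1 ;;
    esac
}
```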

---

### Step 1b: AI Reformat to Markdown

**Execution:** Direct (AI session)

Read `/tmp/article_raw.txt` and convert the plain text into well-structured Markdown. Save the result to `/tmp/article.md`.

**Reformatting rules:**
- Reconstruct headings (`#`, `##`, `###`) from the text structure
- Preserve original image URLs as `![alt](url)` references
- Format code blocks, lists, tables, and blockquotes
- Keep the original article title as the first `# H1` heading

**Success criteria:**
- `/tmp/article.md` exists and starts with `# <Title>`
- Image URLs are preserved as Markdown image syntax
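
The Step 1b success criteria can be checked mechanically before moving on. This is a minimal sketch. `check_reformatted` is a hypothetical helper that fails unless the file's first line is an H1 title. When the check passes, it prints how many Markdown image links the file contains, which is a quick way to spot images lost during reformatting.

```bash
# Fail unless the file exists, is non-empty, and opens with "# <Title>";
# otherwise print the number of ![alt](url) image links found.
check_reformatted() {
    local md=${1:-/tmp/article.md}
    [ -s "$md" ] || { echo "missing or empty: $md" >&2; return 1; }
    if ! head -n 1 "$md" | grep -q '^# [^[:space:]]'; then
        echo "first line is not an H1 title" >&2
        return 1
    fi
    # tr strips the padding that BSD wc adds
    grep -o '!\[[^]]*]([^)]*)' "$md" | wc -l | tr -d ' '
}
```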

---

### Step 2: Extract Metadata

**Execution:** Direct (Bash)

```bash
# Extract title (first H1 heading from AI-reformatted markdown)
TITLE=$(grep -m 1 "^# " /tmp/article.md | sed 's/^# //')

# Extract image URLs (filter out data: URLs)
IMAGE_URLS=$(grep -o '!\[[^]]*\]([^)]*)' /tmp/article.md | \
    sed -E 's/!\[[^]]*\]\(([^)]*)\)/\1/' | \
    grep -v '^data:')
```

**Success criteria:**
- `$TITLE` is non-empty
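- The number of URLs in `$IMAGE_URLS` matches the number of images in the article (count with `printf '%s\n' "$IMAGE_URLS" | grep -c .`, since `wc -l` reports 1 for an empty list)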
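
If an image appears more than once in the article, the pipeline above lists its URL more than once, so it is downloaded twice under two different names. The variant below drops repeats while keeping first-seen order. `extract_image_urls` is a hypothetical name, and de-duplication is an optional refinement rather than part of the original workflow. Step 6 replaces every occurrence of a URL (`sed ... g`), so a de-duplicated list still rewrites all references.

```bash
# One image URL per line from a Markdown file: data: URIs skipped,
# duplicates removed while keeping first-seen order.
extract_image_urls() {
    grep -o '!\[[^]]*]([^)]*)' "$1" |
        sed -E 's/!\[[^]]*]\(([^)]*)\)/\1/' |
        grep -v '^data:' |
        awk '!seen[$0]++'
}
```

Usage: `IMAGE_URLS=$(extract_image_urls /tmp/article.md)`.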

---

### Step 3: Ask Output Directory

**Execution:** [human]
**Human checkpoint:** Confirm output location before creating files

Ask user: "Where should I save the exported article?"

Suggested paths:
- `~/Work/Write/Articles` (default)
- `~/Documents/Obsidian/Articles`
- `~/Notes/Imported`
- (or custom path from `$output_dir` argument)

**Success criteria:** User confirms output directory

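**Artifacts:** `$USER_CONFIRMED_PATH` set to the confirmed directory (Step 4 turns it into `$OUTPUT_DIR`, unless an `$output_dir` argument was given)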

---

### Step 4: Create Directory Structure

**Execution:** Direct (Bash)

```bash
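# Use argument if provided, otherwise use confirmed path
OUTPUT_DIR="${output_dir:-$USER_CONFIRMED_PATH}"
# Expand a leading ~ (a quoted "~/..." path is not tilde-expanded by the shell)
OUTPUT_DIR="${OUTPUT_DIR/#\~/$HOME}"

# Sanitize title for directory name: delete / : * ? " < > |, keep at most
# 100 characters, trim surrounding whitespace. tr avoids escaping "/" in sed.
# Note: GNU cut -c counts bytes, so a multibyte (e.g. CJK) character at the
# cutoff can be split.
SAFE_TITLE=$(printf '%s\n' "$TITLE" | tr -d '/:*?"<>|' | cut -c1-100 | sed 's/^[[:space:]]*//;s/[[:space:]]*$//')

# Create output directory
ARTICLE_DIR="$OUTPUT_DIR/$SAFE_TITLE"
mkdir -p "$ARTICLE_DIR/images"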
```

**Success criteria:**
- Directory `$ARTICLE_DIR` exists
- Subdirectory `images/` exists
- Directory is writable

**Rules:**
- Remove special characters: `/ : * ? " < > |`
- Limit title length to 100 characters
- Trim leading/trailing whitespace
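
To check these rules against sample titles, the same pipeline can be wrapped in a function. This is a minimal sketch. `sanitize_title` is a hypothetical name, and it adds two guards that go beyond the original rules. First, it strips leading dots, because a dot-prefixed folder is hidden on macOS and Linux. Second, it falls back to `untitled` when nothing is left.

```bash
# Delete / : * ? " < > |, cap at 100 characters, trim whitespace.
# Extra (assumed) guards: strip leading dots; empty result -> "untitled".
sanitize_title() {
    printf '%s\n' "$1" |
        tr -d '/:*?"<>|' |
        cut -c1-100 |
        sed 's/^[[:space:].]*//; s/[[:space:]]*$//; s/^$/untitled/'
}
```

For example, `sanitize_title 'What is "AI"? A/B tests: part 1'` yields `What is AI AB tests part 1`. Characters are deleted rather than replaced, so `A/B` collapses to `AB`.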

---

### Step 5: Download Images (Parallel if possible)

**Execution:** Direct (Bash)

```bash
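counter=1
# Read one URL per line (unquoted $IMAGE_URLS would be word-split and glob-expanded)
while IFS= read -r url; do
    [ -z "$url" ] && continue
    ext=$(printf '%s\n' "$url" | grep -oE '\.(jpg|jpeg|png|gif|webp|svg)' | head -n 1)
    ext=${ext:-.jpg}
    out="$ARTICLE_DIR/images/image_${counter}${ext}"
    # -f: fail on HTTP errors instead of saving the error page as an image
    curl -fsSL "$url" -o "$out"

    # Check file size (detect 0-byte failures)
    if [ ! -s "$out" ]; then
        # Twitter fallback: replace any existing query with the original-size JPEG
        # request, and keep the same file name so Step 6's local links stay valid
        curl -fsSL "${url%%\?*}?format=jpg&name=orig" -o "$out"
    fi

    counter=$((counter + 1))
done <<< "$IMAGE_URLS"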
```

**Success criteria:**
- All image files exist and size > 0 bytes
- File count matches `$IMAGE_URLS` count

**Rules:**
- Use `curl -L` to follow redirects
- Check file size after download
- Try alternative formats for Twitter images
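
Steps 5 and 6 each recompute the local file name from the URL. The two computations must agree, or the rewritten links will point at files that do not exist. One way to keep them in sync is a single helper that both loops call. `image_filename` is a hypothetical name. It applies the extension rule used in these steps: a recognized image extension found in the URL, else `.jpg`, taking the first match when there are several.

```bash
# Map the Nth image URL to its local file name, e.g. image_3.png.
image_filename() {
    local url=$1 n=$2 ext
    ext=$(printf '%s\n' "$url" | grep -oE '\.(jpg|jpeg|png|gif|webp|svg)' | head -n 1)
    printf 'image_%s%s\n' "$n" "${ext:-.jpg}"
}
```

Twitter media URLs usually carry the format in the query string (`?format=jpg`) rather than as a file extension, so they fall through to the `.jpg` default.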

---

### Step 6: Update Image References

**Execution:** Direct (Bash)

```bash
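# Replace remote URLs with local paths (same file naming as Step 5)
counter=1
while IFS= read -r url; do
    [ -z "$url" ] && continue
    ext=$(printf '%s\n' "$url" | grep -oE '\.(jpg|jpeg|png|gif|webp|svg)' | head -n 1)
    ext=${ext:-.jpg}
    # Escape sed regex metacharacters (and the | delimiter) so the URL matches literally
    esc=$(printf '%s\n' "$url" | sed 's/[][\.*^$|]/\\&/g')
    # Match "(url)" so a URL cannot clobber a longer URL it is a prefix of
    sed -i.bak "s|(${esc})|(./images/image_${counter}${ext})|g" /tmp/article.md
    counter=$((counter + 1))
done <<< "$IMAGE_URLS"

# Save updated markdown (-f: no .bak exists when there were no images)
cp /tmp/article.md "$ARTICLE_DIR/README.md"
rm -f /tmp/article.md.bak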
```

**Success criteria:**
- `README.md` contains `./images/image_N.*` references
- No remote URLs remain in image links
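
Both success criteria can be checked with a short script. This is a minimal sketch. `check_image_refs` is a hypothetical helper that takes the article directory. It fails if `README.md` still contains an image link starting with `http://`, `https://`, or `//`, and it reports any `./images/...` reference whose file is missing or empty.

```bash
# Fail on leftover remote image links or on local references to missing files.
check_image_refs() {
    local dir=$1 md="$1/README.md" ref refs missing=0
    if grep -qE '!\[[^]]*]\((https?:)?//' "$md"; then
        echo "remote image links remain in $md" >&2
        return 1
    fi
    # Collect "images/..." paths from "](./images/...)" link targets
    refs=$(grep -oE ']\(\./images/[^)]+\)' "$md" | sed -E 's/^]\(\.\///; s/\)$//')
    while IFS= read -r ref; do
        [ -z "$ref" ] && continue
        if [ ! -s "$dir/$ref" ]; then
            echo "missing or empty: $ref" >&2
            missing=1
        fi
    done <<< "$refs"
    return "$missing"
}
```

Usage: `check_image_refs "$ARTICLE_DIR"`.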

---

### Step 7: AI Translation (Optional)

**Execution:** Direct (AI session)

**Human checkpoint:** Ask user: "Do you want to translate the article? (y/n)"

If yes:
1. Read `$ARTICLE_DIR/README.md`
2. Translate using AI capabilities (no external API)
3. Write to `$ARTICLE_DIR/README_CN.md` (or other language code)

**Translation Prompt Template:**
```
Translate the following Markdown article to [LANGUAGE] while preserving:
- All Markdown formatting (headings, lists, code blocks, tables)
- Image references exactly as-is: ![alt](./images/image_N.*)
- Links and URLs unchanged
- Code blocks and technical terms in original language

Only output the translated Markdown content.

---
[Paste README.md content]
```

**Success criteria:** Translation file exists and size ≈ original ± 20%

**Supported languages:** en, zh, es, fr, de, ja, ko
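
The size criterion can be checked without reading the translation. This is a minimal sketch. `check_translation_size` is a hypothetical helper that compares byte counts from `wc -c` using integer arithmetic (10*b between 8*a and 12*a). Byte size is only a rough proxy. A translation into a language written in multibyte UTF-8 characters, such as Chinese or Japanese, may fall outside ±20% even when it is faithful, so a failure here means "look at it", not "it is wrong".

```bash
# Succeed if the translation exists and its byte size is within ±20% of the original.
check_translation_size() {
    local orig=$1 trans=$2 a b
    [ -s "$trans" ] || { echo "missing or empty: $trans" >&2; return 1; }
    a=$(wc -c < "$orig" | tr -d ' ')
    b=$(wc -c < "$trans" | tr -d ' ')
    # Integer form of 0.8 * a <= b <= 1.2 * a
    [ $((b * 10)) -ge $((a * 8)) ] && [ $((b * 10)) -le $((a * 12)) ]
}
```

Usage: `check_translation_size "$ARTICLE_DIR/README.md" "$ARTICLE_DIR/README_CN.md"`.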

---

### Step 8: Create Navigation Index

**Execution:** Direct (Bash)

```bash
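# Auto-detect source from the URL's host (a host match, so e.g. dropbox.com
# is not mistaken for x.com; sed -E is portable, unlike GNU-only "\?")
HOST=$(printf '%s\n' "$URL" | sed -E 's|^[a-zA-Z][a-zA-Z0-9+.-]*://||; s|[/?#:].*$||' | tr 'A-Z' 'a-z')
case "$HOST" in
    x.com|*.x.com|twitter.com|*.twitter.com) SOURCE="X" ;;
    medium.com|*.medium.com) SOURCE="Medium" ;;
    dev.to|*.dev.to) SOURCE="Dev.to" ;;
    openai.com|*.openai.com) SOURCE="OpenAI Blog" ;;
    substack.com|*.substack.com) SOURCE="Substack" ;;
    github.com|*.github.com) SOURCE="GitHub" ;;
    *) SOURCE="$HOST" ;;
esac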

# Create index.md
cat > "$ARTICLE_DIR/index.md" <<EOF
# $TITLE

> **Export Date**: $(date +%Y-%m-%d)
> **Original URL**: $URL
> **Source**: $SOURCE

## 📚 Lan
EOF
```

Source: https://github.com/actionbook/actionbook/tree/main/playground/article-exporter
