arxiv-search
Search arXiv for preprints and scholarly articles across physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics. Supports field-specific queries (title, author, abstract, category), boolean logic, date filtering, and bulk retrieval with pagination.
# arXiv Search Skill
Search and retrieve metadata for preprints on arXiv, the open-access repository for scholarly articles.
## Critical: Things Claude Won't Know Without This Skill
### Response format is Atom XML, not JSON
The arXiv API returns **Atom 1.0 XML** — there is no JSON option. You must parse XML to extract results. The Python script handles this automatically.
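As a rough illustration of what parsing the Atom response involves, here is a minimal sketch using only the standard library. The `SAMPLE_FEED` string is a hand-written stand-in (not a real API response, and the ID in it is a placeholder), and `parse_entries` is a hypothetical helper name; the bundled script may structure this differently.

```python
import xml.etree.ElementTree as ET

# Namespace used by Atom 1.0 feeds.
ATOM = {"atom": "http://www.w3.org/2005/Atom"}

# Hand-written stand-in for an API response (placeholder values only).
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/0000.00000v1</id>
    <title>An  Example
      Title</title>
    <summary>Example abstract.</summary>
    <author><name>A. Author</name></author>
    <author><name>B. Author</name></author>
  </entry>
</feed>"""


def _clean(text):
    """Collapse the line breaks and repeated spaces that feeds often contain."""
    return " ".join((text or "").split())


def parse_entries(xml_text):
    """Return one dict per <entry> with id, title, summary, and authors."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall("atom:entry", ATOM):
        entries.append({
            "id": _clean(entry.findtext("atom:id", default="", namespaces=ATOM)),
            "title": _clean(entry.findtext("atom:title", default="", namespaces=ATOM)),
            "summary": _clean(entry.findtext("atom:summary", default="", namespaces=ATOM)),
            "authors": [
                _clean(name.text)
                for name in entry.findall("atom:author/atom:name", ATOM)
            ],
        })
    return entries


entries = parse_entries(SAMPLE_FEED)
print(entries[0]["title"], entries[0]["authors"])
```

Every lookup passes the Atom namespace explicitly, because un-namespaced tag names such as `entry` will not match elements in a namespaced feed.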
### The base URL uses `export.arxiv.org`, not `arxiv.org`
```
http://export.arxiv.org/api/query?search_query=all:transformer&max_results=5
```
Using `arxiv.org` directly will not work for API queries.
### Boolean operators must be UPPERCASE
Use `AND`, `OR`, `ANDNOT` — lowercase will not work:
```
search_query=ti:attention+AND+ti:transformer
search_query=cat:cs.AI+ANDNOT+cat:cs.CL
```
### Parentheses and quotes must be URL-encoded
- Parentheses: `%28` and `%29`
- Quotes (phrase search): `%22`
```
search_query=ti:%22large+language+model%22
search_query=au:bengio+AND+%28cat:cs.LG+OR+cat:cs.AI%29
```
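Rather than encoding by hand, the standard library can produce the same escapes. The sketch below assumes a hypothetical helper named `build_query_url`; it passes `safe=":"` so that field prefixes such as `ti:` stay readable, which is a cosmetic choice rather than an API requirement.

```python
from urllib.parse import urlencode

BASE_URL = "http://export.arxiv.org/api/query"


def build_query_url(search_query, start=0, max_results=10):
    """Build a query URL, encoding spaces as '+', quotes as %22, parentheses as %28/%29."""
    params = {
        "search_query": search_query,
        "start": start,
        "max_results": max_results,
    }
    # urlencode uses quote_plus by default, so spaces become '+'.
    return BASE_URL + "?" + urlencode(params, safe=":")


print(build_query_url('ti:"large language model"', max_results=5))
print(build_query_url("au:bengio AND (cat:cs.LG OR cat:cs.AI)", max_results=5))
```

Writing the query with plain spaces, quotes, and parentheses and encoding it once at the end avoids double-encoding mistakes such as `%2522`.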
### Date range filtering uses a special syntax
```
search_query=submittedDate:[202401010000+TO+202412312359]
```
Format: `YYYYMMDDHHMM` in GMT, 24-hour time (twelve digits, with no separator between the date and the time, as in the example above). Combine with other queries using `AND`.
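The twelve-digit timestamps in the example above can be generated from `datetime` objects. This is a minimal sketch with a hypothetical helper name, `submitted_date_range`; it assumes naive datetimes are already in GMT and converts timezone-aware ones to UTC.

```python
from datetime import datetime, timedelta, timezone


def submitted_date_range(start, end):
    """Return a submittedDate clause covering start..end (inclusive, GMT)."""

    def stamp(dt):
        # Aware datetimes are converted to UTC; naive ones are assumed to be GMT.
        if dt.tzinfo is not None:
            dt = dt.astimezone(timezone.utc)
        return dt.strftime("%Y%m%d%H%M")

    return f"submittedDate:[{stamp(start)} TO {stamp(end)}]"


# Calendar year 2024, then the trailing seven days relative to now.
clause = submitted_date_range(datetime(2024, 1, 1, 0, 0), datetime(2024, 12, 31, 23, 59))
print("cat:cs.AI AND " + clause)

now = datetime.now(timezone.utc)
print(submitted_date_range(now - timedelta(days=7), now))
```

The clause is returned with plain spaces; it still needs URL encoding before being placed in a request.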
### Rate limiting: 3 seconds between requests
arXiv asks for a minimum 3-second delay between API calls. Results update once daily, so there is no reason to poll frequently.
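One way to honor the 3-second delay is a small throttle that is called before every request. The `Throttle` class below is a hypothetical sketch, not the bundled script's implementation; the clock and sleep functions are injectable only so the behavior can be checked without real waiting.

```python
import time


class Throttle:
    """Block so that consecutive calls to wait() are at least min_interval seconds apart."""

    def __init__(self, min_interval=3.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


throttle = Throttle(min_interval=3.0)
throttle.wait()  # first call returns immediately
# ... issue a request here, then call throttle.wait() again before the next one
```

Using a monotonic clock avoids surprises if the system time is adjusted while the program runs.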
### Hard limits on results
- `max_results` caps at **2000** per request
- Total retrievable results cap at **30,000** (start + max_results ≤ 30000)
- Exceeding 30,000 returns HTTP 400
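The two caps above translate into a simple paging plan. The sketch below uses a hypothetical function, `page_plan`, that yields `(start, max_results)` pairs which never exceed 2000 per request or 30,000 in total; it only computes offsets and makes no network calls.

```python
MAX_PER_REQUEST = 2000   # per-request cap stated above
MAX_TOTAL = 30000        # start + max_results must not exceed this


def page_plan(total_wanted, page_size=MAX_PER_REQUEST):
    """Return a list of (start, max_results) pairs that respect both caps."""
    page_size = max(1, min(page_size, MAX_PER_REQUEST))
    total = min(total_wanted, MAX_TOTAL)
    plan = []
    start = 0
    while start < total:
        count = min(page_size, total - start)
        plan.append((start, count))
        start += count
    return plan


for start, count in page_plan(4500):
    print(f"start={start}&max_results={count}")
```

Requests for more than 30,000 results are silently truncated by this sketch; a caller that needs more would have to split the query, for example by date range.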
## Search Field Prefixes
| Prefix | Field |
|--------|-------|
| `ti` | Title |
| `au` | Author |
| `abs` | Abstract |
| `co` | Comment |
| `jr` | Journal Reference |
| `cat` | Subject Category |
| `rn` | Report Number |
| `id_list` | Not a prefix: a separate query parameter that takes comma-separated arXiv IDs |
| `all` | All fields simultaneously |
## Python Script
Use `scripts/arxiv_api.py` for programmatic access (zero dependencies):
```python
from scripts.arxiv_api import ArxivAPI
api = ArxivAPI()
# Simple keyword search
results = api.search("all:transformer attention mechanism", max_results=10)
# Field-specific search
results = api.search("ti:diffusion AND cat:cs.CV", max_results=5)
# Author search
results = api.search("au:hinton AND cat:cs.LG", max_results=20)
# Date-filtered search
results = api.search(
    "cat:cs.AI AND submittedDate:[202401010000 TO 202412312359]",
    sort_by="submittedDate",
    sort_order="descending",
    max_results=50,
)
# Fetch specific papers by arXiv ID
results = api.fetch_by_ids(["2301.07041", "2303.08774", "2305.10601"])
# Paginated retrieval
page1 = api.search("cat:cs.CL", max_results=100, start=0)
page2 = api.search("cat:cs.CL", max_results=100, start=100)
# Each result is a dict with keys:
# id, title, summary, authors, published, updated,
# categories, primary_category, links, comment, journal_ref, doi
```
## Common Category Codes
| Category | Description |
|----------|-------------|
| `cs.AI` | Artificial Intelligence |
| `cs.CL` | Computation and Language (NLP) |
| `cs.CV` | Computer Vision |
| `cs.LG` | Machine Learning |
| `cs.CR` | Cryptography and Security |
| `cs.DS` | Data Structures and Algorithms |
| `cs.SE` | Software Engineering |
| `math.AG` | Algebraic Geometry |
| `math.CO` | Combinatorics |
| `hep-th` | High Energy Physics - Theory |
| `quant-ph` | Quantum Physics |
| `stat.ML` | Machine Learning (Statistics) |
| `econ.GN` | General Economics |
| `q-bio.GN` | Genomics |
Full list: https://arxiv.org/category_taxonomy
## API Etiquette
- **Rate limit**: 3-second minimum between requests
- **No authentication required**
- **Results update daily** — no need to poll the same query more than once per day
- **Use pagination** for large result sets instead of requesting thousands at once
- **Respect HTTP 429**: Back off if rate-limited
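Backing off on HTTP 429 could look like the sketch below. The function name `fetch_with_backoff`, the retry count, and the doubling schedule starting at 3 seconds are all assumptions for illustration; arXiv does not prescribe a specific backoff policy here. The `opener` and `sleep` parameters exist only so the logic can be exercised without network access.

```python
import time
import urllib.error
import urllib.request


def fetch_with_backoff(url, max_attempts=4, base_delay=3.0,
                       opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch url, retrying with exponentially growing delays on HTTP 429."""
    for attempt in range(max_attempts):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # Re-raise anything other than 429, and 429 on the final attempt.
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # 3s, 6s, 12s, ...
```

A call such as `fetch_with_backoff("http://export.arxiv.org/api/query?search_query=all:transformer&max_results=5")` returns the raw Atom XML as bytes, which can then be passed to an XML parser.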
## Related Skills
- **wikidata-search**: Cross-reference arXiv authors with Wikidata entities for authority control identifiers (ORCID, VIAF, etc.)
- **harvard-library-catalog**: Look up published versions of arXiv preprints in Harvard's bibliographic records
## Resources
- `references/api_reference.md` — Complete API specs for query parameters, response fields, and error handling
- `scripts/arxiv_api.py` — Python client with XML parsing, rate limiting, and pagination support