datasette
Access Our World In Data's internal datasette instance to query data from our main MySQL database mirror and analytics data store. Use this when you need to look up or explore data from our internal infrastructure. Does not contain any timeseries data i.e. this does not help to answer questions about data like "what is the life expectancy in Nigeria?". Instead, it is useful for questions like "how many published charts do we have?" or "get the title for all views for a multidim".
What this skill does
# Querying OWID's internal Datasette instance
Our World In Data runs an internal [Datasette](https://datasette.io/) instance that mirrors data from two sources:
1. **Main MySQL database** — contains our catalog of charts, variables, datasets, sources, and other metadata used by the OWID website and data pipeline.
2. **Analytics data store** — contains page view and usage analytics data.
## Access
The datasette instance is available via tailscale at:
```
http://analytics/
```
No API keys or tokens are needed — access is controlled at the network level. However, **you must always include a User-Agent header** — requests without one get a 502 error. Use `-H "User-Agent: Claude"` on every curl call.
## Available databases
- `private` — mirror of the main MySQL database (charts, variables, datasets, entities, etc.)
- `analytics` — page view and usage analytics
## URL structure
Datasette URLs follow this hierarchy:
```
/ -> List all databases
/{database} -> Database overview (list tables)
/{database}/{table} -> Browse rows of a table
/{database}/{table}/{pk} -> Individual row by primary key
/{database}.json?sql=... -> Run arbitrary SQL as JSON. Our datasette instance uses duckdb, which is very similar to PostgreSQL.
/{database}.csv?sql=... -> Run arbitrary SQL as CSV
```
### Tilde escaping
Datasette uses tilde encoding (not percent-encoding) for special characters in database/table names within URL paths. Allowed characters that need no encoding: `A-Z a-z 0-9 _ -`. Spaces are encoded as `+`. Everything else uses `~` followed by a two-digit hex code (e.g. `.` becomes `~2E`, `/` becomes `~2F`).
## IMPORTANT: Always retrieve metadata before writing queries
Before constructing any SQL query against a database, you **must** first retrieve the table and column metadata so you know what tables exist and what columns they have. Do not guess at table or column names.
### Step 1: Get the list of tables with columns, primary keys, and foreign keys
```bash
curl -s -H "User-Agent: Claude" "http://analytics/private.json"
```
This returns a JSON object with a `tables` array. Each entry includes:
- `name` — table name
- `columns` — list of column names
- `primary_keys` — list of primary key columns
- `count` — row count
- `foreign_keys` — incoming and outgoing foreign key relationships
### Step 2: Get column types
To get the `CREATE TABLE` statement for a specific table, query `sqlite_master`:
```bash
curl -s -G -H "User-Agent: Claude" "http://analytics/private.json" \
--data-urlencode "sql=SELECT sql FROM sqlite_master WHERE name = 'charts'" \
--data-urlencode "_shape=array"
```
To get all column names and types for a table, you can also use `information_schema`:
```bash
curl -s -G -H "User-Agent: Claude" "http://analytics/private.json" \
--data-urlencode "sql=SELECT table_name, column_name, data_type FROM information_schema.columns WHERE table_name = 'charts'" \
--data-urlencode "_shape=array"
```
### Step 3: Check for human-readable descriptions
```bash
curl -s -H "User-Agent: Claude" "http://analytics/-/metadata.json"
```
This returns descriptions for databases, tables, and individual columns.
## Running SQL queries
### JSON output
Append `.json` to the database name:
```bash
curl -s -G -H "User-Agent: Claude" "http://analytics/private.json" \
--data-urlencode "sql=SELECT * FROM charts LIMIT 5" \
--data-urlencode "_shape=objects"
```
### CSV output
Append `.csv` instead:
```bash
curl -s -G -H "User-Agent: Claude" "http://analytics/private.csv" \
--data-urlencode "sql=SELECT * FROM charts LIMIT 5"
```
### JSON shape options
The `_shape` parameter controls how rows are formatted in JSON responses:
| Value | Effect |
| ------------ | --------------------------------------------------------- |
| `objects` | Rows as key/value objects (default and recommended) |
| `arrays` | Rows as arrays of values, no column keys |
| `array` | Flat JSON array of objects, no wrapper metadata |
| `arrayfirst` | Flat array of just the first column's values |
| `object` | Objects keyed by primary key |
### Browsing table data (no SQL needed)
You can also browse table data directly without writing SQL. Append `.json` or `.csv` to the table path:
```bash
# JSON
curl -s -H "User-Agent: Claude" "http://analytics/private/charts.json?_size=5&_shape=objects"
# CSV
curl -s -H "User-Agent: Claude" "http://analytics/private/charts.csv?_size=5"
```
Table endpoints support column filtering with `?column__operator=value` syntax:
```
?state__exact=CA
?planet_int__gt=1
?name__contains=energy
?id__in=1,2,3
```
## Useful query parameters
| Parameter | Description |
| ------------- | ---------------------------------------------------------- |
| `sql` | The SQL query (use `--data-urlencode` to pass it safely) |
| `_shape` | JSON response shape (see above) |
| `_size` | Rows per page (use `_size=max` for up to 1000) |
| `_sort` | Sort ascending by column |
| `_sort_desc` | Sort descending by column |
| `_col` | Include only this column (repeatable) |
| `_nocol` | Exclude this column (repeatable) |
| `_where` | Add a WHERE clause fragment |
| `_stream=on` | Stream all rows (CSV only, up to 100MB) |
| `_header=off` | Omit CSV header row |
## Tips
- Always use `curl -s -G -H "User-Agent: Claude"` with `--data-urlencode` to safely pass SQL queries — this avoids manual URL-encoding and the User-Agent header is required to avoid 502 errors.
- For large result sets, use `LIMIT` and `OFFSET` in your SQL to paginate, or use `_stream=on` with CSV output to get all rows at once.
- The datasette instance mirrors data periodically — it may not reflect the very latest changes to the production database.
- Only `SELECT` statements are allowed. `INSERT`, `UPDATE`, `DELETE`, and `PRAGMA` are rejected.
- Use `_size=max` to get up to 1000 rows in a single JSON response.
- The `grapher_configs` table in the private database contains our chart config JSON blobs. If you need to understand the structure of these, fetch the json schema from https://files.ourworldindata.org/schemas/grapher-schema.latest.json for reference.
## Example Queries
Find indicators by name:
```bash
curl -s -G -H "User-Agent: Claude" "http://analytics/private.json" \
--data-urlencode "sql=SELECT id, name, unit, catalogPath FROM variables WHERE name LIKE '%renewable energy%' ORDER BY id DESC LIMIT 10" \
--data-urlencode "_shape=array" | jq
```
Find published charts by topic:
```bash
curl -s -G -H "User-Agent: Claude" "http://analytics/private.json" \
--data-urlencode "sql=SELECT id, title, slug, type FROM charts WHERE title LIKE '%life expectancy%' AND isPublished = 1 LIMIT 10" \
--data-urlencode "_shape=array" | jq
```
Find which variables power a chart:
```bash
curl -s -G -H "User-Agent: Claude" "http://analytics/private.json" \
--data-urlencode "sql=SELECT v.id, v.name, v.unit FROM variables v JOIN chart_dimensions cd ON v.id = cd.variableId JOIN charts c ON cd.chartId = c.id WHERE c.slug = 'life-expectancy'" \
--data-urlencode "_shape=array" | jq
```
Count charts by type:
```bash
curl -s -G -H "User-Agent: Claude" "http://analytics/private.json" \
--data-urlencode "sql=SELECT type, COUNT(*) as count FROM charts WHERE isPublished = 1 GROUP BY type ORDER BY count DESC" \
--data-urlencode "_shape=array" | jq
```
Related in Data & Analytics
clawarr-suite
IncludedComprehensive management for self-hosted media stacks (Sonarr, Radarr, Lidarr, Readarr, Prowlarr, Bazarr, Overseerr, Plex, Tautulli, SABnzbd, Recyclarr, Unpackerr, Notifiarr, Maintainerr, Kometa, FlareSolverr). Deep library exploration, analytics, dashboard generation, content management, request handling, subtitle management, indexer control, download monitoring, quality profile sync, library cleanup automation, notification routing, collection/overlay management, and media tracker integration (Trakt, Letterboxd, Simkl).
querying-soql
IncludedSOQL query generation, optimization, and analysis with 100-point scoring. Use this skill when the user needs SOQL/SOSL authoring or optimization: natural-language-to-query generation, relationship queries, aggregates, query-plan analysis, and performance or safety improvements for Salesforce queries. TRIGGER when: user writes, optimizes, or debugs SOQL/SOSL queries, touches .soql files, or asks about relationship queries, aggregates, or query performance. DO NOT TRIGGER when: bulk data operations (use handling-sf-data), Apex DML logic (use generating-apex), or report/dashboard queries.
app-store-optimization
IncludedApp Store Optimization (ASO) toolkit for researching keywords, analyzing competitor rankings, generating metadata suggestions, and improving app visibility on Apple App Store and Google Play Store. Use when the user asks about ASO, app store rankings, app metadata, app titles and descriptions, app store listings, app visibility, or mobile app marketing on iOS or Android. Supports keyword research and scoring, competitor keyword analysis, metadata optimization, A/B test planning, launch checklists, and tracking ranking changes.
habit-flow
IncludedAI-powered atomic habit tracker with natural language logging, streak tracking, smart reminders, and coaching. Use for creating habits, logging completions naturally ("I meditated today"), viewing progress, and getting personalized coaching.
app-store-optimization
IncludedApp Store Optimization (ASO) toolkit for researching keywords, analyzing competitor rankings, generating metadata suggestions, and improving app visibility on Apple App Store and Google Play Store. Use when the user asks about ASO, app store rankings, app metadata, app titles and descriptions, app store listings, app visibility, or mobile app marketing on iOS or Android. Supports keyword research and scoring, competitor keyword analysis, metadata optimization, A/B test planning, launch checklists, and tracking ranking changes.
visualizing-data
IncludedBuilds dashboards, reports, and data-driven interfaces requiring charts, graphs, or visual analytics. Provides systematic framework for selecting appropriate visualizations based on data characteristics and analytical purpose. Includes 24+ visualization types organized by purpose (trends, comparisons, distributions, relationships, flows, hierarchies, geospatial), accessibility patterns (WCAG 2.1 AA compliance), colorblind-safe palettes, and performance optimization strategies. Use when creating visualizations, choosing chart types, displaying data graphically, or designing data interfaces.