emu-bulk-upload

Help museum insect curators bulk upload specimen data to the Emu database. Maps any input format to Emu's template, matches localities to existing records, finds parent sites, creates bulk upload tables, and walks users through the upload process.



# Emu Bulk Upload Skill

Help users prepare and upload entomological specimen data to the Emu collection database at the Field Museum of Natural History (FMNH). The process starts by mapping whatever data the user provides into Emu's standard format, then works through each Emu module (Sites, Events, Catalog) to match existing records, create new ones, and produce properly formatted tables for bulk upload.

## When to use

- User wants to upload specimen data to Emu
- User needs to match localities to existing Emu site records
- User has specimen data in any format that needs database preparation
- Keywords: "bulk upload", "Emu", "upload specimens", "match sites", "prepare data"

## Key reference files

- `references/Emu_upload_default.xlsx` — The Emu upload template (49 columns, 3 modules)
- `references/emu_field_reference.md` — Complete field reference with descriptions, export name mappings, and normalization rules
- `references/phase1_sites.md` — Sites phase detailed reference (scripts, matching, export guide)

## Session start

### File discovery

At the start of a session, search the working directory for files the user may have already placed there:
- Look for data files: `.xlsx`, `.csv`, `.tsv`, `.txt`, or other tabular formats
- Look for Emu exports: `.csv` files that may contain site/event data

If any are found, list them and confirm with the user:

> I found these files in the working directory:
> - `specimen_data.csv` (csv, 145 KB)
> - `US_sites_export.csv` (csv, 12 MB)
>
> Which of these should I work with? Is `specimen_data.csv` your specimen data and `US_sites_export.csv` an Emu sites export?
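
A minimal sketch of the discovery step, using only the Python standard library. The extension set comes from the list above. The function names `discover_files` and `format_listing` are hypothetical, and sizes are reported in KB only. An extension alone cannot tell a specimen file from an Emu export, so the user still has to confirm which file is which.

```python
from pathlib import Path

# Candidate tabular extensions, taken from the file-discovery list above.
DATA_EXTENSIONS = {".xlsx", ".csv", ".tsv", ".txt"}


def discover_files(directory="."):
    """Return (name, extension, size_kb) for each candidate data file, sorted by name."""
    found = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.suffix.lower() in DATA_EXTENSIONS:
            size_kb = round(path.stat().st_size / 1024)
            found.append((path.name, path.suffix.lower().lstrip("."), size_kb))
    return found


def format_listing(found):
    """Render the findings in the same bullet style as the confirmation prompt."""
    lines = ["I found these files in the working directory:"]
    for name, ext, size_kb in found:
        lines.append(f"- `{name}` ({ext}, {size_kb} KB)")
    return "\n".join(lines)
```

Running `print(format_listing(discover_files()))` from the working directory produces the list to show the user.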

### Privilege gating

Before any work, ask the user:

> Do you have Emu bulk upload privileges?
> 1. Yes
> 2. No (if unsure, answer No)

Record the answer. It determines how upload steps are handled later:
- **With privileges**: walk through Emu upload steps directly
- **Without privileges**: prepare tables and ask user to send them to their collection manager

### Coordinate provenance

Also ask once at session start:

> Are the coordinates in your data from your own sampling metadata (i.e., primary coordinates recorded at the sampling site), or inherited from named places (centroids / generic coordinates)?
> 1. Primary — from sampling metadata
> 2. Inherited / centroid

If **primary**, the skill applies the unnamed-Precise-Locality rule: each specimen row with coordinates produces an unnamed `Precise Locality` node whose parent is the most specific named node (Village/Town/etc.). If **inherited**, coordinates stay attached to the named node directly.
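
The rule can be sketched as follows for a single specimen row. This is an illustration under stated assumptions, not the skill's actual implementation. The level keys (`Country`, `State`, ...), the `lat`/`lon` keys, and the node dictionary layout are all hypothetical.

```python
def build_site_nodes(row, levels, primary):
    """Build the named-site chain for one specimen row, then place its coordinates.

    row: dict of site values, e.g. {"State": "Illinois", "lat": 42.05, "lon": -87.69}
    levels: named hierarchy levels, broadest first (hypothetical level keys)
    primary: True if coordinates come from the user's own sampling metadata
    Each node's "parent" is the index of its parent node in the returned list.
    """
    nodes = []
    for level in levels:
        name = row.get(level)
        if name:  # skip levels the user left blank
            parent = len(nodes) - 1 if nodes else None
            nodes.append({"level": level, "name": name, "parent": parent, "coords": None})

    lat, lon = row.get("lat"), row.get("lon")
    if lat is None or lon is None:
        return nodes  # no coordinates: nothing else to place

    if primary:
        # Primary coordinates: a new unnamed Precise Locality node whose parent
        # is the most specific named node.
        parent = len(nodes) - 1 if nodes else None
        nodes.append({"level": "Precise Locality", "name": None, "parent": parent,
                      "coords": (lat, lon)})
    elif nodes:
        # Inherited/centroid coordinates stay attached to the named node itself.
        nodes[-1]["coords"] = (lat, lon)
    return nodes
```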

---

## Step 1: Data intake and mapping

This step takes the user's data in **any format** and maps it into the Emu template format (`references/Emu_upload_default.xlsx`).

### Accept any input

The user's data may be:
- An xlsx already in Emu template format (3-row header with Emu field names in Row 2)
- An xlsx or csv with their own column names (e.g., "lat", "long", "state", "species")
- A tsv, txt, or other delimited file
- Data with completely different column naming conventions

### Analyze the input

1. Read the file and present the columns and a sample of the data to the user
2. Read `references/emu_field_reference.md` to understand all Emu fields
3. Propose a column mapping: for each user column, suggest which Emu field it corresponds to

Present the mapping as a table for user review:

> Here's how I'd map your columns to Emu fields:
>
> | Your column | Emu field | User-friendly name |
> |---|---|---|
> | lat | LatLatitude_nesttab | Latitude |
> | long | LatLongitude_nesttab | Longitude |
> | state | LocProvinceStateTerritory_tab | Province/State |
> | county | LocDistrictCountyShire_tab | County |
> | locality | LocPreciseLocation | Precise Location |
> | species | IdeTaxonRef_tab.irn | Taxon |
> | collector | ColParticipantRef_tab(2).irn | Collectors |
> | *unmapped* | — | (columns with no Emu equivalent) |
>
> Does this look right? Any corrections?

Wait for user confirmation before proceeding. Unmapped columns are preserved but not used in Emu uploads.
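
A first-pass proposal can come from a simple synonym lookup, which the user then corrects. In this minimal sketch the `SYNONYMS` table is hypothetical: it is seeded from the example mapping above plus a few obvious variants (`latitude`, `lon`, `longitude`). A real run would draw on `references/emu_field_reference.md` and on judgment for anything the table misses.

```python
# Hypothetical synonym table seeded from the example mapping above.
SYNONYMS = {
    "lat": "LatLatitude_nesttab",
    "latitude": "LatLatitude_nesttab",
    "long": "LatLongitude_nesttab",
    "lon": "LatLongitude_nesttab",
    "longitude": "LatLongitude_nesttab",
    "state": "LocProvinceStateTerritory_tab",
    "county": "LocDistrictCountyShire_tab",
    "locality": "LocPreciseLocation",
    "species": "IdeTaxonRef_tab.irn",
    "collector": "ColParticipantRef_tab(2).irn",
}


def propose_mapping(user_columns):
    """Map each user column to an Emu field name, or None if there is no known match."""
    return {col: SYNONYMS.get(col.strip().lower()) for col in user_columns}


def mapping_table(proposal):
    """Render the proposal as a Markdown table for user review."""
    rows = ["| Your column | Emu field |", "|---|---|"]
    for col, field in proposal.items():
        rows.append(f"| {col} | {field or '*unmapped*'} |")
    return "\n".join(rows)
```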

### Check for Emu template format

If the file already has the 3-row header structure (Row 1 = friendly names, Row 2 = Emu field names, Row 3 = example) with correct Emu field names, skip the mapping and confirm:

> Your file is already in Emu template format. I found N data rows with columns for [Sites, Events, Catalog] fields.
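
One way to check for the template is to test whether every non-empty value in Row 2 is a known Emu field name. In this sketch, `known_fields` is assumed to be loaded from the template or the field reference, and the function name is hypothetical.

```python
def looks_like_emu_template(rows, known_fields):
    """Return True if the sheet has the 3-row header with valid Emu field names in Row 2.

    rows: the sheet's rows as sequences of cell values (Row 1 first).
    known_fields: set of valid Emu field names.
    """
    if len(rows) < 3:
        return False  # needs friendly names, field names, and an example row
    field_row = [str(v).strip() for v in rows[1] if v not in (None, "")]
    return bool(field_row) and all(name in known_fields for name in field_row)
```

With openpyxl, the first three rows can be read as tuples via `ws.iter_rows(min_row=1, max_row=3, values_only=True)`.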

### Build transformation script

Once the mapping is confirmed, **write a Python script on the fly** to transform the user's data into the Emu template format:

1. Read the user's input file (whatever format it is)
2. Map columns according to the confirmed mapping
3. Output an xlsx in Emu template format:
   - Row 1: user-friendly names (from template)
   - Row 2: Emu field names (from template)
   - Row 3: first data row as example
   - Row 4+: data
   - Column colors: green (`FFCCFFCC`) for Sites, gray (`FFC0C0C0`) for Events, tan (`FFFFCC99`) for Catalog
4. Only include columns that have data (skip entirely empty Emu fields)

Save the script to `/tmp/emu_transform.py` and run it. Save the output to `/tmp/emu_user_data.xlsx`.

Reference `scripts/parse_user_data.py` for patterns on reading xlsx files with openpyxl and applying cell colors.
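
The core of the generated transform script is the layout logic: drop empty Emu columns, write the two header rows, and give each kept column its module color. This minimal sketch keeps that logic in plain Python so it can be checked without Excel. It assumes the template supplies `(friendly_name, emu_field, module)` triples and that Row 3 is simply the first record. The function name is hypothetical.

```python
# Fill colors per module, from the template description above.
MODULE_COLORS = {"Sites": "FFCCFFCC", "Events": "FFC0C0C0", "Catalog": "FFFFCC99"}


def build_template_grid(records, template_fields):
    """Lay out mapped records in Emu template order.

    records: list of dicts keyed by Emu field name (already mapped).
    template_fields: ordered (friendly_name, emu_field, module) triples from the template.
    Returns (grid, colors): grid[0] friendly names, grid[1] Emu field names,
    grid[2:] one row per record; colors holds one ARGB fill per kept column.
    """
    # Keep only template columns that at least one record actually fills.
    kept = [
        (friendly, field, module)
        for friendly, field, module in template_fields
        if any(rec.get(field) not in (None, "") for rec in records)
    ]
    grid = [[friendly for friendly, _, _ in kept], [field for _, field, _ in kept]]
    for rec in records:
        grid.append([rec.get(field, "") for _, field, _ in kept])
    colors = [MODULE_COLORS[module] for _, _, module in kept]
    return grid, colors
```

Writing the result to xlsx with openpyxl is then a loop of `ws.append(row)` calls, plus `PatternFill(start_color=color, end_color=color, fill_type="solid")` on each header cell.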

### Present summary

After transformation, report:
- Number of data rows
- Which Emu modules have data (Sites, Events, Catalog)
- Sample of the transformed data
- Any data quality notes (missing values, format issues)
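
The row count, module coverage, and missing-value counts can be computed directly from the mapped records. This sketch covers missing values only and leaves format checks out. The `field_modules` lookup (Emu field to module) is assumed to come from the template, and the function name is hypothetical.

```python
def summarize(records, field_modules):
    """records: list of dicts keyed by Emu field; field_modules: Emu field -> module name."""
    def filled(value):
        return value not in (None, "")

    # Modules with at least one filled value anywhere in the data.
    modules = sorted({field_modules[f] for rec in records
                      for f, v in rec.items() if f in field_modules and filled(v)})
    # Per-field count of rows where the value is missing (fields with no gaps omitted).
    missing = {}
    for field in field_modules:
        n = sum(1 for rec in records if not filled(rec.get(field)))
        if n:
            missing[field] = n
    return {"rows": len(records), "modules": modules, "missing": missing}
```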

---

## Phase 1: Sites

Match user localities to existing Emu site records, create new records where needed, and obtain site IRNs.

**Detailed reference**: `references/phase1_sites.md`

### Step 1.1: Extract site data

From the transformed user xlsx (`/tmp/emu_user_data.xlsx`), extract the site columns (green-filled: hierarchy, elevation, coordinates, site number).

```bash
python3 scripts/parse_user_data.py /tmp/emu_user_data.xlsx /tmp/emu_user_sites.json
```

Report: number of specimens, site columns detected, sample data.
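
The extraction idea, selecting the green columns and emitting one site dict per specimen, can be sketched as follows. This is not the logic of `scripts/parse_user_data.py`, which is not shown here. The grid and colors layout and the function names are assumptions. With openpyxl, a column's color can be read from a header cell's `cell.fill.fgColor.rgb`.

```python
import json

SITES_GREEN = "FFCCFFCC"  # Sites column fill from the template description


def extract_site_records(grid, colors):
    """grid: template rows (Row 1 friendly names, Row 2 Emu fields, Row 3+ data).
    colors: ARGB fill per column. Returns one dict of site fields per specimen row."""
    site_cols = [i for i, color in enumerate(colors) if color == SITES_GREEN]
    fields = grid[1]
    return [{fields[i]: row[i] for i in site_cols} for row in grid[2:]]


def save_site_records(records, path):
    """Write the site records as JSON, e.g. to /tmp/emu_user_sites.json."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(records, fh, indent=2)
```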

### Step 1.2: Get Emu sites export

Ask for the Emu sites export (CSV). If identified during file discovery, use it directly. If the user doesn't have one:

1. Analyze the user's data and suggest search criteria (see `references/phase1_sites.md` § "Choosing search criteria")
2. Ask: "Do you know how to export sites from Emu, or would you like step-by-step guidance with screenshots?"
   - **Option 1 — Quick summary**: See `references/phase1_sites.md` § "Quick summary"
   - **Option 2 — Step-by-step guide**: Walk through each screenshot one at a time (see `references/phase1_sites.md` § "Step-by-step screenshot guide"). Show one step, wait for confirmation, then proceed.

Parse the export:
```bash
python3 scripts/parse_emu_export.py <emu_export.csv> /tmp/emu_index.json
```

Report: records loaded, coordinate coverage.
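
Coordinate coverage is the share of export records that carry both a latitude and a longitude. In this sketch the export is read with the standard `csv` module. The column names are passed in because they are assumptions: check the header of your actual export.

```python
import csv


def coordinate_coverage(fh, lat_col, lon_col):
    """Count export records and how many carry both coordinates.

    fh: an open text file (or io.StringIO) holding the Emu sites export CSV.
    lat_col, lon_col: latitude/longitude column names in that export (assumed).
    Returns (total_records, records_with_coords, percent_with_coords).
    """
    total = with_coords = 0
    for row in csv.DictReader(fh):
        total += 1
        lat = (row.get(lat_col) or "").strip()
        lon = (row.get(lon_col) or "").strip()
        if lat and lon:
            with_coords += 1
    pct = 100.0 * with_coords / total if total else 0.0
    return total, with_coords, pct
```

Open the file with `open(path, newline="", encoding="utf-8")`, as the `csv` module documentation recommends. The encoding is itself an assumption about how Emu writes its exports.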

### Step 1.3: Deduplicate and match

```bash
python3 scripts/deduplicate_sites.py /tmp/emu_user_sites.json /tmp/emu_dedup.json
python3 scripts/match_sites.py /tmp/emu_dedup.json /tmp/emu_index.json /tmp/emu_match.json
```

Report: "Your N specimens contain M unique sites."
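
Deduplication amounts to grouping specimens whose site fields are identical after light normalization. This minimal sketch trims and case-folds values and keeps the first record seen as each group's representative. It is a sketch of the idea, not the logic of `scripts/deduplicate_sites.py`, and the function names are hypothetical.

```python
def site_key(rec):
    """Normalized, order-independent key: trimmed, case-folded non-empty values."""
    return tuple(sorted((k, str(v).strip().casefold())
                        for k, v in rec.items() if v not in (None, "")))


def deduplicate_sites(site_records):
    """Group specimens by site. Returns a list of
    {"site": first record seen, "specimens": [row indices]}, in first-seen order."""
    groups = {}
    for idx, rec in enumerate(site_records):
        group = groups.setdefault(site_key(rec), {"site": rec, "specimens": []})
        group["specimens"].append(idx)
    return list(groups.values())
```

The report line is then `f"Your {len(records)} specimens contain {len(unique)} unique sites."`.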

Review matches using your judgment (see `references/phase1_sites.md` § "Match review guidelines"):
- **Exact matches** (score >= 90): accept without asking the user
- **Near matches** (60–89): present comparison table, ask user to confirm/reject each
- **No matches**: note for parent finding
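
The thresholds above translate into a simple bucketing step. In this sketch, each match is assumed to be a `(site_id, best_score)` pair, with `None` meaning no candidate at all. Scores below 60 are treated as no match, which is an assumption because the rules above do not name a lower bound.

```python
def triage_matches(matches, exact=90, near=60):
    """Bucket (site_id, best_score) pairs using the review thresholds above."""
    buckets = {"exact": [], "near": [], "none": []}
    for site_id, score in matches:
        if score is not None and score >= exact:
            buckets["exact"].append(site_id)   # accept without asking
        elif score is not None and score >= near:
            buckets["near"].append(site_id)    # show comparison table, user decides
        else:
            buckets["none"].append(site_id)    # goes on to parent finding
    return buckets
```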

### Step 1.4: Find parents and intermediate chain

```bash
python3 scripts/find_parents.py /tmp/emu_match.json /tmp/emu_index.json /tmp/emu_parents.json
```

Review parent results (see `references/phase1_sites.md` § "Parent rules"):
- **Perfect parents** (score >= 90): proceed silently.
- **Partial parents**: present to user for confirmation.
- **No parent found**: flag for manual resolution.
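
Once the most specific existing ancestor of an unmatched site is known, every level the user specified between that parent and the site itself has to be created. This sketch illustrates the idea under stated assumptions, not the output format of `scripts/find_parents.py`. It assumes the site's most specific level is the site itself and is created separately. The level keys are hypothetical.

```python
def intermediate_chain(site, levels, parent_level):
    """site: dict level -> name for one unmatched site.
    levels: hierarchy levels, broadest first (hypothetical keys).
    parent_level: deepest level matched to an existing Emu record, or None.
    Returns the user-specified levels between the parent and the site itself,
    each marked needs_creation. The site's own (most specific) level is excluded."""
    start = levels.index(parent_level) + 1 if parent_level else 0
    specified = [lvl for lvl in levels[start:] if site.get(lvl)]
    return [{"level": lvl, "name": site[lvl], "status": "needs_creation"}
            for lvl in specified[:-1]]
```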

The output also includes a `chain` array for each unmatched site listing every intermediate level the user specified (default status `needs_creation`). For each intermediate, call `match_named_

Source: https://github.com/brunoasm/my_claude_skills/tree/main/Emu_bulk_upload_FMNH
