cupynumeric-migration-readiness

Pre-migration readiness assessor for porting NumPy to cuPyNumeric. Use BEFORE substantial porting work begins when the user asks whether code will scale on GPU, whether they should migrate to cuPyNumeric, which NumPy patterns transfer cleanly, what must be refactored before porting, or mentions pre-port assessment, scaling analysis, or refactor planning. Inspect the user's source code, look up NumPy usage, cross-reference the cuPyNumeric API support manifest, and distinguish distributed-scaling-friendly patterns from blockers such as unsupported APIs, scalar synchronization, host round-trips, Python/object-heavy control flow, shape/data-dependent branching, and in-place mutation hazards. Produce a verdict of READY, LIGHT REFACTOR, SIGNIFICANT REFACTOR, or NOT RECOMMENDED, with concrete refactor pointers.


# cuPyNumeric Migration Readiness

## Purpose

**Use this skill BEFORE the migration, not during.** Answer one question: *which of the user's existing NumPy APIs will scale on cuPyNumeric, and which need refactoring, before they commit engineer-weeks to porting?* To answer it: read the source, classify each NumPy idiom by its expected multi-GPU scaling on the Legate/NVIDIA GPU stack, cross-reference the bundled API-support manifest, and produce a structured verdict with per-finding reasoning and recipe pointers.

**This is a static, read-only assessment.** Inspect the user's source with `Read`, `Grep`, and `Glob`. Do **not** execute the user's code, modify or write files, or print environment variables or secrets. The `legate` and cuPyNumeric Doctor commands shown below are suggestions for the *user* to run — not actions this skill performs.

On first use of this skill, read [`references/getting-started.md`](references/getting-started.md) before continuing.

## When to use this skill

Use when the user is **about to** migrate NumPy code to GPU and asks whether it will scale on cuPyNumeric / GPU, whether they should migrate, which parts will benefit, what must change before porting, or whether the port is worth it — or mentions pre-port assessment, scaling analysis, idiom analysis, GPU refactor planning, or identifying NumPy anti-patterns for GPU.

**Decline and redirect** when the request is *not* a pre-migration assessment:

- **Post-migration performance / profiling** ("already ported, why is it slow?") → point to `legate --profile` and the upstream [profiling and debugging](https://docs.nvidia.com/cupynumeric/latest/user/profiling_debugging.html) walkthrough.
- **Custom CUDA / kernel authoring** ("write/optimize a CUDA kernel")

A graph / sparse / ML / NLP workload that the user *is* asking to migrate is still **in scope**: assess it and return **NOT RECOMMENDED** via Gate 4. That is a verdict, not a decline.

## Instructions

Run all five steps below, in order. Read the user's code and reason about it semantically; do not emit a one-shot prose verdict.

### Step 1 — Gather context

Elicit before scanning code. Each item below has a default tuned to the typical workload — use the default when the user does not volunteer specifics; do not block on questions.

- **Source location.** Default to the current working directory when no path is given.
- **Approximate hot-path array sizes at runtime.** Default to 30–50 million elements. Map the user's numbers (or this default) to the [Gate 2 tiers](references/decision-framework.md#gate-2-problem-size) (65K per-GPU floor; 10M+ for real single-GPU speedup; 100M+ for multi-GPU).
- **Target hardware.** Default to 1–4 GPUs, single-node. Confirm before assuming multi-node. For CPU-only runs, ask about RAM per node instead of FBMEM.
- **Dominant compute pattern.** Stencil / GEMM / Monte Carlo / reductions / mixed-with-SciPy. Ask the user to name it; otherwise infer it from the code in Step 3.

State the defaults you applied at the top of the assessment so the user can correct them. If a value is indeterminable, say so plainly and proceed with the qualitative-only assessment — do not fabricate numbers beyond the defaults above.
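The size thresholds quoted in Step 1 can be sketched as a small classifier. This is an illustration only: the function name, the tier labels, the reading of "65K" as 65,000, and the choice to compare the 10M / 100M thresholds against the total element count are all assumptions; the authoritative rules live in `references/decision-framework.md`.

```python
# Thresholds quoted in Step 1. Tier labels below are hypothetical.
PER_GPU_FLOOR = 65_000        # "65K per-GPU floor" (assumed to mean 65,000)
SINGLE_GPU_MIN = 10_000_000   # "10M+ for real single-GPU speedup"
MULTI_GPU_MIN = 100_000_000   # "100M+ for multi-GPU"


def gate2_tier(total_elements: int, gpus: int = 1) -> str:
    """Map a hot-path array size to a rough Gate 2 tier label."""
    if gpus < 1:
        raise ValueError("gpus must be at least 1")
    # The floor is per GPU, so divide the work across the devices first.
    if total_elements / gpus < PER_GPU_FLOOR:
        return "below per-GPU floor"
    if total_elements >= MULTI_GPU_MIN:
        return "multi-GPU candidate"
    if total_elements >= SINGLE_GPU_MIN:
        return "single-GPU candidate"
    return "above floor, speedup uncertain"


# The Step 1 default (30-50 million elements) on one GPU:
print(gate2_tier(40_000_000, gpus=1))
```

When the user gives no numbers, report the tier for the default range and say that it is a default, so the user can correct it.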

### Step 2 — Load the API support manifest

Read [`assets/api-support.md`](assets/api-support.md), the committed snapshot of the upstream NumPy-vs-cuPyNumeric comparison table. For each NumPy API the code calls, find its line and read the leading glyph:

- `✓✓ numpy.X` — implemented and works on multi-GPU (the best path).
- `✓ numpy.X` — implemented but single-GPU/CPU only (caveats multi-node).
- `🟡 numpy.X — <note>` — partial support; read the note.
- `✗ numpy.X` — not implemented on the cuPyNumeric distributed path. Behavior on call is version-specific (some unsupported APIs route through host NumPy, others raise an exception) — either way, hot-path use is a migration blocker. Do not promise users a silent fallback to host NumPy.

If the `Fetched:` line is more than ~90 days old, refresh the snapshot — see the **Available Scripts** section.
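The glyph legend and the 90-day freshness rule can be illustrated with a short parsing sketch. It assumes each manifest entry looks like the legend lines above (glyph, API name, optional note after a dash); the real layout of `assets/api-support.md` and of its `Fetched:` line may differ, and the function names and status labels are hypothetical.

```python
from datetime import date

# Order matters: the double check mark must be tested before the single one.
GLYPHS = [
    ("✓✓", "multi-gpu"),
    ("✓", "single-gpu-or-cpu"),
    ("🟡", "partial"),
    ("✗", "not-implemented"),
]

NOTE_SEPARATOR = "\u2014"  # the long dash used before the note in the legend


def classify_manifest_line(line: str):
    """Return (api, status, note) for a manifest entry, or None if no glyph leads it."""
    # Drop list bullets, surrounding whitespace, and inline-code backticks.
    text = line.strip().lstrip("-* ").replace("`", "")
    for glyph, status in GLYPHS:
        if text.startswith(glyph):
            rest = text[len(glyph):].strip()
            api, _, note = rest.partition(NOTE_SEPARATOR)
            return api.strip(), status, note.strip()
    return None


def snapshot_is_stale(fetched: date, today: date, max_age_days: int = 90) -> bool:
    """True when the snapshot is older than the ~90-day refresh window."""
    return (today - fetched).days > max_age_days


print(classify_manifest_line("✓✓ numpy.X"))
print(snapshot_is_stale(date(2025, 1, 1), date(2025, 6, 1)))
```

The dates are passed in as `date` objects because the sketch does not assume how the `Fetched:` line is formatted.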

### Step 3 — Read the code semantically

Walk the user's files with `Read` and `Grep` and classify each region of array math against [`references/idioms-that-scale.md`](references/idioms-that-scale.md) and [`references/idioms-that-block.md`](references/idioms-that-block.md) (full rationale and R-codes live there). Read semantically, not by regex: before flagging, confirm `arr` traces back to a `cupynumeric` array (or `np.*` aliased to it) and check whether the access sits inside a hot loop. Apply these rules:

- **Flag element loops** (`for i in range(n): arr[i] = ...`) as blockers; treat an epoch/step/file loop with a vectorized body as fine — distinguish the two.
- **Flag scalar sync** — `.item()` / `float()` / `int()` / `bool()` / `complex()` on a cuPyNumeric array inside a hot loop (per-iteration host sync); allow it at the boundary.
- **Flag reducing conditions** — `if`/`while` over an array reduction (`while np.max(err) > tol:`) syncs every iteration.
- **Flag hoistable allocation in a loop** as a fixable inefficiency.
- **Flag `mpi4py`** in runtime code that partitions/communicates array data alongside `cupynumeric` ([R108](references/idioms-that-block.md#r108)) — but first confirm it issues MPI calls on a hot path; ignore a grep hit in a README, build script, or alt-launcher.
- **Flag `order=`** on `reshape` / `asarray` / `flatten` as [R109](references/idioms-that-block.md#r109) — always, regardless of whether the version warns or silently no-ops.
- **Always cite [R304](references/idioms-that-scale.md#r304)** in INFO for `np.random.*` under multi-GPU: cross-GPU bit-identical reproducibility is impossible by default (`--gpus N` / `LEGATE_GPUS` is the [Legate launcher arg](https://docs.nvidia.com/legate/latest/manual/usage/running.html)).
- **Flag Python builtins on arrays** (`sum`/`max`/`min`/`any`/`iter(arr)`) — host-iteration fallback ([R110](references/idioms-that-block.md#r110); [upstream best practices](https://nv-legate.github.io/cupynumeric/user/practices.html#use-numpy-s-functions-avoid-using-python-s-built-in-functions)). Allow `len(arr)` (shape lookup; prefer `arr.shape[0]` / `arr.size` for 0-d safety).
- **Flag `cupy` mixed with `cupynumeric`** in a hot loop ([R111](references/idioms-that-block.md#r111)); the runtimes don't share GPU memory, so every hop goes through host NumPy.
- **Look up every NumPy API the code calls** in `assets/api-support.md` (glyph legend in Step 2).

For the deep "why," read [`references/gpu-stack.md`](references/gpu-stack.md) (memory, SM, communication, dispatch) and [`references/execution-model.md`](references/execution-model.md) (lazy execution, sync points, mapper).
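Three of the rules above (element loops, Python builtins on arrays, and reducing conditions) are easier to recognize side by side with their vectorized counterparts. The sketch below uses plain NumPy so it runs anywhere; a port would swap the import for `import cupynumeric as np`. The function names are hypothetical, and the "check convergence every `k` steps" variant is one common mitigation assumed here, not necessarily the recipe that `references/refactor-recipes.md` prescribes.

```python
import numpy as np  # a port would use: import cupynumeric as np


def scale_loop(arr, factor):
    """Blocker shape: one Python-level element access per iteration."""
    out = np.empty_like(arr)
    for i in range(arr.shape[0]):
        out[i] = arr[i] * factor
    return out


def scale_vectorized(arr, factor):
    """Scaling shape: a single whole-array operation."""
    return arr * factor


def total_builtin(arr):
    """Blocker shape (R110): the Python builtin iterates element by element."""
    return sum(arr)


def total_numpy(arr):
    """Scaling shape: the reduction stays inside the array library."""
    return np.sum(arr)


def relax_sync_every_step(x, tol, max_steps=1000):
    """Reducing condition in the loop header: one reduction result per step."""
    steps = 0
    while np.max(np.abs(x)) > tol and steps < max_steps:
        x = x * 0.5
        steps += 1
    return x, steps


def relax_check_every_k(x, tol, k=4, max_steps=1000):
    """Assumed mitigation: test convergence only once every k steps."""
    steps = 0
    while steps < max_steps:
        for _ in range(k):
            x = x * 0.5
        steps += k
        if np.max(np.abs(x)) <= tol:
            break
    return x, steps


a = np.arange(5.0)
print(np.allclose(scale_loop(a, 2.0), scale_vectorized(a, 2.0)))
print(total_builtin(a) == total_numpy(a))
```

Each pair computes the same result. What differs is how often control returns to the Python interpreter, which is the property the assessment is classifying.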

### Step 4 — Produce a structured assessment

Deliver the report in this order. Cite `file:line` for every finding so the user can navigate.

1. **Verdict** in one sentence — see "Verdict framework" below.
1. **What works (SCALES findings)** — quote representative lines so the user sees what will speed up after the import swap.
1. **What blocks (BLOCKS findings)** — each tied to [`idioms-that-block.md`](references/idioms-that-block.md) and a recipe in [`refactor-recipes.md`](references/refactor-recipes.md).
1. **What's fixable (REFACTOR findings)** — group by recipe; one recipe often fixes many sites.
1. **Compatibility / cost notes (INFO findings)** — SciPy boundaries, single-GPU-only linalg / FFT, RNG layout vs `--gpus N`.
1. **API support gaps** — APIs the code calls that are unimplemented or single-GPU only per the manifest.
1. **Decision-framework summary** — Gates 1–6 from [`references/decision-framework.md`](references/decision-framework.md), marked pass / fail / uncertain.
1. **Recommended next steps** — which recipes to apply first, whether to port one module first, and when to involve cuPyNumeric Doctor.
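As an illustration of the report shape, the sketch below renders the eight sections in order and attaches a `file:line` citation to each finding. The `Finding` class, the `render_report` function, and the plain-text layout are hypothetical; the marker printed under an empty section is supplied by the caller, since this sketch does not assume its wording.

```python
from dataclasses import dataclass

# Section titles in the order the report must present them.
SECTIONS = [
    "Verdict",
    "What works (SCALES findings)",
    "What blocks (BLOCKS findings)",
    "What's fixable (REFACTOR findings)",
    "Compatibility / cost notes (INFO findings)",
    "API support gaps",
    "Decision-framework summary",
    "Recommended next steps",
]


@dataclass
class Finding:
    section: str   # must match one of SECTIONS
    path: str      # source file the finding refers to
    line: int      # line number within that file
    message: str   # short reasoning or recipe pointer


def render_report(findings, empty_marker):
    """Render every section, even when it has no findings."""
    lines = []
    for number, title in enumerate(SECTIONS, start=1):
        lines.append(f"{number}. {title}")
        hits = [f for f in findings if f.section == title]
        if not hits:
            lines.append(f"   {empty_marker}")
        for f in hits:
            # Cite file:line so the user can navigate to the site.
            lines.append(f"   - {f.path}:{f.line} {f.message}")
    return "\n".join(lines)


example = [Finding("What blocks (BLOCKS findings)", "solver.py", 42, "element loop")]
print(render_report(example, empty_marker="<empty>"))
```

Iterating over the fixed section list, rather than over the findings, is what guarantees that no section is silently dropped.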

**All 8 sections must appear**, even when the verdict is READY or NOT RECOMMENDED. Under an empty section write **"N

Files: 41

Size: 297.4 KB

Complexity: 100/100

Category: Backend & APIs

Source: https://github.com/NVIDIA/skills/tree/main/skills/cupynumeric-migration-readiness
