east-py-datascience
Data science and machine learning platform functions for the East language (TypeScript types). Use when writing East programs that need optimization (MADS, Optuna, SimAnneal, Scipy, Optimization, GoogleOr), machine learning (XGBoost, LightGBM, NGBoost, Torch MLP, Lightning, GP), Bayesian inference (PyMC), simulation (Simulation DES), ML utilities (Sklearn preprocessing, metrics, splits), conformal prediction (MAPIE), or model explainability (SHAP). Triggers for: (1) Writing East programs with @elaraai/east-py-datascience, (2) Derivative-free optimization with MADS, (3) Bayesian optimization with Optuna, (4) Discrete/combinatorial optimization with SimAnneal, (5) Gradient boosting with XGBoost or LightGBM, (6) Probabilistic predictions with NGBoost or GP, (7) Neural networks with Torch MLP or Lightning, (8) Data preprocessing and metrics with Sklearn, (9) Conformal prediction intervals with MAPIE, (10) Model explainability with Shap, (11) Iterative coordinate descent with Optimization, (12) Constraint programming, vehicle routing, LP/MIP, or graph algorithms with GoogleOr, (13) Bayesian regression, hierarchical models, and multi-layer estimation with PyMC, (14) Economic ontology simulation via discrete event simulation with Simulation.
What this skill does
# East Data Science
Data science and machine learning platform functions for the East language. Provides optimization, ML models, Bayesian inference, discrete event simulation, preprocessing and metrics, conformal prediction, and explainability.
## Quick Start
```typescript
import { East, FloatType, variant } from "@elaraai/east";
import { MADS } from "@elaraai/east-py-datascience";

// Define objective function
const objective = East.function([MADS.Types.VectorType], FloatType, ($, x) => {
    const x0 = $.let(x.get(0n));
    const x1 = $.let(x.get(1n));
    return $.return(x0.multiply(x0).add(x1.multiply(x1)));
});

// Optimize
const optimize = East.function([], MADS.Types.ResultType, $ => {
    const x0 = $.let([0.5, 0.5]);
    const bounds = $.let({ lower: [-1.0, -1.0], upper: [1.0, 1.0] });
    const config = $.let({
        max_bb_eval: variant('some', 100n),
        display_degree: variant('some', 0n),
        direction_type: variant('none', null),
        initial_mesh_size: variant('none', null),
        min_mesh_size: variant('none', null),
        seed: variant('some', 42n),
    });
    return $.return(MADS.optimize(objective, x0, bounds, variant('none', null), config));
});
```
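To make the idea of derivative-free optimization concrete, here is a minimal plain-TypeScript sketch that minimizes the same objective (`x0² + x1²`) inside the same bounds without using gradients. It is not MADS and does not call the East or `east-py-datascience` APIs: it is a simple compass (coordinate pattern) search, a much simpler relative of mesh-based direct search. The names `sphere`, `compassSearch`, `Bounds`, and the step parameters are assumptions introduced only for this illustration.

```typescript
// Plain-TypeScript stand-in for the Quick Start objective: f(x) = x0^2 + x1^2.
type Vector = number[];
type Bounds = { lower: Vector; upper: Vector };

function sphere(x: Vector): number {
    return x[0] * x[0] + x[1] * x[1];
}

// Compass search (illustrative, NOT the MADS algorithm):
// poll +/- step along each axis, accept any improving point,
// and halve the step when a full poll finds no improvement.
function compassSearch(
    f: (x: Vector) => number,
    start: Vector,
    bounds: Bounds,
    maxEvals: number,
    initialStep: number = 0.5,
    minStep: number = 1e-6,
): { x: Vector; value: number; evals: number } {
    let best = start.slice();
    let bestValue = f(best);
    let evals = 1;
    let step = initialStep;

    while (evals < maxEvals && step >= minStep) {
        let improved = false;
        for (let i = 0; i < best.length; i++) {
            for (const sign of [1, -1]) {
                if (evals >= maxEvals) break;
                const trial = best.slice();
                // Clamp the trial point to the box constraints.
                const moved = trial[i] + sign * step;
                trial[i] = Math.min(bounds.upper[i], Math.max(bounds.lower[i], moved));
                const value = f(trial);
                evals++;
                if (value < bestValue) {
                    best = trial;
                    bestValue = value;
                    improved = true;
                }
            }
        }
        // No axis direction helped: refine the step (analogous to shrinking a mesh).
        if (!improved) step /= 2;
    }
    return { x: best, value: bestValue, evals };
}

// Same start point, bounds, and evaluation budget as the Quick Start config.
const result = compassSearch(sphere, [0.5, 0.5], { lower: [-1.0, -1.0], upper: [1.0, 1.0] }, 100);
console.log(result.x, result.value, result.evals);
```

The evaluation budget plays the same role as `max_bb_eval` in the Quick Start config: the objective is treated as a black box and only its returned value is used to decide where to look next.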
## Decision Tree: Which Module to Use
```
Task → What do you need?
│
├─ MADS (derivative-free continuous optimization)
│ └─ .optimize()
│
├─ Optuna (Bayesian hyperparameter tuning)
│ └─ .optimize()
│
├─ SimAnneal (discrete/combinatorial optimization)
│ └─ .optimize(), .optimizePermutation(), .optimizeSubset()
│
├─ ALNS (adaptive large neighborhood search)
│ ├─ .optimize([SolutionType], initial, objective, destroys, repairs, config)
│ └─ Generic over solution type S - define your own struct
│
├─ Optimization (iterative coordinate descent)
│ └─ .iterative(objective, paramSpaces, config)
│
├─ GoogleOr (Google OR-Tools)
│ ├─ CP-SAT → .cpsatSolve(), .cpsatSolveAll()
│ ├─ Routing → .routingSolve() (TSP, CVRP, VRPTW, VRPPD)
│ ├─ Linear → .linearSolve() (LP, MIP)
│ └─ Graph → .minCostFlow(), .maxFlow(), .assignment()
│
├─ Scipy
│ ├─ Optimization → .optimizeMinimize(), .optimizeMinimizeQuadratic(), .optimizeDualAnnealing()
│ ├─ Statistics → .statsDescribe(), .statsPearsonr(), .statsSpearmanr(), .statsPercentile(), .statsPercentileOfScore(), .statsIqr(), .statsMedian(), .statsMad(), .statsRobust()
│ ├─ Histogram/KDE → .histogram(), .kdeFit(), .kdeEvaluate()
│ ├─ Curve Fitting → .curveFit()
│ └─ Interpolation → .interpolate1dFit(), .interpolate1dPredict()
│
├─ XGBoost (gradient boosting)
│ ├─ Train → .trainRegressor(), .trainClassifier(), .trainQuantile()
│ └─ Predict → .predict(), .predictClass(), .predictProba(), .predictQuantile()
│
├─ LightGBM (fast gradient boosting)
│ ├─ Train → .trainRegressor(), .trainClassifier()
│ └─ Predict → .predict(), .predictClass(), .predictProba()
│
├─ NGBoost (probabilistic gradient boosting)
│ ├─ Train → .trainRegressor()
│ └─ Predict → .predict(), .predictDist()
│
├─ Torch (neural networks)
│ ├─ Train → .mlpTrain(), .mlpTrainMulti()
│ ├─ Predict → .mlpPredict(), .mlpPredictMulti()
│ └─ Embeddings → .mlpEncode(), .mlpDecode()
│
├─ Lightning (PyTorch Lightning neural networks)
│ ├─ Train → .train(X, y, config, masks, group_weights, conditions)
│ ├─ Predict → .predict(model, X, masks, conditions)
│ ├─ Embeddings → .encode(), .decode(), .decodeConditional() (autoencoder only)
│ ├─ Architectures:
│ │ ├─ mlp: simple feedforward
│ │ ├─ autoencoder: encoder → latent → decoder
│ │ ├─ conv1d: 1D convolutional autoencoder (temporal)
│ │ ├─ sequential: LSTM/GRU autoencoder (temporal)
│ │ └─ transformer: attention-based autoencoder (temporal)
│ ├─ Output modes:
│ │ ├─ regression: MSE loss
│ │ ├─ binary: BCE loss, per-position pos_weights (VectorType), masks
│ │ └─ multi_head: N independent CE heads, per-head class_weights, masks
│ ├─ Conditional generation: condition_dim in temporal architectures
│ └─ Features: early stopping, gradient clipping, epoch callbacks, group_weights
│
├─ GP (Gaussian Process regression)
│ ├─ Train → .train()
│ └─ Predict → .predict(), .predictStd()
│
├─ MAPIE (conformal prediction intervals)
│ ├─ Regression → .trainConformalRegressor(), .trainCQR()
│ ├─ Classification → .trainConformalClassifier()
│ ├─ Predict → .predictInterval(), .predictSet()
│ └─ SHAP integration → .uncertaintyPredictorRegressor(), .uncertaintyPredictorClassifier()
│
├─ Sklearn (preprocessing, metrics & clustering)
│ ├─ Splitting → .split() (N-way with stratify, overlap, multi_overlap)
│ ├─ Overlap filtering → .overlap()
│ ├─ Scaling → .standardScalerFit/Transform(), .minMaxScalerFit/Transform(), .robustScalerFit/Transform()
│ ├─ Encoding → .labelEncoderFit/Transform/InverseTransform(), .ordinalEncoderFit/Transform()
│ ├─ Class weights → .computeClassWeight()
│ ├─ Regression metrics → .computeMetrics(), .computeMetricsMulti()
│ ├─ Classification metrics → .computeClassificationMetrics(), .computeClassificationMetricsMulti()
│ ├─ Probability metrics → .rocAucScore(), .logLoss(), .confusionMatrix()
│ ├─ Multi-target → .regressorChainTrain(), .regressorChainPredict()
│ ├─ GMM clustering → .gmmFit(), .gmmPredict(), .gmmPredictProba(), .gmmScoreSamples(), .gmmSample(), .gmmBic(), .gmmAic()
│ └─ Clustering evaluation → .silhouetteScore()
│
├─ PyMC (Bayesian inference)
│ ├─ Train → .trainRegression(), .trainHierarchical(), .trainMultiLayer()
│ ├─ Predict → .predict(), .predictDistribution()
│ ├─ Posterior → .posteriorSummary(), .posteriorSamples()
│ └─ Diagnostics → .diagnostics(), .posteriorPredictiveCheck()
│
├─ Simulation (economic ontology simulation via DES)
│ ├─ Single run → .run([R, E], initialState, initialEvents, process, config)
│ └─ Monte Carlo → .runTrajectories([R, E], initialState, initialEvents, process, config)
│
└─ Shap (model explainability)
   ├─ Create → .treeExplainerCreate() (XGBoost only), .kernelExplainerCreate() (any model)
   ├─ Compute → .computeValues(), .featureImportance()
   └─ Supports → TreeExplainer: XGBoost; KernelExplainer: XGBoost, LightGBM, NGBoost, GP, Torch, RegressorChain, MAPIE
```
## Common Types
| Type | Definition | Description |
|------|------------|-------------|
| `VectorType` | `ArrayType(FloatType)` | 1D array of floats (e.g., `[1.0, 2.0, 3.0]`) |
| `MatrixType` | `ArrayType(ArrayType(FloatType))` | 2D array of floats (e.g., `[[1.0, 2.0], [3.0, 4.0]]`) |
| `LabelVectorType` | `ArrayType(IntegerType)` | Class labels as integers (e.g., `[0n, 1n, 0n, 2n]`) |
| `ModelBlobType` | `BlobType` | Serialized model (opaque, pass to predict functions) |
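
As a rough guide to the data shapes in the table above, the following plain-TypeScript sketch mirrors them with ordinary type aliases and a small shape check. These aliases are illustrative analogues only, not the East type objects themselves, and `checkShapes` is a hypothetical helper that is not part of `@elaraai/east-py-datascience`.

```typescript
// Plain-TypeScript analogues of the shapes in the Common Types table.
// Illustrative only: these are NOT the East type objects.
type Vector = number[];       // cf. VectorType: 1D array of floats
type Matrix = number[][];     // cf. MatrixType: 2D array of floats (rows of features)
type LabelVector = bigint[];  // cf. LabelVectorType: integer class labels

// Hypothetical helper: confirm that a feature matrix is rectangular and
// has exactly one regression target per row before training on it.
function checkShapes(X: Matrix, y: Vector): { rows: number; cols: number } {
    const rows = X.length;
    const cols = rows > 0 ? X[0].length : 0;
    for (let i = 0; i < rows; i++) {
        if (X[i].length !== cols) {
            throw new Error("row " + i + " has " + X[i].length + " columns, expected " + cols);
        }
    }
    if (y.length !== rows) {
        throw new Error("y has " + y.length + " entries, expected " + rows);
    }
    return { rows, cols };
}

// Two samples with two features each, and one target per sample.
const shape = checkShapes([[1.0, 2.0], [3.0, 4.0]], [0.5, 1.5]);
console.log(shape.rows, shape.cols);
```

Checking shapes up front is a design choice for this sketch: a ragged matrix or a mismatched target length is easier to diagnose before the data is handed to a training function.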
## Reference Documentation
- **[API Reference](./reference/api.md)** - Complete function signatures, types, and config options
- **[Examples](./reference/examples.md)** - Working code examples by use case
## Available Modules
| Module | Import | Purpose |
|--------|--------|---------|
| MADS | `import { MADS } from "@elaraai/east-py-datascience"` | Derivative-free blackbox optimization |
| Optuna | `import { Optuna } from "@elaraai/east-py-datascience"` | Bayesian optimization (hyperparameter tuning) |
| SimAnneal | `import { SimAnneal } from "@elaraai/east-py-datascience"` | Simulated annealing (permutation/subset) |
| ALNS | `import { ALNS } from "@elaraai/east-py-datascience"` | Adaptive Large Neighborhood Search (generic over solution type) |
| Scipy | `import { Scipy } from "@elaraai/east-py-datascience"` | Statistics, optimization, interpolation |
| XGBoost | `import { XGBoost } from "@elaraai/east-py-datascience"` | Gradient boosting (regression/classification/quantile) |
| LightGBM | `import { Ligh…` |