torchdrug
Graph-based drug discovery toolkit. Molecular property prediction (ADMET), protein modeling, knowledge graph reasoning, molecular generation, retrosynthesis, GNNs (GIN, GAT, SchNet), 40+ datasets, for PyTorch-based ML on molecules, proteins, and biomedical graphs.
What this skill does
# TorchDrug
## Overview
TorchDrug is a comprehensive PyTorch-based machine learning toolbox for drug discovery and molecular science. Apply graph neural networks, pre-trained models, and task definitions to molecules, proteins, and biological knowledge graphs, including molecular property prediction, protein modeling, knowledge graph reasoning, molecular generation, retrosynthesis planning, with 40+ curated datasets and 20+ model architectures.
## When to Use This Skill
This skill should be used when working with:
**Data Types:**
- SMILES strings or molecular structures
- Protein sequences or 3D structures (PDB files)
- Chemical reactions and retrosynthesis
- Biomedical knowledge graphs
- Drug discovery datasets
**Tasks:**
- Predicting molecular properties (solubility, toxicity, activity)
- Protein function or structure prediction
- Drug-target binding prediction
- Generating new molecular structures
- Planning chemical synthesis routes
- Link prediction in biomedical knowledge bases
- Training graph neural networks on scientific data
**Libraries and Integration:**
- TorchDrug is the primary library
- Often used with RDKit for cheminformatics
- Compatible with PyTorch and PyTorch Lightning
- Integrates with AlphaFold and ESM for proteins
## Getting Started
### Installation
```bash
uv pip install torchdrug
# Or with optional dependencies
uv pip install torchdrug[full]
```
### Quick Example
```python
from torchdrug import datasets, models, tasks
from torch.utils.data import DataLoader
# Load molecular dataset
dataset = datasets.BBBP("~/molecule-datasets/")
train_set, valid_set, test_set = dataset.split()
# Define GNN model
model = models.GIN(
input_dim=dataset.node_feature_dim,
hidden_dims=[256, 256, 256],
edge_input_dim=dataset.edge_feature_dim,
batch_norm=True,
readout="mean"
)
# Create property prediction task
task = tasks.PropertyPrediction(
model,
task=dataset.tasks,
criterion="bce",
metric=["auroc", "auprc"]
)
# Train with PyTorch
optimizer = torch.optim.Adam(task.parameters(), lr=1e-3)
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
for epoch in range(100):
for batch in train_loader:
loss = task(batch)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```
## Core Capabilities
### 1. Molecular Property Prediction
Predict chemical, physical, and biological properties of molecules from structure.
**Use Cases:**
- Drug-likeness and ADMET properties
- Toxicity screening
- Quantum chemistry properties
- Binding affinity prediction
**Key Components:**
- 20+ molecular datasets (BBBP, HIV, Tox21, QM9, etc.)
- GNN models (GIN, GAT, SchNet)
- PropertyPrediction and MultipleBinaryClassification tasks
**Reference:** See `references/molecular_property_prediction.md` for:
- Complete dataset catalog
- Model selection guide
- Training workflows and best practices
- Feature engineering details
### 2. Protein Modeling
Work with protein sequences, structures, and properties.
**Use Cases:**
- Enzyme function prediction
- Protein stability and solubility
- Subcellular localization
- Protein-protein interactions
- Structure prediction
**Key Components:**
- 15+ protein datasets (EnzymeCommission, GeneOntology, PDBBind, etc.)
- Sequence models (ESM, ProteinBERT, ProteinLSTM)
- Structure models (GearNet, SchNet)
- Multiple task types for different prediction levels
**Reference:** See `references/protein_modeling.md` for:
- Protein-specific datasets
- Sequence vs structure models
- Pre-training strategies
- Integration with AlphaFold and ESM
### 3. Knowledge Graph Reasoning
Predict missing links and relationships in biological knowledge graphs.
**Use Cases:**
- Drug repurposing
- Disease mechanism discovery
- Gene-disease associations
- Multi-hop biomedical reasoning
**Key Components:**
- General KGs (FB15k, WN18) and biomedical (Hetionet)
- Embedding models (TransE, RotatE, ComplEx)
- KnowledgeGraphCompletion task
**Reference:** See `references/knowledge_graphs.md` for:
- Knowledge graph datasets (including Hetionet with 45k biomedical entities)
- Embedding model comparison
- Evaluation metrics and protocols
- Biomedical applications
### 4. Molecular Generation
Generate novel molecular structures with desired properties.
**Use Cases:**
- De novo drug design
- Lead optimization
- Chemical space exploration
- Property-guided generation
**Key Components:**
- Autoregressive generation
- GCPN (policy-based generation)
- GraphAutoregressiveFlow
- Property optimization workflows
**Reference:** See `references/molecular_generation.md` for:
- Generation strategies (unconditional, conditional, scaffold-based)
- Multi-objective optimization
- Validation and filtering
- Integration with property prediction
### 5. Retrosynthesis
Predict synthetic routes from target molecules to starting materials.
**Use Cases:**
- Synthesis planning
- Route optimization
- Synthetic accessibility assessment
- Multi-step planning
**Key Components:**
- USPTO-50k reaction dataset
- CenterIdentification (reaction center prediction)
- SynthonCompletion (reactant prediction)
- End-to-end Retrosynthesis pipeline
**Reference:** See `references/retrosynthesis.md` for:
- Task decomposition (center ID → synthon completion)
- Multi-step synthesis planning
- Commercial availability checking
- Integration with other retrosynthesis tools
### 6. Graph Neural Network Models
Comprehensive catalog of GNN architectures for different data types and tasks.
**Available Models:**
- General GNNs: GCN, GAT, GIN, RGCN, MPNN
- 3D-aware: SchNet, GearNet
- Protein-specific: ESM, ProteinBERT, GearNet
- Knowledge graph: TransE, RotatE, ComplEx, SimplE
- Generative: GraphAutoregressiveFlow
**Reference:** See `references/models_architectures.md` for:
- Detailed model descriptions
- Model selection guide by task and dataset
- Architecture comparisons
- Implementation tips
### 7. Datasets
40+ curated datasets spanning chemistry, biology, and knowledge graphs.
**Categories:**
- Molecular properties (drug discovery, quantum chemistry)
- Protein properties (function, structure, interactions)
- Knowledge graphs (general and biomedical)
- Retrosynthesis reactions
**Reference:** See `references/datasets.md` for:
- Complete dataset catalog with sizes and tasks
- Dataset selection guide
- Loading and preprocessing
- Splitting strategies (random, scaffold)
## Common Workflows
### Workflow 1: Molecular Property Prediction
**Scenario:** Predict blood-brain barrier penetration for drug candidates.
**Steps:**
1. Load dataset: `datasets.BBBP()`
2. Choose model: GIN for molecular graphs
3. Define task: `PropertyPrediction` with binary classification
4. Train with scaffold split for realistic evaluation
5. Evaluate using AUROC and AUPRC
**Navigation:** `references/molecular_property_prediction.md` → Dataset selection → Model selection → Training
### Workflow 2: Protein Function Prediction
**Scenario:** Predict enzyme function from sequence.
**Steps:**
1. Load dataset: `datasets.EnzymeCommission()`
2. Choose model: ESM (pre-trained) or GearNet (with structure)
3. Define task: `PropertyPrediction` with multi-class classification
4. Fine-tune pre-trained model or train from scratch
5. Evaluate using accuracy and per-class metrics
**Navigation:** `references/protein_modeling.md` → Model selection (sequence vs structure) → Pre-training strategies
### Workflow 3: Drug Repurposing via Knowledge Graphs
**Scenario:** Find new disease treatments in Hetionet.
**Steps:**
1. Load dataset: `datasets.Hetionet()`
2. Choose model: RotatE or ComplEx
3. Define task: `KnowledgeGraphCompletion`
4. Train with negative sampling
5. Query for "Compound-treats-Disease" predictions
6. Filter by plausibility and mechanism
**Navigation:** `references/knowledge_graphs.md` → Hetionet dataset → Model selection → Biomedical applications
### Workflow 4: De Novo Molecule Generation
**Scenario:** Generate drug-like molecules oRelated in General
modeling-omnistudio-epc-catalog
IncludedSalesforce Industries CME EPC product-modeling skill for Product2-based catalog creation. Use when creating EPC products, configuring product attributes, building offer bundles with Product Child Items, or reviewing EPC DataPack JSON metadata for product catalog changes. TRIGGER when: user creates or updates Product2 EPC records, AttributeAssignment payloads, AttributeMetadata/AttributeDefaultValues, Offer bundles, or ProductChildItem relationships. DO NOT TRIGGER when: designing OmniScripts/FlexCards/Integration Procedures (use building-omnistudio-omniscript, building-omnistudio-flexcard, or building-omnistudio-integration-procedure), implementing Apex business logic (use generating-apex), or troubleshooting deployment pipelines (use deploying-metadata).
relationship-science-coach
IncludedUse this skill for direct, practical adult relationship coaching: couples conflict, repair, trust, marriage, dating, flirting, attachment patterns, emotional connection, sex, desire differences, eroticism, kink negotiation, affection, love languages, breakups, and long-term passion. Draw on Gottman, EFT and Hold Me Tight, attachment science, modern sex research, Perel, Nagoski, Kerner, Schnarch, Love and Stosny, and flexible love-language tools. Be concrete and low-hedge. Redirect only for imminent danger, abuse, coercive control, minors, non-consent, self-harm, stalking, or medical/legal/psychiatric decisions.
building-sf-integrations
IncludedSalesforce integration architecture and runtime plumbing with 120-point scoring. Use this skill to set up Named Credentials, External Credentials, External Services, REST/SOAP callout patterns, Platform Events, and Change Data Capture. TRIGGER when: user sets up Named Credentials, External Services, REST/SOAP callouts, Platform Events, CDC, or touches .namedCredential-meta.xml files. DO NOT TRIGGER when: Connected App/OAuth config (use configuring-connected-apps), Apex-only logic (use generating-apex), or data import/export (use handling-sf-data).
venue-templates
IncludedAccess comprehensive LaTeX templates, formatting requirements, and submission guidelines for major scientific publication venues (Nature, Science, PLOS, IEEE, ACM), academic conferences (NeurIPS, ICML, CVPR, CHI), research posters, and grant proposals (NSF, NIH, DOE, DARPA). This skill should be used when preparing manuscripts for journal submission, conference papers, research posters, or grant proposals and need venue-specific formatting requirements and templates.
let-fate-decide
IncludedDraws the 12 Houses of the Zodiac Tarot spread to inject entropy into planning when prompts are vague, ambiguous, or casually delegated. Interprets the spread to guide next steps. Use when the user says 'let fate decide', 'YOLO', 'whatever', 'idk', or other nonchalant phrases, makes Yu-Gi-Oh references, or when you are about to arbitrarily pick between multiple reasonable approaches. Prefer over ask-questions-if-underspecified when the user's tone is casual or playful rather than precision-seeking.
net-ops
IncludedCross-platform network troubleshooting (Windows, macOS, Linux) via local or remote shell. Use for: DNS broken, can't resolve hostnames, nslookup/dig works but apps fail, NRPT, WFP, scutil, /etc/resolver, systemd-resolved, /etc/resolv.conf, NetworkManager, VPN DNS leak residue (ProtonVPN/Mullvad/WireGuard/AnyConnect), AV/firewall blocking DNS or DoH, Tailscale DNS interaction, intermittent connectivity, remote diagnostics over SSH.