mdanalysis

Comprehensive guide for MDAnalysis - the Python library for analyzing molecular dynamics trajectories. Use for trajectory loading, RMSD/RMSF calculations, distance/angle/dihedral analysis, atom selections, hydrogen bonds, solvent accessible surface area, protein structure analysis, membrane analysis, and integration with Biopython. Essential for MD simulation analysis.

# MDAnalysis - Molecular Dynamics Analysis

Python library for reading, writing, and analyzing molecular dynamics trajectories and structural files.

## When to Use

- Loading MD trajectories (DCD, XTC, TRR, NetCDF, etc.)
- RMSD and RMSF calculations
- Distance, angle, and dihedral analysis
- Atom selections (VMD-like syntax)
- Hydrogen bond analysis
- Solvent Accessible Surface Area (SASA)
- Protein secondary structure analysis
- Membrane system analysis
- Water/ion distribution analysis
- Trajectory alignment and fitting
- Custom trajectory analysis
- Converting between file formats

## Reference Documentation

**Official docs**: https://www.mdanalysis.org/docs/  
**Search patterns**: `MDAnalysis.Universe`, `MDAnalysis.analysis.rms`, `MDAnalysis.analysis.distances`

## Core Principles

### Use MDAnalysis For

| Task | Module | Example |
|------|--------|---------|
| Load trajectory | `Universe` | `Universe(topology, trajectory)` |
| RMSD calculation | `analysis.rms` | `RMSD(mobile, ref)` |
| Atom selection | `select_atoms` | `u.select_atoms('protein')` |
| Distance analysis | `analysis.distances` | `distance_array(pos1, pos2)` |
| H-bond analysis | `analysis.hydrogenbonds` | `HydrogenBondAnalysis(universe)` |
| SASA calculation | not in core MDAnalysis | external tools, e.g. FreeSASA or MDTraj `shrake_rupley` |
| Contacts analysis | `analysis.contacts` | `Contacts()` |
| Trajectory writing | `Writer` | `with mda.Writer('out.dcd', n_atoms) as W` |

### Do NOT Use For

- Running MD simulations (use GROMACS, AMBER, NAMD)
- Force field calculations (use OpenMM)
- Quantum chemistry (use PySCF, Psi4)
- Protein structure prediction (use AlphaFold, RosettaFold)
- Initial structure building (use Biopython, PyMOL)

## Quick Reference

### Installation

```bash
# pip
pip install MDAnalysis

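# With optional dependencies used by some analysis modules
# (quotes stop shells such as zsh from expanding the brackets)
pip install "MDAnalysis[analysis]"

# conda
conda install -c conda-forge mdanalysis

# Development version (the Python package lives in the repository's package/ subdirectory)
pip install "git+https://github.com/MDAnalysis/mdanalysis.git#subdirectory=package"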
```

### Standard Imports

```python
# Core imports
import MDAnalysis as mda
from MDAnalysis import Universe
from MDAnalysis.analysis import rms, align, distances

# Common analysis modules
from MDAnalysis.analysis.rms import RMSD, RMSF
from MDAnalysis.analysis.distances import distance_array
from MDAnalysis.analysis.hydrogenbonds.hbond_analysis import HydrogenBondAnalysis
from MDAnalysis.analysis.dihedrals import Dihedral

# Utilities
import numpy as np
import matplotlib.pyplot as plt
```

### Basic Pattern - Load and Analyze

```python
import MDAnalysis as mda
from MDAnalysis.analysis.rms import RMSD

# Load trajectory
u = mda.Universe('topology.pdb', 'trajectory.dcd')

# Select atoms
protein = u.select_atoms('protein')
ca_atoms = u.select_atoms('protein and name CA')

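# Backbone RMSD relative to frame 0 (the default reference frame);
# each frame is superimposed onto the reference before the RMSD is computed
rmsd_analysis = RMSD(protein, protein, select='backbone')
rmsd_analysis.run()

# Access results: one row per frame, columns are frame, time (ps), RMSD (Å)
rmsd = rmsd_analysis.results.rmsd
print(f"RMSD over time: {rmsd[:, 2]}")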
```
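To make the RMSD column less of a black box: for each frame, `RMSD` superimposes the selected atoms onto the reference and then reports the root-mean-square deviation. MDAnalysis performs the superposition with the QCP algorithm; the NumPy sketch below uses the Kabsch (SVD) method instead, which finds the same optimal rotation. It is unweighted and runs on made-up coordinates, so treat it as an illustration of the idea rather than a replacement for `RMSD`.

```python
import numpy as np

def kabsch_rmsd(mobile, reference):
    """RMSD after optimal translation + rotation (Kabsch/SVD), unweighted."""
    # Remove translation: center both coordinate sets on their centroids
    p = mobile - mobile.mean(axis=0)
    q = reference - reference.mean(axis=0)
    # Covariance matrix and its singular value decomposition
    h = p.T @ q
    u, s, vt = np.linalg.svd(h)
    # Guard against reflections so the result is a proper rotation (det = +1)
    d = 1.0 if np.linalg.det(vt.T @ u.T) > 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    # Rotate the mobile coordinates onto the reference, then compute RMSD
    p_fit = p @ rotation.T
    return float(np.sqrt(np.mean(np.sum((p_fit - q) ** 2, axis=1))))

def plain_rmsd(a, b):
    """RMSD with no fitting at all."""
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))

rng = np.random.default_rng(0)
reference = rng.normal(size=(20, 3))

# Hypothetical "frame": the reference rotated 30 degrees about z and shifted
theta = np.deg2rad(30.0)
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta),  np.cos(theta), 0.0],
                  [0.0,            0.0,           1.0]])
frame = reference @ rot_z.T + np.array([5.0, -2.0, 1.0])

print(plain_rmsd(frame, reference))   # large: rigid-body motion counted as deviation
print(kabsch_rmsd(frame, reference))  # ~0: the structure itself did not change
```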

### Basic Pattern - Atom Selection

```python
import MDAnalysis as mda

u = mda.Universe('structure.pdb')

# Various selections (VMD-like syntax)
protein = u.select_atoms('protein')
backbone = u.select_atoms('backbone')
ca = u.select_atoms('name CA')
resid_10 = u.select_atoms('resid 10')
within_5A = u.select_atoms('around 5 resid 10')  # within 5 Å of resid 10, excluding resid 10 itself
water = u.select_atoms('resname WAT HOH SOL TIP3')  # common water residue names

print(f"Number of protein atoms: {len(protein)}")
print(f"Number of CA atoms: {len(ca)}")
```

## Critical Rules

### ✅ DO

- **Close files** - Use context managers for writers (`with mda.Writer(...)`) or call `close()` explicitly
- **Use atom selections efficiently** - Cache selections for reuse
- **Check trajectory length** - Verify `n_frames` before analysis
- **Use vectorized operations** - Leverage NumPy for speed
- **Superimpose before comparing structures** - `RMSD` fits each frame by default; align the trajectory (e.g. with `align.AlignTraj`) before `RMSF`
- **Handle periodic boundaries** - Use PBC-aware distance calculations
- **Check for empty selections** - Validate atom groups before using them
- **Use appropriate frames** - Slice trajectories if needed
- **Save intermediate results** - Don't recompute expensive calculations
- **Check units** - MDAnalysis uses Angstroms and picoseconds

### ❌ DON'T

- **Load entire trajectories into memory** - Stream through frames instead
- **Ignore PBC** - Always consider periodic boundary conditions
- **Forget to superimpose** - RMSD without fitting also counts overall translation and rotation
- **Use wrong atom names** - Check topology for correct names
- **Mix units** - Convert values from other tools (e.g. nm) to Å and ps before combining them
- **Ignore missing atoms** - Handle incomplete residues
- **Recompute unnecessarily** - Cache expensive calculations
- **Use string selections in loops** - Parse once, reuse
- **Forget to unwrap coordinates** - Handle molecules split by PBC
- **Ignore memory limits** - Process large trajectories in chunks
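The PBC and unwrapping items both come down to the minimum-image convention: in a periodic box, the distance between two atoms is measured to the nearest periodic copy. For an orthorhombic box this is a few lines of NumPy. The sketch below (made-up coordinates, rectangular boxes only) illustrates what the `box=` argument of the MDAnalysis distance functions handles for you, including triclinic cells.

```python
import numpy as np

def minimum_image_distance(x1, x2, box_lengths):
    """Distance between two points in an orthorhombic periodic box.

    box_lengths: (lx, ly, lz) in the same units as the coordinates.
    """
    delta = np.asarray(x2, dtype=float) - np.asarray(x1, dtype=float)
    box = np.asarray(box_lengths, dtype=float)
    # Shift each component by whole box lengths so it lies within [-L/2, L/2]
    delta -= box * np.round(delta / box)
    return float(np.linalg.norm(delta))

box_lengths = (50.0, 50.0, 50.0)      # hypothetical 50 Angstrom cubic box
a = np.array([1.0, 25.0, 25.0])
b = np.array([49.0, 25.0, 25.0])      # near the opposite face of the box

naive = float(np.linalg.norm(b - a))              # 48 Angstrom: ignores periodicity
mic = minimum_image_distance(a, b, box_lengths)   # 2 Angstrom through the boundary
print(naive, mic)
```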

## Anti-Patterns (NEVER)

```python
import MDAnalysis as mda
import numpy as np
from MDAnalysis.analysis import rms
from MDAnalysis.analysis.rms import RMSD
from MDAnalysis.lib.distances import distance_array

u = mda.Universe('top.pdb', 'traj.dcd')
ref = mda.Universe('top.pdb')          # reference structure
mobile = u.select_atoms('backbone')
reference = ref.select_atoms('backbone')

# ❌ BAD: Loading entire trajectory in memory
all_coords = []
for ts in u.trajectory:
    all_coords.append(u.atoms.positions.copy())
all_coords = np.array(all_coords)  # n_frames x n_atoms x 3: huge memory usage!

# ✅ GOOD: Process frame by frame
for ts in u.trajectory:
    coords = u.atoms.positions  # positions of the current frame only
    # Do analysis...
    # The next iteration loads the next frame

# ❌ BAD: RMSD without superposition
rmsd_values = []
for ts in u.trajectory:
    # rms.rmsd() defaults to superposition=False, so rigid-body motion is included
    rmsd_values.append(rms.rmsd(mobile.positions, reference.positions))

# ✅ GOOD: Let RMSD superimpose each frame before computing the deviation
R = RMSD(u, ref, select='backbone')
R.run()
rmsd_values = R.results.rmsd[:, 2]

# ❌ BAD: Creating selection in loop
for ts in u.trajectory:
    ca = u.select_atoms('name CA')  # Parsed every frame!
    # Do something with ca

# ✅ GOOD: Create selection once
ca = u.select_atoms('name CA')
for ts in u.trajectory:
    # Same atoms every frame; .positions always reflects the current frame
    # (geometric selections such as 'around' need updating=True to be re-evaluated)
    positions = ca.positions

# ❌ BAD: Ignoring periodic boundaries
atom1, atom2 = u.atoms[0], u.atoms[1]
distance = np.linalg.norm(atom1.position - atom2.position)

# ✅ GOOD: PBC-aware distance (minimum image, using the current box)
dist = distance_array(
    atom1.position[np.newaxis, :],
    atom2.position[np.newaxis, :],
    box=u.dimensions
)[0, 0]

# ❌ BAD: Not checking for empty selections
selection = u.select_atoms('resname XYZ')
# Continue without checking if selection is empty!
com = selection.center_of_mass()  # May raise an error for an empty group

# ✅ GOOD: Validate selections
selection = u.select_atoms('resname XYZ')
if len(selection) == 0:
    print("Warning: No atoms found matching selection")
else:
    com = selection.center_of_mass()
```

## Loading Trajectories (Universe)

### Basic Universe Creation

```python
import MDAnalysis as mda
import numpy as np

# Single structure file
u = mda.Universe('protein.pdb')

# Topology + trajectory
u = mda.Universe('topology.pdb', 'trajectory.dcd')

# Multiple trajectories (concatenated)
u = mda.Universe('top.pdb', 'traj1.dcd', 'traj2.dcd', 'traj3.dcd')

# Different formats
u = mda.Universe('system.gro', 'traj.xtc')  # GROMACS
u = mda.Universe('system.psf', 'traj.dcd')  # CHARMM/NAMD
u = mda.Universe('system.prmtop', 'traj.nc')  # AMBER

# From memory (numpy arrays): a topology-free Universe with a single frame
coords = np.random.rand(100, 3) * 30.0  # 100 atoms, xyz in Angstrom
u = mda.Universe.empty(100, trajectory=True)
u.atoms.positions = coords

print(f"Number of atoms: {len(u.atoms)}")
print(f"Number of frames: {len(u.trajectory)}")
print(f"Total time: {u.trajectory.totaltime} ps")
```

### Trajectory Information

```python
import MDAnalysis as mda

u = mda.Universe('topology.pdb', 'trajectory.dcd')

# Trajectory properties
traj = u.trajectory
print(f"Number of frames: {traj.n_frames}")
print(f"Time step: {traj.dt} ps")
print(f"Total time: {traj.totaltime} ps")

# Current frame info
print(f"Current frame: {traj.frame}")
print(f"Current time: {traj.time} ps")
print(f"Box dimensions: {u.dimensions}")  # [a, b, c, alpha, beta, gamma]

# Iterate through the first five frames
for ts in u.trajectory[:5]:
    print(f"Frame {ts.frame}: time {ts.time} ps")
```
