Analysis Workflows ================== This tutorial demonstrates comprehensive analysis pipelines for characterizing protein-protein or protein-ligand binding interfaces. Complete Interface Analysis --------------------------- Combine multiple analysis tools to fully characterize a binding interface: .. code-block:: python from pathlib import Path from molecular_simulations.analysis import ( Fingerprinter, PPInteractions, SASA, RelativeSASA, ) import MDAnalysis as mda # Paths top = Path("system.prmtop") traj = Path("prod.dcd") out = Path("analysis_results") out.mkdir(exist_ok=True) # Load universe for MDAnalysis-based analyses u = mda.Universe(str(top), str(traj)) Step 1: Interaction Energy Fingerprinting ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Identify residues with strong energetic contributions: .. code-block:: python fp = Fingerprinter( topology=str(top), trajectory=str(traj), target_selection="segid A", binder_selection="segid B", ) fp.run() fp.save(out / "fingerprint.npz") # Analyze results import numpy as np data = np.load(out / "fingerprint.npz") # Find residues with strongest interactions total_energy = data["electrostatic"] + data["vdw"] mean_energy = total_energy.mean(axis=0) # Average over frames hotspots = np.argsort(mean_energy)[:10] # Top 10 contributors print("Top interaction hotspots:") for i in hotspots: print(f" Residue {data['residues'][i]}: {mean_energy[i]:.2f} kcal/mol") Step 2: Contact Analysis ~~~~~~~~~~~~~~~~~~~~~~~~ Characterize specific interaction types: .. code-block:: python ppi = PPInteractions( top=str(top), traj=str(traj), out=str(out), sel1="segid A", sel2="segid B", hbond_cutoff=3.5, sb_cutoff=6.0, hydrophobic_cutoff=8.0, ) ppi.run() # Results saved as JSON with: # - Hydrogen bond frequencies # - Salt bridge occupancies # - Hydrophobic contact probabilities Step 3: Burial Analysis ~~~~~~~~~~~~~~~~~~~~~~~ Calculate interface burial upon complex formation: .. code-block:: python # SASA of the complex complex_sasa = SASA(u, selection="protein") complex_sasa.run() # For interface residues, compare to isolated chains chain_a = u.select_atoms("segid A") chain_b = u.select_atoms("segid B") # Buried surface area = SASA(A) + SASA(B) - SASA(complex) Step 4: Relative Solvent Accessibility ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Identify buried vs exposed residues: .. code-block:: python rsasa = RelativeSASA(u, selection="segid A") rsasa.run() # Residues with relative SASA < 0.25 are typically buried buried = rsasa.results < 0.25 Batch Processing Multiple Systems --------------------------------- Process multiple trajectories efficiently: .. code-block:: python from natsort import natsorted from tqdm import tqdm from concurrent.futures import ProcessPoolExecutor def analyze_replica(replica_dir): """Analyze a single replica.""" top = replica_dir / "system.prmtop" traj = replica_dir / "prod.dcd" out = replica_dir / "analysis" out.mkdir(exist_ok=True) fp = Fingerprinter( topology=str(top), trajectory=str(traj), target_selection="segid A", ) fp.run() fp.save(out / "fingerprint.npz") return replica_dir # Process replicas in parallel replica_dirs = natsorted(Path("./").glob("replica_*")) with ProcessPoolExecutor(max_workers=4) as executor: results = list(tqdm( executor.map(analyze_replica, replica_dirs), total=len(replica_dirs), desc="Analyzing replicas" )) Aggregating Results ------------------- Combine results from multiple replicas: .. code-block:: python import numpy as np from pathlib import Path # Collect fingerprints from all replicas fingerprints = [] for rep in Path("./").glob("replica_*/analysis/fingerprint.npz"): data = np.load(rep) fingerprints.append(data["electrostatic"] + data["vdw"]) # Stack and compute statistics all_fp = np.concatenate(fingerprints, axis=0) mean_fp = all_fp.mean(axis=0) std_fp = all_fp.std(axis=0) print(f"Analyzed {all_fp.shape[0]} total frames") print(f"Mean binding energy: {mean_fp.sum():.2f} ± {std_fp.sum():.2f} kcal/mol") Visualization ------------- Create publication-quality figures: .. code-block:: python import matplotlib.pyplot as plt import seaborn as sns fig, axes = plt.subplots(1, 2, figsize=(12, 5)) # Energy fingerprint heatmap ax = axes[0] sns.heatmap( all_fp.mean(axis=0).reshape(-1, 1), cmap="RdBu_r", center=0, ax=ax, cbar_kws={"label": "Interaction Energy (kcal/mol)"} ) ax.set_title("Per-Residue Interaction Energies") # Contact frequency bar plot ax = axes[1] # ... load and plot PPI results plt.tight_layout() plt.savefig("interface_analysis.png", dpi=300) Interface Scoring for AlphaFold Models -------------------------------------- For predicted structures with AlphaFold confidence metrics: .. code-block:: python from molecular_simulations.analysis import ipSAE scorer = ipSAE( structure_file="alphafold_complex.pdb", pae_file="predicted_aligned_error.json", plddt_file="plddt_scores.json", ) scores = scorer.run() print(f"ipTM: {scores['ipTM']:.3f}") print(f"ipSAE: {scores['ipSAE']:.3f}") print(f"pDockQ: {scores['pDockQ']:.3f}") print(f"pDockQ2: {scores['pDockQ2']:.3f}") # Interpretation: # pDockQ > 0.5: Likely correct interface # pDockQ 0.23-0.5: Possible interface # pDockQ < 0.23: Unlikely correct