molecular_simulations.analysis.ipSAE module
Interface prediction Score from Aligned Errors (ipSAE) module.
This module computes interaction prediction scores from pLDDT and PAE data, adapted from https://doi.org/10.1101/2025.02.10.637595. Supports outputs from structure prediction tools like Boltz and AlphaFold.
- class molecular_simulations.analysis.ipSAE.ipSAE(structure_file, plddt_file, pae_file, out_path=None)
Bases: object
Compute interaction prediction Score from Aligned Errors.
Computes various model quality scores including pDockQ, pDockQ2, LIS, ipTM, and ipSAE for structure predictions.
- parser
ModelParser instance for structure file.
- plddt_file
Path to pLDDT data file.
- pae_file
Path to PAE data file.
- path
Output directory path.
- scores
Polars DataFrame of computed scores after run().
- Parameters:
structure_file (Union[Path,str]) – Path to PDB/CIF model file.
plddt_file (Union[Path,str]) – Path to pLDDT numpy file (.npz with ‘plddt’ key).
pae_file (Union[Path,str]) – Path to PAE numpy file (.npz with ‘pae’ key).
out_path (Union[Path,str,None]) – Output directory path. If None, uses parent directory of plddt_file.
Example
>>> scorer = ipSAE('model.pdb', 'plddt.npz', 'pae.npz')
>>> scorer.run()
>>> print(scorer.scores)
Initialize the ipSAE scorer.
- parse_structure_file()
Parse the structure file and extract relevant details.
Runs the parser to read the structure file and classifies chains as protein or nucleic acid.
- Return type:
- prepare_scorer()
Initialize the ScoreCalculator for computing scores.
Creates a ScoreCalculator instance with chain information extracted from the parsed structure.
- Return type:
- run()
Execute the complete ipSAE scoring workflow.
Parses structure, computes distogram, loads pLDDT and PAE data, runs the scorer, and saves results.
- Return type:
- class molecular_simulations.analysis.ipSAE.ScoreCalculator(chains, chain_pair_type, n_residues, pdockq_cutoff=8.0, pae_cutoff=12.0, dist_cutoff=10.0)
Bases: object
Calculate model quality scores from structure predictions.
Computes pDockQ, pDockQ2, LIS, ipTM, and ipSAE scores for all chain pairs in a structure.
- chains
Array of chain IDs for each residue.
- unique_chains
Unique chain IDs in the structure.
- chain_pair_type
Dictionary mapping chain ID to type.
- n_res
Array of residue types.
- permuted
List of all chain pairs to evaluate.
- scores
DataFrame of computed scores after compute_scores().
- Parameters:
chains (ndarray) – Array of chain IDs.
chain_pair_type (dict[str,str]) – Dictionary mapping chain ID to chain type (‘protein’ or ‘nucleic’).
n_residues (int) – Number of residues per chain.
pdockq_cutoff (float) – Distance cutoff for pDockQ in Angstroms. Defaults to 8.0.
pae_cutoff (float) – PAE cutoff for ipSAE in Angstroms. Defaults to 12.0.
dist_cutoff (float) – General distance cutoff in Angstroms. Defaults to 10.0.
Example
>>> calc = ScoreCalculator(chains, chain_types, n_residues)
>>> calc.compute_scores(distances, plddt, pae)
>>> print(calc.scores)
- __init__(chains, chain_pair_type, n_residues, pdockq_cutoff=8.0, pae_cutoff=12.0, dist_cutoff=10.0)
Initialize the ScoreCalculator.
- compute_scores(distances, pLDDT, PAE)
Compute all scores for all chain pairs.
Calculates pDockQ, pDockQ2, LIS, ipTM, and ipSAE scores for each permutation of chain pairs.
- compute_pDockQ_scores(chain1, chain2)
Compute pDockQ and pDockQ2 scores for a chain pair.
pDockQ depends solely on pLDDT, while pDockQ2 depends on both pLDDT and PAE.
- compute_LIS(chain1, chain2)
Compute Local Interaction Score (LIS) for a chain pair.
LIS is computed from the subset of predicted aligned error (PAE) values selected with a 12 Å cutoff. Values lie in (0, 1], where 1 indicates perfect accuracy.
Adapted from: https://doi.org/10.1101/2024.02.19.580970
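A minimal sketch of the idea, assuming the commonly used LIS definition in which each inter-chain PAE value below the cutoff is rescaled as (cutoff - PAE) / cutoff and the rescaled values are averaged. The name `local_interaction_score`, the flat-list input, and the 0.0 fallback when no value passes the cutoff are illustrative assumptions, not this module's implementation.

```python
from statistics import fmean


def local_interaction_score(pae_values, cutoff=12.0):
    """Average rescaled PAE over values below `cutoff` (hypothetical helper)."""
    # Keep only confident residue pairs, i.e. PAE strictly below the cutoff.
    kept = [p for p in pae_values if p < cutoff]
    if not kept:
        # Assumption: report 0.0 when no residue pair passes the cutoff.
        return 0.0
    # Rescale so PAE = 0 maps to 1 and PAE just under the cutoff maps near 0.
    return fmean((cutoff - p) / cutoff for p in kept)


# The 20.0 entry is discarded because it exceeds the 12 Å cutoff.
print(local_interaction_score([0.0, 6.0, 20.0]))
```

Because only values strictly below the cutoff contribute, every retained term is positive, which is consistent with the (0, 1] range stated above.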
- compute_ipTM_ipSAE(chain1, chain2)
Compute ipTM and ipSAE scores for a chain pair.
These operations are combined as they rely on similar data processing.
- get_max_values()
Extract maximum scores for undirected chain pairs.
Because some scores like ipSAE are asymmetric (A->B != B->A), takes the maximum score for either direction as the undirected score.
- Return type:
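The reduction from directed to undirected scores can be sketched as follows. This is an illustration only: the function name `undirected_max` and the dictionary layout keyed by (chain1, chain2) tuples are assumptions, whereas the module itself stores scores in a DataFrame.

```python
def undirected_max(directed_scores):
    """Collapse {(chain1, chain2): score} into {(a, b): max score} with a < b."""
    undirected = {}
    for (c1, c2), score in directed_scores.items():
        # (A, B) and (B, A) map to the same sorted key.
        key = tuple(sorted((c1, c2)))
        undirected[key] = max(score, undirected.get(key, float("-inf")))
    return undirected


# Hypothetical directed scores for three chains.
directed = {("A", "B"): 0.62, ("B", "A"): 0.41, ("A", "C"): 0.10, ("C", "A"): 0.25}
print(undirected_max(directed))
```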
- permute_chains()
Generate all permutations of chain pairs.
Creates all unique ordered pairs of chains, excluding self-pairs.
- Return type:
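Ordered pairs without self-pairs are what `itertools.permutations` with length 2 produces. The sketch below assumes chain IDs arrive as a per-residue sequence; the name `ordered_chain_pairs` is hypothetical and the method's actual return container may differ.

```python
from itertools import permutations


def ordered_chain_pairs(chain_ids):
    """Return every ordered (chain1, chain2) pair with chain1 != chain2."""
    # Deduplicate per-residue chain IDs, sorting for a stable order.
    unique = sorted(set(chain_ids))
    return list(permutations(unique, 2))


# Three chains give 3 * 2 = 6 ordered pairs.
print(ordered_chain_pairs(["A", "A", "B", "C", "C"]))
```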
- static pDockQ_score(x)
Compute pDockQ score.
Formula: pDockQ = 0.724 / (1 + exp(-0.052 * (x - 152.611))) + 0.018
Reference: https://doi.org/10.1038/s41467-022-28865-w
- static pDockQ2_score(x)
Compute pDockQ2 score.
Formula: pDockQ2 = 1.31 / (1 + exp(-0.075 * (x - 84.733))) + 0.005
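Both pDockQ formulas are shifted logistic curves that differ only in their constants. A standalone sketch using the constants quoted above follows; the helper names `logistic_score`, `pdockq`, and `pdockq2` are illustrative, and the input x is whatever interface statistic the caller supplies.

```python
import math


def logistic_score(x, height, slope, midpoint, offset):
    """Shifted logistic: height / (1 + exp(-slope * (x - midpoint))) + offset."""
    return height / (1.0 + math.exp(-slope * (x - midpoint))) + offset


def pdockq(x):
    # Constants taken from the pDockQ formula in this section.
    return logistic_score(x, 0.724, 0.052, 152.611, 0.018)


def pdockq2(x):
    # Constants taken from the pDockQ2 formula in this section.
    return logistic_score(x, 1.31, 0.075, 84.733, 0.005)


# At the midpoint the exponential equals 1, so the score is height / 2 + offset.
print(pdockq(152.611), pdockq2(84.733))
```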
- static compute_pTM(x, d0)
Compute pTM score.
Formula: pTM = 1.0 / (1 + (x / d0)^2)
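The formula can be checked with a few scalar values. The sketch below is a plain-Python restatement with a hypothetical name, `ptm_term`; the module's own method may operate on arrays instead of scalars.

```python
def ptm_term(x, d0):
    """Evaluate 1 / (1 + (x / d0)^2) for a scalar error x and scale d0."""
    return 1.0 / (1.0 + (x / d0) ** 2)


# Zero error gives 1.0, an error equal to d0 gives 0.5,
# and larger errors decay toward 0.
for error in (0.0, 3.0, 6.0):
    print(error, ptm_term(error, d0=3.0))
```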
- class molecular_simulations.analysis.ipSAE.ModelParser(structure)
Bases: object
Parse structure files to extract residue and atom information.
Handles both PDB and CIF format files, extracting C-alpha, C-beta, and nucleic acid backbone atom coordinates.
- structure
Path to the structure file.
- token_mask
List of token indicators for each residue.
- residues
List of dictionaries containing residue information.
- cb_residues
List of C-beta residue dictionaries.
- chains
List of chain IDs for each residue.
- chain_types
Dictionary mapping chain ID to type after classify_chains().
Example
>>> parser = ModelParser('model.pdb')
>>> parser.parse_structure_file()
>>> parser.classify_chains()
Initialize the ModelParser.
- parse_structure_file()
Parse the structure file and extract atom/residue data.
Identifies file type and parses line by line, storing data for C-alpha, C-beta, C1’, and C3’ atoms.
- Return type:
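For the PDB case, line-by-line parsing can be sketched with the fixed-column layout of standard ATOM records. This is an illustration under assumptions: `parse_pdb_atom_line` is a hypothetical helper, the dictionary keys simply mirror the parameter names of package_line, and CIF parsing is not covered.

```python
# Atom names tracked according to the description above.
TRACKED_ATOMS = {"CA", "CB", "C1'", "C3'"}


def parse_pdb_atom_line(line):
    """Return a dict of string fields for a tracked atom, else None."""
    if not line.startswith(("ATOM", "HETATM")):
        return None
    atom_name = line[12:16].strip()
    if atom_name not in TRACKED_ATOMS:
        return None
    # Slices follow the fixed-width columns of the PDB ATOM record.
    return {
        "atom_num": line[6:11].strip(),
        "atom_name": atom_name,
        "residue_name": line[17:20].strip(),
        "chain_id": line[21].strip(),
        "residue_id": line[22:26].strip(),
        "x": line[30:38].strip(),
        "y": line[38:46].strip(),
        "z": line[46:54].strip(),
    }


# Sample record assembled field by field so the column widths stay explicit.
sample = (
    "ATOM  " "    1" " " " CA " " " "ALA" " " "A" "   1" "    "
    "  11.104" "   6.134" "  -6.504"
)
print(parse_pdb_atom_line(sample))
```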
- classify_chains()
Classify chains as protein or nucleic acid.
Reads through residue data to assign chain identity based on whether nucleic acid residues are detected.
- Return type:
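One way to picture the classification, as a sketch only: mark a chain as nucleic when any of its residues carries a nucleic acid name, and as protein otherwise. The residue-name set, the `classify_chains_sketch` name, and the input dictionaries with `chain_id` and `residue_name` keys are assumptions for illustration.

```python
# Assumed canonical names: RNA (A, C, G, U) and DNA (DA, DC, DG, DT).
NUCLEIC_RESIDUES = {"A", "C", "G", "U", "DA", "DC", "DG", "DT"}


def classify_chains_sketch(residues):
    """Map each chain ID to 'protein' or 'nucleic' (hypothetical helper)."""
    chain_types = {}
    for res in residues:
        chain = res["chain_id"]
        if res["residue_name"] in NUCLEIC_RESIDUES:
            # Assumption: a single nucleic residue marks the whole chain.
            chain_types[chain] = "nucleic"
        else:
            # Keep an earlier 'nucleic' label if one was already assigned.
            chain_types.setdefault(chain, "protein")
    return chain_types


residues = [
    {"chain_id": "A", "residue_name": "ALA"},
    {"chain_id": "A", "residue_name": "GLY"},
    {"chain_id": "B", "residue_name": "DA"},
    {"chain_id": "B", "residue_name": "DT"},
]
print(classify_chains_sketch(residues))
```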
- property nucleic_acids: list[str]
Get canonical nucleic acid residue names.
- Returns:
List of RNA and DNA residue names.
- static package_line(atom_num, atom_name, residue_name, chain_id, residue_id, x, y, z)
Package parsed line data into a dictionary.
- Parameters:
atom_num (str) – Atom index.
atom_name (str) – Atom name (e.g., ‘CA’, ‘CB’).
residue_name (str) – Residue name (e.g., ‘ALA’).
chain_id (str) – Chain identifier.
residue_id (str) – Residue sequence number.
x (str) – X coordinate as string.
y (str) – Y coordinate as string.
z (str) – Z coordinate as string.
- Return type:
- Returns:
Dictionary containing parsed atom/residue data.