molecular_simulations.analysis.ipSAE module

Interaction prediction Score from Aligned Errors (ipSAE) module.

This module computes interaction prediction scores from pLDDT and PAE data, adapted from https://doi.org/10.1101/2025.02.10.637595. It supports output from structure prediction tools such as Boltz and AlphaFold.

class molecular_simulations.analysis.ipSAE.ipSAE(structure_file, plddt_file, pae_file, out_path=None)[source]

Bases: object

Compute interaction prediction Score from Aligned Errors.

Computes various model quality scores including pDockQ, pDockQ2, LIS, ipTM, and ipSAE for structure predictions.

parser

ModelParser instance for structure file.

plddt_file

Path to pLDDT data file.

pae_file

Path to PAE data file.

path

Output directory path.

scores

Polars DataFrame of computed scores after run().

Parameters:
  • structure_file (Union[Path, str]) – Path to PDB/CIF model file.

  • plddt_file (Union[Path, str]) – Path to pLDDT numpy file (.npz with ‘plddt’ key).

  • pae_file (Union[Path, str]) – Path to PAE numpy file (.npz with ‘pae’ key).

  • out_path (Union[Path, str, None]) – Output directory path. If None, uses parent directory of plddt_file.

Example

>>> scorer = ipSAE('model.pdb', 'plddt.npz', 'pae.npz')
>>> scorer.run()
>>> print(scorer.scores)

__init__(structure_file, plddt_file, pae_file, out_path=None)[source]

Initialize the ipSAE scorer.

Parameters:
  • structure_file (Union[Path, str]) – Path to structure file.

  • plddt_file (Union[Path, str]) – Path to pLDDT data file.

  • pae_file (Union[Path, str]) – Path to PAE data file.

  • out_path (Union[Path, str, None]) – Output directory path.

parse_structure_file()[source]

Parse the structure file and extract relevant details.

Runs the parser to read the structure file and classifies chains as protein or nucleic acid.

Return type:

None

prepare_scorer()[source]

Initialize the ScoreCalculator for computing scores.

Creates a ScoreCalculator instance with chain information extracted from the parsed structure.

Return type:

None

run()[source]

Execute the complete ipSAE scoring workflow.

Parses the structure, computes the distogram, loads the pLDDT and PAE data, runs the scorer, and saves the results.

Return type:

None
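The distogram step of this workflow amounts to a pairwise distance matrix over one representative atom per residue. A minimal sketch, assuming one (x, y, z) coordinate per residue; the helper name `pairwise_distances` is hypothetical and is not part of this module:

```python
import math


def pairwise_distances(coords):
    """Return an N x N matrix of Euclidean distances between coordinates."""
    return [[math.dist(a, b) for b in coords] for a in coords]


# Three made-up residue positions, in Angstroms.
coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 8.0)]
distances = pairwise_distances(coords)
print(distances[0][1])  # 5.0
```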

save_scores()[source]

Save scores DataFrame to a Parquet file.

Return type:

None

load_pLDDT_file()[source]

Load and scale pLDDT data.

Return type:

ndarray

Returns:

pLDDT array scaled to 0-100 range.
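Prediction tools differ in whether they report pLDDT on a 0-1 or a 0-100 scale. The sketch below shows one plausible rescaling rule; both the heuristic (rescale when the maximum is at most 1) and the name `scale_plddt` are assumptions for illustration, not the confirmed logic of this method:

```python
def scale_plddt(values):
    """Return pLDDT values on a 0-100 scale.

    Assumption: input whose maximum is <= 1 is taken to be on a 0-1 scale.
    """
    values = list(values)
    if values and max(values) <= 1.0:
        return [v * 100.0 for v in values]
    return values


print(scale_plddt([0.5, 0.25]))   # [50.0, 25.0]
print(scale_plddt([72.0, 91.5]))  # [72.0, 91.5]
```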

load_PAE_file()[source]

Load PAE data from file.

Return type:

ndarray

Returns:

PAE array from the ‘pae’ key in the npz file.

class molecular_simulations.analysis.ipSAE.ScoreCalculator(chains, chain_pair_type, n_residues, pdockq_cutoff=8.0, pae_cutoff=12.0, dist_cutoff=10.0)[source]

Bases: object

Calculate model quality scores from structure predictions.

Computes pDockQ, pDockQ2, LIS, ipTM, and ipSAE scores for all chain pairs in a structure.

chains

Array of chain IDs for each residue.

unique_chains

Unique chain IDs in the structure.

chain_pair_type

Dictionary mapping chain ID to type.

n_res

Array of residue types.

permuted

List of all chain pairs to evaluate.

scores

DataFrame of computed scores after compute_scores().

Parameters:
  • chains (ndarray) – Array of chain IDs.

  • chain_pair_type (dict[str, str]) – Dictionary mapping chain ID to chain type (‘protein’ or ‘nucleic’).

  • n_residues (int) – Number of residues per chain.

  • pdockq_cutoff (float) – Distance cutoff for pDockQ in Angstroms. Defaults to 8.0.

  • pae_cutoff (float) – PAE cutoff for ipSAE in Angstroms. Defaults to 12.0.

  • dist_cutoff (float) – General distance cutoff in Angstroms. Defaults to 10.0.

Example

>>> calc = ScoreCalculator(chains, chain_types, n_residues)
>>> calc.compute_scores(distances, plddt, pae)
>>> print(calc.scores)

__init__(chains, chain_pair_type, n_residues, pdockq_cutoff=8.0, pae_cutoff=12.0, dist_cutoff=10.0)[source]

Initialize the ScoreCalculator.

Parameters:
  • chains (ndarray) – Array of chain IDs.

  • chain_pair_type (dict[str, str]) – Chain ID to type mapping.

  • n_residues (int) – Number of residues.

  • pdockq_cutoff (float) – pDockQ distance cutoff.

  • pae_cutoff (float) – PAE cutoff.

  • dist_cutoff (float) – General distance cutoff.

compute_scores(distances, pLDDT, PAE)[source]

Compute all scores for all chain pairs.

Calculates pDockQ, pDockQ2, LIS, ipTM, and ipSAE scores for each permutation of chain pairs.

Parameters:
  • distances (ndarray) – Pairwise distance matrix between all residues.

  • pLDDT (ndarray) – Per-residue pLDDT values (0-100 scale).

  • PAE (ndarray) – Predicted aligned error matrix.

Return type:

None

compute_pDockQ_scores(chain1, chain2)[source]

Compute pDockQ and pDockQ2 scores for a chain pair.

pDockQ uses pLDDT as its only confidence metric, while pDockQ2 combines pLDDT with PAE.

Parameters:
  • chain1 (str) – First chain identifier.

  • chain2 (str) – Second chain identifier.

Return type:

tuple[float, float]

Returns:

Tuple of (pDockQ, pDockQ2) scores.

compute_LIS(chain1, chain2)[source]

Compute Local Interaction Score (LIS) for a chain pair.

LIS is computed from the subset of inter-chain predicted aligned error values that fall under a cutoff of 12 Å. Values lie in the interval (0, 1], where 1 indicates perfect accuracy.

Adapted from: https://doi.org/10.1101/2024.02.19.580970

Parameters:
  • chain1 (str) – First chain identifier.

  • chain2 (str) – Second chain identifier.

Return type:

float

Returns:

LIS value for the chain pair.
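As a rough illustration of the description above, the sketch below keeps the inter-chain PAE values under the 12 Å cutoff, rescales each to (12 - PAE) / 12, and averages them. The function name, the strict `<` comparison, and the 0.0 returned when no value qualifies are assumptions, not the module's confirmed behavior:

```python
def local_interaction_score(pae, chains, chain1, chain2, cutoff=12.0):
    """Mean of (cutoff - PAE) / cutoff over chain1 -> chain2 entries below cutoff."""
    rows = [i for i, c in enumerate(chains) if c == chain1]
    cols = [j for j, c in enumerate(chains) if c == chain2]
    kept = [pae[i][j] for i in rows for j in cols if pae[i][j] < cutoff]
    if not kept:
        return 0.0  # assumption: no confident inter-chain pairs
    return sum((cutoff - v) / cutoff for v in kept) / len(kept)


# Made-up 4-residue example: two residues in chain A, two in chain B.
chains = ["A", "A", "B", "B"]
pae = [
    [0.0, 1.0, 0.0, 6.0],
    [1.0, 0.0, 15.0, 30.0],
    [9.0, 20.0, 0.0, 1.0],
    [20.0, 20.0, 1.0, 0.0],
]
print(local_interaction_score(pae, chains, "A", "B"))  # 0.75
print(local_interaction_score(pae, chains, "B", "A"))  # 0.25
```

The two directions differ because the PAE matrix is not symmetric, which is why scores are computed per ordered chain pair.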

compute_ipTM_ipSAE(chain1, chain2)[source]

Compute ipTM and ipSAE scores for a chain pair.

These operations are combined as they rely on similar data processing.

Parameters:
  • chain1 (str) – First chain identifier.

  • chain2 (str) – Second chain identifier.

Return type:

tuple[float, float]

Returns:

Tuple of (ipTM, ipSAE) scores.
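The shared processing can be sketched as follows: for each residue in chain1, average the pTM term over residues in chain2, then take the best residue. In this sketch the ipTM-like score uses every inter-chain pair with d0 set by the combined chain length, while the ipSAE-like score uses only pairs below the PAE cutoff with d0 set by the number of such pairs. This is an interpretation of the cited preprint, not the module's code; the function names and the d0 floor of 1.0 are assumptions:

```python
def d0(length, min_value=1.0):
    """d0 from the documented formula, with length clamped to at least 27."""
    length = max(float(length), 27.0)
    return max(min_value, 1.24 * (length - 15.0) ** (1.0 / 3.0) - 1.8)


def ptm_term(x, d0_value):
    """Single pTM term: 1 / (1 + (x / d0)^2)."""
    return 1.0 / (1.0 + (x / d0_value) ** 2)


def iptm_ipsae(pae, chains, chain1, chain2, pae_cutoff=12.0):
    """Return (ipTM-like, ipSAE-like) scores for the ordered pair chain1 -> chain2."""
    rows = [i for i, c in enumerate(chains) if c == chain1]
    cols = [j for j, c in enumerate(chains) if c == chain2]
    if not rows or not cols:
        return 0.0, 0.0
    d0_chain = d0(len(rows) + len(cols))
    iptm = ipsae = 0.0
    for i in rows:
        # ipTM-like: all inter-chain pairs for residue i.
        terms = [ptm_term(pae[i][j], d0_chain) for j in cols]
        iptm = max(iptm, sum(terms) / len(terms))
        # ipSAE-like: only pairs under the PAE cutoff, with a per-residue d0.
        valid = [pae[i][j] for j in cols if pae[i][j] < pae_cutoff]
        if valid:
            d0_res = d0(len(valid))
            score = sum(ptm_term(v, d0_res) for v in valid) / len(valid)
            ipsae = max(ipsae, score)
    return iptm, ipsae


chains = ["A", "A", "B", "B"]
confident = [[0.0] * 4 for _ in range(4)]
print(iptm_ipsae(confident, chains, "A", "B"))  # (1.0, 1.0)
```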

get_max_values()[source]

Extract maximum scores for undirected chain pairs.

Because some scores, such as ipSAE, are asymmetric (A->B != B->A), this method takes the larger of the two directional scores as the undirected score for each chain pair.

Return type:

None
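The reduction from directed to undirected scores can be illustrated with a plain dictionary. The module stores scores in a Polars DataFrame; the dictionary layout and the name `undirected_max` here are hypothetical stand-ins chosen to keep the sketch dependency-free:

```python
def undirected_max(directed_scores):
    """Collapse {(chain1, chain2): score} to the maximum per unordered pair."""
    best = {}
    for (a, b), score in directed_scores.items():
        key = tuple(sorted((a, b)))
        best[key] = max(score, best.get(key, float("-inf")))
    return best


# Made-up directional scores.
directed = {
    ("A", "B"): 0.62,
    ("B", "A"): 0.48,
    ("A", "C"): 0.10,
    ("C", "A"): 0.31,
}
print(undirected_max(directed))  # {('A', 'B'): 0.62, ('A', 'C'): 0.31}
```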

permute_chains()[source]

Generate all permutations of chain pairs.

Creates all unique ordered pairs of chains, excluding self-pairs.

Return type:

None
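Ordered pairs without self-pairs are what `itertools.permutations` yields for a length of 2. A small sketch, assuming three chain IDs; whether the module uses `itertools` internally is not stated in this reference:

```python
from itertools import permutations

unique_chains = ["A", "B", "C"]
permuted = list(permutations(unique_chains, 2))
print(permuted)
# [('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'C'), ('C', 'A'), ('C', 'B')]
```

For n chains this gives n * (n - 1) ordered pairs, so both A->B and B->A are scored.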

static pDockQ_score(x)[source]

Compute pDockQ score.

Formula: pDockQ = 0.724 / (1 + exp(-0.052 * (x - 152.611))) + 0.018

Reference: https://doi.org/10.1038/s41467-022-28865-w

Parameters:

x (float) – Mean pLDDT scaled by log10 of the number of residue pairs meeting pLDDT and distance cutoffs.

Return type:

float

Returns:

pDockQ score.
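The formula above translates directly into code. In this sketch the function name `pdockq` is illustrative; at the sigmoid midpoint x = 152.611 the score is 0.724 / 2 + 0.018 = 0.38:

```python
import math


def pdockq(x):
    """pDockQ sigmoid using the constants from the formula above."""
    return 0.724 / (1.0 + math.exp(-0.052 * (x - 152.611))) + 0.018


print(round(pdockq(152.611), 3))  # 0.38
```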

static pDockQ2_score(x)[source]

Compute pDockQ2 score.

Formula: pDockQ2 = 1.31 / (1 + exp(-0.075 * (x - 84.733))) + 0.005

Reference: https://doi.org/10.1093/bioinformatics/btad424

Parameters:

x (float) – Mean pLDDT scaled by mean PAE score.

Return type:

float

Returns:

pDockQ2 score.
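A direct transcription of the formula above, with the illustrative name `pdockq2`. At the midpoint x = 84.733 the score is 1.31 / 2 + 0.005 = 0.66:

```python
import math


def pdockq2(x):
    """pDockQ2 sigmoid using the constants from the formula above."""
    return 1.31 / (1.0 + math.exp(-0.075 * (x - 84.733))) + 0.005


print(round(pdockq2(84.733), 3))  # 0.66
```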

static compute_pTM(x, d0)

Compute pTM score.

Formula: pTM = 1.0 / (1 + (x / d0)^2)

Parameters:
  • x (float) – pLDDT or PAE value.

  • d0 (float) – Distance parameter from compute_d0.

Return type:

float

Returns:

pTM score.
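The pTM term is 1 when x is 0 and falls to 0.5 when x equals d0. A minimal sketch of the formula above, using the illustrative name `ptm_term`:

```python
def ptm_term(x, d0):
    """pTM term: 1 / (1 + (x / d0)^2)."""
    return 1.0 / (1.0 + (x / d0) ** 2)


print(ptm_term(0.0, 3.0))  # 1.0
print(ptm_term(3.0, 3.0))  # 0.5
```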

static compute_d0(L, pair_type)[source]

Compute d0 parameter for pTM calculation.

Formula: d0 = max(min_value, 1.24 * (L - 15)^(1/3) - 1.8)

Parameters:
  • L (int) – Sequence length (minimum 27).

  • pair_type (str) – ‘protein’ or ‘nucleic_acid’.

Return type:

float

Returns:

d0 parameter value.
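The sketch below applies the formula above, clamping L to the stated minimum of 27. The values used for min_value (1.0 for protein pairs, 2.0 for nucleic acid pairs) are assumptions that are not given in this reference, and the function name is illustrative:

```python
def compute_d0_sketch(L, pair_type):
    """d0 = max(min_value, 1.24 * (L - 15)^(1/3) - 1.8), with L clamped to >= 27."""
    L = max(float(L), 27.0)
    # Assumed floors; not stated in this reference.
    min_value = 2.0 if pair_type == "nucleic_acid" else 1.0
    return max(min_value, 1.24 * (L - 15.0) ** (1.0 / 3.0) - 1.8)


print(round(compute_d0_sketch(27, "protein"), 3))       # 1.039
print(compute_d0_sketch(27, "nucleic_acid"))            # 2.0
print(compute_d0_sketch(10, "protein") == compute_d0_sketch(27, "protein"))  # True
```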

class molecular_simulations.analysis.ipSAE.ModelParser(structure)[source]

Bases: object

Parse structure files to extract residue and atom information.

Handles both PDB and CIF format files, extracting C-alpha, C-beta, and nucleic acid backbone atom coordinates.

structure

Path to the structure file.

token_mask

List of token indicators for each residue.

residues

List of dictionaries containing residue information.

cb_residues

List of C-beta residue dictionaries.

chains

List of chain IDs for each residue.

chain_types

Dictionary mapping chain ID to type after classify_chains().

Parameters:

structure (Union[Path, str]) – Path to PDB or CIF file.

Example

>>> parser = ModelParser('model.pdb')
>>> parser.parse_structure_file()
>>> parser.classify_chains()

__init__(structure)[source]

Initialize the ModelParser.

Parameters:

structure (Union[Path, str]) – Path to PDB or CIF file.

parse_structure_file()[source]

Parse the structure file and extract atom/residue data.

Identifies file type and parses line by line, storing data for C-alpha, C-beta, C1’, and C3’ atoms.

Return type:

None

classify_chains()[source]

Classify chains as protein or nucleic acid.

Reads through residue data to assign chain identity based on whether nucleic acid residues are detected.

Return type:

None
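One way to implement this classification is to mark a chain as nucleic acid as soon as any of its residues has a nucleic acid name. In the sketch below, the residue-name set, the dictionary keys (`chain_id`, `residue_name`), and the type labels are all assumptions chosen for illustration:

```python
# Assumed canonical names: RNA (A, C, G, U) and DNA (DA, DC, DG, DT).
NUCLEIC_ACIDS = {"A", "C", "G", "U", "DA", "DC", "DG", "DT"}


def classify_chains_sketch(residues):
    """Map each chain ID to 'nucleic_acid' if any residue is nucleic, else 'protein'."""
    chain_types = {}
    for res in residues:
        chain = res["chain_id"]
        is_nucleic = res["residue_name"] in NUCLEIC_ACIDS
        # A nucleic residue always wins; a protein residue only sets the default.
        if is_nucleic or chain not in chain_types:
            chain_types[chain] = "nucleic_acid" if is_nucleic else "protein"
    return chain_types


residues = [
    {"chain_id": "A", "residue_name": "ALA"},
    {"chain_id": "A", "residue_name": "GLY"},
    {"chain_id": "B", "residue_name": "DA"},
    {"chain_id": "B", "residue_name": "DT"},
]
print(classify_chains_sketch(residues))  # {'A': 'protein', 'B': 'nucleic_acid'}
```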

property nucleic_acids: list[str]

Get canonical nucleic acid residue names.

Returns:

List of RNA and DNA residue names.

static parse_pdb_line(line, *args)[source]

Parse a single line of a PDB file.

Parameters:
  • line (str) – Line from the PDB file.

  • *args – Unused, for API compatibility with parse_cif_line.

Return type:

dict[str, Any]

Returns:

Dictionary with atom/residue information.
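PDB ATOM records are fixed-width, so parsing reduces to column slicing. The sketch below uses the standard PDB column ranges; the function name, the output keys, and the conversion to int/float are illustrative choices and may differ from what this method returns:

```python
def parse_pdb_atom_line(line):
    """Slice the fixed-width fields of a PDB ATOM/HETATM record."""
    return {
        "atom_num": int(line[6:11]),          # columns 7-11
        "atom_name": line[12:16].strip(),     # columns 13-16
        "residue_name": line[17:20].strip(),  # columns 18-20
        "chain_id": line[21],                 # column 22
        "residue_id": int(line[22:26]),       # columns 23-26
        "coords": (
            float(line[30:38]),               # columns 31-38
            float(line[38:46]),               # columns 39-46
            float(line[46:54]),               # columns 47-54
        ),
    }


# Build a well-formed record programmatically to keep the column widths exact.
line = "ATOM  {:>5d} {:<4s} {:>3s} {:1s}{:>4d}    {:8.3f}{:8.3f}{:8.3f}".format(
    2, " CA", "ALA", "A", 1, 11.104, 6.134, -6.504
)
parsed = parse_pdb_atom_line(line)
print(parsed["atom_name"], parsed["chain_id"], parsed["coords"])
```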

static parse_cif_line(line, fields)[source]

Parse a single line of a CIF file.

Parameters:
  • line (str) – Line from the CIF file.

  • fields (dict[str, int]) – Dictionary mapping field names to column indices.

Return type:

dict[str, Any]

Returns:

Dictionary with atom/residue information, or None if residue_id is missing.
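mmCIF atom records are whitespace-delimited, with column positions given by the `_atom_site` loop header. A simplified sketch, assuming `fields` is keyed by standard `_atom_site` item names and that a residue ID of `.` or `?` counts as missing; the function name and output keys are hypothetical, and plain `split()` does not handle quoted values that contain spaces:

```python
def parse_cif_atom_line(line, fields):
    """Parse one _atom_site row; return None when the residue ID is missing."""
    tokens = line.split()
    residue_id = tokens[fields["label_seq_id"]]
    if residue_id in (".", "?"):
        return None
    return {
        "atom_num": int(tokens[fields["id"]]),
        "atom_name": tokens[fields["label_atom_id"]],
        "residue_name": tokens[fields["label_comp_id"]],
        "chain_id": tokens[fields["label_asym_id"]],
        "residue_id": int(residue_id),
        "coords": tuple(
            float(tokens[fields[k]]) for k in ("Cartn_x", "Cartn_y", "Cartn_z")
        ),
    }


# Column indices as they would be read from the loop_ header (made-up example).
fields = {
    "id": 1, "label_atom_id": 3, "label_comp_id": 5, "label_asym_id": 6,
    "label_seq_id": 8, "Cartn_x": 10, "Cartn_y": 11, "Cartn_z": 12,
}
atom = parse_cif_atom_line(
    "ATOM 2 C CA . ALA A 1 1 ? 11.104 6.134 -6.504 1.00 0.00", fields
)
water = parse_cif_atom_line(
    "HETATM 9 O O . HOH B 2 . ? 1.000 2.000 3.000 1.00 0.00", fields
)
print(atom["atom_name"], atom["residue_id"], water)  # CA 1 None
```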

static package_line(atom_num, atom_name, residue_name, chain_id, residue_id, x, y, z)[source]

Package parsed line data into a dictionary.

Parameters:
  • atom_num (str) – Atom index.

  • atom_name (str) – Atom name (e.g., ‘CA’, ‘CB’).

  • residue_name (str) – Residue name (e.g., ‘ALA’).

  • chain_id (str) – Chain identifier.

  • residue_id (str) – Residue sequence number.

  • x (str) – X coordinate as string.

  • y (str) – Y coordinate as string.

  • z (str) – Z coordinate as string.

Return type:

dict[str, Any]

Returns:

Dictionary containing parsed atom/residue data.