molecular_simulations.analysis.utils module

Analysis utility functions and classes.

This module provides utility classes for embedding analysis data into PDB files, particularly for visualization of per-residue properties.

class molecular_simulations.analysis.utils.EmbedData(pdb, embedding_dict, out=None)

Bases: object

Embed data into the beta-factor column of a PDB file.

Writes to the input PDB path (backing up the original first) unless an output path is explicitly provided. Embedding data should be provided as a dictionary whose keys are MDAnalysis selection strings and whose values are NumPy arrays.

pdb

Path to the PDB file.

embeddings

Dictionary of selections and data to embed.

out

Output path for the modified PDB.

u

MDAnalysis Universe object.

Parameters:
  • pdb (Path) – Path to PDB file to load. Also serves as output if one is not provided.

  • embedding_dict (dict[str, ndarray]) – Dictionary with MDAnalysis selections as keys and data arrays as values. Arrays should have shape (n_frames, n_residues, n_datapoints) or (n_residues, n_datapoints).

  • out (Path | str | None) – Output path. If None, uses the input PDB path.

Example

>>> import numpy as np
>>> data = {'protein': np.random.rand(100)}  # 100 residues
>>> embedder = EmbedData('structure.pdb', data)
>>> embedder.embed()
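The class reads and writes the file through MDAnalysis (the u attribute). As a rough picture of what "embedding into the beta-factor column" means at the file level, the stdlib-only sketch below writes per-residue values into columns 61-66 of ATOM/HETATM records. The function name embed_beta_column and the (chain ID, residue number) keying are hypothetical and are not part of this module's API.

```python
from typing import Dict, List, Tuple


def embed_beta_column(
    pdb_lines: List[str], values: Dict[Tuple[str, int], float]
) -> List[str]:
    """Write per-residue values into the B-factor field of ATOM/HETATM records.

    values maps (chain ID, residue number) -> value. Atoms whose residue is not
    in the mapping, and all other record types, are passed through unchanged.
    """
    out = []
    for line in pdb_lines:
        if line.startswith(("ATOM  ", "HETATM")):
            # Fixed PDB columns: chain ID is column 22, resSeq is columns 23-26.
            key = (line[21], int(line[22:26]))
            if key in values:
                field = f"{values[key]:6.2f}"
                if len(field) > 6:
                    raise ValueError(
                        f"{values[key]} does not fit the 6-character B-factor field"
                    )
                newline = "\n" if line.endswith("\n") else ""
                body = line.rstrip("\n").ljust(66)
                # The B-factor (tempFactor) occupies columns 61-66, i.e. body[60:66].
                line = body[:60] + field + body[66:] + newline
        out.append(line)
    return out
```

Because the field is only six characters wide, values outside roughly -99.99 to 999.99 cannot be written at two-decimal precision, which is one practical reason to rescale data before embedding it.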


__init__(pdb, embedding_dict, out=None)

Initialize the EmbedData instance.

Parameters:
  • pdb (Path) – Path to input PDB file.

  • embedding_dict (dict[str, ndarray]) – Selection to data mapping.

  • out (Path | str | None) – Optional output path.

embed()

Embed all data and write the modified PDB.

Unpacks the embedding dictionary, embeds data into each selection, and writes out the new PDB file.
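The unpack, embed, write sequence can be pictured with the pure-Python sketch below. It is a stand-in, not this module's code: a dict of residue number to B-factor replaces the MDAnalysis Universe, and a hypothetical resolve callable replaces MDAnalysis atom selection. It assumes selections are applied in dictionary order (so where two selections overlap, the later one wins) and that the file is written once, after every selection has been embedded.

```python
from typing import Callable, Dict, List, Sequence


def embed_all(
    beta: Dict[int, float],
    embeddings: Dict[str, Sequence[float]],
    resolve: Callable[[str], List[int]],
) -> Dict[int, float]:
    """Apply every (selection, data) pair in insertion order, then 'write' once.

    beta: residue number -> current B-factor (stand-in for the loaded PDB).
    embeddings: selection string -> one value per selected residue.
    resolve: hypothetical helper mapping a selection string to an ordered
        list of residue numbers (stand-in for an MDAnalysis selection).
    """
    updated = dict(beta)  # leave the caller's dict untouched
    for selection, data in embeddings.items():
        resids = resolve(selection)
        if len(resids) != len(data):
            raise ValueError(
                f"{selection!r}: {len(data)} values for {len(resids)} residues"
            )
        for resid, value in zip(resids, data):
            updated[resid] = float(value)  # a later, overlapping selection overwrites
    # Single 'write' step, performed only after every selection is embedded.
    return updated
```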

Return type:

None

embed_selection(selection, data)

Embed data into a specific selection’s beta-factor column.

Parameters:
  • selection (str) – MDAnalysis selection string.

  • data (ndarray) – Array of data to embed. Shape should be (n_residues_in_selection,) or compatible.

Return type:

None
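embed_selection takes one value per residue, while a PDB file stores a B-factor on every ATOM record, so each residue's value has to be repeated for all of its atoms. The NumPy sketch below shows one way to do that expansion. per_atom_values is a hypothetical helper, and the residue ordering it expects (ascending residue index) is an assumption of the sketch, not documented behavior of this method.

```python
import numpy as np


def per_atom_values(atom_resindices, residue_values):
    """Expand one value per residue to one value per atom.

    atom_resindices: residue index of each atom in the selection, in atom
        order (analogous to MDAnalysis ``AtomGroup.resindices``).
    residue_values: 1-D array with one entry per unique residue, ordered by
        ascending residue index (assumption of this sketch).
    """
    atom_resindices = np.asarray(atom_resindices)
    residue_values = np.asarray(residue_values, dtype=float)
    if residue_values.ndim != 1:
        raise ValueError("expected data of shape (n_residues_in_selection,)")
    # inverse[i] is the position of atom i's residue among the sorted unique residues.
    unique, inverse = np.unique(atom_resindices, return_inverse=True)
    if len(unique) != len(residue_values):
        raise ValueError(
            f"{len(residue_values)} values for {len(unique)} residues in selection"
        )
    return residue_values[inverse.ravel()]
```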

write_new_pdb()

Write the modified PDB file.

If the output path already exists and equals the input path, the original PDB is first backed up with a ‘.orig.pdb’ extension. The backup is skipped if one already exists, so repeated runs never overwrite the true original.

Return type:

None
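The backup rule for write_new_pdb can be sketched with pathlib and shutil. backup_if_overwriting is a hypothetical helper, and deriving the backup name with Path.with_suffix('.orig.pdb') (so structure.pdb becomes structure.orig.pdb) is an assumption about the exact naming.

```python
import shutil
from pathlib import Path


def backup_if_overwriting(pdb, out):
    """Copy pdb to <stem>.orig.pdb before it is overwritten in place.

    Returns the backup path when writing in place, or None when the output
    goes elsewhere or does not exist yet.
    """
    pdb, out = Path(pdb), Path(out)
    if not (out.exists() and out.resolve() == pdb.resolve()):
        return None  # writing elsewhere, or nothing to overwrite yet
    backup = pdb.with_suffix(".orig.pdb")
    if not backup.exists():
        # Only the first in-place run makes a backup, so the true original survives reruns.
        shutil.copy2(pdb, backup)
    return backup
```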

class molecular_simulations.analysis.utils.EmbedEnergyData(pdb, embedding_dict, out=None)

Bases: EmbedData

Embed energy data into the PDB beta-factor column.

Special case of EmbedData for non-bonded energy data containing both Lennard-Jones (LJ) and Coulombic terms. Sums the energy terms and rescales the result to remove negative values, which many visualization tools do not support in beta factors.

Parameters:
  • pdb (Path) – Path to PDB file to load.

  • embedding_dict (dict[str, ndarray]) – Dictionary with MDAnalysis selections as keys and energy data arrays as values.

  • out (Path | str | None) – Output path. If None, uses the input PDB path.

Example

>>> import numpy as np
>>> energy_array = np.random.randn(10, 100, 2)  # shape (n_frames, n_res, 2)
>>> energies = {'chainID A': energy_array}
>>> embedder = EmbedEnergyData('structure.pdb', energies)
>>> embedder.embed()


__init__(pdb, embedding_dict, out=None)

Initialize the EmbedEnergyData instance.

Parameters:
  • pdb (Path) – Path to input PDB file.

  • embedding_dict (dict[str, ndarray]) – Selection to energy data mapping.

  • out (Path | str | None) – Optional output path.

preprocess()

Process the embedding data for writing into the PDB.

Reduces multi-dimensional energy data to 1D per-residue values and rescales to ensure non-negative values while preserving relative differences.

Return type:

dict[str, ndarray]

Returns:

Processed data dictionary ready for embedding.
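One rescaling that satisfies "non-negative while preserving relative differences" is a constant shift: subtracting the same offset from every value leaves all pairwise differences intact. The sketch below applies a single shared offset across selections so values stay comparable between them. The function name and the choice of a shared shift (rather than per-selection shifts or min-max scaling) are assumptions, not a description of what preprocess actually does.

```python
import numpy as np


def shift_non_negative(per_residue):
    """Shift every selection's 1-D values by one shared offset so the minimum is 0.

    per_residue: selection string -> 1-D per-residue values (already reduced).
    Data that is already non-negative is returned unshifted.
    """
    arrays = {sel: np.asarray(v, dtype=float) for sel, v in per_residue.items()}
    global_min = min(float(a.min()) for a in arrays.values())
    offset = min(global_min, 0.0)  # only shift when something is negative
    # Subtracting a constant keeps every pairwise difference unchanged.
    return {sel: a - offset for sel, a in arrays.items()}
```

A shared offset keeps selections on one scale, but the shifted range must still fit the six-character beta-factor field.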

static sanitize_data(data)

Reduce data to one-dimensional per-residue values.

Takes data of shape (n_frames, n_residues, n_terms) and returns an array of shape (n_residues,) by averaging over frames and summing the energy terms.

Parameters:

data (ndarray) – Input data array with multiple dimensions.

Return type:

ndarray

Returns:

One-dimensional processed data array.
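The reduction described for sanitize_data maps onto two NumPy reductions. In the sketch below, the function name is hypothetical, and accepting a 2-D (n_residues, n_terms) array as a single frame is an assumption taken from the shapes listed for embedding_dict, not documented behavior of sanitize_data.

```python
import numpy as np


def sanitize_data_sketch(data):
    """Reduce (n_frames, n_residues, n_terms) energy data to (n_residues,)."""
    data = np.asarray(data, dtype=float)
    if data.ndim == 2:
        # Assumption: treat (n_residues, n_terms) as a single frame.
        data = data[np.newaxis]
    if data.ndim != 3:
        raise ValueError(f"expected a 2-D or 3-D array, got shape {data.shape}")
    # Mean over frames (axis 0), then sum the LJ and Coulomb terms (last axis).
    # Both steps are linear, so their order only affects floating-point rounding.
    return data.mean(axis=0).sum(axis=-1)
```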