molecular_simulations.analysis.constant_pH_analysis module

Improved constant pH analysis with UWHAM reweighting.

This implementation adds multistate analysis capabilities to the basic curve fitting approach. Uses log-space arithmetic for numerical stability.

class molecular_simulations.analysis.constant_pH_analysis.UWHAMSolver(tol=1e-07, maxiter=10000)[source]

Bases: object

Unbinned Weighted Histogram Analysis Method (UWHAM) solver.

NOTE: This class is NOT currently used because UWHAM/MBAR is designed for umbrella sampling and replica exchange, not independent constant pH simulations. For standard constant pH MD where each pH is an independent equilibrium simulation, simple curve fitting is the correct approach.

This class is retained for potential use with replica exchange constant pH (REX-cpH) simulations where samples ARE correlated across pH values.

Uses log-space arithmetic throughout for numerical stability with large systems (100+ titratable residues).

__init__(tol=1e-07, maxiter=10000)[source]
load_data(df, resid_cols)[source]

Load data from polars DataFrame into UWHAM-compatible format.

Parameters:
  • df (DataFrame) – DataFrame with columns: rankid, current_pH, and residue columns Residue columns should contain numeric protonation states (0 or 1)

  • resid_cols (list[str]) – List of column names corresponding to residue IDs

solve(verbose=False)[source]

Solve UWHAM self-consistent equations iteratively.

Uses the MBAR equation: f_k = -log(Σ_n exp(-u_k(x_n)) / Σ_l N_l exp(f_l - u_l(x_n)))

where the sum over n includes ALL samples from ALL states.

Returns:

f – Free energy offsets for each pH simulation

Return type:

np.ndarray

compute_log_weights(target_pH)[source]

Compute log weights for reweighting to target pH.

Uses MBAR formula: w_n ∝ exp(-u_target(x_n)) / Σ_l N_l exp(f_l - u_l(x_n))

Return type:

tuple[ndarray, float]

Returns:

  • log_weights (np.ndarray) – Log weights for all samples (flattened)

  • log_norm (float) – Log of the normalization constant

compute_expectation_at_pH(observable_by_state, target_pH)[source]

Compute expectation value of observable at arbitrary pH.

Parameters:
  • observable_by_state (list[ndarray]) – Observable values for each sample, organized by state index

  • target_pH (float) – pH value at which to compute the expectation

Returns:

expectation – Reweighted expectation value at target_pH

Return type:

float

get_occupancy_for_resid(resid)[source]

Get occupancy arrays for a specific residue across all pH values.

Return type:

list[ndarray]

class molecular_simulations.analysis.constant_pH_analysis.TitrationCurve(log_file, make_plots=True, out=PosixPath('.'), method='uwham')[source]

Bases: object

Analyze constant pH simulations with multiple fitting methods.

Available methods: - curvefit: Simple least squares fit of Hill equation to per-pH averages - weighted: Weighted least squares (weight by 1/variance) - bootstrap: Curve fitting with bootstrap confidence intervals

Note: For independent constant pH simulations (not replica exchange), simple curve fitting is the statistically correct approach. UWHAM/MBAR is only appropriate for replica exchange constant pH where samples are correlated across pH values.

__init__(log_file, make_plots=True, out=PosixPath('.'), method='uwham')[source]
static parse_log(log)[source]

Parse OpenMM constant pH log file.

Return type:

tuple[DataFrame, list[int]]

Returns:

  • df (pl.DataFrame) – DataFrame with columns: rankid, current_pH, and one column per residue

  • resids (List[int]) – List of residue IDs in order

prepare()[source]

Prepare data for analysis.

Return type:

None

compute_titrations_curvefit()[source]

Compute pKa and Hill coefficient using scipy curve_fit.

This is the simple approach that treats each pH independently.

Return type:

DataFrame

compute_titrations_weighted(verbose=False)[source]

Compute pKa and Hill coefficient using weighted least squares.

Weights each pH point by 1/variance, giving more influence to points with more samples and intermediate protonation fractions.

This is more statistically rigorous than unweighted curve fitting when sample sizes vary across pH values.

Return type:

DataFrame

compute_titrations_bootstrap(n_bootstrap=1000, verbose=False)[source]

Compute pKa and Hill coefficient with bootstrap confidence intervals.

Resamples the data at each pH to estimate uncertainty in fitted parameters. This gives robust error estimates even when the Hill equation doesn’t perfectly fit the data.

Parameters:
  • n_bootstrap (int) – Number of bootstrap iterations (default 1000)

  • verbose (bool) – Print progress

Return type:

DataFrame

compute_titrations(verbose=False, n_bootstrap=1000)[source]

Compute titrations using selected method.

Return type:

None

postprocess()[source]

Generate fitted curves for plotting.

Return type:

None

plot()[source]

Generate plots (to be implemented).

Return type:

None

diagnose_residue(resid, verbose=True)[source]

Diagnose why a residue might have failed pKa determination.

Parameters:
  • resid (str) – Residue ID to diagnose

  • verbose (bool) – Print diagnostic information

Return type:

dict

static hill_equation(pH, pKa, n)[source]

Hill equation for acid-base equilibrium.

Returns fraction protonated as function of pH.

Return type:

float | ndarray[tuple[Any, ...], dtype[floating]]

property protonation_mapping: dict[str, int]

Map state names to protonation numbers (1 = protonated, 0 = not).

property canonical_resname: dict[str, str]

Map any state name to canonical residue name.

compare_methods(resids=None)[source]

Compare curve fit vs UWHAM results for specified residues.

Parameters:

resids (list[str] | None) – Residues to compare. If None, compares all.

Return type:

DataFrame

class molecular_simulations.analysis.constant_pH_analysis.TitrationAnalyzer(log_files, output_dir=None)[source]

Bases: object

High-level analyzer for constant pH simulations.

Provides a streamlined API that runs both curve fitting and UWHAM analysis, generates comparisons, and creates publication-quality plots.

Example usage

>>> analyzer = TitrationAnalyzer(["cpH.log"])
>>> analyzer.run()
>>> analyzer.summary()
>>> analyzer.plot_residue("145")
>>> analyzer.plot_all(output_dir="plots/")
>>> analyzer.save_results("results/")

Initialize the analyzer.

param log_files:

Path(s) to constant pH log file(s)

type log_files:

Path | list[Path] | str | list[str]

param output_dir:

Directory for output files. If None, uses current directory.

type output_dir:

Path | str | None

__init__(log_files, output_dir=None)[source]

Initialize the analyzer.

Parameters:
  • log_files (Path | list[Path] | str | list[str]) – Path(s) to constant pH log file(s)

  • output_dir (Path | str | None) – Directory for output files. If None, uses current directory.

run(methods=['curvefit', 'weighted'], verbose=True, n_bootstrap=1000)[source]

Run the analysis with specified methods.

Parameters:
  • methods (list[str]) – Methods to run: ‘curvefit’, ‘weighted’, ‘bootstrap’ - curvefit: Simple least squares fit of Hill equation - weighted: Weighted least squares (by 1/variance) - bootstrap: Curve fit with bootstrap confidence intervals

  • verbose (bool) – Print progress information

  • n_bootstrap (int) – Number of bootstrap iterations (only used if ‘bootstrap’ in methods)

Returns:

self

Return type:

TitrationAnalyzer

summary(show_all=False)[source]

Print and return summary of results.

Parameters:

show_all (bool) – If True, show all residues. Otherwise show first 20.

Return type:

DataFrame | None

get_results(method='curvefit')[source]

Get results DataFrame for specified method.

Parameters:

method (str) – ‘curvefit’, ‘weighted’, ‘bootstrap’, or ‘comparison’

Return type:

DataFrame | None

plot_residue(resid, ax=None, show_curvefit=True, show_weighted=True, show_data=True, figsize=(8, 6), save=None)[source]

Plot titration curve for a single residue.

Parameters:
  • resid (str) – Residue ID to plot

  • ax (Axes | None) – Axes to plot on. If None, creates new figure.

  • show_curvefit (bool) – Show curve fitting result

  • show_weighted (bool) – Show weighted fit result

  • show_data (bool) – Show raw data points

  • figsize (tuple[float, float]) – Figure size if creating new figure

  • save (str | Path | None) – Path to save figure

Return type:

Figure

plot_all(output_dir=None, format='png', show_curvefit=True, show_weighted=True, residues=None, verbose=True)[source]

Generate plots for all (or selected) residues.

Parameters:
  • output_dir (str | Path | None) – Directory for plots. Uses self.output_dir / ‘plots’ if None.

  • format (str) – Image format (‘png’, ‘pdf’, ‘svg’)

  • show_curvefit (bool) – Include curve fitting results

  • show_weighted (bool) – Include weighted fit results

  • residues (list[str] | None) – Specific residues to plot. If None, plots all.

  • verbose (bool) – Print progress

Return type:

None

plot_summary(figsize=(12, 5), save=None)[source]

Generate summary plot comparing methods.

Creates a 2-panel figure: - Left: pKa comparison scatter plot - Right: Distribution of pKa differences

Return type:

Figure

save_results(output_dir=None, prefix='', formats=['csv'])[source]

Save all results to files.

Parameters:
  • output_dir (str | Path | None) – Output directory. Uses self.output_dir if None.

  • prefix (str) – Prefix for filenames

  • formats (list[str]) – Output formats: ‘csv’, ‘parquet’, ‘json’

Return type:

None

diagnose(resid)[source]

Get diagnostic information for a residue.

Return type:

dict

recommend_protonation(target_pH, confidence_threshold=0.7, use_bootstrap=False, verbose=True)[source]

Recommend protonation states for a target pH.

Uses the fitted titration curves to predict which residues are protonated vs deprotonated at the specified pH, with confidence estimates based on distance from pKa.

Parameters:
  • target_pH (float) – pH value to make predictions for (e.g., 3.0, 7.4)

  • confidence_threshold (float) – Probability threshold for “confident” predictions (default 0.7) Residues with P(protonated) between (1-threshold) and threshold are marked as “uncertain”

  • use_bootstrap (bool) – If True and bootstrap results available, use bootstrap CI for uncertainty estimation

  • verbose (bool) – Print summary of recommendations

Returns:

  • resid: residue ID

  • resname: canonical residue name (ASP, GLU, HIS, LYS, CYS)

  • pKa: fitted pKa value

  • prob_protonated: probability of being protonated at target pH

  • recommendation: ‘protonated’, ‘deprotonated’, or ‘uncertain’

  • confidence: ‘high’, ‘medium’, or ‘low’

  • state_name: recommended state name (e.g., ‘ASH’ or ‘ASP’)

Return type:

DataFrame

get_protonation_string(target_pH, confidence_threshold=0.7)[source]

Get a simple string of recommended protonation states.

Useful for setting up simulations.

Parameters:
  • target_pH (float) – pH value to make predictions for

  • confidence_threshold (float) – Probability threshold for confident predictions

Returns:

String with format

Return type:

str

export_protonation_states(target_pH, output_file=None, format='csv', confidence_threshold=0.7)[source]

Export protonation state recommendations to file.

Parameters:
  • target_pH (float) – pH value to make predictions for

  • output_file (str | Path | None) – Output file path. If None, uses output_dir/protonation_pH{pH}.{format}

  • format (str) – Output format: ‘csv’, ‘json’, or ‘txt’

  • confidence_threshold (float) – Probability threshold for confident predictions

Return type:

DataFrame

plot_protonation_summary(target_pH, figsize=(12, 6), save=None)[source]

Visualize protonation probabilities at target pH.

Creates a bar plot showing P(protonated) for each residue, colored by residue type.

Parameters:
Return type:

Figure

molecular_simulations.analysis.constant_pH_analysis.analyze_cph(log_files, output_dir=None, methods=['curvefit', 'weighted'], plot=True, verbose=True)[source]

Convenience function to run complete constant pH analysis.

Parameters:
Return type:

TitrationAnalyzer

Example

>>> results = analyze_cph("cpH.log", output_dir="analysis/")
>>> results.summary()
>>> results.plot_residue("145")