molecular_simulations.analysis.constant_pH_analysis module

Improved constant pH analysis with UWHAM reweighting.

This implementation adds multistate analysis capabilities to the basic curve fitting approach. Uses log-space arithmetic for numerical stability.

class molecular_simulations.analysis.constant_pH_analysis.UWHAMSolver(tol=1e-07, maxiter=10000)[source]

Bases: object

Unbinned Weighted Histogram Analysis Method (UWHAM) solver.

NOTE: This class is NOT currently used because UWHAM/MBAR is designed for umbrella sampling and replica exchange, not independent constant pH simulations. For standard constant pH MD where each pH is an independent equilibrium simulation, simple curve fitting is the correct approach.

This class is retained for potential use with replica exchange constant pH (REX-cpH) simulations where samples ARE correlated across pH values.

Uses log-space arithmetic throughout for numerical stability with large systems (100+ titratable residues).

__init__(tol=1e-07, maxiter=10000)[source]
load_data(df, resid_cols)[source]

Load data from polars DataFrame into UWHAM-compatible format.

Parameters:
  • df (pl.DataFrame) – DataFrame with columns: rankid, current_pH, and residue columns Residue columns should contain numeric protonation states (0 or 1)

  • resid_cols (List[str]) – List of column names corresponding to residue IDs

solve(verbose=False)[source]

Solve UWHAM self-consistent equations iteratively.

Uses the MBAR equation: f_k = -log(Σ_n exp(-u_k(x_n)) / Σ_l N_l exp(f_l - u_l(x_n)))

where the sum over n includes ALL samples from ALL states.

Returns:

f – Free energy offsets for each pH simulation

Return type:

np.ndarray

compute_log_weights(target_pH)[source]

Compute log weights for reweighting to target pH.

Uses MBAR formula: w_n ∝ exp(-u_target(x_n)) / Σ_l N_l exp(f_l - u_l(x_n))

Return type:

Tuple[ndarray, float]

Returns:

  • log_weights (np.ndarray) – Log weights for all samples (flattened)

  • log_norm (float) – Log of the normalization constant

compute_expectation_at_pH(observable_by_state, target_pH)[source]

Compute expectation value of observable at arbitrary pH.

Parameters:
  • observable_by_state (List[np.ndarray]) – Observable values for each sample, organized by state index

  • target_pH (float) – pH value at which to compute the expectation

Returns:

expectation – Reweighted expectation value at target_pH

Return type:

float

get_occupancy_for_resid(resid)[source]

Get occupancy arrays for a specific residue across all pH values.

Return type:

List[ndarray]

class molecular_simulations.analysis.constant_pH_analysis.TitrationCurve(log_file, make_plots=True, out=PosixPath('.'), method='uwham')[source]

Bases: object

Analyze constant pH simulations with multiple fitting methods.

Available methods: - curvefit: Simple least squares fit of Hill equation to per-pH averages - weighted: Weighted least squares (weight by 1/variance) - bootstrap: Curve fitting with bootstrap confidence intervals

Note: For independent constant pH simulations (not replica exchange), simple curve fitting is the statistically correct approach. UWHAM/MBAR is only appropriate for replica exchange constant pH where samples are correlated across pH values.

__init__(log_file, make_plots=True, out=PosixPath('.'), method='uwham')[source]
static parse_log(log)[source]

Parse OpenMM constant pH log file.

Return type:

Tuple[DataFrame, List[int]]

Returns:

  • df (pl.DataFrame) – DataFrame with columns: rankid, current_pH, and one column per residue

  • resids (List[int]) – List of residue IDs in order

prepare()[source]

Prepare data for analysis.

Return type:

None

compute_titrations_curvefit()[source]

Compute pKa and Hill coefficient using scipy curve_fit.

This is the simple approach that treats each pH independently.

Return type:

DataFrame

compute_titrations_weighted(verbose=False)[source]

Compute pKa and Hill coefficient using weighted least squares.

Weights each pH point by 1/variance, giving more influence to points with more samples and intermediate protonation fractions.

This is more statistically rigorous than unweighted curve fitting when sample sizes vary across pH values.

Return type:

DataFrame

compute_titrations_bootstrap(n_bootstrap=1000, verbose=False)[source]

Compute pKa and Hill coefficient with bootstrap confidence intervals.

Resamples the data at each pH to estimate uncertainty in fitted parameters. This gives robust error estimates even when the Hill equation doesn’t perfectly fit the data.

Parameters:
  • n_bootstrap (int) – Number of bootstrap iterations (default 1000)

  • verbose (bool) – Print progress

Return type:

DataFrame with pKa, Hill_n, and 95% confidence intervals

compute_titrations(verbose=False, n_bootstrap=1000)[source]

Compute titrations using selected method.

Return type:

None

postprocess()[source]

Generate fitted curves for plotting.

Return type:

None

plot()[source]

Generate plots (to be implemented).

Return type:

None

diagnose_residue(resid, verbose=True)[source]

Diagnose why a residue might have failed pKa determination.

Parameters:
  • resid (str) – Residue ID to diagnose

  • verbose (bool) – Print diagnostic information

Return type:

dict with diagnostic info including titration curve data

static hill_equation(pH, pKa, n)[source]

Hill equation for acid-base equilibrium.

Returns fraction protonated as function of pH.

Return type:

float

property protonation_mapping: Dict[str, int]

Map state names to protonation numbers (1 = protonated, 0 = not).

property canonical_resname: Dict[str, str]

Map any state name to canonical residue name.

compare_methods(resids=None)[source]

Compare curve fit vs UWHAM results for specified residues.

Parameters:

resids (List[str], optional) – Residues to compare. If None, compares all.

Return type:

DataFrame with both methods’ results side by side

class molecular_simulations.analysis.constant_pH_analysis.TitrationAnalyzer(log_files, output_dir=None)[source]

Bases: object

High-level analyzer for constant pH simulations.

Provides a streamlined API that runs both curve fitting and UWHAM analysis, generates comparisons, and creates publication-quality plots.

Example usage

>>> analyzer = TitrationAnalyzer(['cpH.log'])
>>> analyzer.run()
>>> analyzer.summary()
>>> analyzer.plot_residue('145')
>>> analyzer.plot_all(output_dir='plots/')
>>> analyzer.save_results('results/')

Initialize the analyzer.

type log_files:

Union[Path, List[Path], str, List[str]]

param log_files:

Path(s) to constant pH log file(s)

type log_files:

Path, str, or list thereof

type output_dir:

Union[Path, str, None]

param output_dir:

Directory for output files. If None, uses current directory.

type output_dir:

Path or str, optional

__init__(log_files, output_dir=None)[source]

Initialize the analyzer.

Parameters:
  • log_files (Path, str, or list thereof) – Path(s) to constant pH log file(s)

  • output_dir (Path or str, optional) – Directory for output files. If None, uses current directory.

run(methods=['curvefit', 'weighted'], verbose=True, n_bootstrap=1000)[source]

Run the analysis with specified methods.

Parameters:
  • methods (list of str) – Methods to run: ‘curvefit’, ‘weighted’, ‘bootstrap’ - curvefit: Simple least squares fit of Hill equation - weighted: Weighted least squares (by 1/variance) - bootstrap: Curve fit with bootstrap confidence intervals

  • verbose (bool) – Print progress information

  • n_bootstrap (int) – Number of bootstrap iterations (only used if ‘bootstrap’ in methods)

Returns:

self

Return type:

for method chaining

summary(show_all=False)[source]

Print and return summary of results.

Parameters:

show_all (bool) – If True, show all residues. Otherwise show first 20.

Return type:

DataFrame with comparison results

get_results(method='curvefit')[source]

Get results DataFrame for specified method.

Parameters:

method (str) – ‘curvefit’, ‘weighted’, ‘bootstrap’, or ‘comparison’

Return type:

DataFrame

plot_residue(resid, ax=None, show_curvefit=True, show_weighted=True, show_data=True, figsize=(8, 6), save=None)[source]

Plot titration curve for a single residue.

Parameters:
  • resid (str) – Residue ID to plot

  • ax (matplotlib Axes, optional) – Axes to plot on. If None, creates new figure.

  • show_curvefit (bool) – Show curve fitting result

  • show_weighted (bool) – Show weighted fit result

  • show_data (bool) – Show raw data points

  • figsize (tuple) – Figure size if creating new figure

  • save (str or Path, optional) – Path to save figure

Return type:

matplotlib Figure

plot_all(output_dir=None, format='png', show_curvefit=True, show_weighted=True, residues=None, verbose=True)[source]

Generate plots for all (or selected) residues.

Parameters:
  • output_dir (str or Path, optional) – Directory for plots. Uses self.output_dir / ‘plots’ if None.

  • format (str) – Image format (‘png’, ‘pdf’, ‘svg’)

  • show_curvefit (bool) – Include curve fitting results

  • show_weighted (bool) – Include weighted fit results

  • residues (list of str, optional) – Specific residues to plot. If None, plots all.

  • verbose (bool) – Print progress

Return type:

None

plot_summary(figsize=(12, 5), save=None)[source]

Generate summary plot comparing methods.

Creates a 2-panel figure: - Left: pKa comparison scatter plot - Right: Distribution of pKa differences

Return type:

plt.Figure

save_results(output_dir=None, prefix='', formats=['csv'])[source]

Save all results to files.

Parameters:
  • output_dir (str or Path, optional) – Output directory. Uses self.output_dir if None.

  • prefix (str) – Prefix for filenames

  • formats (list of str) – Output formats: ‘csv’, ‘parquet’, ‘json’

Return type:

None

diagnose(resid)[source]

Get diagnostic information for a residue.

Return type:

Dict

recommend_protonation(target_pH, confidence_threshold=0.7, use_bootstrap=False, verbose=True)[source]

Recommend protonation states for a target pH.

Uses the fitted titration curves to predict which residues are protonated vs deprotonated at the specified pH, with confidence estimates based on distance from pKa.

Parameters:
  • target_pH (float) – pH value to make predictions for (e.g., 3.0, 7.4)

  • confidence_threshold (float) – Probability threshold for “confident” predictions (default 0.7) Residues with P(protonated) between (1-threshold) and threshold are marked as “uncertain”

  • use_bootstrap (bool) – If True and bootstrap results available, use bootstrap CI for uncertainty estimation

  • verbose (bool) – Print summary of recommendations

Returns:

  • resid: residue ID

  • resname: canonical residue name (ASP, GLU, HIS, LYS, CYS)

  • pKa: fitted pKa value

  • prob_protonated: probability of being protonated at target pH

  • recommendation: ‘protonated’, ‘deprotonated’, or ‘uncertain’

  • confidence: ‘high’, ‘medium’, or ‘low’

  • state_name: recommended state name (e.g., ‘ASH’ or ‘ASP’)

Return type:

DataFrame with columns

get_protonation_string(target_pH, confidence_threshold=0.7)[source]

Get a simple string of recommended protonation states.

Useful for setting up simulations.

Parameters:
  • target_pH (float) – pH value to make predictions for

  • confidence_threshold (float) – Probability threshold for confident predictions

Returns:

String with format

Return type:

“resid:state,resid:state,…”

export_protonation_states(target_pH, output_file=None, format='csv', confidence_threshold=0.7)[source]

Export protonation state recommendations to file.

Parameters:
  • target_pH (float) – pH value to make predictions for

  • output_file (str or Path, optional) – Output file path. If None, uses output_dir/protonation_pH{pH}.{format}

  • format (str) – Output format: ‘csv’, ‘json’, or ‘txt’

  • confidence_threshold (float) – Probability threshold for confident predictions

Return type:

DataFrame with recommendations

plot_protonation_summary(target_pH, figsize=(12, 6), save=None)[source]

Visualize protonation probabilities at target pH.

Creates a bar plot showing P(protonated) for each residue, colored by residue type.

Parameters:
  • target_pH (float) – pH value to visualize

  • figsize (tuple) – Figure size

  • save (str or Path, optional) – Path to save figure

Return type:

matplotlib Figure

molecular_simulations.analysis.constant_pH_analysis.analyze_cph(log_files, output_dir=None, methods=['curvefit', 'weighted'], plot=True, verbose=True)[source]

Convenience function to run complete constant pH analysis.

Parameters:
  • log_files (path(s) to log files)

  • output_dir (output directory)

  • methods (list of methods to run ('curvefit', 'weighted', 'bootstrap'))

  • plot (whether to generate plots)

  • verbose (print progress)

Return type:

TitrationAnalyzer with results

Example

>>> results = analyze_cph('cpH.log', output_dir='analysis/')
>>> results.summary()
>>> results.plot_residue('145')