molecular_simulations.analysis.constant_pH_analysis module¶
Improved constant pH analysis with UWHAM reweighting.
This implementation adds multistate analysis capabilities to the basic curve fitting approach. Uses log-space arithmetic for numerical stability.
- class molecular_simulations.analysis.constant_pH_analysis.UWHAMSolver(tol=1e-07, maxiter=10000)[source]¶
Bases:
objectUnbinned Weighted Histogram Analysis Method (UWHAM) solver.
NOTE: This class is NOT currently used because UWHAM/MBAR is designed for umbrella sampling and replica exchange, not independent constant pH simulations. For standard constant pH MD where each pH is an independent equilibrium simulation, simple curve fitting is the correct approach.
This class is retained for potential use with replica exchange constant pH (REX-cpH) simulations where samples ARE correlated across pH values.
Uses log-space arithmetic throughout for numerical stability with large systems (100+ titratable residues).
- load_data(df, resid_cols)[source]¶
Load data from polars DataFrame into UWHAM-compatible format.
- Parameters:
df (pl.DataFrame) – DataFrame with columns: rankid, current_pH, and residue columns Residue columns should contain numeric protonation states (0 or 1)
resid_cols (List[str]) – List of column names corresponding to residue IDs
- solve(verbose=False)[source]¶
Solve UWHAM self-consistent equations iteratively.
Uses the MBAR equation: f_k = -log(Σ_n exp(-u_k(x_n)) / Σ_l N_l exp(f_l - u_l(x_n)))
where the sum over n includes ALL samples from ALL states.
- Returns:
f – Free energy offsets for each pH simulation
- Return type:
np.ndarray
- compute_log_weights(target_pH)[source]¶
Compute log weights for reweighting to target pH.
Uses MBAR formula: w_n ∝ exp(-u_target(x_n)) / Σ_l N_l exp(f_l - u_l(x_n))
- class molecular_simulations.analysis.constant_pH_analysis.TitrationCurve(log_file, make_plots=True, out=PosixPath('.'), method='uwham')[source]¶
Bases:
objectAnalyze constant pH simulations with multiple fitting methods.
Available methods: - curvefit: Simple least squares fit of Hill equation to per-pH averages - weighted: Weighted least squares (weight by 1/variance) - bootstrap: Curve fitting with bootstrap confidence intervals
Note: For independent constant pH simulations (not replica exchange), simple curve fitting is the statistically correct approach. UWHAM/MBAR is only appropriate for replica exchange constant pH where samples are correlated across pH values.
- compute_titrations_curvefit()[source]¶
Compute pKa and Hill coefficient using scipy curve_fit.
This is the simple approach that treats each pH independently.
- Return type:
DataFrame
- compute_titrations_weighted(verbose=False)[source]¶
Compute pKa and Hill coefficient using weighted least squares.
Weights each pH point by 1/variance, giving more influence to points with more samples and intermediate protonation fractions.
This is more statistically rigorous than unweighted curve fitting when sample sizes vary across pH values.
- Return type:
DataFrame
- compute_titrations_bootstrap(n_bootstrap=1000, verbose=False)[source]¶
Compute pKa and Hill coefficient with bootstrap confidence intervals.
Resamples the data at each pH to estimate uncertainty in fitted parameters. This gives robust error estimates even when the Hill equation doesn’t perfectly fit the data.
- compute_titrations(verbose=False, n_bootstrap=1000)[source]¶
Compute titrations using selected method.
- Return type:
- diagnose_residue(resid, verbose=True)[source]¶
Diagnose why a residue might have failed pKa determination.
- static hill_equation(pH, pKa, n)[source]¶
Hill equation for acid-base equilibrium.
Returns fraction protonated as function of pH.
- Return type:
- class molecular_simulations.analysis.constant_pH_analysis.TitrationAnalyzer(log_files, output_dir=None)[source]¶
Bases:
objectHigh-level analyzer for constant pH simulations.
Provides a streamlined API that runs both curve fitting and UWHAM analysis, generates comparisons, and creates publication-quality plots.
Example usage¶
>>> analyzer = TitrationAnalyzer(['cpH.log']) >>> analyzer.run() >>> analyzer.summary() >>> analyzer.plot_residue('145') >>> analyzer.plot_all(output_dir='plots/') >>> analyzer.save_results('results/')
Initialize the analyzer.
- type log_files:
- param log_files:
Path(s) to constant pH log file(s)
- type log_files:
Path, str, or list thereof
- type output_dir:
- param output_dir:
Directory for output files. If None, uses current directory.
- type output_dir:
Path or str, optional
- run(methods=['curvefit', 'weighted'], verbose=True, n_bootstrap=1000)[source]¶
Run the analysis with specified methods.
- Parameters:
methods (list of str) – Methods to run: ‘curvefit’, ‘weighted’, ‘bootstrap’ - curvefit: Simple least squares fit of Hill equation - weighted: Weighted least squares (by 1/variance) - bootstrap: Curve fit with bootstrap confidence intervals
verbose (bool) – Print progress information
n_bootstrap (int) – Number of bootstrap iterations (only used if ‘bootstrap’ in methods)
- Returns:
self
- Return type:
for method chaining
- summary(show_all=False)[source]¶
Print and return summary of results.
- Parameters:
show_all (bool) – If True, show all residues. Otherwise show first 20.
- Return type:
DataFrame with comparison results
- get_results(method='curvefit')[source]¶
Get results DataFrame for specified method.
- Parameters:
method (str) – ‘curvefit’, ‘weighted’, ‘bootstrap’, or ‘comparison’
- Return type:
DataFrame
- plot_residue(resid, ax=None, show_curvefit=True, show_weighted=True, show_data=True, figsize=(8, 6), save=None)[source]¶
Plot titration curve for a single residue.
- Parameters:
resid (str) – Residue ID to plot
ax (matplotlib Axes, optional) – Axes to plot on. If None, creates new figure.
show_curvefit (bool) – Show curve fitting result
show_weighted (bool) – Show weighted fit result
show_data (bool) – Show raw data points
figsize (tuple) – Figure size if creating new figure
save (str or Path, optional) – Path to save figure
- Return type:
matplotlib Figure
- plot_all(output_dir=None, format='png', show_curvefit=True, show_weighted=True, residues=None, verbose=True)[source]¶
Generate plots for all (or selected) residues.
- Parameters:
output_dir (str or Path, optional) – Directory for plots. Uses self.output_dir / ‘plots’ if None.
format (str) – Image format (‘png’, ‘pdf’, ‘svg’)
show_curvefit (bool) – Include curve fitting results
show_weighted (bool) – Include weighted fit results
residues (list of str, optional) – Specific residues to plot. If None, plots all.
verbose (bool) – Print progress
- Return type:
- plot_summary(figsize=(12, 5), save=None)[source]¶
Generate summary plot comparing methods.
Creates a 2-panel figure: - Left: pKa comparison scatter plot - Right: Distribution of pKa differences
- Return type:
plt.Figure
- recommend_protonation(target_pH, confidence_threshold=0.7, use_bootstrap=False, verbose=True)[source]¶
Recommend protonation states for a target pH.
Uses the fitted titration curves to predict which residues are protonated vs deprotonated at the specified pH, with confidence estimates based on distance from pKa.
- Parameters:
target_pH (float) – pH value to make predictions for (e.g., 3.0, 7.4)
confidence_threshold (float) – Probability threshold for “confident” predictions (default 0.7) Residues with P(protonated) between (1-threshold) and threshold are marked as “uncertain”
use_bootstrap (bool) – If True and bootstrap results available, use bootstrap CI for uncertainty estimation
verbose (bool) – Print summary of recommendations
- Returns:
resid: residue ID
resname: canonical residue name (ASP, GLU, HIS, LYS, CYS)
pKa: fitted pKa value
prob_protonated: probability of being protonated at target pH
recommendation: ‘protonated’, ‘deprotonated’, or ‘uncertain’
confidence: ‘high’, ‘medium’, or ‘low’
state_name: recommended state name (e.g., ‘ASH’ or ‘ASP’)
- Return type:
DataFrame with columns
- get_protonation_string(target_pH, confidence_threshold=0.7)[source]¶
Get a simple string of recommended protonation states.
Useful for setting up simulations.
- export_protonation_states(target_pH, output_file=None, format='csv', confidence_threshold=0.7)[source]¶
Export protonation state recommendations to file.
- Parameters:
- Return type:
DataFrame with recommendations
- molecular_simulations.analysis.constant_pH_analysis.analyze_cph(log_files, output_dir=None, methods=['curvefit', 'weighted'], plot=True, verbose=True)[source]¶
Convenience function to run complete constant pH analysis.
- Parameters:
- Return type:
TitrationAnalyzer with results
Example
>>> results = analyze_cph('cpH.log', output_dir='analysis/') >>> results.summary() >>> results.plot_residue('145')