
class wedap.H5_Pdist(h5='west.h5', data_type=None, Xname='pcoord', Xindex=0, Yname=None, Yindex=0, Zname=None, Zindex=0, Cname=None, Cindex=0, H5save_out=None, Xsave_name=None, Ysave_name=None, Zsave_name=None, data_proc=None, first_iter=1, last_iter=None, step_iter=1, bins=(100, 100), p_units='kT', T=298, weighted=True, skip_basis=None, succ_only=False, histrange_x=None, histrange_y=None, no_pbar=False, *args, **kwargs)

These class methods generate probability distributions from a WESTPA H5 file.

__init__(h5='west.h5', data_type=None, Xname='pcoord', Xindex=0, Yname=None, Yindex=0, Zname=None, Zindex=0, Cname=None, Cindex=0, H5save_out=None, Xsave_name=None, Ysave_name=None, Zsave_name=None, data_proc=None, first_iter=1, last_iter=None, step_iter=1, bins=(100, 100), p_units='kT', T=298, weighted=True, skip_basis=None, succ_only=False, histrange_x=None, histrange_y=None, no_pbar=False, *args, **kwargs)

Initialize this class with an h5 file and data_type. The X/Y/Zname args can each be a pcoord or aux dataset name in a west.h5 file, a 1D or 2D numpy array, or the path to and name of a file (with a .dat, .txt, .pkl, .npz, or .npy ending).

After instantiating this class, input args are saved as instance attributes.

These can then be updated if needed. The main method you will call is the H5_Pdist.pdist() method, which returns the X, Y, and Z arrays to be plotted. The X and Y arrays are 1D and hold the X and Y axis values to be plotted. The Z array is empty for 1D output; otherwise it is a 2D array.
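A minimal usage sketch of the workflow described above, assuming wedap is installed and a west.h5 file exists at the given path. The helper name load_average_pdist is hypothetical. Only the H5_Pdist arguments and the pdist() call come from this page, and the guard clauses exist so the snippet returns None when either prerequisite is missing.

```python
import os


def load_average_pdist(h5_path="west.h5"):
    """Return (X, Y, Z) from H5_Pdist.pdist(), or None if it cannot run.

    Hypothetical helper for illustration; not part of wedap.
    """
    if not os.path.exists(h5_path):
        return None  # no WESTPA output file to read
    try:
        import wedap  # assumption: wedap is installed in this environment
    except ImportError:
        return None

    # Keyword names and defaults follow the class signature shown above.
    pdist = wedap.H5_Pdist(
        h5=h5_path,
        data_type="average",
        Xname="pcoord",
        Xindex=0,
        first_iter=1,
        bins=(100, 100),
        p_units="kT",
    )
    # pdist() is described above as returning the X, Y, and Z arrays.
    return pdist.pdist()


result = load_average_pdist("west.h5")
if result is not None:
    X, Y, Z = result
```

The returned arrays can then be passed to whatever plotting code you prefer.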

  • h5 (str or list of str) – Path(s) to west.h5 file(s).

  • data_type (str) – ‘evolution’ (1 dataset); ‘average’ or ‘instant’ (1 or 2 datasets)

  • Xname (str or array) – Target data for x axis, default pcoord. Can be a pcoord or aux dataset name in a west.h5 file, a 1D or 2D numpy array, or the path to and name of a file (with a .dat, .txt, .pkl, .npz, .npy ending).

  • Xindex (int) – If X.ndim > 2, use this to index.

  • Yname (str or array) – Target data for y axis, default None. Can be a pcoord or aux dataset name in a west.h5 file, a 1D or 2D numpy array, or the path to and name of a file (with a .dat, .txt, .pkl, .npz, .npy ending).

  • Yindex (int) – If Y.ndim > 2, use this to index.

  • Zname (str or array) – Target data for z axis, default None. Can be a pcoord or aux dataset name in a west.h5 file, a 1D or 2D numpy array, or the path to and name of a file (with a .dat, .txt, .pkl, .npz, .npy ending). Use this if you want a dataset instead of the pdist on the Z axis. This is best plotted as a scatter plot with Z as the marker color. Only the XYZ datasets are returned instead of the pdist, because the weights/pdist are not considered.

  • Zindex (int) – If Z.ndim > 2, use this to index.

  • Cname (str or array) – Target data for cbar axis when using 3d projection scatter, default None. Can be a pcoord or aux dataset name in a west.h5 file, a 1D or 2D numpy array, or the path to and name of a file (with a .dat, .txt, .pkl, .npz, .npy ending).

  • Cindex (int) – If C.ndim > 2, use this to index.

  • H5save_out (str) – Path for saving a new H5 file that includes this dataset. Currently it saves the requested X, Y, or Z data under a new aux_name. Note that if you use this feature, the input data must have the same shape and formatting as the other H5 file datasets. (TODO: organization?) It can also be the name of the output h5 file for optionally writing a new skip_basis or succ_only h5 dataset with updated weights.

  • Xsave_name, Ysave_name, Zsave_name (str) – Respective names to call the new dataset saved into the new H5 file.

  • data_proc (function or tuple of functions) – Of the form f(data) where data has rows=segments, columns=frames until tau, depth=data dims. The input function must return a processed array of the same shape and formatting.

  • first_iter (int) – First iteration of data to include in the pdist, default 1.

  • last_iter (int) – Last iteration data to include, default is the last recorded iteration in the west.h5 file. Note that instant type pdists only depend on last_iter.

  • step_iter (int) – Use only every step_iter-th iteration of the input data, e.g. step_iter=10 for every 10 iterations.

  • bins (tuple of ints (TODO: maybe the tuple isn’t user friendly for 1 dim? Could check items like md_pdist)) – Histogram bins in pdist data to be generated for x and y datasets, default both 100.

  • p_units (str) – Can be ‘kT’ (default), ‘kcal’, ‘raw’, or ‘raw_norm’. ‘kT’ gives -ln(P); ‘kcal’ gives -RT ln(P) in kcal/mol, where RT = 0.5922 kcal/mol at T = 298 K. ‘raw’ gives the raw probabilities, and ‘raw_norm’ gives the raw probabilities normalized by P(max).

  • T (int) – Temperature if using kcal/mol.

  • weighted (bool) – Default True, use WE segment weights in pdist calculation.

  • skip_basis (list) – List of binary flags (0 or 1), one per basis state, marking which basis states to skip. For example, [0, 0, 1] keeps the trajectory data from basis states 1 and 2 but skips basis state 3 by applying zero weights.

  • succ_only (bool) – Default False, set True to filter weights to show only successful trajectories.

  • histrange_x, histrange_y (list or tuple of 2 floats or ints) – Optional custom histogram range for the x and y axis, respectively.

  • no_pbar (bool) – Optionally do not include the progress bar for pdist generation.

  • TODO (maybe also binsfromexpression?)
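To make the bins, histrange_x, weighted, p_units, and T parameters above concrete, here is a pure-Python sketch of a weighted 1D histogram followed by the unit conversions. It is an illustration under stated assumptions, not wedap's implementation: the function names are hypothetical, the probabilities are normalized to sum to 1 before taking the logarithm (the page does not say how the library normalizes), and empty bins are mapped to infinity.

```python
import math


def weighted_histogram(values, weights, bins=100, histrange=None):
    """Sum the weight that lands in each of `bins` equal-width bins."""
    lo, hi = histrange if histrange is not None else (min(values), max(values))
    width = (hi - lo) / bins
    hist = [0.0] * bins
    for value, weight in zip(values, weights):
        if value < lo or value > hi:
            continue  # outside the requested histogram range
        # A value on the right edge is counted in the last bin.
        index = min(int((value - lo) / width), bins - 1)
        hist[index] += weight
    midpoints = [lo + (i + 0.5) * width for i in range(bins)]
    return midpoints, hist


def convert_units(hist, p_units="kT", T=298):
    """Convert summed bin weights using the p_units definitions above."""
    total = sum(hist)
    probs = [h / total for h in hist]  # assumption: normalize to sum to 1
    if p_units == "raw":
        return probs
    if p_units == "raw_norm":
        p_max = max(probs)
        return [p / p_max for p in probs]
    R = 0.0019872  # gas constant in kcal/(mol K); R * 298 is about 0.5922
    scale = {"kT": 1.0, "kcal": R * T}[p_units]
    # Empty bins have P = 0, so their value is reported as infinity.
    return [-scale * math.log(p) if p > 0 else math.inf for p in probs]


# Three frames with WE-style weights, two bins spanning [0, 1].
midpoints, hist = weighted_histogram(
    [0.1, 0.2, 0.9], [0.5, 0.25, 0.25], bins=2, histrange=(0, 1)
)
free_energy_kT = convert_units(hist, p_units="kT")
```

Setting every weight to the same value in this sketch corresponds to the weighted=False case, where each frame counts equally.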


__init__([h5, data_type, Xname, Xindex, ...])

Initialize this class with an h5 file and data_type.


Take the auxiliary dataset for a single iteration and generate a weighted 1D probability distribution.


Take the auxiliary dataset for a single iteration and generate a weighted 2D probability distribution.


Unique case where Zname is specified and the XYZ datasets are returned.


Unique case where Zname is specified and the XYZ datasets are returned.


1 dataset: average pdist for a range of iterations.


2 datasets: average pdist for a range of iterations.


Returns the pdist for 1 coordinate for the range of iterations specified.

find_iter_seg_from_xy_vals(val_x, val_y)

Find and return (iter, seg) closest to input data value(s).


Returns a 1D array of the weight for every frame of each tau for all segments of all iterations specified.

get_coords(path, data_name, data_index)

Get a list of data coordinates for plotting traces.

get_full_coords(walker_tuple, data_name[, ...])

Returns a full 1D set of data for a single trace (path).


Get parent of an input (iteration, walker).

get_total_data_array(name[, index, ...])

Loop through all iterations specified and get a 1d raw data array.


Unique case where Zname is specified and the XYZ datasets are returned.


Returns the x and y pdist datasets for a single iteration.


Returns the xyz pdist datasets for a single iteration.


TODO: actually make a new h5 file, see bstate filter code, integrate all.


Main public method with pdist generation controls.

plot_trace(walker_tuple[, color, linewidth, ...])

Plot trace.


Take an input 1d array of the data values at every segment for each iteration, and reshape them to make pdists.


TODO: Filter weights to be zero for all non-successful trajectories.

trace_walker(walker_tuple[, first_iter])

Get trace path of an input (iteration, walker).


Find and return all successfully recycled (iter, seg) pairs.
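The parent lookup and trace entries listed above both rely on following a walker's ancestry backwards through the iterations. The sketch below shows that idea in isolation. It is not wedap's code: trace_path and the parents dictionary are hypothetical stand-ins for data that would normally be read from the west.h5 file, and the sketch assumes that a negative parent index marks a walker started from an initial state.

```python
def trace_path(parents, walker_tuple, first_iter=1):
    """Follow parent links from (iteration, walker) back to first_iter.

    `parents` is a hypothetical lookup where parents[(iteration, walker)]
    is the walker index of the parent segment in the previous iteration.
    """
    iteration, walker = walker_tuple
    path = [(iteration, walker)]
    while iteration > first_iter:
        parent = parents[(iteration, walker)]
        if parent < 0:
            break  # assumption: negative index means an initial state
        iteration -= 1
        walker = parent
        path.append((iteration, walker))
    path.reverse()  # oldest iteration first
    return path


# Toy ancestry: walker 0 of iteration 3 descends from walker 1 of
# iteration 2, which descends from walker 0 of iteration 1.
toy_parents = {(3, 0): 1, (2, 1): 0, (2, 0): 0}
lineage = trace_path(toy_parents, (3, 0))
```

Each (iteration, walker) pair in the returned list can then be used to look up the coordinates for that segment when drawing the trace.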