wedap.H5_Pdist

class wedap.H5_Pdist(h5='west.h5', data_type=None, Xname='pcoord', Xindex=0, Yname=None, Yindex=0, Zname=None, Zindex=0, Cname=None, Cindex=0, H5save_out=None, Xsave_name=None, Ysave_name=None, Zsave_name=None, data_proc=None, first_iter=1, last_iter=None, step_iter=1, bins=(100, 100), p_units='kT', T=298, weighted=True, skip_basis=None, succ_only=False, histrange_x=None, histrange_y=None, no_pbar=False, *args, **kwargs)

These class methods generate probability distributions from a WESTPA H5 file.

__init__(h5='west.h5', data_type=None, Xname='pcoord', Xindex=0, Yname=None, Yindex=0, Zname=None, Zindex=0, Cname=None, Cindex=0, H5save_out=None, Xsave_name=None, Ysave_name=None, Zsave_name=None, data_proc=None, first_iter=1, last_iter=None, step_iter=1, bins=(100, 100), p_units='kT', T=298, weighted=True, skip_basis=None, succ_only=False, histrange_x=None, histrange_y=None, no_pbar=False, *args, **kwargs)

Initialize this class with an h5 file and data_type. The X/Y/Zname args can be a pcoord or aux dataset name in a west.h5 file, a 1D or 2D numpy array, or the path to and name of a file (with a .dat, .txt, .pkl, .npz, .npy ending).

After instantiating this class, input args are saved as instance attributes.

These can then be updated if needed. The main method you will call is the H5_Pdist.pdist() method, which will return the X, Y, and Z arrays to be plotted. The X and Y arrays are 1D and represent the X and Y axis values to be plotted. The Z array is empty with 1D output but otherwise will be a 2D array.
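As a minimal usage sketch of the workflow described above: the class is instantiated with keyword arguments from the signature and pdist() is called to obtain the X, Y, and Z arrays. The wrapper function name, the file path, and the choice of a two-dimensional progress coordinate (Yindex=1) are illustrative assumptions, not part of wedap.

```python
def average_pdist_2d_example(h5_path="west.h5"):
    """Hypothetical helper: build a 2D average pdist and return its X, Y, Z arrays."""
    # Imported inside the function so this sketch can be loaded without wedap installed.
    from wedap import H5_Pdist

    pdist_obj = H5_Pdist(
        h5=h5_path,           # assumed path to an existing WESTPA H5 file
        data_type="average",  # 'average' accepts 1 or 2 datasets
        Xname="pcoord",
        Xindex=0,
        Yname="pcoord",
        Yindex=1,             # assumes the progress coordinate has a second dimension
        first_iter=1,
        last_iter=None,       # None: use the last recorded iteration
        bins=(100, 100),
        p_units="kT",
    )
    # pdist() is the main public method and returns the arrays to be plotted.
    X, Y, Z = pdist_obj.pdist()
    return X, Y, Z
```

Calling average_pdist_2d_example() requires wedap and a west.h5 file on disk; the returned X and Y hold the axis values and Z the 2D distribution.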

Parameters:

h5 (str or list of str) – Path(s) to west.h5 file(s).
data_type (str) – ‘evolution’ (1 dataset); ‘average’ or ‘instant’ (1 or 2 datasets)
Xname (str or array) – Target data for x axis, default pcoord. Can be a pcoord or aux dataset name in a west.h5 file, a 1D or 2D numpy array, or the path to and name of a file (with a .dat, .txt, .pkl, .npz, .npy ending).
Xindex (int) – If X.ndim > 2, use this to index.
Yname (str or array) – Target data for y axis, default None. Can be a pcoord or aux dataset name in a west.h5 file, a 1D or 2D numpy array, or the path to and name of a file (with a .dat, .txt, .pkl, .npz, .npy ending).
Yindex (int) – If Y.ndim > 2, use this to index.
Zname (str or array) – Target data for z axis, default None. Can be a pcoord or aux dataset name in a west.h5 file, a 1D or 2D numpy array, or the path to and name of a file (with a .dat, .txt, .pkl, .npz, .npy ending). Use this if you want to use a dataset instead of the pdist for the Z axis. This is best plotted as a scatter plot with Z as the marker color. Instead of returning the pdist, only the XYZ datasets will be returned, because the weights/pdist are not considered.
Zindex (int) – If Z.ndim > 2, use this to index.
Cname (str or array) – Target data for cbar axis when using 3d projection scatter, default None. Can be a pcoord or aux dataset name in a west.h5 file, a 1D or 2D numpy array, or the path to and name of a file (with a .dat, .txt, .pkl, .npz, .npy ending).
Cindex (int) – If C.ndim > 2, use this to index.
H5save_out (str) – Path of a new H5 file to save. Currently, the requested X, Y, or Z data are saved into it under a new aux dataset name. Note that if you use this feature, the input data must have the same shape and formatting as the other H5 file datasets. (TODO: organization?) This can also be the name of the output h5 file for optionally writing a new skip_basis or succ_only h5 dataset with updated weights.
Xsave_name, Ysave_name, Zsave_name (str) – Respective names to call the new dataset saved into the new H5 file.
data_proc (function or tuple of functions) – Of the form f(data) where data has rows=segments, columns=frames until tau, depth=data dims. The input function must return a processed array of the same shape and formatting.
first_iter (int) – First iteration of data to include in the pdist, default 1.
last_iter (int) – Last iteration data to include, default is the last recorded iteration in the west.h5 file. Note that instant type pdists only depend on last_iter.
step_iter (int) – Use only every step_iter-th iteration of the input data, e.g. step_iter=10 for every 10th iteration.
bins (tuple of ints (TODO: maybe the tuple isn’t user friendly for 1 dim? Could check items like md_pdist)) – Histogram bins in pdist data to be generated for x and y datasets, default both 100.
p_units (str) – Can be ‘kT’ (default), ‘kcal’, ‘raw’, or ‘raw_norm’. kT = -lnP, kcal/mol = -RT(lnP), where RT = 0.5922 kcal/mol at the default T = 298 K. ‘raw’ is the raw probabilities and ‘raw_norm’ is the raw probabilities normalized by P(max).
T (int) – Temperature if using kcal/mol.
weighted (bool) – Default True, use WE segment weights in pdist calculation.
skip_basis (list) – List of binary flags (0 or 1), one per basis state, indicating whether that basis state is skipped. E.g. [0, 0, 1] would consider only the trajectory data from basis states 1 and 2 and skip basis state 3 by applying zero weights.
succ_only (bool) – Default False, set True to filter weights to show only successful trajectories.
histrange_x, histrange_y (list or tuple of 2 floats or ints) – Optional custom histogram ranges for the x and y datasets.
no_pbar (bool) – Optionally do not include the progress bar for pdist generation.
TODO (maybe also binsfromexpression?)
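The p_units conversions listed above can be illustrated with a small standalone sketch. It is not wedap's implementation: the function name, the gas constant value, and the handling of zero probabilities (mapped to infinity) are assumptions made for illustration, and ‘raw_norm’ is interpreted here as division by the maximum probability.

```python
import math

# Gas constant in kcal/(mol*K); an assumed value chosen so that RT is about 0.5922 at 298 K.
R_KCAL = 0.0019872


def convert_probabilities(probs, p_units="kT", T=298):
    """Hypothetical helper: convert raw probabilities to the requested units."""
    if p_units == "raw":
        return list(probs)
    if p_units == "raw_norm":
        # Normalize so the most probable bin has a value of 1.
        p_max = max(probs)
        return [p / p_max for p in probs]
    if p_units == "kT":
        scale = 1.0          # -ln(P), in units of kT
    elif p_units == "kcal":
        scale = R_KCAL * T   # -RT ln(P), in kcal/mol
    else:
        raise ValueError(f"unknown p_units: {p_units!r}")
    # Empty bins (P = 0) have no finite free energy, so they map to infinity here.
    return [-scale * math.log(p) if p > 0 else math.inf for p in probs]


probs = [0.5, 0.25, 0.0]
in_kT = convert_probabilities(probs, "kT")
in_kcal = convert_probabilities(probs, "kcal", T=298)
normalized = convert_probabilities(probs, "raw_norm")
```

The ‘kcal’ values are the ‘kT’ values scaled by RT, so changing T only rescales the result.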

Methods

`__init__`([h5, data_type, Xname, Xindex, ...])	Initialize this class with an h5 file and data_type.
`aux_to_pdist_1d`(iteration)	Take the auxiliary dataset for a single iteration and generate a weighted 1D probability distribution.
`aux_to_pdist_2d`(iteration)	Take the auxiliary dataset for a single iteration and generate a weighted 2D probability distribution.
`average_datasets_3d`([interval])	Unique case where Zname is specified and the XYZ datasets are returned.
`average_datasets_4d`([interval])	Unique case where Zname is specified and the XYZ datasets are returned.
`average_pdist_1d`()	1 dataset: average pdist for a range of iterations.
`average_pdist_2d`()	2 datasets: average pdist for a range of iterations.
`evolution_pdist`()	Returns the pdist for 1 coordinate for the range of iterations specified.
`find_iter_seg_from_xy_vals`(val_x, val_y)	Find and return (iter, seg) closest to input data value(s).
`get_all_weights`()	Returns a 1D array of the weight for every frame of each tau for all segments of all iterations specified.
`get_coords`(path, data_name, data_index)	Get a list of data coordinates for plotting traces.
`get_full_coords`(walker_tuple, data_name[, ...])	Returns a full 1D set of data for a single trace (path).
`get_parents`(walker_tuple)	Get parent of an input (iteration, walker).
`get_total_data_array`(name[, index, ...])	Loop through all iterations specified and get a 1d raw data array.
`instant_datasets_3d`()	Unique case where Zname is specified and the XYZ datasets are returned.
`instant_pdist_1d`()	Returns the x and y pdist datasets for a single iteration.
`instant_pdist_2d`()	Returns the xyz pdist datasets for a single iteration.
`make_new_h5`([new_weights])	TODO: actually make a new h5 file, see bstate filter code, integrate all.
`pdist`([normalize])	Main public method with pdist generation controls.
`plot_trace`(walker_tuple[, color, linewidth, ...])	Plot trace.
`reshape_total_data_array`(array)	Take an input 1d array of the data values at every segment for each iteration, and reshape them to make pdists.
`succ_pdist_weight_filter`()	TODO: Filter weights to be zero for all non-successful trajectories.
`trace_walker`(walker_tuple[, first_iter])	Get trace path of an input (iteration, walker).
`w_succ`()	Find and return all successfully recycled (iter, seg) pairs.