wedap.H5_Pdist

class wedap.H5_Pdist(h5='west.h5', data_type=None, Xname='pcoord', Xindex=0, Yname=None, Yindex=0, Zname=None, Zindex=0, Cname=None, Cindex=0, H5save_out=None, Xsave_name=None, Ysave_name=None, Zsave_name=None, data_proc=None, first_iter=1, last_iter=None, step_iter=1, bins=(100, 100), p_units='kT', T=298, weighted=True, skip_basis=None, succ_only=False, histrange_x=None, histrange_y=None, no_pbar=False, *args, **kwargs)

These class methods generate probability distributions from a WESTPA H5 file.

__init__(h5='west.h5', data_type=None, Xname='pcoord', Xindex=0, Yname=None, Yindex=0, Zname=None, Zindex=0, Cname=None, Cindex=0, H5save_out=None, Xsave_name=None, Ysave_name=None, Zsave_name=None, data_proc=None, first_iter=1, last_iter=None, step_iter=1, bins=(100, 100), p_units='kT', T=298, weighted=True, skip_basis=None, succ_only=False, histrange_x=None, histrange_y=None, no_pbar=False, *args, **kwargs)

Initialize this class with an h5 file and data_type. The Xname, Yname, and Zname args can each be a pcoord or aux dataset name in a west.h5 file, a 1D or 2D numpy array, or the path to and name of a file (with a .dat, .txt, .pkl, .npz, .npy ending).

After instantiating this class, input args are saved as instance attributes. These can then be updated if needed. The main method you will call is the H5_Pdist.pdist() method, which will return the X, Y, and Z arrays to be plotted. The X and Y arrays are 1D and represent the X and Y axis values to be plotted. The Z array is empty when the output is a 1D pdist (only X and Y are plotted); otherwise it is a 2D array.
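The workflow described above can be sketched as follows. This is a minimal, hedged example rather than an official recipe: it uses only the constructor arguments and the pdist() method documented on this page, the wrapper function name is hypothetical, and it assumes a west.h5 file whose pcoord has at least two dimensions.

```python
def average_pdist_2d_arrays(h5_path="west.h5"):
    """Return the X, Y, Z arrays of an average 2D pdist.

    Hypothetical helper for illustration; calling it requires wedap to be
    installed and a WESTPA H5 file to exist at h5_path.
    """
    # Imported lazily so the sketch can be defined without wedap installed.
    import wedap

    pdist = wedap.H5_Pdist(
        h5=h5_path,
        data_type="average",
        Xname="pcoord", Xindex=0,
        Yname="pcoord", Yindex=1,  # assumption: pcoord has a second dimension
        bins=(100, 100),
        p_units="kT",
    )
    # pdist() is documented to return the X, Y, and Z arrays to be plotted.
    X, Y, Z = pdist.pdist()
    return X, Y, Z
```

The returned arrays can then be passed to any plotting routine that accepts 1D axis values and a 2D grid.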

Parameters:
  • h5 (str or list of str) – Path(s) to west.h5 file(s).

  • data_type (str) – ‘evolution’ (1 dataset); ‘average’ or ‘instant’ (1 or 2 datasets)

  • Xname (str or array) – Target data for x axis, default pcoord. Can be a pcoord or aux dataset name in a west.h5 file, a 1D or 2D numpy array, or the path to and name of a file (with a .dat, .txt, .pkl, .npz, .npy ending).

  • Xindex (int) – If X.ndim > 2, use this to index.

  • Yname (str or array) – Target data for y axis, default None. Can be a pcoord or aux dataset name in a west.h5 file, a 1D or 2D numpy array, or the path to and name of a file (with a .dat, .txt, .pkl, .npz, .npy ending).

  • Yindex (int) – If Y.ndim > 2, use this to index.

  • Zname (str or array) – Target data for z axis, default None. Can be a pcoord or aux dataset name in a west.h5 file, a 1D or 2D numpy array, or the path to and name of a file (with a .dat, .txt, .pkl, .npz, .npy ending). Use this if you want to use a dataset instead of the pdist for the Z axis. This is best plotted as a scatter plot with Z as the marker color. Instead of returning the pdist, only the XYZ datasets are returned, because the weights/pdist are not considered.

  • Zindex (int) – If Z.ndim > 2, use this to index.

  • Cname (str or array) – Target data for cbar axis when using 3d projection scatter, default None. Can be a pcoord or aux dataset name in a west.h5 file, a 1D or 2D numpy array, or the path to and name of a file (with a .dat, .txt, .pkl, .npz, .npy ending).

  • Cindex (int) – If C.ndim > 2, use this to index.

  • H5save_out (str) – Path of a new H5 file in which to save the requested data. Currently the requested X, Y, or Z data is saved under a new aux dataset name. Note that if you use this feature, the input data must have the same shape and formatting as the other H5 file datasets. (TODO: organization?) This can also be the name of the output H5 file for optionally writing a new skip_basis or succ_only H5 dataset with updated weights.

  • Xsave_name, Ysave_name, Zsave_name (str) – Respective names to call the new dataset saved into the new H5 file.

  • data_proc (function or tuple of functions) – Of the form f(data) where data has rows=segments, columns=frames until tau, depth=data dims. The input function must return a processed array of the same shape and formatting.

  • first_iter (int) – First iteration of data to include in the pdist, default 1.

  • last_iter (int) – Last iteration data to include, default is the last recorded iteration in the west.h5 file. Note that instant type pdists only depend on last_iter.

  • step_iter (int) – Use only every step_iter-th iteration of the input data, e.g. step_iter=10 to use every 10th iteration.

  • bins (tuple of ints (TODO: maybe the tuple isn’t user friendly for 1 dim? Could check items like md_pdist)) – Number of histogram bins to use for the x and y datasets, default 100 for both.

  • p_units (str) – Can be ‘kT’ (default), ‘kcal’, ‘raw’, or ‘raw_norm’. kT = -lnP, kcal/mol = -RT(lnP), where RT = 0.5922 kcal/mol at T = 298 K. ‘raw’ gives the raw probabilities and ‘raw_norm’ gives the raw probabilities normalized by P(max).

  • T (int) – Temperature in Kelvin, used when p_units is ‘kcal’.

  • weighted (bool) – Default True, use WE segment weights in pdist calculation.

  • skip_basis (list) – List of binary flags (0 or 1), one per basis state, that determine whether each basis state is skipped. For example, [0, 0, 1] would consider only the trajectory data from basis states 1 and 2 and would skip basis state 3 by applying zero weights.

  • succ_only (bool) – Default False, set True to filter weights to show only successful trajectories.

  • histrange_x, histrange_y (list or tuple of 2 floats or ints) – Optional custom histogram ranges for the x and y datasets.

  • no_pbar (bool) – Optionally do not include the progress bar for pdist generation.

  • TODO (maybe also binsfromexpression?)
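The p_units conversions listed above can be illustrated with a short standalone sketch. This is not wedap's implementation: the function name, the gas constant value, and the treatment of empty bins are assumptions made for the example, and it applies the formulas exactly as stated in the parameter description (no additional shifting of the minimum).

```python
import math

# Gas constant in kcal/(mol*K); R * 298 K is approximately 0.5922 kcal/mol.
R_KCAL = 0.0019872


def convert_probabilities(probs, p_units="kT", T=298):
    """Convert raw histogram probabilities to the requested units.

    Hypothetical helper: mirrors the p_units options described above.
    """
    if p_units == "raw":
        return list(probs)
    if p_units == "raw_norm":
        # Normalize so that the most populated bin has a value of 1.
        p_max = max(probs)
        return [p / p_max for p in probs]
    if p_units == "kT":
        scale = 1.0
    elif p_units == "kcal":
        scale = R_KCAL * T
    else:
        raise ValueError(f"unknown p_units: {p_units!r}")
    # Assumption: empty bins (P = 0) are assigned an infinite value.
    return [-scale * math.log(p) if p > 0 else math.inf for p in probs]
```

For instance, a bin with P = 0.5 maps to ln(2) in kT units, and to RT·ln(2) in kcal/mol.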

Methods

__init__([h5, data_type, Xname, Xindex, ...])

Initialize this class with an h5 file and data_type.

aux_to_pdist_1d(iteration)

Take the auxiliary dataset for a single iteration and generate a weighted 1D probability distribution.

aux_to_pdist_2d(iteration)

Take the auxiliary dataset for a single iteration and generate a weighted 2D probability distribution.

average_datasets_3d([interval])

Unique case where Zname is specified and the XYZ datasets are returned.

average_datasets_4d([interval])

Unique case where Zname is specified and the XYZ datasets are returned.

average_pdist_1d()

1 dataset: average pdist for a range of iterations.

average_pdist_2d()

2 datasets: average pdist for a range of iterations.

evolution_pdist()

Returns the pdist for 1 coordinate for the range of iterations specified.

find_iter_seg_from_xy_vals(val_x, val_y)

Find and return (iter, seg) closest to input data value(s).

get_all_weights()

Returns a 1D array of the weight for every frame of each tau for all segments of all iterations specified.

get_coords(path, data_name, data_index)

Get a list of data coordinates for plotting traces.

get_full_coords(walker_tuple, data_name[, ...])

Returns a full 1D set of data for a single trace (path).

get_parents(walker_tuple)

Get parent of an input (iteration, walker).

get_total_data_array(name[, index, ...])

Loop through all iterations specified and get a 1d raw data array.

instant_datasets_3d()

Unique case where Zname is specified and the XYZ datasets are returned.

instant_pdist_1d()

Returns the x and y pdist datasets for a single iteration.

instant_pdist_2d()

Returns the xyz pdist datasets for a single iteration.

make_new_h5([new_weights])

TODO: actually make a new h5 file, see bstate filter code, integrate all.

pdist([normalize])

Main public method with pdist generation controls.

plot_trace(walker_tuple[, color, linewidth, ...])

Plot trace.

reshape_total_data_array(array)

Take an input 1d array of the data values at every segment for each iteration, and reshape them to make pdists.

succ_pdist_weight_filter()

TODO: Filter weights to be zero for all non-successful trajectories.

trace_walker(walker_tuple[, first_iter])

Get trace path of an input (iteration, walker).

w_succ()

Find and return all successfully recycled (iter, seg) pairs.
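To make the idea behind the weighted 1D pdist methods (such as aux_to_pdist_1d) concrete, here is a minimal pure-Python sketch of a weighted histogram. It is an illustration under stated assumptions, not wedap's code: the function name and arguments are hypothetical, and normalizing the result to sum to 1 is a choice made for this example. With NumPy, numpy.histogram with its weights argument does the same binning.

```python
def weighted_hist_1d(values, weights, bins, hist_range):
    """Bin values into equal-width bins, accumulating a weight per value.

    Hypothetical helper. Returns (bin_centers, probabilities), where the
    probabilities are normalized to sum to 1 (an assumption of this sketch).
    """
    lo, hi = hist_range
    width = (hi - lo) / bins
    hist = [0.0] * bins
    for value, weight in zip(values, weights):
        if value < lo or value > hi:
            continue  # ignore data outside the requested range
        # A value exactly on the upper edge is placed in the last bin.
        idx = min(int((value - lo) / width), bins - 1)
        hist[idx] += weight
    total = sum(hist)
    if total > 0:
        hist = [h / total for h in hist]
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    return centers, hist
```

Passing a weight of 1 for every value reproduces an unweighted histogram, which corresponds to the weighted=False case described in the parameters.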