wedap.h5_pdist module

Convert auxiliary data recorded during a WESTPA simulation and stored in the west.h5 file into various probability density datasets.

This script effectively replaces the need to use the native WESTPA plotting pipeline: west.h5 --w_pdist(with --construct-dataset module.py)--> pdist.h5 --plothist(with --postprocess-functions hist_settings.py)--> plot.pdf

TODO:

Maybe add an option to output the pdist as a file; this would speed up subsequent plotting of the same data. H5_Plot could then use this data.
Add a method to return the pdist of a single trace, leading into an option to plot all successful traces.

class wedap.h5_pdist.H5_Pdist(h5='west.h5', data_type=None, Xname='pcoord', Xindex=0, Yname=None, Yindex=0, Zname=None, Zindex=0, Cname=None, Cindex=0, H5save_out=None, Xsave_name=None, Ysave_name=None, Zsave_name=None, data_proc=None, first_iter=1, last_iter=None, step_iter=1, bins=(100, 100), p_units='kT', T=298, weighted=True, skip_basis=None, succ_only=False, histrange_x=None, histrange_y=None, no_pbar=False, *args, **kwargs)

Bases: object

These class methods generate probability distributions from a WESTPA H5 file.

aux_to_pdist_1d(iteration)

Take the auxiliary dataset for a single iteration and generate a weighted 1D probability distribution.

Parameters:

iteration (int) – Desired iteration to extract timeseries info from.

Returns:

midpoints_x (ndarray) – Histogram midpoint bin values for target aux coordinate of dimension 0.
midpoints_y (ndarray) – Optional histogram midpoint bin values for target aux coordinate of dimension 1.
histogram (ndarray) – Raw histogram count values of each histogram bin. Can be later normalized as -lnP(x).
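
As an illustration of what a weighted 1D distribution for one iteration can look like, the sketch below histograms each segment's values over a shared range, scales the counts by that segment's weight, and sums the results. It is a minimal sketch under stated assumptions, not the wedap implementation: the function name weighted_hist_1d, the (n_segments, n_frames) array layout, and the toy data are all hypothetical.

```python
import numpy as np


def weighted_hist_1d(aux, seg_weights, bins=100, hist_range=None):
    """Sum per-segment histograms, each scaled by its segment weight.

    aux         : array of shape (n_segments, n_frames), one row per segment
    seg_weights : array of shape (n_segments,)
    hist_range  : (min, max) shared by all segments; defaults to the data range
    """
    aux = np.asarray(aux, dtype=float)
    seg_weights = np.asarray(seg_weights, dtype=float)
    if hist_range is None:
        hist_range = (aux.min(), aux.max())

    histogram = np.zeros(bins)
    for seg_values, weight in zip(aux, seg_weights):
        # Same bins and range for every segment so the counts line up.
        counts, edges = np.histogram(seg_values, bins=bins, range=hist_range)
        histogram += counts * weight

    # Bin midpoints are the natural x axis for plotting.
    midpoints_x = (edges[:-1] + edges[1:]) / 2
    return midpoints_x, histogram


# Toy data (hypothetical): 3 segments, 4 frames each, weights summing to 1.
aux = np.array([[0.1, 0.2, 0.3, 0.4],
                [0.6, 0.7, 0.8, 0.9],
                [0.1, 0.1, 0.9, 0.9]])
weights = np.array([0.5, 0.3, 0.2])
midpoints_x, histogram = weighted_hist_1d(aux, weights, bins=2,
                                          hist_range=(0.0, 1.0))
```

With these toy values the lower bin collects weight from segments 0 and 2 and the upper bin from segments 1 and 2, so the histogram is a linear combination of the per-segment counts.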

aux_to_pdist_2d(iteration)

Take the auxiliary dataset for a single iteration and generate a weighted 2D probability distribution.

Parameters:

iteration (int) – Desired iteration to extract timeseries info from.

Returns:

midpoints_x (ndarray) – Histogram midpoint bin values for target aux coordinate of dimension 0.
midpoints_y (ndarray) – Optional histogram midpoint bin values for target aux coordinate of dimension 1.
histogram (ndarray) – Raw histogram count values of each histogram bin. Can be later normalized as -lnP(x).
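
The same idea extends to two coordinates. The sketch below uses np.histogram2d per segment and accumulates the weighted counts. The function name weighted_hist_2d, the array layout, and the axis orientation of the output (x bins along the first axis) are assumptions made for illustration; the actual method may orient or post-process the matrix differently.

```python
import numpy as np


def weighted_hist_2d(aux_x, aux_y, seg_weights, bins=(100, 100),
                     hist_range=((0.0, 1.0), (0.0, 1.0))):
    """Sum per-segment 2D histograms, each scaled by its segment weight.

    aux_x, aux_y : arrays of shape (n_segments, n_frames)
    seg_weights  : array of shape (n_segments,)
    bins         : (n_bins_x, n_bins_y)
    hist_range   : ((xmin, xmax), (ymin, ymax)) shared by all segments
    """
    aux_x = np.asarray(aux_x, dtype=float)
    aux_y = np.asarray(aux_y, dtype=float)

    histogram = np.zeros(bins)
    for x, y, weight in zip(aux_x, aux_y, seg_weights):
        counts, x_edges, y_edges = np.histogram2d(x, y, bins=bins,
                                                  range=hist_range)
        histogram += counts * weight

    midpoints_x = (x_edges[:-1] + x_edges[1:]) / 2
    midpoints_y = (y_edges[:-1] + y_edges[1:]) / 2
    # histogram[i, j] belongs to x bin i and y bin j.
    return midpoints_x, midpoints_y, histogram


# Toy data (hypothetical): 2 segments, 2 frames each.
aux_x = np.array([[0.1, 0.9], [0.1, 0.1]])
aux_y = np.array([[0.1, 0.9], [0.9, 0.9]])
weights = np.array([0.75, 0.25])
mx, my, hist2d = weighted_hist_2d(aux_x, aux_y, weights, bins=(2, 2))
```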

average_datasets_3d(interval=1)

Unique case where Zname is specified and the XYZ datasets are returned. Averaged over the iteration range.

Returns: X, Y, Z – Raw data for each named coordinate.
Return type: arrays

average_datasets_4d(interval=1)

Unique case where Zname and Cname are both specified and the XYZC (4D) datasets are returned. Averaged over the iteration range.

Returns: X, Y, Z, C – Raw data for each named coordinate.
Return type: arrays

average_pdist_1d()

1 dataset: average pdist for a range of iterations.

Returns: x, y – x and y axis values; x holds the coordinate values and y holds the probabilities.
Return type: arrays
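
The raw histogram values can be converted to the -ln P form mentioned above. The sketch below normalizes a histogram to probabilities, takes the negative natural log, and shifts the result so the most probable bin sits at zero. The function name neg_log_prob and the zero-minimum shift are assumptions for illustration; this reference does not state how wedap applies its p_units setting.

```python
import numpy as np


def neg_log_prob(histogram):
    """Convert raw histogram values to -ln(P), shifted so the minimum is 0."""
    hist = np.asarray(histogram, dtype=float)
    prob = hist / hist.sum()
    # Empty bins have P = 0 and therefore map to +inf.
    with np.errstate(divide="ignore"):
        values = -np.log(prob)
    return values - values.min()


# Toy histogram (hypothetical values); the last bin is empty.
y = neg_log_prob([1.0, 2.0, 4.0, 0.0])
```

Shifting by the minimum is equivalent to computing -ln(P / max P), so only differences between bins carry meaning.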

average_pdist_2d()

2 datasets: average pdist for a range of iterations.

Returns: x, y, norm_hist – x and y axis values, and if using Y or evolution (with only X), also returns norm_hist. norm_hist is a 2-D matrix of the normalized histogram values.
Return type: arrays

evolution_pdist()

Returns the pdist for 1 coordinate over the range of iterations specified.

Returns: x, y, norm_hist – x and y axis values, and if using Y or evolution (with only X), also returns norm_hist. norm_hist is a 2-D matrix of the normalized histogram values.
Return type: arrays

find_iter_seg_from_xy_vals(val_x, val_y)

Find and return (iter, seg) closest to input data value(s).

Parameters:

val_x (int or float) – X dataset value to search for.
val_y (int or float) – Y dataset value to search for.

Returns:

iter_num, seg_num – Iteration, segment number.

Return type:

int, int
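
One way to picture this lookup is a nearest-neighbor search over every (iteration, segment) data point. The sketch below uses squared Euclidean distance in the X/Y plane. The function name closest_iter_seg, the list-of-arrays input layout, the distance metric, and the indexing convention (iterations counted from first_iter, segments from 0) are all assumptions; the actual method may search differently.

```python
import numpy as np


def closest_iter_seg(x_by_iter, y_by_iter, val_x, val_y, first_iter=1):
    """Return the (iteration, segment) whose (x, y) is closest to the target.

    x_by_iter[i][s] is the X value of segment s in iteration first_iter + i;
    y_by_iter has the same layout. Iterations may differ in segment count.
    """
    best = None
    best_dist = np.inf
    for i, (xs, ys) in enumerate(zip(x_by_iter, y_by_iter)):
        dist = (np.asarray(xs) - val_x) ** 2 + (np.asarray(ys) - val_y) ** 2
        seg = int(np.argmin(dist))
        if dist[seg] < best_dist:
            best_dist = dist[seg]
            best = (first_iter + i, seg)
    return best


# Toy data (hypothetical): iteration 1 has 2 segments, iteration 2 has 3.
x_by_iter = [[0.0, 1.0], [2.0, 3.0, 4.0]]
y_by_iter = [[0.0, 1.0], [2.0, 3.0, 4.0]]
iter_num, seg_num = closest_iter_seg(x_by_iter, y_by_iter, 2.9, 3.2)
```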

get_all_weights()

Returns a 1D array of the weight for every frame of each tau for all segments of all iterations specified.

Returns: weights_expanded
Return type: array
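
To show what expanding weights per frame means, the sketch below repeats each segment weight once per frame and concatenates the iterations into a single 1D array. The function name expand_weights, the input layout, and the assumption that every segment has the same number of frames are hypothetical.

```python
import numpy as np


def expand_weights(weights_by_iter, n_frames):
    """Repeat each segment weight n_frames times and flatten all iterations."""
    expanded = [np.repeat(np.asarray(w, dtype=float), n_frames)
                for w in weights_by_iter]
    return np.concatenate(expanded)


# Toy data (hypothetical): 2 iterations with 2 and 3 segments, 3 frames each.
weights_by_iter = [[0.6, 0.4], [0.5, 0.3, 0.2]]
weights_expanded = expand_weights(weights_by_iter, n_frames=3)
```

The expanded array lines up element by element with a flattened raw data array of the same ordering, so it can be passed as per-point weights to a histogram function.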

get_coords(path, data_name, data_index)

Get a list of data coordinates for plotting traces. Only grabs the last frames.

Parameters:

path (list of tuples) – Tuples are (iteration, walker) traces.
data_name (str) – Name of dataset.
data_index (int) – Index of dataset.

Returns:

coordinates – Array of coordinates from the list of (iteration, walker) tuples.

Return type:

1d array

get_full_coords(walker_tuple, data_name, data_index=0, first_iter=1)

Returns a full 1D set of data for a single trace (path). This will be ordered from the first iter to the last.

Parameters:

walker_tuple (tuple) – (iteration, walker) start point to trace from.
data_name (str) – Name of dataset.
data_index (int) – Index of dataset.
first_iter (int) – Iter to trace back to. Default 1.

Returns:

coordinates – Array of coordinates from the list of (iteration, walker) tuples.

Return type:

1d array

get_parents(walker_tuple)

Get parent of an input (iteration, walker).

Parameters: walker_tuple (tuple) – (iteration, walker)
Returns: parent
Return type: iteration, walker

get_total_data_array(name, index=0, interval=1, reshape=True)

Loop through all iterations specified and get a 1d raw data array.

TODO: this could be organized better with the other methods. The helper functions for extracting and moving data around could be separated into another class (H5_Processing), so that this pdist class is used strictly for making pdists from a standard data array input.

Parameters:

name (str) – Name of data from h5 file such as pcoord or an aux dataset.
index (int) – Index of the data from h5 file.
interval (int) – If more sparse data is needed for efficiency.
reshape (bool) – Option to reshape into 1d array instead of each seg for all tau values.

Returns:

data – Raw (unweighted) data array for the name specified.

Return type:

1d array

instant_datasets_3d()

Unique case where Zname is specified and the XYZ datasets are returned. For single iteration.

Returns: X, Y, Z – Raw data for each named coordinate.
Return type: arrays

instant_pdist_1d()

Returns the x and y pdist datasets for a single iteration.

Returns: Xdata, y – x (dataset) and y (pdist) axis values.
Return type: arrays

instant_pdist_2d()

Returns the xyz pdist datasets for a single iteration.

Returns: x, y, norm_hist – x and y axis values, and if using Y or evolution (with only X), also returns norm_hist. norm_hist is a 2-D matrix of the normalized histogram values.
Return type: arrays

make_new_h5(new_weights=None)

TODO: actually make a new h5 file, see bstate filter code, integrate all.

If self.H5save_out is not None and X/Y/Zsave_name is not None, saves out a new h5 file named self.H5save_out, with the current X/Y/Zname data written into the auxdata of the h5 file under the name X/Y/Zsave_name.

Parameters: new_weights (numpy object array) – Updated weight values, e.g. from skip_basis or succ_only.

pdist(normalize=True)

Main public method with pdist generation controls.

Parameters: normalize (bool) – By default (True), normalizes the output pdist. Must be True when using multiple h5 input files.
Returns: X, Y, Z – Output probability distributions.
Return type: arrays
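
A possible way to call this method is sketched below, using only keyword arguments from the constructor signature above. Two things are assumptions rather than documented facts: the value "average" for data_type (the accepted values are not listed in this reference) and a progress coordinate with at least two dimensions, so that Yindex=1 is valid. The wrapper function name is hypothetical, and wedap is imported lazily so the function can be defined without it installed.

```python
def average_pdist_from_h5(h5_path="west.h5"):
    """Build an average 2D pdist from an existing WESTPA h5 file.

    Requires wedap and a real west.h5 file when called.
    """
    # Lazy import: only needed when the function is actually called.
    from wedap.h5_pdist import H5_Pdist

    pdist = H5_Pdist(
        h5=h5_path,
        data_type="average",        # assumed value, not listed in this reference
        Xname="pcoord", Xindex=0,
        Yname="pcoord", Yindex=1,   # assumes a pcoord with >= 2 dimensions
        first_iter=1, last_iter=None,
        bins=(100, 100),
        p_units="kT",
    )
    # normalize defaults to True, per the signature of pdist().
    X, Y, Z = pdist.pdist()
    return X, Y, Z
```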

plot_trace(walker_tuple, color='white', linewidth=1.0, linestyle='-', ax=None, find_iter_seg=False, mark_points=False, mp_size=80, mp_color=None, mp_markers=('o', 'v'), **kwargs)

Plot trace.

Parameters:

walker_tuple (tuple) – (iteration, walker) start point to trace from. Can also find the closest iteration/seg using input as (X_value,Y_value). find_iter_seg must be True to use this setting.
color (str)
linewidth (int)
linestyle (str)
ax (mpl axes object)
find_iter_seg (bool) – Default False and use walker tuple as (iter, seg). Set True to look for (iter, seg) using walker_tuple input as (X_value,Y_value).
mark_points (bool) – Default False, set to True to mark the start and end points of the trace path.
mp_size (int) – Size of the marked points, default 80.
mp_color (str) – Color of the marked points, if None, defaults to color arg.
mp_markers (tuple) – Two item tuple: start point marker style, end point marker style.
**kwargs – Passed to mpl plt.plot line plots. E.g. alpha parameter.

Returns:

aux or aux_x, aux_y – The coordinate values at each point in the trace.

Return type:

1D arrays

reshape_total_data_array(array)

Take an input 1d array of the data values at every segment for each iteration, and reshape them to make pdists.

Parameters: array (1d array) – Data values at every segment for each iteration.
Returns: array – Now rows = segments, columns = frame until tau, depth = data dimensions.
Return type: ndarray

succ_pdist_weight_filter()

TODO: Filter weights to be zero for all non-successful trajectories. Make an array of zero weights and fill in weights for successful trajectories only. Option to output a new h5?

Returns: succ_weights – Updated weight array.
Return type: numpy object array

trace_walker(walker_tuple, first_iter=1)

Get trace path of an input (iteration, walker).

Parameters:

walker_tuple (tuple) – (iteration, walker)
first_iter (int) – Iter to trace back to. Default 1.

Returns:

trace – Tuples are (iteration, walker) traces.

Return type:

list of tuples
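
Tracing amounts to following parent links backwards one iteration at a time. The sketch below does this with a plain dictionary standing in for the parent information stored in the h5 file. The function name trace_back, the dictionary layout, and the choice to return the path ordered from first_iter to the starting walker are assumptions for illustration.

```python
def trace_back(walker_tuple, parents, first_iter=1):
    """Follow parent links from (iteration, walker) back to first_iter.

    parents maps (iteration, walker) to the index of the parent walker
    in iteration - 1 (a hypothetical stand-in for the h5 parent data).
    """
    iteration, walker = walker_tuple
    trace = [(iteration, walker)]
    while iteration > first_iter:
        walker = parents[(iteration, walker)]
        iteration -= 1
        trace.append((iteration, walker))
    # Collected from last to first; reverse to get chronological order.
    trace.reverse()
    return trace


# Toy parent map (hypothetical): two walkers per iteration.
parents = {(2, 0): 1, (2, 1): 1, (3, 0): 0, (3, 1): 1}
trace = trace_back((3, 0), parents)
```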

w_succ()

Find and return all successfully recycled (iter, seg) pairs.

Returns: succ
Return type: list of tuples (iter, wlk)

wedap.h5_plot module

Main plotting class of wedap. Plot all of the datasets generated with H5_Pdist.

line – plot 1D lines.
hist – plot histogram (default).
hist_l – plot histogram and contour lines.
contour – plot contour levels and lines.
contour_f – plot filled contour levels only.
contour_l – plot contour lines only.
scatter3d – plot 3 datasets in a scatter plot.
hexbin3d – plot 3 datasets in a hexbin plot.

TODO: maybe a ridgeline plot? This could show, for example, the 1D average of every 100 iterations: https://matplotlib.org/matplotblog/posts/create-ridgeplots-in-matplotlib/ An option to overlay different datasets could be done easily with Python, but maybe a CLI option as well?

TODO: bin visualizer? and maybe show the trajectories as just dots?

class wedap.h5_plot.H5_Plot(X=None, Y=None, Z=None, plot_mode='hist', cmap=None, smoothing_level=None, color=None, ax=None, p_min=None, p_max=None, contour_interval=1, contour_levels=None, cbar_label=None, cax=None, jointplot=False, data_label=None, proj3d=False, proj4d=False, C=None, scatter_interval=10, scatter_s=1, hexbin_grid=100, linewidth=None, linestyle='-', postprocess_func=None, *args, **kwargs)

Bases: H5_Pdist

These methods provide various plotting options for pdist data.

Variables: cbar_pad (float) – Default 0.05, can update this attribute to change cbar padding.

add_cbar(cax=None, pad=0.05, fontsize=None)

Add cbar.

Parameters:

cax (mpl cbar axis) – Optionally specify the cbar axis.
pad (float) – cbar padding level.
fontsize (float) – Use custom value if fontsize is specified, otherwise use style default.

static gaussian_filter(data, sigma)

Apply Gaussian smoothing to a 2D array.

Parameters:

data (ndarray) – Input 2D array.
sigma (float) – Standard deviation of the Gaussian filter.

Returns:

Smoothed 2D array.

Return type:

ndarray

static load_module(module_name, path=None): Load and return the given module, recursively loading containing packages as necessary.

plot(cbar=True)

Main public method and master plotting run function. Parses the plot type and adds cbars, tight layout, plot options, and smoothing.

Parameters: cbar (bool) – Whether or not to include a colorbar.
Returns: self.fig, self.ax – Generates, updates, and returns figure and axes objects.
Return type: mpl figure and axes objects

plot_bar(): Simple bar plot.

plot_contour_f(): 2d contour plot, fill.

plot_contour_l(): 2d contour plot, lines.

plot_hexbin3d(gridsize=100): Hexbin plot.

plot_hist(): 2d hist plot.

plot_line(): 1d line plot.

plot_margins(): Joint plot of heatmap (pcolormesh). Must input raw probabilities from H5_Pdist(p_units = ‘raw’).

plot_scatter3d(interval=10, s=1)

3d scatter plot.

Parameters:

interval (int) – Interval to consider the XYZ datasets, increase to use less data.
s (float) – mpl scatter marker size.

wedap.h5_gif module

Helper function for making gifs.

wedap.h5_gif.make_gif(first_iter, last_iter, step_iter=1, avg_plus=100, duration=50, gif_out='example.gif', **kwargs)

Convenience function for gif making. Note that this is tailored for making average pdist plots.

Parameters:

first_iter (int) – Where to start the gif.
last_iter (int) – Where to end the gif. Make sure avg_plus + last_iter does not exceed the total number of iters available in the h5 file.
step_iter (int) – Interval for looping the first to last iter requested.
avg_plus (int) – The +range of iterations for each iter in range(first,last,step). As the loop progresses, avg_plus is added to each iter to make the range that the average pdist is taken from. Make sure avg_plus + last_iter does not exceed the total number of iters available in the h5 file. If you set avg_plus to 0, it will make instant plots of each iter in the range requested.
duration (int) – Duration in milliseconds between frames of the gif, default 50ms.
gif_out (str) – Out path to created gif file, default 'example.gif'.
**kwargs – Can be useful to input dictionary of kwargs for H5_Plot init. E.g. can put xlim, xlabel, grid, Xname, etc.
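
The relationship between first_iter, last_iter, step_iter, and avg_plus can be sketched as a list of averaging windows, one per gif frame. The function name averaging_windows and the total_iters check are hypothetical additions for illustration; whether last_iter itself is included follows the range(first,last,step) wording above and is otherwise an assumption.

```python
def averaging_windows(first_iter, last_iter, step_iter=1, avg_plus=100,
                      total_iters=None):
    """List the (start, end) iteration window averaged for each gif frame."""
    if total_iters is not None and last_iter + avg_plus > total_iters:
        raise ValueError(
            "avg_plus + last_iter exceeds the iterations available"
        )
    # With avg_plus = 0 each window collapses to a single iteration.
    return [(i, i + avg_plus)
            for i in range(first_iter, last_iter, step_iter)]


# Toy settings (hypothetical): frames start at iterations 1, 4, and 7.
windows = averaging_windows(1, 10, step_iter=3, avg_plus=5, total_iters=20)
```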