wedap.h5_pdist module

Convert auxiliary data recorded during a WESTPA simulation and stored in a west.h5 file into various probability density datasets.

This script removes the need for the native WESTPA plotting pipeline: west.h5 --w_pdist (with --construct-dataset module.py)--> pdist.h5 --plothist (with --postprocess-functions hist_settings.py)--> plot.pdf

TODO:
  • maybe add option to output pdist as file, this would speed up subsequent plotting of the same data. H5_Plot could then use this data.

  • method to return pdist of a single trace, leading into option to plot all succ traces.

class wedap.h5_pdist.H5_Pdist(h5='west.h5', data_type=None, Xname='pcoord', Xindex=0, Yname=None, Yindex=0, Zname=None, Zindex=0, Cname=None, Cindex=0, H5save_out=None, Xsave_name=None, Ysave_name=None, Zsave_name=None, data_proc=None, first_iter=1, last_iter=None, step_iter=1, bins=(100, 100), p_units='kT', T=298, weighted=True, skip_basis=None, succ_only=False, histrange_x=None, histrange_y=None, no_pbar=False, *args, **kwargs)

Bases: object

These class methods generate probability distributions from a WESTPA H5 file.

aux_to_pdist_1d(iteration)

Take the auxiliary dataset for a single iteration and generate a weighted 1D probability distribution.

Parameters:

iteration (int) – Desired iteration to extract timeseries info from.

Returns:

  • midpoints_x (ndarray) – Histogram midpoint bin values for target aux coordinate of dimension 0.

  • midpoints_y (ndarray) – Optional histogram midpoint bin values for target aux coordinate of dimension 1.

  • histogram (ndarray) – Raw histogram count values of each histogram bin. Can be later normalized as -lnP(x).
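As an illustration of what a weighted 1D histogram with bin midpoints looks like, here is a minimal standard-library sketch. It is not the wedap implementation: the function name weighted_hist_1d, its arguments, and the choice to drop out-of-range samples are assumptions made for this example.

```python
def weighted_hist_1d(values, weights, bins, lo, hi):
    """Return (midpoints, histogram) for a weighted 1D histogram.

    Hypothetical helper for illustration; not part of wedap.
    """
    width = (hi - lo) / bins
    midpoints = [lo + (i + 0.5) * width for i in range(bins)]
    hist = [0.0] * bins
    for v, w in zip(values, weights):
        if v < lo or v > hi:
            continue  # assumption: samples outside the range are ignored
        # The right edge is inclusive, so v == hi is clamped into the last bin.
        i = min(int((v - lo) / width), bins - 1)
        hist[i] += w  # each sample contributes its weight, not a count of 1
    return midpoints, hist


values = [0.1, 0.4, 0.6, 0.9]
weights = [0.5, 0.25, 0.125, 0.125]
mids, hist = weighted_hist_1d(values, weights, bins=2, lo=0.0, hi=1.0)
```

Because each sample adds its weight rather than a count, the histogram sums to the total weight of the samples that fall inside the range.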

aux_to_pdist_2d(iteration)

Take the auxiliary dataset for a single iteration and generate a weighted 2D probability distribution.

Parameters:

iteration (int) – Desired iteration to extract timeseries info from.

Returns:

  • midpoints_x (ndarray) – Histogram midpoint bin values for target aux coordinate of dimension 0.

  • midpoints_y (ndarray) – Optional histogram midpoint bin values for target aux coordinate of dimension 1.

  • histogram (ndarray) – Raw histogram count values of each histogram bin. Can be later normalized as -lnP(x).

average_datasets_3d(interval=1)

Special case where Zname is specified: the X, Y, and Z datasets are returned, averaged over the iteration range.

Returns:

X, Y, Z – Raw data for each named coordinate.

Return type:

arrays

average_datasets_4d(interval=1)

Special case where Zname and Cname are both specified: the X, Y, Z, and C datasets (4D) are returned, averaged over the iteration range.

Returns:

X, Y, Z, C – Raw data for each named coordinate.

Return type:

arrays

average_pdist_1d()

1 dataset: average pdist for a range of iterations.

Returns:

x and y axis values, where x holds the coordinate values and y the probabilities.

Return type:

x, y

average_pdist_2d()

2 datasets: average pdist for a range of iterations.

Returns:

x and y axis values, and if using Y or evolution (with only X), also returns norm_hist. norm_hist is a 2-D matrix of the normalized histogram values.

Return type:

x, y, norm_hist

evolution_pdist()

Returns the pdist for 1 coordinate over the range of iterations specified.

Returns:

x, y, norm_hist – x and y axis values, and if using Y or evolution (with only X), also returns norm_hist. norm_hist is a 2-D matrix of the normalized histogram values.

Return type:

arrays

find_iter_seg_from_xy_vals(val_x, val_y)

Find and return (iter, seg) closest to input data value(s).

Parameters:
  • val_x (int or float) – X dataset value to search for.

  • val_y (int or float) – Y dataset value to search for.

Returns:

iter_num, seg_num – Iteration, segment number.

Return type:

int, int
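A nearest-value lookup of this kind can be sketched as follows. This assumes a Euclidean distance in the (X, Y) plane and a flat list of records; both the record layout and the helper name find_closest are hypothetical, and the actual search in wedap may differ.

```python
def find_closest(records, val_x, val_y):
    """Return the (iteration, segment) whose (x, y) is closest to the target.

    records: list of (iteration, segment, x, y) tuples (hypothetical layout).
    """
    best = min(
        records,
        key=lambda r: (r[2] - val_x) ** 2 + (r[3] - val_y) ** 2,
    )
    return best[0], best[1]


records = [
    (1, 0, 0.0, 0.0),
    (1, 1, 1.0, 2.0),
    (2, 0, 3.0, 1.0),
]
iter_num, seg_num = find_closest(records, val_x=2.8, val_y=1.2)
```

If the two datasets have very different scales, the unscaled distance used here would be dominated by the larger one, so rescaling the axes first may be appropriate.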

get_all_weights()

Returns a 1D array of the weight for every frame of each tau for all segments of all iterations specified.

Returns:

weights_expanded

Return type:

array
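One way to picture the expanded weight array: each segment has a single weight, and that weight is repeated once per saved frame so the result lines up with a flattened per-frame data array. The sketch below assumes a fixed number of frames per segment; the name expand_weights and the nested-list input are illustrative only.

```python
def expand_weights(weights_per_iter, frames_per_seg):
    """Repeat each segment weight once per frame and flatten the result.

    weights_per_iter: one list of segment weights per iteration (assumed layout).
    """
    expanded = []
    for seg_weights in weights_per_iter:
        for w in seg_weights:
            expanded.extend([w] * frames_per_seg)
    return expanded


# Two iterations: the first with 2 segments, the second with 3.
weights_expanded = expand_weights([[0.5, 0.5], [0.25, 0.25, 0.5]], frames_per_seg=2)
```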

get_coords(path, data_name, data_index)

Get a list of data coordinates for plotting traces. Only grabs the last frames.

Parameters:
  • path (list of tuples) – Tuples are (iteration, walker) traces.

  • data_name (str) – Name of dataset.

  • data_index (int) – Index of dataset.

Returns:

coordinates – Array of coordinates from the list of (iteration, walker) tuples.

Return type:

1d array

get_full_coords(walker_tuple, data_name, data_index=0, first_iter=1)

Returns a full 1D set of data for a single trace (path). This will be ordered from the first iter to the last.

Parameters:
  • walker_tuple (tuple) – (iteration, walker) start point to trace from.

  • data_name (str) – Name of dataset.

  • data_index (int) – Index of dataset.

  • first_iter (int) – Iter to trace back to. Default 1.

Returns:

coordinates – Array of coordinates from the list of (iteration, walker) tuples.

Return type:

1d array

get_parents(walker_tuple)

Get parent of an input (iteration, walker).

Parameters:

walker_tuple (tuple) – (iteration, walker)

Returns:

parent

Return type:

iteration, walker

get_total_data_array(name, index=0, interval=1, reshape=True)

Loop through all iterations specified and get a 1d raw data array.

TODO: this could be organized better with the other methods. The helper functions could be separated into another class for extracting and moving data around, so that this pdist class is used strictly for making pdists from a standard data array input handled by the H5_Processing class.

Parameters:
  • name (str) – Name of data from h5 file such as pcoord or an aux dataset.

  • index (int) – Index of the data from h5 file.

  • interval (int) – If more sparse data is needed for efficiency.

  • reshape (bool) – Option to reshape into 1d array instead of each seg for all tau values.

Returns:

data – Raw (unweighted) data array for the name specified.

Return type:

1d array

instant_datasets_3d()

Unique case where Zname is specified and the XYZ datasets are returned. For single iteration.

Returns:

X, Y, Z – Raw data for each named coordinate.

Return type:

arrays

instant_pdist_1d()

Returns the x and y pdist datasets for a single iteration.

Returns:

Xdata, y – x (dataset) and y (pdist) axis values

Return type:

arrays

instant_pdist_2d()

Returns the xyz pdist datasets for a single iteration.

Returns:

x, y, norm_hist – x and y axis values, and if using Y or evolution (with only X), also returns norm_hist. norm_hist is a 2-D matrix of the normalized histogram values.

Return type:

arrays

make_new_h5(new_weights=None)

TODO: actually make a new h5 file, see bstate filter code, integrate all. If self.H5save_out and X/Y/Zsave_name are not None, saves out a new h5 file named self.H5save_out, with the current X/Y/Zname data written into the auxdata of that file under the name X/Y/Zsave_name.

Parameters:

new_weights (numpy object array) – Updated weight values, e.g. from skip_basis or succ_only.

pdist(normalize=True)

Main public method with pdist generation controls.

Parameters:

normalize (bool) – By default (True), normalizes the output pdist. Must be True when using multiple h5 input files.

Returns:

X, Y, Z – Output probability distributions.

Return type:

arrays
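The histogram docstrings above mention normalizing as -lnP(x). A minimal sketch of that conversion is shown below, assuming the common convention of dividing by the maximum probability so that the most populated bin sits at zero; whether wedap uses exactly this convention for p_units='kT' is an assumption here, and to_free_energy is a hypothetical name.

```python
import math


def to_free_energy(hist):
    """Convert raw histogram values to -ln(P / P_max), in units of kT."""
    p_max = max(hist)
    out = []
    for p in hist:
        if p > 0:
            out.append(-math.log(p / p_max))
        else:
            out.append(math.inf)  # empty bins have zero probability
    return out


free_energy = to_free_energy([0.75, 0.25, 0.0])
```

Dividing by the maximum only shifts the curve by a constant, so relative differences between bins are unchanged.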

plot_trace(walker_tuple, color='white', linewidth=1.0, linestyle='-', ax=None, find_iter_seg=False, mark_points=False, mp_size=80, mp_color=None, mp_markers=('o', 'v'), **kwargs)

Plot trace.

Parameters:
  • walker_tuple (tuple) – (iteration, walker) start point to trace from. Can also find the closest iteration/seg using input as (X_value,Y_value). find_iter_seg must be True to use this setting.

  • color (str)

  • linewidth (int)

  • linestyle (str)

  • ax (mpl axes object)

  • find_iter_seg (bool) – Default False and use walker tuple as (iter, seg). Set True to look for (iter, seg) using walker_tuple input as (X_value,Y_value).

  • mark_points (bool) – Default False, set to true to mark the starting and end points of the trace path.

  • mp_size (int) – Size of the marked points, default 80.

  • mp_color (str) – Color of the marked points, if None, defaults to color arg.

  • mp_markers (tuple) – Two item tuple: start point marker style, end point marker style.

  • **kwargs – Passed to mpl plt.plot line plots. E.g. alpha parameter.

Returns:

aux or aux_x, aux_y – The coordinate values at each point in the trace.

Return type:

1D arrays

reshape_total_data_array(array)

Take an input 1d array of the data values at every segment for each iteration, and reshape them to make pdists.

Parameters:

array (1d array) – Data values at every segment for each iteration.

Returns:

array – Now rows = segments, columns = frame until tau, depth = data dimensions.

Return type:

ndarray

succ_pdist_weight_filter()

TODO: Filter weights to be zero for all unsuccessful trajectories. Make an array of zero weights and fill in weights for succ trajs only. Option to output new h5?

Returns:

succ_weights – Updated weight array.

Return type:

numpy object array
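The idea described above (start from zero weights and copy over only the weights of successful trajectories) might be sketched like this. The dictionary layout and the name filter_success_weights are assumptions for illustration; the sketch also takes the list of successful (iter, seg) pairs as given rather than deriving it.

```python
def filter_success_weights(weights, succ_pairs):
    """Zero every weight except those at the given (iteration, segment) pairs.

    weights: {iteration: [w_seg0, w_seg1, ...]} (hypothetical layout).
    succ_pairs: iterable of (iteration, segment) pairs on successful paths.
    """
    # Start from an all-zero copy with the same shape as the input.
    succ_weights = {it: [0.0] * len(ws) for it, ws in weights.items()}
    for it, seg in succ_pairs:
        succ_weights[it][seg] = weights[it][seg]
    return succ_weights


weights = {1: [0.5, 0.5], 2: [0.25, 0.75]}
succ_weights = filter_success_weights(weights, [(1, 1), (2, 0)])
```

The input weights are left untouched, so the filtered and unfiltered arrays can be compared afterwards.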

trace_walker(walker_tuple, first_iter=1)

Get trace path of an input (iteration, walker).

Parameters:
  • walker_tuple (tuple) – (iteration, walker)

  • first_iter (int) – Iter to trace back to. Default 1.

Returns:

trace – Tuples are (iteration, walker) traces.

Return type:

list of tuples
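Tracing a walker back through its parents can be illustrated with a plain dictionary standing in for the parent information stored in the h5 file. This is a sketch under assumptions: the parents mapping, the name trace_back, and the first-to-last ordering of the result are choices made for this example, and initial-state parents are not handled.

```python
def trace_back(parents, walker_tuple, first_iter=1):
    """Follow parent links from (iteration, walker) back to first_iter.

    parents maps (iteration, walker) -> index of the parent walker in the
    previous iteration (hypothetical stand-in for the h5 data).
    """
    iteration, walker = walker_tuple
    trace = [(iteration, walker)]
    while iteration > first_iter:
        walker = parents[(iteration, walker)]
        iteration -= 1
        trace.append((iteration, walker))
    trace.reverse()  # order the path from first_iter to the starting walker
    return trace


parents = {(2, 0): 0, (2, 1): 0, (3, 0): 1, (3, 1): 1}
trace = trace_back(parents, (3, 0))
```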

w_succ()

Find and return all successfully recycled (iter, seg) pairs.

Returns:

succ

Return type:

list of tuples (iter,wlk)

wedap.h5_plot module

Main plotting class of wedap. Plot all of the datasets generated with H5_Pdist.

  • line – plot 1D lines.

  • hist – plot histogram (default).

  • hist_l – plot histogram and contour lines.

  • contour – plot contour levels and lines.

  • contour_f – plot contour levels

  • contour_l – plot contour lines only.

  • scatter3d – plot 3 datasets in a scatter plot.

  • hexbin3d – plot 3 datasets in a hexbin plot.

maybe a ridgeline plot? - This would be maybe for 1D avg of every 100 iterations - https://matplotlib.org/matplotblog/posts/create-ridgeplots-in-matplotlib/ Option to overlay different datasets, could be done easily with python but maybe a cli option?

TODO: bin visualizer? and maybe show the trajectories as just dots?

class wedap.h5_plot.H5_Plot(X=None, Y=None, Z=None, plot_mode='hist', cmap=None, smoothing_level=None, color=None, ax=None, p_min=None, p_max=None, contour_interval=1, contour_levels=None, cbar_label=None, cax=None, jointplot=False, data_label=None, proj3d=False, proj4d=False, C=None, scatter_interval=10, scatter_s=1, hexbin_grid=100, linewidth=None, linestyle='-', postprocess_func=None, *args, **kwargs)

Bases: H5_Pdist

These methods provide various plotting options for pdist data.

Variables:

cbar_pad (float) – Default 0.05, can update this attribute to change cbar padding.

add_cbar(cax=None, pad=0.05, fontsize=None)

Add cbar.

Parameters:
  • cax (mpl cbar axis) – Optionally specify the cbar axis.

  • pad (float) – cbar padding level.

  • fontsize (float) – Use custom value if fontsize is specified, otherwise use style default.

static gaussian_filter(data, sigma)

Apply Gaussian smoothing to a 2D array.

Parameters:
  • data (ndarray) – Input 2D array.

  • sigma (float) – Standard deviation of the Gaussian filter.

Returns:

Smoothed 2D array.

Return type:

ndarray
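For readers unfamiliar with Gaussian smoothing of a 2D array, the sketch below shows the general technique with the standard library only: a normalized 1D Gaussian kernel applied along rows and then along columns. It is not the wedap implementation. The kernel truncation at 4 sigma and the edge handling (repeating the border value) are assumptions, and all function names are hypothetical.

```python
import math


def gaussian_kernel(sigma, truncate=4.0):
    """Normalized 1D Gaussian kernel, cut off at truncate * sigma."""
    radius = int(truncate * sigma + 0.5)
    kernel = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(kernel)
    return [k / total for k in kernel]


def smooth_1d(row, kernel):
    """Convolve one row with the kernel, repeating edge values at the borders."""
    radius = len(kernel) // 2
    n = len(row)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # clamp index to the array
            acc += w * row[j]
        out.append(acc)
    return out


def gaussian_smooth_2d(data, sigma):
    """Separable Gaussian smoothing: along rows first, then along columns."""
    kernel = gaussian_kernel(sigma)
    rows = [smooth_1d(row, kernel) for row in data]
    cols = [smooth_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]  # transpose back to row-major


spike = [[0.0] * 5 for _ in range(5)]
spike[2][2] = 1.0
smoothed = gaussian_smooth_2d(spike, sigma=0.5)
```

A larger sigma spreads each bin over more neighbors, which smooths noisy histograms at the cost of blurring sharp features.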

static load_module(module_name, path=None)

Load and return the given module, recursively loading containing packages as necessary.

plot(cbar=True)

Main public method. Master plotting run function: parses the plot type and adds cbars, tight layout, plot options, and smoothing.

Parameters:

cbar (bool) – Whether or not to include a colorbar.

Returns:

self.fig, self.ax – Generates, updates, and returns figure and axes objects.

Return type:

mpl figure and axes objects
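A possible end-to-end usage, based only on the signatures listed on this page, is sketched below. The value data_type="average" and the use of a second pcoord dimension for Y are assumptions that depend on the wedap version and on the contents of the h5 file, so treat this as a starting point rather than a verified recipe. The wrapper function plot_average_pdist is hypothetical.

```python
def plot_average_pdist(h5_path="west.h5"):
    """Build a 2D pdist with H5_Pdist and plot it with H5_Plot (sketch only)."""
    # Imported lazily so that defining this function does not require wedap.
    import wedap

    # Assumption: "average" is a valid data_type and pcoord has two dimensions.
    pdist = wedap.H5_Pdist(
        h5=h5_path,
        data_type="average",
        Xname="pcoord", Xindex=0,
        Yname="pcoord", Yindex=1,
    )
    X, Y, Z = pdist.pdist()

    # Hand the precomputed arrays to the plotting class.
    plotter = wedap.H5_Plot(X, Y, Z, plot_mode="hist")
    fig, ax = plotter.plot(cbar=True)
    return fig, ax
```

Calling plot_average_pdist("west.h5") requires wedap, matplotlib, and an existing WESTPA h5 file; the returned figure and axes can then be customized or saved with the usual matplotlib calls.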

plot_bar()

Simple bar plot.

plot_contour_f()

2d contour plot, fill.

plot_contour_l()

2d contour plot, lines.

plot_hexbin3d(gridsize=100)

Hexbin plot.

plot_hist()

2d hist plot.

plot_line()

1d line plot.

plot_margins()

Joint plot of heatmap (pcolormesh). Must input raw probabilities from H5_Pdist(p_units='raw').

plot_scatter3d(interval=10, s=1)

3d scatter plot.

Parameters:
  • interval (int) – Interval to consider the XYZ datasets, increase to use less data.

  • s (float) – mpl scatter marker size.

wedap.h5_gif module

Helper function for making gifs.

wedap.h5_gif.make_gif(first_iter, last_iter, step_iter=1, avg_plus=100, duration=50, gif_out='example.gif', **kwargs)

Convenience function for gif making. Note that this is tailored for making average pdist plots.

Parameters:
  • first_iter (int) – Where to start the gif.

  • last_iter (int) – Where to end the gif. Important here is that you make sure avg_plus + last_iter does not exceed the total amount of iters you have available in the h5 file.

  • step_iter (int) – Interval for looping the first to last iter requested.

  • avg_plus (int) – The +range of iterations for each iter in range(first,last,step). As the loop progresses, avg_plus is added to each iter to make the range that the average pdist is taken from. Important here is that you make sure avg_plus + last_iter does not exceed the total amount of iters you have available in the h5 file. If you set avg_plus to 0, it will make instant plots of each iter in the range requested.

  • duration (int) – Duration in milliseconds between frames of the gif, default 50ms.

  • gif_out (str) – Out path to created gif file, default 'example.gif'.

  • **kwargs – Can be useful to input dictionary of kwargs for H5_Plot init. E.g. can put xlim, xlabel, grid, Xname, etc.
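To make the interaction of first_iter, last_iter, step_iter, and avg_plus concrete, the sketch below lists the averaging window for each gif frame as described in the parameters above. The helper averaging_windows and its total_iters argument are hypothetical, and treating last_iter as exclusive follows the range(first,last,step) wording rather than a check of the actual implementation.

```python
def averaging_windows(first_iter, last_iter, step_iter=1, avg_plus=100, total_iters=None):
    """Return one (start, end) iteration window per gif frame.

    Each frame starts at an iteration from range(first_iter, last_iter, step_iter)
    and averages up to start + avg_plus.
    """
    if total_iters is not None and last_iter + avg_plus > total_iters:
        raise ValueError("last_iter + avg_plus exceeds the iterations available")
    return [(i, i + avg_plus) for i in range(first_iter, last_iter, step_iter)]


windows = averaging_windows(1, 301, step_iter=100, avg_plus=100, total_iters=500)
```

With avg_plus=0, every window collapses to a single iteration, which corresponds to the instant plots mentioned above.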