wedap.h5_pdist module
Convert auxiliary data recorded during a WESTPA simulation and stored in the west.h5 file into various probability density datasets.
This script removes the need for the native WESTPA plotting pipeline: west.h5 --w_pdist(with --construct-dataset module.py)--> pdist.h5 --plothist(with --postprocess-functions hist_settings.py)--> plot.pdf
- TODO:
- Maybe add an option to output the pdist as a file; this would speed up subsequent plotting of the same data. H5_Plot could then use this data.
- Add a method to return the pdist of a single trace, leading into an option to plot all successful traces.
- class wedap.h5_pdist.H5_Pdist(h5='west.h5', data_type=None, Xname='pcoord', Xindex=0, Yname=None, Yindex=0, Zname=None, Zindex=0, Cname=None, Cindex=0, H5save_out=None, Xsave_name=None, Ysave_name=None, Zsave_name=None, data_proc=None, first_iter=1, last_iter=None, step_iter=1, bins=(100, 100), p_units='kT', T=298, weighted=True, skip_basis=None, succ_only=False, histrange_x=None, histrange_y=None, no_pbar=False, *args, **kwargs)
Bases:
object
These class methods generate probability distributions from a WESTPA H5 file.
- aux_to_pdist_1d(iteration)
Take the auxiliary dataset for a single iteration and generate a weighted 1D probability distribution.
- Parameters:
iteration (int) – Desired iteration to extract timeseries info from.
- Returns:
midpoints_x (ndarray) – Histogram midpoint bin values for target aux coordinate of dimension 0.
midpoints_y (ndarray) – Optional histogram midpoint bin values for target aux coordinate of dimension 1.
histogram (ndarray) – Raw histogram count values of each histogram bin. Can be later normalized as -lnP(x).
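The weighted histogram described above can be sketched in plain Python. This is a dependency-free illustration and not the wedap implementation; the function name, the sample values, and the weights are hypothetical. In practice `numpy.histogram` with its `weights` argument does the same job.

```python
def weighted_hist_1d(values, weights, n_bins, value_range):
    """Weighted 1D histogram; returns (bin midpoints, summed weight per bin)."""
    lo, hi = value_range
    width = (hi - lo) / n_bins
    hist = [0.0] * n_bins
    for v, w in zip(values, weights):
        if v < lo or v > hi:
            continue  # ignore samples outside the histogram range
        # the right edge of the range belongs to the last bin
        idx = min(int((v - lo) / width), n_bins - 1)
        hist[idx] += w
    midpoints = [lo + (i + 0.5) * width for i in range(n_bins)]
    return midpoints, hist


# hypothetical data: four frames from one iteration and their walker weights
values = [0.1, 0.4, 0.6, 0.9]
weights = [0.5, 0.25, 0.125, 0.125]
midpoints, hist = weighted_hist_1d(values, weights, n_bins=2, value_range=(0.0, 1.0))
```

Each bin holds the summed weight of the frames that fall into it, so the histogram stays unnormalized until a later step converts it.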
- aux_to_pdist_2d(iteration)
Take the auxiliary dataset for a single iteration and generate a weighted 2D probability distribution.
- Parameters:
iteration (int) – Desired iteration to extract timeseries info from.
- Returns:
midpoints_x (ndarray) – Histogram midpoint bin values for target aux coordinate of dimension 0.
midpoints_y (ndarray) – Optional histogram midpoint bin values for target aux coordinate of dimension 1.
histogram (ndarray) – Raw histogram count values of each histogram bin. Can be later normalized as -lnP(x).
- average_datasets_3d(interval=1)
Unique case where Zname is specified and the XYZ datasets are returned. Averaged over the iteration range.
- Returns:
X, Y, Z – Raw data for each named coordinate.
- Return type:
arrays
- average_datasets_4d(interval=1)
Unique case where Zname and Cname are both specified and the X, Y, Z, and C datasets are returned (4D). Averaged over the iteration range.
- Returns:
X, Y, Z, C – Raw data for each named coordinate.
- Return type:
arrays
- average_pdist_1d()
1 dataset: average pdist for a range of iterations.
- Returns:
x and y axis values, x is the coordinate values and y is probabilities.
- Return type:
x, y
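As a rough sketch of the averaging and normalization steps, the example below assumes the average is a plain element-wise mean of per-iteration histograms on shared bins, and that the kT units correspond to -ln(P / P_max). Both are assumptions made for illustration; the helper names and data are hypothetical.

```python
import math


def average_hists(hists):
    """Element-wise mean of per-iteration histograms sharing the same bins."""
    n = len(hists)
    return [sum(col) / n for col in zip(*hists)]


def to_kT(hist):
    """Convert histogram weights to -ln(P / P_max); empty bins map to inf."""
    p_max = max(hist)
    if p_max <= 0:
        raise ValueError("histogram has no populated bins")
    return [-math.log(h / p_max) if h > 0 else math.inf for h in hist]


# hypothetical per-iteration histograms over three shared bins
per_iter = [[0.5, 0.5, 0.0], [0.75, 0.25, 0.0]]
avg = average_hists(per_iter)
energies = to_kT(avg)
```

With this convention the most populated bin sits at zero and unvisited bins are infinite, which is why plots of such data usually clip the upper limit.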
- average_pdist_2d()
2 datasets: average pdist for a range of iterations.
- Returns:
x and y axis values, and if using Y or evolution (with only X), also returns norm_hist. norm_hist is a 2-D matrix of the normalized histogram values.
- Return type:
x, y, norm_hist
- evolution_pdist()
Returns the pdist for 1 coordinate over the range of iterations specified.
- Returns:
x, y, norm_hist – x and y axis values, and if using Y or evolution (with only X), also returns norm_hist. norm_hist is a 2-D matrix of the normalized histogram values.
- Return type:
arrays
- find_iter_seg_from_xy_vals(val_x, val_y)
Find and return (iter, seg) closest to input data value(s).
- Parameters:
val_x (int or float) – X dataset value to search for.
val_y (int or float) – Y dataset value to search for.
- Returns:
iter_num, seg_num – Iteration, segment number.
- Return type:
int, int
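The lookup can be pictured as a nearest-neighbor search. The sketch below assumes a plain Euclidean distance and a hypothetical dictionary mapping (iteration, segment) to (x, y); the distance metric and data layout wedap actually uses are not stated here.

```python
def closest_iter_seg(points, val_x, val_y):
    """Return the (iteration, segment) whose (x, y) is nearest to the target.

    `points` maps (iteration, segment) -> (x, y); this layout is hypothetical.
    """
    def sq_dist(item):
        _, (x, y) = item
        return (x - val_x) ** 2 + (y - val_y) ** 2

    key, _ = min(points.items(), key=sq_dist)
    return key


points = {
    (1, 0): (0.0, 0.0),
    (1, 1): (1.0, 2.0),
    (2, 0): (3.0, 3.5),
}
iter_num, seg_num = closest_iter_seg(points, val_x=2.8, val_y=3.0)
```

If the X and Y datasets have very different scales, an unscaled distance like this one is dominated by the larger axis.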
- get_all_weights()
Returns a 1D array of the weight for every frame of each tau for all segments of all iterations specified.
- Returns:
weights_expanded
- Return type:
array
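One way to read this: each segment carries a single weight per iteration, and that weight is repeated for every frame the segment contributes. A minimal sketch, assuming every segment has the same number of frames per tau (the nested-list layout and names are hypothetical):

```python
def expand_weights(seg_weights, frames_per_seg):
    """Repeat each segment weight once per frame, flattening across iterations."""
    expanded = []
    for iteration_weights in seg_weights:
        for w in iteration_weights:
            expanded.extend([w] * frames_per_seg)
    return expanded


# hypothetical weights: two segments in iteration 1, three in iteration 2
seg_weights = [[0.5, 0.5], [0.25, 0.25, 0.5]]
weights_expanded = expand_weights(seg_weights, frames_per_seg=3)
```

The expanded array then lines up one-to-one with a flattened array of per-frame data values.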
- get_coords(path, data_name, data_index)
Get a list of data coordinates for plotting traces. Only grabs the last frames.
- Parameters:
path (list of tuples) – Tuples are (iteration, walker) traces.
data_name (str) – Name of dataset.
data_index (int) – Index of dataset.
- Returns:
coordinates – Array of coordinates from the list of (iteration, walker) tuples.
- Return type:
1d array
- get_full_coords(walker_tuple, data_name, data_index=0, first_iter=1)
Returns a full 1D set of data for a single trace (path). This will be ordered from the first iter to the last.
- Parameters:
walker_tuple (tuple) – (iteration, walker) start point to trace from.
data_name (str) – Name of dataset.
data_index (int) – Index of dataset.
first_iter (int) – Iter to trace back to. Default 1.
- Returns:
coordinates – Array of coordinates from the list of (iteration, walker) tuples.
- Return type:
1d array
- get_parents(walker_tuple)
Get parent of an input (iteration, walker).
- Parameters:
walker_tuple (tuple) – (iteration, walker)
- Returns:
parent
- Return type:
iteration, walker
- get_total_data_array(name, index=0, interval=1, reshape=True)
Loop through all iterations specified and get a 1d raw data array. TODO: this could be organized better. The helper functions for extracting and moving data around could be separated into another class (H5_Processing), so that this pdist class is used strictly for making pdists from a standard data array input.
- Parameters:
name (str) – Name of data from h5 file such as pcoord or an aux dataset.
index (int) – Index of the data from h5 file.
interval (int) – If more sparse data is needed for efficiency.
reshape (bool) – Option to reshape into 1d array instead of each seg for all tau values.
- Returns:
data – Raw (unweighted) data array for the name specified.
- Return type:
1d array
- instant_datasets_3d()
Unique case where Zname is specified and the XYZ datasets are returned. For single iteration.
- Returns:
X, Y, Z – Raw data for each named coordinate.
- Return type:
arrays
- instant_pdist_1d()
Returns the x and y pdist datasets for a single iteration.
- Returns:
Xdata, y – x (dataset) and y (pdist) axis values
- Return type:
arrays
- instant_pdist_2d()
Returns the xyz pdist datasets for a single iteration.
- Returns:
x, y, norm_hist – x and y axis values, and if using Y or evolution (with only X), also returns norm_hist. norm_hist is a 2-D matrix of the normalized histogram values.
- Return type:
arrays
- make_new_h5(new_weights=None)
TODO: actually make a new h5 file, see bstate filter code, integrate all. If self.H5save_out is not None and X/Y/Zsave_name is not None, saves out a new h5 file named self.H5save_out with the current X/Y/Zname data written into the auxdata of the h5 file under the name X/Y/Zsave_name.
- Parameters:
new_weights (numpy object array) – Updated weight values, e.g. from skip_basis or succ_only.
- pdist(normalize=True)
Main public method with pdist generation controls.
- Parameters:
normalize (bool) – By default (True), normalizes the output pdist. Must be True when using multiple h5 input files.
- Returns:
X, Y, Z – Output probability distributions.
- Return type:
arrays
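A minimal usage sketch based only on the signatures listed in this reference. The value `data_type="evolution"` is an assumption, since the accepted strings are not listed here, and `plot_evolution` is a hypothetical wrapper name.

```python
def plot_evolution(h5_path="west.h5"):
    """Build a pdist from an h5 file and plot it as a histogram."""
    # Imported lazily so this sketch can be defined without wedap installed.
    from wedap.h5_pdist import H5_Pdist
    from wedap.h5_plot import H5_Plot

    # data_type="evolution" is an assumed value; this reference does not
    # list the strings that data_type accepts.
    X, Y, Z = H5_Pdist(h5=h5_path, data_type="evolution", Xname="pcoord").pdist()

    # plot() is documented as returning the figure and axes objects.
    fig, ax = H5_Plot(X=X, Y=Y, Z=Z, plot_mode="hist").plot()
    return fig, ax
```

Calling `plot_evolution()` requires wedap and a WESTPA h5 file at the given path.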
- plot_trace(walker_tuple, color='white', linewidth=1.0, linestyle='-', ax=None, find_iter_seg=False, mark_points=False, mp_size=80, mp_color=None, mp_markers=('o', 'v'), **kwargs)
Plot trace.
- Parameters:
walker_tuple (tuple) – (iteration, walker) start point to trace from. Can also find the closest iteration/seg using input as (X_value,Y_value). find_iter_seg must be True to use this setting.
color (str)
linewidth (int)
linestyle (str)
ax (mpl axes object)
find_iter_seg (bool) – Default False and use walker tuple as (iter, seg). Set True to look for (iter, seg) using walker_tuple input as (X_value,Y_value).
mark_points (bool) – Default False, set to true to mark the starting and end points of the trace path.
mp_size (int) – Size of the marked points, default 80.
mp_color (str) – Color of the marked points, if None, defaults to color arg.
mp_markers (tuple) – Two item tuple: start point marker style, end point marker style.
**kwargs – Passed to mpl plt.plot line plots. E.g. alpha parameter.
- Returns:
aux or aux_x, aux_y – The coordinate values at each point in the trace.
- Return type:
1D arrays
- reshape_total_data_array(array)
Take an input 1d array of the data values at every segment for each iteration, and reshape them to make pdists.
- Parameters:
array (1d array) – Data values at every segment for each iteration.
- Returns:
array – Now rows = segments, columns = frame until tau, depth = data dimensions.
- Return type:
ndarray
- succ_pdist_weight_filter()
TODO: Filter weights to be zero for all unsuccessful trajectories. Make an array of zero weights and fill in weights for successful trajectories only. Option to output a new h5 file?
- Returns:
succ_weights – Updated weight array.
- Return type:
numpy object array
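The filtering idea can be sketched as follows, assuming the caller already has every (iteration, segment) pair that belongs to a successful trajectory. The nested-list weight layout and the function name are hypothetical, and this is not the wedap implementation.

```python
def filter_succ_weights(weights, succ_pairs):
    """Zero every weight except those of the listed (iteration, segment) pairs.

    `weights[i][s]` is the weight of segment s in iteration i + 1.
    """
    succ = set(succ_pairs)
    return [
        [w if (it, seg) in succ else 0.0 for seg, w in enumerate(row)]
        for it, row in enumerate(weights, start=1)
    ]


# hypothetical weights for two iterations of two segments each
weights = [[0.5, 0.5], [0.25, 0.75]]
succ_weights = filter_succ_weights(weights, [(2, 1)])
```

Because the array keeps its shape, the filtered weights can replace the original weights without changing any histogramming code.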
- trace_walker(walker_tuple, first_iter=1)
Get trace path of an input (iteration, walker).
- Parameters:
walker_tuple (tuple) – (iteration, walker)
first_iter (int) – Iter to trace back to. Default 1.
- Returns:
trace – Tuples are (iteration, walker) traces.
- Return type:
list of tuples
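Tracing amounts to following parent links backwards one iteration at a time. The sketch below uses a hypothetical dictionary in place of the parent IDs stored in the h5 file, and returns the trace ordered from the earliest iteration to the starting point, which is an assumption about ordering.

```python
def trace_back(parents, walker_tuple, first_iter=1):
    """Follow parent links from (iteration, walker) back to first_iter.

    `parents` maps (iteration, walker) -> walker index of the parent in the
    previous iteration; this lookup table is a hypothetical stand-in for the
    parent IDs stored in the h5 file.
    """
    iteration, walker = walker_tuple
    trace = [(iteration, walker)]
    while iteration > first_iter:
        walker = parents[(iteration, walker)]
        iteration -= 1
        trace.append((iteration, walker))
    trace.reverse()  # earliest iteration first
    return trace


parents = {(2, 0): 1, (2, 1): 1, (3, 0): 0, (3, 1): 0, (3, 2): 1}
trace = trace_back(parents, (3, 2))
```

Raising first_iter shortens the trace, since the loop stops as soon as it reaches that iteration.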
- w_succ()
Find and return all successfully recycled (iter, seg) pairs.
- Returns:
succ
- Return type:
list of tuples (iter,wlk)
wedap.h5_plot module
Main plotting class of wedap. Plot all of the datasets generated with H5_Pdist.
line – plot 1D lines.
hist – plot histogram (default).
hist_l – plot histogram and contour lines.
contour – plot contour levels and lines.
contour_f – plot contour levels.
contour_l – plot contour lines only.
scatter3d – plot 3 datasets in a scatter plot.
hexbin3d – plot 3 datasets in a hexbin plot.
Possible addition: a ridgeline plot, e.g. for the 1D average of every 100 iterations (https://matplotlib.org/matplotblog/posts/create-ridgeplots-in-matplotlib/). An option to overlay different datasets could be done easily with Python, but maybe also as a CLI option?
TODO: bin visualizer? and maybe show the trajectories as just dots?
- class wedap.h5_plot.H5_Plot(X=None, Y=None, Z=None, plot_mode='hist', cmap=None, smoothing_level=None, color=None, ax=None, p_min=None, p_max=None, contour_interval=1, contour_levels=None, cbar_label=None, cax=None, jointplot=False, data_label=None, proj3d=False, proj4d=False, C=None, scatter_interval=10, scatter_s=1, hexbin_grid=100, linewidth=None, linestyle='-', postprocess_func=None, *args, **kwargs)
Bases:
H5_Pdist
These methods provide various plotting options for pdist data.
- Variables:
cbar_pad (float) – Default 0.05, can update this attribute to change cbar padding.
- add_cbar(cax=None, pad=0.05, fontsize=None)
Add cbar.
- Parameters:
cax (mpl cbar axis) – Optionally specify the cbar axis.
pad (float) – cbar padding level.
fontsize (float) – Use custom value if fontsize is specified, otherwise use style default.
- static gaussian_filter(data, sigma)
Apply Gaussian smoothing to a 2D array.
- Parameters:
data (ndarray) – Input 2D array.
sigma (float) – Standard deviation of the Gaussian filter.
- Returns:
Smoothed 2D array.
- Return type:
ndarray
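For intuition, Gaussian smoothing of a 2D array can be written as two 1D passes, one along rows and one along columns. The sketch below is a dependency-free illustration with assumed choices (kernel truncated at 4 sigma, edge values clamped); it is not necessarily how this static method is implemented, and `scipy.ndimage.gaussian_filter` is the usual library routine for this task.

```python
import math


def gaussian_kernel(sigma, truncate=4.0):
    """Normalized 1D Gaussian kernel covering +/- truncate * sigma."""
    radius = int(truncate * sigma + 0.5)
    raw = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(raw)
    return [r / total for r in raw]


def smooth_rows(data, kernel):
    """Convolve each row with the kernel, clamping indices at the edges."""
    radius = len(kernel) // 2
    out = []
    for row in data:
        n = len(row)
        out.append([
            sum(kernel[k + radius] * row[min(max(j + k, 0), n - 1)]
                for k in range(-radius, radius + 1))
            for j in range(n)
        ])
    return out


def transpose(data):
    """Swap rows and columns of a rectangular list of lists."""
    return [list(col) for col in zip(*data)]


def gaussian_smooth_2d(data, sigma):
    """Separable Gaussian smoothing: filter along rows, then along columns."""
    kernel = gaussian_kernel(sigma)
    rows_done = smooth_rows(data, kernel)
    return transpose(smooth_rows(transpose(rows_done), kernel))


# a constant array should be unchanged; a single spike should spread out
flat = [[2.0] * 5 for _ in range(5)]
smoothed_flat = gaussian_smooth_2d(flat, sigma=1.0)
spike = [[0.0] * 9 for _ in range(9)]
spike[4][4] = 1.0
smoothed_spike = gaussian_smooth_2d(spike, sigma=0.5)
```

A larger sigma widens the kernel and blurs sharper features of the histogram.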
- static load_module(module_name, path=None)
Load and return the given module, recursively loading containing packages as necessary.
- plot(cbar=True)
Main public method and master plotting run function. Parses the plot type and adds cbars, tight layout, plot options, and smoothing.
- Parameters:
cbar (bool) – Whether or not to include a colorbar.
- Returns:
self.fig, self.ax – Generates, updates, and returns figure and axes objects.
- Return type:
mpl figure and axes objects
- plot_bar()
Simple bar plot.
- plot_contour_f()
2d contour plot, fill.
- plot_contour_l()
2d contour plot, lines.
- plot_hexbin3d(gridsize=100)
Hexbin plot.
- plot_hist()
2d hist plot.
- plot_line()
1d line plot.
- plot_margins()
Joint plot of heatmap (pcolormesh). Must input raw probabilities from H5_Pdist(p_units='raw').
- plot_scatter3d(interval=10, s=1)
3d scatter plot.
- Parameters:
interval (int) – Interval to consider the XYZ datasets, increase to use less data.
s (float) – mpl scatter marker size.
wedap.h5_gif module
Helper function for making gifs.
- wedap.h5_gif.make_gif(first_iter, last_iter, step_iter=1, avg_plus=100, duration=50, gif_out='example.gif', **kwargs)
Convenience function for gif making. Note that this is tailored for making average pdist plots.
- Parameters:
first_iter (int) – Where to start the gif.
last_iter (int) – Where to end the gif. Make sure that avg_plus + last_iter does not exceed the total number of iters available in the h5 file.
step_iter (int) – Interval for looping the first to last iter requested.
avg_plus (int) – The +range of iterations for each iter in range(first,last,step). As the loop progresses, avg_plus is added to each iter to make the range that the average pdist is taken from. Make sure that avg_plus + last_iter does not exceed the total number of iters available in the h5 file. If you set avg_plus to 0, it will make instant plots of each iter in the range requested.
duration (int) – Duration in milliseconds between frames of the gif, default 50ms.
gif_out (str) – Out path to created gif file, default 'example.gif'.
**kwargs – Can be useful to input dictionary of kwargs for H5_Plot init. E.g. can put xlim, xlabel, grid, Xname, etc.
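To see which iterations each gif frame would draw on, the windows implied by the parameters above can be listed ahead of time. This sketch takes the range(first, last, step) description literally; whether the window end is inclusive is an assumption, and `gif_windows` is a hypothetical helper that is not part of wedap.

```python
def gif_windows(first_iter, last_iter, step_iter=1, avg_plus=100):
    """List the (start, end) iteration window averaged for each gif frame."""
    return [(i, i + avg_plus) for i in range(first_iter, last_iter, step_iter)]


windows = gif_windows(first_iter=1, last_iter=31, step_iter=10, avg_plus=20)
```

Here the last window reaches iteration 41, so the h5 file would need at least that many iterations, which is the constraint noted for last_iter and avg_plus.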