mdap.md_pdist module

Convert MD analyzed data to pdists.

TODO:

Consider adding an argument for custom weights.

class mdap.md_pdist.MD_Pdist(data_type=None, Xname=None, Xindex=1, Yname=None, Yindex=1, Zname=None, Zindex=1, Xinterval=1, Yinterval=1, Zinterval=1, data_proc=None, first_iter=1, last_iter=None, bins=(100, 100), p_units='kT', T=298, histrange_x=None, histrange_y=None, no_pbar=False, timescale=1000000, *args, **kwargs)

Bases: H5_Pdist

These class methods generate probability distributions from input MD data files.

aux_to_pdist_1d(iteration)

Take the auxiliary dataset for a single iteration and generate a weighted 1D probability distribution.

Parameters:

iteration (int) – Desired iteration to extract timeseries info from.

Returns:

  • midpoints_x (ndarray) – Histogram midpoint bin values for target aux coordinate of dimension 0.

  • midpoints_y (ndarray) – Optional histogram midpoint bin values for target aux coordinate of dimension 1.

  • histogram (ndarray) – Raw histogram count values of each histogram bin. Can be later normalized as -lnP(x).
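The weighted 1D histogram described above can be sketched with plain NumPy. This is a minimal illustration, not the method's actual implementation: the function names weighted_hist_1d and to_neg_log are hypothetical, and shifting the -ln P curve so its minimum is zero is an assumed convention.

```python
import numpy as np

def weighted_hist_1d(values, weights, bins=100, hist_range=None):
    """Hypothetical helper: weighted histogram plus bin midpoints."""
    hist, edges = np.histogram(values, bins=bins, range=hist_range, weights=weights)
    # Midpoint of each bin, i.e. the average of its left and right edge.
    midpoints = (edges[:-1] + edges[1:]) / 2
    return midpoints, hist

def to_neg_log(hist):
    """Hypothetical helper: normalize raw counts to P(x), then return -ln P(x)."""
    with np.errstate(divide="ignore", invalid="ignore"):
        neg_log = -np.log(hist / np.sum(hist))  # empty bins become +inf
    # Assumed convention: shift so the most probable bin sits at zero.
    return neg_log - np.min(neg_log)

values = np.array([0.1, 0.2, 0.2, 0.9])
weights = np.array([0.25, 0.25, 0.25, 0.25])
midpoints_x, histogram = weighted_hist_1d(values, weights, bins=2, hist_range=(0.0, 1.0))
```

Keeping the raw weighted counts separate from the -ln P conversion mirrors the note that the histogram "can be later normalized".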

aux_to_pdist_2d(iteration)

Take the auxiliary dataset for a single iteration and generate a weighted 2D probability distribution.

Parameters:

iteration (int) – Desired iteration to extract timeseries info from.

Returns:

  • midpoints_x (ndarray) – Histogram midpoint bin values for target aux coordinate of dimension 0.

  • midpoints_y (ndarray) – Optional histogram midpoint bin values for target aux coordinate of dimension 1.

  • histogram (ndarray) – Raw histogram count values of each histogram bin. Can be later normalized as -lnP(x).
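A 2D analogue can be sketched with np.histogram2d. The function name weighted_hist_2d is hypothetical, and the transpose (so that rows follow the second coordinate, as image-style plotting expects) is an assumption rather than confirmed behavior of this method.

```python
import numpy as np

def weighted_hist_2d(x, y, weights, bins=(100, 100), hist_range=None):
    """Hypothetical helper: weighted 2D histogram plus bin midpoints."""
    hist, x_edges, y_edges = np.histogram2d(
        x, y, bins=bins, range=hist_range, weights=weights
    )
    midpoints_x = (x_edges[:-1] + x_edges[1:]) / 2
    midpoints_y = (y_edges[:-1] + y_edges[1:]) / 2
    # np.histogram2d indexes the result as [x_bin, y_bin]; transpose so that
    # rows correspond to y and columns to x (assumed plotting convention).
    return midpoints_x, midpoints_y, hist.T

x = np.array([0.1, 0.1, 0.9])
y = np.array([0.1, 0.9, 0.9])
w = np.array([0.5, 0.3, 0.2])
mx, my, hist2d = weighted_hist_2d(x, y, w, bins=(2, 2), hist_range=((0, 1), (0, 1)))
```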

average_datasets_3d(interval=1)

Unique case where Zname is specified and the XYZ datasets are returned. Averaged over the iteration range.

Returns:

X, Y, Z – Raw data for each named coordinate.

Return type:

arrays

average_datasets_4d(interval=1)

Unique case where Zname is specified and the XYZ datasets are returned, averaged over the iteration range. With Cname also specified, a fourth (C) dataset is returned.

Returns:

X, Y, Z, C – Raw data for each named coordinate.

Return type:

arrays

average_pdist_1d()

1 dataset: average pdist for a range of iterations.

Returns:

x and y axis values, x is the coordinate values and y is probabilities.

Return type:

x, y
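One way to average a 1D pdist over a range of iterations is to histogram each iteration on shared bin edges and take the mean of the weighted counts. The sketch below assumes that approach; average_hist_1d and its arguments are hypothetical names, and the real method may normalize differently.

```python
import numpy as np

def average_hist_1d(per_iter_values, per_iter_weights, bins, hist_range):
    """Hypothetical helper: mean weighted histogram over several iterations."""
    # Shared bin edges so every iteration is binned identically.
    edges = np.linspace(hist_range[0], hist_range[1], bins + 1)
    total = np.zeros(bins)
    for values, weights in zip(per_iter_values, per_iter_weights):
        hist, _ = np.histogram(values, bins=edges, weights=weights)
        total += hist
    x = (edges[:-1] + edges[1:]) / 2
    return x, total / len(per_iter_values)

values = [np.array([0.1, 0.9]), np.array([0.1, 0.2])]
weights = [np.array([0.5, 0.5]), np.array([0.5, 0.5])]
x, y = average_hist_1d(values, weights, bins=2, hist_range=(0.0, 1.0))
```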

average_pdist_2d()

2 datasets: average pdist for a range of iterations.

Returns:

x and y axis values, and if using Y or evolution (with only X), also returns norm_hist. norm_hist is a 2-D matrix of the normalized histogram values.

Return type:

x, y, norm_hist

evolution_pdist()

Returns the pdist for 1 coordinate over the range of iterations specified.

Returns:

x, y, norm_hist – x and y axis values, and if using Y or evolution (with only X), also returns norm_hist. norm_hist is a 2-D matrix of the normalized histogram values.

Return type:

arrays
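An evolution-style matrix can be sketched by stacking one 1D histogram per iteration, so that each row of the matrix corresponds to one iteration. The name evolution_hist and the row ordering are assumptions made for illustration.

```python
import numpy as np

def evolution_hist(per_iter_values, per_iter_weights, bins, hist_range, first_iter=1):
    """Hypothetical helper: one weighted 1D histogram per iteration, stacked."""
    edges = np.linspace(hist_range[0], hist_range[1], bins + 1)
    rows = [
        np.histogram(values, bins=edges, weights=weights)[0]
        for values, weights in zip(per_iter_values, per_iter_weights)
    ]
    x = (edges[:-1] + edges[1:]) / 2                      # coordinate axis
    y = np.arange(first_iter, first_iter + len(rows))     # iteration axis
    return x, y, np.vstack(rows)

values = [np.array([0.1, 0.9]), np.array([0.1, 0.2])]
weights = [np.array([0.5, 0.5]), np.array([0.5, 0.5])]
x, y, hist_matrix = evolution_hist(values, weights, bins=2, hist_range=(0.0, 1.0))
```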

find_iter_seg_from_xy_vals(val_x, val_y)

Find and return (iter, seg) closest to input data value(s).

Parameters:
  • val_x (int or float) – X dataset value to search for.

  • val_y (int or float) – Y dataset value to search for.

Returns:

iter_num, seg_num – Iteration, segment number.

Return type:

int, int
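A closest-point lookup of this kind can be sketched as a nearest-neighbor search over flattened data. The sketch assumes Euclidean distance in the (X, Y) plane and parallel arrays of iteration and segment labels; closest_iter_seg and its arguments are hypothetical.

```python
import numpy as np

def closest_iter_seg(x_vals, y_vals, iter_ids, seg_ids, val_x, val_y):
    """Hypothetical helper: (iter, seg) whose (x, y) is nearest to the target."""
    # Squared Euclidean distance is enough for finding the minimum.
    dist_sq = (np.asarray(x_vals) - val_x) ** 2 + (np.asarray(y_vals) - val_y) ** 2
    idx = int(np.argmin(dist_sq))
    return int(iter_ids[idx]), int(seg_ids[idx])

x_vals = [1.0, 2.0, 3.0]
y_vals = [1.0, 2.0, 3.0]
iter_ids = [1, 1, 2]
seg_ids = [0, 1, 0]
iter_num, seg_num = closest_iter_seg(x_vals, y_vals, iter_ids, seg_ids, 2.1, 1.8)
```

If the two coordinates have very different scales, rescaling them before computing distances may give more intuitive matches.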

get_all_weights()

Returns a 1D array of the weights for every frame of each tau, for all segments of all iterations specified.

Returns:

weights_expanded

Return type:

array
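Expanding per-segment weights to per-frame weights can be sketched with np.repeat. The sketch assumes every segment stores the same number of frames per tau; expand_weights is a hypothetical name.

```python
import numpy as np

def expand_weights(seg_weights_per_iter, frames_per_seg):
    """Hypothetical helper: repeat each segment weight once per stored frame."""
    return np.concatenate(
        [np.repeat(weights, frames_per_seg) for weights in seg_weights_per_iter]
    )

# Two iterations: the first has two segments, the second has one.
weights_expanded = expand_weights([[0.6, 0.4], [1.0]], frames_per_seg=2)
```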

get_coords(path, data_name, data_index)

Get a list of data coordinates for plotting traces. Only grabs the last frames.

Parameters:
  • path (list of tuples) – Tuples are (iteration, walker) traces.

  • data_name (str) – Name of dataset.

  • data_index (int) – Index of dataset.

Returns:

coordinates – Array of coordinates from the list of (iteration, walker) tuples.

Return type:

1d array

get_full_coords(walker_tuple, data_name, data_index=0, first_iter=1)

Returns a full 1D set of data for a single trace (path). This will be ordered from the first iter to the last.

Parameters:
  • walker_tuple (tuple) – (iteration, walker) start point to trace from.

  • data_name (str) – Name of dataset.

  • data_index (int) – Index of dataset.

  • first_iter (int) – Iter to trace back to. Default 1.

Returns:

coordinates – Array of coordinates from the list of (iteration, walker) tuples.

Return type:

1d array

get_parents(walker_tuple)

Get parent of an input (iteration, walker).

Parameters:

walker_tuple (tuple) – (iteration, walker)

Returns:

parent

Return type:

iteration, walker

get_total_data_array(name, index=0, interval=1, reshape=True)

Loop through all iterations specified and get a 1d raw data array.

TODO: this could be organized better alongside the other methods. The helper functions for extracting and moving data around could be separated into another class, so that this pdist class is used strictly for making pdists from a standard data array input handled by the H5_Processing class.

Parameters:
  • name (str) – Name of data from h5 file such as pcoord or an aux dataset.

  • index (int) – Index of the data from h5 file.

  • interval (int) – If more sparse data is needed for efficiency.

  • reshape (bool) – Option to reshape into 1d array instead of each seg for all tau values.

Returns:

data – Raw (unweighted) data array for the name specified.

Return type:

1d array

instant_datasets_3d()

Unique case where Zname is specified and the XYZ datasets are returned. For single iteration.

Returns:

X, Y, Z – Raw data for each named coordinate.

Return type:

arrays

instant_pdist_1d()

Returns the x and y pdist datasets for a single iteration.

Returns:

Xdata, y – x (dataset) and y (pdist) axis values

Return type:

arrays

instant_pdist_2d()

Returns the xyz pdist datasets for a single iteration.

Returns:

x, y, norm_hist – x and y axis values, and if using Y or evolution (with only X), also returns norm_hist. norm_hist is a 2-D matrix of the normalized histogram values.

Return type:

arrays

make_new_h5(new_weights=None)

TODO: actually make a new h5 file (see the bstate filter code; integrate all). If self.H5save_out is not None and X/Y/Zsave_name is not None, saves a new h5 file named self.H5save_out, with the current X/Y/Zname data written into the auxdata of the h5 file under the name X/Y/Zsave_name.

Parameters:

new_weights (numpy object array) – Updated weight values, e.g. from skip_basis or succ_only.

pdist()

Main public method with pdist generation controls.

pdist_1d()
Returns:

  • X (ndarray)

  • Y (ndarray)

pdist_2d()
Returns:

  • X (ndarray)

  • Y (ndarray)

  • Z (ndarray)

pdist_3d()
Returns:

  • X (ndarray)

  • Y (ndarray)

  • Z (ndarray)

plot_trace(walker_tuple, color='white', linewidth=1.0, linestyle='-', ax=None, find_iter_seg=False, mark_points=False, mp_size=80, mp_color=None, mp_markers=('o', 'v'), **kwargs)

Plot trace.

Parameters:
  • walker_tuple (tuple) – (iteration, walker) start point to trace from. Can also find the closest iteration/seg using input as (X_value,Y_value). find_iter_seg must be True to use this setting.

  • color (str)

  • linewidth (float)

  • linestyle (str)

  • ax (mpl axes object)

  • find_iter_seg (bool) – Default False and use walker tuple as (iter, seg). Set True to look for (iter, seg) using walker_tuple input as (X_value,Y_value).

  • mark_points (bool) – Default False. Set to True to mark the start and end points of the trace path.

  • mp_size (int) – Size of the marked points, default 80.

  • mp_color (str) – Color of the marked points, if None, defaults to color arg.

  • mp_markers (tuple) – Two item tuple: start point marker style, end point marker style.

  • **kwargs – Passed to mpl plt.plot line plots. E.g. alpha parameter.

Returns:

aux or aux_x, aux_y – The coordinate values at each point in the trace.

Return type:

1D arrays

reshape_total_data_array(array)

Take an input 1d array of the data values at every segment for each iteration, and reshape them to make pdists.

Parameters:

array (1d array) – Data values at every segment for each iteration.

Returns:

array – Now rows = segments, columns = frame until tau, depth = data dimensions.

Return type:

ndarray
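The target layout (rows = segments, columns = frames, depth = data dimensions) can be illustrated by stacking per-iteration blocks along the segment axis. This is only a sketch under the assumption that each iteration's block already has shape (n_segments, n_frames, n_dims); stack_iterations is a hypothetical name and the actual input layout may differ.

```python
import numpy as np

def stack_iterations(per_iter_blocks):
    """Hypothetical helper: join per-iteration blocks along the segment axis."""
    return np.concatenate(per_iter_blocks, axis=0)

# Iteration 1: 2 segments; iteration 2: 1 segment; 3 frames and 1 dimension each.
blocks = [np.zeros((2, 3, 1)), np.ones((1, 3, 1))]
stacked = stack_iterations(blocks)  # shape (3, 3, 1)
```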

succ_pdist_weight_filter()

TODO: Filter weights to be zero for all unsuccessful trajectories. Make an array of zero weights and fill in weights for successful trajectories only. Option to output a new h5 file?

Returns:

succ_weights – Updated weight array.

Return type:

numpy object array
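Since this method is still marked TODO, the following is only a sketch of the idea stated above: start from zero weights and copy back the weights of selected (iter, seg) pairs. The name keep_only is hypothetical, and a fuller version would presumably also keep the ancestors of each successful segment.

```python
import numpy as np

def keep_only(weights_per_iter, keep_pairs, first_iter=1):
    """Hypothetical helper: zero every weight except the listed (iter, seg) pairs."""
    filtered = [np.zeros(len(weights), dtype=float) for weights in weights_per_iter]
    for iteration, seg in keep_pairs:
        row = iteration - first_iter  # list index of this iteration
        filtered[row][seg] = weights_per_iter[row][seg]
    return filtered

weights = [np.array([0.5, 0.5]), np.array([0.2, 0.3, 0.5])]
succ_weights = keep_only(weights, keep_pairs=[(2, 1)])
```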

timeseries()
Returns:

  • X (ndarray)

  • Y (ndarray)

trace_walker(walker_tuple, first_iter=1)

Get trace path of an input (iteration, walker).

Parameters:
  • walker_tuple (tuple) – (iteration, walker)

  • first_iter (int) – Iter to trace back to. Default 1.

Returns:

trace – Tuples are (iteration, walker) traces.

Return type:

list of tuples
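Tracing a walker back through its parents can be sketched with a simple parent lookup table. Here parents is a hypothetical dict mapping (iteration, walker) to the parent walker index in the previous iteration; how the class actually stores parent information is not specified here.

```python
def trace_back(parents, walker_tuple, first_iter=1):
    """Hypothetical helper: follow parent links back to first_iter."""
    iteration, walker = walker_tuple
    trace = [(iteration, walker)]
    while iteration > first_iter:
        walker = parents[(iteration, walker)]  # parent lives one iteration earlier
        iteration -= 1
        trace.append((iteration, walker))
    trace.reverse()  # order from the first iteration to the last
    return trace

parents = {(2, 0): 0, (2, 1): 0, (3, 0): 1}
trace = trace_back(parents, (3, 0))
# trace == [(1, 0), (2, 1), (3, 0)]
```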

w_succ()

Find and return all successfully recycled (iter, seg) pairs.

Returns:

succ

Return type:

list of tuples (iter,wlk)

mdap.md_plot module

Plot MD pdists.

TODO: specify option for:

  • timeseries (option for KDE side plot and option for stdev vs all reps)

  • pdist (1D hist, 1D KDE, + others from H5_Plot)

  • others?

class mdap.md_plot.MD_Plot(*args, **kwargs)

Bases: H5_Plot, MD_Pdist

These methods provide various plotting options for pdist data.