Structures

Histograms:

histogram(edges[, bins, out_of_range, ...])

A multidimensional histogram.

Histogram(edges[, bins, make_bins, weighted])

An element to produce histograms.

NumpyHistogram(*args, **kwargs)

Create a histogram using a 1-dimensional numpy.histogram.

Graph:

graph(coords[, field_names, scale])

Numeric arrays of equal size.

Graph([points, context, scale, sort])

Function at given coordinates (arbitrary dimensions).

root_graph_errors(graph[, type_code])

2-dimensional ROOT graph with errors.

ROOTGraphErrors()

Element to convert graphs to root_graph_errors.

DictTable:

dict_table(rows)

A list of dictionaries with easy selection.

DictTable()

Element to create dict_table-s.

Elements:

HistToGraph([make_value, get_coordinate, ...])

Transform a histogram to a graph.

ScaleTo(scale_to)

scale_to is the number to which the data will be scaled.

Split into bins:

IterateBins([create_edges_str, select_bins])

Iterate bins of histograms.

MapBins(seq[, select_bins, get_example_bin, ...])

Transform bin content of histograms.

SplitIntoBins(seq, arg_var, edges)

Split analysis into groups defined by bins.

Histogram functions:

HistCell(edges, bin, index)

A namedtuple with fields edges, bin, index.

cell_to_string(cell_edges[, var_context, ...])

Transform cell edges into a string.

check_edges_increasing(edges)

Ensure that multidimensional edges are increasing.

get_bin_edges(index, edges)

Return edges of the bin for the given edges of a histogram.

get_bin_on_index(index, bins)

Return bin corresponding to multidimensional index.

get_bin_on_value(arg, edges)

Get the bin index for arg in a multidimensional array edges.

get_bin_on_value_1d(val, arr)

Return index for value in one-dimensional array.

get_example_bin(struct)

Return bin with zero index on each axis of the histogram bins.

hist_to_graph(hist[, make_value, ...])

Convert a histogram to a graph.

init_bins(edges[, value, deepcopy])

Initialize cells of the form edges with the given value.

integral(bins, edges)

Compute integral (scale for a histogram).

iter_bins(bins)

Iterate on bins.

iter_bins_with_edges(bins, edges)

Generate (bin content, bin edges) pairs.

iter_cells(hist[, ranges, coord_ranges])

Iterate cells of a histogram hist, possibly in a subrange.

unify_1_md(bins, edges)

Unify 1- and multidimensional bins and edges.

Histograms

class histogram(edges, bins=None, out_of_range=None, n_out_of_range=0)[source]

A multidimensional histogram.

Arbitrary dimension, variable bin size and weights are supported. Lower bin edge is included, upper edge is excluded. Underflow and overflow values are skipped. Bin content can be of arbitrary type, which is defined during initialization.

Examples:

>>> # a one-dimensional histogram
>>> hist = histogram([1, 2, 3])
>>> hist.fill(1)
>>> hist.bins
[1, 0]
>>> # a two-dimensional histogram
>>> hist = histogram([[0, 1, 2], [0, 1, 2]])
>>> hist.fill([0, 1])
>>> hist.bins
[[0, 1], [0, 0]]

edges is a sequence of one-dimensional arrays, each containing strictly increasing bin edges. bins, if not provided, are initialized with zeroes.

A histogram bin can be any object that supports addition with weight during fill (if you plan to fill the histogram). See init_bins() on how to create initial bins manually.

A histogram can be created from existing bins and edges. In this case a simple check of the shape of bins is done, and LenaValueError is raised if it fails.

Attributes

edges is a list of edges on each dimension. Edges mark the borders of the bin. Edges along each dimension are one-dimensional lists, and the multidimensional bin is the result of all intersections of one-dimensional edges. For example, a 3-dimensional histogram has edges of the form [x_edges, y_edges, z_edges], and the 0th bin has borders ((x[0], x[1]), (y[0], y[1]), (z[0], z[1])).

Index in the edges is a tuple, where a given position corresponds to a dimension, and the content at that position to the bin along that dimension. For example, index (0, 1, 3) corresponds to the bin with lower edges (x[0], y[1], z[3]).

bins is a list of nested lists. The same index as for edges can be used to get bin content: the bin at (0, 1, 3) can be obtained as bins[0][1][3]. The most deeply nested arrays correspond to the highest (furthest from x) coordinates. For example, for a 3-dimensional histogram, bins equal to [[[1, 1], [0, 0]], [[0, 0], [0, 0]]] means that the only filled bins are those where the x and y indices are 0 and the z index is 0 or 1.
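The correspondence between a tuple index and nested bins can be sketched in plain Python. The helper bin_at below is hypothetical and not part of Lena; it only illustrates the layout described above.

```python
from functools import reduce

def bin_at(bins, index):
    """Follow a tuple index through nested lists (hypothetical helper)."""
    return reduce(lambda nested, i: nested[i], index, bins)

# 2 bins along each of x, y and z
bins = [[[1, 1], [0, 0]], [[0, 0], [0, 0]]]

# collect indices (x, y, z) of all non-empty bins
filled = [
    (ix, iy, iz)
    for ix in range(2) for iy in range(2) for iz in range(2)
    if bin_at(bins, (ix, iy, iz)) != 0
]
print(filled)  # [(0, 0, 0), (0, 0, 1)]
```

An index shorter than the dimension selects a nested sublist: bin_at(bins, (0,)) is the whole x = 0 slice.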

dim is the dimension of a histogram (length of its edges for a multidimensional histogram).

out_of_range is a list of dim lists, each one corresponding to a coordinate. Each out_of_range sublist contains two items, the first one (index 0) for underflow, the second for overflow.

If subarrays of edges are not increasing or if any of them has length less than 2, LenaValueError is raised.

Developer’s note

One- and multidimensional histograms have different formats of bins and edges. For a better user experience, 1-dimensional edges are simply a list of x-edges: this is more intuitive, and one-dimensional histograms are used more often. To be unified internally, 1-dimensional edges are nested into a list (like [[1, 2, 3]]). To unify 1- and multidimensional interfaces in your code, use unify_1_md().

__add__(other, edges_abs_tol=0.0, edges_rel_tol=1e-09)[source]

Add a histogram other to this one.

For each bin, the corresponding bin of other is added.

Histograms must have the same edges. They are compared approximately using math.isclose() with edges_abs_tol and edges_rel_tol tolerance levels. Use __sub__() for subtraction.
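A minimal sketch of such an approximate comparison is shown below. The function edges_close is a hypothetical name and assumes edges in the multidimensional format (a list of one-dimensional arrays); how Lena performs the check internally may differ.

```python
import math

def edges_close(edges1, edges2, abs_tol=0.0, rel_tol=1e-09):
    """Compare two lists of one-dimensional edge arrays approximately."""
    if len(edges1) != len(edges2):
        return False
    for axis1, axis2 in zip(edges1, edges2):
        if len(axis1) != len(axis2):
            return False
        if not all(math.isclose(a, b, abs_tol=abs_tol, rel_tol=rel_tol)
                   for a, b in zip(axis1, axis2)):
            return False
    return True

a = [[0.0, 0.1 + 0.2, 1.0]]
b = [[0.0, 0.3, 1.0]]
print(a == b)             # False: 0.1 + 0.2 is not exactly 0.3
print(edges_close(a, b))  # True
```

With a zero absolute tolerance, math.isclose matches an edge equal to zero only with an exact zero, so a non-zero abs_tol may be needed when edges are computed near zero.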

__mul__(num)[source]

Multiply bins by num and return a new histogram.

The number of events outside the histogram range is also rescaled.

A histogram should be multiplied on the right, hist2 = hist * 2. An inplace multiplication with *= is also allowed.

__sub__(other, edges_abs_tol=0.0, edges_rel_tol=1e-09)[source]

Subtract other from this histogram and return the result.

See __add__() on keyword arguments.

__eq__(other, abs_tol=0.0, rel_tol=1e-09)[source]

Two histograms are equal if and only if they have equal bins, edges and numbers of outliers.

Note that floating numbers are compared approximately using math.isclose().

fill(coord, weight=1)[source]

Fill histogram at coord with the given weight.

Coordinates outside the histogram edges are added to n_out_of_range.

No additional accounting for weights is being done. For example, to calculate a sum of squares of weights like in ROOT, do it separately in parallel.

Example:

hist.fill((0, 1), 0.5)

fills a two-dimensional histogram at the bin containing the point (0, 1) with a weight of 0.5.
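The advice to account for squared weights separately can be sketched for a one-dimensional case as follows. This is plain Python, not Lena code: fill_1d and sumw2 are hypothetical names, and out-of-range values are simply dropped here.

```python
from bisect import bisect_right

def fill_1d(edges, bins, sumw2, coord, weight=1):
    """Fill a 1-D histogram and, in parallel, the sums of squared weights."""
    # lower edge is included, upper edge is excluded
    if coord < edges[0] or coord >= edges[-1]:
        return False  # out of range: nothing is filled in this sketch
    ind = bisect_right(edges, coord) - 1
    bins[ind] += weight
    sumw2[ind] += weight ** 2
    return True

edges = [0, 1, 2, 3]
bins = [0.0, 0.0, 0.0]
sumw2 = [0.0, 0.0, 0.0]
for coord, weight in [(0.5, 0.5), (0.7, 0.5), (2, 2), (3, 1)]:
    fill_1d(edges, bins, sumw2, coord, weight)

print(bins)   # [1.0, 0.0, 2.0]
print(sumw2)  # [0.5, 0.0, 4.0]
```

The value 3 lies on the upper edge of the last bin and is therefore not filled.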

get_n_events()[source]

Return number of events in the histogram (with weights).

If the histogram was filled N times (with weights w_i), return N (sum of w_i). To get events outside of the histogram range, use n_out_of_range.

get_scale()[source]

Calculate integral of the histogram.

The integral (area under the histogram) is the sum of bin contents multiplied by bin size.
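As an illustration of this definition, the integral can be computed by hand for small one- and two-dimensional histograms. The functions below are hypothetical helpers written for this example, not Lena's own implementation.

```python
def integral_1d(bins, edges):
    """Sum of bin contents multiplied by bin widths."""
    return sum(content * (high - low)
               for content, low, high in zip(bins, edges, edges[1:]))

def integral_2d(bins, edges):
    """Sum of bin contents multiplied by bin areas."""
    x_edges, y_edges = edges
    total = 0
    for ix, row in enumerate(bins):
        dx = x_edges[ix + 1] - x_edges[ix]
        for iy, content in enumerate(row):
            dy = y_edges[iy + 1] - y_edges[iy]
            total += content * dx * dy
    return total

print(integral_1d([2, 1], [0, 1, 3]))                          # 2*1 + 1*2 = 4
print(integral_2d([[1, 0], [0, 3]], [[0, 2, 4], [0, 1, 3]]))   # 1*2*1 + 3*2*2 = 14
```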

scale_to(other, n_events=False)[source]

Return a histogram rescaled to other.

other must be a number or a structure with a method get_scale() (or get_n_events()).

If n_events is true, rescale histogram so that the resulting number of events is set to other. Otherwise rescale the integral of this histogram.

If outliers should be taken into account, use n_out_of_range and the multiplication operator directly.

Histograms with scale equal to zero cannot be rescaled; an attempt to do so raises LenaValueError.
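Rescaling the integral amounts to multiplying every bin by the ratio of the new and old scales. A minimal one-dimensional sketch under that assumption is given below; scale_bins_to is a hypothetical name, and the built-in ValueError stands in for LenaValueError.

```python
def scale_bins_to(bins, edges, new_scale):
    """Return new 1-D bins whose integral equals new_scale."""
    old_scale = sum(content * (high - low)
                    for content, low, high in zip(bins, edges, edges[1:]))
    if old_scale == 0:
        # stand-in for LenaValueError
        raise ValueError("cannot rescale a histogram with zero scale")
    factor = new_scale / old_scale
    return [content * factor for content in bins]

edges = [0, 1, 3]
scaled = scale_bins_to([2, 1], edges, 1)
print(scaled)  # [0.5, 0.25]
# the integral of the result: 0.5*1 + 0.25*2 = 1.0
```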

class Histogram(edges, bins=None, make_bins=None, weighted=False)[source]

An element to produce histograms.

edges and bins have the same meaning as during creation of a histogram.

weighted is used during fill().

make_bins is a function without arguments that creates new bins. If provided, it is called during __init__() and reset(), and the created bins are checked. If both bins and make_bins are provided, LenaTypeError is raised.

compute()[source]

Yield histogram with context.

fill(value)[source]

Fill the histogram with value.

value can be a (data, context) pair. Values outside the histogram edges are ignored. If the Histogram was initialized weighted, data is assumed to be a tuple ((bin coordinates), weight).

reset()[source]

Reset the histogram.

Current context is reset to an empty dict. Bins are reinitialized with the initial_value or with make_bins() (depending on the initialization).

class NumpyHistogram(*args, **kwargs)[source]

Create a histogram using a 1-dimensional numpy.histogram.

The result of compute is a Lena histogram, but it is calculated using numpy histogram, and all its initialization arguments are passed to numpy.

Examples

With the default arguments NumpyHistogram(), bins are automatically derived from data.

With NumpyHistogram(bins=list(range(0, 5)), density=True) bins are set explicitly.

Warning

As the numpy histogram is computed from an existing array, all values are stored in the internal data structure during fill, which can require a lot of memory.

Use *args and **kwargs for numpy.histogram initialization.

Default bins keyword argument is auto.

Deprecated since version 0.6: A keyword argument reset specifies the exact behaviour of request.

compute()[source]

Yield the computed histogram and context.

fill(val)[source]

Add data to the internal storage.

request()[source]

Deprecated since version 0.6.

Compute the final histogram.

Return histogram with context.

If reset was set during the initialization, reset method is called.

reset()[source]

Reset data and context.

Remove all data for this histogram and set current context to {}.

Graph

class graph(coords, field_names=('x', 'y'), scale=None)[source]

Numeric arrays of equal size.

This structure generally corresponds to the graph of a function and represents arrays of coordinates and the function values of arbitrary dimensions.

coords is a list of one-dimensional coordinate and value sequences (usually lists). There is little to no distinction between them, and “values” can also be called “coordinates”.

field_names provide the meaning of these arrays. For example, a 3-dimensional graph could be distinguished from a 2-dimensional graph with errors by its fields (“x”, “y”, “z”) versus (“x”, “y”, “error_y”). Field names don’t affect drawing graphs: for that Variable-s should be used. Default field names, provided for the most used 2-dimensional graphs, are “x” and “y”.

field_names can be a string separated by whitespace and/or commas or a tuple of strings, such as (“x”, “y”). field_names must have as many elements as coords, and each field name must be unique. Otherwise field names are arbitrary. Error fields must go after all other coordinates. The name of a coordinate error is “error_” followed by the coordinate name. Further error details are appended after ‘_’. They can be arbitrary depending on the problem: “low”, “high”, “low_90%_cl”, etc. Example: (“E”, “time”, “error_E_low”, “error_time”).
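The rules for field names can be sketched as a small validator. The function parse_field_names is hypothetical (it is not Lena's parser), and the built-in ValueError stands in for Lena's exceptions.

```python
import re

def parse_field_names(field_names, n_coords):
    """Split and validate field names following the rules above."""
    if isinstance(field_names, str):
        # separated by whitespace and/or commas
        names = tuple(re.findall(r"[^,\s]+", field_names))
    else:
        names = tuple(field_names)
    if len(names) != n_coords:
        raise ValueError("field_names must have as many elements as coords")
    if len(set(names)) != len(names):
        raise ValueError("field names must be unique")
    seen_error = False
    for name in names:
        if name.startswith("error_"):
            seen_error = True
        elif seen_error:
            raise ValueError("error fields must go after all other coordinates")
    return names

names = parse_field_names("E, time error_E_low,error_time", 4)
print(names)  # ('E', 'time', 'error_E_low', 'error_time')

# dimension: number of coordinates without errors
dim = sum(1 for name in names if not name.startswith("error_"))
print(dim)  # 2
```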

scale of the graph is a kind of its norm related to plotting that graph with other structures (like histograms). It could be the integral of the function or its other property. A scale of a normalised probability density function would be one. An initialized scale is required if one needs to renormalise the graph in scale_to() (for example, to plot it with other graphs).

Coordinates of a function graph would usually be arrays of increasing values, which is not required here. Neither is it checked that coordinates indeed contain one-dimensional numeric values. However, non-standard graphs will likely lead to errors during plotting and will require more programmer’s work and caution, so use them only if you understand what you are doing.

A graph can be iterated yielding tuples of numbers for each point.

Attributes

coords is a list of one-dimensional lists of coordinates.

field_names

dim is the dimension of the graph, that is of all its coordinates without errors.

In case of incorrect initialization arguments, LenaTypeError or LenaValueError is raised.

Added in version 0.5.

rows()[source]

Return an iterable on rows of the graph.

Each row is a tuple of graph coordinates. Row representation is used to create tables in output.ToCSV.

scale_to(other)[source]

Return a new graph rescaled to other.

If a numeric other is provided, rescale to that value. If the graph has unknown or zero scale, rescaling it raises LenaValueError.

To get meaningful results, graph’s fields are used. Only the last coordinate is rescaled. For example, if the graph has x and y coordinates, then y will be rescaled, and for a 3-dimensional graph z will be rescaled. All errors are rescaled together with their coordinate.
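The selection of fields to rescale can be sketched as follows. This is an independent illustration under the naming rules for error fields given above, not Lena's implementation; rescale_last_coordinate is a hypothetical name and ValueError stands in for LenaValueError.

```python
def rescale_last_coordinate(coords, field_names, old_scale, new_scale):
    """Rescale the last non-error coordinate together with its errors."""
    if not old_scale:  # None (unknown) or zero
        raise ValueError("graph scale is unknown or zero")
    factor = new_scale / old_scale
    plain = [name for name in field_names if not name.startswith("error_")]
    last = plain[-1]
    error_prefix = "error_" + last
    result = []
    for name, values in zip(field_names, coords):
        is_error_of_last = (name == error_prefix
                            or name.startswith(error_prefix + "_"))
        if name == last or is_error_of_last:
            result.append([value * factor for value in values])
        else:
            result.append(list(values))
    return result

coords = [[0, 1, 2], [10, 20, 30], [1, 2, 3]]
rescaled = rescale_last_coordinate(coords, ("x", "y", "error_y"), 2, 1)
print(rescaled)
# [[0, 1, 2], [5.0, 10.0, 15.0], [0.5, 1.0, 1.5]]
```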

class Graph(points=None, context=None, scale=None, sort=True)[source]

Function at given coordinates (arbitrary dimensions).

Graph points can be set during the initialization and during fill(). It can be rescaled (producing a new Graph). A point is a tuple of (coordinate, value), where both coordinate and value can be tuples of numbers. Coordinate corresponds to a point in N-dimensional space, while value is some function’s value at this point (the function can take a value in M-dimensional space). Coordinate and value dimensions must be the same for all points.

One can get graph points as Graph.points attribute. They will be sorted each time before return if sort was set to True. An attempt to change points (use Graph.points on the left of ‘=’) will raise Python’s AttributeError.

points is an array of (coordinate, value) tuples.

context is the same as the most recent context during fill. Use it to provide a context when initializing a Graph from existing points.

scale sets the scale of the graph. It is used during plotting if rescaling is needed.

Graph coordinates are sorted by default. This is usually needed to plot graphs of functions. If you need to keep the order of insertion, set sort to False.

By default, sorting is done using standard Python lists and functions. You can disable sort and provide your own sorting container for points. Some implementations are compared here. Note that a rescaled graph uses a default list.

Note that Graph does not reduce data. All filled values will be stored in it. To reduce data, use histograms.

compute()[source]

Yield graph with context.

If sort was initialized True, graph points will be sorted.

fill(value)[source]

Fill the graph with value.

Value can be a (data, context) tuple. Data part must be a (coordinates, value) pair, where both coordinates and value are also tuples. For example, value can contain the principal number and its precision.

property points

Get graph points.

reset()[source]

Reset points to an empty list and current context to an empty dict.

rows()[source]

Return an iterable on rows of the Graph (used to create tables in output.ToCSV).

Each row is a flat tuple of coordinates and their values.

scale(other=None)[source]

Get or set the scale.

Graph’s scale comes from an external source. For example, if the graph was computed from a function, this may be its integral passed via context during fill(). Once the scale is set, it is stored in the graph. If one attempts to use scale which was not set, LenaAttributeError is raised.

If other is None, return the scale.

If a float other is provided, rescale to other. A new graph with the scale equal to other is returned, the original one remains unchanged. Note that in this case its points will be a simple list and new graph sort parameter will be True.

Graphs with scale equal to zero can’t be rescaled. Attempts to do that raise LenaValueError.

class root_graph_errors(graph, type_code='d')[source]

2-dimensional ROOT graph with errors.

This is an adapter for TGraphErrors and contains that graph as a field root_graph.

graph is a Lena graph.

type_code is the basic numeric type of array values (by default double). ‘f’ means floating values. See Python module array for more options.

Added in version 0.5.

class ROOTGraphErrors[source]

Element to convert graphs to root_graph_errors.

__call__(value)[source]

Convert data part of the value (which must be a graph) to root_graph_errors.

Added in version 0.5.

DictTable

A dict_table facilitates template rendering of a large number of values. First, one combines the computed values into a table with the element DictTable. Data parts of the values should be dictionaries; context parts will be automatically merged into them. The computed dict_table can be easily used in a LaTeX template. A simplified example of a jinja2 template for a table with three fit coefficients for six data samples:

\begin{tabular}{ll*{3}{c}}
\toprule
    &   & $a_0$ & $a_1$ & $a_2$ \\
\BLOCK{ for detector in ("FDI", "FDII", "ND") }
\midrule
\BLOCK{ for data_type in ("data", "MC") }
\BLOCK{ set dt = context["data.detector." + detector]["data.data_type." + data_type]["dt"] }
\VAR{ detector if loop.index == 1 } & \VAR{ data_type } & \VAR{ dt.coefs[0] -}
\VAR{ " &" } \VAR{ dt.coefs[1] } & \VAR{ dt.coefs[2] -} \\
\BLOCK{ endfor }
\BLOCK{ endfor }
\bottomrule
\end{tabular}

Note how the sample is selected for each row with context["data.detector." + detector]["data.data_type." + data_type], and the nested object with needed coefficients is retrieved with its key dt.

See also

More sophisticated selections are supported by pandas.DataFrame.loc; however, they require flattened structures and indexing the rows.

class dict_table(rows)[source]

A list of dictionaries with easy selection.

The table content is a list of rows, where each row is a dictionary. A row typically contains context metadata with values of interest.

Example:

>>> d1 = {"detector": "FD",  "mean": 3, "mean_err": 2}
>>> d2 = {"detector": "ND",  "mean": 2.5, "mean_err": 3}
>>>
>>> dt = dict_table([d1, d2])
>>> # filtering rows returns a dictionary
>>> dt["detector.FD"]  # d1
{'detector': 'FD', 'mean': 3, 'mean_err': 2}
>>> dt["detector.FD"]["mean"]
3

One can iterate over a dict_table (yielding its rows), get its length and test its truth value. Table length is the number of rows it contains. Tables with equal rows in the same order are considered equal. Sort rows in advance if needed for equality testing.

__getitem__(key)[source]

Get a selection of rows or columns of the table.

If the key corresponds to an item (“detector.FD”), return a selection with filter_rows(). If the key has a value (“mean”), return filter_columns() with key. In case of errors, use the corresponding functions.

Example: dt["detector.FD"]["mean"] returns 3, dt["detector"] returns ["FD", "ND"].

filter_columns(key)[source]

Get the value of the key from the table.

key is a string. If it contains a dot, it corresponds to a nested dictionary.

Note

key is the same as an item in filter_rows() with a semantic distinction: an item is typically a fixed expression for filtering (“detector”, “ND”), while a key is for getting calculated values (“variable”, float); we omit the unknown last part of the key during the retrieval.

If the table contains one element, return the value corresponding to the key. If there are several elements with key, return a list of those values. If no rows have been found, raise LenaKeyError.

Example:

>>> # boolean values can be used as strings
>>> dt["accepted.True"]
>>> dt["mean"]
[3, 2.5]

Returned list informs us that we forgot to make a proper row selection before getting the value. We have to update our selections when extending the results with new values (adding new filters, data sources, etc.).

filter_rows(item)[source]

Get a subtable (slice) of rows containing item.

item is a string corresponding to an item of a Python dict (a (key, value) pair) joined by a dot. Dictionary items can be nested. Example: filter_rows("data.detector.FD").

Return a new non-empty dict_table. If it contains only one element, return that element instead. If no elements have been found, LenaKeyError is raised.
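The selection logic can be illustrated on a plain list of dictionaries. The function select_rows below is a hypothetical stand-in written for this example: it returns a list instead of a dict_table, raises the built-in KeyError instead of LenaKeyError, and assumes that the value (the last part of the item) contains no dots.

```python
def select_rows(rows, item):
    """Select rows containing item, given as "key[.subkey...].value"."""
    *keys, value = item.split(".")
    selected = []
    for row in rows:
        current = row
        for key in keys:
            if not isinstance(current, dict) or key not in current:
                break
            current = current[key]
        else:
            # values are compared as strings, so "True" matches True
            if str(current) == value:
                selected.append(row)
    if not selected:
        raise KeyError(item)
    # a single match is returned as the element itself
    return selected[0] if len(selected) == 1 else selected

rows = [
    {"data": {"detector": "FD"}, "mean": 3},
    {"data": {"detector": "ND"}, "mean": 2.5},
    {"data": {"detector": "ND"}, "mean": 2.0},
]
print(select_rows(rows, "data.detector.FD"))
# {'data': {'detector': 'FD'}, 'mean': 3}
print(len(select_rows(rows, "data.detector.ND")))  # 2
```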

class DictTable[source]

Element to create dict_table-s.

Example:

s = Source(
    # read data
    Sum(),
    # put data part into a dictionary
    # thus naming incoming values
    Data(lambda val: {"sum": val}),
    DictTable(),
    # put dict_table into a subcontext for jinja
    Data(lambda val: {"context": val}),
    # use a custom function to render LaTeX
    # from data part of values
    # my_render_latex("mytable.tex"),
    # write rendered tables
)

compute()[source]

Yield dict_table with context from filled values.

Resulting context is the intersection of all contexts. Each row in the table retains its own context.

fill(value)[source]

Update data part of value with its context part and add to the table.

Elements

class HistToGraph(make_value=None, get_coordinate='left', field_names=('x', 'y'), scale=None)[source]

Transform a histogram to a graph.

make_value is a Variable that creates graph value from the bin value.

get_coordinate defines the coordinate of the graph point. By default, it is the left bin edge. Other allowed values are “right” and “middle”.

field_names set field names of resulting graphs.

scale sets scales of resulting graphs. If it is True, the scale is computed from the histogram.

See hist_to_graph() for details and examples.

Incorrect values for make_value or get_coordinate raise, respectively, LenaTypeError or LenaValueError.

Added in version 0.3.

run(flow)[source]

Iterate the flow and transform histograms to graphs.

context.value is updated with make_value context. If histogram bins contained context (which is assumed to be the same for all bins), make_value context is composed with that.

Values that are not histograms, and histograms with context.histogram.to_graph set to False, pass unchanged.

class ScaleTo(scale_to)[source]

scale_to is the number to which the data will be scaled.

To scale a group of values, use lena.flow.GroupScale.

__call__(value)[source]

Scale the data part of the value.

If the structure has zero or unknown scale, LenaValueError or LenaAttributeError will be raised.

Split into bins

Split analysis into groups defined by bins.

class IterateBins(create_edges_str=None, select_bins=None)[source]

Iterate bins of histograms.

create_edges_str is a callable that creates a string from bin’s edges and coordinate names and adds that to the context. It is passed parameters (edges, var_context), where var_context is variable context containing variable names (it can be a single Variable or Combine). By default it is cell_to_string().

select_bins is a callable used to test bin contents. By default, only those histograms are iterated where bins contain histograms. Use select_bins to choose other classes. See Selector for examples.

If create_edges_str is not callable, LenaTypeError is raised.

run(flow)[source]

Yield histogram bins one by one.

For each histogram from the flow, if its bins pass select_bins, they are iterated.

The resulting context is taken from bin’s context. Histogram’s context is preserved in context.bins. context.bin is updated with “edges” (with bin edges) and “edges_str” (their representation). If histogram’s context contains variable, that is used for edges’ representation.

Values that are not histograms pass unchanged.

class MapBins(seq, select_bins=<function MapBins.<lambda>>, get_example_bin=<function get_example_bin>, drop_bins_context=True)[source]

Transform bin content of histograms.

This class can be used when histogram bins contain complex structures. For example, in order to plot a histogram with a 3-dimensional vector in each bin, one can create 3 histograms corresponding to the vector’s components.

seq is a sequence or an element applied to bin contents. If seq is not a Sequence or an element with run method, it is converted to a Sequence. Example: seq=Split([X(), Y(), Z()]) (provided that you have X, Y, Z variables).

If select_bins applied to histogram bins is True (tested on an arbitrary bin), the histogram is transformed. Bin types can be given in a list or as a general Selector. For example, select_bins=[lena.math.vector3, list] selects histograms where bins are vectors or lists. By default all histograms are accepted.

The “arbitrary bin” is returned by a callable get_example_bin (by default get_example_bin()).

MapBins creates histograms that may be plotted, because their bins contain only data without context. If drop_bins_context is False, context remains in bins. By default, context of all histogram bins is discarded. This discourages compositions of MapBins: make compositions of their internal sequences instead.

In case of incorrect arguments, LenaTypeError is raised.

run(flow)[source]

Transform histograms from flow.

context.value is updated with bin context (if that exists). It is assumed that all bins have the same context (because they were produced by the same sequence), therefore an arbitrary bin is taken and contexts of all other bins are ignored.

Values that are not selected pass unchanged.

class SplitIntoBins(seq, arg_var, edges)[source]

Split analysis into groups defined by bins.

seq is a FillComputeSeq sequence (or will be converted to that) that corresponds to the analysis being performed for different bins. Deep copy of seq is done for each bin.

arg_var is a Variable that takes data and returns value used to compute the bin index. Example of a two-dimensional function: arg_var = lena.variables.Variable("xy", lambda event: (event.x, event.y)).

edges is a sequence of arrays containing monotonically increasing bin edges along each dimension. Example: edges = lena.math.mesh((0, 1), 10).

Note

The final histogram may contain vectors, histograms and any other data the analysis produced. To plot them, one can extract vector components with e.g. MapBins. If bin contents are histograms, they can be yielded one by one with IterateBins.

Attributes: bins, edges.

If edges are not increasing, LenaValueError is raised. In case of other argument initialization problems, LenaTypeError is raised.

compute()[source]

Yield a (histogram, context) pair for each compute() for all bins.

The histogram is created from edges with bin contents taken from compute() for bins. Computational context is preserved in histogram’s bins.

SplitIntoBins adds context as a subcontext variable (corresponding to arg_var). This allows unification of SplitIntoBins with common analysis using variables (useful when creating plots from one template). Existing context values are preserved.

Note

In Python 3 the minimum number of compute() among all bins is used. In Python 2, if some bin is exhausted before the others, its content will be filled with None.

fill(val)[source]

Fill the cell corresponding to arg_var(val) with val.

Values outside the edges are ignored.

Histogram functions

Functions for histograms.

These functions are used for low-level work with histograms and their contents. They are not needed for normal usage.

class HistCell(edges, bin, index)[source]

A namedtuple with fields edges, bin, index.

Create new instance of HistCell(edges, bin, index)

cell_to_string(cell_edges, var_context=None, coord_names=None, coord_fmt='{}_lte_{}_lt_{}', coord_join='_', reverse=False)[source]

Transform cell edges into a string.

cell_edges is a tuple of pairs (lower bound, upper bound) for each coordinate.

coord_names is a list of coordinate names.

coord_fmt is a string, which defines how to format individual coordinates.

coord_join is a string, which joins coordinate pairs.

If reverse is True, coordinates are joined in reverse order.

check_edges_increasing(edges)[source]

Ensure that multidimensional edges are increasing.

If length of edges or its subarray is less than 2 or if some subarray of edges contains not strictly increasing values, LenaValueError is raised.
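The two conditions on subarrays can be sketched for edges in the multidimensional format (a list of one-dimensional arrays). The function check_edges is a hypothetical illustration, with ValueError standing in for LenaValueError.

```python
def check_edges(edges):
    """Raise ValueError unless every subarray is long enough and increasing."""
    for axis in edges:
        if len(axis) < 2:
            raise ValueError("each edges array must have at least 2 items")
        if any(low >= high for low, high in zip(axis, axis[1:])):
            raise ValueError("edges must be strictly increasing")

check_edges([[0, 1, 2], [0, 0.5]])   # passes silently

for bad in ([[0, 1, 1]], [[0]], [[0, 1], [2, 1]]):
    try:
        check_edges(bad)
    except ValueError as err:
        print(bad, "->", err)
```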

get_bin_edges(index, edges)[source]

Return edges of the bin for the given edges of a histogram.

In the one-dimensional case index must be an integer, and a tuple of (x_low_edge, x_high_edge) for that bin is returned.

In a multidimensional case index is a container of numeric indices in each dimension. A list of bin edges in each dimension is returned.
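Both cases can be sketched in a few lines. The function bin_edges is a hypothetical illustration of the described behaviour; the exact container types returned by Lena may differ.

```python
def bin_edges(index, edges):
    """Return the borders of the bin with the given index."""
    if isinstance(index, int):
        # one-dimensional case: edges is a flat list
        return (edges[index], edges[index + 1])
    # multidimensional case: one (low, high) pair per dimension
    return [(axis[i], axis[i + 1]) for i, axis in zip(index, edges)]

print(bin_edges(1, [0, 1, 3]))                          # (1, 3)
print(bin_edges((0, 1), [[0, 1, 2], [10, 20, 40]]))     # [(0, 1), (20, 40)]
```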

get_bin_on_index(index, bins)[source]

Return bin corresponding to multidimensional index.

index can be a number or a list/tuple. If index length is less than dimension of bins, a subarray of bins is returned.

In case of an index error, LenaIndexError is raised.

Example:

>>> from lena.structures import histogram, get_bin_on_index
>>> hist = histogram([0, 1], [0])
>>> get_bin_on_index(0, hist.bins)
0
>>> get_bin_on_index((0, 1), [[0, 1], [0, 0]])
1
>>> get_bin_on_index(0, [[0, 1], [0, 0]])
[0, 1]

get_bin_on_value(arg, edges)[source]

Get the bin index for arg in a multidimensional array edges.

arg is a 1-dimensional array of numbers (or a number for 1-dimensional edges), and corresponds to a point in N-dimensional space.

edges is an array of N one-dimensional arrays (lists or tuples) of numbers. Each 1-dimensional subarray consists of increasing numbers.

arg and edges must have the same length (otherwise LenaValueError is raised). arg and edges must be iterable and support len().

Return list of indices in edges corresponding to arg.

If any coordinate is out of its corresponding edge range, its index will be -1 for underflow or len(edge)-1 for overflow.
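The index convention (lower edge included, -1 for underflow, len(edge)-1 for overflow) coincides with what bisect_right from the standard library gives, minus one. The sketch below uses that observation; find_bin_index is a hypothetical name, and this is not necessarily how Lena performs the search.

```python
from bisect import bisect_right
from numbers import Number

def find_bin_index(arg, edges):
    """Return a list of bin indices for the point arg."""
    if isinstance(edges[0], Number):
        # one-dimensional edges: unify with the multidimensional format
        arg, edges = [arg], [edges]
    if len(arg) != len(edges):
        raise ValueError("arg and edges must have the same length")
    return [bisect_right(axis, x) - 1 for x, axis in zip(arg, edges)]

edges = [[1, 2, 3], [1, 3.5]]
print(find_bin_index((1.5, 2), edges))  # [0, 0]
print(find_bin_index((1.5, 0), edges))  # [0, -1]  underflow in y
print(find_bin_index((3, 2), edges))    # [2, 0]   overflow in x
print(find_bin_index(2, [1, 2, 3]))     # [1]
```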

Examples:

>>> from lena.structures import get_bin_on_value
>>> edges = [[1, 2, 3], [1, 3.5]]
>>> get_bin_on_value((1.5, 2), edges)
[0, 0]
>>> get_bin_on_value((1.5, 0), edges)
[0, -1]
>>> # the upper edge is excluded
>>> get_bin_on_value((3, 2), edges)
[2, 0]
>>> # one-dimensional edges
>>> edges = [1, 2, 3]
>>> get_bin_on_value(2, edges)
[1]

get_bin_on_value_1d(val, arr)[source]

Return index for value in one-dimensional array.

arr must contain strictly increasing values (not necessarily equidistant); this is not checked.

“Linear binary search” is used; that is, the search assumes by default that the array is split into equidistant steps.
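One way to read this description is a bisection in which the next probe is guessed by linear interpolation rather than taken at the midpoint. The sketch below implements that idea under this assumption; find_bin_1d is a hypothetical name and Lena's actual algorithm may differ in details.

```python
def find_bin_1d(val, arr):
    """Return the index i such that arr[i] <= val < arr[i+1]."""
    if val < arr[0]:
        return -1              # underflow
    if val >= arr[-1]:
        return len(arr) - 1    # overflow (upper edge is excluded)
    low, high = 0, len(arr) - 1
    # invariant: arr[low] <= val < arr[high]
    while high - low > 1:
        # linear guess, exact if the steps are equidistant
        fraction = (val - arr[low]) / (arr[high] - arr[low])
        probe = low + int(fraction * (high - low))
        # keep the probe strictly inside (low, high) to guarantee progress
        probe = min(max(probe, low + 1), high - 1)
        if arr[probe] <= val:
            low = probe
        else:
            high = probe
    return low

arr = [0, 1, 4, 5, 7, 10]
print(find_bin_1d(0, arr))     # 0
print(find_bin_1d(4.5, arr))   # 2
print(find_bin_1d(10, arr))    # 5
print(find_bin_1d(-10, arr))   # -1
```

For equidistant arrays the first guess lands next to the right bin, while strongly non-uniform arrays may need more steps than a plain binary search.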

Example:

>>> from lena.structures import get_bin_on_value_1d
>>> arr = [0, 1, 4, 5, 7, 10]
>>> get_bin_on_value_1d(0, arr)
0
>>> get_bin_on_value_1d(4.5, arr)
2
>>> # upper range is excluded
>>> get_bin_on_value_1d(10, arr)
5
>>> # underflow
>>> get_bin_on_value_1d(-10, arr)
-1

get_example_bin(struct)[source]

Return bin with zero index on each axis of the histogram bins.

For example, if the histogram is two-dimensional, return hist[0][0].

struct can be a histogram or an array of bins.

hist_to_graph(hist, make_value=None, get_coordinate='left', field_names=('x', 'y'), scale=None)[source]

Convert a histogram to a graph.

make_value is a function to set the value of a graph’s point. By default it is bin content. make_value accepts a single value (bin content) without context.

This option could be used to create graph’s error bars. For example, to create a graph with errors from a histogram where bins contain a named tuple with fields mean, mean_error and a context one could use

>>> make_value = lambda bin_: (bin_.mean, bin_.mean_error)

get_coordinate defines what the coordinate of a graph point created from a histogram bin will be. It can be “left” (default), “right” and “middle”.

field_names set field names of the graph. Their number must be the same as the dimension of the result. For a make_value above they would be (“x”, “y_mean”, “y_mean_error”).

scale becomes the graph’s scale (unknown by default). If it is True, it uses the histogram scale.

hist must contain only numeric bins (without context) or make_value must remove context when creating a numeric graph.

Return the resulting graph.
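The roles of get_coordinate and make_value can be sketched for a one-dimensional histogram. The function below is a hypothetical illustration that returns plain coordinate lists rather than a Lena graph; the named tuple Stat and its values are invented for the example.

```python
from collections import namedtuple

def hist_1d_to_coords(bins, edges, get_coordinate="left", make_value=None):
    """Return [x values, value column 1, ...] for a 1-D histogram."""
    if get_coordinate == "left":
        xs = list(edges[:-1])
    elif get_coordinate == "right":
        xs = list(edges[1:])
    elif get_coordinate == "middle":
        xs = [0.5 * (low + high) for low, high in zip(edges, edges[1:])]
    else:
        raise ValueError('get_coordinate must be "left", "right" or "middle"')
    values = [make_value(b) if make_value else b for b in bins]
    if values and isinstance(values[0], tuple):
        # a tuple value contributes several columns (e.g. value and error)
        columns = [list(column) for column in zip(*values)]
    else:
        columns = [values]
    return [xs] + columns

Stat = namedtuple("Stat", "mean mean_error")
bins = [Stat(1.0, 0.1), Stat(2.0, 0.2)]
coords = hist_1d_to_coords(
    bins, [0, 1, 3], get_coordinate="middle",
    make_value=lambda bin_: (bin_.mean, bin_.mean_error),
)
print(coords)  # [[0.5, 2.0], [1.0, 2.0], [0.1, 0.2]]
```

The three resulting columns would correspond to field names such as (“x”, “y_mean”, “y_mean_error”).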

init_bins(edges, value=0, deepcopy=False)[source]

Initialize cells of the form edges with the given value.

Return bins filled with copies of value.

Value must be copyable; ordinary numbers are suitable. If the value is mutable, use deepcopy = True (or the content of cells will be identical).
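The effect of deepcopy on mutable values can be shown with a small re-implementation. The function init_nested_bins is a hypothetical sketch that takes edges in the multidimensional format and assumes that, without deepcopy, all cells refer to the same object.

```python
import copy

def init_nested_bins(edges, value=0, deepcopy=False):
    """Create nested lists of bins for a list of one-dimensional edges."""
    def build(axes):
        n_bins = len(axes[0]) - 1
        if len(axes) == 1:
            return [copy.deepcopy(value) if deepcopy else value
                    for _ in range(n_bins)]
        return [build(axes[1:]) for _ in range(n_bins)]
    return build(edges)

shared = init_nested_bins([[0, 1, 2]], value=[])
shared[0].append(1)
print(shared)    # [[1], [1]]: both cells hold the same list

separate = init_nested_bins([[0, 1, 2]], value=[], deepcopy=True)
separate[0].append(1)
print(separate)  # [[1], []]
```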

Examples:

>>> edges = [[0, 1], [0, 1]]
>>> # one cell
>>> init_bins(edges)
[[0]]
>>> # no need to use floats,
>>> # because integers will automatically be cast to floats
>>> # when used together
>>> init_bins(edges, 0.0)
[[0.0]]
>>> init_bins([[0, 1, 2], [0, 1, 2]])
[[0, 0], [0, 0]]
>>> init_bins([0, 1, 2])
[0, 0]

integral(bins, edges)[source]

Compute integral (scale for a histogram).

bins contain values, and edges form the mesh for the integration. Their format is defined in histogram description.

iter_bins(bins)[source]

Iterate on bins. Yield (index, bin content).

Edges with higher index are iterated first (that is z, then y, then x for a 3-dimensional histogram).

iter_bins_with_edges(bins, edges)[source]

Generate (bin content, bin edges) pairs.

Bin edges is a tuple, such that its item at index i is (lower bound, upper bound) of the bin at i-th coordinate.

Examples:

>>> from lena.math import mesh
>>> list(iter_bins_with_edges([0, 1, 2], edges=mesh((0, 3), 3)))
[(0, ((0, 1.0),)), (1, ((1.0, 2.0),)), (2, ((2.0, 3),))]
>>>
>>> # 2-dimensional histogram
>>> list(iter_bins_with_edges(
...     bins=[[2]], edges=mesh(((0, 1), (0, 1)), (1, 1))
... ))
[(2, ((0, 1), (0, 1)))]

Added in version 0.5: made public.

iter_cells(hist, ranges=None, coord_ranges=None)[source]

Iterate cells of a histogram hist, possibly in a subrange.

For each bin, yield a HistCell containing bin edges, bin content and bin index. The order of iteration is the same as for iter_bins().

ranges are the ranges of bin indices to be used for each coordinate (the lower value is included, the upper value is excluded).

coord_ranges set real coordinate ranges based on histogram edges. They need not coincide exactly with bin edges. If one of the ranges for the given coordinate is outside the histogram edges, then only existing histogram edges within the range are selected. If the coordinate range is completely outside histogram edges, nothing is yielded. If a lower or upper coord_range falls within a bin, this bin is yielded. Note that if a coordinate range falls on a bin edge, the number of generated bins can be unstable because of limited float precision.

ranges and coord_ranges are tuples of tuples of limits in corresponding dimensions. For one-dimensional histogram it must be a tuple containing a tuple, for example ((None, None),).

None as an upper or lower range means no limit (((None, None),) is equivalent to ((0, len(bins)),) for a 1-dimensional histogram).

If a range index is lower than 0 or higher than possible index, LenaValueError is raised. If both coord_ranges and ranges are provided, LenaTypeError is raised.

unify_1_md(bins, edges)[source]

Unify 1- and multidimensional bins and edges.

Return a tuple of (bins, edges). Bins and multidimensional edges return unchanged, while one-dimensional edges are inserted into a list.
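The unification can be sketched as a check on the first element of edges. The function unify below is a hypothetical illustration of the described behaviour, not Lena's source.

```python
from numbers import Number

def unify(bins, edges):
    """Return (bins, edges) with edges always in multidimensional format."""
    if isinstance(edges[0], Number):
        # one-dimensional edges are inserted into a list
        return (bins, [edges])
    return (bins, edges)

print(unify([1, 0], [1, 2, 3]))
# ([1, 0], [[1, 2, 3]])
print(unify([[0, 1], [0, 0]], [[0, 1, 2], [0, 1, 2]]))
# ([[0, 1], [0, 0]], [[0, 1, 2], [0, 1, 2]])
```

After this step len(edges) gives the dimension in both cases, so code iterating over axes needs no special branch for one-dimensional histograms.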