Structures

Histograms:

Histogram(edges[, bins, make_bins, …]) Multidimensional histogram.
NumpyHistogram

Graph:

Graph([points, scale, sort]) Function at given points.

Histogram functions:

check_edges_increasing(edges) Assure that multidimensional edges are increasing.
get_bin_on_value_1d(val, arr) Return index for value in one-dimensional array.
get_bin_on_value(arg, edges) Get the bin index for arg in a multidimensional array edges.
get_bin_on_index(index, bins) Return bin corresponding to multidimensional index.
iter_bins(bins) Iterate on bins.
init_bins(edges[, value, deepcopy]) Initialize cells of the form edges with the given value.
integral(bins, edges) Compute integral (scale for a histogram).
make_hist_context(hist, context) Update context with the context of a Histogram hist.
unify_1_md(bins, edges) Unify 1- and multidimensional bins and edges.

Histograms

class Histogram(edges, bins=None, make_bins=None, initial_value=0)

Multidimensional histogram.

Arbitrary dimension, variable bin size and a weight function during fill() are supported. Lower bin edge is included, upper edge is excluded. Underflow and overflow values are skipped. Bin content type is defined during the initialization.
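The binning rules above can be illustrated with a minimal one-dimensional sketch in plain Python. This is not Lena's implementation: the helper name fill_1d is hypothetical, and the standard bisect module stands in for the library's bin lookup.

```python
import bisect

def fill_1d(edges, bins, value, weight=1):
    """Add weight to the bin containing value (hypothetical helper)."""
    # bisect_right counts the edges <= value, so the lower edge
    # is included in a bin and the upper edge is excluded.
    index = bisect.bisect_right(edges, value) - 1
    # index == -1 is underflow, index == len(edges) - 1 is overflow;
    # both are skipped.
    if 0 <= index < len(edges) - 1:
        bins[index] += weight

edges = [0, 1, 2]
bins = [0, 0]
for value in [-0.5, 0, 0.5, 1, 2]:
    fill_1d(edges, bins, value)
fill_1d(edges, bins, 1.5, weight=3)
print(bins)  # [2, 4]
```

Here -0.5 (underflow) and 2 (equal to the last edge, hence overflow) are ignored, while 1 falls into the second bin.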

Examples:

>>> import lena.core
>>> from lena.structures import Histogram
>>> # two-dimensional histogram
>>> hist = Histogram([[0, 1, 2], [0, 1, 2]])
>>> hist.fill([0, 1])
>>> hist.bins
[[0, 1], [0, 0]]
>>> values = [[0, 0], [1, 0], [1, 1]]
>>> # use fill method
>>> for v in values:
...     hist.fill(v)
>>> hist.bins
[[1, 1], [1, 1]]
>>> # use as a Lena FillCompute element
>>> # (yielded only after fully computed)
>>> hseq = lena.core.Sequence(hist)
>>> h, context = next(hseq.run(values))
>>> print(h.bins)
[[2, 1], [2, 2]]

edges is a sequence of one-dimensional arrays, each containing strictly increasing bin edges. If the subarrays of edges are not increasing or any of them has length less than 2, LenaValueError is raised.

By default, histogram bins are initialized with initial_value. It can be any object that supports addition of a weight during fill (this is not required if you don’t plan to fill the histogram). If the initial_value is compound and requires special copying, create initial bins yourself (see init_bins()).

Histogram may be created from existing bins and edges. In this case a simple check of the shape of bins is done. If that is incorrect, LenaValueError is raised.

make_bins is a function without arguments that creates new bins (it will be called during __init__() and reset()). In this case initial_value is ignored, but the created bins are still checked. If both bins and make_bins are provided, LenaTypeError is raised.

Attributes

Histogram.edges is a list of edges on each dimension. Edges mark the borders of the bins. The edges along each dimension form a one-dimensional list, and the multidimensional bins are the result of all intersections of the one-dimensional edges. For example, a 3-dimensional histogram has edges of the form [x_edges, y_edges, z_edges], and the 0th bin has the borders ((x[0], x[1]), (y[0], y[1]), (z[0], z[1])).

Index in the edges is a tuple, where a given position corresponds to a dimension, and the content at that position to the bin along that dimension. For example, index (0, 1, 3) corresponds to the bin with lower edges (x[0], y[1], z[3]).

Histogram.bins is a list of nested lists. The same index as for edges can be used to get bin content: the bin at (0, 1, 3) can be obtained as bins[0][1][3]. The most deeply nested arrays correspond to the highest (furthest from x) coordinates. For example, for a 3-dimensional histogram, bins equal to [[[1, 1], [0, 0]], [[0, 0], [0, 0]]] mean that the only filled bins are those where the x and y indices are 0, and the z index is 0 or 1.
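The index convention can be checked on the bins from this example with plain nested lists. The helper get_bin is hypothetical and only follows an index into the nesting.

```python
# Bins of a 3-dimensional histogram with 2 bins along each of x, y, z.
bins = [[[1, 1], [0, 0]], [[0, 0], [0, 0]]]

def get_bin(bins, index):
    """Follow a multidimensional index into nested lists (hypothetical helper)."""
    content = bins
    for i in index:
        content = content[i]
    return content

# The first index is x, the second is y, the innermost one is z.
filled = [
    (x, y, z)
    for x in range(2) for y in range(2) for z in range(2)
    if get_bin(bins, (x, y, z)) != 0
]
print(filled)  # [(0, 0, 0), (0, 0, 1)]
```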

dim is the dimension of a histogram (length of its edges for a multidimensional histogram).

Programmer’s note

One- and multidimensional histograms have different formats of bins and edges. To be unified, 1-dimensional edges should be nested in a list (like [[1, 2, 3]]). Instead, they are simply the x-edges list, because it is more intuitive and one-dimensional histograms are used more often. To unify the interface for bins and edges in your code, use the unify_1_md() function.

compute()

Yield this histogram with context.

fill(value, weight=1)

Fill histogram with value with the given weight.

Value can be a (data, context) pair. Values outside the histogram edges are ignored.

reset()

Reset the histogram.

Current context is reset to an empty dict. Bins are reinitialized with the initial_value or with make_bins (depending on the initialization).

If bins were set explicitly during the initialization, LenaRuntimeError is raised.

scale(other=None, recompute=False)

Compute or set scale (integral of the histogram).

If other is None, return scale of this histogram. If its scale was not computed before, it is computed and stored for subsequent use (unless explicitly asked to recompute).

If a float other is provided, rescale to other. A new histogram with the scale equal to other is returned, the original histogram remains unchanged.

Histograms with scale equal to zero can’t be rescaled. LenaValueError is raised if one tries to do that.
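The rescaling rule can be sketched for a one-dimensional histogram as follows. This is an illustration, not the library's code: the function names are hypothetical, the scale is assumed to be the sum of bin content times bin width, and the built-in ValueError stands in for LenaValueError.

```python
def integral_1d(bins, edges):
    """Sum of bin content times bin width (assumed definition of the scale)."""
    return sum(
        content * (edges[i + 1] - edges[i])
        for i, content in enumerate(bins)
    )

def rescaled_1d(bins, edges, other):
    """Return new bins whose integral equals other; the input is not modified."""
    scale = integral_1d(bins, edges)
    if scale == 0:
        # Lena raises LenaValueError here; ValueError is a stand-in.
        raise ValueError("can't rescale a histogram with zero scale")
    factor = float(other) / scale
    return [content * factor for content in bins]

edges = [0, 1, 3]
bins = [2, 1]
new_bins = rescaled_1d(bins, edges, 1)
print(integral_1d(bins, edges))  # 4
print(new_bins)                  # [0.5, 0.25]
```

The original bins stay [2, 1], which mirrors the documented behaviour of returning a new histogram.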

Graph

class Graph(points=None, scale=None, sort=True)

Function at given points.

Graph can be set during the initialization and during fill(). It can be rescaled (producing a new graph).

One can get graph points as the Graph.points attribute. They will be sorted each time before return if sort was set to True. An attempt to change points (use Graph.points on the left of "=") will raise Python’s AttributeError.
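A read-only attribute of this kind is commonly written in Python as a property without a setter. The class below is a hypothetical stand-in that only illustrates this behaviour; it is not the Graph class.

```python
class PointsHolder(object):
    """Hypothetical stand-in for a class with read-only points."""

    def __init__(self, points=None, sort=True):
        self._points = list(points) if points is not None else []
        self._sort = sort

    @property
    def points(self):
        # Sort on every access when sorting was requested.
        if self._sort:
            self._points.sort()
        return self._points

holder = PointsHolder([(2, 4), (1, 1)])
print(holder.points)  # [(1, 1), (2, 4)]

try:
    holder.points = []
except AttributeError:
    assignment_failed = True
else:
    assignment_failed = False
print(assignment_failed)  # True
```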

Warning

Graph does not reduce data. All filled values will be stored in it. To reduce data, use histograms.

points is an array of (coordinate, value) tuples.

context will be added to graph context. If it contains "scale", scale() method will be available. Otherwise, if "scale" is contained in the context during fill(), it will be used. In this case it is assumed that this scale is the same for all values (only the last filled context is checked). Context from flow takes precedence over the initialized one.

Graph coordinates are sorted by default. This is usually needed to plot graphs of functions. If you need to keep the order of insertion, set sort to False.

By default, sorting is done using standard Python lists and functions. You can disable sort and provide your own sorting container for points. Some implementations are compared here. Note that a rescaled graph uses a default list.

fill(value)

Fill the graph with value.

Value can be a (data, context) tuple. Data part must be a (coordinates, value) pair, where both coordinates and value are also tuples. For example, value can contain the principal number and the precision.
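The shape of such values can be shown with plain tuples. The helper fill_sketch is hypothetical, and recognising the context by its being a dict is an assumption made only for this illustration.

```python
# A one-dimensional coordinate and a value made of (number, precision).
coordinates = (1.0,)
value = (2.5, 0.1)
data = (coordinates, value)
context = {"scale": 10.0}  # hypothetical context content

points = []

def fill_sketch(points, item):
    """Store the data part of item, given bare data or a (data, context) pair."""
    # Assumption: a (data, context) pair has a dict in the second place.
    if isinstance(item, tuple) and len(item) == 2 and isinstance(item[1], dict):
        item = item[0]
    points.append(item)

fill_sketch(points, (data, context))
fill_sketch(points, ((0.0,), (1.0, 0.2)))
print(sorted(points))
# [((0.0,), (1.0, 0.2)), ((1.0,), (2.5, 0.1))]
```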

points

Get graph points (read only).

request()

Yield graph with context.

If sort was initialized True, graph points will be sorted. If the flow contained scale in the context, it is set now.

reset()

Reset points to an empty list and current context to an empty dict.

scale(other=None)

Get or set the scale.

Graph’s scale comes from an external source. For example, if the graph was computed from a function, this may be its integral passed via context during fill(). Once the scale is set, it is stored in the graph. If one attempts to use a scale that was not set, LenaAttributeError is raised.

If other is None, return the scale.

If a float other is provided, rescale to other. A new graph with the scale equal to other is returned, the original one remains unchanged. Note that in this case its points will be a simple list and new graph sort parameter will be True.

Graphs with scale equal to zero can’t be rescaled. Attempts to do that raise LenaValueError.

to_csv(separator=', ', header=None)

Convert graph’s points to CSV.

separator delimits values, default is a comma.

header, if not None, is the first string of the output (new line is added automatically).

Since a graph can be multidimensional, for each point first its coordinate is converted to string (separated by separator), then each part of its value.
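The conversion described above can be sketched with a hypothetical helper points_to_csv. It assumes that both the coordinate and the value of every point are tuples, and it uses a bare comma as the separator.

```python
def points_to_csv(points, separator=",", header=None):
    """Convert (coordinate, value) points to CSV text (hypothetical helper)."""
    lines = []
    if header is not None:
        lines.append(header)
    for coordinate, value in points:
        # Coordinate parts come first, then the parts of the value.
        fields = [str(c) for c in coordinate] + [str(v) for v in value]
        lines.append(separator.join(fields))
    return "\n".join(lines)

points = [((0, 0), (1.5, 0.1)), ((0, 1), (2.5, 0.2))]
csv_text = points_to_csv(points, header="x,y,value,error")
print(csv_text)
# x,y,value,error
# 0,0,1.5,0.1
# 0,1,2.5,0.2
```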

To convert Graph to CSV inside a Lena sequence, use ToCSV.

Histogram functions

Functions for histograms.

These functions are used for low-level work with histograms and their contents. They are not needed for normal usage.

check_edges_increasing(edges)

Assure that multidimensional edges are increasing.

If the length of edges or of any of its subarrays is less than 2, or if some subarray of edges is not strictly increasing, LenaValueError is raised.
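A check of this kind could look like the sketch below. The function names are hypothetical, the built-in ValueError stands in for LenaValueError, and treating flat edges as a single one-dimensional array is an assumption.

```python
def check_increasing_sketch(edges):
    """Raise ValueError unless every array of edges is strictly increasing."""
    # Treat flat 1-dimensional edges like [0, 1, 2] as a single array.
    if not edges or not hasattr(edges[0], "__iter__"):
        edges = [edges]
    for arr in edges:
        if len(arr) < 2:
            raise ValueError("edges must contain at least 2 values")
        if any(lo >= hi for lo, hi in zip(arr, arr[1:])):
            raise ValueError("edges must be strictly increasing")

def is_valid(edges):
    """Return True if the check passes, False if it raises."""
    try:
        check_increasing_sketch(edges)
    except ValueError:
        return False
    return True

print(is_valid([[0, 1, 2], [0, 1]]))  # True
print(is_valid([[0, 1, 1], [0, 1]]))  # False: repeated edge
print(is_valid([[0, 1], [1]]))        # False: subarray too short
```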

get_bin_on_index(index, bins)

Return bin corresponding to multidimensional index.

index can be a number or a list/tuple. If index length is less than dimension of bins, a subarray of bins is returned.

In case of an index error, LenaIndexError is raised.

Example:

>>> from lena.structures import Histogram, get_bin_on_index
>>> hist = Histogram([0, 1], [0])
>>> get_bin_on_index(0, hist.bins)
0
>>> get_bin_on_index((0, 1), [[0, 1], [0, 0]])
1
>>> get_bin_on_index(0, [[0, 1], [0, 0]])
[0, 1]

get_bin_on_value(arg, edges)

Get the bin index for arg in a multidimensional array edges.

arg is a 1-dimensional array of numbers (or a number for 1-dimensional edges), and corresponds to a point in N-dimensional space.

edges is an array of N one-dimensional arrays (lists or tuples) of numbers. Each one-dimensional subarray consists of increasing numbers.

arg and edges must have the same length (otherwise LenaValueError is raised). arg and edges must be iterable and support len().

Return list of indices in edges corresponding to arg.

If any coordinate is out of its corresponding edge range, its index will be -1 for underflow or len(edge)-1 for overflow.

Examples:

>>> from lena.structures import get_bin_on_value
>>> edges = [[1, 2, 3], [1, 3.5]]
>>> get_bin_on_value((1.5, 2), edges)
[0, 0]
>>> get_bin_on_value((1.5, 0), edges)
[0, -1]
>>> # the upper edge is excluded
>>> get_bin_on_value((3, 2), edges)
[2, 0]
>>> # one-dimensional edges
>>> edges = [1, 2, 3]
>>> get_bin_on_value(2, edges)
[1]

get_bin_on_value_1d(val, arr)

Return index for value in one-dimensional array.

arr must contain strictly increasing values (not necessarily equidistant); this is not checked.

"Linear binary search" is used: by default, the array search assumes that the array is split into equidistant steps.
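The search algorithm itself is not shown in this reference. As an illustration of the index convention only, ordinary bisection from the standard bisect module gives the indices described here; bin_index_1d is a hypothetical name and not the library function.

```python
import bisect

def bin_index_1d(val, arr):
    """Index i such that arr[i] <= val < arr[i + 1] (hypothetical helper).

    Returns -1 for underflow and len(arr) - 1 for overflow.
    """
    return bisect.bisect_right(arr, val) - 1

arr = [0, 1, 4, 5, 7, 10]
indices = [bin_index_1d(v, arr) for v in (0, 4.5, 10, -10)]
print(indices)  # [0, 2, 5, -1]
```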

Example:

>>> from lena.structures import get_bin_on_value_1d
>>> arr = [0, 1, 4, 5, 7, 10]
>>> get_bin_on_value_1d(0, arr)
0
>>> get_bin_on_value_1d(4.5, arr)
2
>>> # upper range is excluded
>>> get_bin_on_value_1d(10, arr)
5
>>> # underflow
>>> get_bin_on_value_1d(-10, arr)
-1

hist_to_graph(hist, context, make_graph_value=None, bin_coord='left')

Convert a Histogram hist to a Graph.

context becomes graph’s context. For example, it can contain a scale.

make_graph_value is a function to set graph point’s value. By default it is bin content. This option could be used to create graph error bars. make_graph_value must accept bin content and bin context as positional arguments.

bin_coord specifies which coordinate of a histogram bin becomes the coordinate of the corresponding graph point. It can be "left" (default), "right" or "middle".
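For a one-dimensional histogram the three options can be sketched as follows. The helper bin_coordinates_1d is hypothetical, and the bin content is used as the point value, as in the default case.

```python
def bin_coordinates_1d(edges, bin_coord="left"):
    """Coordinate of each bin of 1-dimensional edges (hypothetical helper)."""
    pairs = list(zip(edges, edges[1:]))
    if bin_coord == "left":
        return [lo for lo, hi in pairs]
    if bin_coord == "right":
        return [hi for lo, hi in pairs]
    if bin_coord == "middle":
        return [0.5 * (lo + hi) for lo, hi in pairs]
    raise ValueError("bin_coord must be 'left', 'right' or 'middle'")

edges = [0, 1, 3]
bins = [2, 1]
points = list(zip(bin_coordinates_1d(edges, "middle"), bins))
print(points)  # [(0.5, 2), (2.0, 1)]
```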

Return the resulting graph.

init_bins(edges, value=0, deepcopy=False)

Initialize cells of the form edges with the given value.

Return bins filled with copies of value.

Value must be copyable, usual numbers will suit. If the value is mutable, use deepcopy = True (or the content of cells will be identical).
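The effect of a mutable value can be seen in a one-dimensional sketch. The helper init_bins_1d is hypothetical, and it assumes that without deepcopy the same object is placed into every cell.

```python
import copy

def init_bins_1d(n_bins, value=0, deepcopy=False):
    """Create n_bins cells holding value (hypothetical 1-dimensional helper)."""
    if deepcopy:
        return [copy.deepcopy(value) for _ in range(n_bins)]
    # Assumption: every cell refers to the same object.
    return [value for _ in range(n_bins)]

shared = init_bins_1d(2, value=[])
shared[0].append(1)
print(shared)    # [[1], [1]]: both cells are the same list

separate = init_bins_1d(2, value=[], deepcopy=True)
separate[0].append(1)
print(separate)  # [[1], []]
```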

Examples:

>>> from lena.structures import init_bins
>>> edges = [[0, 1], [0, 1]]
>>> # one cell
>>> init_bins(edges)
[[0]]
>>> # no need to use floats,
>>> # because integers will automatically be cast to floats
>>> # when used together
>>> init_bins(edges, 0.0)
[[0.0]]
>>> init_bins([[0, 1, 2], [0, 1, 2]])
[[0, 0], [0, 0]]
>>> init_bins([0, 1, 2])
[0, 0]

integral(bins, edges)

Compute integral (scale for a histogram).

bins contain values, and edges form the mesh for the integration. Their format is defined in Histogram description.
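A multidimensional integral over such a mesh can be sketched as the sum of bin content times cell volume. The function integral_sketch is hypothetical; it expects edges as a list of one-dimensional edge lists, one per dimension.

```python
def integral_sketch(bins, edges):
    """Sum of bin content times cell volume over nested bins."""
    def walk(sub_bins, dim, volume):
        if dim == len(edges):
            # sub_bins is now the content of a single cell.
            return sub_bins * volume
        axis = edges[dim]
        return sum(
            walk(sub, dim + 1, volume * (axis[i + 1] - axis[i]))
            for i, sub in enumerate(sub_bins)
        )
    return walk(bins, 0, 1)

edges = [[0, 1, 3], [0, 2]]
bins = [[1], [2]]
print(integral_sketch(bins, edges))  # 1*(1*2) + 2*(2*2) = 10
```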

iter_bins(bins)

Iterate on bins. Yield (index, bin content).

Edges with higher index are iterated first (that is z, then y, then x for a 3-dimensional histogram).
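One reading of this order is that the innermost index (z in three dimensions) varies fastest. The recursive generator below is a sketch under that assumption, with the hypothetical name iter_bins_sketch.

```python
def iter_bins_sketch(bins):
    """Yield (index, content) pairs; the innermost index varies fastest."""
    if not isinstance(bins, (list, tuple)):
        # A single cell: no further indices.
        yield ((), bins)
        return
    for i, sub_bins in enumerate(bins):
        for sub_index, content in iter_bins_sketch(sub_bins):
            yield ((i,) + sub_index, content)

bins = [[1, 2], [3, 4]]
print(list(iter_bins_sketch(bins)))
# [((0, 0), 1), ((0, 1), 2), ((1, 0), 3), ((1, 1), 4)]
```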

make_hist_context(hist, context)

Update context with the context of a Histogram hist.

Deep copy of updated context is returned.

unify_1_md(bins, edges)

Unify 1- and multidimensional bins and edges.

Return a tuple of (bins, edges). Bins and multidimensional edges return unchanged, while one-dimensional edges are inserted into a list.
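The rule can be sketched in a few lines. The function unify_sketch is hypothetical; it assumes that edges are one-dimensional when their first element is not iterable.

```python
def unify_sketch(bins, edges):
    """Return (bins, edges) with 1-dimensional edges wrapped in a list."""
    if hasattr(edges[0], "__iter__"):
        # Already a list of per-dimension edge arrays.
        return (bins, edges)
    return (bins, [edges])

print(unify_sketch([0, 0], [0, 1, 2]))
# ([0, 0], [[0, 1, 2]])
print(unify_sketch([[0], [0]], [[0, 1, 2], [0, 1]]))
# ([[0], [0]], [[0, 1, 2], [0, 1]])
```

After unification, len(edges) gives the dimension for both kinds of histograms.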