Structures

Histograms:

Histogram(edges[, bins, make_bins, …]) Multidimensional histogram.
NumpyHistogram

Graph:

Graph([points, context, scale, sort]) Function at given coordinates (arbitrary dimensions).

Histogram functions:

HistCell A namedtuple with fields edges, bin, index.
check_edges_increasing(edges) Assure that multidimensional edges are increasing.
get_bin_edges(index, edges) Return edges of the bin for the given edges of a histogram.
get_bin_on_value_1d(val, arr) Return index for value in one-dimensional array.
get_bin_on_value(arg, edges) Get the bin index for arg in a multidimensional array edges.
get_bin_on_index(index, bins) Return bin corresponding to multidimensional index.
iter_bins(bins) Iterate on bins.
iter_cells(hist[, ranges, coord_ranges]) Iterate cells of a histogram hist, possibly in a subrange.
init_bins(edges[, value, deepcopy]) Initialize cells of the form edges with the given value.
integral(bins, edges) Compute integral (scale for a histogram).
make_hist_context(hist, context) Update context with the context of a Histogram hist.
unify_1_md(bins, edges) Unify 1- and multidimensional bins and edges.

Histograms

class Histogram(edges, bins=None, make_bins=None, initial_value=0, context=None)

Multidimensional histogram.

Arbitrary dimension, variable bin size and a weight function during fill() are supported. Lower bin edge is included, upper edge is excluded. Underflow and overflow values are skipped. Bin content type is defined during the initialization.

Examples:

>>> # two-dimensional histogram
>>> hist = Histogram([[0, 1, 2], [0, 1, 2]])
>>> hist.fill([0, 1])
>>> hist.bins
[[0, 1], [0, 0]]
>>> values = [[0, 0], [1, 0], [1, 1]]
>>> # use fill method
>>> for v in values:
...     hist.fill(v)
>>> hist.bins
[[1, 1], [1, 1]]
>>> # use as a Lena FillCompute element
>>> # (yielded only after fully computed)
>>> hseq = lena.core.Sequence(hist)
>>> h, context = next(hseq.run(values))
>>> print(h.bins)
[[2, 1], [2, 2]]

edges is a sequence of one-dimensional arrays, each containing strictly increasing bin edges. If the subarrays of edges are not increasing or any of them has length less than 2, LenaValueError is raised.

Histogram bins are by default initialized with initial_value. It can be any object that supports addition of a weight during fill (this is not required if you don’t plan to fill the histogram). If the initial_value is compound and requires special copying, create initial bins yourself (see init_bins()).

Histogram may be created from existing bins and edges. In this case a simple check of the shape of bins is done. If that is incorrect, LenaValueError is raised.

make_bins is a function without arguments that creates new bins (it will be called during __init__() and reset()). initial_value is ignored in this case, but the created bins are still checked. If both bins and make_bins are provided, LenaTypeError is raised.

Attributes

Histogram.edges is a list of edges on each dimension. Edges mark the borders of the bins. Edges along each dimension are a one-dimensional list, and the multidimensional bins are the result of all intersections of one-dimensional edges. For example, a 3-dimensional histogram has edges of the form [x_edges, y_edges, z_edges], and the 0th bin has the borders ((x[0], x[1]), (y[0], y[1]), (z[0], z[1])).

Index in the edges is a tuple, where a given position corresponds to a dimension, and the content at that position to the bin along that dimension. For example, index (0, 1, 3) corresponds to the bin with lower edges (x[0], y[1], z[3]).

Histogram.bins is a list of nested lists. The same index as for edges can be used to get bin content: the bin at (0, 1, 3) can be obtained as bins[0][1][3]. The most deeply nested lists correspond to the highest (furthest from x) coordinates. For example, for a 3-dimensional histogram, bins equal to [[[1, 1], [0, 0]], [[0, 0], [0, 0]]] mean that the only filled bins are those where the x and y indices are 0 and the z index is 0 or 1.
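
The nested-list layout can be illustrated without the library. The helper bin_at below is a hypothetical name introduced only for this sketch (it is not part of lena); it walks the nested lists one index at a time and recovers the filled bins from the example above.

```python
def bin_at(bins, index):
    """Return the content of nested-list bins at a multidimensional index.

    Hypothetical helper for illustration; not part of lena.
    A shorter index returns the corresponding sub-list.
    """
    content = bins
    for i in index:
        content = content[i]
    return content


# 3-dimensional bins with 2 bins along each of x, y and z (from the text).
bins = [[[1, 1], [0, 0]], [[0, 0], [0, 0]]]

# Collect the indices (x, y, z) of all non-empty bins.
filled = [
    (x, y, z)
    for x in range(2) for y in range(2) for z in range(2)
    if bin_at(bins, (x, y, z)) != 0
]
print(filled)
```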

dim is the dimension of a histogram (length of its edges for a multidimensional histogram).

Programmer’s note

One- and multidimensional histograms have different formats of bins and edges. To be unified, 1-dimensional edges would have to be nested in a list (like [[1, 2, 3]]). Instead, they are simply the x-edges list, because it is more intuitive and one-dimensional histograms are used more often. To unify the interface for bins and edges in your code, use the unify_1_md() function.
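
A minimal sketch of what such unification could look like, written here without the library. The names unify_edges and dimension are hypothetical and exist only in this example; the sketch assumes that edges are plain lists or tuples.

```python
def unify_edges(edges):
    """Wrap one-dimensional edges in a list; leave multidimensional ones as is.

    Hypothetical stand-in written for this sketch, not lena's unify_1_md.
    """
    if isinstance(edges[0], (list, tuple)):
        return edges   # already of the form [x_edges, y_edges, ...]
    return [edges]     # plain x-edges list becomes [x_edges]


def dimension(edges):
    # After unification the dimension is the number of edge arrays.
    return len(unify_edges(edges))


print(dimension([1, 2, 3]))
print(dimension([[0, 1, 2], [0, 1, 2]]))
```

With edges in the unified form, code that loops over dimensions does not need a special branch for the one-dimensional case.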

compute()

Yield this histogram with context.

fill(value, weight=1)

Fill histogram with value with the given weight.

Value can be a (data, context) pair. Values outside the histogram edges are ignored.

reset()

Reset the histogram.

Current context is reset to an empty dict. Bins are reinitialized with the initial_value or with make_bins (depending on the initialization).

If bins were set explicitly during the initialization, LenaRuntimeError is raised.

scale(other=None, recompute=False)

Compute or set scale (integral of the histogram).

If other is None, return scale of this histogram. If its scale was not computed before, it is computed and stored for subsequent use (unless explicitly asked to recompute).

If a float other is provided, rescale to other. A new histogram with the scale equal to other is returned, the original histogram remains unchanged.

Histograms with scale equal to zero can’t be rescaled. LenaValueError is raised if one tries to do that.

Graph

class Graph(points=None, context=None, scale=None, sort=True)

Function at given coordinates (arbitrary dimensions).

Graph points can be set during the initialization and during fill(). It can be rescaled (producing a new Graph). A point is a tuple of (coordinate, value), where both coordinate and value can be tuples of numbers. Coordinate corresponds to a point in N-dimensional space, while value is some function’s value at this point (the function can take a value in M-dimensional space). Coordinate and value dimensions must be the same for all points.

Graph points are available as the Graph.points attribute. They are sorted each time before being returned if sort was set to True. An attempt to assign to Graph.points (to use it on the left of "=") raises Python’s AttributeError.

points is an array of (coordinate, value) tuples.

context is the same as the most recent context during fill. Use it to provide a context when initializing a Graph from existing points.

scale sets the scale of the graph. It is used during plotting if rescaling is needed.

Graph coordinates are sorted by default. This is usually needed to plot graphs of functions. If you need to keep the order of insertion, set sort to False.

By default, sorting is done using standard Python lists and functions. You can disable sort and provide your own sorting container for points. Some implementations are compared here. Note that a rescaled graph uses a default list.

Note that Graph does not reduce data. All filled values will be stored in it. To reduce data, use histograms.

fill(value)

Fill the graph with value.

Value can be a (data, context) tuple. Data part must be a (coordinates, value) pair, where both coordinates and value are also tuples. For example, value can contain the principal number and its precision.

points

Get graph points (read only).

request()

Yield graph with context.

If sort was set to True during the initialization, graph points will be sorted.

reset()

Reset points to an empty list and current context to an empty dict.

scale(other=None)

Get or set the scale.

Graph’s scale comes from an external source. For example, if the graph was computed from a function, this may be its integral passed via context during fill(). Once the scale is set, it is stored in the graph. If one attempts to use scale which was not set, LenaAttributeError is raised.

If other is None, return the scale.

If a float other is provided, rescale to other. A new graph with the scale equal to other is returned, the original one remains unchanged. Note that in this case its points will be a simple list and new graph sort parameter will be True.

Graphs with scale equal to zero can’t be rescaled. Attempts to do that raise LenaValueError.

to_csv(separator=', ', header=None)

Convert graph’s points to CSV.

separator delimits values, default is a comma.

header, if not None, is the first string of the output (new line is added automatically).

Since a graph can be multidimensional, for each point first its coordinate is converted to a string (with parts separated by separator), then each part of its value.

To convert Graph to CSV inside a Lena sequence, use lena.output.ToCSV.
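
The conversion described above can be sketched in plain Python. The function points_to_csv is a hypothetical name used only here (it is not the library's implementation), and the sketch assumes a bare comma as the separator and that coordinates and values are numbers or tuples of numbers.

```python
def points_to_csv(points, separator=",", header=None):
    """Render (coordinate, value) points as CSV text (hypothetical sketch)."""

    def parts(item):
        # Accept both a plain number and a tuple of numbers.
        return item if isinstance(item, (list, tuple)) else (item,)

    lines = [] if header is None else [header]
    for coord, value in points:
        # Coordinate parts come first, then the parts of the value.
        fields = [str(c) for c in parts(coord)] + [str(v) for v in parts(value)]
        lines.append(separator.join(fields))
    return "\n".join(lines)


# Two points of a function of (x, y) whose value carries an error estimate.
points = [((0, 1), (2.5, 0.1)), ((1, 1), (3.0, 0.2))]
print(points_to_csv(points, header="x,y,value,error"))
```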

Histogram functions

Functions for histograms.

These functions are used for low-level work with histograms and their contents. They are not needed for normal usage.

class HistCell

A namedtuple with fields edges, bin, index.

Create new instance of HistCell(edges, bin, index)

check_edges_increasing(edges)

Assure that multidimensional edges are increasing.

If length of edges or its subarray is less than 2 or if some subarray of edges contains not strictly increasing values, LenaValueError is raised.

get_bin_edges(index, edges)

Return edges of the bin for the given edges of a histogram.

In one-dimensional case index must be an integer and a tuple of (x_low_edge, x_high_edge) for that bin is returned.

In a multidimensional case index is a container of numeric indices in each dimension. A list of bin edges in each dimension is returned.
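
As an illustration of the two cases, here is a minimal sketch that does not use the library. The name bin_edges is hypothetical, and returning a list of (low, high) tuples in the multidimensional case is an assumption of this example.

```python
def bin_edges(index, edges):
    """Return the borders of the bin at the given index (hypothetical sketch)."""
    if isinstance(index, int):
        # One-dimensional case: edges is a flat list of numbers.
        return (edges[index], edges[index + 1])
    # Multidimensional case: one (low, high) pair per dimension.
    return [(e[i], e[i + 1]) for i, e in zip(index, edges)]


print(bin_edges(1, [0, 1, 3]))
print(bin_edges((0, 1), [[0, 1, 3], [0, 10, 20]]))
```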

get_bin_on_index(index, bins)

Return bin corresponding to multidimensional index.

index can be a number or a list/tuple. If index length is less than dimension of bins, a subarray of bins is returned.

In case of an index error, LenaIndexError is raised.

Example:

>>> from lena.structures import Histogram, get_bin_on_index
>>> hist = Histogram([0, 1], [0])
>>> get_bin_on_index(0, hist.bins)
0
>>> get_bin_on_index((0, 1), [[0, 1], [0, 0]])
1
>>> get_bin_on_index(0, [[0, 1], [0, 0]])
[0, 1]

get_bin_on_value(arg, edges)

Get the bin index for arg in a multidimensional array edges.

arg is a 1-dimensional array of numbers (or a number for 1-dimensional edges), and corresponds to a point in N-dimensional space.

edges is an array of N one-dimensional arrays (lists or tuples) of numbers. Each 1-dimensional subarray consists of increasing numbers.

arg and edges must have the same length (otherwise LenaValueError is raised). arg and edges must be iterable and support len().

Return list of indices in edges corresponding to arg.

If any coordinate is out of its corresponding edge range, its index will be -1 for underflow or len(edge)-1 for overflow.

Examples:

>>> from lena.structures import get_bin_on_value
>>> edges = [[1, 2, 3], [1, 3.5]]
>>> get_bin_on_value((1.5, 2), edges)
[0, 0]
>>> get_bin_on_value((1.5, 0), edges)
[0, -1]
>>> # the upper edge is excluded
>>> get_bin_on_value((3, 2), edges)
[2, 0]
>>> # one-dimensional edges
>>> edges = [1, 2, 3]
>>> get_bin_on_value(2, edges)
[1]

get_bin_on_value_1d(val, arr)

Return index for value in one-dimensional array.

arr must contain strictly increasing values (not necessarily equidistant); this is not checked.

"Linear binary search" is used, that is, the search by default assumes that the array is split into equidistant steps.
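
The index convention (lower edge included, upper edge excluded, -1 for underflow and len(arr) - 1 for overflow) can be reproduced with ordinary bisection from the standard library. This is only a sketch of the convention under that substitution: it does not implement the equidistant-step guess mentioned above, and bin_index_1d is a hypothetical name.

```python
from bisect import bisect_right


def bin_index_1d(val, arr):
    """Index of the bin of strictly increasing arr that contains val.

    Hypothetical sketch using plain bisection, not lena's search.
    Returns -1 for underflow and len(arr) - 1 for overflow.
    """
    # bisect_right counts the edges that are <= val;
    # the bin index is one less than that count.
    return bisect_right(arr, val) - 1


arr = [0, 1, 4, 5, 7, 10]
print([bin_index_1d(v, arr) for v in (0, 4.5, 10, -10)])
```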

Example:

>>> from lena.structures import get_bin_on_value_1d
>>> arr = [0, 1, 4, 5, 7, 10]
>>> get_bin_on_value_1d(0, arr)
0
>>> get_bin_on_value_1d(4.5, arr)
2
>>> # upper range is excluded
>>> get_bin_on_value_1d(10, arr)
5
>>> # underflow
>>> get_bin_on_value_1d(-10, arr)
-1

hist_to_graph(hist, context, make_graph_value=None, bin_coord='left')

Convert a Histogram hist to a Graph.

context becomes graph’s context. For example, it can contain a scale.

make_graph_value is a function to set graph point’s value. By default it is bin content. This option could be used to create graph error bars. make_graph_value must accept bin content and bin context as positional arguments.

bin_coord specifies which coordinate of a histogram bin is used for the graph point created from it. It can be "left" (default), "right" or "middle".

Return the resulting graph.

init_bins(edges, value=0, deepcopy=False)

Initialize cells of the form edges with the given value.

Return bins filled with copies of value.

Value must be copyable; ordinary numbers will do. If the value is mutable, use deepcopy=True (otherwise all cells will share the same content).
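
Why deep copying matters for mutable values can be shown with a small stand-alone sketch. The function make_bins below is hypothetical (it is not the library's init_bins), accepts only multidimensional edges, and assumes that without deepcopy every cell refers to the same object.

```python
import copy


def make_bins(edges, value=0, deepcopy=False):
    """Build nested-list bins shaped after multidimensional edges.

    Hypothetical sketch; edges must be a list of one-dimensional lists.
    """
    nbins = len(edges[0]) - 1
    if len(edges) == 1:
        # Innermost dimension: create the cells themselves.
        if deepcopy:
            return [copy.deepcopy(value) for _ in range(nbins)]
        return [value for _ in range(nbins)]
    # Outer dimension: one sub-array of bins per bin along this axis.
    return [make_bins(edges[1:], value, deepcopy) for _ in range(nbins)]


edges = [[0, 1, 2], [0, 1, 2]]

shared = make_bins(edges, value=[])
shared[0][0].append(1)      # modifies the single list used by all cells
print(shared)

separate = make_bins(edges, value=[], deepcopy=True)
separate[0][0].append(1)    # modifies only one cell
print(separate)
```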

Examples:

>>> edges = [[0, 1], [0, 1]]
>>> # one cell
>>> init_bins(edges)
[[0]]
>>> # no need to use floats,
>>> # because integers will automatically be cast to floats
>>> # when used together
>>> init_bins(edges, 0.0)
[[0.0]]
>>> init_bins([[0, 1, 2], [0, 1, 2]])
[[0, 0], [0, 0]]
>>> init_bins([0, 1, 2])
[0, 0]

integral(bins, edges)

Compute integral (scale for a histogram).

bins contain values, and edges form the mesh for the integration. Their format is defined in Histogram description.
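
For numeric bins, such an integral is the sum of bin contents multiplied by bin volumes. A minimal sketch under that assumption follows; integral_nd is a hypothetical name, and edges are expected in the multidimensional form (a list of one-dimensional lists).

```python
from itertools import product


def integral_nd(bins, edges):
    """Sum of bin content times bin volume (hypothetical sketch).

    edges must be a list of one-dimensional edge lists, and bins must be
    nested lists of numbers of the matching shape.
    """
    total = 0
    index_ranges = [range(len(e) - 1) for e in edges]
    for index in product(*index_ranges):
        content = bins
        volume = 1
        for dim, i in enumerate(index):
            content = content[i]                        # descend one level
            volume *= edges[dim][i + 1] - edges[dim][i]  # bin width along dim
        total += content * volume
    return total


# Two bins along x (widths 1 and 2), one bin along y (width 2).
edges = [[0, 1, 3], [0, 2]]
bins = [[1], [2]]
print(integral_nd(bins, edges))
```

Here the result is 1 * (1 * 2) + 2 * (2 * 2) = 10.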

iter_bins(bins)

Iterate on bins. Yield (index, bin content).

Edges with higher index are iterated first (that is z, then y, then x for a 3-dimensional histogram).
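
One reading of this order is that the last index (z) changes fastest. The recursive generator below is a sketch under that assumption; walk_bins is a hypothetical name, and the sketch further assumes that bin contents are not themselves lists or tuples.

```python
def walk_bins(bins):
    """Yield (index, content) for nested-list bins (hypothetical sketch)."""
    if not isinstance(bins, (list, tuple)):
        # A single cell: its index relative to itself is empty.
        yield (), bins
        return
    for i, sub in enumerate(bins):
        for sub_index, content in walk_bins(sub):
            yield (i,) + sub_index, content


print(list(walk_bins([[0, 1], [2, 3]])))
```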

iter_cells(hist, ranges=None, coord_ranges=None)

Iterate cells of a histogram hist, possibly in a subrange.

For each bin, yield a HistCell containing bin edges, bin content and bin index. The order of iteration is the same as for iter_bins().

ranges are the ranges of bin indices to be used for each coordinate (the lower value is included, the upper value is excluded).

coord_ranges set real coordinate ranges based on histogram edges. They need not coincide exactly with bin edges. If one of the ranges for the given coordinate is outside the histogram edges, then only existing histogram edges within the range are selected. If the coordinate range is completely outside histogram edges, nothing is yielded. If a lower or upper coord_range falls within a bin, this bin is yielded. Note that if a coordinate range falls on a bin edge, the number of generated bins can be unstable because of limited float precision.

ranges and coord_ranges are tuples of tuples of limits in corresponding dimensions. For one-dimensional histogram it must be a tuple containing a tuple, for example ((None, None),).

None as an upper or lower range means no limit (((None, None),) is equivalent to ((0, len(bins)),) for a 1-dimensional histogram).

If a range index is lower than 0 or higher than possible index, LenaValueError is raised. If both coord_ranges and ranges are provided, LenaTypeError is raised.
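
The handling of None limits and of invalid indices can be sketched as follows. The function resolve_ranges and its shape argument (the number of bins along each dimension) are hypothetical, and Python's built-in ValueError stands in for LenaValueError.

```python
def resolve_ranges(ranges, shape):
    """Replace None limits with full bin ranges (hypothetical sketch).

    ranges is a tuple of (low, high) index pairs, one per dimension;
    shape holds the number of bins along each dimension.
    """
    resolved = []
    for (low, high), nbins in zip(ranges, shape):
        low = 0 if low is None else low
        high = nbins if high is None else high
        # The upper limit is excluded, so it may be equal to nbins.
        if not (0 <= low <= nbins and 0 <= high <= nbins):
            raise ValueError("range index is outside of histogram bins")
        resolved.append((low, high))
    return tuple(resolved)


print(resolve_ranges(((None, None),), (5,)))
print(resolve_ranges(((1, None), (None, 2)), (4, 3)))
```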

make_hist_context(hist, context)

Update context with the context of a Histogram hist.

Deep copy of updated context is returned.

unify_1_md(bins, edges)

Unify 1- and multidimensional bins and edges.

Return a tuple of (bins, edges). Bins and multidimensional edges are returned unchanged, while one-dimensional edges are inserted into a list.