Structures

Histograms:

`histogram`(edges[, bins, initial_value])	A multidimensional histogram.
`Histogram`(edges[, bins, make_bins, …])	An element to produce histograms.
`NumpyHistogram`(*args, **kwargs)	Create a histogram using a 1-dimensional numpy.histogram.

Graph:

`Graph`([points, context, scale, sort])	Function at given coordinates (arbitrary dimensions).
`HistToGraph`([make_value, get_coordinate])	Transform a histogram to a Graph.

Histogram functions:

`HistCell`	A namedtuple with fields edges, bin, index.
`check_edges_increasing`(edges)	Assure that multidimensional edges are increasing.
`get_bin_edges`(index, edges)	Return edges of the bin for the given edges of a histogram.
`get_bin_on_index`(index, bins)	Return bin corresponding to multidimensional index.
`get_bin_on_value`(arg, edges)	Get the bin index for arg in a multidimensional array edges.
`get_bin_on_value_1d`(val, arr)	Return index for value in one-dimensional array.
`hist_to_graph`(hist, context[, make_value, …])	Convert a `histogram` to a `Graph`.
`init_bins`(edges[, value, deepcopy])	Initialize cells of the form edges with the given value.
`integral`(bins, edges)	Compute integral (scale for a histogram).
`iter_bins`(bins)	Iterate on bins.
`iter_cells`(hist[, ranges, coord_ranges])	Iterate cells of a histogram hist, possibly in a subrange.
`make_hist_context`(hist, context)	Update context with the context of a `histogram` hist.
`unify_1_md`(bins, edges)	Unify 1- and multidimensional bins and edges.

Histograms

class histogram(edges, bins=None, initial_value=0)

A multidimensional histogram.

Arbitrary dimension, variable bin size and weights are supported. Lower bin edge is included, upper edge is excluded. Underflow and overflow values are skipped. Bin content can be of arbitrary type, which is defined during initialization.

Examples:

>>> # a two-dimensional histogram
>>> hist = histogram([[0, 1, 2], [0, 1, 2]])
>>> hist.fill([0, 1])
>>> hist.bins
[[0, 1], [0, 0]]
>>> values = [[0, 0], [1, 0], [1, 1]]
>>> # fill the histogram with values
>>> for v in values:
...     hist.fill(v)
>>> hist.bins
[[1, 1], [1, 1]]

edges is a sequence of one-dimensional arrays, each containing strictly increasing bin edges.

Histogram’s bins by default are initialized with initial_value. It can be any object that supports addition with weight during fill (but that is not necessary if you don’t plan to fill the histogram). If the initial_value is compound and requires special copying, create initial bins yourself (see init_bins()).

A histogram can be created from existing bins and edges. In this case a simple check of the shape of bins is done (raising LenaValueError if failed).

Attributes

edges is a list of edges on each dimension. Edges mark the borders of the bin. Edges along each dimension are one-dimensional lists, and the multidimensional bin is the result of all intersections of one-dimensional edges. For example, a 3-dimensional histogram has edges of the form [x_edges, y_edges, z_edges], and the 0th bin has borders ((x[0], x[1]), (y[0], y[1]), (z[0], z[1])).

Index in the edges is a tuple, where a given position corresponds to a dimension, and the content at that position to the bin along that dimension. For example, index (0, 1, 3) corresponds to the bin with lower edges (x[0], y[1], z[3]).

bins is a list of nested lists. The same index as for edges can be used to get bin content: the bin at (0, 1, 3) can be obtained as bins[0][1][3]. The most deeply nested arrays correspond to the highest (furthest from x) coordinates. For example, for a 3-dimensional histogram, bins equal to [[[1, 1], [0, 0]], [[0, 0], [0, 0]]] mean that the only filled bins are those where the x and y indices are 0 and the z index is 0 or 1.

dim is the dimension of a histogram (length of its edges for a multidimensional histogram).

If subarrays of edges are not increasing or if any of them has length less than 2, LenaValueError is raised.

Programmer’s note

One- and multidimensional histograms have different formats of bins and edges. To be unified, 1-dimensional edges would have to be nested in a list (like [[1, 2, 3]]). Instead, they are simply the list of x-edges, because this is more intuitive and one-dimensional histograms are used more often. To unify the interface for bins and edges in your code, use the unify_1_md() function.

__eq__(other)

Two histograms are equal, if and only if they have equal bins and equal edges.

If other is not a histogram, return False.

Note that floating-point numbers should be compared approximately (using math.isclose()).
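A minimal sketch of such an approximate comparison, assuming bins are nested lists of floats. The helper bins_close is hypothetical and not part of Lena:

```python
import math


def bins_close(a, b, rel_tol=1e-9, abs_tol=0.0):
    """Compare two nested lists of numbers approximately (hypothetical helper)."""
    a_nested = isinstance(a, (list, tuple))
    b_nested = isinstance(b, (list, tuple))
    if a_nested and b_nested:
        # same shape and all sub-elements close
        return len(a) == len(b) and all(
            bins_close(x, y, rel_tol, abs_tol) for x, y in zip(a, b)
        )
    if a_nested or b_nested:
        # a number can not be compared with a list
        return False
    return math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)


exact = [[0.3, 1.0], [2.0, 0.0]]
computed = [[0.1 + 0.2, 1.0], [2.0, 0.0]]
print(computed == exact)            # False
print(bins_close(computed, exact))  # True
```

Exact equality fails here only because of rounding in 0.1 + 0.2, which is the situation the note warns about.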

fill(coord, weight=1)

Fill histogram at coord with the given weight.

Coordinates outside the histogram’s edges are ignored.
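The filling rule (lower edge included, upper edge excluded, out-of-range coordinates ignored) can be sketched in plain Python. This is an illustration with the standard bisect module, assuming multidimensional nested-list bins. The function fill_nested is hypothetical and does not show how Lena implements fill:

```python
from bisect import bisect_right


def fill_nested(bins, edges, coord, weight=1):
    """Add weight to the cell containing coord; return False if coord is outside."""
    indices = []
    for x, dim_edges in zip(coord, edges):
        # index of the last edge that is <= x
        i = bisect_right(dim_edges, x) - 1
        if i < 0 or i >= len(dim_edges) - 1:
            # underflow or overflow: the value is skipped
            return False
        indices.append(i)
    cell = bins
    for i in indices[:-1]:
        cell = cell[i]
    cell[indices[-1]] += weight
    return True


edges = [[0, 1, 2], [0, 1, 2]]
bins = [[0, 0], [0, 0]]
fill_nested(bins, edges, [0, 1])   # lands in bins[0][1]
fill_nested(bins, edges, [2, 0])   # x == 2 is the excluded upper edge
fill_nested(bins, edges, [-1, 0])  # underflow
print(bins)  # [[0, 1], [0, 0]]
```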

scale(other=None, recompute=False)

Compute or set scale (integral of the histogram).

If other is None, return scale of this histogram. If its scale was not computed before, it is computed and stored for subsequent use (unless explicitly asked to recompute).

If a float other is provided, rescale to other. A new histogram with the scale equal to other is returned, the original histogram remains unchanged.

Histograms with scale equal to zero can’t be rescaled. LenaValueError is raised if one tries to do that.
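Rescaling can be pictured as multiplying every bin by other / scale and returning new bins. The sketch below assumes numeric nested-list bins and an already known scale. rescale_bins is a hypothetical helper, and a plain ValueError stands in for LenaValueError:

```python
def rescale_bins(bins, scale, other):
    """Return new bins whose integral is other, given the current scale."""
    if scale == 0:
        # a zero scale can not be brought to any other value
        raise ValueError("can not rescale a histogram with zero scale")
    factor = other / scale

    def scaled(sub):
        if isinstance(sub, list):
            return [scaled(s) for s in sub]
        return sub * factor

    return scaled(bins)


bins = [[1, 1], [2, 0]]
# with unit-sized cells the scale of these bins is 1 + 1 + 2 + 0 = 4
new_bins = rescale_bins(bins, scale=4, other=1)
print(new_bins)  # [[0.25, 0.25], [0.5, 0.0]]
print(bins)      # [[1, 1], [2, 0]]
```

The original bins are left untouched, which mirrors the documented behaviour of returning a new histogram.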

class Histogram(edges, bins=None, make_bins=None, initial_value=0, context=None)

An element to produce histograms.

edges, bins and initial_value have the same meaning as during creation of a histogram.

make_bins is a function without arguments that creates new bins (it will be called during __init__() and reset()). In this case initial_value is ignored, but the bins are still checked. If both bins and make_bins are provided, LenaTypeError is raised.

compute()

Yield histogram with context.

context.histogram is updated with histogram’s attributes.

fill(value, weight=1)

Fill the histogram with value with given weight.

value can be a (data, context) pair. Values outside the histogram edges are ignored.

reset()

Reset the histogram.

Current context is reset to an empty dict. Bins are reinitialized with the initial_value or with make_bins (depending on the initialization).

If bins were set explicitly during the initialization, LenaRuntimeError is raised.

class NumpyHistogram(*args, **kwargs)

Create a histogram using a 1-dimensional numpy.histogram.

The result of compute is a Lena histogram, but it is calculated using numpy histogram, and all its initialization arguments are passed to numpy.

Examples

With NumpyHistogram() bins are automatically derived from data.

With NumpyHistogram(bins=list(range(0, 5)), density=True) bins are set explicitly.

Warning

As a numpy histogram is computed from an existing array, all values are stored in the internal data structure during fill, which may take a lot of memory.

Use *args and **kwargs for numpy.histogram initialization.

Default bins keyword argument is auto.

A keyword argument reset specifies the exact behaviour of request.
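The store-then-compute behaviour described above can be sketched as follows. StoredDataHistogram and its compute_counts method are hypothetical names used for illustration; only numpy.histogram itself is a real call, and the sketch returns numpy's (counts, edges) pair rather than a Lena histogram:

```python
import numpy as np


class StoredDataHistogram:
    """Keep all filled values and histogram them on demand (illustration only)."""

    def __init__(self, *args, **kwargs):
        # mirror the documented default of bins="auto"
        kwargs.setdefault("bins", "auto")
        self._args = args
        self._kwargs = kwargs
        self._data = []

    def fill(self, val):
        # every value is kept in memory until the histogram is computed
        self._data.append(val)

    def compute_counts(self):
        return np.histogram(self._data, *self._args, **self._kwargs)


hist = StoredDataHistogram(bins=list(range(0, 5)))
for val in [0.5, 1.5, 1.7, 3.2]:
    hist.fill(val)
counts, edges = hist.compute_counts()
print(counts.tolist())  # [1, 2, 0, 1]
```

The memory cost grows with the number of filled values, which is the reason for the warning above.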

fill(val)

Add data to the internal storage.

request()

Compute the final histogram.

Return histogram with context.

If reset was set during the initialization, reset method is called.

reset()

Reset data and context.

Remove all data for this histogram and set current context to {}.

Graph

class Graph(points=None, context=None, scale=None, sort=True)

Function at given coordinates (arbitrary dimensions).

Graph points can be set during the initialization and during fill(). It can be rescaled (producing a new Graph). A point is a tuple of (coordinate, value), where both coordinate and value can be tuples of numbers. Coordinate corresponds to a point in N-dimensional space, while value is some function’s value at this point (the function can take a value in M-dimensional space). Coordinate and value dimensions must be the same for all points.

One can get graph points as Graph.points attribute. They will be sorted each time before return if sort was set to True. An attempt to change points (use Graph.points on the left of ‘=’) will raise Python’s AttributeError.

points is an array of (coordinate, value) tuples.

context is the same as the most recent context during fill. Use it to provide a context when initializing a Graph from existing points.

scale sets the scale of the graph. It is used during plotting if rescaling is needed.

Graph coordinates are sorted by default. This is usually needed to plot graphs of functions. If you need to keep the order of insertion, set sort to False.

By default, sorting is done using standard Python lists and functions. You can disable sort and provide your own sorting container for points. Some implementations are compared here. Note that a rescaled graph uses a default list.

Note that Graph does not reduce data. All filled values will be stored in it. To reduce data, use histograms.
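The point format and the effect of default sorting can be illustrated with plain tuples and the built-in sorted(). This is only a sketch of the data layout and does not use the Graph class itself:

```python
# each point is (coordinate, value); both parts are tuples of numbers
points = [
    ((2,), (4.0,)),
    ((0,), (0.0,)),
    ((1,), (1.0,)),
]

# tuples compare lexicographically, so points are ordered by coordinate first
sorted_points = sorted(points)
print(sorted_points[0])  # ((0,), (0.0,))
print(points[0])         # ((2,), (4.0,)), insertion order is kept
```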

fill(value)

Fill the graph with value.

Value can be a (data, context) tuple. Data part must be a (coordinates, value) pair, where both coordinates and value are also tuples. For example, value can contain the principal number and its precision.

points

Get graph points (read only).

request()

Yield graph with context.

If sort was initialized True, graph points will be sorted.

reset()

Reset points to an empty list and current context to an empty dict.

scale(other=None)

Get or set the scale.

Graph’s scale comes from an external source. For example, if the graph was computed from a function, this may be its integral passed via context during fill(). Once the scale is set, it is stored in the graph. If one attempts to use scale which was not set, LenaAttributeError is raised.

If other is None, return the scale.

If a float other is provided, rescale to other. A new graph with the scale equal to other is returned, the original one remains unchanged. Note that in this case its points will be a simple list and new graph sort parameter will be True.

Graphs with scale equal to zero can’t be rescaled. Attempts to do that raise LenaValueError.

to_csv(separator=', ', header=None)

Convert graph’s points to CSV.

separator delimits values, the default is comma.

header, if not None, is the first string of the output (new line is added automatically).

Since a graph can be multidimensional, for each point first its coordinate is converted to string (separated by separator), then each part of its value.

To convert Graph to CSV inside a Lena sequence, use lena.output.ToCSV.
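The row layout described above (coordinate parts first, then value parts) can be sketched in plain Python. points_to_csv is a hypothetical stand-in, assuming that both coordinate and value of every point are tuples. It is not the code of Graph.to_csv:

```python
def points_to_csv(points, separator=",", header=None):
    """Join coordinate parts and value parts of each point into one CSV row."""
    lines = [] if header is None else [header]
    for coordinate, value in points:
        row = tuple(coordinate) + tuple(value)
        lines.append(separator.join(str(x) for x in row))
    return "\n".join(lines)


points = [((0, 0), (1.5, 0.1)), ((0, 1), (2.5, 0.2))]
print(points_to_csv(points, header="x,y,value,error"))
# x,y,value,error
# 0,0,1.5,0.1
# 0,1,2.5,0.2
```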

class HistToGraph(make_value=None, get_coordinate='left')

Transform a histogram to a Graph.

make_value is a function that creates the graph’s value from the bin’s value. By default, it is simply the bin value.

get_coordinate defines the coordinate of the graph’s point. By default, it is the left bin edge. Other allowed values are “right” and “middle”. An incorrect value raises LenaValueError during the initialization.

run(flow)

Iterate the flow and transform histograms to graphs.

Values that are not histograms, and histograms whose context has histogram.to_graph set to False, are yielded unchanged.

Histogram functions

Functions for histograms.

These functions are used for low-level work with histograms and their contents. They are not needed for normal usage.

class HistCell

A namedtuple with fields edges, bin, index.

Create new instance of HistCell(edges, bin, index)

check_edges_increasing(edges)

Assure that multidimensional edges are increasing.

If length of edges or its subarray is less than 2 or if some subarray of edges contains not strictly increasing values, LenaValueError is raised.

get_bin_edges(index, edges)

Return edges of the bin for the given edges of a histogram.

In one-dimensional case index must be an integer and a tuple of (x_low_edge, x_high_edge) for that bin is returned.

In a multidimensional case index is a container of numeric indices in each dimension. A list of bin edges in each dimension is returned.
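A plain-Python sketch of both cases follows. bin_edges is a hypothetical helper, and the exact container types returned by get_bin_edges in the multidimensional case are an assumption here:

```python
def bin_edges(index, edges):
    """Return (low, high) for a 1D index, or a list of such pairs otherwise."""
    if isinstance(index, int):
        # one-dimensional case: edges is a flat list
        return (edges[index], edges[index + 1])
    # multidimensional case: one (low, high) pair per dimension
    return [(dim_edges[i], dim_edges[i + 1]) for i, dim_edges in zip(index, edges)]


print(bin_edges(1, [0, 1, 4]))                          # (1, 4)
print(bin_edges((0, 1), [[0, 1, 2], [10, 20, 30]]))     # [(0, 1), (20, 30)]
```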

get_bin_on_index(index, bins)

Return bin corresponding to multidimensional index.

index can be a number or a list/tuple. If index length is less than dimension of bins, a subarray of bins is returned.

In case of an index error, LenaIndexError is raised.

Example:

>>> from lena.structures import histogram, get_bin_on_index
>>> hist = histogram([0, 1], [0])
>>> get_bin_on_index(0, hist.bins)
0
>>> get_bin_on_index((0, 1), [[0, 1], [0, 0]])
1
>>> get_bin_on_index(0, [[0, 1], [0, 0]])
[0, 1]
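The lookup shown in this example amounts to indexing the nested lists one dimension at a time. A sketch with a hypothetical helper bin_on_index, which lets Python's own IndexError propagate instead of raising LenaIndexError:

```python
def bin_on_index(index, bins):
    """Descend into nested bins following index (a number or a sequence)."""
    if isinstance(index, int):
        index = (index,)
    cell = bins
    for i in index:
        cell = cell[i]
    return cell


bins = [[0, 1], [0, 0]]
print(bin_on_index((0, 1), bins))  # 1
print(bin_on_index(0, bins))       # [0, 1], a subarray for a shorter index
```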

get_bin_on_value(arg, edges)

Get the bin index for arg in a multidimensional array edges.

arg is a 1-dimensional array of numbers (or a number for 1-dimensional edges), and corresponds to a point in N-dimensional space.

edges is an array of N one-dimensional arrays (lists or tuples) of numbers. Each one-dimensional subarray consists of increasing numbers.

arg and edges must have the same length (otherwise LenaValueError is raised). arg and edges must be iterable and support len().

Return list of indices in edges corresponding to arg.

If any coordinate is out of its corresponding edge range, its index will be -1 for underflow or len(edge)-1 for overflow.

Examples:

>>> from lena.structures import get_bin_on_value
>>> edges = [[1, 2, 3], [1, 3.5]]
>>> get_bin_on_value((1.5, 2), edges)
[0, 0]
>>> get_bin_on_value((1.5, 0), edges)
[0, -1]
>>> # the upper edge is excluded
>>> get_bin_on_value((3, 2), edges)
[2, 0]
>>> # one-dimensional edges
>>> edges = [1, 2, 3]
>>> get_bin_on_value(2, edges)
[1]

get_bin_on_value_1d(val, arr)

Return index for value in one-dimensional array.

arr must contain strictly increasing values (not necessarily equidistant), it is not checked.

A “linear binary search” is used: by default, the search assumes that the array is split into equidistant steps.

Example:

>>> from lena.structures import get_bin_on_value_1d
>>> arr = [0, 1, 4, 5, 7, 10]
>>> get_bin_on_value_1d(0, arr)
0
>>> get_bin_on_value_1d(4.5, arr)
2
>>> # upper range is excluded
>>> get_bin_on_value_1d(10, arr)
5
>>> # underflow
>>> get_bin_on_value_1d(-10, arr)
-1
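The same index convention can be reproduced with ordinary bisection from the standard library. Note that this swaps in bisect for the search strategy described above; bin_index_1d is a hypothetical helper that only illustrates the returned values:

```python
from bisect import bisect_right


def bin_index_1d(val, arr):
    """Index of the last element of arr that is <= val, or -1 for underflow."""
    return bisect_right(arr, val) - 1


arr = [0, 1, 4, 5, 7, 10]
print(bin_index_1d(0, arr))    # 0
print(bin_index_1d(4.5, arr))  # 2
print(bin_index_1d(10, arr))   # 5, the upper edge is excluded (overflow)
print(bin_index_1d(-10, arr))  # -1, underflow
```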

hist_to_graph(hist, context, make_value=None, get_coordinate='left')

Convert a histogram to a Graph.

context becomes the graph’s context. For example, it can contain a scale.

make_value is a function to set graph point’s value. By default it is bin content. This option could be used to create graph error bars. make_value accepts a single value (bin content), which can contain a context. Define this function depending on the expected data. For example, to create a graph with errors from a histogram where bins contain a named tuple with fields mean, mean_error and a context one could use

>>> make_value = lambda val: (val[0].mean, val[0].mean_error)

get_coordinate defines what will be the coordinate of a graph’s point created from a histogram’s bin. It can be “left” (default), “right” and “middle”.

Return the resulting graph.
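For a one-dimensional histogram the conversion can be sketched in plain Python. hist_1d_to_points is a hypothetical helper that returns a list of (coordinate, value) pairs instead of a Graph, and a plain ValueError stands in for LenaValueError:

```python
def hist_1d_to_points(bins, edges, make_value=None, get_coordinate="left"):
    """Create one (coordinate, value) point per bin of a 1D histogram."""
    if get_coordinate not in ("left", "right", "middle"):
        raise ValueError("get_coordinate must be left, right or middle")
    points = []
    for i, content in enumerate(bins):
        low, high = edges[i], edges[i + 1]
        if get_coordinate == "left":
            coord = low
        elif get_coordinate == "right":
            coord = high
        else:
            coord = (low + high) / 2
        value = content if make_value is None else make_value(content)
        points.append((coord, value))
    return points


bins, edges = [1, 3], [0, 2, 4]
print(hist_1d_to_points(bins, edges))
# [(0, 1), (2, 3)]
print(hist_1d_to_points(bins, edges, get_coordinate="middle"))
# [(1.0, 1), (3.0, 3)]
```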

init_bins(edges, value=0, deepcopy=False)

Initialize cells of the form edges with the given value.

Return bins filled with copies of value.

value must be copyable; ordinary numbers are suitable. If the value is mutable, use deepcopy=True (otherwise the cells will share the same content).

Examples:

>>> edges = [[0, 1], [0, 1]]
>>> # one cell
>>> init_bins(edges)
[[0]]
>>> # no need to use floats,
>>> # because integers will automatically be cast to floats
>>> # when used together
>>> init_bins(edges, 0.0)
[[0.0]]
>>> init_bins([[0, 1, 2], [0, 1, 2]])
[[0, 0], [0, 0]]
>>> init_bins([0, 1, 2])
[0, 0]
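Why deepcopy matters for mutable values can be seen in a plain-Python sketch. init_nested_bins is a hypothetical helper, and the use of a shallow copy when deepcopy is False is an assumption made for this illustration:

```python
import copy


def init_nested_bins(edges, value=0, deepcopy=False):
    """Build nested lists shaped like edges, each cell holding a copy of value."""
    if not isinstance(edges[0], (list, tuple)):
        # one-dimensional edges are a flat list
        edges = [edges]
    make_copy = copy.deepcopy if deepcopy else copy.copy

    def build(dim):
        n_bins = len(edges[dim]) - 1
        if dim == len(edges) - 1:
            return [make_copy(value) for _ in range(n_bins)]
        return [build(dim + 1) for _ in range(n_bins)]

    return build(0)


value = {"entries": []}
shallow = init_nested_bins([0, 1, 2], value)
shallow[0]["entries"].append(1)
print(shallow[1]["entries"])  # [1], the inner list is shared

deep = init_nested_bins([0, 1, 2], value={"entries": []}, deepcopy=True)
deep[0]["entries"].append(1)
print(deep[1]["entries"])     # [], every cell has its own list
```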

integral(bins, edges)

Compute integral (scale for a histogram).

bins contain values, and edges form the mesh for the integration. Their format is defined in histogram description.
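A sketch of such an integration, assuming the integral is the sum of bin contents multiplied by the cell volumes. integral_nd is a hypothetical helper for numeric bins and does not reproduce Lena's code:

```python
def integral_nd(bins, edges):
    """Sum of bin content times cell volume over nested bins."""
    if not isinstance(edges[0], (list, tuple)):
        # one-dimensional edges are a flat list
        edges = [edges]

    def recurse(sub_bins, dim):
        dim_edges = edges[dim]
        total = 0
        for i, content in enumerate(sub_bins):
            width = dim_edges[i + 1] - dim_edges[i]
            if dim < len(edges) - 1:
                # integrate the remaining dimensions first
                content = recurse(content, dim + 1)
            total += width * content
        return total

    return recurse(bins, 0)


print(integral_nd([2, 1], [0, 1, 3]))                   # 2*1 + 1*2 = 4
print(integral_nd([[1], [2]], [[0, 1, 3], [0, 2]]))     # 1*(1*2) + 2*(2*2) = 10
```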

iter_bins(bins)

Iterate on bins. Yield (index, bin content).

Edges with higher index are iterated first (that is z, then y, then x for a 3-dimensional histogram).
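The iteration order, in which the last index changes fastest, can be sketched with a recursive generator. iter_nested is a hypothetical helper that treats every list as a level of nesting, so it does not support bins whose content is itself a list:

```python
def iter_nested(bins, prefix=()):
    """Yield (index, content) for nested-list bins, last index changing fastest."""
    for i, content in enumerate(bins):
        if isinstance(content, list):
            yield from iter_nested(content, prefix + (i,))
        else:
            yield prefix + (i,), content


for index, content in iter_nested([[1, 2], [3, 4]]):
    print(index, content)
# (0, 0) 1
# (0, 1) 2
# (1, 0) 3
# (1, 1) 4
```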

iter_cells(hist, ranges=None, coord_ranges=None)

Iterate cells of a histogram hist, possibly in a subrange.

For each bin, yield a HistCell containing bin edges, bin content and bin index. The order of iteration is the same as for iter_bins().

ranges are the ranges of bin indices to be used for each coordinate (the lower value is included, the upper value is excluded).

coord_ranges set real coordinate ranges based on histogram edges. They need not coincide exactly with bin edges. If one of the ranges for the given coordinate is outside the histogram edges, then only existing histogram edges within the range are selected. If the coordinate range is completely outside histogram edges, nothing is yielded. If a lower or upper coord_range falls within a bin, this bin is yielded. Note that if a coordinate range falls on a bin edge, the number of generated bins can be unstable because of limited float precision.

ranges and coord_ranges are tuples of tuples of limits in corresponding dimensions. For one-dimensional histogram it must be a tuple containing a tuple, for example ((None, None),).

None as an upper or lower range means no limit (((None, None),) is equivalent to ((0, len(bins)),) for a 1-dimensional histogram).

If a range index is lower than 0 or higher than possible index, LenaValueError is raised. If both coord_ranges and ranges are provided, LenaTypeError is raised.
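The handling of index ranges can be sketched for the one-dimensional case. iter_cells_1d and Cell are hypothetical names; the sketch takes a single (low, high) pair rather than a tuple of tuples, ignores coord_ranges, and raises a plain ValueError where Lena would raise LenaValueError:

```python
from collections import namedtuple

Cell = namedtuple("Cell", ["edges", "bin", "index"])


def iter_cells_1d(bins, edges, index_range=(None, None)):
    """Yield cells of a 1D histogram with indices in [low, high)."""
    low, high = index_range
    # None means no limit on that side
    low = 0 if low is None else low
    high = len(bins) if high is None else high
    if low < 0 or high > len(bins):
        raise ValueError("range index is outside the histogram")
    for i in range(low, high):
        yield Cell((edges[i], edges[i + 1]), bins[i], i)


edges = [0, 1, 2, 3]
bins = [5, 6, 7]
for cell in iter_cells_1d(bins, edges, (1, None)):
    print(cell.edges, cell.bin, cell.index)
# (1, 2) 6 1
# (2, 3) 7 2
```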

make_hist_context(hist, context)

Update context with the context of a histogram hist.

Deep copy of updated context is returned.

unify_1_md(bins, edges)

Unify 1- and multidimensional bins and edges.

Return a tuple of (bins, edges). Bins and multidimensional edges are returned unchanged, while one-dimensional edges are inserted into a list.
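The described behaviour can be sketched in a few lines. unify_edges is a hypothetical stand-in that detects one-dimensional edges by checking whether the first element is a list or tuple, which is an assumption about how the two formats can be told apart:

```python
def unify_edges(bins, edges):
    """Wrap flat 1D edges in a list; leave everything else unchanged."""
    if not isinstance(edges[0], (list, tuple)):
        edges = [edges]
    return bins, edges


print(unify_edges([0, 0], [0, 1, 2]))
# ([0, 0], [[0, 1, 2]])
print(unify_edges([[0]], [[0, 1], [0, 1]]))
# ([[0]], [[0, 1], [0, 1]])
```

After this step, code can always use len(edges) as the dimension of the histogram.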
