Structures
Histograms:
histogram(edges[, bins, initial_value]) | A multidimensional histogram.
Histogram(edges[, bins, make_bins, …]) | An element to produce histograms.
NumpyHistogram(*args, **kwargs) | Create a histogram using a 1-dimensional numpy.histogram.
Graph:
Graph([points, context, scale, sort]) | Function at given coordinates (arbitrary dimensions).
HistToGraph([make_value, get_coordinate]) | Transform a histogram to a Graph.
Histogram functions:
HistCell | A namedtuple with fields edges, bin, index.
check_edges_increasing(edges) | Assure that multidimensional edges are increasing.
get_bin_edges(index, edges) | Return edges of the bin for the given edges of a histogram.
get_bin_on_index(index, bins) | Return bin corresponding to multidimensional index.
get_bin_on_value(arg, edges) | Get the bin index for arg in a multidimensional array edges.
get_bin_on_value_1d(val, arr) | Return index for value in one-dimensional array.
hist_to_graph(hist, context[, make_value, …]) | Convert a histogram to a Graph.
init_bins(edges[, value, deepcopy]) | Initialize cells of the form edges with the given value.
integral(bins, edges) | Compute integral (scale for a histogram).
iter_bins(bins) | Iterate on bins.
iter_cells(hist[, ranges, coord_ranges]) | Iterate cells of a histogram hist, possibly in a subrange.
make_hist_context(hist, context) | Update context with the context of a histogram hist.
unify_1_md(bins, edges) | Unify 1- and multidimensional bins and edges.
Histograms
class histogram(edges, bins=None, initial_value=0)
A multidimensional histogram.
Arbitrary dimensions, variable bin sizes and weights are supported. The lower bin edge is included, the upper edge is excluded. Underflow and overflow values are skipped. Bin content can be of arbitrary type, which is defined during initialization.
Examples:
>>> from lena.structures import histogram
>>> # a two-dimensional histogram
>>> hist = histogram([[0, 1, 2], [0, 1, 2]])
>>> hist.fill([0, 1])
>>> hist.bins
[[0, 1], [0, 0]]
>>> values = [[0, 0], [1, 0], [1, 1]]
>>> # fill the histogram with values
>>> for v in values:
...     hist.fill(v)
>>> hist.bins
[[1, 1], [1, 1]]
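The fill rules above (lower edge included, upper edge excluded, out-of-range points skipped) can be illustrated without Lena. The following is a minimal sketch in plain Python, assuming two dimensions and numeric bins; make_bins_2d and fill_2d are hypothetical helpers, not part of lena.structures.

```python
import bisect


def make_bins_2d(x_edges, y_edges, initial_value=0):
    # bins[x][y]: one inner list per x bin, one cell per y bin
    # (list repetition is fine here because numbers are immutable)
    return [[initial_value] * (len(y_edges) - 1) for _ in range(len(x_edges) - 1)]


def fill_2d(bins, edges, coord, weight=1):
    indices = []
    for value, edge in zip(coord, edges):
        # bisect_right(...) - 1 finds the bin whose lower edge is <= value,
        # so the lower edge is included and the upper edge excluded
        i = bisect.bisect_right(edge, value) - 1
        if i < 0 or i >= len(edge) - 1:
            return  # underflow or overflow: the point is skipped
        indices.append(i)
    x, y = indices
    bins[x][y] += weight


edges = [[0, 1, 2], [0, 1, 2]]
bins = make_bins_2d(*edges)
# the last two points lie outside the edges and are ignored
for point in [[0, 1], [0, 0], [1, 0], [1, 1], [2, 0], [-1, 0]]:
    fill_2d(bins, edges, point)
print(bins)  # [[1, 1], [1, 1]]
```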
edges is a sequence of one-dimensional arrays, each containing strictly increasing bin edges.
By default, the histogram’s bins are initialized with initial_value. It can be any object that supports addition with weight during fill (this is not necessary if you don’t plan to fill the histogram). If initial_value is compound and requires special copying, create the initial bins yourself (see init_bins()).
A histogram can be created from existing bins and edges. In this case only a simple check of the shape of bins is done (LenaValueError is raised if it fails).
Attributes
edges is a list of edges along each dimension. Edges mark the borders of the bins. Edges along each dimension are one-dimensional lists, and multidimensional bins are formed by all intersections of these one-dimensional edges. For example, a 3-dimensional histogram has edges of the form [x_edges, y_edges, z_edges], and the bin with index (0, 0, 0) has borders ((x[0], x[1]), (y[0], y[1]), (z[0], z[1])).
An index into edges is a tuple, where each position corresponds to a dimension, and the number at that position to the bin along that dimension. For example, index (0, 1, 3) corresponds to the bin with lower edges (x[0], y[1], z[3]).
bins is a list of nested lists. The same index as for edges can be used to get bin content: the bin at (0, 1, 3) can be obtained as bins[0][1][3]. The most deeply nested arrays correspond to the highest coordinates (those furthest from x). For example, for a 3-dimensional histogram, bins equal to [[[1, 1], [0, 0]], [[0, 0], [0, 0]]] mean that the only filled bins are those where the x and y indices are 0 and the z index is 0 or 1.
dim is the dimension of a histogram (the length of its edges for a multidimensional histogram).
If subarrays of edges are not increasing or if any of them has length less than 2, LenaValueError is raised.
Programmer’s note
One- and multidimensional histograms have different formats of bins and edges. For uniformity, 1-dimensional edges would be nested in a list (like [[1, 2, 3]]). Instead, they are simply the list of x-edges, because this is more intuitive and one-dimensional histograms are used more often. To unify the interface for bins and edges in your code, use the unify_1_md() function.
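A minimal sketch of the two conventions just described, using hypothetical helpers: unify_edges is not Lena's unify_1_md() (whose exact return value is not shown here), it only wraps one-dimensional edges in a list to get a uniform shape. The second part walks the nested bins of the 3-dimensional example to list its filled cells.

```python
def unify_edges(edges):
    """Hypothetical helper: return edges in multidimensional form."""
    if edges and isinstance(edges[0], (list, tuple)):
        return edges  # already a list of per-dimension edges
    return [edges]  # one-dimensional edges like [1, 2, 3]


def dim(edges):
    # the dimension is the number of per-dimension edge lists
    return len(unify_edges(edges))


print(unify_edges([1, 2, 3]))       # [[1, 2, 3]]
print(dim([1, 2, 3]))               # 1
print(dim([[0, 1, 2], [0, 1, 2]]))  # 2

# bins of the 3-dimensional example, indexed as bins[x][y][z]
bins = [[[1, 1], [0, 0]], [[0, 0], [0, 0]]]
filled = [(i, j, k)
          for i, plane in enumerate(bins)
          for j, row in enumerate(plane)
          for k, content in enumerate(row)
          if content]
print(filled)  # [(0, 0, 0), (0, 0, 1)]
```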
__eq__(other)
Two histograms are equal if and only if they have equal bins and equal edges.
If other is not a histogram, return False.
Note that floating-point numbers should be compared approximately (using math.isclose()).
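Because exact equality of bins can fail for floats, an approximate comparison may be preferable. This sketch assumes bins are nested Python lists of numbers; bins_close is a hypothetical helper built on math.isclose(), and it can be applied to edges in the same way.

```python
import math


def bins_close(bins1, bins2, rel_tol=1e-9, abs_tol=0.0):
    """Hypothetical helper: compare nested lists of numbers approximately."""
    if isinstance(bins1, list) and isinstance(bins2, list):
        return len(bins1) == len(bins2) and all(
            bins_close(b1, b2, rel_tol, abs_tol) for b1, b2 in zip(bins1, bins2))
    if isinstance(bins1, list) or isinstance(bins2, list):
        return False  # the shapes differ
    return math.isclose(bins1, bins2, rel_tol=rel_tol, abs_tol=abs_tol)


a = [[0.1 + 0.2, 1.0], [0.0, 0.0]]
b = [[0.3, 1.0], [0.0, 0.0]]
print(a == b)            # False: 0.1 + 0.2 != 0.3 exactly
print(bins_close(a, b))  # True
```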
fill(coord, weight=1)
Fill the histogram at coord with the given weight.
Coordinates outside the histogram’s edges are ignored.
scale(other=None, recompute=False)
Compute or set scale (integral of the histogram).
If other is None, return the scale of this histogram. If its scale was not computed before, it is computed and stored for subsequent use (unless explicitly asked to recompute).
If a float other is provided, rescale to other. A new histogram with the scale equal to other is returned; the original histogram remains unchanged.
Histograms with scale equal to zero can’t be rescaled. LenaValueError is raised if one tries to do that.
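A minimal sketch of the scale() behaviour for a one-dimensional histogram, assuming that the scale is the sum of bin contents times bin widths and that rescaling multiplies every bin by other / scale. ScaledBinsSketch is hypothetical, and ValueError stands in for LenaValueError.

```python
class ScaledBinsSketch:
    """Hypothetical 1-dimensional sketch: the scale is computed lazily,
    cached, and rescaling returns a new object."""

    def __init__(self, edges, bins):
        self.edges = edges
        self.bins = bins
        self._scale = None

    def scale(self, other=None, recompute=False):
        if other is None:
            if self._scale is None or recompute:
                # integral: bin content times bin width, summed over bins
                self._scale = sum(b * (self.edges[i + 1] - self.edges[i])
                                  for i, b in enumerate(self.bins))
            return self._scale
        scale = self.scale()
        if scale == 0:
            raise ValueError("can't rescale a histogram with zero scale")
        factor = other / scale
        # a new object is returned; self is left unchanged
        return ScaledBinsSketch(self.edges, [b * factor for b in self.bins])


hist = ScaledBinsSketch([0, 1, 3], [2, 1])
print(hist.scale())      # 4
new_hist = hist.scale(1)
print(new_hist.bins)     # [0.5, 0.25]
print(new_hist.scale())  # 1.0
print(hist.bins)         # [2, 1]: the original is unchanged
```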
class Histogram(edges, bins=None, make_bins=None, initial_value=0, context=None)
An element to produce histograms.
edges, bins and initial_value have the same meaning as during creation of a histogram.
make_bins is a function without arguments that creates new bins (it will be called during __init__() and reset()). In this case initial_value is ignored, but the resulting bins are still checked. If both bins and make_bins are provided, LenaTypeError is raised.
compute()
Yield the histogram with context.
context.histogram is updated with the histogram’s attributes.
fill(value, weight=1)
Fill the histogram with value with the given weight.
value can be a (data, context) pair. Values outside the histogram edges are ignored.
reset()
Reset the histogram.
The current context is reset to an empty dict. Bins are reinitialized with initial_value or with make_bins (depending on the initialization).
If bins were set explicitly during the initialization, LenaRuntimeError is raised.
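The interplay of bins, make_bins and reset() can be sketched as follows. HistogramElementSketch is a hypothetical, one-dimensional class; built-in TypeError and RuntimeError stand in for LenaTypeError and LenaRuntimeError, and no bin shape check is done.

```python
import copy


class HistogramElementSketch:
    """Hypothetical 1-dimensional sketch of bins / make_bins / reset()."""

    def __init__(self, edges, bins=None, make_bins=None, initial_value=0):
        if bins is not None and make_bins is not None:
            raise TypeError("provide bins or make_bins, not both")
        self.edges = edges
        self._bins_set_explicitly = bins is not None
        self._make_bins = make_bins
        self._initial_value = initial_value
        self.context = {}
        self.bins = bins if bins is not None else self._new_bins()

    def _new_bins(self):
        if self._make_bins is not None:
            # initial_value is ignored when make_bins is given
            return self._make_bins()
        return [copy.copy(self._initial_value) for _ in range(len(self.edges) - 1)]

    def reset(self):
        if self._bins_set_explicitly:
            raise RuntimeError("explicitly set bins can't be reinitialized")
        self.context = {}
        self.bins = self._new_bins()


hist = HistogramElementSketch([0, 1, 2], make_bins=lambda: [0.0, 0.0])
hist.bins[0] += 1
hist.reset()
print(hist.bins)  # [0.0, 0.0]
```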
class NumpyHistogram(*args, **kwargs)
Create a histogram using a 1-dimensional numpy.histogram.
The result of compute is a Lena histogram, but it is calculated using numpy histogram, and all its initialization arguments are passed to numpy.
Examples
With NumpyHistogram(), bins are automatically derived from data.
With NumpyHistogram(bins=list(range(0, 5)), density=True), bins are set explicitly.
Warning
As numpy histogram is computed from an existing array, all values are stored in the internal data structure during fill, which may take a lot of memory.
Use *args and **kwargs for numpy.histogram initialization.
The default bins keyword argument is auto.
A keyword argument reset specifies the exact behaviour of request.
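The memory trade-off in the warning comes from storing every value until the histogram is computed. A minimal sketch of that pattern with numpy.histogram follows; NumpyHistogramSketch is hypothetical and returns plain lists rather than a Lena histogram.

```python
import numpy as np


class NumpyHistogramSketch:
    """Hypothetical sketch of the NumpyHistogram idea, not Lena's class."""

    def __init__(self, *args, **kwargs):
        if not args:
            # mirror the documented default unless bins are given positionally
            kwargs.setdefault("bins", "auto")
        self._args = args
        self._kwargs = kwargs
        # every filled value is kept, so memory grows with the data size
        self._data = []

    def fill(self, value):
        self._data.append(value)

    def compute(self):
        # numpy.histogram returns (counts, bin_edges);
        # bin_edges has one more element than counts
        counts, edges = np.histogram(self._data, *self._args, **self._kwargs)
        return counts.tolist(), edges.tolist()


hist = NumpyHistogramSketch(bins=list(range(0, 5)))
for x in [0.5, 1.5, 1.7, 3.2]:
    hist.fill(x)
counts, edges = hist.compute()
print(counts)  # [1, 2, 0, 1]
```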
Graph
class Graph(points=None, context=None, scale=None, sort=True)
Function at given coordinates (arbitrary dimensions).
Graph points can be set during the initialization and during fill(). A graph can be rescaled (producing a new Graph). A point is a tuple of (coordinate, value), where both coordinate and value can be tuples of numbers. The coordinate corresponds to a point in N-dimensional space, while the value is some function’s value at this point (the function’s values can lie in M-dimensional space). Coordinate and value dimensions must be the same for all points.
One can get graph points as the Graph.points attribute. They will be sorted each time before return if sort was set to True. An attempt to change points (using Graph.points on the left of ‘=’) will raise Python’s AttributeError.
points is an array of (coordinate, value) tuples.
context is the same as the most recent context during fill. Use it to provide a context when initializing a Graph from existing points.
scale sets the scale of the graph. It is used during plotting if rescaling is needed.
Graph coordinates are sorted by default. This is usually needed to plot graphs of functions. If you need to keep the order of insertion, set sort to False.
By default, sorting is done using standard Python lists and functions. You can disable sort and provide your own sorting container for points. Some implementations are compared here. Note that a rescaled graph uses a default list.
Note that Graph does not reduce data. All filled values will be stored in it. To reduce data, use histograms.
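Read-only, sorted-on-access points can be modelled with a Python property that has no setter. This GraphSketch class is a hypothetical illustration, not Lena's Graph; it sorts points with list.sort(), that is, by coordinate first.

```python
class GraphSketch:
    """Hypothetical sketch of read-only, optionally sorted graph points."""

    def __init__(self, points=None, sort=True):
        self._points = list(points) if points is not None else []
        self._sort = sort

    def fill(self, point):
        # point is a (coordinate, value) pair; all points are kept
        self._points.append(point)

    @property
    def points(self):
        # sorted on every access if requested; tuples compare by coordinate first
        if self._sort:
            self._points.sort()
        return self._points


graph = GraphSketch()
graph.fill(((2,), (4,)))
graph.fill(((1,), (1,)))
print(graph.points)  # [((1,), (1,)), ((2,), (4,))]
try:
    graph.points = []  # a property without a setter
except AttributeError:
    print("points are read-only")
```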
fill(value)
Fill the graph with value.
value can be a (data, context) tuple. The data part must be a (coordinates, value) pair, where both coordinates and value are also tuples. For example, value can contain the principal number and its precision.
points
Get graph points (read only).
request()
Yield the graph with context.
If sort was set to True during initialization, graph points will be sorted.
scale(other=None)
Get or set the scale.
Graph’s scale comes from an external source. For example, if the graph was computed from a function, this may be its integral passed via context during fill(). Once the scale is set, it is stored in the graph. If one attempts to use a scale which was not set, LenaAttributeError is raised.
If other is None, return the scale.
If a float other is provided, rescale to other. A new graph with the scale equal to other is returned; the original one remains unchanged. Note that in this case its points will be a simple list and the new graph’s sort parameter will be True.
Graphs with scale equal to zero can’t be rescaled. Attempts to do that raise LenaValueError.
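Rescaling can be sketched as multiplying each value by other / scale, producing new points and leaving the old ones intact. This is an assumption about how values are rescaled (every part of the value, including an error, is multiplied); rescale_points is hypothetical, and AttributeError and ValueError stand in for the Lena exceptions.

```python
def rescale_points(points, scale, other):
    """Hypothetical sketch: return new points with every value part
    multiplied by other / scale; the input points are not modified."""
    if scale is None:
        raise AttributeError("scale was not set")
    if scale == 0:
        raise ValueError("can't rescale a graph with zero scale")
    factor = other / scale
    return [(coord, tuple(part * factor for part in value))
            for coord, value in points]


points = [((0,), (2.0,)), ((1,), (4.0,))]
print(rescale_points(points, scale=2.0, other=1.0))
# [((0,), (1.0,)), ((1,), (2.0,))]
```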
to_csv(separator=', ', header=None)
Convert the graph’s points to CSV.
separator delimits values; the default is a comma followed by a space.
header, if not None, is the first string of the output (a new line is added automatically).
Since a graph can be multidimensional, for each point its coordinate is converted to a string first (with parts separated by separator), then each part of its value.
To convert a Graph to CSV inside a Lena sequence, use lena.output.ToCSV.
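The CSV layout described above (coordinate parts, then value parts, one line per point) can be sketched as follows. points_to_csv is a hypothetical helper; whether Lena adds a trailing newline after the last point is not stated here, so this sketch adds none.

```python
def points_to_csv(points, separator=", ", header=None):
    """Hypothetical sketch: one line per point, coordinate parts first,
    then value parts, all joined with separator."""
    lines = [] if header is None else [header]
    for coord, value in points:
        lines.append(separator.join(str(part) for part in (*coord, *value)))
    return "\n".join(lines)


points = [((0, 1), (2.5, 0.1)), ((1, 1), (3.0, 0.2))]
print(points_to_csv(points, header="x, y, z, z_err"))
# x, y, z, z_err
# 0, 1, 2.5, 0.1
# 1, 1, 3.0, 0.2
```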
class HistToGraph(make_value=None, get_coordinate='left')
Transform a histogram to a Graph.
make_value is a function that creates the graph’s value from the bin’s value. By default, it is simply the bin value.
get_coordinate defines the coordinate of the graph’s point. By default, it is the left bin edge. Other allowed values are “right” and “middle”. An incorrect value raises LenaValueError during the initialization.
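The three get_coordinate options map a bin's (low, high) edges to a single coordinate. A minimal one-dimensional sketch with hypothetical helpers bin_coordinate and hist_1d_to_points; ValueError stands in for LenaValueError, and the default make_value returns the bin content unchanged.

```python
def bin_coordinate(low, high, get_coordinate="left"):
    """Hypothetical helper: pick a graph coordinate from bin edges."""
    if get_coordinate == "left":
        return low
    if get_coordinate == "right":
        return high
    if get_coordinate == "middle":
        return (low + high) / 2
    raise ValueError("get_coordinate must be 'left', 'right' or 'middle'")


def hist_1d_to_points(bins, edges, make_value=None, get_coordinate="left"):
    # validate the option before converting anything
    bin_coordinate(0, 1, get_coordinate)
    if make_value is None:
        make_value = lambda content: content
    return [((bin_coordinate(edges[i], edges[i + 1], get_coordinate),),
             make_value(content))
            for i, content in enumerate(bins)]


print(hist_1d_to_points([3, 5], [0, 1, 3], get_coordinate="middle"))
# [((0.5,), 3), ((2.0,), 5)]
```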
Histogram functions
Functions for histograms.
These functions are used for low-level work with histograms and their contents. They are not needed for normal usage.
class HistCell
A namedtuple with fields edges, bin, index.
Create a new instance of HistCell(edges, bin, index).
check_edges_increasing(edges)
Assure that multidimensional edges are increasing.
If the length of edges or of any of its subarrays is less than 2, or if some subarray of edges contains values that are not strictly increasing, LenaValueError is raised.
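A minimal re-implementation of the subarray checks described above, for illustration only: check_edges_increasing_sketch is hypothetical, treats flat edges as a single subarray, and raises ValueError in place of LenaValueError.

```python
def check_edges_increasing_sketch(edges):
    """Hypothetical check; one-dimensional edges count as one subarray."""
    if not edges or not isinstance(edges[0], (list, tuple)):
        edges = [edges]
    for subarr in edges:
        if len(subarr) < 2:
            raise ValueError("edges must contain at least 2 values")
        # every neighbouring pair must be strictly increasing
        if any(a >= b for a, b in zip(subarr, subarr[1:])):
            raise ValueError("edges must be strictly increasing: {}".format(subarr))


check_edges_increasing_sketch([[0, 1, 2], [0, 0.5]])  # passes silently
try:
    check_edges_increasing_sketch([[0, 1, 1]])
except ValueError as err:
    print(err)
```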
get_bin_edges(index, edges)
Return edges of the bin for the given edges of a histogram.
In the one-dimensional case, index must be an integer, and a tuple of (x_low_edge, x_high_edge) for that bin is returned.
In the multidimensional case, index is a container of numeric indices in each dimension. A list of bin edges in each dimension is returned.
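The two return shapes can be sketched in a few lines. get_bin_edges_sketch is a hypothetical helper that mirrors the description; returning each dimension's edges as a (low, high) tuple is an assumption of this sketch.

```python
def get_bin_edges_sketch(index, edges):
    """Hypothetical helper mirroring the description of get_bin_edges()."""
    if isinstance(index, int):
        # one-dimensional case: edges is a flat list of numbers
        return (edges[index], edges[index + 1])
    # multidimensional case: one (low, high) pair per dimension
    return [(edges[d][i], edges[d][i + 1]) for d, i in enumerate(index)]


print(get_bin_edges_sketch(1, [0, 1, 4]))                     # (1, 4)
print(get_bin_edges_sketch((0, 1), [[0, 1, 2], [0, 5, 10]]))  # [(0, 1), (5, 10)]
```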
get_bin_on_index(index, bins)
Return bin corresponding to multidimensional index.
index can be a number or a list/tuple. If the index length is less than the dimension of bins, a subarray of bins is returned.
In case of an index error, LenaIndexError is raised.
Example:
>>> from lena.structures import histogram, get_bin_on_index
>>> hist = histogram([0, 1], [0])
>>> get_bin_on_index(0, hist.bins)
0
>>> get_bin_on_index((0, 1), [[0, 1], [0, 0]])
1
>>> get_bin_on_index(0, [[0, 1], [0, 0]])
[0, 1]
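Descending one nesting level per index element explains why a shorter index returns a subarray. get_bin_on_index_sketch is hypothetical; here Python's own IndexError propagates instead of LenaIndexError.

```python
def get_bin_on_index_sketch(index, bins):
    """Hypothetical helper: descend one nesting level per index element."""
    if isinstance(index, int):
        index = (index,)
    subarr = bins
    for i in index:
        subarr = subarr[i]  # may raise IndexError (LenaIndexError in Lena)
    # an index shorter than the dimension stops at a subarray
    return subarr


bins = [[0, 1], [0, 0]]
print(get_bin_on_index_sketch((0, 1), bins))  # 1
print(get_bin_on_index_sketch(0, bins))       # [0, 1]
```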
get_bin_on_value(arg, edges)
Get the bin index for arg in a multidimensional array edges.
arg is a 1-dimensional array of numbers (or a number for 1-dimensional edges), and corresponds to a point in N-dimensional space.
edges is an array of N one-dimensional arrays (lists or tuples) of numbers. Each 1-dimensional subarray consists of increasing numbers.
arg and edges must have the same length (otherwise LenaValueError is raised). arg and edges must be iterable and support len().
Return a list of indices in edges corresponding to arg.
If any coordinate is out of its corresponding edge range, its index will be -1 for underflow or len(edge)-1 for overflow.
Examples:
>>> from lena.structures import get_bin_on_value
>>> edges = [[1, 2, 3], [1, 3.5]]
>>> get_bin_on_value((1.5, 2), edges)
[0, 0]
>>> get_bin_on_value((1.5, 0), edges)
[0, -1]
>>> # the upper edge is excluded
>>> get_bin_on_value((3, 2), edges)
[2, 0]
>>> # one-dimensional edges
>>> edges = [1, 2, 3]
>>> get_bin_on_value(2, edges)
[1]
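The indexing rule (lower edge included, -1 for underflow, len(edge)-1 for overflow) coincides with bisect.bisect_right(edge, value) - 1 from the standard library. The sketch below, get_bin_on_value_sketch, is a hypothetical equivalent built on that observation, not Lena's code; ValueError stands in for LenaValueError.

```python
import bisect


def get_bin_on_value_sketch(arg, edges):
    """Hypothetical equivalent using bisect_right on each dimension."""
    if not isinstance(arg, (list, tuple)):
        # one-dimensional case: a number and flat edges
        arg, edges = (arg,), (edges,)
    if len(arg) != len(edges):
        raise ValueError("arg and edges must have the same length")
    # bisect_right - 1: lower edge included, upper edge excluded;
    # gives -1 below the first edge and len(edge) - 1 at or above the last
    return [bisect.bisect_right(edge, value) - 1 for value, edge in zip(arg, edges)]


edges = [[1, 2, 3], [1, 3.5]]
print(get_bin_on_value_sketch((1.5, 2), edges))  # [0, 0]
print(get_bin_on_value_sketch((1.5, 0), edges))  # [0, -1]
print(get_bin_on_value_sketch((3, 2), edges))    # [2, 0]
print(get_bin_on_value_sketch(2, [1, 2, 3]))     # [1]
```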
get_bin_on_value_1d(val, arr)
Return index for value in one-dimensional array.
arr must contain strictly increasing values (not necessarily equidistant); this is not checked.
A “linear binary search” is used: the search first assumes that the array is split into equidistant steps.
Example:
>>> from lena.structures import get_bin_on_value_1d
>>> arr = [0, 1, 4, 5, 7, 10]
>>> get_bin_on_value_1d(0, arr)
0
>>> get_bin_on_value_1d(4.5, arr)
2
>>> # upper range is excluded
>>> get_bin_on_value_1d(10, arr)
5
>>> # underflow
>>> get_bin_on_value_1d(-10, arr)
-1
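One reading of “linear binary search” is an interpolation search: guess the index as if the edges were equidistant, then narrow the interval around it. The sketch below implements that technique under this assumption; get_bin_on_value_1d_sketch is hypothetical, and Lena's actual algorithm may differ in details.

```python
def get_bin_on_value_1d_sketch(val, arr):
    """Hypothetical interpolation search over strictly increasing arr.

    Returns i with arr[i] <= val < arr[i + 1],
    -1 for underflow and len(arr) - 1 for overflow.
    """
    if val < arr[0]:
        return -1
    if val >= arr[-1]:
        return len(arr) - 1
    low, high = 0, len(arr) - 1  # invariant: arr[low] <= val < arr[high]
    while high - low > 1:
        # guess the position as if the edges between low and high were equidistant
        guess = low + int((val - arr[low]) / (arr[high] - arr[low]) * (high - low))
        # keep the guess strictly inside (low, high) so the interval always shrinks
        guess = min(max(guess, low + 1), high - 1)
        if arr[guess] <= val:
            low = guess
        else:
            high = guess
    return low


arr = [0, 1, 4, 5, 7, 10]
print([get_bin_on_value_1d_sketch(v, arr) for v in (0, 4.5, 10, -10)])
# [0, 2, 5, -1]
```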
hist_to_graph(hist, context, make_value=None, get_coordinate='left')
Convert a histogram to a Graph.
context becomes the graph’s context. For example, it can contain a scale.
make_value is a function to set the graph point’s value. By default it is the bin content. This option could be used to create graph error bars. make_value accepts a single value (bin content), which can contain a context. Define this function depending on the expected data. For example, to create a graph with errors from a histogram whose bins contain (data, context) pairs, where data is a named tuple with fields mean and mean_error, one could use
>>> make_value = lambda val: (val[0].mean, val[0].mean_error)
get_coordinate defines what will be the coordinate of a graph’s point created from a histogram’s bin. It can be “left” (default), “right” or “middle”.
Return the resulting graph.
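The make_value example above can be exercised on hand-made bins. This sketch assumes bins hold (data, context) pairs whose data is a namedtuple with mean and mean_error fields; Stats and the bin values are hypothetical, and coordinates are taken from left bin edges.

```python
from collections import namedtuple

# hypothetical bin content: (data, context) pairs
Stats = namedtuple("Stats", ["mean", "mean_error"])
bins = [(Stats(1.0, 0.1), {"name": "a"}), (Stats(2.0, 0.2), {"name": "a"})]
edges = [0, 1, 2]

# val[0] is the data part of each (data, context) pair
make_value = lambda val: (val[0].mean, val[0].mean_error)

# left bin edges as coordinates, as with get_coordinate="left"
points = [((edges[i],), make_value(content)) for i, content in enumerate(bins)]
print(points)  # [((0,), (1.0, 0.1)), ((1,), (2.0, 0.2))]
```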
init_bins(edges, value=0, deepcopy=False)
Initialize cells of the form given by edges with the given value.
Return bins filled with copies of value.
value must be copyable; usual numbers will suit. If the value is mutable, use deepcopy=True (otherwise the content of the cells will be identical).
Examples:
>>> from lena.structures import init_bins
>>> edges = [[0, 1], [0, 1]]
>>> # one cell
>>> init_bins(edges)
[[0]]
>>> # no need to use floats,
>>> # because integers will automatically be cast to floats
>>> # when used together
>>> init_bins(edges, 0.0)
[[0.0]]
>>> init_bins([[0, 1, 2], [0, 1, 2]])
[[0, 0], [0, 0]]
>>> init_bins([0, 1, 2])
[0, 0]
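Why deepcopy matters can be seen with a value that contains a mutable object. This sketch assumes that without deepcopy each cell receives a shallow copy made with copy.copy(); init_bins_sketch is hypothetical.

```python
import copy


def init_bins_sketch(edges, value=0, deepcopy=False):
    """Hypothetical re-implementation for nested-list bins."""
    if not isinstance(edges[0], (list, tuple)):
        edges = [edges]  # one-dimensional edges
    copy_fn = copy.deepcopy if deepcopy else copy.copy

    def build(dims):
        n = len(dims[0]) - 1  # number of bins along this dimension
        if len(dims) == 1:
            return [copy_fn(value) for _ in range(n)]
        return [build(dims[1:]) for _ in range(n)]

    return build(list(edges))


# a shallow copy of a dict still shares the inner list
shallow = init_bins_sketch([0, 1, 2], value={"items": []})
shallow[0]["items"].append(1)
print(shallow[1]["items"])  # [1]
deep = init_bins_sketch([0, 1, 2], value={"items": []}, deepcopy=True)
deep[0]["items"].append(1)
print(deep[1]["items"])     # []
```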
integral(bins, edges)
Compute integral (scale for a histogram).
bins contain values, and edges form the mesh for the integration. Their format is defined in the histogram description.
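The integral can be sketched as the sum of bin contents times bin volumes (products of bin widths along each dimension). integral_sketch is a hypothetical, recursive helper for nested-list bins; that Lena's integral() follows exactly this convention is an assumption.

```python
def integral_sketch(bins, edges):
    """Hypothetical: sum of bin content times bin volume."""
    if not isinstance(edges[0], (list, tuple)):
        edges = [edges]  # one-dimensional edges

    def integrate(subbins, dim):
        total = 0
        for i, content in enumerate(subbins):
            width = edges[dim][i + 1] - edges[dim][i]
            if dim == len(edges) - 1:
                total += content * width
            else:
                # integrate the remaining dimensions, then multiply by this width
                total += integrate(content, dim + 1) * width
        return total

    return integrate(bins, 0)


print(integral_sketch([2, 1], [0, 1, 3]))                             # 4
print(integral_sketch([[1, 2], [3, 4]], [[0, 1, 3], [0, 0.5, 1]]))  # 8.5
```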
iter_bins(bins)
Iterate on bins. Yield (index, bin content).
Edges with higher index are iterated first (that is z, then y, then x for a 3-dimensional histogram).
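A recursive generator reproduces an order in which the last index changes fastest, which is one reading of the sentence above. iter_bins_sketch is hypothetical and assumes that bin contents are not themselves lists.

```python
def iter_bins_sketch(bins):
    """Hypothetical recursive iteration over nested-list bins.

    Yields (index, content); the last index varies fastest, so for a
    3-dimensional histogram z changes first, then y, then x.
    """
    for i, sub in enumerate(bins):
        if isinstance(sub, list):
            for sub_index, content in iter_bins_sketch(sub):
                yield (i,) + sub_index, content
        else:
            yield (i,), sub


print(list(iter_bins_sketch([[1, 2], [3, 4]])))
# [((0, 0), 1), ((0, 1), 2), ((1, 0), 3), ((1, 1), 4)]
```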
iter_cells(hist, ranges=None, coord_ranges=None)
Iterate cells of a histogram hist, possibly in a subrange.
For each bin, yield a HistCell containing bin edges, bin content and bin index. The order of iteration is the same as for iter_bins().
ranges are the ranges of bin indices to be used for each coordinate (the lower value is included, the upper value is excluded).
coord_ranges set real coordinate ranges based on histogram edges. They need not coincide with bin edges. If one of the ranges for the given coordinate is outside the histogram edges, then only existing histogram edges within the range are selected. If the coordinate range is completely outside histogram edges, nothing is yielded. If a lower or upper coord_range falls within a bin, this bin is yielded. Note that if a coordinate range falls on a bin edge, the number of generated bins can be unstable because of limited float precision.
ranges and coord_ranges are tuples of tuples of limits in corresponding dimensions. For a one-dimensional histogram it must be a tuple containing a tuple, for example ((None, None),). None as an upper or lower range means no limit (((None, None),) is equivalent to ((0, len(bins)),) for a 1-dimensional histogram).
If a range index is lower than 0 or higher than the maximum possible index, LenaValueError is raised. If both coord_ranges and ranges are provided, LenaTypeError is raised.
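A one-dimensional sketch of the ranges and coord_ranges semantics, where a coordinate range keeps every bin that overlaps it. HistCellSketch and iter_cells_1d_sketch are hypothetical; TypeError and ValueError stand in for LenaTypeError and LenaValueError, and because this is a generator, the errors appear on the first iteration.

```python
from collections import namedtuple

HistCellSketch = namedtuple("HistCellSketch", ["edges", "bin", "index"])


def iter_cells_1d_sketch(edges, bins, ranges=None, coord_ranges=None):
    """Hypothetical 1-dimensional sketch of iter_cells()."""
    if ranges is not None and coord_ranges is not None:
        raise TypeError("provide either ranges or coord_ranges")
    low, high = 0, len(bins)
    if ranges is not None:
        r_low, r_high = ranges[0]
        low = 0 if r_low is None else r_low
        high = len(bins) if r_high is None else r_high  # upper index excluded
        if low < 0 or high > len(bins):
            raise ValueError("range index out of bounds")
    elif coord_ranges is not None:
        c_low, c_high = coord_ranges[0]
        # keep every bin that overlaps the coordinate range
        keep = [i for i in range(len(bins))
                if (c_low is None or edges[i + 1] > c_low)
                and (c_high is None or edges[i] < c_high)]
        if not keep:
            return  # the range lies completely outside the edges
        low, high = keep[0], keep[-1] + 1
    for i in range(low, high):
        yield HistCellSketch((edges[i], edges[i + 1]), bins[i], i)


edges, bins = [0, 1, 2, 3, 4], [10, 20, 30, 40]
cells = iter_cells_1d_sketch(edges, bins, coord_ranges=((1.5, 2.5),))
print([cell.index for cell in cells])  # [1, 2]
```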