Structures
Histograms:
histogram | A multidimensional histogram.
Histogram | An element to produce histograms.
NumpyHistogram | Create a histogram using a 1-dimensional numpy.histogram.
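Graph:
graph | Numeric arrays of equal size.
Graph | Deprecated since version 0.5.
root_graph_errors | 2-dimensional ROOT graph with errors.
ROOTGraphErrors | Element to convert graphs to root_graph_errors.
HistToGraph | Transform a histogram to a graph.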
Split into bins:
IterateBins | Iterate bins of histograms.
MapBins | Transform bin content of histograms.
SplitIntoBins | Split analysis into groups defined by bins.
Histogram functions:
HistCell | A namedtuple with fields edges, bin, index.
cell_to_string | Transform cell edges into a string.
check_edges_increasing | Assure that multidimensional edges are increasing.
get_bin_edges | Return edges of the bin for the given edges of a histogram.
get_bin_on_index | Return bin corresponding to multidimensional index.
get_bin_on_value | Get the bin index for arg in a multidimensional array edges.
get_bin_on_value_1d | Return index for value in one-dimensional array.
get_example_bin | Return bin with zero index on each axis of the histogram bins.
hist_to_graph | Convert a histogram to a graph.
init_bins | Initialize cells of the form edges with the given value.
integral | Compute integral (scale for a histogram).
iter_bins | Iterate on bins.
iter_bins_with_edges | Generate (bin content, bin edges) pairs.
iter_cells | Iterate cells of a histogram hist, possibly in a subrange.
|
Update a deep copy of context with the context of a |
unify_1_md | Unify 1- and multidimensional bins and edges.
Histograms
- class histogram(edges, bins=None, initial_value=0)
A multidimensional histogram.
Arbitrary dimension, variable bin size and weights are supported. Lower bin edge is included, upper edge is excluded. Underflow and overflow values are skipped. Bin content can be of arbitrary type, which is defined during initialization.
Examples:
>>> # a two-dimensional histogram
>>> hist = histogram([[0, 1, 2], [0, 1, 2]])
>>> hist.fill([0, 1])
>>> hist.bins
[[0, 1], [0, 0]]
>>> values = [[0, 0], [1, 0], [1, 1]]
>>> # fill the histogram with values
>>> for v in values:
...     hist.fill(v)
>>> hist.bins
[[1, 1], [1, 1]]
edges is a sequence of one-dimensional arrays, each containing strictly increasing bin edges.
Histogram’s bins by default are initialized with initial_value. It can be any object that supports addition with weight during fill (but that is not necessary if you don’t plan to fill the histogram). If the initial_value is compound and requires special copying, create initial bins yourself (see init_bins()).
A histogram can be created from existing bins and edges. In this case a simple check of the shape of bins is done (raising LenaValueError if failed).
Attributes
edges is a list of edges on each dimension. Edges mark the borders of the bin. Edges along each dimension are one-dimensional lists, and the multidimensional bin is the result of all intersections of one-dimensional edges. For example, a 3-dimensional histogram has edges of the form [x_edges, y_edges, z_edges], and the 0th bin has borders ((x[0], x[1]), (y[0], y[1]), (z[0], z[1])).
Index in the edges is a tuple, where a given position corresponds to a dimension, and the content at that position to the bin along that dimension. For example, index (0, 1, 3) corresponds to the bin with lower edges (x[0], y[1], z[3]).
bins is a list of nested lists. The same index as for edges can be used to get bin content: the bin at (0, 1, 3) can be obtained as bins[0][1][3]. The innermost arrays correspond to the highest (furthest from x) coordinates. For example, for a 3-dimensional histogram, bins equal to [[[1, 1], [0, 0]], [[0, 0], [0, 0]]] mean that the only filled bins are those where x and y indices are 0, and z index is 0 or 1.
dim is the dimension of a histogram (length of its edges for a multidimensional histogram).
If subarrays of edges are not increasing or if any of them has length less than 2, LenaValueError is raised.
Programmer’s note
One- and multidimensional histograms have different bins and edges formats. To be unified, 1-dimensional edges should be nested in a list (like [[1, 2, 3]]). Instead, they are simply the x-edges list, because it is more intuitive and one-dimensional histograms are used more often. To unify the interface for bins and edges in your code, use the unify_1_md() function.
- __eq__(other)
Two histograms are equal, if and only if they have equal bins and equal edges.
If other is not a histogram, return False.
Note that floating numbers should be compared approximately (using math.isclose()).
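The approximate comparison recommended above could be sketched as follows. The helper bins_close is hypothetical (it is not part of Lena) and assumes flat, one-dimensional bins with numeric content.

```python
import math


def bins_close(bins_a, bins_b, rel_tol=1e-9, abs_tol=0.0):
    """Compare two flat lists of numeric bins approximately.

    Hypothetical helper, not a Lena function.
    """
    if len(bins_a) != len(bins_b):
        return False
    return all(
        math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
        for a, b in zip(bins_a, bins_b)
    )


# 0.1 + 0.2 is not exactly 0.3 in floating point arithmetic,
# so exact comparison fails while approximate comparison succeeds.
print([0.1 + 0.2] == [0.3])            # False
print(bins_close([0.1 + 0.2], [0.3]))  # True
```

When bins may contain exact zeros, a non-zero abs_tol is usually needed, because a relative tolerance alone never treats a small number as close to zero.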
- add(other, weight=1)
Add a histogram other to this one.
For each bin, the corresponding bin of other is added. It can be multiplied by weight. For example, to subtract other, use weight -1.
Histograms must have the same edges. Note that floating numbers should be compared approximately (using math.isclose()).
- fill(coord, weight=1)
Fill histogram at coord with the given weight.
Coordinates outside the histogram edges are ignored.
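The filling rules described for histogram (lower edge included, upper edge excluded, values outside the edges skipped) can be illustrated with a minimal sketch. The function fill_sketch is hypothetical and uses the standard bisect module; it mirrors the documented behaviour and is not Lena's implementation.

```python
from bisect import bisect_right


def fill_sketch(bins, edges, coord, weight=1):
    """Add weight to the bin of the nested list bins that contains coord.

    edges is a list of increasing edge lists, one per dimension.
    Hypothetical helper, not a Lena function.
    """
    indices = []
    for value, axis_edges in zip(coord, edges):
        # Lower edge is included, upper edge is excluded.
        if value < axis_edges[0] or value >= axis_edges[-1]:
            return  # underflow or overflow: the value is skipped
        indices.append(bisect_right(axis_edges, value) - 1)
    # Walk down the nested lists to the innermost one.
    cell = bins
    for ind in indices[:-1]:
        cell = cell[ind]
    cell[indices[-1]] += weight


edges = [[0, 1, 2], [0, 1, 2]]
bins = [[0, 0], [0, 0]]
fill_sketch(bins, edges, [0, 1])
print(bins)  # [[0, 1], [0, 0]]
fill_sketch(bins, edges, [2, 0])  # x lies on the upper edge: ignored
print(bins)  # [[0, 1], [0, 0]]
```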
- scale(other=None, recompute=False)
Compute or set scale (integral of the histogram).
If other is None, return scale of this histogram. If its scale was not computed before, it is computed and stored for subsequent use (unless explicitly asked to recompute). Note that after changing (filling) the histogram one must explicitly recompute the scale if it was computed before.
If a float other is provided, rescale self to other.
Histograms with scale equal to zero can’t be rescaled. LenaValueError is raised if one tries to do that.
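As a rough illustration of what computing and setting a scale means, the sketch below handles the one-dimensional case. It assumes that the integral is the sum of bin contents multiplied by bin widths; integral_1d and rescaled_bins are hypothetical names, and a built-in ValueError stands in for LenaValueError.

```python
def integral_1d(bins, edges):
    """Sum of bin content times bin width (1-dimensional sketch)."""
    return sum(
        content * (edges[i + 1] - edges[i])
        for i, content in enumerate(bins)
    )


def rescaled_bins(bins, edges, new_scale):
    """Return bins multiplied so that their integral equals new_scale."""
    scale = integral_1d(bins, edges)
    if scale == 0:
        # Lena raises LenaValueError here; ValueError is a stand-in.
        raise ValueError("can not rescale a histogram with zero scale")
    return [content * new_scale / scale for content in bins]


edges = [0, 1, 3]
bins = [2, 1]
print(integral_1d(bins, edges))       # 4
print(rescaled_bins(bins, edges, 1))  # [0.5, 0.25]
```

The sketch recomputes the integral on every call. A cached scale, as described above, avoids that cost but has to be recomputed explicitly after the histogram is filled again.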
- class Histogram(edges, bins=None, make_bins=None, initial_value=0)
An element to produce histograms.
edges, bins and initial_value have the same meaning as during creation of a histogram.
make_bins is a function without arguments that creates new bins (it will be called during __init__() and reset()). initial_value in this case is ignored, but bin check is made. If both bins and make_bins are provided, LenaTypeError is raised.
- class NumpyHistogram(*args, **kwargs)
Create a histogram using a 1-dimensional numpy.histogram.
The result of compute is a Lena histogram, but it is calculated using numpy histogram, and all its initialization arguments are passed to numpy.
Examples
With NumpyHistogram() bins are automatically derived from data.
With NumpyHistogram(bins=list(range(0, 5)), density=True) bins are set explicitly.
Warning
As numpy histogram is computed from an existing array, all values are stored in the internal data structure during fill, which may take much memory.
Use *args and **kwargs for numpy.histogram initialization.
Default bins keyword argument is auto.
A keyword argument reset specifies the exact behaviour of request.
Graph
- class graph(coords, field_names=('x', 'y'), scale=None)
Numeric arrays of equal size.
This structure generally corresponds to the graph of a function and represents arrays of coordinates and the function values of arbitrary dimensions.
coords is a list of one-dimensional coordinate and value sequences (usually lists). There is little to no distinction between them, and “values” can also be called “coordinates”.
field_names provide the meaning of these arrays. For example, a 3-dimensional graph could be distinguished from a 2-dimensional graph with errors by its fields (“x”, “y”, “z”) versus (“x”, “y”, “error_y”). Field names don’t affect drawing graphs: for that Variable-s should be used. Default field names, provided for the most used 2-dimensional graphs, are “x” and “y”.
field_names can be a string separated by whitespace and/or commas or a tuple of strings, such as (“x”, “y”). field_names must have as many elements as coords and each field name must be unique. Otherwise field names are arbitrary. Error fields must go after all other coordinates. The name of a coordinate error is “error_” followed by the coordinate name. Further error details are appended after ‘_’. They could be arbitrary depending on the problem: “low”, “high”, “low_90%_cl”, etc. Example: (“E”, “time”, “error_E_low”, “error_time”).
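The naming rules for field_names can be sketched in plain Python. Both functions below are hypothetical helpers written for illustration (they are not Lena's parsing code), and ValueError stands in for Lena's exceptions.

```python
import re


def parse_field_names(field_names):
    """Split a string on whitespace and/or commas, or accept a tuple."""
    if isinstance(field_names, str):
        names = tuple(n for n in re.split(r"[\s,]+", field_names) if n)
    else:
        names = tuple(field_names)
    if len(set(names)) != len(names):
        raise ValueError("field names must be unique")
    return names


def split_errors(names):
    """Separate coordinate names from error names ("error_" prefix)."""
    coords = [n for n in names if not n.startswith("error_")]
    errors = [n for n in names if n.startswith("error_")]
    # Error fields must go after all other coordinates.
    if list(names) != coords + errors:
        raise ValueError("error fields must follow all coordinates")
    return coords, errors


names = parse_field_names("E, time, error_E_low, error_time")
print(names)                # ('E', 'time', 'error_E_low', 'error_time')
print(split_errors(names))  # (['E', 'time'], ['error_E_low', 'error_time'])
```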
scale of the graph is a kind of its norm. It could be the integral of the function or its other property. A scale of a normalised probability density function would be one. An initialized scale is required if one needs to renormalise the graph in scale() (for example, to plot it with other graphs).
Coordinates of a function graph would usually be arrays of increasing values, which is not required here. Neither is it checked that coordinates indeed contain one-dimensional numeric values. However, non-standard graphs will likely lead to errors during plotting and will require more programmer’s work and caution, so use them only if you understand what you are doing.
A graph can be iterated yielding tuples of numbers for each point.
Attributes
coords is a list of one-dimensional lists of coordinates.
field_names
dim is the dimension of the graph, that is of all its coordinates without errors.
In case of incorrect initialization arguments, LenaTypeError or LenaValueError is raised.
New in version 0.5.
- scale(other=None)
Get or set the scale of the graph.
If other is None, return the scale of this graph.
If a numeric other is provided, rescale to that value. If the graph has unknown or zero scale, rescaling that will raise LenaValueError.
To get meaningful results, graph’s fields are used. Only the last coordinate is rescaled. For example, if the graph has x and y coordinates, then y will be rescaled, and for a 3-dimensional graph z will be rescaled. All errors are rescaled together with their coordinate.
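The rule that only the last coordinate and its errors are rescaled might look like this in plain Python. The function rescale_last_coordinate is a hypothetical illustration that relies on the "error_" naming convention for fields. It is not Lena's code, and ValueError stands in for LenaValueError.

```python
def rescale_last_coordinate(coords, field_names, scale, new_scale):
    """Return new coords with the last coordinate and its errors rescaled.

    Hypothetical helper, not a Lena function.
    """
    if not scale:
        # Unknown (None) or zero scale; Lena raises LenaValueError.
        raise ValueError("can not rescale a graph with unknown or zero scale")
    coord_names = [n for n in field_names if not n.startswith("error_")]
    last = coord_names[-1]
    factor = new_scale / scale
    result = []
    for name, values in zip(field_names, coords):
        is_error_of_last = (
            name == "error_" + last
            or name.startswith("error_" + last + "_")
        )
        if name == last or is_error_of_last:
            result.append([v * factor for v in values])
        else:
            result.append(list(values))
    return result


coords = [[0, 1, 2], [2, 4, 6], [1, 1, 2]]
names = ("x", "y", "error_y")
print(rescale_last_coordinate(coords, names, scale=2, new_scale=1))
# [[0, 1, 2], [1.0, 2.0, 3.0], [0.5, 0.5, 1.0]]
```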
- class Graph(points=None, context=None, scale=None, sort=True)
Deprecated since version 0.5: use graph. This class may be used in the future, but with a changed interface.
Function at given coordinates (arbitrary dimensions).
Graph points can be set during the initialization and during fill(). It can be rescaled (producing a new Graph). A point is a tuple of (coordinate, value), where both coordinate and value can be tuples of numbers. Coordinate corresponds to a point in N-dimensional space, while value is some function’s value at this point (the function can take a value in M-dimensional space). Coordinate and value dimensions must be the same for all points.
One can get graph points as Graph.points attribute. They will be sorted each time before return if sort was set to True. An attempt to change points (use Graph.points on the left of ‘=’) will raise Python’s AttributeError.
points is an array of (coordinate, value) tuples.
context is the same as the most recent context during fill. Use it to provide a context when initializing a Graph from existing points.
scale sets the scale of the graph. It is used during plotting if rescaling is needed.
Graph coordinates are sorted by default. This is usually needed to plot graphs of functions. If you need to keep the order of insertion, set sort to False.
By default, sorting is done using standard Python lists and functions. You can disable sort and provide your own sorting container for points. Some implementations are compared here. Note that a rescaled graph uses a default list.
Note that Graph does not reduce data. All filled values will be stored in it. To reduce data, use histograms.
- fill(value)
Fill the graph with value.
Value can be a (data, context) tuple. Data part must be a (coordinates, value) pair, where both coordinates and value are also tuples. For example, value can contain the principal number and its precision.
- property points
Get graph points (read only).
- request()
Yield graph with context.
If sort was initialized True, graph points will be sorted.
- scale(other=None)
Get or set the scale.
Graph’s scale comes from an external source. For example, if the graph was computed from a function, this may be its integral passed via context during fill(). Once the scale is set, it is stored in the graph. If one attempts to use scale which was not set, LenaAttributeError is raised.
If other is None, return the scale.
If a float other is provided, rescale to other. A new graph with the scale equal to other is returned, the original one remains unchanged. Note that in this case its points will be a simple list and new graph sort parameter will be True.
Graphs with scale equal to zero can’t be rescaled. Attempts to do that raise LenaValueError.
- to_csv(separator=',', header=None)
Deprecated since version 0.5: in Lena 0.5 to_csv is not used. Iterables are converted to tables.
Convert graph’s points to CSV.
separator delimits values, the default is comma.
header, if not None, is the first string of the output (new line is added automatically).
Since a graph can be multidimensional, for each point first its coordinate is converted to string (separated by separator), then each part of its value.
To convert Graph to CSV inside a Lena sequence, use lena.output.ToCSV.
- class root_graph_errors(graph, type_code='d')
2-dimensional ROOT graph with errors.
This is an adapter for TGraphErrors and contains that graph as a field root_graph.
graph is a Lena graph.
type_code is the basic numeric type of array values (by default double). ‘f’ means floating values. See Python module array for more options.
New in version 0.5.
- class ROOTGraphErrors
Element to convert graphs to root_graph_errors.
- __call__(value)
Convert data part of the value (which must be a graph) to root_graph_errors.
New in version 0.5.
- class HistToGraph(make_value, get_coordinate='left', field_names=('x', 'y'), scale=None)
Transform a histogram to a graph.
make_value is a Variable that creates graph value from the bin value.
get_coordinate defines the coordinate of the graph point. By default, it is the left bin edge. Other allowed values are “right” and “middle”.
field_names set field names of resulting graphs.
scale sets scales of resulting graphs. If it is True, the scale is computed from the histogram.
See hist_to_graph() for details and examples.
Incorrect values for make_value or get_coordinate raise, respectively, LenaTypeError or LenaValueError.
- run(flow)
Iterate the flow and transform histograms to graphs.
context.value is updated with make_value context. If histogram bins contained context (which is assumed to be the same for all bins), make_value context is composed with that.
Values that are not histograms, and histograms with context.histogram.to_graph set to False, pass unchanged.
Split into bins
Split analysis into groups defined by bins.
- class IterateBins(create_edges_str=None, select_bins=None)
Iterate bins of histograms.
create_edges_str is a callable that creates a string from bin’s edges and coordinate names and adds that to the context. It is passed parameters (edges, var_context), where var_context is variable context containing variable names (it can be a single Variable or Combine). By default it is cell_to_string().
select_bins is a callable used to test bin contents. By default, only those histograms are iterated where bins contain histograms. Use select_bins to choose other classes. See Selector for examples.
If create_edges_str is not callable, LenaTypeError is raised.
- run(flow)
Yield histogram bins one by one.
For each histogram from the flow, if its bins pass select_bins, they are iterated.
The resulting context is taken from bin’s context. Histogram’s context is preserved in context.bins. context.bin is updated with “edges” (with bin edges) and “edges_str” (their representation). If histogram’s context contains variable, that is used for edges’ representation.
Values that are not histograms pass unchanged.
- class MapBins(seq, select_bins=<function MapBins.<lambda>>, get_example_bin=<function get_example_bin>, drop_bins_context=True)
Transform bin content of histograms.
This class can be used when histogram bins contain complex structures. For example, in order to plot a histogram with a 3-dimensional vector in each bin, one can create 3 histograms corresponding to the vector’s components.
seq is a sequence or an element applied to bin contents. If seq is not a Sequence or an element with run method, it is converted to a Sequence. Example: seq=Split([X(), Y(), Z()]) (provided that you have X, Y, Z variables).
If select_bins applied to histogram bins is True (tested on an arbitrary bin), the histogram is transformed. Bin types can be given in a list or as a general Selector. For example, select_bins=[lena.math.vector3, list] selects histograms where bins are vectors or lists. By default all histograms are accepted.
The “arbitrary bin” is returned by a callable get_example_bin (by default get_example_bin()).
MapBins creates histograms that may be plotted, because their bins contain only data without context. If drop_bins_context is False, context remains in bins. By default, context of all histogram bins is discarded. This discourages compositions of MapBins: make compositions of their internal sequences instead.
In case of incorrect arguments, LenaTypeError is raised.
- run(flow)
Transform histograms from flow.
context.value is updated with bin context (if that exists). It is assumed that all bins have the same context (because they were produced by the same sequence), therefore an arbitrary bin is taken and contexts of all other bins are ignored.
Values that are not selected pass unchanged.
- class SplitIntoBins(seq, arg_var, edges)
Split analysis into groups defined by bins.
seq is a FillComputeSeq sequence (or will be converted to that) that corresponds to the analysis being performed for different bins. Deep copy of seq is done for each bin.
arg_var is a Variable that takes data and returns value used to compute the bin index. Example of a two-dimensional function: arg_var = lena.variables.Variable("xy", lambda event: (event.x, event.y)).
edges is a sequence of arrays containing monotonically increasing bin edges along each dimension. Example: edges = lena.math.mesh((0, 1), 10).
Note
The final histogram may contain vectors, histograms and any other data the analysis produced. To plot them, one can extract vector components with e.g. MapBins. If bin contents are histograms, they can be yielded one by one with IterateBins.
Attributes: bins, edges.
If edges are not increasing, LenaValueError is raised. In case of other argument initialization problems, LenaTypeError is raised.
- compute()
Yield a (histogram, context) pair for each compute() for all bins.
The histogram is created from edges with bin contents taken from compute() for bins. Computational context is preserved in histogram’s bins.
SplitIntoBins adds context as a subcontext variable (corresponding to arg_var). This allows unification of SplitIntoBins with common analysis using variables (useful when creating plots from one template). Existing context values are preserved.
Note
In Python 3 the minimum number of compute() among all bins is used. In Python 2, if some bin is exhausted before the others, its content will be filled with None.
Histogram functions
Functions for histograms.
These functions are used for low-level work with histograms and their contents. They are not needed for normal usage.
- class HistCell(edges, bin, index)
A namedtuple with fields edges, bin, index.
Create new instance of HistCell(edges, bin, index)
- cell_to_string(cell_edges, var_context=None, coord_names=None, coord_fmt='{}_lte_{}_lt_{}', coord_join='_', reverse=False)
Transform cell edges into a string.
cell_edges is a tuple of pairs (lower bound, upper bound) for each coordinate.
coord_names is a list of coordinate names.
coord_fmt is a string, which defines how to format individual coordinates.
coord_join is a string, which joins coordinate pairs.
If reverse is True, coordinates are joined in reverse order.
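How these parameters combine can be shown with a small sketch. It assumes that coord_fmt receives (lower bound, coordinate name, upper bound) in that order, which the documentation above does not state explicitly; cell_to_string_sketch is a hypothetical stand-in that ignores var_context.

```python
def cell_to_string_sketch(cell_edges, coord_names,
                          coord_fmt="{}_lte_{}_lt_{}",
                          coord_join="_", reverse=False):
    """Format each (lower, upper) pair with its name and join the parts.

    Assumption: coord_fmt is filled with (lower bound, name, upper bound).
    """
    parts = [
        coord_fmt.format(low, name, high)
        for (low, high), name in zip(cell_edges, coord_names)
    ]
    if reverse:
        parts.reverse()
    return coord_join.join(parts)


cell = ((0, 1), (2, 3))
print(cell_to_string_sketch(cell, ["x", "y"]))
# 0_lte_x_lt_1_2_lte_y_lt_3
print(cell_to_string_sketch(cell, ["x", "y"], reverse=True))
# 2_lte_y_lt_3_0_lte_x_lt_1
```

A string of this kind contains no spaces or comparison signs, so it can be used in file names.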
- check_edges_increasing(edges)
Assure that multidimensional edges are increasing.
If length of edges or its subarray is less than 2 or if some subarray of edges contains not strictly increasing values, LenaValueError is raised.
- get_bin_edges(index, edges)
Return edges of the bin for the given edges of a histogram.
In one-dimensional case index must be an integer and a tuple of (x_low_edge, x_high_edge) for that bin is returned.
In a multidimensional case index is a container of numeric indices in each dimension. A list of bin edges in each dimension is returned.
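The two cases can be sketched as follows. get_bin_edges_sketch is a hypothetical helper that follows the description above. It does not validate its arguments and is not Lena's implementation.

```python
def get_bin_edges_sketch(index, edges):
    """Return edges of the bin with the given index (hypothetical helper)."""
    if isinstance(index, int):
        # One-dimensional case: edges is a flat list of numbers.
        return (edges[index], edges[index + 1])
    # Multidimensional case: one (low, high) pair per dimension.
    return [
        (axis_edges[i], axis_edges[i + 1])
        for i, axis_edges in zip(index, edges)
    ]


print(get_bin_edges_sketch(1, [0, 1, 4]))
# (1, 4)
print(get_bin_edges_sketch((0, 1), [[0, 1, 2], [0, 5, 10]]))
# [(0, 1), (5, 10)]
```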
- get_bin_on_index(index, bins)
Return bin corresponding to multidimensional index.
index can be a number or a list/tuple. If index length is less than dimension of bins, a subarray of bins is returned.
In case of an index error, LenaIndexError is raised.
Example:
>>> from lena.structures import histogram, get_bin_on_index
>>> hist = histogram([0, 1], [0])
>>> get_bin_on_index(0, hist.bins)
0
>>> get_bin_on_index((0, 1), [[0, 1], [0, 0]])
1
>>> get_bin_on_index(0, [[0, 1], [0, 0]])
[0, 1]
- get_bin_on_value(arg, edges)
Get the bin index for arg in a multidimensional array edges.
arg is a 1-dimensional array of numbers (or a number for 1-dimensional edges), and corresponds to a point in N-dimensional space.
edges is an array of N one-dimensional arrays (lists or tuples) of numbers. Each 1-dimensional subarray consists of increasing numbers.
arg and edges must have the same length (otherwise LenaValueError is raised). arg and edges must be iterable and support len().
Return list of indices in edges corresponding to arg.
If any coordinate is out of its corresponding edge range, its index will be -1 for underflow or len(edge)-1 for overflow.
Examples:
>>> from lena.structures import get_bin_on_value
>>> edges = [[1, 2, 3], [1, 3.5]]
>>> get_bin_on_value((1.5, 2), edges)
[0, 0]
>>> get_bin_on_value((1.5, 0), edges)
[0, -1]
>>> # the upper edge is excluded
>>> get_bin_on_value((3, 2), edges)
[2, 0]
>>> # one-dimensional edges
>>> edges = [1, 2, 3]
>>> get_bin_on_value(2, edges)
[1]
- get_bin_on_value_1d(val, arr)
Return index for value in one-dimensional array.
arr must contain strictly increasing values (not necessarily equidistant), it is not checked.
“Linear binary search” is used, that is our array search by default assumes the array to be split on equidistant steps.
Example:
>>> from lena.structures import get_bin_on_value_1d
>>> arr = [0, 1, 4, 5, 7, 10]
>>> get_bin_on_value_1d(0, arr)
0
>>> get_bin_on_value_1d(4.5, arr)
2
>>> # upper range is excluded
>>> get_bin_on_value_1d(10, arr)
5
>>> # underflow
>>> get_bin_on_value_1d(-10, arr)
-1
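For comparison, the same index convention can be obtained with the standard bisect module. The sketch below uses plain binary search rather than the search that starts from an equidistant guess described above; bin_index_1d is a hypothetical name.

```python
from bisect import bisect_right


def bin_index_1d(val, arr):
    """Return i such that arr[i] <= val < arr[i + 1].

    Underflow gives -1, and values at or above the last edge give
    len(arr) - 1. arr must be strictly increasing (not checked).
    """
    return bisect_right(arr, val) - 1


arr = [0, 1, 4, 5, 7, 10]
print(bin_index_1d(0, arr))    # 0
print(bin_index_1d(4.5, arr))  # 2
print(bin_index_1d(10, arr))   # 5
print(bin_index_1d(-10, arr))  # -1
```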
- get_example_bin(struct)
Return bin with zero index on each axis of the histogram bins.
For example, if the histogram is two-dimensional, return hist[0][0].
struct can be a histogram or an array of bins.
- hist_to_graph(hist, make_value=None, get_coordinate='left', field_names=('x', 'y'), scale=None)
Convert a histogram to a graph.
make_value is a function to set the value of a graph’s point. By default it is bin content. make_value accepts a single value (bin content) without context.
This option could be used to create graph’s error bars. For example, to create a graph with errors from a histogram where bins contain a named tuple with fields mean, mean_error and a context one could use
>>> make_value = lambda bin_: (bin_.mean, bin_.mean_error)
get_coordinate defines what the coordinate of a graph point created from a histogram bin will be. It can be “left” (default), “right” and “middle”.
field_names set field names of the graph. Their number must be the same as the dimension of the result. For a make_value above they would be (“x”, “y_mean”, “y_mean_error”).
scale becomes the graph’s scale (unknown by default). If it is True, it uses the histogram scale.
hist must contain only numeric bins (without context) or make_value must remove context when creating a numeric graph.
Return the resulting graph.
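The conversion can be illustrated for a one-dimensional histogram with a plain-Python sketch. hist_to_points_1d and MeanBin are hypothetical names. The function returns a list of coordinate lists (similar to the coords of a graph) and only illustrates the roles of make_value and get_coordinate.

```python
from collections import namedtuple


def hist_to_points_1d(bins, edges, make_value=None, get_coordinate="left"):
    """Return coordinate lists for a 1-dimensional histogram (a sketch)."""
    if get_coordinate == "left":
        xs = list(edges[:-1])
    elif get_coordinate == "right":
        xs = list(edges[1:])
    elif get_coordinate == "middle":
        xs = [0.5 * (low + high) for low, high in zip(edges, edges[1:])]
    else:
        raise ValueError('get_coordinate must be "left", "right" or "middle"')
    rows = []
    for content in bins:
        value = content if make_value is None else make_value(content)
        # A scalar value becomes a one-element tuple.
        rows.append(value if isinstance(value, tuple) else (value,))
    # Transpose rows of values into one list per field.
    return [xs] + [list(column) for column in zip(*rows)]


MeanBin = namedtuple("MeanBin", ["mean", "mean_error"])
bins = [MeanBin(1.0, 0.1), MeanBin(2.0, 0.2)]
edges = [0, 1, 2]
make_value = lambda bin_: (bin_.mean, bin_.mean_error)
print(hist_to_points_1d(bins, edges, make_value, "middle"))
# [[0.5, 1.5], [1.0, 2.0], [0.1, 0.2]]
```

The three resulting lists correspond to the field names (“x”, “y_mean”, “y_mean_error”) mentioned above.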
- init_bins(edges, value=0, deepcopy=False)
Initialize cells of the form edges with the given value.
Return bins filled with copies of value.
Value must be copyable, usual numbers will suit. If the value is mutable, use deepcopy = True (or the content of cells will be identical).
Examples:
>>> edges = [[0, 1], [0, 1]]
>>> # one cell
>>> init_bins(edges)
[[0]]
>>> # no need to use floats,
>>> # because integers will automatically be cast to floats
>>> # when used together
>>> init_bins(edges, 0.0)
[[0.0]]
>>> init_bins([[0, 1, 2], [0, 1, 2]])
[[0, 0], [0, 0]]
>>> init_bins([0, 1, 2])
[0, 0]
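A recursive construction of such nested bins could look like the sketch below. init_bins_sketch and _make are hypothetical helpers that reproduce the shapes from the examples above. They assume that a shallow copy is made unless deepcopy is set, and they are not Lena's code.

```python
import copy


def _make(axes, value, deepcopy):
    """Build nested lists: n edges along an axis make n - 1 bins."""
    n_bins = len(axes[0]) - 1
    if len(axes) == 1:
        copier = copy.deepcopy if deepcopy else copy.copy
        return [copier(value) for _ in range(n_bins)]
    return [_make(axes[1:], value, deepcopy) for _ in range(n_bins)]


def init_bins_sketch(edges, value=0, deepcopy=False):
    """Create bins for one- or multidimensional edges (a sketch)."""
    one_dim = not isinstance(edges[0], (list, tuple))
    axes = [edges] if one_dim else edges
    return _make(axes, value, deepcopy)


print(init_bins_sketch([[0, 1], [0, 1]]))        # [[0]]
print(init_bins_sketch([[0, 1, 2], [0, 1, 2]]))  # [[0, 0], [0, 0]]
print(init_bins_sketch([0, 1, 2]))               # [0, 0]

# Mutable values: each cell gets its own deep copy.
cells = init_bins_sketch([0, 1, 2], value=[[]], deepcopy=True)
cells[0][0].append(1)
print(cells)  # [[[1]], [[]]]
```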
- integral(bins, edges)
Compute integral (scale for a histogram).
bins contain values, and edges form the mesh for the integration. Their format is defined in histogram description.
- iter_bins(bins)
Iterate on bins. Yield (index, bin content).
Edges with higher index are iterated first (that is z, then y, then x for a 3-dimensional histogram).
- iter_bins_with_edges(bins, edges)
Generate (bin content, bin edges) pairs.
Bin edges is a tuple, such that its item at index i is (lower bound, upper bound) of the bin at i-th coordinate.
Examples:
>>> from lena.math import mesh
>>> list(iter_bins_with_edges([0, 1, 2], edges=mesh((0, 3), 3)))
[(0, ((0, 1.0),)), (1, ((1.0, 2.0),)), (2, ((2.0, 3),))]
>>>
>>> # 2-dimensional histogram
>>> list(iter_bins_with_edges(
...     bins=[[2]], edges=mesh(((0, 1), (0, 1)), (1, 1))
... ))
[(2, ((0, 1), (0, 1)))]
New in version 0.5: made public.
- iter_cells(hist, ranges=None, coord_ranges=None)
Iterate cells of a histogram hist, possibly in a subrange.
For each bin, yield a HistCell containing bin edges, bin content and bin index. The order of iteration is the same as for iter_bins().
ranges are the ranges of bin indices to be used for each coordinate (the lower value is included, the upper value is excluded).
coord_ranges set real coordinate ranges based on histogram edges. They need not coincide exactly with bin edges. If one of the ranges for the given coordinate is outside the histogram edges, then only existing histogram edges within the range are selected. If the coordinate range is completely outside histogram edges, nothing is yielded. If a lower or upper coord_range falls within a bin, this bin is yielded. Note that if a coordinate range falls on a bin edge, the number of generated bins can be unstable because of limited float precision.
ranges and coord_ranges are tuples of tuples of limits in corresponding dimensions. For a one-dimensional histogram it must be a tuple containing a tuple, for example ((None, None),).
None as an upper or lower range means no limit (((None, None),) is equivalent to ((0, len(bins)),) for a 1-dimensional histogram).
If a range index is lower than 0 or higher than possible index, LenaValueError is raised. If both coord_ranges and ranges are provided, LenaTypeError is raised.