Flow

Elements:

Cache(filename[, method, protocol]) Cache flow passing through.
DropContext(*args) Sequence that transforms (data, context) flow so that only data remains in the inner sequence.
End Stop sequence here.
Filter(selector) Filter values from flow.
Print([before, sep, end, transform]) Print values passing through.
Progress([name, format]) Print progress (how much data was processed and remains).
RunIf(select, seq) Run a sequence only for selected values.

Functions:

get_context(value) Get context from a possible (data, context) pair.
get_data(value) Get data from value (a possible (data, context) pair).
get_data_context(value) Get (data, context) from value (a possible (data, context) pair).
seq_map(seq, container[, one_result]) Map Lena Sequence seq to the container.

Group plots:

GroupBy(group_by) Group values.
GroupPlots(group_by, select[, transform, …]) Group several plots.
GroupScale(scale_to[, allow_zero_scale, …]) Scale a group of data.
Not(selector[, raise_on_error]) Negate a selector.
Selector(selector[, raise_on_error]) Determine whether an item should be selected.

Iterators:

Chain(*iterables) Chain generators.
CountFrom([start, step]) Generate numbers from start to infinity, with step between values.
Reverse() Reverse the flow (yield values from last to first).
Slice(*args) Slice data flow from start to stop with step.

Split into bins:

IterateBins([create_edges_str, select_bins]) Iterate bins of histograms.
MapBins(seq, select_bins[, drop_bins_context]) Transform bin content of histograms.
SplitIntoBins(seq, arg_var, edges) Split analysis into groups defined by bins.
cell_to_string(cell_edges[, var_context, …]) Transform cell edges into a string.
get_example_bin(struct) Return bin with zero index on each axis of the histogram bins.

Elements

Elements form Lena sequences. This group contains miscellaneous elements that did not fit into other categories.

class Cache(filename, method='cPickle', protocol=2)

Cache flow passing through.

On the first run, dump all flow to file (and yield the flow unaltered). On subsequent runs, load all flow from that file in the original order.

Example:

s = Source(
         ReadFiles(),
         ReadEvents(),
         MakeHistograms(),
         Cache("histograms.pkl"),
         MakeStats(),
         Cache("stats.pkl"),
      )

If stats.pkl exists, Cache will read the data flow from that file and no other processing will be done. If the stats.pkl cache doesn’t exist, but the cache for histograms exists, it will be used and no preceding processing (from ReadFiles to MakeHistograms) will occur. If neither cache is filled yet, processing will run as usual.

Only pickleable objects can be cached (otherwise a pickle.PickleError is raised).

Warning

The pickle module is not secure against erroneous or maliciously constructed data. Never unpickle data from an untrusted source.

filename is the name of the file in which to store the cache. You can give it a .pkl extension.

method can be pickle or cPickle (a faster pickle). In Python 3 they are the same.

protocol is the pickle protocol. Version 2 is the highest supported by Python 2. Version 0 is "human-readable" (as noted in the documentation). Version 3 is recommended if compatibility between Python 3 versions is needed. Version 4 was added in Python 3.4. It adds support for very large objects, pickling more kinds of objects, and some data format optimizations.

static alter_sequence(seq)

If the Sequence seq contains a Cache, which has an up-to-date cache, a Source is built based on the flattened seq and returned. Otherwise the seq is returned unchanged.

cache_exists()

Return True if the file with the cache exists and is readable.

drop_cache()

Remove the file with the cache if it exists, do nothing otherwise.

If the cache exists and is readable, but could not be deleted, LenaEnvironmentError is raised.

run(flow)

Load cache or fill it.

If we can read filename, load flow from there. Otherwise use the incoming flow and fill the cache. All loaded or passing items are yielded.
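The load-or-fill behaviour of run() described above can be illustrated with a minimal sketch in plain Python. This is not Lena's implementation: the function name cached_flow is hypothetical, and the sketch assumes that items are pickled one after another into a single file.

```python
import os
import pickle


def cached_flow(flow, filename, protocol=2):
    """Yield items from the cache file if it exists, else fill it from flow.

    Hypothetical helper; items are assumed to be pickled one by one.
    """
    if os.path.exists(filename):
        # The cache is present: the incoming flow is never iterated.
        with open(filename, "rb") as cache:
            while True:
                try:
                    yield pickle.load(cache)
                except EOFError:
                    break
        return
    # No cache yet: dump every item and pass it on unaltered.
    with open(filename, "wb") as cache:
        for item in flow:
            pickle.dump(item, cache, protocol)
            yield item
```

In this sketch the cache file stays incomplete if the flow is not consumed to the end, so a real element would need to handle that case.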

class DropContext(*args)

Sequence that transforms (data, context) flow so that only data remains in the inner sequence. Context is restored outside DropContext.

DropContext works for most simple cases as a Sequence, but may not work in more advanced circumstances. For example, since DropContext is not transparent, Split can’t judge whether it has a FillCompute element inside, and this may lead to errors in the analysis. It is recommended to provide context when possible.

*args will form a Sequence.

run(flow)

Run the sequence without context, and generate output flow restoring the context before DropContext.

If the sequence adds a context, the returned context is updated with that.

class End

Stop sequence here.

run(flow)

Exhaust all preceding flow and stop iteration (yield nothing to the following flow).

class Filter(selector)

Filter values from flow.

selector is a boolean function. If it returns True, the value passes Filter. If selector is not callable, it is converted to a Selector. If the conversion could not be done, LenaTypeError is raised.

Note

Filter appeared in Lena only in version 0.4. There may be better alternatives to using this element:

  • don’t produce values that you will discard later. If you want to select data from a specific file, read only that file.
  • use a custom class. SelectPosition("border") is more readable and maintainable than a Filter with many conditions, and it is also more cohesive if you group several options like "center" or "top" in a single place. If you make a selection, it can be useful to add information about that to the context (and Filter does not do that).

This doesn’t mean that we recommend against this class: sometimes it can be quick and explicit, and if one’s class name provides absolutely no clue what it does, a general Filter would be more readable.

fill_into(element, value)

Fill value into an element if selector(value) is True.

Element must have a fill(value) method.

run(flow)

Yield values from the flow for which the selector is True.
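The two methods above can be sketched as follows. The classes FilterSketch and Collect are hypothetical and only illustrate the described behaviour. Unlike Filter, the sketch does not convert non-callable selectors to a Selector and raises a plain TypeError instead.

```python
class FilterSketch(object):
    """Pass only the values for which selector(value) is true."""

    def __init__(self, selector):
        if not callable(selector):
            # Lena would try to build a Selector here; the sketch does not.
            raise TypeError("selector must be callable in this sketch")
        self._selector = selector

    def run(self, flow):
        for value in flow:
            if self._selector(value):
                yield value

    def fill_into(self, element, value):
        if self._selector(value):
            element.fill(value)


class Collect(object):
    """Hypothetical element with a fill(value) method."""

    def __init__(self):
        self.values = []

    def fill(self, value):
        self.values.append(value)


positive = FilterSketch(lambda x: x > 0)
print(list(positive.run([-1, 2, 0, 3])))  # [2, 3]
```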

class Print(before='', sep='', end='\n', transform=None)

Print values passing through.

before is a string printed before the first element in the item (which may be a container).

sep separates elements, end is appended after the last element.

transform is a function which transforms passing items (for example, it can select specific fields of an item).

__call__(value)

Print and return value.

class Progress(name='', format='')

Print progress (how much data was processed and remains).

name, if set, customizes the output with the collective name of values being processed (for example, "events").

format is a formatting string for the output. It will be passed keyword arguments percent, index, total and name.

Use Progress when a large amount of processing will be done after it. For example, if you have files with much data, put this element after generating file names, but before reading files. To print indices without reading the whole flow, use CountFrom and Print.

Progress is estimated based on the number of items processed by this element. It does not take into account the creation of final plots or the difference in the processing time for different values.

Warning

To measure progress, the whole flow is consumed.

run(flow)

Consume the flow, then yield values one by one and print progress.
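A rough sketch of this behaviour is given below. The function name progress_sketch and the default format string are assumptions made for illustration. Only the keyword arguments percent, index, total and name come from the description above.

```python
def progress_sketch(flow, name="",
                    fmt="{percent:.0f}% ({index}/{total}) {name}"):
    """Consume the flow, then yield values one by one, printing progress."""
    values = list(flow)  # the whole flow is read to learn the total
    total = len(values)
    for index, value in enumerate(values, start=1):
        print(fmt.format(percent=100.0 * index / total,
                         index=index, total=total, name=name))
        yield value
```

Because the total is known only after the flow is exhausted, such an element is best placed where the items are still cheap (for example, file names).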

class RunIf(select, seq)

Run a sequence only for selected values.

Note

In general, different flows are transformed to common data types (like histograms). In some complicated analyses (like in SplitIntoBins) there can appear values of very different types, for which additional transformation must be run. Use this element in such cases.

RunIf is similar to Filter, but the latter can be used as a FillInto element inside Split.

RunIf with a selector select (let us call its opposite not_select) is equivalent to

Split(
    [
        (
            select,
            seq
        ),
        not_select
        # not selected values pass unchanged
    ],
    bufsize=1,
    copy_buf=False
)

and can be considered "syntactic sugar". Use Split for more flexibility.

select is a function that accepts a value (maybe with context) and returns a boolean. It is converted to a Selector. See its specifications for available options.

seq is a sequence that will be run for selected values. If it is not a Sequence, it is converted to that.

run(flow)

Run the sequence for selected values from the flow.

Warning

RunIf disrupts the flow: it feeds values to the sequence one by one, and yields the results. If the sequence depends on the complete flow (for example, yields the maximum element), the result will be incorrect. The flow after RunIf is not disrupted.

Values that are not selected pass unchanged.
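The warning above can be illustrated with a small sketch in plain Python. The names run_if, double and maximum are hypothetical, and the inner sequence is represented by a generator function that takes a flow.

```python
def run_if(select, run_seq, flow):
    """Feed selected values to run_seq one by one; pass others unchanged."""
    for value in flow:
        if select(value):
            # The inner sequence sees a flow of exactly one value.
            for result in run_seq([value]):
                yield result
        else:
            yield value


def double(flow):
    for value in flow:
        yield 2 * value


def maximum(flow):
    # This sequence depends on the complete flow.
    yield max(flow)


def is_int(value):
    return isinstance(value, int)


print(list(run_if(is_int, double, [1, "a", 2])))   # [2, 'a', 4]
# Each selected value is its own "maximum", so nothing is reduced:
print(list(run_if(is_int, maximum, [1, "a", 2])))  # [1, 'a', 2]
```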

Functions

Functions to deal with data and context, and seq_map().

A value is considered a (data, context) pair, if it is a tuple of length 2, and the second element is a dictionary or its subclass.

get_context(value)

Get context from a possible (data, context) pair.

If context is not found, return an empty dictionary.

get_data(value)

Get data from value (a possible (data, context) pair).

If context is not found, return value.

get_data_context(value)

Get (data, context) from value (a possible (data, context) pair).

If context is not found, (value, {}) is returned.

Since get_data() and get_context() both check whether context is present, this function may be slightly more efficient and compact than the other two.
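The rule for recognising a (data, context) pair stated at the beginning of this section can be written down as a short sketch. The sketch_ prefix marks these functions as illustrations, not the library code.

```python
def sketch_get_data_context(value):
    """Return (data, context); context is {} if value is not a pair."""
    if (isinstance(value, tuple) and len(value) == 2
            and isinstance(value[1], dict)):
        return value
    return (value, {})


def sketch_get_data(value):
    return sketch_get_data_context(value)[0]


def sketch_get_context(value):
    return sketch_get_data_context(value)[1]


print(sketch_get_data_context((1, {"a": 2})))  # (1, {'a': 2})
print(sketch_get_data_context((1, 2)))         # ((1, 2), {})
print(sketch_get_data_context(5))              # (5, {})
```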

seq_map(seq, container, one_result=True)

Map Lena Sequence seq to the container.

For each value from the container, calculate seq.run([value]). The result can be a list or a single value. If one_result is True, the result must be a single value. In this case, if the results contain fewer or more than one element, LenaValueError is raised.

The list of results (lists or single values) is returned. The results are in the same order as read from the container.
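A minimal sketch of this mapping is shown below. It takes a run function instead of a Lena Sequence and raises the built-in ValueError in place of LenaValueError. Both substitutions are assumptions made to keep the example self-contained.

```python
def seq_map_sketch(run, container, one_result=True):
    """Apply run([value]) to each value of container, preserving order."""
    results = []
    for value in container:
        result = list(run([value]))
        if one_result:
            if len(result) != 1:
                raise ValueError(
                    "expected exactly one result, got {}".format(len(result))
                )
            result = result[0]
        results.append(result)
    return results


def square(flow):
    for value in flow:
        yield value * value


print(seq_map_sketch(square, [1, 2, 3]))  # [1, 4, 9]
```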

Group plots

Group several plots into one.

Since data can be produced in different places, several classes are needed to support this. First, the plots of interest must be selected (for example, one-dimensional histograms). This is done by Selector. Selected plots must be grouped. For example, we may want to plot data x versus Monte-Carlo x, but not data x vs data y. Data is grouped by GroupBy. To preserve the group, we can’t yield its members to the following elements, but have to transform the plots inside GroupPlots. We can also scale (normalize) all plots to one using GroupScale.

Example from a real analysis:

Sequence(
    # ... read data and produce histograms ...
    MakeFilename(dirname="background/{{run_number}}"),
    UpdateContext("output.plot.name", "{{variable.name}}",
                  raise_on_missing=True),
    lena.flow.GroupPlots(
        group_by="variable.coordinate",
        # Select either histograms (data) or Graphs (fit),
        # but only having "variable.coordinate" in context
        select=("variable.coordinate", [histogram, Graph]),
        # scale to data
        scale=Not("fit"),
        transform=(
            ToCSV(),
            # scaled plots will be written to separate files
            MakeFilename(
                "{{output.filename}}_scaled",
                overwrite=True,
            ),
            UpdateContext("output.plot.name", "{{variable.name}}",
                          raise_on_missing=True),
            write,
            # Several prints were used during this code creation
            # Print(transform=lambda val: val[1]["plot"]["name"]),
        ),
        # make both single and combined plots of coordinates
        yield_selected=True,
    ),
    # create file names for combined plots
    MakeFilename("combined_{{variable.coordinate}}"),
    # non-combined plots will still need file names
    MakeFilename("{{variable.name}}"),
    lena.output.ToCSV(),
    write,
    lena.context.Context(),
    # here our jinja template renders a group as a list of items
    lena.output.RenderLaTeX(template_dir=TEMPLATE_DIR,
                            select_template=select_template),
    # we have a single template, no more groups are present
    write,
    lena.output.LaTeXToPDF(),
)

class GroupBy(group_by)

Group values.

Data is added during update(). The dictionary of groups is available as the groups attribute. groups is a mapping of keys (defined by group_by) to lists of items with the same key.

group_by is a function, which returns distinct hashable results for items from different groups. It can be a dot-separated string, which corresponds to a subcontext (see context.get_recursively).

If group_by is not a callable or a string, LenaTypeError is raised.

clear()

Remove all groups.

update(val)

Find a group for val and add it there.

A group key is calculated by group_by. If no such key exists, a new group is created.
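The grouping described above can be sketched in a few lines. GroupBySketch and the helper lookup are hypothetical names. The helper stands in for a recursive lookup of a dot-separated key in nested dictionaries, and values are assumed to be (data, context) pairs when group_by is a string.

```python
def lookup(context, dotted_key):
    """Hypothetical recursive lookup of "a.b.c" in nested dictionaries."""
    for key in dotted_key.split("."):
        context = context[key]
    return context


class GroupBySketch(object):
    def __init__(self, group_by):
        if isinstance(group_by, str):
            # value is assumed to be a (data, context) pair
            self._key = lambda value: lookup(value[1], group_by)
        elif callable(group_by):
            self._key = group_by
        else:
            raise TypeError("group_by must be a callable or a string")
        self.groups = {}

    def update(self, value):
        # Create a new group if the key was not seen before.
        self.groups.setdefault(self._key(value), []).append(value)

    def clear(self):
        self.groups.clear()


grouped = GroupBySketch("variable.coordinate")
for value in [("data_x", {"variable": {"coordinate": "x"}}),
              ("mc_x", {"variable": {"coordinate": "x"}}),
              ("data_y", {"variable": {"coordinate": "y"}})]:
    grouped.update(value)
print(sorted(grouped.groups))  # ['x', 'y']
```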

class GroupPlots(group_by, select, transform=(), scale=None, yield_selected=False)

Group several plots.

Plots to be grouped are chosen by select, which acts as a boolean function. If select is not a Selector, it is converted to that class. Use Selector for more options.

Plots are grouped by group_by, which returns different keys for different groups. If it is not an instance of GroupBy, it is converted to that class. Use GroupBy for more options.

transform is a sequence, which processes individual plots before yielding. For example, set transform=(ToCSV(), write). transform is called after scale.

scale is a number or a string. A number means the scale, to which plots must be normalized. A string is a name of the plot to which other plots must be normalized. If scale is not an instance of GroupScale, it is converted to that class. If a plot could not be rescaled, LenaValueError is raised. For more options, use GroupScale.

yield_selected defines whether selected items should be yielded during run(). By default it is False: if we used a variable in a combined plot, we don’t create a separate plot of that.

run(flow)

Run the flow and yield final groups.

Each item of the flow is checked with the selector. If it is selected, it is added to groups. Otherwise, it is yielded.

After the flow is finished, groups are yielded. Groups are lists of items that have the same keys returned from group_by. Each group’s context (including an empty one) is inserted into a list in context.group. If any element’s context.output.changed is True, the final context.output.changed is set to True (and to False otherwise). The resulting context is updated with the intersection of the groups’ contexts.

If scale was set, plots are normalized to the given value or plot. If that plot was not selected (is missing in the captured group) or its norm could not be calculated, LenaValueError is raised.

class GroupScale(scale_to, allow_zero_scale=False, allow_unknown_scale=False)

Scale a group of data.

scale_to defines the method of scaling. If a number is given, group items are scaled to that. Otherwise it is converted to a Selector, which must return a unique item from the group. Group items will be scaled to the scale of that item.

By default, attempts to rescale a structure with unknown or zero scale raise an error. If allow_zero_scale and allow_unknown_scale are set to True, the corresponding errors are ignored and the structure remains unscaled.

scale(group)

Scale group and return a rescaled group as a list.

The group can contain (structure, context) pairs. The original group is unchanged as long as the structures’ scale method returns a new structure (the default for Lena histograms and graphs).

If any item could not be rescaled and options were not set to ignore that, LenaValueError is raised.

class Not(selector, raise_on_error=False)
Bases: lena.flow.selectors.Selector

Negate a selector.

selector is an instance of Selector or will be used to initialize that.

raise_on_error is used during the initialization of selector and has the same meaning as in Selector. It has no effect if selector is already initialized.

__call__(value)

Negate the result of the initialized selector.

If raise_on_error is False, then this is a complete negation (including the case of an error encountered in the selector). For example, if the selector is variable.name, and value’s context contains no "variable", Not("variable.name")(value) will be True. If raise_on_error is True, then any exception that occurs will be raised here.

class Selector(selector, raise_on_error=False)

Determine whether an item should be selected.

Generally, selected means the result is convertible to True, but other values can be used as well.

The usage of selector depends on its type.

If selector is a class, __call__() checks that the data part of the value is subclassed from that class.

A callable is used as is.

A string means that value’s context must conform to that (as in context.contains).

selector can be a container. In this case its items are converted to selectors. If selector is a list, boolean or is applied to the results of its items. If it is a tuple, boolean and is applied to the results.

raise_on_error is a boolean that sets whether in case of an exception the selector raises that exception or returns False. If selector is a container, raise_on_error will be used during its items initialization (recursively).

If incorrect arguments are provided, LenaTypeError is raised.

__call__(value)

Check whether value is selected.

By default, if an exception occurs, the result is False. Thus it is safe to use non-existing attributes or arbitrary contexts. However, if raise_on_error was set to True, the exception will be raised. Use it if you are confident in the data and want to see any error.
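The conversion rules for Selector and Not can be summarised in a simplified sketch. The names make_selector, negate, split_value and has_path are hypothetical. The string case checks only that the nested keys exist, which is a simplification of context.contains. The class case is implemented with isinstance.

```python
def split_value(value):
    """Return (data, context) for a possible (data, context) pair."""
    if (isinstance(value, tuple) and len(value) == 2
            and isinstance(value[1], dict)):
        return value
    return value, {}


def has_path(context, dotted):
    """Simplified check that nested keys "a.b.c" exist in context."""
    for key in dotted.split("."):
        if not isinstance(context, dict) or key not in context:
            return False
        context = context[key]
    return True


def make_selector(selector, raise_on_error=False):
    if isinstance(selector, type):    # a class: test the data part
        check = lambda value: isinstance(split_value(value)[0], selector)
    elif callable(selector):          # a callable is used as is
        check = selector
    elif isinstance(selector, str):   # a string: test the context
        check = lambda value: has_path(split_value(value)[1], selector)
    elif isinstance(selector, (list, tuple)):
        items = [make_selector(item, raise_on_error) for item in selector]
        # list means "or", tuple means "and"
        combine = any if isinstance(selector, list) else all
        check = lambda value: combine(item(value) for item in items)
    else:
        raise TypeError("unsupported selector: {!r}".format(selector))

    def selected(value):
        try:
            return check(value)
        except Exception:
            if raise_on_error:
                raise
            return False
    return selected


def negate(selector, raise_on_error=False):
    inner = make_selector(selector, raise_on_error)
    return lambda value: not inner(value)


pair = (1.5, {"variable": {"name": "x"}})
print(make_selector(("variable.name", [int, float]))(pair))  # True
print(negate("variable.name")((1.5, {})))                    # True
print(make_selector(lambda v: v.missing > 0)(5))             # False
```

The class test comes before the callable test because classes are themselves callable.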

Iterators

Iterators allow one to transform a data flow or to create a new one.

class Chain(*iterables)

Chain generators.

Chain can be used as a Source to generate data.

Example:

>>> c = lena.flow.Chain([1, 2, 3], ['a', 'b'])
>>> list(c())
[1, 2, 3, 'a', 'b']

iterables will be chained during __call__(), that is after the first one is exhausted, the second is called, etc.

__call__()

Generate values from chained iterables.

class CountFrom(start=0, step=1)

Generate numbers from start to infinity, with step between values.

Similar to itertools.count().

__call__()

Yield values from start to infinity with step.

ISlice(*args, **kwargs)

Deprecated since Lena 0.4. Use Slice.

class Reverse

Reverse the flow (yield values from last to first).

Warning

This element will consume the whole flow.

run(flow)

Consume the flow and yield values in reverse order.

class Slice(*args)

Slice data flow from start to stop with step.

Initialization:

Slice (stop)

Slice (start, stop [, step])

Similar to itertools.islice() or range(). Negative indices for start and stop are supported during run().

Examples:

>>> Slice(1000)  # doctest: +SKIP

analyses only the first one thousand events (no other values from the flow are generated). Use it for quick checks of data on small subsamples.

>>> Slice(-1)  # doctest: +SKIP

yields all elements from the flow except the last one.

>>> Slice(1, -1)  # doctest: +SKIP

yields all elements from the flow except the first and the last one.

Note that in case of negative indices it is necessary to store abs(start) or abs(stop) values in memory. For example, to discard the last 200 elements one has to a) read the whole flow, b) store 200 elements during each iteration.
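The memory cost of a negative stop can be seen in the following sketch, which uses a buffer to hold back the most recent values. The function drop_last is a hypothetical illustration of the idea, not the code used by Slice.

```python
from collections import deque
from itertools import islice


def drop_last(flow, count):
    """Yield all values from flow except the last count ones (count > 0)."""
    buffer = deque()
    for value in flow:
        buffer.append(value)
        if len(buffer) > count:
            # At most count + 1 values are held at any moment.
            yield buffer.popleft()


# Analogue of Slice(-1):
print(list(drop_last(range(5), 1)))                    # [0, 1, 2, 3]
# Analogue of Slice(1, -1):
print(list(drop_last(islice(range(5), 1, None), 1)))   # [1, 2, 3]
```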

It is not possible to use negative indices with fill_into(), because it doesn’t control the flow and doesn’t know when it is finished. To obtain a negative step, use a composition with Reverse.

fill_into(element, value)

Fill element with value.

Values are filled in the order defined by (start, stop, step). Element must have a fill(value) method.

When the filling should stop, LenaStopFill is raised (Split handles this normally). Sometimes, for a step greater than one, LenaStopFill will be raised before stop elements have been reached. Early exceptions are an optimization and don’t affect the correctness of this method.
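A simplified sketch of filling with non-negative start, stop and step is given below. StopFill is a hypothetical stand-in for LenaStopFill, and SliceFillSketch and Collect are illustrative names. The sketch raises the exception on the first value past stop and does not implement the early-raising optimization.

```python
class StopFill(Exception):
    """Hypothetical stand-in for LenaStopFill."""


class Collect(object):
    """Hypothetical element with a fill(value) method."""

    def __init__(self):
        self.values = []

    def fill(self, value):
        self.values.append(value)


class SliceFillSketch(object):
    def __init__(self, start, stop, step=1):
        self._start, self._stop, self._step = start, stop, step
        self._index = 0  # number of values seen so far

    def fill_into(self, element, value):
        index = self._index
        if index >= self._stop:
            raise StopFill
        self._index += 1
        if index >= self._start and (index - self._start) % self._step == 0:
            element.fill(value)


collected = Collect()
every_second = SliceFillSketch(1, 6, 2)
try:
    for value in range(10):
        every_second.fill_into(collected, value)
except StopFill:
    pass
print(collected.values)  # [1, 3, 5]
```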

run(flow)

Yield values from flow from start to stop with step.

Split into bins

Split analysis into groups defined by bins.

class IterateBins(create_edges_str=None, select_bins=None)

Iterate bins of histograms.

create_edges_str is a callable that creates a string from bin’s edges and coordinate names and adds that to the context. It is passed parameters (edges, var_context), where var_context is the variable context containing variable names (it can be a single Variable or Combine). By default it is cell_to_string().

select_bins is a callable used to test bin contents. By default, only those histograms are iterated where bins contain histograms. Use select_bins to choose other classes. See Selector for examples.

If create_edges_str is not callable, LenaTypeError is raised.

run(flow)

Yield histogram bins one by one.

For each histogram from the flow, if its bins pass select_bins, they are iterated.

The resulting context is taken from bin’s context. Histogram’s context is preserved in context.bins. context.bin is updated with "edges" (with bin edges) and "edges_str" (their representation). If histogram’s context contains variable, that is used for the edges’ representation.

Values that are not histograms pass unchanged.

class MapBins(seq, select_bins, drop_bins_context=True)

Transform bin content of histograms.

This class can be used when histogram bins contain complex structures. For example, in order to plot a histogram with a 3-dimensional vector in each bin, one can create 3 histograms corresponding to the vector’s components.

seq is a sequence or an element applied to bin contents. If seq is not a Sequence or an element with run method, it is converted to a Sequence. Example: seq=Split([X(), Y(), Z()]) (provided that you have X, Y, Z variables).

If select_bins is True for histogram’s bins (tested on an arbitrary bin), the histogram is transformed. Bin types can be given in a list or as a general Selector. Example: select_bins=[lena.math.vector3, list].

MapBins creates histograms that may be plotted, because their bins contain only data without context. If drop_bins_context is False, context remains in bins. By default, context of all histogram bins is discarded. This discourages compositions of MapBins: make compositions of their internal sequences instead.

In case of incorrect arguments, LenaTypeError is raised.

run(flow)

Transform histograms from flow.

context.value is updated with bin context (if that exists). It is assumed that all bins have the same context (because they were produced by the same sequence), therefore an arbitrary bin is taken and contexts of all other bins are ignored.

Values that are not selected pass unchanged.

class SplitIntoBins(seq, arg_var, edges)

Split analysis into groups defined by bins.

seq is a FillComputeSeq sequence (or will be converted to that) that corresponds to the analysis being performed for different bins. A deep copy of seq is made for each bin.

arg_var is a Variable that takes data and returns value used to compute the bin index. Example of a two-dimensional function: arg_var = lena.variables.Variable("xy", lambda event: (event.x, event.y)).

edges is a sequence of arrays containing monotonically increasing bin edges along each dimension. Example: edges = lena.math.mesh((0, 1), 10).

Note

The final histogram may contain vectors, histograms and any other data the analysis produced. To plot them, one can extract vector components with e.g. MapBins. If bin contents are histograms, they can be yielded one by one with IterateBins.

Attributes: bins, edges.

If edges are not increasing, LenaValueError is raised. In case of other argument initialization problems, LenaTypeError is raised.

compute()

Yield a (histogram, context) pair for each compute() for all bins.

The histogram is created from edges with bin contents taken from compute() for bins. Computational context is preserved in histogram’s bins.

SplitIntoBins adds context as histogram (corresponding to edges) and variable (corresponding to arg_var) subcontexts. This allows unification of SplitIntoBins with common analysis using histograms and variables (useful when creating plots from one template). Old contexts, if they exist, are preserved in nested subcontexts (that is histogram.histogram or variable.variable).

Note

In Python 3 the minimum number of compute() among all bins is used. In Python 2, if some bin is exhausted before the others, its content will be filled with None.

fill(val)

Fill the cell corresponding to arg_var(val) with val.

Values outside the edges are ignored.
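The idea of fill() can be illustrated for the one-dimensional case. The classes Count and SplitIntoBins1D are hypothetical, arg_func plays the role of arg_var, and bins are assumed to include their lower edge and exclude their upper edge.

```python
import bisect
import copy


class Count(object):
    """Hypothetical FillCompute-like element that counts filled values."""

    def __init__(self):
        self.count = 0

    def fill(self, value):
        self.count += 1

    def compute(self):
        yield self.count


class SplitIntoBins1D(object):
    def __init__(self, element, arg_func, edges):
        edges = list(edges)
        if any(hi <= lo for lo, hi in zip(edges, edges[1:])):
            raise ValueError("edges must be increasing")
        self.edges = edges
        # Each bin gets its own deep copy of the analysis.
        self.bins = [copy.deepcopy(element) for _ in edges[1:]]
        self._arg_func = arg_func

    def fill(self, value):
        x = self._arg_func(value)
        if not self.edges[0] <= x < self.edges[-1]:
            return  # values outside the edges are ignored
        index = bisect.bisect_right(self.edges, x) - 1
        self.bins[index].fill(value)

    def bin_contents(self):
        # Take the first computed result from every bin.
        return [next(bin_.compute()) for bin_ in self.bins]


split = SplitIntoBins1D(Count(), lambda event: event["x"], [0, 1, 2, 3])
for x in [0.5, 1.5, 1.7, 3.0, -1]:
    split.fill({"x": x})
print(split.bin_contents())  # [1, 2, 0]
```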

cell_to_string(cell_edges, var_context=None, coord_names=None, coord_fmt='{}_lte_{}_lt_{}', coord_join='_', reverse=False)

Transform cell edges into a string.

cell_edges is a tuple of pairs (lower bound, upper bound) for each coordinate.

coord_names is a list of coordinate names.

coord_fmt is a string, which defines how to format individual coordinates.

coord_join is a string, which joins coordinate pairs.

If reverse is True, coordinates are joined in reverse order.
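How these parameters may combine is shown in the sketch below. It assumes that the placeholders of coord_fmt are filled in the order lower bound, coordinate name, upper bound (as the default format suggests). It requires coord_names to be given and does not handle var_context. The name cell_to_string_sketch is hypothetical.

```python
def cell_to_string_sketch(cell_edges, coord_names,
                          coord_fmt="{}_lte_{}_lt_{}", coord_join="_",
                          reverse=False):
    """Format ((low, high), ...) cell edges with coordinate names."""
    parts = [
        coord_fmt.format(low, name, high)
        for (low, high), name in zip(cell_edges, coord_names)
    ]
    if reverse:
        parts.reverse()
    return coord_join.join(parts)


print(cell_to_string_sketch(((0, 1), (2, 3)), ["x", "y"]))
# 0_lte_x_lt_1_2_lte_y_lt_3
```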

get_example_bin(struct)

Return bin with zero index on each axis of the histogram bins.

For example, if the histogram is two-dimensional, return hist[0][0].

struct can be a histogram or an array of bins.