Flow

Elements:

Cache(filename[, recompute, method, protocol])

Cache the flow passing through.

Count([name, count])

Count items that pass through.

DropContext(*args)

Sequence that transforms (data, context) flow so that only data remains in the inner sequence.

End()

Stop sequence here.

Filter(selector)

Filter values from flow.

Print([before, sep, end, transform])

Print values passing through.

Progress([name, format])

Print progress (how much data was processed and remains).

RunIf(select, *args)

Run a sequence only for selected values.

Functions:

get_context(value)

Get context from a possible (data, context) pair.

get_data(value)

Get data from value (a possible (data, context) pair).

get_data_context(value)

Get (data, context) from value (a possible (data, context) pair).

seq_map(seq, container[, one_result])

Map Lena Sequence seq to the container.

Group plots:

GroupBy([group_by, merge])

Group values.

GroupPlots(group_by[, select, transform, ...])

Deprecated since version 0.6.

group_plots(group)

Return data parts of the group and set context["group"] to their intersection.

GroupScale(scale_to[, allow_zero_scale, ...])

Scale a group of data.

MapGroup(*seq, **map_scalars)

Apply a sequence to groups.

Selector(selector[, raise_on_error])

A boolean function on values.

And(selectors[, raise_on_error])

And-test of multiple selectors.

Or(selectors[, raise_on_error])

Or-test of multiple selectors.

Not(selector[, raise_on_error])

Negate a selector.

Iterators:

Chain(*iterables)

Chain generators.

CountFrom([start, step])

Generate numbers from start to infinity, with step between values.

ISlice(*args, **kwargs)

Deprecated since version 0.4.

Reverse()

Reverse the flow (yield values from last to first).

Slice(*args)

Slice data flow from start to stop with step.

Split into bins:

Moved to Structures since Lena 0.5.

Elements

Elements form Lena sequences. This group contains miscellaneous elements that do not fit into other categories.

class Cache(filename, recompute=False, method='cPickle', protocol=2)

Cache the flow passing through.

On the first run, dump the whole flow to a file (and yield the flow unaltered). On subsequent runs, load the flow from that file in the original order.

Example:

s = Source(
         ReadFiles(),
         ReadEvents(),
         MakeHistograms(),
         Cache("histograms.pkl"),
         MakeStats(),
         Cache("stats.pkl"),
      )

If stats.pkl exists, Cache will read the data from that file and no other processing will be done. If the stats.pkl cache doesn’t exist, but the cache for histograms exists, it will be used and no previous processing (from ReadFiles to MakeHistograms) will occur. If both caches were not filled yet, processing will go as usual.

Only pickleable objects can be cached (otherwise a pickle.PickleError will be raised).

Warning

The pickle module is not secure against erroneous or maliciously constructed data. Never unpickle data from an untrusted source.

filename is the name of the file in which the cache is stored. It can be given a .pkl extension.

If recompute is True, an existing cache will always be overwritten. This option is typically used if one wants to define cache behaviour from the command line.

method can be pickle or cPickle (a faster pickle). In Python 3 they are the same.

protocol is the pickle protocol. Version 2 is the highest supported by Python 2. Version 0 is "human-readable" (as noted in the documentation). Version 3 is recommended if compatibility between Python 3 versions is needed. Version 4 was added in Python 3.4; it adds support for very large objects, pickling more kinds of objects, and some data format optimizations.

static alter_sequence(seq)

If the Sequence seq contains a Cache, which has an up-to-date cache, a Source is built based on the flattened seq and returned. Otherwise the seq is returned unchanged.

cache_exists()

Return True if the cache file exists and is readable.

If recompute was True during the initialization, pretend that the cache does not exist (return False).

drop_cache()

Remove the cache file if it exists; do nothing otherwise.

If cache exists and is readable, but could not be deleted, LenaEnvironmentError is raised.

run(flow)

Load cache or fill it.

If we can read filename, load flow from there. Otherwise use the incoming flow and fill the cache. All loaded or passing items are yielded.
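
The load-or-fill behaviour described above can be sketched in plain Python. This is not Lena's implementation: SimpleCache and counting_source are hypothetical names, and storing the whole flow as one pickled list is an assumption made for brevity.

```python
import os
import pickle
import tempfile


class SimpleCache:
    """Hypothetical stand-in for Cache (not Lena's implementation)."""

    def __init__(self, filename, recompute=False, protocol=2):
        self.filename = filename
        self.recompute = recompute
        self.protocol = protocol

    def cache_exists(self):
        # With recompute=True, pretend that the cache is absent.
        return not self.recompute and os.path.isfile(self.filename)

    def run(self, flow):
        if self.cache_exists():
            # Cache hit: the incoming flow is never consumed.
            with open(self.filename, "rb") as fil:
                for item in pickle.load(fil):
                    yield item
            return
        # Cache miss: pass values through and remember them.
        items = []
        for item in flow:
            items.append(item)
            yield item
        with open(self.filename, "wb") as fil:
            pickle.dump(items, fil, protocol=self.protocol)


def counting_source(calls):
    # Records which values were actually produced upstream.
    for value in [1, 2, 3]:
        calls.append(value)
        yield value


tmpdir = tempfile.mkdtemp()
cache = SimpleCache(os.path.join(tmpdir, "flow.pkl"))
calls = []
first = list(cache.run(counting_source(calls)))   # fills the cache
second = list(cache.run(counting_source(calls)))  # reads the cache
```

On the second run the source generator is never advanced, which mirrors the statement that no previous processing occurs when a cache is available.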

class Count(name='count', count=0)

Count items that pass through.

Example:

>>> flow = [0, 1, 2]
>>> c = Count("my_counter")
>>> list(c.run(iter(flow))) == [
...     0, 1, (2, {'my_counter': 3})
... ]
True

name is this counter’s name (added to context). One can use the default name if Count is filled, but it is recommended to provide a meaningful name in a Run element.

count is the initial counter. It is added to all countings. It is set to 0 during reset().

name and count are public attributes.

compute()

Yield (count, context).

context is taken from the last filled value and is updated with {self.name: self.count}.

fill(value)

Increase count and set current context from value.

fill_into(element, value)

Fill element with value and increase count.

value context is updated with {self.name: self.count}.

element must have a fill(value) method.

reset()

Set count to zero. Clear current context.

run(flow)

Yield incoming values and increase count.

After the flow is exhausted, update last value’s context with {self.name: self.count}.

If the flow was empty, nothing is yielded (so count can be zero only from compute()).
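
A minimal sketch of this behaviour, assuming a one-value lookahead is used to find the last item. count_run is a hypothetical function written for illustration, not the library's code.

```python
def count_run(flow, name="count", count=0):
    """Yield values unchanged; attach the total to the last one."""
    flow = iter(flow)
    try:
        prev = next(flow)
    except StopIteration:
        # Empty flow: nothing is yielded.
        return
    count += 1
    for value in flow:
        # prev is not the last value, so it passes unchanged.
        yield prev
        prev = value
        count += 1
    # Split the last value into data and context, if it has a context.
    if (isinstance(prev, tuple) and len(prev) == 2
            and isinstance(prev[1], dict)):
        data, context = prev[0], dict(prev[1])
    else:
        data, context = prev, {}
    context[name] = count
    yield (data, context)


result = list(count_run(iter([0, 1, 2]), name="my_counter"))
empty = list(count_run(iter([])))
```

Here result reproduces the flow from the Count example above, and empty is an empty list.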

class DropContext(*args)

Sequence that transforms (data, context) flow so that only data remains in the inner sequence. Context is restored outside DropContext.

DropContext works for most simple cases as a Sequence, but may not work in more advanced circumstances. For example, since DropContext is not transparent, Split can’t judge whether it has a FillCompute element inside, and this may lead to errors in the analysis. It is recommended to provide context when possible.

*args will form a Sequence.

run(flow)

Run the sequence without context, and generate output flow restoring the context before DropContext.

If the sequence adds a context, the returned context is updated with that.

class End

Stop sequence here.

run(flow)

Exhaust all preceding flow and stop iteration (yield nothing to the following flow).

class Filter(selector)

Filter values from flow.

selector is a boolean function. If it returns True, the value passes Filter. If selector is not callable, it is converted to a Selector. If the conversion could not be done, LenaTypeError is raised.

Note

Filter appeared in Lena only in version 0.4. There may be better alternatives to using this element:

  • don’t produce values that you will discard later. If you want to select data from a specific file, read only that file.

  • use a custom class. SelectPosition("border") is more readable and maintainable than a Filter with many conditions, and it is also more cohesive if you group several options like "center" or "top" in a single place. If you make a selection, it can be useful to add information about that to the context (and Filter does not do that).

This doesn’t mean that we recommend against this class: sometimes it can be quick and explicit, and if one’s class name provides absolutely no clue what it does, a general Filter would be more readable.

New in version 0.4.

fill_into(element, value)

Fill value into an element if selector(value) is True.

Element must have a fill(value) method.

run(flow)

Yield values from the flow for which the selector is True.
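
The two methods can be illustrated with a short sketch. SimpleFilter and ListElement are hypothetical classes; the conversion of non-callable selectors to a Selector is omitted here.

```python
class ListElement:
    """Hypothetical element with a fill(value) method."""

    def __init__(self):
        self.values = []

    def fill(self, value):
        self.values.append(value)


class SimpleFilter:
    """Hypothetical stand-in for Filter with a callable selector."""

    def __init__(self, selector):
        self._selector = selector

    def fill_into(self, element, value):
        # Only selected values reach the element.
        if self._selector(value):
            element.fill(value)

    def run(self, flow):
        for value in flow:
            if self._selector(value):
                yield value


positive = SimpleFilter(lambda value: value > 0)
passed = list(positive.run([-1, 2, 0, 3]))

element = ListElement()
for value in [-1, 2, 0, 3]:
    positive.fill_into(element, value)
```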

class Print(before='', sep='', end='\n', transform=None)

Print values passing through.

before is a string prepended to the first element in the item (which may be a container).

sep separates elements, end is appended after the last element.

transform is a function which transforms passing items (for example, it can select its specific fields).

__call__(value)

Print and return value.

class Progress(name='', format='')

Print progress (how much data was processed and remains).

name, if set, customizes the output with the collective name of values being processed (for example, "events").

format is a formatting string for the output. It will be passed keyword arguments percent, index, total and name.

Place Progress before a long processing step. For example, if you have files with much data, put this element after generating file names, but before reading files. To print indices without reading the whole flow, use CountFrom and Print.

Progress is estimated based on the number of items processed by this element. It does not take into account the creation of final plots or the difference in the processing time for different values.

Warning

To measure progress, the whole flow is consumed.

run(flow)

Consume the flow, then yield values one by one and print progress.
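
Why the whole flow must be consumed first can be seen from a sketch: the total is unknown until the last value has been read. The function progress, its default format string and the out parameter are assumptions made for illustration; only the keyword names percent, index, total and name come from the description above.

```python
def progress(flow, name="", fmt="{percent}% [{index}/{total}] {name}",
             out=print):
    # The total is unknown until the whole flow has been read.
    values = list(flow)
    total = len(values)
    for index, value in enumerate(values, 1):
        out(fmt.format(percent=100 * index // total, index=index,
                       total=total, name=name))
        yield value


messages = []
values = list(progress(iter("abcd"), name="events", out=messages.append))
```

Whether the message is printed before or after a value is yielded is not specified above; this sketch prints it before.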

class RunIf(select, *args)

Run a sequence only for selected values.

Note

In general, different flows are transformed to common data types (like histograms). In some complicated analyses (like in SplitIntoBins) there can appear values of very different types, for which additional transformation must be run. Use this element in such cases.

RunIf is similar to Filter, but the latter can be used as a FillInto element inside Split.

RunIf with a selector select (let us call its opposite not_select) is equivalent to

Split(
    [
        (
            select,
            Sequence(*args)
        ),
        not_select
        # not selected values pass unchanged
    ],
    bufsize=1,
    copy_buf=False
)

and can be considered "syntactic sugar". Use Split for more flexibility.

select is a function that accepts a value (maybe with context) and returns a boolean. It is converted to a Selector. See its specifications for available options.

args are an arbitrary number of elements that will be run for selected values. They are joined into a Sequence.

New in version 0.4.

run(flow)

Run the sequence for selected values from the flow.

Warning

RunIf disrupts the flow: it feeds values to the sequence one by one, and yields the results. If the sequence depends on the complete flow (for example, yields the maximum element), this will be incorrect. The flow after RunIf is not disrupted.

Not selected values pass unchanged.
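
The warning can be made concrete with a sketch in which the sequence is represented by a plain generator function. run_if, double and maximum are hypothetical names, not part of Lena.

```python
def run_if(select, seq_run, flow):
    """Feed selected values to seq_run one by one; pass the rest."""
    for value in flow:
        if select(value):
            # The sequence sees a flow of exactly one value.
            for result in seq_run([value]):
                yield result
        else:
            yield value


def double(flow):
    for value in flow:
        yield 2 * value


def maximum(flow):
    # Depends on the complete flow, so it misbehaves inside run_if.
    yield max(flow)


def is_int(value):
    return isinstance(value, int)


doubled = list(run_if(is_int, double, [1, "a", 2]))
maxima = list(run_if(is_int, maximum, [1, "a", 2]))
```

A per-value transformation such as double works as expected, while maximum is computed separately for every selected value and therefore returns each value itself.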

Functions

Functions to deal with data and context, and seq_map().

A value is considered a (data, context) pair, if it is a tuple of length 2, and the second element is a dictionary or its subclass.

get_context(value)

Get context from a possible (data, context) pair.

If context is not found, return an empty dictionary.

get_data(value)

Get data from value (a possible (data, context) pair).

If context is not found, return value.

get_data_context(value)

Get (data, context) from value (a possible (data, context) pair).

If context is not found, (value, {}) is returned.

Since get_data() and get_context() both check whether context is present, this function may be slightly more efficient and compact than the other two.
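
The definition of a (data, context) pair given above translates into a single check. The function split_value below is a hypothetical re-implementation written for illustration; it follows the stated rule (a tuple of length 2 whose second element is a dictionary or its subclass).

```python
from collections import OrderedDict


def split_value(value):
    """Return (data, context); context is {} when value has none."""
    if (isinstance(value, tuple) and len(value) == 2
            and isinstance(value[1], dict)):
        return value
    return (value, {})


with_context = split_value((1, {"name": "x"}))
plain_tuple = split_value((1, 2))
scalar = split_value(5)
subclass = split_value(("data", OrderedDict(a=1)))
```

Note that a tuple like (1, 2) is treated as data without context, because its second element is not a dictionary.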

seq_map(seq, container, one_result=True)

Map Lena Sequence seq to the container.

For each value from the container, calculate seq.run([value]). This can be a list or a single value. If one_result is True, the result must be a single value. In this case, if the results contain fewer or more than one element, LenaValueError is raised.

The list of results (lists or single values) is returned. The results are in the same order as read from the container.
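
A sketch of the mapping logic, assuming the sequence is represented by its run function. map_runner is a hypothetical name, and the built-in ValueError stands in for LenaValueError.

```python
def map_runner(run, container, one_result=True):
    """Apply run to one-item flows made from container values."""
    results = []
    for value in container:
        res = list(run([value]))
        if one_result:
            if len(res) != 1:
                # The library raises LenaValueError in this case.
                raise ValueError(
                    "expected one result, got {}".format(len(res))
                )
            res = res[0]
        results.append(res)
    return results


def square(flow):
    for value in flow:
        yield value * value


def duplicate(flow):
    for value in flow:
        yield value
        yield value


single = map_runner(square, [1, 2, 3])
lists = map_runner(duplicate, [1, 2], one_result=False)
```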

Group plots

Group several plots into one.

Since data can be produced in different places, several classes are needed to support this. First, the plots of interest must be selected (for example, one-dimensional histograms). This is done by Selector. Selected plots must be grouped. For example, we may want to plot data x versus Monte-Carlo x, but not data x vs data y. Data is grouped by GroupBy. To preserve the group, we can’t yield its members to the following elements, but have to transform the plots inside GroupPlots. We can also scale (normalize) all plots to one using GroupScale.

Example from a real analysis:

Sequence(
    # ... read data and produce histograms ...
    MakeFilename(dirname="background/{{run_number}}"),
    UpdateContext("output.plot.name", "{{variable.name}}",
                  raise_on_missing=True),
    lena.flow.GroupPlots(
        group_by="variable.coordinate",
        # Select either histograms (data) or Graphs (fit),
        # but only having "variable.coordinate" in context
        select=("variable.coordinate", [histogram, Graph]),
        # scale to data
        scale=Not("fit"),
        transform=(
            ToCSV(),
            # scaled plots will be written to separate files
            MakeFilename(
                "{{output.filename}}_scaled",
                overwrite=True,
            ),
            UpdateContext("output.plot.name", "{{variable.name}}",
                          raise_on_missing=True),
            write,
            # Several prints were used during this code creation
            # Print(transform=lambda val: val[1]["plot"]["name"]),
        ),
        # make both single and combined plots of coordinates
        yield_selected=True,
    ),
    # create file names for combined plots
    MakeFilename("combined_{{variable.coordinate}}"),
    # non-combined plots will still need file names
    MakeFilename("{{variable.name}}"),
    lena.output.ToCSV(),
    write,
    lena.context.Context(),
    # here our jinja template renders a group as a list of items
    lena.output.RenderLaTeX(template_dir=TEMPLATE_DIR,
                            select_template=select_template),
    # we have a single template, no more groups are present
    write,
    lena.output.LaTeXToPDF(),
)

class GroupBy(group_by='', merge='')

Group values.

Data is added during fill(). Groups dictionary is available as groups attribute. groups is a mapping of keys (defined by group_by and merge) to lists of items with the same key.

group_by is a function that returns distinct hashable results for values from different groups. It can be also a dot-separated formatting string. In that case only the context part of the value is used (see context.format_context). group_by can be a tuple of strings or callables. In that case the hash value will be combined from each part of the tuple. A tuple may be used when not all parts of context can be always rendered (that would lead to an error or an empty string if they were combined into one formatting string).

Changed in version 0.6: group_by is no longer a function.

New in version 0.6: merge allows ignoring keys.

clear()

Deprecated since version 0.6: use the standard reset() method.

compute()

Yield values grouped by distinct keys one by one.

Each group is a tuple of filled values having the same key.

fill(val)

Find the corresponding group and fill it with val.

A group key is calculated via group_by and merge. If no such key exists, a new group is created.

If a formatting key was not found for val (or if no values for a tuple group_by could produce keys) LenaValueError is raised.
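
The grouping step itself can be sketched with a dictionary of lists. SimpleGroupBy is a hypothetical stand-in that accepts only a callable group_by and ignores merge; formatting strings and tuples are not handled.

```python
class SimpleGroupBy:
    """Hypothetical stand-in for GroupBy with a callable group_by."""

    def __init__(self, group_by):
        self._group_by = group_by
        self.groups = {}

    def fill(self, val):
        key = self._group_by(val)
        # A new group is created when the key is seen for the first time.
        self.groups.setdefault(key, []).append(val)

    def reset(self):
        self.groups = {}


def coordinate(val):
    # val is a (data, context) pair; group by a context entry.
    return val[1]["variable"]["coordinate"]


values = [
    ("data_x", {"variable": {"coordinate": "x"}}),
    ("data_y", {"variable": {"coordinate": "y"}}),
    ("fit_x", {"variable": {"coordinate": "x"}}),
]
grouper = SimpleGroupBy(coordinate)
for val in values:
    grouper.fill(val)

x_names = [val[0] for val in grouper.groups["x"]]
```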

reset()

Remove all groups.

update(val)

Deprecated since version 0.6: use the standard fill() method.

class GroupPlots(group_by, select=None, transform=(), scale=None, yield_selected=False)

Deprecated since version 0.6: use GroupBy, group_plots() and other relevant elements.

Plots to be grouped are chosen by select, which acts as a boolean function. By default everything is selected. If select is not a Selector, it is converted to that class. Use Selector for more options.

Deprecated since version 0.5: use RunIf instead of select.

Plots are grouped by group_by, which returns different keys for different groups. It can be a function of a value or a formatting string for its context (see GroupBy). Example: group_by="{{value.variable.name}}_{{variable.name}}".

transform is a sequence that processes individual plots before yielding. Example: transform=(ToCSV(), write). transform is called after scale.

Deprecated since version 0.5: use MapGroup instead of transform.

scale is a number or a string. A number means the scale, to which plots must be normalized. A string is a name of the plot to which other plots must be normalized. If scale is not an instance of GroupScale, it is converted to that class. If a plot could not be rescaled, LenaValueError is raised. For more options, use GroupScale.

yield_selected defines whether selected items should be yielded during run(). By default it is False: if we used a variable in a combined plot, we don’t create a separate plot of that.

run(flow)

Run the flow and yield final groups.

Each item of the flow is checked with the selector. If it is selected, it is added to groups. Otherwise, it is yielded.

After the flow is finished, groups are yielded. Groups are lists of items that have the same key returned from group_by. Each group’s context (including an empty one) is inserted into a list in context.group. If any element’s context.output.changed is True, the final context.output.changed is set to True (and to False otherwise). The resulting context is updated with the intersection of the groups’ contexts.

If scale was set, plots are normalized to the given value or plot. If that plot was not selected (is missing in the captured group) or its norm could not be calculated, LenaValueError is raised.

group_plots(group)

Return data parts of the group and set context["group"] to their intersection.

If any of values has been changed, context.output.changed of the group is set to True.

class GroupScale(scale_to, allow_zero_scale=False, allow_unknown_scale=False)

Scale a group of data.

scale_to defines the method of scaling. If a number is given, group items are scaled to that. Otherwise it is converted to a Selector, which must return a unique item from the group. Group items will be scaled to the scale of that item.

By default, attempts to rescale a structure with unknown or zero scale raise an error. If allow_zero_scale and allow_unknown_scale are set to True, the corresponding errors are ignored and the structure remains unscaled.

__call__(group)

Scale the group. See scale_to() for details.

If group is not iterable, LenaValueError is raised.

class MapGroup(*seq, **map_scalars)

Apply a sequence to groups.

Arguments seq must form a Sequence.

Set a keyword argument map_scalars to False to ignore scalar values (those that are not groups). Other keyword arguments raise LenaTypeError.

New in version 0.5.

run(flow)

Map seq to every group from flow.

A value represents a group if its context has a key group and its data part is iterable (for example, a list of values). If length of data is different from the length of context.group, LenaRuntimeError is raised.

seq must produce an equal number of results for each item of group, or LenaRuntimeError is raised. These results are yielded in groups one by one.

Common changes of group context update common context (that of the value). context.output.changed is set appropriately.

class Selector(selector, raise_on_error=True)

A boolean function on values.

The usage of selector depends on its type.

If selector is a class, __call__() checks that data part of the value is subclassed from that.

A callable is used as it is.

A string means that value’s context must conform to that (as in context.contains).

selector can be a container. In this case its items are converted to selectors. If selector is a list, the result is or applied to results of each item. If it is a tuple, boolean and is applied to the results.

raise_on_error is a boolean that sets whether in case of an exception the selector raises that exception or returns False. If selector is a container, raise_on_error will be used recursively during the initialization of its items.
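
The dispatch on the selector type can be sketched as follows. make_selector and data_part are hypothetical helpers; string selectors (context checks) and raise_on_error are left out of this sketch.

```python
import inspect


def data_part(value):
    # A (data, context) pair is a 2-tuple whose second item is a dict.
    if (isinstance(value, tuple) and len(value) == 2
            and isinstance(value[1], dict)):
        return value[0]
    return value


def make_selector(selector):
    """Hypothetical helper: build a boolean function from selector."""
    # Classes are callable too, so they must be checked first.
    if inspect.isclass(selector):
        return lambda value: isinstance(data_part(value), selector)
    if callable(selector):
        return selector
    if isinstance(selector, list):
        # A list means "or" of its items.
        subs = [make_selector(sub) for sub in selector]
        return lambda value: any(sub(value) for sub in subs)
    if isinstance(selector, tuple):
        # A tuple means "and" of its items.
        subs = [make_selector(sub) for sub in selector]
        return lambda value: all(sub(value) for sub in subs)
    raise TypeError("unsupported selector: {!r}".format(selector))


# Select an int, or a str longer than two characters.
select = make_selector([int, (str, lambda value: len(value) > 2)])
```

With this selector, 1 and "abcd" are selected, while "ab" and 1.5 are not. The nested tuple short-circuits, so len is never called on a non-string.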

__call__(value)

Check whether value is selected.

If an exception occurs and raise_on_error is False, the result is False. This could be used while testing potentially non-existing attributes or arbitrary contexts. However, this is not recommended, since it covers too many errors and some of them should be raised explicitly.

class And(selectors, raise_on_error=True)

Bases: Selector

And-test of multiple selectors.

selectors is a tuple of items, each of which is a Selector or will be converted to that.

raise_on_error has the same meaning as in Selector, and will be applied to each newly initialized subselector.

__call__(val)

Check whether value is selected.

If an exception occurs and raise_on_error is False, the result is False. This could be used while testing potentially non-existing attributes or arbitrary contexts. However, this is not recommended, since it covers too many errors and some of them should be raised explicitly.

class Or(selectors, raise_on_error=True)

Bases: Selector

Or-test of multiple selectors.

selectors is a list of items, each of which is a Selector or will be converted to that. Evaluation is short-circuit, that is if a selector was true, further ones are not applied.

raise_on_error has the same meaning as in Selector, and will be applied to each newly initialized subselector.

__call__(val)

Check whether value is selected.

If an exception occurs and raise_on_error is False, the result is False. This could be used while testing potentially non-existing attributes or arbitrary contexts. However, this is not recommended, since it covers too many errors and some of them should be raised explicitly.

class Not(selector, raise_on_error=True)

Bases: Selector

Negate a selector.

selector is converted to Selector.

raise_on_error has the same meaning as in Selector.

__call__(value)

Negate the result of the selector.

If raise_on_error is False, then this is a full negation (including the case of an error encountered in the selector). If raise_on_error is True, then any occurred exception will be re-raised here.
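
One reading of this rule is sketched below: with raise_on_error set to False, a failing selector counts as False, so its negation is True. The helper negate is hypothetical, and this interpretation is an assumption based on the paragraph above.

```python
def negate(selector, raise_on_error=True):
    """Hypothetical helper: return the negation of selector."""
    def negated(value):
        try:
            return not selector(value)
        except Exception:
            if raise_on_error:
                raise
            # The selector failed, which counts as False; negated: True.
            return True
    return negated


def named_x(value):
    # Raises KeyError for a context without "name".
    return value["name"] == "x"


lenient = negate(named_x, raise_on_error=False)
strict = negate(named_x)

lenient_result = lenient({})
strict_result = strict({"name": "x"})
```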

Iterators

Iterators allow one to transform a data flow or create a new one.

class Chain(*iterables)

Chain generators.

Chain can be used as a Source to generate data.

Example:

>>> c = lena.flow.Chain([1, 2, 3], ['a', 'b'])
>>> list(c())
[1, 2, 3, 'a', 'b']

iterables will be chained during __call__(), that is after the first one is exhausted, the second is called, etc.

__call__()

Generate values from chained iterables.

class CountFrom(start=0, step=1)

Generate numbers from start to infinity, with step between values.

Similar to itertools.count().

__call__()

Yield values from start to infinity with step.
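
Since the sequence is infinite, it has to be bounded by the consumer. The sketch below uses itertools.count as a stand-in for CountFrom (count_from is a hypothetical wrapper) and itertools.islice to take a finite part.

```python
import itertools


def count_from(start=0, step=1):
    # Stand-in for CountFrom: an endless arithmetic progression.
    return itertools.count(start, step)


# Take only the first four values; a plain list() call would never end.
first_four = list(itertools.islice(count_from(10, 5), 4))
```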

ISlice(*args, **kwargs)

Deprecated since version 0.4: use Slice.

class Reverse

Reverse the flow (yield values from last to first).

Warning

This element will consume the whole flow.

run(flow)

Consume the flow and yield values in reverse order.

class Slice(*args)

Slice data flow from start to stop with step.

Initialization:

Slice (stop)

Slice (start, stop [, step])

Similar to itertools.islice() or range(). Negative indices for start and stop are supported during run().

Examples:

>>> Slice(1000)  

analyses only the first one thousand events (no other values from the flow are generated). Use it for quick checks of data on small subsamples.

>>> Slice(-1)  

yields all elements from the flow except the last one.

>>> Slice(1, -1)  

yields all elements from the flow except the first and the last one.

Note that in case of negative indices it is necessary to store abs(start) or abs(stop) values in memory. For example, to discard the last 200 elements one has to a) read the whole flow, b) store 200 elements during each iteration.

It is not possible to use negative indices with fill_into(), because it doesn’t control the flow and doesn’t know when it is finished. To obtain a negative step, use a composition with Reverse.
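
The memory cost of a negative stop can be illustrated with a bounded buffer. drop_last is a hypothetical function, not the library's code: a value is released only once n later values have been seen, so exactly n values are held at any time.

```python
import itertools
from collections import deque


def drop_last(flow, n):
    """Yield all values except the last n, buffering n values."""
    buf = deque()
    for value in flow:
        buf.append(value)
        if len(buf) > n:
            # n newer values exist, so the oldest one is not in the tail.
            yield buf.popleft()


# Analogue of Slice(-1): everything except the last element.
without_last = list(drop_last(range(5), 1))

# Analogue of Slice(1, -1): skip the first element, then drop the last.
inner = list(drop_last(itertools.islice(range(5), 1, None), 1))
```

This also shows why such a slice cannot work with fill_into(): the buffered values can be discarded only when the end of the flow is known.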

fill_into(element, value)

Fill element with value.

Values are filled in the order defined by (start, stop, step). Element must have a fill(value) method.

When the filling should stop, LenaStopFill is raised (Split handles this normally). Sometimes for step more than one LenaStopFill will be raised before reaching stop elements. Early exceptions are an optimization and don’t affect the correctness of this method.

run(flow)

Yield values from flow from start to stop with step.