Flow
Elements:
| Cache | Cache the flow passing through. |
| Count | Count items that pass through. |
| DropContext | Sequence that transforms (data, context) flow so that only data remains in the inner sequence. |
| End | Stop sequence here. |
| Filter | Filter values from flow. |
| Print | Print values passing through. |
| Progress | Print progress (how much data was processed and remains). |
| RunIf | Run a sequence only for selected values. |
| StoreFilled | Store filled items. |
Context managers:
| Data | Apply transformation only to the data part of the flow. |
| Context | Apply transformation only to the context part of the flow. |
Functions:
| compose | Function composition of seq. |
| get_context | Get context from a possible (data, context) pair. |
| get_data | Get data from value (a possible (data, context) pair). |
| get_data_context | Get (data, context) from value (a possible (data, context) pair). |
| seq_map | Map Lena Sequence seq to the container. |
Group plots:
| GroupBy | Group values. |
| GroupPlots | |
| group_plots | Return data parts of the group and set context["group"] to their intersection. |
| GroupScale | Scale a group of data. |
| MapGroup | Apply a sequence to groups. |
| Selector | A boolean function on values. |
| And | And-test of multiple selectors. |
| Or | Or-test of multiple selectors. |
| Not | Negate a selector. |
Iterators:
| Chain | |
| CountFrom | Generate numbers from start to infinity, with step between values. |
| ISlice | |
| Iter | Create a Source from an iterable. |
| Reverse | Reverse the flow (yield values from last to first). |
| Slice | Slice data flow from start to stop with step. |
Split into bins:
Since Lena 0.5 moved to Structures.
Elements
Elements form Lena sequences. This group contains miscellaneous elements that did not fit into other categories.
- class Cache(filename, recompute=False, method='cPickle', protocol=2)
Cache the flow passing through.
On the first run, dump the whole flow to a file (and yield the flow unaltered). On subsequent runs, load the flow from that file in the original order.
Example:
s = Source(
    ReadFiles(),
    ReadEvents(),
    MakeHistograms(),
    Cache("histograms.pkl"),
    MakeStats(),
    Cache("stats.pkl"),
)
If stats.pkl exists, Cache will read the data from that file and no other processing will be done. If the stats.pkl cache doesn’t exist, but the cache for histograms exists, it will be used and no previous processing (from ReadFiles to MakeHistograms) will occur. If neither cache has been filled yet, processing will go as usual.
Only pickleable objects can be cached (otherwise a pickle.PickleError will be raised).
Warning
The pickle module is not secure against erroneous or maliciously constructed data. Never unpickle data from an untrusted source.
filename is the name of the file where the cache is stored. It can be given a .pkl extension.
If recompute is True, an existing cache will always be overwritten. This option is typically used if one wants to define cache behaviour from the command line.
method can be pickle or cPickle (faster pickle). In Python 3 they are the same.
protocol is the pickle protocol. Version 2 is the highest supported by Python 2. Version 0 is "human-readable" (as noted in the documentation). 3 is recommended if compatibility between Python 3 versions is needed. 4 was added in Python 3.4. It adds support for very large objects, pickling more kinds of objects, and some data format optimizations.
- static alter_sequence(seq)
If the Sequence seq contains a Cache that has an up-to-date cache, a Source is built based on the flattened seq and returned. Otherwise seq is returned unchanged.
- cache_exists()
Return True if the file with the cache exists and is readable.
If recompute was True during initialization, pretend that the cache does not exist (return False).
- drop_cache()
Remove the cache file if it exists; do nothing otherwise.
If the cache exists and is readable but could not be deleted, LenaEnvironmentError is raised.
- run(flow)
Load cache or fill it.
If we can read filename, load flow from there. Otherwise use the incoming flow and fill the cache. All loaded or passing items are yielded.
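The load-or-fill behaviour described above can be sketched in plain Python. This is only an illustration, not Lena's implementation: the function name cached_flow is hypothetical, and the on-disk layout (one pickle record per item) is an assumption made for this sketch.

```python
import os
import pickle
import tempfile


def cached_flow(flow, filename, recompute=False, protocol=2):
    """Yield items of flow, replaying them from filename if it exists.

    Hypothetical helper: one pickle record is written per item.
    """
    if not recompute and os.path.isfile(filename):
        # Cache hit: replay stored items in order, ignore the incoming flow.
        with open(filename, "rb") as fil:
            while True:
                try:
                    yield pickle.load(fil)
                except EOFError:
                    break
        return
    # Cache miss (or recompute): dump each item and yield it unaltered.
    with open(filename, "wb") as fil:
        for item in flow:
            pickle.dump(item, fil, protocol)
            yield item


path = os.path.join(tempfile.mkdtemp(), "flow.pkl")
first_run = list(cached_flow(iter([1, 2, 3]), path))
# On the second run the incoming flow is never read.
second_run = list(cached_flow(iter(["ignored"]), path))
```

Note that in this sketch the cache is complete only if the generator is exhausted; a partially consumed first run would leave a truncated file.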
- class Count(name='count', count=0)
Count items that pass through.
Example:
>>> flow = [0, 1, 2]
>>> c = Count("my_counter")
>>> list(c.run(iter(flow))) == [
...     0, 1, (2, {'my_counter': 3})
... ]
True
name is this counter’s name (added to context). One can use the default name if Count is filled, but it is recommended to provide a meaningful name in a Run element.
count is the initial counter. It is added to all countings. It is set to 0 during reset().
name and count are public attributes.
- compute()
Yield (count, context).
context is taken from the last filled value and is updated with {self.name: self.count}.
- fill(value)
Increase count and set current context from value.
- fill_into(element, value)
Fill element with value and increase count.
value context is updated with {self.name: self.count}.
element must have a fill(value) method.
- reset()
Set count to zero. Clear current context.
- run(flow)
Yield incoming values and increase count.
After the flow is exhausted, update last value’s context with {self.name: self.count}.
If the flow was empty, nothing is yielded (so count can be zero only from compute()).
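To make the run() behaviour concrete, here is a minimal sketch in plain Python of a generator that delays each value by one step so that the count can be attached to the last one. The name count_sketch is hypothetical and the code is not taken from Lena; it only reproduces the behaviour described above.

```python
def count_sketch(flow, name="count", start=0):
    """Yield values unchanged; add {name: count} to the last value's context."""
    count = start
    flow = iter(flow)
    try:
        prev = next(flow)
    except StopIteration:
        return  # empty flow: nothing is yielded
    count += 1
    for value in flow:
        yield prev  # prev is not the last value, pass it unchanged
        prev = value
        count += 1
    # prev is now the last value: split it into data and context.
    if isinstance(prev, tuple) and len(prev) == 2 and isinstance(prev[1], dict):
        data, context = prev[0], dict(prev[1])
    else:
        data, context = prev, {}
    context[name] = count
    yield (data, context)


counted = list(count_sketch([0, 1, 2], "my_counter"))
```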
- class DropContext(*args)
Sequence that transforms (data, context) flow so that only data remains in the inner sequence. Context is restored outside DropContext.
DropContext works for most simple cases as a Sequence, but may not work in more advanced circumstances. For example, since DropContext is not transparent, Split can’t judge whether it has a FillCompute element inside, and this may lead to errors in the analysis. It is recommended to provide context when possible.
*args will form a Sequence.
- run(flow)
Run the sequence without context, and generate output flow restoring the context before DropContext.
If the sequence adds a context, the returned context is updated with that.
- class End
Stop sequence here.
- run(flow)
Exhaust all preceding flow and stop iteration (yield nothing to the following flow).
- class Filter(selector)
Filter values from flow.
selector is a boolean function. If it returns True, the value passes Filter. If selector is not callable, it is converted to a Selector. If the conversion could not be done, LenaTypeError is raised.
Note
Filter appeared in Lena only in version 0.4. There may be better alternatives to using this element:
- don’t produce values that you will discard later. If you want to select data from a specific file, read only that file.
- use a custom class. SelectPosition("border") is more readable and maintainable than a Filter with many conditions, and it is also more cohesive if you group several options like "center" or "top" in a single place. If you make a selection, it can be useful to add information about that to the context (and Filter does not do that).
This doesn’t mean that we recommend against this class: sometimes it can be quick and explicit, and if one’s class name provides absolutely no clue what it does, a general Filter would be more readable.
Added in version 0.4.
- fill_into(element, value)
Fill value into an element if selector(value) is True.
Element must have a fill(value) method.
- run(flow)
Yield values from the flow for which the selector is True.
- class Print(transform=None, before='', sep='', end='\n')
Print values passing through.
transform is a function which transforms passing items (for example, it can select specific fields of an item).
before is a string printed before the first element in the item (which may be a container).
The first argument is defined according to its type: a function is considered transform, while a string is considered before.
sep separates elements, end is appended after the last element.
- __call__(value)
Print and return value.
- class Progress(name='', format='')
Print progress (how much data was processed and remains).
name, if set, customizes the output with the collective name of values being processed (for example, "events").
format is a formatting string for the output. It will be passed keyword arguments percent, index, total and name.
Use Progress before a large processing. For example, if you have files with much data, put this element after generating file names, but before reading files. To print indices without reading the whole flow, use CountFrom and Print.
Progress is estimated based on the number of items processed by this element. It does not take into account the creation of final plots or the difference in the processing time for different values.
Warning
To measure progress, the whole flow is consumed.
- run(flow)
Consume the flow, then yield values one by one and print progress.
- class RunIf(select, *args)
Run a sequence only for selected values.
Note
In general, different flows are transformed to common data types (like histograms). In some complicated analyses (like in SplitIntoBins) there can appear values of very different types, for which additional transformation must be run. Use this element in such cases.
RunIf is similar to Filter, but the latter can be used as a FillInto element inside Split.
RunIf with a selector select (let us call its opposite not_select) is equivalent to
Split(
    [
        (
            select,
            Sequence(*args)
        ),
        not_select
        # not selected values pass unchanged
    ],
    bufsize=1,
    copy_buf=False
)
and can be considered "syntactic sugar". Use Split for more flexibility.
select is a function that accepts a value (maybe with context) and returns a boolean. It is converted to a Selector. See its specifications for available options.
args are an arbitrary number of elements that will be run for selected values. They are joined into a Sequence.
Added in version 0.4.
- run(flow)
Run the sequence for selected values from the flow.
Warning
RunIf disrupts the flow: it feeds values to the sequence one by one, and yields the results. If the sequence depends on the complete flow (for example, yields the maximum element), this will be incorrect. The flow after RunIf is not disrupted.
Not selected values pass unchanged.
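The warning about disrupted flow can be illustrated without Lena. In the sketch below, run_if_sketch and max_of_flow are hypothetical names; seq stands for any callable that takes an iterable and returns an iterator of results. Because each selected value is fed to seq on its own, an element that needs the whole flow (here, a maximum) sees only one item at a time.

```python
def run_if_sketch(flow, select, seq):
    """Feed selected values to seq one by one; pass the others unchanged."""
    for value in flow:
        if select(value):
            # seq sees a flow consisting of this single value only
            for result in seq(iter([value])):
                yield result
        else:
            yield value


def max_of_flow(flow):
    """An element that depends on the complete flow."""
    yield max(flow)


def is_int(value):
    return isinstance(value, int)


# The "maximum" is computed per item, so every integer passes as is.
results = list(run_if_sketch([1, 5, 2, "a"], is_int, max_of_flow))
```

Here results equals [1, 5, 2, "a"] rather than [5, "a"], which is why a sequence that aggregates over the flow gives incorrect output in this setting.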
- class StoreFilled(yield_as_a_group=True)
Store filled items.
If yield_as_a_group is False, values are yielded one by one in compute(). By default they are yielded as a group.
A public attribute group allows access to the list of filled values.
This class is memory unsafe by definition. It is used mostly for testing purposes.
- compute()
Yield the collected values.
- fill(value)
Add value to the collected items.
- reset()
Clear the group.
Context managers
Context managers allow application of functions only to data or context parts of the flow. Example:
s = Source(
    # read data ...
    # add context
    UpdateContext(...),
    Context(
        # apply functions only to the context
        copy.deepcopy,  # makes more sense in Split
        other_function,
    ),
    # transform only data part of the value
    Data(lambda val: {"sum": val}),
    # the same could be achieved with
    # lambda val: ({"sum": val[0]}, val[1]),
    # but Data makes it more explicit and structured.
    # ...
    # other elements again use both data and context
)
- class Context(*seq)
Apply transformation only to the context part of the flow.
seq is a sequence of callables, which will be applied to the context part of the value.
Added in version 0.6.
- __call__(value)
Apply self to the context part of the value.
- class Data(*seq)
Apply transformation only to the data part of the flow.
seq is a sequence of callables (one-to-one elements), which will be applied to the data part of the value.
The advantage of this element is its simplicity and flexibility (can also be used in a Fill sequence).
See also
DropContext to use not only callables, but also any-to-any (Run) elements.
Added in version 0.6.
- __call__(value)
Apply self to the data part of the value.
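As a rough illustration of what these two elements do, the following plain-Python sketch wraps a chain of callables so that it touches only one half of a (data, context) pair. The names data_only and context_only are hypothetical, and the sketch assumes that every value is already a (data, context) tuple; how Lena treats values without context is not covered here.

```python
import copy


def data_only(*funcs):
    """Return a callable applying funcs, in order, to the data part."""
    def call(value):
        data, context = value
        for func in funcs:
            data = func(data)
        return (data, context)
    return call


def context_only(*funcs):
    """Return a callable applying funcs, in order, to the context part."""
    def call(value):
        data, context = value
        for func in funcs:
            context = func(context)
        return (data, context)
    return call


value = (3, {"variable": {"name": "x"}})
summed = data_only(lambda val: {"sum": val})(value)
copied = context_only(copy.deepcopy)(value)
```

After this, summed holds the transformed data with the original context object, while copied holds the original data with an independent copy of the context.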
Functions
- class compose(*seq)
Function composition of seq.
All elements of seq must be callable and return a single value (be Call elements). They are applied in the order of their appearance in seq.
This is a helper class, but not a Lena sequence, since it does not support many-to-many transformations.
Added in version 0.6.
See also
core.Sequence accepts iterators and takes context into account.
variables.Compose makes composition of Lena variables.
- __call__(val)
Call self as a function.
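The order of application ("in the order of their appearance in seq") is easy to get backwards, so here is a small plain-Python sketch of left-to-right composition. compose_sketch is a hypothetical stand-in written with functools.reduce, not Lena's class.

```python
from functools import reduce


def compose_sketch(*seq):
    """Return a function applying the callables of seq from first to last."""
    def composed(val):
        return reduce(lambda acc, func: func(acc), seq, val)
    return composed


def add_one(x):
    return x + 1


def double(x):
    return x * 2


# add_one is applied first, then double: (3 + 1) * 2
add_then_double = compose_sketch(add_one, double)
```

With this ordering add_then_double(3) gives 8, whereas the mathematical convention (double after add_one written as double ∘ add_one) would list the functions in the opposite order.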
Functions to deal with data and context, and seq_map().
A value is considered a (data, context) pair, if it is a tuple of length 2, and the second element is a dictionary or its subclass.
- get_context(value)
Get context from a possible (data, context) pair.
If context is not found, return an empty dictionary.
- get_data(value)
Get data from value (a possible (data, context) pair).
If context is not found, return value.
- get_data_context(value)
Get (data, context) from value (a possible (data, context) pair).
If context is not found, (value, {}) is returned.
Since get_data() and get_context() both check whether context is present, this function may be slightly more efficient and compact than the other two.
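The pair-detection rule stated above (a tuple of length 2 whose second element is a dictionary or its subclass) can be written out directly. The functions below are an illustrative re-statement with hypothetical names, not Lena's source code.

```python
from collections import OrderedDict


def is_data_context_pair(value):
    """True for a 2-tuple whose second element is a dict (or subclass)."""
    return (
        isinstance(value, tuple)
        and len(value) == 2
        and isinstance(value[1], dict)
    )


def split_data_context(value):
    """Return (data, context); context is {} if value has none."""
    if is_data_context_pair(value):
        return value[0], value[1]
    return value, {}


with_context = split_data_context((1, {"a": 2}))
without_context = split_data_context(5)
# A 2-tuple whose second element is not a dict is treated as plain data.
plain_tuple = split_data_context((1, 2))
```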
- seq_map(seq, container, one_result=True)
Map Lena Sequence seq to the container.
For each value from the container, calculate seq.run([value]). This can be a list or a single value. If one_result is True, the result must be a single value. In this case, if results contain fewer or more than one element, LenaValueError is raised.
The list of results (lists or single values) is returned. The results are in the same order as read from the container.
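A minimal sketch of this mapping, assuming only what is stated above: map_run is a hypothetical name, run stands in for seq.run (any callable from an iterable to an iterator), and the built-in ValueError replaces LenaValueError.

```python
def map_run(run, container, one_result=True):
    """Apply run to each value separately and collect the results in order."""
    results = []
    for value in container:
        res = list(run([value]))
        if one_result:
            if len(res) != 1:
                raise ValueError(
                    "expected exactly one result, got {}".format(len(res))
                )
            res = res[0]
        results.append(res)
    return results


def double_each(flow):
    for val in flow:
        yield 2 * val


def duplicate_each(flow):
    for val in flow:
        yield val
        yield val


single = map_run(double_each, [1, 2, 3])
several = map_run(duplicate_each, [1, 2], one_result=False)
```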
Group plots
Group several plots into one.
Since data can be produced in different places,
several classes are needed to support this.
First, the plots of interest must be selected
(for example, one-dimensional histograms).
This is done by Selector.
Selected plots must be grouped.
For example, we may want to plot data x versus Monte-Carlo x,
but not data x vs data y. Data is grouped by GroupBy.
To preserve the group,
we can’t yield its members to the following elements,
but have to transform the plots inside GroupPlots.
We can also scale (normalize) all plots to one
using GroupScale.
Example from a real analysis:
Sequence(
    # ... read data and produce histograms ...
    MakeFilename(dirname="background/{{run_number}}"),
    UpdateContext("output.plot.name", "{{variable.name}}",
                  raise_on_missing=True),
    lena.flow.GroupPlots(
        group_by="variable.coordinate",
        # Select either histograms (data) or Graphs (fit),
        # but only having "variable.coordinate" in context
        select=("variable.coordinate", [histogram, Graph]),
        # scale to data
        scale=Not("fit"),
        transform=(
            ToCSV(),
            # scaled plots will be written to separate files
            MakeFilename(
                "{{output.filename}}_scaled",
                overwrite=True,
            ),
            UpdateContext("output.plot.name", "{{variable.name}}",
                          raise_on_missing=True),
            write,
            # Several prints were used during this code creation
            # Print(transform=lambda val: val[1]["plot"]["name"]),
        ),
        # make both single and combined plots of coordinates
        yield_selected=True,
    ),
    # create file names for combined plots
    MakeFilename("combined_{{variable.coordinate}}"),
    # non-combined plots will still need file names
    MakeFilename("{{variable.name}}"),
    lena.output.ToCSV(),
    write,
    lena.context.Context(),
    # here our jinja template renders a group as a list of items
    lena.output.RenderLaTeX(template_dir=TEMPLATE_DIR,
                            select_template=select_template),
    # we have a single template, no more groups are present
    write,
    lena.output.LaTeXToPDF(),
)
- class GroupBy(group_by='', merge='')
Group values.
Data is added during fill(). The groups dictionary is available as the groups attribute. groups is a mapping of keys (defined by group_by and merge) to lists of items with the same key.
group_by defines distinct hashable results for values from different groups. It is a dot-separated formatting string. Only the context part of the value is used for grouping (see context.format_context). group_by can be a tuple of strings. In that case the hash value is combined from each part of the tuple. A tuple may be used when not all parts of context can be rendered (that would lead to an error or an empty string if they were combined into one formatting string).
An empty string represents the entire context. The default arguments add all values from the flow into one group (that is merge takes priority over group_by).
Changed in version 0.6: group_by is no longer a function.
Added in version 0.6: merge allows ignoring keys.
- clear()
Deprecated since version 0.6: use the standard reset() method.
- compute()
Yield values grouped by distinct keys one by one.
Each group is a tuple of filled values having the same key.
- fill(val)
Find the corresponding group and fill it with val.
A group key is calculated via group_by and merge. If no such key exists, a new group is created.
If a formatting key was not found for val (or if no values for a tuple group_by could produce keys), LenaValueError is raised.
- reset()
Remove all groups.
- update(val)
Deprecated since version 0.6: use the standard fill() method.
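The core idea of grouping by a context key can be sketched in a few lines of plain Python. This is a strongly simplified illustration: group_by_key and get_nested are hypothetical names, only a single dot-separated key is handled (no formatting strings, tuples or merge), and KeyError plays the role that LenaValueError has in the description above.

```python
def get_nested(context, dotted_key):
    """Look up a dot-separated key such as "variable.name" in nested dicts."""
    for part in dotted_key.split("."):
        context = context[part]  # raises KeyError if a part is missing
    return context


def group_by_key(values, dotted_key):
    """Map each distinct key to the list of (data, context) values having it."""
    groups = {}
    for value in values:
        data, context = value
        key = get_nested(context, dotted_key)
        groups.setdefault(key, []).append(value)
    return groups


flow = [
    ("data x", {"variable": {"name": "x"}, "source": "data"}),
    ("mc x", {"variable": {"name": "x"}, "source": "mc"}),
    ("data y", {"variable": {"name": "y"}, "source": "data"}),
]
groups = group_by_key(flow, "variable.name")
```

In this example the two x values end up in one group and the y value in another, which matches the "data x versus Monte-Carlo x, but not data x vs data y" use case.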
- class GroupPlots(group_by, select=None, transform=(), scale=None, yield_selected=False)
Deprecated since version 0.6: use GroupBy, group_plots() and other relevant elements.
Plots to be grouped are chosen by select, which acts as a boolean function. By default everything is selected. If select is not a Selector, it is converted to that class. Use Selector for more options.
Deprecated since version 0.5: use RunIf instead of select.
Plots are grouped by group_by, which returns different keys for different groups. It can be a function of a value or a formatting string for its context (see GroupBy). Example: group_by="{{value.variable.name}}_{{variable.name}}".
transform is a sequence that processes individual plots before yielding. Example: transform=(ToCSV(), write). transform is called after scale.
Deprecated since version 0.5: use MapGroup instead of transform.
scale is a number or a string. A number means the scale, to which plots must be normalized. A string is a name of the plot to which other plots must be normalized. If scale is not an instance of GroupScale, it is converted to that class. If a plot could not be rescaled, LenaValueError is raised. For more options, use GroupScale.
yield_selected defines whether selected items should be yielded during run(). By default it is False: if we used a variable in a combined plot, we don’t create a separate plot of that.
- run(flow)
Run the flow and yield final groups.
Each item of the flow is checked with the selector. If it is selected, it is added to groups. Otherwise, it is yielded.
After the flow is finished, groups are yielded. Groups are lists of items, which have same keys returned from group_by. Each group’s context (including empty one) is inserted into a list in context.group. If any element’s context.output.changed is True, the final context.output.changed is set to True (and to False otherwise). The resulting context is updated with the intersection of groups’ contexts.
If scale was set, plots are normalized to the given value or plot. If that plot was not selected (is missing in the captured group) or its norm could not be calculated, LenaValueError is raised.
- group_plots(group)
Return data parts of the group and set context["group"] to their intersection.
If any of values has been changed, context.output.changed of the group is set to True.
- class GroupScale(scale_to, allow_zero_scale=False, allow_unknown_scale=False)
Scale a group of data.
scale_to defines the method of scaling. If a number is given, group items are scaled to that. Otherwise it is converted to a Selector, which must return a unique item from the group. Group items will be scaled to the scale of that item.
By default, attempts to rescale a structure with unknown or zero scale raise an error. If allow_zero_scale and allow_unknown_scale are set to True, the corresponding errors are ignored and the structure remains unscaled.
Hint
To scale only one value, use lena.structures.ScaleTo.
- __call__(group)
Scale the group. See scale_to() for details.
If group is not iterable, LenaValueError is raised.
- class MapGroup(*seq, **map_scalars)
Apply a sequence to groups.
Arguments seq must form a Sequence.
Set a keyword argument map_scalars to False to ignore scalar values (those that are not groups). Other keyword arguments raise LenaTypeError.
Added in version 0.5.
- run(flow)
Map seq to every group from flow.
A value represents a group if its context has a key group and its data part is iterable (for example, a list of values). If length of data is different from the length of context.group, LenaRuntimeError is raised.
seq must produce an equal number of results for each item of group, or LenaRuntimeError is raised. These results are yielded in groups one by one.
Common changes of group context update common context (that of the value). context.output.changed is set appropriately.
- class Selector(selector, raise_on_error=True)
A boolean function on values.
The usage of selector depends on its type.
If selector is a class, __call__() checks that data part of the value is subclassed from that.
A callable is used as it is.
A string means that value’s context must conform to that (as in context.contains).
selector can be a container. In this case its items are converted to selectors. If selector is a list, the result is a boolean or applied to the results of each item. If it is a tuple, a boolean and is applied to the results.
raise_on_error is a boolean that sets whether in case of an exception the selector raises that exception or returns False. If selector is a container, raise_on_error will be used recursively during the initialization of its items.
- __call__(value)
Check whether value is selected.
If an exception occurs and raise_on_error is False, the result is False. This could be used while testing potentially non-existing attributes or arbitrary contexts. However, this is not recommended, since it covers too many errors and some of them should be raised explicitly.
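The dispatch on the type of selector (class, callable, list as "or", tuple as "and") can be sketched as a small recursive function. make_selector and data_part are hypothetical names; string selectors and raise_on_error are deliberately left out, since their exact semantics depend on context.contains and are not reproduced here.

```python
def data_part(value):
    """Return the data of a (data, context) pair, or value itself."""
    if isinstance(value, tuple) and len(value) == 2 and isinstance(value[1], dict):
        return value[0]
    return value


def make_selector(selector):
    """Convert selector into a boolean function on values."""
    # Classes are callable too, so they must be checked first.
    if isinstance(selector, type):
        return lambda value: isinstance(data_part(value), selector)
    if callable(selector):
        return selector
    if isinstance(selector, list):
        subselectors = [make_selector(sel) for sel in selector]
        return lambda value: any(sub(value) for sub in subselectors)
    if isinstance(selector, tuple):
        subselectors = [make_selector(sel) for sel in selector]
        return lambda value: all(sub(value) for sub in subselectors)
    raise TypeError("unsupported selector: {!r}".format(selector))


def is_long(value):
    return len(value) > 2


# Either an int, or a str that is longer than two characters.
select = make_selector([int, (str, is_long)])
```

Both any() and all() stop at the first decisive item, so in this sketch is_long is never called for values that are not strings.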
- class And(selectors, raise_on_error=True)
Bases: Selector
And-test of multiple selectors.
selectors is a tuple of items, each of which is a Selector or will be converted to that.
raise_on_error has the same meaning as in Selector, and will be applied to each newly initialized subselector.
- __call__(val)
Check whether value is selected.
If an exception occurs and raise_on_error is False, the result is False. This could be used while testing potentially non-existing attributes or arbitrary contexts. However, this is not recommended, since it covers too many errors and some of them should be raised explicitly.
- class Or(selectors, raise_on_error=True)
Bases: Selector
Or-test of multiple selectors.
selectors is a list of items, each of which is a Selector or will be converted to that. Evaluation is short-circuit, that is if a selector was true, further ones are not applied.
raise_on_error has the same meaning as in Selector, and will be applied to each newly initialized subselector.
- __call__(val)
Check whether value is selected.
If an exception occurs and raise_on_error is False, the result is False. This could be used while testing potentially non-existing attributes or arbitrary contexts. However, this is not recommended, since it covers too many errors and some of them should be raised explicitly.
- class Not(selector, raise_on_error=True)
Bases: Selector
Negate a selector.
selector is converted to Selector.
raise_on_error has the same meaning as in Selector.
- __call__(value)
Negate the result of the selector.
If raise_on_error is False, then this is a full negation (including the case of an error encountered in the selector). If raise_on_error is True, then any exception that occurs will be re-raised here.
Iterators
Iterators allow one to transform a data flow or create a new one.
- class Chain(*iterables)
Deprecated since version 0.6: use itertools.chain and Iter.
Chain generators.
Chain can be used as a Source to generate data.
Example:
>>> c = Chain([1, 2, 3], ['a', 'b'])
>>> list(c())
[1, 2, 3, 'a', 'b']
iterables will be chained during __call__(), that is after the first one is exhausted, the second is called, etc.
- __call__()
Generate values from chained iterables.
- class CountFrom(start=0, step=1)
Generate numbers from start to infinity, with step between values.
Similar to itertools.count().
- __call__()
Yield values from start to infinity with step.
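Since the sequence is infinite, it has to be cut somewhere downstream. The standard-library analogue mentioned above shows the pattern; this sketch uses only itertools, not Lena's classes.

```python
import itertools

# An infinite counter starting at 10 with step 5 ...
counter = itertools.count(10, 5)
# ... must be limited explicitly, here to its first four values.
first_four = list(itertools.islice(counter, 4))
```

In a Lena sequence the same role of the limiting step would be played by an element that stops the flow, such as Slice.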
- ISlice(*args, **kwargs)
Deprecated since version 0.4: use Slice.
- class Iter(iterable)
Create a Source from an iterable.
Since core.Source uses a special iteration with __call__(), builtin iter(iterable) would not work to make a Source element. A separate element also allows comparison of iterators due to saving of the initial arguments. Example:
src = Source(
    # generate file names from the given list
    Iter(["file1.csv", "file2.csv"]),
    # read data from each file...
)
Added in version 0.6.
See also
core.SourceEl is a more general and complicated alternative.
- __call__()
Yield values from iterable.
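The two points made above (iteration is started through __call__(), and stored arguments make elements comparable) can be illustrated by a small class. IterSketch is a hypothetical stand-in; the equality rule shown here is an assumption for illustration and may differ from how Lena compares elements.

```python
class IterSketch(object):
    """Wrap an iterable so that calling the object starts the iteration."""

    def __init__(self, iterable):
        # The initial argument is kept, which allows comparison later.
        self._iterable = iterable

    def __call__(self):
        for value in self._iterable:
            yield value

    def __eq__(self, other):
        if not isinstance(other, IterSketch):
            return NotImplemented
        return self._iterable == other._iterable


files = IterSketch(["file1.csv", "file2.csv"])
names = list(files())
```

A bare iter(...) object offers neither property: it has no __call__ method, and two iterators over equal lists do not compare equal.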
- class Reverse
Reverse the flow (yield values from last to first).
Warning
This element will consume the entire flow.
- run(flow)
Consume the flow and yield values in reverse order.
- class Slice(*args)
Slice data flow from start to stop with step.
Initialization:
Slice(stop)
Slice(start, stop [, step])
Similar to itertools.islice() or range(). Negative indices for start and stop are supported during run().
Examples:
>>> Slice(1000)
analyse only the first one thousand events (no other values from flow are generated). Use it for quick checks of data on small subsamples.
>>> Slice(-1)
yields all elements from the flow except the last one.
>>> Slice(1, -1)
yields all elements from the flow except the first and the last one.
Note that in case of negative indices it is necessary to store abs(start) or abs(stop) values in memory. For example, to discard the last 200 elements one has to a) read the whole flow, b) store 200 elements during each iteration.
It is not possible to use negative indices with fill_into(), because it doesn’t control the flow and doesn’t know when it is finished. To obtain a negative step, use a composition with Reverse.
- fill_into(element, value)
Fill element with value.
Values are filled in the order defined by (start, stop, step). Element must have a fill(value) method.
When the filling should stop, LenaStopFill is raised (Split handles this normally). Sometimes for step more than one LenaStopFill will be raised before reaching stop elements. Early exceptions are an optimization and don’t affect the correctness of this method.
- run(flow)
Yield values from flow from start to stop with step.
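The memory note about negative indices can be made concrete with a sketch of the Slice(-n) case. drop_last is a hypothetical helper built on collections.deque, shown only to illustrate why n values must be buffered; it is not Lena's implementation.

```python
from collections import deque


def drop_last(flow, n):
    """Yield all values of flow except the last n, buffering n values."""
    buf = deque()
    for value in flow:
        buf.append(value)
        if len(buf) > n:
            # The oldest buffered value cannot be among the last n any more.
            yield buf.popleft()
    # Whatever remains in buf are the last n values: they are discarded.


all_but_last = list(drop_last(range(5), 1))
```

Values are released with a delay of n items, so the whole flow has to be read before the element knows which values to drop, while at most n + 1 values are held in memory at any moment.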