Sendtools

Tools for composing consumers for iterators. A companion to itertools.

Sendtools Ranking & Summary

  • License: MPL
  • Price: FREE
  • Publisher Name: Frank DiLecce
  • Publisher web site: http://www.supportware.net/mozilla

Sendtools Description

Sendtools is a collection of classes for efficiently consuming iterators into one or more data structures. Sendtools complements the itertools module and the other excellent facilities Python offers for iteration.

Sendtools is useful when:

* Your source iterator is too big to fit in memory
* Your data source is I/O bound, so you don't want to make more than one pass
* You want to collect data into two or more lists (or other collections)
* You want to group, filter, transform or otherwise aggregate the data

Such situations occur when you're analysing query-sets from large databases or datafiles (HDF5 files, for example).

Sendtools is written using Cython to produce a 100% compiled module, for maximum performance.

Requirements

There are no dependencies outside of Python to compile and install sendtools (although you will obviously need a compiler). If you want to hack on the Cython code, you'll need Cython 0.12.1 or later.

Installation

Sendtools is installed from source using distutils in the usual way. To install it site-wide, run:

    python setup.py install

If you have Cython installed, you can also import the sendtools.pyx file directly using the pyximport module (part of Cython). This is handy for development, and is how the unittest script loads the module.

Usage

Sendtools is built on the concept of "Consumer" objects. These were inspired by Python's generators (an early version of sendtools was implemented in Python using generators). Consumer objects can have data "sent" into them. Unlike generators, Consumers do not produce data iteratively (they have no 'next' method), but they do produce a result which can be accessed at any time using the .result() method.

Data is typically sent into a Consumer using the sendtools.send() function, which takes the form:

    output = send(source, target)

where source is an iterator producing data. target is a Consumer object into which the data is sent.
output is the Consumer's result, returned after the source has been fully consumed or the Consumer indicates it is complete (by raising StopIteration), whichever happens first. Basically, the send function is a shortcut for writing a for-loop.

The target may be a list or set, representing the data structure you want to collect the data into. These are implicitly converted to Consumer objects by the send function. The input list (or set) is returned by the send function after it has been filled with data.

The target can also be a (multiply nested) tuple of consumers. In this case the result will be a tuple which matches the structure of the target tuple, containing the result of each consumer. In this way, data from a source iterator can be collected into multiple lists in a single iteration pass.

Sendtools defines many aggregation consumers. These do not produce a list or other collection as their result, but a scalar value.

Examples

Let's start with basic usage of the send() function:

    >>> from sendtools import send
    >>> data = range(10)
    >>> out=[]
    >>> result = send(data, out)
    >>> result
    >>> out
    >>> result is out
    True

The source 'data' is copied into the target 'out', and 'out' is returned.

Now let's see how to send data into multiple targets:

    >>> a, (b,c) = send(data, ([], ([],[])))
    >>> a is b; b is c; a is c
    False
    False
    False
    >>> a == b; b == c; a == c
    True
    True
    True

The data is copied into three different lists.

Data can be collected into sets as well as lists:

    >>> data =
    >>> send(data, set())
    set()

In fact, any MutableSequence or MutableSet (the Abstract Base Classes) will do.
Sadly, the std-lib array.array object is not registered as a MutableSequence out of the box, but we can do this ourselves:

    >>> from array import array
    >>> from collections import MutableSequence
    >>> MutableSequence.register(array)
    >>> data =
    >>> target = array("f") #an empty array
    >>> send(data, target)
    array('f', )

Aggregation

Now let's see some aggregation:

    >>> send(data, ([], (Max(), Min(), Sum(), Ave())))
    (, (9, 0, 45, 4.5))

All the aggregation functions found in SQL are available: Sum, Max, Min, Ave, First, Last, Count.

There are a few more besides these:

* Select - Picks the n'th item in a sequence
* Stats - Computes an incremental standard deviation, mean and count of its input

Stats only works with numerical input and returns a length-3 tuple as its result.

Transformations and Filtering

Data can be filtered using Filter:

    >>> data =
    >>> send(data, Filter(lambda x:x%2==0, []))

Data can be transformed using Map:

    >>> send(data, ([], Map(lambda x:x**2, [])))
    (, )

One important use-case is splitting a sequence of tuples or other compound objects into multiple lists. Although this can be done with Map, it is such a common operation that there is a dedicated Get object for the purpose, e.g.:

    >>> tups =
    >>> print tups
    >>> a,b = send(tups, (Get(0,[]), Get(1,[])))
    >>> a
    >>> b

This works for any object that supports indexing. For example, columns from a database query can be collected into separate lists using this method. Object attributes can be retrieved in a similar manner using the Attr object.

Grouping and Switching

Data can be grouped in a variety of ways. The grouping objects take a factory function as a keyword argument. This is called to create each group. By default, a list group is created, but more complex group-types are possible: aggregates, tuples of targets or even other grouping objects.
Any valid target object can be used.

Here's an example of simple grouping by number into sublists:

    >>> data
    >>> send(data, GroupByN(3,[]))
    , , , , , ]

Now let's use a more complex group factory to get the mean of each group, as well as the group list:

    >>> send(data, GroupByN(3, [], factory=lambda :([],Ave())))
    , 1.0), (, 4.0), (, 7.0), (, 10.0),(, 13.0), (, 16.0)]

Groups can also be created using a key-function, with the GroupByKey object:

    >>> data =
    >>> send(data, GroupByKey(lambda x:x==5, []))
    , , , , , , ]

Note that a new group is created whenever the key-function returns a different result from the one it returned for the previous item, regardless of whether that result has been used to create previous groups.

Switching is a very close relative of grouping. The Switch object passes its input to a key-function which must return an int. The input is passed to one of N outputs according to this int, i.e.:

    >>> data =
    >>> send(data, Switch(lambda x:int(x
    >>> data =
    >>> func = lambda item: "low" if item
    >>> send(data, SwitchByKey(func, init={"low":}))
    {'high': , 'low': }
    >>> send(data, SwitchByKey(func, factory=Sum))
    {'high': 47, 'low': 35}

The init keyword specifies a dictionary of groups with which to initialise the object (an empty dict by default). When a new key is encountered (one that does not already exist in the dict), the factory function is called to create a new group for this key.

Requirements: Python
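
To make the Consumer idea from the Usage section concrete, here is a minimal pure-Python sketch of how a send()-style function could fan one pass over a source out into nested targets. It is not the sendtools implementation (which is compiled Cython): the helper names `as_consumer`, `_Collect` and `_Fanout` are hypothetical, and the `Sum` class is a stand-in written for this sketch, not the library's own.

```python
from collections.abc import MutableSequence, MutableSet


class Sum:
    """Stand-in aggregate consumer (illustrative only): keeps a running total."""

    def __init__(self):
        self.total = 0

    def send(self, item):
        self.total += item

    def result(self):
        return self.total


class _Collect:
    """Hypothetical wrapper that turns a list-like or set-like object into a consumer."""

    def __init__(self, container):
        self.container = container
        # Sequences grow with append(), sets with add().
        if isinstance(container, MutableSequence):
            self._add = container.append
        else:
            self._add = container.add

    def send(self, item):
        self._add(item)

    def result(self):
        # The original container is returned, not a copy.
        return self.container


class _Fanout:
    """Hypothetical wrapper that forwards each item to every consumer in a tuple."""

    def __init__(self, targets):
        self.children = [as_consumer(t) for t in targets]

    def send(self, item):
        for child in self.children:
            child.send(item)

    def result(self):
        # The result mirrors the (possibly nested) tuple structure of the target.
        return tuple(child.result() for child in self.children)


def as_consumer(target):
    """Convert tuples and mutable containers to consumers; pass anything else through."""
    if isinstance(target, tuple):
        return _Fanout(target)
    if isinstance(target, (MutableSequence, MutableSet)):
        return _Collect(target)
    return target  # assumed to already provide send() and result()


def send(source, target):
    """Push every item of source into target in a single pass and return its result."""
    consumer = as_consumer(target)
    for item in source:
        try:
            consumer.send(item)
        except StopIteration:
            # Simplification: any consumer signalling completion ends the whole pass.
            break
    return consumer.result()


out = []
same = send(range(10), out)
items, (unique, total) = send(range(5), ([], (set(), Sum())))
```

In this sketch `same` is the very list that was passed in, and the second call fills a list, a set and a running total from one iteration over the source, which is the behaviour the description attributes to nested tuple targets.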

