Candidate for a new itertool

George Sakkis george.sakkis at gmail.com
Thu Mar 19 00:02:06 EDT 2009


On Mar 7, 8:47 pm, Raymond Hettinger <pyt... at rcn.com> wrote:

> The existing groupby() itertool works great when every element in a
> group has the same key, but it is not so handy when groups are
> determined by boundary conditions.
>
> For edge-triggered events, we need to convert a boundary-event
> predicate to groupby-style key function.  The code below encapsulates
> that process in a new itertool called split_on().
>
> Would love you guys to experiment with it for a bit and confirm that
> you find it useful.  Suggestions are welcome.

That's pretty close to a recipe [1] I posted some time ago (which you
improved in a comment :)). As they stand, neither is a generalization
of the other: your version allows either start or stop events (but not
both), while mine requires the start event but takes an optional stop
event (plus an option to control whether separators are returned). If
we were to generalize them, the resulting signature would be something
like:

def split_on(iterable, **kwds):
    '''Split iterable on event boundaries.

    @keyword start,stop: The start and stop boundaries. Both are optional
                         (but at least one of them must be given).
    @keyword yield_bounds: If True, yield also the boundary events.
    '''
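
A minimal sketch of how such a generalized function might behave. The
semantics here are assumptions, not taken from either recipe: start and
stop are taken to be predicates on a single item, a start event opens a
new group, a stop event closes the current one, empty groups are
skipped, and yield_bounds defaults to True. It also uses Python 3
keyword-only arguments in place of **kwds.

```python
def split_on(iterable, *, start=None, stop=None, yield_bounds=True):
    """Split iterable into lists on start/stop events (illustrative sketch).

    Assumed semantics: start(item) true opens a new group at item,
    stop(item) true closes the current group at item. Boundary items are
    kept only when yield_bounds is true. Empty groups are never yielded.
    """
    if start is None and stop is None:
        # Raised on first iteration, since this is a generator function.
        raise TypeError("at least one of start or stop must be given")
    group = []
    for item in iterable:
        if start is not None and start(item):
            # Close whatever was collected so far, then open a new group.
            if group:
                yield group
            group = [item] if yield_bounds else []
        elif stop is not None and stop(item):
            # The stop event, if kept, is the last item of its group.
            if yield_bounds:
                group.append(item)
            if group:
                yield group
            group = []
        else:
            group.append(item)
    if group:
        yield group


is_zero = lambda x: x == 0
print(list(split_on([1, 2, 0, 3, 4, 0, 5], start=is_zero)))
# [[1, 2], [0, 3, 4], [0, 5]]
print(list(split_on([1, 2, 0, 3, 4, 0, 5], stop=is_zero)))
# [[1, 2, 0], [3, 4, 0], [5]]
```

Other choices are just as defensible, for example yielding empty groups
between adjacent separators the way str.split() does.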


On a related note, I recently needed a grouper that I couldn't build
with either groupby() or the split_on() above. The reason is that it
needs to look at two consecutive items, instead of one, to decide
whether to make a split or not. An example would be to partition an
iterable of numbers (or any orderable objects for that matter) into
non-decreasing or strictly increasing groups:

>>> from operator import gt, ge
>>> list(groupby_needsbettername([3,4,4,2,2,5,1], gt))
[[3, 4, 4], [2, 2, 5], [1]]
>>> list(groupby_needsbettername([3,4,4,2,2,5,1], ge))
[[3, 4], [4], [2], [2, 5], [1]]

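def groupby_needsbettername(iterable, is_boundary):
    it = iter(iterable)
    try:
        cur = next(it)  # builtin next() works on Python 2.6+ and 3.x
    except StopIteration:
        return  # empty input: no groups
    group = [cur]
    for nxt in it:
        # Split when the consecutive pair (cur, nxt) marks a boundary.
        if is_boundary(cur, nxt):
            yield group
            group = []
        group.append(nxt)
        cur = nxt
    yield group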
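
The pairwise predicate is not limited to ordering comparisons. As an
illustration, the sketch below splits sorted timestamps into "sessions"
wherever the gap between neighbours exceeds a threshold. The name
split_between, the sample data and the 5-second threshold are all
hypothetical; the function is a compact restatement of the grouper
above that compares each item against the last item of the current
group.

```python
def split_between(iterable, is_boundary):
    """Yield lists, starting a new one whenever is_boundary(prev, cur) is true."""
    group = []
    for item in iterable:
        # group[-1] is always the previously seen item when group is non-empty.
        if group and is_boundary(group[-1], item):
            yield group
            group = []
        group.append(item)
    if group:
        yield group


# Hypothetical data: event times in seconds, sorted ascending.
timestamps = [0, 1, 2, 10, 11, 30, 31, 32]

# A gap of more than 5 seconds between neighbours starts a new session.
sessions = list(split_between(timestamps, lambda prev, cur: cur - prev > 5))
print(sessions)  # [[0, 1, 2], [10, 11], [30, 31, 32]]
```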


George

[1] http://code.activestate.com/recipes/521877/


