Candidate for a new itertool

Sat Mar 7 19:47:21 EST 2009

The existing groupby() itertool works great when every element in a
group has the same key, but it is not so handy when groups are
determined by boundary conditions.
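
To make the distinction concrete, here is a small sketch (the sample data
is made up for illustration).  Grouping by a per-element key works
directly, but passing a boundary predicate as the key isolates the
markers instead of attaching them to the runs they delimit:

```python
from itertools import groupby

# Every element in a group shares the same key: groupby() handles this directly.
words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']
by_letter = [list(g) for k, g in groupby(words, key=lambda w: w[0])]
# by_letter == [['apple', 'avocado'], ['banana', 'blueberry'], ['cherry']]

# Groups delimited by a boundary marker: using the predicate itself as the
# key alternates between marker groups and data groups, so no group starts
# (or ends) with its marker.
by_marker = [''.join(g) for k, g in groupby('X1X23X456X', key='X'.__eq__)]
# by_marker == ['X', '1', 'X', '23', 'X', '456', 'X']
```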

For edge-triggered events, we need to convert a boundary-event
predicate to a groupby-style key function.  The code below encapsulates
that process in a new itertool called split_on().
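
One way to picture the conversion is as a running count of events: each
element gets the number of boundary events seen so far as its key, so
the key changes exactly at a boundary.  The helper below, event_keys(),
is a hypothetical name used only for this illustration; it lists the
keys that such a key function would assign:

```python
def event_keys(iterable, event, start=True):
    """Return the groupby-style key for each element (illustrative helper)."""
    keys = []
    count = 0
    for x in iterable:
        before = count
        if event(x):
            count += 1
        # Start events: the boundary element takes the new count and so
        # opens a group.  Stop events: it keeps the old count and so
        # closes the group it ends.
        keys.append(count if start else before)
    return keys

start_keys = event_keys('X1X23X456X', 'X'.__eq__, start=True)
# start_keys == [1, 1, 2, 2, 2, 3, 3, 3, 3, 4]
stop_keys = event_keys('X1X23X456X', 'X'.__eq__, start=False)
# stop_keys == [0, 1, 1, 2, 2, 2, 3, 3, 3, 3]
```

Feeding consecutive elements with equal keys to groupby() then yields
one group per boundary-delimited run.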

Would love you guys to experiment with it for a bit and confirm that
you find it useful.  Suggestions are welcome.

Raymond

-----------------------------------------

from itertools import groupby

def split_on(iterable, event, start=True):
    'Split iterable on event boundaries (either start events or stop events).'
    # split_on('X1X23X456X', 'X'.__eq__, True)  --> X1 X23 X456 X
    # split_on('X1X23X456X', 'X'.__eq__, False) --> X 1X 23X 456X
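    # cnt is a one-element list so the count persists between calls to the
    # key function; a fresh list is created on every split_on() call.
    def transition_counter(x, start=start, cnt=[0]):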
        before = cnt[0]
        if event(x):
            cnt[0] += 1
        after = cnt[0]
        return after if start else before
    return (g for k, g in groupby(iterable, transition_counter))

if __name__ == '__main__':
    for start in True, False:
        for g in split_on('X1X23X456X', 'X'.__eq__, start):
            print(list(g))
        print('')

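    from itertools import islice
    from pprint import pprint
    boundary = '--===============2615450625767277916==\n'
    with open('email.txt') as email:
        for mime_section in split_on(email, boundary.__eq__):
            # Skip the first line of each section (the boundary line for
            # every section after the first).
            pprint(list(islice(mime_section, 1, None)))
            print('= = ' * 30)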