Candidate for a new itertool
pruebauno at latinmail.com
Thu Mar 12 13:36:30 EDT 2009
On Mar 7, 8:47 pm, Raymond Hettinger <pyt... at rcn.com> wrote:
> The existing groupby() itertool works great when every element in a
> group has the same key, but it is not so handy when groups are
> determined by boundary conditions.
>
> For edge-triggered events, we need to convert a boundary-event
> predicate to groupby-style key function. The code below encapsulates
> that process in a new itertool called split_on().
>
> Would love you guys to experiment with it for a bit and confirm that
> you find it useful. Suggestions are welcome.
>
> Raymond
>
> -----------------------------------------
>
> from itertools import groupby
>
> def split_on(iterable, event, start=True):
>     'Split iterable on event boundaries (either start events or stop events).'
>     # split_on('X1X23X456X', 'X'.__eq__, True)  --> X1 X23 X456 X
>     # split_on('X1X23X456X', 'X'.__eq__, False) --> X 1X 23X 456X
>     def transition_counter(x, start=start, cnt=[0]):
>         before = cnt[0]
>         if event(x):
>             cnt[0] += 1
>         after = cnt[0]
>         return after if start else before
>     return (g for k, g in groupby(iterable, transition_counter))
>
> if __name__ == '__main__':
>     for start in True, False:
>         for g in split_on('X1X23X456X', 'X'.__eq__, start):
>             print list(g)
>         print
>
>     from itertools import islice
>     from pprint import pprint
>     boundary = '--===============2615450625767277916==\n'
>     email = open('email.txt')
>     for mime_section in split_on(email, boundary.__eq__):
>         # skip the boundary line that opens each section
>         pprint(list(islice(mime_section, 1, None)))
>         print '= = ' * 30
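For anyone trying this under Python 3, here is a sketch of the same idea that avoids the mutable-default counter: each element is tagged with its running transition count, groupby() runs on the tags, and the tags are stripped again. The helper name _tag_with_count is my own invention, not part of itertools; like the quoted version, each group is a lazy iterator that must be consumed before moving to the next one.

```python
from itertools import groupby
from operator import itemgetter


def _tag_with_count(iterable, event, start):
    # Hypothetical helper: pair each element with a group number that
    # increases by one every time event(x) is true.
    count = 0
    for x in iterable:
        before = count
        if event(x):
            count += 1
        # start=True: an event element opens a new group (new count).
        # start=False: an event element closes the current group (old count).
        yield (count if start else before), x


def split_on(iterable, event, start=True):
    'Split iterable on event boundaries (either start events or stop events).'
    tagged = _tag_with_count(iterable, event, start)
    for _, group in groupby(tagged, key=itemgetter(0)):
        # Strip the tags so each group yields only the original elements.
        yield map(itemgetter(1), group)


if __name__ == '__main__':
    for start in (True, False):
        print([''.join(g) for g in split_on('X1X23X456X', 'X'.__eq__, start)])
```

This should reproduce the two results listed in the comments of the quoted code: X1 X23 X456 X for start=True and X 1X 23X 456X for start=False.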
For me your examples don't justify why you would need such a general
algorithm. A split function that works on iterables instead of just
strings seems straightforward, so maybe we should have that, plus
another function, with examples of problems where a plain split
does not work.
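One kind of problem where a plain split on a constant is not enough: the boundary can only be recognized by a predicate (every boundary value is different) and it has to stay attached to its group, as with the "From " line that starts each message in an mbox file. A minimal list-building sketch; the name split_before and the sample lines are made up for illustration.

```python
def split_before(iterable, is_header):
    # Start a new group at every element for which is_header(x) is true.
    # The header stays as the first item of its group, and any elements
    # before the first header form a group of their own.
    group = []
    for x in iterable:
        if is_header(x) and group:
            yield group
            group = []
        group.append(x)
    if group:
        yield group


if __name__ == '__main__':
    # Made-up mbox-style lines: every header line is different,
    # so there is no single constant to split on.
    lines = [
        'From alice@example.com Thu Mar 12 13:36:30 2009',
        'Subject: hello',
        'From bob@example.com Thu Mar 12 14:02:11 2009',
        'Subject: re: hello',
        'body text',
    ]
    for message in split_before(lines, lambda line: line.startswith('From ')):
        print(message)
```

This behaves like split_on(..., start=True) from the quoted code, but builds lists instead of using groupby().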
Something like this should work for the two examples you gave, where
the boundaries are known constants (and therefore there is really no
need to keep them; I can always add them back later):
def split_on(iterable, boundary):
    l = []
    for el in iterable:
        if el != boundary:
            l.append(el)
        else:
            yield l
            l = []
    yield l
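
def join_on(iterable, boundary):
    it = iter(iterable)
    # first group: emit its items with no boundary in front of it
    for x in it.next():
        yield x
    # every later group is preceded by the boundary
    for el in it:
        yield boundary
        for x in el:
            yield x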

if __name__ == '__main__':
    lst = []
    for g in split_on('X1X23X456X', 'X'):
        print list(g)
        lst.append(g)
    print
    print list(join_on(lst, 'X'))
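A Python 3 sketch of the same pair, useful for checking that split_on() on a string gives the same pieces as str.split() and that join_on() undoes it. Here join_on() writes the boundary only between groups, so the first group is emitted without a leading boundary; using enumerate() for that is just one possible choice.

```python
def split_on(iterable, boundary):
    # Collect elements until a boundary is seen, then emit the collected
    # list; the boundary itself is dropped, as with str.split().
    chunk = []
    for el in iterable:
        if el != boundary:
            chunk.append(el)
        else:
            yield chunk
            chunk = []
    yield chunk


def join_on(iterable, boundary):
    # Inverse of split_on(): emit each chunk's items, with the boundary
    # between consecutive chunks (never before the first one).
    for i, chunk in enumerate(iterable):
        if i:
            yield boundary
        yield from chunk


if __name__ == '__main__':
    parts = list(split_on('X1X23X456X', 'X'))
    print(parts)
    print(''.join(join_on(parts, 'X')))
```

Because empty groups are kept at both ends, the round trip is lossless, which is why there is no need to store the boundaries themselves.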
More information about the Python-list mailing list