Would like some thoughts on a grouped iterator.

Peter Otten __peter__ at web.de
Mon Sep 5 06:46:57 EDT 2016


Jussi Piitulainen wrote:

> Antoon Pardon writes:
> 
>> I need an iterator that takes an already existing iterator and
>> divides it into subiterators of items belonging together.
>>
>> For instance, take the following class, which would check whether
>> the argument is greater than or equal to the previous argument.
>>
>> class upchecker:
>>     def __init__(self):
>>         self.prev = None
>>     def __call__(self, arg):
>>         if self.prev is None:
>>             self.prev = arg
>>             return True
>>         elif arg >= self.prev:
>>             self.prev = arg
>>             return True
>>         else:
>>             self.prev = arg
>>             return False
>>
>> So the iterator I need --- I call it grouped --- in combination with
>> the above class would be used something like:
>>
>> for itr in grouped([8, 10, 13, 11, 2, 17, 5, 12, 7, 14, 4, 6, 15, 16, 19,
>> 9, 0, 1, 3, 18], upchecker()):
>>     print(list(itr))
>>
>> and the result would be:
>>
>> [8, 10, 13]
>> [11]
>> [2, 17]
>> [5, 12]
>> [7, 14]
>> [4, 6, 15, 16, 19]
>> [9]
>> [0, 1, 3, 18]
>>
>> Anyone an idea how I best tackle this?
> 
> Perhaps something like this when building from scratch (not wrapping
> itertools.groupby). The inner grouper needs to communicate to the outer
> grouper whether it ran out of this group but it obtained a next item, or
> it ran out of items altogether.
> 
> Your design allows inclusion conditions that depend on more than just
> the previous item in the group. This doesn't. I think itertools.groupby
> may raise an error if the caller didn't consume a group before stepping
> to a new group. This doesn't. I'm not sure that itertools.groupby does
> either, and I'm too lazy to check.
> 
> def gps(source, belong):
>     def gp():
>         nonlocal prev, more
>         keep = True
>         while keep:
>             yield prev
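>             try:
>                 this = next(source)
>             except StopIteration:
>                 more = False
>                 # return, not raise: under PEP 479 (Python 3.7+) a
>                 # StopIteration escaping a generator becomes RuntimeError
>                 return
>             prev, keep = this, belong(prev, this)
>     source = iter(source)
>     try:
>         prev = next(source)
>     except StopIteration:
>         return  # empty source: no groups at all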
>     more = True
>     while more:
>         yield gp()
> 
> from operator import eq, lt, gt
> for data in ([], [3], [3,1], [3,1,4], [3,1,4,1,5,9,2,6]):
>     for tag, op in (('=', eq), ('<', lt), ('>', gt)):
>         print(tag, data, '=>', [list(g) for g in gps(data, op)])
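
On the open question above about itertools.groupby: as far as I know it
does not raise when a group is left unconsumed. The groups share the
underlying iterator, so once the groupby object is advanced the previous
group is simply no longer visible. A minimal check, with sample data made
up for illustration:

```python
from itertools import groupby

# Illustrative data, not taken from the thread.
groups = groupby([1, 1, 2, 2, 3])

key1, group1 = next(groups)
key2, group2 = next(groups)   # step to the next group without consuming group1

stale = list(group1)          # no exception; the skipped group is just exhausted
fresh = list(group2)          # the current group still works normally
print(key1, stale)
print(key2, fresh)
```

So skipping a group is silent rather than an error, and the skipped items
are lost to the caller.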


As usual I couldn't stop and came up with something very similar:

def grouped(items, check):
    items = iter(items)
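    try:
        buf = next(items)
    except StopIteration:
        return  # empty input: no groups (a bare next() here is a RuntimeError under PEP 479)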
    more = True

    def group():
        nonlocal buf, more
        for item in items:
            yield buf
            prev = buf
            buf = item
            if not check(prev, item):
                break
        else:
            yield buf
            more = False

    while more:
        g = group()
        yield g
        for _ in g: pass

if __name__ == "__main__":
    def upchecker(a, b):
        # non-strict, to match the "greater or equal" check in the question
        return a <= b

    items = [
        8, 10, 13, 11, 2, 17, 5, 12, 7, 14, 4, 6, 15, 16, 19, 9, 0, 1, 3, 18
    ]
    for itr in grouped(items, upchecker):
        print(list(itr))

The one thing I think you should adopt from this is that the current group 
is consumed before yielding the next.
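
To see why that matters, here is a small sketch that repeats grouped() from
above and deliberately leaves every other group untouched. The guard for
empty input, the shortened data list and the lambda are my assumptions for
illustration, not part of the original code.

```python
def grouped(items, check):
    items = iter(items)
    try:
        buf = next(items)
    except StopIteration:
        return  # assumption: empty input produces no groups
    more = True

    def group():
        nonlocal buf, more
        for item in items:
            yield buf
            prev = buf
            buf = item
            if not check(prev, item):
                break
        else:
            yield buf
            more = False

    while more:
        g = group()
        yield g
        for _ in g:  # drain whatever the caller left unconsumed
            pass


data = [8, 10, 13, 11, 2, 17, 5, 12]
kept = []
for index, g in enumerate(grouped(data, lambda a, b: a <= b)):
    if index % 2 == 0:
        kept.append(list(g))  # consume groups 0, 2, ...
    # odd-numbered groups are never touched by the caller
print(kept)
```

Because the outer loop drains each group before building the next one, the
skipped groups still advance the source and the remaining groups come out
intact. Without that draining step, an untouched group would never pull
anything from the source, and the outer loop would keep handing out new
groups that all start at the same item.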




