Good use for itertools.dropwhile and itertools.takewhile

Ian Kelly ian.g.kelly at gmail.com
Wed Dec 5 11:16:09 EST 2012


On Wed, Dec 5, 2012 at 6:45 AM, Chris Angelico <rosuav at gmail.com> wrote:
> On Wed, Dec 5, 2012 at 12:17 PM, Nick Mellor <thebalancepro at gmail.com> wrote:
>>
>> takewhile mines for gold at the start of a sequence, dropwhile drops the dross at the start of a sequence.
>
> When you're using both over the same sequence and with the same
> condition, it seems odd that you need to iterate over it twice.
> Perhaps a partitioning iterator would be cleaner - something like
> this:
>
> def partitionwhile(predicate, iterable):
>     iterable = iter(iterable)
>     while True:
>         val = next(iterable)
>         if not predicate(val): break
>         yield val
>     raise StopIteration # Signal the end of Phase 1
>     for val in iterable: yield val # or just "yield from iterable", I think
>
> Only the cold hard boot of reality just stomped out the spark of an
> idea. Once StopIteration has been raised, that's it, there's no
> "resuming" the iterator. Is there a way around that? Is there a clean
> way to say "Done for now, but next time you ask, there'll be more"?

Return two separate iterators, with the contract that the second
iterator can't be used until the first has completed.  Combined with
Neil's groupby suggestion, we end up with something like this:

import itertools

def partitionwhile(predicate, iterable):
    it = itertools.groupby(iterable, lambda x: bool(predicate(x)))
    pushback = missing = object()
    def first():
        nonlocal pushback
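        # Default guards against an empty iterable; a bare next(it) would
        # leak StopIteration out of the generator (RuntimeError on 3.7+).
        pred, subit = next(it, (False, ()))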
        if pred:
            yield from subit
            pushback = None
        else:
            pushback = subit
    def second():
        if pushback is missing:
            raise TypeError("can't yield from second iterator before "
                            "first iterator completes")
        elif pushback is not None:
            yield from pushback
        yield from itertools.chain.from_iterable(subit for key, subit in it)
    return first(), second()

>>> list(map(' '.join, partitionwhile(lambda x: x.upper() == x, "CAPSICUM RED fresh from QLD".split())))
['CAPSICUM RED', 'fresh from QLD']
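
For comparison, the same two-iterator contract can be sketched without
groupby. This is only an illustrative variant, not the code posted above:
the names partition_while, head, and tail are made up for this sketch, and
it assumes Python 3 (for nonlocal and yield from).

```python
def partition_while(predicate, iterable):
    """Hypothetical groupby-free variant: return (head, tail) iterators.

    head yields leading items while predicate holds; tail yields the rest.
    tail must not be used until head has been exhausted.
    """
    it = iter(iterable)
    done = False    # becomes True once head has run to completion
    pushback = []   # holds the first item that failed the predicate, if any

    def head():
        nonlocal done
        for item in it:
            if not predicate(item):
                pushback.append(item)  # keep it for tail
                break
            yield item
        done = True

    def tail():
        if not done:
            raise TypeError("tail used before head was exhausted")
        yield from pushback
        yield from it

    return head(), tail()


words = "CAPSICUM RED fresh from QLD".split()
head, tail = partition_while(lambda x: x.upper() == x, words)
print(list(head))  # ['CAPSICUM', 'RED']
print(list(tail))  # ['fresh', 'from', 'QLD']
```

As with the groupby version, asking the second iterator for an item before
the first has finished raises TypeError rather than silently skipping items.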


