Peek inside iterator (is there a PEP about this?)

Thu Oct 2 04:40:53 EDT 2008

Luis Zarrabeitia wrote:

> On Wednesday 01 October 2008 01:14:14 pm Peter Otten wrote:
>> Luis Zarrabeitia wrote:
>> > a = iter([1,2,3,4,5]) # assume you got the iterator from a function and
>> > b = iter([1,2,3])     # these two are just examples.
>>
>> Can you provide a concrete use case?
> 
> I'd like to... but I refactored away all the examples I had as soon as
> I realized that I didn't know which sequence was the shortest, and
> therefore which one to put first.
> 
> But, it went something like this:
> 
> ===
> def do_stuff(tasks, params):
>     params = iter(params)
>     for task in tasks:
>         for partial_task, param in zip(task, params):
>             pass #blah blah, do stuff here.
>         print "task completed"
> ===
> 
> Unfortunately that's not the real example (as it is, it shows very bad
> programming), but imagine if params and/or tasks were streams beyond your
> control (a data stream and a control stream). Note that I wouldn't like a
> task or param to be wasted.

This remains a bit foggy to me. Maybe you are better off with deques than
iterators?
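The element loss behind this sub-thread, and Peter's deque suggestion, can be illustrated with a small sketch. It is written for Python 3 (the thread predates it), the helper name `pair_up` is hypothetical, and the deque approach assumes the data can be buffered in memory, which a true unbounded stream may not allow.

```python
from collections import deque

# 1. zip() pulls from its arguments left to right and stops at the first
#    StopIteration, so if the *first* iterator is the longer one, the
#    element it yields in the final round is fetched and then dropped.
a = iter([1, 2, 3, 4, 5])
b = iter([1, 2, 3])
pairs = list(zip(a, b))   # [(1, 1), (2, 2), (3, 3)]
leftover = list(a)        # [5]: the 4 was pulled by zip() and discarded


# 2. Deques can be tested for emptiness without removing anything, so a
#    pairing loop can check both sides before taking from either one.
def pair_up(tasks, params):
    """Hypothetical helper: pair items from two deques until one runs out.

    Both arguments are mutated; whatever is left over stays in its deque.
    """
    result = []
    while tasks and params:   # an empty deque is falsy
        result.append((tasks.popleft(), params.popleft()))
    return result


tasks = deque([1, 2, 3, 4, 5])
params = deque(["a", "b", "c"])
paired = pair_up(tasks, params)   # [(1, 'a'), (2, 'b'), (3, 'c')]
# tasks still holds deque([4, 5]); params is now empty
```

Because the emptiness check happens before either `popleft()`, nothing is taken from one side unless the other side can supply a partner.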

> I didn't like the idea of changing both the 'iter' and the 'zip' (changing
> only one of them wouldn't have worked).
> 
>> > Will this iterator yield any value? Like with most iterables, a
>> > construct
>> >
>> > if iterator:
>> >    # do something
>>
>> I don't think this has a chance. By adding a __len__ to some iterators,
>> R. Hettinger once managed to break GvR's code. The BDFL was not amused.
> 
> Ouch :D
> But, no no no. Adding a __len__ to iterators makes little sense (especially
> in my example), and adding an optional __len__ that some iterators have
> and some don't (the ones that can't know their own lengths) would break too
> many things, and still wouldn't solve the problem of knowing if there is
> a next element. A __nonzero__() that would move the iterator forward and
> cache the result, with a next() that would check the cache before
> advancing, would be closer to what I'd like.
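The caching idea described above can be sketched as a small wrapper class. This is an illustration, not an existing library API: the class name `Peekable` is hypothetical, and it uses the Python 3 method names (`__bool__` instead of `__nonzero__`, `__next__` instead of `next`).

```python
class Peekable:
    """Hypothetical iterator wrapper: truth-testing advances the underlying
    iterator once and caches the value; __next__ returns the cache first."""

    _EMPTY = object()  # sentinel meaning "nothing is cached"

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._cache = self._EMPTY

    def __iter__(self):
        return self

    def __bool__(self):
        # Fetch one element ahead (only if nothing is cached yet).
        if self._cache is self._EMPTY:
            try:
                self._cache = next(self._it)
            except StopIteration:
                return False
        return True

    def __next__(self):
        # Hand out the cached element before advancing the real iterator.
        if self._cache is not self._EMPTY:
            value, self._cache = self._cache, self._EMPTY
            return value
        return next(self._it)  # StopIteration propagates when exhausted


p = Peekable([1, 2, 3])
if p:                 # looks ahead and caches the 1
    items = list(p)   # [1, 2, 3]: nothing was lost by the test
```

The cost of this design is that the wrapper pulls one element earlier than the caller asked for, which matters if the underlying iterator is a live stream with side effects.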

The problem was that __len__() acts as a fallback for __nonzero__(), see

http://mail.python.org/pipermail/python-dev/2005-September/056649.html
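The fallback can be shown with a short Python 3 sketch (where `__bool__` plays the role of `__nonzero__`). `CountingIterator` and `consume` are hypothetical names, and `consume` only illustrates one way truth-testing code can change behaviour; it is not a reconstruction of the code that actually broke.

```python
class CountingIterator:
    """Hypothetical iterator that also reports how many items remain."""

    def __init__(self, items):
        self._items = list(items)
        self._pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self._items):
            raise StopIteration
        self._pos += 1
        return self._items[self._pos - 1]

    def __len__(self):
        # No __bool__ is defined, so bool() falls back to this.
        return len(self._items) - self._pos


def consume(source=None):
    # Intended meaning: "use the default data only if no source was passed".
    source = source or iter([99])
    return list(source)


plain = consume(iter([]))              # []: a plain iterator is always truthy
counted = consume(CountingIterator([]))  # [99]: len() == 0 makes it falsy
```

Ordinary Python iterators define neither `__bool__` nor `__len__`, so `bool(iter([]))` is `True`; adding `__len__` silently changes the answer.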

>> > if any(iterator):
>> >    # do something ... but the first true value was already consumed and
>> >    # cannot be reused. "Any" cannot peek inside the iterator without
>> >    # consuming the value.
>>
>> for item in ifilter(bool, iterator):
>>    # do something
>>    break
> 
> It is not, but (feel free to consider this silly) I don't like breaks. In
> this case, you would have to read until the end of the block to know that
> what you wanted was an if (if you are lucky you may figure out that you
> wanted to simulate an if test).
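The difference between `any()` and the for/break form discussed above can be checked directly. This sketch is written for Python 3, where `itertools.ifilter` became the built-in `filter`; the sample data is made up.

```python
# any() stops at the first true value, but that value has already been
# pulled out of the iterator and is not handed back to the caller.
it = iter([0, 0, 3, 4])
found = any(it)        # True
rest = list(it)        # [4]: the 3 is gone

# The for/break form keeps the item: filter() skips the false values and
# the loop body receives the first true one.
it = iter([0, 0, 3, 4])
first_true = None
for item in filter(bool, it):
    first_true = item  # 3
    break
rest_after_loop = list(it)   # [4]
```

In both cases the false values before the first true one are consumed; only the for/break form lets you keep the true value itself.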

Ok, make it

for item in islice(ifilter(bool, iterator), 1):
    # do something

then ;)
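A Python 3 rendering of the break-free loop above follows, plus an alternative that is not from the thread: `next()` with a default value, which returns the first true value or the default when there is none. The sample data and the `None` default are assumptions for illustration.

```python
from itertools import islice

results = []
it = iter([0, "", "spam", "eggs"])
# islice(..., 1) yields at most one item, so the body runs zero or one time.
for item in islice(filter(bool, it), 1):
    results.append(item)           # runs once, with "spam"

# Alternative: next() with a default instead of a loop.
it2 = iter([0, "", "spam", "eggs"])
first = next(filter(bool, it2), None)                # "spam"
nothing = next(filter(bool, iter([0, ""])), None)   # None: no true value
remaining = list(it2)                                # ["eggs"]
```

The `next()` form reads as an expression rather than a loop, but it needs a sentinel default that cannot be confused with a real element.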

> (Well, I use breaks sometimes, but most of them are because I need to test
> if an iterator is empty or not)
> 
>> Personally I think that Python's choice of EAFP over LBYL is a good one,
>> but one that cannot easily be reconciled with having peekable iterators.
>> If I were in charge I'd rather simplify the iterator protocol (scrap
>> send() and yield expressions) than make it more complex.
> 
> Oh, I defend EAFP strongly. At my university LBYL is preferred, so
> whenever I teach Python, I have to give strong examples of why I like
> EAFP.
> 
> When an empty iterator means that something is wrong, I wouldn't
> think of using "if iterator:". That would be masking what should be
> an exception. However, if "iterator is empty" is meaningful, that case
> should go in an "else" clause rather than an "except". Consider the case
> where you need to find the first non-empty iterator in a list and then
> send it to another function: you can't test for emptiness with a "for"
> there, or you could lose the first element.

You can do it

def non_empty(iterators):
    for iterator in iterators:
        it = iter(iterator)
        try:
            yield chain([it.next()], it)
        except StopIteration:
            pass

for it in non_empty(iterators):
    return process(it)
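A self-contained Python 3 version of the `non_empty()` approach above might look like the following sketch. `process()` is a hypothetical stand-in (it just sums), and the wrapper `first_non_empty_result()` with its `None` fallback is an assumption added so the snippet runs outside a function body.

```python
from itertools import chain


def non_empty(iterables):
    """Yield each non-empty iterable as an iterator, with its first element
    pushed back in front via chain(); empty iterables are skipped."""
    for iterable in iterables:
        it = iter(iterable)
        try:
            first = next(it)
        except StopIteration:
            continue  # empty: move on to the next candidate
        yield chain([first], it)


def process(it):
    """Hypothetical stand-in for the thread's process(): sum the items."""
    return sum(it)


def first_non_empty_result(iterables):
    for it in non_empty(iterables):
        return process(it)
    return None  # assumption: every iterable was empty


result = first_non_empty_result([[], iter([]), [1, 2, 3], [10]])   # 6
```

Keeping the `yield` outside the `try` block means only the `StopIteration` from probing the first element is caught.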

but with iterators as they currently are in Python, you had better rewrite
process() to handle empty iterators and then write

for it in iterators:
    try:
        return process(it)
    except NothingToProcess: # made up
        pass

That's how I understand EAFP. Assume one normal program flow and deal with
problems as they occur.
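The EAFP variant above can be made runnable in Python 3 as a sketch. `NothingToProcess` is made up, exactly as in the message; the summing `process()` and the `None` returned when every input is empty are hypothetical choices for illustration.

```python
class NothingToProcess(Exception):
    """Hypothetical exception raised when process() gets empty input."""


def process(iterable):
    """Hypothetical process(): sum the items, refusing empty input."""
    it = iter(iterable)
    try:
        total = next(it)
    except StopIteration:
        raise NothingToProcess from None
    for item in it:
        total += item
    return total


def first_processable(iterables):
    for it in iterables:
        try:
            return process(it)
        except NothingToProcess:
            pass  # empty input: try the next one
    return None  # assumption: nothing could be processed


result = first_processable([[], iter([]), [1, 2, 3]])   # 6
```

Here the emptiness check lives inside `process()` itself, so callers never need to look ahead into the iterator.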

> But that's one of the cases where one should know what one is doing. Both
> C# and Java have iterators that let you know if they are finished before
> consuming the item. (I didn't mean to compare, and I like Java's more than
> C#'s, as Java's iterators also promote the 'use once' design.)

I think that may be the core of your problem. Good code built on Python's
iterators will not resemble the typical Java approach.

Peter