Peek inside iterator (is there a PEP about this?)

Terry Reedy tjreedy at udel.edu
Wed Oct 1 16:14:09 EDT 2008


Luis Zarrabeitia wrote:
> Hi there.
> 
> For most use cases I can think of, the iterator protocol is more than enough.
> However, in a few cases, I've needed some ugly hacks.
> 
> Ex 1:
> 
> a = iter([1,2,3,4,5]) # assume you got the iterator from a function and
> b = iter([1,2,3])     # these two are just examples.
> 
> then,
> 
> zip(a,b)
> 
> has a different side effect from
> 
> zip(b,a)
> 
> After the execution, in the first case, iterator a contains just [5]; in the
> second, it contains [4,5]. I think the second one is correct (the 5 was never
> used, after all). I tried to implement my 'own' zip, but there is no way to
> know the length of the iterator (obviously), and there is also no way
> to 'rewind' a value after calling 'next'.

Interesting observation.  Iterators are intended for 'iterate through 
once and discard' usages.  To zip a long sequence with several short 
sequences, either use itertools.chain(short sequences) or put the short 
sequences as the first zip arg.
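
The asymmetry in Ex 1 can be reproduced directly. This is a minimal sketch, 
assuming Python 3, where zip is lazy and has to be drained with list(); the 
variable names are illustrative only.

```python
# Long iterator first: zip pulls 4 from a, then finds b exhausted and drops the 4.
a = iter([1, 2, 3, 4, 5])
b = iter([1, 2, 3])
pairs_ab = list(zip(a, b))
left_long_first = list(a)

# Short iterator first: b stops zip before a is asked for another item.
a = iter([1, 2, 3, 4, 5])
b = iter([1, 2, 3])
pairs_ba = list(zip(b, a))
left_short_first = list(a)

print(left_long_first)   # [5]
print(left_short_first)  # [4, 5]
```

zip stops at the first argument that raises StopIteration, so any argument 
to its left has already given up one extra item by then.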

> Ex 2:
> 
> Will this iterator yield any value? Like with most iterables, a construct
> 
> if iterator:
>    # do something
> 
> would be a very convenient thing to have, instead of wrapping a 'next' call in 
> a try...except and consuming the first item.

To test without consuming, wrap the iterator in a trivial-to-write 
one_ahead or peek class such as has been posted before.
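
When a full wrapper class is more than is needed, one alternative (not the 
wrapper approach itself) is to pull the first item and chain it back in 
front. A rough sketch; the helper name check_nonempty is made up for 
illustration, and the caller has to switch to the returned iterator.

```python
import itertools

def check_nonempty(iterator):
    """Return (has_items, iterator_to_use_from_now_on)."""
    iterator = iter(iterator)
    try:
        first = next(iterator)
    except StopIteration:
        return False, iterator
    # Put the consumed item back in front of the remaining ones.
    return True, itertools.chain([first], iterator)

has_items, it = check_nonempty(iter([10, 20, 30]))
if has_items:
    print(list(it))  # [10, 20, 30]
```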

> Ex 3:
> 
> if any(iterator):
>    # do something ... but the first true value was already consumed and 
>    # cannot be reused. "Any" cannot peek inside the iterator without 
>    # consuming the value.

If you are going to do something with the true value, use a for loop and 
break.  If you just want to peek inside, use a sequence (list(iterator)).
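
The for-loop-and-break version might look like the sketch below. The sample 
data and the name first_true are assumptions for illustration.

```python
iterator = iter([0, "", None, 7, 8, 9])

first_true = None
for item in iterator:
    if item:
        first_true = item  # keep the value that any() would have thrown away
        break

rest = list(iterator)      # iteration resumes right after the true value
print(first_true, rest)    # 7 [8, 9]
```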

> Instead,
> 
> i1, i2 = tee(iterator)
> if any(i1):
>    # do something with i2

This effectively makes two partial lists and tosses one.  That may or 
may not be a better idea.
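
A small sketch of that cost, with made-up sample data: whatever any() pulls 
through i1 is buffered by tee until i2 has read it as well.

```python
from itertools import tee

source = iter([0, 0, 3, 4])
i1, i2 = tee(source)       # do not touch source directly after this point

found = any(i1)            # reads 0, 0, 3 through i1; tee keeps them for i2
everything = list(i2)      # i2 still sees every item from the start
print(found, everything)   # True [0, 0, 3, 4]
```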

> Question/Proposal:
> 
> Has there been any PEP regarding the problem of 'peeking' inside an iterator? 

Iterators are not sequences and, in general, cannot be made to act like 
them.  The iterator protocol is a bare-minimum, least-common-denominator 
requirement for inter-operability.  You can, of course, add methods to 
iterators that you write for the cases where one-ahead or random access 
*is* possible.

> Knowing if the iteration will end or not, and/or accessing the next value, 
> without consuming it? Is there any (simple, elegant) way around it?

That much is trivial.  As suggested above, write a wrapper with the 
exact behavior you want.  A sample (untested)

class one_ahead:
    "Wrap an iterator; self.peek is the next item, or undefined if exhausted"
    def __init__(self, iterator):
        self._it = iter(iterator)
        try:
            self.peek = next(self._it)
        except StopIteration:
            pass
    def __iter__(self):
        return self
    def __bool__(self):  # 3.0; spelled __nonzero__ in 2.x
        return hasattr(self, 'peek')
    def __next__(self):  # 3.0; spelled next in 2.x
        try:
            item = self.peek  # item to return; must not shadow builtin next()
        except AttributeError:
            raise StopIteration
        try:
            self.peek = next(self._it)
        except StopIteration:
            del self.peek
        return item
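
With a look-ahead wrapper of this kind, the zip from Ex 1 can be written so 
that it takes nothing unless every argument has an item waiting. The sketch 
below is self-contained, so it restates a compact wrapper that uses a 
sentinel rather than hasattr; the names Peekable and careful_zip are 
hypothetical, not from any library.

```python
class Peekable:
    """Compact look-ahead wrapper, restated here so the block runs alone."""
    _EMPTY = object()  # sentinel meaning "nothing left"

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._head = next(self._it, self._EMPTY)

    def __bool__(self):
        return self._head is not self._EMPTY

    def __iter__(self):
        return self

    def __next__(self):
        if self._head is self._EMPTY:
            raise StopIteration
        item, self._head = self._head, next(self._it, self._EMPTY)
        return item


def careful_zip(*wrapped):
    """Yield tuples only while every wrapped iterator has an item waiting."""
    while wrapped and all(wrapped):
        yield tuple(next(w) for w in wrapped)


a = Peekable([1, 2, 3, 4, 5])
b = Peekable([1, 2, 3])
print(list(careful_zip(a, b)))  # [(1, 1), (2, 2), (3, 3)]
print(list(a))                  # [4, 5]
```

Argument order no longer matters, but the wrapper has already read one item 
from the underlying iterator, so the rest of the program has to keep using 
the wrapper rather than the original.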

Terry Jan Reedy



