itertools.izip brokenness

Paul Rubin http
Tue Jan 3 05:50:45 EST 2006


rurpy at yahoo.com writes:
> The problem is that sometimes, depending on which file is the
> shorter, a line ends up missing, appearing neither in the izip()
> output, or in the subsequent direct file iteration.  I would guess
> that it was in izip's buffer when izip terminates due to the
> exception on the other file.

Oh man, this is ugly.  The problem is there's no way to tell whether
an iterator is empty, other than by reading from it.  
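
To make the failure concrete, here is a minimal sketch of the lost-item behaviour. It assumes Python 3, where the built-in zip() is lazy like itertools.izip; the lists stand in for the two files and their contents are made up for illustration.

```python
# Assumption: Python 3, where built-in zip() is lazy like the old itertools.izip.
longer = iter(["a1", "a2", "a3"])
shorter = iter(["b1"])

# zip reads its arguments left to right, so in the second round it pulls
# "a2" from `longer` before it discovers that `shorter` is exhausted.
pairs = list(zip(longer, shorter))
leftover = list(longer)

print(pairs)     # [('a1', 'b1')]
print(leftover)  # ['a3']  ("a2" was read by zip and then thrown away)

# With the short iterator first, zip stops before touching the long one,
# which is why the bug depends on which input is shorter.
longer2 = iter(["a1", "a2", "a3"])
shorter2 = iter(["b1"])
pairs2 = list(zip(shorter2, longer2))
leftover2 = list(longer2)
print(leftover2)  # ['a2', 'a3']
```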

  http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/413614

has a kludge that you can use inside a function but that's no good
for something like izip.  

For a temporary hack you could make a wrapped iterator that allows
pushing items back onto the iterator (sort of like ungetc) and a
version of izip that uses it, or a version of izip that tests the
iterators you pass it using the above recipe.
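
A rough sketch of the pushback idea, written for Python 3. The names PushbackIterator and pushback_zip are made up for this example, and it assumes every argument has already been wrapped; treat it as one way the hack could look, not a finished replacement for izip.

```python
class PushbackIterator:
    """Iterator wrapper with an ungetc-style pushback() (hypothetical helper)."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._pushed = []  # stack of items that were handed back

    def __iter__(self):
        return self

    def __next__(self):
        # Serve pushed-back items first, most recently pushed first.
        if self._pushed:
            return self._pushed.pop()
        return next(self._it)

    def pushback(self, item):
        self._pushed.append(item)


def pushback_zip(*iterators):
    """zip-like generator; assumes every argument is a PushbackIterator."""
    if not iterators:
        return
    while True:
        taken = []  # (iterator, item) pairs read during this round
        for it in iterators:
            try:
                taken.append((it, next(it)))
            except StopIteration:
                # One input ran out: give this round's items back and stop.
                for source, item in reversed(taken):
                    source.pushback(item)
                return
        yield tuple(item for _, item in taken)


a = PushbackIterator(["a1", "a2", "a3"])
b = PushbackIterator(["b1"])
pairs = list(pushback_zip(a, b))
rest = list(a)  # "a2" is still available because it was pushed back
```

The cost is that callers have to wrap their iterators up front, which is why this is only a stopgap.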

It's probably not reasonable to ask that an emptiness test be added to
the iterator interface, since the zillion iterator implementations now
existing won't support it.  

A different possible long-term fix: change StopIteration so that it
takes an optional arg that the program can use to figure out what
happened.  Then change izip so that when one of its iterator args runs
out, it wraps up the still-live and finished iterators in a new tuple
and passes that to the StopIteration it raises.  Untested:

   def izip(*iterlist):
       while True:
           z = []
           finished = []      # iterators that have run out
           still_alive = []   # iterators that are still alive
           for i in iterlist:
               try:
                   z.append(i.next())
                   still_alive.append(i)
               except StopIteration:
                   finished.append(i)
           if not finished:
               yield tuple(z)
           else:
               # report both groups so the caller can carry on
               raise StopIteration, (still_alive, finished)

You would want some kind of extended for-loop syntax (maybe involving
the new "with" statement) with a clean way to capture the exception info.
You'd then use it to continue the izip where it left off, with the
new (smaller) list of iterators.
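
For illustration, here is how capturing that information might look in Python 3 (3.3 or later), where a generator's return value is attached to the StopIteration it raises and is readable as its .value attribute. This is only a sketch under that assumption: izip_reporting and drain are hypothetical names, and the third element of the returned tuple (items already read in the final round) is an addition of this example, not part of the proposal above.

```python
def izip_reporting(*iterlist):
    """zip-like generator whose StopIteration carries the leftover state."""
    iterators = [iter(i) for i in iterlist]
    while iterators:
        z = []
        finished = []     # iterators that have run out
        still_alive = []  # iterators that are still alive
        for it in iterators:
            try:
                z.append(next(it))
                still_alive.append(it)
            except StopIteration:
                finished.append(it)
        if finished:
            # In a Python 3 generator, `return x` raises StopIteration(x).
            # z[k] is the item already read from still_alive[k].
            return still_alive, finished, z
        yield tuple(z)
    return [], [], []


def drain(gen):
    """Stand-in for the wished-for loop syntax: collect rows and the report."""
    rows = []
    while True:
        try:
            rows.append(next(gen))
        except StopIteration as stop:
            return rows, stop.value


longer = iter(["a1", "a2", "a3"])
shorter = iter(["b1"])
rows, (alive, done, pending) = drain(izip_reporting(longer, shorter))

# Continue where the zip left off: the pending item first, then the rest.
remaining = pending + list(alive[0])
```

A plain for loop still discards the exception, so the caller has to drive the generator by hand as drain() does, or delegate with "yield from", which evaluates to the same value.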


