itertools.izip brokenness

Tue Jan 3 23:07:32 EST 2006

"Duncan Booth" <duncan.booth at invalid.invalid> wrote:
> Peter Otten wrote:
>
> > from itertools import izip, chain, repeat
> >
> > def prt_files (file1, file2):
> >     file1 = chain(file1, repeat(""))
> >     file2 = chain(file2, repeat(""))
> >     for line1, line2 in iter(izip(file1, file2).next, ("", "")):
> >         print line1.rstrip(), "\t", line2.rstrip()
> >
> > which can easily be generalized for an arbitrary number of files.
>
> Generalizing for an arbitrary number of files and for an arbitrary value to
> pad out the shorter sequences:
>
> def paddedizip(pad, *args):
>     terminator = [pad] * (len(args)-1)
>     def padder():
>         if not terminator:
>             return
>         t = terminator.pop()
>         while 1:
>             yield t
>     return izip(*(chain(a, padder()) for a in args))
>
> >>> for (p,q) in paddedizip(0,[1,2,3],[4,5]):
> ...     print repr(p), repr(q)
> ...
> 1 4
> 2 5
> 3 0
[...more examples snipped...]

Here is what I came up with:

def izipl (*iterables, **kwds):
    sentinel = ""     # Default value, maybe None would be better?
    # Look for the "sentinel" keyword argument; error on any other.
    for k, v in kwds.items():
        if k != "sentinel":
            raise TypeError("got an unexpected keyword argument '%s'" % k)
        sentinel = v
    iterables = map (iter, iterables)  # itertools.izip does this.

    while iterables:
        result = [];  cnt = 0
        for i in iterables:
            try: result.append (i.next())
            except StopIteration:
                # This iterator is done; pad its slot with the sentinel.
                result.append (sentinel)
                cnt += 1
        # Every iterator is exhausted: end the generator.
        if cnt == len (iterables): return
        yield tuple(result)

Hmm, your function returns an izip object, while mine is a generator
that yields the result tuples itself.  So I guess my function would
be the next() method of an izipl class?  I still have not got
my head around this stuff :-(
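
For what it is worth, calling a generator function already hands back
an iterator object with its own next() method, so izipl(...) can be used
in a for loop just like izip(...).  A class-based version would look
roughly like the sketch below.  The name IzipL is made up for
illustration, and the code assumes a Python that has the built-in
next() (2.6 or later, including 3.x); on 2.4 it would be it.next().

```python
class IzipL(object):
    """Hypothetical class-based equivalent of the izipl generator."""

    def __init__(self, *iterables, **kwds):
        # Accept only the "sentinel" keyword; reject anything else.
        self.sentinel = kwds.pop("sentinel", "")
        if kwds:
            raise TypeError("got an unexpected keyword argument '%s'"
                            % sorted(kwds)[0])
        self.iterators = [iter(it) for it in iterables]

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):
        result = []
        exhausted = 0
        for it in self.iterators:
            try:
                result.append(next(it))
            except StopIteration:
                # Pad the slot of a finished iterator with the sentinel.
                result.append(self.sentinel)
                exhausted += 1
        # Stop once every iterator has run out (or none were given).
        if exhausted == len(self.iterators):
            raise StopIteration
        return tuple(result)

    next = __next__  # Python 2 spells the method next()
```

With this, list(IzipL([1, 2, 3], [4, 5], sentinel=0)) gives
[(1, 4), (2, 5), (3, 0)], the same tuples the generator produces, so
the class seems to buy nothing over the generator here.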

But here is my real question...
Why isn't something like this in itertools, or why shouldn't
it go into itertools?

It is clear that there is a real need for iterating in parallel
over multiple iterators to the end of the longest one.  Why
does something that stops at the shortest get included in
the standard library, while something that runs to the end of
the longest does not?  Is there any hope of something like
this being included in Python 2.5?