itertools.izip brokeness

rurpy at yahoo.com rurpy at yahoo.com
Tue Jan 3 17:50:38 EST 2006


<bonono at gmail.com> wrote:
> But that is exactly the behaviour of python iterator, I don't see what
> is broken.
>
> izip/zip just read from the respectives streams and give back a tuple,
> if it can get one from each, otherwise stop. And because python
> iterator can only go in one direction, those consumed do lose in the
> zip/izip calls.

[This is really a reply to the thread in general, not specifically
to your response Bonono...]

Yes, I can understand how the properties of iterators
and izip's design lead to the behavior I observed.
I am saying that the unfortunate interaction of those
properties leads to behavior that makes izip essentially
useless in many cases where one would naively expect it
to be useful, that this behavior is not pointed out in the docs,
and that it is subtle enough that it is not realistic to expect
most users to realize it based on the properties of izip
and iterators alone.

izip's uses can be partitioned into two cases:
1. All iterables have equal lengths.
2. Iterables have different lengths.

Case 1 is no problem obviously.
In Case 2 there are two sub-cases:

2a. You don't care what values occur in the other iterators
  after the end of the shortest.
2b. You do care.

In my experience 1 and 2b are the cases I encounter the most.
Seldom do I need case 2a.  That is, when I can have iterators
of unequal length, usually I want to do *something* with the
extra items in the longer iterators.  Seldom do I want to just
ignore them.

In case 2b one cannot (naively) use izip, because izip
irretrievably throws away data when the end of the
shortest iterable is reached.
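
A small illustration of that loss, as a sketch only. The helper name
leftover_after_izip is made up for this example, and the try/except
import is an assumption added so the same snippet also runs on Python 3,
where the built-in zip is lazy in the same way as izip:

```python
try:
    from itertools import izip  # Python 2
except ImportError:
    izip = zip                  # Python 3: built-in zip is lazy like izip


def leftover_after_izip(first, second):
    """Zip two iterators, then report what is still left in each.

    Hypothetical helper, for illustration only.
    """
    pairs = list(izip(first, second))
    return pairs, list(first), list(second)


# Short iterator first: izip finds it exhausted before touching the
# long one on that round, so nothing is lost.
short, long_ = iter([1, 2]), iter("abcd")
pairs_a, rest_short_a, rest_long_a = leftover_after_izip(short, long_)
# pairs_a == [(1, 'a'), (2, 'b')], rest_long_a == ['c', 'd']

# Long iterator first: izip pulls 'c' from it, then finds the short
# one exhausted and silently discards 'c'.
short, long_ = iter([1, 2]), iter("abcd")
pairs_b, rest_long_b, rest_short_b = leftover_after_izip(long_, short)
# pairs_b == [('a', 1), ('b', 2)], rest_long_b == ['d']
```

The pairs produced are equivalent in both calls; only the leftovers
differ, which is why the problem is easy to miss.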

The whole point of using izip is to make the code shorter,
more concise, and easier to write and understand. If I
have to add a lot of extra code to work around izip's problem,
or write my own izip function, then there is no point in using
izip(); I could just as well write a simple while loop and handle
the iterators' exhaustions individually.
Ergo, izip is useless for situations involving case 2b.
This should be pointed out in the docs, particularly
since, depending on the order of izip's arguments,
it can appear to be working as one might initially
but erroneously think it should.
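
For comparison, the "simple while loop" alternative might look roughly
like the sketch below. The function name pair_and_keep_rest is made up
for this example, and it assumes the built-in next() (Python 2.6 and
later; on older versions one would call the iterator's .next() method
instead):

```python
def pair_and_keep_rest(a, b):
    """Pair items from a and b; return (pairs, leftover_a, leftover_b).

    Hypothetical helper, for illustration only.
    """
    a, b = iter(a), iter(b)
    pairs = []
    while True:
        try:
            x = next(a)
        except StopIteration:
            # a ran out first; nothing was taken from b on this round.
            return pairs, [], list(b)
        try:
            y = next(b)
        except StopIteration:
            # b ran out first; x was already taken from a, so put it
            # back at the front of a's leftovers.
            return pairs, [x] + list(a), []
        pairs.append((x, y))


# The leftovers survive regardless of argument order.
result_long_first = pair_and_keep_rest("abcd", [1, 2])
result_short_first = pair_and_keep_rest([1, 2], "abcd")
```

It works, but it is a dozen or so lines of bookkeeping to replace what
looked like a one-line izip call.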

However, it would be better if izip could be made useful
for case 2b situations.  Or maybe an izip2 (or something)
could be added.
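
One possible shape for such a function, purely as a sketch: keep going
until every iterable is exhausted and pad the short ones. The name
izip2 comes from the suggestion above; the "fill" keyword and the
padding behavior are assumptions of this example, not an existing API.

```python
def izip2(*iterables, **kwargs):
    """Like izip, but continue until ALL iterables are exhausted.

    Hypothetical sketch: missing values are replaced by the optional
    keyword argument 'fill' (default None).
    """
    fill = kwargs.get("fill", None)
    iterators = [iter(it) for it in iterables]
    sentinel = object()  # unique marker for "this iterator is done"
    while iterators:
        row = [next(it, sentinel) for it in iterators]
        if all(item is sentinel for item in row):
            return  # every iterator is exhausted
        yield tuple(fill if item is sentinel else item for item in row)


padded = list(izip2("abcd", [1, 2]))
# padded == [('a', 1), ('b', 2), ('c', None), ('d', None)]
```

With something like this no item is thrown away, and the caller can
tell from the fill value which iterable ran short. Later Python
releases did add a function along these lines, itertools.izip_longest
(zip_longest in Python 3).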



