[Python-ideas] zip() problem.

Erik python at lucidity.plus.com
Fri Feb 12 18:36:27 EST 2016


Hi.

In writing my previous email, I noticed something about zip() that I
hadn't seen before (though I guess it's obvious): when it reaches the end
of the shortest sequence and terminates, any iterators it has already
advanced in that final pass will have produced one more value than the
others. Those extra values are discarded.
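A rough pure-Python equivalent, modelled on the "roughly equivalent" code
in the builtin zip() documentation, makes the order of consumption
explicit. The name zip_equivalent is just an illustrative stand-in so the
builtin isn't shadowed. Values are pulled left to right, and the partly
built tuple is thrown away as soon as one input runs out:

```python
def zip_equivalent(*iterables):
    # Roughly what zip() does, after the pure-Python equivalent in the docs.
    sentinel = object()
    iterators = [iter(it) for it in iterables]
    while iterators:
        result = []
        for it in iterators:
            elem = next(it, sentinel)
            if elem is sentinel:
                # One input is exhausted, so stop. Any values already pulled
                # into `result` during this pass are simply discarded.
                return
            result.append(elem)
        yield tuple(result)
```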

For example:

h = iter("Hello")
w = iter("World")
s = iter("Spam")
e = iter("Eggs")

for i in zip(h, w, s, e):
    print(i)

for i in (h, w, s, e):
    print(list(i))

---> All iterators are exhausted.

h = iter("Hello")
w = iter("World")
s = iter("Spam")
e = iter("Eggs")

for i in zip(h, s, e, w):
    print(i)

for i in (h, w, s, e):
    print(list(i))


---> "w" still has the trailing 'd' character.


So, if you use zip() rather than itertools.zip_longest() and still need
the iterators afterwards, you have to be careful about the order of your
arguments: put the one most likely to be shortest first, so that nothing
is pulled from the others on the final pass.
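For comparison, here is a minimal sketch of the itertools.zip_longest()
behaviour mentioned above. It keeps going until every input is exhausted,
padding the shorter ones with fillvalue, so nothing is silently dropped;
the trade-off is that it consumes all of its inputs completely:

```python
from itertools import zip_longest

h, w, s, e = (iter(t) for t in ("Hello", "World", "Spam", "Eggs"))

# zip_longest() stops only when *every* input is exhausted, padding the
# shorter ones with fillvalue, so no value is silently dropped.
rows = list(zip_longest(h, w, s, e, fillvalue=None))
print(rows[-1])  # ('o', 'd', None, None)

# The flip side: every iterator has now been fully consumed.
print([list(i) for i in (h, w, s, e)])  # [[], [], [], []]
```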


The reason I'm posting to 'ideas' is: what should/could be done about it?

1) A simple warning in the docstring for zip()?
2) Something to prevent it (for example, a keyword argument to zip() to
switch on behaviour where each iterator is first asked whether it has
another item to generate before any values are consumed)?
3) Nothing. There are bigger things to worry about ;)
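Option (2) can be prototyped outside zip() itself. zip() has no such
keyword today, and Peekable, has_next() and checked_zip() below are
hypothetical names. Because the iterator protocol offers no way to ask
"is there another item?" without calling __next__, this sketch pre-fetches
and buffers at most one value per input. The buffered value is kept rather
than discarded, so the caller must keep using the wrappers, not the raw
iterators:

```python
class Peekable:
    """Hypothetical wrapper that can report whether another item is
    available by pre-fetching (and buffering) at most one value."""

    _EMPTY = object()  # sentinel meaning "nothing buffered"

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._buf = self._EMPTY

    def __iter__(self):
        return self

    def __next__(self):
        if self._buf is not self._EMPTY:
            value, self._buf = self._buf, self._EMPTY
            return value
        return next(self._it)

    def has_next(self):
        # Pre-fetch one value if needed; it is buffered, not discarded.
        if self._buf is self._EMPTY:
            try:
                self._buf = next(self._it)
            except StopIteration:
                return False
        return True


def checked_zip(*peekables):
    """Like zip(), but checks that every input has another item before
    consuming any of them (hypothetical, not a stdlib function)."""
    if not peekables:
        return
    while all(p.has_next() for p in peekables):
        yield tuple([next(p) for p in peekables])


h, w, s, e = (Peekable(t) for t in ("Hello", "World", "Spam", "Eggs"))
for row in checked_zip(h, w, s, e):
    print(row)
print([list(i) for i in (h, w, s, e)])  # [['o'], ['d'], [], []]
```

With this approach the argument order no longer matters: the 'o' and 'd'
that plain zip() would throw away stay in the wrappers' buffers.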

WRT (2), I thought that perhaps __len__ was part of the iterator 
protocol, but it's not (just __iter__ and __next__), hence:

>>> len(range(5, 40))
35
>>> len(iter(range(5, 40)))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'range_iterator' has no len()
>>> len(iter("FooBar"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'str_iterator' has no len()

... though would that also be something to consider? I guess every
iterator would have to keep some state recording the number of values it
has already generated, and then subtract that from the result of len()
on the underlying object. Perhaps that would just be too heavyweight
for what is a relatively minor wart.
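The bookkeeping described above can be sketched as a small wrapper.
SizedIterator is a hypothetical name, and the sketch assumes the wrapped
object supports len() and does not change size while it is being
iterated. For reference, CPython's built-in iterators already keep a
similar count, but expose it only through the optional, best-effort
__length_hint__ protocol (PEP 424), readable via operator.length_hint(),
not through __len__:

```python
import operator


class SizedIterator:
    """Hypothetical iterator that supports len() by counting the values it
    has produced and subtracting that from len() of the underlying object."""

    def __init__(self, sized):
        # Assumes `sized` has __len__ and does not change size during iteration.
        self._sized = sized
        self._it = iter(sized)
        self._consumed = 0

    def __iter__(self):
        return self

    def __next__(self):
        value = next(self._it)  # StopIteration propagates unchanged
        self._consumed += 1
        return value

    def __len__(self):
        return len(self._sized) - self._consumed


it = SizedIterator(range(5, 40))
print(len(it))  # 35
next(it)
print(len(it))  # 34

# CPython's own range iterator keeps a similar count, exposed as a hint:
print(operator.length_hint(iter(range(5, 40))))  # 35
```

One design consequence: because it defines __len__, an exhausted
SizedIterator is falsy in a boolean context, which built-in iterators
never are.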


E.

