itertools.izip brokenness

Tom Anderson twic at urchin.earth.li
Tue Jan 3 10:18:03 EST 2006


On Tue, 3 Jan 2006, it was written:

> rurpy at yahoo.com writes:
>
>> The problem is that sometimes, depending on which file is the shorter, 
>> a line ends up missing, appearing neither in the izip() output, nor in 
>> the subsequent direct file iteration.  I would guess that it was in 
>> izip's buffer when izip terminates due to the exception on the other 
>> file.
>
> A different possible long term fix: change StopIteration so that it
> takes an optional arg that the program can use to figure out what
> happened.  Then change izip so that when one of its iterator args runs
> out, it wraps up the remaining ones in a new tuple and passes that
> to the StopIteration it raises.

+1

I think you also want to send back the items you read out of the iterators 
which are still alive, which otherwise would be lost. Here's a somewhat 
minimalist (but tested!) implementation:

def izip(*iters):
    while True:
        z = []
        try:
            for i in iters:
                z.append(i.next())
            yield tuple(z)
        except StopIteration:
            raise StopIteration, z

The argument you get back with the exception is z, the list of items read 
before the first empty iterator was encountered. If you still have your 
sequence of iterators, iters, hanging about, you can find the iterator 
which stopped with iters[len(z)], the ones which are still going with 
iters[:len(z)], and the ones which are in an uncertain state, since they 
were never tried, with iters[(len(z) + 1):]. This code could easily be 
extended to return more information explicitly, of course, but simple, 
sparse, etc.
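
For anyone trying this on a current Python 3, where a generator can no 
longer raise StopIteration itself (PEP 479), here is a rough sketch of the 
same idea using the generator's return value, which ends up on 
StopIteration.value. The names izip_with_leftovers and drain are made up 
for illustration and are not part of itertools; treat this as one possible 
translation under those assumptions, not as the implementation above.

```python
def izip_with_leftovers(*iterables):
    """Zip iterables; on exhaustion, return the items already consumed.

    Hypothetical helper: the returned list plays the role of z above.
    """
    iterators = [iter(it) for it in iterables]
    if not iterators:
        # Assumption: with no inputs, stop at once instead of looping forever.
        return []
    while True:
        consumed = []
        for it in iterators:
            try:
                consumed.append(next(it))
            except StopIteration:
                # A generator's return value is carried on StopIteration.value.
                return consumed
        yield tuple(consumed)


def drain(gen):
    """Collect everything gen yields, plus the value it returns at the end."""
    results = []
    while True:
        try:
            results.append(next(gen))
        except StopIteration as stop:
            return results, stop.value
```

With a = iter([1, 2, 3]) and b = iter([10, 20]), calling 
drain(izip_with_leftovers(a, b)) gives the pairs (1, 10) and (2, 20) plus 
the leftover list [3], which is the item that would otherwise have vanished.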

> You would want some kind of extended for-loop syntax (maybe involving 
> the new "with" statement) with a clean way to capture the exception 
> info.

How about for ... except?

for z in izip(a, b):
    lovingly_fondle(z)
except StopIteration, leftovers:
    angrily_discard(leftovers)

This has the advantage of not giving entirely new meaning to an existing 
keyword. It does, however, afford the somewhat dubious use:

for z in izip(a, b):
    lovingly_fondle(z)
except ValueError, leftovers:
    pass # execution should almost certainly never get here

Perhaps that form should be taken as meaning:

try:
    for z in izip(a, b):
        lovingly_fondle(z)
except ValueError, leftovers:
    pass # execution could well get here if the fondling goes wrong

Although I think it would be more strictly correct if, more generally, it 
made:

for LOOP_VARIABLE in ITERATOR:
    SUITE
except EXCEPTION:
    HANDLER

work like:

try:
    while True:
        try:
            LOOP_VARIABLE = ITERATOR.next()
        except EXCEPTION:
            raise __StopIteration__, sys.exc_info()
        except StopIteration:
            break
        SUITE
except __StopIteration__, exc_info:
    somehow_set_sys_exc_info(exc_info)
    HANDLER
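
The expansion above can be approximated with an ordinary function. Here is 
a minimal sketch in current Python 3, assuming a hypothetical helper 
for_except (not a builtin, and not any proposed API) that takes the loop 
body and the handler as callables. Only exceptions raised while fetching 
the next item reach the handler; anything raised by the body escapes.

```python
def for_except(iterable, body, exception, handler):
    """Emulate the proposed for ... except (hypothetical helper).

    body(item) stands in for SUITE, handler(exc) for HANDLER.
    """
    iterator = iter(iterable)
    while True:
        try:
            item = next(iterator)
        except exception as exc:
            # Checked first, so exception=StopIteration traps exhaustion too.
            handler(exc)
            return
        except StopIteration:
            # Ordinary exhaustion: leave the loop quietly.
            return
        # Deliberately outside the try: errors from the body propagate.
        body(item)


def numbers_then_leftover():
    """Example generator whose return value rides on StopIteration.value."""
    yield 1
    yield 2
    return "leftover"
```

Calling for_except(numbers_then_leftover(), print, StopIteration, h) runs 
print on 1 and 2, then hands h the StopIteration instance, whose value 
attribute is "leftover".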

As it stands, throwing a StopIteration in the suite inside a for loop 
doesn't terminate the loop - the exception escapes; by analogy, the 
for-except construct shouldn't trap exceptions from the loop body, only 
those raised by the iterator.
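
That existing behaviour is easy to check. A small sketch for current 
Python 3 (the function name run_loop is made up for illustration): the for 
statement only swallows a StopIteration coming from the iterator, so one 
raised in the body propagates like any other exception.

```python
def run_loop():
    """Raise StopIteration from a loop body and report what happened."""
    seen = []
    try:
        for x in [1, 2, 3]:
            seen.append(x)
            if x == 2:
                # Raised by the body, not by the iterator.
                raise StopIteration("raised by the loop body")
    except StopIteration as exc:
        # The loop did not treat it as normal exhaustion; it escaped to here.
        return seen, exc
    return seen, None
```

Here run_loop() returns the list [1, 2] together with the escaped 
exception, so the loop never reached 3.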

tom

-- 
Chance? Or sinister scientific conspiracy?


