Cleaner idiom for text processing?

Wed May 26 05:54:05 EDT 2004

Duncan Booth wrote:

> Peter Otten <__peter__ at web.de> wrote in
> news:c91got$vpi$00$1 at news.t-online.com:
> 
>> Yet another way to create the dictionary:
>> 
>>>>> import itertools
>>>>> nv = iter("foo 1 bar 2 baz 3\n".split())
>>>>> dict(itertools.izip(nv, nv))
>> {'baz': '3', 'foo': '1', 'bar': '2'}
>>>>>
> 
> You can also do that without using itertools:
> 
>>>> nv = iter("foo 1 bar 2 baz 3\n".split())
>>>> dict(zip(nv,nv))
> {'baz': '3', 'foo': '1', 'bar': '2'}
>>>> 

The advantage of my solution is that izip() yields the pairs lazily, whereas
zip() builds an intermediate list of tuples first.

> However, I'm not sure I trust either of these solutions. I know that
> intuitively it would seem that both zip and izip should act in this way,
> but is the order of consuming the inputs actually guaranteed anywhere?

I think an optimization that changes the order assumed above would be
*really* weird. When passing around an iterator, you could never be sure
whether the previous consumer just read 10 items ahead for efficiency
reasons. Allowing such optimizations would in effect limit iterators to for
loops. Moreover, the calling function has no way of knowing whether reading
ahead would really be efficient, as the first iterator might take a looong
time to yield the next value while the second could just raise StopIteration. If
a way around this is ever found, checking izip()'s arguments for identity
is only a minor complication.
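
To see the consumption order for yourself, here is a minimal sketch. It
assumes a Python 3 interpreter, where the built-in zip() is lazy in the same
way izip() is. logged() is a hypothetical helper written just for this
illustration, not a library function. Note that it only shows what one
implementation does, which is not the same as a documented guarantee.

```python
# Sketch for Python 3, where the built-in zip() is lazy like izip().
# logged() is a hypothetical helper, not part of any library.
def logged(iterable, log):
    """Yield items from iterable, recording each one as it is consumed."""
    for item in iterable:
        log.append(item)
        yield item

log = []
nv = logged("foo 1 bar 2 baz 3\n".split(), log)
pairs = zip(nv, nv)                # the same iterator object, passed twice

first = next(pairs)                # build one pair
consumed_after_first = list(log)   # snapshot of what has been read so far
result = dict([first] + list(pairs))
```

After the first next() call the log should hold just 'foo' and '1', in that
order, so nothing was read ahead and the name came before the value.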

But if that lets you sleep better at night, change Peter Hansen's suggestion
to use islice():

>>> from itertools import izip, islice
>>> nv = "foo 1 bar 2 baz 3\n".split()
>>> dict(izip(islice(nv, 0, None, 2), islice(nv, 1, None, 2)))
{'baz': '3', 'foo': '1', 'bar': '2'}
>>>

However, this is less readable (probably slower too) than the original with
normal slices and therefore not worth the effort for small lists like (I
guess) those in the OP's problem.
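
For comparison, a sketch of both variants side by side. It assumes Python 3
(built-in zip() instead of izip()), and the slice version is my
reconstruction of what "normal slices" would look like, since that code is
not quoted here.

```python
from itertools import islice

nv = "foo 1 bar 2 baz 3\n".split()

# Plain slices: builds two temporary lists, but is easy to read.
by_slices = dict(zip(nv[::2], nv[1::2]))

# islice(): no temporary lists; each islice walks nv independently,
# so the result does not depend on the order in which zip() consumes them.
by_islice = dict(zip(islice(nv, 0, None, 2), islice(nv, 1, None, 2)))
```

Both should produce the same dictionary, so the choice comes down to
readability versus avoiding the temporary lists.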

Peter