Python 3000, zip, *args and iterators
Steven Bethard
steven.bethard at gmail.com
Mon Dec 27 14:14:55 EST 2004
Raymond Hettinger wrote:
> [Steven Bethard]
>
>>What I would prefer is something like:
>>
>> >>> zip(*g(4))
>><iterator object at ...>
>> >>> x, y, z = zip(*g(4))
>> >>> x, y, z
>>(<iterator object at ...>, <iterator object at ...>, <iterator object at ...>)
>
> 2. It is instructive to look at Guido's reactions to other *args
> proposals. His receptivity to a,b,*c=it wanes whenever someone then
> requests support for a,*b,c=it.
Yeah, I've seen his responses to those kinds of suggestions. I don't
think what I'm suggesting (at least in terms of *args) is quite as
extreme though -- I'm still only talking about *args in function
definitions. I'm just suggesting that in a function with a *args in the
def, the args variable be an iterator instead of a tuple. (This doesn't
entirely solve my zip problem of course, but it's the only *args change
I was suggesting.)
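
A minimal sketch of the behaviour in question, written for current Python 3 rather than the hypothetical Python 3000 under discussion. The names show_args and count_up are made up for illustration. It shows that a *args parameter always arrives as a fully built tuple, even when the caller unpacks a lazy generator:

```python
def show_args(*args):
    # In current Python, args is always a tuple, never a lazy iterator.
    return type(args), args


def count_up(n):
    # Hypothetical generator standing in for any lazy source.
    for i in range(n):
        yield i


# Unpacking the generator with * consumes it completely before the call runs.
args_type, args_value = show_args(*count_up(3))
```

Because the generator is exhausted before the function body starts, an unbounded or very large iterable cannot be passed this way without materialising it.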
> Likewise, he considers zip(*args) as a
> transpose function to be an abuse of the *arg protocol.
Ahh, I didn't know that. Is there another (preferred) way to do this?
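
For reference, the zip(*args) transpose idiom mentioned above looks like this on a small in-memory list. This is a sketch in current Python 3, and the sample data is invented:

```python
# Each row is a (number, letter) pair.
rows = [(1, "a"), (2, "b"), (3, "c")]

# Unpacking rows passes each pair as a separate argument to zip,
# which then regroups the values by position.
columns = list(zip(*rows))
```

The * unpacking reads all of rows before zip is called, which is why the idiom does not stay lazy when rows is a generator.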
> 3. The recipe discussion and newsgroup posting present only toy
> examples -- real use cases have not yet emerged.
Ok, I'll try to give you one of my use cases. It's a little
complicated, so sorry if my explanation goes on for a bit here.
Basically, I'm converting one file format to another. The files can be
quite large, so it's important to use iterators wherever possible. My
conversion function is a generator that generates a (label,
feature_dict) pair for each line in the input file.
Now, two possible things can happen at this point (depending on
parameters from the user):
CASE 1: I output the (label, feature_dict) pairs as is, with code
something like:
    for label, feature_dict in generator:
        write_instance(label, feature_dict)
This is, of course, the simple case.
CASE 2: I need to apply a windowing function to the iterables so that
each line includes not only its feature_dict's values, but also the
values of some of the surrounding feature_dicts. Note that I only want
to window the feature_dicts, not the labels. This gives me code
something like:
    labels, feature_dicts = starzip(generator)
    for label, feature_window in izip(labels, window(feature_dicts)):
        write_instance(label, combine_dicts(feature_window))
Note that I can't write the code like:
    for label, feature_dict in generator:
        feature_dict = combine_dicts(window(feature_dict))  # WRONG!
        write_instance(label, feature_dict)
because window produces an iterable from an *iterable* of feature_dicts,
not from a single feature_dict. So basically what I've done here is to
"transpose" (to use your word) the iterators, apply my function, and
then transpose the iterators back.
Hopefully this gives a little better justification for starzip? If you
have a cleaner way to do this kind of thing, I'd welcome any suggestions
of course.
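
One possible way to sketch the CASE 2 pipeline end to end, written for current Python 3 (where the builtin zip is already lazy, like izip). Everything here is an assumption made for illustration, not the actual recipe code: starzip is built on itertools.tee, window is taken to yield one tuple per item containing that item and up to radius neighbours on each side (so its output stays aligned with the labels), combine_dicts is a placeholder merge, and generate_pairs and the written list stand in for the real generator and write_instance.

```python
from collections import deque
from itertools import chain, tee

_PAD = object()  # sentinel marking positions beyond either end of the input


def _column(rows, index):
    # Generator function, so `index` is bound at call time for each column.
    for row in rows:
        yield row[index]


def starzip(iterable):
    """Split an iterable of equal-length tuples into one lazy iterator per position."""
    it = iter(iterable)
    try:
        first = next(it)  # peek at one row to learn how many columns exist
    except StopIteration:
        return ()  # empty input: column count unknown, so no iterators
    copies = tee(chain([first], it), len(first))
    return tuple(_column(copy, i) for i, copy in enumerate(copies))


def window(iterable, radius=1):
    """Yield one tuple per item: the item plus up to `radius` neighbours per side."""
    padded = chain([_PAD] * radius, iterable, [_PAD] * radius)
    buf = deque(maxlen=2 * radius + 1)
    for item in padded:
        buf.append(item)
        if len(buf) == buf.maxlen:
            yield tuple(x for x in buf if x is not _PAD)


def combine_dicts(dicts):
    """Placeholder merge: key each value by (position in window, original key)."""
    combined = {}
    for position, d in enumerate(dicts):
        for key, value in d.items():
            combined[(position, key)] = value
    return combined


def generate_pairs():
    # Stand-in for the real conversion generator.
    yield "A", {"x": 1}
    yield "B", {"x": 2}
    yield "C", {"x": 3}


written = []  # stand-in for write_instance
labels, feature_dicts = starzip(generate_pairs())
for label, feature_window in zip(labels, window(feature_dicts)):
    written.append((label, combine_dicts(feature_window)))
```

With tee, each column iterator buffers only the items the other column has not consumed yet. Since labels and the windowed feature_dicts advance almost in lockstep here, that buffer should stay on the order of the window radius instead of the whole file.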
If zip(*) is discouraged as a transpose function, maybe I should be
lobbying for adding a transpose function instead? (For now, of course,
it would go into itertools, but when iterators become the standard in
Python 3.0, maybe it could be moved into the builtins...)
Thanks for your comments!
Steve