Python Iterables struggling using map() built-in

Mon Dec 8 21:50:25 EST 2014

Roy Smith wrote:

> Chris Angelico wrote:
>> > I'm actually glad PEP 479 will break this kind of code. Gives a good
>> > excuse for rewriting it to be more readable.
> 
> Steven D'Aprano <steve+comp.lang.python at pearwood.info> wrote:
>> What kind of code is that? Short, simple, Pythonic and elegant? :-)
>> 
>> Here's the code again, with indentation fixed:
>> 
>> 
>> def myzip(*args):
>>     iters = map(iter, args)
>>     while iters:
>>         res = [next(i) for i in iters]
>>         yield tuple(res)
> 
> Ugh.  When I see "while foo", my brain says, "OK, you're about to see a
> loop which is controlled by the value of foo being changed inside the
> loop".

Yes. Me too. 99% of the time when you see "while foo", that's what you'll
get, so it's the safe assumption. But it's only an assumption, not a
requirement. When you read a bit more of the code and see that iters isn't
being modified, your reaction ought to be closer to "oh wow, that's neat"
than "oh noes it's different from what I expected".

"while foo" is logically equivalent to "if foo: while foo:". The if is
completely redundant.
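
To make that concrete, here is a sketch of the quoted function adapted for
Python 3. It is an adaptation, not the code under discussion: the name
myzip3 is made up for illustration, the list() call is needed because
Python 3's map() returns an iterator that is always true (in Python 2 it
returned a list), and the explicit try/except is one way to satisfy
PEP 479, which turns a StopIteration escaping a generator body into a
RuntimeError.

```python
def myzip3(*args):
    # Build a real list: an empty list is false, so with no arguments
    # the loop below is skipped entirely. This is the implicit "if".
    iters = list(map(iter, args))
    while iters:  # never modified inside the loop; it is only a guard
        res = []
        for it in iters:
            try:
                res.append(next(it))
            except StopIteration:
                # End the generator explicitly instead of letting
                # StopIteration leak out of the body (PEP 479).
                return
        yield tuple(res)
```

With at least one argument the "while" behaves like "while True", and the
only way out is the return when some iterator is exhausted.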

> That's not at all what's happening here, so my brain runs into a 
> wall.

I hope you are exaggerating for effect, because if you genuinely mean that
reading that code causes major mental trauma (perhaps the equivalent of a
mental BSOD) then you've probably picked the wrong industry to be working
in. Imagine how you would cope reading genuinely obfuscated code. You would
probably have a nervous breakdown :-)

It's okay to read code which forces you to reevaluate your initial
assumption about the code. People, especially (allegedly) smart people like
programmers, are intelligent and flexible. If you can't do that, you're
going to hate Python:

- Python has no "repeat N times" loop; we have to use "for i in range(...)"
instead, so seeing a for-loop doesn't necessarily mean that the loop
variable will be used. Sometimes it isn't.

- I cannot count the number of times I've read, or written, a method that
doesn't use "self", but doesn't bother to declare it as a staticmethod.

- The official way to get a single arbitrary value from a set without
removing it is:

    for value in the_set:
        return value

GvR recently gave an example of how to process a single element in a
possibly-empty iterator:

    for x in it:
        print(x)
        break
    else:
        print('nothing')

so there are two examples of using a for-loop to *not* loop over something.

- Duck typing. Just because some code is using a goose, doesn't mean that a
goose is required. Perhaps a duck is required but a goose is close enough.
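
The two for-loop idioms above can be wrapped in functions so they can be
tried out. This is only a sketch: the names peek and first_or_nothing are
hypothetical, and raising KeyError for an empty set is an assumption made
here to mirror what set.pop() does.

```python
def peek(the_set):
    """Return an arbitrary element of the_set without removing it."""
    for value in the_set:
        return value  # leaves on the first pass, so nothing is looped over
    # Assumption: behave like set.pop() when there is nothing to return.
    raise KeyError('peek from an empty set')


def first_or_nothing(it):
    """Return the first element of a possibly-empty iterable, else 'nothing'."""
    for x in it:
        result = x
        break
    else:
        # The else clause runs only when the loop ends without a break,
        # which here means the iterable was empty.
        result = 'nothing'
    return result
```

In both functions the for statement is used to fetch at most one item, not
to iterate.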

> Next problem, what the heck is "res"?  We're not back in the punch-card
> days.  We don't have to abbreviate variable names to save columns.

*shrug* I didn't pick the name. But "res" is a standard abbreviation
for "result" or "resource", and from context it clearly should be "result".

[...]
>> I think this function makes a good test to separate the masters from the
>> apprentices.
> 
> The goal of good code is NOT to separate the masters from the
> apprentices.  The goal of good code is to be correct 

So far I agree.

> and easy to 
> understand by the next guy who comes along to maintain it.

No. That is *one secondary goal*. Efficiency is another secondary goal.
Sometimes education is even more important, and in this specific case the
function is being used to teach people.

Another secondary goal ought to be beauty and elegance over ugliness. It's
not often a beautiful function actually is good enough for production use.
It's usually surrounded by an inelegant if not downright ugly pile of code
testing arguments, checking for error conditions, handling corner cases,
etc. That ugliness can obfuscate the underlying algorithm and make the
function harder to understand. If it isn't *necessary*, take it out.

    "Perfection is achieved, not when there is nothing more to add, 
    but when there is nothing left to take away."

While ease of maintenance is an important goal, think about what we are
discussing. zip() is a primitive function. Once you have decided on the
public API, it will probably never need any maintenance. It's not like the
requirements will change -- "yeah, we used to want to zip the items
together, but now we need to add a 9% superannuation surcharge to them
first".

The beauty of this code is that it is so simple that it cannot fail to be
bug-free. (Am I wrong? Have I missed a corner-case?) There's no unnecessary
code in the function that can hide bugs and obfuscate what it does.

And it is so simple that it's hard to see anything written in pure Python
being more efficient. So as far as maintenance goes, that's irrelevant.
That function is simply finished, done, complete. Anything you do to
improve it can only make it slower, buggier, or uglier.

(And that's a rare and beautiful thing in code.)

>> If you can read this function and instantly tell how it works, that it is
>> bug-free and duplicates the behaviour of the built-in zip(), you're
>> probably Raymond Hettinger. If you can tell what it does but you have to
>> think about it for a minute or two before you understand why it works,
>> you can call yourself a Python master. If you have to sit down with the
>> interactive interpreter and experiment for a bit to understand it, you're
>> doing pretty well.
> 
> That pretty much is the point I'm trying to make.  If the code is so
> complicated that masters can only understand it after a couple of
> minutes of thought, 

No, you have misunderstood me. Yet again, it seems, my use of language is
too subtle and I fail to get my point across :-(

If you can read that function and immediately understand that it zips up
items, then you have achieved mastery over Python. You are no longer a
beginner who has to puzzle over questions like "what does [next(i) for i in
iters] do?". That's the first step.

The second step is to tell that the code is correct. It doesn't just zip up
items, but it handles the corner cases correctly: what if there are no
arguments at all? What if the arguments are of different sizes? What if
they're all infinite iterators? If some of the arguments are not iterable,
will it raise correctly? It's okay if you need to think for a minute or two
to convince yourself that this is the case. You have still achieved mastery
over Python.
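
Those corner cases can also be written down as checks against the built-in
zip(). The sketch below is illustrative only: check_corner_cases is a
hypothetical helper, and myzip3 is a Python 3 adaptation of the quoted
function (list() around map(), and an explicit return on StopIteration to
satisfy PEP 479), not the original code.

```python
from itertools import count, islice


def myzip3(*args):
    # Python 3 adaptation of the quoted myzip (an assumption, see above).
    iters = list(map(iter, args))
    while iters:
        res = []
        for it in iters:
            try:
                res.append(next(it))
            except StopIteration:
                return
        yield tuple(res)


def check_corner_cases(candidate):
    # No arguments: nothing to zip, so nothing is produced.
    assert list(candidate()) == list(zip()) == []
    # A single argument: a stream of 1-tuples.
    assert list(candidate('abc')) == list(zip('abc'))
    # Arguments of different sizes: stop at the shortest.
    assert list(candidate('abcd', [1, 2])) == list(zip('abcd', [1, 2]))
    # All infinite iterators: must stay lazy, so only sample a few items.
    sample = list(islice(candidate(count(), count(10)), 3))
    assert sample == [(0, 10), (1, 11), (2, 12)]
    # A non-iterable argument: TypeError by the time the first item
    # is requested.
    try:
        next(candidate('abc', 42))
    except TypeError:
        pass
    else:
        raise AssertionError('expected TypeError')
    return True
```

One difference worth noting: because myzip3 is a generator, the TypeError
for a non-iterable argument appears on the first next() call, whereas the
built-in zip() raises it as soon as it is called.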

> and those of us who are just "doing pretty well" 
> need to sit down and puzzle it out in the REPL, then it's too
> complicated for most people to understand.  KISS beats elegant.

I love the REPL as much as, if not more than, most programmers. It pains me
to think about using any language without one. But I don't believe for a
second that you *needed* to use the REPL to understand this function. You
might have chosen to do so, "just to be sure", but I'm confident that if I
had asked you to analyse the code just in your head you could have done so
correctly.

That's what I did, and I don't think I'm a better Python programmer than
you.

And for what it's worth, I too tried it in the REPL. Why? Not because I
couldn't understand the function, but because I feared I may have missed
some odd corner case. I'm pretty sure I have got them all: no arguments, a
single argument, different sized arguments. That's part of the "minute or
two" to be sure that the function is correct.

-- 
Steven