Re: Python Mystery Theatre -- Episode 2: Así Fue

Tue Jul 15 10:27:00 EDT 2003

Raymond Hettinger wrote:
> Here are four more mini-mysteries for your amusement
> and edification.
> 
> In this episode, the program output is not shown.
> Your goal is to predict the output and, if anything
> mysterious occurs, then explain what happened
> (again, in blindingly obvious terms).
> 
> There's extra credit for giving a design insight as to
> why things are as they are.
> 
> Try to solve these without looking at the other posts.
> Let me know if you learned something new along the way.
> 
> To challenge those who thought the last episode
> was too easy, I've included one undocumented wrinkle
> known only to those who have read the code.
> 

I thought this one was much tougher than Episode 1. I ended up doing a 
lot of research on it. I haven't read the other answers yet; I've 
been holding off until I finished this. (Having reread my response, I 
apologize for the length. I don't think I scored so well on "blindingly 
obvious".) Here goes ...

> ACT I -----------------------------------------------
> print '*%*r*' % (10, 'guido')
> print '*%.*f*' % ((42,) * 2)

This one wasn't hard. I've used this feature before. The stars at the 
front and back are just visual noise. A star inside a format specifier 
means that the field width (or precision) is taken from the argument 
tuple. Thus the first one prints the representation (%r) of the string 
'guido' in a ten-character-wide field. When I tried it, the only thing 
I missed was that the representation of 'guido' is "'guido'", not 
"guido". So the first one prints out:

*   'guido'*

rather than:

*     guido*

which would have been my first guess.

The second one takes just a little more thought. The result of this is
equivalent to:

print '*%.42f*' % 42

which yields

*42.000000000000000000000000000000000000000000*

That is a fixed point number with 42 digits after the decimal point. 
(Yes, I did copy that from Idle rather than counting zeros.)

Aside: I have to admit that the ((42,) * 2) did confuse me at first. I'm 
so used to doing 2 * (42,) when I want to repeat a sequence that I 
hadn't thought about the reversed form.

Having used this feature before, I have to say that I think the 
documentation for how to do this is quite comprehensible.
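
For anyone who hasn't used it, here is a small sketch of the star 
feature. The width of 10 and the string come from Act I; the precision 
of 3 is my own choice to keep the output short, and the variable names 
are only for illustration.

```python
# A '*' in a format specifier is replaced by the next item of the
# argument tuple, so width and precision can be computed at run time.
padded = '*%*r*' % (10, 'guido')   # width 10, then the value for %r
fixed = '*%.*f*' % (3, 42)         # precision 3, then the value for %f

print(padded)   # *   'guido'*
print(fixed)    # *42.000*
```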

> ACT II -----------------------------------------------
> s = '0100'
> print int(s)
> for b in (16, 10, 8, 2, 0, -909, -1000, None):
>     print b, int(s, b)

Boy! This one sent me to the documentation, and finally to the code.

According to the documentation the legal values for the parameter b are
b = 0 or 2 <= b <= 36. So the first print yields 100 (the default base 
for a string is 10 if not specified). The next few lines of output are:

16 256
10 100
8 64
2 4
0 64

The only one that deserves an additional comment is the last line. 
According to the documentation, a base of 0 means that the number is 
interpreted as if it appeared in program text. In this case, since the 
string begins with a '0', it's interpreted as base 8.

Let's skip -909 for a moment. -1000 raises an exception, which ends the 
loop. None would also raise an exception if we ever got there. I find 
that one a little non-intuitive too; more about that later.

For no immediately apparent reason (Raymond's undocumented wrinkle!), 
the next line of the output (after the above) is:

-909 100

The only way I found that out was by trying it. After hunting through 
the code (Yes, I have no problem with C. No, I'm not familiar with the 
organization of the Python source.) I eventually found out (see int_new 
in intobject.c) that the int function (actually new for the int 
type) looks like it was defined as:

def int(x, b=-909):
     ...

That is, the default value for b is -909. So, int('0100', -909) has the 
same behavior as int('0100'). This explains the result.
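
To make the documented part concrete, here is a small sketch that 
sticks to the legal bases and catches the errors instead of letting the 
loop die. The '0x100' string and the names results, prefixed and errors 
are mine, not Raymond's. I've left -909 out on purpose, since it is an 
implementation detail and I wouldn't count on it.

```python
s = '0100'

# Documented bases: the same digits mean different numbers.
results = {}
for b in (16, 10, 8, 2):
    results[b] = int(s, b)
print(results)

# Base 0 takes the base from the prefix, as in program text.
prefixed = int('0x100', 0)
print(prefixed)   # 256

# Bases outside the documented range are rejected.
errors = {}
for b in (-1000, None):
    try:
        int(s, b)
    except ValueError:
        errors[b] = 'ValueError'
    except TypeError:
        errors[b] = 'TypeError'
print(errors)
```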

Having read the code, I now understand _all_ about how this function 
works. I understand why there is a default value. For example:

int(100L) yields 100, but there is no documented value for b such that 
int(100L, b) yields anything except a TypeError. However, using b=-909 
is the same as not specifying b. This allows me to write code like:

if type(x) is str:
     b = 16
else:
     b = -909
return int(x, b)

I'm not really sure whether that's better than, for example

if type(x) is str:
     return int(x, 16)
else:
     return int(x)

or not. However, I find the use of the constant -909 is definitely 
"magic". If it were up to me, I would use a default value of b = None, 
so that int(x) and int(x, None) are equivalent. It seems to me that 
this could be documented and would not be subject to misinterpretation.
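
A rough sketch of what I mean, written as a wrapper rather than as a 
change to int itself. The name to_int is hypothetical; nothing like it 
exists in the library.

```python
def to_int(x, base=None):
    """Hypothetical wrapper: base=None means 'no base was given'."""
    if base is None:
        return int(x)        # numbers, or base-10 strings
    return int(x, base)      # strings in an explicit base

print(to_int('0100'))        # 100
print(to_int('0100', 16))    # 256
print(to_int(100.0))         # 100
```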

> ACT III ----------------------------------------------------
> def once(x): return x
> def twice(x): return 2*x
> def thrice(x): return 3*x
> funcs = [once, twice, thrice]
> 
> flim = [lambda x:funcs[0](x), lambda x:funcs[1](x), lambda x:funcs[2](x)]
> flam = [lambda x:f(x) for f in funcs]
> 
> print flim[0](1), flim[1](1), flim[2](1)
> print flam[0](1), flam[1](1), flam[2](1)

This one was ugly. I guessed the right answer but then had to do some 
more research to understand exactly what was going wrong.

The first line prints 1, 2, 3 just like you expect.

My first reaction was that the second line also prints 1, 2, 3. But 
Raymond wouldn't have asked the question if it was that easy. So, 
assuming that something funny happens, I guessed 3, 3, 3. I tried it. 
Good guessing.

Now why?

After a bunch of screwing around (including wondering about the details 
of how the interpreter implements lambda expressions), at one point I 
tried the following (in Idle):

for f in flam: print f(1)

and wondered why I got an exception for exceeding the maximum recursion 
limit. What I finally realized was that the definition of flam 
repeatedly binds the variable f to each of the functions in funcs. The 
lambda expression defines a function that calls whatever f refers to at 
the time of the call. At the end of the execution of that statement, f 
is thrice, so all three of the defined lambdas call thrice. That also 
explains why I hit the maximum recursion limit: my for loop rebound 
that same f to the first lambda, which then called itself through f.

At this point I felt like I had egg on my face. I've been burned by this 
one in the past, and I spent a while figuring it out then. The fix is easy:

flam = [lambda x, fn=f: fn(x) for f in funcs]

which creates a new local binding, fn, whose default value is evaluated 
when the lambda is created and so captures the correct function at 
each iteration. This is the kind of problem that makes me wonder 
whether we ought to rethink the binding of variables in loops.
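
Here are the broken and the fixed versions side by side, as a minimal 
sketch. The names late and early are mine. I collect the results with a 
loop variable called g so that nothing rebinds f after the lists are 
built.

```python
def once(x): return x
def twice(x): return 2*x
def thrice(x): return 3*x
funcs = [once, twice, thrice]

# f is looked up when the lambda is called, after the loop is done.
late = [lambda x: f(x) for f in funcs]
# The default argument is evaluated when the lambda is created.
early = [lambda x, fn=f: fn(x) for f in funcs]

late_results = [g(1) for g in late]
early_results = [g(1) for g in early]
print(late_results)    # [3, 3, 3]
print(early_results)   # [1, 2, 3]
```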

> ACT IV ----------------------------------------------------
> import os
> os.environ['one'] = 'Now there are'
> os.putenv('two', 'three')
> print os.getenv('one'), os.getenv('two')

Obviously, this one is trying to trick you into thinking it will print 
'Now there are three'. I ended up trying it and getting 'Now there are 
None'. Then I went back and read the documentation. What I got confused 
about was that os.putenv updates the external environment without 
changing the contents of os.environ. Updating os.environ will change the 
external environment as a side effect. I had read about this before but 
had gotten the two behaviors reversed in my head.
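
A minimal sketch of the difference. The variable names MYSTERY_ONE and 
MYSTERY_TWO are made up, chosen so that they are unlikely to be set 
already.

```python
import os

# Assigning through os.environ updates the mapping and the real environment.
os.environ['MYSTERY_ONE'] = 'Now there are'
# os.putenv changes the real environment but leaves os.environ alone.
os.putenv('MYSTERY_TWO', 'three')

first = os.getenv('MYSTERY_ONE')
second = os.getenv('MYSTERY_TWO')   # os.getenv looks in os.environ
print(first)
print(second)
```

So if you want later calls to os.getenv to see a value, assign it 
through os.environ rather than calling os.putenv directly.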

Now, why is it this way? It makes sense that you may have a use case for 
changing the external environment without changing the contents of 
os.environ and so need a mechanism for doing so. However, on reflection, 
I'm not sure whether I think the implemented mechanism is 
counter-intuitive or not.