[Python-ideas] fixing mutable default argument values

Jan Kanis jan.kanis at phil.uu.nl
Thu Feb 1 04:33:09 CET 2007


(my response is a bit late, I needed some time to come up with a good  
answer to your objections)

On Tue, 30 Jan 2007 16:48:54 +0100, Greg Falcon <veloso at verylowsodium.com>  
wrote:

> On 1/30/07, Jan Kanis <jan.kanis at phil.uu.nl> wrote:
>> On the other hand, are there really any good reasons to choose the  
>> current
>> semantics of evaluation at definition time?
>
> While I sympathize with the programmer that falls for this common
> Python gotcha, and would not have minded if Python's semantics were
> different from the start (though the current behavior is cleaner and
> more consistent), making such a radical change to such a core part of
> the language semantics now is a very bad idea for many reasons.

It would be a Python 3.0 change. Other important stuff is going to change as  
well. This part of Python is IMO not so central to the core that it  
can't change at all. Especially since the overwhelming majority of all  
uses of default args have immutable values (judging by the usage in the  
std lib), so their behaviour isn't going to change anyway.  
Things like list comprehensions and generators were a much greater change  
to Python, drastically changing the way an idiomatic Python program is  
written. They were added in 2.x because they could be implemented backward  
compatibly. With Python 3.0, backward compatibility isn't so important  
anymore. The whole reason for Python 3.0's existence is to fix  
backward-incompatible stuff.
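For anyone who hasn't run into the gotcha under discussion, a minimal illustration (in modern print-function syntax):

```python
def append_to(item, target=[]):
    # the [] is evaluated once, at definition time, and then shared
    # between all calls that don't pass target explicitly
    target.append(item)
    return target

first = append_to(1)
second = append_to(2)
print(first, second)  # both name the same list: [1, 2] [1, 2]
```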

>> What I've heard basically
>> boils down to two arguments:
>> - "let's not change anything", i.e. resist change because it is change,
>> which I don't think is a very pythonic argument.
>
> The argument here is not "let's not change anything because it's
> change," but rather "let's not break large amounts of existing code
> without a very good reason."  As has been stated here by others,
> making obsolete a common two-line idiom is not a compelling enough
> reason to do so.

Py3k is going to break large amounts of code anyway; this PEP certainly  
won't account for most of it. And there's going to be an automatic py2 ->  
py3 refactoring tool that can catch any possible breakage from this PEP as  
well.

> Helping out beginning Python programmers, while well-intentioned,
> doesn't feel like enough of a motivation either.  Notice that the main
> challenge for the novice programmer is not to learn how default
> arguments work -- novices can learn to recognize and write the idiom
> easily enough -- but rather to learn how variables and objects work in
> general.
[snip]
> At some point in his Python career, a novice is going to have to
> understand why b "changed" but d didn't.  Fixing the default argument
> "wart" doesn't remove the necessity to understand the nature of
> mutable objects and variable bindings in Python; it just postpones the
> problem.  This is a fact worth keeping in mind when deciding whether
> the sweeping change in semantics is worth the costs.

The change was never intended to prevent newbies from learning about  
Python's object model. There are other ways to do that. But keeping a  
'wart' because newbies will learn from it seems like really bad reasoning,  
language-design wise.

>> - Arguments based on the assumption that people actually do make lots of
>> use of the fact that default arguments are shared between function
>> invocations, many of which will result in (much) more code if it has to  
>> be
>> transformed to using one of the alternative idioms. If this is true, it  
>> is
>> a valid argument. I guess there's still some stdlib grepping to do to
>> decide this.
>
> Though it's been decried here as unPythonic, I can't be the only
> person who uses the idiom
> def foo(..., cache={}):
> for making a cache when the function in question does not rise to the
> level of deserving to be a class object instead.  I don't apologize
> for finding it less ugly than using a global variable.

How often do you use this compared to the x=None idiom?

This idiom is really the only idiom that's going to break. There are many  
ways around it; I wouldn't mind an @cache(var={}) decorator somewhere  
(perhaps in the stdlib). This kind of thing seems to be exactly what  
decorators are good at.

> I know I'm not the only user of the idiom because I didn't invent it
> -- I learned it from the Python community.  And the fact that people
> have already found usages of the current default argument behavior in
> the standard library is an argument against the "unPythonic" claim.
>
> I'm reminded of GvR's post on what happened when he made strings
> non-iterable in a local build (iterable strings being another "wart"
> that people thought needed fixing):
> http://mail.python.org/pipermail/python-3000/2006-April/000824.html

In that thread, Guido is at first in favour of making strings  
non-iterable, one of the arguments being that it sometimes bites people  
who expect e.g. a list of strings and get a single string. He decides not  
to make the change because there appear to be a number of valid use cases  
that are hard to change, and the number of people actually getting bitten  
by it is quite small. (To support that last part, note for example that  
none of the 'Python problems' pages listed in the PEP talk about string  
iteration, while all talk about default arguments, some with dire warnings  
and quite a bit of text.)
In the end, the numbers are going to be important. There seems to be only  
a single use case in favour of definition-time semantics for default  
values (caching), which isn't very hard to do in a different way.  
Though seasoned Python programmers don't get bitten by default args all  
the time, they have to work around them all the time using =None.
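That workaround is the familiar sentinel pattern; a minimal sketch (names are illustrative):

```python
def collect(item, bucket=None):
    # the sentinel check runs on every call, so each call that omits
    # bucket gets a fresh list instead of a shared one
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

print(collect(1))  # [1]
print(collect(2))  # [2] -- a new list each time, unlike a bare bucket=[]
```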

If it turns out that people are actually using caching and other idioms  
that require definition time semantics all the time, and the =None idiom  
is used only very rarely, I'd be all in favour of rejecting this pep.

>
>> So, are there any _other_ arguments in favour of the current semantics??
>
> Yes.  First, consistency.

[factoring out the first argument into another email. It's taking me some  
effort to get my head around the early/late binding part of the generator  
expressions PEP, and the way you find an argument in it. As far as I  
currently understand it, either you or I do not understand that part of  
the PEP correctly. I'll try to get this mail out sometime tomorrow]

> Second, the a tool can't fix all usages of the old idiom.  When things
> break, they can break in subtle or confusing ways.  Consider my module
> "greeter":
>
> == begin greeter.py ==
> import sys
> def say_hi(out = sys.stdout):
>  print >> out, "Hi!"
> del sys # don't want to leak greeter.sys to the outside world
> == end greeter.py ==
>
> Nothing I've done here is strange or unidiomatic, and yet your
> proposed change breaks it, and it's unclear how an automated tool
> should fix it.

Sure this can be fixed by a tool:

import sys
@caching(out = sys.stdout)
def say_hi(out):
	print >> out, "Hi!"
del sys

where the function with the 'caching' wrapper checks whether an argument  
for 'out' is provided, and supplies it itself otherwise. The caching(out =  
sys.stdout) is actually a function _call_, so its sys.stdout argument gets  
evaluated immediately.

possible implementation of caching:

def caching(**cachevars):
	def inner(func):
		def wrapper(**argdict):
			# supply each cached default only when the caller didn't pass it
			for var in cachevars:
				if var not in argdict:
					argdict[var] = cachevars[var]
			return func(**argdict)
		return wrapper
	return inner

Defining a decorator unfortunately requires three levels of nested  
functions, but apart from that the thing is pretty straightforward, and it  
only needs to be defined once to be used on every occurrence of the  
caching idiom.
It doesn't currently handle positional arguments, but that can be added.
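A sketch of that extension, mapping positional arguments onto their parameter names via introspection (inspect.getfullargspec here; the names and the record example are illustrative, not part of any proposal):

```python
import inspect

def caching(**cachevars):
    def inner(func):
        argnames = inspect.getfullargspec(func).args
        def wrapper(*args, **kwargs):
            # map positional arguments onto their parameter names,
            # then fill in any cached default the caller didn't supply
            given = dict(zip(argnames, args))
            given.update(kwargs)
            for var, default in cachevars.items():
                if var not in given:
                    given[var] = default
            return func(**given)
        return wrapper
    return inner

@caching(history=[])
def record(value, history):
    history.append(value)
    return history

print(record(1))      # [1]
print(record(2))      # [1, 2] -- the cached list is shared, as today
print(record(3, []))  # [3]    -- an explicit argument bypasses the cache
```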

> What's worse about the breakage is that it doesn't
> break when greeter is imported,

That's true of any function with a bug in it. Do you want to abandon  
functions altogether?

> or even when greeter.say_hi is called
> with an argument.

Currently for people using x=None, the `if x is None: <calculate default  
value>` check is a branch in the code. That's why you need to test _all_  
possible branches in your unit tests. Analogously, you need to test all  
combinations of arguments if you want to catch as many bugs as possible.

> It might take a while before getting a very
> surprising error "global name 'sys' is not defined".


However, your greeter module actually has a slight bug. What if I do this:

import sys, greeter
sys.stdout = my_output_proxy()
greeter.say_hi()


Now say_hi() still uses the old sys.stdout, which is most likely not what  
you want. If greeter were implemented like this:

import sys as _sys
def say_hi(out = _sys.stdout):
	print >> out, "Hi!"


under the proposed semantics, it would all by itself do a late binding of  
_sys.stdout, so when I change sys.stdout somewhere else, say_hi uses the  
new stdout.
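The difference is easy to demonstrate today by contrasting definition-time binding with the =None workaround, which already behaves like the proposed late binding (print-function syntax, with io.StringIO standing in for the output proxy):

```python
import io
import sys

def say_hi_early(out=sys.stdout):   # out is bound once, at definition time
    print("Hi!", file=out)

def say_hi_late(out=None):          # sys.stdout is re-read at call time
    if out is None:
        out = sys.stdout
    print("Hi!", file=out)

proxy = io.StringIO()
saved, sys.stdout = sys.stdout, proxy
try:
    say_hi_late()    # writes to the proxy
    say_hi_early()   # still writes to the stdout captured at definition time
finally:
    sys.stdout = saved

print(repr(proxy.getvalue()))  # 'Hi!\n' -- only the late-bound call
```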
Deleting sys in order not to 'leak' it to any other module is really not  
useful. Everybody knows that Python does not actually enforce  
encapsulation, nor does it provide any kind of security barrier between  
modules. So if some other module wants to get at sys, it can get there  
anyway; and if you want to indicate that sys isn't exported and greeter's  
sys shouldn't be messed around with, the renaming import above does that  
just fine.

> Third, the old idiom is less surprising.
>
> def foo(x=None):
>  if x is None:
>    x=<some_expr>
>
> <some_expr> may take arbitrarily long to complete.  It may have side
> effects.  It may throw an exception.  It is evaluated inside the
> function call, but only evaluated when the default value is used (or
> the function is passed None).
>
> There is nothing surprising about any of that.  Now:
>
> def foo(x=<some_expr>):
>  pass
>
> Everything I said before applies.  The expression can take a long
> time, have side effects, throw an exception.  It is conditionally
> evaluated inside the function call.
>
> Only now, all of that is terribly confusing and surprising (IMO).

Read the "What's New in Python 3.0" (assuming the PEP gets incorporated,  
of course).
Exception tracebacks and profiler stats will point you at the right line,  
and you will figure it out. As you said above, all of this is true under  
the current =None idiom, so there are no totally new ways in which a  
program can break. If you know the ways current Python can break (taking  
too long, unwanted side effects, exceptions) you will figure it out in the  
new version.
Anyway, many Python newbies consider it confusing and surprising that an  
empty-list default value doesn't stay empty, and all other Pythoneers have  
to work around it a lot of the time. It will be a pretty unique Python  
programmer whose program breaks in the ways mentioned above because the  
default expression is evaluated at call time, yet wouldn't have broken  
under Python's current behaviour, and who isn't able to figure out what  
happened in a reasonable amount of time. So even if your argument holds,  
it will still be a net win to accept the PEP.

>
>
> Greg F
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas

- Jan


