What is a function parameter =[] for?

Thu Nov 19 07:19:53 EST 2015

On Thu, 19 Nov 2015 10:14 am, BartC wrote:

> On 18/11/2015 22:11, Ian Kelly wrote:

>> The list0 parameter has a default value, which is [], an initially
>> empty list. The default value is evaluated when the function is
>> defined, not when it is called, so the same list object is used each
>> time and changes to the list are consequently retained between calls.
> 
> That is really bizarre behaviour.

It is standard early binding behaviour, applied to function defaults.
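Here is a minimal sketch of the behaviour Ian describes. The function name `append_item` is made up for illustration. The default list is created once, when `def` runs, and it lives on the function object in `__defaults__`, so every call that omits `list0` mutates that same list:

```python
def append_item(item, list0=[]):
    # list0's default [] was evaluated once, when this def statement ran
    list0.append(item)
    return list0

first = append_item(1)
second = append_item(2)
print(first, second)             # [1, 2] [1, 2]
print(first is second)           # True: the same default list both times
print(append_item.__defaults__)  # ([1, 2],): the stored default has changed
```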

Early versus late binding crops up in many places when programming:

https://support.microsoft.com/en-us/kb/245115

http://javascript.info/tutorial/binding

https://msdn.microsoft.com/en-us/library/0tcf61s1.aspx

Unfortunately, like many evocative or useful terms in computing, the term
isn't used consistently. There are at least two related, but distinct, uses
for "early/late binding", and Wikipedia only talks about the *other* one:

https://en.wikipedia.org/wiki/Late_binding

In Python terms, *all* (or nearly all?) name binding (assignment) is
equivalent to C++ late binding, a.k.a. "dynamic dispatch". But in Python,
"early/late binding" is also used to describe when a function's default
value is evaluated and bound to the parameter.

Consider this pair of functions:

import random
import time

def expensive():
    # Simulate some expensive calculation or procedure.
    time.sleep(100)
    return random.randint(1, 6)

def demo(arg=expensive()):
    return arg + 1

Now we call the second function four times, without an argument so that the
default value is used:

demo()
demo()
demo()
demo()

What results do you expect?

There are two standard behaviours:

Early binding means that the default value is generated *once*, when the
function `demo` is created. That's expensive, but it only happens once; the
default value is then fixed, and can be retrieved almost instantly in
subsequent calls. That's what Python uses.
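To see this without waiting 100 seconds, here is a sketch that swaps the sleep and the random number for a call counter (the counter `calls` and the fixed return value 41 are my own stand-ins). `expensive()` runs exactly once, at definition time, and all four calls reuse its result:

```python
calls = 0

def expensive():
    # Stand-in for a slow computation; counts how often it actually runs.
    global calls
    calls += 1
    return 41

def demo(arg=expensive()):  # expensive() runs here, once, when def executes
    return arg + 1

print(calls)                          # 1: evaluated when demo was defined
results = [demo() for _ in range(4)]
print(results)                        # [42, 42, 42, 42]
print(calls)                          # still 1
```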

Late binding means that the default value is generated every time you call
the function. That's expensive on every call, and confusing. Why is the function
parameter list being re-evaluated each time the function is called? Why is
the default value different each time I call the function? Now *that's*
bizarre.

Compared to the weirdness of late binding, Python's early binding makes much
more sense.

Another advantage of early binding is that it involves a consistent
execution model: only the function body is executed when you call the
function, not the function declaration and parameter list.

def demo(arg=expensive()):  # declaration, including the parameter list
    # function body is indented
    return arg + 1

Now it is easy to tell which part gets executed when: just look at the
indentation.
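A small trace makes the ordering visible. This is a sketch; `log` and `default_value` are hypothetical names. The default expression in the declaration runs when the `def` statement executes, and only the indented body runs on each call:

```python
log = []

def default_value():
    log.append("default evaluated")
    return 0

def demo(arg=default_value()):   # declaration: default evaluated right now
    log.append("body executed")  # body: runs on every call
    return arg

log.append("def finished")
demo()
demo()
print(log)
# ['default evaluated', 'def finished', 'body executed', 'body executed']
```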

Both early and late binding are useful, so which should a language default
to? (Assuming it doesn't offer both.) I think that there is absolutely no
doubt that function defaults should use early binding. The overwhelming
advantage of early binding is this:

- if the language defaults to early binding, it is *easy* for the 
  programmer to get late binding semantics;

- if the language defaults to late binding, it is *very difficult*
  for the programmer to get early binding semantics.

Given early binding, like Python has, it is easy to get late binding
semantics. All you have to do is use a sentinel value, and move the code
you want to execute every time the function is called into the body of the
function. The most common sentinel is None:

def demo(arg=None):
    if arg is None:
        arg = expensive()
    ...

Now it is obvious that expensive() will be called each time you call demo()
(with no argument provided), since the call to expensive is inside the
function body.
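Here is a runnable version of the sentinel idiom, again with a counter standing in for the real expensive call (my assumption, to make re-evaluation visible). Each call that omits `arg` pays for `expensive()`; an explicit argument skips it entirely:

```python
calls = 0

def expensive():
    global calls
    calls += 1
    return calls  # a different value each time, so re-evaluation shows up

def demo(arg=None):
    # None is the sentinel meaning "no argument supplied"
    if arg is None:
        arg = expensive()  # evaluated on every call that omits arg
    return arg + 1

late_results = [demo() for _ in range(3)]
print(late_results)   # [2, 3, 4]
explicit = demo(10)
print(explicit)       # 11: expensive() was not called
print(calls)          # 3
```

If None is itself a meaningful argument, the usual variation is a private sentinel such as `_MISSING = object()`, tested with `arg is _MISSING`.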

But let's try going the other way. Suppose function defaults were evaluated
each and every time you called the function. How could you *avoid* the
expense and waste of re-evaluating the default over and over again?

You can't, or at least, not cleanly and easily. The most obvious way is to
use a global variable:

ARG = expensive()

def demo(arg=ARG):
    ...

This is ... horrible. You are still evaluating the default each time, but at
least it is only a global variable lookup, not an expensive function call.
But the cost is great. Now you have to pollute the module with a global
variable for every function that has a default value. This breaks
encapsulation -- the default value for the function is no longer visible in
the function's declaration, you have to hunt for it through your module.
And what if somebody changes the global, or deletes it?
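Python can't be switched to late-bound defaults, but the hypothetical can be mimicked by reading the global inside the body on every call. In this sketch `demo_late` is that imitation and `demo_early` is ordinary Python. Changing the global silently changes the late "default", and deleting it breaks the function, while the early-bound default is unaffected:

```python
ARG = 5

def demo_early(arg=ARG):
    # Real Python: the value of ARG (5) is captured once, when def runs
    return arg + 1

def demo_late(arg=None):
    # Imitation of a hypothetical late-binding default: ARG is read per call
    if arg is None:
        arg = ARG
    return arg + 1

before = demo_late()       # 6
ARG = 10                   # somebody changes the global...
after = demo_late()        # 11: the "default" silently changed
early_after = demo_early() # 6: the early-bound default is untouched
del ARG                    # ...or deletes it
try:
    demo_late()
    late_survived = True
except NameError:
    late_survived = False  # name 'ARG' is not defined
still_early = demo_early() # 6
```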

> So, looking at some source code, a default value for certain types is
> only certain to be that value for the very first call of that function?

When you deal with mutable objects, you have to expect them to mutate. The
whole point of mutability is that their value can change.

If you use a mutable default object, it is still mutable, and can change its
value. If you don't want that, then use an immutable object.
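For example, a tuple can serve as an immutable default when the function builds and returns a new list instead of mutating its argument. This is a minimal sketch; `with_item` is a made-up name:

```python
def with_item(item, items=()):
    # The tuple default is immutable, so no call can change it;
    # a fresh list is built and returned each time.
    return list(items) + [item]

a = with_item(1)
b = with_item(2)
print(a, b)                    # [1] [2]
print(with_item.__defaults__)  # ((),)
```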

>  > The default value is evaluated when the function is
>  > defined, not when it is called
> 
> Given the amount of pointless dynamic stuff that goes on in Python, I'm
> surprised they've overlooked this one!
> 
> It seems simple enough to me to check for a missing parameter, and to
> assign whatever default value was designated ([] in this case). (How
> does the default mechanism work now?)

What "seems" simple is not simple. The function default is not guaranteed to
be a simple literal. Defaults can be arbitrarily complex expressions:

def func(a, b=[1, x] + [random.random() for i in range(x**4)]): ...

So you have to store that expression:

    [1, x] + [random.random() for i in range(x**4)]

together with enough information to know which scope it will be evaluated
in. And you have to decide: should that be a closure, or a regular function
call? Whichever decision you make, you will make some people unhappy.
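One way to picture what a late-binding implementation would have to store is a zero-argument function (a "thunk") wrapping the default expression. The sketch below does this by hand; `b_default` and `_MISSING` are hypothetical names, not anything Python actually does internally. Because `x` is a module global, the thunk looks it up on every call and sees whatever `x` is at that moment:

```python
import random

x = 2

# The stored default expression, wrapped so it can be re-evaluated later.
b_default = lambda: [1, x] + [random.random() for i in range(x**4)]

_MISSING = object()  # private sentinel meaning "no b supplied"

def func(a, b=_MISSING):
    if b is _MISSING:
        b = b_default()  # the expression is re-evaluated on every call
    return a, b

_, b1 = func(0)
print(len(b1))  # 18, i.e. 2 + 2**4
x = 1
_, b2 = func(0)
print(len(b2))  # 3, i.e. 2 + 1**4: the new value of x is seen
```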

There are consequences of your choice:

- If "x" is a global variable, making the expression a closure means that
  the value of x is kept alive, possibly long after the default value is
  no longer needed.

- But if you make the expression a non-closure, then the caller is
  responsible for ensuring that x is not deleted. If it is deleted, then you
  will get a NameError trying to evaluate the default value.
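Both consequences can be seen with ordinary lambdas. One caveat: real Python closures capture variables of an enclosing function, not module globals. So this sketch puts x inside a hypothetical `make_closure_default` function to show the cell that keeps it alive, and uses a module-level lambda for the non-closure case:

```python
def make_closure_default():
    x = 3                  # local to the enclosing function
    return lambda: [1, x]  # the lambda closes over x via a cell

closure_default = make_closure_default()
# The cell keeps x alive for as long as closure_default exists.
cell_value = closure_default.__closure__[0].cell_contents
print(cell_value)                  # 3

x = 3
global_default = lambda: [1, x]    # no closure: x is looked up in globals per call
print(global_default.__closure__)  # None

del x
try:
    global_default()
    raised = False
except NameError:
    raised = True                  # the default can no longer be evaluated
print(raised)                      # True
```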

Whichever option you choose, there will be a multitude of people on the
internet telling you that you got it wrong.

-- 
Steven