Late-binding of function defaults (was Re: What is a function parameter =[] for?)

Wed Nov 25 21:01:56 EST 2015

On Thu, 26 Nov 2015 12:20 am, BartC wrote:

> On 25/11/2015 10:52, Steven D'Aprano wrote:
>> On Wed, 25 Nov 2015 07:14 pm, Antoon Pardon wrote:
> 
>>> What exactly is your point?
>>
>> That there is a simple analogy between the distinction between code
>> inside/outside a for-loop, and code inside/outside a function. If you can
>> understand why this loops forever, instead of just twice, then you can
>> understand why function defaults work the way they do:
>>
>> L = [1, 2]
>> for i in L:
>>      L.append(i)
>>      print(L)
>>
>>
>> There is nothing "bizarre" or complicated or difficult to understand
>> happening here.
> 
> What do you think most people would expect to happen here? 

I wouldn't try to guess what "most" people would expect. Their expectations
would depend on what exposure they have had to programming languages, if
any, what languages those are, and how well they grok the idea of
simulating code in their own head.

> I know, 
> because you gave a spoiler, that it loops forever, otherwise I wouldn't
> be sure /without trying it/ (but I tried it anyway).
> 
> Here's the same code in a somewhat different language:
> 
> L := (1,2)
> for i in L do
> |   L append:= i
> |   println L
> od
> 
> And this is the output:
> 
> (1,2,1)
> (1,2,1,2)
> 
> Which output (infinite series of [1,2,1,2,1,....] or the above) is more
> sensible, and which do you think people might prefer?

The infinite loop is more sensible, and people would prefer the above
output. Who wants an infinite loop? That's nearly always a bug.

But consider why, bug or no bug, the Python version which leads to an
infinite series should be preferred over the other version. I can think of
at least three reasons why the Python behaviour is objectively better,
despite leading to an unwanted infinite loop in this specific case.

(1) The Python version never has to make a copy of the (potentially huge)
list just to loop over it. That makes the performance of for-loops more
predictable and avoids horrible surprises:

for x in L:
    break

takes the same amount of time to execute whether L has one item or a hundred
billion items. The for-loop *overhead* (as opposed to the cost of the loop
itself) is a fixed, small amount.
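
A rough way to see that no copy is made, without timing anything: create
the iterator by hand with iter(), which is roughly what the for-loop does
behind the scenes, and then modify the list. This is only a sketch, and the
names "it" and "seen" are made up for illustration:

```python
L = [1, 2]
it = iter(L)       # roughly what "for x in L" sets up before the first pass
L.append(3)        # modify the list after the iterator already exists
seen = list(it)    # drain the iterator
print(seen)        # [1, 2, 3]
```

If the loop worked from a snapshot of L, the 3 would never show up.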

(2) The Python version has a cleaner, easier to understand execution model.
Apart from two obvious exceptions (on the left hand side of an assignment,
or following the "del" statement) all uses of the name L have the same
meaning. Whether you write:

    for x in L

    if "foo" in L

    while L

    L + [1, 2]

    len(L)

    L or []

    L.append(42)

or any other expression, all uses of the name L evaluate to the same thing:
the object currently bound to that name. You don't have to try to remember
which uses evaluate to the original object and which make a copy. It is
*never* a copy -- if you want a copy, you can copy it yourself. (Use the
copy module, copy.copy(L), or take a slice, L[:].)
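
For example, looping over a copy gives the "runs just twice" behaviour, but
only because the copy was asked for explicitly. A small sketch, with the
name M made up for illustration:

```python
import copy

L = [1, 2]
for i in L[:]:         # iterate over a slice copy, not over L itself
    L.append(i)        # appending to L does not change the copy
print(L)               # [1, 2, 1, 2]

M = copy.copy(L)       # the copy module does the same job as L[:]
print(M == L, M is L)  # True False: equal contents, different objects
```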

The cost of this is that whenever you need a copy, you are responsible for
making it, since Python will never do it for you automatically. But the
benefit is that you never pay for a copy you didn't ask for.

I infer from the line "L append:= i" that your language doesn't always make
a copy (otherwise, how could you modify the original?). That means that
you, the programmer, still have to manage the copying of the list, exactly
the same as we have to do with Python. The difference is, Python doesn't
lure you into a false sense of security by *sometimes* making that copy for
you. The rules are:

"In Python, if you want a copy of the list, make a copy yourself."

versus:

"In BartC's language, if you want a copy of the list, look up the operation
you are about to do to see if it already makes a copy, and if it doesn't,
then make a copy yourself. (Otherwise, you might make two copies.)"

(3) What if you don't want a copy? How does one disable that feature? Or
does your language simply declare that "you can't do that"?

for item in queue:
    process(item)
    if condition: queue.append(something)

What's the most natural equivalent to that code snippet in your language?
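
To make the question concrete, here is a runnable sketch of that kind of
loop in Python. The rule for when to append (halve every even number) is
invented purely for illustration, and the "processed" list is just a
hypothetical stand-in for process():

```python
queue = [8, 3]
processed = []

for item in queue:
    processed.append(item)       # stand-in for process(item)
    if item % 2 == 0:            # stand-in for "condition"
        queue.append(item // 2)  # new work is picked up by the same loop

print(processed)                 # [8, 3, 4, 2, 1]
```

The loop ends on its own once no more items get appended, which only works
because the for-loop looks at the live list and not at a snapshot.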

> The point is that the behaviour of the loop is by no means obvious, so
> neither is the behaviour of the function defaults.

I'm pretty sure that nobody said the behaviour was obvious in the sense of
predictable to those unfamiliar with the rules of the language. There's
even a FAQ about it.

There's a big difference between *obvious in advance* and *obvious in
hindsight once an explanation is given*.
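
For what it's worth, the rule for function defaults is the same kind of
thing: the default expression sits in the "def" line, outside the function
body, so it is evaluated once, when the function is defined, and not on
every call. A minimal sketch (the function names are made up for
illustration):

```python
def append_to(item, target=[]):     # [] is evaluated once, at def time
    target.append(item)
    return target

first = append_to(1)
second = append_to(2)               # reuses the same default list
print(first is second, second)      # True [1, 2]

def append_to_fresh(item, target=None):
    if target is None:              # inside the body, so it runs every call
        target = []
    target.append(item)
    return target

print(append_to_fresh(1), append_to_fresh(2))   # [1] [2]
```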

-- 
Steven