functions, optional parameters

Chris Angelico rosuav at gmail.com
Sun May 10 04:59:38 EDT 2015


(To clarify, I am *not* talking about this as a change to Python, so
all questions of backward compatibility are immaterial. This is "what
happens if we go back in time and have Python use late binding
semantics". This is the "alternate 1985" of Back to the Future.)

On Sun, May 10, 2015 at 3:20 PM, Steven D'Aprano
<steve+comp.lang.python at pearwood.info> wrote:
> On Sun, 10 May 2015 01:33 pm, Chris Angelico wrote:
>> In fact, that would
>> be the language definition; the rest is an optimization. (It's like
>> how "x.y()" technically first looks up attribute "y" on object x, then
>> calls the result; but it's perfectly reasonable for a Python
>> implementation to notice this extremely common case and do an
>> "optimized method call" that doesn't actually create a function
>> object.)
>
> class X:
>    def y(self): pass
>
> y is already a function object.
>
> I think maybe you've got it backwards, and you mean the *method* object
> doesn't have to be created. Well, sure, that's possible, and maybe PyPy
> does something like that, and maybe it doesn't. Or maybe the function
> descriptor __get__ method could cache the result:

Apologies, that was indeed an error of terminology: I meant the method
object that doesn't have to be created. There is already a function
object (which can be identified as X.y - Py2 differences needn't
concern us here), and AFAIK a peephole optimizer can transform this
safely:

x = X()
x.y()
# into
x = X()
X.y(x)

That's an optimization that can't possibly change the result (at
least, I'm not aware of a way that it can; I may be wrong), and so
it's a viable change for something like PyPy to do. But semantically,
a bound method object is still created, which means it's fully legal
to split that into two parts:

x = X()
f = x.y
f()

The only result should be that this defeats the optimization, so you
end up paying a greater cost in object (de)allocations.
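To make that cost concrete (this is ordinary current-Python behaviour,
not the hypothetical semantics): every attribute lookup builds a fresh
bound method object, even though both call forms give the same result:

class X:
    def y(self):
        return "hello"

x = X()
f = x.y   # one bound method object
g = x.y   # another, distinct one
print(f is g)         # False - two separate method objects
print(f() == X.y(x))  # True - same result either way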

> But that's much simpler than the early/late binding example. You talk
> about "the obvious cases" like int, bool, str and None. What about floats
> and frozensets, are they obvious? How about tuples? How about
> MyExpensiveImmutableObject?

Simple: if the optimizer doesn't know about them, they go by the
regular rule. As there's no semantic difference, there cannot be any
true effect beyond performance. Floats can easily be added to the list
I gave; tuples could be, as long as their members are also immutable;
frozenset doesn't have a literal form, nor would
MyExpensiveImmutableObject, so they would miss out on this benefit.

>> The simpler the rule, the easier to grok, and therefore the
>> less chance of introducing bugs.
>
> You're still going to surprise people who expect early binding:
>
> FLAG = True
>
> def spam(eggs=FLAG):
>     ...
>
>
> What do you mean, the default value gets recalculated every time I call
> spam? It's an obvious immutable type! And why does Python crash when I
> delete FLAG?

Still simple: Since late binding is the semantically-mandated
behaviour, this will always reevaluate FLAG - the optimizer has been
bypassed here. It's not an obvious immutable type - the example I
actually gave was "int/bool/str/None *literals*", not *values*. Here's
a non-toy example that would use this kind of flag-lookup semantics
usefully:

default_timeout = 60 # seconds

def url_get(url, timeout=default_timeout):
    """Perform a GET request and return the data"""

def url_post(url, body, timeout=default_timeout):
    """Perform a POST request and return the data"""

def dns_lookup(server, name, type="A", qclass="IN", timeout=default_timeout):
    """Send a DNS request and await a response"""


By changing modulename.default_timeout, you instantly change all of
the functions' defaults. In current Python, this would have to be done
as:

def url_get(url, timeout=None):
    if timeout is None: timeout = default_timeout

which duplicates that boilerplate across all of them, and means that
introspection of the function can't show you what it's actually doing.
With late binding, introspection could yield both the expression used
("default_timeout") and, by evaluating it, the effective default.

Now, this is a rarity. This is far FAR less common than the situations
where early binding is better. But there are places where it would
make sense.

> Worse:
>
>
> def factory():
>     funcs = []
>     for i in range(1, 5):
>         def adder(x, y=i):
>             return x + y
>         adder.__name__ = "adder%d" % i
>         funcs.append(adder)
>     return funcs
>
>
> The current behaviour with early binding:
>
>
> py> funcs = factory()
> py> [f(100) for f in funcs]
> [101, 102, 103, 104]
>
>
> What would it do with late binding? That's a tricky one. I can see two
> likely results:
>
> [f(100) for f in funcs]
> => returns [104, 104, 104, 104]
>
> or
>
> NameError: name 'i' is not defined
>
>
> both of which are significantly less useful.

I'd say the former makes more sense - it's what would happen if you
evaluated the expression "i" in the context of that factory function.
But yes, significantly less useful than early binding; I'm not sure
how to cleanly implement that kind of metaprogramming otherwise.
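
(For comparison, current Python will happily give you the
[104, 104, 104, 104] behaviour today if you let adder close over i
instead of using a default - a rough sketch:)

def factory():
    funcs = []
    for i in range(1, 5):
        def adder(x):
            # i is looked up when adder is *called*, by which time
            # the loop has finished and i == 4
            return x + i
        adder.__name__ = "adder%d" % i
        funcs.append(adder)
    return funcs

funcs = factory()
print([f(100) for f in funcs])  # [104, 104, 104, 104]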

> As I've said, it is trivial to get late binding semantics if you start with
> early binding: just move setting the default value into the body of the
> function. 99% of the time you can use None as a sentinel, so the common
> case is easy:
>
> def func(x=None):
>     if x is None:
>         x = some_complex_calculation(i, want, to, repeat, each, time)
>
>
> and the rest of the time, you just need *one* persistent variable to hold a
> sentinel value to use instead of None:
>
> _sentinel = object
> def func(x=_sentinel, y=_sentinel, z=_sentinel):
>     if x is _sentinel: ...

Presumably that would instantiate an object() rather than using the
object type itself, but yes. Sometimes it'd be nice to be able to get
something with a more useful repr, but that's not a big deal.
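
A minimal sketch of the "more useful repr" idea - just a tiny helper
class for illustration, nothing from the stdlib:

class Sentinel:
    """A unique marker object with a readable repr."""
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return "<%s>" % self.name

_unspecified = Sentinel("unspecified")

def func(x=_unspecified):
    if x is _unspecified:
        print("x was omitted")

print(_unspecified)   # <unspecified>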

> But if you start with late binding, it's hard to *cleanly* get early binding
> semantics. You need a separate global for each parameter of every function
> in the module:
>
> _default_x = some_complex_calculation(do, it, once)
> _default_y = another_complex_calculation(do, it, once)
> _default_z = a_third_complex_calculation(do, it, once)
> _default_x_for_some_other_function = something_else()
>
>
> def func(x=_default_x, y=_default_x, z=_default_z):  # oops, see the bug
>     ...

Yup, I see it... but I quite probably wouldn't if your variable names
were less toyish. Even as it is, the important info is getting lost in
this sea of "=_default_" that keeps having to be repeated.

> No, early binding by default is the only sensible solution, and Guido got it
> right. Having syntax for late binding would be a bonus, but it isn't really
> needed. We already have a foolproof and simple way to evaluate an
> expression at function call-time: put it in the body of the function.

I agree. As I said at the top, this is all just what happens if Biff
is in charge instead of Guido. It's not instantly internally
inconsistent, but it is a lot less useful than the early binding we
currently have.

The advantage of a late-binding syntax is that it could be visible in
the function signature, instead of being buried inside. If we had
something like this:

def print_list(lst, start=0, end==len(lst)):
    """Print out some or all elements of a given list"""

then it'd be obvious that the one-arg behaviour is to print out the
whole list; otherwise, you'd see None up there, and have to presume
that it means "to end of list". But syntax has to justify itself with
a lot more than uber-rare cases like these.
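
(For comparison, the way that signature has to be spelled in current
Python - with the None sentinel that the signature itself can't
explain:)

def print_list(lst, start=0, end=None):
    """Print out some or all elements of a given list"""
    if end is None:
        end = len(lst)
    for item in lst[start:end]:
        print(item)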

ChrisA


