Early binding as an option

Wed Aug 3 06:16:46 EDT 2011

Chris Angelico wrote:

> On Tue, Aug 2, 2011 at 9:23 PM, Terry Reedy <tjreedy at udel.edu> wrote:
>> On 8/2/2011 12:55 PM, Chris Angelico wrote:
>>>
>>> As I understand it, Python exclusively late-binds names; when you
>>> define a function, nothing is ever pre-bound.
>>
>> By 'pre-bound' you presumably mean bound at definition time rather than
>> call time. Default arg objects *are* pre-computed and pre-bound to
>> internal slots at definition time.
> 
> Of course; that's a different issue altogether. No, I'm talking about
> the way a tight loop will involve repeated lookups for the same name.

It's not really a different issue. The "standard" approach for performing
early binding in Python is by binding a global name to a local name at
function definition time. In CPython at least, local lookups are faster
than globals: locals are stored in a fixed array, and the function knows
the numeric offset of each variable. Non-locals have to search multiple
namespaces (globals + built-ins), using a hash table. That's fast, but
locals are faster.

(Aside: this is why locals() is not generally writable: the dict is a copy
of that internal array. You can write to the dict, but the changes don't
propagate back to the actual local table of the function.)

So a good way to speed up name lookups is to turn a global into a local, and
one common way to do that is to do it when the function is defined. For
example, you will see this in the random module:

    def randrange(self, start, stop=None, step=1, int=int, default=None,
                  maxwidth=1<<BPF):
        """Choose a random item from range(start, stop[, step]).

        This fixes the problem with randint() which includes the
        endpoint; in Python this is usually not what you want.
        Do not supply the 'int', 'default', and 'maxwidth' arguments.
        """

Note the int=int parameter.

> Unfortunately, there is no way - by definition - to guarantee that a
> binding won't change. Even in the example of getting the lengths of
> lines in a file, it's entirely possible for __len__ to rebind the
> global name "len" - so you can't rely on the repeated callings of
> len() to be calling the same function.
> 
> But who WOULD do that? It's somewhat ridiculous to consider, and
> there's a huge amount of code out there that does these repeated calls
> and does not do stupid rebindings in the middle. So Python permits
> crazy behaviour at the cost of the performance of normal behaviour.

This is a common criticism of Python, and the truth be told, it is right:
Python makes crazy monkey-patching that we're told not to do easy, at the
expense of allowing the compiler to optimise really common things.

But, are you *sure* that name lookups are *really* the bottleneck? Name
lookups are pretty fast. If you want them faster, turn them into a local
variable. It's not clear to me that syntax, or a pragma directive, or some
other form of magic, is better than an explicit assignment:

def func(x):
    _len = len  # bind at function call time
    for i in x:
        _len(i)  # lookups of _len will be faster than len

The argument that Python would be faster if it did early binding is not a
given. For trivial code that doesn't do much, I dare say it might shave off
a significant percentage, but trivial code is already pretty fast. Who
cares if you speed up a trivial operation? In non-trivial code that does
real work, it's not obvious that early binding will result in meaningful
time savings. ("Yay, my code now runs in 17.1 milliseconds instead of
17.2!") The onus is on the person proposing this change to prove that the
benefit is worth the extra complexity.

Knowing what we know now, I'm not sure whether Guido would have kept the
late binding semantics for Python, or privileged builtins in some fashion.
I suspect that he wouldn't have changed a thing, but if he did, I suspect
he'd be more concerned about accidental shadowing of builtins than about
optimizations.

E.g. people say list = [1,2,3], and then later try to call list(something).

Or maybe that's just me :)

Guido is generally very, very suspicious about proposed optimisations. They
tend to increase the code's complexity by far more than they increase its
speed. (E.g. an order of magnitude more complex to get 10% speed increase.)
And they often subtly break the semantics of the code. E.g. if you do
floating point maths, optimizing compilers that assume that x + y - x
equals y are BROKEN. 

> With the local-variable-snapshot technique ("len = len"), can anything
> be optimized, since the parser can guarantee that nothing ever
> reassigns to it? 

Very likely. But the optimization happens at runtime, not at compile time.

Default values for parameters are stored as an attribute of the function at
definition time. So if you have this:

def f(x, len=len):
    return len(x)

then f gets a publicly accessible attribute func_defaults:

>>> f.func_defaults
(<built-in function len>,)

Even if you re-bind len, that doesn't change the binding here, and the
*local* variable len will be assigned that value when f is called. Here's a
practical example:

>>> def test(len=len):
...     while True:
...             yield len
...
>>>
>>> it = test()
>>> next(it)
<built-in function len>
>>> len = None  # rebind the global len
>>> next(it)
<built-in function len>
>>> test.func_defaults = (None, )
>>> next(it)
<built-in function len>

But note that now as I've replaced the default value, the next time I call
test() I'll get len=None.

[...]
> So... Would this potentially produce wrong results? Would it be of any
> use, or would its benefit be only valued in times when the whole
> function needs to be redone in C?

You really would need to demonstrate that the bottleneck in useful code (as
opposed to toy examples) was the name lookups.

-- 
Steven