The Cost of Dynamism (was Re: Python 2.x or 3.x, which is faster?)

Steven D'Aprano steve at pearwood.info
Sun Mar 13 23:01:19 EDT 2016


On Mon, 14 Mar 2016 12:16 am, BartC wrote:

> On 13/03/2016 09:39, Steven D'Aprano wrote:
>> On Sun, 13 Mar 2016 04:54 am, BartC wrote:
> 
>>> Common sense tells you it is unlikely.
>>
>> Perhaps your common sense is different from other people's common sense.
>> To me, and many other Python programmers, it's common sense that being
>> able to replace functions or methods on the fly is a useful feature worth
>> having.
> 
> Worth having but at significant cost. But look at my jpeg test (look at
> any other similar programs); how many function names are used
> dynamically? How many functions *really* need to be dynamic?

As difficult as it may seem sometimes, Python is not a specialist
programming language designed to be optimal for your jpeg test, it is a
general purpose programming language designed for many different uses :-)

"Really need" is a funny thing. We don't *really need* anything beyond
machine language, or maybe assembly code mnemonics that correspond directly
to machine codes. Everything beyond that is either, depending on your
perspective, a waste of time that leads to inefficient code, or the best
thing for programming ever that saves programmer time and effort and
increases productivity.

You may not have noticed yet *wink* but Python's emphasis is more on the
programmer productivity side than the machine efficiency side. And from
that perspective, it can be argued that, yes, absolutely, all those
functions *need* to be dynamic, or at least have the possibility to be
dynamic if and when needed.


>>> (Have you tried looking at the CPython sources?
> 
>> Right. Now add *on top of that complexity* the extra complexity needed to
>> manage not one, but *two* namespaces for every scope: one for "variables"
>> and one for "functions and methods".
> 
> No, there's one namespace per scope. (Otherwise you could have a
> function 'f' and a variable 'f' in each scope.)

I was going to ask you about that.


> Perhaps what you mean is the number of different kinds of identifiers
> there can be. At the minute, apart from obvious, fixed, reserved words
> (I assume there are some!), 

There are reserved words -- "for", "if", "else", "def", etc. The compiler
deals with reserved words at compile time: you get a syntax error if you
try to use them as a variable name, not a runtime error.
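A quick sketch of that difference: compiling the offending source fails before any of it runs, so no runtime machinery is involved at all.

```python
# Using a reserved word as a variable name is rejected by the
# compiler: compile() raises SyntaxError before anything executes.
source = "for = 1"
try:
    compile(source, "<example>", "exec")
    outcome = "compiled"
except SyntaxError:
    outcome = "rejected at compile time"
```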


> there seems to be just one kind. The 
> different categories (function, variable, class, built-in, module etc)
> are sorted out at *run-time*.

What do you mean by "sorted out"? You have names in a namespace which are
bound to objects. Once you have bound a name, you can rebind it:

x = 23
x = 'abc'

is just a rebinding where the value of x changes from 23 to 'abc'.
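And because there is only one namespace per scope, the same name can be rebound to objects of any category -- int, function, class -- with nothing special marking it as a "function name" or a "class name":

```python
# One kind of name: x is bound, in turn, to an int, a function,
# and a class. Each statement simply rebinds the same name.
x = 23
def x():
    return "now a function"
x_is_function = callable(x)
class x:
    pass
x_is_class = isinstance(x, type)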

> Some of this will just move to *compile-time*. 

How? It's easy to say "just move it to compile time", but *how* do you move
it (don't say "by editing the Python interpreter's source code"), and what
*precisely* is "it"?

At compile time, you don't in general know what value things will have. You
don't even know what type they will have (since the type of an object is
part of its value).

Here is an extreme example to demonstrate the difficulty:


import random
if random.random() < 0.5:
    def f():
        return 1
else:
    f = "hello"

f = "goodbye"


So tell me: *at compile time*, how do you know whether the final binding
(assignment of "goodbye" to variable f) should be allowed or not?


> Same amount of 
> complexity, but now you do it just once at compile-time, instead of a
> billion times at run-time.

I don't think you are, but let's suppose you're right and that the compiler
can reliably detect and prevent code like this:


def f(): return 1
g = f
g = something_else  # allowed
f = something_else  # not allowed


How do you intend to prevent this?

def f(): return 1
exec "f = something_else"
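Here is a minimal sketch (using the Python 3 exec() spelling) of why the compiler is helpless here: the assignment doesn't exist until the string is executed.

```python
# The compiler never sees an assignment hidden inside a string;
# the rebinding of f only comes into existence when exec() runs.
namespace = {}
exec("def f(): return 1", namespace)
exec("f = 'something_else'", namespace)
```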


Suppose we removed the ability to rebind names that were bound to a
function, as you suggest. There's a very important and commonplace use for
rebinding functions that we would lose: decorators.

"Good riddance," you might say, "I never use them." Okay, but if so, that is
certainly the Blub paradox talking. Decorators have completely transformed
Python since decorator syntax was introduced in version 2.4.
Decorators are a hugely important part of Python, and having to give them
up in order to get compile-time-constant functions would gut the language
and cripple almost all major libraries and code bases.

Decorators themselves were possible before 2.4, but they were
inconvenient to use and didn't have a well-known name, and so were rarely
used:

# Before decorator syntax
def func():
    ...

func = decorate(func)


# After decorator syntax
@decorate
def func():
    ...



where decorate itself is a function which usually pre- or post-processes the
original function. (Decorators can do much, much more, but that's probably
the most common use-case.)
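A minimal sketch of such a wrapping decorator (the names `decorate` and `greet` are invented for illustration; `functools.wraps` keeps the wrapped function's metadata intact):

```python
import functools

def decorate(func):
    @functools.wraps(func)  # preserve func.__name__ etc. on the wrapper
    def wrapper(*args, **kwargs):
        # post-process the original function's result
        return func(*args, **kwargs).upper()
    return wrapper

@decorate
def greet():
    return "hello"

result = greet()
```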

Decorator syntax is just syntactic sugar for a wrapper around a function,
assigned to the same name as the function. Now, maybe a smart enough
interpreter will somehow deal with this. But there are times where you
cannot use decorator syntax, and have to write it the old-school way:

func = decorate(func)

(Even if it is only for backwards-compatibility.) In 2016, Python will no
more give that up than give up strings or ints.



>>> def f(): return "One"
>>> def g(): return "Two"
>>>
>>> h=f
> 
>> Let me see if I can draw you a more complete picture. Suppose I have a
>> function that relies on (let's say) something random or unpredictable:
>>
>> def get_data():
>>      return time.strftime("%S:%H")
> 
>> Now I use `get_data` in another function:
>>
>> def spam():
>>      value = get_stuff().replace(":", "")
> 
> (I assume you mean get_data here.)

Yes, thank you.



>> How do I test the `spam` function? I cannot easily predict what the
>> `get_data` function will return.
>>
>> In Python, I can easily monkey-patch this for the purposes of testing, or
>> debugging, by introducing a test double (think of "stunt double") to
>> replace the real `get_data` function:
>>
>> import mymodule
>> mymodule.get_data = lambda: "1:1"
>> assert spam() == "Spam to the 11"
> 
> (How do you get back the original get_data?) 

It's a test module. You let the test finish, the interpreter shuts down, and
everything goes back to normal. You haven't modified the "mymodule" code on
disk.

What if you have more tests to run? That's hardly more complicated:


import mymodule
save = mymodule.get_data
try:
    mymodule.get_data = lambda: "1:1"
    assert spam() == "Spam to the 11"
finally:
    mymodule.get_data = save
# more tests go here...


Of course, proper test frameworks will provide this as a feature of the
framework. For example, unittest will run setup and cleanup code before and
after each test, so you put your patch in the setup code and the restore in
the cleanup.
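With unittest.mock, for instance, the whole save/patch/restore dance collapses into one context manager. (The module object here is a stand-in invented so the sketch is self-contained.)

```python
import types
from unittest import mock

# Hypothetical stand-in for the mymodule of the example above.
mymodule = types.SimpleNamespace(get_data=lambda: "12:34")

def spam():
    return "Spam to the " + mymodule.get_data().replace(":", "")

# patch.object swaps in the test double and restores the original
# on exit, playing the role of the try/finally above.
with mock.patch.object(mymodule, "get_data", lambda: "1:1"):
    patched = spam()
restored = mymodule.get_data()
```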


> But this looks a very 
> dangerous technique. Suppose, during the test, that another function in
> mymodule, or one that imports it, needs access to the original get_data
> function to work properly? Now it will get back nonsense.

I suppose that's a theoretical possibility. But in practice, it doesn't come
up, or at least not often. Unit tests, in particular, are small and
focused. If you are testing `spam`, you're not going to call a bunch of
other functions or other modules. That goes against the idea of unit
testing.

I suppose that your concern is more realistic when it comes to integration
testing. But integration tests are, by their nature, much bigger and more
complex. You can't replace components in an integration test, because then
you aren't testing the integration between all your components.

Trust me, runtime mocking is a standard part of Python. This is not some
weird and wacky corner done by desperadoes hacking on the extremes of the
language, it is an official language feature with standard library support:

https://docs.python.org/3/library/unittest.mock.html


>> How would you do it, when functions are constant? You would have to
>> re-write the module to allow it:
> 
> There are a dozen ways of doing it. It may involve temporary renaming or 
> hiding. But what you don't want is for production code to be lumbered
> with all these lookups (and having to sort out arguments, keywords and
> defaults) at runtime, just to make it a bit easier for debug code to run.
> 
> I think anyway that any Python program using dynamic functions, can be
> trivially transformed to one that uses static functions. It won't be
> pretty, but any function:
> 
>   def f(): whatever
> 
> could be rewritten as:
> 
>   def __f(): whatever
>   f = __f
> 
> But now the static name __f is available for direct use, and can be
> depended on not to change.

No it can't. You haven't thought it through.

(1) Are you expecting to write "__f()" inside your code when you mean to
call f()? If so, then what's the purpose of f()? If not, then you
write "f()" and nothing has changed: the interpreter has to do a runtime
lookup because f might have changed.

(2) What is stopping people from changing __f in all the many different ways
that it could be changed?


__f = something_else

globals()['__f'] = something_else

exec '__f = something_else'
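For what it's worth, here is a sketch (Python 3 exec() spelling) confirming that such rebindings really do take effect on a double-underscore name at module level -- the leading underscores are just part of an ordinary name:

```python
# A "private" name is still an ordinary entry in the namespace,
# so it can be rebound through globals() or exec().
def __f():
    return 1

globals()['__f'] = lambda: 2
via_globals = __f()

exec("__f = lambda: 3", globals())
via_exec = __f()
```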



> (Perhaps such a convention can be used anyway. A functions that starts
> with "__" or uses some other device, the compiler and runtime will know
> it will always be that function, and could allow some optimisations.)


Now that I've shown you all the ways that code can be changed on the fly,
and probably convinced you that it is impossible to optimize Python without
changing the language, I'm going to tell you that in fact all is not lost.
There is considerable discussion going on among the core devs about
providing official support for both AST and byte code transformation tools:

https://www.python.org/dev/peps/pep-0511/


which will allow the sorts of optimizations you want, only *safely* and
without compromising on Python's dynamism.

I really recommend you read PEP 511 and see what Victor Stinner has in mind
for Python 3.6. (I believe that his work on this is being funded by Red
Hat.)


> (It's not quite so trivial for import module names, as you can't really
> just rename all modules! But in theory it could be done.)
> 
>> I once monkey-patched the `len` built-in so I could monitor the progress
>> of a long-running piece of code that wasn't written to give any feedback.
> 
> These ought to be tricks that you do with source code. It shouldn't be
> necessary for an implementation to allow that.

No, absolutely not! There is no "ought to do this by editing the source
code" here -- I reject completely your primitive and crippled language that
forces me to edit the source code and recompile/reload like some sort of
savage *wink*


> (But doesn't len() already have a mechanism where you can override it
> anyway?)

You're thinking of the __len__ magic method, for implementing len() of your
own classes. That's completely different.
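For the record, a sketch of the monkey-patching trick itself: wrap the real built-in through the builtins module, then restore it afterwards. (The counting is invented for illustration; the original use was progress reporting.)

```python
import builtins

# Save the real built-in, wrap it so every call is recorded,
# and make sure the original is restored afterwards.
_real_len = builtins.len
calls = []

def counting_len(obj):
    calls.append(1)
    return _real_len(obj)

builtins.len = counting_len
try:
    n = len("hello")
finally:
    builtins.len = _real_len
```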


 

-- 
Steven




More information about the Python-list mailing list