[Python-ideas] RFC: PEP: Add dict.__version__

Steven D'Aprano steve at pearwood.info
Sat Jan 9 23:24:27 EST 2016


On Sat, Jan 09, 2016 at 05:18:40PM -0500, Terry Reedy wrote:
> On 1/8/2016 4:27 PM, Victor Stinner wrote:
> 
> >Add a new read-only ``__version__`` property to ``dict`` and
> >``collections.UserDict`` types, incremented at each change.
> 
> I agree with Neil Girdhar that this looks to me like a CPython-specific 
> implementation detail that should not be imposed on other 
> implementations.  For testing, perhaps we could add a dict_version 
> function in test.support that uses ctypes to access the internals.
> 
> Another reason to hide __version__ from the Python level is that its use 
> seems to me rather tricky and bug-prone.

What makes you say that? Isn't it a simple matter of:

v = mydict.__version__
maybe_modify(mydict)
if v != mydict.__version__:
    print("dict has changed")

which doesn't seem tricky or bug-prone to me.

The only thing I would consider is the risk that people will write 
mydict.__version__ > v instead of a not-equal test, which is wrong if 
the counter overflows back to zero. But with a 64-bit counter, and one 
modification to 
the dict every nanosecond (i.e. a billion changes per second), it will 
take approximately 584 years before the counter overflows. I don't think 
this is a realistic scenario. How many computers do you know with an 
uptime of more than a decade?

(A 32-bit counter, on the other hand, will only take four seconds to 
overflow at that rate.)
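
For anyone who wants to check the arithmetic, here's a rough 
calculation, assuming one change per nanosecond as above:

changes_per_second = 10**9
seconds_per_year = 365.25 * 24 * 3600

years_until_64bit_overflow = 2**64 / changes_per_second / seconds_per_year
# -> roughly 584.5 years

seconds_until_32bit_overflow = 2**32 / changes_per_second
# -> roughly 4.3 seconds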

 
> >Python is hard to optimize because almost everything is mutable: builtin
> >functions, function code, global variables, local variables, ... can be
> >modified at runtime.
> 
> I believe that C-coded functions are immutable.  But I believe that 
> mutability otherwise undercuts what you are trying to do.

If I have understood Victor's intention correctly, what he's looking for 
is a way to quickly detect the shadowing or monkey-patching of builtins, 
so that if they *haven't* been shadowed/monkey-patched, functions can 
bypass the (slow) lookup process with a fast inline version.

Here's a sketch of the idea:

def demo(arg):
    return len(arg)

This has to do a time-consuming lookup of len in the globals, and if not 
found, then a second lookup in builtins. But 99.99% of the time, we 
haven't shadowed or monkey-patched len, so the compiler ought to be able 
to inline the function and skip the search. This is how static 
programming languages typically operate, and is one of the reasons why 
they're so fast.
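
To make the lookup order concrete, here's what shadowing looks like at 
the interactive prompt (a rough illustration, not Victor's machinery):

py> def demo(arg):
...     return len(arg)
...
py> demo("abc")   # no global 'len', so the lookup falls through to builtins
3
py> len = lambda arg: 42   # shadow the builtin in the module's globals
py> demo("abc")            # demo now finds the shadowing global first
42
py> del len                # unshadow; the builtin is visible again
py> demo("abc")
3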

In Python, you will often see functions like this:

def demo(arg, len=len):
    return len(arg)

which replace the slow global lookup with a fast local lookup, but at 
the cost of adding an extra parameter to the function call. Ugly and 
confusing. And, it has the side-effect that if you do shadow or 
monkey-patch len, the demo function won't see the new version, which may 
not be what you want.
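
To see that side-effect concretely (in a fresh interactive session):

py> def demo(arg, len=len):
...     return len(arg)
...
py> len = lambda arg: 42   # shadow the builtin after the fact
py> len("abc")             # module-level code sees the new len
42
py> demo("abc")            # but demo still uses the len captured at def time
3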

Victor wants to be able to make that idiom obsolete by allowing the 
compiler to automatically translate this:

def demo(arg):
    return len(arg)

into something like this:

def demo(arg):
    if len has been shadowed or monkey-patched:
        return len(arg)  # calls the new version
    else:
        return inlined or cached version of len(arg)

(I stress that you, the code's author, don't have to write the code like 
that, the compiler will automatically do this. And it won't just 
operate on len, it could potentially operate on any function that has 
no side-effects.)

This relies on the test for shadowing etc. being cheap, which Victor's 
tests suggest it is. But he needs a way to detect when the globals() and 
builtins.__dict__ dictionaries have been changed, hence his proposal.
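
Spelled out in Python, the guard might look something like the sketch 
below. I stress this is only an illustration: it assumes the proposed 
__version__ attribute exists (it doesn't yet), the names are made up, 
and the real check would live inside the interpreter, not in pure 
Python:

import builtins

# Hypothetical: values captured when "demo" was compiled.
globals_version = globals().__version__
builtins_version = builtins.__dict__.__version__
cached_len = builtins.len

def demo(arg):
    # Guard: has either namespace's version tag changed since then?
    if (globals().__version__ == globals_version and
            builtins.__dict__.__version__ == builtins_version):
        return cached_len(arg)   # fast path: cached/inlined builtin
    return len(arg)              # slow path: ordinary name lookup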


> >Implementing optimizations respecting the Python
> >semantic requires to detect when "something changes":
> 
> But as near as I can tell, your proposal cannot detect all relevant 
> changes unless one is *very* careful.  A dict maps hashable objects to 
> objects.  Objects represent values.  So a dict represents a mapping of 
> values to values.  If an object is mutated, the object to object mapping 
> is not changed, but the semantic value to value mapping *is* changed. 
> In the following example, __version__ twice gives the 'wrong' answer 
> from a value perspective.
> 
> d = {'f': [int]}
> d['f'][0] = float # object mapping unchanged, value mapping changed
> d['f'] = [float]  # object mapping changed, value mapping unchanged

I don't think that matters for Victor's use-case. Going back to the toy 
example above, Victor doesn't need to detect internal modifications to 
the len built-in, because as you say it's immutable:

py> len.foo = "spam"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'builtin_function_or_method' object has no attribute 'foo'

He just needs to know if globals()['len'] and/or builtins.len are 
different (in any way) from how they were when the function "demo" was 
compiled.
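
Without __version__, that check has to spell out each lookup and compare 
identities, something like this (the name original_len is just for 
illustration), which costs about as much as the lookup you're trying to 
avoid; comparing a single integer version tag on each dict is far 
cheaper:

py> import builtins
py> original_len = builtins.len   # captured when "demo" was compiled
py> globals().get('len', builtins.len) is original_len
True
py> len = lambda arg: 42          # shadow the builtin
py> globals().get('len', builtins.len) is original_len
False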

I'm sure that there are complications that I haven't thought of, but 
these sorts of static compiler optimizations aren't cutting-edge 
computer science; they've been around for decades and are well-studied 
and well-understood.

> >The astoptimizer of the FAT Python project implements many optimizations
> >which require guards on namespaces. Examples:
> >
> >* Call pure builtins: to replace ``len("abc")`` with ``3``,
> 
> Replacing a call with a return value assumes that the function is 
> immutable, deterministic, and without side-effect.  Perhaps this is what 
> you meant by 'pure'.  

Yes, "pure function" is the term of art for a function which is 
deterministic and free of side-effects.

https://en.wikipedia.org/wiki/Pure_function

Immutability is only important in the sense that if a function *is* pure 
now, you know it will be pure in the future as well.
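
For example, len is pure, but list.append is not:

py> len("abc")    # pure: the result depends only on the argument
3
py> x = []
py> x.append(1)   # not pure: it mutates x, which is a side-effect
py> x
[1]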


> Are you proposing to provide astoptimizer with 
> either a whitelist or blacklist of builtins that qualify or not?

I don't think the implementation details of astoptimizer are important 
for this proposal.


[...]
> The question in my mind is whether real code has enough pure builtin 
> calls of constants to justify the overhead.

It's not just builtin calls of constants; this technique has much wider 
application. If I understand Victor correctly, he thinks he can get 
function inlining, where instead of having to make a full function call 
to the built-in (which is slow), the compiler can jump directly to the 
function's implementation as if it were written inline.

https://en.wikipedia.org/wiki/Inline_expansion

Obviously you can't do this optimization if len has changed from the 
inlined version, hence Victor needs to detect changes to globals() and 
builtins.__dict__.

This shouldn't turn into a general critique of optimization techniques, 
but I think that Victor's PEP should justify why he is confident that 
these optimizations have a good chance to be worthwhile. It's not enough 
to end up with "well, we applied all the optimizations we could, and the 
good news is that Python is no slower". We want some evidence that it 
will actually be faster.




-- 
Steve

