[Python-ideas] RFC: PEP: Add dict.__version__

Victor Stinner victor.stinner at gmail.com
Mon Jan 11 09:04:15 EST 2016


2016-01-11 11:18 GMT+01:00 Neil Girdhar <mistersheik at gmail.com>:
>> No, he can still do what he wants transparently in the interpreter.  What
>> I want to avoid is Python users using __version__ in their own code.
>>
>> Well, he could change exec so it can use arbitrary mappings (or at least
>> dict subclasses), but I assume that's much harder and more disruptive than
>> his proposed change.
>>
>> Anyway, if I understand your point, it's this: __version__ should either
>> be a private implementation-specific property of dicts, or it should be a
>> property of all mappings; anything in between gets all the disadvantages of
>> both.
>
> Right.  I prefer the former since making it a property of mappings
> bloats Mapping beyond a minimal interface.

The discussion on adding a __version__ property to all mapping types
is interesting. I now agree that it's a binary choice: either no
mapping type has a __version__ property, or all of them do. It would
be annoying to hit a cryptic issue when passing a dict subtype or a
dict-like type to a function expecting a "mapping".

I *don't* want to require all mapping types to implement a __version__
property. Even if it's simple to implement, some types may be a thin
wrapper on top of an existing efficient mapping type which doesn't
implement such a property (or worse, has a similar *but different*
property). For example, Jython and IronPython probably reuse existing
mapping types of Java and .NET, and I don't think that those have such
a version property.

The Mapping ABC already requires a lot of methods; having to implement
yet another property would make implementations even more complex and
difficult to maintain. My PEP 509 requires updating the __version__ in
8 methods (including the constructor).
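To make the maintenance burden concrete, here is a rough pure-Python sketch of what keeping a version up to date would require. This is a hypothetical toy model of the PEP 509 semantics, not the actual C implementation (which bumps the version only on real changes, and stores it in the C struct rather than in an attribute); it just shows that every mutating entry point, plus the constructor, has to participate:

```python
class VersionedDict(dict):
    """Toy model of PEP 509 semantics: bump a version on every mutation.

    Hypothetical illustration only -- the real PEP 509 implementation
    lives in C and is smarter about no-op mutations.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._version = 1

    def _bump(self):
        self._version += 1

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self._bump()

    def __delitem__(self, key):
        super().__delitem__(key)
        self._bump()

    def clear(self):
        super().clear()
        self._bump()

    def pop(self, *args):
        result = super().pop(*args)
        self._bump()
        return result

    def popitem(self):
        result = super().popitem()
        self._bump()
        return result

    def setdefault(self, key, default=None):
        result = super().setdefault(key, default)
        self._bump()
        return result

    def update(self, *args, **kwargs):
        super().update(*args, **kwargs)
        self._bump()
```

Eight methods (counting __init__), and a third-party Mapping author would have to get all of them right for guards to stay correct.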

> Here is where I have to disagree.  I hate it when experts say "we'll just
> document it and then it's the user's fault for misusing it".  Yeah, you're
> right, but as a user, it is very frustrating to have to read other people's
> documentation.  You know that some elite Python programmer is going to
> optimize his code using this and someone years later is going to scratch his
> head wondering where __version__ is coming from.  Is it provided by the
> caller?  Was it added to the object at some earlier point?  Finally, he'll
> search the web, arrive at a stackoverflow question with 95 upvotes that
> finally clears things up.  And for what?  Some minor optimization. (Not
> Victor's optimization, but a Python user's optimization in Python code.)

I agree that it would be a bad practice to use __version__ widely in a
project to manually micro-optimize an application. Well,
micro-optimizations are a bad practice in most cases ;-) Remember that
dict lookups have a complexity of O(1); that's why they are used for
namespaces ;-)

It's a bad idea because at the Python level, a dict lookup and
checking the version have... the same cost! (48.7 ns vs 47.5 ns... a
difference of about 1 nanosecond)

haypo at smithers$ ./python -m timeit -s 'd = {str(i):i for i in
range(100)}' 'd["33"] == 33'
10000000 loops, best of 3: 0.0487 usec per loop
haypo at smithers$ ./python -m timeit -s 'd = {str(i):i for i in
range(100)}' 'd.__version__ == 100'
10000000 loops, best of 3: 0.0475 usec per loop

The difference is only visible at the C level:

* PyObject_GetItem: 16.5 ns
* PyDict_GetItem: 14.8 ns
* fat.GuardDict: 3.8 ns (check dict.__version__)

Well, 3.8 ns (guard) vs 14.8 ns (dict lookup) is nice but not so
amazing; a dict lookup is already *fast*. The difference between
guards and dict lookups is that a guard check costs O(1) in the number
of watched variables in the common case (when the dict was not
modified). For example, for an optimization using 10 global variables
in a function, the check costs 148 ns for 10 dict lookups, whereas the
guard still only costs 3.8 ns (39x as fast).
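To make the guard idea concrete, here is a hedged pure-Python sketch of the caching pattern a C-level guard like fat.GuardDict uses. The `version` counter below is a stand-in for the real C-level dict version field (pure-Python dicts have no such field, so this wrapper simulates it); the point is that the common path is a single integer comparison, and the N lookups happen only after a mutation:

```python
class GuardedGlobals:
    """Toy sketch of a PEP 509-style guard: redo the expensive lookups
    only when the namespace's version changed since the last check.

    Hypothetical illustration -- fat.GuardDict does this in C against
    the real dict version field.
    """

    def __init__(self, namespace):
        self.namespace = namespace
        self.version = 0           # stand-in for dict.__version__
        self.cached_version = -1   # force a miss on the first lookup
        self.cached_value = None

    def set(self, key, value):
        # Every mutation bumps the version, as PEP 509 specifies.
        self.namespace[key] = value
        self.version += 1

    def lookup(self, keys):
        # Guard check: one integer comparison in the common case...
        if self.version != self.cached_version:
            # ...and the full O(N) lookups only after a mutation.
            self.cached_value = [self.namespace[k] for k in keys]
            self.cached_version = self.version
        return self.cached_value
```

Usage: the second `lookup` call hits the guard and skips the dict lookups entirely; after `set`, the cache is refreshed.

```python
ns = GuardedGlobals({"x": 1, "y": 2})
ns.lookup(["x", "y"])   # miss: full lookups, result cached
ns.lookup(["x", "y"])   # hit: one integer comparison
ns.set("x", 10)
ns.lookup(["x", "y"])   # miss: lookups redone
```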

The guards must be as cheap as possible; otherwise we would have to
work harder to implement more efficient optimizations :-D

Note: the performance of a dict lookup also depends on whether the key
is "interned" (in short, a kind of singleton which allows comparing
strings by their address instead of having to compare them character
by character). For code objects, Python interns strings which are made
of the characters a-z, A-Z and "_".
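This effect can be observed with sys.intern: once both strings are interned, they are the same object, so a dict lookup can succeed with a pointer comparison before ever comparing characters. (A small demonstration; whether two string *literals* happen to share identity without interning is implementation-specific, hence the explicit calls.)

```python
import sys

# Explicitly intern two equal strings built separately.
a = sys.intern("some_key")
b = sys.intern("some_" + "key")

# Both names now refer to the single interned object, so a dict
# keyed on one can find the other by address alone.
assert a is b
d = {a: 42}
assert d[b] == 42
```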

Well, this is just to confirm that yes, the PEP is designed to
implement fast guards in C, but it would be a bad idea to start using
it widely at the Python level.


> Also, using this __version__ in source code is going to complicate switching
> from CPython to any of the other Python implementations, so those
> implementations will probably end up implementing it just to simplify
> "porting", which would otherwise be painless.

IMHO *if* we add __version__ to dict (or even to all mapping types),
it must be done in all Python implementations. It would be really
annoying to have to start putting a kind of #ifdef in the code for a
feature of a core builtin type (dict).

But again, I now agree not to expose the version at the Python level...



> Why don't we leave exposing __version__ in Python to another PEP?

According to this thread and my benchmark above, a __version__
property at the Python level is a *bad* idea. So I'm no longer
interested in exposing it.

Victor
