[Python-Dev] Fighting the theoretical randomness of "is" on immutables

Armin Rigo arigo at tunes.org
Mon May 6 10:46:33 CEST 2013


Hi all,

In the context PyPy, we've recently seen again the issue of "x is y"
not being well-defined on immutable constants.  I've tried to
summarize the issues and possible solutions in a mail to pypy-dev [1]
and got some answers already.  Having been convinced that the core is
a language design issue, I'm asking for help from people on this list.
 (Feel free to cross-post.)

[1] http://mail.python.org/pipermail/pypy-dev/2013-May/011299.html

To summarize: the issue is a combination of various optimizations that
work great otherwise.  For example we can store integers directly in
lists of integers, so when we read them back, we need to put them into
fresh W_IntObjects (equivalent of PyIntObject).  We solved temporarily
the issue of "I'm getting an object which isn't ``is``-identical to
the one I put in!" by making all equal integers ``is``-identical.
This required hacking at ``id(x)`` as well to keep the requirement ``x
is y <=> id(x)==id(y)``.  This is getting annoying for strings, though
-- how do you compute the id() of a long string?  Give a unique long
integer?  And if we do the same for tuples, what about their id()?

The long-term solution that seems the most stable to me would be to
relax the requirement ``x is y <=> id(x)==id(y)``.  If we can get away
with only ``x is y <= id(x)==id(y)`` then it would allow us to
implement ``is`` in a consistent way (e.g. two strings with equal
content would always be ``is``-identical) while keeping id()
reasonable (both in terms of complexity and of size of the resulting
long number).  Obviously ``x is y <=> id(x)==id(y)`` would still be
true if any of ``x`` or ``y`` is not an immutable "by-value" built-in
type.

This is clearly a language design issue though.  I can't really think
of a use case that would break if we relax the requirement, but I
might be wrong.  It seems to me that at most some modules like pickle
which use id()-keyed dictionaries will fail to find some
otherwise-identical objects, but would still work (even if tuples are
"relaxed" in this way, you can't have cycles with only tuples).


A bientôt,

Armin.


More information about the Python-Dev mailing list