Hash of None varies per-machine

Joshua Judson Rosen jrosen at ll.mit.edu
Fri Apr 3 14:18:56 EDT 2009


Paul Rubin <http://phr.cx@NOSPAM.invalid> writes:
>
> ben.taylor at email.com writes:
> > 1. Is it correct that if you hash two things that are not equal they
> > might give you the same hash value?
> 
> Yes, hashes are 32 bit numbers and there are far more than 2**32
> possible Python values (think of long ints), so obviously there must
> be multiple values that hash to the same slot.

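This is not quite true. CPython integers (and so the values that
hash() returns), at least up through the 2.x series, are implemented
as C *long integers*; on some platforms, this means that they're 32
bits long. But on an increasing number of platforms, long integers are
64 bits long. Either way, the number of slots is finite, so collisions
between unequal values remain unavoidable.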
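
As a rough illustration of both points (platform-dependent width, and
unavoidable collisions), here is a minimal sketch. It assumes a
CPython 3.2-or-later interpreter, where sys.hash_info exists; that
attribute was not available in the 2.x series discussed above, and the
specific colliding values are CPython implementation details rather
than anything the language promises.

```python
import sys

# Width in bits of hash values on this build (commonly 32 or 64).
bits = sys.hash_info.width
print("hash width on this build:", bits)

# CPython reduces integer hashes modulo a fixed prime, so the prime
# itself collides with zero even though the two values are unequal.
modulus = sys.hash_info.modulus
assert modulus != 0
assert hash(modulus) == hash(0)

# Another CPython-specific collision: -1 is reserved internally as an
# error marker, so hash(-1) is remapped onto the value used for -2.
assert -1 != -2
assert hash(-1) == hash(-2)
```

Unequal keys with equal hashes still work fine as dict keys; the dict
falls back on an equality comparison to tell them apart.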

But, more specifically, consider the following:

> > 2. Should the hash of None vary per-machine? 
> 
> If the docs say this shouldn't happen, then it's a bug.  Otherwise,
> it should probably be considered ok.
> 
> > 3. Given that presumably not all things can be hashed (since the
> > documentation description of hash() says it gives you the hash of the
> > object "if it can be hashed"), should None be hashable?
> 
> Yes, anything that can be used as a dict key (basically all immutable
> values with equality comparison) should be hashable.

My recollection is that what you're seeing here is that, when hash()
doesn't have any `proper value' to use other than object-identity, it
just returns the result of id(). And id() is documented as:

     Return the "identity" of an object. This is an integer (or long
     integer) which is guaranteed to be unique and constant for this
     object during its lifetime. Two objects with non-overlapping
     lifetimes may have the same id() value. (Implementation note:
     this is the address of the object.)

So, not only is the return-value from id() (and hash(), if there's not
actually a __hash__ method defined) non-portable between different
machines, it's not even necessarily portable between two *runs* on the
*same* machine.
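
A small sketch of the identity-based fallback, assuming CPython. The
class name Plain is made up for the example, and it uses an ordinary
instance rather than None, since how None itself is hashed is an
implementation detail that may differ between interpreter versions.

```python
class Plain(object):
    """Hypothetical class that defines neither __eq__ nor __hash__."""

a = Plain()
b = Plain()

# Within one process the values are stable for the object's lifetime.
id_stable = (id(a) == id(a))
hash_stable = (hash(a) == hash(a))

# Two objects that are alive at the same time have distinct identities,
# and the default equality comparison is identity as well.
distinct = (id(a) != id(b)) and (a != b)

# The actual numbers printed here are derived from memory addresses,
# so do not expect them to match on another machine or another run.
print(id(a), hash(a))
```

Nothing here should be stored or sent between processes; the values
only mean something inside the process that produced them.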

In practice, your OS will probably start each new process with the
same virtual memory-address range, and a given *build* of Python will
probably initialise the portion of its memory-segment leading up to
the None-object the same way each time, but....
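
If you want to see what your own platform does, a sketch along these
lines starts two fresh interpreters and compares id(None) between
them. It assumes Python 3.7 or later (for the capture_output argument
to subprocess.run); the helper name is made up for the example.
Whether the two numbers match depends on the OS and the build, so the
script only reports what it observes.

```python
import subprocess
import sys

def id_of_none_in_fresh_process():
    """Start a new interpreter and return the id(None) it reports."""
    result = subprocess.run(
        [sys.executable, "-c", "print(id(None))"],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)

first = id_of_none_in_fresh_process()
second = id_of_none_in_fresh_process()

# Either outcome is legitimate; nothing guarantees equality across runs.
print(first, second, "same" if first == second else "different")
```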

-- 
Don't be afraid to ask (Lf.((Lx.xx) (Lr.f(rr)))).


