[Python-Dev] Decimal <-> float comparisons in py3k.

Mark Dickinson dickinsm at gmail.com
Fri Mar 19 11:07:55 CET 2010


On Fri, Mar 19, 2010 at 9:37 AM, Mark Dickinson <dickinsm at gmail.com> wrote:
> Making hashes of int,
> float, Decimal *and* Fraction all compatible with one another,
> efficient for ints and floats, and not grossly inefficient for
> Fractions and Decimals, is kinda hairy, though I have a couple of
> ideas of how it could be done.

To elaborate, here's a cute scheme for the above, which is actually
remarkably close to what Python already does for ints and floats, and
which doesn't require any of the numeric types to figure out whether
it's exactly equal to an instance of some other numeric type.

After throwing out infinities and nans (which can easily be dealt with
separately), everything we care about is a rational number, so it's
enough to come up with some mapping from the set of all rationals to
the set of possible hashes, subject to the requirement that the
mapping can be computed efficiently for the types we care about.

For any prime P there's a natural 'reduce modulo P' map

  reduce : {rational numbers} --> {0, 1, ..., P-1, infinity}

defined in pseudocode by:

  reduce(m/n) = ((m % P) * inv(n, P)) % P  if n % P != 0  else infinity

where inv(n, P) represents the modular inverse to n modulo P.

Now let P be the handy Mersenne prime P = 2**31-1 (on a 64-bit
machine, the almost equally handy prime 2**61-1 could be used
instead), and define a hash function from the set of rationals to the
set [-2**31, 2**31) by:

hash(0) = 0
hash(m/n) = 1 + reduce(m/n - 1) if m/n > 0   # i.e., reduce m/n modulo
P, but to [1..P] rather than [0..P-1].
hash(m/n) = -hash(-m/n) if m/n < 0.

and in the last two cases, map a result of infinity to the unused hash
value -2**31.

For ints, this hash function is almost identical to what Python
already has, except that the current int hash does a reduction modulo
2**32-1 or 2**64-1 rather than 2**31-1.  For all small ints, hash(n)
== n, as currently.  Either way, the hash can be computed
digit-by-digit in exactly the same manner.  For floats, it's also easy
to compute:  express the float as m * 2**e for some integers m and e,
compute hash(m), and rotate e bits in the appropriate direction.  And
it's straightforward to implement for the Decimal and Fraction types,
too.

(One minor detail:  as usual, some postprocessing would be required to
replace a hash of -1 with something else, since a hash value of -1 is
invalid.)

Mark


More information about the Python-Dev mailing list