[Patches] fix float_hash and complex_hash for 64-bit *nix

Tim Peters tim_one@email.msn.com
Wed, 10 May 2000 00:53:16 -0400


[Trent Mick]
> Discussion:
>
> Okay, it is debatable to call float_hash and complex_hash broken,
> but their code presumed that sizeof(long) was 32-bits. As a result
> the hashed values for floats and complex values were not the same
> on a 64-bit *nix system as on a 32-bit *nix system. With this
> patch they are.

The goal is laudable but the analysis seems flawed.  For example, this new
comment:

> ! 		int hipart; /* algorithm expects this to be 32-bits */

isn't true.  The algorithm assumes that hipart is *at least* 32 bits, as
nothing larger than that can possibly get assigned to it (that's why it's
careful to multiply by 2**31 instead of something larger -- C guarantees
that frexp returns a result with abs value strictly less than 1.0, and that
times 2**31 occupies at most 31 bits with 1 bit to spare for the sign).  On
that basis, it's safer to (as the original code did) assume that
sizeof(long) >= 4 than (as the new code does) assume that sizeof(int) >= 4.

Looks to me like the real problem in the original was here:

    x = hipart + (long)fractpart + (long)intpart + (expo << 15);
                                   ^^^^^^^^^^^^^

The difficulty is that intpart may *not* fit in 32 bits, so the cast of
intpart to long is ill-defined when sizeof(long) == 4.  Note this
consequence under the Win32 Python:

>>> base = 2.**40 + 0.5
>>> base
1099511627776.5
>>> for i in range(32, 45):
	x = base + 2.**i
	print x, hash(x)


1.10380659507e+012 1073741824
1.10810156237e+012 1073741824
1.11669149696e+012 1073741824
1.13387136614e+012 1073741824
1.16823110451e+012 1073741824
1.23695058125e+012 1073741824
1.37438953472e+012 1073741824
1.64926744166e+012 1073741824
2.19902325555e+012 1073741824
3.29853488333e+012 1073741824
5.49755813888e+012 1073741824
9.89560464998e+012 1073741824
1.86916976722e+013 1073741824

That is, the hash function truly is broken for "large" values with a
fractional part, and I expect your after-patch code suffers the same
problem:  a hash function should never ignore any bit in its input.  The
solution to this is to break intpart in this branch into pieces no larger
than 32 bits too -- and that should also fix your 64-bit woes "by magic".