[Python-Dev] RE: Unicode character name hashing
Bill Tutt
billtut@microsoft.com
Sun, 16 Jul 2000 01:35:56 -0700
Not that any of this is terribly important given F bot's new patch, except
with respect to the perfect hash generation code.
But...
> Tim wrote:
> [Bill Tutt]
> > I just had a rather unhappy epiphany this morning.
> > f1 and f2 in ucnhash.c might not work on machines where
> > a long is not exactly 32 bits wide.
> If "not work" means "may not return the same answer as when a long does
> have exactly 32 bits", then yes, it's certain not to work. Else I don't
> know -- I don't understand the (undocumented) postconditions (== what does
> "work" mean, exactly?) for these functions.
"Works" means that f1 and f2 must always generate the same bits no matter
what platform they're executed on.
> If getting the same bits is what's important, f1 can be repaired by
> inserting this new block:
>     /* cut back to 32 bits */
>     x &= 0xffffffffL;
>     if (x & 0x80000000L) {
>         /* if negative as a 32-bit thing, extend sign bit to full precision */
>         x -= 0x80000000L; /* subtract 2**32 in a portable way */
>         x -= 0x80000000L; /* by subtracting 2**31 twice */
>     }
> between the existing
>     x ^= cch + 10;
> and
>     if (x == -1)
> This assumes that negative numbers are represented in 2's-complement, but
> should deliver the same bits in the end on any machine for which that's
> true (I don't know of any Python platform for which it isn't). The same
> should work for f2 after replacing its negative literal with a 0x...L bit
> pattern too.
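For readers following along, the quoted block can be packaged as a small standalone function. This is only a sketch: the name wrap_to_int32 is made up for illustration and does not appear in ucnhash.c, and it relies on the same 2's-complement assumption stated in the quoted text.

```c
#include <assert.h>

/* Hypothetical helper, not part of ucnhash.c: keep the low 32 bits of x
 * and reinterpret them as a signed 32-bit value, whatever the width of
 * long on this platform. Assumes 2's-complement representation. */
static long wrap_to_int32(long x)
{
    /* cut back to 32 bits */
    x &= 0xffffffffL;
    if (x & 0x80000000L) {
        /* bit 31 is set: extend the sign by subtracting 2**32,
         * done as two subtractions of 2**31 */
        x -= 0x80000000L;
        x -= 0x80000000L;
    }
    /* postcondition: the result fits in a signed 32-bit range */
    assert(x >= -2147483647L - 1 && x <= 2147483647L);
    return x;
}
```

Where long is already 32 bits the function should leave its argument unchanged; where long is wider, it folds the value into the signed 32-bit range, so that the later `x == -1` test sees the same value on both kinds of machine.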
> The assumption about 2's-comp, and the new "if" block, could be removed by
> making these functions compute with and return unsigned longs instead. I
> don't know why they're using signed longs now (the bits produced are
> exactly the same either way, up until the "%" operation, at which point C
> is ill-defined when using signed long).
The SF patch does indeed use unsigned longs.
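To illustrate the unsigned approach in general terms, here is a minimal sketch. It is not the code from the SF patch or from ucnhash.c: the function name hash32, the seed parameter, and the multiplier are all assumptions chosen for the example. The idea is simply to compute in unsigned long and mask to 32 bits after each step.

```c
#include <stddef.h>

/* Illustrative only: NOT the real f1/f2. Computes a multiply-and-xor
 * hash in unsigned long and masks to 32 bits at every step, so the
 * result should not depend on how wide long is. */
static unsigned long hash32(const char *key, size_t len, unsigned long seed)
{
    unsigned long x = seed & 0xffffffffUL;
    size_t i;

    for (i = 0; i < len; i++) {
        /* 1000003 is an arbitrary multiplier picked for this sketch */
        x = (x * 1000003UL) ^ (unsigned char)key[i];
        x &= 0xffffffffUL;  /* discard bits above 32 on wider longs */
    }
    x ^= (unsigned long)len + 10UL;
    return x & 0xffffffffUL;
}
```

Because every intermediate value is unsigned, overflow wraps in a defined way, no sign-extension block is needed, and a later reduction such as `hash32(key, len, seed) % table_size` is well defined for any nonzero table_size.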
Bill