[Python-Dev] RE: Unicode character name hashing

Bill Tutt billtut@microsoft.com
Fri, 14 Jul 2000 05:30:35 -0700


After making use of the test drive Alphas by Compaq, I just uploaded a patch
to SF that should fix this nasty issue.
Ugh. Not fun....

If anybody else cares about identical string hash values across 32-bit and
64-bit architectures, check out the patch.
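
For anyone who wants the general idea without reading the patch, here is a
minimal sketch. It is not the code from the SF patch: hash32 is a
hypothetical helper, and relying on the C99 <stdint.h> types is an
assumption. It does the multiply-and-xor in exactly 32 bits and then
reinterprets the result as a signed 32-bit value, so the width of long
never enters the picture. The final reduction modulo k_cHashElements is
left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>

/* Hypothetical helper (not the SF patch): case-insensitive hash of the
   first cch bytes of key, computed in exactly 32 bits so the result is
   the same whether long is 32 or 64 bits wide. */
static int32_t hash32(const char *key, int cch, uint32_t seed)
{
    const unsigned char *p = (const unsigned char *)key;
    uint32_t x = seed;
    int len = cch;

    assert(cch >= 0);

    while (--len >= 0) {
        /* unsigned multiply wraps modulo 2^32 instead of overflowing */
        x = (uint32_t)(UINT32_C(1000003) * x) ^ (uint32_t)toupper(*p++);
    }
    x ^= (uint32_t)(cch + 10);

    /* 0xFFFFFFFF is the 32-bit pattern of -1; map it to that of -2 */
    if (x == UINT32_C(0xFFFFFFFF))
        x = UINT32_C(0xFFFFFFFE);

    /* Reinterpret as signed two's complement without depending on
       implementation-defined unsigned-to-signed conversion. */
    if (x >= UINT32_C(0x80000000))
        return (int32_t)(x - UINT32_C(0x80000000)) - 0x7FFFFFFF - 1;
    return (int32_t)x;
}
```

Called with the two seeds from ucnhash.c (0x64fc2234 and 0x8db7d737, the
latter being the bit pattern of -1917331657), this should give the same
signed values on a 32-bit box and on an Alpha.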

Bill

 -----Original Message-----
From: 	Mark Favas [mailto:m.favas@per.dem.csiro.au] 
Sent:	Thursday, July 13, 2000 4:00 PM
To:	Bill Tutt
Subject:	Re: Unicode character name hashing

Just tried it, and got the same message:

test_ucn
test test_ucn crashed -- exceptions.UnicodeError : Unicode-Escape decoding error: Invalid Unicode Character Name

Cheers,
	Mark

Bill Tutt wrote:
> 
> Does this patch happen to fix it?
> I'm afraid my skills relating to signed overflow are a bit rusty... :(
> 
> Bill
> 
> ===================================================================
> RCS file: /cvsroot/python/python/dist/src/Modules/ucnhash.c,v
> retrieving revision 1.2
> diff -u -r1.2 ucnhash.c
> --- ucnhash.c   2000/06/29 00:06:39     1.2
> +++ ucnhash.c   2000/07/13 21:41:07
> @@ -30,12 +30,12 @@
> 
>      len = cch;
>      p = (unsigned char *) key;
> -    x = 1694245428;
> +    x = (long)0x64fc2234;
>      while (--len >= 0)
> -        x = (1000003*x) ^ toupper(*(p++));
> +        x = ((0xf4243 * x) & 0xFFFFFFFF) ^ toupper(*(p++));
>      x ^= cch + 10;
> -    if (x == -1)
> -        x = -2;
> +    if (x == (long)0xFFFFFFFF)
> +        x = (long)0xfffffffe;
>      x %= k_cHashElements;
>      /* ensure the returned value is positive so we mimic Python's % operator */
>      if (x < 0)
> @@ -52,12 +52,12 @@
> 
>      len = cch;
>      p = (unsigned char *) key;
> -    x = -1917331657;
> +    x = (long)0x8db7d737;
>      while (--len >= 0)
> -        x = (1000003*x) ^ toupper(*(p++));
> +        x = ((0xf4243 * x) & 0xFFFFFFFF) ^ toupper(*(p++));
>      x ^= cch + 10;
> -    if (x == -1)
> -        x = -2;
> +    if (x == (long)0xFFFFFFFF)
> +        x = (long)0xfffffffe;
>      x %= k_cHashElements;
>      /* ensure the returned value is positive so we mimic Python's % operator */
>      if (x < 0)
> 
>  -----Original Message-----
> From:   Mark Favas [mailto:m.favas@per.dem.csiro.au]
> Sent:   Thursday, July 13, 2000 1:16 PM
> To:     python-dev@python.org; Bill Tutt
> Subject:        Unicode character name hashing
> 
> [Bill has epiphany]
> >I just had a rather unhappy epiphany this morning.
> >f1 and f2 in ucnhash.c might not work on machines where a long is not
> >32 bits wide.
> 
> I get the following from test_ucn on an Alpha running Tru64 Unix:
> 
> python Lib/test/test_ucn.py
> UnicodeError: Unicode-Escape decoding error: Invalid Unicode Character Name
> 
> This is with the current CVS - and it's been failing this test for some
> time now. I'm happy to test any fixes...
> 
> --
> Email  - m.favas@per.dem.csiro.au        Mark C Favas
> Phone  - +61 8 9333 6268, 0418 926 074   CSIRO Exploration & Mining
> Fax    - +61 8 9383 9891                 Private Bag No 5, Wembley
> WGS84  - 31.95 S, 115.80 E               Western Australia 6913
