[Python-Dev] new unicode hash calculation
M.-A. Lemburg
mal@lemburg.com
Mon, 10 Jul 2000 19:30:20 +0200
Fredrik Lundh wrote:
>
> mal wrote:
>
> > * change hash value calculation to work on the Py_UNICODE data
> > instead of creating a default encoded cached object (what
> > now is .utf8str)
>
> is this what you had in mind?
>
> static long
> unicode_hash(PyUnicodeObject *self)
> {
>     register int len;
>     register Py_UNICODE *p;
>     register long x;
>
>     /* return the cached value, if any */
>     if (self->hash != -1)
>         return self->hash;
>     len = PyUnicode_GET_SIZE(self);
>     p = PyUnicode_AS_UNICODE(self);
>     x = *p << 7;
>     while (--len >= 0)
>         x = (1000003*x) ^ *p++;
>     /* len is -1 here, so ask for the size again */
>     x ^= PyUnicode_GET_SIZE(self);
>     if (x == -1)
>         x = -2;    /* -1 is reserved for "not yet computed" */
>     self->hash = x;
>     return x;
> }
>
> </F>
Well, sort of. It should be done in such a way that Unicode
strings which only use the lower byte produce the same hash
value as normal 8-bit strings -- is this the case for the
above code?
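A minimal, standalone sketch of the property asked about here. It is
not code from the Python sources: uni_char, hash_8bit and hash_unicode
are hypothetical names, a 16-bit code unit is assumed, and the 8-bit
hash is assumed to read its data as unsigned bytes. Under those
assumptions, running the same recurrence over code units below 256
should give the same value as running it over the corresponding bytes:

```c
#include <stddef.h>

/* Hypothetical stand-in for Py_UNICODE; a 16-bit code unit is assumed. */
typedef unsigned short uni_char;

/* The 8-bit string hash recurrence, reading unsigned bytes. */
static long
hash_8bit(const unsigned char *p, size_t size)
{
    /* unsigned arithmetic, so overflow wraps instead of being undefined */
    unsigned long x = size ? (unsigned long)p[0] << 7 : 0;
    long h;
    size_t i;

    for (i = 0; i < size; i++)
        x = (1000003UL * x) ^ p[i];
    x ^= (unsigned long)size;
    h = (long)x;
    return h == -1 ? -2 : h;    /* -1 stays free as the "no hash" marker */
}

/* The same recurrence, reading whole code units instead of bytes. */
static long
hash_unicode(const uni_char *p, size_t size)
{
    unsigned long x = size ? (unsigned long)p[0] << 7 : 0;
    long h;
    size_t i;

    for (i = 0; i < size; i++)
        x = (1000003UL * x) ^ p[i];
    x ^= (unsigned long)size;
    h = (long)x;
    return h == -1 ? -2 : h;
}
```

Since every step only sees the integer value of the current element,
the two functions can only diverge once a code unit is 256 or larger.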
My first idea was to apply a kind of two-pass scan which
first only uses the lower byte and then the higher byte to
calculate a hash value. Both passes would use the same
algorithm as the one for 8-bit strings.
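One possible reading of the two-pass idea, as a rough sketch only. The
text above does not say how the two passes would be combined, so two
things here are assumptions: the high-byte pass is skipped when no
high byte is set, and the two results are combined with XOR. All names
(unit16, pass_hash, two_pass_hash, bytes_hash) are hypothetical, and a
16-bit code unit is assumed:

```c
#include <stddef.h>

typedef unsigned short unit16;  /* hypothetical stand-in for Py_UNICODE */

/* 8-bit string hash recurrence over one byte of each code unit:
   shift 0 selects the low byte, shift 8 the high byte. */
static unsigned long
pass_hash(const unit16 *p, size_t size, int shift)
{
    unsigned long x = size ? (unsigned long)((p[0] >> shift) & 0xFF) << 7 : 0;
    size_t i;

    for (i = 0; i < size; i++)
        x = (1000003UL * x) ^ (unsigned long)((p[i] >> shift) & 0xFF);
    return x ^ (unsigned long)size;
}

static long
two_pass_hash(const unit16 *p, size_t size)
{
    unsigned long x = pass_hash(p, size, 0);    /* pass 1: low bytes */
    int wide = 0;
    long h;
    size_t i;

    for (i = 0; i < size; i++)
        if (p[i] >> 8) {
            wide = 1;
            break;
        }
    /* Assumption: pass 2 only contributes when some high byte is set,
       and is mixed in with XOR. */
    if (wide)
        x ^= pass_hash(p, size, 8);             /* pass 2: high bytes */
    h = (long)x;
    return h == -1 ? -2 : h;
}

/* Reference: the same recurrence over a plain 8-bit string. */
static long
bytes_hash(const unsigned char *p, size_t size)
{
    unsigned long x = size ? (unsigned long)p[0] << 7 : 0;
    long h;
    size_t i;

    for (i = 0; i < size; i++)
        x = (1000003UL * x) ^ p[i];
    x ^= (unsigned long)size;
    h = (long)x;
    return h == -1 ? -2 : h;
}
```

With the high-byte pass skipped for narrow data, a Unicode string that
only uses the lower byte hashes exactly like its 8-bit counterpart.
Without that skip, the second pass would still mix in the length and
the two values would differ.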
--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/