python hash function

Jeff Epler jepler at unpythonic.net
Wed Jun 25 12:48:10 EDT 2003


On Tue, Jun 24, 2003 at 12:57:09PM -0700, Damien Morton wrote:
> static long
> string_hash1(register char *p, int size)
> {
[...]
> 	while (--len >= 0)

vs
> static long
> string_hash3(register char *p, register int size)
> {
[...]
>         while (c = *p++)

Surely this hash function is incorrect (since it ignores size and assumes
p is NUL-terminated.

It's probably true that the current hash function is a bad choice for
platforms without a fast multiply, but I think the python-dev search
will turn up one perceived-as-important property it does have.
AFAIR, it seems that strings like 'foo1', 'foo2' and 'foo3' get nice
adjacent hash values and are thus less likely to collide.  Whether the
importance of this property is true in real life is a question that has
probably not been answered (Tim Peters often laments the lack of the
"average user" when these kinds of discussions are made).  A potential
problem with this property is that if 'foo' and 'bar' happen to collide,
then 'foo1' will collide with 'bar1', etc.  This could end up being an
unfortunate property.

Jeff





More information about the Python-list mailing list