hash()

Raymond Hettinger python at rcn.com
Mon Dec 5 14:38:04 EST 2005


[John Marshall]
> For strings of > 1 character, what are the chances
> that hash(st) and hash(st[::-1]) would return the
> same value?

Python's string hash algorithm is non-commutative, so a collision with
a reversed string is not likely.  The exact answer depends on the
population of strings being hashed, but it's not hard to compute
collision statistics for a sampling of those strings:

   collisions = len(sample) - len(set(hash(s) for s in sample))


FWIW, here is how Python computes string hash values:

static long
string_hash(PyStringObject *a)
{
        register int len;
        register unsigned char *p;
        register long x;

        len = a->ob_size;
        p = (unsigned char *) a->ob_sval;
        x = *p << 7;
        while (--len >= 0)
                x = (1000003*x) ^ *p++;
        x ^= a->ob_size;
        if (x == -1)
                x = -2;
        return x;
}



> My goal is to uniquely identify multicharacter strings,
> all of which begin with "/" and never end with "/".
> Therefore, st != st[::-1].

Just use a set -- no string reversal is needed for detection of unique
multicharacter strings..


Raymond




More information about the Python-list mailing list