16bit hash

Paul Rubin http
Thu Jun 28 01:02:30 EDT 2007


"Martin v. Löwis" <martin at v.loewis.de> writes:
> So: what are your input data, and what is the
> distribution among them?

With a good enough hash function, one shouldn't need to care about
the input distribution.  Functions like SHA can essentially be
used as extractors:

  http://en.wikipedia.org/wiki/Extractor
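
As a rough sketch of that idea: hash the input with SHA and keep only
the first 16 bits.  The name sha16, the choice of SHA-256, and the use
of repr() to serialize the object are assumptions made for
illustration, not something the OP specified.

```python
import hashlib

def sha16(obj):
    """Hypothetical helper: 16-bit hash taken from the top 16 bits of SHA-256."""
    digest = hashlib.sha256(repr(obj).encode("utf-8")).digest()
    # First two bytes of the digest, read as a big-endian integer in 0..65535.
    return int.from_bytes(digest[:2], "big")

# Even very regular inputs (consecutive integers) should spread over the
# 65536 possible values about as evenly as random data would.
buckets = {sha16(i) for i in range(1000)}
```

Note that repr() is only a reasonable serialization for objects whose
repr is stable and value-based (ints, strings, tuples of those).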

If there's a concern that the input distribution is specially
concocted to give nonuniform results with some known hash function,
then use one unknown to the input provider, e.g.

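   import hashlib, hmac
   # Keyed 16-bit hash: first 4 hex digits of an HMAC over repr(obj).
   # Python 3 needs bytes and an explicit digestmod (SHA-256 chosen here).
   def hash(obj, key=b'some string unknown to the input source'):
     return int(hmac.new(key, repr(obj).encode(), hashlib.sha256).hexdigest()[:4], 16)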

Anyway I don't have the impression that the OP is concerned with this
type of issue.  Otherwise s/he'd want much longer hashes than 16 bits.
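
To give a feel for how small 16 bits is, here is a back-of-the-envelope
sketch using the standard birthday approximation.  It assumes the hash
output is uniform over 2**16 values; the function name is made up for
this example.

```python
import math

def collision_probability(n_inputs, bits=16):
    """Approximate chance that at least two of n_inputs uniformly
    distributed hash values coincide (birthday approximation)."""
    buckets = 2 ** bits
    return 1.0 - math.exp(-n_inputs * (n_inputs - 1) / (2.0 * buckets))

p_300 = collision_probability(300)    # roughly one half
p_1000 = collision_probability(1000)  # very close to 1
```

So with a few hundred inputs a collision is already about as likely as
not, with or without an adversary.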


