[MATRIX-SIG] hash() function w/ NumPy

Aaron Watters arw@dante.mh.lucent.com
Fri, 25 Jul 1997 12:29:54 -0400


The hash() function takes a string of arbitrary length
and transforms it into a 32-bit value.  If the string has more
than 32 bits, many strings must share each hash value, so there
is no way to get the string back in general.

However, if the hash values are unique you can use an ancillary
dictionary mapping hash(x) --> x, but you had better be ready
to do something appropriate in the case that collisions occur.
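
For illustration, here is a minimal sketch of such an ancillary
dictionary. The class name HashIndex, its methods, and the hash_func
parameter are made up for this example, and raising an error is only
one possible "something appropriate" when two strings collide. Note
also that current Python versions randomize string hashes per process
by default, so a table like this is only meaningful within a single run.

```python
class HashIndex:
    """Hypothetical helper: maps hash(s) back to s and refuses collisions."""

    def __init__(self, hash_func=hash):
        # hash_func is an assumption added so collisions can be provoked in tests
        self.hash_func = hash_func
        self.by_hash = {}          # hash value -> original string

    def add(self, s):
        h = self.hash_func(s)
        # setdefault stores s only if h is not already present
        existing = self.by_hash.setdefault(h, s)
        if existing != s:
            # two different strings share one hash value
            raise ValueError("hash collision between %r and %r" % (existing, s))
        return h

    def lookup(self, h):
        return self.by_hash[h]


index = HashIndex()
h = index.add("IBM")
print(index.lookup(h))             # IBM
```

Passing a deliberately weak function such as len as hash_func is an
easy way to exercise the collision branch.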

Maybe a better strategy is to simply number the strings sequentially
in an external index structure.

class string_cache:
    def __init__(self):
        self.dict = {}   # string -> index
        self.list = []   # index -> string
    def insert(self, s):
        # return the existing index, or assign the next sequential one
        d = self.dict
        if s in d:
            return d[s]
        else:
            result = d[s] = len(d)
            self.list.append(s)
            return result
    def get_by_index(self, index):
        return self.list[index]

Untested, and I hope you don't want to do deletions :).
If you want to get the strings back in aggregate
(i.e., get a bunch of strings from the structure
in a couple of interpreter ticks)
the above could be modified to use my favorite extension
module, kjbuckets :) but I'll spare you.
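
As a rough sketch of the round trip in plain Python (not kjbuckets),
the following restates the sequential-numbering idea so that it runs on
its own. The names StringCache and get_many and the sample labels are
invented for the example. The bulk lookup still loops in the
interpreter, so it shows the interface only, not the speed an extension
module would give.

```python
class StringCache:
    """Sequentially numbers strings; restated here so the block is self-contained."""

    def __init__(self):
        self.index_of = {}   # string -> index
        self.strings = []    # index -> string

    def insert(self, s):
        # assign the next free index the first time a string is seen
        if s not in self.index_of:
            self.index_of[s] = len(self.strings)
            self.strings.append(s)
        return self.index_of[s]

    def get_many(self, indices):
        # hypothetical convenience method: look up several indices at once
        return [self.strings[i] for i in indices]


cache = StringCache()
labels = ["IBM", "GE", "IBM", "T"]        # made-up row headings
ids = [cache.insert(s) for s in labels]
print(ids)                                # [0, 1, 0, 2]
print(cache.get_many(ids))                # ['IBM', 'GE', 'IBM', 'T']
```

The ids are small consecutive integers, so they can be stored in a
numeric array in place of the strings themselves.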

With Python 1.4+  string hash collisions are unlikely (with 1.3, sadly,
they were common, and still are for tuples, but this will
be fixed in 1.5). -- Aaron Watters

----------
> From: Hoon Yoon - IPT Quant <hyoon@nyptsrv1.etsd.ml.com>
> To: matrix-sig@python.org
> Subject: [MATRIX-SIG] hash() function w/ NumPy
> Date: Friday, July 25, 1997 10:41 AM
> 
> Hello,
> 
>   Taking a break from all the encouraging discussion about the Plots.
> I am attempting to incorporate hash('STR') into an Array. Since hash turns
> a String into a unique numeric sequence, I am trying to use it to sneak
> string id's into Numerical arrays. Now I know how to turn one into a hash,
> but how do I get it back to a string? There is supposed to be a function,
> but I remember reading that I may not be able to get the same string back
> in reverse. If anyone remembers the function and the documentation in
> Lutz or Web pages let me know. If anyone has done some of this work
> already, please tip me off to a few tricks.
>   I guess this is not exactly recommended, but I cannot think of a good
> way to use strings as row and/or column headings (as a list may be). Has
> anyone done this?
>   Thanks much,
>   
> Hoon,

At your service :).

_______________
MATRIX-SIG  - SIG on Matrix Math for Python

send messages to: matrix-sig@python.org
administrivia to: matrix-sig-request@python.org
_______________