[Numpy-discussion] memoization with ndarray arguments

Mon Mar 23 04:20:17 EDT 2009

On Saturday 21 March 2009, Paul Northug wrote:
[clip]
> numpy arrays are not hashable, maybe for a good reason.

NumPy arrays are not hashable because they are mutable.
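
To make that concrete, here is a small sketch (the variable names are
only illustrative). It shows the TypeError that hash() raises on an
array, and why a key taken from a mutable object can go stale:

```python
import numpy as np

X = np.arange(3.0)

# hash() refuses ndarrays outright.
try:
    hash(X)
except TypeError as exc:
    message = str(exc)  # "unhashable type: 'numpy.ndarray'"

# An immutable snapshot of the data is hashable, but it is only a
# snapshot: it no longer matches once the array is modified in place.
key = X.tobytes()
X[0] = 99.0
stale = key != X.tobytes()
```

So any key derived from the contents has to be recomputed on each call,
which is what the approaches discussed in this thread do.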

> I tried anyway by keeping a dict of hash(tuple(X)), but started
> having collisions. So I switched to md5.new(X).digest() as the hash
> function and it seems to work ok. In a quick search, I saw
> cPickle.dumps and repr are also used as key values.

Having collisions is not necessarily a problem, unless you have *a lot*
of them.  I wonder what kind of X you are dealing with that can provoke
so many collisions when using hash(tuple(X))?  Just curious.

> I am assuming this is a common problem with functions with numpy
> array arguments and was wondering what the best approach is
> (including not using memoization).

If md5.new(X).digest() works well for you, then go ahead; it seems fast:

In [14]: X = np.arange(1000.)

In [15]: timeit hash(tuple(X))
1000 loops, best of 3: 504 µs per loop

In [16]: timeit md5.new(X).digest()
10000 loops, best of 3: 40.4 µs per loop
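
For reference, a minimal sketch of a memoizing decorator built on this
idea. It is only one possible way to do it, not a recommended recipe.
It uses hashlib.md5, the replacement for the old md5 module, and the
names array_key, memoize_array and total are made up for the example.
Mixing the shape and dtype into the digest is my own assumption, so
that arrays with the same raw bytes but a different layout do not share
a key:

```python
import functools
import hashlib

import numpy as np


def array_key(x):
    """Return an MD5 digest of the array's bytes, shape and dtype."""
    x = np.asarray(x)
    h = hashlib.md5(x.tobytes())  # tobytes() also works for non-contiguous arrays
    h.update(str((x.shape, x.dtype.str)).encode())
    return h.digest()


def memoize_array(func):
    """Cache results of a one-argument function taking an ndarray."""
    cache = {}

    @functools.wraps(func)
    def wrapper(x):
        key = array_key(x)
        if key not in cache:
            cache[key] = func(x)
        return cache[key]

    wrapper.cache = cache  # exposed so the cache can be inspected or cleared
    return wrapper


calls = []  # records each real evaluation, for demonstration only


@memoize_array
def total(x):
    calls.append(1)
    return float(np.sum(x))


X = np.arange(1000.)
first = total(X)
second = total(X.copy())  # same contents, so served from the cache
```

Distinct arrays can in principle share an MD5 digest, so this trades a
very small risk of a wrong hit for speed; keying on the bytes
themselves would avoid that at the cost of memory.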

Cheers,

-- 
Francesc Alted