[Numpy-discussion] memoization with ndarray arguments
Francesc Alted
faltet at pytables.org
Mon Mar 23 04:20:17 EDT 2009
On Saturday 21 March 2009, Paul Northug wrote:
[clip]
> numpy arrays are not hashable, maybe for a good reason.
NumPy arrays are not hashable because they are mutable.
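A small sketch of what this means in practice: hash() on an array raises TypeError, while an immutable snapshot such as tuple(X) can serve as a dict key. The variable names here are only for illustration.

```python
import numpy as np

X = np.arange(5.)

# Calling hash() on an ndarray raises TypeError because arrays are mutable.
try:
    hash(X)
    hashable = True
except TypeError:
    hashable = False

# An immutable snapshot of the contents can be used as a dict key instead.
key = tuple(X)
cache = {key: "result"}

# Mutating the array afterwards does not change the stored key...
X[0] = 99.0
still_found = key in cache        # the old snapshot is still a valid key
stale_lookup = tuple(X) in cache  # the mutated contents are a different key
```

If the array itself were the key, the in-place change would silently invalidate the entry, which is the usual reason mutable containers are not hashable.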
> I tried
> anyway by keeping a dict of hash(tuple(X)), but started having
> collisions. So I switched to md5.new(X).digest() as the hash function
> and it seems to work ok. In a quick search, I saw cPickle.dumps and
> repr are also used as key values.
Having collisions is not necessarily very bad, unless you have *a lot*
of them. I wonder what kind of X you are dealing with that can provoke
so many collisions when using hash(tuple(X))? Just curious.
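To illustrate why a few collisions are harmless when the dict key is the object itself: a dict falls back on equality once two hashes match, so colliding keys still map to separate entries and only lookups get slower. The CollidingKey class below is hypothetical and forces every hash to be the same. Note that this does not hold if the integer returned by hash() is itself stored as the key, because a collision then returns the wrong entry.

```python
import numpy as np

class CollidingKey:
    """Hypothetical key type whose hash always collides (for illustration)."""

    def __init__(self, values):
        self.values = tuple(values)

    def __hash__(self):
        # Every instance gets the same hash, so all keys collide.
        return 0

    def __eq__(self, other):
        # The dict uses this comparison to tell colliding keys apart.
        return isinstance(other, CollidingKey) and self.values == other.values

cache = {}
cache[CollidingKey(np.arange(3.))] = "first"
cache[CollidingKey(np.arange(3.) + 1)] = "second"

# Both entries survive and are retrieved correctly despite identical hashes.
first = cache[CollidingKey(np.arange(3.))]
second = cache[CollidingKey(np.arange(3.) + 1)]
```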
> I am assuming this is a common problem with functions with numpy
> array arguments and was wondering what the best approach is
> (including not using memoization).
If md5.new(X).digest() works well for you, then go ahead; it seems fast:
In [14]: X = np.arange(1000.)
In [15]: timeit hash(tuple(X))
1000 loops, best of 3: 504 µs per loop
In [16]: timeit md5.new(X).digest()
10000 loops, best of 3: 40.4 µs per loop
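For what it is worth, here is a minimal sketch of a memoizing decorator built on the digest idea. It uses hashlib.md5, the current replacement for the old md5 module, and the names array_key, memoize_array and total are made up for the example. Adding the shape and dtype to the key is my own assumption, so that arrays with the same raw bytes but a different layout or type are not confused.

```python
import functools
import hashlib

import numpy as np

def array_key(arr):
    """Build a hashable key from an array's shape, dtype and raw bytes."""
    arr = np.ascontiguousarray(arr)
    digest = hashlib.md5(arr.tobytes()).digest()
    return (arr.shape, arr.dtype.str, digest)

def memoize_array(func):
    """Cache results of a function taking a single ndarray argument."""
    cache = {}

    @functools.wraps(func)
    def wrapper(arr):
        key = array_key(arr)
        if key not in cache:
            cache[key] = func(arr)
        return cache[key]

    wrapper.cache = cache  # exposed only so the cache can be inspected
    return wrapper

calls = []

@memoize_array
def total(arr):
    calls.append(1)  # record each real evaluation
    return float(arr.sum())

X = np.arange(1000.)
r1 = total(X)
r2 = total(X.copy())         # same contents: served from the cache
r3 = total(X.reshape(10, 100))  # same bytes, different shape: new entry
```

The cache grows without bound in this form, so for long-running code some eviction policy would be needed.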
Cheers,
--
Francesc Alted