[SciPy-user] hash function on arrays

Tue Oct 9 18:49:15 EDT 2007

Tom Johnson wrote:
> On 10/9/07, Robert Kern <robert.kern at gmail.com> wrote:
>>   ('numpy.ndarray', a.shape, a.dtype, a.strides, str(a.flags), buffer(a))
> 
> Will this work for arrays defined in different python processes?
> 
> I will be storing these hash values (along with the matrices) in a
> database and doing comparisons at some later time, in some other
> python process.

Sorry, no: the hash of the dtype depends on the object's pointer, so it is not
stable across processes. For the natively-supported types on your machine
(dtype(float), dtype('=i4'), etc.), this shouldn't matter since they appear to
be special-cased such that you always get the same object out. However,
non-native types like byteswapped versions and custom dtypes give different
hashes every time.

If you want to support the non-native types, then you have to expand the dtype
into something that hashes by value. If you only need to support byteswapped
versions of the usual datatypes, you can probably just use a.dtype.str. If you
need to support simple record arrays, tuple(a.dtype.descr) will probably work.
If you need to support nested record arrays ... you have some more work ahead
of you.
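
A rough sketch of the above, for what it's worth. The helper name array_key
is made up for illustration, and so is the choice of hashlib.sha1 to get a
digest you can store in a database; neither comes from numpy. It also assumes
a Python where buffer() is not available, so it uses a.tobytes() instead, and
it leaves out the strides and flags from the original tuple because tobytes()
always returns the data in C order. Nested record arrays are not handled
specially.

```python
import hashlib

import numpy as np


def array_key(a):
    """Hypothetical helper: a digest that does not depend on object identity."""
    a = np.asarray(a)
    if a.dtype.names is None:
        # Plain (possibly byteswapped) dtype: the string form carries the
        # byte order explicitly, e.g. '<i4' or '>i4'.
        dtype_part = a.dtype.str
    else:
        # Simple record array: describe the fields by value.
        dtype_part = repr(tuple(a.dtype.descr))
    header = repr(('numpy.ndarray', a.shape, dtype_part))
    h = hashlib.sha1()  # assumption: any stable digest would do here
    h.update(header.encode('utf-8'))
    h.update(a.tobytes())  # raw data, copied out in C order
    return h.hexdigest()


native = np.arange(6, dtype='<i4').reshape(2, 3)
swapped = native.astype('>i4')
records = np.zeros(3, dtype=[('x', '<f8'), ('y', '<i4')])

same_native = array_key(native) == array_key(native.copy())
same_records = array_key(records) == array_key(records.copy())
differs_by_byteorder = array_key(native) != array_key(swapped)
```

Since the key is built only from values (shape, dtype description, data
bytes), it should come out the same in another process. One consequence of
dropping the strides: a non-contiguous view and a contiguous copy holding the
same elements get the same key, which may or may not be what you want.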

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco