[Numpy-discussion] dtype comparison and hashing

Geoffrey Irving irving at naml.us
Sat Oct 18 18:43:38 EDT 2008


On Wed, Oct 15, 2008 at 12:56 PM, Robert Kern <robert.kern at gmail.com> wrote:
> On Wed, Oct 15, 2008 at 02:20, Geoffrey Irving <irving at naml.us> wrote:
>> Hello,
>>
>> Currently in numpy comparing dtypes for equality with == does an
>> internal PyArray_EquivTypes check, which means that the dtypes NPY_INT
>> and NPY_LONG compare as equal in python.  However, the hash function
>> for dtypes reduces to id(), which is therefore inconsistent with ==.
>> Unfortunately I can't produce a python snippet showing this since I
>> don't know how to create a NPY_INT dtype in pure python.
>>
>> Based on the source it looks like hash should raise a type error,
>> since tp_hash is null but tp_richcompare is not.  Does the following
>> snippet throw an exception for others?
>>
>>>>> import numpy
>>>>> hash(numpy.dtype('int'))
>> 5708736
>>
>> This might be the problem:
>>
>> /* Macro to get the tp_richcompare field of a type if defined */
>> #define RICHCOMPARE(t) (PyType_HasFeature((t), Py_TPFLAGS_HAVE_RICHCOMPARE) \
>>                         ? (t)->tp_richcompare : NULL)
>>
>> I'm using the default Mac OS X 10.5 installation of python 2.5 and
>> numpy, so maybe those weren't compiled correctly.  Has anyone else
>> seen this issue?
>
> Actually, the problem is that we provide a hash function explicitly.
> In multiarraymodule.c:
>
>    PyArrayDescr_Type.tp_hash = (hashfunc)_Py_HashPointer;
>
> That is a violation of the hashing protocol (objects which compare
> equal and are hashable need to hash equal), and should be fixed.

Thanks for finding that.
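
For the archive, here is a rough pure-Python sketch of why pointer hashing
breaks once == means equivalence. It does not use numpy at all: the two
classes and their kind/itemsize fields are hypothetical stand-ins I made up
for a dtype, and hashing on (kind, itemsize) is only one possible fix, not
necessarily what numpy should adopt.

```python
class PointerHashedDescr:
    """Hypothetical stand-in for a dtype: == is an equivalence check,
    but hash() is based on object identity (like _Py_HashPointer)."""

    def __init__(self, kind, itemsize):
        self.kind = kind
        self.itemsize = itemsize

    def __eq__(self, other):
        if not isinstance(other, PointerHashedDescr):
            return NotImplemented
        return (self.kind, self.itemsize) == (other.kind, other.itemsize)

    def __hash__(self):
        # Identity-based hash: equal objects get different hashes.
        return id(self)


class ValueHashedDescr(PointerHashedDescr):
    """Same equality, but the hash is built from the compared fields."""

    def __hash__(self):
        return hash((self.kind, self.itemsize))


a = PointerHashedDescr("i", 4)
b = PointerHashedDescr("i", 4)
broken_lookup = b in {a}   # set lookup misses because the hashes differ

c = ValueHashedDescr("i", 4)
d = ValueHashedDescr("i", 4)
fixed_lookup = d in {c}    # equal objects now hash equal, so lookup works

print(a == b, broken_lookup)  # True False
print(c == d, fixed_lookup)   # True True
```

The same thing would happen with dtypes used as dict keys or set members:
two descriptors that compare equal could end up as separate entries.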

Geoffrey


