[Numpy-discussion] numpy type mismatch

Fri Jun 10 22:29:53 EDT 2011

2011/6/10 Olivier Delalleau <shish at keba.be>

> 2011/6/10 Charles R Harris <charlesr.harris at gmail.com>
>
>>
>>
>> On Fri, Jun 10, 2011 at 5:19 PM, Olivier Delalleau <shish at keba.be> wrote:
>>
>>> 2011/6/10 Charles R Harris <charlesr.harris at gmail.com>
>>>
>>>>
>>>>
>>>> On Fri, Jun 10, 2011 at 3:43 PM, Benjamin Root <ben.root at ou.edu> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Fri, Jun 10, 2011 at 3:24 PM, Charles R Harris <
>>>>> charlesr.harris at gmail.com> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Jun 10, 2011 at 2:17 PM, Benjamin Root <ben.root at ou.edu> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jun 10, 2011 at 3:02 PM, Charles R Harris <
>>>>>>> charlesr.harris at gmail.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jun 10, 2011 at 1:50 PM, Benjamin Root <ben.root at ou.edu> wrote:
>>>>>>>>
>>>>>>>>> Came across an odd error while using numpy master.  Note that my
>>>>>>>>> system is 32-bit.
>>>>>>>>>
>>>>>>>>> >>> import numpy as np
>>>>>>>>> >>> type(np.sum([1, 2, 3], dtype=np.int32)) == np.int32
>>>>>>>>> False
>>>>>>>>> >>> type(np.sum([1, 2, 3], dtype=np.int64)) == np.int64
>>>>>>>>> True
>>>>>>>>> >>> type(np.sum([1, 2, 3], dtype=np.float32)) == np.float32
>>>>>>>>> True
>>>>>>>>> >>> type(np.sum([1, 2, 3], dtype=np.float64)) == np.float64
>>>>>>>>> True
>>>>>>>>>
>>>>>>>>> So, only the summation performed with a np.int32 accumulator
>>>>>>>>> results in a type that doesn't match the expected type.  Now, for even more
>>>>>>>>> strangeness:
>>>>>>>>>
>>>>>>>>> >>> type(np.sum([1, 2, 3], dtype=np.int32))
>>>>>>>>> <type 'numpy.int32'>
>>>>>>>>> >>> hex(id(type(np.sum([1, 2, 3], dtype=np.int32))))
>>>>>>>>> '0x9599a0'
>>>>>>>>> >>> hex(id(np.int32))
>>>>>>>>> '0x959a80'
>>>>>>>>>
>>>>>>>>> So, the type from the sum() reports itself as a numpy int, but its
>>>>>>>>> memory address is different from the memory address for np.int32.
>>>>>>>>>
>>>>>>>>>
>>>>>>>> One of them is probably a long; print out the typecode, dtype.char.
>>>>>>>>
>>>>>>>> Chuck
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> Good intuition, but odd result...
>>>>>>>
>>>>>>> >>> import numpy as np
>>>>>>> >>> a = np.sum([1, 2, 3], dtype=np.int32)
>>>>>>> >>> b = np.int32(6)
>>>>>>> >>> type(a)
>>>>>>> <type 'numpy.int32'>
>>>>>>> >>> type(b)
>>>>>>> <type 'numpy.int32'>
>>>>>>> >>> a.dtype.char
>>>>>>> 'i'
>>>>>>> >>> b.dtype.char
>>>>>>> 'l'
>>>>>>>
>>>>>>> So, the standard np.int32 is getting listed as a long somehow?  To
>>>>>>> further investigate:
>>>>>>>
>>>>>>>
>>>>>> Yes, long shifts around from int32 to int64 depending on the OS. For
>>>>>> instance, in 64 bit Windows it's 32 bits while in 64 bit Linux it's 64 bits.
>>>>>> On 32 bit systems it is 32 bits.
>>>>>>
>>>>>> Chuck
>>>>>>
>>>>>>
>>>>> Right, that makes sense.  But, the question is why does sum() put out a
>>>>> result dtype that is not identical to the dtype that I requested, or even
>>>>> the dtype of the input array?  Could this be an indication of a bug
>>>>> somewhere?  Even if the bug is harmless (it was only noticed within the test
>>>>> suite of larry), is this unexpected?
>>>>>
>>>>>
>>>> I expect sum is using a ufunc and it acts differently on account of the
>>>> cleanup of the ufunc casting rules. And yes, a long *is* int32 on your
>>>> machine. On mine:
>>>>
>>>> In [4]: dtype('q') # long long
>>>> Out[4]: dtype('int64')
>>>>
>>>> In [5]: dtype('l') # long
>>>> Out[5]: dtype('int64')
>>>>
>>>> The mapping from C types to numpy width types isn't 1-1. Personally, I
>>>> think we should drop long ;) But it used to be the standard Python type in
>>>> the C API. Mark has also pointed out the problems/confusion this ambiguity
>>>> causes and someday we should probably think it out and fix it. But I don't
>>>> think it is the most pressing problem.
>>>>
>>>> Chuck
>>>>
>>>>
>>> But isn't it a bug if numpy.dtype('i') != numpy.dtype('l') on a 32 bit
>>> computer where both are int32?
>>>
>>>
>> Maybe yes, maybe no ;) They have different descriptors, so from numpy's
>> perspective they are different, but at the hardware/precision level they are
>> the same. It's more of a decision as to what != means in this case. Since
>> numpy started as Numeric with only the C types, the current behavior is
>> consistent, but that doesn't mean it shouldn't change at some point.
>>
>> Chuck
>>
>
> Well apparently it was actually changed recently, since in Numpy 1.5.1 on a
> Windows 32 bit machine, they are considered equal with '=='.
> Personally I think if the string representation of two dtypes is "int32",
> then they should be ==, otherwise it wouldn't make much sense given that you
> can directly test the equality of a dtype with a string like "int32" (like
> dtype('i') == "int32" and dtype('l') == "int32").
>

I also just checked on a fresh install of numpy 1.6.0 on python 3.2, and
both types are equal as well.
I've been playing quite a bit with numpy dtypes, and this is the first time
I've heard of two dtypes that represent the exact same kind of data not
comparing equal, so I'm still inclined to believe it should be considered a bug.
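
For anyone who wants to reproduce the comparison on their own machine, here
is a minimal sketch. The helper name `describe` is made up for illustration,
and the results depend on the platform and the numpy version, so no output
is shown:

```python
import numpy as np


def describe(code):
    """Return (typecode, name, itemsize) for a dtype character code."""
    dt = np.dtype(code)
    return dt.char, dt.name, dt.itemsize


c_int = np.dtype('i')   # C int
c_long = np.dtype('l')  # C long; its width depends on the OS

print(describe('i'))
print(describe('l'))

# Do the two dtypes have the same width, and do they compare equal?
same_width = c_int.itemsize == c_long.itemsize
compare_equal = c_int == c_long
print("same width:", same_width)
print("compare equal:", compare_equal)

# Comparison against a string name, as mentioned earlier in the thread.
print(c_int == "int32", c_long == "int32")

# The original report: scalar type identity versus dtype equality.
a = np.sum([1, 2, 3], dtype=np.int32)
print(type(a) is np.int32, a.dtype == np.dtype(np.int32))
```

If "same width" is True while "compare equal" is False, that is the behavior
being questioned here.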

-=- Olivier