[Numpy-discussion] numpy type mismatch

Fri Jun 10 23:57:37 EDT 2011

On Fri, Jun 10, 2011 at 10:34 PM, Olivier Delalleau <shish at keba.be> wrote:

> 2011/6/10 Benjamin Root <ben.root at ou.edu>
>
>>
>>
>> On Fri, Jun 10, 2011 at 9:29 PM, Olivier Delalleau <shish at keba.be> wrote:
>>
>>>
>>> 2011/6/10 Olivier Delalleau <shish at keba.be>
>>>
>>>> 2011/6/10 Charles R Harris <charlesr.harris at gmail.com>
>>>>
>>>>>
>>>>>
>>>>> On Fri, Jun 10, 2011 at 5:19 PM, Olivier Delalleau <shish at keba.be> wrote:
>>>>>
>>>>>> 2011/6/10 Charles R Harris <charlesr.harris at gmail.com>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jun 10, 2011 at 3:43 PM, Benjamin Root <ben.root at ou.edu> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jun 10, 2011 at 3:24 PM, Charles R Harris <
>>>>>>>> charlesr.harris at gmail.com> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Jun 10, 2011 at 2:17 PM, Benjamin Root <ben.root at ou.edu> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Jun 10, 2011 at 3:02 PM, Charles R Harris <
>>>>>>>>>> charlesr.harris at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jun 10, 2011 at 1:50 PM, Benjamin Root <ben.root at ou.edu> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Came across an odd error while using numpy master.  Note that my
>>>>>>>>>>>> system is 32-bit.
>>>>>>>>>>>>
>>>>>>>>>>>> >>> import numpy as np
>>>>>>>>>>>> >>> type(np.sum([1, 2, 3], dtype=np.int32)) == np.int32
>>>>>>>>>>>> False
>>>>>>>>>>>> >>> type(np.sum([1, 2, 3], dtype=np.int64)) == np.int64
>>>>>>>>>>>> True
>>>>>>>>>>>> >>> type(np.sum([1, 2, 3], dtype=np.float32)) == np.float32
>>>>>>>>>>>> True
>>>>>>>>>>>> >>> type(np.sum([1, 2, 3], dtype=np.float64)) == np.float64
>>>>>>>>>>>> True
>>>>>>>>>>>>
>>>>>>>>>>>> So, only the summation performed with a np.int32 accumulator
>>>>>>>>>>>> results in a type that doesn't match the expected type.  Now, for even more
>>>>>>>>>>>> strangeness:
>>>>>>>>>>>>
>>>>>>>>>>>> >>> type(np.sum([1, 2, 3], dtype=np.int32))
>>>>>>>>>>>> <type 'numpy.int32'>
>>>>>>>>>>>> >>> hex(id(type(np.sum([1, 2, 3], dtype=np.int32))))
>>>>>>>>>>>> '0x9599a0'
>>>>>>>>>>>> >>> hex(id(np.int32))
>>>>>>>>>>>> '0x959a80'
>>>>>>>>>>>>
>>>>>>>>>>>> So, the type from the sum() reports itself as a numpy int, but
>>>>>>>>>>>> its memory address is different from the memory address for np.int32.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> One of them is probably a long; print out the typecode,
>>>>>>>>>>> dtype.char.
>>>>>>>>>>>
>>>>>>>>>>> Chuck
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> Good intuition, but odd result...
>>>>>>>>>>
>>>>>>>>>> >>> import numpy as np
>>>>>>>>>> >>> a = np.sum([1, 2, 3], dtype=np.int32)
>>>>>>>>>> >>> b = np.int32(6)
>>>>>>>>>> >>> type(a)
>>>>>>>>>> <type 'numpy.int32'>
>>>>>>>>>> >>> type(b)
>>>>>>>>>> <type 'numpy.int32'>
>>>>>>>>>> >>> a.dtype.char
>>>>>>>>>> 'i'
>>>>>>>>>> >>> b.dtype.char
>>>>>>>>>> 'l'
>>>>>>>>>>
>>>>>>>>>> So, the standard np.int32 is getting listed as a long somehow?  To
>>>>>>>>>> further investigate:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> Yes, long shifts around from int32 to int64 depending on the OS.
>>>>>>>>> For instance, in 64 bit Windows it's 32 bits while in 64 bit Linux it's 64
>>>>>>>>> bits. On 32 bit systems it is 32 bits.
>>>>>>>>>
>>>>>>>>> Chuck
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Right, that makes sense.  But, the question is why does sum() put
>>>>>>>> out a result dtype that is not identical to the dtype that I requested, or
>>>>>>>> even the dtype of the input array?  Could this be an indication of a bug
>>>>>>>> somewhere?  Even if the bug is harmless (it was only noticed within the test
>>>>>>>> suite of larry), is this unexpected?
>>>>>>>>
>>>>>>>>
>>>>>>> I expect sum is using a ufunc and it acts differently on account of
>>>>>>> the cleanup of the ufunc casting rules. And yes, a long *is* int32 on your
>>>>>>> machine. On mine:
>>>>>>>
>>>>>>> In [4]: dtype('q') # long long
>>>>>>> Out[4]: dtype('int64')
>>>>>>>
>>>>>>> In [5]: dtype('l') # long
>>>>>>> Out[5]: dtype('int64')
>>>>>>>
>>>>>>> The mapping from C types to numpy width types isn't 1-1. Personally,
>>>>>>> I think we should drop long ;) But it used to be the standard Python type in
>>>>>>> the C API. Mark has also pointed out the problems/confusion this ambiguity
>>>>>>> causes and someday we should probably think it out and fix it. But I don't
>>>>>>> think it is the most pressing problem.
>>>>>>>
>>>>>>> Chuck
>>>>>>>
>>>>>>>
>>>>>> But isn't it a bug if numpy.dtype('i') != numpy.dtype('l') on a 32 bit
>>>>>> computer where both are int32?
>>>>>>
>>>>>>
>>>>> Maybe yes, maybe no ;) They have different descriptors, so from numpy's
>>>>> perspective they are different, but at the hardware/precision level they are
>>>>> the same. It's more of a decision as to what != means in this case. Since
>>>>> numpy started as Numeric with only the C types, the current behavior is
>>>>> consistent, but that doesn't mean it shouldn't change at some point.
>>>>>
>>>>> Chuck
>>>>>
>>>>
>>>> Well apparently it was actually changed recently, since in Numpy 1.5.1
>>>> on a Windows 32 bit machine, they are considered equal with '=='.
>>>> Personally I think if the string representation of two dtypes is
>>>> "int32", then they should be ==, otherwise it wouldn't make much sense given
>>>> that you can directly test the equality of a dtype with a string like
>>>> "int32" (like dtype('i') == "int32" and dtype('l') == "int32").
>>>>
>>>
>>> I also just checked on a fresh install of numpy 1.6.0 on python 3.2, and
>>> both types are equal as well.
>>>
>>
>> Are you talking about the release of 1.6, or the continued development
>> branch?  This is happening to me on the master branch, but I have not tried
>> earlier versions.  Again, I think this bolsters the evidence that this is
>> from a (very) recent change.
>>
>>
>>> I've been playing quite a bit with numpy dtypes and it's the first time I've
>>> heard of two dtypes representing the exact same kind of data not comparing
>>> equal, so I'm still inclined to believe it should be considered a bug.
>>>
>>>
>> Quite honestly, I really don't care that the dtypes aren't equal.  I
>> usually work at a purely python level and performing actions based on types
>> is generally bad practice anyway.  Anytime that I (rarely) check types, I
>> would use isinstance() against one of the core numerical types rather than a
>> numpy type.  The fact that I even found this issue was completely by
>> accident while investigating a test failure in larry.
>>
>> What concerns me more is that the type coming from the ufunc is not the
>> same type that went in, or even requested through the dtype argument.  I
>> think *that* should be the main concern here, and should probably be tested
>> for in the unit tests.
>>
>> Ben Root
>>
>
> The project I'm working on (http://deeplearning.net/software/theano/)
> relies heavily on dtype.__eq__, because it uses typed objects associated with
> data of e.g. int32 or float64 types, and it needs to know if the provided
> numpy arrays are of the proper type.
> So we do a lot of comparisons like:
>    array.dtype == "int32"
>
> I'd be curious to know, in your case, what is the output of the following
> lines:
> numpy.dtype('i') == "int32"
> numpy.dtype('l') == "int32"
> str(numpy.dtype('i'))
> str(numpy.dtype('l'))
>
>
My output on a 32-bit Ubuntu 11.04 machine with the latest numpy from
master is:

True
True
'int32'
'int32'
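
For anyone who wants to compare the C type codes on their own platform,
here is a rough sketch. The helper names describe() and same_layout() are
made up for illustration and are not part of numpy. It only uses
dtype.char, dtype.name, dtype.kind, dtype.itemsize and dtype.byteorder, and
it deliberately compares the storage layout rather than relying on ==,
since the behavior of == is exactly what is in question in this thread.

```python
import numpy as np


def describe(code):
    """Summarize a dtype built from a type code or name (hypothetical helper)."""
    dt = np.dtype(code)
    return {
        "char": dt.char,          # C-level type code, e.g. 'i', 'l' or 'q'
        "name": dt.name,          # width-based name, e.g. 'int32'
        "kind": dt.kind,          # 'i' for signed integers
        "itemsize": dt.itemsize,  # size in bytes
    }


def same_layout(a, b):
    """True if two dtypes store data identically, whatever their type codes.

    This is an assumption about what "the same kind of data" should mean;
    it is not how numpy itself defines dtype equality.
    """
    a, b = np.dtype(a), np.dtype(b)
    return (a.kind, a.itemsize, a.byteorder) == (b.kind, b.itemsize, b.byteorder)


if __name__ == "__main__":
    # 'i' is C int, 'l' is C long, 'q' is C long long.
    for code in ("i", "l", "q"):
        print(code, describe(code))
    print("i vs l, same layout:", same_layout("i", "l"))
    print("l vs q, same layout:", same_layout("l", "q"))
```

The interesting part is whether 'l' lines up with 'i' or with 'q', which
depends on the OS and word size as Chuck described.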

I will have a new 64-bit machine up and running on Monday (yay!) to do some
further tests on, but I suspect I am currently in the minority here for
architecture type.
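
The further tests would look roughly like the sketch below. It is only a
sketch: check_sum_dtype() is a hypothetical helper, not something in numpy
or its test suite. It reports separately whether the result dtype compares
equal to the requested one, whether the type codes match, and whether the
scalar type object is the same, since those are the three things that
disagreed in the session quoted above.

```python
import numpy as np


def check_sum_dtype(requested):
    """Compare the result of np.sum(..., dtype=requested) with the request.

    Hypothetical helper for illustration only.
    """
    result = np.sum([1, 2, 3], dtype=requested)
    want = np.dtype(requested)
    return {
        "requested": want.name,
        "value": result.item(),
        # Equality as numpy defines it for dtypes.
        "dtype_equal": bool(result.dtype == want),
        # Same C-level type code ('i' vs 'l', for example).
        "same_char": result.dtype.char == want.char,
        # Same scalar type object, i.e. what type(a) == np.int32 checks.
        "same_scalar_type": type(result) is want.type,
    }


if __name__ == "__main__":
    for requested in (np.int32, np.int64, np.float32, np.float64):
        print(check_sum_dtype(requested))
```

If "dtype_equal" is True while "same_scalar_type" is False for np.int32,
that would match what I am seeing on master.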

Ben Root