[Numpy-discussion] change default integer from int32 to int64 on win64?

Julian Taylor jtaylor.debian at googlemail.com
Wed Jul 23 16:34:40 EDT 2014


On 23.07.2014 22:04, Robert Kern wrote:
> On Wed, Jul 23, 2014 at 8:50 PM, Julian Taylor
> <jtaylor.debian at googlemail.com> wrote:
>> On 23.07.2014 20:54, Robert Kern wrote:
>>> On Wed, Jul 23, 2014 at 6:19 PM, Julian Taylor
>>> <jtaylor.debian at googlemail.com> wrote:
>>>> hi,
>>>> it recently came to my attention that the default integer type in numpy
>>>> on windows 64 bit is a 32 bit integer [0].
>>>> This seems like a quite serious problem as it means you can't use any
>>>> integers created from python integers < 32 bit to index arrays larger
>>>> than 2GB.
>>>> For example np.product(array.shape) which will never overflow on linux
>>>> and mac, can overflow on win64.
>>>
>>> Currently, on win64, we use Python long integer objects for `.shape`
>>> and related attributes. I wonder if we could return numpy int64
>>> scalars instead. Then np.product() (or anything else that consumes
>>> these via np.asarray()) would infer the correct dtype for the result.
>>
>> this might be a less invasive alternative that might solve a lot of the
>> incompatibilities, but it would probably also change np.arange(5) and
>> similar functions to int64 which might change the dtype of a lot of
>> arrays. The difference to just changing it everywhere might not be so
>> large anymore.
> 
> No, np.arange(5) would not change behavior given my suggestion, only
> the type of the integer objects in ndarray.shape and related tuples.

The entries of ndarray.shape are not numpy scalars but python objects, so
they would always be converted back to 32 bit integers when given back to
numpy.
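
A minimal sketch of this point, assuming only a current numpy install. The
array `a` and the variable names are made up for illustration. The dtype
numpy picks for the shape tuple is whatever the default integer is on the
platform running the code, so the printed dtype is platform dependent.

```python
import numpy as np

a = np.zeros((3, 4))

# Each entry of .shape is a plain Python int, not a numpy scalar.
shape_types = [type(n) for n in a.shape]

# Handing the tuple back to numpy infers the platform default integer dtype.
default_int = np.asarray(a.shape).dtype

# Asking for int64 explicitly sidesteps the platform default.
explicit = np.asarray(a.shape, dtype=np.int64)

print(shape_types, default_int, explicit.dtype)
```

Passing dtype=np.int64 explicitly is the kind of source-level workaround
discussed in this thread.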

> 
>>>> I think this is a very dangerous platform difference and a quite large
>>>> inconvenience for win64 users so I think it would be good to fix this.
>>>> This would be a very large change of API and probably also ABI.
>>>
>>> Yes. Not only would it be a very large change from the status quo, I
>>> think it introduces *much greater* platform difference than what we
>>> have currently. The assumption that the default integer object
>>> corresponds to the platform C long, whatever that is, is pretty
>>> heavily ingrained.
>>
>> This should be only a concern for the ABI which can be solved by simply
>> recompiling.
>> In comparison, the API being different on win64 compared to all other
>> platforms is something that needs source level changes.
> 
> No, the API is no different on win64 than other platforms. Why do you
> think it is? The win64 platform is a weird platform in this respect,
> having made a choice that other 64-bit platforms didn't, but numpy's
> API treats it consistently. When we say that something is a C long,
> it's a C long on all platforms.

The API is different if you consider it from a python perspective.
The default integer dtype should be sufficiently large to index into any
numpy array, that's what I call an API here. win64 behaves differently: you
have to explicitly upcast your index to be able to index all memory.
But API or ABI is just semantics here, what I actually mean is the
difference of source changes vs recompiling to deal with the issue.
Of course there might be C code that needs more than recompiling, but it
should not be that much, it would have to be already somewhat
broken/restrictive code that uses numpy buffers without first checking
which type it has.
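
To make the overflow concrete, here is a rough sketch that emulates a
32 bit default integer by forcing dtype=np.int32, so it behaves the same on
every platform. The shape below is a made-up example and no array of that
size is actually allocated.

```python
import numpy as np

# Hypothetical shape with 2.5e9 elements, more than 2**31 - 1.
shape = (50000, 50000)

# Emulate a 32 bit default integer: the product silently wraps around.
n32 = np.prod(np.array(shape, dtype=np.int32), dtype=np.int32)

# With a 64 bit integer the element count is computed correctly.
n64 = np.prod(np.array(shape, dtype=np.int64), dtype=np.int64)

print(n32, n64)
```

The explicit upcast to int64 is the source change that win64 code needs in
order to get the right element count.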

There can also be python code that might need source changes, e.g. code
that memory maps a binary file written on win32 with np.int_, assuming
np.int_ is also 32 bit on win64, but this would already be broken on linux
and mac now.
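
A small sketch of why such code is fragile, using an in-memory buffer in
place of a real memory mapped file (the buffer contents are made up). Reading
32 bit data with a 64 bit dtype does not fail, it just yields different
values, so a fixed width dtype is the safer choice for binary data.

```python
import numpy as np

# Pretend this is a file written with a 32 bit little-endian integer dtype.
raw = np.arange(4, dtype="<i4").tobytes()

# Reading with the explicit on-disk dtype recovers the data everywhere.
ok = np.frombuffer(raw, dtype="<i4")

# Reading with a 64 bit integer fuses pairs of values into one element.
wrong = np.frombuffer(raw, dtype="<i8")

print(ok, wrong)
```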

>>>> But as we also never officially released win64 binaries we could change
>>>> it for from-source compilations and give win64 binary distributors the
>>>> option to keep the old ABI/API at their discretion.
>>>
>>> That option would make the problem worse, not better.
>>
>> maybe, I'm not familiar with the numpy win64 distribution landscape.
>> Is it not like linux where you have one distributor per workstation
>> setup that can update all its packages to a new ABI in one go?
> 
> No. There tend to be multiple providers.
> 



