[Numpy-discussion] advanced indexing bug with huge arrays?

Robert Kern robert.kern at gmail.com
Tue Jan 24 04:15:01 EST 2012


On Tue, Jan 24, 2012 at 08:37, Sturla Molden <sturla at molden.no> wrote:
> On 24.01.2012 09:21, Sturla Molden wrote:
>
>> randomkit.c handles C long correctly, I think. There are different codes
>> for 32 and 64 bit C long, and buffer sizes are size_t.
>
> distributions.c takes C longs as parameters, e.g. for the binomial
> distribution. mtrand.pyx correctly handles this, but it can give an
> unexpected overflow error on 64-bit Windows:
>
>
> In [1]: np.random.binomial(2**31, .5)
> ---------------------------------------------------------------------------
> OverflowError                             Traceback (most recent call last)
> C:\Windows\system32\<ipython-input-1-000aa0626c42> in <module>()
> ----> 1 np.random.binomial(2**31, .5)
>
> C:\Python27\lib\site-packages\numpy\random\mtrand.pyd in
> mtrand.RandomState.binomial (numpy\random\mtrand\mtrand.c:13770)()
>
> OverflowError: Python int too large to convert to C long
>
>
> On systems where C longs are 64 bit, this is likely not to produce an
> error.
>
> This raises the question of whether randomkit.c and distributions.c
> should also be changed to use npy_intp for consistency across all
> platforms.

There are two different uses of long that you need to distinguish. One
is for sizes, and one is for parameters and values. The sizes should
definitely be upgraded to npy_intp. The parameters and values
shouldn't; these should remain as the default integer type of Python
and numpy, a C long.
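
To make the distinction concrete, here is a minimal sketch (not code
from mtrand) that compares the width of a C long with the width of a
pointer-sized signed integer on the current platform. It uses
ctypes.c_ssize_t as a stand-in for npy_intp, which is an assumption
that holds on common platforms; fits_in_signed is a hypothetical helper
written only for this illustration.

```python
import ctypes

# Width in bits of the platform's C long, the type used for parameters/values.
LONG_BITS = ctypes.sizeof(ctypes.c_long) * 8

# Width in bits of a pointer-sized signed integer. ASSUMPTION: c_ssize_t is
# used here as a stand-in for npy_intp; the two match on common platforms.
INTP_BITS = ctypes.sizeof(ctypes.c_ssize_t) * 8


def fits_in_signed(value, bits):
    """Hypothetical helper: True if value fits a signed integer of that width."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


n = 2**31  # the binomial parameter from the report above
print("C long is", LONG_BITS, "bits; 2**31 fits:", fits_in_signed(n, LONG_BITS))
print("ssize_t is", INTP_BITS, "bits; 2**31 fits:", fits_in_signed(n, INTP_BITS))
```

On 64-bit Windows I would expect the first line to report a 32-bit
long that cannot hold 2**31, which matches the OverflowError quoted
above, while the pointer-sized type is 64 bits there.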

The reason longs are used for sizes is that I wrote mtrand for Numeric
and Python 2.4 before numpy was even announced (and I don't think we
had npy_intp at the time I merged it into numpy, but I could be
wrong). Using longs for sizes was the order of the day. I don't think
I had even touched a 64-bit machine that wasn't a DEC Alpha at the
time, so I knew very little about the issues.

So yes, please, fix whatever you can.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


