[Numpy-discussion] np.bincount raises MemoryError when given an empty array

Mon Feb 1 23:05:22 EST 2010

On Mon, Feb 1, 2010 at 8:37 PM, David Cournapeau <david at silveregg.co.jp> wrote:
> josef.pktd at gmail.com wrote:
>> On Mon, Feb 1, 2010 at 12:09 PM, Keith Goodman <kwgoodman at gmail.com> wrote:
>>> 2010/2/1 Ernest Adrogué <eadrogue at gmx.net>:
>>>> Hello,
>>>>
>>>> Consider the following code:
>>>>
>>>> for j in range(5):
>>>>        f = np.bincount(x[y == j])
>>>>
>>>> It fails with MemoryError whenever y == j is all False element-wise.
>>>>
>>>>
>>>> In [96]: np.bincount([])
>>>> ---------------------------------------------------------------------------
>>>> MemoryError                               Traceback (most recent call last)
>>>>
>>>> /home/ernest/<ipython console> in <module>()
>>>>
>>>> MemoryError:
>>>>
>>>> In [97]: np.__version__
>>>> Out[97]: '1.3.0'
>>>>
>>>> Is this a bug?
>>>>
>>>> Bye.
>>> I get it to work sometimes:
>>>
>>> $ ipython
>>>>> import numpy as np
>>>>> np.bincount([])
>>> ---------------------------------------------------------------------------
>>> MemoryError:
>>>>> np.bincount(())
>>>   array([0])
>>>>> np.bincount([])
>>>   array([0])
>>>>> np.bincount([])
>>> ---------------------------------------------------------------------------
>>> MemoryError:
>>>>> np.__version__
>>>   '1.4.0rc2'
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion at scipy.org
>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>>
>> I don't get a memory error but the results are strange for empty
>
> That may just be because you have enough memory for the (bogus) result:
> the value is a random memory value interpreted as an intp value, hence
> most likely very large on a 64-bit system.
>
> It should be easy to fix, but I am not sure what the expected result is.
> An empty array?

>>> np.bincount([])
array([0, 0, 0, ..., 0, 0, 0])
>>> np.bincount(np.array([]).astype(int))
array([0, 0, 0, ..., 0, 0, 0])
>>> np.bincount(())
array([0, 0, 0, ..., 0, 0, 0])
>>> np.bincount(()).shape
(41570297,)

I think this could be considered a correct answer: the count of any
integer is zero.

What should it return: an array with one zero, an empty array, or an
exception? I don't see much of a pattern in how other functions handle
empty input:

>>> x=np.arange(5);np.unique(x[x == 7])
array([], dtype=int32)
>>> np.unique(x[x == 7], return_index=1)
(array([], dtype=int32), array([], dtype=bool))
>>> np.unique(x[x == 7], return_inverse=1)
(array([], dtype=int32), array([], dtype=bool))

>>> x=np.arange(5);np.histogram(x[x == 7])
Traceback (most recent call last):
  File "<pyshell#136>", line 1, in <module>
    x=np.arange(5);np.histogram(x[x == 7])
  File "C:\Programs\Python25\Lib\site-packages\numpy\lib\function_base.py", line 202, in histogram
    range = (a.min(), a.max())
ValueError: zero-size array to ufunc.reduce without identity

>>> x=np.arange(5);np.digitize(x[x == 7],np.arange(6))
Traceback (most recent call last):
  File "<pyshell#140>", line 1, in <module>
    x=np.arange(5);np.digitize(x[x == 7],np.arange(6))
ValueError: Both x and bins must have non-zero length
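
The results above depend on the NumPy version, so it may be easier to probe an installation directly than to rely on this transcript. A minimal sketch, assuming only the calls already shown in this message; the `probes` dict and the `results` layout are made up for illustration:

```python
import numpy as np

# Empty integer input, built the same way as in the examples above.
x = np.arange(5)
empty = x[x == 7]

# Functions to probe; each lambda only wraps a call shown in this message.
probes = {
    "unique": lambda a: np.unique(a),
    "histogram": lambda a: np.histogram(a),
    "digitize": lambda a: np.digitize(a, np.arange(6)),
}

results = {}
for name, func in probes.items():
    try:
        results[name] = ("ok", func(empty))
    except Exception as exc:
        # Record the exception type instead of stopping the survey.
        results[name] = ("raised", type(exc).__name__)

for name, (status, value) in results.items():
    print(name, status, value)
```

Whether each entry comes back as "ok" or "raised" is exactly the inconsistency in question, so no expected output is given here.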

The only meaningful test cases I can think of work with either
array([0]) or an empty array:

>>> np.sum(x[x == 7]) == np.bincount(x[x == 7]).sum()
True

>>> 1.*np.array([0]).astype(int) / np.sum(x[x == 7])
array([ NaN])
>>> 1.*np.array([]).astype(int) / np.sum(x[x == 7])
array([], dtype=float64)

>>> count = np.bincount(x[x == 7])
>>> count[count > 0]
array([], dtype=int32)
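
To make the comparison explicit, the same checks can be run against both candidate return values without calling bincount at all. This is only a sketch; the `candidates` dict is a made-up name, and the two arrays stand in for what bincount might return on empty input:

```python
import numpy as np

# The two candidate return values for bincount on empty input.
candidates = {
    "array([0])": np.array([0]),
    "empty": np.array([], dtype=int),
}

for label, count in candidates.items():
    # Both candidates sum to zero, matching np.sum of the empty input.
    assert count.sum() == 0
    # Both candidates have no bins with a positive count.
    assert count[count > 0].size == 0
    # The length is the one place where they differ.
    print(label, "length:", len(count))
```

Both candidates pass the sum and positive-count checks, so these tests alone do not decide between them; only code that looks at the length of the result would notice.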

I'm slightly in favor of returning an empty array rather than the
array([0]) that Keith got.
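
Until the behavior is settled, code like Ernest's loop could guard the empty case itself. A minimal sketch, assuming an empty result is what the caller wants; `safe_bincount` is a hypothetical helper, not part of NumPy, and the `x` and `y` data are invented for illustration:

```python
import numpy as np

def safe_bincount(values):
    """Hypothetical helper: like np.bincount, but returns an empty
    integer array for empty input instead of calling bincount."""
    values = np.asarray(values)
    if values.size == 0:
        # Assumption: an empty array is the desired result here.
        return np.zeros(0, dtype=np.intp)
    return np.bincount(values)

# Made-up data; y == 3 and y == 4 select nothing from x.
x = np.array([0, 1, 1, 3, 2, 1])
y = np.array([0, 0, 1, 1, 2, 2])

counts = [safe_bincount(x[y == j]) for j in range(5)]
for j, c in enumerate(counts):
    print(j, c)
```

The guard checks the size before bincount is ever called, so it does not depend on how a given NumPy version treats empty input.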

Josef

> David