[Numpy-discussion] np.bincount raises MemoryError when given an empty array
josef.pktd at gmail.com
Mon Feb 1 23:05:22 EST 2010
On Mon, Feb 1, 2010 at 8:37 PM, David Cournapeau <david at silveregg.co.jp> wrote:
> josef.pktd at gmail.com wrote:
>> On Mon, Feb 1, 2010 at 12:09 PM, Keith Goodman <kwgoodman at gmail.com> wrote:
>>> 2010/2/1 Ernest Adrogué <eadrogue at gmx.net>:
>>>> Hello,
>>>>
>>>> Consider the following code:
>>>>
>>>> for j in range(5):
>>>>     f = np.bincount(x[y == j])
>>>>
>>>> It fails with MemoryError whenever y == j is all False element-wise.
>>>>
>>>>
>>>> In [96]: np.bincount([])
>>>> ---------------------------------------------------------------------------
>>>> MemoryError Traceback (most recent call last)
>>>>
>>>> /home/ernest/<ipython console> in <module>()
>>>>
>>>> MemoryError:
>>>>
>>>> In [97]: np.__version__
>>>> Out[97]: '1.3.0'
>>>>
>>>> Is this a bug?
>>>>
>>>> Bye.
>>> I get it to work sometimes:
>>>
>>> $ ipython
>>>>> import numpy as np
>>>>> np.bincount([])
>>> ---------------------------------------------------------------------------
>>> MemoryError:
>>>>> np.bincount(())
>>> array([0])
>>>>> np.bincount([])
>>> array([0])
>>>>> np.bincount([])
>>> ---------------------------------------------------------------------------
>>> MemoryError:
>>>>> np.__version__
>>> '1.4.0rc2'
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion at scipy.org
>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>>
>> I don't get a memory error but the results are strange for empty
>
> That may just be because you have enough memory for the (bogus) result:
> the value is a random memory value interpreted as an intp value, hence
> most likely very big on a 64-bit system.
>
> It should be easy to fix, but I am not sure what the expected result is.
> An empty array?
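To make that concrete: the length of the result is one more than the
largest input value, and an empty input has no largest value, so the
length has to be chosen explicitly. Below is a pure-Python sketch of the
semantics only. It is not NumPy's C implementation, bincount_reference
is a made-up name, and returning an empty result for empty input is just
one of the options under discussion.

```python
def bincount_reference(values):
    """Pure-Python sketch of bincount semantics (hypothetical helper)."""
    values = list(values)
    if not values:
        # No maximum exists for empty input, so the result length must be
        # decided explicitly. Assumption: choose length 0 (empty result).
        return []
    if min(values) < 0:
        raise ValueError("values must be non-negative integers")
    # Result length is max + 1, so every value has its own bin.
    counts = [0] * (max(values) + 1)
    for v in values:
        counts[v] += 1
    return counts


print(bincount_reference([0, 1, 1, 3]))  # [1, 2, 0, 1]
print(bincount_reference([]))            # []
```

Without the guard, max() on the empty input has nothing to return, which
is the step where the size ends up undefined.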
>>> np.bincount([])
array([0, 0, 0, ..., 0, 0, 0])
>>> np.bincount(np.array([]).astype(int))
array([0, 0, 0, ..., 0, 0, 0])
>>> np.bincount(())
array([0, 0, 0, ..., 0, 0, 0])
>>> np.bincount(()).shape
(41570297,)
I think this could be considered a correct answer: the count of any
integer is zero.
Should it return an array with one zero, return the empty array, or raise
an exception? I don't see much of a pattern in related functions:
>>> x=np.arange(5);np.unique(x[x == 7])
array([], dtype=int32)
>>> np.unique(x[x == 7], return_index=1)
(array([], dtype=int32), array([], dtype=bool))
>>> np.unique(x[x == 7], return_inverse=1)
(array([], dtype=int32), array([], dtype=bool))
>>> x=np.arange(5);np.histogram(x[x == 7])
Traceback (most recent call last):
  File "<pyshell#136>", line 1, in <module>
    x=np.arange(5);np.histogram(x[x == 7])
  File "C:\Programs\Python25\Lib\site-packages\numpy\lib\function_base.py", line 202, in histogram
    range = (a.min(), a.max())
ValueError: zero-size array to ufunc.reduce without identity
>>> x=np.arange(5);np.digitize(x[x == 7],np.arange(6))
Traceback (most recent call last):
  File "<pyshell#140>", line 1, in <module>
    x=np.arange(5);np.digitize(x[x == 7],np.arange(6))
ValueError: Both x and bins must have non-zero length
The only meaningful test cases I can think of work with both
array([0]) and the empty array:
>>> np.sum(x[x == 7]) == np.bincount(x[x == 7]).sum()
True
>>> 1.*np.array([0]).astype(int) / np.sum(x[x == 7])
array([ NaN])
>>> 1.*np.array([]).astype(int) / np.sum(x[x == 7])
array([], dtype=float64)
>>> count = np.bincount(x[x == 7])
>>> count[count > 0]
array([], dtype=int32)
I'm slightly in favor of returning an empty array rather than
array([0]) as Keith got it.
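Until bincount itself is fixed, the same behavior can be had from user
code with a small guard. This is only a sketch: bincount_or_empty is a
made-up helper name, and the x and y arrays are invented sample data
standing in for Ernest's.

```python
import numpy as np


def bincount_or_empty(values):
    """Hypothetical wrapper: np.bincount, but empty in gives empty out."""
    values = np.asarray(values)
    if values.size == 0:
        # Guard: never pass an empty array to np.bincount.
        return np.zeros(0, dtype=np.intp)
    return np.bincount(values)


# Invented sample data: no element of y equals 2 or 3.
x = np.array([0, 1, 1, 3, 3, 3])
y = np.array([0, 0, 1, 1, 4, 4])

# Same shape as the original loop, with the guarded call.
counts = [bincount_or_empty(x[y == j]) for j in range(5)]
for j, f in enumerate(counts):
    print(j, f)
```

With the guard, the loop no longer needs a special case for the groups
where y == j is all False, and count[count > 0] still works on the
empty result.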
Josef
> David