[Numpy-discussion] np.bincount raises MemoryError when given an empty array

Tue Feb 2 01:03:23 EST 2010

On Tue, Feb 2, 2010 at 12:57 AM,  <josef.pktd at gmail.com> wrote:
> On Tue, Feb 2, 2010 at 12:31 AM, Charles R Harris
> <charlesr.harris at gmail.com> wrote:
>>
>>
>> On Mon, Feb 1, 2010 at 10:02 PM, <josef.pktd at gmail.com> wrote:
>>>
>>> On Mon, Feb 1, 2010 at 11:45 PM, Charles R Harris
>>> <charlesr.harris at gmail.com> wrote:
>>> >
>>> >
>>> > On Mon, Feb 1, 2010 at 9:36 PM, David Cournapeau <cournape at gmail.com>
>>> > wrote:
>>> >>
>>> >> On Tue, Feb 2, 2010 at 1:05 PM,  <josef.pktd at gmail.com> wrote:
>>> >>
>>> >> > I think this could be considered as a correct answer, the count of
>>> >> > any
>>> >> > integer is zero.
>>> >>
>>> >> Maybe, but this shape is random - it would be different in different
>>> >> conditions, as the length of the returned array is just the content of
>>> >> some random memory location.
>>> >>
>>> >> >
>>> >> > Returning an array with one zero, or the empty array or raising an
>>> >> > exception? I don't see much of a pattern
>>> >>
>>> >> Since there is no obvious solution, the only rationale for not raising
>>> >> an exception I could see is to accommodate often-encountered special
>>> >> cases. I find returning [0] more confusing than returning empty
>>> >> arrays, though - maybe there is a use case I don't know about.
>>> >>
>>> >
>>> > In this case I would expect an empty input to be a programming error and
>>> > raising an error to be the right thing.
>>>
>>> Not necessarily, if you run the bincount over groups in a dataset and
>>> you're not sure if every group is actually observed. The main question
>>> is whether the user needs or wants to check for empty groups before or
>>> after the loop over bincount.
>>>
>>
>> How would they know which bin to check? This seems like an unlikely way to
>> check for an empty input.
>
> # grade (e.g. SAT) distribution by school and race
> for s in schools:
>    for r in race:
>      print s, r, np.bincount(allstudentgrades[(sch==s)*(ra==r)])

      a = np.bincount(allstudentgrades[(sch==s)*(ra==r)])
      print s, r, 100.*a /a.sum()

to get the distribution in percent, which comes out empty or nan for an
empty group, depending on what bincount returns
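
If the check for empty groups is done inside the loop, it could look
roughly like the sketch below. The helper name grade_distribution and the
toy data are made up for illustration, and returning an empty array for an
unobserved group is just one of the choices discussed in this thread, not
what bincount itself does.

```python
import numpy as np

def grade_distribution(grades):
    """Percentage of observations per integer grade.

    Hypothetical helper: an empty group gives an empty array instead of
    calling bincount on empty input.
    """
    grades = np.asarray(grades, dtype=np.intp)
    if grades.size == 0:
        # Assumption: an unobserved group is reported as an empty distribution.
        return np.zeros(0)
    counts = np.bincount(grades)
    return 100.0 * counts / counts.sum()

# Made-up data: one grade, one school code and one race code per student.
allstudentgrades = np.array([1, 2, 2, 3, 1, 3, 3])
sch = np.array([0, 0, 0, 1, 1, 1, 1])
ra = np.array([0, 0, 1, 0, 0, 0, 0])

for s in np.unique(sch):
    for r in np.unique(ra):
        # The group (s=1, r=1) has no students, so its mask selects nothing.
        mask = (sch == s) & (ra == r)
        print(s, r, grade_distribution(allstudentgrades[mask]))
```

With this the loop runs through all groups, and the empty ones can be
picked out afterwards by checking the size of the result.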

>
> all-white schools and all-black schools raise an exception.
>
> I just made up the story; my first attempt was: for all sectors and all
> firm-size groups, bincount something. That will have empty cells for some
> size groups, e.g. nuclear power in family businesses.
>
> Josef
>
>>
>>>
>>> Like
>>> >>> np.sum([])
>>> 0.0
>>> >>> sum([])
>>> 0
>>> the empty array or the array([0]) can be considered as the default
>>> argument. In this case it is not really a programming error.
>>>
>>
>> I like that better than an empty array.
>>
>>>
>>> Since bincount usually returns redundant zero counts unless
>>> np.unique(data) == np.arange(data.max()+1),
>>> array([0]) would also make sense as a minimum answer
>>> >>> np.bincount([7,8,9])
>>> array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1])
>>>
>>> I use bincount quite a lot but only with fixed sized arrays, so I
>>> never actually used it in this way (yet).
>>>
>>
>> Chuck
>