[Numpy-discussion] Fancier indexing

Thu May 22 12:22:16 EDT 2008

On Thu, May 22, 2008 at 4:59 PM, Kevin Jacobs <jacobs at bioinformed.com>
<bioinformed at gmail.com> wrote:
> After poking around for a bit, I was wondering if there was a faster method
> for the following:
>
> # Array of index values 0..n
> items = numpy.array([0,3,2,1,4,2],dtype=int)
>
> # Count the number of occurrences of each index
> counts = numpy.zeros(5, dtype=int)
> for i in items:
>   counts[i] += 1
>
> In my real code, 'items' contains up to a million values and this loop will
> be in a performance-critical area of code.  If there is no simple solution,
> I can trivially code this using the C-API.

I would use bincount. A single call,

count = numpy.bincount(items)

should be all you need:


In [192]: items = [0,3,2,1,4,2]

In [193]: bincount(items)
Out[193]: array([1, 1, 2, 1, 1])
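
As a rough check that this gives the same answer as the explicit loop in the
question, here is a minimal sketch. It assumes a NumPy version in which
bincount accepts the minlength argument (not listed in the docstring quoted in
this message), which pads the result so that trailing index values that never
occur still get a zero count. The name n_bins is just for illustration.

```python
import numpy as np

items = np.array([0, 3, 2, 1, 4, 2], dtype=int)
n_bins = 5  # number of possible index values, as in the question

# Reference result: the explicit Python loop from the question
loop_counts = np.zeros(n_bins, dtype=int)
for i in items:
    loop_counts[i] += 1

# Vectorized result; minlength (assumed available) guarantees at least
# n_bins entries even if the largest index never appears in items
counts = np.bincount(items, minlength=n_bins)

assert np.array_equal(counts, loop_counts)
```

Without minlength, the result has max(items) + 1 entries, so it may come out
shorter than the counts array the loop would fill.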

In [194]: bincount?
Type:           builtin_function_or_method
Base Class:     <type 'builtin_function_or_method'>
String Form:    <built-in function bincount>
Namespace:      Interactive
Docstring:
    bincount(x,weights=None)

    Return the number of occurrences of each value in x.

    x must be a list of non-negative integers.  The output, b[i],
    represents the number of times that i is found in x.  If weights
    is specified, every occurrence of i at a position p contributes
    weights[p] instead of 1.

    See also: histogram, digitize, unique.
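
To illustrate the weights argument described in the docstring, here is a small
sketch. The weight values are made up for the example; each position p adds
weights[p] to bin items[p] instead of adding 1.

```python
import numpy as np

items = np.array([0, 3, 2, 1, 4, 2])
# Hypothetical weights, one per entry of items (illustration only)
weights = np.array([0.5, 1.0, 2.0, 1.5, 1.0, 3.0])

# Sum the weights that fall on each index value
sums = np.bincount(items, weights=weights)

# Index 2 occurs at positions 2 and 5, so its bin holds 2.0 + 3.0
assert sums[2] == 5.0
```

With weights given, the result is a floating-point array of per-index sums
rather than integer counts.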

Robin