[SciPy-User] faster nonzero indices

Wed Oct 21 03:55:22 EDT 2009

On Wednesday 21 October 2009 06:02:53 Felix Schlesinger wrote:
> Is there a faster way to do:
>
> foo = scipy.nonzero(bar > 1)[0]
>
> where bar is a 1d ndarray of type 'int32'
> i.e. to get all indices of an array for which a condition is true.
>
> Since in this case the arrays are quite large and the condition is only
> true for few items creating a long boolean array and then passing over it
> again to find non zero entries seems inefficient.

If the number of elements for which the condition is true is indeed small,
and you can afford to keep a precomputed array of indices in memory
(typically an `arange()`), you can try numexpr [1]:

In [1]: import numpy as np

In [2]: import numexpr as ne

In [3]: bar = np.random.randint(0, 10**6, 10**6).astype('int32')

In [4]: timeit np.where(bar > 999000)[0]
100 loops, best of 3: 12.1 ms per loop

In [5]: idx = np.arange(len(bar))

In [6]: timeit idx[ne.evaluate('where(bar > 999000, 1, 0)').astype('bool')]
100 loops, best of 3: 7.68 ms per loop

which is more than 1.5x faster than the numpy counterpart.

Even if you have to compute idx each time, the above approach is faster than 
numpy:

In [7]: timeit np.arange(len(bar))[ne.evaluate('where(bar > 999000, 1, 0)').astype('bool')]
100 loops, best of 3: 11 ms per loop

although in that case only by a meager 10%.
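
For reference, the approach above can be wrapped in a small helper. This is
only a sketch under some assumptions: the function name
`indices_where_greater` and its parameters are made up for illustration, the
comparison is passed to numexpr directly instead of through `where(..., 1, 0)`
(a simplification, not what was timed above), and it falls back to plain numpy
when numexpr is not installed. No timings are claimed for it.

```python
import numpy as np

try:
    import numexpr as ne  # optional; the approach in this post relies on it
except ImportError:
    ne = None


def indices_where_greater(bar, threshold, idx=None):
    """Return the indices i for which bar[i] > threshold.

    Hypothetical helper: `idx` may be a precomputed np.arange(len(bar))
    so that it is not rebuilt on every call.
    """
    if idx is None:
        idx = np.arange(len(bar))
    if ne is not None:
        # numexpr evaluates the comparison and returns a boolean array
        mask = ne.evaluate("bar > threshold",
                           local_dict={"bar": bar, "threshold": threshold})
    else:
        # plain numpy fallback: same mask, computed without numexpr
        mask = bar > threshold
    # boolean indexing keeps only the positions where the mask is True
    return idx[mask]


bar = np.array([0, 5, 1, 7, 2, 0, 9], dtype='int32')
foo = indices_where_greater(bar, 1)
# same indices as the original scipy.nonzero(bar > 1)[0]
assert np.array_equal(foo, np.nonzero(bar > 1)[0])
```

When the function is called many times on arrays of the same length, passing
a precomputed `idx` corresponds to the faster of the two variants timed above.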

[1] http://code.google.com/p/numexpr/

-- 
Francesc Alted