[SciPy-User] faster nonzero indices
Francesc Alted
faltet at pytables.org
Wed Oct 21 03:55:22 EDT 2009
On Wednesday 21 October 2009 06:02:53 Felix Schlesinger wrote:
> Is there a faster way to do:
>
> foo = scipy.nonzero(bar > 1)[0]
>
> where bar is a 1d ndarray of type 'int32'
> i.e. to get all indices of an array for which a condition is true.
>
> Since in this case the arrays are quite large and the condition is only
> true for few items creating a long boolean array and then passing over it
> again to find non zero entries seems inefficient.
If only a small fraction of the elements satisfy the condition, and you can
afford to keep a precomputed array of indices in memory (typically an
`arange()`), you can try numexpr [1]:
In [1]: import numpy as np
In [2]: import numexpr as ne
In [3]: bar = np.random.randint(0, 10**6, 10**6).astype('int32')
In [4]: timeit np.where(bar > 999000)[0]
100 loops, best of 3: 12.1 ms per loop
In [5]: idx = np.arange(len(bar))
In [6]: timeit idx[ne.evaluate('where(bar > 999000, 1, 0)').astype('bool')]
100 loops, best of 3: 7.68 ms per loop
which is more than 1.5x faster than the numpy counterpart.
Even if you have to compute idx each time, the above approach is faster than
numpy:
In [7]: timeit np.arange(len(bar))[ne.evaluate('where(bar > 999000, 1, 0)').astype('bool')]
100 loops, best of 3: 11 ms per loop
although in that case only by about 10%.
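For anyone who wants to try this outside an IPython session, the two approaches above can be wrapped in a small script. This is only a sketch: the function names `indices_numpy` and `indices_masked_arange` are made up for illustration, and the fallback to plain NumPy when numexpr is not installed is my assumption, not part of the timings above. The script only checks that both approaches return the same indices; speed depends on the machine and library versions, so measure it yourself with `timeit`.

```python
import numpy as np

try:
    import numexpr as ne  # optional dependency, see [1]
except ImportError:
    ne = None  # assumption: fall back to plain NumPy for the mask


def indices_numpy(bar, threshold):
    """Baseline: build a boolean mask, then scan it for True entries."""
    return np.where(bar > threshold)[0]


def indices_masked_arange(bar, threshold, idx=None):
    """Pick the matching positions out of an index array with a mask.

    Pass a precomputed `idx` (an arange of len(bar)) to avoid rebuilding
    it on every call.
    """
    if idx is None:
        idx = np.arange(len(bar))
    if ne is not None:
        # Same expression as in the session, with the operands passed
        # explicitly instead of being looked up in the caller's frame.
        mask = ne.evaluate(
            'where(bar > threshold, 1, 0)',
            local_dict={'bar': bar, 'threshold': threshold},
        ).astype('bool')
    else:
        mask = bar > threshold
    return idx[mask]


# Small self-check on random data of the same shape as in the session.
rng = np.random.default_rng(0)
bar = rng.integers(0, 10**6, size=10**6).astype('int32')
idx = np.arange(len(bar))  # precomputed once, reused across calls

expected = indices_numpy(bar, 999000)
assert np.array_equal(indices_masked_arange(bar, 999000, idx), expected)
assert np.array_equal(indices_masked_arange(bar, 999000), expected)
```

Passing `idx` corresponds to the precomputed case in In [6]; leaving it out corresponds to In [7], where the `arange()` is rebuilt on each call.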
[1] http://code.google.com/p/numexpr/
--
Francesc Alted