[Numpy-discussion] strange divergence in performance

Robert Kern robert.kern at gmail.com
Wed Jan 20 17:17:26 EST 2010


2010/1/20 Ernest Adrogué <eadrogue at gmx.net>:
> Hi,
>
> I have a function where an array of integers (1-d) is compared
> element-wise to an integer using the greater-than operator.
> I noticed that when the integer is 0 it takes about 75% more time
> than when it's 1 or 2. Is there an explanation?
>
> Here is a stripped-down version that (sort of) shows what I mean:
>
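> import numpy as np
> from functools import reduce  # builtin in Python 2; needs this import in Python 3
>
> def filter_array(array, f1, f2, flag=False):
>
>    # k is the threshold on the number of field conditions a row must exceed
>    if flag:
>        k = 1
>    else:
>        k = 0
>
>    # m1/m2: True where the field equals any of the requested values
>    m1 = reduce(np.add, [(array['f1'] == i).astype(int) for i in f1]) > 0
>    m2 = reduce(np.add, [(array['f2'] == i).astype(int) for i in f2]) > 0
>
>    mask = reduce(np.add, (i.astype(int) for i in (m1, m2))) > k
>    return array[mask]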
>
> Now let's create an array with two fields:
>
> a = np.array(list(zip(np.random.random_integers(0,10,size=5000),
>                       np.random.random_integers(0,10,size=5000))),
>              dtype=[('f1',int),('f2',int)])
>
> Now call the function with flag=True and flag=False, and see what happens:
>
> In [29]: %timeit filter_array(a, (6,), (0,), flag=False)
> 1000 loops, best of 3: 536 us per loop
>
> In [30]: %timeit filter_array(a, (6,), (0,), flag=True)
> 1000 loops, best of 3: 245 us per loop
>
> In this example the difference seems to be 1:2. In my program
> it is 1:4. I am at a loss about what causes this.

It is not the > operator that exhibits the difference.

In [28]: x = np.random.random_integers(0,10,size=5000)

In [29]: %timeit m = x > 0
100000 loops, best of 3: 19.1 us per loop

In [30]: %timeit m = x > 1
100000 loops, best of 3: 19.3 us per loop


The difference is in the array[mask] indexing step. There are necessarily
fewer True elements in the mask for >1 than for >0, so fewer elements have
to be copied into the result.
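
To make that concrete, here is a minimal sketch that counts the True
elements in each mask. It is not the original poster's data: it assumes a
seeded np.random.RandomState and uses randint(0, 11) in place of
random_integers(0, 10), and the names mask_any, mask_all, n_any and n_all
are made up for illustration. With k = 0 the mask selects rows where either
field matches; with k = 1 it selects only rows where both match.

```python
import numpy as np

# Assumption: a fixed seed so the sketch is repeatable; the original used
# unseeded np.random.random_integers(0, 10, size=5000).
rng = np.random.RandomState(0)
n = 5000

# Structured array with two integer fields, values in 0..10 inclusive.
a = np.empty(n, dtype=[('f1', int), ('f2', int)])
a['f1'] = rng.randint(0, 11, size=n)
a['f2'] = rng.randint(0, 11, size=n)

# Same per-field conditions as filter_array(a, (6,), (0,), ...).
m1 = a['f1'] == 6
m2 = a['f2'] == 0

# Number of conditions each row satisfies: 0, 1 or 2.
total = m1.astype(int) + m2.astype(int)

mask_any = total > 0   # flag=False: at least one field matches
mask_all = total > 1   # flag=True: both fields match

n_any = int(mask_any.sum())
n_all = int(mask_all.sum())

print("rows selected with > 0:", n_any)
print("rows selected with > 1:", n_all)

# array[mask] copies exactly one row per True element.
selected_any = a[mask_any]
selected_all = a[mask_all]
```

Timing a[mask_any] and a[mask_all] separately with %timeit should show
whether the indexing step alone accounts for the gap on a given machine.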

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


