[Numpy-discussion] strange divergence in performance

Wed Jan 20 17:26:01 EST 2010

20/01/10 @ 16:17 (-0600), thus spake Robert Kern:
> 2010/1/20 Ernest Adrogué <eadrogue at gmx.net>:
> > Hi,
> >
> > I have a function where an array of integers (1-d) is compared
> > element-wise to an integer using the greater-than operator.
> > I noticed that when the integer is 0 it takes about 75% more time
> > than when it's 1 or 2. Is there an explanation?
> >
> > Here is a stripped-down version which does (sort of) show what I say:
> >
> > def filter_array(array, f1, f2, flag=False):
> >
> >    if flag:
> >        k = 1
> >    else:
> >        k = 0
> >
> >    m1 = reduce(np.add, [(array['f1'] == i).astype(int) for i in f1]) > 0
> >    m2 = reduce(np.add, [(array['f2'] == i).astype(int) for i in f2]) > 0
> >
> >    mask = reduce(np.add, (i.astype(int) for i in (m1, m2))) > k
> >    return array[mask]
> >
> > Now let's create an array with two fields:
> >
> > a = np.array(zip( np.random.random_integers(0,10,size=5000), np.random.random_integers(0,10,size=5000)), dtype=[('f1',int),('f2',int)])
> >
> > Now call the function with flag=True and flag=False, and see what happens:
> >
> > In [29]: %timeit filter_array(a, (6,), (0,), flag=False)
> > 1000 loops, best of 3: 536 us per loop
> >
> > In [30]: %timeit filter_array(a, (6,), (0,), flag=True)
> > 1000 loops, best of 3: 245 us per loop
> >
> > In this example the difference seems to be 1:2. In my program
> > it is 1:4. I am at a loss about what causes this.
> 
> It is not the > operator that exhibits the difference.
> 
> In [28]: x = np.random.random_integers(0,10,size=5000)
> 
> In [29]: %timeit m = x > 0
> 100000 loops, best of 3: 19.1 us per loop
> 
> In [30]: %timeit m = x > 1
> 100000 loops, best of 3: 19.3 us per loop
> 
> 
> The difference is in array[mask]. There are necessarily fewer True
> elements in the mask for >1 than for >0, so fewer elements have to
> be copied into the result.

Ahh, I see... seems obvious now.

Thanks!
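
For anyone who wants to reproduce this, here is a minimal sketch that separates the two steps. It is not the original program: it assumes a recent NumPy (np.random.default_rng instead of random_integers), and the names hits, mask_any and mask_all are made up for illustration. It counts the True elements in each mask and times the comparison and the indexing separately.

```python
import timeit
import numpy as np

# Assumption: default_rng stands in for the random_integers call used in the thread.
rng = np.random.default_rng(0)
a = np.empty(5000, dtype=[('f1', int), ('f2', int)])
a['f1'] = rng.integers(0, 11, size=5000)  # upper bound is exclusive: values 0..10
a['f2'] = rng.integers(0, 11, size=5000)

# Number of conditions each row satisfies (0, 1 or 2), as in filter_array.
hits = (a['f1'] == 6).astype(int) + (a['f2'] == 0).astype(int)

mask_any = hits > 0  # flag=False: at least one field matches
mask_all = hits > 1  # flag=True: both fields match

print("True elements:", int(mask_any.sum()), "vs", int(mask_all.sum()))

# Time each step on its own; absolute numbers depend on the machine.
steps = [
    ("hits > 0", lambda: hits > 0),
    ("hits > 1", lambda: hits > 1),
    ("a[mask_any]", lambda: a[mask_any]),
    ("a[mask_all]", lambda: a[mask_all]),
]
for label, func in steps:
    best = min(timeit.repeat(func, number=200, repeat=3)) / 200
    print("%-12s %.1f us per call" % (label, best * 1e6))
```

The two comparisons should take about the same time, while the two masks select very different numbers of rows, which is where any remaining difference has to come from.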