[Numpy-discussion] strange divergence in performance

Wed Jan 20 16:56:40 EST 2010

Hi,

I have a function in which a 1-d array of integers is compared
element-wise to an integer using the greater-than operator.
I noticed that when the integer is 0 the call takes about 75% more
time than when it is 1 or 2. Is there an explanation?

Here is a stripped-down version that (sort of) shows what I mean:

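import numpy as np
from functools import reduce  # builtin in Python 2; this import is needed on Python 3

def filter_array(array, f1, f2, flag=False):
    # k is the threshold used in the final greater-than comparison
    if flag:
        k = 1
    else:
        k = 0

    # m1 / m2: True where the field equals any of the values in f1 / f2
    m1 = reduce(np.add, [(array['f1'] == i).astype(int) for i in f1]) > 0
    m2 = reduce(np.add, [(array['f2'] == i).astype(int) for i in f2]) > 0

    # count how many of the two masks matched per element, then compare to k
    mask = reduce(np.add, (i.astype(int) for i in (m1, m2))) > k
    return array[mask]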
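
To spell out what the final comparison against k does, here is a tiny worked example. The two boolean arrays are made-up stand-ins for m1 and m2, not data from my program:

```python
import numpy as np

# Made-up stand-ins for the two per-field masks m1 and m2.
m1 = np.array([True, True, False, False])
m2 = np.array([True, False, True, False])

# Number of fields that matched for each element: 0, 1 or 2.
total = m1.astype(int) + m2.astype(int)

mask_k0 = total > 0   # flag=False: at least one field matched
mask_k1 = total > 1   # flag=True: both fields matched

print(mask_k0)   # [ True  True  True False]
print(mask_k1)   # [ True False False False]
```

So with k = 0 the mask can never select fewer elements than with k = 1.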

Now let's create an array with two fields:

a = np.array(list(zip(np.random.random_integers(0, 10, size=5000),
                      np.random.random_integers(0, 10, size=5000))),
             dtype=[('f1', int), ('f2', int)])

Now call the function with flag=True and flag=False, and see what happens:

In [29]: %timeit filter_array(a, (6,), (0,), flag=False)
1000 loops, best of 3: 536 us per loop

In [30]: %timeit filter_array(a, (6,), (0,), flag=True)
1000 loops, best of 3: 245 us per loop

In this example the ratio is roughly 1:2; in my actual program it
is about 1:4. I am at a loss as to what causes this.
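
One way to narrow this down might be to time the greater-than comparison and the final indexing step separately, and to print how many rows each value of k selects. This is only a minimal sketch: the seed, the variable names and the way the array is filled are my own choices for illustration, and I am not claiming any particular result from it.

```python
import timeit
import numpy as np

# Same layout as the array above, filled with a fixed seed (an arbitrary choice).
rng = np.random.RandomState(0)
a = np.zeros(5000, dtype=[('f1', int), ('f2', int)])
a['f1'] = rng.randint(0, 11, size=5000)   # integers in [0, 10]
a['f2'] = rng.randint(0, 11, size=5000)

# Per-element count of matching fields, as in filter_array(a, (6,), (0,)).
total = (a['f1'] == 6).astype(int) + (a['f2'] == 0).astype(int)

for k in (0, 1):
    mask = total > k
    # Time the comparison alone, then the boolean indexing alone.
    t_cmp = timeit.timeit(lambda: total > k, number=1000)
    t_idx = timeit.timeit(lambda: a[mask], number=1000)
    print("k=%d rows=%d compare=%.4fs index=%.4fs"
          % (k, mask.sum(), t_cmp, t_idx))
```

If the two compare timings come out about the same, the difference would have to come from somewhere other than the comparison with 0.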

Bye.