[Numpy-discussion] Numpy array performance issue

Robert Kern robert.kern at gmail.com
Wed Feb 24 11:02:10 EST 2010


On Wed, Feb 24, 2010 at 09:55, Bruno Santos <bacmsantos at gmail.com> wrote:
> Hello everyone,
> I am using numpy arrays whenever I demand performance from my
> algorithms. Nevertheless, I am having a performance issue at the
> moment, mainly because I am iterating several times over numpy arrays.
> For that reason I decided to use timeit to see the performance of
> different versions of the same procedure. What surprised me was that
> Python lists are in fact performing almost ten times faster than
> numpy. Why is this happening?

Pulling items out of an array (either explicitly, or via iteration as
you are doing here) is expensive because numpy needs to make a new
Python object for each item. numpy stores integers and floats
efficiently as their underlying C data, not as Python objects. numpy
is optimized for bulk operations on whole arrays, not for iteration
over the items of an array with Python for loops.
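
For a rough sense of the gap, here is a minimal timeit sketch (the
names and numbers are illustrative; the exact ratio depends on the
machine and the array size):

import timeit

setup = (
    "import random, numpy\n"
    "list1 = [random.randint(0, 20) for i in range(100)]\n"
    "arr1 = numpy.array(list1)\n"
)

# Per-item iteration: numpy has to build a new Python int for each element.
t_iter = timeit.timeit("len([x for x in arr1 if x >= 10])",
                       setup=setup, number=10000)
# Vectorized comparison plus reduction: the loop runs in C.
t_bulk = timeit.timeit("(arr1 >= 10).sum()",
                       setup=setup, number=10000)
print(t_iter, t_bulk)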

> My test code is this:
>
> list1 = [random.randint(0,20) for i in xrange(100)]
> list2 = numpy.zeros(100, dtype='Int64')
> for i in xrange(100): list2[i] = random.randint(0,20)
>
> def test1(listx):
>     return len([elem for elem in listx if elem >= 10])

The idiomatic way of doing this for numpy arrays would be:

def test2(arrx):
    return (arrx >= 10).sum()
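
As a quick sanity check that the two approaches agree (the array
construction below is just a stand-in for the quoted setup):

import random
import numpy

list1 = [random.randint(0, 20) for i in range(100)]
arr1 = numpy.array(list1)

# (arr1 >= 10) is a boolean array; summing it counts the True entries.
assert (arr1 >= 10).sum() == len([elem for elem in list1 if elem >= 10])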

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


