[Numpy-discussion] when and where to use numpy arrays vs nested lists

Charles R Harris charlesr.harris at gmail.com
Thu Mar 1 12:26:34 EST 2007


On 3/1/07, Mark P. Miller <mpmusu at cc.usu.edu> wrote:
>
> I've been using Numpy arrays for some work recently.  Just for fun, I
> compared some "representative" code using Numpy arrays and an object
> comprised of nested lists to represent my arrays.  To my surprise, the
> array of nested lists outperformed Numpy in this particular application
> (in my actual code by 10%, results below are more dramatic).
>
> Can anyone shed some insight here?  The functions that I use in reality
> are much more complicated than those listed below, but they are
> nonetheless representative of the type of thing that I'm doing.
>
>
> ##imports
> import timeit
> import numpy as NP
> from numpy.random import randint
>
> #numpy array code
> array1 = NP.zeros((50,50), int)
>
> def random1():
>      c = array1[randint(10), randint(10)]
>
> >>> t=timeit.Timer("random1()", "from __main__ import random1")
> >>> t.timeit(10000)
> 0.1085283185432786
> >>> t.timeit(10000)
> 0.10784806448862128
> >>> t.timeit(10000)
> 0.1095533091495895
>
>
> #python 2d array based on nested lists
> array2 = []
> for aa in xrange(50):
>     array2.append([])
>     for bb in xrange(50):
>         array2[aa].append([])
>         array2[aa][bb] = 0
>
> def random2():
>      c = array2[randint(50)][randint(50)]
>
> >>> t=timeit.Timer("random2()", "from __main__ import random2")
> >>> t.timeit(10000)
> 0.076737965300061717
> >>> t.timeit(10000)
> 0.072883564810638291
> >>> t.timeit(10000)
> 0.07668181291194287


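I'm going to guess that it is the indexing. Numpy tends to be slow when you
pull out one element at a time with explicit indexes, because every lookup
pays a fixed per-call overhead, so that pattern should be avoided when
possible. Vectorization, i.e. passing whole index arrays in a single call,
also helps in this case.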

In [19]: def random1() :
   ....:     c = array1[randint(10), randint(10)]
   ....:

In [20]: def random2() :
   ....:     i = randint(10, size=10000)
   ....:     j = randint(10, size=10000)
   ....:     c = array1[i,j]
   ....:

In [21]: t=timeit.Timer("random1()", "from __main__ import random1")

In [22]: t.timeit(10000)
Out[22]: 0.032405853271484375


In [28]: t=timeit.Timer("random2()", "from __main__ import random2")

In [29]: t.timeit(1)
Out[29]: 0.0022358894348144531

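That speeds things up about 15x: 10000 lookups take about 0.0022 s in one
vectorized call, versus about 0.032 s as 10000 separate calls.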
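
For anyone who wants to try the comparison outside an interactive session,
here is a minimal self-contained sketch of the same idea. The function names
scalar_lookups and vectorized_lookups and the constant N are made up for
illustration; they are not from the code above. It is written for a current
Python 3 and NumPy, and the timings will of course differ by machine and
version, so no expected numbers are given.

```python
import timeit

import numpy as np
from numpy.random import randint

N = 10000                      # number of lookups (hypothetical name)
array1 = np.zeros((50, 50), dtype=int)


def scalar_lookups():
    # One Python-level indexing operation per element: each lookup
    # pays the per-call indexing overhead separately.
    c = None
    for _ in range(N):
        c = array1[randint(50), randint(50)]
    return c


def vectorized_lookups():
    # Draw all index pairs up front, then do a single indexing call
    # with two index arrays; the result has shape (N,).
    i = randint(50, size=N)
    j = randint(50, size=N)
    return array1[i, j]


t_scalar = timeit.timeit(scalar_lookups, number=1)
t_vector = timeit.timeit(vectorized_lookups, number=1)
print("scalar:     %.6f s for %d lookups" % (t_scalar, N))
print("vectorized: %.6f s for %d lookups" % (t_vector, N))
```

Both functions do the same number of lookups, so the two timings can be
compared directly. Whether the vectorized form fits depends on being able to
compute the indexes ahead of time, which may not hold in more complicated code.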

Chuck