NumPy Slow ??

David M. Cooke cookedm at physics.mcmaster.ca
Mon Sep 18 18:28:30 EDT 2000


At some point, hdemers at venus.astro.umontreal.ca (Hugues Demers) wrote:

> Why is it that this line takes ~30 sec to execute on a 500 MHz Pentium III
> with 128 Meg RAM?
> 
> data1 = choose(greater(data,z2),(data,z2))
> 
> where data is a 2048x2080 array of float and z2 is a float.
> 
> I thought that NumPy functions were written in C for faster execution. Or
> maybe I'm wrong and this is fast execution?

Was that a C float or a NumPy Float (which is a C double)? I'll assume
Float.
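
As an aside, the size difference is easy to check directly. In modern
NumPy (Numeric's successor), Float corresponds to float64; a sketch:

```python
import numpy as np

# Numeric's Float is a C double; a C float is half the size.
assert np.dtype(np.float64).itemsize == 8   # C double, Numeric's Float
assert np.dtype(np.float32).itemsize == 4   # C float
```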

The code below took ~2sec on my 550 MHz PIII with 256 Meg RAM:
>>> from Numeric import *
>>> from RandomArray import *
>>> data = random( (2048, 2080) )
>>> data1 = choose(greater(data, 0.5), (data, 0.5))

Note that greater creates a new array (of longs), and so does choose
(of doubles). There are 2048*2080 = 4,259,840 elements per array. A
double is 8 bytes and a long is 4, so the total memory taken by the
three arrays is 2048*2080*(8+4+8) = 85,196,800 bytes, about 81.25 Meg.
It's likely then that your machine has to swap some of that in and
out. At the end, though, the array created by greater should be
garbage collected.
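
A quick back-of-the-envelope check of those numbers (modern Python 3
syntax; 8-byte doubles and 4-byte longs as above):

```python
n = 2048 * 2080                  # elements per array
total = n * (8 + 4 + 8)          # bytes: two double arrays + one long array

print(n)                         # 4259840 elements
print(total / (1024 * 1024))     # 81.25 Meg
```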

Indeed, if I save the array created by greater, I can see that python
is using 83Meg of memory.

Using a Python loop, this took ~15 sec without creating an intermediate
array:
>>> from copy import copy
>>> data1 = copy(data)     # this is fast
>>> d1f = data1.flat
>>> for i in xrange(0, d1f.shape[0]):
...     if d1f[i] > 0.5: d1f[i] = 0.5

Obviously, you could also do this in place. If you really need more
speed, write a C extension.
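
For comparison, in modern NumPy (the successor to Numeric) the same
clipping can be done in place, with neither the mask temporary nor a
second result array, using minimum's out argument; a sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((2048, 2080))

# Clip in place: no intermediate mask array, no copy of the result.
np.minimum(data, 0.5, out=data)
```

Because the computation writes back into data's own buffer, peak memory
stays at the one 32.5 Meg array instead of three arrays' worth.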

Moral of the story: to deal with a lot of data quickly, you need a lot
of memory.

-- 
|>|\/|<
----------------------------------------------------------------------------
David M. Cooke
cookedm at mcmaster.ca
