[Numpy-discussion] Timing array construction

Mark Janikas mjanikas at esri.com
Thu Apr 30 12:51:55 EDT 2009


Thanks Eric!

I have a lot of array constructions in my code that use NUM.array([list of values])... I am going to replace them with the empty allocation and insertion.  It is indeed about twice as fast as "c_" (when it matters, i.e. when N is relatively large):

N, c_, empty
100, 0.0007, 0.0230
200, 0.0007, 0.0002
400, 0.0007, 0.0002
800, 0.0020, 0.0002
1600, 0.0009, 0.0003
3200, 0.0010, 0.0003
6400, 0.0013, 0.0005
12800, 0.0058, 0.0032
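
For anyone who wants to rerun this comparison, here is a minimal sketch of a timing harness. It is not the exact script behind the numbers above: it assumes Python 3 and NumPy 1.17 or later (for np.random.default_rng), and the names use_c, use_empty, and time_methods are made up for illustration. Like the original snippet, it reports the total time in seconds for 10 calls per method.

```python
import timeit

import numpy as np


def use_c(x, y):
    # General-purpose column concatenation via the c_ index trick.
    return np.c_[x, y]


def use_empty(x, y):
    # Allocate the (N, 2) result once, then fill each column directly.
    out = np.empty((len(x), 2), dtype=x.dtype)
    out[:, 0] = x
    out[:, 1] = y
    return out


def time_methods(sizes, number=10, seed=0):
    # Hypothetical helper: returns (n, seconds_c, seconds_empty) per size,
    # where each time is the total for `number` calls.
    rng = np.random.default_rng(seed)
    rows = []
    for n in sizes:
        x = rng.normal(10, 1, n)
        y = rng.normal(10, 1, n)
        t_c = timeit.timeit(lambda: use_c(x, y), number=number)
        t_empty = timeit.timeit(lambda: use_empty(x, y), number=number)
        rows.append((n, t_c, t_empty))
    return rows


if __name__ == "__main__":
    print("N, c_, empty")
    for n, t_c, t_empty in time_methods([100, 200, 400, 800, 1600, 3200, 6400, 12800]):
        print("%i, %0.4f, %0.4f" % (n, t_c, t_empty))
```

Absolute times will differ by machine and NumPy version, and the first measurement in a run can include one-time overhead, so repeating the run a few times gives a fairer picture.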

-----Original Message-----
From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Eric Firing
Sent: Wednesday, April 29, 2009 11:49 PM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] Timing array construction

Mark Janikas wrote:
> Hello All,
>
> I was exploring some different ways to concatenate arrays, and using
> "c_" is the fastest by far.  Is there a difference I am missing that can
> account for the huge disparity?  Obviously the "zip" function makes the
> "asarray" and "array" calls slower, but the same arguments (xCoords,
> yCoords) are being passed to the methods... so if there is no difference
> in the outputs (there doesn't appear to be), then what reason would I
> have to use "array" or "asarray" in this context?  Thanks so much ahead
> of time.

If you really want speed, use something like this:

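import numpy as np

def useEmpty(xCoords, yCoords):
    # Allocate the (N, 2) result once, without initializing its contents
    out = np.empty((len(xCoords), 2), dtype=xCoords.dtype)
    # Copy each coordinate array straight into its own column
    out[:, 0] = xCoords
    out[:, 1] = yCoords
    return out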

It is quite a bit faster than using c_; more than a factor of two on my 
machine for all your test cases.

All your methods using zip and array are doing a lot of unpacking, 
repacking, checking, iterating... Even the c_ method is slower than it 
needs to be for this case because it is more general and flexible.
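
As a quick sanity check that the approaches are interchangeable for this case, the sketch below builds the same (N, 2) array three ways and compares the results. It assumes Python 3, where zip returns an iterator and has to be wrapped in list() before being handed to np.array; the function names are illustrative only.

```python
import numpy as np


def use_array(x, y):
    # Pairs are built one Python tuple at a time, then converted to an array.
    return np.array(list(zip(x, y)))


def use_c(x, y):
    # General-purpose column concatenation.
    return np.c_[x, y]


def use_empty(x, y):
    # Single allocation followed by two column assignments.
    out = np.empty((len(x), 2), dtype=x.dtype)
    out[:, 0] = x
    out[:, 1] = y
    return out


x = np.array([1.0, 2.0, 3.0])
y = np.array([10.0, 20.0, 30.0])

a = use_array(x, y)
b = use_c(x, y)
c = use_empty(x, y)

print(a.shape, b.shape, c.shape)  # (3, 2) (3, 2) (3, 2)
print(np.array_equal(a, b) and np.array_equal(b, c))  # True
```

Since the outputs match, the choice among them for this two-column case comes down to speed and readability.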

Eric
>
> MJ
>
> ############## Snippet ###################
>
> import numpy as NUM
>
> def useAsArray(xCoords, yCoords):
>     # list() keeps this working where zip returns an iterator (Python 3)
>     return NUM.asarray(list(zip(xCoords, yCoords)))
>
> def useArray(xCoords, yCoords):
>     return NUM.array(list(zip(xCoords, yCoords)))
>
> def useC(xCoords, yCoords):
>     return NUM.c_[xCoords, yCoords]
>
> if __name__ == "__main__":
>     from timeit import Timer
>     import numpy.random as RAND
>     import collections as COLL
>
>     resAsArray = COLL.defaultdict(float)
>     resArray = COLL.defaultdict(float)
>     resMat = COLL.defaultdict(float)
>     numTests = 0.0
>     sameTests = 0.0
>     N = [100, 200, 400, 800, 1600, 3200, 6400, 12800]
>     for i in N:
>         print("Time Join List into Array for N = " + str(i))
>         xCoords = RAND.normal(10, 1, i)
>         yCoords = RAND.normal(10, 1, i)
>
>         # Each timing is the total for 10 calls
>         statement = 'from __main__ import xCoords, yCoords, useAsArray'
>         t1 = Timer('useAsArray(xCoords, yCoords)', statement)
>         resAsArray[i] = t1.timeit(10)
>
>         statement = 'from __main__ import xCoords, yCoords, useArray'
>         t2 = Timer('useArray(xCoords, yCoords)', statement)
>         resArray[i] = t2.timeit(10)
>
>         statement = 'from __main__ import xCoords, yCoords, useC'
>         t3 = Timer('useC(xCoords, yCoords)', statement)
>         resMat[i] = t3.timeit(10)
>
>     for n in N:
>         print("%i, %0.4f, %0.4f, %0.4f" % (n, resAsArray[n],
>                                            resArray[n], resMat[n]))
>
> ###############################################################
>
> RESULT
>
> N, useAsArray, useArray, useC
> 100, 0.0066, 0.0065, 0.0007
> 200, 0.0137, 0.0140, 0.0008
> 400, 0.0277, 0.0288, 0.0007
> 800, 0.0579, 0.0577, 0.0008
> 1600, 0.1175, 0.1289, 0.0009
> 3200, 0.2291, 0.2309, 0.0012
> 6400, 0.4561, 0.4564, 0.0013
> 12800, 0.9218, 0.9122, 0.0019
>
> Mark Janikas
> Product Engineer
> ESRI, Geoprocessing
> 380 New York St.
> Redlands, CA 92373
> 909-793-2853 (2563)
> mjanikas at esri.com
>
> ------------------------------------------------------------------------
> 
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
