[Numpy-discussion] Optimization of loops

Thu Oct 23 12:34:52 EDT 2008

> >> spikes = [(0, 2.3),(1, 5.6),(3, 2.5),(0, 5.2),(3, 10.2),(2, 16.2)]
> 
> mysort(spikes) 
> 
> should return:
> 
> [[2.3, 5.2], [5.6], [16.2], [2.5, 10.2]]
> 
> Intuitively, the simplest way to do that is to append elements while going 
> through all the tuples of the main list (called spikes) to empty lists:
> 
> res = [[] for i in xrange(N)]
> 
> for id, time in my_list:
>         res[id].append(time)
> 
> But this loop seems to be incredibly slow for large lists ! 
> 
> A faster way (after having performed some profiling) seems to do:
> spikes = numpy.array(spikes) # Convert the list into a numpy array
> res = []
> for id in xrange(N):
>         res.append(spikes[spikes[:,0] == id, 1]) # Use Numpy indexes
> 
> Nevertheless, this is still rather slow. Does anybody have any idea about a 
> faster way to do this ? Is there a Numpy function that could be used ?

I'm surprised you find using numpy indexing (your second method) is
faster. In my simple test (N=10, 100000 points in the list), your method
1 (using python lists) took 0.15s whereas your method 2 (numpy indexing)
took 1.8s. Big difference. 

I wonder how you are profiling this. I expect the numpy indexing to be
quite fast (and you only do it N times) but the overhead of copying the
entire list into an array is probably the bottleneck. If you need to
iterate over the final output arrays, there's a further penalty in using
a numpy array as iteration is not as fast as for a python list. 

There might be some benefit in using numpy arrays if you can create your
input directly as an array without creating the list-of-tuples first.

If you really need more speed you could try Cython (a means of compiling
python to C-code, while adding type-declarations to permit
optimisation). There may be a worthwhile improvement running the
python-list method through Cython (I'll try this tonight to satisfy my
own curiosity). If you can arrange for the inputs and outputs to be
numpy arrays, then you can do the iteration over the data and copy into
output array using pure C (but coded in Cython; much easier than C).
This will be as fast as it's possible to go. You need one pass to figure
out how big the output arrays need to be then a second to copy the data.

Finally, an even better approach may be to avoid creating such a large
data-structure in the first place. If you can replace your list with a
generator, you save time by avoiding the need to allocate the memory to
hold a large list. Similarly, if you don't need to store the output
lists for repeated use, then outputing a list of generators may be more
efficient. Whether this is viable depends on the context.

HTH
BC

> 
> Thanks in advance, 
> 
> Pierre