[Numpy-discussion] Slicing slower than matrix multiplication?

Sat Dec 12 06:59:16 EST 2009

Francesc Alted wrote:
> ...
> Yeah, I think taking slices here is taking quite a lot of time:
> 
> In [58]: timeit E + Xi2[P/2,:]
> 100000 loops, best of 3: 3.95 µs per loop
> 
> In [59]: timeit E + Xi2[P/2]
> 100000 loops, best of 3: 2.17 µs per loop
> 
> I don't know why the additional ',:' in the slice takes so much time, but my 
> guess is that passing & analyzing the second argument (slice(None,None,None)) 
> could be responsible for the slowdown (though that still seems like too much 
> time).  Mmh, perhaps it would be worth studying this more carefully so that 
> an optimization could be done in NumPy.

This is indeed interesting! And very nice that this actually works the 
way you'd expect it to. I guess I've just worked too long with Matlab :)
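
For anyone who wants to repeat that comparison outside IPython, here is a
minimal sketch using the standard timeit module. The sizes N and P, the
layout of Xi2 (one row per column of Xi) and the random data are
assumptions for illustration, not the arrays from the original test.
It uses P // 2 rather than P/2 so that it also runs under Python 3.

```python
import timeit
import numpy as np

# Assumed sizes and layout; the real test data is not shown in this thread.
N, P = 60, 20
rng = np.random.default_rng(0)
Xi2 = rng.standard_normal((P, N))   # assumed: one row per column of Xi
E = rng.standard_normal(N)

def add_row_full():
    # Index with an explicit trailing ',:' (an int plus a full slice).
    return E + Xi2[P // 2, :]

def add_row_short():
    # Index with the integer only.
    return E + Xi2[P // 2]

# Both forms select the same row, so the sums must be identical.
same = np.array_equal(add_row_full(), add_row_short())

loops = 10000
t_full = min(timeit.repeat(add_row_full, number=loops, repeat=3)) / loops
t_short = min(timeit.repeat(add_row_short, number=loops, repeat=3)) / loops

print("same result:", same)
print("E + Xi2[P//2,:] : %.3e s per loop" % t_full)
print("E + Xi2[P//2]   : %.3e s per loop" % t_short)
```

The absolute numbers will depend on the machine and the NumPy version, so
only the relative difference between the two forms is worth reading.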

>> I think the lesson mostly should be that with so little data,
>> benchmarking becomes a very difficult art.
> 
> Well, I think it is not difficult, it is just that you are perhaps 
> benchmarking Python/NumPy machinery instead ;-)  I'm curious whether Matlab 
> can do slicing much faster than NumPy.  Jasper?

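I had a look. The first number on each line is the total time in seconds
for 10000 iterations, with the per-iteration time in parentheses. These
are the timings for Python for 60x20: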
   Dot product: 0.051165 (5.116467e-06 per iter)
   Add a row: 0.092849 (9.284860e-06 per iter)
   Add a column: 0.082523 (8.252348e-06 per iter)
For Matlab 60x20:
   Dot product: 0.029927 (2.992664e-006 per iter)
   Add a row: 0.019664 (1.966444e-006 per iter)
   Add a column: 0.008384 (8.384376e-007 per iter)
For Python 600x200:
   Dot product: 1.917235 (1.917235e-04 per iter)
   Add a row: 0.113243 (1.132425e-05 per iter)
   Add a column: 0.162740 (1.627397e-05 per iter)
For Matlab 600x200:
   Dot product: 1.282778 (1.282778e-004 per iter)
   Add a row: 0.107252 (1.072525e-005 per iter)
   Add a column: 0.021325 (2.132527e-006 per iter)

If I fit a line through these two data points (60 and 600 rows), using 
the total times above, I get the following equations, where n is the 
number of rows, AR is "add a row" and AC is "add a column":
   Python, AR: 3.8e-5 * n + 0.091
   Matlab, AC: 2.4e-5 * n + 0.0069
This would suggest that Matlab performs the vector addition about 1.6 
times faster and has a 13 times smaller constant cost!
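
The arithmetic behind those two lines can be checked with a few lines of
Python. This is only a sketch: the helper name line_through is made up
for this example, and the inputs are the total times from the tables
above (Python "add a row" and Matlab "add a column").

```python
def line_through(p1, p2):
    """Return (slope, intercept) of the line through two (x, y) points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept

# x = number of rows, y = total time in seconds, taken from the tables above.
py_slope, py_icpt = line_through((60, 0.092849), (600, 0.113243))   # Python, AR
ml_slope, ml_icpt = line_through((60, 0.008384), (600, 0.021325))   # Matlab, AC

slope_ratio = py_slope / ml_slope   # how much faster Matlab scales per row
icpt_ratio = py_icpt / ml_icpt      # how much smaller Matlab's constant cost is

print("Python, AR: %.1e * n + %.3f" % (py_slope, py_icpt))
print("Matlab, AC: %.1e * n + %.4f" % (ml_slope, ml_icpt))
print("slope ratio: %.2f, intercept ratio: %.1f" % (slope_ratio, icpt_ratio))
```

With only two data points per line the fit is exact by construction, so
these numbers say nothing about how well a straight line describes the
sizes in between.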

As for the questions about what I'm trying to compute: these tests are 
minimized as much as possible to show the bottleneck I encountered. They 
are part of a larger loop where it does make sense. In essence I'm 
iteratively adjusting w, and E has to keep up (because that's what is 
used to determine the next change). Instead of recomputing E from 
scratch each time as E = Xi*w, a little linear algebra shows that the 
vector addition is sufficient.
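
A minimal sketch of that idea, under the assumption that each step
changes a single component w[j] by some amount delta (the actual update
rule of the larger loop is not shown here). In that case the new E equals
the old E plus delta times column j of Xi, so one vector addition
replaces the full matrix-vector product. The sizes, the random data and
the values of j and delta are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, P = 60, 20                        # assumed sizes, matching the 60x20 test
Xi = rng.standard_normal((N, P))
w = rng.standard_normal(P)
E = Xi @ w                           # full product, computed once up front

# Hypothetical single-component update of w.
j, delta = 3, 0.25
w[j] += delta

# Incremental update: Xi @ (w + delta * e_j) == Xi @ w + delta * Xi[:, j]
E += delta * Xi[:, j]

# Check against recomputing E from scratch.
matches = np.allclose(E, Xi @ w)
print("incremental update matches full product:", matches)
```

The incremental version costs O(N) per step instead of O(N*P), which is
why the speed of the slice-and-add matters more than the dot product in
this kind of loop.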