[SciPy-user] Getting the right numerical libraries for scipy

josef.pktd at gmail.com
Fri Apr 3 10:49:57 EDT 2009


On Fri, Apr 3, 2009 at 10:12 AM, David Cournapeau
<david at ar.media.kyoto-u.ac.jp> wrote:
> David Cournapeau wrote:
>>
>> I think it will depend on your dot implementation (does it use atlas or
>> some other heavily optimized implementation). This is to be taken with a
>> (big) grain of salt since I don't know much about sparse matrices, but
>> if the distribution is purely random, then I can see how sparse matrices
>> would be much slower than contiguous arrays. Memory access is often the
>> bottleneck for simple FPU operations on big data, and random memory
>> access just kills performance (it can be an order of magnitude slower -
>> a cache miss on a modern CPU costs ~250 cycles).

I think CSR and CSC matrices store their non-zero elements contiguously
by row and by column respectively, so the positions of the non-zero
elements might not matter so much for random memory access.
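
For example, here is a quick sketch of the CSR internals (just a toy
matrix to show the storage layout):

import numpy as np
from scipy import sparse

# small dense matrix with a few non-zero elements
a = np.array([[1., 0., 2.],
              [0., 0., 3.],
              [4., 5., 0.]])
csr = sparse.csr_matrix(a)

# the non-zero values sit in one contiguous array, with the column
# indices and the row pointers stored alongside
print(csr.data)     # [ 1.  2.  3.  4.  5.]
print(csr.indices)  # [0 2 2 0 1]
print(csr.indptr)   # [0 2 3 5]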

I'm surprised that sparse dot only takes about twice as long as np.dot
on my computer (with ATLAS from the 1.3.0b1 installer) for matrices
larger than (500, 500) with 100% density.
William mentioned 100 times slower, and Stefan saw more than 5 times slower.
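
Roughly, the comparison I mean looks like this (just a sketch; the exact
numbers depend on the ATLAS build and the machine, and the size here is
arbitrary):

import time
import numpy as np
from scipy import sparse

n = 1000
a = np.random.rand(n, n)   # 100% density
b = np.random.rand(n, n)
sa = sparse.csr_matrix(a)

t0 = time.time()
c_dense = np.dot(a, b)
t1 = time.time()
c_sparse = sa * b          # sparse * dense product, returns a dense result
t2 = time.time()

print('np.dot     %.3f s' % (t1 - t0))
print('sparse dot %.3f s' % (t2 - t1))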

The only time I tried to use sparse matrices was for kernel ridge
regression. When we use the "kernel trick" from machine learning, the
kernel matrix is the number of observations squared, which can get very
large and can take a long time to invert or to solve a linear matrix
equation with. On my "small" computer, I worried more about memory than
about speed.
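
The kind of computation I mean is roughly this (a rough sketch; the
Gaussian kernel and the ridge value are only for illustration):

import numpy as np
from scipy import linalg

def gaussian_kernel(x, scale=1.0):
    # pairwise squared distances -> (nobs, nobs) kernel matrix,
    # so memory grows with the square of the number of observations
    sq_dists = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2.0 * scale ** 2))

nobs, nvars = 200, 5
x = np.random.randn(nobs, nvars)
y = np.random.randn(nobs)

ridge = 1e-3   # regularization parameter
K = gaussian_kernel(x)
# solve (K + ridge * I) coef = y instead of forming an explicit inverse
coef = linalg.solve(K + ridge * np.eye(nobs), y)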

Josef


