[SciPy-dev] sparsetools - new and (hopefully) improved!

Robert Cimrman cimrman3 at ntc.zcu.cz
Wed Jan 10 04:46:54 EST 2007


Nathan Bell wrote:
> Well, for an NxM matrix with R nonzeros per row, the old method is
> O(N*R + N + M)
> while row-by-row sorting is
> O(N * R log R + M)
> so asymptotically the old method wins (for sufficiently large R).
> However, even for R=1000 my testing shows the latter to still be
> noticeably faster (see below).  For the common case (R < 100), the row
> by row sort is ~5-8x faster.  While these tests aren't comprehensive,
> I expect the row by row sort to maintain an advantage in all
> (practical) cases of interest.
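
The row-by-row variant under discussion can be sketched roughly like this (a sketch only, not the actual sparsetools C++ code; the function name is mine): for each row of a CSR matrix, argsort the column indices and apply the same permutation to the data, which is the O(N * R log R) cost quoted above.

```python
import numpy as np
from scipy import sparse

def sort_csr_rows(A):
    """Sort the column indices within each row of a CSR matrix in place,
    carrying the data along (an O(N * R log R) row-by-row sort)."""
    A = A.tocsr()
    indptr, indices, data = A.indptr, A.indices, A.data
    for i in range(A.shape[0]):
        start, end = indptr[i], indptr[i + 1]
        order = np.argsort(indices[start:end], kind="stable")
        indices[start:end] = indices[start:end][order]
        data[start:end] = data[start:end][order]
    return A
```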

So it's OK. I have just looked at
http://en.wikipedia.org/wiki/Category:Sort_algorithms - there are some
non-comparison sort algorithms with only O(R) complexity. They usually
need a temporary array the size of the range of the values (which is
already there), and moreover the values to be sorted never repeat, so we
might be able to do even better. But I am happy for now.
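
To illustrate the non-comparison idea (a sketch, not a proposal for the actual implementation; the names are mine): since the column indices within one row are unique and lie in [0, M), a single scatter/gather pass with a scratch array of size M sorts them without any comparisons. Note the gather step as written scans the whole scratch array, so it costs O(R + M) per row; a real implementation would have to restrict the scan to the occupied range to get close to O(R).

```python
import numpy as np

def bucket_sort_row(indices, data, M, scratch=None, seen=None):
    """Sort one row's unique column indices (and its data) by scattering
    into a scratch array of size M, then gathering the occupied buckets."""
    if scratch is None:
        scratch = np.empty(M, dtype=data.dtype)
    if seen is None:
        seen = np.zeros(M, dtype=bool)
    scratch[indices] = data              # scatter: O(R)
    seen[indices] = True
    sorted_idx = np.flatnonzero(seen)    # gather occupied buckets: O(M)
    seen[sorted_idx] = False             # reset so the arrays can be reused
    return sorted_idx, scratch[sorted_idx]
```

Reusing `scratch` and `seen` across all N rows keeps the extra memory at O(M), i.e. the temp array that is already there.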

The main advantage is the halved memory usage - we can see it in the
'sys' part of the timings below.

> from scipy import *
> N = 10000
> 
> #R = 1000
> A = sparse.spdiags(rand(1000,N),arange(1000)-500,N,N)
> 
> In [5]: time B = A.tocsr().tocsc()
> CPU times: user 2.12 s, sys: 0.84 s, total: 2.96 s
> Wall time: 2.98
> In [6]: time A.ensure_sorted_indices(True)
> CPU times: user 0.89 s, sys: 0.00 s, total: 0.89 s
> Wall time: 0.89

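For reference, a self-contained variant of the benchmark above, with smaller sizes so it runs quickly. I am assuming the later SciPy spelling here: `sort_indices()` is the in-place row-by-row sort (the `ensure_sorted_indices(True)` of the quoted test), and `tocsc().tocsr()` plays the role of the old two-pass conversion method.

```python
import time
import numpy as np
from scipy import sparse

N, R = 2000, 100
A = sparse.spdiags(np.random.rand(R, N), np.arange(R) - R // 2, N, N)

B = A.tocsr()
t0 = time.time()
C = B.tocsc().tocsr()          # old method: sort by converting twice
t_old = time.time() - t0

B.has_sorted_indices = False   # force a real sort even if already sorted
t0 = time.time()
B.sort_indices()               # row-by-row sort, in place
t_new = time.time() - t0
print("tocsc().tocsr(): %.3f s   sort_indices(): %.3f s" % (t_old, t_new))
```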


