[SciPy-dev] slow setitem

Wed Dec 2 03:24:27 EST 2009

2009/12/1 Nathan Bell <wnbell at gmail.com>:
> 2009/11/30 Benny Malengier <benny.malengier at gmail.com>:
>> A question about the sparse matrix and suggestion for speed improvement.
>>
>> I have a code of 1000*1000 sparse matrix of 11 full diagonals. Using csr.
>> Assigning the matrix via setdiags takes 14 seconds, of which 13
>> seconds is in the check_format function of sparse/base.py.
>>
>> This is due to setdiag doing:
>>
>>     self[i, i + k] = v
>>
>> while setitem in compressed.py always does a
>>
>>     self.check_format(full_check=True)
>>
>> Removing the check_format reduces assignment of the matrix to 1 second.
>> Is it really needed that the assignmentof setdiag is again checked via
>> check_format?
>> Allowing for a setitem that does not call check_format would be usefull.
>>
>> One could set a dirty flag on a setitem call, and do a check_format
>> only when a computation is performed and the flag is dirty. On the
>> other hand, I'm not making errors, and to have a flag like
>> NOCHECKS=TRUE would be handy as it would speed up the code a lot.
>> I could create a solution that is acceptable for scipy, but then I'd
>> need to know how such a solution should look like.
>>
>> Note that there are quite some zeroes in my diagonals now too, that
>> are now set to 0., if setitem would be fast, I could drop setdiag
>> call, and do a fast setitem call. Or are sparse matrixes supposed to
>> be used differently?
>
> Hi Pavol,
>
> Try creating your matrix with the spdiags() function or the related
> dia_matrix class.  Constructing a matrix from diagonals with one of
> these methods will be at least 100x faster than inserting into a
> csr_matrix.
>
> We don't optimize the csr_matrix element insertion code because it's
> inherently slow, checks or no checks.  Every time you introduce a new
> nonzero the csr_matrix must perform an O(N) update to the underlying
> data structure (hence, inserting N elements is O(N^2)).

Thanks,

As an update, we need to solve a matrixequation some 2000 times, but
it is a fixed band structure so there is no problem with changing the
data structure after the first creation via spdiags. After having
looked at the scipy code I see we can do it a lot more performant
probably by using a array as required for lapack, and then doing
sl.lapack.get_lapack_funcs(['gbsv'])
to obtain the banded lapack solver. I'll try if really faster later this week.

With spsolve we made it performant now by doing direct setting of
spmat.data, as fortnunately the data structure of a sparse matrix is
available in the api, and it is easy to know where an element of a
banded matrix ends up in a csr matrix. This made me think it might be
usefull to create a banded matrix class on top of csr, it would allow
for really fast setting of complete diagonals. But as said, probably
quicker to just use the lapack array structure for a banded sparse
matrix (but not if far off diagonals as arise eg in discretization via
method of lines of 2/3D problems)

So, looking back at this, I think the main problem for scipy is:
1/need to add documentation in functions as setdiag and in base
classes that it is slow, so not good to use in loops
2/need to add documentation in
http://docs.scipy.org/doc/scipy/reference/sparse.html in the API of
sparse matrixes, so indicating one can access data, indexes, .... of
csc and csr matrixes to achieve great speedup (if you know what you
are doing).
3/need to add documentation with example of how one can solve sparse
matrixic using lapack. A new users that wants to solve a sparse matrix
system with scipy looks in the documentation and arrives at spsolve,
unaware it probably is not the best way to solve his problem. It is
strange for experienced people to see new PhD students starting and
having no idea that blas and lapack exist.
Note that if you search in google on 'lapack scipy' you do not find
any example at all.

So I guess when finished coding, I should make a small tutorial or so
about solving a large system with scipy. The doc of sparse I find so
lacking I wouldn't know where to begin to update it. I wouldn't mind
spending a day improving the doc of sparse, but I'm really using this
part of scipy for the first time, so it would be colored with my
interpretation.

Benny

>
> --
> Nathan Bell wnbell at gmail.com
> http://www.wnbell.com/
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev
>