[SciPy-User] fancy indexation of sparse matrices is terrible slow

Dmitrey tmp50 at ukr.net
Tue Dec 15 06:03:08 EST 2009


От кого: Nathan Bell <wnbell at gmail.com>  
  
  2009/12/14 Dmitrey <tmp50 at ukr.net>:  
>  
> I have tried all 3, lil works faster than any others, but casting to numpy  
> ndarray and invoking it on the dense array obtained works several times  
> faster, that is extremely important for me. Thus on my problem with sparse  
> matrix of shape 151* 1178  
> each time casting it to dense before applying fancy indexing makes numerical  
> optimization ipopt solver works faster from 130 to 60 sec. Here's the code  
> that demonstrates the difference:  
>  
>  
> So, I have filed a ticket for it.  
  
Hi Dmitrey,  
  
I think what you're asking for is simply not possible. The time spent  
looking up elements in a dense matrix (a completely trivial operation)  
will always be less than the lookup in sparse matrix data structures  
where indexing elements is inherently more expensive.    
but what about dok format? I guess it shouldn't be too expensive - it is  O(numberOfRequiredElements*log2(nonZerosNumber)), and there will be no peak memory jump as for lil.  
  
  If you want something that does the above operation really fast then you can do  
>>> M = coo_matrix(M)  
>>> M.data # same as M[I,J]  
or    
But I want to have M[I, J], not just M. Some (or lots of) required elements can be zeros and thus absent in  M.data.  
  
  Why do you want to index sparse matrices in the first place? For many  
operations there are better formulations that avoid the need to do  
indexing of any kind.    
Because some optimization problems can have constraints gradient of size nVars*nConstraints very huge.  nVars are up to ~100'000, and nConstraints are up to 200'000'000. I will just have memory overflow if I will try casting it to dense each time.  
BTW, why return type of numpy.where is int64, but type of scipy.sparse.find is numpy.int32? I had spent lots of time to search why ipopt hadn't work with that data, and it turned out it is due to this type difference (maybe some issues related to connecting Python code to C++). Also, does it mean that I cannot create sparse matrices with one dimension greater than max(int32)  instead of max(int64)?  
Thank you in advance, D.  
  
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.scipy.org/pipermail/scipy-user/attachments/20091215/dc818048/attachment.html>


More information about the SciPy-User mailing list