Which sparse matrix package?

Martin Manns mmanns at gmx.net
Thu Dec 18 17:18:51 EST 2008


Hi:

I am writing a spreadsheet application in Python

http://pyspread.sf.net

which currently uses numpy.array for:

+ storing object references 
  (each array element corresponds to one grid cell)
+ slicing (read and write)
+ mapping from/to smaller numpy.array
+ searching and replacing
+ growing and shrinking
+ transpose operation.

While fast and stable, this is really memory inefficient for large grids
(i.e. larger than 1E7 rows and columns), so that I am looking into dicts
and sparse matrices.

The dict that I tried out is of the type:

{(1,2,3): "2323", (1,2,545): "2324234", ... }

It is too slow for my application when it grows. One slicing operation
with list comprehensions takes about 1/2 s on my computer for 1E6
elements.

Therefore, I looked into sparse matrices and found scipy.sparse and
pysparse. I tried out both lil_matrix objects. (I wrote a wrapper that
turns them into Python object arrays.)

scipy.sparse.lil_matrix allowed __getitem__ slicing only for one of the
dimensions and used much memory when increasing the number of
columns above 1E7.

pysparse.spmatrix.ll_mat was faster, uses less space and allows slicing
for both dimensions. However, its methods are not documented well and
I am not able to compile it in Debian testing due to some g77
dependencies. Even though the deb package works well, I am concerned
about having a dependency to a problematic package.


Now my questions:

Is there a better suited / maintained module for such sparse matrices
(or multi-dim arrays)? 

Should I use another type of matrix in scipy.sparse? If yes which?

Does a different data-structure suit my above-stated needs better?

Best Regards

Martin




More information about the Python-list mailing list