[SciPy-Dev] scipy.sparse: add save and load functions for sparse matrices

Joscha Reimer jor at informatik.uni-kiel.de
Mon Aug 15 05:35:30 EDT 2016


Hello,

I would like to propose a new save and load functionality for sparse 
matrices in SciPy.

So far, the scipy.io.savemat/loadmat functions allow saving and loading
sparse matrices in the MATLAB file format (versions 4 and 5). However,
this has some serious drawbacks.

Big (sparse) matrices cannot be stored in a mat file (versions 4 and 5),
since at most 2^31 bytes per variable are supported.

Besides, sparse matrices are always stored in a mat file in csc matrix
format. Thus, the original matrix format is not preserved. If another
matrix format is used, the matrix has to be converted from the original
format to csc before saving and back to the original format after
loading. For large matrices this can take a lot of time. In addition,
the indices must be sorted in a mat file, which can take a lot of
additional time.

Since the sparse matrices are always stored in csc format, the
advantages of other matrix formats regarding disk consumption cannot be
exploited. For example, some suitable block matrices can be stored with
much less disk consumption in bsr matrix format than in csc matrix format.
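
To illustrate the disk consumption argument, here is a rough sketch that
compares the size of the arrays defining a block-diagonal matrix in bsr
and in csc format. The matrix, the block size and the helper
array_bytes are made up for this example, and the in-memory array sizes
are only a proxy for what an uncompressed file would contain.

```python
import numpy as np
import scipy.sparse as sp

n_blocks, b = 200, 8  # illustrative sizes, not taken from the proposal

# Block-diagonal matrix with dense b x b blocks on the diagonal.
bsr = sp.bsr_matrix(
    sp.kron(sp.identity(n_blocks), np.ones((b, b)), format="bsr"),
    blocksize=(b, b),
)
csc = bsr.tocsc()


def array_bytes(matrix):
    """Bytes used by the arrays that define a csc, csr or bsr matrix."""
    return matrix.data.nbytes + matrix.indices.nbytes + matrix.indptr.nbytes


print("bsr:", array_bytes(bsr), "bytes")
print("csc:", array_bytes(csc), "bytes")
```

Both formats hold the same values, but bsr keeps one column index per
block while csc keeps one row index per stored entry, so the index
arrays are much smaller in bsr for a matrix like this one.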

I propose to store the data arrays of the sparse matrix directly,
together with the matrix format, in one file using NumPy's savez and
savez_compressed functions. The matrix can then be reconstructed on
loading without much effort.
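
A minimal sketch of this idea for the csc, csr and bsr formats could
look as follows. The function names save_sparse and load_sparse, the
compressed parameter and the key names in the file are assumptions made
for this illustration; they are not necessarily what the pull request
uses.

```python
import numpy as np
import scipy.sparse as sp

# Formats that are fully described by data, indices and indptr.
_COMPRESSED_FORMATS = {
    "csc": sp.csc_matrix,
    "csr": sp.csr_matrix,
    "bsr": sp.bsr_matrix,
}


def save_sparse(file, matrix, compressed=True):
    """Store the defining arrays of a csc, csr or bsr matrix in one file."""
    if matrix.format not in _COMPRESSED_FORMATS:
        raise NotImplementedError("format %r is not handled here" % matrix.format)
    save = np.savez_compressed if compressed else np.savez
    save(
        file,
        format=np.array(matrix.format),  # 0-d string array, no pickling needed
        shape=np.array(matrix.shape),
        data=matrix.data,
        indices=matrix.indices,
        indptr=matrix.indptr,
    )


def load_sparse(file):
    """Rebuild the matrix in its original format from the stored arrays."""
    with np.load(file) as loaded:
        fmt = loaded["format"].item()
        shape = tuple(int(n) for n in loaded["shape"])
        cls = _COMPRESSED_FORMATS[fmt]
        # For bsr the block size is inferred from the shape of the data array.
        return cls((loaded["data"], loaded["indices"], loaded["indptr"]),
                   shape=shape)
```

Since the arrays are written as they are, no format conversion and no
index sorting happens on saving or loading.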

This can be done easily for the csc, csr, bsr, dia and coo formats.
(The remaining dok and lil formats should only be used for constructing
sparse matrices anyway and then be converted to another matrix format.)
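
As a rough sketch of why these five formats are easy to handle: each of
them is fully described by its shape and a small, fixed set of NumPy
arrays. The helper names defining_arrays and rebuild below are
hypothetical and only serve this illustration.

```python
import numpy as np
import scipy.sparse as sp


def defining_arrays(matrix):
    """Return the arrays that, together with the shape, define the matrix."""
    fmt = matrix.format
    if fmt in ("csc", "csr", "bsr"):
        names = ("data", "indices", "indptr")
    elif fmt == "coo":
        names = ("data", "row", "col")
    elif fmt == "dia":
        names = ("data", "offsets")
    else:
        # dok and lil are not backed by a few flat arrays like this.
        raise NotImplementedError("format %r is not handled here" % fmt)
    return {name: getattr(matrix, name) for name in names}


def rebuild(fmt, shape, arrays):
    """Reconstruct a matrix of the given format from its defining arrays."""
    if fmt == "coo":
        return sp.coo_matrix(
            (arrays["data"], (arrays["row"], arrays["col"])), shape=shape)
    if fmt == "dia":
        return sp.dia_matrix((arrays["data"], arrays["offsets"]), shape=shape)
    cls = {"csc": sp.csc_matrix, "csr": sp.csr_matrix, "bsr": sp.bsr_matrix}[fmt]
    return cls((arrays["data"], arrays["indices"], arrays["indptr"]), shape=shape)
```

The dictionary returned by defining_arrays could be passed to savez as
keyword arguments, with the format name and the shape stored alongside.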

This would allow storing big sparse matrices and benefiting from the
advantages of the different matrix formats.

A pull request (for the csc, csr and bsr matrix formats) is here:
https://github.com/scipy/scipy/pull/6394

Best regards,
Joscha Reimer



