[SciPy-User] Reading / writing sparse matrices

Lutz Maibaum lutz.maibaum at gmail.com
Thu Nov 11 22:36:37 EST 2010


On Nov 11, 2010, at 5:27 PM, Matthew Brett wrote:
> On Fri, Jun 18, 2010 at 6:31 PM, Lutz Maibaum <lutz.maibaum at gmail.com> wrote:
>> How can I write a sparse matrix with elements of type uint64 to a file, and recover it while preserving the data type? For example:
>> 
>>>>> import numpy as np
>>>>> import scipy.sparse
>>>>> a=scipy.sparse.lil_matrix((5,5), dtype=np.uint64)
>>>>> a[0,0]=9876543210
>> 
>> Now I save this matrix to a file:
>> 
>>>>> import scipy.io
>>>>> scipy.io.mmwrite("test.mtx", a, field='integer')
>> 
>> If I do not specify the field argument of mmwrite, I get an "unexpected dtype of kind u" exception. The generated file test.mtx looks as expected. But when I try to read this matrix, it is converted to int32:
>> 
>>>>> b=scipy.io.mmread("test.mtx")
>>>>> b.dtype
>> dtype('int32')
>>>>> b.data
>> array([-2147483648], dtype=int32)
>> 
>> As far as I can tell, it is not possible to specify a dtype when calling mmread. Is there a better way to go about this?
> 
> I had a quick look at the code, and then at the Matrix Market format,
> and it looks to me:
> 
> http://math.nist.gov/MatrixMarket/reports/MMformat.ps.gz
> 
> as if Matrix Market only allows integer, real or complex - hence the
> (somewhat unhelpful) error.

Yes, the Matrix Market file format has only these three field types (integer, real and complex), and scipy.io.mmwrite (actually, scipy.io.mmio.MMFile._write) has to guess which of them to use for a given dtype:

        if field is None:
            kind = a.dtype.kind
            if kind == 'i':
                field = 'integer'
            elif kind == 'f':
                field = 'real'
            elif kind == 'c':
                field = 'complex'
            else:
                raise TypeError('unexpected dtype kind ' + kind)

It would be nice if this algorithm were extended to handle unsigned integers as well. They seem to have kind == 'u' (although I'm not sure whether checking for that is both necessary and sufficient), and they could also translate to "integer" in the MM file.
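
A minimal sketch of such an extension, written as a standalone function rather than as a patch to MMFile._write. The name guess_mm_field is made up for illustration and is not part of scipy; the sketch also assumes that treating kind 'u' exactly like kind 'i' is acceptable:

```python
import numpy as np

def guess_mm_field(dtype):
    """Map a NumPy dtype to a Matrix Market field name.

    Hypothetical helper, not part of scipy.io. It mirrors the
    dtype.kind check quoted above, with 'u' added next to 'i'.
    """
    kind = np.dtype(dtype).kind
    if kind in ('i', 'u'):      # signed and unsigned integers
        return 'integer'
    elif kind == 'f':           # floating point
        return 'real'
    elif kind == 'c':           # complex floating point
        return 'complex'
    raise TypeError('unexpected dtype kind ' + kind)
```

With this, guess_mm_field(np.uint64) gives 'integer', so the explicit field='integer' argument would no longer be needed when writing the matrix.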

The opposite problem occurs when the file is read by mmread, which has to figure out how to translate the three Matrix Market types to Python's numeric types. Using the system's default types for int, float and complex is very reasonable, but it would be nice if one could override this default by specifying an optional dtype argument (as numpy.loadtxt allows, for example).
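
Until something like that exists, one possible workaround is to parse the file by hand and convert the values directly to the desired dtype. The following is only a sketch under several assumptions: read_mm_coordinate is a made-up name (not a scipy function), and it handles only files with the banner "matrix coordinate integer general", which is the kind of file discussed here. It is not how mmread itself works:

```python
import io
import numpy as np

def read_mm_coordinate(stream, dtype):
    """Parse an integer coordinate Matrix Market stream (hypothetical helper).

    Returns (shape, rows, cols, data), where rows and cols are 0-based
    index arrays and data has the requested dtype.
    """
    lines = (ln.strip() for ln in stream)
    banner = next(lines).split()
    if (len(banner) != 5 or banner[0] != '%%MatrixMarket'
            or [tok.lower() for tok in banner[1:]]
            != ['matrix', 'coordinate', 'integer', 'general']):
        raise ValueError('only "matrix coordinate integer general" is supported')

    # Skip comment lines (starting with '%') and blank lines.
    body = [ln for ln in lines if ln and not ln.startswith('%')]
    n_rows, n_cols, nnz = (int(tok) for tok in body[0].split())
    entries = [ln.split() for ln in body[1:1 + nnz]]

    # Matrix Market indices are 1-based; convert them to 0-based.
    rows = np.array([int(e[0]) - 1 for e in entries], dtype=np.intp)
    cols = np.array([int(e[1]) - 1 for e in entries], dtype=np.intp)
    # Go through Python ints so large values are never squeezed into int32.
    data = np.array([int(e[2]) for e in entries], dtype=dtype)
    return (n_rows, n_cols), rows, cols, data

# Example with file contents like those of test.mtx above:
text = ("%%MatrixMarket matrix coordinate integer general\n"
        "%\n"
        "5 5 1\n"
        "1 1 9876543210\n")
shape, rows, cols, data = read_mm_coordinate(io.StringIO(text), np.uint64)
```

The returned arrays can then be handed to scipy.sparse.coo_matrix((data, (rows, cols)), shape=shape) to rebuild the sparse matrix with the original dtype.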

Thanks for looking into this,

  Lutz



