[SciPy-User] Reading / writing sparse matrices

Fri Nov 12 03:46:17 EST 2010

Hi,

On Thu, Nov 11, 2010 at 9:33 PM, Lutz Maibaum <lutz.maibaum at gmail.com> wrote:
> On Thu, Nov 11, 2010 at 8:58 PM, Matthew Brett <matthew.brett at gmail.com> wrote:
>> The problem I can see is that this would be confusing:
>>
>> a=scipy.sparse.lil_matrix((5,5), dtype=np.uint64)
>> a[0,0]=9876543210
>> mmwrite(fname, a)
>> res = mmread(fname)
>> res.data
>> array([-2147483648], dtype=int32)
>>
>> That is, I think the writer shouldn't silently write something that
>> will be read back incorrectly by default. So, how about a
>> compromise:
>>
>> In [7]: mmwrite(fname, a)
>> ---------------------------------------------------------------------------
>> TypeError                                 Traceback (most recent call last)
>> ...
>> TypeError: Will not write unsigned integers by default. Please pass
>> field="integer" to write unsigned integers
>> In [8]: mmwrite(fname, a, field='integer')
>> In [9]: res = mmread(fname, dtype=np.uint64)
>> In [11]: res.todense()[0,0]
>> Out[11]: 9876543210
>
> That's one possibility, but I find it somewhat odd that this would
> generate an exception when the matrix is being saved, even though
> there is no ambiguity at this stage. It also wouldn't eliminate the
> potential for confusion if someone tries to load a matrix that they
> didn't save themselves, but got from some other source.

Yes, of course, we can't protect people from getting unexpected or
wrong results in that case.

> Are there other situations where the automated conversion from mmread
> may cause problems? For example, reading a matrix with 64-bit integers
> on a system where the default int dtype is only 32 bit?

Yes, the general case is saving anything that cannot be read back into
the default integer type of the system on which the file is being read.
So, for example, uint16 is always safe, because every uint16 value fits
in a 32-bit signed integer.
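
To make that rule concrete, here is a rough sketch in plain Python. The
helper name unsigned_fits_signed is made up for illustration and is not
part of scipy.io; it only compares the largest value of an unsigned width
against the largest value of a signed width.

```python
def unsigned_fits_signed(unsigned_bits, signed_bits):
    """Return True if every unsigned value of the given width fits the signed width.

    Hypothetical helper for illustration only; not part of scipy.io.
    """
    largest_unsigned = 2 ** unsigned_bits - 1
    largest_signed = 2 ** (signed_bits - 1) - 1
    return largest_unsigned <= largest_signed


# Compare each unsigned width against a 32-bit and a 64-bit default integer.
for bits in (16, 32, 64):
    print(bits, unsigned_fits_signed(bits, 32), unsigned_fits_signed(bits, 64))
```

So uint16 is safe whichever default the reader has, uint32 is only safe
when the default integer is 64 bit, and uint64 is never safe.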

> I think it would be ideal if mmread would generate a warning or throw
> an exception if the numerical value of the generated integer does not
> coincide with the string that has been read from the file. I don't know
> if that is feasible. Alternatively, one could store additional
> information about the integer data type in the Matrix Market header
> section as a comment.

I don't know the code well enough to know if it's practical to check
every value for integer overflow. The comment idea sounds reasonable.
I'll have a look. I had already implemented the suggestion I sent
you before, so at least that should go in, unless we come up with
something better.
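
For discussion, here is a minimal sketch of both ideas in plain Python. It
is not the scipy.io code: parse_checked, find_dtype_hint and the
"% dtype:" comment convention are all assumptions made up for this
example. The only thing taken from the Matrix Market format itself is that
comment lines start with "%" and come before the size line.

```python
def parse_checked(token, bits=32, signed=True):
    """Parse an integer token and raise OverflowError if it does not fit.

    Hypothetical helper; Python ints are arbitrary precision, so int()
    never wraps and the range can be checked afterwards.
    """
    value = int(token)
    lo = -(2 ** (bits - 1)) if signed else 0
    hi = 2 ** (bits - 1) - 1 if signed else 2 ** bits - 1
    if not lo <= value <= hi:
        kind = "int" if signed else "uint"
        raise OverflowError("%s does not fit in %s%d" % (token, kind, bits))
    return value


# Assumed convention for this sketch only: the writer stores the dtype in
# a comment line of the form "% dtype: uint64".
DTYPE_TAG = "% dtype:"


def find_dtype_hint(lines):
    """Return the dtype named in the header comments, or None if absent."""
    for line in lines:
        if not line.startswith("%"):
            break  # comments end where the size line begins
        if line.startswith(DTYPE_TAG):
            return line[len(DTYPE_TAG):].strip()
    return None


header = [
    "%%MatrixMarket matrix coordinate integer general",
    "% dtype: uint64",
    "5 5 1",
    "1 1 9876543210",
]
print(find_dtype_hint(header))
print(parse_checked("9876543210", bits=64, signed=False))
```

With the default 32-bit signed range, parse_checked("9876543210") raises
OverflowError instead of returning a wrong number. Whether a per-value
check like this is cheap enough inside mmread is exactly the part I'm
unsure about.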

Cheers,

Matthew