[SciPy-User] SPARSE matrix dtypes, upcasting, sum function

Thu Sep 8 09:28:48 EDT 2011

On Thu, Sep 8, 2011 at 8:35 AM, Dinesh B Vadhia
<dineshbvadhia at hotmail.com> wrote:
> We have:
>
> I > 250000, J > 250000, nnz>10000000
>
> data = scipy.ones(nnz, dtype=numpy.uint8)
> A = sparse.csr_matrix((data, (xrow, xcolumn)), shape=(I,J))
>
> where xrow and xcolumn are int vectors of length nnz
>
> The row and column sums are:
> rowsum = A.sum(0)
> columnsum = A.sum(1)
>
> The max values SciPy gives for each are:
> rowsum.max() = 255
> columnsum.max() = 255
>
> But the real values are:
> rowsum.max() = 41190
> columnsum.max() = 1080
>
> Can someone see what we are doing wrong?

It is at least a documentation bug, and I would have expected
upcasting as well. Note, however, that using integers always carries
some risk of overflow, and the details are platform dependent (because
the default upcasting rules use different integer sizes on different
platforms).
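
One way to check which integer size applies on a given machine is to
look at the dtype NumPy picks for the sum of a small-integer array. This
is only a sketch; the array "small" is made up for illustration.

```python
import numpy as np

# Hypothetical toy array, just to inspect the result dtype
small = np.ones(10, dtype=np.int16)

# NumPy accumulates integer types narrower than the platform default
# integer in that default integer (np.int_)
result_dtype = small.sum().dtype
default_bits = np.dtype(np.int_).itemsize * 8

print(result_dtype, default_bits)
```

If this reports 32 bits, any total above 2**31 - 1 wraps around unless
a wider dtype is requested explicitly.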

For example:

import numpy as np
# 4000000 rows of 1024: each column total is 4096000000, above 2**31 - 1
a = 1024 * np.ones((4000000, 2), dtype=np.int16)
a.sum(0)

will give you the right answer with 64-bit Python on Mac OS X, but
the wrong one with 32-bit Python, because the sum is accumulated in the
platform's default integer. As soon as you are doing operations that
can potentially overflow, I would advise converting to float values.
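
Applied to the sparse case from the question, the float conversion
could look roughly like the sketch below. The toy data (300 ones stacked
in one column) and the names col_totals and row_totals are assumptions
for illustration, not the original data. Newer SciPy releases may
already upcast in sum() on their own; the explicit cast does not rely
on that.

```python
import numpy as np
from scipy import sparse

# Hypothetical toy data: 300 ones in column 0, one per row (no duplicates)
nnz = 300
rows = np.arange(nnz)
cols = np.zeros(nnz, dtype=np.intp)
data = np.ones(nnz, dtype=np.uint8)
A = sparse.csr_matrix((data, (rows, cols)), shape=(nnz, 4))

# Cast before summing so the totals cannot wrap around at 255
A_float = A.astype(np.float64)
col_totals = A_float.sum(axis=0)  # sums down the rows: one total per column
row_totals = A_float.sum(axis=1)  # sums across the columns: one total per row

print(col_totals.max(), row_totals.max())
```

If exact integer counts are preferred, casting to np.int64 instead
should work the same way; float64 represents integers exactly up to
2**53, which is far more than the counts in question.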

cheers,

David