[SciPy-user] [Sparse matrix library] csr_matrix and column sum

Robert Kern robert.kern at gmail.com
Mon Apr 28 12:06:51 EDT 2008


On Mon, Apr 28, 2008 at 12:29 AM, Nathan Bell <wnbell at gmail.com> wrote:
> On Sun, Apr 27, 2008 at 11:57 PM, Dinesh B Vadhia
>  <dineshbvadhia at hotmail.com> wrote:
>  >
>  > 0 , -44 , 84 , -116 , -121 , -43 , -44 , -116 , -115 , -79 , 70 , -86 , 39 ,
>  > -17 , -21 , -112 , 29 , -126 , -19 , 33 , 59 , -6 , 24 , 18 , 57
>  >
>  >
>
>  > 768 , 724 , 1108 , 652 , 1927 , 2005 , 724 , 908 , 1421 , 1457 , 2118 , 1450
>  > , 1575 , 3055 , 2283 , 656 , 1053 , 898 , 1517 , 1569 , 1339 , 762 , 3096 ,
>  > 530 , 1081
>  >
>
>  This is due to the fact that when integer arithmetic overflows (e.g. A
>  + B is too large) the result "wraps around".  The solution is to use a
>  data type with a greater range of values (more bits).
>
>  Replace your int8 data array with an int16 array and you will get the
>  expected results (albeit using one more byte per nonzero) provided
>  that the sums do not exceed 2^15 - 1.
>
>  To be safe, you might use int32 and not worry about ranges as much.
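
The wrap-around is easy to check against the numbers quoted above. This is
only a sketch in plain Python: it assumes the first quoted list is what the
int8 data produced and the second is the expected column sums, and the helper
name wrap_int8 is made up for illustration.

```python
def wrap_int8(value):
    """Reduce an integer to the signed 8-bit range [-128, 127],
    the way two's-complement overflow does."""
    return (value + 128) % 256 - 128

# Expected column sums (second quoted list).
correct = [768, 724, 1108, 652, 1927, 2005, 724, 908, 1421, 1457, 2118,
           1450, 1575, 3055, 2283, 656, 1053, 898, 1517, 1569, 1339, 762,
           3096, 530, 1081]

# Values reported with the int8 data array (first quoted list).
reported = [0, -44, 84, -116, -121, -43, -44, -116, -115, -79, 70, -86,
            39, -17, -21, -112, 29, -126, -19, 33, 59, -6, 24, 18, 57]

# Wrapping each expected sum into 8 bits reproduces the reported values.
wrapped = [wrap_int8(v) for v in correct]
print(wrapped == reported)
```

Every entry matches, which is consistent with the sums being accumulated in
8 bits rather than with any problem in the matrix itself.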

ndarray.sum() accepts a dtype= argument to specify the type of the
accumulator. You might consider implementing the same thing for sparse
matrices. Also, for all smaller integer types, ndarray.sum() defaults
to the platform int (int32 on 32-bit systems, int64 on 64-bit systems)
as the accumulator dtype.


In [1]: from numpy import *

In [2]: a = ones(300, dtype=int8)

In [3]: a.sum?
Type:             builtin_function_or_method
Base Class:       <type 'builtin_function_or_method'>
Namespace:        Interactive
Docstring:
    a.sum(axis=None, dtype=None) -> Sum of array over given axis.

    Sum the array over the given axis.  If the axis is None, sum over
    all dimensions of the array.

    The optional dtype argument is the data type for the returned
    value and intermediate calculations.  The default is to upcast
    (promote) smaller integer types to the platform-dependent int.
    For example, on 32-bit platforms:

      a.dtype                         default sum dtype
      ---------------------------------------------------
      bool, int8, int16, int32        int32

    Warning: The arithmetic is modular and no error is raised on overflow.

    Examples
    --------
    >>> array([0.5, 1.5]).sum()
    2.0
    >>> array([0.5, 1.5]).sum(dtype=int32)
    1
    >>> array([[0, 1], [0, 5]]).sum(axis=0)
    array([0, 6])
    >>> array([[0, 1], [0, 5]]).sum(axis=1)
    array([1, 5])
    >>> ones(128, dtype=int8).sum(dtype=int8) # overflow!
    -128

In [4]: a.sum(dtype=int16)
Out[4]: 300
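
For the sparse case, one way the same idea could look is sketched below. It
works directly on the raw CSR arrays (data and column indices) with NumPy
only; the function name csr_column_sums and its parameters are hypothetical,
not part of scipy.sparse, and this is not how the library implements sum().

```python
import numpy as np

def csr_column_sums(data, indices, n_cols, dtype=np.int64):
    """Column sums of a CSR matrix from its raw arrays.

    data    -- the nonzero values (any integer dtype, e.g. int8)
    indices -- the column index of each nonzero
    n_cols  -- number of columns in the matrix
    dtype   -- accumulator dtype, wider than data.dtype by default
    """
    acc = np.zeros(n_cols, dtype=dtype)
    # np.add.at is unbuffered, so repeated column indices accumulate
    # correctly; each int8 value is added into the wider accumulator.
    np.add.at(acc, indices, data)
    return acc

# 300 nonzeros, all equal to 1 and all in column 0, stored as int8.
data = np.ones(300, dtype=np.int8)
indices = np.zeros(300, dtype=np.intp)

sums = csr_column_sums(data, indices, n_cols=2)
print(sums)
```

The stored values stay int8, so the matrix costs no extra memory; only the
small output vector uses the wider type.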


-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 -- Umberto Eco


