[SciPy-dev] Possible fix for scipy.sparse.lil_matrix column-slicing problem

Tue Dec 1 01:53:16 EST 2009

On Mon, Nov 30, 2009 at 5:32 AM, Robert Cimrman wrote:
> Hi Tim,
>
> Tim Victor wrote:
>> On Thu, Nov 26, 2009 at 10:58 PM, Nathan Bell wrote:
>>> On Thu, Nov 26, 2009 at 7:21 PM, Tim Victor wrote:
>>>> I didn't know that I could do that! I've just created an account and
>>>> added a comment there with a link to this email thread. Thanks for the
>>>> reply.
>>>>
>>> Nice work!
>>>
>>> Your patch has been merged into r6121 [1].  The previous
>>> implementation was a bit of a nightmare so I'm glad someone finally
>>> took the time to simplify it :)
>>>
>>> Thanks for your contribution.  I hope it is the first of many!
>>>
>>> [1] http://projects.scipy.org/scipy/changeset/6121
>>>
>>> --
>>> Nathan Bell wnbell at gmail.com
>>> http://www.wnbell.com/
>>
>> Thanks for the really quick turnaround and the kind words, Nathan. I
>> enjoyed working on it. It was kinda like helping someone else on their
>> class project when I was a student. That was always more fun than
>> doing my own work.
>>
>> I might see what other open tickets there are. It seems like a good
>> way to learn something about the different parts of the system.
>>
>> Best regards,
>>
>> Tim Victor
>
> Is there are reason why you removed the special case of x being a scalar, namely:
>
> -        elif issequence(i) and issequence(j):
> -            if np.isscalar(x):
> -                for ii, jj in zip(i, j):
> -                    self._insertat(ii, jj, x)
> -            else:
> -                for ii, jj, xx in zip(i, j, x):
> -                    self._insertat(ii, jj, xx)
>
> This removal broke a code of mine, which now takes forever, and behaves in a
> different way. Try this:
>
> In [1]: import scipy.sparse as spp
> In [2]: a = spp.lil_matrix((1000, 1000))
> In [3]: a
> Out[3]:
> <1000x1000 sparse matrix of type '<type 'numpy.float64'>'
>         with 0 stored elements in LInked List format>
> In [4]: import numpy as np
> In [5]: ir = ic = np.arange(1000)
> In [6]: a[ir, ic] = 1
>
> The result is a matrix with all the entries set to 1 (= full!), not just the
> diagonal, which was the previous (IMHO good) behaviour. In the real code I do
> not set the diagonal, but some other elements given by two lists ir, ic, but
> the code above shows the symptoms.
>
> I can fix easily my code by not using the LIL matrix:
>
> In [15]: a = spp.coo_matrix((np.ones((ir.shape[0],)), (ir, ic)))
> In [16]: a
> Out[16]:
> <1000x1000 sparse matrix of type '<type 'numpy.float64'>'
>         with 1000 stored elements in COOrdinate format>
>
> but I wonder, if the above change in behaviour was intended...
>
> cheers,
> r.
---------------------

Robert,

> Is there are reason why you removed the special case of x being a scalar, namely:

Actually I don't think it was the special case of x being a scalar,
but rather the special case of the row index and the column index both
being sequences.

Let me know if this looks wrong to you, but using NumPy dense arrays
and matrices as a reference for correct behavior, I get:

m = sp.zeros((10,10))

# set first column of m to ones:
m[range(10),0] = 1

# set diagonal of m to -1:
m[range(10),range(10)] = -1
m

(prints:)
array([[-1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 1., -1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 1.,  0., -1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0., -1.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0., -1.,  0.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0., -1.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  0., -1.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  0., -1.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0., -1.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0., -1.]])

The first case, assigning a scalar with a range for the row index and
a scalar for the column index, lil_matrix still works the way it did.
It iterates the range of rows and sets column 0 for each. The second
case, iterating both of the ranges and using the row/column indexes in
pairs, was inadvertently changed.

A third case suggests that it isn't only about assigning a scalar,
since assigning from a sequence using pairwise-matched sequences of
row and column indices should work similarly:

# assign random values to the reverse diagonal:
m[range(10),range(9,-1,-1)] = sp.rand(10)
m

(prints:)
array([[-1.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.90216781],
       [ 1.        , -1.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.54728632,  0.        ],
       [ 1.        ,  0.        , -1.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.97993962,  0.        ,  0.        ],
       [ 1.        ,  0.        ,  0.        , -1.        ,  0.        ,
         0.        ,  0.66932184,  0.        ,  0.        ,  0.        ],
       [ 1.        ,  0.        ,  0.        ,  0.        , -1.        ,
         0.64246538,  0.        ,  0.        ,  0.        ,  0.        ],
       [ 1.        ,  0.        ,  0.        ,  0.        ,  0.93092643,
        -1.        ,  0.        ,  0.        ,  0.        ,  0.        ],
       [ 1.        ,  0.        ,  0.        ,  0.25711642,  0.        ,
         0.        , -1.        ,  0.        ,  0.        ,  0.        ],
       [ 1.        ,  0.        ,  0.28826595,  0.        ,  0.        ,
         0.        ,  0.        , -1.        ,  0.        ,  0.        ],
       [ 1.        ,  0.01331807,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        , -1.        ,  0.        ],
       [ 0.86963499,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        , -1.        ]])

Yes, I broke that. :-) I haven't used this behavior or seen it
documented and it wasn't covered by a unit test, and I missed it. (In
my defense, lil_matrix is broken even worse for that third case in
both SciPy versions 0.7.0 and 0.7.1.)

Robert, can you (or anyone else out there) tell me if this covers it
all, or whether there is some more general array-indexing behavior
that should be implemented? I'll be happy to put in a case to handle
it.

I'm reminded of the Zen of Python "Explicit is better than implicit." guideline.

Best regards, and many apologies for the inconvenience,

Tim