[SciPy-Dev] Indexing sparse matrices

Thu Jun 24 10:39:30 EDT 2010

On 06/23/2010 11:43 PM, Jason Grout wrote:
> On 6/23/10 12:31 PM, Bruce Southey wrote:
>    
>> On 06/23/2010 09:22 AM, Jason Grout wrote:
>>
>>      
>>> Hi everyone,
>>>
>>> This bug in scipy sparse matrices was recently reported in Sage:
>>>
>>> ----------------------------------------------------------------------
>>> | Sage Version 4.4.2, Release Date: 2010-05-19                       |
>>> | Type notebook() for the GUI, and license() for information.        |
>>> ----------------------------------------------------------------------
>>> sage: from scipy import sparse
>>> sage: a = sparse.lil_matrix((10,10))
>>> sage: a[1,2] = 1
>>> ---------------------------------------------------------------------------
>>> ValueError                                Traceback (most recent call last)
>>>
>>> /Users/grout/sage-4.4.2-test3/local/lib/python2.6/site-packages/numpy/core/<ipython
>>> console>    in<module>()
>>>
>>> /Users/grout/sage/local/lib/python2.6/site-packages/scipy/sparse/lil.pyc
>>> in __setitem__(self, index, x)
>>>         320                     self._insertat3(row, data, j, xx)
>>>         321         else:
>>> -->    322             raise ValueError('invalid index value: %s' % str((i,
>>> j)))
>>>         323
>>>         324     def _mul_scalar(self, other):
>>>
>>> ValueError: invalid index value: (1, 2)
>>>
>>>
>>> Since the preparser in Sage makes integers instances of the Sage Integer
>>> class, and the Sage Integer class is not in the specific list of types
>>> checked by numpy.isscalar, the case falls through the checks in the
>>> sparse matrix setitem method to the last error-raising case. The problem
>>> seems to be that scipy checks for a specific list of types (via
>>> numpy.isscalar), instead of just using the __index__ magic method which
>>> was designed for this purpose (by Travis Oliphant for numpy! see
>>> http://docs.python.org/whatsnew/2.5.html#pep-357-the-index-method).
>>>
>>> Note that things work fine in numpy:
>>>
>>> sage: import numpy
>>> sage: a=numpy.array([[1,2],[3,4]],dtype=int); a
>>> array([[1, 2],
>>>            [3, 4]])
>>> sage: a[1,1]=5
>>> sage: a
>>> array([[1, 2],
>>>            [3, 5]])
>>>
>>>
>>> Thanks,
>>>
>>> Jason
>>>
>>>
>>> _______________________________________________
>>> SciPy-Dev mailing list
>>> SciPy-Dev at scipy.org
>>> http://mail.scipy.org/mailman/listinfo/scipy-dev
>>>
>>>
>>>        
>> Please post this discussion on the numpy list from the scipy-dev list
>> because this is not really scipy issue (yet).
>>
>>   
This is not the place for this discussion!
>>
>>      
> The thing is that the index object may not be a scalar, but it may have
> an __index__ method that is supposed to be used when the object is used
> as an index.  So the call to isscalar may rightfully return False, but
> the __index__ method should still be called to see if there is a value
> to act as an index.  For example, according to Python>=2.5, this should
> work:
>    
Remember (as given below) that numpy officially supports Python 2.4 so 
any Python 2.5+ features are not allowed (at least until some one 
notices them).
If you just look at the numpy's isscalar function you can see why it 
fails - so file a bug report on it.
> Python 2.6.4 (r264:75706, May 25 2010, 15:42:09)
> Type "copyright", "credits" or "license" for more information.
>
> IPython 0.9.1 -- An enhanced Interactive Python.
> ?         ->  Introduction and overview of IPython's features.
> %quickref ->  Quick reference.
> help      ->  Python's own help system.
> object?   ->  Details about 'object'. ?object also works, ?? prints more.
>
> In [1]: class MyObject(object):
>      ...:     def __index__(self):
>      ...:         return 3
>      ...:
>      ...:
>
> In [2]: import numpy
>
> In [3]: a=numpy.array([[1,2,3,4],[3,2,1,2]])
>
> In [4]: ind=MyObject()
>
> In [5]: a[1,ind]
> Out[5]: 2
>
> ***************************
> Note: it works fine in numpy
> ***************************
>
>
> In [6]: from scipy import sparse
>
> In [7]: a = sparse.lil_matrix((10,10))
>
> In [8]: a[1,ind]=3
> ---------------------------------------------------------------------------
> ValueError                                Traceback (most recent call last)
>
> /Users/grout/projects/reptheory/practice/<ipython console>  in<module>()
>
> /Users/grout/sage/local/lib/python2.6/site-packages/scipy/sparse/lil.pyc
> in __setitem__(self, index, x)
>       302             row = self.rows[i]
>       303             data = self.data[i]
> -->  304             self._insertat3(row, data, j, x)
>       305         elif issequence(i) and issequence(j):
>       306             if np.isscalar(x):
>
> /Users/grout/sage/local/lib/python2.6/site-packages/scipy/sparse/lil.pyc
> in _insertat3(self, row, data, j, x)
>       276             self._insertat2(row, data, j, x)
>       277         else:
> -->  278             raise ValueError('invalid column value: %s' % str(j))
>       279
>       280
>
> ValueError: invalid column value:<__main__.MyObject object at 0x1017a9f90>
>
>
> **************************
> Note that the problem above is that only the first index is checked to
> be a scalar, instead of all indices.  Continuing on...
>    
It is meaningless to compare an ndarray with a sparse matrix because 
these are not the same thing (IIRC sparse matrices are not a subclass of 
ndarrays).

> **************************
>
>
> In [9]: a.__setitem__??
>
> In [10]: numpy.isscalar(ind)
> Out[10]: False
>    
Correct since ind should be a class instance.
 >>> ind
<__main__.MyObject object at 0x11e4950>

> In [11]: a.__setitem__??
>
> In [12]: numpy.isscalar(ind)
> Out[12]: False
>
> In [13]: a[ind,1]=3
> ---------------------------------------------------------------------------
> ValueError                                Traceback (most recent call last)
>
> /Users/grout/projects/reptheory/practice/<ipython console>  in<module>()
>
> /Users/grout/sage/local/lib/python2.6/site-packages/scipy/sparse/lil.pyc
> in __setitem__(self, index, x)
>       320                     self._insertat3(row, data, j, xx)
>       321         else:
> -->  322             raise ValueError('invalid index value: %s' % str((i,
> j)))
>       323
>       324     def _mul_scalar(self, other):
>
> ValueError: invalid index value: (<__main__.MyObject object at
> 0x1017a9f90>, 1)
>
> **********
> Here, the __index__ method of ind was not called.  Note that ind itself
> may actually be something we'd normally consider to be not a scalar, so
> again, I don't think the isscalar check is really what needs to happen.
> Rather, if scipy wants an index, it should check for an index specifically.
>> I agree that this appears to be an issue between Sage and numpy's
>> isscalar function. There are many uses of isscalar() in numpy (by
>> grepping the source) and scipy that should be failing with Sage. For
>> example  you should be getting errors using numpy's polynomial functions
>> (polynomial/polynomial.py).  I would have thought that you would be
>> seeing lots of errors when running numpy and scipy tests within Sage.
>> (Perhaps those tests don't occur with Sage integers because Sage does
>> not create them.)
>>
>>      
> We used to have lots of errors before using the standard numpy ways for
> converting Sage datatypes to numpy datatypes.  We should look at these
> other situations when isscalar might be called, though.  I do believe
> the above issue is a different issue, since scipy specifically wants
> indices, which don't always correspond with something being a "scalar".
>    
>> Also, note that numpy still supports Python 2.4.
>>
>>      
> Good point.  I don't know if it's better to test the version number of
> python before trying to use __index__, or to just assume that python 2.4
> code may still use __index__ like we expect in 2.5.
>
> Thanks,
>
> Jason
>
>    
I am not an expert on numpy but I do think you confusing how you think 
it should work and how it actually works.

The isscalar function is a very simple function that checks if the input 
type is in np.generic or in np.ScalarType (I think these are all in 
numpy/core/numerictypes.py). Thus, isscalar is really only checking if 
the input is a valid dtype as those must be scalar - hence the wikipedia 
definition of scalar:
en.wikipedia.org/wiki/Scalar_(computing)

Thus not even a list is a numpy 'scalar'
 >>> np.isscalar([3.1])
False

Changing that numerictypes.py file would be an highly undesirable ABI 
change. The solution, which should help Sage, is to adapt isscalar to 
allow similar things to Sage's scalars. But I am not sure how check that 
a class/function returns a valid dtype scalar - I am not even sure if 
the test 'isinstance(num, generic)' is even needed.

So you need to report a bug for isscalar that includes a non-Sage 
example. It would be great if you could also provide a patch as well.

Bruce