[SciPy-User] csr_matrix rows remove

Nathaniel Smith njs at pobox.com
Sun Oct 7 06:59:48 EDT 2012


On Sun, Oct 7, 2012 at 6:04 AM, David Warde-Farley
<d.warde.farley at gmail.com> wrote:
> On Thu, Oct 4, 2012 at 9:05 AM, Pavel Lurye <pavel.lurye at gmail.com> wrote:
>> Hi,
>> I'm using scipy csr_matrix and I'm trying to figure out what is the
>> simple and fast way to remove a row from such matrix?
>> For example, I have a tuple of rows, that should be deleted. The only
>> way I see, is to generate a tuple of matrix parts and vstack it.
>> Please, help me out with this.
>
> Unfortunately, CSR/CSC do not admit terribly efficient row deletion.
> What would be required to do it semi-efficiently would be to determine
> how many non-zero elements live in those rows (call this number k),
> allocate 3 vectors (new_data, new_indices, new_indptr), mirroring the
> .data, .indices and .indptr attributes of the sparse matrix object,
> each of length nnz - k (where nnz is the number of non-zero elements
> in the original matrix). First, copy the contents of mycsrmatrix.data
> into new_data, omitting the ones in the deleted rows. Then things
> become tricky: you need to adjust the values of indices and indptr to
> account for the now missing rows. This would require reading up on the
> CSR format, and would be relatively complicated but not impossible.

Row deletion from CSR is about as efficient as from a dense matrix...
you have to copy the data, of course, but that's the only real cost. I
think it works to do something like (untested and only handling one
row, to illustrate the idea):

def delete_a_csr_row(row_i, data, indices, indptr):
    k = indptr[row_i + 1] - indptr[row_i]
    new_data = np.empty(len(data) - k, dtype=data.dtype)
    new_indices = np.empty(len(indices) - k, dtype=indices.dtype)
    new_indptr = np.empty(len(indptr) - 1, dtype=indptr.dtype)
    new_data[:indptr[row_i]] = data[:indptr[row_i]]
    new_data[indptr[row_i]:] = data[indptr[row_i + 1]:]
    new_indices[:indptr[row_i]] = indices[:indptr[row_i]]
    new_indices[indptr[row_i]:] = indices[indptr[row_i + 1]:]
    new_indptr[:row_i] = indptr[:row_i]
    new_indptr[row_i:] = indptr[row_i + 1:]
    new_indptr[row_i:] -= k
    return csr_matrix((new_data, new_indices, new_indptr))

I guess whether this counts as simple depends on your tolerance for
sparse matrix formats :-). But it's much simpler than trying to do the
same in, say, CSC format... and probably similar to COO.

-n



More information about the SciPy-User mailing list