[SciPy-Dev] scipy.sparse.coo_matrix a new method drop_zero_columns?

Evgeny Nekrasov evgeny.nekrasov at phystech.edu
Sun Nov 6 13:10:58 EST 2016


Dear Ralf,

Thank you for your response. A common use case for drop_zero_columns is
feature selection before applying a machine learning algorithm. For example, a
very similar technique, VarianceThreshold, is implemented in sklearn (
http://scikit-learn.org/stable/modules/feature_selection.html). The problem
with current implementations is that they fail or use far more resources
than necessary when the number of zero columns is very large. Such
sparse data representations are often produced by popular techniques such as
feature hashing, bag of words, bag of content_ids, and the like. These
techniques are implemented in sklearn (
http://scikit-learn.org/stable/modules/feature_extraction.html).
Nevertheless, custom implementations are often needed, and that is where
drop_zero_columns is valuable.
Regarding your other questions:
1. It would be great to have drop_zero_columns for all matrix types. I
proposed the method for COO because that format is sufficient to process data
with a huge number of zero columns efficiently.
2. I don't know of any popular use cases for rows.
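
To make the idea concrete, here is a minimal sketch of the kind of operation
I have in mind. The helper name, its signature, and the choice to return the
kept column indices are my own assumptions for illustration; this is not an
existing SciPy API and not necessarily the code from the issue. It only
touches the stored entries, so the cost depends on the number of nonzeros
rather than on the number of columns.

```python
import numpy as np
from scipy.sparse import coo_matrix


def drop_zero_columns(m):
    """Hypothetical helper (not part of SciPy): drop all-zero columns.

    Returns the compacted COO matrix and the original indices of the
    columns that were kept.
    """
    m = m.tocoo()
    # Ignore explicitly stored zeros. Duplicate entries are not summed
    # here, so duplicates that cancel to zero would still keep a column.
    nz = m.data != 0
    # kept: sorted original column indices that hold at least one nonzero;
    # new_col: position of each entry's column within `kept`.
    kept, new_col = np.unique(m.col[nz], return_inverse=True)
    compact = coo_matrix(
        (m.data[nz], (m.row[nz], new_col)),
        shape=(m.shape[0], kept.size),
    )
    return compact, kept


# A 2 x 2,000,000 matrix in which only two columns are populated,
# similar to what feature hashing can produce.
X = coo_matrix(
    ([1.0, 2.0, 3.0], ([0, 1, 1], [2, 2, 1000000])),
    shape=(2, 2000000),
)
compact, kept = drop_zero_columns(X)
print(compact.shape, kept)
```

Returning the kept indices alongside the matrix is a design choice in this
sketch: it lets the same column selection be applied later to other data,
for example a test set.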

Best regards,
Evgeny

2016-11-06 0:44 GMT+03:00 Ralf Gommers <ralf.gommers at gmail.com>:

>
>
> On Fri, Nov 4, 2016 at 11:40 PM, Evgeny Nekrasov <
> evgeny.nekrasov at phystech.edu> wrote:
>
>> Hello,
>>
>> I am writing to follow up on the discussion
>> https://github.com/scipy/scipy/issues/6754
>>
>> I would be happy to get your opinions.
>>
>
> Hi Evgeny, it would be helpful to add to the issue description some links
> to code / examples where this method is used or would be useful. Right now
> there's just an assertion "this is useful", which is hard to evaluate.
>
> Other questions I would have:
> 1. The issue has a method for COO, but I'd think we want this for all or
> none of the formats?
> 2. Is it only drop column, or would drop row also make sense?
>
> Ralf
>
>
>