[scikit-learn] Text classification of large dataset

Ranjana Girish ranjanagirish30 at gmail.com
Wed Dec 27 05:16:34 EST 2017


Hi all,

Thank you for your suggestions.

However, I am still getting a *memory error* while doing feature selection:

    fs = feature_selection.SelectPercentile(feature_selection.chi2, percentile=20)
    documenttermmatrix1 = fs.fit_transform(documenttermmatrix, y1)


*documenttermmatrix* has shape *(1594516, 232832)*.
It is a *SciPy CSR sparse matrix*.

Am I doing anything wrong?

Is there any better way of doing feature selection?
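
For reference, a scaled-down sketch of the same call on synthetic data. The matrix size, density, number of classes, and the names X, y, and X_sel are illustrative assumptions, not the real dataset. It only shows that the selector accepts a CSR matrix and returns a sparse result, and how the memory held by the CSR arrays could be estimated.

```python
import numpy as np
from scipy import sparse
from sklearn.feature_selection import SelectPercentile, chi2

rng = np.random.RandomState(0)

# Synthetic stand-in for the document-term matrix (assumed sizes,
# far smaller than the real 1594516 x 232832 matrix).
n_docs, n_terms, n_classes = 2000, 500, 5
X = sparse.random(n_docs, n_terms, density=0.01, format="csr", random_state=rng)
y = rng.randint(0, n_classes, size=n_docs)

# Same selector as in the post: keep the top 20% of terms by chi2 score.
fs = SelectPercentile(chi2, percentile=20)
X_sel = fs.fit_transform(X, y)

# Approximate memory held by the three CSR arrays, in bytes.
csr_bytes = X.data.nbytes + X.indices.nbytes + X.indptr.nbytes

print(X_sel.shape, sparse.issparse(X_sel), csr_bytes)
```

Running the same byte count on the real matrix would show how much memory the input alone occupies before feature selection starts.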