[scikit-learn] Text classification of large dataset

Wed Dec 27 05:16:34 EST 2017

Hi all,

Thank you for your suggestions.

But I am still getting a *memory error* while doing feature selection:

    fs = feature_selection.SelectPercentile(feature_selection.chi2, percentile=20)
    documenttermmatrix1 = fs.fit_transform(documenttermmatrix, y1)
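
For anyone who wants to try the same call, here is a minimal, scaled-down sketch. The random matrix and labels are hypothetical stand-ins (1,000 documents by 500 terms, two classes), not my real data, so it only illustrates the call pattern and will not reproduce the memory error.

```python
import numpy as np
from scipy import sparse
from sklearn import feature_selection

rng = np.random.RandomState(0)

# Hypothetical stand-in for the real document-term matrix:
# 1,000 documents x 500 terms, non-negative values, CSR format.
documenttermmatrix = sparse.random(
    1000, 500, density=0.05, format="csr", random_state=rng
)
# Hypothetical binary labels, one per document.
y1 = rng.randint(0, 2, size=1000)

# Same call as in the message: keep the top 20% of features by chi2 score.
fs = feature_selection.SelectPercentile(feature_selection.chi2, percentile=20)
documenttermmatrix1 = fs.fit_transform(documenttermmatrix, y1)

print(documenttermmatrix.shape, "->", documenttermmatrix1.shape)
print("still sparse:", sparse.issparse(documenttermmatrix1))
```

On this small input the result stays sparse and keeps roughly a fifth of the columns.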

*documenttermmatrix* has shape *(1594516, 232832)*.
The type of *documenttermmatrix* is *scipy csr matrix*.
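
To give a sense of scale, here is a quick back-of-the-envelope calculation of what this matrix would occupy if any step converted it to a dense array. The 8 bytes per value is an assumption (float64), and I do not know whether such a conversion actually happens inside the call.

```python
# Shape reported above for documenttermmatrix.
n_rows, n_cols = 1594516, 232832

# Assumption: 8 bytes per entry (float64) if the matrix were dense.
bytes_per_value = 8

dense_bytes = n_rows * n_cols * bytes_per_value
dense_terabytes = dense_bytes / 1e12

print(f"{dense_terabytes:.2f} TB if stored densely")  # prints "2.97 TB if stored densely"
```

So the matrix only fits in memory as long as it stays sparse throughout.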

Am I doing anything wrong?

Is there any better way of doing feature selection?