[scikit-learn] Feature selection with words.

Luigi Lomasto l.lomasto at innovationengineering.eu
Tue Dec 19 03:36:42 EST 2017


Hi all. 

I'm working on text classification of Wikipedia documents. I use a word-count approach to extract features from the text, so I end up with a large vocabulary containing every word in the training documents after lemmatization and stop-word removal. That gives me 70000 features. I think that for this kind of word-based problem it is not a good idea to apply feature selection or dimensionality reduction (with SVD or PCA). The current accuracy is 77%.

Do you think I need to do feature selection to improve the accuracy?
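
To make the question concrete, here is a minimal sketch of the comparison, not the actual code. Everything in it is an assumption made for illustration: the toy corpus stands in for the Wikipedia documents, lemmatization is skipped, MultinomialNB stands in for whatever classifier is really used, and chi-squared selection (SelectKBest with chi2) is swapped in as one common way to select word features instead of SVD or PCA. The value k=5 only suits the toy data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus standing in for the Wikipedia documents.
docs = [
    "The football team won the championship match",
    "The striker scored a goal in the final match",
    "The tennis player won the tournament final",
    "The coach trained the team before the game",
    "The physicist measured the energy of the particle",
    "The chemist studied the reaction in the laboratory",
    "The telescope observed a distant galaxy and star",
    "The biologist examined the cell under a microscope",
]
labels = ["sport"] * 4 + ["science"] * 4


def build_pipeline(k=None):
    """Word counts -> optional chi-squared selection of k words -> classifier."""
    steps = [CountVectorizer(stop_words="english")]
    if k is not None:
        # Keep only the k words most associated with the class labels.
        steps.append(SelectKBest(chi2, k=k))
    steps.append(MultinomialNB())
    return make_pipeline(*steps)


# Selection sits inside the pipeline, so it is refit on each training fold
# and never sees the held-out documents.
baseline_scores = cross_val_score(build_pipeline(), docs, labels, cv=2)
selected_scores = cross_val_score(build_pipeline(k=5), docs, labels, cv=2)

print("all words:     ", baseline_scores.mean())
print("5 best words:  ", selected_scores.mean())
```

On the real data the same comparison would be run with several values of k (for example a few thousand up to the full 70000) to see whether accuracy changes at all.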

Thank you for your answers. Regards,

Luigi 




