[scikit-learn] Best way to scale a random forest for text with bag of words ...

Sasha Kacanski skacanski at gmail.com
Wed Mar 15 21:20:55 EDT 2017


Hi,
As soon as the number of trees and features goes up, all 70 GB of RAM is used
and I start getting out-of-memory errors.
The input file is 700 MB. The DataFrame quickly shrinks from 14 to 2 columns, but
there is a ton of text.
With 10 estimators and 100 features per word, I can't handle the ~900k
records.
The training set (about 15% of the data) works perfectly fine, but as soon as the
test set comes in, memory runs out.
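
A common way to keep bag-of-words features from exhausting RAM is to leave them as a scipy.sparse matrix from end to end. scikit-learn's RandomForestClassifier accepts sparse input for both fit and predict, so no dense copy is needed. The sketch below is only an illustration under stated assumptions: it swaps in HashingVectorizer (a stateless alternative to CountVectorizer that keeps no vocabulary in memory), and the toy corpus and the n_features, max_depth and min_samples_leaf values are hypothetical placeholders, not settings tuned for this data.

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy corpus standing in for the real text column.
texts = ["cheap pills buy now", "meeting at noon", "buy cheap now", "lunch meeting today"] * 25
labels = [1, 0, 1, 0] * 25

# HashingVectorizer is stateless and returns a scipy.sparse CSR matrix,
# so no vocabulary dict is stored and no dense array is ever built.
vectorizer = HashingVectorizer(n_features=2**18, alternate_sign=False)
X = vectorizer.transform(texts)

# RandomForestClassifier accepts sparse X directly. max_depth and
# min_samples_leaf are assumed values that bound the size of each tree,
# which is where much of the forest's own memory goes.
clf = RandomForestClassifier(
    n_estimators=10, max_depth=20, min_samples_leaf=2, n_jobs=1, random_state=0
)
clf.fit(X, labels)
```

Limiting tree depth and leaf size trades some fit quality for a smaller model; whether that trade is acceptable depends on the data.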

I could split the data and process the pieces in parallel with multiprocessing, but I
believe that would simply skew the results.
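
On the worry about skew: once a forest is fitted, each test row is scored independently, so splitting only the test set into chunks should not change any prediction, while it limits how much intermediate memory predict needs at one time. Splitting the training data across processes is a different matter, since that would produce different trees. A minimal sketch, assuming random sparse data as a stand-in for the real features; predict_in_chunks and its chunk_size are hypothetical names chosen for illustration:

```python
import numpy as np
from scipy import sparse
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
# Hypothetical sparse bag-of-words-like data: 2,000 training rows, 5,000 columns.
X_train = sparse.random(2000, 5000, density=0.001, format="csr", random_state=rng)
y_train = rng.randint(0, 2, size=2000)
X_test = sparse.random(3000, 5000, density=0.001, format="csr", random_state=rng)

clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X_train, y_train)


def predict_in_chunks(model, X, chunk_size=500):
    """Predict one block of rows at a time so only that block's
    intermediate arrays are held in memory, then stitch the results."""
    parts = [
        model.predict(X[start:start + chunk_size])
        for start in range(0, X.shape[0], chunk_size)
    ]
    return np.concatenate(parts)


chunked = predict_in_chunks(clf, X_test)
full = clf.predict(X_test)
```

Comparing chunked with full is a quick sanity check that chunking the test set leaves predictions unchanged.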

Any ideas?


-- 
Aleksandar Kacanski