[scikit-learn] best way to scale on the random forest for text w bag of words ...
Sasha Kacanski
skacanski at gmail.com
Wed Mar 15 21:20:55 EDT 2017
Hi,
As soon as the number of trees and features goes up, 70 GB of RAM is gone
and I get out-of-memory errors.
The file size is 700 MB. The DataFrame quickly shrinks from 14 to 2 columns, but
there is a ton of text ...
Even with 10 estimators and 100 features per word, I can't get through ~900k
records ...
The training set, about 15% of the data, works perfectly fine, but when it
comes to the test set, that is where it fails.
I can split the data and multiprocess it, but I believe that will simply skew
the results...
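To make the splitting idea concrete, here is a minimal sketch on toy data, not my real pipeline. The helper `predict_in_chunks`, the sample texts, and the `chunk_size` value are all made up for illustration; only the 10 estimators and the 100-feature cap mirror the setup above. As far as I can tell, a fitted forest predicts each row independently, so chunking only the test set like this should give the same predictions as one big call. My worry about skewed results applies to splitting the training data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer


def predict_in_chunks(model, vectorizer, texts, chunk_size=10000):
    """Vectorize and predict chunk_size documents at a time (hypothetical helper)."""
    parts = []
    for start in range(0, len(texts), chunk_size):
        chunk = texts[start:start + chunk_size]
        # transform() returns a scipy sparse matrix; it is never densified here
        X = vectorizer.transform(chunk)
        parts.append(model.predict(X))
    return np.concatenate(parts)


# Toy stand-in data (assumption: the real data is ~900k text records)
train_texts = ["good movie", "bad movie", "great film", "awful film"] * 5
train_labels = [1, 0, 1, 0] * 5
test_texts = ["good film", "awful movie", "great movie"] * 7

# Bag of words capped at 100 features, forest with 10 trees
vectorizer = CountVectorizer(max_features=100)
X_train = vectorizer.fit_transform(train_texts)
model = RandomForestClassifier(n_estimators=10, random_state=0)
model.fit(X_train, train_labels)

# Chunked prediction versus a single call over the whole test set
chunked = predict_in_chunks(model, vectorizer, test_texts, chunk_size=4)
full = model.predict(vectorizer.transform(test_texts))
print(np.array_equal(chunked, full))
```

With this approach only one chunk of the test set is vectorized at a time, so peak memory should depend on `chunk_size` rather than on the total number of records.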
Any ideas?
--
Aleksandar Kacanski