[scikit-learn] Random forest fitting very well

chris brew cbrew at acm.org
Thu Jun 23 07:00:43 EDT 2016


It is probably a good idea to start by separating off part of your training
data into a held-out development set that is not used for training, which
you can use to create learning curves and estimate probable performance on
unseen data. I really recommend Andrew Ng's machine learning course
material from Stanford and Coursera. It shows you how to use learning
curves to understand your problem and also the way that different
estimators behave.
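
As a rough illustration of the held-out set and learning-curve idea, here is a minimal sketch. It assumes a recent scikit-learn where `train_test_split` and `learning_curve` live in `sklearn.model_selection`, and it uses synthetic data from `make_regression` as a stand-in for the real 10-variable dataset; the sample size, split ratio, and forest settings are arbitrary choices, not recommendations.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import learning_curve, train_test_split

# Assumption: synthetic stand-in for the real data (10 inputs, 1 output).
X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=0)

# Set aside a development set that is never used for fitting.
X_train, X_dev, y_train, y_dev = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = RandomForestRegressor(n_estimators=10, random_state=0)

# Learning curve: training and cross-validated R^2 at growing training sizes.
sizes, train_scores, cv_scores = learning_curve(
    model, X_train, y_train,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5, scoring="r2",
)
train_mean = train_scores.mean(axis=1)
cv_mean = cv_scores.mean(axis=1)
for n, tr, cv in zip(sizes, train_mean, cv_mean):
    print(f"n={int(n):4d}  train R^2={tr:.3f}  cv R^2={cv:.3f}")

# Final check on the held-out development set.
model.fit(X_train, y_train)
dev_r2 = model.score(X_dev, y_dev)
print(f"held-out R^2 = {dev_r2:.3f}")
```

A large, persistent gap between the training and cross-validated scores suggests overfitting; two curves that converge at a low score suggest the model is too simple for the data.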


There are many estimators that will achieve an extremely good fit to
typical training data, but the differences between estimators show up
mostly in what happens with unseen test data. Personally, I always start by
seeing how well simple classifiers or regressors do (Naive Bayes, linear
regression, etc.), then try regularized linear models like ElasticNet, then
SVMs, then random forests or other ensemble models. That way, I end up
using the powerful and complex models only when the data demands it.
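
A simple-to-complex comparison along these lines might look like the sketch below. It is only an illustration: the data is synthetic (`make_regression`), the hyperparameters (`alpha=0.1`, `C=10.0`, `n_estimators=10`) are arbitrary assumptions, and the scaling step in front of the ElasticNet and SVR models is my own addition because those models are sensitive to feature scale.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNet, LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Assumption: synthetic stand-in for the real data (10 inputs, 1 output).
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

# Candidates ordered from simplest to most complex.
candidates = [
    ("linear regression", LinearRegression()),
    ("elastic net", make_pipeline(StandardScaler(), ElasticNet(alpha=0.1))),
    ("SVR (RBF kernel)", make_pipeline(StandardScaler(), SVR(C=10.0))),
    ("random forest", RandomForestRegressor(n_estimators=10, random_state=0)),
]

# Score every candidate with the same 5-fold cross-validation.
results = {}
for name, estimator in candidates:
    scores = cross_val_score(estimator, X, y, cv=5, scoring="r2")
    results[name] = scores.mean()
    print(f"{name:20s} mean cv R^2 = {results[name]:.3f}")
```

If a simple model already scores about as well as the forest under cross-validation, the simpler model is usually the better choice.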

On 23 June 2016 at 10:20, muhammad waseem <m.waseem.ahmad at gmail.com> wrote:

> Hi All,
> I am trying to use random forests for a regression problem, with 10 input
> variables and one output variable. I am getting a very good fit even with
> default parameters and low n_estimators. Even with n_estimators = 10, I get
> an R^2 value of 0.95 on the testing dataset (MSE=23) and a value of 0.99
> for the training set. I was wondering whether this is common with random
> forests or whether I am missing something. Could you please share your
> experience? The total number of samples (training + testing) is 10971.
> Also, what are the most important parameters (max_depth, bootstrap,
> max_leaf_nodes etc.) that I need to play with to tune my model even
> further? Lastly, is there a way I can visualise a single tree of my
> forest (just for demonstration purposes)?
> Please see a figure below to demonstrate how well it is fitting with
> default values.
>
>
>
> [image: Inline image 1]
> Thanks
> Kindest Regards
> Waseem
-------------- next part --------------
A non-text attachment was scrubbed...
Name: forest fitting.png
Type: image/png
Size: 86146 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/scikit-learn/attachments/20160623/2b8e989f/attachment-0001.png>

