[scikit-learn] Query regarding parameter class_weight in Random Forest Classifier

Sat Jan 21 13:18:45 EST 2017

Hi All,
Greetings!

I have a very basic question about using the class_weight parameter of
scikit-learn's RandomForestClassifier.
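For reference, in scikit-learn class_weight is passed to the RandomForestClassifier constructor. The fit method does not take class_weight; it only accepts per-record weights through fit(X, y, sample_weight=...). The sketch below fits one forest with class_weight={0: 1, 1: 10} and another with the equivalent per-record sample_weight. The small synthetic data set from make_classification and the 10x weight are assumptions for illustration, not the poster's data or settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small synthetic, imbalanced stand-in for the real data (about 99% class 0).
X, y = make_classification(n_samples=2000, weights=[0.99], random_state=0)

# class_weight is a constructor argument, not an argument of fit().
rf_cw = RandomForestClassifier(
    n_estimators=50, class_weight={0: 1, 1: 10}, random_state=0
)
rf_cw.fit(X, y)

# The same weighting expressed per record through fit(sample_weight=...).
sample_weight = np.where(y == 1, 10.0, 1.0)
rf_sw = RandomForestClassifier(n_estimators=50, random_state=0)
rf_sw.fit(X, y, sample_weight=sample_weight)

# Same random_state and same effective weights, so the forests should agree.
print(np.allclose(rf_cw.predict_proba(X), rf_sw.predict_proba(X)))
```

My understanding is that the forest multiplies the class weights into the per-sample weights before growing the trees, which is why the two forms are interchangeable when the seeds match.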

I have a fairly imbalanced sample: the ratio of positive-class to
negative-class records is 1:100. In other words, I have one million
negative-class records and 10,000 positive-class records. I have
successfully trained a random forest classifier on this data set.
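For comparison with the 1:100 ratio above, scikit-learn's class_weight="balanced" option is documented as setting each class's weight to n_samples / (n_classes * count of that class). A plain-Python sketch of that formula, applied to the class counts in the post; balanced_class_weights is a hypothetical helper, not a scikit-learn function.

```python
def balanced_class_weights(class_counts):
    """Hypothetical helper reproducing the documented "balanced" formula:
    weight[c] = n_samples / (n_classes * class_counts[c])."""
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {c: n_samples / (n_classes * n) for c, n in class_counts.items()}


# Counts from the post: 1,000,000 negatives (class 0), 10,000 positives (class 1).
weights = balanced_class_weights({0: 1_000_000, 1: 10_000})
print(weights)

# Per-record weight of a positive relative to a negative.
print(weights[1] / weights[0])
```

For these counts each positive record gets 100 times the weight of a negative record, which is exactly the factor that offsets a 1:100 imbalance.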

Next, for a different problem, I want to test the class_weight
parameter. I set class_weight to {0: 0.001, 1: 0.999} and ran the model
on the same data set described above, but with the positive class reduced
to 1,000 records [the intent being to give each positive record about 10
times the weight of a negative record, to compensate for the tenfold
reduction]. However, the results of the two runs (with and without
class_weight) are very different, whereas I expected them to be similar.
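One way to see why the runs differ is to compare the total weight each class contributes. Weights of 0.999 and 0.001 give each positive record about 999 times the weight of a negative record, not about 10 times, so the weighted classes end up roughly balanced (about 1:1) rather than reproducing the original 1:100. A plain-Python sketch using only the counts from the post; weighted_totals is a hypothetical helper, and the {0: 1, 1: 10} alternative is an assumption added for comparison, not one of the original runs.

```python
def weighted_totals(class_counts, class_weight):
    """Hypothetical helper: total weight contributed by each class when every
    record of class c carries weight class_weight[c]."""
    return {c: n * class_weight[c] for c, n in class_counts.items()}


# First run: 1,000,000 negatives, 10,000 positives, no class weighting.
original = weighted_totals({0: 1_000_000, 1: 10_000}, {0: 1.0, 1: 1.0})

# Second run: positives reduced to 1,000, with the weights from the post.
from_post = weighted_totals({0: 1_000_000, 1: 1_000}, {0: 0.001, 1: 0.999})

# Assumed alternative: positives weighted 10x relative to negatives.
tenfold = weighted_totals({0: 1_000_000, 1: 1_000}, {0: 1.0, 1: 10.0})

# Per-record weight ratio implied by the weights in the post (about 999).
print("positive/negative per-record ratio:", 0.999 / 0.001)

for name, totals in [("original", original), ("from_post", from_post),
                     ("tenfold", tenfold)]:
    print(name, totals, "neg:pos =", round(totals[0] / totals[1], 2))
```

Even with matching class totals, the two models would still not be identical, because the second training set contains only 1,000 of the original 10,000 positive records.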

Could you please let me know where I am going wrong? I suspect it is
something simple, but I want to improve my understanding of the concept.

Thanks!