[scikit-learn] random forests and multi-class probability

Matteo Caorsi m.caorsi at l2f.ch
Sat Aug 14 09:13:28 EDT 2021


Greetings!

I am currently out of the office, with limited access to email, until August 30th.
Please contact support at giotto.ai for technical issues concerning Giotto Platform.
Otherwise, I will reply to your email as soon as possible upon my return.

With best regards,

Matteo


On 27 Jul 2021, at 11:31, Sole Galli via scikit-learn <scikit-learn at python.org> wrote:

Thank you!

I was confused because the multiclass documentation says that estimators with built-in multiclass support, like decision trees and random forests, do not need wrapper classes such as OneVsRestClassifier.

So I have the following question: if I want to compute the PR curves or the ROC curve, say with micro-averaging, do I need to wrap the estimator in a one-vs-rest classifier, or does it not matter? The probability values do change slightly between the two.
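For illustration, a minimal sketch of a micro-averaged ROC curve computed directly from the forest's native `predict_proba` output, with no one-vs-rest wrapper. The synthetic data set and all variable names are assumptions made for this example; only the labels are binarized, so that every (sample, class) pair can be pooled into a single binary problem.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

# Synthetic 3-class data (hypothetical, for illustration only).
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=5,
    n_classes=3, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# The forest is fitted on the multiclass target as is.
forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X_train, y_train)
proba = forest.predict_proba(X_test)  # shape (n_samples, n_classes)

# One-hot encode the true labels in the same column order as predict_proba.
y_onehot = label_binarize(y_test, classes=forest.classes_)

# Micro-averaging: flatten both arrays and score all pairs together.
fpr, tpr, _ = roc_curve(y_onehot.ravel(), proba.ravel())
micro_auc = auc(fpr, tpr)
print(f"micro-averaged ROC AUC: {micro_auc:.3f}")
```

The same flattened arrays can be passed to `precision_recall_curve` for a micro-averaged PR curve.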

Thank you!





‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Tuesday, July 27th, 2021 at 11:22 AM, Guillaume Lemaître <g.lemaitre58 at gmail.com> wrote:

On 27 Jul 2021, at 11:08, Sole Galli via scikit-learn <scikit-learn at python.org> wrote:

Hello community,

Do I understand correctly that Random Forests are trained as one-vs-rest when the target has more than 2 classes? Say the target takes the values 0, 1 and 2: would the model then train 3 estimators, one per class, under the hood?

No. Each decision tree in the forest natively supports multiclass targets.

The predict_proba output is an array with 3 columns, containing the probability of each class. If it is one-vs-rest, am I correct to assume that the probabilities for the 3 classes do not necessarily add up to 1? Are they normalized? How is it done so that they do add up to 1?

Given the answer above, each row of the array returned by `predict_proba` sums to 1.

According to the documentation, the probabilities are computed as:

The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the trees in the forest. The class probability of a single tree is the fraction of samples of the same class in a leaf.
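The documented behaviour can be checked with a short sketch. The synthetic data set and the variable names below are assumptions made for this example. It compares the forest's `predict_proba` output with a hand-computed mean over the fitted trees in `estimators_`, and checks that each row sums to 1.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic 3-class data (hypothetical, for illustration only).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5,
    n_classes=3, random_state=0,
)

forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

# One column per class; no one-vs-rest wrapper is involved.
proba = forest.predict_proba(X)
row_sums = proba.sum(axis=1)

# Average the per-tree class probabilities by hand.
tree_mean = np.mean(
    [tree.predict_proba(X) for tree in forest.estimators_], axis=0
)

print("shape:", proba.shape)
print("rows sum to 1:", np.allclose(row_sums, 1.0))
print("equals mean over trees:", np.allclose(proba, tree_mean))
```

Because every tree outputs class fractions that sum to 1 within a leaf, their mean also sums to 1, so no extra normalization step is needed.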

Thank you

Sole

scikit-learn mailing list

scikit-learn at python.org

https://mail.python.org/mailman/listinfo/scikit-learn