[scikit-learn] Probabilities for LogisticRegression and LDA

Tom DLT tom.duprelatour at orange.fr
Thu Feb 7 20:49:42 EST 2019


*The set of independent regressions described in Wikipedia is *not* an OvR
model.* It is just a (somewhat confusing) way to parameterize the multinomial
logistic regression model.
OvR logistic regression and multinomial logistic regression are two
different models.

In the Wikipedia description of multinomial logistic regression as a set of
independent binary regressions, you have K - 1 binary regressions, each
between class k (for k from 1 to K - 1) and the pivot class K.
In OvR logistic regression, by contrast, you have K binary regressions, each
between class k (for k from 1 to K) and "not class k".
The normalization is therefore different.

Indeed, in multinomial logistic regression as a set of independent binary
regressions, you have (from the start) the property 1 = sum_k p(y = k).
The normalization 1 / (1 + sum_{k=1}^{K - 1} exp(beta_k X)) comes from
computing p(y = K) last using this property.
In OvR logistic regression, by contrast, you only have 1 = p_k(y = k) +
p_k(y != k). The probabilities p_k(y = k) therefore do not sum to one, and
you need to normalize them by sum_{k=1}^{K} p_k(y = k) to obtain a valid
probability for the OvR model. This is done in the same way in
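As a quick sketch of this pivot formulation (with made-up coefficients, just
to illustrate the normalization; the names beta, x, K, d are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
K, d = 4, 3                          # K classes, d features (arbitrary)
beta = rng.normal(size=(K - 1, d))   # one coefficient vector per non-pivot class
x = rng.normal(size=d)

scores = np.exp(beta @ x)            # exp(beta_k . x) for k = 1..K-1
denom = 1.0 + scores.sum()           # pivot class K contributes exp(0) = 1
p = np.append(scores, 1.0) / denom   # p(y = k) for k = 1..K-1, then p(y = K)

# The probabilities sum to one by construction, since p(y = K) is
# defined as whatever is left over.
assert np.isclose(p.sum(), 1.0)
```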
OneVsRestClassifier (
https://github.com/scikit-learn/scikit-learn/blob/1a850eb5b601f3bf0f88a43090f83c51b3d8c593/sklearn/multiclass.py#L350-L351
).
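In code, the OvR normalization amounts to something like this (the p_ovr
values are made-up per-class probabilities from K hypothetical independent
binary classifiers):

```python
import numpy as np

# Hypothetical per-class probabilities p_k(y = k) from K independent
# binary classifiers; note they do not sum to one.
p_ovr = np.array([0.7, 0.4, 0.2, 0.1])

# Normalize by sum_k p_k(y = k) to obtain a valid distribution over
# classes, in the same spirit as OneVsRestClassifier.
p = p_ovr / p_ovr.sum()
assert np.isclose(p.sum(), 1.0)
```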

But I agree that this description of the multinomial model is quite
confusing, compared to the log-linear/softmax description.
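For what it's worth, the two descriptions agree numerically: the pivot
formulation is exactly the softmax with the pivot class's coefficients
pinned to zero. A small check (coefficients and dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
K, d = 4, 3
beta = rng.normal(size=(K - 1, d))
x = rng.normal(size=d)

# Pivot form: append beta_K = 0, so exp(beta_K . x) = 1 in the denominator.
scores = np.append(beta @ x, 0.0)
p_pivot = np.exp(scores) / np.exp(scores).sum()

# Softmax over the same K scores (shifted by the max for stability).
e = np.exp(scores - scores.max())
p_softmax = e / e.sum()

assert np.allclose(p_pivot, p_softmax)
```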

Tom

Le jeu. 7 févr. 2019 à 08:31, Guillaume Lemaître <g.lemaitre58 at gmail.com> a
écrit :

> I was earlier looking at the code of predict_proba of LDA and
> LogisticRegression. While there are certainly some bugs, I was a bit
> confused, and I thought an email would be better than opening an issue
> since this might not be one.
>
> In the case of multiclass classification, the probabilities can be
> computed under two different assumptions - either as a set of independent
> binary regressions or as a log-linear model (
> https://en.wikipedia.org/wiki/Multinomial_logistic_regression).
>
> Then, we can compute the probabilities either by using one class as a pivot
> and computing exp(beta_c X) / (1 + sum(exp(beta_k X))), or by using all
> classes and computing a softmax.
>
> My question is related to the LogisticRegression in the OvR scheme.
> Naively, I thought that it corresponded to the former case (a set of
> independent regressions). However, we are using another normalization
> there which was first implemented in liblinear. I searched liblinear's
> issue tracker and found: https://github.com/cjlin1/liblinear/pull/20
>
> It is related to the following paper:
> https://www.csie.ntu.edu.tw/~cjlin/papers/generalBT.pdf
>
> My skill in math is limited and I am not sure I grasp what is going on.
> Could anybody shed some light on this OvR normalization and why it is
> different from the case of a set of independent regressions described in
> Wikipedia?
>
> Cheers,
> --
> Guillaume Lemaitre
> INRIA Saclay - Parietal team
> Center for Data Science Paris-Saclay
> https://glemaitre.github.io/
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
