[scikit-learn] Difference in normalization between Lasso and LogisticRegression + L1

Alexandre Gramfort alexandre.gramfort at inria.fr
Mon Jun 10 03:16:17 EDT 2019


see https://github.com/scikit-learn/scikit-learn/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aclosed+scale_C+
for historical perspective on this issue.

Alex

On Wed, May 29, 2019 at 11:32 PM Stuart Reynolds
<stuart at stuartreynolds.net> wrote:
>
> I looked into this a while ago. There were differences in which algorithms regularize the intercept and which do not (I believe liblinear does; lbfgs does not).
> All of the algorithms disagreed with logistic regression in scipy.
>
> - Stuart
>
> On Wed, May 29, 2019 at 10:50 AM Andreas Mueller <t3kcit at gmail.com> wrote:
>>
>> That is indeed not ideal.
>> I think we just went with what liblinear did, and kept that behavior when saga was introduced.
>> It should probably be scaled as in Lasso, I would imagine?
>>
>>
>> On 5/29/19 1:42 PM, Michael Eickenberg wrote:
>>
>> Hi Jesse,
>>
>> I think there was an effort to compare normalization conventions for the data attachment term between Lasso and Ridge regression back in 2012/13, but this might not have been finished or extended to LogisticRegression.
>>
>> If it is not documented well, it could definitely benefit from a documentation update.
>>
>> As for changing it to a more consistent state, that would require adding a keyword argument pertaining to this functionality and, after discussion, possibly changing the default value after some deprecation cycles (though this seems like a dangerous one to change at all imho).
>>
>> Michael
>>
>>
>> On Wed, May 29, 2019 at 10:38 AM Jesse Livezey <jesse.livezey at gmail.com> wrote:
>>>
>>> Hi everyone,
>>>
>>> I noticed recently that in the Lasso implementation (and docs), the MSE term is normalized by the number of samples
>>> https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html
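>>>
>>> Concretely, the Lasso docs give the objective as
>>>
>>>     (1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1
>>>
>>> so the data-fit term is averaged over the samples.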
>>>
>>> but for LogisticRegression + L1, the log-loss does not appear to be normalized by the number of samples. One consequence is that the effective strength of the regularization depends explicitly on the number of samples. For instance, with Lasso, if you tile a dataset N times you will learn the same coef, but with LogisticRegression you will learn a different coef.
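>>>
>>> By contrast, the LogisticRegression docs write the L1 objective as
>>>
>>>     ||w||_1 + C * sum_i log(1 + exp(-y_i * (x_i^T w + c)))
>>>
>>> with the loss summed rather than averaged. Here is a minimal sketch of the tiling effect (synthetic data; the exact coefficient values are illustrative, only the pattern matters):
>>>
>>> import numpy as np
>>> from sklearn.datasets import make_classification, make_regression
>>> from sklearn.linear_model import Lasso, LogisticRegression
>>>
>>> # Lasso averages the MSE term, so tiling the data changes nothing.
>>> Xr, yr = make_regression(n_samples=50, n_features=5, noise=1.0, random_state=0)
>>> lasso = Lasso(alpha=1.0)
>>> coef_once = lasso.fit(Xr, yr).coef_.copy()
>>> coef_tiled = lasso.fit(np.tile(Xr, (3, 1)), np.tile(yr, 3)).coef_
>>> print(np.allclose(coef_once, coef_tiled))  # True: same coef
>>>
>>> # LogisticRegression sums the log-loss, so tiling the data 3x is
>>> # equivalent to multiplying C by 3, weakening the effective L1 penalty.
>>> Xc, yc = make_classification(n_samples=50, n_features=5, random_state=0)
>>> logreg = LogisticRegression(penalty='l1', C=1.0, solver='liblinear')
>>> coef_once = logreg.fit(Xc, yc).coef_.copy()
>>> coef_tiled = logreg.fit(np.tile(Xc, (3, 1)), np.tile(yc, 3)).coef_
>>> print(np.allclose(coef_once, coef_tiled))  # False: different coef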
>>>
>>> Is this the intended behavior of LogisticRegression? I was surprised by this. Either way, it would be helpful to document this more clearly in the Logistic Regression docs (I can make a PR.)
>>> https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
>>>
>>> Jesse