[scikit-learn] Difference in normalization between Lasso and LogisticRegression + L1

Wed May 29 13:34:49 EDT 2019

Hi everyone,

I noticed recently that in the Lasso implementation (and docs), the
squared-error term is normalized by the number of samples, i.e. the
objective is 1 / (2 * n_samples) * ||y - Xw||^2_2 + alpha * ||w||_1:
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html

but for LogisticRegression + L1, the log loss does not seem to be normalized
by the number of samples: it is summed over the samples and scaled by C. One
consequence is that the strength of the regularization depends explicitly on
the number of samples. For instance, in Lasso, if you tile a dataset N times,
you will learn the same coefficients, but in LogisticRegression, you will
learn different coefficients.
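
Here is a minimal sketch of the tiling experiment described above. The
synthetic dataset, the values of alpha and C, the tolerances, and the helper
names (fit_lasso, fit_logreg) are all assumptions chosen for illustration,
not something taken from the scikit-learn docs. If the log loss really is
summed rather than averaged, tiling the data N times should act like
multiplying C by N, so the last line prints that comparison as well.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import Lasso, LogisticRegression

# Synthetic data (assumption: any small dense dataset would do here).
X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=5, random_state=0)
y_float = y.astype(float)  # Lasso treats the 0/1 labels as a regression target

# Tile the dataset N times: same rows repeated, N times as many samples.
N = 5
X_tiled = np.tile(X, (N, 1))
y_tiled = np.tile(y, N)
y_float_tiled = np.tile(y_float, N)


def fit_lasso(X, y, alpha=0.05):
    # Hypothetical helper: tight tolerance so solver error does not mask the effect.
    model = Lasso(alpha=alpha, tol=1e-10, max_iter=100000)
    return model.fit(X, y).coef_


def fit_logreg(X, y, C=0.1):
    # Hypothetical helper: L1-penalized logistic regression via liblinear.
    model = LogisticRegression(penalty="l1", solver="liblinear",
                               C=C, tol=1e-8, max_iter=10000)
    return model.fit(X, y).coef_.ravel()


lasso_orig = fit_lasso(X, y_float)
lasso_tiled = fit_lasso(X_tiled, y_float_tiled)

logreg_orig = fit_logreg(X, y)
logreg_tiled = fit_logreg(X_tiled, y_tiled)
# Original data with C scaled by N, to compare against the tiled fit.
logreg_rescaled = fit_logreg(X, y, C=0.1 * N)

print("Lasso, max |orig - tiled|:          ",
      np.max(np.abs(lasso_orig - lasso_tiled)))
print("LogReg, max |orig - tiled|:         ",
      np.max(np.abs(logreg_orig - logreg_tiled)))
print("LogReg, max |orig with C*N - tiled|:",
      np.max(np.abs(logreg_rescaled - logreg_tiled)))
```

If one wanted LogisticRegression to behave like Lasso in this respect, one
option (again, just a sketch) would be to pick C proportional to
1 / n_samples.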

Is this the intended behavior of LogisticRegression? I was surprised by
this. Either way, it would be helpful to document this more clearly in the
Logistic Regression docs (I can make a PR.)
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html

Jesse