[scikit-learn] How is linear regression in scikit-learn done? Do you need train and test split?

C W tmrsg11 at gmail.com
Wed Jun 12 14:36:42 EDT 2019


Thank you both for the paper references.

@ Andreas,
What is your take? And what are you implying?

The Breiman (2001) paper contrasts the black box approach with the
statistical approach. I call them black box vs. open box. He advocates
black box in the paper.
Black box:
y <--- nature <--- x

Open box:
y <--- linear regression <--- x

Decision trees and neural nets are black box models. They require large
amounts of data to train, and they skip the part where you try to
understand nature.

Because it is a black box, you can't open it up to see what's inside. Linear
regression is a very simple model that you can use to approximate nature,
but the key thing is that you need to know how the data are generated.
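
To illustrate the "open box" point, here is a minimal sketch, assuming
synthetic data whose generating process is known to be linear. The true
coefficients (3, -2), the intercept (1), and the noise level are made up
for illustration; only LinearRegression, coef_, and intercept_ come from
sklearn.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Assumed data-generating process (hypothetical): y = 3*x1 - 2*x2 + 1 + noise
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.0 + rng.normal(scale=0.1, size=500)

model = LinearRegression().fit(X, y)

# The fitted parameters can be read directly off the model.
print("coefficients:", model.coef_)
print("intercept:", model.intercept_)
```

When the linear assumption holds, the estimates land close to the values
used to generate the data. When it does not, the coefficients are still
readable but no longer describe how the data were produced.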

@ Brown,
I know nothing about molecular modeling. The paper you linked, "Beware of
q2!", raises some interesting points. As far as I can see, in sklearn linear
regression, score is R^2.
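
A minimal sketch of that last point, assuming synthetic data and an
arbitrary 75/25 split (the coefficients, sample size, and random seeds are
hypothetical). It compares score() on held-out data with R^2 computed by
hand as 1 - SS_res / SS_tot.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data, made up for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -0.5, 2.0]) + rng.normal(scale=1.0, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model = LinearRegression().fit(X_train, y_train)

# score() returns R^2 on whatever data it is given.
r2_train = model.score(X_train, y_train)
r2_test = model.score(X_test, y_test)

# Same quantity by hand on the test set: 1 - SS_res / SS_tot.
residual_ss = np.sum((y_test - model.predict(X_test)) ** 2)
total_ss = np.sum((y_test - y_test.mean()) ** 2)
r2_manual = 1.0 - residual_ss / total_ss

print(r2_train, r2_test, r2_manual)
```

So score() on the training set gives the usual in-sample R^2, while score()
on the test set gives an out-of-sample R^2 for the same fitted model.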

On Wed, Jun 5, 2019 at 9:11 AM Andreas Mueller <t3kcit at gmail.com> wrote:

>
> On 6/4/19 8:44 PM, C W wrote:
> > Thank you all for the replies.
> >
> > I agree that prediction accuracy is great for evaluating black-box ML
> > models. Especially advanced models like neural networks, or
> > not-so-black models like LASSO, because they are NP-hard to solve.
> >
> > Linear regression is not a black-box. I view prediction accuracy as an
> > overkill on interpretable models. Especially when you can use
> > R-squared, coefficient significance, etc.
> >
> > Prediction accuracy also does not tell you which feature is important.
> >
> > What do you guys think? Thank you!
> >
> Did you read the paper that I sent? ;)