[scikit-learn] Problem with nested cross-validation example?
Daniel Homola
daniel.homola11 at imperial.ac.uk
Tue Nov 29 04:53:44 EST 2016
Hi Joel,
Unfortunately, the link says "artifact not found", whatever that means.
On 29/11/16 09:50, Joel Nothman wrote:
> This makes me a little sad. Do Albert and Daniel think the explicit
> reference from blurb to code proposed at
> https://github.com/scikit-learn/scikit-learn/pull/7949 is a sufficient
> remedy? Otherwise could you please propose another clarifying change?
> Thanks.
>
> On 29 November 2016 at 20:04, Albert Thomas <albertthomas88 at gmail.com> wrote:
>
> When I was reading Sebastian's blog posts on cross-validation a
> few weeks ago, I also found the nested cross-validation example
> on scikit-learn. At first, like Daniel, I thought the example was
> not doing what it should be doing. But after a few minutes I
> realized that it was correct. So I am in favour of a bit more
> clarification.
>
> Albert
>
> On Tue, 29 Nov 2016 at 02:53, Sebastian Raschka
> <se.raschka at gmail.com> wrote:
>
> At first glance, the procedure shown in the image and the code
> example seem to do the same thing? Maybe it would be
> worth adding an explanatory figure like this to the docs to
> clarify?
>
> > On Nov 28, 2016, at 7:07 PM, Joel Nothman
> > <joel.nothman at gmail.com> wrote:
> >
> > If that clarifies, please offer changes to the example (as a
> pull request) that make this clearer.
> >
> > On 29 November 2016 at 11:06, Joel Nothman
> > <joel.nothman at gmail.com> wrote:
> > Briefly:
> >
> > clf = GridSearchCV(estimator=svr, param_grid=p_grid, cv=inner_cv)
> > nested_score = cross_val_score(clf, X=X_iris, y=y_iris, cv=outer_cv)
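
For anyone wanting to run the two lines above on their own, here is a minimal self-contained sketch. The definitions of `svr`, `p_grid`, `inner_cv` and `outer_cv` are not given in this thread, so the ones below are assumptions for illustration (an RBF-kernel `SVC`, a small grid over `C` and `gamma`, and two 4-fold `KFold` splitters), not necessarily what the linked example uses.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X_iris, y_iris = load_iris(return_X_y=True)

# Assumed setup (not stated in the thread): an RBF-kernel SVC bound to
# the name `svr`, and a small illustrative parameter grid.
svr = SVC(kernel="rbf")
p_grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1]}

# Assumed splitters: 4 folds each, shuffled with fixed seeds.
inner_cv = KFold(n_splits=4, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=0)

# The grid search itself is the estimator handed to cross_val_score,
# so it is re-run from scratch on every outer training set.
clf = GridSearchCV(estimator=svr, param_grid=p_grid, cv=inner_cv)
nested_score = cross_val_score(clf, X=X_iris, y=y_iris, cv=outer_cv)

print(nested_score.mean())
```

The point to notice is that `cross_val_score` only ever passes the outer training rows to `clf.fit`, so `inner_cv` never sees the outer test fold.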
> >
> > Each train/test split in cross_val_score holds out test
> > data. GridSearchCV then splits each train set into
> > (inner-)train and validation sets. There is no leakage of test
> > set knowledge from the outer loop into the grid search
> > optimisation; no leakage of validation set knowledge into the
> > SVR optimisation. The outer test data are reused as training
> > data in other splits, but within each split are only used to
> > measure generalisation error.
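
The same argument can be spelled out by unrolling the outer loop by hand. This is only a sketch of roughly what `cross_val_score` does with a `GridSearchCV` estimator, not its actual implementation; the grid, the fold counts and the seeds are illustrative assumptions. The `assert` checks that no row visible to the inner loop belongs to the outer test fold.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
p_grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1]}  # illustrative grid
inner_cv = KFold(n_splits=4, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=1)

outer_scores = []
for train_idx, test_idx in outer_cv.split(X, y):
    X_train, y_train = X[train_idx], y[train_idx]

    # Every row the inner loop can see comes from the outer training set.
    # Inner indices are relative to X_train, so map them back first.
    for inner_train, inner_val in inner_cv.split(X_train):
        seen = np.concatenate([train_idx[inner_train], train_idx[inner_val]])
        assert np.intersect1d(seen, test_idx).size == 0

    # The grid search tunes C and gamma on the outer training data only,
    # then refits the best setting on all of it (refit=True by default).
    search = GridSearchCV(SVC(kernel="rbf"), param_grid=p_grid, cv=inner_cv)
    search.fit(X_train, y_train)

    # The held-out outer fold is used once, purely for scoring.
    outer_scores.append(search.score(X[test_idx], y[test_idx]))

print(np.mean(outer_scores))
```

Each outer fold may select different hyperparameters, which is expected: the nested score estimates the performance of the whole tuning procedure rather than of one fixed parameter setting.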
> >
> > Is that clear?
> >
> > On 29 November 2016 at 10:30, Daniel Homola
> > <dani.homola at gmail.com> wrote:
> > Dear all,
> >
> > I was wondering if the following example code is valid:
> >
> http://scikit-learn.org/stable/auto_examples/model_selection/plot_nested_cross_validation_iris.html
> >
> > My understanding is that the point of nested
> cross-validation is to prevent any data leakage from the inner
> grid-search/param optimization CV loop into the outer model
> evaluation CV loop. This could be achieved if the outer CV
> loop's test data is completely separated from the inner loop's
> CV, as shown here:
> >
> https://mlr-org.github.io/mlr-tutorial/release/html/img/nested_resampling.png
> >
> > The code in the above example however doesn't seem to
> achieve this in any way.
> >
> > Am I missing something here?
> >
> > Thanks a lot,
> > dh
> >
> > _______________________________________________
> > scikit-learn mailing list
> > scikit-learn at python.org <mailto:scikit-learn at python.org>
> > https://mail.python.org/mailman/listinfo/scikit-learn
> <https://mail.python.org/mailman/listinfo/scikit-learn>