[scikit-learn] Problem with nested cross-validation example?

Daniel Homola daniel.homola11 at imperial.ac.uk
Tue Nov 29 04:53:44 EST 2016


Hi Joel,

Unfortunately, the link says "artifact not found", whatever that means.


On 29/11/16 09:50, Joel Nothman wrote:
> This makes me a little sad. Do Albert and Daniel think the explicit 
> reference from blurb to code proposed at 
> https://github.com/scikit-learn/scikit-learn/pull/7949 is a sufficient 
> remedy? Otherwise could you please propose another clarifying change? 
> Thanks.
>
> On 29 November 2016 at 20:04, Albert Thomas <albertthomas88 at gmail.com> wrote:
>
>     When I was reading Sebastian's blog posts on cross-validation a
>     few weeks ago, I also found the nested cross-validation example
>     on scikit-learn. At first, like Daniel, I thought the example was
>     not doing what it should be doing. But after a few minutes I
>     realized that it was correct. So I am in favour of a bit more
>     clarification.
>
>     Albert
>
>     On Tue, 29 Nov 2016 at 02:53, Sebastian Raschka <se.raschka at gmail.com> wrote:
>
>         At first glance, the image at the link and the code example
>         seem to show the same thing? Maybe it would be worth adding
>         an explanatory figure like this to the docs to clarify?
>
>         > On Nov 28, 2016, at 7:07 PM, Joel Nothman <joel.nothman at gmail.com> wrote:
>         >
>         > If that clarifies, please offer changes to the example (as a
>         > pull request) that make this clearer.
>         >
>         > On 29 November 2016 at 11:06, Joel Nothman <joel.nothman at gmail.com> wrote:
>         > Briefly:
>         >
>         > clf = GridSearchCV(estimator=svr, param_grid=p_grid, cv=inner_cv)
>         > nested_score = cross_val_score(clf, X=X_iris, y=y_iris, cv=outer_cv)
>         >
>         > Each train/test split in cross_val_score holds out test
>         > data. GridSearchCV then splits each train set into
>         > (inner-)train and validation sets. There is no leakage of
>         > test set knowledge from the outer loop into the grid search
>         > optimisation, and no leakage of validation set knowledge
>         > into the SVR optimisation. Each sample is held out as outer
>         > test data in one split and reused as training data in the
>         > others, but within each split the test data are only used
>         > to measure generalisation error.
>         >
>         > Is that clear?
>         >
>         > On 29 November 2016 at 10:30, Daniel Homola <dani.homola at gmail.com> wrote:
>         > Dear all,
>         >
>         > I was wondering if the following example code is valid:
>         >
>         > http://scikit-learn.org/stable/auto_examples/model_selection/plot_nested_cross_validation_iris.html
>         >
>         > My understanding is that the point of nested
>         > cross-validation is to prevent any data leakage from the
>         > inner grid-search/parameter-optimization CV loop into the
>         > outer model evaluation CV loop. This could be achieved if
>         > the outer CV loop's test data is completely separated from
>         > the inner loop's CV, as shown here:
>         >
>         > https://mlr-org.github.io/mlr-tutorial/release/html/img/nested_resampling.png
>         >
>         > The code in the above example, however, doesn't seem to
>         > achieve this in any way.
>         >
>         > Am I missing something here?
>         >
>         > Thanks a lot,
>         > dh
>         >
>         > _______________________________________________
>         > scikit-learn mailing list
>         > scikit-learn at python.org <mailto:scikit-learn at python.org>
>         > https://mail.python.org/mailman/listinfo/scikit-learn
>         <https://mail.python.org/mailman/listinfo/scikit-learn>
>         >

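The split structure Joel describes in the quoted message can be sketched in plain Python. This is only an illustration under stated assumptions, not scikit-learn's implementation: `k_fold` and `nested_splits` are hypothetical helpers, the folds are contiguous and unshuffled, and the sample and fold counts are arbitrary. The outer loop stands in for cross_val_score and the inner loop for GridSearchCV.

```python
def k_fold(indices, k):
    """Yield (train, test) index lists for k contiguous folds (hypothetical helper)."""
    indices = list(indices)
    n = len(indices)
    # Fold sizes differ by at most one when n is not divisible by k.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def nested_splits(n_samples, outer_k, inner_k):
    """Yield (outer_test, inner_train, validation) for every inner split."""
    for outer_train, outer_test in k_fold(range(n_samples), outer_k):
        # The inner loop only ever receives the outer training indices,
        # which is the role GridSearchCV plays inside cross_val_score.
        for inner_train, validation in k_fold(outer_train, inner_k):
            yield outer_test, inner_train, validation


# Arbitrary sizes chosen for illustration only.
splits = list(nested_splits(n_samples=12, outer_k=3, inner_k=2))

# No outer test index ever reaches the inner (grid search) loop.
leak_free = all(
    not set(outer_test) & (set(inner_train) | set(validation))
    for outer_test, inner_train, validation in splits
)

# Across the outer splits, every sample is held out exactly once.
held_out = sorted(i for _, test in k_fold(range(12), 3) for i in test)

print(leak_free)
print(held_out == list(range(12)))
```

Within any one outer split the held-out indices never appear in the inner train or validation sets, even though the same indices are used for training in the other outer splits. That appears to be the point that makes the docs example look wrong at first.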