[scikit-learn] Problem with nested cross-validation example?

Joel Nothman joel.nothman at gmail.com
Tue Nov 29 05:48:39 EST 2016


Wait an hour for the docs to build and you won't get artifact not found :)

If you'd looked at the PR diff, you'd see I've modified the description to
refer directly to GridSearchCV and cross_val_score:

> In the inner loop (here executed by GridSearchCV), the score is
> approximately maximized by fitting a model to each training set, and then
> directly maximized in selecting (hyper)parameters over the validation set.
> In the outer loop (here in cross_val_score), ...
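
To make that description concrete, here is a minimal sketch of the same computation with the outer loop written out by hand. The estimator, parameter grid, fold counts and random_state values are assumptions chosen for illustration, not necessarily what the example in the docs uses; only the GridSearchCV and cross_val_score calls come from this thread.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X_iris, y_iris = load_iris(return_X_y=True)

# Assumed setup: estimator, grid and splitters are illustrative choices.
svr = SVC(kernel="rbf")
p_grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1]}
inner_cv = KFold(n_splits=4, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=0)

clf = GridSearchCV(estimator=svr, param_grid=p_grid, cv=inner_cv)

# Compact form: cross_val_score drives the outer loop.
nested_score = cross_val_score(clf, X=X_iris, y=y_iris, cv=outer_cv)

# Explicit form of the same outer loop.
manual_scores = []
for train_idx, test_idx in outer_cv.split(X_iris, y_iris):
    X_train, y_train = X_iris[train_idx], y_iris[train_idx]
    X_test, y_test = X_iris[test_idx], y_iris[test_idx]
    # Inner loop: the grid search only ever sees the outer training data.
    # It splits that data again with inner_cv to pick hyperparameters,
    # then refits the best candidate on all of X_train.
    search = clone(clf).fit(X_train, y_train)
    # Outer loop: the held-out test data are used only for scoring.
    manual_scores.append(search.score(X_test, y_test))
manual_scores = np.array(manual_scores)

print(nested_score)
print(manual_scores)
```

With fixed integer random_state values both forms should produce the same per-fold scores, which is the point: cross_val_score hides the outer loop, it does not skip it.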


Further comments in the code are welcome.
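
For anyone who wants to check the data separation without scikit-learn in the way, the index bookkeeping can be sketched in plain Python. The helper k_fold_indices below is hypothetical (it is not a scikit-learn function) and uses a simple unshuffled interleaved split; the sample and fold counts are arbitrary.

```python
def k_fold_indices(indices, n_splits):
    """Return one (train, held_out) pair of index lists per fold."""
    folds = [indices[i::n_splits] for i in range(n_splits)]
    pairs = []
    for i, held_out in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        pairs.append((train, held_out))
    return pairs


samples = list(range(12))
seen_as_test = []

# Outer loop: hold out test data for measuring generalisation error.
for outer_train, outer_test in k_fold_indices(samples, n_splits=3):
    # Inner loop: split only the outer training data again.
    for inner_train, validation in k_fold_indices(outer_train, n_splits=2):
        # Neither inner set may contain an outer test sample.
        assert not set(outer_test) & set(inner_train)
        assert not set(outer_test) & set(validation)
        # The inner sets together are exactly the outer training data.
        assert sorted(inner_train + validation) == sorted(outer_train)
    seen_as_test.extend(outer_test)

# Across outer splits every sample serves as test data exactly once,
# and as training data in the remaining splits.
assert sorted(seen_as_test) == samples
```

The inner splitter is only ever handed outer_train, so there is no path by which an outer test index can reach the hyperparameter search.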

On 29 November 2016 at 21:42, Albert Thomas <albertthomas88 at gmail.com>
wrote:

> I also get "artifact not found". And I agree with Daniel.
>
> Once you decompose what the code is doing you realize that it does the
> job. The simplicity of the code to perform nested cross validation using
> scikit-learn objects is impressive but I guess it also makes it less
> obvious. So making the example clearer by explaining what the code does or
> by adding a few comments can be useful for others.
>
> Albert
>
> On Tue, 29 Nov 2016 at 11:19, Daniel Homola <daniel.homola11 at imperial.ac.uk>
> wrote:
>
>> Hi Joel,
>>
>> Thanks a lot for the answer.
>>
>> "Each train/test split in cross_val_score holds out test data.
>> GridSearchCV then splits each train set into (inner-)train and validation
>> sets. "
>>
>> I know this is what nested CV is supposed to do, but the code does an
>> excellent job of obscuring it. I'll try to add some clarification as
>> comments later today.
>>
>> Cheers,
>>
>> d
>>
>>
>> On 29/11/16 00:07, Joel Nothman wrote:
>>
>> If that clarifies, please offer changes to the example (as a pull
>> request) that make this clearer.
>>
>> On 29 November 2016 at 11:06, Joel Nothman <joel.nothman at gmail.com>
>> wrote:
>>
>> Briefly:
>>
>> clf = GridSearchCV(estimator=svr, param_grid=p_grid, cv=inner_cv)
>> nested_score = cross_val_score(clf, X=X_iris, y=y_iris, cv=outer_cv)
>>
>>
>> Each train/test split in cross_val_score holds out test data.
>> GridSearchCV then splits each train set into (inner-)train and validation
>> sets. There is no leakage of test set knowledge from the outer loop into
>> the grid search optimisation; no leakage of validation set knowledge into
>> the SVR optimisation. The outer test data are reused as training data in
>> other splits, but within each split are only used to measure
>> generalisation error.
>>
>> Is that clear?
>>
>> On 29 November 2016 at 10:30, Daniel Homola <dani.homola at gmail.com>
>> wrote:
>>
>> Dear all,
>>
>>
>> I was wondering if the following example code is valid:
>>
>> http://scikit-learn.org/stable/auto_examples/model_selection/plot_nested_cross_validation_iris.html
>>
>> My understanding is that the point of nested cross-validation is to
>> prevent any data leakage from the inner grid-search/param optimization CV
>> loop into the outer model evaluation CV loop. This could be achieved if the
>> outer CV loop's test data is completely separated from the inner loop's CV,
>> as shown here:
>>
>> https://mlr-org.github.io/mlr-tutorial/release/html/img/nested_resampling.png
>>
>>
>> The code in the above example however doesn't seem to achieve this in any
>> way.
>>
>>
>> Am I missing something here?
>>
>>
>> Thanks a lot,
>>
>> dh
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>>