[scikit-learn] Scores in Cross Validation

Thu Jan 26 11:02:48 EST 2017

Hello,

I have 2 questions regarding cross_val_score.
1. Do the scores returned by cross_val_score correspond only to the test
set, or to the whole dataset (training and test sets)?
I tried to look at the source code, and it looks like it returns only the
test-set score (line 145: "return_train_score=False"), but I am not sure
I am reading the code correctly.
https://github.com/scikit-learn/scikit-learn/blob/14031f6/sklearn/model_selection/_validation.py#L36
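A minimal sketch of how to check this empirically, assuming scikit-learn 0.19 or later (where `cross_validate` is available). The iris data, `LogisticRegression` and the `StratifiedKFold` settings are placeholders chosen only for illustration. The per-fold scores from `cross_val_score` should match the `test_score` entries of `cross_validate`, and training-fold scores are only reported when `return_train_score=True` is passed explicitly.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, cross_validate

# Placeholder data and model; assumes scikit-learn >= 0.19 for cross_validate.
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# One score per fold, each computed on that fold's held-out test part.
test_only = cross_val_score(model, X, y, cv=cv)

# Same folds, but also ask for the score on each fold's training part.
results = cross_validate(model, X, y, cv=cv, return_train_score=True)

print("cross_val_score:", test_only)
print("test_score:     ", results["test_score"])
print("train_score:    ", results["train_score"])
```

If the two test-score arrays agree, that supports reading cross_val_score as test-fold only.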
I came across the paper below, in which the authors use the score on the
whole dataset when performing repeated nested-loop cross-validation,
grid-search CV, etc.; e.g., see Algorithm 1 (line 1c) and Algorithm 2
(line 2d) on page 3.
https://jcheminf.springeropen.com/articles/10.1186/1758-2946-6-10
I wonder what the pros and cons are of using the accuracy score on the
whole dataset versus just the test set. Any thoughts?
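To make the question concrete, here is a rough sketch comparing three ways of computing accuracy, again with placeholder data and model. It is only one possible reading of "score on the whole dataset"; the paper's exact procedure may differ from all of these. (a) is the mean of the per-fold test scores, (b) is a single score over the whole dataset computed from pooled out-of-fold predictions via `cross_val_predict`, and (c) is the resubstitution score of a model fitted and scored on the whole dataset.

```python
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict, cross_val_score

# Placeholder data and model, chosen only for illustration.
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# (a) Mean of per-fold test accuracies (what cross_val_score gives per fold).
fold_scores = cross_val_score(model, X, y, cv=cv)
mean_fold_acc = fold_scores.mean()

# (b) Pooled: every sample is predicted once, by a model that did not see it,
#     and a single accuracy is computed over the whole dataset.
oof_pred = cross_val_predict(model, X, y, cv=cv)
pooled_acc = accuracy_score(y, oof_pred)

# (c) Resubstitution: fit on the whole dataset and score on the same data.
resub_acc = clone(model).fit(X, y).score(X, y)

print(f"(a) mean fold accuracy: {mean_fold_acc:.4f}")
print(f"(b) pooled accuracy:    {pooled_acc:.4f}")
print(f"(c) resubstitution:     {resub_acc:.4f}")
```

With five equal-sized folds (30 samples each here), (a) and (b) coincide for accuracy; they can diverge with unequal fold sizes or with metrics that are not simple averages over samples, such as ROC AUC. The scikit-learn user guide also cautions that pooled `cross_val_predict` output is not an appropriate measure of generalization error. (c) scores the model on data it was trained on, so it tends to be optimistic.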

2. On line 283 of the cross_val_score source code, there is a function
_score, but I can't find where it is called. Could you point me to where
it is used?
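As far as I can tell, in `_validation.py` the private `_score` helper is called from `_fit_and_score`, which `cross_val_score` runs once per fold (directly or, in later versions, via `cross_validate`): the test score is always computed, the training score only when `return_train_score` is true. The heavily simplified sketch below mirrors that structure. The `*_sketch` functions are hypothetical stand-ins, not scikit-learn's actual private code, which takes more arguments and changes between versions.

```python
import numbers

import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import check_scoring
from sklearn.model_selection import StratifiedKFold, cross_val_score


def score_sketch(estimator, X, y, scorer):
    """Hypothetical stand-in for the private _score helper."""
    score = scorer(estimator, X, y)
    if not isinstance(score, numbers.Number):
        raise ValueError(f"scorer returned {score!r}, expected a number")
    return score


def fit_and_score_sketch(estimator, X, y, scorer, train, test,
                         return_train_score=False):
    """Hypothetical stand-in for the private _fit_and_score helper."""
    fitted = clone(estimator).fit(X[train], y[train])
    # The test-fold score is always computed...
    test_score = score_sketch(fitted, X[test], y[test], scorer)
    if return_train_score:
        # ...the training-fold score only on request.
        train_score = score_sketch(fitted, X[train], y[train], scorer)
        return train_score, test_score
    return test_score


def cross_val_score_sketch(estimator, X, y, cv, scoring=None):
    """One fit_and_score_sketch call per fold; returns test scores only."""
    scorer = check_scoring(estimator, scoring=scoring)
    return np.array([
        fit_and_score_sketch(estimator, X, y, scorer, train, test)
        for train, test in cv.split(X, y)
    ])


# Placeholder data and model, chosen only for illustration.
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

sketch_scores = cross_val_score_sketch(model, X, y, cv)
library_scores = cross_val_score(model, X, y, cv=cv)
print(sketch_scores)
print(library_scores)
```

Matching outputs would be consistent with `_score` being reached only through the per-fold fit-and-score step, not called directly by `cross_val_score`.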

Thank you very much!
Raga