[scikit-learn] Why is cross_val_predict discouraged?

Joel Nothman joel.nothman at gmail.com
Wed Apr 3 07:59:18 EDT 2019


The equations in Murphy and Hastie very clearly assume a metric that
decomposes over samples (a loss function). Several popular metrics do
not decompose that way.
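
As a rough sketch of what "decomposes over samples" means (the numbers
here are made up for illustration): MSE is just a mean of per-sample
losses, so computing it on pooled predictions or averaging equal-sized
per-fold values gives the same answer, which is exactly the property
precision and ROC AUC lack.

import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 1.5, 2.5, 4.5])

# MSE over all samples at once ...
pooled_mse = mean_squared_error(y_true, y_pred)

# ... equals the average of MSEs over two equal-sized "folds".
per_fold_mse = np.mean([
    mean_squared_error(y_true[:2], y_pred[:2]),
    mean_squared_error(y_true[2:], y_pred[2:]),
])

assert np.isclose(pooled_mse, per_fold_mse)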

For a metric like MSE, the score computed on the pooled predictions
will be almost identical to the average of the per-fold scores,
provided the test sets have almost the same size. For something like
recall (sensitivity), the two will be almost identical provided the
test-set sizes are similar *and* the folds are stratified. For
something like precision, whose denominator is determined by the
biases of the learnt classifier on each test set, you can't say the
same. For something like ROC AUC, which relies on a decision function
that may not be equivalently calibrated across splits, evaluating in
this way is almost meaningless.
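
A rough sketch of that contrast (the dataset, model and CV setup below
are arbitrary illustrative choices, not anything from this thread):
cross_val_score computes ROC AUC within each fold and averages the
results, while scoring the pooled cross_val_predict output ranks
decision values produced by five differently fitted models against one
another, so the two numbers need not agree.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score, cross_val_predict
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=500, random_state=0)
model = LogisticRegression(max_iter=1000)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Per-fold scoring: ROC AUC is computed within each split, then averaged.
per_fold = cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()

# Pooled scoring: out-of-fold decision values from differently fitted
# models are concatenated and ranked against each other, which the
# metric was never meant to do.
pooled_scores = cross_val_predict(model, X, y, cv=cv,
                                  method="decision_function")
pooled = roc_auc_score(y, pooled_scores)

print(per_fold, pooled)  # the two numbers need not agree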

On Wed, 3 Apr 2019 at 22:01, Boris Hollas
<hollas at informatik.htw-dresden.de> wrote:
>
> I use
>
> sum((cross_val_predict(model, X, y) - y)**2) / len(y)        (*)
>
> to evaluate the performance of a model. This conforms with Murphy: Machine Learning, section 6.5.3, and Hastie et al: The Elements of Statistical Learning, eq. 7.48. However, according to the documentation of cross_val_predict, "it is not appropriate to pass these predictions into an evaluation metric". While it is obvious that cross_val_predict is different from cross_val_score, I don't see what should be wrong with (*).
>
> Also, the explanation that "cross_val_predict simply returns the labels (or probabilities)" is unclear, if not wrong. As I understand it, this function returns estimates, not labels or probabilities.
>
> Regards, Boris
>

