[scikit-learn] Why is cross_val_predict discouraged?

Joel Nothman joel.nothman at gmail.com
Wed Apr 3 17:46:57 EDT 2019


Pull requests improving the documentation are always welcome. At a minimum,
users need to know that these compute different things.

Accuracy is not precision. Precision is the number of true positives
divided by the number of predicted positives (true positives plus false
positives). It therefore cannot be decomposed into a sample-wise measure
without knowing the rate of positive predictions, and that rate depends on
the training data and the learning algorithm.
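
A quick sketch of the difference (the dataset, classifier and fold count
below are purely illustrative, not anything from this thread): precision
computed on the pooled cross_val_predict output need not match the mean of
the per-fold precisions, because each fold contributes a different number
of positive predictions to the denominator.

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_predict, cross_val_score
    from sklearn.metrics import precision_score

    X, y = make_classification(n_samples=200, weights=[0.8, 0.2], random_state=0)
    clf = LogisticRegression(max_iter=1000)

    # precision over the pooled out-of-fold predictions
    pooled = precision_score(y, cross_val_predict(clf, X, y, cv=5))

    # mean of per-fold precisions; the per-fold denominators (number of
    # positive predictions) differ, so this need not equal `pooled`
    per_fold = cross_val_score(clf, X, y, cv=5, scoring="precision").mean()

    print(pooled, per_fold)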

I'm not a statistician and cannot speak to the issue of computing a mean of
means, but if what we are trying to estimate is the performance, on a test
sample of size approximately n_t, of a model trained on a sample of size
approximately N - n_t, then taking a mean over such measures (with whatever
score function) does not seem unreasonable to me.
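
For a loss that does decompose over samples, such as squared error, the two
computations essentially agree when the folds have (almost) the same size.
A minimal sketch, again with purely illustrative data and model:

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import cross_val_predict, cross_val_score

    X, y = make_regression(n_samples=200, noise=10.0, random_state=0)
    reg = Ridge()

    # mean of per-fold MSEs (a mean of means over equally sized folds)
    fold_mse = -cross_val_score(reg, X, y, cv=5,
                                scoring="neg_mean_squared_error").mean()

    # single MSE over the pooled out-of-fold predictions, i.e. expression (*)
    pooled_mse = np.mean((cross_val_predict(reg, X, y, cv=5) - y) ** 2)

    print(fold_mse, pooled_mse)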

On Thu., 4 Apr. 2019, 3:51 am Boris Hollas, <
hollas at informatik.htw-dresden.de> wrote:

> On 03.04.19 at 13:59, Joel Nothman wrote:
>
> The equations in Murphy and Hastie very clearly assume a metric
> decomposable over samples (a loss function). Several popular metrics
> are not.
>
> For a metric like MSE it will be almost identical assuming the test
> sets have almost the same size.
>
> What will be almost identical to what? I suppose you mean that (*) is
> consistent with the scores of the models on the folds (i.e., the result of
> cross_val_score) if the loss function is (x-y)².
>
> For something like Recall
> (sensitivity) it will be almost identical assuming similar test set
> sizes **and** stratification. For something like precision whose
> denominator is determined by the biases of the learnt classifier on
> the test dataset, you can't say the same.
>
> I can't follow here. If the loss function is L(x,y) = 1_{x = y}, then (*)
> gives the accuracy.
>
>  For something like ROC AUC
> score, relying on some decision function that may not be equivalently
> calibrated across splits, evaluating in this way is almost
> meaningless.
>
> In any case, I still don't see what may be wrong with (*). Otherwise, the
> warning in the documentation about the use of cross_val_predict should be
> removed or revised.
>
> On the other hand, an example in the documentation uses
> cross_val_scores.mean(). This is debatable since this computes a mean of
> means.
>
>
>
> On Wed, 3 Apr 2019 at 22:01, Boris Hollas <hollas at informatik.htw-dresden.de> wrote:
>
> I use
>
> sum((cross_val_predict(model, X, y) - y)**2) / len(y)        (*)
>
> to evaluate the performance of a model. This conforms with Murphy: Machine Learning, section 6.5.3, and Hastie et al.: The Elements of Statistical Learning, eq. 7.48. However, according to the documentation of cross_val_predict, "it is not appropriate to pass these predictions into an evaluation metric". While it is obvious that cross_val_predict is different from cross_val_score, I don't see what should be wrong with (*).
>
> Also, the explanation that "cross_val_predict simply returns the labels (or probabilities)" is unclear, if not wrong. As I understand it, this function returns estimates, not labels or probabilities.
>
> Regards, Boris
>