[scikit-learn] precision_recall_curve giving incorrect results on very small example
David R
dabruro at gmail.com
Tue Apr 28 13:41:14 EDT 2020
Here is a very small example using precision_recall_curve():
from sklearn.metrics import precision_recall_curve, precision_score, recall_score
y_true = [0, 1]
y_predict_proba = [0.25, 0.75]
precision, recall, thresholds = precision_recall_curve(y_true, y_predict_proba)
precision, recall
which results in:
(array([1., 1.]), array([1., 0.]))
Now let's calculate manually to see whether that's correct. There are
three possible predicted-class vectors, depending on the threshold: [0, 0],
[0, 1], and [1, 1]. We have to discard [0, 0] because it gives an undefined
precision (division by zero). So, applying precision_score() and
recall_score() to the other two:
y_predict_class=[0,1]
precision_score(y_true, y_predict_class), recall_score(y_true, y_predict_class)
which gives:
(1.0, 1.0)
and
y_predict_class=[1,1]
precision_score(y_true, y_predict_class), recall_score(y_true, y_predict_class)
which gives
(0.5, 1.0)
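As a sanity check on the two hand calculations above, here is a small sketch that computes precision and recall directly from confusion counts, without sklearn (the helper name manual_precision_recall is just for illustration):

```python
# Compute precision = TP / (TP + FP) and recall = TP / (TP + FN)
# from raw counts, for both candidate class vectors above.
def manual_precision_recall(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Note: precision is undefined (ZeroDivisionError here) when nothing
    # is predicted positive, which is why [0, 0] was discarded above.
    return tp / (tp + fp), tp / (tp + fn)

y_true = [0, 1]
print(manual_precision_recall(y_true, [0, 1]))  # (1.0, 1.0)
print(manual_precision_recall(y_true, [1, 1]))  # (0.5, 1.0)
```

Both results agree with the precision_score()/recall_score() values quoted above.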
This does not seem to match the output of precision_recall_curve(), which,
for example, never produced a precision value of 0.5.
Am I missing something?