[scikit-learn] Regarding negative value of sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score

Tomek Drabas drabas.t at gmail.com
Thu Aug 12 15:41:02 EDT 2021


In the simplest case of simple linear regression, what you wrote holds
true: the total variance is simply the sum of the variance explained by
the model and the residual variability that cannot be explained, so the
explained proportion always lies between 0 and 1. See, e.g., here:
https://online.stat.psu.edu/stat500/lesson/9/9.3

However, this decomposition is quite hard to carry over to more complex
models (even to a multivariate linear regression), hence the need for a
more general definition, as here:
https://en.wikipedia.org/wiki/Coefficient_of_determination or here:
https://www.investopedia.com/terms/r/r-squared.asp. I can easily
envision a situation where the data contain outliers (i.e. the data are
not clean enough to be used in modeling) such that the fitted model
performs worse than the base model of simply taking the average as the
prediction for each observation.
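
As a minimal sketch with made-up numbers (not tied to any real dataset):
whenever the model's squared error exceeds that of the "always predict
the mean of y" baseline, r2_score drops below zero, and
explained_variance_score can go negative too.

import numpy as np
from sklearn.metrics import explained_variance_score, r2_score

y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pred = np.array([5.0, 4.0, 3.0, 2.0, 1.0])  # tracks y in the wrong direction

print(r2_score(y_true, y_pred))                  # -3.0, worse than predicting the mean
print(explained_variance_score(y_true, y_pred))  # -3.0 here as well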

Cheers,
-Tom

On Thu, Aug 12, 2021 at 12:19 PM Samir K Mahajan <
samirkmahajan1972 at gmail.com> wrote:

>
> Dear Christophe Pallier, Reshama Shaikh and Tomek Drabas,
> Thank you for your kind response. Fair enough, I will go with you: R2 is
> not a square. However, if you open any book of econometrics, it says R2
> is a ratio that lies between 0 and 1. *This is the constraint.* It
> measures the proportion or percentage of the total variation in the
> response variable (Y) explained by the regressors (Xs) in the model. The
> remaining proportion of the variation in Y, if any, is explained by the
> residual term (u). Now, sklearn.metrics.r2_score gives me a negative
> value on a linear scale (-5.763335245921777). This negative value breaks
> the *constraint*. I just want to highlight that. I think it needs to be
> corrected. The rest is up to you.
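
A quick sketch, with invented numbers, of the general definition the
scikit-learn documentation uses: R2 = 1 - SS_res / SS_tot. Nothing in
that ratio keeps it above zero; as soon as SS_res exceeds SS_tot, the
score is negative.

import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([9.0, 5.0, -4.0, 1.0])  # deliberately poor predictions

ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares around the mean
print(1 - ss_res / ss_tot)        # about -3.74, negative because ss_res > ss_tot
print(r2_score(y_true, y_pred))   # same value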
>
> I find that Reshama Shaikh was hurt by my email. I am really sorry for
> that. Please note that I never meant to undermine your capabilities and
> initiatives. You are great people doing great work. I realise that I
> should have been more sensible.
>
> My regards to all of you.
>
> Samir K Mahajan
>
> On Thu, Aug 12, 2021 at 12:02 PM Christophe Pallier <
> christophe at pallier.org> wrote:
>
>> Simple: despite its name R2 is not a square. Look up its definition.
>>
>> On Wed, 11 Aug 2021, 21:17 Samir K Mahajan, <samirkmahajan1972 at gmail.com>
>> wrote:
>>
>>> Dear All,
>>> I am amazed to find negative values of sklearn.metrics.r2_score and
>>> sklearn.metrics.explained_variance_score in a model (cross-validation
>>> of an OLS regression model).
>>> However, what amuses me more is seeing you justify a negative
>>> 'sklearn.metrics.r2_score' in your documentation. This does not make
>>> sense to me. Please explain to me how squared values can be negative.
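
A minimal sketch of that scenario, with random data assumed in place of
the poster's dataset: the cross-validated R2 of an OLS fit on pure noise
is typically negative, because the model fitted on the training folds
does not generalize to the held-out fold.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 10))   # 30 samples, 10 uninformative features
y = rng.normal(size=30)         # target unrelated to X

scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
print(scores)                   # held-out R2 per fold, typically negative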
>>>
>>> Regards,
>>> Samir K Mahajan.
>>>