[scikit-learn] Regarding negative value of sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score
Fernando Marcos Wittmann
fernando.wittmann at gmail.com
Sat Aug 14 10:04:24 EDT 2021
Hi Samir, the following visualization might be useful for gaining intuition
on the meaning of a negative r2:
https://gist.github.com/WittmannF/02060b45ce3ec9239898a5b91df2564e
A negative r2 reflects a model that predicts the opposite trend of the
data.
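A minimal sketch (with made-up numbers, not taken from the gist) of how an opposite-trend prediction drives r2_score below zero:

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pred = y_true[::-1]  # opposite trend: 5, 4, 3, 2, 1

# R^2 = 1 - SS_res / SS_tot; here SS_res (40) far exceeds SS_tot (10)
print(r2_score(y_true, y_pred))  # -3.0
```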
On Sat, Aug 14, 2021, 03:17 Samir K Mahajan <samirkmahajan1972 at gmail.com>
wrote:
> Dear Christophe,
> I think you are oversimplifying by saying econometrics tools are for
> inference. Forecasting and prediction are integral parts of econometric
> analysis. Econometricians forecast by inferring the right conclusions
> about the model. I wish to convey to you that I teach both
> statistics and econometrics, and am now learning ML. There are
> fundamental differences among statistics, econometrics and machine
> learning.
> Regards,
>
> Samir K Mahajan
>
> On Fri, Aug 13, 2021 at 3:39 PM Christophe Pallier <christophe at pallier.org>
> wrote:
>
>> Indeed, this is basically what I told you (you do not need to copy
>> textbook stuff: I taught probas/stats): these are mostly problems for
>> *inference*.
>>
>> On Fri, 13 Aug 2021, 12:03 Samir K Mahajan, <samirkmahajan1972 at gmail.com>
>> wrote:
>>
>>>
>>> Dear Christophe Pallier,
>>>
>>> When we are doing prediction, we are relying on the values of the
>>> coefficients of the fitted model. We feed test data into the model
>>> for prediction. We may be interested to see whether the OLS
>>> estimators (coefficients) are BLUE or not. In the presence of
>>> autocorrelation (normally seen in time-series data), the residuals are not
>>> independent, and as such the OLS estimators are not BLUE in the sense that
>>> they do not have minimum variance, and thus are no longer efficient
>>> estimators. Statistical tests (t, F and χ²) may not be valid. We may reject
>>> the model for prediction in such a situation and have to rely upon
>>> other, improved models. There may also be issues relating to multicollinearity
>>> (in the case of a multivariable regression model) and heteroscedasticity
>>> (mostly seen in cross-sectional data) in a model. Can we discard these
>>> tools when using a model for prediction?
>>>
>>> Regards,
>>>
>>> Samir K Mahajan
>>>
>>>
>>> On Fri, Aug 13, 2021 at 1:07 PM Christophe Pallier <
>>> christophe at pallier.org> wrote:
>>>
>>>> Actually, multicollinearity and autocorrelation are problems for
>>>> *inference* more than for *prediction*. For example, if there is
>>>> autocorrelation, the residuals are not independent, and the degrees of
>>>> freedom are wrong for the tests in an OLS model (but you can use, e.g., an
>>>> AR1 model).
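For illustration, a numpy-only sketch of that diagnosis (statsmodels' durbin_watson computes the same statistic from the same formula); the trend and AR(1) errors here are simulated, not from any real data set:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = np.arange(n, dtype=float)

# simulate AR(1) errors so the OLS residuals are autocorrelated
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.9 * e[t - 1] + rng.standard_normal()
y = 2.0 + 0.5 * x + e

# ordinary least squares fit of y on x
slope, intercept = np.polyfit(x, y, 1)
resid = y - (slope * x + intercept)

# Durbin-Watson statistic: ~2 means no autocorrelation,
# well below 2 means positive autocorrelation
dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)
print(dw)  # far below 2 for these strongly autocorrelated errors
```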
>>>>
>>>> On Thu, 12 Aug 2021, 22:32 Samir K Mahajan, <
>>>> samirkmahajan1972 at gmail.com> wrote:
>>>>
>>>>> A note please (to Sebastian Raschka, mrschots).
>>>>>
>>>>>
>>>>> The OLS model that I used (where the test score gave me a
>>>>> negative value) was not a good fit. Initial findings showed that the
>>>>> regression coefficients and the model as a whole were significant, yet
>>>>> it finally failed two econometric tests: VIF (used for
>>>>> detecting multicollinearity) and the Durbin-Watson test (used for detecting
>>>>> autocorrelation). The presence of multicollinearity and
>>>>> autocorrelation in the model makes it unsuitable for
>>>>> prediction.
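As a sketch of the first of those two checks, VIF can be computed directly from auxiliary regressions (statsmodels also provides variance_inflation_factor); the data here are made up to be nearly collinear:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def vif(X, j):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from
    regressing column j on the remaining columns."""
    others = np.delete(X, j, axis=1)
    r2 = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(42)
x1 = rng.standard_normal(300)
x2 = 2.0 * x1 + 0.01 * rng.standard_normal(300)  # nearly collinear with x1
x3 = rng.standard_normal(300)                    # independent regressor
X = np.column_stack([x1, x2, x3])

print(vif(X, 0))  # huge: x1 is almost a linear function of x2
print(vif(X, 2))  # close to 1: x3 carries independent information
```

A common rule of thumb flags VIF values above about 10 as a sign of serious multicollinearity.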
>>>>> Regards,
>>>>>
>>>>> Samir K Mahajan.
>>>>>
>>>>> On Fri, Aug 13, 2021 at 1:41 AM Samir K Mahajan <
>>>>> samirkmahajan1972 at gmail.com> wrote:
>>>>>
>>>>>> Thanks to all of you for your kind response. Indeed, it is a
>>>>>> great learning experience. Yes, econometrics books too create models for
>>>>>> prediction, and programming really makes things better in a complex
>>>>>> world. My understanding is that machine learning does depend on
>>>>>> econometrics too.
>>>>>>
>>>>>> My Regards,
>>>>>>
>>>>>> Samir K Mahajan
>>>>>>
>>>>>> On Fri, Aug 13, 2021 at 1:21 AM Sebastian Raschka <
>>>>>> mail at sebastianraschka.com> wrote:
>>>>>>
>>>>>>> The R2 function in scikit-learn works fine. A negative R2 means that
>>>>>>> the regression model fits the data worse than a horizontal line
>>>>>>> representing the sample mean. E.g. you usually get that if you
>>>>>>> overfit the training set a lot and then apply that model to the test
>>>>>>> set. The econometrics book probably didn't cover applying a model to an
>>>>>>> independent data set or test set, hence the [0, 1] suggestion.
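A minimal sketch of that failure mode, using a 1-nearest-neighbour regressor purely as a stand-in for any model that memorizes its training data:

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.neighbors import KNeighborsRegressor

X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([0.0, 1.0, 2.0, 3.0])
X_test = np.array([[10.0], [11.0], [12.0], [13.0]])
y_test = np.array([10.0, 11.0, 12.0, 13.0])

# 1-NN memorizes the training set exactly
model = KNeighborsRegressor(n_neighbors=1).fit(X_train, y_train)

# perfect on the training set, badly negative on unseen data
print(r2_score(y_train, model.predict(X_train)))  # 1.0
print(r2_score(y_test, model.predict(X_test)))    # -57.8
```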
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Sebastian
>>>>>>>
>>>>>>>
>>>>>>> On Aug 12, 2021, 2:20 PM -0500, Samir K Mahajan <
>>>>>>> samirkmahajan1972 at gmail.com>, wrote:
>>>>>>>
>>>>>>>
>>>>>>> Dear Christophe Pallier, Reshama Shaikh and Tromek Drabas,
>>>>>>> Thank you for your kind response. Fair enough, I grant you that R2
>>>>>>> is not a square. However, if you open any econometrics book, it says R2
>>>>>>> is a ratio that lies between 0 and 1. *This is the constraint.*
>>>>>>> It measures the proportion or percentage of the total variation in the
>>>>>>> response variable (Y) explained by the regressors (Xs) in the model.
>>>>>>> The remaining proportion of variation in Y, if any, is explained by the
>>>>>>> residual term (u). Now, sklearn.metrics.r2_score gives me a negative
>>>>>>> value on a linear scale (-5.763335245921777). This negative
>>>>>>> value breaks the *constraint*. I just want to highlight that. I
>>>>>>> think it needs to be corrected. The rest is up to you.
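For reference, r2_score implements the general definition R2 = 1 - SS_res/SS_tot, which has no lower bound once predictions can be arbitrary; a small sketch with made-up numbers:

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([2.0, 4.0, 6.0, 8.0])
y_pred = np.array([8.0, 8.0, 8.0, 8.0])  # a bad constant prediction

ss_res = np.sum((y_true - y_pred) ** 2)         # 56.0
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # 20.0

print(1.0 - ss_res / ss_tot)     # -1.8, from the definition
print(r2_score(y_true, y_pred))  # -1.8, same result from sklearn
```

The [0, 1] range in the textbooks holds only in the special case of an OLS fit with intercept evaluated on its own training data, where SS_res can never exceed SS_tot.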
>>>>>>>
>>>>>>> I find that Reshama Shaikh was hurt by my email. I am really sorry
>>>>>>> for that. Please note that I never meant to undermine your capabilities
>>>>>>> and initiatives. You are great people doing great work. I realise that
>>>>>>> I should have been more sensible.
>>>>>>>
>>>>>>> My regards to all of you.
>>>>>>>
>>>>>>> Samir K Mahajan
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Aug 12, 2021 at 12:02 PM Christophe Pallier <
>>>>>>> christophe at pallier.org> wrote:
>>>>>>>
>>>>>>>> Simple: despite its name R2 is not a square. Look up its definition.
>>>>>>>>
>>>>>>>> On Wed, 11 Aug 2021, 21:17 Samir K Mahajan, <
>>>>>>>> samirkmahajan1972 at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Dear All,
>>>>>>>>> I am amazed to find negative values of sklearn.metrics.r2_score
>>>>>>>>> and sklearn.metrics.explained_variance_score in a model (cross-validation
>>>>>>>>> of an OLS regression model).
>>>>>>>>> However, what amuses me more is seeing you justify a negative
>>>>>>>>> sklearn.metrics.r2_score in your documentation. This does not
>>>>>>>>> make sense to me. Please explain to me how squared values can be negative.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Samir K Mahajan.
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> scikit-learn mailing list
>>>>>>>>> scikit-learn at python.org
>>>>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>>>>>>>