From uros.pocek at gmail.com Mon Aug 2 05:03:55 2021
From: uros.pocek at gmail.com (Uroš Poček)
Date: Mon, 2 Aug 2021 11:03:55 +0200
Subject: [scikit-learn] scikit-learn for Apple Silicon M1 Macs

Hello, I am a student and ML programmer and I have been using the scikit-learn
library for Python for a few years now on my PC, but recently I switched to an
M1 iMac, and when I tried to transfer my projects and pip install the libraries
they use I ran into a bunch of issues. Long story short, I was able to
successfully install all ML libraries on my new Mac (tensorflow, numpy,
matplotlib, pandas, torch, ...) except scikit-learn (sklearn)! When can we
expect to see a version of this library that can be installed using pip on M1
Macs and used without any issues?

Thank you all in advance.
Uros Pocek


From rth.yurchak at gmail.com Mon Aug 2 05:15:48 2021
From: rth.yurchak at gmail.com (Roman Yurchak)
Date: Mon, 2 Aug 2021 11:15:48 +0200
Subject: [scikit-learn] [TC Vote] Technical Committee vote: line length
References: <20210726212619.54iy56wbl4sdbe3z@phare.normalesup.org>
Message-ID: <482f3b2c-fcff-719b-aa44-6f3c2d4afc0b@gmail.com>

I also don't have a strong opinion on this, and generally I'm just happy that
the black migration happened. Still with a slight preference for 88 characters
as the default.

On 28/07/2021 18:34, Olivier Grisel wrote:
> Many very active core devs not represented in the TC voted for 88 and
> my previous vote for 79 was not that strong. So I feel that I should
> now vote for 88:
>
> Keep current 88 characters:
>
> Olivier
>
> Revert to 79 characters:
>


From g.lemaitre58 at gmail.com Mon Aug 2 06:07:18 2021
From: g.lemaitre58 at gmail.com (Guillaume Lemaître)
Date: Mon, 2 Aug 2021 12:07:18 +0200
Subject: [scikit-learn] scikit-learn for Apple Silicon M1 Macs

There is currently no wheel available on PyPI because NumPy and SciPy do not
provide wheels either: https://github.com/scikit-learn/scikit-learn/issues/19137

However, one can use `miniforge` or `mambaforge` to install binaries without
the need to build from source:
https://scikit-learn.org/stable/install.html#installing-on-apple-silicon-m1-hardware

NB: I am currently developing scikit-learn with an M1 using `mambaforge` and
the process is pretty smooth.

--
Guillaume Lemaitre
Scikit-learn @ Inria Foundation
https://glemaitre.github.io/
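A quick way to confirm what actually got installed is to query the environment
from Python itself. This is an illustrative sketch rather than anything from
the thread; it only assumes the standard-library `platform` module and
scikit-learn's own `show_versions()` helper.

    # Sketch: check that scikit-learn imports natively on Apple Silicon.
    import platform
    import sklearn

    print(platform.machine())   # 'arm64' for a native Apple Silicon build
    print(sklearn.__version__)
    sklearn.show_versions()     # system, Python and dependency build details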
From reshama.stat at gmail.com Thu Aug 5 10:38:18 2021
From: reshama.stat at gmail.com (Reshama Shaikh)
Date: Thu, 5 Aug 2021 10:38:18 -0400
Subject: [scikit-learn] Open Source: sustainability and etiquette

Hello,
I found the video; it's from 2017. It's by Heather Miller, a professor at CMU.
The 40-minute talk is entitled: The Dramatic Consequences of the Open Source
Revolution [a]

Brigitta, Heather references Nadia Eghbal's book in her talk, which I also
added to my list. [b]
Adrin, I added CHAOSS to the list as well. They have a mailing list which I
have subscribed to.

[a] https://youtu.be/K4mVuxcimWk
[b] https://www.dataumbrella.org/open-source/open-source-sustainability

Reshama Shaikh
she/her
Blog | Twitter | LinkedIn | GitHub
Data Umbrella
NYC PyLadies

On Mon, Apr 19, 2021 at 6:51 PM Brigitta Sipocz wrote:

> Hi,
>
> I've also very much liked Nadia Eghbal's book: Working in Public: The Making
> and Maintenance of Open Source Software. I haven't yet attended a conference
> where she was a speaker, but I'm certain there are some relevant recordings
> on youtube.
>
> Cheers,
> Brigitta
>
> On Mon, 19 Apr 2021 at 06:27, Adrin wrote:
>
>> This is a really good initiative Reshama, thanks for sharing.
>>
>> Have you seen CHAOSScon talks and activities? They're really good, and
>> touch on a lot of really good stuff when it comes to open source
>> communities and sustainability.
>> E.g.: https://chaoss.community/chaosscon-2020-eu/
>>
>> Cheers,
>> Adrin
>>
>> On Fri, Apr 16, 2021 at 4:26 PM Reshama Shaikh wrote:
>>
>>> Hello,
>>> I've seen some excellent resources that have explained open source, its
>>> sustainability, challenges and *indirectly, the etiquette*.
>>>
>>> I am starting to compile the list here [a].
>>>
>>> This keynote by Stuart Geiger is a must-watch: The Invisible Work of
>>> Maintaining & Sustaining Open Source Software [b]
>>>
>>> There is one more video by Emily someone who was at Microsoft, but is
>>> now a professor somewhere, and I am trying to track that video down. I
>>> think it's from 2017. I'll add it to the list once I find it. If anyone
>>> knows the full name of the speaker, please share.
>>>
>>> [a] https://www.dataumbrella.org/open-source/open-source-sustainability
>>> [b] https://www.youtube.com/watch?v=PM3iltcaIL8
>>>
>>> Best,
>>> Reshama
>>> ---
>>> Reshama Shaikh
>>> she/her
>>> Blog | Twitter | LinkedIn | GitHub
>>> Data Umbrella
>>> NYC PyLadies
From samirkmahajan1972 at gmail.com Wed Aug 11 15:16:34 2021
From: samirkmahajan1972 at gmail.com (Samir K Mahajan)
Date: Thu, 12 Aug 2021 00:46:34 +0530
Subject: [scikit-learn] Regarding negative value of sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score

Dear All,
I am amazed to find negative values of sklearn.metrics.r2_score and
sklearn.metrics.explained_variance_score in a model (cross-validation of an
OLS regression model). However, what amuses me more is seeing you justifying a
negative 'sklearn.metrics.r2_score' in your documentation. This does not make
sense to me. Please justify to me how squared values are negative.

Regards,
Samir K Mahajan.


From drabas.t at gmail.com Wed Aug 11 15:29:09 2021
From: drabas.t at gmail.com (Tomek Drabas)
Date: Wed, 11 Aug 2021 19:29:09 +0000
Subject: [scikit-learn] Regarding negative value of sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score

Hi Samir,

In the documentation there's a link to how the coefficient of determination is
defined: https://en.m.wikipedia.org/wiki/Coefficient_of_determination

From this it is easy to see when the values can become negative: when the model
performs significantly worse than the baseline (predicting the average for each
observation). A common misconception is that the 'squaredness' is of some
single value, but here (per the CoD's definition) it is the ratio of the
squared distances of the baseline model and the estimated one.

Hope this helps,
-Tom

Sent on the go


From reshama.stat at gmail.com Wed Aug 11 15:35:06 2021
From: reshama.stat at gmail.com (Reshama Shaikh)
Date: Wed, 11 Aug 2021 15:35:06 -0400
Subject: [scikit-learn] Regarding negative value of sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score
Message-ID: <0A284AE8-1F6C-4E62-92B9-69CBD43B9C78@gmail.com>

Hello Samir,
The tone of your email is disrespectful: for any project, but particularly so
for an open source project. It is not for this community.
Please review the Code of Conduct for this library:
http://scikit-learn.org/stable/developers/contributing.html

Regards,
Reshama
From christophe at pallier.org Thu Aug 12 02:31:01 2021
From: christophe at pallier.org (Christophe Pallier)
Date: Thu, 12 Aug 2021 08:31:01 +0200
Subject: [scikit-learn] Regarding negative value of sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score

Simple: despite its name, R2 is not a square. Look up its definition.


From samirkmahajan1972 at gmail.com Thu Aug 12 15:18:45 2021
From: samirkmahajan1972 at gmail.com (Samir K Mahajan)
Date: Fri, 13 Aug 2021 00:48:45 +0530
Subject: [scikit-learn] Regarding negative value of sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score

Dear Christophe Pallier, Reshama Shaikh and Tomek Drabas,

Thank you for your kind response. Fair enough, I go with you: R2 is not a
square. However, if you open any book of econometrics, it says R2 is a ratio
that lies between 0 and 1. *This is the constraint.* It measures the proportion
or percentage of the total variation in the response variable (Y) explained by
the regressors (Xs) in the model. The remaining proportion of variation in Y,
if any, is explained by the residual term (u). Now, sklearn.metrics.r2_score
gives me a negative value lying on a linear scale (-5.763335245921777). This
negative value breaks the *constraint*. I just want to highlight that. I think
it needs to be corrected. Rest is up to you.

I find that Reshama Shaikh is hurt by my email. I am really sorry for that.
Please note I never undermine your capabilities and initiatives. You are great
people doing great jobs. I realise that I should have been more sensible.

My regards to all of you.

Samir K Mahajan
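For reference, the definition being debated can be written out explicitly (an
illustrative addition based on the standard formula, not text from any message
in the thread):

    R^2 = 1 - SS_res / SS_tot
        = 1 - \sum_i (y_i - \hat{y}_i)^2 / \sum_i (y_i - \bar{y})^2

Both sums of squares are non-negative, so R^2 can never exceed 1, but there is
no lower bound: whenever the model's squared errors (SS_res) exceed the squared
deviations of y around its mean (SS_tot), the ratio is greater than 1 and R^2
is negative. The familiar 0-to-1 range holds only for an OLS fit with an
intercept evaluated on its own training data.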
From maykonschots at gmail.com Thu Aug 12 15:30:34 2021
From: maykonschots at gmail.com (mrschots)
Date: Thu, 12 Aug 2021 16:30:34 -0300
Subject: [scikit-learn] Regarding negative value of sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score

There is no constraint, that's the point: nothing prevents a model from making
predictions that are worse than just predicting the target's mean for every
data point. If you do so -> negative R2.

Best Regards,

--
Schots
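A tiny numerical illustration of that point (an illustrative addition, not part
of the original message; it only uses sklearn.metrics.r2_score):

    # Predictions that do worse than always predicting the mean of y_true
    # (here 2.0) push SS_res above SS_tot, so r2_score goes negative.
    from sklearn.metrics import r2_score

    y_true = [1.0, 2.0, 3.0]
    y_pred = [3.0, 3.0, 3.0]          # SS_res = 5, SS_tot = 2
    print(r2_score(y_true, y_pred))   # 1 - 5/2 = -1.5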
From drabas.t at gmail.com Thu Aug 12 15:41:02 2021
From: drabas.t at gmail.com (Tomek Drabas)
Date: Thu, 12 Aug 2021 12:41:02 -0700
Subject: [scikit-learn] Regarding negative value of sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score

In the simplest case of a simple linear regression, what you wrote holds true:
the total variation is simply the sum of the variance explained by the model
and the residual variability that cannot be explained, so the ratio always lies
between 0 and 1, e.g. here: https://online.stat.psu.edu/stat500/lesson/9/9.3

However, this would be quite hard to do for more complex models (even for a
multivariate linear regression), thus the need for a more general definition
like here: https://en.wikipedia.org/wiki/Coefficient_of_determination or here:
https://www.investopedia.com/terms/r/r-squared.asp. I can easily envision a
situation where the data has outliers (i.e. the data is not clean enough to be
used in modeling) such that it'd render a model that performs worse than a base
model of simply taking the average as the prediction for each observation.

Cheers,
-Tom
From mail at sebastianraschka.com Thu Aug 12 15:28:03 2021
From: mail at sebastianraschka.com (Sebastian Raschka)
Date: Thu, 12 Aug 2021 14:28:03 -0500
Subject: [scikit-learn] Regarding negative value of sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score
Message-ID: <7d546c00-43ef-430c-b8e0-b046eb4748d6@Spark>

The R2 function in scikit-learn works fine. A negative value means that the
regression model fits the data worse than a horizontal line representing the
sample mean. E.g. you usually get that if you are overfitting the training set
a lot and then apply that model to the test set. The econometrics book probably
didn't cover applying a model to an independent data set or test set, hence the
[0, 1] suggestion.

Cheers,
Sebastian
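A minimal sketch of the situation Sebastian describes (an illustrative
addition, not code from the thread; it assumes nothing beyond numpy and
scikit-learn, and the exact test score will vary with the random seed):

    # A model that memorizes a small, noisy training set scores perfectly on
    # the training data but is typically worse than predicting the test mean
    # on held-out data, which is exactly when r2_score turns negative.
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.metrics import r2_score

    rng = np.random.RandomState(0)
    X = rng.uniform(-1, 1, size=(40, 1))
    y = rng.normal(size=40)                  # pure noise: nothing to learn
    X_train, X_test, y_train, y_test = X[:20], X[20:], y[:20], y[20:]

    model = DecisionTreeRegressor().fit(X_train, y_train)
    print(r2_score(y_train, model.predict(X_train)))  # 1.0 (memorized)
    print(r2_score(y_test, model.predict(X_test)))    # typically well below 0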
From samirkmahajan1972 at gmail.com Thu Aug 12 16:11:17 2021
From: samirkmahajan1972 at gmail.com (Samir K Mahajan)
Date: Fri, 13 Aug 2021 01:41:17 +0530
Subject: [scikit-learn] Regarding negative value of sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score

Thanks to all of you for your kind response. Indeed, it is a great learning
experience. Yes, econometrics books too create models for prediction, and
programming really makes things better in a complex world. My understanding is
that machine learning does depend on econometrics too.

My Regards,

Samir K Mahajan
From samirkmahajan1972 at gmail.com Thu Aug 12 16:32:03 2021
From: samirkmahajan1972 at gmail.com (Samir K Mahajan)
Date: Fri, 13 Aug 2021 02:02:03 +0530
Subject: [scikit-learn] Regarding negative value of sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score

A note please (to Sebastian Raschka, mrschots).

The OLS model that I used (where the test score gave me a negative value) was
not a good fit. Initial findings showed that the regression coefficients and
the model as a whole were significant, yet, finally, it failed two econometric
tests: VIF (used for detecting multicollinearity) and the Durbin-Watson test
(used for detecting auto-correlation). *Presence of multicollinearity and
autocorrelation problems* in the model makes it unsuitable for prediction.

Regards,

Samir K Mahajan.
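For readers who want to reproduce those two diagnostics, they live in
statsmodels rather than scikit-learn. The following is an illustrative sketch
(not Samir's actual model or data), assuming the documented statsmodels API:

    # Fit an OLS model on deliberately collinear regressors, then compute the
    # variance inflation factors and the Durbin-Watson statistic.
    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor
    from statsmodels.stats.stattools import durbin_watson

    rng = np.random.RandomState(0)
    x1 = rng.normal(size=100)
    x2 = x1 + 0.01 * rng.normal(size=100)     # nearly collinear with x1
    X = sm.add_constant(np.column_stack([x1, x2]))
    y = x1 + rng.normal(size=100)

    results = sm.OLS(y, X).fit()
    print([variance_inflation_factor(X, i) for i in range(1, X.shape[1])])
    # very large VIFs flag the multicollinearity
    print(durbin_watson(results.resid))       # values near 2: little autocorrelation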
From christophe at pallier.org Fri Aug 13 03:36:06 2021
From: christophe at pallier.org (Christophe Pallier)
Date: Fri, 13 Aug 2021 09:36:06 +0200
Subject: [scikit-learn] Regarding negative value of sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score

Actually, multicollinearity and autocorrelation are problems for *inference*
more than for *prediction*. For example, if there is autocorrelation, the
residuals are not independent, and the degrees of freedom are wrong for the
tests in an OLS model (but you can use, e.g., an AR1 model).
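As a concrete, illustrative version of that suggestion (an addition, not code
from the thread), statsmodels offers a feasible-GLS regression with AR(1)
errors; the sketch below assumes its GLSAR class behaves as documented and is
not the only way to handle autocorrelated residuals:

    # Simulate a regression with AR(1) noise, then compare plain OLS standard
    # errors with those from an AR(1)-aware GLSAR fit.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.RandomState(0)
    x = rng.normal(size=200)
    e = np.zeros(200)
    for t in range(1, 200):                  # AR(1) disturbances
        e[t] = 0.8 * e[t - 1] + rng.normal()
    y = 2.0 * x + e

    X = sm.add_constant(x)
    ols = sm.OLS(y, X).fit()                 # naive standard errors
    ar1 = sm.GLSAR(y, X, rho=1).iterative_fit(maxiter=10)
    print(ols.bse)
    print(ar1.bse)                           # inference that accounts for AR(1) errors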
From samirkmahajan1972 at gmail.com Fri Aug 13 06:02:55 2021
From: samirkmahajan1972 at gmail.com (Samir K Mahajan)
Date: Fri, 13 Aug 2021 15:32:55 +0530
Subject: [scikit-learn] Regarding negative value of sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score

Dear Christophe Pallier,

When we are doing prediction, we are relying on the values of the coefficients
of the model created. We are feeding test data to the model for prediction. We
may be interested to see whether the OLS estimators (coefficients) are BLUE or
not. In the presence of autocorrelation (normally noticed in time series data),
residuals are not independent, and as such the OLS estimators are not BLUE in
the sense that they don't have minimum variance, and thus are no longer
efficient estimators. Statistical tests (t, F and χ²) may not be valid. We may
reject the model for making predictions in such a situation. We have to rely
upon other improved models. There may be issues relating to multicollinearity
(in the case of a multivariable regression model) and heteroscedasticity
(mostly seen in cross-section data) too in a model. Can we discard these tools
while using a model for prediction?

Regards,

Samir K Mahajan
From christophe at pallier.org Fri Aug 13 06:08:29 2021
From: christophe at pallier.org (Christophe Pallier)
Date: Fri, 13 Aug 2021 12:08:29 +0200
Subject: [scikit-learn] Regarding negative value of sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score

Indeed, this is basically what I told you (you do not need to copy textbook
stuff: I taught probas/stats): these are mostly problems for *inference*.
From danshiebler at gmail.com Fri Aug 13 16:24:38 2021
From: danshiebler at gmail.com (Dan Shiebler)
Date: Fri, 13 Aug 2021 16:24:38 -0400
Subject: [scikit-learn] Regarding negative value of sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score

Hey Samir, this blog post has some more details on the difference between the
square of the correlation coefficient and the coefficient of determination:
danshiebler.com/2017-06-25-metrics/

--
danshiebler.com
(973) - 518 - 0886
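A short sketch of the distinction the post makes (an illustrative addition, not
taken from the linked article): the squared Pearson correlation is never
negative, while the coefficient of determination returned by r2_score also
penalizes bias and scale errors and therefore can be.

    import numpy as np
    from sklearn.metrics import r2_score

    y_true = np.array([1.0, 2.0, 3.0, 4.0])
    y_pred = np.array([11.0, 12.0, 13.0, 14.0])   # perfectly correlated, badly offset

    print(np.corrcoef(y_true, y_pred)[0, 1] ** 2)  # 1.0
    print(r2_score(y_true, y_pred))                # -79.0: punished for the offset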
A negative means that the >>>>>> regression model fits the data worse than a horizontal line representing >>>>>> the sample mean. E.g. you usually get that if you are overfitting the >>>>>> training set a lot and then apply that model to the test set. The >>>>>> econometrics book probably didn't cover applying a model to an independent >>>>>> data or test set, hence the [0, 1] suggestion. >>>>>> >>>>>> Cheers, >>>>>> Sebastian >>>>>> >>>>>> >>>>>> On Aug 12, 2021, 2:20 PM -0500, Samir K Mahajan < >>>>>> samirkmahajan1972 at gmail.com>, wrote: >>>>>> >>>>>> >>>>>> Dear Christophe Pallier, Reshama Saikh and Tromek Drabas, >>>>>> Thank you for your kind response. Fair enough. I go with you R2 is >>>>>> not a square. However, if you open any book of econometrics, it says R2 >>>>>> is a ratio that lies between 0 and 1. *This is the constraint.* >>>>>> It measures the proportion or percentage of the total variation in >>>>>> response variable (Y) explained by the regressors (Xs) in the model . >>>>>> Remaining proportion of variation in Y, if any, is explained by the >>>>>> residual term(u) Now, sklearn.matrics. metrics.r2_score gives me a negative >>>>>> value lying on a linear scale (-5.763335245921777). This negative >>>>>> value breaks the *constraint.* I just want to highlight that. I >>>>>> think it needs to be corrected. Rest is up to you . >>>>>> >>>>>> I find that Reshama Saikh is hurt by my email. I am really sorry >>>>>> for that. Please note I never undermine your capabilities and initiatives. >>>>>> You are great people doing great jobs. I realise that I should have been >>>>>> more sensible. >>>>>> >>>>>> My regards to all of you. >>>>>> >>>>>> Samir K Mahajan >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Thu, Aug 12, 2021 at 12:02 PM Christophe Pallier < >>>>>> christophe at pallier.org> wrote: >>>>>> >>>>>>> Simple: despite its name R2 is not a square. Look up its definition. >>>>>>> >>>>>>> On Wed, 11 Aug 2021, 21:17 Samir K Mahajan, < >>>>>>> samirkmahajan1972 at gmail.com> wrote: >>>>>>> >>>>>>>> Dear All, >>>>>>>> I am amazed to find negative values of sklearn.metrics.r2_score >>>>>>>> and sklearn.metrics.explained_variance_score in a model ( cross validation >>>>>>>> of OLS regression model) >>>>>>>> However, what amuses me more is seeing you justifying negative >>>>>>>> 'sklearn.metrics.r2_score ' in your documentation. This does not >>>>>>>> make sense to me . Please justify to me how squared values are negative. >>>>>>>> >>>>>>>> Regards, >>>>>>>> Samir K Mahajan. 
>>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> scikit-learn mailing list >>>>>>>> scikit-learn at python.org >>>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>>>> >>>>>>> _______________________________________________ >>>>>>> scikit-learn mailing list >>>>>>> scikit-learn at python.org >>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>>> >>>>>> _______________________________________________ >>>>>> scikit-learn mailing list >>>>>> scikit-learn at python.org >>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>> >>>>>> _______________________________________________ >>>>>> scikit-learn mailing list >>>>>> scikit-learn at python.org >>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>> >>>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -- danshiebler.com (973) - 518 - 0886 -------------- next part -------------- An HTML attachment was scrubbed... URL: From samirkmahajan1972 at gmail.com Sat Aug 14 02:17:01 2021 From: samirkmahajan1972 at gmail.com (Samir K Mahajan) Date: Sat, 14 Aug 2021 11:47:01 +0530 Subject: [scikit-learn] Regarding negative value of sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score In-Reply-To: References: <7d546c00-43ef-430c-b8e0-b046eb4748d6@Spark> Message-ID: Dear Chrisophe, I think you are oversimplifying by saying econometrics tools are for inference. Forecasting and prediction are integral parts of econometric analysis. Econometricians forecast by inferring the right conclusion about the model . I wish to convey to you that I teach both statistics and econometrics, and am now learning ML. There is a fundamental difference among statistics, econometrics and machine learning. Regards, Samir K Mahajan On Fri, Aug 13, 2021 at 3:39 PM Christophe Pallier wrote: > Indeed , this is basically what I told you (you do not be need to copy > textbook stuff: I taught probas/stats) : these are mostly problems for > *inference*. > > On Fri, 13 Aug 2021, 12:03 Samir K Mahajan, > wrote: > >> >> Dear Christophe Pallier*,* >> >> When we are doing prediction, we are relying on the values of the >> coefficients of the model created. We are feeding test data on the model >> for prediction. We may be nterested to see if the OLS >> estimators(coefficients) are BLUE or not. In the presence of >> autocorrelation (normally noticed in time series data), residuals are not >> independent, and as such the OLS estimators are not BLUE in the sense that >> they don't have minimum variance, and thus no more efficient estimators. >> Statistical tests (t, F and *?*2) may not be valid. We may reject the >> model to make predictions in such a situation. . We have to rely upon >> other improved models. 
There may be issues relating to multicollinearity >> (in case of multivariable regression model) and heteroscedasticity (mostly >> seen in cross-section data) too in a model. Can we discard these tools >> while predicting a model? >> >> Regards, >> >> Samir K Mahajan >> >> >> On Fri, Aug 13, 2021 at 1:07 PM Christophe Pallier < >> christophe at pallier.org> wrote: >> >>> Actually, multicollinearity and autocorrelation are problems for >>> *inference* more than for *prediction*. For example, if there is >>> autocorrelation, the residuals are not independent, and the degrees of >>> freedom are wrong for the tests in an OLS model (but you can use, e.g., an >>> AR1 model). >>> >>> On Thu, 12 Aug 2021, 22:32 Samir K Mahajan, >>> wrote: >>> >>>> A note please (to Sebastian Raschka, mrschots). >>>> >>>> >>>> The OLS model that I used ( where the test score gave me a negative >>>> value) was not a good fit. Initial findings showed that t*he >>>> regression coefficients and the model as a whole were significant, *yet >>>> , finally , it failed in two econometrics tests such as VIF (used for >>>> detecting multicollinearity ) and Durbin-Watson test ( used for detecting >>>> auto-correlation). *Presence of multicollinearity and autocorrelation >>>> problems * in the model make it unsuitable for prediction. >>>> Regards, >>>> >>>> Samir K Mahajan. >>>> >>>> On Fri, Aug 13, 2021 at 1:41 AM Samir K Mahajan < >>>> samirkmahajan1972 at gmail.com> wrote: >>>> >>>>> Thanks to all of you for your kind response. Indeed, it is a >>>>> great learning experience. Yes, econometrics books too create models for >>>>> prediction, and programming really makes things better in a complex >>>>> world. My understanding is that machine learning does depend on >>>>> econometrics too. >>>>> >>>>> My Regards, >>>>> >>>>> Samir K Mahajan >>>>> >>>>> On Fri, Aug 13, 2021 at 1:21 AM Sebastian Raschka < >>>>> mail at sebastianraschka.com> wrote: >>>>> >>>>>> The R2 function in scikit-learn works fine. A negative means that the >>>>>> regression model fits the data worse than a horizontal line representing >>>>>> the sample mean. E.g. you usually get that if you are overfitting the >>>>>> training set a lot and then apply that model to the test set. The >>>>>> econometrics book probably didn't cover applying a model to an independent >>>>>> data or test set, hence the [0, 1] suggestion. >>>>>> >>>>>> Cheers, >>>>>> Sebastian >>>>>> >>>>>> >>>>>> On Aug 12, 2021, 2:20 PM -0500, Samir K Mahajan < >>>>>> samirkmahajan1972 at gmail.com>, wrote: >>>>>> >>>>>> >>>>>> Dear Christophe Pallier, Reshama Saikh and Tromek Drabas, >>>>>> Thank you for your kind response. Fair enough. I go with you R2 is >>>>>> not a square. However, if you open any book of econometrics, it says R2 >>>>>> is a ratio that lies between 0 and 1. *This is the constraint.* >>>>>> It measures the proportion or percentage of the total variation in >>>>>> response variable (Y) explained by the regressors (Xs) in the model . >>>>>> Remaining proportion of variation in Y, if any, is explained by the >>>>>> residual term(u) Now, sklearn.matrics. metrics.r2_score gives me a negative >>>>>> value lying on a linear scale (-5.763335245921777). This negative >>>>>> value breaks the *constraint.* I just want to highlight that. I >>>>>> think it needs to be corrected. Rest is up to you . >>>>>> >>>>>> I find that Reshama Saikh is hurt by my email. I am really sorry >>>>>> for that. Please note I never undermine your capabilities and initiatives. 
>>>>>> You are great people doing great jobs. I realise that I should have been >>>>>> more sensible. >>>>>> >>>>>> My regards to all of you. >>>>>> >>>>>> Samir K Mahajan >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Thu, Aug 12, 2021 at 12:02 PM Christophe Pallier < >>>>>> christophe at pallier.org> wrote: >>>>>> >>>>>>> Simple: despite its name R2 is not a square. Look up its definition. >>>>>>> >>>>>>> On Wed, 11 Aug 2021, 21:17 Samir K Mahajan, < >>>>>>> samirkmahajan1972 at gmail.com> wrote: >>>>>>> >>>>>>>> Dear All, >>>>>>>> I am amazed to find negative values of sklearn.metrics.r2_score >>>>>>>> and sklearn.metrics.explained_variance_score in a model ( cross validation >>>>>>>> of OLS regression model) >>>>>>>> However, what amuses me more is seeing you justifying negative >>>>>>>> 'sklearn.metrics.r2_score ' in your documentation. This does not >>>>>>>> make sense to me . Please justify to me how squared values are negative. >>>>>>>> >>>>>>>> Regards, >>>>>>>> Samir K Mahajan. >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> scikit-learn mailing list >>>>>>>> scikit-learn at python.org >>>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>>>> >>>>>>> _______________________________________________ >>>>>>> scikit-learn mailing list >>>>>>> scikit-learn at python.org >>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>>> >>>>>> _______________________________________________ >>>>>> scikit-learn mailing list >>>>>> scikit-learn at python.org >>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>> >>>>>> _______________________________________________ >>>>>> scikit-learn mailing list >>>>>> scikit-learn at python.org >>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>> >>>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.caorsi at l2f.ch Sat Aug 14 09:12:18 2021 From: m.caorsi at l2f.ch (Matteo Caorsi) Date: Sat, 14 Aug 2021 13:12:18 +0000 Subject: [scikit-learn] random forests and multil-class probability In-Reply-To: References: <031152d2-ca59-69ee-b04c-125fda724105@gmail.com> <7D53A0FD-EB5E-4C27-966B-D6954EEF7398@gmail.com> Message-ID: Greetings! I am currently out of office, with limited access to emails, till August the 30th. Please contact support at giotto.ai for technical issue concerning Giotto Platform. Otherwise, I will reply to your email as soon as possible upon my return. With best regards, Matteo On 27 Jul 2021, at 12:42, Brown J.B. via scikit-learn wrote: 2021?7?27?(?) 12:03 Guillaume Lema?tre : As far that I remember, `precision_recall_curve` and `roc_curve` do not support multi class. They are design to work only with binary classification. 
Correct, the TPR-FPR curve (ROC) was originally intended for tuning a free parameter, in signal detection, and is a binary-type metric. For ML problems, it lets you tune/determine an estimator's output value threshold (e.g., a probability or a raw discriminant value such as in SVM) for arriving an optimized model that will be used to give a final, binary-discretized answer in new prediction tasks. Hope this helps, J.B. _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.caorsi at l2f.ch Sat Aug 14 09:12:23 2021 From: m.caorsi at l2f.ch (Matteo Caorsi) Date: Sat, 14 Aug 2021 13:12:23 +0000 Subject: [scikit-learn] random forests and multil-class probability In-Reply-To: References: Message-ID: Greetings! I am currently out of office, with limited access to emails, till August the 30th. Please contact support at giotto.ai for technical issue concerning Giotto Platform. Otherwise, I will reply to your email as soon as possible upon my return. With best regards, Matteo On 27 Jul 2021, at 11:31, Sole Galli via scikit-learn wrote: Thank you! I was confused because in the multiclass documentation it says that for those estimators that have multiclass support built in, like Decision trees and Random Forests, then we do not need to use the wrapper classes like the OnevsRest. Thus I have the following question, if I want to determine the PR curves or the ROC curve, say with micro-average, do I need to wrap them with the 1 vs rest? Or it does not matter? The probability values do change slightly. Thank you! ??????? Original Message ??????? On Tuesday, July 27th, 2021 at 11:22 AM, Guillaume Lema?tre wrote: On 27 Jul 2021, at 11:08, Sole Galli via scikit-learn scikit-learn at python.org wrote: Hello community, Do I understand correctly that Random Forests are trained as a 1 vs rest when the target has more than 2 classes? Say the target takes values 0, 1 and 2, then the model would train 3 estimators 1 per class under the hood?. Each decision tree of the forest is natively supporting multi class. The predict_proba output is an array with 3 columns, containing the probability of each class. If it is 1 vs rest. am I correct to assume that the sum of the probabilities for the 3 classes should not necessarily add up to 1? are they normalized? how is it done so that they do add up to 1? According to the above answer, the sum for each row of the array given by `predict_proba` will sum to 1. According to the documentation, the probabilities are computed as: The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the trees in the forest. The class probability of a single tree is the fraction of samples of the same class in a leaf. Thank you Sole scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... 
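To make the quoted answer about predict_proba concrete, a minimal sketch on the iris data (three classes, plain scikit-learn, no one-vs-rest wrapper): the forest handles the classes natively, returns one probability column per class, and each row sums to 1.

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_iris(return_X_y=True)   # 3 classes, handled natively by the trees
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

    proba = clf.predict_proba(X)
    print(proba.shape)                           # (150, 3): one column per class
    print(np.allclose(proba.sum(axis=1), 1.0))   # True: each row is a normalized distribution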
URL: From m.caorsi at l2f.ch Sat Aug 14 09:13:25 2021 From: m.caorsi at l2f.ch (Matteo Caorsi) Date: Sat, 14 Aug 2021 13:13:25 +0000 Subject: [scikit-learn] random forests and multil-class probability In-Reply-To: References: <031152d2-ca59-69ee-b04c-125fda724105@gmail.com> <7D53A0FD-EB5E-4C27-966B-D6954EEF7398@gmail.com> Message-ID: Greetings! I am currently out of office, with limited access to emails, till August the 30th. Please contact support at giotto.ai for technical issues concerning Giotto Platform. Otherwise, I will reply to your email as soon as possible upon my return. With best regards, Matteo On 27 Jul 2021, at 12:42, Brown J.B. via scikit-learn wrote: 2021?7?27?(?) 12:03 Guillaume Lema?tre : As far that I remember, `precision_recall_curve` and `roc_curve` do not support multi class. They are design to work only with binary classification. Correct, the TPR-FPR curve (ROC) was originally intended for tuning a free parameter, in signal detection, and is a binary-type metric. For ML problems, it lets you tune/determine an estimator's output value threshold (e.g., a probability or a raw discriminant value such as in SVM) for arriving an optimized model that will be used to give a final, binary-discretized answer in new prediction tasks. Hope this helps, J.B. _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.caorsi at l2f.ch Sat Aug 14 09:13:28 2021 From: m.caorsi at l2f.ch (Matteo Caorsi) Date: Sat, 14 Aug 2021 13:13:28 +0000 Subject: [scikit-learn] random forests and multil-class probability In-Reply-To: References: Message-ID: <10E3C9FF-9280-49BE-A617-41B9D0CFE417@l2f.ch> Greetings! I am currently out of office, with limited access to emails, till August the 30th. Please contact support at giotto.ai for technical issues concerning Giotto Platform. Otherwise, I will reply to your email as soon as possible upon my return. With best regards, Matteo On 27 Jul 2021, at 11:31, Sole Galli via scikit-learn wrote: Thank you! I was confused because in the multiclass documentation it says that for those estimators that have multiclass support built in, like Decision trees and Random Forests, then we do not need to use the wrapper classes like the OnevsRest. Thus I have the following question, if I want to determine the PR curves or the ROC curve, say with micro-average, do I need to wrap them with the 1 vs rest? Or it does not matter? The probability values do change slightly. Thank you! ??????? Original Message ??????? On Tuesday, July 27th, 2021 at 11:22 AM, Guillaume Lema?tre wrote: On 27 Jul 2021, at 11:08, Sole Galli via scikit-learn scikit-learn at python.org wrote: Hello community, Do I understand correctly that Random Forests are trained as a 1 vs rest when the target has more than 2 classes? Say the target takes values 0, 1 and 2, then the model would train 3 estimators 1 per class under the hood?. Each decision tree of the forest is natively supporting multi class. The predict_proba output is an array with 3 columns, containing the probability of each class. If it is 1 vs rest. am I correct to assume that the sum of the probabilities for the 3 classes should not necessarily add up to 1? are they normalized? how is it done so that they do add up to 1? According to the above answer, the sum for each row of the array given by `predict_proba` will sum to 1. 
According to the documentation, the probabilities are computed as: The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the trees in the forest. The class probability of a single tree is the fraction of samples of the same class in a leaf. Thank you Sole scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From francois.dion at gmail.com Sat Aug 14 09:52:00 2021 From: francois.dion at gmail.com (Francois Dion) Date: Sat, 14 Aug 2021 09:52:00 -0400 Subject: [scikit-learn] random forests and multil-class probability In-Reply-To: <7D53A0FD-EB5E-4C27-966B-D6954EEF7398@gmail.com> References: <7D53A0FD-EB5E-4C27-966B-D6954EEF7398@gmail.com> Message-ID: Yellowbrick has multi label precision recall curves and multiclass roc/auc builtin: https://www.scikit-yb.org/en/latest/api/classifier/rocauc.html Sent from my iPad > On Jul 27, 2021, at 6:03 AM, Guillaume Lema?tre wrote: > > ?As far that I remember, `precision_recall_curve` and `roc_curve` do not support multi class. They are design to work only with binary classification. > Then, we provide an example for precision-recall that shows one way to compute precision-recall curve via averaging: https://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html#sphx-glr-auto-examples-model-selection-plot-precision-recall-py > -- > Guillaume Lemaitre > Scikit-learn @ Inria Foundation > https://glemaitre.github.io/ > >> On 27 Jul 2021, at 11:42, Sole Galli via scikit-learn wrote: >> >> Thank you! >> >> So when in the multiclass document says that for the algorithms that support intrinsically multiclass, which are listed here, when it says that they do not need to be wrapped by the OnevsRest, it means that there is no need, because they can indeed handle multi class, each one in their own way. >> >> But, if I want to plot PR curves or ROC curves, then I do need to wrap them because those metrics are calculated as a 1 vs rest manner, and this is not how it is handled by the algos. Is my understanding correct? >> >> Thank you! >> >> ??????? Original Message ??????? >> On Tuesday, July 27th, 2021 at 11:33 AM, Nicolas Hug wrote: >>> To add to Guillaume's answer: the native multiclass support for forests/trees is described here: https://scikit-learn.org/stable/modules/tree.html#multi-output-problems >>> >>> It's not a one-vs-rest strategy and can be summed up as: >>> >>> >>>> Store n output values in leaves, instead of 1; >>>> >>>> Use splitting criteria that compute the average reduction across all n outputs. >>>> >>> >>> >>> Nicolas >>> >>> On 27/07/2021 10:22, Guillaume Lema?tre wrote: >>>>>> On 27 Jul 2021, at 11:08, Sole Galli via scikit-learn wrote: >>>>>> >>>>>> Hello community, >>>>>> >>>>>> Do I understand correctly that Random Forests are trained as a 1 vs rest when the target has more than 2 classes? Say the target takes values 0, 1 and 2, then the model would train 3 estimators 1 per class under the hood?. >>>>> Each decision tree of the forest is natively supporting multi class. >>>>> >>>>> The predict_proba output is an array with 3 columns, containing the probability of each class. If it is 1 vs rest. 
am I correct to assume that the sum of the probabilities for the 3 classes should not necessarily add up to 1? are they normalized? how is it done so that they do add up to 1? >>>> According to the above answer, the sum for each row of the array given by `predict_proba` will sum to 1. >>>> According to the documentation, the probabilities are computed as: >>>> >>>> The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the trees in the forest. The class probability of a single tree is the fraction of samples of the same class in a leaf. >>>> >>>>> Thank you >>>>> Sole >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From fernando.wittmann at gmail.com Sat Aug 14 10:04:24 2021 From: fernando.wittmann at gmail.com (Fernando Marcos Wittmann) Date: Sat, 14 Aug 2021 11:04:24 -0300 Subject: [scikit-learn] Regarding negative value of sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score In-Reply-To: References: <7d546c00-43ef-430c-b8e0-b046eb4748d6@Spark> Message-ID: Hi Samir, the following visualization might be useful for gaining intuition on the meaning of a negative r2: https://gist.github.com/WittmannF/02060b45ce3ec9239898a5b91df2564e A negative r2 is reflects into a model predicting the opposite trend of the data. On Sat, Aug 14, 2021, 03:17 Samir K Mahajan wrote: > Dear Chrisophe, > I think you are oversimplifying by saying econometrics tools are for > inference. Forecasting and prediction are integral parts of econometric > analysis. Econometricians forecast by inferring the right conclusion > about the model . I wish to convey to you that I teach both > statistics and econometrics, and am now learning ML. There is a > fundamental difference among statistics, econometrics and machine > learning. > Regards, > > Samir K Mahajan > > On Fri, Aug 13, 2021 at 3:39 PM Christophe Pallier > wrote: > >> Indeed , this is basically what I told you (you do not be need to copy >> textbook stuff: I taught probas/stats) : these are mostly problems for >> *inference*. >> >> On Fri, 13 Aug 2021, 12:03 Samir K Mahajan, >> wrote: >> >>> >>> Dear Christophe Pallier*,* >>> >>> When we are doing prediction, we are relying on the values of the >>> coefficients of the model created. We are feeding test data on the model >>> for prediction. We may be nterested to see if the OLS >>> estimators(coefficients) are BLUE or not. In the presence of >>> autocorrelation (normally noticed in time series data), residuals are not >>> independent, and as such the OLS estimators are not BLUE in the sense that >>> they don't have minimum variance, and thus no more efficient estimators. >>> Statistical tests (t, F and *?*2) may not be valid. We may reject the >>> model to make predictions in such a situation. . 
We have to rely upon >>> other improved models. There may be issues relating to multicollinearity >>> (in case of multivariable regression model) and heteroscedasticity (mostly >>> seen in cross-section data) too in a model. Can we discard these tools >>> while predicting a model? >>> >>> Regards, >>> >>> Samir K Mahajan >>> >>> >>> On Fri, Aug 13, 2021 at 1:07 PM Christophe Pallier < >>> christophe at pallier.org> wrote: >>> >>>> Actually, multicollinearity and autocorrelation are problems for >>>> *inference* more than for *prediction*. For example, if there is >>>> autocorrelation, the residuals are not independent, and the degrees of >>>> freedom are wrong for the tests in an OLS model (but you can use, e.g., an >>>> AR1 model). >>>> >>>> On Thu, 12 Aug 2021, 22:32 Samir K Mahajan, < >>>> samirkmahajan1972 at gmail.com> wrote: >>>> >>>>> A note please (to Sebastian Raschka, mrschots). >>>>> >>>>> >>>>> The OLS model that I used ( where the test score gave me a >>>>> negative value) was not a good fit. Initial findings showed that t*he >>>>> regression coefficients and the model as a whole were significant, *yet >>>>> , finally , it failed in two econometrics tests such as VIF (used for >>>>> detecting multicollinearity ) and Durbin-Watson test ( used for detecting >>>>> auto-correlation). *Presence of multicollinearity and >>>>> autocorrelation problems * in the model make it unsuitable for >>>>> prediction. >>>>> Regards, >>>>> >>>>> Samir K Mahajan. >>>>> >>>>> On Fri, Aug 13, 2021 at 1:41 AM Samir K Mahajan < >>>>> samirkmahajan1972 at gmail.com> wrote: >>>>> >>>>>> Thanks to all of you for your kind response. Indeed, it is a >>>>>> great learning experience. Yes, econometrics books too create models for >>>>>> prediction, and programming really makes things better in a complex >>>>>> world. My understanding is that machine learning does depend on >>>>>> econometrics too. >>>>>> >>>>>> My Regards, >>>>>> >>>>>> Samir K Mahajan >>>>>> >>>>>> On Fri, Aug 13, 2021 at 1:21 AM Sebastian Raschka < >>>>>> mail at sebastianraschka.com> wrote: >>>>>> >>>>>>> The R2 function in scikit-learn works fine. A negative means that >>>>>>> the regression model fits the data worse than a horizontal line >>>>>>> representing the sample mean. E.g. you usually get that if you are >>>>>>> overfitting the training set a lot and then apply that model to the test >>>>>>> set. The econometrics book probably didn't cover applying a model to an >>>>>>> independent data or test set, hence the [0, 1] suggestion. >>>>>>> >>>>>>> Cheers, >>>>>>> Sebastian >>>>>>> >>>>>>> >>>>>>> On Aug 12, 2021, 2:20 PM -0500, Samir K Mahajan < >>>>>>> samirkmahajan1972 at gmail.com>, wrote: >>>>>>> >>>>>>> >>>>>>> Dear Christophe Pallier, Reshama Saikh and Tromek Drabas, >>>>>>> Thank you for your kind response. Fair enough. I go with you R2 is >>>>>>> not a square. However, if you open any book of econometrics, it says R2 >>>>>>> is a ratio that lies between 0 and 1. *This is the constraint.* >>>>>>> It measures the proportion or percentage of the total variation in >>>>>>> response variable (Y) explained by the regressors (Xs) in the model . >>>>>>> Remaining proportion of variation in Y, if any, is explained by the >>>>>>> residual term(u) Now, sklearn.matrics. metrics.r2_score gives me a negative >>>>>>> value lying on a linear scale (-5.763335245921777). This negative >>>>>>> value breaks the *constraint.* I just want to highlight that. I >>>>>>> think it needs to be corrected. Rest is up to you . 
>>>>>>> >>>>>>> I find that Reshama Saikh is hurt by my email. I am really sorry >>>>>>> for that. Please note I never undermine your capabilities and initiatives. >>>>>>> You are great people doing great jobs. I realise that I should have been >>>>>>> more sensible. >>>>>>> >>>>>>> My regards to all of you. >>>>>>> >>>>>>> Samir K Mahajan >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Thu, Aug 12, 2021 at 12:02 PM Christophe Pallier < >>>>>>> christophe at pallier.org> wrote: >>>>>>> >>>>>>>> Simple: despite its name R2 is not a square. Look up its definition. >>>>>>>> >>>>>>>> On Wed, 11 Aug 2021, 21:17 Samir K Mahajan, < >>>>>>>> samirkmahajan1972 at gmail.com> wrote: >>>>>>>> >>>>>>>>> Dear All, >>>>>>>>> I am amazed to find negative values of sklearn.metrics.r2_score >>>>>>>>> and sklearn.metrics.explained_variance_score in a model ( cross validation >>>>>>>>> of OLS regression model) >>>>>>>>> However, what amuses me more is seeing you justifying negative >>>>>>>>> 'sklearn.metrics.r2_score ' in your documentation. This does not >>>>>>>>> make sense to me . Please justify to me how squared values are negative. >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Samir K Mahajan. >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> scikit-learn mailing list >>>>>>>>> scikit-learn at python.org >>>>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> scikit-learn mailing list >>>>>>>> scikit-learn at python.org >>>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>>>> >>>>>>> _______________________________________________ >>>>>>> scikit-learn mailing list >>>>>>> scikit-learn at python.org >>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>>> >>>>>>> _______________________________________________ >>>>>>> scikit-learn mailing list >>>>>>> scikit-learn at python.org >>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>>> >>>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From adrin.jalali at gmail.com Mon Aug 16 05:56:57 2021 From: adrin.jalali at gmail.com (Adrin) Date: Mon, 16 Aug 2021 11:56:57 +0200 Subject: [scikit-learn] Pandas copy-on-write proposal Message-ID: Hi there, I'd like to bring your attention to a proposal being discussed among pandas developers, regarding copy-on-write semantics. A very short summary of the proposal, according to the document , is: *- The result of any indexing operation (subsetting a DataFrame or Series in any way, i.e. 
including accessing a DataFrame column as a Series) or any method returning a new DataFrame or Series, always behaves as if it were a copy in terms of user API.- We implement Copy-on-Write (as implementation detail). This way, we can actually use views as much as possible under the hood, while ensuring the user API behaves as a copy.* *- As a consequence, if you want to modify an object (DataFrame or Series), the only way to do this is to modify that object itself directly.* *This addresses multiple aspects: 1) a clear and consistent user API (a clear rule: any subset or returned series/dataframe always behaves as a copy of the original, and thus never modifies the original) and 2) improving performance by avoiding excessive copies (eg a chained method workflow would no longer return an actual data copy at each step). Because every single indexing step behaves as a copy, this also means that with this proposal, ?chained assignment? (with multiple setitem steps) will never work.* You can also read the related discussion on the pandas mailing list here . It would be nice for us to think about the implications of this proposal on our work related to supporting pandas dataframes. Cheers, Adrin -------------- next part -------------- An HTML attachment was scrubbed... URL: From petrizzo at gmail.com Mon Aug 16 17:30:33 2021 From: petrizzo at gmail.com (Mariangela Petrizzo) Date: Mon, 16 Aug 2021 17:30:33 -0400 Subject: [scikit-learn] Spanish translation proposal for Scikit-Learn documentation In-Reply-To: References: Message-ID: <371E9F54-EB4F-45C7-AE19-07E1E769BC40@getmailspring.com> Hello everyone! We are writing briefly to announce that the Spanish translation of the Sci-kit learn 0.24.2 documentation is now available from: https://qu4nt.github.io/sklearn-doc-es/index.html Soon we will update in that repository the suggested workflow for future translations of this documentation. We are now in the final phase of this work, debugging and fine-tuning the last details, but we update the html version daily. It has been a great pleasure for our team to support the Spanish community of users of this library and the Python community in general, with our work. Mari?ngela Petrizzo http://qu4nt.com Mar?a ?ngela Petrizzo P?ez About Me (about.me/petrizzo) Desc?rgate Redes para la Comprensi?n de la Pol?tica (http://www.elperroylarana.gob.ve/redes-para-la-comprension-de-la-politica/) Usuario Linux # 498889 Miembro Red de Polit?logas - #NoSinMujeres (https://www.nosinmujeres.com/) Publicaciones (https://hotelescuela.academia.edu/MariangelaPetrizzoPaez) ORCID (http://orcid.org/0000-0001-9483-4185) PEII - Nivel B On feb. 9 2021, at 4:15 pm, Mariangela Petrizzo wrote: > Dear Scikit-Learn team! > > > > I am Mari?ngela Petrizzo, I am writing to you as a member of Qu4nt, a team dedicated to the use of open source tools for the development of software solutions with emphasis on data science. We have a strong interest in translating the Scikit-Learn documentation into Spanish. > Our team is made up of members from various scientific fields, including some university faculty in linguistics and computer sciences, with a wide experience in Python as well as several libraries used for data analysis and machine learning, and also contribute locally as evangelists of its use in Spanish-speaking communities, in particular, the leader initiated the translation of some Software Carpentry lessons into Spanish. 
> That is why we have been discussing the opportunity to offer our contribution to the Python project, promoting the translation into Spanish of the documentation of some of the libraries with the greatest impact in our areas of interest. Talking with David Mertz, to whom we are sending a copy of this email, we have explored options, and the idea of working with Scikit-learn has really seemed to be an exceptional opportunity for all of us and the community. He's very enthusiastic about the idea of generating a spanish translation of Scientific Python libraries like Scikit-learn. > For us, this translation project has to be done through a completely open work on Github, taking as reference the restructured text sources for Sphinx from a git fork, using the tools provided by Sphinx itself for internationalization: https://www.sphinx-doc.org/en/1.8/intl.html, and applying tags to perform planned updates. In addition, as with any open source project, the main mechanism for quality assurance comes from the users themselves who will have the channels available for submitting issues. Our intention is to secure all the infrastructure and mechanisms to make this possible: making the process transparent through Github, using as much as possible tools like Transifex to facilitate participation, and providing guidelines for contributors as part of the project. > Of course, this project cannot be realized without your support. We therefore come to you to inquire about your willingness to accompany and support this project. > We would love to hear your feedback on our proposal. > Best regards, > > > Mari?ngela > > > -- > > Mar?a ?ngela Petrizzo P?ez > > about.me/petrizzo (https://about.me/petrizzo?promo=email_sig&utm_source=product&utm_medium=email_sig&utm_campaign=edit_panel&utm_content=plaintext) > > > > > > > > > > > > > Desc?rgate Redes para la Comprensi?n de la Pol?tica (http://www.elperroylarana.gob.ve/redes-para-la-comprension-de-la-politica/) > > > > > A quienes conservan la esperanza que no es lo ?ltimo que se pierde, sino lo primero que se siembra y, por tanto, lo m?s radical. > > > El ?nico modo de vencer el secuestro del conocimiento > es comprender sus razones. > La manera de revertirlo, > es hacernos hackers de los secuestros cotidianos > a cambio de no morir sin saber lo que somos > > ?Piensa para vivir, > act?a para hackear! > Cada d?a, una acci?n procom?n a la vez. > > > ?Tengo horror de aquellos cuyas palabras van m?s all? que sus actos? > > Albert Camus > > > > ?El poder, lejos de estorbar al saber, lo produce.? - Michael Foucault > Usuario Linux # 498889 > Miembro Red de Polit?logas - #NoSinMujeres (http://www.nosinmujeres.com/) > https://hotelescuela.academia.edu/MariangelaPetrizzoPaez > http://orcid.org/0000-0001-9483-4185 > PEII - Nivel B > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From reshama.stat at gmail.com Tue Aug 17 09:03:08 2021 From: reshama.stat at gmail.com (Reshama Shaikh) Date: Tue, 17 Aug 2021 09:03:08 -0400 Subject: [scikit-learn] Spanish translation proposal for Scikit-Learn documentation In-Reply-To: <371E9F54-EB4F-45C7-AE19-07E1E769BC40@getmailspring.com> References: <371E9F54-EB4F-45C7-AE19-07E1E769BC40@getmailspring.com> Message-ID: Hi Mari?ngela, That's an impressive accomplishment! Congratulations. 
A PR can be submitted to add the Spanish translation link to this page in scikit-learn documentation: https://scikit-learn.org/dev/related_projects.html#translations-of-scikit-learn-documentation Reshama Shaikh she/her Blog | Twitter | LinkedIn | GitHub Data Umbrella NYC PyLadies On Mon, Aug 16, 2021 at 5:32 PM Mariangela Petrizzo wrote: > > Hello everyone! > > We are writing briefly to announce that the Spanish translation of the > Sci-kit learn 0.24.2 documentation is now available from: > > https://qu4nt.github.io/sklearn-doc-es/index.html > > Soon we will update in that repository the suggested workflow for future > translations of this documentation. We are now in the final phase of this > work, debugging and fine-tuning the last details, but we update the html > version daily. > > It has been a great pleasure for our team to support the Spanish community > of users of this library and the Python community in general, with our work. > > > Mari?ngela Petrizzo > http://qu4nt.com > > Mar?a ?ngela Petrizzo P?ezAbout Me > Desc?rgate Redes para la Comprensi?n de la Pol?tica > > Usuario Linux # 498889 > Miembro Red de Polit?logas - #NoSinMujeres > Publicaciones > ORCID PEII - Nivel B > On feb. 9 2021, at 4:15 pm, Mariangela Petrizzo > wrote: > > Dear Scikit-Learn team! > > > > I am Mari?ngela Petrizzo, I am writing to you as a member of Qu4nt, a team > dedicated to the use of open source tools for the development of software > solutions with emphasis on data science. We have a strong interest in > translating the Scikit-Learn documentation into Spanish. > > Our team is made up of members from various scientific fields, including > some university faculty in linguistics and computer sciences, with a wide > experience in Python as well as several libraries used for data analysis > and machine learning, and also contribute locally as evangelists of its > use in Spanish-speaking communities, in particular, the leader initiated > the translation of some Software Carpentry lessons into Spanish. > > That is why we have been discussing the opportunity to offer our > contribution to the Python project, promoting the translation into Spanish > of the documentation of some of the libraries with the greatest impact in > our areas of interest. Talking with David Mertz, to whom we are sending a > copy of this email, we have explored options, and the idea of working with > Scikit-learn has really seemed to be an exceptional opportunity for all of > us and the community. He's very enthusiastic about the idea of generating a > spanish translation of Scientific Python libraries like Scikit-learn. > > For us, this translation project has to be done through a completely open > work on Github, taking as reference the restructured text sources for > Sphinx from a git fork, using the tools provided by Sphinx itself for > internationalization: https://www.sphinx-doc.org/en/1.8/intl.html > , and applying tags to > perform planned updates. In addition, as with any open source project, the > main mechanism for quality assurance comes from the users themselves who > will have the channels available for submitting issues. Our intention is to > secure all the infrastructure and mechanisms to make this possible: making > the process transparent through Github, using as much as possible tools > like Transifex to facilitate participation, and providing guidelines for > contributors as part of the project. > > Of course, this project cannot be realized without your support. 
We > therefore come to you to inquire about your willingness to accompany and > support this project. > > We would love to hear your feedback on our proposal. > > Best regards, > > > > Mari?ngela > > > -- > > > > Mar?a ?ngela Petrizzo P?ez > [image: https://] > [image: https://]about.me/petrizzo > > Desc?rgate Redes para la Comprensi?n de la Pol?tica > > > *A quienes conservan la esperanza que no es lo ?ltimo que se pierde, sino > lo primero que se siembra y, por tanto, lo m?s radical.* > > > El ?nico modo de vencer el secuestro del conocimiento > es comprender sus razones. > La manera de revertirlo, > es hacernos hackers de los secuestros cotidianos > a cambio de no morir sin saber lo que somos > > ?Piensa para vivir, > act?a para hackear! > Cada d?a, una acci?n procom?n a la vez. > > > *?Tengo horror de aquellos cuyas palabras van m?s all? que sus actos?* > *Albert Camus* > > *?El poder, lejos de estorbar al saber, lo produce.? - Michael Foucault* > > > Usuario Linux # 498889 > Miembro Red de Polit?logas - #NoSinMujeres > https://hotelescuela.academia.edu/MariangelaPetrizzoPaez > http://orcid.org/0000-0001-9483-4185 > PEII - Nivel B > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From johngrenci61 at yahoo.com Fri Aug 20 18:15:09 2021 From: johngrenci61 at yahoo.com (John Grenci) Date: Fri, 20 Aug 2021 22:15:09 +0000 (UTC) Subject: [scikit-learn] cant install scikit-learn References: <1717831625.594362.1629497709231.ref@mail.yahoo.com> Message-ID: <1717831625.594362.1629497709231@mail.yahoo.com> Hello, hoping somebody can help me. ? I have tried.. what seems like everything. ? I get an OS error ? ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory: 'C:\\Users\\ameri\\AppData\\Local\\Packages\\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\\LocalCache\\local-packages\\Python39\\site-packages\\sklearn\\datasets\\tests\\data\\openml\\292\\api-v1-json-data-list-data_name-australian-limit-2-data_version-1-status-deactivated.json.gz' HINT: This error might have occurred since this system does not have Windows Long Path support enabled. You can find information on how to enable this at?https://pip.pypa.io/warnings/enable-long-paths ? ? I tried enabling more than 260 characters as suggested, but that did not help? gave me a different error actually. ? I don?t think it has to do with bits, as my computer is 64 bit. I also tried pip install sklearn ? I am at a loss at this point. ? PS- I am ?not the most techy of person.? also, looked everywhere online that I could can somebody help? ? Thanks John ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From mablue92 at gmail.com Sun Aug 22 02:17:05 2021 From: mablue92 at gmail.com (Masoud Azizi) Date: Sun, 22 Aug 2021 10:47:05 +0430 Subject: [scikit-learn] how the skpot optimize avoids flats Message-ID: Hi to all Im new in sk mailing list :) I need your help about that how hyperoption avoids this flat places? is there a code address to findout that? see the attachment -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: unnamed.png Type: image/png Size: 21512 bytes Desc: not available URL: From skacanski at gmail.com Sun Aug 22 16:24:45 2021 From: skacanski at gmail.com (Sasha Kacanski) Date: Sun, 22 Aug 2021 16:24:45 -0400 Subject: [scikit-learn] cant install scikit-learn In-Reply-To: <1717831625.594362.1629497709231@mail.yahoo.com> References: <1717831625.594362.1629497709231.ref@mail.yahoo.com> <1717831625.594362.1629497709231@mail.yahoo.com> Message-ID: How about a Linux desktop for a change? I suggest Debian or Arch! On Fri, Aug 20, 2021 at 6:17 PM John Grenci via scikit-learn < scikit-learn at python.org> wrote: > Hello, hoping somebody can help me. > > I have tried what seems like everything. > > I get an OS error: > > ERROR: Could not install packages due to an OSError: [Errno 2] No such > file or directory: > 'C:\\Users\\ameri\\AppData\\Local\\Packages\\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\\LocalCache\\local-packages\\Python39\\site-packages\\sklearn\\datasets\\tests\\data\\openml\\292\\api-v1-json-data-list-data_name-australian-limit-2-data_version-1-status-deactivated.json.gz' > > HINT: This error might have occurred since this system does not have > Windows Long Path support enabled. You can find information on how to > enable this at https://pip.pypa.io/warnings/enable-long-paths > > I tried enabling more than 260 characters as suggested, but that did not > help; it gave me a different error, actually. > > I don't think it has to do with bits, as my computer is 64 bit.
> > I also tried pip install sklearn > > > I am at a loss at this point. > > > PS- I am not the most techy of person. also, looked everywhere online > that I could > > can somebody help? > > > Thanks > > John > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomasjpfan at gmail.com Sun Aug 22 17:11:22 2021 From: thomasjpfan at gmail.com (Thomas J. Fan) Date: Sun, 22 Aug 2021 17:11:22 -0400 Subject: [scikit-learn] cant install scikit-learn In-Reply-To: References: <1717831625.594362.1629497709231.ref@mail.yahoo.com> <1717831625.594362.1629497709231@mail.yahoo.com> Message-ID: Here are instructions on how to resolve the issue: https://scikit-learn.org/stable/install.html#error-caused-by-file-path-length-limit-on-windows In the upcoming release of scikit-learn, we have reduced the number of characters in the filename. This should resolve this issue without needing to edit the Windows registry. Thomas On Sun, Aug 22, 2021 at 4:44 PM Robert Slater wrote: > What was the second error? > > What version of python are you using?What version of windows are you using? > > > This will help troubleshoot the issue. > > > On Fri, Aug 20, 2021, 5:16 PM John Grenci via scikit-learn < > scikit-learn at python.org> wrote: > >> Hello, hoping somebody can help me. >> >> >> >> I have tried.. what seems like everything. >> >> >> >> I get an OS error >> >> >> >> ERROR: Could not install packages due to an OSError: [Errno 2] No such >> file or directory: >> 'C:\\Users\\ameri\\AppData\\Local\\Packages\\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\\LocalCache\\local-packages\\Python39\\site-packages\\sklearn\\datasets\\tests\\data\\openml\\292\\api-v1-json-data-list-data_name-australian-limit-2-data_version-1-status-deactivated.json.gz' >> >> HINT: This error might have occurred since this system does not have >> Windows Long Path support enabled. You can find information on how to >> enable this at https://pip.pypa.io/warnings/enable-long-paths >> >> >> >> >> >> I tried enabling more than 260 characters as suggested, but that did not >> help gave me a different error actually. >> >> >> I don?t think it has to do with bits, as my computer is 64 bit. >> >> I also tried pip install sklearn >> >> >> I am at a loss at this point. >> >> >> PS- I am not the most techy of person. also, looked everywhere online >> that I could >> >> can somebody help? >> >> >> Thanks >> >> John >> >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... 
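For anyone gathering the details Robert asks about, a small standard-library-only snippet works even while scikit-learn itself refuses to install (the exact output will of course depend on the machine):

    import sys
    import platform

    print("Python :", sys.version)
    print("OS     :", platform.platform())
    print("Arch   :", platform.architecture()[0])   # e.g. '64bit'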
URL: From varavind121 at yahoo.com Sun Aug 22 17:13:23 2021 From: varavind121 at yahoo.com (aravind ramesh) Date: Sun, 22 Aug 2021 21:13:23 +0000 (UTC) Subject: [scikit-learn] cant install scikit-learn In-Reply-To: References: <1717831625.594362.1629497709231.ref@mail.yahoo.com> <1717831625.594362.1629497709231@mail.yahoo.com> Message-ID: <1741730219.681482.1629666803984@mail.yahoo.com> Hi, Try using Anaconda Python distribution(Anaconda | Individual Edition) it comes with sci-kit learn, no hassle of dealing with any dependency issues. | | | | | | | | | | | Anaconda | Individual Edition Anaconda's open-source Individual Edition is the easiest way to perform Python/R data science and machine learni... | | | On Monday, August 23, 2021, 01:56:42 AM GMT+5:30, Sasha Kacanski wrote: Who about Linux desktop for a change. i suggest Debian or Arch! On Fri, Aug 20, 2021 at 6:17 PM John Grenci via scikit-learn wrote: Hello, hoping somebody can help me. ? I have tried.. what seems like everything. ? I get an OS error ? ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory: 'C:\\Users\\ameri\\AppData\\Local\\Packages\\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\\LocalCache\\local-packages\\Python39\\site-packages\\sklearn\\datasets\\tests\\data\\openml\\292\\api-v1-json-data-list-data_name-australian-limit-2-data_version-1-status-deactivated.json.gz' HINT: This error might have occurred since this system does not have Windows Long Path support enabled. You can find information on how to enable this at?https://pip.pypa.io/warnings/enable-long-paths ? ? I tried enabling more than 260 characters as suggested, but that did not help? gave me a different error actually. ? I don?t think it has to do with bits, as my computer is 64 bit. I also tried pip install sklearn ? I am at a loss at this point. ? PS- I am ?not the most techy of person.? also, looked everywhere online that I could can somebody help? ? Thanks John ? _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn -- Aleksandar Kacanski - Sasha _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From johngrenci61 at yahoo.com Mon Aug 23 09:34:06 2021 From: johngrenci61 at yahoo.com (John Grenci) Date: Mon, 23 Aug 2021 13:34:06 +0000 (UTC) Subject: [scikit-learn] cant install scikit-learn In-Reply-To: References: <1717831625.594362.1629497709231.ref@mail.yahoo.com> <1717831625.594362.1629497709231@mail.yahoo.com> Message-ID: <279180649.1033217.1629725646648@mail.yahoo.com> Thomas, and everybody else who responded. the instructions below worked. thanks so much. I just joined this group and four people responded rather quickly. not being a "techhy person" per se, people who respond help alleviate the frustration that can commonly occur thanks again, much appreciated. and everyone have a great day. John On Sunday, August 22, 2021, 05:12:22 PM EDT, Thomas J. Fan wrote: Here are instructions on how to resolve the issue: https://scikit-learn.org/stable/install.html#error-caused-by-file-path-length-limit-on-windows In the upcoming release of scikit-learn, we have reduced the number of characters?in the filename. This should resolve this issue without needing to edit the?Windows registry. 
Thomas On Sun, Aug 22, 2021 at 4:44 PM Robert Slater wrote: What was the second error? What version of python are you using?What version of windows are you using? This will help troubleshoot the issue. On Fri, Aug 20, 2021, 5:16 PM John Grenci via scikit-learn wrote: Hello, hoping somebody can help me. ? I have tried.. what seems like everything. ? I get an OS error ? ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory: 'C:\\Users\\ameri\\AppData\\Local\\Packages\\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\\LocalCache\\local-packages\\Python39\\site-packages\\sklearn\\datasets\\tests\\data\\openml\\292\\api-v1-json-data-list-data_name-australian-limit-2-data_version-1-status-deactivated.json.gz' HINT: This error might have occurred since this system does not have Windows Long Path support enabled. You can find information on how to enable this at?https://pip.pypa.io/warnings/enable-long-paths ? ? I tried enabling more than 260 characters as suggested, but that did not help? gave me a different error actually. ? I don?t think it has to do with bits, as my computer is 64 bit. I also tried pip install sklearn ? I am at a loss at this point. ? PS- I am ?not the most techy of person.? also, looked everywhere online that I could can somebody help? ? Thanks John ? _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From olivier.grisel at ensta.org Wed Aug 25 04:52:21 2021 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Wed, 25 Aug 2021 10:52:21 +0200 Subject: [scikit-learn] Pandas copy-on-write proposal In-Reply-To: References: Message-ID: Thanks for the heads up! This is interesting. We rarely update dataframe values in-place in scikit-learn but this is interesting to know that we could leverage this for more efficient pandas-in pandas-out support, for instance for missing value imputation. From olivier.grisel at ensta.org Wed Aug 25 05:06:07 2021 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Wed, 25 Aug 2021 11:06:07 +0200 Subject: [scikit-learn] Dataframe protocol RFC Message-ID: Hi all, This is an email to notify everybody interested that the discussion on interoperability of Python dataframe libraries has moved to an official repo under the data-apis.org initiative: https://data-apis.org/blog/dataframe_protocol_rfc/ https://github.com/data-apis/dataframe-api and they are requesting feedback from library authors (both dataframe providers and consumers). -- Olivier From johngrenci61 at yahoo.com Wed Aug 25 09:00:23 2021 From: johngrenci61 at yahoo.com (John Grenci) Date: Wed, 25 Aug 2021 13:00:23 +0000 (UTC) Subject: [scikit-learn] data reader group? In-Reply-To: References: Message-ID: <677005045.474056.1629896423296@mail.yahoo.com> Hello everyone, I am new to this group. I was wondering if there is something akin to this for data reader? or are questions other than scikit-learn acceptable on this forum? 
thanks John On Wednesday, August 25, 2021, 05:06:42 AM EDT, Olivier Grisel wrote: Hi all, This is an email to notify everybody interested that the discussion on interoperability of Python dataframe libraries has moved to an official repo under the data-apis.org initiative: https://data-apis.org/blog/dataframe_protocol_rfc/ https://github.com/data-apis/dataframe-api and they are requesting feedback from library authors (both dataframe providers and consumers). -- Olivier _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomasjpfan at gmail.com Wed Aug 25 10:03:57 2021 From: thomasjpfan at gmail.com (Thomas J. Fan) Date: Wed, 25 Aug 2021 10:03:57 -0400 Subject: [scikit-learn] scikit-learn monthly developer meeting: Monday August 30th 2021 Message-ID: Dear all, The scikit-learn developer monthly meeting will take place on Monday August 30th at 1PM UTC. - Video call link: https://meet.google.com/ews-uszu-djs - Meeting notes / agenda: https://hackmd.io/0yokz72CTZSny8y3Re648Q - Local times: https://www.timeanddate.com/worldclock/meetingdetails.html?year=2021&month=8&day=30&hour=13&min=0&sec=0&p1=1440&p2=240&p3=248&p4=195&p5=179&p6=224 The goal of this meeting is to discuss ongoing development topics for the project. Everybody is welcome. As usual, please follow the code of conduct of the project: https://github.com/scikit-learn/scikit-learn/blob/main/CODE_OF_CONDUCT.md Regards, Thomas -------------- next part -------------- An HTML attachment was scrubbed... URL: From reshama.stat at gmail.com Wed Aug 25 22:53:58 2021 From: reshama.stat at gmail.com (Reshama Shaikh) Date: Wed, 25 Aug 2021 22:53:58 -0400 Subject: [scikit-learn] pipeline diagram Message-ID: Hello, This question is for the community (*not* the core contributors). In referencing the *diagram representation* of the pipeline [a], what would be the best way for you to find out what "strategy" (from: mean, median, most_frequent, constant) is being used for "SimpleImputer"? (Also, I am attaching a screenshot of the diagram.) It's not a quiz or anything [ :) ], I'm trying to figure out where folks would look first to get more information on the pipeline. [a] https://scikit-learn.org/dev/auto_examples/compose/plot_column_transformer_mixed_types.html#sphx-glr-auto-examples-compose-plot-column-transformer-mixed-types-py Thanks, Reshama --- Reshama Shaikh she/her Blog | Twitter | LinkedIn | GitHub Data Umbrella NYC PyLadies -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: pipeline_diagram.png Type: image/png Size: 110835 bytes Desc: not available URL: From aidangawronski at gmail.com Fri Aug 27 20:14:20 2021 From: aidangawronski at gmail.com (Aidan Gawronski) Date: Fri, 27 Aug 2021 17:14:20 -0700 Subject: [scikit-learn] LabelPropagation - transduction_ vs predict Message-ID: Hi all, I was exploring sklearn.semi_supervised.LabelPropagation and I noticed that I get difference results if I train a model and look at "model.transduction_" compared to taking the same model and using "model.predict(X_train)" on the training data. I couldn't easily find the difference on google, so I began reading through the code but it seems pretty involved and I thought someone here might know the difference off hand. 
From joel.nothman at gmail.com Sun Aug 29 02:21:49 2021
From: joel.nothman at gmail.com (Joel Nothman)
Date: Sun, 29 Aug 2021 16:21:49 +1000
Subject: [scikit-learn] pipeline diagram
In-Reply-To: 
References: 
Message-ID: 

Hi Reshama,

You can click the nodes in the diagram (obviously the screenshot loses
this). Is there some way we can make that more obvious? Passing your mouse
(if you're on an appropriate device) over it shows the hand cursor, which
is some indication.

Would it be helpful if, when the user put their cursor over the diagram at
all, it showed something like "Click an estimator type to see its
parameters"?

Joel

On Thu, 26 Aug 2021 at 12:55, Reshama Shaikh wrote:

> Hello,
> This question is for the community (*not* the core contributors).
>
> In referencing the *diagram representation* of the pipeline [a], what
> would be the best way for you to find out what "strategy" (from: mean,
> median, most_frequent, constant) is being used for "SimpleImputer"?
>
> (Also, I am attaching a screenshot of the diagram.)
>
> It's not a quiz or anything [ :) ], I'm trying to figure out where folks
> would look first to get more information on the pipeline.
>
> [a]
> https://scikit-learn.org/dev/auto_examples/compose/plot_column_transformer_mixed_types.html#sphx-glr-auto-examples-compose-plot-column-transformer-mixed-types-py
>
> Thanks,
> Reshama
> ---
> Reshama Shaikh
> she/her
> Blog | Twitter | LinkedIn | GitHub
>
> Data Umbrella
> NYC PyLadies
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From reshama.stat at gmail.com Sun Aug 29 10:09:34 2021
From: reshama.stat at gmail.com (Reshama Shaikh)
Date: Sun, 29 Aug 2021 10:09:34 -0400
Subject: [scikit-learn] pipeline diagram
In-Reply-To: 
References: 
Message-ID: 

Hi Joel,

I am working on the PR to add the diagram visualization to the
documentation [a]. I had added both text and diagram output to all the
examples, because I did not realize you could click on the diagram
sections to get more information. It wasn't until my recent discussion
with Thomas that he pointed it out; it wasn't intuitive to me.

It would be good to either:
a) add a note somewhere indicating "click on the text in the pipeline
visualization to see more details, such as parameter settings"
b) add a GIF of it to the documentation
c) when the user puts their cursor over the diagram at all, show
something like "Click an estimator type to see its parameters"

I added this PR to the agenda for the next scikit-learn meeting.

[a] https://github.com/scikit-learn/scikit-learn/pull/18758

Reshama Shaikh
she/her
Blog | Twitter | LinkedIn | GitHub

Data Umbrella
NYC PyLadies
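As background for the documentation work discussed above, the diagram is
an HTML/CSS snippet that scikit-learn can also emit directly, which is how
it can end up embedded in rendered pages. A minimal sketch follows; the
two-step pipeline and the output filename are made up for illustration:

    from sklearn.impute import SimpleImputer
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.utils import estimator_html_repr

    # Made-up two-step pipeline, just to have something to render.
    pipe = Pipeline([("imputer", SimpleImputer(strategy="median")),
                     ("scaler", StandardScaler())])

    # estimator_html_repr() returns the interactive HTML/CSS snippet used
    # by the diagram display; it can be written to a file or embedded in
    # generated documentation. The filename here is a made-up example.
    with open("pipeline_diagram_example.html", "w") as f:
        f.write(estimator_html_repr(pipe))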
On Sun, Aug 29, 2021 at 2:24 AM Joel Nothman wrote:

> Hi Reshama,
>
> You can click the nodes in the diagram (obviously the screenshot loses
> this). Is there some way we can make that more obvious? Passing your
> mouse (if you're on an appropriate device) over it shows the hand cursor,
> which is some indication.
>
> Would it be helpful if, when the user put their cursor over the diagram
> at all, it showed something like "Click an estimator type to see its
> parameters"?
>
> Joel
>
> On Thu, 26 Aug 2021 at 12:55, Reshama Shaikh wrote:
>
>> Hello,
>> This question is for the community (*not* the core contributors).
>>
>> In referencing the *diagram representation* of the pipeline [a], what
>> would be the best way for you to find out what "strategy" (from: mean,
>> median, most_frequent, constant) is being used for "SimpleImputer"?
>>
>> (Also, I am attaching a screenshot of the diagram.)
>>
>> It's not a quiz or anything [ :) ], I'm trying to figure out where folks
>> would look first to get more information on the pipeline.
>>
>> [a]
>> https://scikit-learn.org/dev/auto_examples/compose/plot_column_transformer_mixed_types.html#sphx-glr-auto-examples-compose-plot-column-transformer-mixed-types-py
>>
>> Thanks,
>> Reshama
>> ---
>> Reshama Shaikh
>> she/her
>> Blog | Twitter | LinkedIn | GitHub
>>
>> Data Umbrella
>> NYC PyLadies
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
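To round out the thread above: the clickable diagram is scikit-learn's
HTML display mode, enabled with set_config(display="diagram"), and the
parameters revealed by clicking a box (for example the SimpleImputer
"strategy") can also be read programmatically through get_params(). A
minimal sketch follows; the pipeline, its step names ("num", "cat",
"preprocessor", "classifier") and the column names are invented here to
resemble the linked example, not copied from it:

    from sklearn import set_config
    from sklearn.compose import ColumnTransformer
    from sklearn.impute import SimpleImputer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder, StandardScaler

    # Hypothetical pipeline, similar in spirit to the linked example; the
    # step names and column names below are invented for this sketch.
    numeric = Pipeline([("imputer", SimpleImputer(strategy="median")),
                        ("scaler", StandardScaler())])
    categorical = Pipeline([("imputer", SimpleImputer(strategy="most_frequent")),
                            ("onehot", OneHotEncoder(handle_unknown="ignore"))])
    preprocessor = ColumnTransformer([("num", numeric, ["age", "fare"]),
                                      ("cat", categorical, ["embarked", "sex"])])
    clf = Pipeline([("preprocessor", preprocessor),
                    ("classifier", LogisticRegression())])

    # In a Jupyter notebook, evaluating `clf` after this call renders the
    # clickable diagram instead of the plain text repr.
    set_config(display="diagram")

    # The same information is available programmatically via get_params();
    # the nested key mirrors the step names chosen above.
    print(clf.get_params()["preprocessor__num__imputer__strategy"])  # -> median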