From uros.pocek at gmail.com Mon Aug 2 05:03:55 2021
From: uros.pocek at gmail.com (Uroš Poček)
Date: Mon, 2 Aug 2021 11:03:55 +0200
Subject: [scikit-learn] scikit-learn for Apple Silicon M1 Macs

Hello, I am a student and ML programmer and I have been using the scikit-learn
library for Python for a few years now on my PC, but recently I switched to an
M1 iMac, and when I tried to transfer my projects and pip install the libraries
they use I ran into a bunch of issues. Long story short, I was able to
successfully install all ML libraries on my new Mac (tensorflow, numpy,
matplotlib, pandas, torch, ...) except scikit-learn (sklearn)! When can we
expect to see a version of this library that can be installed using pip on M1
Macs and used without any issues?

Thank you all in advance.
Uros Pocek


From rth.yurchak at gmail.com Mon Aug 2 05:15:48 2021
From: rth.yurchak at gmail.com (Roman Yurchak)
Date: Mon, 2 Aug 2021 11:15:48 +0200
Subject: [scikit-learn] [TC Vote] Technical Committee vote: line length
References: <20210726212619.54iy56wbl4sdbe3z@phare.normalesup.org>
Message-ID: <482f3b2c-fcff-719b-aa44-6f3c2d4afc0b@gmail.com>

I also don't have a strong opinion on this, and generally I'm just happy that
the black migration happened. Still with a slight preference for 88 characters
as the default.

On 28/07/2021 18:34, Olivier Grisel wrote:
> Many very active core devs not represented in the TC voted for 88 and
> my previous vote for 79 was not that strong. So I feel that I should
> now vote for 88:
>
> Keep current 88 characters:
>
> Olivier
>
> Revert to 79 characters:
>


From g.lemaitre58 at gmail.com Mon Aug 2 06:07:18 2021
From: g.lemaitre58 at gmail.com (Guillaume Lemaître)
Date: Mon, 2 Aug 2021 12:07:18 +0200
Subject: [scikit-learn] scikit-learn for Apple Silicon M1 Macs

There is currently no wheel available on PyPI because NumPy and SciPy do not
provide wheels either: https://github.com/scikit-learn/scikit-learn/issues/19137

However, one can use `miniforge` or `mambaforge` to install binaries without
the need to build from source:
https://scikit-learn.org/stable/install.html#installing-on-apple-silicon-m1-hardware

NB: I am currently developing scikit-learn with an M1 using `mambaforge` and
the process is pretty smooth.

--
Guillaume Lemaitre
Scikit-learn @ Inria Foundation
https://glemaitre.github.io/
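A quick way to confirm what actually got installed is to query the environment
from Python itself. This is an illustrative sketch rather than anything from
the thread; it only assumes the standard-library `platform` module and
scikit-learn's own `show_versions()` helper.

    # Sketch: check that scikit-learn imports natively on Apple Silicon.
    import platform
    import sklearn

    print(platform.machine())   # 'arm64' for a native Apple Silicon build
    print(sklearn.__version__)
    sklearn.show_versions()     # system, Python and dependency build details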
From reshama.stat at gmail.com Thu Aug 5 10:38:18 2021
From: reshama.stat at gmail.com (Reshama Shaikh)
Date: Thu, 5 Aug 2021 10:38:18 -0400
Subject: [scikit-learn] Open Source: sustainability and etiquette

Hello,
I found the video; it's from 2017. It's by Heather Miller, a professor at CMU.
The 40-minute talk is entitled: The Dramatic Consequences of the Open Source
Revolution [a]

Brigitta, Heather references Nadia Eghbal's book in her talk, which I also
added to my list. [b]
Adrin, I added CHAOSS to the list as well. They have a mailing list which I
have subscribed to.

[a] https://youtu.be/K4mVuxcimWk
[b] https://www.dataumbrella.org/open-source/open-source-sustainability

Reshama Shaikh
she/her
Blog | Twitter | LinkedIn | GitHub
Data Umbrella
NYC PyLadies

On Mon, Apr 19, 2021 at 6:51 PM Brigitta Sipocz wrote:

> Hi,
>
> I've also very much liked Nadia Eghbal's book: Working in Public: The Making
> and Maintenance of Open Source Software. I haven't yet attended a conference
> where she was a speaker, but I'm certain there are some relevant recordings
> on youtube.
>
> Cheers,
> Brigitta
>
> On Mon, 19 Apr 2021 at 06:27, Adrin wrote:
>
>> This is a really good initiative Reshama, thanks for sharing.
>>
>> Have you seen CHAOSScon talks and activities? They're really good, and
>> touch on a lot of really good stuff when it comes to open source
>> communities and sustainability.
>> E.g.: https://chaoss.community/chaosscon-2020-eu/
>>
>> Cheers,
>> Adrin
>>
>> On Fri, Apr 16, 2021 at 4:26 PM Reshama Shaikh wrote:
>>
>>> Hello,
>>> I've seen some excellent resources that have explained open source, its
>>> sustainability, challenges and *indirectly, the etiquette*.
>>>
>>> I am starting to compile the list here [a].
>>>
>>> This keynote by Stuart Geiger is a must-watch: The Invisible Work of
>>> Maintaining & Sustaining Open Source Software [b]
>>>
>>> There is one more video by Emily someone who was at Microsoft, but is
>>> now a professor somewhere, and I am trying to track that video down. I
>>> think it's from 2017. I'll add it to the list once I find it. If anyone
>>> knows the full name of the speaker, please share.
>>>
>>> [a] https://www.dataumbrella.org/open-source/open-source-sustainability
>>> [b] https://www.youtube.com/watch?v=PM3iltcaIL8
>>>
>>> Best,
>>> Reshama
>>> ---
>>> Reshama Shaikh
>>> she/her
>>> Blog | Twitter | LinkedIn | GitHub
>>> Data Umbrella
>>> NYC PyLadies
From samirkmahajan1972 at gmail.com Wed Aug 11 15:16:34 2021
From: samirkmahajan1972 at gmail.com (Samir K Mahajan)
Date: Thu, 12 Aug 2021 00:46:34 +0530
Subject: [scikit-learn] Regarding negative value of sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score

Dear All,
I am amazed to find negative values of sklearn.metrics.r2_score and
sklearn.metrics.explained_variance_score in a model (cross-validation of an
OLS regression model). However, what amuses me more is seeing you justifying a
negative 'sklearn.metrics.r2_score' in your documentation. This does not make
sense to me. Please justify to me how squared values are negative.

Regards,
Samir K Mahajan.


From drabas.t at gmail.com Wed Aug 11 15:29:09 2021
From: drabas.t at gmail.com (Tomek Drabas)
Date: Wed, 11 Aug 2021 19:29:09 +0000
Subject: [scikit-learn] Regarding negative value of sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score

Hi Samir,

In the documentation there's a link to how the coefficient of determination is
defined: https://en.m.wikipedia.org/wiki/Coefficient_of_determination

From this it is easy to see when the values can become negative: when the model
performs significantly worse than the baseline (predicting the average for each
observation). A common misconception is that the 'squaredness' is of some
single value, but here (per the CoD's definition) it is the ratio of the
squared distances of the baseline model and the estimated one.

Hope this helps,
-Tom

Sent on the go


From reshama.stat at gmail.com Wed Aug 11 15:35:06 2021
From: reshama.stat at gmail.com (Reshama Shaikh)
Date: Wed, 11 Aug 2021 15:35:06 -0400
Subject: [scikit-learn] Regarding negative value of sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score
Message-ID: <0A284AE8-1F6C-4E62-92B9-69CBD43B9C78@gmail.com>

Hello Samir,
The tone of your email is disrespectful: for any project, but particularly so
for an open source project. It is not for this community.
Please review the Code of Conduct for this library:
http://scikit-learn.org/stable/developers/contributing.html

Regards,
Reshama
From christophe at pallier.org Thu Aug 12 02:31:01 2021
From: christophe at pallier.org (Christophe Pallier)
Date: Thu, 12 Aug 2021 08:31:01 +0200
Subject: [scikit-learn] Regarding negative value of sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score

Simple: despite its name, R2 is not a square. Look up its definition.


From samirkmahajan1972 at gmail.com Thu Aug 12 15:18:45 2021
From: samirkmahajan1972 at gmail.com (Samir K Mahajan)
Date: Fri, 13 Aug 2021 00:48:45 +0530
Subject: [scikit-learn] Regarding negative value of sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score

Dear Christophe Pallier, Reshama Shaikh and Tomek Drabas,

Thank you for your kind response. Fair enough, I go with you: R2 is not a
square. However, if you open any book of econometrics, it says R2 is a ratio
that lies between 0 and 1. *This is the constraint.* It measures the proportion
or percentage of the total variation in the response variable (Y) explained by
the regressors (Xs) in the model. The remaining proportion of variation in Y,
if any, is explained by the residual term (u). Now, sklearn.metrics.r2_score
gives me a negative value lying on a linear scale (-5.763335245921777). This
negative value breaks the *constraint*. I just want to highlight that. I think
it needs to be corrected. Rest is up to you.

I find that Reshama Shaikh is hurt by my email. I am really sorry for that.
Please note I never undermine your capabilities and initiatives. You are great
people doing great jobs. I realise that I should have been more sensible.

My regards to all of you.

Samir K Mahajan
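For reference, the definition being debated can be written out explicitly (an
illustrative addition based on the standard formula, not text from any message
in the thread):

    R^2 = 1 - SS_res / SS_tot
        = 1 - \sum_i (y_i - \hat{y}_i)^2 / \sum_i (y_i - \bar{y})^2

Both sums of squares are non-negative, so R^2 can never exceed 1, but there is
no lower bound: whenever the model's squared errors (SS_res) exceed the squared
deviations of y around its mean (SS_tot), the ratio is greater than 1 and R^2
is negative. The familiar 0-to-1 range holds only for an OLS fit with an
intercept evaluated on its own training data.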
From maykonschots at gmail.com Thu Aug 12 15:30:34 2021
From: maykonschots at gmail.com (mrschots)
Date: Thu, 12 Aug 2021 16:30:34 -0300
Subject: [scikit-learn] Regarding negative value of sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score

There is no constraint, that's the point: nothing prevents a model from making
predictions that are worse than just predicting the target's mean for every
data point. If you do so -> negative R2.

Best Regards,

--
Schots
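A tiny numerical illustration of that point (an illustrative addition, not part
of the original message; it only uses sklearn.metrics.r2_score):

    # Predictions that do worse than always predicting the mean of y_true
    # (here 2.0) push SS_res above SS_tot, so r2_score goes negative.
    from sklearn.metrics import r2_score

    y_true = [1.0, 2.0, 3.0]
    y_pred = [3.0, 3.0, 3.0]          # SS_res = 5, SS_tot = 2
    print(r2_score(y_true, y_pred))   # 1 - 5/2 = -1.5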
From drabas.t at gmail.com Thu Aug 12 15:41:02 2021
From: drabas.t at gmail.com (Tomek Drabas)
Date: Thu, 12 Aug 2021 12:41:02 -0700
Subject: [scikit-learn] Regarding negative value of sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score

In the simplest case of a simple linear regression, what you wrote holds true:
the total variation is simply the sum of the variance explained by the model
and the residual variability that cannot be explained, so the ratio always lies
between 0 and 1, e.g. here: https://online.stat.psu.edu/stat500/lesson/9/9.3

However, this would be quite hard to do for more complex models (even for a
multivariate linear regression), thus the need for a more general definition
like here: https://en.wikipedia.org/wiki/Coefficient_of_determination or here:
https://www.investopedia.com/terms/r/r-squared.asp. I can easily envision a
situation where the data has outliers (i.e. the data is not clean enough to be
used in modeling) such that it'd render a model that performs worse than a base
model of simply taking the average as the prediction for each observation.

Cheers,
-Tom
From mail at sebastianraschka.com Thu Aug 12 15:28:03 2021
From: mail at sebastianraschka.com (Sebastian Raschka)
Date: Thu, 12 Aug 2021 14:28:03 -0500
Subject: [scikit-learn] Regarding negative value of sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score
Message-ID: <7d546c00-43ef-430c-b8e0-b046eb4748d6@Spark>

The R2 function in scikit-learn works fine. A negative value means that the
regression model fits the data worse than a horizontal line representing the
sample mean. E.g. you usually get that if you are overfitting the training set
a lot and then apply that model to the test set. The econometrics book probably
didn't cover applying a model to an independent data set or test set, hence the
[0, 1] suggestion.

Cheers,
Sebastian
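A minimal sketch of the situation Sebastian describes (an illustrative
addition, not code from the thread; it assumes nothing beyond numpy and
scikit-learn, and the exact test score will vary with the random seed):

    # A model that memorizes a small, noisy training set scores perfectly on
    # the training data but is typically worse than predicting the test mean
    # on held-out data, which is exactly when r2_score turns negative.
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.metrics import r2_score

    rng = np.random.RandomState(0)
    X = rng.uniform(-1, 1, size=(40, 1))
    y = rng.normal(size=40)                  # pure noise: nothing to learn
    X_train, X_test, y_train, y_test = X[:20], X[20:], y[:20], y[20:]

    model = DecisionTreeRegressor().fit(X_train, y_train)
    print(r2_score(y_train, model.predict(X_train)))  # 1.0 (memorized)
    print(r2_score(y_test, model.predict(X_test)))    # typically well below 0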
From samirkmahajan1972 at gmail.com Thu Aug 12 16:11:17 2021
From: samirkmahajan1972 at gmail.com (Samir K Mahajan)
Date: Fri, 13 Aug 2021 01:41:17 +0530
Subject: [scikit-learn] Regarding negative value of sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score

Thanks to all of you for your kind response. Indeed, it is a great learning
experience. Yes, econometrics books too create models for prediction, and
programming really makes things better in a complex world. My understanding is
that machine learning does depend on econometrics too.

My Regards,

Samir K Mahajan
From samirkmahajan1972 at gmail.com Thu Aug 12 16:32:03 2021
From: samirkmahajan1972 at gmail.com (Samir K Mahajan)
Date: Fri, 13 Aug 2021 02:02:03 +0530
Subject: [scikit-learn] Regarding negative value of sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score

A note please (to Sebastian Raschka, mrschots).

The OLS model that I used (where the test score gave me a negative value) was
not a good fit. Initial findings showed that the regression coefficients and
the model as a whole were significant, yet, finally, it failed two econometric
tests: VIF (used for detecting multicollinearity) and the Durbin-Watson test
(used for detecting auto-correlation). *Presence of multicollinearity and
autocorrelation problems* in the model makes it unsuitable for prediction.

Regards,

Samir K Mahajan.
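For readers who want to reproduce those two diagnostics, they live in
statsmodels rather than scikit-learn. The following is an illustrative sketch
(not Samir's actual model or data), assuming the documented statsmodels API:

    # Fit an OLS model on deliberately collinear regressors, then compute the
    # variance inflation factors and the Durbin-Watson statistic.
    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor
    from statsmodels.stats.stattools import durbin_watson

    rng = np.random.RandomState(0)
    x1 = rng.normal(size=100)
    x2 = x1 + 0.01 * rng.normal(size=100)     # nearly collinear with x1
    X = sm.add_constant(np.column_stack([x1, x2]))
    y = x1 + rng.normal(size=100)

    results = sm.OLS(y, X).fit()
    print([variance_inflation_factor(X, i) for i in range(1, X.shape[1])])
    # very large VIFs flag the multicollinearity
    print(durbin_watson(results.resid))       # values near 2: little autocorrelation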
From christophe at pallier.org Fri Aug 13 03:36:06 2021
From: christophe at pallier.org (Christophe Pallier)
Date: Fri, 13 Aug 2021 09:36:06 +0200
Subject: [scikit-learn] Regarding negative value of sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score

Actually, multicollinearity and autocorrelation are problems for *inference*
more than for *prediction*. For example, if there is autocorrelation, the
residuals are not independent, and the degrees of freedom are wrong for the
tests in an OLS model (but you can use, e.g., an AR1 model).
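As a concrete, illustrative version of that suggestion (an addition, not code
from the thread), statsmodels offers a feasible-GLS regression with AR(1)
errors; the sketch below assumes its GLSAR class behaves as documented and is
not the only way to handle autocorrelated residuals:

    # Simulate a regression with AR(1) noise, then compare plain OLS standard
    # errors with those from an AR(1)-aware GLSAR fit.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.RandomState(0)
    x = rng.normal(size=200)
    e = np.zeros(200)
    for t in range(1, 200):                  # AR(1) disturbances
        e[t] = 0.8 * e[t - 1] + rng.normal()
    y = 2.0 * x + e

    X = sm.add_constant(x)
    ols = sm.OLS(y, X).fit()                 # naive standard errors
    ar1 = sm.GLSAR(y, X, rho=1).iterative_fit(maxiter=10)
    print(ols.bse)
    print(ar1.bse)                           # inference that accounts for AR(1) errors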
From samirkmahajan1972 at gmail.com Fri Aug 13 06:02:55 2021
From: samirkmahajan1972 at gmail.com (Samir K Mahajan)
Date: Fri, 13 Aug 2021 15:32:55 +0530
Subject: [scikit-learn] Regarding negative value of sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score

Dear Christophe Pallier,

When we are doing prediction, we are relying on the values of the coefficients
of the model created. We are feeding test data to the model for prediction. We
may be interested to see whether the OLS estimators (coefficients) are BLUE or
not. In the presence of autocorrelation (normally noticed in time series data),
residuals are not independent, and as such the OLS estimators are not BLUE in
the sense that they don't have minimum variance, and thus are no longer
efficient estimators. Statistical tests (t, F and χ²) may not be valid. We may
reject the model for making predictions in such a situation. We have to rely
upon other improved models. There may be issues relating to multicollinearity
(in the case of a multivariable regression model) and heteroscedasticity
(mostly seen in cross-section data) too in a model. Can we discard these tools
while using a model for prediction?

Regards,

Samir K Mahajan
From christophe at pallier.org Fri Aug 13 06:08:29 2021
From: christophe at pallier.org (Christophe Pallier)
Date: Fri, 13 Aug 2021 12:08:29 +0200
Subject: [scikit-learn] Regarding negative value of sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score

Indeed, this is basically what I told you (you do not need to copy textbook
stuff: I taught probas/stats): these are mostly problems for *inference*.
From danshiebler at gmail.com Fri Aug 13 16:24:38 2021
From: danshiebler at gmail.com (Dan Shiebler)
Date: Fri, 13 Aug 2021 16:24:38 -0400
Subject: [scikit-learn] Regarding negative value of sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score

Hey Samir, this blog post has some more details on the difference between the
square of the correlation coefficient and the coefficient of determination:
danshiebler.com/2017-06-25-metrics/

--
danshiebler.com
(973) - 518 - 0886
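A short sketch of the distinction the post makes (an illustrative addition, not
taken from the linked article): the squared Pearson correlation is never
negative, while the coefficient of determination returned by r2_score also
penalizes bias and scale errors and therefore can be.

    import numpy as np
    from sklearn.metrics import r2_score

    y_true = np.array([1.0, 2.0, 3.0, 4.0])
    y_pred = np.array([11.0, 12.0, 13.0, 14.0])   # perfectly correlated, badly offset

    print(np.corrcoef(y_true, y_pred)[0, 1] ** 2)  # 1.0
    print(r2_score(y_true, y_pred))                # -79.0: punished for the offset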
A negative means that the >>>>>> regression model fits the data worse than a horizontal line representing >>>>>> the sample mean. E.g. you usually get that if you are overfitting the >>>>>> training set a lot and then apply that model to the test set. The >>>>>> econometrics book probably didn't cover applying a model to an independent >>>>>> data or test set, hence the [0, 1] suggestion. >>>>>> >>>>>> Cheers, >>>>>> Sebastian >>>>>> >>>>>> >>>>>> On Aug 12, 2021, 2:20 PM -0500, Samir K Mahajan < >>>>>> samirkmahajan1972 at gmail.com>, wrote: >>>>>> >>>>>> >>>>>> Dear Christophe Pallier, Reshama Saikh and Tromek Drabas, >>>>>> Thank you for your kind response. Fair enough. I go with you R2 is >>>>>> not a square. However, if you open any book of econometrics, it says R2 >>>>>> is a ratio that lies between 0 and 1. *This is the constraint.* >>>>>> It measures the proportion or percentage of the total variation in >>>>>> response variable (Y) explained by the regressors (Xs) in the model . >>>>>> Remaining proportion of variation in Y, if any, is explained by the >>>>>> residual term(u) Now, sklearn.matrics. metrics.r2_score gives me a negative >>>>>> value lying on a linear scale (-5.763335245921777). This negative >>>>>> value breaks the *constraint.* I just want to highlight that. I >>>>>> think it needs to be corrected. Rest is up to you . >>>>>> >>>>>> I find that Reshama Saikh is hurt by my email. I am really sorry >>>>>> for that. Please note I never undermine your capabilities and initiatives. >>>>>> You are great people doing great jobs. I realise that I should have been >>>>>> more sensible. >>>>>> >>>>>> My regards to all of you. >>>>>> >>>>>> Samir K Mahajan >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Thu, Aug 12, 2021 at 12:02 PM Christophe Pallier < >>>>>> christophe at pallier.org> wrote: >>>>>> >>>>>>> Simple: despite its name R2 is not a square. Look up its definition. >>>>>>> >>>>>>> On Wed, 11 Aug 2021, 21:17 Samir K Mahajan, < >>>>>>> samirkmahajan1972 at gmail.com> wrote: >>>>>>> >>>>>>>> Dear All, >>>>>>>> I am amazed to find negative values of sklearn.metrics.r2_score >>>>>>>> and sklearn.metrics.explained_variance_score in a model ( cross validation >>>>>>>> of OLS regression model) >>>>>>>> However, what amuses me more is seeing you justifying negative >>>>>>>> 'sklearn.metrics.r2_score ' in your documentation. This does not >>>>>>>> make sense to me . Please justify to me how squared values are negative. >>>>>>>> >>>>>>>> Regards, >>>>>>>> Samir K Mahajan. 
>>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> scikit-learn mailing list >>>>>>>> scikit-learn at python.org >>>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>>>> >>>>>>> _______________________________________________ >>>>>>> scikit-learn mailing list >>>>>>> scikit-learn at python.org >>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>>> >>>>>> _______________________________________________ >>>>>> scikit-learn mailing list >>>>>> scikit-learn at python.org >>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>> >>>>>> _______________________________________________ >>>>>> scikit-learn mailing list >>>>>> scikit-learn at python.org >>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>> >>>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -- danshiebler.com (973) - 518 - 0886 -------------- next part -------------- An HTML attachment was scrubbed... URL: From samirkmahajan1972 at gmail.com Sat Aug 14 02:17:01 2021 From: samirkmahajan1972 at gmail.com (Samir K Mahajan) Date: Sat, 14 Aug 2021 11:47:01 +0530 Subject: [scikit-learn] Regarding negative value of sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score In-Reply-To: References: <7d546c00-43ef-430c-b8e0-b046eb4748d6@Spark> Message-ID: Dear Chrisophe, I think you are oversimplifying by saying econometrics tools are for inference. Forecasting and prediction are integral parts of econometric analysis. Econometricians forecast by inferring the right conclusion about the model . I wish to convey to you that I teach both statistics and econometrics, and am now learning ML. There is a fundamental difference among statistics, econometrics and machine learning. Regards, Samir K Mahajan On Fri, Aug 13, 2021 at 3:39 PM Christophe Pallier wrote: > Indeed , this is basically what I told you (you do not be need to copy > textbook stuff: I taught probas/stats) : these are mostly problems for > *inference*. > > On Fri, 13 Aug 2021, 12:03 Samir K Mahajan, > wrote: > >> >> Dear Christophe Pallier*,* >> >> When we are doing prediction, we are relying on the values of the >> coefficients of the model created. We are feeding test data on the model >> for prediction. We may be nterested to see if the OLS >> estimators(coefficients) are BLUE or not. In the presence of >> autocorrelation (normally noticed in time series data), residuals are not >> independent, and as such the OLS estimators are not BLUE in the sense that >> they don't have minimum variance, and thus no more efficient estimators. >> Statistical tests (t, F and *?*2) may not be valid. We may reject the >> model to make predictions in such a situation. . We have to rely upon >> other improved models. 
There may be issues relating to multicollinearity >> (in case of multivariable regression model) and heteroscedasticity (mostly >> seen in cross-section data) too in a model. Can we discard these tools >> while predicting a model? >> >> Regards, >> >> Samir K Mahajan >> >> >> On Fri, Aug 13, 2021 at 1:07 PM Christophe Pallier < >> christophe at pallier.org> wrote: >> >>> Actually, multicollinearity and autocorrelation are problems for >>> *inference* more than for *prediction*. For example, if there is >>> autocorrelation, the residuals are not independent, and the degrees of >>> freedom are wrong for the tests in an OLS model (but you can use, e.g., an >>> AR1 model). >>> >>> On Thu, 12 Aug 2021, 22:32 Samir K Mahajan, >>> wrote: >>> >>>> A note please (to Sebastian Raschka, mrschots). >>>> >>>> >>>> The OLS model that I used ( where the test score gave me a negative >>>> value) was not a good fit. Initial findings showed that t*he >>>> regression coefficients and the model as a whole were significant, *yet >>>> , finally , it failed in two econometrics tests such as VIF (used for >>>> detecting multicollinearity ) and Durbin-Watson test ( used for detecting >>>> auto-correlation). *Presence of multicollinearity and autocorrelation >>>> problems * in the model make it unsuitable for prediction. >>>> Regards, >>>> >>>> Samir K Mahajan. >>>> >>>> On Fri, Aug 13, 2021 at 1:41 AM Samir K Mahajan < >>>> samirkmahajan1972 at gmail.com> wrote: >>>> >>>>> Thanks to all of you for your kind response. Indeed, it is a >>>>> great learning experience. Yes, econometrics books too create models for >>>>> prediction, and programming really makes things better in a complex >>>>> world. My understanding is that machine learning does depend on >>>>> econometrics too. >>>>> >>>>> My Regards, >>>>> >>>>> Samir K Mahajan >>>>> >>>>> On Fri, Aug 13, 2021 at 1:21 AM Sebastian Raschka < >>>>> mail at sebastianraschka.com> wrote: >>>>> >>>>>> The R2 function in scikit-learn works fine. A negative means that the >>>>>> regression model fits the data worse than a horizontal line representing >>>>>> the sample mean. E.g. you usually get that if you are overfitting the >>>>>> training set a lot and then apply that model to the test set. The >>>>>> econometrics book probably didn't cover applying a model to an independent >>>>>> data or test set, hence the [0, 1] suggestion. >>>>>> >>>>>> Cheers, >>>>>> Sebastian >>>>>> >>>>>> >>>>>> On Aug 12, 2021, 2:20 PM -0500, Samir K Mahajan < >>>>>> samirkmahajan1972 at gmail.com>, wrote: >>>>>> >>>>>> >>>>>> Dear Christophe Pallier, Reshama Saikh and Tromek Drabas, >>>>>> Thank you for your kind response. Fair enough. I go with you R2 is >>>>>> not a square. However, if you open any book of econometrics, it says R2 >>>>>> is a ratio that lies between 0 and 1. *This is the constraint.* >>>>>> It measures the proportion or percentage of the total variation in >>>>>> response variable (Y) explained by the regressors (Xs) in the model . >>>>>> Remaining proportion of variation in Y, if any, is explained by the >>>>>> residual term(u) Now, sklearn.matrics. metrics.r2_score gives me a negative >>>>>> value lying on a linear scale (-5.763335245921777). This negative >>>>>> value breaks the *constraint.* I just want to highlight that. I >>>>>> think it needs to be corrected. Rest is up to you . >>>>>> >>>>>> I find that Reshama Saikh is hurt by my email. I am really sorry >>>>>> for that. Please note I never undermine your capabilities and initiatives. 
>>>>>> You are great people doing great jobs. I realise that I should have been >>>>>> more sensible. >>>>>> >>>>>> My regards to all of you. >>>>>> >>>>>> Samir K Mahajan >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Thu, Aug 12, 2021 at 12:02 PM Christophe Pallier < >>>>>> christophe at pallier.org> wrote: >>>>>> >>>>>>> Simple: despite its name R2 is not a square. Look up its definition. >>>>>>> >>>>>>> On Wed, 11 Aug 2021, 21:17 Samir K Mahajan, < >>>>>>> samirkmahajan1972 at gmail.com> wrote: >>>>>>> >>>>>>>> Dear All, >>>>>>>> I am amazed to find negative values of sklearn.metrics.r2_score >>>>>>>> and sklearn.metrics.explained_variance_score in a model ( cross validation >>>>>>>> of OLS regression model) >>>>>>>> However, what amuses me more is seeing you justifying negative >>>>>>>> 'sklearn.metrics.r2_score ' in your documentation. This does not >>>>>>>> make sense to me . Please justify to me how squared values are negative. >>>>>>>> >>>>>>>> Regards, >>>>>>>> Samir K Mahajan. >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> scikit-learn mailing list >>>>>>>> scikit-learn at python.org >>>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>>>> >>>>>>> _______________________________________________ >>>>>>> scikit-learn mailing list >>>>>>> scikit-learn at python.org >>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>>> >>>>>> _______________________________________________ >>>>>> scikit-learn mailing list >>>>>> scikit-learn at python.org >>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>> >>>>>> _______________________________________________ >>>>>> scikit-learn mailing list >>>>>> scikit-learn at python.org >>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>> >>>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.caorsi at l2f.ch Sat Aug 14 09:12:18 2021 From: m.caorsi at l2f.ch (Matteo Caorsi) Date: Sat, 14 Aug 2021 13:12:18 +0000 Subject: [scikit-learn] random forests and multil-class probability In-Reply-To: References: <031152d2-ca59-69ee-b04c-125fda724105@gmail.com> <7D53A0FD-EB5E-4C27-966B-D6954EEF7398@gmail.com> Message-ID: Greetings! I am currently out of office, with limited access to emails, till August the 30th. Please contact support at giotto.ai for technical issue concerning Giotto Platform. Otherwise, I will reply to your email as soon as possible upon my return. With best regards, Matteo On 27 Jul 2021, at 12:42, Brown J.B. via scikit-learn wrote: 2021?7?27?(?) 12:03 Guillaume Lema?tre : As far that I remember, `precision_recall_curve` and `roc_curve` do not support multi class. They are design to work only with binary classification. 
Correct, the TPR-FPR curve (ROC) was originally intended for tuning a free parameter, in signal detection, and is a binary-type metric. For ML problems, it lets you tune/determine an estimator's output value threshold (e.g., a probability or a raw discriminant value such as in SVM) for arriving an optimized model that will be used to give a final, binary-discretized answer in new prediction tasks. Hope this helps, J.B. _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.caorsi at l2f.ch Sat Aug 14 09:12:23 2021 From: m.caorsi at l2f.ch (Matteo Caorsi) Date: Sat, 14 Aug 2021 13:12:23 +0000 Subject: [scikit-learn] random forests and multil-class probability In-Reply-To: References: Message-ID: Greetings! I am currently out of office, with limited access to emails, till August the 30th. Please contact support at giotto.ai for technical issue concerning Giotto Platform. Otherwise, I will reply to your email as soon as possible upon my return. With best regards, Matteo On 27 Jul 2021, at 11:31, Sole Galli via scikit-learn wrote: Thank you! I was confused because in the multiclass documentation it says that for those estimators that have multiclass support built in, like Decision trees and Random Forests, then we do not need to use the wrapper classes like the OnevsRest. Thus I have the following question, if I want to determine the PR curves or the ROC curve, say with micro-average, do I need to wrap them with the 1 vs rest? Or it does not matter? The probability values do change slightly. Thank you! ??????? Original Message ??????? On Tuesday, July 27th, 2021 at 11:22 AM, Guillaume Lema?tre wrote: On 27 Jul 2021, at 11:08, Sole Galli via scikit-learn scikit-learn at python.org wrote: Hello community, Do I understand correctly that Random Forests are trained as a 1 vs rest when the target has more than 2 classes? Say the target takes values 0, 1 and 2, then the model would train 3 estimators 1 per class under the hood?. Each decision tree of the forest is natively supporting multi class. The predict_proba output is an array with 3 columns, containing the probability of each class. If it is 1 vs rest. am I correct to assume that the sum of the probabilities for the 3 classes should not necessarily add up to 1? are they normalized? how is it done so that they do add up to 1? According to the above answer, the sum for each row of the array given by `predict_proba` will sum to 1. According to the documentation, the probabilities are computed as: The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the trees in the forest. The class probability of a single tree is the fraction of samples of the same class in a leaf. Thank you Sole scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... 
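To make the quoted answer about predict_proba concrete, a minimal sketch on the iris data (three classes, plain scikit-learn, no one-vs-rest wrapper): the forest handles the classes natively, returns one probability column per class, and each row sums to 1.

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_iris(return_X_y=True)   # 3 classes, handled natively by the trees
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

    proba = clf.predict_proba(X)
    print(proba.shape)                           # (150, 3): one column per class
    print(np.allclose(proba.sum(axis=1), 1.0))   # True: each row is a normalized distribution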
URL: From m.caorsi at l2f.ch Sat Aug 14 09:13:25 2021 From: m.caorsi at l2f.ch (Matteo Caorsi) Date: Sat, 14 Aug 2021 13:13:25 +0000 Subject: [scikit-learn] random forests and multil-class probability In-Reply-To: References: <031152d2-ca59-69ee-b04c-125fda724105@gmail.com> <7D53A0FD-EB5E-4C27-966B-D6954EEF7398@gmail.com> Message-ID: Greetings! I am currently out of office, with limited access to emails, till August the 30th. Please contact support at giotto.ai for technical issues concerning Giotto Platform. Otherwise, I will reply to your email as soon as possible upon my return. With best regards, Matteo On 27 Jul 2021, at 12:42, Brown J.B. via scikit-learn wrote: 2021?7?27?(?) 12:03 Guillaume Lema?tre : As far that I remember, `precision_recall_curve` and `roc_curve` do not support multi class. They are design to work only with binary classification. Correct, the TPR-FPR curve (ROC) was originally intended for tuning a free parameter, in signal detection, and is a binary-type metric. For ML problems, it lets you tune/determine an estimator's output value threshold (e.g., a probability or a raw discriminant value such as in SVM) for arriving an optimized model that will be used to give a final, binary-discretized answer in new prediction tasks. Hope this helps, J.B. _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.caorsi at l2f.ch Sat Aug 14 09:13:28 2021 From: m.caorsi at l2f.ch (Matteo Caorsi) Date: Sat, 14 Aug 2021 13:13:28 +0000 Subject: [scikit-learn] random forests and multil-class probability In-Reply-To: References: Message-ID: <10E3C9FF-9280-49BE-A617-41B9D0CFE417@l2f.ch> Greetings! I am currently out of office, with limited access to emails, till August the 30th. Please contact support at giotto.ai for technical issues concerning Giotto Platform. Otherwise, I will reply to your email as soon as possible upon my return. With best regards, Matteo On 27 Jul 2021, at 11:31, Sole Galli via scikit-learn wrote: Thank you! I was confused because in the multiclass documentation it says that for those estimators that have multiclass support built in, like Decision trees and Random Forests, then we do not need to use the wrapper classes like the OnevsRest. Thus I have the following question, if I want to determine the PR curves or the ROC curve, say with micro-average, do I need to wrap them with the 1 vs rest? Or it does not matter? The probability values do change slightly. Thank you! ??????? Original Message ??????? On Tuesday, July 27th, 2021 at 11:22 AM, Guillaume Lema?tre wrote: On 27 Jul 2021, at 11:08, Sole Galli via scikit-learn scikit-learn at python.org wrote: Hello community, Do I understand correctly that Random Forests are trained as a 1 vs rest when the target has more than 2 classes? Say the target takes values 0, 1 and 2, then the model would train 3 estimators 1 per class under the hood?. Each decision tree of the forest is natively supporting multi class. The predict_proba output is an array with 3 columns, containing the probability of each class. If it is 1 vs rest. am I correct to assume that the sum of the probabilities for the 3 classes should not necessarily add up to 1? are they normalized? how is it done so that they do add up to 1? According to the above answer, the sum for each row of the array given by `predict_proba` will sum to 1. 
According to the documentation, the probabilities are computed as: The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the trees in the forest. The class probability of a single tree is the fraction of samples of the same class in a leaf. Thank you Sole scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From francois.dion at gmail.com Sat Aug 14 09:52:00 2021 From: francois.dion at gmail.com (Francois Dion) Date: Sat, 14 Aug 2021 09:52:00 -0400 Subject: [scikit-learn] random forests and multil-class probability In-Reply-To: <7D53A0FD-EB5E-4C27-966B-D6954EEF7398@gmail.com> References: <7D53A0FD-EB5E-4C27-966B-D6954EEF7398@gmail.com> Message-ID: Yellowbrick has multi label precision recall curves and multiclass roc/auc builtin: https://www.scikit-yb.org/en/latest/api/classifier/rocauc.html Sent from my iPad > On Jul 27, 2021, at 6:03 AM, Guillaume Lema?tre wrote: > > ?As far that I remember, `precision_recall_curve` and `roc_curve` do not support multi class. They are design to work only with binary classification. > Then, we provide an example for precision-recall that shows one way to compute precision-recall curve via averaging: https://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html#sphx-glr-auto-examples-model-selection-plot-precision-recall-py > -- > Guillaume Lemaitre > Scikit-learn @ Inria Foundation > https://glemaitre.github.io/ > >> On 27 Jul 2021, at 11:42, Sole Galli via scikit-learn wrote: >> >> Thank you! >> >> So when in the multiclass document says that for the algorithms that support intrinsically multiclass, which are listed here, when it says that they do not need to be wrapped by the OnevsRest, it means that there is no need, because they can indeed handle multi class, each one in their own way. >> >> But, if I want to plot PR curves or ROC curves, then I do need to wrap them because those metrics are calculated as a 1 vs rest manner, and this is not how it is handled by the algos. Is my understanding correct? >> >> Thank you! >> >> ??????? Original Message ??????? >> On Tuesday, July 27th, 2021 at 11:33 AM, Nicolas Hug wrote: >>> To add to Guillaume's answer: the native multiclass support for forests/trees is described here: https://scikit-learn.org/stable/modules/tree.html#multi-output-problems >>> >>> It's not a one-vs-rest strategy and can be summed up as: >>> >>> >>>> Store n output values in leaves, instead of 1; >>>> >>>> Use splitting criteria that compute the average reduction across all n outputs. >>>> >>> >>> >>> Nicolas >>> >>> On 27/07/2021 10:22, Guillaume Lema?tre wrote: >>>>>> On 27 Jul 2021, at 11:08, Sole Galli via scikit-learn wrote: >>>>>> >>>>>> Hello community, >>>>>> >>>>>> Do I understand correctly that Random Forests are trained as a 1 vs rest when the target has more than 2 classes? Say the target takes values 0, 1 and 2, then the model would train 3 estimators 1 per class under the hood?. >>>>> Each decision tree of the forest is natively supporting multi class. >>>>> >>>>> The predict_proba output is an array with 3 columns, containing the probability of each class. If it is 1 vs rest. 
am I correct to assume that the sum of the probabilities for the 3 classes should not necessarily add up to 1? are they normalized? how is it done so that they do add up to 1? >>>> According to the above answer, the sum for each row of the array given by `predict_proba` will sum to 1. >>>> According to the documentation, the probabilities are computed as: >>>> >>>> The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the trees in the forest. The class probability of a single tree is the fraction of samples of the same class in a leaf. >>>> >>>>> Thank you >>>>> Sole >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From fernando.wittmann at gmail.com Sat Aug 14 10:04:24 2021 From: fernando.wittmann at gmail.com (Fernando Marcos Wittmann) Date: Sat, 14 Aug 2021 11:04:24 -0300 Subject: [scikit-learn] Regarding negative value of sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score In-Reply-To: References: <7d546c00-43ef-430c-b8e0-b046eb4748d6@Spark> Message-ID: Hi Samir, the following visualization might be useful for gaining intuition on the meaning of a negative r2: https://gist.github.com/WittmannF/02060b45ce3ec9239898a5b91df2564e A negative r2 is reflects into a model predicting the opposite trend of the data. On Sat, Aug 14, 2021, 03:17 Samir K Mahajan wrote: > Dear Chrisophe, > I think you are oversimplifying by saying econometrics tools are for > inference. Forecasting and prediction are integral parts of econometric > analysis. Econometricians forecast by inferring the right conclusion > about the model . I wish to convey to you that I teach both > statistics and econometrics, and am now learning ML. There is a > fundamental difference among statistics, econometrics and machine > learning. > Regards, > > Samir K Mahajan > > On Fri, Aug 13, 2021 at 3:39 PM Christophe Pallier > wrote: > >> Indeed , this is basically what I told you (you do not be need to copy >> textbook stuff: I taught probas/stats) : these are mostly problems for >> *inference*. >> >> On Fri, 13 Aug 2021, 12:03 Samir K Mahajan, >> wrote: >> >>> >>> Dear Christophe Pallier*,* >>> >>> When we are doing prediction, we are relying on the values of the >>> coefficients of the model created. We are feeding test data on the model >>> for prediction. We may be nterested to see if the OLS >>> estimators(coefficients) are BLUE or not. In the presence of >>> autocorrelation (normally noticed in time series data), residuals are not >>> independent, and as such the OLS estimators are not BLUE in the sense that >>> they don't have minimum variance, and thus no more efficient estimators. >>> Statistical tests (t, F and *?*2) may not be valid. We may reject the >>> model to make predictions in such a situation. . 
We have to rely upon >>> other improved models. There may be issues relating to multicollinearity >>> (in case of multivariable regression model) and heteroscedasticity (mostly >>> seen in cross-section data) too in a model. Can we discard these tools >>> while predicting a model? >>> >>> Regards, >>> >>> Samir K Mahajan >>> >>> >>> On Fri, Aug 13, 2021 at 1:07 PM Christophe Pallier < >>> christophe at pallier.org> wrote: >>> >>>> Actually, multicollinearity and autocorrelation are problems for >>>> *inference* more than for *prediction*. For example, if there is >>>> autocorrelation, the residuals are not independent, and the degrees of >>>> freedom are wrong for the tests in an OLS model (but you can use, e.g., an >>>> AR1 model). >>>> >>>> On Thu, 12 Aug 2021, 22:32 Samir K Mahajan, < >>>> samirkmahajan1972 at gmail.com> wrote: >>>> >>>>> A note please (to Sebastian Raschka, mrschots). >>>>> >>>>> >>>>> The OLS model that I used ( where the test score gave me a >>>>> negative value) was not a good fit. Initial findings showed that t*he >>>>> regression coefficients and the model as a whole were significant, *yet >>>>> , finally , it failed in two econometrics tests such as VIF (used for >>>>> detecting multicollinearity ) and Durbin-Watson test ( used for detecting >>>>> auto-correlation). *Presence of multicollinearity and >>>>> autocorrelation problems * in the model make it unsuitable for >>>>> prediction. >>>>> Regards, >>>>> >>>>> Samir K Mahajan. >>>>> >>>>> On Fri, Aug 13, 2021 at 1:41 AM Samir K Mahajan < >>>>> samirkmahajan1972 at gmail.com> wrote: >>>>> >>>>>> Thanks to all of you for your kind response. Indeed, it is a >>>>>> great learning experience. Yes, econometrics books too create models for >>>>>> prediction, and programming really makes things better in a complex >>>>>> world. My understanding is that machine learning does depend on >>>>>> econometrics too. >>>>>> >>>>>> My Regards, >>>>>> >>>>>> Samir K Mahajan >>>>>> >>>>>> On Fri, Aug 13, 2021 at 1:21 AM Sebastian Raschka < >>>>>> mail at sebastianraschka.com> wrote: >>>>>> >>>>>>> The R2 function in scikit-learn works fine. A negative means that >>>>>>> the regression model fits the data worse than a horizontal line >>>>>>> representing the sample mean. E.g. you usually get that if you are >>>>>>> overfitting the training set a lot and then apply that model to the test >>>>>>> set. The econometrics book probably didn't cover applying a model to an >>>>>>> independent data or test set, hence the [0, 1] suggestion. >>>>>>> >>>>>>> Cheers, >>>>>>> Sebastian >>>>>>> >>>>>>> >>>>>>> On Aug 12, 2021, 2:20 PM -0500, Samir K Mahajan < >>>>>>> samirkmahajan1972 at gmail.com>, wrote: >>>>>>> >>>>>>> >>>>>>> Dear Christophe Pallier, Reshama Saikh and Tromek Drabas, >>>>>>> Thank you for your kind response. Fair enough. I go with you R2 is >>>>>>> not a square. However, if you open any book of econometrics, it says R2 >>>>>>> is a ratio that lies between 0 and 1. *This is the constraint.* >>>>>>> It measures the proportion or percentage of the total variation in >>>>>>> response variable (Y) explained by the regressors (Xs) in the model . >>>>>>> Remaining proportion of variation in Y, if any, is explained by the >>>>>>> residual term(u) Now, sklearn.matrics. metrics.r2_score gives me a negative >>>>>>> value lying on a linear scale (-5.763335245921777). This negative >>>>>>> value breaks the *constraint.* I just want to highlight that. I >>>>>>> think it needs to be corrected. Rest is up to you . 
>>>>>>> >>>>>>> I find that Reshama Saikh is hurt by my email. I am really sorry >>>>>>> for that. Please note I never undermine your capabilities and initiatives. >>>>>>> You are great people doing great jobs. I realise that I should have been >>>>>>> more sensible. >>>>>>> >>>>>>> My regards to all of you. >>>>>>> >>>>>>> Samir K Mahajan >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Thu, Aug 12, 2021 at 12:02 PM Christophe Pallier < >>>>>>> christophe at pallier.org> wrote: >>>>>>> >>>>>>>> Simple: despite its name R2 is not a square. Look up its definition. >>>>>>>> >>>>>>>> On Wed, 11 Aug 2021, 21:17 Samir K Mahajan, < >>>>>>>> samirkmahajan1972 at gmail.com> wrote: >>>>>>>> >>>>>>>>> Dear All, >>>>>>>>> I am amazed to find negative values of sklearn.metrics.r2_score >>>>>>>>> and sklearn.metrics.explained_variance_score in a model ( cross validation >>>>>>>>> of OLS regression model) >>>>>>>>> However, what amuses me more is seeing you justifying negative >>>>>>>>> 'sklearn.metrics.r2_score ' in your documentation. This does not >>>>>>>>> make sense to me . Please justify to me how squared values are negative. >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Samir K Mahajan. >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> scikit-learn mailing list >>>>>>>>> scikit-learn at python.org >>>>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> scikit-learn mailing list >>>>>>>> scikit-learn at python.org >>>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>>>> >>>>>>> _______________________________________________ >>>>>>> scikit-learn mailing list >>>>>>> scikit-learn at python.org >>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>>> >>>>>>> _______________________________________________ >>>>>>> scikit-learn mailing list >>>>>>> scikit-learn at python.org >>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>>> >>>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From adrin.jalali at gmail.com Mon Aug 16 05:56:57 2021 From: adrin.jalali at gmail.com (Adrin) Date: Mon, 16 Aug 2021 11:56:57 +0200 Subject: [scikit-learn] Pandas copy-on-write proposal Message-ID: Hi there, I'd like to bring your attention to a proposal being discussed among pandas developers, regarding copy-on-write semantics. A very short summary of the proposal, according to the document , is: *- The result of any indexing operation (subsetting a DataFrame or Series in any way, i.e. 
including accessing a DataFrame column as a Series) or any method returning a new DataFrame or Series, always behaves as if it were a copy in terms of user API.- We implement Copy-on-Write (as implementation detail). This way, we can actually use views as much as possible under the hood, while ensuring the user API behaves as a copy.* *- As a consequence, if you want to modify an object (DataFrame or Series), the only way to do this is to modify that object itself directly.* *This addresses multiple aspects: 1) a clear and consistent user API (a clear rule: any subset or returned series/dataframe always behaves as a copy of the original, and thus never modifies the original) and 2) improving performance by avoiding excessive copies (eg a chained method workflow would no longer return an actual data copy at each step). Because every single indexing step behaves as a copy, this also means that with this proposal, ?chained assignment? (with multiple setitem steps) will never work.* You can also read the related discussion on the pandas mailing list here . It would be nice for us to think about the implications of this proposal on our work related to supporting pandas dataframes. Cheers, Adrin -------------- next part -------------- An HTML attachment was scrubbed... URL: From petrizzo at gmail.com Mon Aug 16 17:30:33 2021 From: petrizzo at gmail.com (Mariangela Petrizzo) Date: Mon, 16 Aug 2021 17:30:33 -0400 Subject: [scikit-learn] Spanish translation proposal for Scikit-Learn documentation In-Reply-To: References: Message-ID: <371E9F54-EB4F-45C7-AE19-07E1E769BC40@getmailspring.com> Hello everyone! We are writing briefly to announce that the Spanish translation of the Sci-kit learn 0.24.2 documentation is now available from: https://qu4nt.github.io/sklearn-doc-es/index.html Soon we will update in that repository the suggested workflow for future translations of this documentation. We are now in the final phase of this work, debugging and fine-tuning the last details, but we update the html version daily. It has been a great pleasure for our team to support the Spanish community of users of this library and the Python community in general, with our work. Mari?ngela Petrizzo http://qu4nt.com Mar?a ?ngela Petrizzo P?ez About Me (about.me/petrizzo) Desc?rgate Redes para la Comprensi?n de la Pol?tica (http://www.elperroylarana.gob.ve/redes-para-la-comprension-de-la-politica/) Usuario Linux # 498889 Miembro Red de Polit?logas - #NoSinMujeres (https://www.nosinmujeres.com/) Publicaciones (https://hotelescuela.academia.edu/MariangelaPetrizzoPaez) ORCID (http://orcid.org/0000-0001-9483-4185) PEII - Nivel B On feb. 9 2021, at 4:15 pm, Mariangela Petrizzo wrote: > Dear Scikit-Learn team! > > > > I am Mari?ngela Petrizzo, I am writing to you as a member of Qu4nt, a team dedicated to the use of open source tools for the development of software solutions with emphasis on data science. We have a strong interest in translating the Scikit-Learn documentation into Spanish. > Our team is made up of members from various scientific fields, including some university faculty in linguistics and computer sciences, with a wide experience in Python as well as several libraries used for data analysis and machine learning, and also contribute locally as evangelists of its use in Spanish-speaking communities, in particular, the leader initiated the translation of some Software Carpentry lessons into Spanish. 
> That is why we have been discussing the opportunity to offer our contribution to the Python project, promoting the translation into Spanish of the documentation of some of the libraries with the greatest impact in our areas of interest. Talking with David Mertz, to whom we are sending a copy of this email, we have explored options, and the idea of working with Scikit-learn has really seemed to be an exceptional opportunity for all of us and the community. He's very enthusiastic about the idea of generating a spanish translation of Scientific Python libraries like Scikit-learn. > For us, this translation project has to be done through a completely open work on Github, taking as reference the restructured text sources for Sphinx from a git fork, using the tools provided by Sphinx itself for internationalization: https://www.sphinx-doc.org/en/1.8/intl.html, and applying tags to perform planned updates. In addition, as with any open source project, the main mechanism for quality assurance comes from the users themselves who will have the channels available for submitting issues. Our intention is to secure all the infrastructure and mechanisms to make this possible: making the process transparent through Github, using as much as possible tools like Transifex to facilitate participation, and providing guidelines for contributors as part of the project. > Of course, this project cannot be realized without your support. We therefore come to you to inquire about your willingness to accompany and support this project. > We would love to hear your feedback on our proposal. > Best regards, > > > Mari?ngela > > > -- > > Mar?a ?ngela Petrizzo P?ez > > about.me/petrizzo (https://about.me/petrizzo?promo=email_sig&utm_source=product&utm_medium=email_sig&utm_campaign=edit_panel&utm_content=plaintext) > > > > > > > > > > > > > Desc?rgate Redes para la Comprensi?n de la Pol?tica (http://www.elperroylarana.gob.ve/redes-para-la-comprension-de-la-politica/) > > > > > A quienes conservan la esperanza que no es lo ?ltimo que se pierde, sino lo primero que se siembra y, por tanto, lo m?s radical. > > > El ?nico modo de vencer el secuestro del conocimiento > es comprender sus razones. > La manera de revertirlo, > es hacernos hackers de los secuestros cotidianos > a cambio de no morir sin saber lo que somos > > ?Piensa para vivir, > act?a para hackear! > Cada d?a, una acci?n procom?n a la vez. > > > ?Tengo horror de aquellos cuyas palabras van m?s all? que sus actos? > > Albert Camus > > > > ?El poder, lejos de estorbar al saber, lo produce.? - Michael Foucault > Usuario Linux # 498889 > Miembro Red de Polit?logas - #NoSinMujeres (http://www.nosinmujeres.com/) > https://hotelescuela.academia.edu/MariangelaPetrizzoPaez > http://orcid.org/0000-0001-9483-4185 > PEII - Nivel B > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From reshama.stat at gmail.com Tue Aug 17 09:03:08 2021 From: reshama.stat at gmail.com (Reshama Shaikh) Date: Tue, 17 Aug 2021 09:03:08 -0400 Subject: [scikit-learn] Spanish translation proposal for Scikit-Learn documentation In-Reply-To: <371E9F54-EB4F-45C7-AE19-07E1E769BC40@getmailspring.com> References: <371E9F54-EB4F-45C7-AE19-07E1E769BC40@getmailspring.com> Message-ID: Hi Mari?ngela, That's an impressive accomplishment! Congratulations. 
A PR can be submitted to add the Spanish translation link to this page in scikit-learn documentation: https://scikit-learn.org/dev/related_projects.html#translations-of-scikit-learn-documentation Reshama Shaikh she/her Blog | Twitter | LinkedIn | GitHub Data Umbrella NYC PyLadies On Mon, Aug 16, 2021 at 5:32 PM Mariangela Petrizzo wrote: > > Hello everyone! > > We are writing briefly to announce that the Spanish translation of the > Sci-kit learn 0.24.2 documentation is now available from: > > https://qu4nt.github.io/sklearn-doc-es/index.html > > Soon we will update in that repository the suggested workflow for future > translations of this documentation. We are now in the final phase of this > work, debugging and fine-tuning the last details, but we update the html > version daily. > > It has been a great pleasure for our team to support the Spanish community > of users of this library and the Python community in general, with our work. > > > Mari?ngela Petrizzo > http://qu4nt.com > > Mar?a ?ngela Petrizzo P?ezAbout Me > Desc?rgate Redes para la Comprensi?n de la Pol?tica > > Usuario Linux # 498889 > Miembro Red de Polit?logas - #NoSinMujeres > Publicaciones > ORCID PEII - Nivel B > On feb. 9 2021, at 4:15 pm, Mariangela Petrizzo > wrote: > > Dear Scikit-Learn team! > > > > I am Mari?ngela Petrizzo, I am writing to you as a member of Qu4nt, a team > dedicated to the use of open source tools for the development of software > solutions with emphasis on data science. We have a strong interest in > translating the Scikit-Learn documentation into Spanish. > > Our team is made up of members from various scientific fields, including > some university faculty in linguistics and computer sciences, with a wide > experience in Python as well as several libraries used for data analysis > and machine learning, and also contribute locally as evangelists of its > use in Spanish-speaking communities, in particular, the leader initiated > the translation of some Software Carpentry lessons into Spanish. > > That is why we have been discussing the opportunity to offer our > contribution to the Python project, promoting the translation into Spanish > of the documentation of some of the libraries with the greatest impact in > our areas of interest. Talking with David Mertz, to whom we are sending a > copy of this email, we have explored options, and the idea of working with > Scikit-learn has really seemed to be an exceptional opportunity for all of > us and the community. He's very enthusiastic about the idea of generating a > spanish translation of Scientific Python libraries like Scikit-learn. > > For us, this translation project has to be done through a completely open > work on Github, taking as reference the restructured text sources for > Sphinx from a git fork, using the tools provided by Sphinx itself for > internationalization: https://www.sphinx-doc.org/en/1.8/intl.html > , and applying tags to > perform planned updates. In addition, as with any open source project, the > main mechanism for quality assurance comes from the users themselves who > will have the channels available for submitting issues. Our intention is to > secure all the infrastructure and mechanisms to make this possible: making > the process transparent through Github, using as much as possible tools > like Transifex to facilitate participation, and providing guidelines for > contributors as part of the project. > > Of course, this project cannot be realized without your support. 
We > therefore come to you to inquire about your willingness to accompany and > support this project. > > We would love to hear your feedback on our proposal. > > Best regards, > > > > Mari?ngela > > > -- > > > > Mar?a ?ngela Petrizzo P?ez > [image: https://] > [image: https://]about.me/petrizzo > > Desc?rgate Redes para la Comprensi?n de la Pol?tica > > > *A quienes conservan la esperanza que no es lo ?ltimo que se pierde, sino > lo primero que se siembra y, por tanto, lo m?s radical.* > > > El ?nico modo de vencer el secuestro del conocimiento > es comprender sus razones. > La manera de revertirlo, > es hacernos hackers de los secuestros cotidianos > a cambio de no morir sin saber lo que somos > > ?Piensa para vivir, > act?a para hackear! > Cada d?a, una acci?n procom?n a la vez. > > > *?Tengo horror de aquellos cuyas palabras van m?s all? que sus actos?* > *Albert Camus* > > *?El poder, lejos de estorbar al saber, lo produce.? - Michael Foucault* > > > Usuario Linux # 498889 > Miembro Red de Polit?logas - #NoSinMujeres > https://hotelescuela.academia.edu/MariangelaPetrizzoPaez > http://orcid.org/0000-0001-9483-4185 > PEII - Nivel B > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From johngrenci61 at yahoo.com Fri Aug 20 18:15:09 2021 From: johngrenci61 at yahoo.com (John Grenci) Date: Fri, 20 Aug 2021 22:15:09 +0000 (UTC) Subject: [scikit-learn] cant install scikit-learn References: <1717831625.594362.1629497709231.ref@mail.yahoo.com> Message-ID: <1717831625.594362.1629497709231@mail.yahoo.com> Hello, hoping somebody can help me. ? I have tried.. what seems like everything. ? I get an OS error ? ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory: 'C:\\Users\\ameri\\AppData\\Local\\Packages\\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\\LocalCache\\local-packages\\Python39\\site-packages\\sklearn\\datasets\\tests\\data\\openml\\292\\api-v1-json-data-list-data_name-australian-limit-2-data_version-1-status-deactivated.json.gz' HINT: This error might have occurred since this system does not have Windows Long Path support enabled. You can find information on how to enable this at?https://pip.pypa.io/warnings/enable-long-paths ? ? I tried enabling more than 260 characters as suggested, but that did not help? gave me a different error actually. ? I don?t think it has to do with bits, as my computer is 64 bit. I also tried pip install sklearn ? I am at a loss at this point. ? PS- I am ?not the most techy of person.? also, looked everywhere online that I could can somebody help? ? Thanks John ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From mablue92 at gmail.com Sun Aug 22 02:17:05 2021 From: mablue92 at gmail.com (Masoud Azizi) Date: Sun, 22 Aug 2021 10:47:05 +0430 Subject: [scikit-learn] how the skpot optimize avoids flats Message-ID: Hi to all Im new in sk mailing list :) I need your help about that how hyperoption avoids this flat places? is there a code address to findout that? see the attachment -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: unnamed.png Type: image/png Size: 21512 bytes Desc: not available URL: From skacanski at gmail.com Sun Aug 22 16:24:45 2021 From: skacanski at gmail.com (Sasha Kacanski) Date: Sun, 22 Aug 2021 16:24:45 -0400 Subject: [scikit-learn] cant install scikit-learn In-Reply-To: <1717831625.594362.1629497709231@mail.yahoo.com> References: <1717831625.594362.1629497709231.ref@mail.yahoo.com> <1717831625.594362.1629497709231@mail.yahoo.com> Message-ID: How about a Linux desktop for a change? I suggest Debian or Arch! On Fri, Aug 20, 2021 at 6:17 PM John Grenci via scikit-learn < scikit-learn at python.org> wrote: > Hello, hoping somebody can help me. > > I have tried what seems like everything. > > I get an OS error: > > ERROR: Could not install packages due to an OSError: [Errno 2] No such > file or directory: > 'C:\\Users\\ameri\\AppData\\Local\\Packages\\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\\LocalCache\\local-packages\\Python39\\site-packages\\sklearn\\datasets\\tests\\data\\openml\\292\\api-v1-json-data-list-data_name-australian-limit-2-data_version-1-status-deactivated.json.gz' > > HINT: This error might have occurred since this system does not have > Windows Long Path support enabled. You can find information on how to > enable this at https://pip.pypa.io/warnings/enable-long-paths > > I tried enabling more than 260 characters as suggested, but that did not > help; it gave me a different error, actually. > > I don't think it has to do with bits, as my computer is 64 bit.
> > I also tried pip install sklearn > > > I am at a loss at this point. > > > PS- I am not the most techy of person. also, looked everywhere online > that I could > > can somebody help? > > > Thanks > > John > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomasjpfan at gmail.com Sun Aug 22 17:11:22 2021 From: thomasjpfan at gmail.com (Thomas J. Fan) Date: Sun, 22 Aug 2021 17:11:22 -0400 Subject: [scikit-learn] cant install scikit-learn In-Reply-To: References: <1717831625.594362.1629497709231.ref@mail.yahoo.com> <1717831625.594362.1629497709231@mail.yahoo.com> Message-ID: Here are instructions on how to resolve the issue: https://scikit-learn.org/stable/install.html#error-caused-by-file-path-length-limit-on-windows In the upcoming release of scikit-learn, we have reduced the number of characters in the filename. This should resolve this issue without needing to edit the Windows registry. Thomas On Sun, Aug 22, 2021 at 4:44 PM Robert Slater wrote: > What was the second error? > > What version of python are you using?What version of windows are you using? > > > This will help troubleshoot the issue. > > > On Fri, Aug 20, 2021, 5:16 PM John Grenci via scikit-learn < > scikit-learn at python.org> wrote: > >> Hello, hoping somebody can help me. >> >> >> >> I have tried.. what seems like everything. >> >> >> >> I get an OS error >> >> >> >> ERROR: Could not install packages due to an OSError: [Errno 2] No such >> file or directory: >> 'C:\\Users\\ameri\\AppData\\Local\\Packages\\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\\LocalCache\\local-packages\\Python39\\site-packages\\sklearn\\datasets\\tests\\data\\openml\\292\\api-v1-json-data-list-data_name-australian-limit-2-data_version-1-status-deactivated.json.gz' >> >> HINT: This error might have occurred since this system does not have >> Windows Long Path support enabled. You can find information on how to >> enable this at https://pip.pypa.io/warnings/enable-long-paths >> >> >> >> >> >> I tried enabling more than 260 characters as suggested, but that did not >> help gave me a different error actually. >> >> >> I don?t think it has to do with bits, as my computer is 64 bit. >> >> I also tried pip install sklearn >> >> >> I am at a loss at this point. >> >> >> PS- I am not the most techy of person. also, looked everywhere online >> that I could >> >> can somebody help? >> >> >> Thanks >> >> John >> >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... 
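For anyone gathering the details Robert asks about, a small standard-library-only snippet works even while scikit-learn itself refuses to install (the exact output will of course depend on the machine):

    import sys
    import platform

    print("Python :", sys.version)
    print("OS     :", platform.platform())
    print("Arch   :", platform.architecture()[0])   # e.g. '64bit'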
URL: From varavind121 at yahoo.com Sun Aug 22 17:13:23 2021 From: varavind121 at yahoo.com (aravind ramesh) Date: Sun, 22 Aug 2021 21:13:23 +0000 (UTC) Subject: [scikit-learn] cant install scikit-learn In-Reply-To: References: <1717831625.594362.1629497709231.ref@mail.yahoo.com> <1717831625.594362.1629497709231@mail.yahoo.com> Message-ID: <1741730219.681482.1629666803984@mail.yahoo.com> Hi, Try using Anaconda Python distribution(Anaconda | Individual Edition) it comes with sci-kit learn, no hassle of dealing with any dependency issues. | | | | | | | | | | | Anaconda | Individual Edition Anaconda's open-source Individual Edition is the easiest way to perform Python/R data science and machine learni... | | | On Monday, August 23, 2021, 01:56:42 AM GMT+5:30, Sasha Kacanski wrote: Who about Linux desktop for a change. i suggest Debian or Arch! On Fri, Aug 20, 2021 at 6:17 PM John Grenci via scikit-learn wrote: Hello, hoping somebody can help me. ? I have tried.. what seems like everything. ? I get an OS error ? ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory: 'C:\\Users\\ameri\\AppData\\Local\\Packages\\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\\LocalCache\\local-packages\\Python39\\site-packages\\sklearn\\datasets\\tests\\data\\openml\\292\\api-v1-json-data-list-data_name-australian-limit-2-data_version-1-status-deactivated.json.gz' HINT: This error might have occurred since this system does not have Windows Long Path support enabled. You can find information on how to enable this at?https://pip.pypa.io/warnings/enable-long-paths ? ? I tried enabling more than 260 characters as suggested, but that did not help? gave me a different error actually. ? I don?t think it has to do with bits, as my computer is 64 bit. I also tried pip install sklearn ? I am at a loss at this point. ? PS- I am ?not the most techy of person.? also, looked everywhere online that I could can somebody help? ? Thanks John ? _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn -- Aleksandar Kacanski - Sasha _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From johngrenci61 at yahoo.com Mon Aug 23 09:34:06 2021 From: johngrenci61 at yahoo.com (John Grenci) Date: Mon, 23 Aug 2021 13:34:06 +0000 (UTC) Subject: [scikit-learn] cant install scikit-learn In-Reply-To: References: <1717831625.594362.1629497709231.ref@mail.yahoo.com> <1717831625.594362.1629497709231@mail.yahoo.com> Message-ID: <279180649.1033217.1629725646648@mail.yahoo.com> Thomas, and everybody else who responded. the instructions below worked. thanks so much. I just joined this group and four people responded rather quickly. not being a "techhy person" per se, people who respond help alleviate the frustration that can commonly occur thanks again, much appreciated. and everyone have a great day. John On Sunday, August 22, 2021, 05:12:22 PM EDT, Thomas J. Fan wrote: Here are instructions on how to resolve the issue: https://scikit-learn.org/stable/install.html#error-caused-by-file-path-length-limit-on-windows In the upcoming release of scikit-learn, we have reduced the number of characters?in the filename. This should resolve this issue without needing to edit the?Windows registry. 
Thomas On Sun, Aug 22, 2021 at 4:44 PM Robert Slater wrote: What was the second error? What version of python are you using?What version of windows are you using? This will help troubleshoot the issue. On Fri, Aug 20, 2021, 5:16 PM John Grenci via scikit-learn wrote: Hello, hoping somebody can help me. ? I have tried.. what seems like everything. ? I get an OS error ? ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory: 'C:\\Users\\ameri\\AppData\\Local\\Packages\\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\\LocalCache\\local-packages\\Python39\\site-packages\\sklearn\\datasets\\tests\\data\\openml\\292\\api-v1-json-data-list-data_name-australian-limit-2-data_version-1-status-deactivated.json.gz' HINT: This error might have occurred since this system does not have Windows Long Path support enabled. You can find information on how to enable this at?https://pip.pypa.io/warnings/enable-long-paths ? ? I tried enabling more than 260 characters as suggested, but that did not help? gave me a different error actually. ? I don?t think it has to do with bits, as my computer is 64 bit. I also tried pip install sklearn ? I am at a loss at this point. ? PS- I am ?not the most techy of person.? also, looked everywhere online that I could can somebody help? ? Thanks John ? _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From olivier.grisel at ensta.org Wed Aug 25 04:52:21 2021 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Wed, 25 Aug 2021 10:52:21 +0200 Subject: [scikit-learn] Pandas copy-on-write proposal In-Reply-To: References: Message-ID: Thanks for the heads up! This is interesting. We rarely update dataframe values in-place in scikit-learn but this is interesting to know that we could leverage this for more efficient pandas-in pandas-out support, for instance for missing value imputation. From olivier.grisel at ensta.org Wed Aug 25 05:06:07 2021 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Wed, 25 Aug 2021 11:06:07 +0200 Subject: [scikit-learn] Dataframe protocol RFC Message-ID: Hi all, This is an email to notify everybody interested that the discussion on interoperability of Python dataframe libraries has moved to an official repo under the data-apis.org initiative: https://data-apis.org/blog/dataframe_protocol_rfc/ https://github.com/data-apis/dataframe-api and they are requesting feedback from library authors (both dataframe providers and consumers). -- Olivier From johngrenci61 at yahoo.com Wed Aug 25 09:00:23 2021 From: johngrenci61 at yahoo.com (John Grenci) Date: Wed, 25 Aug 2021 13:00:23 +0000 (UTC) Subject: [scikit-learn] data reader group? In-Reply-To: References: Message-ID: <677005045.474056.1629896423296@mail.yahoo.com> Hello everyone, I am new to this group. I was wondering if there is something akin to this for data reader? or are questions other than scikit-learn acceptable on this forum? 
thanks John On Wednesday, August 25, 2021, 05:06:42 AM EDT, Olivier Grisel wrote: Hi all, This is an email to notify everybody interested that the discussion on interoperability of Python dataframe libraries has moved to an official repo under the data-apis.org initiative: https://data-apis.org/blog/dataframe_protocol_rfc/ https://github.com/data-apis/dataframe-api and they are requesting feedback from library authors (both dataframe providers and consumers). -- Olivier _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomasjpfan at gmail.com Wed Aug 25 10:03:57 2021 From: thomasjpfan at gmail.com (Thomas J. Fan) Date: Wed, 25 Aug 2021 10:03:57 -0400 Subject: [scikit-learn] scikit-learn monthly developer meeting: Monday August 30th 2021 Message-ID: Dear all, The scikit-learn developer monthly meeting will take place on Monday August 30th at 1PM UTC. - Video call link: https://meet.google.com/ews-uszu-djs - Meeting notes / agenda: https://hackmd.io/0yokz72CTZSny8y3Re648Q - Local times: https://www.timeanddate.com/worldclock/meetingdetails.html?year=2021&month=8&day=30&hour=13&min=0&sec=0&p1=1440&p2=240&p3=248&p4=195&p5=179&p6=224 The goal of this meeting is to discuss ongoing development topics for the project. Everybody is welcome. As usual, please follow the code of conduct of the project: https://github.com/scikit-learn/scikit-learn/blob/main/CODE_OF_CONDUCT.md Regards, Thomas -------------- next part -------------- An HTML attachment was scrubbed... URL: From reshama.stat at gmail.com Wed Aug 25 22:53:58 2021 From: reshama.stat at gmail.com (Reshama Shaikh) Date: Wed, 25 Aug 2021 22:53:58 -0400 Subject: [scikit-learn] pipeline diagram Message-ID: Hello, This question is for the community (*not* the core contributors). In referencing the *diagram representation* of the pipeline [a], what would be the best way for you to find out what "strategy" (from: mean, median, most_frequent, constant) is being used for "SimpleImputer"? (Also, I am attaching a screenshot of the diagram.) It's not a quiz or anything [ :) ], I'm trying to figure out where folks would look first to get more information on the pipeline. [a] https://scikit-learn.org/dev/auto_examples/compose/plot_column_transformer_mixed_types.html#sphx-glr-auto-examples-compose-plot-column-transformer-mixed-types-py Thanks, Reshama --- Reshama Shaikh she/her Blog | Twitter | LinkedIn | GitHub Data Umbrella NYC PyLadies -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: pipeline_diagram.png Type: image/png Size: 110835 bytes Desc: not available URL: From aidangawronski at gmail.com Fri Aug 27 20:14:20 2021 From: aidangawronski at gmail.com (Aidan Gawronski) Date: Fri, 27 Aug 2021 17:14:20 -0700 Subject: [scikit-learn] LabelPropagation - transduction_ vs predict Message-ID: Hi all, I was exploring sklearn.semi_supervised.LabelPropagation and I noticed that I get difference results if I train a model and look at "model.transduction_" compared to taking the same model and using "model.predict(X_train)" on the training data. I couldn't easily find the difference on google, so I began reading through the code but it seems pretty involved and I thought someone here might know the difference off hand. 
From joel.nothman at gmail.com Sun Aug 29 02:21:49 2021
From: joel.nothman at gmail.com (Joel Nothman)
Date: Sun, 29 Aug 2021 16:21:49 +1000
Subject: [scikit-learn] pipeline diagram
In-Reply-To: 
References: 
Message-ID: 

Hi Reshama,

You can click the nodes in the diagram (obviously the screenshot loses
this). Is there some way we can make that more obvious? Passing your mouse
(if you're on an appropriate device) over it shows the hand cursor, which
is some indication.

Would it be helpful if, when the user put their cursor over the diagram at
all, it showed something like "Click an estimator type to see its
parameters"?

Joel

On Thu, 26 Aug 2021 at 12:55, Reshama Shaikh wrote:

> Hello,
> This question is for the community (*not* the core contributors).
>
> In referencing the *diagram representation* of the pipeline [a], what
> would be the best way for you to find out what "strategy" (from: mean,
> median, most_frequent, constant) is being used for "SimpleImputer"?
>
> (Also, I am attaching a screenshot of the diagram.)
>
> It's not a quiz or anything [ :) ], I'm trying to figure out where folks
> would look first to get more information on the pipeline.
>
> [a]
> https://scikit-learn.org/dev/auto_examples/compose/plot_column_transformer_mixed_types.html#sphx-glr-auto-examples-compose-plot-column-transformer-mixed-types-py
>
> Thanks,
> Reshama
> ---
> Reshama Shaikh
> she/her
> Blog | Twitter | LinkedIn | GitHub
>
> Data Umbrella
> NYC PyLadies
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From reshama.stat at gmail.com Sun Aug 29 10:09:34 2021
From: reshama.stat at gmail.com (Reshama Shaikh)
Date: Sun, 29 Aug 2021 10:09:34 -0400
Subject: [scikit-learn] pipeline diagram
In-Reply-To: 
References: 
Message-ID: 

Hi Joel,

I am working on the PR to add the diagram visualization to the
documentation [a]. I had added both text and diagram output to all the
examples, because I did not realize you could click on the diagram
sections to get more information. It wasn't until my recent discussion
with Thomas that he pointed it out; it wasn't intuitive to me.

It would be good to either:
a) add a note somewhere indicating "click on the text in the pipeline
visualization to see more details, such as parameter settings"
b) add a GIF of it to the documentation
c) when the user puts their cursor over the diagram at all, show
something like "Click an estimator type to see its parameters"

I added this PR to the agenda for the next scikit-learn meeting.

[a] https://github.com/scikit-learn/scikit-learn/pull/18758

Reshama Shaikh
she/her
Blog | Twitter | LinkedIn | GitHub

Data Umbrella
NYC PyLadies
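As background for the documentation work discussed above, the diagram is
an HTML/CSS snippet that scikit-learn can also emit directly, which is how
it can end up embedded in rendered pages. A minimal sketch follows; the
two-step pipeline and the output filename are made up for illustration:

    from sklearn.impute import SimpleImputer
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.utils import estimator_html_repr

    # Made-up two-step pipeline, just to have something to render.
    pipe = Pipeline([("imputer", SimpleImputer(strategy="median")),
                     ("scaler", StandardScaler())])

    # estimator_html_repr() returns the interactive HTML/CSS snippet used
    # by the diagram display; it can be written to a file or embedded in
    # generated documentation. The filename here is a made-up example.
    with open("pipeline_diagram_example.html", "w") as f:
        f.write(estimator_html_repr(pipe))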
On Sun, Aug 29, 2021 at 2:24 AM Joel Nothman wrote:

> Hi Reshama,
>
> You can click the nodes in the diagram (obviously the screenshot loses
> this). Is there some way we can make that more obvious? Passing your
> mouse (if you're on an appropriate device) over it shows the hand cursor,
> which is some indication.
>
> Would it be helpful if, when the user put their cursor over the diagram
> at all, it showed something like "Click an estimator type to see its
> parameters"?
>
> Joel
>
> On Thu, 26 Aug 2021 at 12:55, Reshama Shaikh wrote:
>
>> Hello,
>> This question is for the community (*not* the core contributors).
>>
>> In referencing the *diagram representation* of the pipeline [a], what
>> would be the best way for you to find out what "strategy" (from: mean,
>> median, most_frequent, constant) is being used for "SimpleImputer"?
>>
>> (Also, I am attaching a screenshot of the diagram.)
>>
>> It's not a quiz or anything [ :) ], I'm trying to figure out where folks
>> would look first to get more information on the pipeline.
>>
>> [a]
>> https://scikit-learn.org/dev/auto_examples/compose/plot_column_transformer_mixed_types.html#sphx-glr-auto-examples-compose-plot-column-transformer-mixed-types-py
>>
>> Thanks,
>> Reshama
>> ---
>> Reshama Shaikh
>> she/her
>> Blog | Twitter | LinkedIn | GitHub
>>
>> Data Umbrella
>> NYC PyLadies
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
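To round out the thread above: the clickable diagram is scikit-learn's
HTML display mode, enabled with set_config(display="diagram"), and the
parameters revealed by clicking a box (for example the SimpleImputer
"strategy") can also be read programmatically through get_params(). A
minimal sketch follows; the pipeline, its step names ("num", "cat",
"preprocessor", "classifier") and the column names are invented here to
resemble the linked example, not copied from it:

    from sklearn import set_config
    from sklearn.compose import ColumnTransformer
    from sklearn.impute import SimpleImputer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder, StandardScaler

    # Hypothetical pipeline, similar in spirit to the linked example; the
    # step names and column names below are invented for this sketch.
    numeric = Pipeline([("imputer", SimpleImputer(strategy="median")),
                        ("scaler", StandardScaler())])
    categorical = Pipeline([("imputer", SimpleImputer(strategy="most_frequent")),
                            ("onehot", OneHotEncoder(handle_unknown="ignore"))])
    preprocessor = ColumnTransformer([("num", numeric, ["age", "fare"]),
                                      ("cat", categorical, ["embarked", "sex"])])
    clf = Pipeline([("preprocessor", preprocessor),
                    ("classifier", LogisticRegression())])

    # In a Jupyter notebook, evaluating `clf` after this call renders the
    # clickable diagram instead of the plain text repr.
    set_config(display="diagram")

    # The same information is available programmatically via get_params();
    # the nested key mirrors the step names chosen above.
    print(clf.get_params()["preprocessor__num__imputer__strategy"])  # -> median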