From andreasmuellerml at gmail.com Wed Dec 2 19:35:45 2020 From: andreasmuellerml at gmail.com (Andreas C. Mueller) Date: Thu, 03 Dec 2020 00:35:45 +0000 Subject: [scikit-learn] Changes in Travis billing In-Reply-To: <20201126141212.x72eecggoozdocck@phare.normalesup.org> References: <20201102105018.kfarylwpao6iliju@phare.normalesup.org> <20201126140652.tdrsjzfasb64kqua@phare.normalesup.org> <20201126141212.x72eecggoozdocck@phare.normalesup.org> Message-ID: Sorry I'm probably missing some detail but what does travis provide that github actions and azure pipeline don't provide? ------ Original Message ------ From: "Gael Varoquaux" To: "Scikit-learn mailing list" Sent: 11/26/2020 6:12:12 AM Subject: Re: [scikit-learn] Changes in Travis billing >On Thu, Nov 26, 2020 at 03:06:52PM +0100, Gael Varoquaux wrote: >> On Thu, Nov 26, 2020 at 02:45:33PM +0100, Adrin wrote: >> > At this point I'm at a loss, and reading the NumFocus chat and other >> > packages' experience with them on the same topic, seems like we just >> > need to move out of Travis. > >> Agreed. Do we still need them for something essential? > >Sorry, ARM, it was just above in the conversation. > >I think that we have no other option than reduce the frequency of the >cron, and wait for other platforms to offer ARM, which will hopefully >happen soonish. > >G >_______________________________________________ >scikit-learn mailing list >scikit-learn at python.org >https://mail.python.org/mailman/listinfo/scikit-learn From g.lemaitre58 at gmail.com Thu Dec 3 04:15:50 2020 From: g.lemaitre58 at gmail.com (=?UTF-8?Q?Guillaume_Lema=C3=AEtre?=) Date: Thu, 3 Dec 2020 10:15:50 +0100 Subject: [scikit-learn] Changes in Travis billing In-Reply-To: References: <20201102105018.kfarylwpao6iliju@phare.normalesup.org> <20201126140652.tdrsjzfasb64kqua@phare.normalesup.org> <20201126141212.x72eecggoozdocck@phare.normalesup.org> Message-ID: ARM support On Thu, 3 Dec 2020 at 01:37, Andreas C. Mueller wrote: > Sorry I'm probably missing some detail but what does travis provide that > github actions and azure pipeline don't provide? > > ------ Original Message ------ > From: "Gael Varoquaux" > To: "Scikit-learn mailing list" > Sent: 11/26/2020 6:12:12 AM > Subject: Re: [scikit-learn] Changes in Travis billing > > >On Thu, Nov 26, 2020 at 03:06:52PM +0100, Gael Varoquaux wrote: > >> On Thu, Nov 26, 2020 at 02:45:33PM +0100, Adrin wrote: > >> > At this point I'm at a loss, and reading the NumFocus chat and other > >> > packages' experience with them on the same topic, seems like we just > >> > need to move out of Travis. > > > >> Agreed. Do we still need them for something essential? > > > >Sorry, ARM, it was just above in the conversation. > > > >I think that we have no other option than reduce the frequency of the > >cron, and wait for other platforms to offer ARM, which will hopefully > >happen soonish. > > > >G > >_______________________________________________ > >scikit-learn mailing list > >scikit-learn at python.org > >https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -- Guillaume Lemaitre Scikit-learn @ Inria Foundation https://glemaitre.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From adrin.jalali at gmail.com Thu Dec 3 05:36:32 2020 From: adrin.jalali at gmail.com (Adrin) Date: Thu, 3 Dec 2020 11:36:32 +0100 Subject: [scikit-learn] Changes in Travis billing In-Reply-To: References: <20201102105018.kfarylwpao6iliju@phare.normalesup.org> <20201126140652.tdrsjzfasb64kqua@phare.normalesup.org> <20201126141212.x72eecggoozdocck@phare.normalesup.org> Message-ID: I got a response from Travis, and according to this, we don't qualify for the free credits: The free plan will grant your organization 10000 credits. We offer an Open Source Subscription for free to non-commercial open-source projects. To qualify for an Open Source subscription, the project must meet the following requirements: - You are a project lead or regular committer (latest commit in the last month) - Project must be at least 3 months old and is in active development (with regular commits and activity) - Project meets the OSD specification - *Project must not be sponsored by a commercial company or organization (monetary or with employees paid to work on the project) * - Project can not provide commercial services or distribute paid versions of the software Does this sound like you and your project? We'd be very happy to support you! However, if your project does not match these requirements or you have further questions [1], please feel free to ask! We look forward to your response if you meet these requirements to proceed with the next steps. Thank you On Thu, Dec 3, 2020 at 10:17 AM Guillaume Lema?tre wrote: > ARM support > > On Thu, 3 Dec 2020 at 01:37, Andreas C. Mueller < > andreasmuellerml at gmail.com> wrote: > >> Sorry I'm probably missing some detail but what does travis provide that >> github actions and azure pipeline don't provide? >> >> ------ Original Message ------ >> From: "Gael Varoquaux" >> To: "Scikit-learn mailing list" >> Sent: 11/26/2020 6:12:12 AM >> Subject: Re: [scikit-learn] Changes in Travis billing >> >> >On Thu, Nov 26, 2020 at 03:06:52PM +0100, Gael Varoquaux wrote: >> >> On Thu, Nov 26, 2020 at 02:45:33PM +0100, Adrin wrote: >> >> > At this point I'm at a loss, and reading the NumFocus chat and other >> >> > packages' experience with them on the same topic, seems like we just >> >> > need to move out of Travis. >> > >> >> Agreed. Do we still need them for something essential? >> > >> >Sorry, ARM, it was just above in the conversation. >> > >> >I think that we have no other option than reduce the frequency of the >> >cron, and wait for other platforms to offer ARM, which will hopefully >> >happen soonish. >> > >> >G >> >_______________________________________________ >> >scikit-learn mailing list >> >scikit-learn at python.org >> >https://mail.python.org/mailman/listinfo/scikit-learn >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > > > -- > Guillaume Lemaitre > Scikit-learn @ Inria Foundation > https://glemaitre.github.io/ > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From nelle.varoquaux at gmail.com Thu Dec 3 05:47:39 2020 From: nelle.varoquaux at gmail.com (Nelle Varoquaux) Date: Thu, 3 Dec 2020 11:47:39 +0100 Subject: [scikit-learn] Changes in Travis billing In-Reply-To: References: <20201102105018.kfarylwpao6iliju@phare.normalesup.org> <20201126140652.tdrsjzfasb64kqua@phare.normalesup.org> <20201126141212.x72eecggoozdocck@phare.normalesup.org> Message-ID: Wow? what?! That's insane? scikit-image got some credits ( https://github.com/scikit-image/scikit-image-wheels/pull/47#issuecomment-736760539) without much issue? Maybe someone should reach out directly on this thread to the travis people? Cheers, N On Thu, 3 Dec 2020 at 11:37, Adrin wrote: > I got a response from Travis, and according to this, we don't qualify for > the free credits: > > The free plan will grant your organization 10000 credits. > > We offer an Open Source Subscription for free to non-commercial > open-source projects. To qualify for an Open Source subscription, the > project must meet the following requirements: > > > - You are a project lead or regular committer (latest commit in the > last month) > - Project must be at least 3 months old and is in active development > (with regular commits and activity) > - Project meets the OSD specification > - > *Project must not be sponsored by a commercial company or organization > (monetary or with employees paid to work on the project) * > - Project can not provide commercial services or distribute paid > versions of the software > > > Does this sound like you and your project? We'd be very happy to support > you! > > However, if your project does not match these requirements or you have > further questions [1], please feel free to ask! > > We look forward to your response if you meet these requirements to proceed > with the next steps. > > Thank you > > On Thu, Dec 3, 2020 at 10:17 AM Guillaume Lema?tre > wrote: > >> ARM support >> >> On Thu, 3 Dec 2020 at 01:37, Andreas C. Mueller < >> andreasmuellerml at gmail.com> wrote: >> >>> Sorry I'm probably missing some detail but what does travis provide that >>> github actions and azure pipeline don't provide? >>> >>> ------ Original Message ------ >>> From: "Gael Varoquaux" >>> To: "Scikit-learn mailing list" >>> Sent: 11/26/2020 6:12:12 AM >>> Subject: Re: [scikit-learn] Changes in Travis billing >>> >>> >On Thu, Nov 26, 2020 at 03:06:52PM +0100, Gael Varoquaux wrote: >>> >> On Thu, Nov 26, 2020 at 02:45:33PM +0100, Adrin wrote: >>> >> > At this point I'm at a loss, and reading the NumFocus chat and >>> other >>> >> > packages' experience with them on the same topic, seems like we >>> just >>> >> > need to move out of Travis. >>> > >>> >> Agreed. Do we still need them for something essential? >>> > >>> >Sorry, ARM, it was just above in the conversation. >>> > >>> >I think that we have no other option than reduce the frequency of the >>> >cron, and wait for other platforms to offer ARM, which will hopefully >>> >happen soonish. 
>>> > >>> >G >>> >_______________________________________________ >>> >scikit-learn mailing list >>> >scikit-learn at python.org >>> >https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >> >> >> -- >> Guillaume Lemaitre >> Scikit-learn @ Inria Foundation >> https://glemaitre.github.io/ >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From solegalli at protonmail.com Thu Dec 3 05:55:38 2020 From: solegalli at protonmail.com (Sole Galli) Date: Thu, 03 Dec 2020 10:55:38 +0000 Subject: [scikit-learn] sample_weight vs class_weight Message-ID: Hello team, What is the difference in the implementation of class_weight and sample_weight in those algorithms that support both? like random forest or logistic regression? Are both modifying the loss function? in a similar way? Thank you! Sole -------------- next part -------------- An HTML attachment was scrubbed... URL: From olivier.grisel at ensta.org Thu Dec 3 13:20:52 2020 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Thu, 3 Dec 2020 19:20:52 +0100 Subject: [scikit-learn] [ANN] scikit-learn 0.24.0rc1 is online! Message-ID: Please help us test the first release candidate for scikit-learn 0.24.0: pip install scikit-learn==0.24.0rc1 Changelog: https://scikit-learn.org/0.24/whats_new/v0.24.html In particular, if you maintain a project with a dependency on scikit-learn, please let us know about any regression. Feel free to also retweet the announcement to get more people to test it before the final release (probably in 1 week or 2): https://twitter.com/scikit_learn/status/1334562221498753026 Thanks to anybody who helped make this happen! -- Olivier From bsipocz at gmail.com Thu Dec 3 14:59:32 2020 From: bsipocz at gmail.com (Brigitta Sipocz) Date: Thu, 3 Dec 2020 11:59:32 -0800 Subject: [scikit-learn] Changes in Travis billing In-Reply-To: References: <20201102105018.kfarylwpao6iliju@phare.normalesup.org> <20201126140652.tdrsjzfasb64kqua@phare.normalesup.org> <20201126141212.x72eecggoozdocck@phare.normalesup.org> Message-ID: Hi, Astropy also got a (one time) credit, that we very quickly burned up. And NumPy also got a credit, but I'm not sure whether it's also a one-off or a monthly. For astropy, our workaround is to run the tests on aarch64 and the other more exotic hardware from a weekly cron, as each takes a very long time on GH Actions. Hopefully, a better, native solution will come to Actions/Azure soon enough. I link the PR that set this up, maybe this solution would be good enough for sklearn, too. https://github.com/astropy/astropy/pull/11045 Cheers, Brigitta On Thu, 3 Dec 2020 at 02:50, Nelle Varoquaux wrote: > Wow? what?! That's insane? > scikit-image got some credits ( > https://github.com/scikit-image/scikit-image-wheels/pull/47#issuecomment-736760539) > without much issue? Maybe someone should reach out directly on this thread > to the travis people? 
> > Cheers, > N > > On Thu, 3 Dec 2020 at 11:37, Adrin wrote: > >> I got a response from Travis, and according to this, we don't qualify for >> the free credits: >> >> The free plan will grant your organization 10000 credits. >> >> We offer an Open Source Subscription for free to non-commercial >> open-source projects. To qualify for an Open Source subscription, the >> project must meet the following requirements: >> >> >> - You are a project lead or regular committer (latest commit in the >> last month) >> - Project must be at least 3 months old and is in active development >> (with regular commits and activity) >> - Project meets the OSD >> specification >> - >> *Project must not be sponsored by a commercial company or organization >> (monetary or with employees paid to work on the project) * >> - Project can not provide commercial services or distribute paid >> versions of the software >> >> >> Does this sound like you and your project? We'd be very happy to support >> you! >> >> However, if your project does not match these requirements or you have >> further questions [1], please feel free to ask! >> >> We look forward to your response if you meet these requirements to >> proceed with the next steps. >> >> Thank you >> >> On Thu, Dec 3, 2020 at 10:17 AM Guillaume Lema?tre < >> g.lemaitre58 at gmail.com> wrote: >> >>> ARM support >>> >>> On Thu, 3 Dec 2020 at 01:37, Andreas C. Mueller < >>> andreasmuellerml at gmail.com> wrote: >>> >>>> Sorry I'm probably missing some detail but what does travis provide >>>> that >>>> github actions and azure pipeline don't provide? >>>> >>>> ------ Original Message ------ >>>> From: "Gael Varoquaux" >>>> To: "Scikit-learn mailing list" >>>> Sent: 11/26/2020 6:12:12 AM >>>> Subject: Re: [scikit-learn] Changes in Travis billing >>>> >>>> >On Thu, Nov 26, 2020 at 03:06:52PM +0100, Gael Varoquaux wrote: >>>> >> On Thu, Nov 26, 2020 at 02:45:33PM +0100, Adrin wrote: >>>> >> > At this point I'm at a loss, and reading the NumFocus chat and >>>> other >>>> >> > packages' experience with them on the same topic, seems like we >>>> just >>>> >> > need to move out of Travis. >>>> > >>>> >> Agreed. Do we still need them for something essential? >>>> > >>>> >Sorry, ARM, it was just above in the conversation. >>>> > >>>> >I think that we have no other option than reduce the frequency of the >>>> >cron, and wait for other platforms to offer ARM, which will hopefully >>>> >happen soonish. >>>> > >>>> >G >>>> >_______________________________________________ >>>> >scikit-learn mailing list >>>> >scikit-learn at python.org >>>> >https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>> >>> >>> -- >>> Guillaume Lemaitre >>> Scikit-learn @ Inria Foundation >>> https://glemaitre.github.io/ >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From solegalli at protonmail.com Fri Dec 4 05:40:44 2020 From: solegalli at protonmail.com (Sole Galli) Date: Fri, 04 Dec 2020 10:40:44 +0000 Subject: [scikit-learn] sample_weight vs class_weight In-Reply-To: References: Message-ID: Actually, I found the answer. Both seem to be optimising the loss function for the various algorithms, below I include some links. If, we pass class_weight and sample_weight, then the final cost / weight is a combination of both. I have a follow up question: in which scenario would we use both? why do some estimators allow to pass weights both as a dict in the init or as sample weights in fit? what's the logic? I found it a bit confusing at the beginning. Thank you! https://stackoverflow.com/questions/30805192/scikit-learn-random-forest-class-weight-and-sample-weight-parameters https://stackoverflow.com/questions/30972029/how-does-the-class-weight-parameter-in-scikit-learn-work/30982811#30982811 Soledad Galli https://www.trainindata.com/ ??????? Original Message ??????? On Thursday, December 3, 2020 11:55 AM, Sole Galli via scikit-learn wrote: > Hello team, > > What is the difference in the implementation of class_weight and sample_weight in those algorithms that support both? like random forest or logistic regression? > > Are both modifying the loss function? in a similar way? > > Thank you! > > Sole -------------- next part -------------- An HTML attachment was scrubbed... URL: From niourf at gmail.com Fri Dec 4 05:59:03 2020 From: niourf at gmail.com (Nicolas Hug) Date: Fri, 4 Dec 2020 10:59:03 +0000 Subject: [scikit-learn] sample_weight vs class_weight In-Reply-To: References: Message-ID: Basically passing class weights should be equivalent to passing per-class-constant sample weights. > why do some estimators allow to pass weights both as a dict in the init or as sample weights in fit? what's the logic? SW is a per-sample property (aligned with X and y) so we avoid passing those to init because the data isn't known when initializing the estimator. It's only known when calling fit. In general we avoid passing data-related info into init so that the same instance can be fitted on any data (with different number of samples, different classes, etc.). We allow to pass class_weight in init because the 'balanced' option is data-agnostic. Arguably, allowing a dict with actual class values violates the above argument (of not having data-related stuff in init), so I guess that's where the logic ends ;) As to why one would use both, I'm not so sure honestly. Nicolas On 12/4/20 10:40 AM, Sole Galli via scikit-learn wrote: > Actually, I found the answer. Both seem to be optimising the loss > function for the various algorithms, below I include some links. > > If, we pass *class_weight* and *sample_weight,* then the final cost / > weight is a combination of both. > > I have a follow up question: in which scenario would we use both? why > do some estimators allow to pass weights both as a dict in the init or > as sample weights in fit? what's the logic? I found it a bit confusing > at the beginning. > > Thank you! > > https://stackoverflow.com/questions/30805192/scikit-learn-random-forest-class-weight-and-sample-weight-parameters > > https://stackoverflow.com/questions/30972029/how-does-the-class-weight-parameter-in-scikit-learn-work/30982811#30982811 > > Soledad Galli > https://www.trainindata.com/ > > > ??????? Original Message ??????? 
> On Thursday, December 3, 2020 11:55 AM, Sole Galli via scikit-learn
> wrote:
>
>> Hello team,
>>
>> What is the difference in the implementation of class_weight and
>> sample_weight in those algorithms that support both? like random
>> forest or logistic regression?
>>
>> Are both modifying the loss function? in a similar way?
>>
>> Thank you!
>>
>> Sole

From maykonschots at gmail.com  Fri Dec  4 06:06:54 2020
From: maykonschots at gmail.com (mrschots)
Date: Fri, 4 Dec 2020 12:06:54 +0100
Subject: [scikit-learn] sample_weight vs class_weight
In-Reply-To: 
References: 
Message-ID: 

I have been using both in time-series classification. I put an exponential
decay in sample_weights AND class weights as a dictionary.

BR/Schots

On Fri, 4 Dec 2020 at 12:01, Nicolas Hug wrote:

> Basically passing class weights should be equivalent to passing
> per-class-constant sample weights.
>
> > why do some estimators allow to pass weights both as a dict in the init
> > or as sample weights in fit? what's the logic?
>
> SW is a per-sample property (aligned with X and y) so we avoid passing
> those to init because the data isn't known when initializing the estimator.
> It's only known when calling fit. In general we avoid passing data-related
> info into init so that the same instance can be fitted on any data (with
> different number of samples, different classes, etc.).
>
> We allow to pass class_weight in init because the 'balanced' option is
> data-agnostic. Arguably, allowing a dict with actual class values violates
> the above argument (of not having data-related stuff in init), so I guess
> that's where the logic ends ;)
>
> As to why one would use both, I'm not so sure honestly.
>
> Nicolas
>
> On 12/4/20 10:40 AM, Sole Galli via scikit-learn wrote:
>
> Actually, I found the answer. Both seem to be optimising the loss function
> for the various algorithms, below I include some links.
>
> If, we pass *class_weight* and *sample_weight,* then the final cost /
> weight is a combination of both.
>
> I have a follow up question: in which scenario would we use both? why do
> some estimators allow to pass weights both as a dict in the init or as
> sample weights in fit? what's the logic? I found it a bit confusing at the
> beginning.
>
> Thank you!
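A minimal sketch of the behaviour described above, using LogisticRegression
on a hypothetical toy dataset (the class labels, the 1:3 weighting and the
decay rate below are arbitrary choices, not part of the original thread):
a class_weight dict passed to the estimator should be equivalent to
per-class-constant sample weights passed to fit, and when both are given
the effective weight of a sample should be the product of the two.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data, only for illustration.
X, y = make_classification(n_samples=200, weights=[0.8, 0.2], random_state=0)

# Per-class weights given at construction time ...
clf_cw = LogisticRegression(class_weight={0: 1, 1: 3}).fit(X, y)

# ... versus the same weighting expressed as per-sample weights at fit time.
sw = np.where(y == 1, 3.0, 1.0)
clf_sw = LogisticRegression().fit(X, y, sample_weight=sw)

# The two solutions should agree up to solver tolerance.
print(clf_cw.coef_)
print(clf_sw.coef_)

# When both are supplied, each sample's effective weight is its
# sample_weight multiplied by the weight of its class; for time-ordered
# data the sample_weight part could, for instance, encode an exponential
# recency decay as described above.
recency = np.exp(-0.01 * np.arange(len(y))[::-1])
clf_both = LogisticRegression(class_weight={0: 1, 1: 3}).fit(
    X, y, sample_weight=recency)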
> > Sole > > > > > _______________________________________________ > scikit-learn mailing listscikit-learn at python.orghttps://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -- Schots -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Fri Dec 4 16:06:36 2020 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Fri, 4 Dec 2020 22:06:36 +0100 Subject: [scikit-learn] Presented scikit-learn to the French President Message-ID: <20201204210636.36geowufvvphujmg@phare.normalesup.org> Hi scikit-learn community, Today, I presented some efforts in digital health to the French president and part of the government. As these efforts were partly powered by scikit-learn (and the whole pydata stack, to be fair), the team in charge of the event had printed a huge scikit-learn logo behind me: https://twitter.com/GaelVaroquaux/status/1334959438059462659 (terrible mobile-phone picture) I would have liked to get a picture with the president and the logo, but it seems that they are releasing only a handful of pictures :(. Anyhow... Thanks to the community! This is a huge success. For health topics (we are talking nationwide electronic health records) the ability to build on an independent open-source stack is extremely important. We, as a wider community, are building something priceless. Cheers, Ga?l From adrin.jalali at gmail.com Fri Dec 4 18:08:20 2020 From: adrin.jalali at gmail.com (Adrin) Date: Sat, 5 Dec 2020 00:08:20 +0100 Subject: [scikit-learn] Presented scikit-learn to the French President In-Reply-To: <20201204210636.36geowufvvphujmg@phare.normalesup.org> References: <20201204210636.36geowufvvphujmg@phare.normalesup.org> Message-ID: Nice, such a milestone! On Fri., Dec. 4, 2020, 23:59 Gael Varoquaux, wrote: > Hi scikit-learn community, > > Today, I presented some efforts in digital health to the French president > and part of the government. As these efforts were partly powered by > scikit-learn (and the whole pydata stack, to be fair), the team in charge > of the event had printed a huge scikit-learn logo behind me: > https://twitter.com/GaelVaroquaux/status/1334959438059462659 (terrible > mobile-phone picture) > > I would have liked to get a picture with the president and the logo, but > it seems that they are releasing only a handful of pictures :(. Anyhow... > > > Thanks to the community! This is a huge success. For health topics (we > are talking nationwide electronic health records) the ability to build on > an independent open-source stack is extremely important. We, as a wider > community, are building something priceless. > > Cheers, > > Ga?l > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From olivertomic at zoho.com Fri Dec 4 18:22:50 2020 From: olivertomic at zoho.com (Oliver Tomic) Date: Sat, 05 Dec 2020 00:22:50 +0100 Subject: [scikit-learn] Presented scikit-learn to the French President In-Reply-To: References: <20201204210636.36geowufvvphujmg@phare.normalesup.org> Message-ID: <1763010b5df.1049ed1a417502.9100953640053855241@zoho.com> That's brilliant! 
Good to know that scikit-learn is increasingly visible outside the ML community and that even the French president heard about it now.? Thanks a lot to the community for this fantastic package. We use scikit-learn extensively in our ML course at the Norwegian University of Life Sciences and our students love it.? Cheers Oliver ---- On Sat, 05 Dec 2020 00:08:20 +0100 Adrin wrote ---- Nice, such a milestone! On Fri., Dec. 4, 2020, 23:59 Gael Varoquaux, wrote: Hi scikit-learn community, Today, I presented some efforts in digital health to the French president and part of the government. As these efforts were partly powered by scikit-learn (and the whole pydata stack, to be fair), the team in charge of the event had printed a huge scikit-learn logo behind me: https://twitter.com/GaelVaroquaux/status/1334959438059462659 (terrible mobile-phone picture) I would have liked to get a picture with the president and the logo, but it seems that they are releasing only a handful of pictures :(. Anyhow... Thanks to the community! This is a huge success. For health topics (we are talking nationwide electronic health records) the ability to build on an independent open-source stack is extremely important. We, as a wider community, are building something priceless. Cheers, Ga?l _______________________________________________ scikit-learn mailing list mailto:scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn _______________________________________________ scikit-learn mailing list mailto:scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From solegalli at protonmail.com Sat Dec 5 07:55:41 2020 From: solegalli at protonmail.com (Sole Galli) Date: Sat, 05 Dec 2020 12:55:41 +0000 Subject: [scikit-learn] sample_weight vs class_weight In-Reply-To: References: Message-ID: Thank you guys! very helpful :) Soledad Galli https://www.trainindata.com/ ??????? Original Message ??????? On Friday, December 4, 2020 12:06 PM, mrschots wrote: > I have been using both in time-series classification. I put a exponential decay in sample_weights AND class weights as a dictionary. > > BR/Schots > > Em sex., 4 de dez. de 2020 ?s 12:01, Nicolas Hug escreveu: > >> Basically passing class weights should be equivalent to passing per-class-constant sample weights. >> >>> why do some estimators allow to pass weights both as a dict in the init or as sample weights in fit? what's the logic? >> >> SW is a per-sample property (aligned with X and y) so we avoid passing those to init because the data isn't known when initializing the estimator. It's only known when calling fit. In general we avoid passing data-related info into init so that the same instance can be fitted on any data (with different number of samples, different classes, etc.). >> >> We allow to pass class_weight in init because the 'balanced' option is data-agnostic. Arguably, allowing a dict with actual class values violates the above argument (of not having data-related stuff in init), so I guess that's where the logic ends ;) >> >> As to why one would use both, I'm not so sure honestly. >> >> Nicolas >> >> On 12/4/20 10:40 AM, Sole Galli via scikit-learn wrote: >> >>> Actually, I found the answer. Both seem to be optimising the loss function for the various algorithms, below I include some links. >>> >>> If, we pass class_weight and sample_weight, then the final cost / weight is a combination of both. 
>>> >>> I have a follow up question: in which scenario would we use both? why do some estimators allow to pass weights both as a dict in the init or as sample weights in fit? what's the logic? I found it a bit confusing at the beginning. >>> >>> Thank you! >>> >>> https://stackoverflow.com/questions/30805192/scikit-learn-random-forest-class-weight-and-sample-weight-parameters >>> >>> https://stackoverflow.com/questions/30972029/how-does-the-class-weight-parameter-in-scikit-learn-work/30982811#30982811 >>> >>> Soledad Galli >>> https://www.trainindata.com/ >>> >>> ??????? Original Message ??????? >>> On Thursday, December 3, 2020 11:55 AM, Sole Galli via scikit-learn [](mailto:scikit-learn at python.org) wrote: >>> >>>> Hello team, >>>> >>>> What is the difference in the implementation of class_weight and sample_weight in those algorithms that support both? like random forest or logistic regression? >>>> >>>> Are both modifying the loss function? in a similar way? >>>> >>>> Thank you! >>>> >>>> Sole >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> >>> https://mail.python.org/mailman/listinfo/scikit-learn >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > > -- > Schots -------------- next part -------------- An HTML attachment was scrubbed... URL: From jk231092 at gmail.com Sat Dec 5 10:28:39 2020 From: jk231092 at gmail.com (Jitesh Khandelwal) Date: Sat, 5 Dec 2020 20:58:39 +0530 Subject: [scikit-learn] Presented scikit-learn to the French President In-Reply-To: <20201204210636.36geowufvvphujmg@phare.normalesup.org> References: <20201204210636.36geowufvvphujmg@phare.normalesup.org> Message-ID: Amazing, inspiring! Kudos to the sklearn team. On Sat, Dec 5, 2020, 4:30 AM Gael Varoquaux wrote: > Hi scikit-learn community, > > Today, I presented some efforts in digital health to the French president > and part of the government. As these efforts were partly powered by > scikit-learn (and the whole pydata stack, to be fair), the team in charge > of the event had printed a huge scikit-learn logo behind me: > https://twitter.com/GaelVaroquaux/status/1334959438059462659 (terrible > mobile-phone picture) > > I would have liked to get a picture with the president and the logo, but > it seems that they are releasing only a handful of pictures :(. Anyhow... > > > Thanks to the community! This is a huge success. For health topics (we > are talking nationwide electronic health records) the ability to build on > an independent open-source stack is extremely important. We, as a wider > community, are building something priceless. > > Cheers, > > Ga?l > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mail at sebastianraschka.com Sat Dec 5 11:05:48 2020 From: mail at sebastianraschka.com (Sebastian Raschka) Date: Sat, 5 Dec 2020 10:05:48 -0600 Subject: [scikit-learn] Presented scikit-learn to the French President In-Reply-To: References: <20201204210636.36geowufvvphujmg@phare.normalesup.org> Message-ID: <48B89C62-8C0E-4B8C-BB4E-58BAD976F684@sebastianraschka.com> This is really awesome news! Thanks a lot to everyone developing scikit-learn. I am just wrapping up another successful semester, teaching students ML basics. 
Most coming from an R background, they really loved scikit-learn and appreciated it's ease of use and well-thought-out API. Best, Sebastian > On Dec 5, 2020, at 9:28 AM, Jitesh Khandelwal wrote: > > Amazing, inspiring! Kudos to the sklearn team. > > On Sat, Dec 5, 2020, 4:30 AM Gael Varoquaux wrote: > Hi scikit-learn community, > > Today, I presented some efforts in digital health to the French president > and part of the government. As these efforts were partly powered by > scikit-learn (and the whole pydata stack, to be fair), the team in charge > of the event had printed a huge scikit-learn logo behind me: > https://twitter.com/GaelVaroquaux/status/1334959438059462659 (terrible > mobile-phone picture) > > I would have liked to get a picture with the president and the logo, but > it seems that they are releasing only a handful of pictures :(. Anyhow... > > > Thanks to the community! This is a huge success. For health topics (we > are talking nationwide electronic health records) the ability to build on > an independent open-source stack is extremely important. We, as a wider > community, are building something priceless. > > Cheers, > > Ga?l > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From jbbrown at kuhp.kyoto-u.ac.jp Sun Dec 6 16:36:42 2020 From: jbbrown at kuhp.kyoto-u.ac.jp (Brown J.B.) Date: Sun, 6 Dec 2020 22:36:42 +0100 Subject: [scikit-learn] Presented scikit-learn to the French President In-Reply-To: <48B89C62-8C0E-4B8C-BB4E-58BAD976F684@sebastianraschka.com> References: <20201204210636.36geowufvvphujmg@phare.normalesup.org> <48B89C62-8C0E-4B8C-BB4E-58BAD976F684@sebastianraschka.com> Message-ID: Congratulations to all developers and contributors to scikit-learn, from core-devs to webmasters, documentation checkers and commenters, and other facilitators! Keeping a project alive takes a substantial amount of vision and hard work, and scikit-learn is a mature ecosystem because of the vision and hard work of everyone. This recognition by the French government is fantastic -- congratulations Gael to you, your leadership, and your team! In fact, scikit-learn is probably more ubiquitous than anyone individually recognizes, because for all of the contributions in github and mailing lists, there are probably many more people who are benefitting from applying it to their individual scenarios. I myself am a very appreciative user. :) Sincere regards and congratulations again, J.B. Brown 2020?12?5?(?) 17:53 Sebastian Raschka : > This is really awesome news! Thanks a lot to everyone developing > scikit-learn. I am just wrapping up another successful semester, teaching > students ML basics. Most coming from an R background, they really loved > scikit-learn and appreciated it's ease of use and well-thought-out API. > > Best, > Sebastian > > > On Dec 5, 2020, at 9:28 AM, Jitesh Khandelwal > wrote: > > > > Amazing, inspiring! Kudos to the sklearn team. > > > > On Sat, Dec 5, 2020, 4:30 AM Gael Varoquaux < > gael.varoquaux at normalesup.org> wrote: > > Hi scikit-learn community, > > > > Today, I presented some efforts in digital health to the French president > > and part of the government. 
As these efforts were partly powered by > > scikit-learn (and the whole pydata stack, to be fair), the team in charge > > of the event had printed a huge scikit-learn logo behind me: > > https://twitter.com/GaelVaroquaux/status/1334959438059462659 (terrible > > mobile-phone picture) > > > > I would have liked to get a picture with the president and the logo, but > > it seems that they are releasing only a handful of pictures :(. > Anyhow... > > > > > > Thanks to the community! This is a huge success. For health topics (we > > are talking nationwide electronic health records) the ability to build on > > an independent open-source stack is extremely important. We, as a wider > > community, are building something priceless. > > > > Cheers, > > > > Ga?l > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From adrin.jalali at gmail.com Tue Dec 8 07:01:56 2020 From: adrin.jalali at gmail.com (Adrin) Date: Tue, 8 Dec 2020 13:01:56 +0100 Subject: [scikit-learn] Changes in Travis billing In-Reply-To: References: <20201102105018.kfarylwpao6iliju@phare.normalesup.org> <20201126140652.tdrsjzfasb64kqua@phare.normalesup.org> <20201126141212.x72eecggoozdocck@phare.normalesup.org> Message-ID: We've got 10k extra credit, not sure if this is monthly or a one time one. On Thu, Dec 3, 2020 at 9:00 PM Brigitta Sipocz wrote: > Hi, > > Astropy also got a (one time) credit, that we very quickly burned up. And > NumPy also got a credit, but I'm not sure whether it's also a one-off or a > monthly. > > For astropy, our workaround is to run the tests on aarch64 and the other > more exotic hardware from a weekly cron, as each takes a very long time on > GH Actions. Hopefully, a better, native solution will come to Actions/Azure > soon enough. > I link the PR that set this up, maybe this solution would be good enough > for sklearn, too. > > https://github.com/astropy/astropy/pull/11045 > > > Cheers, > Brigitta > > On Thu, 3 Dec 2020 at 02:50, Nelle Varoquaux > wrote: > >> Wow? what?! That's insane? >> scikit-image got some credits ( >> https://github.com/scikit-image/scikit-image-wheels/pull/47#issuecomment-736760539) >> without much issue? Maybe someone should reach out directly on this thread >> to the travis people? >> >> Cheers, >> N >> >> On Thu, 3 Dec 2020 at 11:37, Adrin wrote: >> >>> I got a response from Travis, and according to this, we don't qualify >>> for the free credits: >>> >>> The free plan will grant your organization 10000 credits. >>> >>> We offer an Open Source Subscription for free to non-commercial >>> open-source projects. 
To qualify for an Open Source subscription, the >>> project must meet the following requirements: >>> >>> >>> - You are a project lead or regular committer (latest commit in the >>> last month) >>> - Project must be at least 3 months old and is in active development >>> (with regular commits and activity) >>> - Project meets the OSD >>> specification >>> - >>> *Project must not be sponsored by a commercial company or organization >>> (monetary or with employees paid to work on the project) * >>> - Project can not provide commercial services or distribute paid >>> versions of the software >>> >>> >>> Does this sound like you and your project? We'd be very happy to support >>> you! >>> >>> However, if your project does not match these requirements or you have >>> further questions [1], please feel free to ask! >>> >>> We look forward to your response if you meet these requirements to >>> proceed with the next steps. >>> >>> Thank you >>> >>> On Thu, Dec 3, 2020 at 10:17 AM Guillaume Lema?tre < >>> g.lemaitre58 at gmail.com> wrote: >>> >>>> ARM support >>>> >>>> On Thu, 3 Dec 2020 at 01:37, Andreas C. Mueller < >>>> andreasmuellerml at gmail.com> wrote: >>>> >>>>> Sorry I'm probably missing some detail but what does travis provide >>>>> that >>>>> github actions and azure pipeline don't provide? >>>>> >>>>> ------ Original Message ------ >>>>> From: "Gael Varoquaux" >>>>> To: "Scikit-learn mailing list" >>>>> Sent: 11/26/2020 6:12:12 AM >>>>> Subject: Re: [scikit-learn] Changes in Travis billing >>>>> >>>>> >On Thu, Nov 26, 2020 at 03:06:52PM +0100, Gael Varoquaux wrote: >>>>> >> On Thu, Nov 26, 2020 at 02:45:33PM +0100, Adrin wrote: >>>>> >> > At this point I'm at a loss, and reading the NumFocus chat and >>>>> other >>>>> >> > packages' experience with them on the same topic, seems like we >>>>> just >>>>> >> > need to move out of Travis. >>>>> > >>>>> >> Agreed. Do we still need them for something essential? >>>>> > >>>>> >Sorry, ARM, it was just above in the conversation. >>>>> > >>>>> >I think that we have no other option than reduce the frequency of the >>>>> >cron, and wait for other platforms to offer ARM, which will hopefully >>>>> >happen soonish. >>>>> > >>>>> >G >>>>> >_______________________________________________ >>>>> >scikit-learn mailing list >>>>> >scikit-learn at python.org >>>>> >https://mail.python.org/mailman/listinfo/scikit-learn >>>>> >>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>> >>>> >>>> >>>> -- >>>> Guillaume Lemaitre >>>> Scikit-learn @ Inria Foundation >>>> https://glemaitre.github.io/ >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mahmood.nt at gmail.com Wed Dec 9 04:39:14 2020 From: mahmood.nt at gmail.com (Mahmood Naderan) Date: Wed, 9 Dec 2020 10:39:14 +0100 Subject: [scikit-learn] Drawing contours in KMeans Message-ID: Hi I use the following code to highlight the cluster centers with some red dots. kmeans = KMeans(n_clusters=6, init='k-means++', max_iter=100, n_init=10, random_state=0) pred_y = kmeans.fit_predict(a) plt.scatter(a[:,0], a[:,1]) plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=100, c='red') plt.show() I would like to know if it is possible to draw contours over the clusters. Is there any way for that? Please let me know if there is a function or option in KMeans. Regards, Mahmood -------------- next part -------------- An HTML attachment was scrubbed... URL: From ahowe42 at gmail.com Wed Dec 9 05:47:40 2020 From: ahowe42 at gmail.com (Andrew Howe) Date: Wed, 9 Dec 2020 10:47:40 +0000 Subject: [scikit-learn] Drawing contours in KMeans In-Reply-To: References: Message-ID: Contours generally indicate a third variable - often a probability density. Kmeans doesn't provide density estimates, so what precisely would you want the contours to represent? Andrew <~~~~~~~~~~~~~~~~~~~~~~~~~~~> J. Andrew Howe, PhD LinkedIn Profile ResearchGate Profile Open Researcher and Contributor ID (ORCID) Github Profile Personal Website I live to learn, so I can learn to live. - me <~~~~~~~~~~~~~~~~~~~~~~~~~~~> On Wed, Dec 9, 2020 at 9:41 AM Mahmood Naderan wrote: > Hi > I use the following code to highlight the cluster centers with some red > dots. > > kmeans = KMeans(n_clusters=6, init='k-means++', max_iter=100, n_init=10, > random_state=0) > pred_y = kmeans.fit_predict(a) > plt.scatter(a[:,0], a[:,1]) > plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], > s=100, c='red') > plt.show() > > I would like to know if it is possible to draw contours over the clusters. > Is there any way for that? > Please let me know if there is a function or option in KMeans. > > Regards, > Mahmood > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mahmood.nt at gmail.com Wed Dec 9 07:57:18 2020 From: mahmood.nt at gmail.com (Mahmood Naderan) Date: Wed, 9 Dec 2020 13:57:18 +0100 Subject: [scikit-learn] Drawing contours in KMeans In-Reply-To: References: Message-ID: I mean a circle/contour to group the points in a cluster for better representation. For example, if there are 6 six clusters, it will be more meaningful to group large data points in a circle or contour. Regards, Mahmood On Wed, Dec 9, 2020 at 11:49 AM Andrew Howe wrote: > Contours generally indicate a third variable - often a probability > density. Kmeans doesn't provide density estimates, so what precisely would > you want the contours to represent? > > Andrew > > <~~~~~~~~~~~~~~~~~~~~~~~~~~~> > J. Andrew Howe, PhD > LinkedIn Profile > ResearchGate Profile > Open Researcher and Contributor ID (ORCID) > > Github Profile > Personal Website > I live to learn, so I can learn to live. - me > <~~~~~~~~~~~~~~~~~~~~~~~~~~~> > > > On Wed, Dec 9, 2020 at 9:41 AM Mahmood Naderan > wrote: > >> Hi >> I use the following code to highlight the cluster centers with some red >> dots. 
>> >> kmeans = KMeans(n_clusters=6, init='k-means++', max_iter=100, n_init=10, >> random_state=0) >> pred_y = kmeans.fit_predict(a) >> plt.scatter(a[:,0], a[:,1]) >> plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], >> s=100, c='red') >> plt.show() >> >> I would like to know if it is possible to draw contours over the >> clusters. Is there any way for that? >> Please let me know if there is a function or option in KMeans. >> >> Regards, >> Mahmood >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ahowe42 at gmail.com Wed Dec 9 14:22:30 2020 From: ahowe42 at gmail.com (Andrew Howe) Date: Wed, 9 Dec 2020 19:22:30 +0000 Subject: [scikit-learn] Drawing contours in KMeans In-Reply-To: References: Message-ID: Ok, I see. Well the attached notebook demonstrates doing this by simply finding the maximum distance from each centroid to it's datapoints and drawing a circle using that radius. It's simple, but will hopefully at least point you in a useful direction. [image: image.png] Andrew <~~~~~~~~~~~~~~~~~~~~~~~~~~~> J. Andrew Howe, PhD LinkedIn Profile ResearchGate Profile Open Researcher and Contributor ID (ORCID) Github Profile Personal Website I live to learn, so I can learn to live. - me <~~~~~~~~~~~~~~~~~~~~~~~~~~~> On Wed, Dec 9, 2020 at 12:59 PM Mahmood Naderan wrote: > I mean a circle/contour to group the points in a cluster for better > representation. > For example, if there are 6 six clusters, it will be more meaningful to > group large data points in a circle or contour. > > Regards, > Mahmood > > > > > On Wed, Dec 9, 2020 at 11:49 AM Andrew Howe wrote: > >> Contours generally indicate a third variable - often a probability >> density. Kmeans doesn't provide density estimates, so what precisely would >> you want the contours to represent? >> >> Andrew >> >> <~~~~~~~~~~~~~~~~~~~~~~~~~~~> >> J. Andrew Howe, PhD >> LinkedIn Profile >> ResearchGate Profile >> Open Researcher and Contributor ID (ORCID) >> >> Github Profile >> Personal Website >> I live to learn, so I can learn to live. - me >> <~~~~~~~~~~~~~~~~~~~~~~~~~~~> >> >> >> On Wed, Dec 9, 2020 at 9:41 AM Mahmood Naderan >> wrote: >> >>> Hi >>> I use the following code to highlight the cluster centers with some red >>> dots. >>> >>> kmeans = KMeans(n_clusters=6, init='k-means++', max_iter=100, n_init=10, >>> random_state=0) >>> pred_y = kmeans.fit_predict(a) >>> plt.scatter(a[:,0], a[:,1]) >>> plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, >>> 1], s=100, c='red') >>> plt.show() >>> >>> I would like to know if it is possible to draw contours over the >>> clusters. Is there any way for that? >>> Please let me know if there is a function or option in KMeans. 
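Since the notebook attached above does not survive in the archive, the
following is a rough sketch of the approach described there (one circle per
cluster, with radius equal to the largest distance from the centroid to its
assigned points); the make_blobs data only stands in for the original array
`a`, and any (n_samples, 2) array can be used instead.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Stand-in for the original data `a`.
a, _ = make_blobs(n_samples=300, centers=6, random_state=0)

kmeans = KMeans(n_clusters=6, init='k-means++', max_iter=100, n_init=10,
                random_state=0)
labels = kmeans.fit_predict(a)
centers = kmeans.cluster_centers_

fig, ax = plt.subplots()
ax.scatter(a[:, 0], a[:, 1], c=labels)
ax.scatter(centers[:, 0], centers[:, 1], s=100, c='red')
for k, c in enumerate(centers):
    # Radius = largest distance from the centroid to a point of cluster k.
    r = np.linalg.norm(a[labels == k] - c, axis=1).max()
    ax.add_patch(plt.Circle(c, r, fill=False, linestyle='--'))
ax.set_aspect('equal')   # so the circles are drawn as circles
plt.show()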
>>> Regards,
>>> Mahmood

-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 44525 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: KMeans_Cluster_Circle.ipynb
Type: application/octet-stream
Size: 27501 bytes
Desc: not available
URL: 

From helmrp at yahoo.com  Wed Dec  9 14:53:06 2020
From: helmrp at yahoo.com (The Helmbolds)
Date: Wed, 9 Dec 2020 19:53:06 +0000 (UTC)
Subject: [scikit-learn] Drawing contours in KMeans
In-Reply-To: 
References: 
Message-ID: <694862547.3937714.1607543587026 at mail.yahoo.com>

Mebbe principal components analysis would suggest an ellipsoid containing
"most" of the points in a "cloud".

"You won't find the right answers if you don't ask the right questions!"
(Robert Helmbold, 2013)

On Wednesday, December 9, 2020, 12:22:49 PM MST, Andrew Howe wrote:

Ok, I see. Well the attached notebook demonstrates doing this by simply
finding the maximum distance from each centroid to it's datapoints and
drawing a circle using that radius. It's simple, but will hopefully at
least point you in a useful direction.

Andrew

<~~~~~~~~~~~~~~~~~~~~~~~~~~~>
J. Andrew Howe, PhD
LinkedIn Profile
ResearchGate Profile
Open Researcher and Contributor ID (ORCID)
Github Profile
Personal Website
I live to learn, so I can learn to live. - me
<~~~~~~~~~~~~~~~~~~~~~~~~~~~>

On Wed, Dec 9, 2020 at 12:59 PM Mahmood Naderan wrote:

I mean a circle/contour to group the points in a cluster for better
representation.
For example, if there are 6 six clusters, it will be more meaningful to
group large data points in a circle or contour.

Regards,
Mahmood

On Wed, Dec 9, 2020 at 11:49 AM Andrew Howe wrote:

Contours generally indicate a third variable - often a probability
density. Kmeans doesn't provide density estimates, so what precisely would
you want the contours to represent?

Andrew

<~~~~~~~~~~~~~~~~~~~~~~~~~~~>
J. Andrew Howe, PhD
LinkedIn Profile
ResearchGate Profile
Open Researcher and Contributor ID (ORCID)
Github Profile
Personal Website
I live to learn, so I can learn to live. - me
<~~~~~~~~~~~~~~~~~~~~~~~~~~~>

On Wed, Dec 9, 2020 at 9:41 AM Mahmood Naderan wrote:

Hi
I use the following code to highlight the cluster centers with some red
dots.

kmeans = KMeans(n_clusters=6, init='k-means++', max_iter=100, n_init=10,
random_state=0)
pred_y = kmeans.fit_predict(a)
plt.scatter(a[:,0], a[:,1])
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
s=100, c='red')
plt.show()

I would like to know if it is possible to draw contours over the
clusters. Is there any way for that?
Please let me know if there is a function or option in KMeans.
Regards, Mahmood _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 44525 bytes Desc: not available URL: From mahmood.nt at gmail.com Wed Dec 9 15:06:19 2020 From: mahmood.nt at gmail.com (Mahmood Naderan) Date: Wed, 9 Dec 2020 21:06:19 +0100 Subject: [scikit-learn] Drawing contours in KMeans In-Reply-To: <694862547.3937714.1607543587026@mail.yahoo.com> References: <694862547.3937714.1607543587026@mail.yahoo.com> Message-ID: >Mebbe principal components analysis would suggest an >ellipsoid containing "most" of the points in a "cloud". Sorry I didn't understand. Can you explain more? Regards, Mahmood On Wed, Dec 9, 2020 at 8:55 PM The Helmbolds via scikit-learn < scikit-learn at python.org> wrote: > [scikit-learn] Drawing contours in KMeans4 > > > Mebbe principal components analysis would suggest an ellipsoid containing > "most" of the points in a "cloud". > > > > > "You won't find the right answers if you don't ask the right questions!" > (Robert Helmbold, 2013) > > > On Wednesday, December 9, 2020, 12:22:49 PM MST, Andrew Howe < > ahowe42 at gmail.com> wrote: > > > Ok, I see. Well the attached notebook demonstrates doing this by simply > finding the maximum distance from each centroid to it's datapoints and > drawing a circle using that radius. It's simple, but will hopefully at > least point you in a useful direction. > [image: image.png] > Andrew > > <~~~~~~~~~~~~~~~~~~~~~~~~~~~> > J. Andrew Howe, PhD > LinkedIn Profile > ResearchGate Profile > Open Researcher and Contributor ID (ORCID) > > Github Profile > Personal Website > I live to learn, so I can learn to live. - me > <~~~~~~~~~~~~~~~~~~~~~~~~~~~> > > > On Wed, Dec 9, 2020 at 12:59 PM Mahmood Naderan > wrote: > > I mean a circle/contour to group the points in a cluster for better > representation. > For example, if there are 6 six clusters, it will be more meaningful to > group large data points in a circle or contour. > > Regards, > Mahmood > > > > > On Wed, Dec 9, 2020 at 11:49 AM Andrew Howe wrote: > > Contours generally indicate a third variable - often a probability > density. Kmeans doesn't provide density estimates, so what precisely would > you want the contours to represent? > > Andrew > > <~~~~~~~~~~~~~~~~~~~~~~~~~~~> > J. Andrew Howe, PhD > LinkedIn Profile > ResearchGate Profile > Open Researcher and Contributor ID (ORCID) > > Github Profile > Personal Website > I live to learn, so I can learn to live. - me > <~~~~~~~~~~~~~~~~~~~~~~~~~~~> > > > On Wed, Dec 9, 2020 at 9:41 AM Mahmood Naderan > wrote: > > Hi > I use the following code to highlight the cluster centers with some red > dots. 
> > kmeans = KMeans(n_clusters=6, init='k-means++', max_iter=100, n_init=10, > random_state=0) > pred_y = kmeans.fit_predict(a) > plt.scatter(a[:,0], a[:,1]) > plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], > s=100, c='red') > plt.show() > > I would like to know if it is possible to draw contours over the clusters. > Is there any way for that? > Please let me know if there is a function or option in KMeans. > > Regards, > Mahmood > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 44525 bytes Desc: not available URL: From jbbrown at kuhp.kyoto-u.ac.jp Wed Dec 9 15:40:15 2020 From: jbbrown at kuhp.kyoto-u.ac.jp (Brown J.B.) Date: Wed, 9 Dec 2020 21:40:15 +0100 Subject: [scikit-learn] Drawing contours in KMeans In-Reply-To: References: <694862547.3937714.1607543587026@mail.yahoo.com> Message-ID: Dear Mahmood, Andrew's solution with a circle will guarantee you render an image in which every point is covered within some circle. However, if data contains outliers or artifacts, you might get circles which are excessively large and distort the image you want. For example, imagine if there were a single red point in Andrew's image at the coordinate (3,10); then, the resulting circle would cover all points in the entire plot, which is unlikely what you want. You could potentially generate a density estimate for each class and then have matplotlib render the contour lines (e.g., solutions of where estimates have a specific value), but as was said, this is not the job of Kmeans, but rather of general data analysis. The ellipsoid solution proposed to you is, in a sense, a middle ground between these two solutions (the circles and the density plots). You could adjust the (4 or 5) parameters of an ellipsoid to cover "most" of the points for a particular class and tolerate that the ellipsoids don't cover a few outliers or artifacts (e.g., the coordinate (3,10) I mentioned above). The resulting functional forms of the ellipses might be more precise than circles and less complex than density contours, and might lead to actionable knowledge depending on your context/domain. Hope this helps. J.B. Brown 2020?12?9?(?) 21:08 Mahmood Naderan : > >Mebbe principal components analysis would suggest an > >ellipsoid containing "most" of the points in a "cloud". > > Sorry I didn't understand. Can you explain more? 
> Regards,
> Mahmood
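A rough sketch of the ellipsoid idea suggested above (illustrative code on synthetic make_blobs data, not code from the thread or from the scrubbed notebook): estimate the covariance of each cluster's points (essentially a per-cluster PCA) and draw an ellipse at a chosen number of standard deviations, so that it covers "most" of the cluster while leaving outliers outside. The 2-sigma level below is an arbitrary choice.

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# synthetic stand-in for the data used in the thread
X, _ = make_blobs(n_samples=300, centers=6, random_state=0)
kmeans = KMeans(n_clusters=6, init='k-means++', n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

fig, ax = plt.subplots()
ax.scatter(X[:, 0], X[:, 1], c=labels, s=10)
ax.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
           s=100, c='red')

n_std = 2.0   # a 2-sigma ellipse covers "most" of a roughly Gaussian cluster
for k in range(kmeans.n_clusters):
    pts = X[labels == k]
    if len(pts) < 3:
        continue                      # too few points to estimate a covariance
    cov = np.cov(pts, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    # orientation of the major axis = eigenvector of the largest eigenvalue
    angle = np.degrees(np.arctan2(eigvecs[1, -1], eigvecs[0, -1]))
    width, height = 2 * n_std * np.sqrt(eigvals[::-1])
    ax.add_patch(Ellipse(xy=pts.mean(axis=0), width=width, height=height,
                         angle=angle, fill=False, edgecolor='black',
                         linestyle='--'))
plt.show()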
>> >> Regards, >> Mahmood >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 44525 bytes Desc: not available URL: From abhishek.ghose.82 at gmail.com Wed Dec 9 16:21:27 2020 From: abhishek.ghose.82 at gmail.com (Abhishek Ghose) Date: Wed, 9 Dec 2020 13:21:27 -0800 Subject: [scikit-learn] Drawing contours in KMeans In-Reply-To: References: <694862547.3937714.1607543587026@mail.yahoo.com> Message-ID: Hi, A quick way I use is to draw a convex hull (scipy) around the points in a cluster. Here's a short example - k-means with k=2 is run on synthetic data: from sklearn.datasets import make_blobs from sklearn.cluster import KMeans from matplotlib import pyplot as plt from scipy.spatial import ConvexHull X, _ = make_blobs(centers=2) kmeans = KMeans(n_clusters=2, random_state=0).fit(X) # uncomment the next line if you're using a notebook #%matplotlib inline for label in set(kmeans.labels_): X_clust = X[kmeans.labels_==label] hull = ConvexHull(X_clust, qhull_options='QJ') vertices_cycle = hull.vertices.tolist() vertices_cycle.append(hull.vertices[0]) plt.plot(X_clust[vertices_cycle, 0], X_clust[vertices_cycle, 1], 'k--', lw=1) plt.scatter(X_clust[:, 0], X_clust[:, 1]) Note: 1. You can still have overlaps between boundaries - but I think this is a good effort-to-results tradeoff. 2. To draw a closed boundary, you'd need to add the first vertex to the list returned by the hull function - the above code does that. 3. You'd need to handle the case for clusters with <=2 points explicitly - not shown in the above code. 4. I use the "QJ" option (other options at the qhull library page, which scipy internally uses: http://www.qhull.org/html/qh-optq.htm) to joggle the points a bit when they lie on a line. Regards On Wed, Dec 9, 2020 at 12:41 PM Brown J.B. via scikit-learn < scikit-learn at python.org> wrote: > Dear Mahmood, > > Andrew's solution with a circle will guarantee you render an image in > which every point is covered within some circle. > > However, if data contains outliers or artifacts, you might get circles > which are excessively large and distort the image you want. > For example, imagine if there were a single red point in Andrew's image at > the coordinate (3,10); then, the resulting circle would cover all points in > the entire plot, which is unlikely what you want. 
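For reference, since the KMeans_Cluster_Circle.ipynb attachment was scrubbed by the list, the maximum-radius idea Andrew describes can be sketched roughly as follows (again on synthetic make_blobs data, so only the shape of the code matters). The caveat above applies: a single far-away point inflates the whole circle.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=6, random_state=0)
kmeans = KMeans(n_clusters=6, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

fig, ax = plt.subplots()
ax.scatter(X[:, 0], X[:, 1], c=labels, s=10)
ax.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
           s=100, c='red')

for k, center in enumerate(kmeans.cluster_centers_):
    pts = X[labels == k]
    # radius = largest distance from the centroid to any point of its cluster
    radius = np.max(np.linalg.norm(pts - center, axis=1))
    ax.add_patch(plt.Circle(center, radius, fill=False, edgecolor='black',
                            linestyle='--'))

ax.set_aspect('equal')   # so the circles are not drawn distorted
plt.show()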
> You could potentially generate a density estimate for each class and then > have matplotlib render the contour lines (e.g., solutions of where > estimates have a specific value), but as was said, this is not the job of > Kmeans, but rather of general data analysis. > > The ellipsoid solution proposed to you is, in a sense, a middle ground > between these two solutions (the circles and the density plots). > You could adjust the (4 or 5) parameters of an ellipsoid to cover "most" > of the points for a particular class and tolerate that the ellipsoids don't > cover a few outliers or artifacts (e.g., the coordinate (3,10) I mentioned > above). > The resulting functional forms of the ellipses might be more precise than > circles and less complex than density contours, and might lead to > actionable knowledge depending on your context/domain. > > Hope this helps. > J.B. Brown > > 2020?12?9?(?) 21:08 Mahmood Naderan : > >> >Mebbe principal components analysis would suggest an >> >ellipsoid containing "most" of the points in a "cloud". >> >> Sorry I didn't understand. Can you explain more? >> Regards, >> Mahmood >> >> >> >> >> On Wed, Dec 9, 2020 at 8:55 PM The Helmbolds via scikit-learn < >> scikit-learn at python.org> wrote: >> >>> [scikit-learn] Drawing contours in KMeans4 >>> >>> >>> Mebbe principal components analysis would suggest an ellipsoid >>> containing "most" of the points in a "cloud". >>> >>> >>> >>> >>> "You won't find the right answers if you don't ask the right questions!" >>> (Robert Helmbold, 2013) >>> >>> >>> On Wednesday, December 9, 2020, 12:22:49 PM MST, Andrew Howe < >>> ahowe42 at gmail.com> wrote: >>> >>> >>> Ok, I see. Well the attached notebook demonstrates doing this by simply >>> finding the maximum distance from each centroid to it's datapoints and >>> drawing a circle using that radius. It's simple, but will hopefully at >>> least point you in a useful direction. >>> [image: image.png] >>> Andrew >>> >>> <~~~~~~~~~~~~~~~~~~~~~~~~~~~> >>> J. Andrew Howe, PhD >>> LinkedIn Profile >>> ResearchGate Profile >>> Open Researcher and Contributor ID (ORCID) >>> >>> Github Profile >>> Personal Website >>> I live to learn, so I can learn to live. - me >>> <~~~~~~~~~~~~~~~~~~~~~~~~~~~> >>> >>> >>> On Wed, Dec 9, 2020 at 12:59 PM Mahmood Naderan >>> wrote: >>> >>> I mean a circle/contour to group the points in a cluster for better >>> representation. >>> For example, if there are 6 six clusters, it will be more meaningful to >>> group large data points in a circle or contour. >>> >>> Regards, >>> Mahmood >>> >>> >>> >>> >>> On Wed, Dec 9, 2020 at 11:49 AM Andrew Howe wrote: >>> >>> Contours generally indicate a third variable - often a probability >>> density. Kmeans doesn't provide density estimates, so what precisely would >>> you want the contours to represent? >>> >>> Andrew >>> >>> <~~~~~~~~~~~~~~~~~~~~~~~~~~~> >>> J. Andrew Howe, PhD >>> LinkedIn Profile >>> ResearchGate Profile >>> Open Researcher and Contributor ID (ORCID) >>> >>> Github Profile >>> Personal Website >>> I live to learn, so I can learn to live. - me >>> <~~~~~~~~~~~~~~~~~~~~~~~~~~~> >>> >>> >>> On Wed, Dec 9, 2020 at 9:41 AM Mahmood Naderan >>> wrote: >>> >>> Hi >>> I use the following code to highlight the cluster centers with some red >>> dots. 
>>> >>> kmeans = KMeans(n_clusters=6, init='k-means++', max_iter=100, n_init=10, >>> random_state=0) >>> pred_y = kmeans.fit_predict(a) >>> plt.scatter(a[:,0], a[:,1]) >>> plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, >>> 1], s=100, c='red') >>> plt.show() >>> >>> I would like to know if it is possible to draw contours over the >>> clusters. Is there any way for that? >>> Please let me know if there is a function or option in KMeans. >>> >>> Regards, >>> Mahmood >>> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -- Computers: The eventual realization of Douglas Adams' musings - the world depends on machines controlled by mice. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 44525 bytes Desc: not available URL: From abhishek.ghose.82 at gmail.com Wed Dec 9 19:25:38 2020 From: abhishek.ghose.82 at gmail.com (Abhishek Ghose) Date: Wed, 9 Dec 2020 16:25:38 -0800 Subject: [scikit-learn] Drawing contours in KMeans In-Reply-To: References: <694862547.3937714.1607543587026@mail.yahoo.com> Message-ID: Sorry, just noticed that I had forgotten to attach a sample image. Regards On Wed, Dec 9, 2020 at 1:21 PM Abhishek Ghose wrote: > Hi, > > A quick way I use is to draw a convex hull (scipy) around the points in a > cluster. > Here's a short example - k-means with k=2 is run on synthetic data: > > from sklearn.datasets import make_blobs > from sklearn.cluster import KMeans > from matplotlib import pyplot as plt > from scipy.spatial import ConvexHull > > > X, _ = make_blobs(centers=2) > kmeans = KMeans(n_clusters=2, random_state=0).fit(X) > > # uncomment the next line if you're using a notebook > #%matplotlib inline > for label in set(kmeans.labels_): > X_clust = X[kmeans.labels_==label] > hull = ConvexHull(X_clust, qhull_options='QJ') > vertices_cycle = hull.vertices.tolist() > vertices_cycle.append(hull.vertices[0]) > plt.plot(X_clust[vertices_cycle, 0], X_clust[vertices_cycle, 1], > 'k--', lw=1) > plt.scatter(X_clust[:, 0], X_clust[:, 1]) > > Note: > > 1. You can still have overlaps between boundaries - but I think this > is a good effort-to-results tradeoff. > 2. 
To draw a closed boundary, you'd need to add the first vertex to > the list returned by the hull function - the above code does that. > 3. You'd need to handle the case for clusters with <=2 points > explicitly - not shown in the above code. > 4. I use the "QJ" option (other options at the qhull library page, > which scipy internally uses: http://www.qhull.org/html/qh-optq.htm) to > joggle the points a bit when they lie on a line. > > Regards > > > On Wed, Dec 9, 2020 at 12:41 PM Brown J.B. via scikit-learn < > scikit-learn at python.org> wrote: > >> Dear Mahmood, >> >> Andrew's solution with a circle will guarantee you render an image in >> which every point is covered within some circle. >> >> However, if data contains outliers or artifacts, you might get circles >> which are excessively large and distort the image you want. >> For example, imagine if there were a single red point in Andrew's image >> at the coordinate (3,10); then, the resulting circle would cover all points >> in the entire plot, which is unlikely what you want. >> You could potentially generate a density estimate for each class and then >> have matplotlib render the contour lines (e.g., solutions of where >> estimates have a specific value), but as was said, this is not the job of >> Kmeans, but rather of general data analysis. >> >> The ellipsoid solution proposed to you is, in a sense, a middle ground >> between these two solutions (the circles and the density plots). >> You could adjust the (4 or 5) parameters of an ellipsoid to cover "most" >> of the points for a particular class and tolerate that the ellipsoids don't >> cover a few outliers or artifacts (e.g., the coordinate (3,10) I mentioned >> above). >> The resulting functional forms of the ellipses might be more precise than >> circles and less complex than density contours, and might lead to >> actionable knowledge depending on your context/domain. >> >> Hope this helps. >> J.B. Brown >> >> 2020?12?9?(?) 21:08 Mahmood Naderan : >> >>> >Mebbe principal components analysis would suggest an >>> >ellipsoid containing "most" of the points in a "cloud". >>> >>> Sorry I didn't understand. Can you explain more? >>> Regards, >>> Mahmood >>> >>> >>> >>> >>> On Wed, Dec 9, 2020 at 8:55 PM The Helmbolds via scikit-learn < >>> scikit-learn at python.org> wrote: >>> >>>> [scikit-learn] Drawing contours in KMeans4 >>>> >>>> >>>> Mebbe principal components analysis would suggest an ellipsoid >>>> containing "most" of the points in a "cloud". >>>> >>>> >>>> >>>> >>>> "You won't find the right answers if you don't ask the right >>>> questions!" (Robert Helmbold, 2013) >>>> >>>> >>>> On Wednesday, December 9, 2020, 12:22:49 PM MST, Andrew Howe < >>>> ahowe42 at gmail.com> wrote: >>>> >>>> >>>> Ok, I see. Well the attached notebook demonstrates doing this by simply >>>> finding the maximum distance from each centroid to it's datapoints and >>>> drawing a circle using that radius. It's simple, but will hopefully at >>>> least point you in a useful direction. >>>> [image: image.png] >>>> Andrew >>>> >>>> <~~~~~~~~~~~~~~~~~~~~~~~~~~~> >>>> J. Andrew Howe, PhD >>>> LinkedIn Profile >>>> ResearchGate Profile >>>> Open Researcher and Contributor ID (ORCID) >>>> >>>> Github Profile >>>> Personal Website >>>> I live to learn, so I can learn to live. - me >>>> <~~~~~~~~~~~~~~~~~~~~~~~~~~~> >>>> >>>> >>>> On Wed, Dec 9, 2020 at 12:59 PM Mahmood Naderan >>>> wrote: >>>> >>>> I mean a circle/contour to group the points in a cluster for better >>>> representation. 
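The density-estimate route quoted above can also be sketched with scipy.stats.gaussian_kde: fit one kernel density estimate per cluster and let matplotlib draw a single contour line of it. The level used below (10% of each cluster's peak density) is an arbitrary illustrative choice, and the data is again synthetic rather than the data from the thread.

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=6, random_state=0)
kmeans = KMeans(n_clusters=6, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# evaluation grid covering the data range
xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
                     np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200))
grid = np.vstack([xx.ravel(), yy.ravel()])

plt.scatter(X[:, 0], X[:, 1], c=labels, s=10)
for k in range(kmeans.n_clusters):
    pts = X[labels == k]
    # gaussian_kde needs more points than dimensions and a non-singular covariance
    kde = gaussian_kde(pts.T)
    density = kde(grid).reshape(xx.shape)
    # one contour line per cluster, at 10% of that cluster's peak density
    plt.contour(xx, yy, density, levels=[0.1 * density.max()],
                colors='k', linewidths=1)
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            s=100, c='red')
plt.show()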
-- 
Computers: The eventual realization of Douglas Adams' musings - the world depends on machines controlled by mice.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: kmeans_convexhull.PNG
Type: image/png
Size: 29519 bytes
Desc: not available
URL: 

From mahmood.nt at gmail.com  Mon Dec 14 11:31:24 2020
From: mahmood.nt at gmail.com (Mahmood Naderan)
Date: Mon, 14 Dec 2020 17:31:24 +0100
Subject: [scikit-learn] Using text labels in dendrogram
Message-ID: 

Hi
I use the following code to create a dendrogram from a set of x-y points:

import matplotlib.pyplot as plt
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

a = np.array([
    [5.840, -2.339],
    [6.320, -2.665],
    [-1.698, -0.084],
])
linked = linkage(a, 'single')
labelList = range(1, 69)
dendrogram(linked,
           orientation='top',
           labels=labelList,
           distance_sort='descending',
           show_leaf_counts=True)
plt.show()

This code automatically assigns numeric labels to the points, and the leaf order is not sorted; for example, the x-axis labels come out as 2 1 3.
I want to assign a text label to each point, so that wherever the point ends up in the dendrogram I can see the text I assigned to it, for example PDD, XDD, BDD.
How can I create a list of text labels and bind it to the array a?

Regards,
Mahmood

From mahmood.nt at gmail.com  Tue Dec 15 01:36:57 2020
From: mahmood.nt at gmail.com (Mahmood Naderan)
Date: Tue, 15 Dec 2020 07:36:57 +0100
Subject: [scikit-learn] Using text labels in dendrogram
In-Reply-To: 
References: 
Message-ID: 

OK. After checking the documentation, I see that I have to replace the labelList line with the list of strings I want, e.g. labelList = ['PDD', 'XDD', 'BDD'], with one label per row of a; dendrogram() then keeps each label attached to its point no matter where the leaf ends up.

Regards,
Mahmood

On Mon, Dec 14, 2020 at 5:31 PM Mahmood Naderan wrote:

> Hi
> I use the following code to create a dendrogram from a set of x-y points
> [...]
> How can I create a list of text labels and bind it to the array a?
>
> Regards,
> Mahmood

From adrin.jalali at gmail.com  Thu Dec 17 02:10:04 2020
From: adrin.jalali at gmail.com (Adrin)
Date: Thu, 17 Dec 2020 08:10:04 +0100
Subject: [scikit-learn] Daily failure: Run failed: Wheel builder - master (1e53ae2)
In-Reply-To: 
References: 
Message-ID: 

Hi,

I keep getting this email every night, sending it here to make sure we're aware of it.

Cheers,
Adrin

---------- Forwarded message ---------
From: Adrin Jalali
Date: Thu., Dec.
17, 2020, 05:21 Subject: [adrinjalali/scikit-learn] Run failed: Wheel builder - master (1e53ae2) To: adrinjalali/scikit-learn Cc: Ci activity [image: GitHub] [adrinjalali/scikit-learn] Wheel builder workflow run Wheel builder: Some jobs were not successful View workflow run [image: Check build trigger] *Wheel builder* / Check build trigger Succeeded in 17 seconds [image: Build wheel for cp36-win32] *Wheel builder* / Build wheel for cp36-win32 Succeeded in 21 minutes [image: Build wheel for cp36-win_amd64] *Wheel builder* / Build wheel for cp36-win_amd64 Succeeded in 20 minutes and 3 seconds [image: Build wheel for cp37-win32] *Wheel builder* / Build wheel for cp37-win32 Succeeded in 19 minutes and 20 seconds [image: Build wheel for cp37-win_amd64] *Wheel builder* / Build wheel for cp37-win_amd64 Succeeded in 22 minutes and 28 seconds [image: Build wheel for cp38-win32] *Wheel builder* / Build wheel for cp38-win32 Succeeded in 19 minutes and 39 seconds [image: Build wheel for cp38-win_amd64] *Wheel builder* / Build wheel for cp38-win_amd64 Succeeded in 21 minutes and 21 seconds [image: Build wheel for cp39-win32] *Wheel builder* / Build wheel for cp39-win32 Succeeded in 20 minutes and 37 seconds [image: Build wheel for cp39-win_amd64] *Wheel builder* / Build wheel for cp39-win_amd64 Succeeded in 24 minutes and 7 seconds [image: Build wheel for cp36-manylinux_i686] *Wheel builder* / Build wheel for cp36-manylinux_i686 Succeeded in 17 minutes and 50 seconds View all 23 job statuses ? You are receiving this because this workflow ran on your branch. Manage your GitHub Actions notifications GitHub, Inc. ?88 Colin P Kelly Jr Street ?San Francisco, CA 94107 -------------- next part -------------- An HTML attachment was scrubbed... URL: From loic.esteve at ymail.com Thu Dec 17 02:42:36 2020 From: loic.esteve at ymail.com (=?utf-8?B?TG/Dr2MgRXN0w6h2ZQ==?=) Date: Thu, 17 Dec 2020 08:42:36 +0100 Subject: [scikit-learn] Daily failure: Run failed: Wheel builder - master (1e53ae2) In-Reply-To: References: Message-ID: Me too, as a quick fix I reverted my fork (lesteve/scikit-learn) master branch to an old version to avoid this ... The better fix would be to modify the Github Action so that it does not run in forks but only in the main repo i.e. scikit-learn/scikit-learn. Cheers, Lo?c > Hi, > > I keep getting this email every night, sending it here to make sure we're aware of it. > > Cheers, > Adrin > > ---------- Forwarded message --------- > From: Adrin Jalali > Date: Thu., Dec. 
17, 2020, 05:21 > Subject: [adrinjalali/scikit-learn] Run failed: Wheel builder - master (1e53ae2) > To: adrinjalali/scikit-learn > Cc: Ci activity > > > GitHub > > [adrinjalali/scikit-learn] Wheel builder workflow run > > * > > > Wheel builder: Some jobs were not successful > > > View workflow run > > Check build trigger Wheel builder / Check build trigger > Succeeded in 17 seconds > > Build wheel for cp36-win32 Wheel builder / Build wheel for cp36-win32 > Succeeded in 21 minutes > > Build wheel for cp36-win_amd64 Wheel builder / Build wheel for cp36-win_amd64 > Succeeded in 20 minutes and 3 seconds > > Build wheel for cp37-win32 Wheel builder / Build wheel for cp37-win32 > Succeeded in 19 minutes and 20 seconds > > Build wheel for cp37-win_amd64 Wheel builder / Build wheel for cp37-win_amd64 > Succeeded in 22 minutes and 28 seconds > > Build wheel for cp38-win32 Wheel builder / Build wheel for cp38-win32 > Succeeded in 19 minutes and 39 seconds > > Build wheel for cp38-win_amd64 Wheel builder / Build wheel for cp38-win_amd64 > Succeeded in 21 minutes and 21 seconds > > Build wheel for cp39-win32 Wheel builder / Build wheel for cp39-win32 > Succeeded in 20 minutes and 37 seconds > > Build wheel for cp39-win_amd64 Wheel builder / Build wheel for cp39-win_amd64 > Succeeded in 24 minutes and 7 seconds > > Build wheel for cp36-manylinux_i686 Wheel builder / Build wheel for cp36-manylinux_i686 > Succeeded in 17 minutes and 50 seconds > > View all 23 job statuses > > > > ? > You are receiving this because this workflow ran on your branch. > Manage your GitHub Actions notifications > > > GitHub, Inc. ?88 Colin P Kelly Jr Street ?San Francisco, CA 94107 > GitHub > * > Check build trigger > Build wheel for cp36-win32 > Build wheel for cp36-win_amd64 > Build wheel for cp37-win32 > Build wheel for cp37-win_amd64 > Build wheel for cp38-win32 > Build wheel for cp38-win_amd64 > Build wheel for cp39-win32 > Build wheel for cp39-win_amd64 > Build wheel for cp36-manylinux_i686 From g.lemaitre58 at gmail.com Tue Dec 22 12:11:05 2020 From: g.lemaitre58 at gmail.com (=?UTF-8?Q?Guillaume_Lema=C3=AEtre?=) Date: Tue, 22 Dec 2020 18:11:05 +0100 Subject: [scikit-learn] ANN scikit-learn 0.24.0 release Message-ID: We're happy to announce the 0.24.0 release and already out on PyPI and conda-forge. You can read the release highlights under https://scikit-learn.org/stable/auto_examples/release_highlights/plot_release_highlights_0_24_0.html and the long version of the change log under https://scikit-learn.org/stable/whats_new/v0.24.html#version-0-24-0 New major features include: Highlights include successive halving, categorical support for GBRT, SelfTrainingClassifier, SequentialFeatureSelector and much more! This version supports Python versions 3.6 to 3.9. You can give it a go using `pip install -U scikit-learn` or `conda install -c conda-forge scikit-learn`. A big thanks to all contributors for making this release possible. Regards, On the behalf of the scikit-learn maintainer team. -- Guillaume Lemaitre Scikit-learn @ Inria Foundation https://glemaitre.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From joel.nothman at gmail.com Tue Dec 22 22:00:44 2020 From: joel.nothman at gmail.com (Joel Nothman) Date: Wed, 23 Dec 2020 14:00:44 +1100 Subject: [scikit-learn] ANN scikit-learn 0.24.0 release In-Reply-To: References: Message-ID: Thanks and congrats to all involved! Some very helpful features. And rumour has it we might have version 1.0 in 2021... 
:-o

On Wed, 23 Dec 2020, 4:12 am Guillaume Lemaître wrote:

> We're happy to announce the 0.24.0 release and already out on PyPI and
> conda-forge.
> [...]
> A big thanks to all contributors for making this release possible.
>
> Regards,
> On the behalf of the scikit-learn maintainer team.

From mahmood.nt at gmail.com  Fri Dec 25 09:16:02 2020
From: mahmood.nt at gmail.com (Mahmood Naderan)
Date: Fri, 25 Dec 2020 15:16:02 +0100
Subject: [scikit-learn] Comparing Scikit and Xlstat for PCA analysis
Message-ID: 

Hi
I have a test CSV file and I have written some code to run PCA on it. I also use another tool in Excel (XLSTAT) to compare the results.
XLSTAT determines the number of factors automatically, whereas, as far as I understand, I have to specify how many components are needed when using the scikit-learn package. For example, XLSTAT reports five factors:

Factor scores:
        F1      F2      F3      F4      F5
A1  -1.293  -0.663  -0.462  -0.713   0.010
A2  -0.297   0.293  -1.429   0.397   0.056
A3   2.328   0.069   0.987  -0.108   0.062
A4  -0.556  -2.273   0.538   0.344  -0.032
A5   1.823   0.775  -0.597  -0.052  -0.085
A6  -2.005   1.799   0.963   0.133  -0.011

In the following code, I specified 2 components:

x = StandardScaler().fit_transform(x)
pca = PCA(n_components=2)
principalComponents = pca.fit_transform(x)
print( principalComponents )

[[-1.29292842  0.66325508]
 [-0.29706395 -0.29346337]
 [ 2.32751305 -0.06850045]
 [-0.5558091   2.27288988]
 [ 1.82312052 -0.77527304]
 [-2.0048321  -1.7989081 ]]

As you can see, the first column from XLSTAT and from scikit-learn is the same; however, the second column is negated. For example, considering F1 and F2, we see

XLSTAT => -1.293  -0.663
scikit => [-1.29292842  0.66325508]

So, my questions are
1) Isn't there any way to use scikit-learn with an unknown number of principal components, so that I can query the number of components afterwards and then use a scree plot?
2) Considering F1 and F2 as an X-Y scatter point, why do the Y values from XLSTAT and scikit-learn have opposite signs?

The code I wrote is available at https://pastebin.com/ghJQ6L4C
Any ideas?

Regards,
Mahmood

From davidcee at hotmail.com  Sun Dec 27 21:36:59 2020
From: davidcee at hotmail.com (DAVID cofield)
Date: Mon, 28 Dec 2020 02:36:59 +0000
Subject: [scikit-learn] Interpreting results of random forest classifier
Message-ID: 

Could someone explain what the probability values provided by the random forest classifier represent?
When I run the classifier with two classes, I get prediction values, and associated with these prediction values are probabilities.
As I understand it now, the 0 probability is the probability that the prediction is wrong, and the 1 probability is the probability that the prediction is correct. One thing I do not understand is why the probability ranges do not go from 0 to 100; they go from 0 to 49 for the 0 probability and from 49 to 100 for the 1 probability. How do I interpret the probabilities?

Sent from Mail for Windows 10

From marmochiaskl at gmail.com  Mon Dec 28 06:31:30 2020
From: marmochiaskl at gmail.com (Chiara Marmo)
Date: Mon, 28 Dec 2020 12:31:30 +0100
Subject: [scikit-learn] Monthly meeting January 4th 2021
Message-ID: 

Dear list,

The first 2021 scikit-learn monthly meeting will take place on Monday January 4th at 8PM UTC:
https://www.timeanddate.com/worldclock/meetingdetails.html?year=2021&month=01&day=04&hour=20&min=0&sec=0&p1=179&p2=240&p3=195&p4=224

While these meetings are mainly for core-devs to discuss the current topics, we are also happy to welcome non-core devs and other project maintainers. Feel free to join, using the following link:
https://meet.google.com/xhq-yoga-rtf

If you plan to attend and you would like to discuss something specific about your contribution, please add your name (or github pseudo) in the "Contributors" section of the public pad:
https://hackmd.io/qtKt7pTNSXanU-MJOIMxbw

Best,
Chiara

From niourf at gmail.com  Mon Dec 28 07:41:46 2020
From: niourf at gmail.com (Nicolas Hug)
Date: Mon, 28 Dec 2020 12:41:46 +0000
Subject: [scikit-learn] Interpreting results of random forest classifier
In-Reply-To: 
References: 
Message-ID: <6906f0d9-d34a-f751-9975-3f3d8d2f3b80@gmail.com>

Hi David,

> As I understand it now, the 0 probability is the probability that the
> prediction is wrong, and the 1 probability is the probability that the
> prediction is correct

No: in binary classification, the `predict_proba` method returns, for each sample, a number in [0, 1] indicating the probability that the sample belongs to the positive class (1). In other words, (proba <= 0.5) iff (prediction == 0). The threshold is 0.5 since there are only 2 classes.

> One thing I do not understand is why the probability ranges do not go
> from 0 to 100; they go from 0 to 49 for the 0 probability and from 49 to
> 100 for the 1 probability

This is a correct observation and it's a direct consequence of the above definition. The way probabilities are computed is briefly described here:
https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier.predict_proba

Nicolas

On 12/28/20 2:36 AM, DAVID cofield wrote:

> Could someone explain what the probability values provided by the random
> forest classifier represent?
> [...]
> How do I interpret the probabilities?
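A small self-contained illustration of the behaviour Nicolas describes, on synthetic data (nothing here comes from the original message): predict_proba returns one column per class, the columns of each row sum to one, and predict corresponds to picking the class with the larger probability, i.e. a 0.5 threshold in the binary case.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X, y)

proba = clf.predict_proba(X[:5])   # shape (5, 2): P(class 0), P(class 1)
pred = clf.predict(X[:5])

print(proba)
print(proba.sum(axis=1))           # every row sums to 1
print(pred)

# In the binary case, predict() is equivalent to thresholding the
# probability of the positive class at 0.5:
print(np.all((proba[:, 1] > 0.5) == (pred == 1)))   # should print True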
From g.lemaitre58 at gmail.com  Mon Dec 28 10:28:06 2020
From: g.lemaitre58 at gmail.com (=?ISO-8859-1?Q?Guillaume_Lema=EEtre?=)
Date: Mon, 28 Dec 2020 16:28:06 +0100
Subject: [scikit-learn] Comparing Scikit and Xlstat for PCA analysis
In-Reply-To: 
Message-ID: 

Setting n_components='mle' is a strategy that will pick the number of components for you (leaving it as None keeps them all). The sign of a PC does not matter so much since the components are still orthogonal; the sign change will depend on the solver, which is probably different in the two programs.

Sent from my phone - sorry to be brief and for potential misspellings.

From mahmood.nt at gmail.com  Mon Dec 28 11:52:19 2020
From: mahmood.nt at gmail.com (Mahmood Naderan)
Date: Mon, 28 Dec 2020 17:52:19 +0100
Subject: [scikit-learn] Comparing Scikit and Xlstat for PCA analysis
In-Reply-To: 
References: 
Message-ID: 

Hi Guillaume,
Thanks for the reply. May I know whether I can choose different solvers in the scikit-learn package?

Regards,
Mahmood

On Mon, Dec 28, 2020 at 4:30 PM Guillaume Lemaître wrote:

> Setting n_components='mle' is a strategy that will pick the number of
> components for you (leaving it as None keeps them all). The sign of a PC
> does not matter so much since the components are still orthogonal; the
> sign change will depend on the solver, which is probably different in the
> two programs.
>
> Sent from my phone - sorry to be brief and for potential misspellings.
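To make the above concrete, a minimal sketch (the matrix x below is a made-up placeholder with 6 samples and 5 variables, standing in for the CSV from the thread, which is not available here): leaving n_components as None keeps every component, explained_variance_ratio_ gives the numbers for a scree plot, and the solver can indeed be chosen through the svd_solver parameter ('auto', 'full', 'arpack' or 'randomized'). The sign of each component is arbitrary (flipping a component together with its scores gives the same decomposition), which is why the second column can come out negated relative to XLSTAT; the last lines show one possible sign convention (largest-magnitude loading made positive), not necessarily the convention XLSTAT uses.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
x = rng.rand(6, 5)                        # placeholder for the real data

x_std = StandardScaler().fit_transform(x)

# n_components=None keeps all components; svd_solver can be
# 'auto', 'full', 'arpack' or 'randomized'
pca = PCA(n_components=None, svd_solver='full')
scores = pca.fit_transform(x_std)

print(scores.shape)                       # (6, 5): one column per component, like F1..F5
print(pca.explained_variance_ratio_)      # input for a scree plot

plt.plot(np.arange(1, pca.n_components_ + 1),
         pca.explained_variance_ratio_, 'o-')
plt.xlabel('Principal component')
plt.ylabel('Explained variance ratio')
plt.show()

# One way to fix the arbitrary signs: make the largest-magnitude loading
# of each component positive, and apply the same flip to the scores.
signs = np.sign(pca.components_[np.arange(pca.n_components_),
                                np.argmax(np.abs(pca.components_), axis=1)])
scores_fixed = scores * signs
components_fixed = pca.components_ * signs[:, np.newaxis]
print(scores_fixed)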