From benoit.presles at u-bourgogne.fr Mon Sep 2 04:16:52 2019 From: benoit.presles at u-bourgogne.fr (=?UTF-8?Q?Beno=c3=aet_Presles?=) Date: Mon, 2 Sep 2019 10:16:52 +0200 Subject: [scikit-learn] No convergence warning in logistic regression In-Reply-To: References: <58775d9b-bf80-c0f4-f696-b1470cb37745@u-bourgogne.fr> Message-ID: <5816eb6d-e554-1ddd-cdb6-9cbab8b8c904@u-bourgogne.fr> Hello Sebastian, I have tried with the lbfgs solver and it does not change anything. I do not have any convergence warning. Thanks for your help, Ben Le 30/08/2019 à 18:29, Sebastian Raschka a écrit : > Hi Ben, > > I can recall seeing convergence warnings for scikit-learn's logistic regression model on datasets in the past as well. Which solver did you use for LogisticRegression in sklearn? If you haven't done so, have you used the lbfgs solver? I.e., > > LogisticRegression(..., solver='lbfgs')? > > Best, > Sebastian > >> On Aug 30, 2019, at 9:52 AM, Benoît Presles wrote: >> >> Dear all, >> >> I compared the logistic regression of statsmodels (Logit) with the logistic regression of sklearn (LogisticRegression). As I do not do regularization, I use the fit method with statsmodels and set penalty='none' in sklearn. Most of the time, I have got the same results between the two packages. >> >> However, when data are correlated, it is not the case. In fact, I have got a very useful convergence warning with statsmodels (ConvergenceWarning: Maximum Likelihood optimization failed to converge) that I do not have with sklearn. Is it normal that I do not have any convergence warning with sklearn even if I put verbose=1? I guess sklearn did not converge either. 
>> >> >> Thanks for your help, >> Best regards, >> Ben >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From g.lemaitre58 at gmail.com Mon Sep 2 05:40:12 2019 From: g.lemaitre58 at gmail.com (=?UTF-8?Q?Guillaume_Lema=C3=AEtre?=) Date: Mon, 2 Sep 2019 11:40:12 +0200 Subject: [scikit-learn] No convergence warning in logistic regression In-Reply-To: <5816eb6d-e554-1ddd-cdb6-9cbab8b8c904@u-bourgogne.fr> References: <58775d9b-bf80-c0f4-f696-b1470cb37745@u-bourgogne.fr> <5816eb6d-e554-1ddd-cdb6-9cbab8b8c904@u-bourgogne.fr> Message-ID: LBFGS will raise ConvergenceWarning for sure. You can check the n_iter_ attribute to know if you really converged. On Mon, 2 Sep 2019 at 10:28, Benoît Presles wrote: > Hello Sebastian, > > I have tried with the lbfgs solver and it does not change anything. I do > not have any convergence warning. > > Thanks for your help, > Ben > > > Le 30/08/2019 à 18:29, Sebastian Raschka a écrit : > > Hi Ben, > > > > I can recall seeing convergence warnings for scikit-learn's logistic > regression model on datasets in the past as well. Which solver did you use > for LogisticRegression in sklearn? If you haven't done so, have you used the > lbfgs solver? I.e., > > > > LogisticRegression(..., solver='lbfgs')? > > > > Best, > > Sebastian > > > >> On Aug 30, 2019, at 9:52 AM, Benoît Presles < > benoit.presles at u-bourgogne.fr> wrote: > >> > >> Dear all, > >> > >> I compared the logistic regression of statsmodels (Logit) with the > logistic regression of sklearn (LogisticRegression). As I do not do > regularization, I use the fit method with statsmodels and set > penalty='none' in sklearn. Most of the time, I have got the same results > between the two packages. 
> >> > >> However, when data are correlated, it is not the case. In fact, I have > got a very useful convergence warning with statsmodels (ConvergenceWarning: > Maximum Likelihood optimization failed to converge) that I do not have with > sklearn. Is it normal that I do not have any convergence warning with > sklearn even if I put verbose=1? I guess sklearn did not converge either. > >> > >> > >> Thanks for your help, > >> Best regards, > >> Ben > >> _______________________________________________ > >> scikit-learn mailing list > >> scikit-learn at python.org > >> https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -- Guillaume Lemaitre INRIA Saclay - Parietal team Center for Data Science Paris-Saclay https://glemaitre.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From rth.yurchak at gmail.com Mon Sep 2 09:14:39 2019 From: rth.yurchak at gmail.com (Roman Yurchak) Date: Mon, 2 Sep 2019 15:14:39 +0200 Subject: [scikit-learn] scikit-learn website and documentation In-Reply-To: References: <66ce5be1-ec7d-6819-c8ad-cee8f3914930@gmail.com> <36f3c879-913f-4686-8714-e03a482ce710@normalesup.org> Message-ID: Hello Chiara, as far as I understood scikit-learn#14849 started as an incremental improvement of the scikit-learn website and ended up as a more in-depth rewrite of the sphinx theme. If you have any comments or suggestions don't hesitate to comment on that issue. For instance, that PR went with Bootstrap and I'm wondering about the advantages/limitations with respect to using something like PureCSS. Reviews of that PR would also be very much appreciated. 
-- Roman On 30/08/2019 18:58, Chiara Marmo wrote: > Hello, > > Should I consider this PR [1] as an answer? ;) > > Cheers, > Chiara > > [1] https://github.com/scikit-learn/scikit-learn/pull/14849 > > > On Sat, Aug 24, 2019 at 1:53 PM Chiara Marmo > wrote: > > Hi Nicolas, > > Working on the visuals and contents of the docs is within my skills and > I'm happy to finish the job. > But I'm not a web designer and I don't like to impose myself... :) > > Maybe you can check at the Monday meeting if everybody is ok with > that and write down comments in the minutes? For the next meeting I > will be available for collecting specifications, if any. > > Gaël, I will check purecss.io: how much > customization the basic theme needs also has to be considered. > > CiaoCiao > > Chiara > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > From safiullahmarwat at gmail.com Mon Sep 2 10:06:07 2019 From: safiullahmarwat at gmail.com (Safi Ullah Marwat) Date: Mon, 2 Sep 2019 23:06:07 +0900 Subject: [scikit-learn] Clustering Algorithm based on correlation distance Message-ID: Dear List, Is there any clustering algorithm which is based on the correlation coefficient instead of Euclidean/Manhattan distance? Regards -------------- next part -------------- An HTML attachment was scrubbed... URL: From t3kcit at gmail.com Tue Sep 3 13:40:38 2019 From: t3kcit at gmail.com (Andreas Mueller) Date: Tue, 3 Sep 2019 13:40:38 -0400 Subject: [scikit-learn] No convergence warning in logistic regression In-Reply-To: References: <58775d9b-bf80-c0f4-f696-b1470cb37745@u-bourgogne.fr> <5816eb6d-e554-1ddd-cdb6-9cbab8b8c904@u-bourgogne.fr> Message-ID: <8ce4c72a-0f03-993c-d33c-384f38e9d2d5@gmail.com> Having correlated data is not the same as not converging. We could warn on correlated data but I don't think that's actually useful for scikit-learn. 
I actually recently argued to remove the warning in linear discriminant analysis: https://github.com/scikit-learn/scikit-learn/issues/14361 As argued in many places, we're not a stats library and as long as there's a well-defined solution, there's no reason to warn. LogisticRegression will give you the solution with minimum coefficient norm if there's multiple solutions. On 9/2/19 5:40 AM, Guillaume Lemaître wrote: > LBFGS will raise ConvergenceWarning for sure. You can check the > n_iter_ attribute to know if you really converged. > > On Mon, 2 Sep 2019 at 10:28, Benoît Presles > > > wrote: > > Hello Sebastian, > > I have tried with the lbfgs solver and it does not change > anything. I do > not have any convergence warning. > > Thanks for your help, > Ben > > > Le 30/08/2019 à 18:29, Sebastian Raschka a écrit : > > Hi Ben, > > > > I can recall seeing convergence warnings for scikit-learn's > logistic regression model on datasets in the past as well. Which > solver did you use for LogisticRegression in sklearn? If you > haven't done so, have you used the lbfgs solver? I.e., > > > > LogisticRegression(..., solver='lbfgs')? > > > > Best, > > Sebastian > > > >> On Aug 30, 2019, at 9:52 AM, Benoît Presles > > wrote: > >> > >> Dear all, > >> > >> I compared the logistic regression of statsmodels (Logit) with > the logistic regression of sklearn (LogisticRegression). As I do > not do regularization, I use the fit method with statsmodels and > set penalty='none' in sklearn. Most of the time, I have got the > same results between the two packages. > >> > >> However, when data are correlated, it is not the case. In fact, > I have got a very useful convergence warning with statsmodels > (ConvergenceWarning: > Maximum Likelihood optimization failed to > converge) that I do not have with > sklearn. Is it normal that I do > not have any convergence warning with > sklearn even if I put > verbose=1? I guess sklearn did not converge either. 
> >> > >> > >> Thanks for your help, > >> Best regards, > >> Ben > >> _______________________________________________ > >> scikit-learn mailing list > >> scikit-learn at python.org > >> https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > > -- > Guillaume Lemaitre > INRIA Saclay - Parietal team > Center for Data Science Paris-Saclay > https://glemaitre.github.io/ > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From t3kcit at gmail.com Tue Sep 3 13:41:08 2019 From: t3kcit at gmail.com (Andreas Mueller) Date: Tue, 3 Sep 2019 13:41:08 -0400 Subject: [scikit-learn] Clustering Algorithm based on correlation distance In-Reply-To: References: Message-ID: <2faad0de-9bc3-54bc-ff8f-56000f319d38@gmail.com> There are many that allow "metric='precomputed'". On 9/2/19 10:06 AM, Safi Ullah Marwat wrote: > Dear List, > Is there any clustering algorithm, which is based on correlation > coefficient instead of Euclidean/Manhattan distance? > > Regards > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... 
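Andreas's `metric='precomputed'` pointer can be sketched as follows with DBSCAN, the estimator discussed later in this thread. This is a hedged illustration, not an official recipe: the toy series and the `eps`/`min_samples` values are made up for the example, and using `1 - corrcoef` as the distance is an assumption (it treats anti-correlated series as maximally far apart).

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.RandomState(0)
base = rng.randn(100)
# Four toy series to cluster by correlation rather than Euclidean distance.
X = np.vstack([
    base + 0.1 * rng.randn(100),   # strongly correlated with `base`
    base + 0.1 * rng.randn(100),   # strongly correlated with `base`
    -base + 0.1 * rng.randn(100),  # anti-correlated with `base`
    rng.randn(100),                # uncorrelated
])

# Correlation distance: 0 for perfectly correlated rows, up to 2 for
# perfectly anti-correlated ones.
D = 1.0 - np.corrcoef(X)

# Any estimator accepting metric='precomputed' can consume D directly.
labels = DBSCAN(eps=0.1, min_samples=2, metric='precomputed').fit_predict(D)
print(labels)  # rows 0 and 1 cluster together; rows 2 and 3 are noise (-1)
```

If anti-correlated series should also count as "close", `1 - np.abs(np.corrcoef(X))` is a common alternative; agglomerative clustering with a non-ward linkage accepts a precomputed matrix in the same way.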
URL: From t3kcit at gmail.com Tue Sep 3 13:46:44 2019 From: t3kcit at gmail.com (Andreas Mueller) Date: Tue, 3 Sep 2019 13:46:44 -0400 Subject: [scikit-learn] scikit-learn Digest, Vol 41, Issue 21 In-Reply-To: References: Message-ID: <6f2a593f-fac8-edf5-6b69-a0699248a493@gmail.com> https://scikit-learn.org/stable/developers/contributing.html#contributing On 8/26/19 1:09 PM, Mike Smith wrote: > Hi, > > I have been scouring around everywhere to volunteer. I took a one > month python course from a training company that promised me a job in > two months but they're still working on it after 3. So I decide to > volunteer. I'm looking to use python with DS, ML, AI, etc, I love > neural nets, then it hit me that I get the scikit mailing list and > opened it up and you guys are talking about volunteers. I would love > to volunteer for scikit. But I just have one month training in python. > I have prior experience with java and javascript, some computer > science education, How can I start volunteering? > > On Mon, Aug 26, 2019 at 9:03 AM > wrote: > > Send scikit-learn mailing list submissions to > scikit-learn at python.org > > To subscribe or unsubscribe via the World Wide Web, visit > https://mail.python.org/mailman/listinfo/scikit-learn > or, via email, send a message with subject or body 'help' to > scikit-learn-request at python.org > > > You can reach the person managing the list at > scikit-learn-owner at python.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of scikit-learn digest..." > > > Today's Topics: > > ? ?1. Re: Monthly meetings between core developers + "Hello World" > ? ? ? (Nicolas Hug) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Mon, 26 Aug 2019 08:54:21 -0400 > From: Nicolas Hug > > To: scikit-learn at python.org > Subject: Re: [scikit-learn] Monthly meetings between core developers + > ? ? ? ? 
"Hello World" > Message-ID: <136faf1a-5514-1c21-7514-0673b4ddde81 at gmail.com > > > Content-Type: text/plain; charset="utf-8"; Format="flowed" > > Meeting is in 5 minutes everyone! Prepare to be np.random.choice'd? :) > > https://appear.in/amueller > > > > On 8/22/19 10:11 AM, Nicolas Hug wrote: > > > > Hi Everyone, > > > > Quick reminder that the next meeting is on Monday! *Please > update your > > cards on the project board* so we can all have a look before the > week-end. > > > > We decided to go for a "scrum-like" approach this time: quickly go > > through everyone's notes first, then discuss main issues. > > > > Anyone interested in hosting? I think we should have a new > person each > > time, or you'll soon be fed up with me. If nobody speaks up I'll > > np.random.choice someone on Monday ;) > > > > ---- > > > > Time and date: > > > https://www.timeanddate.com/worldclock/meetingdetails.html?year=2019&month=8&day=26&hour=13&min=0&sec=0&p1=240&p2=33&p3=37&p4=179 > > > > Project board: > > https://github.com/scikit-learn/scikit-learn/projects/15 > > > > > > > Meeting link: https://appear.in/amueller > > > > > > > > > See you on Monday! > > > > Nicolas > > > > > > On 8/5/19 10:31 AM, Andreas Mueller wrote: > >> As usual, I agree ;) > >> I think it would be good to call out particularly important > bugfixes > >> so they get reviews. > >> We might also want to think about how we can organize the issue > >> tracker better. > >> > >> Having more full-time people on the project certainly means more > >> activity but ideally we can use some of that time to make the > issue > >> tracker more organized. > >> > >> > >> On 8/5/19 9:21 AM, Joel Nothman wrote: > >>> Yay for technology!?Awesome to see you all and have some matters > >>> clarified. > >>> > >>> Adrin is right that the issue tracker is increasingly > overwhelming > >>> (because there are more awesome people hired to work on the > project, > >>> more frequent sprints, etc). This meeting is a useful summary. 
> >>> > >>> The meeting mostly focussed on big features. We should be > careful to > >>> not leave behind important bugs fixes and work originating > outside > >>> the core devs. > >>> > >>> Despite that: Some of Guillaume's activities got cut off. I > think it > >>> would be great to progress both on stacking and resampling before > >>> the next release. > >>> > >>> I also think these meetings should, as a standing item, note the > >>> estimated upcoming release schedule, to help us remain aware > of that > >>> cadence. > >>> > >>> Good night! > >>> > >>> J > >>> > >>> _______________________________________________ > >>> scikit-learn mailing list > >>> scikit-learn at python.org > >>> https://mail.python.org/mailman/listinfo/scikit-learn > >> > >> > >> _______________________________________________ > >> scikit-learn mailing list > >> scikit-learn at python.org > >> https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: > > > ------------------------------ > > Subject: Digest Footer > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > ------------------------------ > > End of scikit-learn Digest, Vol 41, Issue 21 > ******************************************** > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rs2715 at stern.nyu.edu Tue Sep 3 21:08:26 2019 From: rs2715 at stern.nyu.edu (Reshama Shaikh) Date: Tue, 3 Sep 2019 21:08:26 -0400 Subject: [scikit-learn] WiMLDS scikit-learn sprints Message-ID: Hello, I'm currently working on organizing the 3rd WiMLDS scikit-learn sprint for 2019, this last one is in San Francisco (SF). Someone suggested it would be a good idea to share information about those sprints with this community. 
Repo for latest sprint in NYC: https://github.com/WiMLDS/nyc-2019-scikit-sprint Website for upcoming sprint in SF: https://sites.google.com/view/bay-area-wimlds-2019-sprint/home RELATED ARTICLES * [About WiMLDS open source sprints](http://wimlds.org/opensourcesprints-2/) (Reshama Shaikh) * [Nairobi WiMLDS 2019 Sprint Impact Report]( https://reshamas.github.io/nairobi-wimlds-2019-scikit-learn-sprint-impact-report/) (Reshama Shaikh) * [Scikit-learn Sprint at Nairobi, Kenya]( https://adrin.info/scikit-learn-sprint-at-nairobi-kenya.html) (Adrin Jalali) * [Highlights from the 2019 Nairobi WiMLDS Scikit-learn Sprint]( https://medium.com/@mariamhaji01/highlights-from-the-2019-nairobi-wimlds-scikit-sprint-889de3b20215) (Mariam Haji) * [NYC WiMLDS: 2017-2018 Sprint Impact Report]( https://reshamas.github.io/impact-report-for-wimlds-scikit-learn-sprints/) (Reshama Shaikh) * [Highlights from 2018 WiMLDS NYC / Scikit Sprint]( https://reshamas.github.io/highlights-from-the-2018-NYC-WiMLDS-scikit-sprint/) (Reshama Shaikh) * [Interview with Andreas Mueller, Core Contributor to Scikit-Learn]( http://mlconf.com/interview-andreas-muller-lecturer-columbia-university-core-contributor-scikit-learn-reshama-shaikh/) (Reshama Shaikh) Best, Reshama --------------------------------------- Reshama Shaikh Blog | Twitter | LinkedIn | Instagram | GitHub NYC WiMLDS Co-organizer WiMLDS Board Member NYC WiMLDS NYC PyLadies --------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: From safiullahmarwat at gmail.com Wed Sep 4 00:41:09 2019 From: safiullahmarwat at gmail.com (Safi Ullah Marwat) Date: Wed, 4 Sep 2019 13:41:09 +0900 Subject: [scikit-learn] Clustering Algorithm based on correlation distance In-Reply-To: <2faad0de-9bc3-54bc-ff8f-56000f319d38@gmail.com> References: <2faad0de-9bc3-54bc-ff8f-56000f319d38@gmail.com> Message-ID: Thank you Mr.Mueller Can you share any example sentence? 
I searched but found this link https://stackoverflow.com/questions/24560799/how-to-use-a-precomputed-distance-matrix-in-scikit-kmeans which says one cannot supply a precomputed distance matrix. The precomputed distances that k-means calculates are only a speed optimization, and those are still based on Euclidean distance. Thanks in advance On Wed, Sep 4, 2019 at 2:41 AM Andreas Mueller wrote: > There are many that allow "metric='precomputed'". > > > On 9/2/19 10:06 AM, Safi Ullah Marwat wrote: > > Dear List, > Is there any clustering algorithm, which is based on correlation > coefficient instead of Euclidean/Manhattan distance? > > Regards > > _______________________________________________ > scikit-learn mailing listscikit-learn at python.orghttps://mail.python.org/mailman/listinfo/scikit-learn > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From christian.braune79 at gmail.com Wed Sep 4 00:57:44 2019 From: christian.braune79 at gmail.com (Christian Braune) Date: Wed, 4 Sep 2019 06:57:44 +0200 Subject: [scikit-learn] Clustering Algorithm based on correlation distance In-Reply-To: References: <2faad0de-9bc3-54bc-ff8f-56000f319d38@gmail.com> Message-ID: Using correlation as a similarity measure leads to some problems with k-means (mainly because the arithmetic mean is not at all an estimator that can be used with correlation). If you properly normalize the correlation, DBSCAN might be an alternative. The minpts parameter will still have the same meaning; the eps will state the maximal allowed difference in correlation (somewhat dubious meaning...) that points may have when calculating the neighborhoods of points. But be aware that points belonging to the same cluster (in DBSCAN) might be completely uncorrelated in the end. Safi Ullah Marwat wrote on Wed., 4 Sep. 
2019, 06:42: > Thank you Mr.Mueller > Can you share any example sentence? I searched but found this link > https://stackoverflow.com/questions/24560799/how-to-use-a-precomputed-distance-matrix-in-scikit-kmeans which > says one cannot supply precomputed distance matric. the one kmean calculate > precomputed matric that's for speed purpose, but that's too based on > euclidean distance. > thanks in advance > > On Wed, Sep 4, 2019 at 2:41 AM Andreas Mueller wrote: > >> There are many that allow "metric='precomputed'". >> >> >> On 9/2/19 10:06 AM, Safi Ullah Marwat wrote: >> >> Dear List, >> Is there any clustering algorithm, which is based on correlation >> coefficient instead of Euclidean/Manhattan distance? >> >> Regards >> >> _______________________________________________ >> scikit-learn mailing listscikit-learn at python.orghttps://mail.python.org/mailman/listinfo/scikit-learn >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From marmochiaskl at gmail.com Wed Sep 4 05:22:03 2019 From: marmochiaskl at gmail.com (Chiara Marmo) Date: Wed, 4 Sep 2019 11:22:03 +0200 Subject: [scikit-learn] scikit-learn website and documentation In-Reply-To: References: <66ce5be1-ec7d-6819-c8ad-cee8f3914930@gmail.com> <36f3c879-913f-4686-8714-e03a482ce710@normalesup.org> Message-ID: Hello Roman, thanks for your answer. Much appreciated. Cheers, Chiara On Mon, Sep 2, 2019 at 3:16 PM Roman Yurchak wrote: > Hello Chiara, > > as far as I understood scikit-learn#14849 started as an incremental > improvement of the scikit-learn website and ended up as a more in depth > rewrite of the sphinx theme. 
> > If you have any comments or suggestions don't hesitate to comment on > that issue. For instance, that PR went with Boostrap and I'm wondering > about be the advantages/limitations with respect to using something like > PureCSS. > > Reviews of that PR would also be very much appreciated. > > -- > Roman > > On 30/08/2019 18:58, Chiara Marmo wrote: > > Hello, > > > > Should I consider this PR [1] as an answer? ;) > > > > Cheers, > > Chiara > > > > [1] https://github.com/scikit-learn/scikit-learn/pull/14849 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From adrin.jalali at gmail.com Sun Sep 8 13:48:47 2019 From: adrin.jalali at gmail.com (Adrin) Date: Sun, 8 Sep 2019 19:48:47 +0200 Subject: [scikit-learn] Outreachy program Message-ID: Hi, During EuroScipy, we had a few discussions regarding diversity in open source in general, and one of the ways some projects have tried to improve that has been through participation in the Outreachy program (https://www.outreachy.org/). I'd be happy to mentor somebody if they apply. Would that be okay if we apply? The deadline has just passed, but if they're flexible, we may be able to still apply. Thanks, Adrin. -------------- next part -------------- An HTML attachment was scrubbed... URL: From joel.nothman at gmail.com Sun Sep 8 21:14:11 2019 From: joel.nothman at gmail.com (Joel Nothman) Date: Mon, 9 Sep 2019 11:14:11 +1000 Subject: [scikit-learn] Outreachy program In-Reply-To: References: Message-ID: I'm broadly supportive, but just wanted to note our challenges with mentoring GSoC in the past: - Limited mentor availability should not be a big issue now. - Need to focus on a single project may not be well aligned with Scikit-learn's goals, or may not yield optimal code results. - Reviewers may feel compelled to expedite the merge of materials not clearly up to standard or useful. - Needs to be an investment in someone who would continue involvement with the project. 
In this case it's not clear whether having ongoing involvement is as essential an outcome for the project to be worthwhile. Given the relatively large base of funded contributors / core devs at the moment, there may be a challenge finding projects with low assumed knowledge, at least if they involve code. J On Mon, 9 Sep 2019 at 03:50, Adrin wrote: > Hi, > > During EuroScipy, we had a few discussions regarding diversity in open > source in general, and > one of the ways some projects have tried to improve that has been through > participation in the > Outreachy program (https://www.outreachy.org/). I'd be happy to mentor > somebody if they apply. > > Would that be okay if we apply? The deadline has just passed, but if > they're flexible, we may > be able to still apply. > > Thanks, > Adrin. > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sim4n6 at gmail.com Mon Sep 9 06:44:34 2019 From: sim4n6 at gmail.com (Sim a) Date: Mon, 9 Sep 2019 11:44:34 +0100 Subject: [scikit-learn] scikit-learn website and documentation In-Reply-To: References: <66ce5be1-ec7d-6819-c8ad-cee8f3914930@gmail.com> <36f3c879-913f-4686-8714-e03a482ce710@normalesup.org> Message-ID: Hi there, I hope I am not intruding ...but the mock-up website https://cmarmo.github.io/mockup-skl/ has a little unusual effect while scrolling on Firefox 69.0. Please check the attached screen capture. On Wed, Sep 4, 2019 at 10:23 AM Chiara Marmo wrote: > Hello Roman, > > thanks for your answer. > Much appreciated. > > Cheers, > Chiara > > On Mon, Sep 2, 2019 at 3:16 PM Roman Yurchak > wrote: > >> Hello Chiara, >> >> as far as I understood scikit-learn#14849 started as an incremental >> improvement of the scikit-learn website and ended up as a more in depth >> rewrite of the sphinx theme. 
>> >> If you have any comments or suggestions don't hesitate to comment on >> that issue. For instance, that PR went with Bootstrap and I'm wondering >> about the advantages/limitations with respect to using something like >> PureCSS. >> >> Reviews of that PR would also be very much appreciated. >> >> -- >> Roman >> >> On 30/08/2019 18:58, Chiara Marmo wrote: >> > Hello, >> > >> > Should I consider this PR [1] as an answer? ;) >> > >> > Cheers, >> > Chiara >> > >> > [1] https://github.com/scikit-learn/scikit-learn/pull/14849 >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaquesgrobler at gmail.com Mon Sep 9 07:02:55 2019 From: jaquesgrobler at gmail.com (Jaques Grobler) Date: Mon, 9 Sep 2019 13:02:55 +0200 Subject: [scikit-learn] scikit-learn website and documentation In-Reply-To: References: <66ce5be1-ec7d-6819-c8ad-cee8f3914930@gmail.com> <36f3c879-913f-4686-8714-e03a482ce710@normalesup.org> Message-ID: @Sim - I can reproduce this on Chrome too ... It happens for narrow viewports where there is no gutter around the main content. Up to a width of 1280px, the sidebar behaves, I assume, correctly - as it does with mobile view - opening up from the hamburger-menu over the content. For super-wide screens, the sidebar lands in the left-gutter on scrolling, and doesn't interfere, but in between the sidebar will appear over the content as in Sim's message, as the gutter isn't there anymore. One can quick-fix this by just making the problem-media-width behave like that of the mobile/ipad widths - else one needs to look at the position and flex configuration of the content vs. 
the sidebar, to maybe make the sidebar *push* the content to the right when open (if there is no gutter). Just my two cents - Looks cool beyond the glitch :) El lun., 9 de sep. de 2019 a la(s) 12:32, Sim a (sim4n6 at gmail.com) escribi?: > Hi there, > > I hope I am not intruding ...but the mock-up website > https://cmarmo.github.io/mockup-skl/ > has a little unusual effect while scrolling on Firefox 69.0. Please check > the attached screen capture. > > On Wed, Sep 4, 2019 at 10:23 AM Chiara Marmo > wrote: > >> Hello Roman, >> >> thanks for your answer. >> Much appreciated. >> >> Cheers, >> Chiara >> >> On Mon, Sep 2, 2019 at 3:16 PM Roman Yurchak >> wrote: >> >>> Hello Chiara, >>> >>> as far as I understood scikit-learn#14849 started as an incremental >>> improvement of the scikit-learn website and ended up as a more in depth >>> rewrite of the sphinx theme. >>> >>> If you have any comments or suggestions don't hesitate to comment on >>> that issue. For instance, that PR went with Boostrap and I'm wondering >>> about be the advantages/limitations with respect to using something like >>> PureCSS. >>> >>> Reviews of that PR would also be very much appreciated. >>> >>> -- >>> Roman >>> >>> On 30/08/2019 18:58, Chiara Marmo wrote: >>> > Hello, >>> > >>> > Should I consider this PR [1] as an answer? ;) >>> > >>> > Cheers, >>> > Chiara >>> > >>> > [1] https://github.com/scikit-learn/scikit-learn/pull/14849 >>> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jaquesgrobler at gmail.com Mon Sep 9 07:08:31 2019 From: jaquesgrobler at gmail.com (Jaques Grobler) Date: Mon, 9 Sep 2019 13:08:31 +0200 Subject: [scikit-learn] scikit-learn website and documentation In-Reply-To: References: <66ce5be1-ec7d-6819-c8ad-cee8f3914930@gmail.com> <36f3c879-913f-4686-8714-e03a482ce710@normalesup.org> Message-ID: Sorry to spam - here's a little gif to show the behaviour and problem area: [image: responsive-sidenav.gif] One would need to decide on what the desktop behavior of the sideNav will be. ipad/mobile is fine IMHO. Hope this helps :) El lun., 9 de sep. de 2019 a la(s) 13:02, Jaques Grobler ( jaquesgrobler at gmail.com) escribi?: > @Sim - I can reproduce this on Chrome too ... It happens for narrow > viewports where there is no gutter around the main content. > > Up to a width of 1280px, the sidebar behaves, I assume, correctly - as it > does with mobile view - opening up from the hamburger-menu over the content. > For super-wide screens, the sidebar lands in the left-gutter on scrolling, > and doesn't interfere, > but inbetween the sidebar will appear over the content as in Sim's > message, as the gutter isn't there anymore. > > One can quick-fix this my just making the problem-media-width behave like > that of the mobile/ipad widths - > else one needs to look at the position and flex configuration of the > content vs. the sidebar, to maybe make the sidebar *push* the content to > the right when open (if there is no gutter). > > Just my two cents - > Looks cool beyond the glitch :) > > El lun., 9 de sep. de 2019 a la(s) 12:32, Sim a (sim4n6 at gmail.com) > escribi?: > >> Hi there, >> >> I hope I am not intruding ...but the mock-up website >> https://cmarmo.github.io/mockup-skl/ >> has a little unusual effect while scrolling on Firefox 69.0. Please check >> the attached screen capture. >> >> On Wed, Sep 4, 2019 at 10:23 AM Chiara Marmo >> wrote: >> >>> Hello Roman, >>> >>> thanks for your answer. >>> Much appreciated. 
>>> >>> Cheers, >>> Chiara >>> >>> On Mon, Sep 2, 2019 at 3:16 PM Roman Yurchak >>> wrote: >>> >>>> Hello Chiara, >>>> >>>> as far as I understood scikit-learn#14849 started as an incremental >>>> improvement of the scikit-learn website and ended up as a more in depth >>>> rewrite of the sphinx theme. >>>> >>>> If you have any comments or suggestions don't hesitate to comment on >>>> that issue. For instance, that PR went with Boostrap and I'm wondering >>>> about be the advantages/limitations with respect to using something >>>> like >>>> PureCSS. >>>> >>>> Reviews of that PR would also be very much appreciated. >>>> >>>> -- >>>> Roman >>>> >>>> On 30/08/2019 18:58, Chiara Marmo wrote: >>>> > Hello, >>>> > >>>> > Should I consider this PR [1] as an answer? ;) >>>> > >>>> > Cheers, >>>> > Chiara >>>> > >>>> > [1] https://github.com/scikit-learn/scikit-learn/pull/14849 >>>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: responsive-sidenav.gif Type: image/gif Size: 1602229 bytes Desc: not available URL: From niourf at gmail.com Mon Sep 9 11:34:05 2019 From: niourf at gmail.com (Nicolas Hug) Date: Mon, 9 Sep 2019 11:34:05 -0400 Subject: [scikit-learn] scikit-learn website and documentation In-Reply-To: References: <66ce5be1-ec7d-6819-c8ad-cee8f3914930@gmail.com> <36f3c879-913f-4686-8714-e03a482ce710@normalesup.org> Message-ID: <11cc438d-b42b-51d8-d171-68ed1625b10b@gmail.com> Hi Jacques and Sim, Thanks a lot for you input. 
As previously mentionned though, we will be moving forward with https://github.com/scikit-learn/scikit-learn/pull/14849 instead of the original proposal. Any feedback on this PR would be greatly appreciated too! Nicolas On 9/9/19 7:08 AM, Jaques Grobler wrote: > Sorry to spam - > here's a little gif to show the behaviour and problem area: > > responsive-sidenav.gif > > One would need to decide on what the desktop behavior of the sideNav > will be. ipad/mobile is fine IMHO. > > Hope this?helps :) > > El lun., 9 de sep. de 2019 a la(s) 13:02, Jaques Grobler > (jaquesgrobler at gmail.com ) escribi?: > > @Sim -?I can reproduce this on Chrome too ... It happens for > narrow viewports where there is no gutter around the main content. > > Up to a width of 1280px, the sidebar behaves, I assume, correctly > - as it does with mobile view - opening up from the hamburger-menu > over the content. > For super-wide screens, the sidebar lands in the left-gutter on > scrolling, and doesn't interfere, > but inbetween the sidebar will appear over the content as in Sim's > message, as the gutter isn't there anymore. > > One can quick-fix this my just making the problem-media-width > behave like that of the mobile/ipad widths - > else one needs to look at the position and flex configuration of > the content vs. the sidebar, to maybe make the sidebar /push/?the > content to the right when open (if there is no gutter). > > Just my two cents - > Looks cool beyond the glitch :) > > El lun., 9 de sep. de 2019 a la(s) 12:32, Sim a (sim4n6 at gmail.com > ) escribi?: > > Hi there, > > I hope I am not intruding ...but the mock-up website > https://cmarmo.github.io/mockup-skl/ > has a little unusual effect while scrolling on Firefox 69.0. > Please check the attached screen capture. > > On Wed, Sep 4, 2019 at 10:23 AM Chiara Marmo > > wrote: > > Hello Roman, > > thanks for your answer. > Much appreciated. 
> > Cheers, > Chiara > > On Mon, Sep 2, 2019 at 3:16 PM Roman Yurchak > > wrote: > > Hello Chiara, > > as far as I understood scikit-learn#14849 started as > an incremental > improvement of the scikit-learn website and ended up > as a more in depth > rewrite of the sphinx theme. > > If you have any comments or suggestions don't hesitate > to comment on > that issue. For instance, that PR went with Boostrap > and I'm wondering > about be the advantages/limitations with respect to > using something like > PureCSS. > > Reviews of that PR would also be very much appreciated. > > -- > Roman > > On 30/08/2019 18:58, Chiara Marmo wrote: > > Hello, > > > > Should I consider this PR [1] as an answer? ;) > > > > Cheers, > > Chiara > > > > [1] > https://github.com/scikit-learn/scikit-learn/pull/14849 > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: responsive-sidenav.gif Type: image/gif Size: 1602229 bytes Desc: not available URL: From fad469 at uregina.ca Mon Sep 9 12:56:11 2019 From: fad469 at uregina.ca (Farzana Anowar) Date: Mon, 09 Sep 2019 10:56:11 -0600 Subject: [scikit-learn] Questions about partial_fit and the Incremental library in Sci-kit learn Message-ID: <7688ff31b2cbadbc77fa94dd6dfc31d4@uregina.ca> Hello Sir/Madam, I subscribed to the link you sent me. I am posting my question again: This Is Farzana Anowar, a Ph.D. candidate in University of Regina. 
Currently, I'm working to develop a model that learns incrementally from non-stationary data. I have come across an Incremental library in scikit-learn that allows doing that using partial_fit. I have searched a lot for detailed information about this 'incremental' library and 'partial_fit'; however, I couldn't find any. It would be great if you could provide me with some detailed information about how these two actually work. For example, if we take SGD as a classifier, the incremental library will allow me to pass chunks/batches of data. My question is: does this incremental library train (using partial_fit) on the whole batch at a time and then produce a classification performance, or does it take a batch and train on each instance of the batch one at a time? Thanks in advance! -- Regards, Farzana Anowar From dbsullivan23 at gmail.com Mon Sep 9 14:12:55 2019 From: dbsullivan23 at gmail.com (Daniel Sullivan) Date: Mon, 9 Sep 2019 13:12:55 -0500 Subject: [scikit-learn] Questions about partial_fit and the Incremental library in Sci-kit learn In-Reply-To: <7688ff31b2cbadbc77fa94dd6dfc31d4@uregina.ca> References: <7688ff31b2cbadbc77fa94dd6dfc31d4@uregina.ca> Message-ID: Hi Farzana, If I understand your question correctly, you're asking how the SGD classifier works incrementally? The SGD algorithm maintains a single set of weights and iterates through all data points in a batch one at a time, adjusting its weights on each iteration. So to answer your question: it trains on each instance, not on the batch. However, the algorithm can iterate multiple times through a single batch. Let me know if that answers your question. Best, Danny On Mon, Sep 9, 2019 at 11:56 AM Farzana Anowar wrote: > Hello Sir/Madam, > > I subscribed to the link you sent me. > > > I am posting my question again: > > This Is Farzana Anowar, a Ph.D. candidate in University of Regina. > Currently, I'm working to develop a model that learns incrementally from > non-stationary data.
I have come across an Incremental library in > sci-kit learn that actually allows to do that using partial_fit. I have > searched a lot for the detailed information about this 'incremental' > library and 'partial_fit', however, I couldn't find any. > > It would be great if you could provide me with some detailed information > about these two regarding how they actually work. For example, If we > take SGD as a classifier, the incremental library will allow me to take > chunks/batches of data. My question is: Do this incremental library > train (using parial_fit) the whole batch at a time and then produce a > classification performance or it takes a batch and trains each instance > at a time from the batch. > > Thanks in advance! > > -- > Regards, > > Farzana Anowar > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fad469 at uregina.ca Mon Sep 9 14:27:22 2019 From: fad469 at uregina.ca (Farzana Anowar) Date: Mon, 09 Sep 2019 12:27:22 -0600 Subject: [scikit-learn] Questions about partial_fit and the Incremental library in Sci-kit learn In-Reply-To: References: <7688ff31b2cbadbc77fa94dd6dfc31d4@uregina.ca> Message-ID: <752fe9919f044d83a589f19720e2ce08@uregina.ca> On 2019-09-09 12:12, Daniel Sullivan wrote: > Hi Farzana, > > If I understand your question correctly you're asking how the SGD > classifier works incrementally? The SGD algorithm maintains a single > set of weights and iterates through all data points one at a time in a > batch. It adjusts its weights on each iteration. So to answer your > question, it trains on each instance, not on the batch. However, the > algorithm can iterate multiple times through a single batch. Let me > know if that answers your question. 
> > Best, > > Danny > > On Mon, Sep 9, 2019 at 11:56 AM Farzana Anowar > wrote: > >> Hello Sir/Madam, >> >> I subscribed to the link you sent me. >> >> I am posting my question again: >> >> This Is Farzana Anowar, a Ph.D. candidate in University of Regina. >> Currently, I'm working to develop a model that learns incrementally >> from >> non-stationary data. I have come across an Incremental library in >> sci-kit learn that actually allows to do that using partial_fit. I >> have >> searched a lot for the detailed information about this 'incremental' >> >> library and 'partial_fit', however, I couldn't find any. >> >> It would be great if you could provide me with some detailed >> information >> about these two regarding how they actually work. For example, If we >> >> take SGD as a classifier, the incremental library will allow me to >> take >> chunks/batches of data. My question is: Do this incremental library >> train (using parial_fit) the whole batch at a time and then produce >> a >> classification performance or it takes a batch and trains each >> instance >> at a time from the batch. >> >> Thanks in advance! 
>> >> -- >> Regards, >> >> Farzana Anowar >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -- Regards, Farzana Anowar From fad469 at uregina.ca Mon Sep 9 14:32:04 2019 From: fad469 at uregina.ca (Farzana Anowar) Date: Mon, 09 Sep 2019 12:32:04 -0600 Subject: [scikit-learn] Questions about partial_fit and the Incremental library in Sci-kit learn In-Reply-To: References: <7688ff31b2cbadbc77fa94dd6dfc31d4@uregina.ca> Message-ID: <1c94dd6cee74a47b98582e25ae5eeef3@uregina.ca> On 2019-09-09 12:12, Daniel Sullivan wrote: > Hi Farzana, > > If I understand your question correctly you're asking how the SGD > classifier works incrementally? The SGD algorithm maintains a single > set of weights and iterates through all data points one at a time in a > batch. It adjusts its weights on each iteration. So to answer your > question, it trains on each instance, not on the batch. However, the > algorithm can iterate multiple times through a single batch. Let me > know if that answers your question. > > Best, > > Danny > > On Mon, Sep 9, 2019 at 11:56 AM Farzana Anowar > wrote: > >> Hello Sir/Madam, >> >> I subscribed to the link you sent me. >> >> I am posting my question again: >> >> This Is Farzana Anowar, a Ph.D. candidate in University of Regina. >> Currently, I'm working to develop a model that learns incrementally >> from >> non-stationary data. I have come across an Incremental library in >> sci-kit learn that actually allows to do that using partial_fit. I >> have >> searched a lot for the detailed information about this 'incremental' >> >> library and 'partial_fit', however, I couldn't find any. 
>> >> It would be great if you could provide me with some detailed >> information >> about these two regarding how they actually work. For example, If we >> >> take SGD as a classifier, the incremental library will allow me to >> take >> chunks/batches of data. My question is: Do this incremental library >> train (using parial_fit) the whole batch at a time and then produce >> a >> classification performance or it takes a batch and trains each >> instance >> at a time from the batch. >> >> Thanks in advance! >> >> -- >> Regards, >> >> Farzana Anowar >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn Hello Daniel, Thank you so much! I think your clarification makes sense. So, whatever batches I am passing through the classifier it will train each instance through a single batch. I was just wondering if you could give me some information about partial_fit. Just for your reference, I was having a look at this code. https://dask-ml.readthedocs.io/en/latest/incremental.html Thanks! -- Regards, Farzana Anowar From dbsullivan23 at gmail.com Mon Sep 9 14:54:59 2019 From: dbsullivan23 at gmail.com (Daniel Sullivan) Date: Mon, 9 Sep 2019 13:54:59 -0500 Subject: [scikit-learn] Questions about partial_fit and the Incremental library in Sci-kit learn In-Reply-To: <1c94dd6cee74a47b98582e25ae5eeef3@uregina.ca> References: <7688ff31b2cbadbc77fa94dd6dfc31d4@uregina.ca> <1c94dd6cee74a47b98582e25ae5eeef3@uregina.ca> Message-ID: Hi Farzana, Do you have a specific question about partial_fit? Essentially it works the same as the fit method, but the weights are preserved between calls. 
Within the partial fit and fit methods, the model makes an estimate based on the single data point and adjusts the weights proportionally based on the difference between the estimate and the target. How much the weights are changed depends on the loss function and learning rate you specify. On Mon, Sep 9, 2019 at 1:32 PM Farzana Anowar wrote: > On 2019-09-09 12:12, Daniel Sullivan wrote: > > Hi Farzana, > > > > If I understand your question correctly you're asking how the SGD > > classifier works incrementally? The SGD algorithm maintains a single > > set of weights and iterates through all data points one at a time in a > > batch. It adjusts its weights on each iteration. So to answer your > > question, it trains on each instance, not on the batch. However, the > > algorithm can iterate multiple times through a single batch. Let me > > know if that answers your question. > > > > Best, > > > > Danny > > > > On Mon, Sep 9, 2019 at 11:56 AM Farzana Anowar > > wrote: > > > >> Hello Sir/Madam, > >> > >> I subscribed to the link you sent me. > >> > >> I am posting my question again: > >> > >> This Is Farzana Anowar, a Ph.D. candidate in University of Regina. > >> Currently, I'm working to develop a model that learns incrementally > >> from > >> non-stationary data. I have come across an Incremental library in > >> sci-kit learn that actually allows to do that using partial_fit. I > >> have > >> searched a lot for the detailed information about this 'incremental' > >> > >> library and 'partial_fit', however, I couldn't find any. > >> > >> It would be great if you could provide me with some detailed > >> information > >> about these two regarding how they actually work. For example, If we > >> > >> take SGD as a classifier, the incremental library will allow me to > >> take > >> chunks/batches of data. 
My question is: Do this incremental library > >> train (using parial_fit) the whole batch at a time and then produce > >> a > >> classification performance or it takes a batch and trains each > >> instance > >> at a time from the batch. > >> > >> Thanks in advance! > >> > >> -- > >> Regards, > >> > >> Farzana Anowar > >> _______________________________________________ > >> scikit-learn mailing list > >> scikit-learn at python.org > >> https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > Hello Daniel, > > Thank you so much! I think your clarification makes sense. So, whatever > batches I am passing through the classifier it will train each instance > through a single batch. > > I was just wondering if you could give me some information about > partial_fit. Just for your reference, I was having a look at this code. > > https://dask-ml.readthedocs.io/en/latest/incremental.html > > Thanks! > > -- > Regards, > > Farzana Anowar > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fad469 at uregina.ca Mon Sep 9 18:38:03 2019 From: fad469 at uregina.ca (Farzana Anowar) Date: Mon, 09 Sep 2019 16:38:03 -0600 Subject: [scikit-learn] Incremental learning in scikit-learn Message-ID: <9fd8813b59fbc3a29283a65b9e971d9a@uregina.ca> Hello Sir/Madam, I am going through the incremental learning algorithms in scikit-learn. SGD in scikit-learn is one such algorithm: it allows learning incrementally by passing chunks/batches. Now my question is: does scikit-learn keep all the batches of training data in memory? Or does it keep chunks/batches in memory up to a certain size?
Or does it keep only one chunk/batch in memory during training and discard the previously trained chunks/batches? Does that mean it suffers from catastrophic forgetting? Thanks! -- Regards, Farzana Anowar From dbsullivan23 at gmail.com Mon Sep 9 19:53:39 2019 From: dbsullivan23 at gmail.com (Daniel Sullivan) Date: Mon, 9 Sep 2019 18:53:39 -0500 Subject: [scikit-learn] Incremental learning in scikit-learn In-Reply-To: <9fd8813b59fbc3a29283a65b9e971d9a@uregina.ca> References: <9fd8813b59fbc3a29283a65b9e971d9a@uregina.ca> Message-ID: Hey Farzana, The algorithm only keeps one batch in memory at a time. Across batches, SGD keeps a single set of weights, which it adjusts with each iteration over a data point (instance) within a batch. This set of weights is the state persisted between calls of partial_fit. That means you will get the same results with SGD regardless of your batch size, and you can choose your batch size according to your memory constraints. Hope that helps. - Danny On Mon, Sep 9, 2019 at 5:53 PM Farzana Anowar wrote: > Hello Sir/Madam, > > I am going through the incremental learning algorithm in Scikit-learn. > SGD in sci-kit learn is such a kind of algorithm that allows learning > incrementally by passing chunks/batches. Now my question is: does > sci-kit learn keeps all the batches for training data in memory? Or it > keeps chunks/batches in memory up to a certain amount of size? Or it > keeps only one chunk/batch while training in memory and removes the > other trained chunks/batches after training? Does that mean it suffers > from catastrophic forgetting? > > Thanks! > > -- > Regards, > > Farzana Anowar > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed...
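Danny's description above — one batch in memory at a time, a single weight vector persisted between partial_fit calls — can be sketched in a few lines. This is an illustrative snippet with synthetic data, not code from the thread; in a real pipeline each batch would be read from disk before being passed to partial_fit:

```python
# Minimal sketch: incremental training with SGDClassifier.partial_fit.
# Only one batch exists in memory per iteration; the classifier's weights
# carry the learned state across calls.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(0)
clf = SGDClassifier(random_state=0)
classes = np.array([0, 1])          # the full set of classes must be declared up front

for _ in range(5):                  # stream of five batches, one alive at a time
    X_batch = rng.randn(20, 3)      # stand-in for a chunk loaded from disk
    y_batch = (X_batch[:, 0] > 0).astype(int)
    clf.partial_fit(X_batch, y_batch, classes=classes)

# the fitted weights now reflect every batch seen so far
print(clf.coef_.shape)
```

Passing `classes=` on every call is harmless; it is only required on the first call, since SGD must know all labels before it sees them.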
URL: From fad469 at uregina.ca Mon Sep 9 20:15:38 2019 From: fad469 at uregina.ca (Farzana Anowar) Date: Mon, 09 Sep 2019 18:15:38 -0600 Subject: [scikit-learn] Incremental learning in scikit-learn In-Reply-To: References: <9fd8813b59fbc3a29283a65b9e971d9a@uregina.ca> Message-ID: <2a3df80b95a7e8bb5d3199273012b8c3@uregina.ca> On 2019-09-09 17:53, Daniel Sullivan wrote: > Hey Farzana, > > The algorithm only keeps one batch in memory at a time. Between > processing over each batch, SGD keeps a set of weights that it alters > with each iteration of a data point or instance within a batch. This > set of weights functions as the persisted state between calls of > partial_fit. That means you will get the same results with SGD > regardless of your batch size and you can choose your batch size > according to your memory constraints. Hope that helps. > > - Danny > > On Mon, Sep 9, 2019 at 5:53 PM Farzana Anowar > wrote: > >> Hello Sir/Madam, >> >> I am going through the incremental learning algorithm in >> Scikit-learn. >> SGD in sci-kit learn is such a kind of algorithm that allows >> learning >> incrementally by passing chunks/batches. Now my question is: does >> sci-kit learn keeps all the batches for training data in memory? Or >> it >> keeps chunks/batches in memory up to a certain amount of size? Or it >> >> keeps only one chunk/batch while training in memory and removes the >> other trained chunks/batches after training? Does that mean it >> suffers >> from catastrophic forgetting? >> >> Thanks! >> >> -- >> Regards, >> >> Farzana Anowar >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn Thanks a lot! 
-- Regards, Farzana Anowar From joel.nothman at gmail.com Tue Sep 10 21:19:16 2019 From: joel.nothman at gmail.com (Joel Nothman) Date: Wed, 11 Sep 2019 11:19:16 +1000 Subject: [scikit-learn] Vote on SLEP009: keyword only arguments Message-ID: As per our Governance document, changes to API principles are to be established through an Enhancement Proposal (SLEP) from which any core developer can call for a vote on its acceptance. *SLEP009 Keyword Only Arguments is the first SLEP up for a vote. Please see* *https://scikit-learn-enhancement-proposals.readthedocs.io/en/latest/slep009/proposal.html * *This proposal discusses the path to gradually forcing users to pass arguments, or most of them, as keyword arguments only.* Core developers are invited to vote on this change until 11 October 2019 by replying to this email thread. All members of the community are welcome to comment on the proposal on this mailing list, or to propose minor changes through Issues and Pull Requests at https://github.com/scikit-learn/enhancement_proposals/. -------------- next part -------------- An HTML attachment was scrubbed... URL: From adrin.jalali at gmail.com Wed Sep 11 04:58:28 2019 From: adrin.jalali at gmail.com (Adrin) Date: Wed, 11 Sep 2019 10:58:28 +0200 Subject: [scikit-learn] Vote on SLEP009: keyword only arguments In-Reply-To: References: Message-ID: It's a yes for me. On Wed, Sep 11, 2019 at 3:20 AM Joel Nothman wrote: > As per our Governance > document, changes to API principles are to be established through an > Enhancement Proposal (SLEP) from which any core developer can call for a > vote on its acceptance. > > *SLEP009 Keyword Only Arguments is the first SLEP up for a vote. 
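The force-keyword-argument syntax the SLEP builds on is Python 3's bare `*` separator in a function signature: every parameter after the `*` can only be passed by keyword. A minimal sketch (the estimator below is hypothetical, not a scikit-learn class):

```python
# Hypothetical estimator showing Python 3 keyword-only arguments:
# parameters after the bare * must be passed by keyword.
class MyEstimator:
    def __init__(self, *, alpha=1.0, max_iter=100):
        self.alpha = alpha
        self.max_iter = max_iter

est = MyEstimator(alpha=0.5)        # keyword call: fine
try:
    MyEstimator(0.5)                # positional call: TypeError
except TypeError as exc:
    print("rejected:", exc)
```

The deprecation path discussed in the SLEP is about moving existing positional signatures toward this form without breaking user code overnight.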
Please > see* > > *https://scikit-learn-enhancement-proposals.readthedocs.io/en/latest/slep009/proposal.html > * > > *This proposal discusses the path to gradually forcing users to pass > arguments, or most of them, as keyword arguments only.* > > Core developers are invited to vote on this change until 11 October 2019 > by replying to this email thread. > > All members of the community are welcome to comment on the proposal on > this mailing list, or to propose minor changes through Issues and Pull > Requests at https://github.com/scikit-learn/enhancement_proposals/. > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ahowe42 at gmail.com Wed Sep 11 07:20:01 2019 From: ahowe42 at gmail.com (Andrew Howe) Date: Wed, 11 Sep 2019 12:20:01 +0100 Subject: [scikit-learn] Vote on SLEP009: keyword only arguments In-Reply-To: References: Message-ID: I'm strongly supportive of moving to keyword only arguments. Andrew <~~~~~~~~~~~~~~~~~~~~~~~~~~~> J. Andrew Howe, PhD LinkedIn Profile ResearchGate Profile Open Researcher and Contributor ID (ORCID) Github Profile Personal Website I live to learn, so I can learn to live. - me <~~~~~~~~~~~~~~~~~~~~~~~~~~~> On Wed, Sep 11, 2019 at 2:21 AM Joel Nothman wrote: > As per our Governance > document, changes to API principles are to be established through an > Enhancement Proposal (SLEP) from which any core developer can call for a > vote on its acceptance. > > *SLEP009 Keyword Only Arguments is the first SLEP up for a vote. 
Please > see* > > *https://scikit-learn-enhancement-proposals.readthedocs.io/en/latest/slep009/proposal.html > * > > *This proposal discusses the path to gradually forcing users to pass > arguments, or most of them, as keyword arguments only.* > > Core developers are invited to vote on this change until 11 October 2019 > by replying to this email thread. > > All members of the community are welcome to comment on the proposal on > this mailing list, or to propose minor changes through Issues and Pull > Requests at https://github.com/scikit-learn/enhancement_proposals/. > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexandre.gramfort at inria.fr Wed Sep 11 09:22:15 2019 From: alexandre.gramfort at inria.fr (Alexandre Gramfort) Date: Wed, 11 Sep 2019 15:22:15 +0200 Subject: [scikit-learn] Vote on SLEP009: keyword only arguments In-Reply-To: References: Message-ID: hi, Adrin do you suggest this for everything or maybe just for __init__ params of estimators and stuff that can come after X, y in fit eg sample_weights? would: clf.fit(X, y) still be allowed? Alex From adrin.jalali at gmail.com Wed Sep 11 09:38:09 2019 From: adrin.jalali at gmail.com (Adrin) Date: Wed, 11 Sep 2019 15:38:09 +0200 Subject: [scikit-learn] Vote on SLEP009: keyword only arguments In-Reply-To: References: Message-ID: Hi, I'm (mostly) the messenger, don't shoot me :P It may help to summarize the SLEP: 1. This can be applied to all methods, not just __init__. 2. The SLEP doesn't say we have to apply it everywhere. It's mostly that it lets us do that. 3. It doesn't make ALL inputs a keywords only argument. The common ones such as X and y in fit(X, y) will stay as they are. Therefore clf.fit(X, y) will definitely be allowed. 4. 
Whether or not sample_weight should be keyword-only in fit requires its own discussion, and the route for that discussion is defined in the SLEP. In other words, if an input parameter is used as a positional argument less frequently than X% of the time, then it can/should be a keyword-only argument. The SLEP defines these conditions more precisely. I hope that clarifies it a little bit. Adrin/ On Wed, Sep 11, 2019 at 3:23 PM Alexandre Gramfort < alexandre.gramfort at inria.fr> wrote: > hi, > > Adrin do you suggest this for everything or maybe just for __init__ > params of estimators > and stuff that can come after X, y in fit eg sample_weights? > > would: > > clf.fit(X, y) > > still be allowed? > > Alex > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From niourf at gmail.com Wed Sep 11 14:21:34 2019 From: niourf at gmail.com (Nicolas Hug) Date: Wed, 11 Sep 2019 14:21:34 -0400 Subject: [scikit-learn] Vote on SLEP009: keyword only arguments In-Reply-To: References: Message-ID: <56cd260b-20ce-d863-61a6-f6cd6c1f4aab@gmail.com> Since there is no explicit proposal in the SLEP it's not very clear what we need to vote for / against. But overall I'm +1 on forcing kwargs for all __init__ methods. Nicolas On 9/11/19 9:38 AM, Adrin wrote: > Hi, > > I'm (mostly) the messenger, don't shoot me :P > > It may help to summarize the SLEP: > 1. This can be applied to all methods, not just __init__. > 2. The SLEP doesn't say we have to apply it everywhere. It's mostly > that it lets us do that. > 3. It doesn't make ALL inputs a keywords only argument. The common > ones such as X and y in fit(X, y) will stay as they are. > Therefore clf.fit(X, y) will definitely be allowed. > 4.
Whether or not sample_weight should be keyword only or not in fit, > requires its own discussion, and the route of the discussion > ?? is defined in the SLEP. > > In other words, if an input parameter is used as a positional argument > less frequently than X% of the time, then it can/should be > a keyword only argument. But the SLEP better defines these conditions. > > I hope that clarifies it a little bit. > > Adrin/ > > On Wed, Sep 11, 2019 at 3:23 PM Alexandre Gramfort > > wrote: > > hi, > > Adrin do you suggest this for everything or maybe just for __init__ > params of estimators > and stuff that can come after X, y in fit eg sample_weights? > > would: > > clf.fit(X, y) > > still be allowed? > > Alex > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexandre.gramfort at inria.fr Wed Sep 11 15:41:18 2019 From: alexandre.gramfort at inria.fr (Alexandre Gramfort) Date: Wed, 11 Sep 2019 21:41:18 +0200 Subject: [scikit-learn] Vote on SLEP009: keyword only arguments In-Reply-To: <56cd260b-20ce-d863-61a6-f6cd6c1f4aab@gmail.com> References: <56cd260b-20ce-d863-61a6-f6cd6c1f4aab@gmail.com> Message-ID: > But overall I'm + 1 on forcing kwargs for all __init__ methods. yes I think it will help for __init__ methods Alex PS : I don't shoot people (usually...) From qinhanmin2005 at sina.com Wed Sep 11 20:37:16 2019 From: qinhanmin2005 at sina.com (Hanmin Qin) Date: Thu, 12 Sep 2019 08:37:16 +0800 Subject: [scikit-learn] Vote on SLEP009: keyword only arguments Message-ID: <20190912003716.738E42D0009B@webmail.sinamail.sina.com.cn> I'll vote +1, though there're still lots of things to decide. 
Hanmin Qin ----- Original Message ----- From: Alexandre Gramfort To: Scikit-learn mailing list Subject: Re: [scikit-learn] Vote on SLEP009: keyword only arguments Date: 2019-09-12 03:43 > But overall I'm + 1 on forcing kwargs for all __init__ methods. yes I think it will help for __init__ methods Alex PS : I don't shoot people (usually...) _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From joel.nothman at gmail.com Wed Sep 11 22:40:51 2019 From: joel.nothman at gmail.com (Joel Nothman) Date: Thu, 12 Sep 2019 12:40:51 +1000 Subject: [scikit-learn] Vote on SLEP009: keyword only arguments In-Reply-To: <20190912003716.738E42D0009B@webmail.sinamail.sina.com.cn> References: <20190912003716.738E42D0009B@webmail.sinamail.sina.com.cn> Message-ID: There are details of specific API changes still to be decided. The question being put, as per the SLEP, is: do we want to utilise Python 3's force-keyword-argument syntax and to change existing APIs which support arguments positionally to use this syntax, via a deprecation period? -------------- next part -------------- An HTML attachment was scrubbed...
The labels of the dataset look like - [image: image] When I am using mlb = MultiLabelBinarizer() mlb.fit(labels) print(mlb.classes_) I am getting - [image: image] Whereas, the output (sample output) I want is - [image: image] I got the above output by - mlb = MultiLabelBinarizer() sample_labels = [ ['stat.ML', 'cs.LG'], ['cs.CV', 'cs.RO'] ] mlb.fit(sample_labels) print(mlb.classes_) Help would be very much appreciated here. Here's the dataset I had prepared: arXivdata.csv.zip I stripped away the double quotes in the labels after loading it in a pandas DataFrame by - import re arxiv_data['labels'] = arxiv_data['labels'].str.replace(r"[\"]", '') scikit-learn version: '0.21.3' Sayak Paul | sayak.dev -------------- next part -------------- An HTML attachment was scrubbed... URL: From loic.esteve at ymail.com Thu Sep 12 01:24:48 2019 From: loic.esteve at ymail.com (=?utf-8?B?TG/Dr2MgRXN0w6h2ZQ==?=) Date: Thu, 12 Sep 2019 07:24:48 +0200 Subject: [scikit-learn] MultiLabelBinarizer gives individual characters instead of the classes In-Reply-To: References: Message-ID: I think this caveat has been added in the dev doc (not yet in the stable doc). You may want to read: https://scikit-learn.org/dev/modules/generated/sklearn.preprocessing.MultiLabelBinarizer.html and in particular the part that starts with "A common mistake is to pass in a list". Cheers, Loïc > Hi. > > I am working on a Multi-label text classification problem. In order to encode the labels, I am using MultiLabelBinarizer. The labels of the dataset look like - > > image > > When I am using > > mlb = MultiLabelBinarizer() > mlb.fit(labels) > print(mlb.classes_) > > I am getting - > > image > > Whereas, the output (sample output) I want is - > > image > > I got the above output by - > > mlb = MultiLabelBinarizer() > sample_labels = [ > ['stat.ML', 'cs.LG'], > ['cs.CV', 'cs.RO'] > ] > mlb.fit(sample_labels) > print(mlb.classes_) > > Help would be very much appreciated here.
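The "common mistake" caveat can be reproduced without scikit-learn at all: `MultiLabelBinarizer.fit` iterates over each sample, and a plain Python string iterates as its characters. The helper below (`collect_classes`, a made-up name that only mimics the class-collection step of `fit`) shows why the two input shapes give different classes:

```python
def collect_classes(y):
    """Mimic how MultiLabelBinarizer.fit gathers classes: it iterates
    over each sample and collects every label it yields."""
    return sorted({label for sample in y for label in sample})

labels_wrong = ['stat.ML', 'cs.LG']      # one *string* per sample
labels_right = [['stat.ML'], ['cs.LG']]  # one *list of labels* per sample

print(collect_classes(labels_wrong))  # ['.', 'G', 'L', 'M', 'a', 'c', 's', 't']
print(collect_classes(labels_right))  # ['cs.LG', 'stat.ML']
```

Iterating `'stat.ML'` yields its characters, hence the single-letter classes; wrapping each sample's labels in a list (or tuple, or set) gives the intended behaviour.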
> > Here's the dataset I had prepared: > arXivdata.csv.zip > > I stripped away the double quotes in the labels after loading it in a pandas DataFrame by - > > import re > > arxiv_data['labels'] = arxiv_data['labels'].str.replace(r"[\"]", '') > > scikit-learn version: '0.21.3' > > Sayak Paul | sayak.dev From g.lemaitre58 at gmail.com Thu Sep 12 04:06:30 2019 From: g.lemaitre58 at gmail.com (=?UTF-8?Q?Guillaume_Lema=C3=AEtre?=) Date: Thu, 12 Sep 2019 10:06:30 +0200 Subject: [scikit-learn] Vote on SLEP009: keyword only arguments In-Reply-To: References: <20190912003716.738E42D0009B@webmail.sinamail.sina.com.cn> Message-ID: To the question: do we want to utilise Python 3's force-keyword-argument syntax and to change existing APIs which support arguments positionally to use this syntax, via a deprecation period? I am +1. IMO, even if the syntax might be unknown, it will remain unknown until projects from the ecosystem are not using it. To the question: which methods should be impacted? I think we should be as gentle as possible at first. I am a little concerned about breaking some codes which were working fine before. On Thu, 12 Sep 2019 at 04:43, Joel Nothman wrote: > These there details of specific API changes to be decided: > > The question being put, as per the SLEP, is: > do we want to utilise Python 3's force-keyword-argument syntax > and to change existing APIs which support arguments positionally to use > this syntax, via a deprecation period? > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -- Guillaume Lemaitre INRIA Saclay - Parietal team Center for Data Science Paris-Saclay https://glemaitre.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From alejandro.peralta at mercadolibre.com Thu Sep 12 08:23:03 2019 From: alejandro.peralta at mercadolibre.com (Alejandro Javier Peralta Frias) Date: Thu, 12 Sep 2019 09:23:03 -0300 Subject: [scikit-learn] How can I enable line tracing for cython modules. Message-ID: Hello all, To enable cython tracing (in particular I want to line trace neighbors module) I understand that I have to recompile the cython modules with CYTHON_TRACE=1 but I'm not sure where should I set this. Should I use: # distutils: define_macros=CYTHON_TRACE_NOGIL=1 In the files I want to trace? Regards, -- Ale -------------- next part -------------- An HTML attachment was scrubbed... URL: From spsayakpaul at gmail.com Fri Sep 13 01:16:09 2019 From: spsayakpaul at gmail.com (Sayak Paul) Date: Fri, 13 Sep 2019 10:46:09 +0530 Subject: [scikit-learn] scikit-learn Digest, Vol 42, Issue 14 In-Reply-To: References: Message-ID: I was able to solve the problem using - mlb = MultiLabelBinarizer() mlb.fit([y_train]) Thanks for the suggestions. The output of mlb.classes_ now looks the following (first ten classes): [image: image.png] However, when I transform it using mlb.transform([y_train]), another problem arrises - [image: image.png] Kindly suggest :) Sayak Paul | sayak.dev On Thu, Sep 12, 2019 at 9:33 PM wrote: > Send scikit-learn mailing list submissions to > scikit-learn at python.org > > To subscribe or unsubscribe via the World Wide Web, visit > https://mail.python.org/mailman/listinfo/scikit-learn > or, via email, send a message with subject or body 'help' to > scikit-learn-request at python.org > > You can reach the person managing the list at > scikit-learn-owner at python.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of scikit-learn digest..." > > > Today's Topics: > > 1. Re: MultiLabelBinarizer gives individual characters instead > of the classes (Lo?c Est?ve) > 2. Re: Vote on SLEP009: keyword only arguments (Guillaume Lema?tre) > 3. 
How can I enable line tracing for cython modules. > (Alejandro Javier Peralta Frias) > > ------------------------------ > > End of scikit-learn Digest, Vol 42, Issue 14 > ******************************************** > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 16117 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: image.png Type: image/png Size: 7675 bytes Desc: not available URL: From jeremie.du-boisberranger at inria.fr Fri Sep 13 05:53:39 2019 From: jeremie.du-boisberranger at inria.fr (Jeremie du Boisberranger) Date: Fri, 13 Sep 2019 11:53:39 +0200 Subject: [scikit-learn] Vote on SLEP009: keyword only arguments In-Reply-To: References: <20190912003716.738E42D0009B@webmail.sinamail.sina.com.cn> Message-ID: <29eccb39-b4f8-bd21-d6c8-005b9c2b087a@inria.fr> I don't know what the policy is about a sklearn 1.0 w.r.t. api changes. If it's meant to be a special release with possible api changes without deprecation cycles, I think this change is a good candidate for 1.0 Otherwise I'm +1 and agree with Guillaume, people will get used to it by using it. Jérémie On 12/09/2019 10:06, Guillaume Lemaître wrote: > To the question: do we want to utilise Python 3's > force-keyword-argument syntax > and to change existing APIs which support arguments positionally to > use this > syntax, via a deprecation period? > > I am +1. > > IMO, even if the syntax might be unknown, it will remain unknown until > projects > from the ecosystem are not using it. > > To the question: which methods should be impacted? > > I think we should be as gentle as possible at first. I am a little > concerned about > breaking some codes which were working fine before. > > On Thu, 12 Sep 2019 at 04:43, Joel Nothman > wrote: > > There are details of specific API changes to be decided: > > The question being put, as per the SLEP, is: > do we want to utilise Python 3's force-keyword-argument syntax > and to change existing APIs which support arguments positionally > to use this syntax, via a deprecation period?
> _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > > -- > Guillaume Lemaitre > INRIA Saclay - Parietal team > Center for Data Science Paris-Saclay > https://glemaitre.github.io/ > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From tmrsg11 at gmail.com Fri Sep 13 22:38:16 2019 From: tmrsg11 at gmail.com (C W) Date: Fri, 13 Sep 2019 22:38:16 -0400 Subject: [scikit-learn] Can Scikit-learn decision tree (CART) have both continuous and categorical features? Message-ID: Hello all, I'm very confused. Can the decision tree module handle both continuous and categorical features in the dataset? In this case, it's just CART (Classification and Regression Trees). For example, Gender Age Income Car Attendance Male 30 10000 BMW Yes Female 35 9000 Toyota No Male 50 12000 Audi Yes According to the documentation https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart, it can not! It says: "scikit-learn implementation does not support categorical variables for now". Is this true? If not, can someone point me to an example? If yes, what do people do? Thank you very much! -------------- next part -------------- An HTML attachment was scrubbed... URL: From mail at sebastianraschka.com Fri Sep 13 23:35:45 2019 From: mail at sebastianraschka.com (Sebastian Raschka) Date: Fri, 13 Sep 2019 22:35:45 -0500 Subject: [scikit-learn] Can Scikit-learn decision tree (CART) have both continuous and categorical features? 
In-Reply-To: References: Message-ID: <7A421666-3446-44A4-985C-B7708C8D6966@sebastianraschka.com> Hi, if you have the category "car" as shown in your example, this would effectively be something like BMW=0 Toyota=1 Audi=2 Sure, the algorithm will execute just fine on the feature column with values in {0, 1, 2}. However, the problem is that it will come up with binary rules like x_i >= 0.5, x_i >= 1.5, and x_i >= 2.5. I.e., it will treat it as a continuous variable. What you can do is to encode this feature via one-hot encoding -- basically extend it into 2 (or 3) binary variables. This has its own problems (if you have a feature with many possible values, you will end up with a large number of binary variables, and they may dominate in the resulting tree over other feature variables). In any case, I guess this is what > "scikit-learn implementation does not support categorical variables for now". means ;). Best, Sebastian > On Sep 13, 2019, at 9:38 PM, C W wrote: > > Hello all, > I'm very confused. Can the decision tree module handle both continuous and categorical features in the dataset? In this case, it's just CART (Classification and Regression Trees). > > For example, > Gender Age Income Car Attendance > Male 30 10000 BMW Yes > Female 35 9000 Toyota No > Male 50 12000 Audi Yes > > According to the documentation https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart, it can not! > > It says: "scikit-learn implementation does not support categorical variables for now". > > Is this true? If not, can someone point me to an example? If yes, what do people do? > > Thank you very much!
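The two encodings Sebastian contrasts can be written out by hand for the "Car" column from the example. This is a pure-Python sketch of the idea only; in practice one would reach for scikit-learn's OneHotEncoder/OrdinalEncoder or pandas' get_dummies:

```python
cars = ['BMW', 'Toyota', 'Audi']

# Ordinal encoding: one integer per category. A tree then splits on
# thresholds like x >= 0.5, i.e. it treats the column as continuous
# and imposes an artificial order Audi < BMW < Toyota.
categories = sorted(set(cars))                 # ['Audi', 'BMW', 'Toyota']
ordinal = [categories.index(c) for c in cars]  # [1, 2, 0]

# One-hot encoding: one binary column per category, no implied order.
one_hot = [[int(c == cat) for cat in categories] for c in cars]
# BMW    -> [0, 1, 0]
# Toyota -> [0, 0, 1]
# Audi   -> [1, 0, 0]
```

With many categories the one-hot version grows one column per value, which is the dominance problem Sebastian mentions.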
> > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From tmrsg11 at gmail.com Sat Sep 14 00:41:06 2019 From: tmrsg11 at gmail.com (C W) Date: Sat, 14 Sep 2019 00:41:06 -0400 Subject: [scikit-learn] Can Scikit-learn decision tree (CART) have both continuous and categorical features? In-Reply-To: <7A421666-3446-44A4-985C-B7708C8D6966@sebastianraschka.com> References: <7A421666-3446-44A4-985C-B7708C8D6966@sebastianraschka.com> Message-ID: Thanks, Sebastian. It's great to know that it works, just need to do one-hot-encoding first. I have mixed data type (continuous and categorical). Should I tree. DecisionTreeClassifier() or tree.DecisionTreeRegressor()? I'm guessing tree.DecisionTreeClassifier()? Best, Mike On Fri, Sep 13, 2019 at 11:59 PM Sebastian Raschka < mail at sebastianraschka.com> wrote: > Hi, > > if you have the category "car" as shown in your example, this would > effectively be something like > > BMW=0 > Toyota=1 > Audi=2 > > Sure, the algorithm will execute just fine on the feature column with > values in {0, 1, 2}. However, the problem is that it will come up with > binary rules like x_i>= 0.5, x_i>= 1.5, and x_i>= 2.5. I.e., it will treat > it is a continuous variable. > > What you can do is to encode this feature via one-hot encoding -- > basically extend it into 2 (or 3) binary variables. This has it's own > problems (if you have a feature with many possible values, you will end up > with a large number of binary variables, and they may dominate in the > resulting tree over other feature variables). > > In any case, I guess this is what > > > "scikit-learn implementation does not support categorical variables for > now". > > > means ;). > > Best, > Sebastian > > > On Sep 13, 2019, at 9:38 PM, C W wrote: > > > > Hello all, > > I'm very confused. Can the decision tree module handle both continuous > and categorical features in the dataset? 
In this case, it's just CART > (Classification and Regression Trees). > > > > For example, > > Gender Age Income Car Attendance > > Male 30 10000 BMW Yes > > Female 35 9000 Toyota No > > Male 50 12000 Audi Yes > > > > According to the documentation > https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart, > it can not! > > > > It says: "scikit-learn implementation does not support categorical > variables for now". > > > > Is this true? If not, can someone point me to an example? If yes, what > do people do? > > > > Thank you very much! > > > > > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mail at sebastianraschka.com Sat Sep 14 00:56:15 2019 From: mail at sebastianraschka.com (Sebastian Raschka) Date: Fri, 13 Sep 2019 23:56:15 -0500 Subject: [scikit-learn] Can Scikit-learn decision tree (CART) have both continuous and categorical features? In-Reply-To: References: <7A421666-3446-44A4-985C-B7708C8D6966@sebastianraschka.com> Message-ID: Hi Mike, just to make sure we are on the same page, > I have mixed data type (continuous and categorical). Should I tree.DecisionTreeClassifier() or tree.DecisionTreeRegressor()? that's independent from the previous email. The comment > > "scikit-learn implementation does not support categorical variables for now". we discussed via the previous email was referring to feature variables. Whether you choose the DT regressor or classifier depends on the format of your target variable. Best, Sebastian > On Sep 13, 2019, at 11:41 PM, C W wrote: > > Thanks, Sebastian. 
It's great to know that it works, just need to do one-hot-encoding first. > > I have mixed data type (continuous and categorical). Should I tree.DecisionTreeClassifier() or tree.DecisionTreeRegressor()? > > I'm guessing tree.DecisionTreeClassifier()? > > Best, > > Mike > > On Fri, Sep 13, 2019 at 11:59 PM Sebastian Raschka wrote: > Hi, > > if you have the category "car" as shown in your example, this would effectively be something like > > BMW=0 > Toyota=1 > Audi=2 > > Sure, the algorithm will execute just fine on the feature column with values in {0, 1, 2}. However, the problem is that it will come up with binary rules like x_i>= 0.5, x_i>= 1.5, and x_i>= 2.5. I.e., it will treat it is a continuous variable. > > What you can do is to encode this feature via one-hot encoding -- basically extend it into 2 (or 3) binary variables. This has it's own problems (if you have a feature with many possible values, you will end up with a large number of binary variables, and they may dominate in the resulting tree over other feature variables). > > In any case, I guess this is what > > > "scikit-learn implementation does not support categorical variables for now". > > > means ;). > > Best, > Sebastian > > > On Sep 13, 2019, at 9:38 PM, C W wrote: > > > > Hello all, > > I'm very confused. Can the decision tree module handle both continuous and categorical features in the dataset? In this case, it's just CART (Classification and Regression Trees). > > > > For example, > > Gender Age Income Car Attendance > > Male 30 10000 BMW Yes > > Female 35 9000 Toyota No > > Male 50 12000 Audi Yes > > > > According to the documentation https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart, it can not! > > > > It says: "scikit-learn implementation does not support categorical variables for now". > > > > Is this true? If not, can someone point me to an example? If yes, what do people do? > > > > Thank you very much! 
> > > > > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From tmrsg11 at gmail.com Sat Sep 14 01:26:58 2019 From: tmrsg11 at gmail.com (C W) Date: Sat, 14 Sep 2019 01:26:58 -0400 Subject: [scikit-learn] Can Scikit-learn decision tree (CART) have both continuous and categorical features? In-Reply-To: References: <7A421666-3446-44A4-985C-B7708C8D6966@sebastianraschka.com> Message-ID: Ahh, you are right. Regression vs. Classification is about the type of target variable, not features. Thanks, more clear now. Mike On Sat, Sep 14, 2019 at 1:23 AM Sebastian Raschka wrote: > Hi Mike, > > just to make sure we are on the same page, > > > I have mixed data type (continuous and categorical). Should I > tree.DecisionTreeClassifier() or tree.DecisionTreeRegressor()? > > that's independent from the previous email. The comment > > > > "scikit-learn implementation does not support categorical variables > for now". > > we discussed via the previous email was referring to feature variables. > Whether you choose the DT regressor or classifier depends on the format of > your target variable. > > Best, > Sebastian > > > On Sep 13, 2019, at 11:41 PM, C W wrote: > > > > Thanks, Sebastian. It's great to know that it works, just need to do > one-hot-encoding first. > > > > I have mixed data type (continuous and categorical). Should I > tree.DecisionTreeClassifier() or tree.DecisionTreeRegressor()? > > > > I'm guessing tree.DecisionTreeClassifier()? 
> > > > Best, > > > > Mike > > > > On Fri, Sep 13, 2019 at 11:59 PM Sebastian Raschka < > mail at sebastianraschka.com> wrote: > > Hi, > > > > if you have the category "car" as shown in your example, this would > effectively be something like > > > > BMW=0 > > Toyota=1 > > Audi=2 > > > > Sure, the algorithm will execute just fine on the feature column with > values in {0, 1, 2}. However, the problem is that it will come up with > binary rules like x_i>= 0.5, x_i>= 1.5, and x_i>= 2.5. I.e., it will treat > it is a continuous variable. > > > > What you can do is to encode this feature via one-hot encoding -- > basically extend it into 2 (or 3) binary variables. This has it's own > problems (if you have a feature with many possible values, you will end up > with a large number of binary variables, and they may dominate in the > resulting tree over other feature variables). > > > > In any case, I guess this is what > > > > > "scikit-learn implementation does not support categorical variables > for now". > > > > > > means ;). > > > > Best, > > Sebastian > > > > > On Sep 13, 2019, at 9:38 PM, C W wrote: > > > > > > Hello all, > > > I'm very confused. Can the decision tree module handle both continuous > and categorical features in the dataset? In this case, it's just CART > (Classification and Regression Trees). > > > > > > For example, > > > Gender Age Income Car Attendance > > > Male 30 10000 BMW Yes > > > Female 35 9000 Toyota No > > > Male 50 12000 Audi Yes > > > > > > According to the documentation > https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart, > it can not! > > > > > > It says: "scikit-learn implementation does not support categorical > variables for now". > > > > > > Is this true? If not, can someone point me to an example? If yes, what > do people do? > > > > > > Thank you very much! 
> > > > > > > > > > > > _______________________________________________ > > > scikit-learn mailing list > > > scikit-learn at python.org > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.lemaitre58 at gmail.com Sat Sep 14 05:14:17 2019 From: g.lemaitre58 at gmail.com (=?UTF-8?Q?Guillaume_Lema=C3=AEtre?=) Date: Sat, 14 Sep 2019 11:14:17 +0200 Subject: [scikit-learn] Can Scikit-learn decision tree (CART) have both continuous and categorical features? In-Reply-To: References: <7A421666-3446-44A4-985C-B7708C8D6966@sebastianraschka.com> Message-ID: I will just add that if you have heterogeneous types, you might want to look at the ColumnTransformer: https://scikit-learn.org/stable/auto_examples/compose/plot_column_transformer_mixed_types.html You might want to apply some scaling (would not be relevant for tree thought) and encode categories (ordinal encoding for the tree-based) and then dispatch these data to a decision tree. The previous example shows how to construct such a preprocessor and pipeline it with a predictor. On Sat, 14 Sep 2019 at 07:29, C W wrote: > Ahh, you are right. Regression vs. Classification is about the type of > target variable, not features. > > Thanks, more clear now. 
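Guillaume's ColumnTransformer suggestion can be sketched concretely. The toy data below echoes Mike's example but is otherwise invented for illustration; this is one reasonable wiring, not the only one:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

X = pd.DataFrame({
    'age': [30, 35, 50],
    'income': [10000, 9000, 12000],
    'gender': ['Male', 'Female', 'Male'],
    'car': ['BMW', 'Toyota', 'Audi'],
})
y = ['Yes', 'No', 'Yes']

# Ordinal-encode the categorical columns (fine for trees, which only
# split on thresholds) and pass the numeric columns through untouched.
preprocessor = ColumnTransformer(
    [('categories', OrdinalEncoder(), ['gender', 'car'])],
    remainder='passthrough',
)
model = make_pipeline(preprocessor, DecisionTreeClassifier(random_state=0))
model.fit(X, y)
print(model.predict(X))  # the tree fits the three training rows exactly
```

Scaling steps could be added to the numeric side of the ColumnTransformer for other estimators; as noted above, they are not relevant for trees.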
> > Mike > > On Sat, Sep 14, 2019 at 1:23 AM Sebastian Raschka < > mail at sebastianraschka.com> wrote: > >> Hi Mike, >> >> just to make sure we are on the same page, >> >> > I have mixed data type (continuous and categorical). Should I >> tree.DecisionTreeClassifier() or tree.DecisionTreeRegressor()? >> >> that's independent from the previous email. The comment >> >> > > "scikit-learn implementation does not support categorical variables >> for now". >> >> we discussed via the previous email was referring to feature variables. >> Whether you choose the DT regressor or classifier depends on the format of >> your target variable. >> >> Best, >> Sebastian >> >> > On Sep 13, 2019, at 11:41 PM, C W wrote: >> > >> > Thanks, Sebastian. It's great to know that it works, just need to do >> one-hot-encoding first. >> > >> > I have mixed data type (continuous and categorical). Should I >> tree.DecisionTreeClassifier() or tree.DecisionTreeRegressor()? >> > >> > I'm guessing tree.DecisionTreeClassifier()? >> > >> > Best, >> > >> > Mike >> > >> > On Fri, Sep 13, 2019 at 11:59 PM Sebastian Raschka < >> mail at sebastianraschka.com> wrote: >> > Hi, >> > >> > if you have the category "car" as shown in your example, this would >> effectively be something like >> > >> > BMW=0 >> > Toyota=1 >> > Audi=2 >> > >> > Sure, the algorithm will execute just fine on the feature column with >> values in {0, 1, 2}. However, the problem is that it will come up with >> binary rules like x_i>= 0.5, x_i>= 1.5, and x_i>= 2.5. I.e., it will treat >> it is a continuous variable. >> > >> > What you can do is to encode this feature via one-hot encoding -- >> basically extend it into 2 (or 3) binary variables. This has it's own >> problems (if you have a feature with many possible values, you will end up >> with a large number of binary variables, and they may dominate in the >> resulting tree over other feature variables). 
>> > >> > In any case, I guess this is what >> > >> > > "scikit-learn implementation does not support categorical variables >> for now". >> > >> > >> > means ;). >> > >> > Best, >> > Sebastian >> > >> > > On Sep 13, 2019, at 9:38 PM, C W wrote: >> > > >> > > Hello all, >> > > I'm very confused. Can the decision tree module handle both >> continuous and categorical features in the dataset? In this case, it's just >> CART (Classification and Regression Trees). >> > > >> > > For example, >> > > Gender Age Income Car Attendance >> > > Male 30 10000 BMW Yes >> > > Female 35 9000 Toyota No >> > > Male 50 12000 Audi Yes >> > > >> > > According to the documentation >> https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart, >> it can not! >> > > >> > > It says: "scikit-learn implementation does not support categorical >> variables for now". >> > > >> > > Is this true? If not, can someone point me to an example? If yes, >> what do people do? >> > > >> > > Thank you very much! 
>> > > >> > > >> > > >> > > _______________________________________________ >> > > scikit-learn mailing list >> > > scikit-learn at python.org >> > > https://mail.python.org/mailman/listinfo/scikit-learn >> > >> > _______________________________________________ >> > scikit-learn mailing list >> > scikit-learn at python.org >> > https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ >> > scikit-learn mailing list >> > scikit-learn at python.org >> > https://mail.python.org/mailman/listinfo/scikit-learn >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -- Guillaume Lemaitre INRIA Saclay - Parietal team Center for Data Science Paris-Saclay https://glemaitre.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From joel.nothman at gmail.com Sat Sep 14 08:10:29 2019 From: joel.nothman at gmail.com (Joel Nothman) Date: Sat, 14 Sep 2019 22:10:29 +1000 Subject: [scikit-learn] Vote on SLEP009: keyword only arguments In-Reply-To: <29eccb39-b4f8-bd21-d6c8-005b9c2b087a@inria.fr> References: <20190912003716.738E42D0009B@webmail.sinamail.sina.com.cn> <29eccb39-b4f8-bd21-d6c8-005b9c2b087a@inria.fr> Message-ID: I am +1 for this change. I agree that users will accommodate the syntax sooner or later. On Fri., 13 Sep. 2019, 7:54 pm Jeremie du Boisberranger, < jeremie.du-boisberranger at inria.fr> wrote: > I don't know what is the policy about a sklearn 1.0 w.r.t api changes. > > If it's meant to be a special release with possible api changes without > deprecation cycles, I think this change is a good candidate for 1.0 > > > Otherwise I'm +1 and agree with Guillaume, people will get used to it by > using it. 
> > J?r?mie > > > > On 12/09/2019 10:06, Guillaume Lema?tre wrote: > > To the question: do we want to utilise Python 3's force-keyword-argument > syntax > and to change existing APIs which support arguments positionally to use > this > syntax, via a deprecation period? > > I am +1. > > IMO, even if the syntax might be unknown, it will remain unknown until > projects > from the ecosystem are not using it. > > To the question: which methods should be impacted? > > I think we should be as gentle as possible at first. I am a little > concerned about > breaking some codes which were working fine before. > > On Thu, 12 Sep 2019 at 04:43, Joel Nothman wrote: > >> These there details of specific API changes to be decided: >> >> The question being put, as per the SLEP, is: >> do we want to utilise Python 3's force-keyword-argument syntax >> and to change existing APIs which support arguments positionally to use >> this syntax, via a deprecation period? >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > > > -- > Guillaume Lemaitre > INRIA Saclay - Parietal team > Center for Data Science Paris-Saclay > https://glemaitre.github.io/ > > _______________________________________________ > scikit-learn mailing listscikit-learn at python.orghttps://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jlopez at ende.cc Sat Sep 14 09:23:13 2019 From: jlopez at ende.cc (=?UTF-8?Q?Javier_L=C3=B3pez?=) Date: Sat, 14 Sep 2019 14:23:13 +0100 Subject: [scikit-learn] Can Scikit-learn decision tree (CART) have both continuous and categorical features? 
In-Reply-To: References: Message-ID: If you have datasets with many categorical features, and perhaps many categories, the tools in sklearn are quite limited, but there are alternative implementations of boosted trees that are designed with categorical features in mind. Take a look at catboost [1], which has an sklearn-compatible API. J [1] https://catboost.ai/ On Sat, Sep 14, 2019 at 3:40 AM C W wrote: > Hello all, > I'm very confused. Can the decision tree module handle both continuous and > categorical features in the dataset? In this case, it's just CART > (Classification and Regression Trees). > > For example, > Gender Age Income Car Attendance > Male 30 10000 BMW Yes > Female 35 9000 Toyota No > Male 50 12000 Audi Yes > > According to the documentation > https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart, > it can not! > > It says: "scikit-learn implementation does not support categorical > variables for now". > > Is this true? If not, can someone point me to an example? If yes, what do > people do? > > Thank you very much! > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From spsayakpaul at gmail.com Sat Sep 14 12:30:37 2019 From: spsayakpaul at gmail.com (Sayak Paul) Date: Sat, 14 Sep 2019 22:00:37 +0530 Subject: [scikit-learn] Problem regarding MultiLabelBinarizer In-Reply-To: References: Message-ID: Sayak Paul | sayak.dev ---------- Forwarded message --------- From: Date: Fri, Sep 13, 2019 at 10:46 AM Subject: scikit-learn Digest, Vol 42, Issue 15 To: Send scikit-learn mailing list submissions to scikit-learn at python.org To subscribe or unsubscribe via the World Wide Web, visit https://mail.python.org/mailman/listinfo/scikit-learn or, via email, send a message with subject or body 'help' to scikit-learn-request at python.org You can reach the person managing the list at scikit-learn-owner at python.org When replying, please edit your Subject line so it is more specific than "Re: Contents of scikit-learn digest..." Today's Topics: 1. Re: scikit-learn Digest, Vol 42, Issue 14 (Sayak Paul) ---------------------------------------------------------------------- Message: 1 Date: Fri, 13 Sep 2019 10:46:09 +0530 From: Sayak Paul To: scikit-learn at python.org Subject: Re: [scikit-learn] scikit-learn Digest, Vol 42, Issue 14 Message-ID: Content-Type: text/plain; charset="utf-8" I was able to solve the problem using - mlb = MultiLabelBinarizer() mlb.fit([y_train]) Thanks for the suggestions. 
The output of mlb.classes_ now looks as follows (first ten classes): [image: image.png] However, when I transform it using mlb.transform([y_train]), another problem arises - [image: image.png] Kindly suggest :) Sayak Paul | sayak.dev On Thu, Sep 12, 2019 at 9:33 PM wrote: > Send scikit-learn mailing list submissions to > scikit-learn at python.org > > To subscribe or unsubscribe via the World Wide Web, visit > https://mail.python.org/mailman/listinfo/scikit-learn > or, via email, send a message with subject or body 'help' to > scikit-learn-request at python.org > > You can reach the person managing the list at > scikit-learn-owner at python.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of scikit-learn digest..." > > > Today's Topics: > > 1. Re: MultiLabelBinarizer gives individual characters instead > of the classes (Loïc Estève) > 2. Re: Vote on SLEP009: keyword only arguments (Guillaume Lemaître) > 3. How can I enable line tracing for cython modules. > (Alejandro Javier Peralta Frias) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Thu, 12 Sep 2019 07:24:48 +0200 > From: Loïc Estève > To: Scikit-learn mailing list > Subject: Re: [scikit-learn] MultiLabelBinarizer gives individual > characters instead of the classes > Message-ID: > Content-Type: text/plain; charset=utf-8 > > I think this caveat has been added in the dev doc (not yet in the stable > doc). You may want to read: > > https://scikit-learn.org/dev/modules/generated/sklearn.preprocessing.MultiLabelBinarizer.html > and in particular the part that starts with "A common mistake is to pass > in a list". > > Cheers, > Loïc > > > Hi. > > > > I am working on a Multi-label text classification problem. In order to > encode the labels, I am using MultiLabelBinarizer.
The labels of the > dataset look like - > > > > image > > > > When I am using > > > > mlb = MultiLabelBinarizer() > > mlb.fit(labels) > > print(mlb.classes_) > > > > I am getting - > > > > image > > > > Whereas, the output (sample output) I want is - > > > > image > > > > I got the above output by - > > > > mlb = MultiLabelBinarizer() > > sample_labels = [ > > ['stat.ML', 'cs.LG'], > > ['cs.CV', 'cs.RO'] > > ] > > mlb.fit(sample_labels) > > print(mlb.classes_) > > > > Help would be very much appreciated here. > > > > Here's the dataset I had prepared: > > arXivdata.csv.zip > > > > I stripped away the double quotes in the labels after loading it in a > pandas DataFrame by - > > > > import re > > > > arxiv_data['labels'] = arxiv_data['labels'].str.replace(r"[\"]", '') > > > > scikit-learn version: '0.21.3' > > > > Sayak Paul | sayak.dev > > > > ------------------------------ > > Message: 2 > Date: Thu, 12 Sep 2019 10:06:30 +0200 > From: Guillaume Lema?tre > To: Scikit-learn mailing list > Subject: Re: [scikit-learn] Vote on SLEP009: keyword only arguments > Message-ID: > < > CACDxx9jCkE5GAjRNj3TKinbuyWZQvXMrrcHBBqn6q_FXYdPrbQ at mail.gmail.com> > Content-Type: text/plain; charset="utf-8" > > To the question: do we want to utilise Python 3's force-keyword-argument > syntax > and to change existing APIs which support arguments positionally to use > this > syntax, via a deprecation period? > > I am +1. > > IMO, even if the syntax might be unknown, it will remain unknown until > projects > from the ecosystem are not using it. > > To the question: which methods should be impacted? > > I think we should be as gentle as possible at first. I am a little > concerned about > breaking some codes which were working fine before. 
> > On Thu, 12 Sep 2019 at 04:43, Joel Nothman wrote: > > > These there details of specific API changes to be decided: > > > > The question being put, as per the SLEP, is: > > do we want to utilise Python 3's force-keyword-argument syntax > > and to change existing APIs which support arguments positionally to use > > this syntax, via a deprecation period? > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > -- > Guillaume Lemaitre > INRIA Saclay - Parietal team > Center for Data Science Paris-Saclay > https://glemaitre.github.io/ > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: < > http://mail.python.org/pipermail/scikit-learn/attachments/20190912/047eb83c/attachment-0001.html > > > > ------------------------------ > > Message: 3 > Date: Thu, 12 Sep 2019 09:23:03 -0300 > From: Alejandro Javier Peralta Frias > > To: scikit-learn at python.org > Subject: [scikit-learn] How can I enable line tracing for cython > modules. > Message-ID: > mgsgFpcmASzUhZA at mail.gmail.com> > Content-Type: text/plain; charset="utf-8" > > Hello all, > > To enable cython tracing (in particular I want to line trace neighbors > module) I understand that I have to recompile the cython modules with > CYTHON_TRACE=1 but I'm not sure where should I set this. > > Should I use: > > # distutils: define_macros=CYTHON_TRACE_NOGIL=1 > > > In the files I want to trace? > > Regards, > -- > Ale > -------------- next part -------------- > An HTML attachment was scrubbed... 
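[Editor's note on the Cython line-tracing question above: the usual recipe, following Cython's profiling documentation, is to combine the `linetrace` compiler directive with the `CYTHON_TRACE` (or `CYTHON_TRACE_NOGIL`) C macro. This is a build-configuration sketch only — the module and file names are made up, and details should be checked against your Cython version.]

```python
# At the top of the .pyx file you want to trace, you can use file-level
# directive comments:
#   # cython: linetrace=True
#   # distutils: define_macros=CYTHON_TRACE_NOGIL=1
#
# Or equivalently in setup.py, so the source files stay unchanged:
from setuptools import Extension, setup
from Cython.Build import cythonize

ext = Extension(
    "neighbors_traced",                         # hypothetical module name
    ["neighbors_traced.pyx"],                   # hypothetical source file
    define_macros=[("CYTHON_TRACE_NOGIL", "1")],
)
setup(ext_modules=cythonize(ext, compiler_directives={"linetrace": True}))
```

Both pieces are needed: the directive emits the tracing code, and the macro actually enables it at C compile time.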
> URL: < > http://mail.python.org/pipermail/scikit-learn/attachments/20190912/0377329b/attachment-0001.html > > > > ------------------------------ > > Subject: Digest Footer > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > ------------------------------ > > End of scikit-learn Digest, Vol 42, Issue 14 > ******************************************** > -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://mail.python.org/pipermail/scikit-learn/attachments/20190913/921c80cd/attachment.html > -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 16117 bytes Desc: not available URL: < http://mail.python.org/pipermail/scikit-learn/attachments/20190913/921c80cd/attachment.png > -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 7675 bytes Desc: not available URL: < http://mail.python.org/pipermail/scikit-learn/attachments/20190913/921c80cd/attachment-0001.png > ------------------------------ Subject: Digest Footer _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn ------------------------------ End of scikit-learn Digest, Vol 42, Issue 15 ******************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: From tmrsg11 at gmail.com Sat Sep 14 14:57:22 2019 From: tmrsg11 at gmail.com (C W) Date: Sat, 14 Sep 2019 14:57:22 -0400 Subject: [scikit-learn] Can Scikit-learn decision tree (CART) have both continuous and categorical features? In-Reply-To: References: Message-ID: Thanks, Guillaume. Column transformer looks pretty neat. I've also heard though, this pipeline can be tedious to set up? 
Specifying what you want for every feature is a pain. Jaiver, Actually, you guessed right. My real data has only one numerical variable, looks more like this: Gender Date Income Car Attendance Male 2019/3/01 10000 BMW Yes Female 2019/5/02 9000 Toyota No Male 2019/7/15 12000 Audi Yes I am predicting income using all other categorical variables. Maybe it is catboost! Thanks, M On Sat, Sep 14, 2019 at 9:25 AM Javier L?pez wrote: > If you have datasets with many categorical features, and perhaps many > categories, the tools in sklearn are quite limited, > but there are alternative implementations of boosted trees that are > designed with categorical features in mind. Take a look > at catboost [1], which has an sklearn-compatible API. > > J > > [1] https://catboost.ai/ > > On Sat, Sep 14, 2019 at 3:40 AM C W wrote: > >> Hello all, >> I'm very confused. Can the decision tree module handle both continuous >> and categorical features in the dataset? In this case, it's just CART >> (Classification and Regression Trees). >> >> For example, >> Gender Age Income Car Attendance >> Male 30 10000 BMW Yes >> Female 35 9000 Toyota No >> Male 50 12000 Audi Yes >> >> According to the documentation >> https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart, >> it can not! >> >> It says: "scikit-learn implementation does not support categorical >> variables for now". >> >> Is this true? If not, can someone point me to an example? If yes, what do >> people do? >> >> Thank you very much! >> >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... 
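[Editor's note: the ColumnTransformer workaround discussed in this thread can be sketched with plain scikit-learn — one-hot encode the categorical columns, then fit an ordinary tree on the encoded matrix. Toy data below is shaped like the table in the thread; the column choices are illustrative, not a recommendation for the poster's real dataset.]

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeRegressor

# Toy data resembling the thread's example: predict Income from
# categorical features only.
X = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female"],
    "Car": ["BMW", "Toyota", "Audi", "Toyota"],
    "Attendance": ["Yes", "No", "Yes", "Yes"],
})
y = [10000, 9000, 12000, 9500]

categorical = ["Gender", "Car", "Attendance"]
# One-hot encode the categorical columns; unseen categories at predict
# time are encoded as all-zeros rather than raising.
pre = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), categorical)]
)
model = Pipeline([
    ("pre", pre),
    ("tree", DecisionTreeRegressor(random_state=0)),
])
model.fit(X, y)
print(model.predict(X.head(1)))
```

Note the caveat raised elsewhere in the thread still applies: the tree only sees binary indicator columns, which is not the same as native categorical splits (the catboost-style handling).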
URL: From thomasjpfan at gmail.com Sat Sep 14 18:21:12 2019 From: thomasjpfan at gmail.com (Thomas J Fan) Date: Sat, 14 Sep 2019 18:21:12 -0400 Subject: [scikit-learn] Vote on SLEP009: keyword only arguments In-Reply-To: References: <20190912003716.738E42D0009B@webmail.sinamail.sina.com.cn> <29eccb39-b4f8-bd21-d6c8-005b9c2b087a@inria.fr> Message-ID: +1 from me On Sat, Sep 14, 2019 at 8:12 AM Joel Nothman wrote: > I am +1 for this change. > > I agree that users will accommodate the syntax sooner or later. > > On Fri., 13 Sep. 2019, 7:54 pm Jeremie du Boisberranger, < > jeremie.du-boisberranger at inria.fr> wrote: > >> I don't know what is the policy about a sklearn 1.0 w.r.t api changes. >> >> If it's meant to be a special release with possible api changes without >> deprecation cycles, I think this change is a good candidate for 1.0 >> >> >> Otherwise I'm +1 and agree with Guillaume, people will get used to it by >> using it. >> >> J?r?mie >> >> >> >> On 12/09/2019 10:06, Guillaume Lema?tre wrote: >> >> To the question: do we want to utilise Python 3's force-keyword-argument >> syntax >> and to change existing APIs which support arguments positionally to use >> this >> syntax, via a deprecation period? >> >> I am +1. >> >> IMO, even if the syntax might be unknown, it will remain unknown until >> projects >> from the ecosystem are not using it. >> >> To the question: which methods should be impacted? >> >> I think we should be as gentle as possible at first. I am a little >> concerned about >> breaking some codes which were working fine before. >> >> On Thu, 12 Sep 2019 at 04:43, Joel Nothman >> wrote: >> >>> These there details of specific API changes to be decided: >>> >>> The question being put, as per the SLEP, is: >>> do we want to utilise Python 3's force-keyword-argument syntax >>> and to change existing APIs which support arguments positionally to use >>> this syntax, via a deprecation period? 
>>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >> >> >> -- >> Guillaume Lemaitre >> INRIA Saclay - Parietal team >> Center for Data Science Paris-Saclay >> https://glemaitre.github.io/ >> >> _______________________________________________ >> scikit-learn mailing listscikit-learn at python.orghttps://mail.python.org/mailman/listinfo/scikit-learn >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.lemaitre58 at gmail.com Sun Sep 15 08:16:29 2019 From: g.lemaitre58 at gmail.com (=?UTF-8?Q?Guillaume_Lema=C3=AEtre?=) Date: Sun, 15 Sep 2019 14:16:29 +0200 Subject: [scikit-learn] Can Scikit-learn decision tree (CART) have both continuous and categorical features? In-Reply-To: References: Message-ID: On Sat, 14 Sep 2019 at 20:59, C W wrote: > Thanks, Guillaume. > Column transformer looks pretty neat. I've also heard though, this > pipeline can be tedious to set up? Specifying what you want for every > feature is a pain. > It would be interesting for us which part of the pipeline is tedious to set up to know if we can improve something there. Do you mean, that you would like to automatically detect of which type of feature (categorical/numerical) and apply a default encoder/scaling such as discuss there: https://github.com/scikit-learn/scikit-learn/issues/10603#issuecomment-401155127 IMO, one a user perspective, it would be cleaner in some cases at the cost of applying blindly a black box which might be dangerous. > > Jaiver, > Actually, you guessed right. 
My real data has only one numerical > variable, looks more like this: > > Gender Date Income Car Attendance > Male 2019/3/01 10000 BMW Yes > Female 2019/5/02 9000 Toyota No > Male 2019/7/15 12000 Audi Yes > > I am predicting income using all other categorical variables. Maybe it is > catboost! > > Thanks, > > M > > > > > > > On Sat, Sep 14, 2019 at 9:25 AM Javier L?pez wrote: > >> If you have datasets with many categorical features, and perhaps many >> categories, the tools in sklearn are quite limited, >> but there are alternative implementations of boosted trees that are >> designed with categorical features in mind. Take a look >> at catboost [1], which has an sklearn-compatible API. >> >> J >> >> [1] https://catboost.ai/ >> >> On Sat, Sep 14, 2019 at 3:40 AM C W wrote: >> >>> Hello all, >>> I'm very confused. Can the decision tree module handle both continuous >>> and categorical features in the dataset? In this case, it's just CART >>> (Classification and Regression Trees). >>> >>> For example, >>> Gender Age Income Car Attendance >>> Male 30 10000 BMW Yes >>> Female 35 9000 Toyota No >>> Male 50 12000 Audi Yes >>> >>> According to the documentation >>> https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart, >>> it can not! >>> >>> It says: "scikit-learn implementation does not support categorical >>> variables for now". >>> >>> Is this true? If not, can someone point me to an example? If yes, what >>> do people do? >>> >>> Thank you very much! 
>>> >>> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -- Guillaume Lemaitre INRIA Saclay - Parietal team Center for Data Science Paris-Saclay https://glemaitre.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From spsayakpaul at gmail.com Mon Sep 16 00:12:24 2019 From: spsayakpaul at gmail.com (Sayak Paul) Date: Mon, 16 Sep 2019 09:42:24 +0530 Subject: [scikit-learn] MultiBinarizer issue Message-ID: I am working on a multi-label text classification problem. In order to encode the labels, I am using MultiLabelBinarizer. The labels of the dataset look like - [cs.AI, cs.CL, cs.CV, cs.NE, stat.ML][cs.CL, cs.AI, cs.LG, cs.NE, stat.ML][cs.CL, cs.AI, cs.LG, cs.NE, stat.ML][stat.ML, cs.AI, cs.CL, cs.LG, cs.NE][cs.CL, cs.AI, cs.LG, cs.NE, stat.ML] When I am using mlb = MultiLabelBinarizer() mlb.fit(labels)print(mlb.classes_) It gives me - array([' ', ',', '.', 'A', 'B', 'C', 'D', 'E', 'G', 'H', 'I', 'L', 'M', 'N', 'O', 'P', 'R', 'S', 'T', 'V', 'Y', '[', ']', 'a', 'c', 'h', 'm', 's', 't'], dtype=object) I (partially) fixed this problem by mlb.fit([y_train]) and I got (I printed first 10 classes) - array(['[cs.AI, cs.CC]', '[cs.AI, cs.CV]', '[cs.AI, cs.CY]', '[cs.AI, cs.DB]', '[cs.AI, cs.DS]', '[cs.AI, cs.GT]', '[cs.AI, cs.HC]', '[cs.AI, cs.IR]', '[cs.AI, cs.LG, stat.ML]', '[cs.AI, cs.LG]'], dtype=object) Ideally, it should output the individual classes (there may be something wrong in my code). 
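[Editor's note: the character-level classes_ reported here are what MultiLabelBinarizer produces when it is fed plain strings — fit iterates over each label, so a string yields its characters, while wrapping the whole column as [y_train] makes each full string one class. A sketch of the usual fix, assuming the labels are stored as strings like those shown above; the parsing line is specific to that format.]

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Labels as read from a CSV: one string per sample, not a list of tags.
y_train = [
    "[cs.AI, cs.CL, cs.CV]",
    "[cs.CL, cs.LG, stat.ML]",
]

# Parse each string into an actual list of tags before fitting.
labels = [s.strip("[]").split(", ") for s in y_train]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)
print(mlb.classes_)  # individual tags such as 'cs.AI', not characters
print(Y)             # one indicator column per tag
```

With real data it may be safer to parse with a proper parser (e.g. splitting on commas and stripping whitespace per token) rather than the one-liner above, but the key point is that fit must receive an iterable of iterables of labels.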
When I am using mlb.fit_transform([y_train]), I am getting - array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]) Help would be very much appreciated. Here's the corresponding StackOverflow issue: https://stackoverflow.com/questions/57917936/multilabelbinarizer-gives-individual-characters-instead-of-the-classes Sayak Paul | sayak.dev -------------- next part -------------- An HTML attachment was scrubbed... URL: From zephyr14 at gmail.com Mon Sep 16 06:02:18 2019 From: zephyr14 at gmail.com (Vlad Niculae) Date: Mon, 16 Sep 2019 11:02:18 +0100 Subject: [scikit-learn] Vote on SLEP009: keyword only arguments In-Reply-To: References: <20190912003716.738E42D0009B@webmail.sinamail.sina.com.cn> <29eccb39-b4f8-bd21-d6c8-005b9c2b087a@inria.fr> Message-ID: I vote +1 Hopefully keyword-only args become normalized and a future will come where I won't see `x.sum(0)` anymore VN On Sat, Sep 14, 2019 at 11:23 PM Thomas J Fan wrote: > +1 from me > > On Sat, Sep 14, 2019 at 8:12 AM Joel Nothman > wrote: > >> I am +1 for this change. >> >> I agree that users will accommodate the syntax sooner or later. >> >> On Fri., 13 Sep. 2019, 7:54 pm Jeremie du Boisberranger, < >> jeremie.du-boisberranger at inria.fr> wrote: >> >>> I don't know what is the policy about a sklearn 1.0 w.r.t api changes. >>> >>> If it's meant to be a special release with possible api changes without >>> deprecation cycles, I think this change is a good candidate for 1.0 >>> >>> >>> Otherwise I'm +1 and agree with Guillaume, people will get used to it by >>> using it. 
>>> >>> J?r?mie >>> >>> >>> >>> On 12/09/2019 10:06, Guillaume Lema?tre wrote: >>> >>> To the question: do we want to utilise Python 3's force-keyword-argument >>> syntax >>> and to change existing APIs which support arguments positionally to use >>> this >>> syntax, via a deprecation period? >>> >>> I am +1. >>> >>> IMO, even if the syntax might be unknown, it will remain unknown until >>> projects >>> from the ecosystem are not using it. >>> >>> To the question: which methods should be impacted? >>> >>> I think we should be as gentle as possible at first. I am a little >>> concerned about >>> breaking some codes which were working fine before. >>> >>> On Thu, 12 Sep 2019 at 04:43, Joel Nothman >>> wrote: >>> >>>> These there details of specific API changes to be decided: >>>> >>>> The question being put, as per the SLEP, is: >>>> do we want to utilise Python 3's force-keyword-argument syntax >>>> and to change existing APIs which support arguments positionally to use >>>> this syntax, via a deprecation period? 
>>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>> >>> >>> -- >>> Guillaume Lemaitre >>> INRIA Saclay - Parietal team >>> Center for Data Science Paris-Saclay >>> https://glemaitre.github.io/ >>> >>> _______________________________________________ >>> scikit-learn mailing listscikit-learn at python.orghttps://mail.python.org/mailman/listinfo/scikit-learn >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rth.yurchak at gmail.com Mon Sep 16 06:04:25 2019 From: rth.yurchak at gmail.com (Roman Yurchak) Date: Mon, 16 Sep 2019 12:04:25 +0200 Subject: [scikit-learn] Vote on SLEP009: keyword only arguments In-Reply-To: References: <20190912003716.738E42D0009B@webmail.sinamail.sina.com.cn> <29eccb39-b4f8-bd21-d6c8-005b9c2b087a@inria.fr> Message-ID: <80a7e620-dd10-bc09-22a6-9011d455648b@gmail.com> +1 assuming we are careful about continuing to allow some frequently used positional arguments, even in __init__. For instance, n_components = 10 pca = PCA(n_components) is still more readable, I think, than, pca = PCA(n_components=n_components) -- Roman On 15/09/2019 00:21, Thomas J Fan wrote: > +1 from me > > On Sat, Sep 14, 2019 at 8:12 AM Joel Nothman > wrote: > > I am +1 for this change. > > I agree that users will accommodate the syntax sooner or later. > > On Fri., 13 Sep. 
2019, 7:54 pm Jeremie du Boisberranger, > > wrote: > > I don't know what is the policy about a sklearn 1.0 w.r.t api > changes. > > If it's meant to be a special release with possible api changes > without deprecation cycles, I think this change is a good > candidate for 1.0 > > > Otherwise I'm +1 and agree with Guillaume, people will get used > to it by using it. > > J?r?mie > > > > On 12/09/2019 10:06, Guillaume Lema?tre wrote: >> To the question: do we want to?utilise Python 3's >> force-keyword-argument syntax >> and to change existing APIs which support arguments >> positionally to use this >> syntax, via a deprecation period? >> >> I am +1. >> >> IMO, even if the syntax might be unknown, it will remain >> unknown until projects >> from the ecosystem are not using it. >> >> To the question: which methods should be impacted? >> >> I think we should be as gentle as possible at first. I am a >> little concerned about >> breaking some codes which were working fine before. >> >> On Thu, 12 Sep 2019 at 04:43, Joel Nothman >> > wrote: >> >> These there details of specific API changes to be decided: >> >> The question being put, as per the SLEP, is: >> do we want to?utilise Python 3's force-keyword-argument syntax >> and to change existing APIs which support arguments >> positionally to use this syntax, via a deprecation period? 
>> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> >> >> -- >> Guillaume Lemaitre >> INRIA Saclay - Parietal team >> Center for Data Science Paris-Saclay >> https://glemaitre.github.io/ >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > From joel.nothman at gmail.com Mon Sep 16 09:28:57 2019 From: joel.nothman at gmail.com (Joel Nothman) Date: Mon, 16 Sep 2019 23:28:57 +1000 Subject: [scikit-learn] Vote on SLEP009: keyword only arguments In-Reply-To: <80a7e620-dd10-bc09-22a6-9011d455648b@gmail.com> References: <20190912003716.738E42D0009B@webmail.sinamail.sina.com.cn> <29eccb39-b4f8-bd21-d6c8-005b9c2b087a@inria.fr> <80a7e620-dd10-bc09-22a6-9011d455648b@gmail.com> Message-ID: Btw, consensus is defined by 2/3 of cast votes by core devs, according to our Governance. https://scikit-learn.org/dev/about.html#authors lists 20 core devs. That is, we could consider this resolved after 14 votes in favour. So far, if I've interpreted correctly: +1 (adrin, nicolas, hanmin, joel, guillaume, jeremie, thomas, vlad, roman) = 9. I've not understood a clear position from Alex. I'm assuming Andreas is in favour given his comments elsewhere, but we've not seen comment here. 
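[Editor's note for readers unfamiliar with the syntax under vote: a bare `*` in a function signature makes every parameter after it keyword-only. A minimal sketch — the class and parameter names are made up for illustration, not sklearn's actual plan:]

```python
# Everything after the bare `*` must be passed by keyword.
class Model:
    def fit(self, X, y, *, sample_weight=None):
        self.fitted_ = True
        return self

m = Model()
m.fit([[0], [1]], [0, 1], sample_weight=[1.0, 1.0])  # fine

try:
    m.fit([[0], [1]], [0, 1], [1.0, 1.0])  # positional -> TypeError
except TypeError as exc:
    print("rejected:", exc)
```

This is why the SLEP needs a deprecation period: code that currently passes such arguments positionally would start raising TypeError once the `*` is introduced.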
On Mon, 16 Sep 2019 at 20:06, Roman Yurchak wrote: > +1 assuming we are careful about continuing to allow some frequently > used positional arguments, even in __init__. > > For instance, > > n_components = 10 > pca = PCA(n_components) > > is still more readable, I think, than, > > pca = PCA(n_components=n_components) > > > -- > Roman > > On 15/09/2019 00:21, Thomas J Fan wrote: > > +1 from me > > > > On Sat, Sep 14, 2019 at 8:12 AM Joel Nothman > > wrote: > > > > I am +1 for this change. > > > > I agree that users will accommodate the syntax sooner or later. > > > > On Fri., 13 Sep. 2019, 7:54 pm Jeremie du Boisberranger, > > > > wrote: > > > > I don't know what is the policy about a sklearn 1.0 w.r.t api > > changes. > > > > If it's meant to be a special release with possible api changes > > without deprecation cycles, I think this change is a good > > candidate for 1.0 > > > > > > Otherwise I'm +1 and agree with Guillaume, people will get used > > to it by using it. > > > > J?r?mie > > > > > > > > On 12/09/2019 10:06, Guillaume Lema?tre wrote: > >> To the question: do we want to utilise Python 3's > >> force-keyword-argument syntax > >> and to change existing APIs which support arguments > >> positionally to use this > >> syntax, via a deprecation period? > >> > >> I am +1. > >> > >> IMO, even if the syntax might be unknown, it will remain > >> unknown until projects > >> from the ecosystem are not using it. > >> > >> To the question: which methods should be impacted? > >> > >> I think we should be as gentle as possible at first. I am a > >> little concerned about > >> breaking some codes which were working fine before. 
> >> > >> On Thu, 12 Sep 2019 at 04:43, Joel Nothman > >> > wrote: > >> > >> These there details of specific API changes to be decided: > >> > >> The question being put, as per the SLEP, is: > >> do we want to utilise Python 3's force-keyword-argument > syntax > >> and to change existing APIs which support arguments > >> positionally to use this syntax, via a deprecation period? > >> _______________________________________________ > >> scikit-learn mailing list > >> scikit-learn at python.org > >> https://mail.python.org/mailman/listinfo/scikit-learn > >> > >> > >> > >> -- > >> Guillaume Lemaitre > >> INRIA Saclay - Parietal team > >> Center for Data Science Paris-Saclay > >> https://glemaitre.github.io/ > >> > >> _______________________________________________ > >> scikit-learn mailing list > >> scikit-learn at python.org > >> https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... 
URL:
From bertrand.thirion at inria.fr  Mon Sep 16 12:58:42 2019
From: bertrand.thirion at inria.fr (Bertrand Thirion)
Date: Mon, 16 Sep 2019 18:58:42 +0200 (CEST)
Subject: [scikit-learn] Vote on SLEP009: keyword only arguments
In-Reply-To: References: <20190912003716.738E42D0009B@webmail.sinamail.sina.com.cn> <29eccb39-b4f8-bd21-d6c8-005b9c2b087a@inria.fr> <80a7e620-dd10-bc09-22a6-9011d455648b@gmail.com>
Message-ID: <562566079.46061430.1568653122836.JavaMail.zimbra@inria.fr>

+1 for the generalization of kw arguments. This is obviously relevant for __init__ methods, while good old (X, y) should remain positional.

Best,
Bertrand

> From: "Joel Nothman"
> To: "Scikit-learn mailing list"
> Sent: Monday, 16 September 2019 15:28:57
> Subject: Re: [scikit-learn] Vote on SLEP009: keyword only arguments

> Btw, consensus is defined by 2/3 of cast votes by core devs, according to our
> Governance. https://scikit-learn.org/dev/about.html#authors lists 20 core devs.
> That is, we could consider this resolved after 14 votes in favour.
> So far, if I've interpreted correctly:
> +1 (adrin, nicolas, hanmin, joel, guillaume, jeremie, thomas, vlad, roman) = 9.
> I've not understood a clear position from Alex. I'm assuming Andreas is in
> favour given his comments elsewhere, but we've not seen comment here.

> On Mon, 16 Sep 2019 at 20:06, Roman Yurchak <rth.yurchak at gmail.com> wrote:
>> +1 assuming we are careful about continuing to allow some frequently
>> used positional arguments, even in __init__.
>> For instance,
>> n_components = 10
>> pca = PCA(n_components)
>> is still more readable, I think, than,
>> pca = PCA(n_components=n_components)
>> --
>> Roman
>> On 15/09/2019 00:21, Thomas J Fan wrote:
>> > +1 from me
>> > On Sat, Sep 14, 2019 at 8:12 AM Joel Nothman <joel.nothman at gmail.com> wrote:
>> > I am +1 for this change.
>> > I agree that users will accommodate the syntax sooner or later. >> > On Fri., 13 Sep. 2019, 7:54 pm Jeremie du Boisberranger, >>> < [ mailto:jeremie.du-boisberranger at inria.fr | jeremie.du-boisberranger at inria.fr >> > ] >>> > > jeremie.du-boisberranger at inria.fr ] >> wrote: >> > I don't know what is the policy about a sklearn 1.0 w.r.t api >> > changes. >> > If it's meant to be a special release with possible api changes >> > without deprecation cycles, I think this change is a good >> > candidate for 1.0 >> > Otherwise I'm +1 and agree with Guillaume, people will get used >> > to it by using it. >> > J?r?mie >> > On 12/09/2019 10:06, Guillaume Lema?tre wrote: >> >> To the question: do we want to utilise Python 3's >> >> force-keyword-argument syntax >> >> and to change existing APIs which support arguments >> >> positionally to use this >> >> syntax, via a deprecation period? >> >> I am +1. >> >> IMO, even if the syntax might be unknown, it will remain >> >> unknown until projects >> >> from the ecosystem are not using it. >> >> To the question: which methods should be impacted? >> >> I think we should be as gentle as possible at first. I am a >> >> little concerned about >> >> breaking some codes which were working fine before. >> >> On Thu, 12 Sep 2019 at 04:43, Joel Nothman >>>> < [ mailto:joel.nothman at gmail.com | joel.nothman at gmail.com ] > >> mailto:joel.nothman at gmail.com | joel.nothman at gmail.com ] >> wrote: >> >> These there details of specific API changes to be decided: >> >> The question being put, as per the SLEP, is: >> >> do we want to utilise Python 3's force-keyword-argument syntax >> >> and to change existing APIs which support arguments >> >> positionally to use this syntax, via a deprecation period? 
>> >> _______________________________________________ >> >> scikit-learn mailing list >>>> [ mailto:scikit-learn at python.org | scikit-learn at python.org ] > >> mailto:scikit-learn at python.org | scikit-learn at python.org ] > >>>> [ https://mail.python.org/mailman/listinfo/scikit-learn | >> >> https://mail.python.org/mailman/listinfo/scikit-learn ] >> >> -- >> >> Guillaume Lemaitre >> >> INRIA Saclay - Parietal team >> >> Center for Data Science Paris-Saclay >> >> [ https://glemaitre.github.io/ | https://glemaitre.github.io/ ] >> >> _______________________________________________ >> >> scikit-learn mailing list >>>> [ mailto:scikit-learn at python.org | scikit-learn at python.org ] > >> mailto:scikit-learn at python.org | scikit-learn at python.org ] > >>>> [ https://mail.python.org/mailman/listinfo/scikit-learn | >> >> https://mail.python.org/mailman/listinfo/scikit-learn ] >> > _______________________________________________ >> > scikit-learn mailing list >>> [ mailto:scikit-learn at python.org | scikit-learn at python.org ] > > mailto:scikit-learn at python.org | scikit-learn at python.org ] > >>> [ https://mail.python.org/mailman/listinfo/scikit-learn | >> > https://mail.python.org/mailman/listinfo/scikit-learn ] >> > _______________________________________________ >> > scikit-learn mailing list >>> [ mailto:scikit-learn at python.org | scikit-learn at python.org ] > > mailto:scikit-learn at python.org | scikit-learn at python.org ] > >>> [ https://mail.python.org/mailman/listinfo/scikit-learn | >> > https://mail.python.org/mailman/listinfo/scikit-learn ] >> > _______________________________________________ >> > scikit-learn mailing list >> > [ mailto:scikit-learn at python.org | scikit-learn at python.org ] >>> [ https://mail.python.org/mailman/listinfo/scikit-learn | >> > https://mail.python.org/mailman/listinfo/scikit-learn ] >> _______________________________________________ >> scikit-learn mailing list >> [ mailto:scikit-learn at python.org | 
scikit-learn at python.org ]
>> https://mail.python.org/mailman/listinfo/scikit-learn
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From tom.duprelatour at orange.fr  Mon Sep 16 14:00:58 2019
From: tom.duprelatour at orange.fr (Tom DLT)
Date: Mon, 16 Sep 2019 11:00:58 -0700
Subject: [scikit-learn] Vote on SLEP009: keyword only arguments
In-Reply-To: References: <20190912003716.738E42D0009B@webmail.sinamail.sina.com.cn> <29eccb39-b4f8-bd21-d6c8-005b9c2b087a@inria.fr> <80a7e620-dd10-bc09-22a6-9011d455648b@gmail.com>
Message-ID:

I vote +1

Tom

On Mon, Sep 16, 2019 at 06:30, Joel Nothman wrote:
> Btw, consensus is defined by 2/3 of cast votes by core devs, according to
> our Governance. https://scikit-learn.org/dev/about.html#authors lists 20
> core devs.
>
> That is, we could consider this resolved after 14 votes in favour.
>
> So far, if I've interpreted correctly:
>
> +1 (adrin, nicolas, hanmin, joel, guillaume, jeremie, thomas, vlad, roman)
> = 9.
>
> I've not understood a clear position from Alex. I'm assuming Andreas is in
> favour given his comments elsewhere, but we've not seen comment here.
>
> On Mon, 16 Sep 2019 at 20:06, Roman Yurchak wrote:
>
>> +1 assuming we are careful about continuing to allow some frequently
>> used positional arguments, even in __init__.
>>
>> For instance,
>>
>> n_components = 10
>> pca = PCA(n_components)
>>
>> is still more readable, I think, than,
>>
>> pca = PCA(n_components=n_components)
>>
>> --
>> Roman
>>
>> On 15/09/2019 00:21, Thomas J Fan wrote:
>> > +1 from me
>> >
>> > On Sat, Sep 14, 2019 at 8:12 AM Joel Nothman wrote:
>> >
>> > I am +1 for this change.
>> >
>> > I agree that users will accommodate the syntax sooner or later.
>> > >> > On Fri., 13 Sep. 2019, 7:54 pm Jeremie du Boisberranger, >> > > > > wrote: >> > >> > I don't know what is the policy about a sklearn 1.0 w.r.t api >> > changes. >> > >> > If it's meant to be a special release with possible api changes >> > without deprecation cycles, I think this change is a good >> > candidate for 1.0 >> > >> > >> > Otherwise I'm +1 and agree with Guillaume, people will get used >> > to it by using it. >> > >> > J?r?mie >> > >> > >> > >> > On 12/09/2019 10:06, Guillaume Lema?tre wrote: >> >> To the question: do we want to utilise Python 3's >> >> force-keyword-argument syntax >> >> and to change existing APIs which support arguments >> >> positionally to use this >> >> syntax, via a deprecation period? >> >> >> >> I am +1. >> >> >> >> IMO, even if the syntax might be unknown, it will remain >> >> unknown until projects >> >> from the ecosystem are not using it. >> >> >> >> To the question: which methods should be impacted? >> >> >> >> I think we should be as gentle as possible at first. I am a >> >> little concerned about >> >> breaking some codes which were working fine before. >> >> >> >> On Thu, 12 Sep 2019 at 04:43, Joel Nothman >> >> > >> wrote: >> >> >> >> These there details of specific API changes to be decided: >> >> >> >> The question being put, as per the SLEP, is: >> >> do we want to utilise Python 3's force-keyword-argument >> syntax >> >> and to change existing APIs which support arguments >> >> positionally to use this syntax, via a deprecation period? 
>> >> _______________________________________________ >> >> scikit-learn mailing list >> >> scikit-learn at python.org >> >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> >> >> >> >> >> >> -- >> >> Guillaume Lemaitre >> >> INRIA Saclay - Parietal team >> >> Center for Data Science Paris-Saclay >> >> https://glemaitre.github.io/ >> >> >> >> _______________________________________________ >> >> scikit-learn mailing list >> >> scikit-learn at python.org >> >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ >> > scikit-learn mailing list >> > scikit-learn at python.org >> > https://mail.python.org/mailman/listinfo/scikit-learn >> > >> > _______________________________________________ >> > scikit-learn mailing list >> > scikit-learn at python.org >> > https://mail.python.org/mailman/listinfo/scikit-learn >> > >> > >> > _______________________________________________ >> > scikit-learn mailing list >> > scikit-learn at python.org >> > https://mail.python.org/mailman/listinfo/scikit-learn >> > >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... 
URL:
From gael.varoquaux at normalesup.org  Mon Sep 16 15:32:37 2019
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Mon, 16 Sep 2019 15:32:37 -0400
Subject: [scikit-learn] Vote on SLEP009: keyword only arguments
In-Reply-To: References: <20190912003716.738E42D0009B@webmail.sinamail.sina.com.cn> <29eccb39-b4f8-bd21-d6c8-005b9c2b087a@inria.fr> <80a7e620-dd10-bc09-22a6-9011d455648b@gmail.com>
Message-ID: <20190916193237.7qrx5med5ijqscsp@phare.normalesup.org>

On Mon, Sep 16, 2019 at 11:28:57PM +1000, Joel Nothman wrote:
> That is, we could consider this resolved after 14 votes in favour.
> So far, if I've interpreted correctly:
> +1 (adrin, nicolas, hanmin, joel, guillaume, jeremie, thomas, vlad, roman) = 9.
> I've not understood a clear position from Alex. I'm assuming Andreas is in
> favour given his comments elsewhere, but we've not seen comment here.

I was planning to vote -0, mostly to avoid the vote seeming like a bandwagon (and because I am not fully sold on the idea), but I actually want this to move forward, and it seems that my vote is needed.

Hence, I vote +1.

Hopefully Andreas and Alex make their position clear and we can adopt the SLEP.

Thank you to you all.

Gaël

> On Mon, 16 Sep 2019 at 20:06, Roman Yurchak wrote:
> +1 assuming we are careful about continuing to allow some frequently
> used positional arguments, even in __init__.
> For instance, > n_components = 10 > pca = PCA(n_components) > is still more readable, I think, than, > pca = PCA(n_components=n_components) -- Gael Varoquaux Research Director, INRIA http://gael-varoquaux.info http://twitter.com/GaelVaroquaux From albertthomas88 at gmail.com Mon Sep 16 16:22:48 2019 From: albertthomas88 at gmail.com (Albert Thomas) Date: Mon, 16 Sep 2019 22:22:48 +0200 Subject: [scikit-learn] Vote on SLEP009: keyword only arguments In-Reply-To: <20190916193237.7qrx5med5ijqscsp@phare.normalesup.org> References: <20190912003716.738E42D0009B@webmail.sinamail.sina.com.cn> <29eccb39-b4f8-bd21-d6c8-005b9c2b087a@inria.fr> <80a7e620-dd10-bc09-22a6-9011d455648b@gmail.com> <20190916193237.7qrx5med5ijqscsp@phare.normalesup.org> Message-ID: Hi all, Just a few comments about this SLEP from a contributor and user of the library :). I think it is important for users to be able to quickly and easily know/learn which arguments should be keyword arguments when they use scikit-learn. As a user, I do not want to have to double check each time I use a function the arguments that should be keyword arguments. Hence the following sentence of the SLEP "the decision for these methods should be the same throughout the library in order to keep a consistent interface to the user" is very important to me. Also how is this going to be rendered by sphinx in the doc? (before numpydoc supports section for parameters) Thanks, Albert On Mon, Sep 16, 2019 at 9:33 PM Gael Varoquaux < gael.varoquaux at normalesup.org> wrote: > On Mon, Sep 16, 2019 at 11:28:57PM +1000, Joel Nothman wrote: > > That is, we could consider this resolved after 14 votes in favour. > > > So far, if I've interpreted correctly: > > > +1 (adrin, nicolas, hanmin, joel, guillaume, jeremie, thomas, vlad, > roman) = 9. > > > I've not understood a clear position from Alex. I'm assuming Andreas is > in > > favour given his comments elsewhere, but we've not seen comment here. 
> > I was planning to vote -0 mostly to avoid the vote to seem like bandwagon > (and because I am not fully sold on the idea), but I actually want this > to move forward, and it seems that my vote is needed. > > Hence, I vote +1. > > Hopefully Andreas and Alex make their position clear and we can adopt the > SLEP. > > Thank you to you all. > > Ga?l > > > On Mon, 16 Sep 2019 at 20:06, Roman Yurchak > wrote: > > > +1 assuming we are careful about continuing to allow some frequently > > used positional arguments, even in __init__. > > > For instance, > > > n_components = 10 > > pca = PCA(n_components) > > > is still more readable, I think, than, > > > pca = PCA(n_components=n_components) > -- > Gael Varoquaux > Research Director, INRIA > http://gael-varoquaux.info http://twitter.com/GaelVaroquaux > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexandre.gramfort at inria.fr Tue Sep 17 02:09:40 2019 From: alexandre.gramfort at inria.fr (Alexandre Gramfort) Date: Tue, 17 Sep 2019 08:09:40 +0200 Subject: [scikit-learn] Vote on SLEP009: keyword only arguments In-Reply-To: References: <20190912003716.738E42D0009B@webmail.sinamail.sina.com.cn> <29eccb39-b4f8-bd21-d6c8-005b9c2b087a@inria.fr> <80a7e620-dd10-bc09-22a6-9011d455648b@gmail.com> <20190916193237.7qrx5med5ijqscsp@phare.normalesup.org> Message-ID: Yes I am +1 for positional arguments for the __init__ of the estimators. Alex On Mon, Sep 16, 2019 at 10:25 PM Albert Thomas wrote: > Hi all, > > Just a few comments about this SLEP from a contributor and user of the > library :). > > I think it is important for users to be able to quickly and easily > know/learn which arguments should be keyword arguments when they use > scikit-learn. 
As a user, I do not want to have to double check each time I > use a function the arguments that should be keyword arguments. Hence the > following sentence of the SLEP "the decision for these methods should be > the same throughout the library in order to keep a consistent interface to > the user" is very important to me. Also how is this going to be > rendered by sphinx in the doc? (before numpydoc supports section for > parameters) > > Thanks, > Albert > > > On Mon, Sep 16, 2019 at 9:33 PM Gael Varoquaux < > gael.varoquaux at normalesup.org> wrote: > >> On Mon, Sep 16, 2019 at 11:28:57PM +1000, Joel Nothman wrote: >> > That is, we could consider this resolved after 14 votes in favour. >> >> > So far, if I've interpreted correctly: >> >> > +1 (adrin, nicolas, hanmin, joel, guillaume, jeremie, thomas, vlad, >> roman) = 9. >> >> > I've not understood a clear position from Alex. I'm assuming Andreas is >> in >> > favour given his comments elsewhere, but we've not seen comment here. >> >> I was planning to vote -0 mostly to avoid the vote to seem like bandwagon >> (and because I am not fully sold on the idea), but I actually want this >> to move forward, and it seems that my vote is needed. >> >> Hence, I vote +1. >> >> Hopefully Andreas and Alex make their position clear and we can adopt the >> SLEP. >> >> Thank you to you all. >> >> Ga?l >> >> > On Mon, 16 Sep 2019 at 20:06, Roman Yurchak >> wrote: >> >> > +1 assuming we are careful about continuing to allow some frequently >> > used positional arguments, even in __init__. 
>> >> > For instance, >> >> > n_components = 10 >> > pca = PCA(n_components) >> >> > is still more readable, I think, than, >> >> > pca = PCA(n_components=n_components) >> -- >> Gael Varoquaux >> Research Director, INRIA >> http://gael-varoquaux.info >> http://twitter.com/GaelVaroquaux >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From joel.nothman at gmail.com Tue Sep 17 03:42:56 2019 From: joel.nothman at gmail.com (Joel Nothman) Date: Tue, 17 Sep 2019 17:42:56 +1000 Subject: [scikit-learn] Vote on SLEP009: keyword only arguments In-Reply-To: References: <20190912003716.738E42D0009B@webmail.sinamail.sina.com.cn> <29eccb39-b4f8-bd21-d6c8-005b9c2b087a@inria.fr> <80a7e620-dd10-bc09-22a6-9011d455648b@gmail.com> <20190916193237.7qrx5med5ijqscsp@phare.normalesup.org> Message-ID: I think you mean keyword-only, Alex On Tue., 17 Sep. 2019, 4:11 pm Alexandre Gramfort, < alexandre.gramfort at inria.fr> wrote: > Yes I am +1 for positional arguments for the __init__ of the estimators. > > Alex > Albert: my position when reviewing changes in accordance with this SLEP would be to (a) perhaps get usage evidence as discussed in the SLEP pull request review; and (b) apply a rule of thumb like "are the semantics reasonably clear when the argument is passed positionally?" I think they are clear for PCA's components, for Pipeline's steps, and for GridSearchCV's estimator and parameter grid. Other parameters of those estimators seem more suitable for keyword-only. Trickier is whether n_components in TSNE should follow PCA in being positional... It's not as commonly set by users. 
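For readers following along, the Python 3 syntax being voted on can be sketched as below. The class and parameter defaults here are illustrative only, not scikit-learn's actual PCA signature: everything after the bare `*` in the `def` can only be passed by keyword.

```python
# Minimal sketch of Python 3's keyword-only syntax (PEP 3102).
# Parameters after the bare * must be passed by keyword; n_components
# stays positional because its meaning is clear without a keyword.
class PCA:
    def __init__(self, n_components=None, *, copy=True, whiten=False):
        self.n_components = n_components
        self.copy = copy
        self.whiten = whiten

pca = PCA(10)              # common, unambiguous: stays positional
pca = PCA(10, copy=False)  # keyword-only parameters need the keyword
try:
    PCA(10, False)         # positional use of `copy` is rejected
except TypeError as err:
    print("TypeError:", err)
```

With this syntax the interpreter itself enforces the convention, so there is no ambiguity for users about which call forms are allowed.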
> -------------- next part --------------
An HTML attachment was scrubbed...
URL:
From joel.nothman at gmail.com  Tue Sep 17 19:28:30 2019
From: joel.nothman at gmail.com (Joel Nothman)
Date: Wed, 18 Sep 2019 09:28:30 +1000
Subject: [scikit-learn] Vote on SLEP009: keyword only arguments
In-Reply-To: References: <20190912003716.738E42D0009B@webmail.sinamail.sina.com.cn> <29eccb39-b4f8-bd21-d6c8-005b9c2b087a@inria.fr> <80a7e620-dd10-bc09-22a6-9011d455648b@gmail.com> <20190916193237.7qrx5med5ijqscsp@phare.normalesup.org>
Message-ID:

If we were to assume Andy's vote in the positive, him having been a major proponent of this change, we would say this was accepted by a unanimous vote of a majority of core developers.

Having tentatively accepted it is a good enough basis for us to start implementation, ideally with usage statistics to guide that.

We should tackle this module by module, perhaps working through estimators before other public API.

As such, I have opened https://github.com/scikit-learn/scikit-learn/issues/15005 to start tracking this work.

Thanks everyone, and Andy, we await your vote!

J
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From niourf at gmail.com  Wed Sep 18 10:29:20 2019
From: niourf at gmail.com (Nicolas Hug)
Date: Wed, 18 Sep 2019 10:29:20 -0400
Subject: [scikit-learn] Monthly meetings between core developers + "Hello World"
In-Reply-To: <136faf1a-5514-1c21-7514-0673b4ddde81@gmail.com>
References: <20190718100451.B608918C0090@webmail.sinamail.sina.com.cn> <08716118-a3a8-0131-aeca-f97a8aba3f25@gmail.com> <60f8ad16-3e13-765a-4c4a-6a80f7a4d998@gmail.com> <1e489f79-ebb5-b394-c99c-ed71bce1e607@gmail.com> <92ce29e5-4a54-9545-1d51-79bda3713c25@gmail.com> <136faf1a-5514-1c21-7514-0673b4ddde81@gmail.com>
Message-ID: <890e938c-71a1-df9d-3f26-a331e5a0244c@gmail.com>

Hi everyone,

Reminder that the next monthly meeting is on Monday!
Please update your project notes *before Friday* so we don't have extra work on the weekend :)

https://github.com/scikit-learn/scikit-learn/projects/15

https://appear.in/amueller

https://www.timeanddate.com/worldclock/meetingdetails.html?year=2019&month=9&day=23&hour=13&min=0&sec=0&p1=240&p2=33&p3=37&p4=179

Cheers,
Nicolas
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From t3kcit at gmail.com  Wed Sep 18 11:11:40 2019
From: t3kcit at gmail.com (Andreas Mueller)
Date: Wed, 18 Sep 2019 11:11:40 -0400
Subject: [scikit-learn] Can Scikit-learn decision tree (CART) have both continuous and categorical features?
In-Reply-To: References: Message-ID:

On 9/15/19 8:16 AM, Guillaume Lemaître wrote:
>
> On Sat, 14 Sep 2019 at 20:59, C W wrote:
>
>     Thanks, Guillaume.
>     Column transformer looks pretty neat. I've also heard, though, that this
>     pipeline can be tedious to set up? Specifying what you want for
>     every feature is a pain.
>
> It would be interesting for us to know which part of the pipeline is tedious
> to set up, so we can improve something there.
> Do you mean that you would like to automatically detect which type of
> feature (categorical/numerical) it is and apply a default
> encoder/scaling, such as discussed there:
> https://github.com/scikit-learn/scikit-learn/issues/10603#issuecomment-401155127
>
> IMO, from a user perspective, it would be cleaner in some cases, at the
> cost of blindly applying a black box, which might be dangerous.

Also see
https://amueller.github.io/dabl/dev/generated/dabl.EasyPreprocessor.html#dabl.EasyPreprocessor
which basically does that.

>     Javier,
>     Actually, you guessed right. My real data has only one numerical
>     variable, looks more like this:
>
>     Gender  Date       Income  Car     Attendance
>     Male    2019/3/01  10000   BMW     Yes
>     Female  2019/5/02   9000   Toyota  No
>     Male    2019/7/15  12000   Audi    Yes
>
>     I am predicting income using all other categorical variables.
> Maybe it is catboost!
>
> Thanks,
>
> M
>
> On Sat, Sep 14, 2019 at 9:25 AM Javier López wrote:
>
>     If you have datasets with many categorical features, and
>     perhaps many categories, the tools in sklearn are quite limited,
>     but there are alternative implementations of boosted trees
>     that are designed with categorical features in mind. Take a look
>     at catboost [1], which has an sklearn-compatible API.
>
>     J
>
>     [1] https://catboost.ai/
>
>     On Sat, Sep 14, 2019 at 3:40 AM C W wrote:
>
>         Hello all,
>         I'm very confused. Can the decision tree module handle
>         both continuous and categorical features in the dataset?
>         In this case, it's just CART (Classification and
>         Regression Trees).
>
>         For example,
>         Gender  Age  Income  Car     Attendance
>         Male    30   10000   BMW     Yes
>         Female  35    9000   Toyota  No
>         Male    50   12000   Audi    Yes
>
>         According to the documentation
>         https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart,
>         it can not!
>
>         It says: "scikit-learn implementation does not support
>         categorical variables for now".
>
>         Is this true? If not, can someone point me to an example?
>         If yes, what do people do?
>
>         Thank you very much!
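As a concrete sketch of the ColumnTransformer approach suggested above for a mixed table like this one (the column names come from the example; the estimator choices are one plausible setup, not a prescription):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Toy frame mirroring the example table above.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "Car": ["BMW", "Toyota", "Audi"],
    "Income": [10000, 9000, 12000],
})

# One-hot encode the categorical columns; pass Income through unchanged.
pre = ColumnTransformer(
    [("cat", OneHotEncoder(), ["Gender", "Car"])],
    remainder="passthrough",
)
X = pre.fit_transform(df)
print(X.shape)  # (3, 6): 2 Gender + 3 Car one-hot columns + Income
```

Such a preprocessor is typically chained with a model in a Pipeline, so the encoding is fitted only on training data.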
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
> --
> Guillaume Lemaitre
> INRIA Saclay - Parietal team
> Center for Data Science Paris-Saclay
> https://glemaitre.github.io/
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From t3kcit at gmail.com  Wed Sep 18 11:16:44 2019
From: t3kcit at gmail.com (Andreas Mueller)
Date: Wed, 18 Sep 2019 11:16:44 -0400
Subject: [scikit-learn] MultiBinarizer issue
In-Reply-To: References: Message-ID: <2e2dc156-f2bf-335a-984f-7f6622fe7d5b@gmail.com>

Please don't repost questions. Also, you didn't create a minimal reproducible example as suggested on stackoverflow:
https://stackoverflow.com/help/minimal-reproducible-example
That process would probably have shown you where the issue is. I highly recommend doing that next time.

On 9/16/19 12:12 AM, Sayak Paul wrote:
>
> I am working on a multi-label text classification problem. In order to
> encode the labels, I am using MultiLabelBinarizer.
> The labels of the dataset look like -
>
> [cs.AI,cs.CL,cs.CV,cs.NE,stat.ML]
> [cs.CL,cs.AI,cs.LG,cs.NE,stat.ML]
> [cs.CL,cs.AI,cs.LG,cs.NE,stat.ML]
> [stat.ML,cs.AI,cs.CL,cs.LG,cs.NE]
> [cs.CL,cs.AI,cs.LG,cs.NE,stat.ML]
>
> When I am using
>
> mlb = MultiLabelBinarizer()
> mlb.fit(labels)
> print(mlb.classes_)
>
> It gives me -
>
> array([' ', ',', '.', 'A', 'B', 'C', 'D', 'E', 'G', 'H', 'I', 'L', 'M',
>        'N', 'O', 'P', 'R', 'S', 'T', 'V', 'Y', '[', ']', 'a', 'c', 'h',
>        'm', 's', 't'], dtype=object)
>
> I (partially) fixed this problem by mlb.fit([y_train]) and I got (I
> printed the first 10 classes) -
>
> array(['[cs.AI, cs.CC]', '[cs.AI, cs.CV]', '[cs.AI, cs.CY]', '[cs.AI,
> cs.DB]', '[cs.AI, cs.DS]', '[cs.AI, cs.GT]', '[cs.AI, cs.HC]', '[cs.AI,
> cs.IR]', '[cs.AI, cs.LG, stat.ML]', '[cs.AI, cs.LG]'], dtype=object)
>
> Ideally, it should output the individual classes (there may be
> something wrong in my code). When I am using
> mlb.fit_transform([y_train]), I am getting -
>
> array([[1, 1, 1, ..., 1, 1, 1]])
>
> Help would be very much appreciated.
>
> Here's the corresponding StackOverflow issue:
> https://stackoverflow.com/questions/57917936/multilabelbinarizer-gives-individual-characters-instead-of-the-classes
>
> Sayak Paul | sayak.dev
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
-------------- next part --------------
An HTML attachment was scrubbed...
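For what it's worth, single-character entries in `classes_` are the classic symptom of fitting MultiLabelBinarizer on plain strings: a string is itself an iterable of characters, so each character becomes a "label". Parsing each row into a list of tags first fixes it; the rows below are made-up stand-ins for the real data:

```python
from sklearn.preprocessing import MultiLabelBinarizer

rows = ["[cs.AI, cs.CL, cs.CV]", "[cs.CL, cs.LG]"]  # hypothetical rows
# Turn each "[a, b, c]" string into a list of tags before fitting.
labels = [row.strip("[]").split(", ") for row in rows]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)
print(list(mlb.classes_))  # ['cs.AI', 'cs.CL', 'cs.CV', 'cs.LG']
```

With whole tags as labels, each row of `Y` is an indicator vector over the tag vocabulary rather than over individual characters.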
URL:
From t3kcit at gmail.com  Wed Sep 18 11:24:55 2019
From: t3kcit at gmail.com (Andreas Mueller)
Date: Wed, 18 Sep 2019 11:24:55 -0400
Subject: [scikit-learn] Vote on SLEP009: keyword only arguments
In-Reply-To: <56cd260b-20ce-d863-61a6-f6cd6c1f4aab@gmail.com>
References: <56cd260b-20ce-d863-61a6-f6cd6c1f4aab@gmail.com>
Message-ID: <10788b6e-683d-c208-5dd0-d193f28fc23d@gmail.com>

The SLEP says:

This proposal suggests making only *most commonly* used parameters positional. The *most commonly* used parameters are defined per method or function, to be defined as either of the following two ways:

* The set defined and agreed upon by the core developers, which should cover the *easy* cases.
* A set identified as being in the top 95% of the use cases, using some automated analysis such as this one or this one.

And describes a clear deprecation path.

So that seems pretty actionable?

Also, I vote +1 on the SLEP.

Nicolas: Do you think this is not actionable? I had suggested that we define a clear rule, but doing a case-by-case seems better than bikeshedding now.

Alexandre: did you read the SLEP before asking? I thought the point of the SLEP was to summarize the discussion. If your question is not answered we should amend the SLEP.

On 9/11/19 2:21 PM, Nicolas Hug wrote:
>
> Since there is no explicit proposal in the SLEP it's not very clear
> what we need to vote for / against?
>
> But overall I'm +1 on forcing kwargs for all __init__ methods.
>
> Nicolas
>
> On 9/11/19 9:38 AM, Adrin wrote:
>> Hi,
>>
>> I'm (mostly) the messenger, don't shoot me :P
>>
>> It may help to summarize the SLEP:
>> 1. This can be applied to all methods, not just __init__.
>> 2. The SLEP doesn't say we have to apply it everywhere. It's mostly
>> that it lets us do that.
>> 3. It doesn't make ALL inputs keyword-only arguments. The common
>> ones such as X and y in fit(X, y) will stay as they are.
>> Therefore clf.fit(X, y) will definitely be allowed.
>> 4.
>> Whether or not sample_weight should be keyword only or not in fit,
>> requires its own discussion, and the route of the discussion
>> is defined in the SLEP.
>>
>> In other words, if an input parameter is used as a positional
>> argument less frequently than X% of the time, then it can/should be
>> a keyword-only argument. But the SLEP better defines these conditions.
>>
>> I hope that clarifies it a little bit.
>>
>> Adrin
>>
>> On Wed, Sep 11, 2019 at 3:23 PM Alexandre Gramfort wrote:
>>
>> hi,
>>
>> Adrin do you suggest this for everything or maybe just for __init__
>> params of estimators
>> and stuff that can come after X, y in fit eg sample_weights?
>>
>> would:
>>
>> clf.fit(X, y)
>>
>> still be allowed?
>>
>> Alex
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From t3kcit at gmail.com  Wed Sep 18 11:27:54 2019
From: t3kcit at gmail.com (Andreas Mueller)
Date: Wed, 18 Sep 2019 11:27:54 -0400
Subject: [scikit-learn] Vote on SLEP009: keyword only arguments
In-Reply-To: References: <20190912003716.738E42D0009B@webmail.sinamail.sina.com.cn> <29eccb39-b4f8-bd21-d6c8-005b9c2b087a@inria.fr> <80a7e620-dd10-bc09-22a6-9011d455648b@gmail.com> <20190916193237.7qrx5med5ijqscsp@phare.normalesup.org>
Message-ID: <31ca2161-79b9-07ef-6775-9eea4b374225@gmail.com>

Sorry, I was on vacation ;) +1 from me.
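The "deprecation period" route the SLEP mentions could look roughly like the shim below. This is entirely hypothetical — the decorator name and details are not scikit-learn's implementation — but it shows the idea: keep accepting old positional calls for a while, map them onto the keyword-only names, and emit a FutureWarning.

```python
import functools
import inspect
import warnings

def warn_on_positional(func):
    """Hypothetical shim: during a deprecation period, accept positional
    use of keyword-only parameters but warn about it."""
    sig = inspect.signature(func)
    kwonly = [p.name for p in sig.parameters.values()
              if p.kind == p.KEYWORD_ONLY]
    n_positional = sum(
        p.kind in (p.POSITIONAL_ONLY, p.POSITIONAL_OR_KEYWORD)
        for p in sig.parameters.values())

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        extra = args[n_positional:]
        if extra:
            # Map the surplus positional values onto keyword-only names.
            names = kwonly[:len(extra)]
            warnings.warn(
                "Pass {} as keyword arguments.".format(", ".join(names)),
                FutureWarning)
            kwargs.update(zip(names, extra))
            args = args[:n_positional]
        return func(*args, **kwargs)
    return wrapper

@warn_on_positional
def make_pca(n_components=None, *, copy=True):
    return (n_components, copy)

make_pca(10, False)  # works, but warns; later this becomes a TypeError
```

At the end of the deprecation period the decorator is simply removed, and the bare `*` in the signature takes over enforcement.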
On 9/17/19 7:28 PM, Joel Nothman wrote:
> If we were to assume Andy's vote in the positive, him having been a
> major proponent of this change, we would say this was accepted by a
> unanimous vote of a majority of core developers.
>
> Having tentatively accepted is good enough basis for us to start
> implementation. And ideally getting statistics to guide that.
>
> We should tackle this module by module, perhaps working through
> estimators before other public API.
>
> As such, I have opened
> https://github.com/scikit-learn/scikit-learn/issues/15005 to start
> tracking this work.
>
> Thanks everyone, and Andy, we await your vote!
>
> J
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From t3kcit at gmail.com  Wed Sep 18 11:34:18 2019
From: t3kcit at gmail.com (Andreas Mueller)
Date: Wed, 18 Sep 2019 11:34:18 -0400
Subject: [scikit-learn] Vote on SLEP009: keyword only arguments
In-Reply-To: References: <20190912003716.738E42D0009B@webmail.sinamail.sina.com.cn> <29eccb39-b4f8-bd21-d6c8-005b9c2b087a@inria.fr> <80a7e620-dd10-bc09-22a6-9011d455648b@gmail.com> <20190916193237.7qrx5med5ijqscsp@phare.normalesup.org>
Message-ID:

On 9/17/19 3:42 AM, Joel Nothman wrote:
> I think you mean keyword-only, Alex
>
> On Tue., 17 Sep. 2019, 4:11 pm Alexandre Gramfort wrote:
>
> Yes I am +1 for positional arguments for the __init__ of the
> estimators.
>
> Alex
>
> Albert: my position when reviewing changes in accordance with this
> SLEP would be to (a) perhaps get usage evidence as discussed in the
> SLEP pull request review; and (b) apply a rule of thumb like "are the
> semantics reasonably clear when the argument is passed positionally?"
> I think they are clear for PCA's n_components, for Pipeline's steps, and
> for GridSearchCV's estimator and parameter grid.
Other parameters of > those estimators seem more suitable for keyword-only. I think you're not fully addressing Albert's concern, which I think is quite important and hasn't been brought up before. I think Albert is saying that it should be easy for a new user to build a mental model of when a positional argument is allowed. If we can't specify a simple rule, then it's very hard for a new (or really any) user to have clear expectations. And I think sklearn is all about setting clear expectations. > Also how is this going to be rendered by sphinx in the doc? There will be a star in the signature between positional and kw only args i.e. PCA(n_components=2, *, copy=True, ...) So you could always look at the docs to figure it out. That's clearly not very convenient. -------------- next part -------------- An HTML attachment was scrubbed... URL: From niourf at gmail.com Wed Sep 18 11:46:21 2019 From: niourf at gmail.com (Nicolas Hug) Date: Wed, 18 Sep 2019 11:46:21 -0400 Subject: [scikit-learn] Vote on SLEP009: keyword only arguments In-Reply-To: <10788b6e-683d-c208-5dd0-d193f28fc23d@gmail.com> References: <56cd260b-20ce-d863-61a6-f6cd6c1f4aab@gmail.com> <10788b6e-683d-c208-5dd0-d193f28fc23d@gmail.com> Message-ID: I think Alex's and my concerns are legit. The SLEP is asking "are you OK with forcing some parameters to be keyword only? We still don't know which ones though". I understand why you don't want to bike shed now, but that's a surprisingly mild SLEP, hence the questions. The only response I can give to the SLEP right now is "sure, depends". Nicolas On 9/18/19 11:24 AM, Andreas Mueller wrote: > The SLEP says: > > This proposal suggests making only the most commonly used parameters > positional. The most commonly used parameters are defined per method > or function, to be defined as either of the following two ways: > > * The set defined and agreed upon by the core developers, which > should cover the easy cases.
> * A set identified as being in the top 95% of the use cases, using > some automated analysis such as this one > or this one > . > > And describes a clear deprecation path. > > So that seems pretty actionable? > > > Also, I vote +1 on the SLEP. > > Nicolas: Do you think this is not actionable? I had suggested that we > define a clear rule but doing a case-by-case seems better than > bikeshedding now. > > Alexandre: did you read the SLEP before asking? I thought the point of > the SLEP was to summarize the discussion. If your question is not > answered we should amend the SLEP. > > > > On 9/11/19 2:21 PM, Nicolas Hug wrote: >> >> Since there is no explicit proposal in the SLEP it's not very clear >> what we need to vote for / against? >> >> But overall I'm +1 on forcing kwargs for all __init__ methods. >> >> >> Nicolas >> >> >> On 9/11/19 9:38 AM, Adrin wrote: >>> Hi, >>> >>> I'm (mostly) the messenger, don't shoot me :P >>> >>> It may help to summarize the SLEP: >>> 1. This can be applied to all methods, not just __init__. >>> 2. The SLEP doesn't say we have to apply it everywhere. It's mostly >>> that it lets us do that. >>> 3. It doesn't make ALL inputs a keywords only argument. The common >>> ones such as X and y in fit(X, y) will stay as they are. >>> Therefore clf.fit(X, y) will definitely be allowed. >>> 4. Whether or not sample_weight should be keyword only or not in >>> fit, requires its own discussion, and the route of the discussion >>> is defined in the SLEP. >>> >>> In other words, if an input parameter is used as a positional >>> argument less frequently than X% of the time, then it can/should be >>> a keyword only argument. But the SLEP better defines these conditions. >>> >>> I hope that clarifies it a little bit.
>>> >>> Adrin/ >>> >>> On Wed, Sep 11, 2019 at 3:23 PM Alexandre Gramfort >>> > >>> wrote: >>> >>> hi, >>> >>> Adrin do you suggest this for everything or maybe just for __init__ >>> params of estimators >>> and stuff that can come after X, y in fit eg sample_weights? >>> >>> would: >>> >>> clf.fit(X, y) >>> >>> still be allowed? >>> >>> Alex >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From jchapman48 at gatech.edu Thu Sep 19 15:41:07 2019 From: jchapman48 at gatech.edu (Chapman, James E) Date: Thu, 19 Sep 2019 19:41:07 +0000 Subject: [scikit-learn] Porting old MLPY KRR model to scikit-learn Message-ID: Hello, I have some old KRR models from MLPY and I need to port those models over to a new code written with scikit-learn (transfer MLPY KRR data to a scikit-learn KernelRidge instance). Does anyone know if this is even possible, and if so, could you give me some suggestions as to how to accomplish it? Thanks and regards, James -------------- next part -------------- An HTML attachment was scrubbed... 
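[Editor's note: one way to attempt the port James asks about — discussed in the replies that follow — is to fit a scikit-learn KernelRidge on the original training inputs and then overwrite its learned dual coefficients. This is an untested sketch; `mlpy_alpha` is a placeholder for whatever alphas the old MLPY model stored.]

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

# Fit KernelRidge on the ORIGINAL training inputs so that at predict time
# the kernel is evaluated against the right support points (X_fit_), then
# transplant the old model's coefficients into dual_coef_.
rng = np.random.RandomState(0)
X_train = rng.randn(20, 3)          # stand-in for the original training data
y_train = rng.randn(20)

est = KernelRidge(kernel="rbf", gamma=0.1)
est.fit(X_train, y_train)           # populates X_fit_ and dual_coef_

mlpy_alpha = rng.randn(20)          # placeholder: the old model's alphas
est.dual_coef_ = mlpy_alpha         # transplant the coefficients

pred = est.predict(X_train[:5])     # predictions now use the transplanted alphas
# Note: KernelRidge has no intercept; if the old model also stored an
# additive bias term, it would have to be added to `pred` by hand.
```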
URL: From michael.eickenberg at gmail.com Thu Sep 19 15:51:23 2019 From: michael.eickenberg at gmail.com (Michael Eickenberg) Date: Thu, 19 Sep 2019 12:51:23 -0700 Subject: [scikit-learn] Porting old MLPY KRR model to scikit-learn In-Reply-To: References: Message-ID: What exactly do you mean by "port"? Put already fitted models into a sklearn estimator object? You can do this as follows: You should be able to create an `estimator = sklearn.kernel_ridge.KernelRidge(...)` object, call `fit` on some random data of the appropriate shape, and then set `estimator.dual_coef_` to the ones from your MLPY model (the sklearn version sets them here: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/kernel_ridge.py#L165). If this is not what you mean, then maybe you just want to refit them using the appropriate KernelRidge kernel? Hope this helps! Michael On Thu, Sep 19, 2019 at 12:43 PM Chapman, James E wrote: > Hello, > > I have some old KRR models from MLPY and I need to port those models over > to a new code written with scikit-learn (transfer MLPY KRR data to a > scikit-learn KernelRidge instance). Does anyone know if this is even > possible, and if so, could you give me some suggestions as to how to > accomplish it? > > > > Thanks and regards, > > James > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jchapman48 at gatech.edu Thu Sep 19 21:20:39 2019 From: jchapman48 at gatech.edu (Chapman, James E) Date: Fri, 20 Sep 2019 01:20:39 +0000 Subject: [scikit-learn] Porting old MLPY KRR model to scikit-learn In-Reply-To: References: Message-ID: <246ABEC6-7933-4A8E-9337-CE0AFF73C95F@gatech.edu> Hello, Thank you for your comments. I had actually initially tried your first suggestion, but the predicted values just wouldn't line up between the two models.
As I dug into the source code of the two, I realized that they don't appear to be the same. MLPY adds a bias term to both the training and prediction process, whereas, correct me if I'm wrong, scikit-learn does not. This results in two fundamentally different sets of codes: MLPY (prediction): np.dot(self._alpha, Kt_arr.T) + self._b Scikit-learn (prediction): np.dot(K, self.dual_coef_) Here, MLPY's alphas correspond to scikit-learn's dual_coef_, and the kernel values are just stored differently, so one has to be transposed. If I just try and add MLPY's bias term to scikit-learn's prediction (model.predict), the values don't match those predicted by MLPY (they're close but they are not off by a constant value). Am I missing something obvious, or is there really a fundamental difference here? From: scikit-learn on behalf of Michael Eickenberg Reply-To: Scikit-learn mailing list Date: Thursday, September 19, 2019 at 3:53 PM To: Scikit-learn mailing list Subject: Re: [scikit-learn] Porting old MLPY KRR model to scikit-learn What exactly do you mean by "port"? Put already fitted models into a sklearn estimator object? You can do this as follows: You should be able to create a `estimator = sklearn.kernel_ridge.KernelRidge(...)` object, call `fit` to some random data of the appropriate shape, and then set `estimator.dual_coef_` to the ones from your MLPY model (the sklearn version sets them here: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/kernel_ridge.py#L165). If this is not what you mean, then maybe you just want to refit them using the appropriate KernelRidge kernel? Hope this helps! Michael On Thu, Sep 19, 2019 at 12:43 PM Chapman, James E > wrote: Hello, I have some old KRR models from MLPY and I need to port those models over to a new code written with scikit-learn (transfer MLPY KRR data to a scikit-learn KernelRidge instance). Does anyone know if this is even possible, and if so, could you give me some suggestions as to how to accomplish it?
Thanks and regards, James _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From joel.nothman at gmail.com Sun Sep 22 18:56:21 2019 From: joel.nothman at gmail.com (Joel Nothman) Date: Mon, 23 Sep 2019 08:56:21 +1000 Subject: [scikit-learn] Website redesign Message-ID: Hi scikit-learn users, Scikit-learn developer Thomas Fan recently gave our documentation and web site a refresh, targeting desktop and mobile devices. Please give it a try at https://scikit-learn.org/dev/ and raise usability issues at https://github.com/scikit-learn/scikit-learn/issues/new to help us get it ready for the next release. Congratulations to Thomas on some great work! Thanks all! Joel -------------- next part -------------- An HTML attachment was scrubbed... URL: From solegalli1 at gmail.com Tue Sep 24 07:39:33 2019 From: solegalli1 at gmail.com (Sole Galli) Date: Tue, 24 Sep 2019 12:39:33 +0100 Subject: [scikit-learn] Normalizer, l1 and l2 norms Message-ID: Hello team, Quick question with respect to the Normalizer(). My understanding is that this transformer divides the values (rows) of a vector by the vector's Euclidean (l2) or Manhattan (l1) norm. From the sklearn docs, I understand that the Normalizer() does not learn the distances from the train set and stores them. It rather normalises the data according to the distance the data set presents, which could be or not, the same in test and train. Am I understanding this correctly? If so, what is the reason not to store these parameters in the Normalizer and use them to scale future data? If not getting it right, what am I missing? Many thanks and I will appreciate if you have an article on this to share. Cheers Sole -------------- next part -------------- An HTML attachment was scrubbed...
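[Editor's note: the behaviour Sole asks about can be checked in a couple of lines — Normalizer works row by row, so there is nothing for fit() to learn or store. A quick sketch:]

```python
import numpy as np
from sklearn.preprocessing import Normalizer

X = np.array([[3.0, 4.0],
              [1.0, 1.0]])

# l2: each row is divided by its own Euclidean norm.
X_l2 = Normalizer(norm="l2").fit_transform(X)
print(X_l2[0])  # [0.6 0.8]  (the norm of [3, 4] is 5)

# The same thing by hand -- no training-set statistics involved.
manual = X / np.linalg.norm(X, ord=2, axis=1, keepdims=True)
print(np.allclose(X_l2, manual))  # True

# l1: rows are divided by the sum of absolute values instead.
X_l1 = Normalizer(norm="l1").fit_transform(X)
print(X_l1[1])  # [0.5 0.5]
```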
URL: From g.lemaitre58 at gmail.com Tue Sep 24 07:59:25 2019 From: g.lemaitre58 at gmail.com (=?UTF-8?Q?Guillaume_Lema=C3=AEtre?=) Date: Tue, 24 Sep 2019 13:59:25 +0200 Subject: [scikit-learn] Normalizer, l1 and l2 norms In-Reply-To: References: Message-ID: Since you are normalizing sample by sample, you don't need information from the training set to normalize a new sample. You just need to compute the norm of this new sample. On Tue, 24 Sep 2019 at 13:41, Sole Galli wrote: > Hello team, > > Quick question respect to the Normalizer(). > > My understanding is that this transformer divides the values (rows) of a > vector by the vector euclidean (l2) or manhattan distances (l1). > > From the sklearn docs, I understand that the Normalizer() does not learn > the distances from the train set and stores them. It rathers normalises the > data according to distance the data set presents, which could be or not, > the same in test and train. > > Am I understanding this correctly? > > If so, what is the reason not to store these parameters in the Normalizer > and use them to scale future data? > > If not getting it right, what am I missing? > > Many thanks and I will appreciate if you have an article on this to share. > > Cheers > > Sole > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -- Guillaume Lemaitre Scikit-learn @ Inria Foundation https://glemaitre.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From solegalli1 at gmail.com Tue Sep 24 08:02:25 2019 From: solegalli1 at gmail.com (Sole Galli) Date: Tue, 24 Sep 2019 13:02:25 +0100 Subject: [scikit-learn] Normalizer, l1 and l2 norms In-Reply-To: References: Message-ID: Sorry, ignore my question, I got it right now. 
It is calculating the norm of the observation vector (across variables), and its distance varies obs per obs, that is why it needs to be re-calculated, and therefore not stored. I would appreciate some articles / links with successful implementations of this technique and why it adds value to ML. Would you be able to point me to any? Cheers Sole On Tue, 24 Sep 2019 at 12:39, Sole Galli wrote: > Hello team, > > Quick question respect to the Normalizer(). > > My understanding is that this transformer divides the values (rows) of a > vector by the vector euclidean (l2) or manhattan distances (l1). > > From the sklearn docs, I understand that the Normalizer() does not learn > the distances from the train set and stores them. It rathers normalises the > data according to distance the data set presents, which could be or not, > the same in test and train. > > Am I understanding this correctly? > > If so, what is the reason not to store these parameters in the Normalizer > and use them to scale future data? > > If not getting it right, what am I missing? > > Many thanks and I will appreciate if you have an article on this to share. > > Cheers > > Sole > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.lemaitre58 at gmail.com Tue Sep 24 09:03:03 2019 From: g.lemaitre58 at gmail.com (=?UTF-8?Q?Guillaume_Lema=C3=AEtre?=) Date: Tue, 24 Sep 2019 15:03:03 +0200 Subject: [scikit-learn] Normalizer, l1 and l2 norms In-Reply-To: References: Message-ID: One example where I saw it used was Scale-Invariant Feature Transform (SIFT). Normalizing each vector to have a unit length will compensate for affine changes in illumination between samples. The use case given in scikit-learn would be something similar but with text processing: "Scaling inputs to unit norms is a common operation for text classification or clustering for instance. 
For instance the dot product of two l2-normalized TF-IDF vectors is the cosine similarity of the vectors and is the base similarity metric for the Vector Space Model commonly used by the Information Retrieval community." So basically, you cancel a transform and it allows you to compare samples between each other. On Tue, 24 Sep 2019 at 14:04, Sole Galli wrote: > Sorry, ignore my question, I got it right now. > > It is calculating the norm of the observation vector (across variables), > and its distance varies obs per obs, that is why it needs to be > re-calculated, and therefore not stored. > > I would appreciate some articles / links with successful implementations > of this technique and why it adds value to ML. Would you be able to point > me to any? > > Cheers > > Sole > > > > > > On Tue, 24 Sep 2019 at 12:39, Sole Galli wrote: > >> Hello team, >> >> Quick question respect to the Normalizer(). >> >> My understanding is that this transformer divides the values (rows) of a >> vector by the vector euclidean (l2) or manhattan distances (l1). >> >> From the sklearn docs, I understand that the Normalizer() does not learn >> the distances from the train set and stores them. It rathers normalises the >> data according to distance the data set presents, which could be or not, >> the same in test and train. >> >> Am I understanding this correctly? >> >> If so, what is the reason not to store these parameters in the Normalizer >> and use them to scale future data? >> >> If not getting it right, what am I missing? >> >> Many thanks and I will appreciate if you have an article on this to share. >> >> Cheers >> >> Sole >> >> >> _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -- Guillaume Lemaitre Scikit-learn @ Inria Foundation https://glemaitre.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... 
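[Editor's note: the docs passage Guillaume quotes is easy to verify numerically — after l2 normalization, a plain dot product equals the cosine similarity. A sketch with made-up vectors standing in for TF-IDF rows:]

```python
import numpy as np
from sklearn.preprocessing import normalize

# Two made-up "document" vectors (stand-ins for rows of a TF-IDF matrix).
a = np.array([[1.0, 2.0, 0.0]])
b = np.array([[2.0, 1.0, 1.0]])

# Cosine similarity from the definition: dot / (|a| * |b|).
cos = float(a @ b.T) / (np.linalg.norm(a) * np.linalg.norm(b))

# After l2-normalizing each vector, the bare dot product gives the same number.
a_n, b_n = normalize(a, norm="l2"), normalize(b, norm="l2")
dot = float(a_n @ b_n.T)

print(np.isclose(cos, dot))  # True
```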
URL: From solegalli1 at gmail.com Wed Sep 25 04:05:16 2019 From: solegalli1 at gmail.com (Sole Galli) Date: Wed, 25 Sep 2019 09:05:16 +0100 Subject: [scikit-learn] Normalizer, l1 and l2 norms In-Reply-To: References: Message-ID: Thank you Guillaume, that is helpful. Cheers Sole On Tue, 24 Sep 2019 at 14:04, Guillaume Lemaître wrote: > One example where I saw it used was Scale-Invariant Feature Transform > (SIFT). Normalizing each vector to have a unit length will compensate for > affine changes in illumination between samples. > The use case given in scikit-learn would be something similar but with > text processing: > > "Scaling inputs to unit norms is a common operation for text > classification or clustering for instance. For instance the dot product of > two l2-normalized TF-IDF vectors is the cosine similarity of the vectors > and is the base similarity metric for the Vector Space Model commonly used > by the Information Retrieval community." > > So basically, you cancel a transform and it allows you to compare samples > between each other. > > On Tue, 24 Sep 2019 at 14:04, Sole Galli wrote: > >> Sorry, ignore my question, I got it right now. >> >> It is calculating the norm of the observation vector (across variables), >> and its distance varies obs per obs, that is why it needs to be >> re-calculated, and therefore not stored. >> >> I would appreciate some articles / links with successful implementations >> of this technique and why it adds value to ML. Would you be able to point >> me to any? >> >> Cheers >> >> Sole >> >> >> >> >> >> On Tue, 24 Sep 2019 at 12:39, Sole Galli wrote: >> >>> Hello team, >>> >>> Quick question respect to the Normalizer(). >>> >>> My understanding is that this transformer divides the values (rows) of a >>> vector by the vector euclidean (l2) or manhattan distances (l1). >>> >>> From the sklearn docs, I understand that the Normalizer() does not learn >>> the distances from the train set and stores them.
It rathers normalises the >>> data according to distance the data set presents, which could be or not, >>> the same in test and train. >>> >>> Am I understanding this correctly? >>> >>> If so, what is the reason not to store these parameters in the >>> Normalizer and use them to scale future data? >>> >>> If not getting it right, what am I missing? >>> >>> Many thanks and I will appreciate if you have an article on this to >>> share. >>> >>> Cheers >>> >>> Sole >>> >>> >>> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > > > -- > Guillaume Lemaitre > Scikit-learn @ Inria Foundation > https://glemaitre.github.io/ > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mathieu at mblondel.org Thu Sep 26 08:53:43 2019 From: mathieu at mblondel.org (Mathieu Blondel) Date: Thu, 26 Sep 2019 14:53:43 +0200 Subject: [scikit-learn] Website redesign In-Reply-To: References: Message-ID: Great work indeed! Love it! Mathieu On Mon, Sep 23, 2019 at 12:58 AM Joel Nothman wrote: > Hi scikit-learn users, > > Scikit-learn developer Thomas Fan recently gave our documentation and web > site a refresh, targeting desktop and mobile devices. Please give it a try > at https://scikit-learn.org/dev/ and raise usability issues at > https://github.com/scikit-learn/scikit-learn/issues/new to help us get it > ready for the next release. > > Congratulations to Thomas on some great work! > > Thanks all! > > Joel > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: