From pahome.chen at mirlab.org Fri Feb 1 00:19:20 2019
From: pahome.chen at mirlab.org (lampahome)
Date: Fri, 1 Feb 2019 13:19:20 +0800
Subject: [scikit-learn] Does model consider about previous training results after reloading model and then training with new data?
Message-ID:

As the title says, I'm confused.

If I reload a model and train it with new data, what happens?

1st: train on old data -> save model -> reload -> train with new data

Will the 2nd training take the previous training results into account?
Or will it just produce a new result from the new data?

From mail at sebastianraschka.com Fri Feb 1 00:26:36 2019
From: mail at sebastianraschka.com (Sebastian Raschka)
Date: Thu, 31 Jan 2019 23:26:36 -0600
Subject: [scikit-learn] Does model consider about previous training results after reloading model and then training with new data?
In-Reply-To:
References:
Message-ID:

Hi there,

if you call the "fit" method, the learning will essentially start from scratch. So no, it doesn't consider previous training results. However, certain algorithms are implemented with an additional partial_fit method that would consider previous training rounds.

Best,
Sebastian

> On Jan 31, 2019, at 11:19 PM, lampahome wrote:
>
> As the title says, I'm confused.
>
> If I reload a model and train it with new data, what happens?
>
> 1st: train on old data -> save model -> reload -> train with new data
>
> Will the 2nd training take the previous training results into account?
> Or will it just produce a new result from the new data?
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

From pahome.chen at mirlab.org Fri Feb 1 01:52:52 2019
From: pahome.chen at mirlab.org (lampahome)
Date: Fri, 1 Feb 2019 14:52:52 +0800
Subject: [scikit-learn] Does model consider about previous training results after reloading model and then training with new data?
In-Reply-To:
References:
Message-ID:

Sebastian Raschka wrote on Feb 1, 2019 at 1:48:

> Hi there,
>
> if you call the "fit" method, the learning will essentially start from
> scratch. So no, it doesn't consider previous training results. However,
> certain algorithms are implemented with an additional partial_fit
> method that would consider previous training rounds.
>

So if I want something like "continue training", I should choose a model with partial_fit, right?

What I want is regression, but I saw nothing with a partial_fit function among the ensemble methods.

Can it be found elsewhere?

thx

From mail at sebastianraschka.com Fri Feb 1 02:07:46 2019
From: mail at sebastianraschka.com (Sebastian Raschka)
Date: Fri, 1 Feb 2019 01:07:46 -0600
Subject: [scikit-learn] Does model consider about previous training results after reloading model and then training with new data?
In-Reply-To:
References:
Message-ID: <579CC79B-6D69-49CF-BE0F-858A37345F64@sebastianraschka.com>

> So if I want something like "continue training", I should choose a model with partial_fit, right?

Yes.

> but I saw nothing with a partial_fit function among the ensemble methods,

Hm, technically, if the models in the ensemble support partial_fit, the ensemble method itself should also be able to use partial_fit. My guess is that it is not implemented because it cannot be guaranteed that the individual models support partial_fit.

However, if you are using the voting classifier, you could probably just train the individual models of the ensemble, because the voting classifier's decision rule is fixed. I think the following could work if the estimators_ support partial_fit:

    voter = VotingClassifier(...)
    voter.fit(...)

For further training:

    for est in voter.estimators_:
        est.partial_fit(...)

Best,
Sebastian

> On Feb 1, 2019, at 12:52 AM, lampahome wrote:
>
> Sebastian Raschka wrote on Feb 1, 2019 at 1:48:
> Hi there,
>
> if you call the "fit" method, the learning will essentially start from scratch. So no, it doesn't consider previous training results.
> However, certain algorithms are implemented with an additional partial_fit method that would consider previous training rounds.
>
> So if I want something like "continue training", I should choose a model with partial_fit, right?
>
> What I want is regression, but I saw nothing with a partial_fit function among the ensemble methods.
>
> Can it be found elsewhere?
>
> thx

From pahome.chen at mirlab.org Fri Feb 1 02:38:44 2019
From: pahome.chen at mirlab.org (lampahome)
Date: Fri, 1 Feb 2019 15:38:44 +0800
Subject: [scikit-learn] Does model consider about previous training results after reloading model and then training with new data?
In-Reply-To: <579CC79B-6D69-49CF-BE0F-858A37345F64@sebastianraschka.com>
References: <579CC79B-6D69-49CF-BE0F-858A37345F64@sebastianraschka.com>
Message-ID:

> I think the following could work if the estimators_ support partial_fit:
>
>     voter = VotingClassifier(...)
>     voter.fit(...)
>
> For further training:
>
>     for est in voter.estimators_:
>         est.partial_fit(...)
>

ok, maybe I'll try using the voting classifier for regression

From shane.grigsby at colorado.edu Sat Feb 2 15:29:32 2019
From: shane.grigsby at colorado.edu (Shane Grigsby)
Date: Sat, 2 Feb 2019 13:29:32 -0700
Subject: [scikit-learn] sklearn.cluster.OPTICS
In-Reply-To:
References: <1288774727.8422712.1548767430428.JavaMail.zimbra@zimbra.unideb.hu>
Message-ID: <20190202202932.y3vp4rav76agzrqi@talus>

Hi Mohit,

If you install the development version of sklearn, OPTICS should be available...

Can you be a bit more specific about how it isn't working? Are you running into an error?
Best,
Shane

On 01/29, Adrin wrote:
>Hi,
>
>OPTICS is still under development and there are quite a few open issues and
>PRs regarding the method. It's available on master, but not on any of the
>releases yet. We will hopefully have it out for the next release.
>
>Best,
>Adrin.
>
>On Tue, 29 Jan 2019 at 14:31 Mohit Srivastava <
>mohit.srivastava at med.unideb.hu> wrote:
>
>> Dear all,
>>
>> I want to use your clustering algorithm "sklearn.cluster.OPTICS".
>> But it is not working and found that it's not available at the moment
>> (found on the internet).
>> Could you please help me with the issue?
>> When would it be possible to use it?
>> Please reply as soon as possible.
>> thanks
>> regards
>> Mohit Srivastava

--
*PhD candidate & Research Assistant*
*Cooperative Institute for Research in Environmental Sciences (CIRES)*
*University of Colorado at Boulder*

From mohit.srivastava at med.unideb.hu Mon Feb 4 05:20:59 2019
From: mohit.srivastava at med.unideb.hu (Mohit Srivastava)
Date: Mon, 4 Feb 2019 11:20:59 +0100 (CET)
Subject: [scikit-learn] sklearn.cluster.OPTICS
In-Reply-To: <20190202202932.y3vp4rav76agzrqi@talus>
References: <1288774727.8422712.1548767430428.JavaMail.zimbra@zimbra.unideb.hu> <20190202202932.y3vp4rav76agzrqi@talus>
Message-ID: <1156762115.5592279.1549275659526.JavaMail.zimbra@zimbra.unideb.hu>

Hello,

How can I install the development version of sklearn in Anaconda 3?
thanks
regards
Mohit

----- Original Message -----
From: Shane Grigsby
To: Scikit-learn mailing list
Cc: mohit srivastava
Sent: Sat, 02 Feb 2019 21:29:32 +0100 (CET)
Subject: Re: [scikit-learn] sklearn.cluster.OPTICS

Hi Mohit,

If you install the development version of sklearn, OPTICS should be available...

Can you be a bit more specific about how it isn't working? Are you running into an error?

Best,
Shane

On 01/29, Adrin wrote:
>Hi,
>
>OPTICS is still under development and there are quite a few open issues and
>PRs regarding the method. It's available on master, but not on any of the
>releases yet. We will hopefully have it out for the next release.
>
>Best,
>Adrin.
>
>On Tue, 29 Jan 2019 at 14:31 Mohit Srivastava <
>mohit.srivastava at med.unideb.hu> wrote:
>
>> Dear all,
>>
>> I want to use your clustering algorithm "sklearn.cluster.OPTICS".
>> But it is not working and found that it's not available at the moment
>> (found on the internet).
>> Could you please help me with the issue?
>> When would it be possible to use it?
>> Please reply as soon as possible.
>> thanks
>> regards
>> Mohit Srivastava

--
*PhD candidate & Research Assistant*
*Cooperative Institute for Research in Environmental Sciences (CIRES)*
*University of Colorado at Boulder*

From adrin.jalali at gmail.com Mon Feb 4 05:24:08 2019
From: adrin.jalali at gmail.com (Adrin)
Date: Mon, 4 Feb 2019 11:24:08 +0100
Subject: [scikit-learn] sklearn.cluster.OPTICS
In-Reply-To: <1156762115.5592279.1549275659526.JavaMail.zimbra@zimbra.unideb.hu>
References: <1288774727.8422712.1548767430428.JavaMail.zimbra@zimbra.unideb.hu> <20190202202932.y3vp4rav76agzrqi@talus> <1156762115.5592279.1549275659526.JavaMail.zimbra@zimbra.unideb.hu>
Message-ID:

https://scikit-learn.org/dev/developers/contributing.html#contributing-code

On Mon, Feb 4, 2019 at 11:22 AM Mohit Srivastava <
mohit.srivastava at med.unideb.hu> wrote:

> Hello,
>
> How can I install the development version of sklearn in Anaconda 3?
> thanks
> regards
> Mohit
>
> ----- Original Message -----
> From: Shane Grigsby
> To: Scikit-learn mailing list
> Cc: mohit srivastava
> Sent: Sat, 02 Feb 2019 21:29:32 +0100 (CET)
> Subject: Re: [scikit-learn] sklearn.cluster.OPTICS
>
> Hi Mohit,
> If you install the development version of sklearn, OPTICS should be
> available...
>
> Can you be a bit more specific about how it isn't working? Are you
> running into an error?
> Best,
> Shane
>
> On 01/29, Adrin wrote:
> >Hi,
> >
> >OPTICS is still under development and there are quite a few open issues and
> >PRs regarding the method. It's available on master, but not on any of the
> >releases yet. We will hopefully have it out for the next release.
> >
> >Best,
> >Adrin.
> >
> >On Tue, 29 Jan 2019 at 14:31 Mohit Srivastava <
> >mohit.srivastava at med.unideb.hu> wrote:
> >
> >> Dear all,
> >>
> >> I want to use your clustering algorithm "sklearn.cluster.OPTICS".
> >> But it is not working and found that it's not available at the moment
> >> (found on the internet).
> >> Could you please help me with the issue?
> >> When would it be possible to use it?
> >> Please reply as soon as possible.
> >> thanks
> >> regards
> >> Mohit Srivastava
>
> --
> *PhD candidate & Research Assistant*
> *Cooperative Institute for Research in Environmental Sciences (CIRES)*
> *University of Colorado at Boulder*

From laurent at moldus.org Mon Feb 4 10:28:21 2019
From: laurent at moldus.org (Laurent Julliard)
Date: Mon, 4 Feb 2019 07:28:21 -0800
Subject: [scikit-learn] Scikit-learn porting strategy
Message-ID:

Hi everyone,

If one were to start porting scikit-learn to another language, what would be the plan to follow? I'm looking for directions that would say something like:
a) start with foundational components (e.g. numpy, I guess)
b) then port module A for a quick win,
c) follow with modules B, C...
d) then address the scipy-dependent modules...

Thank you for your help and advice.

Eljay
--
Laurent Julliard
From t3kcit at gmail.com Mon Feb 4 10:43:49 2019
From: t3kcit at gmail.com (Andreas Mueller)
Date: Mon, 4 Feb 2019 10:43:49 -0500
Subject: [scikit-learn] Scikit-learn porting strategy
In-Reply-To:
References:
Message-ID: <34e73389-e2da-9a73-f42b-13cc80c919fe@gmail.com>

Hi Eljay.
Which language? And you want to reimplement it? How many full-time developers do you have for how many years? ;)
Openhub estimates scikit-learn took 39 person-years:
https://www.openhub.net/p/scikit-learn/estimated_cost

I'm asking about the language because there are similar projects already existing in other languages, like Julia.

Cheers,
Andy

On 2/4/19 10:28 AM, Laurent Julliard wrote:
> Hi everyone,
>
> If one were to start porting scikit-learn to another language, what
> would be the plan to follow? I'm looking for directions that would say
> something like:
> a) start with foundational components (e.g. numpy, I guess)
> b) then port module A for a quick win,
> c) follow with modules B, C...
> d) then address the scipy-dependent modules...
>
> Thank you for your help and advice.
>
> Eljay
> --
> Laurent Julliard

From laurent at moldus.org Tue Feb 5 03:19:43 2019
From: laurent at moldus.org (Laurent Julliard)
Date: Tue, 5 Feb 2019 00:19:43 -0800
Subject: [scikit-learn] Scikit-learn porting strategy
In-Reply-To: <34e73389-e2da-9a73-f42b-13cc80c919fe@gmail.com>
References: <34e73389-e2da-9a73-f42b-13cc80c919fe@gmail.com>
Message-ID:

Hi Andreas,

The person-year input is very valuable. This is also the kind of information I was looking for.

The language would be Ruby. Now, it's true that Ruby can already benefit from Scikit-learn through the PyCall extension...
The point of my first question was also about the porting strategy: can you start small and get there step by step, or can you not make anything work until you have completed, say, 50% of the code or more?

Eljay

On Mon, Feb 4, 2019 at 4:43 PM Andreas Mueller wrote:

> Hi Eljay.
> Which language? And you want to reimplement it? How many full-time
> developers do you have for how many years? ;) Openhub estimates scikit-learn
> took 39 person-years:
> https://www.openhub.net/p/scikit-learn/estimated_cost
>
> I'm asking about the language because there are similar projects already
> existing in other languages, like Julia.
>
> Cheers,
> Andy
>
> On 2/4/19 10:28 AM, Laurent Julliard wrote:
>
> Hi everyone,
>
> If one were to start porting scikit-learn to another language, what would
> be the plan to follow? I'm looking for directions that would say something
> like:
> a) start with foundational components (e.g. numpy, I guess)
> b) then port module A for a quick win,
> c) follow with modules B, C...
> d) then address the scipy-dependent modules...
>
> Thank you for your help and advice.
>
> Eljay
> --
> Laurent Julliard

--
Laurent Julliard

From joel.nothman at gmail.com Tue Feb 5 06:12:48 2019
From: joel.nothman at gmail.com (Joel Nothman)
Date: Tue, 5 Feb 2019 22:12:48 +1100
Subject: [scikit-learn] Scikit-learn porting strategy
In-Reply-To:
References: <34e73389-e2da-9a73-f42b-13cc80c919fe@gmail.com>
Message-ID:

If you count things in Scipy and NumPy (and Joblib and Cython?)
that Scikit-learn depends on and which may be lacking or hard to find in SciRuby, it's much, much more than 39 person-years. PyCall, and potentially some Scikit-learn-specific wrappers around it, seems a much more sensible approach.

From t3kcit at gmail.com Tue Feb 5 11:40:03 2019
From: t3kcit at gmail.com (Andreas Mueller)
Date: Tue, 5 Feb 2019 11:40:03 -0500
Subject: [scikit-learn] Scikit-learn porting strategy
In-Reply-To:
References: <34e73389-e2da-9a73-f42b-13cc80c919fe@gmail.com>
Message-ID: <7a1ed9f5-0757-9380-8a04-1cdf81ca1bc7@gmail.com>

There's some stuff already:
https://github.com/SciRuby/

And in terms of strategy: No, you can go estimator by estimator and at some point implement cross-validation and grid-search and pipelines and metrics pretty independently.

It looks like daru is written in Ruby, which I expect to be too slow. nmatrix is written in C++, so I guess you'd have to write many of the algorithms in C++. At that point it might be easier to wrap an existing C++ library like mlpack or shogun.

On 2/5/19 6:12 AM, Joel Nothman wrote:
> If you count things in Scipy and NumPy (and Joblib and Cython?) that
> Scikit-learn depends on and which may be lacking or hard to find
> in SciRuby, it's much, much more than 39 person-years. PyCall, and potentially
> some Scikit-learn-specific wrappers around it, seems a much more
> sensible approach.
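
A minimal sketch of why the estimator-by-estimator route described above can work: as long as every ported estimator exposes the same fit/predict interface, a utility such as cross-validation depends only on that interface and can be written separately. The names MeanRegressor and cross_val_mse are hypothetical and are not scikit-learn API; the sketch is in Python only to mirror the convention that would be ported.

```python
import random


class MeanRegressor:
    """Hypothetical toy estimator: always predicts the training mean."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def cross_val_mse(make_estimator, X, y, n_splits=3, seed=0):
    """K-fold cross-validation that relies only on fit/predict."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into n_splits folds.
    folds = [indices[i::n_splits] for i in range(n_splits)]
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        est = make_estimator().fit([X[i] for i in train_idx],
                                   [y[i] for i in train_idx])
        preds = est.predict([X[i] for i in test_idx])
        errors = [(p - y[i]) ** 2 for p, i in zip(preds, test_idx)]
        scores.append(sum(errors) / len(errors))
    return scores


# Toy data, invented for illustration.
X = [[float(i)] for i in range(12)]
y = [2.0 * i for i in range(12)]
scores = cross_val_mse(MeanRegressor, X, y)
print(scores)
```

Any other estimator with the same two methods could be passed to cross_val_mse unchanged, which is the property that lets estimators and model-selection tools be ported in either order.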
From pi at berkeley.edu Tue Feb 5 17:12:43 2019
From: pi at berkeley.edu (Paul Ivanov)
Date: Tue, 5 Feb 2019 14:12:43 -0800
Subject: [scikit-learn] SciPy 2019 Conference - 10 days left for submissions, registration now open
Message-ID:

SciPy 2019, the 18th annual Scientific Computing with Python conference, will be held July 8-14, 2019 in Austin, Texas. The annual SciPy Conference brings together over 800 participants from industry, academia, and government to showcase their latest projects, learn from skilled users and developers, and collaborate on code development.

The call for abstracts for SciPy 2019 for talks, posters and tutorials is now open. The original deadline for submissions has been extended and the new deadline is February 15, 2019.

Conference Website: https://www.scipy2019.scipy.org/
Submission Website: https://easychair.org/conferences/?conf=scipy2019

*Talks and Posters (July 10-12, 2019)*

In addition to the general track, this year will have specialized tracks focused on:

- Data Driven Discoveries (including Machine Learning and Data Science)
- Open Source Communities (Sustainability)

*Mini Symposia*

- Science Communication through Visualization
- Neuroscience and Cognitive Science
- Image Processing
- Earth, Ocean, Geo and Atmospheric Science

There will also be a SciPy Tools Plenary Session each day with 2 to 5 minute updates on tools and libraries.

*Tutorials (July 8-9, 2019)*

Tutorials should be focused on covering a well-defined topic in a hands-on manner. We are looking for useful techniques or packages, helping new or advanced Python programmers develop better or faster scientific applications. We encourage submissions to be designed to allow at least 50% of the time for hands-on exercises even if this means the subject matter needs to be limited. Tutorials will be 4 hours in duration.
In your tutorial application, you can indicate what prerequisite skills and knowledge will be needed for your tutorial, and the approximate expected level of knowledge of your students (i.e., beginner, intermediate, advanced). Instructors of accepted tutorials will receive a stipend.

--
Paul Ivanov
http://pirsquared.org | GPG/PGP key id: 0x0F3E28F7

From avigross at verizon.net Wed Feb 6 00:22:21 2019
From: avigross at verizon.net (Avi Gross)
Date: Wed, 6 Feb 2019 00:22:21 -0500
Subject: [scikit-learn] Scikit-learn porting strategy
In-Reply-To: <7a1ed9f5-0757-9380-8a04-1cdf81ca1bc7@gmail.com>
References: <34e73389-e2da-9a73-f42b-13cc80c919fe@gmail.com> <7a1ed9f5-0757-9380-8a04-1cdf81ca1bc7@gmail.com>
Message-ID: <00b601d4bddb$efb12b00$cf138100$@verizon.net>

I haven't looked at Ruby in a long time. I do wonder what people mean by PORTING to another language or environment that already has its own way of doing things.

I did most of my recent work in native R enhanced by packages and have been learning how to do similar things in modules on top of modules ... on top of native Python. R chose lots of built-in functionality up-front that Python did not, and vice versa.

If someone wanted to port some machine learning tools to R from Python, there would not necessarily be much point in porting numpy or pandas as a whole. If you did, there would be even more duplication than there is now. On the other hand, I have seen people port things to R like a dict datatype, which is not quite the same as the environment objects R uses.

So if Ruby already has available much of what is needed, it could make sense to rewrite algorithms around it and only add what is needed. For efficiency, sure, you might want to link in C/C++/FORTRAN libraries.
As mentioned, there are already ways to run some languages within/from others. R and Python can be run with either one being the initiator.

If you want Ruby to completely have the new functionality, do you want to slavishly copy entire packages or have your own new one designed eclectically? There are many ways to do these things, and each time I compare a few, I see differences that make some easier or more intuitive than others, and other times the reverse.

And how far do you expect to port? What does Ruby provide for graphics, for example? R had base graphics and added lattice and then ggplot. I use them all, depending on the task and how much detail I want to tweak. They are quite different, as is the matplotlib that seems to be used quite a bit in Python. Making plots is definitely a part of the process, but if a function expects certain data structures, would your version of the numpy and pandas data structures interface well with it?

As Andreas says (and I am coincidentally in the middle of the book he wrote with a Guido, albeit that is her last name, unlike the Python founder), you may find that part of what you would do is create wrappers that accept one function interface and massage things to call a different interface. Calling a graphics program that expects a list using an array won't work unless you quietly convert first.

From: scikit-learn On Behalf Of Andreas Mueller
Sent: Tuesday, February 5, 2019 11:40 AM
To: scikit-learn at python.org
Subject: Re: [scikit-learn] Scikit-learn porting strategy

There's some stuff already:
https://github.com/SciRuby/

And in terms of strategy: No, you can go estimator by estimator and at some point implement cross-validation and grid-search and pipelines and metrics pretty independently.

It looks like daru is written in Ruby, which I expect to be too slow. nmatrix is written in C++, so I guess you'd have to write many of the algorithms in C++. At that point it might be easier to wrap an existing C++ library like mlpack or shogun.
On 2/5/19 6:12 AM, Joel Nothman wrote:

If you count things in Scipy and NumPy (and Joblib and Cython?) that Scikit-learn depends on and which may be lacking or hard to find in SciRuby, it's much, much more than 39 person-years. PyCall, and potentially some Scikit-learn-specific wrappers around it, seems a much more sensible approach.

From stuart at stuartreynolds.net Wed Feb 6 12:28:46 2019
From: stuart at stuartreynolds.net (Stuart Reynolds)
Date: Wed, 6 Feb 2019 09:28:46 -0800
Subject: [scikit-learn] AUCROC/MAP confidence intervals in scikit
Message-ID:

https://papers.nips.cc/paper/2645-confidence-intervals-for-the-area-under-the-roc-curve.pdf

Does scikit (or other Python libraries) provide functions to measure the confidence interval of AUROC scores? Same question also for mean average precision.

It seems like this should be a standard results reporting practice if a method is available.

- Stuart

From mail at sebastianraschka.com Wed Feb 6 13:19:38 2019
From: mail at sebastianraschka.com (Sebastian Raschka)
Date: Wed, 6 Feb 2019 12:19:38 -0600
Subject: [scikit-learn] AUCROC/MAP confidence intervals in scikit
In-Reply-To:
References:
Message-ID: <1CD58B12-55DA-4D93-954E-B3BA0EC69C4E@sebastianraschka.com>

Hi Stuart,

I don't think so, because there is no standard way to compute CIs. That goes for all performance measures (accuracy, precision, recall, etc.). Some people use simple binomial approximation intervals, some people prefer bootstrapping, etc. And it also depends on the data you have. In large datasets, binomial approximation intervals may be sufficient and bootstrapping too expensive, etc.

Thanks for sharing that paper btw, will have a look.
Best,
Sebastian

> On Feb 6, 2019, at 11:28 AM, Stuart Reynolds wrote:
>
> https://papers.nips.cc/paper/2645-confidence-intervals-for-the-area-under-the-roc-curve.pdf
> Does scikit (or other Python libraries) provide functions to measure the confidence interval of AUROC scores? Same question also for mean average precision.
>
> It seems like this should be a standard results reporting practice if a method is available.
>
> - Stuart

From stefan.frank.ulbrich at googlemail.com Wed Feb 6 13:34:47 2019
From: stefan.frank.ulbrich at googlemail.com (Stefan Ulbrich)
Date: Wed, 6 Feb 2019 19:34:47 +0100
Subject: [scikit-learn] Possible bug in BayesianGaussianMixture?
Message-ID:

Hello,

I think I might have found a bug in the BayesianGaussianMixture, or at least encountered a behavior that I was not expecting. The problem occurs when having clusters with small extent (in my case, it is 2D geographic data) that are far away from each other. While the means and their number are determined correctly, the covariance matrices are not (at least compared to the regular GMM): they are much wider and point towards the mean of the cluster centers.

A minimal example and visualization can be seen in a Stack Overflow question I opened:

https://stackoverflow.com/q/54524283

So my question is whether the results of GMM and BGMM should be similar, or whether this is the expected behavior (and why)?

Thanks in advance for an answer and best wishes
Stefan
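
One way to probe the behavior described above is to fit both models on synthetic data of the same shape and compare the fitted covariances under different priors. The sketch below is only a starting point under stated assumptions: the data, the cluster spacing, and the value passed as covariance_prior are invented for illustration, and it does not establish whether the behavior is a bug.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture, GaussianMixture

rng = np.random.RandomState(0)
centers = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 100.0]])
# Three tight clusters (std 0.1) that are far apart; synthetic stand-in data.
X = np.vstack([c + 0.1 * rng.randn(200, 2) for c in centers])

gmm = GaussianMixture(n_components=3, random_state=0).fit(X)

# Default priors: covariance_prior=None is initialised from the covariance of X.
bgmm_default = BayesianGaussianMixture(n_components=3, random_state=0).fit(X)

# Same model with a much tighter covariance prior (an assumed value).
bgmm_tight = BayesianGaussianMixture(
    n_components=3,
    covariance_prior=0.01 * np.eye(2),
    random_state=0,
).fit(X)

models = [
    ("GMM", gmm),
    ("BGMM, default prior", bgmm_default),
    ("BGMM, tight covariance prior", bgmm_tight),
]
for name, model in models:
    # The trace of each covariance matrix summarises the component's spread.
    traces = [float(np.trace(cov)) for cov in model.covariances_]
    print(name, np.round(traces, 4))
```

When covariance_prior is left as None, scikit-learn derives it from the covariance of the whole data set, which for widely separated clusters is far larger than the spread of any single cluster. mean_prior and mean_precision_prior are further parameters that may be worth varying in the same way.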
From t3kcit at gmail.com Thu Feb 7 10:59:38 2019
From: t3kcit at gmail.com (Andreas Mueller)
Date: Thu, 7 Feb 2019 10:59:38 -0500
Subject: [scikit-learn] AUCROC/MAP confidence intervals in scikit
In-Reply-To: <1CD58B12-55DA-4D93-954E-B3BA0EC69C4E@sebastianraschka.com>
References: <1CD58B12-55DA-4D93-954E-B3BA0EC69C4E@sebastianraschka.com>
Message-ID:

The paper definitely looks interesting and the authors are certainly some giants in the field. But it is actually not widely cited (139 citations since 2005), and I've never seen it used.

I don't know why that is, and looking at the citations there doesn't seem to be a lot of follow-up work. I think this would need more validation before getting into sklearn.

Sebastian: This paper is distribution independent and doesn't need bootstrapping, so it looks indeed quite nice.

On 2/6/19 1:19 PM, Sebastian Raschka wrote:
> Hi Stuart,
>
> I don't think so because there is no standard way to compute CI's. That goes for all performance measures (accuracy, precision, recall, etc.). Some people use simple binomial approximation intervals, some people prefer bootstrapping etc. And it also depends on the data you have. In large datasets, binomial approximation intervals may be sufficient and bootstrapping too expensive etc.
>
> Thanks for sharing that paper btw, will have a look.
>
> Best,
> Sebastian
>
>
>> On Feb 6, 2019, at 11:28 AM, Stuart Reynolds wrote:
>>
>> https://papers.nips.cc/paper/2645-confidence-intervals-for-the-area-under-the-roc-curve.pdf
>> Does scikit (or other Python libraries) provide functions to measure the confidence interval of AUROC scores? Same question also for mean average precision.
>>
>> It seems like this should be a standard results reporting practice if a method is available.
>>
>> - Stuart

From t3kcit at gmail.com Thu Feb 7 11:03:56 2019
From: t3kcit at gmail.com (Andreas Mueller)
Date: Thu, 7 Feb 2019 11:03:56 -0500
Subject: [scikit-learn] Possible bug in BayesianGaussianMixture?
In-Reply-To:
References:
Message-ID: <717b934f-a471-262f-bec4-810d2bfefedd@gmail.com>

Hey Stefan.
I would expect that to depend on the prior. It could either be a bug or an issue with the variational inference. Maybe comparing against an MCMC implementation might be helpful? Though if that works, I'm not sure what the conclusion would be tbh.
(I hate debugging variational inference, I can't get the hang of it)

Can you check the estimated covariance? What is it? The samples that you're showing are from all 100 components, right?

Cheers,
Andy

On 2/6/19 1:34 PM, Stefan Ulbrich via scikit-learn wrote:
> Hello,
>
> I think I might have found a bug in the BayesianGaussianMixture, or at
> least encountered a behavior that I was not expecting. The problem
> occurs when having clusters with small extent (in my case, it is 2D
> geographic data) that are far away from each other. While the means
> and their number are determined correctly, the co-variance matrices
> are not (at least compared to the regular GMM): They are are much
> wider and point towards the mean of the cluster centers.
> A minimal example and visualization can be seen on a stackoverflow
> question I opened.
>
> https://stackoverflow.com/q/54524283
>
> So my question is whether the results of GMM and BGMM should be
> similar or this is the expected behavior (and why)?
>
> Thanks in advance for an answer and best wishes
> Stefan

From josef.pktd at gmail.com Thu Feb 7 11:15:08 2019
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Thu, 7 Feb 2019 11:15:08 -0500
Subject: [scikit-learn] AUCROC/MAP confidence intervals in scikit
In-Reply-To:
References: <1CD58B12-55DA-4D93-954E-B3BA0EC69C4E@sebastianraschka.com>
Message-ID:

Just a skeptical comment from a bystander.

I only skimmed parts of the article. My impression is that this does not apply (directly) to the regression setting.
AFAIU, they assume that all observations have the same probability.

To me it looks more like the literature on testing of or confidence intervals for a single proportion.

I might be wrong.

Josef

On Thu, Feb 7, 2019 at 11:00 AM Andreas Mueller wrote:

> The paper definitely looks interesting and the authors are certainly
> some giants in the field.
> But it is actually not widely cited (139 citations since 2005), and I've
> never seen it used.
>
> I don't know why that is, and looking at the citations there doesn't
> seem to be a lot of follow-up work.
> I think this would need more validation before getting into sklearn.
>
> Sebastian: This paper is distribution independent and doesn't need
> bootstrapping, so it looks indeed quite nice.
>
>
> On 2/6/19 1:19 PM, Sebastian Raschka wrote:
> > Hi Stuart,
> >
> > I don't think so because there is no standard way to compute CI's. That
> > goes for all performance measures (accuracy, precision, recall, etc.). Some
> > people use simple binomial approximation intervals, some people prefer
> > bootstrapping etc. And it also depends on the data you have. In large
> > datasets, binomial approximation intervals may be sufficient and
> > bootstrapping too expensive etc.
> >
> > Thanks for sharing that paper btw, will have a look.
> >
> > Best,
> > Sebastian
> >
> >
> >> On Feb 6, 2019, at 11:28 AM, Stuart Reynolds wrote:
> >>
> >> https://papers.nips.cc/paper/2645-confidence-intervals-for-the-area-under-the-roc-curve.pdf
> >> Does scikit (or other Python libraries) provide functions to measure
> >> the confidence interval of AUROC scores? Same question also for mean
> >> average precision.
> >>
> >> It seems like this should be a standard results reporting practice if a
> >> method is available.
> >>
> >> - Stuart

From g.lemaitre58 at gmail.com Thu Feb 7 11:29:50 2019
From: g.lemaitre58 at gmail.com (Guillaume Lemaître)
Date: Thu, 7 Feb 2019 17:29:50 +0100
Subject: [scikit-learn] Probabilities for LogisticRegression and LDA
Message-ID:

I was earlier looking at the code of predict_proba of LDA and
LogisticRegression. While we certainly have some bugs, I was a bit confused
and I thought an email would be better than opening an issue since that
might not be one.

In the case of multiclass classification, the probabilities could be
computed with two different assumptions: either as a set of independent
binary regressions or as a log-linear model (
https://en.wikipedia.org/wiki/Multinomial_logistic_regression).
Then, we can compute the probabilities either by using a class as a pivot
and computing exp(beta_c X) / (1 + sum_k exp(beta_k X)), or by using all
classes and computing a softmax.

My question is related to LogisticRegression in the OvR scheme. Naively, I
thought it corresponded to the former case (a set of independent
regressions). However, we are using another normalization there, which was
first implemented in liblinear. I searched liblinear's issue tracker and
found: https://github.com/cjlin1/liblinear/pull/20

It is related to the following paper:
https://www.csie.ntu.edu.tw/~cjlin/papers/generalBT.pdf

My math skills are limited and I am not sure I grasp what is going on.
Could anybody shed some light on this OvR normalization and why it is
different from the case of a set of independent regressions described in
Wikipedia?

Cheers,
--
Guillaume Lemaitre
INRIA Saclay - Parietal team
Center for Data Science Paris-Saclay
https://glemaitre.github.io/

From mail at sebastianraschka.com Thu Feb 7 11:20:57 2019
From: mail at sebastianraschka.com (Sebastian Raschka)
Date: Thu, 7 Feb 2019 10:20:57 -0600
Subject: [scikit-learn] AUCROC/MAP confidence intervals in scikit
In-Reply-To:
References: <1CD58B12-55DA-4D93-954E-B3BA0EC69C4E@sebastianraschka.com>
Message-ID:

Still haven't had a chance to read it, but ROC is for binary classification
anyway? Also, i.i.d. assumptions are typical for the learning algorithms as
well.

Best,
Sebastian

From tom.duprelatour at orange.fr Thu Feb 7 20:49:42 2019
From: tom.duprelatour at orange.fr (Tom DLT)
Date: Thu, 7 Feb 2019 17:49:42 -0800
Subject: [scikit-learn] Probabilities for LogisticRegression and LDA
In-Reply-To:
References:
Message-ID:

The set of independent regressions described in Wikipedia is *not* an OvR
model. It is just a (weird) way to understand the multinomial logistic
regression model.
OvR logistic regression and multinomial logistic regression are two
different models.

In multinomial logistic regression as a set of independent binary
regressions as described in Wikipedia, you have K - 1 binary regressions
between class k (k from 1 to K - 1) and class K.
Whereas in OvR logistic regression you have K binary regressions between
class k (k from 1 to K) and class "not class k".
The normalization is therefore different.

Indeed, in multinomial logistic regression as a set of independent binary
regressions, you have (from the beginning) the property 1 = sum_k p(y = k).
The normalization 1 / (1 + sum_{k=1}^{K - 1} exp(beta_k X)) comes from the
late computation of p(y = K) using this property.
Whereas in OvR logistic regression, you only have 1 = p_k(y = k) + p_k(y != k).
Therefore the probabilities p_k(y = k) do not sum to one, and you need to
normalize them with sum_{k=1}^{K} p_k(y = k) to obtain a valid probability
distribution for the OvR model. This is done in the same way in
OneVsRestClassifier (
https://github.com/scikit-learn/scikit-learn/blob/1a850eb5b601f3bf0f88a43090f83c51b3d8c593/sklearn/multiclass.py#L350-L351
).

But I agree that this description of the multinomial model is quite
confusing, compared to the log-linear/softmax description.

Tom
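
To make the difference between the normalizations discussed in this thread concrete, here is a minimal sketch in plain Python. The function names (`ovr_proba`, `softmax_proba`, `pivot_proba`) and the example scores are illustrative assumptions, not scikit-learn API. In practice the OvR and multinomial scores come from differently fitted models; feeding the same scores to both functions only isolates the normalization step.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def ovr_proba(scores):
    """OvR: one 'class k vs. rest' sigmoid per class, then renormalize.

    Each sigmoid satisfies p_k(y = k) + p_k(y != k) = 1 on its own, but the
    K values p_k(y = k) do not sum to one, hence the division by their sum.
    """
    p = [sigmoid(z) for z in scores]
    total = sum(p)
    return [v / total for v in p]


def softmax_proba(scores):
    """Multinomial (log-linear) model: softmax over all K class scores."""
    m = max(scores)  # subtract the max for numerical stability
    e = [math.exp(z - m) for z in scores]
    total = sum(e)
    return [v / total for v in e]


def pivot_proba(scores_vs_pivot):
    """Multinomial model written with class K as the pivot.

    Takes the K - 1 scores beta_k . x relative to class K and returns K
    probabilities, with the pivot class last.
    """
    e = [math.exp(z) for z in scores_vs_pivot]
    denom = 1.0 + sum(e)
    return [v / denom for v in e] + [1.0 / denom]


scores = [2.0, 0.5, -1.0]
# The pivot form is the softmax with the last class's score subtracted.
relative = [z - scores[-1] for z in scores[:-1]]
print(softmax_proba(scores))
print(pivot_proba(relative))
print(ovr_proba(scores))
```

The first two printed vectors agree up to rounding, which is the sense in which the "set of independent binary regressions" is just another way of writing the multinomial model. The third sums to one as well but is a different distribution.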

From g.lemaitre58 at gmail.com Fri Feb 8 01:47:33 2019
From: g.lemaitre58 at gmail.com (Guillaume Lemaître)
Date: Fri, 08 Feb 2019 07:47:33 +0100
Subject: [scikit-learn] Probabilities for LogisticRegression and LDA
In-Reply-To:
Message-ID: <5k4i9qg804c1ncdthbjb7bjv.1549608453818@gmail.com>

An HTML attachment was scrubbed...

From t3kcit at gmail.com Fri Feb 8 20:59:08 2019
From: t3kcit at gmail.com (Andreas Mueller)
Date: Fri, 8 Feb 2019 20:59:08 -0500
Subject: [scikit-learn] VOTE: scikit-learn governance document
Message-ID:

Hey all.

I want to call a vote on the final version of the scikit-learn
governance document, which can be found in this PR:

https://github.com/scikit-learn/scikit-learn/pull/12878

It underwent some significant changes in the last couple of weeks.

The two-sentence summary is: conflicts are resolved by vote among core
devs, with a technical committee resolving anything that cannot be
decided by at least a 2/3 majority. The initial technical committee is
Alexandre Gramfort, Olivier Grisel, Joel Nothman, Hanmin Qin, Gaël
Varoquaux and myself (Andreas Müller).

I would ask all of the *core developers* to either vote +1 for the
governance doc, -1 against it, or to explicitly abstain here on the
public mailing list (which is the way any vote will be conducted
according to the new governance document).

I suggest we leave the vote open for two weeks, so that the decision is
made before the sprint and we can take actions.

Anyone can still comment on the PR or here, though I would rather not
make more changes as this has already been discussed at some length.

Thank you for participating,

Andy

From adrin.jalali at gmail.com Sat Feb 9 05:59:15 2019
From: adrin.jalali at gmail.com (Adrin)
Date: Sat, 9 Feb 2019 11:59:15 +0100
Subject: [scikit-learn] VOTE: scikit-learn governance document
In-Reply-To:
References:
Message-ID:

+1

Thanks for the work you've put in it!

From g.louppe at gmail.com Sat Feb 9 16:03:59 2019
From: g.louppe at gmail.com (Gilles Louppe)
Date: Sat, 9 Feb 2019 22:03:59 +0100
Subject: [scikit-learn] VOTE: scikit-learn governance document
In-Reply-To:
References:
Message-ID:

Hi Andy,

I read through the document. Even though I have not been really active
these past months/years, I think it summarizes well our governance
model.

+1.

Gilles

From alexandre.gramfort at inria.fr Sun Feb 10 12:27:07 2019
From: alexandre.gramfort at inria.fr (Alexandre Gramfort)
Date: Sun, 10 Feb 2019 18:27:07 +0100
Subject: [scikit-learn] VOTE: scikit-learn governance document
In-Reply-To:
References:
Message-ID:

+1 for me too

Alex

From t3kcit at gmail.com Sun Feb 10 12:52:43 2019
From: t3kcit at gmail.com (Andreas Mueller)
Date: Sun, 10 Feb 2019 12:52:43 -0500
Subject: [scikit-learn] VOTE: scikit-learn governance document
In-Reply-To:
References:
Message-ID:

Thanks for chiming in Gilles!

From qinhanmin2005 at sina.com Sun Feb 10 19:53:44 2019
From: qinhanmin2005 at sina.com (Hanmin Qin)
Date: Mon, 11 Feb 2019 08:53:44 +0800
Subject: [scikit-learn] VOTE: scikit-learn governance document
Message-ID: <20190211005345.0B5B146400A2@webmail.sinamail.sina.com.cn>

+1 (personally I still think it's better to keep the flow chart, it seems
useful for beginners)

Hanmin Qin

From jmschreiber91 at gmail.com Sun Feb 10 20:53:35 2019
From: jmschreiber91 at gmail.com (Jacob Schreiber)
Date: Sun, 10 Feb 2019 17:53:35 -0800
Subject: [scikit-learn] VOTE: scikit-learn governance document
In-Reply-To: <20190211005345.0B5B146400A2@webmail.sinamail.sina.com.cn>
References: <20190211005345.0B5B146400A2@webmail.sinamail.sina.com.cn>
Message-ID:

+1 from me as well. Thanks for putting in the time to write this all out.

From noel.dawe at gmail.com Sun Feb 10 21:45:28 2019
From: noel.dawe at gmail.com (Noel Dawe)
Date: Sun, 10 Feb 2019 21:45:28 -0500
Subject: [scikit-learn] VOTE: scikit-learn governance document
In-Reply-To:
References: <20190211005345.0B5B146400A2@webmail.sinamail.sina.com.cn>
Message-ID:

Hi Andy,

+1 from me as well :)

From zephyr14 at gmail.com Mon Feb 11 02:53:51 2019
From: zephyr14 at gmail.com (Vlad Niculae)
Date: Mon, 11 Feb 2019 07:53:51 +0000
Subject: [scikit-learn] VOTE: scikit-learn governance document
In-Reply-To:
References: <20190211005345.0B5B146400A2@webmail.sinamail.sina.com.cn>
Message-ID:

+1

Thank you for the effort to formalize this!

Best,
Vlad

From gael.varoquaux at normalesup.org Mon Feb 11 03:47:56 2019
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Mon, 11 Feb 2019 09:47:56 +0100
Subject: [scikit-learn] VOTE: scikit-learn governance document
In-Reply-To:
References: <20190211005345.0B5B146400A2@webmail.sinamail.sina.com.cn>
Message-ID: <20190211084756.3jttddocqrxwrli2@phare.normalesup.org>

+1 on my side too.

Thanks a lot Andy for moving this forward.

Gaël

--
Gael Varoquaux
Senior Researcher, INRIA Parietal
NeuroSpin/CEA Saclay, Bat 145, 91191 Gif-sur-Yvette France
Phone: ++ 33-1-69-08-79-68
http://gael-varoquaux.info http://twitter.com/GaelVaroquaux

From pahome.chen at mirlab.org Mon Feb 11 04:50:20 2019
From: pahome.chen at mirlab.org (lampahome)
Date: Mon, 11 Feb 2019 17:50:20 +0800
Subject:
[scikit-learn] How to design system if I have huge items to real time analysis?
Message-ID:

Hello,

I'm trying to find a way to do real-time regression on disk block access times, but each block has its own access pattern. For example, some blocks are accessed once a month and others are accessed every day.

My question is: how can I predict the access pattern of each block well in real time?

I tried the regressors in sklearn.ensemble, but they don't have partial_fit, so they can't be updated in real time. I found linear_model.SGDRegressor and neural_network.MLPRegressor, which do have partial_fit, but they only predict one result. (The result for each block shouldn't be the same, because the blocks have different access times.)

I want to predict the access times of each block in real time, but I don't know how to achieve that. Should I change the algorithm?

Thanks

From rth.yurchak at pm.me Mon Feb 11 13:14:54 2019
From: rth.yurchak at pm.me (Roman Yurchak)
Date: Mon, 11 Feb 2019 18:14:54 +0000
Subject: [scikit-learn] VOTE: scikit-learn governance document
In-Reply-To: <20190211084756.3jttddocqrxwrli2@phare.normalesup.org>
References: <20190211005345.0B5B146400A2@webmail.sinamail.sina.com.cn> <20190211084756.3jttddocqrxwrli2@phare.normalesup.org>
Message-ID:

+1 as well

Roman

From nelle.varoquaux at gmail.com Mon
Feb 11 13:20:15 2019
From: nelle.varoquaux at gmail.com (Nelle Varoquaux)
Date: Mon, 11 Feb 2019 10:20:15 -0800
Subject: [scikit-learn] VOTE: scikit-learn governance document
In-Reply-To:
References: <20190211005345.0B5B146400A2@webmail.sinamail.sina.com.cn> <20190211084756.3jttddocqrxwrli2@phare.normalesup.org>
Message-ID:

+1

From tom.duprelatour at orange.fr Mon Feb 11 13:27:39 2019
From: tom.duprelatour at orange.fr (Tom DLT)
Date: Mon, 11 Feb 2019 10:27:39 -0800
Subject: [scikit-learn] VOTE: scikit-learn governance document
In-Reply-To:
References: <20190211005345.0B5B146400A2@webmail.sinamail.sina.com.cn> <20190211084756.3jttddocqrxwrli2@phare.normalesup.org>
Message-ID:

+1 as well

From g.lemaitre58 at gmail.com Mon Feb 11 17:15:21 2019
From: g.lemaitre58 at gmail.com (Guillaume Lemaître)
Date: Mon, 11 Feb 2019 23:15:21 +0100
Subject: [scikit-learn] VOTE: scikit-learn governance document
In-Reply-To:
References: <20190211005345.0B5B146400A2@webmail.sinamail.sina.com.cn> <20190211084756.3jttddocqrxwrli2@phare.normalesup.org>
Message-ID:

+1 as well

--
Guillaume Lemaitre
INRIA Saclay - Parietal team
Center for Data Science Paris-Saclay
https://glemaitre.github.io/

From bdholt1 at gmail.com Tue Feb 12 03:18:30 2019
From: bdholt1 at gmail.com (Brian Holt)
Date: Tue, 12 Feb 2019 08:18:30 +0000
Subject: [scikit-learn] VOTE: scikit-learn governance document
In-Reply-To:
References: <20190211005345.0B5B146400A2@webmail.sinamail.sina.com.cn> <20190211084756.3jttddocqrxwrli2@phare.normalesup.org>
Message-ID:

I've not been active for a long time but it's a +1 from me too.

Brian
>>>> > >>>> >> >> Thank you for participating, >>>> > >>>> >> >> Andy >>>> > >>>> >> >> _______________________________________________ >>>> >> >> scikit-learn mailing list >>>> >> >> scikit-learn at python.org >>>> >> >> >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> > >>>> >> > _______________________________________________ >>>> >> > scikit-learn mailing list >>>> >> > scikit-learn at python.org >>>> >> > >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >> _______________________________________________ >>>> >> scikit-learn mailing list >>>> >> scikit-learn at python.org >>>> >> >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> > >>>> >> _______________________________________________ >>>> >> scikit-learn mailing list >>>> >> scikit-learn at python.org >>>> >> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >> _______________________________________________ >>>> >> scikit-learn mailing list >>>> >> scikit-learn at python.org >>>> >> https://mail.python.org/mailman/listinfo/scikit-learn >>>> > >>>> >> _______________________________________________ >>>> >> scikit-learn mailing list >>>> >> scikit-learn at python.org >>>> >> https://mail.python.org/mailman/listinfo/scikit-learn >>>> > >>>> >> _______________________________________________ >>>> >> scikit-learn mailing list >>>> >> scikit-learn at python.org >>>> >> https://mail.python.org/mailman/listinfo/scikit-learn >>>> > >>>> > >>>> >> _______________________________________________ >>>> >> scikit-learn mailing list >>>> >> scikit-learn at python.org >>>> >> https://mail.python.org/mailman/listinfo/scikit-learn >>>> > >>>> > >>>> > -- >>>> > Gael Varoquaux >>>> > Senior Researcher, INRIA Parietal >>>> > NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France >>>> > Phone: ++ 33-1-69-08-79-68 >>>> > http://gael-varoquaux.info >>>> http://twitter.com/GaelVaroquaux >>>> > _______________________________________________ >>>> > scikit-learn mailing list 
>
> --
> Guillaume Lemaitre
> INRIA Saclay - Parietal team
> Center for Data Science Paris-Saclay
> https://glemaitre.github.io/

From pahome.chen at mirlab.org Tue Feb 12 21:04:18 2019
From: pahome.chen at mirlab.org (lampahome)
Date: Wed, 13 Feb 2019 10:04:18 +0800
Subject: [scikit-learn] How to deal with hierarchical and real-time analysis in machine learning?
Message-ID:

For example, I may have a huge number of different regions, and each region
may have many or few points.

I also want to analyze the newest and the older data in real time, but I
don't want to load all the data into memory because I don't have enough
memory.

I thought I could use partial_fit to accept streaming data as new data
comes in.

But the incoming data is hierarchical, and it is hard to cluster because I
don't have the older and newer data together.

How can I design the system better?

Thanks
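A minimal sketch of the partial_fit idea raised in the message above, assuming the hierarchy can be handled by keeping one incremental model per region and that each incoming chunk fits in memory on its own. MiniBatchKMeans and its partial_fit method are part of scikit-learn; the region names, the consume helper, the number of clusters and the simulated stream are hypothetical placeholders, not taken from the original post.

```python
from collections import defaultdict

import numpy as np
from sklearn.cluster import MiniBatchKMeans


def make_model():
    # Hypothetical settings: n_clusters has to be chosen per application.
    return MiniBatchKMeans(n_clusters=2, n_init=1, random_state=0)


# One incremental model per region, created lazily the first time a
# region shows up in the stream.
models = defaultdict(make_model)


def consume(region, batch):
    """Update the model of one region with a new chunk of points."""
    models[region].partial_fit(batch)


rng = np.random.RandomState(0)
for step in range(5):
    for region in ("north", "south"):
        # Simulated stream: only the current chunk is held in memory.
        batch = rng.normal(size=(50, 2))
        consume(region, batch)

# Each region now has its own set of cluster centers.
print({region: m.cluster_centers_.shape for region, m in models.items()})
```

Only the current chunk and the per-region cluster centers are kept, so older data never has to be reloaded. Whether one model per region is the right granularity depends on how the regions relate to each other, which the post does not say.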

From maxhalford25 at gmail.com Wed Feb 13 05:13:26 2019
From: maxhalford25 at gmail.com (Max Halford)
Date: Wed, 13 Feb 2019 11:13:26 +0100
Subject: [scikit-learn] How to deal with hierarchical and real-time analysis in machine learning?
In-Reply-To:
References:
Message-ID:

Hey lampahome,

I'm currently working on an online learning library called creme:
https://creme-ml.github.io/. Each estimator and transformer has a
fit_one(x, y) method so that you can learn from a stream of data. I've only
been working on it for a bit less than a month now but it might be of
interest to you nonetheless. Maybe it will give you some ideas. There's an
introductory tutorial on GitHub.

Kind regards.

On 13/02/2019, lampahome wrote:
> For example, I may have a huge number of different regions, and each
> region may have many or few points.
>
> I also want to analyze the newest and the older data in real time, but I
> don't want to load all the data into memory because I don't have enough
> memory.
>
> I thought I could use partial_fit to accept streaming data as new data
> comes in.
>
> But the incoming data is hierarchical, and it is hard to cluster because
> I don't have the older and newer data together.
>
> How can I design the system better?
>
> Thanks
>

--
Max Halford
+336 28 25 13 38

From anni-bauer at outlook.com Wed Feb 13 08:09:36 2019
From: anni-bauer at outlook.com (Anni Bauer)
Date: Wed, 13 Feb 2019 13:09:36 +0000
Subject: [scikit-learn] cross_validate() with HMM
Message-ID:

Hi!

I want to be able to run each fold of a k-fold cross-validation in
parallel, using all 6 of my CPUs at once. My model is a hidden Markov
model. In every fold, I want to train it on the training portion of the
data, extract the anomaly score (negative log-likelihood) of each sequence
in the test portion, and use ROC as the evaluation technique.

I have found the function cross_validate(), which seems to provide the
option of running things in parallel with n_jobs = -1.
I assume the estimator is then my HMM model. As of now I'm using pomegranate
to train the model and extract the anomaly score of the test sequences.

I don't understand how to call the cross_validate function with the right
arguments for my HMM model. None of the examples I've seen use an HMM. I'm
confused about where to specify the number of hidden states if I'm not
calling my usual pomegranate function from_samples(), which I've used
before. Also, how can I extract the anomaly scores within each fold using
this function? I'm unsure what exactly is happening within the
cross_validate function and how to control it the way I need.

If anyone has an example or explanation or another idea on how to run the
folds in parallel, I would really appreciate it!

This is my attempt at using cross_validate, which gets stuck or seems to
not be running through (although I'm quite sure I'm not using it properly):

    import pomegranate
    from sklearn.model_selection import cross_validate

    model = pomegranate.HiddenMarkovModel()
    results = cross_validate(model, listToUse, y=None, groups=None,
                             scoring=None, cv=3, n_jobs=-1, verbose=10)
    print(results)

Below is how I've manually set my cross-validation up as of now:

    import numpy
    from sklearn.model_selection import KFold

    # Split the data into 10 shuffled (train, test) folds.
    listExample = []
    kfold = KFold(n_splits=10, shuffle=True)
    for train, test in kfold.split(listToUse):
        listExample.append([listToUse[train], listToUse[test]])

    # Train on each training fold, then average the log-probabilities
    # of the test sequences. hmm.hmm is my own training helper.
    scoreList = []
    for ex in listExample:
        hmmModel = hmm.hmm(ex[0])
        scoreListFold = []
        for li in ex[1]:
            prob = hmmModel.log_probability(li)
            scoreListFold.append(prob)
        scoreList.append(numpy.mean(scoreListFold))
    avg = numpy.mean(scoreList)

Thanks again!
Anni

From t3kcit at gmail.com Wed Feb 13 15:37:54 2019
From: t3kcit at gmail.com (Andreas Mueller)
Date: Wed, 13 Feb 2019 15:37:54 -0500
Subject: [scikit-learn] Sprint discussion points?
Message-ID: <8498bf4d-83da-08d5-9e04-35d9c534199b@gmail.com>

Hey all.

Should we collect some discussion points for the sprint?

There's an unusual amount of core-devs present and I think we should seize
the opportunity.
Maybe we should create a page in the wiki or add it to the sprint page?

Things that are high on my list of priorities are:

* slicing pipelines
* add get_feature_names to pipelines
* freezing estimator
* faster multi-metric scoring
* fit_transform doing something other than fit.transform
* imbalanced-learn interface / subsampling in pipelines
* Specifying search spaces and valid hyper parameters
  (https://github.com/scikit-learn/scikit-learn/issues/13031).
* allowing EstimatorCV-style speed-up in GridSearches
* storing pandas column names and using them as feature names

Trying to discuss all of these might be too much, but maybe we can figure
out a subset and make sure we have SLEPs to discuss?
Most of these issues are on the roadmap, issue 13031 is related to #18 but
not directly on the roadmap.

Thanks,
Andy

From joel.nothman at gmail.com Wed Feb 13 20:08:02 2019
From: joel.nothman at gmail.com (Joel Nothman)
Date: Thu, 14 Feb 2019 12:08:02 +1100
Subject: [scikit-learn] Sprint discussion points?
In-Reply-To: <8498bf4d-83da-08d5-9e04-35d9c534199b@gmail.com>
References: <8498bf4d-83da-08d5-9e04-35d9c534199b@gmail.com>
Message-ID:

Yes, I was thinking the same. I think there are some other core issues to
solve, such as:

* euclidean_distances numerical issues
* commitment to ARM testing and debugging
* logistic regression stability

We should also nut out OPTICS issues or remove it from 0.21. I'm still keen
on trying to work out sample props (supporting weighted scoring at least),
but perhaps I'm being persuaded this will never be a top-priority
requirement, and the solutions add much complexity.

On Thu, 14 Feb 2019 at 07:39, Andreas Mueller wrote:

> Hey all.
>
> Should we collect some discussion points for the sprint?
>
> There's an unusual amount of core-devs present and I think we should seize
> the opportunity.
> Maybe we should create a page in the wiki or add it to the sprint page?
>
> Things that are high on my list of priorities are:
>
> - slicing pipelines
> - add get_feature_names to pipelines
> - freezing estimator
> - faster multi-metric scoring
> - fit_transform doing something other than fit.transform
> - imbalanced-learn interface / subsampling in pipelines
> - Specifying search spaces and valid hyper parameters (
> https://github.com/scikit-learn/scikit-learn/issues/13031).
> - allowing EstimatorCV-style speed-up in GridSearches
> - storing pandas column names and using them as feature names
>
>
> Trying to discuss all of these might be too much, but maybe we can figure
> out a subset and make sure we have SLEPs to discuss?
> Most of these issues are on the roadmap, issue 13031 is related to #18 but
> not directly on the roadmap.
>
> Thanks,
> Andy

From t3kcit at gmail.com Wed Feb 13 20:54:54 2019
From: t3kcit at gmail.com (Andreas Mueller)
Date: Wed, 13 Feb 2019 20:54:54 -0500
Subject: [scikit-learn] Sprint discussion points?
In-Reply-To:
References: <8498bf4d-83da-08d5-9e04-35d9c534199b@gmail.com>
Message-ID: <1c8097ae-a2c9-b67c-7feb-c6ec461602b0@gmail.com>

Do you have a reference for the logistic regression stability? Is it
convergence warnings?

Happy to discuss the other two issues, though I feel they seem easier than
most of what's on my list.

I have no idea what's going on with OPTICS tbh, and I'll leave it up to you
and the others to decide whether that's something we should discuss.
I can try to read up and weigh in but that might not be the most effective
way to do it.

the sample props is something I left out because I personally don't feel
it's a priority compared to all the other things;
my students have basically no way to figure out what features the
coefficients in their linear model correspond to, that seems a bit more
important to me.

We can put it on the discussion list again, but I'm not super enthusiastic
about it.

How should we prioritize things?

On 2/13/19 8:08 PM, Joel Nothman wrote:
> Yes, I was thinking the same. I think there are some other core issues
> to solve, such as:
>
> * euclidean_distances numerical issues
> * commitment to ARM testing and debugging
> * logistic regression stability
>
> We should also nut out OPTICS issues or remove it from 0.21. I'm still
> keen on trying to work out sample props (supporting weighted scoring
> at least), but perhaps I'm being persuaded this will never be a
> top-priority requirement, and the solutions add much complexity.
>
> On Thu, 14 Feb 2019 at 07:39, Andreas Mueller wrote:
>
>     Hey all.
>
>     Should we collect some discussion points for the sprint?
>
>     There's an unusual amount of core-devs present and I think we
>     should seize the opportunity.
>     Maybe we should create a page in the wiki or add it to the sprint
>     page?
>
>     Things that are high on my list of priorities are:
>
>       * slicing pipelines
>       * add get_feature_names to pipelines
>       * freezing estimator
>       * faster multi-metric scoring
>       * fit_transform doing something other than fit.transform
>       * imbalanced-learn interface / subsampling in pipelines
>       * Specifying search spaces and valid hyper parameters
>         (https://github.com/scikit-learn/scikit-learn/issues/13031).
>       * allowing EstimatorCV-style speed-up in GridSearches
>       * storing pandas column names and using them as feature names
>
>
>     Trying to discuss all of these might be too much, but maybe we can
>     figure out a subset and make sure we have SLEPs to discuss?
>     Most of these issues are on the roadmap, issue 13031 is related to
>     #18 but not directly on the roadmap.
>
>     Thanks,
>     Andy

From joel.nothman at gmail.com Wed Feb 13 23:28:55 2019
From: joel.nothman at gmail.com (Joel Nothman)
Date: Thu, 14 Feb 2019 15:28:55 +1100
Subject: [scikit-learn] Sprint discussion points?
In-Reply-To: <1c8097ae-a2c9-b67c-7feb-c6ec461602b0@gmail.com>
References: <8498bf4d-83da-08d5-9e04-35d9c534199b@gmail.com> <1c8097ae-a2c9-b67c-7feb-c6ec461602b0@gmail.com>
Message-ID:

Convergence in logistic regression
(https://github.com/scikit-learn/scikit-learn/issues/11536) is indeed one
problem (and it presents a general issue of what max_iter means when you
have several solvers, or how good defaults are selected). But I was sure we
had problems with non-determinism on some platforms... but now can't find.

> my students have basically no way to figure out what features the
> coefficients in their linear model correspond to, that seems a bit more
> important to me.

Yes, I agree... Assuming coefficients are helpful, rather than using
permutation-based measures of importance, for instance.

I generally think a review of distances might be a good thing at some point,
given the confusing triplication across sklearn.neighbors,
sklearn.metrics.pairwise, scipy.spatial... and that minkowski,p=2 is not
implemented the same as euclidean.

On Thu, 14 Feb 2019 at 12:56, Andreas Mueller wrote:

> Do you have a reference for the logistic regression stability? Is it
> convergence warnings?
>
> Happy to discuss the other two issues, though I feel they seem easier than
> most of what's on my list.
>
> I have no idea what's going on with OPTICS tbh, and I'll leave it up to
> you and the others to decide whether that's something we should discuss.
> I can try to read up and weigh in but that might not be the most effective
> way to do it.
>
> the sample props is something I left out because I personally don't feel
> it's a priority compared to all the other things;
> my students have basically no way to figure out what features the
> coefficients in their linear model correspond to, that seems a bit more
> important to me.
>
> We can put it on the discussion list again, but I'm not super enthusiastic
> about it.
>
> How should we prioritize things?
>
>
> On 2/13/19 8:08 PM, Joel Nothman wrote:
>
> Yes, I was thinking the same. I think there are some other core issues to
> solve, such as:
>
> * euclidean_distances numerical issues
> * commitment to ARM testing and debugging
> * logistic regression stability
>
> We should also nut out OPTICS issues or remove it from 0.21. I'm still
> keen on trying to work out sample props (supporting weighted scoring at
> least), but perhaps I'm being persuaded this will never be a top-priority
> requirement, and the solutions add much complexity.
>
> On Thu, 14 Feb 2019 at 07:39, Andreas Mueller wrote:
>
>> Hey all.
>>
>> Should we collect some discussion points for the sprint?
>>
>> There's an unusual amount of core-devs present and I think we should
>> seize the opportunity.
>> Maybe we should create a page in the wiki or add it to the sprint page?
>>
>> Things that are high on my list of priorities are:
>>
>> - slicing pipelines
>> - add get_feature_names to pipelines
>> - freezing estimator
>> - faster multi-metric scoring
>> - fit_transform doing something other than fit.transform
>> - imbalanced-learn interface / subsampling in pipelines
>> - Specifying search spaces and valid hyper parameters (
>> https://github.com/scikit-learn/scikit-learn/issues/13031).
>> - allowing EstimatorCV-style speed-up in GridSearches
>> - storing pandas column names and using them as feature names
>>
>>
>> Trying to discuss all of these might be too much, but maybe we can figure
>> out a subset and make sure we have SLEPs to discuss?
>> Most of these issues are on the roadmap, issue 13031 is related to #18
>> but not directly on the roadmap.
>>
>> Thanks,
>> Andy

From g.lemaitre58 at gmail.com Thu Feb 14 05:46:13 2019
From: g.lemaitre58 at gmail.com (Guillaume Lemaître)
Date: Thu, 14 Feb 2019 11:46:13 +0100
Subject: [scikit-learn] Sprint discussion points?
In-Reply-To:
References: <8498bf4d-83da-08d5-9e04-35d9c534199b@gmail.com> <1c8097ae-a2c9-b67c-7feb-c6ec461602b0@gmail.com>
Message-ID:

I am really interested in the union of the list given by Andy and Joel.

I'd like to have some discussions related to the "impute" module. Compared
to the other topics, it is not a high priority discussion though.

On Thu, 14 Feb 2019 at 05:31, Joel Nothman wrote:

> Convergence in logistic regression (
> https://github.com/scikit-learn/scikit-learn/issues/11536) is indeed one
> problem (and it presents a general issue of what max_iter means when you
> have several solvers, or how good defaults are selected). But I was sure we
> had problems with non-determinism on some platforms... but now can't find.
>
> > my students have basically no way to figure out what features the
> coefficients in their linear model correspond to, that seems a bit more
> important to me.
>
> Yes, I agree... Assuming coefficients are helpful, rather than using
> permutation-based measures of importance, for instance.
>
> I generally think a review of distances might be a good thing at some
> point, given the confusing triplication across sklearn.neighbors,
> sklearn.metrics.pairwise, scipy.spatial... and that minkowski,p=2 is not
> implemented the same as euclidean.
>
>
> On Thu, 14 Feb 2019 at 12:56, Andreas Mueller wrote:
>
>> Do you have a reference for the logistic regression stability? Is it
>> convergence warnings?
>>
>> Happy to discuss the other two issues, though I feel they seem easier
>> than most of what's on my list.
>>
>> I have no idea what's going on with OPTICS tbh, and I'll leave it up to
>> you and the others to decide whether that's something we should discuss.
>> I can try to read up and weigh in but that might not be the most
>> effective way to do it.
>>
>> the sample props is something I left out because I personally don't feel
>> it's a priority compared to all the other things;
>> my students have basically no way to figure out what features the
>> coefficients in their linear model correspond to, that seems a bit more
>> important to me.
>>
>> We can put it on the discussion list again, but I'm not super
>> enthusiastic about it.
>>
>> How should we prioritize things?
>>
>>
>> On 2/13/19 8:08 PM, Joel Nothman wrote:
>>
>> Yes, I was thinking the same. I think there are some other core issues to
>> solve, such as:
>>
>> * euclidean_distances numerical issues
>> * commitment to ARM testing and debugging
>> * logistic regression stability
>>
>> We should also nut out OPTICS issues or remove it from 0.21. I'm still
>> keen on trying to work out sample props (supporting weighted scoring at
>> least), but perhaps I'm being persuaded this will never be a top-priority
>> requirement, and the solutions add much complexity.
>>
>> On Thu, 14 Feb 2019 at 07:39, Andreas Mueller wrote:
>>
>>> Hey all.
>>>
>>> Should we collect some discussion points for the sprint?
>>>
>>> There's an unusual amount of core-devs present and I think we should
>>> seize the opportunity.
>>> Maybe we should create a page in the wiki or add it to the sprint page?
>>>
>>> Things that are high on my list of priorities are:
>>>
>>> - slicing pipelines
>>> - add get_feature_names to pipelines
>>> - freezing estimator
>>> - faster multi-metric scoring
>>> - fit_transform doing something other than fit.transform
>>> - imbalanced-learn interface / subsampling in pipelines
>>> - Specifying search spaces and valid hyper parameters (
>>> https://github.com/scikit-learn/scikit-learn/issues/13031).
>>> - allowing EstimatorCV-style speed-up in GridSearches
>>> - storing pandas column names and using them as feature names
>>>
>>>
>>> Trying to discuss all of these might be too much, but maybe we can
>>> figure out a subset and make sure we have SLEPs to discuss?
>>> Most of these issues are on the roadmap, issue 13031 is related to #18
>>> but not directly on the roadmap.
>>>
>>> Thanks,
>>> Andy

--
Guillaume Lemaitre
INRIA Saclay - Parietal team
Center for Data Science Paris-Saclay
https://glemaitre.github.io/

From adrin.jalali at gmail.com Thu Feb 14 08:05:44 2019
From: adrin.jalali at gmail.com (Adrin)
Date: Thu, 14 Feb 2019 14:05:44 +0100
Subject: [scikit-learn] Sprint discussion points?
In-Reply-To:
References: <8498bf4d-83da-08d5-9e04-35d9c534199b@gmail.com> <1c8097ae-a2c9-b67c-7feb-c6ec461602b0@gmail.com>
Message-ID:

I've been working on some bias mitigation metrics and methods, and that use
case changes the data, as well as doing up/down sampling, as a transformer.
Almost all those methods also need sample properties for the observations
to work. I'm trying to make them "sklearn compatible", but for now it's
pretty hacky. So I'd be happy if we discuss the union of what Joel and Andy
suggest.

Cheers,
Adrin.

On Thu, Feb 14, 2019, 11:47 Guillaume Lemaître wrote:

> I am really interested in the union of the list given by Andy and Joel.
>
> I'd like to have some discussions related to the "impute" module. Compared
> to the other topics, it is not a high priority discussion though.
>
> On Thu, 14 Feb 2019 at 05:31, Joel Nothman wrote:
>
>> Convergence in logistic regression (
>> https://github.com/scikit-learn/scikit-learn/issues/11536) is indeed one
>> problem (and it presents a general issue of what max_iter means when you
>> have several solvers, or how good defaults are selected). But I was sure we
>> had problems with non-determinism on some platforms... but now can't find.
>>
>> > my students have basically no way to figure out what features the
>> coefficients in their linear model correspond to, that seems a bit more
>> important to me.
>>
>> Yes, I agree... Assuming coefficients are helpful, rather than using
>> permutation-based measures of importance, for instance.
>>
>> I generally think a review of distances might be a good thing at some
>> point, given the confusing triplication across sklearn.neighbors,
>> sklearn.metrics.pairwise, scipy.spatial... and that minkowski,p=2 is not
>> implemented the same as euclidean.
>>
>>
>> On Thu, 14 Feb 2019 at 12:56, Andreas Mueller wrote:
>>
>>> Do you have a reference for the logistic regression stability? Is it
>>> convergence warnings?
>>>
>>> Happy to discuss the other two issues, though I feel they seem easier
>>> than most of what's on my list.
>>>
>>> I have no idea what's going on with OPTICS tbh, and I'll leave it up to
>>> you and the others to decide whether that's something we should discuss.
>>> I can try to read up and weigh in but that might not be the most
>>> effective way to do it.
>>>
>>> the sample props is something I left out because I personally don't feel
>>> it's a priority compared to all the other things;
>>> my students have basically no way to figure out what features the
>>> coefficients in their linear model correspond to, that seems a bit more
>>> important to me.
>>>
>>> We can put it on the discussion list again, but I'm not super
>>> enthusiastic about it.
>>>
>>> How should we prioritize things?
>>>
>>>
>>> On 2/13/19 8:08 PM, Joel Nothman wrote:
>>>
>>> Yes, I was thinking the same. I think there are some other core issues
>>> to solve, such as:
>>>
>>> * euclidean_distances numerical issues
>>> * commitment to ARM testing and debugging
>>> * logistic regression stability
>>>
>>> We should also nut out OPTICS issues or remove it from 0.21. I'm still
>>> keen on trying to work out sample props (supporting weighted scoring at
>>> least), but perhaps I'm being persuaded this will never be a top-priority
>>> requirement, and the solutions add much complexity.
>>>
>>> On Thu, 14 Feb 2019 at 07:39, Andreas Mueller wrote:
>>>
>>>> Hey all.
>>>>
>>>> Should we collect some discussion points for the sprint?
>>>>
>>>> There's an unusual amount of core-devs present and I think we should
>>>> seize the opportunity.
>>>> Maybe we should create a page in the wiki or add it to the sprint page?
>>>>
>>>> Things that are high on my list of priorities are:
>>>>
>>>> - slicing pipelines
>>>> - add get_feature_names to pipelines
>>>> - freezing estimator
>>>> - faster multi-metric scoring
>>>> - fit_transform doing something other than fit.transform
>>>> - imbalanced-learn interface / subsampling in pipelines
>>>> - Specifying search spaces and valid hyper parameters (
>>>> https://github.com/scikit-learn/scikit-learn/issues/13031).
>>>> - allowing EstimatorCV-style speed-up in GridSearches
>>>> - storing pandas column names and using them as feature names
>>>>
>>>>
>>>> Trying to discuss all of these might be too much, but maybe we can
>>>> figure out a subset and make sure we have SLEPs to discuss?
>>>> Most of these issues are on the roadmap, issue 13031 is related to #18
>>>> but not directly on the roadmap.
>>>>
>>>> Thanks,
>>>> Andy
>
>
> --
> Guillaume Lemaitre
> INRIA Saclay - Parietal team
> Center for Data Science Paris-Saclay
> https://glemaitre.github.io/

From t3kcit at gmail.com Thu Feb 14 08:26:57 2019
From: t3kcit at gmail.com (Andreas Mueller)
Date: Thu, 14 Feb 2019 08:26:57 -0500
Subject: [scikit-learn] Sprint discussion points?
In-Reply-To:
References: <8498bf4d-83da-08d5-9e04-35d9c534199b@gmail.com> <1c8097ae-a2c9-b67c-7feb-c6ec461602b0@gmail.com>
Message-ID:

On 2/13/19 11:28 PM, Joel Nothman wrote:
> Convergence in logistic regression
> (https://github.com/scikit-learn/scikit-learn/issues/11536) is indeed
> one problem (and it presents a general issue of what max_iter means
> when you have several solvers, or how good defaults are selected). But
> I was sure we had problems with non-determinism on some platforms...
> but now can't find.
>
> > my students have basically no way to figure out what features the
> coefficients in their linear model correspond to, that seems a bit
> more important to me.
>
> Yes, I agree...
> Assuming coefficients are helpful, rather than using
> permutation-based measures of importance, for instance.

You would apply the permutation based feature importances before any
preprocessing?
I guess there's a case to be made for either option.
I think there are good reasons to look at coefficients though.

> I generally think a review of distances might be a good thing at some
> point, given the confusing triplication across sklearn.neighbors,
> sklearn.metrics.pairwise, scipy.spatial... and that minkowski,p=2 is
> not implemented the same as euclidean.
>

Yes, I agree. I guess right now I'm more enthusiastic about new
features/APIs than decreasing technical debt, maybe because you're the one
dealing with the technical debt ;)

From t3kcit at gmail.com Thu Feb 14 08:31:27 2019
From: t3kcit at gmail.com (Andreas Mueller)
Date: Thu, 14 Feb 2019 08:31:27 -0500
Subject: [scikit-learn] Sprint discussion points?
In-Reply-To:
References: <8498bf4d-83da-08d5-9e04-35d9c534199b@gmail.com> <1c8097ae-a2c9-b67c-7feb-c6ec461602b0@gmail.com>
Message-ID: <9708f3ba-af9f-c7f7-3613-55ad9fd982d2@gmail.com>

As I said, I think it's too much and we need to prioritize.

We could either rank issues and start with some and see how far we get, or
we could go as far as to schedule meetings on the different topics.

Also, I'll be only arriving Tuesday late morning, I think.

On 2/14/19 8:05 AM, Adrin wrote:
> I've been working on some bias mitigation metrics and methods and that
> usecase
> changes the data as well as up/down sampling as a transformer. Almost
> all those
> methods also need sample properties for the observations to work. I'm
> trying to
> make them "sklearn compatible", but for now it's pretty hacky. So I'd
> be happy if
> we discuss the union of what Joel and Andy suggest.
>
> Cheers,
> Adrin.
>
> On Thu, Feb 14, 2019, 11:47 Guillaume Lemaître wrote:
>
>     I am really interested in the union of the list given by Andy and
>     Joel.
>
>     I'd like to have some discussions related to the "impute" module.
>     Compared to the other topics, it is not a high-priority discussion,
>     though.
>
>     On Thu, 14 Feb 2019 at 05:31, Joel Nothman wrote:
>
>         Convergence in logistic regression
>         (https://github.com/scikit-learn/scikit-learn/issues/11536) is
>         indeed one problem (and it presents a general issue of what
>         max_iter means when you have several solvers, or how good
>         defaults are selected). But I was sure we had problems with
>         non-determinism on some platforms... but now I can't find them.
>
>         > my students have basically no way to figure out what
>         > features the coefficients in their linear model correspond to,
>         > that seems a bit more important to me.
>
>         Yes, I agree... Assuming coefficients are helpful, rather than
>         using permutation-based measures of importance, for instance.
>
>         I generally think a review of distances might be a good thing
>         at some point, given the confusing triplication across
>         sklearn.neighbors, sklearn.metrics.pairwise, scipy.spatial...
>         and that minkowski,p=2 is not implemented the same as euclidean.
>
>         On Thu, 14 Feb 2019 at 12:56, Andreas Mueller wrote:
>
>             Do you have a reference for the logistic regression
>             stability? Is it convergence warnings?
>
>             Happy to discuss the other two issues, though I feel they
>             seem easier than most of what's on my list.
>
>             I have no idea what's going on with OPTICS tbh, and I'll
>             leave it up to you and the others to decide whether that's
>             something we should discuss.
>             I can try to read up and weigh in but that might not be
>             the most effective way to do it.
>
>             The sample props is something I left out because I
>             personally don't feel it's a priority compared to all the
>             other things;
>             my students have basically no way to figure out what
>             features the coefficients in their linear model correspond
>             to, that seems a bit more important to me.
>
>             We can put it on the discussion list again, but I'm not
>             super enthusiastic about it.
>
>             How should we prioritize things?
>
>             On 2/13/19 8:08 PM, Joel Nothman wrote:
>>             Yes, I was thinking the same. I think there are some
>>             other core issues to solve, such as:
>>
>>             * euclidean_distances numerical issues
>>             * commitment to ARM testing and debugging
>>             * logistic regression stability
>>
>>             We should also nut out OPTICS issues or remove it from
>>             0.21. I'm still keen on trying to work out sample props
>>             (supporting weighted scoring at least), but perhaps I'm
>>             being persuaded this will never be a top-priority
>>             requirement, and the solutions add much complexity.
>>
>>             On Thu, 14 Feb 2019 at 07:39, Andreas Mueller wrote:
>>
>>                 Hey all.
>>
>>                 Should we collect some discussion points for the sprint?
>>
>>                 There's an unusual number of core devs present and I
>>                 think we should seize the opportunity.
>>                 Maybe we should create a page in the wiki or add it
>>                 to the sprint page?
>>
>>                 Things that are high on my list of priorities are:
>>
>>                 * slicing pipelines
>>                 * add get_feature_names to pipelines
>>                 * freezing estimator
>>                 * faster multi-metric scoring
>>                 * fit_transform doing something other than
>>                   fit.transform
>>                 * imbalanced-learn interface / subsampling in pipelines
>>                 * Specifying search spaces and valid hyper
>>                   parameters
>>                   (https://github.com/scikit-learn/scikit-learn/issues/13031).
>>                 * allowing EstimatorCV-style speed-up in GridSearches
>>                 * storing pandas column names and using them as
>>                   feature names
>>
>>                 Trying to discuss all of these might be too much, but
>>                 maybe we can figure out a subset and make sure we
>>                 have SLEPs to discuss?
>>                 Most of these issues are on the roadmap; issue 13031
>>                 is related to #18 but not directly on the roadmap.
>>
>>                 Thanks,
>>                 Andy

From skim22 at memphis.edu Thu Feb 14 11:30:47 2019
From: skim22 at memphis.edu (skim22)
Date: Thu, 14 Feb 2019 16:30:47 +0000
Subject: [scikit-learn] [Question & Help]The criterion of data size for choosing a right algorithm.
In-Reply-To:
References:
Message-ID:

Dear Sir or Madam,

Good morning,

My name is Steven Kim from Memphis, and I am a graduate student at the
University of Memphis.

Recently, I found the page on choosing the right estimator on the
official website
(https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html).

It was very helpful in working out which algorithms I should use.

However, I would like to know more about one detail on the page.
My question concerns the sample-size criteria, such as ">50" and
"100k>", that appear before each decision.

Could you tell me the grounds (e.g. academic papers) for these sample
sizes?

It would help me understand them more deeply.

Regards,

Steven Kim

From niourf at gmail.com Thu Feb 14 11:40:05 2019
From: niourf at gmail.com (Nicolas Hug)
Date: Thu, 14 Feb 2019 11:40:05 -0500
Subject: [scikit-learn] Sprint discussion points?
In-Reply-To: <9708f3ba-af9f-c7f7-3613-55ad9fd982d2@gmail.com>
References: <8498bf4d-83da-08d5-9e04-35d9c534199b@gmail.com> <1c8097ae-a2c9-b67c-7feb-c6ec461602b0@gmail.com> <9708f3ba-af9f-c7f7-3613-55ad9fd982d2@gmail.com>
Message-ID: <455b2ad2-d978-dad6-653a-99da5b9e9e16@gmail.com>

> or we could go as far as to schedule meetings on the different topics.

Given the number of issues to discuss, this is probably the best
approach IMO.

On 2/14/19 8:31 AM, Andreas Mueller wrote:
>
> As I said, I think it's too much and we need to prioritize.
>
> We could either rank issues and start with some and see how far we
> get, or we could go as far as to schedule meetings on the different
> topics.
>
> Also, I'll be only arriving Tuesday late morning, I think.

From olivier.grisel at ensta.org Fri Feb 15 10:04:32 2019
From: olivier.grisel at ensta.org (Olivier Grisel)
Date: Fri, 15 Feb 2019 16:04:32 +0100
Subject: [scikit-learn] Sprint discussion points?
In-Reply-To: <8498bf4d-83da-08d5-9e04-35d9c534199b@gmail.com>
References: <8498bf4d-83da-08d5-9e04-35d9c534199b@gmail.com>
Message-ID:

I would also add generalizing early stopping options to most estimators.
This is a bit related to Joel's point on max_iter consistency in
LogisticRegression.
--
Olivier

From huantd at gmail.com Fri Feb 15 19:52:36 2019
From: huantd at gmail.com (Huan Tran)
Date: Fri, 15 Feb 2019 19:52:36 -0500
Subject: [scikit-learn] inconsistency across version
Message-ID:

Dear community,

I ran a very small PCA on a 3D data set and printed
explained_variance_. I found that with scikit-learn 0.18.1 and 0.20.2,
the results are significantly different. In particular, for 0.18.1 I got

+3.875925353581E+00 +3.270175297443E+00 +2.207814537475E+00

and with 0.20.2, I got

+4.651110424297E+00 +3.924210356932E+00 +2.649377444970E+00

Does anyone have a hint on what is going on? FYI, my data and code are
enclosed. Many thanks.

Huan

My data is

 -3.117642E+00,  1.453819E+00, -7.952874E-02
  3.081224E+00,  1.453819E+00, -7.952874E-02
  1.376932E-01, -2.491454E+00, -1.908521E-01
  9.578602E-02,  3.632759E+00, -1.908521E-01
 -1.238644E-01,  5.396424E-02, -3.147031E+00
  6.335262E-01,  1.393937E+00,  2.500474E+00

and my code is

import pandas as pd
import numpy as np
from sklearn import decomposition

df = pd.read_csv('data', delimiter=',', header=None)
data = np.array(df)

X = data[:,:]
data_size   = X.shape[0]
feature_dim = X.shape[1]

print X

pca = decomposition.PCA(n_components=feature_dim)
X_transformed = pca.fit_transform(X)
print "%+4.12E %+4.12E %+4.12E" %(pca.explained_variance_[0],
pca.explained_variance_[1], pca.explained_variance_[2])

-------------- next part --------------
A non-text attachment was scrubbed...
Name: data
Type: application/octet-stream
Size: 272 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pca.py
Type: text/x-python
Size: 429 bytes
Desc: not available
URL:

From niourf at gmail.com Fri Feb 15 20:31:50 2019
From: niourf at gmail.com (Nicolas Hug)
Date: Fri, 15 Feb 2019 20:31:50 -0500
Subject: [scikit-learn] inconsistency across version
In-Reply-To:
References:
Message-ID: <0531e274-b204-e473-3cc8-96548db006bc@gmail.com>

There was a bug in 0.18 that was fixed here:
https://github.com/scikit-learn/scikit-learn/pull/9105

The results from 0.20 should be correct.

It looks like you're still using Python 2; please be aware that
*scikit-learn will drop support for Python 2 in the next release*!

Nicolas

On 2/15/19 7:52 PM, Huan Tran wrote:
> Dear community,
>
> I ran a very small PCA on a 3D data set and printed
> explained_variance_. I found that with scikit-learn 0.18.1 and 0.20.2,
> the results are significantly different. In particular, for 0.18.1 I got
> +3.875925353581E+00 +3.270175297443E+00 +2.207814537475E+00
>
> and with 0.20.2, I got
> +4.651110424297E+00 +3.924210356932E+00 +2.649377444970E+00
>
> Does anyone have a hint on what is going on? FYI, my data and code are
> enclosed. Many thanks.

From huantd at gmail.com Sat Feb 16 16:42:26 2019
From: huantd at gmail.com (Huan Tran)
Date: Sat, 16 Feb 2019 16:42:26 -0500
Subject: [scikit-learn] inconsistency across version
In-Reply-To:
References:
Message-ID:

Thank you, Nicolas.

Huan

On Fri, Feb 15, 2019 at 7:52 PM Huan Tran wrote:
> Dear community,
>
> I ran a very small PCA on a 3D data set and printed
> explained_variance_. I found that with scikit-learn 0.18.1 and 0.20.2, the
> results are significantly different. In particular, for 0.18.1 I got
> +3.875925353581E+00 +3.270175297443E+00 +2.207814537475E+00
>
> and with 0.20.2, I got
> +4.651110424297E+00 +3.924210356932E+00 +2.649377444970E+00
>
> Does anyone have a hint on what is going on? FYI, my data and code are
> enclosed. Many thanks.
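
The two sets of explained_variance_ values reported in this thread differ by a constant factor of 1.2, which equals 6/5 for the six samples posted. That is what one would expect if the older release normalised the variance by n_samples and the newer one by n_samples - 1. The sketch below checks this reading using only the Python standard library. It is a consistency check under that assumption, not a statement of what the linked pull request changed, and it does not call scikit-learn. It relies on the fact that, when all components are kept, the explained variances sum to the total variance of the data.

```python
import math
from statistics import pvariance, variance

# Data rows copied from the original message in this thread.
X = [
    [-3.117642e+00, 1.453819e+00, -7.952874e-02],
    [3.081224e+00, 1.453819e+00, -7.952874e-02],
    [1.376932e-01, -2.491454e+00, -1.908521e-01],
    [9.578602e-02, 3.632759e+00, -1.908521e-01],
    [-1.238644e-01, 5.396424e-02, -3.147031e+00],
    [6.335262e-01, 1.393937e+00, 2.500474e+00],
]

# explained_variance_ values as reported for each scikit-learn version.
reported_0_18 = [3.875925353581, 3.270175297443, 2.207814537475]
reported_0_20 = [4.651110424297, 3.924210356932, 2.649377444970]

n_samples = len(X)
columns = list(zip(*X))

# Total variance of the data under the two normalisations.
total_var_n = sum(pvariance(col) for col in columns)         # divide by n
total_var_n_minus_1 = sum(variance(col) for col in columns)  # divide by n - 1

# Per-component ratio between the two reported results.
ratios = [new / old for new, old in zip(reported_0_20, reported_0_18)]

print("ratio 0.20.2 / 0.18.1 per component:", ratios)
print("n / (n - 1):", n_samples / (n_samples - 1))
print("sum reported by 0.18.1:", sum(reported_0_18), "| total variance / n:", total_var_n)
print("sum reported by 0.20.2:", sum(reported_0_20), "| total variance / (n - 1):", total_var_n_minus_1)
```

If the printed totals agree pairwise, the difference between the two versions is a pure rescaling, so the principal directions and the explained_variance_ratio_ would be unaffected.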

From bono10200807 at hotmail.com Sun Feb 17 04:18:04 2019
From: bono10200807 at hotmail.com (Yash Raj Rai)
Date: Sun, 17 Feb 2019 09:18:04 +0000
Subject: [scikit-learn] Fw: Inclusion of an LSTM Classifier
In-Reply-To:
References:
Message-ID:

________________________________
From: Yash Raj Rai
Sent: Sunday, February 17, 2019 2:34 PM
To: scikit-learn at python.org
Subject: Inclusion of an LSTM Classifier

Hello

I wanted to know if there are any ongoing projects on an LSTM classifier
model in sklearn. If not, is there any possibility of its inclusion in
the library?

Is there anything beyond the contributor's guidelines that I need to
know for the introduction of a new model?

Thank you.

From gael.varoquaux at normalesup.org Sun Feb 17 04:55:13 2019
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Sun, 17 Feb 2019 10:55:13 +0100
Subject: [scikit-learn] Fw: Inclusion of an LSTM Classifier
In-Reply-To:
References:
Message-ID: <20190217095513.npyzi533qwehbm6g@phare.normalesup.org>

Hi,

Thank you for the suggestion.

An LSTM is a deep-learning approach, and is out of scope for
scikit-learn:
https://scikit-learn.org/stable/faq.html#why-is-there-no-support-for-deep-or-reinforcement-learning-will-there-be-support-for-deep-or-reinforcement-learning-in-scikit-learn

Best,

Gaël

On Sun, Feb 17, 2019 at 09:18:04AM +0000, Yash Raj Rai wrote:
> ________________________________
> From: Yash Raj Rai
> Sent: Sunday, February 17, 2019 2:34 PM
> To: scikit-learn at python.org
> Subject: Inclusion of an LSTM Classifier

> Hello

> I wanted to know if there are any ongoing projects on an LSTM classifier
> model in sklearn. If not, is there any possibility of its inclusion in
> the library?

> Is there anything beyond the contributor's guidelines that I need to know for
> the introduction of a new model?

> Thank you.

--
Gael Varoquaux
Senior Researcher, INRIA Parietal
NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France
Phone: ++ 33-1-69-08-79-68
http://gael-varoquaux.info http://twitter.com/GaelVaroquaux

From david.mo.burns at gmail.com Sun Feb 17 13:54:18 2019
From: david.mo.burns at gmail.com (David Burns)
Date: Sun, 17 Feb 2019 13:54:18 -0500
Subject: [scikit-learn] Inclusion of an LSTM Classifier
Message-ID:

There is an sklearn wrapper for Keras models in the Keras library.
That's an easy way to use an LSTM in sklearn. Also, the sklearn
estimator API is pretty easy to figure out if you want to roll your own
wrapper for any model, really.
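
To make the "roll your own wrapper" suggestion above concrete, here is a minimal sketch of the estimator conventions such a wrapper would follow: hyper-parameters stored unchanged in `__init__`, `fit` returning `self`, learned attributes ending in an underscore, and `get_params` / `set_params` for cloning and grid search. `TinyModel` and `ModelWrapper` are hypothetical names invented for this illustration; `TinyModel` only stands in for an external model such as an LSTM and is not part of any library. In real code one would normally subclass `sklearn.base.BaseEstimator` and `sklearn.base.ClassifierMixin`, which supply `get_params`, `set_params` and `score`.

```python
class TinyModel:
    """Stand-in for an external model (hypothetical, not a real library)."""

    def train(self, rows, labels, epochs=1):
        # A real network would iterate `epochs` times; this toy model
        # only needs one pass to compute a centroid per label.
        sums, counts = {}, {}
        for row, label in zip(rows, labels):
            sums[label] = sums.get(label, 0.0) + sum(row)
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {label: sums[label] / counts[label] for label in sums}

    def infer(self, rows):
        # Predict the label whose centroid is closest to the row sum.
        return [
            min(self.centroids, key=lambda c: abs(sum(row) - self.centroids[c]))
            for row in rows
        ]


class ModelWrapper:
    """Hypothetical wrapper exposing a scikit-learn style interface."""

    def __init__(self, build_fn=None, epochs=1):
        # The constructor only stores hyper-parameters, without validation.
        self.build_fn = build_fn
        self.epochs = epochs

    def get_params(self, deep=True):
        return {"build_fn": self.build_fn, "epochs": self.epochs}

    def set_params(self, **params):
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f"Invalid parameter {name!r}")
            setattr(self, name, value)
        return self

    def fit(self, X, y):
        # Build a fresh model on every call, so fit() starts from scratch.
        self.model_ = self.build_fn()
        self.classes_ = sorted(set(y))
        self.model_.train(X, y, epochs=self.epochs)
        return self

    def predict(self, X):
        return self.model_.infer(X)


X = [[0.0], [0.2], [0.9], [1.1]]
y = ["a", "a", "b", "b"]
clf = ModelWrapper(build_fn=TinyModel, epochs=3).fit(X, y)
predictions = clf.predict([[0.1], [1.0]])
print(predictions)
```

Passing a model-building function rather than a model instance keeps the wrapper's parameters simple to copy, which is one common way to make such objects play well with cloning.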

From niedakh at gmail.com Sun Feb 17 14:09:54 2019
From: niedakh at gmail.com (Piotr Szymański)
Date: Sun, 17 Feb 2019 20:09:54 +0100
Subject: [scikit-learn] Inclusion of an LSTM Classifier
In-Reply-To:
References:
Message-ID:

I've created a couple of scikit-learn compatible wrappers and model
generators for scikit-multilearn:
http://scikit.ml/multilabeldnn.html

Depending on which library you prefer, here are some examples of how to
use LSTMs via:
- Keras:
https://medium.com/@dclengacher/keras-lstm-recurrent-neural-networks-c1f5febde03d
- PyTorch:
https://pytorch.org/tutorials/beginner/nlp/sequence_models_tutorial.html

Just create a relevant model-generating function, take a wrapper from
scikit-multilearn, and put it into the scikit pipeline any way you want.

Best,
Piotr Szymanski
Scikit-multilearn Maintainer

On Sun, Feb 17, 2019 at 7:55 PM David Burns wrote:
> There is an sklearn wrapper for Keras models in the Keras library. That's
> an easy way to use LSTM in sklearn. Also the sklearn estimator API is
> pretty easy to figure out if you want to roll your own wrapper for any
> model really.

--
Piotr Szymański
niedakh at gmail.com

From qinhanmin2005 at sina.com Mon Feb 18 09:04:20 2019
From: qinhanmin2005 at sina.com (Hanmin Qin)
Date: Mon, 18 Feb 2019 22:04:20 +0800
Subject: [scikit-learn] Sprint discussion points?
Message-ID: <20190218140420.E7CE75D0009C@webmail.sinamail.sina.com.cn>

Maybe it's worthwhile to discuss (and release) 0.20.3 during the sprint.

We're almost ready except for a few test failures on specific platforms.
I've labeled all the related PRs (i.e., PRs with a what's new entry in
0.20.3) as 0.20.3.

We need to decide whether we want to backport more bug fixes (maybe more
doc/example corrections) to 0.20.3. Joel has mentioned this several
times, but it seems he hasn't made the decision yet. I tend to do so,
though technically we should only include bug fixes related to features
introduced in 0.20.X (but I won't argue if someone makes the decision).
Some bugs seem non-trivial (e.g., #13142 related to BaseMixture and
#13124 related to StratifiedKFold).

Hanmin Qin

----- Original Message -----
From: Olivier Grisel
To: Scikit-learn mailing list
Subject: Re: [scikit-learn] Sprint discussion points?
Date: 2019-02-15 23:06

I would also add generalizing early stopping options to most estimators.
This is a bit related to Joel's point on max_iter consistency in
LogisticRegression.
--
Olivier

From joel.nothman at gmail.com Mon Feb 18 15:06:20 2019
From: joel.nothman at gmail.com (Joel Nothman)
Date: Tue, 19 Feb 2019 07:06:20 +1100
Subject: [scikit-learn] Sprint discussion points?
In-Reply-To: <20190218140420.E7CE75D0009C@webmail.sinamail.sina.com.cn>
References: <20190218140420.E7CE75D0009C@webmail.sinamail.sina.com.cn>
Message-ID:

And here I was thinking we'd better just push out 0.20.3 this week with
what's been listed for it.

From t3kcit at gmail.com Mon Feb 18 18:34:06 2019
From: t3kcit at gmail.com (Andreas Mueller)
Date: Mon, 18 Feb 2019 18:34:06 -0500
Subject: [scikit-learn] Sprint discussion points?
In-Reply-To:
References: <20190218140420.E7CE75D0009C@webmail.sinamail.sina.com.cn>
Message-ID:

On 2/18/19 3:06 PM, Joel Nothman wrote:
> And here I was thinking we'd better just push out 0.20.3 this week
> with what's been listed for it.
>
I wouldn't mind this, just don't expect me to help ;)

From joel.nothman at gmail.com Tue Feb 19 05:46:50 2019
From: joel.nothman at gmail.com (Joel Nothman)
Date: Tue, 19 Feb 2019 21:46:50 +1100
Subject: [scikit-learn] VOTE: scikit-learn governance document
In-Reply-To:
References: <20190211005345.0B5B146400A2@webmail.sinamail.sina.com.cn> <20190211084756.3jttddocqrxwrli2@phare.normalesup.org>
Message-ID:

Uhh... I forgot to vote. +1 :)

It seems there's some consensus.

From niourf at gmail.com Tue Feb 19 05:56:42 2019
From: niourf at gmail.com (Nicolas Hug)
Date: Tue, 19 Feb 2019 05:56:42 -0500
Subject: [scikit-learn] Reddit thread with complaints about scikit-learn
In-Reply-To:
References:
Message-ID:

Hi everyone,

I stumbled upon this reddit thread [1] where people point out what they
dislike about the scikit-learn API. It's mostly about the lack of
consistency for linear models. Just thought it'd be interesting to have
some external critics.

Best,

Nicolas

[1]
https://www.reddit.com/r/MachineLearning/comments/aryjif/d_alternatives_to_scikitlearn/

From jorisvandenbossche at gmail.com Tue Feb 19 06:46:27 2019
From: jorisvandenbossche at gmail.com (Joris Van den Bossche)
Date: Tue, 19 Feb 2019 12:46:27 +0100
Subject: [scikit-learn] VOTE: scikit-learn governance document
In-Reply-To:
References: <20190211005345.0B5B146400A2@webmail.sinamail.sina.com.cn> <20190211084756.3jttddocqrxwrli2@phare.normalesup.org>
Message-ID:

And a +1 from me as well

On Tue, 19 Feb 2019 at 11:47, Joel Nothman wrote:
> Uhh... I forgot to vote. +1 :)
>
> It seems there's some consensus.

From t3kcit at gmail.com Tue Feb 19 10:37:43 2019
From: t3kcit at gmail.com (Andreas Mueller)
Date: Tue, 19 Feb 2019 10:37:43 -0500
Subject: [scikit-learn] VOTE: scikit-learn governance document
In-Reply-To:
References: <20190211005345.0B5B146400A2@webmail.sinamail.sina.com.cn> <20190211084756.3jttddocqrxwrli2@phare.normalesup.org>
Message-ID: <5418d593-d0ad-4ae3-1eab-ae7f91766607@gmail.com>

A good time to remind all core devs to vote (or abstain).

+1 from me as well (as might be expected); I didn't want to put my vote
in my call for the vote.

Participation is not super high (as might be expected): 13 of the 49
core devs have voted so far.

There are some people who have voiced opinions on the internal list
before that I'll harass now - the document underwent substantial changes
since then.

I'll send a reminder to those who participated in the discussion before,
and for anyone who hasn't voted by Thursday, I think it makes sense to
send a request to vote combined with a question on whether they want to
become emeritus members, so people who have been gone for a long time
don't have to read and answer two emails.

On 2/19/19 5:46 AM, Joel Nothman wrote:
> Uhh... I forgot to vote. +1 :)
>
> It seems there's some consensus.

From paolo.losi at gmail.com Tue Feb 19 10:55:26 2019
From: paolo.losi at gmail.com (Paolo Losi)
Date: Tue, 19 Feb 2019 16:55:26 +0100
Subject: [scikit-learn] VOTE: scikit-learn governance document
In-Reply-To: <5418d593-d0ad-4ae3-1eab-ae7f91766607@gmail.com>
References: <20190211005345.0B5B146400A2@webmail.sinamail.sina.com.cn> <20190211084756.3jttddocqrxwrli2@phare.normalesup.org> <5418d593-d0ad-4ae3-1eab-ae7f91766607@gmail.com>
Message-ID:

+1 if my opinion matters

On Tue, Feb 19, 2019 at 4:39 PM Andreas Mueller wrote:
> A good time to remind all core devs to vote (or abstain).

From t3kcit at gmail.com Tue Feb 19 11:31:22 2019
From: t3kcit at gmail.com (Andreas Mueller)
Date: Tue, 19 Feb 2019 11:31:22 -0500
Subject: [scikit-learn] VOTE: scikit-learn governance document
In-Reply-To:
References: <20190211005345.0B5B146400A2@webmail.sinamail.sina.com.cn> <20190211084756.3jttddocqrxwrli2@phare.normalesup.org> <5418d593-d0ad-4ae3-1eab-ae7f91766607@gmail.com>
Message-ID:

On 2/19/19 10:55 AM, Paolo Losi wrote:
> +1 if my opinion matters
>
Thank you and it does :)

From t3kcit at gmail.com Tue Feb 19 12:12:46 2019
From: t3kcit at gmail.com (Andreas Mueller)
Date: Tue, 19 Feb 2019 12:12:46 -0500
Subject: [scikit-learn] Reddit thread with complaints about scikit-learn
In-Reply-To:
References:
Message-ID: <91cb9063-6160-4c8d-ef7b-717b7c142fb4@gmail.com>

I agree with most of their points and have tried to prioritize some
(and I think you were the victim of me trying to address some of these ;).

The question about structuring the estimators is really tricky. Maybe
it's worth putting it on the roadmap to discuss this at some point?
Generally I thought it would be too much of a hassle, but the
inconsistency is kind of annoying (having a class per loss or per
regularizer or per solver sometimes).

On 2/19/19 5:56 AM, Nicolas Hug wrote:
> Hi everyone,
>
> I stumbled upon this reddit thread [1] where people point out what
> they dislike about the scikit-learn API. It's mostly about the lack of
> consistency for linear models. Just thought it'd be interesting to
> have some external critics.
>
> Best,
>
> Nicolas
>
>
> [1]
> https://www.reddit.com/r/MachineLearning/comments/aryjif/d_alternatives_to_scikitlearn/

From t3kcit at gmail.com Tue Feb 19 12:23:57 2019
From: t3kcit at gmail.com (Andreas Mueller)
Date: Tue, 19 Feb 2019 12:23:57 -0500
Subject: [scikit-learn] Sprint discussion points?
In-Reply-To: <455b2ad2-d978-dad6-653a-99da5b9e9e16@gmail.com>
References: <8498bf4d-83da-08d5-9e04-35d9c534199b@gmail.com> <1c8097ae-a2c9-b67c-7feb-c6ec461602b0@gmail.com> <9708f3ba-af9f-c7f7-3613-55ad9fd982d2@gmail.com> <455b2ad2-d978-dad6-653a-99da5b9e9e16@gmail.com>
Message-ID: <071d9f57-e8c7-5dab-ec9f-ecdbe3f9b897@gmail.com>

On 2/14/19 11:40 AM, Nicolas Hug wrote:
>
>> or we could go as far as to schedule meetings on the different topics.
>
> Given the number of issues to discuss this is probably the best
> approach IMO
>

If we want to schedule meetings we could do one of two things: have a
scheduling meeting first thing Monday, or do it now.
The issue I have with the Monday meeting is that I won't be there - I
could join remotely, though, if the timing works.

Maybe having two group design meetings per day is enough to get our
brains smoking?
We could do an hour in the morning (10am?) and an hour in the afternoon
(2pm?) and then have informal discussions at other points to clarify?

There's already a meeting Monday morning and people probably want to
generally catch up.
That would leave Monday afternoon and 8 more meeting slots.

Does OPTICS require a meeting or is it clear what to do and the work
"just" needs to be done?

Cheers,
Andy

From adrin.jalali at gmail.com Tue Feb 19 12:28:20 2019
From: adrin.jalali at gmail.com (Adrin)
Date: Tue, 19 Feb 2019 18:28:20 +0100
Subject: [scikit-learn] Sprint discussion points?
In-Reply-To: <071d9f57-e8c7-5dab-ec9f-ecdbe3f9b897@gmail.com>
References: <8498bf4d-83da-08d5-9e04-35d9c534199b@gmail.com> <1c8097ae-a2c9-b67c-7feb-c6ec461602b0@gmail.com> <9708f3ba-af9f-c7f7-3613-55ad9fd982d2@gmail.com> <455b2ad2-d978-dad6-653a-99da5b9e9e16@gmail.com> <071d9f57-e8c7-5dab-ec9f-ecdbe3f9b897@gmail.com>
Message-ID:

> Does OPTICS require a meeting or is it clear what to do and the work
> "just" needs to be done?

Definitely needs (some) discussions.
URL: From f at bianp.net Tue Feb 19 14:37:41 2019 From: f at bianp.net (Fabian Pedregosa) Date: Tue, 19 Feb 2019 14:37:41 -0500 Subject: [scikit-learn] VOTE: scikit-learn governance document In-Reply-To: References: <20190211005345.0B5B146400A2@webmail.sinamail.sina.com.cn> <20190211084756.3jttddocqrxwrli2@phare.normalesup.org> <5418d593-d0ad-4ae3-1eab-ae7f91766607@gmail.com> Message-ID: +1 (not sure if my previous email went through) On Tue, Feb 19, 2019 at 11:31 AM Andreas Mueller wrote: > > > On 2/19/19 10:55 AM, Paolo Losi wrote: > > +1 if my opinion matters > > > Thank you and it does :) > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.louppe at gmail.com Tue Feb 19 15:48:57 2019 From: g.louppe at gmail.com (Gilles Louppe) Date: Tue, 19 Feb 2019 21:48:57 +0100 Subject: [scikit-learn] VOTE: scikit-learn governance document In-Reply-To: References: <20190211005345.0B5B146400A2@webmail.sinamail.sina.com.cn> <20190211084756.3jttddocqrxwrli2@phare.normalesup.org> <5418d593-d0ad-4ae3-1eab-ae7f91766607@gmail.com> Message-ID: +1 On Tue, 19 Feb 2019 at 20:40, Fabian Pedregosa wrote: > > +1 (not sure if my previous email went through) > > On Tue, Feb 19, 2019 at 11:31 AM Andreas Mueller wrote: >> >> >> >> On 2/19/19 10:55 AM, Paolo Losi wrote: >> > +1 if my opinion matters >> > >> Thank you and it does :) >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From joel.nothman at gmail.com Tue Feb 19 16:17:41 2019 From: joel.nothman at gmail.com (Joel Nothman) Date: Wed, 20 Feb 2019 
08:17:41 +1100 Subject: [scikit-learn] Sprint discussion points? In-Reply-To: References: <8498bf4d-83da-08d5-9e04-35d9c534199b@gmail.com> <1c8097ae-a2c9-b67c-7feb-c6ec461602b0@gmail.com> <9708f3ba-af9f-c7f7-3613-55ad9fd982d2@gmail.com> <455b2ad2-d978-dad6-653a-99da5b9e9e16@gmail.com> <071d9f57-e8c7-5dab-ec9f-ecdbe3f9b897@gmail.com> Message-ID: I don't think optics requires a large meeting, just a few people. I'm happy with your proposal generally, Andy. Do we schedule specific topics at this point? -------------- next part -------------- An HTML attachment was scrubbed... URL: From t3kcit at gmail.com Tue Feb 19 16:21:18 2019 From: t3kcit at gmail.com (Andreas Mueller) Date: Tue, 19 Feb 2019 16:21:18 -0500 Subject: [scikit-learn] Sprint discussion points? In-Reply-To: References: <8498bf4d-83da-08d5-9e04-35d9c534199b@gmail.com> <1c8097ae-a2c9-b67c-7feb-c6ec461602b0@gmail.com> <9708f3ba-af9f-c7f7-3613-55ad9fd982d2@gmail.com> <455b2ad2-d978-dad6-653a-99da5b9e9e16@gmail.com> <071d9f57-e8c7-5dab-ec9f-ecdbe3f9b897@gmail.com> Message-ID: <5e3a5a31-20bb-2186-081c-10d2593d82dd@gmail.com> Yeah, sounds good. I didn't want to unilaterally post a schedule, but doing some google form or similar seems a bit heavy-handed? Not sure if Guillaume had ideas about the schedule, given that he seems to be running the show? On 2/19/19 4:17 PM, Joel Nothman wrote: > I don't think optics requires a large meeting, just a few people. > > I'm happy with your proposal generally, Andy. Do we schedule specific > topics at this point? > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From arnaud4567 at gmail.com Tue Feb 19 17:17:36 2019 From: arnaud4567 at gmail.com (Arnaud Joly) Date: Tue, 19 Feb 2019 23:17:36 +0100 Subject: [scikit-learn] VOTE: scikit-learn governance document In-Reply-To: References: <20190211005345.0B5B146400A2@webmail.sinamail.sina.com.cn> <20190211084756.3jttddocqrxwrli2@phare.normalesup.org> <5418d593-d0ad-4ae3-1eab-ae7f91766607@gmail.com> Message-ID: <90C4CA07-2C37-45CF-9037-33ED17DEC088@gmail.com> +1 Arnaud > On 19 Feb 2019, at 21:48, Gilles Louppe wrote: > > +1 > > On Tue, 19 Feb 2019 at 20:40, Fabian Pedregosa wrote: >> >> +1 (not sure if my previous email went through) >> >> On Tue, Feb 19, 2019 at 11:31 AM Andreas Mueller wrote: >>> >>> >>> >>> On 2/19/19 10:55 AM, Paolo Losi wrote: >>>> +1 if my opinion matters >>>> >>> Thank you and it does :) >>> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From g.lemaitre58 at gmail.com Tue Feb 19 17:48:15 2019 From: g.lemaitre58 at gmail.com (=?UTF-8?Q?Guillaume_Lema=C3=AEtre?=) Date: Tue, 19 Feb 2019 23:48:15 +0100 Subject: [scikit-learn] Sprint discussion points? 
In-Reply-To: <5e3a5a31-20bb-2186-081c-10d2593d82dd@gmail.com> References: <8498bf4d-83da-08d5-9e04-35d9c534199b@gmail.com> <1c8097ae-a2c9-b67c-7feb-c6ec461602b0@gmail.com> <9708f3ba-af9f-c7f7-3613-55ad9fd982d2@gmail.com> <455b2ad2-d978-dad6-653a-99da5b9e9e16@gmail.com> <071d9f57-e8c7-5dab-ec9f-ecdbe3f9b897@gmail.com> <5e3a5a31-20bb-2186-081c-10d2593d82dd@gmail.com> Message-ID: > Not sure if Guillaume had ideas about the schedule, given that he seems to be running the show? Mostly running behind the show ... For the moment, we only have a 30 minutes presentation of introduction planned on Monday. For the rest of the week, this is pretty much opened and I think that we can propose a schedule such that we can be efficient. IMO, two meetings of an hour per day look good to me. Shall we prioritize the list of the issues? Maybe that some issues could be packed together. I would not be against having a rough schedule on the wiki directly and I think that having it before Monday could be better. Let me know how I can help. On Tue, 19 Feb 2019 at 22:23, Andreas Mueller wrote: > Yeah, sounds good. > I didn't want to unilaterally post a schedule, but doing some google form > or similar seems a bit heavy-handed? > Not sure if Guillaume had ideas about the schedule, given that he seems to > be running the show? > > On 2/19/19 4:17 PM, Joel Nothman wrote: > > I don't think optics requires a large meeting, just a few people. > > I'm happy with your proposal generally, Andy. Do we schedule specific > topics at this point? 
> > _______________________________________________ > scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -- Guillaume Lemaitre INRIA Saclay - Parietal team Center for Data Science Paris-Saclay https://glemaitre.github.io/ From t3kcit at gmail.com Tue Feb 19 18:16:20 2019 From: t3kcit at gmail.com (Andreas Mueller) Date: Tue, 19 Feb 2019 18:16:20 -0500 Subject: [scikit-learn] Sprint discussion points? In-Reply-To: References: <8498bf4d-83da-08d5-9e04-35d9c534199b@gmail.com> <1c8097ae-a2c9-b67c-7feb-c6ec461602b0@gmail.com> <9708f3ba-af9f-c7f7-3613-55ad9fd982d2@gmail.com> <455b2ad2-d978-dad6-653a-99da5b9e9e16@gmail.com> <071d9f57-e8c7-5dab-ec9f-ecdbe3f9b897@gmail.com> <5e3a5a31-20bb-2186-081c-10d2593d82dd@gmail.com> Message-ID: <9c07e172-20bc-b8dd-e1c6-0e956d40b5d9@gmail.com> I put a draft schedule here: https://github.com/scikit-learn/scikit-learn/wiki/Upcoming-events#technical-discussions-schedule it's obviously somewhat opinionated ;) Happy to reprioritize. Basically I wouldn't like to miss any of the big API discussions because coming late to the party. The two things on (grid?) searches are somewhat related: one is about specifying search-spaces, the other about executing a given search space efficiently. They probably warrant separate discussions. I haven't added plotting or sample props on it, which are maybe two other discussion points. I tried to cover most controversial things from the roadmap. Not sure if discussing the schedule via the mailing list is the best way? Don't want to create even more traffic 
than I already am ;) On 2/19/19 5:48 PM, Guillaume Lemaître wrote: > > Not sure if Guillaume had ideas about the schedule, given that he > seems to be running the show? > > Mostly running behind the show ... > > For the moment, we only have a 30 minutes presentation of introduction > planned on Monday. > For the rest of the week, this is pretty much opened and I think that > we can propose a schedule such that we can be efficient. > IMO, two meetings of an hour per day look good to me. > > Shall we prioritize the list of the issues? Maybe that some issues > could be packed together. > I would not be against having a rough schedule on the wiki directly > and I think that having it before Monday could be better. > > Let me know how I can help. > > On Tue, 19 Feb 2019 at 22:23, Andreas Mueller > wrote: > > Yeah, sounds good. > I didn't want to unilaterally post a schedule, but doing some > google form or similar seems a bit heavy-handed? > Not sure if Guillaume had ideas about the schedule, given that he > seems to be running the show? > > On 2/19/19 4:17 PM, Joel Nothman wrote: >> I don't think optics requires a large meeting, just a few people. >> >> I'm happy with your proposal generally, Andy. Do we schedule >> specific topics at this point? >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > > -- > Guillaume Lemaitre > INRIA Saclay - Parietal team > Center for Data Science Paris-Saclay > https://glemaitre.github.io/ > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From joel.nothman at gmail.com Tue Feb 19 20:30:33 2019 From: joel.nothman at gmail.com (Joel Nothman) Date: Wed, 20 Feb 2019 12:30:33 +1100 Subject: [scikit-learn] Sprint discussion points? In-Reply-To: <9c07e172-20bc-b8dd-e1c6-0e956d40b5d9@gmail.com> References: <8498bf4d-83da-08d5-9e04-35d9c534199b@gmail.com> <1c8097ae-a2c9-b67c-7feb-c6ec461602b0@gmail.com> <9708f3ba-af9f-c7f7-3613-55ad9fd982d2@gmail.com> <455b2ad2-d978-dad6-653a-99da5b9e9e16@gmail.com> <071d9f57-e8c7-5dab-ec9f-ecdbe3f9b897@gmail.com> <5e3a5a31-20bb-2186-081c-10d2593d82dd@gmail.com> <9c07e172-20bc-b8dd-e1c6-0e956d40b5d9@gmail.com> Message-ID: I don't think I'll be able to stay for the Friday 10am discussion, but have a PR open on "efficient grid search" so should probably be involved. Perhaps the fit_transform discussion can happen without you, Andy? On Wed, 20 Feb 2019 at 10:17, Andreas Mueller wrote: > I put a draft schedule here: > > https://github.com/scikit-learn/scikit-learn/wiki/Upcoming-events#technical-discussions-schedule > > it's obviously somewhat opinionated ;) > Happy to reprioritize. > Basically I wouldn't like to miss any of the big API discussions because > coming late to the party. > > The two things on (grid?) searches are somewhat related: one is about > specifying search-spaces, the other about executing a given search space > efficiently. They probably warrant separate discussions. > > I haven't added plotting or sample props on it, which are maybe two other > discussion points. > I tried to cover most controversial things from the roadmap. > > Not sure if discussing the schedule via the mailing list is the best way? > Don't want to create even more traffic than I already am ;) > > On 2/19/19 5:48 PM, Guillaume Lemaître wrote: > > > Not sure if Guillaume had ideas about the schedule, given that he seems > to be running the show? > > Mostly running behind the show ... > > For the moment, we only have a 30 minutes presentation of introduction > planned on Monday.
> For the rest of the week, this is pretty much opened and I think that we > can propose a schedule such that we can be efficient. > IMO, two meetings of an hour per day look good to me. > > Shall we prioritize the list of the issues? Maybe that some issues could > be packed together. > I would not be against having a rough schedule on the wiki directly and I > think that having it before Monday could be better. > > Let me know how I can help. > > On Tue, 19 Feb 2019 at 22:23, Andreas Mueller wrote: > >> Yeah, sounds good. >> I didn't want to unilaterally post a schedule, but doing some google form >> or similar seems a bit heavy-handed? >> Not sure if Guillaume had ideas about the schedule, given that he seems >> to be running the show? >> >> On 2/19/19 4:17 PM, Joel Nothman wrote: >> >> I don't think optics requires a large meeting, just a few people. >> >> I'm happy with your proposal generally, Andy. Do we schedule specific >> topics at this point? >> >> _______________________________________________ >> scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > > > -- > Guillaume Lemaitre > INRIA Saclay - Parietal team > Center for Data Science Paris-Saclay > https://glemaitre.github.io/ > > _______________________________________________ > scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rth.yurchak at pm.me Wed Feb 20 01:33:18 2019 From: rth.yurchak at pm.me (Roman Yurchak) Date: Wed, 20 Feb 2019 06:33:18 +0000 Subject: [scikit-learn] Sprint discussion points? In-Reply-To: References: <8498bf4d-83da-08d5-9e04-35d9c534199b@gmail.com> <455b2ad2-d978-dad6-653a-99da5b9e9e16@gmail.com> <071d9f57-e8c7-5dab-ec9f-ecdbe3f9b897@gmail.com> <5e3a5a31-20bb-2186-081c-10d2593d82dd@gmail.com> <9c07e172-20bc-b8dd-e1c6-0e956d40b5d9@gmail.com> Message-ID: Thanks for putting the draft schedule together! Personally I will be there 3 days out of 5 and wouldn't want to miss the discussion on euclidean distance issues. Maybe we could adjust the schedule during the sprint (say on Tuesday) based on people's interest and availability? That might be easier than trying to figure it out for 29 participants over email.. Also IMO it would makes sense to have some discussions (that are not that controversial or about high level API but still useful) earlier during the week to be able to work on them during the sprint. -- Roman On 20/02/2019 02:30, Joel Nothman wrote: > I don't think I'll be able to stay for the Friday 10am discussion, but > have a PR open on "efficient grid search" so should probably be involved. > > Perhaps the fit_transform discussion can happen without you, Andy? > > On Wed, 20 Feb 2019 at 10:17, Andreas Mueller > wrote: > > I put a draft schedule here: > https://github.com/scikit-learn/scikit-learn/wiki/Upcoming-events#technical-discussions-schedule > > it's obviously somewhat opinionated ;) > Happy to reprioritize. > Basically I wouldn't like to miss any of the big API discussions > because coming late to the party. > > The two things on (grid?) searches are somewhat related: one is > about specifying search-spaces, the other about executing a given > search space efficiently. They probably warrant separate discussions. > > I haven't added plotting or sample props on it, which are maybe two > other discussion points. 
> I tried to cover most controversial things from the roadmap. > > Not sure if discussing the schedule via the mailing list is the best > way? Don't want to create even more traffic than I already am ;) > > On 2/19/19 5:48 PM, Guillaume Lemaître wrote: >> > Not sure if Guillaume had ideas about the schedule, given that >> he seems to be running the show? >> >> Mostly running behind the show ... >> >> For the moment, we only have a 30 minutes presentation of >> introduction planned on Monday. >> For the rest of the week, this is pretty much opened and I think >> that we can propose a schedule such that we can be efficient. >> IMO, two meetings of an hour per day look good to me. >> >> Shall we prioritize the list of the issues? Maybe that some issues >> could be packed together. >> I would not be against having a rough schedule on the wiki >> directly and I think that having it before Monday could be better. >> >> Let me know how I can help. >> >> On Tue, 19 Feb 2019 at 22:23, Andreas Mueller > > wrote: >> >> Yeah, sounds good. >> I didn't want to unilaterally post a schedule, but doing some >> google form or similar seems a bit heavy-handed? >> Not sure if Guillaume had ideas about the schedule, given that >> he seems to be running the show? >> >> On 2/19/19 4:17 PM, Joel Nothman wrote: >>> I don't think optics requires a large meeting, just a few >>> people. >>> >>> I'm happy with your proposal generally, Andy. Do we schedule >>> specific topics at this point? 
>>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> >> >> -- >> Guillaume Lemaitre >> INRIA Saclay - Parietal team >> Center for Data Science Paris-Saclay >> https://glemaitre.github.io/ >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > From loic.esteve at ymail.com Wed Feb 20 02:12:27 2019 From: loic.esteve at ymail.com (=?utf-8?B?TG/Dr2MgRXN0w6h2ZQ==?=) Date: Wed, 20 Feb 2019 08:12:27 +0100 Subject: [scikit-learn] VOTE: scikit-learn governance document In-Reply-To: <90C4CA07-2C37-45CF-9037-33ED17DEC088@gmail.com> References: <20190211005345.0B5B146400A2@webmail.sinamail.sina.com.cn> <20190211084756.3jttddocqrxwrli2@phare.normalesup.org> <5418d593-d0ad-4ae3-1eab-ae7f91766607@gmail.com> <90C4CA07-2C37-45CF-9037-33ED17DEC088@gmail.com> Message-ID: +1 from me. Cheers, Loïc From alexandre.gramfort at inria.fr Wed Feb 20 03:48:27 2019 From: alexandre.gramfort at inria.fr (Alexandre Gramfort) Date: Wed, 20 Feb 2019 09:48:27 +0100 Subject: [scikit-learn] Sprint discussion points? In-Reply-To: References: <8498bf4d-83da-08d5-9e04-35d9c534199b@gmail.com> <455b2ad2-d978-dad6-653a-99da5b9e9e16@gmail.com> <071d9f57-e8c7-5dab-ec9f-ecdbe3f9b897@gmail.com> <5e3a5a31-20bb-2186-081c-10d2593d82dd@gmail.com> <9c07e172-20bc-b8dd-e1c6-0e956d40b5d9@gmail.com> Message-ID: we should also see if we can have a lot of CI machines for the 5 days as it's always the blocker to move fast during 1 week. 
my 2c Alex On Wed, Feb 20, 2019 at 7:35 AM Roman Yurchak via scikit-learn wrote: > > Thanks for putting the draft schedule together! > > Personally I will be there 3 days out of 5 and wouldn't want to miss the > discussion on euclidean distance issues. Maybe we could adjust the > schedule during the sprint (say on Tuesday) based on people's interest > and availability? That might be easier than trying to figure it out for > 29 participants over email.. > > Also IMO it would makes sense to have some discussions (that are not > that controversial or about high level API but still useful) earlier > during the week to be able to work on them during the sprint. > > -- > Roman > > On 20/02/2019 02:30, Joel Nothman wrote: > > I don't think I'll be able to stay for the Friday 10am discussion, but > > have a PR open on "efficient grid search" so should probably be involved. > > > > Perhaps the fit_transform discussion can happen without you, Andy? > > > > On Wed, 20 Feb 2019 at 10:17, Andreas Mueller > > wrote: > > > > I put a draft schedule here: > > https://github.com/scikit-learn/scikit-learn/wiki/Upcoming-events#technical-discussions-schedule > > > > it's obviously somewhat opinionated ;) > > Happy to reprioritize. > > Basically I wouldn't like to miss any of the big API discussions > > because coming late to the party. > > > > The two things on (grid?) searches are somewhat related: one is > > about specifying search-spaces, the other about executing a given > > search space efficiently. They probably warrant separate discussions. > > > > I haven't added plotting or sample props on it, which are maybe two > > other discussion points. > > I tried to cover most controversial things from the roadmap. > > > > Not sure if discussing the schedule via the mailing list is the best > > way? 
Don't want to create even more traffic than I already am ;) > > > > On 2/19/19 5:48 PM, Guillaume Lemaître wrote: > >> > Not sure if Guillaume had ideas about the schedule, given that > >> he seems to be running the show? > >> > >> Mostly running behind the show ... > >> > >> For the moment, we only have a 30 minutes presentation of > >> introduction planned on Monday. > >> For the rest of the week, this is pretty much opened and I think > >> that we can propose a schedule such that we can be efficient. > >> IMO, two meetings of an hour per day look good to me. > >> > >> Shall we prioritize the list of the issues? Maybe that some issues > >> could be packed together. > >> I would not be against having a rough schedule on the wiki > >> directly and I think that having it before Monday could be better. > >> > >> Let me know how I can help. > >> > >> On Tue, 19 Feb 2019 at 22:23, Andreas Mueller >> > wrote: > >> > >> Yeah, sounds good. > >> I didn't want to unilaterally post a schedule, but doing some > >> google form or similar seems a bit heavy-handed? > >> Not sure if Guillaume had ideas about the schedule, given that > >> he seems to be running the show? > >> > >> On 2/19/19 4:17 PM, Joel Nothman wrote: > >>> I don't think optics requires a large meeting, just a few > >>> people. > >>> > >>> I'm happy with your proposal generally, Andy. Do we schedule > >>> specific topics at this point? 
> >>> > >>> _______________________________________________ > >>> scikit-learn mailing list > >>> scikit-learn at python.org > >>> https://mail.python.org/mailman/listinfo/scikit-learn > >> > >> _______________________________________________ > >> scikit-learn mailing list > >> scikit-learn at python.org > >> https://mail.python.org/mailman/listinfo/scikit-learn > >> > >> > >> > >> -- > >> Guillaume Lemaitre > >> INRIA Saclay - Parietal team > >> Center for Data Science Paris-Saclay > >> https://glemaitre.github.io/ > >> > >> _______________________________________________ > >> scikit-learn mailing list > >> scikit-learn at python.org > >> https://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From olivier.grisel at ensta.org Wed Feb 20 07:06:47 2019 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Wed, 20 Feb 2019 13:06:47 +0100 Subject: [scikit-learn] VOTE: scikit-learn governance document In-Reply-To: References: <20190211005345.0B5B146400A2@webmail.sinamail.sina.com.cn> <20190211084756.3jttddocqrxwrli2@phare.normalesup.org> <5418d593-d0ad-4ae3-1eab-ae7f91766607@gmail.com> <90C4CA07-2C37-45CF-9037-33ED17DEC088@gmail.com> Message-ID: +1 -------------- next part -------------- An HTML attachment was scrubbed... URL: From t3kcit at gmail.com Wed Feb 20 11:20:58 2019 From: t3kcit at gmail.com (Andreas Mueller) Date: Wed, 20 Feb 2019 11:20:58 -0500 Subject: [scikit-learn] Sprint discussion points? 
In-Reply-To: References: <8498bf4d-83da-08d5-9e04-35d9c534199b@gmail.com> <455b2ad2-d978-dad6-653a-99da5b9e9e16@gmail.com> <071d9f57-e8c7-5dab-ec9f-ecdbe3f9b897@gmail.com> <5e3a5a31-20bb-2186-081c-10d2593d82dd@gmail.com> <9c07e172-20bc-b8dd-e1c6-0e956d40b5d9@gmail.com> Message-ID: <6f38bda9-77e8-06c5-5964-bd6aed982075@gmail.com> Thanks for bringing that up. Did I email travis last time? We should also follow up with Microsoft as they promised unlimited builds... On 2/20/19 3:48 AM, Alexandre Gramfort wrote: > we should also see if we can have a lot of CI machines for the 5 days > as it's always the blocker to move fast during 1 week. > > my 2c > Alex > > On Wed, Feb 20, 2019 at 7:35 AM Roman Yurchak via scikit-learn > wrote: >> Thanks for putting the draft schedule together! >> >> Personally I will be there 3 days out of 5 and wouldn't want to miss the >> discussion on euclidean distance issues. Maybe we could adjust the >> schedule during the sprint (say on Tuesday) based on people's interest >> and availability? That might be easier than trying to figure it out for >> 29 participants over email.. >> >> Also IMO it would makes sense to have some discussions (that are not >> that controversial or about high level API but still useful) earlier >> during the week to be able to work on them during the sprint. >> >> -- >> Roman >> >> On 20/02/2019 02:30, Joel Nothman wrote: >>> I don't think I'll be able to stay for the Friday 10am discussion, but >>> have a PR open on "efficient grid search" so should probably be involved. >>> >>> Perhaps the fit_transform discussion can happen without you, Andy? >>> >>> On Wed, 20 Feb 2019 at 10:17, Andreas Mueller >> > wrote: >>> >>> I put a draft schedule here: >>> https://github.com/scikit-learn/scikit-learn/wiki/Upcoming-events#technical-discussions-schedule >>> >>> it's obviously somewhat opinionated ;) >>> Happy to reprioritize. 
>>> Basically I wouldn't like to miss any of the big API discussions >>> because coming late to the party. >>> >>> The two things on (grid?) searches are somewhat related: one is >>> about specifying search-spaces, the other about executing a given >>> search space efficiently. They probably warrant separate discussions. >>> >>> I haven't added plotting or sample props on it, which are maybe two >>> other discussion points. >>> I tried to cover most controversial things from the roadmap. >>> >>> Not sure if discussing the schedule via the mailing list is the best >>> way? Don't want to create even more traffic than I already am ;) >>> >>> On 2/19/19 5:48 PM, Guillaume Lemaître wrote: >>>> > Not sure if Guillaume had ideas about the schedule, given that >>>> he seems to be running the show? >>>> >>>> Mostly running behind the show ... >>>> >>>> For the moment, we only have a 30 minutes presentation of >>>> introduction planned on Monday. >>>> For the rest of the week, this is pretty much opened and I think >>>> that we can propose a schedule such that we can be efficient. >>>> IMO, two meetings of an hour per day look good to me. >>>> >>>> Shall we prioritize the list of the issues? Maybe that some issues >>>> could be packed together. >>>> I would not be against having a rough schedule on the wiki >>>> directly and I think that having it before Monday could be better. >>>> >>>> Let me know how I can help. >>>> >>>> On Tue, 19 Feb 2019 at 22:23, Andreas Mueller >>> > wrote: >>>> >>>> Yeah, sounds good. >>>> I didn't want to unilaterally post a schedule, but doing some >>>> google form or similar seems a bit heavy-handed? >>>> Not sure if Guillaume had ideas about the schedule, given that >>>> he seems to be running the show? >>>> >>>> On 2/19/19 4:17 PM, Joel Nothman wrote: >>>>> I don't think optics requires a large meeting, just a few >>>>> people. >>>>> >>>>> I'm happy with your proposal generally, Andy. Do we schedule >>>>> specific topics at this point? 
>>>>> >>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> >>>> >>>> -- >>>> Guillaume Lemaitre >>>> INRIA Saclay - Parietal team >>>> Center for Data Science Paris-Saclay >>>> https://glemaitre.github.io/ >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From t3kcit at gmail.com Wed Feb 20 11:33:50 2019 From: t3kcit at gmail.com (Andreas Mueller) Date: Wed, 20 Feb 2019 11:33:50 -0500 Subject: [scikit-learn] Sprint discussion points? In-Reply-To: References: <8498bf4d-83da-08d5-9e04-35d9c534199b@gmail.com> <455b2ad2-d978-dad6-653a-99da5b9e9e16@gmail.com> <071d9f57-e8c7-5dab-ec9f-ecdbe3f9b897@gmail.com> <5e3a5a31-20bb-2186-081c-10d2593d82dd@gmail.com> <9c07e172-20bc-b8dd-e1c6-0e956d40b5d9@gmail.com> Message-ID: Sure, we can change it up on Tuesday. I agree having things that we can implement during the week would be good. I was actually kind of optimistic and was hoping we could make some dent into the freezing, and the convergence issues might be less controversial and more a technical challenge. 
I would like to be part of the fit_transform discussion but I don't have to be.
Certainly I would want to discuss the grid-search stuff with Joel being present.

On 2/20/19 1:33 AM, Roman Yurchak via scikit-learn wrote:
> Thanks for putting the draft schedule together!
>
> Personally I will be there 3 days out of 5 and wouldn't want to miss the
> discussion on euclidean distance issues. Maybe we could adjust the
> schedule during the sprint (say on Tuesday) based on people's interest
> and availability? That might be easier than trying to figure it out for
> 29 participants over email..
>
> Also IMO it would makes sense to have some discussions (that are not
> that controversial or about high level API but still useful) earlier
> during the week to be able to work on them during the sprint.

From g.lemaitre58 at gmail.com Wed Feb 20 12:12:01 2019
From: g.lemaitre58 at gmail.com (Guillaume Lemaître)
Date: Wed, 20 Feb 2019 18:12:01 +0100
Subject: [scikit-learn] Sprint discussion points?
In-Reply-To: <6f38bda9-77e8-06c5-5964-bd6aed982075@gmail.com>
References: <8498bf4d-83da-08d5-9e04-35d9c534199b@gmail.com> <455b2ad2-d978-dad6-653a-99da5b9e9e16@gmail.com> <071d9f57-e8c7-5dab-ec9f-ecdbe3f9b897@gmail.com> <5e3a5a31-20bb-2186-081c-10d2593d82dd@gmail.com> <9c07e172-20bc-b8dd-e1c6-0e956d40b5d9@gmail.com> <6f38bda9-77e8-06c5-5964-bd6aed982075@gmail.com>
Message-ID:

@Andy You were the one contacting Travis.

On Wed, 20 Feb 2019 at 17:23, Andreas Mueller wrote:
> Thanks for bringing that up.
> Did I email travis last time?
>
> We should also follow up with Microsoft as they promised unlimited
> builds...

--
Guillaume Lemaitre
INRIA Saclay - Parietal team
Center for Data Science Paris-Saclay
https://glemaitre.github.io/

From t3kcit at gmail.com Wed Feb 20 15:33:36 2019
From: t3kcit at gmail.com (Andreas Mueller)
Date: Wed, 20 Feb 2019 15:33:36 -0500
Subject: [scikit-learn] Sprint discussion points?
In-Reply-To:
References: <8498bf4d-83da-08d5-9e04-35d9c534199b@gmail.com> <455b2ad2-d978-dad6-653a-99da5b9e9e16@gmail.com> <071d9f57-e8c7-5dab-ec9f-ecdbe3f9b897@gmail.com> <5e3a5a31-20bb-2186-081c-10d2593d82dd@gmail.com> <9c07e172-20bc-b8dd-e1c6-0e956d40b5d9@gmail.com> <6f38bda9-77e8-06c5-5964-bd6aed982075@gmail.com>
Message-ID:

I messaged them and also tasked Thomas Fan with working with Microsoft to set up Azure Pipelines.

On 2/20/19 12:12 PM, Guillaume Lemaître wrote:
> @Andy You were the one contacting Travis.

From satra at mit.edu Wed Feb 20 16:28:14 2019
From: satra at mit.edu (Satrajit Ghosh)
Date: Wed, 20 Feb 2019 16:28:14 -0500
Subject: [scikit-learn] VOTE: scikit-learn governance document
In-Reply-To:
References: <20190211005345.0B5B146400A2@webmail.sinamail.sina.com.cn> <20190211084756.3jttddocqrxwrli2@phare.normalesup.org> <5418d593-d0ad-4ae3-1eab-ae7f91766607@gmail.com> <90C4CA07-2C37-45CF-9037-33ED17DEC088@gmail.com>
Message-ID:

+1

cheers,
satra

From gael.varoquaux at normalesup.org Wed Feb 20 16:40:37 2019
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Wed, 20 Feb 2019 22:40:37 +0100
Subject: [scikit-learn] Sprint discussion points?
In-Reply-To: <9c07e172-20bc-b8dd-e1c6-0e956d40b5d9@gmail.com>
References: <9708f3ba-af9f-c7f7-3613-55ad9fd982d2@gmail.com> <455b2ad2-d978-dad6-653a-99da5b9e9e16@gmail.com> <071d9f57-e8c7-5dab-ec9f-ecdbe3f9b897@gmail.com> <5e3a5a31-20bb-2186-081c-10d2593d82dd@gmail.com> <9c07e172-20bc-b8dd-e1c6-0e956d40b5d9@gmail.com>
Message-ID: <20190220214037.k5yzd7rjpgo7767p@phare.normalesup.org>

On Tue, Feb 19, 2019 at 06:16:20PM -0500, Andreas Mueller wrote:
> I put a draft schedule here:
> https://github.com/scikit-learn/scikit-learn/wiki/Upcoming-events#technical-discussions-schedule

I'd like to discuss sample_props. They are important to me.

Should I add them somewhere on the schedule? Maybe in a place where people who care about them (AFAIK Joel and Alex also do) are available?

Gaël

--
Gael Varoquaux
Senior Researcher, INRIA Parietal
NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France
Phone: ++ 33-1-69-08-79-68
http://gael-varoquaux.info
http://twitter.com/GaelVaroquaux

From t3kcit at gmail.com Wed Feb 20 17:12:23 2019
From: t3kcit at gmail.com (Andreas Mueller)
Date: Wed, 20 Feb 2019 17:12:23 -0500
Subject: [scikit-learn] Sprint discussion points?
In-Reply-To: <20190220214037.k5yzd7rjpgo7767p@phare.normalesup.org>
References: <9708f3ba-af9f-c7f7-3613-55ad9fd982d2@gmail.com> <455b2ad2-d978-dad6-653a-99da5b9e9e16@gmail.com> <071d9f57-e8c7-5dab-ec9f-ecdbe3f9b897@gmail.com> <5e3a5a31-20bb-2186-081c-10d2593d82dd@gmail.com> <9c07e172-20bc-b8dd-e1c6-0e956d40b5d9@gmail.com> <20190220214037.k5yzd7rjpgo7767p@phare.normalesup.org>
Message-ID:

On 2/20/19 4:40 PM, Gael Varoquaux wrote:
> On Tue, Feb 19, 2019 at 06:16:20PM -0500, Andreas Mueller wrote:
>> I put a draft schedule here:
>> https://github.com/scikit-learn/scikit-learn/wiki/Upcoming-events#technical-discussions-schedule
> I'd like to discuss sample_props. They are important to me.
>
> Should I add them somewhere on the schedule? Maybe in a place where
> people who care about them (AFAIK Joel and Alex also do) are available?

Sure, sounds like a plan. If they are discussed I'd like to be part of the discussion if possible given the complexity involved (and because I tried to implement it twice). But feel free to have it Monday without me if that works better.

From joel.nothman at gmail.com Thu Feb 21 02:40:34 2019
From: joel.nothman at gmail.com (Joel Nothman)
Date: Thu, 21 Feb 2019 18:40:34 +1100
Subject: [scikit-learn] Sprint discussion points?
In-Reply-To:
References: <9708f3ba-af9f-c7f7-3613-55ad9fd982d2@gmail.com> <455b2ad2-d978-dad6-653a-99da5b9e9e16@gmail.com> <071d9f57-e8c7-5dab-ec9f-ecdbe3f9b897@gmail.com> <5e3a5a31-20bb-2186-081c-10d2593d82dd@gmail.com> <9c07e172-20bc-b8dd-e1c6-0e956d40b5d9@gmail.com> <20190220214037.k5yzd7rjpgo7767p@phare.normalesup.org>
Message-ID:

@Hanmin are there particular conversations you are keen to take part in, and particular times that suit you?

From qinhanmin2005 at sina.com Thu Feb 21 04:01:56 2019
From: qinhanmin2005 at sina.com (Hanmin Qin)
Date: Thu, 21 Feb 2019 17:01:56 +0800
Subject: [scikit-learn] Sprint discussion points?
Message-ID: <20190221090156.5EA1946400A1@webmail.sinamail.sina.com.cn>

Thanks. I'll take part in the OPTICS discussion and I'd like to see it at 14:00, though 10:00 will also be acceptable. The core issue now is how to design the API (i.e., use multiple extraction methods without calculating the reachability and core distances (RD/CD) again), and how to deal with the mysterious additions in _extract_optics (see https://github.com/scikit-learn/scikit-learn/issues/12375). I'm unable to contact the original author, so I tend to follow the original paper and remove these additions. This will get rid of some parameters and make the interface much more friendly IMO.

For other issues, I don't think you need to consider my time. I'll comment on relevant issues if I have any thoughts.

Hanmin Qin

----- Original Message -----
From: Joel Nothman
To: Scikit-learn user and developer mailing list, Hanmin Qin
Subject: Re: [scikit-learn] Sprint discussion points?
Date: 2019-02-21 15:40

@Hanmin are there particular conversations you are keen to take part in, and particular times that suit you?

From pahome.chen at mirlab.org Sat Feb 23 05:35:58 2019
From: pahome.chen at mirlab.org (lampahome)
Date: Sat, 23 Feb 2019 18:35:58 +0800
Subject: [scikit-learn] Incremental learning but predict the older data not well?
Message-ID:

I tried to use SGDRegressor to train incrementally because I get new data every day.

But I found that when I train on 30 days of data and then predict the result for the 1st day, the result is very different.

When I then predict the 1st through 14th days, the results are all the same value, even though the actual result for each day is different and shows multiple patterns.

Is that correct? Or is the training data just too small?
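
For readers following along, here is a minimal sketch of the incremental setup described in the question, on synthetic data standing in for the real daily batches. The names `make_day` and `true_coef` and the data itself are made up for illustration. SGD-based estimators are sensitive to feature scaling, so the sketch also standardizes features incrementally; whether scaling has anything to do with the constant predictions reported above is only a guess.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)

# Hypothetical ground truth used only to generate toy data.
true_coef = np.array([2.0, -1.0, 0.5])


def make_day(n_samples=50):
    """Return one synthetic 'day' of data (an assumption, not real data)."""
    X = rng.normal(loc=50.0, scale=10.0, size=(n_samples, 3))
    y = X @ true_coef + rng.normal(scale=0.1, size=n_samples)
    return X, y


scaler = StandardScaler()
model = SGDRegressor(random_state=0)

days = [make_day() for _ in range(30)]
for X_day, y_day in days:
    # Update the running mean/variance, then take one SGD pass over the batch.
    # partial_fit keeps the coefficients learned from earlier days.
    scaler.partial_fit(X_day)
    model.partial_fit(scaler.transform(X_day), y_day)

# Predict the first day again with the model trained on all 30 days.
X_first, y_first = days[0]
pred_first = model.predict(scaler.transform(X_first))
```

Note that early batches are transformed with less accurate scaling statistics than later ones, and a single pass per day may not be enough for the model to converge, so comparing `pred_first` with `y_first` is a reasonable first diagnostic.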
thx

From joel.nothman at gmail.com Sat Feb 23 20:55:08 2019
From: joel.nothman at gmail.com (Joel Nothman)
Date: Sun, 24 Feb 2019 12:55:08 +1100
Subject: [scikit-learn] Sprint discussion points?
In-Reply-To: <20190221090156.5EA1946400A1@webmail.sinamail.sina.com.cn>
References: <20190221090156.5EA1946400A1@webmail.sinamail.sina.com.cn>
Message-ID:

Something else worth discussing might be the maintenance of scikit-learn-contrib

From venkataraman.bal at gmail.com Sun Feb 24 00:31:51 2019
From: venkataraman.bal at gmail.com (Venkataraman B)
Date: Sun, 24 Feb 2019 00:31:51 -0500
Subject: [scikit-learn] Fit and predict method
Message-ID:

Hi, I had a question on the predict and fit methods.

The fit method is used to build the model, i.e. classifier.fit(X, y). But when the predict method is called, the model that was built is never passed; you only pass the test set. So what model does the predict function use to predict the output?

I am picking up Python after working in R, and the predict function in R made more sense to me because the model that was built is passed along with the test set to be predicted.

Any response would be greatly appreciated
--
Regards,
Venkataraman B

From zephyr14 at gmail.com Sun Feb 24 04:53:07 2019
From: zephyr14 at gmail.com (Vlad Niculae)
Date: Sun, 24 Feb 2019 09:53:07 +0000
Subject: [scikit-learn] Fit and predict method
In-Reply-To:
References:
Message-ID:

Hi,

The `classifier` object in your code _is_ the model. In other words, after `fit`, the classifier object will have some new attributes (for instance `classifier.coef_` in the case of linear models), which are used to make predictions when you call `predict`.
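
The point about state living on the object can be illustrated with a toy class. This is a sketch only: `TinyLinearModel` is a hypothetical pure-Python stand-in, not a scikit-learn estimator, and it borrows only the convention of storing fitted values in attributes with a trailing underscore.

```python
class TinyLinearModel:
    """Toy 1-D least-squares model mimicking the fit/predict convention."""

    def fit(self, x, y):
        n = len(x)
        mean_x = sum(x) / n
        mean_y = sum(y) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
        # The learned parameters are stored on the object itself.
        self.coef_ = sxy / sxx
        self.intercept_ = mean_y - self.coef_ * mean_x
        return self

    def predict(self, x):
        # predict needs no separate model argument: it reads the state
        # that fit stored on self.
        return [self.coef_ * xi + self.intercept_ for xi in x]


model = TinyLinearModel()
model.fit([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])  # data follow y = 2x + 1
predictions = model.predict([4.0, 5.0])
```

In R terms, `model.predict(X_test)` plays roughly the role of `predict(model, X_test)`: the fitted model is still passed, just as the object the method is called on.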
Hope this helps, Vlad On Sun, Feb 24, 2019, 05:34 Venkataraman B wrote: > Hi, I had a question on the predict and fit methods > > The fit method is used to build the model ie classifier.fit(X,y). But when > the predict method is called the model that is built is never passed. You > only pass the test set. So what model does the predict function use to > predict the output > > I am picking python after working on R and the predict function in R made > more sense because the model that was built is passed along with the test > set that has to be predicted > > Any response would be greatly appreciated > -- > Regards, Venkataraman B > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From venkataraman.bal at gmail.com Sun Feb 24 08:46:30 2019 From: venkataraman.bal at gmail.com (Venkataraman B) Date: Sun, 24 Feb 2019 08:46:30 -0500 Subject: [scikit-learn] Fit and predict method In-Reply-To: References: Message-ID: Got it - thanks On Sun, Feb 24, 2019 at 4:54 AM Vlad Niculae wrote: > Hi, > > The `classifier` object in your code _is_ the model. In other words, after > `fit`, the classifier object will have some new attributes (for instance > `classifier.coef_` in the case of linear models), which are used to make > predictions when you call `predict`. > > Hope this helps, > Vlad > > On Sun, Feb 24, 2019, 05:34 Venkataraman B > wrote: > >> Hi, I had a question on the predict and fit methods >> >> The fit method is used to build the model ie classifier.fit(X,y). But >> when the predict method is called the model that is built is never passed. >> You only pass the test set. 
So what model does the predict function use to >> predict the output >> >> I am picking python after working on R and the predict function in R made >> more sense because the model that was built is passed along with the test >> set that has to be predicted >> >> Any response would be greatly appreciated >> -- >> Regards, Venkataraman B >> > _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -- Regards, Venkataraman B -------------- next part -------------- An HTML attachment was scrubbed... URL: From t3kcit at gmail.com Mon Feb 25 10:21:39 2019 From: t3kcit at gmail.com (Andreas Mueller) Date: Mon, 25 Feb 2019 10:21:39 -0500 Subject: [scikit-learn] Sprint discussion points? In-Reply-To: <20190220214037.k5yzd7rjpgo7767p@phare.normalesup.org> References: <9708f3ba-af9f-c7f7-3613-55ad9fd982d2@gmail.com> <455b2ad2-d978-dad6-653a-99da5b9e9e16@gmail.com> <071d9f57-e8c7-5dab-ec9f-ecdbe3f9b897@gmail.com> <5e3a5a31-20bb-2186-081c-10d2593d82dd@gmail.com> <9c07e172-20bc-b8dd-e1c6-0e956d40b5d9@gmail.com> <20190220214037.k5yzd7rjpgo7767p@phare.normalesup.org> Message-ID: <068eb65e-6a0f-b807-c3f6-930f61a0f45e@gmail.com> One other topic that I kind of forgot about are keyword-only arguments. I like the idea of the decorator that I proposed but Joel didn't like it, I think ;) We might want to think about other Python3 features like type annotations as well. On 2/20/19 4:40 PM, Gael Varoquaux wrote: > On Tue, Feb 19, 2019 at 06:16:20PM -0500, Andreas Mueller wrote: >> I put a draft schedule here: >> https://github.com/scikit-learn/scikit-learn/wiki/Upcoming-events#technical-discussions-schedule > I'd like to discuss sample_props. They are important to me. > > Should I add them somewhere on the schedule? 
Maybe in a place where > people who care about them (AFAIK Joel and Alex also do) are available? > > Ga?l > >> it's obviously somewhat opinionated ;) >> Happy to reprioritize. >> Basically I wouldn't like to miss any of the big API discussions because coming >> late to the party. >> The two things on (grid?) searches are somewhat related: one is about >> specifying search-spaces, the other about executing a given search space >> efficiently. They probably warrant separate discussions. >> I haven't added plotting or sample props on it, which are maybe two other >> discussion points. >> I tried to cover most controversial things from the roadmap. >> Not sure if discussing the schedule via the mailing list is the best way? Don't >> want to create even more traffic? than I already am ;) >> On 2/19/19 5:48 PM, Guillaume Lema?tre wrote: >> > Not sure if Guillaume had ideas about the schedule, given that he seems >> to be running the show? >> Mostly running behind the show ... >> For the moment, we only have a 30 minutes presentation of introduction >> planned on Monday. >> For the rest of the week, this is pretty much opened and I think that we >> can propose a schedule such that we can be efficient. >> IMO, two meetings of an hour per day look good to me. >> Shall we prioritize the list of the issues? Maybe that some issues could be >> packed together. >> I would not be against having a rough schedule on the wiki directly and I >> think that having it before Monday could be better. >> Let me know how I can help. >> On Tue, 19 Feb 2019 at 22:23, Andreas Mueller wrote: >> Yeah, sounds good. >> I didn't want to unilaterally post a schedule, but doing some google >> form or similar seems a bit heavy-handed? >> Not sure if Guillaume had ideas about the schedule, given that he seems >> to be running the show? >> On 2/19/19 4:17 PM, Joel Nothman wrote: >> I don't think optics requires a large meeting, just a few people. >> I'm happy with your proposal generally, Andy. 
>> Do we schedule specific topics at this point?

From joel.nothman at gmail.com  Mon Feb 25 10:45:15 2019
From: joel.nothman at gmail.com (Joel Nothman)
Date: Mon, 25 Feb 2019 16:45:15 +0100
Subject: [scikit-learn] Sprint discussion points?
In-Reply-To: <068eb65e-6a0f-b807-c3f6-930f61a0f45e@gmail.com>
References: <9708f3ba-af9f-c7f7-3613-55ad9fd982d2@gmail.com>
 <455b2ad2-d978-dad6-653a-99da5b9e9e16@gmail.com>
 <071d9f57-e8c7-5dab-ec9f-ecdbe3f9b897@gmail.com>
 <5e3a5a31-20bb-2186-081c-10d2593d82dd@gmail.com>
 <9c07e172-20bc-b8dd-e1c6-0e956d40b5d9@gmail.com>
 <20190220214037.k5yzd7rjpgo7767p@phare.normalesup.org>
 <068eb65e-6a0f-b807-c3f6-930f61a0f45e@gmail.com>
Message-ID:

I'm all for the decorator if you can get numpydoc working with it!

From t3kcit at gmail.com  Mon Feb 25 10:52:20 2019
From: t3kcit at gmail.com (Andreas Mueller)
Date: Mon, 25 Feb 2019 10:52:20 -0500
Subject: [scikit-learn] New Governance document accepted!
Message-ID: <5baa6edb-bd12-a799-e95d-4d4dbd744bbf@gmail.com>

Hey y'all.

It's my pleasure to announce that the new governance document has been
accepted by a core-dev vote. Out of the 49 eligible core-devs, 22 voted
"yes" on the mailing list and 4 voted "yes" on the issue tracker.
The remaining core devs did not vote.
You can find the document on GitHub and soon also on the dev website:
https://github.com/scikit-learn/scikit-learn/blob/master/doc/governance.rst

This also means we now have a Technical Committee to resolve technical
issues, which is (for now) Alexandre Gramfort, Olivier Grisel, Joel Nothman,
Hanmin Qin, Gaël Varoquaux, Roman Yurchak and myself.

As described in the governance document, we will also be reaching out to
any developers that haven't been actively involved in the project in the
past 12 months, asking if they want to transition to emeritus status,
recanting their commit and voting rights until they become active again.

If you're a current core dev and think you'd be more appropriately an
emeritus member, feel free to comment here:
https://github.com/scikit-learn/scikit-learn/issues/13257

Thanks, and hope to see you all soon at the sprint (online or in person).

Andy

From gael.varoquaux at normalesup.org  Tue Feb 26 04:46:45 2019
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Tue, 26 Feb 2019 10:46:45 +0100
Subject: [scikit-learn] Consultation: eligibility of inclusion of speed-up?
Message-ID: <20190226094645.ibs3dzugzmnvxerg@phare.normalesup.org>

I need the core devs' opinion (please, only core devs; I am sending this on
the public ML for transparency):

The following PR adds a speed-up for expansion of polynomial kernels:
https://github.com/scikit-learn/scikit-learn/pull/13003

According to the author, the speed-up is significant (needs to be
verified during a code review).

The paper is a bit below citation level for inclusion of a new method;
however, this can be seen as a speed-up of Nystrom. Strictly speaking, it
is not just a speed-up, as it introduces a new estimator.

The discussion on the PR is short and quickly reviews the relevant
literature.

My question: should we consider this as acceptable for inclusion
(provided that it does show significant speedups with good prediction
accuracy)?
I am asking to know if we start the review and inclusion process or not.

Cheers,

Gaël

From pahome.chen at mirlab.org  Tue Feb 26 05:16:41 2019
From: pahome.chen at mirlab.org (lampahome)
Date: Tue, 26 Feb 2019 18:16:41 +0800
Subject: [scikit-learn] What's are the advantages and disadvantages of incremental learning?
Message-ID:

Generally speaking, we all know that incremental learning is used to save
space. A question on Stack Overflow says the same.

But what are the disadvantages? From my experiments I know of two:

1. The training subsets *shouldn't be too small*. I prepared very small
   datasets and the prediction results were much worse.
2. When training for a very long time, some older behaviors are forgotten
   because of the many training epochs.

That is all from my experience training *xgboost* incrementally.
Is there anything else?

From jeremie.du-boisberranger at inria.fr  Tue Feb 26 05:22:21 2019
From: jeremie.du-boisberranger at inria.fr (Jeremie du Boisberranger)
Date: Tue, 26 Feb 2019 11:22:21 +0100
Subject: [scikit-learn] Sprint discussion points?
In-Reply-To:
References: <9708f3ba-af9f-c7f7-3613-55ad9fd982d2@gmail.com>
 <455b2ad2-d978-dad6-653a-99da5b9e9e16@gmail.com>
 <071d9f57-e8c7-5dab-ec9f-ecdbe3f9b897@gmail.com>
 <5e3a5a31-20bb-2186-081c-10d2593d82dd@gmail.com>
 <9c07e172-20bc-b8dd-e1c6-0e956d40b5d9@gmail.com>
 <20190220214037.k5yzd7rjpgo7767p@phare.normalesup.org>
 <068eb65e-6a0f-b807-c3f6-930f61a0f45e@gmail.com>
Message-ID:

I totally forgot to mention it before the sprint started, but I'd like to
have a discussion about the integration of a new benchmark suite into the
scikit-learn organization.

Essentially, I've been working on a benchmark suite for sklearn using the
airspeed velocity (asv) framework. The purpose of asv is to benchmark a
repo across commits.
It can be used, for instance, to detect regressions, both performance-wise
and memory-wise.

If you want to discuss it, let me know. I'm here the whole week.

Jérémie

From adrin.jalali at gmail.com  Tue Feb 26 05:28:42 2019
From: adrin.jalali at gmail.com (Adrin)
Date: Tue, 26 Feb 2019 11:28:42 +0100
Subject: [scikit-learn] Consultation: eligibility of inclusion of speed-up?
In-Reply-To: <20190226094645.ibs3dzugzmnvxerg@phare.normalesup.org>
References: <20190226094645.ibs3dzugzmnvxerg@phare.normalesup.org>
Message-ID:

To me the maintainability of the added code also plays a role, and this PR
is really nice and short in its implementation. It needs better
documentation (other than the plot_ file) to better demonstrate its
benefits; otherwise it looks reasonable to have it, IMO.

On Tue, Feb 26, 2019 at 11:11 AM Gael Varoquaux <gael.varoquaux at normalesup.org> wrote:
> I need core devs opinion (please, only core devs, I am sending this on
> the public ML for transparency):
>
> The following PR adds a speed up for expansion of polynomial kernels:
> https://github.com/scikit-learn/scikit-learn/pull/13003
>
> According to the author, the speed up is significant (needs to be
> verified during a code review).
>
> The paper is a bit below citation level for inclusion of a new method,
> however this can be seen as a speed up of Nystrom. Strictly speaking, it
> is not just a speed-up, as it introduces a new estimator.
>
> The discussion on the PR is short and quickly reviews the relevant
> literature.
>
> My question: should we consider this as acceptable for inclusion
> (provided that it does show significant speedups with good prediction
> accuracy)? I am asking to know if we start the review and inclusion
> process or not.
>
> Cheers,
>
> Gaël

From t3kcit at gmail.com  Tue Feb 26 05:48:01 2019
From: t3kcit at gmail.com (Andreas Mueller)
Date: Tue, 26 Feb 2019 11:48:01 +0100
Subject: [scikit-learn] Sprint discussion points?
In-Reply-To:
References: <9708f3ba-af9f-c7f7-3613-55ad9fd982d2@gmail.com>
 <455b2ad2-d978-dad6-653a-99da5b9e9e16@gmail.com>
 <071d9f57-e8c7-5dab-ec9f-ecdbe3f9b897@gmail.com>
 <5e3a5a31-20bb-2186-081c-10d2593d82dd@gmail.com>
 <9c07e172-20bc-b8dd-e1c6-0e956d40b5d9@gmail.com>
 <20190220214037.k5yzd7rjpgo7767p@phare.normalesup.org>
 <068eb65e-6a0f-b807-c3f6-930f61a0f45e@gmail.com>
Message-ID: <3d156be5-3ad2-1776-82fb-bbb41fdc7e24@gmail.com>

Was that the same that Vlad used?

https://github.com/scikit-learn/scikit-learn-speed

We might want to just replace that, given that it hasn't been touched in
7 years?

On 2/26/19 5:22 AM, Jeremie du Boisberranger wrote:
> I totally forgot to mention it before the sprint started but i'd like
> to have a discussion about the integration of a new benchmark suite
> into the scikit-learn organization.
>
> Essentially, I've been working on a benchmark suite for sklearn using
> the airspeed velocity (asv) framework. The purpose of asv is to
> benchmark a repo across commits. It can be used for instance to detect
> regressions, performance wise and memory wise.
>
> If you want to discuss it, let me know. I'm here the whole week.
>
> Jérémie

From jeremie.du-boisberranger at inria.fr  Tue Feb 26 06:05:38 2019
From: jeremie.du-boisberranger at inria.fr (Jeremie du Boisberranger)
Date: Tue, 26 Feb 2019 12:05:38 +0100
Subject: [scikit-learn] Sprint discussion points?
In-Reply-To: <3d156be5-3ad2-1776-82fb-bbb41fdc7e24@gmail.com>
References: <9708f3ba-af9f-c7f7-3613-55ad9fd982d2@gmail.com>
 <455b2ad2-d978-dad6-653a-99da5b9e9e16@gmail.com>
 <071d9f57-e8c7-5dab-ec9f-ecdbe3f9b897@gmail.com>
 <5e3a5a31-20bb-2186-081c-10d2593d82dd@gmail.com>
 <9c07e172-20bc-b8dd-e1c6-0e956d40b5d9@gmail.com>
 <20190220214037.k5yzd7rjpgo7767p@phare.normalesup.org>
 <068eb65e-6a0f-b807-c3f6-930f61a0f45e@gmail.com>
 <3d156be5-3ad2-1776-82fb-bbb41fdc7e24@gmail.com>
Message-ID: <9f478c05-214e-0c06-7ca9-239ee4d9bd5b@inria.fr>

Not the same, although there are similarities. However, asv provides tools
to compare benchmarks across commits, and to publish them in HTML format to
follow their evolution through time, such as
https://pv.github.io/numpy-bench/

Here's the link to the benchmark suite:
https://github.com/jeremiedbb/scikit-learn_benchmarks

On 26/02/2019 11:48, Andreas Mueller wrote:
> Was that the same that Vlad used?
>
> https://github.com/scikit-learn/scikit-learn-speed
>
> We might want to just replace that, given that it hasn't been touched
> in 7 years?
>
> On 2/26/19 5:22 AM, Jeremie du Boisberranger wrote:
>> I totally forgot to mention it before the sprint started but i'd like
>> to have a discussion about the integration of a new benchmark suite
>> into the scikit-learn organization.
>>
>> Essentially, I've been working on a benchmark suite for sklearn using
>> the airspeed velocity (asv) framework. The purpose of asv is to
>> benchmark a repo across commits. It can be used for instance to
>> detect regressions, performance wise and memory wise.
>>
>> If you want to discuss it, let me know. I'm here the whole week.
>>
>> Jérémie

From joel.nothman at gmail.com  Tue Feb 26 17:07:32 2019
From: joel.nothman at gmail.com (Joel Nothman)
Date: Tue, 26 Feb 2019 23:07:32 +0100
Subject: [scikit-learn] Sprint discussion points?
In-Reply-To: <9f478c05-214e-0c06-7ca9-239ee4d9bd5b@inria.fr>
References: <9708f3ba-af9f-c7f7-3613-55ad9fd982d2@gmail.com>
 <455b2ad2-d978-dad6-653a-99da5b9e9e16@gmail.com>
 <071d9f57-e8c7-5dab-ec9f-ecdbe3f9b897@gmail.com>
 <5e3a5a31-20bb-2186-081c-10d2593d82dd@gmail.com>
 <9c07e172-20bc-b8dd-e1c6-0e956d40b5d9@gmail.com>
 <20190220214037.k5yzd7rjpgo7767p@phare.normalesup.org>
 <068eb65e-6a0f-b807-c3f6-930f61a0f45e@gmail.com>
 <3d156be5-3ad2-1776-82fb-bbb41fdc7e24@gmail.com>
 <9f478c05-214e-0c06-7ca9-239ee4d9bd5b@inria.fr>
Message-ID:

What do you think needs to be raised for discussion?

On Tue., 26 Feb. 2019, 12:06 pm Jeremie du Boisberranger, <jeremie.du-boisberranger at inria.fr> wrote:
> Not the same, although there are similarities. However asv provides
> tools to compare benchmarks across commits, and to publish them in html
> format to follow their evolution through time such as
> https://pv.github.io/numpy-bench/
>
> Here's the link of the benchmark suite :
> https://github.com/jeremiedbb/scikit-learn_benchmarks
>
> On 26/02/2019 11:48, Andreas Mueller wrote:
> > Was that the same that Vlad used?
> >
> > https://github.com/scikit-learn/scikit-learn-speed
> >
> > We might want to just replace that, given that it hasn't been touched
> > in 7 years?
> >
> > On 2/26/19 5:22 AM, Jeremie du Boisberranger wrote:
> >> I totally forgot to mention it before the sprint started but i'd like
> >> to have a discussion about the integration of a new benchmark suite
> >> into the scikit-learn organization.
> >>
> >> Essentially, I've been working on a benchmark suite for sklearn using
> >> the airspeed velocity (asv) framework. The purpose of asv is to
> >> benchmark a repo across commits. It can be used for instance to
> >> detect regressions, performance wise and memory wise.
> >>
> >> If you want to discuss it, let me know. I'm here the whole week.
> >>
> >> Jérémie

From bertrand.thirion at inria.fr  Wed Feb 27 17:33:07 2019
From: bertrand.thirion at inria.fr (bthirion)
Date: Wed, 27 Feb 2019 23:33:07 +0100
Subject: [scikit-learn] Consultation: eligibility of inclusion of speed-up?
In-Reply-To: <20190226094645.ibs3dzugzmnvxerg@phare.normalesup.org>
References: <20190226094645.ibs3dzugzmnvxerg@phare.normalesup.org>
Message-ID:

My understanding is that this is a rather well-grounded and light-weight
sketching technique that fits well in sklearn. +1 for me.

But yes, I remember that we have enforced the 200-citation rule quite
strictly in the past.
Best,

B

On 26/02/2019 10:46, Gael Varoquaux wrote:
> I need core devs opinion (please, only core devs, I am sending this on
> the public ML for transparency):
>
> The following PR adds a speed up for expansion of polynomial kernels:
> https://github.com/scikit-learn/scikit-learn/pull/13003
>
> According to the author, the speed up is significant (needs to be
> verified during a code review).
>
> The paper is a bit below citation level for inclusion of a new method,
> however this can be seen as a speed up of Nystrom. Strictly speaking, it
> is not just a speed-up, as it introduces a new estimator.
>
> The discussion on the PR is short and quickly reviews the relevant
> literature.
>
> My question: should we consider this as acceptable for inclusion
> (provided that it does show significant speedups with good prediction
> accuracy)? I am asking to know if we start the review and inclusion
> process or not.
>
> Cheers,
>
> Gaël
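
Editorial note on the consultation thread: the proposal is described as a faster alternative to Nystrom for polynomial kernels. The block below is a minimal sketch of that existing baseline only, not of the PR's code: scikit-learn's `Nystroem` transformer with a polynomial kernel, feeding a linear classifier. The synthetic data, `n_components=50`, `degree=2`, and the choice of `SGDClassifier` are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

# Synthetic data (assumption): the label depends on a degree-2 interaction,
# so a linear model on the raw features cannot separate the classes well.
rng = np.random.RandomState(0)
X = rng.randn(200, 10)
y = (X[:, 0] * X[:, 1] > 0).astype(int)

# Approximate the feature map of the kernel (gamma * <x, y> + coef0) ** degree
# with 50 components instead of expanding every polynomial term explicitly.
feature_map = Nystroem(kernel="poly", degree=2, coef0=1,
                       n_components=50, random_state=0)
Z = feature_map.fit_transform(X)  # shape: (n_samples, n_components)

# A linear model trained on the approximate features behaves like a
# kernel method while keeping the cost of linear training.
clf = make_pipeline(
    Nystroem(kernel="poly", degree=2, coef0=1,
             n_components=50, random_state=0),
    SGDClassifier(max_iter=1000, tol=1e-3, random_state=0),
)
clf.fit(X, y)
train_accuracy = clf.score(X, y)
```

A review of the proposed estimator would presumably time `fit_transform` for both transformers at the same `n_components` and compare downstream accuracy, which is the check the thread says still needs to be done.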