From reshama.stat at gmail.com Thu Dec 7 11:15:09 2023
From: reshama.stat at gmail.com (Reshama Shaikh)
Date: Thu, 7 Dec 2023 11:15:09 -0500
Subject: [scikit-learn] =?utf-8?q?video=3A_scikit-learn=27s_Past=2C_Prese?= =?utf-8?q?nt_and_Future_=E2=80=94_with_scikit-learn_co-founder_Dr?= =?utf-8?q?=2E_Ga=C3=ABl_Varoquaux?=
Message-ID:

Hello,

In this episode, Gaël details:
• The genesis, present capabilities and fast-moving future direction of scikit-learn.
• How to best apply scikit-learn to your particular ML problem.
• How ever-larger datasets and GPU-based accelerations impact the scikit-learn project.
• How (whether you write code or not!) you can get started on contributing to a mega-impactful open-source project like scikit-learn yourself.
• Hugely successful social-impact data projects his Soda lab has had recently.
• Why statistical rigor is more important than ever and how software tools could nudge us in the direction of making more statistically sound decisions.

VIDEO interview: https://www.jonkrohn.com/posts/2023/12/5/scikit-learns-past-present-and-future-with-scikit-learn-co-founder-dr-gal-varoquaux

----
Best,
Reshama Shaikh

From gael.varoquaux at normalesup.org Thu Dec 7 11:29:22 2023
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Thu, 7 Dec 2023 17:29:22 +0100
Subject: [scikit-learn] =?utf-8?q?video=3A_scikit-learn=27s_Past=2C_Prese?= =?utf-8?q?nt_and_Future_=E2=80=94_with_scikit-learn_co-founder_Dr=2E_Ga?= =?utf-8?q?=C3=ABl_Varoquaux?=
In-Reply-To:
References:
Message-ID: <20231207162922.rodqrsft4mum4exf@gaellaptop>

Hi Reshama,

Thanks for putting me in contact with Jon, it was a great experience.

I haven't had time to listen to the video myself. I hope that I haven't said too much nonsense :$

G

On Thu, Dec 07, 2023 at 11:15:09AM -0500, Reshama Shaikh wrote:
> Hello,
> In this episode, Gaël details:
> • The genesis, present capabilities and fast-moving future direction of scikit-learn.
> • How to best apply scikit-learn to your particular ML problem.
> • How ever-larger datasets and GPU-based accelerations impact the scikit-learn project.
> • How (whether you write code or not!) you can get started on contributing to a mega-impactful open-source project like scikit-learn yourself.
> • Hugely successful social-impact data projects his Soda lab has had recently.
> • Why statistical rigor is more important than ever and how software tools could nudge us in the direction of making more statistically sound decisions.
> VIDEO interview: https://www.jonkrohn.com/posts/2023/12/5/scikit-learns-past-present-and-future-with-scikit-learn-co-founder-dr-gal-varoquaux
> ----
> Best,
> Reshama Shaikh
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

--
Gael Varoquaux
Research Director, INRIA
http://gael-varoquaux.info    http://twitter.com/GaelVaroquaux

From gael.varoquaux at normalesup.org Mon Dec 18 12:49:31 2023
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Mon, 18 Dec 2023 18:49:31 +0100
Subject: [scikit-learn] Announcing skrub: Prepping tables for machine learning
Message-ID: <20231218174931.cgbvzk4rtljfeypt@gaellaptop>

Hi everyone,

We are very happy to announce the first release of a new package called "skrub". Its goal is to facilitate data preparation from tables to machine learning with an API similar to that of scikit-learn.

https://skrub-data.org

The most useful tool in the short term is the "TableVectorizer", which applies a bunch of heuristics to turn a complex table into a good data representation for learning (for instance encoding dates, or strings).
Combined with scikit-learn's HistGradientBoosting, it gives a strong baseline for most tabular learning settings without data massaging:

    from sklearn.ensemble import HistGradientBoostingRegressor
    from sklearn.pipeline import make_pipeline
    from skrub import TableVectorizer

    # X: table of features (e.g. a pandas DataFrame), y: target values
    pipeline = make_pipeline(TableVectorizer(), HistGradientBoostingRegressor())
    pipeline.fit(X, y)

In the longer term, skrub will enable assembling full data processing pipelines across multiple tables that can be cross-validated with scikit-learn and one day put in production: joining, aggregation, and transformation to build models directly from the original tables and databases.

One example of such a pipeline can be seen here:
https://skrub-data.org/stable/auto_examples/08_join_aggregation.html#chaining-everything-together-in-a-pipeline

But there is a lot that remains to be done, and the questions are quite open.

In my eyes, the dream is to bridge scikit-learn's API, which separates fit/transform (because it helps make robust and valid predictive pipelines), with dataframe/database operations. The goal is not to provide something as flexible as SQL or pandas, but to cover the most frequent use cases in machine learning, as explained here: https://skrub-data.org/stable/vision.html

Of course, skrub will be developed in the open, with an eye to quality, staying as lightweight as possible while still providing powerful tools. I hope that many will join this adventure!

Cheers,

Gaël

From fernando.wittmann at gmail.com Mon Dec 18 19:18:03 2023
From: fernando.wittmann at gmail.com (Fernando Marcos Wittmann)
Date: Mon, 18 Dec 2023 21:18:03 -0300
Subject: [scikit-learn] Announcing skrub: Prepping tables for machine learning
In-Reply-To: <20231218174931.cgbvzk4rtljfeypt@gaellaptop>
References: <20231218174931.cgbvzk4rtljfeypt@gaellaptop>
Message-ID:

Very strong baseline indeed.
Did a quick check with the Ames housing dataset:
https://colab.research.google.com/drive/1RVVl_R5X3YYC7kj-B9uI5Fq7-SCYhYnD?usp=sharing

Thanks all for the contribution!

On Mon, Dec 18, 2023 at 2:49 PM Gael Varoquaux <gael.varoquaux at normalesup.org> wrote:

> Hi everyone,
>
> We are very happy to announce the first release of a new package called "skrub". Its goal is to facilitate data preparation from tables to machine learning with an API similar to that of scikit-learn.
>
> https://skrub-data.org
>
> The most useful tool in the short term is the "TableVectorizer", which applies a bunch of heuristics to turn a complex table into a good data representation for learning (for instance encoding dates, or strings). Combined with scikit-learn's HistGradientBoosting, it gives a strong baseline for most tabular learning settings without data massaging:
>
>     from sklearn.ensemble import HistGradientBoostingRegressor
>     from sklearn.pipeline import make_pipeline
>     from skrub import TableVectorizer
>
>     # X: table of features (e.g. a pandas DataFrame), y: target values
>     pipeline = make_pipeline(TableVectorizer(), HistGradientBoostingRegressor())
>     pipeline.fit(X, y)
>
> In the longer term, skrub will enable assembling full data processing pipelines across multiple tables that can be cross-validated with scikit-learn and one day put in production: joining, aggregation, and transformation to build models directly from the original tables and databases.
>
> One example of such a pipeline can be seen here:
> https://skrub-data.org/stable/auto_examples/08_join_aggregation.html#chaining-everything-together-in-a-pipeline
>
> But there is a lot that remains to be done, and the questions are quite open.
>
> In my eyes, the dream is to bridge scikit-learn's API, which separates fit/transform (because it helps make robust and valid predictive pipelines), with dataframe/database operations.
> The goal is not to provide something as flexible as SQL or pandas, but to cover the most frequent use cases in machine learning, as explained here: https://skrub-data.org/stable/vision.html
>
> Of course, skrub will be developed in the open, with an eye to quality, staying as lightweight as possible while still providing powerful tools. I hope that many will join this adventure!
>
> Cheers,
>
> Gaël
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

From gael.varoquaux at normalesup.org Tue Dec 19 03:10:13 2023
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Tue, 19 Dec 2023 09:10:13 +0100
Subject: [scikit-learn] Announcing skrub: Prepping tables for machine learning
In-Reply-To:
References: <20231218174931.cgbvzk4rtljfeypt@gaellaptop>
Message-ID: <20231219081013.te4o6dxmn5xwibif@gaellaptop>

Thanks for trying and for the feedback, Fernando!

I had not tried it on the Ames housing dataset. I just had a quick look at it, and I think that with the recent improvements in scikit-learn (namely native support for categorical columns), we are going to be able to have even better behavior out of the box.

Cheers,

Gaël

On Mon, Dec 18, 2023 at 09:18:03PM -0300, Fernando Marcos Wittmann wrote:
> Very strong baseline indeed. Did a quick check with the Ames housing dataset:
> https://colab.research.google.com/drive/1RVVl_R5X3YYC7kj-B9uI5Fq7-SCYhYnD?usp=sharing
> Thanks all for the contribution!
> On Mon, Dec 18, 2023 at 2:49 PM Gael Varoquaux wrote:
> Hi everyone,
> We are very happy to announce the first release of a new package called "skrub". Its goal is to facilitate data preparation from tables to machine learning with an API similar to that of scikit-learn.
> [...]
> Cheers,
> Gaël
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

--
Gael Varoquaux
Research Director, INRIA
http://gael-varoquaux.info    http://twitter.com/GaelVaroquaux

From jeremie.du-boisberranger at inria.fr Wed Dec 20 15:11:13 2023
From: jeremie.du-boisberranger at inria.fr (Jeremie du Boisberranger)
Date: Wed, 20 Dec 2023 21:11:13 +0100
Subject: [scikit-learn] [ANN] scikit-learn 1.4.0rc1 is online ! Please test
In-Reply-To:
References: <5f1f099d-371a-3127-8bff-dc51d6439cce@inria.fr>
Message-ID:

Hi everyone,

Please help us test the first release candidate for scikit-learn 1.4:

    pip install scikit-learn==1.4.0rc1

Changelog: https://scikit-learn.org/1.4/whats_new/v1.4.html

In particular, if you maintain a project with a dependency on scikit-learn, please let us know about any regression.

Thanks to everyone who contributed to this release!

Jérémie,
on behalf of the scikit-learn maintainer team.

From lorentzen.ch at gmail.com Thu Dec 21 14:20:15 2023
From: lorentzen.ch at gmail.com (Christian Lorentzen)
Date: Thu, 21 Dec 2023 20:20:15 +0100
Subject: [scikit-learn] [ANN] scikit-learn 1.4.0rc1 is online ! Please test
In-Reply-To:
References:
Message-ID:

Thanks Jérémie! The RC makes testing the upcoming release indeed much easier.

Christian

> On 20.12.2023 at 21:12, Jeremie du Boisberranger wrote:
>
> Hi everyone,
>
> Please help us test the first release candidate for scikit-learn 1.4:
>
>     pip install scikit-learn==1.4.0rc1
>
> Changelog: https://scikit-learn.org/1.4/whats_new/v1.4.html
>
> In particular, if you maintain a project with a dependency on scikit-learn, please let us know about any regression.
>
> Thanks to everyone who contributed to this release!
>
> Jérémie,
> on behalf of the scikit-learn maintainer team.
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
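
For downstream maintainers answering the call to test the release candidate, one small thing worth checking before reporting a regression is that the test environment really picked up the pre-release rather than the latest stable version. The following is a minimal sketch and not part of the original announcement: the helper names `is_release_candidate` and `installed_version` are hypothetical, and the regular expression only covers simple version strings of the form `X.Y.ZrcN`, not the full PEP 440 grammar.

```python
import re
from importlib import metadata

# Simplified pattern for versions such as "1.4.0rc1". This is an
# assumption made for illustration, not a complete PEP 440 parser.
_RC_PATTERN = re.compile(r"\d+(?:\.\d+)*rc\d+")


def is_release_candidate(version: str) -> bool:
    """Return True if `version` looks like a release candidate."""
    return _RC_PATTERN.fullmatch(version) is not None


def installed_version(package: str = "scikit-learn"):
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("scikit-learn is not installed in this environment")
    elif is_release_candidate(version):
        print(f"Testing against release candidate {version}")
    else:
        print(f"Installed version {version} is not a release candidate")
```

Running the script at the start of a CI job makes it obvious from the log which version the test suite actually exercised, which is useful context to include in any regression report.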