From adrin.jalali at gmail.com Fri Feb 5 12:13:18 2021
From: adrin.jalali at gmail.com (Adrin)
Date: Fri, 5 Feb 2021 18:13:18 +0100
Subject: [scikit-learn] Crypto project to fund open source
In-Reply-To:
References:
Message-ID:

I just got this on the pandas-dev mailing list. Seems rather interesting!

---------- Forwarded message ---------
From: Marc Garcia
Date: Fri, Feb 5, 2021 at 2:21 PM
Subject: [Pandas-dev] Crypto project to fund open source
To: pandas-dev

Hi,

I've been contacted regarding a crypto project that aims at funding open source projects. Not sure about the details, but it seems that if we register pandas, we'll be getting funds as the crypto is mined (after users endorse pandas, I think).

I'm not so familiar with crypto or the project myself, but just in case anyone finds it interesting and wants to add pandas to it: https://devprotocol.xyz/

Cheers,
Marc
_______________________________________________
Pandas-dev mailing list
Pandas-dev at python.org
https://mail.python.org/mailman/listinfo/pandas-dev

From daniele at grinta.net Fri Feb 5 15:12:44 2021
From: daniele at grinta.net (Daniele Nicolodi)
Date: Fri, 5 Feb 2021 21:12:44 +0100
Subject: [scikit-learn] Crypto project to fund open source
In-Reply-To:
References:
Message-ID: <7324dfd0-272a-b919-f1be-eaaabee81d08@grinta.net>

On 05/02/2021 18:13, Adrin wrote:
> I just got this on the pandas-dev mailing list. Seems rather interesting!

It looks like an elaborate Ponzi scheme to me.

Cheers,
Dan

From loic.esteve at ymail.com Mon Feb 8 02:02:51 2021
From: loic.esteve at ymail.com (Loïc Estève)
Date: Mon, 08 Feb 2021 08:02:51 +0100
Subject: [scikit-learn] Github Discussions enabled for scikit-learn
References:
Message-ID:

Hi,

we enabled Github Discussions a few weeks ago.
You can find it at: https://github.com/scikit-learn/scikit-learn/discussions

For now we consider this experimental; we will be monitoring it to see how useful it can be. The main hope I personally have for it is to help build the scikit-learn community, in particular to get more scikit-learn users to answer user questions.

For more details about Github Discussions: https://docs.github.com/en/discussions

A useful option to be able to follow the scikit-learn Github Discussions activity:
- go to the scikit-learn repo page: https://github.com/scikit-learn/scikit-learn
- click the Watch button (top-right, left of the Star button), then choose Custom, then check Discussions

Cheers,
Loïc

From adrin.jalali at gmail.com Mon Feb 8 04:18:24 2021
From: adrin.jalali at gmail.com (Adrin)
Date: Mon, 8 Feb 2021 10:18:24 +0100
Subject: [scikit-learn] Crypto project to fund open source
In-Reply-To:
References:
Message-ID:

I had a chat with Guillaume and he raised the concern of energy consumption on blockchain platforms.

I had a little look, and realized this platform runs on Ethereum, which has a significantly lower energy footprint than Bitcoin, but still a rather high footprint. You can see the consumption charts through the years here: https://digiconomist.net/ethereum-energy-consumption

According to the above link, energy per transaction on Ethereum is ~40 kWh, which is about 6 EUR/transaction on my electricity bill, and to me that's way too high a price and electricity consumption.

I think then my vote would be a no wrt. this platform.

Cheers,
Adrin

On Fri, Feb 5, 2021 at 6:13 PM Adrin wrote:
> I just got this on the pandas-dev mailing list. Seems rather interesting!
>
> ---------- Forwarded message ---------
> From: Marc Garcia
> Date: Fri, Feb 5, 2021 at 2:21 PM
> Subject: [Pandas-dev] Crypto project to fund open source
> To: pandas-dev
>
> Hi,
>
> I've been contacted regarding a crypto project that aims at funding open
> source projects.
> Not sure about the details, but seems like if we register
> pandas, we'll be getting funds as the crypto is mined (after users endorse
> pandas I think).
>
> Not so familiar myself with crypto or the project, but just in case anyone
> finds it interesting and wants to add pandas to it:
> https://devprotocol.xyz/
>
> Cheers,
> Marc
> _______________________________________________
> Pandas-dev mailing list
> Pandas-dev at python.org
> https://mail.python.org/mailman/listinfo/pandas-dev
>

From g.lemaitre58 at gmail.com Mon Feb 8 05:51:12 2021
From: g.lemaitre58 at gmail.com (Guillaume Lemaître)
Date: Mon, 8 Feb 2021 11:51:12 +0100
Subject: [scikit-learn] Crypto project to fund open source
In-Reply-To:
References:
Message-ID:

On Mon, 8 Feb 2021 at 10:20, Adrin wrote:
> I had a chat with Guillaume and he raised the concern of energy
> consumption on blockchain platforms.
>
> I had a little look, and realized this platform runs on Ethereum, which
> has a significantly lower energy footprint
> than bitcoin, but still a rather high footprint. You can see the
> consumption charts through years here:
> https://digiconomist.net/ethereum-energy-consumption
>
> According to the above link, energy per transaction on Ethereum is ~40kWh,
> which is about 6EUR/transaction
> on my electricity bill, and to me that's way too high of a price and
> electricity consumption.
>

Or, on another scale, me riding a bike for 200 hours :)

> I think then my vote would be a no wrt. this platform.
>
> Cheers,
> Adrin
>
> On Fri, Feb 5, 2021 at 6:13 PM Adrin wrote:
>
>> I just got this on the pandas-dev mailing list. Seems rather interesting!
>>
>> ---------- Forwarded message ---------
>> From: Marc Garcia
>> Date: Fri, Feb 5, 2021 at 2:21 PM
>> Subject: [Pandas-dev] Crypto project to fund open source
>> To: pandas-dev
>>
>> Hi,
>>
>> I've been contacted regarding a crypto project that aims at funding open
>> source projects. Not sure about the details, but seems like if we register
>> pandas, we'll be getting funds as the crypto is mined (after users endorse
>> pandas I think).
>>
>> Not so familiar myself with crypto or the project, but just in case
>> anyone finds it interesting and wants to add pandas to it:
>> https://devprotocol.xyz/
>>
>> Cheers,
>> Marc
>> _______________________________________________
>> Pandas-dev mailing list
>> Pandas-dev at python.org
>> https://mail.python.org/mailman/listinfo/pandas-dev
>>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>

--
Guillaume Lemaitre
Scikit-learn @ Inria Foundation
https://glemaitre.github.io/

From petrizzo at gmail.com Tue Feb 9 15:15:02 2021
From: petrizzo at gmail.com (Mariangela Petrizzo)
Date: Tue, 9 Feb 2021 16:15:02 -0400
Subject: [scikit-learn] Spanish translation proposal for Scikit-Learn documentation
Message-ID:

Dear Scikit-Learn team!

I am Mariángela Petrizzo, and I am writing to you as a member of Qu4nt, a team dedicated to the use of open source tools for the development of software solutions with emphasis on data science. We have a strong interest in translating the Scikit-Learn documentation into Spanish.
Our team is made up of members from various scientific fields, including some university faculty in linguistics and computer science, with wide experience in Python as well as several libraries used for data analysis and machine learning. We also contribute locally as evangelists of its use in Spanish-speaking communities; in particular, the team leader initiated the translation of some Software Carpentry lessons into Spanish.

That is why we have been discussing the opportunity to offer our contribution to the Python project, promoting the translation into Spanish of the documentation of some of the libraries with the greatest impact in our areas of interest. Talking with David Mertz, to whom we are sending a copy of this email, we have explored options, and the idea of working with Scikit-learn has really seemed to be an exceptional opportunity for all of us and the community. He's very enthusiastic about the idea of generating a Spanish translation of Scientific Python libraries like Scikit-learn.

For us, this translation project has to be done through completely open work on Github, taking as reference the reStructuredText sources for Sphinx from a git fork, using the tools provided by Sphinx itself for internationalization (https://www.sphinx-doc.org/en/1.8/intl.html), and applying tags to perform planned updates. In addition, as with any open source project, the main mechanism for quality assurance comes from the users themselves, who will have channels available for submitting issues. Our intention is to secure all the infrastructure and mechanisms to make this possible: making the process transparent through Github, using tools like Transifex as much as possible to facilitate participation, and providing guidelines for contributors as part of the project.

Of course, this project cannot be realized without your support. We therefore come to you to inquire about your willingness to accompany and support this project.
We would love to hear your feedback on our proposal.

Best regards,

Mariángela

--
María Ángela Petrizzo Páez
about.me/petrizzo

Descárgate Redes para la Comprensión de la Política

*A quienes conservan la esperanza que no es lo último que se pierde, sino lo primero que se siembra y, por tanto, lo más radical.*

El único modo de vencer el secuestro del conocimiento
es comprender sus razones.
La manera de revertirlo,
es hacernos hackers de los secuestros cotidianos
a cambio de no morir sin saber lo que somos

¡Piensa para vivir,
actúa para hackear!
Cada día, una acción procomún a la vez.

*"Tengo horror de aquellos cuyas palabras van más allá que sus actos"*
*Albert Camus*
*"El poder, lejos de estorbar al saber, lo produce." - Michael Foucault*

Usuario Linux # 498889
Miembro Red de Politólogas - #NoSinMujeres
https://hotelescuela.academia.edu/MariangelaPetrizzoPaez
http://orcid.org/0000-0001-9483-4185
PEII - Nivel B

From niourf at gmail.com Tue Feb 9 16:58:12 2021
From: niourf at gmail.com (Nicolas Hug)
Date: Tue, 9 Feb 2021 21:58:12 +0000
Subject: [scikit-learn] Spanish translation proposal for Scikit-Learn documentation
In-Reply-To:
References:
Message-ID:

Hi María Ángela,

Thank you for your interest in contributing to scikit-learn!

Could you detail a bit more what kind of involvement you would need from the scikit-learn maintainers / team? So far, we've been welcoming third-party translations and they have a dedicated section on our website where you'll also find how we relate to them: https://scikit-learn.org/dev/related_projects.html#translations-of-scikit-learn-documentation.

Note in particular that we're not able to officially maintain any of these translations: managing the English one and keeping it up to date is already a **significant** workload, and we wouldn't have the resources (or language knowledge) to support other versions.
We'd be more than happy to add your Spanish translation to the list though!

Nicolas

On Tue, 9 Feb 2021 at 20:16, Mariangela Petrizzo wrote:
> Dear Scikit-Learn team!
>
> I am Mariángela Petrizzo, I am writing to you as a member of Qu4nt, a team
> dedicated to the use of open source tools for the development of software
> solutions with emphasis on data science. We have a strong interest in
> translating the Scikit-Learn documentation into Spanish.
>
> Our team is made up of members from various scientific fields, including
> some university faculty in linguistics and computer sciences, with a wide
> experience in Python as well as several libraries used for data analysis
> and machine learning, and also contribute locally as evangelists of its
> use in Spanish-speaking communities, in particular, the leader initiated
> the translation of some Software Carpentry lessons into Spanish.
>
> That is why we have been discussing the opportunity to offer our
> contribution to the Python project, promoting the translation into Spanish
> of the documentation of some of the libraries with the greatest impact in
> our areas of interest. Talking with David Mertz, to whom we are sending a
> copy of this email, we have explored options, and the idea of working with
> Scikit-learn has really seemed to be an exceptional opportunity for all of
> us and the community. He's very enthusiastic about the idea of generating a
> spanish translation of Scientific Python libraries like Scikit-learn.
>
> For us, this translation project has to be done through a completely open
> work on Github, taking as reference the restructured text sources for
> Sphinx from a git fork, using the tools provided by Sphinx itself for
> internationalization: https://www.sphinx-doc.org/en/1.8/intl.html, and
> applying tags to perform planned updates.
> In addition, as with any open
> source project, the main mechanism for quality assurance comes from the
> users themselves who will have the channels available for submitting
> issues. Our intention is to secure all the infrastructure and mechanisms to
> make this possible: making the process transparent through Github, using as
> much as possible tools like Transifex to facilitate participation, and
> providing guidelines for contributors as part of the project.
>
> Of course, this project cannot be realized without your support. We
> therefore come to you to inquire about your willingness to accompany and
> support this project.
>
> We would love to hear your feedback on our proposal.
>
> Best regards,
>
> Mariángela
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
From petrizzo at gmail.com Wed Feb 10 09:58:18 2021
From: petrizzo at gmail.com (Mariangela Petrizzo)
Date: Wed, 10 Feb 2021 10:58:18 -0400
Subject: [scikit-learn] Spanish translation proposal for Scikit-Learn documentation
In-Reply-To:
References:
Message-ID:

Thanks for your response Nicolas!

For grant approval purposes, the PSF indicates that it is only necessary that somebody from the Scikit-learn project team endorse our translation efforts.

Mariangela

On Tue, Feb 9, 2021 at 5:59 PM Nicolas Hug wrote:
> Hi María Ángela,
> Thank you for your interest in contributing to scikit-learn!
> Could you detail a bit more what kind of involvement you would need from
> the scikit-learn maintainers / team? So far, we've been welcoming
> third-party translations and they have a dedicated section on our website
> where you'll also find how we relate to them:
> https://scikit-learn.org/dev/related_projects.html#translations-of-scikit-learn-documentation.
> Note in particular that we're not able to officially maintain any of these
> translations: managing the English one and keeping it up to date is already
> a **significant** workload, and we wouldn't have the resources (or language
> knowledge) to support other versions. We'd be more than happy to add your
> Spanish translation to the list though!
>
> Nicolas
>
> On Tue, 9 Feb 2021 at 20:16, Mariangela Petrizzo
> wrote:
>
>> Dear Scikit-Learn team!
>>
>> I am Mariángela Petrizzo, I am writing to you as a member of Qu4nt, a
>> team dedicated to the use of open source tools for the development of
>> software solutions with emphasis on data science. We have a strong interest
>> in translating the Scikit-Learn documentation into Spanish.
>> We therefore come to you to inquire about your willingness to accompany and
>> support this project.
>>
>> We would love to hear your feedback on our proposal.
>>
>> Best regards,
>>
>> Mariángela
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>

--
María Ángela Petrizzo Páez
about.me/petrizzo

Descárgate Redes para la Comprensión de la Política

*A quienes conservan la esperanza que no es lo último que se pierde, sino lo primero que se siembra y, por tanto, lo más radical.*

El único modo de vencer el secuestro del conocimiento
es comprender sus razones.
La manera de revertirlo,
es hacernos hackers de los secuestros cotidianos
a cambio de no morir sin saber lo que somos

¡Piensa para vivir,
actúa para hackear!
Cada día, una acción procomún a la vez.

*"Tengo horror de aquellos cuyas palabras van más allá que sus actos"*
*Albert Camus*
*"El poder, lejos de estorbar al saber, lo produce." - Michael Foucault*

Usuario Linux # 498889
Miembro Red de Politólogas - #NoSinMujeres
https://hotelescuela.academia.edu/MariangelaPetrizzoPaez
http://orcid.org/0000-0001-9483-4185
PEII - Nivel B

From fad469 at uregina.ca Thu Feb 11 11:29:16 2021
From: fad469 at uregina.ca (Farzana Anowar)
Date: Thu, 11 Feb 2021 11:29:16 -0500
Subject: [scikit-learn] Issue in BIRCH clustering algo
Message-ID: <1ba11afabd79324e60ae0a225f50ee16@uregina.ca>

Hello everyone,

I was trying to run the BIRCH clustering algorithm. However, after fitting the model I am facing the following error:

AttributeError: '_CFSubcluster' object has no attribute 'sq_norm_'

This error occurs only after fitting the model and I couldn't find any proper explanation of this. Could anyone give me any suggestions on that? It would be really helpful.
Here is my code:

from sklearn.cluster import Birch

# Creating the BIRCH clustering model
model = Birch(n_clusters = None)

# Fit the data (Training)
model.fit(df)

# Predict the same data
pred = model.predict(df)

--
Best Regards,

Farzana Anowar,
PhD Candidate
Department of Computer Science
University of Regina

From rth.yurchak at gmail.com Thu Feb 11 11:51:16 2021
From: rth.yurchak at gmail.com (Roman Yurchak)
Date: Thu, 11 Feb 2021 17:51:16 +0100
Subject: [scikit-learn] Issue in BIRCH clustering algo
In-Reply-To: <1ba11afabd79324e60ae0a225f50ee16@uregina.ca>
References: <1ba11afabd79324e60ae0a225f50ee16@uregina.ca>
Message-ID:

It's a known issue, see https://github.com/scikit-learn/scikit-learn/issues/17966

Someone would need to investigate more to find a fix though. If you have a minimal reproducible example that's different from the one in that issue and could post it there, it would help.

Roman

On 11/02/2021 17:29, Farzana Anowar wrote:
> Hello everyone,
>
> I was trying to run the BIRCH clustering algorithm. However, after
> fitting the model I am facing the following error:
>
> AttributeError: '_CFSubcluster' object has no attribute 'sq_norm_'
>
> This error occurs only after fitting the model and I couldn't find any
> proper explanation of this. Could anyone give me any suggestions on
> that? It would be really helpful.
>
> Here is my code:
>
> from sklearn.cluster import Birch
>
> # Creating the BIRCH clustering model
> model = Birch(n_clusters = None)
>
> # Fit the data (Training)
> model.fit(df)
>
> # Predict the same data
> pred = model.predict(df)
>

From joel.nothman at gmail.com Wed Feb 17 08:08:43 2021
From: joel.nothman at gmail.com (Joel Nothman)
Date: Thu, 18 Feb 2021 00:08:43 +1100
Subject: [scikit-learn] [Vote] SLEP006: Routing sample-aligned metadata
Message-ID:

With thanks to Alex, Adrin and Christian, we have a proposal to implement what we used to call "sample props" that should be expressive enough for us to resolve tens of issues and PRs, but will be largely unobtrusive for most current users.

Core developers, please cast your vote in this PR after considering the proposal here, which has a partial implementation in #16079.

In brief, the problem we are trying to solve:

Scikit-learn has limited support for information pertaining to each sample (henceforth "sample properties") to be passed through an estimation pipeline. The user can, for instance, pass fit parameters to all members of a FeatureUnion, or to a specified member of a Pipeline using dunder (__) prefixing:

>>> from sklearn.pipeline import Pipeline
>>> from sklearn.linear_model import LogisticRegression
>>> pipe = Pipeline([('clf', LogisticRegression())])
>>> pipe.fit([[1, 2], [3, 4]], [5, 6],
...          clf__sample_weight=[.5, .7])

Several other meta-estimators, such as GridSearchCV, support forwarding these fit parameters to their base estimator when fitting. Yet a number of important use cases are currently not supported.

Features we currently do not support and wish to include:

- passing sample properties (e.g. sample_weight) to a scorer used in cross-validation
- passing sample properties (e.g. groups) to a CV splitter in nested cross validation
- passing sample properties (e.g.
  sample_weight) to some scorers and not others in a multi-metric cross-validation setup

Solution: Each consumer requests

A meta-estimator passes along to its children only what they request. A meta-estimator needs to request, on behalf of its children, any metadata that descendant consumers request.

Each object that could receive metadata should have a method called get_metadata_request() which returns a dict that specifies which metadata is consumed by each of its methods (keys of this dictionary are therefore method names, e.g. fit, transform etc.). Estimators supporting weighted fitting may return {} by default, but have a method called request_sample_weight which allows the user to specify the requested sample_weight in each of its methods. make_scorer accepts request_metadata as a keyword parameter through which the user can specify what metadata is requested.

Regards,

Joel

From marmochiaskl at gmail.com Thu Feb 18 08:39:42 2021
From: marmochiaskl at gmail.com (Chiara Marmo)
Date: Thu, 18 Feb 2021 14:39:42 +0100
Subject: [scikit-learn] Monthly meeting February 22nd 2021
Message-ID:

Dear list,

The scikit-learn monthly meeting will take place on Monday February 22nd at 8PM UTC: https://www.timeanddate.com/worldclock/meetingdetails.html?year=2021&month=02&day=22&hour=20&min=0&sec=0&p1=179&p2=240&p3=195&p4=224

While these meetings are mainly for core-devs to discuss the current topics, we are also happy to welcome non-core devs and other project maintainers. Feel free to join, using the following link: https://meet.google.com/xhq-yoga-rtf

If you plan to attend and you would like to discuss something specific about your contribution, please add your name (or github pseudo) in the "Contributors" section of the public pad: https://hackmd.io/AOY6-uVBQZmsOluLrtHblA

Best
Chiara
From joel.nothman at gmail.com Sat Feb 27 04:42:35 2021
From: joel.nothman at gmail.com (Joel Nothman)
Date: Sat, 27 Feb 2021 20:42:35 +1100
Subject: [scikit-learn] [Vote] SLEP006: Routing sample-aligned metadata
In-Reply-To:
References:
Message-ID:

Hi all,

Just a reminder that we are ten days into the month-long voting period, with one vote on record. Core devs, please find time to consider this proposal.

Thanks to Andy's suggestion, we have added an example of the new API to the opening section:

This SLEP proposes an API where users can request certain metadata to be passed to its consumer by the meta-estimator it is wrapped in.

The following example illustrates the new request_metadata parameter for making scorers, the request_sample_weight estimator method, the metadata parameter replacing fit_params in cross_validate, and the automatic passing of groups to GroupKFold to enable nested grouped cross validation. Here, the user requests that the sample_weight metadata key should be passed to a customised accuracy scorer (although a predefined "weighted_accuracy" scorer could be introduced), and to the LogisticRegressionCV. GroupKFold requests groups by default.

>>> from sklearn.metrics import accuracy_score, make_scorer
>>> from sklearn.model_selection import cross_validate, GroupKFold
>>> from sklearn.linear_model import LogisticRegressionCV
>>> weighted_acc = make_scorer(accuracy_score,
...                            request_metadata=['sample_weight'])
>>> group_cv = GroupKFold()
>>> lr = LogisticRegressionCV(
...     cv=group_cv,
...     scoring=weighted_acc,
... ).request_sample_weight(fit=True)
>>> cross_validate(lr, X, y, cv=group_cv,
...                metadata={'sample_weight': my_weights,
...                          'groups': my_groups},
...                scoring=weighted_acc)

On Thu, 18 Feb 2021 at 00:08, Joel Nothman wrote:
> With thanks to Alex, Adrin and Christian, we have a proposal to implement
> what we used to call "sample props" that should be expressive enough for us
> to resolve tens of issues and PRs, but will be largely unobtrusive for most
> current users.
>
> Core developers, please cast your vote in this PR after considering the
> proposal here, which has a partial implementation in #16079.
>
> In brief, the problem we are trying to solve:
>
> Scikit-learn has limited support for information pertaining to each sample
> (henceforth "sample properties") to be passed through an estimation
> pipeline. The user can, for instance, pass fit parameters to all members of
> a FeatureUnion, or to a specified member of a Pipeline using dunder (__)
> prefixing:
>
> >>> from sklearn.pipeline import Pipeline
> >>> from sklearn.linear_model import LogisticRegression
> >>> pipe = Pipeline([('clf', LogisticRegression())])
> >>> pipe.fit([[1, 2], [3, 4]], [5, 6],
> ...          clf__sample_weight=[.5, .7])
>
> Several other meta-estimators, such as GridSearchCV, support forwarding
> these fit parameters to their base estimator when fitting. Yet a number of
> important use cases are currently not supported.
>
> Features we currently do not support and wish to include:
>
> - passing sample properties (e.g. sample_weight) to a scorer used in
>   cross-validation
> - passing sample properties (e.g. groups) to a CV splitter in nested
>   cross validation
> - passing sample properties (e.g. sample_weight) to some scorers and not
>   others in a multi-metric cross-validation setup
>
> Solution: Each consumer requests
>
> A meta-estimator provides along to its children only what they request. A
> meta-estimator needs to request, on behalf of its children, any metadata
> that descendant consumers request.
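
The routing rule summarised in the SLEP006 messages above (a meta-estimator passes along to its children only what they request, and requests on their behalf whatever they need) can be sketched as a toy in plain Python. This is only an illustration under stated assumptions, not the SLEP006 implementation and not scikit-learn code: ToyConsumer, ToyMetaEstimator and the request() helper are hypothetical names made up for this sketch; only get_metadata_request() and the idea of a dict keyed by method name are taken from the proposal text.

```python
# Toy sketch of request-based metadata routing. All class names and the
# request() helper are hypothetical; this is not scikit-learn code.


class ToyConsumer:
    """A leaf object that may consume per-sample metadata in fit."""

    def __init__(self, name):
        self.name = name
        self._requests = {"fit": set()}  # nothing is requested by default
        self.received_ = None

    def request(self, method, key):
        # Record that `method` wants the metadata stored under `key`.
        self._requests.setdefault(method, set()).add(key)
        return self

    def get_metadata_request(self):
        # Keys are method names, values are the metadata keys they consume.
        return {m: sorted(keys) for m, keys in self._requests.items()}

    def fit(self, X, y, **metadata):
        # Remember what was received so the routing can be inspected.
        self.received_ = dict(metadata)
        return self


class ToyMetaEstimator:
    """A meta-estimator that forwards only what each child requests."""

    def __init__(self, children):
        self.children = children

    def get_metadata_request(self):
        # Request, on behalf of the children, the union of their requests.
        wanted = set()
        for child in self.children:
            wanted.update(child.get_metadata_request().get("fit", []))
        return {"fit": sorted(wanted)}

    def fit(self, X, y, **metadata):
        for child in self.children:
            keys = child.get_metadata_request().get("fit", [])
            routed = {k: metadata[k] for k in keys if k in metadata}
            child.fit(X, y, **routed)
        return self


weighted = ToyConsumer("weighted").request("fit", "sample_weight")
plain = ToyConsumer("plain")
meta = ToyMetaEstimator([weighted, plain])
meta.fit([[1, 2], [3, 4]], [5, 6], sample_weight=[0.5, 0.7], groups=[0, 1])

print(meta.get_metadata_request())  # {'fit': ['sample_weight']}
print(weighted.received_)           # {'sample_weight': [0.5, 0.7]}
print(plain.received_)              # {}
```

In this sketch the unrequested groups key is silently dropped and the child that requested nothing receives nothing, which is the "largely unobtrusive" default behaviour the proposal describes; how the real API reports unrequested or missing metadata is not covered here.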