From guetlein at posteo.de Thu Mar 2 04:01:45 2023
From: guetlein at posteo.de (Martin Gütlein)
Date: Thu, 02 Mar 2023 09:01:45 +0000
Subject: [scikit-learn] classification model that can handle missing values w/o learning from missing values
In-Reply-To:
References:
Message-ID: <0f0363e36de7849ca9abe2bc2542e441@posteo.de>

It would already help us if someone could confirm that this is not possible in scikit-learn, because we are still not entirely sure that we have not missed something.

Regards,
Martin

On 21.02.2023 15:48, Martin Gütlein wrote:
> Hi,
>
> I am looking for a classification model in Python that can handle
> missing values, without imputation and "without learning from missing
> values", i.e. without using the fact that the information is missing
> for the inference.
>
> Explained with the help of decision trees:
> * The algorithm should NOT learn whether missing values should go to
>   the left or the right child (as the HistGradientBoostingClassifier
>   does).
> * Instead, it could build the prediction for each child node and
>   aggregate these (as some Random Forest implementations do).
>
> If that is not possible in scikit-learn, maybe you have already
> discussed this? Or do you know of a fork of scikit-learn that is able
> to do this, or some other Python library?
>
> Any help would be really appreciated, kind regards,
> Martin
>
> P.S. Here is my use case, in case you are interested: I have a binary
> classification problem with a positive and a negative class, and two
> types of features, A and B. In my training data, B is missing for most
> samples (90%). In my test data, I always have B, which is good because
> the B features are better than the A features. In the cases where B is
> present in the training data, the ratio of positive examples is much
> higher than when it is missing. So HistGradientBoostingClassifier uses
> the fact that B is not missing in the test data and predicts far too
> many positives. (Additionally, some feature values of type A are also
> often missing.)
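For context: scikit-learn's HistGradientBoostingClassifier accepts NaN natively and, at each split, learns whether samples with missing values should go to the left or the right child, which is exactly the coupling Martin wants to avoid. A minimal sketch on synthetic data (all names and numbers are illustrative):

    import numpy as np
    from sklearn.ensemble import HistGradientBoostingClassifier

    rng = np.random.RandomState(0)
    X = rng.normal(size=(1000, 2))       # column 0: feature A, column 1: feature B
    y = (X[:, 1] > 0).astype(int)        # B is the informative feature
    X[rng.rand(1000) < 0.9, 1] = np.nan  # B is missing in 90% of training rows

    clf = HistGradientBoostingClassifier(random_state=0).fit(X, y)
    # At predict time the learned "missing goes left/right" rule is applied,
    # so predictions depend on *whether* B is missing.
    print(clf.predict(X[:5]))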
From gael.varoquaux at normalesup.org Fri Mar 3 02:33:31 2023
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Fri, 3 Mar 2023 08:33:31 +0100
Subject: [scikit-learn] classification model that can handle missing values w/o learning from missing values
In-Reply-To: <0f0363e36de7849ca9abe2bc2542e441@posteo.de>
References: <0f0363e36de7849ca9abe2bc2542e441@posteo.de>
Message-ID: <20230303073331.hwjdkfcw7gk2cljo@gaellaptop>

Dear Martin,

From what I understand, you want a classifier that:
1. Is not based on imputation
2. Ignores whether a value is missing or not for the inference

It seems to me that those two requirements are in contradiction, and it is not clear to me how such a classifier would be theoretically grounded.

Best,

Gaël

On Thu, Mar 02, 2023 at 09:01:45AM +0000, Martin Gütlein wrote:
> It would already help us if someone could confirm that this is not
> possible in scikit-learn, because we are still not entirely sure that
> we have not missed something.
> [...]

--
Gael Varoquaux
Research Director, INRIA
http://gael-varoquaux.info
http://twitter.com/GaelVaroquaux

From guetlein at posteo.de Fri Mar 3 05:22:04 2023
From: guetlein at posteo.de (Martin Gütlein)
Date: Fri, 03 Mar 2023 10:22:04 +0000
Subject: [scikit-learn] classification model that can handle missing values w/o learning from missing values
In-Reply-To: <20230303073331.hwjdkfcw7gk2cljo@gaellaptop>
References: <0f0363e36de7849ca9abe2bc2542e441@posteo.de> <20230303073331.hwjdkfcw7gk2cljo@gaellaptop>
Message-ID: <0ae4c5e830880b4353ca698cba93717c@posteo.de>

Dear Gaël,

Thanks for your response.

> 2. Ignores whether a value is missing or not for the inference

What I meant, rather, is that the missing value should NOT be treated as another possible value of the variable (which is, e.g., what the HistGradientBoostingClassifier implementation in scikit-learn does). Instead, multiple predictions could be made when a split attribute is missing, and those can be averaged.

This is how it is implemented in WEKA, for example (we cannot switch to Java, though ;-):
http://web.archive.org/web/20080601175721/http://wekadocs.com/node/2/#_edn4
and it is described by the inventors of the RF:
https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#missing1

I am pretty sure something similar is done in other classification algorithms, like naive Bayes, where each feature is handled separately anyway and missing ones could just be omitted.

Regards,
Martin

On 03.03.2023 08:33, Gael Varoquaux wrote:
> From what I understand, you want a classifier that:
> 1. Is not based on imputation
> 2. Ignores whether a value is missing or not for the inference
>
> It seems to me that those two requirements are in contradiction, and
> it is not clear to me how such a classifier would be theoretically
> grounded.
> [...]
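The strategy Martin describes can be prototyped on top of a fitted scikit-learn tree. The sketch below is not a scikit-learn feature: it assumes a DecisionTreeClassifier fitted on complete (or already imputed) training rows and, whenever the split feature of a test sample is NaN, descends into both children and averages their class distributions, weighted by the number of training samples in each child.

    import numpy as np

    def predict_proba_missing(clf, x):
        """Class distribution for one sample x, averaging over both
        subtrees whenever the split feature is missing (NaN)."""
        t = clf.tree_

        def recurse(node):
            left, right = t.children_left[node], t.children_right[node]
            if left == -1:                     # leaf: normalized class counts
                counts = t.value[node][0]
                return counts / counts.sum()
            if np.isnan(x[t.feature[node]]):   # split feature missing:
                n_l = t.n_node_samples[left]   # average both children,
                n_r = t.n_node_samples[right]  # weighted by training samples
                return (n_l * recurse(left) + n_r * recurse(right)) / (n_l + n_r)
            if x[t.feature[node]] <= t.threshold[node]:
                return recurse(left)
            return recurse(right)

        return recurse(0)

Averaged over the trees of a forest, this mimics the behaviour of the WEKA/Breiman references above; it is a prototype for illustration, not a drop-in replacement for predict_proba.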
From gael.varoquaux at normalesup.org Fri Mar 3 09:41:09 2023
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Fri, 3 Mar 2023 15:41:09 +0100
Subject: [scikit-learn] classification model that can handle missing values w/o learning from missing values
In-Reply-To: <0ae4c5e830880b4353ca698cba93717c@posteo.de>
References: <0f0363e36de7849ca9abe2bc2542e441@posteo.de> <20230303073331.hwjdkfcw7gk2cljo@gaellaptop> <0ae4c5e830880b4353ca698cba93717c@posteo.de>
Message-ID: <20230303144109.htwghqmoqgvfcooc@gaellaptop>

On Fri, Mar 03, 2023 at 10:22:04AM +0000, Martin Gütlein wrote:
> > 2. Ignores whether a value is missing or not for the inference
> What I meant, rather, is that the missing value should NOT be treated
> as another possible value of the variable (which is, e.g., what the
> HistGradientBoostingClassifier implementation in scikit-learn does).
> Instead, multiple predictions could be made when a split attribute is
> missing, and those can be averaged.
>
> This is how it is implemented in WEKA, for example (we cannot switch
> to Java, though ;-):
> http://web.archive.org/web/20080601175721/http://wekadocs.com/node/2/#_edn4
> and it is described by the inventors of the RF:
> https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#missing1

The text that you link to describes two types of strategies: one that is similar to what is done in HistGradientBoosting, and another that amounts to imputation using a forest, which can be done in scikit-learn by setting up the IterativeImputer to use forests as a base learner (this will, however, be slow).

Cheers,

Gaël
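For completeness, the forest-based imputation Gaël mentions can be set up as follows. This is a sketch with illustrative hyperparameters; IterativeImputer is still experimental, hence the explicit enable import:

    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import IterativeImputer
    from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
    from sklearn.pipeline import make_pipeline

    model = make_pipeline(
        IterativeImputer(
            estimator=RandomForestRegressor(n_estimators=50, random_state=0),
            max_iter=5,  # illustrative; expect this to be slow
        ),
        RandomForestClassifier(random_state=0),
    )
    # model.fit(X_train, y_train) fits the imputer, then the classifier;
    # model.predict(X_test) first imputes the test set with the fitted
    # imputer, the very step Martin objects to later in the thread.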
From lorentzen.ch at gmail.com Mon Mar 6 17:50:19 2023
From: lorentzen.ch at gmail.com (Christian Lorentzen)
Date: Mon, 6 Mar 2023 23:50:19 +0100
Subject: [scikit-learn] New core developer: Tim Head
Message-ID: <06b9ffea-8fed-6305-3790-0bc1c34ee20d@gmail.com>

Dear all,

I'm very excited to announce that Tim Head, https://github.com/betatim, is joining scikit-learn as a core developer. Congratulations and a warm welcome, Tim!

on behalf of the scikit-learn team,
Christian

From solegalli at protonmail.com Tue Mar 7 09:53:43 2023
From: solegalli at protonmail.com (Sole Galli)
Date: Tue, 07 Mar 2023 14:53:43 +0000
Subject: [scikit-learn] obtaining intervals from the decision tree structure
Message-ID:

Hello,

I would like to obtain the final intervals from the decision tree structure. I am not interested in every node, just in the limits that take a sample to a final decision/leaf.

For example, if the tree structure is this one:

|--- feature_0 <= 0.08
|   |--- class: 0
|--- feature_0 > 0.08
|   |--- feature_0 <= 8.50
|   |   |--- feature_0 <= 1.50
|   |   |   |--- class: 1
|   |   |--- feature_0 > 1.50
|   |   |   |--- class: 1
|   |--- feature_0 > 8.50
|   |   |--- feature_0 <= 60.25
|   |   |   |--- class: 0
|   |   |--- feature_0 > 60.25
|   |   |   |--- class: 0

then I would like to obtain these limits:
0-0.08; 0.08-1.50; 1.50-8.50; 8.50-60; >60

Potentially as the following numpy array:
[-np.inf, 0.08, 1.5, 8.5, 60, np.inf]

Is it possible?

I have a Stack Overflow question here with more details and code:
https://stackoverflow.com/questions/75663472/how-to-obtain-the-interval-limits-from-a-decision-tree-with-scikit-learn

Thank you!
Sole

Sent with Proton Mail (https://proton.me/) secure email.

From g.lemaitre58 at gmail.com Tue Mar 7 10:41:47 2023
From: g.lemaitre58 at gmail.com (Guillaume Lemaître)
Date: Tue, 7 Mar 2023 16:41:47 +0100
Subject: [scikit-learn] obtaining intervals from the decision tree structure
In-Reply-To:
References:
Message-ID: <82CAF07D-C86E-4F57-9CAE-98DA1F7B5BB8@gmail.com>

Hi Sole,

You can use `apply` on the training `X` to get the leaf that each sample falls into. A groupby should then give you the statistics that you want.

Cheers,
--
Guillaume Lemaitre
Scikit-learn @ Inria Foundation
https://glemaitre.github.io/

On 7 Mar 2023, at 15:53, Sole Galli via scikit-learn wrote:
> I would like to obtain the final intervals from the decision tree
> structure. I am not interested in every node, just in the limits that
> take a sample to a final decision/leaf.
> [...]
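Complementing Guillaume's answer: for a tree that splits on a single feature, the limits Sole asks for are the thresholds of the internal nodes. One possible sketch (not a built-in helper):

    import numpy as np
    from sklearn.tree import _tree

    def interval_limits(clf, feature=0):
        """Sorted split thresholds for `feature`, bracketed by -inf/inf."""
        t = clf.tree_
        # internal nodes that split on the requested feature
        mask = (t.feature != _tree.TREE_UNDEFINED) & (t.feature == feature)
        thresholds = np.sort(np.unique(t.threshold[mask]))
        return np.concatenate(([-np.inf], thresholds, [np.inf]))

For the tree above this would return [-inf, 0.08, 1.5, 8.5, 60.25, inf]; combined with clf.apply(X) and a groupby, as Guillaume suggests, it recovers the per-leaf statistics.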
From thomasjpfan at gmail.com Tue Mar 7 17:07:40 2023
From: thomasjpfan at gmail.com (Thomas J. Fan)
Date: Tue, 7 Mar 2023 17:07:40 -0500
Subject: [scikit-learn] scikit-learn monthly developer meeting: Monday March 27, 2023
Message-ID:

Dear all,

The scikit-learn developer monthly meeting will take place on Monday, March 27 at 11:00 UTC.

- Video call link: https://meet.google.com/gmn-acub-mrr
- Meeting notes / agenda: https://hackmd.io/0yokz72CTZSny8y3Re648Q
- Local times: https://www.timeanddate.com/worldclock/meetingdetails.html?year=2023&month=3&day=27&hour=11&min=0&sec=0&p1=1440&p2=240&p3=248&p4=195&p5=179&p6=224

The goal of this meeting is to discuss ongoing development topics for the project. Everybody is welcome. As usual, please follow the code of conduct of the project:
https://github.com/scikit-learn/scikit-learn/blob/main/CODE_OF_CONDUCT.md

Regards,
Thomas

From betatim at gmail.com Wed Mar 8 05:06:41 2023
From: betatim at gmail.com (Tim Head)
Date: Wed, 8 Mar 2023 11:06:41 +0100
Subject: [scikit-learn] New core developer: Tim Head
In-Reply-To: <06b9ffea-8fed-6305-3790-0bc1c34ee20d@gmail.com>
References: <06b9ffea-8fed-6305-3790-0bc1c34ee20d@gmail.com>
Message-ID:

Thanks a lot! I look forward to working together with the community and other contributors!

T

On Mon, 6 Mar 2023 at 23:51, Christian Lorentzen wrote:
> I'm very excited to announce that Tim Head, https://github.com/betatim,
> is joining scikit-learn as a core developer.
> [...]

From ruchika.work at gmail.com Wed Mar 8 09:33:10 2023
From: ruchika.work at gmail.com (Ruchika Nayyar)
Date: Wed, 8 Mar 2023 09:33:10 -0500
Subject: [scikit-learn] New core developer: Tim Head
In-Reply-To:
References: <06b9ffea-8fed-6305-3790-0bc1c34ee20d@gmail.com>
Message-ID:

Congratulations Tim! Good to see you virtually :)

Thanks,
Ruchika

****************
Dr. Ruchika Nayyar
Data Scientist, Greene Tweed & Co.

On Wed, Mar 8, 2023 at 5:09 AM, Tim Head wrote:
> Thanks a lot! I look forward to working together with the community
> and other contributors!
> [...]
From mail at sebastianraschka.com Wed Mar 8 09:37:57 2023
From: mail at sebastianraschka.com (Sebastian Raschka)
Date: Wed, 8 Mar 2023 08:37:57 -0600
Subject: [scikit-learn] New core developer: Tim Head
In-Reply-To:
References: <06b9ffea-8fed-6305-3790-0bc1c34ee20d@gmail.com>
Message-ID:

Awesome news! Congrats Tim!

Cheers,
Sebastian

On Mar 8, 2023, 8:35 AM -0600, Ruchika Nayyar wrote:
> Congratulations Tim! Good to see you virtually :)
> [...]

From chris at aridas.eu Wed Mar 8 10:33:04 2023
From: chris at aridas.eu (Chris Aridas)
Date: Wed, 8 Mar 2023 17:33:04 +0200
Subject: [scikit-learn] New core developer: Tim Head
In-Reply-To:
References: <06b9ffea-8fed-6305-3790-0bc1c34ee20d@gmail.com>
Message-ID:

Congrats Tim!

Best,
Chris

On Wed, Mar 8, 2023 at 5:02 PM, Sebastian Raschka wrote:
> Awesome news! Congrats Tim!
> [...]
From jeremie.du-boisberranger at inria.fr Thu Mar 9 05:15:42 2023
From: jeremie.du-boisberranger at inria.fr (Jeremie du Boisberranger)
Date: Thu, 9 Mar 2023 11:15:42 +0100
Subject: [scikit-learn] [ANN] scikit-learn 1.2.2 is online!
In-Reply-To:
References:
Message-ID: <1c32cfc9-8c4f-dc6d-4656-3ea53cff3a25@inria.fr>

scikit-learn 1.2.2 is out on pypi.org and conda-forge!

This is a maintenance release that fixes several regressions introduced in version 1.2:
https://scikit-learn.org/stable/whats_new/v1.2.html#version-1-2-2

You can upgrade with pip as usual:

    pip install -U scikit-learn

The conda-forge builds will be available shortly, which you can then install using:

    conda install -c conda-forge scikit-learn

Thanks to all contributors who helped on this release.

Jérémie,
On behalf of the scikit-learn maintainers team.

From g.lemaitre58 at gmail.com Thu Mar 9 05:32:37 2023
From: g.lemaitre58 at gmail.com (Guillaume Lemaître)
Date: Thu, 9 Mar 2023 11:32:37 +0100
Subject: [scikit-learn] [ANN] scikit-learn 1.2.2 is online!
In-Reply-To: <1c32cfc9-8c4f-dc6d-4656-3ea53cff3a25@inria.fr>
References: <1c32cfc9-8c4f-dc6d-4656-3ea53cff3a25@inria.fr>
Message-ID:

Thanks for taking care of this release, Jeremie.

Cheers,

On Thu, 9 Mar 2023 at 11:17, Jeremie du Boisberranger wrote:
> scikit-learn 1.2.2 is out on pypi.org and conda-forge!
> [...]

--
Guillaume Lemaitre
Scikit-learn @ Inria Foundation
https://glemaitre.github.io/

From guetlein at posteo.de Fri Mar 10 08:19:09 2023
From: guetlein at posteo.de (Martin Gütlein)
Date: Fri, 10 Mar 2023 13:19:09 +0000
Subject: [scikit-learn] classification model that can handle missing values w/o learning from missing values
In-Reply-To: <20230303144109.htwghqmoqgvfcooc@gaellaptop>
References: <0f0363e36de7849ca9abe2bc2542e441@posteo.de> <20230303073331.hwjdkfcw7gk2cljo@gaellaptop> <0ae4c5e830880b4353ca698cba93717c@posteo.de> <20230303144109.htwghqmoqgvfcooc@gaellaptop>
Message-ID:

Hi Gaël,

> [...] the other one that amounts to imputation using a forest, and can
> be done in scikit-learn by setting up the IterativeImputer to use
> forests as a base learner (this will however be slow).

The main difference is that when I use the IterativeImputer in scikit-learn, I still have to apply this imputation to the test set before being able to predict with the RF. Other implementations, however, do not impute missing values but instead split up the test instance.
In my experience this makes a big difference: you are able to use features where the majority of values are missing, and where at the same time the class ratio of the examples with missing values differs greatly from that of the examples without missing values.

Kind regards,
Martin

On 03.03.2023 15:41, Gael Varoquaux wrote:
> The text that you link to describes two types of strategies: one that
> is similar to what is done in HistGradientBoosting, and another that
> amounts to imputation using a forest, which can be done in
> scikit-learn by setting up the IterativeImputer to use forests as a
> base learner (this will however be slow).
> [...]

From g.lemaitre58 at gmail.com Fri Mar 10 08:38:31 2023
From: g.lemaitre58 at gmail.com (Guillaume Lemaître)
Date: Fri, 10 Mar 2023 14:38:31 +0100
Subject: [scikit-learn] classification model that can handle missing values w/o learning from missing values
In-Reply-To:
References: <0f0363e36de7849ca9abe2bc2542e441@posteo.de> <20230303073331.hwjdkfcw7gk2cljo@gaellaptop> <0ae4c5e830880b4353ca698cba93717c@posteo.de> <20230303144109.htwghqmoqgvfcooc@gaellaptop>
Message-ID:

Hi Martin,

I think that you could use `imbalanced-learn` and a bit of Pandas/NumPy to get the behaviour that you want. You can use a `FunctionSampler`
(https://imbalanced-learn.org/stable/references/generated/imblearn.FunctionSampler.html)
in which you remove the samples containing missing values. This process is only applied when calling `fit`. You will need to use the `Pipeline` from `imbalanced-learn` as well.

In some way, it seems that you want to resample the training set, which is what the samplers in `imbalanced-learn` are intended for.

Cheers,

On Fri, 10 Mar 2023 at 14:21, Martin Gütlein wrote:
> The main difference is that when I use the IterativeImputer in
> scikit-learn, I still have to apply this imputation to the test set
> before being able to predict with the RF. Other implementations,
> however, do not impute missing values but instead split up the test
> instance.
> [...]

--
Guillaume Lemaitre
Scikit-learn @ Inria Foundation
https://glemaitre.github.io/
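A sketch of Guillaume's suggestion, assuming imbalanced-learn is installed. drop_missing is an illustrative helper, and validate=False is needed so that the sampler accepts NaN:

    import numpy as np
    from imblearn import FunctionSampler
    from imblearn.pipeline import make_pipeline
    from sklearn.ensemble import HistGradientBoostingClassifier

    def drop_missing(X, y):
        """Keep only the training rows without missing values."""
        mask = ~np.isnan(X).any(axis=1)
        return X[mask], y[mask]

    model = make_pipeline(
        FunctionSampler(func=drop_missing, validate=False),
        HistGradientBoostingClassifier(random_state=0),
    )
    # The sampler runs during fit() only; predict() passes the test set
    # through unchanged, so the final estimator must still accept any NaN
    # left at predict time (HistGradientBoosting does).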
From adrin.jalali at gmail.com Tue Mar 14 07:14:16 2023
From: adrin.jalali at gmail.com (Adrin)
Date: Tue, 14 Mar 2023 12:14:16 +0100
Subject: [scikit-learn] VOTE: Governance update: elevating voting rights
Message-ID:

Hi,

Since SLEP020, updates to the governance model don't require a separate SLEP and can be done through a pull request on the repo.

This PR introduces certain changes to the governance, which in effect elevate the voting rights of two existing groups: the contributor experience team and the communication team. It also renames the "core developers" team to "maintainers", and puts it, together with the above two teams, in a "core contributors" group.

According to our governance, we need to call a vote for any such changes, hence I'm calling for a vote. Please vote on the pull request. The vote will conclude in a month, and we need a 2/3 majority of the cast votes to pass the motion.

Regards,
Adrin

From adrin.jalali at gmail.com Sat Mar 25 15:42:25 2023
From: adrin.jalali at gmail.com (Adrin)
Date: Sat, 25 Mar 2023 20:42:25 +0100
Subject: [scikit-learn] CFP: GitHub copilot for PRs
Message-ID:

What do we think of GitHub Copilot for PRs?

Has anybody tried it? Is it something we think is a good idea at this point?

I'm gonna try it on some smaller repos and see what it does.

https://github.com/features/preview/copilot-x

Cheers,
Adrin

From readyready15728 at gmail.com Sat Mar 25 15:49:23 2023
From: readyready15728 at gmail.com (Lynn Bradshaw)
Date: Sat, 25 Mar 2023 15:49:23 -0400
Subject: [scikit-learn] CFP: GitHub copilot for PRs
In-Reply-To:
References:
Message-ID:

I'm disinclined to use it, barring extensive human review, because of chats I've had like this:

[image: Screenshot 2023-03-19 at 20-47-32 ChatGPT.png]

On Sat, Mar 25, 2023 at 3:43 PM, Adrin wrote:
> What do we think of GitHub Copilot for PRs?
> [...]
From g.lemaitre58 at gmail.com Sat Mar 25 16:20:38 2023
From: g.lemaitre58 at gmail.com (Guillaume Lemaître)
Date: Sat, 25 Mar 2023 21:20:38 +0100
Subject: [scikit-learn] CFP: GitHub copilot for PRs
In-Reply-To:
References:
Message-ID:

I assume that we need to check which features could be used. For instance, providing an automatic description in PRs could be something that I kind of like. Proposing non-regression tests for newcomers who have never written any could also be useful. In the end, we will always add manual reviews before merging.

The billion-dollar question is: can Copilot X accelerate or ease the contribution or reviewing process?

Cheers,

On 25 Mar 2023, at 20:49, Lynn Bradshaw wrote:
> I'm disinclined to use it, barring extensive human review, because of
> chats I've had like this:
> [image: Screenshot 2023-03-19 at 20-47-32 ChatGPT.png]
> [...]