From benoit.presles at u-bourgogne.fr  Fri Apr  3 07:00:36 2020
From: benoit.presles at u-bourgogne.fr (=?UTF-8?Q?Beno=c3=aet_Presles?=)
Date: Fri, 3 Apr 2020 13:00:36 +0200
Subject: [scikit-learn] Number of informative features vs total number of features
In-Reply-To: <10c2473f-50e3-c959-b9f7-07c2b903c840@u-bourgogne.fr>
References: <10c2473f-50e3-c959-b9f7-07c2b903c840@u-bourgogne.fr>
Message-ID: <525257ad-dad6-cf48-9749-4461452a3a72@u-bourgogne.fr>

Dear sklearn users,

I have just checked if the generated features were independent by
computing the covariance and correlation matrices and it seems they are,
so I really do not understand my results.
Any idea ?

Thanks for your help,
Best regards,
Ben


Le 31/03/2020 à 15:48, Benoît Presles a écrit :
> Dear sklearn users,
>
> I did some supervised classification simulations with the
> make_classification function from sklearn increasing the number of
> informative features from 1 out of 40 to 40 out of 40 (100%). I did
> not generate any repeated or redundant features. I fixed the number of
> classes to two and the number of clusters per class to one.
>
> I split the dataset 100 times using the StratifiedShuffleSplit
> function into two subsets: a training set and a test set (80% - 20%).
> I performed a logistic regression and calculated training and testing
> accuracies and averaged the results over the 100 splits leading to a
> mean training accuracy and a mean testing accuracy.
>
> I was expecting to get an increasing accuracy score as a function of
> informative features for both the training and the test sets. On the
> contrary, I have got the best training and test scores for one
> informative feature. Why do I get these results ?
>
> Thanks for your help,
> Best regards,
> Ben
>
> Below the simulation code I have written:
>
> import numpy as np
> from sklearn.datasets import make_classification
> from sklearn.model_selection import StratifiedShuffleSplit
> from sklearn.preprocessing import StandardScaler
> from sklearn.linear_model import LogisticRegression
> from sklearn.metrics import accuracy_score
> import matplotlib.pyplot as plt
>
> RANDOM_SEED = 4
> n_inf = np.array([1, 5, 10, 15, 20, 25, 30, 35, 40])
>
> mean_training_score_array = np.array([])
> mean_testing_score_array = np.array([])
> for n_inf_value in n_inf:
>     X, y = make_classification(n_samples=2500,
>                                n_features=40,
>                                n_informative=n_inf_value,
>                                n_redundant=0,
>                                n_repeated=0,
>                                n_classes=2,
>                                n_clusters_per_class=1,
>                                random_state=RANDOM_SEED,
>                                shuffle=False)
>     #
>     print('Simulated data - number of informative features = ' +
>           str(n_inf_value))
>     #
>     sss = StratifiedShuffleSplit(n_splits=100, test_size=0.2,
>                                  random_state=RANDOM_SEED)
>     training_score_array = np.array([])
>     testing_score_array = np.array([])
>     for train_index_split, test_index_split in sss.split(X, y):
>         X_split_train, X_split_test = X[train_index_split], X[test_index_split]
>         y_split_train, y_split_test = y[train_index_split], y[test_index_split]
>         scaler = StandardScaler()
>         X_split_train = scaler.fit_transform(X_split_train)
>         X_split_test = scaler.transform(X_split_test)
>         lr = LogisticRegression(fit_intercept=True, max_iter=1e9, verbose=0,
>                                 random_state=RANDOM_SEED,
>                                 solver='lbfgs', tol=1e-6, C=10)
>         lr.fit(X_split_train, y_split_train)
>         y_pred_train = lr.predict(X_split_train)
>         y_pred_test = lr.predict(X_split_test)
>         accuracy_train_score = accuracy_score(y_split_train, y_pred_train)
>         accuracy_test_score = accuracy_score(y_split_test, y_pred_test)
>         training_score_array = np.append(training_score_array,
>                                          accuracy_train_score)
>         testing_score_array = np.append(testing_score_array,
>                                         accuracy_test_score)
>     mean_training_score_array = np.append(mean_training_score_array,
>                                           np.average(training_score_array))
>     mean_testing_score_array = np.append(mean_testing_score_array,
>                                          np.average(testing_score_array))
> #
> print('mean_training_score_array=' + str(mean_training_score_array))
> print('mean_testing_score_array=' + str(mean_testing_score_array))
> #
> plt.plot(n_inf, mean_training_score_array, 'r', label='mean training score')
> plt.plot(n_inf, mean_testing_score_array, 'g', label='mean testing score')
> plt.xlabel('number of informative features out of 40')
> plt.ylabel('accuracy')
> plt.legend()
> plt.show()
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn


From t3kcit at gmail.com  Fri Apr  3 10:51:15 2020
From: t3kcit at gmail.com (Andreas Mueller)
Date: Fri, 3 Apr 2020 10:51:15 -0400
Subject: [scikit-learn] Number of informative features vs total number of features
In-Reply-To: <525257ad-dad6-cf48-9749-4461452a3a72@u-bourgogne.fr>
References: <10c2473f-50e3-c959-b9f7-07c2b903c840@u-bourgogne.fr>
 <525257ad-dad6-cf48-9749-4461452a3a72@u-bourgogne.fr>
Message-ID: <809a5bde-f637-0a74-2d68-08f9b4d7ba7c@gmail.com>

Hi Ben.
I'd recommend you check the code to see how the data is generated.

Best,
Andy

On 4/3/20 7:00 AM, Benoît Presles wrote:
> Dear sklearn users,
>
> I have just checked if the generated features were independent by
> computing the covariance and correlation matrices and it seems they
> are, so I really do not understand my results.
> Any idea ?
>
> Thanks for your help,
> Best regards,
> Ben
>
>
> Le 31/03/2020 à 15:48, Benoît Presles a écrit :
>> Dear sklearn users,
>>
>> I did some supervised classification simulations with the
>> make_classification function from sklearn increasing the number of
>> informative features from 1 out of 40 to 40 out of 40 (100%). I did
>> not generate any repeated or redundant features. I fixed the number
>> of classes to two and the number of clusters per class to one.
>>
>> I split the dataset 100 times using the StratifiedShuffleSplit
>> function into two subsets: a training set and a test set (80% - 20%).
>> I performed a logistic regression and calculated training and testing
>> accuracies and averaged the results over the 100 splits leading to a
>> mean training accuracy and a mean testing accuracy.
>>
>> I was expecting to get an increasing accuracy score as a function of
>> informative features for both the training and the test sets. On the
>> contrary, I have got the best training and test scores for one
>> informative feature. Why do I get these results ?
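
One way to act on the suggestion to check how the data is generated is to look directly at the geometry of what make_classification returns for each n_informative value, for instance the distance between the two class centroids compared with the within-class spread. Below is a minimal sketch along those lines, reusing the generator settings from the script quoted below; the numbers it prints depend on the seed, so treat it only as a way of inspecting the data, not as an explanation of the scores.

import numpy as np
from sklearn.datasets import make_classification

RANDOM_SEED = 4
for n_inf_value in [1, 5, 10, 20, 40]:
    X, y = make_classification(n_samples=2500, n_features=40,
                               n_informative=n_inf_value, n_redundant=0,
                               n_repeated=0, n_classes=2,
                               n_clusters_per_class=1,
                               random_state=RANDOM_SEED, shuffle=False)
    # distance between the two class centroids in the full 40-feature space
    centroid_dist = np.linalg.norm(X[y == 0].mean(axis=0) - X[y == 1].mean(axis=0))
    # pooled within-class standard deviation, averaged over the features
    within_spread = 0.5 * (X[y == 0].std(axis=0).mean() + X[y == 1].std(axis=0).mean())
    print('n_informative=%2d  centroid distance=%.2f  within-class spread=%.2f'
          % (n_inf_value, centroid_dist, within_spread))
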
>>
>> Thanks for your help,
>> Best regards,
>> Ben
>>
>> Below the simulation code I have written:
>>
>> import numpy as np
>> from sklearn.datasets import make_classification
>> from sklearn.model_selection import StratifiedShuffleSplit
>> from sklearn.preprocessing import StandardScaler
>> from sklearn.linear_model import LogisticRegression
>> from sklearn.metrics import accuracy_score
>> import matplotlib.pyplot as plt
>>
>> RANDOM_SEED = 4
>> n_inf = np.array([1, 5, 10, 15, 20, 25, 30, 35, 40])
>>
>> mean_training_score_array = np.array([])
>> mean_testing_score_array = np.array([])
>> for n_inf_value in n_inf:
>>     X, y = make_classification(n_samples=2500,
>>                                n_features=40,
>>                                n_informative=n_inf_value,
>>                                n_redundant=0,
>>                                n_repeated=0,
>>                                n_classes=2,
>>                                n_clusters_per_class=1,
>>                                random_state=RANDOM_SEED,
>>                                shuffle=False)
>>     #
>>     print('Simulated data - number of informative features = ' +
>>           str(n_inf_value))
>>     #
>>     sss = StratifiedShuffleSplit(n_splits=100, test_size=0.2,
>>                                  random_state=RANDOM_SEED)
>>     training_score_array = np.array([])
>>     testing_score_array = np.array([])
>>     for train_index_split, test_index_split in sss.split(X, y):
>>         X_split_train, X_split_test = X[train_index_split], X[test_index_split]
>>         y_split_train, y_split_test = y[train_index_split], y[test_index_split]
>>         scaler = StandardScaler()
>>         X_split_train = scaler.fit_transform(X_split_train)
>>         X_split_test = scaler.transform(X_split_test)
>>         lr = LogisticRegression(fit_intercept=True, max_iter=1e9, verbose=0,
>>                                 random_state=RANDOM_SEED,
>>                                 solver='lbfgs', tol=1e-6, C=10)
>>         lr.fit(X_split_train, y_split_train)
>>         y_pred_train = lr.predict(X_split_train)
>>         y_pred_test = lr.predict(X_split_test)
>>         accuracy_train_score = accuracy_score(y_split_train, y_pred_train)
>>         accuracy_test_score = accuracy_score(y_split_test, y_pred_test)
>>         training_score_array = np.append(training_score_array,
>>                                          accuracy_train_score)
>>         testing_score_array = np.append(testing_score_array,
>>                                         accuracy_test_score)
>>     mean_training_score_array = np.append(mean_training_score_array,
>>                                           np.average(training_score_array))
>>    
mean_testing_score_array = np.append(mean_testing_score_array, >> np.average(testing_score_array)) >> # >> print('mean_training_score_array=' + str(mean_training_score_array)) >> print('mean_testing_score_array=' + str(mean_testing_score_array)) >> # >> plt.plot(n_inf, mean_training_score_array, 'r', label='mean training >> score') >> plt.plot(n_inf, mean_testing_score_array, 'g', label='mean testing >> score') >> plt.xlabel('number of informative features out of 40') >> plt.ylabel('accuracy') >> plt.legend() >> plt.show() >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From marmochiaskl at gmail.com Fri Apr 24 06:29:19 2020 From: marmochiaskl at gmail.com (Chiara Marmo) Date: Fri, 24 Apr 2020 12:29:19 +0200 Subject: [scikit-learn] April 27th scikit-learn monthly meeting Message-ID: Hi all, The next scikit-learn monthly meeting will take place on Monday April 27th at the usual time: https://www.timeanddate.com/worldclock/meetingdetails.html?year=2020&month=4&day=27&hour=12&min=0&sec=0&p1=240&p2=33&p3=37&p4=179&p5=195 While these meetings are mainly for core-devs to discuss the current topics, we're also happy to welcome non-core devs and other projects maintainers! Feel free to join, using the following link: https://anaconda.zoom.us/j/94399382811?pwd=cXBtQ2lTVEtVbFpVTkE3TVFxdEhqZz09 Meeting ID: 943 9938 2811 Password: 68473658 If you plan to attend and you would like to discuss something specific about your contribution please add your name (or github pseudo) in the "Issue and comments from contributors ", of the public pad: https://hackmd.io/5c6LxpnWSzeaBwJfuX5gPA *@core devs, please make sure to update your notes on Friday.* Best, Chiara -------------- next part -------------- An HTML attachment was scrubbed... URL: From paisanohermes at hotmail.com Fri Apr 24 06:36:32 2020 From: paisanohermes at hotmail.com (Hermes Morales) Date: Fri, 24 Apr 2020 10:36:32 +0000 Subject: [scikit-learn] April 27th scikit-learn monthly meeting In-Reply-To: References: Message-ID: Thank you Chiara Which is the usual time? Obtener Outlook para Android ________________________________ From: scikit-learn on behalf of Chiara Marmo Sent: Friday, April 24, 2020 7:29:19 AM To: Scikit-learn mailing list Subject: [scikit-learn] April 27th scikit-learn monthly meeting Hi all, The next scikit-learn monthly meeting will take place on Monday April 27th at the usual time: https://www.timeanddate.com/worldclock/meetingdetails.html?year=2020&month=4&day=27&hour=12&min=0&sec=0&p1=240&p2=33&p3=37&p4=179&p5=195 While these meetings are mainly for core-devs to discuss the current topics, we're also happy to welcome non-core devs and other projects maintainers! Feel free to join, using the following link: https://anaconda.zoom.us/j/94399382811?pwd=cXBtQ2lTVEtVbFpVTkE3TVFxdEhqZz09 Meeting ID: 943 9938 2811 Password: 68473658 If you plan to attend and you would like to discuss something specific about your contribution please add your name (or github pseudo) in the "Issue and comments from contributors", of the public pad: https://hackmd.io/5c6LxpnWSzeaBwJfuX5gPA @core devs, please make sure to update your notes on Friday. Best, Chiara -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From faf96 at hotmail.it Fri Apr 24 06:37:43 2020 From: faf96 at hotmail.it (Francesco basciani) Date: Fri, 24 Apr 2020 10:37:43 +0000 Subject: [scikit-learn] Class weight SVC Message-ID: Hi, I have a question regarding the class weights in SVC. I have an imbalanced binary classification problem. In my case the ratio between the positive class and the negative class is 4:1. I just want to know if setting class weight to: class_weight = {1: 0.25, 0: 1} is the same as setting it to: class_weight = {1: 1, 0: 4}, because in my case I obtain different results using the two definitions of the class weight. Sent from Mail for Windows 10 -------------- next part -------------- An HTML attachment was scrubbed... URL: From adrin.jalali at gmail.com Fri Apr 24 06:40:47 2020 From: adrin.jalali at gmail.com (Adrin) Date: Fri, 24 Apr 2020 12:40:47 +0200 Subject: [scikit-learn] April 27th scikit-learn monthly meeting In-Reply-To: References: Message-ID: Hi Hermes, It's 12pm (noon) UTC Thanks for asking. Best, Adrin. On Fri, Apr 24, 2020 at 12:37 PM Hermes Morales wrote: > Thank you Chiara > Which is the usual time? > > Obtener Outlook para Android > > ------------------------------ > *From:* scikit-learn hotmail.com at python.org> on behalf of Chiara Marmo > *Sent:* Friday, April 24, 2020 7:29:19 AM > *To:* Scikit-learn mailing list > *Subject:* [scikit-learn] April 27th scikit-learn monthly meeting > > > Hi all, > > The next scikit-learn monthly meeting will take place on Monday April 27th > at the usual time: > https://www.timeanddate.com/worldclock/meetingdetails.html?year=2020&month=4&day=27&hour=12&min=0&sec=0&p1=240&p2=33&p3=37&p4=179&p5=195 > > While these meetings are mainly for core-devs to discuss the current > topics, we're also happy to welcome non-core devs and other projects > maintainers! Feel free to join, using the following link: > > https://anaconda.zoom.us/j/94399382811?pwd=cXBtQ2lTVEtVbFpVTkE3TVFxdEhqZz09 > > Meeting ID: 943 9938 2811 > Password: 68473658 > > If you plan to attend and you would like to discuss something specific > about your contribution please add your name (or github pseudo) in the "Issue > and comments from contributors > ", > of the public pad: > > https://hackmd.io/5c6LxpnWSzeaBwJfuX5gPA > > > > *@core devs, please make sure to update your notes on Friday. * > > > Best, > > Chiara > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From marmochiaskl at gmail.com Mon Apr 27 07:22:32 2020 From: marmochiaskl at gmail.com (Chiara Marmo) Date: Mon, 27 Apr 2020 13:22:32 +0200 Subject: [scikit-learn] April 27th scikit-learn monthly meeting In-Reply-To: References: Message-ID: Dear all, the zoom link used for the core-dev meeting had to be updated. The new link follows. Join the core-dev Zoom Meeting at https://us02web.zoom.us/j/2752786717 Meeting ID: 275 278 6717 See you there! Best, Chiara On Fri, Apr 24, 2020 at 12:29 PM Chiara Marmo wrote: > Hi all, > > The next scikit-learn monthly meeting will take place on Monday April 27th > at the usual time: > https://www.timeanddate.com/worldclock/meetingdetails.html?year=2020&month=4&day=27&hour=12&min=0&sec=0&p1=240&p2=33&p3=37&p4=179&p5=195 > > While these meetings are mainly for core-devs to discuss the current > topics, we're also happy to welcome non-core devs and other projects > maintainers!
Feel free to join, using the following link: > > https://anaconda.zoom.us/j/94399382811?pwd=cXBtQ2lTVEtVbFpVTkE3TVFxdEhqZz09 > > Meeting ID: 943 9938 2811 > Password: 68473658 > > If you plan to attend and you would like to discuss something specific > about your contribution please add your name (or github pseudo) in the "Issue > and comments from contributors > ", > of the public pad: > > https://hackmd.io/5c6LxpnWSzeaBwJfuX5gPA > > > > *@core devs, please make sure to update your notes on Friday.* > > > Best, > > Chiara > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Mon Apr 27 08:10:01 2020 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Mon, 27 Apr 2020 14:10:01 +0200 Subject: [scikit-learn] April 27th scikit-learn monthly meeting In-Reply-To: References: Message-ID: <20200427121001.grnvkrcylj7sohav@phare.normalesup.org> I seem to be failing to get this to work. Am I the only one? If not, we'll need a fallback. Any suggestions? We can use http://meet.jit.si/ or https://whereby.com/ but I don't know if they will handle the load. G On Mon, Apr 27, 2020 at 01:22:32PM +0200, Chiara Marmo wrote: > Dear all, > the zoom link used for the core-dev meeting had to be updated. > The new link follows. > Join the core-dev Zoom Meeting at > https://us02web.zoom.us/j/2752786717 > Meeting ID: 275 278 6717 > See you there! > Best, > Chiara > On Fri, Apr 24, 2020 at 12:29 PM Chiara Marmo wrote: > Hi all, > The next scikit-learn monthly meeting will take place on Monday April 27th > at the usual time: https://www.timeanddate.com/worldclock/ > meetingdetails.html?year=2020&month=4&day=27&hour=12&min=0&sec=0&p1=240&p2= > 33&p3=37&p4=179&p5=195 > While these meetings are mainly for core-devs to discuss the current > topics, we're also happy to welcome non-core devs and other projects > maintainers! Feel free to join, using the following link: > https://anaconda.zoom.us/j/94399382811?pwd=cXBtQ2lTVEtVbFpVTkE3TVFxdEhqZz09 > Meeting ID: 943 9938 2811 > Password: 68473658 > If you plan to attend and you would like to discuss something specific > about your contribution please add your name (or github pseudo) in the " > Issue and comments from contributors", of the public pad: > https://hackmd.io/5c6LxpnWSzeaBwJfuX5gPA > @core devs, please make sure to update your notes on Friday. > Best, > Chiara > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -- Gael Varoquaux Research Director, INRIA Visiting professor, McGill http://gael-varoquaux.info http://twitter.com/GaelVaroquaux From t3kcit at gmail.com Mon Apr 27 09:12:02 2020 From: t3kcit at gmail.com (Andreas Mueller) Date: Mon, 27 Apr 2020 09:12:02 -0400 Subject: [scikit-learn] Vote: Add Adrin Jalali to the scikit-learn technical committee Message-ID: <7d9ffac3-35d0-7e30-9c96-3c125b4f9fe7@gmail.com> Hi All. Given all his recent contributions, I want to nominate Adrin Jalali to the Technical Committee: https://scikit-learn.org/stable/governance.html#technical-committee According to the governance document, this will require a discussion and vote. I think we can move to the vote immediately unless someone objects. Thanks for all your work Adrin! 
Cheers, Andy From gael.varoquaux at normalesup.org Mon Apr 27 09:16:11 2020 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Mon, 27 Apr 2020 15:16:11 +0200 Subject: [scikit-learn] Vote: Add Adrin Jalali to the scikit-learn technical committee In-Reply-To: <7d9ffac3-35d0-7e30-9c96-3c125b4f9fe7@gmail.com> References: <7d9ffac3-35d0-7e30-9c96-3c125b4f9fe7@gmail.com> Message-ID: <20200427131611.yctxxzyeqq5tqc4g@phare.normalesup.org> +1 And thank you very much Adrin! On Mon, Apr 27, 2020 at 09:12:02AM -0400, Andreas Mueller wrote: > Hi All. > Given all his recent contributions, I want to nominate Adrin Jalali to the > Technical Committee: > https://scikit-learn.org/stable/governance.html#technical-committee > According to the governance document, this will require a discussion and > vote. > I think we can move to the vote immediately unless someone objects. > Thanks for all your work Adrin! > Cheers, > Andy > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -- Gael Varoquaux Research Director, INRIA Visiting professor, McGill http://gael-varoquaux.info http://twitter.com/GaelVaroquaux From niourf at gmail.com Mon Apr 27 09:18:58 2020 From: niourf at gmail.com (Nicolas Hug) Date: Mon, 27 Apr 2020 09:18:58 -0400 Subject: [scikit-learn] Vote: Add Adrin Jalali to the scikit-learn technical committee In-Reply-To: <20200427131611.yctxxzyeqq5tqc4g@phare.normalesup.org> References: <7d9ffac3-35d0-7e30-9c96-3c125b4f9fe7@gmail.com> <20200427131611.yctxxzyeqq5tqc4g@phare.normalesup.org> Message-ID: <19e1fa51-810a-1a3d-74c3-448182f1244a@gmail.com> +1 On 4/27/20 9:16 AM, Gael Varoquaux wrote: > +1 > > And thank you very much Adrin! > > On Mon, Apr 27, 2020 at 09:12:02AM -0400, Andreas Mueller wrote: >> Hi All. >> Given all his recent contributions, I want to nominate Adrin Jalali to the >> Technical Committee: >> https://scikit-learn.org/stable/governance.html#technical-committee >> According to the governance document, this will require a discussion and >> vote. >> I think we can move to the vote immediately unless someone objects. >> Thanks for all your work Adrin! >> Cheers, >> Andy >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn From jeremie.du-boisberranger at inria.fr Mon Apr 27 09:20:42 2020 From: jeremie.du-boisberranger at inria.fr (Jeremie du Boisberranger) Date: Mon, 27 Apr 2020 15:20:42 +0200 Subject: [scikit-learn] Vote: Add Adrin Jalali to the scikit-learn technical committee In-Reply-To: <19e1fa51-810a-1a3d-74c3-448182f1244a@gmail.com> References: <7d9ffac3-35d0-7e30-9c96-3c125b4f9fe7@gmail.com> <20200427131611.yctxxzyeqq5tqc4g@phare.normalesup.org> <19e1fa51-810a-1a3d-74c3-448182f1244a@gmail.com> Message-ID: <596df3e1-e15a-4aae-dea9-e9d9935bda9b@inria.fr> +1 On 27/04/2020 15:18, Nicolas Hug wrote: > +1 > > On 4/27/20 9:16 AM, Gael Varoquaux wrote: >> +1 >> >> And thank you very much Adrin! >> >> On Mon, Apr 27, 2020 at 09:12:02AM -0400, Andreas Mueller wrote: >>> Hi All. >>> Given all his recent contributions, I want to nominate Adrin Jalali >>> to the >>> Technical Committee: >>> https://scikit-learn.org/stable/governance.html#technical-committee >>> According to the governance document, this will require a discussion >>> and >>> vote. >>> I think we can move to the vote immediately unless someone objects. >>> Thanks for all your work Adrin! 
>>> Cheers, >>> Andy >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From rth.yurchak at gmail.com Mon Apr 27 09:28:36 2020 From: rth.yurchak at gmail.com (Roman Yurchak) Date: Mon, 27 Apr 2020 15:28:36 +0200 Subject: [scikit-learn] Vote: Add Adrin Jalali to the scikit-learn technical committee In-Reply-To: <596df3e1-e15a-4aae-dea9-e9d9935bda9b@inria.fr> References: <7d9ffac3-35d0-7e30-9c96-3c125b4f9fe7@gmail.com> <20200427131611.yctxxzyeqq5tqc4g@phare.normalesup.org> <19e1fa51-810a-1a3d-74c3-448182f1244a@gmail.com> <596df3e1-e15a-4aae-dea9-e9d9935bda9b@inria.fr> Message-ID: <33f6d22c-4d60-e103-3b89-68e4b2b4f996@gmail.com> +1 On 27/04/2020 15:20, Jeremie du Boisberranger wrote: > +1 > > On 27/04/2020 15:18, Nicolas Hug wrote: >> +1 >> >> On 4/27/20 9:16 AM, Gael Varoquaux wrote: >>> +1 >>> >>> And thank you very much Adrin! >>> >>> On Mon, Apr 27, 2020 at 09:12:02AM -0400, Andreas Mueller wrote: >>>> Hi All. >>>> Given all his recent contributions, I want to nominate Adrin Jalali >>>> to the >>>> Technical Committee: >>>> https://scikit-learn.org/stable/governance.html#technical-committee >>>> According to the governance document, this will require a discussion >>>> and >>>> vote. >>>> I think we can move to the vote immediately unless someone objects. >>>> Thanks for all your work Adrin! >>>> Cheers, >>>> Andy >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From qinhanmin2005 at sina.com Mon Apr 27 09:29:00 2020 From: qinhanmin2005 at sina.com (Hanmin Qin) Date: Mon, 27 Apr 2020 21:29:00 +0800 Subject: [scikit-learn] Vote: Add Adrin Jalali to the scikit-learn technical committee Message-ID: <20200427132900.E38F02D0009D@webmail.sinamail.sina.com.cn> +1 Hanmin Qin ----- Original Message ----- From: Jeremie du Boisberranger To: scikit-learn at python.org Subject: Re: [scikit-learn] Vote: Add Adrin Jalali to the scikit-learn technical committee Date: 2020-04-27 21:23 +1 On 27/04/2020 15:18, Nicolas Hug wrote: > +1 > > On 4/27/20 9:16 AM, Gael Varoquaux wrote: >> +1 >> >> And thank you very much Adrin! >> >> On Mon, Apr 27, 2020 at 09:12:02AM -0400, Andreas Mueller wrote: >>> Hi All. >>> Given all his recent contributions, I want to nominate Adrin Jalali >>> to the >>> Technical Committee: >>> https://scikit-learn.org/stable/governance.html#technical-committee >>> According to the governance document, this will require a discussion >>> and >>> vote. >>> I think we can move to the vote immediately unless someone objects. >>> Thanks for all your work Adrin! 
>>> Cheers, >>> Andy >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From bertrand.thirion at inria.fr Mon Apr 27 09:29:30 2020 From: bertrand.thirion at inria.fr (bthirion) Date: Mon, 27 Apr 2020 15:29:30 +0200 Subject: [scikit-learn] Vote: Add Adrin Jalali to the scikit-learn technical committee In-Reply-To: <33f6d22c-4d60-e103-3b89-68e4b2b4f996@gmail.com> References: <7d9ffac3-35d0-7e30-9c96-3c125b4f9fe7@gmail.com> <20200427131611.yctxxzyeqq5tqc4g@phare.normalesup.org> <19e1fa51-810a-1a3d-74c3-448182f1244a@gmail.com> <596df3e1-e15a-4aae-dea9-e9d9935bda9b@inria.fr> <33f6d22c-4d60-e103-3b89-68e4b2b4f996@gmail.com> Message-ID: +1 On 27/04/2020 15:28, Roman Yurchak wrote: > +1 > > On 27/04/2020 15:20, Jeremie du Boisberranger wrote: >> +1 >> >> On 27/04/2020 15:18, Nicolas Hug wrote: >>> +1 >>> >>> On 4/27/20 9:16 AM, Gael Varoquaux wrote: >>>> +1 >>>> >>>> And thank you very much Adrin! >>>> >>>> On Mon, Apr 27, 2020 at 09:12:02AM -0400, Andreas Mueller wrote: >>>>> Hi All. >>>>> Given all his recent contributions, I want to nominate Adrin >>>>> Jalali to the >>>>> Technical Committee: >>>>> https://scikit-learn.org/stable/governance.html#technical-committee >>>>> According to the governance document, this will require a >>>>> discussion and >>>>> vote. >>>>> I think we can move to the vote immediately unless someone objects. >>>>> Thanks for all your work Adrin! >>>>> Cheers, >>>>> Andy >>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From rth.yurchak at gmail.com Mon Apr 27 09:30:49 2020 From: rth.yurchak at gmail.com (Roman Yurchak) Date: Mon, 27 Apr 2020 15:30:49 +0200 Subject: [scikit-learn] Voting software In-Reply-To: <596df3e1-e15a-4aae-dea9-e9d9935bda9b@inria.fr> References: <596df3e1-e15a-4aae-dea9-e9d9935bda9b@inria.fr> Message-ID: <92e6e396-9439-269d-ee5e-59b47652191a@gmail.com> BTW, could we use some online voting software for votes? Just to avoid filling public email threads with +1s. For instance CPython uses https://www.python.org/dev/peps/pep-8001/ but it is anonymous. Does anyone know a simple non anonymous one preferably linked to Github authentication? On 27/04/2020 15:18, Nicolas Hug wrote: > +1 > > On 4/27/20 9:16 AM, Gael Varoquaux wrote: >> +1 >> >> And thank you very much Adrin!
>> >> On Mon, Apr 27, 2020 at 09:12:02AM -0400, Andreas Mueller wrote: >>> Hi All. >>> Given all his recent contributions, I want to nominate Adrin Jalali >>> to the >>> Technical Committee: >>> https://scikit-learn.org/stable/governance.html#technical-committee >>> According to the governance document, this will require a discussion >>> and >>> vote. >>> I think we can move to the vote immediately unless someone objects. >>> Thanks for all your work Adrin! >>> Cheers, >>> Andy >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn From thomasjpfan at gmail.com Mon Apr 27 09:32:05 2020 From: thomasjpfan at gmail.com (Thomas J Fan) Date: Mon, 27 Apr 2020 09:32:05 -0400 Subject: [scikit-learn] Vote: Add Adrin Jalali to the scikit-learn technical committee In-Reply-To: <7d9ffac3-35d0-7e30-9c96-3c125b4f9fe7@gmail.com> References: <7d9ffac3-35d0-7e30-9c96-3c125b4f9fe7@gmail.com> Message-ID: <8f0f6d85-2d52-4bb5-bd5d-c13d60377364@Canary> +1 > On Monday, Apr 27, 2020 at 9:14 AM, Andreas Mueller wrote: > Hi All. > > Given all his recent contributions, I want to nominate Adrin Jalali to > the Technical Committee: > https://scikit-learn.org/stable/governance.html#technical-committee > > According to the governance document, this will require a discussion and > vote. > I think we can move to the vote immediately unless someone objects. > > Thanks for all your work Adrin! > > Cheers, > Andy > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexandre.gramfort at inria.fr Mon Apr 27 09:59:23 2020 From: alexandre.gramfort at inria.fr (Alexandre Gramfort) Date: Mon, 27 Apr 2020 15:59:23 +0200 Subject: [scikit-learn] Vote: Add Adrin Jalali to the scikit-learn technical committee In-Reply-To: <8f0f6d85-2d52-4bb5-bd5d-c13d60377364@Canary> References: <7d9ffac3-35d0-7e30-9c96-3c125b4f9fe7@gmail.com> <8f0f6d85-2d52-4bb5-bd5d-c13d60377364@Canary> Message-ID: +1 -------------- next part -------------- An HTML attachment was scrubbed... URL: From paisanohermes at hotmail.com Mon Apr 27 11:09:12 2020 From: paisanohermes at hotmail.com (Hermes Morales) Date: Mon, 27 Apr 2020 15:09:12 +0000 Subject: [scikit-learn] Voting software In-Reply-To: <92e6e396-9439-269d-ee5e-59b47652191a@gmail.com> References: <596df3e1-e15a-4aae-dea9-e9d9935bda9b@inria.fr>, <92e6e396-9439-269d-ee5e-59b47652191a@gmail.com> Message-ID: https://doodle.com/es/ is not bad Obtener Outlook para Android ________________________________ From: scikit-learn on behalf of Roman Yurchak Sent: Monday, April 27, 2020 10:30:49 AM To: Scikit-learn user and developer mailing list Subject: Re: [scikit-learn] Voting software BTW, could we use some online voting software for votes? Just to avoid filling public email threads with +1s. For instance CPython uses https://www.python.org/dev/peps/pep-8001/ but it is anonymous. Does anyone know a simple non anonymous one preferably linked to Github authentication? 
On 27/04/2020 15:18, Nicolas Hug wrote: > +1 > > On 4/27/20 9:16 AM, Gael Varoquaux wrote: >> +1 >> >> And thank you very much Adrin! >> >> On Mon, Apr 27, 2020 at 09:12:02AM -0400, Andreas Mueller wrote: >>> Hi All. >>> Given all his recent contributions, I want to nominate Adrin Jalali >>> to the >>> Technical Committee: >>> https://scikit-learn.org/stable/governance.html#technical-committee >>> According to the governance document, this will require a discussion >>> and >>> vote. >>> I think we can move to the vote immediately unless someone objects. >>> Thanks for all your work Adrin! >>> Cheers, >>> Andy >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From tom.duprelatour at orange.fr Mon Apr 27 12:21:54 2020 From: tom.duprelatour at orange.fr (Tom DLT) Date: Mon, 27 Apr 2020 09:21:54 -0700 Subject: [scikit-learn] Vote: Add Adrin Jalali to the scikit-learn technical committee In-Reply-To: References: <7d9ffac3-35d0-7e30-9c96-3c125b4f9fe7@gmail.com> <8f0f6d85-2d52-4bb5-bd5d-c13d60377364@Canary> Message-ID: +1 Le lun. 27 avr. 2020, ? 07 h 00, Alexandre Gramfort < alexandre.gramfort at inria.fr> a ?crit : > +1 > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From joel.nothman at gmail.com Mon Apr 27 19:34:08 2020 From: joel.nothman at gmail.com (Joel Nothman) Date: Tue, 28 Apr 2020 09:34:08 +1000 Subject: [scikit-learn] Vote: Add Adrin Jalali to the scikit-learn technical committee In-Reply-To: References: <7d9ffac3-35d0-7e30-9c96-3c125b4f9fe7@gmail.com> <8f0f6d85-2d52-4bb5-bd5d-c13d60377364@Canary> Message-ID: +1 On Tue, 28 Apr 2020 at 02:23, Tom DLT wrote: > +1 > > Le lun. 27 avr. 2020, ? 07 h 00, Alexandre Gramfort < > alexandre.gramfort at inria.fr> a ?crit : > >> +1 >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From dabruro at gmail.com Tue Apr 28 13:41:14 2020 From: dabruro at gmail.com (David R) Date: Tue, 28 Apr 2020 13:41:14 -0400 Subject: [scikit-learn] precision_recall_curve giving incorrect results on very small example Message-ID: Here is a very small example using precision_recall_curve(): from sklearn.metrics import precision_recall_curve, precision_score, recall_score y_true = [0, 1] y_predict_proba = [0.25,0.75] precision, recall, thresholds = precision_recall_curve(y_true, y_predict_proba) precision, recall which results in: (array([1., 1.]), array([1., 0.])) Now let's calculate manually to see whether that's correct. There are three possible class vectors depending on threshold: [0,0], [0,1], and [1,1]. We have to discard [0,0] because it gives an undefined precision (divide by zero). So, applying precision_score() and recall_score() to the other two: y_predict_class=[0,1] precision_score(y_true, y_predict_class), recall_score(y_true, y_predict_class) which gives: (1.0, 1.0) and y_predict_class=[1,1] precision_score(y_true, y_predict_class), recall_score(y_true, y_predict_class) which gives (0.5, 1.0) This seems not to match the output of precision_recall_curve() (which for example did not produce a 0.5 precision value). Am I missing something? -------------- next part -------------- An HTML attachment was scrubbed... URL: From jorisvandenbossche at gmail.com Tue Apr 28 14:56:11 2020 From: jorisvandenbossche at gmail.com (Joris Van den Bossche) Date: Tue, 28 Apr 2020 20:56:11 +0200 Subject: [scikit-learn] Vote: Add Adrin Jalali to the scikit-learn technical committee In-Reply-To: References: <7d9ffac3-35d0-7e30-9c96-3c125b4f9fe7@gmail.com> <8f0f6d85-2d52-4bb5-bd5d-c13d60377364@Canary> Message-ID: +1 On Tue, 28 Apr 2020 at 01:34, Joel Nothman wrote: > +1 > > On Tue, 28 Apr 2020 at 02:23, Tom DLT wrote: > >> +1 >> >> Le lun. 27 avr. 2020, ? 07 h 00, Alexandre Gramfort < >> alexandre.gramfort at inria.fr> a ?crit : >> >>> +1 >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjkeding at gmail.com Tue Apr 28 15:06:00 2020 From: tjkeding at gmail.com (Taylor J Keding) Date: Tue, 28 Apr 2020 14:06:00 -0500 Subject: [scikit-learn] MLPClassifier/Regressor and Kernel Processes when Multiprocessing Message-ID: Hi SciKit-Learn folks, I am building a stacked generalization classifier using the multilayer perceptron classifier as one of it's submodels. All data have been preprocessed appropriately and I am tuning each submodel's hyperparameters with a customized randomized search protocol (very similar to sklearn's RandomizedSearchCV). Importantly, I am using Python's Multiprocessing.Pool() to parallelize this search. When I start the hyperparameter search, jobs/threads do indeed spawn appropriately. Tuning other submodels (RandomForestClassifier, SVC, GradientBoostingClassifier, SDGClassifier) works perfectly, which each job (model with particular randomized parameters) being scored with cross_val_score and returning when the Pool of workers is complete. 
All is well until I reach the MLPClassifier model. Jobs spawn as with the other models, however, System CPU (Linux Kernel) processes surge and overwhelm my server. Approximately 20% of the CPUs are running User processes, while the other 80% of CPUS are running System/Kernel processes, causing immense slow-down. Again, this only happens with the MLPClassifier - all other models run appropriately with ~98% User processes and ~2% System/Kernel processes. Is there something unique in the MLPClassifier/Regressor models that causes increased System/Kernel processes compared to other models? In an attempt to troubleshoot, I used sklearn's RandomizedSearchCV instead of my custom implementation and the same problems happen (with n_jobs specified in the same way). Any help with why the MLP models are behaving this way during multiprocessing is much appreciated. Best, Taylor Keding -------------- next part -------------- An HTML attachment was scrubbed... URL: From rth.yurchak at gmail.com Tue Apr 28 15:21:48 2020 From: rth.yurchak at gmail.com (Roman Yurchak) Date: Tue, 28 Apr 2020 21:21:48 +0200 Subject: [scikit-learn] Voting software In-Reply-To: References: <596df3e1-e15a-4aae-dea9-e9d9935bda9b@inria.fr> <92e6e396-9439-269d-ee5e-59b47652191a@gmail.com> Message-ID: <50978a09-c9aa-6ac5-98d6-7eaf05a35f4e@gmail.com> True, but ideally it would need to be something more voting oriented that cannot be modified later on and archives a history of past decisions. On 27/04/2020 17:09, Hermes Morales wrote: > https://doodle.com/es/ is not bad > > Obtener Outlook para Android > > ------------------------------------------------------------------------ > *From:* scikit-learn > on behalf of > Roman Yurchak > *Sent:* Monday, April 27, 2020 10:30:49 AM > *To:* Scikit-learn user and developer mailing list > *Subject:* Re: [scikit-learn] Voting software > BTW, could we use some online voting software for votes? Just to avoid > filling public email threads with +1s. For instance CPython uses > https://www.python.org/dev/peps/pep-8001/ but it is anonymous. Does > anyone know a simple non anonymous one preferably linked to Github > authentication? > > On 27/04/2020 15:18, Nicolas Hug wrote: >> +1 >> >> On 4/27/20 9:16 AM, Gael Varoquaux wrote: >>> +1 >>> >>> And thank you very much Adrin! >>> >>> On Mon, Apr 27, 2020 at 09:12:02AM -0400, Andreas Mueller wrote: >>>> Hi All. >>>> Given all his recent contributions, I want to nominate Adrin Jalali >>>> to the >>>> Technical Committee: >>>> https://scikit-learn.org/stable/governance.html#technical-committee >>>> According to the governance document, this will require a discussion >>>> and >>>> vote. >>>> I think we can move to the vote immediately unless someone objects. >>>> Thanks for all your work Adrin! 
>>>> Cheers, >>>> Andy >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > From gael.varoquaux at normalesup.org Tue Apr 28 18:18:25 2020 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 29 Apr 2020 00:18:25 +0200 Subject: [scikit-learn] MLPClassifier/Regressor and Kernel Processes when Multiprocessing In-Reply-To: References: Message-ID: <20200428221825.wb4j4nlbzpdgshnx@phare.normalesup.org> Hi, I cannot look too much in details. However, I would advice you to try using loky or joblib instead of multiprocessing, as a lot of work has been put in them to protect against problems that can arise in multi-process parallel computing (for instance the underlying numerical libraries may not be fork safe, or they may have parallel computing abilities themselves). Hope this helps, Ga?l On Tue, Apr 28, 2020 at 02:06:00PM -0500, Taylor J Keding wrote: > Hi SciKit-Learn folks, > I am building a stacked generalization classifier using the multilayer > perceptron classifier?as one of it's submodels. All data have been preprocessed > appropriately and I am tuning each submodel's?hyperparameters with a customized > randomized search protocol (very similar to sklearn's RandomizedSearchCV). > Importantly, I am using Python's Multiprocessing.Pool() to parallelize this > search. > When I start the hyperparameter search, jobs/threads do indeed spawn > appropriately. Tuning other submodels (RandomForestClassifier, SVC, > GradientBoostingClassifier, SDGClassifier) works perfectly, which each job > (model with particular randomized parameters) being scored with cross_val_score > and returning when the Pool of workers is complete. All is well until I reach > the MLPClassifier model. Jobs spawn as with the other models, however, System > CPU (Linux Kernel) processes surge and overwhelm my server. Approximately 20% > of the CPUs are running User processes, while the other 80% of CPUS are running > System/Kernel processes,?causing immense slow-down. Again, this only happens > with the MLPClassifier?- all other models run appropriately with ~98% User > processes and ~2% System/Kernel processes. > Is there something unique in the MLPClassifier/Regressor models that causes > increased System/Kernel processes compared to other models? In an attempt to > troubleshoot, I used sklearn's?RandomizedSearchCV instead of my custom > implementation and the same problems happen (with n_jobs specified in the same > way). > Any help with why the MLP models are behaving this way during multiprocessing > is much appreciated. 
> Best, > Taylor Keding > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -- Gael Varoquaux Research Director, INRIA Visiting professor, McGill http://gael-varoquaux.info http://twitter.com/GaelVaroquaux From ahowe42 at gmail.com Thu Apr 30 13:05:48 2020 From: ahowe42 at gmail.com (Andrew Howe) Date: Thu, 30 Apr 2020 18:05:48 +0100 Subject: [scikit-learn] StackingClassifier Message-ID: Hi All Quick question about the stacking classifier . How do I know the order of the features that the final estimator uses? I've got an example which I've created like this (the LGRG and KSVM objects were previously defined, but as they seem they would be): passThrough = True finalEstim = DecisionTreeClassifier(random_state=42) stkClas = StackingClassifier(estimators=[('Logistic Regression', LGRG), ('Kernel SVM', KSVM)], cv=crossValInput, passthrough=passThrough, final_estimator=finalEstim, n_jobs=-1) Given this setup, I *think* the features input to the final estimator are - Logistic regression prediction probabilities for all classes - Kernel SVM prediction probabilities for all classes - original features of data passed into the stacking classifier I can find no documentation on this, though, and don't know of any relevant attribute on the final estimator. I need this to help interpret the final estimator tree - and specifically to provide feature labels for plot_tree. Thanks! Andrew <~~~~~~~~~~~~~~~~~~~~~~~~~~~> J. Andrew Howe, PhD LinkedIn Profile ResearchGate Profile Open Researcher and Contributor ID (ORCID) Github Profile Personal Website I live to learn, so I can learn to live. - me <~~~~~~~~~~~~~~~~~~~~~~~~~~~> -------------- next part -------------- An HTML attachment was scrubbed... URL: From tmrsg11 at gmail.com Thu Apr 30 15:55:00 2020 From: tmrsg11 at gmail.com (C W) Date: Thu, 30 Apr 2020 15:55:00 -0400 Subject: [scikit-learn] Why does sklearn require one-hot-encoding for categorical features? Can we have a "factor" data type? Message-ID: Hello everyone, I am frustrated with the one-hot-encoding requirement for categorical feature. Why? I've used R and Stata software, none needs such transformation. They have a data type called "factors", which is different from "numeric". My problem with OHE: One-hot-encoding results in large number of features. This really blows up quickly. And I have to fight curse of dimensionality with PCA reduction. That's not cool! Can sklearn have a "factor" data type in the future? It would make life so much easier. Thanks a lot! -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael.eickenberg at gmail.com Thu Apr 30 16:06:09 2020 From: michael.eickenberg at gmail.com (Michael Eickenberg) Date: Thu, 30 Apr 2020 16:06:09 -0400 Subject: [scikit-learn] Why does sklearn require one-hot-encoding for categorical features? Can we have a "factor" data type? In-Reply-To: References: Message-ID: Hi, I think there are many reasons that have led to the current situation. One is that scikit-learn is based on numpy arrays, which do not offer categorical data types (yet: ideas are being discussed https://numpy.org/neps/nep-0041-improved-dtype-support.html Pandas already has a categorical data type https://pandas.pydata.org/pandas-docs/stable/user_guide/categorical.html) For algorithms like random forests, having categorical variables would be absolutely great. 
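
As a point of reference, a tiny sketch of the pandas categorical dtype just mentioned: the column is stored as small integer codes plus an index of categories, rather than as one 0/1 column per level. The values shown in the comments are indicative only.

import pandas as pd

s = pd.Series(['red', 'green', 'red', 'blue'], dtype='category')
print(s.cat.categories)      # e.g. Index(['blue', 'green', 'red'], dtype='object')
print(s.cat.codes.tolist())  # one integer code per row, e.g. [2, 1, 2, 0]
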
Another reason might be different communities handling categorical data in different ways traditionally. One-hot-encoding is more common on the ML side than on the stats side for instance. To your point: > One-hot-encoding results in large number of features. This really blows up quickly. And I have to fight curse of dimensionality with PCA reduction. That's not cool! Depending on the algorithm being used, a categorical variable may or may not need to be expanded into one-hot dimension encoding under the hood, so the potential gain of having such a data encoding method is highly dependent on the algorithms used. Hope this helps! Michael On Thu, Apr 30, 2020 at 3:57 PM C W wrote: > Hello everyone, > > I am frustrated with the one-hot-encoding requirement for categorical > feature. Why? > > I've used R and Stata software, none needs such transformation. They have > a data type called "factors", which is different from "numeric". > > My problem with OHE: > One-hot-encoding results in large number of features. This really blows up > quickly. And I have to fight curse of dimensionality with PCA reduction. > That's not cool! > > Can sklearn have a "factor" data type in the future? It would make life so > much easier. > > Thanks a lot! > > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Thu Apr 30 16:12:06 2020 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 30 Apr 2020 22:12:06 +0200 Subject: [scikit-learn] Why does sklearn require one-hot-encoding for categorical features? Can we have a "factor" data type? In-Reply-To: References: Message-ID: <20200430201206.75tl2ohkxo5yerlo@phare.normalesup.org> On Thu, Apr 30, 2020 at 03:55:00PM -0400, C W wrote: > I've used R and Stata software, none needs such?transformation. They have a > data type called "factors", which is different from "numeric". > My problem with OHE: > One-hot-encoding results in large number of features. This really blows up > quickly. And I have to fight curse of dimensionality with PCA reduction. That's > not cool! Most statistical models still not one-hot encoding behind the hood. So, R and stata do it too. Typically, tree-based models can be adapted to work directly on categorical data. Ours don't. It's work in progress. G From paisanohermes at hotmail.com Thu Apr 30 18:15:12 2020 From: paisanohermes at hotmail.com (Hermes Morales) Date: Thu, 30 Apr 2020 22:15:12 +0000 Subject: [scikit-learn] Why does sklearn require one-hot-encoding for categorical features? Can we have a "factor" data type? In-Reply-To: <20200430201206.75tl2ohkxo5yerlo@phare.normalesup.org> References: , <20200430201206.75tl2ohkxo5yerlo@phare.normalesup.org> Message-ID: Perhaps pd.factorize could hello? Obtener Outlook para Android ________________________________ From: scikit-learn on behalf of Gael Varoquaux Sent: Thursday, April 30, 2020 5:12:06 PM To: Scikit-learn mailing list Subject: Re: [scikit-learn] Why does sklearn require one-hot-encoding for categorical features? Can we have a "factor" data type? On Thu, Apr 30, 2020 at 03:55:00PM -0400, C W wrote: > I've used R and Stata software, none needs such transformation. They have a > data type called "factors", which is different from "numeric". > My problem with OHE: > One-hot-encoding results in large number of features. 
This really blows up > quickly. And I have to fight curse of dimensionality with PCA reduction. That's > not cool! Most statistical models still not one-hot encoding behind the hood. So, R and stata do it too. Typically, tree-based models can be adapted to work directly on categorical data. Ours don't. It's work in progress. G _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmail.python.org%2Fmailman%2Flistinfo%2Fscikit-learn&data=02%7C01%7C%7Ce7aa6f99b7914a1f84b208d7ed430801%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637238744453345410&sdata=e3BfHB4v5VFteeZ0Zh3FJ9Wcz9KmkUwur5i8Reue3mc%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From tmrsg11 at gmail.com Thu Apr 30 23:08:44 2020 From: tmrsg11 at gmail.com (C W) Date: Thu, 30 Apr 2020 23:08:44 -0400 Subject: [scikit-learn] Why does sklearn require one-hot-encoding for categorical features? Can we have a "factor" data type? In-Reply-To: References: <20200430201206.75tl2ohkxo5yerlo@phare.normalesup.org> Message-ID: Hermes, That's an interesting function. Does it work with sklearn after factorize? Is there any example? Thanks! On Thu, Apr 30, 2020 at 6:51 PM Hermes Morales wrote: > Perhaps pd.factorize could hello? > > Obtener Outlook para Android > > ------------------------------ > *From:* scikit-learn hotmail.com at python.org> on behalf of Gael Varoquaux < > gael.varoquaux at normalesup.org> > *Sent:* Thursday, April 30, 2020 5:12:06 PM > *To:* Scikit-learn mailing list > *Subject:* Re: [scikit-learn] Why does sklearn require one-hot-encoding > for categorical features? Can we have a "factor" data type? > > On Thu, Apr 30, 2020 at 03:55:00PM -0400, C W wrote: > > I've used R and Stata software, none needs such transformation. They > have a > > data type called "factors", which is different from "numeric". > > > My problem with OHE: > > One-hot-encoding results in large number of features. This really blows > up > > quickly. And I have to fight curse of dimensionality with PCA reduction. > That's > > not cool! > > Most statistical models still not one-hot encoding behind the hood. So, R > and stata do it too. > > Typically, tree-based models can be adapted to work directly on > categorical data. Ours don't. It's work in progress. > > G > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > > https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmail.python.org%2Fmailman%2Flistinfo%2Fscikit-learn&data=02%7C01%7C%7Ce7aa6f99b7914a1f84b208d7ed430801%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637238744453345410&sdata=e3BfHB4v5VFteeZ0Zh3FJ9Wcz9KmkUwur5i8Reue3mc%3D&reserved=0 > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL:
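
Since the thread ends on "Is there any example?", here is a minimal sketch of the pattern being discussed: pd.factorize turns a string column into integer codes before fitting a scikit-learn estimator. The toy DataFrame is made up for illustration, and, per the remarks above, the trees still treat the codes as ordered numbers rather than as true categories.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

df = pd.DataFrame({'color': ['red', 'green', 'blue', 'red', 'green', 'blue'],
                   'size':  [3.1,   2.4,     5.0,    3.3,   2.9,     4.8]})
y = [0, 1, 1, 0, 1, 1]

# one integer-coded column instead of one 0/1 column per level
codes, uniques = pd.factorize(df['color'])
X = df.assign(color=codes)

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.predict(X))
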