From benoit.presles at u-bourgogne.fr Mon Sep 2 04:16:52 2019 From: benoit.presles at u-bourgogne.fr (=?UTF-8?Q?Beno=c3=aet_Presles?=) Date: Mon, 2 Sep 2019 10:16:52 +0200 Subject: [scikit-learn] No convergence warning in logistic regression In-Reply-To: References: <58775d9b-bf80-c0f4-f696-b1470cb37745@u-bourgogne.fr> Message-ID: <5816eb6d-e554-1ddd-cdb6-9cbab8b8c904@u-bourgogne.fr> Hello Sebastian, I have tried with the lbfgs solver and it does not change anything. I do not have any convergence warning. Thanks for your help, Ben Le 30/08/2019 à 18:29, Sebastian Raschka a écrit : > Hi Ben, > > I can recall seeing convergence warnings for scikit-learn's logistic regression model on datasets in the past as well. Which solver did you use for LogisticRegression in sklearn? If you haven't done so, have you used the lbfgs solver? I.e., > > LogisticRegression(..., solver='lbfgs')? > > Best, > Sebastian > >> On Aug 30, 2019, at 9:52 AM, Benoît Presles wrote: >> >> Dear all, >> >> I compared the logistic regression of statsmodels (Logit) with the logistic regression of sklearn (LogisticRegression). As I do not do regularization, I use the fit method with statsmodels and set penalty='none' in sklearn. Most of the time, I have got the same results between the two packages. >> >> However, when data are correlated, it is not the case. In fact, I have got a very useful convergence warning with statsmodels (ConvergenceWarning: Maximum Likelihood optimization failed to converge) that I do not have with sklearn. Is it normal that I do not have any convergence warning with sklearn even if I put verbose=1? I guess sklearn did not converge either. 
>> >> >> Thanks for your help, >> Best regards, >> Ben >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From g.lemaitre58 at gmail.com Mon Sep 2 05:40:12 2019 From: g.lemaitre58 at gmail.com (=?UTF-8?Q?Guillaume_Lema=C3=AEtre?=) Date: Mon, 2 Sep 2019 11:40:12 +0200 Subject: [scikit-learn] No convergence warning in logistic regression In-Reply-To: <5816eb6d-e554-1ddd-cdb6-9cbab8b8c904@u-bourgogne.fr> References: <58775d9b-bf80-c0f4-f696-b1470cb37745@u-bourgogne.fr> <5816eb6d-e554-1ddd-cdb6-9cbab8b8c904@u-bourgogne.fr> Message-ID: LBFGS will raise ConvergenceWarning for sure. You can check the n_iter_ attribute to know if you really converged. On Mon, 2 Sep 2019 at 10:28, Benoît Presles wrote: > Hello Sebastian, > > I have tried with the lbfgs solver and it does not change anything. I do > not have any convergence warning. > > Thanks for your help, > Ben > > > Le 30/08/2019 à 18:29, Sebastian Raschka a écrit : > > Hi Ben, > > > > I can recall seeing convergence warnings for scikit-learn's logistic > regression model on datasets in the past as well. Which solver did you use > for LogisticRegression in sklearn? If you haven't done so, have you used the > lbfgs solver? I.e., > > > > LogisticRegression(..., solver='lbfgs')? > > > > Best, > > Sebastian > > > >> On Aug 30, 2019, at 9:52 AM, Benoît Presles < > benoit.presles at u-bourgogne.fr> wrote: > >> > >> Dear all, > >> > >> I compared the logistic regression of statsmodels (Logit) with the > logistic regression of sklearn (LogisticRegression). As I do not do > regularization, I use the fit method with statsmodels and set > penalty='none' in sklearn. Most of the time, I have got the same results > between the two packages. 
> >> > >> However, when data are correlated, it is not the case. In fact, I have > got a very useful convergence warning with statsmodels (ConvergenceWarning: > Maximum Likelihood optimization failed to converge) that I do not have with > sklearn. Is it normal that I do not have any convergence warning with > sklearn even if I put verbose=1? I guess sklearn did not converge either. > >> > >> > >> Thanks for your help, > >> Best regards, > >> Ben > >> _______________________________________________ > >> scikit-learn mailing list > >> scikit-learn at python.org > >> https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -- Guillaume Lemaitre INRIA Saclay - Parietal team Center for Data Science Paris-Saclay https://glemaitre.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From rth.yurchak at gmail.com Mon Sep 2 09:14:39 2019 From: rth.yurchak at gmail.com (Roman Yurchak) Date: Mon, 2 Sep 2019 15:14:39 +0200 Subject: [scikit-learn] scikit-learn website and documentation In-Reply-To: References: <66ce5be1-ec7d-6819-c8ad-cee8f3914930@gmail.com> <36f3c879-913f-4686-8714-e03a482ce710@normalesup.org> Message-ID: Hello Chiara, as far as I understood scikit-learn#14849 started as an incremental improvement of the scikit-learn website and ended up as a more in-depth rewrite of the sphinx theme. If you have any comments or suggestions don't hesitate to comment on that issue. For instance, that PR went with Bootstrap and I'm wondering about the advantages/limitations with respect to using something like PureCSS. Reviews of that PR would also be very much appreciated. 
-- Roman On 30/08/2019 18:58, Chiara Marmo wrote: > Hello, > > Should I consider this PR [1] as an answer? ;) > > Cheers, > Chiara > > [1] https://github.com/scikit-learn/scikit-learn/pull/14849 > > > On Sat, Aug 24, 2019 at 1:53 PM Chiara Marmo > wrote: > > Hi Nicolas, > > Working on the visuals and contents of the docs is within my skills and > I'm happy to finish the job. > But I'm not a web designer and I don't like to impose myself... :) > > Maybe you can check at the Monday meeting if everybody is ok with > that and write down comments in the minutes? For the next meeting I > will be available for collecting specifications, if any. > > Gaël, I will check purecss.io: how much > customization the basic theme needs also has to be considered. > > CiaoCiao > > Chiara > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > From safiullahmarwat at gmail.com Mon Sep 2 10:06:07 2019 From: safiullahmarwat at gmail.com (Safi Ullah Marwat) Date: Mon, 2 Sep 2019 23:06:07 +0900 Subject: [scikit-learn] Clustering Algorithm based on correlation distance Message-ID: Dear List, Is there any clustering algorithm which is based on the correlation coefficient instead of Euclidean/Manhattan distance? Regards -------------- next part -------------- An HTML attachment was scrubbed... URL: From t3kcit at gmail.com Tue Sep 3 13:40:38 2019 From: t3kcit at gmail.com (Andreas Mueller) Date: Tue, 3 Sep 2019 13:40:38 -0400 Subject: [scikit-learn] No convergence warning in logistic regression In-Reply-To: References: <58775d9b-bf80-c0f4-f696-b1470cb37745@u-bourgogne.fr> <5816eb6d-e554-1ddd-cdb6-9cbab8b8c904@u-bourgogne.fr> Message-ID: <8ce4c72a-0f03-993c-d33c-384f38e9d2d5@gmail.com> Having correlated data is not the same as not converging. We could warn on correlated data but I don't think that's actually useful for scikit-learn. 
I actually recently argued to remove the warning in linear discriminant analysis: https://github.com/scikit-learn/scikit-learn/issues/14361 As argued in many places, we're not a stats library and as long as there's a well-defined solution, there's no reason to warn. LogisticRegression will give you the solution with minimum coefficient norm if there's multiple solutions. On 9/2/19 5:40 AM, Guillaume Lemaître wrote: > LBFGS will raise ConvergenceWarning for sure. You can check the > n_iter_ attribute to know if you really converged. > > On Mon, 2 Sep 2019 at 10:28, Benoît Presles > > > wrote: > > Hello Sebastian, > > I have tried with the lbfgs solver and it does not change > anything. I do > not have any convergence warning. > > Thanks for your help, > Ben > > > Le 30/08/2019 à 18:29, Sebastian Raschka a écrit : > > Hi Ben, > > > > I can recall seeing convergence warnings for scikit-learn's > logistic regression model on datasets in the past as well. Which > solver did you use for LogisticRegression in sklearn? If you > haven't done so, have you used the lbfgs solver? I.e., > > > > LogisticRegression(..., solver='lbfgs')? > > > > Best, > > Sebastian > > > >> On Aug 30, 2019, at 9:52 AM, Benoît Presles > > wrote: > >> > >> Dear all, > >> > >> I compared the logistic regression of statsmodels (Logit) with > the logistic regression of sklearn (LogisticRegression). As I do > not do regularization, I use the fit method with statsmodels and > set penalty='none' in sklearn. Most of the time, I have got the > same results between the two packages. > >> > >> However, when data are correlated, it is not the case. In fact, > I have got a very useful convergence warning with statsmodels > (ConvergenceWarning: > Maximum Likelihood optimization failed to > converge) that I do not have with > sklearn. Is it normal that I do > not have any convergence warning with > sklearn even if I put > verbose=1? I guess sklearn did not converge either. 
> >> > >> > >> Thanks for your help, > >> Best regards, > >> Ben > >> _______________________________________________ > >> scikit-learn mailing list > >> scikit-learn at python.org > >> https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > > -- > Guillaume Lemaitre > INRIA Saclay - Parietal team > Center for Data Science Paris-Saclay > https://glemaitre.github.io/ > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From t3kcit at gmail.com Tue Sep 3 13:41:08 2019 From: t3kcit at gmail.com (Andreas Mueller) Date: Tue, 3 Sep 2019 13:41:08 -0400 Subject: [scikit-learn] Clustering Algorithm based on correlation distance In-Reply-To: References: Message-ID: <2faad0de-9bc3-54bc-ff8f-56000f319d38@gmail.com> There are many that allow "metric='precomputed'". On 9/2/19 10:06 AM, Safi Ullah Marwat wrote: > Dear List, > Is there any clustering algorithm, which is based on correlation > coefficient instead of Euclidean/Manhattan distance? > > Regards > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... 
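Andreas's `metric='precomputed'` pointer can be sketched as follows with DBSCAN, the estimator discussed later in this thread. This is a hedged illustration, not an official recipe: the toy series and the `eps`/`min_samples` values are made up for the example, and using `1 - corrcoef` as the distance is an assumption (it treats anti-correlated series as maximally far apart).

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.RandomState(0)
base = rng.randn(100)
# Four toy series to cluster by correlation rather than Euclidean distance.
X = np.vstack([
    base + 0.1 * rng.randn(100),   # strongly correlated with `base`
    base + 0.1 * rng.randn(100),   # strongly correlated with `base`
    -base + 0.1 * rng.randn(100),  # anti-correlated with `base`
    rng.randn(100),                # uncorrelated
])

# Correlation distance: 0 for perfectly correlated rows, up to 2 for
# perfectly anti-correlated ones.
D = 1.0 - np.corrcoef(X)

# Any estimator accepting metric='precomputed' can consume D directly.
labels = DBSCAN(eps=0.1, min_samples=2, metric='precomputed').fit_predict(D)
print(labels)  # rows 0 and 1 cluster together; rows 2 and 3 are noise (-1)
```

If anti-correlated series should also count as "close", `1 - np.abs(np.corrcoef(X))` is a common alternative; agglomerative clustering with a non-ward linkage accepts a precomputed matrix in the same way.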
URL: From t3kcit at gmail.com Tue Sep 3 13:46:44 2019 From: t3kcit at gmail.com (Andreas Mueller) Date: Tue, 3 Sep 2019 13:46:44 -0400 Subject: [scikit-learn] scikit-learn Digest, Vol 41, Issue 21 In-Reply-To: References: Message-ID: <6f2a593f-fac8-edf5-6b69-a0699248a493@gmail.com> https://scikit-learn.org/stable/developers/contributing.html#contributing On 8/26/19 1:09 PM, Mike Smith wrote: > Hi, > > I have been scouring around everywhere to volunteer. I took a one > month python course from a training company that promised me a job in > two months but they're still working on it after 3. So I decide to > volunteer. I'm looking to use python with DS, ML, AI, etc, I love > neural nets, then it hit me that I get the scikit mailing list and > opened it up and you guys are talking about volunteers. I would love > to volunteer for scikit. But I just have one month training in python. > I have prior experience with java and javascript, some computer > science education, How can I start volunteering? > > On Mon, Aug 26, 2019 at 9:03 AM > wrote: > > Send scikit-learn mailing list submissions to > scikit-learn at python.org > > To subscribe or unsubscribe via the World Wide Web, visit > https://mail.python.org/mailman/listinfo/scikit-learn > or, via email, send a message with subject or body 'help' to > scikit-learn-request at python.org > > > You can reach the person managing the list at > scikit-learn-owner at python.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of scikit-learn digest..." > > > Today's Topics: > > ? ?1. Re: Monthly meetings between core developers + "Hello World" > ? ? ? (Nicolas Hug) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Mon, 26 Aug 2019 08:54:21 -0400 > From: Nicolas Hug > > To: scikit-learn at python.org > Subject: Re: [scikit-learn] Monthly meetings between core developers + > ? ? ? ? 
"Hello World" > Message-ID: <136faf1a-5514-1c21-7514-0673b4ddde81 at gmail.com > > > Content-Type: text/plain; charset="utf-8"; Format="flowed" > > Meeting is in 5 minutes everyone! Prepare to be np.random.choice'd? :) > > https://appear.in/amueller > > > > On 8/22/19 10:11 AM, Nicolas Hug wrote: > > > > Hi Everyone, > > > > Quick reminder that the next meeting is on Monday! *Please > update your > > cards on the project board* so we can all have a look before the > week-end. > > > > We decided to go for a "scrum-like" approach this time: quickly go > > through everyone's notes first, then discuss main issues. > > > > Anyone interested in hosting? I think we should have a new > person each > > time, or you'll soon be fed up with me. If nobody speaks up I'll > > np.random.choice someone on Monday ;) > > > > ---- > > > > Time and date: > > > https://www.timeanddate.com/worldclock/meetingdetails.html?year=2019&month=8&day=26&hour=13&min=0&sec=0&p1=240&p2=33&p3=37&p4=179 > > > > Project board: > > https://github.com/scikit-learn/scikit-learn/projects/15 > > > > > > > Meeting link: https://appear.in/amueller > > > > > > > > > See you on Monday! > > > > Nicolas > > > > > > On 8/5/19 10:31 AM, Andreas Mueller wrote: > >> As usual, I agree ;) > >> I think it would be good to call out particularly important > bugfixes > >> so they get reviews. > >> We might also want to think about how we can organize the issue > >> tracker better. > >> > >> Having more full-time people on the project certainly means more > >> activity but ideally we can use some of that time to make the > issue > >> tracker more organized. > >> > >> > >> On 8/5/19 9:21 AM, Joel Nothman wrote: > >>> Yay for technology!?Awesome to see you all and have some matters > >>> clarified. > >>> > >>> Adrin is right that the issue tracker is increasingly > overwhelming > >>> (because there are more awesome people hired to work on the > project, > >>> more frequent sprints, etc). This meeting is a useful summary. 
> >>> > >>> The meeting mostly focussed on big features. We should be > careful to > >>> not leave behind important bugs fixes and work originating > outside > >>> the core devs. > >>> > >>> Despite that: Some of Guillaume's activities got cut off. I > think it > >>> would be great to progress both on stacking and resampling before > >>> the next release. > >>> > >>> I also think these meetings should, as a standing item, note the > >>> estimated upcoming release schedule, to help us remain aware > of that > >>> cadence. > >>> > >>> Good night! > >>> > >>> J > >>> > >>> _______________________________________________ > >>> scikit-learn mailing list > >>> scikit-learn at python.org > >>> https://mail.python.org/mailman/listinfo/scikit-learn > >> > >> > >> _______________________________________________ > >> scikit-learn mailing list > >> scikit-learn at python.org > >> https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: > > > ------------------------------ > > Subject: Digest Footer > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > ------------------------------ > > End of scikit-learn Digest, Vol 41, Issue 21 > ******************************************** > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rs2715 at stern.nyu.edu Tue Sep 3 21:08:26 2019 From: rs2715 at stern.nyu.edu (Reshama Shaikh) Date: Tue, 3 Sep 2019 21:08:26 -0400 Subject: [scikit-learn] WiMLDS scikit-learn sprints Message-ID: Hello, I'm currently working on organizing the 3rd WiMLDS scikit-learn sprint for 2019, this last one is in San Francisco (SF). Someone suggested it would be a good idea to share information about those sprints with this community. 
Repo for latest sprint in NYC: https://github.com/WiMLDS/nyc-2019-scikit-sprint Website for upcoming sprint in SF: https://sites.google.com/view/bay-area-wimlds-2019-sprint/home RELATED ARTICLES * [About WiMLDS open source sprints](http://wimlds.org/opensourcesprints-2/) (Reshama Shaikh) * [Nairobi WiMLDS 2019 Sprint Impact Report]( https://reshamas.github.io/nairobi-wimlds-2019-scikit-learn-sprint-impact-report/) (Reshama Shaikh) * [Scikit-learn Sprint at Nairobi, Kenya]( https://adrin.info/scikit-learn-sprint-at-nairobi-kenya.html) (Adrin Jalali) * [Highlights from the 2019 Nairobi WiMLDS Scikit-learn Sprint]( https://medium.com/@mariamhaji01/highlights-from-the-2019-nairobi-wimlds-scikit-sprint-889de3b20215) (Mariam Haji) * [NYC WiMLDS: 2017-2018 Sprint Impact Report]( https://reshamas.github.io/impact-report-for-wimlds-scikit-learn-sprints/) (Reshama Shaikh) * [Highlights from 2018 WiMLDS NYC / Scikit Sprint]( https://reshamas.github.io/highlights-from-the-2018-NYC-WiMLDS-scikit-sprint/) (Reshama Shaikh) * [Interview with Andreas Mueller, Core Contributor to Scikit-Learn]( http://mlconf.com/interview-andreas-muller-lecturer-columbia-university-core-contributor-scikit-learn-reshama-shaikh/) (Reshama Shaikh) Best, Reshama --------------------------------------- Reshama Shaikh Blog | Twitter | LinkedIn | Instagram | GitHub NYC WiMLDS Co-organizer WiMLDS Board Member NYC WiMLDS NYC PyLadies --------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: From safiullahmarwat at gmail.com Wed Sep 4 00:41:09 2019 From: safiullahmarwat at gmail.com (Safi Ullah Marwat) Date: Wed, 4 Sep 2019 13:41:09 +0900 Subject: [scikit-learn] Clustering Algorithm based on correlation distance In-Reply-To: <2faad0de-9bc3-54bc-ff8f-56000f319d38@gmail.com> References: <2faad0de-9bc3-54bc-ff8f-56000f319d38@gmail.com> Message-ID: Thank you Mr.Mueller Can you share any example sentence? 
I searched but found this link https://stackoverflow.com/questions/24560799/how-to-use-a-precomputed-distance-matrix-in-scikit-kmeans which says one cannot supply a precomputed distance matrix. The precomputed distances that k-means calculates are only a speed optimization, and those are still based on Euclidean distance. Thanks in advance On Wed, Sep 4, 2019 at 2:41 AM Andreas Mueller wrote: > There are many that allow "metric='precomputed'". > > > On 9/2/19 10:06 AM, Safi Ullah Marwat wrote: > > Dear List, > Is there any clustering algorithm, which is based on correlation > coefficient instead of Euclidean/Manhattan distance? > > Regards > > _______________________________________________ > scikit-learn mailing listscikit-learn at python.orghttps://mail.python.org/mailman/listinfo/scikit-learn > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From christian.braune79 at gmail.com Wed Sep 4 00:57:44 2019 From: christian.braune79 at gmail.com (Christian Braune) Date: Wed, 4 Sep 2019 06:57:44 +0200 Subject: [scikit-learn] Clustering Algorithm based on correlation distance In-Reply-To: References: <2faad0de-9bc3-54bc-ff8f-56000f319d38@gmail.com> Message-ID: Using correlation as a similarity measure leads to some problems with k-means (mainly because the arithmetic mean is not at all an estimator that can be used with correlation). If you properly normalize the correlation, DBSCAN might be an alternative. The minpts parameter will still have the same meaning; the eps will state the maximal allowed difference in correlation (somewhat dubious meaning...) that points may have when calculating the neighborhoods of points. But be aware that points belonging to the same cluster (in DBSCAN) might be completely uncorrelated in the end. Safi Ullah Marwat wrote on Wed., 4 Sep. 
2019, 06:42: > Thank you Mr.Mueller > Can you share any example sentence? I searched but found this link > https://stackoverflow.com/questions/24560799/how-to-use-a-precomputed-distance-matrix-in-scikit-kmeans which > says one cannot supply precomputed distance matric. the one kmean calculate > precomputed matric that's for speed purpose, but that's too based on > euclidean distance. > thanks in advance > > On Wed, Sep 4, 2019 at 2:41 AM Andreas Mueller wrote: > >> There are many that allow "metric='precomputed'". >> >> >> On 9/2/19 10:06 AM, Safi Ullah Marwat wrote: >> >> Dear List, >> Is there any clustering algorithm, which is based on correlation >> coefficient instead of Euclidean/Manhattan distance? >> >> Regards >> >> _______________________________________________ >> scikit-learn mailing listscikit-learn at python.orghttps://mail.python.org/mailman/listinfo/scikit-learn >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From marmochiaskl at gmail.com Wed Sep 4 05:22:03 2019 From: marmochiaskl at gmail.com (Chiara Marmo) Date: Wed, 4 Sep 2019 11:22:03 +0200 Subject: [scikit-learn] scikit-learn website and documentation In-Reply-To: References: <66ce5be1-ec7d-6819-c8ad-cee8f3914930@gmail.com> <36f3c879-913f-4686-8714-e03a482ce710@normalesup.org> Message-ID: Hello Roman, thanks for your answer. Much appreciated. Cheers, Chiara On Mon, Sep 2, 2019 at 3:16 PM Roman Yurchak wrote: > Hello Chiara, > > as far as I understood scikit-learn#14849 started as an incremental > improvement of the scikit-learn website and ended up as a more in depth > rewrite of the sphinx theme. 
> > If you have any comments or suggestions don't hesitate to comment on > that issue. For instance, that PR went with Boostrap and I'm wondering > about be the advantages/limitations with respect to using something like > PureCSS. > > Reviews of that PR would also be very much appreciated. > > -- > Roman > > On 30/08/2019 18:58, Chiara Marmo wrote: > > Hello, > > > > Should I consider this PR [1] as an answer? ;) > > > > Cheers, > > Chiara > > > > [1] https://github.com/scikit-learn/scikit-learn/pull/14849 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From adrin.jalali at gmail.com Sun Sep 8 13:48:47 2019 From: adrin.jalali at gmail.com (Adrin) Date: Sun, 8 Sep 2019 19:48:47 +0200 Subject: [scikit-learn] Outreachy program Message-ID: Hi, During EuroScipy, we had a few discussions regarding diversity in open source in general, and one of the ways some projects have tried to improve that has been through participation in the Outreachy program (https://www.outreachy.org/). I'd be happy to mentor somebody if they apply. Would that be okay if we apply? The deadline has just passed, but if they're flexible, we may be able to still apply. Thanks, Adrin. -------------- next part -------------- An HTML attachment was scrubbed... URL: From joel.nothman at gmail.com Sun Sep 8 21:14:11 2019 From: joel.nothman at gmail.com (Joel Nothman) Date: Mon, 9 Sep 2019 11:14:11 +1000 Subject: [scikit-learn] Outreachy program In-Reply-To: References: Message-ID: I'm broadly supportive, but just wanted to note our challenges with mentoring GSoC in the past: - Limited mentor availability should not be a big issue now. - Need to focus on a single project may not be well aligned with Scikit-learn's goals, or may not yield optimal code results. - Reviewers may feel compelled to expedite the merge of materials not clearly up to standard or useful. - Needs to be an investment in someone who would continue involvement with the project. 
In this case it's not clear whether having ongoing involvement is as essential an outcome for the project to be worthwhile. Given the relatively large base of funded contributors / core devs at the moment, there may be a challenge finding projects with low assumed knowledge, at least if they involve code. J On Mon, 9 Sep 2019 at 03:50, Adrin wrote: > Hi, > > During EuroScipy, we had a few discussions regarding diversity in open > source in general, and > one of the ways some projects have tried to improve that has been through > participation in the > Outreachy program (https://www.outreachy.org/). I'd be happy to mentor > somebody if they apply. > > Would that be okay if we apply? The deadline has just passed, but if > they're flexible, we may > be able to still apply. > > Thanks, > Adrin. > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sim4n6 at gmail.com Mon Sep 9 06:44:34 2019 From: sim4n6 at gmail.com (Sim a) Date: Mon, 9 Sep 2019 11:44:34 +0100 Subject: [scikit-learn] scikit-learn website and documentation In-Reply-To: References: <66ce5be1-ec7d-6819-c8ad-cee8f3914930@gmail.com> <36f3c879-913f-4686-8714-e03a482ce710@normalesup.org> Message-ID: Hi there, I hope I am not intruding ...but the mock-up website https://cmarmo.github.io/mockup-skl/ has a little unusual effect while scrolling on Firefox 69.0. Please check the attached screen capture. On Wed, Sep 4, 2019 at 10:23 AM Chiara Marmo wrote: > Hello Roman, > > thanks for your answer. > Much appreciated. > > Cheers, > Chiara > > On Mon, Sep 2, 2019 at 3:16 PM Roman Yurchak > wrote: > >> Hello Chiara, >> >> as far as I understood scikit-learn#14849 started as an incremental >> improvement of the scikit-learn website and ended up as a more in depth >> rewrite of the sphinx theme. 
>> >> If you have any comments or suggestions don't hesitate to comment on >> that issue. For instance, that PR went with Bootstrap and I'm wondering >> about the advantages/limitations with respect to using something like >> PureCSS. >> >> Reviews of that PR would also be very much appreciated. >> >> -- >> Roman >> >> On 30/08/2019 18:58, Chiara Marmo wrote: >> > Hello, >> > >> > Should I consider this PR [1] as an answer? ;) >> > >> > Cheers, >> > Chiara >> > >> > [1] https://github.com/scikit-learn/scikit-learn/pull/14849 >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaquesgrobler at gmail.com Mon Sep 9 07:02:55 2019 From: jaquesgrobler at gmail.com (Jaques Grobler) Date: Mon, 9 Sep 2019 13:02:55 +0200 Subject: [scikit-learn] scikit-learn website and documentation In-Reply-To: References: <66ce5be1-ec7d-6819-c8ad-cee8f3914930@gmail.com> <36f3c879-913f-4686-8714-e03a482ce710@normalesup.org> Message-ID: @Sim - I can reproduce this on Chrome too ... It happens for narrow viewports where there is no gutter around the main content. Up to a width of 1280px, the sidebar behaves, I assume, correctly - as it does with mobile view - opening up from the hamburger-menu over the content. For super-wide screens, the sidebar lands in the left-gutter on scrolling, and doesn't interfere, but in between the sidebar will appear over the content as in Sim's message, as the gutter isn't there anymore. One can quick-fix this by just making the problem-media-width behave like that of the mobile/ipad widths - else one needs to look at the position and flex configuration of the content vs. 
the sidebar, to maybe make the sidebar *push* the content to the right when open (if there is no gutter). Just my two cents - Looks cool beyond the glitch :) El lun., 9 de sep. de 2019 a la(s) 12:32, Sim a (sim4n6 at gmail.com) escribi?: > Hi there, > > I hope I am not intruding ...but the mock-up website > https://cmarmo.github.io/mockup-skl/ > has a little unusual effect while scrolling on Firefox 69.0. Please check > the attached screen capture. > > On Wed, Sep 4, 2019 at 10:23 AM Chiara Marmo > wrote: > >> Hello Roman, >> >> thanks for your answer. >> Much appreciated. >> >> Cheers, >> Chiara >> >> On Mon, Sep 2, 2019 at 3:16 PM Roman Yurchak >> wrote: >> >>> Hello Chiara, >>> >>> as far as I understood scikit-learn#14849 started as an incremental >>> improvement of the scikit-learn website and ended up as a more in depth >>> rewrite of the sphinx theme. >>> >>> If you have any comments or suggestions don't hesitate to comment on >>> that issue. For instance, that PR went with Boostrap and I'm wondering >>> about be the advantages/limitations with respect to using something like >>> PureCSS. >>> >>> Reviews of that PR would also be very much appreciated. >>> >>> -- >>> Roman >>> >>> On 30/08/2019 18:58, Chiara Marmo wrote: >>> > Hello, >>> > >>> > Should I consider this PR [1] as an answer? ;) >>> > >>> > Cheers, >>> > Chiara >>> > >>> > [1] https://github.com/scikit-learn/scikit-learn/pull/14849 >>> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jaquesgrobler at gmail.com Mon Sep 9 07:08:31 2019 From: jaquesgrobler at gmail.com (Jaques Grobler) Date: Mon, 9 Sep 2019 13:08:31 +0200 Subject: [scikit-learn] scikit-learn website and documentation In-Reply-To: References: <66ce5be1-ec7d-6819-c8ad-cee8f3914930@gmail.com> <36f3c879-913f-4686-8714-e03a482ce710@normalesup.org> Message-ID: Sorry to spam - here's a little gif to show the behaviour and problem area: [image: responsive-sidenav.gif] One would need to decide on what the desktop behavior of the sideNav will be. ipad/mobile is fine IMHO. Hope this helps :) El lun., 9 de sep. de 2019 a la(s) 13:02, Jaques Grobler ( jaquesgrobler at gmail.com) escribi?: > @Sim - I can reproduce this on Chrome too ... It happens for narrow > viewports where there is no gutter around the main content. > > Up to a width of 1280px, the sidebar behaves, I assume, correctly - as it > does with mobile view - opening up from the hamburger-menu over the content. > For super-wide screens, the sidebar lands in the left-gutter on scrolling, > and doesn't interfere, > but inbetween the sidebar will appear over the content as in Sim's > message, as the gutter isn't there anymore. > > One can quick-fix this my just making the problem-media-width behave like > that of the mobile/ipad widths - > else one needs to look at the position and flex configuration of the > content vs. the sidebar, to maybe make the sidebar *push* the content to > the right when open (if there is no gutter). > > Just my two cents - > Looks cool beyond the glitch :) > > El lun., 9 de sep. de 2019 a la(s) 12:32, Sim a (sim4n6 at gmail.com) > escribi?: > >> Hi there, >> >> I hope I am not intruding ...but the mock-up website >> https://cmarmo.github.io/mockup-skl/ >> has a little unusual effect while scrolling on Firefox 69.0. Please check >> the attached screen capture. >> >> On Wed, Sep 4, 2019 at 10:23 AM Chiara Marmo >> wrote: >> >>> Hello Roman, >>> >>> thanks for your answer. >>> Much appreciated. 
>>> >>> Cheers, >>> Chiara >>> >>> On Mon, Sep 2, 2019 at 3:16 PM Roman Yurchak >>> wrote: >>> >>>> Hello Chiara, >>>> >>>> as far as I understood scikit-learn#14849 started as an incremental >>>> improvement of the scikit-learn website and ended up as a more in depth >>>> rewrite of the sphinx theme. >>>> >>>> If you have any comments or suggestions don't hesitate to comment on >>>> that issue. For instance, that PR went with Boostrap and I'm wondering >>>> about be the advantages/limitations with respect to using something >>>> like >>>> PureCSS. >>>> >>>> Reviews of that PR would also be very much appreciated. >>>> >>>> -- >>>> Roman >>>> >>>> On 30/08/2019 18:58, Chiara Marmo wrote: >>>> > Hello, >>>> > >>>> > Should I consider this PR [1] as an answer? ;) >>>> > >>>> > Cheers, >>>> > Chiara >>>> > >>>> > [1] https://github.com/scikit-learn/scikit-learn/pull/14849 >>>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: responsive-sidenav.gif Type: image/gif Size: 1602229 bytes Desc: not available URL: From niourf at gmail.com Mon Sep 9 11:34:05 2019 From: niourf at gmail.com (Nicolas Hug) Date: Mon, 9 Sep 2019 11:34:05 -0400 Subject: [scikit-learn] scikit-learn website and documentation In-Reply-To: References: <66ce5be1-ec7d-6819-c8ad-cee8f3914930@gmail.com> <36f3c879-913f-4686-8714-e03a482ce710@normalesup.org> Message-ID: <11cc438d-b42b-51d8-d171-68ed1625b10b@gmail.com> Hi Jacques and Sim, Thanks a lot for you input. 
As previously mentionned though, we will be moving forward with https://github.com/scikit-learn/scikit-learn/pull/14849 instead of the original proposal. Any feedback on this PR would be greatly appreciated too! Nicolas On 9/9/19 7:08 AM, Jaques Grobler wrote: > Sorry to spam - > here's a little gif to show the behaviour and problem area: > > responsive-sidenav.gif > > One would need to decide on what the desktop behavior of the sideNav > will be. ipad/mobile is fine IMHO. > > Hope this?helps :) > > El lun., 9 de sep. de 2019 a la(s) 13:02, Jaques Grobler > (jaquesgrobler at gmail.com ) escribi?: > > @Sim -?I can reproduce this on Chrome too ... It happens for > narrow viewports where there is no gutter around the main content. > > Up to a width of 1280px, the sidebar behaves, I assume, correctly > - as it does with mobile view - opening up from the hamburger-menu > over the content. > For super-wide screens, the sidebar lands in the left-gutter on > scrolling, and doesn't interfere, > but inbetween the sidebar will appear over the content as in Sim's > message, as the gutter isn't there anymore. > > One can quick-fix this my just making the problem-media-width > behave like that of the mobile/ipad widths - > else one needs to look at the position and flex configuration of > the content vs. the sidebar, to maybe make the sidebar /push/?the > content to the right when open (if there is no gutter). > > Just my two cents - > Looks cool beyond the glitch :) > > El lun., 9 de sep. de 2019 a la(s) 12:32, Sim a (sim4n6 at gmail.com > ) escribi?: > > Hi there, > > I hope I am not intruding ...but the mock-up website > https://cmarmo.github.io/mockup-skl/ > has a little unusual effect while scrolling on Firefox 69.0. > Please check the attached screen capture. > > On Wed, Sep 4, 2019 at 10:23 AM Chiara Marmo > > wrote: > > Hello Roman, > > thanks for your answer. > Much appreciated. 
> > Cheers, > Chiara > > On Mon, Sep 2, 2019 at 3:16 PM Roman Yurchak > > wrote: > > Hello Chiara, > > as far as I understood scikit-learn#14849 started as > an incremental > improvement of the scikit-learn website and ended up > as a more in depth > rewrite of the sphinx theme. > > If you have any comments or suggestions don't hesitate > to comment on > that issue. For instance, that PR went with Boostrap > and I'm wondering > about be the advantages/limitations with respect to > using something like > PureCSS. > > Reviews of that PR would also be very much appreciated. > > -- > Roman > > On 30/08/2019 18:58, Chiara Marmo wrote: > > Hello, > > > > Should I consider this PR [1] as an answer? ;) > > > > Cheers, > > Chiara > > > > [1] > https://github.com/scikit-learn/scikit-learn/pull/14849 > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: responsive-sidenav.gif Type: image/gif Size: 1602229 bytes Desc: not available URL: From fad469 at uregina.ca Mon Sep 9 12:56:11 2019 From: fad469 at uregina.ca (Farzana Anowar) Date: Mon, 09 Sep 2019 10:56:11 -0600 Subject: [scikit-learn] Questions about partial_fit and the Incremental library in Sci-kit learn Message-ID: <7688ff31b2cbadbc77fa94dd6dfc31d4@uregina.ca> Hello Sir/Madam, I subscribed to the link you sent me. I am posting my question again: This Is Farzana Anowar, a Ph.D. candidate in University of Regina. 
Currently, I'm working to develop a model that learns incrementally from non-stationary data. I have come across an Incremental library in scikit-learn that allows doing that using partial_fit. I have searched a lot for detailed information about this 'incremental' library and 'partial_fit'; however, I couldn't find any. It would be great if you could provide me with some detailed information about how these two actually work. For example, if we take SGD as a classifier, the incremental library will allow me to pass chunks/batches of data. My question is: does this incremental library train (using partial_fit) on the whole batch at a time and then produce a classification performance, or does it take a batch and train on each instance of the batch one at a time? Thanks in advance! -- Regards, Farzana Anowar From dbsullivan23 at gmail.com Mon Sep 9 14:12:55 2019 From: dbsullivan23 at gmail.com (Daniel Sullivan) Date: Mon, 9 Sep 2019 13:12:55 -0500 Subject: [scikit-learn] Questions about partial_fit and the Incremental library in Sci-kit learn In-Reply-To: <7688ff31b2cbadbc77fa94dd6dfc31d4@uregina.ca> References: <7688ff31b2cbadbc77fa94dd6dfc31d4@uregina.ca> Message-ID: Hi Farzana, If I understand your question correctly, you're asking how the SGD classifier works incrementally? The SGD algorithm maintains a single set of weights and iterates through all data points in a batch one at a time, adjusting its weights on each iteration. So to answer your question: it trains on each instance, not on the batch. However, the algorithm can iterate multiple times through a single batch. Let me know if that answers your question. Best, Danny On Mon, Sep 9, 2019 at 11:56 AM Farzana Anowar wrote: > Hello Sir/Madam, > > I subscribed to the link you sent me. > > > I am posting my question again: > > This Is Farzana Anowar, a Ph.D. candidate in University of Regina. > Currently, I'm working to develop a model that learns incrementally from > non-stationary data.
I have come across an Incremental library in > sci-kit learn that actually allows to do that using partial_fit. I have > searched a lot for the detailed information about this 'incremental' > library and 'partial_fit', however, I couldn't find any. > > It would be great if you could provide me with some detailed information > about these two regarding how they actually work. For example, If we > take SGD as a classifier, the incremental library will allow me to take > chunks/batches of data. My question is: Do this incremental library > train (using parial_fit) the whole batch at a time and then produce a > classification performance or it takes a batch and trains each instance > at a time from the batch. > > Thanks in advance! > > -- > Regards, > > Farzana Anowar > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fad469 at uregina.ca Mon Sep 9 14:27:22 2019 From: fad469 at uregina.ca (Farzana Anowar) Date: Mon, 09 Sep 2019 12:27:22 -0600 Subject: [scikit-learn] Questions about partial_fit and the Incremental library in Sci-kit learn In-Reply-To: References: <7688ff31b2cbadbc77fa94dd6dfc31d4@uregina.ca> Message-ID: <752fe9919f044d83a589f19720e2ce08@uregina.ca> On 2019-09-09 12:12, Daniel Sullivan wrote: > Hi Farzana, > > If I understand your question correctly you're asking how the SGD > classifier works incrementally? The SGD algorithm maintains a single > set of weights and iterates through all data points one at a time in a > batch. It adjusts its weights on each iteration. So to answer your > question, it trains on each instance, not on the batch. However, the > algorithm can iterate multiple times through a single batch. Let me > know if that answers your question. 
> > Best, > > Danny > > On Mon, Sep 9, 2019 at 11:56 AM Farzana Anowar > wrote: > >> Hello Sir/Madam, >> >> I subscribed to the link you sent me. >> >> I am posting my question again: >> >> This Is Farzana Anowar, a Ph.D. candidate in University of Regina. >> Currently, I'm working to develop a model that learns incrementally >> from >> non-stationary data. I have come across an Incremental library in >> sci-kit learn that actually allows to do that using partial_fit. I >> have >> searched a lot for the detailed information about this 'incremental' >> >> library and 'partial_fit', however, I couldn't find any. >> >> It would be great if you could provide me with some detailed >> information >> about these two regarding how they actually work. For example, If we >> >> take SGD as a classifier, the incremental library will allow me to >> take >> chunks/batches of data. My question is: Do this incremental library >> train (using parial_fit) the whole batch at a time and then produce >> a >> classification performance or it takes a batch and trains each >> instance >> at a time from the batch. >> >> Thanks in advance! 
>> >> -- >> Regards, >> >> Farzana Anowar >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -- Regards, Farzana Anowar From fad469 at uregina.ca Mon Sep 9 14:32:04 2019 From: fad469 at uregina.ca (Farzana Anowar) Date: Mon, 09 Sep 2019 12:32:04 -0600 Subject: [scikit-learn] Questions about partial_fit and the Incremental library in Sci-kit learn In-Reply-To: References: <7688ff31b2cbadbc77fa94dd6dfc31d4@uregina.ca> Message-ID: <1c94dd6cee74a47b98582e25ae5eeef3@uregina.ca> On 2019-09-09 12:12, Daniel Sullivan wrote: > Hi Farzana, > > If I understand your question correctly you're asking how the SGD > classifier works incrementally? The SGD algorithm maintains a single > set of weights and iterates through all data points one at a time in a > batch. It adjusts its weights on each iteration. So to answer your > question, it trains on each instance, not on the batch. However, the > algorithm can iterate multiple times through a single batch. Let me > know if that answers your question. > > Best, > > Danny > > On Mon, Sep 9, 2019 at 11:56 AM Farzana Anowar > wrote: > >> Hello Sir/Madam, >> >> I subscribed to the link you sent me. >> >> I am posting my question again: >> >> This Is Farzana Anowar, a Ph.D. candidate in University of Regina. >> Currently, I'm working to develop a model that learns incrementally >> from >> non-stationary data. I have come across an Incremental library in >> sci-kit learn that actually allows to do that using partial_fit. I >> have >> searched a lot for the detailed information about this 'incremental' >> >> library and 'partial_fit', however, I couldn't find any. 
>> >> It would be great if you could provide me with some detailed >> information >> about these two regarding how they actually work. For example, If we >> >> take SGD as a classifier, the incremental library will allow me to >> take >> chunks/batches of data. My question is: Do this incremental library >> train (using parial_fit) the whole batch at a time and then produce >> a >> classification performance or it takes a batch and trains each >> instance >> at a time from the batch. >> >> Thanks in advance! >> >> -- >> Regards, >> >> Farzana Anowar >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn Hello Daniel, Thank you so much! I think your clarification makes sense. So, whatever batches I am passing through the classifier it will train each instance through a single batch. I was just wondering if you could give me some information about partial_fit. Just for your reference, I was having a look at this code. https://dask-ml.readthedocs.io/en/latest/incremental.html Thanks! -- Regards, Farzana Anowar From dbsullivan23 at gmail.com Mon Sep 9 14:54:59 2019 From: dbsullivan23 at gmail.com (Daniel Sullivan) Date: Mon, 9 Sep 2019 13:54:59 -0500 Subject: [scikit-learn] Questions about partial_fit and the Incremental library in Sci-kit learn In-Reply-To: <1c94dd6cee74a47b98582e25ae5eeef3@uregina.ca> References: <7688ff31b2cbadbc77fa94dd6dfc31d4@uregina.ca> <1c94dd6cee74a47b98582e25ae5eeef3@uregina.ca> Message-ID: Hi Farzana, Do you have a specific question about partial_fit? Essentially it works the same as the fit method, but the weights are preserved between calls. 
Within the partial fit and fit methods, the model makes an estimate based on the single data point and adjusts the weights proportionally based on the difference between the estimate and the target. How much the weights are changed depends on the loss function and learning rate you specify. On Mon, Sep 9, 2019 at 1:32 PM Farzana Anowar wrote: > On 2019-09-09 12:12, Daniel Sullivan wrote: > > Hi Farzana, > > > > If I understand your question correctly you're asking how the SGD > > classifier works incrementally? The SGD algorithm maintains a single > > set of weights and iterates through all data points one at a time in a > > batch. It adjusts its weights on each iteration. So to answer your > > question, it trains on each instance, not on the batch. However, the > > algorithm can iterate multiple times through a single batch. Let me > > know if that answers your question. > > > > Best, > > > > Danny > > > > On Mon, Sep 9, 2019 at 11:56 AM Farzana Anowar > > wrote: > > > >> Hello Sir/Madam, > >> > >> I subscribed to the link you sent me. > >> > >> I am posting my question again: > >> > >> This Is Farzana Anowar, a Ph.D. candidate in University of Regina. > >> Currently, I'm working to develop a model that learns incrementally > >> from > >> non-stationary data. I have come across an Incremental library in > >> sci-kit learn that actually allows to do that using partial_fit. I > >> have > >> searched a lot for the detailed information about this 'incremental' > >> > >> library and 'partial_fit', however, I couldn't find any. > >> > >> It would be great if you could provide me with some detailed > >> information > >> about these two regarding how they actually work. For example, If we > >> > >> take SGD as a classifier, the incremental library will allow me to > >> take > >> chunks/batches of data. 
My question is: Do this incremental library > >> train (using parial_fit) the whole batch at a time and then produce > >> a > >> classification performance or it takes a batch and trains each > >> instance > >> at a time from the batch. > >> > >> Thanks in advance! > >> > >> -- > >> Regards, > >> > >> Farzana Anowar > >> _______________________________________________ > >> scikit-learn mailing list > >> scikit-learn at python.org > >> https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > Hello Daniel, > > Thank you so much! I think your clarification makes sense. So, whatever > batches I am passing through the classifier it will train each instance > through a single batch. > > I was just wondering if you could give me some information about > partial_fit. Just for your reference, I was having a look at this code. > > https://dask-ml.readthedocs.io/en/latest/incremental.html > > Thanks! > > -- > Regards, > > Farzana Anowar > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fad469 at uregina.ca Mon Sep 9 18:38:03 2019 From: fad469 at uregina.ca (Farzana Anowar) Date: Mon, 09 Sep 2019 16:38:03 -0600 Subject: [scikit-learn] Incremental learning in scikit-learn Message-ID: <9fd8813b59fbc3a29283a65b9e971d9a@uregina.ca> Hello Sir/Madam, I am going through the incremental learning algorithms in scikit-learn. SGD in scikit-learn is one such algorithm: it allows learning incrementally by passing chunks/batches. Now my question is: does scikit-learn keep all the batches of training data in memory? Or does it keep chunks/batches in memory up to a certain size?
Or does it keep only one chunk/batch in memory during training and discard the previously trained chunks/batches? Does that mean it suffers from catastrophic forgetting? Thanks! -- Regards, Farzana Anowar From dbsullivan23 at gmail.com Mon Sep 9 19:53:39 2019 From: dbsullivan23 at gmail.com (Daniel Sullivan) Date: Mon, 9 Sep 2019 18:53:39 -0500 Subject: [scikit-learn] Incremental learning in scikit-learn In-Reply-To: <9fd8813b59fbc3a29283a65b9e971d9a@uregina.ca> References: <9fd8813b59fbc3a29283a65b9e971d9a@uregina.ca> Message-ID: Hey Farzana, The algorithm only keeps one batch in memory at a time. Across batches, SGD keeps a single set of weights, which it adjusts with each iteration over a data point (instance) within a batch. This set of weights is the state persisted between calls of partial_fit. That means you will get the same results with SGD regardless of your batch size, and you can choose your batch size according to your memory constraints. Hope that helps. - Danny On Mon, Sep 9, 2019 at 5:53 PM Farzana Anowar wrote: > Hello Sir/Madam, > > I am going through the incremental learning algorithm in Scikit-learn. > SGD in sci-kit learn is such a kind of algorithm that allows learning > incrementally by passing chunks/batches. Now my question is: does > sci-kit learn keeps all the batches for training data in memory? Or it > keeps chunks/batches in memory up to a certain amount of size? Or it > keeps only one chunk/batch while training in memory and removes the > other trained chunks/batches after training? Does that mean it suffers > from catastrophic forgetting? > > Thanks! > > -- > Regards, > > Farzana Anowar > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed...
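Danny's description above — one batch in memory at a time, a single weight vector persisted between partial_fit calls — can be sketched in a few lines. This is an illustrative snippet with synthetic data, not code from the thread; in a real pipeline each batch would be read from disk before being passed to partial_fit:

```python
# Minimal sketch: incremental training with SGDClassifier.partial_fit.
# Only one batch exists in memory per iteration; the classifier's weights
# carry the learned state across calls.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(0)
clf = SGDClassifier(random_state=0)
classes = np.array([0, 1])          # the full set of classes must be declared up front

for _ in range(5):                  # stream of five batches, one alive at a time
    X_batch = rng.randn(20, 3)      # stand-in for a chunk loaded from disk
    y_batch = (X_batch[:, 0] > 0).astype(int)
    clf.partial_fit(X_batch, y_batch, classes=classes)

# the fitted weights now reflect every batch seen so far
print(clf.coef_.shape)
```

Passing `classes=` on every call is harmless; it is only required on the first call, since SGD must know all labels before it sees them.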
URL: From fad469 at uregina.ca Mon Sep 9 20:15:38 2019 From: fad469 at uregina.ca (Farzana Anowar) Date: Mon, 09 Sep 2019 18:15:38 -0600 Subject: [scikit-learn] Incremental learning in scikit-learn In-Reply-To: References: <9fd8813b59fbc3a29283a65b9e971d9a@uregina.ca> Message-ID: <2a3df80b95a7e8bb5d3199273012b8c3@uregina.ca> On 2019-09-09 17:53, Daniel Sullivan wrote: > Hey Farzana, > > The algorithm only keeps one batch in memory at a time. Between > processing over each batch, SGD keeps a set of weights that it alters > with each iteration of a data point or instance within a batch. This > set of weights functions as the persisted state between calls of > partial_fit. That means you will get the same results with SGD > regardless of your batch size and you can choose your batch size > according to your memory constraints. Hope that helps. > > - Danny > > On Mon, Sep 9, 2019 at 5:53 PM Farzana Anowar > wrote: > >> Hello Sir/Madam, >> >> I am going through the incremental learning algorithm in >> Scikit-learn. >> SGD in sci-kit learn is such a kind of algorithm that allows >> learning >> incrementally by passing chunks/batches. Now my question is: does >> sci-kit learn keeps all the batches for training data in memory? Or >> it >> keeps chunks/batches in memory up to a certain amount of size? Or it >> >> keeps only one chunk/batch while training in memory and removes the >> other trained chunks/batches after training? Does that mean it >> suffers >> from catastrophic forgetting? >> >> Thanks! >> >> -- >> Regards, >> >> Farzana Anowar >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn Thanks a lot! 
-- Regards, Farzana Anowar From joel.nothman at gmail.com Tue Sep 10 21:19:16 2019 From: joel.nothman at gmail.com (Joel Nothman) Date: Wed, 11 Sep 2019 11:19:16 +1000 Subject: [scikit-learn] Vote on SLEP009: keyword only arguments Message-ID: As per our Governance document, changes to API principles are to be established through an Enhancement Proposal (SLEP) from which any core developer can call for a vote on its acceptance. *SLEP009 Keyword Only Arguments is the first SLEP up for a vote. Please see* *https://scikit-learn-enhancement-proposals.readthedocs.io/en/latest/slep009/proposal.html * *This proposal discusses the path to gradually forcing users to pass arguments, or most of them, as keyword arguments only.* Core developers are invited to vote on this change until 11 October 2019 by replying to this email thread. All members of the community are welcome to comment on the proposal on this mailing list, or to propose minor changes through Issues and Pull Requests at https://github.com/scikit-learn/enhancement_proposals/. -------------- next part -------------- An HTML attachment was scrubbed... URL: From adrin.jalali at gmail.com Wed Sep 11 04:58:28 2019 From: adrin.jalali at gmail.com (Adrin) Date: Wed, 11 Sep 2019 10:58:28 +0200 Subject: [scikit-learn] Vote on SLEP009: keyword only arguments In-Reply-To: References: Message-ID: It's a yes for me. On Wed, Sep 11, 2019 at 3:20 AM Joel Nothman wrote: > As per our Governance > document, changes to API principles are to be established through an > Enhancement Proposal (SLEP) from which any core developer can call for a > vote on its acceptance. > > *SLEP009 Keyword Only Arguments is the first SLEP up for a vote. 
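The force-keyword-argument syntax the SLEP builds on is Python 3's bare `*` separator in a function signature: every parameter after the `*` can only be passed by keyword. A minimal sketch (the estimator below is hypothetical, not a scikit-learn class):

```python
# Hypothetical estimator showing Python 3 keyword-only arguments:
# parameters after the bare * must be passed by keyword.
class MyEstimator:
    def __init__(self, *, alpha=1.0, max_iter=100):
        self.alpha = alpha
        self.max_iter = max_iter

est = MyEstimator(alpha=0.5)        # keyword call: fine
try:
    MyEstimator(0.5)                # positional call: TypeError
except TypeError as exc:
    print("rejected:", exc)
```

The deprecation path discussed in the SLEP is about moving existing positional signatures toward this form without breaking user code overnight.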
Please > see* > > *https://scikit-learn-enhancement-proposals.readthedocs.io/en/latest/slep009/proposal.html > * > > *This proposal discusses the path to gradually forcing users to pass > arguments, or most of them, as keyword arguments only.* > > Core developers are invited to vote on this change until 11 October 2019 > by replying to this email thread. > > All members of the community are welcome to comment on the proposal on > this mailing list, or to propose minor changes through Issues and Pull > Requests at https://github.com/scikit-learn/enhancement_proposals/. > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ahowe42 at gmail.com Wed Sep 11 07:20:01 2019 From: ahowe42 at gmail.com (Andrew Howe) Date: Wed, 11 Sep 2019 12:20:01 +0100 Subject: [scikit-learn] Vote on SLEP009: keyword only arguments In-Reply-To: References: Message-ID: I'm strongly supportive of moving to keyword only arguments. Andrew <~~~~~~~~~~~~~~~~~~~~~~~~~~~> J. Andrew Howe, PhD LinkedIn Profile ResearchGate Profile Open Researcher and Contributor ID (ORCID) Github Profile Personal Website I live to learn, so I can learn to live. - me <~~~~~~~~~~~~~~~~~~~~~~~~~~~> On Wed, Sep 11, 2019 at 2:21 AM Joel Nothman wrote: > As per our Governance > document, changes to API principles are to be established through an > Enhancement Proposal (SLEP) from which any core developer can call for a > vote on its acceptance. > > *SLEP009 Keyword Only Arguments is the first SLEP up for a vote. 
Please > see* > > *https://scikit-learn-enhancement-proposals.readthedocs.io/en/latest/slep009/proposal.html > * > > *This proposal discusses the path to gradually forcing users to pass > arguments, or most of them, as keyword arguments only.* > > Core developers are invited to vote on this change until 11 October 2019 > by replying to this email thread. > > All members of the community are welcome to comment on the proposal on > this mailing list, or to propose minor changes through Issues and Pull > Requests at https://github.com/scikit-learn/enhancement_proposals/. > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexandre.gramfort at inria.fr Wed Sep 11 09:22:15 2019 From: alexandre.gramfort at inria.fr (Alexandre Gramfort) Date: Wed, 11 Sep 2019 15:22:15 +0200 Subject: [scikit-learn] Vote on SLEP009: keyword only arguments In-Reply-To: References: Message-ID: hi, Adrin do you suggest this for everything or maybe just for __init__ params of estimators and stuff that can come after X, y in fit eg sample_weights? would: clf.fit(X, y) still be allowed? Alex From adrin.jalali at gmail.com Wed Sep 11 09:38:09 2019 From: adrin.jalali at gmail.com (Adrin) Date: Wed, 11 Sep 2019 15:38:09 +0200 Subject: [scikit-learn] Vote on SLEP009: keyword only arguments In-Reply-To: References: Message-ID: Hi, I'm (mostly) the messenger, don't shoot me :P It may help to summarize the SLEP: 1. This can be applied to all methods, not just __init__. 2. The SLEP doesn't say we have to apply it everywhere. It's mostly that it lets us do that. 3. It doesn't make ALL inputs a keywords only argument. The common ones such as X and y in fit(X, y) will stay as they are. Therefore clf.fit(X, y) will definitely be allowed. 4. 
Whether or not sample_weight should be keyword-only in fit requires its own discussion, and the route for that discussion is defined in the SLEP. In other words, if an input parameter is used as a positional argument less frequently than X% of the time, then it can/should be a keyword-only argument. The SLEP defines these conditions more precisely. I hope that clarifies it a little bit. Adrin/ On Wed, Sep 11, 2019 at 3:23 PM Alexandre Gramfort < alexandre.gramfort at inria.fr> wrote: > hi, > > Adrin do you suggest this for everything or maybe just for __init__ > params of estimators > and stuff that can come after X, y in fit eg sample_weights? > > would: > > clf.fit(X, y) > > still be allowed? > > Alex > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From niourf at gmail.com Wed Sep 11 14:21:34 2019 From: niourf at gmail.com (Nicolas Hug) Date: Wed, 11 Sep 2019 14:21:34 -0400 Subject: [scikit-learn] Vote on SLEP009: keyword only arguments In-Reply-To: References: Message-ID: <56cd260b-20ce-d863-61a6-f6cd6c1f4aab@gmail.com> Since there is no explicit proposal in the SLEP it's not very clear what we need to vote for / against. But overall I'm +1 on forcing kwargs for all __init__ methods. Nicolas On 9/11/19 9:38 AM, Adrin wrote: > Hi, > > I'm (mostly) the messenger, don't shoot me :P > > It may help to summarize the SLEP: > 1. This can be applied to all methods, not just __init__. > 2. The SLEP doesn't say we have to apply it everywhere. It's mostly > that it lets us do that. > 3. It doesn't make ALL inputs a keywords only argument. The common > ones such as X and y in fit(X, y) will stay as they are. > Therefore clf.fit(X, y) will definitely be allowed. > 4.
Whether or not sample_weight should be keyword only or not in fit, > requires its own discussion, and the route of the discussion > ?? is defined in the SLEP. > > In other words, if an input parameter is used as a positional argument > less frequently than X% of the time, then it can/should be > a keyword only argument. But the SLEP better defines these conditions. > > I hope that clarifies it a little bit. > > Adrin/ > > On Wed, Sep 11, 2019 at 3:23 PM Alexandre Gramfort > > wrote: > > hi, > > Adrin do you suggest this for everything or maybe just for __init__ > params of estimators > and stuff that can come after X, y in fit eg sample_weights? > > would: > > clf.fit(X, y) > > still be allowed? > > Alex > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexandre.gramfort at inria.fr Wed Sep 11 15:41:18 2019 From: alexandre.gramfort at inria.fr (Alexandre Gramfort) Date: Wed, 11 Sep 2019 21:41:18 +0200 Subject: [scikit-learn] Vote on SLEP009: keyword only arguments In-Reply-To: <56cd260b-20ce-d863-61a6-f6cd6c1f4aab@gmail.com> References: <56cd260b-20ce-d863-61a6-f6cd6c1f4aab@gmail.com> Message-ID: > But overall I'm + 1 on forcing kwargs for all __init__ methods. yes I think it will help for __init__ methods Alex PS : I don't shoot people (usually...) From qinhanmin2005 at sina.com Wed Sep 11 20:37:16 2019 From: qinhanmin2005 at sina.com (Hanmin Qin) Date: Thu, 12 Sep 2019 08:37:16 +0800 Subject: [scikit-learn] Vote on SLEP009: keyword only arguments Message-ID: <20190912003716.738E42D0009B@webmail.sinamail.sina.com.cn> I'll vote +1, though there're still lots of things to decide. 
Hanmin Qin ----- Original Message ----- From: Alexandre Gramfort To: Scikit-learn mailing list Subject: Re: [scikit-learn] Vote on SLEP009: keyword only arguments Date: 2019-09-12 03:43 > But overall I'm + 1 on forcing kwargs for all __init__ methods. yes I think it will help for __init__ methods Alex PS : I don't shoot people (usually...) _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From joel.nothman at gmail.com Wed Sep 11 22:40:51 2019 From: joel.nothman at gmail.com (Joel Nothman) Date: Thu, 12 Sep 2019 12:40:51 +1000 Subject: [scikit-learn] Vote on SLEP009: keyword only arguments In-Reply-To: <20190912003716.738E42D0009B@webmail.sinamail.sina.com.cn> References: <20190912003716.738E42D0009B@webmail.sinamail.sina.com.cn> Message-ID: There are details of specific API changes still to be decided. The question being put, as per the SLEP, is: do we want to utilise Python 3's force-keyword-argument syntax and to change existing APIs which support arguments positionally to use this syntax, via a deprecation period? -------------- next part -------------- An HTML attachment was scrubbed...
The labels of the dataset look like - [image: image] When I am using mlb = MultiLabelBinarizer() mlb.fit(labels) print(mlb.classes_) I am getting - [image: image] Whereas, the output (sample output) I want is - [image: image] I got the above output by - mlb = MultiLabelBinarizer() sample_labels = [ ['stat.ML', 'cs.LG'], ['cs.CV', 'cs.RO'] ] mlb.fit(sample_labels) print(mlb.classes_) Help would be very much appreciated here. Here's the dataset I had prepared: arXivdata.csv.zip I stripped away the double quotes in the labels after loading it in a pandas DataFrame by - import re arxiv_data['labels'] = arxiv_data['labels'].str.replace(r"[\"]", '') scikit-learn version: '0.21.3' Sayak Paul | sayak.dev -------------- next part -------------- An HTML attachment was scrubbed... URL: From loic.esteve at ymail.com Thu Sep 12 01:24:48 2019 From: loic.esteve at ymail.com (=?utf-8?B?TG/Dr2MgRXN0w6h2ZQ==?=) Date: Thu, 12 Sep 2019 07:24:48 +0200 Subject: [scikit-learn] MultiLabelBinarizer gives individual characters instead of the classes In-Reply-To: References: Message-ID: I think this caveat has been added in the dev doc (not yet in the stable doc). You may want to read: https://scikit-learn.org/dev/modules/generated/sklearn.preprocessing.MultiLabelBinarizer.html and in particular the part that starts with "A common mistake is to pass in a list". Cheers, Loïc > Hi. > > I am working on a Multi-label text classification problem. In order to encode the labels, I am using MultiLabelBinarizer. The labels of the dataset look like - > > image > > When I am using > > mlb = MultiLabelBinarizer() > mlb.fit(labels) > print(mlb.classes_) > > I am getting - > > image > > Whereas, the output (sample output) I want is - > > image > > I got the above output by - > > mlb = MultiLabelBinarizer() > sample_labels = [ > ['stat.ML', 'cs.LG'], > ['cs.CV', 'cs.RO'] > ] > mlb.fit(sample_labels) > print(mlb.classes_) > > Help would be very much appreciated here.
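The "common mistake" caveat can be reproduced without scikit-learn at all: `MultiLabelBinarizer.fit` iterates over each sample, and a plain Python string iterates as its characters. The helper below (`collect_classes`, a made-up name that only mimics the class-collection step of `fit`) shows why the two input shapes give different classes:

```python
def collect_classes(y):
    """Mimic how MultiLabelBinarizer.fit gathers classes: it iterates
    over each sample and collects every label it yields."""
    return sorted({label for sample in y for label in sample})

labels_wrong = ['stat.ML', 'cs.LG']      # one *string* per sample
labels_right = [['stat.ML'], ['cs.LG']]  # one *list of labels* per sample

print(collect_classes(labels_wrong))  # ['.', 'G', 'L', 'M', 'a', 'c', 's', 't']
print(collect_classes(labels_right))  # ['cs.LG', 'stat.ML']
```

Iterating `'stat.ML'` yields its characters, hence the single-letter classes; wrapping each sample's labels in a list (or tuple, or set) gives the intended behaviour.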
> > Here's the dataset I had prepared: > arXivdata.csv.zip > > I stripped away the double quotes in the labels after loading it in a pandas DataFrame by - > > import re > > arxiv_data['labels'] = arxiv_data['labels'].str.replace(r"[\"]", '') > > scikit-learn version: '0.21.3' > > Sayak Paul | sayak.dev From g.lemaitre58 at gmail.com Thu Sep 12 04:06:30 2019 From: g.lemaitre58 at gmail.com (=?UTF-8?Q?Guillaume_Lema=C3=AEtre?=) Date: Thu, 12 Sep 2019 10:06:30 +0200 Subject: [scikit-learn] Vote on SLEP009: keyword only arguments In-Reply-To: References: <20190912003716.738E42D0009B@webmail.sinamail.sina.com.cn> Message-ID: To the question: do we want to utilise Python 3's force-keyword-argument syntax and to change existing APIs which support arguments positionally to use this syntax, via a deprecation period? I am +1. IMO, even if the syntax might be unknown, it will remain unknown until projects from the ecosystem are not using it. To the question: which methods should be impacted? I think we should be as gentle as possible at first. I am a little concerned about breaking some codes which were working fine before. On Thu, 12 Sep 2019 at 04:43, Joel Nothman wrote: > These there details of specific API changes to be decided: > > The question being put, as per the SLEP, is: > do we want to utilise Python 3's force-keyword-argument syntax > and to change existing APIs which support arguments positionally to use > this syntax, via a deprecation period? > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -- Guillaume Lemaitre INRIA Saclay - Parietal team Center for Data Science Paris-Saclay https://glemaitre.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From alejandro.peralta at mercadolibre.com Thu Sep 12 08:23:03 2019 From: alejandro.peralta at mercadolibre.com (Alejandro Javier Peralta Frias) Date: Thu, 12 Sep 2019 09:23:03 -0300 Subject: [scikit-learn] How can I enable line tracing for cython modules. Message-ID: Hello all, To enable cython tracing (in particular I want to line trace neighbors module) I understand that I have to recompile the cython modules with CYTHON_TRACE=1 but I'm not sure where should I set this. Should I use: # distutils: define_macros=CYTHON_TRACE_NOGIL=1 In the files I want to trace? Regards, -- Ale -------------- next part -------------- An HTML attachment was scrubbed... URL: From spsayakpaul at gmail.com Fri Sep 13 01:16:09 2019 From: spsayakpaul at gmail.com (Sayak Paul) Date: Fri, 13 Sep 2019 10:46:09 +0530 Subject: [scikit-learn] scikit-learn Digest, Vol 42, Issue 14 In-Reply-To: References: Message-ID: I was able to solve the problem using - mlb = MultiLabelBinarizer() mlb.fit([y_train]) Thanks for the suggestions. The output of mlb.classes_ now looks the following (first ten classes): [image: image.png] However, when I transform it using mlb.transform([y_train]), another problem arrises - [image: image.png] Kindly suggest :) Sayak Paul | sayak.dev On Thu, Sep 12, 2019 at 9:33 PM wrote: > Send scikit-learn mailing list submissions to > scikit-learn at python.org > > To subscribe or unsubscribe via the World Wide Web, visit > https://mail.python.org/mailman/listinfo/scikit-learn > or, via email, send a message with subject or body 'help' to > scikit-learn-request at python.org > > You can reach the person managing the list at > scikit-learn-owner at python.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of scikit-learn digest..." > > > Today's Topics: > > 1. Re: MultiLabelBinarizer gives individual characters instead > of the classes (Lo?c Est?ve) > 2. Re: Vote on SLEP009: keyword only arguments (Guillaume Lema?tre) > 3. 
How can I enable line tracing for cython modules. > (Alejandro Javier Peralta Frias) > > ------------------------------ > > End of scikit-learn Digest, Vol 42, Issue 14 > ******************************************** > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 16117 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: image.png Type: image/png Size: 7675 bytes Desc: not available URL: From jeremie.du-boisberranger at inria.fr Fri Sep 13 05:53:39 2019 From: jeremie.du-boisberranger at inria.fr (Jeremie du Boisberranger) Date: Fri, 13 Sep 2019 11:53:39 +0200 Subject: [scikit-learn] Vote on SLEP009: keyword only arguments In-Reply-To: References: <20190912003716.738E42D0009B@webmail.sinamail.sina.com.cn> Message-ID: <29eccb39-b4f8-bd21-d6c8-005b9c2b087a@inria.fr> I don't know what the policy is about a sklearn 1.0 w.r.t. api changes. If it's meant to be a special release with possible api changes without deprecation cycles, I think this change is a good candidate for 1.0 Otherwise I'm +1 and agree with Guillaume, people will get used to it by using it. Jérémie On 12/09/2019 10:06, Guillaume Lemaître wrote: > To the question: do we want to utilise Python 3's > force-keyword-argument syntax > and to change existing APIs which support arguments positionally to > use this > syntax, via a deprecation period? > > I am +1. > > IMO, even if the syntax might be unknown, it will remain unknown until > projects > from the ecosystem are not using it. > > To the question: which methods should be impacted? > > I think we should be as gentle as possible at first. I am a little > concerned about > breaking some codes which were working fine before. > > On Thu, 12 Sep 2019 at 04:43, Joel Nothman > wrote: > > There are details of specific API changes to be decided: > > The question being put, as per the SLEP, is: > do we want to utilise Python 3's force-keyword-argument syntax > and to change existing APIs which support arguments positionally > to use this syntax, via a deprecation period?
> _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > > -- > Guillaume Lemaitre > INRIA Saclay - Parietal team > Center for Data Science Paris-Saclay > https://glemaitre.github.io/ > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From tmrsg11 at gmail.com Fri Sep 13 22:38:16 2019 From: tmrsg11 at gmail.com (C W) Date: Fri, 13 Sep 2019 22:38:16 -0400 Subject: [scikit-learn] Can Scikit-learn decision tree (CART) have both continuous and categorical features? Message-ID: Hello all, I'm very confused. Can the decision tree module handle both continuous and categorical features in the dataset? In this case, it's just CART (Classification and Regression Trees). For example, Gender Age Income Car Attendance Male 30 10000 BMW Yes Female 35 9000 Toyota No Male 50 12000 Audi Yes According to the documentation https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart, it can not! It says: "scikit-learn implementation does not support categorical variables for now". Is this true? If not, can someone point me to an example? If yes, what do people do? Thank you very much! -------------- next part -------------- An HTML attachment was scrubbed... URL: From mail at sebastianraschka.com Fri Sep 13 23:35:45 2019 From: mail at sebastianraschka.com (Sebastian Raschka) Date: Fri, 13 Sep 2019 22:35:45 -0500 Subject: [scikit-learn] Can Scikit-learn decision tree (CART) have both continuous and categorical features? 
In-Reply-To: References: Message-ID: <7A421666-3446-44A4-985C-B7708C8D6966@sebastianraschka.com> Hi, if you have the category "car" as shown in your example, this would effectively be something like BMW=0 Toyota=1 Audi=2 Sure, the algorithm will execute just fine on the feature column with values in {0, 1, 2}. However, the problem is that it will come up with binary rules like x_i >= 0.5, x_i >= 1.5, and x_i >= 2.5. I.e., it will treat it as a continuous variable. What you can do is to encode this feature via one-hot encoding -- basically extend it into 2 (or 3) binary variables. This has its own problems (if you have a feature with many possible values, you will end up with a large number of binary variables, and they may dominate in the resulting tree over other feature variables). In any case, I guess this is what > "scikit-learn implementation does not support categorical variables for now". means ;). Best, Sebastian > On Sep 13, 2019, at 9:38 PM, C W wrote: > > Hello all, > I'm very confused. Can the decision tree module handle both continuous and categorical features in the dataset? In this case, it's just CART (Classification and Regression Trees). > > For example, > Gender Age Income Car Attendance > Male 30 10000 BMW Yes > Female 35 9000 Toyota No > Male 50 12000 Audi Yes > > According to the documentation https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart, it can not! > > It says: "scikit-learn implementation does not support categorical variables for now". > > Is this true? If not, can someone point me to an example? If yes, what do people do? > > Thank you very much!
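The two encodings Sebastian contrasts can be written out by hand for the "Car" column from the example. This is a pure-Python sketch of the idea only; in practice one would reach for scikit-learn's OneHotEncoder/OrdinalEncoder or pandas' get_dummies:

```python
cars = ['BMW', 'Toyota', 'Audi']

# Ordinal encoding: one integer per category. A tree then splits on
# thresholds like x >= 0.5, i.e. it treats the column as continuous
# and imposes an artificial order Audi < BMW < Toyota.
categories = sorted(set(cars))                 # ['Audi', 'BMW', 'Toyota']
ordinal = [categories.index(c) for c in cars]  # [1, 2, 0]

# One-hot encoding: one binary column per category, no implied order.
one_hot = [[int(c == cat) for cat in categories] for c in cars]
# BMW    -> [0, 1, 0]
# Toyota -> [0, 0, 1]
# Audi   -> [1, 0, 0]
```

With many categories the one-hot version grows one column per value, which is the dominance problem Sebastian mentions.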
> > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From tmrsg11 at gmail.com Sat Sep 14 00:41:06 2019 From: tmrsg11 at gmail.com (C W) Date: Sat, 14 Sep 2019 00:41:06 -0400 Subject: [scikit-learn] Can Scikit-learn decision tree (CART) have both continuous and categorical features? In-Reply-To: <7A421666-3446-44A4-985C-B7708C8D6966@sebastianraschka.com> References: <7A421666-3446-44A4-985C-B7708C8D6966@sebastianraschka.com> Message-ID: Thanks, Sebastian. It's great to know that it works, just need to do one-hot-encoding first. I have mixed data type (continuous and categorical). Should I tree. DecisionTreeClassifier() or tree.DecisionTreeRegressor()? I'm guessing tree.DecisionTreeClassifier()? Best, Mike On Fri, Sep 13, 2019 at 11:59 PM Sebastian Raschka < mail at sebastianraschka.com> wrote: > Hi, > > if you have the category "car" as shown in your example, this would > effectively be something like > > BMW=0 > Toyota=1 > Audi=2 > > Sure, the algorithm will execute just fine on the feature column with > values in {0, 1, 2}. However, the problem is that it will come up with > binary rules like x_i>= 0.5, x_i>= 1.5, and x_i>= 2.5. I.e., it will treat > it is a continuous variable. > > What you can do is to encode this feature via one-hot encoding -- > basically extend it into 2 (or 3) binary variables. This has it's own > problems (if you have a feature with many possible values, you will end up > with a large number of binary variables, and they may dominate in the > resulting tree over other feature variables). > > In any case, I guess this is what > > > "scikit-learn implementation does not support categorical variables for > now". > > > means ;). > > Best, > Sebastian > > > On Sep 13, 2019, at 9:38 PM, C W wrote: > > > > Hello all, > > I'm very confused. Can the decision tree module handle both continuous > and categorical features in the dataset? 
In this case, it's just CART > (Classification and Regression Trees). > > > > For example, > > Gender Age Income Car Attendance > > Male 30 10000 BMW Yes > > Female 35 9000 Toyota No > > Male 50 12000 Audi Yes > > > > According to the documentation > https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart, > it can not! > > > > It says: "scikit-learn implementation does not support categorical > variables for now". > > > > Is this true? If not, can someone point me to an example? If yes, what > do people do? > > > > Thank you very much! > > > > > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mail at sebastianraschka.com Sat Sep 14 00:56:15 2019 From: mail at sebastianraschka.com (Sebastian Raschka) Date: Fri, 13 Sep 2019 23:56:15 -0500 Subject: [scikit-learn] Can Scikit-learn decision tree (CART) have both continuous and categorical features? In-Reply-To: References: <7A421666-3446-44A4-985C-B7708C8D6966@sebastianraschka.com> Message-ID: Hi Mike, just to make sure we are on the same page, > I have mixed data type (continuous and categorical). Should I tree.DecisionTreeClassifier() or tree.DecisionTreeRegressor()? that's independent from the previous email. The comment > > "scikit-learn implementation does not support categorical variables for now". we discussed via the previous email was referring to feature variables. Whether you choose the DT regressor or classifier depends on the format of your target variable. Best, Sebastian > On Sep 13, 2019, at 11:41 PM, C W wrote: > > Thanks, Sebastian. 
It's great to know that it works, just need to do one-hot-encoding first. > > I have mixed data type (continuous and categorical). Should I tree.DecisionTreeClassifier() or tree.DecisionTreeRegressor()? > > I'm guessing tree.DecisionTreeClassifier()? > > Best, > > Mike > > On Fri, Sep 13, 2019 at 11:59 PM Sebastian Raschka wrote: > Hi, > > if you have the category "car" as shown in your example, this would effectively be something like > > BMW=0 > Toyota=1 > Audi=2 > > Sure, the algorithm will execute just fine on the feature column with values in {0, 1, 2}. However, the problem is that it will come up with binary rules like x_i>= 0.5, x_i>= 1.5, and x_i>= 2.5. I.e., it will treat it is a continuous variable. > > What you can do is to encode this feature via one-hot encoding -- basically extend it into 2 (or 3) binary variables. This has it's own problems (if you have a feature with many possible values, you will end up with a large number of binary variables, and they may dominate in the resulting tree over other feature variables). > > In any case, I guess this is what > > > "scikit-learn implementation does not support categorical variables for now". > > > means ;). > > Best, > Sebastian > > > On Sep 13, 2019, at 9:38 PM, C W wrote: > > > > Hello all, > > I'm very confused. Can the decision tree module handle both continuous and categorical features in the dataset? In this case, it's just CART (Classification and Regression Trees). > > > > For example, > > Gender Age Income Car Attendance > > Male 30 10000 BMW Yes > > Female 35 9000 Toyota No > > Male 50 12000 Audi Yes > > > > According to the documentation https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart, it can not! > > > > It says: "scikit-learn implementation does not support categorical variables for now". > > > > Is this true? If not, can someone point me to an example? If yes, what do people do? > > > > Thank you very much! 
> > > > > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From tmrsg11 at gmail.com Sat Sep 14 01:26:58 2019 From: tmrsg11 at gmail.com (C W) Date: Sat, 14 Sep 2019 01:26:58 -0400 Subject: [scikit-learn] Can Scikit-learn decision tree (CART) have both continuous and categorical features? In-Reply-To: References: <7A421666-3446-44A4-985C-B7708C8D6966@sebastianraschka.com> Message-ID: Ahh, you are right. Regression vs. Classification is about the type of target variable, not features. Thanks, more clear now. Mike On Sat, Sep 14, 2019 at 1:23 AM Sebastian Raschka wrote: > Hi Mike, > > just to make sure we are on the same page, > > > I have mixed data type (continuous and categorical). Should I > tree.DecisionTreeClassifier() or tree.DecisionTreeRegressor()? > > that's independent from the previous email. The comment > > > > "scikit-learn implementation does not support categorical variables > for now". > > we discussed via the previous email was referring to feature variables. > Whether you choose the DT regressor or classifier depends on the format of > your target variable. > > Best, > Sebastian > > > On Sep 13, 2019, at 11:41 PM, C W wrote: > > > > Thanks, Sebastian. It's great to know that it works, just need to do > one-hot-encoding first. > > > > I have mixed data type (continuous and categorical). Should I > tree.DecisionTreeClassifier() or tree.DecisionTreeRegressor()? > > > > I'm guessing tree.DecisionTreeClassifier()? 
> > > > Best, > > > > Mike > > > > On Fri, Sep 13, 2019 at 11:59 PM Sebastian Raschka < > mail at sebastianraschka.com> wrote: > > Hi, > > > > if you have the category "car" as shown in your example, this would > effectively be something like > > > > BMW=0 > > Toyota=1 > > Audi=2 > > > > Sure, the algorithm will execute just fine on the feature column with > values in {0, 1, 2}. However, the problem is that it will come up with > binary rules like x_i>= 0.5, x_i>= 1.5, and x_i>= 2.5. I.e., it will treat > it is a continuous variable. > > > > What you can do is to encode this feature via one-hot encoding -- > basically extend it into 2 (or 3) binary variables. This has it's own > problems (if you have a feature with many possible values, you will end up > with a large number of binary variables, and they may dominate in the > resulting tree over other feature variables). > > > > In any case, I guess this is what > > > > > "scikit-learn implementation does not support categorical variables > for now". > > > > > > means ;). > > > > Best, > > Sebastian > > > > > On Sep 13, 2019, at 9:38 PM, C W wrote: > > > > > > Hello all, > > > I'm very confused. Can the decision tree module handle both continuous > and categorical features in the dataset? In this case, it's just CART > (Classification and Regression Trees). > > > > > > For example, > > > Gender Age Income Car Attendance > > > Male 30 10000 BMW Yes > > > Female 35 9000 Toyota No > > > Male 50 12000 Audi Yes > > > > > > According to the documentation > https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart, > it can not! > > > > > > It says: "scikit-learn implementation does not support categorical > variables for now". > > > > > > Is this true? If not, can someone point me to an example? If yes, what > do people do? > > > > > > Thank you very much! 
> > > > > > > > > > > > _______________________________________________ > > > scikit-learn mailing list > > > scikit-learn at python.org > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.lemaitre58 at gmail.com Sat Sep 14 05:14:17 2019 From: g.lemaitre58 at gmail.com (=?UTF-8?Q?Guillaume_Lema=C3=AEtre?=) Date: Sat, 14 Sep 2019 11:14:17 +0200 Subject: [scikit-learn] Can Scikit-learn decision tree (CART) have both continuous and categorical features? In-Reply-To: References: <7A421666-3446-44A4-985C-B7708C8D6966@sebastianraschka.com> Message-ID: I will just add that if you have heterogeneous types, you might want to look at the ColumnTransformer: https://scikit-learn.org/stable/auto_examples/compose/plot_column_transformer_mixed_types.html You might want to apply some scaling (would not be relevant for tree thought) and encode categories (ordinal encoding for the tree-based) and then dispatch these data to a decision tree. The previous example shows how to construct such a preprocessor and pipeline it with a predictor. On Sat, 14 Sep 2019 at 07:29, C W wrote: > Ahh, you are right. Regression vs. Classification is about the type of > target variable, not features. > > Thanks, more clear now. 
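Guillaume's ColumnTransformer suggestion can be sketched concretely. The toy data below echoes Mike's example but is otherwise invented for illustration; this is one reasonable wiring, not the only one:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

X = pd.DataFrame({
    'age': [30, 35, 50],
    'income': [10000, 9000, 12000],
    'gender': ['Male', 'Female', 'Male'],
    'car': ['BMW', 'Toyota', 'Audi'],
})
y = ['Yes', 'No', 'Yes']

# Ordinal-encode the categorical columns (fine for trees, which only
# split on thresholds) and pass the numeric columns through untouched.
preprocessor = ColumnTransformer(
    [('categories', OrdinalEncoder(), ['gender', 'car'])],
    remainder='passthrough',
)
model = make_pipeline(preprocessor, DecisionTreeClassifier(random_state=0))
model.fit(X, y)
print(model.predict(X))  # the tree fits the three training rows exactly
```

Scaling steps could be added to the numeric side of the ColumnTransformer for other estimators; as noted above, they are not relevant for trees.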
> > Mike > > On Sat, Sep 14, 2019 at 1:23 AM Sebastian Raschka < > mail at sebastianraschka.com> wrote: > >> Hi Mike, >> >> just to make sure we are on the same page, >> >> > I have mixed data type (continuous and categorical). Should I >> tree.DecisionTreeClassifier() or tree.DecisionTreeRegressor()? >> >> that's independent from the previous email. The comment >> >> > > "scikit-learn implementation does not support categorical variables >> for now". >> >> we discussed via the previous email was referring to feature variables. >> Whether you choose the DT regressor or classifier depends on the format of >> your target variable. >> >> Best, >> Sebastian >> >> > On Sep 13, 2019, at 11:41 PM, C W wrote: >> > >> > Thanks, Sebastian. It's great to know that it works, just need to do >> one-hot-encoding first. >> > >> > I have mixed data type (continuous and categorical). Should I >> tree.DecisionTreeClassifier() or tree.DecisionTreeRegressor()? >> > >> > I'm guessing tree.DecisionTreeClassifier()? >> > >> > Best, >> > >> > Mike >> > >> > On Fri, Sep 13, 2019 at 11:59 PM Sebastian Raschka < >> mail at sebastianraschka.com> wrote: >> > Hi, >> > >> > if you have the category "car" as shown in your example, this would >> effectively be something like >> > >> > BMW=0 >> > Toyota=1 >> > Audi=2 >> > >> > Sure, the algorithm will execute just fine on the feature column with >> values in {0, 1, 2}. However, the problem is that it will come up with >> binary rules like x_i>= 0.5, x_i>= 1.5, and x_i>= 2.5. I.e., it will treat >> it is a continuous variable. >> > >> > What you can do is to encode this feature via one-hot encoding -- >> basically extend it into 2 (or 3) binary variables. This has it's own >> problems (if you have a feature with many possible values, you will end up >> with a large number of binary variables, and they may dominate in the >> resulting tree over other feature variables). 
>> > >> > In any case, I guess this is what >> > >> > > "scikit-learn implementation does not support categorical variables >> for now". >> > >> > >> > means ;). >> > >> > Best, >> > Sebastian >> > >> > > On Sep 13, 2019, at 9:38 PM, C W wrote: >> > > >> > > Hello all, >> > > I'm very confused. Can the decision tree module handle both >> continuous and categorical features in the dataset? In this case, it's just >> CART (Classification and Regression Trees). >> > > >> > > For example, >> > > Gender Age Income Car Attendance >> > > Male 30 10000 BMW Yes >> > > Female 35 9000 Toyota No >> > > Male 50 12000 Audi Yes >> > > >> > > According to the documentation >> https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart, >> it can not! >> > > >> > > It says: "scikit-learn implementation does not support categorical >> variables for now". >> > > >> > > Is this true? If not, can someone point me to an example? If yes, >> what do people do? >> > > >> > > Thank you very much! 
>> > > >> > > >> > > >> > > _______________________________________________ >> > > scikit-learn mailing list >> > > scikit-learn at python.org >> > > https://mail.python.org/mailman/listinfo/scikit-learn >> > >> > _______________________________________________ >> > scikit-learn mailing list >> > scikit-learn at python.org >> > https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ >> > scikit-learn mailing list >> > scikit-learn at python.org >> > https://mail.python.org/mailman/listinfo/scikit-learn >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -- Guillaume Lemaitre INRIA Saclay - Parietal team Center for Data Science Paris-Saclay https://glemaitre.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From joel.nothman at gmail.com Sat Sep 14 08:10:29 2019 From: joel.nothman at gmail.com (Joel Nothman) Date: Sat, 14 Sep 2019 22:10:29 +1000 Subject: [scikit-learn] Vote on SLEP009: keyword only arguments In-Reply-To: <29eccb39-b4f8-bd21-d6c8-005b9c2b087a@inria.fr> References: <20190912003716.738E42D0009B@webmail.sinamail.sina.com.cn> <29eccb39-b4f8-bd21-d6c8-005b9c2b087a@inria.fr> Message-ID: I am +1 for this change. I agree that users will accommodate the syntax sooner or later. On Fri., 13 Sep. 2019, 7:54 pm Jeremie du Boisberranger, < jeremie.du-boisberranger at inria.fr> wrote: > I don't know what is the policy about a sklearn 1.0 w.r.t api changes. > > If it's meant to be a special release with possible api changes without > deprecation cycles, I think this change is a good candidate for 1.0 > > > Otherwise I'm +1 and agree with Guillaume, people will get used to it by > using it. 
> > J?r?mie > > > > On 12/09/2019 10:06, Guillaume Lema?tre wrote: > > To the question: do we want to utilise Python 3's force-keyword-argument > syntax > and to change existing APIs which support arguments positionally to use > this > syntax, via a deprecation period? > > I am +1. > > IMO, even if the syntax might be unknown, it will remain unknown until > projects > from the ecosystem are not using it. > > To the question: which methods should be impacted? > > I think we should be as gentle as possible at first. I am a little > concerned about > breaking some codes which were working fine before. > > On Thu, 12 Sep 2019 at 04:43, Joel Nothman wrote: > >> These there details of specific API changes to be decided: >> >> The question being put, as per the SLEP, is: >> do we want to utilise Python 3's force-keyword-argument syntax >> and to change existing APIs which support arguments positionally to use >> this syntax, via a deprecation period? >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > > > -- > Guillaume Lemaitre > INRIA Saclay - Parietal team > Center for Data Science Paris-Saclay > https://glemaitre.github.io/ > > _______________________________________________ > scikit-learn mailing listscikit-learn at python.orghttps://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jlopez at ende.cc Sat Sep 14 09:23:13 2019 From: jlopez at ende.cc (=?UTF-8?Q?Javier_L=C3=B3pez?=) Date: Sat, 14 Sep 2019 14:23:13 +0100 Subject: [scikit-learn] Can Scikit-learn decision tree (CART) have both continuous and categorical features? 
In-Reply-To: References: Message-ID: If you have datasets with many categorical features, and perhaps many categories, the tools in sklearn are quite limited, but there are alternative implementations of boosted trees that are designed with categorical features in mind. Take a look at catboost [1], which has an sklearn-compatible API. J [1] https://catboost.ai/ On Sat, Sep 14, 2019 at 3:40 AM C W wrote: > Hello all, > I'm very confused. Can the decision tree module handle both continuous and > categorical features in the dataset? In this case, it's just CART > (Classification and Regression Trees). > > For example, > Gender Age Income Car Attendance > Male 30 10000 BMW Yes > Female 35 9000 Toyota No > Male 50 12000 Audi Yes > > According to the documentation > https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart, > it can not! > > It says: "scikit-learn implementation does not support categorical > variables for now". > > Is this true? If not, can someone point me to an example? If yes, what do > people do? > > Thank you very much! > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From spsayakpaul at gmail.com Sat Sep 14 12:30:37 2019 From: spsayakpaul at gmail.com (Sayak Paul) Date: Sat, 14 Sep 2019 22:00:37 +0530 Subject: [scikit-learn] Problem regarding MultiLabelBinarizer In-Reply-To: References: Message-ID: Sayak Paul | sayak.dev ---------- Forwarded message --------- From: Date: Fri, Sep 13, 2019 at 10:46 AM Subject: scikit-learn Digest, Vol 42, Issue 15 To: Send scikit-learn mailing list submissions to scikit-learn at python.org To subscribe or unsubscribe via the World Wide Web, visit https://mail.python.org/mailman/listinfo/scikit-learn or, via email, send a message with subject or body 'help' to scikit-learn-request at python.org You can reach the person managing the list at scikit-learn-owner at python.org When replying, please edit your Subject line so it is more specific than "Re: Contents of scikit-learn digest..." Today's Topics: 1. Re: scikit-learn Digest, Vol 42, Issue 14 (Sayak Paul) ---------------------------------------------------------------------- Message: 1 Date: Fri, 13 Sep 2019 10:46:09 +0530 From: Sayak Paul To: scikit-learn at python.org Subject: Re: [scikit-learn] scikit-learn Digest, Vol 42, Issue 14 Message-ID: Content-Type: text/plain; charset="utf-8" I was able to solve the problem using - mlb = MultiLabelBinarizer() mlb.fit([y_train]) Thanks for the suggestions. 
The output of mlb.classes_ now looks as follows (first ten classes): [image: image.png] However, when I transform it using mlb.transform([y_train]), another problem arises - [image: image.png] Kindly suggest :) Sayak Paul | sayak.dev On Thu, Sep 12, 2019 at 9:33 PM wrote: > Send scikit-learn mailing list submissions to > scikit-learn at python.org > > To subscribe or unsubscribe via the World Wide Web, visit > https://mail.python.org/mailman/listinfo/scikit-learn > or, via email, send a message with subject or body 'help' to > scikit-learn-request at python.org > > You can reach the person managing the list at > scikit-learn-owner at python.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of scikit-learn digest..." > > > Today's Topics: > > 1. Re: MultiLabelBinarizer gives individual characters instead > of the classes (Loïc Estève) > 2. Re: Vote on SLEP009: keyword only arguments (Guillaume Lemaître) > 3. How can I enable line tracing for cython modules. > (Alejandro Javier Peralta Frias) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Thu, 12 Sep 2019 07:24:48 +0200 > From: Loïc Estève > To: Scikit-learn mailing list > Subject: Re: [scikit-learn] MultiLabelBinarizer gives individual > characters instead of the classes > Message-ID: > Content-Type: text/plain; charset=utf-8 > > I think this caveat has been added in the dev doc (not yet in the stable > doc). You may want to read: > > https://scikit-learn.org/dev/modules/generated/sklearn.preprocessing.MultiLabelBinarizer.html > and in particular the part that starts with "A common mistake is to pass > in a list". > > Cheers, > Loïc > > > Hi. > > > > I am working on a Multi-label text classification problem. In order to > encode the labels, I am using MultiLabelBinarizer.
The labels of the > dataset look like - > > > > image > > > > When I am using > > > > mlb = MultiLabelBinarizer() > > mlb.fit(labels) > > print(mlb.classes_) > > > > I am getting - > > > > image > > > > Whereas, the output (sample output) I want is - > > > > image > > > > I got the above output by - > > > > mlb = MultiLabelBinarizer() > > sample_labels = [ > > ['stat.ML', 'cs.LG'], > > ['cs.CV', 'cs.RO'] > > ] > > mlb.fit(sample_labels) > > print(mlb.classes_) > > > > Help would be very much appreciated here. > > > > Here's the dataset I had prepared: > > arXivdata.csv.zip > > > > I stripped away the double quotes in the labels after loading it in a > pandas DataFrame by - > > > > import re > > > > arxiv_data['labels'] = arxiv_data['labels'].str.replace(r"[\"]", '') > > > > scikit-learn version: '0.21.3' > > > > Sayak Paul | sayak.dev > > > > ------------------------------ > > Message: 2 > Date: Thu, 12 Sep 2019 10:06:30 +0200 > From: Guillaume Lema?tre > To: Scikit-learn mailing list > Subject: Re: [scikit-learn] Vote on SLEP009: keyword only arguments > Message-ID: > < > CACDxx9jCkE5GAjRNj3TKinbuyWZQvXMrrcHBBqn6q_FXYdPrbQ at mail.gmail.com> > Content-Type: text/plain; charset="utf-8" > > To the question: do we want to utilise Python 3's force-keyword-argument > syntax > and to change existing APIs which support arguments positionally to use > this > syntax, via a deprecation period? > > I am +1. > > IMO, even if the syntax might be unknown, it will remain unknown until > projects > from the ecosystem are not using it. > > To the question: which methods should be impacted? > > I think we should be as gentle as possible at first. I am a little > concerned about > breaking some codes which were working fine before. 
> > On Thu, 12 Sep 2019 at 04:43, Joel Nothman wrote: > > > These there details of specific API changes to be decided: > > > > The question being put, as per the SLEP, is: > > do we want to utilise Python 3's force-keyword-argument syntax > > and to change existing APIs which support arguments positionally to use > > this syntax, via a deprecation period? > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > -- > Guillaume Lemaitre > INRIA Saclay - Parietal team > Center for Data Science Paris-Saclay > https://glemaitre.github.io/ > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: < > http://mail.python.org/pipermail/scikit-learn/attachments/20190912/047eb83c/attachment-0001.html > > > > ------------------------------ > > Message: 3 > Date: Thu, 12 Sep 2019 09:23:03 -0300 > From: Alejandro Javier Peralta Frias > > To: scikit-learn at python.org > Subject: [scikit-learn] How can I enable line tracing for cython > modules. > Message-ID: > mgsgFpcmASzUhZA at mail.gmail.com> > Content-Type: text/plain; charset="utf-8" > > Hello all, > > To enable cython tracing (in particular I want to line trace neighbors > module) I understand that I have to recompile the cython modules with > CYTHON_TRACE=1 but I'm not sure where should I set this. > > Should I use: > > # distutils: define_macros=CYTHON_TRACE_NOGIL=1 > > > In the files I want to trace? > > Regards, > -- > Ale > -------------- next part -------------- > An HTML attachment was scrubbed... 
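[Editor's note on the Cython line-tracing question above: the usual recipe, following Cython's profiling documentation, is to combine the `linetrace` compiler directive with the `CYTHON_TRACE` (or `CYTHON_TRACE_NOGIL`) C macro. This is a build-configuration sketch only — the module and file names are made up, and details should be checked against your Cython version.]

```python
# At the top of the .pyx file you want to trace, you can use file-level
# directive comments:
#   # cython: linetrace=True
#   # distutils: define_macros=CYTHON_TRACE_NOGIL=1
#
# Or equivalently in setup.py, so the source files stay unchanged:
from setuptools import Extension, setup
from Cython.Build import cythonize

ext = Extension(
    "neighbors_traced",                         # hypothetical module name
    ["neighbors_traced.pyx"],                   # hypothetical source file
    define_macros=[("CYTHON_TRACE_NOGIL", "1")],
)
setup(ext_modules=cythonize(ext, compiler_directives={"linetrace": True}))
```

Both pieces are needed: the directive emits the tracing code, and the macro actually enables it at C compile time.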
> URL: < > http://mail.python.org/pipermail/scikit-learn/attachments/20190912/0377329b/attachment-0001.html > > > > ------------------------------ > > Subject: Digest Footer > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > ------------------------------ > > End of scikit-learn Digest, Vol 42, Issue 14 > ******************************************** > -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://mail.python.org/pipermail/scikit-learn/attachments/20190913/921c80cd/attachment.html > -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 16117 bytes Desc: not available URL: < http://mail.python.org/pipermail/scikit-learn/attachments/20190913/921c80cd/attachment.png > -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 7675 bytes Desc: not available URL: < http://mail.python.org/pipermail/scikit-learn/attachments/20190913/921c80cd/attachment-0001.png > ------------------------------ Subject: Digest Footer _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn ------------------------------ End of scikit-learn Digest, Vol 42, Issue 15 ******************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: From tmrsg11 at gmail.com Sat Sep 14 14:57:22 2019 From: tmrsg11 at gmail.com (C W) Date: Sat, 14 Sep 2019 14:57:22 -0400 Subject: [scikit-learn] Can Scikit-learn decision tree (CART) have both continuous and categorical features? In-Reply-To: References: Message-ID: Thanks, Guillaume. Column transformer looks pretty neat. I've also heard though, this pipeline can be tedious to set up? 
Specifying what you want for every feature is a pain. Jaiver, Actually, you guessed right. My real data has only one numerical variable, looks more like this: Gender Date Income Car Attendance Male 2019/3/01 10000 BMW Yes Female 2019/5/02 9000 Toyota No Male 2019/7/15 12000 Audi Yes I am predicting income using all other categorical variables. Maybe it is catboost! Thanks, M On Sat, Sep 14, 2019 at 9:25 AM Javier L?pez wrote: > If you have datasets with many categorical features, and perhaps many > categories, the tools in sklearn are quite limited, > but there are alternative implementations of boosted trees that are > designed with categorical features in mind. Take a look > at catboost [1], which has an sklearn-compatible API. > > J > > [1] https://catboost.ai/ > > On Sat, Sep 14, 2019 at 3:40 AM C W wrote: > >> Hello all, >> I'm very confused. Can the decision tree module handle both continuous >> and categorical features in the dataset? In this case, it's just CART >> (Classification and Regression Trees). >> >> For example, >> Gender Age Income Car Attendance >> Male 30 10000 BMW Yes >> Female 35 9000 Toyota No >> Male 50 12000 Audi Yes >> >> According to the documentation >> https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart, >> it can not! >> >> It says: "scikit-learn implementation does not support categorical >> variables for now". >> >> Is this true? If not, can someone point me to an example? If yes, what do >> people do? >> >> Thank you very much! >> >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... 
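[Editor's note: the ColumnTransformer workaround discussed in this thread can be sketched with plain scikit-learn — one-hot encode the categorical columns, then fit an ordinary tree on the encoded matrix. Toy data below is shaped like the table in the thread; the column choices are illustrative, not a recommendation for the poster's real dataset.]

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeRegressor

# Toy data resembling the thread's example: predict Income from
# categorical features only.
X = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female"],
    "Car": ["BMW", "Toyota", "Audi", "Toyota"],
    "Attendance": ["Yes", "No", "Yes", "Yes"],
})
y = [10000, 9000, 12000, 9500]

categorical = ["Gender", "Car", "Attendance"]
# One-hot encode the categorical columns; unseen categories at predict
# time are encoded as all-zeros rather than raising.
pre = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), categorical)]
)
model = Pipeline([
    ("pre", pre),
    ("tree", DecisionTreeRegressor(random_state=0)),
])
model.fit(X, y)
print(model.predict(X.head(1)))
```

Note the caveat raised elsewhere in the thread still applies: the tree only sees binary indicator columns, which is not the same as native categorical splits (the catboost-style handling).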
URL: From thomasjpfan at gmail.com Sat Sep 14 18:21:12 2019 From: thomasjpfan at gmail.com (Thomas J Fan) Date: Sat, 14 Sep 2019 18:21:12 -0400 Subject: [scikit-learn] Vote on SLEP009: keyword only arguments In-Reply-To: References: <20190912003716.738E42D0009B@webmail.sinamail.sina.com.cn> <29eccb39-b4f8-bd21-d6c8-005b9c2b087a@inria.fr> Message-ID: +1 from me On Sat, Sep 14, 2019 at 8:12 AM Joel Nothman wrote: > I am +1 for this change. > > I agree that users will accommodate the syntax sooner or later. > > On Fri., 13 Sep. 2019, 7:54 pm Jeremie du Boisberranger, < > jeremie.du-boisberranger at inria.fr> wrote: > >> I don't know what is the policy about a sklearn 1.0 w.r.t api changes. >> >> If it's meant to be a special release with possible api changes without >> deprecation cycles, I think this change is a good candidate for 1.0 >> >> >> Otherwise I'm +1 and agree with Guillaume, people will get used to it by >> using it. >> >> J?r?mie >> >> >> >> On 12/09/2019 10:06, Guillaume Lema?tre wrote: >> >> To the question: do we want to utilise Python 3's force-keyword-argument >> syntax >> and to change existing APIs which support arguments positionally to use >> this >> syntax, via a deprecation period? >> >> I am +1. >> >> IMO, even if the syntax might be unknown, it will remain unknown until >> projects >> from the ecosystem are not using it. >> >> To the question: which methods should be impacted? >> >> I think we should be as gentle as possible at first. I am a little >> concerned about >> breaking some codes which were working fine before. >> >> On Thu, 12 Sep 2019 at 04:43, Joel Nothman >> wrote: >> >>> These there details of specific API changes to be decided: >>> >>> The question being put, as per the SLEP, is: >>> do we want to utilise Python 3's force-keyword-argument syntax >>> and to change existing APIs which support arguments positionally to use >>> this syntax, via a deprecation period? 
>>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >> >> >> -- >> Guillaume Lemaitre >> INRIA Saclay - Parietal team >> Center for Data Science Paris-Saclay >> https://glemaitre.github.io/ >> >> _______________________________________________ >> scikit-learn mailing listscikit-learn at python.orghttps://mail.python.org/mailman/listinfo/scikit-learn >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.lemaitre58 at gmail.com Sun Sep 15 08:16:29 2019 From: g.lemaitre58 at gmail.com (=?UTF-8?Q?Guillaume_Lema=C3=AEtre?=) Date: Sun, 15 Sep 2019 14:16:29 +0200 Subject: [scikit-learn] Can Scikit-learn decision tree (CART) have both continuous and categorical features? In-Reply-To: References: Message-ID: On Sat, 14 Sep 2019 at 20:59, C W wrote: > Thanks, Guillaume. > Column transformer looks pretty neat. I've also heard though, this > pipeline can be tedious to set up? Specifying what you want for every > feature is a pain. > It would be interesting for us which part of the pipeline is tedious to set up to know if we can improve something there. Do you mean, that you would like to automatically detect of which type of feature (categorical/numerical) and apply a default encoder/scaling such as discuss there: https://github.com/scikit-learn/scikit-learn/issues/10603#issuecomment-401155127 IMO, one a user perspective, it would be cleaner in some cases at the cost of applying blindly a black box which might be dangerous. > > Jaiver, > Actually, you guessed right. 
My real data has only one numerical > variable, looks more like this: > > Gender Date Income Car Attendance > Male 2019/3/01 10000 BMW Yes > Female 2019/5/02 9000 Toyota No > Male 2019/7/15 12000 Audi Yes > > I am predicting income using all other categorical variables. Maybe it is > catboost! > > Thanks, > > M > > > > > > > On Sat, Sep 14, 2019 at 9:25 AM Javier L?pez wrote: > >> If you have datasets with many categorical features, and perhaps many >> categories, the tools in sklearn are quite limited, >> but there are alternative implementations of boosted trees that are >> designed with categorical features in mind. Take a look >> at catboost [1], which has an sklearn-compatible API. >> >> J >> >> [1] https://catboost.ai/ >> >> On Sat, Sep 14, 2019 at 3:40 AM C W wrote: >> >>> Hello all, >>> I'm very confused. Can the decision tree module handle both continuous >>> and categorical features in the dataset? In this case, it's just CART >>> (Classification and Regression Trees). >>> >>> For example, >>> Gender Age Income Car Attendance >>> Male 30 10000 BMW Yes >>> Female 35 9000 Toyota No >>> Male 50 12000 Audi Yes >>> >>> According to the documentation >>> https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart, >>> it can not! >>> >>> It says: "scikit-learn implementation does not support categorical >>> variables for now". >>> >>> Is this true? If not, can someone point me to an example? If yes, what >>> do people do? >>> >>> Thank you very much! 
>>> >>> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -- Guillaume Lemaitre INRIA Saclay - Parietal team Center for Data Science Paris-Saclay https://glemaitre.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From spsayakpaul at gmail.com Mon Sep 16 00:12:24 2019 From: spsayakpaul at gmail.com (Sayak Paul) Date: Mon, 16 Sep 2019 09:42:24 +0530 Subject: [scikit-learn] MultiBinarizer issue Message-ID: I am working on a multi-label text classification problem. In order to encode the labels, I am using MultiLabelBinarizer. The labels of the dataset look like - [cs.AI, cs.CL, cs.CV, cs.NE, stat.ML][cs.CL, cs.AI, cs.LG, cs.NE, stat.ML][cs.CL, cs.AI, cs.LG, cs.NE, stat.ML][stat.ML, cs.AI, cs.CL, cs.LG, cs.NE][cs.CL, cs.AI, cs.LG, cs.NE, stat.ML] When I am using mlb = MultiLabelBinarizer() mlb.fit(labels)print(mlb.classes_) It gives me - array([' ', ',', '.', 'A', 'B', 'C', 'D', 'E', 'G', 'H', 'I', 'L', 'M', 'N', 'O', 'P', 'R', 'S', 'T', 'V', 'Y', '[', ']', 'a', 'c', 'h', 'm', 's', 't'], dtype=object) I (partially) fixed this problem by mlb.fit([y_train]) and I got (I printed first 10 classes) - array(['[cs.AI, cs.CC]', '[cs.AI, cs.CV]', '[cs.AI, cs.CY]', '[cs.AI, cs.DB]', '[cs.AI, cs.DS]', '[cs.AI, cs.GT]', '[cs.AI, cs.HC]', '[cs.AI, cs.IR]', '[cs.AI, cs.LG, stat.ML]', '[cs.AI, cs.LG]'], dtype=object) Ideally, it should output the individual classes (there may be something wrong in my code). 
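[Editor's note: the character-level classes_ reported here are what MultiLabelBinarizer produces when it is fed plain strings — fit iterates over each label, so a string yields its characters, while wrapping the whole column as [y_train] makes each full string one class. A sketch of the usual fix, assuming the labels are stored as strings like those shown above; the parsing line is specific to that format.]

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Labels as read from a CSV: one string per sample, not a list of tags.
y_train = [
    "[cs.AI, cs.CL, cs.CV]",
    "[cs.CL, cs.LG, stat.ML]",
]

# Parse each string into an actual list of tags before fitting.
labels = [s.strip("[]").split(", ") for s in y_train]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)
print(mlb.classes_)  # individual tags such as 'cs.AI', not characters
print(Y)             # one indicator column per tag
```

With real data it may be safer to parse with a proper parser (e.g. splitting on commas and stripping whitespace per token) rather than the one-liner above, but the key point is that fit must receive an iterable of iterables of labels.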
When I am using mlb.fit_transform([y_train]), I am getting - array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]) Help would be very much appreciated. Here's the corresponding StackOverflow issue: https://stackoverflow.com/questions/57917936/multilabelbinarizer-gives-individual-characters-instead-of-the-classes Sayak Paul | sayak.dev -------------- next part -------------- An HTML attachment was scrubbed... URL: From zephyr14 at gmail.com Mon Sep 16 06:02:18 2019 From: zephyr14 at gmail.com (Vlad Niculae) Date: Mon, 16 Sep 2019 11:02:18 +0100 Subject: [scikit-learn] Vote on SLEP009: keyword only arguments In-Reply-To: References: <20190912003716.738E42D0009B@webmail.sinamail.sina.com.cn> <29eccb39-b4f8-bd21-d6c8-005b9c2b087a@inria.fr> Message-ID: I vote +1 Hopefully keyword-only args become normalized and a future will come where I won't see `x.sum(0)` anymore VN On Sat, Sep 14, 2019 at 11:23 PM Thomas J Fan wrote: > +1 from me > > On Sat, Sep 14, 2019 at 8:12 AM Joel Nothman > wrote: > >> I am +1 for this change. >> >> I agree that users will accommodate the syntax sooner or later. >> >> On Fri., 13 Sep. 2019, 7:54 pm Jeremie du Boisberranger, < >> jeremie.du-boisberranger at inria.fr> wrote: >> >>> I don't know what is the policy about a sklearn 1.0 w.r.t api changes. >>> >>> If it's meant to be a special release with possible api changes without >>> deprecation cycles, I think this change is a good candidate for 1.0 >>> >>> >>> Otherwise I'm +1 and agree with Guillaume, people will get used to it by >>> using it. 
>>> >>> J?r?mie >>> >>> >>> >>> On 12/09/2019 10:06, Guillaume Lema?tre wrote: >>> >>> To the question: do we want to utilise Python 3's force-keyword-argument >>> syntax >>> and to change existing APIs which support arguments positionally to use >>> this >>> syntax, via a deprecation period? >>> >>> I am +1. >>> >>> IMO, even if the syntax might be unknown, it will remain unknown until >>> projects >>> from the ecosystem are not using it. >>> >>> To the question: which methods should be impacted? >>> >>> I think we should be as gentle as possible at first. I am a little >>> concerned about >>> breaking some codes which were working fine before. >>> >>> On Thu, 12 Sep 2019 at 04:43, Joel Nothman >>> wrote: >>> >>>> These there details of specific API changes to be decided: >>>> >>>> The question being put, as per the SLEP, is: >>>> do we want to utilise Python 3's force-keyword-argument syntax >>>> and to change existing APIs which support arguments positionally to use >>>> this syntax, via a deprecation period? 
>>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>> >>> >>> -- >>> Guillaume Lemaitre >>> INRIA Saclay - Parietal team >>> Center for Data Science Paris-Saclay >>> https://glemaitre.github.io/ >>> >>> _______________________________________________ >>> scikit-learn mailing listscikit-learn at python.orghttps://mail.python.org/mailman/listinfo/scikit-learn >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rth.yurchak at gmail.com Mon Sep 16 06:04:25 2019 From: rth.yurchak at gmail.com (Roman Yurchak) Date: Mon, 16 Sep 2019 12:04:25 +0200 Subject: [scikit-learn] Vote on SLEP009: keyword only arguments In-Reply-To: References: <20190912003716.738E42D0009B@webmail.sinamail.sina.com.cn> <29eccb39-b4f8-bd21-d6c8-005b9c2b087a@inria.fr> Message-ID: <80a7e620-dd10-bc09-22a6-9011d455648b@gmail.com> +1 assuming we are careful about continuing to allow some frequently used positional arguments, even in __init__. For instance, n_components = 10 pca = PCA(n_components) is still more readable, I think, than, pca = PCA(n_components=n_components) -- Roman On 15/09/2019 00:21, Thomas J Fan wrote: > +1 from me > > On Sat, Sep 14, 2019 at 8:12 AM Joel Nothman > wrote: > > I am +1 for this change. > > I agree that users will accommodate the syntax sooner or later. > > On Fri., 13 Sep. 
2019, 7:54 pm Jeremie du Boisberranger, > > wrote: > > I don't know what is the policy about a sklearn 1.0 w.r.t api > changes. > > If it's meant to be a special release with possible api changes > without deprecation cycles, I think this change is a good > candidate for 1.0 > > > Otherwise I'm +1 and agree with Guillaume, people will get used > to it by using it. > > J?r?mie > > > > On 12/09/2019 10:06, Guillaume Lema?tre wrote: >> To the question: do we want to?utilise Python 3's >> force-keyword-argument syntax >> and to change existing APIs which support arguments >> positionally to use this >> syntax, via a deprecation period? >> >> I am +1. >> >> IMO, even if the syntax might be unknown, it will remain >> unknown until projects >> from the ecosystem are not using it. >> >> To the question: which methods should be impacted? >> >> I think we should be as gentle as possible at first. I am a >> little concerned about >> breaking some codes which were working fine before. >> >> On Thu, 12 Sep 2019 at 04:43, Joel Nothman >> > wrote: >> >> These there details of specific API changes to be decided: >> >> The question being put, as per the SLEP, is: >> do we want to?utilise Python 3's force-keyword-argument syntax >> and to change existing APIs which support arguments >> positionally to use this syntax, via a deprecation period? 
>> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> >> >> -- >> Guillaume Lemaitre >> INRIA Saclay - Parietal team >> Center for Data Science Paris-Saclay >> https://glemaitre.github.io/ >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > From joel.nothman at gmail.com Mon Sep 16 09:28:57 2019 From: joel.nothman at gmail.com (Joel Nothman) Date: Mon, 16 Sep 2019 23:28:57 +1000 Subject: [scikit-learn] Vote on SLEP009: keyword only arguments In-Reply-To: <80a7e620-dd10-bc09-22a6-9011d455648b@gmail.com> References: <20190912003716.738E42D0009B@webmail.sinamail.sina.com.cn> <29eccb39-b4f8-bd21-d6c8-005b9c2b087a@inria.fr> <80a7e620-dd10-bc09-22a6-9011d455648b@gmail.com> Message-ID: Btw, consensus is defined by 2/3 of cast votes by core devs, according to our Governance. https://scikit-learn.org/dev/about.html#authors lists 20 core devs. That is, we could consider this resolved after 14 votes in favour. So far, if I've interpreted correctly: +1 (adrin, nicolas, hanmin, joel, guillaume, jeremie, thomas, vlad, roman) = 9. I've not understood a clear position from Alex. I'm assuming Andreas is in favour given his comments elsewhere, but we've not seen comment here. 
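[Editor's note for readers unfamiliar with the syntax under vote: a bare `*` in a function signature makes every parameter after it keyword-only. A minimal sketch — the class and parameter names are made up for illustration, not sklearn's actual plan:]

```python
# Everything after the bare `*` must be passed by keyword.
class Model:
    def fit(self, X, y, *, sample_weight=None):
        self.fitted_ = True
        return self

m = Model()
m.fit([[0], [1]], [0, 1], sample_weight=[1.0, 1.0])  # fine

try:
    m.fit([[0], [1]], [0, 1], [1.0, 1.0])  # positional -> TypeError
except TypeError as exc:
    print("rejected:", exc)
```

This is why the SLEP needs a deprecation period: code that currently passes such arguments positionally would start raising TypeError once the `*` is introduced.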
On Mon, 16 Sep 2019 at 20:06, Roman Yurchak wrote: > +1 assuming we are careful about continuing to allow some frequently > used positional arguments, even in __init__. > > For instance, > > n_components = 10 > pca = PCA(n_components) > > is still more readable, I think, than, > > pca = PCA(n_components=n_components) > > > -- > Roman > > On 15/09/2019 00:21, Thomas J Fan wrote: > > +1 from me > > > > On Sat, Sep 14, 2019 at 8:12 AM Joel Nothman > > wrote: > > > > I am +1 for this change. > > > > I agree that users will accommodate the syntax sooner or later. > > > > On Fri., 13 Sep. 2019, 7:54 pm Jeremie du Boisberranger, > > > > wrote: > > > > I don't know what is the policy about a sklearn 1.0 w.r.t api > > changes. > > > > If it's meant to be a special release with possible api changes > > without deprecation cycles, I think this change is a good > > candidate for 1.0 > > > > > > Otherwise I'm +1 and agree with Guillaume, people will get used > > to it by using it. > > > > J?r?mie > > > > > > > > On 12/09/2019 10:06, Guillaume Lema?tre wrote: > >> To the question: do we want to utilise Python 3's > >> force-keyword-argument syntax > >> and to change existing APIs which support arguments > >> positionally to use this > >> syntax, via a deprecation period? > >> > >> I am +1. > >> > >> IMO, even if the syntax might be unknown, it will remain > >> unknown until projects > >> from the ecosystem are not using it. > >> > >> To the question: which methods should be impacted? > >> > >> I think we should be as gentle as possible at first. I am a > >> little concerned about > >> breaking some codes which were working fine before. 
> >> > >> On Thu, 12 Sep 2019 at 04:43, Joel Nothman > >> > wrote: > >> > >> These there details of specific API changes to be decided: > >> > >> The question being put, as per the SLEP, is: > >> do we want to utilise Python 3's force-keyword-argument > syntax > >> and to change existing APIs which support arguments > >> positionally to use this syntax, via a deprecation period? > >> _______________________________________________ > >> scikit-learn mailing list > >> scikit-learn at python.org > >> https://mail.python.org/mailman/listinfo/scikit-learn > >> > >> > >> > >> -- > >> Guillaume Lemaitre > >> INRIA Saclay - Parietal team > >> Center for Data Science Paris-Saclay > >> https://glemaitre.github.io/ > >> > >> _______________________________________________ > >> scikit-learn mailing list > >> scikit-learn at python.org > >> https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... 
URL:
From bertrand.thirion at inria.fr  Mon Sep 16 12:58:42 2019
From: bertrand.thirion at inria.fr (Bertrand Thirion)
Date: Mon, 16 Sep 2019 18:58:42 +0200 (CEST)
Subject: [scikit-learn] Vote on SLEP009: keyword only arguments
In-Reply-To: References: <20190912003716.738E42D0009B@webmail.sinamail.sina.com.cn> <29eccb39-b4f8-bd21-d6c8-005b9c2b087a@inria.fr> <80a7e620-dd10-bc09-22a6-9011d455648b@gmail.com>
Message-ID: <562566079.46061430.1568653122836.JavaMail.zimbra@inria.fr>

+1 for the generalization of kw arguments. This is obviously relevant for __init__ methods, while good old (X, y) should remain positional.

Best,
Bertrand

> From: "Joel Nothman"
> To: "Scikit-learn mailing list"
> Sent: Monday, 16 September 2019 15:28:57
> Subject: Re: [scikit-learn] Vote on SLEP009: keyword only arguments

> Btw, consensus is defined by 2/3 of cast votes by core devs, according to our
> Governance. https://scikit-learn.org/dev/about.html#authors lists 20 core devs.
> That is, we could consider this resolved after 14 votes in favour.
> So far, if I've interpreted correctly:
> +1 (adrin, nicolas, hanmin, joel, guillaume, jeremie, thomas, vlad, roman) = 9.
> I've not understood a clear position from Alex. I'm assuming Andreas is in
> favour given his comments elsewhere, but we've not seen comment here.

> On Mon, 16 Sep 2019 at 20:06, Roman Yurchak <rth.yurchak at gmail.com> wrote:
>> +1 assuming we are careful about continuing to allow some frequently
>> used positional arguments, even in __init__.
>> For instance,
>> n_components = 10
>> pca = PCA(n_components)
>> is still more readable, I think, than,
>> pca = PCA(n_components=n_components)
>> --
>> Roman
>> On 15/09/2019 00:21, Thomas J Fan wrote:
>> > +1 from me
>> > On Sat, Sep 14, 2019 at 8:12 AM Joel Nothman <joel.nothman at gmail.com> wrote:
>> > I am +1 for this change.
>> > I agree that users will accommodate the syntax sooner or later. >> > On Fri., 13 Sep. 2019, 7:54 pm Jeremie du Boisberranger, >>> < [ mailto:jeremie.du-boisberranger at inria.fr | jeremie.du-boisberranger at inria.fr >> > ] >>> > > jeremie.du-boisberranger at inria.fr ] >> wrote: >> > I don't know what is the policy about a sklearn 1.0 w.r.t api >> > changes. >> > If it's meant to be a special release with possible api changes >> > without deprecation cycles, I think this change is a good >> > candidate for 1.0 >> > Otherwise I'm +1 and agree with Guillaume, people will get used >> > to it by using it. >> > J?r?mie >> > On 12/09/2019 10:06, Guillaume Lema?tre wrote: >> >> To the question: do we want to utilise Python 3's >> >> force-keyword-argument syntax >> >> and to change existing APIs which support arguments >> >> positionally to use this >> >> syntax, via a deprecation period? >> >> I am +1. >> >> IMO, even if the syntax might be unknown, it will remain >> >> unknown until projects >> >> from the ecosystem are not using it. >> >> To the question: which methods should be impacted? >> >> I think we should be as gentle as possible at first. I am a >> >> little concerned about >> >> breaking some codes which were working fine before. >> >> On Thu, 12 Sep 2019 at 04:43, Joel Nothman >>>> < [ mailto:joel.nothman at gmail.com | joel.nothman at gmail.com ] > >> mailto:joel.nothman at gmail.com | joel.nothman at gmail.com ] >> wrote: >> >> These there details of specific API changes to be decided: >> >> The question being put, as per the SLEP, is: >> >> do we want to utilise Python 3's force-keyword-argument syntax >> >> and to change existing APIs which support arguments >> >> positionally to use this syntax, via a deprecation period? 
>> >> _______________________________________________ >> >> scikit-learn mailing list >>>> [ mailto:scikit-learn at python.org | scikit-learn at python.org ] > >> mailto:scikit-learn at python.org | scikit-learn at python.org ] > >>>> [ https://mail.python.org/mailman/listinfo/scikit-learn | >> >> https://mail.python.org/mailman/listinfo/scikit-learn ] >> >> -- >> >> Guillaume Lemaitre >> >> INRIA Saclay - Parietal team >> >> Center for Data Science Paris-Saclay >> >> [ https://glemaitre.github.io/ | https://glemaitre.github.io/ ] >> >> _______________________________________________ >> >> scikit-learn mailing list >>>> [ mailto:scikit-learn at python.org | scikit-learn at python.org ] > >> mailto:scikit-learn at python.org | scikit-learn at python.org ] > >>>> [ https://mail.python.org/mailman/listinfo/scikit-learn | >> >> https://mail.python.org/mailman/listinfo/scikit-learn ] >> > _______________________________________________ >> > scikit-learn mailing list >>> [ mailto:scikit-learn at python.org | scikit-learn at python.org ] > > mailto:scikit-learn at python.org | scikit-learn at python.org ] > >>> [ https://mail.python.org/mailman/listinfo/scikit-learn | >> > https://mail.python.org/mailman/listinfo/scikit-learn ] >> > _______________________________________________ >> > scikit-learn mailing list >>> [ mailto:scikit-learn at python.org | scikit-learn at python.org ] > > mailto:scikit-learn at python.org | scikit-learn at python.org ] > >>> [ https://mail.python.org/mailman/listinfo/scikit-learn | >> > https://mail.python.org/mailman/listinfo/scikit-learn ] >> > _______________________________________________ >> > scikit-learn mailing list >> > [ mailto:scikit-learn at python.org | scikit-learn at python.org ] >>> [ https://mail.python.org/mailman/listinfo/scikit-learn | >> > https://mail.python.org/mailman/listinfo/scikit-learn ] >> _______________________________________________ >> scikit-learn mailing list >> [ mailto:scikit-learn at python.org | 
scikit-learn at python.org ]
>> https://mail.python.org/mailman/listinfo/scikit-learn
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From tom.duprelatour at orange.fr  Mon Sep 16 14:00:58 2019
From: tom.duprelatour at orange.fr (Tom DLT)
Date: Mon, 16 Sep 2019 11:00:58 -0700
Subject: [scikit-learn] Vote on SLEP009: keyword only arguments
In-Reply-To: References: <20190912003716.738E42D0009B@webmail.sinamail.sina.com.cn> <29eccb39-b4f8-bd21-d6c8-005b9c2b087a@inria.fr> <80a7e620-dd10-bc09-22a6-9011d455648b@gmail.com>
Message-ID:

I vote +1

Tom

On Mon, Sep 16, 2019 at 06:30, Joel Nothman wrote:
> Btw, consensus is defined by 2/3 of cast votes by core devs, according to
> our Governance. https://scikit-learn.org/dev/about.html#authors lists 20
> core devs.
>
> That is, we could consider this resolved after 14 votes in favour.
>
> So far, if I've interpreted correctly:
>
> +1 (adrin, nicolas, hanmin, joel, guillaume, jeremie, thomas, vlad, roman)
> = 9.
>
> I've not understood a clear position from Alex. I'm assuming Andreas is in
> favour given his comments elsewhere, but we've not seen comment here.
>
> On Mon, 16 Sep 2019 at 20:06, Roman Yurchak wrote:
>
>> +1 assuming we are careful about continuing to allow some frequently
>> used positional arguments, even in __init__.
>>
>> For instance,
>>
>> n_components = 10
>> pca = PCA(n_components)
>>
>> is still more readable, I think, than,
>>
>> pca = PCA(n_components=n_components)
>>
>> --
>> Roman
>>
>> On 15/09/2019 00:21, Thomas J Fan wrote:
>> > +1 from me
>> >
>> > On Sat, Sep 14, 2019 at 8:12 AM Joel Nothman wrote:
>> >
>> > I am +1 for this change.
>> >
>> > I agree that users will accommodate the syntax sooner or later.
>> > >> > On Fri., 13 Sep. 2019, 7:54 pm Jeremie du Boisberranger, >> > > > > wrote: >> > >> > I don't know what is the policy about a sklearn 1.0 w.r.t api >> > changes. >> > >> > If it's meant to be a special release with possible api changes >> > without deprecation cycles, I think this change is a good >> > candidate for 1.0 >> > >> > >> > Otherwise I'm +1 and agree with Guillaume, people will get used >> > to it by using it. >> > >> > J?r?mie >> > >> > >> > >> > On 12/09/2019 10:06, Guillaume Lema?tre wrote: >> >> To the question: do we want to utilise Python 3's >> >> force-keyword-argument syntax >> >> and to change existing APIs which support arguments >> >> positionally to use this >> >> syntax, via a deprecation period? >> >> >> >> I am +1. >> >> >> >> IMO, even if the syntax might be unknown, it will remain >> >> unknown until projects >> >> from the ecosystem are not using it. >> >> >> >> To the question: which methods should be impacted? >> >> >> >> I think we should be as gentle as possible at first. I am a >> >> little concerned about >> >> breaking some codes which were working fine before. >> >> >> >> On Thu, 12 Sep 2019 at 04:43, Joel Nothman >> >> > >> wrote: >> >> >> >> These there details of specific API changes to be decided: >> >> >> >> The question being put, as per the SLEP, is: >> >> do we want to utilise Python 3's force-keyword-argument >> syntax >> >> and to change existing APIs which support arguments >> >> positionally to use this syntax, via a deprecation period? 
>> >> _______________________________________________ >> >> scikit-learn mailing list >> >> scikit-learn at python.org >> >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> >> >> >> >> >> >> -- >> >> Guillaume Lemaitre >> >> INRIA Saclay - Parietal team >> >> Center for Data Science Paris-Saclay >> >> https://glemaitre.github.io/ >> >> >> >> _______________________________________________ >> >> scikit-learn mailing list >> >> scikit-learn at python.org >> >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ >> > scikit-learn mailing list >> > scikit-learn at python.org >> > https://mail.python.org/mailman/listinfo/scikit-learn >> > >> > _______________________________________________ >> > scikit-learn mailing list >> > scikit-learn at python.org >> > https://mail.python.org/mailman/listinfo/scikit-learn >> > >> > >> > _______________________________________________ >> > scikit-learn mailing list >> > scikit-learn at python.org >> > https://mail.python.org/mailman/listinfo/scikit-learn >> > >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... 
URL:
From gael.varoquaux at normalesup.org  Mon Sep 16 15:32:37 2019
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Mon, 16 Sep 2019 15:32:37 -0400
Subject: [scikit-learn] Vote on SLEP009: keyword only arguments
In-Reply-To: References: <20190912003716.738E42D0009B@webmail.sinamail.sina.com.cn> <29eccb39-b4f8-bd21-d6c8-005b9c2b087a@inria.fr> <80a7e620-dd10-bc09-22a6-9011d455648b@gmail.com>
Message-ID: <20190916193237.7qrx5med5ijqscsp@phare.normalesup.org>

On Mon, Sep 16, 2019 at 11:28:57PM +1000, Joel Nothman wrote:
> That is, we could consider this resolved after 14 votes in favour.
> So far, if I've interpreted correctly:
> +1 (adrin, nicolas, hanmin, joel, guillaume, jeremie, thomas, vlad, roman) = 9.
> I've not understood a clear position from Alex. I'm assuming Andreas is in
> favour given his comments elsewhere, but we've not seen comment here.

I was planning to vote -0, mostly to avoid the vote seeming like a bandwagon (and because I am not fully sold on the idea), but I actually want this to move forward, and it seems that my vote is needed.

Hence, I vote +1.

Hopefully Andreas and Alex make their position clear and we can adopt the SLEP.

Thank you to you all.

Gaël

> On Mon, 16 Sep 2019 at 20:06, Roman Yurchak wrote:
> +1 assuming we are careful about continuing to allow some frequently
> used positional arguments, even in __init__.
> For instance, > n_components = 10 > pca = PCA(n_components) > is still more readable, I think, than, > pca = PCA(n_components=n_components) -- Gael Varoquaux Research Director, INRIA http://gael-varoquaux.info http://twitter.com/GaelVaroquaux From albertthomas88 at gmail.com Mon Sep 16 16:22:48 2019 From: albertthomas88 at gmail.com (Albert Thomas) Date: Mon, 16 Sep 2019 22:22:48 +0200 Subject: [scikit-learn] Vote on SLEP009: keyword only arguments In-Reply-To: <20190916193237.7qrx5med5ijqscsp@phare.normalesup.org> References: <20190912003716.738E42D0009B@webmail.sinamail.sina.com.cn> <29eccb39-b4f8-bd21-d6c8-005b9c2b087a@inria.fr> <80a7e620-dd10-bc09-22a6-9011d455648b@gmail.com> <20190916193237.7qrx5med5ijqscsp@phare.normalesup.org> Message-ID: Hi all, Just a few comments about this SLEP from a contributor and user of the library :). I think it is important for users to be able to quickly and easily know/learn which arguments should be keyword arguments when they use scikit-learn. As a user, I do not want to have to double check each time I use a function the arguments that should be keyword arguments. Hence the following sentence of the SLEP "the decision for these methods should be the same throughout the library in order to keep a consistent interface to the user" is very important to me. Also how is this going to be rendered by sphinx in the doc? (before numpydoc supports section for parameters) Thanks, Albert On Mon, Sep 16, 2019 at 9:33 PM Gael Varoquaux < gael.varoquaux at normalesup.org> wrote: > On Mon, Sep 16, 2019 at 11:28:57PM +1000, Joel Nothman wrote: > > That is, we could consider this resolved after 14 votes in favour. > > > So far, if I've interpreted correctly: > > > +1 (adrin, nicolas, hanmin, joel, guillaume, jeremie, thomas, vlad, > roman) = 9. > > > I've not understood a clear position from Alex. I'm assuming Andreas is > in > > favour given his comments elsewhere, but we've not seen comment here. 
> > I was planning to vote -0 mostly to avoid the vote to seem like bandwagon > (and because I am not fully sold on the idea), but I actually want this > to move forward, and it seems that my vote is needed. > > Hence, I vote +1. > > Hopefully Andreas and Alex make their position clear and we can adopt the > SLEP. > > Thank you to you all. > > Ga?l > > > On Mon, 16 Sep 2019 at 20:06, Roman Yurchak > wrote: > > > +1 assuming we are careful about continuing to allow some frequently > > used positional arguments, even in __init__. > > > For instance, > > > n_components = 10 > > pca = PCA(n_components) > > > is still more readable, I think, than, > > > pca = PCA(n_components=n_components) > -- > Gael Varoquaux > Research Director, INRIA > http://gael-varoquaux.info http://twitter.com/GaelVaroquaux > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexandre.gramfort at inria.fr Tue Sep 17 02:09:40 2019 From: alexandre.gramfort at inria.fr (Alexandre Gramfort) Date: Tue, 17 Sep 2019 08:09:40 +0200 Subject: [scikit-learn] Vote on SLEP009: keyword only arguments In-Reply-To: References: <20190912003716.738E42D0009B@webmail.sinamail.sina.com.cn> <29eccb39-b4f8-bd21-d6c8-005b9c2b087a@inria.fr> <80a7e620-dd10-bc09-22a6-9011d455648b@gmail.com> <20190916193237.7qrx5med5ijqscsp@phare.normalesup.org> Message-ID: Yes I am +1 for positional arguments for the __init__ of the estimators. Alex On Mon, Sep 16, 2019 at 10:25 PM Albert Thomas wrote: > Hi all, > > Just a few comments about this SLEP from a contributor and user of the > library :). > > I think it is important for users to be able to quickly and easily > know/learn which arguments should be keyword arguments when they use > scikit-learn. 
As a user, I do not want to have to double check each time I > use a function the arguments that should be keyword arguments. Hence the > following sentence of the SLEP "the decision for these methods should be > the same throughout the library in order to keep a consistent interface to > the user" is very important to me. Also how is this going to be > rendered by sphinx in the doc? (before numpydoc supports section for > parameters) > > Thanks, > Albert > > > On Mon, Sep 16, 2019 at 9:33 PM Gael Varoquaux < > gael.varoquaux at normalesup.org> wrote: > >> On Mon, Sep 16, 2019 at 11:28:57PM +1000, Joel Nothman wrote: >> > That is, we could consider this resolved after 14 votes in favour. >> >> > So far, if I've interpreted correctly: >> >> > +1 (adrin, nicolas, hanmin, joel, guillaume, jeremie, thomas, vlad, >> roman) = 9. >> >> > I've not understood a clear position from Alex. I'm assuming Andreas is >> in >> > favour given his comments elsewhere, but we've not seen comment here. >> >> I was planning to vote -0 mostly to avoid the vote to seem like bandwagon >> (and because I am not fully sold on the idea), but I actually want this >> to move forward, and it seems that my vote is needed. >> >> Hence, I vote +1. >> >> Hopefully Andreas and Alex make their position clear and we can adopt the >> SLEP. >> >> Thank you to you all. >> >> Ga?l >> >> > On Mon, 16 Sep 2019 at 20:06, Roman Yurchak >> wrote: >> >> > +1 assuming we are careful about continuing to allow some frequently >> > used positional arguments, even in __init__. 
>> >> > For instance, >> >> > n_components = 10 >> > pca = PCA(n_components) >> >> > is still more readable, I think, than, >> >> > pca = PCA(n_components=n_components) >> -- >> Gael Varoquaux >> Research Director, INRIA >> http://gael-varoquaux.info >> http://twitter.com/GaelVaroquaux >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From joel.nothman at gmail.com Tue Sep 17 03:42:56 2019 From: joel.nothman at gmail.com (Joel Nothman) Date: Tue, 17 Sep 2019 17:42:56 +1000 Subject: [scikit-learn] Vote on SLEP009: keyword only arguments In-Reply-To: References: <20190912003716.738E42D0009B@webmail.sinamail.sina.com.cn> <29eccb39-b4f8-bd21-d6c8-005b9c2b087a@inria.fr> <80a7e620-dd10-bc09-22a6-9011d455648b@gmail.com> <20190916193237.7qrx5med5ijqscsp@phare.normalesup.org> Message-ID: I think you mean keyword-only, Alex On Tue., 17 Sep. 2019, 4:11 pm Alexandre Gramfort, < alexandre.gramfort at inria.fr> wrote: > Yes I am +1 for positional arguments for the __init__ of the estimators. > > Alex > Albert: my position when reviewing changes in accordance with this SLEP would be to (a) perhaps get usage evidence as discussed in the SLEP pull request review; and (b) apply a rule of thumb like "are the semantics reasonably clear when the argument is passed positionally?" I think they are clear for PCA's components, for Pipeline's steps, and for GridSearchCV's estimator and parameter grid. Other parameters of those estimators seem more suitable for keyword-only. Trickier is whether n_components in TSNE should follow PCA in being positional... It's not as commonly set by users. 
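For readers following along, the Python 3 syntax being voted on can be sketched as below. The class and parameter defaults here are illustrative only, not scikit-learn's actual PCA signature: everything after the bare `*` in the `def` can only be passed by keyword.

```python
# Minimal sketch of Python 3's keyword-only syntax (PEP 3102).
# Parameters after the bare * must be passed by keyword; n_components
# stays positional because its meaning is clear without a keyword.
class PCA:
    def __init__(self, n_components=None, *, copy=True, whiten=False):
        self.n_components = n_components
        self.copy = copy
        self.whiten = whiten

pca = PCA(10)              # common, unambiguous: stays positional
pca = PCA(10, copy=False)  # keyword-only parameters need the keyword
try:
    PCA(10, False)         # positional use of `copy` is rejected
except TypeError as err:
    print("TypeError:", err)
```

With this syntax the interpreter itself enforces the convention, so there is no ambiguity for users about which call forms are allowed.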
> -------------- next part --------------
An HTML attachment was scrubbed...
URL:
From joel.nothman at gmail.com  Tue Sep 17 19:28:30 2019
From: joel.nothman at gmail.com (Joel Nothman)
Date: Wed, 18 Sep 2019 09:28:30 +1000
Subject: [scikit-learn] Vote on SLEP009: keyword only arguments
In-Reply-To: References: <20190912003716.738E42D0009B@webmail.sinamail.sina.com.cn> <29eccb39-b4f8-bd21-d6c8-005b9c2b087a@inria.fr> <80a7e620-dd10-bc09-22a6-9011d455648b@gmail.com> <20190916193237.7qrx5med5ijqscsp@phare.normalesup.org>
Message-ID:

If we were to assume Andy's vote in the positive, him having been a major proponent of this change, we would say this was accepted by a unanimous vote of a majority of core developers.

Having tentatively accepted it is a good enough basis for us to start implementation, ideally with usage statistics to guide that.

We should tackle this module by module, perhaps working through estimators before other public API.

As such, I have opened https://github.com/scikit-learn/scikit-learn/issues/15005 to start tracking this work.

Thanks everyone, and Andy, we await your vote!

J
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From niourf at gmail.com  Wed Sep 18 10:29:20 2019
From: niourf at gmail.com (Nicolas Hug)
Date: Wed, 18 Sep 2019 10:29:20 -0400
Subject: [scikit-learn] Monthly meetings between core developers + "Hello World"
In-Reply-To: <136faf1a-5514-1c21-7514-0673b4ddde81@gmail.com>
References: <20190718100451.B608918C0090@webmail.sinamail.sina.com.cn> <08716118-a3a8-0131-aeca-f97a8aba3f25@gmail.com> <60f8ad16-3e13-765a-4c4a-6a80f7a4d998@gmail.com> <1e489f79-ebb5-b394-c99c-ed71bce1e607@gmail.com> <92ce29e5-4a54-9545-1d51-79bda3713c25@gmail.com> <136faf1a-5514-1c21-7514-0673b4ddde81@gmail.com>
Message-ID: <890e938c-71a1-df9d-3f26-a331e5a0244c@gmail.com>

Hi everyone,

Reminder that the next monthly meeting is on Monday!
Please update your project notes *before Friday* so we don't have extra work on the weekend :)

https://github.com/scikit-learn/scikit-learn/projects/15

https://appear.in/amueller

https://www.timeanddate.com/worldclock/meetingdetails.html?year=2019&month=9&day=23&hour=13&min=0&sec=0&p1=240&p2=33&p3=37&p4=179

Cheers,
Nicolas
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From t3kcit at gmail.com  Wed Sep 18 11:11:40 2019
From: t3kcit at gmail.com (Andreas Mueller)
Date: Wed, 18 Sep 2019 11:11:40 -0400
Subject: [scikit-learn] Can Scikit-learn decision tree (CART) have both continuous and categorical features?
In-Reply-To: References: Message-ID:

On 9/15/19 8:16 AM, Guillaume Lemaître wrote:
>
> On Sat, 14 Sep 2019 at 20:59, C W wrote:
>
>     Thanks, Guillaume.
>     Column transformer looks pretty neat. I've also heard, though, that this
>     pipeline can be tedious to set up? Specifying what you want for
>     every feature is a pain.
>
> It would be interesting for us to know which part of the pipeline is tedious
> to set up, so we can improve something there.
> Do you mean that you would like to automatically detect which type of
> feature (categorical/numerical) it is and apply a default
> encoder/scaling, such as discussed there:
> https://github.com/scikit-learn/scikit-learn/issues/10603#issuecomment-401155127
>
> IMO, from a user perspective, it would be cleaner in some cases, at the
> cost of blindly applying a black box, which might be dangerous.

Also see
https://amueller.github.io/dabl/dev/generated/dabl.EasyPreprocessor.html#dabl.EasyPreprocessor
which basically does that.

>     Javier,
>     Actually, you guessed right. My real data has only one numerical
>     variable, looks more like this:
>
>     Gender  Date       Income  Car     Attendance
>     Male    2019/3/01  10000   BMW     Yes
>     Female  2019/5/02   9000   Toyota  No
>     Male    2019/7/15  12000   Audi    Yes
>
>     I am predicting income using all other categorical variables.
> Maybe it is catboost!
>
> Thanks,
>
> M
>
> On Sat, Sep 14, 2019 at 9:25 AM Javier López wrote:
>
>     If you have datasets with many categorical features, and
>     perhaps many categories, the tools in sklearn are quite limited,
>     but there are alternative implementations of boosted trees
>     that are designed with categorical features in mind. Take a look
>     at catboost [1], which has an sklearn-compatible API.
>
>     J
>
>     [1] https://catboost.ai/
>
>     On Sat, Sep 14, 2019 at 3:40 AM C W wrote:
>
>         Hello all,
>         I'm very confused. Can the decision tree module handle
>         both continuous and categorical features in the dataset?
>         In this case, it's just CART (Classification and
>         Regression Trees).
>
>         For example,
>         Gender  Age  Income  Car     Attendance
>         Male    30   10000   BMW     Yes
>         Female  35    9000   Toyota  No
>         Male    50   12000   Audi    Yes
>
>         According to the documentation
>         https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart,
>         it can not!
>
>         It says: "scikit-learn implementation does not support
>         categorical variables for now".
>
>         Is this true? If not, can someone point me to an example?
>         If yes, what do people do?
>
>         Thank you very much!
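As a concrete sketch of the ColumnTransformer approach suggested above for a mixed table like this one (the column names come from the example; the estimator choices are one plausible setup, not a prescription):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Toy frame mirroring the example table above.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "Car": ["BMW", "Toyota", "Audi"],
    "Income": [10000, 9000, 12000],
})

# One-hot encode the categorical columns; pass Income through unchanged.
pre = ColumnTransformer(
    [("cat", OneHotEncoder(), ["Gender", "Car"])],
    remainder="passthrough",
)
X = pre.fit_transform(df)
print(X.shape)  # (3, 6): 2 Gender + 3 Car one-hot columns + Income
```

Such a preprocessor is typically chained with a model in a Pipeline, so the encoding is fitted only on training data.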
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
> --
> Guillaume Lemaitre
> INRIA Saclay - Parietal team
> Center for Data Science Paris-Saclay
> https://glemaitre.github.io/
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From t3kcit at gmail.com  Wed Sep 18 11:16:44 2019
From: t3kcit at gmail.com (Andreas Mueller)
Date: Wed, 18 Sep 2019 11:16:44 -0400
Subject: [scikit-learn] MultiBinarizer issue
In-Reply-To: References: Message-ID: <2e2dc156-f2bf-335a-984f-7f6622fe7d5b@gmail.com>

Please don't repost questions. Also, you didn't create a minimal reproducible example as suggested on stackoverflow:
https://stackoverflow.com/help/minimal-reproducible-example
That process would probably have shown you where the issue is. I highly recommend doing that next time.

On 9/16/19 12:12 AM, Sayak Paul wrote:
>
> I am working on a multi-label text classification problem. In order to
> encode the labels, I am using MultiLabelBinarizer.
> The labels of the dataset look like -
>
> [cs.AI,cs.CL,cs.CV,cs.NE,stat.ML]
> [cs.CL,cs.AI,cs.LG,cs.NE,stat.ML]
> [cs.CL,cs.AI,cs.LG,cs.NE,stat.ML]
> [stat.ML,cs.AI,cs.CL,cs.LG,cs.NE]
> [cs.CL,cs.AI,cs.LG,cs.NE,stat.ML]
>
> When I am using
>
> mlb = MultiLabelBinarizer()
> mlb.fit(labels)
> print(mlb.classes_)
>
> It gives me -
>
> array([' ', ',', '.', 'A', 'B', 'C', 'D', 'E', 'G', 'H', 'I', 'L', 'M',
>        'N', 'O', 'P', 'R', 'S', 'T', 'V', 'Y', '[', ']', 'a', 'c', 'h',
>        'm', 's', 't'], dtype=object)
>
> I (partially) fixed this problem by mlb.fit([y_train]) and I got (I
> printed the first 10 classes) -
>
> array(['[cs.AI, cs.CC]', '[cs.AI, cs.CV]', '[cs.AI, cs.CY]', '[cs.AI,
> cs.DB]', '[cs.AI, cs.DS]', '[cs.AI, cs.GT]', '[cs.AI, cs.HC]', '[cs.AI,
> cs.IR]', '[cs.AI, cs.LG, stat.ML]', '[cs.AI, cs.LG]'], dtype=object)
>
> Ideally, it should output the individual classes (there may be
> something wrong in my code). When I am using
> mlb.fit_transform([y_train]), I am getting -
>
> array([[1, 1, 1, ..., 1, 1, 1]])
>
> Help would be very much appreciated.
>
> Here's the corresponding StackOverflow issue:
> https://stackoverflow.com/questions/57917936/multilabelbinarizer-gives-individual-characters-instead-of-the-classes
>
> Sayak Paul | sayak.dev
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
-------------- next part --------------
An HTML attachment was scrubbed...
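For what it's worth, single-character entries in `classes_` are the classic symptom of fitting MultiLabelBinarizer on plain strings: a string is itself an iterable of characters, so each character becomes a "label". Parsing each row into a list of tags first fixes it; the rows below are made-up stand-ins for the real data:

```python
from sklearn.preprocessing import MultiLabelBinarizer

rows = ["[cs.AI, cs.CL, cs.CV]", "[cs.CL, cs.LG]"]  # hypothetical rows
# Turn each "[a, b, c]" string into a list of tags before fitting.
labels = [row.strip("[]").split(", ") for row in rows]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)
print(list(mlb.classes_))  # ['cs.AI', 'cs.CL', 'cs.CV', 'cs.LG']
```

With whole tags as labels, each row of `Y` is an indicator vector over the tag vocabulary rather than over individual characters.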
URL:
From t3kcit at gmail.com  Wed Sep 18 11:24:55 2019
From: t3kcit at gmail.com (Andreas Mueller)
Date: Wed, 18 Sep 2019 11:24:55 -0400
Subject: [scikit-learn] Vote on SLEP009: keyword only arguments
In-Reply-To: <56cd260b-20ce-d863-61a6-f6cd6c1f4aab@gmail.com>
References: <56cd260b-20ce-d863-61a6-f6cd6c1f4aab@gmail.com>
Message-ID: <10788b6e-683d-c208-5dd0-d193f28fc23d@gmail.com>

The SLEP says:

This proposal suggests making only *most commonly* used parameters positional. The *most commonly* used parameters are defined per method or function, to be defined as either of the following two ways:

* The set defined and agreed upon by the core developers, which should cover the *easy* cases.
* A set identified as being in the top 95% of the use cases, using some automated analysis such as this one or this one.

And describes a clear deprecation path.

So that seems pretty actionable?

Also, I vote +1 on the SLEP.

Nicolas: Do you think this is not actionable? I had suggested that we define a clear rule, but doing a case-by-case seems better than bikeshedding now.

Alexandre: did you read the SLEP before asking? I thought the point of the SLEP was to summarize the discussion. If your question is not answered we should amend the SLEP.

On 9/11/19 2:21 PM, Nicolas Hug wrote:
>
> Since there is no explicit proposal in the SLEP it's not very clear
> what we need to vote for / against?
>
> But overall I'm +1 on forcing kwargs for all __init__ methods.
>
> Nicolas
>
> On 9/11/19 9:38 AM, Adrin wrote:
>> Hi,
>>
>> I'm (mostly) the messenger, don't shoot me :P
>>
>> It may help to summarize the SLEP:
>> 1. This can be applied to all methods, not just __init__.
>> 2. The SLEP doesn't say we have to apply it everywhere. It's mostly
>> that it lets us do that.
>> 3. It doesn't make ALL inputs keyword-only arguments. The common
>> ones such as X and y in fit(X, y) will stay as they are.
>> Therefore clf.fit(X, y) will definitely be allowed.
>> 4.
>> Whether or not sample_weight should be keyword only or not in fit,
>> requires its own discussion, and the route of the discussion
>> is defined in the SLEP.
>>
>> In other words, if an input parameter is used as a positional
>> argument less frequently than X% of the time, then it can/should be
>> a keyword-only argument. But the SLEP better defines these conditions.
>>
>> I hope that clarifies it a little bit.
>>
>> Adrin
>>
>> On Wed, Sep 11, 2019 at 3:23 PM Alexandre Gramfort wrote:
>>
>> hi,
>>
>> Adrin do you suggest this for everything or maybe just for __init__
>> params of estimators
>> and stuff that can come after X, y in fit eg sample_weights?
>>
>> would:
>>
>> clf.fit(X, y)
>>
>> still be allowed?
>>
>> Alex
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From t3kcit at gmail.com  Wed Sep 18 11:27:54 2019
From: t3kcit at gmail.com (Andreas Mueller)
Date: Wed, 18 Sep 2019 11:27:54 -0400
Subject: [scikit-learn] Vote on SLEP009: keyword only arguments
In-Reply-To: References: <20190912003716.738E42D0009B@webmail.sinamail.sina.com.cn> <29eccb39-b4f8-bd21-d6c8-005b9c2b087a@inria.fr> <80a7e620-dd10-bc09-22a6-9011d455648b@gmail.com> <20190916193237.7qrx5med5ijqscsp@phare.normalesup.org>
Message-ID: <31ca2161-79b9-07ef-6775-9eea4b374225@gmail.com>

Sorry, I was on vacation ;) +1 from me.
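The "deprecation period" route the SLEP mentions could look roughly like the shim below. This is entirely hypothetical — the decorator name and details are not scikit-learn's implementation — but it shows the idea: keep accepting old positional calls for a while, map them onto the keyword-only names, and emit a FutureWarning.

```python
import functools
import inspect
import warnings

def warn_on_positional(func):
    """Hypothetical shim: during a deprecation period, accept positional
    use of keyword-only parameters but warn about it."""
    sig = inspect.signature(func)
    kwonly = [p.name for p in sig.parameters.values()
              if p.kind == p.KEYWORD_ONLY]
    n_positional = sum(
        p.kind in (p.POSITIONAL_ONLY, p.POSITIONAL_OR_KEYWORD)
        for p in sig.parameters.values())

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        extra = args[n_positional:]
        if extra:
            # Map the surplus positional values onto keyword-only names.
            names = kwonly[:len(extra)]
            warnings.warn(
                "Pass {} as keyword arguments.".format(", ".join(names)),
                FutureWarning)
            kwargs.update(zip(names, extra))
            args = args[:n_positional]
        return func(*args, **kwargs)
    return wrapper

@warn_on_positional
def make_pca(n_components=None, *, copy=True):
    return (n_components, copy)

make_pca(10, False)  # works, but warns; later this becomes a TypeError
```

At the end of the deprecation period the decorator is simply removed, and the bare `*` in the signature takes over enforcement.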
On 9/17/19 7:28 PM, Joel Nothman wrote:
> If we were to assume Andy's vote in the positive, him having been a
> major proponent of this change, we would say this was accepted by a
> unanimous vote of a majority of core developers.
>
> Having tentatively accepted is good enough basis for us to start
> implementation. And ideally getting statistics to guide that.
>
> We should tackle this module by module, perhaps working through
> estimators before other public API.
>
> As such, I have opened
> https://github.com/scikit-learn/scikit-learn/issues/15005 to start
> tracking this work.
>
> Thanks everyone, and Andy, we await your vote!
>
> J
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From t3kcit at gmail.com  Wed Sep 18 11:34:18 2019
From: t3kcit at gmail.com (Andreas Mueller)
Date: Wed, 18 Sep 2019 11:34:18 -0400
Subject: [scikit-learn] Vote on SLEP009: keyword only arguments
In-Reply-To: References: <20190912003716.738E42D0009B@webmail.sinamail.sina.com.cn> <29eccb39-b4f8-bd21-d6c8-005b9c2b087a@inria.fr> <80a7e620-dd10-bc09-22a6-9011d455648b@gmail.com> <20190916193237.7qrx5med5ijqscsp@phare.normalesup.org>
Message-ID:

On 9/17/19 3:42 AM, Joel Nothman wrote:
> I think you mean keyword-only, Alex
>
> On Tue., 17 Sep. 2019, 4:11 pm Alexandre Gramfort wrote:
>
> Yes I am +1 for positional arguments for the __init__ of the
> estimators.
>
> Alex
>
> Albert: my position when reviewing changes in accordance with this
> SLEP would be to (a) perhaps get usage evidence as discussed in the
> SLEP pull request review; and (b) apply a rule of thumb like "are the
> semantics reasonably clear when the argument is passed positionally?"
> I think they are clear for PCA's n_components, for Pipeline's steps, and
> for GridSearchCV's estimator and parameter grid.
Other parameters of > those estimators seem more suitable for keyword-only. I think you're not fully addressing Albert's concern, which I think is quite important and hasn't been brought up before. I think Albert is saying that it should be easy for a new user to build a mental model of when a positional argument is allowed. If we can't specify a simple rule, then it's very hard for a new (or really any) user to have clear expectations. And I think sklearn is all about setting clear expectations. > Also how is this going to be rendered by sphinx in the doc? There will be a star in the signature between positional and kw only args i.e. PCA(n_components=2, *, copy=True, ...) So you could always look at the docs to figure it out. That's clearly not very convenient. -------------- next part -------------- An HTML attachment was scrubbed... URL: From niourf at gmail.com Wed Sep 18 11:46:21 2019 From: niourf at gmail.com (Nicolas Hug) Date: Wed, 18 Sep 2019 11:46:21 -0400 Subject: [scikit-learn] Vote on SLEP009: keyword only arguments In-Reply-To: <10788b6e-683d-c208-5dd0-d193f28fc23d@gmail.com> References: <56cd260b-20ce-d863-61a6-f6cd6c1f4aab@gmail.com> <10788b6e-683d-c208-5dd0-d193f28fc23d@gmail.com> Message-ID: I think Alex's and my concerns are legit. The SLEP is asking "are you OK with forcing some parameters to be keyword only? We still don't know which ones though". I understand why you don't want to bike shed now, but that's a surprisingly mild SLEP, hence the questions. The only response I can give to the SLEP right now is "sure, depends". Nicolas On 9/18/19 11:24 AM, Andreas Mueller wrote: > The SLEP says: > > This proposal suggests making only the most commonly used parameters > positional. The most commonly used parameters are defined per method > or function, to be defined as either of the following two ways: > > * The set defined and agreed upon by the core developers, which > should cover the easy cases.
> * A set identified as being in the top 95% of the use cases, using > some automated analysis such as this one > or this one > . > > And describes a clear deprecation path. > > So that seems pretty actionable? > > > Also, I vote +1 on the SLEP. > > Nicolas: Do you think this is not actionable? I had suggested that we > define a clear rule but doing a case-by-case seems better than > bikeshedding now. > > Alexandre: did you read the SLEP before asking? I thought the point of > the SLEP was to summarize the discussion. If your question is not > answered we should amend the SLEP. > > > > On 9/11/19 2:21 PM, Nicolas Hug wrote: >> >> Since there is no explicit proposal in the SLEP it's not very clear >> what we need to vote for / against? >> >> But overall I'm +1 on forcing kwargs for all __init__ methods. >> >> >> Nicolas >> >> >> On 9/11/19 9:38 AM, Adrin wrote: >>> Hi, >>> >>> I'm (mostly) the messenger, don't shoot me :P >>> >>> It may help to summarize the SLEP: >>> 1. This can be applied to all methods, not just __init__. >>> 2. The SLEP doesn't say we have to apply it everywhere. It's mostly >>> that it lets us do that. >>> 3. It doesn't make ALL inputs a keywords only argument. The common >>> ones such as X and y in fit(X, y) will stay as they are. >>> Therefore clf.fit(X, y) will definitely be allowed. >>> 4. Whether or not sample_weight should be keyword only or not in >>> fit, requires its own discussion, and the route of the discussion >>> is defined in the SLEP. >>> >>> In other words, if an input parameter is used as a positional >>> argument less frequently than X% of the time, then it can/should be >>> a keyword only argument. But the SLEP better defines these conditions. >>> >>> I hope that clarifies it a little bit.
>>> >>> Adrin/ >>> >>> On Wed, Sep 11, 2019 at 3:23 PM Alexandre Gramfort >>> > >>> wrote: >>> >>> hi, >>> >>> Adrin do you suggest this for everything or maybe just for __init__ >>> params of estimators >>> and stuff that can come after X, y in fit eg sample_weights? >>> >>> would: >>> >>> clf.fit(X, y) >>> >>> still be allowed? >>> >>> Alex >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From jchapman48 at gatech.edu Thu Sep 19 15:41:07 2019 From: jchapman48 at gatech.edu (Chapman, James E) Date: Thu, 19 Sep 2019 19:41:07 +0000 Subject: [scikit-learn] Porting old MLPY KRR model to scikit-learn Message-ID: Hello, I have some old KRR models from MLPY and I need to port those models over to a new code written with scikit-learn (transfer MLPY KRR data to a scikit-learn KernelRidge instance). Does anyone know if this is even possible, and if so, could you give me some suggestions as to how to accomplish it? Thanks and regards, James -------------- next part -------------- An HTML attachment was scrubbed... 
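[Editor's note: one way to attempt the port James asks about — discussed in the replies that follow — is to fit a scikit-learn KernelRidge on the original training inputs and then overwrite its learned dual coefficients. This is an untested sketch; `mlpy_alpha` is a placeholder for whatever alphas the old MLPY model stored.]

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

# Fit KernelRidge on the ORIGINAL training inputs so that at predict time
# the kernel is evaluated against the right support points (X_fit_), then
# transplant the old model's coefficients into dual_coef_.
rng = np.random.RandomState(0)
X_train = rng.randn(20, 3)          # stand-in for the original training data
y_train = rng.randn(20)

est = KernelRidge(kernel="rbf", gamma=0.1)
est.fit(X_train, y_train)           # populates X_fit_ and dual_coef_

mlpy_alpha = rng.randn(20)          # placeholder: the old model's alphas
est.dual_coef_ = mlpy_alpha         # transplant the coefficients

pred = est.predict(X_train[:5])     # predictions now use the transplanted alphas
# Note: KernelRidge has no intercept; if the old model also stored an
# additive bias term, it would have to be added to `pred` by hand.
```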
URL: From michael.eickenberg at gmail.com Thu Sep 19 15:51:23 2019 From: michael.eickenberg at gmail.com (Michael Eickenberg) Date: Thu, 19 Sep 2019 12:51:23 -0700 Subject: [scikit-learn] Porting old MLPY KRR model to scikit-learn In-Reply-To: References: Message-ID: What exactly do you mean by "port"? Put already fitted models into a sklearn estimator object? You can do this as follows: You should be able to create an `estimator = sklearn.kernel_ridge.KernelRidge(...)` object, call `fit` on some random data of the appropriate shape, and then set `estimator.dual_coef_` to the ones from your MLPY model (the sklearn version sets them here: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/kernel_ridge.py#L165). If this is not what you mean, then maybe you just want to refit them using the appropriate KernelRidge kernel? Hope this helps! Michael On Thu, Sep 19, 2019 at 12:43 PM Chapman, James E wrote: > Hello, > > I have some old KRR models from MLPY and I need to port those models over > to a new code written with scikit-learn (transfer MLPY KRR data to a > scikit-learn KernelRidge instance). Does anyone know if this is even > possible, and if so, could you give me some suggestions as to how to > accomplish it? > > > > Thanks and regards, > > James > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jchapman48 at gatech.edu Thu Sep 19 21:20:39 2019 From: jchapman48 at gatech.edu (Chapman, James E) Date: Fri, 20 Sep 2019 01:20:39 +0000 Subject: [scikit-learn] Porting old MLPY KRR model to scikit-learn In-Reply-To: References: Message-ID: <246ABEC6-7933-4A8E-9337-CE0AFF73C95F@gatech.edu> Hello, Thank you for your comments. I had actually initially tried your first suggestion, but the predicted values just wouldn't line up between the two models.
As I dug into the source code of the two, I realized that they don't appear to be the same. MLPY adds a bias term to both the training and prediction process, whereas, correct me if I'm wrong, scikit-learn does not. This results in two fundamentally different sets of codes: MLPY (prediction): np.dot(self._alpha, Kt_arr.T) + self._b Scikit-learn (prediction): np.dot(K, self.dual_coef_) Here, MLPY's alphas correspond to scikit-learn's dual_coef_, and the kernel values are just stored differently, so one has to be transposed. If I just try and add MLPY's bias term to scikit-learn's prediction (model.predict), the values don't match those predicted by MLPY (they're close but they are not off by a constant value). Am I missing something obvious, or is there really a fundamental difference here? From: scikit-learn on behalf of Michael Eickenberg Reply-To: Scikit-learn mailing list Date: Thursday, September 19, 2019 at 3:53 PM To: Scikit-learn mailing list Subject: Re: [scikit-learn] Porting old MLPY KRR model to scikit-learn What exactly do you mean by "port"? Put already fitted models into a sklearn estimator object? You can do this as follows: You should be able to create a `estimator = sklearn.kernel_ridge.KernelRidge(...)` object, call `fit` to some random data of the appropriate shape, and then set `estimator.dual_coef_` to the ones from your MLPY model (the sklearn version sets them here: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/kernel_ridge.py#L165). If this is not what you mean, then maybe you just want to refit them using the appropriate KernelRidge kernel? Hope this helps! Michael On Thu, Sep 19, 2019 at 12:43 PM Chapman, James E > wrote: Hello, I have some old KRR models from MLPY and I need to port those models over to a new code written with scikit-learn (transfer MLPY KRR data to a scikit-learn KernelRidge instance). Does anyone know if this is even possible, and if so, could you give me some suggestions as to how to accomplish it?
Thanks and regards, James _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From joel.nothman at gmail.com Sun Sep 22 18:56:21 2019 From: joel.nothman at gmail.com (Joel Nothman) Date: Mon, 23 Sep 2019 08:56:21 +1000 Subject: [scikit-learn] Website redesign Message-ID: Hi scikit-learn users, Scikit-learn developer Thomas Fan recently gave our documentation and web site a refresh, targeting desktop and mobile devices. Please give it a try at https://scikit-learn.org/dev/ and raise usability issues at https://github.com/scikit-learn/scikit-learn/issues/new to help us get it ready for the next release. Congratulations to Thomas on some great work! Thanks all! Joel -------------- next part -------------- An HTML attachment was scrubbed... URL: From solegalli1 at gmail.com Tue Sep 24 07:39:33 2019 From: solegalli1 at gmail.com (Sole Galli) Date: Tue, 24 Sep 2019 12:39:33 +0100 Subject: [scikit-learn] Normalizer, l1 and l2 norms Message-ID: Hello team, Quick question with respect to the Normalizer(). My understanding is that this transformer divides the values (rows) of a vector by the vector's Euclidean (l2) or Manhattan (l1) norm. From the sklearn docs, I understand that the Normalizer() does not learn the distances from the train set and stores them. It rather normalises the data according to the distance the data set presents, which could be or not, the same in test and train. Am I understanding this correctly? If so, what is the reason not to store these parameters in the Normalizer and use them to scale future data? If not getting it right, what am I missing? Many thanks and I will appreciate if you have an article on this to share. Cheers Sole -------------- next part -------------- An HTML attachment was scrubbed...
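[Editor's note: the behaviour Sole asks about can be checked in a couple of lines — Normalizer works row by row, so there is nothing for fit() to learn or store. A quick sketch:]

```python
import numpy as np
from sklearn.preprocessing import Normalizer

X = np.array([[3.0, 4.0],
              [1.0, 1.0]])

# l2: each row is divided by its own Euclidean norm.
X_l2 = Normalizer(norm="l2").fit_transform(X)
print(X_l2[0])  # [0.6 0.8]  (the norm of [3, 4] is 5)

# The same thing by hand -- no training-set statistics involved.
manual = X / np.linalg.norm(X, ord=2, axis=1, keepdims=True)
print(np.allclose(X_l2, manual))  # True

# l1: rows are divided by the sum of absolute values instead.
X_l1 = Normalizer(norm="l1").fit_transform(X)
print(X_l1[1])  # [0.5 0.5]
```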
URL: From g.lemaitre58 at gmail.com Tue Sep 24 07:59:25 2019 From: g.lemaitre58 at gmail.com (=?UTF-8?Q?Guillaume_Lema=C3=AEtre?=) Date: Tue, 24 Sep 2019 13:59:25 +0200 Subject: [scikit-learn] Normalizer, l1 and l2 norms In-Reply-To: References: Message-ID: Since you are normalizing sample by sample, you don't need information from the training set to normalize a new sample. You just need to compute the norm of this new sample. On Tue, 24 Sep 2019 at 13:41, Sole Galli wrote: > Hello team, > > Quick question respect to the Normalizer(). > > My understanding is that this transformer divides the values (rows) of a > vector by the vector euclidean (l2) or manhattan distances (l1). > > From the sklearn docs, I understand that the Normalizer() does not learn > the distances from the train set and stores them. It rathers normalises the > data according to distance the data set presents, which could be or not, > the same in test and train. > > Am I understanding this correctly? > > If so, what is the reason not to store these parameters in the Normalizer > and use them to scale future data? > > If not getting it right, what am I missing? > > Many thanks and I will appreciate if you have an article on this to share. > > Cheers > > Sole > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -- Guillaume Lemaitre Scikit-learn @ Inria Foundation https://glemaitre.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From solegalli1 at gmail.com Tue Sep 24 08:02:25 2019 From: solegalli1 at gmail.com (Sole Galli) Date: Tue, 24 Sep 2019 13:02:25 +0100 Subject: [scikit-learn] Normalizer, l1 and l2 norms In-Reply-To: References: Message-ID: Sorry, ignore my question, I got it right now. 
It is calculating the norm of the observation vector (across variables), and its distance varies obs per obs, that is why it needs to be re-calculated, and therefore not stored. I would appreciate some articles / links with successful implementations of this technique and why it adds value to ML. Would you be able to point me to any? Cheers Sole On Tue, 24 Sep 2019 at 12:39, Sole Galli wrote: > Hello team, > > Quick question respect to the Normalizer(). > > My understanding is that this transformer divides the values (rows) of a > vector by the vector euclidean (l2) or manhattan distances (l1). > > From the sklearn docs, I understand that the Normalizer() does not learn > the distances from the train set and stores them. It rathers normalises the > data according to distance the data set presents, which could be or not, > the same in test and train. > > Am I understanding this correctly? > > If so, what is the reason not to store these parameters in the Normalizer > and use them to scale future data? > > If not getting it right, what am I missing? > > Many thanks and I will appreciate if you have an article on this to share. > > Cheers > > Sole > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.lemaitre58 at gmail.com Tue Sep 24 09:03:03 2019 From: g.lemaitre58 at gmail.com (=?UTF-8?Q?Guillaume_Lema=C3=AEtre?=) Date: Tue, 24 Sep 2019 15:03:03 +0200 Subject: [scikit-learn] Normalizer, l1 and l2 norms In-Reply-To: References: Message-ID: One example where I saw it used was Scale-Invariant Feature Transform (SIFT). Normalizing each vector to have a unit length will compensate for affine changes in illumination between samples. The use case given in scikit-learn would be something similar but with text processing: "Scaling inputs to unit norms is a common operation for text classification or clustering for instance. 
For instance the dot product of two l2-normalized TF-IDF vectors is the cosine similarity of the vectors and is the base similarity metric for the Vector Space Model commonly used by the Information Retrieval community." So basically, you cancel a transform and it allows you to compare samples between each other. On Tue, 24 Sep 2019 at 14:04, Sole Galli wrote: > Sorry, ignore my question, I got it right now. > > It is calculating the norm of the observation vector (across variables), > and its distance varies obs per obs, that is why it needs to be > re-calculated, and therefore not stored. > > I would appreciate some articles / links with successful implementations > of this technique and why it adds value to ML. Would you be able to point > me to any? > > Cheers > > Sole > > > > > > On Tue, 24 Sep 2019 at 12:39, Sole Galli wrote: > >> Hello team, >> >> Quick question respect to the Normalizer(). >> >> My understanding is that this transformer divides the values (rows) of a >> vector by the vector euclidean (l2) or manhattan distances (l1). >> >> From the sklearn docs, I understand that the Normalizer() does not learn >> the distances from the train set and stores them. It rathers normalises the >> data according to distance the data set presents, which could be or not, >> the same in test and train. >> >> Am I understanding this correctly? >> >> If so, what is the reason not to store these parameters in the Normalizer >> and use them to scale future data? >> >> If not getting it right, what am I missing? >> >> Many thanks and I will appreciate if you have an article on this to share. >> >> Cheers >> >> Sole >> >> >> _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -- Guillaume Lemaitre Scikit-learn @ Inria Foundation https://glemaitre.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... 
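[Editor's note: the docs passage Guillaume quotes is easy to verify numerically — after l2 normalization, a plain dot product equals the cosine similarity. A sketch with made-up vectors standing in for TF-IDF rows:]

```python
import numpy as np
from sklearn.preprocessing import normalize

# Two made-up "document" vectors (stand-ins for rows of a TF-IDF matrix).
a = np.array([[1.0, 2.0, 0.0]])
b = np.array([[2.0, 1.0, 1.0]])

# Cosine similarity from the definition: dot / (|a| * |b|).
cos = float(a @ b.T) / (np.linalg.norm(a) * np.linalg.norm(b))

# After l2-normalizing each vector, the bare dot product gives the same number.
a_n, b_n = normalize(a, norm="l2"), normalize(b, norm="l2")
dot = float(a_n @ b_n.T)

print(np.isclose(cos, dot))  # True
```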
URL: From solegalli1 at gmail.com Wed Sep 25 04:05:16 2019 From: solegalli1 at gmail.com (Sole Galli) Date: Wed, 25 Sep 2019 09:05:16 +0100 Subject: [scikit-learn] Normalizer, l1 and l2 norms In-Reply-To: References: Message-ID: Thank you Guillaume, that is helpful. Cheers Sole On Tue, 24 Sep 2019 at 14:04, Guillaume Lemaître wrote: > One example where I saw it used was Scale-Invariant Feature Transform > (SIFT). Normalizing each vector to have a unit length will compensate for > affine changes in illumination between samples. > The use case given in scikit-learn would be something similar but with > text processing: > > "Scaling inputs to unit norms is a common operation for text > classification or clustering for instance. For instance the dot product of > two l2-normalized TF-IDF vectors is the cosine similarity of the vectors > and is the base similarity metric for the Vector Space Model commonly used > by the Information Retrieval community." > > So basically, you cancel a transform and it allows you to compare samples > between each other. > > On Tue, 24 Sep 2019 at 14:04, Sole Galli wrote: > >> Sorry, ignore my question, I got it right now. >> >> It is calculating the norm of the observation vector (across variables), >> and its distance varies obs per obs, that is why it needs to be >> re-calculated, and therefore not stored. >> >> I would appreciate some articles / links with successful implementations >> of this technique and why it adds value to ML. Would you be able to point >> me to any? >> >> Cheers >> >> Sole >> >> >> >> >> >> On Tue, 24 Sep 2019 at 12:39, Sole Galli wrote: >> >>> Hello team, >>> >>> Quick question respect to the Normalizer(). >>> >>> My understanding is that this transformer divides the values (rows) of a >>> vector by the vector euclidean (l2) or manhattan distances (l1). >>> >>> From the sklearn docs, I understand that the Normalizer() does not learn >>> the distances from the train set and stores them.
It rathers normalises the >>> data according to distance the data set presents, which could be or not, >>> the same in test and train. >>> >>> Am I understanding this correctly? >>> >>> If so, what is the reason not to store these parameters in the >>> Normalizer and use them to scale future data? >>> >>> If not getting it right, what am I missing? >>> >>> Many thanks and I will appreciate if you have an article on this to >>> share. >>> >>> Cheers >>> >>> Sole >>> >>> >>> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > > > -- > Guillaume Lemaitre > Scikit-learn @ Inria Foundation > https://glemaitre.github.io/ > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mathieu at mblondel.org Thu Sep 26 08:53:43 2019 From: mathieu at mblondel.org (Mathieu Blondel) Date: Thu, 26 Sep 2019 14:53:43 +0200 Subject: [scikit-learn] Website redesign In-Reply-To: References: Message-ID: Great work indeed! Love it! Mathieu On Mon, Sep 23, 2019 at 12:58 AM Joel Nothman wrote: > Hi scikit-learn users, > > Scikit-learn developer Thomas Fan recently gave our documentation and web > site a refresh, targeting desktop and mobile devices. Please give it a try > at https://scikit-learn.org/dev/ and raise usability issues at > https://github.com/scikit-learn/scikit-learn/issues/new to help us get it > ready for the next release. > > Congratulations to Thomas on some great work! > > Thanks all! > > Joel > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: