From yrohinkumar at gmail.com Tue Aug 1 09:15:56 2017 From: yrohinkumar at gmail.com (Rohin Kumar) Date: Tue, 1 Aug 2017 18:45:56 +0530 Subject: [scikit-learn] Nearest neighbor search with 2 distance measures In-Reply-To: References: <379121501436421@mxfront4j.mail.yandex.net> Message-ID: Since you seem to be from Astrophysics/Cosmology background (I am assuming you are jakevdp - the creator of astroML - if you are - I am lucky!), I can explain my application scenario. I am trying to calculate the anisotropic two-point correlation function something like done in rp_pi_tpcf or s_mu_tpcf using pairs (DD,DR,RR) calculated from BallTree.two_point_correlation In halotools ( http://halotools.readthedocs.io/en/latest/function_usage/mock_observables_functions.html) it is implemented using rectangular grids. I could calculate 2pcf with custom metrics using one variable with BallTree as done in astroML. I intend to find the anisotropic counter part. Thanks & Regards, Rohin Y.Rohin Kumar, +919818092877. On Tue, Aug 1, 2017 at 5:18 PM, Rohin Kumar wrote: > Dear Jake, > > Thanks for your response. I meant to group/count pairs in boxes (using two > arrays simultaneously-hence needing 2 metrics) instead of one distance > array as the binning parameter. I don't know if the algorithm supports such > a thing. For now, I am proceeding with your suggestion of two ball trees at > huge computational cost. I hope I am able to frame my question properly. > > Thanks & Regards, > Rohin. > > > > On Mon, Jul 31, 2017 at 8:16 PM, Jacob Vanderplas < > jakevdp at cs.washington.edu> wrote: > >> On Sun, Jul 30, 2017 at 11:18 AM, Rohin Kumar >> wrote: >> >>> *update* >>> >>> May be it doesn't have to be done at the tree creation level. It could >>> be using loops and creating two different balltrees. Something like >>> >>> tree1=BallTree(X,metric='metric1') #for x-z plane >>> tree2=BallTree(X,metric='metric2') #for y-z plane >>> >>> And then calculate correlation functions in a loop to get tpcf(X,r1,r2) >>> using tree1.two_point_correlation(X,r1) and >>> tree2.two_point_correlation(X,r2) >>> >> >> Hi Rohin, >> It's not exactly clear to me what you wish the tree to do with the two >> different metrics, but in any case the ball tree only supports one metric >> at a time. If you can construct your desired result from two ball trees >> each with its own metric, then that's probably the best way to proceed, >> Jake >> >> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yrohinkumar at gmail.com Tue Aug 1 07:48:23 2017 From: yrohinkumar at gmail.com (Rohin Kumar) Date: Tue, 1 Aug 2017 17:18:23 +0530 Subject: [scikit-learn] Nearest neighbor search with 2 distance measures In-Reply-To: References: <379121501436421@mxfront4j.mail.yandex.net> Message-ID: Dear Jake, Thanks for your response. I meant to group/count pairs in boxes (using two arrays simultaneously-hence needing 2 metrics) instead of one distance array as the binning parameter. I don't know if the algorithm supports such a thing. For now, I am proceeding with your suggestion of two ball trees at huge computational cost. I hope I am able to frame my question properly. 
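In rough terms, the "two ball trees, one metric each" workaround described above looks like the minimal sketch below. The metrics and radii are placeholders (a transverse and a line-of-sight separation on random points), not the actual catalogue or metrics from this thread, and the two sets of counts remain independent one-dimensional counts rather than the joint (r1, r2) pair counts ultimately wanted for an anisotropic 2PCF.

    import numpy as np
    from sklearn.neighbors import BallTree

    rng = np.random.RandomState(0)
    X = rng.rand(200, 3)   # toy 3-D point set standing in for the catalogue

    # Placeholder metrics: separation in the x-y plane and along the z axis.
    def r_perp(a, b):
        return np.sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2)

    def r_par(a, b):
        return abs(a[2] - b[2])

    # One BallTree per metric; user-defined (callable) metrics are supported
    # by BallTree but are much slower than the built-in ones.
    tree_perp = BallTree(X, metric=r_perp)
    tree_par = BallTree(X, metric=r_par)

    bins = np.linspace(0.05, 0.5, 10)
    counts_perp = tree_perp.two_point_correlation(X, bins)  # cumulative pair counts
    counts_par = tree_par.two_point_correlation(X, bins)

    # Note: counts_perp and counts_par are two independent 1-D counts; they
    # do not combine into joint (r_perp, r_par) counts by themselves.
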
Thanks & Regards, Rohin. On Mon, Jul 31, 2017 at 8:16 PM, Jacob Vanderplas wrote: > On Sun, Jul 30, 2017 at 11:18 AM, Rohin Kumar > wrote: > >> *update* >> >> May be it doesn't have to be done at the tree creation level. It could be >> using loops and creating two different balltrees. Something like >> >> tree1=BallTree(X,metric='metric1') #for x-z plane >> tree2=BallTree(X,metric='metric2') #for y-z plane >> >> And then calculate correlation functions in a loop to get tpcf(X,r1,r2) >> using tree1.two_point_correlation(X,r1) and tree2.two_point_correlation( >> X,r2) >> > > Hi Rohin, > It's not exactly clear to me what you wish the tree to do with the two > different metrics, but in any case the ball tree only supports one metric > at a time. If you can construct your desired result from two ball trees > each with its own metric, then that's probably the best way to proceed, > Jake > > >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jeremiah.Johnson at unh.edu Tue Aug 1 12:03:01 2017 From: Jeremiah.Johnson at unh.edu (Johnson, Jeremiah) Date: Tue, 1 Aug 2017 16:03:01 +0000 Subject: [scikit-learn] question about class_weights in LogisticRegression Message-ID: Hello all, I'm looking for confirmation on an implementation detail that is somewhere in liblinear, but I haven't found documentation for yet. When the class_weights='balanced' parameter is set in LogisticRegression, then the regularisation parameter for an observation from class I is class_weight[I] * C, where C is the usual regularization parameter - is this correct? Thanks, Jeremiah -------------- next part -------------- An HTML attachment was scrubbed... URL: From stuart at stuartreynolds.net Tue Aug 1 12:19:54 2017 From: stuart at stuartreynolds.net (Stuart Reynolds) Date: Tue, 1 Aug 2017 09:19:54 -0700 Subject: [scikit-learn] question about class_weights in LogisticRegression In-Reply-To: References: Message-ID: I hope not. And not accoring to the docs... https://github.com/scikit-learn/scikit-learn/blob/ab93d65/sklearn/linear_model/logistic.py#L947 class_weight : dict or 'balanced', optional Weights associated with classes in the form ``{class_label: weight}``. If not given, all classes are supposed to have weight one. The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as ``n_samples / (n_classes * np.bincount(y))``. Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified. On Tue, Aug 1, 2017 at 9:03 AM, Johnson, Jeremiah wrote: > Hello all, > > I?m looking for confirmation on an implementation detail that is somewhere > in liblinear, but I haven?t found documentation for yet. When the > class_weights=?balanced? parameter is set in LogisticRegression, then the > regularisation parameter for an observation from class I is class_weight[I] > * C, where C is the usual regularization parameter ? is this correct? 
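As a side note on the "balanced" option quoted above: the weights themselves follow the docstring formula and can be inspected directly with the helper scikit-learn uses internally. The snippet below only reproduces that formula on made-up labels; it does not settle the separate question of how liblinear applies those weights to C internally.

    import numpy as np
    from sklearn.utils.class_weight import compute_class_weight

    y = np.array([0] * 90 + [1] * 10)            # toy imbalanced labels
    classes = np.unique(y)

    # The docstring formula: n_samples / (n_classes * np.bincount(y))
    manual = float(len(y)) / (len(classes) * np.bincount(y))
    # The helper used for class_weight='balanced'
    auto = compute_class_weight('balanced', classes=classes, y=y)

    print(manual)   # [0.5555...  5.0] -> the minority class gets the larger weight
    print(auto)     # same values

As the docstring notes, these per-class weights are then multiplied with any sample_weight passed to fit.
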
> > Thanks, > Jeremiah > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > From Jeremiah.Johnson at unh.edu Tue Aug 1 12:30:22 2017 From: Jeremiah.Johnson at unh.edu (Johnson, Jeremiah) Date: Tue, 1 Aug 2017 16:30:22 +0000 Subject: [scikit-learn] question about class_weights in LogisticRegression In-Reply-To: References: Message-ID: Right, I know how the class_weight calculation is performed. But then those class weights are utilized during the model fit process in some way in liblinear, and that?s what I am interested in. libSVM does class_weight[I] * C (https://www.csie.ntu.edu.tw/~cjlin/libsvm/); is the implementation in liblinear the same? Best, Jeremiah On 8/1/17, 12:19 PM, "scikit-learn on behalf of Stuart Reynolds" wrote: >I hope not. And not accoring to the docs... >https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_scikit-2Dl >earn_scikit-2Dlearn_blob_ab93d65_sklearn_linear-5Fmodel_logistic.py-23L947 >&d=DwIGaQ&c=c6MrceVCY5m5A_KAUkrdoA&r=hQNTLb4Jonm4n54VBW80WEzIAaqvTOcTEjhIk >rRJWXo&m=2XR2z3VWvEaERt4miGabDte3xkz_FwzMKMwnvEOWj8o&s=4uJZS3EaQgysmQlzjt- >yuLkSlcXTd5G50LkEFMcbZLQ&e= > >class_weight : dict or 'balanced', optional >Weights associated with classes in the form ``{class_label: weight}``. >If not given, all classes are supposed to have weight one. >The "balanced" mode uses the values of y to automatically adjust >weights inversely proportional to class frequencies in the input data >as ``n_samples / (n_classes * np.bincount(y))``. >Note that these weights will be multiplied with sample_weight (passed >through the fit method) if sample_weight is specified. > >On Tue, Aug 1, 2017 at 9:03 AM, Johnson, Jeremiah > wrote: >> Hello all, >> >> I?m looking for confirmation on an implementation detail that is >>somewhere >> in liblinear, but I haven?t found documentation for yet. When the >> class_weights=?balanced? parameter is set in LogisticRegression, then >>the >> regularisation parameter for an observation from class I is >>class_weight[I] >> * C, where C is the usual regularization parameter ? is this correct? >> >> Thanks, >> Jeremiah >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> >>https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.python.org_mail >>man_listinfo_scikit-2Dlearn&d=DwIGaQ&c=c6MrceVCY5m5A_KAUkrdoA&r=hQNTLb4Jo >>nm4n54VBW80WEzIAaqvTOcTEjhIkrRJWXo&m=2XR2z3VWvEaERt4miGabDte3xkz_FwzMKMwn >>vEOWj8o&s=MgZoI9VOHFh3omGKHTASFx3NAVjj6AY3j_75mnOUg04&e= >> >_______________________________________________ >scikit-learn mailing list >scikit-learn at python.org >https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.python.org_mailm >an_listinfo_scikit-2Dlearn&d=DwIGaQ&c=c6MrceVCY5m5A_KAUkrdoA&r=hQNTLb4Jonm >4n54VBW80WEzIAaqvTOcTEjhIkrRJWXo&m=2XR2z3VWvEaERt4miGabDte3xkz_FwzMKMwnvEO >Wj8o&s=MgZoI9VOHFh3omGKHTASFx3NAVjj6AY3j_75mnOUg04&e= From jakevdp at cs.washington.edu Tue Aug 1 13:25:52 2017 From: jakevdp at cs.washington.edu (Jacob Vanderplas) Date: Tue, 1 Aug 2017 10:25:52 -0700 Subject: [scikit-learn] Nearest neighbor search with 2 distance measures In-Reply-To: References: <379121501436421@mxfront4j.mail.yandex.net> Message-ID: Hi Rohin, Ah, I see. I don't think a BallTree is the right data structure for an anisotropic N-point query, because it fundamentally assumes spherical symmetry of the metric. 
You may be able to do something like this with a specialized KD-tree, but scikit-learn doesn't support this, and I don't imagine that it ever will given the very specialized nature of the application. I'm certain someone has written efficient code for this operation in the astronomy community, but I don't know of any good Python package to recommend for this ? I'd suggest googling for keywords and seeing where that gets you. Thanks, Jake Jake VanderPlas Senior Data Science Fellow Director of Open Software University of Washington eScience Institute On Tue, Aug 1, 2017 at 6:15 AM, Rohin Kumar wrote: > Since you seem to be from Astrophysics/Cosmology background (I am assuming > you are jakevdp - the creator of astroML - if you are - I am lucky!), I can > explain my application scenario. I am trying to calculate the anisotropic > two-point correlation function something like done in rp_pi_tpcf > > or s_mu_tpcf > > using pairs (DD,DR,RR) calculated from BallTree.two_point_correlation > > In halotools (http://halotools.readthedocs.io/en/latest/function_usage/ > mock_observables_functions.html) it is implemented using rectangular > grids. I could calculate 2pcf with custom metrics using one variable with > BallTree as done in astroML. I intend to find the anisotropic counter part. > > Thanks & Regards, > Rohin > > Y.Rohin Kumar, > +919818092877 <+91%2098180%2092877>. > > On Tue, Aug 1, 2017 at 5:18 PM, Rohin Kumar wrote: > >> Dear Jake, >> >> Thanks for your response. I meant to group/count pairs in boxes (using >> two arrays simultaneously-hence needing 2 metrics) instead of one distance >> array as the binning parameter. I don't know if the algorithm supports such >> a thing. For now, I am proceeding with your suggestion of two ball trees at >> huge computational cost. I hope I am able to frame my question properly. >> >> Thanks & Regards, >> Rohin. >> >> >> >> On Mon, Jul 31, 2017 at 8:16 PM, Jacob Vanderplas < >> jakevdp at cs.washington.edu> wrote: >> >>> On Sun, Jul 30, 2017 at 11:18 AM, Rohin Kumar >>> wrote: >>> >>>> *update* >>>> >>>> May be it doesn't have to be done at the tree creation level. It could >>>> be using loops and creating two different balltrees. Something like >>>> >>>> tree1=BallTree(X,metric='metric1') #for x-z plane >>>> tree2=BallTree(X,metric='metric2') #for y-z plane >>>> >>>> And then calculate correlation functions in a loop to get tpcf(X,r1,r2) >>>> using tree1.two_point_correlation(X,r1) and >>>> tree2.two_point_correlation(X,r2) >>>> >>> >>> Hi Rohin, >>> It's not exactly clear to me what you wish the tree to do with the two >>> different metrics, but in any case the ball tree only supports one metric >>> at a time. If you can construct your desired result from two ball trees >>> each with its own metric, then that's probably the best way to proceed, >>> Jake >>> >>> >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From yrohinkumar at gmail.com Tue Aug 1 13:50:58 2017 From: yrohinkumar at gmail.com (Rohin Kumar) Date: Tue, 1 Aug 2017 23:20:58 +0530 Subject: [scikit-learn] Nearest neighbor search with 2 distance measures In-Reply-To: References: <379121501436421@mxfront4j.mail.yandex.net> Message-ID: Dear Jake, Thank you for your prompt reply. I started with KD-tree but after realising it doesn't support custom metrics (I don't know the reason for this - would be nice feature) I shifted to BallTree and was looking for a 2 metric based categorisation. After looking around, the best I could find at most were brute-force methods written in python (had my own version too) or better optimised ones in C or FORTRAN. The closest one was halotools which again works with euclidean metric. For now, I will try to get my work done with 2 different BallTrees iteratively in bins. If I find a better option will try to post an update. Regards, Rohin. On Tue, Aug 1, 2017 at 10:55 PM, Jacob Vanderplas wrote: > Hi Rohin, > Ah, I see. I don't think a BallTree is the right data structure for an > anisotropic N-point query, because it fundamentally assumes spherical > symmetry of the metric. You may be able to do something like this with a > specialized KD-tree, but scikit-learn doesn't support this, and I don't > imagine that it ever will given the very specialized nature of the > application. > > I'm certain someone has written efficient code for this operation in the > astronomy community, but I don't know of any good Python package to > recommend for this ? I'd suggest googling for keywords and seeing where > that gets you. > > Thanks, > Jake > > Jake VanderPlas > Senior Data Science Fellow > Director of Open Software > University of Washington eScience Institute > > On Tue, Aug 1, 2017 at 6:15 AM, Rohin Kumar wrote: > >> Since you seem to be from Astrophysics/Cosmology background (I am >> assuming you are jakevdp - the creator of astroML - if you are - I am >> lucky!), I can explain my application scenario. I am trying to calculate >> the anisotropic two-point correlation function something like done in >> rp_pi_tpcf >> >> or s_mu_tpcf >> >> using pairs (DD,DR,RR) calculated from BallTree.two_point_correlation >> >> In halotools (http://halotools.readthedocs.io/en/latest/function_usage/mo >> ck_observables_functions.html) it is implemented using rectangular >> grids. I could calculate 2pcf with custom metrics using one variable with >> BallTree as done in astroML. I intend to find the anisotropic counter part. >> >> Thanks & Regards, >> Rohin >> >> >> On Tue, Aug 1, 2017 at 5:18 PM, Rohin Kumar >> wrote: >> >>> Dear Jake, >>> >>> Thanks for your response. I meant to group/count pairs in boxes (using >>> two arrays simultaneously-hence needing 2 metrics) instead of one distance >>> array as the binning parameter. I don't know if the algorithm supports such >>> a thing. For now, I am proceeding with your suggestion of two ball trees at >>> huge computational cost. I hope I am able to frame my question properly. >>> >>> Thanks & Regards, >>> Rohin. >>> >>> >>> >>> On Mon, Jul 31, 2017 at 8:16 PM, Jacob Vanderplas < >>> jakevdp at cs.washington.edu> wrote: >>> >>>> On Sun, Jul 30, 2017 at 11:18 AM, Rohin Kumar >>>> wrote: >>>> >>>>> *update* >>>>> >>>>> May be it doesn't have to be done at the tree creation level. It could >>>>> be using loops and creating two different balltrees. 
Something like >>>>> >>>>> tree1=BallTree(X,metric='metric1') #for x-z plane >>>>> tree2=BallTree(X,metric='metric2') #for y-z plane >>>>> >>>>> And then calculate correlation functions in a loop to get >>>>> tpcf(X,r1,r2) using tree1.two_point_correlation(X,r1) and >>>>> tree2.two_point_correlation(X,r2) >>>>> >>>> >>>> Hi Rohin, >>>> It's not exactly clear to me what you wish the tree to do with the two >>>> different metrics, but in any case the ball tree only supports one metric >>>> at a time. If you can construct your desired result from two ball trees >>>> each with its own metric, then that's probably the best way to proceed, >>>> Jake >>>> >>>> >>>>> >>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>> >>>>> >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> >>> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jakevdp at cs.washington.edu Tue Aug 1 13:59:21 2017 From: jakevdp at cs.washington.edu (Jacob Vanderplas) Date: Tue, 1 Aug 2017 10:59:21 -0700 Subject: [scikit-learn] Nearest neighbor search with 2 distance measures In-Reply-To: References: <379121501436421@mxfront4j.mail.yandex.net> Message-ID: On Tue, Aug 1, 2017 at 10:50 AM, Rohin Kumar wrote: > I started with KD-tree but after realising it doesn't support custom > metrics (I don't know the reason for this - would be nice feature) > The scikit-learn KD-tree doesn't support custom metrics because it utilizes relatively strong assumptions about the form of the metric when constructing the tree. The Ball Tree makes fewer assumptions, which is why it can support arbitrary metrics. It would in principal be possible to create a KD Tree that supports custom *axis-aligned* metrics, but again I think that would be too specialized for inclusion in scikit-learn. One project you might check out is cykdtree: https://pypi.python.org/pypi/cykdtree I'm not certain whether it supports the queries you need, but I would bet the team behind that would be willing to work toward these sorts of specialized queries if they don't already exist. Jake > I shifted to BallTree and was looking for a 2 metric based categorisation. > After looking around, the best I could find at most were brute-force > methods written in python (had my own version too) or better optimised ones > in C or FORTRAN. The closest one was halotools which again works with > euclidean metric. For now, I will try to get my work done with 2 different > BallTrees iteratively in bins. If I find a better option will try to post > an update. > > Regards, > Rohin. > > > On Tue, Aug 1, 2017 at 10:55 PM, Jacob Vanderplas < > jakevdp at cs.washington.edu> wrote: > >> Hi Rohin, >> Ah, I see. I don't think a BallTree is the right data structure for an >> anisotropic N-point query, because it fundamentally assumes spherical >> symmetry of the metric. 
You may be able to do something like this with a >> specialized KD-tree, but scikit-learn doesn't support this, and I don't >> imagine that it ever will given the very specialized nature of the >> application. >> >> I'm certain someone has written efficient code for this operation in the >> astronomy community, but I don't know of any good Python package to >> recommend for this ? I'd suggest googling for keywords and seeing where >> that gets you. >> >> Thanks, >> Jake >> >> Jake VanderPlas >> Senior Data Science Fellow >> Director of Open Software >> University of Washington eScience Institute >> >> On Tue, Aug 1, 2017 at 6:15 AM, Rohin Kumar >> wrote: >> >>> Since you seem to be from Astrophysics/Cosmology background (I am >>> assuming you are jakevdp - the creator of astroML - if you are - I am >>> lucky!), I can explain my application scenario. I am trying to calculate >>> the anisotropic two-point correlation function something like done in >>> rp_pi_tpcf >>> >>> or s_mu_tpcf >>> >>> using pairs (DD,DR,RR) calculated from BallTree.two_point_correlation >>> >>> In halotools (http://halotools.readthedocs. >>> io/en/latest/function_usage/mock_observables_functions.html) it is >>> implemented using rectangular grids. I could calculate 2pcf with custom >>> metrics using one variable with BallTree as done in astroML. I intend to >>> find the anisotropic counter part. >>> >>> Thanks & Regards, >>> Rohin >>> >>> >>> On Tue, Aug 1, 2017 at 5:18 PM, Rohin Kumar >>> wrote: >>> >>>> Dear Jake, >>>> >>>> Thanks for your response. I meant to group/count pairs in boxes (using >>>> two arrays simultaneously-hence needing 2 metrics) instead of one distance >>>> array as the binning parameter. I don't know if the algorithm supports such >>>> a thing. For now, I am proceeding with your suggestion of two ball trees at >>>> huge computational cost. I hope I am able to frame my question properly. >>>> >>>> Thanks & Regards, >>>> Rohin. >>>> >>>> >>>> >>>> On Mon, Jul 31, 2017 at 8:16 PM, Jacob Vanderplas < >>>> jakevdp at cs.washington.edu> wrote: >>>> >>>>> On Sun, Jul 30, 2017 at 11:18 AM, Rohin Kumar >>>>> wrote: >>>>> >>>>>> *update* >>>>>> >>>>>> May be it doesn't have to be done at the tree creation level. It >>>>>> could be using loops and creating two different balltrees. Something like >>>>>> >>>>>> tree1=BallTree(X,metric='metric1') #for x-z plane >>>>>> tree2=BallTree(X,metric='metric2') #for y-z plane >>>>>> >>>>>> And then calculate correlation functions in a loop to get >>>>>> tpcf(X,r1,r2) using tree1.two_point_correlation(X,r1) and >>>>>> tree2.two_point_correlation(X,r2) >>>>>> >>>>> >>>>> Hi Rohin, >>>>> It's not exactly clear to me what you wish the tree to do with the two >>>>> different metrics, but in any case the ball tree only supports one metric >>>>> at a time. 
If you can construct your desired result from two ball trees >>>>> each with its own metric, then that's probably the best way to proceed, >>>>> Jake >>>>> >>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> scikit-learn mailing list >>>>>> scikit-learn at python.org >>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>> >>>>>> >>>>> >>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>> >>>>> >>>> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sambarnett95 at gmail.com Wed Aug 2 08:38:50 2017 From: sambarnett95 at gmail.com (Sam Barnett) Date: Wed, 2 Aug 2017 13:38:50 +0100 Subject: [scikit-learn] Problems with running GridSearchCV on a pipeline with a custom transformer Message-ID: Dear all, I have created a 2-step pipeline with a custom transformer followed by a simple SVC classifier, and I wish to run a grid-search over it. I am able to successfully create the transformer and the pipeline, and each of these elements work fine. However, when I try to use the fit() method on my GridSearchCV object, I get the following error: 57 # during fit. 58 if X.shape != self.input_shape_: ---> 59 raise ValueError('Shape of input is different from what was seen ' 60 'in `fit`') 61 ValueError: Shape of input is different from what was seen in `fit` For a full breakdown of the problem, I have written a Jupyter notebook showing exactly how the error occurs (this also contains all .py files necessary to run the notebook). Can anybody see how to work through this? Many thanks, Sam Barnett -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Sequential Kernel Test.zip Type: application/zip Size: 6759 bytes Desc: not available URL: From viewsonic234 at aim.com Wed Aug 2 11:36:24 2017 From: viewsonic234 at aim.com (Chris Carrion) Date: Wed, 2 Aug 2017 11:36:24 -0400 Subject: [scikit-learn] minibatchkmeans deprecation warning? Message-ID: <3xMy9f2YqXzFqm1@mail.python.org> Hi, I?m working in an environment provided by Quantopian, an algorithmic-traders hub for research. I imported the minibatch kmeans from sklearn.clusters in the environment they provided, but I?m getting a deprecation warning. After reaching out to Quantopian support, they claim it?s something with the way sklearn is coded, and nothing can be done on their end. I was wondering whether this was true or not. Curious, Chris -------------- next part -------------- An HTML attachment was scrubbed... URL: From t3kcit at gmail.com Wed Aug 2 12:05:17 2017 From: t3kcit at gmail.com (Andreas Mueller) Date: Wed, 2 Aug 2017 12:05:17 -0400 Subject: [scikit-learn] Problems with running GridSearchCV on a pipeline with a custom transformer In-Reply-To: References: Message-ID: Hi Sam. 
GridSearchCV will do cross-validation, which requires to "transform" the test data. The shape of the test-data will be different from the shape of the training data. You need to have the ability to compute the kernel between the training data and new test data. A more hacky solution would be to compute the full kernel matrix in advance and pass that to GridSearchCV. You probably don't need it here, but you should also checkout what the _pairwise attribute does in cross-validation, because that it likely to come up when playing with kernels. Hth, Andy On 08/02/2017 08:38 AM, Sam Barnett wrote: > Dear all, > > I have created a 2-step pipeline with a custom transformer followed by > a simple SVC classifier, and I wish to run a grid-search over it. I am > able to successfully create the transformer and the pipeline, and each > of these elements work fine. However, when I try to use the fit() > method on my GridSearchCV object, I get the following error: > > 57 # during fit. > 58 if X.shape != self.input_shape_: > ---> 59 raise ValueError('Shape of input is different from > what was seen ' > 60 'in `fit`') > 61 > > ValueError: Shape of input is different from what was seen in `fit` > > For a full breakdown of the problem, I have written a Jupyter notebook > showing exactly how the error occurs (this also contains all .py files > necessary to run the notebook). Can anybody see how to work through this? > > Many thanks, > Sam Barnett > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From t3kcit at gmail.com Wed Aug 2 12:05:44 2017 From: t3kcit at gmail.com (Andreas Mueller) Date: Wed, 2 Aug 2017 12:05:44 -0400 Subject: [scikit-learn] minibatchkmeans deprecation warning? In-Reply-To: <3xMy9f2YqXzFqm1@mail.python.org> References: <3xMy9f2YqXzFqm1@mail.python.org> Message-ID: Hi Chris. What is the warning? Andy On 08/02/2017 11:36 AM, Chris Carrion via scikit-learn wrote: > > Hi, > > I?m working in an environment provided by Quantopian, an > algorithmic-traders hub for research. I imported the minibatch kmeans > from sklearn.clusters in the environment they provided, but I?m > getting a deprecation warning. After reaching out to Quantopian > support, they claim it?s something with the way sklearn is coded, and > nothing can be done on their end. I was wondering whether this was > true or not. > > Curious, > > Chris > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From viewsonic234 at aim.com Wed Aug 2 12:10:30 2017 From: viewsonic234 at aim.com (Chris Carrion) Date: Wed, 2 Aug 2017 12:10:30 -0400 Subject: [scikit-learn] minibatchkmeans deprecation warning? In-Reply-To: References: <3xMy9f2YqXzFqm1@mail.python.org> Message-ID: <3xMypN0FjFzFqw2@mail.python.org> Hi Andy, WARN sklearn/cluster/k_means_.py:1301: DeprecationWarning: This function is deprecated. Please call randint(0, 179 + 1) instead That?s all I?m given From: Andreas Mueller Sent: Wednesday, August 2, 2017 12:09 PM To: Chris Carrion via scikit-learn Subject: Re: [scikit-learn] minibatchkmeans deprecation warning? Hi Chris. What is the warning? Andy On 08/02/2017 11:36 AM, Chris Carrion via scikit-learn wrote: Hi, ? 
I'm working in an environment provided by Quantopian, an algorithmic-traders hub for research. I imported the minibatch kmeans from sklearn.clusters in the environment they provided, but I'm getting a deprecation warning. After reaching out to Quantopian support, they claim it's something with the way sklearn is coded, and nothing can be done on their end. I was wondering whether this was true or not. Curious, Chris _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From t3kcit at gmail.com Wed Aug 2 12:32:03 2017 From: t3kcit at gmail.com (Andreas Mueller) Date: Wed, 2 Aug 2017 12:32:03 -0400 Subject: [scikit-learn] minibatchkmeans deprecation warning? In-Reply-To: <3xMypN0FjFzFqw2@mail.python.org> References: <3xMy9f2YqXzFqm1@mail.python.org> <3xMypN0FjFzFqw2@mail.python.org> Message-ID: <66043d0e-dce5-ebac-a100-31bc02760aa3@gmail.com> Ah. That's actually a deprecation warning coming from numpy, and I think it'll be removed in 0.19 (if not already in 0.18.1). It's really nothing to worry about, though. 
Andy On 08/02/2017 12:10 PM, Chris Carrion via scikit-learn wrote: Hi Andy, WARN sklearn/cluster/k_means_.py:1301: DeprecationWarning: This function is deprecated. Please call randint(0, 179 + 1) instead ? That?s all I?m given From: Andreas Mueller Sent: Wednesday, August 2, 2017 12:09 PM To: Chris Carrion via scikit-learn Subject: Re: [scikit-learn] minibatchkmeans deprecation warning? ? Hi Chris. What is the warning? Andy On 08/02/2017 11:36 AM, Chris Carrion via scikit-learn wrote: Hi, ? I?m working in an environment provided by Quantopian, an algorithmic-traders hub for research. I imported the minibatch kmeans from sklearn.clusters in the environment they provided, but I?m getting a deprecation warning. After reaching out to Quantopian support, they claim it?s something with the way sklearn is coded, and nothing can be done on their end. I was wondering whether this was true or not. ? Curious, Chris _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn ? ? _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From viewsonic234 at aim.com Wed Aug 2 12:48:06 2017 From: viewsonic234 at aim.com (Chris Carrion) Date: Wed, 2 Aug 2017 12:48:06 -0400 Subject: [scikit-learn] minibatchkmeans deprecation warning? In-Reply-To: <66043d0e-dce5-ebac-a100-31bc02760aa3@gmail.com> References: <3xMy9f2YqXzFqm1@mail.python.org> <3xMypN0FjFzFqw2@mail.python.org> <66043d0e-dce5-ebac-a100-31bc02760aa3@gmail.com> Message-ID: <3xMzdn0TcWzFqVr@mail.python.org> Before I forget, is there an ETA for .19, or an average time between upgrades? From: Andreas Mueller Sent: Wednesday, August 2, 2017 12:34 PM To: Chris Carrion via scikit-learn Subject: Re: [scikit-learn] minibatchkmeans deprecation warning? Ah. That's actually a deprecation warning coming from numpy, and it think it'll be removed in 0.19 (if not already in 0.18.1). It's really nothing to worry about, though. Andy On 08/02/2017 12:10 PM, Chris Carrion via scikit-learn wrote: Hi Andy, WARN sklearn/cluster/k_means_.py:1301: DeprecationWarning: This function is deprecated. Please call randint(0, 179 + 1) instead ? That?s all I?m given From: Andreas Mueller Sent: Wednesday, August 2, 2017 12:09 PM To: Chris Carrion via scikit-learn Subject: Re: [scikit-learn] minibatchkmeans deprecation warning? ? Hi Chris. What is the warning? Andy On 08/02/2017 11:36 AM, Chris Carrion via scikit-learn wrote: Hi, ? I?m working in an environment provided by Quantopian, an algorithmic-traders hub for research. I imported the minibatch kmeans from sklearn.clusters in the environment they provided, but I?m getting a deprecation warning. After reaching out to Quantopian support, they claim it?s something with the way sklearn is coded, and nothing can be done on their end. I was wondering whether this was true or not. ? Curious, Chris _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn ? ? _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From t3kcit at gmail.com Wed Aug 2 14:36:02 2017 From: t3kcit at gmail.com (Andreas Mueller) Date: Wed, 2 Aug 2017 14:36:02 -0400 Subject: [scikit-learn] minibatchkmeans deprecation warning? In-Reply-To: <3xMzdn0TcWzFqVr@mail.python.org> References: <3xMy9f2YqXzFqm1@mail.python.org> <3xMypN0FjFzFqw2@mail.python.org> <66043d0e-dce5-ebac-a100-31bc02760aa3@gmail.com> <3xMzdn0TcWzFqVr@mail.python.org> Message-ID: <31bc0362-f3b4-94af-b240-0a1d4bb9e7e0@gmail.com> The docs say 3 month, I think. Though it's been more like 8. 0.19 will come out in August. On 08/02/2017 12:48 PM, Chris Carrion via scikit-learn wrote: > > Before I forget, is there an ETA for .19, or an average time between > upgrades? > > *From: *Andreas Mueller > *Sent: *Wednesday, August 2, 2017 12:34 PM > *To: *Chris Carrion via scikit-learn > *Subject: *Re: [scikit-learn] minibatchkmeans deprecation warning? > > Ah. > That's actually a deprecation warning coming from numpy, and it think > it'll be removed in 0.19 (if not already in 0.18.1). > It's really nothing to worry about, though. > > Andy > > On 08/02/2017 12:10 PM, Chris Carrion via scikit-learn wrote: > > Hi Andy, > > WARNsklearn/cluster/k_means_.py:1301: DeprecationWarning: This > function is deprecated. Please call randint(0, 179 + 1) instead > > That?s all I?m given > > *From: *Andreas Mueller > *Sent: *Wednesday, August 2, 2017 12:09 PM > *To: *Chris Carrion via scikit-learn > *Subject: *Re: [scikit-learn] minibatchkmeans deprecation warning? > > Hi Chris. > > What is the warning? > > Andy > > On 08/02/2017 11:36 AM, Chris Carrion via scikit-learn wrote: > > Hi, > > I?m working in an environment provided by Quantopian, an > algorithmic-traders hub for research. I imported the minibatch > kmeans from sklearn.clusters in the environment they provided, > but I?m getting a deprecation warning. After reaching out to > Quantopian support, they claim it?s something with the way > sklearn is coded, and nothing can be done on their end. I was > wondering whether this was true or not. > > Curious, > > Chris > > > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From sambarnett95 at gmail.com Wed Aug 2 15:08:07 2017 From: sambarnett95 at gmail.com (Sam Barnett) Date: Wed, 2 Aug 2017 20:08:07 +0100 Subject: [scikit-learn] Problems with running GridSearchCV on a pipeline with a custom transformer In-Reply-To: References: Message-ID: Hi Andy, The purpose of the transformer is to take an ordinary kernel (in this case I have taken 'rbf' as a default) and return a 'sequentialised' kernel using a few extra parameters. Hence, the transformer takes an ordinary data-target pair X, y as its input, and the fit_transform(X, y) method will output the Gram matrix for X that is associated with this sequentialised kernel. In the pipeline, this Gram matrix is passed into an SVC classifier with the kernel parameter set to 'precomputed'. Therefore, I do not think your hacky solution would be possible. 
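For concreteness, the non-hacky route (computing the kernel between new data and the training data) usually amounts to the transformer remembering the training set in fit() and returning the cross-kernel in transform(), so its output has shape (n_samples, n_train) both when fitting and when predicting. A minimal sketch, using a plain RBF kernel as a stand-in for the custom sequentialised kernel (all names are illustrative, not the attached code):

    import numpy as np
    from sklearn.base import BaseEstimator, TransformerMixin
    from sklearn.metrics.pairwise import rbf_kernel
    from sklearn.pipeline import Pipeline
    from sklearn.svm import SVC
    from sklearn.model_selection import GridSearchCV

    class KernelTransformer(BaseEstimator, TransformerMixin):
        """Return the kernel between X and the data seen in fit()."""
        def __init__(self, gamma=1.0):
            self.gamma = gamma
        def fit(self, X, y=None):
            self.X_fit_ = X            # remember the training set
            return self
        def transform(self, X):
            # shape (n_samples, n_samples_fit): valid for train *and* test data
            return rbf_kernel(X, self.X_fit_, gamma=self.gamma)

    X = np.random.RandomState(0).rand(60, 5)
    y = (X[:, 0] > 0.5).astype(int)

    pipe = Pipeline([('kern', KernelTransformer()),
                     ('svc', SVC(kernel='precomputed'))])
    grid = GridSearchCV(pipe, {'kern__gamma': [0.1, 1.0, 10.0],
                               'svc__C': [0.1, 1.0, 10.0]})
    grid.fit(X, y)

Because the transform output always has one column per *training* sample, its shape stays consistent between fit and predict, which is what the original ValueError was complaining about.
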
However, I am still unsure how to implement your first solution: won't the Gram matrix from the transformer contain all the necessary kernel values? Could you elaborate further? Best, Sam On Wed, Aug 2, 2017 at 5:05 PM, Andreas Mueller wrote: > Hi Sam. > GridSearchCV will do cross-validation, which requires to "transform" the > test data. > The shape of the test-data will be different from the shape of the > training data. > You need to have the ability to compute the kernel between the training > data and new test data. > > A more hacky solution would be to compute the full kernel matrix in > advance and pass that to GridSearchCV. > > You probably don't need it here, but you should also checkout what the > _pairwise attribute does in cross-validation, > because that it likely to come up when playing with kernels. > > Hth, > Andy > > > On 08/02/2017 08:38 AM, Sam Barnett wrote: > > Dear all, > > I have created a 2-step pipeline with a custom transformer followed by a > simple SVC classifier, and I wish to run a grid-search over it. I am able > to successfully create the transformer and the pipeline, and each of these > elements work fine. However, when I try to use the fit() method on my > GridSearchCV object, I get the following error: > > 57 # during fit. > 58 if X.shape != self.input_shape_: > ---> 59 raise ValueError('Shape of input is different from > what was seen ' > 60 'in `fit`') > 61 > > ValueError: Shape of input is different from what was seen in `fit` > > For a full breakdown of the problem, I have written a Jupyter notebook > showing exactly how the error occurs (this also contains all .py files > necessary to run the notebook). Can anybody see how to work through this? > > Many thanks, > Sam Barnett > > > > _______________________________________________ > scikit-learn mailing listscikit-learn at python.orghttps://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pybokeh at gmail.com Wed Aug 2 22:01:36 2017 From: pybokeh at gmail.com (pybokeh) Date: Wed, 2 Aug 2017 22:01:36 -0400 Subject: [scikit-learn] Help With Text Classification Message-ID: Hello, I am studying this example from scikit-learn's site: http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_ data.html The problem that I need to solve is very similar to this example, except I have one additional feature column (part #) that is categorical of type string. My label or target values consist of just 2 values: 0 or 1. With that additional feature column, I am transforming it with a LabelEncoder and then I am encoding it with the OneHotEncoder. Then I am concatenating that one-hot encoded column (part #) to the text/document feature column (complaint), which I had applied the CountVectorizer and TfidfTransformer transformations. Then I chose the MultinomialNB model to fit my concatenated training data with. The problem I run into is when I invoke the prediction, I get a dimension mis-match error. Here's my jupyter notebook gist: http://nbviewer.jupyter.org/gist/anonymous/59ba930a783571c85ef86ba41424b311 I would gladly appreciate it if someone can guide me where I went wrong. Thanks! - Daniel -------------- next part -------------- An HTML attachment was scrubbed... 
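One common way to avoid the dimension mismatch described in this question is to keep both columns inside a single Pipeline, so exactly the same fitted vectorizers are reused at prediction time (transform only, never a second fit_transform). A rough sketch, assuming a pandas DataFrame with 'complaint' and 'part_no' columns; the small selector/dict helpers are illustrative, not part of scikit-learn:

    import pandas as pd
    from sklearn.base import BaseEstimator, TransformerMixin
    from sklearn.pipeline import Pipeline, FeatureUnion
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.feature_extraction import DictVectorizer
    from sklearn.naive_bayes import MultinomialNB

    class ColumnSelector(BaseEstimator, TransformerMixin):
        """Pick a single column out of a DataFrame (illustrative helper)."""
        def __init__(self, column):
            self.column = column
        def fit(self, X, y=None):
            return self
        def transform(self, X):
            return X[self.column]

    class AsDictRecords(BaseEstimator, TransformerMixin):
        """Wrap a categorical column as dicts so DictVectorizer can one-hot it."""
        def fit(self, X, y=None):
            return self
        def transform(self, X):
            return [{'part_no': value} for value in X]

    text_branch = Pipeline([('select', ColumnSelector('complaint')),
                            ('tfidf', TfidfVectorizer())])
    part_branch = Pipeline([('select', ColumnSelector('part_no')),
                            ('as_dict', AsDictRecords()),
                            ('onehot', DictVectorizer())])

    model = Pipeline([('features', FeatureUnion([('text', text_branch),
                                                 ('part', part_branch)])),
                      ('clf', MultinomialNB())])

    # Made-up rows with the two feature columns described in the question.
    df = pd.DataFrame({'complaint': ['engine makes noise', 'door rattles',
                                     'engine stalls', 'paint is chipping'],
                       'part_no': ['P123', 'P456', 'P123', 'P789']})
    labels = [1, 0, 1, 0]

    model.fit(df, labels)
    print(model.predict(df))   # the same fitted transformers are reused here
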
URL: From joel.nothman at gmail.com Wed Aug 2 22:38:34 2017 From: joel.nothman at gmail.com (Joel Nothman) Date: Thu, 3 Aug 2017 12:38:34 +1000 Subject: [scikit-learn] Help With Text Classification In-Reply-To: References: Message-ID: Use a Pipeline to help avoid this kind of issue (and others). You might also want to do something like http://scikit-learn.org/stable/auto_examples/hetero_feature_union.html On 3 August 2017 at 12:01, pybokeh wrote: > Hello, > I am studying this example from scikit-learn's site: > http://scikit-learn.org/stable/tutorial/text_analytics/ > working_with_text_data.html > > The problem that I need to solve is very similar to this example, except I > have one > additional feature column (part #) that is categorical of type string. My > label or target > values consist of just 2 values: 0 or 1. > > With that additional feature column, I am transforming it with a > LabelEncoder and > then I am encoding it with the OneHotEncoder. > > Then I am concatenating that one-hot encoded column (part #) to the > text/document > feature column (complaint), which I had applied the CountVectorizer and > TfidfTransformer transformations. > > Then I chose the MultinomialNB model to fit my concatenated training data > with. > > The problem I run into is when I invoke the prediction, I get a dimension > mis-match error. > > Here's my jupyter notebook gist: > http://nbviewer.jupyter.org/gist/anonymous/59ba930a783571c85 > ef86ba41424b311 > > I would gladly appreciate it if someone can guide me where I went wrong. > Thanks! > > - Daniel > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pybokeh at gmail.com Wed Aug 2 23:12:36 2017 From: pybokeh at gmail.com (pybokeh) Date: Wed, 2 Aug 2017 23:12:36 -0400 Subject: [scikit-learn] Help With Text Classification In-Reply-To: References: Message-ID: Thanks Joel for recommending FeatureUnion. I did run across that. But for just 2 features, I thought that might be overkill. I am aware of Pipeline which the scikit-learn example explains very well, which I was going to utilize once I finalize my script. I did not want to abstract away too much early on since I am in the beginning stages of learning machine learning and scikit-learn. - Daniel On Wed, Aug 2, 2017 at 10:38 PM, Joel Nothman wrote: > Use a Pipeline to help avoid this kind of issue (and others). You might > also want to do something like http://scikit-learn.org/ > stable/auto_examples/hetero_feature_union.html > > On 3 August 2017 at 12:01, pybokeh wrote: > >> Hello, >> I am studying this example from scikit-learn's site: >> http://scikit-learn.org/stable/tutorial/text_analytics/worki >> ng_with_text_data.html >> >> The problem that I need to solve is very similar to this example, except >> I have one >> additional feature column (part #) that is categorical of type string. >> My label or target >> values consist of just 2 values: 0 or 1. >> >> With that additional feature column, I am transforming it with a >> LabelEncoder and >> then I am encoding it with the OneHotEncoder. >> >> Then I am concatenating that one-hot encoded column (part #) to the >> text/document >> feature column (complaint), which I had applied the CountVectorizer and >> TfidfTransformer transformations. >> >> Then I chose the MultinomialNB model to fit my concatenated training data >> with. 
>> >> The problem I run into is when I invoke the prediction, I get a dimension >> mis-match error. >> >> Here's my jupyter notebook gist: >> http://nbviewer.jupyter.org/gist/anonymous/59ba930a783571c85 >> ef86ba41424b311 >> >> I would gladly appreciate it if someone can guide me where I went wrong. >> Thanks! >> >> - Daniel >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yrohinkumar at gmail.com Wed Aug 2 23:42:58 2017 From: yrohinkumar at gmail.com (Rohin Kumar) Date: Thu, 3 Aug 2017 09:12:58 +0530 Subject: [scikit-learn] Nearest neighbor search with 2 distance measures In-Reply-To: References: <379121501436421@mxfront4j.mail.yandex.net> Message-ID: Dear Jake, Thank you for your inputs. Had a look at cykdtree. Core implementation of the algorithm is in C/C++ modifying which is currently beyond my skill. Will try to contact their team if they entertain special requests. I should be able fork and modify the sklearn algorithm in cython once my current project is complete. Currently going ahead with brute-force method. For now, this thread may be considered closed. Thanks once again! Regards, Rohin. On Tue, Aug 1, 2017 at 11:29 PM, Jacob Vanderplas wrote: > On Tue, Aug 1, 2017 at 10:50 AM, Rohin Kumar > wrote: > >> I started with KD-tree but after realising it doesn't support custom >> metrics (I don't know the reason for this - would be nice feature) >> > > The scikit-learn KD-tree doesn't support custom metrics because it > utilizes relatively strong assumptions about the form of the metric when > constructing the tree. The Ball Tree makes fewer assumptions, which is why > it can support arbitrary metrics. It would in principal be possible to > create a KD Tree that supports custom *axis-aligned* metrics, but again I > think that would be too specialized for inclusion in scikit-learn. > > One project you might check out is cykdtree: https://pypi.python. > org/pypi/cykdtree > I'm not certain whether it supports the queries you need, but I would bet > the team behind that would be willing to work toward these sorts of > specialized queries if they don't already exist. > > Jake > > > > >> I shifted to BallTree and was looking for a 2 metric based >> categorisation. After looking around, the best I could find at most were >> brute-force methods written in python (had my own version too) or better >> optimised ones in C or FORTRAN. The closest one was halotools which again >> works with euclidean metric. For now, I will try to get my work done with 2 >> different BallTrees iteratively in bins. If I find a better option will try >> to post an update. >> >> Regards, >> Rohin. >> >> >> On Tue, Aug 1, 2017 at 10:55 PM, Jacob Vanderplas < >> jakevdp at cs.washington.edu> wrote: >> >>> Hi Rohin, >>> Ah, I see. I don't think a BallTree is the right data structure for an >>> anisotropic N-point query, because it fundamentally assumes spherical >>> symmetry of the metric. You may be able to do something like this with a >>> specialized KD-tree, but scikit-learn doesn't support this, and I don't >>> imagine that it ever will given the very specialized nature of the >>> application. 
>>> >>> I'm certain someone has written efficient code for this operation in the >>> astronomy community, but I don't know of any good Python package to >>> recommend for this ? I'd suggest googling for keywords and seeing where >>> that gets you. >>> >>> Thanks, >>> Jake >>> >>> Jake VanderPlas >>> Senior Data Science Fellow >>> Director of Open Software >>> University of Washington eScience Institute >>> >>> On Tue, Aug 1, 2017 at 6:15 AM, Rohin Kumar >>> wrote: >>> >>>> Since you seem to be from Astrophysics/Cosmology background (I am >>>> assuming you are jakevdp - the creator of astroML - if you are - I am >>>> lucky!), I can explain my application scenario. I am trying to calculate >>>> the anisotropic two-point correlation function something like done in >>>> rp_pi_tpcf >>>> >>>> or s_mu_tpcf >>>> >>>> using pairs (DD,DR,RR) calculated from BallTree.two_point_correlation >>>> >>>> In halotools (http://halotools.readthedocs. >>>> io/en/latest/function_usage/mock_observables_functions.html) it is >>>> implemented using rectangular grids. I could calculate 2pcf with custom >>>> metrics using one variable with BallTree as done in astroML. I intend to >>>> find the anisotropic counter part. >>>> >>>> Thanks & Regards, >>>> Rohin >>>> >>>> >>>> On Tue, Aug 1, 2017 at 5:18 PM, Rohin Kumar >>>> wrote: >>>> >>>>> Dear Jake, >>>>> >>>>> Thanks for your response. I meant to group/count pairs in boxes (using >>>>> two arrays simultaneously-hence needing 2 metrics) instead of one distance >>>>> array as the binning parameter. I don't know if the algorithm supports such >>>>> a thing. For now, I am proceeding with your suggestion of two ball trees at >>>>> huge computational cost. I hope I am able to frame my question properly. >>>>> >>>>> Thanks & Regards, >>>>> Rohin. >>>>> >>>>> >>>>> >>>>> On Mon, Jul 31, 2017 at 8:16 PM, Jacob Vanderplas < >>>>> jakevdp at cs.washington.edu> wrote: >>>>> >>>>>> On Sun, Jul 30, 2017 at 11:18 AM, Rohin Kumar >>>>>> wrote: >>>>>> >>>>>>> *update* >>>>>>> >>>>>>> May be it doesn't have to be done at the tree creation level. It >>>>>>> could be using loops and creating two different balltrees. Something like >>>>>>> >>>>>>> tree1=BallTree(X,metric='metric1') #for x-z plane >>>>>>> tree2=BallTree(X,metric='metric2') #for y-z plane >>>>>>> >>>>>>> And then calculate correlation functions in a loop to get >>>>>>> tpcf(X,r1,r2) using tree1.two_point_correlation(X,r1) and >>>>>>> tree2.two_point_correlation(X,r2) >>>>>>> >>>>>> >>>>>> Hi Rohin, >>>>>> It's not exactly clear to me what you wish the tree to do with the >>>>>> two different metrics, but in any case the ball tree only supports one >>>>>> metric at a time. 
If you can construct your desired result from two ball >>>>>> trees each with its own metric, then that's probably the best way to >>>>>> proceed, >>>>>> Jake >>>>>> >>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> scikit-learn mailing list >>>>>>> scikit-learn at python.org >>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>>> >>>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> scikit-learn mailing list >>>>>> scikit-learn at python.org >>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>> >>>>>> >>>>> >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From joel.nothman at gmail.com Thu Aug 3 00:54:18 2017 From: joel.nothman at gmail.com (Joel Nothman) Date: Thu, 3 Aug 2017 14:54:18 +1000 Subject: [scikit-learn] Help With Text Classification In-Reply-To: References: Message-ID: One of the key advantages of Pipeline is that it makes sure that equivalent processing happens at training and prediction time (assuming you do not write your own transformers that break their contract). This is what appears to have broken in your current attempts. On 3 August 2017 at 13:12, pybokeh wrote: > Thanks Joel for recommending FeatureUnion. I did run across that. But > for just 2 features, I thought that might be overkill. I am aware of > Pipeline which the scikit-learn example explains very well, which I was > going to utilize once I finalize my script. I did not want to abstract > away too much early on since I am in the beginning stages of learning > machine learning and scikit-learn. > > - Daniel > > On Wed, Aug 2, 2017 at 10:38 PM, Joel Nothman > wrote: > >> Use a Pipeline to help avoid this kind of issue (and others). You might >> also want to do something like http://scikit-learn.org/stable >> /auto_examples/hetero_feature_union.html >> >> On 3 August 2017 at 12:01, pybokeh wrote: >> >>> Hello, >>> I am studying this example from scikit-learn's site: >>> http://scikit-learn.org/stable/tutorial/text_analytics/worki >>> ng_with_text_data.html >>> >>> The problem that I need to solve is very similar to this example, except >>> I have one >>> additional feature column (part #) that is categorical of type string. >>> My label or target >>> values consist of just 2 values: 0 or 1. >>> >>> With that additional feature column, I am transforming it with a >>> LabelEncoder and >>> then I am encoding it with the OneHotEncoder. >>> >>> Then I am concatenating that one-hot encoded column (part #) to the >>> text/document >>> feature column (complaint), which I had applied the CountVectorizer and >>> TfidfTransformer transformations. >>> >>> Then I chose the MultinomialNB model to fit my concatenated training >>> data with. 
>>> >>> The problem I run into is when I invoke the prediction, I get a >>> dimension mis-match error. >>> >>> Here's my jupyter notebook gist: >>> http://nbviewer.jupyter.org/gist/anonymous/59ba930a783571c85 >>> ef86ba41424b311 >>> >>> I would gladly appreciate it if someone can guide me where I went >>> wrong. Thanks! >>> >>> - Daniel >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abhishekraj10 at yahoo.com Thu Aug 3 06:15:50 2017 From: abhishekraj10 at yahoo.com (Abhishek Raj) Date: Thu, 3 Aug 2017 15:45:50 +0530 Subject: [scikit-learn] OneClassSvm | Different results on different runs Message-ID: Hi, I am using one class svm for developing an anomaly detection model. I observed that different runs of training on the same data set outputs different accuracy. One run takes the accuracy as high as 98% and another run on the same data brings it down to 93%. Googling a little bit I found out that this is happening because of the random_state parameter but I am not clear of the details. Can anyone expand on how is the parameter exactly affecting my training and how I can figure out the best value to get the model with best accuracy? Thanks, Abhishek -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaquesgrobler at gmail.com Thu Aug 3 06:39:44 2017 From: jaquesgrobler at gmail.com (Jaques Grobler) Date: Thu, 3 Aug 2017 12:39:44 +0200 Subject: [scikit-learn] OneClassSvm | Different results on different runs In-Reply-To: References: Message-ID: Hi, The random_state parameter is used to generate a pseudo random number that is used when shuffling your data for probability estimation The seed of the pseudo random number generator to use when shuffling the data for probability estimation. A seed can be provided to control the shuffling for reproducible behavior. Also, from the SVM docs The underlying LinearSVC > implementation > uses a random number generator to select features when fitting the model. > It is thus not uncommon, to have slightly different results for the same > input data. If that happens, try with a smaller *tol *parameter. Hope that helps 2017-08-03 12:15 GMT+02:00 Abhishek Raj via scikit-learn < scikit-learn at python.org>: > Hi, > > I am using one class svm for developing an anomaly detection model. I > observed that different runs of training on the same data set outputs > different accuracy. One run takes the accuracy as high as 98% and another > run on the same data brings it down to 93%. Googling a little bit I found > out that this is happening because of the random_state > parameter > but I am not clear of the details. > > Can anyone expand on how is the parameter exactly affecting my training > and how I can figure out the best value to get the model with best accuracy? 
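To make the reproducibility point concrete: in the scikit-learn version discussed in this thread OneClassSVM still exposes a random_state parameter, and fixing it (rather than searching over it) is what makes repeated fits comparable. A small sketch on synthetic data, where nu and gamma are arbitrary illustrative values:

import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(42)
X_train = rng.randn(500, 2)                             # toy "normal" observations
X_test = np.vstack([rng.randn(200, 2),                  # mostly normal ...
                    rng.uniform(-6, 6, size=(20, 2))])  # ... plus a few outliers

# Fix the seed so repeated runs are identical; the seed itself is not a
# quantity worth tuning for accuracy.
for seed in (0, 1, 2):
    clf = OneClassSVM(nu=0.1, gamma=0.1, tol=1e-4, random_state=seed)
    clf.fit(X_train)
    pred = clf.predict(X_test)                          # +1 inlier, -1 outlier
    print(seed, (pred == 1).mean())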
> > Thanks, > Abhishek > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From albertthomas88 at gmail.com Thu Aug 3 07:26:17 2017 From: albertthomas88 at gmail.com (Albert Thomas) Date: Thu, 03 Aug 2017 11:26:17 +0000 Subject: [scikit-learn] OneClassSvm | Different results on different runs In-Reply-To: References: Message-ID: Hi Abhishek, Could you provide a small code snippet? I don't think the random_state parameter should influence the result of the OneClassSVM as there is no probability estimation for this estimator. Albert On Thu, Aug 3, 2017 at 12:41 PM Jaques Grobler wrote: > Hi, > > The random_state parameter is used to generate a pseudo random number that > is used when shuffling your data for probability estimation > > The seed of the pseudo random number generator to use when shuffling the > data for probability estimation. > A seed can be provided to control the shuffling for reproducible behavior. > > Also, from the SVM docs > > > The underlying LinearSVC >> implementation >> uses a random number generator to select features when fitting the model. >> It is thus not uncommon, to have slightly different results for the same >> input data. If that happens, try with a smaller *tol *parameter. > > > Hope that helps > > 2017-08-03 12:15 GMT+02:00 Abhishek Raj via scikit-learn < > scikit-learn at python.org>: > >> Hi, >> >> I am using one class svm for developing an anomaly detection model. I >> observed that different runs of training on the same data set outputs >> different accuracy. One run takes the accuracy as high as 98% and another >> run on the same data brings it down to 93%. Googling a little bit I found >> out that this is happening because of the random_state >> parameter >> but I am not clear of the details. >> >> Can anyone expand on how is the parameter exactly affecting my training >> and how I can figure out the best value to get the model with best accuracy? >> >> Thanks, >> Abhishek >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From goix.nicolas at gmail.com Thu Aug 3 07:54:37 2017 From: goix.nicolas at gmail.com (Nicolas Goix) Date: Thu, 3 Aug 2017 13:54:37 +0200 Subject: [scikit-learn] OneClassSvm | Different results on different runs In-Reply-To: References: Message-ID: @albertcthomas isn't there some randomness in SMO which could influence the result if the tolerance parameter is too large? On Aug 3, 2017 1:28 PM, "Albert Thomas" wrote: > Hi Abhishek, > > Could you provide a small code snippet? I don't think the random_state > parameter should influence the result of the OneClassSVM as there is no > probability estimation for this estimator. > > Albert > > On Thu, Aug 3, 2017 at 12:41 PM Jaques Grobler > wrote: > >> Hi, >> >> The random_state parameter is used to generate a pseudo random number >> that is used when shuffling your data for probability estimation >> >> The seed of the pseudo random number generator to use when shuffling the >> data for probability estimation. 
>> A seed can be provided to control the shuffling for reproducible behavior. >> >> Also, from the SVM docs >> >> >> The underlying LinearSVC >>> >>> implementation uses a random number generator to select features when >>> fitting the model. It is thus not uncommon, to have slightly different >>> results for the same input data. If that happens, try with a smaller *tol >>> *parameter. >> >> >> Hope that helps >> >> 2017-08-03 12:15 GMT+02:00 Abhishek Raj via scikit-learn < >> scikit-learn at python.org>: >> >>> Hi, >>> >>> I am using one class svm for developing an anomaly detection model. I >>> observed that different runs of training on the same data set outputs >>> different accuracy. One run takes the accuracy as high as 98% and another >>> run on the same data brings it down to 93%. Googling a little bit I found >>> out that this is happening because of the random_state >>> parameter >>> but I am not clear of the details. >>> >>> Can anyone expand on how is the parameter exactly affecting my training >>> and how I can figure out the best value to get the model with best accuracy? >>> >>> Thanks, >>> Abhishek >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From albertthomas88 at gmail.com Thu Aug 3 09:17:38 2017 From: albertthomas88 at gmail.com (Albert Thomas) Date: Thu, 03 Aug 2017 13:17:38 +0000 Subject: [scikit-learn] OneClassSvm | Different results on different runs In-Reply-To: References: Message-ID: Yes, in fact, changing the random_state might have an influence on the result. The docstring of the random_state parameter for the OneClassSVM seems incorrect though... Albert On Thu, Aug 3, 2017 at 1:55 PM Nicolas Goix wrote: > @albertcthomas isn't there some randomness in SMO which could influence > the result if the tolerance parameter is too large? > > On Aug 3, 2017 1:28 PM, "Albert Thomas" wrote: > >> Hi Abhishek, >> >> Could you provide a small code snippet? I don't think the random_state >> parameter should influence the result of the OneClassSVM as there is no >> probability estimation for this estimator. >> >> Albert >> >> On Thu, Aug 3, 2017 at 12:41 PM Jaques Grobler >> wrote: >> >>> Hi, >>> >>> The random_state parameter is used to generate a pseudo random number >>> that is used when shuffling your data for probability estimation >>> >>> The seed of the pseudo random number generator to use when shuffling the >>> data for probability estimation. >>> A seed can be provided to control the shuffling for reproducible >>> behavior. >>> >>> Also, from the SVM docs >>> >>> >>> The underlying LinearSVC >>>> implementation >>>> uses a random number generator to select features when fitting the model. >>>> It is thus not uncommon, to have slightly different results for the same >>>> input data. If that happens, try with a smaller *tol *parameter. 
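One way to check the quoted advice empirically is to fit the same data with two different seeds and compare the resulting decision functions at a loose and at a tight tolerance; again this assumes a scikit-learn version in which OneClassSVM accepts random_state:

import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
X = rng.randn(300, 5)

for tol in (1e-1, 1e-4):
    scores = [OneClassSVM(nu=0.2, gamma=0.2, tol=tol, random_state=seed)
              .fit(X).decision_function(X).ravel()
              for seed in (0, 1)]
    # Maximum disagreement between the two fits at this tolerance.
    print(tol, np.abs(scores[0] - scores[1]).max())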
>>> >>> >>> Hope that helps >>> >>> 2017-08-03 12:15 GMT+02:00 Abhishek Raj via scikit-learn < >>> scikit-learn at python.org>: >>> >>>> Hi, >>>> >>>> I am using one class svm for developing an anomaly detection model. I >>>> observed that different runs of training on the same data set outputs >>>> different accuracy. One run takes the accuracy as high as 98% and another >>>> run on the same data brings it down to 93%. Googling a little bit I found >>>> out that this is happening because of the random_state >>>> parameter >>>> but I am not clear of the details. >>>> >>>> Can anyone expand on how is the parameter exactly affecting my training >>>> and how I can figure out the best value to get the model with best accuracy? >>>> >>>> Thanks, >>>> Abhishek >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.waseem.ahmad at gmail.com Thu Aug 3 10:37:02 2017 From: m.waseem.ahmad at gmail.com (muhammad waseem) Date: Thu, 3 Aug 2017 15:37:02 +0100 Subject: [scikit-learn] Extra trees tuning parameters Message-ID: Hi All, I was wondering if you could please tell me what is the "nmin , the minimum sample size for splitting a node" (referred by Geurts et al., 2006) in scikit-learn API for Extra trees? Is it min_samples_split in skearn? Regards Waseem -------------- next part -------------- An HTML attachment was scrubbed... URL: From tom.duprelatour at orange.fr Thu Aug 3 11:18:17 2017 From: tom.duprelatour at orange.fr (Tom DLT) Date: Thu, 3 Aug 2017 17:18:17 +0200 Subject: [scikit-learn] question about class_weights in LogisticRegression In-Reply-To: References: Message-ID: The class weights and sample weights are used in the same way, as a factor specific to each sample, in the loss function. In LogisticRegression, it is equivalent to incorporate this factor into a regularization parameter C specific to each sample. Tom 2017-08-01 18:30 GMT+02:00 Johnson, Jeremiah : > Right, I know how the class_weight calculation is performed. But then > those class weights are utilized during the model fit process in some way > in liblinear, and that?s what I am interested in. libSVM does > class_weight[I] * C (https://www.csie.ntu.edu.tw/~cjlin/libsvm/); is the > implementation in liblinear the same? > > Best, > Jeremiah > > > > On 8/1/17, 12:19 PM, "scikit-learn on behalf of Stuart Reynolds" > stuart at stuartreynolds.net> wrote: > > >I hope not. And not accoring to the docs... 
> >https://urldefense.proofpoint.com/v2/url?u=https- > 3A__github.com_scikit-2Dl > >earn_scikit-2Dlearn_blob_ab93d65_sklearn_linear- > 5Fmodel_logistic.py-23L947 > >&d=DwIGaQ&c=c6MrceVCY5m5A_KAUkrdoA&r=hQNTLb4Jonm4n54VBW80WEzIAaqvTO > cTEjhIk > >rRJWXo&m=2XR2z3VWvEaERt4miGabDte3xkz_FwzMKMwnvEOWj8o&s= > 4uJZS3EaQgysmQlzjt- > >yuLkSlcXTd5G50LkEFMcbZLQ&e= > > > >class_weight : dict or 'balanced', optional > >Weights associated with classes in the form ``{class_label: weight}``. > >If not given, all classes are supposed to have weight one. > >The "balanced" mode uses the values of y to automatically adjust > >weights inversely proportional to class frequencies in the input data > >as ``n_samples / (n_classes * np.bincount(y))``. > >Note that these weights will be multiplied with sample_weight (passed > >through the fit method) if sample_weight is specified. > > > >On Tue, Aug 1, 2017 at 9:03 AM, Johnson, Jeremiah > > wrote: > >> Hello all, > >> > >> I?m looking for confirmation on an implementation detail that is > >>somewhere > >> in liblinear, but I haven?t found documentation for yet. When the > >> class_weights=?balanced? parameter is set in LogisticRegression, then > >>the > >> regularisation parameter for an observation from class I is > >>class_weight[I] > >> * C, where C is the usual regularization parameter ? is this correct? > >> > >> Thanks, > >> Jeremiah > >> > >> > >> _______________________________________________ > >> scikit-learn mailing list > >> scikit-learn at python.org > >> > >>https://urldefense.proofpoint.com/v2/url?u=https- > 3A__mail.python.org_mail > >>man_listinfo_scikit-2Dlearn&d=DwIGaQ&c=c6MrceVCY5m5A_ > KAUkrdoA&r=hQNTLb4Jo > >>nm4n54VBW80WEzIAaqvTOcTEjhIkrRJWXo&m=2XR2z3VWvEaERt4miGabDte3xkz_ > FwzMKMwn > >>vEOWj8o&s=MgZoI9VOHFh3omGKHTASFx3NAVjj6AY3j_75mnOUg04&e= > >> > >_______________________________________________ > >scikit-learn mailing list > >scikit-learn at python.org > >https://urldefense.proofpoint.com/v2/url?u=https- > 3A__mail.python.org_mailm > >an_listinfo_scikit-2Dlearn&d=DwIGaQ&c=c6MrceVCY5m5A_ > KAUkrdoA&r=hQNTLb4Jonm > >4n54VBW80WEzIAaqvTOcTEjhIkrRJWXo&m=2XR2z3VWvEaERt4miGabDte3xkz_ > FwzMKMwnvEO > >Wj8o&s=MgZoI9VOHFh3omGKHTASFx3NAVjj6AY3j_75mnOUg04&e= > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From t3kcit at gmail.com Thu Aug 3 12:12:12 2017 From: t3kcit at gmail.com (Andreas Mueller) Date: Thu, 3 Aug 2017 12:12:12 -0400 Subject: [scikit-learn] OneClassSvm | Different results on different runs In-Reply-To: References: Message-ID: On 08/03/2017 09:17 AM, Albert Thomas wrote: > Yes, in fact, changing the random_state might have an influence on the > result. The docstring of the random_state parameter for the > OneClassSVM seems incorrect though... PR or issue welcome. From t3kcit at gmail.com Thu Aug 3 13:35:46 2017 From: t3kcit at gmail.com (Andreas Mueller) Date: Thu, 3 Aug 2017 13:35:46 -0400 Subject: [scikit-learn] Problems with running GridSearchCV on a pipeline with a custom transformer In-Reply-To: References: Message-ID: Hi Sam. You need to put these into a reachable namespace (possibly as private functions) so that they can be pickled. Please stay on the sklearn mailing list, I might not have time to reply. 
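To illustrate the "reachable namespace" point: pickle stores functions by their import path, so a kernel helper defined at module level can be pickled, while one defined inside another function (for example inside fit) cannot. The helper names below are invented for the example and are not the code from the attached notebook:

import pickle
import numpy as np

def linear_gram(X, Y):
    # Module-level helper: picklable by reference.
    return np.dot(X, Y.T)

def make_nested_gram():
    def nested_gram(X, Y):
        # Defined inside another function: the standard pickle cannot find it.
        return np.dot(X, Y.T)
    return nested_gram

pickle.dumps(linear_gram)                 # works
try:
    pickle.dumps(make_nested_gram())
except (pickle.PicklingError, AttributeError) as exc:
    print("nested function cannot be pickled:", exc)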
Andy On 08/03/2017 01:24 PM, Sam Barnett wrote: > Hi Andy, > > I've since tried a different solution: instead of a pipeline, I've > simply created a classifier that is for the most part like svm.SVC, > though it takes a few extra inputs for the sequentialisation step. > I've used a Python function that can compute the Gram matrix between > two datasets of any shape to pass into SVC(), though I'm now having > trouble with pickling on the check_estimator test. It appears that > SeqSVC.fit() doesn't like to have methods defined within it. Can you > see how to pass this test? (the .ipynb file shows the error). > > Best, > Sam > > On Wed, Aug 2, 2017 at 9:44 PM, Sam Barnett > wrote: > > You're right: it does fail without GridSearchCV when I change the > size of seq_test. I will look at the transform tomorrow to see if > I can work this out. Thank you for your help so far! > > On Wed, Aug 2, 2017 at 9:20 PM, Andreas Mueller > wrote: > > Change the size of seq_test in your notebook and you'll see > the failure without GridSearchCV. > I haven't looked at your code in detail, but transform is > supposed to work on arbitrary new data with the same number of > features. > Your code requires the test data to have the same shape as the > training data. > Cross-validation will lead to training data and test data > having different sizes. But I feel like something is already > wrong if your > test data size depends on your training data size. > > > > On 08/02/2017 03:08 PM, Sam Barnett wrote: >> Hi Andy, >> >> The purpose of the transformer is to take an ordinary kernel >> (in this case I have taken 'rbf' as a default) and return a >> 'sequentialised' kernel using a few extra parameters. Hence, >> the transformer takes an ordinary data-target pair X, y as >> its input, and the fit_transform(X, y) method will output the >> Gram matrix for X that is associated with this sequentialised >> kernel. In the pipeline, this Gram matrix is passed into an >> SVC classifier with the kernel parameter set to 'precomputed'. >> >> Therefore, I do not think your hacky solution would be >> possible. However, I am still unsure how to implement your >> first solution: won't the Gram matrix from the transformer >> contain all the necessary kernel values? Could you elaborate >> further? >> >> >> Best, >> Sam >> >> On Wed, Aug 2, 2017 at 5:05 PM, Andreas Mueller >> > wrote: >> >> Hi Sam. >> GridSearchCV will do cross-validation, which requires to >> "transform" the test data. >> The shape of the test-data will be different from the >> shape of the training data. >> You need to have the ability to compute the kernel >> between the training data and new test data. >> >> A more hacky solution would be to compute the full kernel >> matrix in advance and pass that to GridSearchCV. >> >> You probably don't need it here, but you should also >> checkout what the _pairwise attribute does in >> cross-validation, >> because that it likely to come up when playing with kernels. >> >> Hth, >> Andy >> >> >> On 08/02/2017 08:38 AM, Sam Barnett wrote: >>> Dear all, >>> >>> I have created a 2-step pipeline with a custom >>> transformer followed by a simple SVC classifier, and I >>> wish to run a grid-search over it. I am able to >>> successfully create the transformer and the pipeline, >>> and each of these elements work fine. However, when I >>> try to use the fit() method on my GridSearchCV object, I >>> get the following error: >>> >>> 57 # during fit. 
>>> 58 if X.shape != self.input_shape_: >>> ---> 59 raise ValueError('Shape of input is >>> different from what was seen ' >>> 60 'in `fit`') >>> 61 >>> >>> ValueError: Shape of input is different from what was >>> seen in `fit` >>> >>> For a full breakdown of the problem, I have written a >>> Jupyter notebook showing exactly how the error occurs >>> (this also contains all .py files necessary to run the >>> notebook). Can anybody see how to work through this? >>> >>> Many thanks, >>> Sam Barnett >>> >>> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> >> >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pybokeh at gmail.com Thu Aug 3 17:48:26 2017 From: pybokeh at gmail.com (pybokeh) Date: Thu, 3 Aug 2017 17:48:26 -0400 Subject: [scikit-learn] Help With Text Classification In-Reply-To: References: Message-ID: I found my problem. When I one-hot encoded my test part #, it resulted in being a 1x1 matrix, when I need it to be a 1x153. This happened because I used the default setting ('auto') for n_values, when I needed it set it to 153. Now when I horizontally stacked it to my other feature matrix, the resulting total # of columns now correctly comes to 1294, instead of 1142. Looking back now, not sure if using Pipeline or using FeatureUnion would have helped in this case or prevented this since this error occurred on the prediction side, not on training or modeling side. On Wed, Aug 2, 2017 at 10:38 PM, Joel Nothman wrote: > Use a Pipeline to help avoid this kind of issue (and others). You might > also want to do something like http://scikit-learn.org/ > stable/auto_examples/hetero_feature_union.html > > On 3 August 2017 at 12:01, pybokeh wrote: > >> Hello, >> I am studying this example from scikit-learn's site: >> http://scikit-learn.org/stable/tutorial/text_analytics/worki >> ng_with_text_data.html >> >> The problem that I need to solve is very similar to this example, except >> I have one >> additional feature column (part #) that is categorical of type string. >> My label or target >> values consist of just 2 values: 0 or 1. >> >> With that additional feature column, I am transforming it with a >> LabelEncoder and >> then I am encoding it with the OneHotEncoder. >> >> Then I am concatenating that one-hot encoded column (part #) to the >> text/document >> feature column (complaint), which I had applied the CountVectorizer and >> TfidfTransformer transformations. >> >> Then I chose the MultinomialNB model to fit my concatenated training data >> with. >> >> The problem I run into is when I invoke the prediction, I get a dimension >> mis-match error. >> >> Here's my jupyter notebook gist: >> http://nbviewer.jupyter.org/gist/anonymous/59ba930a783571c85 >> ef86ba41424b311 >> >> I would gladly appreciate it if someone can guide me where I went wrong. >> Thanks! 
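The dimension mismatch described above is the usual symptom of re-fitting an encoder on the prediction data. A minimal sketch of the alternative fix, with hypothetical part numbers: fit LabelEncoder and OneHotEncoder on the training column only and call transform, not fit_transform, at prediction time, so the one-hot block always has the same width:

import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

train_parts = np.array(['P-100', 'P-200', 'P-300', 'P-200'])
test_parts = np.array(['P-300'])

le = LabelEncoder()
ohe = OneHotEncoder()

# Fit both encoders on the training column ...
train_onehot = ohe.fit_transform(le.fit_transform(train_parts).reshape(-1, 1))

# ... and only transform at prediction time.
test_onehot = ohe.transform(le.transform(test_parts).reshape(-1, 1))

print(train_onehot.shape, test_onehot.shape)  # same number of columns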
>> >> - Daniel >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From joel.nothman at gmail.com Thu Aug 3 18:29:10 2017 From: joel.nothman at gmail.com (Joel Nothman) Date: Fri, 4 Aug 2017 08:29:10 +1000 Subject: [scikit-learn] Help With Text Classification In-Reply-To: References: Message-ID: pipeline helps in prediction time too. On 4 Aug 2017 7:49 am, "pybokeh" wrote: > I found my problem. When I one-hot encoded my test part #, it resulted in > being a 1x1 matrix, when I need it to be a 1x153. This happened because I > used the default setting ('auto') for n_values, when I needed it set it to > 153. Now when I horizontally stacked it to my other feature matrix, the > resulting total # of columns now correctly comes to 1294, instead of > 1142. Looking back now, not sure if using Pipeline or using FeatureUnion > would have helped in this case or prevented this since this error occurred > on the prediction side, not on training or modeling side. > > On Wed, Aug 2, 2017 at 10:38 PM, Joel Nothman > wrote: > >> Use a Pipeline to help avoid this kind of issue (and others). You might >> also want to do something like http://scikit-learn.org/stable >> /auto_examples/hetero_feature_union.html >> >> On 3 August 2017 at 12:01, pybokeh wrote: >> >>> Hello, >>> I am studying this example from scikit-learn's site: >>> http://scikit-learn.org/stable/tutorial/text_analytics/worki >>> ng_with_text_data.html >>> >>> The problem that I need to solve is very similar to this example, except >>> I have one >>> additional feature column (part #) that is categorical of type string. >>> My label or target >>> values consist of just 2 values: 0 or 1. >>> >>> With that additional feature column, I am transforming it with a >>> LabelEncoder and >>> then I am encoding it with the OneHotEncoder. >>> >>> Then I am concatenating that one-hot encoded column (part #) to the >>> text/document >>> feature column (complaint), which I had applied the CountVectorizer and >>> TfidfTransformer transformations. >>> >>> Then I chose the MultinomialNB model to fit my concatenated training >>> data with. >>> >>> The problem I run into is when I invoke the prediction, I get a >>> dimension mis-match error. >>> >>> Here's my jupyter notebook gist: >>> http://nbviewer.jupyter.org/gist/anonymous/59ba930a783571c85 >>> ef86ba41424b311 >>> >>> I would gladly appreciate it if someone can guide me where I went >>> wrong. Thanks! >>> >>> - Daniel >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 
From sambarnett95 at gmail.com  Fri Aug 4 06:29:50 2017
From: sambarnett95 at gmail.com (Sam Barnett)
Date: Fri, 4 Aug 2017 11:29:50 +0100
Subject: [scikit-learn] Problems with running GridSearchCV on a pipeline with a custom transformer
Message-ID: 

Hi Andy,

I have since been able to resolve the pickling issue, though the
check_estimator test now fails because the error message raised by my
estimator before fitting does not include the expected string 'fit'. In
general, I am trying to use the fit() method of my classifier to
instantiate a separate SVC() classifier with a custom kernel, fit THAT to
the data, then return this instance as the fitted version of the new
classifier. Is this possible in theory? If so, what is the best way to
implement it?

As before, the requisite code and a .ipynb file are attached.

Best,
Sam

On Thu, Aug 3, 2017 at 6:35 PM, Andreas Mueller wrote:

> Hi Sam.
> You need to put these into a reachable namespace (possibly as private
> functions) so that they can be pickled.
> Please stay on the sklearn mailing list, I might not have time to reply.
>
> Andy
>
>
> On 08/03/2017 01:24 PM, Sam Barnett wrote:
>
> Hi Andy,
>
> I've since tried a different solution: instead of a pipeline, I've simply
> created a classifier that is for the most part like svm.SVC, though it
> takes a few extra inputs for the sequentialisation step. I've used a Python
> function that can compute the Gram matrix between two datasets of any shape
> to pass into SVC(), though I'm now having trouble with pickling on the
> check_estimator test. It appears that SeqSVC.fit() doesn't like to have
> methods defined within it. Can you see how to pass this test? (the .ipynb
> file shows the error).
>
> Best,
> Sam
>
> On Wed, Aug 2, 2017 at 9:44 PM, Sam Barnett wrote:
>
>> You're right: it does fail without GridSearchCV when I change the size of
>> seq_test. I will look at the transform tomorrow to see if I can work this
>> out. Thank you for your help so far!
>>
>> On Wed, Aug 2, 2017 at 9:20 PM, Andreas Mueller wrote:
>>
>>> Change the size of seq_test in your notebook and you'll see the failure
>>> without GridSearchCV.
>>> I haven't looked at your code in detail, but transform is supposed to
>>> work on arbitrary new data with the same number of features.
>>> Your code requires the test data to have the same shape as the training
>>> data.
>>> Cross-validation will lead to training data and test data having
>>> different sizes. But I feel like something is already wrong if your
>>> test data size depends on your training data size.
>>>
>>>
>>>
>>> On 08/02/2017 03:08 PM, Sam Barnett wrote:
>>>
>>> Hi Andy,
>>>
>>> The purpose of the transformer is to take an ordinary kernel (in this
>>> case I have taken 'rbf' as a default) and return a 'sequentialised' kernel
>>> using a few extra parameters. Hence, the transformer takes an ordinary
>>> data-target pair X, y as its input, and the fit_transform(X, y) method will
>>> output the Gram matrix for X that is associated with this sequentialised
>>> kernel. In the pipeline, this Gram matrix is passed into an SVC classifier
>>> with the kernel parameter set to 'precomputed'.
>>>
>>> Therefore, I do not think your hacky solution would be possible.
>>> However, I am still unsure how to implement your first solution: won't the
>>> Gram matrix from the transformer contain all the necessary kernel values?
>>> Could you elaborate further?
>>>
>>>
>>> Best,
>>> Sam
>>>
>>> On Wed, Aug 2, 2017 at 5:05 PM, Andreas Mueller wrote:
>>>
>>>> Hi Sam.
>>>> GridSearchCV will do cross-validation, which requires to "transform" >>>> the test data. >>>> The shape of the test-data will be different from the shape of the >>>> training data. >>>> You need to have the ability to compute the kernel between the training >>>> data and new test data. >>>> >>>> A more hacky solution would be to compute the full kernel matrix in >>>> advance and pass that to GridSearchCV. >>>> >>>> You probably don't need it here, but you should also checkout what the >>>> _pairwise attribute does in cross-validation, >>>> because that it likely to come up when playing with kernels. >>>> >>>> Hth, >>>> Andy >>>> >>>> >>>> On 08/02/2017 08:38 AM, Sam Barnett wrote: >>>> >>>> Dear all, >>>> >>>> I have created a 2-step pipeline with a custom transformer followed by >>>> a simple SVC classifier, and I wish to run a grid-search over it. I am able >>>> to successfully create the transformer and the pipeline, and each of these >>>> elements work fine. However, when I try to use the fit() method on my >>>> GridSearchCV object, I get the following error: >>>> >>>> 57 # during fit. >>>> 58 if X.shape != self.input_shape_: >>>> ---> 59 raise ValueError('Shape of input is different from >>>> what was seen ' >>>> 60 'in `fit`') >>>> 61 >>>> >>>> ValueError: Shape of input is different from what was seen in `fit` >>>> >>>> For a full breakdown of the problem, I have written a Jupyter notebook >>>> showing exactly how the error occurs (this also contains all .py files >>>> necessary to run the notebook). Can anybody see how to work through this? >>>> >>>> Many thanks, >>>> Sam Barnett >>>> >>>> >>>> >>>> _______________________________________________ >>>> scikit-learn mailing listscikit-learn at python.orghttps://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> >>> >>> >>> _______________________________________________ >>> scikit-learn mailing listscikit-learn at python.orghttps://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >>> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: seqsvc.py Type: text/x-python-script Size: 3051 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Sequential Kernel SVC GridSearchCV Test.ipynb Type: application/octet-stream Size: 7678 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: SeqKernelLucy.py Type: text/x-python-script Size: 2628 bytes Desc: not available URL: From albertthomas88 at gmail.com Fri Aug 4 08:49:16 2017 From: albertthomas88 at gmail.com (Albert Thomas) Date: Fri, 04 Aug 2017 12:49:16 +0000 Subject: [scikit-learn] OneClassSvm | Different results on different runs In-Reply-To: References: Message-ID: I opened an issue https://github.com/scikit-learn/scikit-learn/issues/9497 Albert On Thu, Aug 3, 2017 at 6:16 PM Andreas Mueller wrote: > > > On 08/03/2017 09:17 AM, Albert Thomas wrote: > > Yes, in fact, changing the random_state might have an influence on the > > result. The docstring of the random_state parameter for the > > OneClassSVM seems incorrect though... > PR or issue welcome. 
> _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From olivier.grisel at ensta.org Fri Aug 4 09:54:00 2017 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Fri, 4 Aug 2017 15:54:00 +0200 Subject: [scikit-learn] Extra trees tuning parameters In-Reply-To: References: Message-ID: I believe so even though it's always better to check in the code to see how this parameter is actually used. -- Olivier ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From t3kcit at gmail.com Fri Aug 4 10:50:40 2017 From: t3kcit at gmail.com (Andreas Mueller) Date: Fri, 4 Aug 2017 10:50:40 -0400 Subject: [scikit-learn] Problems with running GridSearchCV on a pipeline with a custom transformer In-Reply-To: References: Message-ID: <90aed61d-2d9b-8d02-3c9b-fbe0ccdd98ee@gmail.com> Yes, that's totally fine. The error is unrelated and just means you need to call ``check_is_fitted`` in your predict method to give a nicer error message. On 08/04/2017 06:29 AM, Sam Barnett wrote: > Hi Andy, > I have since been able to resolve the pickling issue, though I am now > getting an error message saying that an error message does not include > the expected string 'fit'. In general, I am trying to use the fit() > method of my classifier to instantiate a separate SVC() classifier > with a custom kernel, fit THAT to the data, then return this instance > as the fitted version of the new classifier. Is this possible in > theory? If so, what is the best way to implement it? > > As before, the requisite code and a .ipynb file is attached. > > Best, > Sam > > On Thu, Aug 3, 2017 at 6:35 PM, Andreas Mueller > wrote: > > Hi Sam. > You need to put these into a reachable namespace (possibly as > private functions) so that they can be pickled. > Please stay on the sklearn mailing list, I might not have time to > reply. > > Andy > > > On 08/03/2017 01:24 PM, Sam Barnett wrote: >> Hi Andy, >> >> I've since tried a different solution: instead of a pipeline, >> I've simply created a classifier that is for the most part like >> svm.SVC, though it takes a few extra inputs for the >> sequentialisation step. I've used a Python function that can >> compute the Gram matrix between two datasets of any shape to pass >> into SVC(), though I'm now having trouble with pickling on the >> check_estimator test. It appears that SeqSVC.fit() doesn't like >> to have methods defined within it. Can you see how to pass this >> test? (the .ipynb file shows the error). >> >> Best, >> Sam >> >> On Wed, Aug 2, 2017 at 9:44 PM, Sam Barnett >> > wrote: >> >> You're right: it does fail without GridSearchCV when I change >> the size of seq_test. I will look at the transform tomorrow >> to see if I can work this out. Thank you for your help so far! >> >> On Wed, Aug 2, 2017 at 9:20 PM, Andreas Mueller >> > wrote: >> >> Change the size of seq_test in your notebook and you'll >> see the failure without GridSearchCV. >> I haven't looked at your code in detail, but transform is >> supposed to work on arbitrary new data with the same >> number of features. >> Your code requires the test data to have the same shape >> as the training data. >> Cross-validation will lead to training data and test data >> having different sizes. But I feel like something is >> already wrong if your >> test data size depends on your training data size. 
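A minimal sketch of the pattern under discussion: a wrapper estimator whose fit() trains an inner SVC with a custom kernel and whose predict() calls check_is_fitted, so that an unfitted instance raises a NotFittedError whose message contains 'fit'. The kernel below is a stand-in, not the sequentialised kernel from the attached code:

import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.svm import SVC
from sklearn.utils.validation import check_is_fitted

def linear_gram(X, Y):
    # stand-in kernel; a real "sequentialised" kernel would go here
    return np.dot(X, Y.T)

class WrappedKernelSVC(BaseEstimator, ClassifierMixin):
    def __init__(self, C=1.0):
        self.C = C

    def fit(self, X, y):
        # fit() instantiates and trains the inner SVC, then returns self.
        self.svc_ = SVC(C=self.C, kernel=linear_gram)
        self.svc_.fit(X, y)
        return self

    def predict(self, X):
        # Raises NotFittedError (message mentions 'fit') if fit was never called.
        check_is_fitted(self, 'svc_')
        return self.svc_.predict(X)

rng = np.random.RandomState(0)
X = rng.randn(40, 3)
y = (X[:, 0] > 0).astype(int)
print(WrappedKernelSVC().fit(X, y).predict(X[:5]))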
>> >> >> >> On 08/02/2017 03:08 PM, Sam Barnett wrote: >>> Hi Andy, >>> >>> The purpose of the transformer is to take an ordinary >>> kernel (in this case I have taken 'rbf' as a default) >>> and return a 'sequentialised' kernel using a few extra >>> parameters. Hence, the transformer takes an ordinary >>> data-target pair X, y as its input, and the >>> fit_transform(X, y) method will output the Gram matrix >>> for X that is associated with this sequentialised >>> kernel. In the pipeline, this Gram matrix is passed into >>> an SVC classifier with the kernel parameter set to >>> 'precomputed'. >>> >>> Therefore, I do not think your hacky solution would be >>> possible. However, I am still unsure how to implement >>> your first solution: won't the Gram matrix from the >>> transformer contain all the necessary kernel values? >>> Could you elaborate further? >>> >>> >>> Best, >>> Sam >>> >>> On Wed, Aug 2, 2017 at 5:05 PM, Andreas Mueller >>> > wrote: >>> >>> Hi Sam. >>> GridSearchCV will do cross-validation, which >>> requires to "transform" the test data. >>> The shape of the test-data will be different from >>> the shape of the training data. >>> You need to have the ability to compute the kernel >>> between the training data and new test data. >>> >>> A more hacky solution would be to compute the full >>> kernel matrix in advance and pass that to GridSearchCV. >>> >>> You probably don't need it here, but you should also >>> checkout what the _pairwise attribute does in >>> cross-validation, >>> because that it likely to come up when playing with >>> kernels. >>> >>> Hth, >>> Andy >>> >>> >>> On 08/02/2017 08:38 AM, Sam Barnett wrote: >>>> Dear all, >>>> >>>> I have created a 2-step pipeline with a custom >>>> transformer followed by a simple SVC classifier, >>>> and I wish to run a grid-search over it. I am able >>>> to successfully create the transformer and the >>>> pipeline, and each of these elements work fine. >>>> However, when I try to use the fit() method on my >>>> GridSearchCV object, I get the following error: >>>> >>>> 57 # during fit. >>>> 58 if X.shape != self.input_shape_: >>>> ---> 59 raise ValueError('Shape of >>>> input is different from what was seen ' >>>> 60 'in `fit`') >>>> 61 >>>> >>>> ValueError: Shape of input is different from what >>>> was seen in `fit` >>>> >>>> For a full breakdown of the problem, I have written >>>> a Jupyter notebook showing exactly how the error >>>> occurs (this also contains all .py files necessary >>>> to run the notebook). Can anybody see how to work >>>> through this? >>>> >>>> Many thanks, >>>> Sam Barnett >>>> >>>> >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >>> >>> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 
From georg.kf.heiler at gmail.com  Sat Aug 5 05:10:57 2017
From: georg.kf.heiler at gmail.com (Georg Heiler)
Date: Sat, 05 Aug 2017 09:10:57 +0000
Subject: [scikit-learn] transform categorical data to numerical representation
Message-ID: 

Hi,

the LabelEncoder is only meant for a single column, i.e. the target
variable. Is the DictVectorizer or a manual chaining of multiple
LabelEncoders (one per categorical column) the desired way to get values
which can be fed into a subsequent classifier?

Is there some way I have overlooked which works better and possibly also
can handle unseen values by applying most frequent imputation?

regards,
Georg
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From se.raschka at gmail.com  Sat Aug 5 12:13:10 2017
From: se.raschka at gmail.com (Sebastian Raschka)
Date: Sat, 5 Aug 2017 12:13:10 -0400
Subject: [scikit-learn] transform categorical data to numerical representation
In-Reply-To: 
References: 
Message-ID: <6D0BF22C-9ABA-4C2A-B35B-210673439286@gmail.com>

Hi, Georg,

I bring this up every time here on the mailing list :), and you are probably
aware of this issue, but it makes a difference whether your categorical data
is nominal or ordinal. For instance, if you have an ordinal variable with
values like {small, medium, large} you probably want to encode it as
{1, 2, 3} or {1, 20, 100} or whatever is appropriate based on your domain
knowledge regarding the variable. If you have something like {blue, red,
green} it may make more sense to do a one-hot encoding so that the
classifier doesn't assume a relationship between the values like
blue > red > green or something like that.

Now, the DictVectorizer and OneHotEncoder are both doing one hot encoding.
The LabelEncoder does convert a variable to integer values, but if you have
something like {small, medium, large}, it wouldn't know the order (if that's
an ordinal variable) and it would just assign arbitrary integers in
increasing order. Thus, if you are dealing with ordinal variables, there's
no way around doing this manually; for example you could create mapping
dictionaries for that (most conveniently done in pandas).

Best,
Sebastian

> On Aug 5, 2017, at 5:10 AM, Georg Heiler wrote:
> 
> Hi,
> 
> the LabelEncooder is only meant for a single column i.e. target variable. Is the DictVectorizeer or a manual chaining of multiple LabelEncoders (one per categorical column) the desired way to get values which can be fed into a subsequent classifier?
> 
> Is there some way I have overlooked which works better and possibly also can handle unseen values by applying most frequent imputation?
> 
> regards,
> Georg
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

From joel.nothman at gmail.com  Sat Aug 5 18:47:23 2017
From: joel.nothman at gmail.com (Joel Nothman)
Date: Sun, 6 Aug 2017 08:47:23 +1000
Subject: [scikit-learn] transform categorical data to numerical representation
In-Reply-To: <6D0BF22C-9ABA-4C2A-B35B-210673439286@gmail.com>
References: <6D0BF22C-9ABA-4C2A-B35B-210673439286@gmail.com>
Message-ID: 

We are working on CategoricalEncoder in
https://github.com/scikit-learn/scikit-learn/pull/9151 to help users more
with this kind of thing. Feedback and testing is welcome.
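A short sketch of the manual mapping described above, on toy columns: the ordinal column gets its order from an explicit dictionary, while the nominal column is one-hot encoded so no artificial order is implied:

import pandas as pd

df = pd.DataFrame({'size':  ['small', 'large', 'medium', 'small'],
                   'color': ['blue', 'green', 'red', 'blue']})

# Ordinal: encode the known order explicitly.
size_order = {'small': 1, 'medium': 2, 'large': 3}
df['size_encoded'] = df['size'].map(size_order)

# Nominal: one-hot encode.
df = pd.get_dummies(df, columns=['color'])
print(df)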
On 6 August 2017 at 02:13, Sebastian Raschka wrote: > Hi, Georg, > > I bring this up every time here on the mailing list :), and you probably > aware of this issue, but it makes a difference whether your categorical > data is nominal or ordinal. For instance if you have an ordinal variable > like with values like {small, medium, large} you probably want to encode it > as {1, 2, 3} or {1, 20, 100} or whatever is appropriate based on your > domain knowledge regarding the variable. If you have sth like {blue, red, > green} it may make more sense to do a one-hot encoding so that the > classifier doesn't assume a relationship between the variables like blue > > red > green or sth like that. > > Now, the DictVectorizer and OneHotEncoder are both doing one hot encoding. > The LabelEncoder does convert a variable to integer values, but if you have > sth like {small, medium, large}, it wouldn't know the order (if that's an > ordinal variable) and it would just assign arbitrary integers in increasing > order. Thus, if you are dealing ordinal variables, there's no way around > doing this manually; for example you could create mapping dictionaries for > that (most conveniently done in pandas). > > Best, > Sebastian > > > On Aug 5, 2017, at 5:10 AM, Georg Heiler > wrote: > > > > Hi, > > > > the LabelEncooder is only meant for a single column i.e. target > variable. Is the DictVectorizeer or a manual chaining of multiple > LabelEncoders (one per categorical column) the desired way to get values > which can be fed into a subsequent classifier? > > > > Is there some way I have overlooked which works better and possibly also > can handle unseen values by applying most frequent imputation? > > > > regards, > > Georg > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From georg.kf.heiler at gmail.com Sun Aug 6 06:30:28 2017 From: georg.kf.heiler at gmail.com (Georg Heiler) Date: Sun, 06 Aug 2017 10:30:28 +0000 Subject: [scikit-learn] transform categorical data to numerical representation In-Reply-To: References: <6D0BF22C-9ABA-4C2A-B35B-210673439286@gmail.com> Message-ID: @sebastian: thanks. Indeed, I am aware of this problem. I developed something here: https://gist.github.com/geoHeil/5caff5236b4850d673b2c9b0799dc2ce but realized that the performance of prediction is pretty lame when there are around 100-150 columns used as the input. Do you have some ideas how to speed this up? Regards, Georg Joel Nothman schrieb am So., 6. Aug. 2017 um 00:49 Uhr: > We are working on CategoricalEncoder in > https://github.com/scikit-learn/scikit-learn/pull/9151 to help users more > with this kind of thing. Feedback and testing is welcome. > > On 6 August 2017 at 02:13, Sebastian Raschka wrote: > >> Hi, Georg, >> >> I bring this up every time here on the mailing list :), and you probably >> aware of this issue, but it makes a difference whether your categorical >> data is nominal or ordinal. For instance if you have an ordinal variable >> like with values like {small, medium, large} you probably want to encode it >> as {1, 2, 3} or {1, 20, 100} or whatever is appropriate based on your >> domain knowledge regarding the variable. 
If you have sth like {blue, red, >> green} it may make more sense to do a one-hot encoding so that the >> classifier doesn't assume a relationship between the variables like blue > >> red > green or sth like that. >> >> Now, the DictVectorizer and OneHotEncoder are both doing one hot >> encoding. The LabelEncoder does convert a variable to integer values, but >> if you have sth like {small, medium, large}, it wouldn't know the order (if >> that's an ordinal variable) and it would just assign arbitrary integers in >> increasing order. Thus, if you are dealing ordinal variables, there's no >> way around doing this manually; for example you could create mapping >> dictionaries for that (most conveniently done in pandas). >> >> Best, >> Sebastian >> >> > On Aug 5, 2017, at 5:10 AM, Georg Heiler >> wrote: >> > >> > Hi, >> > >> > the LabelEncooder is only meant for a single column i.e. target >> variable. Is the DictVectorizeer or a manual chaining of multiple >> LabelEncoders (one per categorical column) the desired way to get values >> which can be fed into a subsequent classifier? >> > >> > Is there some way I have overlooked which works better and possibly >> also can handle unseen values by applying most frequent imputation? >> > >> > regards, >> > Georg >> > _______________________________________________ >> > scikit-learn mailing list >> > scikit-learn at python.org >> > https://mail.python.org/mailman/listinfo/scikit-learn >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From se.raschka at gmail.com Sun Aug 6 14:37:15 2017 From: se.raschka at gmail.com (Sebastian Raschka) Date: Sun, 6 Aug 2017 14:37:15 -0400 Subject: [scikit-learn] transform categorical data to numerical representation In-Reply-To: References: <6D0BF22C-9ABA-4C2A-B35B-210673439286@gmail.com> Message-ID: <103609E5-E50B-4993-87F2-11661E7C7EB5@gmail.com> > performance of prediction is pretty lame when there are around 100-150 columns used as the input. you are talking about computational performance when you are calling the "transform" method? Have you done some profiling to find out where your bottle neck (in the for loop) is? Just one a very quick look, I think this data.loc[~data[column].isin(fittedLabels), column] = str(replacementForUnseen) is already very slow because fittedLabels is an array where you have O(n) lookup instead of an average O(1) by using a hash table. Or is the isin function converting it to a hashtable/set/dict? In general, would it maybe help to use pandas' factorize? https://pandas.pydata.org/pandas-docs/stable/generated/pandas.factorize.html For predict time, say you have only 1 example for prediction that needs to be converted, you could append prototypes of all possible values that could occur, do the transformation, and then only pass the 1 transformed sample to the classifier. I guess that could be even slow though ... Best, Sebastian > On Aug 6, 2017, at 6:30 AM, Georg Heiler wrote: > > @sebastian: thanks. Indeed, I am aware of this problem. 
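Following the hash-lookup suggestion, a sketch of the replacement step done with a plain dict (constant-time membership) through Series.map rather than isin against an array; the fallback label 'a' stands in for whatever most frequent value was recorded during fit:

import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
fitted_labels = ['a', 'b', 'c']
col = pd.Series(rng.choice(['a', 'b', 'c', 'd', 'e'], size=100000))

# Seen labels map to their integer code, unseen values become NaN ...
mapping = {label: code for code, label in enumerate(fitted_labels)}
codes = col.map(mapping)

# ... and NaNs are imputed with the code of the most frequent training label.
codes = codes.fillna(mapping['a']).astype(int)
print(codes.value_counts())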
> > I developed something here: https://gist.github.com/geoHeil/5caff5236b4850d673b2c9b0799dc2ce but realized that the performance of prediction is pretty lame when there are around 100-150 columns used as the input. > Do you have some ideas how to speed this up? > > Regards, > Georg > > Joel Nothman schrieb am So., 6. Aug. 2017 um 00:49 Uhr: > We are working on CategoricalEncoder in https://github.com/scikit-learn/scikit-learn/pull/9151 to help users more with this kind of thing. Feedback and testing is welcome. > > On 6 August 2017 at 02:13, Sebastian Raschka wrote: > Hi, Georg, > > I bring this up every time here on the mailing list :), and you probably aware of this issue, but it makes a difference whether your categorical data is nominal or ordinal. For instance if you have an ordinal variable like with values like {small, medium, large} you probably want to encode it as {1, 2, 3} or {1, 20, 100} or whatever is appropriate based on your domain knowledge regarding the variable. If you have sth like {blue, red, green} it may make more sense to do a one-hot encoding so that the classifier doesn't assume a relationship between the variables like blue > red > green or sth like that. > > Now, the DictVectorizer and OneHotEncoder are both doing one hot encoding. The LabelEncoder does convert a variable to integer values, but if you have sth like {small, medium, large}, it wouldn't know the order (if that's an ordinal variable) and it would just assign arbitrary integers in increasing order. Thus, if you are dealing ordinal variables, there's no way around doing this manually; for example you could create mapping dictionaries for that (most conveniently done in pandas). > > Best, > Sebastian > > > On Aug 5, 2017, at 5:10 AM, Georg Heiler wrote: > > > > Hi, > > > > the LabelEncooder is only meant for a single column i.e. target variable. Is the DictVectorizeer or a manual chaining of multiple LabelEncoders (one per categorical column) the desired way to get values which can be fed into a subsequent classifier? > > > > Is there some way I have overlooked which works better and possibly also can handle unseen values by applying most frequent imputation? > > > > regards, > > Georg > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From georg.kf.heiler at gmail.com Mon Aug 7 02:40:18 2017 From: georg.kf.heiler at gmail.com (Georg Heiler) Date: Mon, 07 Aug 2017 06:40:18 +0000 Subject: [scikit-learn] transform categorical data to numerical representation In-Reply-To: <103609E5-E50B-4993-87F2-11661E7C7EB5@gmail.com> References: <6D0BF22C-9ABA-4C2A-B35B-210673439286@gmail.com> <103609E5-E50B-4993-87F2-11661E7C7EB5@gmail.com> Message-ID: I will need to look into factorize. Here is the result from profiling the transform method on a single new observation https://codereview.stackexchange.com/q/171622/132999 Best Georg Sebastian Raschka schrieb am So. 6. Aug. 
2017 um 20:39: > > performance of prediction is pretty lame when there are around 100-150 > columns used as the input. > > you are talking about computational performance when you are calling the > "transform" method? Have you done some profiling to find out where your > bottle neck (in the for loop) is? Just one a very quick look, I think this > > data.loc[~data[column].isin(fittedLabels), column] = > str(replacementForUnseen) > > is already very slow because fittedLabels is an array where you have O(n) > lookup instead of an average O(1) by using a hash table. Or is the isin > function converting it to a hashtable/set/dict? > > In general, would it maybe help to use pandas' factorize? > https://pandas.pydata.org/pandas-docs/stable/generated/pandas.factorize.html > For predict time, say you have only 1 example for prediction that needs to > be converted, you could append prototypes of all possible values that could > occur, do the transformation, and then only pass the 1 transformed sample > to the classifier. I guess that could be even slow though ... > > Best, > Sebastian > > > On Aug 6, 2017, at 6:30 AM, Georg Heiler > wrote: > > > > @sebastian: thanks. Indeed, I am aware of this problem. > > > > I developed something here: > https://gist.github.com/geoHeil/5caff5236b4850d673b2c9b0799dc2ce but > realized that the performance of prediction is pretty lame when there are > around 100-150 columns used as the input. > > Do you have some ideas how to speed this up? > > > > Regards, > > Georg > > > > Joel Nothman schrieb am So., 6. Aug. 2017 um > 00:49 Uhr: > > We are working on CategoricalEncoder in > https://github.com/scikit-learn/scikit-learn/pull/9151 to help users more > with this kind of thing. Feedback and testing is welcome. > > > > On 6 August 2017 at 02:13, Sebastian Raschka > wrote: > > Hi, Georg, > > > > I bring this up every time here on the mailing list :), and you probably > aware of this issue, but it makes a difference whether your categorical > data is nominal or ordinal. For instance if you have an ordinal variable > like with values like {small, medium, large} you probably want to encode it > as {1, 2, 3} or {1, 20, 100} or whatever is appropriate based on your > domain knowledge regarding the variable. If you have sth like {blue, red, > green} it may make more sense to do a one-hot encoding so that the > classifier doesn't assume a relationship between the variables like blue > > red > green or sth like that. > > > > Now, the DictVectorizer and OneHotEncoder are both doing one hot > encoding. The LabelEncoder does convert a variable to integer values, but > if you have sth like {small, medium, large}, it wouldn't know the order (if > that's an ordinal variable) and it would just assign arbitrary integers in > increasing order. Thus, if you are dealing ordinal variables, there's no > way around doing this manually; for example you could create mapping > dictionaries for that (most conveniently done in pandas). > > > > Best, > > Sebastian > > > > > On Aug 5, 2017, at 5:10 AM, Georg Heiler > wrote: > > > > > > Hi, > > > > > > the LabelEncooder is only meant for a single column i.e. target > variable. Is the DictVectorizeer or a manual chaining of multiple > LabelEncoders (one per categorical column) the desired way to get values > which can be fed into a subsequent classifier? > > > > > > Is there some way I have overlooked which works better and possibly > also can handle unseen values by applying most frequent imputation? 
> > > > > > regards, > > > Georg > > > _______________________________________________ > > > scikit-learn mailing list > > > scikit-learn at python.org > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From georg.kf.heiler at gmail.com Mon Aug 7 02:41:39 2017 From: georg.kf.heiler at gmail.com (Georg Heiler) Date: Mon, 07 Aug 2017 06:41:39 +0000 Subject: [scikit-learn] transform categorical data to numerical representation In-Reply-To: References: <6D0BF22C-9ABA-4C2A-B35B-210673439286@gmail.com> <103609E5-E50B-4993-87F2-11661E7C7EB5@gmail.com> Message-ID: To my understanding pandas.factorize only works for the static case where no unseen variables can occur. Georg Heiler schrieb am Mo. 7. Aug. 2017 um 08:40: > I will need to look into factorize. Here is the result from profiling the > transform method on a single new observation > https://codereview.stackexchange.com/q/171622/132999 > > > Best Georg > Sebastian Raschka schrieb am So. 6. Aug. 2017 um > 20:39: > >> > performance of prediction is pretty lame when there are around 100-150 >> columns used as the input. >> >> you are talking about computational performance when you are calling the >> "transform" method? Have you done some profiling to find out where your >> bottle neck (in the for loop) is? Just one a very quick look, I think this >> >> data.loc[~data[column].isin(fittedLabels), column] = >> str(replacementForUnseen) >> >> is already very slow because fittedLabels is an array where you have O(n) >> lookup instead of an average O(1) by using a hash table. Or is the isin >> function converting it to a hashtable/set/dict? >> >> In general, would it maybe help to use pandas' factorize? >> https://pandas.pydata.org/pandas-docs/stable/generated/pandas.factorize.html >> For predict time, say you have only 1 example for prediction that needs >> to be converted, you could append prototypes of all possible values that >> could occur, do the transformation, and then only pass the 1 transformed >> sample to the classifier. I guess that could be even slow though ... >> >> Best, >> Sebastian >> >> > On Aug 6, 2017, at 6:30 AM, Georg Heiler >> wrote: >> > >> > @sebastian: thanks. Indeed, I am aware of this problem. >> > >> > I developed something here: >> https://gist.github.com/geoHeil/5caff5236b4850d673b2c9b0799dc2ce but >> realized that the performance of prediction is pretty lame when there are >> around 100-150 columns used as the input. >> > Do you have some ideas how to speed this up? >> > >> > Regards, >> > Georg >> > >> > Joel Nothman schrieb am So., 6. Aug. 2017 um >> 00:49 Uhr: >> > We are working on CategoricalEncoder in >> https://github.com/scikit-learn/scikit-learn/pull/9151 to help users >> more with this kind of thing. Feedback and testing is welcome. 
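For the ordinal case Sebastian describes, a plain mapping dictionary in pandas is usually enough. A small sketch with invented column values:

import pandas as pd

df = pd.DataFrame({'size': ['small', 'large', 'medium', 'small']})

# ordering chosen from domain knowledge; a LabelEncoder cannot infer it
size_map = {'small': 1, 'medium': 2, 'large': 3}
df['size_encoded'] = df['size'].map(size_map)
print(df)

Values missing from the mapping come out as NaN, so unseen ordinal labels are at least easy to detect and impute.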
>> > >> > On 6 August 2017 at 02:13, Sebastian Raschka >> wrote: >> > Hi, Georg, >> > >> > I bring this up every time here on the mailing list :), and you >> probably aware of this issue, but it makes a difference whether your >> categorical data is nominal or ordinal. For instance if you have an ordinal >> variable like with values like {small, medium, large} you probably want to >> encode it as {1, 2, 3} or {1, 20, 100} or whatever is appropriate based on >> your domain knowledge regarding the variable. If you have sth like {blue, >> red, green} it may make more sense to do a one-hot encoding so that the >> classifier doesn't assume a relationship between the variables like blue > >> red > green or sth like that. >> > >> > Now, the DictVectorizer and OneHotEncoder are both doing one hot >> encoding. The LabelEncoder does convert a variable to integer values, but >> if you have sth like {small, medium, large}, it wouldn't know the order (if >> that's an ordinal variable) and it would just assign arbitrary integers in >> increasing order. Thus, if you are dealing ordinal variables, there's no >> way around doing this manually; for example you could create mapping >> dictionaries for that (most conveniently done in pandas). >> > >> > Best, >> > Sebastian >> > >> > > On Aug 5, 2017, at 5:10 AM, Georg Heiler >> wrote: >> > > >> > > Hi, >> > > >> > > the LabelEncooder is only meant for a single column i.e. target >> variable. Is the DictVectorizeer or a manual chaining of multiple >> LabelEncoders (one per categorical column) the desired way to get values >> which can be fed into a subsequent classifier? >> > > >> > > Is there some way I have overlooked which works better and possibly >> also can handle unseen values by applying most frequent imputation? >> > > >> > > regards, >> > > Georg >> > > _______________________________________________ >> > > scikit-learn mailing list >> > > scikit-learn at python.org >> > > https://mail.python.org/mailman/listinfo/scikit-learn >> > >> > _______________________________________________ >> > scikit-learn mailing list >> > scikit-learn at python.org >> > https://mail.python.org/mailman/listinfo/scikit-learn >> > >> > _______________________________________________ >> > scikit-learn mailing list >> > scikit-learn at python.org >> > https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ >> > scikit-learn mailing list >> > scikit-learn at python.org >> > https://mail.python.org/mailman/listinfo/scikit-learn >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From andre.nascimento.melo at gmail.com Thu Aug 10 09:55:22 2017 From: andre.nascimento.melo at gmail.com (=?UTF-8?Q?Andr=C3=A9_Melo?=) Date: Thu, 10 Aug 2017 15:55:22 +0200 Subject: [scikit-learn] Truncated svd not working for complex matrices Message-ID: Hello all, I'm trying to use the randomized version of scikit-learn's TruncatedSVD (although I'm actually calling the internal function randomized_svd to get the actual u, s, v matrices). 
While it is working fine for real matrices, for complex matrices I can't get back the original matrix even though the singular values are exactly correct: >>> import numpy as np >>> from sklearn.utils.extmath import randomized_svd >>> N = 3 >>> a = np.random.rand(N, N)*(1 + 1j) >>> u1, s1, v1 = np.linalg.svd(a) >>> u2, s2, v2 = randomized_svd(a, n_components=N, n_iter=7) >>> np.allclose(s1, s2) True >>> np.allclose(a, u1.dot(np.diag(s1)).dot(v1)) True >>> np.allclose(a, u2.dot(np.diag(s2)).dot(v2)) False Any idea what could be wrong? Thank you! Best regards, Andre Melo From olivier.grisel at ensta.org Thu Aug 10 10:13:16 2017 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Thu, 10 Aug 2017 16:13:16 +0200 Subject: [scikit-learn] Truncated svd not working for complex matrices In-Reply-To: References: Message-ID: I have no idea whether the randomized SVD method is supposed to work for complex data or not (from a mathematical point of view). I think that all scikit-learn estimators assume real data (or integer data for class labels) and our input validation utilities will cast numeric values to float64 by default. This might be the cause of your problem. Have a look at the source code to confirm. The reference to the paper can also be found in the docstring of those functions. -- Olivier ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From andre.nascimento.melo at gmail.com Thu Aug 10 10:56:43 2017 From: andre.nascimento.melo at gmail.com (=?UTF-8?Q?Andr=C3=A9_Melo?=) Date: Thu, 10 Aug 2017 16:56:43 +0200 Subject: [scikit-learn] Truncated svd not working for complex matrices In-Reply-To: References: Message-ID: Hi Olivier, Thank you very much for your reply. I was convinced it couldn't be a fundamental mathematical issue because the singular values were coming out exactly right, so it had to be a problem with the way complex values were being handled. I decided to look at the source code and it turns out the problem is when the following transformation is applied: U = np.dot(Q, Uhat) Replacing this by U = np.dot(Q.conj(), Uhat) solves the issue! Should I report this on github? On 10 August 2017 at 16:13, Olivier Grisel wrote: > I have no idea whether the randomized SVD method is supposed to work for > complex data or not (from a mathematical point of view). I think that all > scikit-learn estimators assume real data (or integer data for class labels) > and our input validation utilities will cast numeric values to float64 by > default. This might be the cause of your problem. Have a look at the source > code to confirm. The reference to the paper can also be found in the > docstring of those functions. > > -- > Olivier > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > From andre.nascimento.melo at gmail.com Thu Aug 10 11:08:09 2017 From: andre.nascimento.melo at gmail.com (=?UTF-8?Q?Andr=C3=A9_Melo?=) Date: Thu, 10 Aug 2017 17:08:09 +0200 Subject: [scikit-learn] Truncated svd not working for complex matrices In-Reply-To: References: Message-ID: Actually, it makes more sense to change B = safe_sparse_dot(Q.T, M) To B = safe_sparse_dot(Q.T.conj(), M) On 10 August 2017 at 16:56, Andr? Melo wrote: > Hi Olivier, > > Thank you very much for your reply. 
I was convinced it couldn't be a > fundamental mathematical issue because the singular values were coming > out exactly right, so it had to be a problem with the way complex > values were being handled. > > I decided to look at the source code and it turns out the problem is > when the following transformation is applied: > > U = np.dot(Q, Uhat) > > Replacing this by > > U = np.dot(Q.conj(), Uhat) > > solves the issue! Should I report this on github? > > On 10 August 2017 at 16:13, Olivier Grisel wrote: >> I have no idea whether the randomized SVD method is supposed to work for >> complex data or not (from a mathematical point of view). I think that all >> scikit-learn estimators assume real data (or integer data for class labels) >> and our input validation utilities will cast numeric values to float64 by >> default. This might be the cause of your problem. Have a look at the source >> code to confirm. The reference to the paper can also be found in the >> docstring of those functions. >> >> -- >> Olivier >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> From joel.nothman at gmail.com Thu Aug 10 23:41:47 2017 From: joel.nothman at gmail.com (Joel Nothman) Date: Fri, 11 Aug 2017 13:41:47 +1000 Subject: [scikit-learn] Truncated svd not working for complex matrices In-Reply-To: References: Message-ID: Should we be more explicitly forbidding complex data in most estimators, and perhaps allow it in a few where it is tested (particularly decomposition)? On 11 August 2017 at 01:08, Andr? Melo wrote: > Actually, it makes more sense to change > > B = safe_sparse_dot(Q.T, M) > > To > B = safe_sparse_dot(Q.T.conj(), M) > > On 10 August 2017 at 16:56, Andr? Melo > wrote: > > Hi Olivier, > > > > Thank you very much for your reply. I was convinced it couldn't be a > > fundamental mathematical issue because the singular values were coming > > out exactly right, so it had to be a problem with the way complex > > values were being handled. > > > > I decided to look at the source code and it turns out the problem is > > when the following transformation is applied: > > > > U = np.dot(Q, Uhat) > > > > Replacing this by > > > > U = np.dot(Q.conj(), Uhat) > > > > solves the issue! Should I report this on github? > > > > On 10 August 2017 at 16:13, Olivier Grisel > wrote: > >> I have no idea whether the randomized SVD method is supposed to work for > >> complex data or not (from a mathematical point of view). I think that > all > >> scikit-learn estimators assume real data (or integer data for class > labels) > >> and our input validation utilities will cast numeric values to float64 > by > >> default. This might be the cause of your problem. Have a look at the > source > >> code to confirm. The reference to the paper can also be found in the > >> docstring of those functions. > >> > >> -- > >> Olivier > >> > >> _______________________________________________ > >> scikit-learn mailing list > >> scikit-learn at python.org > >> https://mail.python.org/mailman/listinfo/scikit-learn > >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From drraph at gmail.com Fri Aug 11 03:16:59 2017 From: drraph at gmail.com (Raphael C) Date: Fri, 11 Aug 2017 09:16:59 +0200 Subject: [scikit-learn] Truncated svd not working for complex matrices In-Reply-To: References: Message-ID: Although the first priority should be correctness (in implementation and documentation) and it makes sense to explicitly test for inputs for which code will give the wrong answer, it would be great if we could support complex data types, especially where it is very little extra work. Raphael On 11 August 2017 at 05:41, Joel Nothman wrote: > Should we be more explicitly forbidding complex data in most estimators, and > perhaps allow it in a few where it is tested (particularly decomposition)? > > On 11 August 2017 at 01:08, Andr? Melo > wrote: >> >> Actually, it makes more sense to change >> >> B = safe_sparse_dot(Q.T, M) >> >> To >> B = safe_sparse_dot(Q.T.conj(), M) >> >> On 10 August 2017 at 16:56, Andr? Melo >> wrote: >> > Hi Olivier, >> > >> > Thank you very much for your reply. I was convinced it couldn't be a >> > fundamental mathematical issue because the singular values were coming >> > out exactly right, so it had to be a problem with the way complex >> > values were being handled. >> > >> > I decided to look at the source code and it turns out the problem is >> > when the following transformation is applied: >> > >> > U = np.dot(Q, Uhat) >> > >> > Replacing this by >> > >> > U = np.dot(Q.conj(), Uhat) >> > >> > solves the issue! Should I report this on github? >> > >> > On 10 August 2017 at 16:13, Olivier Grisel >> > wrote: >> >> I have no idea whether the randomized SVD method is supposed to work >> >> for >> >> complex data or not (from a mathematical point of view). I think that >> >> all >> >> scikit-learn estimators assume real data (or integer data for class >> >> labels) >> >> and our input validation utilities will cast numeric values to float64 >> >> by >> >> default. This might be the cause of your problem. Have a look at the >> >> source >> >> code to confirm. The reference to the paper can also be found in the >> >> docstring of those functions. >> >> >> >> -- >> >> Olivier >> >> >> >> _______________________________________________ >> >> scikit-learn mailing list >> >> scikit-learn at python.org >> >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > From sambarnett95 at gmail.com Fri Aug 11 06:16:50 2017 From: sambarnett95 at gmail.com (Sam Barnett) Date: Fri, 11 Aug 2017 11:16:50 +0100 Subject: [scikit-learn] Overflow Error with Cross-Validation (but not normally fitting the data) Message-ID: To all, I am working on a scikit-learn estimator that performs a version of SVC with a custom kernel. Unfortunately, I have been presented with a problem: when running a grid search (or even using the cross_val_score function), my estimator encounters an overflow error when evaluating my kernel (specifically, in an array multiplication operation). What is particularly strange about this is that, when I train the estimator on the whole dataset, this error does not occur. In other words: the problem only appears to occur when the data is split into folds. Is this something that has been seen before? 
How ought I fix this? I have attached the source code below (in particular, see the notebook for how the problem arises). Best, Sam -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: kernelsqizer.py Type: text/x-python-script Size: 2592 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: SeqSVC Toy Data Tests.ipynb Type: application/octet-stream Size: 6613 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: seqsvc.py Type: text/x-python-script Size: 11023 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: timeseriestools.py Type: text/x-python-script Size: 1419 bytes Desc: not available URL: From t3kcit at gmail.com Fri Aug 11 12:37:12 2017 From: t3kcit at gmail.com (Andreas Mueller) Date: Fri, 11 Aug 2017 12:37:12 -0400 Subject: [scikit-learn] Truncated svd not working for complex matrices In-Reply-To: References: Message-ID: I opened https://github.com/scikit-learn/scikit-learn/issues/9528 I suggest to first error everywhere and then fix those for which it seems easy and worth it, as Joel said, probably mostly in decomposition. Though adding support even in a few places seems like dangerous feature creep. On 08/11/2017 03:16 AM, Raphael C wrote: > Although the first priority should be correctness (in implementation > and documentation) and it makes sense to explicitly test for inputs > for which code will give the wrong answer, it would be great if we > could support complex data types, especially where it is very little > extra work. > > Raphael > > On 11 August 2017 at 05:41, Joel Nothman wrote: >> Should we be more explicitly forbidding complex data in most estimators, and >> perhaps allow it in a few where it is tested (particularly decomposition)? >> >> On 11 August 2017 at 01:08, Andr? Melo >> wrote: >>> Actually, it makes more sense to change >>> >>> B = safe_sparse_dot(Q.T, M) >>> >>> To >>> B = safe_sparse_dot(Q.T.conj(), M) >>> >>> On 10 August 2017 at 16:56, Andr? Melo >>> wrote: >>>> Hi Olivier, >>>> >>>> Thank you very much for your reply. I was convinced it couldn't be a >>>> fundamental mathematical issue because the singular values were coming >>>> out exactly right, so it had to be a problem with the way complex >>>> values were being handled. >>>> >>>> I decided to look at the source code and it turns out the problem is >>>> when the following transformation is applied: >>>> >>>> U = np.dot(Q, Uhat) >>>> >>>> Replacing this by >>>> >>>> U = np.dot(Q.conj(), Uhat) >>>> >>>> solves the issue! Should I report this on github? >>>> >>>> On 10 August 2017 at 16:13, Olivier Grisel >>>> wrote: >>>>> I have no idea whether the randomized SVD method is supposed to work >>>>> for >>>>> complex data or not (from a mathematical point of view). I think that >>>>> all >>>>> scikit-learn estimators assume real data (or integer data for class >>>>> labels) >>>>> and our input validation utilities will cast numeric values to float64 >>>>> by >>>>> default. This might be the cause of your problem. Have a look at the >>>>> source >>>>> code to confirm. The reference to the paper can also be found in the >>>>> docstring of those functions. 
>>>>> >>>>> -- >>>>> Olivier >>>>> >>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From gael.varoquaux at normalesup.org Fri Aug 11 12:45:31 2017 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Fri, 11 Aug 2017 18:45:31 +0200 Subject: [scikit-learn] Truncated svd not working for complex matrices In-Reply-To: References: Message-ID: <20170811164531.GB3756445@phare.normalesup.org> On Fri, Aug 11, 2017 at 12:37:12PM -0400, Andreas Mueller wrote: > I opened https://github.com/scikit-learn/scikit-learn/issues/9528 > I suggest to first error everywhere and then fix those for which it seems > easy and worth it, as Joel said, probably mostly in decomposition. > Though adding support even in a few places seems like dangerous feature > creep. I am trying to predent that I am offline and in vacations, so I shouldn't answer. But I do have a clear cut opinion here. I believe that we should decide _not_ to support complex data everywhere. The reason is that the support for complex data will always be incomplete and risks being buggy. Indeed, complex data is very infrequent in machine learning (unlike with signal processing). Hence, it will recieve little usage. In addition, many machine learning algorithms cannot easily be adapted to complex data. To manage user expectation and to ensure quality of the codebase, let us error on complex data. Should we move this discussion on the issue opened by Andy? Ga?l > On 08/11/2017 03:16 AM, Raphael C wrote: > >Although the first priority should be correctness (in implementation > >and documentation) and it makes sense to explicitly test for inputs > >for which code will give the wrong answer, it would be great if we > >could support complex data types, especially where it is very little > >extra work. > >Raphael > >On 11 August 2017 at 05:41, Joel Nothman wrote: > >>Should we be more explicitly forbidding complex data in most estimators, and > >>perhaps allow it in a few where it is tested (particularly decomposition)? > >>On 11 August 2017 at 01:08, Andr? Melo > >>wrote: > >>>Actually, it makes more sense to change > >>> B = safe_sparse_dot(Q.T, M) > >>>To > >>> B = safe_sparse_dot(Q.T.conj(), M) > >>>On 10 August 2017 at 16:56, Andr? Melo > >>>wrote: > >>>>Hi Olivier, > >>>>Thank you very much for your reply. I was convinced it couldn't be a > >>>>fundamental mathematical issue because the singular values were coming > >>>>out exactly right, so it had to be a problem with the way complex > >>>>values were being handled. > >>>>I decided to look at the source code and it turns out the problem is > >>>>when the following transformation is applied: > >>>>U = np.dot(Q, Uhat) > >>>>Replacing this by > >>>>U = np.dot(Q.conj(), Uhat) > >>>>solves the issue! Should I report this on github? 
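As a standalone NumPy illustration of the fix under discussion (not the scikit-learn code itself): for complex input, the projection step has to use the conjugate (Hermitian) transpose of Q, otherwise the original matrix cannot be reconstructed.

import numpy as np

rng = np.random.RandomState(0)
A = rng.rand(4, 4) + 1j * rng.rand(4, 4)

# an orthonormal basis Q for the range of A, as used in randomized SVD
Q, _ = np.linalg.qr(A)

# projecting with the plain transpose loses information for complex data ...
B_wrong = np.dot(Q.T, A)
# ... while the Hermitian transpose satisfies A == Q (Q^H A)
B_right = np.dot(Q.conj().T, A)

print(np.allclose(A, np.dot(Q, B_wrong)))   # False
print(np.allclose(A, np.dot(Q, B_right)))   # True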
> >>>>On 10 August 2017 at 16:13, Olivier Grisel > >>>>wrote: > >>>>>I have no idea whether the randomized SVD method is supposed to work > >>>>>for > >>>>>complex data or not (from a mathematical point of view). I think that > >>>>>all > >>>>>scikit-learn estimators assume real data (or integer data for class > >>>>>labels) > >>>>>and our input validation utilities will cast numeric values to float64 > >>>>>by > >>>>>default. This might be the cause of your problem. Have a look at the > >>>>>source > >>>>>code to confirm. The reference to the paper can also be found in the > >>>>>docstring of those functions. > >>>>>-- > >>>>>Olivier > >>>>>_______________________________________________ > >>>>>scikit-learn mailing list > >>>>>scikit-learn at python.org > >>>>>https://mail.python.org/mailman/listinfo/scikit-learn > >>>_______________________________________________ > >>>scikit-learn mailing list > >>>scikit-learn at python.org > >>>https://mail.python.org/mailman/listinfo/scikit-learn > >>_______________________________________________ > >>scikit-learn mailing list > >>scikit-learn at python.org > >>https://mail.python.org/mailman/listinfo/scikit-learn > >_______________________________________________ > >scikit-learn mailing list > >scikit-learn at python.org > >https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -- Gael Varoquaux Researcher, INRIA Parietal NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France Phone: ++ 33-1-69-08-79-68 http://gael-varoquaux.info http://twitter.com/GaelVaroquaux From olivier.grisel at ensta.org Fri Aug 11 17:49:13 2017 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Fri, 11 Aug 2017 23:49:13 +0200 Subject: [scikit-learn] scikit-learn 0.19.0 is out! Message-ID: Grab it with pip or conda ! Quoting the release highlights from the website: We are excited to release a number of great new features including neighbors.LocalOutlierFactor for anomaly detection, preprocessing.QuantileTransformer for robust feature transformation, and the multioutput.ClassifierChain meta-estimator to simply account for dependencies between classes in multilabel problems. We have some new algorithms in existing estimators, such as multiplicative update in decomposition.NMF and multinomial linear_model.LogisticRegression with L1 loss (use solver='saga'). Cross validation is now able to return the results from multiple metric evaluations. The new model_selection.cross_validate can return many scores on the test data as well as training set performance and timings, and we have extended the scoring and refit parameters for grid/randomized search to handle multiple metrics. You can also learn faster. For instance, the new option to cache transformations in pipeline.Pipeline makes grid search over pipelines including slow transformations much more efficient. And you can predict faster: if you?re sure you know what you?re doing, you can turn off validating that the input is finite using config_context. We?ve made some important fixes too. We?ve fixed a longstanding implementation error in metrics.average_precision_score, so please be cautious with prior results reported from that function. A number of errors in the manifold.TSNE implementation have been fixed, particularly in the default Barnes-Hut approximation. semi_supervised.LabelSpreading and semi_supervised.LabelPropagation have had substantial fixes. 
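As a short illustration of the multiple-metric cross-validation mentioned above, here is a sketch on a toy dataset; the estimator and metrics chosen here are arbitrary:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

X, y = load_iris(return_X_y=True)

# several metrics at once; training scores and timings are returned as well
results = cross_validate(LogisticRegression(), X, y, cv=5,
                         scoring=('accuracy', 'f1_macro'),
                         return_train_score=True)
print(sorted(results.keys()))
# ['fit_time', 'score_time', 'test_accuracy', 'test_f1_macro',
#  'train_accuracy', 'train_f1_macro']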
LabelPropagation was previously broken. LabelSpreading should now correctly respect its alpha parameter. Please see the full changelog at: http://scikit-learn.org/0.19/whats_new.html#version-0-19 Notably some models have changed behaviors (bug fixes) and some methods or parameters part of the public API have been deprecated. A big thank you to anyone who made this release possible and Joel in particular. -- Olivier -------------- next part -------------- An HTML attachment was scrubbed... URL: From t3kcit at gmail.com Fri Aug 11 17:57:03 2017 From: t3kcit at gmail.com (Andreas Mueller) Date: Fri, 11 Aug 2017 17:57:03 -0400 Subject: [scikit-learn] scikit-learn 0.19.0 is out! In-Reply-To: References: Message-ID: <77c26ae0-808a-3d87-1912-c634edc1fb7c@gmail.com> Thank you everybody for making the release possible, in particular Olivier and Joel :) Wohoo! From fabian.sippl at gmx.net Fri Aug 11 17:57:32 2017 From: fabian.sippl at gmx.net (fabian.sippl at gmx.net) Date: Fri, 11 Aug 2017 23:57:32 +0200 Subject: [scikit-learn] Question-Early Stopping MLPClassifer RandomizedSearchCV Message-ID: An HTML attachment was scrubbed... URL: From g.lemaitre58 at gmail.com Fri Aug 11 18:16:07 2017 From: g.lemaitre58 at gmail.com (=?UTF-8?Q?Guillaume_Lema=C3=AEtre?=) Date: Sat, 12 Aug 2017 00:16:07 +0200 Subject: [scikit-learn] scikit-learn 0.19.0 is out! In-Reply-To: <77c26ae0-808a-3d87-1912-c634edc1fb7c@gmail.com> References: <77c26ae0-808a-3d87-1912-c634edc1fb7c@gmail.com> Message-ID: Congrats guys!!!! On 11 August 2017 at 23:57, Andreas Mueller wrote: > Thank you everybody for making the release possible, in particular Olivier > and Joel :) > > Wohoo! > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -- Guillaume Lemaitre INRIA Saclay - Parietal team Center for Data Science Paris-Saclay https://glemaitre.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Sat Aug 12 01:14:49 2017 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sat, 12 Aug 2017 07:14:49 +0200 Subject: [scikit-learn] scikit-learn 0.19.0 is out! In-Reply-To: References: <77c26ae0-808a-3d87-1912-c634edc1fb7c@gmail.com> Message-ID: <20170812051449.GD3225585@phare.normalesup.org> Hurray, thank you everybody. This is a good one! (as always). Ga?l On Sat, Aug 12, 2017 at 12:16:07AM +0200, Guillaume Lema?tre wrote: > Congrats guys!!!! > On 11 August 2017 at 23:57, Andreas Mueller wrote: > Thank you everybody for making the release possible, in particular Olivier > and Joel :) > Wohoo! > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -- Gael Varoquaux Researcher, INRIA Parietal NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France Phone: ++ 33-1-69-08-79-68 http://gael-varoquaux.info http://twitter.com/GaelVaroquaux From se.raschka at gmail.com Sat Aug 12 01:19:41 2017 From: se.raschka at gmail.com (Sebastian Raschka) Date: Sat, 12 Aug 2017 01:19:41 -0400 Subject: [scikit-learn] scikit-learn 0.19.0 is out! In-Reply-To: <20170812051449.GD3225585@phare.normalesup.org> References: <77c26ae0-808a-3d87-1912-c634edc1fb7c@gmail.com> <20170812051449.GD3225585@phare.normalesup.org> Message-ID: Yay, as an avid user, thanks to all the developers! 
This is a great release indeed -- no breaking changes (at least for my code base) and so many improvements and additions (that I need to check out in detail) :) > On Aug 12, 2017, at 1:14 AM, Gael Varoquaux wrote: > > Hurray, thank you everybody. This is a good one! (as always). > > Ga?l > > On Sat, Aug 12, 2017 at 12:16:07AM +0200, Guillaume Lema?tre wrote: >> Congrats guys!!!! > >> On 11 August 2017 at 23:57, Andreas Mueller wrote: > >> Thank you everybody for making the release possible, in particular Olivier >> and Joel :) > >> Wohoo! > >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > -- > Gael Varoquaux > Researcher, INRIA Parietal > NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France > Phone: ++ 33-1-69-08-79-68 > http://gael-varoquaux.info http://twitter.com/GaelVaroquaux > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From raga.markely at gmail.com Sat Aug 12 01:32:38 2017 From: raga.markely at gmail.com (Raga Markely) Date: Sat, 12 Aug 2017 01:32:38 -0400 Subject: [scikit-learn] scikit-learn 0.19.0 is out! In-Reply-To: References: <77c26ae0-808a-3d87-1912-c634edc1fb7c@gmail.com> <20170812051449.GD3225585@phare.normalesup.org> Message-ID: Thanks a lot for all the hard work and congratz! Best, Raga On Aug 12, 2017 1:21 AM, "Sebastian Raschka" wrote: > Yay, as an avid user, thanks to all the developers! This is a great > release indeed -- no breaking changes (at least for my code base) and so > many improvements and additions (that I need to check out in detail) :) > > > > On Aug 12, 2017, at 1:14 AM, Gael Varoquaux < > gael.varoquaux at normalesup.org> wrote: > > > > Hurray, thank you everybody. This is a good one! (as always). > > > > Ga?l > > > > On Sat, Aug 12, 2017 at 12:16:07AM +0200, Guillaume Lema?tre wrote: > >> Congrats guys!!!! > > > >> On 11 August 2017 at 23:57, Andreas Mueller wrote: > > > >> Thank you everybody for making the release possible, in particular > Olivier > >> and Joel :) > > > >> Wohoo! > > > >> _______________________________________________ > >> scikit-learn mailing list > >> scikit-learn at python.org > >> https://mail.python.org/mailman/listinfo/scikit-learn > > -- > > Gael Varoquaux > > Researcher, INRIA Parietal > > NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France > > Phone: ++ 33-1-69-08-79-68 > > http://gael-varoquaux.info http://twitter.com/ > GaelVaroquaux > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ashimb9 at gmail.com Sat Aug 12 02:32:57 2017 From: ashimb9 at gmail.com (Ashim Bhattarai) Date: Sat, 12 Aug 2017 01:32:57 -0500 Subject: [scikit-learn] scikit-learn 0.19.0 is out! In-Reply-To: References: <77c26ae0-808a-3d87-1912-c634edc1fb7c@gmail.com> <20170812051449.GD3225585@phare.normalesup.org> Message-ID: Yes, thank you everyone! On Sat, Aug 12, 2017 at 12:32 AM, Raga Markely wrote: > Thanks a lot for all the hard work and congratz! 
> > Best, > Raga > > On Aug 12, 2017 1:21 AM, "Sebastian Raschka" wrote: > >> Yay, as an avid user, thanks to all the developers! This is a great >> release indeed -- no breaking changes (at least for my code base) and so >> many improvements and additions (that I need to check out in detail) :) >> >> >> > On Aug 12, 2017, at 1:14 AM, Gael Varoquaux < >> gael.varoquaux at normalesup.org> wrote: >> > >> > Hurray, thank you everybody. This is a good one! (as always). >> > >> > Ga?l >> > >> > On Sat, Aug 12, 2017 at 12:16:07AM +0200, Guillaume Lema?tre wrote: >> >> Congrats guys!!!! >> > >> >> On 11 August 2017 at 23:57, Andreas Mueller wrote: >> > >> >> Thank you everybody for making the release possible, in particular >> Olivier >> >> and Joel :) >> > >> >> Wohoo! >> > >> >> _______________________________________________ >> >> scikit-learn mailing list >> >> scikit-learn at python.org >> >> https://mail.python.org/mailman/listinfo/scikit-learn >> > -- >> > Gael Varoquaux >> > Researcher, INRIA Parietal >> > NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France >> > Phone: ++ 33-1-69-08-79-68 >> > http://gael-varoquaux.info http://twitter.com/GaelVaroqua >> ux >> > _______________________________________________ >> > scikit-learn mailing list >> > scikit-learn at python.org >> > https://mail.python.org/mailman/listinfo/scikit-learn >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bertrand.thirion at inria.fr Sat Aug 12 04:50:43 2017 From: bertrand.thirion at inria.fr (bthirion) Date: Sat, 12 Aug 2017 10:50:43 +0200 Subject: [scikit-learn] scikit-learn 0.19.0 is out! In-Reply-To: References: Message-ID: <539f18e9-8e59-7294-bd3e-26ae24928f3f@inria.fr> Congratulations for all these improvements and for orchestrating the release ! Bertrand On 11/08/2017 23:49, Olivier Grisel wrote: > Grab it with pip or conda ! > > Quoting the release highlights from the website: > > We are excited to release a number of great new features including > neighbors.LocalOutlierFactor for anomaly detection, > preprocessing.QuantileTransformer for robust feature transformation, > and the multioutput.ClassifierChain meta-estimator to simply account > for dependencies between classes in multilabel problems. We have some > new algorithms in existing estimators, such as multiplicative update > in decomposition.NMF and multinomial linear_model.LogisticRegression > with L1 loss (use solver='saga'). > > Cross validation is now able to return the results from multiple > metric evaluations. The new model_selection.cross_validate can return > many scores on the test data as well as training set performance and > timings, and we have extended the scoring and refit parameters for > grid/randomized search to handle multiple metrics. > > You can also learn faster. For instance, the new option to cache > transformations in pipeline.Pipeline makes grid search over pipelines > including slow transformations much more efficient. And you can > predict faster: if you?re sure you know what you?re doing, you can > turn off validating that the input is finite using config_context. > > We?ve made some important fixes too. 
We?ve fixed a longstanding > implementation error in metrics.average_precision_score, so please be > cautious with prior results reported from that function. A number of > errors in the manifold.TSNE implementation have been fixed, > particularly in the default Barnes-Hut approximation. > semi_supervised.LabelSpreading and semi_supervised.LabelPropagation > have had substantial fixes. LabelPropagation was previously broken. > LabelSpreading should now correctly respect its alpha parameter. > > Please see the full changelog at: > > http://scikit-learn.org/0.19/whats_new.html#version-0-19 > > Notably some models have changed behaviors (bug fixes) and some > methods or parameters part of the public API have been deprecated. > > A big thank you to anyone who made this release possible and Joel in > particular. > > -- > Olivier > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From valerio.maggio at gmail.com Sat Aug 12 05:18:08 2017 From: valerio.maggio at gmail.com (Valerio Maggio) Date: Sat, 12 Aug 2017 09:18:08 +0000 Subject: [scikit-learn] scikit-learn 0.19.0 is out! In-Reply-To: References: <77c26ae0-808a-3d87-1912-c634edc1fb7c@gmail.com> <20170812051449.GD3225585@phare.normalesup.org> Message-ID: On Sat, 12 Aug 2017 at 07:20, Sebastian Raschka wrote: > Yay, as an avid user, thanks to all the developers! This is a great > release indeed -- no breaking changes (at least for my code base) and so > many improvements and additions (that I need to check out in detail) :) Quoting Sebastian: totally agree!! +1 Thanks a lot for this super new release and for all these improvements. Cheers Valerio > > > > On Aug 12, 2017, at 1:14 AM, Gael Varoquaux < > gael.varoquaux at normalesup.org> wrote: > > > > Hurray, thank you everybody. This is a good one! (as always). > > > > Ga?l > > > > On Sat, Aug 12, 2017 at 12:16:07AM +0200, Guillaume Lema?tre wrote: > >> Congrats guys!!!! > > > >> On 11 August 2017 at 23:57, Andreas Mueller wrote: > > > >> Thank you everybody for making the release possible, in particular > Olivier > >> and Joel :) > > > >> Wohoo! > > > >> _______________________________________________ > >> scikit-learn mailing list > >> scikit-learn at python.org > >> https://mail.python.org/mailman/listinfo/scikit-learn > > -- > > Gael Varoquaux > > Researcher, INRIA Parietal > > NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France > > Phone: ++ 33-1-69-08-79-68 > > http://gael-varoquaux.info > http://twitter.com/GaelVaroquaux > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -- # valerio -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexandre.gramfort at telecom-paristech.fr Sat Aug 12 08:11:13 2017 From: alexandre.gramfort at telecom-paristech.fr (Alexandre Gramfort) Date: Sat, 12 Aug 2017 14:11:13 +0200 Subject: [scikit-learn] Truncated svd not working for complex matrices In-Reply-To: <20170811164531.GB3756445@phare.normalesup.org> References: <20170811164531.GB3756445@phare.normalesup.org> Message-ID: I agree with Ga?l on this. 
If you want to support complex values just copy the estimators / functions you want and maintain them in a separate package. +1 to error when complex are passed. From b.noushin7 at gmail.com Sun Aug 13 12:16:53 2017 From: b.noushin7 at gmail.com (Ariani A) Date: Sun, 13 Aug 2017 12:16:53 -0400 Subject: [scikit-learn] No module named crluster.hierarchical Message-ID: Dear all, I am writing this import: from sklearn.crluster.hierarchical import (_hc_cut, _TREE_BUILDERS, linkage_tree) But it gives this error: ImportError: No module named crluster.hierarchical Any clue? Best regards, -Noushin -------------- next part -------------- An HTML attachment was scrubbed... URL: From zephyr14 at gmail.com Sun Aug 13 12:20:27 2017 From: zephyr14 at gmail.com (Vlad Niculae) Date: Sun, 13 Aug 2017 12:20:27 -0400 Subject: [scikit-learn] No module named crluster.hierarchical In-Reply-To: References: Message-ID: Looks like you're misspelling the word "cluster". Yours, Vlad On Aug 13, 2017 12:19 PM, "Ariani A" wrote: > Dear all, > > I am writing this import: > > from sklearn.crluster.hierarchical import (_hc_cut, _TREE_BUILDERS, > linkage_tree) > But it gives this error: > ImportError: No module named crluster.hierarchical > > Any clue? > Best regards, > -Noushin > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From b.noushin7 at gmail.com Sun Aug 13 12:25:49 2017 From: b.noushin7 at gmail.com (Ariani A) Date: Sun, 13 Aug 2017 12:25:49 -0400 Subject: [scikit-learn] No module named crluster.hierarchical In-Reply-To: References: Message-ID: Thank you so much! On Sun, Aug 13, 2017 at 12:20 PM, Vlad Niculae wrote: > Looks like you're misspelling the word "cluster". > > Yours, > Vlad > > On Aug 13, 2017 12:19 PM, "Ariani A" wrote: > >> Dear all, >> >> I am writing this import: >> >> from sklearn.crluster.hierarchical import (_hc_cut, _TREE_BUILDERS, >> linkage_tree) >> But it gives this error: >> ImportError: No module named crluster.hierarchical >> >> Any clue? >> Best regards, >> -Noushin >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From t3kcit at gmail.com Mon Aug 14 10:05:56 2017 From: t3kcit at gmail.com (Andreas Mueller) Date: Mon, 14 Aug 2017 10:05:56 -0400 Subject: [scikit-learn] Question-Early Stopping MLPClassifer RandomizedSearchCV In-Reply-To: References: Message-ID: Yes, you understood correctly. You can see the implementation in the code: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/neural_network/multilayer_perceptron.py#L491 It calls ``train_test_split``, so it's a random subset of the data. Currently the API doesn't allow providing your own validation set. What is the use-case for that? Andy On 08/11/2017 05:57 PM, fabian.sippl at gmx.net wrote: > Hello Scikit-Learn Team, > I?ve got a question concerning the implementation of Early Stopping in > MLPClassifier. I am using it in combination with RandomizedSearchCV. 
> The fraction used for validation in early stopping is set with the > parameter validation_fraction of MLPClassifier. How is the validaton > set extracted from the training set ? Does the function simply take > the last X % from the training set ? Is there a possibility to > manually set this validation set ? > I wonder whether I correctly understand the functionality: The neural > net is trained on the training data and the performance is evaluated > after every epoch on the validation set (which is internally selected > by the MLPClassifer)? If the Net stops training, the performance on > the left out data (Parameter "cv" in RandomizedSearch) is determined ? > Thank you very much for your help ! > Kind Regards, > Fabian Sippl > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From georg.kf.heiler at gmail.com Wed Aug 16 07:28:21 2017 From: georg.kf.heiler at gmail.com (Georg Heiler) Date: Wed, 16 Aug 2017 11:28:21 +0000 Subject: [scikit-learn] caching transformers during hyper parameter optimization Message-ID: There is a new option in the pipeline: http://scikit-learn.org/stable/modules/pipeline.html#pipeline-cache How can I use this to also store the transformed data as I only want to compute the last step i.e. estimator during hyper parameter tuning and not the transform methods of the clean steps? Is there a possibility to apply this for crossvalidation? I would want to see all the folds precomputed and stored to disk in a folder. Regards, Georg -------------- next part -------------- An HTML attachment was scrubbed... URL: From joel.nothman at gmail.com Wed Aug 16 07:51:19 2017 From: joel.nothman at gmail.com (Joel Nothman) Date: Wed, 16 Aug 2017 21:51:19 +1000 Subject: [scikit-learn] caching transformers during hyper parameter optimization In-Reply-To: References: Message-ID: We certainly considered this over the many years that Pipeline caching has been in the pipeline. Storing the fitted model means we can do both a fit_transform and a transform on new data, and in many cases takes away the pain point of CV over pipelines where downstream steps are varied. What transformer are you using where the transform is costly? Or is it more a matter of you wanting to store the transformed data at each step? There are custom ways to do this sort of thing generically with a mixin if you really want. On 16 August 2017 at 21:28, Georg Heiler wrote: > There is a new option in the pipeline: http://scikit-learn. > org/stable/modules/pipeline.html#pipeline-cache > How can I use this to also store the transformed data as I only want to > compute the last step i.e. estimator during hyper parameter tuning and not > the transform methods of the clean steps? > > Is there a possibility to apply this for crossvalidation? I would want to > see all the folds precomputed and stored to disk in a folder. > > Regards, > Georg > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From georg.kf.heiler at gmail.com Wed Aug 16 12:53:28 2017 From: georg.kf.heiler at gmail.com (Georg Heiler) Date: Wed, 16 Aug 2017 16:53:28 +0000 Subject: [scikit-learn] caching transformers during hyper parameter optimization In-Reply-To: References: Message-ID: Data cleaning @ enrichment Could you link an example for a mixing? Currently this is a bit if a mess with custom pickle persistence in a big for loop and custom transformers Thanks. Georg Joel Nothman schrieb am Mi. 16. Aug. 2017 um 13:51: > We certainly considered this over the many years that Pipeline caching has > been in the pipeline. Storing the fitted model means we can do both a > fit_transform and a transform on new data, and in many cases takes away the > pain point of CV over pipelines where downstream steps are varied. > > What transformer are you using where the transform is costly? Or is it > more a matter of you wanting to store the transformed data at each step? > > There are custom ways to do this sort of thing generically with a mixin if > you really want. > > On 16 August 2017 at 21:28, Georg Heiler > wrote: > >> There is a new option in the pipeline: >> http://scikit-learn.org/stable/modules/pipeline.html#pipeline-cache >> How can I use this to also store the transformed data as I only want to >> compute the last step i.e. estimator during hyper parameter tuning and not >> the transform methods of the clean steps? >> >> Is there a possibility to apply this for crossvalidation? I would want to >> see all the folds precomputed and stored to disk in a folder. >> >> Regards, >> Georg >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From joel.nothman at gmail.com Wed Aug 16 21:15:03 2017 From: joel.nothman at gmail.com (Joel Nothman) Date: Thu, 17 Aug 2017 11:15:03 +1000 Subject: [scikit-learn] caching transformers during hyper parameter optimization In-Reply-To: References: Message-ID: Now this isn't the best example, because joblib.Memory isn't going to be very fast at dumping a list of strings, but I hope you can get the idea from https://gist.github.com/jnothman/019d594d197c98a3d6192fa0cb19c850 On 17 August 2017 at 02:53, Georg Heiler wrote: > Data cleaning @ enrichment > > Could you link an example for a mixing? > > Currently this is a bit if a mess with custom pickle persistence in a big > for loop and custom transformers > > Thanks. > Georg > Joel Nothman schrieb am Mi. 16. Aug. 2017 um > 13:51: > >> We certainly considered this over the many years that Pipeline caching >> has been in the pipeline. Storing the fitted model means we can do both a >> fit_transform and a transform on new data, and in many cases takes away the >> pain point of CV over pipelines where downstream steps are varied. >> >> What transformer are you using where the transform is costly? Or is it >> more a matter of you wanting to store the transformed data at each step? >> >> There are custom ways to do this sort of thing generically with a mixin >> if you really want. >> >> On 16 August 2017 at 21:28, Georg Heiler >> wrote: >> >>> There is a new option in the pipeline: http://scikit-learn. 
>>> org/stable/modules/pipeline.html#pipeline-cache >>> How can I use this to also store the transformed data as I only want to >>> compute the last step i.e. estimator during hyper parameter tuning and not >>> the transform methods of the clean steps? >>> >>> Is there a possibility to apply this for crossvalidation? I would want >>> to see all the folds precomputed and stored to disk in a folder. >>> >>> Regards, >>> Georg >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sambarnett95 at gmail.com Thu Aug 17 05:22:21 2017 From: sambarnett95 at gmail.com (Sam Barnett) Date: Thu, 17 Aug 2017 10:22:21 +0100 Subject: [scikit-learn] Malformed input for SVC(kernel='precomputed').predict() Message-ID: I am rolling classifier based on SVC which computes a custom Gram matrix and runs this through the SVC classifier with kernel = 'precomputed'. While this works fine with the fit method, I face a dilemma with the predict method, shown here: def predict(self, X): """Run the predict method of the previously-instantiated SVM classifier, returning the predicted classes for test set X.""" # Check is fit had been called check_is_fitted(self, ['X_', 'y_']) # Input validation X = check_array(X) cut_off = self.cut_ord_pair[0] order = self.cut_ord_pair[1] X_gram = seq_kernel_free(X, self.X_, \ pri_kernel=kernselect(self.kernel, self.coef0, self.gamma, self.degree, self.scale), \ cut_off=cut_off, order=order) X_gram = np.nan_to_num(X_gram) return self.ord_svc_.predict(X_gram) This will run on any dataset just fine. However, it fails the check_estimator test. Specifically, when trying to raise an error for malformed input on predict (in check_classifiers_train), it says that a ValueError is not raised. Yet if I change the order of X and self.X_ in seq_kernel_free (which computes the [n_samples_train, n_samples_test] Gram matrix), it passes the check_estimator test yet fails to run the predict method. How do I resolve both issues simultaneously? -------------- next part -------------- An HTML attachment was scrubbed... URL: From georg.kf.heiler at gmail.com Thu Aug 17 07:50:33 2017 From: georg.kf.heiler at gmail.com (Georg Heiler) Date: Thu, 17 Aug 2017 11:50:33 +0000 Subject: [scikit-learn] Categorical handling Message-ID: Hi, how can I properly handle categorical values in scikit-learn? https://stackoverflow.com/questions/45727934/pandas-categories-new-levels?noredirect=1#comment78424496_45727934 goals - scikit-learn syle fit/transform methods to encode labels of categorical features of X - should handle unseen labels - should be faster than running a label encoder manually for each fold and manually checking if the label already was seen in the training data i.e. 
what I currently do ( https://stackoverflow.com/questions/45727934/pandas-categories-new-levels?noredirect=1#comment78424496_45727934 which links to https://gist.github.com/geoHeil/5caff5236b4850d673b2c9b0799dc2ce ) - only some columns are categorical, and only these should be converted Regards, Georg -------------- next part -------------- An HTML attachment was scrubbed... URL: From t3kcit at gmail.com Thu Aug 17 11:03:16 2017 From: t3kcit at gmail.com (Andreas Mueller) Date: Thu, 17 Aug 2017 11:03:16 -0400 Subject: [scikit-learn] Malformed input for SVC(kernel='precomputed').predict() In-Reply-To: References: Message-ID: <0d66111e-3e35-869b-9753-1f20cb118216@gmail.com> Hi Sam. Can you say which test fails exactly and where (i.e. give traceback)? The estimator checks are currently quite strict with respect to raising helpful error messages. That doesn't mean your estimator is broken (necessarily). With a precomputed gram matrix, I expect the shape of X in predict to be (n_samples_test, n_samples_train), right? Does you estimator have a _pairwise attribute? (It should to work with cross-validation, I'm not sure if it's used in the estimator checks right now, but it should). Your feedback will help making check_estimator be more robust. I don't think it's tested with anything that requires "precomputed" kernels. Thanks Andy On 08/17/2017 05:22 AM, Sam Barnett wrote: > I am rolling classifier based on SVC which computes a custom Gram > matrix and runs this through the SVC classifier with kernel = > 'precomputed'. While this works fine with the fit method, I face a > dilemma with the predict method, shown here: > > > def predict(self, X): > """Run the predict method of the previously-instantiated SVM > classifier, returning the predicted classes for test set X.""" > > # Check is fit had been called > check_is_fitted(self, ['X_', 'y_']) > > # Input validation > X = check_array(X) > > cut_off = self.cut_ord_pair[0] > order = self.cut_ord_pair[1] > > X_gram = seq_kernel_free(X, self.X_, \ > pri_kernel=kernselect(self.kernel, self.coef0, self.gamma, > self.degree, self.scale), \ > cut_off=cut_off, order=order) > > X_gram = np.nan_to_num(X_gram) > > return self.ord_svc_.predict(X_gram) > > This will run on any dataset just fine. However, it fails the > check_estimator test. Specifically, when trying to raise an error for > malformed input on predict (in check_classifiers_train), it says that > a ValueError is not raised. Yet if I change the order of X and self.X_ > in seq_kernel_free (which computes the [n_samples_train, > n_samples_test] Gram matrix), it passes the check_estimator test yet > fails to run the predict method. > > How do I resolve both issues simultaneously? > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From t3kcit at gmail.com Thu Aug 17 11:11:49 2017 From: t3kcit at gmail.com (Andreas Mueller) Date: Thu, 17 Aug 2017 11:11:49 -0400 Subject: [scikit-learn] Categorical handling In-Reply-To: References: Message-ID: Hi Georg. Unfortunately this is not entirely trivial right now, but will be fixed by https://github.com/scikit-learn/scikit-learn/pull/9151 and https://github.com/scikit-learn/scikit-learn/pull/9012 which will be in the next release (0.20). 
LabelBinarizer is probably the best work-around for now, and selecting columns can be done (awkwardly) like in this example: http://scikit-learn.org/dev/auto_examples/hetero_feature_union.html#sphx-glr-auto-examples-hetero-feature-union-py Best, Andy On 08/17/2017 07:50 AM, Georg Heiler wrote: > Hi, > > how can I properly handle categorical values in scikit-learn? > https://stackoverflow.com/questions/45727934/pandas-categories-new-levels?noredirect=1#comment78424496_45727934 > > > goals > > * scikit-learn syle fit/transform methods to encode labels of > categorical features of X > * should handle unseen labels > * should be faster than running a label encoder manually for each > fold and manually checking if the label already was seen in the > training data i.e. what I currently do > (https://stackoverflow.com/questions/45727934/pandas-categories-new-levels?noredirect=1#comment78424496_45727934 which > links to > https://gist.github.com/geoHeil/5caff5236b4850d673b2c9b0799dc2ce) > * only some columns are categorical, and only these should be converted > > > Regards, > Georg > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From joel.nothman at gmail.com Thu Aug 17 11:26:13 2017 From: joel.nothman at gmail.com (Joel Nothman) Date: Fri, 18 Aug 2017 01:26:13 +1000 Subject: [scikit-learn] Categorical handling In-Reply-To: References: Message-ID: I don't consider LabelBinarizer the best workaround. Given a Pandas dataframe df, I'd use: DictVectorizer().fit_transform(df.to_dict(orient='records')) which will handle encoding strings with one-hot and numerical features as column vectors. Or: class PandasVectorizer(DictVectorizer): def fit(self, x, y=None): return super(PandasVectorizer, self).fit(x.to_dict('records')) def fit_transform(self, x, y=None): return super(PandasVectorizer, self).fit_transform(x.to_dict('records')) def transform(self, x): return super(PandasVectorizer, self).transform(x.to_dict('records')) On 18 August 2017 at 01:11, Andreas Mueller wrote: > Hi Georg. > Unfortunately this is not entirely trivial right now, but will be fixed by > https://github.com/scikit-learn/scikit-learn/pull/9151 > and > https://github.com/scikit-learn/scikit-learn/pull/9012 > which will be in the next release (0.20). > > LabelBinarizer is probably the best work-around for now, and selecting > columns can be done (awkwardly) > like in this example: http://scikit-learn.org/dev/ > auto_examples/hetero_feature_union.html#sphx-glr-auto- > examples-hetero-feature-union-py > > Best, > Andy > > > On 08/17/2017 07:50 AM, Georg Heiler wrote: > > Hi, > > how can I properly handle categorical values in scikit-learn? > https://stackoverflow.com/questions/45727934/pandas-categories-new-levels? > noredirect=1#comment78424496_45727934 > > goals > > - scikit-learn syle fit/transform methods to encode labels of > categorical features of X > - should handle unseen labels > - should be faster than running a label encoder manually for each fold > and manually checking if the label already was seen in the training data > i.e. what I currently do (https://stackoverflow.com/ > questions/45727934/pandas-categories-new-levels? 
> noredirect=1#comment78424496_45727934 > which > links to https://gist.github.com/geoHeil/5caff5236b4850d673b2c9b0799dc2 > ce) > - only some columns are categorical, and only these should be converted > > > Regards, > Georg > > > _______________________________________________ > scikit-learn mailing listscikit-learn at python.orghttps://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From joel.nothman at gmail.com Thu Aug 17 11:27:43 2017 From: joel.nothman at gmail.com (Joel Nothman) Date: Fri, 18 Aug 2017 01:27:43 +1000 Subject: [scikit-learn] Categorical handling In-Reply-To: References: Message-ID: gist at https://gist.github.com/jnothman/a75bac778c1eb9661017555249e50379 On 18 August 2017 at 01:26, Joel Nothman wrote: > I don't consider LabelBinarizer the best workaround. > > Given a Pandas dataframe df, I'd use: > > DictVectorizer().fit_transform(df.to_dict(orient='records')) > > which will handle encoding strings with one-hot and numerical features as > column vectors. Or: > > class PandasVectorizer(DictVectorizer): > def fit(self, x, y=None): > return super(PandasVectorizer, self).fit(x.to_dict('records')) > def fit_transform(self, x, y=None): > return super(PandasVectorizer, self).fit_transform(x.to_dict( > 'records')) > def transform(self, x): > return super(PandasVectorizer, self).transform(x.to_dict(' > records')) > > > On 18 August 2017 at 01:11, Andreas Mueller wrote: > >> Hi Georg. >> Unfortunately this is not entirely trivial right now, but will be fixed by >> https://github.com/scikit-learn/scikit-learn/pull/9151 >> and >> https://github.com/scikit-learn/scikit-learn/pull/9012 >> which will be in the next release (0.20). >> >> LabelBinarizer is probably the best work-around for now, and selecting >> columns can be done (awkwardly) >> like in this example: http://scikit-learn.org/dev/au >> to_examples/hetero_feature_union.html#sphx-glr-auto-examples >> -hetero-feature-union-py >> >> Best, >> Andy >> >> >> On 08/17/2017 07:50 AM, Georg Heiler wrote: >> >> Hi, >> >> how can I properly handle categorical values in scikit-learn? >> https://stackoverflow.com/questions/45727934/pandas-categori >> es-new-levels?noredirect=1#comment78424496_45727934 >> >> goals >> >> - scikit-learn syle fit/transform methods to encode labels of >> categorical features of X >> - should handle unseen labels >> - should be faster than running a label encoder manually for each >> fold and manually checking if the label already was seen in the training >> data i.e. what I currently do (https://stackoverflow.com/que >> stions/45727934/pandas-categories-new-levels?noredirect=1# >> comment78424496_45727934 >> which >> links to https://gist.github.com/geoHeil/5caff5236b4850d673b2c9b07 >> 99dc2ce) >> - only some columns are categorical, and only these should be >> converted >> >> >> Regards, >> Georg >> >> >> _______________________________________________ >> scikit-learn mailing listscikit-learn at python.orghttps://mail.python.org/mailman/listinfo/scikit-learn >> >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sambarnett95 at gmail.com Thu Aug 17 13:21:24 2017 From: sambarnett95 at gmail.com (Sam Barnett) Date: Thu, 17 Aug 2017 18:21:24 +0100 Subject: [scikit-learn] Malformed input for SVC(kernel='precomputed').predict() In-Reply-To: <0d66111e-3e35-869b-9753-1f20cb118216@gmail.com> References: <0d66111e-3e35-869b-9753-1f20cb118216@gmail.com> Message-ID: Hi Andy, Please find attached a Jupyter notebook showing exactly where the problem appears. Best, Sam On Thu, Aug 17, 2017 at 4:03 PM, Andreas Mueller wrote: > Hi Sam. > > Can you say which test fails exactly and where (i.e. give traceback)? > The estimator checks are currently quite strict with respect to raising > helpful error messages. > That doesn't mean your estimator is broken (necessarily). > With a precomputed gram matrix, I expect the shape of X in predict to be > (n_samples_test, n_samples_train), right? > Does you estimator have a _pairwise attribute? (It should to work with > cross-validation, I'm not sure if it's > used in the estimator checks right now, but it should). > > Your feedback will help making check_estimator be more robust. I don't > think it's tested with anything that requires > "precomputed" kernels. > > Thanks > > Andy > > > On 08/17/2017 05:22 AM, Sam Barnett wrote: > > I am rolling classifier based on SVC which computes a custom Gram matrix > and runs this through the SVC classifier with kernel = 'precomputed'. While > this works fine with the fit method, I face a dilemma with the predict > method, shown here: > > > def predict(self, X): > """Run the predict method of the previously-instantiated SVM > classifier, returning the predicted classes for test set X.""" > > # Check is fit had been called > check_is_fitted(self, ['X_', 'y_']) > > # Input validation > X = check_array(X) > > cut_off = self.cut_ord_pair[0] > order = self.cut_ord_pair[1] > > X_gram = seq_kernel_free(X, self.X_, \ > pri_kernel=kernselect(self.kernel, self.coef0, self.gamma, > self.degree, self.scale), \ > cut_off=cut_off, order=order) > > X_gram = np.nan_to_num(X_gram) > > return self.ord_svc_.predict(X_gram) > > This will run on any dataset just fine. However, it fails the > check_estimator test. Specifically, when trying to raise an error for > malformed input on predict (in check_classifiers_train), it says that a > ValueError is not raised. Yet if I change the order of X and self.X_ in > seq_kernel_free (which computes the [n_samples_train, n_samples_test] Gram > matrix), it passes the check_estimator test yet fails to run the predict > method. > > How do I resolve both issues simultaneously? > > > > _______________________________________________ > scikit-learn mailing listscikit-learn at python.orghttps://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: kernelsqizer.py Type: text/x-python-script Size: 5142 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: SeqSVC Check Estimator Test.ipynb Type: application/octet-stream Size: 9321 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: seqsvc_v2.py Type: text/x-python-script Size: 10890 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: timeseriestools.py Type: text/x-python-script Size: 1419 bytes Desc: not available URL: From mcapizzi at email.arizona.edu Thu Aug 17 14:46:50 2017 From: mcapizzi at email.arizona.edu (Michael Capizzi) Date: Thu, 17 Aug 2017 11:46:50 -0700 Subject: [scikit-learn] any interest in incorporating a new Transformer? Message-ID: Hi all - Forgive me if this is the wrong place for posting this question, but I'd like to inquire about the community's interest in incorporating a new Transformer into the code base. This paper ( https://nlp.stanford.edu/pubs/sidaw12_simple_sentiment.pdf ) is a "classic" in Natural Language Processing and is often times used as a very competitive baseline. TL;DR it transforms a traditional count-based feature space into the conditional probabilities of a `Naive Bayes` classifier. These transformed features can then be used to train any linear classifier. The paper focuses on `SVM`. The attached notebook has an example of the custom `Transformer` I built along with a custom `Classifier` to utilize this `Transformer` in a `multiclass` case (as the feature space transformation differs depending on the label). If there is interest in the community for the inclusion of this `Transformer` and `Classifier`, I'd happily go through the official process of a `pull-request`, etc. -Michael -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From joel.nothman at gmail.com Sat Aug 19 05:47:06 2017 From: joel.nothman at gmail.com (Joel Nothman) Date: Sat, 19 Aug 2017 19:47:06 +1000 Subject: [scikit-learn] any interest in incorporating a new Transformer? In-Reply-To: References: Message-ID: this is the right place to ask, but I'd be more interested to see a scikit-learn-compatible implementation available, perhaps in scikit-learn-contrib more than to see it part of the main package... On 19 Aug 2017 2:13 am, "Michael Capizzi" wrote: > Hi all - > > Forgive me if this is the wrong place for posting this question, but I'd > like to inquire about the community's interest in incorporating a new > Transformer into the code base. > > This paper ( https://nlp.stanford.edu/pubs/sidaw12_simple_sentiment.pdf ) > is a "classic" in Natural Language Processing and is often times used as a > very competitive baseline. TL;DR it transforms a traditional count-based > feature space into the conditional probabilities of a `Naive Bayes` > classifier. These transformed features can then be used to train any > linear classifier. The paper focuses on `SVM`. > > The attached notebook has an example of the custom `Transformer` I built > along with a custom `Classifier` to utilize this `Transformer` in a > `multiclass` case (as the feature space transformation differs depending on > the label). > > If there is interest in the community for the inclusion of this > `Transformer` and `Classifier`, I'd happily go through the official process > of a `pull-request`, etc. > > -Michael > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mcapizzi at email.arizona.edu Sat Aug 19 18:36:21 2017 From: mcapizzi at email.arizona.edu (Michael Capizzi) Date: Sat, 19 Aug 2017 15:36:21 -0700 Subject: [scikit-learn] any interest in incorporating a new Transformer? In-Reply-To: References: Message-ID: Thanks @joel - I wasn?t aware of scikit-learn-contrib. Is this what you?re referring to? https://github.com/scikit-learn-contrib/scikit-learn-contrib If so, I don?t see any existing projects that this would fit into; could I start a new one in a pull-request? -M ? On Sat, Aug 19, 2017 at 2:47 AM, Joel Nothman wrote: > this is the right place to ask, but I'd be more interested to see a > scikit-learn-compatible implementation available, perhaps in > scikit-learn-contrib more than to see it part of the main package... > > On 19 Aug 2017 2:13 am, "Michael Capizzi" > wrote: > >> Hi all - >> >> Forgive me if this is the wrong place for posting this question, but I'd >> like to inquire about the community's interest in incorporating a new >> Transformer into the code base. >> >> This paper ( https://nlp.stanford.edu/pubs/sidaw12_simple_sentiment.pdf ) >> is a "classic" in Natural Language Processing and is often times used as a >> very competitive baseline. TL;DR it transforms a traditional count-based >> feature space into the conditional probabilities of a `Naive Bayes` >> classifier. These transformed features can then be used to train any >> linear classifier. The paper focuses on `SVM`. >> >> The attached notebook has an example of the custom `Transformer` I built >> along with a custom `Classifier` to utilize this `Transformer` in a >> `multiclass` case (as the feature space transformation differs depending on >> the label). >> >> If there is interest in the community for the inclusion of this >> `Transformer` and `Classifier`, I'd happily go through the official process >> of a `pull-request`, etc. >> >> -Michael >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From joel.nothman at gmail.com Sun Aug 20 08:28:44 2017 From: joel.nothman at gmail.com (Joel Nothman) Date: Sun, 20 Aug 2017 22:28:44 +1000 Subject: [scikit-learn] any interest in incorporating a new Transformer? In-Reply-To: References: Message-ID: The idea is to take the template ( https://github.com/scikit-learn-contrib/project-template), build, test and document your estimator(s), and offer it to be housed within scikit-learn-contrib. On 20 August 2017 at 08:36, Michael Capizzi wrote: > Thanks @joel - > > I wasn?t aware of scikit-learn-contrib. Is this what you?re referring to? > https://github.com/scikit-learn-contrib/scikit-learn-contrib > > If so, I don?t see any existing projects that this would fit into; could I > start a new one in a pull-request? > > -M > ? > > On Sat, Aug 19, 2017 at 2:47 AM, Joel Nothman > wrote: > >> this is the right place to ask, but I'd be more interested to see a >> scikit-learn-compatible implementation available, perhaps in >> scikit-learn-contrib more than to see it part of the main package... 
>> >> On 19 Aug 2017 2:13 am, "Michael Capizzi" >> wrote: >> >>> Hi all - >>> >>> Forgive me if this is the wrong place for posting this question, but I'd >>> like to inquire about the community's interest in incorporating a new >>> Transformer into the code base. >>> >>> This paper ( https://nlp.stanford.edu/pubs/sidaw12_simple_sentiment.pdf ) >>> is a "classic" in Natural Language Processing and is often times used as a >>> very competitive baseline. TL;DR it transforms a traditional count-based >>> feature space into the conditional probabilities of a `Naive Bayes` >>> classifier. These transformed features can then be used to train any >>> linear classifier. The paper focuses on `SVM`. >>> >>> The attached notebook has an example of the custom `Transformer` I built >>> along with a custom `Classifier` to utilize this `Transformer` in a >>> `multiclass` case (as the feature space transformation differs depending on >>> the label). >>> >>> If there is interest in the community for the inclusion of this >>> `Transformer` and `Classifier`, I'd happily go through the official process >>> of a `pull-request`, etc. >>> >>> -Michael >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From s.atasever at gmail.com Mon Aug 21 10:11:25 2017 From: s.atasever at gmail.com (Sema Atasever) Date: Mon, 21 Aug 2017 17:11:25 +0300 Subject: [scikit-learn] How can i write the birch prediction results to the file Message-ID: Dear scikit-learn developers, I have a text file where the columns represent the 22 features and the rows represent the amino asid . (you can see in the attachment) I want to apply hierarchical clustering to this database usign *sklearn.cluster.Birch algorithm.* There are too many prediction results and it is not possible to see them on the screen. How can i write the birch prediction results to the file? I would appreciate if you could advise on some methods. Thanks. *Birch Codes:* from sklearn.cluster import Birch import numpy as np X=np.loadtxt(open("C:\class1.txt", "rb"), delimiter=";") brc = Birch(branching_factor=50, n_clusters=None, threshold=0.5,compute_labels=True,copy=True) brc.fit(X) centroids = brc.subcluster_centers_ labels = brc.subcluster_labels_ n_clusters = np.unique(labels).size brc.predict(X) print("\n brc.predict(X)") print(brc.predict(X)) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 
-------------- next part --------------
[Attachment: class1.txt -- the semicolon-delimited data file referenced above,
with one row per amino acid and 22 feature columns per row, for example:

0.1877;0.386705;0.242412;0.175513;0.109395;0.5;0;0.244492;0.25402;0.501485;0.08978;0.30011;0.610105;0.399205;0.20793;0.392865;0.183304;0.450634;0.181905;0.119275;0.441045;0.15519

The remaining rows are omitted from this archive.]
From nicholdav at gmail.com  Mon Aug 21 10:38:57 2017
From: nicholdav at gmail.com (David Nicholson)
Date: Mon, 21 Aug 2017 10:38:57 -0400
Subject: [scikit-learn] How can i write the birch prediction results to the file
In-Reply-To: 
References: 
Message-ID: 

Hi Sema,

You can save using pickle from the Python standard library, or using the
joblib library which is a dependency of sklearn (so you have it already).
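For the pickle route, a minimal sketch would look like this (using the brc
and X from your script; the file name is just an example):

    import pickle

    birch_predict = brc.predict(X)
    with open('predictions.pkl', 'wb') as f:
        pickle.dump(birch_predict, f)

    # later, load the predictions back into memory
    with open('predictions.pkl', 'rb') as f:
        birch_predict = pickle.load(f)
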
The sklearn docs show examples of saving models, but it will work for your
prediction results too:
http://scikit-learn.org/stable/modules/model_persistence.html

You'd just do something like:

import joblib
...
# your code here
...
birch_predict = brc.predict(X)
filename = 'predictions'
joblib.dump(birch_predict, filename)

And you can get the values back into memory with joblib.load

Hth
--David (list lurker)

On Aug 21, 2017 10:13, "Sema Atasever" wrote:

Dear scikit-learn developers,

I have a text file where the columns represent the 22 features and the rows
represent the amino acids (you can see it in the attachment).

I want to apply hierarchical clustering to this database using the
*sklearn.cluster.Birch* algorithm.

There are too many prediction results and it is not possible to see them on
the screen. How can I write the Birch prediction results to a file?

I would appreciate it if you could advise on some methods.
Thanks.

*Birch Codes:*
from sklearn.cluster import Birch
import numpy as np

X = np.loadtxt(open("C:\class1.txt", "rb"), delimiter=";")

brc = Birch(branching_factor=50, n_clusters=None, threshold=0.5,
            compute_labels=True, copy=True)
brc.fit(X)

centroids = brc.subcluster_centers_
labels = brc.subcluster_labels_
n_clusters = np.unique(labels).size
brc.predict(X)

print("\n brc.predict(X)")
print(brc.predict(X))

_______________________________________________
scikit-learn mailing list
scikit-learn at python.org
https://mail.python.org/mailman/listinfo/scikit-learn
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From nicholdav at gmail.com  Mon Aug 21 10:41:51 2017
From: nicholdav at gmail.com (David Nicholson)
Date: Mon, 21 Aug 2017 10:41:51 -0400
Subject: [scikit-learn] How can i write the birch prediction results to the file
In-Reply-To: 
References: 
Message-ID: 

Ack, should've mentioned you can do:
from sklearn.externals import joblib
since it is a sklearn dependency. That way you won't need to install joblib
separately.

On Aug 21, 2017 10:38, "David Nicholson" wrote:

> Hi Sema,
>
> You can save using pickle from the Python standard library, or using the
> joblib library which is a dependency of sklearn (so you have it already).
>
> The sklearn docs show examples of saving models, but it will work for your
> prediction results too:
> http://scikit-learn.org/stable/modules/model_persistence.html
>
> You'd just do something like:
> import joblib
> ...
> # your code here
> ...
> birch_predict = brc.predict(X)
> filename = 'predictions'
> joblib.dump(birch_predict, filename)
>
> And you can get the values back into memory with joblib.load
>
> Hth
> --David (list lurker)
>
> On Aug 21, 2017 10:13, "Sema Atasever" wrote:
>
> Dear scikit-learn developers,
>
> I have a text file where the columns represent the 22 features and the
> rows represent the amino acids (you can see it in the attachment).
>
> I want to apply hierarchical clustering to this database using the
> *sklearn.cluster.Birch* algorithm.
>
> There are too many prediction results and it is not possible to see them
> on the screen.
> How can I write the Birch prediction results to a file?
>
> I would appreciate it if you could advise on some methods.
> Thanks.
> > *Birch Codes:* > from sklearn.cluster import Birch > import numpy as np > > X=np.loadtxt(open("C:\class1.txt", "rb"), delimiter=";") > > brc = Birch(branching_factor=50, n_clusters=None, > threshold=0.5,compute_labels=True,copy=True) > > brc.fit(X) > > centroids = brc.subcluster_centers_ > > labels = brc.subcluster_labels_ > n_clusters = np.unique(labels).size > brc.predict(X) > > print("\n brc.predict(X)") > print(brc.predict(X)) > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From s.atasever at gmail.com Tue Aug 22 04:24:26 2017 From: s.atasever at gmail.com (Sema Atasever) Date: Tue, 22 Aug 2017 11:24:26 +0300 Subject: [scikit-learn] How can i write the birch prediction results to the file In-Reply-To: References: Message-ID: Dear David, "joblib.dump" produces a file format with npy extension so I can not open the file with the notepad editor. I can not see the predictions results inside the file. Is there another way to save the prediction results in text format? Thank you for your help. On Mon, Aug 21, 2017 at 5:38 PM, David Nicholson wrote: > Hi Sema, > > You can save using pickle from the Python standard library, or using the > joblib library which is a dependency of sklearn (so you have it already). > > The sklearn docs show examples of saving models but it will work for your > predict results too: > http://scikit-learn.org/stable/modules/model_persistence.html > > You'd just do something like: > import joblib > ... > # your code here > ... > birch_predict = brc.predict(X) > filename = 'predictions' > joblib.dump(birch_predict, filename) > > And you can get the values back into memory with joblib.load > > Hth > --David (list lurker) > > On Aug 21, 2017 10:13, "Sema Atasever" wrote: > > Dear scikit-learn developers, > > I have a text file where the columns represent the 22 features and the > rows represent the amino asid . (you can see in the attachment) > > > I want to apply hierarchical clustering to this database usign *sklearn.cluster.Birch > algorithm.* > > There are too many prediction results and it is not possible to see them > on the screen. > How can i write the birch prediction results to the file? > > I would appreciate if you could advise on some methods. > Thanks. > > *Birch Codes:* > from sklearn.cluster import Birch > import numpy as np > > X=np.loadtxt(open("C:\class1.txt", "rb"), delimiter=";") > > brc = Birch(branching_factor=50, n_clusters=None, > threshold=0.5,compute_labels=True,copy=True) > > brc.fit(X) > > centroids = brc.subcluster_centers_ > > labels = brc.subcluster_labels_ > n_clusters = np.unique(labels).size > brc.predict(X) > > print("\n brc.predict(X)") > print(brc.predict(X)) > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rth.yurchak at gmail.com Tue Aug 22 04:33:21 2017 From: rth.yurchak at gmail.com (Roman Yurchak) Date: Tue, 22 Aug 2017 11:33:21 +0300 Subject: [scikit-learn] How can i write the birch prediction results to the file In-Reply-To: References: Message-ID: <433d5f58-4e2f-661e-6e54-bdd388a4c1ce@gmail.com> Hello Sema, On 22/08/17 11:24, Sema Atasever wrote: > "joblib.dump" produces a file format with npy extension so I can not open the file with the notepad editor. I can not see the predictions results inside the file. > > Is there another way to save the prediction results in text format? Prediction results are just an array: you could use numpy.savetxt to save them in an ascii text format. -- Roman From aliozcan at gmail.com Tue Aug 22 04:37:36 2017 From: aliozcan at gmail.com (Ali Ozcan) Date: Tue, 22 Aug 2017 10:37:36 +0200 Subject: [scikit-learn] How can i write the birch prediction results to the file In-Reply-To: <433d5f58-4e2f-661e-6e54-bdd388a4c1ce@gmail.com> References: <433d5f58-4e2f-661e-6e54-bdd388a4c1ce@gmail.com> Message-ID: Sema, you can use this import numpy as np np.savetxt('birch_predict.csv', birch_predict, delimiter=',') On Tue, Aug 22, 2017 at 10:33 AM, Roman Yurchak wrote: > Hello Sema, > > On 22/08/17 11:24, Sema Atasever wrote: > > "joblib.dump" produces a file format with npy extension so I can not > open the file with the notepad editor. I can not see the predictions > results inside the file. > >> >> Is there another way to save the prediction results in text format? >> > > Prediction results are just an array: you could use numpy.savetxt to save > them in an ascii text format. > > -- > Roman > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -- Ali Ozcan -------------- next part -------------- An HTML attachment was scrubbed... URL: From s.atasever at gmail.com Tue Aug 22 04:48:59 2017 From: s.atasever at gmail.com (Sema Atasever) Date: Tue, 22 Aug 2017 11:48:59 +0300 Subject: [scikit-learn] How can i write the birch prediction results to the file In-Reply-To: <433d5f58-4e2f-661e-6e54-bdd388a4c1ce@gmail.com> References: <433d5f58-4e2f-661e-6e54-bdd388a4c1ce@gmail.com> Message-ID: Dear Roman and Ali, it did worked thanks for all your help. Regards. On Tue, Aug 22, 2017 at 11:33 AM, Roman Yurchak wrote: > Hello Sema, > > On 22/08/17 11:24, Sema Atasever wrote: > > "joblib.dump" produces a file format with npy extension so I can not > open the file with the notepad editor. I can not see the predictions > results inside the file. > >> >> Is there another way to save the prediction results in text format? >> > > Prediction results are just an array: you could use numpy.savetxt to save > them in an ascii text format. > > -- > Roman > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mcapizzi at email.arizona.edu Tue Aug 22 14:52:04 2017 From: mcapizzi at email.arizona.edu (Michael Capizzi) Date: Tue, 22 Aug 2017 11:52:04 -0700 Subject: [scikit-learn] any interest in incorporating a new Transformer? In-Reply-To: References: Message-ID: Thanks @joel, for the guidance. I will get right on it, and hopefully have something for public consumption soon! 
-M On Sun, Aug 20, 2017 at 5:28 AM, Joel Nothman wrote: > The idea is to take the template (https://github.com/scikit- > learn-contrib/project-template), build, test and document your > estimator(s), and offer it to be housed within scikit-learn-contrib. > > On 20 August 2017 at 08:36, Michael Capizzi > wrote: > >> Thanks @joel - >> >> I wasn?t aware of scikit-learn-contrib. Is this what you?re referring >> to? https://github.com/scikit-learn-contrib/scikit-learn-contrib >> >> If so, I don?t see any existing projects that this would fit into; could >> I start a new one in a pull-request? >> >> -M >> ? >> >> On Sat, Aug 19, 2017 at 2:47 AM, Joel Nothman >> wrote: >> >>> this is the right place to ask, but I'd be more interested to see a >>> scikit-learn-compatible implementation available, perhaps in >>> scikit-learn-contrib more than to see it part of the main package... >>> >>> On 19 Aug 2017 2:13 am, "Michael Capizzi" >>> wrote: >>> >>>> Hi all - >>>> >>>> Forgive me if this is the wrong place for posting this question, but >>>> I'd like to inquire about the community's interest in incorporating a new >>>> Transformer into the code base. >>>> >>>> This paper ( https://nlp.stanford.edu/pubs/sidaw12_simple_sentiment.pdf ) >>>> is a "classic" in Natural Language Processing and is often times used as a >>>> very competitive baseline. TL;DR it transforms a traditional count-based >>>> feature space into the conditional probabilities of a `Naive Bayes` >>>> classifier. These transformed features can then be used to train any >>>> linear classifier. The paper focuses on `SVM`. >>>> >>>> The attached notebook has an example of the custom `Transformer` I >>>> built along with a custom `Classifier` to utilize this `Transformer` in a >>>> `multiclass` case (as the feature space transformation differs depending on >>>> the label). >>>> >>>> If there is interest in the community for the inclusion of this >>>> `Transformer` and `Classifier`, I'd happily go through the official process >>>> of a `pull-request`, etc. >>>> >>>> -Michael >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From s.atasever at gmail.com Wed Aug 23 04:55:21 2017 From: s.atasever at gmail.com (Sema Atasever) Date: Wed, 23 Aug 2017 11:55:21 +0300 Subject: [scikit-learn] Accessing Clustering Feature Tree in Birch Message-ID: Dear scikit-learn members, Considering the "CF-tree" data structure : - How can i *access Clustering Feature Tree* in Birch? - For example, how many clusters are there in the hierarchy under the *root node* and what are the data samples in this cluster? - Can I get them separately for 3 trees? Best. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 
-------------- next part --------------
from sklearn.cluster import Birch
from sklearn.externals import joblib
import numpy as np
import matplotlib.pyplot as plt

X = np.loadtxt(open("C:\dataset.txt", "rb"), delimiter=";")

brc = Birch(branching_factor=50, n_clusters=None, threshold=0.5,
            compute_labels=True, copy=True)
brc.fit(X)

birch_predict = brc.predict(X)

print("\nClustering_result:\n")
print(birch_predict)

np.savetxt('birch_predict_CLASS_0.csv', birch_predict, fmt="%i", delimiter=',')
-------------- next part --------------
[dataset.txt attachment: rows of 22 semicolon-separated feature values;
omitted here for readability]
-------------- next part --------------
A non-text attachment was scrubbed...
Name: CF-tree data structure.jpg
Type: image/jpeg
Size: 54025 bytes
Desc: not available
URL: 

From msuzen at gmail.com  Wed Aug 23 06:44:58 2017
From: msuzen at gmail.com (Suzen, Mehmet)
Date: Wed, 23 Aug 2017 12:44:58 +0200
Subject: [scikit-learn] Accessing Clustering Feature Tree in Birch
In-Reply-To: 
References: 
Message-ID: 

Hi Sema,

You can access the CFNode from the fit output; assign the fit output so
you have the object.

brc_fit = brc.fit(X)
brc_fit_cfnode = brc_fit.root_

Then you can access the CFNode; see here:
https://kite.com/docs/python/sklearn.cluster.birch._CFNode

Also, this example compares Birch with mini-batch KMeans:
http://scikit-learn.org/stable/auto_examples/cluster/plot_birch_vs_minibatchkmeans.html

Hope this was what you are after.

Best,
Mehmet

On 23 August 2017 at 10:55, Sema Atasever wrote:
> Dear scikit-learn members,
>
> Considering the "CF-tree" data structure :
>
> - How can i access Clustering Feature Tree in Birch?
>
> - For example, how many clusters are there in the hierarchy under the root
> node and what are the data samples in this cluster?
>
> - Can I get them separately for 3 trees?
>
> Best.
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
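[Note: a minimal sketch of walking the CF tree discussed above, added for
readers of the archive. It assumes a Birch instance fitted as in the attached
script (called brc here) and relies on the private attributes root_,
subclusters_ and child_ of the CF nodes, which are not a stable public API
and may change between scikit-learn versions.]

from sklearn.cluster import Birch
import numpy as np

X = np.loadtxt(open("C:\dataset.txt", "rb"), delimiter=";")  # as in the attached script
brc = Birch(branching_factor=50, n_clusters=None, threshold=0.5)
brc.fit(X)

def count_subclusters(node, depth=0):
    # Print how many subclusters hang directly below this CF node,
    # then recurse into the child nodes of the internal subclusters.
    print("  " * depth + "depth {}: {} subclusters".format(depth, len(node.subclusters_)))
    for subcluster in node.subclusters_:
        if subcluster.child_ is not None:  # leaf subclusters carry no child node
            count_subclusters(subcluster.child_, depth + 1)

count_subclusters(brc.root_)  # root_ is the top of the CF tree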
From rth.yurchak at gmail.com  Wed Aug 23 07:28:16 2017
From: rth.yurchak at gmail.com (Roman Yurchak)
Date: Wed, 23 Aug 2017 14:28:16 +0300
Subject: [scikit-learn] Accessing Clustering Feature Tree in Birch
In-Reply-To: 
References: 
Message-ID: <433d5f58-4e2f-661e-6e54-bdd388a4c1ce@gmail.com>

> what are the data samples in this cluster

Mehmet's response below works for exploring the hierarchical tree. However,
Birch currently doesn't store the data samples that belong to a given
subcluster. If you need that, as far as I know, a reasonable approximation
can be obtained by computing the data samples that are closest to the
centroid of the considered subcluster (accessible via _CFNode.centroids_),
as compared to all other subcluster centroids at this hierarchical tree
depth.

Alternatively, the modifications in PR
https://github.com/scikit-learn/scikit-learn/pull/8808 aimed to make this
process easier.

--
Roman

On 23/08/17 13:44, Suzen, Mehmet wrote:
> Hi Sema,
>
> You can access the CFNode from the fit output; assign the fit output so
> you have the object.
>
> brc_fit = brc.fit(X)
> brc_fit_cfnode = brc_fit.root_
>
> Then you can access the CFNode; see here:
> https://kite.com/docs/python/sklearn.cluster.birch._CFNode
>
> Also, this example compares Birch with mini-batch KMeans:
> http://scikit-learn.org/stable/auto_examples/cluster/plot_birch_vs_minibatchkmeans.html
>
> Hope this was what you are after.
>
> Best,
> Mehmet
>
> On 23 August 2017 at 10:55, Sema Atasever wrote:
>> Dear scikit-learn members,
>>
>> Considering the "CF-tree" data structure :
>>
>> - How can i access Clustering Feature Tree in Birch?
>>
>> - For example, how many clusters are there in the hierarchy under the root
>> node and what are the data samples in this cluster?
>>
>> - Can I get them separately for 3 trees?
>>
>> Best.
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
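[Note: a rough sketch of the approximation Roman describes above, assigning
every sample to its nearest subcluster centroid at one level of the tree. As
before, brc and X are assumed to come from the attached script, and
centroids_ is a private attribute of the CF nodes, so this is illustrative
rather than a supported API.]

import numpy as np
from sklearn.metrics import pairwise_distances_argmin

# centroids of the subclusters sitting directly under the root (one tree level)
root_centroids = brc.root_.centroids_

# index of the nearest root-level subcluster for every sample in X
nearest = pairwise_distances_argmin(X, root_centroids)

# approximate membership of, for example, the first root-level subcluster
members_of_first = X[nearest == 0]
print(members_of_first.shape)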
From g.lemaitre58 at gmail.com  Thu Aug 24 20:14:08 2017
From: g.lemaitre58 at gmail.com (=?UTF-8?Q?Guillaume_Lema=C3=AEtre?=)
Date: Fri, 25 Aug 2017 02:14:08 +0200
Subject: [scikit-learn] imbalanced-learn 0.3.0 is chasing scikit-learn 0.19.0
Message-ID: 

We are excited to announce the new release of the scikit-learn-contrib
imbalanced-learn, already available through conda and pip (cf. the
installation page https://tinyurl.com/y92flbab for more info)

Notable add-ons are:

* Support of sparse matrices
* Support of multi-class resampling for all methods
* A new BalancedBaggingClassifier using random under-sampling chained with
the scikit-learn BaggingClassifier
* Creation of a didactic user guide
* New API of the ratio parameter to fit the needs of multi-class resampling
* Migration from nosetests to pytest

You can check the full changelog at:
http://contrib.scikit-learn.org/imbalanced-learn/stable/whats_new.html#version-0-3

A big thank you to contributors to use, raise issues, and submit PRs to
imblearn.
--
Guillaume Lemaitre
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From francois.dion at gmail.com  Thu Aug 24 20:31:08 2017
From: francois.dion at gmail.com (Francois Dion)
Date: Thu, 24 Aug 2017 20:31:08 -0400
Subject: [scikit-learn] imbalanced-learn 0.3.0 is chasing scikit-learn 0.19.0
In-Reply-To: 
References: 
Message-ID: <20170825003108.5587025.54506.170474@gmail.com>

An HTML attachment was scrubbed...
URL: 

From ashimb9 at gmail.com  Thu Aug 24 21:48:07 2017
From: ashimb9 at gmail.com (Ashim Bhattarai)
Date: Thu, 24 Aug 2017 20:48:07 -0500
Subject: [scikit-learn] imbalanced-learn 0.3.0 is chasing scikit-learn 0.19.0
In-Reply-To: <20170825003108.5587025.54506.170474@gmail.com>
References: <20170825003108.5587025.54506.170474@gmail.com>
Message-ID: 

Thanks a lot!

On Thu, Aug 24, 2017 at 7:31 PM, Francois Dion wrote:

> Thank you very much. Really useful. By the way, I introduced your module
> at the data intelligence conference at Capital One less than two months
> ago (in the Washington, DC suburbs).
>
> Sent from my BlackBerry 10 Darkphone
> *From: *Guillaume Lemaître
> *Sent: *Thursday, August 24, 2017 20:15
> *To: *Scikit-learn user and developer mailing list
> *Reply To: *Scikit-learn mailing list
> *Subject: *[scikit-learn] imbalanced-learn 0.3.0 is chasing scikit-learn
> 0.19.0
>
> We are excited to announce the new release of the scikit-learn-contrib
> imbalanced-learn, already available through conda and pip (cf. the
> installation page https://tinyurl.com/y92flbab for more info)
>
> Notable add-ons are:
>
> * Support of sparse matrices
> * Support of multi-class resampling for all methods
> * A new BalancedBaggingClassifier using random under-sampling chained with
> the scikit-learn BaggingClassifier
> * Creation of a didactic user guide
> * New API of the ratio parameter to fit the needs of multi-class resampling
> * Migration from nosetests to pytest
>
> You can check the full changelog at:
> http://contrib.scikit-learn.org/imbalanced-learn/stable/whats_new.html#version-0-3
>
> A big thank you to contributors to use, raise issues, and submit PRs to
> imblearn.
> --
> Guillaume Lemaitre
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From joel.nothman at gmail.com  Fri Aug 25 00:13:26 2017
From: joel.nothman at gmail.com (Joel Nothman)
Date: Fri, 25 Aug 2017 14:13:26 +1000
Subject: [scikit-learn] imbalanced-learn 0.3.0 is chasing scikit-learn 0.19.0
In-Reply-To: 
References: 
Message-ID: 

Congratulations Guillaume and the imblearn team!

On 25 August 2017 at 10:14, Guillaume Lemaître wrote:

> We are excited to announce the new release of the scikit-learn-contrib
> imbalanced-learn, already available through conda and pip (cf. the
> installation page https://tinyurl.com/y92flbab for more info)
>
> Notable add-ons are:
>
> * Support of sparse matrices
> * Support of multi-class resampling for all methods
> * A new BalancedBaggingClassifier using random under-sampling chained with
> the scikit-learn BaggingClassifier
> * Creation of a didactic user guide
> * New API of the ratio parameter to fit the needs of multi-class resampling
> * Migration from nosetests to pytest
>
> You can check the full changelog at:
> http://contrib.scikit-learn.org/imbalanced-learn/stable/whats_new.html#version-0-3
>
> A big thank you to contributors to use, raise issues, and submit PRs to
> imblearn.
> --
> Guillaume Lemaitre
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From gael.varoquaux at normalesup.org  Fri Aug 25 01:52:01 2017
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Fri, 25 Aug 2017 07:52:01 +0200
Subject: [scikit-learn] imbalanced-learn 0.3.0 is chasing scikit-learn 0.19.0
In-Reply-To: 
References: 
Message-ID: <20170825055201.GQ153900@phare.normalesup.org>

Indeed, congratulations for the release!

Gaël

On Fri, Aug 25, 2017 at 02:13:26PM +1000, Joel Nothman wrote:
> Congratulations Guillaume and the imblearn team!
> On 25 August 2017 at 10:14, Guillaume Lema?tre wrote: > We are excited to announce the new release of the scikit-learn-contrib > imbalanced-learn, already available through conda and pip (cf. the > installation page https://tinyurl.com/y92flbab for more info) > Notable add-ons are: > * Support of sparse matrices > * Support of multi-class resampling for all methods > * A new BalancedBaggingClassifier using random under-sampling chained with > the scikit-learn BaggingClassifier > * Creation of a didactic user guide > * New API of the ratio parameter to fit the needs of multi-class resampling > * Migration from nosetests to pytest > You can check the full changelog at: > http://contrib.scikit-learn.org/imbalanced-learn/stable/whats_new.html# > version-0-3 > A big thank you to contributors to use, raise issues, and submit PRs to > imblearn. -- Gael Varoquaux Researcher, INRIA Parietal NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France Phone: ++ 33-1-69-08-79-68 http://gael-varoquaux.info http://twitter.com/GaelVaroquaux From bertrand.thirion at inria.fr Fri Aug 25 01:56:28 2017 From: bertrand.thirion at inria.fr (bthirion) Date: Fri, 25 Aug 2017 07:56:28 +0200 Subject: [scikit-learn] imbalanced-learn 0.3.0 is chasing scikit-learn 0.19.0 In-Reply-To: <20170825055201.GQ153900@phare.normalesup.org> References: <20170825055201.GQ153900@phare.normalesup.org> Message-ID: <71818780-5165-7545-3eea-1e398780be9a@inria.fr> +1 B On 25/08/2017 07:52, Gael Varoquaux wrote: > Indeed, congratulations for the release! > > Ga?l > > On Fri, Aug 25, 2017 at 02:13:26PM +1000, Joel Nothman wrote: >> Congratulations Guillaume and the imblearn team! >> On 25 August 2017 at 10:14, Guillaume Lema?tre wrote: >> We are excited to announce the new release of the scikit-learn-contrib >> imbalanced-learn, already available through conda and pip (cf. the >> installation page https://tinyurl.com/y92flbab for more info) >> Notable add-ons are: >> * Support of sparse matrices >> * Support of multi-class resampling for all methods >> * A new BalancedBaggingClassifier using random under-sampling chained with >> the scikit-learn BaggingClassifier >> * Creation of a didactic user guide >> * New API of the ratio parameter to fit the needs of multi-class resampling >> * Migration from nosetests to pytest >> You can check the full changelog at: >> http://contrib.scikit-learn.org/imbalanced-learn/stable/whats_new.html# >> version-0-3 >> A big thank you to contributors to use, raise issues, and submit PRs to >> imblearn. From se.raschka at gmail.com Fri Aug 25 02:18:49 2017 From: se.raschka at gmail.com (Sebastian Raschka) Date: Fri, 25 Aug 2017 02:18:49 -0400 Subject: [scikit-learn] imbalanced-learn 0.3.0 is chasing scikit-learn 0.19.0 In-Reply-To: References: Message-ID: <07BF3B06-5970-42B6-9148-0D604FE5921F@gmail.com> Just read through the summary of the new features and browsed through the user guide. The guide is really well structured and easy to navigate, thanks for putting all the work into it. Overall, thanks for this great contribution and new version :) Best, Sebastian > On Aug 24, 2017, at 8:14 PM, Guillaume Lema?tre wrote: > > We are excited to announce the new release of the scikit-learn-contrib imbalanced-learn, already available through conda and pip (cf. 
the installation page https://tinyurl.com/y92flbab for more info) > > Notable add-ons are: > > * Support of sparse matrices > * Support of multi-class resampling for all methods > * A new BalancedBaggingClassifier using random under-sampling chained with the scikit-learn BaggingClassifier > * Creation of a didactic user guide > * New API of the ratio parameter to fit the needs of multi-class resampling > * Migration from nosetests to pytest > > You can check the full changelog at: > http://contrib.scikit-learn.org/imbalanced-learn/stable/whats_new.html#version-0-3 > > A big thank you to contributors to use, raise issues, and submit PRs to imblearn. > -- > Guillaume Lemaitre > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From jaquesgrobler at gmail.com Fri Aug 25 04:53:33 2017 From: jaquesgrobler at gmail.com (Jaques Grobler) Date: Fri, 25 Aug 2017 10:53:33 +0200 Subject: [scikit-learn] imbalanced-learn 0.3.0 is chasing scikit-learn 0.19.0 In-Reply-To: <07BF3B06-5970-42B6-9148-0D604FE5921F@gmail.com> References: <07BF3B06-5970-42B6-9148-0D604FE5921F@gmail.com> Message-ID: Congrats guys! 2017-08-25 8:18 GMT+02:00 Sebastian Raschka : > Just read through the summary of the new features and browsed through the > user guide. The guide is really well structured and easy to navigate, > thanks for putting all the work into it. Overall, thanks for this great > contribution and new version :) > > Best, > Sebastian > > > On Aug 24, 2017, at 8:14 PM, Guillaume Lema?tre > wrote: > > > > We are excited to announce the new release of the scikit-learn-contrib > imbalanced-learn, already available through conda and pip (cf. the > installation page https://tinyurl.com/y92flbab for more info) > > > > Notable add-ons are: > > > > * Support of sparse matrices > > * Support of multi-class resampling for all methods > > * A new BalancedBaggingClassifier using random under-sampling chained > with the scikit-learn BaggingClassifier > > * Creation of a didactic user guide > > * New API of the ratio parameter to fit the needs of multi-class > resampling > > * Migration from nosetests to pytest > > > > You can check the full changelog at: > > http://contrib.scikit-learn.org/imbalanced-learn/stable/ > whats_new.html#version-0-3 > > > > A big thank you to contributors to use, raise issues, and submit PRs to > imblearn. > > -- > > Guillaume Lemaitre > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jbbrown at kuhp.kyoto-u.ac.jp Fri Aug 25 05:09:37 2017 From: jbbrown at kuhp.kyoto-u.ac.jp (Brown J.B.) Date: Fri, 25 Aug 2017 11:09:37 +0200 Subject: [scikit-learn] imbalanced-learn 0.3.0 is chasing scikit-learn 0.19.0 In-Reply-To: References: <07BF3B06-5970-42B6-9148-0D604FE5921F@gmail.com> Message-ID: In drug discovery, if you are lucky you might get hit compounds 10% of the time. So if you do ML-based drug discovery, your datasets are strongly imbalanced. It seems the imbalanced package would be perfect for this area. J.B. 2017-08-25 10:53 GMT+02:00 Jaques Grobler : > Congrats guys! 
> > 2017-08-25 8:18 GMT+02:00 Sebastian Raschka : > >> Just read through the summary of the new features and browsed through the >> user guide. The guide is really well structured and easy to navigate, >> thanks for putting all the work into it. Overall, thanks for this great >> contribution and new version :) >> >> Best, >> Sebastian >> >> > On Aug 24, 2017, at 8:14 PM, Guillaume Lema?tre >> wrote: >> > >> > We are excited to announce the new release of the scikit-learn-contrib >> imbalanced-learn, already available through conda and pip (cf. the >> installation page https://tinyurl.com/y92flbab for more info) >> > >> > Notable add-ons are: >> > >> > * Support of sparse matrices >> > * Support of multi-class resampling for all methods >> > * A new BalancedBaggingClassifier using random under-sampling chained >> with the scikit-learn BaggingClassifier >> > * Creation of a didactic user guide >> > * New API of the ratio parameter to fit the needs of multi-class >> resampling >> > * Migration from nosetests to pytest >> > >> > You can check the full changelog at: >> > http://contrib.scikit-learn.org/imbalanced-learn/stable/what >> s_new.html#version-0-3 >> > >> > A big thank you to contributors to use, raise issues, and submit PRs to >> imblearn. >> > -- >> > Guillaume Lemaitre >> > _______________________________________________ >> > scikit-learn mailing list >> > scikit-learn at python.org >> > https://mail.python.org/mailman/listinfo/scikit-learn >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mario.michael.krell at gmail.com Fri Aug 25 11:40:35 2017 From: mario.michael.krell at gmail.com (Dr. Mario Michael Krell) Date: Fri, 25 Aug 2017 08:40:35 -0700 Subject: [scikit-learn] L-BFGS in MLPClassifier Message-ID: <05E76DEC-1286-4A87-BA27-B48F786BD89C@gmail.com> To whoever programmed the MLPClassifier (with the L-BFGS solver), I just wanted to personally thank you and if I get your name(s), I would mention it/them in my paper additionally to the mandatory sklearn citation. I hope that sklearn will be keeping this algorithm forever in their library despite the increasing amount of established deep learning libraries that seem to make this code obsolete. For my small scale, more theoretic analysis, it worked much better than any other algorithm and I would not have gotten such surprising results. Due to the high quality implementation, the integration of a much better solver than SGD, and the respective good documentation, I could show empirically how the VC dimension and another property of MLPs (MacKay dimension) actually scale linear with the number of edges in the respective graph which helped us to provide a new much more strict upper bound (https://arxiv.org/abs/1708.06019 ). This would have not been possible with other implementations. If there is an interest by the developers, I could try to contribute a tutorial documentation for sklearn. Just let me know. Thank you a lot!!! Best, Mario -------------- next part -------------- An HTML attachment was scrubbed... 
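A minimal sketch of the usage discussed above, for readers landing on this thread later: the L-BFGS solver is selected through the solver argument of MLPClassifier. The dataset, layer sizes and other parameter values below are made up for illustration and are not taken from the original post.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    # Small synthetic problem; full-batch L-BFGS is practical at this scale
    # and avoids the learning-rate tuning needed by SGD-type solvers.
    X, y = make_classification(n_samples=500, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    clf = MLPClassifier(solver='lbfgs', hidden_layer_sizes=(10,), alpha=1e-4,
                        random_state=0)
    clf.fit(X_train, y_train)
    print(clf.score(X_test, y_test))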
URL: From soneill5045 at gmail.com Fri Aug 25 19:30:52 2017 From: soneill5045 at gmail.com (Stephen O'Neill) Date: Fri, 25 Aug 2017 19:30:52 -0400 Subject: [scikit-learn] different sized inputs in call to custom metric in KNN Message-ID: Hey Gang, I was wondering if anyone might be able to answer a question about the sklearn.neighbors.NearestNeighbors class. For reference, I'm on: anaconda distribution python 2.7.11 sklearn version 0.17.1 I'm subclassing the NearestNeighbors class and using a custom distance metric, something like the following:

    class MyModel(NearestNeighbors):
        def __init__(self, some_info):
            def custom_dist(x, y, info=some_info):
                return numpy.sum(numpy.abs(x - y) / some_info)  # returns a scalar value
            NearestNeighbors.__init__(self, metric=custom_dist)

So I build a dummy dataset based on some Gaussians of shape (5000,3), then later when I call MyModel.fit() I get the error: "ValueError: operands could not be broadcast together with shapes (10,) (3,)" inside of my custom_dist function. Naturally I checked with some simple print x, print y statements inside of custom_dist, and sure enough the shapes of x and y are both (10,), whereas I am expecting them to be of shape (3,) since my dummy data has 3 columns. (Note the actual custom_dist function written above is not what I'm truly using, but it does reproduce the same ValueError). When I change my NearestNeighbors.__init__ call to (self, algorithm='brute') instead of the default algorithm='auto', the x and y values that get passed to my custom_dist are shape (3,) like I would expect. What is different in the distance metric between the algorithm='auto' and algorithm='brute' cases that would transform a 3-dimensional sample into a 10-dimensional sample? Do the KDTree and/or BallTree classes use the distance metric on tree nodes or something too? I wasn't able to figure out where the shape (10,) x and y samples could be coming from. Thanks in advance! Best, Steve O'Neill -------------- next part -------------- An HTML attachment was scrubbed... URL: From mathieu at mblondel.org Fri Aug 25 21:33:06 2017 From: mathieu at mblondel.org (Mathieu Blondel) Date: Sat, 26 Aug 2017 10:33:06 +0900 Subject: [scikit-learn] L-BFGS in MLPClassifier In-Reply-To: <05E76DEC-1286-4A87-BA27-B48F786BD89C@gmail.com> References: <05E76DEC-1286-4A87-BA27-B48F786BD89C@gmail.com> Message-ID: Thanks for this email. It is always nice to hear about success stories. I assume the guilty party is Issam Laradji, as you can see from his Google Summer of Code blog post: http://issamlaradji.blogspot.jp/2014/06/week-3-gsoc-2014-extending-neural.html L-BFGS is indeed usually a good default choice for medium-scale datasets. It doesn't require any step size tuning and I found recently that it works well for poorly conditioned problems. You can also see a blog post by Nicolas Le Roux praising L-BFGS here: http://labs.criteo.com/2014/09/poh-part-3-distributed-optimization/ Mathieu On Sat, Aug 26, 2017 at 12:40 AM, Dr. Mario Michael Krell < mario.michael.krell at gmail.com> wrote: > To whoever programmed the MLPClassifier (with the L-BFGS solver), > > I just wanted to personally thank you and if I get your name(s), I would > mention it/them in my paper additionally to the mandatory sklearn citation. > > I hope that sklearn will be keeping this algorithm forever in their > library despite the increasing amount of established deep learning > libraries that seem to make this code obsolete.
For my small scale, more > theoretic analysis, it worked much better than any other algorithm and I > would not have gotten such surprising results. Due to the high quality > implementation, the integration of a much better solver than SGD, and the > respective good documentation, I could show empirically how the VC > dimension and another property of MLPs (MacKay dimension) actually scale > linear with the number of edges in the respective graph which helped us to > provide a new much more strict upper bound (https://arxiv.org/abs/1708. > 06019). This would have not been possible with other implementations. If > there is an interest by the developers, I could try to contribute a > tutorial documentation for sklearn. Just let me know. > > Thank you a lot!!! > > Best, > > Mario > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From shabieiqbal at gmail.com Sun Aug 27 14:48:42 2017 From: shabieiqbal at gmail.com (Shabie Iqbal) Date: Sun, 27 Aug 2017 20:48:42 +0200 Subject: [scikit-learn] Clarification regarding SGDClassifier Message-ID: <59a31409.c35c1c0a.5c153.9b50@mx.google.com> Dear all, could anyone of you kindly answer this question I posted on Stackoverflow: https://stackoverflow.com/questions/45900330/sklearn-sgdclassifiers-decision-function-odd-behavior Best, Shabie -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdslater at gmail.com Sun Aug 27 18:21:38 2017 From: rdslater at gmail.com (Robert Slater) Date: Sun, 27 Aug 2017 17:21:38 -0500 Subject: [scikit-learn] Clarification regarding SGDClassifier In-Reply-To: <59a31409.c35c1c0a.5c153.9b50@mx.google.com> References: <59a31409.c35c1c0a.5c153.9b50@mx.google.com> Message-ID: I think your use of "X" in X_mod should be "X_train" instead, as X_mod is currently using unshuffled indices while X_check uses the shuffled indices. Virus-free. www.avast.com <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> On Sun, Aug 27, 2017 at 1:48 PM, Shabie Iqbal wrote: > Dear all, > > > > could anyone of you kindly answer this question I posted on Stackoverflow: > > > > https://stackoverflow.com/questions/45900330/sklearn- > sgdclassifiers-decision-function-odd-behavior > > > > Best, > > Shabie > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stuart at stuartreynolds.net Sun Aug 27 18:23:45 2017 From: stuart at stuartreynolds.net (Stuart Reynolds) Date: Sun, 27 Aug 2017 15:23:45 -0700 Subject: [scikit-learn] Decision stubs? Message-ID: Is it possible to efficiently get at the branch statistics that decision tree algorithms iterate over in scikit? For example if the root population has the class counts in the output vector: c0: 5000 c1: 500 Then I'd like to iterate over: # For a boolean (2 valued category) f1=True: c0=3000, c1=450 f1=False: c0=300, c1=30 f1=Null: c0=1700, c1=20 # ? Is considered? # For a continuous value f2<10: c0= ... c1= ... f2>=10: c0= ... c1= ... f2<22: c0= ... c1= ... f2>=22: c0= ... c1= ... I'd like to experiment with building models on-demand for each input row in a predict. To work efficiently, I'd like to reduce the training set to the 'most significant' sub-space(s) using the population statistics. 
I can do it in pandas, although its fairly inefficient to iterate over each feature column many times. Thanks, - Stu From shabieiqbal at gmail.com Sun Aug 27 18:41:01 2017 From: shabieiqbal at gmail.com (Shabie Iqbal) Date: Mon, 28 Aug 2017 00:41:01 +0200 Subject: [scikit-learn] Clarification regarding SGDClassifier In-Reply-To: References: <59a31409.c35c1c0a.5c153.9b50@mx.google.com> Message-ID: <59a34a7d.53e81c0a.de71a.06b0@mx.google.com> Got it? Thanks! Sent from Mail for Windows 10 From: Robert Slater Sent: Monday, August 28, 2017 12:24 AM To: Scikit-learn mailing list Subject: Re: [scikit-learn] Clarification regarding SGDClassifier I think your use of "X" in X_mod should be "X_train" instead, as X_mod is currently using unshuffled indices while X_check uses the shuffled indices. Virus-free. www.avast.com On Sun, Aug 27, 2017 at 1:48 PM, Shabie Iqbal wrote: Dear all, ? could anyone of you kindly answer this question I posted on Stackoverflow: ? https://stackoverflow.com/questions/45900330/sklearn-sgdclassifiers-decision-function-odd-behavior ? Best, Shabie _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From raga.markely at gmail.com Sun Aug 27 23:20:09 2017 From: raga.markely at gmail.com (Raga Markely) Date: Sun, 27 Aug 2017 23:20:09 -0400 Subject: [scikit-learn] Getting weight coefficient of logistic regression from a pipeline Message-ID: Hello, I am wondering if it's possible to get the weight coefficients of logistic regression from a pipeline? For instance, I have the followings: > clf_lr = LogisticRegression(penalty='l1', C=0.1) > pipe_lr = Pipeline([['sc', StandardScaler()], ['clf', clf_lr]]) > pipe_lr.fit(X, y) Does pipe_lr have an attribute that I can call to get the weight coefficient? Or do I have to get it from the classifier as follows? > X_std = StandardScaler().fit_transform(X) > clf_lr = LogisticRegression(penalty='l1', C=0.1) > clf_lr.fit(X_std, y) > clf_lr.coef_ Thank you, Raga -------------- next part -------------- An HTML attachment was scrubbed... URL: From joel.nothman at gmail.com Mon Aug 28 00:01:40 2017 From: joel.nothman at gmail.com (Joel Nothman) Date: Mon, 28 Aug 2017 14:01:40 +1000 Subject: [scikit-learn] Getting weight coefficient of logistic regression from a pipeline In-Reply-To: References: Message-ID: No, we do not have a way to get the coefficients with respect to the input (pre-scaling) space. On 28 August 2017 at 13:20, Raga Markely wrote: > Hello, > > I am wondering if it's possible to get the weight coefficients of logistic > regression from a pipeline? > > For instance, I have the followings: > >> clf_lr = LogisticRegression(penalty='l1', C=0.1) >> pipe_lr = Pipeline([['sc', StandardScaler()], ['clf', clf_lr]]) >> pipe_lr.fit(X, y) > > > Does pipe_lr have an attribute that I can call to get the weight > coefficient? > > Or do I have to get it from the classifier as follows? > >> X_std = StandardScaler().fit_transform(X) >> clf_lr = LogisticRegression(penalty='l1', C=0.1) >> clf_lr.fit(X_std, y) >> clf_lr.coef_ > > > Thank you, > Raga > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... 
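For what it's worth, a rough sketch of how the pre-scaling coefficients can be recovered by hand for this particular StandardScaler + LogisticRegression combination (the dataset below is synthetic and only for illustration): the fitted pipeline computes coef . (x - mean_) / scale_ + intercept, so dividing the learned coefficients by scale_ and adjusting the intercept expresses the same decision function in terms of the original features.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = make_classification(n_samples=200, n_features=5, random_state=0)

    pipe_lr = Pipeline([('sc', StandardScaler()),
                        ('clf', LogisticRegression(penalty='l1', C=0.1,
                                                   solver='liblinear'))])
    pipe_lr.fit(X, y)

    sc = pipe_lr.named_steps['sc']
    clf = pipe_lr.named_steps['clf']

    # Coefficients learned on the standardized features
    coef_std = clf.coef_

    # Same decision function rewritten in terms of the raw, unscaled features
    coef_orig = coef_std / sc.scale_
    intercept_orig = clf.intercept_ - np.dot(coef_orig, sc.mean_)

Note that with an l1 penalty the zeros in coef_ refer to the standardized features; dividing by scale_ keeps the same zeros but changes the non-zero magnitudes.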
URL: From raga.markely at gmail.com Mon Aug 28 00:08:43 2017 From: raga.markely at gmail.com (Raga Markely) Date: Mon, 28 Aug 2017 00:08:43 -0400 Subject: [scikit-learn] Getting weight coefficient of logistic regression from a pipeline In-Reply-To: References: Message-ID: No problem, thank you! Best, Raga On Mon, Aug 28, 2017 at 12:01 AM, Joel Nothman wrote: > No, we do not have a way to get the coefficients with respect to the input > (pre-scaling) space. > > On 28 August 2017 at 13:20, Raga Markely wrote: > >> Hello, >> >> I am wondering if it's possible to get the weight coefficients of >> logistic regression from a pipeline? >> >> For instance, I have the followings: >> >>> clf_lr = LogisticRegression(penalty='l1', C=0.1) >>> pipe_lr = Pipeline([['sc', StandardScaler()], ['clf', clf_lr]]) >>> pipe_lr.fit(X, y) >> >> >> Does pipe_lr have an attribute that I can call to get the weight >> coefficient? >> >> Or do I have to get it from the classifier as follows? >> >>> X_std = StandardScaler().fit_transform(X) >>> clf_lr = LogisticRegression(penalty='l1', C=0.1) >>> clf_lr.fit(X_std, y) >>> clf_lr.coef_ >> >> >> Thank you, >> Raga >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ichkoar at gmail.com Mon Aug 28 08:37:14 2017 From: ichkoar at gmail.com (Christos Aridas) Date: Mon, 28 Aug 2017 15:37:14 +0300 Subject: [scikit-learn] imbalanced-learn 0.3.0 is chasing scikit-learn 0.19.0 In-Reply-To: References: Message-ID: Well done guys! Thanks a lot for this great release! I hope to be back soon. Best, Chris On Fri, Aug 25, 2017 at 3:14 AM, Guillaume Lemaître wrote: > We are excited to announce the new release of the scikit-learn-contrib > imbalanced-learn, already available through conda and pip (cf. the > installation page https://tinyurl.com/y92flbab for more info) > > Notable add-ons are: > > * Support of sparse matrices > * Support of multi-class resampling for all methods > * A new BalancedBaggingClassifier using random under-sampling chained with > the scikit-learn BaggingClassifier > * Creation of a didactic user guide > * New API of the ratio parameter to fit the needs of multi-class resampling > * Migration from nosetests to pytest > > You can check the full changelog at: > http://contrib.scikit-learn.org/imbalanced-learn/stable/ > whats_new.html#version-0-3 > > A big thank you to contributors to use, raise issues, and submit PRs to > imblearn. > -- > Guillaume Lemaitre > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From t3kcit at gmail.com Mon Aug 28 12:01:31 2017 From: t3kcit at gmail.com (Andreas Mueller) Date: Mon, 28 Aug 2017 12:01:31 -0400 Subject: [scikit-learn] Getting weight coefficient of logistic regression from a pipeline In-Reply-To: References: Message-ID: You can get the coefficients on the scaled data with pipe_lr.named_steps['clf'].coef_ though On 08/28/2017 12:08 AM, Raga Markely wrote: > No problem, thank you! 
> > Best, > Raga > > On Mon, Aug 28, 2017 at 12:01 AM, Joel Nothman > wrote: > > No, we do not have a way to get the coefficients with respect to > the input (pre-scaling) space. > > On 28 August 2017 at 13:20, Raga Markely > wrote: > > Hello, > > I am wondering if it's possible to get the weight coefficients > of logistic regression from a pipeline? > > For instance, I have the followings: > > clf_lr = LogisticRegression(penalty='l1', C=0.1) > pipe_lr = Pipeline([['sc', StandardScaler()], ['clf', > clf_lr]]) > pipe_lr.fit(X, y) > > > Does pipe_lr have an attribute that I can call to get the > weight coefficient? > > Or do I have to get it from the classifier as follows? > > X_std = StandardScaler().fit_transform(X) > clf_lr = LogisticRegression(penalty='l1', C=0.1) > clf_lr.fit(X_std, y) > clf_lr.coef_ > > > Thank you, > Raga > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From t3kcit at gmail.com Mon Aug 28 13:16:19 2017 From: t3kcit at gmail.com (Andreas Mueller) Date: Mon, 28 Aug 2017 13:16:19 -0400 Subject: [scikit-learn] scikit-learn-commits mailing list defunct? Message-ID: <465fa212-0fb6-cfe2-2281-416c672429ca@gmail.com> Hey all. Is it just me or is the scikit-learn-commits mailing list no longer working? Given that it's still on sourceforge, that seems somewhat likely. I find the mailing list helpful in case I can't keep track of the issue tracker (i.e. for the last 3 years?). It looks like it was set up here: https://github.com/scikit-learn/scikit-learn/settings/hooks/28838 I propose we transition this to either a google group or another python.org mailing list (they already gave us 2 ;). Cheers, Andy From raga.markely at gmail.com Mon Aug 28 14:12:05 2017 From: raga.markely at gmail.com (Raga Markely) Date: Mon, 28 Aug 2017 14:12:05 -0400 Subject: [scikit-learn] Getting weight coefficient of logistic regression from a pipeline In-Reply-To: References: Message-ID: Thank you, Andreas. When I try > pipe_lr.named_steps['clf'].coef_ I get: > AttributeError: 'LogisticRegression' object has no attribute 'coef_' And when I try: > pipe_lr.named_steps['clf'] I get: > LogisticRegression(C=0.1, class_weight=None, dual=False, > fit_intercept=True, intercept_scaling=1, max_iter=100, multi_class='ovr', > n_jobs=1, penalty='l2', random_state=None, solver='liblinear', tol=0.0001, > verbose=0, warm_start=False) I wonder what I am missing? Thanks, Raga On Mon, Aug 28, 2017 at 12:01 PM, Andreas Mueller wrote: > Can can get the coefficients on the scaled data with > pipeline_lr.named_steps_['clf'].coef_ > though > > > On 08/28/2017 12:08 AM, Raga Markely wrote: > > No problem, thank you! > > Best, > Raga > > On Mon, Aug 28, 2017 at 12:01 AM, Joel Nothman > wrote: > >> No, we do not have a way to get the coefficients with respect to the >> input (pre-scaling) space. >> >> On 28 August 2017 at 13:20, Raga Markely wrote: >> >>> Hello, >>> >>> I am wondering if it's possible to get the weight coefficients of >>> logistic regression from a pipeline? 
>>> >>> For instance, I have the followings: >>> >>>> clf_lr = LogisticRegression(penalty='l1', C=0.1) >>>> pipe_lr = Pipeline([['sc', StandardScaler()], ['clf', clf_lr]]) >>>> pipe_lr.fit(X, y) >>> >>> >>> Does pipe_lr have an attribute that I can call to get the weight >>> coefficient? >>> >>> Or do I have to get it from the classifier as follows? >>> >>>> X_std = StandardScaler().fit_transform(X) >>>> clf_lr = LogisticRegression(penalty='l1', C=0.1) >>>> clf_lr.fit(X_std, y) >>>> clf_lr.coef_ >>> >>> >>> Thank you, >>> Raga >>> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > > _______________________________________________ > scikit-learn mailing listscikit-learn at python.orghttps://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From susan_liu at brown.edu Mon Aug 28 14:17:04 2017 From: susan_liu at brown.edu (Liu, Susan) Date: Mon, 28 Aug 2017 14:17:04 -0400 Subject: [scikit-learn] remoe from list Message-ID: hi there, just wanted to ask if i could be removed from list? Thanks, Susan -------------- next part -------------- An HTML attachment was scrubbed... URL: From alekhka at gmail.com Mon Aug 28 14:19:57 2017 From: alekhka at gmail.com (Alekh Karkada Ashok) Date: Mon, 28 Aug 2017 23:49:57 +0530 Subject: [scikit-learn] remoe from list In-Reply-To: References: Message-ID: Hi Susan, You can visit https://mail.python.org/mailman/listinfo/scikit-learn and unsubscribe from the list there. Thanks, Alekh On Mon, Aug 28, 2017 at 11:47 PM, Liu, Susan wrote: > hi there, > > just wanted to ask if i could be removed from list? > > > Thanks, > Susan > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From t3kcit at gmail.com Mon Aug 28 14:55:08 2017 From: t3kcit at gmail.com (Andreas Mueller) Date: Mon, 28 Aug 2017 14:55:08 -0400 Subject: [scikit-learn] Getting weight coefficient of logistic regression from a pipeline In-Reply-To: References: Message-ID: Have you called "fit" on the pipeline? On 08/28/2017 02:12 PM, Raga Markely wrote: > Thank you, Andreas. > > When I try > > pipe_lr.named_steps['clf'].coef_ > > > I get: > > AttributeError: 'LogisticRegression' object has no attribute 'coef_' > > > And when I try: > > pipe_lr.named_steps['clf'] > > > I get: > > LogisticRegression(C=0.1, class_weight=None, dual=False, > fit_intercept=True, intercept_scaling=1, max_iter=100, > multi_class='ovr', n_jobs=1, penalty='l2', random_state=None, > solver='liblinear', tol=0.0001, verbose=0, warm_start=False) > > > I wonder what I am missing? > > Thanks, > Raga > > > On Mon, Aug 28, 2017 at 12:01 PM, Andreas Mueller > wrote: > > Can can get the coefficients on the scaled data with > pipeline_lr.named_steps_['clf'].coef_ > though > > > On 08/28/2017 12:08 AM, Raga Markely wrote: >> No problem, thank you! 
>> >> Best, >> Raga >> >> On Mon, Aug 28, 2017 at 12:01 AM, Joel Nothman >> > wrote: >> >> No, we do not have a way to get the coefficients with respect >> to the input (pre-scaling) space. >> >> On 28 August 2017 at 13:20, Raga Markely >> > wrote: >> >> Hello, >> >> I am wondering if it's possible to get the weight >> coefficients of logistic regression from a pipeline? >> >> For instance, I have the followings: >> >> clf_lr = LogisticRegression(penalty='l1', C=0.1) >> pipe_lr = Pipeline([['sc', StandardScaler()], ['clf', >> clf_lr]]) >> pipe_lr.fit(X, y) >> >> >> Does pipe_lr have an attribute that I can call to get the >> weight coefficient? >> >> Or do I have to get it from the classifier as follows? >> >> X_std = StandardScaler().fit_transform(X) >> clf_lr = LogisticRegression(penalty='l1', C=0.1) >> clf_lr.fit(X_std, y) >> clf_lr.coef_ >> >> >> Thank you, >> Raga >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> >> >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From raga.markely at gmail.com Mon Aug 28 15:07:44 2017 From: raga.markely at gmail.com (Raga Markely) Date: Mon, 28 Aug 2017 15:07:44 -0400 Subject: [scikit-learn] Getting weight coefficient of logistic regression from a pipeline In-Reply-To: References: Message-ID: Ah.. got it :D.. The pipeline was run in gridsearchcv.. It works now after calling fit.. Thanks! Raga On Mon, Aug 28, 2017 at 2:55 PM, Andreas Mueller wrote: > Have you called "fit" on the pipeline? > > > On 08/28/2017 02:12 PM, Raga Markely wrote: > > Thank you, Andreas. > > When I try > >> pipe_lr.named_steps['clf'].coef_ > > > I get: > >> AttributeError: 'LogisticRegression' object has no attribute 'coef_' > > > And when I try: > >> pipe_lr.named_steps['clf'] > > > I get: > >> LogisticRegression(C=0.1, class_weight=None, dual=False, >> fit_intercept=True, intercept_scaling=1, max_iter=100, multi_class='ovr', >> n_jobs=1, penalty='l2', random_state=None, solver='liblinear', tol=0.0001, >> verbose=0, warm_start=False) > > > I wonder what I am missing? > > Thanks, > Raga > > > On Mon, Aug 28, 2017 at 12:01 PM, Andreas Mueller > wrote: > >> Can can get the coefficients on the scaled data with >> pipeline_lr.named_steps_['clf'].coef_ >> though >> >> >> On 08/28/2017 12:08 AM, Raga Markely wrote: >> >> No problem, thank you! >> >> Best, >> Raga >> >> On Mon, Aug 28, 2017 at 12:01 AM, Joel Nothman >> wrote: >> >>> No, we do not have a way to get the coefficients with respect to the >>> input (pre-scaling) space. >>> >>> On 28 August 2017 at 13:20, Raga Markely wrote: >>> >>>> Hello, >>>> >>>> I am wondering if it's possible to get the weight coefficients of >>>> logistic regression from a pipeline? 
>>>> >>>> For instance, I have the followings: >>>> >>>>> clf_lr = LogisticRegression(penalty='l1', C=0.1) >>>>> pipe_lr = Pipeline([['sc', StandardScaler()], ['clf', clf_lr]]) >>>>> pipe_lr.fit(X, y) >>>> >>>> >>>> Does pipe_lr have an attribute that I can call to get the weight >>>> coefficient? >>>> >>>> Or do I have to get it from the classifier as follows? >>>> >>>>> X_std = StandardScaler().fit_transform(X) >>>>> clf_lr = LogisticRegression(penalty='l1', C=0.1) >>>>> clf_lr.fit(X_std, y) >>>>> clf_lr.coef_ >>>> >>>> >>>> Thank you, >>>> Raga >>>> >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >> >> >> _______________________________________________ >> scikit-learn mailing listscikit-learn at python.orghttps://mail.python.org/mailman/listinfo/scikit-learn >> >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > > _______________________________________________ > scikit-learn mailing listscikit-learn at python.orghttps://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From t3kcit at gmail.com Mon Aug 28 15:20:27 2017 From: t3kcit at gmail.com (Andreas Mueller) Date: Mon, 28 Aug 2017 15:20:27 -0400 Subject: [scikit-learn] Getting weight coefficient of logistic regression from a pipeline In-Reply-To: References: Message-ID: you can also use grid.best_estimator_ (and then all the rest) On 08/28/2017 03:07 PM, Raga Markely wrote: > Ah.. got it :D.. > > The pipeline was run in gridsearchcv.. > > It works now after calling fit.. > > Thanks! > Raga > > On Mon, Aug 28, 2017 at 2:55 PM, Andreas Mueller > wrote: > > Have you called "fit" on the pipeline? > > > On 08/28/2017 02:12 PM, Raga Markely wrote: >> Thank you, Andreas. >> >> When I try >> >> pipe_lr.named_steps['clf'].coef_ >> >> >> I get: >> >> AttributeError: 'LogisticRegression' object has no attribute >> 'coef_' >> >> >> And when I try: >> >> pipe_lr.named_steps['clf'] >> >> >> I get: >> >> LogisticRegression(C=0.1, class_weight=None, dual=False, >> fit_intercept=True, intercept_scaling=1, max_iter=100, >> multi_class='ovr', n_jobs=1, penalty='l2', random_state=None, >> solver='liblinear', tol=0.0001, verbose=0, warm_start=False) >> >> >> I wonder what I am missing? >> >> Thanks, >> Raga >> >> >> On Mon, Aug 28, 2017 at 12:01 PM, Andreas Mueller >> > wrote: >> >> Can can get the coefficients on the scaled data with >> pipeline_lr.named_steps_['clf'].coef_ >> though >> >> >> On 08/28/2017 12:08 AM, Raga Markely wrote: >>> No problem, thank you! >>> >>> Best, >>> Raga >>> >>> On Mon, Aug 28, 2017 at 12:01 AM, Joel Nothman >>> > wrote: >>> >>> No, we do not have a way to get the coefficients with >>> respect to the input (pre-scaling) space. >>> >>> On 28 August 2017 at 13:20, Raga Markely >>> > >>> wrote: >>> >>> Hello, >>> >>> I am wondering if it's possible to get the weight >>> coefficients of logistic regression from a pipeline? 
>>> >>> For instance, I have the followings: >>> >>> clf_lr = LogisticRegression(penalty='l1', C=0.1) >>> pipe_lr = Pipeline([['sc', StandardScaler()], >>> ['clf', clf_lr]]) >>> pipe_lr.fit(X, y) >>> >>> >>> Does pipe_lr have an attribute that I can call to >>> get the weight coefficient? >>> >>> Or do I have to get it from the classifier as follows? >>> >>> X_std = StandardScaler().fit_transform(X) >>> clf_lr = LogisticRegression(penalty='l1', C=0.1) >>> clf_lr.fit(X_std, y) >>> clf_lr.coef_ >>> >>> >>> Thank you, >>> Raga >>> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >>> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >>> >>> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> >> >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From raga.markely at gmail.com Mon Aug 28 16:32:48 2017 From: raga.markely at gmail.com (Raga Markely) Date: Mon, 28 Aug 2017 16:32:48 -0400 Subject: [scikit-learn] Getting weight coefficient of logistic regression from a pipeline In-Reply-To: References: Message-ID: Sounds good.. tried it and works.. thank you! On Mon, Aug 28, 2017 at 3:20 PM, Andreas Mueller wrote: > you can also use grid.best_estimator_ (and then all the rest) > > On 08/28/2017 03:07 PM, Raga Markely wrote: > > Ah.. got it :D.. > > The pipeline was run in gridsearchcv.. > > It works now after calling fit.. > > Thanks! > Raga > > On Mon, Aug 28, 2017 at 2:55 PM, Andreas Mueller wrote: > >> Have you called "fit" on the pipeline? >> >> >> On 08/28/2017 02:12 PM, Raga Markely wrote: >> >> Thank you, Andreas. >> >> When I try >> >>> pipe_lr.named_steps['clf'].coef_ >> >> >> I get: >> >>> AttributeError: 'LogisticRegression' object has no attribute 'coef_' >> >> >> And when I try: >> >>> pipe_lr.named_steps['clf'] >> >> >> I get: >> >>> LogisticRegression(C=0.1, class_weight=None, dual=False, >>> fit_intercept=True, intercept_scaling=1, max_iter=100, multi_class='ovr', >>> n_jobs=1, penalty='l2', random_state=None, solver='liblinear', tol=0.0001, >>> verbose=0, warm_start=False) >> >> >> I wonder what I am missing? >> >> Thanks, >> Raga >> >> >> On Mon, Aug 28, 2017 at 12:01 PM, Andreas Mueller >> wrote: >> >>> Can can get the coefficients on the scaled data with >>> pipeline_lr.named_steps_['clf'].coef_ >>> though >>> >>> >>> On 08/28/2017 12:08 AM, Raga Markely wrote: >>> >>> No problem, thank you! 
>>> >>> Best, >>> Raga >>> >>> On Mon, Aug 28, 2017 at 12:01 AM, Joel Nothman >>> wrote: >>> >>>> No, we do not have a way to get the coefficients with respect to the >>>> input (pre-scaling) space. >>>> >>>> On 28 August 2017 at 13:20, Raga Markely >>>> wrote: >>>> >>>>> Hello, >>>>> >>>>> I am wondering if it's possible to get the weight coefficients of >>>>> logistic regression from a pipeline? >>>>> >>>>> For instance, I have the followings: >>>>> >>>>>> clf_lr = LogisticRegression(penalty='l1', C=0.1) >>>>>> pipe_lr = Pipeline([['sc', StandardScaler()], ['clf', clf_lr]]) >>>>>> pipe_lr.fit(X, y) >>>>> >>>>> >>>>> Does pipe_lr have an attribute that I can call to get the weight >>>>> coefficient? >>>>> >>>>> Or do I have to get it from the classifier as follows? >>>>> >>>>>> X_std = StandardScaler().fit_transform(X) >>>>>> clf_lr = LogisticRegression(penalty='l1', C=0.1) >>>>>> clf_lr.fit(X_std, y) >>>>>> clf_lr.coef_ >>>>> >>>>> >>>>> Thank you, >>>>> Raga >>>>> >>>>> >>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>> >>>>> >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> >>> >>> >>> _______________________________________________ >>> scikit-learn mailing listscikit-learn at python.orghttps://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >> >> >> _______________________________________________ >> scikit-learn mailing listscikit-learn at python.orghttps://mail.python.org/mailman/listinfo/scikit-learn >> >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > > _______________________________________________ > scikit-learn mailing listscikit-learn at python.orghttps://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From olivier.grisel at ensta.org Mon Aug 28 16:42:27 2017 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Mon, 28 Aug 2017 22:42:27 +0200 Subject: [scikit-learn] scikit-learn-commits mailing list defunct? In-Reply-To: <465fa212-0fb6-cfe2-2281-416c672429ca@gmail.com> References: <465fa212-0fb6-cfe2-2281-416c672429ca@gmail.com> Message-ID: +1 ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From olivier.grisel at ensta.org Mon Aug 28 16:42:51 2017 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Mon, 28 Aug 2017 22:42:51 +0200 Subject: [scikit-learn] scikit-learn-commits mailing list defunct? In-Reply-To: References: <465fa212-0fb6-cfe2-2281-416c672429ca@gmail.com> Message-ID: +1 for python.org if they accept this kind of mailing lists. ? -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From joel.nothman at gmail.com Mon Aug 28 20:23:03 2017 From: joel.nothman at gmail.com (Joel Nothman) Date: Tue, 29 Aug 2017 10:23:03 +1000 Subject: [scikit-learn] Getting weight coefficient of logistic regression from a pipeline In-Reply-To: References: Message-ID: Sorry if I misunderstood your question. On 29 August 2017 at 06:32, Raga Markely wrote: > Sounds good.. tried it and works.. thank you! > > On Mon, Aug 28, 2017 at 3:20 PM, Andreas Mueller wrote: > >> you can also use grid.best_estimator_ (and then all the rest) >> >> On 08/28/2017 03:07 PM, Raga Markely wrote: >> >> Ah.. got it :D.. >> >> The pipeline was run in gridsearchcv.. >> >> It works now after calling fit.. >> >> Thanks! >> Raga >> >> On Mon, Aug 28, 2017 at 2:55 PM, Andreas Mueller >> wrote: >> >>> Have you called "fit" on the pipeline? >>> >>> >>> On 08/28/2017 02:12 PM, Raga Markely wrote: >>> >>> Thank you, Andreas. >>> >>> When I try >>> >>>> pipe_lr.named_steps['clf'].coef_ >>> >>> >>> I get: >>> >>>> AttributeError: 'LogisticRegression' object has no attribute 'coef_' >>> >>> >>> And when I try: >>> >>>> pipe_lr.named_steps['clf'] >>> >>> >>> I get: >>> >>>> LogisticRegression(C=0.1, class_weight=None, dual=False, >>>> fit_intercept=True, intercept_scaling=1, max_iter=100, multi_class='ovr', >>>> n_jobs=1, penalty='l2', random_state=None, solver='liblinear', tol=0.0001, >>>> verbose=0, warm_start=False) >>> >>> >>> I wonder what I am missing? >>> >>> Thanks, >>> Raga >>> >>> >>> On Mon, Aug 28, 2017 at 12:01 PM, Andreas Mueller >>> wrote: >>> >>>> Can can get the coefficients on the scaled data with >>>> pipeline_lr.named_steps_['clf'].coef_ >>>> though >>>> >>>> >>>> On 08/28/2017 12:08 AM, Raga Markely wrote: >>>> >>>> No problem, thank you! >>>> >>>> Best, >>>> Raga >>>> >>>> On Mon, Aug 28, 2017 at 12:01 AM, Joel Nothman >>>> wrote: >>>> >>>>> No, we do not have a way to get the coefficients with respect to the >>>>> input (pre-scaling) space. >>>>> >>>>> On 28 August 2017 at 13:20, Raga Markely >>>>> wrote: >>>>> >>>>>> Hello, >>>>>> >>>>>> I am wondering if it's possible to get the weight coefficients of >>>>>> logistic regression from a pipeline? >>>>>> >>>>>> For instance, I have the followings: >>>>>> >>>>>>> clf_lr = LogisticRegression(penalty='l1', C=0.1) >>>>>>> pipe_lr = Pipeline([['sc', StandardScaler()], ['clf', clf_lr]]) >>>>>>> pipe_lr.fit(X, y) >>>>>> >>>>>> >>>>>> Does pipe_lr have an attribute that I can call to get the weight >>>>>> coefficient? >>>>>> >>>>>> Or do I have to get it from the classifier as follows? 
>>>>>> >>>>>>> X_std = StandardScaler().fit_transform(X) >>>>>>> clf_lr = LogisticRegression(penalty='l1', C=0.1) >>>>>>> clf_lr.fit(X_std, y) >>>>>>> clf_lr.coef_ >>>>>> >>>>>> >>>>>> Thank you, >>>>>> Raga >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> scikit-learn mailing list >>>>>> scikit-learn at python.org >>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>> >>>>>> >>>>> >>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>> >>>>> >>>> >>>> >>>> _______________________________________________ >>>> scikit-learn mailing listscikit-learn at python.orghttps://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> >>> >>> >>> _______________________________________________ >>> scikit-learn mailing listscikit-learn at python.orghttps://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >> >> >> _______________________________________________ >> scikit-learn mailing listscikit-learn at python.orghttps://mail.python.org/mailman/listinfo/scikit-learn >> >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From raga.markely at gmail.com Mon Aug 28 21:06:49 2017 From: raga.markely at gmail.com (Raga Markely) Date: Mon, 28 Aug 2017 21:06:49 -0400 Subject: [scikit-learn] Getting weight coefficient of logistic regression from a pipeline In-Reply-To: References: Message-ID: No worries.. ur answer is helpful for me too.. I was actually exploring different ways to get the coeff, what i can and can't get :).. Thanks! On Aug 28, 2017 8:24 PM, "Joel Nothman" wrote: > Sorry if I misunderstood your question. > > On 29 August 2017 at 06:32, Raga Markely wrote: > >> Sounds good.. tried it and works.. thank you! >> >> On Mon, Aug 28, 2017 at 3:20 PM, Andreas Mueller >> wrote: >> >>> you can also use grid.best_estimator_ (and then all the rest) >>> >>> On 08/28/2017 03:07 PM, Raga Markely wrote: >>> >>> Ah.. got it :D.. >>> >>> The pipeline was run in gridsearchcv.. >>> >>> It works now after calling fit.. >>> >>> Thanks! >>> Raga >>> >>> On Mon, Aug 28, 2017 at 2:55 PM, Andreas Mueller >>> wrote: >>> >>>> Have you called "fit" on the pipeline? >>>> >>>> >>>> On 08/28/2017 02:12 PM, Raga Markely wrote: >>>> >>>> Thank you, Andreas. 
>>>> >>>> When I try >>>> >>>>> pipe_lr.named_steps['clf'].coef_ >>>> >>>> >>>> I get: >>>> >>>>> AttributeError: 'LogisticRegression' object has no attribute 'coef_' >>>> >>>> >>>> And when I try: >>>> >>>>> pipe_lr.named_steps['clf'] >>>> >>>> >>>> I get: >>>> >>>>> LogisticRegression(C=0.1, class_weight=None, dual=False, >>>>> fit_intercept=True, intercept_scaling=1, max_iter=100, multi_class='ovr', >>>>> n_jobs=1, penalty='l2', random_state=None, solver='liblinear', tol=0.0001, >>>>> verbose=0, warm_start=False) >>>> >>>> >>>> I wonder what I am missing? >>>> >>>> Thanks, >>>> Raga >>>> >>>> >>>> On Mon, Aug 28, 2017 at 12:01 PM, Andreas Mueller >>>> wrote: >>>> >>>>> Can can get the coefficients on the scaled data with >>>>> pipeline_lr.named_steps_['clf'].coef_ >>>>> though >>>>> >>>>> >>>>> On 08/28/2017 12:08 AM, Raga Markely wrote: >>>>> >>>>> No problem, thank you! >>>>> >>>>> Best, >>>>> Raga >>>>> >>>>> On Mon, Aug 28, 2017 at 12:01 AM, Joel Nothman >>>> > wrote: >>>>> >>>>>> No, we do not have a way to get the coefficients with respect to the >>>>>> input (pre-scaling) space. >>>>>> >>>>>> On 28 August 2017 at 13:20, Raga Markely >>>>>> wrote: >>>>>> >>>>>>> Hello, >>>>>>> >>>>>>> I am wondering if it's possible to get the weight coefficients of >>>>>>> logistic regression from a pipeline? >>>>>>> >>>>>>> For instance, I have the followings: >>>>>>> >>>>>>>> clf_lr = LogisticRegression(penalty='l1', C=0.1) >>>>>>>> pipe_lr = Pipeline([['sc', StandardScaler()], ['clf', clf_lr]]) >>>>>>>> pipe_lr.fit(X, y) >>>>>>> >>>>>>> >>>>>>> Does pipe_lr have an attribute that I can call to get the weight >>>>>>> coefficient? >>>>>>> >>>>>>> Or do I have to get it from the classifier as follows? >>>>>>> >>>>>>>> X_std = StandardScaler().fit_transform(X) >>>>>>>> clf_lr = LogisticRegression(penalty='l1', C=0.1) >>>>>>>> clf_lr.fit(X_std, y) >>>>>>>> clf_lr.coef_ >>>>>>> >>>>>>> >>>>>>> Thank you, >>>>>>> Raga >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> scikit-learn mailing list >>>>>>> scikit-learn at python.org >>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>>> >>>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> scikit-learn mailing list >>>>>> scikit-learn at python.org >>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>> >>>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> scikit-learn mailing listscikit-learn at python.orghttps://mail.python.org/mailman/listinfo/scikit-learn >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>> >>>>> >>>> >>>> >>>> _______________________________________________ >>>> scikit-learn mailing listscikit-learn at python.orghttps://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> >>> >>> >>> _______________________________________________ >>> scikit-learn mailing listscikit-learn at python.orghttps://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >> >> _______________________________________________ >> scikit-learn 
mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jonny.Evans at soton.ac.uk Wed Aug 30 05:50:47 2017 From: Jonny.Evans at soton.ac.uk (Evans J.R.A.) Date: Wed, 30 Aug 2017 09:50:47 +0000 Subject: [scikit-learn] FW: Random Forest Regressor criterion Message-ID: Hi there, I would like to fully understand how the Random Forest Regressor chooses how to split the data at each node. I understand that each tree considers a boostrap sample of the training data, and on each split a random subset of features (using max_features) are considered. But among these features, how does the algorithm work out which is the best split to make? I am using the default criterion 'mse', but don't understand the given explanation "equal to variance reduction as feature selection criterion". Does this mean that for each possible split that could be made, the sum of variances of data in the child nodes is calculated, then the algorithm would use the split with the least sum of variances? Kind regards, Jonny Evans Doctoral Researcher Transportation Research Group Faculty of Engineering and the Environment University of Southampton Email: Jonny.Evans at soton.ac.uk -------------- next part -------------- An HTML attachment was scrubbed... URL: From se.raschka at gmail.com Thu Aug 31 01:47:44 2017 From: se.raschka at gmail.com (Sebastian Raschka) Date: Thu, 31 Aug 2017 01:47:44 -0400 Subject: [scikit-learn] Random Forest Regressor criterion In-Reply-To: References: Message-ID: Hi, regarding MSE minimization vs variance reduction; it's been a few years but I remember that we had a discussion about that, where Gilles Louppe explained that those two are identical when I was confused about the wikipedia equation at https://en.wikipedia.org/wiki/Decision_tree_learning#Variance_reduction (I didn't read carefully and somehow thought that x_i etc was referring to feature columns instead of x being the target variable :P). A better resource: I think Gilles also had a page about that in his thesis but I currently can't find the page. The thesis should be accessible from https://arxiv.org/abs/1407.7502 though, and I would recommend taking a look at "3.6.3 Finding the best binary split" and page 108+ on how it's implemented (if this is still up to date with the current implementation!?). This would probably address all your questions :). Best, Sebastian > On Aug 30, 2017, at 5:50 AM, Evans J.R.A. wrote: > > Hi there, > > I would like to fully understand how the Random Forest Regressor chooses how to split the data at each node. > > I understand that each tree considers a boostrap sample of the training data, and on each split a random subset of features (using max_features) are considered. But among these features, how does the algorithm work out which is the best split to make? I am using the default criterion ?mse?, but don?t understand the given explanation ?equal to variance reduction as feature selection criterion?. Does this mean that for each possible split that could be made, the sum of variances of data in the child nodes is calculated, then the algorithm would use the split with the least sum of variances? 
> > Kind regards, > > Jonny Evans > Doctoral Researcher > Transportation Research Group > Faculty of Engineering and the Environment > University of Southampton > Email: Jonny.Evans at soton.ac.uk > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn