From olivier.grisel at ensta.org Thu Sep 1 04:43:59 2016 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Thu, 1 Sep 2016 10:43:59 +0200 Subject: [scikit-learn] Declaring numpy and scipy dependencies? In-Reply-To: <36f5d0ef-397d-f5bc-c312-19793482fb06@gmail.com> References: <195faf56-d8c6-49e0-7fd7-5bb4f1b22931@gmail.com> <98971054-939E-416C-BA47-AE5AD515E170@sebastianraschka.com> <705a27d4-3643-bc9b-11a8-80ba0f6752bf@gmail.com> <36f5d0ef-397d-f5bc-c312-19793482fb06@gmail.com> Message-ID: I would be +1 to add the dependencies on numpy and scipy to the binary wheels only. We don't have the tools yet, but this could be implemented in the auditwheel tool that is already used to generate the manylinux1-compatible wheels for Linux. -- Olivier From popeye2408 at googlemail.com Thu Sep 1 14:28:26 2016 From: popeye2408 at googlemail.com (Daniel Seeliger) Date: Thu, 1 Sep 2016 20:28:26 +0200 Subject: [scikit-learn] Confidence Estimation for Regressor Predictions Message-ID: <3A554CF0-3DD8-4DC0-ACE2-1E0491D815DE@googlemail.com> Dear all, For classifiers I make use of the predict_proba method to compute a Gini coefficient or entropy to get an estimate of how "sure" the model is about an individual prediction. Is there anything similar I could use for regression models? I guess for a RandomForest I could simply use the individual predictions of each tree in clf.estimators_ and compute a standard deviation, but this is not a generic approach I can use for other regressors like the GradientBoostingRegressor or an SVR.
Thanks a lot for your help, Daniel From Dale.T.Smith at macys.com Thu Sep 1 14:32:09 2016 From: Dale.T.Smith at macys.com (Dale T Smith) Date: Thu, 1 Sep 2016 18:32:09 +0000 Subject: [scikit-learn] Confidence Estimation for Regressor Predictions In-Reply-To: <3A554CF0-3DD8-4DC0-ACE2-1E0491D815DE@googlemail.com> References: <3A554CF0-3DD8-4DC0-ACE2-1E0491D815DE@googlemail.com> Message-ID: There is a scikit-learn-contrib project with confidence intervals for random forests. https://github.com/scikit-learn-contrib/forest-confidence-interval __________________________________________________________________________________________ Dale Smith | Macy's Systems and Technology | IFS eCommerce | Data Science and Capacity Planning | 5985 State Bridge Road, Johns Creek, GA 30097 | dale.t.smith at macys.com _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn
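Daniel's per-tree idea can be sketched directly; the data and sizes below are synthetic and purely illustrative. The spread of the individual tree predictions in estimators_ gives a rough per-sample uncertainty, and the forest-confidence-interval project linked above implements a more principled variant of the same idea.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(0)
X = rng.uniform(size=(200, 3))
y = X[:, 0] + 0.1 * rng.normal(size=200)

forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Each tree in estimators_ gives its own prediction; their spread across
# trees is a rough per-sample uncertainty estimate.
per_tree = np.stack([tree.predict(X[:5]) for tree in forest.estimators_])
point = per_tree.mean(axis=0)   # same as forest.predict(X[:5])
spread = per_tree.std(axis=0)   # larger spread = less certain prediction
```

As Daniel notes, this trick relies on the ensemble structure and does not carry over to a GradientBoostingRegressor (whose trees are sequential corrections, not independent votes) or an SVR.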
From rth.yurchak at gmail.com Thu Sep 1 15:45:01 2016 From: rth.yurchak at gmail.com (Roman Yurchak) Date: Thu, 1 Sep 2016 21:45:01 +0200 Subject: [scikit-learn] Confidence Estimation for Regressor Predictions In-Reply-To: References: <3A554CF0-3DD8-4DC0-ACE2-1E0491D815DE@googlemail.com> Message-ID: <57C8853D.7030109@gmail.com> I'm also interested to know whether there are any projects similar to scikit-learn-contrib/forest-confidence-interval for linear_model or SVM regressors. In the general case, I think you could get a quick first-order approximation of the confidence interval for your regressor by taking the standard deviation of predictions obtained by fitting on different subsets of your data, i.e. cross_validation.cross_val_score().std() with a fixed set of estimator parameters, or some multiple of it (e.g. 2*std). Though this will probably not match the mathematical definition of a confidence interval exactly. -- Roman
From Dale.T.Smith at macys.com Thu Sep 1 15:55:02 2016 From: Dale.T.Smith at macys.com (Dale T Smith) Date: Thu, 1 Sep 2016 19:55:02 +0000 Subject: [scikit-learn] Confidence Estimation for Regressor Predictions In-Reply-To: <57C8853D.7030109@gmail.com> References: <3A554CF0-3DD8-4DC0-ACE2-1E0491D815DE@googlemail.com> <57C8853D.7030109@gmail.com> Message-ID: Confidence intervals for linear models are well known - see any statistics book or look it up on Wikipedia. You should be able to calculate everything you need for a linear model just from the information the estimator provides. Note that the R-squared provided by linear_model appears to be what statisticians call the adjusted R-squared.
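Dale's point about linear models in action: for ordinary least squares the interval follows straight from the textbook formulas, so nothing beyond numpy is needed. A minimal sketch on synthetic data (the 1.96 factor uses the large-sample normal approximation rather than the exact t quantile):

```python
import numpy as np

rng = np.random.RandomState(0)
n = 100
X = np.column_stack([np.ones(n), rng.uniform(size=(n, 2))])  # intercept + 2 features
beta_true = np.array([1.0, 2.0, -1.0])
y = X @ beta_true + 0.5 * rng.normal(size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])      # unbiased noise variance estimate
cov = sigma2 * np.linalg.inv(X.T @ X)          # covariance of the coefficient estimates
se = np.sqrt(np.diag(cov))                     # standard error per coefficient
lower, upper = beta - 1.96 * se, beta + 1.96 * se  # approx. 95% CI per coefficient
```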
__________________________________________________________________________________________ Dale Smith | Macy's Systems and Technology | IFS eCommerce | Data Science and Capacity Planning | 5985 State Bridge Road, Johns Creek, GA 30097 | dale.t.smith at macys.com
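Roman's cross_val_score(...).std() suggestion upthread amounts to a couple of lines; note that it measures the fold-to-fold spread of the model's *score*, not a per-prediction interval, so as he says it is only a loose proxy. The sketch below is on synthetic data and uses the sklearn.model_selection import (at the time of this thread the function lived in sklearn.cross_validation):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
X = rng.uniform(size=(120, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.normal(size=120)

# R^2 score on each of 5 folds, with fixed estimator parameters.
scores = cross_val_score(Ridge(alpha=1.0), X, y, cv=5)
band = 2 * scores.std()   # rough +/- band around the mean score (Roman's 2*std)
```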
From jurafejfar at gmail.com Thu Sep 1 16:00:50 2016 From: jurafejfar at gmail.com (Jiří Fejfar) Date: Thu, 1 Sep 2016 22:00:50 +0200 Subject: [scikit-learn] Confidence Estimation for Regressor Predictions In-Reply-To: <57C8853D.7030109@gmail.com> References: <3A554CF0-3DD8-4DC0-ACE2-1E0491D815DE@googlemail.com> <57C8853D.7030109@gmail.com> Message-ID: Maybe you can also use the bootstrap method published by Efron? See https://web.stanford.edu/~hastie/local.ftp/Springer/OLD/ESLII_print4.pdf It is implemented in the resampling module with a replacement option, if I understand correctly. J.
From rth.yurchak at gmail.com Thu Sep 1 17:13:45 2016 From: rth.yurchak at gmail.com (Roman Yurchak) Date: Thu, 1 Sep 2016 23:13:45 +0200 Subject: [scikit-learn] Confidence Estimation for Regressor Predictions In-Reply-To: References: <3A554CF0-3DD8-4DC0-ACE2-1E0491D815DE@googlemail.com> <57C8853D.7030109@gmail.com> Message-ID: <57C89A09.4090100@gmail.com> Dale, I meant all the methods in sklearn.linear_model. Linear regression is well known, but for, say, ridge regression it does not look that simple: http://stats.stackexchange.com/a/15417 . Thanks for mentioning the bootstrap method! -- Roman
From jeff1evesque at yahoo.com Fri Sep 2 00:19:09 2016 From: jeff1evesque at yahoo.com (Jeffrey Levesque) Date: Fri, 2 Sep 2016 00:19:09 -0400 Subject: [scikit-learn] Confidence Estimation for Regressor Predictions In-Reply-To: <57C89A09.4090100@gmail.com> References: <3A554CF0-3DD8-4DC0-ACE2-1E0491D815DE@googlemail.com> <57C8853D.7030109@gmail.com> <57C89A09.4090100@gmail.com> Message-ID: Hi All, I am also interested in determining a confidence level associated with an SVM or SVR prediction. Is there a nice way to generalize this confidence regardless of the kernel chosen, for the given SVM or SVR implementation? Last year I tried the 'predict_proba' method, which was not very good when implemented generically: - https://github.com/jeff1evesque/machine-learning/issues/1924#issuecomment-159491052 The 'decision_function' performed a little better. But are my examples poor because the sample data is too small for accurate confidence measurements?
Would both 'decision_function' and 'predict_proba' improve if my dataset was much larger, or should I customize the latter methods? Feel free to make any comments on the above GitHub issue. I've spent more time on the web tools of that repository than on understanding the fundamentals of predictions. Forgive me ahead of time. Thank you, Jeff Levesque https://github.com/jeff1evesque
From Dale.T.Smith at macys.com Fri Sep 2 08:21:27 2016 From: Dale.T.Smith at macys.com (Dale T Smith) Date: Fri, 2 Sep 2016 12:21:27 +0000 Subject: [scikit-learn] Confidence Estimation for Regressor Predictions In-Reply-To: <57C89A09.4090100@gmail.com> References: <3A554CF0-3DD8-4DC0-ACE2-1E0491D815DE@googlemail.com> <57C8853D.7030109@gmail.com> <57C89A09.4090100@gmail.com> Message-ID: Roman, Research from the 1970s that's not well known indicates that the bias for t-statistics, for instance, cancels out in the numerator and denominator. I should have written up something showing how to do the relevant statistical diagnostics for ridge regression, but I was laid off from an earlier job. Lasso regression is a very different story.
__________________________________________________________________________________________ Dale Smith | Macy's Systems and Technology | IFS eCommerce | Data Science and Capacity Planning | 5985 State Bridge Road, Johns Creek, GA 30097 | dale.t.smith at macys.com
From Dale.T.Smith at macys.com Fri Sep 2 08:34:03 2016 From: Dale.T.Smith at macys.com (Dale T Smith) Date: Fri, 2 Sep 2016 12:34:03 +0000 Subject: [scikit-learn] Confidence Estimation for Regressor Predictions In-Reply-To: References: <3A554CF0-3DD8-4DC0-ACE2-1E0491D815DE@googlemail.com> <57C8853D.7030109@gmail.com> <57C89A09.4090100@gmail.com> Message-ID: I do not know of any research related to any estimators except linear_model and forests of trees. Knowledge of the underlying distributions is required for confidence intervals, and the jackknife and bootstrap are the most common methods to obtain this information from the data. If anyone knows of these techniques applied more widely in machine learning to measure confidence intervals, please post the references.
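The bootstrap that Jiří and Dale mention can be sketched by refitting on rows resampled with replacement and taking percentiles of the refits' predictions. This is only a rough percentile interval on synthetic data, not the distribution-free conformal procedure from the paper cited in this thread; the estimator and parameter choices below are illustrative.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.uniform(size=(150, 3))
y = X[:, 0] - 2 * X[:, 1] + 0.2 * rng.normal(size=150)
X_new = X[:5]   # points we want an interval for

# Refit on bootstrap resamples (rows drawn with replacement) and collect
# the prediction each refit makes for the new points.
preds = []
for _ in range(200):
    idx = rng.randint(0, len(X), size=len(X))
    model = Ridge(alpha=0.1).fit(X[idx], y[idx])
    preds.append(model.predict(X_new))
preds = np.array(preds)

lower = np.percentile(preds, 2.5, axis=0)    # ~95% percentile interval
upper = np.percentile(preds, 97.5, axis=0)
```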
I think providing these measures in scikit-learn-contrib would give the entire project features other packages don't have. Here's an example of the work done on the StatML side, "Distribution-Free Predictive Inference for Regression" http://www.stat.cmu.edu/~ryantibs/papers/conformal.pdf Note the use of leave-one-covariate-out to estimate variable importance. __________________________________________________________________________________________ Dale Smith | Macy's Systems and Technology | IFS eCommerce | Data Science and Capacity Planning | 5985 State Bridge Road, Johns Creek, GA 30097 | dale.t.smith at macys.com
Thank you, Jeff Levesque https://github.com/jeff1evesque > On Sep 1, 2016, at 5:13 PM, Roman Yurchak wrote: > > Dale, I meant for all the methods in scikit.linear_model. Linear > regression is well known, but say for ridge regression that does not > look that simple http://stats.stackexchange.com/a/15417 . > Thanks for mentioning the bootstrap method! > > -- > Roman > >> On 01/09/16 21:55, Dale T Smith wrote: >> Confidence intervals for linear models are well known - see any statistics book or look it up on Wikipedia. You should be able to calculate everything you need for a linear model just from the information the estimator provides. Note the Rsquared provided by linear_model appears to be what statisticians call the adjusted-Rsquared. >> >> >> ________________________________________________________________________ >> Dale Smith | Macy's Systems and Technology | >> IFS eCommerce | Data Science and Capacity Planning >> | 5985 State Bridge Road, Johns Creek, GA 30097 | >> | dale.t.smith at macys.com >> >> >> -----Original Message----- >> From: scikit-learn >> [mailto:scikit-learn-bounces+dale.t.smith=macys.com at python.org] On >> Behalf Of Roman Yurchak >> Sent: Thursday, September 1, 2016 3:45 PM >> To: Scikit-learn user and developer mailing list >> Subject: Re: [scikit-learn] Confidence Estimation for Regressor >> Predictions >> >> EXT MSG: >> >> I'm also interested to know if there are any projects similar to scikit-learn-contrib/forest-confidence-interval for linear_model or SVM regressors. >> >> In the general case, I think you could get a quick first order approximation of the confidence interval for your regressor, if you take the standard deviation of predictions obtained by fitting different subsets of your data using, >> cross_validation.cross_val_score( ).std() with a fixed set of estimator parameters? Or some multiple of it (e.g. >> 2*std).
Though this will probably not match exactly the mathematical definition of a confidence interval. >> -- >> Roman
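For forests specifically, the per-tree standard deviation Daniel described at the top of the thread can be computed directly from `estimators_`. A minimal sketch on synthetic data — the spread is a dispersion measure across trees, not a calibrated interval:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# One row per tree: shape (n_estimators, n_samples)
per_tree = np.stack([tree.predict(X) for tree in forest.estimators_])
point = per_tree.mean(axis=0)    # the forest's prediction is the mean over trees
spread = per_tree.std(axis=0)    # per-sample spread across trees
```

The scikit-learn-contrib forest-confidence-interval package mentioned earlier refines this naive spread with the jackknife-after-bootstrap correction.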
From drraph at gmail.com Wed Sep 7 08:17:46 2016 From: drraph at gmail.com (Raphael C) Date: Wed, 7 Sep 2016 13:17:46 +0100 Subject: [scikit-learn] How to get the factorization from NMF in scikit learn Message-ID: I am trying to use NMF from scikit learn. Given a matrix A this should give me a factorization into matrices W and H so that WH is approximately equal to A. As a sanity check I tried the following: from sklearn.decomposition import NMF import numpy as np A = np.array([[0,1,0],[1,0,1],[1,1,0]]) nmf = NMF(n_components=3, init='random', random_state=0) print nmf.components_ This gives me a single 3 by 3 matrix as output. What is this representing? I want the two matrices W and H from the factorization. How can I get these two matrices? I am sure I am just missing something simple.
Raphael

From zephyr14 at gmail.com Wed Sep 7 08:32:16 2016 From: zephyr14 at gmail.com (Vlad Niculae) Date: Wed, 7 Sep 2016 08:32:16 -0400 Subject: [scikit-learn] How to get the factorization from NMF in scikit learn In-Reply-To: References: Message-ID: Hi Raphael, The other matrix in the factorization is the output of nmf.transform(A). In your example you forgot to fit the estimator; if you're just interested in the decomposition, the recommended way is to get it in one line with W = nmf.fit_transform(A). While the mathematical description doesn't make it immediately obvious, the scikit-learn API makes a distinction between the two factors W, H based on whether they're in the samples or the features direction. W is a representation of the samples in the learned latent space, shape (n_samples, n_components). Meanwhile, H is a representation of the features, so it's useful to store it *in the transformer* in case more samples arise with the same feature representation (e.g., at test time) and you want to transform them. HTH, Vlad
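Concretely, Vlad's answer applied to the matrix from the original question (`max_iter` is raised here only as a precaution so the solver converges):

```python
import numpy as np
from sklearn.decomposition import NMF

A = np.array([[0, 1, 0], [1, 0, 1], [1, 1, 0]], dtype=float)

nmf = NMF(n_components=3, init='random', random_state=0, max_iter=1000)
W = nmf.fit_transform(A)   # samples in the latent space: (n_samples, n_components)
H = nmf.components_        # features side, stored on the transformer: (n_components, n_features)

print(W.shape, H.shape)            # (3, 3) (3, 3)
print(np.abs(A - W @ H).max())     # reconstruction error of the factorization
```

New rows with the same columns as A can later be projected into the same latent space with `nmf.transform(...)`, which is exactly why H lives on the estimator.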
From piotr.bialecki at hotmail.de Wed Sep 7 14:03:46 2016 From: piotr.bialecki at hotmail.de (Piotr Bialecki) Date: Wed, 7 Sep 2016 18:03:46 +0000 Subject: [scikit-learn] Tuning custom parameters using grid_search Message-ID: Hi all, I am currently tuning some parameters of my xgboost model using scikit's grid_search, e.g.:

param_test1 = {'max_depth': range(3, 10, 2),
               'min_child_weight': range(1, 6, 2)}
gsearch1 = GridSearchCV(estimator = XGBClassifier(learning_rate=0.1, n_estimators=762,
                                                  max_depth=5, min_child_weight=1, gamma=0,
                                                  subsample=0.8, colsample_bytree=0.8,
                                                  objective='binary:logistic', nthread=4,
                                                  scale_pos_weight=1, seed=2809),
                        param_grid=param_test1,
                        scoring='roc_auc',
                        n_jobs=6,
                        iid=False, cv=5)

Before that I preprocessed my dataset X with some different methods. These preprocessing steps have some parameters too, which I would like to tune. I know that it is possible to tune the parameters of the preprocessing steps, if they are part of my pipeline. E.g. if I am using PCA, I could tune the parameter n_components, right? But what if I have some "custom" preprocessing code with some parameters? Is it possible to create a scikit-compatible "object" of my custom code in order to tune the parameters in the pipeline with grid search? Imagine I would like to write a custom method FeatureMultiplier() with a parameter multiplier_value. Is it possible to create a scikit-compatible class out of this method and tune it with grid search? I thought I saw a talk about exactly this topic at some PyData in 2016 or 2015, but unfortunately I cannot find the video of it. Maybe I misunderstood the presentation at that time. Best regards, Piotr
From jmschreiber91 at gmail.com Wed Sep 7 14:11:36 2016 From: jmschreiber91 at gmail.com (Jacob Schreiber) Date: Wed, 7 Sep 2016 14:11:36 -0400 Subject: [scikit-learn] Tuning custom parameters using grid_search In-Reply-To: References: Message-ID: You can use a pipeline object to contain both feature selection/transformation steps and an estimator. All elements of a pipeline can then be tuned using gridsearch. You can see a simple example here: http://scikit-learn.org/stable/modules/pipeline.html You may also be interested in seeing if the FeatureUnion object can serve the same purpose as your FeatureMultiplier.

From mail at sebastianraschka.com Wed Sep 7 14:26:55 2016 From: mail at sebastianraschka.com (Sebastian Raschka) Date: Wed, 7 Sep 2016 14:26:55 -0400 Subject: [scikit-learn] Tuning custom parameters using grid_search In-Reply-To: References: Message-ID: Hi, Piotr,

> These preprocessing steps have some parameters too, which I would like to tune.
> I know that it is possible to tune the parameters of the preprocessing steps,
> if they are part of my pipeline.
> E.g. if I am using PCA, I could tune the parameter n_components, right?
>
> But what if I have some "custom" preprocessing code with some parameters?
> Is it possible to create a scikit-compatible "object" of my custom code in order to tune the
> parameters in the pipeline with grid search?

Yeah, you could use the Pipeline class or the `make_pipeline` function, then you can create a custom estimator using the BaseEstimator class like so:

class CustomEstimator(BaseEstimator):

    def __init__(self, my_param=None):
        pass

    def fit_transform(self, X, y=None):
        return self.fit(X).transform(X)

    def transform(self, X, y=None):
        return X

    def fit(self, X, y=None):
        return self

pipe = make_pipeline(CustomEstimator(), LogisticRegression())
grid = {'customestimator__my_param': [3],
        'logisticregression__C': [0.1, 1.0, 10.0]}

gsearch1 = GridSearchCV(estimator=pipe, param_grid=grid)
gsearch1.fit(X, y)

Then, you can put in your desired preprocessing stuff into fit and transform.
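The skeleton above can be made concrete for the FeatureMultiplier from the question — the class and its multiplier_value parameter are hypothetical names taken from Piotr's message, and the data below is synthetic. The one convention to respect is that __init__ stores each constructor argument under an attribute of the same name, so that get_params/set_params (and hence GridSearchCV) can clone and reconfigure the estimator:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline

class FeatureMultiplier(BaseEstimator, TransformerMixin):
    def __init__(self, multiplier_value=1.0):
        self.multiplier_value = multiplier_value  # same name -> grid-searchable

    def fit(self, X, y=None):
        return self                               # stateless transformer

    def transform(self, X):
        return np.asarray(X) * self.multiplier_value

rng = np.random.RandomState(0)
X = rng.randn(60, 4)
y = (X[:, 0] + 0.1 * rng.randn(60) > 0).astype(int)

pipe = make_pipeline(FeatureMultiplier(), LogisticRegression())
grid = {'featuremultiplier__multiplier_value': [0.5, 1.0, 2.0],
        'logisticregression__C': [0.1, 1.0]}
search = GridSearchCV(pipe, grid, cv=3).fit(X, y)
print(search.best_params_)
```

`make_pipeline` derives the step names (`featuremultiplier`, `logisticregression`) from the lowercased class names, which is where the double-underscore parameter keys come from.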
Best, Sebastian

From mail at sebastianraschka.com Wed Sep 7 14:38:29 2016 From: mail at sebastianraschka.com (Sebastian Raschka) Date: Wed, 7 Sep 2016 14:38:29 -0400 Subject: [scikit-learn] Mailing list "slow"?
Message-ID: <73E2228E-8A05-4942-B8A0-CD8A406BD505@sebastianraschka.com> Hi, all, I noticed that it takes forever now until something is posted on the mailing list after I send it out. Since the switch to Python.org, it takes about ~15-45 min after hitting "send". I've noticed this for months now and was wondering if this is normal or if there's something going on with my particular mailing list account? (Besides the mailing list, my email usually arrives within 1-2 seconds, so it's not a problem with my email client or server in general.) Best, Sebastian

From piotr.bialecki at hotmail.de Wed Sep 7 15:16:43 2016 From: piotr.bialecki at hotmail.de (Piotr Bialecki) Date: Wed, 7 Sep 2016 19:16:43 +0000 Subject: [scikit-learn] Tuning custom parameters using grid_search In-Reply-To: References: Message-ID: Hi Sebastian, thanks a lot. That was exactly what I was looking for! :) I will have a look into the base classes of other preprocessing steps as well. @Jacob Thank you too! :) Greets, Piotr

From olivier.grisel at ensta.org Thu Sep 8 09:01:40 2016 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Thu, 8 Sep 2016 15:01:40 +0200 Subject: [scikit-learn] Mailing list "slow"? In-Reply-To: <73E2228E-8A05-4942-B8A0-CD8A406BD505@sebastianraschka.com> References: <73E2228E-8A05-4942-B8A0-CD8A406BD505@sebastianraschka.com> Message-ID: I have not noticed it myself. Let me try to time this email to check: sent at 3:01pm CEST.

From olivier.grisel at ensta.org Thu Sep 8 09:02:39 2016 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Thu, 8 Sep 2016 15:02:39 +0200 Subject: [scikit-learn] Mailing list "slow"? In-Reply-To: References: <73E2228E-8A05-4942-B8A0-CD8A406BD505@sebastianraschka.com> Message-ID: It's already in the archive: https://mail.python.org/pipermail/scikit-learn/2016-September/000495.html -- Olivier

From gael.varoquaux at normalesup.org Thu Sep 8 09:06:09 2016 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 8 Sep 2016 15:06:09 +0200 Subject: [scikit-learn] Mailing list "slow"? In-Reply-To: References: <73E2228E-8A05-4942-B8A0-CD8A406BD505@sebastianraschka.com> Message-ID: <20160908130609.GJ35579@phare.normalesup.org> I received it.
G On Thu, Sep 08, 2016 at 03:02:39PM +0200, Olivier Grisel wrote: > It's already in the archive: > https://mail.python.org/pipermail/scikit-learn/2016-September/000495.html -- Gael Varoquaux Researcher, INRIA Parietal NeuroSpin/CEA Saclay, Bat 145, 91191 Gif-sur-Yvette France Phone: ++ 33-1-69-08-79-68 http://gael-varoquaux.info http://twitter.com/GaelVaroquaux

From Dale.T.Smith at macys.com Thu Sep 8 09:09:29 2016 From: Dale.T.Smith at macys.com (Dale T Smith) Date: Thu, 8 Sep 2016 13:09:29 +0000 Subject: [scikit-learn] Mailing list "slow"? In-Reply-To: <20160908130609.GJ35579@phare.normalesup.org> References: <73E2228E-8A05-4942-B8A0-CD8A406BD505@sebastianraschka.com> <20160908130609.GJ35579@phare.normalesup.org> Message-ID: Likewise here in the U.S. - Atlanta, GA. __________________________________________________________________________________________ Dale Smith | Macy's Systems and Technology | IFS eCommerce | Data Science and Capacity Planning | 5985 State Bridge Road, Johns Creek, GA 30097 | dale.t.smith at macys.com -----Original Message----- From: scikit-learn [mailto:scikit-learn-bounces+dale.t.smith=macys.com at python.org] On Behalf Of Gael Varoquaux Sent: Thursday, September 8, 2016 9:06 AM To: Scikit-learn user and developer mailing list Subject: Re: [scikit-learn] Mailing list "slow"? EXT MSG: I received it.
From se.raschka at gmail.com Thu Sep 8 09:30:33 2016 From: se.raschka at gmail.com (Sebastian Raschka) Date: Thu, 8 Sep 2016 09:30:33 -0400 Subject: [scikit-learn] Mailing list "slow"? In-Reply-To: References: <73E2228E-8A05-4942-B8A0-CD8A406BD505@sebastianraschka.com> <20160908130609.GJ35579@phare.normalesup.org> Message-ID: <1C96E1A9-E9FC-4112-B889-A7B2AD9D3D25@gmail.com> Thanks! So it must be something on my side (or sth. weird with this email account in combination with the Python mailing list). Sorry for spamming, but let me try using my gmail account and send 2 mails simultaneously (I will later delete one of the two). 9:30:30 AM EDT (from gmail)

From mail at sebastianraschka.com Thu Sep 8 09:29:52 2016 From: mail at sebastianraschka.com (Sebastian Raschka) Date: Thu, 8 Sep 2016 09:29:52 -0400 Subject: [scikit-learn] Mailing list "slow"? In-Reply-To: References: <73E2228E-8A05-4942-B8A0-CD8A406BD505@sebastianraschka.com> <20160908130609.GJ35579@phare.normalesup.org> Message-ID: Thanks! So it must be something on my side (or sth. weird with this email account in combination with the Python mailing list). Sorry for spamming, but let me try using my gmail account and send 2 mails simultaneously (I will later delete one of the two). 9:29:50 AM EDT

From se.raschka at gmail.com Thu Sep 8 09:48:22 2016 From: se.raschka at gmail.com (Sebastian Raschka) Date: Thu, 8 Sep 2016 09:48:22 -0400 Subject: [scikit-learn] Mailing list "slow"? In-Reply-To: References: <73E2228E-8A05-4942-B8A0-CD8A406BD505@sebastianraschka.com> <20160908130609.GJ35579@phare.normalesup.org> Message-ID: <017A1D1D-3489-4A9F-8E39-186527253F61@gmail.com> Okay, it's my @sebastianraschka.com domain then: it took ~15 minutes this time (gmail ~ 1 min). Maybe the former is going through a more rigorous filtering on the mailserver since it is an unknown domain name or so. In any case, I will use my gmail address on the mailing list then, sorry for the bother :P

From klonuo at gmail.com Thu Sep 8 14:40:26 2016 From: klonuo at gmail.com (klo uo) Date: Thu, 8 Sep 2016 20:40:26 +0200 Subject: [scikit-learn] Fwd: Loading file in libsvm format In-Reply-To: References: Message-ID: ---------- Forwarded message ---------- From: klo uo Date: Thu, Sep 8, 2016 at 8:25 PM Subject: Loading file in libsvm format To: scikit-learn-general at lists.sourceforge.net Hi, I produced a file in libsvm format: