From t3kcit at gmail.com Wed Mar 1 09:49:36 2017 From: t3kcit at gmail.com (Andreas Mueller) Date: Wed, 1 Mar 2017 09:49:36 -0500 Subject: [scikit-learn] Women in Machine Learning and Data Science Sprint next Weekend (also call for help) In-Reply-To: References: <314038f1-d325-1e0d-8399-aea6a4a47d95@gmail.com> Message-ID: <8ba7aadc-6d3c-a73b-e3f9-e0705d2c5e3d@gmail.com> Yes, on gitter: https://gitter.im/scikit-learn/wimlds On 02/28/2017 11:07 PM, Jacob Schreiber wrote: > Okay. I will be there. Is there going to be a chat channel of some > sort to organize things? > > On Tue, Feb 28, 2017 at 4:28 PM, Andreas Mueller > wrote: > > Thanks! > It's gonna be 9:30 till 4, but I'd be surprised if there's a lot > going on on the issue tracker before 11h with setup etc. > (EST that is). > > Andy > > > On 02/27/2017 11:58 PM, Jacob Schreiber wrote: >> I will try to carve out some time Saturday to review PRs. What >> time is it occuring? >> >> On Mon, Feb 27, 2017 at 8:50 PM, Andreas Mueller >> > wrote: >> >> Hey all. >> >> There's gonna be an introductory scikit-learn sprint at NYC >> on Saturday that a local Women's DS/ML group is organizing >> with me. >> I feel like we could do a bit more to improve (gender) >> diversity in the scipy/pydata space, and so I think this will >> be cool. >> >> If anyone wants to review code on Saturday that would be a >> great help for people getting started. >> Also, if anyone wants to help beforehand, making sure there >> is enough "easy" and "need contributor" issues tagged >> is important, as well as ensuring that all the tagged issues >> actually still need contributors. >> >> I'll try to do as much of these as I can but my time is >> limited these days :( >> >> Thanks y'all! >> >> Andy >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> >> >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ scikit-learn > mailing list scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From raga.markely at gmail.com Wed Mar 1 15:07:05 2017 From: raga.markely at gmail.com (Raga Markely) Date: Wed, 1 Mar 2017 15:07:05 -0500 Subject: [scikit-learn] Confidence and Prediction Intervals of Support Vector Regression Message-ID: Hi everyone, I wonder if you could provide me with some suggestions on how to determine the confidence and prediction intervals of SVR? If you have suggestions for any machine learning algorithms in general, that would be fine too (doesn't have to be specific for SVR). So far, I have found: 1. Bootstrap: http://stats.stackexchange.com/questions/183230/bootstrapping-confidence-interval-from-a-regression-prediction 2. http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0048723&type=printable 3. ftp://ftp.esat.kuleuven.ac.be/sista/suykens/reports/10_156_v0.pdf But, I don't fully understand the details in #2 and #3 to the point that I can write a step by step code. If I use bootstrap method, I can get the confidence interval as follows? a. Draw bootstrap sample of size n b. 
Fit the SVR model (with hyperparameters chosen during model selection with grid search cv) to this bootstrap sample c. Use this model to predict the output variable y* from input variable X* d. Repeat step a-c for, for instance, 100 times e. Order the 100 values of y*, and determine, for instance, the 10th percentile and 90th percentile (if we are looking for 0.8 confidence interval) f. Repeat a-e for different values of X* to plot the prediction with confidence interval But, I don't know how to get the prediction interval from here. Thank you very much, Raga -------------- next part -------------- An HTML attachment was scrubbed... URL: From se.raschka at gmail.com Wed Mar 1 15:13:41 2017 From: se.raschka at gmail.com (Sebastian Raschka) Date: Wed, 1 Mar 2017 15:13:41 -0500 Subject: [scikit-learn] Confidence and Prediction Intervals of Support Vector Regression In-Reply-To: References: Message-ID: Hi, Raga, I have a short section on this here (https://sebastianraschka.com/blog/2016/model-evaluation-selection-part2.html#the-bootstrap-method-and-empirical-confidence-intervals) if it helps. Best, Sebastian > On Mar 1, 2017, at 3:07 PM, Raga Markely wrote: > > Hi everyone, > > I wonder if you could provide me with some suggestions on how to determine the confidence and prediction intervals of SVR? If you have suggestions for any machine learning algorithms in general, that would be fine too (doesn't have to be specific for SVR). > > So far, I have found: > 1. Bootstrap: http://stats.stackexchange.com/questions/183230/bootstrapping-confidence-interval-from-a-regression-prediction > 2. http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0048723&type=printable > 3. ftp://ftp.esat.kuleuven.ac.be/sista/suykens/reports/10_156_v0.pdf > > But, I don't fully understand the details in #2 and #3 to the point that I can write a step by step code. If I use bootstrap method, I can get the confidence interval as follows? > a. Draw bootstrap sample of size n > b. Fit the SVR model (with hyperparameters chosen during model selection with grid search cv) to this bootstrap sample > c. Use this model to predict the output variable y* from input variable X* > d. Repeat step a-c for, for instance, 100 times > e. Order the 100 values of y*, and determine, for instance, the 10th percentile and 90th percentile (if we are looking for 0.8 confidence interval) > f. Repeat a-e for different values of X* to plot the prediction with confidence interval > > But, I don't know how to get the prediction interval from here. > > Thank you very much, > Raga > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From raga.markely at gmail.com Wed Mar 1 17:39:52 2017 From: raga.markely at gmail.com (Raga Markely) Date: Wed, 1 Mar 2017 17:39:52 -0500 Subject: [scikit-learn] Confidence and Prediction Intervals of Support Vector Regression In-Reply-To: References: Message-ID: Thanks a lot, Sebastian! Very nicely written. I have a few follow-up questions: 1. Just to make sure I understand correctly, using the .632+ bootstrap method, the ACC_lower and ACC_upper are the lower and higher percentile of the ACC_h,i distribution? 2. For regression algorithms, is there a recommended equation for the no-information rate gamma? 3. 
I need to plot the confidence interval and prediction interval for my Support Vector Regression prediction (just to clarify these intervals, please see an analogy from linear model on slide 14: http://www2.stat.duke.edu/~tjl13/s101/slides/unit6lec3H.pdf) - can I derive the intervals from .632+ bootstrap method or is there a different way of getting these intervals? Thank you! Raga On Wed, Mar 1, 2017 at 3:13 PM, Sebastian Raschka wrote: > Hi, Raga, > I have a short section on this here (https://sebastianraschka.com/ > blog/2016/model-evaluation-selection-part2.html#the-bootstrap-method-and- > empirical-confidence-intervals) if it helps. > > Best, > Sebastian > > > On Mar 1, 2017, at 3:07 PM, Raga Markely wrote: > > > > Hi everyone, > > > > I wonder if you could provide me with some suggestions on how to > determine the confidence and prediction intervals of SVR? If you have > suggestions for any machine learning algorithms in general, that would be > fine too (doesn't have to be specific for SVR). > > > > So far, I have found: > > 1. Bootstrap: http://stats.stackexchange.com/questions/183230/ > bootstrapping-confidence-interval-from-a-regression-prediction > > 2. http://journals.plos.org/plosone/article/file?id=10. > 1371/journal.pone.0048723&type=printable > > 3. ftp://ftp.esat.kuleuven.ac.be/sista/suykens/reports/10_156_v0.pdf > > > > But, I don't fully understand the details in #2 and #3 to the point that > I can write a step by step code. If I use bootstrap method, I can get the > confidence interval as follows? > > a. Draw bootstrap sample of size n > > b. Fit the SVR model (with hyperparameters chosen during model selection > with grid search cv) to this bootstrap sample > > c. Use this model to predict the output variable y* from input variable > X* > > d. Repeat step a-c for, for instance, 100 times > > e. Order the 100 values of y*, and determine, for instance, the 10th > percentile and 90th percentile (if we are looking for 0.8 confidence > interval) > > f. Repeat a-e for different values of X* to plot the prediction with > confidence interval > > > > But, I don't know how to get the prediction interval from here. > > > > Thank you very much, > > Raga > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mail at sebastianraschka.com Wed Mar 1 21:44:13 2017 From: mail at sebastianraschka.com (Sebastian Raschka) Date: Wed, 1 Mar 2017 21:44:13 -0500 Subject: [scikit-learn] Confidence and Prediction Intervals of Support Vector Regression In-Reply-To: References: Message-ID: <7B05F1AE-FCE4-413E-B96A-773EEF2D7947@sebastianraschka.com> Hi, Raga, > 1. Just to make sure I understand correctly, using the .632+ bootstrap method, the ACC_lower and ACC_upper are the lower and higher percentile of the ACC_h,i distribution? phew, I am actually not sure anymore ? I think it?s the percentile of the ACC_boot distribution, similar to the ?classic? bootstrap but where ACC_boot got computed from weighted ACC_h,i and ACC_r,i > 2. For regression algorithms, is there a recommended equation for the no-information rate gamma? 
Sorry, can?t be of much help here; I am not sure what the equivalent of the no-information rate for regression would be ... > On Mar 1, 2017, at 5:39 PM, Raga Markely wrote: > > Thanks a lot, Sebastian! Very nicely written. > > I have a few follow-up questions: > 1. Just to make sure I understand correctly, using the .632+ bootstrap method, the ACC_lower and ACC_upper are the lower and higher percentile of the ACC_h,i distribution? > 2. For regression algorithms, is there a recommended equation for the no-information rate gamma? > 3. I need to plot the confidence interval and prediction interval for my Support Vector Regression prediction (just to clarify these intervals, please see an analogy from linear model on slide 14: http://www2.stat.duke.edu/~tjl13/s101/slides/unit6lec3H.pdf) - can I derive the intervals from .632+ bootstrap method or is there a different way of getting these intervals? > > Thank you! > Raga > > > On Wed, Mar 1, 2017 at 3:13 PM, Sebastian Raschka wrote: > Hi, Raga, > I have a short section on this here (https://sebastianraschka.com/blog/2016/model-evaluation-selection-part2.html#the-bootstrap-method-and-empirical-confidence-intervals) if it helps. > > Best, > Sebastian > > > On Mar 1, 2017, at 3:07 PM, Raga Markely wrote: > > > > Hi everyone, > > > > I wonder if you could provide me with some suggestions on how to determine the confidence and prediction intervals of SVR? If you have suggestions for any machine learning algorithms in general, that would be fine too (doesn't have to be specific for SVR). > > > > So far, I have found: > > 1. Bootstrap: http://stats.stackexchange.com/questions/183230/bootstrapping-confidence-interval-from-a-regression-prediction > > 2. http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0048723&type=printable > > 3. ftp://ftp.esat.kuleuven.ac.be/sista/suykens/reports/10_156_v0.pdf > > > > But, I don't fully understand the details in #2 and #3 to the point that I can write a step by step code. If I use bootstrap method, I can get the confidence interval as follows? > > a. Draw bootstrap sample of size n > > b. Fit the SVR model (with hyperparameters chosen during model selection with grid search cv) to this bootstrap sample > > c. Use this model to predict the output variable y* from input variable X* > > d. Repeat step a-c for, for instance, 100 times > > e. Order the 100 values of y*, and determine, for instance, the 10th percentile and 90th percentile (if we are looking for 0.8 confidence interval) > > f. Repeat a-e for different values of X* to plot the prediction with confidence interval > > > > But, I don't know how to get the prediction interval from here. > > > > Thank you very much, > > Raga > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From se.raschka at gmail.com Wed Mar 1 21:46:51 2017 From: se.raschka at gmail.com (Sebastian Raschka) Date: Wed, 1 Mar 2017 21:46:51 -0500 Subject: [scikit-learn] Confidence and Prediction Intervals of Support Vector Regression In-Reply-To: References: Message-ID: <1D9AA4D5-7429-4CE6-BF91-B6B303AC3530@gmail.com> Hi, Raga, > 1. 
Just to make sure I understand correctly, using the .632+ bootstrap method, the ACC_lower and ACC_upper are the lower and higher percentile of the ACC_h,i distribution? phew, I am actually not sure anymore ? I think it?s the percentile of the ACC_boot distribution, similar to the ?classic? bootstrap but where ACC_boot got computed from weighted ACC_h,i and ACC_r,i > 2. For regression algorithms, is there a recommended equation for the no-information rate gamma? Sorry, can?t be of much help here; I am not sure what the equivalent of the no-information rate for regression would be ... > On Mar 1, 2017, at 5:39 PM, Raga Markely wrote: > > Thanks a lot, Sebastian! Very nicely written. > > I have a few follow-up questions: > 1. Just to make sure I understand correctly, using the .632+ bootstrap method, the ACC_lower and ACC_upper are the lower and higher percentile of the ACC_h,i distribution? > 2. For regression algorithms, is there a recommended equation for the no-information rate gamma? > 3. I need to plot the confidence interval and prediction interval for my Support Vector Regression prediction (just to clarify these intervals, please see an analogy from linear model on slide 14: http://www2.stat.duke.edu/~tjl13/s101/slides/unit6lec3H.pdf) - can I derive the intervals from .632+ bootstrap method or is there a different way of getting these intervals? > > Thank you! > Raga > > > On Wed, Mar 1, 2017 at 3:13 PM, Sebastian Raschka wrote: > Hi, Raga, > I have a short section on this here (https://sebastianraschka.com/blog/2016/model-evaluation-selection-part2.html#the-bootstrap-method-and-empirical-confidence-intervals) if it helps. > > Best, > Sebastian > > > On Mar 1, 2017, at 3:07 PM, Raga Markely wrote: > > > > Hi everyone, > > > > I wonder if you could provide me with some suggestions on how to determine the confidence and prediction intervals of SVR? If you have suggestions for any machine learning algorithms in general, that would be fine too (doesn't have to be specific for SVR). > > > > So far, I have found: > > 1. Bootstrap: http://stats.stackexchange.com/questions/183230/bootstrapping-confidence-interval-from-a-regression-prediction > > 2. http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0048723&type=printable > > 3. ftp://ftp.esat.kuleuven.ac.be/sista/suykens/reports/10_156_v0.pdf > > > > But, I don't fully understand the details in #2 and #3 to the point that I can write a step by step code. If I use bootstrap method, I can get the confidence interval as follows? > > a. Draw bootstrap sample of size n > > b. Fit the SVR model (with hyperparameters chosen during model selection with grid search cv) to this bootstrap sample > > c. Use this model to predict the output variable y* from input variable X* > > d. Repeat step a-c for, for instance, 100 times > > e. Order the 100 values of y*, and determine, for instance, the 10th percentile and 90th percentile (if we are looking for 0.8 confidence interval) > > f. Repeat a-e for different values of X* to plot the prediction with confidence interval > > > > But, I don't know how to get the prediction interval from here. 
> > > > Thank you very much, > > Raga > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From raga.markely at gmail.com Wed Mar 1 22:07:35 2017 From: raga.markely at gmail.com (Raga Markely) Date: Wed, 1 Mar 2017 22:07:35 -0500 Subject: [scikit-learn] Confidence and Prediction Intervals of Support Vector Regression In-Reply-To: <7B05F1AE-FCE4-413E-B96A-773EEF2D7947@sebastianraschka.com> References: <7B05F1AE-FCE4-413E-B96A-773EEF2D7947@sebastianraschka.com> Message-ID: No worries, Sebastian :) .. thank you very much for your help.. I learned a lot of new things from your site today.. it led me to some relevant chapters in "The Elements of Statistical Learning", which then led me to chapter 8 page 264 about non-parametric & parametric bootstrap.. I think I will just go with the non-parametric bootstrap for my problem.. similar to the bootstrap steps i mentioned earlier.. Thank you! Raga On Wed, Mar 1, 2017 at 9:44 PM, Sebastian Raschka wrote: > Hi, Raga, > > > 1. Just to make sure I understand correctly, using the .632+ bootstrap > method, the ACC_lower and ACC_upper are the lower and higher percentile of > the ACC_h,i distribution? > > phew, I am actually not sure anymore ? I think it?s the percentile of the > ACC_boot distribution, similar to the ?classic? bootstrap but where > ACC_boot got computed from weighted ACC_h,i and ACC_r,i > > > 2. For regression algorithms, is there a recommended equation for the > no-information rate gamma? > > > Sorry, can?t be of much help here; I am not sure what the equivalent of > the no-information rate for regression would be ... > > > > > On Mar 1, 2017, at 5:39 PM, Raga Markely wrote: > > > > Thanks a lot, Sebastian! Very nicely written. > > > > I have a few follow-up questions: > > 1. Just to make sure I understand correctly, using the .632+ bootstrap > method, the ACC_lower and ACC_upper are the lower and higher percentile of > the ACC_h,i distribution? > > 2. For regression algorithms, is there a recommended equation for the > no-information rate gamma? > > 3. I need to plot the confidence interval and prediction interval for my > Support Vector Regression prediction (just to clarify these intervals, > please see an analogy from linear model on slide 14: > http://www2.stat.duke.edu/~tjl13/s101/slides/unit6lec3H.pdf) - can I > derive the intervals from .632+ bootstrap method or is there a different > way of getting these intervals? > > > > Thank you! > > Raga > > > > > > On Wed, Mar 1, 2017 at 3:13 PM, Sebastian Raschka > wrote: > > Hi, Raga, > > I have a short section on this here (https://sebastianraschka.com/ > blog/2016/model-evaluation-selection-part2.html#the-bootstrap-method-and- > empirical-confidence-intervals) if it helps. > > > > Best, > > Sebastian > > > > > On Mar 1, 2017, at 3:07 PM, Raga Markely > wrote: > > > > > > Hi everyone, > > > > > > I wonder if you could provide me with some suggestions on how to > determine the confidence and prediction intervals of SVR? 
If you have > suggestions for any machine learning algorithms in general, that would be > fine too (doesn't have to be specific for SVR). > > > > > > So far, I have found: > > > 1. Bootstrap: http://stats.stackexchange.com/questions/183230/ > bootstrapping-confidence-interval-from-a-regression-prediction > > > 2. http://journals.plos.org/plosone/article/file?id=10. > 1371/journal.pone.0048723&type=printable > > > 3. ftp://ftp.esat.kuleuven.ac.be/sista/suykens/reports/10_156_v0.pdf > > > > > > But, I don't fully understand the details in #2 and #3 to the point > that I can write a step by step code. If I use bootstrap method, I can get > the confidence interval as follows? > > > a. Draw bootstrap sample of size n > > > b. Fit the SVR model (with hyperparameters chosen during model > selection with grid search cv) to this bootstrap sample > > > c. Use this model to predict the output variable y* from input > variable X* > > > d. Repeat step a-c for, for instance, 100 times > > > e. Order the 100 values of y*, and determine, for instance, the 10th > percentile and 90th percentile (if we are looking for 0.8 confidence > interval) > > > f. Repeat a-e for different values of X* to plot the prediction with > confidence interval > > > > > > But, I don't know how to get the prediction interval from here. > > > > > > Thank you very much, > > > Raga > > > _______________________________________________ > > > scikit-learn mailing list > > > scikit-learn at python.org > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From se.raschka at gmail.com Wed Mar 1 22:13:02 2017 From: se.raschka at gmail.com (Sebastian Raschka) Date: Wed, 1 Mar 2017 22:13:02 -0500 Subject: [scikit-learn] Confidence and Prediction Intervals of Support Vector Regression In-Reply-To: References: <7B05F1AE-FCE4-413E-B96A-773EEF2D7947@sebastianraschka.com> Message-ID: <34DD2CC4-D6E1-4FFD-B53E-095EF1DAB7B8@gmail.com> Glad to hear that it was at least a little bit helpful :) (haha, Efron and Tibshirani even have a whole ~500 pg book on bootstrap if you have the time and patience ? :) https://www.crcpress.com/An-Introduction-to-the-Bootstrap/Efron-Tibshirani/p/book/9780412042317) > On Mar 1, 2017, at 10:07 PM, Raga Markely wrote: > > No worries, Sebastian :) .. thank you very much for your help.. I learned a lot of new things from your site today.. it led me to some relevant chapters in "The Elements of Statistical Learning", which then led me to chapter 8 page 264 about non-parametric & parametric bootstrap.. > > I think I will just go with the non-parametric bootstrap for my problem.. similar to the bootstrap steps i mentioned earlier.. > > Thank you! > Raga > > On Wed, Mar 1, 2017 at 9:44 PM, Sebastian Raschka wrote: > Hi, Raga, > > > 1. Just to make sure I understand correctly, using the .632+ bootstrap method, the ACC_lower and ACC_upper are the lower and higher percentile of the ACC_h,i distribution? 
> > phew, I am actually not sure anymore ? I think it?s the percentile of the ACC_boot distribution, similar to the ?classic? bootstrap but where ACC_boot got computed from weighted ACC_h,i and ACC_r,i > > > 2. For regression algorithms, is there a recommended equation for the no-information rate gamma? > > > Sorry, can?t be of much help here; I am not sure what the equivalent of the no-information rate for regression would be ... > > > > > On Mar 1, 2017, at 5:39 PM, Raga Markely wrote: > > > > Thanks a lot, Sebastian! Very nicely written. > > > > I have a few follow-up questions: > > 1. Just to make sure I understand correctly, using the .632+ bootstrap method, the ACC_lower and ACC_upper are the lower and higher percentile of the ACC_h,i distribution? > > 2. For regression algorithms, is there a recommended equation for the no-information rate gamma? > > 3. I need to plot the confidence interval and prediction interval for my Support Vector Regression prediction (just to clarify these intervals, please see an analogy from linear model on slide 14: http://www2.stat.duke.edu/~tjl13/s101/slides/unit6lec3H.pdf) - can I derive the intervals from .632+ bootstrap method or is there a different way of getting these intervals? > > > > Thank you! > > Raga > > > > > > On Wed, Mar 1, 2017 at 3:13 PM, Sebastian Raschka wrote: > > Hi, Raga, > > I have a short section on this here (https://sebastianraschka.com/blog/2016/model-evaluation-selection-part2.html#the-bootstrap-method-and-empirical-confidence-intervals) if it helps. > > > > Best, > > Sebastian > > > > > On Mar 1, 2017, at 3:07 PM, Raga Markely wrote: > > > > > > Hi everyone, > > > > > > I wonder if you could provide me with some suggestions on how to determine the confidence and prediction intervals of SVR? If you have suggestions for any machine learning algorithms in general, that would be fine too (doesn't have to be specific for SVR). > > > > > > So far, I have found: > > > 1. Bootstrap: http://stats.stackexchange.com/questions/183230/bootstrapping-confidence-interval-from-a-regression-prediction > > > 2. http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0048723&type=printable > > > 3. ftp://ftp.esat.kuleuven.ac.be/sista/suykens/reports/10_156_v0.pdf > > > > > > But, I don't fully understand the details in #2 and #3 to the point that I can write a step by step code. If I use bootstrap method, I can get the confidence interval as follows? > > > a. Draw bootstrap sample of size n > > > b. Fit the SVR model (with hyperparameters chosen during model selection with grid search cv) to this bootstrap sample > > > c. Use this model to predict the output variable y* from input variable X* > > > d. Repeat step a-c for, for instance, 100 times > > > e. Order the 100 values of y*, and determine, for instance, the 10th percentile and 90th percentile (if we are looking for 0.8 confidence interval) > > > f. Repeat a-e for different values of X* to plot the prediction with confidence interval > > > > > > But, I don't know how to get the prediction interval from here. 
> > > > > > Thank you very much, > > > Raga > > > _______________________________________________ > > > scikit-learn mailing list > > > scikit-learn at python.org > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From raga.markely at gmail.com Wed Mar 1 22:17:13 2017 From: raga.markely at gmail.com (Raga Markely) Date: Wed, 1 Mar 2017 22:17:13 -0500 Subject: [scikit-learn] Confidence and Prediction Intervals of Support Vector Regression In-Reply-To: <34DD2CC4-D6E1-4FFD-B53E-095EF1DAB7B8@gmail.com> References: <7B05F1AE-FCE4-413E-B96A-773EEF2D7947@sebastianraschka.com> <34DD2CC4-D6E1-4FFD-B53E-095EF1DAB7B8@gmail.com> Message-ID: that's a very serious dedication to bootstrap :) On Wed, Mar 1, 2017 at 10:13 PM, Sebastian Raschka wrote: > Glad to hear that it was at least a little bit helpful :) > (haha, Efron and Tibshirani even have a whole ~500 pg book on bootstrap if > you have the time and patience ? :) https://www.crcpress.com/An- > Introduction-to-the-Bootstrap/Efron-Tibshirani/p/book/9780412042317) > > > On Mar 1, 2017, at 10:07 PM, Raga Markely > wrote: > > > > No worries, Sebastian :) .. thank you very much for your help.. I > learned a lot of new things from your site today.. it led me to some > relevant chapters in "The Elements of Statistical Learning", which then led > me to chapter 8 page 264 about non-parametric & parametric bootstrap.. > > > > I think I will just go with the non-parametric bootstrap for my > problem.. similar to the bootstrap steps i mentioned earlier.. > > > > Thank you! > > Raga > > > > On Wed, Mar 1, 2017 at 9:44 PM, Sebastian Raschka < > mail at sebastianraschka.com> wrote: > > Hi, Raga, > > > > > 1. Just to make sure I understand correctly, using the .632+ bootstrap > method, the ACC_lower and ACC_upper are the lower and higher percentile of > the ACC_h,i distribution? > > > > phew, I am actually not sure anymore ? I think it?s the percentile of > the ACC_boot distribution, similar to the ?classic? bootstrap but where > ACC_boot got computed from weighted ACC_h,i and ACC_r,i > > > > > 2. For regression algorithms, is there a recommended equation for the > no-information rate gamma? > > > > > > Sorry, can?t be of much help here; I am not sure what the equivalent of > the no-information rate for regression would be ... > > > > > > > > > On Mar 1, 2017, at 5:39 PM, Raga Markely > wrote: > > > > > > Thanks a lot, Sebastian! Very nicely written. > > > > > > I have a few follow-up questions: > > > 1. Just to make sure I understand correctly, using the .632+ bootstrap > method, the ACC_lower and ACC_upper are the lower and higher percentile of > the ACC_h,i distribution? > > > 2. For regression algorithms, is there a recommended equation for the > no-information rate gamma? > > > 3. 
I need to plot the confidence interval and prediction interval for > my Support Vector Regression prediction (just to clarify these intervals, > please see an analogy from linear model on slide 14: > http://www2.stat.duke.edu/~tjl13/s101/slides/unit6lec3H.pdf) - can I > derive the intervals from .632+ bootstrap method or is there a different > way of getting these intervals? > > > > > > Thank you! > > > Raga > > > > > > > > > On Wed, Mar 1, 2017 at 3:13 PM, Sebastian Raschka < > se.raschka at gmail.com> wrote: > > > Hi, Raga, > > > I have a short section on this here (https://sebastianraschka.com/ > blog/2016/model-evaluation-selection-part2.html#the-bootstrap-method-and- > empirical-confidence-intervals) if it helps. > > > > > > Best, > > > Sebastian > > > > > > > On Mar 1, 2017, at 3:07 PM, Raga Markely > wrote: > > > > > > > > Hi everyone, > > > > > > > > I wonder if you could provide me with some suggestions on how to > determine the confidence and prediction intervals of SVR? If you have > suggestions for any machine learning algorithms in general, that would be > fine too (doesn't have to be specific for SVR). > > > > > > > > So far, I have found: > > > > 1. Bootstrap: http://stats.stackexchange.com/questions/183230/ > bootstrapping-confidence-interval-from-a-regression-prediction > > > > 2. http://journals.plos.org/plosone/article/file?id=10. > 1371/journal.pone.0048723&type=printable > > > > 3. ftp://ftp.esat.kuleuven.ac.be/sista/suykens/reports/10_156_v0.pdf > > > > > > > > But, I don't fully understand the details in #2 and #3 to the point > that I can write a step by step code. If I use bootstrap method, I can get > the confidence interval as follows? > > > > a. Draw bootstrap sample of size n > > > > b. Fit the SVR model (with hyperparameters chosen during model > selection with grid search cv) to this bootstrap sample > > > > c. Use this model to predict the output variable y* from input > variable X* > > > > d. Repeat step a-c for, for instance, 100 times > > > > e. Order the 100 values of y*, and determine, for instance, the 10th > percentile and 90th percentile (if we are looking for 0.8 confidence > interval) > > > > f. Repeat a-e for different values of X* to plot the prediction with > confidence interval > > > > > > > > But, I don't know how to get the prediction interval from here. > > > > > > > > Thank you very much, > > > > Raga > > > > _______________________________________________ > > > > scikit-learn mailing list > > > > scikit-learn at python.org > > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > _______________________________________________ > > > scikit-learn mailing list > > > scikit-learn at python.org > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > _______________________________________________ > > > scikit-learn mailing list > > > scikit-learn at python.org > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... 
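A minimal sketch of the non-parametric bootstrap procedure (steps a-f) discussed in the thread above, assuming scikit-learn's SVR and sklearn.utils.resample are used; the kernel/C/epsilon values, the 100 resampling rounds, and the 10th/90th percentiles below are illustrative placeholders, not values fixed by the discussion. Note this produces the percentile confidence band for the fitted regression function at each X*; how to widen it into a prediction interval for new observations was left open in the thread.

import numpy as np
from sklearn.svm import SVR
from sklearn.utils import resample

def bootstrap_confidence_band(X, y, X_star, n_rounds=100, lower=10.0, upper=90.0):
    """Percentile confidence band for SVR predictions at the query points X_star."""
    preds = np.empty((n_rounds, len(X_star)))
    for i in range(n_rounds):
        # a. draw a bootstrap sample of size n (sampling rows with replacement)
        X_boot, y_boot = resample(X, y)
        # b. fit SVR to the bootstrap sample (hyperparameters assumed to have been
        #    chosen beforehand, e.g. by grid-search cross-validation)
        model = SVR(kernel='rbf', C=1.0, epsilon=0.1).fit(X_boot, y_boot)
        # c./d. predict y* at the query points X*, repeated for n_rounds resamples
        preds[i] = model.predict(X_star)
    # e./f. order the n_rounds predictions at each X* and take the chosen percentiles
    return np.percentile(preds, [lower, upper], axis=0)

Calling this with lower=10 and upper=90, as in step e, corresponds to the 0.8 interval mentioned in the thread.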
URL: From shubham.bhardwaj2015 at vit.ac.in Thu Mar 2 04:07:36 2017 From: shubham.bhardwaj2015 at vit.ac.in (SHUBHAM BHARDWAJ 15BCE0704) Date: Thu, 2 Mar 2017 14:37:36 +0530 Subject: [scikit-learn] GSoc, 2017 (proposal idea and intro) .reg Message-ID: Hello Sir, My introduction : I am a 2nd year student studying Computer Science and engineering from VIT, Vellore. I work in Google Developers Group VIT. All my experience has been about collaborating with a lot of people ,working as a team, building products and learning along the way. Since scikit-learn is participating this time I am too planning to submit a proposal. Proposal idea: I am really interested in implementing kmeans++ algorithm.I was doing some work on DT but I found this very appealing. Just wanted to know if it can be a good project idea. Regards Shubham Bhardwaj -------------- next part -------------- An HTML attachment was scrubbed... URL: From jmschreiber91 at gmail.com Thu Mar 2 13:31:46 2017 From: jmschreiber91 at gmail.com (Jacob Schreiber) Date: Thu, 2 Mar 2017 10:31:46 -0800 Subject: [scikit-learn] GSoc, 2017 (proposal idea and intro) .reg In-Reply-To: References: Message-ID: Hi Shubham Thanks for your interest. I'm eager to see your contributions to sklearn in the future. However, I'm pretty sure kmeans++ is already implemented: http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html Jacob On Thu, Mar 2, 2017 at 1:07 AM, SHUBHAM BHARDWAJ 15BCE0704 < shubham.bhardwaj2015 at vit.ac.in> wrote: > Hello Sir, > > My introduction : > I am a 2nd year student studying Computer Science and engineering from > VIT, Vellore. I work in Google Developers Group VIT. All my experience has > been about collaborating with a lot of people ,working as a team, building > products and learning along the way. > Since scikit-learn is participating this time I am too planning to submit > a proposal. > > Proposal idea: > I am really interested in implementing kmeans++ algorithm.I was doing some > work on DT but I found this very appealing. Just wanted to know if it can > be a good project idea. > > Regards > Shubham Bhardwaj > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From shubham.bhardwaj2015 at vit.ac.in Thu Mar 2 20:00:12 2017 From: shubham.bhardwaj2015 at vit.ac.in (SHUBHAM BHARDWAJ 15BCE0704) Date: Fri, 3 Mar 2017 06:30:12 +0530 Subject: [scikit-learn] GSoc, 2017 (proposal idea and intro) .reg In-Reply-To: References: Message-ID: Hello Sir, Thanks a lot for the reply. Sorry for not being elaborate about what I was trying to address. I wanted to implement this [ http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf] (1200+citations)- mentioned in comments. This pertains to the stalled issue #4357 .Proposal idea - implementing a scalable kmeans++. Regards Shubham Bhardwaj On Fri, Mar 3, 2017 at 12:01 AM, Jacob Schreiber wrote: > Hi Shubham > > Thanks for your interest. I'm eager to see your contributions to sklearn > in the future. However, I'm pretty sure kmeans++ is already implemented: > http://scikit-learn.org/stable/modules/generated/sklearn.cluster. > KMeans.html > > Jacob > > On Thu, Mar 2, 2017 at 1:07 AM, SHUBHAM BHARDWAJ 15BCE0704 < > shubham.bhardwaj2015 at vit.ac.in> wrote: > >> Hello Sir, >> >> My introduction : >> I am a 2nd year student studying Computer Science and engineering from >> VIT, Vellore. 
I work in Google Developers Group VIT. All my experience has >> been about collaborating with a lot of people ,working as a team, building >> products and learning along the way. >> Since scikit-learn is participating this time I am too planning to submit >> a proposal. >> >> Proposal idea: >> I am really interested in implementing kmeans++ algorithm.I was doing >> some work on DT but I found this very appealing. Just wanted to know if it >> can be a good project idea. >> >> Regards >> Shubham Bhardwaj >> >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.lemaitre58 at gmail.com Thu Mar 2 20:10:32 2017 From: g.lemaitre58 at gmail.com (=?UTF-8?Q?Guillaume_Lema=C3=AEtre?=) Date: Fri, 3 Mar 2017 02:10:32 +0100 Subject: [scikit-learn] GSoc, 2017 (proposal idea and intro) .reg In-Reply-To: References: Message-ID: I think that you mean this paper -> Scalable K-Means++ -> 218 citations On 3 March 2017 at 02:00, SHUBHAM BHARDWAJ 15BCE0704 < shubham.bhardwaj2015 at vit.ac.in> wrote: > Hello Sir, > > Thanks a lot for the reply. Sorry for not being elaborate about what I was > trying to address. I wanted to implement this [http://ilpubs.stanford.edu: > 8090/778/1/2006-13.pdf] (1200+citations)- mentioned in comments. This > pertains to the stalled issue #4357 .Proposal idea - implementing a > scalable kmeans++. > > Regards > Shubham Bhardwaj > > On Fri, Mar 3, 2017 at 12:01 AM, Jacob Schreiber > wrote: > >> Hi Shubham >> >> Thanks for your interest. I'm eager to see your contributions to sklearn >> in the future. However, I'm pretty sure kmeans++ is already implemented: >> http://scikit-learn.org/stable/modules/generate >> d/sklearn.cluster.KMeans.html >> >> Jacob >> >> On Thu, Mar 2, 2017 at 1:07 AM, SHUBHAM BHARDWAJ 15BCE0704 < >> shubham.bhardwaj2015 at vit.ac.in> wrote: >> >>> Hello Sir, >>> >>> My introduction : >>> I am a 2nd year student studying Computer Science and engineering from >>> VIT, Vellore. I work in Google Developers Group VIT. All my experience has >>> been about collaborating with a lot of people ,working as a team, building >>> products and learning along the way. >>> Since scikit-learn is participating this time I am too planning to >>> submit a proposal. >>> >>> Proposal idea: >>> I am really interested in implementing kmeans++ algorithm.I was doing >>> some work on DT but I found this very appealing. Just wanted to know if it >>> can be a good project idea. 
>>> >>> Regards >>> Shubham Bhardwaj >>> >>> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -- Guillaume Lemaitre INRIA Saclay - Ile-de-France Equipe PARIETAL guillaume.lemaitre at inria.f r --- https://glemaitre.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From shubham.bhardwaj2015 at vit.ac.in Thu Mar 2 20:23:16 2017 From: shubham.bhardwaj2015 at vit.ac.in (SHUBHAM BHARDWAJ 15BCE0704) Date: Fri, 3 Mar 2017 06:53:16 +0530 Subject: [scikit-learn] GSoc, 2017 (proposal idea and intro) .reg In-Reply-To: References: Message-ID: Hello Sir, Very Sorry for the numbers I saw this written in the comments.I assumed -Given the person who suggested the paper might have taken a look into the number of citations.I will make sure to personally check myself. Regards Shubham Bhardwaj On Fri, Mar 3, 2017 at 6:40 AM, Guillaume Lema?tre wrote: > I think that you mean this paper -> Scalable K-Means++ -> 218 citations > > On 3 March 2017 at 02:00, SHUBHAM BHARDWAJ 15BCE0704 < > shubham.bhardwaj2015 at vit.ac.in> wrote: > >> Hello Sir, >> >> Thanks a lot for the reply. Sorry for not being elaborate about what I >> was trying to address. I wanted to implement this [ >> http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf] (1200+citations)- >> mentioned in comments. This pertains to the stalled issue #4357 .Proposal >> idea - implementing a scalable kmeans++. >> >> Regards >> Shubham Bhardwaj >> >> On Fri, Mar 3, 2017 at 12:01 AM, Jacob Schreiber > > wrote: >> >>> Hi Shubham >>> >>> Thanks for your interest. I'm eager to see your contributions to sklearn >>> in the future. However, I'm pretty sure kmeans++ is already implemented: >>> http://scikit-learn.org/stable/modules/generate >>> d/sklearn.cluster.KMeans.html >>> >>> Jacob >>> >>> On Thu, Mar 2, 2017 at 1:07 AM, SHUBHAM BHARDWAJ 15BCE0704 < >>> shubham.bhardwaj2015 at vit.ac.in> wrote: >>> >>>> Hello Sir, >>>> >>>> My introduction : >>>> I am a 2nd year student studying Computer Science and engineering from >>>> VIT, Vellore. I work in Google Developers Group VIT. All my experience has >>>> been about collaborating with a lot of people ,working as a team, building >>>> products and learning along the way. >>>> Since scikit-learn is participating this time I am too planning to >>>> submit a proposal. >>>> >>>> Proposal idea: >>>> I am really interested in implementing kmeans++ algorithm.I was doing >>>> some work on DT but I found this very appealing. Just wanted to know if it >>>> can be a good project idea. 
>>>> >>>> Regards >>>> Shubham Bhardwaj >>>> >>>> >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > > -- > Guillaume Lemaitre > INRIA Saclay - Ile-de-France > Equipe PARIETAL > guillaume.lemaitre at inria.f r --- > https://glemaitre.github.io/ > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ragvrv at gmail.com Fri Mar 3 12:56:27 2017 From: ragvrv at gmail.com (Raghav R V) Date: Fri, 3 Mar 2017 18:56:27 +0100 Subject: [scikit-learn] MAPE in scikit-learn? Message-ID: Hi all, Do we want Median Absolute Percentage Error in scikit-learn? Ref: KDD2017 - https://tianchi.shuju.aliyun.com/competition/information.htm?spm=5176.100067.5678.2.8CnCPt&raceId=231597 Thanks -- Raghav RV https://github.com/raghavrv -------------- next part -------------- An HTML attachment was scrubbed... URL: From ahowe42 at gmail.com Fri Mar 3 13:18:22 2017 From: ahowe42 at gmail.com (Andrew Howe) Date: Fri, 3 Mar 2017 21:18:22 +0300 Subject: [scikit-learn] MAPE in scikit-learn? In-Reply-To: References: Message-ID: I would think so. I've used it in research before. Andrew <~~~~~~~~~~~~~~~~~~~~~~~~~~~> J. Andrew Howe, PhD www.andrewhowe.com http://www.linkedin.com/in/ahowe42 https://www.researchgate.net/profile/John_Howe12/ I live to learn, so I can learn to live. - me <~~~~~~~~~~~~~~~~~~~~~~~~~~~> On Fri, Mar 3, 2017 at 8:56 PM, Raghav R V wrote: > Hi all, > > Do we want Median Absolute Percentage Error in scikit-learn? > > Ref: KDD2017 - https://tianchi.shuju.aliyun.com/competition/ > information.htm?spm=5176.100067.5678.2.8CnCPt&raceId=231597 > > Thanks > > -- > Raghav RV > https://github.com/raghavrv > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From amanpratik10 at gmail.com Fri Mar 3 14:54:13 2017 From: amanpratik10 at gmail.com (Aman Pratik) Date: Sat, 4 Mar 2017 01:24:13 +0530 Subject: [scikit-learn] GSoC 2017 Message-ID: Hello Developers, This is Aman Pratik. I am currently pursuing my B.Tech from Indian Institute of Technology, Varanasi. I am a keen software developer and not very new to the open source community. I am interested in your project "*Improve online learning for linear models*" for GSoC 2017. I have been working in Python for the past 2 years and have good idea about Machine Learning algorithms. I am quite familiar with scikit-learn both as a user and a developer. These are the PRs I have worked/working on for the past few months. 
[MRG+1] Issue#5803 : Regression Test added #8112 [MRG] Issue#6673 : Make a wrapper around functions that score an individual feature [MRG] Issue #7987: Embarrassingly parallel "n_restarts_optimizer" in GaussianProcessRegressor My GitHub Profile: https://www.github.com/amanp10 I have basic knowledge about SGD (Stochastic Gradient Descent) and related algorithms. Also, I am familiar with Benchmark tests, Unit tests and other technical knowledge I would require for this project. I have started my study for the subject and am looking forward to guidance from the potential mentors or anyone willing to help. Thank You -------------- next part -------------- An HTML attachment was scrubbed... URL: From jmschreiber91 at gmail.com Fri Mar 3 17:36:21 2017 From: jmschreiber91 at gmail.com (Jacob Schreiber) Date: Fri, 3 Mar 2017 14:36:21 -0800 Subject: [scikit-learn] Scipy 2017 In-Reply-To: References: <65ef1d1c-28a9-0772-6da1-3b54feb7cfd1@gmail.com> Message-ID: Do you still need someone to help with the tutorial? I may be able to attend. On Tue, Feb 28, 2017 at 9:43 AM, Nelson Liu wrote: > The conference generally (at least for the last three years) uploads > recordings of the tutorials afterwards, e.g. here > is part one of the > scikit-learn tutorial at Scipy 2016. I would assume that they are doing > this again. > > Nelson Liu > > On Tue, Feb 28, 2017 at 9:37 AM, Ruchika Nayyar > wrote: > >> Hello >> >> Will there be a video link ? >> >> Thanks, >> Ruchika >> ---------------------------------------- >> Dr Ruchika Nayyar, >> Post Doctoral Fellow for ATLAS Collaboration >> University of Arizona >> Arizona, USA. >> -------------------------------------------- >> >> On Mon, Feb 27, 2017 at 2:20 PM, Alexandre Gramfort < >> alexandre.gramfort at telecom-paristech.fr> wrote: >> >>> Hi Andy, >>> >>> I'll be happy to share the stage with you for a tutorial. >>> >>> Alex >>> >>> >>> On Tue, Feb 21, 2017 at 3:52 PM, Andreas Mueller >>> wrote: >>> > Hey folks. >>> > Who's coming to scipy this year? >>> > Any volunteers for tutorials? I'm happy to be part of it but doing 7h >>> by >>> > myself is a bit much ;) >>> > >>> > >>> > Andy >>> > _______________________________________________ >>> > scikit-learn mailing list >>> > scikit-learn at python.org >>> > https://mail.python.org/mailman/listinfo/scikit-learn >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mikhail at combust.ml Sat Mar 4 17:12:11 2017 From: mikhail at combust.ml (Mikhail Semeniuk) Date: Sat, 4 Mar 2017 14:12:11 -0800 Subject: [scikit-learn] MAPE in scikit-learn? In-Reply-To: References: Message-ID: +1 On Fri, Mar 3, 2017 at 10:18 AM, Andrew Howe wrote: > I would think so. I've used it in research before. > > Andrew > > <~~~~~~~~~~~~~~~~~~~~~~~~~~~> > J. Andrew Howe, PhD > www.andrewhowe.com > http://www.linkedin.com/in/ahowe42 > https://www.researchgate.net/profile/John_Howe12/ > I live to learn, so I can learn to live. 
- me > <~~~~~~~~~~~~~~~~~~~~~~~~~~~> > > On Fri, Mar 3, 2017 at 8:56 PM, Raghav R V wrote: > >> Hi all, >> >> Do we want Median Absolute Percentage Error in scikit-learn? >> >> Ref: KDD2017 - https://tianchi.shuju.aliyun.com/competition/information. >> htm?spm=5176.100067.5678.2.8CnCPt&raceId=231597 >> >> Thanks >> >> -- >> Raghav RV >> https://github.com/raghavrv >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From t3kcit at gmail.com Sun Mar 5 12:42:07 2017 From: t3kcit at gmail.com (Andreas Mueller) Date: Sun, 5 Mar 2017 12:42:07 -0500 Subject: [scikit-learn] MAPE in scikit-learn? In-Reply-To: References: Message-ID: <03202134-9d33-f31d-b448-ba00f1bb6a54@gmail.com> +1 On 03/04/2017 05:12 PM, Mikhail Semeniuk wrote: > +1 > > On Fri, Mar 3, 2017 at 10:18 AM, Andrew Howe > wrote: > > I would think so. I've used it in research before. > > Andrew > > <~~~~~~~~~~~~~~~~~~~~~~~~~~~> > J. Andrew Howe, PhD > www.andrewhowe.com > http://www.linkedin.com/in/ahowe42 > > https://www.researchgate.net/profile/John_Howe12/ > > I live to learn, so I can learn to live. - me > <~~~~~~~~~~~~~~~~~~~~~~~~~~~> > > On Fri, Mar 3, 2017 at 8:56 PM, Raghav R V > wrote: > > Hi all, > > Do we want Median Absolute Percentage Error in scikit-learn? > > Ref: KDD2017 - > https://tianchi.shuju.aliyun.com/competition/information.htm?spm=5176.100067.5678.2.8CnCPt&raceId=231597 > > > Thanks > > -- > Raghav RV > https://github.com/raghavrv > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From t3kcit at gmail.com Sun Mar 5 12:42:20 2017 From: t3kcit at gmail.com (Andreas Mueller) Date: Sun, 5 Mar 2017 12:42:20 -0500 Subject: [scikit-learn] Scipy 2017 In-Reply-To: References: <65ef1d1c-28a9-0772-6da1-3b54feb7cfd1@gmail.com> Message-ID: I'm gonna do it with Alex :) On 03/03/2017 05:36 PM, Jacob Schreiber wrote: > Do you still need someone to help with the tutorial? I may be able to > attend. > > On Tue, Feb 28, 2017 at 9:43 AM, Nelson Liu > wrote: > > The conference generally (at least for the last three years) > uploads recordings of the tutorials afterwards, e.g. here > is part one of the > scikit-learn tutorial at Scipy 2016. I would assume that they are > doing this again. > > Nelson Liu > > On Tue, Feb 28, 2017 at 9:37 AM, Ruchika Nayyar > > wrote: > > Hello > > Will there be a video link ? > > Thanks, > Ruchika > ---------------------------------------- > Dr Ruchika Nayyar, > Post Doctoral Fellow for ATLAS Collaboration > University of Arizona > Arizona, USA. 
> -------------------------------------------- > > On Mon, Feb 27, 2017 at 2:20 PM, Alexandre Gramfort > > wrote: > > Hi Andy, > > I'll be happy to share the stage with you for a tutorial. > > Alex > > > On Tue, Feb 21, 2017 at 3:52 PM, Andreas Mueller > > wrote: > > Hey folks. > > Who's coming to scipy this year? > > Any volunteers for tutorials? I'm happy to be part of it > but doing 7h by > > myself is a bit much ;) > > > > > > Andy > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From t3kcit at gmail.com Sun Mar 5 12:47:09 2017 From: t3kcit at gmail.com (Andreas Mueller) Date: Sun, 5 Mar 2017 12:47:09 -0500 Subject: [scikit-learn] Scikit-learn survey results Message-ID: <23963584-ae33-db28-90d8-6e1479e3f862@gmail.com> Hey all. In case you're interested, here is a summary view of the scikit-learn survey I posted recently: https://www.surveymonkey.com/results/SM-RHGZVZ73/ tldr; Preprocessing takes the most time, people want out-of-core learning, better integration with pandas and easier visualization of models and data. People would use automatic machine learning if it was there, but it's not the highest priority item. There is also a lot of interesting info in the comments, but because I was not able to go through all of them yet, I don't want to publish them publicly in case there is sensitive information included (and if anyone knows if there are legal implications if there wasn't a disclaimer, please let me know). Cheers, Andy From t3kcit at gmail.com Sun Mar 5 14:02:01 2017 From: t3kcit at gmail.com (Andreas Mueller) Date: Sun, 5 Mar 2017 14:02:01 -0500 Subject: [scikit-learn] GSoc, 2017 (proposal idea and intro) .reg In-Reply-To: References: Message-ID: There was a PR here: https://github.com/scikit-learn/scikit-learn/pull/5530 but it didn't seem to work. Feel free to convince us otherwise ;) On 03/02/2017 08:23 PM, SHUBHAM BHARDWAJ 15BCE0704 wrote: > Hello Sir, > Very Sorry for the numbers I saw this written in the comments.I > assumed -Given the person who suggested the paper might have taken a > look into the number of citations.I will make sure to personally check > myself. > > Regards > Shubham Bhardwaj > > On Fri, Mar 3, 2017 at 6:40 AM, Guillaume Lema?tre > > wrote: > > I think that you mean this paper -> Scalable K-Means++ -> 218 > citations > > On 3 March 2017 at 02:00, SHUBHAM BHARDWAJ 15BCE0704 > > wrote: > > Hello Sir, > > Thanks a lot for the reply. Sorry for not being elaborate > about what I was trying to address. I wanted to implement this > [http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf > ] > (1200+citations)- mentioned in comments. 
This pertains to the > stalled issue #4357 .Proposal idea - implementing a scalable > kmeans++. > > Regards > Shubham Bhardwaj > > On Fri, Mar 3, 2017 at 12:01 AM, Jacob Schreiber > > wrote: > > Hi Shubham > > Thanks for your interest. I'm eager to see your > contributions to sklearn in the future. However, I'm > pretty sure kmeans++ is already implemented: > http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html > > > Jacob > > On Thu, Mar 2, 2017 at 1:07 AM, SHUBHAM BHARDWAJ 15BCE0704 > > wrote: > > Hello Sir, > > My introduction : > I am a 2nd year student studying Computer Science and > engineering from VIT, Vellore. I work in Google > Developers Group VIT. All my experience has been about > collaborating with a lot of people ,working as a team, > building products and learning along the way. > Since scikit-learn is participating this time I am too > planning to submit a proposal. > > Proposal idea: > I am really interested in implementing kmeans++ > algorithm.I was doing some work on DT but I found this > very appealing. Just wanted to know if it can be a > good project idea. > > Regards > Shubham Bhardwaj > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > -- > Guillaume Lemaitre > INRIA Saclay - Ile-de-France > Equipe PARIETAL > guillaume.lemaitre at inria.f r > --- https://glemaitre.github.io/ > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From shubham.bhardwaj2015 at vit.ac.in Mon Mar 6 06:25:05 2017 From: shubham.bhardwaj2015 at vit.ac.in (SHUBHAM BHARDWAJ 15BCE0704) Date: Mon, 6 Mar 2017 16:55:05 +0530 Subject: [scikit-learn] GSoc, 2017 (proposal idea and intro) .reg In-Reply-To: References: Message-ID: Hello Sir, Thanks for the reply, I will try to reproduce the claims of the paper and would update about my progress. Regards Shubham On Mon, Mar 6, 2017 at 12:32 AM, Andreas Mueller wrote: > There was a PR here: > https://github.com/scikit-learn/scikit-learn/pull/5530 > > but it didn't seem to work. Feel free to convince us otherwise ;) > > > > On 03/02/2017 08:23 PM, SHUBHAM BHARDWAJ 15BCE0704 wrote: > > Hello Sir, > Very Sorry for the numbers I saw this written in the comments.I assumed > -Given the person who suggested the paper might have taken a look into the > number of citations.I will make sure to personally check myself. > > Regards > Shubham Bhardwaj > > On Fri, Mar 3, 2017 at 6:40 AM, Guillaume Lema?tre > wrote: > >> I think that you mean this paper -> Scalable K-Means++ -> 218 citations >> >> On 3 March 2017 at 02:00, SHUBHAM BHARDWAJ 15BCE0704 < >> shubham.bhardwaj2015 at vit.ac.in> wrote: >> >>> Hello Sir, >>> >>> Thanks a lot for the reply. Sorry for not being elaborate about what I >>> was trying to address. 
I wanted to implement this [ >>> http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf] (1200+citations)- >>> mentioned in comments. This pertains to the stalled issue #4357 .Proposal >>> idea - implementing a scalable kmeans++. >>> >>> Regards >>> Shubham Bhardwaj >>> >>> On Fri, Mar 3, 2017 at 12:01 AM, Jacob Schreiber < >>> jmschreiber91 at gmail.com> wrote: >>> >>>> Hi Shubham >>>> >>>> Thanks for your interest. I'm eager to see your contributions to >>>> sklearn in the future. However, I'm pretty sure kmeans++ is already >>>> implemented: http://scikit-learn.org/stable/modules/generate >>>> d/sklearn.cluster.KMeans.html >>>> >>>> Jacob >>>> >>>> On Thu, Mar 2, 2017 at 1:07 AM, SHUBHAM BHARDWAJ 15BCE0704 < >>>> shubham.bhardwaj2015 at vit.ac.in> wrote: >>>> >>>>> Hello Sir, >>>>> >>>>> My introduction : >>>>> I am a 2nd year student studying Computer Science and engineering from >>>>> VIT, Vellore. I work in Google Developers Group VIT. All my experience has >>>>> been about collaborating with a lot of people ,working as a team, building >>>>> products and learning along the way. >>>>> Since scikit-learn is participating this time I am too planning to >>>>> submit a proposal. >>>>> >>>>> Proposal idea: >>>>> I am really interested in implementing kmeans++ algorithm.I was doing >>>>> some work on DT but I found this very appealing. Just wanted to know if it >>>>> can be a good project idea. >>>>> >>>>> Regards >>>>> Shubham Bhardwaj >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>> >>>>> >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >> >> >> -- >> Guillaume Lemaitre >> INRIA Saclay - Ile-de-France >> Equipe PARIETAL >> guillaume.lemaitre at inria.f r --- >> https://glemaitre.github.io/ >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > > _______________________________________________ > scikit-learn mailing listscikit-learn at python.orghttps://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bdholt1 at gmail.com Mon Mar 6 06:46:37 2017 From: bdholt1 at gmail.com (Brian Holt) Date: Mon, 6 Mar 2017 11:46:37 +0000 Subject: [scikit-learn] Scikit-learn survey results In-Reply-To: <23963584-ae33-db28-90d8-6e1479e3f862@gmail.com> References: <23963584-ae33-db28-90d8-6e1479e3f862@gmail.com> Message-ID: Thanks Andy, That's really interesting and gives some hints for future direction. As an initial suggestion, I wonder if incremental decision tree learning would be welcomed by the project? My personal experience building trees was very often frustrated by memory constraints and an alternative that uses batches would allow the technique to scale up to much larger datasets that don't fit in memory. 
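For reference, scikit-learn's existing out-of-core interface is partial_fit; the sketch below (using SGDClassifier purely as a stand-in, since no incremental tree estimator currently exists in scikit-learn, and with a simulated data source) shows the batch-wise fitting pattern such a tree learner could plug into:

```
import numpy as np
from sklearn.linear_model import SGDClassifier

# Stand-in estimator: SGDClassifier already implements partial_fit;
# an incremental tree learner would presumably expose the same interface.
clf = SGDClassifier(random_state=0)
all_classes = np.array([0, 1])

def iter_batches(n_batches=10, batch_size=1000, n_features=20):
    # Simulated data source; in practice each batch would be read from disk.
    rng = np.random.RandomState(0)
    for _ in range(n_batches):
        X = rng.randn(batch_size, n_features)
        y = (X[:, 0] > 0).astype(int)
        yield X, y

for X_batch, y_batch in iter_batches():
    # The full set of classes has to be declared on the first call.
    clf.partial_fit(X_batch, y_batch, classes=all_classes)
```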
Regards Brian On 5 March 2017 at 17:47, Andreas Mueller wrote: > Hey all. > In case you're interested, here is a summary view of the scikit-learn > survey I posted recently: > https://www.surveymonkey.com/results/SM-RHGZVZ73/ > > tldr; > Preprocessing takes the most time, people want out-of-core learning, > better integration with pandas > and easier visualization of models and data. > People would use automatic machine learning if it was there, but it's not > the highest priority item. > > There is also a lot of interesting info in the comments, but because I was > not able to go through all of them yet, > I don't want to publish them publicly in case there is sensitive > information included (and if anyone knows if there are > legal implications if there wasn't a disclaimer, please let me know). > > Cheers, > Andy > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From t3kcit at gmail.com Mon Mar 6 10:37:01 2017 From: t3kcit at gmail.com (Andreas Mueller) Date: Mon, 6 Mar 2017 10:37:01 -0500 Subject: [scikit-learn] Scikit-learn survey results In-Reply-To: References: <23963584-ae33-db28-90d8-6e1479e3f862@gmail.com> Message-ID: <711efdaa-388e-681a-a09c-7ecd9a2acb5b@gmail.com> Hi Brian. How about mondrian forests? ;) And I think Gilles has thought about parallelizing trees a bit. It's definitely something that people are interested in. Andy On 03/06/2017 06:46 AM, Brian Holt wrote: > Thanks Andy, > > That's really interesting and gives some hints for future direction. > As an initial suggestion, I wonder if incremental decision tree > learning would be welcomed by the project? My personal experience > building trees was very often frustrated by memory constraints and an > alternative that uses batches would allow the technique to scale up to > much larger datasets that don't fit in memory. > > Regards > Brian > > On 5 March 2017 at 17:47, Andreas Mueller > wrote: > > Hey all. > In case you're interested, here is a summary view of the > scikit-learn survey I posted recently: > https://www.surveymonkey.com/results/SM-RHGZVZ73/ > > > tldr; > Preprocessing takes the most time, people want out-of-core > learning, better integration with pandas > and easier visualization of models and data. > People would use automatic machine learning if it was there, but > it's not the highest priority item. > > There is also a lot of interesting info in the comments, but > because I was not able to go through all of them yet, > I don't want to publish them publicly in case there is sensitive > information included (and if anyone knows if there are > legal implications if there wasn't a disclaimer, please let me know). > > Cheers, > Andy > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From betatim at gmail.com Mon Mar 6 10:43:30 2017 From: betatim at gmail.com (Tim Head) Date: Mon, 06 Mar 2017 15:43:30 +0000 Subject: [scikit-learn] Scikit-learn survey results In-Reply-To: <711efdaa-388e-681a-a09c-7ecd9a2acb5b@gmail.com> References: <23963584-ae33-db28-90d8-6e1479e3f862@gmail.com> <711efdaa-388e-681a-a09c-7ecd9a2acb5b@gmail.com> Message-ID: On Mon, Mar 6, 2017 at 10:37 AM Andreas Mueller wrote: > Hi Brian. > > How about mondrian forests? ;) > Talk to Manoj (CCed) about those. He recently started an implementation while exploring them for scikit-optimize. T -------------- next part -------------- An HTML attachment was scrubbed... URL: From konst.katrioplas at gmail.com Mon Mar 6 11:34:34 2017 From: konst.katrioplas at gmail.com (Konstantinos Katrioplas) Date: Mon, 6 Mar 2017 18:34:34 +0200 Subject: [scikit-learn] contribution to scikit-learn - questions Message-ID: <2f3910b8-dd14-0980-d174-daa0d663a15e@gmail.com> Hello all, My name is Konstantinos and I would like to contribute to scikit-learn. I am relatively new to open source development and I want to work on some easy bug-fixing to get used to the github workflow. Firstly, is this issue open and should I try working on it? https://github.com/scikit-learn/scikit-learn/issues/8425 If not, would you suggest another? Furthermore, when trying to build with make I get this: make: nosetests: Command not found Makefile:32: recipe for target 'test-code' failed make: *** [test-code] Error 127 Is this in any way expected and do you know what I might be missing? Finally, is there an IRC channel particularly for scikit-learn? Thanks in advance, Konstantinos From t3kcit at gmail.com Mon Mar 6 11:42:35 2017 From: t3kcit at gmail.com (Andreas Mueller) Date: Mon, 6 Mar 2017 11:42:35 -0500 Subject: [scikit-learn] contribution to scikit-learn - questions In-Reply-To: <2f3910b8-dd14-0980-d174-daa0d663a15e@gmail.com> References: <2f3910b8-dd14-0980-d174-daa0d663a15e@gmail.com> Message-ID: Hi Konstantinos. There is an IRC channel but it's not that busy any more. You could try the gitter channel at http://gitter.im/scikit-learn/scikit-learn The issue that you cited is ok, but this one might be easier to start with: https://github.com/scikit-learn/scikit-learn/issues/8194 You need to install nosetests to run it. Andy On 03/06/2017 11:34 AM, Konstantinos Katrioplas wrote: > Hello all, > > My name is Konstantinos and I would like to contribute to > scikit-learn. I am relatively new to open source development and I > want to work on some easy bug-fixing to get used to the github workflow. > > Firstly, is this issue open and should I try working on it? > https://github.com/scikit-learn/scikit-learn/issues/8425 If not, > would you suggest another? > > Furthermore, when trying to build with make I get this: > > make: nosetests: Command not found > Makefile:32: recipe for target 'test-code' failed > make: *** [test-code] Error 127 > > Is this in any way expected and do you know what I might be missing? > > Finally, is there an IRC channel particularly for scikit-learn? 
> > Thanks in advance, > Konstantinos > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From seralouk at gmail.com Tue Mar 7 04:48:27 2017 From: seralouk at gmail.com (Serafeim Loukas) Date: Tue, 7 Mar 2017 10:48:27 +0100 Subject: [scikit-learn] Linear Discriminant Analysis with Cross Validation in Python Message-ID: Dear scikit members, I would like to ask if there is any function that implements Linear Discriminant Analysis with Cross Validation (leave one out). Thank you in advance, S -------------- next part -------------- An HTML attachment was scrubbed... URL: From maheshak04 at gmail.com Tue Mar 7 04:56:24 2017 From: maheshak04 at gmail.com (Mahesh Kulkarni) Date: Tue, 7 Mar 2017 15:26:24 +0530 Subject: [scikit-learn] Linear Discriminant Analysis with Cross Validation in Python In-Reply-To: References: Message-ID: Yes. Please see following link: http://scikit-learn.org/stable/modules/generated/sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html On Tue, Mar 7, 2017 at 3:18 PM, Serafeim Loukas wrote: > Dear scikit members, > > > I would like to ask if there is any function that implements > Linear Discriminant Analysis with Cross Validation (leave one out). > > Thank you in advance, > S > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tomarshubham24 at gmail.com Tue Mar 7 08:24:07 2017 From: tomarshubham24 at gmail.com (Shubham Singh Tomar) Date: Tue, 7 Mar 2017 18:54:07 +0530 Subject: [scikit-learn] Error while using GridSearchCV. Message-ID: Hi, I'm trying to use GridSearchCV to tune the parameters for DecisionTreeRegressor. I'm using sklearn 0.18.1 I'm getting the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
 in ()
      1 # Fit the training data to the model using grid search
----> 2 reg = fit_model(X_train, y_train)
      3
      4 # Produce the value for 'max_depth'
      5 print "Parameter 'max_depth' is {} for the optimal model.".format(reg.get_params()['max_depth'])

 in fit_model(X, y)
     11
     12 # Create cross-validation sets from the training data
---> 13 cv_sets = ShuffleSplit(X.shape[0], n_splits = 10, test_size = 0.20, random_state = 0)
     14
     15 # TODO: Create a decision tree regressor object

TypeError: __init__() got multiple values for keyword argument 'n_splits'

-- *Thanks,* *Shubham Singh Tomar* *Autodidact24.github.io * -------------- next part -------------- An HTML attachment was scrubbed... URL: From rth.yurchak at gmail.com Tue Mar 7 08:43:00 2017 From: rth.yurchak at gmail.com (Roman Yurchak) Date: Tue, 7 Mar 2017 14:43:00 +0100 Subject: [scikit-learn] Error while using GridSearchCV. In-Reply-To: References: Message-ID: <58BEB8E4.9060909@gmail.com> Shubham, the definition of ShuffleSplit.__init__ is ShuffleSplit(n_splits=10, test_size=0.1, train_size=None, random_state=None). You are passing the n_splits parameter twice (once by name and once as the first positional argument), as the exception you are getting says. -- Roman On 07/03/17 14:24, Shubham Singh Tomar wrote: > Hi, > > I'm trying to use GridSearchCV to tune the parameters for > DecisionTreeRegressor.
I'm using sklearn 0.18.1 > > I'm getting the following error: > > --------------------------------------------------------------------------- > TypeError Traceback (most recent call last) > in () 1 # Fit the training data to the model using grid search----> > 2reg = fit_model(X_train, y_train)3 4 # Produce the value for > 'max_depth'5 print "Parameter 'max_depth' is {} for the optimal > model.".format(reg.get_params()['max_depth']) > in fit_model(X, y) 11 12 # Create cross-validation sets from the > training data---> 13cv_sets = ShuffleSplit(X.shape[0], n_splits = 10, > test_size = 0.20, random_state = 0)14 15 # TODO: Create a decision tree > regressor objectTypeError: __init__() got multiple values for keyword > argument 'n_splits' > > > > > -- > *Thanks,* > *Shubham Singh Tomar* > *Autodidact24.github.io * > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > From seralouk at gmail.com Tue Mar 7 10:01:09 2017 From: seralouk at gmail.com (Serafeim Loukas) Date: Tue, 7 Mar 2017 16:01:09 +0100 Subject: [scikit-learn] Linear Discriminant Analysis with Cross Validation in Python In-Reply-To: References: Message-ID: Dear Mahesh, Thank you for your response. I read the documentation however I did not find anything related to cross-validation (leave one out). Can you give me a hint? Thank you, S ............................................. Loukas Serafeim University of Geneva email: seralouk at gmail.com 2017-03-07 10:56 GMT+01:00 Mahesh Kulkarni : > Yes. Please see following link: > > http://scikit-learn.org/stable/modules/generated/ > sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html > > On Tue, Mar 7, 2017 at 3:18 PM, Serafeim Loukas > wrote: > >> Dear scikit members, >> >> >> I would like to ask if there is any function that implements >> Linear Discriminant Analysis with Cross Validation (leave one out). >> >> Thank you in advance, >> S >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From se.raschka at gmail.com Tue Mar 7 11:56:55 2017 From: se.raschka at gmail.com (Sebastian Raschka) Date: Tue, 7 Mar 2017 11:56:55 -0500 Subject: [scikit-learn] Linear Discriminant Analysis with Cross Validation in Python In-Reply-To: References: Message-ID: Hi, Loukas and Mahesh, for LOOCV, you could e.g., use the LeaveOneOut class ``` from sklearn.discriminant_analysis import LinearDiscriminantAnalysis from sklearn.model_selection import LeaveOneOut loo = LeaveOneOut() lda = LinearDiscriminantAnalysis() test_fold_predictions = [] for train_index, test_index in loo.split(X): X_train, X_test = X[train_index], X[test_index] y_train, y_test = y[train_index], y[test_index] lda.fit(X_train, y_train) test_fold_predictions.append(lda.predict(X_test)) ``` or you could pass the loo to the cross_val_score function directly: ``` from sklearn.model_selection import cross_val_score cross_val_score(estimator=lda, X=X, y=y, cv=loo) ``` Best, Sebastian > On Mar 7, 2017, at 10:01 AM, Serafeim Loukas wrote: > > Dear Mahesh, > > Thank you for your response. 
> > I read the documentation however I did not find anything related to cross-validation (leave one out). > Can you give me a hint? > > Thank you, > S > > ............................................. > Loukas Serafeim > University of Geneva > email: seralouk at gmail.com > > > 2017-03-07 10:56 GMT+01:00 Mahesh Kulkarni : > Yes. Please see following link: > > http://scikit-learn.org/stable/modules/generated/sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html > > On Tue, Mar 7, 2017 at 3:18 PM, Serafeim Loukas wrote: > Dear scikit members, > > > I would like to ask if there is any function that implements Linear Discriminant Analysis with Cross Validation (leave one out). > > Thank you in advance, > S > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From fernando.wittmann at gmail.com Tue Mar 7 16:02:26 2017 From: fernando.wittmann at gmail.com (Fernando Marcos Wittmann) Date: Tue, 7 Mar 2017 18:02:26 -0300 Subject: [scikit-learn] Error while using GridSearchCV. In-Reply-To: References: Message-ID: Hey Shubham, I am a project reviewer at Udacity. This code seems to be part of one of our projects (P1 - Boston Housing ). I think that you have updated the old module sklearn.cross_validation to the module sklearn.model_detection, is that correct? If yes, then you should also update the parameters in ShuffleSplit to match with this new version (check the docs ). Try to update ShuffleSplit to the following line of code: cv_sets = ShuffleSplit(n_splits=10, test_size=0.2, random_state=0) I hope that helps! Feel free to send me a PM. On Tue, Mar 7, 2017 at 10:24 AM, Shubham Singh Tomar < tomarshubham24 at gmail.com> wrote: > Hi, > > I'm trying to use GridSearchCV to tune the parameters for > DecisionTreeRegressor. I'm using sklearn 0.18.1 > > I'm getting the following error: > > ---------------------------------------------------------------------------TypeError Traceback (most recent call last) in () 1 # Fit the training data to the model using grid search----> 2 reg = fit_model(X_train, y_train) 3 4 # Produce the value for 'max_depth' 5 print "Parameter 'max_depth' is {} for the optimal model.".format(reg.get_params()['max_depth']) > in fit_model(X, y) 11 12 # Create cross-validation sets from the training data---> 13 cv_sets = ShuffleSplit(X.shape[0], n_splits = 10, test_size = 0.20, random_state = 0) 14 15 # TODO: Create a decision tree regressor object > TypeError: __init__() got multiple values for keyword argument 'n_splits' > > > > > -- > *Thanks,* > *Shubham Singh Tomar* > *Autodidact24.github.io * > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -- Fernando Marcos Wittmann -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From maheshak04 at gmail.com Tue Mar 7 20:30:54 2017 From: maheshak04 at gmail.com (Mahesh Kulkarni) Date: Wed, 8 Mar 2017 07:00:54 +0530 Subject: [scikit-learn] Linear Discriminant Analysis with Cross Validation in Python In-Reply-To: References: Message-ID: Hi Sebastian, Thank you On 7 Mar 2017 10:28 p.m., "Sebastian Raschka" wrote: > Hi, Loukas and Mahesh, > for LOOCV, you could e.g., use the LeaveOneOut class > > ``` > from sklearn.discriminant_analysis import LinearDiscriminantAnalysis > from sklearn.model_selection import LeaveOneOut > > loo = LeaveOneOut() > lda = LinearDiscriminantAnalysis() > > test_fold_predictions = [] > > for train_index, test_index in loo.split(X): > X_train, X_test = X[train_index], X[test_index] > y_train, y_test = y[train_index], y[test_index] > lda.fit(X_train, y_train) > test_fold_predictions.append(lda.predict(X_test)) > ``` > > or you could pass the loo to the cross_val_score function directly: > > ``` > from sklearn.model_selection import cross_val_score > cross_val_score(estimator=lda, X=X, y=y, cv=loo) > ``` > > > Best, > Sebastian > > > > On Mar 7, 2017, at 10:01 AM, Serafeim Loukas wrote: > > > > Dear Mahesh, > > > > Thank you for your response. > > > > I read the documentation however I did not find anything related to > cross-validation (leave one out). > > Can you give me a hint? > > > > Thank you, > > S > > > > ............................................. > > Loukas Serafeim > > University of Geneva > > email: seralouk at gmail.com > > > > > > 2017-03-07 10:56 GMT+01:00 Mahesh Kulkarni : > > Yes. Please see following link: > > > > http://scikit-learn.org/stable/modules/generated/ > sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html > > > > On Tue, Mar 7, 2017 at 3:18 PM, Serafeim Loukas > wrote: > > Dear scikit members, > > > > > > I would like to ask if there is any function that implements Linear > Discriminant Analysis with Cross Validation (leave one out). > > > > Thank you in advance, > > S > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From seralouk at gmail.com Wed Mar 8 04:16:44 2017 From: seralouk at gmail.com (Serafeim Loukas) Date: Wed, 8 Mar 2017 10:16:44 +0100 Subject: [scikit-learn] Linear Discriminant Analysis with Cross Validation in Python In-Reply-To: References: Message-ID: Dear Sebastian, Thank you for your response. Best, S ............................................. 
Loukas Serafeim University of Geneva email: seralouk at gmail.com 2017-03-07 17:56 GMT+01:00 Sebastian Raschka : > Hi, Loukas and Mahesh, > for LOOCV, you could e.g., use the LeaveOneOut class > > ``` > from sklearn.discriminant_analysis import LinearDiscriminantAnalysis > from sklearn.model_selection import LeaveOneOut > > loo = LeaveOneOut() > lda = LinearDiscriminantAnalysis() > > test_fold_predictions = [] > > for train_index, test_index in loo.split(X): > X_train, X_test = X[train_index], X[test_index] > y_train, y_test = y[train_index], y[test_index] > lda.fit(X_train, y_train) > test_fold_predictions.append(lda.predict(X_test)) > ``` > > or you could pass the loo to the cross_val_score function directly: > > ``` > from sklearn.model_selection import cross_val_score > cross_val_score(estimator=lda, X=X, y=y, cv=loo) > ``` > > > Best, > Sebastian > > > > On Mar 7, 2017, at 10:01 AM, Serafeim Loukas wrote: > > > > Dear Mahesh, > > > > Thank you for your response. > > > > I read the documentation however I did not find anything related to > cross-validation (leave one out). > > Can you give me a hint? > > > > Thank you, > > S > > > > ............................................. > > Loukas Serafeim > > University of Geneva > > email: seralouk at gmail.com > > > > > > 2017-03-07 10:56 GMT+01:00 Mahesh Kulkarni : > > Yes. Please see following link: > > > > http://scikit-learn.org/stable/modules/generated/ > sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html > > > > On Tue, Mar 7, 2017 at 3:18 PM, Serafeim Loukas > wrote: > > Dear scikit members, > > > > > > I would like to ask if there is any function that implements Linear > Discriminant Analysis with Cross Validation (leave one out). > > > > Thank you in advance, > > S > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From msuzen at gmail.com Wed Mar 8 16:50:53 2017 From: msuzen at gmail.com (Suzen, Mehmet) Date: Wed, 8 Mar 2017 22:50:53 +0100 Subject: [scikit-learn] MAPE in scikit-learn? In-Reply-To: References: Message-ID: Hi Raghav; I suggest forecast package's code if you can read R [*]. A collection of measures related to time-series forecasting would be nice [*] Best, Mehmet [*] https://cran.r-project.org/web/packages/forecast/forecast.pdf From jlopez at ende.cc Sat Mar 11 08:04:57 2017 From: jlopez at ende.cc (=?utf-8?Q?Javier_L=C3=B3pez_Pe=C3=B1a?=) Date: Sat, 11 Mar 2017 13:04:57 +0000 Subject: [scikit-learn] Label encoding for classifiers and soft targets Message-ID: <542B0BDD-F329-4F26-9001-9F535426306C@ende.cc> Hi there! I have been recently experimenting with model regularization through the use of soft targets, and I?d like to be able to play with that from sklearn. 
The main idea is as follows: imagine I want to fit a (probabilistic) classifier with three possible targets, 0, 1, 2. If I pass my training set (X, y) to a sklearn classifier, the target vector y gets encoded so that each target becomes an array, [1, 0, 0], [0, 1, 0], or [0, 0, 1]. What I would like to do is to be able to pass the targets directly in the encoded form, and avoid any further encoding. This allows for instance to pass targets as [0.9, 0.5, 0.5] if I want to prevent my classifier from becoming too opinionated on its predicted probabilities. Ideally I would like to do something like this: ``` clf = SomeClassifier(*parameters, encode_targets=False) ``` and then call ``` clf.fit(X, encoded_y) ``` Would it be simple to modify sklearn code to do this, or would it require a lot of tinkering such as modifying every single classifier under the sun? Cheers, J From konst.katrioplas at gmail.com Sat Mar 11 08:29:30 2017 From: konst.katrioplas at gmail.com (Konstantinos Katrioplas) Date: Sat, 11 Mar 2017 15:29:30 +0200 Subject: [scikit-learn] issue suggestion - decision trees - GSoC Message-ID: <33a3a5bf-37dd-1cad-c4ae-ef4b62294a8c@gmail.com> Hello all, While I am waiting for the PR that I have submitted to be evaluated (https://github.com/scikit-learn/scikit-learn/pull/8563), would you suggest another (easy) issue for me to work on? Ideally something for which I will write some substantial code, so as to present it in my application for GSoC? Is anyone interested to mentor me in the parallelization of decision trees? I admit I am not yet really familiar with the current tree code (although I have been using the method for regression on a research project) but I am very much intrigued by the idea and willing to learn all about it until the summer. Regards, Konstantinos From gborad at gmail.com Sun Mar 12 01:38:20 2017 From: gborad at gmail.com (Gautam Borad) Date: Sun, 12 Mar 2017 12:08:20 +0530 Subject: [scikit-learn] scikit-learn Digest, Vol 12, Issue 18 In-Reply-To: References: Message-ID: On 11 Mar 2017 22:32, wrote: > Send scikit-learn mailing list submissions to > scikit-learn at python.org > > To subscribe or unsubscribe via the World Wide Web, visit > https://mail.python.org/mailman/listinfo/scikit-learn > or, via email, send a message with subject or body 'help' to > scikit-learn-request at python.org > > You can reach the person managing the list at > scikit-learn-owner at python.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of scikit-learn digest..." > > > Today's Topics: > > 1. Label encoding for classifiers and soft targets > (Javier López Peña) > 2. issue suggestion - decision trees - GSoC (Konstantinos Katrioplas) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Sat, 11 Mar 2017 13:04:57 +0000 > From: Javier López Peña > To: scikit-learn at python.org > Subject: [scikit-learn] Label encoding for classifiers and soft > targets > Message-ID: <542B0BDD-F329-4F26-9001-9F535426306C at ende.cc> > Content-Type: text/plain; charset=utf-8 > > Hi there! > > I have been recently experimenting with model regularization through the > use of soft targets, > and I'd like to be able to play with that from sklearn.
> > The main idea is as follows: imagine I want to fit a (probabilisitic) > classifier with three possible > targets, 0, 1, 2 > > If I pass my training set (X, y) to a sklearn classifier, the target > vector y gets encoded so that > each target becomes an array, [1, 0, 0], [0, 1, 0], or [0, 0, 1] > > What I would like to do is to be able to pass the targets directly in the > encoded form, and avoid > any further encoding. This allows for instance to pass targets as [0.9, > 0.5, 0.5] if I want to prevent > my classifier from becoming too opinionated on its predicted probabilities. > > Ideally I would like to do something like this: > ``` > clf = SomeClassifier(*parameters, encode_targets=False) > ``` > > and then call > ``` > elf.fit(X, encoded_y) > ``` > > Would it be simple to modify sklearn code to do this, or would it require > a lot of tinkering > such as modifying every single classifier under the sun? > > Cheers, > J > > ------------------------------ > > Message: 2 > Date: Sat, 11 Mar 2017 15:29:30 +0200 > From: Konstantinos Katrioplas > To: scikit-learn at python.org > Subject: [scikit-learn] issue suggestion - decision trees - GSoC > Message-ID: <33a3a5bf-37dd-1cad-c4ae-ef4b62294a8c at gmail.com> > Content-Type: text/plain; charset=utf-8; format=flowed > > Hello all, > > While I am waiting for the PR that I have submitted to be evaluated > (https://github.com/scikit-learn/scikit-learn/pull/8563), would you > suggest another (easy) issue for me to work on? Ideally something for > which I will write some substantial code, so as to present it in my > application for GSoC? > > Is anyone interested to mentor me in the parallelization of decision > trees? I admit I am not yet really familiar with the current tree code > (although I have been using the method for regression on a research > project) but I am very much intrigued by the idea and willing to learn > all about it until the summer. > > Regards, > Konstantinos > > > ------------------------------ > > Subject: Digest Footer > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > ------------------------------ > > End of scikit-learn Digest, Vol 12, Issue 18 > ******************************************** > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tomarshubham24 at gmail.com Sun Mar 12 09:59:35 2017 From: tomarshubham24 at gmail.com (Shubham Singh Tomar) Date: Sun, 12 Mar 2017 19:29:35 +0530 Subject: [scikit-learn] Error while using GridSearchCV. In-Reply-To: References: Message-ID: Hi, guys! Thanks for the responses. @Fernando: Yes, this code is, in fact, part of Udacity's Boston Housing project. I'm currently working on my MLE Nanodegree. I was able to modify the code to go with *sklearn.model_selection*, as you suggested. And, it's great to see you help Udacity students here as well :) Do you think we should update the code and project description in main Udacity repository to support the newer sklearn versions? On Wed, Mar 8, 2017 at 2:32 AM, Fernando Marcos Wittmann < fernando.wittmann at gmail.com> wrote: > Hey Shubham, > > I am a project reviewer at Udacity. This code seems to be part of one of > our projects (P1 - Boston Housing > ). > I think that you have updated the old module sklearn.cross_validation to > the module sklearn.model_detection, is that correct? 
If yes, then you > should also update the parameters in ShuffleSplit to match with this new > version (check the docs > ). > Try to update ShuffleSplit to the following line of code: > > cv_sets = ShuffleSplit(n_splits=10, test_size=0.2, random_state=0) > > I hope that helps! Feel free to send me a PM. > > > On Tue, Mar 7, 2017 at 10:24 AM, Shubham Singh Tomar < > tomarshubham24 at gmail.com> wrote: > >> Hi, >> >> I'm trying to use GridSearchCV to tune the parameters for >> DecisionTreeRegressor. I'm using sklearn 0.18.1 >> >> I'm getting the following error: >> >> ---------------------------------------------------------------------------TypeError Traceback (most recent call last) in () 1 # Fit the training data to the model using grid search----> 2 reg = fit_model(X_train, y_train) 3 4 # Produce the value for 'max_depth' 5 print "Parameter 'max_depth' is {} for the optimal model.".format(reg.get_params()['max_depth']) >> in fit_model(X, y) 11 12 # Create cross-validation sets from the training data---> 13 cv_sets = ShuffleSplit(X.shape[0], n_splits = 10, test_size = 0.20, random_state = 0) 14 15 # TODO: Create a decision tree regressor object >> TypeError: __init__() got multiple values for keyword argument 'n_splits' >> >> >> >> >> -- >> *Thanks,* >> *Shubham Singh Tomar* >> *Autodidact24.github.io * >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > > -- > > Fernando Marcos Wittmann > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -- *Thanks,* *Shubham Singh Tomar* *Autodidact24.github.io * -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Sun Mar 12 14:38:44 2017 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sun, 12 Mar 2017 19:38:44 +0100 Subject: [scikit-learn] Label encoding for classifiers and soft targets In-Reply-To: <542B0BDD-F329-4F26-9001-9F535426306C@ende.cc> References: <542B0BDD-F329-4F26-9001-9F535426306C@ende.cc> Message-ID: <20170312183844.GD694569@phare.normalesup.org> > Would it be simple to modify sklearn code to do this, or would it require a lot of tinkering > such as modifying every single classifier under the sun? You can use sample weights to go a bit in this direction. But in general, the mathematical meaning of your intuitions will depend on the classifier, so they will not be general ways of implementing them without a lot of tinkering. From jlopez at ende.cc Sun Mar 12 15:11:02 2017 From: jlopez at ende.cc (=?utf-8?Q?Javier_L=C3=B3pez_Pe=C3=B1a?=) Date: Sun, 12 Mar 2017 19:11:02 +0000 Subject: [scikit-learn] Label encoding for classifiers and soft targets In-Reply-To: <20170312183844.GD694569@phare.normalesup.org> References: <542B0BDD-F329-4F26-9001-9F535426306C@ende.cc> <20170312183844.GD694569@phare.normalesup.org> Message-ID: <72559155-CB35-441E-9F9D-6FD679033E17@ende.cc> > On 12 Mar 2017, at 18:38, Gael Varoquaux wrote: > > You can use sample weights to go a bit in this direction. But in general, > the mathematical meaning of your intuitions will depend on the > classifier, so they will not be general ways of implementing them without > a lot of tinkering. I see? 
to be honest for my purposes it would be enough to bypass the target binarization for the MLP classifier, so maybe I will just fork my own copy of that class for this. The purpose is two-fold, on the one hand use the probabilities generated by a very complex model (e.g. a massive ensemble) to train a simpler one that achieves comparable performance at a fraction of the cost. Any universal classifier will do (neural networks are the prime example). The second purpose it to use classes probabilities instead of observed classes at training time. In some problems this helps with model regularization (see section 6 of [1]) Cheers, J [1] https://arxiv.org/pdf/1503.02531v1.pdf -------------- next part -------------- An HTML attachment was scrubbed... URL: From fastier at linkedin.com Sun Mar 12 22:07:22 2017 From: fastier at linkedin.com (Frank Astier) Date: Sun, 12 Mar 2017 19:07:22 -0700 Subject: [scikit-learn] Differences between scikit-learn and Spark.ml for regression toy problem Message-ID: (this was also posted to stackoverflow on 03/10) I am setting up a very simple logistic regression problem in scikit-learn and in spark.ml, and the results diverge: the models they learn are different, but I can't figure out why (data is the same, model type is the same, regularization is the same...). No doubt I am missing some setting on one side or the other. Which setting? How should I set up either scikit or spark.ml to find the same model as its counterpart? I give the sklearn code and spark.ml code below. Both should be ready to cut-and-paste and run. scikit-learn code: ---------------------- import numpy as np from sklearn.linear_model import LogisticRegression, Ridge X = np.array([ [-0.7306653538519616, 0.0], [0.6750417712898752, -0.4232874171873786], [0.1863463229359709, -0.8163423997075965], [-0.6719842051493347, 0.0], [0.9699938346531928, 0.0], [0.22759406190283604, 0.0], [0.9688721028330911, 0.0], [0.5993795346650845, 0.0], [0.9219423508390701, -0.8972778242305388], [0.7006904841584055, -0.5607635619919824] ]) y = np.array([ 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0 ]) m, n = X.shape # Add intercept term to simulate inputs to GameEstimator X_with_intercept = np.hstack((X, np.ones(m)[:,np.newaxis])) l = 0.3 e = LogisticRegression( fit_intercept=False, penalty='l2', C=1/l, max_iter=100, tol=1e-11) e.fit(X_with_intercept, y) print e.coef_ # => [[ 0.98662189 0.45571052 -0.23467255]] # Linear regression is called Ridge in sklearn e = Ridge( fit_intercept=False, alpha=l, max_iter=100, tol=1e-11) e.fit(X_with_intercept, y) print e.coef_ # =>[ 0.32155545 0.17904355 0.41222418] spark.ml code: ------------------- import org.apache.spark.{SparkConf, SparkContext} import org.apache.spark.ml.classification.LogisticRegression import org.apache.spark.ml.regression.LinearRegression import org.apache.spark.mllib.linalg.Vectors import org.apache.spark.mllib.regression.LabeledPoint import org.apache.spark.sql.SQLContext object TestSparkRegression { def main(args: Array[String]): Unit = { import org.apache.log4j.{Level, Logger} Logger.getLogger("org").setLevel(Level.OFF) Logger.getLogger("akka").setLevel(Level.OFF) val conf = new SparkConf().setAppName("test").setMaster("local") val sc = new SparkContext(conf) val sparkTrainingData = new SQLContext(sc) .createDataFrame(Seq( LabeledPoint(0.0, Vectors.dense(-0.7306653538519616, 0.0)), LabeledPoint(1.0, Vectors.dense(0.6750417712898752, -0.4232874171873786)), LabeledPoint(1.0, Vectors.dense(0.1863463229359709, -0.8163423997075965)), LabeledPoint(0.0, 
Vectors.dense(-0.6719842051493347, 0.0)), LabeledPoint(1.0, Vectors.dense(0.9699938346531928, 0.0)), LabeledPoint(1.0, Vectors.dense(0.22759406190283604, 0.0)), LabeledPoint(1.0, Vectors.dense(0.9688721028330911, 0.0)), LabeledPoint(0.0, Vectors.dense(0.5993795346650845, 0.0)), LabeledPoint(0.0, Vectors.dense(0.9219423508390701, -0.8972778242305388)), LabeledPoint(0.0, Vectors.dense(0.7006904841584055, -0.5607635619919824)))) .toDF("label", "features") val logisticModel = new LogisticRegression() .setRegParam(0.3) .setLabelCol("label") .setFeaturesCol("features") .fit(sparkTrainingData) println(s"Spark logistic model coefficients: ${logisticModel.coefficients} Intercept: ${logisticModel.intercept}") // Spark logistic model coefficients: [0.5451588538376263,0.26740606573584713] Intercept: -0.13897955358689987 val linearModel = new LinearRegression() .setRegParam(0.3) .setLabelCol("label") .setFeaturesCol("features") .setSolver("l-bfgs") .fit(sparkTrainingData) println(s"Spark linear model coefficients: ${linearModel.coefficients} Intercept: ${linearModel.intercept}") // Spark linear model coefficients: [0.19852664861346023,0.11501200541407802] Intercept: 0.45464906876832323 sc.stop() } } Thanks, Frank -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.louppe at gmail.com Mon Mar 13 03:43:29 2017 From: g.louppe at gmail.com (Gilles Louppe) Date: Mon, 13 Mar 2017 08:43:29 +0100 Subject: [scikit-learn] Label encoding for classifiers and soft targets In-Reply-To: <72559155-CB35-441E-9F9D-6FD679033E17@ende.cc> References: <542B0BDD-F329-4F26-9001-9F535426306C@ende.cc> <20170312183844.GD694569@phare.normalesup.org> <72559155-CB35-441E-9F9D-6FD679033E17@ende.cc> Message-ID: Hi Javier, In the particular case of tree-based models, you case use the soft labels to create a multi-output regression problem, which would yield an equivalent classifier (one can show that reduction of variance and the gini index would yield the same trees). So basically, reg = RandomForestRegressor() reg.fit(X, encoded_y) should work. Gilles On 12 March 2017 at 20:11, Javier L?pez Pe?a wrote: > > On 12 Mar 2017, at 18:38, Gael Varoquaux > wrote: > > You can use sample weights to go a bit in this direction. But in general, > the mathematical meaning of your intuitions will depend on the > classifier, so they will not be general ways of implementing them without > a lot of tinkering. > > > I see? to be honest for my purposes it would be enough to bypass the target > binarization for > the MLP classifier, so maybe I will just fork my own copy of that class for > this. > > The purpose is two-fold, on the one hand use the probabilities generated by > a very complex > model (e.g. a massive ensemble) to train a simpler one that achieves > comparable performance at a > fraction of the cost. Any universal classifier will do (neural networks are > the prime example). > > The second purpose it to use classes probabilities instead of observed > classes at training time. 
> In some problems this helps with model regularization (see section 6 of > [1]) > > Cheers, > J > > [1] https://arxiv.org/pdf/1503.02531v1.pdf > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > From jlopez at ende.cc Mon Mar 13 08:35:22 2017 From: jlopez at ende.cc (=?utf-8?Q?Javier_L=C3=B3pez_Pe=C3=B1a?=) Date: Mon, 13 Mar 2017 12:35:22 +0000 Subject: [scikit-learn] Label encoding for classifiers and soft targets In-Reply-To: References: <542B0BDD-F329-4F26-9001-9F535426306C@ende.cc> <20170312183844.GD694569@phare.normalesup.org> <72559155-CB35-441E-9F9D-6FD679033E17@ende.cc> Message-ID: Hi Giles, thanks for the suggestion! Training a regression tree would require sticking some kind of probability normaliser at the end to ensure proper probabilities, this might somehow hurt sharpness or calibration. Unfortunately, one of the things I am trying to do with this is moving away from RF and they humongous memory requirements? Anyway, I think I have a fairly good idea on how to modify the MLPClassifier to get what I need. When I get around to do it I?ll drop a line to see if there might be any interest on pushing the code upstream. Cheers, J > On 13 Mar 2017, at 07:43, Gilles Louppe wrote: > > Hi Javier, > > In the particular case of tree-based models, you case use the soft > labels to create a multi-output regression problem, which would yield > an equivalent classifier (one can show that reduction of variance and > the gini index would yield the same trees). > > So basically, > > reg = RandomForestRegressor() > reg.fit(X, encoded_y) > > should work. > > Gilles From stuart at stuartreynolds.net Mon Mar 13 12:57:56 2017 From: stuart at stuartreynolds.net (Stuart Reynolds) Date: Mon, 13 Mar 2017 09:57:56 -0700 Subject: [scikit-learn] Logistic regression with elastic net regularization Message-ID: Is there an implementation of logistic regression with elastic net regularization in scikit? (or pointers on implementing this - its seems non-convex and so you might expect poor behavior with some optimizers) - Stuart -------------- next part -------------- An HTML attachment was scrubbed... URL: From jmschreiber91 at gmail.com Mon Mar 13 13:04:28 2017 From: jmschreiber91 at gmail.com (Jacob Schreiber) Date: Mon, 13 Mar 2017 10:04:28 -0700 Subject: [scikit-learn] Logistic regression with elastic net regularization In-Reply-To: References: Message-ID: Hi Stuart Take a look at this issue: https://github.com/scikit-learn/scikit-learn/issues/2968 On Mon, Mar 13, 2017 at 9:57 AM, Stuart Reynolds wrote: > Is there an implementation of logistic regression with elastic net > regularization in scikit? > (or pointers on implementing this - its seems non-convex and so you might > expect poor behavior with some optimizers) > > > - Stuart > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stuart at stuartreynolds.net Mon Mar 13 13:06:08 2017 From: stuart at stuartreynolds.net (Stuart Reynolds) Date: Mon, 13 Mar 2017 10:06:08 -0700 Subject: [scikit-learn] Differences between scikit-learn and Spark.ml for regression toy problem In-Reply-To: References: Message-ID: Both libraries are heavily parameterized. You should check what the defaults are for both. 
Some ideas: - What regularization is being used. L1/L2? - Does the regularization parameter have the same interpretation? 1/C = lambda? Some libraries use C. Some use lambda. - Also, some libraries regularize the intercept (scikit), other do not. (It doesn't seem like a particularly good idea to regularize the intercept if your optimizer permits not doing it). On Sun, Mar 12, 2017 at 7:07 PM, Frank Astier via scikit-learn < scikit-learn at python.org> wrote: > (this was also posted to stackoverflow on 03/10) > > I am setting up a very simple logistic regression problem in scikit-learn > and in spark.ml, and the results diverge: the models they learn are > different, but I can't figure out why (data is the same, model type is the > same, regularization is the same...). > > No doubt I am missing some setting on one side or the other. Which > setting? How should I set up either scikit or spark.ml to find the same > model as its counterpart? > > I give the sklearn code and spark.ml code below. Both should be ready to > cut-and-paste and run. > > scikit-learn code: > ---------------------- > > import numpy as np > from sklearn.linear_model import LogisticRegression, Ridge > > X = np.array([ > [-0.7306653538519616, 0.0], > [0.6750417712898752, -0.4232874171873786], > [0.1863463229359709, -0.8163423997075965], > [-0.6719842051493347, 0.0], > [0.9699938346531928, 0.0], > [0.22759406190283604, 0.0], > [0.9688721028330911, 0.0], > [0.5993795346650845, 0.0], > [0.9219423508390701, -0.8972778242305388], > [0.7006904841584055, -0.5607635619919824] > ]) > > y = np.array([ > 0.0, > 1.0, > 1.0, > 0.0, > 1.0, > 1.0, > 1.0, > 0.0, > 0.0, > 0.0 > ]) > > m, n = X.shape > > # Add intercept term to simulate inputs to GameEstimator > X_with_intercept = np.hstack((X, np.ones(m)[:,np.newaxis])) > > l = 0.3 > e = LogisticRegression( > fit_intercept=False, > penalty='l2', > C=1/l, > max_iter=100, > tol=1e-11) > > e.fit(X_with_intercept, y) > > print e.coef_ > # => [[ 0.98662189 0.45571052 -0.23467255]] > > # Linear regression is called Ridge in sklearn > e = Ridge( > fit_intercept=False, > alpha=l, > max_iter=100, > tol=1e-11) > > e.fit(X_with_intercept, y) > > print e.coef_ > # =>[ 0.32155545 0.17904355 0.41222418] > > spark.ml code: > ------------------- > > import org.apache.spark.{SparkConf, SparkContext} > import org.apache.spark.ml.classification.LogisticRegression > import org.apache.spark.ml.regression.LinearRegression > import org.apache.spark.mllib.linalg.Vectors > import org.apache.spark.mllib.regression.LabeledPoint > import org.apache.spark.sql.SQLContext > > object TestSparkRegression { > def main(args: Array[String]): Unit = { > import org.apache.log4j.{Level, Logger} > > Logger.getLogger("org").setLevel(Level.OFF) > Logger.getLogger("akka").setLevel(Level.OFF) > > val conf = new SparkConf().setAppName("test").setMaster("local") > val sc = new SparkContext(conf) > > val sparkTrainingData = new SQLContext(sc) > .createDataFrame(Seq( > LabeledPoint(0.0, Vectors.dense(-0.7306653538519616, 0.0)), > LabeledPoint(1.0, Vectors.dense(0.6750417712898752, > -0.4232874171873786)), > LabeledPoint(1.0, Vectors.dense(0.1863463229359709, > -0.8163423997075965)), > LabeledPoint(0.0, Vectors.dense(-0.6719842051493347, 0.0)), > LabeledPoint(1.0, Vectors.dense(0.9699938346531928, 0.0)), > LabeledPoint(1.0, Vectors.dense(0.22759406190283604, 0.0)), > LabeledPoint(1.0, Vectors.dense(0.9688721028330911, 0.0)), > LabeledPoint(0.0, Vectors.dense(0.5993795346650845, 0.0)), > LabeledPoint(0.0, Vectors.dense(0.9219423508390701, 
> -0.8972778242305388)), > LabeledPoint(0.0, Vectors.dense(0.7006904841584055, > -0.5607635619919824)))) > .toDF("label", "features") > > val logisticModel = new LogisticRegression() > .setRegParam(0.3) > .setLabelCol("label") > .setFeaturesCol("features") > .fit(sparkTrainingData) > > println(s"Spark logistic model coefficients: > ${logisticModel.coefficients} Intercept: ${logisticModel.intercept}") > // Spark logistic model coefficients: [0.5451588538376263,0.26740606573584713] > Intercept: -0.13897955358689987 > > val linearModel = new LinearRegression() > .setRegParam(0.3) > .setLabelCol("label") > .setFeaturesCol("features") > .setSolver("l-bfgs") > .fit(sparkTrainingData) > > println(s"Spark linear model coefficients: > ${linearModel.coefficients} Intercept: ${linearModel.intercept}") > // Spark linear model coefficients: [0.19852664861346023,0.11501200541407802] > Intercept: 0.45464906876832323 > > sc.stop() > } > } > > Thanks, > > Frank > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.lemaitre58 at gmail.com Mon Mar 13 13:07:55 2017 From: g.lemaitre58 at gmail.com (Guillaume Lemaitre) Date: Mon, 13 Mar 2017 18:07:55 +0100 Subject: [scikit-learn] Logistic regression with elastic net regularization In-Reply-To: (Stuart Reynolds's message of "Mon, 13 Mar 2017 09:57:56 -0700") References: Message-ID: <874lyx9kb8.fsf@gmail.com> Recently, there are some issues/PRs tackling the topic: https://github.com/scikit-learn/scikit-learn/issues/8288 https://github.com/scikit-learn/scikit-learn/issues/8446 From stuart at stuartreynolds.net Mon Mar 13 13:07:57 2017 From: stuart at stuartreynolds.net (Stuart Reynolds) Date: Mon, 13 Mar 2017 10:07:57 -0700 Subject: [scikit-learn] Logistic regression with elastic net regularization In-Reply-To: References: Message-ID: Perfect. Thanks -- will give it a go. On Mon, Mar 13, 2017 at 10:04 AM, Jacob Schreiber wrote: > Hi Stuart > > Take a look at this issue: https://github.com/scikit-learn/scikit-learn/ > issues/2968 > > On Mon, Mar 13, 2017 at 9:57 AM, Stuart Reynolds < > stuart at stuartreynolds.net> wrote: > >> Is there an implementation of logistic regression with elastic net >> regularization in scikit? >> (or pointers on implementing this - its seems non-convex and so you might >> expect poor behavior with some optimizers) >> >> >> - Stuart >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From se.raschka at gmail.com Mon Mar 13 13:08:07 2017 From: se.raschka at gmail.com (Sebastian Raschka) Date: Mon, 13 Mar 2017 13:08:07 -0400 Subject: [scikit-learn] Logistic regression with elastic net regularization In-Reply-To: References: Message-ID: <98AA67A8-71D6-402C-8F99-5CAB64D28525@gmail.com> Hi, Stuart, I think the only way to do that right now would be through the SGD classifier, e.g., sklearn.linear_model.SGDClassifier(loss='log', penalty='elasticnet' ?) 
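For instance, a minimal sketch on toy data (the hyperparameters are illustrative, not tuned, and in practice the features should be standardized first; with loss='log' the fitted model also exposes predict_proba):

```
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Toy data purely for illustration
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Logistic loss + elastic net penalty: alpha sets the overall regularization
# strength, l1_ratio the mix between the L1 and L2 terms.
clf = SGDClassifier(loss='log', penalty='elasticnet',
                    alpha=1e-4, l1_ratio=0.15, random_state=0)
clf.fit(X, y)
print(clf.predict_proba(X[:5]))
```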
Best, Sebastian > On Mar 13, 2017, at 12:57 PM, Stuart Reynolds wrote: > > Is there an implementation of logistic regression with elastic net regularization in scikit? > (or pointers on implementing this - its seems non-convex and so you might expect poor behavior with some optimizers) > > > - Stuart > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From t3kcit at gmail.com Mon Mar 13 17:17:22 2017 From: t3kcit at gmail.com (Andreas Mueller) Date: Mon, 13 Mar 2017 17:17:22 -0400 Subject: [scikit-learn] Label encoding for classifiers and soft targets In-Reply-To: <72559155-CB35-441E-9F9D-6FD679033E17@ende.cc> References: <542B0BDD-F329-4F26-9001-9F535426306C@ende.cc> <20170312183844.GD694569@phare.normalesup.org> <72559155-CB35-441E-9F9D-6FD679033E17@ende.cc> Message-ID: On 03/12/2017 03:11 PM, Javier L?pez Pe?a wrote: > The purpose is two-fold, on the one hand use the probabilities > generated by a very complex > model (e.g. a massive ensemble) to train a simpler one that achieves > comparable performance at a > fraction of the cost. Any universal classifier will do (neural > networks are the prime example). You could use a regression model with a logistic sigmoid in the output layer. From t3kcit at gmail.com Mon Mar 13 17:18:33 2017 From: t3kcit at gmail.com (Andreas Mueller) Date: Mon, 13 Mar 2017 17:18:33 -0400 Subject: [scikit-learn] Label encoding for classifiers and soft targets In-Reply-To: References: <542B0BDD-F329-4F26-9001-9F535426306C@ende.cc> <20170312183844.GD694569@phare.normalesup.org> <72559155-CB35-441E-9F9D-6FD679033E17@ende.cc> Message-ID: <0520bb5d-6d1c-e14e-ed26-fcef4725d167@gmail.com> On 03/13/2017 08:35 AM, Javier L?pez Pe?a wrote: > Training a regression tree would require sticking some kind of > probability normaliser at the end to ensure proper probabilities, > this might somehow hurt sharpness or calibration. No, if all the samples are normalized and your aggregation function is sane (like the mean), the output will also be normalized. From jlopez at ende.cc Mon Mar 13 17:54:24 2017 From: jlopez at ende.cc (=?windows-1252?Q?Javier_L=F3pez_Pe=F1a?=) Date: Mon, 13 Mar 2017 21:54:24 +0000 Subject: [scikit-learn] Label encoding for classifiers and soft targets In-Reply-To: References: <542B0BDD-F329-4F26-9001-9F535426306C@ende.cc> <20170312183844.GD694569@phare.normalesup.org> <72559155-CB35-441E-9F9D-6FD679033E17@ende.cc> Message-ID: > You could use a regression model with a logistic sigmoid in the output layer. By training a regression network with logistic activation the outputs do not add to 1. I just checked on a minimal example on the iris dataset. From jlopez at ende.cc Mon Mar 13 17:56:14 2017 From: jlopez at ende.cc (=?utf-8?Q?Javier_L=C3=B3pez_Pe=C3=B1a?=) Date: Mon, 13 Mar 2017 21:56:14 +0000 Subject: [scikit-learn] Label encoding for classifiers and soft targets In-Reply-To: <0520bb5d-6d1c-e14e-ed26-fcef4725d167@gmail.com> References: <542B0BDD-F329-4F26-9001-9F535426306C@ende.cc> <20170312183844.GD694569@phare.normalesup.org> <72559155-CB35-441E-9F9D-6FD679033E17@ende.cc> <0520bb5d-6d1c-e14e-ed26-fcef4725d167@gmail.com> Message-ID: <4D74F250-C79C-4900-8670-5C420B620C2B@ende.cc> > On 13 Mar 2017, at 21:18, Andreas Mueller wrote: > > No, if all the samples are normalized and your aggregation function is sane (like the mean), the output will also be normalised. You are completely right, I hadn?t checked this for random forests. 
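A toy check of that normalization point, with iris standing in for the real problem: a single regression tree plays the simpler student model, and the forest below merely plays the role of the large teacher ensemble.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeRegressor

X, y = load_iris(return_X_y=True)

teacher = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
soft_targets = teacher.predict_proba(X)      # shape (n_samples, n_classes), rows sum to 1

student = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, soft_targets)
pred = student.predict(X)

print(pred.sum(axis=1)[:5])   # ~1.0 for every row: leaf means of normalized rows stay normalized

A student whose per-class outputs are produced independently (an MLPRegressor, say) does not preserve this, which is where an explicit softmax or renormalization step would come in.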
Still, my purpose is to reduce model complexity, and RF require too much memory to be used in my production environment. From ssaligra at hawk.iit.edu Mon Mar 13 18:29:10 2017 From: ssaligra at hawk.iit.edu (Shreyas Saligrama chandrakan) Date: Mon, 13 Mar 2017 15:29:10 -0700 Subject: [scikit-learn] GSoc, 2017 (proposal idea and intro) .reg In-Reply-To: References: Message-ID: Hi, Is it possible for me to contribute a library to introduce SVM's with tree kernel (like current available one in svmlight) which is currently missing in scikit-learn? Best, Shreyas On 5 Mar 2017 11:03 a.m., "Andreas Mueller" wrote: > There was a PR here: > https://github.com/scikit-learn/scikit-learn/pull/5530 > > but it didn't seem to work. Feel free to convince us otherwise ;) > > > On 03/02/2017 08:23 PM, SHUBHAM BHARDWAJ 15BCE0704 wrote: > > Hello Sir, > Very Sorry for the numbers I saw this written in the comments.I assumed > -Given the person who suggested the paper might have taken a look into the > number of citations.I will make sure to personally check myself. > > Regards > Shubham Bhardwaj > > On Fri, Mar 3, 2017 at 6:40 AM, Guillaume Lema?tre > wrote: > >> I think that you mean this paper -> Scalable K-Means++ -> 218 citations >> >> On 3 March 2017 at 02:00, SHUBHAM BHARDWAJ 15BCE0704 < >> shubham.bhardwaj2015 at vit.ac.in> wrote: >> >>> Hello Sir, >>> >>> Thanks a lot for the reply. Sorry for not being elaborate about what I >>> was trying to address. I wanted to implement this [ >>> http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf] (1200+citations)- >>> mentioned in comments. This pertains to the stalled issue #4357 .Proposal >>> idea - implementing a scalable kmeans++. >>> >>> Regards >>> Shubham Bhardwaj >>> >>> On Fri, Mar 3, 2017 at 12:01 AM, Jacob Schreiber < >>> jmschreiber91 at gmail.com> wrote: >>> >>>> Hi Shubham >>>> >>>> Thanks for your interest. I'm eager to see your contributions to >>>> sklearn in the future. However, I'm pretty sure kmeans++ is already >>>> implemented: http://scikit-learn.org/stable/modules/generate >>>> d/sklearn.cluster.KMeans.html >>>> >>>> Jacob >>>> >>>> On Thu, Mar 2, 2017 at 1:07 AM, SHUBHAM BHARDWAJ 15BCE0704 < >>>> shubham.bhardwaj2015 at vit.ac.in> wrote: >>>> >>>>> Hello Sir, >>>>> >>>>> My introduction : >>>>> I am a 2nd year student studying Computer Science and engineering from >>>>> VIT, Vellore. I work in Google Developers Group VIT. All my experience has >>>>> been about collaborating with a lot of people ,working as a team, building >>>>> products and learning along the way. >>>>> Since scikit-learn is participating this time I am too planning to >>>>> submit a proposal. >>>>> >>>>> Proposal idea: >>>>> I am really interested in implementing kmeans++ algorithm.I was doing >>>>> some work on DT but I found this very appealing. Just wanted to know if it >>>>> can be a good project idea. 
>>>>> >>>>> Regards >>>>> Shubham Bhardwaj >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>> >>>>> >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >> >> >> -- >> Guillaume Lemaitre >> INRIA Saclay - Ile-de-France >> Equipe PARIETAL >> guillaume.lemaitre at inria.f r --- >> https://glemaitre.github.io/ >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > > _______________________________________________ > scikit-learn mailing listscikit-learn at python.orghttps://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stuart at stuartreynolds.net Tue Mar 14 12:17:14 2017 From: stuart at stuartreynolds.net (Stuart Reynolds) Date: Tue, 14 Mar 2017 09:17:14 -0700 Subject: [scikit-learn] Logistic regression with elastic net regularization In-Reply-To: <98AA67A8-71D6-402C-8F99-5CAB64D28525@gmail.com> References: <98AA67A8-71D6-402C-8F99-5CAB64D28525@gmail.com> Message-ID: Many thanks. On Mon, Mar 13, 2017 at 10:08 AM, Sebastian Raschka wrote: > Hi, Stuart, > I think the only way to do that right now would be through the SGD > classifier, e.g., > > sklearn.linear_model.SGDClassifier(loss='log', penalty='elasticnet' ?) > > Best, > Sebastian > > > On Mar 13, 2017, at 12:57 PM, Stuart Reynolds > wrote: > > > > Is there an implementation of logistic regression with elastic net > regularization in scikit? > > (or pointers on implementing this - its seems non-convex and so you > might expect poor behavior with some optimizers) > > > > > > - Stuart > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From olivier.grisel at ensta.org Tue Mar 14 16:39:39 2017 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Tue, 14 Mar 2017 21:39:39 +0100 Subject: [scikit-learn] Logistic regression with elastic net regularization In-Reply-To: References: <98AA67A8-71D6-402C-8F99-5CAB64D28525@gmail.com> Message-ID: Note that SGD is not very good at optimizing finely with a non-smooth penalty (e.g. l1 or elasticnet). The future SAGA solver is going to be much better at finding the optimal sparsity support (although this support is not guaranteed to be stable across re-sampling of the training set if the training set is small). 
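A rough way to see this on synthetic data; the regularization strengths are arbitrary, and liblinear's L1 solver stands in for a solver that does drive coefficients exactly to zero:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, SGDClassifier

X, y = make_classification(n_samples=1000, n_features=50, n_informative=5,
                           random_state=0)

sgd = SGDClassifier(loss='log', penalty='elasticnet', alpha=0.01,
                    l1_ratio=0.9, random_state=0).fit(X, y)
lib = LogisticRegression(penalty='l1', solver='liblinear', C=0.1).fit(X, y)

print('exact zeros, SGD elasticnet:', np.sum(sgd.coef_ == 0))
print('exact zeros, liblinear L1:  ', np.sum(lib.coef_ == 0))

Repeating the fit on a few resampled training sets and counting which coefficients stay at zero gives a feel for how stable the selected support is.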
-- Olivier From olivier.grisel at ensta.org Tue Mar 14 16:41:29 2017 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Tue, 14 Mar 2017 21:41:29 +0100 Subject: [scikit-learn] Logistic regression with elastic net regularization In-Reply-To: References: <98AA67A8-71D6-402C-8F99-5CAB64D28525@gmail.com> Message-ID: >From a generalization point of view (test accuracy), the optimal sparsity support should not matter much though, but it can be helpful to find a the optimally sparsest solution for either computational constraints (smaller models with a lower prediction latency) and interpretation of the weights (domain specific). -- Olivier From karandesai281196 at gmail.com Wed Mar 15 04:48:28 2017 From: karandesai281196 at gmail.com (Karan Desai) Date: Wed, 15 Mar 2017 14:18:28 +0530 Subject: [scikit-learn] [GSoC 2017] First Draft, request for suggestions - Improve Online Learning of Linear Models. Message-ID: Hello developers, I'm Karan Desai, an Electrical Engineering Undergraduate at IIT Roorkee. I was following the community since October and initially planned to work on Pytest Migration idea. But on meticulous discussions, it was concluded that the migration task might be short for a three month wide timeline. Besides work is in progress on that. I particularly found the first project idea appealing, and went about gathering ingredients to make the perfect recipe for summers. Finally I can outline it as stated below. The description was quite short, so I will be happy to include more in it if need be. 1. There's a gradient descent optimizer, but I could not find an optimizer for adaptive learning strategies (I saw a method for adam in MLP though). So adding that can be a part of my project. 2. I looked into benchmarks directory, and checked a comparison of SGD against coordinate descent and ridge regression. Similar type of benching should be done with this new Optimizer/s as well. 3. There's a lack of multinomial logloss as mentioned in description (categorical cross entropy for classification tasks). I can work on adding that as well. As an addition, I can work on KL divergence, poisson and cosine proximity losses, to name a few. In my opinion, these are pretty standard and can be a nice to have. They already exist as metrics, just need to be ported to Cython and used as an optimization objective for linear classifiers. 4. About a tool to anneal learning rate: I suggest a new approach to look at this - as a callback. I searched through the documentation and I could not find this way of handling tidbits during training of models. We should be able to provide a callback to the constructor of a linear model which can do any dedicated job after every epoch, be it learning rate annealing, saving model checkpoint, getting custom verbose output, or as creative as uploading data to server for real time plots on any website. If this gets working in place, we can generalize this to many classes of scikit-learn. As a part of my project, I am planning to enrich scikit-learn to be shipping some ready made callback helpers for easy plug and play. I am still not sure whether this is sufficient for a three months timeline, because I am assuming the review cycles might take slightly longer time because of scikit-learn being such a huge community. As far as the math is concerned, I have searched for some good references, some of which are listed below: 1. First two points will heavily rely on @mblondel's lightning package, and this blog post: http://sebastianruder.com/optimizing-gradient-descent/ 2. 
For the losses (third point), I have seen the way existing losses are written in cython, as well as in the metrics submodule. That should help a lot. 3. About the fourth point, first of all I would be happy to get some suggestions from the community. Once satisfied, I should implement a very basic prototype with some existing class, maybe convert verbose logging of some class to a callback structure. Will include that in the second draft of proposal which would be a preliminary version of what I shall submit on GSoC website. More about me: 1. Github Profile: https://www.github.com/karandesai-96 2. GSoC 2016 Project: https://goo.gl/mdFZ6m 3. Joblib Contributions: https://git.io/vyMSx 4. Scikit-learn Contributions: https://git.io/vyMSF I'll be eagerly waiting for feedback. Thanks. Regards, Karan Desai, Department of Electrical Engineering, IIT Roorkee, India. -------------- next part -------------- An HTML attachment was scrubbed... URL: From t3kcit at gmail.com Wed Mar 15 10:42:58 2017 From: t3kcit at gmail.com (Andreas Mueller) Date: Wed, 15 Mar 2017 10:42:58 -0400 Subject: [scikit-learn] Label encoding for classifiers and soft targets In-Reply-To: References: <542B0BDD-F329-4F26-9001-9F535426306C@ende.cc> <20170312183844.GD694569@phare.normalesup.org> <72559155-CB35-441E-9F9D-6FD679033E17@ende.cc> Message-ID: On 03/13/2017 05:54 PM, Javier L?pez Pe?a wrote: >> You could use a regression model with a logistic sigmoid in the output layer. > By training a regression network with logistic activation the outputs do not add to 1. > I just checked on a minimal example on the iris dataset. Sorry meant softmax ;) From t3kcit at gmail.com Wed Mar 15 10:48:23 2017 From: t3kcit at gmail.com (Andreas Mueller) Date: Wed, 15 Mar 2017 10:48:23 -0400 Subject: [scikit-learn] [GSoC 2017] First Draft, request for suggestions - Improve Online Learning of Linear Models. In-Reply-To: References: Message-ID: On 03/15/2017 04:48 AM, Karan Desai wrote: > 4. About a tool to anneal learning rate: I suggest a new approach to > look at this - as a callback. I searched through the documentation and > I could not find this way of handling tidbits during training of > models. We should be able to provide a callback to the constructor of > a linear model which can do any dedicated job after every epoch, be it > learning rate annealing, saving model checkpoint, getting custom > verbose output, or as creative as uploading data to server for real > time plots on any website. There has been some effort on doing adagrad but it was ultimately discontinued, I think. There was quite a bit of complexity to handle. The problem with callbacks is that for callbacks on each iteration to be feasible, they need to be cython functions. Otherwise they will be too slow. You could do python callbacks, but they could not be called at every iteration, and so they wouldn't be suitable to implement something like adagrad or adam. Best, Andy From shubham.bhardwaj2015 at vit.ac.in Wed Mar 15 13:28:00 2017 From: shubham.bhardwaj2015 at vit.ac.in (SHUBHAM BHARDWAJ 15BCE0704) Date: Wed, 15 Mar 2017 22:58:00 +0530 Subject: [scikit-learn] GSoc, 2017 (proposal idea and intro) .reg In-Reply-To: References: Message-ID: Hello Sir, Greetings. I have coded a sequential version of Scalable Kmeans++ (#8585) and have included a test script for testing it in the pr's description. https://github.com/scikit-learn/scikit-learn/pull/8585. 
Regards Shubham Bhardwaj On Tue, Mar 14, 2017 at 3:59 AM, Shreyas Saligrama chandrakan < ssaligra at hawk.iit.edu> wrote: > Hi, > > Is it possible for me to contribute a library to introduce SVM's with tree > kernel (like current available one in svmlight) which is currently missing > in scikit-learn? > > Best, > Shreyas > > On 5 Mar 2017 11:03 a.m., "Andreas Mueller" wrote: > >> There was a PR here: >> https://github.com/scikit-learn/scikit-learn/pull/5530 >> >> but it didn't seem to work. Feel free to convince us otherwise ;) >> >> >> On 03/02/2017 08:23 PM, SHUBHAM BHARDWAJ 15BCE0704 wrote: >> >> Hello Sir, >> Very Sorry for the numbers I saw this written in the comments.I assumed >> -Given the person who suggested the paper might have taken a look into the >> number of citations.I will make sure to personally check myself. >> >> Regards >> Shubham Bhardwaj >> >> On Fri, Mar 3, 2017 at 6:40 AM, Guillaume Lema?tre < >> g.lemaitre58 at gmail.com> wrote: >> >>> I think that you mean this paper -> Scalable K-Means++ -> 218 citations >>> >>> On 3 March 2017 at 02:00, SHUBHAM BHARDWAJ 15BCE0704 < >>> shubham.bhardwaj2015 at vit.ac.in> wrote: >>> >>>> Hello Sir, >>>> >>>> Thanks a lot for the reply. Sorry for not being elaborate about what I >>>> was trying to address. I wanted to implement this [ >>>> http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf] (1200+citations)- >>>> mentioned in comments. This pertains to the stalled issue #4357 .Proposal >>>> idea - implementing a scalable kmeans++. >>>> >>>> Regards >>>> Shubham Bhardwaj >>>> >>>> On Fri, Mar 3, 2017 at 12:01 AM, Jacob Schreiber < >>>> jmschreiber91 at gmail.com> wrote: >>>> >>>>> Hi Shubham >>>>> >>>>> Thanks for your interest. I'm eager to see your contributions to >>>>> sklearn in the future. However, I'm pretty sure kmeans++ is already >>>>> implemented: http://scikit-learn.org/stable/modules/generate >>>>> d/sklearn.cluster.KMeans.html >>>>> >>>>> Jacob >>>>> >>>>> On Thu, Mar 2, 2017 at 1:07 AM, SHUBHAM BHARDWAJ 15BCE0704 < >>>>> shubham.bhardwaj2015 at vit.ac.in> wrote: >>>>> >>>>>> Hello Sir, >>>>>> >>>>>> My introduction : >>>>>> I am a 2nd year student studying Computer Science and engineering >>>>>> from VIT, Vellore. I work in Google Developers Group VIT. All my experience >>>>>> has been about collaborating with a lot of people ,working as a team, >>>>>> building products and learning along the way. >>>>>> Since scikit-learn is participating this time I am too planning to >>>>>> submit a proposal. >>>>>> >>>>>> Proposal idea: >>>>>> I am really interested in implementing kmeans++ algorithm.I was doing >>>>>> some work on DT but I found this very appealing. Just wanted to know if it >>>>>> can be a good project idea. 
>>>>>> >>>>>> Regards >>>>>> Shubham Bhardwaj >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> scikit-learn mailing list >>>>>> scikit-learn at python.org >>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>> >>>>>> >>>>> >>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>> >>>>> >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> >>> >>> >>> -- >>> Guillaume Lemaitre >>> INRIA Saclay - Ile-de-France >>> Equipe PARIETAL >>> guillaume.lemaitre at inria.f r --- >>> https://glemaitre.github.io/ >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >> >> >> _______________________________________________ >> scikit-learn mailing listscikit-learn at python.orghttps://mail.python.org/mailman/listinfo/scikit-learn >> >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From skacanski at gmail.com Wed Mar 15 21:20:55 2017 From: skacanski at gmail.com (Sasha Kacanski) Date: Wed, 15 Mar 2017 21:20:55 -0400 Subject: [scikit-learn] best way to scale on the random forest for text w bag of words ... Message-ID: Hi, As soon as number of trees and features goes higher, 70Gb of ram is gone and i am getting out of memory errors. file size is 700Mb. Dataframe quickly shrinks from 14 to 2 columns but there is ton of text ... with 10 estimators and 100 features per word I can't tackle ~900 k of records ... Training set, about 15% of data does perfectly fine but when test come that is it. i can split stuff and multiprocess it but I believe that will simply skew results... Any ideas? -- Aleksandar Kacanski -------------- next part -------------- An HTML attachment was scrubbed... URL: From joel.nothman at gmail.com Wed Mar 15 21:44:05 2017 From: joel.nothman at gmail.com (Joel Nothman) Date: Thu, 16 Mar 2017 12:44:05 +1100 Subject: [scikit-learn] best way to scale on the random forest for text w bag of words ... In-Reply-To: References: Message-ID: Trees are not a traditional choice for bag of words models, but you should make sure you are at least using the parameters of the random forest to limit the size (depth, branching) of the trees. On 16 March 2017 at 12:20, Sasha Kacanski wrote: > Hi, > As soon as number of trees and features goes higher, 70Gb of ram is gone > and i am getting out of memory errors. > file size is 700Mb. Dataframe quickly shrinks from 14 to 2 columns but > there is ton of text ... > with 10 estimators and 100 features per word I can't tackle ~900 k of > records ... > Training set, about 15% of data does perfectly fine but when test come > that is it. > > i can split stuff and multiprocess it but I believe that will simply skew > results... > > Any ideas? 
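A sketch of the kind of capping Joel describes, on toy documents; every number below is an illustrative placeholder rather than a tuned value:

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

docs = ["free money now", "meeting at noon", "cheap pills", "project update"] * 100
labels = [1, 0, 1, 0] * 100

model = make_pipeline(
    TfidfVectorizer(max_features=20000),        # cap the vocabulary size
    RandomForestClassifier(n_estimators=50,
                           max_depth=20,        # limit tree depth
                           max_leaf_nodes=2000, # and the number of leaves per tree
                           max_features='sqrt', # fewer features considered per split
                           n_jobs=1,            # avoid per-process copies of the data
                           random_state=0),
)
model.fit(docs, labels)
print(model.score(docs, labels))

A HashingVectorizer would bound memory on the vectorizer side as well; and, as Joel notes, linear models are the more traditional choice for bag-of-words features.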
> > > -- > Aleksandar Kacanski > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From se.raschka at gmail.com Wed Mar 15 23:27:08 2017 From: se.raschka at gmail.com (Sebastian Raschka) Date: Wed, 15 Mar 2017 23:27:08 -0400 Subject: [scikit-learn] Differences between scikit-learn and Spark.ml for regression toy problem In-Reply-To: References: Message-ID: <2F90C7CA-32D1-48A6-9C44-A658A92FF97F@gmail.com> I think the liblinear solver (default in LogisticRegression) does regularize the bias. So, even if both solutions (sklearn, spark) anneal to the global cost optimum, the model parameters would be different. Maybe a better way to make that comparison would be to turn off regularization completely for now. And when you run the LogisticRegression, maybe run it multiple times with different random seeds to see if your solutions are generally stable. Best, Sebastian > On Mar 13, 2017, at 1:06 PM, Stuart Reynolds wrote: > > Both libraries are heavily parameterized. You should check what the defaults are for both. > > Some ideas: > - What regularization is being used. L1/L2? > - Does the regularization parameter have the same interpretation? 1/C = lambda? Some libraries use C. Some use lambda. > - Also, some libraries regularize the intercept (scikit), other do not. (It doesn't seem like a particularly good idea to regularize the intercept if your optimizer permits not doing it). > > > > On Sun, Mar 12, 2017 at 7:07 PM, Frank Astier via scikit-learn wrote: > (this was also posted to stackoverflow on 03/10) > > I am setting up a very simple logistic regression problem in scikit-learn and in spark.ml, and the results diverge: the models they learn are different, but I can't figure out why (data is the same, model type is the same, regularization is the same...). > > No doubt I am missing some setting on one side or the other. Which setting? How should I set up either scikit or spark.ml to find the same model as its counterpart? > > I give the sklearn code and spark.ml code below. Both should be ready to cut-and-paste and run. 
> > scikit-learn code: > ---------------------- > > import numpy as np > from sklearn.linear_model import LogisticRegression, Ridge > > X = np.array([ > [-0.7306653538519616, 0.0], > [0.6750417712898752, -0.4232874171873786], > [0.1863463229359709, -0.8163423997075965], > [-0.6719842051493347, 0.0], > [0.9699938346531928, 0.0], > [0.22759406190283604, 0.0], > [0.9688721028330911, 0.0], > [0.5993795346650845, 0.0], > [0.9219423508390701, -0.8972778242305388], > [0.7006904841584055, -0.5607635619919824] > ]) > > y = np.array([ > 0.0, > 1.0, > 1.0, > 0.0, > 1.0, > 1.0, > 1.0, > 0.0, > 0.0, > 0.0 > ]) > > m, n = X.shape > > # Add intercept term to simulate inputs to GameEstimator > X_with_intercept = np.hstack((X, np.ones(m)[:,np.newaxis])) > > l = 0.3 > e = LogisticRegression( > fit_intercept=False, > penalty='l2', > C=1/l, > max_iter=100, > tol=1e-11) > > e.fit(X_with_intercept, y) > > print e.coef_ > # => [[ 0.98662189 0.45571052 -0.23467255]] > > # Linear regression is called Ridge in sklearn > e = Ridge( > fit_intercept=False, > alpha=l, > max_iter=100, > tol=1e-11) > > e.fit(X_with_intercept, y) > > print e.coef_ > # =>[ 0.32155545 0.17904355 0.41222418] > > spark.ml code: > ------------------- > > import org.apache.spark.{SparkConf, SparkContext} > import org.apache.spark.ml.classification.LogisticRegression > import org.apache.spark.ml.regression.LinearRegression > import org.apache.spark.mllib.linalg.Vectors > import org.apache.spark.mllib.regression.LabeledPoint > import org.apache.spark.sql.SQLContext > > object TestSparkRegression { > def main(args: Array[String]): Unit = { > import org.apache.log4j.{Level, Logger} > > Logger.getLogger("org").setLevel(Level.OFF) > Logger.getLogger("akka").setLevel(Level.OFF) > > val conf = new SparkConf().setAppName("test").setMaster("local") > val sc = new SparkContext(conf) > > val sparkTrainingData = new SQLContext(sc) > .createDataFrame(Seq( > LabeledPoint(0.0, Vectors.dense(-0.7306653538519616, 0.0)), > LabeledPoint(1.0, Vectors.dense(0.6750417712898752, -0.4232874171873786)), > LabeledPoint(1.0, Vectors.dense(0.1863463229359709, -0.8163423997075965)), > LabeledPoint(0.0, Vectors.dense(-0.6719842051493347, 0.0)), > LabeledPoint(1.0, Vectors.dense(0.9699938346531928, 0.0)), > LabeledPoint(1.0, Vectors.dense(0.22759406190283604, 0.0)), > LabeledPoint(1.0, Vectors.dense(0.9688721028330911, 0.0)), > LabeledPoint(0.0, Vectors.dense(0.5993795346650845, 0.0)), > LabeledPoint(0.0, Vectors.dense(0.9219423508390701, -0.8972778242305388)), > LabeledPoint(0.0, Vectors.dense(0.7006904841584055, -0.5607635619919824)))) > .toDF("label", "features") > > val logisticModel = new LogisticRegression() > .setRegParam(0.3) > .setLabelCol("label") > .setFeaturesCol("features") > .fit(sparkTrainingData) > > println(s"Spark logistic model coefficients: ${logisticModel.coefficients} Intercept: ${logisticModel.intercept}") > // Spark logistic model coefficients: [0.5451588538376263,0.26740606573584713] Intercept: -0.13897955358689987 > > val linearModel = new LinearRegression() > .setRegParam(0.3) > .setLabelCol("label") > .setFeaturesCol("features") > .setSolver("l-bfgs") > .fit(sparkTrainingData) > > println(s"Spark linear model coefficients: ${linearModel.coefficients} Intercept: ${linearModel.intercept}") > // Spark linear model coefficients: [0.19852664861346023,0.11501200541407802] Intercept: 0.45464906876832323 > > sc.stop() > } > } > > Thanks, > > Frank > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > 
https://mail.python.org/mailman/listinfo/scikit-learn > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From joel.nothman at gmail.com Wed Mar 15 23:57:35 2017 From: joel.nothman at gmail.com (Joel Nothman) Date: Thu, 16 Mar 2017 14:57:35 +1100 Subject: [scikit-learn] Differences between scikit-learn and Spark.ml for regression toy problem In-Reply-To: <2F90C7CA-32D1-48A6-9C44-A658A92FF97F@gmail.com> References: <2F90C7CA-32D1-48A6-9C44-A658A92FF97F@gmail.com> Message-ID: sklearn's (and hence liblinear's) intercept is not being used here, but a feature is added in Python to represent the bias, so it's being regularised in any case. On 16 March 2017 at 14:27, Sebastian Raschka wrote: > I think the liblinear solver (default in LogisticRegression) does > regularize the bias. So, even if both solutions (sklearn, spark) anneal to > the global cost optimum, the model parameters would be different. > Maybe a better way to make that comparison would be to turn off > regularization completely for now. And when you run the LogisticRegression, > maybe run it multiple times with different random seeds to see if your > solutions are generally stable. > > Best, > Sebastian > > > On Mar 13, 2017, at 1:06 PM, Stuart Reynolds > wrote: > > > > Both libraries are heavily parameterized. You should check what the > defaults are for both. > > > > Some ideas: > > - What regularization is being used. L1/L2? > > - Does the regularization parameter have the same interpretation? 1/C = > lambda? Some libraries use C. Some use lambda. > > - Also, some libraries regularize the intercept (scikit), other do not. > (It doesn't seem like a particularly good idea to regularize the intercept > if your optimizer permits not doing it). > > > > > > > > On Sun, Mar 12, 2017 at 7:07 PM, Frank Astier via scikit-learn < > scikit-learn at python.org> wrote: > > (this was also posted to stackoverflow on 03/10) > > > > I am setting up a very simple logistic regression problem in > scikit-learn and in spark.ml, and the results diverge: the models they > learn are different, but I can't figure out why (data is the same, model > type is the same, regularization is the same...). > > > > No doubt I am missing some setting on one side or the other. Which > setting? How should I set up either scikit or spark.ml to find the same > model as its counterpart? > > > > I give the sklearn code and spark.ml code below. Both should be ready > to cut-and-paste and run. 
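Putting those two remarks together, a sketch of the scikit-learn side of Frank's example with the penalty made negligible, so that neither library's treatment of the intercept matters much; the C = 1 / (n * lambda) translation in the comments is an assumption about how the two objectives are scaled and should be checked against the Spark docs for the version in use:

import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([
    [-0.7306653538519616, 0.0],
    [0.6750417712898752, -0.4232874171873786],
    [0.1863463229359709, -0.8163423997075965],
    [-0.6719842051493347, 0.0],
    [0.9699938346531928, 0.0],
    [0.22759406190283604, 0.0],
    [0.9688721028330911, 0.0],
    [0.5993795346650845, 0.0],
    [0.9219423508390701, -0.8972778242305388],
    [0.7006904841584055, -0.5607635619919824]])
y = np.array([0., 1., 1., 0., 1., 1., 1., 0., 0., 0.])

# Large C ~ (almost) no penalty, so the extra regularized intercept column that
# liblinear adds stops mattering; compare against Spark fitted with regParam=0.0.
clf = LogisticRegression(fit_intercept=True, C=1e6, tol=1e-11)
clf.fit(X, y)
print(clf.coef_, clf.intercept_)

# With regularization back on, a rough translation of Spark's regParam=lambda
# would be C = 1.0 / (n_samples * lambda), e.g. C = 1.0 / (10 * 0.3) here,
# assuming Spark averages the log-loss over samples while scikit-learn sums it.
# Spark's LogisticRegression also standardizes features by default, which would
# need to be matched or switched off for the coefficients to agree.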
> > > > scikit-learn code: > > ---------------------- > > > > import numpy as np > > from sklearn.linear_model import LogisticRegression, Ridge > > > > X = np.array([ > > [-0.7306653538519616, 0.0], > > [0.6750417712898752, -0.4232874171873786], > > [0.1863463229359709, -0.8163423997075965], > > [-0.6719842051493347, 0.0], > > [0.9699938346531928, 0.0], > > [0.22759406190283604, 0.0], > > [0.9688721028330911, 0.0], > > [0.5993795346650845, 0.0], > > [0.9219423508390701, -0.8972778242305388], > > [0.7006904841584055, -0.5607635619919824] > > ]) > > > > y = np.array([ > > 0.0, > > 1.0, > > 1.0, > > 0.0, > > 1.0, > > 1.0, > > 1.0, > > 0.0, > > 0.0, > > 0.0 > > ]) > > > > m, n = X.shape > > > > # Add intercept term to simulate inputs to GameEstimator > > X_with_intercept = np.hstack((X, np.ones(m)[:,np.newaxis])) > > > > l = 0.3 > > e = LogisticRegression( > > fit_intercept=False, > > penalty='l2', > > C=1/l, > > max_iter=100, > > tol=1e-11) > > > > e.fit(X_with_intercept, y) > > > > print e.coef_ > > # => [[ 0.98662189 0.45571052 -0.23467255]] > > > > # Linear regression is called Ridge in sklearn > > e = Ridge( > > fit_intercept=False, > > alpha=l, > > max_iter=100, > > tol=1e-11) > > > > e.fit(X_with_intercept, y) > > > > print e.coef_ > > # =>[ 0.32155545 0.17904355 0.41222418] > > > > spark.ml code: > > ------------------- > > > > import org.apache.spark.{SparkConf, SparkContext} > > import org.apache.spark.ml.classification.LogisticRegression > > import org.apache.spark.ml.regression.LinearRegression > > import org.apache.spark.mllib.linalg.Vectors > > import org.apache.spark.mllib.regression.LabeledPoint > > import org.apache.spark.sql.SQLContext > > > > object TestSparkRegression { > > def main(args: Array[String]): Unit = { > > import org.apache.log4j.{Level, Logger} > > > > Logger.getLogger("org").setLevel(Level.OFF) > > Logger.getLogger("akka").setLevel(Level.OFF) > > > > val conf = new SparkConf().setAppName("test").setMaster("local") > > val sc = new SparkContext(conf) > > > > val sparkTrainingData = new SQLContext(sc) > > .createDataFrame(Seq( > > LabeledPoint(0.0, Vectors.dense(-0.7306653538519616, 0.0)), > > LabeledPoint(1.0, Vectors.dense(0.6750417712898752, > -0.4232874171873786)), > > LabeledPoint(1.0, Vectors.dense(0.1863463229359709, > -0.8163423997075965)), > > LabeledPoint(0.0, Vectors.dense(-0.6719842051493347, 0.0)), > > LabeledPoint(1.0, Vectors.dense(0.9699938346531928, 0.0)), > > LabeledPoint(1.0, Vectors.dense(0.22759406190283604, 0.0)), > > LabeledPoint(1.0, Vectors.dense(0.9688721028330911, 0.0)), > > LabeledPoint(0.0, Vectors.dense(0.5993795346650845, 0.0)), > > LabeledPoint(0.0, Vectors.dense(0.9219423508390701, > -0.8972778242305388)), > > LabeledPoint(0.0, Vectors.dense(0.7006904841584055, > -0.5607635619919824)))) > > .toDF("label", "features") > > > > val logisticModel = new LogisticRegression() > > .setRegParam(0.3) > > .setLabelCol("label") > > .setFeaturesCol("features") > > .fit(sparkTrainingData) > > > > println(s"Spark logistic model coefficients: > ${logisticModel.coefficients} Intercept: ${logisticModel.intercept}") > > // Spark logistic model coefficients: [0.5451588538376263,0.26740606573584713] > Intercept: -0.13897955358689987 > > > > val linearModel = new LinearRegression() > > .setRegParam(0.3) > > .setLabelCol("label") > > .setFeaturesCol("features") > > .setSolver("l-bfgs") > > .fit(sparkTrainingData) > > > > println(s"Spark linear model coefficients: > ${linearModel.coefficients} Intercept: ${linearModel.intercept}") > > // Spark linear 
model coefficients: [0.19852664861346023,0.11501200541407802] > Intercept: 0.45464906876832323 > > > > sc.stop() > > } > > } > > > > Thanks, > > > > Frank > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From noflaco at gmail.com Thu Mar 16 00:00:12 2017 From: noflaco at gmail.com (Carlton Banks) Date: Thu, 16 Mar 2017 05:00:12 +0100 Subject: [scikit-learn] GridsearchCV Message-ID: <4FEBA91C-07A4-4AFB-932B-1B175A89D592@gmail.com> Hi? I currently trying to optimize my CNN model using gridsearchCV, but seem to have some problems feading my input data.. My training data is stored as a list of Np.ndarrays of shape (6,3,3) and my output is stored as a list of np.array with one entry. Why am I having problems parsing my data to it? best regards Carl B. From se.raschka at gmail.com Thu Mar 16 00:30:44 2017 From: se.raschka at gmail.com (Sebastian Raschka) Date: Thu, 16 Mar 2017 00:30:44 -0400 Subject: [scikit-learn] GridsearchCV In-Reply-To: <4FEBA91C-07A4-4AFB-932B-1B175A89D592@gmail.com> References: <4FEBA91C-07A4-4AFB-932B-1B175A89D592@gmail.com> Message-ID: <93F23983-0958-4975-883E-2A6747799150@gmail.com> Sklearn estimators typically assume 2d inputs (as numpy arrays) with shape=[n_samples, n_features]. > list of Np.ndarrays of shape (6,3,3) I assume you mean a 3D tensor (3D numpy array) with shape=[n_samples, n_pixels, n_pixels]? What you could do is to reshape it before you put it in, i.e., data_ary = your_ary.reshape(n_samples, -1).shape then, you need to add a line at the beginning your CNN class that does the reverse, i.e., data_ary.reshape(6, n_pixels, n_pixels).shape. Numpy?s reshape typically returns view objects, so that these additional steps shouldn?t be ?too? expensive. Best, Sebastian > On Mar 16, 2017, at 12:00 AM, Carlton Banks wrote: > > Hi? > > I currently trying to optimize my CNN model using gridsearchCV, but seem to have some problems feading my input data.. > > My training data is stored as a list of Np.ndarrays of shape (6,3,3) and my output is stored as a list of np.array with one entry. > > Why am I having problems parsing my data to it? > > best regards > Carl B. > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From noflaco at gmail.com Thu Mar 16 00:46:39 2017 From: noflaco at gmail.com (Carlton Banks) Date: Thu, 16 Mar 2017 05:46:39 +0100 Subject: [scikit-learn] GridsearchCV In-Reply-To: <93F23983-0958-4975-883E-2A6747799150@gmail.com> References: <4FEBA91C-07A4-4AFB-932B-1B175A89D592@gmail.com> <93F23983-0958-4975-883E-2A6747799150@gmail.com> Message-ID: The ndarray (6,3,3) => (row, col,color channels) I tried fixing it converting the list of numpy.ndarray to numpy.asarray(list) but this causes a different problem: is grid use a lot a memory.. I am running on a super computer, and seem to have problems with memory.. already used 62 gb ram.. > Den 16. mar. 2017 kl. 
05.30 skrev Sebastian Raschka : > > Sklearn estimators typically assume 2d inputs (as numpy arrays) with shape=[n_samples, n_features]. > >> list of Np.ndarrays of shape (6,3,3) > > I assume you mean a 3D tensor (3D numpy array) with shape=[n_samples, n_pixels, n_pixels]? What you could do is to reshape it before you put it in, i.e., > > data_ary = your_ary.reshape(n_samples, -1).shape > > then, you need to add a line at the beginning your CNN class that does the reverse, i.e., data_ary.reshape(6, n_pixels, n_pixels).shape. Numpy?s reshape typically returns view objects, so that these additional steps shouldn?t be ?too? expensive. > > Best, > Sebastian > > > >> On Mar 16, 2017, at 12:00 AM, Carlton Banks wrote: >> >> Hi? >> >> I currently trying to optimize my CNN model using gridsearchCV, but seem to have some problems feading my input data.. >> >> My training data is stored as a list of Np.ndarrays of shape (6,3,3) and my output is stored as a list of np.array with one entry. >> >> Why am I having problems parsing my data to it? >> >> best regards >> Carl B. >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From joel.nothman at gmail.com Thu Mar 16 00:58:20 2017 From: joel.nothman at gmail.com (Joel Nothman) Date: Thu, 16 Mar 2017 15:58:20 +1100 Subject: [scikit-learn] GridsearchCV In-Reply-To: References: <4FEBA91C-07A4-4AFB-932B-1B175A89D592@gmail.com> <93F23983-0958-4975-883E-2A6747799150@gmail.com> Message-ID: If you're using something like n_jobs=-1, that will explode memory usage in proportion to the number of cores, and particularly so if you're passing the data as a list rather than array and hence can't take advantage of memmapped data parallelism. On 16 March 2017 at 15:46, Carlton Banks wrote: > The ndarray (6,3,3) => (row, col,color channels) > > I tried fixing it converting the list of numpy.ndarray to > numpy.asarray(list) > > but this causes a different problem: > > is grid use a lot a memory.. I am running on a super computer, and seem to > have problems with memory.. already used 62 gb ram.. > > > Den 16. mar. 2017 kl. 05.30 skrev Sebastian Raschka < > se.raschka at gmail.com>: > > > > Sklearn estimators typically assume 2d inputs (as numpy arrays) with > shape=[n_samples, n_features]. > > > >> list of Np.ndarrays of shape (6,3,3) > > > > I assume you mean a 3D tensor (3D numpy array) with shape=[n_samples, > n_pixels, n_pixels]? What you could do is to reshape it before you put it > in, i.e., > > > > data_ary = your_ary.reshape(n_samples, -1).shape > > > > then, you need to add a line at the beginning your CNN class that does > the reverse, i.e., data_ary.reshape(6, n_pixels, n_pixels).shape. Numpy?s > reshape typically returns view objects, so that these additional steps > shouldn?t be ?too? expensive. > > > > Best, > > Sebastian > > > > > > > >> On Mar 16, 2017, at 12:00 AM, Carlton Banks wrote: > >> > >> Hi? > >> > >> I currently trying to optimize my CNN model using gridsearchCV, but > seem to have some problems feading my input data.. > >> > >> My training data is stored as a list of Np.ndarrays of shape (6,3,3) > and my output is stored as a list of np.array with one entry. > >> > >> Why am I having problems parsing my data to it? > >> > >> best regards > >> Carl B. 
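A numpy-only sketch of that reshape round trip, assuming samples of shape (6, 3, 3) as in the question; the random data is just a stand-in:

import numpy as np

samples = [np.random.rand(6, 3, 3) for _ in range(100)]   # stand-in for the real inputs
X = np.asarray(samples)                    # shape (100, 6, 3, 3)

X_2d = X.reshape(len(X), -1)               # shape (100, 54): what GridSearchCV expects

# ... pass X_2d (and y) to GridSearchCV / cross_val_score ...

X_restored = X_2d.reshape(-1, 6, 3, 3)     # undo the flattening inside the wrapped model
print(X_restored.shape, np.array_equal(X, X_restored))    # (100, 6, 3, 3) True

The un-flattening line belongs at the top of the wrapped estimator's fit() and predict(), so that the grid search itself only ever sees the 2-D array.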
> >> _______________________________________________ > >> scikit-learn mailing list > >> scikit-learn at python.org > >> https://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From se.raschka at gmail.com Thu Mar 16 01:00:17 2017 From: se.raschka at gmail.com (Sebastian Raschka) Date: Thu, 16 Mar 2017 01:00:17 -0400 Subject: [scikit-learn] GridsearchCV In-Reply-To: References: <4FEBA91C-07A4-4AFB-932B-1B175A89D592@gmail.com> <93F23983-0958-4975-883E-2A6747799150@gmail.com> Message-ID: Hm, if you set n_jobs>1, then I think it?s using multiprocessing, which will pass a copy of the input data to each process. That could be one reason for the relatively large memory consumption. > On Mar 16, 2017, at 12:46 AM, Carlton Banks wrote: > > The ndarray (6,3,3) => (row, col,color channels) > > I tried fixing it converting the list of numpy.ndarray to numpy.asarray(list) > > but this causes a different problem: > > is grid use a lot a memory.. I am running on a super computer, and seem to have problems with memory.. already used 62 gb ram.. > >> Den 16. mar. 2017 kl. 05.30 skrev Sebastian Raschka : >> >> Sklearn estimators typically assume 2d inputs (as numpy arrays) with shape=[n_samples, n_features]. >> >>> list of Np.ndarrays of shape (6,3,3) >> >> I assume you mean a 3D tensor (3D numpy array) with shape=[n_samples, n_pixels, n_pixels]? What you could do is to reshape it before you put it in, i.e., >> >> data_ary = your_ary.reshape(n_samples, -1).shape >> >> then, you need to add a line at the beginning your CNN class that does the reverse, i.e., data_ary.reshape(6, n_pixels, n_pixels).shape. Numpy?s reshape typically returns view objects, so that these additional steps shouldn?t be ?too? expensive. >> >> Best, >> Sebastian >> >> >> >>> On Mar 16, 2017, at 12:00 AM, Carlton Banks wrote: >>> >>> Hi? >>> >>> I currently trying to optimize my CNN model using gridsearchCV, but seem to have some problems feading my input data.. >>> >>> My training data is stored as a list of Np.ndarrays of shape (6,3,3) and my output is stored as a list of np.array with one entry. >>> >>> Why am I having problems parsing my data to it? >>> >>> best regards >>> Carl B. >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From noflaco at gmail.com Thu Mar 16 01:01:18 2017 From: noflaco at gmail.com (Carlton Banks) Date: Thu, 16 Mar 2017 06:01:18 +0100 Subject: [scikit-learn] GridsearchCV In-Reply-To: References: <4FEBA91C-07A4-4AFB-932B-1B175A89D592@gmail.com> <93F23983-0958-4975-883E-2A6747799150@gmail.com> Message-ID: Oh? totally forgot about that.. why -1? > Den 16. mar. 2017 kl. 
05.58 skrev Joel Nothman : > > If you're using something like n_jobs=-1, that will explode memory usage in proportion to the number of cores, and particularly so if you're passing the data as a list rather than array and hence can't take advantage of memmapped data parallelism. > > On 16 March 2017 at 15:46, Carlton Banks > wrote: > The ndarray (6,3,3) => (row, col,color channels) > > I tried fixing it converting the list of numpy.ndarray to numpy.asarray(list) > > but this causes a different problem: > > is grid use a lot a memory.. I am running on a super computer, and seem to have problems with memory.. already used 62 gb ram.. > > > Den 16. mar. 2017 kl. 05.30 skrev Sebastian Raschka >: > > > > Sklearn estimators typically assume 2d inputs (as numpy arrays) with shape=[n_samples, n_features]. > > > >> list of Np.ndarrays of shape (6,3,3) > > > > I assume you mean a 3D tensor (3D numpy array) with shape=[n_samples, n_pixels, n_pixels]? What you could do is to reshape it before you put it in, i.e., > > > > data_ary = your_ary.reshape(n_samples, -1).shape > > > > then, you need to add a line at the beginning your CNN class that does the reverse, i.e., data_ary.reshape(6, n_pixels, n_pixels).shape. Numpy?s reshape typically returns view objects, so that these additional steps shouldn?t be ?too? expensive. > > > > Best, > > Sebastian > > > > > > > >> On Mar 16, 2017, at 12:00 AM, Carlton Banks > wrote: > >> > >> Hi? > >> > >> I currently trying to optimize my CNN model using gridsearchCV, but seem to have some problems feading my input data.. > >> > >> My training data is stored as a list of Np.ndarrays of shape (6,3,3) and my output is stored as a list of np.array with one entry. > >> > >> Why am I having problems parsing my data to it? > >> > >> best regards > >> Carl B. > >> _______________________________________________ > >> scikit-learn mailing list > >> scikit-learn at python.org > >> https://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From noflaco at gmail.com Thu Mar 16 01:03:24 2017 From: noflaco at gmail.com (Carlton Banks) Date: Thu, 16 Mar 2017 06:03:24 +0100 Subject: [scikit-learn] GridsearchCV In-Reply-To: References: <4FEBA91C-07A4-4AFB-932B-1B175A89D592@gmail.com> <93F23983-0958-4975-883E-2A6747799150@gmail.com> Message-ID: i was wondering about the minus in front? > Den 16. mar. 2017 kl. 06.00 skrev Sebastian Raschka : > > Hm, if you set n_jobs>1, then I think it?s using multiprocessing, which will pass a copy of the input data to each process. That could be one reason for the relatively large memory consumption. > >> On Mar 16, 2017, at 12:46 AM, Carlton Banks wrote: >> >> The ndarray (6,3,3) => (row, col,color channels) >> >> I tried fixing it converting the list of numpy.ndarray to numpy.asarray(list) >> >> but this causes a different problem: >> >> is grid use a lot a memory.. I am running on a super computer, and seem to have problems with memory.. already used 62 gb ram.. >> >>> Den 16. mar. 
2017 kl. 05.30 skrev Sebastian Raschka : >>> >>> Sklearn estimators typically assume 2d inputs (as numpy arrays) with shape=[n_samples, n_features]. >>> >>>> list of Np.ndarrays of shape (6,3,3) >>> >>> I assume you mean a 3D tensor (3D numpy array) with shape=[n_samples, n_pixels, n_pixels]? What you could do is to reshape it before you put it in, i.e., >>> >>> data_ary = your_ary.reshape(n_samples, -1).shape >>> >>> then, you need to add a line at the beginning your CNN class that does the reverse, i.e., data_ary.reshape(6, n_pixels, n_pixels).shape. Numpy?s reshape typically returns view objects, so that these additional steps shouldn?t be ?too? expensive. >>> >>> Best, >>> Sebastian >>> >>> >>> >>>> On Mar 16, 2017, at 12:00 AM, Carlton Banks wrote: >>>> >>>> Hi? >>>> >>>> I currently trying to optimize my CNN model using gridsearchCV, but seem to have some problems feading my input data.. >>>> >>>> My training data is stored as a list of Np.ndarrays of shape (6,3,3) and my output is stored as a list of np.array with one entry. >>>> >>>> Why am I having problems parsing my data to it? >>>> >>>> best regards >>>> Carl B. >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From se.raschka at gmail.com Thu Mar 16 01:06:17 2017 From: se.raschka at gmail.com (Sebastian Raschka) Date: Thu, 16 Mar 2017 01:06:17 -0400 Subject: [scikit-learn] GridsearchCV In-Reply-To: References: <4FEBA91C-07A4-4AFB-932B-1B175A89D592@gmail.com> <93F23983-0958-4975-883E-2A6747799150@gmail.com> Message-ID: the ?-1? means that it will run on all processors that are available > On Mar 16, 2017, at 1:01 AM, Carlton Banks wrote: > > Oh? totally forgot about that.. why -1? >> Den 16. mar. 2017 kl. 05.58 skrev Joel Nothman : >> >> If you're using something like n_jobs=-1, that will explode memory usage in proportion to the number of cores, and particularly so if you're passing the data as a list rather than array and hence can't take advantage of memmapped data parallelism. >> >> On 16 March 2017 at 15:46, Carlton Banks wrote: >> The ndarray (6,3,3) => (row, col,color channels) >> >> I tried fixing it converting the list of numpy.ndarray to numpy.asarray(list) >> >> but this causes a different problem: >> >> is grid use a lot a memory.. I am running on a super computer, and seem to have problems with memory.. already used 62 gb ram.. >> >> > Den 16. mar. 2017 kl. 05.30 skrev Sebastian Raschka : >> > >> > Sklearn estimators typically assume 2d inputs (as numpy arrays) with shape=[n_samples, n_features]. >> > >> >> list of Np.ndarrays of shape (6,3,3) >> > >> > I assume you mean a 3D tensor (3D numpy array) with shape=[n_samples, n_pixels, n_pixels]? 
What you could do is to reshape it before you put it in, i.e., >> > >> > data_ary = your_ary.reshape(n_samples, -1).shape >> > >> > then, you need to add a line at the beginning your CNN class that does the reverse, i.e., data_ary.reshape(6, n_pixels, n_pixels).shape. Numpy?s reshape typically returns view objects, so that these additional steps shouldn?t be ?too? expensive. >> > >> > Best, >> > Sebastian >> > >> > >> > >> >> On Mar 16, 2017, at 12:00 AM, Carlton Banks wrote: >> >> >> >> Hi? >> >> >> >> I currently trying to optimize my CNN model using gridsearchCV, but seem to have some problems feading my input data.. >> >> >> >> My training data is stored as a list of Np.ndarrays of shape (6,3,3) and my output is stored as a list of np.array with one entry. >> >> >> >> Why am I having problems parsing my data to it? >> >> >> >> best regards >> >> Carl B. >> >> _______________________________________________ >> >> scikit-learn mailing list >> >> scikit-learn at python.org >> >> https://mail.python.org/mailman/listinfo/scikit-learn >> > >> > _______________________________________________ >> > scikit-learn mailing list >> > scikit-learn at python.org >> > https://mail.python.org/mailman/listinfo/scikit-learn >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From noflaco at gmail.com Thu Mar 16 01:08:14 2017 From: noflaco at gmail.com (Carlton Banks) Date: Thu, 16 Mar 2017 06:08:14 +0100 Subject: [scikit-learn] GridsearchCV In-Reply-To: References: <4FEBA91C-07A4-4AFB-932B-1B175A89D592@gmail.com> <93F23983-0958-4975-883E-2A6747799150@gmail.com> Message-ID: I changed it to -48?.. and it seem to be running.. > Den 16. mar. 2017 kl. 06.06 skrev Sebastian Raschka : > > the ?-1? means that it will run on all processors that are available > >> On Mar 16, 2017, at 1:01 AM, Carlton Banks wrote: >> >> Oh? totally forgot about that.. why -1? >>> Den 16. mar. 2017 kl. 05.58 skrev Joel Nothman : >>> >>> If you're using something like n_jobs=-1, that will explode memory usage in proportion to the number of cores, and particularly so if you're passing the data as a list rather than array and hence can't take advantage of memmapped data parallelism. >>> >>> On 16 March 2017 at 15:46, Carlton Banks wrote: >>> The ndarray (6,3,3) => (row, col,color channels) >>> >>> I tried fixing it converting the list of numpy.ndarray to numpy.asarray(list) >>> >>> but this causes a different problem: >>> >>> is grid use a lot a memory.. I am running on a super computer, and seem to have problems with memory.. already used 62 gb ram.. >>> >>>> Den 16. mar. 2017 kl. 05.30 skrev Sebastian Raschka : >>>> >>>> Sklearn estimators typically assume 2d inputs (as numpy arrays) with shape=[n_samples, n_features]. >>>> >>>>> list of Np.ndarrays of shape (6,3,3) >>>> >>>> I assume you mean a 3D tensor (3D numpy array) with shape=[n_samples, n_pixels, n_pixels]? 
What you could do is to reshape it before you put it in, i.e., >>>> >>>> data_ary = your_ary.reshape(n_samples, -1).shape >>>> >>>> then, you need to add a line at the beginning your CNN class that does the reverse, i.e., data_ary.reshape(6, n_pixels, n_pixels).shape. Numpy?s reshape typically returns view objects, so that these additional steps shouldn?t be ?too? expensive. >>>> >>>> Best, >>>> Sebastian >>>> >>>> >>>> >>>>> On Mar 16, 2017, at 12:00 AM, Carlton Banks wrote: >>>>> >>>>> Hi? >>>>> >>>>> I currently trying to optimize my CNN model using gridsearchCV, but seem to have some problems feading my input data.. >>>>> >>>>> My training data is stored as a list of Np.ndarrays of shape (6,3,3) and my output is stored as a list of np.array with one entry. >>>>> >>>>> Why am I having problems parsing my data to it? >>>>> >>>>> best regards >>>>> Carl B. >>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From noflaco at gmail.com Thu Mar 16 01:14:51 2017 From: noflaco at gmail.com (Carlton Banks) Date: Thu, 16 Mar 2017 06:14:51 +0100 Subject: [scikit-learn] GridsearchCV In-Reply-To: References: <4FEBA91C-07A4-4AFB-932B-1B175A89D592@gmail.com> <93F23983-0958-4975-883E-2A6747799150@gmail.com> Message-ID: What is the highest level of verbose? > Den 16. mar. 2017 kl. 06.08 skrev Carlton Banks : > > I changed it to -48?.. and it seem to be running.. >> Den 16. mar. 2017 kl. 06.06 skrev Sebastian Raschka : >> >> the ?-1? means that it will run on all processors that are available >> >>> On Mar 16, 2017, at 1:01 AM, Carlton Banks wrote: >>> >>> Oh? totally forgot about that.. why -1? >>>> Den 16. mar. 2017 kl. 05.58 skrev Joel Nothman : >>>> >>>> If you're using something like n_jobs=-1, that will explode memory usage in proportion to the number of cores, and particularly so if you're passing the data as a list rather than array and hence can't take advantage of memmapped data parallelism. >>>> >>>> On 16 March 2017 at 15:46, Carlton Banks wrote: >>>> The ndarray (6,3,3) => (row, col,color channels) >>>> >>>> I tried fixing it converting the list of numpy.ndarray to numpy.asarray(list) >>>> >>>> but this causes a different problem: >>>> >>>> is grid use a lot a memory.. I am running on a super computer, and seem to have problems with memory.. already used 62 gb ram.. >>>> >>>>> Den 16. mar. 2017 kl. 05.30 skrev Sebastian Raschka : >>>>> >>>>> Sklearn estimators typically assume 2d inputs (as numpy arrays) with shape=[n_samples, n_features]. 
>>>>> >>>>>> list of Np.ndarrays of shape (6,3,3) >>>>> >>>>> I assume you mean a 3D tensor (3D numpy array) with shape=[n_samples, n_pixels, n_pixels]? What you could do is to reshape it before you put it in, i.e., >>>>> >>>>> data_ary = your_ary.reshape(n_samples, -1).shape >>>>> >>>>> then, you need to add a line at the beginning your CNN class that does the reverse, i.e., data_ary.reshape(6, n_pixels, n_pixels).shape. Numpy?s reshape typically returns view objects, so that these additional steps shouldn?t be ?too? expensive. >>>>> >>>>> Best, >>>>> Sebastian >>>>> >>>>> >>>>> >>>>>> On Mar 16, 2017, at 12:00 AM, Carlton Banks wrote: >>>>>> >>>>>> Hi? >>>>>> >>>>>> I currently trying to optimize my CNN model using gridsearchCV, but seem to have some problems feading my input data.. >>>>>> >>>>>> My training data is stored as a list of Np.ndarrays of shape (6,3,3) and my output is stored as a list of np.array with one entry. >>>>>> >>>>>> Why am I having problems parsing my data to it? >>>>>> >>>>>> best regards >>>>>> Carl B. >>>>>> _______________________________________________ >>>>>> scikit-learn mailing list >>>>>> scikit-learn at python.org >>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>> >>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > From se.raschka at gmail.com Thu Mar 16 01:33:32 2017 From: se.raschka at gmail.com (Sebastian Raschka) Date: Thu, 16 Mar 2017 01:33:32 -0400 Subject: [scikit-learn] GridsearchCV In-Reply-To: References: <4FEBA91C-07A4-4AFB-932B-1B175A89D592@gmail.com> <93F23983-0958-4975-883E-2A6747799150@gmail.com> Message-ID: <69CC85FA-C763-4676-BE51-B12CA88B2A95@gmail.com> I am not sure what actually happens if you choose negative integers other than -1. Typically, you would choose either -1, 1 or a positive integer, sth like -1: all available cpus 1: 1 process 2: 2 processes ? 10: 10 process ? > On Mar 16, 2017, at 1:08 AM, Carlton Banks wrote: > > I changed it to -48?.. and it seem to be running.. >> Den 16. mar. 2017 kl. 06.06 skrev Sebastian Raschka : >> >> the ?-1? means that it will run on all processors that are available >> >>> On Mar 16, 2017, at 1:01 AM, Carlton Banks wrote: >>> >>> Oh? totally forgot about that.. why -1? >>>> Den 16. mar. 2017 kl. 05.58 skrev Joel Nothman : >>>> >>>> If you're using something like n_jobs=-1, that will explode memory usage in proportion to the number of cores, and particularly so if you're passing the data as a list rather than array and hence can't take advantage of memmapped data parallelism. 
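A small sketch of that point, with an arbitrary estimator and made-up sizes; the data is converted to a single ndarray up front so the parallel workers can share one block of memory instead of each pickling a Python list:

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

X = np.asarray([np.random.rand(54) for _ in range(1000)])   # one contiguous array
y = np.random.rand(1000)

# n_jobs=1  -> a single process, lowest peak memory
# n_jobs=4  -> four workers, roughly four times the peak memory
# n_jobs=-1 -> one worker per available CPU core
grid = GridSearchCV(Ridge(), {'alpha': [0.1, 1.0, 10.0]}, n_jobs=4)
grid.fit(X, y)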
>>>> >>>> On 16 March 2017 at 15:46, Carlton Banks wrote: >>>> The ndarray (6,3,3) => (row, col,color channels) >>>> >>>> I tried fixing it converting the list of numpy.ndarray to numpy.asarray(list) >>>> >>>> but this causes a different problem: >>>> >>>> is grid use a lot a memory.. I am running on a super computer, and seem to have problems with memory.. already used 62 gb ram.. >>>> >>>>> Den 16. mar. 2017 kl. 05.30 skrev Sebastian Raschka : >>>>> >>>>> Sklearn estimators typically assume 2d inputs (as numpy arrays) with shape=[n_samples, n_features]. >>>>> >>>>>> list of Np.ndarrays of shape (6,3,3) >>>>> >>>>> I assume you mean a 3D tensor (3D numpy array) with shape=[n_samples, n_pixels, n_pixels]? What you could do is to reshape it before you put it in, i.e., >>>>> >>>>> data_ary = your_ary.reshape(n_samples, -1).shape >>>>> >>>>> then, you need to add a line at the beginning your CNN class that does the reverse, i.e., data_ary.reshape(6, n_pixels, n_pixels).shape. Numpy?s reshape typically returns view objects, so that these additional steps shouldn?t be ?too? expensive. >>>>> >>>>> Best, >>>>> Sebastian >>>>> >>>>> >>>>> >>>>>> On Mar 16, 2017, at 12:00 AM, Carlton Banks wrote: >>>>>> >>>>>> Hi? >>>>>> >>>>>> I currently trying to optimize my CNN model using gridsearchCV, but seem to have some problems feading my input data.. >>>>>> >>>>>> My training data is stored as a list of Np.ndarrays of shape (6,3,3) and my output is stored as a list of np.array with one entry. >>>>>> >>>>>> Why am I having problems parsing my data to it? >>>>>> >>>>>> best regards >>>>>> Carl B. >>>>>> _______________________________________________ >>>>>> scikit-learn mailing list >>>>>> scikit-learn at python.org >>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>> >>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From gael.varoquaux at normalesup.org Thu Mar 16 03:18:30 2017 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 16 Mar 2017 08:18:30 +0100 Subject: [scikit-learn] PyParis 2017 Message-ID: <20170316071830.GF2442192@phare.normalesup.org> The PyParis conference will be held in ... Paris! June 12-13: http://pyparis.org/cfp.html There will be a data track, that will be very scikit-learn related. Scikit-learn users or developers are most welcomed to come and talk about what they are doing. 
Cheers, Ga?l -- Gael Varoquaux Researcher, INRIA Parietal NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France Phone: ++ 33-1-69-08-79-68 http://gael-varoquaux.info http://twitter.com/GaelVaroquaux From rth.yurchak at gmail.com Thu Mar 16 08:25:44 2017 From: rth.yurchak at gmail.com (Roman Yurchak) Date: Thu, 16 Mar 2017 13:25:44 +0100 Subject: [scikit-learn] best way to scale on the random forest for text w bag of words ... In-Reply-To: References: Message-ID: <095e98ac-beab-7aa0-e635-a6fc41c0323c@gmail.com> If you run out of memory at the prediction step, splitting the test dataset in batches, then concatenating the results should work fine. Why would it "skew" the results? 70GB RAM seems huge: for comparison here is some categorization benchmarks on a 700k text dataset, that use more in the order of 5-10 GB RAM, https://github.com/FreeDiscovery/FreeDiscovery/issues/58 though with fairly short documents, for other algorithms and with a smaller training set. You could also try reducing the size of your dictionary with hashing. If you really want to use random forest and have memory constraints, you might want to use n_jobs=1 to avoid memory copies, https://www.quora.com/Why-is-scikit-learns-random-forest-using-so-much-memory But as Joel was saying, random forest might not the best choice for huge sparse arrays; NaiveBayes, LogisticRegression or SVM could be better suited, or gradient boosting if you want to go that way... On 16/03/17 02:44, Joel Nothman wrote: > Trees are not a traditional choice for bag of words models, but you > should make sure you are at least using the parameters of the random > forest to limit the size (depth, branching) of the trees. > > On 16 March 2017 at 12:20, Sasha Kacanski > wrote: > > Hi, > As soon as number of trees and features goes higher, 70Gb of ram is > gone and i am getting out of memory errors. > file size is 700Mb. Dataframe quickly shrinks from 14 to 2 columns > but there is ton of text ... > with 10 estimators and 100 features per word I can't tackle ~900 k > of records ... > Training set, about 15% of data does perfectly fine but when test > come that is it. > > i can split stuff and multiprocess it but I believe that will simply > skew results... > > Any ideas? > > > -- > Aleksandar Kacanski > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > From skacanski at gmail.com Thu Mar 16 08:38:03 2017 From: skacanski at gmail.com (Sasha Kacanski) Date: Thu, 16 Mar 2017 08:38:03 -0400 Subject: [scikit-learn] best way to scale on the random forest for text w bag of words ... In-Reply-To: References: Message-ID: Thanks Joel, what would be your approach? Sasha Kacanski On Mar 15, 2017 9:46 PM, "Joel Nothman" wrote: > Trees are not a traditional choice for bag of words models, but you should > make sure you are at least using the parameters of the random forest to > limit the size (depth, branching) of the trees. > > On 16 March 2017 at 12:20, Sasha Kacanski wrote: > >> Hi, >> As soon as number of trees and features goes higher, 70Gb of ram is gone >> and i am getting out of memory errors. >> file size is 700Mb. Dataframe quickly shrinks from 14 to 2 columns but >> there is ton of text ... 
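A hedged sketch of the batching idea from the reply above; clf is assumed to be an already fitted classifier, and because prediction works row by row the concatenated output matches a single predict call, so nothing is skewed:

import numpy as np

def predict_in_batches(clf, X_test, batch_size=10000):
    # Predict on row slices of the test matrix (works for dense and CSR inputs)
    # and stitch the pieces back together.
    parts = []
    for start in range(0, X_test.shape[0], batch_size):
        parts.append(clf.predict(X_test[start:start + batch_size]))
    return np.concatenate(parts)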
>> with 10 estimators and 100 features per word I can't tackle ~900 k of >> records ... >> Training set, about 15% of data does perfectly fine but when test come >> that is it. >> >> i can split stuff and multiprocess it but I believe that will simply skew >> results... >> >> Any ideas? >> >> >> -- >> Aleksandar Kacanski >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From skacanski at gmail.com Thu Mar 16 10:23:36 2017 From: skacanski at gmail.com (Sasha Kacanski) Date: Thu, 16 Mar 2017 10:23:36 -0400 Subject: [scikit-learn] best way to scale on the random forest for text w bag of words ... In-Reply-To: <095e98ac-beab-7aa0-e635-a6fc41c0323c@gmail.com> References: <095e98ac-beab-7aa0-e635-a6fc41c0323c@gmail.com> Message-ID: Thank you very much... I will try alternatives Sasha Kacanski On Mar 16, 2017 8:28 AM, "Roman Yurchak" wrote: > If you run out of memory at the prediction step, splitting the test > dataset in batches, then concatenating the results should work fine. Why > would it "skew" the results? > > 70GB RAM seems huge: for comparison here is some categorization benchmarks > on a 700k text dataset, that use more in the order of 5-10 GB RAM, > https://github.com/FreeDiscovery/FreeDiscovery/issues/58 > though with fairly short documents, for other algorithms and with a > smaller training set. > > You could also try reducing the size of your dictionary with hashing. > If you really want to use random forest and have memory constraints, you > might want to use n_jobs=1 to avoid memory copies, > > https://www.quora.com/Why-is-scikit-learns-random-forest-usi > ng-so-much-memory > > But as Joel was saying, random forest might not the best choice for huge > sparse arrays; NaiveBayes, LogisticRegression or SVM could be better > suited, or gradient boosting if you want to go that way... > > > On 16/03/17 02:44, Joel Nothman wrote: > >> Trees are not a traditional choice for bag of words models, but you >> should make sure you are at least using the parameters of the random >> forest to limit the size (depth, branching) of the trees. >> >> On 16 March 2017 at 12:20, Sasha Kacanski > > wrote: >> >> Hi, >> As soon as number of trees and features goes higher, 70Gb of ram is >> gone and i am getting out of memory errors. >> file size is 700Mb. Dataframe quickly shrinks from 14 to 2 columns >> but there is ton of text ... >> with 10 estimators and 100 features per word I can't tackle ~900 k >> of records ... >> Training set, about 15% of data does perfectly fine but when test >> come that is it. >> >> i can split stuff and multiprocess it but I believe that will simply >> skew results... >> >> Any ideas? 
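For the hashing and linear-model suggestions above, a minimal sketch; the documents and the 2**18 feature count are placeholders:

from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

docs = ["first example document", "second example document", "more text", "and so on"]
labels = [0, 1, 0, 1]

# HashingVectorizer keeps the feature space at a fixed size without storing
# a vocabulary, and LogisticRegression handles the resulting sparse matrix well.
clf = make_pipeline(HashingVectorizer(n_features=2 ** 18),
                    LogisticRegression())
clf.fit(docs, labels)
print(clf.predict(["yet another document"]))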
>> >> >> -- >> Aleksandar Kacanski >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> >> >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From noflaco at gmail.com Thu Mar 16 11:50:49 2017 From: noflaco at gmail.com (Carlton Banks) Date: Thu, 16 Mar 2017 16:50:49 +0100 Subject: [scikit-learn] Is something wrong with this gridsearchCV? Message-ID: I am currently using grid search to optimize my keras model? Something seemed a bit off during the training? https://www.dropbox.com/s/da0ztv2kqtkrfpu/Screenshot%20from%202017-03-16%2016%3A43%3A42.png?dl=0 For some reason is the training for each epoch not done for all datapoints?? What could be wrong? Here is the code: http://pastebin.com/raw/itJFm5a6 Anything that seems off? -------------- next part -------------- An HTML attachment was scrubbed... URL: From se.raschka at gmail.com Thu Mar 16 12:27:02 2017 From: se.raschka at gmail.com (Sebastian Raschka) Date: Thu, 16 Mar 2017 12:27:02 -0400 Subject: [scikit-learn] Is something wrong with this gridsearchCV? In-Reply-To: References: Message-ID: <3665E87A-9A82-4AA5-9759-9283804F52BE@gmail.com> I am not using Keras and don?t know how nicely it plays with sklearn objects these days, but you are not giving all the data to the grid search object, which is why your model doesn?t get to see the whole dataset during grid search; i.e., you have `np.asarray(input_train[:-(len(input_train)/1000)]` > On Mar 16, 2017, at 11:50 AM, Carlton Banks wrote: > > I am currently using grid search to optimize my keras model? > > Something seemed a bit off during the training? > > https://www.dropbox.com/s/da0ztv2kqtkrfpu/Screenshot%20from%202017-03-16%2016%3A43%3A42.png?dl=0 > > For some reason is the training for each epoch not done for all datapoints?? > > What could be wrong? > > Here is the code: > > http://pastebin.com/raw/itJFm5a6 > > Anything that seems off? > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From julio at esbet.es Thu Mar 16 12:30:48 2017 From: julio at esbet.es (Julio Antonio Soto de Vicente) Date: Thu, 16 Mar 2017 17:30:48 +0100 Subject: [scikit-learn] Is something wrong with this gridsearchCV? In-Reply-To: References: Message-ID: IMO this has nothing to do with GridSearchCV itself... It rather looks like different (verbose) keras models are being trained simultaneously, and therefore "collapsing" your stdout. I recommend setting Keras verbosity level to 3, in order to avoid printing the progress bars during GridSearchCV (which can be misleading). -- Julio > El 16 mar 2017, a las 16:50, Carlton Banks escribi?: > > I am currently using grid search to optimize my keras model? > > Something seemed a bit off during the training? > > https://www.dropbox.com/s/da0ztv2kqtkrfpu/Screenshot%20from%202017-03-16%2016%3A43%3A42.png?dl=0 > > For some reason is the training for each epoch not done for all datapoints?? > > What could be wrong? 
> > Here is the code: > > http://pastebin.com/raw/itJFm5a6 > > Anything that seems off? > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From noflaco at gmail.com Thu Mar 16 12:31:28 2017 From: noflaco at gmail.com (Carlton Banks) Date: Thu, 16 Mar 2017 17:31:28 +0100 Subject: [scikit-learn] Is something wrong with this gridsearchCV? In-Reply-To: <3665E87A-9A82-4AA5-9759-9283804F52BE@gmail.com> References: <3665E87A-9A82-4AA5-9759-9283804F52BE@gmail.com> Message-ID: <97070B35-5127-4EBF-AF0C-67A8B3BFCFE6@gmail.com> My intention with this was to shrink my dataset, to make the grid search a bit faster, and easier to go through? I guess I?ve tackled the wrong way... > Den 16. mar. 2017 kl. 17.27 skrev Sebastian Raschka : > > I am not using Keras and don?t know how nicely it plays with sklearn objects these days, but you are not giving all the data to the grid search object, which is why your model doesn?t get to see the whole dataset during grid search; i.e., you have `np.asarray(input_train[:-(len(input_train)/1000)]` > >> On Mar 16, 2017, at 11:50 AM, Carlton Banks wrote: >> >> I am currently using grid search to optimize my keras model? >> >> Something seemed a bit off during the training? >> >> https://www.dropbox.com/s/da0ztv2kqtkrfpu/Screenshot%20from%202017-03-16%2016%3A43%3A42.png?dl=0 >> >> For some reason is the training for each epoch not done for all datapoints?? >> >> What could be wrong? >> >> Here is the code: >> >> http://pastebin.com/raw/itJFm5a6 >> >> Anything that seems off? >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From noflaco at gmail.com Thu Mar 16 12:33:21 2017 From: noflaco at gmail.com (Carlton Banks) Date: Thu, 16 Mar 2017 17:33:21 +0100 Subject: [scikit-learn] Is something wrong with this gridsearchCV? In-Reply-To: References: Message-ID: <19F76B1C-3011-4FD0-A356-C31638AEB85E@gmail.com> I am running this on a super computer, so yes I am running a few training sessions. I guess i will look at the verbose, and the adjust the training data size. > Den 16. mar. 2017 kl. 17.30 skrev Julio Antonio Soto de Vicente : > > IMO this has nothing to do with GridSearchCV itself... > > It rather looks like different (verbose) keras models are being trained simultaneously, and therefore "collapsing" your stdout. > > I recommend setting Keras verbosity level to 3, in order to avoid printing the progress bars during GridSearchCV (which can be misleading). > > -- > Julio > > El 16 mar 2017, a las 16:50, Carlton Banks > escribi?: > >> I am currently using grid search to optimize my keras model? >> >> Something seemed a bit off during the training? >> >> https://www.dropbox.com/s/da0ztv2kqtkrfpu/Screenshot%20from%202017-03-16%2016%3A43%3A42.png?dl=0 >> >> For some reason is the training for each epoch not done for all datapoints?? >> >> What could be wrong? >> >> Here is the code: >> >> http://pastebin.com/raw/itJFm5a6 >> >> Anything that seems off? 
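If the goal is only to shrink the data so the search finishes faster, a random subsample is safer than slicing off the tail. A sketch with made-up sizes; note also that with the default 3-fold cross-validation each individual fit still only sees about two thirds of whatever is passed in, which is why the per-epoch sample counts look small:

import numpy as np
from sklearn.model_selection import train_test_split

input_train = np.random.rand(50000, 54)    # placeholders for the real arrays
output_train = np.random.rand(50000)

# Keep a random, representative 1000-row subsample for the grid search.
X_small, _, y_small, _ = train_test_split(
    input_train, output_train, train_size=1000, random_state=7)

# grid.fit(X_small, y_small) then runs the search on the subsample only.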
>> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From karandesai281196 at gmail.com Thu Mar 16 12:49:49 2017 From: karandesai281196 at gmail.com (Karan Desai) Date: Thu, 16 Mar 2017 16:49:49 +0000 Subject: [scikit-learn] [GSoC 2017] First Draft, request for suggestions - Improve Online Learning of Linear Models. In-Reply-To: References: Message-ID: > The problem with callbacks is that for callbacks on each iteration to be feasible, they need to be cython functions.> Otherwise they will be too slow. You could do python callbacks, but they could not be called at every iteration, and so > they wouldn't be suitable to implement something like adagrad or adam. We can implement some plug and play callbacks in Cython and pass a list of strings in constructor of a linear model, deciding which callbacks to execute. How does that sound Andreas ? I can really help with more thoughts about the idea. Regards,Karan. On Wed, Mar 15, 2017 8:18 PM, Andreas Mueller t3kcit at gmail.com wrote: On 03/15/2017 04:48 AM, Karan Desai wrote: > 4. About a tool to anneal learning rate: I suggest a new approach to > look at this - as a callback. I searched through the documentation and > I could not find this way of handling tidbits during training of > models. We should be able to provide a callback to the constructor of > a linear model which can do any dedicated job after every epoch, be it > learning rate annealing, saving model checkpoint, getting custom > verbose output, or as creative as uploading data to server for real > time plots on any website. There has been some effort on doing adagrad but it was ultimately discontinued, I think. There was quite a bit of complexity to handle. The problem with callbacks is that for callbacks on each iteration to be feasible, they need to be cython functions. Otherwise they will be too slow. You could do python callbacks, but they could not be called at every iteration, and so they wouldn't be suitable to implement something like adagrad or adam. Best, Andy _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From noflaco at gmail.com Thu Mar 16 12:51:54 2017 From: noflaco at gmail.com (Carlton Banks) Date: Thu, 16 Mar 2017 17:51:54 +0100 Subject: [scikit-learn] Is something wrong with this gridsearchCV? In-Reply-To: <19F76B1C-3011-4FD0-A356-C31638AEB85E@gmail.com> References: <19F76B1C-3011-4FD0-A356-C31638AEB85E@gmail.com> Message-ID: Ohh.. actually the data size cannot be wrong.. input_train and output_train are both lists? which i then only take a part of ? and then make then to a np.array? So that should not be incorrect. > Den 16. mar. 2017 kl. 17.33 skrev Carlton Banks : > > I am running this on a super computer, so yes I am running a few training sessions. > I guess i will look at the verbose, and the adjust the training data size. > >> Den 16. mar. 2017 kl. 17.30 skrev Julio Antonio Soto de Vicente >: >> >> IMO this has nothing to do with GridSearchCV itself... 
>> >> It rather looks like different (verbose) keras models are being trained simultaneously, and therefore "collapsing" your stdout. >> >> I recommend setting Keras verbosity level to 3, in order to avoid printing the progress bars during GridSearchCV (which can be misleading). >> >> -- >> Julio >> >> El 16 mar 2017, a las 16:50, Carlton Banks > escribi?: >> >>> I am currently using grid search to optimize my keras model? >>> >>> Something seemed a bit off during the training? >>> >>> https://www.dropbox.com/s/da0ztv2kqtkrfpu/Screenshot%20from%202017-03-16%2016%3A43%3A42.png?dl=0 >>> >>> For some reason is the training for each epoch not done for all datapoints?? >>> >>> What could be wrong? >>> >>> Here is the code: >>> >>> http://pastebin.com/raw/itJFm5a6 >>> >>> Anything that seems off? >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From noflaco at gmail.com Thu Mar 16 12:59:25 2017 From: noflaco at gmail.com (Carlton Banks) Date: Thu, 16 Mar 2017 17:59:25 +0100 Subject: [scikit-learn] Is something wrong with this gridsearchCV? In-Reply-To: References: <19F76B1C-3011-4FD0-A356-C31638AEB85E@gmail.com> Message-ID: <478ED313-68AC-4036-B11E-9B7D06A714DC@gmail.com> I haven?t a verbosity level in the code?? but set it to 3 as suggested by Julio? It did not seem to work.. https://www.dropbox.com/s/nr5rattzts0wuvd/Screenshot%20from%202017-03-16%2017%3A56%3A26.png?dl=0 > Den 16. mar. 2017 kl. 17.51 skrev Carlton Banks : > > Ohh.. actually the data size cannot be wrong.. > input_train and output_train are both lists? which i then only take a part of ? and then make then to a np.array? > > So that should not be incorrect. > >> Den 16. mar. 2017 kl. 17.33 skrev Carlton Banks >: >> >> I am running this on a super computer, so yes I am running a few training sessions. >> I guess i will look at the verbose, and the adjust the training data size. >> >>> Den 16. mar. 2017 kl. 17.30 skrev Julio Antonio Soto de Vicente >: >>> >>> IMO this has nothing to do with GridSearchCV itself... >>> >>> It rather looks like different (verbose) keras models are being trained simultaneously, and therefore "collapsing" your stdout. >>> >>> I recommend setting Keras verbosity level to 3, in order to avoid printing the progress bars during GridSearchCV (which can be misleading). >>> >>> -- >>> Julio >>> >>> El 16 mar 2017, a las 16:50, Carlton Banks > escribi?: >>> >>>> I am currently using grid search to optimize my keras model? >>>> >>>> Something seemed a bit off during the training? >>>> >>>> https://www.dropbox.com/s/da0ztv2kqtkrfpu/Screenshot%20from%202017-03-16%2016%3A43%3A42.png?dl=0 >>>> >>>> For some reason is the training for each epoch not done for all datapoints?? >>>> >>>> What could be wrong? >>>> >>>> Here is the code: >>>> >>>> http://pastebin.com/raw/itJFm5a6 >>>> >>>> Anything that seems off? 
>>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From karandesai281196 at gmail.com Thu Mar 16 13:04:47 2017 From: karandesai281196 at gmail.com (Karan Desai) Date: Thu, 16 Mar 2017 17:04:47 +0000 Subject: [scikit-learn] [GSoC 2017] First Draft, request for suggestions - Improve Online Learning of Linear Models. In-Reply-To: References: Message-ID: <056fcfb0-afec-5aaa-ff43-7c26b3e6ef1f@mixmax.com> Oh, and I forgot to mention. Some of the easily doable callbacks include: 1. Verbose Logs (maybe progress bars ? Saw an issue earlier) 2. Model Checkpoints 3. Early Stopping 4. Learning Rate annealing As a second alternative, we can boil everything down and simply define learning rate strategies, like linear, polynomial or exponential decreasing after fixed amount of epochs.For a very naive alternative, we can even end up allowing the user to provide a list having length equal to max_iter?if it is specified. But this doesn't sound too appetizing to me. Karan. -------------- next part -------------- An HTML attachment was scrubbed... URL: From julio at esbet.es Thu Mar 16 13:05:42 2017 From: julio at esbet.es (Julio Antonio Soto de Vicente) Date: Thu, 16 Mar 2017 18:05:42 +0100 Subject: [scikit-learn] Is something wrong with this gridsearchCV? In-Reply-To: <478ED313-68AC-4036-B11E-9B7D06A714DC@gmail.com> References: <19F76B1C-3011-4FD0-A356-C31638AEB85E@gmail.com> <478ED313-68AC-4036-B11E-9B7D06A714DC@gmail.com> Message-ID: Your code is perfectly fine. You are training 10 networks in parallel (since you have n_jobs=10), so each network started training in its own, and outputing its progress independently. Given enough amount of time, you will see that all 10 networks will eventually get to epoch number 2, and 10 messages of epoch #2 will be printed out. -- Julio > El 16 mar 2017, a las 17:59, Carlton Banks escribi?: > > I haven?t a verbosity level in the code?? but set it to 3 as suggested by Julio? It did not seem to work.. > > https://www.dropbox.com/s/nr5rattzts0wuvd/Screenshot%20from%202017-03-16%2017%3A56%3A26.png?dl=0 > >> Den 16. mar. 2017 kl. 17.51 skrev Carlton Banks : >> >> Ohh.. actually the data size cannot be wrong.. >> input_train and output_train are both lists? which i then only take a part of ? and then make then to a np.array? >> >> So that should not be incorrect. >> >>> Den 16. mar. 2017 kl. 17.33 skrev Carlton Banks : >>> >>> I am running this on a super computer, so yes I am running a few training sessions. >>> I guess i will look at the verbose, and the adjust the training data size. >>> >>>> Den 16. mar. 2017 kl. 17.30 skrev Julio Antonio Soto de Vicente : >>>> >>>> IMO this has nothing to do with GridSearchCV itself... >>>> >>>> It rather looks like different (verbose) keras models are being trained simultaneously, and therefore "collapsing" your stdout. >>>> >>>> I recommend setting Keras verbosity level to 3, in order to avoid printing the progress bars during GridSearchCV (which can be misleading). >>>> >>>> -- >>>> Julio >>>> >>>>> El 16 mar 2017, a las 16:50, Carlton Banks escribi?: >>>>> >>>>> I am currently using grid search to optimize my keras model? 
>>>>> >>>>> Something seemed a bit off during the training? >>>>> >>>>> https://www.dropbox.com/s/da0ztv2kqtkrfpu/Screenshot%20from%202017-03-16%2016%3A43%3A42.png?dl=0 >>>>> >>>>> For some reason is the training for each epoch not done for all datapoints?? >>>>> >>>>> What could be wrong? >>>>> >>>>> Here is the code: >>>>> >>>>> http://pastebin.com/raw/itJFm5a6 >>>>> >>>>> Anything that seems off? >>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From noflaco at gmail.com Thu Mar 16 13:08:29 2017 From: noflaco at gmail.com (Carlton Banks) Date: Thu, 16 Mar 2017 18:08:29 +0100 Subject: [scikit-learn] Is something wrong with this gridsearchCV? In-Reply-To: References: <19F76B1C-3011-4FD0-A356-C31638AEB85E@gmail.com> <478ED313-68AC-4036-B11E-9B7D06A714DC@gmail.com> Message-ID: <9612A7D3-8F63-48AB-8B9E-CBC070330A4A@gmail.com> ahh.. makes sense.. but would have hoped i could parelize it as i have so many cores to run on.. > Den 16. mar. 2017 kl. 18.05 skrev Julio Antonio Soto de Vicente : > > Your code is perfectly fine. > > You are training 10 networks in parallel (since you have n_jobs=10), so each network started training in its own, and outputing its progress independently. > > Given enough amount of time, you will see that all 10 networks will eventually get to epoch number 2, and 10 messages of epoch #2 will be printed out. > > -- > Julio > > El 16 mar 2017, a las 17:59, Carlton Banks > escribi?: > >> I haven?t a verbosity level in the code?? but set it to 3 as suggested by Julio? It did not seem to work.. >> >> https://www.dropbox.com/s/nr5rattzts0wuvd/Screenshot%20from%202017-03-16%2017%3A56%3A26.png?dl=0 >> >>> Den 16. mar. 2017 kl. 17.51 skrev Carlton Banks >: >>> >>> Ohh.. actually the data size cannot be wrong.. >>> input_train and output_train are both lists? which i then only take a part of ? and then make then to a np.array? >>> >>> So that should not be incorrect. >>> >>>> Den 16. mar. 2017 kl. 17.33 skrev Carlton Banks >: >>>> >>>> I am running this on a super computer, so yes I am running a few training sessions. >>>> I guess i will look at the verbose, and the adjust the training data size. >>>> >>>>> Den 16. mar. 2017 kl. 17.30 skrev Julio Antonio Soto de Vicente >: >>>>> >>>>> IMO this has nothing to do with GridSearchCV itself... >>>>> >>>>> It rather looks like different (verbose) keras models are being trained simultaneously, and therefore "collapsing" your stdout. >>>>> >>>>> I recommend setting Keras verbosity level to 3, in order to avoid printing the progress bars during GridSearchCV (which can be misleading). >>>>> >>>>> -- >>>>> Julio >>>>> >>>>> El 16 mar 2017, a las 16:50, Carlton Banks > escribi?: >>>>> >>>>>> I am currently using grid search to optimize my keras model? >>>>>> >>>>>> Something seemed a bit off during the training? 
>>>>>> >>>>>> https://www.dropbox.com/s/da0ztv2kqtkrfpu/Screenshot%20from%202017-03-16%2016%3A43%3A42.png?dl=0 >>>>>> >>>>>> For some reason is the training for each epoch not done for all datapoints?? >>>>>> >>>>>> What could be wrong? >>>>>> >>>>>> Here is the code: >>>>>> >>>>>> http://pastebin.com/raw/itJFm5a6 >>>>>> >>>>>> Anything that seems off? >>>>>> _______________________________________________ >>>>>> scikit-learn mailing list >>>>>> scikit-learn at python.org >>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From julio at esbet.es Thu Mar 16 14:09:51 2017 From: julio at esbet.es (Julio Antonio Soto de Vicente) Date: Thu, 16 Mar 2017 19:09:51 +0100 Subject: [scikit-learn] Is something wrong with this gridsearchCV? In-Reply-To: <9612A7D3-8F63-48AB-8B9E-CBC070330A4A@gmail.com> References: <19F76B1C-3011-4FD0-A356-C31638AEB85E@gmail.com> <478ED313-68AC-4036-B11E-9B7D06A714DC@gmail.com> <9612A7D3-8F63-48AB-8B9E-CBC070330A4A@gmail.com> Message-ID: You totally can. In fact, there's a tradeoff between how many cpu cores each network uses and the number of parallel networks you can train. By default, any Keras network will use "as many cores it makes sense for the depth of your network and the amount of data you have" (both over Theano and Tensorflow). Given you are using an convnet, chances are that each network will decide to use, probably, 5 or more. Unless, the n_jobs in your GridSearchCV is "quite high" (high depends on the number of cpu cores on your machine). If your machine has 10 cores and n_jobs=10, each Keras network will use 1 core. If n_jobs=2, each network will use 5 cores, and so on. -- Julio > El 16 mar 2017, a las 18:08, Carlton Banks escribi?: > > ahh.. makes sense.. but would have hoped i could parelize it as i have so many cores to run on.. >> Den 16. mar. 2017 kl. 18.05 skrev Julio Antonio Soto de Vicente : >> >> Your code is perfectly fine. >> >> You are training 10 networks in parallel (since you have n_jobs=10), so each network started training in its own, and outputing its progress independently. >> >> Given enough amount of time, you will see that all 10 networks will eventually get to epoch number 2, and 10 messages of epoch #2 will be printed out. >> >> -- >> Julio >> >>> El 16 mar 2017, a las 17:59, Carlton Banks escribi?: >>> >>> I haven?t a verbosity level in the code?? but set it to 3 as suggested by Julio? It did not seem to work.. >>> >>> https://www.dropbox.com/s/nr5rattzts0wuvd/Screenshot%20from%202017-03-16%2017%3A56%3A26.png?dl=0 >>> >>>> Den 16. mar. 2017 kl. 17.51 skrev Carlton Banks : >>>> >>>> Ohh.. actually the data size cannot be wrong.. >>>> input_train and output_train are both lists? which i then only take a part of ? and then make then to a np.array? >>>> >>>> So that should not be incorrect. >>>> >>>>> Den 16. mar. 2017 kl. 
17.33 skrev Carlton Banks : >>>>> >>>>> I am running this on a super computer, so yes I am running a few training sessions. >>>>> I guess i will look at the verbose, and the adjust the training data size. >>>>> >>>>>> Den 16. mar. 2017 kl. 17.30 skrev Julio Antonio Soto de Vicente : >>>>>> >>>>>> IMO this has nothing to do with GridSearchCV itself... >>>>>> >>>>>> It rather looks like different (verbose) keras models are being trained simultaneously, and therefore "collapsing" your stdout. >>>>>> >>>>>> I recommend setting Keras verbosity level to 3, in order to avoid printing the progress bars during GridSearchCV (which can be misleading). >>>>>> >>>>>> -- >>>>>> Julio >>>>>> >>>>>>> El 16 mar 2017, a las 16:50, Carlton Banks escribi?: >>>>>>> >>>>>>> I am currently using grid search to optimize my keras model? >>>>>>> >>>>>>> Something seemed a bit off during the training? >>>>>>> >>>>>>> https://www.dropbox.com/s/da0ztv2kqtkrfpu/Screenshot%20from%202017-03-16%2016%3A43%3A42.png?dl=0 >>>>>>> >>>>>>> For some reason is the training for each epoch not done for all datapoints?? >>>>>>> >>>>>>> What could be wrong? >>>>>>> >>>>>>> Here is the code: >>>>>>> >>>>>>> http://pastebin.com/raw/itJFm5a6 >>>>>>> >>>>>>> Anything that seems off? >>>>>>> _______________________________________________ >>>>>>> scikit-learn mailing list >>>>>>> scikit-learn at python.org >>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>> _______________________________________________ >>>>>> scikit-learn mailing list >>>>>> scikit-learn at python.org >>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>> >>>> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From noflaco at gmail.com Fri Mar 17 17:46:56 2017 From: noflaco at gmail.com (Carlton Banks) Date: Fri, 17 Mar 2017 22:46:56 +0100 Subject: [scikit-learn] Intermediate results using gridsearchCV? Message-ID: <127AEC21-D123-4FD5-876E-AB74D10C66FA@gmail.com> Is it possible to receive intermediate the intermediate result of a gridsearchcv? instead getting the final result? From noflaco at gmail.com Fri Mar 17 23:36:02 2017 From: noflaco at gmail.com (Carlton Banks) Date: Sat, 18 Mar 2017 04:36:02 +0100 Subject: [scikit-learn] (no subject) Message-ID: <12CE352D-D9FC-4144-BD1D-D19AC182BA74@gmail.com> I am currently struggling with getting good results with my CNN in which i decided to optimize parameter using grid search. I am currently trying to use scikit-learn GridSearchCV. 
# Imports implied by the snippet (Keras 1.x-era API; ReduceLROnPlateau,
# EarlyStopping and CSVLogger from keras.callbacks are only needed for the
# commented-out fit further down):
import numpy as np
from keras.models import Sequential
from keras.layers import ZeroPadding2D, Convolution2D, MaxPooling2D, Flatten, Dense
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import GridSearchCV


def create_model(init_mode='uniform', activation_mode='linear',
                 optimizer_mode='adam', activation_mode_conv='linear'):
    # Convolutional front end on (6, 3, 3) inputs, padded so the 3x3
    # convolutions have room to slide.
    model = Sequential()
    model.add(ZeroPadding2D((6, 4), input_shape=(6, 3, 3)))
    model.add(Convolution2D(32, 3, 3, activation=activation_mode_conv))
    print model.output_shape
    model.add(Convolution2D(32, 3, 3, activation=activation_mode_conv))
    print model.output_shape
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 1)))
    print model.output_shape
    model.add(Convolution2D(64, 3, 3, activation=activation_mode_conv))
    print model.output_shape
    model.add(Convolution2D(64, 3, 3, activation=activation_mode_conv))
    print model.output_shape
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 1)))
    model.add(Flatten())
    print model.output_shape
    # Dense head. Keras infers the input size of every layer after the first
    # from the previous layer, so the explicit input_dim values below are
    # effectively ignored.
    model.add(Dense(output_dim=32, input_dim=64, init=init_mode, activation=activation_mode))
    model.add(Dense(output_dim=13, input_dim=50, init=init_mode, activation=activation_mode))
    model.add(Dense(output_dim=1, input_dim=13, init=init_mode, activation=activation_mode))
    model.add(Dense(output_dim=1, init=init_mode, activation=activation_mode))
    # print model.summary()
    model.compile(loss='mean_squared_error', optimizer=optimizer_mode)
    return model

# reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.01, patience=3, verbose=1,
#                               mode='auto', epsilon=0.1, cooldown=0, min_lr=0.000000000000000001)
# stop = EarlyStopping(monitor='val_loss', min_delta=0, patience=5, verbose=1, mode='auto')
# log = csv_logger = CSVLogger('training_' + str(i) + '.csv')
# print "Model Train"
# hist_current = model.fit(np.array(data_train_input),
#                          np.array(data_train_output),
#                          shuffle=False,
#                          validation_data=(np.array(data_test_input), np.array(data_test_output)),
#                          validation_split=0.1,
#                          nb_epoch=150000,
#                          verbose=1,
#                          callbacks=[reduce_lr, log, stop])
# print()
# print model.summary()
# print "Model stored"
# model.save(spectogram_path + "Model" + str(feature) + ".h5")
# model.save_weights(spectogram_path + "Model" + str(feature) + "_weights.h5")
# del model

## Make it work for other feature ranges
## Add the CNN part and test it
## Try with gabor kernels as suggested by the other paper..
input_train, input_test, output_train, output_test = model(
    0,
    train_input_data_interweawed_normalized[:-(len(train_input_data_interweawed_normalized) - 1000)],
    output_data_train[:-(len(output_data_train) - 1000)],
    test_input_data_interweawed_normalized[:-(len(test_input_data_interweawed_normalized) - 1000)],
    output_data_test[:-(len(output_data_test) - 1000)])

# Free the intermediate datasets that are no longer needed.
del test_input_data
del test_name
del test_input_data_normalized
del test_name_normalized
del test_input_data_interweawed
del test_name_interweawed
del test_input_data_interweawed_normalized
del test_name_interweawed_normalized
del train_input_data
del train_name
del train_input_data_normalized
del train_name_normalized
del train_input_data_interweawed
del train_name_interweawed
del train_input_data_interweawed_normalized
del train_name_interweawed_normalized

seed = 7
np.random.seed(seed)

print "Regressor"
model = KerasRegressor(build_fn=create_model, verbose=10)

init_mode_list = ['uniform', 'lecun_uniform', 'normal', 'zero', 'glorot_normal', 'glorot_uniform', 'he_normal', 'he_uniform']
activation_mode_list = ['softmax', 'softplus', 'softsign', 'relu', 'tanh', 'sigmoid', 'hard_sigmoid', 'linear']
activation_mode_list_conv = ['softplus', 'softsign', 'relu', 'tanh', 'sigmoid', 'hard_sigmoid', 'linear']
optimizer_mode_list = ['SGD', 'RMSprop', 'Adagrad', 'Adadelta', 'Adam', 'Adamax', 'Nadam']
batch_size_list = [10, 20, 40, 60, 80, 100]
epochs = [10, 50, 100]

# Note: this grid has 8 * 8 * 7 * 7 * 6 * 3 = 56448 parameter combinations,
# each refit on every cross-validation fold.
param_grid = dict(init_mode=init_mode_list,
                  batch_size=batch_size_list,
                  nb_epoch=epochs,
                  activation_mode=activation_mode_list,
                  optimizer_mode=optimizer_mode_list,
                  activation_mode_conv=activation_mode_list_conv)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)

print "Grid fit"
grid_result = grid.fit(np.asarray(input_train), np.array(output_train))

# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

This runs, but the problem is that it only provides a result at the end. I ran the code once, but then it crashed with this error message:

cannot allocate memory for thread-local data: ABORT

I am not sure what could cause this problem?

From b113053 at iiit-bh.ac.in  Sat Mar 18 00:50:10 2017
From: b113053 at iiit-bh.ac.in (Afzal Ansari)
Date: Sat, 18 Mar 2017 10:20:10 +0530
Subject: [scikit-learn] Regarding Adaboost classifier
Message-ID: 

Hello Developers!
I am currently working on feature extraction method which is based on Haar features for image classification. I am unable to find pure implementation of adaboost classifier algorithm on the internet even on scikit learn web. I need to train the classifier using adaboost classifier to obtain Haar features from image dataset.
Please help me regarding this code. Reply soon.

Thanks in advance.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From b113053 at iiit-bh.ac.in Sun Mar 19 01:21:27 2017 From: b113053 at iiit-bh.ac.in (Afzal Ansari) Date: Sun, 19 Mar 2017 10:51:27 +0530 Subject: [scikit-learn] Regarding Adaboost classifier In-Reply-To: <20170318173343.5615696.68043.144630@gmail.com> References: <20170318173343.5615696.68043.144630@gmail.com> Message-ID: Hello Sir, I want to classify images containing negative and positive samples using Adaboost classifier. So how can I do that classification? Please help me regarding this. Thanks. On Sat, Mar 18, 2017 at 11:03 PM, Francois Dion wrote: > You need to provide more details on exactly what you need. I'll take a > stab at it: > > Are you trying to replicate OpenCV cascade training? > If so, what they call DAB is Scikit learn adaboostclassifier ( > http://scikit-learn.org/stable/modules/generated/sklearn.ensemble. > AdaBoostClassifier.html)? with algorithm=SAMME. > RAB is SAMME.R. > > > ?Francois > > > Sent from my BlackBerry 10 Darkphone > *From: *Afzal Ansari > *Sent: *Saturday, March 18, 2017 00:51 > *To: *scikit-learn at python.org > *Reply To: *Scikit-learn user and developer mailing list > *Subject: *[scikit-learn] Regarding Adaboost classifier > > Hello Developers! > I am currently working on feature extraction method which is based on > Haar features for image classification. I am unable to find pure > implementation of adaboost classifier algorithm on the internet even on > scikit learn web. I need to train the classifier using adaboost classifier > to obtain Haar features from image dataset. > Please help me regarding this code. Reply soon. > > Thanks in advance. > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jmschreiber91 at gmail.com Sun Mar 19 02:19:26 2017 From: jmschreiber91 at gmail.com (Jacob Schreiber) Date: Sat, 18 Mar 2017 23:19:26 -0700 Subject: [scikit-learn] Regarding Adaboost classifier In-Reply-To: References: <20170318173343.5615696.68043.144630@gmail.com> Message-ID: You really need to provide more details with what exactly you're stuck with. If you've extracted useful features from some image into a matrix X with binary labels y you can just do `clf.fit(X, y)` to train the classifier. On Sat, Mar 18, 2017 at 10:21 PM, Afzal Ansari wrote: > Hello Sir, > I want to classify images containing negative and positive samples using > Adaboost classifier. So how can I do that classification? Please help me > regarding this. > > Thanks. > > On Sat, Mar 18, 2017 at 11:03 PM, Francois Dion > wrote: > >> You need to provide more details on exactly what you need. I'll take a >> stab at it: >> >> Are you trying to replicate OpenCV cascade training? >> If so, what they call DAB is Scikit learn adaboostclassifier ( >> http://scikit-learn.org/stable/modules/generated/sklearn. >> ensemble.AdaBoostClassifier.html)? with algorithm=SAMME. >> RAB is SAMME.R. >> >> >> ?Francois >> >> >> Sent from my BlackBerry 10 Darkphone >> *From: *Afzal Ansari >> *Sent: *Saturday, March 18, 2017 00:51 >> *To: *scikit-learn at python.org >> *Reply To: *Scikit-learn user and developer mailing list >> *Subject: *[scikit-learn] Regarding Adaboost classifier >> >> Hello Developers! >> I am currently working on feature extraction method which is based on >> Haar features for image classification. 
I am unable to find pure >> implementation of adaboost classifier algorithm on the internet even on >> scikit learn web. I need to train the classifier using adaboost classifier >> to obtain Haar features from image dataset. >> Please help me regarding this code. Reply soon. >> >> Thanks in advance. >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From b113053 at iiit-bh.ac.in Sun Mar 19 02:57:27 2017 From: b113053 at iiit-bh.ac.in (Afzal Ansari) Date: Sun, 19 Mar 2017 12:27:27 +0530 Subject: [scikit-learn] Regarding Adaboost classifier In-Reply-To: References: <20170318173343.5615696.68043.144630@gmail.com> Message-ID: Thank you for your response. First I want to extract useful features from images so as to get n_features. So can you suggest any method to extract features from image(24*24) dataset? Then I can possibly train the classifier. Thanks. On Sun, Mar 19, 2017 at 11:49 AM, Jacob Schreiber wrote: > You really need to provide more details with what exactly you're stuck > with. If you've extracted useful features from some image into a matrix X > with binary labels y you can just do `clf.fit(X, y)` to train the > classifier. > > On Sat, Mar 18, 2017 at 10:21 PM, Afzal Ansari > wrote: > >> Hello Sir, >> I want to classify images containing negative and positive samples using >> Adaboost classifier. So how can I do that classification? Please help me >> regarding this. >> >> Thanks. >> >> On Sat, Mar 18, 2017 at 11:03 PM, Francois Dion >> wrote: >> >>> You need to provide more details on exactly what you need. I'll take a >>> stab at it: >>> >>> Are you trying to replicate OpenCV cascade training? >>> If so, what they call DAB is Scikit learn adaboostclassifier ( >>> http://scikit-learn.org/stable/modules/generated/sklearn.en >>> semble.AdaBoostClassifier.html)? with algorithm=SAMME. >>> RAB is SAMME.R. >>> >>> >>> ?Francois >>> >>> >>> Sent from my BlackBerry 10 Darkphone >>> *From: *Afzal Ansari >>> *Sent: *Saturday, March 18, 2017 00:51 >>> *To: *scikit-learn at python.org >>> *Reply To: *Scikit-learn user and developer mailing list >>> *Subject: *[scikit-learn] Regarding Adaboost classifier >>> >>> Hello Developers! >>> I am currently working on feature extraction method which is based on >>> Haar features for image classification. I am unable to find pure >>> implementation of adaboost classifier algorithm on the internet even on >>> scikit learn web. I need to train the classifier using adaboost classifier >>> to obtain Haar features from image dataset. >>> Please help me regarding this code. Reply soon. >>> >>> Thanks in advance. 
>>> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.lemaitre58 at gmail.com Sun Mar 19 06:16:43 2017 From: g.lemaitre58 at gmail.com (=?UTF-8?Q?Guillaume_Lema=C3=AEtre?=) Date: Sun, 19 Mar 2017 11:16:43 +0100 Subject: [scikit-learn] Regarding Adaboost classifier In-Reply-To: References: <20170318173343.5615696.68043.144630@gmail.com> Message-ID: I want just to recap a few things: > I need to train the classifier using adaboost classifier to obtain Haar features from image dataset > So can you suggest any method to extract features from image(24*24) datase You just mentioned what was your requirement regarding the feature to extract -> Haar features. My feeling is that you want to reimplement the paper of Viola and Jones for face detection. So you could check with the folks of scikit-image if they have something related -> https://github.com/scikit-image/scikit-image/pull/1444 You could also check opencv which offer functions, classe, and helper -> http://docs.opencv.org/trunk/d7/d8b/tutorial_py_face_detection.html / http://docs.opencv.org/2.4/modules/objdetect/doc/cascade_classification.html At the end, sklearn can help you with the AdaBoostClassifier, ranking of the features, and the evaluation of the pipeline. On 19 March 2017 at 07:57, Afzal Ansari wrote: > Thank you for your response. First I want to extract useful features from > images so as to get n_features. So can you suggest any method to extract > features from image(24*24) dataset? Then I can possibly train the > classifier. > > Thanks. > > On Sun, Mar 19, 2017 at 11:49 AM, Jacob Schreiber > wrote: > >> You really need to provide more details with what exactly you're stuck >> with. If you've extracted useful features from some image into a matrix X >> with binary labels y you can just do `clf.fit(X, y)` to train the >> classifier. >> >> On Sat, Mar 18, 2017 at 10:21 PM, Afzal Ansari >> wrote: >> >>> Hello Sir, >>> I want to classify images containing negative and positive samples >>> using Adaboost classifier. So how can I do that classification? Please help >>> me regarding this. >>> >>> Thanks. >>> >>> On Sat, Mar 18, 2017 at 11:03 PM, Francois Dion >> > wrote: >>> >>>> You need to provide more details on exactly what you need. I'll take a >>>> stab at it: >>>> >>>> Are you trying to replicate OpenCV cascade training? >>>> If so, what they call DAB is Scikit learn adaboostclassifier ( >>>> http://scikit-learn.org/stable/modules/generated/sklearn.en >>>> semble.AdaBoostClassifier.html)? with algorithm=SAMME. >>>> RAB is SAMME.R. >>>> >>>> >>>> ?Francois >>>> >>>> >>>> Sent from my BlackBerry 10 Darkphone >>>> *From: *Afzal Ansari >>>> *Sent: *Saturday, March 18, 2017 00:51 >>>> *To: *scikit-learn at python.org >>>> *Reply To: *Scikit-learn user and developer mailing list >>>> *Subject: *[scikit-learn] Regarding Adaboost classifier >>>> >>>> Hello Developers! 
>>>> I am currently working on feature extraction method which is based on >>>> Haar features for image classification. I am unable to find pure >>>> implementation of adaboost classifier algorithm on the internet even on >>>> scikit learn web. I need to train the classifier using adaboost classifier >>>> to obtain Haar features from image dataset. >>>> Please help me regarding this code. Reply soon. >>>> >>>> Thanks in advance. >>>> >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -- Guillaume Lemaitre INRIA Saclay - Parietal team Center for Data Science Paris-Saclay https://glemaitre.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From joel.nothman at gmail.com Sun Mar 19 06:46:20 2017 From: joel.nothman at gmail.com (Joel Nothman) Date: Sun, 19 Mar 2017 21:46:20 +1100 Subject: [scikit-learn] Intermediate results using gridsearchCV? In-Reply-To: <127AEC21-D123-4FD5-876E-AB74D10C66FA@gmail.com> References: <127AEC21-D123-4FD5-876E-AB74D10C66FA@gmail.com> Message-ID: Not sure what you mean. Have you used cv_results_ On 18 March 2017 at 08:46, Carlton Banks wrote: > Is it possible to receive intermediate the intermediate result of a > gridsearchcv? > > instead getting the final result? > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vaggi.federico at gmail.com Sun Mar 19 06:49:30 2017 From: vaggi.federico at gmail.com (federico vaggi) Date: Sun, 19 Mar 2017 10:49:30 +0000 Subject: [scikit-learn] Intermediate results using gridsearchCV? In-Reply-To: References: <127AEC21-D123-4FD5-876E-AB74D10C66FA@gmail.com> Message-ID: I imagine he is suggesting to have an iterator that yields results while it's running, instead of only getting the result at the end of the run. On Sun, 19 Mar 2017 at 11:46 Joel Nothman wrote: > Not sure what you mean. Have you used cv_results_ > > On 18 March 2017 at 08:46, Carlton Banks wrote: > > Is it possible to receive intermediate the intermediate result of a > gridsearchcv? > > instead getting the final result? > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From joel.nothman at gmail.com Sun Mar 19 07:14:45 2017 From: joel.nothman at gmail.com (Joel Nothman) Date: Sun, 19 Mar 2017 22:14:45 +1100 Subject: [scikit-learn] Intermediate results using gridsearchCV? 
In-Reply-To: References: <127AEC21-D123-4FD5-876E-AB74D10C66FA@gmail.com> Message-ID: Best bet for that at the moment is write a wrapper or mixin for your base estimator. On 19 March 2017 at 21:49, federico vaggi wrote: > I imagine he is suggesting to have an iterator that yields results while > it's running, instead of only getting the result at the end of the run. > > On Sun, 19 Mar 2017 at 11:46 Joel Nothman wrote: > >> Not sure what you mean. Have you used cv_results_ >> >> On 18 March 2017 at 08:46, Carlton Banks wrote: >> >> Is it possible to receive intermediate the intermediate result of a >> gridsearchcv? >> >> instead getting the final result? >> >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From b113053 at iiit-bh.ac.in Sun Mar 19 09:19:08 2017 From: b113053 at iiit-bh.ac.in (Afzal Ansari) Date: Sun, 19 Mar 2017 18:49:08 +0530 Subject: [scikit-learn] Regarding Adaboost classifier In-Reply-To: References: <20170318173343.5615696.68043.144630@gmail.com> Message-ID: Thank you for your quick kind response. You got what I exactly want to know. Now I can expect my pre-processing methods are to be done. And also I have got clear now from this sklearn can help you with the AdaBoostClassifier, ranking of the features, and the evaluation of the pipeline. On Sun, Mar 19, 2017 at 3:46 PM, Guillaume Lema?tre wrote: > I want just to recap a few things: > > > I need to train the classifier using adaboost classifier to obtain Haar > features from image dataset > > So can you suggest any method to extract features from image(24*24) > datase > > You just mentioned what was your requirement regarding the feature to > extract -> Haar features. > My feeling is that you want to reimplement the paper of Viola and Jones > for face detection. > > So you could check with the folks of scikit-image if they have something > related -> https://github.com/scikit-image/scikit-image/pull/1444 > You could also check opencv which offer functions, classe, and helper -> > http://docs.opencv.org/trunk/d7/d8b/tutorial_py_face_detection.html / > http://docs.opencv.org/2.4/modules/objdetect/doc/cascade_ > classification.html > > At the end, sklearn can help you with the AdaBoostClassifier, ranking of > the features, and the evaluation of the pipeline. > > > On 19 March 2017 at 07:57, Afzal Ansari wrote: > >> Thank you for your response. First I want to extract useful features from >> images so as to get n_features. So can you suggest any method to extract >> features from image(24*24) dataset? Then I can possibly train the >> classifier. >> >> Thanks. >> >> On Sun, Mar 19, 2017 at 11:49 AM, Jacob Schreiber < >> jmschreiber91 at gmail.com> wrote: >> >>> You really need to provide more details with what exactly you're stuck >>> with. If you've extracted useful features from some image into a matrix X >>> with binary labels y you can just do `clf.fit(X, y)` to train the >>> classifier. 
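For reference, a minimal sketch of that `clf.fit(X, y)` step with scikit-learn's AdaBoostClassifier, assuming the Haar-like features have already been extracted into a matrix X with binary labels y (all names, shapes and numbers below are placeholders, not code from this thread):

    import numpy as np
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.model_selection import cross_val_score

    # X: Haar-like features extracted from 24x24 windows, y: 1 = face, 0 = non-face
    X = np.random.rand(200, 500)           # placeholder for the real feature matrix
    y = np.random.randint(0, 2, size=200)  # placeholder labels

    # algorithm='SAMME.R' corresponds to "RAB" and 'SAMME' to "DAB" in the OpenCV terms above
    clf = AdaBoostClassifier(n_estimators=200, algorithm='SAMME.R')

    # evaluate the whole pipeline with cross-validation before the final fit
    print(cross_val_score(clf, X, y, cv=5).mean())

    clf.fit(X, y)
    # feature_importances_ gives a ranking of the Haar features used by the ensemble
    print(np.argsort(clf.feature_importances_)[::-1][:20])

This covers only the boosting and feature-ranking part; the attentional cascade from Viola and Jones would still have to be built on top of it.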
>>> >>> On Sat, Mar 18, 2017 at 10:21 PM, Afzal Ansari >>> wrote: >>> >>>> Hello Sir, >>>> I want to classify images containing negative and positive samples >>>> using Adaboost classifier. So how can I do that classification? Please help >>>> me regarding this. >>>> >>>> Thanks. >>>> >>>> On Sat, Mar 18, 2017 at 11:03 PM, Francois Dion < >>>> francois.dion at gmail.com> wrote: >>>> >>>>> You need to provide more details on exactly what you need. I'll take a >>>>> stab at it: >>>>> >>>>> Are you trying to replicate OpenCV cascade training? >>>>> If so, what they call DAB is Scikit learn adaboostclassifier ( >>>>> http://scikit-learn.org/stable/modules/generated/sklearn.en >>>>> semble.AdaBoostClassifier.html)? with algorithm=SAMME. >>>>> RAB is SAMME.R. >>>>> >>>>> >>>>> ?Francois >>>>> >>>>> >>>>> Sent from my BlackBerry 10 Darkphone >>>>> *From: *Afzal Ansari >>>>> *Sent: *Saturday, March 18, 2017 00:51 >>>>> *To: *scikit-learn at python.org >>>>> *Reply To: *Scikit-learn user and developer mailing list >>>>> *Subject: *[scikit-learn] Regarding Adaboost classifier >>>>> >>>>> Hello Developers! >>>>> I am currently working on feature extraction method which is based on >>>>> Haar features for image classification. I am unable to find pure >>>>> implementation of adaboost classifier algorithm on the internet even on >>>>> scikit learn web. I need to train the classifier using adaboost classifier >>>>> to obtain Haar features from image dataset. >>>>> Please help me regarding this code. Reply soon. >>>>> >>>>> Thanks in advance. >>>>> >>>>> >>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>> >>>>> >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > > -- > Guillaume Lemaitre > INRIA Saclay - Parietal team > Center for Data Science Paris-Saclay > https://glemaitre.github.io/ > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tevang3 at gmail.com Sun Mar 19 15:47:36 2017 From: tevang3 at gmail.com (Thomas Evangelidis) Date: Sun, 19 Mar 2017 20:47:36 +0100 Subject: [scikit-learn] recommended feature selection method to train an MLPRegressor Message-ID: Which of the following methods would you recommend to select good features (<=50) from a set of 534 features in order to train a MLPregressor? Please take into account that the datasets I use for training are small. http://scikit-learn.org/stable/modules/feature_selection.html And please don't tell me to use a neural network that supports the dropout or any other algorithm for feature elimination. This is not applicable in my case because I want to know the best 50 features in order to append them to other types of feature that I am confident that are important. ?cheers Thomas? 
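A minimal sketch of two options from that page, on placeholder data: univariate selection is cheap but only captures simple feature-target dependence, while the greedy sequential search discussed in the replies is model-aware but slow (the mlxtend call assumes a recent mlxtend version):

    import numpy as np
    from sklearn.feature_selection import SelectKBest, mutual_info_regression
    from sklearn.neural_network import MLPRegressor

    X = np.random.rand(100, 534)  # placeholder: small dataset, 534 features
    y = np.random.rand(100)

    # Option 1: cheap univariate screening down to 50 features
    selector = SelectKBest(mutual_info_regression, k=50)
    X_reduced = selector.fit_transform(X, y)
    kept = selector.get_support(indices=True)  # indices of the 50 selected features

    # Option 2: greedy forward selection with mlxtend, wrapping the same MLPRegressor
    # (much slower: it refits the network many times)
    from mlxtend.feature_selection import SequentialFeatureSelector

    mlp = MLPRegressor(hidden_layer_sizes=(50,), max_iter=500, random_state=0)
    sfs = SequentialFeatureSelector(mlp, k_features=50, forward=True,
                                    scoring='neg_mean_squared_error', cv=3)
    sfs.fit(X, y)
    print(sfs.k_feature_idx_)  # indices of the 50 features chosen greedily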
-- ====================================================================== Thomas Evangelidis Research Specialist CEITEC - Central European Institute of Technology Masaryk University Kamenice 5/A35/1S081, 62500 Brno, Czech Republic email: tevang at pharm.uoa.gr tevang3 at gmail.com website: https://sites.google.com/site/thomasevangelidishomepage/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From t3kcit at gmail.com Sun Mar 19 18:23:07 2017 From: t3kcit at gmail.com (Andreas Mueller) Date: Sun, 19 Mar 2017 18:23:07 -0400 Subject: [scikit-learn] recommended feature selection method to train an MLPRegressor In-Reply-To: References: Message-ID: <6b490067-962e-02fc-5157-9a487fc1aa83@gmail.com> On 03/19/2017 03:47 PM, Thomas Evangelidis wrote: > Which of the following methods would you recommend to select good > features (<=50) from a set of 534 features in order to train a > MLPregressor? Please take into account that the datasets I use for > training are small. > > http://scikit-learn.org/stable/modules/feature_selection.html > > And please don't tell me to use a neural network that supports the > dropout or any other algorithm for feature elimination. This is not > applicable in my case because I want to know the best 50 features in > order to append them to other types of feature that I am confident > that are important. > You can always use forward or backward selection as implemented in mlxtend if you're patient. As your dataset is small that might work. However, it might be hard tricky to get the MLP to run consistently - though maybe not... -------------- next part -------------- An HTML attachment was scrubbed... URL: From se.raschka at gmail.com Sun Mar 19 19:32:45 2017 From: se.raschka at gmail.com (Sebastian Raschka) Date: Sun, 19 Mar 2017 19:32:45 -0400 Subject: [scikit-learn] recommended feature selection method to train an MLPRegressor In-Reply-To: <6b490067-962e-02fc-5157-9a487fc1aa83@gmail.com> References: <6b490067-962e-02fc-5157-9a487fc1aa83@gmail.com> Message-ID: Hm, that?s tricky. I think the other methods listed on http://scikit-learn.org/stable/modules/feature_selection.html could help regarding a computationally cheap solution, but the problem would be that they probably wouldn?t work that well for an MLP due to the linear assumption. And an exhaustive sampling of all subsets would also be impractical/impossible. For all 50 feature subsets, you already have 73353053308199416032348518540326808282134507009732998441913227684085760 combinations :P. A greedy solution like forward or backward selection would be more feasible, but still very expensive in combination with an MLP. On top of that, you also have to consider that neural networks are generally pretty sensitive to hyperparameter settings. So even if you fix the architecture, you probably still want to check if the learning rate etc. is appropriate for each combination of features (by checking the cost and validation error during training). PS: I wouldn?t dismiss dropout, imho. Especially because your training set is small, it could be even crucial to reduce overfitting. I mean it doesn?t remove features from your dataset but just helps the network to rely on particular combinations of features to be always present during training. Your final network will still process all features and dropout will effectively cause your network to ?use? 
more of those features in your ~50 feature subset compared to no dropout (because otherwise, it may just learn to rely of a subset of these 50 features). > On Mar 19, 2017, at 6:23 PM, Andreas Mueller wrote: > > > > On 03/19/2017 03:47 PM, Thomas Evangelidis wrote: >> Which of the following methods would you recommend to select good features (<=50) from a set of 534 features in order to train a MLPregressor? Please take into account that the datasets I use for training are small. >> >> http://scikit-learn.org/stable/modules/feature_selection.html >> >> And please don't tell me to use a neural network that supports the dropout or any other algorithm for feature elimination. This is not applicable in my case because I want to know the best 50 features in order to append them to other types of feature that I am confident that are important. >> > You can always use forward or backward selection as implemented in mlxtend if you're patient. As your dataset is small that might work. > However, it might be hard tricky to get the MLP to run consistently - though maybe not... > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From coderain1 at gmail.com Mon Mar 20 02:05:59 2017 From: coderain1 at gmail.com (John Doe) Date: Mon, 20 Mar 2017 11:35:59 +0530 Subject: [scikit-learn] Anomaly/Outlier detection based on user access for a large application Message-ID: Hi All, I am trying to solve a problem of finding Anomalies/Outliers using application logs of a large KMS. Please find the details below: *Problem Statement*: Find Anomalies/outliers using application access logs in an un-supervised learning environment. Basic use case is to find any suspicious activity by user/group, that deviates from a trend that the algorithm has learned. *Input Data*: Data would be created from log file that are in the following format: "ts, src_ip, decrypt, user_a, group_b, kms_region, key" Where: *ts* : time of access in epoch Eg: 1489840335 *decrypt* : is one of the various possible actions. *user_a*, *group_a* : are the user and group that did the access *kms_region* : the region in which the key exists *key* : the key that was accessed *Train Set*: This comes under the un-supervised learning and hence we cant have a "normal" training set which the model can learn. *Example of anomalies*: 1. User A suddenly accessing from a different IP: xx.yy 2. No. of access for a given key going up suddenly for a given user,key pair 3. Increased access on a generally quite long weekend 4. Increased access on a Thu (compared to last Thursdays) 5. Unusual sequences of actions for a given user. Eg. read, decrypt, delete in quick succession for all keys for a given user ------------------------ >From our research, we have come up with below list of algorithms that are applied to similar problems: - ARIMA : This might be good for timeseries predicting, but will it also learn to flag anomalies like #3, #4, sequences of actions(#5) etc? - scikit-learn's Novelty and Outlier Detection : Not sure if these will address #3, #4 and #5 use cases above. - Neural Networks - k-nearest neighbors - Clustering-Based Anomaly Detection Techniques: k-Means Clustering etc - Parametric Techniques (See Section 7): This might work well on continuous variables, but will it work on discrete features like, is_weekday etc? Also will it cover cases like #4 and #5 above? 
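For the scikit-learn entry in the list above, a minimal sketch of what a first attempt could look like, using IsolationForest on made-up engineered features (one row per log event; every column and value below is hypothetical):

    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.RandomState(0)
    # hypothetical per-event features: [hour_of_day, is_weekend, accesses_last_hour, new_ip_for_user]
    normal = np.column_stack([rng.randint(8, 18, 500), rng.randint(0, 2, 500),
                              rng.poisson(3, 500), np.zeros(500)])
    odd = np.array([[3, 1, 40, 1], [2, 1, 55, 1]])  # night-time bursts from new IPs
    X = np.vstack([normal, odd]).astype(float)

    clf = IsolationForest(n_estimators=100, contamination=0.01, random_state=0)
    clf.fit(X)

    scores = clf.decision_function(X)   # lower score = more anomalous
    print(np.argsort(scores)[:5])       # the two injected events should show up first

This only covers point anomalies on hand-engineered features; the sequence cases (#5) would need features that encode recent action history, or a different model altogether.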
Most of the research I did were on problems that had continuous features and did not consider discrete variables like "Holiday_today?" / succession of events etc.
Any feedback on the algorithm / technique that can be used for above usecases would be highly appreciated. Thanks.

Regards,
John.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mark_stratford at optum.com Mon Mar 20 05:32:43 2017
From: mark_stratford at optum.com (Stratford, Mark A)
Date: Mon, 20 Mar 2017 09:32:43 +0000
Subject: [scikit-learn] Please unsubscribe
Message-ID:

-----Original Message-----
From: scikit-learn [mailto:scikit-learn-bounces+mark_stratford=optum.com at python.org] On Behalf Of scikit-learn-request at python.org
Sent: Monday, March 20, 2017 6:06 AM
To: scikit-learn at python.org
Subject: scikit-learn Digest, Vol 12, Issue 42
From zajac.zygmunt at gmail.com Mon Mar 20 13:45:43 2017
From: zajac.zygmunt at gmail.com (=?UTF-8?Q?Zygmunt_Zaj=c4=85c?=)
Date: Mon, 20 Mar 2017 18:45:43 +0100
Subject: [scikit-learn] A custom loss function for GradientBoostingRegressor
Message-ID: <17f13e76-528d-eb9a-ba4b-b8f6f5aaaf8d@gmail.com>

Hello,

I would like to add a custom loss function for gradient boosting regression. The function is similar to least squares, except that for each example it is OK to either undershoot or overshoot the target - the loss is zero then. There is an additional binary indicator called "under" telling us whether it is OK to undershoot or overshoot. For example:

y   under   p   loss
5     1     4     0
5     0     4     1
5     1     6     1
5     0     6     0

Below is my attempt at implementation. I have three questions:
1. Is it correct?
2. How would you pass "under" to the loss function?
3. Loss functions other than LeastSquaresError() seem to implement _update_terminal_region. Is this necessary in this case, and if so, how to do it?

    def __call__(self, y, pred, sample_weight=None):
        if sample_weight is None:
            pred = pred.ravel()  # pred arrives with shape (n_samples, 1)
            squares = (y - pred) ** 2.0
            # the custom part: no penalty when the error is in the allowed direction
            # ("under" is assumed to be an array aligned with y)
            overshoot_ok = (pred > y) & (under == 0)
            undershoot_ok = (pred < y) & (under == 1)
            squares[overshoot_ok] = 0
            squares[undershoot_ok] = 0
            return np.mean(squares)
        else:
            (...)

    def negative_gradient(self, y, pred, **kargs):
        pred = pred.ravel()
        diffs = y - pred
        overshoot_ok = (pred > y) & (under == 0)
        undershoot_ok = (pred < y) & (under == 1)
        diffs[overshoot_ok] = 0
        diffs[undershoot_ok] = 0
        return diffs

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jmschreiber91 at gmail.com Tue Mar 21 18:27:46 2017
From: jmschreiber91 at gmail.com (Jacob Schreiber)
Date: Tue, 21 Mar 2017 15:27:46 -0700
Subject: [scikit-learn] GSoC 2017
Message-ID:

Starting yesterday, students were able to submit their proposals on the GSoC website. Please review this site thoroughly before making a submission. We're eager to hear what prospective students have in mind for a contribution to sklearn.

As we've said before, mentor time is at a premium this year. If you've posted a proposal and we haven't responded, please keep poking us. I know that personally I tend to wake up to between 30-70 emails and have to triage based on my availability, and that Gael likely scoffs at this small number. Things fall through the cracks. If you haven't heard back, that doesn't mean we don't want your submission; please submit or ask for feedback!

A strong factor in determining if you're going to be chosen will be your availability with the code and methods you'd like to work on.
It is less likely that we will take someone unfamiliar with the code base this year, as there is a large starting cost to getting familiar with an intricate code-base. In your application please emphasize your prior experience with either sklearn code, cython code (if applicable for your project) or machine learning code in general. Let us know if you have any other questions. Jacob -------------- next part -------------- An HTML attachment was scrubbed... URL: From jbbrown at kuhp.kyoto-u.ac.jp Wed Mar 22 01:10:27 2017 From: jbbrown at kuhp.kyoto-u.ac.jp (Brown J.B.) Date: Wed, 22 Mar 2017 14:10:27 +0900 Subject: [scikit-learn] Note of appreciation to Scikit-learn team Message-ID: To all organizers, developers, and maintainers involved in the Scikit-learn project, I would like to share a recent article that researchers from MIT, ETH, and Kyoto University (myself) have published about building efficient models for drug discovery and pharmaceutical data mining. In short, it demonstrates through replicate experiment that neither big data nor complex AI such as deep learning are necessary for efficient drug discovery, and that active learning can guide/assist decision making processes in the real world. The paper's success is underpinned by the use of Scikit-learn's RandomForestClassifier implementation combined with other techniques developed in the work. Therefore, it is a by-product of the volunteerism, hard work, and dedication by those involved in scikit-learn. As the senior author of this study, I wish to share my great appreciation for your efforts. While I am strongly limited in time and can barely contribute to this community, I cannot thank all of you enough for your work - it has made an impact. We are working on theoretical extensions of the work now, as well as pushing the technology forward in applied discovery sciences (in agricultural, pharmaceutical, and medical areas). In the theory and real-world applications, scikit-learn is indispensible. We have made the paper open access, and hope that such will inspire this community as well as those in applied sciences. You will see that the open source software community has been listed in the Acknowledgments. Certainly, we would welcome even the most casual of comments about the paper. The paper can be retrieved from here: http://www.future-science.com/doi/abs/10.4155/fmc-2016-0197 With kindest regards and sincere appreciation, J.B. Brown Kyoto University Graduate School of Medicine Junior Associate Professor and Principal Investigator http://statlsi.med.kyoto-u.ac.jp/~jbbrown PS - To those of you involved in the matplotlib, scipy, and numpy projects, your forwarding of this to those projects would be appreciated. They were also critical. -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Wed Mar 22 02:57:28 2017 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 22 Mar 2017 07:57:28 +0100 Subject: [scikit-learn] Note of appreciation to Scikit-learn team In-Reply-To: References: Message-ID: <20170322065728.GG1835179@phare.normalesup.org> I would just like to say: thank you for stepping up and telling the team. It is a simple fact of life that a development team is more likely to hear about failures than success: people ask for help, or report bugs, when there are having problems. So thank you! It's important and very motivational. Scientific innovation like yours is what inspires me. Ga?l On Wed, Mar 22, 2017 at 02:10:27PM +0900, Brown J.B. 
wrote: > To all organizers, developers, and maintainers involved in the Scikit-learn > project, > I would like to share a recent article that researchers from MIT, ETH, and > Kyoto University (myself) have published about building efficient models for > drug discovery and pharmaceutical data mining. > In short, it demonstrates through replicate experiment that neither big data > nor complex AI such as deep learning are necessary for efficient drug > discovery, and that active learning can guide/assist decision making processes > in the real world. > The paper's success is underpinned by the use of Scikit-learn's > RandomForestClassifier implementation combined with other techniques developed > in the work. > Therefore, it is a by-product of the volunteerism, hard work, and dedication by > those involved in scikit-learn. > As the senior author of this study, I wish to share my great appreciation for > your efforts. > While I am strongly limited in time and can barely contribute to this > community, I cannot thank all of you enough for your work - it has made an > impact. > We are working on theoretical extensions of the work now, as well as pushing > the technology forward in applied discovery sciences (in agricultural, > pharmaceutical, and medical areas).? In the theory and real-world applications, > scikit-learn is indispensible. > We have made the paper open access, and hope that such will inspire this > community as well as those in applied sciences. > You will see that the open source software community has been listed in the > Acknowledgments. > Certainly, we would welcome even the most casual of comments about the paper. > The paper can be retrieved from here: > http://www.future-science.com/doi/abs/10.4155/fmc-2016-0197 > With kindest regards and sincere appreciation, > J.B. Brown > Kyoto University Graduate School of Medicine > Junior Associate Professor and Principal Investigator > http://statlsi.med.kyoto-u.ac.jp/~jbbrown > PS - To those of you involved in the matplotlib, scipy, and numpy projects, > your forwarding of this to those projects would be appreciated.? They were also > critical. > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -- Gael Varoquaux Researcher, INRIA Parietal NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France Phone: ++ 33-1-69-08-79-68 http://gael-varoquaux.info http://twitter.com/GaelVaroquaux From amanpratik10 at gmail.com Wed Mar 22 03:55:45 2017 From: amanpratik10 at gmail.com (Aman Pratik) Date: Wed, 22 Mar 2017 13:25:45 +0530 Subject: [scikit-learn] GSoC 2017 : "Parallel Decision Tree Building" Message-ID: Hello Developers, This is Aman Pratik. I am currently pursuing my B.Tech from Indian Institute of Technology, Varanasi. After doing some research I have found some material on Decision Trees and Parallelization. Hence, I propose my first draft for the project "Parallel Decision Tree Building" for GSoC 2017. Proposal : First Draft Why me? I have been working in Python for the past 2 years and have good idea about Machine Learning algorithms. I am quite familiar with scikit-learn both as a user and a developer. These are the issues/PRs I have worked/working on for the past few months. 
[MRG+1] Issue#5803 : Regression Test added #8112 [MRG] Issue#6673:Make a wrapper around functions that score an individual feature #8038 [MRG] Issue #7987: Embarrassingly parallel "n_restarts_optimizer" in GaussianProcessRegressor #7997 My GitHub Profile: amanp10 I have worked with parallelization in one of my PR, so I am not new to it. I have used cython a couple of times, though as a beginner. I have not used Decision Tree much but I am familiar with the theory and algorithm. Also, I am familiar with Benchmark tests, Unit tests and other technical knowledge I would require for this project. Meanwhile, I have started my study for the subject and gaining experience with Cython. I am looking forward to guidance from the potential mentors or anyone willing to help. Thank You -------------- next part -------------- An HTML attachment was scrubbed... URL: From shubham.bhardwaj2015 at vit.ac.in Wed Mar 22 09:13:12 2017 From: shubham.bhardwaj2015 at vit.ac.in (SHUBHAM BHARDWAJ 15BCE0704) Date: Wed, 22 Mar 2017 18:43:12 +0530 Subject: [scikit-learn] GSoc, 2017 (proposal idea and intro) .reg In-Reply-To: References: Message-ID: Hello Sir, Added benchmarks, kindly let me know further improvements and that whether if its a good idea to consider the next parts listed in the to-do list of my pr for proposal.Thanks. pr: https://github.com/scikit-learn/scikit-learn/pull/8585 Regards Shubham Bhardwaj On Wed, Mar 15, 2017 at 10:58 PM, SHUBHAM BHARDWAJ 15BCE0704 < shubham.bhardwaj2015 at vit.ac.in> wrote: > Hello Sir, > > Greetings. I have coded a sequential version of Scalable Kmeans++ (#8585) > and have included a test script for testing it in the pr's description. > https://github.com/scikit-learn/scikit-learn/pull/8585. > > Regards > Shubham Bhardwaj > > On Tue, Mar 14, 2017 at 3:59 AM, Shreyas Saligrama chandrakan < > ssaligra at hawk.iit.edu> wrote: > >> Hi, >> >> Is it possible for me to contribute a library to introduce SVM's with >> tree kernel (like current available one in svmlight) which is currently >> missing in scikit-learn? >> >> Best, >> Shreyas >> >> On 5 Mar 2017 11:03 a.m., "Andreas Mueller" wrote: >> >>> There was a PR here: >>> https://github.com/scikit-learn/scikit-learn/pull/5530 >>> >>> but it didn't seem to work. Feel free to convince us otherwise ;) >>> >>> >>> On 03/02/2017 08:23 PM, SHUBHAM BHARDWAJ 15BCE0704 wrote: >>> >>> Hello Sir, >>> Very Sorry for the numbers I saw this written in the comments.I assumed >>> -Given the person who suggested the paper might have taken a look into the >>> number of citations.I will make sure to personally check myself. >>> >>> Regards >>> Shubham Bhardwaj >>> >>> On Fri, Mar 3, 2017 at 6:40 AM, Guillaume Lema?tre < >>> g.lemaitre58 at gmail.com> wrote: >>> >>>> I think that you mean this paper -> Scalable K-Means++ -> 218 citations >>>> >>>> On 3 March 2017 at 02:00, SHUBHAM BHARDWAJ 15BCE0704 < >>>> shubham.bhardwaj2015 at vit.ac.in> wrote: >>>> >>>>> Hello Sir, >>>>> >>>>> Thanks a lot for the reply. Sorry for not being elaborate about what I >>>>> was trying to address. I wanted to implement this [ >>>>> http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf] (1200+citations)- >>>>> mentioned in comments. This pertains to the stalled issue #4357 .Proposal >>>>> idea - implementing a scalable kmeans++. >>>>> >>>>> Regards >>>>> Shubham Bhardwaj >>>>> >>>>> On Fri, Mar 3, 2017 at 12:01 AM, Jacob Schreiber < >>>>> jmschreiber91 at gmail.com> wrote: >>>>> >>>>>> Hi Shubham >>>>>> >>>>>> Thanks for your interest. 
I'm eager to see your contributions to >>>>>> sklearn in the future. However, I'm pretty sure kmeans++ is already >>>>>> implemented: http://scikit-learn.org/stable/modules/generate >>>>>> d/sklearn.cluster.KMeans.html >>>>>> >>>>>> Jacob >>>>>> >>>>>> On Thu, Mar 2, 2017 at 1:07 AM, SHUBHAM BHARDWAJ 15BCE0704 < >>>>>> shubham.bhardwaj2015 at vit.ac.in> wrote: >>>>>> >>>>>>> Hello Sir, >>>>>>> >>>>>>> My introduction : >>>>>>> I am a 2nd year student studying Computer Science and engineering >>>>>>> from VIT, Vellore. I work in Google Developers Group VIT. All my experience >>>>>>> has been about collaborating with a lot of people ,working as a team, >>>>>>> building products and learning along the way. >>>>>>> Since scikit-learn is participating this time I am too planning to >>>>>>> submit a proposal. >>>>>>> >>>>>>> Proposal idea: >>>>>>> I am really interested in implementing kmeans++ algorithm.I was >>>>>>> doing some work on DT but I found this very appealing. Just wanted to know >>>>>>> if it can be a good project idea. >>>>>>> >>>>>>> Regards >>>>>>> Shubham Bhardwaj >>>>>>> >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> scikit-learn mailing list >>>>>>> scikit-learn at python.org >>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>>> >>>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> scikit-learn mailing list >>>>>> scikit-learn at python.org >>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>> >>>>>> >>>>> >>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>> >>>>> >>>> >>>> >>>> -- >>>> Guillaume Lemaitre >>>> INRIA Saclay - Ile-de-France >>>> Equipe PARIETAL >>>> guillaume.lemaitre at inria.f r --- >>>> https://glemaitre.github.io/ >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> >>> >>> >>> _______________________________________________ >>> scikit-learn mailing listscikit-learn at python.orghttps://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeff3456 at gmail.com Wed Mar 22 11:37:37 2017 From: jeff3456 at gmail.com (Jeff Lee) Date: Wed, 22 Mar 2017 11:37:37 -0400 Subject: [scikit-learn] Regarding GSoC projects and mentors Message-ID: Hi, My name is Jefferson Lee and I am a computer science student at NYU passionate about machine learning and AI. I was hoping to speak with potential mentors regarding the two suggested projects and how I might research more about the topics to write a very strong proposal. I also wanted to gain learn about some of the stalled projects that might be viable as a GSoC project. Let me know if anyone is available to chat about these topics! Best, Jeff -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jmschreiber91 at gmail.com Wed Mar 22 17:01:12 2017 From: jmschreiber91 at gmail.com (Jacob Schreiber) Date: Wed, 22 Mar 2017 14:01:12 -0700 Subject: [scikit-learn] Regarding GSoC projects and mentors In-Reply-To: References: Message-ID: Hi Jeff I would be overseeing the parallel decision tree building project, and Gael is overseeing the linear models project. This will end up being fairly fluid, as we're looking for the right combination of mentors and students. Jacob On Wed, Mar 22, 2017 at 8:37 AM, Jeff Lee wrote: > Hi, > > My name is Jefferson Lee and I am a computer science student at NYU > passionate about machine learning and AI. > I was hoping to speak with potential mentors regarding the two suggested > projects > > and how I might research more about the topics to write a very strong > proposal. > > I also wanted to gain learn about some of the stalled projects that might > be viable as a GSoC project. > > Let me know if anyone is available to chat about these topics! > > Best, > Jeff > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jmschreiber91 at gmail.com Wed Mar 22 17:08:45 2017 From: jmschreiber91 at gmail.com (Jacob Schreiber) Date: Wed, 22 Mar 2017 14:08:45 -0700 Subject: [scikit-learn] GSoC 2017 : "Parallel Decision Tree Building" In-Reply-To: References: Message-ID: Hi Aman Likely the easiest way to parallelize decision tree building is to parallelize the finding of the best split at each node, as it checks every non-constant feature for the best split. Several other approaches focus on how to parallelize tree building in the streaming or distributed cases, which we are not interested in at the moment (though partially fitting decision trees is a good separate project). As I mentioned in the github issue, it is likely easier to focus on this single issue for GSoC as opposed to making it distinct from the multiclass prediction, as this will provide similar speedups either way but be more general. It'd be great if you could add your experience directly to the gist and perhaps links to prior work if you have any of those. Something major missing from this is a proposed timeline. Several projects fail because they are overly ambitious or not well managed time-wise. Showing a timeline will help us manage the project later on, and ensure that you're aware of what the steps of the project will be. Thanks for the effort so far! Let me know when you've made updates. Jacob On Wed, Mar 22, 2017 at 12:55 AM, Aman Pratik wrote: > Hello Developers, > > This is Aman Pratik. I am currently pursuing my B.Tech from Indian > Institute of Technology, Varanasi. After doing some research I have found > some material on Decision Trees and Parallelization. Hence, I propose my > first draft for the project "Parallel Decision Tree Building" for GSoC 2017. > > Proposal : First Draft > > > Why me? > > I have been working in Python for the past 2 years and have good idea > about Machine Learning algorithms. I am quite familiar with scikit-learn > both as a user and a developer. > > These are the issues/PRs I have worked/working on for the past few months. 
> > [MRG+1] Issue#5803 : Regression Test added #8112 > > > [MRG] Issue#6673:Make a wrapper around functions that score an individual > feature #8038 > > [MRG] Issue #7987: Embarrassingly parallel "n_restarts_optimizer" in > GaussianProcessRegressor #7997 > > > My GitHub Profile: amanp10 > > I have worked with parallelization in one of my PR, so I am not new to it. > I have used cython a couple of times, though as a beginner. I have not used > Decision Tree much but I am familiar with the theory and algorithm. Also, I > am familiar with Benchmark tests, Unit tests and other technical knowledge > I would require for this project. > > Meanwhile, I have started my study for the subject and gaining experience > with Cython. I am looking forward to guidance from the potential mentors or > anyone willing to help. > > Thank You > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From joel.nothman at gmail.com Wed Mar 22 19:53:56 2017 From: joel.nothman at gmail.com (Joel Nothman) Date: Thu, 23 Mar 2017 10:53:56 +1100 Subject: [scikit-learn] Regarding GSoC projects and mentors In-Reply-To: References: Message-ID: Hi Jeff, Given the timeframe, it would be difficult for us to have confidence in your abilities, having not seen your work and thus your understanding of scikit-learn conventions and review process. If you think applying this year is the right way to go, you should try to make contributions ASAP. Cheers, Joel On 23 March 2017 at 02:37, Jeff Lee wrote: > Hi, > > My name is Jefferson Lee and I am a computer science student at NYU > passionate about machine learning and AI. > I was hoping to speak with potential mentors regarding the two suggested > projects > > and how I might research more about the topics to write a very strong > proposal. > > I also wanted to gain learn about some of the stalled projects that might > be viable as a GSoC project. > > Let me know if anyone is available to chat about these topics! > > Best, > Jeff > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From konst.katrioplas at gmail.com Thu Mar 23 04:06:30 2017 From: konst.katrioplas at gmail.com (Konstantinos Katrioplas) Date: Thu, 23 Mar 2017 10:06:30 +0200 Subject: [scikit-learn] GSoC proposal - Improve online learning for linear models In-Reply-To: <85562a68-8c93-b616-cb32-95e8a5025bd2@gmail.com> References: <85562a68-8c93-b616-cb32-95e8a5025bd2@gmail.com> Message-ID: Hello all, Please review my proposal on improving the online learning for linear models: first draft - linear model proposal Please bear in mind that this is a first approach only. I would like your opinion on if this goes into the right direction and how it can be improved. On the tool to set the learning rate in particular, I need your ideas on how it could be implemented. Previously mentioned ideas on a callback function are interesting, but I would need some guidance on implementing that. Although I am interested in the decision trees as well, I feel the linear model is a better start for me as my intention is to keep contributing to scikit-learn after the summer. 
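To make the callback idea concrete, one possible sketch on top of the existing partial_fit API (the callback here is just a plain function invoked from a hand-written training loop; nothing like it currently exists in scikit-learn, and all data below is placeholder):

    import numpy as np
    from sklearn.linear_model import SGDRegressor
    from sklearn.metrics import mean_squared_error

    rng = np.random.RandomState(0)
    X, y = rng.rand(1000, 20), rng.rand(1000)        # placeholder stream
    X_val, y_val = rng.rand(200, 20), rng.rand(200)  # held-out monitoring set
    batches = np.array_split(np.arange(1000), 20)

    def monitor(model, batch_idx):
        # hypothetical callback: track validation loss after each partial_fit call
        print("batch %d: validation MSE %.4f"
              % (batch_idx, mean_squared_error(y_val, model.predict(X_val))))

    sgd = SGDRegressor(learning_rate='invscaling', eta0=0.01, power_t=0.25)
    for i, idx in enumerate(batches):
        sgd.partial_fit(X[idx], y[idx])
        monitor(sgd, i)

How such a callback (and its arguments) could be wired into fit()/partial_fit() itself, rather than into a loop written by hand, is the part the proposal would need to design.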
I have a background in computational physics, however I am much more focused on the computational side than the physics side. Here is my resume . PRs and Issues I have been involved in so far: [MRG] enhance make_blobs to accept lists for samples per cluster [MRG] add random_state in tests estimators Bug in bfgs gradient computation of MLPRegressor with multiple output neurons (I am very curious about this one) github profile: kkatrio I am looking forward to your opinion. Kind regards, Konstantinos Katrioplas -------------- next part -------------- An HTML attachment was scrubbed... URL: From t3kcit at gmail.com Thu Mar 23 12:13:45 2017 From: t3kcit at gmail.com (Andreas Mueller) Date: Thu, 23 Mar 2017 12:13:45 -0400 Subject: [scikit-learn] Note of appreciation to Scikit-learn team In-Reply-To: References: Message-ID: <3078b3f7-f3d1-a45b-c466-805b7c63f545@gmail.com> I want to join Ga?l in thanking you for saying thanks. It's great to see appreciation of the work that the scientific python community does. I don't think I've seen anyone cite scipy in their research work, even though it is the backbone for so many papers. It's important for us that the academic environment recognizes software contributions, because many of us rely on academic funding to do this work. Best, Andy On 03/22/2017 01:10 AM, Brown J.B. wrote: > To all organizers, developers, and maintainers involved in the > Scikit-learn project, > > I would like to share a recent article that researchers from MIT, ETH, > and Kyoto University (myself) have published about building efficient > models for drug discovery and pharmaceutical data mining. > > In short, it demonstrates through replicate experiment that neither > big data nor complex AI such as deep learning are necessary for > efficient drug discovery, and that active learning can guide/assist > decision making processes in the real world. > > The paper's success is underpinned by the use of Scikit-learn's > RandomForestClassifier implementation combined with other techniques > developed in the work. > Therefore, it is a by-product of the volunteerism, hard work, and > dedication by those involved in scikit-learn. > > As the senior author of this study, I wish to share my great > appreciation for your efforts. > While I am strongly limited in time and can barely contribute to this > community, I cannot thank all of you enough for your work - it has > made an impact. > > We are working on theoretical extensions of the work now, as well as > pushing the technology forward in applied discovery sciences (in > agricultural, pharmaceutical, and medical areas). In the theory and > real-world applications, scikit-learn is indispensible. > > We have made the paper open access, and hope that such will inspire > this community as well as those in applied sciences. > You will see that the open source software community has been listed > in the Acknowledgments. > Certainly, we would welcome even the most casual of comments about the > paper. > > The paper can be retrieved from here: > http://www.future-science.com/doi/abs/10.4155/fmc-2016-0197 > > With kindest regards and sincere appreciation, > J.B. Brown > Kyoto University Graduate School of Medicine > Junior Associate Professor and Principal Investigator > http://statlsi.med.kyoto-u.ac.jp/~jbbrown > > > PS - To those of you involved in the matplotlib, scipy, and numpy > projects, your forwarding of this to those projects would be > appreciated. They were also critical. 
> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From raga.markely at gmail.com Thu Mar 23 16:05:44 2017 From: raga.markely at gmail.com (Raga Markely) Date: Thu, 23 Mar 2017 16:05:44 -0400 Subject: [scikit-learn] Note of appreciation to Scikit-learn team In-Reply-To: <3078b3f7-f3d1-a45b-c466-805b7c63f545@gmail.com> References: <3078b3f7-f3d1-a45b-c466-805b7c63f545@gmail.com> Message-ID: Will definitely acknowledge scikit-learn, scipy, etc community in papers, posters, talks, etc.. i also saw suggested citations on scikit-learn website.. i will include these as well..if there is anything else that will be helpful, please let us know.. Sincerely hope that all of your contributions (not just the codes, but also tutorial in scipy conference, books & blogs that you have published, etc) will help you in your careers in many different ways.. Best, Raga On Mar 23, 2017 11:15 AM, "Andreas Mueller" wrote: > I want to join Ga?l in thanking you for saying thanks. > It's great to see appreciation of the work that the scientific python > community does. > I don't think I've seen anyone cite scipy in their research work, even > though it is the backbone for so many papers. > It's important for us that the academic environment recognizes software > contributions, because many > of us rely on academic funding to do this work. > > Best, > Andy > > On 03/22/2017 01:10 AM, Brown J.B. wrote: > > To all organizers, developers, and maintainers involved in the > Scikit-learn project, > > I would like to share a recent article that researchers from MIT, ETH, and > Kyoto University (myself) have published about building efficient models > for drug discovery and pharmaceutical data mining. > > In short, it demonstrates through replicate experiment that neither big > data nor complex AI such as deep learning are necessary for efficient drug > discovery, and that active learning can guide/assist decision making > processes in the real world. > > The paper's success is underpinned by the use of Scikit-learn's > RandomForestClassifier implementation combined with other techniques > developed in the work. > Therefore, it is a by-product of the volunteerism, hard work, and > dedication by those involved in scikit-learn. > > As the senior author of this study, I wish to share my great appreciation > for your efforts. > While I am strongly limited in time and can barely contribute to this > community, I cannot thank all of you enough for your work - it has made an > impact. > > We are working on theoretical extensions of the work now, as well as > pushing the technology forward in applied discovery sciences (in > agricultural, pharmaceutical, and medical areas). In the theory and > real-world applications, scikit-learn is indispensible. > > We have made the paper open access, and hope that such will inspire this > community as well as those in applied sciences. > You will see that the open source software community has been listed in > the Acknowledgments. > Certainly, we would welcome even the most casual of comments about the > paper. > > The paper can be retrieved from here: > http://www.future-science.com/doi/abs/10.4155/fmc-2016-0197 > > With kindest regards and sincere appreciation, > J.B. 
Brown > Kyoto University Graduate School of Medicine > Junior Associate Professor and Principal Investigator > http://statlsi.med.kyoto-u.ac.jp/~jbbrown > > PS - To those of you involved in the matplotlib, scipy, and numpy > projects, your forwarding of this to those projects would be appreciated. > They were also critical. > > > _______________________________________________ > scikit-learn mailing listscikit-learn at python.orghttps://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ragvrv at gmail.com Fri Mar 24 17:26:28 2017 From: ragvrv at gmail.com (Raghav R V) Date: Fri, 24 Mar 2017 22:26:28 +0100 Subject: [scikit-learn] Preparing a scikit-learn 0.18.2 bugfix release In-Reply-To: References: <20170109151546.GM2802991@phare.normalesup.org> <20170111215115.GO1585067@phare.normalesup.org> Message-ID: Hi, Are we still planning on an early April release for v0.19? Could we start marking "blockers"? On Tue, Feb 21, 2017 at 5:31 PM, Andreas Mueller wrote: > > > On 02/07/2017 09:00 PM, Joel Nothman wrote: > > On 12 January 2017 at 08:51, Gael Varoquaux > wrote: > >> On Thu, Jan 12, 2017 at 08:41:51AM +1100, Joel Nothman wrote: >> > When the two versions deprecation policy was instituted, releases were >> much >> > more frequent... Is that enough of an excuse? >> >> I'd rather say that we can here decide that we are giving a longer grace >> period. >> >> I think that slow deprecations are a good things (see titus's blog post >> here: http://ivory.idyll.org/blog/2017-pof-software-archivability.html ) >> > > Given that 0.18 was a very slow release, and the work for removing > deprecated material from 0.19 has already been done, I don't think we > should revert that. I agree that we can delay the deprecation deadline for > 0.20 and 0.21. > > In terms of release schedule, are we aiming for RC in early-mid March, > assuming Andy's above prognostications are correct and he is able to review > in a bigger way in a week or so? > > Sometimes I wonder how Amazon ever gave me a job in forecasting.... > Spring break is March 13-17th ;) > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -- Raghav RV https://github.com/raghavrv -------------- next part -------------- An HTML attachment was scrubbed... URL: From t3kcit at gmail.com Sat Mar 25 15:54:37 2017 From: t3kcit at gmail.com (Andreas Mueller) Date: Sat, 25 Mar 2017 15:54:37 -0400 Subject: [scikit-learn] Preparing a scikit-learn 0.18.2 bugfix release In-Reply-To: References: <20170109151546.GM2802991@phare.normalesup.org> <20170111215115.GO1585067@phare.normalesup.org> Message-ID: <69bce798-a914-8233-bb2c-e7dedbf20641@gmail.com> I have no bandwidth to help. I will be able to help starting May 7th. On 03/24/2017 05:26 PM, Raghav R V wrote: > Hi, > > Are we still planning on an early April release for v0.19? Could we > start marking "blockers"? 
> > > > On Tue, Feb 21, 2017 at 5:31 PM, Andreas Mueller > wrote: > > > > On 02/07/2017 09:00 PM, Joel Nothman wrote: >> On 12 January 2017 at 08:51, Gael Varoquaux >> > > wrote: >> >> On Thu, Jan 12, 2017 at 08:41:51AM +1100, Joel Nothman wrote: >> > When the two versions deprecation policy was instituted, >> releases were much >> > more frequent... Is that enough of an excuse? >> >> I'd rather say that we can here decide that we are giving a >> longer grace >> period. >> >> I think that slow deprecations are a good things (see titus's >> blog post >> here: >> http://ivory.idyll.org/blog/2017-pof-software-archivability.html >> >> ) >> >> Given that 0.18 was a very slow release, and the work for >> removing deprecated material from 0.19 has already been done, I >> don't think we should revert that. I agree that we can delay the >> deprecation deadline for 0.20 and 0.21. >> >> In terms of release schedule, are we aiming for RC in early-mid >> March, assuming Andy's above prognostications are correct and he >> is able to review in a bigger way in a week or so? >> > Sometimes I wonder how Amazon ever gave me a job in forecasting.... > Spring break is March 13-17th ;) > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > -- > Raghav RV > https://github.com/raghavrv > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From joel.nothman at gmail.com Sat Mar 25 21:32:05 2017 From: joel.nothman at gmail.com (Joel Nothman) Date: Sun, 26 Mar 2017 12:32:05 +1100 Subject: [scikit-learn] Preparing a scikit-learn 0.18.2 bugfix release In-Reply-To: <69bce798-a914-8233-bb2c-e7dedbf20641@gmail.com> References: <20170109151546.GM2802991@phare.normalesup.org> <20170111215115.GO1585067@phare.normalesup.org> <69bce798-a914-8233-bb2c-e7dedbf20641@gmail.com> Message-ID: Yes, it's a pity that this has had to be delayed due to dev unavailability, but I don't think we can risk a release without some more quality assurance. My teaching atm, among other bits of life, is also impacting on any free time, but even if I find more time, I've already given my support to many of the PRs currently marked MRG+1 (have I been too profligate with my approvals?!). Is it worth waiting as long as until the June sprint, but promising to close the release before end of June? Or else promising a release for end of May and using the sprint to identify priorities for future releases? I think for the sake of the contributors, we should make sure that many of the things that are mostly reviewed get merged before release. For the sake of the users, we should make sure that as many bugs are fixed as possible; apart from some wonderful work from Lo?c, I feel bug review has not been receiving as much attention as it should. Perhaps Olivier's suggestion of 0.18.2 was good after all. :\ On 26 March 2017 at 06:54, Andreas Mueller wrote: > I have no bandwidth to help. I will be able to help starting May 7th. > > > On 03/24/2017 05:26 PM, Raghav R V wrote: > > Hi, > > Are we still planning on an early April release for v0.19? Could we start > marking "blockers"? 
> > > > On Tue, Feb 21, 2017 at 5:31 PM, Andreas Mueller wrote: > >> >> >> On 02/07/2017 09:00 PM, Joel Nothman wrote: >> >> On 12 January 2017 at 08:51, Gael Varoquaux < >> gael.varoquaux at normalesup.org> wrote: >> >>> On Thu, Jan 12, 2017 at 08:41:51AM +1100, Joel Nothman wrote: >>> > When the two versions deprecation policy was instituted, releases were >>> much >>> > more frequent... Is that enough of an excuse? >>> >>> I'd rather say that we can here decide that we are giving a longer grace >>> period. >>> >>> I think that slow deprecations are a good things (see titus's blog post >>> here: http://ivory.idyll.org/blog/2017-pof-software-archivability.html ) >>> >> >> Given that 0.18 was a very slow release, and the work for removing >> deprecated material from 0.19 has already been done, I don't think we >> should revert that. I agree that we can delay the deprecation deadline for >> 0.20 and 0.21. >> >> In terms of release schedule, are we aiming for RC in early-mid March, >> assuming Andy's above prognostications are correct and he is able to review >> in a bigger way in a week or so? >> >> Sometimes I wonder how Amazon ever gave me a job in forecasting.... >> Spring break is March 13-17th ;) >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > > -- > Raghav RV > https://github.com/raghavrv > > > > _______________________________________________ > scikit-learn mailing listscikit-learn at python.orghttps://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From amanpratik10 at gmail.com Sun Mar 26 13:31:43 2017 From: amanpratik10 at gmail.com (Aman Pratik) Date: Sun, 26 Mar 2017 23:01:43 +0530 Subject: [scikit-learn] GSoC 2017 : "Parallel Decision Tree Building" In-Reply-To: References: Message-ID: Hello Jacob, This is my second draft for the proposal, Proposal : Second Draft It is incomplete in some places, related to detailing etc. I will need little more time for that. Meanwhile, I await your feedback and guidance. Thank You On 23 March 2017 at 02:38, Jacob Schreiber wrote: > Hi Aman > > Likely the easiest way to parallelize decision tree building is to > parallelize the finding of the best split at each node, as it checks every > non-constant feature for the best split. Several other approaches focus on > how to parallelize tree building in the streaming or distributed cases, > which we are not interested in at the moment (though partially fitting > decision trees is a good separate project). > > As I mentioned in the github issue, it is likely easier to focus on this > single issue for GSoC as opposed to making it distinct from the multiclass > prediction, as this will provide similar speedups either way but be more > general. > > It'd be great if you could add your experience directly to the gist and > perhaps links to prior work if you have any of those. > > Something major missing from this is a proposed timeline. Several projects > fail because they are overly ambitious or not well managed time-wise. > Showing a timeline will help us manage the project later on, and ensure > that you're aware of what the steps of the project will be. > > Thanks for the effort so far! Let me know when you've made updates. 
> > Jacob > > On Wed, Mar 22, 2017 at 12:55 AM, Aman Pratik > wrote: > >> Hello Developers, >> >> This is Aman Pratik. I am currently pursuing my B.Tech from Indian >> Institute of Technology, Varanasi. After doing some research I have found >> some material on Decision Trees and Parallelization. Hence, I propose my >> first draft for the project "Parallel Decision Tree Building" for GSoC 2017. >> >> Proposal : First Draft >> >> >> Why me? >> >> I have been working in Python for the past 2 years and have good idea >> about Machine Learning algorithms. I am quite familiar with scikit-learn >> both as a user and a developer. >> >> These are the issues/PRs I have worked/working on for the past few months. >> >> [MRG+1] Issue#5803 : Regression Test added #8112 >> >> >> [MRG] Issue#6673:Make a wrapper around functions that score an individual >> feature #8038 >> >> [MRG] Issue #7987: Embarrassingly parallel "n_restarts_optimizer" in >> GaussianProcessRegressor #7997 >> >> >> My GitHub Profile: amanp10 >> >> I have worked with parallelization in one of my PR, so I am not new to >> it. I have used cython a couple of times, though as a beginner. I have not >> used Decision Tree much but I am familiar with the theory and algorithm. >> Also, I am familiar with Benchmark tests, Unit tests and other technical >> knowledge I would require for this project. >> >> Meanwhile, I have started my study for the subject and gaining experience >> with Cython. I am looking forward to guidance from the potential mentors or >> anyone willing to help. >> >> Thank You >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From t3kcit at gmail.com Sun Mar 26 18:32:05 2017 From: t3kcit at gmail.com (Andreas Mueller) Date: Sun, 26 Mar 2017 18:32:05 -0400 Subject: [scikit-learn] Preparing a scikit-learn 0.18.2 bugfix release In-Reply-To: References: <20170109151546.GM2802991@phare.normalesup.org> <20170111215115.GO1585067@phare.normalesup.org> <69bce798-a914-8233-bb2c-e7dedbf20641@gmail.com> Message-ID: I would like to release in may, before the sprint. That is, if we are happy with where the codebase is at then. If someone feels like they have the time end energy to create 0.18.2, and we have enough reviewers to ensure quality, I'm not opposed. I just won't be able to be of any help. On 03/25/2017 09:32 PM, Joel Nothman wrote: > Yes, it's a pity that this has had to be delayed due to dev > unavailability, but I don't think we can risk a release without some > more quality assurance. My teaching atm, among other bits of life, is > also impacting on any free time, but even if I find more time, I've > already given my support to many of the PRs currently marked MRG+1 > (have > I been too profligate with my approvals?!). > > Is it worth waiting as long as until the June sprint, but promising to > close the release before end of June? Or else promising a release for > end of May and using the sprint to identify priorities for future > releases? > > I think for the sake of the contributors, we should make sure that > many of the things that are mostly reviewed get merged before release. 
> For the sake of the users, we should make sure that as many bugs are > fixed as possible; apart from some wonderful work from Lo?c, I feel > bug review has not been receiving as much attention as it should. > > Perhaps Olivier's suggestion of 0.18.2 was good after all. :\ > > On 26 March 2017 at 06:54, Andreas Mueller > wrote: > > I have no bandwidth to help. I will be able to help starting May 7th. > > > On 03/24/2017 05:26 PM, Raghav R V wrote: >> Hi, >> >> Are we still planning on an early April release for v0.19? Could >> we start marking "blockers"? >> >> >> >> On Tue, Feb 21, 2017 at 5:31 PM, Andreas Mueller >> > wrote: >> >> >> >> On 02/07/2017 09:00 PM, Joel Nothman wrote: >>> On 12 January 2017 at 08:51, Gael Varoquaux >>> >> > wrote: >>> >>> On Thu, Jan 12, 2017 at 08:41:51AM +1100, Joel Nothman >>> wrote: >>> > When the two versions deprecation policy was >>> instituted, releases were much >>> > more frequent... Is that enough of an excuse? >>> >>> I'd rather say that we can here decide that we are >>> giving a longer grace >>> period. >>> >>> I think that slow deprecations are a good things (see >>> titus's blog post >>> here: >>> http://ivory.idyll.org/blog/2017-pof-software-archivability.html >>> >>> ) >>> >>> Given that 0.18 was a very slow release, and the work for >>> removing deprecated material from 0.19 has already been >>> done, I don't think we should revert that. I agree that we >>> can delay the deprecation deadline for 0.20 and 0.21. >>> >>> In terms of release schedule, are we aiming for RC in >>> early-mid March, assuming Andy's above prognostications are >>> correct and he is able to review in a bigger way in a week >>> or so? >>> >> Sometimes I wonder how Amazon ever gave me a job in >> forecasting.... >> Spring break is March 13-17th ;) >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> >> >> >> >> -- >> Raghav RV >> https://github.com/raghavrv >> >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ scikit-learn > mailing list scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From jmschreiber91 at gmail.com Sun Mar 26 23:33:33 2017 From: jmschreiber91 at gmail.com (Jacob Schreiber) Date: Sun, 26 Mar 2017 20:33:33 -0700 Subject: [scikit-learn] GSoC 2017 : "Parallel Decision Tree Building" In-Reply-To: References: Message-ID: Hi Aman Thanks for the updates, it looks more complete now. I don't see what the benefit is of considering three different parallelism techniques. I'm not sure how you would do sample parallelism given that you need to sort all of the samples-- maybe a merge sort? That doesn't seem the most efficient manner of parallelization, I'd stick only to parallelism across features as you can get a great deal of efficiency out of doing that. It also makes the problem more managable. I would also focus your application more specifically on what parts of the code you will need to change and less conceptual. 
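For concreteness, a minimal sketch of what the naive feature-parallel schema might look like (the best_split_for_feature helper and its toy impurity are hypothetical stand-ins, not scikit-learn's actual splitter code):

import numpy as np
from joblib import Parallel, delayed

def best_split_for_feature(X, y, feature):
    # Hypothetical per-feature search: find the threshold on one feature
    # that most reduces a toy impurity (weighted child variance). The real
    # splitter works on a shared Criterion object and presorted indices.
    best_gain, best_threshold = -np.inf, None
    for t in np.unique(X[:, feature])[:-1]:
        left, right = y[X[:, feature] <= t], y[X[:, feature] > t]
        gain = -(left.size * left.var() + right.size * right.var())
        if gain > best_gain:
            best_gain, best_threshold = gain, t
    return feature, best_gain, best_threshold

def find_best_split(X, y, n_jobs=4):
    # Naive Parallel()(delayed(...)) schema over the candidate features.
    # backend="threading" only pays off when the worker is Cython code that
    # releases the GIL; for a pure-Python worker like this one the
    # multiprocessing backend is needed, at the cost of copying X and y
    # to every worker process.
    results = Parallel(n_jobs=n_jobs, backend="multiprocessing")(
        delayed(best_split_for_feature)(X, y, f) for f in range(X.shape[1]))
    return max(results, key=lambda r: r[1])

if __name__ == "__main__":
    rng = np.random.RandomState(0)
    X, y = rng.rand(500, 20), rng.rand(500)
    print(find_best_split(X, y))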
There is already a loop to consider features sequentially and identify the best one. The change is basically to parallelize this in the best manner given the other code. However, if the solution were as easy as changing the for loop to a Parallel()( delayed ) type schema we would have done it already. You should specify what the challenges will be, and why it isn't just as simple as that. Specifically focus on what goes on in the criterion class to make it more difficult. I also checked out your gaussian process parallelization. It looked like it wasn't speeding anything up because you were using a threading backend for a python function. You can only use the threading backend with a cython function where you also release the GIL, otherwise it won't help. Have you tried using the multiprocessing backend? That would likely be easier. Jacob On Sun, Mar 26, 2017 at 10:31 AM, Aman Pratik wrote: > Hello Jacob, > This is my second draft for the proposal, > > Proposal : Second Draft > > > It is incomplete in some places, related to detailing etc. I will need > little more time for that. Meanwhile, I await your feedback and guidance. > > Thank You > > > > On 23 March 2017 at 02:38, Jacob Schreiber > wrote: > >> Hi Aman >> >> Likely the easiest way to parallelize decision tree building is to >> parallelize the finding of the best split at each node, as it checks every >> non-constant feature for the best split. Several other approaches focus on >> how to parallelize tree building in the streaming or distributed cases, >> which we are not interested in at the moment (though partially fitting >> decision trees is a good separate project). >> >> As I mentioned in the github issue, it is likely easier to focus on this >> single issue for GSoC as opposed to making it distinct from the multiclass >> prediction, as this will provide similar speedups either way but be more >> general. >> >> It'd be great if you could add your experience directly to the gist and >> perhaps links to prior work if you have any of those. >> >> Something major missing from this is a proposed timeline. Several >> projects fail because they are overly ambitious or not well managed >> time-wise. Showing a timeline will help us manage the project later on, and >> ensure that you're aware of what the steps of the project will be. >> >> Thanks for the effort so far! Let me know when you've made updates. >> >> Jacob >> >> On Wed, Mar 22, 2017 at 12:55 AM, Aman Pratik >> wrote: >> >>> Hello Developers, >>> >>> This is Aman Pratik. I am currently pursuing my B.Tech from Indian >>> Institute of Technology, Varanasi. After doing some research I have found >>> some material on Decision Trees and Parallelization. Hence, I propose my >>> first draft for the project "Parallel Decision Tree Building" for GSoC 2017. >>> >>> Proposal : First Draft >>> >>> >>> Why me? >>> >>> I have been working in Python for the past 2 years and have good idea >>> about Machine Learning algorithms. I am quite familiar with scikit-learn >>> both as a user and a developer. >>> >>> These are the issues/PRs I have worked/working on for the past few >>> months. >>> >>> [MRG+1] Issue#5803 : Regression Test added #8112 >>> >>> >>> [MRG] Issue#6673:Make a wrapper around functions that score an >>> individual feature #8038 >>> >>> >>> [MRG] Issue #7987: Embarrassingly parallel "n_restarts_optimizer" in >>> GaussianProcessRegressor #7997 >>> >>> >>> My GitHub Profile: amanp10 >>> >>> I have worked with parallelization in one of my PR, so I am not new to >>> it. 
I have used cython a couple of times, though as a beginner. I have not >>> used Decision Tree much but I am familiar with the theory and algorithm. >>> Also, I am familiar with Benchmark tests, Unit tests and other technical >>> knowledge I would require for this project. >>> >>> Meanwhile, I have started my study for the subject and gaining >>> experience with Cython. I am looking forward to guidance from the potential >>> mentors or anyone willing to help. >>> >>> Thank You >>> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From on2k17nm at gmail.com Mon Mar 27 00:03:51 2017 From: on2k17nm at gmail.com (Onkar Mahajan) Date: Mon, 27 Mar 2017 09:33:51 +0530 Subject: [scikit-learn] Create sample dataset with specified range and regression coefficients Message-ID: I would like to create a random dataset for Linear regression with specific regression coefficients (reg.coef_, reg.intercept_ ) and with data in specific range of values (person age - 0 to 100 as x - axis and y-axis Net worth 0$ to 5,00,000$). I used sklearn.datasets.make_regression() but I did not find anything in here that gives me control over range of samples and regression coefficients (I might be missing something, please correct me if mistaken) Thanks, Onkar -------------- next part -------------- An HTML attachment was scrubbed... URL: From amanpratik10 at gmail.com Mon Mar 27 01:44:42 2017 From: amanpratik10 at gmail.com (Aman Pratik) Date: Mon, 27 Mar 2017 11:14:42 +0530 Subject: [scikit-learn] GSoC 2017 : "Parallel Decision Tree Building" In-Reply-To: References: Message-ID: I will be occupied with my tests for a couple of days, will get back with the changes as soon as possible. In the Gaussian Process parallelization there was an error while using the multiprocessing backend, which couldn't be solved by simple changes in the code. Hence we had to drop the idea for the time being. On 27 March 2017 at 09:03, Jacob Schreiber wrote: > Hi Aman > > Thanks for the updates, it looks more complete now. > > I don't see what the benefit is of considering three different parallelism > techniques. I'm not sure how you would do sample parallelism given that you > need to sort all of the samples-- maybe a merge sort? That doesn't seem the > most efficient manner of parallelization, I'd stick only to parallelism > across features as you can get a great deal of efficiency out of doing > that. It also makes the problem more managable. > > I would also focus your application more specifically on what parts of the > code you will need to change and less conceptual. There is already a loop > to consider features sequentially and identify the best one. The change is > basically to parallelize this in the best manner given the other code. > However, if the solution were as easy as changing the for loop to a > Parallel()( delayed ) type schema we would have done it already. You should > specify what the challenges will be, and why it isn't just as simple as > that. 
Specifically focus on what goes on in the criterion class to make it > more difficult. > > I also checked out your gaussian process parallelization. It looked like > it wasn't speeding anything up because you were using a threading backend > for a python function. You can only use the threading backend with a cython > function where you also release the GIL, otherwise it won't help. Have you > tried using the multiprocessing backend? That would likely be easier. > > Jacob > > On Sun, Mar 26, 2017 at 10:31 AM, Aman Pratik > wrote: > >> Hello Jacob, >> This is my second draft for the proposal, >> >> Proposal : Second Draft >> >> >> It is incomplete in some places, related to detailing etc. I will need >> little more time for that. Meanwhile, I await your feedback and guidance. >> >> Thank You >> >> >> >> On 23 March 2017 at 02:38, Jacob Schreiber >> wrote: >> >>> Hi Aman >>> >>> Likely the easiest way to parallelize decision tree building is to >>> parallelize the finding of the best split at each node, as it checks every >>> non-constant feature for the best split. Several other approaches focus on >>> how to parallelize tree building in the streaming or distributed cases, >>> which we are not interested in at the moment (though partially fitting >>> decision trees is a good separate project). >>> >>> As I mentioned in the github issue, it is likely easier to focus on this >>> single issue for GSoC as opposed to making it distinct from the multiclass >>> prediction, as this will provide similar speedups either way but be more >>> general. >>> >>> It'd be great if you could add your experience directly to the gist and >>> perhaps links to prior work if you have any of those. >>> >>> Something major missing from this is a proposed timeline. Several >>> projects fail because they are overly ambitious or not well managed >>> time-wise. Showing a timeline will help us manage the project later on, and >>> ensure that you're aware of what the steps of the project will be. >>> >>> Thanks for the effort so far! Let me know when you've made updates. >>> >>> Jacob >>> >>> On Wed, Mar 22, 2017 at 12:55 AM, Aman Pratik >>> wrote: >>> >>>> Hello Developers, >>>> >>>> This is Aman Pratik. I am currently pursuing my B.Tech from Indian >>>> Institute of Technology, Varanasi. After doing some research I have found >>>> some material on Decision Trees and Parallelization. Hence, I propose my >>>> first draft for the project "Parallel Decision Tree Building" for GSoC 2017. >>>> >>>> Proposal : First Draft >>>> >>>> >>>> Why me? >>>> >>>> I have been working in Python for the past 2 years and have good idea >>>> about Machine Learning algorithms. I am quite familiar with scikit-learn >>>> both as a user and a developer. >>>> >>>> These are the issues/PRs I have worked/working on for the past few >>>> months. >>>> >>>> [MRG+1] Issue#5803 : Regression Test added #8112 >>>> >>>> >>>> [MRG] Issue#6673:Make a wrapper around functions that score an >>>> individual feature #8038 >>>> >>>> >>>> [MRG] Issue #7987: Embarrassingly parallel "n_restarts_optimizer" in >>>> GaussianProcessRegressor #7997 >>>> >>>> >>>> My GitHub Profile: amanp10 >>>> >>>> I have worked with parallelization in one of my PR, so I am not new to >>>> it. I have used cython a couple of times, though as a beginner. I have not >>>> used Decision Tree much but I am familiar with the theory and algorithm. >>>> Also, I am familiar with Benchmark tests, Unit tests and other technical >>>> knowledge I would require for this project. 
>>>> >>>> Meanwhile, I have started my study for the subject and gaining >>>> experience with Cython. I am looking forward to guidance from the potential >>>> mentors or anyone willing to help. >>>> >>>> Thank You >>>> >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From t3kcit at gmail.com Mon Mar 27 10:57:57 2017 From: t3kcit at gmail.com (Andreas Mueller) Date: Mon, 27 Mar 2017 10:57:57 -0400 Subject: [scikit-learn] Create sample dataset with specified range and regression coefficients In-Reply-To: References: Message-ID: <21cbe617-47ac-1381-772d-54423a24f278@gmail.com> Yes, make_regression is for quickly making a random task. Doing what you want should be about three lines of numpy, why do you need a function for it? On 03/27/2017 12:03 AM, Onkar Mahajan wrote: > I would like to create a random dataset for Linear regression with > specific regression coefficients (reg.coef_, reg.intercept_ ) and with > data in specific range of values (person age - 0 to 100 as x - axis > and y-axis Net worth 0$ to 5,00,000$). I used > sklearn.datasets.make_regression() but I did not find anything in here > that gives me control over range of samples and regression > coefficients (I might be missing something, please correct me if mistaken) > > Thanks, > Onkar > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From henriquecsj at gmail.com Mon Mar 27 11:50:32 2017 From: henriquecsj at gmail.com (Henrique C. S. Junior) Date: Mon, 27 Mar 2017 12:50:32 -0300 Subject: [scikit-learn] Using Scikit-Learn to predict magnetism in chemical systems Message-ID: I'm a chemist with some rudimentary programming skills (getting started with python) and in the middle of the year I'll be starting a Ph.D. project that uses computers to describe magnetism in molecular systems. Most of the time I get my results after several simulations and experiments, so, I know that one of the hardest tasks in molecular magnetism is to predict the nature of magnetic interactions. That's why I'll try to tackle this problem with Machine Learning (because such interactions are dependent, basically, of distances, angles and number of unpaired electrons). 
The idea is to feed the computer with a large training set (with number of unpaired electrons, XYZ coordinates of each molecule and experimental magnetic couplings) and see if it can predict the magnetic couplings (J(AB)) of new systems: (see example in the attached image) Can Scikit-Learn handle the task, knowing that the matrix used to represent atomic coordinates will probably have a different number of atoms (because some molecules have more atoms than others)? Or is this a job better suited for another software/approach? ? -- *Henrique C. S. Junior* Industrial Chemist - UFRRJ M. Sc. Inorganic Chemistry - UFRRJ Data Processing Center - PMP Visite o Mundo Qu?mico -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 2017-03-21.png Type: image/png Size: 10127 bytes Desc: not available URL: From rdslater at gmail.com Mon Mar 27 13:25:29 2017 From: rdslater at gmail.com (Robert Slater) Date: Mon, 27 Mar 2017 12:25:29 -0500 Subject: [scikit-learn] Using Scikit-Learn to predict magnetism in chemical systems In-Reply-To: References: Message-ID: You definitely can use some of the tools in sci-kit learn for supervised machine learning. The real trick will be how well your training system is representative of your future predictions. All of the various regression algorithms would be of some value and you make even consider an ensemble to help generalize. There will be some important questions to answer--what kind of loss function do you want to look at? I assumed regression (continuous response) but it could also classify--paramagnetic, diamagnetic, ferromagnetic, etc... Another task to think about might be dimension reduction. There is no guarantee you will get fantastic results--every problem is unique and much will depend on exactly what you want out of the solution--it may be that we get '10%' accuracy at best--for some systems that is quite good, others it is horrible. If you'd like to talk specifics, feel free to contact me at this email. I have a background in magnetism (PhD in magnetic multilayers--i was physics, but as you are probably aware chemisty and physics blend in this area) and have a fairly good knowledge of sci-kit learn and machine learning. On Mon, Mar 27, 2017 at 10:50 AM, Henrique C. S. Junior < henriquecsj at gmail.com> wrote: > I'm a chemist with some rudimentary programming skills (getting started > with python) and in the middle of the year I'll be starting a Ph.D. project > that uses computers to describe magnetism in molecular systems. > > Most of the time I get my results after several simulations and > experiments, so, I know that one of the hardest tasks in molecular > magnetism is to predict the nature of magnetic interactions. That's why > I'll try to tackle this problem with Machine Learning (because such > interactions are dependent, basically, of distances, angles and number of > unpaired electrons). The idea is to feed the computer with a large training > set (with number of unpaired electrons, XYZ coordinates of each molecule > and experimental magnetic couplings) and see if it can predict the magnetic > couplings (J(AB)) of new systems: > (see example in the attached image) > > Can Scikit-Learn handle the task, knowing that the matrix used to > represent atomic coordinates will probably have a different number of atoms > (because some molecules have more atoms than others)? Or is this a job > better suited for another software/approach? ? 
> > > -- > *Henrique C. S. Junior* > Industrial Chemist - UFRRJ > M. Sc. Inorganic Chemistry - UFRRJ > Data Processing Center - PMP > Visite o Mundo Qu?mico > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From konst.katrioplas at gmail.com Mon Mar 27 13:43:47 2017 From: konst.katrioplas at gmail.com (Konstantinos Katrioplas) Date: Mon, 27 Mar 2017 20:43:47 +0300 Subject: [scikit-learn] GSoC proposal - linear model Message-ID: Dear all, here is a draft of my proposal on improving online learning for linear models with softmax and AdaGrad. I look forward to your feedback, Konstantinos -------------- next part -------------- An HTML attachment was scrubbed... URL: From henriquecsj at gmail.com Mon Mar 27 13:46:08 2017 From: henriquecsj at gmail.com (Henrique C. S. Junior) Date: Mon, 27 Mar 2017 14:46:08 -0300 Subject: [scikit-learn] Using Scikit-Learn to predict magnetism in chemical systems In-Reply-To: References: Message-ID: Dear Robert, thank you. Yes, I'd like to talk about some specifics on the project. Thank you again. On Mon, Mar 27, 2017 at 2:25 PM, Robert Slater wrote: > You definitely can use some of the tools in sci-kit learn for supervised > machine learning. The real trick will be how well your training system is > representative of your future predictions. All of the various regression > algorithms would be of some value and you make even consider an ensemble to > help generalize. There will be some important questions to answer--what > kind of loss function do you want to look at? I assumed regression > (continuous response) but it could also classify--paramagnetic, > diamagnetic, ferromagnetic, etc... > > Another task to think about might be dimension reduction. > There is no guarantee you will get fantastic results--every problem is > unique and much will depend on exactly what you want out of the > solution--it may be that we get '10%' accuracy at best--for some systems > that is quite good, others it is horrible. > > If you'd like to talk specifics, feel free to contact me at this email. I > have a background in magnetism (PhD in magnetic multilayers--i was physics, > but as you are probably aware chemisty and physics blend in this area) and > have a fairly good knowledge of sci-kit learn and machine learning. > > > > On Mon, Mar 27, 2017 at 10:50 AM, Henrique C. S. Junior < > henriquecsj at gmail.com> wrote: > >> I'm a chemist with some rudimentary programming skills (getting started >> with python) and in the middle of the year I'll be starting a Ph.D. project >> that uses computers to describe magnetism in molecular systems. >> >> Most of the time I get my results after several simulations and >> experiments, so, I know that one of the hardest tasks in molecular >> magnetism is to predict the nature of magnetic interactions. That's why >> I'll try to tackle this problem with Machine Learning (because such >> interactions are dependent, basically, of distances, angles and number of >> unpaired electrons). 
The idea is to feed the computer with a large training >> set (with number of unpaired electrons, XYZ coordinates of each molecule >> and experimental magnetic couplings) and see if it can predict the magnetic >> couplings (J(AB)) of new systems: >> (see example in the attached image) >> >> Can Scikit-Learn handle the task, knowing that the matrix used to >> represent atomic coordinates will probably have a different number of atoms >> (because some molecules have more atoms than others)? Or is this a job >> better suited for another software/approach? ? >> >> >> -- >> *Henrique C. S. Junior* >> Industrial Chemist - UFRRJ >> M. Sc. Inorganic Chemistry - UFRRJ >> Data Processing Center - PMP >> Visite o Mundo Qu?mico >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -- *Henrique C. S. Junior* Industrial Chemist - UFRRJ M. Sc. Inorganic Chemistry - UFRRJ Data Processing Center - PMP Visite o Mundo Qu?mico -------------- next part -------------- An HTML attachment was scrubbed... URL: From tommaso.costanzo01 at gmail.com Mon Mar 27 15:15:57 2017 From: tommaso.costanzo01 at gmail.com (Tommaso Costanzo) Date: Mon, 27 Mar 2017 15:15:57 -0400 Subject: [scikit-learn] Using Scikit-Learn to predict magnetism in chemical systems In-Reply-To: References: Message-ID: Dear Henrique, I agree with Robert on the use of a supervised algorithm and I would also suggest you to try a semisupervised one if you have trouble in labeling your data. Moreover, as a chemist I think that the input you are thinking to use is not the in the best form for machine learning because you are trying to predict coupling J(AB) but in the future space you have only coordinates (XYZ). What I suggest is to generate the pair of atoms externally and then use a matrix of the form (Mx3), where M are the pairs of atoms you want to predict your J and 3 are the features of the two atoms (distance, angle, unpaired electrons). For a supervised approach you will need a training set where the J is know so your training data will be of the form Mx4 and the fourth feature will be the J you know. Hope that this is clear, if not I will be happy to help more Sincerely Tommaso 2017-03-27 13:46 GMT-04:00 Henrique C. S. Junior : > Dear Robert, thank you. Yes, I'd like to talk about some specifics on the > project. > Thank you again. > > On Mon, Mar 27, 2017 at 2:25 PM, Robert Slater wrote: > >> You definitely can use some of the tools in sci-kit learn for supervised >> machine learning. The real trick will be how well your training system is >> representative of your future predictions. All of the various regression >> algorithms would be of some value and you make even consider an ensemble to >> help generalize. There will be some important questions to answer--what >> kind of loss function do you want to look at? I assumed regression >> (continuous response) but it could also classify--paramagnetic, >> diamagnetic, ferromagnetic, etc... >> >> Another task to think about might be dimension reduction. >> There is no guarantee you will get fantastic results--every problem is >> unique and much will depend on exactly what you want out of the >> solution--it may be that we get '10%' accuracy at best--for some systems >> that is quite good, others it is horrible. 
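A rough sketch of the pre-processing suggested above, combined with a simple scikit-learn regressor (the pairing of centres, the choice of a bridging atom, and the toy numbers below are assumptions made only for illustration):

import numpy as np
from sklearn.ensemble import RandomForestRegressor

def distance(a, b):
    # Euclidean distance between two atoms given as XYZ coordinates
    return np.linalg.norm(np.asarray(a) - np.asarray(b))

def angle(a, b, c):
    # angle in degrees at atom b formed by atoms a-b-c
    u = np.asarray(a) - np.asarray(b)
    v = np.asarray(c) - np.asarray(b)
    cosang = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))

# toy training pairs: coordinates of centre A, a bridging atom L, centre B,
# the number of unpaired electrons, and the measured coupling J(AB)
pairs = [
    ((0.0, 0.0, 0.0), (1.0, 0.5, 0.0), (2.0, 0.0, 0.0), 1, -12.3),
    ((0.0, 0.0, 0.0), (1.1, 0.4, 0.1), (2.1, 0.1, 0.0), 2, 4.7),
    ((0.0, 0.0, 0.0), (1.2, 0.6, 0.0), (2.4, 0.2, 0.1), 1, -8.9),
]

# (M, 3) feature matrix: A-B distance, A-L-B bridging angle, unpaired electrons
X = np.array([[distance(a, b), angle(a, l, b), n] for a, l, b, n, _ in pairs])
# (M,) target vector with the known J values for supervised training
y = np.array([p[-1] for p in pairs])

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
# predict J for a new pair described by the same three features
print(model.predict([[2.2, 120.0, 1]]))

Cross-validating over the pairs with known J, and comparing against a simple linear model, would then show whether these three features carry enough information.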
>> >> If you'd like to talk specifics, feel free to contact me at this email. >> I have a background in magnetism (PhD in magnetic multilayers--i was >> physics, but as you are probably aware chemisty and physics blend in this >> area) and have a fairly good knowledge of sci-kit learn and machine >> learning. >> >> >> >> On Mon, Mar 27, 2017 at 10:50 AM, Henrique C. S. Junior < >> henriquecsj at gmail.com> wrote: >> >>> I'm a chemist with some rudimentary programming skills (getting started >>> with python) and in the middle of the year I'll be starting a Ph.D. project >>> that uses computers to describe magnetism in molecular systems. >>> >>> Most of the time I get my results after several simulations and >>> experiments, so, I know that one of the hardest tasks in molecular >>> magnetism is to predict the nature of magnetic interactions. That's why >>> I'll try to tackle this problem with Machine Learning (because such >>> interactions are dependent, basically, of distances, angles and number of >>> unpaired electrons). The idea is to feed the computer with a large training >>> set (with number of unpaired electrons, XYZ coordinates of each molecule >>> and experimental magnetic couplings) and see if it can predict the magnetic >>> couplings (J(AB)) of new systems: >>> (see example in the attached image) >>> >>> Can Scikit-Learn handle the task, knowing that the matrix used to >>> represent atomic coordinates will probably have a different number of atoms >>> (because some molecules have more atoms than others)? Or is this a job >>> better suited for another software/approach? ? >>> >>> >>> -- >>> *Henrique C. S. Junior* >>> Industrial Chemist - UFRRJ >>> M. Sc. Inorganic Chemistry - UFRRJ >>> Data Processing Center - PMP >>> Visite o Mundo Qu?mico >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > > -- > *Henrique C. S. Junior* > Industrial Chemist - UFRRJ > M. Sc. Inorganic Chemistry - UFRRJ > Data Processing Center - PMP > Visite o Mundo Qu?mico > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -- Please do NOT send Microsoft Office Attachments: http://www.gnu.org/philosophy/no-word-attachments.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From henriquecsj at gmail.com Mon Mar 27 18:44:24 2017 From: henriquecsj at gmail.com (Henrique C. S. Junior) Date: Mon, 27 Mar 2017 19:44:24 -0300 Subject: [scikit-learn] Using Scikit-Learn to predict magnetism in chemical systems In-Reply-To: References: Message-ID: Dear Tommaso, thank you for your kind reply. I know I have a lot to study before actually starting any code and that's why any suggestion is so valuable. So, you're suggesting that a simplification of the system using only the paramagnetic centers can be a good approach? (I'm not sure if I understood it correctly). My main idea was, at first, try to represent the systems as realistically as possible (using coordinates). 
I know that the software will not know what a bond is or what an intermolecular interaction is but, let's say, after including 1000s of examples in the training, I was expecting that (as an example) finding a C 0.000 and an H at 1.000 should start to "make sense" because it leads to an experimental trend. And I totally agree that my way to represent the system is not the better. Thank you so much for all the help. On Mon, Mar 27, 2017 at 4:15 PM, Tommaso Costanzo < tommaso.costanzo01 at gmail.com> wrote: > Dear Henrique, > > > I agree with Robert on the use of a supervised algorithm and I would also > suggest you to try a semisupervised one if you have trouble in labeling > your data. > > > Moreover, as a chemist I think that the input you are thinking to use is > not the in the best form for machine learning because you are trying to > predict coupling J(AB) but in the future space you have only coordinates > (XYZ). What I suggest is to generate the pair of atoms externally and then > use a matrix of the form (Mx3), where M are the pairs of atoms you want to > predict your J and 3 are the features of the two atoms (distance, angle, > unpaired electrons). For a supervised approach you will need a training set > where the J is know so your training data will be of the form Mx4 and the > fourth feature will be the J you know. > > Hope that this is clear, if not I will be happy to help more > > > Sincerely > > Tommaso > > 2017-03-27 13:46 GMT-04:00 Henrique C. S. Junior : > >> Dear Robert, thank you. Yes, I'd like to talk about some specifics on the >> project. >> Thank you again. >> >> On Mon, Mar 27, 2017 at 2:25 PM, Robert Slater >> wrote: >> >>> You definitely can use some of the tools in sci-kit learn for supervised >>> machine learning. The real trick will be how well your training system is >>> representative of your future predictions. All of the various regression >>> algorithms would be of some value and you make even consider an ensemble to >>> help generalize. There will be some important questions to answer--what >>> kind of loss function do you want to look at? I assumed regression >>> (continuous response) but it could also classify--paramagnetic, >>> diamagnetic, ferromagnetic, etc... >>> >>> Another task to think about might be dimension reduction. >>> There is no guarantee you will get fantastic results--every problem is >>> unique and much will depend on exactly what you want out of the >>> solution--it may be that we get '10%' accuracy at best--for some systems >>> that is quite good, others it is horrible. >>> >>> If you'd like to talk specifics, feel free to contact me at this email. >>> I have a background in magnetism (PhD in magnetic multilayers--i was >>> physics, but as you are probably aware chemisty and physics blend in this >>> area) and have a fairly good knowledge of sci-kit learn and machine >>> learning. >>> >>> >>> >>> On Mon, Mar 27, 2017 at 10:50 AM, Henrique C. S. Junior < >>> henriquecsj at gmail.com> wrote: >>> >>>> I'm a chemist with some rudimentary programming skills (getting started >>>> with python) and in the middle of the year I'll be starting a Ph.D. project >>>> that uses computers to describe magnetism in molecular systems. >>>> >>>> Most of the time I get my results after several simulations and >>>> experiments, so, I know that one of the hardest tasks in molecular >>>> magnetism is to predict the nature of magnetic interactions. 
That's why >>>> I'll try to tackle this problem with Machine Learning (because such >>>> interactions are dependent, basically, of distances, angles and number of >>>> unpaired electrons). The idea is to feed the computer with a large training >>>> set (with number of unpaired electrons, XYZ coordinates of each molecule >>>> and experimental magnetic couplings) and see if it can predict the magnetic >>>> couplings (J(AB)) of new systems: >>>> (see example in the attached image) >>>> >>>> Can Scikit-Learn handle the task, knowing that the matrix used to >>>> represent atomic coordinates will probably have a different number of atoms >>>> (because some molecules have more atoms than others)? Or is this a job >>>> better suited for another software/approach? ? >>>> >>>> >>>> -- >>>> *Henrique C. S. Junior* >>>> Industrial Chemist - UFRRJ >>>> M. Sc. Inorganic Chemistry - UFRRJ >>>> Data Processing Center - PMP >>>> Visite o Mundo Qu?mico >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >> >> >> -- >> *Henrique C. S. Junior* >> Industrial Chemist - UFRRJ >> M. Sc. Inorganic Chemistry - UFRRJ >> Data Processing Center - PMP >> Visite o Mundo Qu?mico >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > > -- > Please do NOT send Microsoft Office Attachments: > http://www.gnu.org/philosophy/no-word-attachments.html > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -- *Henrique C. S. Junior* Industrial Chemist - UFRRJ M. Sc. Inorganic Chemistry - UFRRJ Data Processing Center - PMP Visite o Mundo Qu?mico -------------- next part -------------- An HTML attachment was scrubbed... URL: From tommaso.costanzo01 at gmail.com Mon Mar 27 19:35:21 2017 From: tommaso.costanzo01 at gmail.com (Tommaso Costanzo) Date: Mon, 27 Mar 2017 19:35:21 -0400 Subject: [scikit-learn] Using Scikit-Learn to predict magnetism in chemical systems In-Reply-To: References: Message-ID: Dear Henrique, I am sorry for the poor email I wrote before. What I was saying is simply the fact that if you are trying to use the coordinates as "features" from an .xyz file then by machine learning you will learn at wich coordinate certain atoms will occur so you can only make prediction on the coordinate. However, if I correctly understood, the "features" representing the coupling J are distance, angle, and electron number. Definitely this properties can be derived from the XYZ file format from simple geometric calculations and the number of electrons will depend from the type of atom. So, what I was trying to say is that instead of using the XYZ file as input for scikit-learn, I was suggesting to do the calculation of angle, distances, electrons' number in advance (with other software(s) or directly in python) and use the new calculated matrix as input for scikit-learn. In this case the machine will learn how J(AB) varies as a function of angle, distance, number of electrons. For example distance angle n el. 1 90 1 1 90 1 2 90 1 .... ... ... 
If you are using a supervised learning you will have to add a 4th column ( in reality a separate column vector) with your J(AB) on which you can train your model and then predict the unknown samples For example distance angle n el. J(AB) 1 90 1 1 1 90 1 1 2 90 1 0.5 .... ... ... ... Now if you train the model on the second matrix, and then you try to predict the first one you should expect a results like: 1 1 0.5 Of course in this case the "features" are perfectly equal, hence the example is completely unrealistic. However, I hope that it will help to understand what I was explaining in the previous email. If you want you can directly contact me at this email, and I hope that you got additional hints from Robert, that he seems to be even more knowledgeable than me. Sincerely Tommaso 2017-03-27 18:44 GMT-04:00 Henrique C. S. Junior : > Dear Tommaso, thank you for your kind reply. > I know I have a lot to study before actually starting any code and that's > why any suggestion is so valuable. > So, you're suggesting that a simplification of the system using only the > paramagnetic centers can be a good approach? (I'm not sure if I understood > it correctly). > My main idea was, at first, try to represent the systems as realistically > as possible (using coordinates). I know that the software will not know > what a bond is or what an intermolecular interaction is but, let's say, > after including 1000s of examples in the training, I was expecting that (as > an example) finding a C 0.000 and an H at 1.000 should start to "make > sense" because it leads to an experimental trend. And I totally agree that > my way to represent the system is not the better. > > Thank you so much for all the help. > > On Mon, Mar 27, 2017 at 4:15 PM, Tommaso Costanzo < > tommaso.costanzo01 at gmail.com> wrote: > >> Dear Henrique, >> >> >> I agree with Robert on the use of a supervised algorithm and I would also >> suggest you to try a semisupervised one if you have trouble in labeling >> your data. >> >> >> Moreover, as a chemist I think that the input you are thinking to use is >> not the in the best form for machine learning because you are trying to >> predict coupling J(AB) but in the future space you have only coordinates >> (XYZ). What I suggest is to generate the pair of atoms externally and then >> use a matrix of the form (Mx3), where M are the pairs of atoms you want to >> predict your J and 3 are the features of the two atoms (distance, angle, >> unpaired electrons). For a supervised approach you will need a training set >> where the J is know so your training data will be of the form Mx4 and the >> fourth feature will be the J you know. >> >> Hope that this is clear, if not I will be happy to help more >> >> >> Sincerely >> >> Tommaso >> >> 2017-03-27 13:46 GMT-04:00 Henrique C. S. Junior : >> >>> Dear Robert, thank you. Yes, I'd like to talk about some specifics on >>> the project. >>> Thank you again. >>> >>> On Mon, Mar 27, 2017 at 2:25 PM, Robert Slater >>> wrote: >>> >>>> You definitely can use some of the tools in sci-kit learn for >>>> supervised machine learning. The real trick will be how well your training >>>> system is representative of your future predictions. All of the various >>>> regression algorithms would be of some value and you make even consider an >>>> ensemble to help generalize. There will be some important questions to >>>> answer--what kind of loss function do you want to look at? 
I assumed >>>> regression (continuous response) but it could also classify--paramagnetic, >>>> diamagnetic, ferromagnetic, etc... >>>> >>>> Another task to think about might be dimension reduction. >>>> There is no guarantee you will get fantastic results--every problem is >>>> unique and much will depend on exactly what you want out of the >>>> solution--it may be that we get '10%' accuracy at best--for some systems >>>> that is quite good, others it is horrible. >>>> >>>> If you'd like to talk specifics, feel free to contact me at this >>>> email. I have a background in magnetism (PhD in magnetic multilayers--i >>>> was physics, but as you are probably aware chemisty and physics blend in >>>> this area) and have a fairly good knowledge of sci-kit learn and machine >>>> learning. >>>> >>>> >>>> >>>> On Mon, Mar 27, 2017 at 10:50 AM, Henrique C. S. Junior < >>>> henriquecsj at gmail.com> wrote: >>>> >>>>> I'm a chemist with some rudimentary programming skills (getting >>>>> started with python) and in the middle of the year I'll be starting a Ph.D. >>>>> project that uses computers to describe magnetism in molecular systems. >>>>> >>>>> Most of the time I get my results after several simulations and >>>>> experiments, so, I know that one of the hardest tasks in molecular >>>>> magnetism is to predict the nature of magnetic interactions. That's why >>>>> I'll try to tackle this problem with Machine Learning (because such >>>>> interactions are dependent, basically, of distances, angles and number of >>>>> unpaired electrons). The idea is to feed the computer with a large training >>>>> set (with number of unpaired electrons, XYZ coordinates of each molecule >>>>> and experimental magnetic couplings) and see if it can predict the magnetic >>>>> couplings (J(AB)) of new systems: >>>>> (see example in the attached image) >>>>> >>>>> Can Scikit-Learn handle the task, knowing that the matrix used to >>>>> represent atomic coordinates will probably have a different number of atoms >>>>> (because some molecules have more atoms than others)? Or is this a job >>>>> better suited for another software/approach? ? >>>>> >>>>> >>>>> -- >>>>> *Henrique C. S. Junior* >>>>> Industrial Chemist - UFRRJ >>>>> M. Sc. Inorganic Chemistry - UFRRJ >>>>> Data Processing Center - PMP >>>>> Visite o Mundo Qu?mico >>>>> >>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>> >>>>> >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> >>> >>> >>> -- >>> *Henrique C. S. Junior* >>> Industrial Chemist - UFRRJ >>> M. Sc. Inorganic Chemistry - UFRRJ >>> Data Processing Center - PMP >>> Visite o Mundo Qu?mico >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >> >> >> -- >> Please do NOT send Microsoft Office Attachments: >> http://www.gnu.org/philosophy/no-word-attachments.html >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > > -- > *Henrique C. S. Junior* > Industrial Chemist - UFRRJ > M. Sc. 
Inorganic Chemistry - UFRRJ > Data Processing Center - PMP > Visite o Mundo Qu?mico > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -- Please do NOT send Microsoft Office Attachments: http://www.gnu.org/philosophy/no-word-attachments.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From ross at cgl.ucsf.edu Tue Mar 28 01:12:22 2017 From: ross at cgl.ucsf.edu (Bill Ross) Date: Mon, 27 Mar 2017 22:12:22 -0700 Subject: [scikit-learn] Using Scikit-Learn to predict magnetism in chemical systems In-Reply-To: References: Message-ID: <95b33ee9-cf78-e7a0-775a-140ca5f03e17@cgl.ucsf.edu> Image processing deals with xy coordinates by (as I understand) training with multiple permutations of the raw data, in the form of translations and rotations in the 2d space. If training with 3d data, there would be that much more translating and rotating to do, in order to divorce the learning from the incidentals. Bill On 3/27/17 4:35 PM, Tommaso Costanzo wrote: > Dear Henrique, > I am sorry for the poor email I wrote before. What I was saying is > simply the fact that if you are trying to use the coordinates as > "features" from an .xyz file then by machine learning you will learn > at wich coordinate certain atoms will occur so you can only make > prediction on the coordinate. However, if I correctly understood, the > "features" representing the coupling J are distance, angle, and > electron number. Definitely this properties can be derived from the > XYZ file format from simple geometric calculations and the number of > electrons will depend from the type of atom. So, what I was trying to > say is that instead of using the XYZ file as input for scikit-learn, I > was suggesting to do the calculation of angle, distances, electrons' > number in advance (with other software(s) or directly in python) and > use the new calculated matrix as input for scikit-learn. In this case > the machine will learn how J(AB) varies as a function of angle, > distance, number of electrons. > For example > > distance angle n el. > 1 90 1 > 1 90 1 > 2 90 1 > .... ... ... > > If you are using a supervised learning you will have to add a 4th > column ( in reality a separate column vector) with your J(AB) on which > you can train your model and then predict the unknown samples > > For example > distance angle n el. J(AB) > 1 90 1 1 > 1 90 1 1 > 2 90 1 0.5 > .... ... ... ... > > Now if you train the model on the second matrix, and then you try to > predict the first one you should expect a results like: > > 1 > 1 > 0.5 > > Of course in this case the "features" are perfectly equal, hence the > example is completely unrealistic. However, I hope that it will help > to understand what I was explaining in the previous email. > If you want you can directly contact me at this email, and I hope that > you got additional hints from Robert, that he seems to be even more > knowledgeable than me. > > Sincerely > Tommaso > > > > 2017-03-27 18:44 GMT-04:00 Henrique C. S. Junior > >: > > Dear Tommaso, thank you for your kind reply. > I know I have a lot to study before actually starting any code and > that's why any suggestion is so valuable. > So, you're suggesting that a simplification of the system using > only the paramagnetic centers can be a good approach? (I'm not > sure if I understood it correctly). 
> My main idea was, at first, try to represent the systems as > realistically as possible (using coordinates). I know that the > software will not know what a bond is or what an intermolecular > interaction is but, let's say, after including 1000s of examples > in the training, I was expecting that (as an example) finding a C > 0.000 and an H at 1.000 should start to "make sense" because it > leads to an experimental trend. And I totally agree that my way to > represent the system is not the better. > > Thank you so much for all the help. > > On Mon, Mar 27, 2017 at 4:15 PM, Tommaso Costanzo > > wrote: > > Dear Henrique, > > > I agree with Robert on the use of a supervised algorithm and I > would also suggest you to try a semisupervised one if you have > trouble in labeling your data. > > > Moreover, as a chemist I think that the input you are thinking > to use is not the in the best form for machine learning > because you are trying to predict coupling J(AB) but in the > future space you have only coordinates (XYZ). What I suggest > is to generate the pair of atoms externally and then use a > matrix of the form (Mx3), where M are the pairs of atoms you > want to predict your J and 3 are the features of the two atoms > (distance, angle, unpaired electrons). For a supervised > approach you will need a training set where the J is know so > your training data will be of the form Mx4 and the fourth > feature will be the J you know. > > Hope that this is clear, if not I will be happy to help more > > > Sincerely > > Tommaso > > > 2017-03-27 13:46 GMT-04:00 Henrique C. S. Junior > >: > > Dear Robert, thank you. Yes, I'd like to talk about some > specifics on the project. > Thank you again. > > On Mon, Mar 27, 2017 at 2:25 PM, Robert Slater > > wrote: > > You definitely can use some of the tools in sci-kit > learn for supervised machine learning. The real trick > will be how well your training system is > representative of your future predictions. All of the > various regression algorithms would be of some value > and you make even consider an ensemble to help > generalize. There will be some important questions to > answer--what kind of loss function do you want to look > at? I assumed regression (continuous response) but it > could also classify--paramagnetic, diamagnetic, > ferromagnetic, etc... > > Another task to think about might be dimension reduction. > There is no guarantee you will get fantastic > results--every problem is unique and much will depend > on exactly what you want out of the solution--it may > be that we get '10%' accuracy at best--for some > systems that is quite good, others it is horrible. > > If you'd like to talk specifics, feel free to contact > me at this email. I have a background in magnetism > (PhD in magnetic multilayers--i was physics, but as > you are probably aware chemisty and physics blend in > this area) and have a fairly good knowledge of sci-kit > learn and machine learning. > > > > On Mon, Mar 27, 2017 at 10:50 AM, Henrique C. S. > Junior > wrote: > > I'm a chemist with some rudimentary programming > skills (getting started with python) and in the > middle of the year I'll be starting a Ph.D. > project that uses computers to describe magnetism > in molecular systems. > > Most of the time I get my results after several > simulations and experiments, so, I know that one > of the hardest tasks in molecular magnetism is to > predict the nature of magnetic interactions. 
> That's why I'll try to tackle this problem with > Machine Learning (because such interactions are > dependent, basically, of distances, angles and > number of unpaired electrons). The idea is to feed > the computer with a large training set (with > number of unpaired electrons, XYZ coordinates of > each molecule and experimental magnetic couplings) > and see if it can predict the magnetic couplings > (J(AB)) of new systems: > > (see example in the attached image) > > Can Scikit-Learn handle the task, knowing that the > matrix used to represent atomic coordinates will > probably have a different number of atoms (because > some molecules have more atoms than others)? Or is > this a job better suited for another > software/approach? ? > > > -- > *Henrique C. S. Junior* > Industrial Chemist - UFRRJ > M. Sc. Inorganic Chemistry - UFRRJ > Data Processing Center - PMP > Visite o Mundo Qu?mico > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > -- > *Henrique C. S. Junior* > Industrial Chemist - UFRRJ > M. Sc. Inorganic Chemistry - UFRRJ > Data Processing Center - PMP > Visite o Mundo Qu?mico > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > -- > Please do NOT send Microsoft Office Attachments: > http://www.gnu.org/philosophy/no-word-attachments.html > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > -- > *Henrique C. S. Junior* > Industrial Chemist - UFRRJ > M. Sc. Inorganic Chemistry - UFRRJ > Data Processing Center - PMP > Visite o Mundo Qu?mico > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > -- > Please do NOT send Microsoft Office Attachments: > http://www.gnu.org/philosophy/no-word-attachments.html > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From henriquecsj at gmail.com Tue Mar 28 12:48:14 2017 From: henriquecsj at gmail.com (Henrique C. S. Junior) Date: Tue, 28 Mar 2017 13:48:14 -0300 Subject: [scikit-learn] Using Scikit-Learn to predict magnetism in chemical systems In-Reply-To: <95b33ee9-cf78-e7a0-775a-140ca5f03e17@cgl.ucsf.edu> References: <95b33ee9-cf78-e7a0-775a-140ca5f03e17@cgl.ucsf.edu> Message-ID: @Tommaso, this is something like Internal Coordinates[1], right? @Bill, thanks for the hint, I'll definitely take a look at this. [1] - https://en.wikipedia.org/wiki/Z-matrix_(chemistry) On Tue, Mar 28, 2017 at 2:12 AM, Bill Ross wrote: > Image processing deals with xy coordinates by (as I understand) training > with multiple permutations of the raw data, in the form of translations and > rotations in the 2d space. If training with 3d data, there would be that > much more translating and rotating to do, in order to divorce the learning > from the incidentals. 
> > Bill > > On 3/27/17 4:35 PM, Tommaso Costanzo wrote: > > Dear Henrique, > I am sorry for the poor email I wrote before. What I was saying is simply > the fact that if you are trying to use the coordinates as "features" from > an .xyz file then by machine learning you will learn at wich coordinate > certain atoms will occur so you can only make prediction on the coordinate. > However, if I correctly understood, the "features" representing the > coupling J are distance, angle, and electron number. Definitely this > properties can be derived from the XYZ file format from simple geometric > calculations and the number of electrons will depend from the type of atom. > So, what I was trying to say is that instead of using the XYZ file as input > for scikit-learn, I was suggesting to do the calculation of angle, > distances, electrons' number in advance (with other software(s) or directly > in python) and use the new calculated matrix as input for scikit-learn. In > this case the machine will learn how J(AB) varies as a function of angle, > distance, number of electrons. > For example > > distance angle n el. > 1 90 1 > 1 90 1 > 2 90 1 > .... ... ... > > If you are using a supervised learning you will have to add a 4th column ( > in reality a separate column vector) with your J(AB) on which you can train > your model and then predict the unknown samples > > For example > distance angle n el. J(AB) > 1 90 1 1 > 1 90 1 1 > 2 90 1 0.5 > .... ... ... ... > > Now if you train the model on the second matrix, and then you try to > predict the first one you should expect a results like: > > 1 > 1 > 0.5 > > Of course in this case the "features" are perfectly equal, hence the > example is completely unrealistic. However, I hope that it will help to > understand what I was explaining in the previous email. > If you want you can directly contact me at this email, and I hope that you > got additional hints from Robert, that he seems to be even more > knowledgeable than me. > > Sincerely > Tommaso > > > > 2017-03-27 18:44 GMT-04:00 Henrique C. S. Junior : > >> Dear Tommaso, thank you for your kind reply. >> I know I have a lot to study before actually starting any code and that's >> why any suggestion is so valuable. >> So, you're suggesting that a simplification of the system using only the >> paramagnetic centers can be a good approach? (I'm not sure if I understood >> it correctly). >> My main idea was, at first, try to represent the systems as realistically >> as possible (using coordinates). I know that the software will not know >> what a bond is or what an intermolecular interaction is but, let's say, >> after including 1000s of examples in the training, I was expecting that (as >> an example) finding a C 0.000 and an H at 1.000 should start to "make >> sense" because it leads to an experimental trend. And I totally agree that >> my way to represent the system is not the better. >> >> Thank you so much for all the help. >> >> On Mon, Mar 27, 2017 at 4:15 PM, Tommaso Costanzo < >> tommaso.costanzo01 at gmail.com> wrote: >> >>> Dear Henrique, >>> >>> >>> I agree with Robert on the use of a supervised algorithm and I would >>> also suggest you to try a semisupervised one if you have trouble in >>> labeling your data. >>> >>> >>> Moreover, as a chemist I think that the input you are thinking to use is >>> not the in the best form for machine learning because you are trying to >>> predict coupling J(AB) but in the future space you have only coordinates >>> (XYZ). 
What I suggest is to generate the pair of atoms externally and then >>> use a matrix of the form (Mx3), where M are the pairs of atoms you want to >>> predict your J and 3 are the features of the two atoms (distance, angle, >>> unpaired electrons). For a supervised approach you will need a training set >>> where the J is know so your training data will be of the form Mx4 and the >>> fourth feature will be the J you know. >>> >>> Hope that this is clear, if not I will be happy to help more >>> >>> >>> Sincerely >>> >>> Tommaso >>> >>> 2017-03-27 13:46 GMT-04:00 Henrique C. S. Junior >>> : >>> >>>> Dear Robert, thank you. Yes, I'd like to talk about some specifics on >>>> the project. >>>> Thank you again. >>>> >>>> On Mon, Mar 27, 2017 at 2:25 PM, Robert Slater >>>> wrote: >>>> >>>>> You definitely can use some of the tools in sci-kit learn for >>>>> supervised machine learning. The real trick will be how well your training >>>>> system is representative of your future predictions. All of the various >>>>> regression algorithms would be of some value and you make even consider an >>>>> ensemble to help generalize. There will be some important questions to >>>>> answer--what kind of loss function do you want to look at? I assumed >>>>> regression (continuous response) but it could also classify--paramagnetic, >>>>> diamagnetic, ferromagnetic, etc... >>>>> >>>>> Another task to think about might be dimension reduction. >>>>> There is no guarantee you will get fantastic results--every problem is >>>>> unique and much will depend on exactly what you want out of the >>>>> solution--it may be that we get '10%' accuracy at best--for some systems >>>>> that is quite good, others it is horrible. >>>>> >>>>> If you'd like to talk specifics, feel free to contact me at this >>>>> email. I have a background in magnetism (PhD in magnetic multilayers--i >>>>> was physics, but as you are probably aware chemisty and physics blend in >>>>> this area) and have a fairly good knowledge of sci-kit learn and machine >>>>> learning. >>>>> >>>>> >>>>> >>>>> On Mon, Mar 27, 2017 at 10:50 AM, Henrique C. S. Junior < >>>>> henriquecsj at gmail.com> wrote: >>>>> >>>>>> I'm a chemist with some rudimentary programming skills (getting >>>>>> started with python) and in the middle of the year I'll be starting a Ph.D. >>>>>> project that uses computers to describe magnetism in molecular systems. >>>>>> >>>>>> Most of the time I get my results after several simulations and >>>>>> experiments, so, I know that one of the hardest tasks in molecular >>>>>> magnetism is to predict the nature of magnetic interactions. That's why >>>>>> I'll try to tackle this problem with Machine Learning (because such >>>>>> interactions are dependent, basically, of distances, angles and number of >>>>>> unpaired electrons). The idea is to feed the computer with a large training >>>>>> set (with number of unpaired electrons, XYZ coordinates of each molecule >>>>>> and experimental magnetic couplings) and see if it can predict the magnetic >>>>>> couplings (J(AB)) of new systems: >>>>>> (see example in the attached image) >>>>>> >>>>>> Can Scikit-Learn handle the task, knowing that the matrix used to >>>>>> represent atomic coordinates will probably have a different number of atoms >>>>>> (because some molecules have more atoms than others)? Or is this a job >>>>>> better suited for another software/approach? ? >>>>>> >>>>>> >>>>>> -- >>>>>> *Henrique C. S. Junior* >>>>>> Industrial Chemist - UFRRJ >>>>>> M. Sc. 
Inorganic Chemistry - UFRRJ >>>>>> Data Processing Center - PMP >>>>>> Visite o Mundo Qu?mico >>>>>> >>>>>> _______________________________________________ >>>>>> scikit-learn mailing list >>>>>> scikit-learn at python.org >>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>> >>>>>> >>>>> >>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>> >>>>> >>>> >>>> >>>> -- >>>> *Henrique C. S. Junior* >>>> Industrial Chemist - UFRRJ >>>> M. Sc. Inorganic Chemistry - UFRRJ >>>> Data Processing Center - PMP >>>> Visite o Mundo Qu?mico >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> >>> >>> >>> -- >>> Please do NOT send Microsoft Office Attachments: >>> http://www.gnu.org/philosophy/no-word-attachments.html >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >> >> >> -- >> *Henrique C. S. Junior* >> Industrial Chemist - UFRRJ >> M. Sc. Inorganic Chemistry - UFRRJ >> Data Processing Center - PMP >> Visite o Mundo Qu?mico >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > > -- > Please do NOT send Microsoft Office Attachments: > http://www.gnu.org/philosophy/no-word-attachments.html > > > _______________________________________________ > scikit-learn mailing listscikit-learn at python.orghttps://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -- *Henrique C. S. Junior* Industrial Chemist - UFRRJ M. Sc. Inorganic Chemistry - UFRRJ Data Processing Center - PMP Visite o Mundo Qu?mico -------------- next part -------------- An HTML attachment was scrubbed... URL: From ross at cgl.ucsf.edu Tue Mar 28 13:07:57 2017 From: ross at cgl.ucsf.edu (Bill Ross) Date: Tue, 28 Mar 2017 10:07:57 -0700 Subject: [scikit-learn] Using Scikit-Learn to predict magnetism in chemical systems In-Reply-To: References: <95b33ee9-cf78-e7a0-775a-140ca5f03e17@cgl.ucsf.edu> Message-ID: <1d5a4898-3644-dca7-4df3-79959c8b0d0b@cgl.ucsf.edu> I think I saw it in the Deep Learning book: http://www.deeplearningbook.org/ Bill On 3/28/17 9:48 AM, Henrique C. S. Junior wrote: > @Tommaso, this is something like Internal Coordinates[1], right? > @Bill, thanks for the hint, I'll definitely take a look at this. > > [1] - https://en.wikipedia.org/wiki/Z-matrix_(chemistry) > > > On Tue, Mar 28, 2017 at 2:12 AM, Bill Ross > wrote: > > Image processing deals with xy coordinates by (as I understand) > training with multiple permutations of the raw data, in the form > of translations and rotations in the 2d space. If training with 3d > data, there would be that much more translating and rotating to > do, in order to divorce the learning from the incidentals. > > Bill > > > On 3/27/17 4:35 PM, Tommaso Costanzo wrote: >> Dear Henrique, >> I am sorry for the poor email I wrote before. 
What I was saying >> is simply the fact that if you are trying to use the coordinates >> as "features" from an .xyz file then by machine learning you will >> learn at wich coordinate certain atoms will occur so you can only >> make prediction on the coordinate. However, if I correctly >> understood, the "features" representing the coupling J are >> distance, angle, and electron number. Definitely this properties >> can be derived from the XYZ file format from simple geometric >> calculations and the number of electrons will depend from the >> type of atom. So, what I was trying to say is that instead of >> using the XYZ file as input for scikit-learn, I was suggesting to >> do the calculation of angle, distances, electrons' number in >> advance (with other software(s) or directly in python) and use >> the new calculated matrix as input for scikit-learn. In this case >> the machine will learn how J(AB) varies as a function of angle, >> distance, number of electrons. >> For example >> >> distance angle n el. >> 1 90 1 >> 1 90 1 >> 2 90 1 >> .... ... ... >> >> If you are using a supervised learning you will have to add a 4th >> column ( in reality a separate column vector) with your J(AB) on >> which you can train your model and then predict the unknown samples >> >> For example >> distance angle n el. J(AB) >> 1 90 1 1 >> 1 90 1 1 >> 2 90 1 0.5 >> .... ... ... ... >> >> Now if you train the model on the second matrix, and then you try >> to predict the first one you should expect a results like: >> >> 1 >> 1 >> 0.5 >> >> Of course in this case the "features" are perfectly equal, hence >> the example is completely unrealistic. However, I hope that it >> will help to understand what I was explaining in the previous email. >> If you want you can directly contact me at this email, and I hope >> that you got additional hints from Robert, that he seems to be >> even more knowledgeable than me. >> >> Sincerely >> Tommaso >> >> >> >> 2017-03-27 18:44 GMT-04:00 Henrique C. S. Junior >> >: >> >> Dear Tommaso, thank you for your kind reply. >> I know I have a lot to study before actually starting any >> code and that's why any suggestion is so valuable. >> So, you're suggesting that a simplification of the system >> using only the paramagnetic centers can be a good approach? >> (I'm not sure if I understood it correctly). >> My main idea was, at first, try to represent the systems as >> realistically as possible (using coordinates). I know that >> the software will not know what a bond is or what an >> intermolecular interaction is but, let's say, after including >> 1000s of examples in the training, I was expecting that (as >> an example) finding a C 0.000 and an H at 1.000 should start >> to "make sense" because it leads to an experimental trend. >> And I totally agree that my way to represent the system is >> not the better. >> >> Thank you so much for all the help. >> >> On Mon, Mar 27, 2017 at 4:15 PM, Tommaso Costanzo >> > > wrote: >> >> Dear Henrique, >> >> >> I agree with Robert on the use of a supervised algorithm >> and I would also suggest you to try a semisupervised one >> if you have trouble in labeling your data. >> >> >> Moreover, as a chemist I think that the input you are >> thinking to use is not the in the best form for machine >> learning because you are trying to predict coupling J(AB) >> but in the future space you have only coordinates (XYZ). 
>> What I suggest is to generate the pair of atoms >> externally and then use a matrix of the form (Mx3), where >> M are the pairs of atoms you want to predict your J and 3 >> are the features of the two atoms (distance, angle, >> unpaired electrons). For a supervised approach you will >> need a training set where the J is know so your training >> data will be of the form Mx4 and the fourth feature will >> be the J you know. >> >> Hope that this is clear, if not I will be happy to help more >> >> >> Sincerely >> >> Tommaso >> >> >> 2017-03-27 13:46 GMT-04:00 Henrique C. S. Junior >> >: >> >> Dear Robert, thank you. Yes, I'd like to talk about >> some specifics on the project. >> Thank you again. >> >> On Mon, Mar 27, 2017 at 2:25 PM, Robert Slater >> > wrote: >> >> You definitely can use some of the tools in >> sci-kit learn for supervised machine learning. >> The real trick will be how well your training >> system is representative of your future >> predictions. All of the various regression >> algorithms would be of some value and you make >> even consider an ensemble to help generalize. >> There will be some important questions to >> answer--what kind of loss function do you want to >> look at? I assumed regression (continuous >> response) but it could also >> classify--paramagnetic, diamagnetic, >> ferromagnetic, etc... >> >> Another task to think about might be dimension >> reduction. >> There is no guarantee you will get fantastic >> results--every problem is unique and much will >> depend on exactly what you want out of the >> solution--it may be that we get '10%' accuracy at >> best--for some systems that is quite good, others >> it is horrible. >> >> If you'd like to talk specifics, feel free to >> contact me at this email. I have a background in >> magnetism (PhD in magnetic multilayers--i was >> physics, but as you are probably aware chemisty >> and physics blend in this area) and have a fairly >> good knowledge of sci-kit learn and machine >> learning. >> >> >> >> On Mon, Mar 27, 2017 at 10:50 AM, Henrique C. S. >> Junior > > wrote: >> >> I'm a chemist with some rudimentary >> programming skills (getting started with >> python) and in the middle of the year I'll be >> starting a Ph.D. project that uses computers >> to describe magnetism in molecular systems. >> >> Most of the time I get my results after >> several simulations and experiments, so, I >> know that one of the hardest tasks in >> molecular magnetism is to predict the nature >> of magnetic interactions. That's why I'll try >> to tackle this problem with Machine Learning >> (because such interactions are dependent, >> basically, of distances, angles and number of >> unpaired electrons). The idea is to feed the >> computer with a large training set (with >> number of unpaired electrons, XYZ coordinates >> of each molecule and experimental magnetic >> couplings) and see if it can predict the >> magnetic couplings (J(AB)) of new systems: >> >> (see example in the attached image) >> >> Can Scikit-Learn handle the task, knowing >> that the matrix used to represent atomic >> coordinates will probably have a different >> number of atoms (because some molecules have >> more atoms than others)? Or is this a job >> better suited for another software/approach? ? >> >> >> -- >> *Henrique C. S. Junior* >> Industrial Chemist - UFRRJ >> M. Sc. 
>> Inorganic Chemistry - UFRRJ >> Data Processing Center - PMP >> Visite o Mundo Químico >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> >> -- >> Please do NOT send Microsoft Office Attachments: >> http://www.gnu.org/philosophy/no-word-attachments.html >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > > -- > *Henrique C. S. Junior* > Industrial Chemist - UFRRJ > M. Sc. Inorganic Chemistry - UFRRJ > Data Processing Center - PMP > Visite o Mundo Químico > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From tommaso.costanzo01 at gmail.com Tue Mar 28 14:57:52 2017 From: tommaso.costanzo01 at gmail.com (Tommaso Costanzo) Date: Tue, 28 Mar 2017 14:57:52 -0400 Subject: [scikit-learn] Using Scikit-Learn to predict magnetism in chemical systems In-Reply-To: References: <95b33ee9-cf78-e7a0-775a-140ca5f03e17@cgl.ucsf.edu> Message-ID: Dear Henrique, Yes, my previous representation looks like a Z-matrix format (BTW, in scikit-learn you will need the same number of columns on every line, so you will need to fill in the first line somehow). However, I will take this email as an opportunity to stress that you do not have to stick to a specific file format: the features/columns (the 2nd index of a 2D matrix) have to represent the properties/parameters that directly affect what you are trying to predict. In fact, you can even add more features to your columns than the ones previously cited, and you will probably describe the system better. Just for the sake of a better explanation, off the top of my head you could use: bond length, atom type, number of unpaired electrons, total number of electrons, dihedral angle of the two atoms, number of atoms between the pair (e.g.
if you have Mn--O--Mn there is an oxygen between the two Mn that you might want to account for when looking at the coupling) and so on and so forth. The number of parameters you will have to use will solely depend on your system and what you need to describe, but it will not affect the scikit-learn routines in any case. Basically, any 2D matrix of numbers will work in scikit-learn, but whether those numbers carry physical meaning depends on what they represent. Let me know if it makes more sense. (A minimal, purely illustrative scikit-learn sketch of this pair-feature workflow follows just after this message.) Sincerely Tommaso On Mar 28, 2017 12:51 PM, "Henrique C. S. Junior" wrote: @Tommaso, this is something like Internal Coordinates[1], right? @Bill, thanks for the hint, I'll definitely take a look at this. [1] - https://en.wikipedia.org/wiki/Z-matrix_(chemistry) On Tue, Mar 28, 2017 at 2:12 AM, Bill Ross wrote: > Image processing deals with xy coordinates by (as I understand) training > with multiple permutations of the raw data, in the form of translations and > rotations in the 2d space. If training with 3d data, there would be that > much more translating and rotating to do, in order to divorce the learning > from the incidentals. > > Bill > > On 3/27/17 4:35 PM, Tommaso Costanzo wrote: > > Dear Henrique, > I am sorry for the poor email I wrote before. What I was saying is simply > the fact that if you are trying to use the coordinates as "features" from > an .xyz file then by machine learning you will learn at wich coordinate > certain atoms will occur so you can only make prediction on the coordinate. > However, if I correctly understood, the "features" representing the > coupling J are distance, angle, and electron number. Definitely this > properties can be derived from the XYZ file format from simple geometric > calculations and the number of electrons will depend from the type of atom. > So, what I was trying to say is that instead of using the XYZ file as input > for scikit-learn, I was suggesting to do the calculation of angle, > distances, electrons' number in advance (with other software(s) or directly > in python) and use the new calculated matrix as input for scikit-learn. In > this case the machine will learn how J(AB) varies as a function of angle, > distance, number of electrons. > For example > > distance angle n el. > 1 90 1 > 1 90 1 > 2 90 1 > .... ... ... > > If you are using a supervised learning you will have to add a 4th column ( > in reality a separate column vector) with your J(AB) on which you can train > your model and then predict the unknown samples > > For example > distance angle n el. J(AB) > 1 90 1 1 > 1 90 1 1 > 2 90 1 0.5 > .... ... ... ... > > Now if you train the model on the second matrix, and then you try to > predict the first one you should expect a results like: > > 1 > 1 > 0.5 > > Of course in this case the "features" are perfectly equal, hence the > example is completely unrealistic. However, I hope that it will help to > understand what I was explaining in the previous email. > If you want you can directly contact me at this email, and I hope that you > got additional hints from Robert, that he seems to be even more > knowledgeable than me. > > Sincerely > Tommaso > > > > 2017-03-27 18:44 GMT-04:00 Henrique C. S. Junior : > >> Dear Tommaso, thank you for your kind reply. >> I know I have a lot to study before actually starting any code and that's >> why any suggestion is so valuable. >> So, you're suggesting that a simplification of the system using only the >> paramagnetic centers can be a good approach? (I'm not sure if I understood >> it correctly).
>> My main idea was, at first, try to represent the systems as realistically >> as possible (using coordinates). I know that the software will not know >> what a bond is or what an intermolecular interaction is but, let's say, >> after including 1000s of examples in the training, I was expecting that (as >> an example) finding a C 0.000 and an H at 1.000 should start to "make >> sense" because it leads to an experimental trend. And I totally agree that >> my way to represent the system is not the better. >> >> Thank you so much for all the help. >> >> On Mon, Mar 27, 2017 at 4:15 PM, Tommaso Costanzo < >> tommaso.costanzo01 at gmail.com> wrote: >> >>> Dear Henrique, >>> >>> >>> I agree with Robert on the use of a supervised algorithm and I would >>> also suggest you to try a semisupervised one if you have trouble in >>> labeling your data. >>> >>> >>> Moreover, as a chemist I think that the input you are thinking to use is >>> not the in the best form for machine learning because you are trying to >>> predict coupling J(AB) but in the future space you have only coordinates >>> (XYZ). What I suggest is to generate the pair of atoms externally and then >>> use a matrix of the form (Mx3), where M are the pairs of atoms you want to >>> predict your J and 3 are the features of the two atoms (distance, angle, >>> unpaired electrons). For a supervised approach you will need a training set >>> where the J is know so your training data will be of the form Mx4 and the >>> fourth feature will be the J you know. >>> >>> Hope that this is clear, if not I will be happy to help more >>> >>> >>> Sincerely >>> >>> Tommaso >>> >>> 2017-03-27 13:46 GMT-04:00 Henrique C. S. Junior >>> : >>> >>>> Dear Robert, thank you. Yes, I'd like to talk about some specifics on >>>> the project. >>>> Thank you again. >>>> >>>> On Mon, Mar 27, 2017 at 2:25 PM, Robert Slater >>>> wrote: >>>> >>>>> You definitely can use some of the tools in sci-kit learn for >>>>> supervised machine learning. The real trick will be how well your training >>>>> system is representative of your future predictions. All of the various >>>>> regression algorithms would be of some value and you make even consider an >>>>> ensemble to help generalize. There will be some important questions to >>>>> answer--what kind of loss function do you want to look at? I assumed >>>>> regression (continuous response) but it could also classify--paramagnetic, >>>>> diamagnetic, ferromagnetic, etc... >>>>> >>>>> Another task to think about might be dimension reduction. >>>>> There is no guarantee you will get fantastic results--every problem is >>>>> unique and much will depend on exactly what you want out of the >>>>> solution--it may be that we get '10%' accuracy at best--for some systems >>>>> that is quite good, others it is horrible. >>>>> >>>>> If you'd like to talk specifics, feel free to contact me at this >>>>> email. I have a background in magnetism (PhD in magnetic multilayers--i >>>>> was physics, but as you are probably aware chemisty and physics blend in >>>>> this area) and have a fairly good knowledge of sci-kit learn and machine >>>>> learning. >>>>> >>>>> >>>>> >>>>> On Mon, Mar 27, 2017 at 10:50 AM, Henrique C. S. Junior < >>>>> henriquecsj at gmail.com> wrote: >>>>> >>>>>> I'm a chemist with some rudimentary programming skills (getting >>>>>> started with python) and in the middle of the year I'll be starting a Ph.D. >>>>>> project that uses computers to describe magnetism in molecular systems. 
>>>>>> >>>>>> Most of the time I get my results after several simulations and >>>>>> experiments, so, I know that one of the hardest tasks in molecular >>>>>> magnetism is to predict the nature of magnetic interactions. That's why >>>>>> I'll try to tackle this problem with Machine Learning (because such >>>>>> interactions are dependent, basically, of distances, angles and number of >>>>>> unpaired electrons). The idea is to feed the computer with a large training >>>>>> set (with number of unpaired electrons, XYZ coordinates of each molecule >>>>>> and experimental magnetic couplings) and see if it can predict the magnetic >>>>>> couplings (J(AB)) of new systems: >>>>>> (see example in the attached image) >>>>>> >>>>>> Can Scikit-Learn handle the task, knowing that the matrix used to >>>>>> represent atomic coordinates will probably have a different number of atoms >>>>>> (because some molecules have more atoms than others)? Or is this a job >>>>>> better suited for another software/approach? ? >>>>>> >>>>>> >>>>>> -- >>>>>> *Henrique C. S. Junior* >>>>>> Industrial Chemist - UFRRJ >>>>>> M. Sc. Inorganic Chemistry - UFRRJ >>>>>> Data Processing Center - PMP >>>>>> Visite o Mundo Qu?mico >>>>>> >>>>>> _______________________________________________ >>>>>> scikit-learn mailing list >>>>>> scikit-learn at python.org >>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>> >>>>>> >>>>> >>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>> >>>>> >>>> >>>> >>>> -- >>>> *Henrique C. S. Junior* >>>> Industrial Chemist - UFRRJ >>>> M. Sc. Inorganic Chemistry - UFRRJ >>>> Data Processing Center - PMP >>>> Visite o Mundo Qu?mico >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> >>> >>> >>> -- >>> Please do NOT send Microsoft Office Attachments: >>> http://www.gnu.org/philosophy/no-word-attachments.html >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >> >> >> -- >> *Henrique C. S. Junior* >> Industrial Chemist - UFRRJ >> M. Sc. Inorganic Chemistry - UFRRJ >> Data Processing Center - PMP >> Visite o Mundo Qu?mico >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > > -- > Please do NOT send Microsoft Office Attachments: > http://www.gnu.org/philosophy/no-word-attachments.html > > > _______________________________________________ > scikit-learn mailing listscikit-learn at python.orghttps://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -- *Henrique C. S. Junior* Industrial Chemist - UFRRJ M. Sc. Inorganic Chemistry - UFRRJ Data Processing Center - PMP Visite o Mundo Qu?mico _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... 
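A minimal, purely illustrative sketch of the pair-feature workflow described in this thread: one row per magnetic pair A-B, with distance, bridging angle and number of unpaired electrons as features, plus the experimentally known J(AB) for the training rows. The numbers, column choices and the choice of RandomForestRegressor below are made-up assumptions, not code from the thread; any scikit-learn regressor (e.g. SVR) could be dropped in instead:

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    # One row per magnetic pair A-B: [distance, bridging angle (deg), unpaired electrons]
    X_train = np.array([
        [2.0,  90.0, 1],
        [2.0,  95.0, 1],
        [2.1, 100.0, 2],
        [2.3, 120.0, 2],
        [2.4, 130.0, 1],
        [2.5, 140.0, 2],
    ])
    # Experimentally determined coupling constants J(AB) for the training pairs (arbitrary units)
    y_train = np.array([10.0, 8.5, -2.0, -15.0, -20.0, -30.0])

    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(X_train, y_train)

    # Predict J(AB) for a new, unlabelled pair described by the same three features
    X_new = np.array([[2.2, 110.0, 2]])
    print(model.predict(X_new))

With real data the feature matrix would be computed from the structures as discussed above, and the model and its hyperparameters would be selected by cross-validation.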
URL: From ahowe42 at gmail.com Wed Mar 29 03:21:20 2017 From: ahowe42 at gmail.com (Andrew Howe) Date: Wed, 29 Mar 2017 10:21:20 +0300 Subject: [scikit-learn] decision trees Message-ID: Is one-hot encoding still the most accurate way to pass categorical variables to decision trees in scikit-learn (i.e. without causing spurious ordering/interpolation)? Thanks. Andrew <~~~~~~~~~~~~~~~~~~~~~~~~~~~> J. Andrew Howe, PhD www.andrewhowe.com http://www.linkedin.com/in/ahowe42 https://www.researchgate.net/profile/John_Howe12/ I live to learn, so I can learn to live. - me <~~~~~~~~~~~~~~~~~~~~~~~~~~~> -------------- next part -------------- An HTML attachment was scrubbed... URL: From olivier.grisel at ensta.org Wed Mar 29 03:32:39 2017 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Wed, 29 Mar 2017 09:32:39 +0200 Subject: [scikit-learn] decision trees In-Reply-To: References: Message-ID: For large enough models (e.g. random forests or gradient boosted trees ensembles) I would definitely recommend arbitrary integer coding for the categorical variables. Try both, use cross-validation and see for yourself. -- Olivier From ahowe42 at gmail.com Wed Mar 29 03:38:11 2017 From: ahowe42 at gmail.com (Andrew Howe) Date: Wed, 29 Mar 2017 10:38:11 +0300 Subject: [scikit-learn] decision trees In-Reply-To: References: Message-ID: My question is more along the lines of will the DT classifier falsely infer an ordering? <~~~~~~~~~~~~~~~~~~~~~~~~~~~> J. Andrew Howe, PhD www.andrewhowe.com http://www.linkedin.com/in/ahowe42 https://www.researchgate.net/profile/John_Howe12/ I live to learn, so I can learn to live. - me <~~~~~~~~~~~~~~~~~~~~~~~~~~~> On Wed, Mar 29, 2017 at 10:32 AM, Olivier Grisel wrote: > For large enough models (e.g. random forests or gradient boosted trees > ensembles) I would definitely recommend arbitrary integer coding for > the categorical variables. > > Try both, use cross-validation and see for yourself. > > -- > Olivier > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bdholt1 at gmail.com Wed Mar 29 04:52:11 2017 From: bdholt1 at gmail.com (Brian Holt) Date: Wed, 29 Mar 2017 09:52:11 +0100 Subject: [scikit-learn] decision trees In-Reply-To: References: Message-ID: >From a theoretical point of view, yes you should one-hot-encode your categorical variables if you don't want any ordering to be implied. Brian On 29 Mar 2017 08:40, "Andrew Howe" wrote: > My question is more along the lines of will the DT classifier falsely > infer an ordering? > > <~~~~~~~~~~~~~~~~~~~~~~~~~~~> > J. Andrew Howe, PhD > www.andrewhowe.com > http://www.linkedin.com/in/ahowe42 > https://www.researchgate.net/profile/John_Howe12/ > I live to learn, so I can learn to live. - me > <~~~~~~~~~~~~~~~~~~~~~~~~~~~> > > On Wed, Mar 29, 2017 at 10:32 AM, Olivier Grisel > wrote: > >> For large enough models (e.g. random forests or gradient boosted trees >> ensembles) I would definitely recommend arbitrary integer coding for >> the categorical variables. >> >> Try both, use cross-validation and see for yourself. 
>> >> -- >> Olivier >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From olivier.grisel at ensta.org Wed Mar 29 05:56:38 2017 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Wed, 29 Mar 2017 11:56:38 +0200 Subject: [scikit-learn] decision trees In-Reply-To: References: Message-ID: Integer coding will indeed make the DT assume an arbitrary ordering while one-hot encoding does not force the tree model to make that assumption. However in practice when the depth of the trees is not too limited (or if you use a large enough ensemble of trees), the model will have enough flexibility to introduce as many splits as necessary to isolate individual categories of the integer-coded feature, and therefore the arbitrary ordering assumption is not a problem. On the other hand using one-hot encoding can introduce a detrimental inductive bias on random forests: random forest uses uniform random feature sampling when deciding which feature to split on (e.g. pick the best split out of 25% of the features selected at random). Let's consider the following example: assume you have a heterogeneously typed dataset with 99 numeric features and 1 categorical feature with categorical cardinality 1000 (1000 possible values for that feature): - the RF will have one chance in 100 to pick each feature (categorical or numerical) as a candidate for the next split when using integer coding, - the RF will have 0.1% chance of picking each numerical feature and 99% chance to select a candidate feature split on a category of the unique categorical feature when using one-hot encoding. The inductive bias of one-hot encoding on RFs can therefore completely break the feature balancing. The feature encoding will also impact the inductive bias with respect to the importance of the depth of the trees, even when feature splits are selected fully deterministically. Finally, one-hot encoding features with large categorical cardinalities will be much slower than when using naive integer coding. TL;DR: naive theoretical analysis based only on the ordering assumption can be misleading. The inductive biases of each feature encoding are more complex to evaluate. Use cross-validation to decide which is the best on your problem. Don't ignore computational considerations (CPU and memory usage). -- Olivier From ahowe42 at gmail.com Wed Mar 29 06:46:46 2017 From: ahowe42 at gmail.com (Andrew Howe) Date: Wed, 29 Mar 2017 13:46:46 +0300 Subject: [scikit-learn] decision trees In-Reply-To: References: Message-ID: Thanks very much for the thorough answer. I didn't think about the inductive bias issue with my forests. I'll evaluate both sets of coding for my unordered categoricals. Andrew <~~~~~~~~~~~~~~~~~~~~~~~~~~~> J. Andrew Howe, PhD www.andrewhowe.com http://www.linkedin.com/in/ahowe42 https://www.researchgate.net/profile/John_Howe12/ I live to learn, so I can learn to live. - me <~~~~~~~~~~~~~~~~~~~~~~~~~~~> On Wed, Mar 29, 2017 at 12:56 PM, Olivier Grisel wrote: > Integer coding will indeed make the DT assume an arbitrary ordering > while one-hot encoding does not force the tree model to make that > assumption.
> > However in practice when the depth of the trees is not too limited (or > if you use a large enough ensemble of trees), the model will have > enough flexibility to introduce as many splits as necessary to isolate > individual categories in the integer and therefore the arbitrary > ordering assumption is not a problem. > > On the other hand using one-hot encoding can introduce a detrimental > inductive bias on random forests: random forest uses uniform random > feature sampling when deciding which feature to split on (e.g. pick > the best split out of 25% of the features selected at random). > > Let's consider the following example: assume you have an > heterogeneously typed dataset with 99 numeric features and 1 > categorical feature with categorical cardinality 1000 (1000 possible > values for that features): > > - the RF will have one chance in 100 to pick each feature (categorical > or numerical) as a candidate for the next split when using integer > coding, > - the RF will have 0.1% chance of picking each numerical feature and > 99% chance to select a candidate feature split on a category of the > unique categorical feature when using one-hot encoding. > > The inductive bias of one-encoding on RFs can therefore completely > break the feature balancing. The feature encoding will also impact the > inductive bias with respect the importance of the depth of the trees, > even when feature splits are selected fully deterministically. > > Finally one-hot encoding features with large categorical cardinalities > will be much slower then when using naive integer coding. > > TL;DNR: naive theoretical analysis based only on the ordering > assumption can be misleading. Inductive biases of each feature > encoding are more complex to evaluate. Use cross-validation to decide > which is the best on your problem. Don't ignore computational > considerations (CPU and memory usage). > > -- > Olivier > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vaggi.federico at gmail.com Wed Mar 29 06:50:51 2017 From: vaggi.federico at gmail.com (federico vaggi) Date: Wed, 29 Mar 2017 10:50:51 +0000 Subject: [scikit-learn] decision trees In-Reply-To: References: Message-ID: That's a really good point. Do you know of any systematic studies about the two different encodings? Finally: wasn't there a PR for RF to accept categorical variables as inputs? On Wed, 29 Mar 2017 at 11:57, Olivier Grisel wrote: > Integer coding will indeed make the DT assume an arbitrary ordering > while one-hot encoding does not force the tree model to make that > assumption. > > However in practice when the depth of the trees is not too limited (or > if you use a large enough ensemble of trees), the model will have > enough flexibility to introduce as many splits as necessary to isolate > individual categories in the integer and therefore the arbitrary > ordering assumption is not a problem. > > On the other hand using one-hot encoding can introduce a detrimental > inductive bias on random forests: random forest uses uniform random > feature sampling when deciding which feature to split on (e.g. pick > the best split out of 25% of the features selected at random). 
> > Let's consider the following example: assume you have an > heterogeneously typed dataset with 99 numeric features and 1 > categorical feature with categorical cardinality 1000 (1000 possible > values for that features): > > - the RF will have one chance in 100 to pick each feature (categorical > or numerical) as a candidate for the next split when using integer > coding, > - the RF will have 0.1% chance of picking each numerical feature and > 99% chance to select a candidate feature split on a category of the > unique categorical feature when using one-hot encoding. > > The inductive bias of one-encoding on RFs can therefore completely > break the feature balancing. The feature encoding will also impact the > inductive bias with respect the importance of the depth of the trees, > even when feature splits are selected fully deterministically. > > Finally one-hot encoding features with large categorical cardinalities > will be much slower then when using naive integer coding. > > TL;DNR: naive theoretical analysis based only on the ordering > assumption can be misleading. Inductive biases of each feature > encoding are more complex to evaluate. Use cross-validation to decide > which is the best on your problem. Don't ignore computational > considerations (CPU and memory usage). > > -- > Olivier > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From drraph at gmail.com Wed Mar 29 06:57:59 2017 From: drraph at gmail.com (Raphael C) Date: Wed, 29 Mar 2017 11:57:59 +0100 Subject: [scikit-learn] decision trees In-Reply-To: References: Message-ID: There is https://github.com/scikit-learn/scikit-learn/pull/4899 . It looks like it is waiting for review? Raphael On 29 March 2017 at 11:50, federico vaggi wrote: > That's a really good point. Do you know of any systematic studies about the > two different encodings? > > Finally: wasn't there a PR for RF to accept categorical variables as inputs? > > On Wed, 29 Mar 2017 at 11:57, Olivier Grisel > wrote: >> >> Integer coding will indeed make the DT assume an arbitrary ordering >> while one-hot encoding does not force the tree model to make that >> assumption. >> >> However in practice when the depth of the trees is not too limited (or >> if you use a large enough ensemble of trees), the model will have >> enough flexibility to introduce as many splits as necessary to isolate >> individual categories in the integer and therefore the arbitrary >> ordering assumption is not a problem. >> >> On the other hand using one-hot encoding can introduce a detrimental >> inductive bias on random forests: random forest uses uniform random >> feature sampling when deciding which feature to split on (e.g. pick >> the best split out of 25% of the features selected at random). >> >> Let's consider the following example: assume you have an >> heterogeneously typed dataset with 99 numeric features and 1 >> categorical feature with categorical cardinality 1000 (1000 possible >> values for that features): >> >> - the RF will have one chance in 100 to pick each feature (categorical >> or numerical) as a candidate for the next split when using integer >> coding, >> - the RF will have 0.1% chance of picking each numerical feature and >> 99% chance to select a candidate feature split on a category of the >> unique categorical feature when using one-hot encoding. 
>> >> The inductive bias of one-encoding on RFs can therefore completely >> break the feature balancing. The feature encoding will also impact the >> inductive bias with respect the importance of the depth of the trees, >> even when feature splits are selected fully deterministically. >> >> Finally one-hot encoding features with large categorical cardinalities >> will be much slower then when using naive integer coding. >> >> TL;DNR: naive theoretical analysis based only on the ordering >> assumption can be misleading. Inductive biases of each feature >> encoding are more complex to evaluate. Use cross-validation to decide >> which is the best on your problem. Don't ignore computational >> considerations (CPU and memory usage). >> >> -- >> Olivier >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > From t3kcit at gmail.com Wed Mar 29 10:30:21 2017 From: t3kcit at gmail.com (Andreas Mueller) Date: Wed, 29 Mar 2017 10:30:21 -0400 Subject: [scikit-learn] decision trees In-Reply-To: References: Message-ID: <3e0d8739-c674-b085-deff-d9a20d5bc696@gmail.com> I'd argue that's why we should implement conditional inference trees ;) On 03/29/2017 05:56 AM, Olivier Grisel wrote: > Integer coding will indeed make the DT assume an arbitrary ordering > while one-hot encoding does not force the tree model to make that > assumption. > > However in practice when the depth of the trees is not too limited (or > if you use a large enough ensemble of trees), the model will have > enough flexibility to introduce as many splits as necessary to isolate > individual categories in the integer and therefore the arbitrary > ordering assumption is not a problem. > > On the other hand using one-hot encoding can introduce a detrimental > inductive bias on random forests: random forest uses uniform random > feature sampling when deciding which feature to split on (e.g. pick > the best split out of 25% of the features selected at random). > > Let's consider the following example: assume you have an > heterogeneously typed dataset with 99 numeric features and 1 > categorical feature with categorical cardinality 1000 (1000 possible > values for that features): > > - the RF will have one chance in 100 to pick each feature (categorical > or numerical) as a candidate for the next split when using integer > coding, > - the RF will have 0.1% chance of picking each numerical feature and > 99% chance to select a candidate feature split on a category of the > unique categorical feature when using one-hot encoding. > > The inductive bias of one-encoding on RFs can therefore completely > break the feature balancing. The feature encoding will also impact the > inductive bias with respect the importance of the depth of the trees, > even when feature splits are selected fully deterministically. > > Finally one-hot encoding features with large categorical cardinalities > will be much slower then when using naive integer coding. > > TL;DNR: naive theoretical analysis based only on the ordering > assumption can be misleading. Inductive biases of each feature > encoding are more complex to evaluate. Use cross-validation to decide > which is the best on your problem. Don't ignore computational > considerations (CPU and memory usage). 
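A concrete way to follow the "try both encodings and cross-validate" advice given in this thread; the synthetic data, the toy target and the 100-tree forest below are illustrative assumptions, not a benchmark:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.preprocessing import OneHotEncoder

    rng = np.random.RandomState(0)
    n = 500
    X_num = rng.randn(n, 3)                    # three numeric features
    X_cat = rng.randint(0, 10, size=(n, 1))    # one categorical feature, integer coded
    y = (X_cat[:, 0] > 4).astype(int)          # toy target driven by the category

    clf = RandomForestClassifier(n_estimators=100, random_state=0)

    # Variant 1: naive integer coding of the categorical column
    X_int = np.hstack([X_num, X_cat])
    print("integer coding:", cross_val_score(clf, X_int, y, cv=5).mean())

    # Variant 2: one-hot encoding of the categorical column
    X_ohe = np.hstack([X_num, OneHotEncoder(sparse=False).fit_transform(X_cat)])
    print("one-hot coding:", cross_val_score(clf, X_ohe, y, cv=5).mean())

On a real problem, the cross-validated scores and the fit/predict times of the two variants are what would guide the choice.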
> From julio at esbet.es Wed Mar 29 13:04:40 2017 From: julio at esbet.es (Julio Antonio Soto de Vicente) Date: Wed, 29 Mar 2017 18:04:40 +0100 Subject: [scikit-learn] decision trees In-Reply-To: <3e0d8739-c674-b085-deff-d9a20d5bc696@gmail.com> References: <3e0d8739-c674-b085-deff-d9a20d5bc696@gmail.com> Message-ID: IMO CART can handle categorical features just as good as CITrees, as long as we slightly change sklearn's implementation... -- Julio > El 29 mar 2017, a las 15:30, Andreas Mueller escribi?: > > I'd argue that's why we should implement conditional inference trees ;) > >> On 03/29/2017 05:56 AM, Olivier Grisel wrote: >> Integer coding will indeed make the DT assume an arbitrary ordering >> while one-hot encoding does not force the tree model to make that >> assumption. >> >> However in practice when the depth of the trees is not too limited (or >> if you use a large enough ensemble of trees), the model will have >> enough flexibility to introduce as many splits as necessary to isolate >> individual categories in the integer and therefore the arbitrary >> ordering assumption is not a problem. >> >> On the other hand using one-hot encoding can introduce a detrimental >> inductive bias on random forests: random forest uses uniform random >> feature sampling when deciding which feature to split on (e.g. pick >> the best split out of 25% of the features selected at random). >> >> Let's consider the following example: assume you have an >> heterogeneously typed dataset with 99 numeric features and 1 >> categorical feature with categorical cardinality 1000 (1000 possible >> values for that features): >> >> - the RF will have one chance in 100 to pick each feature (categorical >> or numerical) as a candidate for the next split when using integer >> coding, >> - the RF will have 0.1% chance of picking each numerical feature and >> 99% chance to select a candidate feature split on a category of the >> unique categorical feature when using one-hot encoding. >> >> The inductive bias of one-encoding on RFs can therefore completely >> break the feature balancing. The feature encoding will also impact the >> inductive bias with respect the importance of the depth of the trees, >> even when feature splits are selected fully deterministically. >> >> Finally one-hot encoding features with large categorical cardinalities >> will be much slower then when using naive integer coding. >> >> TL;DNR: naive theoretical analysis based only on the ordering >> assumption can be misleading. Inductive biases of each feature >> encoding are more complex to evaluate. Use cross-validation to decide >> which is the best on your problem. Don't ignore computational >> considerations (CPU and memory usage). >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From jni.soma at gmail.com Wed Mar 29 14:22:10 2017 From: jni.soma at gmail.com (Juan Nunez-Iglesias) Date: Wed, 29 Mar 2017 18:22:10 +0000 Subject: [scikit-learn] Announcement: scikit-image 0.13.0 Message-ID: We're happy to (finally) announce the release of scikit-image v0.13.0! Special thanks to our many contributors for making it possible! This release is the result of over a year of work, with over 200 pull requests by 82 contributors. Linux and macOS wheels are available now on PyPI , together with a source distribution. Use "pip install -U scikit-image" to get the latest version! 
Packages on conda-forge, Windows wheels, and Debian packages should be available within the next few days. scikit-image is an image processing toolbox for SciPy that includes algorithms for segmentation, geometric transformations, color space manipulation, analysis, filtering, morphology, feature detection, and more. For more information, examples, and documentation, please visit our website: http://scikit-image.org and our gallery of examples http://scikit-image.org/docs/dev/auto_examples/ Highlights ---------- - Improved n-dimensional image support. This release adds nD support to: * ``regionprops`` computation for centroids (#2083) * ``segmentation.clear_border`` (#2087) * Hessian matrix (#2194) - In addition, the following new functions support nD images: * new wavelet denoising function, ``restoration.denoise_wavelet`` (#1833, #2190, #2238, #2240, #2241, #2242, #2462) * new thresholding functions, ``filters.threshold_sauvola`` and ``filters.threshold_niblack`` (#2266, #2441) * new local maximum, local minimum, hmaxima, hminima functions (#2449) - Grey level co-occurrence matrix (GLCM) now works with uint16 images - ``filters.try_all_threshold`` to rapidly see output of various thresholding methods - Frangi and Hessian filters (2D only) (#2153) - New *compact watershed* algorithm in ``segmentation.watershed`` (#2211) - New *shape index* algorithm in ``feature.shape_index`` (#2312) New functions and features -------------------------- - Add threshold minimum algorithm (#2104) - Implement mean and triangle thresholding (#2126) - Add Frangi and Hessian filters (#2153) - add bbox_area to region properties (#2187) - colorconv: Add rgba2rgb() (#2181) - Lewiner marching cubes algorithm (#2052) - image inversion (#2199) - wavelet denoising (from #1833) (#2190) - routine to estimate the noise standard deviation from an image (#1837) - Add compact watershed and clean up existing watershed (#2211) - Added the missing 'grey2rgb' function. (#2316) - Shape index (#2312) - Fundamental and essential matrix 8-point algorithm (#1357) - Add YUV, YIQ, YPbPr, YCbCr colorspaces - Detection of local extrema from morphology (#2449) - shannon entropy (#2416) Documentation improvements -------------------------- - add details about github SSH keys in contributing page (#2073) - Add example for felzenszwalb image segmentation (#2096) - Sphinx gallery for example gallery (#2078) - Improved region boundary RAG docs (#2106) - Add gallery Lucy-Richardson deconvolution algorithm (#2376) - Gallery: Use Horse to illustrate Convex Hull (#2431) - Add working with OpenCV in user guide (#2519) Code improvements ----------------- - Remove lena image from test suite (#1985) - Remove duplicate mean calculation in skimage.feature.match_template (#1980) - Add nD support to clear_border (#2087) - Add uint16 images support for co-occurrence matrix (#2095) - Add default parameters for Gaussian and median filters (#2151) - try_all to choose the best threshold algorithm (#2110) - Add support for multichannel in Felzenszwalb segmentation (#2134) - Improved SimilarityTransform, new EuclideanTransform class (#2044) - ENH: Speed up Hessian matrix computation (#2194) - add n-dimensional support to denoise_wavelet (#2242) - Speedup ``inpaint_biharmonic`` (#2234) - Update hessian matrix code to include order kwarg (#2327) - Handle cases for label2rgb where input labels are negative and/or nonconsecutive (#2370) - Added watershed_line parameter (#2393) API Changes ----------- - Remove deprecated ``filter`` module. Use ``filters`` instead. 
(#2023)
- Remove ``skimage.filters.canny`` links. Use ``feature.canny`` instead. (#2024)
- Removed Python 2.6 support and related checks (#2033)
- Remove deprecated {h/v}sobel, {h/v}prewitt, {h/v}scharr, roberts_{positive/negative} filters (#2159)
- Remove deprecated ``_mode_deprecations`` (#2156)
- Remove deprecated None defaults in ``rescale_intensity`` (#2161)
- Parameters ``ntiles_x`` and ``ntiles_y`` have been removed from ``exposure.equalize_adapthist``
- The minimum NumPy version is now 1.11, and the minimum SciPy version is now 0.17

Deprecations
------------
- clip_negative will be set to false by default in version 0.15 (func: dtype_limits) (#2228)
- Deprecate "dynamic_range" in favor of "data_range" (#2384)
- The default value of the ``circle`` argument to ``radon`` and ``iradon`` transforms will be ``True`` in 0.15 (#2235)
- The default value of ``multichannel`` for ``denoise_bilateral`` and ``denoise_nl_means`` will be ``False`` in 0.15
- The default value of ``block_norm`` in ``feature.hog`` will be L2-Hysteresis in 0.15.
- The ``threshold_adaptive`` function is deprecated. Use ``threshold_local`` instead.
- The default value of ``mode`` in ``transform.swirl``, ``resize``, and ``rescale`` will be "reflect" in 0.15.

For a complete list of contributors and pull requests merged in this release, please see our release notes online:
https://github.com/scikit-image/scikit-image/blob/master/doc/release/release_0.13.rst

Please spread the word, including on Twitter!

Enjoy!

Juan.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
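As a quick illustration of a few of the new functions above, a minimal sketch (assuming scikit-image 0.13 and the bundled ``camera`` test image; the ``window_size``, ``k``, and ``sigma`` values are arbitrary placeholders, not recommendations) could look like:

import matplotlib.pyplot as plt
from skimage import data, img_as_float
from skimage.filters import try_all_threshold, threshold_niblack
from skimage.restoration import denoise_wavelet

image = img_as_float(data.camera())

# try_all_threshold shows the output of every global thresholding method at once
fig, ax = try_all_threshold(image, figsize=(8, 6), verbose=False)

# new local (Niblack) thresholding and n-dimensional wavelet denoising
binary = image > threshold_niblack(image, window_size=25, k=0.8)
denoised = denoise_wavelet(image, sigma=0.1)

plt.show()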
From jmschreiber91 at gmail.com Thu Mar 30 00:45:41 2017
From: jmschreiber91 at gmail.com (Jacob Schreiber)
Date: Wed, 29 Mar 2017 21:45:41 -0700
Subject: [scikit-learn] GSoC proposal - linear model
In-Reply-To: References: Message-ID:
Hi Konstantinos

I likely won't be a mentor for the linear models project, but I looked over your proposal and have a few suggestions. In general it was a good write-up!

1. You should include some equations in the write-up, basically the softmax loss (which I think is a more common term than multinomial logistic loss) and the AdaGrad update.
2. You may want to indicate which files in the codebase you'll be modifying, or whether you'll be adding a new file. That will show us you're familiar with our existing code.
3. You should give more time for the Cython implementation of these methods. It's not that easy to do, especially if you don't have background experience. You can easily lose a day or two to a dumb memory error that has nothing to do with whether you understand the equations.
4. You might also want to implement Adam if time permits. It's another popular optimizer. I'm not sure how popular it is in linear models, but I've seen it used effectively, and once you have AdaGrad it should be easier to implement a second optimizer.

Good luck!
Jacob

On Mon, Mar 27, 2017 at 10:43 AM, Konstantinos Katrioplas < konst.katrioplas at gmail.com> wrote:

> Dear all,
>
> here is a draft of my proposal
> on
> improving online learning for linear models with softmax and AdaGrad.
> I look forward to your feedback,
> Konstantinos
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From shuchi.23 at gmail.com Thu Mar 30 04:51:54 2017
From: shuchi.23 at gmail.com (Shuchi Mala)
Date: Thu, 30 Mar 2017 14:21:54 +0530
Subject: [scikit-learn] urgent help in scikit-learn
Message-ID:
Hi everyone,

I have data with the following attributes: (Latitude, Longitude). Now I am performing clustering using DBSCAN on my data, and I have the following doubts:

1. How can I add data to the data set of the package?
2. How can I calculate the Rand index for my data?
3. How do I use the make_blobs command for my data?

A sample of my data is:
Latitude Longitude
37.76901 -122.429299
37.76904 -122.42913
37.76878 -122.429092
37.7763 -122.424249
37.77627 -122.424657

With Best Regards,
Shuchi Mala
Research Scholar
Department of Civil Engineering
MNIT Jaipur
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
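The replies later in this thread cover the individual pieces; pulled together, a minimal end-to-end sketch (the file name, eps, min_samples, and the ground-truth column are placeholders, not something taken from this thread) would be:

import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN
from sklearn.metrics import adjusted_rand_score

df = pd.read_csv("your_data.txt", delimiter=r"\s+")       # columns: Latitude, Longitude
coords = np.radians(df[["Latitude", "Longitude"]].values)  # haversine expects radians

# eps is in radians: e.g. 100 m / Earth radius (~6371 km) ~= 1.6e-5
db = DBSCAN(eps=100 / 6371000.0, min_samples=5, metric="haversine")
labels_pred = db.fit_predict(coords)

# The adjusted Rand index only makes sense if ground-truth labels exist
# for the same points (here a hypothetical "true_label" column).
if "true_label" in df.columns:
    print(adjusted_rand_score(df["true_label"].values, labels_pred))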
From yizhengz at andrew.cmu.edu Thu Mar 30 08:24:45 2017
From: yizhengz at andrew.cmu.edu (Yizheng Zhao)
Date: Thu, 30 Mar 2017 05:24:45 -0700
Subject: [scikit-learn] GSoC 2017 Proposal: Improve online learning for linear models
Message-ID: <5D8BCADF-6292-4B76-9F4C-619A7DD5F548@andrew.cmu.edu>
Hi developers,

I am excited to have the opportunity to work with you! I am Yizheng Zhao, a graduate student at Carnegie Mellon University majoring in Software Engineering; I received my Bachelor's degree in Math from Jilin University in 2016. I love Python and machine learning, and that is why I want to make my own contribution to the community. I have 2 years of experience developing with Python and I am quite familiar with scikit-learn as a user. In college, I learned several machine learning algorithms and their mathematical derivations. I believe my strong math background and coding skills will let me do this well.

Here is my proposal: https://github.com/YizhengZHAO/scikit-learn/wiki/GSoC-2017-:-Improve-online-learning-for-linear-models

BTW, could you please give me more explanation about "A tool to set the learning rate on a few epochs"?

I am happy to get suggestions from the community.

Sincerely,
Yizheng
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From se.raschka at gmail.com Thu Mar 30 10:04:19 2017
From: se.raschka at gmail.com (Sebastian Raschka)
Date: Thu, 30 Mar 2017 10:04:19 -0400
Subject: [scikit-learn] urgent help in scikit-learn
In-Reply-To: References: Message-ID:
Hi, Shuchi,

> 1. How can I add data to the data set of the package?

You don't need to add your dataset to the dataset module to run your analysis. A convenient way to load it into a numpy array would be via pandas. E.g.,

import pandas as pd
df = pd.read_csv('your_data.txt', delimiter=r"\s+")
X = df.values

> 2. How I can calculate Rand index for my data?

After you ran the clustering, you can use the 'adjusted_rand_score' function, e.g., see
http://scikit-learn.org/stable/modules/clustering.html#adjusted-rand-score

> 3. How to use make_blobs command for my data?

The make_blobs command is just a utility function to create toy datasets; you wouldn't need it in your case since you already have 'real' data.

Best,
Sebastian

> On Mar 30, 2017, at 4:51 AM, Shuchi Mala wrote:
>
> Hi everyone,
>
> I have the data with following attributes: (Latitude, Longitude). Now I am performing clustering using DBSCAN for my data. I have following doubts:
>
> 1. How can I add data to the data set of the package?
> 2. How I can calculate Rand index for my data?
> 3. How to use make_blobs command for my data?
>
> Sample of my data is :
> Latitude Longitude
> 37.76901 -122.429299
> 37.76904 -122.42913
> 37.76878 -122.429092
> 37.7763 -122.424249
> 37.77627 -122.424657
>
>
> With Best Regards,
> Shuchi Mala
> Research Scholar
> Department of Civil Engineering
> MNIT Jaipur
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
From shane.grigsby at colorado.edu Thu Mar 30 11:08:17 2017
From: shane.grigsby at colorado.edu (Shane Grigsby)
Date: Thu, 30 Mar 2017 09:08:17 -0600
Subject: [scikit-learn] urgent help in scikit-learn
In-Reply-To: References: Message-ID: <20170330150817.iu32sdchhadruk26@cu-vpn-colorado-edu-198.11.30.203.int.colorado.edu>
Since you're using lat / long coords, you'll also want to convert them to radians and specify 'haversine' as your distance metric; i.e.
: coords = np.vstack([lats.ravel(),longs.ravel()]).T coords *= np.pi / 180. # to radians ...and: db = DBSCAN(eps=0.3, min_samples=10, metric='haversine') # replace eps and min_samples as appropriate db.fit(coords) Cheers, Shane On 03/30, Sebastian Raschka wrote: >Hi, Shuchi, > >> 1. How can I add data to the data set of the package? > >You don?t need to add your dataset to the dataset module to run your analysis. A convenient way to load it into a numpy array would be via pandas. E.g., > >import pandas as pd >df = pd.read_csv(?your_data.txt', delimiter=r"\s+?) >X = df.values > >> 2. How I can calculate Rand index for my data? > >After you ran the clustering, you can use the ?adjusted_rand_score? function, e.g., see >http://scikit-learn.org/stable/modules/clustering.html#adjusted-rand-score > >> 3. How to use make_blobs command for my data? > >The make_blobs command is just a utility function to create toydatasets, you wouldn?t need it in your case since you already have ?real? data. > >Best, >Sebastian > > >> On Mar 30, 2017, at 4:51 AM, Shuchi Mala wrote: >> >> Hi everyone, >> >> I have the data with following attributes: (Latitude, Longitude). Now I am performing clustering using DBSCAN for my data. I have following doubts: >> >> 1. How can I add data to the data set of the package? >> 2. How I can calculate Rand index for my data? >> 3. How to use make_blobs command for my data? >> >> Sample of my data is : >> Latitude Longitude >> 37.76901 -122.429299 >> 37.76904 -122.42913 >> 37.76878 -122.429092 >> 37.7763 -122.424249 >> 37.77627 -122.424657 >> >> >> With Best Regards, >> Shuchi Mala >> Research Scholar >> Department of Civil Engineering >> MNIT Jaipur >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > >_______________________________________________ >scikit-learn mailing list >scikit-learn at python.org >https://mail.python.org/mailman/listinfo/scikit-learn -- *PhD candidate & Research Assistant* *Cooperative Institute for Research in Environmental Sciences (CIRES)* *University of Colorado at Boulder* From shuchi.23 at gmail.com Fri Mar 31 00:02:32 2017 From: shuchi.23 at gmail.com (Shuchi Mala) Date: Fri, 31 Mar 2017 09:32:32 +0530 Subject: [scikit-learn] urgent help in scikit-learn In-Reply-To: <20170330150817.iu32sdchhadruk26@cu-vpn-colorado-edu-198.11.30.203.int.colorado.edu> References: <20170330150817.iu32sdchhadruk26@cu-vpn-colorado-edu-198.11.30.203.int.colorado.edu> Message-ID: Thank you so much for your quick reply. I have one more doubt. The below statement is used to calculate rand score. metrics.adjusted_rand_score(labels_true, labels_pred) In my case what will be labels_true and labels_pred and how I will calculate labels_pred? With Best Regards, Shuchi Mala Research Scholar Department of Civil Engineering MNIT Jaipur On Thu, Mar 30, 2017 at 8:38 PM, Shane Grigsby wrote: > Since you're using lat / long coords, you'll also want to convert them to > radians and specify 'haversine' as your distance metric; i.e. : > > coords = np.vstack([lats.ravel(),longs.ravel()]).T > coords *= np.pi / 180. # to radians > > ...and: > > db = DBSCAN(eps=0.3, min_samples=10, metric='haversine') > # replace eps and min_samples as appropriate > db.fit(coords) > > Cheers, > Shane > > > On 03/30, Sebastian Raschka wrote: > >> Hi, Shuchi, >> >> 1. How can I add data to the data set of the package? 
>>> >> >> You don?t need to add your dataset to the dataset module to run your >> analysis. A convenient way to load it into a numpy array would be via >> pandas. E.g., >> >> import pandas as pd >> df = pd.read_csv(?your_data.txt', delimiter=r"\s+?) >> X = df.values >> >> 2. How I can calculate Rand index for my data? >>> >> >> After you ran the clustering, you can use the ?adjusted_rand_score? >> function, e.g., see >> http://scikit-learn.org/stable/modules/clustering.html# >> adjusted-rand-score >> >> 3. How to use make_blobs command for my data? >>> >> >> The make_blobs command is just a utility function to create toydatasets, >> you wouldn?t need it in your case since you already have ?real? data. >> >> Best, >> Sebastian >> >> >> On Mar 30, 2017, at 4:51 AM, Shuchi Mala wrote: >>> >>> Hi everyone, >>> >>> I have the data with following attributes: (Latitude, Longitude). Now I >>> am performing clustering using DBSCAN for my data. I have following doubts: >>> >>> 1. How can I add data to the data set of the package? >>> 2. How I can calculate Rand index for my data? >>> 3. How to use make_blobs command for my data? >>> >>> Sample of my data is : >>> Latitude Longitude >>> 37.76901 -122.429299 >>> 37.76904 -122.42913 >>> 37.76878 -122.429092 >>> 37.7763 -122.424249 >>> 37.77627 -122.424657 >>> >>> >>> With Best Regards, >>> Shuchi Mala >>> Research Scholar >>> Department of Civil Engineering >>> MNIT Jaipur >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > > -- > *PhD candidate & Research Assistant* > *Cooperative Institute for Research in Environmental Sciences (CIRES)* > *University of Colorado at Boulder* > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From konst.katrioplas at gmail.com Fri Mar 31 03:44:06 2017 From: konst.katrioplas at gmail.com (Konstantinos Katrioplas) Date: Fri, 31 Mar 2017 10:44:06 +0300 Subject: [scikit-learn] GSoC proposal - linear model In-Reply-To: References: Message-ID: <006f0564-4fa9-47d1-5a82-40c138740808@gmail.com> Hello Jacob, Thanks a lot for your suggestions! I updated my proposal . I will add some minor details later today. Regarding the codebase, I am thinking about editing linear_model/sgd_fast.pyx for softmax and adding a new linear_model/sgd_opt.pyx perhaps for AdaGrad and Adam, I don't know if you agree on that. I admit I am a total beginner in Cython but I have time until June to practice. If there is real interest in the project and time to mentor my proposal please let me know. Ideally I would prefer not to leave it till the last day. Kind regards, Konstantinos On 30/03/2017 07:45 ??, Jacob Schreiber wrote: > Hi Konstantinos > > I likely won't be a mentor for the linear models project, but I looked > over your proposal and have a few suggestions. In general it was a > good write up! > > 1. You should include some equations in the write up, basically the > softmax loss (which I think is a more common term than multinomial > logistic loss) and the AdaGrad update. > 2. 
You may want to indicate which files in the codebase you'll be
> modifying, or if you'll be adding a new file. That will show us you're
> familiar with our existing code.
> 3. You should give more time for the cython implementation of these
> methods. It's not that easy to do, especially if you don't have
> background experience. You can easily lose a day or two from a dumb
> memory error that has nothing to do with if you understand the equations.
> 4. You might want to also implement ADAM if time permits. It's another
> optimizer that is popular. I'm not sure how popular it is in linear
> models but I've seen it used effectively, and once you get AdaGrad it
> should be easier to implement a second optimizer.
>
> Good luck!
> Jacob
>
> On Mon, Mar 27, 2017 at 10:43 AM, Konstantinos Katrioplas
> > wrote:
>
> Dear all,
>
> here is a draft of my proposal
> on
> improving online learning for linear models with softmax and AdaGrad.
>
> I look forward to your feedback,
> Konstantinos
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
>
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From josecarlos.gomez at upf.edu Fri Mar 31 04:29:14 2017
From: josecarlos.gomez at upf.edu (GOMEZ TAMAYO, JOSE CARLOS)
Date: Fri, 31 Mar 2017 10:29:14 +0200
Subject: [scikit-learn] Data type returned by PLSR different from other estimators
Message-ID:
Hi there,

I have recently run into a problem when dealing with PLSR (and other cross-decomposition methods) prediction output. Unlike other estimators, which return a flat numpy array containing the predictions, PLSR returns a list of single-value lists containing the predictions. I do not know why it is done this way (perhaps there is a reason unknown to me), but the fact that the estimator returns a different data type should be noted in the documentation. It is easily solved by modifying the source code or just by overriding the method.

Cheers,
Jose Carlos Gómez
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From se.raschka at gmail.com Fri Mar 31 10:47:55 2017
From: se.raschka at gmail.com (Sebastian Raschka)
Date: Fri, 31 Mar 2017 10:47:55 -0400
Subject: [scikit-learn] urgent help in scikit-learn
In-Reply-To: References: <20170330150817.iu32sdchhadruk26@cu-vpn-colorado-edu-198.11.30.203.int.colorado.edu>
Message-ID: <293EEA4E-2D51-4151-9A1F-D57CF628A71C@gmail.com>
Hi, Shuchi,

regarding labels_true: you'd only be able to compute the Rand index adjusted for chance if you have the ground truth labels of the training examples in your dataset. The second parameter, labels_pred, takes in the predicted cluster labels (indices) that you got from the clustering. E.g.,

dbscn = DBSCAN()
labels_pred = dbscn.fit_predict(X)

Best,
Sebastian

> On Mar 31, 2017, at 12:02 AM, Shuchi Mala wrote:
>
> Thank you so much for your quick reply. I have one more doubt. The below statement is used to calculate rand score.
>
> metrics.adjusted_rand_score(labels_true, labels_pred)
> In my case what will be labels_true and labels_pred and how I will calculate labels_pred?
> > With Best Regards, > Shuchi Mala > Research Scholar > Department of Civil Engineering > MNIT Jaipur > > > On Thu, Mar 30, 2017 at 8:38 PM, Shane Grigsby wrote: > Since you're using lat / long coords, you'll also want to convert them to radians and specify 'haversine' as your distance metric; i.e. : > > coords = np.vstack([lats.ravel(),longs.ravel()]).T > coords *= np.pi / 180. # to radians > > ...and: > > db = DBSCAN(eps=0.3, min_samples=10, metric='haversine') > # replace eps and min_samples as appropriate > db.fit(coords) > > Cheers, > Shane > > > On 03/30, Sebastian Raschka wrote: > Hi, Shuchi, > > 1. How can I add data to the data set of the package? > > You don?t need to add your dataset to the dataset module to run your analysis. A convenient way to load it into a numpy array would be via pandas. E.g., > > import pandas as pd > df = pd.read_csv(?your_data.txt', delimiter=r"\s+?) > X = df.values > > 2. How I can calculate Rand index for my data? > > After you ran the clustering, you can use the ?adjusted_rand_score? function, e.g., see > http://scikit-learn.org/stable/modules/clustering.html#adjusted-rand-score > > 3. How to use make_blobs command for my data? > > The make_blobs command is just a utility function to create toydatasets, you wouldn?t need it in your case since you already have ?real? data. > > Best, > Sebastian > > > On Mar 30, 2017, at 4:51 AM, Shuchi Mala wrote: > > Hi everyone, > > I have the data with following attributes: (Latitude, Longitude). Now I am performing clustering using DBSCAN for my data. I have following doubts: > > 1. How can I add data to the data set of the package? > 2. How I can calculate Rand index for my data? > 3. How to use make_blobs command for my data? > > Sample of my data is : > Latitude Longitude > 37.76901 -122.429299 > 37.76904 -122.42913 > 37.76878 -122.429092 > 37.7763 -122.424249 > 37.77627 -122.424657 > > > With Best Regards, > Shuchi Mala > Research Scholar > Department of Civil Engineering > MNIT Jaipur > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -- > *PhD candidate & Research Assistant* > *Cooperative Institute for Research in Environmental Sciences (CIRES)* > *University of Colorado at Boulder* > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From jmschreiber91 at gmail.com Fri Mar 31 19:19:26 2017 From: jmschreiber91 at gmail.com (Jacob Schreiber) Date: Fri, 31 Mar 2017 23:19:26 +0000 Subject: [scikit-learn] GSoC proposal - linear model In-Reply-To: <006f0564-4fa9-47d1-5a82-40c138740808@gmail.com> References: <006f0564-4fa9-47d1-5a82-40c138740808@gmail.com> Message-ID: Hi Konstantinos Thanks for the changes. You should go ahead and submit if you're happy with the proposal, it's unlikely that the decision will come down to details. Jacob On Fri, Mar 31, 2017 at 12:44 AM Konstantinos Katrioplas < konst.katrioplas at gmail.com> wrote: > Hello Jacob, > > Thanks a lot for your suggestions! I updated my proposal > . I > will add some minor details later today. 
> > Regarding the codebase, I am thinking about editing > linear_model/sgd_fast.pyx for softmax and adding a new > linear_model/sgd_opt.pyx perhaps for AdaGrad and Adam, I don't know if you > agree on that. > > I admit I am a total beginner in Cython but I have time until June to > practice. > If there is real interest in the project and time to mentor my proposal > please let me know. Ideally I would prefer not to leave it till the last > day. > > Kind regards, > Konstantinos > > > > > On 30/03/2017 07:45 ??, Jacob Schreiber wrote: > > Hi Konstantinos > > I likely won't be a mentor for the linear models project, but I looked > over your proposal and have a few suggestions. In general it was a good > write up! > > 1. You should include some equations in the write up, basically the > softmax loss (which I think is a more common term than multinomial logistic > loss) and the AdaGrad update. > 2. You may want to indicate which files in the codebase you'll be > modifying, or if you'll be adding a new file. That will show us you're > familiar with our existing code. > 3. You should give more time for the cython implementation of these > methods. It's not that easy to do, especially if you don't have background > experience. You can easily lose a day or two from a dumb memory error that > has nothing to do with if you understand the equations. > 4. You might want to also implement ADAM if time permits. It's another > optimizer that is popular. I'm not sure how popular it is in linear models > but I've seen it used effectively, and once you get AdaGrad it should be > easier to implement a second optimizer. > > Good luck! > Jacob > > On Mon, Mar 27, 2017 at 10:43 AM, Konstantinos Katrioplas < > konst.katrioplas at gmail.com> wrote: > > Dear all, > > here is a draft of my proposal > on > improving online learning for linear models with softmax and AdaGrad. > I look forward to your feedback, > Konstantinos > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > _______________________________________________ > scikit-learn mailing listscikit-learn at python.orghttps://mail.python.org/mailman/listinfo/scikit-learn > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL:
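For reference, the two pieces Jacob asks for in point 1 would, in standard notation (a sketch only; the regularization term and the symbols used are illustrative and not taken from either proposal), look like this.

Softmax (multinomial logistic) model and loss over K classes with weights W = [w_1, ..., w_K]:

    p(y = k | x) = \frac{\exp(w_k^\top x + b_k)}{\sum_{j=1}^{K} \exp(w_j^\top x + b_j)}

    L(W) = -\frac{1}{n} \sum_{i=1}^{n} \log p(y = y_i | x_i) + \frac{\alpha}{2} \lVert W \rVert^2

AdaGrad update, with a per-coordinate accumulator of squared gradients:

    G_t = G_{t-1} + g_t \odot g_t

    w_{t+1} = w_t - \frac{\eta}{\sqrt{G_t} + \varepsilon} \odot g_t

where g_t is the gradient of the loss at step t, \eta is the base learning rate, and \varepsilon is a small constant for numerical stability.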