[scikit-learn] Confidence and Prediction Intervals of Support Vector Regression

Wed Mar 1 22:17:13 EST 2017

that's a very serious dedication to bootstrap :)

On Wed, Mar 1, 2017 at 10:13 PM, Sebastian Raschka <se.raschka at gmail.com>
wrote:

> Glad to hear that it was at least a little bit helpful :)
> (haha, Efron and Tibshirani even have a whole ~500 pg book on bootstrap if
> you have the time and patience … :)
> https://www.crcpress.com/An-Introduction-to-the-Bootstrap/Efron-Tibshirani/p/book/9780412042317)
>
> > On Mar 1, 2017, at 10:07 PM, Raga Markely <raga.markely at gmail.com> wrote:
> >
> > No worries, Sebastian :) .. thank you very much for your help.. I
> > learned a lot of new things from your site today.. it led me to some
> > relevant chapters in "The Elements of Statistical Learning", which then led
> > me to chapter 8 page 264 about non-parametric & parametric bootstrap..
> >
> > I think I will just go with the non-parametric bootstrap for my
> > problem.. similar to the bootstrap steps I mentioned earlier..
> >
> > Thank you!
> > Raga
> >
> > On Wed, Mar 1, 2017 at 9:44 PM, Sebastian Raschka <mail at sebastianraschka.com> wrote:
> > Hi, Raga,
> >
> > > 1. Just to make sure I understand correctly, using the .632+ bootstrap
> > > method, the ACC_lower and ACC_upper are the lower and higher percentile of
> > > the ACC_h,i distribution?
> >
> > phew, I am actually not sure anymore … I think it’s the percentile of
> > the ACC_boot distribution, similar to the “classic” bootstrap but where
> > ACC_boot got computed from weighted ACC_h,i and ACC_r,i
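
Under that reading (which the reply itself is unsure about), the weighting
could be sketched as below. This assumes ACC_h,i is the out-of-bag accuracy
and ACC_r,i the resubstitution accuracy in bootstrap round i, and it uses the
fixed 0.632 weight of the plain .632 bootstrap; the .632+ variant would
replace that constant with a data-dependent weight, which is not computed
here. The function name and the example numbers are made up for illustration.

```python
import numpy as np


def weighted_bootstrap_interval(acc_h, acc_r, weight=0.632,
                                lower=2.5, upper=97.5):
    """Percentile interval over per-round weighted accuracies.

    acc_h: out-of-bag accuracy per bootstrap round (assumed meaning of ACC_h,i)
    acc_r: resubstitution accuracy per round (assumed meaning of ACC_r,i)
    """
    acc_h = np.asarray(acc_h, dtype=float)
    acc_r = np.asarray(acc_r, dtype=float)
    # Weighted accuracy of each round: 0.632 * ACC_h,i + 0.368 * ACC_r,i
    acc_i = weight * acc_h + (1.0 - weight) * acc_r
    # Percentiles of the per-round values give ACC_lower and ACC_upper.
    acc_lower, acc_upper = np.percentile(acc_i, [lower, upper])
    # The mean of the per-round values plays the role of ACC_boot.
    return acc_i.mean(), acc_lower, acc_upper


# Made-up accuracies from two bootstrap rounds, for illustration only.
acc_boot, acc_lower, acc_upper = weighted_bootstrap_interval(
    [0.8, 0.9], [1.0, 1.0]
)
```
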
> >
> > > 2. For regression algorithms, is there a recommended equation for the
> > > no-information rate gamma?
> >
> >
> > Sorry, can’t be of much help here; I am not sure what the equivalent of
> > the no-information rate for regression would be ...
> >
> >
> >
> > > On Mar 1, 2017, at 5:39 PM, Raga Markely <raga.markely at gmail.com> wrote:
> > >
> > > Thanks a lot, Sebastian! Very nicely written.
> > >
> > > I have a few follow-up questions:
> > > 1. Just to make sure I understand correctly, using the .632+ bootstrap
> > > method, the ACC_lower and ACC_upper are the lower and higher percentile of
> > > the ACC_h,i distribution?
> > > 2. For regression algorithms, is there a recommended equation for the
> > > no-information rate gamma?
> > > 3. I need to plot the confidence interval and prediction interval for
> > > my Support Vector Regression prediction (just to clarify these intervals,
> > > please see an analogy from a linear model on slide 14:
> > > http://www2.stat.duke.edu/~tjl13/s101/slides/unit6lec3H.pdf) - can I
> > > derive the intervals from the .632+ bootstrap method or is there a
> > > different way of getting these intervals?
> > >
> > > Thank you!
> > > Raga
> > >
> > >
> > > On Wed, Mar 1, 2017 at 3:13 PM, Sebastian Raschka <se.raschka at gmail.com> wrote:
> > > Hi, Raga,
> > > I have a short section on this here
> > > (https://sebastianraschka.com/blog/2016/model-evaluation-selection-part2.html#the-bootstrap-method-and-empirical-confidence-intervals)
> > > if it helps.
> > >
> > > Best,
> > > Sebastian
> > >
> > > > On Mar 1, 2017, at 3:07 PM, Raga Markely <raga.markely at gmail.com> wrote:
> > > >
> > > > Hi everyone,
> > > >
> > > > I wonder if you could provide me with some suggestions on how to
> > > > determine the confidence and prediction intervals of SVR? If you have
> > > > suggestions for any machine learning algorithms in general, that would be
> > > > fine too (doesn't have to be specific for SVR).
> > > >
> > > > So far, I have found:
> > > > 1. Bootstrap: http://stats.stackexchange.com/questions/183230/bootstrapping-confidence-interval-from-a-regression-prediction
> > > > 2. http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0048723&type=printable
> > > > 3. ftp://ftp.esat.kuleuven.ac.be/sista/suykens/reports/10_156_v0.pdf
> > > >
> > > > But, I don't fully understand the details in #2 and #3 to the point
> > > > that I can write step-by-step code. If I use the bootstrap method, can I
> > > > get the confidence interval as follows?
> > > > a. Draw a bootstrap sample of size n
> > > > b. Fit the SVR model (with hyperparameters chosen during model
> > > > selection with grid search CV) to this bootstrap sample
> > > > c. Use this model to predict the output variable y* from input
> > > > variable X*
> > > > d. Repeat steps a-c, for instance, 100 times
> > > > e. Order the 100 values of y*, and determine, for instance, the 10th
> > > > percentile and 90th percentile (if we are looking for a 0.8 confidence
> > > > interval)
> > > > f. Repeat a-e for different values of X* to plot the prediction with
> > > > the confidence interval
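
Steps a-f above could be sketched roughly as follows. This is only a minimal
illustration, not a vetted recipe: the toy data, the function name
bootstrap_confidence_band, and the C / gamma values (standing in for whatever
a grid search selected) are all assumptions. Predicting at every X* inside
each replicate covers step f without an outer loop.

```python
import numpy as np
from sklearn.svm import SVR


def bootstrap_confidence_band(X, y, X_star, n_boot=100, lower=10, upper=90,
                              svr_params=None, seed=0):
    """Percentile band for SVR predictions at X_star (non-parametric bootstrap)."""
    rng = np.random.default_rng(seed)
    svr_params = svr_params or {}
    n = len(X)
    preds = np.empty((n_boot, len(X_star)))
    for b in range(n_boot):  # step d: repeat a-c n_boot times
        # Step a: draw a bootstrap sample of size n (with replacement).
        idx = rng.integers(0, n, size=n)
        # Step b: refit SVR with the fixed, previously selected hyperparameters.
        model = SVR(**svr_params).fit(X[idx], y[idx])
        # Steps c and f: predict y* at all X* values of interest at once.
        preds[b] = model.predict(X_star)
    # Step e: percentiles across replicates, separately for each X*.
    band_lower, band_upper = np.percentile(preds, [lower, upper], axis=0)
    return band_lower, band_upper, preds


# Toy data, made up for illustration only.
rng = np.random.default_rng(42)
X = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=80)
X_star = np.linspace(0, 5, 25).reshape(-1, 1)

ci_lower, ci_upper, boot_preds = bootstrap_confidence_band(
    X, y, X_star, n_boot=100, svr_params={"C": 10.0, "gamma": 0.5}
)
```

The 10th and 90th percentiles give the 0.8 band from step e; ci_lower and
ci_upper can then be plotted against X_star.
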
> > > >
> > > > But, I don't know how to get the prediction interval from here.
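
One approach sometimes used for this (not something confirmed in this thread)
is to add a resampled residual to each bootstrap prediction, so the spread
reflects observation noise as well as model uncertainty. The sketch below
takes residuals from the out-of-bag points of each replicate and assumes they
are roughly i.i.d. with constant variance across X; the function name, toy
data, and hyperparameter values are hypothetical.

```python
import numpy as np
from sklearn.svm import SVR


def bootstrap_prediction_interval(X, y, X_star, n_boot=100, lower=10,
                                  upper=90, svr_params=None, seed=0):
    """Percentile prediction interval: bootstrap fit plus a resampled residual."""
    rng = np.random.default_rng(seed)
    svr_params = svr_params or {}
    n = len(X)
    draws = np.empty((n_boot, len(X_star)))
    for b in range(n_boot):
        # Bootstrap sample and the points left out of it (out-of-bag).
        idx = rng.integers(0, n, size=n)
        oob = np.setdiff1d(np.arange(n), idx)
        model = SVR(**svr_params).fit(X[idx], y[idx])
        # Residuals on out-of-bag points; fall back to in-sample if none exist.
        if oob.size > 0:
            resid = y[oob] - model.predict(X[oob])
        else:
            resid = y[idx] - model.predict(X[idx])
        # Simulated new observation = fitted value + randomly drawn residual.
        draws[b] = model.predict(X_star) + rng.choice(resid, size=len(X_star))
    pi_lower, pi_upper = np.percentile(draws, [lower, upper], axis=0)
    return pi_lower, pi_upper


# Toy data, made up for illustration only.
rng = np.random.default_rng(42)
X = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=80)
X_star = np.linspace(0, 5, 25).reshape(-1, 1)

pi_lower, pi_upper = bootstrap_prediction_interval(
    X, y, X_star, n_boot=100, svr_params={"C": 10.0, "gamma": 0.5}
)
```

If the noise level clearly changes with X, the constant-variance assumption
breaks down and this interval would be too wide in some regions and too
narrow in others.
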
> > > >
> > > > Thank you very much,
> > > > Raga
> > > > _______________________________________________
> > > > scikit-learn mailing list
> > > > scikit-learn at python.org
> > > > https://mail.python.org/mailman/listinfo/scikit-learn