[SciPy-Dev] scipy.stats: algorithm to for ticket 1493

Sun Apr 22 20:27:36 EDT 2012

On Sun, Apr 22, 2012 at 2:42 PM, nicky van foreest <vanforeest at gmail.com> wrote:
> I just realized, xa may be too large... hence we should search such
> that cdf(left) < q < cdf(right).
>
> *Assuming* that xa < 0 and xb > 0 the following should be better
>
> def findppf(q):
>    # search  until cdf(left) < q < cdf(right)
>    left, right = invnorm.xa, invnorm.xb
>    while invnorm.cdf(left, 7.24000019602, scale=2.51913630166) > q:
>        right = left
>        left *= 2
>    while invnorm.cdf(right, 7.24000019602, scale=2.51913630166) < q:
>        left = right
>        right *= 2
>    return optimize.brentq(lambda x: \
>                           invnorm.cdf(x, 7.24000019602,
> scale=2.51913630166) - q,\
>                           left, right)
>
> Should a test on xa < 0 and xb>0 be added?

for xa, xb it doesn't matter whether they are larger or smaller than
zero, so I don't think we need a special check

it looks good in a few more example cases.

The difficult cases will be where cdf also doesn't exist and we need
to get it through integrate.quad, but I don't remember which
distribution is a good case.
There is a testcase in the test suite, where I tried to roundtrip
close to the 0, 1 boundary before running into failures with some
distributions
https://github.com/scipy/scipy/blob/master/scipy/stats/tests/test_continuous_basic.py#L307

to try out how well tyour solution works, the roundtrip could be done
with, for example, q= [1e-8, 1-1e-8] and see at which distribution it
breaks and why (if any)

Note: I removed the scale in your example, because internal _ppf works
on the standard distribution, loc=0, scale=1. loc and scale are added
generically in .ppf

Thanks,

Josef

from scipy import stats, optimize

def findppf(dist, q, *args):
    # search  until cdf(left) < q < cdf(right)

    left, right = dist.xa, dist.xb

    counter = 0
    while dist.cdf(left, *args) > q:
        right = left
        left *= 2
        counter += 1
        print counter, left, right

    while dist.cdf(right, *args) < q:
        left = right
        right *= 2
        counter += 1
        print counter, left, right

    return optimize.brentq(lambda x: dist.cdf(x, *args) - q, left, right)

print
print 'invgauss'
s = 7.24000019602
sol =  findppf(stats.invgauss, 0.8455, s)
print sol
sol = findppf(stats.invgauss, 1-1e-8, s)
print 'roundtrip', 1-1e-8, sol, stats.invgauss.cdf(sol, s)
print 1e-30, stats.invgauss.cdf(findppf(stats.invgauss, 1e-30, s), s)

print '\nt'
print  findppf(stats.t, 1-1e-8, s), stats.t.ppf(1-1e-8, s)
print  findppf(stats.t, 1e-8, s), stats.t.ppf(1e-8, s)
print '\ncauchy'
print  findppf(stats.cauchy, 1e-8), stats.cauchy.ppf(1e-8)
print '\nf'
print findppf(stats.f, 1-1e-8, 2, 10), stats.f.ppf(1-1e-8, 2, 10)

> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev