[SciPy-Dev] scipy.stats: algorithm to for ticket 1493

nicky van foreest vanforeest at gmail.com
Mon Apr 23 14:58:24 EDT 2012


>> for xa, xb it doesn't matter whether they are larger or smaller than
>>> zero, so I don't think we need a special check

I think it does, for suppose that in the algo left = xa = 0.5 (because
the user has been fiddling with xa) and cdf(xa) > q. Then  setting
left = 2*left will only worsen the problem. Or do I miss something?

>>> it looks good in a few more example cases.

I found another small bug, please see the included code.

>>>
>>> The difficult cases will be where cdf also doesn't exist and we need
>>> to get it through integrate.quad, but I don't remember which
>>> distribution is a good case.

This case is harder indeed. (I assume you mean by 'not exist' that
there is no closed form expression for the cdf, like the normal
distribution). Computing the ppf would involve calling quad a lot of
times. This is wasteful especially since the computation of cdf(b)
includes the computation of cdf(a) for a < b, supposing that quad runs
from -np.inf to b. We could repair this by computing cdf(b) = cdf(a) +
quad(f, a, b), assuming that cdf(a) has been computed already.
(perhaps I am not clear enough here. If so, let me know.)

>> I just think that we are not able to reach the q=0, q=1 boundaries,
>> since for some distributions we will run into other numerical
>> problems. And I'm curious how far we can get with this.

I completely missed to include a test on the obvious cases q >= 1. -
np.finfo(float).eps and q <= np.finfo(float).eps. It is now in the
attached file.

>> Similarly, I don't know whether the default xa and xb are good. I
>> changed them for a few distributions, but only where I saw obvious
>> improvements.

I also have no clue what would be good values in general. The choices
seems reasonable from a practical point of view...

>>> Note: I removed the scale in your example, because internal _ppf works
>>> on the standard distribution, loc=0, scale=1. loc and scale are added
>>> generically in .ppf

Thanks. I included also **kwds so that I can pass scale = 10 or
something like this. Once all works as it should, I'll try to convert
the code such that it fits nicely in distributions.py.

The simultaneous updating of left and right in the previous algo is
wrong. Suppose for instance that cdf(left) < cdf(right) < q. Then both
left and right would `move to the left'. This is clearly wrong. The
included code should be better.

With regard to the values of xb and xa. Can a `ordinary' user change
these? If so, then the ppf finder should include some protection in my
opinion. If not, the user will get an error that brentq has not the
right limits, but this error might be somewhat unexpected. (What has
brentq to do with finding the ppf?) Of course, looking at the code
this is clear, but I expect most users will no do so.

The code contains two choices about how to handle xa and xb. Do you
have any preference?

Thanks for your feedback. Very helpful.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: findppf.py
Type: application/octet-stream
Size: 1792 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/scipy-dev/attachments/20120423/e39294d4/attachment.obj>


More information about the SciPy-Dev mailing list