[SciPy-User] log pdf, cdf, etc
josef.pktd at gmail.com
josef.pktd at gmail.com
Sat May 29 00:00:07 EDT 2010
On Fri, May 28, 2010 at 11:24 PM, Nathaniel Smith <njs at pobox.com> wrote:
> On Fri, May 28, 2010 at 7:53 PM, <josef.pktd at gmail.com> wrote:
>> I don't think for many use cases log(stats.t.pdf) or many other
>> distributions the performance and accuracy hit would be large enough
>> to make it useless. At least, I haven't seen any other comments in
>> this direction.
>
> "Useless" is a value judgement, of course, but it doesn't seem *too*
> far off to me either. I myself basically always find myself wanting
> log-space values, and even if you're just doing statistical tests,
> numerical precision in the tails can become very practically relevant
> when doing multiple hypothesis correction.
>
>> R's license, GPL, is incompatible with the license of scipy, BSD.
>> While they are allowed to look at our code, code that goes into scipy
>> cannot be based on GPL licensed code.
>
> You mean, they're allowed to copy our code, and we're allowed to look
> at their code for reference but can't use it directly :-).
We are allowed to look at their manuals but not their code.
(Life ain't fair.)
>
>> If never seen it mentioned before that there is a direct function for
>> log(norm.cdf). Which functions and packages in R implement the
>> logarithm of the cdf of these distributions?
>>
>> The cdf for several distributions (including normal) is implement in
>> Fortran or C in scipy.special, and I've never seen a log version for
>> them.
>
> Yet R does in fact use specialized code for computing the log-cdf for
> the normal distribution... at least over some parts of its range. I'm
> not sure how much difference it makes or anything, I'm just reporting
> on the existence of 'if' statements in the source :-). See the base R
> distribution, src/nmath/pnorm.c (which also contains references).
pnorm is the cdf not the log of the cdf, that's what I thought,
but I just saw that they have a "log.p" option.
from the R manual:
"""
For pnorm, based on
Cody, W. D. (1993) Algorithm 715: SPECFUN – A portable FORTRAN package
of special function routines and test drivers
"""
this sounds similar to the fortran or c code that scipy.special has. I
never tried to read that one, except for the doc comments.
accuracy doesn't seem to be a problem
np.log(stats.norm.cdf(np.linspace(-20,20,21))) - [r.pnorm(x,
log_p=True) for x in np.linspace(-20,20,21)]
array([ -2.84217094e-14, -2.84217094e-14, -2.84217094e-14,
0.00000000e+00, -1.42108547e-14, -7.10542736e-15,
-7.10542736e-15, -3.55271368e-15, -1.77635684e-15,
-4.44089210e-16, 0.00000000e+00, 0.00000000e+00,
-5.42101086e-20, -5.53867815e-17, -4.40377573e-17,
7.61985302e-24, 1.77648211e-33, 7.79353682e-45,
6.38875440e-58, 9.74094892e-73, 2.75362412e-89])
except the small numbers in the tail look much better in R
>>> np.log(stats.norm.cdf(np.linspace(-20,20,21)))
array([ -2.03917155e+02, -1.65812373e+02, -1.31695396e+02,
-1.01563034e+02, -7.54106730e+01, -5.32312852e+01,
-3.50134372e+01, -2.07367689e+01, -1.03601015e+01,
-3.78318433e+00, -6.93147181e-01, -2.30129093e-02,
-3.16717434e-05, -9.86587701e-10, -6.66133815e-16,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00])
>>> np.array([r.pnorm(x, log_p=True) for x in np.linspace(-20,20,21)])
array([ -2.03917155e+02, -1.65812373e+02, -1.31695396e+02,
-1.01563034e+02, -7.54106730e+01, -5.32312852e+01,
-3.50134372e+01, -2.07367689e+01, -1.03601015e+01,
-3.78318433e+00, -6.93147181e-01, -2.30129093e-02,
-3.16717434e-05, -9.86587646e-10, -6.22096057e-16,
-7.61985302e-24, -1.77648211e-33, -7.79353682e-45,
-6.38875440e-58, -9.74094892e-73, -2.75362412e-89])
except if we use a branch cut
>>> np.log1p(-stats.norm.sf(np.linspace(-20,20,21)))
array([ -Inf, -Inf, -Inf,
-Inf, -Inf, -Inf,
-3.49450411e+01, -2.07367689e+01, -1.03601015e+01,
-3.78318433e+00, -6.93147181e-01, -2.30129093e-02,
-3.16717434e-05, -9.86587646e-10, -6.22096057e-16,
-7.61985302e-24, -1.77648211e-33, -7.79353682e-45,
-6.38875440e-58, -9.74094892e-73, -2.75362412e-89])
I have no idea about speed.
Josef
>
> -- Nathaniel
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>
More information about the SciPy-User
mailing list