[SciPy-User] log pdf, cdf, etc

josef.pktd at gmail.com josef.pktd at gmail.com
Sat May 29 00:00:07 EDT 2010


On Fri, May 28, 2010 at 11:24 PM, Nathaniel Smith <njs at pobox.com> wrote:
> On Fri, May 28, 2010 at 7:53 PM,  <josef.pktd at gmail.com> wrote:
>> I don't think for many use cases log(stats.t.pdf) or many other
>> distributions the performance and accuracy hit would be large enough
>> to make it useless. At least, I haven't seen any other comments in
>> this direction.
>
> "Useless" is a value judgement, of course, but it doesn't seem *too*
> far off to me either. I myself basically always find myself wanting
> log-space values, and even if you're just doing statistical tests,
> numerical precision in the tails can become very practically relevant
> when doing multiple hypothesis correction.
>
>> R's license, GPL, is incompatible with the license of scipy, BSD.
>> While they are allowed to look at our code, code that goes into scipy
>> cannot be based on GPL licensed code.
>
> You mean, they're allowed to copy our code, and we're allowed to look
> at their code for reference but can't use it directly :-).

We are allowed to look at their manuals but not their code.
(Life ain't fair.)

>
>> If never seen it mentioned before that there is a direct function for
>> log(norm.cdf). Which functions and packages in R implement the
>> logarithm of the cdf of these distributions?
>>
>> The cdf for several distributions (including normal) is implement in
>> Fortran or C in scipy.special, and I've never seen a log version for
>> them.
>
> Yet R does in fact use specialized code for computing the log-cdf for
> the normal distribution... at least over some parts of its range. I'm
> not sure how much difference it makes or anything, I'm just reporting
> on the existence of 'if' statements in the source :-). See the base R
> distribution, src/nmath/pnorm.c (which also contains references).

pnorm is the cdf not the log of the cdf, that's what I thought,
 but I just saw that they have a "log.p" option.

from the R manual:
"""
For pnorm, based on
Cody, W. D. (1993) Algorithm 715: SPECFUN – A portable FORTRAN package
of special function routines and test drivers
"""

this sounds similar to the fortran or c code that scipy.special has. I
never tried to read that one, except for the doc comments.

accuracy doesn't seem to be a problem

np.log(stats.norm.cdf(np.linspace(-20,20,21))) - [r.pnorm(x,
log_p=True) for x in np.linspace(-20,20,21)]
array([ -2.84217094e-14,  -2.84217094e-14,  -2.84217094e-14,
         0.00000000e+00,  -1.42108547e-14,  -7.10542736e-15,
        -7.10542736e-15,  -3.55271368e-15,  -1.77635684e-15,
        -4.44089210e-16,   0.00000000e+00,   0.00000000e+00,
        -5.42101086e-20,  -5.53867815e-17,  -4.40377573e-17,
         7.61985302e-24,   1.77648211e-33,   7.79353682e-45,
         6.38875440e-58,   9.74094892e-73,   2.75362412e-89])

except the small numbers in the tail look much better in R

>>> np.log(stats.norm.cdf(np.linspace(-20,20,21)))
array([ -2.03917155e+02,  -1.65812373e+02,  -1.31695396e+02,
        -1.01563034e+02,  -7.54106730e+01,  -5.32312852e+01,
        -3.50134372e+01,  -2.07367689e+01,  -1.03601015e+01,
        -3.78318433e+00,  -6.93147181e-01,  -2.30129093e-02,
        -3.16717434e-05,  -9.86587701e-10,  -6.66133815e-16,
         0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
         0.00000000e+00,   0.00000000e+00,   0.00000000e+00])

>>> np.array([r.pnorm(x, log_p=True) for x in np.linspace(-20,20,21)])
array([ -2.03917155e+02,  -1.65812373e+02,  -1.31695396e+02,
        -1.01563034e+02,  -7.54106730e+01,  -5.32312852e+01,
        -3.50134372e+01,  -2.07367689e+01,  -1.03601015e+01,
        -3.78318433e+00,  -6.93147181e-01,  -2.30129093e-02,
        -3.16717434e-05,  -9.86587646e-10,  -6.22096057e-16,
        -7.61985302e-24,  -1.77648211e-33,  -7.79353682e-45,
        -6.38875440e-58,  -9.74094892e-73,  -2.75362412e-89])

except if we use a branch cut

>>> np.log1p(-stats.norm.sf(np.linspace(-20,20,21)))
array([            -Inf,             -Inf,             -Inf,
                   -Inf,             -Inf,             -Inf,
        -3.49450411e+01,  -2.07367689e+01,  -1.03601015e+01,
        -3.78318433e+00,  -6.93147181e-01,  -2.30129093e-02,
        -3.16717434e-05,  -9.86587646e-10,  -6.22096057e-16,
        -7.61985302e-24,  -1.77648211e-33,  -7.79353682e-45,
        -6.38875440e-58,  -9.74094892e-73,  -2.75362412e-89])

I have no idea about speed.

Josef

>
> -- Nathaniel
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>



More information about the SciPy-User mailing list