probability distribution function in python

Jari Oksanen jarioksa at cc.oulu.fi
Wed Apr 11 03:55:24 EDT 2001


Ken Seehof wrote:


> 
> Here's a better one, but you have to learn about lambda and
> reduce:
> 
> >>> def fact(n):
> ...  return reduce(lambda x,y: long(x*y), range(1,n+1))
> ...
This looks nice as an isolated excercise, but fails if you need to use
these factorials somewhere, like in calculating Poisson probabilities in
the original question. The problem is that you need the factorial to
divide a real number (float), and for that promote your factorial to a
float and that overflows. A better solution was already posted here:
calculate log-Poisson with Stirling approximation for factorials. Here
is the first approximation:

>>> from math import *
>>> def stirling(n):   
...     log(2)/2+log(n)/2+log(pi)/2+n*log(n)-n

We may compare this to your fact() in two cases:

>>> log(fact(100))
363.739375556
>>> stirling(100)
363.738542225
>>> log(fact(200))
inf
>>> stirling(200)
863.231570526

So fact(200) gives a result, but fails when this result is used in
floating point calculations while Stirling still works. So the idea of
using log-Poisson is to avoid overflows in interim results: the final
result is in range 0...1 and it can be calculated for higher n as well
as long as you avoid large floats.

Now it depends on the needs of accuracy whether the simple Stirling
suffices. A common practice is to use straight factorial for smaller n
and switch to Stirling in larger n. There are very simple correction
terms for larger n (and for smaller n you can calculate or cut-and-paste
the corrections explicitly) depending on the magnitude of n. These you
need in serious applications, and in that case I wouldn't write them
myself but I would get a good numeric library function.

In case you don't want the Poisson probability but a Poisson
log-likelihood, you might consider skipping the factorial completely.

cheers, jo
-- 
Jari Oksanen -- Oulu, Finland



More information about the Python-list mailing list