[SciPy-User] distributions - who got the most ?

josef.pktd at gmail.com josef.pktd at gmail.com
Tue Dec 4 20:15:56 EST 2012


On Tue, Dec 4, 2012 at 4:01 PM, Ralf Gommers <ralf.gommers at gmail.com> wrote:
>
>
>
> On Tue, Dec 4, 2012 at 4:30 AM, <josef.pktd at gmail.com> wrote:
>>
>> scipy.stats has more than 90 distributions.
>>
>> Do we want to increase it by almost a factor of 10? :)
>>
>> While looking for the cdf of a distribution, I found this :
>>
>> http://www.mathworks.com/matlabcentral/fileexchange/35008-generation-of-random-variates
>>
>> He collected 870 distributions (under BSD license). Includes generic
>> random number generation.
>>
>> Even though there are some variations of distributions counted
>> separately, given my quick browsing this looks impressive and a good
>> source for code and references.
>> Coding style is not great but it's 10 years or so of collecting
>> distributions.
>
>
> Adding a lot of distributions sounds fine to me. That many distributions
> would need to go into a separate namespace.
>
> Any additions should be complete though (the Matlab code only has pdf/cdf)
> and well tested. The Matlab code doesn't look all that useful except for the
> references ("coding style is not great" is really too kind). I also don't
> trust the BSD license that's put on it, many files have different author
> names in them with no mention of where they came from.

The Matlab code includes several "special" functions that look mostly
copied from other authors.
This would need checking, but I doubt we need many of those since we
have scipy.special.
We are missing some special functions for distributions, but I didn't
check whether he has any of those.
The pdfs, and the cdfs when available, look like they were implemented
by the author; at least it looks that way for the small sample that I
checked.
(Code quality varies a lot, but many distributions are vectorized or
can be easily vectorized from his code.)

Given the pdf, the rest could all be derived generically. But it won't
be efficient.
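
A minimal sketch of what "derived generically" means in scipy.stats,
assuming a subclass of rv_continuous that supplies only _pdf. The class
name PdfOnlyNormal is hypothetical and the standard normal is used only
because exact values are available for comparison; cdf and ppf then fall
back to numerical integration and root finding, which is why it is slow.

```python
import numpy as np
from scipy import stats


class PdfOnlyNormal(stats.rv_continuous):
    """Hypothetical example: only the pdf is supplied."""

    def _pdf(self, x):
        return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)


dist = PdfOnlyNormal(name="pdf_only_normal")

# cdf is obtained by numerically integrating _pdf
cdf_generic = dist.cdf(1.0)
cdf_exact = stats.norm.cdf(1.0)

# ppf is obtained by root finding on the generic cdf
ppf_generic = dist.ppf(0.9)
ppf_exact = stats.norm.ppf(0.9)

print(cdf_generic, cdf_exact)
print(ppf_generic, ppf_exact)
```

Each ppf call runs a root finder over a cdf that itself calls a
quadrature routine, so this is only practical as a fallback.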

Also, I just saw that sympy could become useful to derive extra properties:
http://matthewrocklin.com/blog/work/2012/12/03/Characteristic-Functions/
sympy.stats also works based only on the pdf (from what I have seen).
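
As a rough illustration of working from the pdf alone, here is a sketch
with sympy.stats.ContinuousRV, which builds a random variable from a
density expression. The standard normal density is an assumption chosen
for the example; whether sympy can do the integrals for the more exotic
distributions is a separate question.

```python
from sympy import Symbol, exp, pi, simplify, sqrt
from sympy.stats import ContinuousRV, E, variance

x = Symbol("x")

# Only the density is given; here the standard normal pdf
pdf = sqrt(2) * exp(-x**2 / 2) / (2 * sqrt(pi))
X = ContinuousRV(x, pdf)

# Moments are derived symbolically by integrating against the pdf
mean = E(X)
var = simplify(variance(X))

print(mean, var)
```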

I'm a bit skeptical about the number of distributions that are
actually generally useful and not just used once in a journal article.
My impression from tracking several statistics journals is that there
are at least 10 new distributions each year.

As an example, he has a long list of Poisson mixture distributions,
none of which I had heard of except for the negative binomial. They
might be useful in some cases, but a more general class might cover
them better.
From a brief look at his reference
http://scholar.google.com/scholar?cluster=6061641765696455790&hl=en&as_sdt=0,5&as_vis=1
I think it might not be necessary to implement all the details for 5 or
more distributions separately.
According to Google the paper has only 4 citations. See also 1).
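
A sketch of what such a more general class could look like, under the
assumption that the mixing distribution is a frozen scipy.stats
distribution on the positive half line. The helper mixed_poisson_pmf is
hypothetical, not an existing scipy function. It integrates the Poisson
pmf against the mixing pdf, and the gamma case is checked against
nbinom, using the known result that gamma mixing with shape r and scale
theta gives a negative binomial with n = r and p = 1 / (1 + theta).

```python
import numpy as np
from scipy import integrate, stats


def mixed_poisson_pmf(k, mixing_dist):
    """P(K = k) as the integral of Poisson(k; lam) * g(lam) over lam > 0.

    Hypothetical helper; mixing_dist is any frozen distribution with a pdf.
    """
    def integrand(lam):
        return stats.poisson.pmf(k, lam) * mixing_dist.pdf(lam)

    value, _abserr = integrate.quad(integrand, 0, np.inf)
    return value


shape, scale = 3.0, 2.0
mixing = stats.gamma(shape, scale=scale)

# Gamma mixing should reproduce the negative binomial
nb = stats.nbinom(shape, 1.0 / (1.0 + scale))

ks = range(6)
pmf_mixture = [mixed_poisson_pmf(k, mixing) for k in ks]
pmf_nbinom = [nb.pmf(k) for k in ks]

print(np.round(pmf_mixture, 6))
print(np.round(pmf_nbinom, 6))
```

Swapping in stats.lognorm or stats.invgauss for the mixing distribution
would give the corresponding mixtures without any separate
implementation, at the cost of one quadrature per pmf value.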

But there are a lot of distributions, or classes/categories of
distributions, that scipy is missing and that are available, for
example, in R, although in R they are spread out over many packages.

Josef

1) Another reference for Poisson mixtures (technical, not a quick
read, but a funny table):

Karlis, D. and Xekalaki, E. (2005), Mixed Poisson Distributions.
International Statistical Review, 73: 35–58.
doi: 10.1111/j.1751-5823.2005.tb00250.x
http://scholar.google.com/scholar?cluster=4455890634693542956&hl=en&as_sdt=2005&sciodt=0,5

-------------------------------------------------------------------------------------------------------------
Table 1
Some mixed Poisson distributions.

Mixed Poisson Distribution               Mixing Distribution                  A Key Reference
Negative Binomial                        Gamma                                Greenwood & Yule (1920)
Geometric                                Exponential                          Johnson et al. (1992)
Poisson-Linear Exponential Family        Linear Exponential Family            Sankaran (1969)
Poisson-Lindley                          Lindley                              Sankaran (1970)
Poisson-Linear Exponential               Linear Exponential                   Kling & Goovaerts (1993)
Poisson-Lognormal                        Lognormal                            Bulmer (1974)
Poisson-Confluent Hypergeometric Series  Confluent Hypergeometric Series      Bhattacharya (1966)
Poisson-Generalized Inverse Gaussian     Generalized Inverse Gaussian         Sichel (1974)
Sichel                                   Inverse Gaussian                     Sichel (1975)
Poisson-Inverse Gamma                    Inverse Gamma                        Willmot (1993)
Poisson-Truncated Normal                 Truncated Normal                     Patil (1964)
Generalized Waring                       Gamma Product Ratio                  Irwin (1975)
Simple Waring                            Exponential Beta                     Pielou (1962)
Yule                                     Beta with Specific Parameter Values  Simon (1955)
Poisson-Generalized Pareto               Generalized Pareto                   Kempton (1975)
Poisson-Beta I                           Beta Type I                          Holla & Bhattacharya (1965)
Poisson-Beta II                          Beta Type II                         Gurland (1958)
Poisson-Truncated Beta II                Truncated Beta Type II               Willmot (1986)
Poisson-Uniform                          Uniform                              Bhattacharya (1966)
Poisson-Truncated Gamma                  Truncated Gamma                      Willmot (1993)
Poisson-Generalized Gamma                Generalized Gamma                    Albrecht (1984)
Delaporte                                Shifted Gamma                        Ruohonen (1988)
Poisson-Modified Bessel of the 3rd Kind  Modified Bessel of the 3rd Kind      Ong & Muthaloo (1995)
Poisson-Pareto                           Pareto                               Willmot (1993)
Poisson-Shifted Pareto                   Shifted Pareto                       Willmot (1993)
Poisson-Pearson Family                   Pearson’s Family of Distributions    Albrecht (1982)
Poisson-Log-Student                      Log-Student                          Gaver & O’Muircheartaigh (1987)
Poisson-Power Function                   Power Function Distribution          Rai (1971)
Poisson-Lomax                            Lomax                                Al-Awadhi & Ghitany (2001)
Poisson-Power Variance                   Power Variance Family                Hougaard et al. (1997)
Neyman                                   Poisson                              Douglas (1980)
Other                                    Discrete Distributions               Johnson et al. (1992)
-------------------------------------------------------------------------------------------------------------

>
> Ralf