[SciPy-User] fitting discrete probability distributions to data

josef.pktd at gmail.com josef.pktd at gmail.com
Wed Mar 11 19:42:51 EDT 2015


On Wed, Mar 11, 2015 at 7:14 PM, c <twocolorflipflop at gmail.com> wrote:

> hi,
>
> i have some data:
>
> A) a 1d array (dimensions 1x50), made by summing the columns of a 2d array
> (dimensions ~20k x 50).
>
> B) a 1D array that is just a particular row of that 2d array
>
> i need to fit a sum of 2 negative binomial distributions to A), and to fit
> a single negative binomial distrib. to B).
>
> i have spent a while now reading the documentation for scipy.stats and the
> statsmodels package and various stack overflow posts, etc., but i do not yet
> understand how to go about fitting a discrete probability distribution to a
> vector of data.
>

Do you have the data in the form of histograms (counts), or do you have the
original observations?

statsmodels can only estimate from the original data, which it assumes to
be observations drawn from a negative binomial distribution. Fitting to a
histogram and fitting mixtures of distributions are not supported
"out of the box" and would require custom models.

If you just want to fit a distribution to a histogram or to discrete counts,
then least squares with scipy.optimize.curve_fit or scipy.optimize.leastsq
is one possibility.
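
A rough sketch of the least-squares route with curve_fit, for a single
negative binomial and for a weighted sum of two. The data are synthetic, and
the helper names, starting values and bounds are assumptions chosen for
illustration. Note that this fits the curve to the normalized histogram with
equal weight on every bin; it is not a maximum likelihood estimate.

```python
import numpy as np
from scipy import stats
from scipy.optimize import curve_fit

def nbinom_pmf(k, n, p):
    """PMF of one negative binomial (scipy.stats.nbinom convention)."""
    return stats.nbinom.pmf(k, n, p)

def mixture_pmf(k, w, n1, p1, n2, p2):
    """Weighted sum of two negative binomial PMFs, weights w and 1 - w."""
    return w * stats.nbinom.pmf(k, n1, p1) + (1 - w) * stats.nbinom.pmf(k, n2, p2)

eps = 1e-6
rng = np.random.default_rng(0)

# --- single negative binomial -------------------------------------------
sample = stats.nbinom.rvs(5, 0.3, size=5000, random_state=rng)
k = np.arange(sample.max() + 1)
freq = np.bincount(sample) / sample.size  # normalized histogram

# Method-of-moments starting values (valid when variance exceeds the mean).
m, v = sample.mean(), sample.var()
start = [m**2 / (v - m), m / v]
(n_hat, p_hat), _ = curve_fit(
    nbinom_pmf, k, freq, p0=start,
    bounds=([eps, eps], [np.inf, 1 - eps]),
)

# --- sum of two negative binomials --------------------------------------
mixed = np.concatenate([
    stats.nbinom.rvs(5, 0.5, size=2000, random_state=rng),
    stats.nbinom.rvs(20, 0.3, size=3000, random_state=rng),
])
k2 = np.arange(mixed.max() + 1)
freq2 = np.bincount(mixed) / mixed.size

start2 = [0.5, 3.0, 0.4, 10.0, 0.2]  # rough guesses, not derived from data
lower = [0.0, eps, eps, eps, eps]
upper = [1.0, np.inf, 1 - eps, np.inf, 1 - eps]
mix_params, _ = curve_fit(
    mixture_pmf, k2, freq2, p0=start2,
    bounds=(lower, upper), max_nfev=5000,
)
print(n_hat, p_hat)
print(mix_params)
```

The mixture fit is sensitive to starting values, so it is worth trying a few
and comparing the fitted curve against the histogram.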

Josef



>
> specific subquestions:
>
> - do i need to load data in as a pandas df? an ndarray? does it not matter?
>
> - i understand endog and exog in the context of the examples given in the
> docs (where you have one column that you want to use to predict some other
> column) but not what they should be in the case where i basically am trying
> to fit a curve to the normalized histogram of my data
>
> - if someone can explain how to fit with statsmodels' "Negative Binomial" (
> http://statsmodels.sourceforge.net/devel/generated/statsmodels.discrete.discrete_model.NegativeBinomial.html#statsmodels.discrete.discrete_model.NegativeBinomial)
> that would be a good start. but i do also need to know how to fit to a sum
> of two of these, or possibly a sum of two other discrete distributions
>
> - is the patsy formula syntax relevant here? i have never used R and could
> not find an example of the "R-like" syntax that is similar enough to my use
> case to parse how it works
>
> - honestly i don't know what i'm doing, please help!
>
> if these questions reveal grave ignorance, or are not directly relevant
> enough to scipy for this mailing list, i apologize and thanks for bearing
> with me. i barely know how to flip a coin, this stuff is new to me.
>
> thanks a lot
> c
>