[SciPy-User] fitting discrete probability distributions to data

josef.pktd at gmail.com
Wed Mar 11 20:07:34 EDT 2015


(please comment inline or post at the bottom in scipy related mailing lists)

On Wed, Mar 11, 2015 at 7:49 PM, c <twocolorflipflop at gmail.com> wrote:

> yup, i have the original data
>

To estimate a single Negative Binomial, you can use NegativeBinomial from
statsmodels (sm.NegativeBinomial after import statsmodels.api as sm) and
regress on a constant: endog is your negative binomial data and
exog = np.ones(len(data)).

Then
result = sm.NegativeBinomial(data, exog).fit()

result.params has the estimated parameters (the constant and the dispersion),
but they are in the mean-dispersion parameterization used for regression, not
in the "standard" parameterization of a Negative Binomial distribution.

There is somewhere (!?) a helper function to transform the params into the
standard form as used, for example, by scipy.stats.nbinom.

Estimating a mixture of two or more NegativeBinomial distributions takes
a bit of work.
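
One possible way to do that work, as a rough sketch and not something
statsmodels provides out of the box: write down the log-likelihood of a
two-component mixture with scipy.stats.nbinom and maximize it directly with
scipy.optimize.minimize. The helper names (unpack, neg_loglike), the simulated
data, the start values and the choice of Nelder-Mead are all assumptions made
for illustration.

```python
import numpy as np
from scipy import optimize, special, stats

def unpack(theta):
    """Map unconstrained values to (w, n1, p1, n2, p2)."""
    w = special.expit(theta[0])                                # weight in (0, 1)
    n1, n2 = np.exp(theta[1]), np.exp(theta[3])                # n > 0
    p1, p2 = special.expit(theta[2]), special.expit(theta[4])  # p in (0, 1)
    return w, n1, p1, n2, p2

def neg_loglike(theta, x):
    """Negative log-likelihood of w * NB(n1, p1) + (1 - w) * NB(n2, p2)."""
    w, n1, p1, n2, p2 = unpack(theta)
    lp1 = np.log(w) + stats.nbinom.logpmf(x, n1, p1)
    lp2 = np.log1p(-w) + stats.nbinom.logpmf(x, n2, p2)
    # logaddexp gives log(exp(lp1) + exp(lp2)) without underflow
    return -np.logaddexp(lp1, lp2).sum()

# Simulated stand-in data: two well separated components (an assumption).
rng = np.random.default_rng(0)
x = np.concatenate([
    rng.negative_binomial(3, 0.5, size=1500),
    rng.negative_binomial(20, 0.3, size=1500),
])

# Rough start values on the unconstrained scale; results depend on these.
theta0 = np.array([
    special.logit(0.4),                 # w
    np.log(2.0), special.logit(0.4),    # n1, p1
    np.log(10.0), special.logit(0.2),   # n2, p2
])

res = optimize.minimize(
    neg_loglike, theta0, args=(x,), method="Nelder-Mead",
    options={"maxiter": 20000, "maxfev": 20000, "xatol": 1e-6, "fatol": 1e-6},
)

w, n1, p1, n2, p2 = unpack(res.x)
mean1 = stats.nbinom.mean(n1, p1)
mean2 = stats.nbinom.mean(n2, p2)
print("weight:", w)
print("component 1: n, p, mean =", n1, p1, mean1)
print("component 2: n, p, mean =", n2, p2, mean2)
```

Mixture likelihoods can have several local optima and the component labels can
swap, so it is worth trying a few different start values and comparing the
final log-likelihoods.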

Josef





>
> On Thu, Mar 12, 2015 at 1:42 AM, <josef.pktd at gmail.com> wrote:
>
>>
>>
>> On Wed, Mar 11, 2015 at 7:14 PM, c <twocolorflipflop at gmail.com> wrote:
>>
>>> hi,
>>>
>>> i have some data:
>>>
>>> A) a 1d array (dimensions 1x50), made by summing the columns of a 2d
>>> array (dimensions ~20k x 50).
>>>
>>> B) a 1D array that is just a particular row of that 2d array
>>>
>>> i need to fit a sum of 2 negative binomial distributions to A), and to
>>> fit a single negative binomial distrib. to B).
>>>
>>> i have spent a while now reading the documentation for scipy.stats and
>>> the statsmodels package and various stack overflow posts, etc., but i do not
>>> yet understand how to go about fitting a discrete probability distribution
>>> to a vector of data.
>>>
>>
>> Do you have the data in the form of histograms (counts) or the original
>> data?
>>
>> statsmodels can only estimate based on the original data which is assumed
>> to consist of observations drawn from a Negative Binomial distribution.
>> Fitting histogram and fitting mixtures of distributions is not supported
>> "out of the box", and would require some custom models.
>>
>> If you just want to fit a distribution to a histogram or discrete counts,
>> then using curve_fit or leastsq is one possibility.
>>
>> Josef
>>
>>
>>
>>>
>>> specific subquestions:
>>>
>>> - do i need to load data in as a pandas df? an ndarray? does it not
>>> matter?
>>>
>>> - i understand endog and exog in the context of the examples given in
>>> the docs (where you have one column that you want to use to predict some
>>> other column) but not what they should be in the case where i basically am
>>> trying to fit a curve to the normalized histogram of my data
>>>
>>> - if someone can explain how to fit with statsmodels' "Negative Binomial
>>> (
>>> http://statsmodels.sourceforge.net/devel/generated/statsmodels.discrete.discrete_model.NegativeBinomial.html#statsmodels.discrete.discrete_model.NegativeBinomial)
>>> that would be a good start. but i do also need to know how to fit to a sum
>>> of two of these, or possibly a sum of two other discrete distributions
>>>
>>> - is the patsy formula syntax relevant here? i have never used R and
>>> could not find an example of the "R-like" syntax that is similar enough to
>>> my use case to parse how it works
>>>
>>> - honestly i don't know what i'm doing, please help!
>>>
>>> if these questions reveal grave ignorance, or are not directly relevant
>>> enough to scipy for this mailing list, i apologize and thanks for bearing
>>> with me. i barely know how to flip a coin, this stuff is new to me.
>>>
>>> thanks a lot
>>> c
>>>
>>> _______________________________________________
>>> SciPy-User mailing list
>>> SciPy-User at scipy.org
>>> http://mail.scipy.org/mailman/listinfo/scipy-user