[SciPy-User] fitting discrete probability distributions to data

c twocolorflipflop at gmail.com
Wed Mar 11 19:49:09 EDT 2015


yup, i have the original data

On Thu, Mar 12, 2015 at 1:42 AM, <josef.pktd at gmail.com> wrote:

>
>
> On Wed, Mar 11, 2015 at 7:14 PM, c <twocolorflipflop at gmail.com> wrote:
>
>> hi,
>>
>> i have some data:
>>
>> A) a 1d array (dimensions 1x50), made by summing the columns of a 2d
>> array (dimensions ~20k x 50).
>>
>> B) a 1D array that is just a particular row of that 2d array
>>
>> i need to fit a sum of 2 negative binomial distributions to A), and to
>> fit a single negative binomial distrib. to B).
>>
>> i have spent a while now reading the documentation for scipy.stats and
>> the statsmodels package and various stack overflow posts, etc., but i do not
>> yet understand how to go about fitting a discrete probability distribution
>> to a vector of data.
>>
>
> Do you have the data in the form of histograms (counts) or the original
> data?
>
> statsmodels can only estimate based on the original data which is assumed
> to consist of observations drawn from a Negative Binomial distribution.
> Fitting histograms and fitting mixtures of distributions are not supported
> "out of the box", and would require some custom models.
>
> If you just want to fit a distribution to a histogram or discrete counts,
> then using curve_fit or leastsq is one possibility.
>
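The curve_fit route mentioned above could look roughly like the following sketch. It assumes the histogram is a normalized frequency for each integer value 0, 1, 2, ... and fits the negative binomial pmf to it by least squares. The data is synthetic, nb_pmf is a helper name made up for this example, and the starting values come from the method of moments (which assumes the variance exceeds the mean).

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import nbinom

rng = np.random.default_rng(1)
# Synthetic placeholder sample; replace with real counts.
sample = rng.negative_binomial(n=4, p=0.25, size=5000)

# Normalized histogram: relative frequency of each integer value.
k = np.arange(sample.max() + 1)
freq = np.bincount(sample) / sample.size

def nb_pmf(k, n, p):
    # scipy.stats.nbinom accepts a non-integer n.
    return nbinom.pmf(k, n, p)

# Method-of-moments starting values (valid when variance > mean).
m, v = sample.mean(), sample.var()
p0 = [m**2 / (v - m), m / v]

(n_hat, p_hat), _ = curve_fit(
    nb_pmf, k, freq, p0=p0,
    bounds=([1e-6, 1e-6], [np.inf, 1 - 1e-6]),
)
print(n_hat, p_hat)
```

Least squares on a histogram weights every bin equally, so the estimates will generally differ somewhat from a maximum likelihood fit on the original data.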
> Josef
>
>
>
>>
>> specific subquestions:
>>
>> - do i need to load data in as a pandas df? an ndarray? does it not
>> matter?
>>
>> - i understand endog and exog in the context of the examples given in the
>> docs (where you have one column that you want to use to predict some other
>> column) but not what they should be in the case where i basically am trying
>> to fit a curve to the normalized histogram of my data
>>
>> - if someone can explain how to fit with statsmodels' "Negative Binomial"
>> (http://statsmodels.sourceforge.net/devel/generated/statsmodels.discrete.discrete_model.NegativeBinomial.html#statsmodels.discrete.discrete_model.NegativeBinomial)
>> that would be a good start. but i do also need to know how to fit to a sum
>> of two of these, or possibly a sum of two other discrete distributions
>>
>> - is the patsy formula syntax relevant here? i have never used R and
>> could not find an example of the "R-like" syntax that is similar enough to
>> my use case to parse how it works
>>
>> - honestly i don't know what i'm doing, please help!
>>
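For the "sum of two negative binomials" part, one possible custom approach is a direct maximum likelihood fit with scipy.optimize.minimize. This sketch assumes that "sum" means a two-component mixture (a weighted sum of two pmfs), not the distribution of the sum of two negative binomial variables. The data is synthetic, neg_loglike is a hypothetical helper, and the starting values are hand-picked for this example; real data would likely need its own starting point.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, logsumexp
from scipy.stats import nbinom

rng = np.random.default_rng(2)
# Synthetic placeholder data: 40% from one component, 60% from another.
data = np.concatenate([
    rng.negative_binomial(n=3, p=0.5, size=800),
    rng.negative_binomial(n=10, p=0.2, size=1200),
])

def neg_loglike(theta, x):
    # Unconstrained parameters: logit of the weight and of each p,
    # log of each n, so the optimizer needs no explicit bounds.
    w = expit(theta[0])
    n1, p1 = np.exp(theta[1]), expit(theta[2])
    n2, p2 = np.exp(theta[3]), expit(theta[4])
    log_terms = np.stack([
        np.log(w) + nbinom.logpmf(x, n1, p1),
        np.log1p(-w) + nbinom.logpmf(x, n2, p2),
    ])
    # Log of the mixture pmf per observation, summed over the data.
    return -logsumexp(log_terms, axis=0).sum()

# Hand-picked start: a low-mean and a high-mean component.
theta0 = np.array([0.0, np.log(2.0), 0.0, np.log(8.0), -1.0])
res = minimize(
    neg_loglike, theta0, args=(data,), method="Nelder-Mead",
    options={"maxiter": 20000, "maxfev": 20000, "xatol": 1e-6, "fatol": 1e-6},
)

w_hat = expit(res.x[0])
n1_hat, p1_hat = np.exp(res.x[1]), expit(res.x[2])
n2_hat, p2_hat = np.exp(res.x[3]), expit(res.x[4])
print(w_hat, (n1_hat, p1_hat), (n2_hat, p2_hat))
```

Mixture likelihoods can have several local optima, so it is worth trying a few different starting points and comparing the final res.fun values.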
>> if these questions reveal grave ignorance, or are not directly relevant
>> enough to scipy for this mailing list, i apologize and thanks for bearing
>> with me. i barely know how to flip a coin, this stuff is new to me.
>>
>> thanks a lot
>> c
>>
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>
>>
>