[SciPy-User] fitting discrete probability distributions to data

c twocolorflipflop at gmail.com
Wed Mar 11 19:14:52 EDT 2015


hi,

i have some data:

A) a 1d array (dimensions 1x50), made by summing the columns of a 2d array
(dimensions ~20k x 50).

B) a 1d array that is just a particular row of that 2d array
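To make the shapes concrete, here is a small numpy sketch of how A) and B) relate to the 2d array. The array `counts_2d` is a hypothetical placeholder filled with random counts, and `row_index` is an arbitrary illustrative choice. Note that "summing the columns" is a sum over `axis=0`, which gives shape `(50,)`. Passing `keepdims=True` would give a literal `(1, 50)` array instead.

```python
import numpy as np

# Hypothetical placeholder for the real ~20k x 50 array of non-negative counts.
rng = np.random.default_rng(0)
counts_2d = rng.negative_binomial(n=5, p=0.3, size=(20000, 50))

# A) sum down each column (over the rows): one value per column, shape (50,)
A = counts_2d.sum(axis=0)

# B) one particular row of the 2d array, also shape (50,)
row_index = 0  # hypothetical choice of row
B = counts_2d[row_index, :]

print(A.shape, B.shape)  # prints: (50,) (50,)
```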

i need to fit a sum of 2 negative binomial distributions to A), and to fit
a single negative binomial distribution to B).
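One common reading of "a sum of 2 negative binomial distributions" is a two-component mixture, with pmf `w * NB(k; n1, p1) + (1 - w) * NB(k; n2, p2)`. This is an assumption: if "sum" instead means the distribution of the sum of two NB random variables, the pmf is a convolution and the code below does not apply. The sketch fits the mixture by maximum likelihood with `scipy.optimize.minimize`. It takes the data as a histogram (`values[i]` observed `counts[i]` times), which also covers the case where the 50 entries are already bin counts. All function names, the placeholder data, and the starting guess `x0` are hypothetical.

```python
import numpy as np
from scipy import optimize, special, stats


def mixture_logpmf(k, w, n1, p1, n2, p2):
    """log[w*NB(k; n1, p1) + (1 - w)*NB(k; n2, p2)], combined with logaddexp for stability."""
    return np.logaddexp(np.log(w) + stats.nbinom.logpmf(k, n1, p1),
                        np.log1p(-w) + stats.nbinom.logpmf(k, n2, p2))


def unpack(theta):
    """Map 5 unconstrained numbers to valid (w, n1, p1, n2, p2): 0<w<1, n>0, 0<p<1."""
    w = special.expit(theta[0])
    n1, p1 = np.exp(theta[1]), special.expit(theta[2])
    n2, p2 = np.exp(theta[3]), special.expit(theta[4])
    return w, n1, p1, n2, p2


def fit_nbinom_mixture(values, counts, x0):
    """Weighted maximum likelihood: value values[i] was observed counts[i] times."""
    values = np.asarray(values)
    counts = np.asarray(counts, dtype=float)

    def neg_log_lik(theta):
        return -np.sum(counts * mixture_logpmf(values, *unpack(theta)))

    res = optimize.minimize(neg_log_lik, x0, method="Nelder-Mead",
                            options={"maxiter": 20000, "maxfev": 20000})
    return unpack(res.x)


# Placeholder data: 60% from NB(3, 0.5) (mean 3), 40% from NB(20, 0.4) (mean 30).
rng = np.random.default_rng(2)
sample = np.concatenate([rng.negative_binomial(3, 0.5, size=3000),
                         rng.negative_binomial(20, 0.4, size=2000)])

# Collapse the raw sample into a histogram: value k occurs counts[k] times.
counts = np.bincount(sample)
values = np.arange(counts.size)

# Rough starting guess (component means of roughly 2 and 22); the result can depend on it.
x0 = [0.3, np.log(2.0), 0.2, np.log(10.0), -0.8]
w, n1, p1, n2, p2 = fit_nbinom_mixture(values, counts, x0)
mean1, mean2 = n1 * (1 - p1) / p1, n2 * (1 - p2) / p2
```

Optimizing on the logit/log scale keeps every trial point valid without bounds. Mixture likelihoods can have several local optima, so it is worth trying a few starting guesses on real data.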

i have spent a while now reading the documentation for scipy.stats and the
statsmodels package and various stack overflow posts, etc., but i do not yet
understand how to go about fitting a discrete probability distribution to a
vector of data.
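A common way to fit a discrete distribution to a vector of counts is maximum likelihood: pick the parameters that maximize the sum of `logpmf` over the data. Below is a minimal sketch for a single negative binomial using `scipy.stats.nbinom` (parameters `n` and `p`, counting failures before the `n`-th success) and `scipy.optimize.minimize`. It assumes the input holds raw integer observations. The helper `fit_nbinom` and the placeholder `sample` are hypothetical.

```python
import numpy as np
from scipy import optimize, special, stats


def fit_nbinom(data):
    """Maximum-likelihood estimate of (n, p) for scipy.stats.nbinom from integer samples."""
    data = np.asarray(data)

    def neg_log_lik(theta):
        # n = exp(theta[0]) > 0 and p = expit(theta[1]) in (0, 1), so any theta is valid.
        n, p = np.exp(theta[0]), special.expit(theta[1])
        return -np.sum(stats.nbinom.logpmf(data, n, p))

    # Method-of-moments start: for nbinom(n, p), mean = n(1-p)/p and var = mean/p.
    mean, var = data.mean(), data.var()
    p0 = np.clip(mean / var, 1e-3, 1 - 1e-3) if var > 0 else 0.5
    n0 = max(mean * p0 / (1 - p0), 1e-3)
    x0 = [np.log(n0), special.logit(p0)]

    res = optimize.minimize(neg_log_lik, x0, method="Nelder-Mead")
    return np.exp(res.x[0]), special.expit(res.x[1])


# Placeholder data drawn from nbinom(n=4, p=0.25); real data would go here instead.
rng = np.random.default_rng(1)
sample = rng.negative_binomial(4, 0.25, size=5000)
n_hat, p_hat = fit_nbinom(sample)
fitted = stats.nbinom(n_hat, p_hat)  # frozen distribution, e.g. fitted.pmf(np.arange(50))
```

With only 50 observations (as for a single row), the estimates will be much noisier than with the 5000 placeholder draws used here.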

specific subquestions:

- do i need to load the data as a pandas DataFrame? an ndarray? or does it not matter?

- i understand endog and exog in the context of the examples given in the
docs (where you have one column that you want to use to predict some other
column), but not what they should be in my case, where i am basically trying
to fit a curve to the normalized histogram of my data.

- if someone can explain how to fit with statsmodels' "NegativeBinomial" model
(http://statsmodels.sourceforge.net/devel/generated/statsmodels.discrete.discrete_model.NegativeBinomial.html#statsmodels.discrete.discrete_model.NegativeBinomial),
that would be a good start. but i also need to know how to fit a sum of two
of these, or possibly a sum of two other discrete distributions.

- is the patsy formula syntax relevant here? i have never used R and could
not find an example of the "R-like" syntax close enough to my use case to
work out how it works.

- honestly i don't know what i'm doing, please help!

if these questions reveal grave ignorance, or are not relevant enough to
scipy for this mailing list, i apologize, and thanks for bearing with me. i
barely know how to flip a coin; this stuff is new to me.

thanks a lot
c