[SciPy-User] [Numpy-discussion] Fitting a curve on log-normally distributed data

josef.pktd at gmail.com
Tue Nov 17 15:04:20 EST 2009


On Tue, Nov 17, 2009 at 2:37 PM, Robert Kern <robert.kern at gmail.com> wrote:
> On Tue, Nov 17, 2009 at 13:28, Gökhan Sever <gokhansever at gmail.com> wrote:
>>
>>
>> On Tue, Nov 17, 2009 at 12:38 PM, <josef.pktd at gmail.com> wrote:
>
>>> If conc were just log-normally distributed, then you would not get any
>>> relationship between conc and size.
>>>
>>> If you have many observations of (conc, size) pairs, then you could
>>> estimate a noisy model
>>> conc = f(size) + u, where the noise u is, for example, log-normally
>>> distributed, but you would still need an expression for the
>>> non-linear function f.
>>
>> I don't understand why I can't get a relation between sizes and conc values
>> if conc is log-normally distributed. Could you elaborate on this a bit more?
>> The non-linear relationship part is also confusing me. If, say, to test a
>> linear relationship between x and y data pairs we just fit a line, then what
>> I am looking for here is a log-normal fit that gives a relation between size
>> and conc.
>
> It's a language issue. Your concentration values are not log-normally
> distributed. Your particle sizes are log-normally distributed (maybe).
> The concentration over a range of particle sizes is a measurement that
> is related to the particle size distribution, but you would not say
> that the measurements themselves are log-normally distributed. Josef
> was taking your language at face value.

The way I see it, you have two variables, size and counts (or concentration).
My initial interpretation was that you wanted to model the relationship
between these two variables.
When the total number of particles is fixed, the conditional size
distribution is univariate and could be modeled by a log-normal
distribution. (This still leaves the total count unmodelled.)
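
For illustration (not the poster's actual setup): if raw, unbinned particle
sizes were available, the univariate log-normal could be fit directly with
lognorm.fit; a minimal sketch with synthetic data standing in for real
measurements:

    import numpy as np
    from scipy import stats

    # Synthetic stand-in for raw particle sizes (assumed, not real data).
    sizes = stats.lognorm.rvs(0.8, scale=2.0, size=1000)

    # Fix the location at zero so only shape and scale are estimated.
    shape, loc, scale = stats.lognorm.fit(sizes, floc=0)
    sigma, mu = shape, np.log(scale)   # parameters of the underlying normal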

If you have the total particle count per bin, then it
should be possible to write down a likelihood function that is
discretized to the bins from the continuous distribution:
given a random particle, what is the probability of it falling in
bin 1, bin 2, and so on? Then sum the log-likelihood over all particles
and maximize it as a function of the log-normal parameters.
(There might be a numerical trick using fractions instead of
conditional counts, but I'm not sure what the analogous discrete
distribution would be.)
Once the parameters of the log-normal distribution are
estimated, the distribution is defined over the whole positive
half-line (where the out-of-sample pdf is determined
by assumption, not data).
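
A rough sketch of that binned maximum-likelihood fit (the bin edges and
counts below are made up for illustration; the location of the log-normal
is held at zero, and probability mass outside the outermost edges is
ignored):

    import numpy as np
    from scipy import stats, optimize

    # Hypothetical bin edges for 15 channels, plus counts per bin.
    edges = np.logspace(-1, 1.5, 16)
    counts = np.array([2, 8, 30, 45, 40, 28, 18, 11, 7, 4, 3, 2, 1, 1, 0])

    def neg_loglike(params):
        shape, scale = params
        if shape <= 0 or scale <= 0:
            return np.inf
        # Probability that a random particle falls in each bin.
        p = np.diff(stats.lognorm.cdf(edges, shape, scale=scale))
        p = np.clip(p, 1e-300, 1.0)    # guard against log(0)
        # Log-likelihood summed over particles, grouped by bin.
        return -np.sum(counts * np.log(p))

    shape_hat, scale_hat = optimize.fmin(neg_loglike, [1.0, 1.0])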

Josef


>
>>> If you want to fit a curve f that has the same shape as the pdf of
>>> the log-normal, then you cannot do it with lognorm.fit, because that
>>> assumes you just have a random sample, independent of size.
>>
>> Could you give an example of this?
>
> from scipy import stats
> x = stats.norm.rvs(size=100)  # a plain random sample, no size dependence
> stats.norm.fit(x)             # estimates (loc, scale) from that sample
>
>>> So, it's not clear to me what you really want, or what your sample data
>>> looks like (do you have only one 15 element sample or lots of them).
>>
>> I have many sample points (thousands), each composed of these 15 elements.
>> But the whole data set doesn't look much different from the sample I used.
>> Most peaks are around the 3rd-4th channel and decay as shown in the figure.
>
> Do you need to fit a different distribution for each 15-vector? Or are
> all of these measurements supposed to be merged somehow?
>
> --
> Robert Kern
>
> "I have come to believe that the whole world is an enigma, a harmless
> enigma that is made terrible by our own mad attempt to interpret it as
> though it had an underlying truth."
>  -- Umberto Eco
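
For the curve-shape fitting mentioned above (a curve f with the shape of
the log-normal pdf, fit to (size, conc) pairs rather than to a raw sample),
non-linear least squares via scipy.optimize.curve_fit is one option; a
sketch with made-up data:

    import numpy as np
    from scipy import stats, optimize

    def f(size, amp, shape, scale):
        # Curve with the shape of a log-normal pdf and a free amplitude.
        return amp * stats.lognorm.pdf(size, shape, scale=scale)

    # Made-up (size, conc) pairs with additive log-normal noise u.
    size = np.linspace(0.1, 10.0, 50)
    conc = f(size, 100.0, 0.8, 2.0) + stats.lognorm.rvs(0.5, size=size.size)

    popt, pcov = optimize.curve_fit(f, size, conc, p0=[50.0, 1.0, 1.0])
    amp_hat, shape_hat, scale_hat = popt

Unlike lognorm.fit, this treats the pdf only as a curve shape; the fitted
result is not a probability distribution over sizes.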


