[SciPy-User] [Numpy-discussion] Fitting a curve on a log-normal distributed data

Gökhan Sever gokhansever at gmail.com
Tue Nov 17 14:28:45 EST 2009


On Tue, Nov 17, 2009 at 12:38 PM, <josef.pktd at gmail.com> wrote:

> On Tue, Nov 17, 2009 at 12:29 PM, Gökhan Sever <gokhansever at gmail.com>
> wrote:
> >
> >
> > On Tue, Nov 17, 2009 at 12:13 AM, Ian Mallett <geometrian at gmail.com>
> > wrote:
> >>
> >> Theory wise:
> >> -Do a linear regression on your data.
> >> -Apply a logarithmic transform to your data's dependent variable, and do
> >> another linear regression.
> >> -Apply a logarithmic transform to your data's independent variable, and
> >> do another linear regression.
> >> -Take the best regression (highest r^2 value) and execute a back
> >> transform.
> >>
> >> Then, to get your desired extrapolation, simply substitute in the size
> >> for the independent variable to get the expected value.
> >>
> >> If, however, you're looking for how to implement this in NumPy or SciPy,
> >> I can't really help :-P
> >> Ian
> >>
> >> _______________________________________________
> >> NumPy-Discussion mailing list
> >> NumPy-Discussion at scipy.org
> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >>
> >
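For my own reference, here is how I would sketch Ian's recipe in NumPy. It
is only a rough sketch: the helper names linear_fit_r2 and predict are mine,
the data are the sample values given further down, and np.polyfit is just
one of several ways to do the regressions. The r^2 of the log-transformed
dependent variable is computed in log space, so comparing it with the other
two is only a rough guide.

```python
import numpy as np

# Sample data quoted later in this message (concentration per size channel).
conc = np.array([119.7681, 118.546, 146.6548, 96.5478, 109.9911, 32.9974,
                 20.7762, 6.1107, 12.2212, 3.6664, 3.6664, 1.2221,
                 2.4443, 2.4443, 3.6664])
sizes = np.linspace(0.1, 3.0, 15)  # uncalibrated, assumed linear spacing


def linear_fit_r2(x, y):
    """Least-squares line y = a*x + b; returns (a, b, r_squared)."""
    a, b = np.polyfit(x, y, 1)
    residuals = y - (a * x + b)
    r2 = 1.0 - np.sum(residuals**2) / np.sum((y - y.mean())**2)
    return a, b, r2


# The three candidate regressions from the recipe.
fits = {
    "linear": linear_fit_r2(sizes, conc),
    "log_y": linear_fit_r2(sizes, np.log(conc)),
    "log_x": linear_fit_r2(np.log(sizes), conc),
}
best = max(fits, key=lambda name: fits[name][2])


def predict(name, size):
    """Back-transform the chosen regression to a concentration estimate."""
    a, b, _ = fits[name]
    if name == "linear":
        return a * size + b
    if name == "log_y":
        return np.exp(a * size + b)
    return a * np.log(size) + b  # log_x


for name, (a, b, r2) in fits.items():
    print(f"{name:7s} r^2 = {r2:.3f}")
print("best:", best, " estimate at 0.05 um:", predict(best, 0.05))
```
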
> > OK, before applying your suggestions, I have a few more questions. Here is
> > one real data sample that I will use as part of the log-normal fitting.
> > There are 15 elements in this array, each being the concentration for the
> > corresponding size range between 0.1 and 3.0 um.
> >
> > I[74]: conc
> > O[74]:
> > array([ 119.7681,  118.546 ,  146.6548,   96.5478,  109.9911,   32.9974,
> >          20.7762,    6.1107,   12.2212,    3.6664,    3.6664,    1.2221,
> >           2.4443,    2.4443,    3.6664])
> >
> > Since the size ranges are not calibrated yet, for now I just assume a
> > linearly spaced array:
> >
> > I[78]: sizes = linspace(0.1, 3.0, 15)
> >
> > I[79]: sizes
> > O[79]:
> > array([ 0.1       ,  0.30714286,  0.51428571,  0.72142857,  0.92857143,
> >         1.13571429,  1.34285714,  1.55      ,  1.75714286,  1.96428571,
> >         2.17142857,  2.37857143,  2.58571429,  2.79285714,  3.        ])
> >
> >
> > Not a very ideal looking log-normal, but so far I don't know what else
> > besides a log-normal fit would give me a better estimate:
> > I[80]: figure(); plot(sizes, conc)
> > http://img406.imageshack.us/img406/156/sizeconc.png
> >
> > scipy.stats has the lognorm.fit
> >
> >     lognorm.fit(data,s,loc=0,scale=1)
> >         - Parameter estimates for lognorm data
> >
> > I tried applying this to my data. However, I am not sure of the right way
> > to call it, or whether it can be applied to my case at all:
> >
> > I[81]: stats.lognorm.fit(conc)
> > O[81]: array([ 2.31386066,  1.19126064,  9.5748391 ])
> >
> > Lastly, what is the way to create an ideal log-normal sample using
> > stats.lognorm.rvs?
>
>
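As a partial answer to my own last question, the following is a minimal
sketch of drawing an ideal log-normal sample and fitting it back. The
parameter values (sigma_true, median_true) are arbitrary assumptions, and I
am assuming a SciPy version in which rvs accepts random_state and fit
accepts floc.

```python
import numpy as np
from scipy import stats

# Assumed "true" parameters, chosen only for illustration.
sigma_true = 0.5    # shape s: standard deviation of log(x)
median_true = 0.4   # scale: exp(mean of log(x)), i.e. the median

# Draw an ideal log-normal sample; random_state makes the draw repeatable.
sample = stats.lognorm.rvs(sigma_true, loc=0, scale=median_true,
                           size=5000, random_state=0)

# fit() returns (shape, loc, scale); floc=0 pins the location at zero.
shape, loc, scale = stats.lognorm.fit(sample, floc=0)
print(shape, loc, scale)
```

Since fit returns (shape, loc, scale) in that order, the three numbers in
O[81] above should presumably be read the same way.
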
R. Kern has nicely summarized my intention. Let me add some more onto his
description.


> I don't think I understand the connection to the log-normal distribution.
> You seem to have a non-linear relationship
> conc = f(size)  where you want to find a non-linear relationship f
>

Here I am quoting directly from one of my cloud physics books:

"Once a discrete model size distribution has been laid out, the initial
particle number, volume, and mass concentrations must be distributed among
model size bins. This can be accomplished by fitting measurements to a
continuous size distribution, then discretizing the continuous distribution
over the model bins. Three continuous distributions available for this
procedure are the lognormal, Marshall–Palmer, and modified gamma
distributions."
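
To make the "discretizing" step concrete, here is a minimal sketch of one
way it could be done: take the probability mass of the continuous
distribution between the edges of each bin. The total number, geometric
mean diameter and geometric standard deviation below are made-up values,
and the book may well use a different scheme.

```python
import numpy as np
from scipy import stats

# Hypothetical continuous distribution; all three values are assumptions.
n_total = 700.0   # total number concentration, #/cm^3
d_g = 0.4         # geometric mean diameter, um
sigma_g = 2.0     # geometric standard deviation (dimensionless)

# SciPy's lognorm uses s = ln(sigma_g) and scale = median diameter.
dist = stats.lognorm(s=np.log(sigma_g), loc=0, scale=d_g)

# 15 model bins between 0.1 and 3.0 um need 16 edges.
edges = np.linspace(0.1, 3.0, 16)

# Number in each bin = total number * probability mass between its edges.
bin_conc = n_total * np.diff(dist.cdf(edges))
print(bin_conc)
```
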

My data are discrete by nature, since I have only 15 channels between 0.1
and 3.0 um. Say (from the sample data in my previous e-mail) the first
channel covers 0.10 to 0.31 um, and I read the number concentration for this
size range as 119.77 #/cm^3, and so on.

Since I am interested in estimating the number concentrations below 0.1 um
(preferably down to 0.01 um, or 10 nm), I would like to fit a continuous
distribution to my dataset. Of the three continuous distributions, the
lognormal seems to be the easiest to implement, and it is commonly used to
represent aerosol size distributions in the atmosphere. If there is a way to
do this discretely, I would very much like to know.
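
What I have in mind is roughly the following sketch: treat the 15 channel
concentrations as points on a curve with the shape of a scaled log-normal
pdf, fit that curve with scipy.optimize.curve_fit, and evaluate it below
0.1 um. The function name lognormal_curve, the starting guess p0 and the
bounds are my assumptions. Using the linearly spaced sizes as channel
positions is also an assumption, and the extrapolated values depend
entirely on the log-normal shape being the right one.

```python
import numpy as np
from scipy import stats
from scipy.optimize import curve_fit

conc = np.array([119.7681, 118.546, 146.6548, 96.5478, 109.9911, 32.9974,
                 20.7762, 6.1107, 12.2212, 3.6664, 3.6664, 1.2221,
                 2.4443, 2.4443, 3.6664])
sizes = np.linspace(0.1, 3.0, 15)  # uncalibrated, assumed linear spacing


def lognormal_curve(d, amplitude, mu, sigma):
    """Scaled log-normal pdf with median exp(mu) and shape sigma."""
    return amplitude * stats.lognorm.pdf(d, sigma, loc=0, scale=np.exp(mu))


# Rough starting guess (assumption): median near 0.5 um, moderate width.
p0 = (150.0, np.log(0.5), 0.8)
# Keep the amplitude non-negative and the shape strictly positive.
bounds = ([0.0, -np.inf, 1e-3], [np.inf, np.inf, np.inf])
params, cov = curve_fit(lognormal_curve, sizes, conc, p0=p0, bounds=bounds)

# Evaluate the fitted curve below the smallest measured size.
small = np.array([0.01, 0.05])
print("fitted amplitude, mu, sigma:", params)
print("extrapolated conc at", small, "um:", lognormal_curve(small, *params))
```
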


>
> If conc were just lognormal distributed, then you would not get any
> relationship between conc and size.
>
> If you have many observations with conc, size pairs then you could
> estimate a noisy model
> conc = f(size) + u  where the noise u is for example log-normal
> distributed but you would still need to get an expression for the
> non-linear function f.
>

I don't understand why I can't get a relation between sizes and conc values
if conc is log-normally distributed. Could you elaborate on this a bit more?
The non-linear relationship part is also confusing me. To test for a linear
relationship between x and y data pairs we just fit a line; in this case what
I am looking for is to fit a log-normal to get a relation between size and
conc.



> Extending a non-linear function outside of the observed range
> is essentially always just a guess or by assumption.
>

Yes, I am aware of this. I am just trying to put my guesses into a
well-defined form, so that when I describe the analysis process I will be
able to tell others that this extrapolation is the result of a log-normal
fit.


>
> If you want to fit a curve f that has the same shape as the pdf of
> the log-normal, then you cannot do it with lognorm.fit, because that
> just assumes you have a random sample independent of size.
>

Could you give an example of this?
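
My tentative reading, which may well be wrong, is that a distribution fit
only looks at the conc values themselves and never at which size channel
they belong to. A minimal sketch of that reading, assuming the location is
fixed at 0 so that the maximum-likelihood log-normal fit reduces to the mean
and standard deviation of log(conc) (the helper name lognormal_mle is mine):

```python
import numpy as np

conc = np.array([119.7681, 118.546, 146.6548, 96.5478, 109.9911, 32.9974,
                 20.7762, 6.1107, 12.2212, 3.6664, 3.6664, 1.2221,
                 2.4443, 2.4443, 3.6664])


def lognormal_mle(values):
    """Closed-form log-normal MLE with loc = 0: mean and std of log(values)."""
    logs = np.log(values)
    return logs.mean(), logs.std()


# Same values, but assigned to different size channels.
rng = np.random.RandomState(0)
shuffled = rng.permutation(conc)

print(lognormal_mle(conc))
print(lognormal_mle(shuffled))  # same up to rounding: sizes never enter
```

If that is the point, then shuffling the channels changes the size-conc
curve completely but leaves the fitted parameters unchanged.
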


>
> So, it's not clear to me what you really want, or what your sample data
> looks like (do you have only one 15 element sample or lots of them).
>

I have many samples (thousands), each composed of these 15 elements. But the
data as a whole don't look much different from the sample I used. Most peaks
are around the 3rd or 4th channel, decaying as shown in the figure.


>
> Josef
>





>
>
> >
> > Thanks
> >
> >
> > --
> > Gökhan
> >
> > _______________________________________________
> > SciPy-User mailing list
> > SciPy-User at scipy.org
> > http://mail.scipy.org/mailman/listinfo/scipy-user
> >
> >



-- 
Gökhan