[SciPy-User] How to fit a curve/function?

Wed Jun 8 11:37:00 EDT 2011

On Wed, Jun 8, 2011 at 10:56 AM, Johannes Radinger <JRadinger at gmx.at> wrote:
>
> -------- Original-Nachricht --------
>> Datum: Wed, 8 Jun 2011 10:33:45 -0400
>> Von: josef.pktd at gmail.com
>> An: SciPy Users List <scipy-user at scipy.org>
>> Betreff: Re: [SciPy-User] How to fit a curve/function?
>
>> On Wed, Jun 8, 2011 at 10:27 AM, Johannes Radinger <JRadinger at gmx.at>
>> wrote:
>> >
>> > -------- Original-Nachricht --------
>> >> Datum: Wed, 8 Jun 2011 10:12:58 -0400
>> >> Von: josef.pktd at gmail.com
>> >> An: SciPy Users List <scipy-user at scipy.org>
>> >> Betreff: Re: [SciPy-User] How to fit a curve/function?
>> >
>> >> On Wed, Jun 8, 2011 at 9:41 AM, Johannes Radinger <JRadinger at gmx.at>
>> >> wrote:
>> >> >
>> >> > -------- Original-Nachricht --------
>> >> >> Datum: Wed, 8 Jun 2011 07:10:38 -0400
>> >> >> Von: josef.pktd at gmail.com
>> >> >> An: SciPy Users List <scipy-user at scipy.org>
>> >> >> Betreff: Re: [SciPy-User] How to fit a curve/function?
>> >> >
>> >> >> On Wed, Jun 8, 2011 at 6:52 AM, Johannes Radinger <JRadinger at gmx.at>
>> >> >> wrote:
>> >> >> > Hello,
>> >> >> >
>> >> >> > I've got following function describing any kind of animal
>> dispersal
>> >> >> kernel:
>> >> >> >
>> >> >> > def pdf(x,s1,s2):
>> >> >> >    return
>> >> >>
>> >>
>> (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(s2*math.sqrt(2*math.pi))*numpy.exp(-((x-0)**(2)/(2*s2**(2)))))
>> >> >> >
>> >> >> > On the other hand I've got data from literature with which I want
>> to
>> >> fit
>> >> >> the function so that I get s1, s2 and x.
>> >> >> > Ususally the data in the literature are as follows:
>> >> >> >
>> >> >> > Example 1: 50% of the animals are between -270m and +270m and 90%
>> >>  are
>> >> >> between -500m and + 500m
>> >> >> >
>> >> >> > Example 2: 84% is between - 5000 m and +5000m, and 73% are between
>> >> >> -1000m and +1000m
>> >> >> >
>> >> >> > So far as I understand an integration of the function is needed to
>> >> solve
>> >> >> for s1 and s2 as all the literature data give percentage (area under
>> >> the
>> >> >> curve) Can that be used to fit the curve or can that create ranges
>> for
>> >> s1
>> >> >> and s2.
>> >> >>
>> >> >> I don't see a way around integration.
>> >> >>
>> >> >> If you have exactly 2 probabilities, then you can you a solver like
>> >> >> scipy.optimize.fsolve to match the probabilites
>> >> >> eg.
>> >> >> 0.5 = integral pdf from -270 to 270
>> >> >> 0.9 = integral pdf from -500 to 500
>> >> >>
>> >> >> If you have more than 2 probabilities, then using optimization of a
>> >> >> weighted function of the moment conditions would be better.
>> >> >>
>> >> >> Josef
>> >> >
>> >> >
>> >> >
>> >> > Hello again
>> >> >
>> >> > I tried following, but without success so far. What do I have to do
>> >> excactly...
>> >> >
>> >> > import numpy
>> >> > from scipy import stats
>> >> > from scipy import integrate
>> >> > from scipy.optimize import fsolve
>> >> > import math
>> >> >
>> >> > p=0.3
>> >> >
>> >> > def pdf(x,s1,s2):
>> >> >    return
>> >>
>> (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(s2*math.sqrt(2*math.pi))*numpy.exp(-((x-0)**(2)/(2*s2**(2)))))
>> >> >
>> >> > def equ(s1,s2):
>> >> >    0.5==integrate.quad(pdf,-270,270,args=(s1,s2))
>> >> >    0.9==integrate.quad(pdf,-500,500,args=(s1,s2))
>> >> >
>> >> > result=fsolve(equ, 1,500)
>> >> >
>> >> > print result
>> >>
>> >> equ needs to return the deviation of the equations (I changed some
>> >> details for s1 just to try it)
>> >>
>> >> import numpy
>> >> from scipy import stats
>> >> from scipy import integrate
>> >> from scipy.optimize import fsolve
>> >> import math
>> >>
>> >> p=0.3
>> >>
>> >> def pdf(x,s1,s2):
>> >>     return
>> >>
>> (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(math.sqrt(2*math.pi*s2**2))*numpy.exp(-((x-0)**(2)/(2*s2**(2)))))
>> >>
>> >> def equ(arg):
>> >>     s1,s2 = numpy.abs(arg)
>> >>     cond1 = 0.5 - integrate.quad(pdf,-270,270,args=(s1,s2))[0]
>> >>     cond2 = 0.9 - integrate.quad(pdf,-500,500,args=(s1,s2))[0]
>> >>     return [cond1, cond2]
>> >>
>> >> result=fsolve(equ, [200., 1200])
>
> thank you for your last reply...seems that the parameters of the two normals are nearly identical... anyway just two small addtional questions:
>
> 1)in fsolve(equ, [200., 1200]) the 200 and 1200 are kind of start values so far as I understand...how should these be choosen? what is recommended?

There is no general solution for choosing starting values, in your
case it should be possible to

>>> q = np.array([0.5, 0.9])
>>> cr = x/stats.norm.ppf(0.5 + q/2.)
>>> x = [270, 500]
>>> q = np.array([0.5, 0.9])
>>> x = [270, 500]
>>> cr = x/stats.norm.ppf(0.5 + q/2.)
>>> stats.norm.cdf(500, scale=cr[1]) - stats.norm.cdf(-500, scale=cr[1])
0.89999999999999991
>>> stats.norm.cdf(q[0], scale=cr[1]) - stats.norm.cdf(-q[0], scale=cr[0])
0.0011545021185267457
>>> stats.norm.cdf(q[0], scale=cr[0]) - stats.norm.cdf(-q[0], scale=cr[0])
0.000996601515122153
>>> stats.norm.cdf(x[0], scale=cr[0]) - stats.norm.cdf(-x[0], scale=cr[0])
0.5
>>> sol = fsolve(equ, np.sort(cr))

there are some numerical problems finding the solution (???)

>>> equ(sol)
array([-0.05361093,  0.05851309])
>>> from pprint import pprint
>>> pprint(fsolve(equ, np.sort(cr), xtol=1e-10, full_output=1))
(array([ 354.32616549,  354.69918062]),
 {'fjac': array([[-0.7373189 , -0.67554484],
       [ 0.67554484, -0.7373189 ]]),
  'fvec': array([-0.05361093,  0.05851309]),
  'nfev': 36,
  'qtf': array([  1.40019135e-07,  -7.93593929e-02]),
  'r': array([ -5.21390161e-04,  -1.21700831e-03,   3.88274320e-07])},
 5,
 'The iteration is not making good progress, as measured by the \n
improvement from the last ten iterations.')

>
> 2) How can that be solve if I have I third condition (overfitted) can that be used as well or how does the alternative look like?

use optimize.leastsq on equ (I never tried this for this case)
use fmin on the sum of squared errors

if the intervals for the probabilities are non-overlapping (interval
data), then there is an optimal weighting matrix, (but my code for
that in the statsmodels.sandbox is not verified).

Josef

>
> /johannes
>
>> >>
>> >> print result
>> >>
>> >> but in the results I get the parameters are very close to each other
>> >> [-356.5283675   353.82544075]
>> >>
>> >> the pdf looks just like a mixture of 2 normals both with loc=0, then
>> >> maybe the cdf of norm can be used directly
>> >
>> >
>> > Thank you for that hint... First yes these are 2 superimposed normals
>> but for other reasons I want to use the original formula instead of the
>> stats.functions...
>> >
>> > anyway there is still a thing...the locator s1 and s2 are like the scale
>> parameter of stats.norm so the are both + and -. For fsolve above it seems
>> that I get only one parameter (s1 or s2) but for the positive and negative
>> side of the distribution. So in actually there are four parameters -s1,
>> +s1, -s2, +s2. How can I solve that? Maybe I can restrict the fsolve to look
>> for the two values only in the positive range...
>>
>> It doesn't really matter, if the scale only shows up in quadratic
>> terms, or as in my initial change I added a absolute value, so whether
>> it's positive or negative, it's still only one value, and we
>> interprete it as postive scale
>>
>> s1 = sqrt(s1**2)
>>
>> Josef
>>
>> >
>> > any guesses?
>> >
>> > /J
>> >
>> >>
>> >> >>> from scipy import stats
>> >> >>> stats.norm.cdf(270, scale=350) - stats.norm.cdf(-270, scale=350)
>> >> 0.55954705470577526
>> >> >>>
>> >> >>> stats.norm.cdf(270, scale=354) - stats.norm.cdf(-270, scale=354)
>> >> 0.55436474670960978
>> >> >>> stats.norm.cdf(500, scale=354) - stats.norm.cdf(-500, scale=354)
>> >> 0.84217642881921018
>> >>
>> >> Josef
>> >> >
>> >> >
>> >> > /Johannes
>> >> >>
>> >> >> >
>> >> >> > /Johannes
>> >> >> >
>> >> >> > --
>> >> >> > NEU: FreePhone - kostenlos mobil telefonieren!
>> >> >> > Jetzt informieren: http://www.gmx.net/de/go/freephone
>> >> >> > _______________________________________________
>> >> >> > SciPy-User mailing list
>> >> >> > SciPy-User at scipy.org
>> >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user
>> >> >> >
>> >> >> _______________________________________________
>> >> >> SciPy-User mailing list
>> >> >> SciPy-User at scipy.org
>> >> >> http://mail.scipy.org/mailman/listinfo/scipy-user
>> >> >
>> >> > --
>> >> > NEU: FreePhone - kostenlos mobil telefonieren!
>> >> > Jetzt informieren: http://www.gmx.net/de/go/freephone
>> >> > _______________________________________________
>> >> > SciPy-User mailing list
>> >> > SciPy-User at scipy.org
>> >> > http://mail.scipy.org/mailman/listinfo/scipy-user
>> >> >
>> >> _______________________________________________
>> >> SciPy-User mailing list
>> >> SciPy-User at scipy.org
>> >> http://mail.scipy.org/mailman/listinfo/scipy-user
>> >
>> > --
>> > NEU: FreePhone - kostenlos mobil telefonieren!
>> > Jetzt informieren: http://www.gmx.net/de/go/freephone
>> > _______________________________________________
>> > SciPy-User mailing list
>> > SciPy-User at scipy.org
>> > http://mail.scipy.org/mailman/listinfo/scipy-user
>> >
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>
> --
> NEU: FreePhone - kostenlos mobil telefonieren!
> Jetzt informieren: http://www.gmx.net/de/go/freephone
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>