[SciPy-User] Least Square fit and goodness of fit

josef.pktd at gmail.com josef.pktd at gmail.com
Mon May 17 07:35:27 EDT 2010


On Mon, May 17, 2010 at 2:01 AM, Benedikt Riedel <briedel at wisc.edu> wrote:
> Thanks for the clarification. I am still not sure how to get the chi-squared
> value of my regression though. When I use the formula under "Regression
> Analysis" here
>
> http://en.wikipedia.org/wiki/Goodness_of_fit
>
> I get a chi-square somewhere around 19, which seems way too large compared to
> the value of 3.2 I get for the same data set when I fit it using gnuplot.
> gnuplot supposedly uses the weighted sum of squared residuals. I do
> not fully understand this, given the results I get.
>
> Here is the python code I used:
>
> chi2=(sum(pow(R4ctsdataselect-fitfunc(pinit, tau),
> 2)/pow(R4errctsdataselect,2)))/dof


from some gnuplot help page it looks like what they call chisquare is WSSR/dof

which would be something like

chi2 = sum(((R4ctsdataselect - fitfunc(pinit, tau)) /
            sqrt(R4errctsdataselect))**2) / dof

I'm not sure whether the sqrt is in there or not, because I don't
remember the normalization that is used, weights or weights squared
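
A minimal sketch of WSSR/dof, under a few assumptions that are not from the
thread: the error array holds one standard deviation per point (so each
residual is divided by sigma itself; a sqrt would only apply if the array held
variances), the data values below are made up, and the helper name
reduced_chi2 is hypothetical. Because fitfunc is linear in its parameters,
the sketch uses weighted linear least squares from numpy in place of
leastsq/curve_fit, and it evaluates chi2 at the fitted parameters as well as
at the starting guess for comparison.

```python
import numpy as np

def reduced_chi2(y, y_model, sigma, n_params):
    """Weighted sum of squared residuals (WSSR) divided by degrees of freedom."""
    y, y_model, sigma = (np.asarray(a, dtype=float) for a in (y, y_model, sigma))
    resid = (y - y_model) / sigma   # each residual in units of its own error
    wssr = np.sum(resid**2)         # square each term first, then sum
    dof = y.size - n_params         # number of points minus fitted parameters
    return wssr / dof

# Made-up data, only to exercise the function (not the data from the thread)
tau = np.array([0.0, 0.5, 1.0, 2.0, 4.0])
sigma = np.array([1.0, 1.0, 0.5, 0.5, 0.5])
y = np.array([41.0, 31.5, 27.0, 23.0, 20.5])

# Model p[0] + p[1]*exp(-x) is linear in p: build the design matrix and
# divide each row by sigma to turn the weighted problem into an ordinary one.
X = np.column_stack([np.ones_like(tau), np.exp(-tau)])
p_fit, *_ = np.linalg.lstsq(X / sigma[:, None], y / sigma, rcond=None)

chi2_fit = reduced_chi2(y, X @ p_fit, sigma, n_params=2)
chi2_init = reduced_chi2(y, X @ np.array([20.0, 20.0]), sigma, n_params=2)
print(p_fit, chi2_fit, chi2_init)
```

The value at the starting guess can never be smaller than the value at the
weighted least-squares solution, so comparing the two is a quick check that
the statistic is being evaluated at the parameters you intend.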

Josef




>
> Sorry for being so thick headed, statistics is just beyond me at times.
>
> Cheers,
>
> Ben
>
> On Mon, May 17, 2010 at 00:20, <josef.pktd at gmail.com> wrote:
>>
>> On Mon, May 17, 2010 at 12:18 AM, Benedikt Riedel <briedel at wisc.edu>
>> wrote:
>> >
>> >
>> > On Sun, May 16, 2010 at 22:33, <josef.pktd at gmail.com> wrote:
>> >>
>> >> On Sun, May 16, 2010 at 9:05 PM, Benedikt Riedel <briedel at wisc.edu>
>> >> wrote:
>> >> > What I still do not understand is the fact that curve_fit gives me a
>> >> > different output than leastsq, even though curve_fit calls leastsq.
>> >> >
>> >> > I tried to get the chi-squared because we want to plot contours of
>> >> > chi-square from the minimum to the maximum. I used the following code:
>> >> >
>> >> > fitfunc = lambda p,x: p[0]+ p[1]*exp(-x)
>> >> > errfunc = lambda p, x, y: (y-fitfunc(p,x))
>> >> > pinit = [20,20.]
>> >> >
>> >> > def func(x, a, b):
>> >> >      return a*exp(-x) + b
>> >> >
>> >> > pfinal, covar = curve_fit(func,tau, R4ctsdataselect, p0=pinit,
>> >> > sigma=R4errctsdataselect)
>> >>
>> >> this uses weighted least squares
>> >> sigma : None or N-length sequence
>> >>    If not None, it represents the standard-deviation of ydata. This
>> >> vector, if given, will be used as weights in the least-squares problem
>> >>
>> >> In your initial example with leastsq you don't have any weighting,
>> >> it's just ordinary least squares
>> >>
>> >> maybe that's the difference.
>> >>
>> >>
>> >
>> > Yeah I guess that will be it.
>> >
>> >>
>> >> > print pfinal
>> >> > print covar
>> >> > dof=size(tau)-size(pinit)
>> >> > print dof
>> >> > chi2=(sum(pow(R4ctsdataselect-fitfunc(pinit, tau), 2)/fitfunc(pinit,
>> >> > tau)))/dof
>> >> > print chi2
>> >> >
>> >> > I am not 100% sure I am doing the degrees of freedom calculation
>> >> > right. I got the chi-square formula from the Pearson chi-squared test.
>> >>
>> >> I don't recognize your formula for chi2, and I don't see the
>> >> connection to the Pearson chi-squared test.
>> >>
>> >> Do you have a reference?
>> >>
>> >
>> > I based my use of the Pearson test on what I read in an econometrics
>> > book, but the wiki has a pretty good description. I basically based it
>> > off the example there, where the expected values would be what comes
>> > out of the fit and the observed values are "R4ctsdataselect" at those
>> > specific points.
>> >
>> > http://en.wikipedia.org/wiki/Pearson%27s_chi-square_test
>>
>> I looked at that, but it's a completely different case, the values in
>> the formulas are frequencies
>>
>>    Oi = an observed frequency;
>>    Ei = an expected (theoretical) frequency, asserted by the null
>> hypothesis;
>>
>> not points on a regression curve
>>
>> Josef
>>
>> >
>> >
>> >>
>> >> Josef
>> >>
>> >
>> > Thanks again
>> >
>> > Ben
>> >
>> >
>> >>
>> >> >
>> >> > Thank you very much for the help so far.
>> >> >
>> >> > Cheers,
>> >> >
>> >> > Ben
>> >> >
>> >> > On Sun, May 16, 2010 at 05:50, <josef.pktd at gmail.com> wrote:
>> >> >>
>> >> >> On Sun, May 16, 2010 at 12:12 AM, Benedikt Riedel <briedel at wisc.edu>
>> >> >> wrote:
>> >> >> >
>> >> >> >
>> >> >> > On Fri, May 14, 2010 at 14:51, <josef.pktd at gmail.com> wrote:
>> >> >> >>
>> >> >> >> On Fri, May 14, 2010 at 3:01 PM, Benedikt Riedel
>> >> >> >> <briedel at wisc.edu>
>> >> >> >> wrote:
>> >> >> >> > Hey,
>> >> >> >> >
>> >> >> >> > I am fairly new to SciPy and am trying to do a least-squares fit to a
>> >> >> >> > set of data. Currently, I am using the following code:
>> >> >> >> >
>> >> >> >> > fitfunc = lambda p,x: p[0]+ p[1]*exp(-x)
>> >> >> >> > errfunc = lambda p, x, y: (y-fitfunc(p,x))
>> >> >> >> > pinit = [20,20.]
>> >> >> >> > out = leastsq(errfunc, pinit, args=(tau,R4ctsdataselect),
>> >> >> >> > full_output=1)
>> >> >> >> >
>> >> >> >> > I am now trying to get the goodness of fit out of this data. I am
>> >> >> >> > sort of running into a brick wall because I found a lot of conflicting
>> >> >> >> > ways to calculate it.
>> >> >> >>
>> >> >> >> For regression the usual is
>> >> >> >> http://en.wikipedia.org/wiki/Coefficient_of_determination
>> >> >> >> coefficient of determination is
>> >> >> >>
>> >> >> >>    R^2 = 1 - {SS_{err} / SS_{tot}}
>> >> >> >>
>> >> >> >> Note your fitfunc is linear in parameters and can be better estimated
>> >> >> >> by linear least squares, OLS.
>> >> >> >> Linear regression is handled in statsmodels and you can get lots of
>> >> >> >> statistics without worrying about the formulas.
>> >> >> >> If you only have one slope parameter, then scipy.stats.linregress also
>> >> >> >> works.
>> >> >> >>
>> >> >> >
>> >> >> > Thanks for the information. I am still not quite sure if this is what
>> >> >> > my boss wants because there should not be an average y value.
>> >> >>
>> >> >> The definition of Rsquared is pretty uncontroversial with the y.mean()
>> >> >> correction, if there is a constant in the regression (although I know
>> >> >> mainly the linear case for this).
>> >> >>
>> >> >> If there is no constant in the regression, the definition of Rsquared
>> >> >> is not clear/unambiguous, but it is usually used without mean
>> >> >> correction of y.
>> >> >>
>> >> >> Josef
>> >> >>
>> >> >> >
>> >> >> >>
>> >> >> >> scipy.optimize.curve_fit (scipy 0.8) can also give you the covariance
>> >> >> >> of the parameter estimates.
>> >> >> >> http://docs.scipy.org/scipy/docs/scipy.optimize.minpack.curve_fit
>> >> >> >
>> >> >> > I have been trying this out, but the fit just looks horrid compared to
>> >> >> > using leastsq method even though they call the same function according
>> >> >> > to the documentation.
>> >> >> >
>> >> >> >>
>> >> >> >> > I am aware of the chisquare function in the stats module, but the
>> >> >> >> > documentation seems a little confusing to me. Any help would be
>> >> >> >> > greatly appreciated.
>> >> >> >>
>> >> >> >> chisquare and others like kolmogorov-smirnov are more for testing the
>> >> >> >> goodness-of-fit of entire distributions, not for how well a curve or
>> >> >> >> line fits the data.
>> >> >> >>
>> >> >> >
>> >> >> > That is what I thought, which brought up my confusion when I asked other
>> >> >> > people and they told me to use that
>> >> >> >
>> >> >> >>
>> >> >> >> Josef
>> >> >> >>
>> >> >> >> >
>> >> >> >> > Thanks very much in advance.
>> >> >> >> >
>> >> >> >> > Cheers,
>> >> >> >> >
>> >> >> >> > Ben
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > _______________________________________________
>> >> >> >> > SciPy-User mailing list
>> >> >> >> > SciPy-User at scipy.org
>> >> >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user
>> >> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > --
>> >> >> > Benedikt Riedel
>> >> >> > Graduate Student University of Wisconsin-Madison
>> >> >> > Department of Physics
>> >> >> > Office: 2304 Chamberlin Hall
>> >> >> > Lab: 6247 Chamberlin Hall
>> >> >> > Tel:  (608) 301-5736
>> >> >> > Cell: (213) 519-1771
>> >> >> > Lab: (608) 262-5916


