[SciPy-User] Least Square fit and goodness of fit

Benedikt Riedel briedel at wisc.edu
Mon May 17 02:01:07 EDT 2010


Thanks for the clarification. I am still not sure how to get the chi-squared
value of my regression though. When I use the formula under "Regression
Analysis" here

http://en.wikipedia.org/wiki/Goodness_of_fit

I get a chi-square somewhere around 19, which seems way too large compared to
the value of 3.2 I get for the same data set when I fit it using gnuplot,
which supposedly uses the weighted sum of squares of residuals. I do not
fully understand this because of the results I get.

Here is the python code I used:

chi2=(sum(pow(R4ctsdataselect-fitfunc(pinit, tau),
2)/pow(R4errctsdataselect,2)))/dof
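
A minimal sketch of the weighted reduced chi-squared, for comparison with the
expression above. It assumes the fitted parameters (not the starting guess
pinit) are passed to the model, and that the model uses the same parameter
order as the function given to curve_fit; in the code quoted further down,
fitfunc takes the constant first while func takes it second. The data are
synthetic, and the names model, reduced_chi2, x, y and sigma are made up for
illustration only.

```python
import numpy as np
from scipy.optimize import curve_fit


def model(x, a, b):
    # Same functional form as in the thread: a*exp(-x) + b
    return a * np.exp(-x) + b


# Synthetic stand-ins for tau, R4ctsdataselect and R4errctsdataselect
rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 40)
sigma = np.full_like(x, 0.5)
y = model(x, 20.0, 20.0) + rng.normal(0.0, sigma)

p_start = [20.0, 20.0]
popt, pcov = curve_fit(model, x, y, p0=p_start, sigma=sigma)


def reduced_chi2(params):
    # Weighted sum of squared residuals divided by the degrees of freedom
    resid = (y - model(x, *params)) / sigma
    dof = x.size - len(params)
    return np.sum(resid**2) / dof


chi2_fit = reduced_chi2(popt)      # evaluated at the fitted parameters
chi2_init = reduced_chi2(p_start)  # evaluated at the starting guess
print(chi2_fit, chi2_init)
```

Evaluating the same expression at the starting guess cannot give a smaller
value than at the weighted least-squares solution, so a value computed from
pinit may overstate the chi-squared of the actual fit.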

Sorry for being so thick headed, statistics is just beyond me at times.

Cheers,

Ben

On Mon, May 17, 2010 at 00:20, <josef.pktd at gmail.com> wrote:

> On Mon, May 17, 2010 at 12:18 AM, Benedikt Riedel <briedel at wisc.edu>
> wrote:
> >
> >
> > On Sun, May 16, 2010 at 22:33, <josef.pktd at gmail.com> wrote:
> >>
> >> On Sun, May 16, 2010 at 9:05 PM, Benedikt Riedel <briedel at wisc.edu>
> >> wrote:
> >> > What I still do not understand is the fact that curve_fit gives me a
> >> > different output than leastsq, even though curve_fit calls leastsq.
> >> >
> >> > I tried to get the chi-squared because we want to plot contours of
> >> > chi-square from the minimum to the maximum. I used the following code:
> >> >
> >> > fitfunc = lambda p,x: p[0]+ p[1]*exp(-x)
> >> > errfunc = lambda p, x, y: (y-fitfunc(p,x))
> >> > pinit = [20,20.]
> >> >
> >> > def func(x, a, b):
> >> >      return a*exp(-x) + b
> >> >
> >> > pfinal, covar = curve_fit(func,tau, R4ctsdataselect, p0=pinit,
> >> > sigma=R4errctsdataselect)
> >>
> >> this uses weighted least squares
> >> sigma : None or N-length sequence
> >>    If not None, it represents the standard-deviation of ydata. This
> >> vector, if given, will be used as weights in the least-squares problem
> >>
> >> In your initial example with leastsq you don't have any weighting,
> >> it's just ordinary least squares
> >>
> >> maybe that's the difference.
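
The difference described above can be sketched as follows: curve_fit with
sigma should agree with leastsq when the residuals are divided by sigma, and
generally differs from leastsq on the raw residuals. This is a sketch on
synthetic data with made-up names (model, weighted_resid, plain_resid), not
the data from the thread.

```python
import numpy as np
from scipy.optimize import curve_fit, leastsq


def model(x, a, b):
    return a * np.exp(-x) + b


# Synthetic data with error bars that grow with x (an assumption)
rng = np.random.default_rng(1)
x = np.linspace(0.0, 5.0, 40)
sigma = 0.2 + 0.3 * x
y = model(x, 20.0, 20.0) + rng.normal(0.0, sigma)
p0 = [20.0, 20.0]

# Weighted fit through curve_fit
p_cf, _ = curve_fit(model, x, y, p0=p0, sigma=sigma)


def weighted_resid(p):
    # Residuals scaled by the standard deviation of each point
    return (y - model(x, *p)) / sigma


def plain_resid(p):
    # Unweighted residuals, as in the original leastsq call
    return y - model(x, *p)


# Without full_output, leastsq returns (parameters, status flag)
p_w, _ = leastsq(weighted_resid, p0)
p_ols, _ = leastsq(plain_resid, p0)

print(p_cf, p_w, p_ols)
```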
> >>
> >>
> >
> > Yeah I guess that will be it.
> >
> >>
> >> > print pfinal
> >> > print covar
> >> > dof=size(tau)-size(pinit)
> >> > print dof
> >> > chi2=(sum(pow(R4ctsdataselect-fitfunc(pinit, tau), 2)/fitfunc(pinit,
> >> > tau)))/dof
> >> > print chi2
> >> >
> >> > I am not 100% sure I am doing the degrees of freedom calculation right.
> >> > I got the chi-square formula from the Pearson chi-squared test.
> >>
> >> I don't recognize your formula for chi2, and I don't see the
> >> connection to Pearson chi-squared test .
> >>
> >> Do you have a reference?
> >>
> >
> > I based my use of the Pearson test on what I read in an Econometrics book,
> > but wiki has a pretty good description. I basically based it off the
> > example there, where the expected would be what comes out of the fit and
> > what you observe is the "R4ctsdataselect" for those specific values.
> >
> > http://en.wikipedia.org/wiki/Pearson%27s_chi-square_test
>
> I looked at that, but it's a completely different case, the values in
> the formulas are frequencies
>
>    Oi = an observed frequency;
>    Ei = an expected (theoretical) frequency, asserted by the null
> hypothesis;
>
> not points on a regression curve
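
To illustrate the frequency case described above, here is a small sketch of
Pearson's chi-squared statistic on counts, using scipy.stats.chisquare. The
counts are invented (120 throws of a die) and are only meant to show that Oi
and Ei are frequencies in categories rather than points on a fitted curve.

```python
import numpy as np
from scipy.stats import chisquare

# Made-up observed counts for the six faces of a die (120 throws)
observed = np.array([18, 22, 20, 25, 15, 20])
# Expected counts under the null hypothesis of a fair die
expected = np.full(6, observed.sum() / 6)

# Library version: statistic and p-value with k - 1 degrees of freedom
stat, pvalue = chisquare(observed, f_exp=expected)

# The same statistic written out: sum of (Oi - Ei)^2 / Ei
manual = np.sum((observed - expected) ** 2 / expected)

print(stat, manual, pvalue)
```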
>
> Josef
>
> >
> >
> >>
> >> Josef
> >>
> >
> > Thanks again
> >
> > Ben
> >
> >
> >>
> >> >
> >> > Thank you very much for the help so far.
> >> >
> >> > Cheers,
> >> >
> >> > Ben
> >> >
> >> > On Sun, May 16, 2010 at 05:50, <josef.pktd at gmail.com> wrote:
> >> >>
> >> >> On Sun, May 16, 2010 at 12:12 AM, Benedikt Riedel <briedel at wisc.edu>
> >> >> wrote:
> >> >> >
> >> >> >
> >> >> > On Fri, May 14, 2010 at 14:51, <josef.pktd at gmail.com> wrote:
> >> >> >>
> >> >> >> On Fri, May 14, 2010 at 3:01 PM, Benedikt Riedel <briedel at wisc.edu>
> >> >> >> wrote:
> >> >> >> > Hey,
> >> >> >> >
> >> >> >> > I am fairly new to SciPy and am trying to do a least square fit to
> >> >> >> > a set of data. Currently, I am using the following code:
> >> >> >> >
> >> >> >> > fitfunc = lambda p,x: p[0]+ p[1]*exp(-x)
> >> >> >> > errfunc = lambda p, x, y: (y-fitfunc(p,x))
> >> >> >> > pinit = [20,20.]
> >> >> >> > out = leastsq(errfunc, pinit, args=(tau,R4ctsdataselect),
> >> >> >> > full_output=1)
> >> >> >> >
> >> >> >> > I am now trying to get the goodness of fit out of this data. I am
> >> >> >> > sort of running into a brick wall because I found a lot of
> >> >> >> > conflicting ways of how to calculate it.
> >> >> >>
> >> >> >> For regression the usual is
> >> >> >> http://en.wikipedia.org/wiki/Coefficient_of_determination
> >> >> >> coefficient of determination is
> >> >> >>
> >> >> >>    R^2 = 1 - {SS_{err} / SS_{tot}}
> >> >> >>
> >> >> >> Note your fitfunc is linear in parameters and can be better
> >> >> >> estimated by linear least squares, OLS.
> >> >> >> Linear regression is handled in statsmodels and you can get lots of
> >> >> >> statistics without worrying about the formulas.
> >> >> >> If you only have one slope parameter, then scipy.stats.linregress
> >> >> >> also works.
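
Because the model is linear in its parameters, the fit and the R^2 formula
above can be sketched with plain linear least squares. This uses
numpy.linalg.lstsq as a stand-in (not statsmodels or linregress), on synthetic
data with made-up names (X, coef, r_squared).

```python
import numpy as np

# Synthetic data following p0 + p1*exp(-x) with noise (an assumption)
rng = np.random.default_rng(2)
x = np.linspace(0.0, 5.0, 40)
y = 20.0 + 20.0 * np.exp(-x) + rng.normal(0.0, 0.5, x.size)

# Design matrix: a constant column and exp(-x) as the single regressor
X = np.column_stack([np.ones_like(x), np.exp(-x)])

# Ordinary least squares solution for [p0, p1]
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# R^2 = 1 - SS_err / SS_tot, with SS_tot taken around the mean of y
resid = y - X @ coef
ss_err = np.sum(resid**2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1.0 - ss_err / ss_tot

print(coef, r_squared)
```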
> >> >> >>
> >> >> >
> >> >> > Thanks for the information. I am still not quite sure if this is
> >> >> > what my boss wants because there should not be an average y value.
> >> >>
> >> >> The definition of Rsquared is pretty uncontroversial with the y.mean()
> >> >> correction, if there is a constant in the regression (although I know
> >> >> mainly the linear case for this).
> >> >>
> >> >> If there is no constant in the regression, the definition of Rsquared
> >> >> is not clear/unambiguous, but it is usually used without mean
> >> >> correction of y.
> >> >>
> >> >> Josef
> >> >>
> >> >> >
> >> >> >>
> >> >> >> scipy.optimize.curve_fit (scipy 0.8) can also give you the
> >> >> >> covariance of the parameter estimates.
> >> >> >> http://docs.scipy.org/scipy/docs/scipy.optimize.minpack.curve_fit
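
A short sketch of how the covariance returned by curve_fit is commonly turned
into parameter standard errors: take the square root of its diagonal. The data
are synthetic and the names perr and corr are made up for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit


def model(x, a, b):
    return a * np.exp(-x) + b


# Synthetic data with a constant error bar (an assumption)
rng = np.random.default_rng(3)
x = np.linspace(0.0, 5.0, 40)
sigma = np.full_like(x, 0.5)
y = model(x, 20.0, 20.0) + rng.normal(0.0, sigma)

popt, pcov = curve_fit(model, x, y, p0=[20.0, 20.0], sigma=sigma)

# One-standard-deviation uncertainties of a and b
perr = np.sqrt(np.diag(pcov))

# Correlation between the two parameter estimates
corr = pcov[0, 1] / (perr[0] * perr[1])

print(popt, perr, corr)
```

By default curve_fit treats sigma as relative weights and rescales the
covariance by the reduced chi-squared of the fit; recent SciPy versions have
an absolute_sigma option to change that.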
> >> >> >
> >> >> > I have been trying this out, but the fit just looks horrid compared
> >> >> > to using the leastsq method, even though they call the same function
> >> >> > according to the documentation.
> >> >> >
> >> >> >>
> >> >> >> > I am aware of the chisquare function in the stats module, but the
> >> >> >> > documentation seems a little confusing to me. Any help would be
> >> >> >> > greatly appreciated.
> >> >> >>
> >> >> >> chisquare and others like kolmogorov-smirnov are more for testing
> >> >> >> the goodness-of-fit of entire distributions, not for how well a
> >> >> >> curve or line fits the data.
> >> >> >>
> >> >> >
> >> >> > That is what I thought, which brought up my confusion when I asked
> >> >> > other people and they told me to use that.
> >> >> >
> >> >> >>
> >> >> >> Josef
> >> >> >>
> >> >> >> >
> >> >> >> > Thanks very much in advance.
> >> >> >> >
> >> >> >> > Cheers,
> >> >> >> >
> >> >> >> > Ben
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> > _______________________________________________
> >> >> >> > SciPy-User mailing list
> >> >> >> > SciPy-User at scipy.org
> >> >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user
> >> >> >> >
> >> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> > --
> >> >> > Benedikt Riedel
> >> >> > Graduate Student University of Wisconsin-Madison
> >> >> > Department of Physics
> >> >> > Office: 2304 Chamberlin Hall
> >> >> > Lab: 6247 Chamberlin Hall
> >> >> > Tel:  (608) 301-5736
> >> >> > Cell: (213) 519-1771
> >> >> > Lab: (608) 262-5916



-- 
Benedikt Riedel
Graduate Student University of Wisconsin-Madison
Department of Physics
Office: 2304 Chamberlin Hall
Lab: 6247 Chamberlin Hall
Tel:  (608) 301-5736
Cell: (213) 519-1771
Lab: (608) 262-5916