[Offtopic] Line fitting [was Re: Numpy outlier removal]

Steven D'Aprano steve+comp.lang.python at pearwood.info
Tue Jan 8 19:02:11 EST 2013


On Tue, 08 Jan 2013 04:07:08 -0500, Terry Reedy wrote:

>> But that is not fitting a line by eye, which is what I am talking
>> about.
> 
> With the line constrained to go through 0,0 a line eyeballed with a 
> clear ruler could easily be better than either regression line, as a
> human will tend to minimize the deviations *perpendicular to the line*,
> which is the proper thing to do (assuming both variables are measured
> in the same units).

It is conventional to talk about "residuals" rather than deviations.

And it could even more easily be worse than a regression line. And since 
eyeballing is entirely subjective and impossible to objectively verify, 
the line that you claim minimizes the residuals might be very different 
from the line that I claim minimizes the residuals, with no way to decide 
between the two claims.

In any case, there is a technique for least squares line fitting that 
uses perpendicular offsets rather than the vertical offsets of ordinary 
least squares (OLS) linear regression:

http://mathworld.wolfram.com/LeastSquaresFittingPerpendicularOffsets.html

but in general, if you have to care about errors in the independent 
variable, you're better off using a more powerful technique than just OLS.
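
To make the difference between the two kinds of offsets concrete, here is a
minimal sketch, assuming NumPy is available. The function names and the
synthetic data are made up for illustration. The perpendicular fit here takes
the principal direction of the centred data via an SVD, which is one standard
way to minimize perpendicular offsets; it is not necessarily the formula
derived on the MathWorld page.

```python
import numpy as np

def fit_vertical(x, y):
    """Ordinary least squares: minimizes squared vertical offsets."""
    slope, intercept = np.polyfit(x, y, 1)
    return float(slope), float(intercept)

def fit_perpendicular(x, y):
    """Minimizes squared perpendicular offsets (orthogonal regression).

    The best line passes through the centroid and points along the
    direction of greatest variance of the centred data.
    Assumes the fitted line is not vertical (dx != 0).
    """
    xm, ym = x.mean(), y.mean()
    centred = np.column_stack([x - xm, y - ym])
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    dx, dy = vt[0]          # unit vector along the fitted line
    slope = dy / dx
    return float(slope), float(ym - slope * xm)

# Hypothetical data: a true line y = 2x + 1 with noise in BOTH variables.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 50)
x = t + rng.normal(scale=1.0, size=t.size)
y = 2.0 * t + 1.0 + rng.normal(scale=1.0, size=t.size)

print("vertical offsets:     ", fit_vertical(x, y))
print("perpendicular offsets:", fit_perpendicular(x, y))
```

Both functions are deterministic: the same data gives the same line every
time, whichever criterion you choose.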

The point I keep making, that everybody seems to be ignoring, is that 
eyeballing a line of best fit is subjective, unreliable and impossible to 
verify. How could I check that the line you say is the "best fit" 
actually *is* the *best fit* for the given data, given that you picked 
that line by eye? Chances are good that if you came back to the data a 
month later, you'd pick a different line!
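
By contrast, once you agree on an objective criterion, checking a claim is
mechanical. Here is a rough sketch, assuming NumPy, a line constrained to go
through 0,0, and vertical residuals as the criterion. The data points and the
two "eyeballed" slopes are invented for illustration.

```python
import numpy as np

def sum_squared_residuals(x, y, slope, intercept=0.0):
    """Sum of squared vertical residuals of the points about a line."""
    residuals = y - (slope * x + intercept)
    return float(np.sum(residuals ** 2))

# Hypothetical measurements.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

# Two hypothetical slopes picked by eye by two different people.
slope_a = 1.00
slope_b = 1.10

# Least-squares slope for a line through the origin: sum(x*y) / sum(x*x).
slope_best = float(np.dot(x, y) / np.dot(x, x))

for name, slope in [("A", slope_a), ("B", slope_b), ("least squares", slope_best)]:
    print(name, slope, sum_squared_residuals(x, y, slope))
```

Anyone can rerun this and get the same ranking of the candidate lines, which
is exactly what is missing when the only criterion is "it looked best to me".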

As I have said, eyeballing a line is fine for rough back-of-the-envelope 
calculations, where you only care that you have a line pointing more 
or less in the right direction. But for anything where accuracy is 
required, line fitting by eye is down in the pits of things not to do, 
right next to "making up the answers you prefer".



-- 
Steven


