
∂
D
––
∂
b
∂
D
––
∂
a
306 least squares method
interest in a scientific study (for example, the average
shoe size of adults might be linearly correlated to
height), then one can seek the equation of a line that
best fits the data. Such a line is called a regression line.
Specifically, if a study produces $N$ pairs of data values,
$(x_1, y_1), \ldots, (x_N, y_N)$, then one seeks a linear equation
$y = ax + b$ that minimizes the total deviation of data
points from that line. This total deviation could be
measured as a sum of absolute values:
$$|y_1 - (ax_1 + b)| + |y_2 - (ax_2 + b)| + \cdots + |y_N - (ax_N + b)|$$
(yielding the least absolute deviations criterion; the
Chebyshev approximation criterion, by contrast, minimizes
the largest single absolute deviation), but this quantity is
difficult to analyze using the techniques of CALCULUS. (The
ABSOLUTE VALUE function is not differentiable at zero.)
Another measure of total deviation is the sum of all
the individual deviations squared, which, again, is a
sum of nonnegative quantities:
$$D = (y_1 - (ax_1 + b))^2 + (y_2 - (ax_2 + b))^2 + \cdots + (y_N - (ax_N + b))^2$$
The task is to choose values for $a$ and $b$ that minimize
this sum. This is called the least squares criterion.
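To make the two measures of total deviation concrete, the following minimal Python sketch evaluates both for a candidate line $y = ax + b$. The function names and the four data points are invented purely for illustration; they do not come from any study.

```python
def abs_deviation(points, a, b):
    """Sum of |y_i - (a*x_i + b)| over all data points."""
    return sum(abs(y - (a * x + b)) for x, y in points)


def squared_deviation(points, a, b):
    """Sum of (y_i - (a*x_i + b))**2 over all data points: the quantity D."""
    return sum((y - (a * x + b)) ** 2 for x, y in points)


# Hypothetical data, chosen only to illustrate the calculation.
points = [(1, 2), (2, 3), (3, 5), (4, 4)]

# Compare two candidate lines y = a*x + b under both measures.
for a, b in [(1.0, 1.0), (0.8, 1.5)]:
    print(a, b, abs_deviation(points, a, b), squared_deviation(points, a, b))
```

For these values the line $y = 0.8x + 1.5$ has the smaller sum of squares (about 1.8 against 2), while the line $y = x + 1$ has the smaller sum of absolute values (2 against about 2.2), so the two criteria need not agree on which line fits best.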
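A necessary condition for $D$ to adopt a minimal
value is that the two partial derivatives $\partial D/\partial a$ and
$\partial D/\partial b$ equal zero, yielding the two normal equations:
$$\frac{\partial D}{\partial a} = -2\sum_{i=1}^{N} (y_i - ax_i - b)\,x_i = 0 \;\Rightarrow\; a\sum_{i=1}^{N} x_i^2 + b\sum_{i=1}^{N} x_i = \sum_{i=1}^{N} x_i y_i$$
$$\frac{\partial D}{\partial b} = -2\sum_{i=1}^{N} (y_i - ax_i - b) = 0 \;\Rightarrow\; a\sum_{i=1}^{N} x_i + Nb = \sum_{i=1}^{N} y_i$$
Dividing through by $N$ and solving for $a$ (the slope)
and $b$ (the intercept), we obtain:
$$a = \frac{\frac{1}{N}\sum_{i=1}^{N} x_i y_i - \bar{x} \cdot \bar{y}}{\frac{1}{N}\sum_{i=1}^{N} x_i^2 - \bar{x}^2}$$
and
$$b = \bar{y} - a \cdot \bar{x}$$
where $\bar{x}$ is the mean $x$-value, and $\bar{y}$ is the mean $y$-value.
Setting:
$$S_{xx} = \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \bar{x}^2$$
(this is the VARIANCE of the $x$-values) and
$$S_{xy} = \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y}) = \frac{1}{N}\sum_{i=1}^{N} x_i y_i - \bar{x} \cdot \bar{y}$$
(the COVARIANCE of the two variables), these formulae
can be more compactly written: $a = \frac{S_{xy}}{S_{xx}}$ and $b = \bar{y} - \frac{S_{xy}}{S_{xx}}\,\bar{x}$.
Thus the least squares method gives the
equation for the line of best fit as:
$$y - \bar{y} = \frac{S_{xy}}{S_{xx}}\,(x - \bar{x})$$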
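As an illustration, the slope and intercept can be computed directly from the compact formulae $a = S_{xy}/S_{xx}$ and $b = \bar{y} - a\,\bar{x}$. The Python sketch below is one possible arrangement of that calculation, not a definitive implementation; the function name `least_squares_line` and the sample data are hypothetical.

```python
def least_squares_line(points):
    """Return (a, b) for the least squares line y = a*x + b.

    Uses a = S_xy / S_xx and b = y_bar - a * x_bar, where S_xx is the
    variance of the x-values and S_xy is the covariance of x and y.
    """
    n = len(points)
    if n == 0:
        raise ValueError("at least one data point is required")
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    s_xx = sum((x - x_bar) ** 2 for x, _ in points) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in points) / n
    if s_xx == 0:
        # All x-values coincide, so no finite slope is determined.
        raise ValueError("all x-values are equal; the slope is undefined")
    a = s_xy / s_xx
    b = y_bar - a * x_bar
    return a, b


# Hypothetical data, chosen only to illustrate the calculation.
points = [(1, 2), (2, 3), (3, 5), (4, 4)]
a, b = least_squares_line(points)
print(f"y = {a:.2f}x + {b:.2f}")  # prints: y = 0.80x + 1.50
```

Here $\bar{x} = 2.5$, $\bar{y} = 3.5$, $S_{xx} = 1.25$, and $S_{xy} = 1$, giving slope $0.8$ and intercept $1.5$. The guard on `s_xx` covers the degenerate case in which every data point has the same $x$-value and the formula for $a$ would divide by zero.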
Measuring the Degree of Fit
The quantity $D$ that was minimized (above) is called
the "error sum of squares":
$$D = \sum_{i=1}^{N} \left(y_i - (ax_i + b)\right)^2$$
It reflects the amount of variation of the data points
about the regression line. The total corrected sum of
squares (SST) of $y$:
$$\mathrm{SST} = \sum_{i=1}^{N} (y_i - \bar{y})^2$$
gives a measure of the scattering of the $y$-values in general.
Necessarily, $D \le \mathrm{SST}$. The difference, $\mathrm{SST} - D$,
called the regression sum of squares, reflects the
amount of variation in the $y$-values explained by the
linear regression line $y = ax + b$ when compared with
their general distribution. That the quantity $\mathrm{SST} - D$ is
nonnegative prompts the definition of the
CORRELATION COEFFICIENT, $R^2$, given by
$$R^2 = \frac{\mathrm{SST} - D}{\mathrm{SST}}$$
An exercise in algebra shows: