4.4 Application--OLS Estimation. Background When we do a study of data and are looking at the...

4.4 Application--OLS Estimation

Background

• When we do a study of data and are looking at the relationship between 2 variables, and have reason to believe that a linear fit is appropriate, we need a way to determine a model that gives the optimal linear fit (one that best reflects the trend in the data).

• Ex. Relationship between hits and RBI’s • A perfect fit would result if every point in the data exactly

satisfied some equation y = a + bx , but this is next to impossible -- too much variability in real world data.

So what do we do?

• Assume y = a + bx is the best fit for the data.

• Then we can find a point on the line, (xi, f(xi)), with the same x-value as each of the points in the data set, (xi,yi)

• Draw a diagram on the board.

• Then we say di = yi - f(xi) = distance from point in the data to the point on the line.

• di is also called the error or residual at xi -- how far data is from line

• So to measure the fit of the line, we could add up all the errors, d1 + d2 + … + dn

• However, note that a best fit line will have some data above and some below, so this error will turn out to be 0.

So what do we do?

• Therefore, we need to make all of our errors positive by taking either |di| or di

2.

• |di| will give the same weight to large and small errors, where di

2 gives more weight to larger errors

• Ex d’s: {0,0,0,0,50} vs {10,10,10,10,10}

avg of |di| = 10 = 10d1

2 d22 ... d5

2

510

d12 d2

2 ... d52

54.51

Which one is a better fit?• Which should be considered a better fit?

• graph that goes right through 4 points, but nowhere near #5• graph that is same distance from each point (yes)

• |di| method will not show this, but di2 method will.

Ordinary Least Squares Method• So, we will select the model which minimizes the sum of squared residuals:

S = d12 + d2

2 + … + dn2 = [y1 - f(x1)]2 + …+ [yn - f(xn)]2

• This line is called the least squares approximating line

• We can use vectors to help us choose y = a + bx to minimize S

Ordinary Least Squares Method

• S, which we will minimize, is just the sum of the squares of the entries in the matrix, Y-MZ.• If n = 3, then

y1 f (x1)

y2 f (x2 )

...

yn f (xn )

y1 a bx1 y2 a bx2

...

yn a bxn

Y MZ.....Y

y1y2

...

yn

.....M

1 x1

1 x2

... ...

1 xn

...Z a

b

S y1 f (x1 ) 2 y2 f (x2 ) 2

y3 f (x3) 2

Y-MZ is a vector = y1 f (x1 ) , y2 f (x2 ) , y3 f (x3)

Then S = || Y-MZ||2

Ordinary Least Squares MethodS = || Y-MZ||2

Recall Y y1

y2

y3

....M

1 x1

1 x2

1 x3

....Z

a

b

Y and M are given since we have 3 data points to fit. We simply need to select Z to minimize S.Let P be the set of all vectors MZ where Z varies:

P MZ | Z a

b

a bx1

a bx2

a bx3

a,b

Ordinary Least Squares MethodIt turns out that all of the vectors in set P lie in the same plane through the origin (we discuss why later in the book).

The equation of the plane is x2 x3 x x3 x1 y x1 x2 z 0

Take a=0,b=1, or a,b=0 and find that this plane contains:

U x1

x2

x3

........V

1

1

1

And the normal vector will be U x V = x2 x3

x3 x1

x1 x2

Ordinary Least Squares Method Y

Y Y-MA

O MZ MA

Recall that we are trying to minimize S = || Y-MZ||2

Y = (y1,y2,y3) is a point in space, and MZ is some vector in the set P which we have illustrated as a plane.

S = || Y-MZ||2 is the squared distance from the point to the plane, so if we can find the point,MA, in the plane closest to Y, we will have our solution.


Y Y-MA

O MZ MA

Y-MA is orthogonal to all vectors,MZ, in the plane, so

(MZ) • (Y-MA) = 0

Note this rule for dot products when vectors are written as matrices:

U V x1

y1

z1

x2

y2

z2

UTV


Y Y-MA

O MZ MA

0 = (MZ) • (Y-MA) =(MZ)T(Y-MA)=ZTMT(Y-MA)

=ZT(MTY-MTMA) = Z • (MTY-MTMA)

The last dot product is in two dimensions and tells that (MTY-MTMA) is orthogonal to every possible Z which can only happen if (MTY-MTMA) = 0,so

MTY=MTMA called the normal equations for A


Y Y-MA

O MZ MA

With x1, x2,x3 all distinct, we can show that MTM is invertible, so from MTY=MTMA ,we get A = (MTM)-1MTY,

This will give us A=(a,b) which will give then give us the point (a+bx1,a+bx2,a+bx3) closest to Y.

Thus the best fit line will then be y=a + bx.


Y Y-MA

O MZ MA

Recall that this argument started by defining n=3 so that we could use a 3 dimensional argument with vectors. The argument becomes more complex, but does extend to any n.

Theorem 1

• Suppose that n data points (x1,y1),…,(xn,yn) of which at least two x’s are distinct. If

Y

y1

y2

...

yn

......M

1 x1

1 x2

... ...

1 xn

Then, the least squares approximating line has equation y=a0 + a1x where A = is found by Gaussian

elimination from the normal equations MTY=MTMA

Since at least two x’s are distinct, MTM is invertible so A=(MTM)-1MTY

a0

a1

Example

• Find the least squares approximating line for the following data: (1,0),(2,2),(4,5),(6,9),(8,12)• See what you get with the TI83+

Example

• Find an equation of the plane through P(1,3,2) with normal (2,0,-1).

We extend further...

We can generalize to select the least squares approximating polynmial of degree m: f(x)=a0+a1x+a2x2+…+anxn where we estimate the a’s

Theorem 2 (proof in ch 6)

If n data points are given with at least m+1 x’s distinct, then

Y

y1

y2

...

yn

...M

1 x1 x12 ... x1

m

1 x2 x22 ... x2

m

... ... ... ... ...

1 xn xn2 ... xn

m

Then least squares approximating polynomial of degree m is: f(x)=a0+a1x+a2x2+…+anxn where

Aa1

...

am

Is found by Gaussian elim from

normal equations MTY=MTMA

Since at least m+1 x’s are distinct, MTM is invertible so

A=(MTM)-1MTY

Note• we need at least one more data point than the degree of the polynomial we are trying to estimate.

• I.e. With n data points, we could not estimate a polynomial of degree n.

Example• Find the least squares approximating quadratic for the following data points: (-2,0),(0,-4),(2,-10),(4,-9),(6,-3)

4.4 Application--OLS Estimation. Background When we do a study of data and are looking at the...

Documents

Transcript of 4.4 Application--OLS Estimation. Background When we do a study of data and are looking at the...