4.4 Application--OLS Estimation

Background

• When we do a study of data and are looking at the relationship between 2 variables, and have reason to believe that a linear fit is appropriate, we need a way to determine a model that gives the optimal linear fit (one that best reflects the trend in the data).

• Ex. Relationship between hits and RBIs

• A perfect fit would result if every point in the data exactly satisfied some equation y = a + bx, but this is next to impossible: real-world data has too much variability.

So what do we do?

• Assume y = a + bx is the best fit for the data.

• Then we can find a point on the line, (xi, f(xi)), with the same x-value as each of the points in the data set, (xi,yi)

• Draw a diagram on the board.

• Then we say di = yi - f(xi) = signed vertical distance from the point in the data to the point on the line.

• di is also called the error or residual at xi -- how far data is from line

• So to measure the fit of the line, we could add up all the errors, d1 + d2 + … + dn

• However, note that a best fit line will have some data above and some below, so the positive and negative errors cancel and this sum can turn out to be 0 even when the individual errors are large.

So what do we do?

• Therefore, we need to make all of our errors positive by taking either $|d_i|$ or $d_i^2$.

• $|d_i|$ gives the same weight to large and small errors, whereas $d_i^2$ gives more weight to larger errors.

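• Ex. d's: {0,0,0,0,50} vs. {10,10,10,10,10}

Average of $|d_i|$: 10 for the first set and 10 for the second set.

$$\frac{\sqrt{d_1^2 + d_2^2 + \cdots + d_5^2}}{5} = \frac{\sqrt{2500}}{5} = 10 \quad \text{(first set)}$$

$$\frac{\sqrt{d_1^2 + d_2^2 + \cdots + d_5^2}}{5} = \frac{\sqrt{500}}{5} \approx 4.47 \quad \text{(second set)}$$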

Which one is a better fit?

• Which should be considered a better fit?

• A graph that goes right through 4 points, but nowhere near the 5th

• A graph that is the same distance from each point (yes)

• The $|d_i|$ method will not show this, but the $d_i^2$ method will.
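The comparison of the two error sets can be checked numerically. The following is a minimal sketch in Python, assuming the squared-error measure on the slide is the square root of the sum of squared errors divided by 5; the function names are illustrative and not part of the original slides.

```python
import math

def mean_abs_error(d):
    """Average of |d_i| over the error set."""
    return sum(abs(di) for di in d) / len(d)

def root_sum_squares_over_n(d):
    """sqrt(d_1^2 + ... + d_n^2) / n (the measure assumed in the lead-in)."""
    return math.sqrt(sum(di * di for di in d)) / len(d)

set_a = [0, 0, 0, 0, 50]       # line through four points, far from the fifth
set_b = [10, 10, 10, 10, 10]   # line the same distance from every point

for name, d in (("A", set_a), ("B", set_b)):
    print(name, mean_abs_error(d), round(root_sum_squares_over_n(d), 2))
```

Both sets have the same average absolute error, while the squared measure is smaller for the second set, which is why squaring separates the two fits.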

Ordinary Least Squares Method

• So, we will select the model which minimizes the sum of squared residuals:

$$S = d_1^2 + d_2^2 + \cdots + d_n^2 = [y_1 - f(x_1)]^2 + \cdots + [y_n - f(x_n)]^2$$

• This line is called the least squares approximating line

• We can use vectors to help us choose y = a + bx to minimize S

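Ordinary Least Squares Method

• S, which we will minimize, is just the sum of the squares of the entries in the matrix Y - MZ:

$$Y - MZ = \begin{bmatrix} y_1 - f(x_1) \\ y_2 - f(x_2) \\ \vdots \\ y_n - f(x_n) \end{bmatrix} = \begin{bmatrix} y_1 - (a + bx_1) \\ y_2 - (a + bx_2) \\ \vdots \\ y_n - (a + bx_n) \end{bmatrix}, \quad Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad M = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}, \quad Z = \begin{bmatrix} a \\ b \end{bmatrix}$$

• If n = 3, then

$$S = [y_1 - f(x_1)]^2 + [y_2 - f(x_2)]^2 + [y_3 - f(x_3)]^2$$

Y - MZ is a vector = $(y_1 - f(x_1),\ y_2 - f(x_2),\ y_3 - f(x_3))$

Then $S = \|Y - MZ\|^2$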
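To see that the sum of squared residuals and the squared length of Y - MZ are the same number, here is a small Python sketch. The three data points and the candidate line (a, b) are hypothetical values chosen only for illustration.

```python
def residual_vector(xs, ys, a, b):
    """Entries of Y - MZ, where row i of M is [1, x_i] and Z = (a, b)."""
    M = [[1.0, x] for x in xs]
    Z = [a, b]
    MZ = [row[0] * Z[0] + row[1] * Z[1] for row in M]
    return [y - f for y, f in zip(ys, MZ)]

def squared_norm(v):
    """||v||^2 = sum of the squares of the entries of v."""
    return sum(c * c for c in v)

xs, ys = [1.0, 2.0, 4.0], [0.0, 2.0, 5.0]   # hypothetical data, n = 3
a, b = -1.0, 1.5                            # an arbitrary candidate line

S_direct = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
S_norm = squared_norm(residual_vector(xs, ys, a, b))
print(S_direct, S_norm)
```

Any other choice of a and b gives a different vector MZ, and the fitting problem is to pick the one that makes this squared length smallest.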

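Ordinary Least Squares Method

$S = \|Y - MZ\|^2$

Recall

$$Y = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}, \quad M = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ 1 & x_3 \end{bmatrix}, \quad Z = \begin{bmatrix} a \\ b \end{bmatrix}$$

Y and M are given since we have 3 data points to fit. We simply need to select Z to minimize S.

Let P be the set of all vectors MZ where Z varies:

$$P = \left\{ MZ \;\middle|\; Z = \begin{bmatrix} a \\ b \end{bmatrix} \right\} = \left\{ \begin{bmatrix} a + bx_1 \\ a + bx_2 \\ a + bx_3 \end{bmatrix} \;\middle|\; a, b \in \mathbb{R} \right\}$$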

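Ordinary Least Squares Method

It turns out that all of the vectors in set P lie in the same plane through the origin (we discuss why later in the book).

The equation of the plane is

$$(x_2 - x_3)x + (x_3 - x_1)y + (x_1 - x_2)z = 0$$

Take a = 0, b = 1, or a = 1, b = 0, and find that this plane contains:

$$U = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \quad V = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$$

And the normal vector will be

$$U \times V = \begin{bmatrix} x_2 - x_3 \\ x_3 - x_1 \\ x_1 - x_2 \end{bmatrix}$$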
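The claim that every vector MZ lies in the plane with normal U x V can be spot-checked numerically. This is a short Python sketch with hypothetical x-values; the helper names `cross` and `dot` are illustrative.

```python
def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(p * q for p, q in zip(u, v))

x1, x2, x3 = 1.0, 2.0, 4.0      # hypothetical x-values
U = (x1, x2, x3)                # MZ with a = 0, b = 1
V = (1.0, 1.0, 1.0)             # MZ with a = 1, b = 0
N = cross(U, V)                 # equals (x2 - x3, x3 - x1, x1 - x2)

# For several choices of (a, b), N . MZ should be 0, so MZ is in the plane.
dots = []
for a, b in [(0.0, 1.0), (1.0, 0.0), (-1.5, 2.0), (3.0, -0.25)]:
    MZ = (a + b * x1, a + b * x2, a + b * x3)
    dots.append(dot(N, MZ))
print(N, dots)
```

The check works for any (a, b) because MZ = aV + bU, and N is orthogonal to both U and V.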

Ordinary Least Squares Method

[Figure: the point Y, the origin O, the vectors MZ and MA in the plane P, and the vector Y - MA.]

Recall that we are trying to minimize $S = \|Y - MZ\|^2$.

Y = (y1,y2,y3) is a point in space, and MZ is some vector in the set P, which we have illustrated as a plane.

$S = \|Y - MZ\|^2$ is the squared distance from the point Y to the point MZ in the plane, so if we can find the point MA in the plane closest to Y, we will have our solution.

Ordinary Least Squares Method

Y - MA is orthogonal to all vectors MZ in the plane, so

$$(MZ) \cdot (Y - MA) = 0$$

Note this rule for dot products when vectors are written as matrices:

$$U \cdot V = \begin{bmatrix} x_1 \\ y_1 \\ z_1 \end{bmatrix} \cdot \begin{bmatrix} x_2 \\ y_2 \\ z_2 \end{bmatrix} = U^T V$$

Ordinary Least Squares Method

$$0 = (MZ) \cdot (Y - MA) = (MZ)^T (Y - MA) = Z^T M^T (Y - MA)$$

$$= Z^T (M^T Y - M^T M A) = Z \cdot (M^T Y - M^T M A)$$

The last dot product is in two dimensions and tells us that $M^T Y - M^T M A$ is orthogonal to every possible Z, which can only happen if $M^T Y - M^T M A = 0$, so

$$M^T Y = M^T M A$$

These are called the normal equations for A.

Ordinary Least Squares Method

With x1, x2, x3 all distinct, we can show that $M^T M$ is invertible, so from $M^T Y = M^T M A$ we get

$$A = (M^T M)^{-1} M^T Y$$

This will give us A = (a, b), which will then give us the point (a + bx1, a + bx2, a + bx3) closest to Y.

Thus the best fit line will be y = a + bx.
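As a sanity check on this argument, the normal equations can be solved for three points and the residual Y - MA tested for orthogonality to the columns of M. This is a minimal Python sketch; the data and the name `fit_line` are hypothetical, and the 2x2 system is solved in closed form rather than by a general routine.

```python
def fit_line(xs, ys):
    """Solve M^T M A = M^T Y for A = (a, b), where row i of M is [1, x_i]."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx   # det(M^T M); zero only if all x's are equal
    if det == 0:
        raise ValueError("need at least two distinct x-values")
    a = (sxx * sy - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    return a, b

xs, ys = [1.0, 2.0, 4.0], [0.0, 2.0, 5.0]   # hypothetical data, n = 3
a, b = fit_line(xs, ys)
residual = [y - (a + b * x) for x, y in zip(xs, ys)]   # Y - MA

# Y - MA should be orthogonal to both columns of M: (1,1,1) and (x1,x2,x3).
dot_ones = sum(residual)
dot_xs = sum(r * x for r, x in zip(residual, xs))
print(a, b, dot_ones, dot_xs)
```

Both dot products come out as zero up to rounding, which is the orthogonality condition (MZ) · (Y - MA) = 0 applied to the two vectors that span the plane.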

Ordinary Least Squares Method

Recall that this argument started by setting n = 3 so that we could use a 3-dimensional argument with vectors. The argument becomes more complex, but it does extend to any n.

Theorem 1

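• Suppose that n data points (x1,y1),…,(xn,yn) are given, of which at least two x's are distinct. If

$$Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad M = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}$$

then the least squares approximating line has equation $y = a_0 + a_1 x$, where

$$A = \begin{bmatrix} a_0 \\ a_1 \end{bmatrix}$$

is found by Gaussian elimination from the normal equations $M^T Y = M^T M A$.

Since at least two x's are distinct, $M^T M$ is invertible, so $A = (M^T M)^{-1} M^T Y$.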

Example

• Find the least squares approximating line for the following data: (1,0),(2,2),(4,5),(6,9),(8,12)

• See what you get with the TI-83+
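One way to check this example by hand or by machine is to form $M^T M$ and $M^T Y$ explicitly and apply $A = (M^T M)^{-1} M^T Y$. The Python sketch below does this with plain lists; the variable names are illustrative, and the 2x2 inverse is written out via the determinant rather than by Gaussian elimination.

```python
points = [(1, 0), (2, 2), (4, 5), (6, 9), (8, 12)]
M = [[1.0, float(x)] for x, _ in points]   # row i is [1, x_i]
Y = [float(y) for _, y in points]

# M^T M (2x2) and M^T Y (length 2), computed entry by entry
MtM = [[sum(row[i] * row[j] for row in M) for j in range(2)] for i in range(2)]
MtY = [sum(row[i] * y for row, y in zip(M, Y)) for i in range(2)]

# Inverse of the 2x2 matrix M^T M via its determinant
det = MtM[0][0] * MtM[1][1] - MtM[0][1] * MtM[1][0]
inv = [[MtM[1][1] / det, -MtM[0][1] / det],
       [-MtM[1][0] / det, MtM[0][0] / det]]

# A = (M^T M)^{-1} M^T Y
a0 = inv[0][0] * MtY[0] + inv[0][1] * MtY[1]
a1 = inv[1][0] * MtY[0] + inv[1][1] * MtY[1]
print(f"y = {a0:.4f} + {a1:.4f} x")
```

Here $M^T M$ has rows (5, 21) and (21, 121), $M^T Y$ = (28, 174), and the fitted line is roughly y = -1.62 + 1.72x, which can be compared with the calculator's linear regression output.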

Example

• Find an equation of the plane through P(1,3,2) with normal (2,0,-1).
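A worked version of this example, using the standard point-normal form of a plane (a sketch of one way to write the solution, not necessarily the form used in class):

```latex
\[
\mathbf{n} \cdot \bigl( (x, y, z) - P \bigr) = 0
\quad\Longrightarrow\quad
2(x - 1) + 0(y - 3) - 1(z - 2) = 0
\quad\Longrightarrow\quad
2x - z = 0 .
\]
```

Substituting P(1,3,2) gives 2(1) - 2 = 0, so the point does lie on the plane.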

We extend further...

We can generalize to select the least squares approximating polynomial of degree m: $f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_m x^m$, where we estimate the a's.

Theorem 2 (proof in ch 6)

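If n data points (x1,y1),…,(xn,yn) are given with at least m+1 of the x's distinct, and

$$Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad M = \begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^m \\ 1 & x_2 & x_2^2 & \cdots & x_2^m \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^m \end{bmatrix}$$

then the least squares approximating polynomial of degree m is $f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_m x^m$, where

$$A = \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_m \end{bmatrix}$$

is found by Gaussian elimination from the normal equations $M^T Y = M^T M A$.

Since at least m+1 x's are distinct, $M^T M$ is invertible, so

$$A = (M^T M)^{-1} M^T Y$$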

Note

• We need at least one more data point than the degree of the polynomial we are trying to estimate.

• I.e., with n data points, we could not estimate a polynomial of degree n.

Example

• Find the least squares approximating quadratic for the following data points: (-2,0),(0,-4),(2,-10),(4,-9),(6,-3)
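Theorem 2 can be applied to this example by building the matrix M with columns 1, x, x^2, forming the normal equations, and solving them by Gaussian elimination. The Python sketch below is one possible implementation under those assumptions; the function names `solve` and `polyfit_normal_equations` are hypothetical, and partial pivoting is an added numerical safeguard not mentioned in the slides.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(A, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system (too few distinct x-values?)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                     # back substitution
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def polyfit_normal_equations(points, m):
    """Coefficients a_0..a_m of the degree-m least squares polynomial."""
    M = [[float(x) ** k for k in range(m + 1)] for x, _ in points]
    Y = [float(y) for _, y in points]
    MtM = [[sum(row[i] * row[j] for row in M) for j in range(m + 1)]
           for i in range(m + 1)]
    MtY = [sum(row[i] * y for row, y in zip(M, Y)) for i in range(m + 1)]
    return solve(MtM, MtY)

points = [(-2, 0), (0, -4), (2, -10), (4, -9), (6, -3)]
a0, a1, a2 = polyfit_normal_equations(points, 2)
print(f"f(x) = {a0:.4f} + {a1:.4f} x + {a2:.4f} x^2")
```

For this data the normal equations are 5a0 + 10a1 + 60a2 = -26, 10a0 + 60a1 + 280a2 = -74, and 60a0 + 280a1 + 1584a2 = -292, giving approximately f(x) = -6.03 - 2.48x + 0.48x^2.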