Section 9.6 Linear Correlation

Post on 06-Jan-2016



Objectives:
1. To see the method of least squares to determine the best-fit line through a set of data points.
2. To calculate correlation and coefficient of determination.

Population of a bacteria culture after every generation

Gen   Pop
1     1
2     3
3     2
4     3
5     5
6     4
7     5

[Figure: plot of Population (vertical axis, 1 to 6) against Generation (horizontal axis, 1 to 7).]

Finding the line of best fit is called linear regression.

Correlation measures the strength of the relationship between two variables.


Suppose $y = mx + b$ is the equation of the best-fit line. For each data point $(x_i, y_i)$, you could calculate the predicted y-value, $\hat{y}_i$ (read "y-i-hat"), from the line: $\hat{y}_i = mx_i + b$.

To have a good model, the predicted $\hat{y}_i$ on the best-fit line should be close to the $y_i$ of the original data for each $x_i$.

Since positive and negative deviations $y_i - \hat{y}_i$ cancel (their sum will be zero), we will instead minimize the sum of the squared deviations.

This method is called the method of least squares. Since the sum of the squared deviations represents the error between the line and actual data, SSE is used as an abbreviation for the sum of squares error.


\[ SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - mx_i - b)^2 \]

You can also compute the sum of squared deviations for the x and y variables separately.

\[ SS_x = \sum_{i=1}^{n} (x_i - \bar{x})^2 \]

\[ SS_y = \sum_{i=1}^{n} (y_i - \bar{y})^2 \]

\[ SS_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \]
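The three sums of squares can be computed directly from their definitions. The following is a minimal Python sketch, not part of the original slides; the function name `sums_of_squares` and the variable names are assumptions chosen for illustration, and the data are the generation and population values from the bacteria table.

```python
def sums_of_squares(xs, ys):
    """Return (SSx, SSy, SSxy) for paired data, straight from the definitions."""
    n = len(xs)
    x_bar = sum(xs) / n  # mean of the x-values
    y_bar = sum(ys) / n  # mean of the y-values
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return ss_x, ss_y, ss_xy


# Bacteria data from the slides: generation number and population.
generations = [1, 2, 3, 4, 5, 6, 7]
populations = [1, 3, 2, 3, 5, 4, 5]

ss_x, ss_y, ss_xy = sums_of_squares(generations, populations)
print(round(ss_x, 2), round(ss_y, 2), round(ss_xy, 2))
```

Rounded to two decimal places, these should agree with the values 28, 13.43, and 17 used in the worked example.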

Theorem 9.6: Linear Regression

Using the method of least squares, the best-fit line $y = mx + b$ has slope $m = \dfrac{SS_{xy}}{SS_x}$ and y-intercept $b = \bar{y} - m\bar{x}$.
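To see the theorem at work, the sketch below computes the slope and intercept from the formulas and then checks numerically that nudging either one raises SSE. This is an illustrative Python sketch, not the slides' own material: the helper names `best_fit_line` and `sse` and the four data points are hypothetical.

```python
def best_fit_line(xs, ys):
    """Return (m, b) with m = SSxy / SSx and b = y_bar - m * x_bar."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    m = ss_xy / ss_x
    b = y_bar - m * x_bar
    return m, b


def sse(xs, ys, m, b):
    """Sum of squared deviations between the data and the line y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, chosen only to illustrate the formulas.
xs = [0, 1, 2, 3]
ys = [1, 3, 2, 5]

m, b = best_fit_line(xs, ys)
best = sse(xs, ys, m, b)

# Moving the slope or the intercept away from the fitted values
# should never give a smaller SSE.
for dm, db in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]:
    assert sse(xs, ys, m + dm, b + db) > best

print(m, b, best)
```

The slope formula divides by $SS_x$, so this sketch assumes the x-values are not all equal.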

EXAMPLE 1 Give the equation of the line for the bacteria population. Predict the population after the eighth generation.

$\sum x_i = 1+2+3+4+5+6+7 = 28$

$\sum y_i = 1+3+2+3+5+4+5 = 23$

$\bar{x} = 28/7 = 4$

$\bar{y} = 23/7 \approx 3.29$

$x_i$   $y_i$   $x_i - \bar{x}$   $y_i - \bar{y}$   $(x_i - \bar{x})^2$   $(y_i - \bar{y})^2$   $(x_i - \bar{x})(y_i - \bar{y})$
1       1       -3                -2.29             9                     5.22                  6.86
2       3       -2                -0.29             4                     0.08                  0.57
3       2       -1                -1.29             1                     1.65                  1.29
4       3       0                 -0.29             0                     0.08                  0.00
5       5       1                 1.71              1                     2.94                  1.71
6       4       2                 0.71              4                     0.51                  1.43
7       5       3                 1.71              9                     2.94                  5.14
Total                                               28                    13.43                 17.00

$SS_x = 28$, $SS_y \approx 13.43$, $SS_{xy} = 17$

$m = \dfrac{SS_{xy}}{SS_x} = \dfrac{17}{28} \approx 0.61$

$b = \bar{y} - m\bar{x} \approx 3.29 - (0.61)(4) \approx 0.86$

$y = mx + b = 0.61x + 0.86$

$f(8) = 0.61(8) + 0.86 \approx 5.71$

Here 0.86 and 5.71 come from the unrounded values $m = 17/28$ and $b = 6/7$; the rounded figures shown would give 0.85 and 5.74.
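The slope, intercept, and prediction can be checked with exact fractions. The sketch below uses Python's standard `fractions` module to avoid intermediate rounding, which is a choice made here for illustration rather than something the slides prescribe. The name `predict` is hypothetical.

```python
from fractions import Fraction

# Bacteria data from the slides.
generations = [1, 2, 3, 4, 5, 6, 7]
populations = [1, 3, 2, 3, 5, 4, 5]

n = len(generations)
x_bar = Fraction(sum(generations), n)  # 28/7 = 4
y_bar = Fraction(sum(populations), n)  # 23/7

ss_x = sum((x - x_bar) ** 2 for x in generations)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(generations, populations))

m = ss_xy / ss_x       # slope, SSxy / SSx
b = y_bar - m * x_bar  # intercept, y_bar - m * x_bar


def predict(x):
    """Predicted population after generation x on the best-fit line."""
    return m * x + b


print(m, b, predict(8))                # 17/28 6/7 40/7
print(round(float(predict(8)), 2))     # 5.71, from the exact coefficients
print(round(0.61 * 8 + 0.86, 2))       # 5.74, from the rounded coefficients
```

The last two lines show why a prediction made from the rounded coefficients differs slightly from 5.71.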

Definition

Correlation: A measure of the strength of the relation between two variables using the formula

\[ r = \frac{SS_{xy}}{\sqrt{SS_x SS_y}} \]

Definition

Coefficient of determination: The square of the correlation, $r^2$.

The ranges for these measures are $0 \le r^2 \le 1$ and $-1 \le r \le 1$. When all the data falls exactly on the least squares line, the model has no error and $SSE = 0$. This means that $r^2 = 1$ (and $r = 1$ or $-1$). If the model does not help at all, and there is no reduction in error, then $SSE = SS_y$, making $r^2 = 0$ (and $r = 0$).
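Both boundary cases follow from the identity $r^2 = 1 - SSE/SS_y$, which the slides do not state explicitly. The sketch below derives it from the definition of SSE, assuming the slope and intercept of Theorem 9.6 and $SS_y \neq 0$.

```latex
\begin{align*}
SSE &= \sum_{i=1}^{n} (y_i - mx_i - b)^2
     = \sum_{i=1}^{n} \bigl[(y_i - \bar{y}) - m(x_i - \bar{x})\bigr]^2
     && \text{since } b = \bar{y} - m\bar{x} \\
    &= SS_y - 2m\,SS_{xy} + m^2\,SS_x \\
    &= SS_y - \frac{SS_{xy}^2}{SS_x}
     && \text{since } m = \frac{SS_{xy}}{SS_x} \\
\frac{SSE}{SS_y} &= 1 - \frac{SS_{xy}^2}{SS_x\,SS_y} = 1 - r^2
\end{align*}
```

So $SSE = 0$ gives $r^2 = 1$, and $SSE = SS_y$ gives $r^2 = 0$.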

A correlation of 0 means the model is worthless, and a correlation of ±1 means that it is perfect.


EXAMPLE 2 Find the correlation between generation and population size for bacteria.

\[ r = \frac{SS_{xy}}{\sqrt{SS_x SS_y}} = \frac{17}{\sqrt{28(13.43)}} \approx 0.88 \]

Since $r > 0$, the positive correlation tells us that the slope of the best-fit line is positive. Since $r^2 \approx 0.77$, using the line provides a 77% reduction in error over using the average, the horizontal line $y = \bar{y}$.
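The "reduction in error" reading of $r^2$ can be checked numerically: the fraction of $SS_y$ removed by the line, $1 - SSE/SS_y$, should equal $r^2$. The following Python sketch does this for the bacteria data; the variable names are assumptions made for illustration.

```python
import math

# Bacteria data from the slides.
generations = [1, 2, 3, 4, 5, 6, 7]
populations = [1, 3, 2, 3, 5, 4, 5]

n = len(generations)
x_bar = sum(generations) / n
y_bar = sum(populations) / n

ss_x = sum((x - x_bar) ** 2 for x in generations)
ss_y = sum((y - y_bar) ** 2 for y in populations)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(generations, populations))

# Correlation and coefficient of determination.
r = ss_xy / math.sqrt(ss_x * ss_y)
r_squared = r ** 2

# Error of the best-fit line, compared with the error SSy of the
# horizontal line y = y_bar.
m = ss_xy / ss_x
b = y_bar - m * x_bar
sse = sum((y - (m * x + b)) ** 2 for x, y in zip(generations, populations))
reduction = 1 - sse / ss_y

print(round(r, 2), round(r_squared, 2), round(reduction, 2))
```

The second and third printed values should match, since both are $r^2$ computed in two different ways.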

Homework

pp. 477-479


Given $SS_x = 100$, $SS_y = 25$, $SS_{xy} = -50$, $\bar{y} = 4$, and $\bar{x} = 6$, find
1. the slope of the best-fit line.
2. the intercept of the best-fit line.
3. the equation of the best-fit line.
4. the correlation $r$ and its meaning.
5. the error SSE of the model.

If $y = 4x + 3$ is the best-fit line by the method of least squares and $SS_x = 2$, and $SS_y = 71$, then
6. predict $y$ when $x$ is 8.
7. find $SS_{xy}$.
8. find $r$.
9. interpret $r$.
10. find SSE.

■ Cumulative Review:

Consider the function: $f(x) = x^4 + 2x^3 - 35x^2 - 36x + 180$.
31. Find the zeros of the function.
32. Is the function even? odd? Identify any symmetry.
33. Graph the function.

34. Solve the equation $x^3 + 125 = 0$.

35. Solve the system using Cramer's rule.
    4x - 5y = 8
    3x + 2y = 4