Overview 4.2 Introduction to Correlation 4.3 Introduction to Regression.


Overview

4.2 Introduction to Correlation

4.3 Introduction to Regression


Scatterplots

Used to summarize the relationship between two quantitative variables that have been measured on the same element

Graph of points (x, y) each of which represents one observation from the data set

One of the variables is measured along the horizontal axis and is called the x variable

The other variable is measured along the vertical axis and is called the y variable


Predictor Variable and Response Variable

The value of the x variable can be used to predict or estimate the value of the y variable

The x variable is referred to as the predictor variable

The y variable is called the response variable


Scatterplot Terminology

Note the terminology in the caption to Figure 4.2.

When describing a scatterplot, always indicate the y variable first and use the term versus (vs.) or against the x variable.

This terminology reinforces the notion that the y variable depends on the x variable.


FIGURE 4.2 Scatterplot of sales price versus square footage.


Positive relationship

As the x variable increases in value, the y variable also tends to increase.

FIGURE 4.3 (a) Scatterplot of a positive relationship


Negative relationship

As the x variable increases in value, the y variable tends to decrease.

FIGURE 4.3 (b) Scatterplot of a negative relationship


No apparent relationship

As the x variable increases in value, the y variable tends to remain unchanged.

FIGURE 4.3 (c) Scatterplot of no apparent relationship


4.2 Introduction to Correlation

Objective: By the end of this section, I will be able to…

1) Calculate and interpret the value of the correlation coefficient.


Correlation Coefficient r

Measures the strength and direction of the linear relationship between two variables.

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y}$$

s_x is the sample standard deviation of the x data values.

s_y is the sample standard deviation of the y data values.
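The definitional formula can be sketched in a few lines of Python. This is a minimal illustration rather than the textbook's own procedure: the function names `sample_std` and `correlation` are made up for this example, and the data are hypothetical values, not the temperatures from Table 4.11.

```python
from math import sqrt


def sample_std(values):
    """Sample standard deviation (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def correlation(xs, ys):
    """Correlation coefficient r from the definitional formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with n >= 2")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of the products of the deviations from the means.
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / ((n - 1) * sample_std(xs) * sample_std(ys))


# Hypothetical data, NOT the values from Table 4.11.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation(xs, ys)
print(round(r, 4))
```

Data that fall exactly on an upward-sloping line, such as x = 1, 2, 3 and y = 2, 4, 6, give r = 1 with this function.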


Example 4.5 - Calculating the correlation coefficient r

Find the value of the correlation coefficient r for the temperature data in Table 4.11.

Table 4.11 High and low temperatures, in degrees Fahrenheit, of 10 American cities


Interpreting the Correlation Coefficient r

1) Values of r close to 1 indicate a positive relationship between the two variables.

The variables are said to be positively correlated.

As x increases, y tends to increase as well.


Interpreting the Correlation Coefficient r

2) Values of r close to -1 indicate a negative relationship between the two variables.

The variables are said to be negatively correlated.

As x increases, y tends to decrease.


Interpreting the Correlation Coefficient r

3) Other values of r indicate the lack of either a positive or negative linear relationship between the two variables.

The variables are said to be uncorrelated.

As x increases, y tends to neither increase nor decrease linearly.


Guidelines for Interpreting the Correlation Coefficient r

If the correlation coefficient between two variables is

greater than 0.7, the variables are positively correlated.

between 0.33 and 0.7, the variables are mildly positively correlated.

between –0.33 and 0.33, the variables are not correlated.

between –0.7 and –0.33, the variables are mildly negatively correlated.

less than –0.7, the variables are negatively correlated.
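These guidelines translate directly into a small lookup function. The sketch below is one possible encoding: the name `interpret_r` is hypothetical, and the handling of the exact boundary values (0.7, 0.33, –0.33, –0.7) is an assumption, since the guidelines do not say which category a boundary value belongs to.

```python
def interpret_r(r):
    """Return the guideline category for a correlation coefficient r.

    Assumption: a value exactly on a boundary falls into the category
    closer to zero (for example, r = 0.7 is "mildly positively correlated").
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    if r > 0.7:
        return "positively correlated"
    if r > 0.33:
        return "mildly positively correlated"
    if r >= -0.33:
        return "not correlated"
    if r >= -0.7:
        return "mildly negatively correlated"
    return "negatively correlated"


print(interpret_r(0.9761))
```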


Example 4.6 - Interpreting the correlation coefficient

Interpret the correlation coefficient found in Example 4.5.


Example 4.6 continued

Solution

In Example 4.5, we found the correlation coefficient for the relationship between high and low temperature to be r = 0.9761.

r = 0.9761 is very close to 1.

We would therefore say that high and low temperatures for these 10 American cities are strongly positively correlated.

As the low temperature increases, the high temperature also tends to increase.


Equivalent Computational Formula for Calculating the Correlation Coefficient r

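$$r = \frac{\sum xy - \left(\sum x\right)\left(\sum y\right)/n}{\sqrt{\left[\sum x^2 - \left(\sum x\right)^2/n\right]\left[\sum y^2 - \left(\sum y\right)^2/n\right]}}$$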
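The computational formula needs only five running sums, which makes it convenient for hand or spreadsheet calculation. A minimal Python sketch follows, assuming a hypothetical function name `correlation_computational` and hypothetical data (not the Glen Ellyn lots from Table 4.6).

```python
from math import sqrt


def correlation_computational(xs, ys):
    """Correlation coefficient r from the computational formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with n >= 2")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator


# Hypothetical data, NOT the values from Table 4.6.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation_computational(xs, ys)
print(round(r, 4))
```

For these data the sums are Σx = 15, Σy = 20, Σxy = 66, Σx² = 55, and Σy² = 86, so the numerator is 66 − 60 = 6 and the denominator is the square root of 10 × 6 = 60.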


Example 4.7

Use the computational formula to calculate the correlation coefficient r for the relationship between square footage and sales price of the eight home lots for sale in Glen Ellyn from Table 4.6 (Example 4.3 in Section 4.1).


Summary

Section 4.2 introduces the correlation coefficient r, a measure of the strength of linear association between two numeric variables.

Values of r close to 1 indicate that the variables are positively correlated.

Values of r close to –1 indicate that the variables are negatively correlated.

Values of r close to 0 indicate that the variables are not correlated.


4.3 Introduction to Regression

Objectives: By the end of this section, I will be able to…

1) Calculate the value and understand the meaning of the slope and the y intercept of the regression line.

2) Predict values of y for given values of x.


Equation of the Regression Line

Approximates the relationship between x and y

The equation is $\hat{y} = b_0 + b_1 x$, where the regression coefficients are the slope, b1, and the y intercept, b0.

The “hat” over the y (pronounced “y-hat”) indicates that this is an estimate of y and not necessarily an actual value of y.
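The transcript does not show how b0 and b1 are computed, so the sketch below assumes the standard least-squares formulas: b1 is the sum of the products of the deviations divided by the sum of the squared x deviations, and b0 is the mean of y minus b1 times the mean of x. The function name `regression_coefficients` and the data are hypothetical.

```python
def regression_coefficients(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x.

    Assumption: standard least-squares formulas, which this transcript
    does not state explicitly.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with n >= 2")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b1 = s_xy / s_xx          # slope
    b0 = y_bar - b1 * x_bar   # y intercept
    return b0, b1


# Hypothetical data, NOT the values from Table 4.11.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = regression_coefficients(xs, ys)
print(round(b0, 4), round(b1, 4))
```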


Example 4.8 - Calculating the regression coefficients b0 and b1

Find the value of the regression coefficients b0 and b1 for the temperature data in Table 4.11.

Table 4.11 High and low temperatures, in degrees Fahrenheit, of 10 American cities


Example 4.8 continued

Step 4:

Thus, the equation of the regression line for the temperature data is

$$\hat{y} = 10.0533 + 0.9865x$$


Example 4.8 continued

Since y and x represent high and low temperatures, respectively, this equation is read as follows:

“The estimated high temperature for an American city is 10.0533 degrees Fahrenheit plus 0.9865 times the low temperature for that city.”


Using the Regression Equation to Make Predictions

For any particular value of x, the predicted value for y lies on the regression line.

Example 4.11

Suppose we are considering moving to a city that has a low temperature of 47 degrees Fahrenheit (ºF) on this particular winter’s day. What would the estimated high temperature be for this city?


Example 4.11 continued

Solution

Plug the value of 47ºF for the variable low into the regression equation from Example 4.8:

$$\hat{y} = 10.0533 + 0.9865(\text{low}) = 10.0533 + 0.9865(47) = 56.4188$$

We would say: “The estimated high temperature for an American city with a low of 47ºF is 56.4188ºF.”
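The same prediction can be checked with a short Python sketch. The coefficients are the ones reported in Example 4.8; the names `B0`, `B1`, and `predict_high` are hypothetical and chosen only for this illustration.

```python
# Regression coefficients reported in Example 4.8.
B0 = 10.0533  # y intercept, in degrees Fahrenheit
B1 = 0.9865   # slope: change in estimated high per degree of low


def predict_high(low):
    """Estimated high temperature (ºF) for a given low temperature (ºF)."""
    return B0 + B1 * low


estimate = predict_high(47)
print(round(estimate, 4))
```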


Interpreting the Slope

Relationship Between Slope and Correlation Coefficient

The slope b1 of the regression line and the correlation coefficient r always have the same sign.

b1 is positive if and only if r is positive.

b1 is negative if and only if r is negative.
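One way to see why the signs must agree is the standard least-squares identity linking the slope to the correlation coefficient. The identity is not shown in this transcript, so treat it as an outside fact brought in for illustration; s_x and s_y are the sample standard deviations defined in Section 4.2.

```latex
b_1 = r \cdot \frac{s_y}{s_x},
\qquad s_x > 0,\; s_y > 0
\;\Longrightarrow\;
\operatorname{sign}(b_1) = \operatorname{sign}(r)
```

Because both standard deviations are positive whenever the data vary, multiplying r by their ratio changes its size but never its sign. For the temperature data, r = 0.9761 and b1 = 0.9865 are both positive, as expected.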