Section 12.1

HAWKES LEARNING SYSTEMS Students Matter. Success Counts. Copyright © 2013 by Hawkes Learning Systems/Quant Systems, Section 12.1 Scatter Plots and Correlation With the quality added value you’ve come to expect from D.R.S., University of Cordele


Page 1: Section 12.1


Section 12.1

Scatter Plots and Correlation

With the quality added value you’ve come to expect from D.R.S., University of Cordele

Page 2: Section 12.1

Types of Relationships:

HAWKES LEARNING SYSTEMS

math courseware specialists

Regression, Inference, and Model Building

12.1 Scatter Plots and Correlation

[Figure: four example scatter plots, titled Strong Linear Relationship, Non-Linear Relationship, No Relationship, and Weak Linear Relationship]

Plot the (x, y) data points and think about whether x and y are somehow related.

Page 3: Section 12.1


Table 12.2: Sample of NFL Quarterbacks (2011–2012 Season)

Quarterback | Number of Passing Touchdowns | 2012 Base Salary (in Millions of Dollars) | Quarterback Rating
Drew Brees | 46 | 3.0 | 110.6
Michael Vick | 18 | 12.5 | 84.9
Philip Rivers | 27 | 10.2 | 88.7
Tony Romo | 31 | 0.825 | 102.5
Aaron Rodgers | 45 | 8.0 | 122.5
Jay Cutler | 13 | 7.7 | 85.7
Alex Smith | 17 | 5.0 | 90.7
Eli Manning | 29 | 1.75 | 92.9
Tim Tebow | 12 | 2.1 | 72.9
Tom Brady | 39 | 0.95 | 105.6

Source: Yahoo! Sports. “NFL - Statistics by Position.” http://sports.yahoo.com/nfl/stats/byposition?pos=QB&conference=NFL&year=season_2011&sort=49&timeframe=All (20 May 2012).
Source: Spotrac.com. “NFL Player Contracts, Salaries, and Transactions.” http://www.spotrac.com/nfl/ (2 Oct. 2012).

Page 4: Section 12.1


Example 12.1: Creating a Scatter Plot to Identify Trends in Data

Use the data from Table 12.2 to produce a scatter plot that shows the relationship between the base salary of an NFL quarterback and the number of touchdowns the quarterback has thrown in one season.
Solution: We might expect the number of touchdowns a quarterback throws in one season to influence his salary. Taking this into consideration, we will place the number of touchdowns on the x-axis and the base salary on the y-axis.
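To make that axis choice concrete without a calculator, here is a minimal text-only sketch in Python. The helper name `ascii_scatter` and the 40 × 12 character grid are illustrative assumptions, not part of the course materials. Each touchdown count is scaled to a column and each salary to a row, with larger salaries printed higher up, so the printout is a rough stand-in for the TI-84 scatter plot.

```python
def ascii_scatter(xs, ys, width=40, height=12, mark="*"):
    """Return a rough text scatter plot as a list of strings (top row = largest y)."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value linearly onto the grid ("or 1" guards against a zero range).
        col = round((x - x_min) / ((x_max - x_min) or 1) * (width - 1))
        row = round((y - y_min) / ((y_max - y_min) or 1) * (height - 1))
        grid[height - 1 - row][col] = mark  # flip rows so larger y values print higher
    return ["|" + "".join(cells) for cells in grid] + ["+" + "-" * width]


# Explanatory variable (x-axis): passing touchdowns, from Table 12.2
touchdowns = [46, 18, 27, 31, 45, 13, 17, 29, 12, 39]
# Response variable (y-axis): 2012 base salary in millions of dollars
salary = [3.0, 12.5, 10.2, 0.825, 8.0, 7.7, 5.0, 1.75, 2.1, 0.95]

for line in ascii_scatter(touchdowns, salary):
    print(line)
```

The grid is coarse, so two nearby points can land in the same cell; for this data set they do not.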

Page 5: Section 12.1

Scatter Plot of (touchdowns, salary) on TI-84

• Put Touchdowns in list L1 and Salary in list L2.
• Y=: old algebra plots should be cleared out of there.
• 2ND STAT PLOT: all plots should be “Off” to start with.
• 1:Plot 1: On; choose Type, Lists L1 and L2, and Mark.
• Remember 2ND 1 and 2ND 2 to put in the list names.
• ZOOM 9:ZoomStat
• If you get an unexplainable error, press 2ND MEM 7 1 2 to clear all, and then retype the lists of data.
• TRACE, then Left Arrow and Right Arrow to explore it.

Page 6: Section 12.1


Example 12.1: Creating a Scatter Plot to Identify Trends in Data (cont.)

Is there any apparent relationship between the number of passing touchdowns and the salary? __________________

Page 7: Section 12.1

Scatter Plot of (touchdowns, rating) on TI-84

• Ratings in list L3, Touchdowns still in L1, Salary in L2
• Type the Ratings into List L3 if you haven’t already done so.
• 2ND STAT PLOT
• 1:Plot 1: Change to Lists L1 and L3
• ZOOM 9:ZoomStat
• TRACE, then Left Arrow and Right Arrow to explore it

Page 8: Section 12.1


Example 12.2: Creating a Scatter Plot to Identify Trends in Data (cont.)

Is there any apparent relationship between the number of passing touchdowns and the QB Rating? _____________. It appears to be a _______ relationship with _______ slope.

Page 9: Section 12.1


Example 12.3: Determining Whether a Scatter Plot Would Have a Positive Slope, Negative Slope, or Not Follow a Straight-Line Pattern

Determine whether the points in a scatter plot for the two variables are likely to have a positive slope, negative slope, or not follow a straight-line pattern.
a. The number of hours you study for an exam and the score you make on that exam _________________
b. The price of a used car and the number of miles on the odometer _____________________________
c. The pressure on a gas pedal and the speed of the car _____________________________________
d. Shoe size and IQ for adults ___________________

Page 10: Section 12.1


Scatter Plots and Correlation

The Pearson correlation coefficient, ρ, is the parameter that measures the strength of a linear relationship between two quantitative variables in a population.

The correlation coefficient for a sample is denoted by r. It always takes a value between −1 and 1, inclusive.

$-1 \le r \le 1$

ρ is the Greek letter “rho”. Practice writing the rho character here:

Page 11: Section 12.1


Population parameter ρ, sample statistic r

ρ (Greek letter rho) is the population parameter for the correlation coefficient. r is the sample statistic for the correlation coefficient.

We use our sample’s r to estimate the population’s ρ, just as in other experiments we used our sample mean, x̄, to estimate the population’s mean, μ.

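$$r = \frac{n\sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{\sqrt{n\sum x_i^2 - \left(\sum x_i\right)^2}\,\sqrt{n\sum y_i^2 - \left(\sum y_i\right)^2}}$$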
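For readers who want to see what the calculator does with this formula, here is a minimal Python sketch that follows it term by term (n, Σxᵢyᵢ, Σxᵢ, Σyᵢ, Σxᵢ², Σyᵢ²). The function name `pearson_r` is a hypothetical helper, not a library call. This "sums" form is fine for small classroom data sets but can lose precision for very large or nearly constant data; it also fails with a division by zero if either list has no variation.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient r, using the computational (sums) formula."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 pairs")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


# Small made-up data set that rises almost in a straight line
print(pearson_r([1, 2, 3, 4], [2, 4, 5, 8]))
```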

Page 12: Section 12.1


• –1 ≤ r ≤ 1

• Close to –1 means a strong negative correlation.

• Close to 0 means little or no linear correlation.

• Close to 1 means a strong positive correlation.

Page 13: Section 12.1


Example 12.4: Calculating the Correlation Coefficient Using a TI-83/84 Plus Calculator

Calculate the correlation coefficient, r, for the data from Table 12.2 relating touchdowns thrown and base salaries.
Solution: The data we need from Table 12.2 are reproduced in the following table. (They should already be in your calculator’s lists.)

But we will not dig into the details of that awful formula! The TI-84 has built-in goodies.

Page 14: Section 12.1


Example 12.4: Calculating the Correlation Coefficient Using a TI-83/84 Plus Calculator (cont.)

NFL Quarterbacks

Number of Passing Touchdowns (in List L1) | Base Salary, in Millions of Dollars (in List L2)
46 | 3.0
18 | 12.5
27 | 10.2
31 | 0.825
45 | 8.0
13 | 7.7
17 | 5.0
29 | 1.75
12 | 2.1
39 | 0.95

Do you expect r to be
• close to −1?
• close to 0?
• close to 1?

Page 15: Section 12.1


Example 12.4: Calculating the Correlation Coefficient Using a TI-83/84 Plus Calculator (cont.)

• LinRegTTest is STAT, TESTS, ALPHA F (ALPHA E on the 83/Plus).
• Put in the list names.
• RegEq: VARS, Y-VARS, 1, 1 (this will be useful later).
• Highlight Calculate, press ENTER, and use the down arrows to find r = _____________
• Repeat for the lists for Passing Touchdowns and QB Rating. In that case, r = _______.

Page 16: Section 12.1

Use TI-84 LinRegTTest for a full Hypothesis Test (more than just getting the correlation coefficient, r)

• The next few slides describe the use of LinRegTTest.
• It’s STAT, TESTS, ALPHA F (ALPHA E on the 83/Plus).
• This description is about the full hypothesis test to determine “Is the relationship significant?”
• The outputs include the value of r, the correlation coefficient, which is of greatest interest at this early point in our study.

The Hawkes materials talk about the LinReg feature but I’m recommending the LinRegTTest instead because you get more information for about the same effort.

Page 17: Section 12.1

Hypothesis Test for a Significant Relationship

Null Hypothesis: “No relationship”
Alternative: “There is a significant relationship!”
There’s some level of significance specified in advance, like α = 0.05 or α = 0.01.
A t value is calculated. Then “what is the p-value of this t?” (Area beyond t: is it a small probability?)
And if p-value < α, reject the null hypothesis.
– If so, then we say “Yes, significant relationship!”

Page 18: Section 12.1

LinRegTTest inputs (not identical to the quarterback example!)

• Here are the inputs:
• Xlist and Ylist: where you put the data
  – Shortcut: 2ND 2 puts in L2
• Freq: 1 (unless…)
• β & ρ: ≠ 0
  – This is the Alternative Hypothesis
• RegEq: VARS, right arrow to Y-VARS, 1, 1
  – Just put it in for later
• Highlight “Calculate”
• Press ENTER

Page 19: Section 12.1

LinRegTTest Outputs, first screen (from a different problem)

• t= the t statistic value for this test (the formula is in the book)

• p = the p-value for this t test statistic

• β ≠ 0 and ρ ≠ 0: the alternative hypothesis in this kind of a test
• a: later, for regression

Page 20: Section 12.1

LinRegTTest Outputs, second screen (from a different problem)

• b: later, for Regression
• s: much later, for advanced Regression
• r² = the proportion of the variation in the output variable (weight) that is explained by the input variable (girth)
• r = the correlation coefficient for the sample
  – Close to +1: strong positive relationship
  – Or close to −1: strong negative relationship

Page 21: Section 12.1


Testing the Correlation Coefficient for Significance Using Hypothesis Testing

Testing Linear Relationships for Significance
Significant Linear Relationship (Two-Tailed Test)
H0: ρ = 0 (Implies that there is no significant linear relationship)
Ha: ρ ≠ 0 (Implies that there is a significant linear relationship)
This is the one we use the most.

(Now they’re getting into the Hypothesis Testing we saw a brief preview of earlier in this set of slides.)

Testing Linear Relationships for Significance (cont.)
Significant Negative Linear Relationship (Left-Tailed Test)
H0: ρ ≥ 0 (Implies that there is no significant negative linear relationship)
Ha: ρ < 0 (Implies that there is a significant negative linear relationship)
Be aware that this one exists.

Testing Linear Relationships for Significance (cont.)
Significant Positive Linear Relationship (Right-Tailed Test)
H0: ρ ≤ 0 (Implies that there is no significant positive linear relationship)
Ha: ρ > 0 (Implies that there is a significant positive linear relationship)
Be aware that this one exists.

Page 22: Section 12.1


Testing the Correlation Coefficient for Significance Using Hypothesis Testing

Test Statistic for a Hypothesis Test for a Correlation Coefficient

The test statistic for testing the significance of the correlation coefficient is given by

$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$

TI-84 LinRegTTest will calculate this value for us.

Test Statistic for a Hypothesis Test for a Correlation Coefficient (cont.)

where r is the sample correlation coefficient and n is the number of data pairs in the sample. The number of degrees of freedom for the t-distribution of the test statistic is n − 2.
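The test statistic is simple arithmetic once r and n are known. Here is a minimal Python sketch of it; `correlation_t_statistic` is a hypothetical helper name. The example plugs in r ≈ −0.251 for touchdowns vs. base salary with n = 10 quarterbacks (the slides later quote only its absolute value, 0.251; the sign is negative for this data).

```python
import math


def correlation_t_statistic(r, n):
    """Test statistic t = r / sqrt((1 - r^2) / (n - 2)) and its degrees of freedom."""
    if n < 3:
        raise ValueError("need at least 3 data pairs")
    if not -1 < r < 1:
        raise ValueError("r must be strictly between -1 and 1")
    df = n - 2
    t = r / math.sqrt((1 - r * r) / df)
    return t, df


# r for touchdowns vs. base salary (Table 12.2 data), with n = 10 quarterbacks
t, df = correlation_t_statistic(-0.251, 10)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

The sign of t always matches the sign of r, which is why a left-tailed test looks for large negative t values and a right-tailed test for large positive ones.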

Page 23: Section 12.1


Testing the Correlation Coefficient for Significance Using Hypothesis Testing

Rejection Regions for Testing Linear Relationships
Significant Linear Relationship (Two-Tailed Test)
Reject the null hypothesis, H0, if $|t| \ge t_{\alpha/2}$.
Significant Negative Linear Relationship (Left-Tailed Test)
Reject the null hypothesis, H0, if $t \le -t_{\alpha}$.
Significant Positive Linear Relationship (Right-Tailed Test)
Reject the null hypothesis, H0, if $t \ge t_{\alpha}$.

But we will use the p-value method, because LinRegTTest gives us a p-value and the experiment specifies the α.
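To see where a p-value like LinRegTTest's comes from, the sketch below finds the area beyond t under a t distribution with df = n − 2 degrees of freedom, for two-, left-, or right-tailed tests. It is an assumption-laden sketch in plain Python: the t density is written out with `math.lgamma` and integrated with Simpson's rule, a numerical method swapped in here for illustration and not necessarily how the TI-84 computes it. The function names are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coeff = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_coeff) * (1 + x * x / df) ** (-(df + 1) / 2)


def tail_beyond(t, df, steps=2000):
    """Area in ONE tail beyond |t|: 0.5 minus the area from 0 to |t|.
    The area from 0 to |t| uses Simpson's rule, so steps must be even."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def p_value(t, df, tail="two"):
    """p-value for the two-, left-, or right-tailed correlation test."""
    one_tail = tail_beyond(t, df)
    if tail == "two":
        return 2 * one_tail
    if tail == "left":
        return one_tail if t <= 0 else 1 - one_tail
    if tail == "right":
        return one_tail if t >= 0 else 1 - one_tail
    raise ValueError("tail must be 'two', 'left', or 'right'")


alpha = 0.05
p = p_value(-0.733, 8)  # t and df for touchdowns vs. base salary (r = -0.251, n = 10)
print(f"p = {p:.3f}:", "reject H0" if p < alpha else "fail to reject H0")
```

The decision line is the same rule used throughout these slides: reject H0 only when p < α.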

Page 24: Section 12.1


Example 12.7: Performing a Hypothesis Test to Determine if the Linear Relationship between Two Variables Is Significant

Use a hypothesis test to determine if the linear relationship between the number of parking tickets a student receives during a semester and his or her GPA during the same semester is statistically significant at the 0.05 level of significance. Refer to the data presented in the following table.

GPA and Number of Parking Tickets

Number of Tickets 0 0 0 0 1 1 1 2 2 2 3 3 5 7 8

GPA 3.6 3.9 2.4 3.1 3.5 4.0 3.6 2.8 3.0 2.2 3.9 3.1 2.1 2.8 1.7

Page 25: Section 12.1

Example 12.7

Use the TI-84 LinRegTTest to perform the hypothesis test. Use the p-value method:
• The LinRegTTest gives you a p-value.
• If the p-value is < the given Level of Significance α = 0.05, then REJECT the null hypothesis; conclude that there IS a significant linear relationship.
• Otherwise, Fail To Reject: no significant relationship.
And you can disregard most or all of the by-hand detail that is in the book and in the online Help.

Page 26: Section 12.1


Example 12.7: Performing a Hypothesis Test to Determine if the Linear Relationship between Two Variables Is Significant (cont.)

Solution
Step 1: State the null and alternative hypotheses. We wish to test the claim that a significant linear relationship exists between the number of parking tickets a student receives during a semester and his or her GPA during the same semester. Thus, the hypotheses are stated as follows.

H0: ρ = 0
Ha: ρ ≠ 0

Page 27: Section 12.1


Example 12.7: Performing a Hypothesis Test to Determine if the Linear Relationship between Two Variables Is Significant (cont.)

Step 2: Determine which distribution to use for the test statistic, and state the level of significance. We will use the t-test statistic presented previously in this section along with a significance level of α = 0.05 to perform this hypothesis test.
Step 3: Gather data and calculate the necessary sample statistics. (Do LinRegTTest.)

Page 28: Section 12.1

Example 12.7 Hypothesis Test, concluded

Compare p = _____ vs. α = ______
Decision: { Reject / Fail to Reject } the Null Hypothesis.

Conclusion about Significant Linear Relationship:

Conclusion in Plain English:

Page 29: Section 12.1

Correlation does not imply Causation!

If there seems to be a correlation, it doesn’t necessarily mean that changes in one variable cause changes in the other variable.
1. There might be a lurking variable that affects both.
2. Or the two might be completely unrelated. The mathematical indication of a strong correlation is merely coincidental.

Extreme examples can be seen at the Spurious Correlations web site (www.tylervigen.com)

Page 30: Section 12.1


Example 12.8: Performing a Hypothesis Test to Determine if the Linear Relationship between Two Variables Is Significant

An online retailer wants to research the effectiveness of its mail-out catalogs. The company collects data from its eight largest markets with respect to the number of catalogs (in thousands) that were mailed out one fiscal year versus sales (in thousands of dollars) for that year. The results are as follows.

Number of Catalogs Mailed and Sales

Number of Catalogs (in Thousands): 2 3 3 3 4 4 5 6
Sales (in Thousands): $126 $98 $255 $394 $107 $122 $334 $403

Page 31: Section 12.1


Example 12.8: Performing a Hypothesis Test to Determine if the Linear Relationship between Two Variables Is Significant (cont.)

Use a hypothesis test to determine if the linear relationship between the number of catalogs mailed out and sales is statistically significant at the 0.01 level of significance.
Step 1: Hypotheses:
H0: ___________ meaning _____________________.

Ha: ___________ meaning _____________________.

Step 2: Decision to use the t distribution and level of significance _____ = 0.01

Page 32: Section 12.1


Example 12.8: Performing a Hypothesis Test to Determine if the Linear Relationship between Two Variables Is Significant (cont.)

Step 3: Gather data and calculate the necessary sample statistics. Using a TI-83/84 Plus calculator, enter the values for the numbers of catalogs mailed (x) in L1 and the sales values (y) in L2. Run LinRegTTest.
Step 4: Conclusion: { Reject / Fail to Reject } the Null Hypothesis.
Interpretation:

Page 33: Section 12.1


Coefficient of Determination

The coefficient of determination, r², is a measure of the proportion of the variation in the response variable (y) that can be associated with the variation in the explanatory variable (x).

This too is reported to you in the LinRegTTest outputs.

Page 34: Section 12.1


Example 12.9: Calculating and Interpreting the Coefficient of Determination

If the correlation coefficient for the relationship between the numbers of rooms in houses and their prices is r = 0.65, how much of the variation in house prices can be associated with the variation in the numbers of rooms in the houses?
Solution: Recall that the coefficient of determination tells us the amount of variation in the response variable (house price) that is associated with the variation in the explanatory variable (number of rooms).

Page 35: Section 12.1


Example 12.9: Calculating and Interpreting the Coefficient of Determination (cont.)

Thus, the coefficient of determination for the relationship between the numbers of rooms in houses and their prices will tell us the proportion or percentage of the variation in house prices that can be associated with the variation in the numbers of rooms in the houses. Also, recall that the coefficient of determination is equal to the square of the correlation coefficient.

Page 36: Section 12.1


Example 12.9: Calculating and Interpreting the Coefficient of Determination (cont.)

Since we know that the correlation coefficient for these data is r = 0.65, we can calculate the coefficient of determination as r² = _____.
Thus, approximately _____% of the variation in house prices can be associated with the variation in the numbers of rooms in the houses.

Page 37: Section 12.1


Testing the Correlation Coefficient for Significance

Using Critical Values of the Pearson Correlation Coefficient to Determine the Significance of a Linear Relationship

A sample correlation coefficient, r, is statistically significant if $|r| \ge r_{\alpha}$, where $r_{\alpha}$ is the critical value for the chosen level of significance, α.

(Why is this discussion here? Sometimes they give you a shred of a problem that gives some summary results and you have to use a printed table to make the determination. That’s the only time you’ll need to do this, for a few of those kinds of problems. In “real life”, in large problems, the LinRegTTest p-value is compared to alpha.)

Page 38: Section 12.1


Example 12.6: Using a Table of Critical Values to Determine Significance of a Linear Relationship

Use the critical values in Table I to determine if the correlation between the number of passing touchdowns and base salary from Example 12.4 is statistically significant. Use a 0.05 level of significance.
Solution: Begin by finding the critical value for α = 0.05 with n = 10 in Table I. Find the value in the table where the row for n = 10 intersects the column for α = 0.05.

Page 39: Section 12.1


Example 12.6: Using a Table of Critical Values to Determine Significance of a Linear Relationship (cont.)

n | α = 0.05 | α = 0.01
6 | 0.811 | 0.917
7 | 0.754 | 0.875
8 | 0.707 | 0.834
9 | 0.666 | 0.798
10 | 0.632 | 0.765
11 | 0.602 | 0.735
12 | 0.576 | 0.708

INTERPRETATION: “If the absolute value of my sample’s correlation coefficient, |r|, is at least as big as the value you look up in this table, then YES, significant linear relationship. Otherwise, no, no significant linear relationship.”
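The table lookup is easy to express in code. This minimal Python sketch stores only the Table I excerpt on this slide (n = 6 to 12, α = 0.05 and 0.01), so any other n or α raises a `KeyError` rather than guessing. The dictionary and function names are hypothetical.

```python
# Excerpt of Table I (critical values of the Pearson correlation coefficient),
# copied from this slide; keys are n, then the level of significance alpha.
CRITICAL_R = {
    6: {0.05: 0.811, 0.01: 0.917},
    7: {0.05: 0.754, 0.01: 0.875},
    8: {0.05: 0.707, 0.01: 0.834},
    9: {0.05: 0.666, 0.01: 0.798},
    10: {0.05: 0.632, 0.01: 0.765},
    11: {0.05: 0.602, 0.01: 0.735},
    12: {0.05: 0.576, 0.01: 0.708},
}


def significant_by_table(r, n, alpha=0.05):
    """Return (is_significant, r_alpha); significant when |r| >= r_alpha."""
    r_alpha = CRITICAL_R[n][alpha]  # KeyError if n or alpha is outside this excerpt
    return abs(r) >= r_alpha, r_alpha


print(significant_by_table(-0.251, 10))  # touchdowns vs. base salary, n = 10 -> (False, 0.632)
```

Taking the absolute value first is what lets one table serve both positive and negative correlations.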

Page 40: Section 12.1


Example 12.6: Using a Table of Critical Values to Determine Significance of a Linear Relationship (cont.)

Thus, $r_{\alpha}$ = 0.632. Comparing this critical value to the absolute value of the correlation coefficient we found for the data in Example 12.4, we have |r| = 0.251 < 0.632, and thus $|r| < r_{\alpha}$. Therefore, the linear relationship between the variables is not statistically significant at the 0.05 level of significance. Thus, we do not have sufficient evidence, at the 0.05 level of significance, to conclude that a linear relationship exists between the number of passing touchdowns during the 2011–2012 season and the 2012 base salary of an NFL quarterback.

Page 41: Section 12.1

Correlation Coefficient in Excel

Page 42: Section 12.1

More with Excel

That’s about all that can be done with basic Excel. There is an advanced feature on the Data tab: the Data Analysis add-in.

It gets into the Regression topic in the next lesson.
