Chapter 7 Scatterplots, Association, and Correlation.

Post on 17-Jan-2016


Chapter 7

Scatterplots, Association, and Correlation

Examining Relationships

Relationship between two variables. Examples:

• Height and Weight
• Alcohol and Body Temperature
• SAT Verbal Score and SAT Math Score
• High School GPA and College GPA

Two Types of Variables

Response Variable (Dependent): Measures an outcome of the study.

Explanatory Variable (Independent): Used to explain the response variable.

Example: Alcohol and Body Temp
• Explanatory Variable: Alcohol
• Response Variable: Body Temperature

Two Types of Variables

Calling one variable explanatory does not mean that it causes the response variable. It only helps explain the response.

Sometimes there are no true response or explanatory variables. Examples:
• Height and Weight
• SAT Verbal and SAT Math Scores

Graphing Two Variables

Plot of explanatory variable vs. response variable:
• Explanatory variable goes on the horizontal axis (x).
• Response variable goes on the vertical axis (y).
• If response and explanatory variables do not exist, you can plot the variables on either axis.

This plot is called a scatterplot. It can only be used if the explanatory and response variables are both quantitative.

Scatterplots

Scatterplots show patterns, trends, and relationships.

When interpreting a scatterplot (i.e., describing the relationship between two variables), always look at the following:

Overall Pattern
• Form
• Direction
• Strength

Deviations from the Pattern
• Outliers

Interpreting Scatterplots

Form: Is the plot linear or is it curved?

Strength: Does the plot follow the form very closely, or is there a lot of scatter (variation)?

Interpreting Scatterplots

Direction: Is the plot increasing or is it decreasing?

Positively Associated
• Above (below) average values of one variable tend to be associated with above (below) average values of the other variable.

Negatively Associated
• Above (below) average values of one variable tend to be associated with below (above) average values of the other variable.

Example – Scatterplot

The following survey was conducted in the U.S. and in 10 countries of Western Europe to determine the percentage of teenagers who had used marijuana and other drugs.

Example – Scatterplot

Percent who have used

Country             Marijuana (%)   Other Drugs (%)
Czech Republic           22               4
Denmark                  17               3
England                  40              21
Finland                   5               1
Ireland                  37              16
Italy                    19               8
Northern Ireland         23              14
Norway                    6               3
Portugal                  7               3
Scotland                 53              31
United States            34              24

Example – Scatterplot

[Figure: scatterplot of Percent who have used Marijuana (x-axis, 0 to 60) vs. Other Drugs (y-axis, 0 to 35)]

Example – Scatterplot

The variables are interchangeable in this example. Here, Percent of Marijuana is used as the explanatory variable (it is on the x-axis), and Percent of Other Drugs is used as the response (it is on the y-axis).

Example - Scatterplot

• The form is linear.
• The strength is fairly strong.
• The direction is positive, since larger values on the x-axis tend to go with larger values on the y-axis.
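One way to check the "positive direction" reading against the definition of positive association is to count how many countries sit on the same side of both averages. The sketch below is only an illustration: the variable names are my own, and the values are the percentages from the survey table, listed from Czech Republic to United States.

```python
from statistics import mean

# Percent of teenagers who have used each drug, read from the survey table
# (order: Czech Republic ... United States).
marijuana = [22, 17, 40, 5, 37, 19, 23, 6, 7, 53, 34]
other_drugs = [4, 3, 21, 1, 16, 8, 14, 3, 3, 31, 24]

x_bar = mean(marijuana)
y_bar = mean(other_drugs)

# A country is "on the same side" when both values are above average
# or both are below average, i.e. the two deviations share a sign.
same_side = sum(
    1
    for x, y in zip(marijuana, other_drugs)
    if (x - x_bar) * (y - y_bar) > 0
)

print(f"{same_side} of {len(marijuana)} countries are on the same side of both averages")
```

If most countries land on the same side of both averages, that agrees with a positive association.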

Example - Scatterplot

Negative association: outside temperature and amount of natural gas used.

[Figure: scatterplot of Gas (y-axis, 0 to 10) vs. Temp (x-axis, -5.0 to 15.0)]

Correlation

The strength of the linear relationship between two quantitative variables can be described numerically

This numerical method is called correlation

Correlation is denoted by r

Correlation

A way to measure the strength of the linear relationship between two quantitative variables.

$$r = \frac{1}{n-1} \sum \frac{(x - \bar{x})(y - \bar{y})}{s_x s_y}$$

Correlation

Steps to calculate correlation:
• Calculate the mean of x and y.
• Calculate the standard deviation for x and y.
• Calculate $\sum (x - \bar{x})(y - \bar{y})$.
• Plug all numbers into the formula.
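The steps above can be sketched as a small function. This is a minimal illustration, not a reference implementation: the function name `correlation` and the toy data are my own, and `statistics.stdev` is used because it divides by n - 1, matching the formula for r.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Pearson correlation r, following the four steps literally."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)        # step 1: means
    s_x, s_y = stdev(xs), stdev(ys)          # step 2: sample SDs (n - 1 divisor)
    sum_products = sum(                      # step 3: sum of products of deviations
        (x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)
    )
    return sum_products / ((n - 1) * s_x * s_y)  # step 4: plug into the formula


# Hypothetical toy data lying exactly on an increasing line.
print(correlation([1, 2, 3, 4], [3, 5, 7, 9]))
```

Points that fall exactly on an increasing line should give a value of 1, up to floating-point rounding.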

Correlation

[Figure: scatterplot of Femur vs. Humerus, with Femur on the x-axis (0 to 80) and Humerus on the y-axis (0 to 100)]

Calculating r.

Femur (x):    38  56  59  63  74
Humerus (y):  41  63  70  72  84

Set up a table with columns for $x$, $y$, $(x - \bar{x})$, $(y - \bar{y})$, $(x - \bar{x})^2$, $(y - \bar{y})^2$, and $(x - \bar{x})(y - \bar{y})$.

Calculating r.

  x     y    x - x̄   y - ȳ   (x - x̄)²   (y - ȳ)²   (x - x̄)(y - ȳ)
 38    41     -20     -25       400        625          500
 56    63      -2      -3         4          9            6
 59    70       1       4         1         16            4
 63    72       5       6        25         36           30
 74    84      16      18       256        324          288
290   330       0       0       686       1010          828    (column totals)
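The deviations table for the femur and humerus data can be rebuilt in a few lines as a way to check the arithmetic. This is a sketch only; the names `rows` and `totals` are my own.

```python
femur = [38, 56, 59, 63, 74]      # x
humerus = [41, 63, 70, 72, 84]    # y

x_bar = sum(femur) / len(femur)
y_bar = sum(humerus) / len(humerus)

# One row per observation:
# x, y, x - x_bar, y - y_bar, (x - x_bar)^2, (y - y_bar)^2, (x - x_bar)(y - y_bar)
rows = [
    (x, y, x - x_bar, y - y_bar, (x - x_bar) ** 2, (y - y_bar) ** 2,
     (x - x_bar) * (y - y_bar))
    for x, y in zip(femur, humerus)
]

# Column totals; the two deviation columns should each sum to 0.
totals = [sum(column) for column in zip(*rows)]

for row in rows + [tuple(totals)]:
    print(" ".join(f"{value:8.0f}" for value in row))
```

The last printed row holds the column totals, which are the numbers carried into the rest of the calculation.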

Calculating r

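Recall:

$$\bar{y} = \frac{\sum y}{n}$$

(and similarly for $\bar{x}$). So,

$$\bar{x} = \frac{290}{5} = 58, \qquad \bar{y} = \frac{330}{5} = 66$$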

Calculating r

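Recall:

$$s_y = \sqrt{\frac{\sum (y - \bar{y})^2}{n - 1}}$$

(and similarly for $s_x$). So,

$$s_x = \sqrt{\frac{686}{4}} = 13.1, \qquad s_y = \sqrt{\frac{1010}{4}} = 15.9$$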

Calculating r.

Put everything into the formula:

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y} = \frac{828}{(5 - 1)(13.1)(15.9)} = 0.994$$
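As a check on the hand calculation, the same numbers can be run through Python. The sketch below (variable names are my own) computes r twice: once with the standard deviations at full precision and once with them rounded to one decimal, as in the hand calculation. With full precision r comes out near 0.9947, so the 0.994 above reflects rounding $s_x$ and $s_y$ before dividing.

```python
from math import sqrt

n = 5
# Column totals from the femur/humerus table.
sum_sq_x, sum_sq_y, sum_products = 686, 1010, 828

s_x = sqrt(sum_sq_x / (n - 1))
s_y = sqrt(sum_sq_y / (n - 1))

# r with the standard deviations kept at full precision.
r_unrounded = sum_products / ((n - 1) * s_x * s_y)

# Same formula, but with the SDs rounded to one decimal first.
r_rounded_sds = sum_products / ((n - 1) * round(s_x, 1) * round(s_y, 1))

print(f"s_x = {s_x:.1f}, s_y = {s_y:.1f}")
print(f"r with unrounded SDs = {r_unrounded:.4f}")
print(f"r with rounded SDs   = {r_rounded_sds:.3f}")
```

Either way, the correlation is very close to 1, which matches the tight, increasing pattern in the data.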

Properties of r

r has no units (i.e., just a number)

r measures the strength of a LINEAR association between two quantitative variables.
• If the data have a curvilinear relationship, the correlation may not be strong even if the data follow the curve very closely.
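A quick way to see the curvilinear point is to compute r for points that lie exactly on a curve. The data here are hypothetical (points on y = x^2, symmetric around 0), and `correlation` is my own helper written from the formula for r.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Pearson correlation r computed from the definition."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return total / ((n - 1) * stdev(xs) * stdev(ys))


# Hypothetical data: every point lies exactly on the curve y = x^2.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r_curve = correlation(xs, ys)
print(f"r for points on y = x^2: {r_curve:.3f}")
```

The points follow the curve perfectly, yet r is essentially 0, because r only measures how well a straight line describes the data.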

Properties of r

• r always ranges in value from –1 to 1.
• r = 1 indicates a straight increasing line.
• r = -1 indicates a straight decreasing line.
• r = 0 indicates no LINEAR relationship.
• As r moves away from 0, the linear relationship between the variables is stronger.

Properties of r

• Changing the scale (units) of x or y will not change the value of r.
• r is not resistant to outliers.
• Strong correlation ≠ causation. A strong linear relationship between two variables is NOT proof of a causal relationship!
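Two of these properties can be tried out on the femur and humerus data. In the sketch below, the factor of 10 stands in for a hypothetical change of units, and the extra point (100, 40) is made up purely to act as an outlier. The `correlation` helper is my own, written from the formula for r.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Pearson correlation r computed from the definition."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return total / ((n - 1) * stdev(xs) * stdev(ys))


femur = [38, 56, 59, 63, 74]
humerus = [41, 63, 70, 72, 84]
r_original = correlation(femur, humerus)

# Rescale x by a positive constant (a hypothetical change of units).
femur_rescaled = [x * 10 for x in femur]
r_rescaled = correlation(femur_rescaled, humerus)

# Add one made-up point far from the pattern to see the effect of an outlier.
r_with_outlier = correlation(femur + [100], humerus + [40])

print(f"original:     {r_original:.3f}")
print(f"rescaled x:   {r_rescaled:.3f}")
print(f"with outlier: {r_with_outlier:.3f}")
```

Rescaling leaves r unchanged, while a single unusual point is enough to pull r far away from its original value.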

Reading JMP Output

The following is some output from JMP where I considered Blood Alcohol Content and Number of Beers. The explanatory variable is the number of beers. Blood alcohol content is the response variable.

Reading JMP Output

[Figure: Bivariate Fit of BAC By Beers; scatterplot with Beers on the x-axis (0 to 10) and BAC on the y-axis (0 to 0.2)]

Reading JMP Output

Summary of Fit

RSquare                       0.803536
RSquare Adj                   0.788424
Root Mean Square Error        0.02092
Mean of Response              0.076
Observations (or Sum Wgts)    15

Reading JMP Output

RSquare = $r^2$

This means $r = \sqrt{\text{RSquare}} = \sqrt{0.803536} = 0.896$. I know this is positive because the scatterplot has a positive direction.

The Mean of the Response is the mean of the y's, or $\bar{y}$.
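Recovering r from the JMP output takes two pieces of information: RSquare for the size, and the scatterplot for the sign. A small sketch of that reasoning follows; the flag `direction_is_positive` is my own name for what is read off the plot.

```python
from math import sqrt

r_square = 0.803536           # RSquare from the Summary of Fit
direction_is_positive = True  # read off the scatterplot, not from RSquare

# RSquare only gives the size of r; the sign comes from the direction.
magnitude = sqrt(r_square)
r = magnitude if direction_is_positive else -magnitude

print(f"r = {r:.3f}")
```

Had the scatterplot sloped downward, the same RSquare would correspond to r of the same size with a negative sign.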