Agresti/Franklin Statistics, 1 of 63 Section 2.4 How Can We Describe the Spread of Quantitative Data?

Page 1:

Section 2.4

How Can We Describe the Spread of Quantitative Data?

Page 2:

Measuring Spread: Range

Range: difference between the largest and smallest observations

Page 3:

Measuring Spread: Standard Deviation

Creates a measure of variation by summarizing the deviations of each observation from the mean and calculating an adjusted average of these deviations

$$ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} $$
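As a rough illustration of the two spread measures so far, the sketch below computes the range and the sample standard deviation (sum of squared deviations divided by n - 1, then a square root). The five observations are hypothetical values chosen only to keep the arithmetic easy to follow, and the function names are assumptions made for this example.

```python
import math
import statistics

def sample_range(data):
    """Largest observation minus smallest observation."""
    return max(data) - min(data)

def sample_std(data):
    """Square root of the sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    squared_deviations = [(x - mean) ** 2 for x in data]
    return math.sqrt(sum(squared_deviations) / (n - 1))

# Hypothetical observations, not taken from the slides.
observations = [1, 2, 3, 4, 5]

print(sample_range(observations))       # 4
print(round(sample_std(observations), 2))  # 1.58
# The standard library's stdev uses the same n - 1 divisor.
print(round(statistics.stdev(observations), 2))  # 1.58
```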

Page 4:

Empirical Rule

For bell-shaped data sets:

Approximately 68% of the observations fall within 1 standard deviation of the mean

Approximately 95% of the observations fall within 2 standard deviations of the mean

Nearly all (approximately 99.7%) of the observations fall within 3 standard deviations of the mean
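One way to see the empirical rule at work is to count how many observations land within 1, 2, and 3 standard deviations of the mean. The sketch below does this for simulated bell-shaped data; the simulated values (mean 100, standard deviation 15, fixed seed) and the helper name `fraction_within` are assumptions for illustration, not part of the slides.

```python
import random
import statistics

def fraction_within(data, k):
    """Fraction of observations within k standard deviations of the mean."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    inside = [x for x in data if abs(x - mean) <= k * s]
    return len(inside) / len(data)

# Simulated bell-shaped data (an assumption for illustration only).
rng = random.Random(0)
simulated = [rng.gauss(100, 15) for _ in range(10_000)]

for k in (1, 2, 3):
    # Expect values near 0.68, 0.95, and just under 1.00 respectively.
    print(k, round(fraction_within(simulated, k), 3))
```

For data that are skewed or otherwise not bell-shaped, these fractions can differ noticeably from the rule.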

Page 5:

Parameter and Statistic

A parameter is a numerical summary of the population

A statistic is a numerical summary of a sample taken from a population

Page 6:

Section 2.5

How Can Measures of Position Describe Spread?

Page 7:

Quartiles

Quartiles split the data into four parts

The median is the second quartile, Q2

The first quartile, Q1, is the median of the lower half of the observations

The third quartile, Q3, is the median of the upper half of the observations

Page 8:

Example: Find the first and third quartiles

Prices per share of 10 most actively traded stocks on NYSE (rounded to nearest $)

2 4 11 12 13 15 31 31 37 47

a. Q1 = 2 Q3 = 47

b. Q1 = 12 Q3 = 31

c. Q1 = 11 Q3 = 31

d. Q1 = 11.5 Q3 = 32
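The answer can be checked with a short sketch of the median-of-halves rule described on the Quartiles slide. The function name `quartiles` is hypothetical, and leaving the middle value out of both halves when the number of observations is odd is an assumption (it does not matter here, since there are 10 prices). Statistical software often interpolates instead and may report slightly different quartiles.

```python
import statistics

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule.

    Assumption: with an odd number of observations, the middle value
    is excluded from both the lower and the upper half.
    """
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]        # lower half of the observations
    upper = ordered[n - half:]    # upper half of the observations
    return (statistics.median(lower),
            statistics.median(ordered),
            statistics.median(upper))

# Prices per share from the example above.
prices = [2, 4, 11, 12, 13, 15, 31, 31, 37, 47]
q1, q2, q3 = quartiles(prices)
print(q1, q2, q3)  # 11 14.0 31
```

Under this rule the lower half is 2 4 11 12 13 and the upper half is 15 31 31 37 47, which corresponds to choice c.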

Page 9:

Measuring Spread: Interquartile Range

The interquartile range is the distance between the third quartile and first quartile:

IQR = Q3 – Q1

Page 10:

Detecting Potential Outliers

An observation is a potential outlier if it falls more than 1.5 × IQR below the first quartile or more than 1.5 × IQR above the third quartile
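The 1.5 × IQR criterion can be sketched as two cutoffs, one below Q1 and one above Q3. The example applies it to the ten stock prices from the quartile example, taking Q1 = 11 and Q3 = 31 from the median-of-halves rule; the function names are hypothetical.

```python
def outlier_cutoffs(q1, q3):
    """Lower and upper cutoffs: 1.5 x IQR below Q1 and above Q3."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def potential_outliers(data, q1, q3):
    """Observations that fall outside the two cutoffs."""
    low, high = outlier_cutoffs(q1, q3)
    return [x for x in data if x < low or x > high]

# Stock prices from the quartile example, with Q1 = 11 and Q3 = 31.
prices = [2, 4, 11, 12, 13, 15, 31, 31, 37, 47]
print(outlier_cutoffs(11, 31))             # (-19.0, 61.0)
print(potential_outliers(prices, 11, 31))  # []
```

Here IQR = 31 - 11 = 20, so the cutoffs are 11 - 30 = -19 and 31 + 30 = 61, and none of the prices is flagged.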

Page 11:

The Five-Number Summary

The five-number summary of a dataset consists of:

• Minimum value

• First Quartile

• Median

• Third Quartile

• Maximum value

Page 12:

Boxplot

A box is constructed from Q1 to Q3

A line is drawn inside the box at the median

A line extends outward from the lower end of the box to the smallest observation that is not a potential outlier

A line extends outward from the upper end of the box to the largest observation that is not a potential outlier

Page 13:

Boxplot for Sodium Data

Sodium Data (sorted):

0 70 125 125 140 150 170 170 180 200

200 210 210 220 220 230 250 260 290 290

Five-Number Summary:

Min: 0

Q1: 145

Med: 200

Q3: 225

Max: 290
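The five-number summary on this slide can be reproduced from the 20 sodium values, and the same quantities give the whisker ends for the boxplot. This is a minimal sketch assuming the median-of-halves quartile rule; the function and variable names are hypothetical.

```python
import statistics

# The 20 sodium values listed on this slide.
sodium = [0, 70, 125, 125, 140, 150, 170, 170, 180, 200,
          200, 210, 210, 220, 220, 230, 250, 260, 290, 290]

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    q1 = statistics.median(ordered[:half])
    q3 = statistics.median(ordered[n - half:])
    return ordered[0], q1, statistics.median(ordered), q3, ordered[-1]

minimum, q1, med, q3, maximum = five_number_summary(sodium)
print(minimum, q1, med, q3, maximum)  # 0 145.0 200.0 225.0 290

# Potential outliers lie more than 1.5 x IQR outside the quartiles.
iqr = q3 - q1
low_cutoff, high_cutoff = q1 - 1.5 * iqr, q3 + 1.5 * iqr
flagged = [x for x in sodium if x < low_cutoff or x > high_cutoff]
kept = [x for x in sodium if low_cutoff <= x <= high_cutoff]

print(flagged)               # [0]
print(min(kept), max(kept))  # 70 290 (ends of the boxplot whiskers)
```

With IQR = 80 the cutoffs are 25 and 345, so the value 0 is a potential outlier and the lower whisker stops at 70.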

Page 14:

Boxplot for Sodium in Cereals

Sodium Data:

0 210 260 125 220 290 210 140 220 200

125 170 250 150 170 70 230 200 290 180

Page 15:

Z-Score

The z-score for an observation measures how far an observation is from the mean in standard deviation units

An observation in a bell-shaped distribution is a potential outlier if its z-score < -3 or > +3

$$ z = \frac{\text{observation} - \text{mean}}{\text{standard deviation}} $$
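As a small worked case, the sketch below computes the z-score of the smallest sodium value (0) from the cereal data on the earlier boxplot slides. The function name `z_score` is hypothetical, and whether the sodium data are close enough to bell-shaped for the ±3 criterion is a separate judgment that the code does not address.

```python
import statistics

def z_score(observation, mean, sd):
    """Number of standard deviations that an observation lies from the mean."""
    return (observation - mean) / sd

# Sodium values from the cereal boxplot slides.
sodium = [0, 70, 125, 125, 140, 150, 170, 170, 180, 200,
          200, 210, 210, 220, 220, 230, 250, 260, 290, 290]

mean = statistics.mean(sodium)   # 185.5
sd = statistics.stdev(sodium)    # sample standard deviation, roughly 71
z = z_score(0, mean, sd)

print(round(z, 1))       # -2.6
print(z < -3 or z > 3)   # False
```

The value 0 sits about 2.6 standard deviations below the mean, so it is not flagged by the z-score criterion even though the 1.5 × IQR criterion flags it.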

Page 16:

Chapter 3

Association: Contingency, Correlation, and Regression

Learn …

How to examine links between two variables

Page 17:

Variables

Response variable: the outcome variable

Explanatory variable: the variable that explains the outcome variable

Page 18:

Association

An association exists between the two variables if a particular value for one variable is more likely to occur with certain values of the other variable

Page 19:

Section 3.1

How Can We Explore the Association Between Two Categorical Variables?

Page 20:

Example: Food Type and Pesticide Status

Page 21:

Example: Food Type and Pesticide Status

What is the response variable? What is the explanatory variable?

                Pesticides:
Food Type:      Yes      No
Organic         29       98
Conventional    19485    7086

Page 22:

Example: Food Type and Pesticide Status

What proportion of organic foods contain pesticides?

What proportion of conventionally grown foods contain pesticides?

                Pesticides:
Food Type:      Yes      No
Organic         29       98
Conventional    19485    7086

Page 23:

Example: Food Type and Pesticide Status

What proportion of all sampled items contain pesticide residues?

                Pesticides:
Food Type:      Yes      No
Organic         29       98
Conventional    19485    7086
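The overall proportion pools both food types: the items with pesticides in either row, divided by all sampled items. A minimal sketch using the counts from the table (the dictionary layout is just one convenient, assumed representation):

```python
# Counts from the Food Type and Pesticide Status table.
counts = {
    "Organic": {"Yes": 29, "No": 98},
    "Conventional": {"Yes": 19485, "No": 7086},
}

# Total number of sampled items and the number with pesticides present.
total_items = sum(sum(row.values()) for row in counts.values())
total_yes = sum(row["Yes"] for row in counts.values())
overall = total_yes / total_items

print(total_yes, total_items, round(overall, 2))  # 19514 26698 0.73
```

Because conventional items make up almost all of the sample, the overall proportion is nearly the same as the proportion for conventional foods alone.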

Page 24:

Contingency Table

The Food Type and Pesticide Status Table is called a contingency table

A contingency table:

• Displays 2 categorical variables

• The rows list the categories of 1 variable

• The columns list the categories of the other variable

• Entries in the table are frequencies

Page 25:

Example: Food Type and Pesticide Status

Contingency Table Showing Conditional Proportions

Page 26:

Example: Food Type and Pesticide Status

What is the sum over each row?

What proportion of organic foods contained pesticide residues?

What proportion of conventional foods contained pesticide residues?

                Pesticides:
Food Type:      Yes      No
Organic         0.23     0.77
Conventional    0.73     0.27
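The conditional proportions in this table come from dividing each count by its row total, which is why each row sums to 1. The sketch below recomputes them from the original counts; the function name `conditional_proportions` and the dictionary layout are assumptions for illustration.

```python
# Counts from the Food Type and Pesticide Status contingency table.
counts = {
    "Organic": {"Yes": 29, "No": 98},
    "Conventional": {"Yes": 19485, "No": 7086},
}

def conditional_proportions(table):
    """Divide each count by its row total (proportions conditional on food type)."""
    result = {}
    for food_type, row in table.items():
        row_total = sum(row.values())
        result[food_type] = {status: count / row_total
                             for status, count in row.items()}
    return result

for food_type, row in conditional_proportions(counts).items():
    rounded = {status: round(p, 2) for status, p in row.items()}
    print(food_type, rounded, round(sum(row.values()), 2))
# Organic {'Yes': 0.23, 'No': 0.77} 1.0
# Conventional {'Yes': 0.73, 'No': 0.27} 1.0
```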

Page 27:

Example: Food Type and Pesticide Status

Page 28:

Example: For the following pair of variables, which is the response variable and which is the explanatory variable?

College grade point average (GPA) and high school GPA

a. College GPA: response variable and High School GPA: explanatory variable

b. College GPA: explanatory variable and High School GPA: response variable

Page 29:

Section 3.2

How Can We Explore the Association Between Two Quantitative Variables?

Page 30:

Scatterplot

Graphical display of two quantitative variables:

• Horizontal Axis: Explanatory variable, x

• Vertical Axis: Response variable, y

Page 31:

Example: Internet Usage and Gross Domestic Product (GDP)

Page 32:

Positive Association

Two quantitative variables, x and y, are said to have a positive association when high values of x tend to occur with high values of y, and when low values of x tend to occur with low values of y

Page 33:

Negative Association

Two quantitative variables, x and y, are said to have a negative association when high values of x tend to occur with low values of y, and when low values of x tend to occur with high values of y

Page 34:

Example: Did the Butterfly Ballot Cost Al Gore the 2000 Presidential Election?

Page 35:

Linear Correlation: r

Measures the strength of the linear association between x and y

• A positive r-value indicates a positive association

• A negative r-value indicates a negative association

• An r-value close to +1 or -1 indicates a strong linear association

• An r-value close to 0 indicates a weak linear association

Page 36:

Calculating the correlation, r

$$ r = \frac{1}{n - 1} \sum \left( \frac{x - \bar{x}}{s_x} \right) \left( \frac{y - \bar{y}}{s_y} \right) $$
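The correlation can be computed directly from its definition: convert each x and each y to a z-score, multiply the paired z-scores, add the products, and divide by n - 1. The sketch below follows those steps; the function name `correlation` and the small data sets are hypothetical and chosen only so the results are easy to check by hand.

```python
import statistics

def correlation(xs, ys):
    """Sum of products of paired z-scores, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)

# Hypothetical data, not from the slides.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]     # rises exactly in step with x
y_down = [10, 8, 6, 4, 2]   # falls exactly in step with x
y_mixed = [2, 1, 4, 3, 5]   # rises with x, but not perfectly

print(round(correlation(x, y_up), 2))     # 1.0
print(round(correlation(x, y_down), 2))   # -1.0
print(round(correlation(x, y_mixed), 2))  # 0.8
```

The sign of r matches the direction of the association, and values closer to +1 or -1 correspond to points lying closer to a straight line.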

Page 37:

Example: 100 cars on the lot of a used-car dealership

Would you expect a positive association, a negative association or no association between the age of the car and the mileage on the odometer?

Positive association

Negative association

No association