Agresti/Franklin Statistics, 1 of 63 Section 2.4 How Can We Describe the Spread of Quantitative...

Post on 20-Jan-2016


Agresti/Franklin Statistics, 1 of 63

Section 2.4

How Can We Describe the Spread of Quantitative Data?

Agresti/Franklin Statistics, 2 of 63

Measuring Spread: Range

Range: difference between the largest and smallest observations

Agresti/Franklin Statistics, 3 of 63

Measuring Spread: Standard Deviation

Creates a measure of variation by summarizing the deviations of each observation from the mean and calculating an adjusted average of these deviations

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
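As a minimal sketch of the range and standard deviation calculations, the Python below squares each deviation from the mean, sums them, divides by n - 1, and takes the square root. The function name `sample_std` and the data values are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
import math

def sample_std(values):
    """Sample standard deviation using the n - 1 divisor."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    # Squared deviation of each observation from the mean.
    squared_deviations = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / (n - 1))

# Hypothetical data (not from the slides).
data = [2, 4, 4, 4, 5, 5, 7, 9]
data_range = max(data) - min(data)  # largest minus smallest observation
s = sample_std(data)
print(f"range = {data_range}, s = {s:.3f}")
```

The standard library's `statistics.stdev` uses the same n - 1 divisor and can serve as a cross-check.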

Agresti/Franklin Statistics, 4 of 63

Empirical Rule

For bell-shaped data sets:

Approximately 68% of the observations fall within 1 standard deviation of the mean

Approximately 95% of the observations fall within 2 standard deviations of the mean

Nearly all (approximately 99.7% under an exactly normal distribution) of the observations fall within 3 standard deviations of the mean
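One way to see the empirical rule in action is to simulate bell-shaped data and count how many observations land within 1, 2, and 3 standard deviations of the mean. This is only a sketch: the simulated sample (mean 100, standard deviation 15, fixed seed) and the helper `share_within` are assumptions made for illustration, not part of the slides.

```python
import random
import statistics

def share_within(values, k):
    """Fraction of observations within k standard deviations of the mean."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    inside = sum(1 for x in values if abs(x - mean) <= k * sd)
    return inside / len(values)

# Simulated bell-shaped data; the parameters and seed are arbitrary choices.
rng = random.Random(0)
sample = [rng.gauss(100, 15) for _ in range(100_000)]

shares = {k: share_within(sample, k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} SD: {share:.3f}")
```

With a sample this large the three shares should sit close to 0.68, 0.95, and just under 1. Skewed or multi-peaked data can depart noticeably from these figures.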

Agresti/Franklin Statistics, 5 of 63

Parameter and Statistic

A parameter is a numerical summary of the population

A statistic is a numerical summary of a sample taken from a population

Agresti/Franklin Statistics, 6 of 63

Section 2.5

How Can Measures of Position Describe Spread?

Agresti/Franklin Statistics, 7 of 63

Quartiles

Quartiles split the data into four parts

The median is the second quartile, Q2

The first quartile, Q1, is the median of the lower half of the observations

The third quartile, Q3, is the median of the upper half of the observations

Agresti/Franklin Statistics, 8 of 63

Example: Find the first and third quartiles

Prices per share of 10 most actively traded stocks on NYSE (rounded to nearest $)

2 4 11 12 13 15 31 31 37 47

a. Q1 = 2 Q3 = 47

b. Q1 = 12 Q3 = 31

c. Q1 = 11 Q3 = 31

d. Q1 = 11.5 Q3 = 32
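The median-of-halves definition can be checked against the stock prices with a short sketch. The function name `quartiles` is hypothetical. For an odd number of observations this version leaves the median out of both halves, which is one common convention, and software packages often use slightly different rules.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves definition."""
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two observations")
    half = len(ordered) // 2
    lower = ordered[:half]   # observations below the median position
    upper = ordered[-half:]  # observations above the median position
    return median(lower), median(ordered), median(upper)

# Prices per share from the example slide.
prices = [2, 4, 11, 12, 13, 15, 31, 31, 37, 47]
q1, q2, q3 = quartiles(prices)
print(f"Q1 = {q1}, median = {q2}, Q3 = {q3}")
```

Under this definition the lower half is 2 4 11 12 13 and the upper half is 15 31 31 37 47, so the sketch reports Q1 = 11 and Q3 = 31, matching choice c.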

Agresti/Franklin Statistics, 9 of 63

Measuring Spread: Interquartile Range

The interquartile range is the distance between the third quartile and first quartile:

IQR = Q3 – Q1

Agresti/Franklin Statistics, 10 of 63

Detecting Potential Outliers

An observation is a potential outlier if it falls more than 1.5 × IQR below the first quartile or more than 1.5 × IQR above the third quartile
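A small sketch of the 1.5 × IQR criterion, applied to the stock prices from the quartile example with Q1 = 11 and Q3 = 31 taken from the median-of-halves calculation. The helper names `outlier_fences` and `potential_outliers` are hypothetical.

```python
def outlier_fences(q1, q3):
    """Lower and upper cutoffs for the 1.5 x IQR criterion."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def potential_outliers(values, q1, q3):
    """Observations falling outside the two cutoffs."""
    low, high = outlier_fences(q1, q3)
    return [x for x in values if x < low or x > high]

# Stock prices from the quartile example; Q1 and Q3 are the medians
# of the lower and upper halves of the ordered data.
prices = [2, 4, 11, 12, 13, 15, 31, 31, 37, 47]
q1, q3 = 11, 31
low, high = outlier_fences(q1, q3)
flagged = potential_outliers(prices, q1, q3)
print(f"cutoffs: {low} and {high}; flagged: {flagged}")
```

Here IQR = 20, so the cutoffs are 11 - 30 = -19 and 31 + 30 = 61, and none of the ten prices is flagged.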

Agresti/Franklin Statistics, 11 of 63

The Five-Number Summary

The five-number summary of a dataset:

• Minimum value

• First Quartile

• Median

• Third Quartile

• Maximum value

Agresti/Franklin Statistics, 12 of 63

Boxplot

A box is constructed from Q1 to Q3

A line is drawn inside the box at the median

A line extends outward from the lower end of the box to the smallest observation that is not a potential outlier

A line extends outward from the upper end of the box to the largest observation that is not a potential outlier

Agresti/Franklin Statistics, 13 of 63

Boxplot for Sodium Data

Sodium Data (sorted):

0 70 125 125 140 150 170 170 180 200

200 210 210 220 220 230 250 260 290 290

Five-Number Summary:

Min: 0
Q1: 145
Med: 200
Q3: 225
Max: 290
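The five-number summary for the sodium data, and the points where the boxplot whiskers would end, can be reproduced with a sketch like the following. The function name `five_number_summary` is hypothetical, and the quartiles use the median-of-halves definition from Section 2.5.

```python
from statistics import median

# Sodium values from the slide.
sodium = [0, 70, 125, 125, 140, 150, 170, 170, 180, 200,
          200, 210, 210, 220, 220, 230, 250, 260, 290, 290]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using medians of the two halves."""
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two observations")
    half = len(ordered) // 2
    return (ordered[0], median(ordered[:half]), median(ordered),
            median(ordered[-half:]), ordered[-1])

minimum, q1, med, q3, maximum = five_number_summary(sodium)

# 1.5 x IQR cutoffs decide where the whiskers stop.
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in sodium if x < low_fence or x > high_fence]
inside = [x for x in sodium if low_fence <= x <= high_fence]
whisker_low, whisker_high = min(inside), max(inside)

print(minimum, q1, med, q3, maximum)
print("potential outliers:", outliers)
print("whiskers end at:", whisker_low, whisker_high)
```

With IQR = 80 the cutoffs are 25 and 345, so the value 0 is a potential outlier and the lower whisker stops at 70 rather than at the minimum.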

Agresti/Franklin Statistics, 14 of 63

Boxplot for Sodium in Cereals

Sodium Data: 0 210 260 125 220 290 210 140 220 200 125 170 250 150 170 70 230 200 290 180

Agresti/Franklin Statistics, 15 of 63

Z-Score

The z-score for an observation measures how far an observation is from the mean in standard deviation units

An observation in a bell-shaped distribution is a potential outlier if its z-score < -3 or > +3

$$z = \frac{\text{observation} - \text{mean}}{\text{standard deviation}}$$
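As an illustration of the z-score formula, the sketch below computes z-scores for the sodium data from the boxplot slides. The helper `z_score` is a hypothetical name, and the sample standard deviation (n - 1 divisor) is assumed. The ±3 criterion is intended for bell-shaped distributions, so applying it to this data set is only a demonstration of the arithmetic.

```python
import statistics

# Sodium values from the boxplot slides.
sodium = [0, 70, 125, 125, 140, 150, 170, 170, 180, 200,
          200, 210, 210, 220, 220, 230, 250, 260, 290, 290]

mean = statistics.fmean(sodium)
sd = statistics.stdev(sodium)  # sample standard deviation, n - 1 divisor

def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

z_scores = {x: z_score(x, mean, sd) for x in sorted(set(sodium))}
flagged = [x for x, z in z_scores.items() if z < -3 or z > 3]

print(f"mean = {mean:.1f}, sd = {sd:.2f}")
print(f"z-score of 0: {z_scores[0]:.2f}")
print("flagged by the z-score criterion:", flagged)
```

The value 0 lies between 2 and 3 standard deviations below the mean, so the z-score criterion does not flag it even though the 1.5 × IQR criterion does. The two criteria need not agree.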

Agresti/Franklin Statistics, 16 of 63

Chapter 3

Association: Contingency, Correlation, and Regression

Learn how to examine links between two variables

Agresti/Franklin Statistics, 17 of 63

Variables

Response variable: the outcome variable

Explanatory variable: the variable that explains the outcome variable

Agresti/Franklin Statistics, 18 of 63

Association

An association exists between the two variables if a particular value for one variable is more likely to occur with certain values of the other variable

Agresti/Franklin Statistics, 19 of 63

Section 3.1

How Can We Explore the Association Between Two Categorical Variables?

Agresti/Franklin Statistics, 20 of 63

Example: Food Type and Pesticide Status

Agresti/Franklin Statistics, 21 of 63

Example: Food Type and Pesticide Status

What is the response variable? What is the explanatory variable?

                 Pesticides
Food Type        Yes      No
Organic           29      98
Conventional   19485    7086

Agresti/Franklin Statistics, 22 of 63

Example: Food Type and Pesticide Status

What proportion of organic foods contain pesticides?

What proportion of conventionally grown foods contain pesticides?

                 Pesticides
Food Type        Yes      No
Organic           29      98
Conventional   19485    7086

Agresti/Franklin Statistics, 23 of 63

Example: Food Type and Pesticide Status

What proportion of all sampled items contain pesticide residues?

                 Pesticides
Food Type        Yes      No
Organic           29      98
Conventional   19485    7086

Agresti/Franklin Statistics, 24 of 63

Contingency Table

The Food Type and Pesticide Status Table is called a contingency table

A contingency table:

• Displays 2 categorical variables

• The rows list the categories of 1 variable

• The columns list the categories of the other variable

• Entries in the table are frequencies

Agresti/Franklin Statistics, 25 of 63

Example: Food Type and Pesticide Status

Contingency Table Showing Conditional Proportions

Agresti/Franklin Statistics, 26 of 63

Example: Food Type and Pesticide Status

What is the sum over each row?

What proportion of organic foods contained pesticide residues?

What proportion of conventional foods contained pesticide residues?

                 Pesticides
Food Type        Yes      No
Organic         0.23    0.77
Conventional    0.73    0.27
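The conditional proportions can be reproduced from the counts in the contingency table by dividing each cell by its row total. The sketch below uses a hypothetical helper `row_proportions` and also computes the overall proportion of sampled items with pesticide residues.

```python
# Counts from the Food Type and Pesticide Status table.
counts = {
    "Organic": {"Yes": 29, "No": 98},
    "Conventional": {"Yes": 19485, "No": 7086},
}

def row_proportions(table):
    """Divide each cell by its row total (proportions conditional on the row)."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: n / total for col, n in cells.items()}
    return result

conditional = row_proportions(counts)

# Overall proportion of items with residues, ignoring food type.
total_yes = sum(cells["Yes"] for cells in counts.values())
total_items = sum(sum(cells.values()) for cells in counts.values())
overall_yes = total_yes / total_items

for row, cells in conditional.items():
    print(row, {col: round(p, 2) for col, p in cells.items()})
print("overall:", round(overall_yes, 2))
```

Each row of conditional proportions sums to 1. Organic works out to 29/127 ≈ 0.23 and conventional to 19485/26571 ≈ 0.73. The overall proportion is also about 0.73 because conventional items make up nearly all of the sample.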

Agresti/Franklin Statistics, 27 of 63

Example: Food Type and Pesticide Status

Agresti/Franklin Statistics, 28 of 63

Example: For the following pair of variables, which is the response variable and which is the explanatory variable?

College grade point average (GPA) and high school GPA

a. College GPA: response variable and High School GPA: explanatory variable

b. College GPA: explanatory variable and High School GPA: response variable

Agresti/Franklin Statistics, 29 of 63

Section 3.2

How Can We Explore the Association Between Two Quantitative Variables?

Agresti/Franklin Statistics, 30 of 63

Scatterplot

Graphical display of two quantitative variables:

• Horizontal Axis: Explanatory variable, x

• Vertical Axis: Response variable, y

Agresti/Franklin Statistics, 31 of 63

Example: Internet Usage and Gross Domestic Product (GDP)

Agresti/Franklin Statistics, 32 of 63

Positive Association

Two quantitative variables, x and y, are said to have a positive association when high values of x tend to occur with high values of y, and when low values of x tend to occur with low values of y

Agresti/Franklin Statistics, 33 of 63

Negative Association

Two quantitative variables, x and y, are said to have a negative association when high values of x tend to occur with low values of y, and when low values of x tend to occur with high values of y

Agresti/Franklin Statistics, 34 of 63

Example: Did the Butterfly Ballot Cost Al Gore the 2000 Presidential Election?

Agresti/Franklin Statistics, 35 of 63

Linear Correlation: r

Measures the strength of the linear association between x and y

• A positive r-value indicates a positive association

• A negative r-value indicates a negative association

• An r-value close to +1 or -1 indicates a strong linear association

• An r-value close to 0 indicates a weak association

Agresti/Franklin Statistics, 36 of 63

Calculating the correlation, r

$$r = \frac{1}{n - 1} \sum \left( \frac{x - \bar{x}}{s_x} \right) \left( \frac{y - \bar{y}}{s_y} \right)$$
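The correlation formula can be followed step by step in a short sketch: standardize each x and y with its own mean and sample standard deviation, multiply the pairs, sum, and divide by n - 1. The function names and the data are hypothetical, chosen so the result can be checked by hand.

```python
import math

def sample_std(values):
    """Sample standard deviation using the n - 1 divisor."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))

def correlation(xs, ys):
    """Correlation r as the adjusted average of products of z-scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x, s_y = sample_std(xs), sample_std(ys)
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)

# Hypothetical data (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation(xs, ys)
r_reversed = correlation(xs, [-x for x in xs])  # perfectly decreasing line
print(f"r = {r:.3f}, reversed = {r_reversed:.3f}")
```

For these values r is positive (about 0.77), and the perfectly decreasing line gives r = -1, in line with the sign and strength interpretations on the previous slide.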

Agresti/Franklin Statistics, 37 of 63

Example: 100 cars on the lot of a used-car dealership

Would you expect a positive association, a negative association or no association between the age of the car and the mileage on the odometer?

• Positive association

• Negative association

• No association