Post on 20-Jan-2016
Agresti/Franklin Statistics, 1 of 63
Section 2.4
How Can We Describe the Spread of Quantitative Data?
Measuring Spread: Range
Range: difference between the largest and smallest observations
Measuring Spread: Standard Deviation
The standard deviation, s, measures variation by summarizing how far each observation deviates from the mean: it is the square root of an adjusted average (dividing by n - 1) of the squared deviations
$$ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} $$
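As a rough illustration of the two spread measures so far, the sketch below computes the range and the sample standard deviation directly from their definitions. The function names and the list `observations` are hypothetical, chosen only for this example.

```python
import math

def value_range(data):
    """Range: largest observation minus smallest observation."""
    return max(data) - min(data)

def sample_std(data):
    """Sample standard deviation s, dividing the squared deviations by n - 1."""
    n = len(data)
    mean = sum(data) / n
    squared_deviations = [(x - mean) ** 2 for x in data]
    return math.sqrt(sum(squared_deviations) / (n - 1))

# Hypothetical observations, used only for illustration.
observations = [2, 4, 4, 4, 5, 5, 7, 9]

print(value_range(observations))           # 7
print(round(sample_std(observations), 3))  # 2.138
```

Here the mean is 5, the squared deviations sum to 32, and s is the square root of 32 / 7.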
Empirical Rule
For bell-shaped data sets:
Approximately 68% of the observations fall within 1 standard deviation of the mean
Approximately 95% of the observations fall within 2 standard deviations of the mean
All or nearly all of the observations fall within 3 standard deviations of the mean
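The percentages in the empirical rule can be checked against an exactly normal distribution, which is one common model for bell-shaped data (an assumption here; real data sets will only match approximately). A minimal sketch using Python's standard library, with the hypothetical helper `share_within`:

```python
from statistics import NormalDist

# Assumption: a standard normal curve stands in for "bell-shaped" data.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```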
Parameter and Statistic
A parameter is a numerical summary of the population
A statistic is a numerical summary of a sample taken from a population
Section 2.5
How Can Measures of Position Describe Spread?
Quartiles
Quartiles split the data into four parts
The median is the second quartile, Q2
The first quartile, Q1, is the median of the lower half of the observations
The third quartile, Q3, is the median of the upper half of the observations
Example: Find the first and third quartiles
Prices per share of 10 most actively traded stocks on NYSE (rounded to nearest $)
2 4 11 12 13 15 31 31 37 47
a. Q1 = 2, Q3 = 47
b. Q1 = 12, Q3 = 31
c. Q1 = 11, Q3 = 31
d. Q1 = 11.5, Q3 = 32
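One way to check an answer is to apply the median-of-halves definition directly. The sketch below does that for the ten stock prices; the function name `quartiles` is hypothetical, and for an odd number of observations it assumes the median itself is left out of both halves (one common convention, not the only one).

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves rule."""
    values = sorted(data)
    half = len(values) // 2
    lower = values[:half]                 # lower half of the observations
    upper = values[len(values) - half:]   # upper half of the observations
    return median(lower), median(values), median(upper)

prices = [2, 4, 11, 12, 13, 15, 31, 31, 37, 47]
print(quartiles(prices))  # (11, 14.0, 31)
```

The lower half is 2 4 11 12 13 and the upper half is 15 31 31 37 47, so Q1 = 11 and Q3 = 31, which corresponds to choice c.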
Measuring Spread: Interquartile Range
The interquartile range is the distance between the third quartile and first quartile:
IQR = Q3 – Q1
Detecting Potential Outliers
An observation is a potential outlier if it falls more than 1.5 × IQR below the first quartile or more than 1.5 × IQR above the third quartile
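The 1.5 × IQR criterion amounts to computing two cutoffs and flagging anything outside them. A minimal sketch, applied to the stock prices from the quartile example with Q1 = 11 and Q3 = 31 (the medians of the lower and upper halves of those ten prices); the function names are hypothetical:

```python
def outlier_fences(q1, q3):
    """Cutoffs 1.5 x IQR below Q1 and 1.5 x IQR above Q3."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def potential_outliers(data, q1, q3):
    """Observations that fall outside the two cutoffs."""
    low, high = outlier_fences(q1, q3)
    return [x for x in data if x < low or x > high]

prices = [2, 4, 11, 12, 13, 15, 31, 31, 37, 47]
print(outlier_fences(11, 31))              # (-19.0, 61.0)
print(potential_outliers(prices, 11, 31))  # []
```

With IQR = 20 the cutoffs are -19 and 61, so none of the ten prices is a potential outlier.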
The Five-Number Summary
The five-number summary of a dataset:
• Minimum value
• First Quartile
• Median
• Third Quartile
• Maximum value
Boxplot
A box is constructed from Q1 to Q3
A line is drawn inside the box at the median
A line extends outward from the lower end of the box to the smallest observation that is not a potential outlier
A line extends outward from the upper end of the box to the largest observation that is not a potential outlier
Boxplot for Sodium Data
Sodium Data (sorted): 0 70 125 125 140 150 170 170 180 200 200 210 210 220 220 230 250 260 290 290
Five-Number Summary:
Min: 0
Q1: 145
Med: 200
Q3: 225
Max: 290
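The five-number summary for the sodium data can be reproduced with the median-of-halves rule, and the same quartiles give the whisker ends and any potential outliers for the boxplot. This is a sketch under the 1.5 × IQR criterion; the function and variable names are hypothetical.

```python
from statistics import median

sodium = [0, 70, 125, 125, 140, 150, 170, 170, 180, 200,
          200, 210, 210, 220, 220, 230, 250, 260, 290, 290]

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    values = sorted(data)
    half = len(values) // 2
    q1 = median(values[:half])
    q3 = median(values[len(values) - half:])
    return min(values), q1, median(values), q3, max(values)

minimum, q1, med, q3, maximum = five_number_summary(sodium)
print(minimum, q1, med, q3, maximum)  # 0 145.0 200.0 225.0 290

# Boxplot details: whiskers stop at the most extreme non-outlier values.
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
inside = [x for x in sodium if low_fence <= x <= high_fence]
whiskers = (min(inside), max(inside))
outliers = [x for x in sodium if x < low_fence or x > high_fence]
print(whiskers, outliers)  # (70, 290) [0]
```

With IQR = 80 the cutoffs are 25 and 345, so the value 0 is a potential outlier and the lower whisker ends at 70.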
Boxplot for Sodium in Cereals
Sodium Data: 0 210 260 125 220 290 210 140 220 200 125 170 250 150 170 70 230 200 290 180
Z-Score
The z-score for an observation measures how far an observation is from the mean in standard deviation units
An observation in a bell-shaped distribution is a potential outlier if its z-score < -3 or > +3
$$ z = \frac{\text{observation} - \text{mean}}{\text{standard deviation}} $$
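To make the z-score rule concrete, the sketch below computes z-scores for the sodium data from the boxplot example, using the sample mean and sample standard deviation. The helper `z_score` and the variable names are hypothetical, and treating the sodium data as roughly bell-shaped is an assumption made only for illustration.

```python
from statistics import mean, stdev

def z_score(x, center, spread):
    """Number of standard deviations that x lies from the mean."""
    return (x - center) / spread

sodium = [0, 70, 125, 125, 140, 150, 170, 170, 180, 200,
          200, 210, 210, 220, 220, 230, 250, 260, 290, 290]

center = mean(sodium)    # sample mean, 185.5
spread = stdev(sodium)   # sample standard deviation (divides by n - 1)

z_scores = [z_score(x, center, spread) for x in sodium]
flagged = [x for x, z in zip(sodium, z_scores) if z < -3 or z > 3]

print(round(min(z_scores), 2))  # z-score of the smallest value, 0
print(flagged)                  # observations with z < -3 or z > +3
```

The smallest observation, 0, has a z-score of about -2.6, so no sodium value is flagged by the z-score criterion.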
Chapter 3
Association: Contingency, Correlation, and Regression
Learn: how to examine links between two variables
Variables
Response variable: the outcome variable
Explanatory variable: the variable that explains the outcome variable
Association
An association exists between the two variables if a particular value for one variable is more likely to occur with certain values of the other variable
Section 3.1
How Can We Explore the Association Between Two Categorical Variables?
Example: Food Type and Pesticide Status
What is the response variable? What is the explanatory variable?
                Pesticides
Food Type       Yes      No
Organic         29       98
Conventional    19485    7086
What proportion of organic foods contain pesticides?
What proportion of conventionally grown foods contain pesticides?
What proportion of all sampled items contain pesticide residues?
Contingency Table
The Food Type and Pesticide Status Table is called a contingency table
A contingency table:
• Displays 2 categorical variables
• The rows list the categories of 1 variable
• The columns list the categories of the other variable
• Entries in the table are frequencies
Example: Food Type and Pesticide Status
Contingency Table Showing Conditional Proportions
What is the sum over each row?
What proportion of organic foods contained pesticide residues?
What proportion of conventional foods contained pesticide residues?
                Pesticides
Food Type       Yes     No
Organic         0.23    0.77
Conventional    0.73    0.27
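The conditional proportions follow from dividing each count in the frequency table by its row total. A minimal sketch using the counts from the Food Type and Pesticide Status table; the dictionary layout and the function name `row_proportions` are hypothetical choices for this example.

```python
# Counts from the Food Type and Pesticide Status contingency table.
counts = {
    "Organic": {"Yes": 29, "No": 98},
    "Conventional": {"Yes": 19485, "No": 7086},
}

def row_proportions(table):
    """Divide each cell by its row total, so every row sums to 1."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: n / total for col, n in cells.items()}
    return result

proportions = row_proportions(counts)
for row, cells in proportions.items():
    print(row, {col: round(p, 2) for col, p in cells.items()})
# Organic {'Yes': 0.23, 'No': 0.77}
# Conventional {'Yes': 0.73, 'No': 0.27}

# Proportion of all sampled items with pesticide residues.
total_yes = sum(cells["Yes"] for cells in counts.values())
total_items = sum(sum(cells.values()) for cells in counts.values())
print(round(total_yes / total_items, 2))  # 0.73
```

Each row sums to 1 because the proportions are conditional on food type: 29 / 127 for organic and 19485 / 26571 for conventional.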
Example: For the following pair of variables, which is the response variable and which is the explanatory variable?
College grade point average (GPA) and high school GPA
a. College GPA: response variable and High School GPA: explanatory variable
b. College GPA: explanatory variable and High School GPA: response variable
Section 3.2
How Can We Explore the Association Between Two Quantitative Variables?
Scatterplot
Graphical display of two quantitative variables:
• Horizontal Axis: Explanatory variable, x
• Vertical Axis: Response variable, y
Example: Internet Usage and Gross Domestic Product (GDP)
Positive Association
Two quantitative variables, x and y, are said to have a positive association when high values of x tend to occur with high values of y, and when low values of x tend to occur with low values of y
Negative Association
Two quantitative variables, x and y, are said to have a negative association when high values of x tend to occur with low values of y, and when low values of x tend to occur with high values of y
Example: Did the Butterfly Ballot Cost Al Gore the 2000 Presidential Election?
Linear Correlation: r
Measures the strength and direction of the linear association between x and y
• A positive r-value indicates a positive association
• A negative r-value indicates a negative association
• An r-value close to +1 or -1 indicates a strong linear association
• An r-value close to 0 indicates a weak linear association
Calculating the correlation, r
$$ r = \frac{1}{n - 1} \sum \left( \frac{x - \bar{x}}{s_x} \right) \left( \frac{y - \bar{y}}{s_y} \right) $$
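The correlation can be computed term by term: standardize each x and each y, multiply the paired z-scores, sum the products, and divide by n - 1. A minimal sketch follows; the function name `correlation` and the car ages and mileages are hypothetical values made up for illustration, not data from the slides.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r: sum of products of paired z-scores, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)   # sample standard deviations
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)

# Hypothetical used-car data: age in years, mileage in thousands of miles.
ages = [1, 2, 3, 4, 5]
mileage = [12, 25, 33, 48, 60]

r = correlation(ages, mileage)
print(round(r, 3))  # close to +1: strong positive linear association
```

Points lying exactly on an upward-sloping line give r = +1, and points exactly on a downward-sloping line give r = -1.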
Example: 100 cars on the lot of a used-car dealership
Would you expect a positive association, a negative association, or no association between the age of the car and the mileage on the odometer?
• Positive association
• Negative association
• No association