Agresti/Franklin Statistics, 1 of 63 Section 2.4 How Can We Describe the Spread of Quantitative...


Section 2.4

How Can We Describe the Spread of Quantitative Data?

Measuring Spread: Range

Range: difference between the largest and smallest observations
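A minimal Python sketch of the range, applied to the 10 stock prices from the quartile example in Section 2.5. The function name `data_range` is a hypothetical illustration, not something from the slides.

```python
def data_range(values):
    """Range = largest observation minus smallest observation."""
    if not values:
        raise ValueError("the range is undefined for an empty data set")
    return max(values) - min(values)

# Stock prices (rounded to the nearest $) from the Section 2.5 quartile example
prices = [2, 4, 11, 12, 13, 15, 31, 31, 37, 47]
print(data_range(prices))  # 45
```

Because it uses only the two most extreme observations, a single unusual value can change the range a lot.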

Measuring Spread: Standard Deviation

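Creates a measure of variation by summarizing the deviations of each observation from the mean: the squared deviations are averaged using the divisor n − 1 instead of n (this adjusted average is the variance), and the standard deviation is the square root of that average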
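Written as a formula, the sample standard deviation is

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

The sketch below computes it step by step for the stock prices used in Section 2.5 and cross-checks the result against Python's `statistics.stdev`, which also divides by n − 1. The helper name `sample_sd` is hypothetical.

```python
import math
import statistics

def sample_sd(values):
    """Sample standard deviation: square root of (sum of squared deviations / (n - 1))."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    squared_devs = [(x - mean) ** 2 for x in values]  # deviation of each observation, squared
    return math.sqrt(sum(squared_devs) / (n - 1))     # "adjusted average", then square root

prices = [2, 4, 11, 12, 13, 15, 31, 31, 37, 47]
s = sample_sd(prices)
print(round(s, 2))                                  # 15.12
print(math.isclose(s, statistics.stdev(prices)))    # True
```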

Empirical Rule

For bell-shaped data sets:

Approximately 68% of the observations fall within 1 standard deviation of the mean

Approximately 95% of the observations fall within 2 standard deviations of the mean

Nearly all (about 99.7%) of the observations fall within 3 standard deviations of the mean
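The Empirical Rule's percentages come from the normal (bell-shaped) distribution. As a sketch, assuming the data follow a normal distribution exactly, the fraction within k standard deviations of the mean is erf(k/√2), which Python's `math.erf` can evaluate. Real bell-shaped data only approximate these values.

```python
import math

def normal_within_k_sd(k):
    """Fraction of a normal distribution lying within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * normal_within_k_sd(k), 1))
# 1 68.3
# 2 95.4
# 3 99.7
```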

Parameter and Statistic

A parameter is a numerical summary of the population

A statistic is a numerical summary of a sample taken from a population

Section 2.5

How Can Measures of Position Describe Spread?

Quartiles

Quartiles split the ordered data into four parts. The median is the second quartile, Q2

The first quartile, Q1, is the median of the lower half of the observations

The third quartile, Q3, is the median of the upper half of the observations

Example: Find the first and third quartiles

Prices per share of the 10 most actively traded stocks on the NYSE (rounded to the nearest $)

2 4 11 12 13 15 31 31 37 47

a. Q1 = 2, Q3 = 47

b. Q1 = 12, Q3 = 31

c. Q1 = 11, Q3 = 31

d. Q1 = 11.5, Q3 = 32
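A minimal sketch of the median-of-halves method described above, assuming that when n is odd the median itself is left out of both halves. This is only one quartile convention; software such as `numpy.percentile` interpolates by default and can return slightly different values. Applied to the stock prices, it gives Q1 = 11 and Q3 = 31. The function names are hypothetical.

```python
def median(sorted_vals):
    """Median of an already-sorted, non-empty list."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2 == 1:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2

def quartiles(values):
    """Return (Q1, Q2, Q3): Q1 and Q3 are the medians of the lower and upper halves."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = data[:half]               # observations below the median
    upper = data[half + n % 2:]       # skip the middle value when n is odd
    return median(lower), median(data), median(upper)

prices = [2, 4, 11, 12, 13, 15, 31, 31, 37, 47]
print(quartiles(prices))  # (11, 14.0, 31)
```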

Measuring Spread: Interquartile Range

The interquartile range is the distance between the third quartile and first quartile:

IQR = Q3 – Q1

Detecting Potential Outliers

An observation is a potential outlier if it falls more than 1.5 × IQR below the first quartile or more than 1.5 × IQR above the third quartile
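The 1.5 × IQR rule amounts to computing two fences and flagging anything outside them. This sketch takes Q1 and Q3 as inputs: for the stock prices, Q1 = 11 and Q3 = 31 (median-of-halves method); for the sodium data, Q1 = 145 and Q3 = 225 come from the five-number summary in the sodium boxplot example. The helper names are hypothetical.

```python
def iqr_fences(q1, q3):
    """Return (lower, upper) fences at 1.5 x IQR beyond Q1 and Q3."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def potential_outliers(values, q1, q3):
    """Observations more than 1.5 x IQR below Q1 or more than 1.5 x IQR above Q3."""
    lower, upper = iqr_fences(q1, q3)
    return [x for x in values if x < lower or x > upper]

prices = [2, 4, 11, 12, 13, 15, 31, 31, 37, 47]
print(iqr_fences(11, 31))                  # (-19.0, 61.0)
print(potential_outliers(prices, 11, 31))  # []

sodium = [0, 70, 125, 125, 140, 150, 170, 170, 180, 200,
          200, 210, 210, 220, 220, 230, 250, 260, 290, 290]
print(iqr_fences(145, 225))                  # (25.0, 345.0)
print(potential_outliers(sodium, 145, 225))  # [0]
```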

The Five-Number Summary

The five-number summary of a dataset:

• Minimum value

• First Quartile

• Median

• Third Quartile

• Maximum value
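A sketch of the five-number summary, assuming the same median-of-halves quartile convention as in the quartile example. Run on the 20 sodium values from the sodium boxplot example, it reproduces that summary (0, 145, 200, 225, 290). The function names are hypothetical.

```python
def _median(sorted_vals):
    """Median of an already-sorted, non-empty list."""
    n = len(sorted_vals)
    mid = n // 2
    return sorted_vals[mid] if n % 2 else (sorted_vals[mid - 1] + sorted_vals[mid]) / 2

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower, upper = data[:half], data[half + n % 2:]  # median excluded when n is odd
    return data[0], _median(lower), _median(data), _median(upper), data[-1]

sodium = [0, 70, 125, 125, 140, 150, 170, 170, 180, 200,
          200, 210, 210, 220, 220, 230, 250, 260, 290, 290]
print(five_number_summary(sodium))  # (0, 145.0, 200.0, 225.0, 290)
```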

Boxplot

A box is constructed from Q1 to Q3

A line is drawn inside the box at the median

A line extends outward from the lower end of the box to the smallest observation that is not a potential outlier

A line extends outward from the upper end of the box to the largest observation that is not a potential outlier
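The boxplot construction steps can be sketched as a small function that returns the pieces to draw, assuming Q1, the median, and Q3 are already known. For the sodium data (Q1 = 145, median = 200, Q3 = 225), the lower whisker stops at 70 because 0 lies below the lower fence of 25. The name `boxplot_summary` and its output keys are hypothetical; plotting libraries may use a different quartile method.

```python
def boxplot_summary(values, q1, median, q3):
    """Box, median line, whisker ends, and potential outliers (1.5 x IQR rule)."""
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in values if lower_fence <= x <= upper_fence]
    outliers = sorted(x for x in values if x < lower_fence or x > upper_fence)
    return {
        "box": (q1, q3),                             # box drawn from Q1 to Q3
        "median_line": median,                       # line inside the box
        "whiskers": (min(inside), max(inside)),      # most extreme non-outliers
        "outliers": outliers,                        # plotted individually
    }

sodium = [0, 70, 125, 125, 140, 150, 170, 170, 180, 200,
          200, 210, 210, 220, 220, 230, 250, 260, 290, 290]
print(boxplot_summary(sodium, 145, 200, 225))
# {'box': (145, 225), 'median_line': 200, 'whiskers': (70, 290), 'outliers': [0]}
```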

Boxplot for Sodium Data

Sodium Data (20 cereals, in order): 0, 70, 125, 125, 140, 150, 170, 170, 180, 200, 200, 210, 210, 220, 220, 230, 250, 260, 290, 290

Five-Number Summary:

Min: 0

Q1: 145

Med: 200

Q3: 225

Max: 290

[Figure: Boxplot for Sodium in Cereals]

Z-Score

The z-score for an observation measures how far an observation is from the mean in standard deviation units

An observation in a bell-shaped distribution is a potential outlier if its z-score < -3 or > +3

$$z = \frac{\text{observation} - \text{mean}}{\text{standard deviation}}$$
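A sketch of z-scores and the ±3 rule, using Python's `statistics` module (sample standard deviation with divisor n − 1) on the sodium data from the boxplot example. The value 0 has a z-score of about −2.6, so the ±3 rule does not flag it, even though it falls below the 1.5 × IQR lower fence (25) for these data; the two rules need not agree, and the z-score rule is intended for bell-shaped distributions. The helper names are hypothetical.

```python
import statistics

def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def z_outliers(values, cutoff=3.0):
    """Observations with z-score below -cutoff or above +cutoff."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return [x for x in values if abs(z_score(x, mean, sd)) > cutoff]

sodium = [0, 70, 125, 125, 140, 150, 170, 170, 180, 200,
          200, 210, 210, 220, 220, 230, 250, 260, 290, 290]
mean, sd = statistics.mean(sodium), statistics.stdev(sodium)
print(mean, round(sd, 1))              # 185.5 71.2
print(round(z_score(0, mean, sd), 2))  # -2.6
print(z_outliers(sodium))              # []
```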

Chapter 3: Association: Contingency, Correlation, and Regression

Learn ….

How to examine links between two variables

Variables

Response variable: the outcome variable

Explanatory variable: the variable that explains the outcome variable

Association

An association exists between the two variables if a particular value for one variable is more likely to occur with certain values of the other variable

Section 3.1

How Can We Explore the Association Between Two Categorical Variables?

Example: Food Type and Pesticide Status

What is the response variable? What is the explanatory variable?

Food Type       Pesticides: Yes    Pesticides: No

Organic         29                 98

Conventional    19485              7086

What proportion of organic foods contain pesticides?

What proportion of conventionally grown foods contain pesticides?


What proportion of all sampled items contain pesticide residues?


Contingency Table

The Food Type and Pesticide Status Table is called a contingency table

A contingency table:

• Displays 2 categorical variables

• The rows list the categories of 1 variable

• The columns list the categories of the other variable

• Entries in the table are frequencies
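As a sketch of where the frequencies come from, a contingency table can be built by counting (row category, column category) pairs, here with `collections.Counter`. The six records below are hypothetical toy data, not the study data.

```python
from collections import Counter

def contingency_table(pairs):
    """Count each (row, column) combination; return row labels, column labels, and counts."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    return rows, cols, [[counts[(r, c)] for c in cols] for r in rows]

# Hypothetical records: (food type, pesticide residue detected?)
records = [("Organic", "No"), ("Organic", "Yes"), ("Conventional", "Yes"),
           ("Conventional", "Yes"), ("Conventional", "No"), ("Organic", "No")]
rows, cols, table = contingency_table(records)
print(cols)  # ['No', 'Yes']
for label, counts_row in zip(rows, table):
    print(label, counts_row)
# Conventional [1, 2]
# Organic [2, 1]
```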

Contingency Table Showing Conditional Proportions

What is the sum over each row?

What proportion of organic foods contained pesticide residues?

What proportion of conventional foods contained pesticide residues?

Food Type       Pesticides: Yes    Pesticides: No

Organic         0.23               0.77

Conventional    0.73               0.27
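The conditional proportions come from dividing each count by its row total. A minimal sketch using the counts from the Food Type and Pesticide Status table; it also answers the question about the proportion of all sampled items with residues. The function name is hypothetical.

```python
# Counts from the Food Type and Pesticide Status contingency table
counts = {
    "Organic": {"Yes": 29, "No": 98},
    "Conventional": {"Yes": 19485, "No": 7086},
}

def conditional_proportions(table):
    """Divide each cell by its row total, so every row sums to 1."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: n / total for col, n in cells.items()}
    return result

props = conditional_proportions(counts)
for row, cells in props.items():
    print(row, {col: round(p, 2) for col, p in cells.items()})
# Organic {'Yes': 0.23, 'No': 0.77}
# Conventional {'Yes': 0.73, 'No': 0.27}

grand_total = sum(sum(cells.values()) for cells in counts.values())
overall_yes = sum(cells["Yes"] for cells in counts.values()) / grand_total
print(round(overall_yes, 2))  # 0.73
```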

Example: For the following pair of variables, which is the response variable and which is the explanatory variable?

College grade point average (GPA) and high school GPA

a. College GPA: response variable and High School GPA: explanatory variable

b. College GPA: explanatory variable and High School GPA: response variable

Section 3.2

How Can We Explore the Association Between Two Quantitative Variables?

Scatterplot

Graphical display of two quantitative variables:

• Horizontal Axis: Explanatory variable, x

• Vertical Axis: Response variable, y

Example: Internet Usage and Gross Domestic Product (GDP)

Positive Association

Two quantitative variables, x and y, are said to have a positive association when high values of x tend to occur with high values of y, and when low values of x tend to occur with low values of y

Negative Association

Two quantitative variables, x and y, are said to have a negative association when high values of x tend to occur with low values of y, and when low values of x tend to occur with high values of y

Example: Did the Butterfly Ballot Cost Al Gore the 2000 Presidential Election?

Linear Correlation: r

Measures the direction and strength of the linear association between x and y

• A positive r-value indicates a positive association

• A negative r-value indicates a negative association

• An r-value close to +1 or -1 indicates a strong linear association

• An r-value close to 0 indicates a weak linear association

Calculating the correlation, r
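One standard way to compute r is as the adjusted average of the products of the z-scores for x and y:

$$r = \frac{1}{n-1}\sum \left(\frac{x - \bar{x}}{s_x}\right)\left(\frac{y - \bar{y}}{s_y}\right)$$

A minimal Python sketch of that formula follows. The input values are hypothetical toy data chosen so the answer is easy to check by hand, and the function name `correlation` is also hypothetical.

```python
import math

def correlation(xs, ys):
    """Pearson correlation r as the (n - 1)-adjusted average of z_x * z_y."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical toy data
x = [1, 2, 3, 4, 5]
print(round(correlation(x, [2, 1, 4, 3, 5]), 3))           # 0.8
print(round(correlation(x, [2 * v + 1 for v in x]), 3))    # 1.0
print(round(correlation(x, [10 - v for v in x]), 3))       # -1.0
```

Because r is built from z-scores, it has no units and does not change if either variable is rescaled by a positive constant.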

Example: 100 cars on the lot of a used-car dealership

Would you expect a positive association, a negative association, or no association between the age of the car and the mileage on the odometer?

a. Positive association

b. Negative association

c. No association
