Unit 4: Inference for numerical variables

Lecture 4: ANOVA

Statistics 104

Mine Cetinkaya-Rundel

October 24, 2013

Announcements

PA opens at 5pm today, due Sat evening (based on feedback on midterm evals)

If I still have your midterm or project proposal, pick it up at the end of class

New unit next week...

Statistics 104 (Mine Cetinkaya-Rundel) U4 - L4: ANOVA October 24, 2013 2 / 1

The t distribution for the difference of two means

Diamonds

         0.99 carat    1 carat
         (pt99)        (pt100)
x̄       44.50         53.43
s        13.32         12.22
n        23            30

[Figure panels: carat = 0.99, carat = 1]

These data are a random sample from the diamonds data set in the ggplot2 R package.

The t distribution for the difference of two means Hypothesis testing for the difference of two means

From last time..

We are interested in finding out if the average point price of a 1 carat diamond is higher than the average point price of a 0.99 carat diamond.

\( H_0: \mu_{pt99} = \mu_{pt100} \)

\( H_A: \mu_{pt99} < \mu_{pt100} \)

SE = 3.56

T = −2.508

df = 22
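The three values above can be recomputed from the summary table. The following is a minimal Python sketch, assuming the rounded summary statistics shown in the diamonds table and the conservative rule df = min(n1 − 1, n2 − 1); the variable names are illustrative and not part of the lecture.

```python
import math

# Summary statistics from the diamonds table (point prices).
mean_99, sd_99, n_99 = 44.50, 13.32, 23      # 0.99 carat
mean_100, sd_100, n_100 = 53.43, 12.22, 30   # 1 carat

# Standard error of the difference between two independent sample means.
se = math.sqrt(sd_99**2 / n_99 + sd_100**2 / n_100)

# Test statistic under H0: mu_pt99 = mu_pt100 (null difference is 0).
t_stat = (mean_99 - mean_100) / se

# Conservative degrees of freedom: the smaller sample size minus 1.
df = min(n_99 - 1, n_100 - 1)

print(f"SE = {se:.2f}, T = {t_stat:.3f}, df = {df}")
```

With the rounded inputs the statistic lands within about 0.002 of the slide's T = −2.508; the small gap is presumably due to rounding in the summary table.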


p-value

Clicker question

Which of the following is the correct p-value for this hypothesis test?

T = −2.508

(a) between 0.005 and 0.01

(b) between 0.01 and 0.025

(c) between 0.02 and 0.05

(d) between 0.01 and 0.02

one tail     0.100   0.050   0.025   0.010   0.005
two tails    0.200   0.100   0.050   0.020   0.010
df    21     1.32    1.72    2.08    2.52    2.83
      22     1.32    1.72    2.07    2.51    2.82
      23     1.32    1.71    2.07    2.50    2.81
      24     1.32    1.71    2.06    2.49    2.80
      25     1.32    1.71    2.06    2.49    2.79
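Reading a p-value off the t table amounts to finding the two neighbouring cutoffs that surround |T| in the row for the given df. Here is a small Python sketch of that lookup, assuming only the five table rows shown above; the function name `bracket_one_tail_p` is hypothetical.

```python
# One-tail probabilities heading the columns of the t table.
ONE_TAIL = [0.100, 0.050, 0.025, 0.010, 0.005]

# Critical values copied from the t table, keyed by degrees of freedom.
T_TABLE = {
    21: [1.32, 1.72, 2.08, 2.52, 2.83],
    22: [1.32, 1.72, 2.07, 2.51, 2.82],
    23: [1.32, 1.71, 2.07, 2.50, 2.81],
    24: [1.32, 1.71, 2.06, 2.49, 2.80],
    25: [1.32, 1.71, 2.06, 2.49, 2.79],
}


def bracket_one_tail_p(t_stat, df):
    """Return (lower, upper) bounds on the one-tail p-value from the table."""
    t_abs = abs(t_stat)
    lower, upper = 0.0, 1.0
    for tail, cutoff in zip(ONE_TAIL, T_TABLE[df]):
        if t_abs >= cutoff:
            upper = tail   # the tail area beyond |T| is at most this value
        else:
            lower = tail   # the tail area beyond |T| exceeds this value
            break
    return lower, upper


print(bracket_one_tail_p(-2.508, 22))
```

For T = −2.508 and df = 22, |T| falls between the cutoffs 2.07 and 2.51, so the one-tail p-value is between 0.01 and 0.025.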

The t distribution for the difference of two means Confidence intervals for the difference of two means

Application exercise: t interval for comparing means

The equivalent confidence level for a one sided HT with α = 0.05 is 90%. Calculate a 90% confidence interval for the average difference between the point prices of 0.99 and 1 carat diamonds, and choose the closest answer below. Then, interpret this interval in context of the data.

(a) (-15.05, -2.81)

(b) (-15.91, -1.95)

(c) (-16.30, -1.56)

(d) (-15.05, 2.81)

(e) (-16.30, 1.56)

Solution


\[ (\bar{x}_{pt99} - \bar{x}_{pt100}) \pm t^*_{df} \times SE = (44.50 - 53.43) \pm 1.72 \times 3.56 = -8.93 \pm 6.12 = (-15.05, -2.81) \]

We are 90% confident that the average point price of a 0.99 carat diamond is $2.81 to $15.05 lower than the average point price of a 1 carat diamond.
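The interval arithmetic can be checked with a few lines of Python. This is only a sketch that plugs in the values used in the solution (t* = 1.72 for df = 22 and 90% confidence, SE = 3.56); the variable names are illustrative.

```python
# Inputs taken from the worked solution.
point_estimate = 44.50 - 53.43   # x̄_pt99 - x̄_pt100
t_star = 1.72                    # df = 22, two tails = 0.100 (90% confidence)
se = 3.56

# Margin of error and interval endpoints.
margin = t_star * se
lower = point_estimate - margin
upper = point_estimate + margin

print(f"{point_estimate:.2f} ± {margin:.2f} = ({lower:.2f}, {upper:.2f})")
```

Both endpoints are negative, which agrees with the one sided test rejecting H0 at α = 0.05.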

Synthesis

How (if at all) would this conclusion change your behaviour if you went diamond shopping?

Maybe buy a 0.99 carat diamond? It looks like a 1 carat, but is significantly cheaper.

http://rstudio-pubs-static.s3.amazonaws.com/2176_75884214fc524dc0bc2a140573da38bb.html

ANOVA Classy vocabulary

The GSS gives the following 10 question vocabulary test:

A SPACE (school, noon, captain, room, board, don’t know)
B BROADEN (efface, make level, elapse, embroider, widen, don’t know)
C EMANATE (populate, free, prominent, rival, come, don’t know)
D EDIBLE (auspicious, eligible, fit to eat, sagacious, able to speak, don’t know)
E ANIMOSITY (hatred, animation, disobedience, diversity, friendship, don’t know)
F PACT (puissance, remonstrance, agreement, skillet, pressure, don’t know)
G CLOISTERED (miniature, bunched, arched, malady, secluded, don’t know)
H CAPRICE (value, a star, grimace, whim, inducement, don’t know)
I ACCUSTOM (disappoint, customary, encounter, get used to, business, don’t know)
J ALLUSION (reference, dream, eulogy, illusion, aria, don’t know)

[Figure: distribution of vocabulary scores, x-axis from 0 to 10]

The GSS also asks the following question: “If you were asked to use one of four names for your social class, which would you say you belong in: the lower class, the working class, the middle class, or the upper class?”

[Figure: distribution of (self reported) class: LOWER CLASS, WORKING CLASS, MIDDLE CLASS, UPPER CLASS]

       wordsum   class
1      6         MIDDLE CLASS
2      9         WORKING CLASS
3      6         WORKING CLASS
4      5         WORKING CLASS
5      6         WORKING CLASS
6      6         WORKING CLASS
7      8         MIDDLE CLASS
8      10        WORKING CLASS
9      8         WORKING CLASS
10     9         UPPER CLASS
· · ·
795    9         MIDDLE CLASS

Exploratory analysis

[Figure: box plots of vocabulary score by (self reported) class]

                 n      mean    sd
lower class      41     5.07    2.24
working class    407    5.75    1.87
middle class     331    6.76    1.89
upper class      16     6.19    2.34
overall          795    6.14    1.98

ANOVA ANOVA and the F test

Clicker question

Which of the following plots shows groups with means that are most and least likely to be significantly different from each other?

[Figure: three plots of groups, labeled I, II, and III]

(a) most: I, least: II

(b) most: II, least: III

(c) most: I, least: III

(d) most: III, least: II

(e) most: II, least: I

Research question

Is there a difference between the average vocabulary scores of Americans from different (self reported) classes?

To compare means of 2 groups we use a Z or a T statistic.

To compare means of 3+ groups we use a new test called ANOVA and a new statistic called F.

ANOVA - hypotheses

\( H_0 \): The mean outcome is the same across all categories,

\[ \mu_1 = \mu_2 = \cdots = \mu_k, \]

where \( \mu_i \) represents the mean of the outcome for observations in category i.

\( H_A \): At least one pair of means are different from each other.

z/t test vs. ANOVA - Purpose

z/t test

Compare means from two groups to see whether they are so far apart that the observed difference cannot reasonably be attributed to sampling variability.

\( H_0: \mu_1 = \mu_2 \)

ANOVA

Compare the means from two or more groups to see whether they are so far apart that the observed differences cannot all reasonably be attributed to sampling variability.

\( H_0: \mu_1 = \mu_2 = \cdots = \mu_k \)

z/t test vs. ANOVA - Method

z/t test

Compute a test statistic (a ratio).

\[ z/t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{SE(\bar{x}_1 - \bar{x}_2)} \]

ANOVA

Compute a test statistic (a ratio).

\[ F = \frac{\text{variability bet. groups}}{\text{variability w/in groups}} \]

Large test statistics lead to small p-values.

If the p-value is small enough, \( H_0 \) is rejected and we conclude that the population means are not equal.

F distribution and p-value

\[ F = \frac{\text{variability bet. groups}}{\text{variability w/in groups}} \]

In order to be able to reject \( H_0 \), we need a small p-value, which requires a large F statistic.

In order to obtain a large F statistic, variability between sample means needs to be greater than variability within sample means.

Goal: Determine measures of variability between and within groups, so that we can make a decision on the hypotheses based on how they compare to each other.

ANOVA ANOVA output, deconstructed

                      Df     Sum Sq     Mean Sq    F value    Pr(>F)
(Group)  class        3      236.56     78.855     21.735     <0.0001
(Error)  Residuals    791    2869.80    3.628
         Total        794    3106.36

Sum of squares total, SST

Measures the total variability in the data

\[ SST = \sum_{i=1}^{n} (x_i - \bar{x})^2 \]

where \( x_i \) represents the value of the response variable for each observation in the dataset.

[Very similar to the calculation of variance, except not scaled by the sample size.]

\[ SST = (6 - 6.14)^2 + (9 - 6.14)^2 + \cdots + (9 - 6.14)^2 = 3106.36 \]

         Df     Sum Sq
Total    794    3106.36

Sum of squares between groups, SSG

Measures the variability between groups, i.e. how the group means compare to the grand mean

\[ SSG = \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{x})^2 \]

\( n_j \): each group size, \( \bar{x}_j \): average for each group, \( \bar{x} \): overall (grand) mean

[Explained variability: deviation of group mean from overall mean, weighted by sample size.]

                 n      mean    sd
lower class      41     5.07    2.24
working class    407    5.75    1.87
middle class     331    6.76    1.89
upper class      16     6.19    2.34
overall          795    6.14    1.98

\[ SSG = 41 \times (5.07 - 6.14)^2 + 407 \times (5.75 - 6.14)^2 + 331 \times (6.76 - 6.14)^2 + 16 \times (6.19 - 6.14)^2 = 236.56 \]
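The SSG sum is easy to evaluate directly. The sketch below, in Python, assumes the rounded group sizes and means from the summary table; with those rounded inputs the sum comes out near 236.12 rather than the 236.56 in the ANOVA output, a gap that is presumably due to the table means being rounded to two decimals.

```python
# (group size, group mean) for each self reported class, from the summary table.
groups = {
    "lower class": (41, 5.07),
    "working class": (407, 5.75),
    "middle class": (331, 6.76),
    "upper class": (16, 6.19),
}
grand_mean = 6.14

# SSG: squared deviation of each group mean from the grand mean,
# weighted by the group's sample size.
ssg = sum(n * (mean - grand_mean) ** 2 for n, mean in groups.values())

print(f"SSG from rounded summary statistics: {ssg:.2f}")
```

The middle and working classes contribute most of the total because they combine large sample sizes with noticeable distances from the grand mean.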

Sum of squares error, SSE

Measures the variability within groups:

\[ SSE = SST - SSG \]

[Unexplained variability, i.e. unexplained by the group variable, due to other reasons]

\[ SSE = 3106.36 - 236.56 = 2869.80 \]

Now we need a way to get from these measures of total variability to average variability (scaling by a measure that incorporates sample sizes and number of groups → degrees of freedom).

Application exercise: ANOVA output

Fill in the rest of the ANOVA table, and make a decision on the hypotheses. Submit your decision using your clicker.

The data provide convincing evidence that the:

(a) average vocabulary scores are different for all classes.

(b) average vocabulary score for middle class is higher than the average for the lower class.

(c) average vocabulary score is different for at least one pair of classes.

(d) average vocabulary scores are the same for all classes.

(e) average vocabulary scores are different for upper and lower classes.

Note that you will need access to R to calculate the p-value. You can use the following function:

> pf(F_score, df_group, df_error, lower.tail = FALSE)

Relevant formulas

Degrees of freedom associated with ANOVA

groups: \( df_G = k - 1 \), where k is the number of groups

total: \( df_T = n - 1 \), where n is the total sample size

error: \( df_E = df_T - df_G \)

Mean squares

Associated sum of squares divided by the associated df: \( MS = SS / df \)

Test statistic, F value

Ratio of the between group and within group variability: \( F = \frac{MSG}{MSE} \)

p-value

Probability of at least as large a ratio between the “between group” and “within group” variability as the one observed, if in fact the means of all groups are equal. It is calculated as the area under the F curve, with degrees of freedom \( df_G \) and \( df_E \), above the observed F statistic.

Solution

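                      Df     Sum Sq     Mean Sq    F value    Pr(>F)
(Group)  class        3      236.56     78.855     21.735     <0.0001
(Error)  Residuals    791    2869.80    3.628
         Total        794    3106.36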
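The remaining table entries follow mechanically from SST, SSG, the number of groups, and the sample size. Here is a minimal Python sketch of that bookkeeping, assuming the two sums of squares reported in the ANOVA output; the variable names are illustrative. The p-value itself is left to R's `pf`, as the exercise suggests, since the Python standard library has no F distribution.

```python
# Known quantities from the lecture.
k, n = 4, 795                  # number of classes, total sample size
sst, ssg = 3106.36, 236.56     # total and between group sums of squares

# Degrees of freedom.
df_g = k - 1                   # groups
df_t = n - 1                   # total
df_e = df_t - df_g             # error

# Error sum of squares and the two mean squares.
sse = sst - ssg
msg = ssg / df_g
mse = sse / df_e

# F statistic: between group variability over within group variability.
f_stat = msg / mse

print(f"class      {df_g:>4} {ssg:>9.2f} {msg:>8.3f} {f_stat:>8.3f}")
print(f"Residuals  {df_e:>4} {sse:>9.2f} {mse:>8.3f}")
print(f"Total      {df_t:>4} {sst:>9.2f}")
```

The results agree with the R output up to rounding in the last digit, and an F statistic this large on 3 and 791 degrees of freedom corresponds to the reported p-value below 0.0001.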

ANOVA Checking conditions

(1) independence

If the data are a simple random sample from less than 10% of the population, this condition is satisfied.

Carefully consider whether the data may be independent (e.g. no pairing).

Always important, but sometimes difficult to check.

Does this condition appear to be satisfied?

(2) approximately normal

The observations within each group should be nearly normal (especially important when the sample sizes are small).

[Figure: four Normal Q-Q plots, x-axis: Theoretical Quantiles]

(3) constant variance

The variability across the groups should be about equal (especially important when the sample sizes differ between groups).

[Figure: box plots of vocabulary score by (self reported) class]

                 n      mean    sd
lower class      41     5.07    2.24
working class    407    5.75    1.87
middle class     331    6.76    1.89
upper class      16     6.19    2.34

Multiple comparisons & Type 1 error rate

Which means differ?

Earlier we concluded that at least one pair of means differ. The natural question that follows is “which ones?”

We can do two sample t tests for differences in each possible pair of groups.

Can you see any pitfalls with this approach?

When we run too many tests, the Type 1 Error rate increases.

This issue is resolved by using a modified significance level.

Multiple comparisons

The scenario of testing many pairs of groups is called multiple comparisons.

The Bonferroni correction suggests that a more stringent significance level is more appropriate for these tests:

\[ \alpha^* = \alpha / K \]

where K is the number of comparisons being considered.

If there are k groups, then usually all possible pairs are compared and \( K = \frac{k(k-1)}{2} \).
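The correction is a one-line calculation once the number of pairs is known. Below is a short Python sketch, assuming all possible pairs are compared; the helper name `bonferroni_alpha` is hypothetical.

```python
def bonferroni_alpha(alpha, k):
    """Return (K, alpha_star) when all pairs among k groups are compared."""
    comparisons = k * (k - 1) // 2   # K = k(k-1)/2 possible pairs
    return comparisons, alpha / comparisons


# Three groups give K = 3 pairs; the four social classes give K = 6 pairs.
for k in (3, 4):
    K, alpha_star = bonferroni_alpha(0.05, k)
    print(f"k = {k}: K = {K}, alpha* = {alpha_star:.4f}")
```

With α = 0.05, three groups lead to α* = 0.05/3 ≈ 0.0167 and four groups lead to α* = 0.05/6 ≈ 0.0083.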

Determining the modified α

Clicker question

In the aldrin data set depth has 3 levels: bottom, mid-depth, and surface. If α = 0.05, what should be the modified significance level for two sample t tests for determining which pairs of groups have significantly different means?

(a) α∗ = 0.05

(b) α∗ = 0.05/2 = 0.025

(c) α∗ = 0.05/4 = 0.0125

(d) α∗ = 0.05/6 = 0.0083

Which means differ?

Based on the box plots below, which means would you expect to be significantly different?

[Figure: box plots of vocabulary score by (self reported) class]

Which means differ? (cont.)

When doing multiple comparisons after ANOVA, since the assumption of equal variability across groups must have been satisfied, we re-think how we measure the standard error and the degrees of freedom. For all comparisons, use a consistent

SE: calculate SE using \( s_{pooled} = \sqrt{MSE} \) instead of \( s_1 \) and \( s_2 \).

\[ SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \rightarrow SE = \sqrt{\frac{MSE}{n_1} + \frac{MSE}{n_2}} \]

df: use \( df = df_E \) from ANOVA instead of df calculated based on individual sample sizes \( n_1 \) and \( n_2 \).

\[ df = \min(n_1 - 1, n_2 - 1) \rightarrow df = df_E \]

Finally, compare the p-value of this test to the modified significance level (\( \alpha^* \)).

[Time permitting] Is there a difference between the average vocabulary scores of middle and lower class Americans?

\[ T_{df_E} = \frac{\bar{x}_{middle} - \bar{x}_{lower}}{\sqrt{\frac{MSE}{n_{middle}} + \frac{MSE}{n_{lower}}}} \]

\[ T_{791} = \frac{6.76 - 5.07}{\sqrt{\frac{3.628}{331} + \frac{3.628}{41}}} = \frac{1.69}{0.315} = 5.365 \]

p-value = \( 1.06 \times 10^{-7} \) (two-sided)

\( \alpha^* = 0.05/6 = 0.0083 \)

Reject \( H_0 \): the data provide convincing evidence of a difference between the average vocabulary scores of those from the lower and middle classes.
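The pairwise comparison above can be retraced in Python. This sketch assumes the MSE and df_E from the ANOVA table and the rounded group means; it takes the two-sided p-value as reported on the slide rather than recomputing it, because the standard library has no t distribution.

```python
import math

# Quantities from the ANOVA table and the summary statistics.
mse, df_e = 3.628, 791
n_middle, mean_middle = 331, 6.76
n_lower, mean_lower = 41, 5.07

# Standard error based on the pooled variability (MSE) for both groups.
se = math.sqrt(mse / n_middle + mse / n_lower)

# Test statistic, to be compared against a t distribution with df_E df.
t_stat = (mean_middle - mean_lower) / se

# Bonferroni-adjusted significance level for the K = 6 pairs of classes.
alpha_star = 0.05 / 6

# Two-sided p-value as reported on the slide (not recomputed here).
p_value = 1.06e-7
reject_h0 = p_value < alpha_star

print(f"SE = {se:.3f}, T = {t_stat:.2f}, reject H0: {reject_h0}")
```

Dividing by the unrounded SE gives a statistic near 5.36, while the slide divides by the rounded 0.315 and reports 5.365; the decision is the same either way.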
