PSY 1950 Null Hypothesis Significance Testing September 29, 2008

Transcript
Page 1: PSY 1950 Null Hypothesis Significance Testing September 29, 2008

PSY 1950

Null Hypothesis Significance Testing

September 29, 2008

Page 2: PSY 1950 Null Hypothesis Significance Testing September 29, 2008

vs

Page 3: PSY 1950 Null Hypothesis Significance Testing September 29, 2008

[Figure: mean sampling statistic (y-axis, 0 to 12) plotted against sample N (x-axis, 2 to 92). Series: sample SQRT(SS/N); sample SQRT(SS)/N; population SQRT(SS/N); population SQRT(SS)/N]

Page 4: PSY 1950 Null Hypothesis Significance Testing September 29, 2008

Finite Population Correction Factor

• SEM and central limit theorem calculations are based on sampling with replacement from idealized, infinite populations

• Real-life research involves sampling without replacement from actual, finite populations

• When n/N<.05, this doesn’t matter

• When n/N>.05, use a correction factor:
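The correction formula itself did not survive in this transcript. The sketch below assumes the standard textbook finite population correction, sqrt((N - n)/(N - 1)), applied to the standard error of the mean. The function names and the example numbers are illustrative and do not come from the slides.

```python
import math

def sem(sigma, n):
    """Standard error of the mean, assuming an infinite population."""
    return sigma / math.sqrt(n)

def fpc(n, N):
    """Assumed standard finite population correction factor."""
    return math.sqrt((N - n) / (N - 1))

def corrected_sem(sigma, n, N):
    """Apply the correction only when the sampling fraction n/N exceeds .05."""
    if n / N > 0.05:
        return sem(sigma, n) * fpc(n, N)
    return sem(sigma, n)

# Hypothetical numbers: sigma = 4, n = 16 drawn from N = 100 (n/N = .16)
print(round(sem(4, 16), 3))
print(round(corrected_sem(4, 16, 100), 3))
```

Under this assumption the correction always shrinks the standard error, and the shrinkage grows as the sample takes up a larger share of the population.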

Page 5: PSY 1950 Null Hypothesis Significance Testing September 29, 2008

Controversy of NHST

“backbone of psychological research”
– Gerrig & Zimbardo (2002, p. 42)

“a potent but sterile intellectual rake who leaves in his merry path a long train of ravished maidens but no viable scientific offspring”
– Meehl (1967, p. 265)

“…surely the most bone-headedly misguided procedure ever institutionalized in the rote training of science students”
– Rozeboom (1997, p. 335)

Page 6: PSY 1950 Null Hypothesis Significance Testing September 29, 2008
Page 7: PSY 1950 Null Hypothesis Significance Testing September 29, 2008

NHST example (z-test)

1. State the null and alternative hypotheses
  • H0: µinfant = 26 lbs
  • H1: µinfant ≠ 26 lbs

2. Set the criteria for a decision
  • α = .05
  • |z| ≥ 1.96

3. Collect data and compute sample statistics
  • Minfant = 30 lbs
  • with n = 16 and σ = 4, σM = σ/√n = 4/√16 = 1
  • z = (M - µ)/σM = (30 - 26)/1 = 4

4. Make a decision
  • Reject H0
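The four steps of the example can be sketched in a few lines of Python. The numbers are the ones on the slide. The function name `z_test` and the two-tailed p-value, computed from the normal distribution with `math.erfc`, are additions for illustration.

```python
import math

def z_test(sample_mean, mu0, sigma, n):
    """Return (z, two-tailed p) for a one-sample z-test with known sigma."""
    standard_error = sigma / math.sqrt(n)
    z = (sample_mean - mu0) / standard_error
    # Two-tailed p = 2 * (1 - Phi(|z|)), which equals erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Slide values: M = 30, mu = 26, sigma = 4, n = 16
z, p = z_test(30, 26, 4, 16)
reject_h0 = abs(z) >= 1.96  # decision criterion for alpha = .05, two-tailed
print(z, reject_h0)
```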

Page 8: PSY 1950 Null Hypothesis Significance Testing September 29, 2008

NHST Errors

Type III error?

Page 9: PSY 1950 Null Hypothesis Significance Testing September 29, 2008

Power

• The probability of correctly rejecting a false null hypothesis = 1 - β

• http://wise.cgu.edu/power/power_applet.html
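As a rough companion to the applet, power for a two-tailed one-sample z-test can be computed directly. This is a minimal sketch: the function name and the alternative mean of 28 lbs are hypothetical, while σ = 4 and n = 16 are borrowed from the earlier z-test example.

```python
from statistics import NormalDist

def z_test_power(mu0, mu1, sigma, n, alpha=0.05):
    """Power (1 - beta) of a two-tailed one-sample z-test when the true mean is mu1."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected z score under the alternative hypothesis
    delta = (mu1 - mu0) / (sigma / n ** 0.5)
    upper_tail = 1 - std_normal.cdf(z_crit - delta)
    lower_tail = std_normal.cdf(-z_crit - delta)
    return upper_tail + lower_tail

# Hypothetical alternative: true mean of 28 lbs against H0 of 26 lbs
print(round(z_test_power(26, 28, 4, 16), 3))
# Same hypothetical effect with a larger sample
print(round(z_test_power(26, 28, 4, 64), 3))
```

Holding the effect and α fixed, power rises with n, which is the relationship the applet lets you explore interactively.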

Page 10: PSY 1950 Null Hypothesis Significance Testing September 29, 2008

α = .05

• “It is convenient to take this point as a limit in judging whether a deviation is to be considered significant or not. Deviations exceeding twice the standard deviation are thus formally regarded as significant”
– Fisher (1925, p. 47)

• Historical roots prior to Fisher’s definition

• Corresponds to subjective demarcation of chance from non-chance events

• “... surely, God loves the .06 nearly as much as the .05”
– (Rosnow & Rosenthal, 1989)

Page 11: PSY 1950 Null Hypothesis Significance Testing September 29, 2008

NHST Rationale

• Why try to reject null hypothesis?
  – Philosophical: Popperian falsifiability
    • accept H1: the projector occasionally malfunctions
    • reject H0: the projector always works
  – Practical: defining the sampling distribution
    • H1: the projector failure rate = ?
    • H0: the projector failure rate = 0%

Page 12: PSY 1950 Null Hypothesis Significance Testing September 29, 2008

History of NHST

Fisher’s (1925) NHST:
1. Set up null hypothesis (not necessarily)
2. Report exact significance
3. Only do this when you know little else

Neyman & Pearson (1950)
1. Set up two competing hypotheses, H1 and H2, and make a priori decisions about α and β
2. If data falls into rejection region of H1, accept H2; otherwise accept H1. Acceptance ≠ belief.
3. Only do this when you have a disjunction of hypotheses (either H1 or H2 is true)

Current NHST (according to some):
1. Set up null hypothesis as nil hypothesis
2. Reject null at p<.05 and accept your hypothesis
3. Always do this


Page 13: PSY 1950 Null Hypothesis Significance Testing September 29, 2008

Criticisms of NHST

• Affirming the consequent
  – If P then Q. Q. Therefore P.

• The straw person argument
  – Tukey (1991): “It is foolish to ask ‘Are the effects of A and B different?’ They are always different – for some decimal place” (p. 100)
  – “Statistical significance does not necessarily imply practical significance!”

• The replication fallacy
  – If you conduct an experiment that results in p = .05 (two-tailed), what is the chance that a replication of that experiment will produce a statistically significant (p<.05) effect?
    • 50% (see Cumming, 2008, Appendix B)

• “Confusion of the inverse”
  – “absence of proof is not proof of absence”
  – “presence of proof is not proof of presence”

Page 14: PSY 1950 Null Hypothesis Significance Testing September 29, 2008

Affirming the Consequent

• NHST commits a logical fallacy
  – NHST: If the null hypothesis is correct, then these data are highly unlikely
    • These data have occurred
    • Therefore, the null hypothesis is highly unlikely
  – Analog: If a person is an American, then he is probably not a member of Congress
    • This person is a member of Congress
    • Therefore, he is probably not an American

• Response: Science progresses through testing its predictions
  – Logic may be flawed, but success is hard to deny

Page 15: PSY 1950 Null Hypothesis Significance Testing September 29, 2008

The Straw Person Argument

• Often null hypothesis = nil hypothesis
  – The nil hypothesis is always (or almost always) false
    • The “crud factor” in correlational research (Meehl, 1990)
    • The “princess and the pea” effect in experimental research
  – If the null hypothesis is always false, how does rejecting it increase knowledge?

• Response: effect size matters, statistical significance is not practical significance, test interactions

Page 16: PSY 1950 Null Hypothesis Significance Testing September 29, 2008

Replication Fallacy

• p-values don’t say much about replicability, yet most everyone thinks they do
  – Replication is NOT 1 - (Tversky & Kahneman, 1971)

• Response: p-values inform replicability, just less than one might think
  – All else equal, the lower the p-value, the higher the replicability
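A back-of-the-envelope calculation illustrates both points. It rests on a simplifying assumption that is not taken from the slides or from Cumming's derivation: the true effect equals the observed effect exactly, and the replication uses the same design and sample size. The function name is hypothetical.

```python
from statistics import NormalDist

def replication_probability(p_observed, alpha=0.05):
    """Chance that an identical replication is significant (two-tailed),
    ASSUMING the true effect equals the originally observed effect."""
    std_normal = NormalDist()
    z_observed = std_normal.inv_cdf(1 - p_observed / 2)
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Replication z is taken to be normal with mean z_observed and SD 1
    same_direction = 1 - std_normal.cdf(z_crit - z_observed)
    opposite_direction = std_normal.cdf(-z_crit - z_observed)
    return same_direction + opposite_direction

for p in (0.05, 0.01, 0.001):
    print(p, round(replication_probability(p), 2))
```

Under this assumption an original p of .05 gives a replication chance of roughly one half, far from the .95 that a naive "1 minus p" reading would suggest, while smaller p-values do give higher values.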

Page 17: PSY 1950 Null Hypothesis Significance Testing September 29, 2008
Page 18: PSY 1950 Null Hypothesis Significance Testing September 29, 2008

“Confusion of the Inverse”

• Criticism: NHST calculates the probability of obtaining the data given a hypothesis, p(D|H0), not the probability of a hypothesis given the data, p(H0|D)
  – A p-value of .05 does NOT necessarily indicate that the null hypothesis is unlikely to be true

• Response: logically faulty but productive inferences are better than nothing
  – p(D|H0) approximates p(H0|D) under typical experimental settings where p(H0) is low, i.e., p(H1) > p(H0)
  – p(H0|D) varies monotonically with p(D|H0) p(H0|D)
  – When p(H0) = .35, p(H0|D) = .35
  – p(D|H0) and p(H0|D) are correlated (r = .38)
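Bayes' theorem makes the relationship between the two probabilities explicit. In this minimal sketch the function name and every input value are hypothetical; in particular, p(D|H1) has to be assumed, because NHST does not supply it.

```python
def posterior_h0(p_d_given_h0, p_d_given_h1, prior_h0):
    """p(H0|D) from Bayes' theorem, with H1 as the only alternative to H0."""
    prior_h1 = 1 - prior_h0
    numerator = p_d_given_h0 * prior_h0
    return numerator / (numerator + p_d_given_h1 * prior_h1)

# Hypothetical inputs: p(D|H0) = .05, p(D|H1) = .50
for prior in (0.2, 0.5, 0.8):
    print(prior, round(posterior_h0(0.05, 0.50, prior), 3))
```

With p(D|H1) and the prior held fixed, p(H0|D) rises and falls with p(D|H0), but its actual value depends heavily on the prior p(H0).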

Page 19: PSY 1950 Null Hypothesis Significance Testing September 29, 2008

NHST gives p(D|H0) not p(H0|D)

Page 20: PSY 1950 Null Hypothesis Significance Testing September 29, 2008

Reconciliation

• “Inductive inference cannot be logically justified, but they can be defended pragmatically” (Krueger, 2001)

• Use NHST mindfully
  – “There is no God-given rule about when and how to make up your mind in general.”
    • Hays (1973, p. 353)

• Don’t rely exclusively on p-values

Page 21: PSY 1950 Null Hypothesis Significance Testing September 29, 2008

Alternatives to p-values

• Effect size

– Meta-analysis

• Confidence intervals

• prep
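To make the first two alternatives concrete, here is a minimal sketch that reuses the infant-weight numbers from the z-test example (M = 30, µ = 26, σ = 4, n = 16). It assumes a standardized mean difference (Cohen's d with known σ) as the effect size and a normal-theory 95% confidence interval. The function names are illustrative.

```python
import math

def cohens_d(sample_mean, mu0, sigma):
    """Standardized mean difference, using the known population SD."""
    return (sample_mean - mu0) / sigma

def mean_confidence_interval(sample_mean, sigma, n, z_crit=1.96):
    """Normal-theory confidence interval for the mean with known sigma."""
    half_width = z_crit * sigma / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

d = cohens_d(30, 26, 4)
low, high = mean_confidence_interval(30, 4, 16)
print(d)
print(round(low, 2), round(high, 2))
```

The interval excludes 26 lbs, which matches the decision to reject H0 in that example, and it also shows how precisely the mean is estimated, which a p-value alone does not.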