Laboratory for Interdisciplinary Statistical Analysis. Anne Ryan, agryan@vt.edu, Virginia Tech.


Page 1: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

Laboratory for Interdisciplinary Statistical Analysis

Anne Ryan
agryan@vt.edu

Virginia Tech

Page 2: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

www.lisa.stat.vt.edu

Laboratory for Interdisciplinary Statistical Analysis

1948: The Statistical Laboratory was founded as a division of the Virginia Agricultural Experiment Station to help agronomists design experiments and calculate sums of squares.

Page 3: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

1949: Based on the success of the Statistical Laboratory, the Department of Statistics at Virginia Polytechnic Institute (VPI) was founded, the 3rd oldest statistics department in the United States.

Page 4: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

1973: The Statistical Laboratory was re-formed as the Statistical Consulting Center to assist with statistical analyses in every college of Virginia Polytechnic Institute & State University (VPI&SU).

Page 5: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

2007: The Graduate Student Assembly led a movement to save statistical consulting and collaboration from death by budget cuts, ensuring that graduate students could receive help with their research.

The College of Science, Provost, Vice President of Research, Graduate School, and six additional colleges agreed that researchers should be able to receive free statistical consulting and collaboration.

Page 6: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

2008: The Statistical Consulting Center was re-organized as the Laboratory for Interdisciplinary Statistical Analysis (LISA) to collaborate with researchers across the Virginia Tech (VT) campuses.

Page 7: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

Established in 2008

Year   Clients   Hours
2000   299       1368
2001   293       1938
2002   321       2220
2003   304       2192
2004   274       1775
2005   211       495
2006   171       541
2007   190       965
2008   895       2184
2009   719       3093
2010   1124      4420

Page 8: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

[Figure: clients per year, 2000 to 2010]

Page 9: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

[Figure: hours per year, 2000 to 2010]

Page 10: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

LISA helps VT researchers benefit from the use of Statistics

Experimental Design • Data Analysis • Interpreting Results • Grant Proposals • Software (R, SAS, JMP, SPSS...)

Our goal is to improve the quality of research and the use of statistics at Virginia Tech.

Page 11: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

Collaboration

LISA statisticians meet with faculty, staff, and graduate students to understand their research and think of ways to help them using statistics.

Page 12: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

Walk-In Consulting

Every day from 1-3 PM, clients get answers to their (quick) questions about using statistics in their research.

Page 13: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

Short Courses

Short Courses are designed to teach graduate students how to apply statistics in their research.

Page 14: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

All services are FREE for VT researchers. We assist with research, not class projects or homework.

Page 15: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

How can LISA help?
• Formulate research question.
• Screen data for integrity and unusual observations.
• Implement graphical techniques to showcase the data: what is the story?
• Develop and implement an analysis plan to address research question.
• Help interpret results.
• Communicate! Help with writing the report or giving the talk.
• Identify future research directions.

Page 16: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

To request a collaboration meeting go to www.lisa.stat.vt.edu

Page 17: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

Steps to request a collaboration meeting at www.lisa.stat.vt.edu:

1. Sign in to the website using your VT PID and password.
2. Enter your information (email address, college, etc.).
3. Describe your project (project title, research goals, specific research questions, whether you have already collected data, special requests, etc.).
4. Wait 0-3 days, then contact the LISA collaborators assigned to your project to schedule an initial meeting.

Page 18: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.


Page 19: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

Introduction to R
• R is a free software environment for statistical computing and graphics. Download: http://www.r-project.org/
• Topics Covered:
  • Data objects in R, loops, import/export datasets, data manipulation
  • Graphing
  • Basic Analyses: T-tests, Regression, ANOVA

Page 20: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

Linear Regression & Structural Equation Modeling
• Linear regression is used to model the relationship between a continuous response and a continuous predictor.
• SEM is a modeling technique that investigates causal relationships among variables.
• Topics include time-related latent variables, modification indices and critical ratios in exploratory analyses, and computation of implied moments, factor score weights, total effects, and indirect effects.

Page 21: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

Generalized Linear Models
• Modeling technique for situations where the errors are not necessarily normal.
• Can handle situations where you have binary responses, counts, etc.
• Uses a link function to relate the response to the linear model.
• Covers: Basic statistical concepts of GLM and how it relates to regression using normal errors.

Page 22: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

Mixed Models and Random Effects
• Mixed Model: A statistical model that has both random effects and fixed effects.
• Fixed Effect: Levels of the factor are predetermined. Random Effect: Levels of the factor were chosen at random.
• The primary focus of the course will be to identify scenarios where a mixed model approach will be appropriate. The concepts will be explained almost wholly through examples in SAS or in R.

Page 23: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

T-Tests and Analysis of Variance

Anne Ryan


Page 24: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

Criminal Trial

Defense: Represent the accused (defendant)

Prosecution: Hold the “Burden of Proof”: the obligation to shift the assumed conclusion from an oppositional opinion to one’s own position through evidence

What’s the Assumed Conclusion?

ANSWER: The accused is innocent until proven guilty.
• Prosecution must convince the judge/jury that the defendant is guilty beyond a reasonable doubt

Page 25: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

Similarities between Criminal Trials and Hypothesis Testing

Burden of Proof: Obligation to shift the conclusion using evidence

Trial: Innocent until proven guilty

Hypothesis Test: Accept the status quo (what is believed before) until the data suggest otherwise

Page 26: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

Similarities between Criminal Trials and Hypothesis Testing

Decision Criteria

Trial: Evidence has to be convincing beyond a reasonable doubt

Hypothesis Test: Occurs by chance less than 100α% of the time (ex: 5%)

Page 27: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

Introduction to Hypothesis Testing

Hypothesis Test: Procedure for examining a claim about the value of a parameter
◦ i.e., μ

Hypothesis tests are very methodical with several key pieces.

Page 28: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

Steps in a Hypothesis Test

1. Test
2. Assumptions
3. Hypotheses
4. Mechanics
5. Conclusion

Page 29: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

1. Test

State the name of the testing method to be used

It is important to not be off track in the very beginning

Hypothesis Tests we will Perform:
◦ One Sample t test for μ
◦ Two sample t test for μ
◦ Paired t test
◦ ANOVA

Page 30: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

2. Assumptions

List all the assumptions required for your test to be valid.

All tests have assumptions

Even if assumptions are not met you should still comment on how this affects your results.

Page 31: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

3. Hypotheses

State the hypothesis of interest

There are two hypotheses
◦ Null Hypothesis: Denoted H0
◦ Alternative Hypothesis: Denoted Ha or H1

Examples of possible hypotheses:

H0: μ = 13 vs. Ha: μ ≠ 13

Page 32: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

3. Hypotheses Continued

For hypothesis testing there are three popular versions of testing
◦ Left Tailed Hypothesis Test
◦ Right Tailed Hypothesis Test
◦ Two Tailed or Two Sided Hypothesis Test

Page 33: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

3. Hypotheses Continued

1. Left Tailed Hypothesis Test: Researchers are only interested in whether the true value is below the hypothesized value.
e.g., H0: μ = μ0 vs. Ha: μ < μ0

2. Right Tailed Hypothesis Test: Researchers are only interested in whether the true value is above the hypothesized value.
e.g., H0: μ = μ0 vs. Ha: μ > μ0

Page 34: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

3. Hypotheses Continued

3. Two Tailed or Two Sided Hypothesis Test: The researcher is interested in looking above and below the hypothesized value.
e.g., H0: μ = μ0 vs. Ha: μ ≠ μ0

Page 35: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

3. Hypotheses Continued

Three Requirements for Stating Hypotheses:
1. Two complementary hypotheses, e.g., = and ≠
2. A parameter about which the test is to be based, e.g., μ
3. Hypothesized value for the parameter, denoted μ0 but generally taking on numeric values in practice

Page 36: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

4. Mechanics

Computational Part of the Test

What is part of the Mechanics step?
◦ Stating the Significance Level
◦ Finding the Rejection Rule
◦ Computing the Test Statistic
◦ Computing the p-value

Page 37: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

4. Mechanics Continued

Significance Level: Here we choose a value to use as the significance level, which is the level at which we are willing to start rejecting the null hypothesis.

Denoted by α

Default value is α = .05; use α = .05 unless otherwise noted!

Page 38: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

4. Mechanics Continued

Rejection Rule: State our criteria for rejecting the null hypothesis.
◦ “Reject the null hypothesis if p-value < .05”.

p-value: The probability, assuming the null hypothesis is true, of obtaining a point estimate at least as “extreme” as the observed value, where the definition of “extreme” is taken from the alternative hypothesis.

Page 39: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

4. Mechanics Continued

Test Statistic: Compute the test statistic, which is usually a standardization of your point estimate.

Translates your point estimate, a statistic, to follow a known distribution so that it can be used for a test.

Page 40: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

4. Mechanics Continued

p-value: After computing the test statistic, now you can compute the p-value.

Use software to compute p-values.
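As an illustration of what software does when it computes a p-value, here is a minimal Python sketch, not the method JMP uses. It assumes the test statistic follows a t distribution under the null hypothesis and integrates the t density numerically with Simpson's rule. The function names `t_pdf` and `two_sided_p_value` are hypothetical, and the approach is only suitable for moderate degrees of freedom (the gamma function overflows for very large ones).

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p_value(t_stat, df, steps=2000):
    """Two-sided p-value: probability that |T| is at least |t_stat| under H0."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    if steps % 2:  # Simpson's rule needs an even number of intervals
        steps += 1
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    # Area under the density on [-|t|, |t|], using symmetry about zero.
    central_area = 2 * (h / 3) * total
    return max(0.0, 1.0 - central_area)
```

For a one-sided alternative, half of the two-sided value applies when the test statistic falls on the side named by the alternative hypothesis.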

Page 41: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

5. Conclusion

Conclusion: Last step of the hypothesis test, just like it is the last step when computing confidence intervals.

Conclusions should always include:
◦ Decision: reject or fail to reject
◦ Linkage: why you made the decision (interpret p-value)
◦ Context: what your decision means in context of the problem.

Page 42: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

5. Conclusion

Note: Your decision can only be one of two choices:

1. Reject H0: the data give a strong indication that Ha is more likely
2. Fail to Reject H0: the data give no strong indication that Ha is more likely

When conducting hypothesis tests, we assume that H0 is true; therefore the decision CANNOT be to accept the null hypothesis.
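The two possible decisions can be written as a small rule. This is a minimal Python sketch; the function name `decide` is hypothetical, and the default α = 0.05 follows the default significance level stated in the Mechanics step.

```python
def decide(p_value, alpha=0.05):
    """Return one of the two allowed decisions; never 'accept H0'."""
    if p_value < alpha:
        return "Reject H0"
    return "Fail to reject H0"

# p-values reported in the worked examples later in this deck
one_sample_decision = decide(0.1854)   # one sample t-test on sepal width
tukey_decision = decide(0.0027)        # Tukey comparison of Drug C and Drug A
```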

Page 43: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

One Sample T-Test


Page 44: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

One Sample T-Test

Used to test whether the population mean is different from a specified value.

Example: Is the mean height of 12 year old girls greater than 60 inches?

Page 45: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

Step 1: Formulate the Hypotheses

The population mean is not equal to a specified value.
Null Hypothesis, H0: μ = μ0
Alternative Hypothesis, Ha: μ ≠ μ0

The population mean is greater than a specified value.
H0: μ = μ0
Ha: μ > μ0

The population mean is less than a specified value.
H0: μ = μ0
Ha: μ < μ0

Page 46: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

Step 2: Check the Assumptions

The sample is random.

The population from which the sample is drawn is either normal or the sample size is large.

Page 47: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

Steps 3-5

Step 3: Calculate the test statistic:

$$t = \frac{\bar{y} - \mu_0}{s / \sqrt{n}}$$

where

$$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}$$

Step 4: Calculate the p-value based on the appropriate alternative hypothesis.

Step 5: Write a conclusion.
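The Step 3 calculation, t = (ȳ − μ0) / (s / √n) with s the sample standard deviation, can be sketched in Python as follows. The function name `one_sample_t` and the height values are hypothetical and used only for illustration; they are not data from the course.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return the one sample t statistic and its degrees of freedom (n - 1)."""
    n = len(sample)
    y_bar = statistics.mean(sample)
    s = statistics.stdev(sample)          # uses the n - 1 denominator
    t_stat = (y_bar - mu0) / (s / math.sqrt(n))
    return t_stat, n - 1

# Hypothetical heights (inches) of 12 year old girls, tested against mu0 = 60
heights = [58.0, 61.5, 60.2, 62.1, 59.4, 63.0]
t_stat, df = one_sample_t(heights, 60.0)
```

The p-value in Step 4 then comes from the t distribution with n − 1 degrees of freedom.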

Page 48: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

Iris Example

A researcher would like to know whether the mean sepal width of a variety of irises is different from 3.5 cm. Use α = 0.05.

The researcher randomly selects 50 irises and measures the sepal width.

Step 1: Hypotheses
H0: μ = 3.5 cm
Ha: μ ≠ 3.5 cm

http://en.wikipedia.org/wiki/Iris_flower_data_set

Page 49: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

JMP Steps 2-4:

JMP Demonstration
Analyze → Distribution
Y, Columns: Sepal Width
Normal Quantile Plot
Test Mean
Specify Hypothesized Mean: 3.5

Page 50: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

JMP Output

Step 5 Conclusion: Fail to reject H0 since the p-value = 0.1854 is greater than 0.05. There is not significant sample evidence to indicate that the mean sepal width is different from 3.5 cm.

Page 51: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

Two Sample T-Test


Page 52: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

Two Sample T-Test

Two sample t-tests are used to determine whether the population mean of one group is equal to, larger than, or smaller than the population mean of another group.

Example: Is the mean cholesterol of people taking drug A lower than the mean cholesterol of people taking drug B?

Page 53: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

Step 1: Formulate the Hypotheses

The population means of the two groups are not equal.
H0: μ1 = μ2
Ha: μ1 ≠ μ2

The population mean of group 1 is greater than the population mean of group 2.
H0: μ1 = μ2
Ha: μ1 > μ2

The population mean of group 1 is less than the population mean of group 2.
H0: μ1 = μ2
Ha: μ1 < μ2

Page 54: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

Step 2: Check the Assumptions

The two samples are random and independent.

The populations from which the samples are drawn are either normal or the sample sizes are large.

The populations have the same standard deviation.

Page 55: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

Steps 3-5

Step 3: Calculate the test statistic

$$t = \frac{\bar{y}_1 - \bar{y}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

where

$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

Step 4: Calculate the appropriate p-value.

Step 5: Write a Conclusion.
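The pooled calculation in Step 3 can be sketched in Python. It computes the pooled standard deviation from the two sample variances, weighted by n1 − 1 and n2 − 1, and divides the difference in sample means by its standard error. The function name `pooled_two_sample_t` and the cholesterol values are hypothetical, and the sketch assumes the equal standard deviation assumption from Step 2 holds.

```python
import math
import statistics

def pooled_two_sample_t(group1, group2):
    """Return the pooled two sample t statistic and its degrees of freedom."""
    n1, n2 = len(group1), len(group2)
    var1 = statistics.variance(group1)    # s1 squared, n1 - 1 denominator
    var2 = statistics.variance(group2)    # s2 squared, n2 - 1 denominator
    s_p = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    diff = statistics.mean(group1) - statistics.mean(group2)
    t_stat = diff / (s_p * math.sqrt(1 / n1 + 1 / n2))
    return t_stat, n1 + n2 - 2

# Hypothetical cholesterol readings for two drugs
drug_a = [182.0, 175.5, 190.2, 168.4, 179.9]
drug_b = [195.1, 201.3, 188.7, 192.4, 199.0]
t_stat, df = pooled_two_sample_t(drug_a, drug_b)
```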

Page 56: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

Two Sample Example

A researcher would like to know whether the mean sepal width of setosa irises is different from the mean sepal width of versicolor irises.

The researcher randomly selects 50 setosa irises and 50 versicolor irises and measures their sepal widths.

Step 1 Hypotheses:
H0: μsetosa = μversicolor
Ha: μsetosa ≠ μversicolor

http://en.wikipedia.org/wiki/Iris_flower_data_set
http://en.wikipedia.org/wiki/Iris_versicolor

Page 57: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

JMP Steps 2-4:

JMP Demonstration:
Analyze → Fit Y By X
Y, Response: Sepal Width
X, Factor: Species
Means/ANOVA/Pooled t
Normal Quantile Plot → Plot Actual by Quantile

Page 58: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

JMP Output

Step 5 Conclusion: There is strong evidence (p-value < 0.0001) that the mean sepal widths for the two varieties are different.

[Figure: normal quantile plots of sepal width for setosa and versicolor]

Page 59: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

Paired T-Test


Page 60: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

Paired T-Test

The paired t-test is used to compare the population means of two groups when the samples are dependent.

Example: A researcher would like to determine if background noise causes people to take longer to complete math problems. The researcher gives 20 subjects two math tests, one with complete silence and one with background noise, and records the time each subject takes to complete each test.

Page 61: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

Step 1: Formulate the Hypotheses

The population mean difference is not equal to zero.
H0: μdifference = 0
Ha: μdifference ≠ 0

The population mean difference is greater than zero.
H0: μdifference = 0
Ha: μdifference > 0

The population mean difference is less than zero.
H0: μdifference = 0
Ha: μdifference < 0

Page 62: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

Step 2: Check the Assumptions

The sample is random.

The data are matched pairs.

The differences have a normal distribution or the sample size is large.

Page 63: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

Steps 3-5

Step 3: Calculate the test statistic:

$$t = \frac{\bar{d} - 0}{s_d / \sqrt{n}}$$

where $\bar{d}$ is the mean of the differences and $s_d$ is the standard deviation of the differences.

Step 4: Calculate the p-value.

Step 5: Write a conclusion.
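The paired calculation reduces to a one sample t-test on the differences, t = (d̄ − 0) / (s_d / √n). A minimal Python sketch follows; the function name `paired_t` and the timing values are hypothetical, not data from the course.

```python
import math
import statistics

def paired_t(first, second):
    """Return the paired t statistic for (second - first) and its degrees of freedom."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)         # standard deviation of the differences
    t_stat = (d_bar - 0) / (s_d / math.sqrt(n))
    return t_stat, n - 1

# Hypothetical completion times (minutes) for the same subjects
silence = [12.1, 10.4, 15.0, 11.8, 13.3]
noise = [13.0, 11.1, 15.9, 11.5, 14.8]
t_stat, df = paired_t(silence, noise)
```

Taking differences within each pair is what separates this test from the two sample t-test, which assumes independent samples.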

Page 64: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

Paired T-Test Example

A researcher would like to determine whether a fitness program increases flexibility. The researcher measures the flexibility (in inches) of 12 randomly selected participants before and after the fitness program.

Step 1: Formulate a Hypothesis
H0: μAfter - Before = 0
Ha: μAfter - Before > 0

Page 65: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

Paired T-Test Example Steps 2-4:

JMP Analysis:
Create a new column of After – Before
Analyze → Distribution
Y, Columns: After – Before
Normal Quantile Plot
Test Mean
Specify Hypothesized Mean: 0

Page 66: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

JMP Output

Step 5 Conclusion: There is not sufficient evidence that the fitness program increases flexibility.

Page 67: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

One-Way Analysis of Variance


Page 68: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

One-Way ANOVA

ANOVA is used to determine whether three or more populations have different distributions.

[Figure: Medical Treatment groups A, B, and C]

Page 69: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

ANOVA Strategy

The first step is to use the ANOVA F test to determine if there are any significant differences among the population means.

If the ANOVA F test shows that the population means are not all the same, then follow-up tests can be performed to see which pairs of population means differ.

Page 70: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

One-Way ANOVA Model

$$y_{ij} = \mu_i + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim N(0, \sigma^2), \qquad i = 1, \ldots, r, \quad j = 1, \ldots, n_i$$

Where
◦ $y_{ij}$ is the response of the jth trial on the ith factor level
◦ $\mu_i$ is the mean of the ith group

In other words, for each group the observed value is the group mean plus some random variation.

Page 71: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

One-Way ANOVA Hypothesis

Step 1: We test whether there is a difference in the population means.

H0: μ1 = μ2 = ... = μr
Ha: The μi are not all equal.

Page 72: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

Step 2: Check ANOVA Assumptions

The samples are random and independent of each other.

The populations are normally distributed.

The populations all have the same standard deviations.

The ANOVA F test is robust to moderate departures from the assumptions of normality and equal standard deviations.

Page 73: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

Step 3: ANOVA F Test

Compare the variation within the samples to the variation between the samples.

[Figure: two panels, each showing Medical Treatment groups A, B, and C]

Page 74: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

ANOVA Test Statistic

$$F = \frac{\text{Variation between Groups}}{\text{Variation within Groups}} = \frac{\mathrm{MSG}}{\mathrm{MSE}}$$

Variation within groups small compared with variation between groups → Large F

Variation within groups large compared with variation between groups → Small F

Page 75: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

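MSG

$$\mathrm{MSG} = \frac{\mathrm{SSG}}{r - 1} = \frac{n_1(\bar{y}_1 - \bar{y})^2 + n_2(\bar{y}_2 - \bar{y})^2 + \cdots + n_r(\bar{y}_r - \bar{y})^2}{r - 1}$$

The mean square for groups, MSG, measures the variability of the sample averages.

SSG stands for sums of squares groups.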

Page 76: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

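MSE

$$\mathrm{MSE} = \frac{\mathrm{SSE}}{n - r} = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_r - 1)s_r^2}{n - r}$$

Where

$$s_i = \sqrt{\frac{\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2}{n_i - 1}}$$

Mean square error, MSE, measures the variability within the groups.

SSE stands for sums of squares error.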
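The F statistic is the ratio MSG / MSE, where MSG = SSG / (r − 1) is built from the group sizes and sample means and MSE = SSE / (n − r) from the group sizes and sample variances. A minimal Python sketch follows; the function name `one_way_anova_f` and the pain scores are hypothetical, not the course data.

```python
import statistics

def one_way_anova_f(groups):
    """Return (F, numerator df, denominator df) for a list of samples."""
    r = len(groups)                                  # number of groups
    n = sum(len(g) for g in groups)                  # total sample size
    grand_mean = sum(sum(g) for g in groups) / n
    # SSG: squared distance of each group mean from the grand mean, weighted by group size
    ssg = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # SSE: within-group variation, (n_i - 1) * s_i^2 summed over groups
    sse = sum((len(g) - 1) * statistics.variance(g) for g in groups)
    msg = ssg / (r - 1)
    mse = sse / (n - r)
    return msg / mse, r - 1, n - r

# Hypothetical pain scores for three drugs
pain = [[62.0, 58.5, 60.1, 59.4], [64.2, 61.0, 63.3, 65.5], [68.1, 66.4, 70.2, 67.9]]
f_stat, df_groups, df_error = one_way_anova_f(pain)
```

A large F means the group means vary more than the within-group scatter would explain; the p-value is then read from the F distribution with (r − 1, n − r) degrees of freedom.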

Page 77: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

Steps 4-5

Step 4: Calculate the p-value.

Step 5: Write a conclusion.

Page 78: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

ANOVA Example

A researcher would like to determine if three drugs provide the same relief from pain.

60 patients are randomly assigned to a treatment (20 people in each treatment).

Step 1: Formulate the Hypotheses
H0: μDrug A = μDrug B = μDrug C
Ha: The μi are not all equal.

Page 79: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

Steps 2-4

JMP demonstration
Analyze → Fit Y By X
Y, Response: Pain
X, Factor: Drug
Normal Quantile Plot → Plot Actual by Quantile
Means/ANOVA

Page 80: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

JMP Output and Conclusion

Step 5 Conclusion: There is strong evidence that the drugs are not all the same.

[Figure: Pain by Drug (Drug A, Drug B, Drug C), with normal quantile plots for each drug]

Page 81: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

Follow-Up Test

The p-value of the overall F test indicates that the level of pain is not the same for patients taking drugs A, B and C.

We would like to know which pairs of treatments are different.

One method is to use Tukey’s HSD (honestly significant differences).

Page 82: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

Tukey Tests

Tukey’s test simultaneously tests

H0: μi = μi' vs. Ha: μi ≠ μi'

for all pairs of factor levels. Tukey’s HSD controls the overall type I error.

JMP demonstration
Oneway Analysis of Pain By Drug → Compare Means → All Pairs, Tukey HSD

Page 83: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

JMP Output

The JMP output shows that drugs A and C are significantly different.

Level    - Level   Difference   Std Err Dif   Lower CL   Upper CL   p-Value
Drug C   Drug A    5.850000     1.677665      1.81283    9.887173   0.0027*
Drug C   Drug B    3.600000     1.677665      -0.43717   7.637173   0.0897
Drug B   Drug A    2.250000     1.677665      -1.78717   6.287173   0.3786
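One way to read the Tukey output is to check which confidence intervals exclude zero. The Python sketch below uses only the differences and confidence limits reported in the JMP output; the names `tukey_rows` and `excludes_zero` are hypothetical. It also recovers the common margin (upper limit minus difference) shared by all three intervals, which is expected here because the groups have equal sizes.

```python
# (level, minus level, difference, lower CL, upper CL) copied from the JMP output
tukey_rows = [
    ("Drug C", "Drug A", 5.85, 1.81283, 9.887173),
    ("Drug C", "Drug B", 3.60, -0.43717, 7.637173),
    ("Drug B", "Drug A", 2.25, -1.78717, 6.287173),
]

def excludes_zero(lower, upper):
    """A pair differs significantly when its interval does not contain zero."""
    return lower > 0 or upper < 0

significant_pairs = [
    (level, minus_level)
    for level, minus_level, _, lower, upper in tukey_rows
    if excludes_zero(lower, upper)
]

# Half-width of each interval: upper limit minus the estimated difference
margins = [round(upper - diff, 6) for _, _, diff, _, upper in tukey_rows]
```

Only the Drug C versus Drug A interval lies entirely above zero, which agrees with the starred p-value in the output.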

Page 84: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

Two-Way Analysis of Variance


Page 85: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

Two-Way ANOVA

We are interested in the effect of two categorical factors on the response.

We are interested in whether either of the two factors has an effect on the response and whether there is an interaction effect.
◦ An interaction effect means that the effect on the response of one factor depends on the level of the other factor.

Page 86: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

Interaction

[Figure: two panels of Improvement versus Dosage (Low, High) with lines for Drug A and Drug B, one panel titled "No Interaction" and one titled "Interaction"]

Page 87: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

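Two-Way ANOVA Model

$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}, \qquad \varepsilon_{ijk} \sim N(0, \sigma^2)$$

$$i = 1, \ldots, a, \quad j = 1, \ldots, b, \quad k = 1, \ldots, n_{ij}$$

Where
◦ $y_{ijk}$ is the response of the kth trial on the ith factor A level and the jth factor B level
◦ $\mu$ is the overall mean
◦ $\alpha_i$ is the main effect of the ith level of factor A
◦ $\beta_j$ is the main effect of the jth level of factor B
◦ $(\alpha\beta)_{ij}$ is the interaction effect of the ith level of factor A and the jth level of factor B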
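To make the terms of the two-way model concrete, the Python sketch below splits a table of cell means into an overall mean, main effects, and interaction effects. It assumes a balanced design and the usual sum-to-zero constraints on the effects, which the slide does not state; the function name `decompose_cell_means` and the improvement values are hypothetical.

```python
def decompose_cell_means(cell_means):
    """cell_means[i][j] is the mean response at level i of factor A and level j of factor B."""
    a = len(cell_means)
    b = len(cell_means[0])
    mu = sum(sum(row) for row in cell_means) / (a * b)            # overall mean
    alpha = [sum(row) / b - mu for row in cell_means]             # factor A main effects
    beta = [sum(cell_means[i][j] for i in range(a)) / a - mu      # factor B main effects
            for j in range(b)]
    interaction = [
        [cell_means[i][j] - mu - alpha[i] - beta[j] for j in range(b)]
        for i in range(a)
    ]
    return mu, alpha, beta, interaction

# Hypothetical mean improvement: rows are Drug A and Drug B, columns are Low and High dosage
parallel = [[10.0, 12.0], [14.0, 16.0]]       # lines are parallel: no interaction
crossing = [[10.0, 12.0], [14.0, 20.0]]       # dosage effect depends on the drug
no_interaction = decompose_cell_means(parallel)
with_interaction = decompose_cell_means(crossing)
```

When every interaction term is zero, the effect of dosage is the same for both drugs, which corresponds to parallel lines in an interaction plot.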

Page 88: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

Two-Way ANOVA Example

We would like to determine the effect of two alloys (low, high) and three cooling temperatures (low, medium, high) on the strength of a wire.

JMP demonstration
Analyze → Fit Model
Y: Strength
Highlight Alloy and Temp and click Macros → Factorial to Degree
Run Model

Page 89: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

JMP Output

Conclusion: There is strong evidence of an interaction between alloy and temperature.

Page 90: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

Conclusion

The one sample t-test allows us to test whether the population mean of a group is equal to a specified value.

The two-sample t-test and paired t-test allow us to determine if the population means of two groups are different.

ANOVA allows us to determine whether the population means of several groups are different.

Page 91: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

SAS, SPSS and R

For information about using SAS, SPSS and R to do ANOVA:

http://www.ats.ucla.edu/stat/sas/topics/anova.htm

http://www.ats.ucla.edu/stat/spss/topics/anova.htm

http://www.ats.ucla.edu/stat/r/sk/books_pra.htm

Page 92: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

References

Fisher’s Irises Data (used in one sample and two sample t-test examples).

Flexibility data (paired t-test example): Michael Sullivan III. Statistics: Informed Decisions Using Data. Upper Saddle River, New Jersey: Pearson Education, 2004: 602.

Page 93: Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech.

Special thanks to Jennifer Kensler for course materials and help with JMP!
