Intermediate Advanced Stats Notes 2020

Richard S. Balkin, Ph.D., 2020


Hypothesis Testing

I. Two types of hypotheses:

A. Research/scientific hypotheses include the following:

1. a declarative statement about what we think should happen in our study

2. an examination of the variables that deter the study

B. Statistical hypotheses: you have a null and an alternative

1. decide between them: the null is either retained or rejected in favor of the alternative

a. $H_0: \mu_1 = \mu_2$

b. $H_1: \mu_1 \neq \mu_2$

2. supports the existence or lack of

a. accountability

b. evidence

c. does not prove

II. Hypothesis testing is governed by the Central Limit Theorem

A. The distribution of sample means from random samples will be (approximately) normally distributed

B. The larger the sample, the closer the sample mean tends to be to the population mean (the standard error of the mean, $\sigma/\sqrt{n}$, gets smaller)

C. If your sample is large enough, outliers will not keep the sampling distribution of the mean from approaching normality

D. Sample size is not as important as representativeness

E. With a representative sample, the Central Limit Theorem allows you to generalize about the population
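The Central Limit Theorem points above can be checked with a small simulation. This is a sketch under stated assumptions: the skewed exponential population (mean 1, standard deviation 1), the seed, the number of replications, and the function name `simulate_sample_means` are illustrative choices, not part of the notes. As n grows, the mean of the sample means should stay near 1 and their standard deviation should track $\sigma/\sqrt{n}$.

```python
import random
from statistics import mean, pstdev

def simulate_sample_means(n, reps=2000, seed=2020):
    """Draw `reps` random samples of size n from a skewed (exponential)
    population with mean 1 and SD 1; return the list of sample means."""
    rng = random.Random(seed)
    return [mean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

for n in (5, 25, 100):
    means = simulate_sample_means(n)
    # The spread of the sample means should be close to sigma / sqrt(n) = 1 / sqrt(n).
    print(f"n={n:3d}  mean of means={mean(means):.3f}  "
          f"SD of means={pstdev(means):.3f}  sigma/sqrt(n)={1 / n ** 0.5:.3f}")
```

Even though the population is strongly skewed, a histogram of the n = 100 means would look close to normal, which is what licenses the normal-theory tests that follow.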


F. Steps to hypothesis testing

1. state the null and alternative hypotheses

2. identify the level of significance (α); this is subjective. The test may be directional or non-directional for t-tests or z-tests

3. perform the test

4. retain or reject the null

5. state the results

G. Error

Decision        H0 true                  H0 false
Retain null     Correct (1 − α)          Type II error (β)
Reject null     Type I error (α)         Correct: power (1 − β)

When α increases, power increases.

Increasing sample size is the greatest, most effective means of increasing power. For example, with σ = 5, n = 25, a sample mean of 20, and µ = 19:

$\sigma_{\bar X} = \sigma / \sqrt{n} = 5 / \sqrt{25} = 1$

$z = (\bar X - \mu) / \sigma_{\bar X} = (20 - 19) / 1 = 1$

If we increase the sample size to n = 100, then:

$\sigma_{\bar X} = 5 / \sqrt{100} = .5$

$z = (20 - 19) / .5 = 2$

A higher sample size decreases error (the standard error of the mean) and increases power.
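A minimal sketch of the two z calculations above, plus the power of a one-tailed z test. Assumption (not stated in the notes): the true population mean is 20, so the true difference from µ = 19 is one point, and α = .05 unless given. The function names are hypothetical; only Python's standard library is used.

```python
from math import sqrt
from statistics import NormalDist

def z_for_mean(sample_mean, mu, sigma, n):
    """z = (sample mean - mu) / (sigma / sqrt(n))."""
    standard_error = sigma / sqrt(n)
    return (sample_mean - mu) / standard_error

def one_tailed_power(true_diff, sigma, n, alpha=0.05):
    """Power of a one-tailed z test when the true mean exceeds mu by true_diff."""
    z_crit = NormalDist().inv_cdf(1 - alpha)   # about 1.645 when alpha = .05
    shift = true_diff / (sigma / sqrt(n))       # expected z under the alternative
    return 1 - NormalDist().cdf(z_crit - shift)

print(z_for_mean(20, 19, 5, 25))    # 1.0
print(z_for_mean(20, 19, 5, 100))   # 2.0
for n in (25, 100):
    print(n, round(one_tailed_power(1, 5, n), 3))
```

Under these assumptions power rises from roughly .26 at n = 25 to roughly .64 at n = 100, and raising α (say to .10) also raises power, matching the two claims above.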

Group 1 and Group 2 scores, with deviation scores (x), squared deviations (x²), and cross-products (xy). The same data are used for the independent t-test and the dependent t-test.

Group 1                 Group 2
Score   x    x²         Score   x    x²    xy
20      0    0          18     −1    1     0
18     −2    4          16     −3    9     6
21      1    1          20      1    1     1
18     −2    4          19      0    0     0
23      3    9          21      2    4     6
20      0    0          20      1    1     0

Group 1: $\bar X_1 = 20$, $\sum x_1^2 = 18$, $s_1 = 1.8974$, $s_{\bar X_1} = .7746$

Group 2: $\bar X_2 = 19$, $\sum x_2^2 = 16$, $s_2 = 1.7889$, $s_{\bar X_2} = .7303$

$\sum xy = 13$, $s_{xy} = 2.6$, $r = .7660$

Independent t-test:

$t = \frac{\bar X_1 - \bar X_2}{\sqrt{\left(\frac{\sum x_1^2 + \sum x_2^2}{(n_1 - 1) + (n_2 - 1)}\right)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{20 - 19}{\sqrt{\left(\frac{18 + 16}{10}\right)\left(\frac{1}{6} + \frac{1}{6}\right)}} = \frac{1}{1.0646} = .9393$

$.9393 < t_{crit}(10) = 1.812$ (α = .05, one-tailed), so retain the null.

Dependent t-test:

$t = \frac{\bar X_1 - \bar X_2}{\sqrt{s_{\bar X_1}^2 + s_{\bar X_2}^2 - 2 r s_{\bar X_1} s_{\bar X_2}}} = \frac{20 - 19}{\sqrt{.6000 + .5334 - 2(.7660)(.7746)(.7303)}} = \frac{1}{\sqrt{1.1334 - .8666}} = 1.94$

$1.94 < t_{crit}(5) = 2.015$ (α = .05, one-tailed), so retain the null.
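Both t-tests can be reproduced from the raw scores. This is a sketch, not the notes' own procedure: the function names are hypothetical, and the code carries full precision, so the hand result 1.94 appears as 1.9365.

```python
from math import sqrt
from statistics import mean

group1 = [20, 18, 21, 18, 23, 20]
group2 = [18, 16, 20, 19, 21, 20]

def independent_t(a, b):
    """Pooled-variance t built from the sums of squared deviations (sum of x^2)."""
    n1, n2 = len(a), len(b)
    m1, m2 = mean(a), mean(b)
    ss1 = sum((x - m1) ** 2 for x in a)
    ss2 = sum((x - m2) ** 2 for x in b)
    pooled = (ss1 + ss2) / ((n1 - 1) + (n2 - 1))
    return (m1 - m2) / sqrt(pooled * (1 / n1 + 1 / n2))

def dependent_t(a, b):
    """Dependent t from the two standard errors of the mean and their correlation r."""
    n = len(a)
    m1, m2 = mean(a), mean(b)
    dev1 = [x - m1 for x in a]
    dev2 = [y - m2 for y in b]
    s1 = sqrt(sum(d * d for d in dev1) / (n - 1))
    s2 = sqrt(sum(d * d for d in dev2) / (n - 1))
    r = sum(d1 * d2 for d1, d2 in zip(dev1, dev2)) / ((n - 1) * s1 * s2)
    se1, se2 = s1 / sqrt(n), s2 / sqrt(n)
    return (m1 - m2) / sqrt(se1 ** 2 + se2 ** 2 - 2 * r * se1 * se2)

print(round(independent_t(group1, group2), 4))   # 0.9393
print(round(dependent_t(group1, group2), 4))     # 1.9365
```

The correlation term is what separates the two tests: subtracting $2 r s_{\bar X_1} s_{\bar X_2}$ shrinks the denominator, so the same one-point mean difference yields a larger t when the scores are paired.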

ANOVA

I. Model equation

A. Completely Randomized Design (CR-p)

$Y_{ij} = \mu + \alpha_j + \varepsilon_{ij}$

1. j = specific treatment (tx) level

2. i = observation for each participant

B. Terms

1. $Y_{ij}$ = observation for subject i in tx level j

2. $\mu$ = grand mean

3. $\alpha_j = \mu_j - \mu$ = tx effect for population j, equal to the population mean minus the grand mean

4. $\varepsilon_{ij}$ = error effect

C. We do ANOVA, rather than several t-tests, to keep type I error low. For example, five t-tests each run at α = .05 carry a total type I error rate of up to .25 (5 × .05); this is unacceptable.

II. Computing ANOVA

J = # of groups

$n_j$ = # of subjects in a specific group (treatment level); $n_.$ = number in total sample

$\bar X_j$ = mean of a specific group (treatment level); $\bar X_.$ = grand mean

$s^2$ = variance; s = standard deviation

Sum of squares between is the sum of the squared deviations of each group mean from the grand mean, weighted by group size:

$SS_B = \sum n_j (\bar X_j - \bar X_.)^2 = n_1(\bar X_1 - \bar X_.)^2 + n_2(\bar X_2 - \bar X_.)^2 + \dots$

Mean square between is the estimate of the population variance between groups:

$MS_B = SS_B / (J - 1)$

Sum of squares within is the sum of the sample variances, each weighted by its degrees of freedom:

$SS_w = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \dots$

Mean square within (also known as mean square error) is, with equal n's, the mean of the J sample variances:

$MS_w = SS_w / (n_. - J)$
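The sums of squares and mean squares above can be sketched in Python for the four-group data used in the worked example that follows (n = 5 per group). The function name is hypothetical. The sketch also computes eta-squared and Cohen's f as they are defined in the effect size section; the notes report f = 1.31 but do not compute eta-squared for these data, so the value of about .68 printed here is only an illustration.

```python
from math import sqrt
from statistics import mean, variance

# The four groups (n = 5 each) from the worked example.
groups = [
    [4, 6, 8, 3, 9],
    [9, 11, 8, 9, 8],
    [8, 6, 9, 5, 7],
    [1, 2, 3, 5, 1],
]

def one_way_anova(samples):
    """Return sums of squares, mean squares, F, eta-squared, and Cohen's f."""
    scores = [x for s in samples for x in s]
    J, n_total = len(samples), len(scores)
    grand = mean(scores)
    ss_b = sum(len(s) * (mean(s) - grand) ** 2 for s in samples)
    ss_w = sum((len(s) - 1) * variance(s) for s in samples)   # (n_j - 1) * s_j^2
    ms_b = ss_b / (J - 1)
    ms_w = ss_w / (n_total - J)
    group_means = [mean(s) for s in samples]
    # Cohen's f as in the notes: sqrt([sum of (mean_j - grand)^2 / J] / MS_w); equal n assumed
    f = sqrt(sum((m - grand) ** 2 for m in group_means) / J / ms_w)
    return {"ss_b": ss_b, "ms_b": ms_b, "ss_w": ss_w, "ms_w": ms_w,
            "F": ms_b / ms_w, "eta_sq": ss_b / (ss_b + ss_w), "cohens_f": f}

for name, value in one_way_anova(groups).items():
    print(f"{name}: {value:.4f}")
```

The output should match the hand results: SS_B = 114.6, MS_B = 38.2, SS_w = 53.2, MS_w = 3.325, and F ≈ 11.49.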

Degrees of freedom:

$df_B = J - 1$; $df_w = (n_1 - 1) + (n_2 - 1) + \dots = n_. - J$

Grand mean:

$\bar X_. = (n_1 \bar X_1 + n_2 \bar X_2 + \dots) / n_.$

$F = MS_B / MS_w$

Example (n = 5 per group):

Group 1   Group 2   Group 3   Group 4
4         9         8         1
6         11        6         2
8         8         9         3
3         9         5         5
9         8         7         1

$\bar X_1 = 6$, $\bar X_2 = 9$, $\bar X_3 = 7$, $\bar X_4 = 2.4$, $\bar X_. = 6.1$

$s_1^2 = 6.5$, $s_2^2 = 1.5$, $s_3^2 = 2.5$, $s_4^2 = 2.8$

$SS_B = 5(6 - 6.1)^2 + 5(9 - 6.1)^2 + 5(7 - 6.1)^2 + 5(2.4 - 6.1)^2 = .05 + 42.05 + 4.05 + 68.45 = 114.6$

$MS_B = 114.6 / 3 = 38.2$

$SS_w = 6.5(4) + 1.5(4) + 2.5(4) + 2.8(4) = 53.2$

$MS_w = 53.2 / (20 - 4) = 3.325$

$F = 38.2 / 3.325 = 11.49$

III. Model Assumptions

A. Independence: sampling, treatment implementation

B. Normality: there are significance tests for both skewness and kurtosis

1. Skewness

$z_s = S / SE_s$, where S = skewness and the standard error of skewness is $SE_s = \sqrt{6/N}$

2. Kurtosis

$z_k = K / SE_k$, where K = kurtosis and the standard error of kurtosis is $SE_k = \sqrt{24/N}$

Alpha levels of .01 or .001 are appropriate to evaluate the significance of skewness and kurtosis with small or moderate sample sizes. With a large sample size, examine the shape of the distribution instead. Skewness above +1 or below −1 should be evaluated for transformation. Kurtosis usually stabilizes once skewness is stable.

3. The Shapiro-Wilk statistic is also a measure of normality. Check it at the .01 level of significance.

For ANOVA, consider whether a transformation is necessary if the normality assumption is breached.

Sample data forms with recommended transformations:

As with many statistical techniques, transformation is an iterative process that requires evaluation after each calculation. Check that a variable is normally or near-normally distributed after transformation; if not, redo it with a more appropriate transformation.

Continue attempting transformations until skewness and kurtosis values are nearest zero or there are the fewest outliers. The suggested transformations above are intended to bring the distribution closer to normal.
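The skewness and kurtosis z tests in III.B can be sketched as below. Assumption: the sketch uses the simple moment-based statistics (g1 for skewness, g2 for excess kurtosis, both 0 for a normal curve); many statistical packages report small-sample adjusted versions, so their values will differ somewhat for small N. Function names are hypothetical.

```python
from math import sqrt

def skew_kurtosis(data):
    """Moment-based skewness (g1) and excess kurtosis (g2)."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3

def normality_z(data):
    """z_s = S / sqrt(6/N) and z_k = K / sqrt(24/N), as in the notes."""
    n = len(data)
    s, k = skew_kurtosis(data)
    return s / sqrt(6 / n), k / sqrt(24 / n)

# Group 1 from the ANOVA example; compare |z| with the .01 or .001 critical value.
print(normality_z([4, 6, 8, 3, 9]))
```

Rerunning `normality_z` after each candidate transformation is one way to carry out the iterative check described above.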

C. Homogeneity of Variance (HOV): Brown-Forsythe, Levene statistic

1. For t-tests, if n's are equal, the t-test is robust; the procedure gives accurate results

2. If n's are unequal:

a. liberal F-test: inflated type I error if

1. larger variances are paired with smaller n's and

2. smaller variances are paired with larger n's

b. conservative F-test: deflated type I error if

1. larger variances are paired with larger n's and

2. smaller variances are paired with smaller n's

3. Effects of model assumption failure

a. Non-normality: negligible consequences for type I and type II error UNLESS

1. populations are highly skewed

2. n's are very small

3. one-tailed tests are employed (t or z tests only)

4. Kurtosis: leptokurtic distributions increase power; platykurtic distributions decrease power

b. HOV: if n's are equal, ANOVA is robust; when n's are unequal and HOV is not achieved, consider F' (Welch statistic), t', or F* (Brown-Forsythe)

c. Which HOV test to use?

1. Levene statistic

a. runs an ANOVA on each observation's absolute deviation from its group mean

b. the default in SPSS is the Levene statistic

c. Glass and Hopkins are very critical of the Levene statistic:

1. not robust to nonnormality

2. poor estimator with unequal n's

2. Brown-Forsythe is the preferred test

a. also uses ANOVA, but on each observation's absolute deviation from its group median

b. robust to nonnormal distributions

c. accurate type I error rates

pp. 100-107 Kirk (1995)
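The difference between the two HOV tests is only the center used for the absolute deviations, which the sketch below makes explicit. The function names are hypothetical. As a cross-check, SciPy's `scipy.stats.levene` exposes the same choice through its `center` argument ('mean' for Levene, 'median' for Brown-Forsythe).

```python
from statistics import mean, median

def one_way_f(samples):
    """Ordinary one-way ANOVA F = MS_between / MS_within."""
    scores = [x for s in samples for x in s]
    J, N = len(samples), len(scores)
    grand = mean(scores)
    ss_b = sum(len(s) * (mean(s) - grand) ** 2 for s in samples)
    ss_w = sum((x - mean(s)) ** 2 for s in samples for x in s)
    return (ss_b / (J - 1)) / (ss_w / (N - J))

def levene_f(samples):
    """Levene: ANOVA on |observation - group mean|."""
    return one_way_f([[abs(x - mean(s)) for x in s] for s in samples])

def brown_forsythe_f(samples):
    """Brown-Forsythe: ANOVA on |observation - group median|."""
    return one_way_f([[abs(x - median(s)) for x in s] for s in samples])

groups = [[4, 6, 8, 3, 9], [9, 11, 8, 9, 8], [8, 6, 9, 5, 7], [1, 2, 3, 5, 1]]
print(levene_f(groups), brown_forsythe_f(groups))
```

Because the median is less pulled by extreme scores than the mean, the Brown-Forsythe version is the one that stays well behaved under nonnormality.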

IV. Power in ANOVA

A. Power in ANOVA is affected by

1. the magnitude of differences between means

2. error variance

3. degrees of freedom in the numerator: J − 1

4. error degrees of freedom: n. − J

5. type I error: α

B. To determine power, compute φ, the noncentrality parameter:

$\phi = \sqrt{\frac{\sum n_j(\bar X_j - \bar X_.)^2}{J(MS_w)}} = \sqrt{\frac{SS_B}{J(MS_w)}}$

Suppose we have three groups. Each group has 50 participants (N = 150). The WISC III is administered and has a standard deviation of 15 ($MS_w = 15^2 = 225$). Group 1 has a mean of 90; Group 2 has a mean of 95; Group 3 has a mean of 100. Grand mean = 95.

$\hat\alpha_1 = 90 - 95 = -5$; $\hat\alpha_1^2 = 25$

$\hat\alpha_2 = 95 - 95 = 0$; $\hat\alpha_2^2 = 0$

$\hat\alpha_3 = 100 - 95 = 5$; $\hat\alpha_3^2 = 25$

$\phi = \sqrt{\frac{50(25) + 50(0) + 50(25)}{3(225)}} = 1.92$

$1 - \beta = .84$ (check Table G)

C. What if the standard deviation is unknown?

1. We can estimate using standard deviation units; this is subjective.

2. Let's assume we have 3 levels of an independent variable such as socioeconomic status (low, middle, high).

3. We will assume the high SES group will be 1 standard deviation above the low SES group in a given study; middle SES will be between low and high. So, in standard deviation units ($MS_w = 1$):

$\bar X_1 = 0$, $\bar X_2 = .5$, $\bar X_3 = 1.0$; $n_1 = 20$, $n_2 = 40$, $n_3 = 20$

$\bar X_. = [20(0) + 40(.5) + 20(1)] / 80 = .5$

$\hat\alpha_1^2 = (0 - .5)^2 = .25$; $\hat\alpha_2^2 = (.5 - .5)^2 = 0$; $\hat\alpha_3^2 = (1 - .5)^2 = .25$

$\phi = \sqrt{\frac{SS_B}{J(MS_w)}} = \sqrt{\frac{20(.25) + 40(0) + 20(.25)}{3(1)}} = \sqrt{\frac{10}{3}} = 1.83$

$1 - \beta = .75$

check Table G
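Both φ values above can be computed from the hypothesized group sizes and means. This sketch (hypothetical function name) stops at φ; converting φ to power still requires Table G or software for the noncentral F distribution.

```python
from math import sqrt

def noncentrality_phi(ns, means, ms_w):
    """phi = sqrt(SS_B / (J * MS_w)), with SS_B built from the hypothesized means."""
    n_total = sum(ns)
    grand = sum(n * m for n, m in zip(ns, means)) / n_total
    ss_b = sum(n * (m - grand) ** 2 for n, m in zip(ns, means))
    return sqrt(ss_b / (len(means) * ms_w))

# WISC example: sigma = 15, so MS_w = 225
print(round(noncentrality_phi([50, 50, 50], [90, 95, 100], 15 ** 2), 2))   # 1.92
# SES example in standard deviation units (MS_w = 1)
print(round(noncentrality_phi([20, 40, 20], [0.0, 0.5, 1.0], 1.0), 2))     # 1.83
```

Working in standard deviation units is what lets the second call use MS_w = 1 when the actual standard deviation is unknown.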

V. Effect size

Practical significance is measured by effect size. For ANOVA we use Cohen's d or Cohen's f. Cohen's d compares two groups and may also be used for t-tests; Cohen's f compares two or more groups.

Cohen's d:

$d = \frac{\bar X_1 - \bar X_2}{s_{error}}$, where $s_{error} = \sqrt{\frac{\sum x_1^2 + \sum x_2^2}{(n_1 - 1) + (n_2 - 1)}}$ or $\sqrt{MS_w}$

note: $\sum x^2 = s^2(n - 1)$

Example (the two-group data from the t-test example): $\bar X_1 = 20$, $\bar X_2 = 19$, $s_1 = 1.90$, $s_2 = 1.79$, $\sum x_1^2 = 18$, $\sum x_2^2 = 16$

$d = \frac{20 - 19}{\sqrt{\frac{18 + 16}{(6 - 1) + (6 - 1)}}} = \frac{1}{\sqrt{34/10}} = \frac{1}{1.84} = .54$

This is a medium effect size, demonstrating a moderate level of practical significance.

Small = .2   Medium = .5   Large = .8
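A short sketch of the Cohen's d example, pooling the two sums of squared deviations exactly as in the formula above. The function name is hypothetical.

```python
from math import sqrt
from statistics import mean

def cohens_d(a, b):
    """d = (mean_a - mean_b) / s_error, with s_error pooled from the sums of squares."""
    ma, mb = mean(a), mean(b)
    ss_a = sum((x - ma) ** 2 for x in a)   # sum of x^2 for group a (deviation scores)
    ss_b = sum((x - mb) ** 2 for x in b)
    s_error = sqrt((ss_a + ss_b) / ((len(a) - 1) + (len(b) - 1)))
    return (ma - mb) / s_error

group1 = [20, 18, 21, 18, 23, 20]
group2 = [18, 16, 20, 19, 21, 20]
print(round(cohens_d(group1, group2), 2))   # 0.54
```

Because d is expressed in standard deviation units, the same .54 would result if both groups were rescaled, which is why it can be compared against the small/medium/large benchmarks.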

Cohen's f:

$f = \sqrt{\frac{\sum(\bar X_j - \bar X_.)^2 / J}{MS_w}} = \sqrt{\frac{[(6 - 6.1)^2 + (9 - 6.1)^2 + (7 - 6.1)^2 + (2.4 - 6.1)^2] / 4}{3.325}} = \sqrt{\frac{(.01 + 8.41 + .81 + 13.69) / 4}{3.325}} = 1.31$

This is a large effect size, demonstrating very strong practical significance.

Small = .10   Medium = .25   Large = .40

There is also another method of demonstrating practical significance: eta-squared ($\eta^2$), which refers to the strength of association between the independent variable(s) and the dependent variable. Essentially, it indicates the amount of variance in the dependent variable accounted for by the independent variable(s). If the strength of association is weak, or low, the independent variable(s) have less meaning/relevance to the dependent variable.

$\eta^2 = \frac{\sigma^2_\alpha}{\sigma^2_\alpha + \sigma^2_e}$; estimated as $\hat\eta^2 = \frac{SS_{BG}}{SS_{BG} + SS_{WG}} = \frac{SS_{BG}}{SS_{TO}}$

Small = .01   Medium = .059   Large = .138

Effect size is a better measure of practical significance than strength of association because effect size is reported in standard deviation units, whereas strength of association is concerned with accounting for the amount of variance. The amount of variance accounted for will be an issue in multiple regression.

VI. Post Hoc Analyses

A. When conducting an ANOVA on three or more groups, a statistically significant result means that at least one group is statistically significantly different from the other groups. Post hoc analyses determine what group or groups are statistically significantly different

from the others. In a post hoc analysis, each group mean is compared to all other group means, and each pairwise difference is then tested with a t-like statistic. The most common type of post hoc analysis is known as a Tukey post hoc.

B. Conducting a Tukey post hoc analysis

1. Determine the maximum number of comparisons:

$C = J(J - 1) / 2$

2. Compute q, the difference between two means divided by the standard error of the mean, $s_{\bar X}$. Unlike a z-test, we do not use the standard deviation to compute $s_{\bar X}$; we use $MS_w$:

$s_{\bar X} = \sqrt{MS_w / n_j}$, where $n_j$ refers to the sample size of each group

If n's are not equal, then

$s_{\bar X} = \sqrt{\frac{MS_w}{2}\left(\frac{1}{n_j} + \frac{1}{n_k}\right)}$, where $n_j$ and $n_k$ refer to the sample sizes of the two groups being compared.

Example (from the four-group ANOVA example):

4 groups: C = 4(4 − 1) / 2 = 6

$s_{\bar X} = \sqrt{3.325 / 5} = .82$

First we compare the two groups with the greatest difference between the means. If they are not significant, the other groups will not be either.

$q_1 = \frac{\bar X_2 - \bar X_4}{s_{\bar X}} = \frac{9 - 2.4}{.82} = 8.05$

3. Use Table I: number of means = J (# of groups); $df_w = n_. - J$

$q_{crit}(4, 16) = 4.05$, α = .05

8.05 > 4.05

Since $q_1$ is statistically significant, we compare the next pair, continuing until we have either compared all of the groups or a pair is not statistically significant. $q_{crit}$ remains the critical value for each of the comparisons.

$q_2 = \frac{\bar X_3 - \bar X_4}{s_{\bar X}} = \frac{7 - 2.4}{.82} = 5.61$

5.61 > 4.05

Since $q_2$ is statistically significant, we would move to the next largest pair until we see that a pair is not significant or all groups are statistically significant.

C. There are other methods of comparing mean differences. The Newman-Keuls method is the same as the Tukey but does not adjust for type I error. Thus, if α = .05 and you are making 6 comparisons, you have up to a 30% chance of a type I error (.05 × 6). BUYER BEWARE!

VII. Planned vs. post hoc (a priori vs. post hoc (a posteriori))

A. Multiple comparisons

1. unplanned post hoc, e.g., Tukey

2. planned a priori, e.g., contrast comparisons

B. Contrasts:

1. Each mean in a comparison is multiplied by a theoretical constant (contrast coefficient), and the coefficients add to zero; e.g., $\psi = \mu_3 - \mu_6$ has contrast coefficients of 1 and −1, respectively

2. Simple contrast = a comparison of 2 means; complex contrast = a comparison involving 3 or more means

3. (p. 454 G&H) We can use contrasts to detect significant differences between groups using a t ratio as a test statistic. For example, using the four-group ANOVA example above, we are going to compare groups 1 & 3 to group 2 ($c_1 = 1/2$, $c_2 = -1$, $c_3 = 1/2$) and group 2 to group 3.

a. Compute the standard error of a contrast:

$s_{\hat\psi} = \sqrt{MS_w\left(\frac{c_1^2}{n_1} + \frac{c_2^2}{n_2} + \dots + \frac{c_j^2}{n_j}\right)} = \sqrt{3.325\left(\frac{(1/2)^2}{5} + \frac{(-1)^2}{5} + \frac{(1/2)^2}{5}\right)} = \sqrt{3.325(.3)} = \sqrt{.9975} = .9987$

b. Compute $\hat\psi$:

$\hat\psi = \sum c_j \bar X_j = c_1\bar X_1 + c_2\bar X_2 + \dots + c_j\bar X_j = \tfrac{1}{2}(6) + (-1)(9) + \tfrac{1}{2}(7) = -2.5$

c. Compute the t ratio:

$t = \hat\psi / s_{\hat\psi} = -2.5 / .9987 = -2.50$

d. Now do the other contrast (group 2 to group 3):

compute $\hat\psi = (1)(9) + (-1)(7) = 2.0$, with $s_{\hat\psi} = \sqrt{3.325(1/5 + 1/5)} = 1.153$

compute the t ratio: $t = 2.0 / 1.153 = 1.734$

e. Use Table L for the critical value: v = n. − J = 16; the number of contrasts was 2. The critical value at the .05 level is 2.467, so the first contrast (|t| = 2.50) is statistically significant and the second (t = 1.734) is not.

f. When doing complex comparisons, use Scheffé. Keep in mind that Scheffé does not use a contrast-based type I error rate; it controls the family-based type I error rate.

VIII. Planned Orthogonal Contrast (17.16, 17.18)
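Planned orthogonal contrasts reuse the contrast arithmetic from VII.B.3. The sketch below assumes equal n (5 per group) and takes MS_w = 3.325 and the Table L critical value 2.467 from the worked example; it computes $\hat\psi$, its standard error, and t for the two contrasts, then checks orthogonality as the sum of coefficient cross-products, the rule applied in the examples of this section. Function and constant names are hypothetical.

```python
from fractions import Fraction as Fr
from math import sqrt

# Four-group example: group means, n = 5 per group, MS_w = 3.325
means = [6, 9, 7, 2.4]
ns = [5, 5, 5, 5]
MS_W = 3.325
T_CRIT = 2.467   # Table L value quoted in the notes (2 contrasts, v = 16, alpha = .05)

def contrast_t(coefs, means, ns, ms_w):
    """Return (psi_hat, its standard error, t ratio) for one contrast."""
    psi_hat = sum(c * m for c, m in zip(coefs, means))
    se = sqrt(ms_w * sum(c ** 2 / n for c, n in zip(coefs, ns)))
    return psi_hat, se, psi_hat / se

def cross_product(c1, c2):
    """Sum of coefficient products; 0 means orthogonal (equal n assumed)."""
    return sum(a * b for a, b in zip(c1, c2))

for coefs in ([0.5, -1, 0.5, 0], [0, 1, -1, 0]):
    psi, se, t = contrast_t(coefs, means, ns, MS_W)
    print(coefs, round(psi, 2), round(se, 4), round(t, 3), abs(t) > T_CRIT)

# Seven-group example: psi_2 and psi_3 are not orthogonal
q, third = Fr(1, 4), Fr(-1, 3)
psi2 = [0, 0, 1, 0, 0, -1, 0]
psi3 = [q, q, q, q, third, third, third]
print(cross_product(psi2, psi3))   # 7/12
```

Exact fractions are used for the orthogonality check so that a cross-product of zero is not obscured by floating-point rounding.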

A. POC are the most powerful tests of mean differences

B. POC uses a contrast-based type I error rate

C. Orthogonality refers to the unique, non-overlapping information of J − 1 orthogonal contrasts. With equal n's, two contrasts are orthogonal when the sum of the products of their coefficients is zero. POC is appropriate when

1. all research questions can be answered by C = J − 1 or fewer contrasts

2. all contrasts are orthogonal

Examples:

We have seven groups, and we are going to compare group 1 to group 7; group 3 to group 6; and groups 1-4 to groups 5-7:

$\psi_1$ = [1 0 0 0 0 0 −1]

$\psi_2$ = [0 0 1 0 0 −1 0]

$\psi_3$ = [¼ ¼ ¼ ¼ −⅓ −⅓ −⅓]

$\psi_1$ & $\psi_2$: 1(0) + 0(0) + 0(1) + 0(0) + 0(0) + 0(−1) + (−1)(0) = 0

$\psi_2$ & $\psi_3$: 0(¼) + 0(¼) + 1(¼) + 0(¼) + 0(−⅓) + (−1)(−⅓) + 0(−⅓) = 7/12

Not orthogonal

4 groups: compare 1 to 2; 3 to 4; 1-2 to 3-4:

$\psi_1$ = [1 −1 0 0]

$\psi_2$ = [0 0 1 −1]

$\psi_3$ = [−½ −½ ½ ½]

$\psi_1$ & $\psi_2$: 1(0) + (−1)(0) + 0(1) + 0(−1) = 0

$\psi_2$ & $\psi_3$: 0(−½) + 0(−½) + 1(½) + (−1)(½) = 0

$\psi_1$ & $\psi_3$: 1(−½) + (−1)(−½) + 0(½) + 0(½) = 0

Orthogonal

D. While this method increases power, the conditions requiring contrasts to be planned and orthogonal restrict its utility.

Factorial ANOVA

I. Factorial ANOVA is used when there are 2 or more independent variables

A. 2 types of hypotheses are tested

1. whether levels in the independent variables (IVs) are statistically different

2. whether there is an interaction between or among the IVs

B. Main effects: IV = Gender, SES; DV = test score

1. Is the test score significantly different between males and females?

2. Is the test score significantly different among SES levels?

3. Are mean differences between males and females consistent across SES? (interaction)

[Figure: e.g., no interaction. Test score by SES (low, middle, high), with separate lines for males and females.]

[Figure: interaction effect. Test score by SES (low, middle, high), with separate lines for males and females.]

C. The absence of an interaction is the statistical justification for generalizability.

D. If an interaction does exist, generalization of the main effects must be explained.

Factorial ANOVA example: Gender (J = 2 rows) × SES (K = 3 columns), n = 3 per cell

                Low     Middle   High     Row means
Males           2       2        2
                4       3        2
                3       2        2.5
  Cell mean     3       2.33     2.17     2.5
Females         6       5        5
                8       3        6
                7.5     4        5
  Cell mean     7.17    4        5.33     5.5
Column means    5.08    3.17     3.75     Grand mean = 4.0

So, we will have 3 F-tests: 1) Gender, 2) SES, 3) Gender × SES.

Row effects: $\hat\alpha_1 = 2.5 - 4.0 = -1.5$; $\hat\alpha_2 = 5.5 - 4.0 = 1.5$

Column effects: $\hat\beta_1 = 5.08 - 4.0 = 1.08$; $\hat\beta_2 = 3.17 - 4.0 = -.83$; $\hat\beta_3 = 3.75 - 4.0 = -.25$

$SS_A = nK\sum\hat\alpha_j^2 = nK\sum(\bar X_{j.} - \bar X_{..})^2 = 3(3)[(2.5 - 4.0)^2 + (5.5 - 4.0)^2] = 40.5$

$SS_B = nJ\sum\hat\beta_k^2 = nJ\sum(\bar X_{.k} - \bar X_{..})^2 = 3(2)[(5.08 - 4)^2 + (3.17 - 4)^2 + (3.75 - 4)^2] = 11.51$

For the interaction, each cell mean is compared with (grand mean + (row mean − grand mean) + (column mean − grand mean)):

$SS_{AB} = n\sum\sum[\bar X_{jk} - (\bar X_{..} + \hat\alpha_j + \hat\beta_k)]^2$

$= 3[(3 - (4 - 1.5 + 1.08))^2 + (2.33 - (4 - 1.5 - .83))^2 + (2.17 - (4 - 1.5 - .25))^2 + (7.17 - (4 + 1.5 + 1.08))^2 + (4 - (4 + 1.5 - .83))^2 + (5.33 - (4 + 1.5 - .25))^2]$

$= 3[.34 + .44 + .006 + .35 + .45 + .006] = 3(1.592) = 4.776$

$SS_W = \sum_K\sum_J\sum_i(X_{iJK} - \bar X_{JK})^2$ (individual score − cell mean)

$SS_W = (2 - 3)^2 + (4 - 3)^2 + (3 - 3)^2 + (2 - 2.33)^2 + (3 - 2.33)^2 + (2 - 2.33)^2 + (2 - 2.17)^2 + (2 - 2.17)^2 + (2.5 - 2.17)^2 + (6 - 7.17)^2 + (8 - 7.17)^2 + (7.5 - 7.17)^2 + (5 - 4)^2 + (3 - 4)^2 + (4 - 4)^2 + (5 - 5.33)^2 + (6 - 5.33)^2 + (5 - 5.33)^2$

$= 1 + 1 + 0 + .11 + .45 + .11 + .03 + .03 + .11 + 1.37 + .69 + .11 + 1 + 1 + 0 + .11 + .45 + .11 = 7.68$

DF:

$df_A = J - 1$; $df_B = K - 1$; $df_{AB} = (J - 1)(K - 1)$; $df_W = n_. - (J)(K)$

$MS_A = 40.5 / 1 = 40.5$

$MS_B = 11.51 / 2 = 5.755$

$MS_{AB} = 4.776 / 2 = 2.388$

$MS_E = 7.68 / 12 = .64$

$F_A(1, 12) = 40.5 / .64 = 63.28$

$F_B(2, 12) = 5.755 / .64 = 8.99$

$F_{AB}(2, 12) = 2.388 / .64 = 3.73$

$F_{crit}(2, 12) = 3.885$

Because $F_{AB} = 3.73 < 3.885$, the interaction is not statistically significant. Both main effects exceed their critical values ($F_B = 8.99 > 3.885$; $F_A = 63.28 > F_{crit}(1, 12) = 4.75$).
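The whole two-way table can be computed from the raw cell scores with the effect formulas above. This is a sketch assuming equal cell sizes; the function name is hypothetical. Because it does not round the cell means, its sums of squares differ slightly from the hand values (SS_B ≈ 11.58 vs. 11.51, SS_AB = 4.75 vs. 4.776, SS_W ≈ 7.67 vs. 7.68), while the conclusions are unchanged.

```python
from statistics import mean

# Worked example: Gender (J = 2 rows) by SES (K = 3 columns), n = 3 scores per cell
cells = {
    ("Male", "Low"): [2, 4, 3], ("Male", "Middle"): [2, 3, 2], ("Male", "High"): [2, 2, 2.5],
    ("Female", "Low"): [6, 8, 7.5], ("Female", "Middle"): [5, 3, 4], ("Female", "High"): [5, 6, 5],
}

def two_way_anova(cells):
    """Equal-n two-way ANOVA using the row, column, and interaction effect formulas."""
    rows = sorted({r for r, _ in cells})
    cols = sorted({c for _, c in cells})
    J, K = len(rows), len(cols)
    n = len(next(iter(cells.values())))           # assumes every cell has n scores
    cell = {key: mean(v) for key, v in cells.items()}
    grand = mean(x for v in cells.values() for x in v)
    alpha = {r: mean(cell[(r, c)] for c in cols) - grand for r in rows}   # row effects
    beta = {c: mean(cell[(r, c)] for r in rows) - grand for c in cols}    # column effects
    ss = {
        "A": n * K * sum(a ** 2 for a in alpha.values()),
        "B": n * J * sum(b ** 2 for b in beta.values()),
        "AB": n * sum((cell[(r, c)] - (grand + alpha[r] + beta[c])) ** 2
                      for r in rows for c in cols),
        "W": sum((x - cell[key]) ** 2 for key, v in cells.items() for x in v),
    }
    df = {"A": J - 1, "B": K - 1, "AB": (J - 1) * (K - 1), "W": n * J * K - J * K}
    ms = {k: ss[k] / df[k] for k in ss}
    F = {k: ms[k] / ms["W"] for k in ("A", "B", "AB")}
    return ss, df, F

ss, df, F = two_way_anova(cells)
for effect in ("A", "B", "AB"):
    print(effect, round(ss[effect], 3), df[effect], round(F[effect], 2))
print("W", round(ss["W"], 3), df["W"])
```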

II. Analyzing factorial ANOVA with a nonsignificant interaction

A. Evaluate your data

B. Analyze model assumptions

C. Determine whether the interaction term is statistically significant. If you have a nonsignificant interaction, you can generalize the omnibus F-test

D. Determine whether either of the two main effects is statistically significant

E. Review the sample means and the results of any necessary post hoc procedures

F. Summarize the results

III. Analyzing factorial ANOVA with a significant interaction

A. Evaluate your data

B. Analyze model assumptions

C. Determine whether the interaction term is statistically significant. If you have a significant interaction, you cannot generalize the omnibus F-test

D. Plot the interaction

E. Test for simple effects

1. Identify your hypotheses

2. Sort your data and run your analysis

3. Calculate the F:

$F_{simple\ effect} = \frac{MS_{simple\ effect}}{MS_{within\ groups\ (omnibus)}}$

Use the appropriate df associated with each value

Use the F table to identify statistical significance

F. Review the sample means and the results of any necessary post hoc procedures

SPSS can perform the preceding steps using the following SPSS syntax:

UNIANOVA DV BY IV1 IV2
  /EMMEANS = TABLES(IV1*IV2) COMPARE(IV2).

Note: The COMPARE keyword could name the other IV instead.

G. Summarize the results.
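The simple-effects F in III.E.3 can be sketched for the worked Gender × SES data: the effect of SES is tested within each gender, and each simple-effect mean square is divided by the omnibus MS_within. This is for illustration only: in the worked example the interaction was not significant (3.73 < 3.885), so simple effects would not normally be run there. The function name is hypothetical; each F has df = (2, 12).

```python
from statistics import mean

cells = {
    ("Male", "Low"): [2, 4, 3], ("Male", "Middle"): [2, 3, 2], ("Male", "High"): [2, 2, 2.5],
    ("Female", "Low"): [6, 8, 7.5], ("Female", "Middle"): [5, 3, 4], ("Female", "High"): [5, 6, 5],
}

def simple_effects_of_ses(cells):
    """F for the SES effect within each gender, using the omnibus MS_within."""
    rows = sorted({r for r, _ in cells})
    cols = sorted({c for _, c in cells})
    cell = {key: mean(v) for key, v in cells.items()}
    df_w = sum(len(v) - 1 for v in cells.values())
    ms_w = sum((x - cell[k]) ** 2 for k, v in cells.items() for x in v) / df_w
    results = {}
    for r in rows:
        row_mean = mean(cell[(r, c)] for c in cols)
        ss_simple = sum(len(cells[(r, c)]) * (cell[(r, c)] - row_mean) ** 2 for c in cols)
        ms_simple = ss_simple / (len(cols) - 1)
        results[r] = ms_simple / ms_w
    return results, (len(cols) - 1, df_w)

F_simple, dfs = simple_effects_of_ses(cells)
print(dfs, {r: round(f, 2) for r, f in F_simple.items()})
```

Using the omnibus MS_within (rather than a within-gender error term) keeps the error df at 12, which is what the COMPARE option in the syntax above also does.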

Summary flowchart:

Evaluate your data → Analyze model assumptions → Determine the interaction effect

Non-significant interaction (p > α):
  Report main effects for each independent variable
  Perform post hoc procedures (Tukey, effect size)
  Note: Tukey need only be performed when the main effect is significant and only when there are three or more levels within the independent variable

Significant interaction (p < α):
  Plot the interaction
  Test for simple effects (use the SPSS syntax above)
  Analyze post hoc tests for each significant simple effect

Repeated Measures

I. Repeated measures follow a different experimental design, known as a randomized block design.

A. It differs in that each participant is measured more than once, or individuals are matched according to a specific variable.

B. As a result, repeated measures designs can be viewed as univariate or multivariate.

II. Analyzing Repeated Measures

A. Model assumptions

1. independence

2. normality/multivariate normality: repeated measures is quite robust to this assumption, especially when equal sample sizes are employed.

3. Sphericity is a way to measure homogeneity of covariance, a model assumption in MANOVA. Sphericity refers to the equality of the variances of the differences between each pair of repeated measures.

a. ε = extent to which the covariance matrix deviates from sphericity

b. If ε = 1, sphericity is met

c. If ε > .70, sphericity holds; univariate is more powerful

B. If ε < .70:

1. n is small: multivariate is more powerful

2. n is large: univariate is more powerful, but use the Greenhouse-Geisser correction

3. Stevens (2007) recommended examining both approaches to confirm results.

C. Post hoc procedures: use dependent t-tests, but with a Bonferroni correction:

α / c, where c = # of pairwise comparisons [c = # of groups (# of groups − 1) / 2]

For example, with 4 separate administrations, .05 / [4(4 − 1) / 2] = .05 / 6 = .008.

All results will be compared to the .008 level of significance.
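The notes do not give a formula for ε. The sketch below assumes the standard Greenhouse-Geisser estimate computed from the k × k covariance matrix of the repeated measures (it equals 1 under sphericity and cannot fall below 1/(k − 1)), and it also reproduces the Bonferroni arithmetic above. The covariance matrices are hypothetical, and so are the function names.

```python
def greenhouse_geisser_epsilon(S):
    """Greenhouse-Geisser epsilon from a k x k covariance matrix S (list of lists)."""
    k = len(S)
    grand = sum(map(sum, S)) / k ** 2                 # mean of all entries
    diag_mean = sum(S[i][i] for i in range(k)) / k    # mean of the variances
    row_means = [sum(row) / k for row in S]
    numerator = (k * (diag_mean - grand)) ** 2
    denominator = (k - 1) * (
        sum(x ** 2 for row in S for x in row)
        - 2 * k * sum(m ** 2 for m in row_means)
        + k ** 2 * grand ** 2
    )
    return numerator / denominator

def bonferroni_alpha(alpha, levels):
    """Per-comparison alpha for all pairwise comparisons among `levels` administrations."""
    comparisons = levels * (levels - 1) // 2
    return alpha / comparisons

# Hypothetical covariance matrices for three administrations
print(greenhouse_geisser_epsilon([[2, 1, 1], [1, 2, 1], [1, 1, 2]]))   # spherical
print(greenhouse_geisser_epsilon([[1, 0, 0], [0, 1, 0], [0, 0, 4]]))   # unequal variances
print(round(bonferroni_alpha(0.05, 4), 4))   # 0.0083
```

An ε from the second matrix (about .80) would fall in the "sphericity holds" band by the .70 rule above; values nearer 1/(k − 1) indicate a severe departure.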

Multiple Regression

• The purpose of multiple regression is to explain variance: to determine how, and to what extent, variability in the criterion variable (dependent variable) depends on the predictor variable(s) (independent variables).

• Whereas ANOVA is experimental research (the independent variable is manipulated), multiple regression is a correlational procedure: it looks at relationships between predictor variables and a criterion variable.

• Thus, both predictor and criterion variables are typically continuous in multiple regression.

• ANOVA and multiple regression both have a continuous variable as the dependent variable (called the criterion variable in regression) and utilize the F-test.

• In multiple regression, the F-test identifies a statistically significant relationship, as opposed to statistically significant differences between groups in ANOVA.

Simple Regression: a review

$Y_i = \alpha + \beta_1 x_i + \varepsilon_i$

$Y_i$ = raw score on Y for each individual

$\alpha$ = intercept

$\beta_1$ = unstandardized regression coefficient, applied to each raw score $x_i$

$\varepsilon_i$ = error for each individual in predicting Y

• $\alpha$ and $\beta_1$ are parameters to be estimated. The goal is to minimize error. Error is assumed to be independent and normally distributed, with a mean of 0.

Simple regression formula:

$\hat Y = a + bX$, where $b = \frac{\sum xy}{\sum x^2}$

$\hat Y$ = predicted score of the dependent variable Y

b = regression coefficient

a = intercept

• If we know information about X, we can predict Y

• We regress Y on X

• The regression equation is based on the principle of least squares: the values of a and b are the ones that minimize the errors in prediction.

• The difference $Y - \hat Y$ is identified as the error in prediction (residual).

• Least squares works by summing the squared errors of prediction, $\sum(Y - \hat Y)^2$, and choosing a and b to make this sum as small as possible.

Understanding More Nomenclature

$\sum y^2$ = total variance (total sum of squares) = $ss_{reg} + ss_{res}$

Remember, in ANOVA, $ss_{tot} = ss_b + ss_w$

So, in regression, with j = number of predictors,

$F = \frac{ss_{reg}/df_{reg}}{ss_{res}/df_{res}} = \frac{ss_{reg}/j}{ss_{res}/(N - j - 1)} = \frac{MS_{reg}}{MS_{res}}$

Because we are looking at variance, and regression is directly tied to correlation,

$F = \frac{R^2/j}{(1 - R^2)/(N - j - 1)}$

Multiple Correlation Coefficient

• The measure of effect size in regression is R² (small = .02, medium = .13, large = .26).

• It is the correlation between Y and Y′, squared, and equals the amount of variance accounted for in the model.

• When the predictor variables are not correlated with each other, R² = the sum of the squared correlations between each predictor variable and the criterion variable.

• However, in most research, we deal with correlated predictors.

• This produces some redundancy in what is being measured due to the intercorrelations of the predictor variables: the predictor variables are measuring some of the same things.

• As a result, the unique amount of variance accounted for by each predictor variable is reduced, giving inaccurate measures of the importance of the predictor variables. This is known as multicollinearity.

• One way to detect multicollinearity is to examine the intercorrelations of the predictor variables. Another way is to compute Variance Inflation Factors (VIF). The VIF for a predictor is $1/(1 - R_j^2)$, where $R_j^2$ comes from regressing that predictor on the other predictors.

• VIF: if a VIF is > $1/(1 - R^2)$ for the model, then multicollinearity may be affecting your results.

• However, there can be a special case in which the intercorrelations between predictor variables actually increase the amount of variance accounted for (R²), even though the variable has no unique contribution by itself. This is known as a suppressor variable: it controls for error.

• Suppressor variables have

o zero or near-zero correlations with the criterion

o moderate to strong correlations with at least one predictor variable

• Suppressor variables are uncommon in social science research.
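The least-squares quantities, the sum-of-squares partition, R², and both forms of the F test above can be sketched for a single predictor. Assumption: the two score columns from the t-test example are reused here as a hypothetical predictor (x) and criterion (y); they are not a regression example from the notes. The two-predictor VIF helper is likewise a hypothetical illustration of the $1/(1 - R_j^2)$ formula, where $R_j^2 = r_{12}^2$.

```python
from statistics import mean

# Hypothetical illustration: the t-test example's scores reused as x and y
x = [20, 18, 21, 18, 23, 20]
y = [18, 16, 20, 19, 21, 20]

def simple_regression(x, y):
    mx, my = mean(x), mean(y)
    sum_xy = sum((a - mx) * (b - my) for a, b in zip(x, y))   # sum of xy (deviation scores)
    sum_x2 = sum((a - mx) ** 2 for a in x)                     # sum of x^2
    sum_y2 = sum((b - my) ** 2 for b in y)                     # total sum of squares
    b = sum_xy / sum_x2
    a = my - b * mx
    ss_reg = b * sum_xy
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))   # sum of (Y - Y_hat)^2
    j, N = 1, len(y)
    r2 = ss_reg / sum_y2
    F = (ss_reg / j) / (ss_res / (N - j - 1))
    F_from_r2 = (r2 / j) / ((1 - r2) / (N - j - 1))
    return {"a": a, "b": b, "ss_reg": ss_reg, "ss_res": ss_res, "sum_y2": sum_y2,
            "R2": r2, "F": F, "F_from_R2": F_from_r2}

def vif_two_predictors(r12):
    """With exactly two predictors, each VIF is 1 / (1 - r12^2)."""
    return 1 / (1 - r12 ** 2)

for name, value in simple_regression(x, y).items():
    print(f"{name}: {value:.4f}")
```

The two F values agree, and ss_reg + ss_res reproduces Σy², which is the regression counterpart of ss_tot = ss_b + ss_w in ANOVA.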

Uniqueness Index

Sometimes, you may wish to run an analysis, excluding specific variables. You can calculate

whether or not the change in R2 is statistically significant. This is known as a uniqueness index.

22

yss

R reg

Σ=

Page 30: Intermediate Advanced Stats Notes 2020balkinresearchmethods.com › Balkin_Research_Methods › ...Score Score 2 xy 20 0 0 18 -1 1 0 18 -2 4 16 -3 9 6 21 1 1 20 1 1 1 18 -2 4 19 0

Richard S. Balkin, Ph.D., 2020

30

$$F = \frac{(R^2_{full} - R^2_{reduced})/(j_{full} - j_{reduced})}{(1 - R^2_{full})/(N - j_{full} - 1)}$$

where j is the number of predictor variables in each model and N is the sample size.

Regression Coefficients

• A regression coefficient for a given X variable represents the average change in Y that is associated with one unit of change in X.

• The goal is to identify which of the predictor variables (X) are important to predicting the criterion (Y).

• Regression coefficients may be nonstandardized or standardized.

• Nonstandardized regression coefficients (b) are produced when data are analyzed in raw score form.

• It is not appropriate to use nonstandardized regression coefficients as the sole evidence of the importance of the predictor variable. It is possible to have a model that is statistically significant while an individual predictor variable is not important. We can test the nonstandardized regression coefficient with a t test:

$$t = \frac{b}{s_b}, \qquad s_b = \sqrt{\frac{MS_{res}}{\sum x^2}}$$

• Important: The statistical significance of the nonstandardized regression coefficient is only one piece of evidence that identifies the importance of the predictor variable and is not to be used as the only evidence. This is because the nonstandardized regression coefficient is affected by the standard deviation of its predictor. Since different predictor variables have different standard deviations, the importance of the variables is difficult to compare.
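For a single predictor, the t test for b can be checked by hand. The sketch below assumes one predictor and uses hypothetical data. With several correlated predictors, the standard error of each b_j is also divided by (1 − R_j²) inside the square root, so this simple form of s_b applies directly only to the one-predictor case.

```python
from math import sqrt
from statistics import mean


def slope_t_test(x, y):
    """Simple regression: return b, s_b, and t = b / s_b.

    s_b = sqrt(MS_res / sum(x^2)), where x is the deviation score X - mean(X)
    and MS_res = SS_res / (N - 2).
    """
    mx, my = mean(x), mean(y)
    dx = [xi - mx for xi in x]
    dy = [yi - my for yi in y]
    sum_x2 = sum(d * d for d in dx)
    b = sum(p * q for p, q in zip(dx, dy)) / sum_x2
    intercept = my - b * mx
    ss_res = sum((yi - (intercept + b * xi)) ** 2 for xi, yi in zip(x, y))
    ms_res = ss_res / (len(x) - 2)
    s_b = sqrt(ms_res / sum_x2)
    return b, s_b, b / s_b


# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b, s_b, t = slope_t_test(x, y)
print(f"b = {b:.2f}, s_b = {s_b:.2f}, t({len(x) - 2}) = {t:.2f}")  # b = 0.60, s_b = 0.28, t(3) = 2.12
```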


• When we use standardized regression coefficients (β), all of the predictor variables have a standard deviation of 1 and can be compared.

• To calculate the standardized regression coefficient:

$$\beta = b\,\frac{s_x}{s_y}$$

Statistical and practical significance in multiple regression

• Statistical and practical significance should be determined for both the model and each predictor variable.

o Determine statistical significance of the model by evaluating the F test.

o Determine practical significance of the model by evaluating R². Cohen (1992) recommended using f² to determine effect size, where

$$f^2 = \frac{R^2}{1 - R^2}$$

with the following effect size interpretations: small = .02, medium = .15, and large = .35. These values can easily be converted to R² with the following interpretations: small = .02, medium = .13, and large = .26.

o Statistical significance of each predictor variable is determined by a t-test of the beta weights.

o Practical significance of each predictor variable should be determined using two types of measures:

§ The squared semipartial correlation coefficient (sr²), which is the part correlation squared in SPSS output. sr² represents the unique amount of variance that the predictor variable brings to the model. The advantage of this value is that the researcher gains information as to the amount of information the predictor variable contributes that is not shared by any


other variable in the model. However, this value is highly influenced by

intercorrelations with other predictor variables (i.e. multicollinearity).

§ In order to deal with this limitation, Thompson (1990; 2001) and Courville and Thompson (2001) recommend examining structure coefficients. Structure coefficients (r_s) identify the relationship of a predictor variable to what is predicted (Ŷ). In other words, a structure coefficient is the ratio of the correlation between the predictor variable and the criterion variable (r_xy) to the multiple correlation of the model (R):

$$r_s = \frac{r_{xy}}{R}$$

When this value is squared, the researcher can interpret the proportion of variance in the predicted scores (Ŷ) that the predictor variable accounts for. While this value is not distorted by multicollinearity, the value may not be pertinent if the overall model is not significant. Thus, both sr² and r_s² should be interpreted.

Model Assumptions

1. Predictor and criterion variables should be continuous and measured at the interval or ratio level. You can use nominal-level predictors, but they must be dummy-coded.

2. The sample should be random.

3. The criterion variable should be normally distributed.

4. Observations should be independent; no observation should be affected by another observation.

5. The relationship between the criterion variable and each predictor variable should be linear.

6. Errors in prediction should be normally distributed.

7. Errors should have a constant variance.


8. Errors should not be correlated with other errors in other observations or with the

predictor variable.

9. Predictor variables should be measured without error.

10. Absence of specification errors: specification errors occur when predictor variables are not theoretically tenable or when the relationship is not linear.

Regression is robust against minor violations of most of these assumptions with the

exception of independent observations, measurement error, or specification errors.
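The conversions described under statistical and practical significance can be checked numerically. The sketch below converts b to β using β = b(s_x/s_y) and converts Cohen's f² benchmarks to R² using R² = f²/(1 + f²), which is f² = R²/(1 − R²) solved for R². The check values for β are taken from Example 7 in these notes (math aptitude: B = .12, SD = 77.37; statistics exams: SD = 19.79). The function names are illustrative.

```python
def standardized_beta(b, s_x, s_y):
    """Standardized coefficient: beta = b * (s_x / s_y)."""
    return b * s_x / s_y


def f2_from_r2(r2):
    """Cohen's f^2 = R^2 / (1 - R^2)."""
    return r2 / (1.0 - r2)


def r2_from_f2(f2):
    """Inverse conversion: R^2 = f^2 / (1 + f^2)."""
    return f2 / (1.0 + f2)


# Cohen's (1992) f^2 benchmarks expressed as R^2
benchmarks = {
    label: round(r2_from_f2(f2), 2)
    for label, f2 in [("small", 0.02), ("medium", 0.15), ("large", 0.35)]
}
print(benchmarks)  # {'small': 0.02, 'medium': 0.13, 'large': 0.26}

# b for math aptitude converted to beta, using the Example 7 SDs
print(round(standardized_beta(0.12, 77.37, 19.79), 2))  # 0.47
```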

Power (Note: this section uses information from Howell (2002) from the University of

Vermont: http://www.uvm.edu/~dhowell/gradstat/)

Remember, power is dependent upon sample size and effect size. In multiple regression the

number of predictor variables is also important. The more predictor variables you have, the

larger your sample size needs to be.

We need a measure of effect size, as in any calculation of power. We have been using R² as our measure of effect size. When we calculate power, we will use a version of R² called f².

First consider the situation where we want the power for a significant R².

Suppose we want the power that our overall multiple R with 4 predictors and 40 subjects will be significant if R² = .35. Define f² = R²/(1 − R²).

Let p = number of predictor variables.

Let v = N − p − 1 = number of df for error.

Define λ = f²(p + v + 1).

Look λ up in the tables Cohen gives.


For our example,

R = .59; R² = .35

f² = .35/(1 − .35) = .35/.65 = .54

p = 4

v = 40 − 4 − 1 = 35

λ = f²(p + v + 1) = .54(4 + 35 + 1) = .54 × 40 = 21.6

Round λ down to 20 to be conservative.

Round v down to 20 to be conservative.

A copy of one of Cohen's tables is included below. (Note: I have copied this from Cohen for classroom use. It is a copyrighted table.)

For our example (u = 4, v = 20, λ = 20), power = .91. We have a very high probability of finding a significant correlation if the parameters are as I have specified them.

Power of the F Test, u = 1 to 9, α = .05 (u = number of predictors; v = df for error). Columns give λ; entries are power × 100.

u  v        2   4   6   8  10  12  14  16  18  20
1  20      27  48  64  77  85  91  95  97  98  99
   60      29  50  67  79  88  92  96  98  99  99
   120     29  51  68  80  88  93  96  98  99  99
   ∞       29  52  69  81  89  93  96  98  99  99
2  20      20  36  52  65  75  83  88  92  95  97
   60      22  40  56  69  79  87  91  95  97  98
   120     22  41  57  71  80  87  92  95  97  98
   ∞       23  42  58  72  82  88  93  96  97  99
3  20      17  30  44  56  67  75  82  87  91  94
   60      19  34  49  62  73  81  87  92  95  97
   120     19  35  50  64  75  83  89  93  95  97
   ∞       19  36  52  65  76  84  90  93  96  98
4  20      15  26  38  49  60  69  76  83  87  91
   60      17  30  44  57  68  77  83  89  92  95
   120     17  31  46  58  70  78  85  90  93  96
   ∞       17  32  47  60  72  80  87  91  94  96
5  20      13  23  34  44  54  63  71  78  83  87
   60      15  27  40  52  63  72  80  86  90  93
   120     16  29  41  54  65  75  82  87  91  94
   ∞       16  29  43  56  68  77  84  89  93  95
6  20      12  21  30  40  50  59  66  73  79  84
   60      14  25  37  48  59  68  76  83  87  91
   120     14  27  39  50  62  71  79  85  89  93
   ∞       15  27  40  53  64  74  81  87  91  94
7  20      11  19  28  37  46  54  62  69  75  80
   60      17  24  35  45  56  65  73  80  85  89
   120     13  25  37  47  59  68  76  82  87  91
   ∞       14  25  38  50  61  71  79  85  89  93
8  20      10  18  26  34  42  50  58  65  71  76
   60      12  23  33  43  52  62  70  77  83  87
   120     12  24  35  45  55  65  73  80  85  89
   ∞       13  24  36  48  59  68  77  83  88  92
9  20      10  17  24  32  39  47  54  61  68  73
   60      11  21  31  41  50  58  67  74  80  85
   120     11  22  33  44  53  62  71  78  83  88
   ∞       13  23  34  45  56  66  74  81  86  90

(Modified slightly from Cohen, 1988, for class purposes only.)
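The table lookup can also be replaced by computing power directly from the noncentral F distribution, with λ as the noncentrality parameter. The sketch below uses only the Python standard library, so it includes its own incomplete beta function (a continued-fraction method). In practice, a statistics library's noncentral F routine would do the same job. The function names are hypothetical. Because the sketch uses the exact v = 35 and λ ≈ 21.5 rather than the conservative rounded values, it should give a power somewhat above the .91 read from the table.

```python
from math import exp, lgamma, log

_TINY = 1e-300  # guards against division by zero in the continued fraction


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > _TINY else _TINY)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))  # even step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > _TINY else _TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > _TINY else _TINY
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))  # odd step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > _TINY else _TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > _TINY else _TINY
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x)
    if x < (a + 1.0) / (a + b + 2.0):
        return exp(log_front) * _betacf(a, b, x) / a
    return 1.0 - exp(log_front) * _betacf(b, a, 1.0 - x) / b


def f_cdf(f, u, v):
    """Central F distribution CDF with u and v degrees of freedom."""
    return reg_inc_beta(u / 2.0, v / 2.0, u * f / (u * f + v))


def f_critical(u, v, alpha=0.05):
    """Upper-alpha critical value of F, found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, u, v) < 1.0 - alpha:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if f_cdf(mid, u, v) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def noncentral_f_cdf(f, u, v, lam, terms=300):
    """Noncentral F CDF as a Poisson(lam / 2) mixture of incomplete beta terms."""
    x = u * f / (u * f + v)
    if lam == 0:
        return reg_inc_beta(u / 2.0, v / 2.0, x)
    half = lam / 2.0
    total = 0.0
    for j in range(terms):
        weight = exp(-half + j * log(half) - lgamma(j + 1))
        total += weight * reg_inc_beta(u / 2.0 + j, v / 2.0, x)
    return total


def regression_power(r2, p, n, alpha=0.05):
    """Return (lambda, power) for the overall F test of R^2 with p predictors and N cases."""
    f2 = r2 / (1.0 - r2)
    v = n - p - 1
    lam = f2 * (p + v + 1)
    return lam, 1.0 - noncentral_f_cdf(f_critical(p, v, alpha), p, v, lam)


lam, power = regression_power(0.35, p=4, n=40)
print(f"lambda = {lam:.1f}, power = {power:.2f}")
```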


Example 1: Results for a One-way ANOVA with Tukey Post hoc

A one-way ANOVA was conducted to explore group differences based on score.

An alpha level of .05 was utilized. Descriptive statistics are in Table 1. All groups were

normally distributed. Variances were homogeneous, F (3, 16) = 1.367, p = .289.

Statistically significant differences were evident among the groups, F(3, 16) = 11.49, p < .001. A large effect size was noted, η² = .68, indicative of a strong degree of practical significance. Given the sample size of n = 20, statistical significance would be detected only for large effect sizes, η² > .41.

In order to investigate significant differences between groups, a Tukey post hoc

analysis was conducted. Statistically significant differences were noted between groups 1

and 4, groups 2 and 4, and groups 3 and 4 (see Table 2). Practical significance was

assessed using Cohen’s d. A moderate effect size was noted between groups 1 and 3.

Large effect sizes were noted between groups 1 and 2, 1 and 4, 2 and 3, 2 and 4, and 3

and 4 (see Table 2). Large effect sizes were indicative of very strong practical

significance.

Table 1

Descriptive Statistics

Group n Mean SD

1 5 6.00 2.55

2 5 9.00 1.225

3 5 7.00 1.581

4 5 2.40 1.673


Table 2.

Tukey post hoc analysis

Group Comparison   Mean Difference   p        d
1 vs. 2            -3.00             0.0815   1.65
1 vs. 3            -1.00             0.8215   0.55
1 vs. 4             3.60*            0.0301   1.97
2 vs. 3             2.00             0.3393   1.09
2 vs. 4             6.60*            0.0002   3.62
3 vs. 4             4.60*            0.0052   2.52

*p < .05
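The omnibus results in Example 1 can be reproduced from Table 1 alone. The sketch below computes F and η² from the group sizes, means, and SDs. It assumes d was computed as the mean difference divided by the pooled within-group SD, √MS_within. That assumption is inferred because it reproduces the d values in Table 2 (e.g., 1.65 for groups 1 and 2); the example does not state it.

```python
from math import sqrt


def anova_from_summary(ns, means, sds):
    """One-way ANOVA from group sizes, means, and SDs: returns (F, eta^2, MS_within)."""
    n_total = sum(ns)
    k = len(ns)
    grand = sum(n * m for n, m in zip(ns, means)) / n_total
    ss_between = sum(n * (m - grand) ** 2 for n, m in zip(ns, means))
    ss_within = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))
    ms_within = ss_within / (n_total - k)
    f = (ss_between / (k - 1)) / ms_within
    eta2 = ss_between / (ss_between + ss_within)
    return f, eta2, ms_within


def cohens_d(mean_a, mean_b, ms_within):
    """d using sqrt(MS_within) as the pooled standard deviation (assumed)."""
    return (mean_a - mean_b) / sqrt(ms_within)


# Table 1 from Example 1
ns = [5, 5, 5, 5]
means = [6.00, 9.00, 7.00, 2.40]
sds = [2.55, 1.225, 1.581, 1.673]

f, eta2, ms_within = anova_from_summary(ns, means, sds)
print(f"F(3, 16) = {f:.2f}, eta^2 = {eta2:.2f}")
print(f"d, groups 1 vs. 2 = {abs(cohens_d(6.00, 9.00, ms_within)):.2f}")
```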


Example 2: Results for a One-way ANOVA with a priori contrasts

A one-way ANOVA was conducted to explore group differences based on score.

An alpha level of .05 was utilized. Descriptive statistics are in Table 1. All groups were

normally distributed. Variances were homogeneous, F (3, 16) = 1.367, p = .289.

Statistically significant differences were evident among the groups, F(3, 16) = 11.489, p < .001. A large effect size was noted, η² = .68, indicative of a strong degree of practical significance. Given the sample size of n = 20, statistical significance would be detected only for large effect sizes, η² > .41.

A priori contrast comparisons were conducted comparing groups 1 and 3 to group

2 and comparing group 2 to group 3. A Bonferroni adjustment yielded an alpha level of

.025 for statistical significance. There was a statistically significant difference between

groups 1 and 3 versus group 2, t(16) = -2.50, p = .024. A large effect size was noted, d =

2.74. There was not a statistically significant difference between group 2 and group 3, t(16) = 1.734, p = .102. However, a large effect size was noted, d = 1.10.

Table 1

Descriptive Statistics

Group n Mean SD

1 5 6.00 2.55

2 5 9.00 1.225

3 5 7.00 1.581

4 5 2.40 1.673
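The planned contrasts in Example 2 can be checked the same way. This sketch assumes contrast weights of (1, −2, 1, 0) for groups 1 and 3 versus group 2 and (0, 1, −1, 0) for group 2 versus group 3, MS_within pooled from Table 1, and d = ψ/√MS_within. Those choices are inferred because they reproduce the reported t and d values. Note that t does not depend on how the weights are scaled, but d does: with (1, −2, 1), ψ is twice the difference between the average of groups 1 and 3 and group 2.

```python
from math import sqrt

# Table 1 from Example 2 (n = 5 per group)
means = [6.00, 9.00, 7.00, 2.40]
sds = [2.55, 1.225, 1.581, 1.673]
n = 5

# With equal n, MS_within is the average of the group variances (df = 16)
ms_within = sum(s ** 2 for s in sds) / len(sds)


def contrast(weights, means, n, ms_within):
    """Return (psi, t, d) for a planned contrast with equal group sizes."""
    psi = sum(c * m for c, m in zip(weights, means))
    se = sqrt(ms_within * sum(c * c for c in weights) / n)
    return psi, psi / se, psi / sqrt(ms_within)


alpha_bonferroni = 0.05 / 2  # two planned contrasts -> .025
psi1, t1, d1 = contrast([1, -2, 1, 0], means, n, ms_within)
psi2, t2, d2 = contrast([0, 1, -1, 0], means, n, ms_within)
print(f"groups 1 and 3 vs. group 2: t(16) = {t1:.2f}, d = {abs(d1):.2f}")
print(f"group 2 vs. group 3: t(16) = {t2:.3f}, d = {abs(d2):.2f}")
```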


Example 3: Factorial ANOVA Results with Nonsignificant Interaction

A 2 X 3 ANOVA was conducted on grade point average improvement with

respect to differences in note-taking methods and gender. An alpha level of .05 was

utilized for this study. Males and females were normally distributed. Note-taking method

was also normally distributed for method 1, method 2, and the control group. Variances

were homogeneous, FLevene (5, 54) = .575, p = .719.

There was not a statistically significant interaction between gender and note-

taking method, F(2, 54) = 2.921, p = .062. Statistically significant differences were found

in grade point average improvement between males and females, F(1, 54) = 15.86, p <

.001. Males showed significantly greater GPA improvement than females (see Table 1). A large effect size was noted, d = 1.08, indicating a strong degree of practical significance. Statistically significant differences were found in grade point average improvement among note-taking methods, F(2, 54) = 17.809, p < .001. A large effect size was noted, η² = .40, indicating a strong degree of practical significance. A Tukey post hoc analysis was conducted on note-taking method. Method 2 was statistically significantly higher than both the control group (p < .001) and Method 1 (p = .001) (see Table 1). Large effect sizes were noted between method 2 and method 1, d = .94, and between method 2 and the control group, d = 1.65. A moderate effect size was noted between method 1 and the control group, d = .63. Given the sample size of n = 60, statistical significance would be detected for large effect sizes, η² > .14.


Table 1

Change in GPA Across Gender and Note-Taking Method

Gender Mean SD n

Men 0.38 0.27 30

Women 0.19 0.19 30

Method 1 0.25 0.22 20

Method 2 0.47 0.25 20

Control 0.14 0.15 20
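Several examples report η² (or partial η²) next to an F ratio. Assuming these are partial η² values, each can be recovered from the F ratio and its degrees of freedom, since partial η² = SS_effect/(SS_effect + SS_error) = F·df_effect/(F·df_effect + df_error). The check values below come from the F ratios reported in Examples 3, 5, 6, and 10. The function name is hypothetical.

```python
def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return f * df_effect / (f * df_effect + df_error)


# F ratios reported in these notes
checks = {
    "Example 3, note-taking method": (17.809, 2, 54),
    "Example 5, administrations": (7.664, 3, 87),
    "Example 6, pretest vs. posttest": (132.59, 1, 48),
    "Example 6, exercise group": (7.35, 1, 48),
    "Example 10, vitamin C dosage": (6.45, 2, 26),
}
for label, (f, df1, df2) in checks.items():
    print(f"{label}: {partial_eta_squared(f, df1, df2):.2f}")
```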


Example 4a: Factorial ANOVA Results with Significant Interaction

A 2 X 3 ANOVA was conducted on grade point average improvement with

respect to differences in note-taking methods and gender. An alpha level of .05 was

utilized for this study. Males and females were normally distributed. Note-taking method

was also normally distributed for method 1, method 2, and the control group. Variances

were homogeneous, FLevene (5, 54) = .575, p = .719.

There was a statistically significant interaction between gender and note-taking

method, F(2, 54) = 10.543, p < .0001 (see Figure 1). In order to evaluate the interaction,

simple effects were analyzed. There was no statistically significant difference in grade

point average improvement for males across note-taking methods, F(2, 54) = 2.50, p = .092. However, a moderate effect size was noted, η² = .09. Note-taking methods for males appear to be better than when no method is taught at all, but the type of method did not make much difference (see Table 1). There was a statistically significant difference in grade point average improvement for females across note-taking methods, F(2, 54) = 25.86, p < .001. A large effect size was noted, η² = .49, indicating a strong degree of practical significance. Note-taking method 2 for females appears to be better than note-taking method 1 or when no method is taught at all (see Table 1). In order to investigate the differences in note-taking method among females, a Tukey post hoc analysis was conducted. Method 2 was statistically significantly higher than both the control group (p < .001) and Method 1 (p < .001) (see Table 1). Large effect sizes were noted between method 2 and method 1, d = 2.59, and between method 2 and the control group, d = 2.94. A moderate effect size was noted between method 1 and the control group, d = .36. Given the sample size of n = 60, statistical significance would be detected only for large effect sizes, η² > .14.


Figure 1.

Interaction Effect for Gender by Note-taking

Table 1

Change in GPA Across Gender and Note-Taking Method

Gender   Note-Taking Method   Mean   SD     n
Men      Method 1             0.34   0.23   10
         Method 2             0.31   0.19   10
         Control              0.17   0.15   10
Women    Method 1             0.17   0.18   10
         Method 2             0.64   0.18   10
         Control              0.11   0.15   10


Example 4b: Factorial ANOVA Results with Significant Interaction

A 2 X 3 ANOVA was conducted on grade point average improvement with

respect to differences in note-taking methods and gender. An alpha level of .05 was

utilized for this study. Males and females were normally distributed. Note-taking method

was also normally distributed for method 1, method 2, and the control group. Variances

were homogeneous, FLevene (5, 54) = .575, p = .719.

There was a statistically significant interaction between gender and note-taking

method, F(2, 54) = 10.543, p < .0001 (see Figure 1). In order to evaluate the interaction,

simple effects were analyzed. There was no statistically significant difference in GPA

improvement for method 1 across males and females F(1, 54) = 4.13, p = .047. However,

a moderate effect size was noted, d = .55. There was a statistically significant difference

in GPA improvement for method 2 across males and females, F(1, 54) = 17.02, p < .001.

A large effect size was noted, d = 1.12, indicating a strong degree of practical

significance. There was no statistically significant difference in GPA improvement for

the control group across males and females F(1, 54) = .55, p = .463. A small effect size

was noted, d = .20. Note-taking method 2 for females appears to be better than note-taking method 1 or when no method is taught at all (see Table 1). Given the sample size of n = 60, statistical significance would be detected only for large effect sizes, η² > .14.

Figure 1.


Interaction Effect for Gender by Note-taking

Table 1

Change in GPA Across Gender and Note-Taking Method

Gender   Note-Taking Method   Mean   SD     n
Men      Method 1             0.34   0.23   10
         Method 2             0.31   0.19   10
         Control              0.17   0.15   10
Women    Method 1             0.17   0.18   10
         Method 2             0.64   0.18   10
         Control              0.11   0.15   10
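The d values for the gender comparisons in Examples 3 and 4b appear to follow from the F ratios. For a two-group effect (df_effect = 1), Cohen's f² = F/df_error, and with equal group sizes d = 2f. The sketch below assumes that conversion. The assumption is inferred because it reproduces the reported values (d = 1.08, .55, 1.12, and .20); the examples do not state it.

```python
from math import sqrt


def d_from_f(f, df_error):
    """Cohen's d for a two-group (df_effect = 1) comparison, assuming equal group sizes.

    f^2 = F / df_error, and d = 2 * f.
    """
    return 2.0 * sqrt(f / df_error)


# F(1, 54) ratios reported in Examples 3 and 4b
reported = {
    "Example 3, gender": 15.86,
    "Example 4b, method 1 by gender": 4.13,
    "Example 4b, method 2 by gender": 17.02,
    "Example 4b, control by gender": 0.55,
}
for label, f in reported.items():
    print(f"{label}: d = {d_from_f(f, 54):.2f}")
```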


Example 5: One-way ANOVA Repeated Measures Results

A one-way repeated measures ANOVA was conducted on four administrations of the

DEW at just married, 5 years of marriage, 10 years of marriage, and 15 years of marriage.

Descriptive statistics for each of the administrations are in Table 1. Assumptions of

normality and sphericity were met due to the balanced nature of the design. A statistically

significant difference among the administrations was evident, F(3, 87) = 7.664, p < .001.

A large effect size was evident, η² = .21. Post hoc analyses were conducted to analyze significant differences between each of the administrations. Statistically significant differences were found between administrations 1 and 3, 1 and 4, 2 and 3, and 2 and 4, with moderate effect sizes (see Table 2). Given the sample size of n = 30, statistical significance would be detected for small effect sizes, η² > .042.

Table 1

Descriptive Statistics for Test Administrations

Mean SD N

time1 65.8 9.23 30

time2 65.43 10.69 30

time3 63.10 10.68 30

time4 61.93 12.57 30


Table 2

Post hoc tests

Pair Mean Differences t d

time1 - time2 0.37 0.41 0.07

time1 - time3 2.7 2.90* 0.53

time1 - time4 3.87 3.00* 0.55

time2 - time3 2.33 3.76* 0.69

time2 - time4 3.5 3.40* 0.62

time3 - time4 1.17 1.47 0.27

* p < .008
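In Example 5, the threshold p < .008 matches a Bonferroni adjustment of α = .05 across the six pairwise comparisons among four administrations. The d values in Table 2 match d = t/√n for n = 30. Both points are inferred from the numbers rather than stated in the example, and the sketch below only checks that arithmetic.

```python
from math import comb, sqrt

n = 30
n_pairs = comb(4, 2)            # six pairwise comparisons among four administrations
alpha_adjusted = 0.05 / n_pairs  # Bonferroni-adjusted alpha, about .0083


def paired_d(t, n):
    """Effect size for a paired comparison: d = t / sqrt(n)."""
    return t / sqrt(n)


t_values = {"time1 - time3": 2.90, "time1 - time4": 3.00, "time2 - time3": 3.76,
            "time2 - time4": 3.40}
print(f"Bonferroni alpha = {alpha_adjusted:.4f}")
for pair, t in t_values.items():
    print(f"{pair}: d = {paired_d(t, n):.2f}")
```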


Example 6: SPANOVA Results

Participants in a weight-loss dieting program were recruited to examine whether there was a statistically significant difference in weight loss between two exercise frequencies. Fifty participants were recruited and randomly assigned to two groups. In Group 1, participants exercised 30 minutes every day. In Group 2, participants exercised 30 minutes four days per week. Prior to beginning the exercise, a pretest was conducted to see how many kilograms participants lost by dieting in a month. Then participants engaged in aerobic exercise for a month. Posttests were conducted at the end of the month to see how many kilograms participants lost based on their exercise routine. An a priori analysis was conducted to determine the appropriate sample size. Given a moderate effect size, a sample size of 34 was necessary to achieve adequate power of .80 (Cohen, 1988).

A split-plot analysis of variance (SPANOVA) was conducted between exercise groups across pretest and posttest measures of weight loss. An alpha level of .05 was utilized. Assumptions for normality, homogeneity of covariances, and homogeneity of variances were met. There was not a statistically significant interaction between the tests and exercise group, F(1, 48) = 3.43, p = .07 (see Figure 1). Thus, results are generalizable across testing periods and exercise groups. A statistically significant difference between pretest and posttest weight loss was evident, F(1, 48) = 132.59, p < .001, η²p = .73, indicative of a very large effect size. Weight loss increased in both groups after beginning an exercise routine (see Table 1). A statistically significant difference between exercise groups was evident, F(1, 48) = 7.35, p = .009, η²p = .13, indicative of a


moderate to large effect size. Group 2 (exercise four times per week) had slightly more

weight loss than those who exercised every day (group 1) as indicated in Table 1.

Table 1.

Descriptive Statistics of Exercise Groups Across Testing Periods.

Test       Exercise Group   Mean   SD    N
Pretest    1                2.34   .34   25
           2                2.39   .26   25
           Total            2.36   .30   50
Posttest   1                2.76   .11   25
           2                2.97   .13   25
           Total            2.86   .16   50

Figure 1.

Plot of Exercise Groups Across Testing Periods.


Example 7: Multiple Regression Results

A multiple regression analysis was conducted on statistics exam scores based on

math aptitude and English aptitude. Descriptive statistics are reported in Table 1.

Statistics exam scores were normally distributed. Standardized residuals were also

normally distributed. Scatterplots were analyzed, and no curvilinear relationships between the criterion variable and the predictor variables, and no heteroscedasticity, were evident. There was a statistically significant relationship between math aptitude, English aptitude, and statistics exam score, F(2, 97) = 16.63, p < .001. A large effect size was noted, with approximately 26% of the variance accounted for in the model, R² = .255. Math aptitude was a statistically significant predictor of statistics exam score (see Table 2), uniquely accounting for approximately 21% of the variance. English aptitude was not a statistically significant predictor and uniquely accounted for 2% of the variance. Thus, as a single predictor, English aptitude has a moderate effect, but it is less meaningful when included in a model with math aptitude. Given the sample size of n = 100, statistical significance would be detected for small effect sizes, R² > .09.


Table 1

Descriptive Statistics

Variable           Mean     SD      N     Statistics exams   Math aptitude   English aptitude
Statistics exams   60.11    19.79   100   ----               .48*            .20*
Math aptitude      460.60   77.37   100                      ----            .12*
English aptitude   478.20   71.65   100                                      ----

* p < .05

Table 2

Multiple Regression Results for Statistics Exams

Predictor B SE B β t p sr2

Math aptitude .12 .02 .47 5.29* 0.00 .21

English aptitude .04 .02 .15 1.65 0.10 .02

* p < .05
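Most of the Example 7 results can be approximated from the correlations in Table 1. The sketch below assumes the standard two-predictor formulas for β, R², sr², r_s, and f². Because the correlations are rounded to two decimals, the output only approximates Table 2 (for example, R² ≈ .251 rather than .255). The function name is hypothetical.

```python
from math import sqrt


def two_predictor_summary(ry1, ry2, r12):
    """Standardized results for a two-predictor regression computed from correlations."""
    denom = 1.0 - r12 ** 2
    beta1 = (ry1 - ry2 * r12) / denom
    beta2 = (ry2 - ry1 * r12) / denom
    r2 = beta1 * ry1 + beta2 * ry2  # R^2 = sum of beta_j * r_yj
    big_r = sqrt(r2)
    return {
        "R2": r2,
        "f2": r2 / (1.0 - r2),
        "beta": (beta1, beta2),
        # sr^2 = R^2 minus R^2 of the model without that predictor,
        # which with two predictors is the other predictor's squared r
        "sr2": (r2 - ry2 ** 2, r2 - ry1 ** 2),
        # Structure coefficients: r_s = r_xy / R
        "rs": (ry1 / big_r, ry2 / big_r),
    }


# Math aptitude (1) and English aptitude (2) predicting statistics exams
out = two_predictor_summary(ry1=0.48, ry2=0.20, r12=0.12)
for key, value in out.items():
    print(key, value)
```

The structure coefficient for English aptitude (about .40) is sizable even though its sr² is small, which fits the note above that it matters more on its own than alongside math aptitude.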


Example 8: MANOVA Results

A one-way MANOVA was conducted to determine the effect of three learning strategies (writing, talking, and thinking) on two learning outcomes (recall and application). An alpha level of .05 was utilized. Descriptive statistics for the dependent variables across study groups are in Table 1. Assumptions for normality (W > .01) and homogeneity of covariances (Box's M = 6.98, p = .398) were met. A statistically significant effect was identified between learning strategies and the two dependent variables, Wilks' Λ = .421, F(4, 52) = 7.03, p < .001. Approximately 58% of the variance in the model was accounted for in the combined dependent variables across study groups, yielding a strong effect. An a priori power analysis yielded a total sample size of 45 to find statistical significance with a moderate effect size (f² = .15). In this study, statistical significance was noted when effect sizes were moderate to large (f² = .22).

A post hoc discriminant analysis was conducted to determine how the study group differences were manifested across the dependent variables. The first discriminant function was significant, Wilks' Λ = .42, χ²(4) = 22.90, p < .001. Approximately 97% of the variance in the model was accounted for in the first discriminant function for recall and application across study groups. Recall loaded strongly (r > .99) and had a strong relationship (standardized coefficient = .96) to the first discriminant function (see Table 2). Based upon these results, the first discriminant function was labeled memory. The second discriminant function was not significant, Wilks' Λ = .96, χ²(1) = 1.12, p = .29. Approximately 3% of the variance in the model was accounted for in the second discriminant function for recall and application across study groups. Centroid means for the discriminant functions


indicated that the writing study strategy (1.41) had the most effect in memory, compared

to the talking (-.22) and thinking (-1.12) strategies.

Table 1

Descriptive Statistics.

Dependent Variable   Study Group   Mean   SD   N

Recall Exam Think 3.30 0.68 10

Write 5.80 1.03 10

Talk 4.20 1.14 10

Application Exam Think 3.20 1.23 10

Write 5.00 1.76 10

Talk 4.40 1.17 10

Table 2

Correlation Coefficients and Standardized Function Coefficients.

Variable   Correlation with Discriminant Function   Standardized Function Coefficient

Recall .997 .96

Application .47 .09
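The χ² values for the discriminant functions in Example 8 are consistent with Bartlett's approximation for Wilks' Λ. The sketch below assumes that approximation, with N = 30 (three groups of 10), p = 2 dependent variables, and g = 3 groups. Small differences from the reported 22.90 and 1.12 likely reflect rounding of Λ. It also computes 1 − Λ, which matches the reported 58% of variance. The function name is hypothetical.

```python
from math import log


def bartlett_chi_square(wilks_lambda, n, p, g, m=0):
    """Bartlett's chi-square for Wilks' lambda after removing the first m functions.

    chi2 = -[N - 1 - (p + g) / 2] * ln(lambda), df = (p - m) * (g - m - 1)
    """
    chi2 = -(n - 1 - (p + g) / 2.0) * log(wilks_lambda)
    df = (p - m) * (g - m - 1)
    return chi2, df


chi2_all, df_all = bartlett_chi_square(0.421, n=30, p=2, g=3)          # functions 1 and 2
chi2_second, df_second = bartlett_chi_square(0.96, n=30, p=2, g=3, m=1)  # function 2 only
print(f"chi2({df_all}) = {chi2_all:.2f}; chi2({df_second}) = {chi2_second:.2f}")
print(f"1 - Wilks' lambda = {1 - 0.421:.2f}")
```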

Example 9: Canonical Correlation Results

A canonical correlation analysis was performed between the GASS subscales and the TSR subscales. An alpha level of .05 was utilized. Cutoff correlations of .30 were used for interpretation of the canonical variates (Tabachnick & Fidell, 2001). Assumptions for normality, linearity, and homoscedasticity were evaluated through distributions of scatterplots; assumptions were met and no multivariate outliers were detected. When conducting the analysis between the GASS and SPS, five cases were eliminated due to missing data, leaving a sample size of 120 for the analysis.

A statistically significant relationship was found between the GASS subscales and the TSR subscales. The first canonical root was significant, Λ = .89, F(4, 232) = 3.55, p = .008, accounting for 11% (r_c = .33) of the overlapping variance. The second canonical root was not significant, Λ = .85, F(1, 117) = .33, p = .569, accounting for less than 1% (r_c = .05) of the overlapping variance. Therefore, only the first canonical variate was interpreted.

The first canonical variate included scores on both subscales of the GASS, Coping (.99) and Commitment to Follow-up (.63), and on the subscales of the TSR: the Behavioral subscale (−.87) and the Emotional subscale (.56). Adolescent clients with high therapeutic goal attainment had fewer behavioral problems and had higher levels of emotional arousal.

Table 1 shows descriptive statistics for the instruments used for this sample. Shown in

Table 2 are the correlations and standardized canonical variate coefficients for the GASS

subscales and the TSR subscales as they relate to the first canonical variate.


Table 1.

Descriptive Statistics for the GASS and TSR Scales

Scale N M SD

Goal Attainment Scale

Coping 124 .39 .66

Follow-up 124 .39 .62

Target Symptom Rating Scale

Behavioral 121 2.55 .58

Emotional 121 2.65 .50

Table 2

Correlations and Standardized Canonical Variate Coefficients on the GASS and TSR for the First Canonical Variate

Scale COR COE

Goal Attainment Scale

Coping .99 .98

Followup .63 .03

Target Symptom Rating Scale

Behavior -.87 -.83

Emotional .56 .49

Note. Correlations ≥ .30 are in boldface. COR = correlations to the canonical variate. COE = standardized canonical variate coefficient.
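For canonical correlation, Wilks' Λ for testing all roots together is the product of (1 − r_c²) across the roots, and each r_c² is the overlapping variance for that root. The sketch below checks the first-root Λ and the 11% figure in Example 9 from the reported canonical correlations. The function name is hypothetical.

```python
def wilks_from_canonical(canonical_rs):
    """Wilks' lambda for testing all canonical roots: product of (1 - r_c^2)."""
    lam = 1.0
    for rc in canonical_rs:
        lam *= 1.0 - rc ** 2
    return lam


rcs = [0.33, 0.05]  # canonical correlations reported in Example 9
lam = wilks_from_canonical(rcs)
print(f"Wilks' lambda = {lam:.2f}")                        # Wilks' lambda = 0.89
print(f"Shared variance, first root = {rcs[0] ** 2:.0%}")  # Shared variance, first root = 11%
```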

Example 10: ANCOVA Results

An analysis of covariance (ANCOVA) was used to examine differences among three vitamin C dosage conditions (placebo, low, and high) in the number of days of cold symptoms during treatment, with the number of days of cold symptoms prior to treatment as a covariate. Descriptive statistics are in Table 1. Assumptions of normality, homogeneity of variance, and homogeneity of regression were met for this analysis. Dimitrov (2009) noted that covariates should be correlated with the dependent variable. The number of days of cold symptoms prior to treatment had a moderate relationship with the number of days of cold symptoms during treatment (r = .50, p = .005). Thus, the number of days of cold symptoms prior to treatment appeared to be a tenable covariate that was important to control in this study.
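The significance test for a Pearson correlation uses t = r√(n − 2) / √(1 − r²) with n − 2 degrees of freedom. The sketch below assumes n = 30, that is, 10 clients in each of the three groups as listed in Table 1. Rather than computing an exact p-value, it compares t with the two-tailed critical value for α = .05 and df = 28 (2.048, taken from a standard t table).

```python
import math


def correlation_t(r: float, n: int) -> tuple[float, int]:
    """t statistic and degrees of freedom for testing H0: rho = 0."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return t, df


N = 30          # assumption: 3 groups x 10 clients (Table 1)
T_CRIT = 2.048  # two-tailed critical t, alpha = .05, df = 28

t, df = correlation_t(0.50, N)
print(f"t({df}) = {t:.2f}; significant at .05: {abs(t) > T_CRIT}")
```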

Using an alpha level of .05, a statistically significant effect of vitamin C dosage on the number of days of cold symptoms during treatment was noted, F(2, 26) = 6.45, p = .005, ηp² = .33. The effect size was large: after controlling for the covariate, approximately 33% of the variance in days of cold symptoms was attributable to dosage. Pairwise comparisons were conducted on the adjusted means. Statistically significant differences were noted between the placebo and low-dosage groups and between the placebo and high-dosage groups, but not between the low- and high-dosage groups (see Table 2). Clients receiving either dosage of vitamin C had fewer days of cold symptoms than clients who received the placebo.
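Partial eta squared can be recovered from the F ratio and its degrees of freedom: ηp² = (F × df_effect) / (F × df_effect + df_error). Here df_error = 30 − 3 groups − 1 covariate = 26. When df_effect = 2, the upper-tail p-value of F also has a closed form, (1 + 2F / df_error)^(−df_error / 2), so no statistics library is needed. Below is a minimal sketch using the reported F(2, 26) = 6.45. The function names are illustrative.

```python
def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F ratio: SS_effect / (SS_effect + SS_error)."""
    return (f * df_effect) / (f * df_effect + df_error)


def f_pvalue_df1_is_2(f: float, df_error: int) -> float:
    """Upper-tail p for F(2, df_error). This closed form holds only when df1 = 2."""
    return (1 + 2 * f / df_error) ** (-df_error / 2)


F_RATIO, DF_EFFECT, DF_ERROR = 6.45, 2, 26
eta_p2 = partial_eta_squared(F_RATIO, DF_EFFECT, DF_ERROR)
p_value = f_pvalue_df1_is_2(F_RATIO, DF_ERROR)
print(f"partial eta^2 = {eta_p2:.2f}, p = {p_value:.3f}")
```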


Table 1

Descriptive Statistics, Adjusted Means, and Tests of Significance for Vitamin C Dosages Across Number of Days of Cold Symptoms During Treatment

Group            Mean     SD    Adjusted Mean    N
Placebo         11.60   5.36        12.01       10
Low Dosage       8.40   3.83         7.72       10
High Dosage      6.40   4.67         6.67       10

Table 2

Pairwise Comparisons

Group Comparison              Mean Difference     p       d
Placebo vs. Low Dosage             4.30*         .012    1.23
Placebo vs. High Dosage            5.34*         .002    1.52
Low Dosage vs. High Dosage         1.04          .518    0.30

*p < .05
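Each mean difference in Table 2 can be checked against the adjusted means in Table 1. Dividing a mean difference by its d then shows the standardizer behind the effect sizes. The source does not state which standard deviation was used for d. The ratio stays roughly constant at about 3.5, which suggests that one pooled value was applied to all three comparisons, but this is an inference and is not reported in the source. A minimal sketch:

```python
# Adjusted means from Table 1
adjusted = {"Placebo": 12.01, "Low Dosage": 7.72, "High Dosage": 6.67}

# (mean difference, d) from Table 2
reported = {
    ("Placebo", "Low Dosage"): (4.30, 1.23),
    ("Placebo", "High Dosage"): (5.34, 1.52),
    ("Low Dosage", "High Dosage"): (1.04, 0.30),
}

checks = []
for (group_a, group_b), (diff, d) in reported.items():
    from_means = adjusted[group_a] - adjusted[group_b]  # differs from table only by rounding
    implied_sd = diff / d                               # standardizer implied by d = diff / SD
    checks.append((from_means, diff, implied_sd))
    print(f"{group_a} vs. {group_b}: {from_means:.2f} (table {diff:.2f}), "
          f"implied SD = {implied_sd:.2f}")
```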

Example 11: Logistic Regression Results

A simultaneous logistic regression was conducted on a dichotomous dependent variable, relapse. Predictor variables included unpleasant mood states, euphoric mood, and lessened vigilance. The assumption of linearity between the weighted combination of predictor variables and the natural log of the odds of relapse was met, χ²(8) = 9.67, p = .287. A statistically significant model for predicting relapse was evident, χ²(3) = 15.06, p = .002. Regression coefficients, Wald statistics, odds ratios, and the 95% confidence intervals for the odds ratios are in Table 1. Neither unpleasant mood states nor euphoric mood was a statistically significant predictor of relapse. Lessened vigilance, however, was a significant predictor. For each one-point increase in lessened vigilance, the odds of relapse were multiplied by 0.84. Equivalently, the odds of maintaining sobriety were 1.19 times greater (1/0.84 ≈ 1.19).
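In logistic regression the odds ratio is e^B, and its reciprocal restates the same effect for the complementary outcome. Below is a minimal sketch for the Lessened Vigilance coefficient (B = -0.18, from Table 1). B is rounded to two decimals, so e^0.18 gives about 1.20. The reported 1.19 matches 1/0.84, the reciprocal of the rounded odds ratio, and the two values agree within rounding.

```python
import math

B_VIGILANCE = -0.18  # coefficient for Lessened Vigilance (Table 1)

odds_ratio = math.exp(B_VIGILANCE)         # change in odds of relapse per one-point increase
pct_change = (odds_ratio - 1) * 100        # same effect as a percentage change in the odds
sobriety_ratio = 1 / round(odds_ratio, 2)  # odds of NOT relapsing, from the rounded OR (0.84)

print(f"OR = {odds_ratio:.2f} ({pct_change:+.1f}% odds of relapse); "
      f"1/OR = {sobriety_ratio:.2f}")
```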

Table 1

Logistic Regression Analysis of Relapse as a Function of Unpleasant Mood, Euphoric Mood, and Lessened Vigilance

                                               95% Confidence Interval for Odds Ratio
Variables              B      Wald    Odds Ratio      Lower      Upper
Unpleasant Mood      -0.01    0.57       0.99          0.96       1.02
Euphoric Mood         0.01    0.25       1.01          0.96       1.07
Lessened Vigilance   -0.18    7.94*      0.84          0.74       0.95
Constant              1.10   10.54       3.02

Note. Wald (df = 1).
*p < .05
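The 1-df Wald statistic equals (B / SE)², so the standard error can be backed out as |B| / √Wald. The confidence interval for the odds ratio is then e^(B ± 1.96 × SE). Below is a minimal sketch for the Lessened Vigilance row. The function name is illustrative, and 1.96 is the usual normal approximation for a 95% interval.

```python
import math


def odds_ratio_ci(b: float, wald: float, z: float = 1.96) -> tuple[float, float]:
    """95% CI for exp(b), with SE recovered from a 1-df Wald statistic."""
    se = abs(b) / math.sqrt(wald)  # Wald = (B / SE)^2  =>  SE = |B| / sqrt(Wald)
    return math.exp(b - z * se), math.exp(b + z * se)


lower, upper = odds_ratio_ci(-0.18, 7.94)  # Lessened Vigilance row
print(f"95% CI for OR: [{lower:.2f}, {upper:.2f}]")
```

The reconstruction works for this row. For coefficients rounded to ±0.01 (Unpleasant Mood, Euphoric Mood), however, the recovered SE is very sensitive to rounding and may not reproduce the table's limits exactly.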