Intermediate Advanced Stats Notes 2020
Richard S. Balkin, Ph.D., 2020
Hypothesis Testing
I. 2 types of hypotheses:
A. research/scientific—includes the following:
1. declarative statement about what we think should happen in our
study
2. an examination of the variables that define the study
B. statistical hypotheses: you have a null and alternative
1. accept one
a. H0: μ1 = μ2
b. H1: μ1 ≠ μ2
2. supports existence or lack of
a. accountability
b. evidence
c. does not prove
II. Hypothesis testing is governed by the Central Limit Theorem
A. Distribution of sample means from random sample will be normally
distributed
B. The more samples you have, the closer the sample means cluster around the population mean
C. If your sample is large enough, outliers will not affect the normalizing of
the sample distribution
D. Sample size is not as important as representativeness
E. With a representative sample, the Central Limit Theorem allows you to
generalize about the population
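A quick way to see the theorem at work is to simulate it. The sketch below (Python; the skewed exponential "population" is an assumption for illustration, not from the notes) draws repeated random samples from a clearly non-normal population and shows the sample means clustering around the population mean:

import numpy as np

rng = np.random.default_rng(1)
population = rng.exponential(scale=2.0, size=100_000)  # deliberately non-normal

for n in (5, 30, 100):
    # Draw 10,000 random samples of size n and keep each sample's mean
    means = rng.choice(population, size=(10_000, n)).mean(axis=1)
    print(f"n={n:3d}  mean of sample means={means.mean():.3f}  SD={means.std():.3f}")

# The sample means center on the population mean (2.0), their spread shrinks
# as n grows, and their distribution becomes increasingly normal.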
F. Steps to hypothesis testing
1. state the null and alternative hypotheses
2. identify level of significance—this is subjective; may be
directional or non-directional for t-tests or z-tests
3. perform test
4. accept or reject null
5. state results
G. error

                  H0 is true            H0 is false
Retain null       Correct (1 − α)       Type II error (β)
Reject null       Type I error (α)      Correct: power (1 − β)
When α increases, power increases.

Increasing sample size is the greatest, most effective means of increasing power. For example, with μ = 19, σ = 5, and a sample mean of X̄ = 20, a sample of n = 25 gives:

σ_X̄ = σ/√n = 5/√25 = 1
z = (X̄ − μ)/σ_X̄ = (20 − 19)/1 = 1

if we increase the sample size to n = 100, then:

σ_X̄ = 5/√100 = .5
z = (20 − 19)/.5 = 2

A higher sample size decreases error and increases power.
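The two computations above can be checked in a few lines of Python (a sketch; the values μ = 19, σ = 5, X̄ = 20 are the ones reconstructed above):

import math

def z_for_mean(xbar, mu, sigma, n):
    se = sigma / math.sqrt(n)          # standard error of the mean
    return (xbar - mu) / se, se

for n in (25, 100):
    z, se = z_for_mean(20, 19, 5, n)
    print(f"n={n:3d}  SE={se:.2f}  z={z:.2f}")   # n=25: SE=1, z=1; n=100: SE=.5, z=2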
Independent t-test and dependent t-test example

Group 1                      Group 2
Score    x     x²            Score    x     x²      xy
20       0     0             18      −1     1       0
18      −2     4             16      −3     9       6
21       1     1             20       1     1       1
18      −2     4             19       0     0       0
23       3     9             21       2     4       6
20       0     0             20       1     1       0

Group 1: X̄1 = 20, Σx1² = 18, s1 = 1.8974, s_X̄1 = .7746
Group 2: X̄2 = 19, Σx2² = 16, s2 = 1.7889, s_X̄2 = .7303
Σxy = 13, s_xy = 13/5 = 2.6, r = s_xy/(s1·s2) = .7660
Independent t-test:

t = (X̄1 − X̄2) / √{ [((n1 − 1)s1² + (n2 − 1)s2²)/(n1 + n2 − 2)] (1/n1 + 1/n2) }
  = (20 − 19) / √{ [(18 + 16)/10] (1/6 + 1/6) }
  = 1/1.0646
  = .9393

Retain null if t < critical value: .9393 < 1.812 (df = 10), retain null.
Dependent t-test:

t = (X̄1 − X̄2) / √( s_X̄1² + s_X̄2² − 2r·s_X̄1·s_X̄2 )
  = (20 − 19) / √( .6000 + .5334 − 2(.7660)(.7746)(.7303) )
  = 1/√(1.1334 − .8666)
  = 1.94

1.94 < 2.015 (critical value, df = 5), retain null.
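Both hand computations can be verified with scipy (a sketch; note that scipy reports two-tailed p-values, while the notes compare t to a tabled critical value):

from scipy import stats

g1 = [20, 18, 21, 18, 23, 20]
g2 = [18, 16, 20, 19, 21, 20]

t_ind, p_ind = stats.ttest_ind(g1, g2)   # independent t-test, df = 10
t_dep, p_dep = stats.ttest_rel(g1, g2)   # dependent (paired) t-test, df = 5
print(f"independent t = {t_ind:.4f}")     # ≈ .9393
print(f"dependent t   = {t_dep:.4f}")     # ≈ 1.9365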
ANOVA
I. Model equation
A. Completely Randomized Design (CR-p): Y_ij = μ + α_j + ε_ij
1. j = specific tx level
2. i = observation for each participant
B. Terms:
1. Y_ij = observation for subject i in tx level j
2. μ = grand mean
3. α_j = tx effect for population j, equal to the population mean minus the grand mean (α_j = μ_j − μ)
4. ε_ij = error effect
C. We do ANOVA, rather than several t-tests, to keep type I error low (e.g., five t-tests at α = .05 give an approximate total type I error rate of .25; this is unacceptable).
II. Computing ANOVA
J = # of groups
n_j = # of subjects in a specific group (treatment level)
n. = number in total sample
X̄_j = mean of a specific group (treatment level)
X̄.. = grand mean
s² = variance; s = standard deviation

Sum of squares between is the sum of the squared deviations between each group mean and the grand mean:
SS_B = Σ n_j(X̄_j − X̄..)² = n1(X̄1 − X̄..)² + n2(X̄2 − X̄..)² + ...

Mean square between is the estimate of the population variance between groups:
MS_B = SS_B/(J − 1)

Sum of squares within is the sum of the sample variances, each weighted by its degrees of freedom:
SS_W = s1²(n1 − 1) + s2²(n2 − 1) + ...

Mean square within (also known as mean square error) is the mean of the J sample variances:
MS_W = SS_W/(n. − J)
Example:

         Group 1   Group 2   Group 3   Group 4
           4         9         8         1
           6        11         6         2
           8         8         9         3
           3         9         5         5
           9         8         7         1
X̄_j:       6         9         7         2.4       X̄.. = 6.1
s_j²:      6.5       1.5       2.5       2.8

degrees of freedom: df_B = J − 1 = 3; df_W = n. − J = (n1 − 1) + (n2 − 1) + ... = 16
grand mean: X̄.. = (n1X̄1 + n2X̄2 + ...)/n. = [5(6) + 5(9) + 5(7) + 5(2.4)]/20 = 6.1

SS_B = 5(6 − 6.1)² + 5(9 − 6.1)² + 5(7 − 6.1)² + 5(2.4 − 6.1)² = .05 + 42.05 + 4.05 + 68.45 = 114.6
MS_B = 114.6/3 = 38.2
SS_W = 6.5(4) + 1.5(4) + 2.5(4) + 2.8(4) = 53.2
MS_W = 53.2/16 = 3.325
F = MS_B/MS_W = 38.2/3.325 = 11.49
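As a check, the same F can be obtained with scipy's one-way ANOVA (a sketch using the four groups above):

from scipy import stats

g1 = [4, 6, 8, 3, 9]
g2 = [9, 11, 8, 9, 8]
g3 = [8, 6, 9, 5, 7]
g4 = [1, 2, 3, 5, 1]

F, p = stats.f_oneway(g1, g2, g3, g4)
print(f"F(3, 16) = {F:.2f}, p = {p:.4f}")   # F ≈ 11.49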
III. Model Assumptions
A. Independence—sampling, treatment implementation
B. Normality—There are significance tests for both skewness and kurtosis
1. Skewness: z = S/S_S, where S = skewness and S_S = standard error of skewness = √(6/N)
2. Kurtosis: z = K/S_K, where K = kurtosis and S_K = standard error of kurtosis = √(24/N)
Alpha levels of .01 or .001 are appropriate to evaluate significance of
skewness and kurtosis with small or moderate sample sizes. With a large
sample size, examine the shape of the distribution. Skewness above +1 or
below -1 should be evaluated for transformation. Usually kurtosis will
stabilize when skewness is stable.
3. Shapiro-Wilk statistic is also a measure of normality. Check this at the .01
level of significance.
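A minimal sketch of these normality checks in Python (the data array is hypothetical; scipy's skew and kurtosis use slightly different estimators than SPSS, so treat the z values as approximations):

import math
from scipy import stats

def normality_checks(x):
    n = len(x)
    z_skew = stats.skew(x) / math.sqrt(6 / n)        # z = S / S_S, S_S ≈ √(6/N)
    z_kurt = stats.kurtosis(x) / math.sqrt(24 / n)   # z = K / S_K, S_K ≈ √(24/N)
    w, p = stats.shapiro(x)                          # Shapiro-Wilk; evaluate at .01
    return z_skew, z_kurt, w, p

print(normality_checks([4, 6, 8, 3, 9, 2, 7, 5, 6, 6]))  # hypothetical scores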
For ANOVA, we should consider whether a transformation is necessary when the normality assumption is breached.
Sample data forms with recommended transformations: [table of distribution shapes and suggested transformations not reproduced in this transcript]
As with many statistical techniques, transformation is an iterative process that requires post-calculation evaluation. Check that a variable is normally or near-normally distributed after transformation; if not, redo with a more appropriate transformation.
Continue attempting transformations until the skewness and kurtosis values are closest to zero or the distribution has the fewest outliers. The suggested transformations above are intended to bring the distribution closer to normal.
C. Homogeneity of Variance (HOV)—Brown-Forsythe, Levene Statistic
1. for t-tests, if n’s are equal, t-test is robust—the procedure gives accurate
results
2. if n’s are unequal:
a. liberal F-test—inflated type I error if
1. larger variances are paired with smaller n’s and
2. smaller variances are paired with larger n’s
b. conservative F-test—deflated type I error
1. larger variances are paired with larger n’s and
2. smaller variances are paired with smaller n’s
3. Effects of model assumptions failure
a. Non-normality: negligible consequences on type I and type II error
UNLESS
1. Populations are highly skewed
2. N’s are very small
3. One tailed tests are employed (t or z tests only)
4. Kurtosis: leptokurtic distributions increase power; platykurtic
distributions decrease power
b. HOV: if n’s are equal, ANOVA is robust; when n’s are unequal and
HOV is not achieved, consider F’ (Welch statistic), t’, or F* (Brown-
Forsythe)
c. What HOV test to use?
1. Levene statistic
a. examines deviations from the mean in their respective group by
utilizing ANOVA
b. Default on SPSS is Levene statistic
c. Glass and Hopkins are very critical of Levene statistic:
1. Not robust to nonnormality
2. Poor estimator with unequal n’s
2. Brown-Forsythe is preferred test
a. also uses ANOVA but compares each observation and its
group median
b. robust to nonnormal distributions
c. accurate type I error rates
pp. 100-107 Kirk (1995)
IV. Power in ANOVA
A. Power in ANOVA is affected by
1. magnitude of differences between means
2. error variance
3. degrees of freedom in numerator: j-1
4. error degrees of freedom n.-j
5. type I error: α
B. To determine power, compute φ, the noncentrality parameter
Suppose we have three groups. Each group has 50 participants (N = 150). The WISC III is administered and has a standard deviation of 15. Group 1 has a mean of 90; Group 2 has a mean of 95; Group 3 has a mean of 100. Grand mean = 95.

φ = √[ Σ n_j(X̄_j − X̄.)² / (J·MS_w) ] = √[ SS_B / (J·MS_w) ]

X̄1 = 90: 90 − 95 = −5, α1² = 25
X̄2 = 95: 95 − 95 = 0, α2² = 0
X̄3 = 100: 100 − 95 = 5, α3² = 25

φ = √[ (50(25) + 50(0) + 50(25)) / (3(225)) ] = √(2500/675) = 1.92
1 − β = .84 (check Table G)
C. What if the standard deviation is unknown?
1. We can estimate using standard deviation units—this is subjective
2. Let’s assume we have 3 levels of an independent variable such as socio-
economic status (low, middle, high).
3. We will assume the high SES will be 1 standard deviation above the low SES
group in a given study; Middle SES will be between low and high. So,
X̄1 = 0, X̄2 = .5, X̄3 = 1.0; n1 = 20, n2 = 40, n3 = 20
X̄. = [20(0) + 40(.5) + 20(1.0)]/80 = .5
α1² = (0 − .5)² = .25; α2² = (.5 − .5)² = 0; α3² = (1.0 − .5)² = .25

φ = √[ SS_B / (J·MS_w) ] = √[ (20(.25) + 40(0) + 20(.25)) / (3(1)) ] = √(10/3) = 1.83
(MS_w = 1 in standard deviation units)
1 − β = .75 (check Table G)
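Instead of reading Table G, power can be computed from the noncentral F distribution (a sketch for the first example; the noncentrality parameter is λ = Σ n_j α_j²/MS_w = J·φ²):

from scipy import stats

alpha = .05
df1, df2 = 2, 147                    # J − 1 and N − J for J = 3, N = 150
lam = (50*25 + 50*0 + 50*25) / 225   # λ = 2500/225 ≈ 11.11, so φ = √(λ/3) ≈ 1.92
f_crit = stats.f.ppf(1 - alpha, df1, df2)
power = 1 - stats.ncf.cdf(f_crit, df1, df2, lam)
print(f"power ≈ {power:.2f}")        # close to the .84 read from Table G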
V. Effect size
Practical significance is measured by effect size. For ANOVA we use a Cohen’s d or a Cohen’s f. A Cohen’s d compares two groups and may also be used for t-tests. Cohen’s f compares 2 or more groups.

Cohen’s d = (X̄1 − X̄2)/s_error, where s_error = √[ (Σx1² + Σx2²) / ((n1 − 1) + (n2 − 1)) ] or √MS_w

Example (from the t-test data above): X̄1 = 20, X̄2 = 19, s1 = 1.90, s2 = 1.79, Σx1² = 18, Σx2² = 16
note: Σx² = s²(n − 1)

d = (20 − 19) / √[ (18 + 16)/10 ] = 1/1.84 = .54

medium effect size, demonstrating a moderate level of practical significance
(Small = .2, Medium = .5, Large = .8)
Cohen’s f = √[ Σ(X̄_j − X̄.)² / (J·MS_w) ]
= √[ ((6 − 6.1)² + (9 − 6.1)² + (7 − 6.1)² + (2.4 − 6.1)²) / (4(3.325)) ]
= √[ (.01 + 8.41 + .81 + 13.69)/13.3 ]
= 1.31
(Small = .10, Medium = .25, Large = .40)

Large effect size, demonstrating very strong practical significance.

There is also another method of demonstrating practical significance—eta-squared (η²)—which refers to the strength of association between the independent variable(s) and the dependent variable:

η² = σ_α²/(σ_α² + σ_e²) = SS_BG/(SS_BG + SS_WG) = SS_BG/SS_TO
(Small = .01, Medium = .059, Large = .138)

Essentially, it indicates the amount of variance accounted for in the dependent variable by the independent variable(s). If the strength of association is weak, or low, the independent variable(s) have less meaning/relevance to the dependent variable.

The reason effect size is a better measure of practical significance than strength of association is that effect size is reported in standard deviation units, whereas strength of association is concerned with accounting for amount of variance. The amount of variance accounted for will be an issue in multiple regression.
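Both effect sizes are easy to compute from the summary values already in hand (a sketch using the figures from the two examples above):

import math

# Cohen's d from the t-test example: (X̄1 − X̄2) / s_error
d = (20 - 19) / math.sqrt((18 + 16) / (5 + 5))
print(f"d = {d:.2f}")   # ≈ .54, a medium effect

# Cohen's f from the four-group ANOVA example
means, grand, J, ms_w = [6, 9, 7, 2.4], 6.1, 4, 3.325
f = math.sqrt(sum((m - grand) ** 2 for m in means) / (J * ms_w))
print(f"f = {f:.2f}")   # ≈ 1.31, a large effect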
VI. Post Hoc Analyses
A. When conducting an ANOVA on three or more groups, a statistically significant result
means that at least one group is statistically significantly different from the other groups.
Post hoc analyses determine what group or groups are statistically significantly different
from the others. In a post hoc analysis, each group mean is compared to all other group
means. A t-test is then conducted. The most common type of post hoc analysis is known
as a Tukey post hoc.
B. Conducting a Tukey Post Hoc analysis
1. determine the maximum number of comparisons: C = J(J − 1)/2
Example: 4 groups: C = 4(4 − 1)/2 = 6
2. Compute q; q = the difference between two means. To compute q, we use the standard error of the mean—except we do not use the standard deviation in the computation (like we did in computing a z-test) but MS_w:

s_X̄ = √(MS_w/n_j), where n_j refers to the sample size of each group

if n’s are not equal then

s_X̄ = √[ (MS_w/2)(1/n_j + 1/n_k) ], where j & k refer to the sample sizes of the two groups being compared.

Example (from the ANOVA above): s_X̄ = √(3.325/5) = .82

First we compare the two groups with the greatest difference between the means. If they are not significant, the other groups will not be either.

q1 = (X̄2 − X̄4)/s_X̄ = (9 − 2.4)/.82 = 8.05

3. Use Table I: df_b = # of groups; df_w = n. − j
qcrit (4, 16) = 4.05, p = .05
8.05 > 4.05
Since q1 is statistically significant, we will compare the next group until we have
either compared all of the groups or one group is not statistically significant. qcrit
will remain the critical value for each of the comparisons.
q2 = (X̄3 − X̄4)/s_X̄ = (7 − 2.4)/.82 = 5.61

5.61 > 4.05

Since q2 is statistically significant, we would move to the next largest pair until we see that a pair is not significant or all groups are statistically significant.
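scipy offers the same comparisons directly (a sketch; scipy.stats.tukey_hsd reports adjusted p-values and confidence intervals rather than the hand-computed q values above):

from scipy import stats

g1 = [4, 6, 8, 3, 9]
g2 = [9, 11, 8, 9, 8]
g3 = [8, 6, 9, 5, 7]
g4 = [1, 2, 3, 5, 1]

print(stats.tukey_hsd(g1, g2, g3, g4))   # all pairwise mean differences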
C. There are other methods of comparing mean differences. The Newman-Keuls method is
the same as the Tukey but does not adjust for type I error. Thus, if α = .05 and you are making 6 comparisons, you have approximately a 30% chance of type I error (.05 × 6). BUYER BEWARE!
VII. Planned vs Post hoc (A priori vs. post hoc (a posteriori))
A. Multiple comparisons
1. unplanned post hoc, e.g., Tukey
2. planned a priori, e.g., contrast comparisons
B. Contrasts:
1. All mean comparisons have theoretical constants (contrast coefficients) applied to them that sum to zero,
e.g., ψ = µ3 – µ6; contrast coefficients are 1 and −1 respectively
2. Simple contrast = 2 mean comparisons
complex contrast = 3 or more mean comparisons
3. (p. 454 G&H) We can use contrasts to detect significant differences between groups using a t ratio as a test statistic—for example, using the ANOVA example above, we are going to compare groups 1 & 3 to group 2, and group 2 to group 3.

a. compute the standard error of the contrast:
s_ψ̂ = √[ MS_w (c1²/n1 + c2²/n2 + ... + cj²/nj) ]
= √[ 3.325( (.5)²/5 + (−1)²/5 + (.5)²/5 ) ] = √[ 3.325(.3) ] = √.9975 = .9987

b. compute ψ̂:
ψ̂ = Σ c_j X̄_j = c1X̄1 + c2X̄2 + ... + cjX̄j = .5(6) + (−1)(9) + .5(7) = −2.5

c. Compute t ratio:
t = ψ̂/s_ψ̂ = −2.5/.9987 = −2.50

d. Now do the other contrast (group 2 to group 3):
ψ̂ = 1(9) + (−1)(7) = 2.0
s_ψ̂ = √[ 3.325(1/5 + 1/5) ] = 1.15
t = ψ̂/s_ψ̂ = 2.0/1.15 = 1.734

e. Use Table L for critical value: v = n. − j; number of contrasts was 2. Critical value at the .05 level is 2.467.

f. When doing complex comparisons, use Scheffé. Keep in mind Scheffé does not use a contrast-based type I error rate; it uses a family-based type I error rate.
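The contrast t ratios above reduce to a few lines of arithmetic; a minimal sketch (group means, n = 5 per group, and MS_w = 3.325 taken from the ANOVA example):

import math

def contrast_t(coefs, means, ns, ms_w):
    psi = sum(c * m for c, m in zip(coefs, means))                    # ψ̂ = Σ c_j X̄_j
    se = math.sqrt(ms_w * sum(c**2 / n for c, n in zip(coefs, ns)))   # s_ψ̂
    return psi / se

means, ns, ms_w = [6, 9, 7], [5, 5, 5], 3.325
print(contrast_t([.5, -1, .5], means, ns, ms_w))   # ≈ −2.50 (groups 1 & 3 vs 2)
print(contrast_t([0, 1, -1], means, ns, ms_w))     # ≈ 1.73 (group 2 vs 3)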
VIII. Planned Orthogonal Contrasts (17.16, 17.18)
A. POC are the most powerful tests of mean differences
B. Uses a contrast based type I error rate
C. Orthogonality refers to unique, non-overlapping information of j-1 orthogonal contrasts.
POC is appropriate when
1. all research questions can be answered by C = j-1 or fewer contrasts
2. all contrasts are orthogonal
Examples:
We have seven groups and we are going to compare group 1 to group 7; group 3 to group 6; groups 1-4 to groups 5-7:

ψ1 = [1 0 0 0 0 0 −1]
ψ2 = [0 0 1 0 0 −1 0]
ψ3 = [¼ ¼ ¼ ¼ −1/3 −1/3 −1/3]

ψ1 & ψ2: 1(0) + 0(0) + 0(1) + 0(0) + 0(0) + 0(−1) + (−1)(0) = 0
ψ2 & ψ3: 0(¼) + 0(¼) + 1(¼) + 0(¼) + 0(−1/3) + (−1)(−1/3) + 0(−1/3) = ¼ + 1/3 = 7/12 ≠ 0

Not orthogonal
4 groups: Compare 1 to 2; 3 to 4; 1-2 to 3-4:

ψ1 = [1 −1 0 0]
ψ2 = [0 0 1 −1]
ψ3 = [−½ −½ ½ ½]

ψ1 & ψ2: 1(0) + (−1)(0) + 0(1) + 0(−1) = 0
ψ1 & ψ3: 1(−½) + (−1)(−½) + 0(½) + 0(½) = 0
ψ2 & ψ3: 0(−½) + 0(−½) + 1(½) + (−1)(½) = 0
Orthogonal
D. While this method increases power, the conditions requiring contrasts to be planned and orthogonal restrict its utility.
Factorial ANOVA
I. Factorial ANOVA is used when there are 2 or more independent variables
A. 2 types of hypotheses are tested
1. Whether levels in the independent variables (IVs) are statistically different
2. Whether there is an interaction between or among the IVs
B. Main effects: IV = Gender, SES; DV = test score
1. Are test scores significantly different between males and females?
2. Are test scores significantly different among SES levels?
3. Are mean differences between males and females consistent across SES?
e.g. no interaction
[Figure: test score (y-axis) by SES (low, middle, high; x-axis) for males and females; the lines are parallel, indicating no interaction]
Interaction effect
B. The absence of an interaction is the statistical justification for generalizability
C. If an interaction does exist, generalization of the main effects must be explained.
[Figure: test score by SES (low, middle, high) for males and females; the lines are not parallel, indicating an interaction effect]
Factorial ANOVA

                          K (SES)
                 Low      Middle     High
J (Gender)
Males             2         2         2
                  4         3         2
                  3         2         2.5
  Cell mean       3         2.33      2.17      Row mean: X̄1. = 2.5
Females           6         5         5
                  8         3         6
                  7.5       4         5
  Cell mean       7.17      4         5.33      Row mean: X̄2. = 5.5
Column means      5.08      3.17      3.75      Grand mean: X̄.. = 4.0

So, we will have 3 F-tests: 1) Gender, 2) SES, 3) Gender X SES

α̂1 = 2.5 − 4.0 = −1.5; α̂2 = 5.5 − 4.0 = 1.5
β̂1 = 5.08 − 4.0 = 1.08; β̂2 = 3.17 − 4.0 = −.83; β̂3 = 3.75 − 4.0 = −.25

SS_A = Kn Σ α̂_j² = 3(3)[(−1.5)² + (1.5)²] = 40.5
SS_B = Jn Σ β̂_k² = 2(3)[(1.08)² + (−.83)² + (−.25)²] = 11.51
SS_AB = n ΣΣ [X̄_jk − (X̄.. + α̂_j + β̂_k)]²   (cell mean − (grand mean + row effect + column effect))
      = 3[(3 − 3.58)² + (2.33 − 1.67)² + (2.17 − 2.25)² + (7.17 − 6.58)² + (4 − 4.67)² + (5.33 − 5.25)²]
      = 3(1.592) = 4.776
SS_W = ΣΣΣ (X_ijk − X̄_jk)²   (individual score − cell mean) = 7.68

DF: df_A = J − 1 = 1; df_B = K − 1 = 2; df_AB = (J − 1)(K − 1) = 2; df_W = n. − (J)(K) = 18 − 6 = 12

MS_A = 40.5/1 = 40.5
MS_B = 11.51/2 = 5.755
MS_AB = 4.776/2 = 2.388
MS_E = 7.68/12 = .64

F_A(1, 12) = 40.5/.64 = 63.28
F_B(2, 12) = 5.755/.64 = 8.99
F_AB(2, 12) = 2.388/.64 = 3.73
F_crit(2, 12) = 3.885
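The same table can be produced with statsmodels (a sketch; the column names and long-format layout are assumptions, not from the notes):

import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

df = pd.DataFrame({
    "gender": ["m"] * 9 + ["f"] * 9,
    "ses": (["low"] * 3 + ["mid"] * 3 + ["high"] * 3) * 2,
    "score": [2, 4, 3, 2, 3, 2, 2, 2, 2.5,    # males: low, middle, high cells
              6, 8, 7.5, 5, 3, 4, 5, 6, 5],   # females: low, middle, high cells
})
model = ols("score ~ C(gender) * C(ses)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))        # SS and F for gender, SES, interaction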
II. Analyzing Factorial ANOVA for nonsignificant interaction
A. Evaluate your data
B. Analyze model assumptions
C. Determine whether the interaction term is statistically significant. If you have a
nonsignificant interaction, you can generalize the omnibus F-test
D. Determine if either of the two main effects is statistically significant.
E. Review the sample means and the results of any necessary post hoc procedures
F. Summarize the results.
III. Analyzing Factorial ANOVA for significant interaction
A. Evaluate your data
B. Analyze model assumptions
C. Determine whether the interaction term is statistically significant. If you have a
significant interaction, you cannot generalize the omnibus F-test
D. Plot the interaction
E. Test for simple effects
1. Identify your hypotheses
2. sort your data and run your analysis
3. Calculate the F: F_simple effect = MS_simple effect / MS_within (from the omnibus model)
Use the appropriate df associated with each value
Use F table to identify statistical significance
F. Review the sample means and the results of any necessary post hoc procedures
SPSS can perform the preceding steps using the following code in SPSS Syntax:
UNIANOVA DV BY IV1 IV2
/EMMEANS = TABLES(IV1*IV2) COMPARE(IV2)
Note: The COMPARE keyword could contain the other IV.
G. Summarize the results.
Evaluate your data
Analyze model assumptions
Determine interaction effect

Non-significant interaction (p > alpha): report main effects for each independent variable; perform post hoc tests (Tukey, effect size). Note: Tukey need only be performed when the main effect is significant and only when there are three or more levels within the independent variable.

Significant interaction (p < alpha): plot the interaction; test for simple effects (use SPSS syntax as above); analyze post hoc tests for each significant simple effect.
Repeated Measures
I. Repeated measures follow a different experimental design, known as randomized block
design.
A. It differs in that each observation is measured more than once or individuals are matched according to a specific variable.
B. As a result, repeated measure designs can be viewed as univariate or multivariate
II. Analyzing Repeated Measures
A.Model assumptions
1. independence
2. normality/multivariate normality—repeated measures is quite robust to this assumption, especially when equal sample sizes are employed.
3. Sphericity is a way to measure homogeneity of covariance—a model assumption in MANOVA. Sphericity refers to the equality of the variances of the differences between each pair of levels.
a. ε = extent to which the covariance matrix deviates from sphericity
b. If ε = 1, sphericity is met
c. If ε > .70, sphericity holds—univariate is more powerful
B. If ε < .70:
1. if n is small, multivariate is more powerful
2. if n is large, univariate is more powerful, but use the Greenhouse-Geisser correction
3. Stevens (2007) recommended examining both approaches to confirm results.
C. Post hoc procedures—use dependent t-tests but with a Bonferroni correction:
α / c, where c = # of pairwise comparisons [c = # of groups (# of groups -1) / 2]
For example, with 4 separate administrations, .05 / [4(4 – 1) / 2] = .05 / 6 = .008.
All results will be compared to the .008 level of significance.
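A sketch of these corrected post hocs in Python (the administration arrays are hypothetical placeholders, one entry per participant per administration):

from itertools import combinations
from scipy import stats

def bonferroni_pairwise(administrations, alpha=.05):
    k = len(administrations)
    adj_alpha = alpha / (k * (k - 1) // 2)   # e.g., .05/6 = .008 for k = 4
    for (i, a), (j, b) in combinations(enumerate(administrations, 1), 2):
        t, p = stats.ttest_rel(a, b)         # dependent t-test for each pair
        print(f"time{i} vs time{j}: t = {t:.2f}, p = {p:.4f}, "
              f"significant: {p < adj_alpha}")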
Multiple Regression
• The purpose of multiple regression is to explain variances and determine how and to what
extent variability in the criterion variable (dependent variable) depends on manipulation of
the predictor variable(s) (independent variable).
• Whereas ANOVA is experimental research (independent variable is manipulated), multiple
regression is a correlational procedure—it looks at relationships between predictor variables
and a criterion variable.
• Thus, both predictor and criterion variables are continuous in multiple regression.
• ANOVA and multiple regression both have a continuous variable as the dependent variable
(called criterion variable in regression) and utilize the F-test.
• In multiple regression, the F-test identifies a statistically significant relationship, as opposed
to statistically significant differences between groups in ANOVA.
Simple Regression—a review

Y_i = α + β1X_i + ε_i

Y_i = raw score on Y for each individual
α = intercept
β1 = unstandardized regression coefficient for each raw score on X
ε_i = error for each individual in predicting Y

• α and β1 are parameters to be estimated. The goal is to minimize error. Error is assumed to be independent, normally distributed, with a mean of 0.

Simple regression formula:
• If we know information about X, we can predict Y
• We regress Y on X
Ŷ = a + bX, where b = Σxy/Σx²

Ŷ = predicted score of the dependent variable Y
b = regression coefficient
a = intercept

• The regression equation is based on the principle of least squares. The values used minimize the errors in prediction. This is because the error in prediction is used in calculating the regression coefficient.
• The difference between the observed and predicted score is identified as Y − Ŷ.
• The principle of least squares is calculated by summing the squared errors of prediction: Σ(Y − Ŷ)².
Understanding More Nomenclature

Σy² = total variance = ss_reg + ss_res
Remember, in ANOVA, ss_tot = ss_b + ss_w
So, in regression,

F = (ss_reg/df_reg)/(ss_res/df_res) = (ss_reg/j)/(ss_res/(N − j − 1)) = MS_reg/MS_res

Because we are looking at variance and regression is directly tied to correlation,

F = (R²/j)/((1 − R²)/(N − j − 1))
Multiple Correlation Coefficient
• The measure of effect size in regression is R². (Small = .02, Medium = .13, Large = .26)
• It is the correlation between Y and Ŷ, squared: R² = ss_reg/Σy², and it equals the amount of variance accounted for in the model.
• When the predictor variables are not correlated to each other, R2 = the sum of the squared
correlations between each predictor variable to the criterion variable.
• However, in most research, we deal with correlated predictors.
• Thus, this produces some redundancy in what is being measured due to the intercorrelations
of the predictor variables—the predictor variables are measuring some of the same things.
• As a result, the unique amount of variance accounted for by each predictor variable is
reduced, giving inaccurate measures of the importance of the predictor variable. This is
known as multicollinearity.
• One way to detect multicollinearity is to examine the intercorrelations of the predictor
variables. Another way is to compute Variance Inflation Factors (VIF).
• VIF: If VIF > 1/(1 − R²), then multicollinearity may be affecting your results.
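A minimal VIF check with statsmodels (a sketch; X is an assumed two-dimensional array with one column per predictor):

import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools import add_constant

def vif_table(X):
    Xc = add_constant(np.asarray(X, dtype=float))    # column 0 is the constant
    return [variance_inflation_factor(Xc, i) for i in range(1, Xc.shape[1])]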
• However, there can be a special case in which the intercorrelations between predictor variables actually increase the amount of variance accounted for (R²) while having no unique contribution by themselves. This is known as a suppressor variable—it controls for error.
• Suppressor variables have
o zero or near zero correlations with the criterion
o Moderate to strong correlations with at least one predictor variable
• Suppressor variables are uncommon in social science research
Uniqueness Index
Sometimes, you may wish to run an analysis excluding specific variables. You can calculate whether or not the change in R² is statistically significant. This is known as a uniqueness index.
F = [ (R²_full − R²_reduced)/(j_full − j_reduced) ] / [ (1 − R²_full)/(N − j_full − 1) ]
Regression Coefficients
• A regression coefficient for a given X variable represents the average change in Y that is
associated with one unit of change in X.
• The goal is to identify which of the predictor variables (X) are important to predicting the
criterion (Y).
• Regression coefficients may be nonstandardized or standardized.
• Nonstandardized regression coefficients (b) are produced when data are analyzed in raw
score form.
• It is not appropriate to use nonstandardized regression coefficients as the sole evidence of the importance of the predictor variable. We can test the nonstandardized regression coefficient: it is possible to have a model that is statistically significant, but each predictor variable may not be important. To test the regression coefficient:

t = b/s_b, where s_b = √(MS_res/Σx²)
• Important: The statistical significance of the nonstandardized regression coefficient is only
one piece of evidence that identifies the importance of the predictor variable and is not to be
used as the only evidence. This is because the nonstandardized regression coefficient is
affected by the standard deviation. Since different predictor variables have different standard
deviations, the importance of the variable is difficult to compare.
• When we use standardized regression coefficients (β), all of the predictor variables have a standard deviation of 1 and can be compared.
• To calculate the standardized regression coefficient: β = b(s_x/s_y)
Statistical and practical significance in multiple regression
• Statistical and practical significance should be determined for both the model and each
predictor variable.
o Determine statistical significance of the model by evaluating the F test.
o Determine practical significance of the model by evaluating R2 . Cohen (1992)
recommended using f² to determine effect size, where f² = R²/(1 − R²), with the
following effect size interpretations: small = .02, medium = .15, and large = .35.
These values can easily be converted to R2 with the following interpretations:
small = .02, medium = .13, and large = .26.
o Statistical significance of each predictor variable is determined by a t-test of the
beta weights.
o Practical significance of each predictor variable should be determined using two
types of measures:
§ The squared semipartial correlation coefficient (sr²), which is the part
correlation squared in SPSS output. sr2 represents the unique amount of
variance that the predictor variable brings to the model. The advantage of
this value is that the researcher gains information as to the amount of
information the predictor variable contributes that is not shared by any
other variable in the model. However, this value is highly influenced by
intercorrelations with other predictor variables (i.e. multicollinearity).
§ In order to deal with this limitation, Thompson (1990; 2001) and Courville
and Thompson (2001) recommend examining structure coefficients.
Structure coefficients (r_s) identify the relationship of a predictor variable
to what is predicted (Ŷ). In other words, it is the proportion of the
correlation of the predictor variable and criterion variable (r) to the
predicted model (R): r_s = r_xy/R. When this value is squared,
the researcher can interpret the amount of variance that the predictor
variable contributes to the predictor model. While this value is not
distorted by multicollinearity, the value may not be pertinent if the overall
model is not significant. Thus, both sr² and r_s² should be interpreted.
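A sketch computing R², sr² (as the drop in R² when a predictor is removed), and r_s² for each predictor (y and X are assumed numpy arrays, not SPSS output):

import numpy as np

def predictor_importance(y, X):
    n, k = X.shape
    Xc = np.column_stack([np.ones(n), X])
    b, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    R2 = np.corrcoef(y, Xc @ b)[0, 1] ** 2
    results = []
    for j in range(k):
        r_xy = np.corrcoef(X[:, j], y)[0, 1]
        rs2 = (r_xy / np.sqrt(R2)) ** 2              # squared structure coefficient
        Xr = np.delete(Xc, j + 1, axis=1)            # model without predictor j
        br, *_ = np.linalg.lstsq(Xr, y, rcond=None)
        R2r = np.corrcoef(y, Xr @ br)[0, 1] ** 2
        results.append((R2 - R2r, rs2))              # (sr², r_s²)
    return R2, results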
Model Assumptions
1. Predictor and criterion variables should be continuous and at least interval or ratio level
of measurement. You can use nominal level predictors, but they must be dummy-coded.
2. Sample should be random
3. Criterion variable should be normally distributed
4. Observations should be independent and not affected by another observation.
5. The relationship between the criterion variable and each predictor variable should be
linear.
6. Errors in prediction should be normally distributed
7. Errors should have a constant variance.
8. Errors should not be correlated with other errors in other observations or with the
predictor variable.
9. Predictor variables should be measured without error.
10. Absence of specification errors: specification errors occur when predictor variables are not theoretically tenable or the relationship is not linear.
Regression is robust against minor violations of most of these assumptions with the
exception of independent observations, measurement error, or specification errors.
Power (Note: this section uses information from Howell (2002) from the University of
Vermont: http://www.uvm.edu/~dhowell/gradstat/)
Remember, power is dependent upon sample size and effect size. In multiple regression the
number of predictor variables is also important. The more predictor variables you have, the
larger your sample size needs to be.
We need a measure of effect size. This is true of any calculation of power. We have been using R² as our measure of effect size. When we calculate power, we will use a version of R². This effect size will be called f².
First consider the situation where we want the power for a significant R².
We want the power that our overall multiple R with 4 predictors and 40 subjects will be significant if R² = .35. Define f² = R²/(1 − R²).
let p = number of predictor variables.
Let v = N − p − 1 = number of df for error.
Define λ = f²(p + v + 1).
Look λ up in the tables Cohen gives.
For our example,
R = .59; R² = .35
f² = .35/(1 − .35) = .35/.65 = .54
p = 4
v = 40 − 4 − 1 = 35
λ = f²(p + v + 1) = .54(4 + 35 + 1) = .54 × 40 = 21.6
Round λ down to 20 to be conservative.
Round v down to 20 to be conservative.
A copy of one of Cohen’s tables is at the end of this book. (note: I have copied this from Cohen for classroom use. It is a copyrighted table.)
Power of the F Test, u = 1 to 8, a = .05 (u = number of predictors; v = df for error). So power for our example = .91. We have a very high probability of finding a significant correlation if the parameters are as I have specified them.

             λ
u    v     2   4   6   8  10  12  14  16  18  20
1   20    27  48  64  77  85  91  95  97  98  99
    60    29  50  67  79  88  92  96  98  99  99
   120    29  51  68  80  88  93  96  98  99  99
    ∞     29  52  69  81  89  93  96  98  99  99
2   20    20  36  52  65  75  83  88  92  95  97
    60    22  40  56  69  79  87  91  95  97  98
   120    22  41  57  71  80  87  92  95  97  98
    ∞     23  42  58  72  82  88  93  96  97  99
3   20    17  30  44  56  67  75  82  87  91  94
    60    19  34  49  62  73  81  87  92  95  97
   120    19  35  50  64  75  83  89  93  95  97
    ∞     19  36  52  65  76  84  90  93  96  98
4   20    15  26  38  49  60  69  76  83  87  91
    60    17  30  44  57  68  77  83  89  92  95
   120    17  31  46  58  70  78  85  90  93  96
    ∞     17  32  47  60  72  80  87  91  94  96
5   20    13  23  34  44  54  63  71  78  83  87
    60    15  27  40  52  63  72  80  86  90  93
   120    16  29  41  54  65  75  82  87  91  94
    ∞     16  29  43  56  68  77  84  89  93  95
6   20    12  21  30  40  50  59  66  73  79  84
    60    14  25  37  48  59  68  76  83  87  91
   120    14  27  39  50  62  71  79  85  89  93
    ∞     15  27  40  53  64  74  81  87  91  94
7   20    11  19  28  37  46  54  62  69  75  80
    60    17  24  35  45  56  65  73  80  85  89
   120    13  25  37  47  59  68  76  82  87  91
    ∞     14  25  38  50  61  71  79  85  89  93
8   20    10  18  26  34  42  50  58  65  71  76
    60    12  23  33  43  52  62  70  77  83  87
   120    12  24  35  45  55  65  73  80  85  89
    ∞     13  24  36  48  59  68  77  83  88  92
9   20    10  17  24  32  39  47  54  61  68  73
    60    11  21  31  41  50  58  67  74  80  85
   120    11  22  33  44  53  62  71  78  83  88
    ∞     13  23  34  45  56  66  74  81  86  90
(Modified slightly from Cohen, 1988, for class purposes only.)
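The table lookup can be reproduced with the noncentral F distribution (a sketch, using λ and v before they are rounded down; the result is close to the tabled .91):

from scipy import stats

p_pred, v, lam, alpha = 4, 35, 21.6, .05
f_crit = stats.f.ppf(1 - alpha, p_pred, v)
power = 1 - stats.ncf.cdf(f_crit, p_pred, v, lam)
print(f"power ≈ {power:.2f}")   # ≈ .9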
Example 1: Results for a One-way ANOVA with Tukey Post hoc
A one-way ANOVA was conducted to explore group differences based on score.
An alpha level of .05 was utilized. Descriptive statistics are in Table 1. All groups were
normally distributed. Variances were homogeneous, F (3, 16) = 1.367, p = .289.
Statistically significant differences were evident among the groups, F (3, 16) = 11.49, p <
.001. A large effect size was noted, η² = .68, indicative of a strong degree of practical
significance. Given the sample size of n = 20, statistical significance would be detected
only for large effect sizes, η² > .41.
In order to investigate significant differences between groups, a Tukey post hoc
analysis was conducted. Statistically significant differences were noted between groups 1
and 4, groups 2 and 4, and groups 3 and 4 (see Table 2). Practical significance was
assessed using Cohen’s d. A moderate effect size was noted between groups 1 and 3.
Large effect sizes were noted between groups 1 and 2, 1 and 4, 2 and 3, 2 and 4, and 3
and 4 (see Table 2). Large effect sizes were indicative of very strong practical
significance.
Table 1
Descriptive Statistics
Group n Mean SD
1 5 6.00 2.55
2 5 9.00 1.225
3 5 7.00 1.581
4 5 2.40 1.673
Table 2.
Tukey post hoc analysis
Group
Comparisons Mean Difference p d
1 2 -3.00 0.0815 1.65
3 -1.00 0.8215 0.55
4 3.60* 0.0301 1.97
2 3 2.00 0.3393 1.09
4 6.60* 0.0002 3.62
3 4 4.60* 0.0052 2.52
*p < .05
Example 2: Results for a One-way ANOVA with a priori contrasts
A one-way ANOVA was conducted to explore group differences based on score.
An alpha level of .05 was utilized. Descriptive statistics are in Table 1. All groups were
normally distributed. Variances were homogeneous, F (3, 16) = 1.367, p = .289.
Statistically significant differences were evident among the groups, F (3, 16) = 11.489, p
< .001. A large effect size was noted, η² = .68, indicative of a strong degree of practical
significance. Given the sample size of n = 20, statistical significance would be detected
only for large effect sizes, η² > .41.
A priori contrast comparisons were conducted comparing groups 1 and 3 to group
2 and comparing group 2 to group 3. A Bonferroni adjustment yielded an alpha level of
.025 for statistical significance. There was a statistically significant difference between
groups 1 and 3 versus group 2, t(16) = -2.50, p = .024. A large effect size was noted, d =
2.74. There was not a statistically significant difference between group 2 versus group 3,
t(16) = 1.734, p = .102. A large effect size was noted, d = 1.10.
Table 1
Descriptive Statistics
Group n Mean SD
1 5 6.00 2.55
2 5 9.00 1.225
3 5 7.00 1.581
4 5 2.40 1.673
Example 3: Factorial ANOVA Results with Nonsignificant Interaction
A 2 X 3 ANOVA was conducted on grade point average improvement with
respect to differences in note-taking methods and gender. An alpha level of .05 was
utilized for this study. Males and females were normally distributed. Note-taking method
was also normally distributed for method 1, method 2, and the control group. Variances
were homogeneous, FLevene (5, 54) = .575, p = .719.
There was not a statistically significant interaction between gender and note-
taking method, F(2, 54) = 2.921, p = .062. Statistically significant differences were found
in grade point average improvement between males and females, F(1, 54) = 15.86, p <
.001. Males had significantly improved their gpa compared to females (see Table 1). A
large effect size was noted, d = 1.08, indicating a strong degree of practical significance.
Statistically significant differences were found in grade point average improvement
among note-taking methods, F(2, 54) = 17.809, p < .001. A large effect size was noted,
η² = .40, indicating a strong degree of practical significance. A Tukey post hoc analysis
was conducted on note-taking method. Method 2 was statistically significantly higher
than both the control group (p < .001) and Method 1 (p = .001) (see table 1). Large effect
sizes were noted between method 2 and method 1, d = .94, method 2 and the control
group, d = 1.65, and method 1 and control, d = .63. Given the sample size of n = 60,
statistical significance would be detected for large effect sizes, η² > .14.
Table 1
Change in GPA Across Gender and Note-Taking Method
Gender Mean SD n
Men 0.38 0.27 30
Women 0.19 0.19 30
Method 1 0.25 0.22 20
Method 2 0.47 0.25 20
Control 0.14 0.15 20
Example 4a: Factorial ANOVA Results with Significant Interaction
A 2 X 3 ANOVA was conducted on grade point average improvement with
respect to differences in note-taking methods and gender. An alpha level of .05 was
utilized for this study. Males and females were normally distributed. Note-taking method
was also normally distributed for method 1, method 2, and the control group. Variances
were homogeneous, FLevene (5, 54) = .575, p = .719.
There was a statistically significant interaction between gender and note-taking
method, F(2, 54) = 10.543, p < .0001 (see Figure 1). In order to evaluate the interaction,
simple effects were analyzed. There was no statistically significant difference in grade
point average improvement for males across note-taking methods F(2, 54) = 2.50, p =
.092. However, a moderate effect size was noted, η² = .09. Note-taking methods for
males appear to be better than when no method is taught at all, but the type of method did
not make much difference (see Table 1). There was a statistically significant difference in
grade point average improvement for females across note-taking methods F(2, 54) =
25.86, p < .001. A large effect size was noted, η² = .49, indicating a strong degree of
practical significance. Note-taking method 2 for females appears to be better than note-
taking method 1 or when no method is taught at all (see Table 1). In order to investigate
the differences in note-taking method among females, a Tukey post hoc analysis was
conducted. Method 2 was statistically significantly higher than both the control group (p
< .001) and Method 1 (p < .001) (see table 1). Large effect sizes were noted between
method 2 and method 1, d = 2.59, and method 2 and control, d = 2.94. A moderate effect
size was noted between method 1 and the control group, d = .36. Given the sample size of n = 60,
statistical significance would be detected only for large effect sizes, η² > .14.
Figure 1.
Interaction Effect for Gender by Note-taking
Table 1
Change in GPA Across Gender and Note-Taking Method
Gender
Note-Taking
methods Mean SD n
Men Method 1 0.34 0.23 10
Method 2 0.31 0.19 10
Control 0.17 0.15 10
Women Method 1 0.17 0.18 10
Method 2 0.64 0.18 10
Control 0.11 0.15 10
Example 4b: Factorial ANOVA Results with Significant Interaction
A 2 X 3 ANOVA was conducted on grade point average improvement with
respect to differences in note-taking methods and gender. An alpha level of .05 was
utilized for this study. Males and females were normally distributed. Note-taking method
was also normally distributed for method 1, method 2, and the control group. Variances
were homogeneous, FLevene (5, 54) = .575, p = .719.
There was a statistically significant interaction between gender and note-taking
method, F(2, 54) = 10.543, p < .0001 (see Figure 1). In order to evaluate the interaction,
simple effects were analyzed. There was no statistically significant difference in GPA
improvement for method 1 across males and females F(1, 54) = 4.13, p = .047. However,
a moderate effect size was noted, d = .55. There was a statistically significant difference
in GPA improvement for method 2 across males and females, F(1, 54) = 17.02, p < .001.
A large effect size was noted, d = 1.12, indicating a strong degree of practical
significance. There was no statistically significant difference in GPA improvement for
the control group across males and females F(1, 54) = .55, p = .463. A small effect size
was noted, d = .20. Note-taking method 2 for females appears to be better than note-
taking method 1 or when no method is taught at all (see Table 1). Given the sample size
of n = 60, statistical significance would be detected only for large effect sizes, η² > .14.
Figure 1.
Interaction Effect for Gender by Note-taking
Table 1
Change in GPA Across Gender and Note-Taking Method
Gender
Note-Taking
methods Mean SD n
Men Method 1 0.34 0.23 10
Method 2 0.31 0.19 10
Control 0.17 0.15 10
Women Method 1 0.17 0.18 10
Method 2 0.64 0.18 10
Control 0.11 0.15 10
Example 5: One-way ANOVA Repeated Measures Results
A one-way repeated measures ANOVA was conducted on four administrations of the
DEW at just married, 5 years of marriage, 10 years of marriage, and 15 years of marriage.
Descriptive statistics for each of the administrations are in Table 1. Assumptions of
normality and sphericity were met due to the balanced nature of the design. A statistically
significant difference among the administrations was evident, F(3, 87) = 7.664, p < .001.
A large effect size was evident, η² = .21. Post hoc analyses were conducted to analyze
significant differences between each of the administrations. Statistically significant
differences were found between administrations 1 and 3, 1 and 4, 2 and 3, and 2 and 4
with moderate effect sizes (see Table 2). Given the sample size of n = 30, statistical
significance would be detected for small effect sizes, η² > .042.
Table 1
Descriptive Statistics for Test Administrations
Mean SD N
time1 65.8 9.23 30
time2 65.43 10.69 30
time3 63.10 10.68 30
time4 61.93 12.57 30
Table 2
Post hoc tests
Pair Mean Differences t d
time1 - time2 0.37 0.41 0.07
time1 - time3 2.7 2.90* 0.53
time1 - time4 3.87 3.00* .55
time2 - time3 2.33 3.76* 0.69
time2 - time4 3.5 3.40* 0.62
time3 - time4 1.17 1.47 0.27
* p < .008
Example 6: SPANOVA Results
Participants were involved in a dieting program to lose weight and were recruited to
examine whether there was a statistically significant difference between two kinds of
exercise frequency in determining weight loss. Fifty participants were recruited
and randomly assigned into two groups. In Group 1, participants exercised 30 minutes
every day. In Group 2, participants exercised 30 minutes four days per week. Prior
to beginning the exercise, a pretest was conducted to see how many kilograms
participants lost in their weight by dieting in a month. Then participants engaged in
aerobic exercise for a month. Posttests were conducted at the end of the month to see how
many kilograms participants lost based on their exercise routine. An a priori analysis was
conducted to determine appropriate sample size. Given a moderate effect size, a sample
size of 34 was necessary to achieve adequate power at .80 (Cohen, 1988).
A split-plot analysis of variance (SPANOVA) was conducted between exercise
groups across pretest and posttest measures of weight loss. An alpha level of .05 was
utilized. Assumptions for normality, homogeneity of covariances and homogeneity of
variances were met. There was not a statistically significant interaction between the tests
and exercise group, F(1, 48) = 3.43, p = .07 (see Figure 1). Thus, results are generalizable
across testing periods and exercise groups. A statistically significant difference between
pretest and posttest weight loss was evident, F(1, 48) = 132.59, p < .001, ηp² = .73,
indicative of a very large effect size. Weight loss was increased in both groups upon
engagement of an exercise routine (see Table 1). A statistically significant difference
between exercise groups was evident, F(1, 48) = 7.35, p = .009, ηp² = .13, indicative of a
moderate to large effect size. Group 2 (exercise four times per week) had slightly more
weight loss than those who exercised every day (group 1) as indicated in Table 1.
Table 1.
Descriptive Statistics of Exercise Groups Across Testing Periods.
Test        Exercise group   Mean   SD    N
Pretest     1                2.34   .34   25
            2                2.39   .26   25
            Total            2.36   .30   50
Posttest    1                2.76   .11   25
            2                2.97   .13   25
            Total            2.86   .16   50
Figure 1.
Plot of Exercise Groups Across Testing Periods.
Example 7: Multiple Regression Results
A multiple regression analysis was conducted on statistics exam scores based on
math aptitude and English aptitude. Descriptive statistics are reported in Table 1.
Statistics exam scores were normally distributed. Standardized residuals were also
normally distributed. Scatterplots were analyzed, and no curvilinear relationships
between the criterion variable and the predictor variables or heteroscedasticity were
evident. There was a statistically significant relationship between math aptitude, English
aptitude, and statistics exam score, F(2, 97) = 16.63, p < .001. A large effect size was
noted with approximately 26% of the variance accounted for in the model, R2 = .255.
Math aptitude was a statistically significant predictor of statistics exam score (see Table
2) uniquely accounting for approximately 21% of the variance. English aptitude was not
significant and uniquely accounted for 2% of the variance. Thus, as a single predictor,
English aptitude has a moderate effect, but is less meaningful when included in a model
with math aptitude. Given the sample size of n = 100, statistical significance would be
detected for small effect sizes, R2 > .09.
Table 1
Descriptive Statistics
Variable            Mean     SD      N     Statistics exams   Math aptitude   English aptitude
Statistics exams    60.11    19.79   100   ----               .48*            .20*
Math aptitude       460.60   77.37   100                      ----            .12*
English aptitude    478.20   71.65   100                                      ----
* p < .05

Table 2
Multiple Regression Results for Statistics Exams
Predictor B SE B β t p sr2
Math aptitude .12 .02 .47 5.29* 0.00 .21
English aptitude .04 .02 .15 1.65 0.10 .02
* p < .05
Example 8: MANOVA Results
A one-way MANOVA was conducted to determine the effect of three learning strategies
(writing, talking, and thinking) on two learning outcomes (recall and application). An
alpha level of .05 was utilized. Descriptive statistics for the dependent variables across
study group are in Table 1. Assumptions for normality (W > .01) and homogeneity of
covariances (Box’s M = 6.98, p = .398) were met. A statistically significant effect was
identified between learning styles and the two dependent variables, Wilks’ Λ = .421, F (4,
52) = 7.03, p < .001. Approximately 58% of the variance in the model was accounted for
in the combined dependent variables across study groups, yielding a strong effect. An a
priori power analysis yielded a total sample size of 45 to find statistical significance with
a moderate effect size (f2 = .15). In this study, statistical significance was noted when
effect sizes were moderate to large (f2 = .22).
A post hoc discriminant analysis was conducted to determine how the study group
differences were manifested across the dependent variables. The first discriminant
function was significant, Wilks’ Λ = .42, χ²(4) = 22.90, p < .001. Approximately 97% of
the variance in the model was accounted for in the first discriminant function for recall
and application across study groups. Recall loaded strongly (r >.99) and had a strong
relationship (b = .96) to the first discriminant function (see Table 2). Based upon these
results, the first discriminant function was labeled memory. The second discriminant
function was not significant, Wilks’ Λ = .96, χ²(1) = 1.12, p = .29. Approximately 3% of
the variance in the model was accounted for in the second discriminant function for recall
and application across study groups. Centroid means for the discriminant functions
indicated that the writing study strategy (1.41) had the most effect in memory, compared
to the talking (-.22) and thinking (-1.12) strategies.
Table 1
Descriptive Statistics.
Dependent
Variable
Study
Group Mean SD N
Recall Exam Think 3.30 0.68 10
Write 5.80 1.03 10
Talk 4.20 1.14 10
Application Exam Think 3.20 1.23 10
Write 5.00 1.76 10
Talk 4.40 1.17 10
Table 2
Correlation Coefficients and Standardized Function Coefficients.
Correlation Coefficients
with Discriminant Function
Standardized Function
Coefficients Variable
Recall .997 .96
Application .47 .09
Example 9: Canonical Correlation Results
A canonical correlation analysis was performed between the GASS subscales and the
TSR subscales. An alpha level of .05 was utilized. Cutoff correlations of .30 were used for
interpretation of the canonical variates (Tabachnick & Fidell, 2001). Assumptions for normality,
linearity, and homoscedascity were evaluated through distributions of scatterplots; assumptions
were met and no multivariate outliers were detected. When conducting the analysis between the
GASS and SPS, five cases were eliminated due to missing data leaving a sample size of 120 for
the analysis.
A statistically significant relationship was found between GASS subscales and the TSR
subscales. The first canonical root was significant, Λ = .89, F (4, 232) = 3.55, p = .008,
accounting for 11% (rc = .33) of the overlapping variance. The second canonical root was not
significant, Λ = .85, F (1, 117) = .33, p = .569, accounting for less than 1% (rc = .05) of the
overlapping variance. Therefore, only the first canonical variate was interpreted.
The first canonical variate included scores on both subscales of the GASS, Coping (.99)
and Commitment to Follow-up (.63) and the subscales on the TSR: Behavioral subscale (−.87)
and the emotional subscale (.56). Adolescent clients with high therapeutic goal attainment had
fewer behavioral problems and had higher levels of emotional arousal.
Table 1 shows descriptive statistics for the instruments used for this sample. Shown in
Table 2 are the correlations and standardized canonical variate coefficients for the GASS
subscales and the TSR subscales as they relate to the first canonical variate.
Table 1.
Descriptive Statistics for the GASS and TSR Scales
Scale N M SD
Goal Attainment Scale
Coping 124 .39 .66
Follow-up 124 .39 .62
Target Symptom Rating Scale
Behavioral 121 2.55 .58
Emotional 121 2.65 .50
Table 2
Correlations and Standardized Canonical Variate Coefficients on the GASS and TSR for the First Canonical Variate
Scale COR COE
Goal Attainment Scale
Coping .99 .98
Followup .63 .03
Target Symptom Rating Scale
Behavior -.87 -.83
Emotional .56 .49
Note. Correlations ≥ .30 are in boldface. COR = correlations to the canonical variate. COE = standardized canonical variate coefficient.
Example 10: ANCOVA Results
An analysis of covariance (ANCOVA) was used to examine differences between three
levels of vitamin C dosages across the number of days of cold symptoms during treatment with
the number of days of cold symptoms prior to treatment as a covariate. Descriptive statistics are
in Table 1. Assumptions of normality, homogeneity of covariance, and homogeneity of
regression were met for this analysis. Dimitrov (2009) noted covariates should be correlated to
the dependent variable. Number of days of cold symptoms prior to treatment had a moderate
relationship to number of days of cold symptoms during treatment (r = .50, p = .005). Thus, the
number of days of cold symptoms prior to treatment appeared to be a tenable covariate that was
important to control in this study.
Using an alpha level of .05, a statistically significant effect was noted among vitamin C
dosages across the number of days of cold symptoms during treatment, F (2, 26) = 6.45, p =
.001, ηp² = .33. A large effect size was noted, with approximately 33% of the variance
accounted for in the model. Univariate comparisons were analyzed using adjusted means. A
statistically significant difference was noted between the placebo and low doses and the placebo
and high doses but not between the low and high doses (see Table 2). Clients receiving a dosage
of vitamin C had fewer days of cold symptoms than clients who received a placebo.
Table 1
Descriptive Statistics, Adjusted Means, and Tests of Significance for Vitamin C Dosages Across
Number of Days of Cold Symptoms During Treatment
Variable       Mean    SD     Adjusted Mean   N
Placebo        11.60   5.36   12.01           10
Low Dosage      8.40   3.83    7.72           10
High Dosage     6.40   4.67    6.67           10
Table 2.
Pairwise Comparisons
Group Comparisons
Mean
Difference p d
Placebo Low Dosage 4.30* .012 1.23
High Dosage 5.34* .002 1.52
Low Dosage High Dosage 1.04 .518 0.30
*p < .05
Example 11: Logistic Regression Results
A simultaneous logistic regression was conducted on a dichotomous dependent
variable—relapse. Predictor variables included unpleasant mood states, euphoric mood, and
lessened vigilance. The assumption for linearity between the weighted combination of predictor
variables and the natural log of the odds for relapse was met, χ²(8) = 9.67, p = .287. A statistically
significant model for predicting relapse was evident, χ²(3) = 15.06, p = .002. Regression
coefficients, Wald statistics, odds ratio, and the 95% confidence intervals for the odds ratio are in
Table 1. Neither unpleasant mood states nor euphoric mood were statistically significant predictors
of relapse. Lessened vigilance, however, was a significant predictor. For each single-point
increase in lessened vigilance, there was a 1.19 times greater likelihood of maintaining sobriety.
Table 1
Logistic Regression Analysis of Relapse as a function of Unpleasant Mood, Euphoric Mood,
and Lessened Vigilance
95% Confidence Interval for Odds Ratio
Variables B Wald Odds Ratio Lower Upper
Unpleasant Mood -0.01 0.57 0.99 0.96 1.02
Euphoric Mood 0.01 0.25 1.01 0.96 1.07
Lessened Vigilance -0.18 7.94* 0.84 0.74 0.95
Constant 1.10 10.54 3.02
Note. Wald (df = 1).
*p < .05