Power Winnifred Louis 15 July 2009. Overview of Workshop Review of the concept of power Review...

Power

Winnifred Louis

15 July 2009

Overview of Workshop

• Review of the concept of power
• Review of antecedents of power
• Review of power analyses and effect size calculations
• DL and discussion of write-up guide
• Intro to G*Power 3
• Examples of G*Power 3 usage


Power

Comes down to a “limitation” of the null hypothesis testing approach and its concern with decision errors.

Recall:
Significant differences are defined with reference to a criterion (a controlled/acceptable rate) for committing type-1 errors, typically .05
• the type-1 error: finding a significant difference in the sample when it actually doesn’t exist in the population
• type-1 error rate denoted α

However, relatively little attention has been paid to the type-2 error
• the type-2 error: finding no significant difference in the sample when there is a difference in the population
• type-2 error rate denoted β


Reality vs Statistical Decisions

Statistical decision   Reality: H0                            Reality: H1
Reject H0              “False alarm”: α (aka Type 1 error)    1 − β: Power
Retain H0              Hit (correct decision): 1 − α          “Miss”: β (aka Type 2 error)

power is:

• the probability of correctly rejecting a false null hypothesis
• the probability that the study will yield significant results if the research hypothesis is true
• the probability of correctly identifying a true alternative hypothesis

power

sampling distributions

• the distribution of a statistic that we would expect if we drew an infinite number of samples (of a given size) from the population
• sampling distributions have means and SDs
• can have a sampling distribution for any statistic, but the most common is the sampling distribution of the mean
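The idea of a sampling distribution can be illustrated by simulation. The sketch below is illustrative only: the population values (mean 100, SD 15), the sample size, and the function name are assumptions, not taken from the workshop. It draws many samples of a given size and looks at the mean and SD of the resulting sample means.

```python
import random
import statistics


def simulate_sample_means(pop_mean, pop_sd, n, n_samples=5000, seed=1):
    """Draw n_samples samples of size n from a normal population
    and return the list of their sample means."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [
        statistics.fmean(rng.gauss(pop_mean, pop_sd) for _ in range(n))
        for _ in range(n_samples)
    ]


# Hypothetical population: mean 100, SD 15; samples of n = 25.
means = simulate_sample_means(pop_mean=100, pop_sd=15, n=25)
print("mean of sample means:", round(statistics.fmean(means), 2))
print("SD of sample means:  ", round(statistics.stdev(means), 2))
print("sigma / sqrt(n):     ", 15 / 25 ** 0.5)
```

With a finite number of simulated samples the SD of the sample means should sit close to, but not exactly at, σ/√n.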

Recall: estimating pop means from sample means

Here: null hyp is true

[Figure: sampling distribution under H0: μ1 = μ2, with α/2 = .025 shaded in each tail (type I error)]

So if our test tells us that our sample difference between means falls into the shaded areas, we reject the null hypothesis. But, 5% of the time, we will do so incorrectly.

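Here: null hyp is false

[Figure: sampling distributions under H0: μ1 = μ2 and H1: μ1 ≠ μ2, centred on μ1 and μ2, with α/2 = .025 in each tail of the H0 distribution]

[Figure: the same two distributions with the critical value marked; to the right of this line we reject the null hypothesis. POWER: 1 − β]

[Figure: decision regions (Don’t Reject H0 vs Reject H0) under H0: μ1 = μ2 and H1: μ1 ≠ μ2]
• Correct decision, rejection of H0: 1 − β (POWER)
• type 1 error (α)
• type 2 error (β)
• Correct decision, acceptance of H0: 1 − α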

factors that influence power

1. α level

• remember the α level defines the probability of making a Type I error
• the α level is typically .05, but the α level might change depending on how worried the experimenter is about Type I and Type II errors
• the bigger the α, the more powerful the test (but the greater the risk of erroneously saying there’s an effect when there’s not ... Type I error)
• E.g., use a one-tailed test

factors that influence power: α level

[Figure: sampling distributions under H0: μ1 = μ2 and H1: μ1 ≠ μ2, showing the POWER region with α/2 = .025 in each tail and with α = .05]

2. the size of the effect (d)

• the effect size is not something the experimenter can (usually) control; it represents how big the effect is in reality (the size of the relationship between the IV and the DV)
• independent of N (population level)
• it stands to reason that with big effects you’re going to have more power than with small, subtle effects


factors that influence power: d

[Figure: sampling distributions under H0: μ1 = μ2 and H1: μ1 ≠ μ2, with α/2 = .025 in each tail]

3. sample size (N)

• the bigger your sample size, the more power you have
• large sample size allows small effects to emerge
• or … big samples can act as a magnifying glass that detects small effects
• you can see this when you look closely at formulas
• the standard error of the mean tells us how much on average we’d expect a sample mean to differ from a population mean just by chance. The bigger the N, the smaller the standard error, and smaller standard errors = bigger z scores

$z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$

$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}}$ (std err)
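To see the effect of N in the formulas, here is a small sketch of the standard error and the resulting z score. The numbers (σ = 15, a 3-point difference) and the function names are made up for illustration.

```python
import math


def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(N)."""
    return sigma / math.sqrt(n)


def z_score(sample_mean, mu, sigma, n):
    """z = (sample mean - population mean) / standard error."""
    return (sample_mean - mu) / standard_error(sigma, n)


# Hypothetical values: the same 3-point difference at three sample sizes.
for n in (10, 40, 160):
    se = standard_error(15, n)
    z = z_score(103, 100, 15, n)
    print(f"N = {n:3d}  std err = {se:.3f}  z = {z:.3f}")
```

Each fourfold increase in N halves the standard error and so doubles z for the same raw difference.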

4. smaller variance of scores in the population (σ²)

• small standard errors lead to more power. N is one thing that affects your standard error
• the other thing is the variance of the population (σ²)
• basically, the smaller the variance (spread) in scores, the smaller your standard error is going to be


factors that influence power: N & σ²

[Figure: sampling distributions under H0: μ1 = μ2 and H1: μ1 ≠ μ2, with α/2 = .025 in each tail]

outcomes of interest

• power determination
• N determination

α, effect size, N, and power related
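How the four quantities move together can be sketched with a normal (z-test) approximation for a two-group comparison of means. This is an approximation chosen for illustration, not the t-based calculation G*Power performs, and the function name and input values are assumptions. Population variance enters through d, since d is the mean difference divided by the SD.

```python
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05, two_tailed=True):
    """Approximate power for comparing two group means,
    treating the test statistic as normal (z-test approximation)."""
    z = NormalDist()
    shift = d * (n_per_group / 2) ** 0.5  # expected z when H1 is true
    if two_tailed:
        crit = z.inv_cdf(1 - alpha / 2)
        # probability of landing in either rejection region
        return (1 - z.cdf(crit - shift)) + z.cdf(-crit - shift)
    crit = z.inv_cdf(1 - alpha)
    return 1 - z.cdf(crit - shift)


print("baseline   ", round(approx_power(0.5, 25), 3))
print("alpha = .10", round(approx_power(0.5, 25, alpha=0.10), 3))
print("one-tailed ", round(approx_power(0.5, 25, two_tailed=False), 3))
print("d = .80    ", round(approx_power(0.8, 25), 3))
print("n = 100    ", round(approx_power(0.5, 100), 3))
```

Each change from the baseline (larger α, a one-tailed test, a bigger effect, a bigger sample) raises the approximate power.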

Effect sizes

Measures of group differences
• Cohen’s d (t-test)
• Cohen’s f (ANOVA)

Measures of association
• Partial eta-squared (ηp²)
• Eta-squared (η²)
• Omega-squared (ω²)
• R-squared (R²)

Classic 1988 text: in the library

Measures of difference - d

• When there are only two groups, d is the standardised difference between the two groups
• to calculate an effect size (d) you need to calculate the difference you expect to find between means and divide it by the expected standard deviation of the population
• conceptually, this tells us how many SDs apart we expect the populations (null and alternative) to be

$d = \frac{\mu_1 - \mu_0}{\sigma}$

$\hat{d} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{MS_{error}}}$
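As a quick sketch of the two versions of d (the population form and the sample estimate using the square root of MSerror as the SD), with made-up numbers and hypothetical function names:

```python
import math


def cohens_d(mu_alt, mu_null, sigma):
    """Population d: expected difference between means in SD units."""
    return (mu_alt - mu_null) / sigma


def d_hat(xbar_1, xbar_2, ms_error):
    """Sample estimate of d, using sqrt(MSerror) as the SD estimate."""
    return (xbar_1 - xbar_2) / math.sqrt(ms_error)


# Hypothetical inputs chosen so both come out at a "medium" effect.
print(cohens_d(mu_alt=10, mu_null=5, sigma=10))
print(d_hat(xbar_1=24, xbar_2=20, ms_error=64))
```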

Cohen’s conventions for d

Effect size   d     % overlap
Small         .20   85
Medium        .50   67
Large         .80   53

[Figure: overlap of distributions under H0: μ1 = μ2 and H1: μ1 ≠ μ2 for small, medium, and large effects]

Measures of association - Eta-Squared

• Eta squared is the proportion of the total variance in the DV that is attributed to an effect.

$\eta^2 = \frac{SS_{treatment}}{SS_{total}}$

• Partial eta-squared is the proportion of the leftover variance in the DV (after all other IVs are accounted for) that is attributable to the effect

$\eta_p^2 = \frac{SS_{treatment}}{SS_{treatment} + SS_{error}}$

• This is what SPSS gives you, but it is dodgy (overestimates the effect)
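The difference between the two measures only shows up when there is more than one effect in the design. The sketch below uses invented sums of squares for a 2-way design (factors A and B); the values and function names are assumptions for illustration.

```python
def eta_squared(ss_treatment, ss_total):
    """Proportion of total variance attributed to the effect."""
    return ss_treatment / ss_total


def partial_eta_squared(ss_treatment, ss_error):
    """Proportion of variance left over after other effects
    are removed that is attributable to the effect."""
    return ss_treatment / (ss_treatment + ss_error)


# Hypothetical sums of squares for a 2-way between-groups design.
ss = {"A": 40.0, "B": 30.0, "AxB": 10.0, "error": 120.0}
ss_total = sum(ss.values())

print("eta squared for A:        ", eta_squared(ss["A"], ss_total))
print("partial eta squared for A:", partial_eta_squared(ss["A"], ss["error"]))
```

Because the partial version drops the other effects from the denominator, it can never be smaller than eta squared for the same effect.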

Measures of association - Omega-squared

• Omega-squared is an estimate of the dependent variable population variability accounted for by the independent variable.
• For a one-way between groups design:

$\hat{\omega}^2 = \frac{(p-1)(F-1)}{(p-1)(F-1) + np}$

where p = number of levels of the treatment variable, F = the F value, and n = the number of participants per treatment level

$\omega^2 = \frac{SS_{effect} - (df_{effect})\,MS_{error}}{SS_{total} + MS_{error}}$

Measures of difference - f

• Cohen’s (1988) f for the one-way between groups analysis of variance can be calculated as follows

$\hat{f} = \sqrt{\frac{\hat{\omega}^2}{1 - \hat{\omega}^2}}$

• Or can use eta sq instead of omega
• It is an averaged standardised difference between the 3 or more levels of the IV (even though the above formula doesn’t look like that)
• Small effect: f = 0.10; Medium effect: f = 0.25; Large effect: f = 0.40

Measures of association - R-Squared

• R² is the proportion of variance explained by the model
• In general R² is given by

$R^2 = \frac{SS_{model}}{SS_{total}}$

• Can be converted to effect size f²: f² = R²/(1 − R²)
• Small effect: f² = 0.02; Medium effect: f² = 0.15; Large effect: f² = 0.35

Summary of effect conventions

From G*Power: http://www.psycho.uni-duesseldorf.de/aap/projects/gpower/user_manual/user_manual_02.html#input_val

estimating effect

• prior literature
• assessment of how great a difference is important
  • e.g., an effect on reading ability is only worth the trouble if it increases it by at least half a SD
• special conventions


side issues…

• recall the logic of calculating estimates of effect size (i.e., criticisms of significance testing)
  • the tradition of significance testing is based upon an arbitrary rule leading to a yes/no decision
• power illustrates further some of the caveats with significance testing
  • with a high N you will have enough power to detect a very small effect
  • if you cannot keep error variance low, a large effect may still be non-significant


side issues…

• on the other hand…
  • sometimes very small effects are important
  • by employing strategies to increase power you have a better chance at detecting these small effects


power

Common constraints:

Cell size too small
• B/c sample difficult to recruit or too little time / money

Small effects are often a focus of theoretical interest (especially in social / clinical / org)
• DV is subject to multiple influences, so each IV has small impact
• “Error” or residual variance is large, because many IVs unmeasured in experiment / survey are influencing DV
• Interactions are of interest, and interactions draw on smaller cell sizes (and thus lower power) than tests of main effects [Cell means for interaction are based on n observations, while main effects are based on n x # of levels of other factors collapsed across]

determining power

• sometimes, for practical reasons, it’s useful to try to calculate the power of your experiment before conducting it
• if the power is very low, then there’s no point in conducting the experiment
• basically, you want to make sure you have a reasonable shot at getting an effect (if one exists!)
• which is why grant reviewers want power calculations

Post hoc power calculations

• Generally useless / difficult to interpret from the point of view of stats
• Mandated within some fields
• Examples of post hoc power write-ups online at http://www.psy.uq.edu.au/~wlouis

G*POWER

• G*POWER is a FREE program that can make the calculations a lot easier
  http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/
• Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39, 175-191.
• G*Power computes:
  • power values for given sample sizes, effect sizes, and alpha levels (post hoc power analyses)
  • sample sizes for given effect sizes, alpha levels, and power values (a priori power analyses)
  • suitable for most fundamental statistical methods
• Note: some tests assume equal variance across groups and assume the pop SD is used (which is likely to be estimated from the sample)

Ok, let’s do it: BS t-test

• two random samples of n = 25
• expect difference between means of 5
• two-tailed test, α = .05
  – μ1 = 5
  – μ2 = 10
  – σ = 10

$d = \frac{\mu_2 - \mu_1}{\sigma} = \frac{10 - 5}{10} = .50$

G*POWER power calculations: example

• two random samples of n = 25
• expect difference between means of 5
• two-tailed test, α = .05
  – μ1 = 5
  – μ2 = 10
  – σ = 10

• So, with that expected effect size and n we get power = ~.41
• We will correctly reject the null hyp (if it is false) 41% of the time
• Is this good enough?
  • convention dictates that researchers should be entering into an experiment with no less than an 80% chance of getting an effect (presuming it exists) ~ power at least .80
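One way to make “41% of the time” concrete is to simulate the experiment many times and count how often the t-test rejects. The sketch below is a rough check, not what G*Power does internally: it assumes normal populations with the example’s values (means 5 and 10, SD 10, n = 25 per group) and hard-codes the two-tailed .05 critical t for df = 48 from standard t tables. The function and constant names are made up.

```python
import math
import random
import statistics

N_PER_GROUP = 25
T_CRIT_DF48 = 2.0106  # assumed: two-tailed .05 critical t for df = 48 (t tables)


def simulate_power(mu_1=5, mu_2=10, sigma=10, n_sims=4000, seed=1):
    """Share of simulated experiments in which a pooled-variance
    independent-samples t-test rejects H0 at alpha = .05."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    rejections = 0
    for _ in range(n_sims):
        g1 = [rng.gauss(mu_1, sigma) for _ in range(N_PER_GROUP)]
        g2 = [rng.gauss(mu_2, sigma) for _ in range(N_PER_GROUP)]
        # equal n, so the pooled variance is the average of the two
        pooled_var = (statistics.variance(g1) + statistics.variance(g2)) / 2
        se_diff = math.sqrt(pooled_var * 2 / N_PER_GROUP)
        t = (statistics.fmean(g2) - statistics.fmean(g1)) / se_diff
        if abs(t) > T_CRIT_DF48:
            rejections += 1
    return rejections / n_sims


print("simulated power:", simulate_power())
```

The simulated proportion should land near the ~.41 reported above, with some run-to-run noise from the finite number of simulations.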

determining N

Determine n
• Calculate effect size
• Use power of .80 (convention)
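For a rough feel of the n needed, the two steps above can be sketched with the usual normal-approximation formula for two independent groups, n per group ≈ 2((z for α/2 + z for power)/d)². This is an approximation chosen for illustration; a t-based calculation such as G*Power’s will typically come out slightly larger. The function name is an assumption.

```python
import math
from statistics import NormalDist


def approx_n_per_group(d, power=0.80, alpha=0.05):
    """Approximate n per group for a two-tailed comparison of two
    independent means (normal approximation, rounded up)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)


for d in (0.2, 0.5, 0.8):  # Cohen's small, medium, large
    print("d =", d, "-> n per group about", approx_n_per_group(d))
```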

WS t-test

• Within subjects designs are more powerful than between subjects (control for individual differences)
• WS t-test not very difficult in G*Power, but becomes trickier in ANOVA
• Need to know correlation between timepoints (luckily SPSS paired t gives this)
• Or can use the mean and SD of “difference” scores (also in SPSS output)

[Screen clipping of output, taken 7/8/2008, 4:30 PM]

Method 1: Difference scores

dz = Mean Diff / SD diff = .0167/.0718 = .233
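Both routes to dz can be sketched in a few lines. The first function reproduces the difference-score calculation above; the second uses the standard formula for the SD of a difference between two correlated measures, with invented means, SDs, and correlation. Function names are hypothetical.

```python
import math


def dz_from_differences(mean_diff, sd_diff):
    """dz from the mean and SD of the difference scores."""
    return mean_diff / sd_diff


def dz_from_correlation(mean_1, mean_2, sd_1, sd_2, r):
    """dz from the two timepoints' means, SDs, and their correlation."""
    sd_diff = math.sqrt(sd_1 ** 2 + sd_2 ** 2 - 2 * r * sd_1 * sd_2)
    return (mean_1 - mean_2) / sd_diff


print(round(dz_from_differences(0.0167, 0.0718), 3))  # values from the slide
# Hypothetical timepoint summary: means 12 and 10, SDs of 4, r = .5
print(dz_from_correlation(12, 10, 4, 4, 0.5))
```

A higher correlation between timepoints shrinks the SD of the differences, which makes dz larger for the same mean difference.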



WS t-test

• I said before that WS are more powerful than the equivalent BS version
• Let’s test this by using the same means and SDs and using the Independent Samples t-test calculator in G*Power

Between subjects: Power = .18
Within subjects: Power = .07


Extension to 1-way anova…

• In PSYC3010 you used Phi prime as the ANOVA equivalent of d, which is the same as Cohen’s f
• G*Power uses Cohen’s f
• Numerous methods
  1) Calculate Omega sq and then use the formula for f and enter directly
  2) Calculate Omega sq or eta sq and enter into “Direct” under “Effect size from variances”
  3) Use means and use “Effect size from means”

$\hat{\omega}^2 = \frac{(p-1)(F-1)}{(p-1)(F-1) + np}$

$\hat{f} = \sqrt{\frac{\hat{\omega}^2}{1 - \hat{\omega}^2}}$

ANOVA: PTSD Severity

                 SS        df   Mean Square   F       Sig.
Between Groups   507.84    3    169.28        3.269   0.030
Within Groups    2278.74   44   51.7895
Total            2786.58   47

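Calculating omega & f

Given the above analysis (p = 4 groups, n = 12 per group, F = 3.269):

$\hat{\omega}^2 = \frac{(p-1)(F-1)}{(p-1)(F-1) + np} = \frac{(4-1)(3.269-1)}{(4-1)(3.269-1) + (12)(4)} = 0.124$

So

$\hat{f} = \sqrt{\frac{\hat{\omega}^2}{1 - \hat{\omega}^2}} = \sqrt{\frac{0.124}{1 - 0.124}} \approx 0.38$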
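The same arithmetic can be checked in code using the values from the PTSD Severity ANOVA table. This is only a sketch; the function names are made up. It computes omega squared both from F and from the sums of squares, then converts to f.

```python
import math


def omega_sq_from_f(f_value, p, n):
    """Omega squared from F, p levels, and n participants per level."""
    num = (p - 1) * (f_value - 1)
    return num / (num + n * p)


def omega_sq_from_ss(ss_effect, df_effect, ms_error, ss_total):
    """Omega squared from the sums of squares in the ANOVA table."""
    return (ss_effect - df_effect * ms_error) / (ss_total + ms_error)


def cohens_f(variance_explained):
    """Cohen's f from omega squared (or eta squared)."""
    return math.sqrt(variance_explained / (1 - variance_explained))


w2_f = omega_sq_from_f(3.269, p=4, n=12)
w2_ss = omega_sq_from_ss(507.84, 3, 51.7895, 2786.58)
print("omega sq from F: ", round(w2_f, 3))
print("omega sq from SS:", round(w2_ss, 3))
print("Cohen's f:       ", round(cohens_f(w2_f), 2))
```

Both routes give an omega squared of about .124, and f comes out at about .38, just under the “large” convention of .40.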

Not sure if this works with SPSS partial eta sq: have had problems before, & Omega is more conservative anyway


Alternatively

• Alternatively, if have means (note: this is a different data set)

               mean DV score   n
Coffee         63.75           16
Energy Drink   64.69           16
Water          46.56           16

MSerror = 125.21; grand mean = 58.33; N = 48

• use the square root of MSE to enter into “SD within each group” in G*Power
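For reference, f can also be worked out by hand from these means using the usual definition f = σm/σ, where σm is the SD of the group means around the grand mean and σ is estimated by the square root of MSerror. This sketch assumes that definition is what the “Effect size from means” route computes; the function name is made up.

```python
import math


def f_from_means(means, ns, ms_error):
    """Cohen's f = SD of group means (weighted by n) / within-group SD."""
    n_total = sum(ns)
    grand_mean = sum(m * n for m, n in zip(means, ns)) / n_total
    var_means = sum(n * (m - grand_mean) ** 2 for m, n in zip(means, ns)) / n_total
    return math.sqrt(var_means) / math.sqrt(ms_error)


means = [63.75, 64.69, 46.56]  # Coffee, Energy Drink, Water
ns = [16, 16, 16]
print("SD within each group:", round(math.sqrt(125.21), 2))
print("Cohen's f:           ", round(f_from_means(means, ns, 125.21), 3))
```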


how about 2-way factorial anova?

• Need to test for 3 effects to estimate the power:
  • Main effect IV 1
  • Main effect IV 2
  • Interaction effect (usually less power than main effects due to smaller n in each cell)
• See http://www.psycho.uni-duesseldorf.de/aap/projects/gpower/reference/reference_manual_07.html

Within subjects ANOVA

• Not only need to know effect size but also correlation across time/vars
  • Use a convention for estimating effect size (G*Power uses either Lambda or Cohen’s f)
  • Calculate f using number of levels, effect convention, correlation (e.g., test-retest)
  • Calculate Lambda (f * N)
  • Use Generic F test

Within Example

• 3 levels over time (m)
• 64 Participants (n)
• Look for small effect (f = .01)
• Test-retest corr = .79 (p)
• Calc f = (m*f)/(1-p) = (3*.01)/(1-.79) = .143
• Calc Lambda = f*n = .143*64 = 9.152
• DF 1 = m - 1 = 2
• DF 2 = n*(m-1) = 128

Note. Can’t do a priori. If need to estimate upfront, play with denominator DF (based on N)
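The recipe in this example can be written out as a few lines of code. The sketch simply follows the slide’s own steps and symbols (m, n, f, p); it is not a general-purpose method, and the function name is made up. Without intermediate rounding, Lambda comes out at about 9.14 rather than 9.152.

```python
def within_example(m, n, f_convention, corr):
    """Follow the slide's steps: adjust f for the number of levels and
    the test-retest correlation, then compute Lambda and the two DFs."""
    f_adjusted = (m * f_convention) / (1 - corr)
    lam = f_adjusted * n
    df1 = m - 1
    df2 = n * (m - 1)
    return f_adjusted, lam, df1, df2


f_adj, lam, df1, df2 = within_example(m=3, n=64, f_convention=0.01, corr=0.79)
print("adjusted f:", round(f_adj, 3))
print("Lambda:    ", round(lam, 3))
print("DF 1, DF 2:", df1, df2)
```

These are the values that would then be entered into the Generic F test.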

Within Example

• Refer to Karl Wuensch’s website for more details re: RM
  http://core.ecu.edu/psyc/wuenschk/StatsLessons.htm
• And G*Power manuals online, e.g.: http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/user-guide-type_of_power_analysis

Regression analyses

• Effect size associated with R²: f² = R²/(1 − R²)
• For semipartial: f² = sr²/(1 − R²full)
• f² = .02 (small); f² = .15 (medium); f² = .35 (large)
• Convert to variance accounted for: f²/(1 + f²)

R²

• 3 predictor variables
• R² for full model = .22
• f² = .22/(1 − .22) = .282
• N = 110

Change R² (HMR)

• 2 steps, 2 predictors in step 1, 3 in step 2
• R² for full model = .10
• Change R² for step 2 = .04
• f² = R²change/(1 − R²full)
• f² = .04/(1 − .1) = .0444
• N = 95
• DF numerator for Step 2 = 3
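The regression effect sizes above are simple enough to check in code. This sketch reproduces the two worked values and the conversion back to variance accounted for; the function names are hypothetical.

```python
def f2_from_r2(r2):
    """f squared for a full model: R^2 / (1 - R^2)."""
    return r2 / (1 - r2)


def f2_for_change(r2_change, r2_full):
    """f squared for a change in R^2: R^2 change / (1 - R^2 full)."""
    return r2_change / (1 - r2_full)


def r2_from_f2(f2):
    """Convert f squared back to variance accounted for."""
    return f2 / (1 + f2)


print("full model, R2 = .22: f2 =", round(f2_from_r2(0.22), 3))
print("step 2, change = .04: f2 =", round(f2_for_change(0.04, 0.10), 4))
print("back to R2:               ", round(r2_from_f2(f2_from_r2(0.22)), 2))
```

Against the conventions, the full-model f² sits between medium (.15) and large (.35), while the step 2 change is between small (.02) and medium.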

Complex analyses

• G*POWER useful for basic analyses
• Complex analyses (e.g., SEM, MLM etc.) usually look to Monte Carlo studies

Additional Resources

• http://www.danielsoper.com/statcalc/
  Some other statistical calculators, including for power
