Methodological workshop How to get it right: why you should think twice before planning your next study
Luigi Lombardi
Dept. of Psychology and Cognitive Science, University of Trento
May 20 2016 Luigi Lombardi – Power analysis and some of its extensions
Part 1
The power algebra 1
The Neyman-Pearson paradigm (N-H)
The N-H table
Rejecting H0 when H0 is true is a Type I error (α); failing to reject H0 when H0 is false is a Type II error (β); rejecting H0 when H0 is false has probability 1 − β, the power.
Probabilistic interpretation: power = P(reject H0 | HA is true) = 1 − β, with α = P(reject H0 | H0 is true) and β = P(fail to reject H0 | HA is true).
Graphical interpretation
Decision rule in the N-H approach
Power analysis is based on four different parameters:
Power (population level)
Type I error α (population level)
Effect size (population level)
Sample size (hypothetical)
Effect size (population level): the parameter defining HA; it represents the degree of deviation from H0 in the underlying population.
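The algebra linking these four quantities can be made concrete. Below is a minimal base-R sketch of my own (not from the slides), using the normal approximation to a one-sided one-sample t-test; `power_one_sample` is a hypothetical helper name:

```r
# Sketch: power as a function of alpha, effect size d, and sample size n,
# under the normal approximation to the one-sided one-sample t-test.
power_one_sample <- function(d, n, alpha = 0.05) {
  z_crit <- qnorm(1 - alpha)   # rejection threshold under H0
  ncp <- d * sqrt(n)           # shift of the test statistic under HA
  1 - pnorm(z_crit - ncp)      # P(reject H0 | HA true)
}
power_one_sample(d = 0.2, n = 181, alpha = 0.05)  # approximately 0.85
```

Fixing any three of the four quantities determines the fourth, which is exactly what the pwr examples below exploit by setting the remaining argument to NULL.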
A priori power analysis
A priori power analysis: an example using the pwr package
One-sample t-test, H0: μ = 0 vs HA: μ > 0. With α = 0.05, target power = 0.85, and effect size d = 0.2, the required sample size is n ≈ 181.09.
pwr.t.test(d=0.2, power=0.85, sig.level=0.05, n=NULL, type="one.sample", alternative="greater")
R syntax
One-sample t test power calculation
n = 181.0934
d = 0.2
sig.level = 0.05
power = 0.85
alternative = greater
R output
Post hoc power analysis
Post hoc power analysis: an example using the pwr package
One-sample t-test, H0: μ = 0 vs HA: μ > 0. With α = 0.05, d = 0.2, and n = 60, the achieved power is ≈ 0.454.
pwr.t.test(d=0.2, n=60, sig.level=0.05, power=NULL, type="one.sample", alternative="greater")
R syntax
One-sample t test power calculation
n = 60
d = 0.2
sig.level = 0.05
power = 0.4548365
alternative = greater
R output
Sensitivity analysis
Sensitivity analysis: an example using the pwr package
One-sample t-test, H0: μ = 0 vs HA: μ > 0. With α = 0.05, n = 50, and target power = 0.90, the detectable effect size is d ≈ 0.419.
pwr.t.test(n=50, power=0.9, sig.level=0.05, d=NULL, type="one.sample", alternative="greater")
R syntax
One-sample t test power calculation
n = 50
d = 0.4197092
sig.level = 0.05
power = 0.9
alternative = greater
R output
Criterion analysis
Criterion analysis: an example using the pwr package
One-sample t-test, H0: μ = 0 vs HA: μ > 0. With n = 100, d = 0.3, and power = 0.90, the implied significance level is α ≈ 0.044.
pwr.t.test(n=100, d=0.3, power=0.9, sig.level=NULL, type="one.sample", alternative="greater")
R syntax
One-sample t test power calculation
n = 100
d = 0.3
sig.level = 0.04489474
power = 0.9
alternative = greater
R output
The power algebra: the power fallacy 1
Observed power analysis
The effect size (at the population level) is replaced with the observed effect size d (at the sample level).
The basic idea of observed power analysis is that there is evidence for the null hypothesis being true if p > α and the computed power is high at the observed effect size d.
Note that d is not a theoretical (hypothetical) value: it is estimated from the sample according to the theoretical model for the null hypothesis, and it is biased!
Observed power analysis – hypothetical derivations
Basic power analysis claim:
(p > α) AND (power is high) entails «evidence for H0 is high»
Some 'derivations': NOT[(p > α) AND (power is high)] iff NOT(p > α) OR NOT(power is high)
1. NOT(p > α) AND (power is high) entails ??
2. (p > α) AND NOT(power is high) entails ??
3. NOT(p > α) AND NOT(power is high) entails ??
Observed power analysis – hypothetical derivations
Some interpretations:
(p > α) AND NOT(power is high) entails «evidence for H0 is weak»
The underlying idea is: if we increase the sample size, then we raise the power, and we can probably reject H0!
However, some of these interpretations lead us to a paradox!
There is a negative monotonic relationship between observed power and the p-value! That is to say, because of the one-to-one relationship between p-values and observed power, nonsignificant p-values always correspond to low observed powers. Hence, we will never observe nonsignificant p-values corresponding to high observed powers: the main claim is nonsense!
The relationship between observed power and p-value: a simulation study
library(pwr)  # for pwr.t.test

n <- 50
mu0 <- 0
sigma <- 1
B <- 2000
simPv <- rep(0,B)
simPw <- rep(0,B)
for (b in 1:B) {
  X <- rnorm(n, mu0, sigma)
  dobs <- mean(X)/sd(X)  # observed effect size at the sample level
  simPv[b] <- t.test(X)$p.value
  simPw[b] <- pwr.t.test(d=dobs, n=n, sig.level=0.05, power=NULL,
                         type="one.sample", alternative="two.sided")$power
}
plot(simPv, simPw, ylab="Observed power", xlab="p-value")
R syntax
One-sample t-test: H0: μ0 = 0 (simulation study)
Computing observed effect sizes 2
Observed effect sizes allow one to compute the magnitude of an effect of interest. They can be understood as estimates of differences between groups or of the strength of associations between variables.
Widely used examples of observed effect sizes are:
Different typologies of d measures (Cohen, 1988; Hedges, 1981; Rosenthal, 1994; Dunlap et al., 1996), quantifying differences between groups
Association measures such as, for example, the correlation r, quantifying the association between quantitative variables
Observed effect size for comparing two independent groups
Observed effect size for comparing two independent groups with t values
Note: this is a transformation index.
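The transformation formula itself sits in an image that did not survive the transcript; judging from the R code later in these slides, it appears to be d = t (n1 + n2) / (sqrt(n1 n2) sqrt(df)). A base-R check of my own (made-up data and group sizes) compares it with the direct pooled-SD computation:

```r
# Sketch: recovering Cohen's d from an independent-groups t value.
set.seed(1)
n1 <- 40; n2 <- 40
g1 <- rnorm(n1, 0.5, 1)   # group 1, true mean 0.5
g2 <- rnorm(n2, 0.0, 1)   # group 2, true mean 0
t_val <- unname(t.test(g1, g2, var.equal = TRUE)$statistic)
df <- n1 + n2 - 2
d_from_t <- t_val * (n1 + n2) / (sqrt(n1 * n2) * sqrt(df))
# direct computation via the pooled standard deviation
sp <- sqrt(((n1 - 1) * var(g1) + (n2 - 1) * var(g2)) / df)
d_direct <- (mean(g1) - mean(g2)) / sp
c(d_from_t, d_direct)  # agree up to a factor sqrt((n1 + n2)/df)
```

For moderate samples the two values are practically identical.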
Observed effect size for comparing two dependent groups with t values
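The corresponding formula is also an image; a standard version of this conversion relates the standardized difference score to the paired t value as d = t/sqrt(n). A base-R sketch of my own (made-up data) verifies the identity:

```r
# Sketch: for paired data, the standardized difference score satisfies
# d = t / sqrt(n) exactly, where t is the paired t statistic.
set.seed(2)
n <- 30
pre  <- rnorm(n, 10, 2)
post <- pre + rnorm(n, 0.8, 1)   # true mean change 0.8
t_val <- unname(t.test(post, pre, paired = TRUE)$statistic)
d_from_t <- t_val / sqrt(n)
d_direct <- mean(post - pre) / sd(post - pre)
c(d_from_t, d_direct)  # identical up to floating point error
```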
Conversion formulae
Note, however, that conversions may introduce some bias.
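The formulae themselves are images in the original slides; the standard conversions between d and r (for equal group sizes) are r = d/sqrt(d^2 + 4) and d = 2r/sqrt(1 − r^2), sketched here in base R:

```r
# Sketch: standard conversions between d and r (equal group sizes).
d_to_r <- function(d) d / sqrt(d^2 + 4)
r_to_d <- function(r) 2 * r / sqrt(1 - r^2)
r <- d_to_r(0.8)   # d = 0.8 corresponds to r of about 0.37
r_to_d(r)          # converting back recovers d = 0.8
```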
Observed effect size derived from regression models
In general, it is always possible to obtain t values from a regression model for each continuous predictor variable and also for each group (level) of a categorical predictor variable (specifically, for each of its recoded dummy variables). These t values convert to a d measure for a categorical predictor and to an r measure for a continuous predictor, where n1 and n2 are the sample sizes for the two groups and df denotes the degrees of freedom used for the associated t value in the linear model.
Deriving approximate confidence intervals (CIs) for effect sizes
In general, computing approximate CIs for effect sizes is not an easy task, as the equations usually vary according to the selected effect size index and the way it has been derived from the specific statistical analysis. A general equation for the 95% CI of an effect size ES is:
ES ± 1.96 · se(ES)
The main problem regards the way we compute the asymptotic standard error (se). A better way may be to use a parametric bootstrap approach to derive empirical CIs for effect sizes.
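As a concrete instance of the general equation, here is a hedged base-R sketch of a Wald-type 95% CI for Cohen's d between two independent groups, using the standard large-sample standard error (the function name `ci_d` and the inputs are my own illustration):

```r
# Sketch: Wald-type approximate 95% CI for Cohen's d between two
# independent groups, ES +/- z * se, with the large-sample
# se^2 = (n1 + n2)/(n1 * n2) + d^2 / (2 * (n1 + n2)).
ci_d <- function(d, n1, n2, conf = 0.95) {
  se <- sqrt((n1 + n2) / (n1 * n2) + d^2 / (2 * (n1 + n2)))
  z  <- qnorm(1 - (1 - conf) / 2)
  c(lower = d - z * se, upper = d + z * se)
}
ci_d(0.5, 50, 50)
```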
beta0 <- 0
beta1 <- 0.5
beta2 <- -2.0
n <- 100
x1 <- rnorm(n, 10, 5)
a <- c(rep("a1", n/2), rep("a2", n/2))
x2 <- c(rep(0, n/2), rep(1, n/2))
y <- beta0 + beta1*x1 + beta2*x2 + rnorm(n, 0, 4)
plot(x1, y)
plot(x2, y)
boxplot(y ~ a)
MR <- lm(y ~ x1 + a)
summary(MR)
# effect size for the categorical variable a - second level (a2)
d <- (summary(MR)$coefficients[3,3]*n)/(sqrt((n/2)^2)*sqrt(MR$df))
# effect size for the quantitative variable x1
r <- summary(MR)$coefficients[2,3]/sqrt(summary(MR)$coefficients[2,3]^2 + MR$df)
d
r
R syntax (…)
Multiple regression model: 1 quant. predictor + 1 categ. predictor (simulation study)
# Parametric bootstrap for approximate 95% CIs for effect sizes #####
# number of simulations: B
B <- 500
dSim <- rep(0,B)
rSim <- rep(0,B)
for (b in 1:B) {
  YS <- simulate(MR,1)[,1]
  MS <- lm(YS ~ x1 + a)
  # absolute effect size
  dSim[b] <- abs(summary(MS)$coefficients[3,3]*n)/(sqrt((n/2)^2)*sqrt(MS$df))
  rSim[b] <- summary(MS)$coefficients[2,3]/sqrt(summary(MS)$coefficients[2,3]^2 + MS$df)
}
par(mfrow=c(1,2))
plot(density(dSim), main="Distribution for simulated |d|")
hist(dSim, freq=F, add=T)
plot(density(rSim), main="Distribution for simulated r")
hist(rSim, freq=F, add=T)
quantile(dSim, probs=c(0.025,0.975))
quantile(rSim, probs=c(0.025,0.975))
R syntax (end)
Multiple regression model: 1 quant. predictor + 1 categ. predictor (simulation study)
95% CI for |d|: [0.508, 1.357]
95% CI for r: [0.368, 0.654]
Multiple regression model: 1 quant. predictor + 1 categ. predictor (simulation study)
beta0 <- 0
beta1 <- 0.5
beta2 <- -2.0
n <- 100
x1 <- rnorm(n, 10, 5)
a <- c(rep("a1", n/2), rep("a2", n/2))
x2 <- c(rep(0, n/2), rep(1, n/2))
muL <- beta0 + beta1*x1 + beta2*x2 # linear predictor
piS <- exp(muL)/(1+exp(muL)) # inverse logit transformation of muL
y <- rbinom(n, 40, piS) # generate binomial counts, upper bound = 40
plot(x1[a=="a1"], y[a=="a1"], xlab="x1", ylab="y")
points(x1[a=="a2"], y[a=="a2"], pch=3)
MR <- glm(cbind(y, 40-y) ~ x1 + a, family='binomial')
summary(MR)
df <- 97 # as if t-tests were used
# effect size for the categorical variable a - second level (a2)
d <- (summary(MR)$coefficients[3,3]*n)/(sqrt((n/2)^2)*sqrt(df))
# effect size for the quantitative variable x1
r <- summary(MR)$coefficients[2,3]/sqrt(summary(MR)$coefficients[2,3]^2 + df)
d
r
R syntax (…)
Multiple logistic regression model: 1 quant. predictor + 1 categ. predictor (simulation study)
# Parametric bootstrap for approximate 95% CIs for effect sizes #####
B <- 500
dSim <- rep(0,B)
rSim <- rep(0,B)
for (b in 1:B) {
  YS <- simulate(MR,1)[,1]
  MS <- glm(YS ~ x1 + a, family='binomial')
  # absolute effect size
  dSim[b] <- abs(summary(MS)$coefficients[3,3]*n)/(sqrt((n/2)^2)*sqrt(df))
  rSim[b] <- summary(MS)$coefficients[2,3]/sqrt(summary(MS)$coefficients[2,3]^2 + df)
}
par(mfrow=c(1,2))
plot(density(dSim), main="Distribution for simulated |d|")
hist(dSim, freq=F, add=T)
plot(density(rSim), main="Distribution for simulated r")
hist(rSim, freq=F, add=T)
quantile(dSim, probs=c(0.025,0.975))
quantile(rSim, probs=c(0.025,0.975))
R syntax (end)
Multiple logistic regression model: 1 quant. predictor + 1 categ. predictor (simulation study)
95% CI for |d|: [2.318, 2.767]
95% CI for r: [0.908, 0.917]
Multiple logistic regression model: 1 quant. predictor + 1 categ. predictor (simulation study)
Multiple logistic regression model: 1 quant. predictor + 1 categ. predictor (simulation study)
For glm (generalized linear models), the t values must be replaced with z values; however, the degrees of freedom should be computed as if t-tests were used.
Cautionary note: when using glm models to derive effect sizes, the amount of bias incurred by the above modified equations is uncertain.
Beyond power calculations 3
One of the main problems of standard power analysis is its narrow emphasis on statistical significance, which is the primary focus of many study designs. However, in noisy, small-sample settings, statistically significant results can often be misleading. This is particularly true when observed power analysis is used to evaluate statistical results.
A better approach would be Design Analysis (DA): a set of statistical calculations about what could happen under hypothetical replications of a study, focusing on estimates and uncertainties rather than on statistical significance.
In a sense, this work represents a conceptual «bridge» linking the frequentist approach with a more Bayesian-oriented perspective.
DA main tokens
The observed effect
The true population effect D
The standard error (SE) s of the observed effect
The Type I error α
A hypothetical normally distributed random variable with parameters D and s (note: this constitutes a conceptual leap)
The main goals are to compute the power, the Type S (sign) error rate, and the exaggeration ratio (Type M error), Φ and dc being the cumulative standard normal distribution and the critical value for the effect size, respectively.
Gelman & Carlin (2014), p. 644
retrodesign <- function(A, s, alpha=.05, df=Inf, n.sims=10000){
  z <- qt(1-alpha/2, df)              # critical value of the test statistic
  p.hi <- 1 - pt(z - A/s, df)         # P(estimate significantly positive)
  p.lo <- pt(-z - A/s, df)            # P(estimate significantly negative)
  power <- p.hi + p.lo
  typeS <- p.lo/power                 # P(wrong sign | significant), for A > 0
  estimate <- A + s*rt(n.sims, df)    # simulated replication estimates
  significant <- abs(estimate) > s*z
  exaggeration <- mean(abs(estimate)[significant])/A  # Type M error
  return(list(power=power, typeS=typeS, exaggeration=exaggeration))
}
R function: Gelman & Carlin (2014), p. 644
A simple example: linear regression
Call:
lm(formula = y ~ x)
Residuals:
Min 1Q Median 3Q Max
-15.1642 -4.7063 -0.9168 5.5848 15.6263
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.6061 3.9588 -0.153 0.879
x 2.1792 0.3697 5.894 7.96e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 7.779 on 38 degrees of freedom
Multiple R-squared: 0.4776, Adjusted R-squared: 0.4638
F-statistic: 34.74 on 1 and 38 DF, p-value: 7.955e-07
R syntax
Simple regression with lm()
> retrodesign(1, 0.3697, df=38)
$power
[1] 0.7498592
$typeS
[1] 2.054527e-05
$exaggeration
[1] 1.161278
R output
Design Analysis with true population effect D = 1
> retrodesign(0.5, 0.3697, df=38)
$power
[1] 0.2536931
$typeS
[1] 0.003356801
$exaggeration
[1] 1.962419
R output
Design Analysis with true population effect D = 0.5
5000 simulated samples with 20 observations each from a normal distribution with parameters D = 0.5 and s = 0.9.
Percentage of significant results (rejecting μ = 0): 39.7
Percentage of sample means > D: 32.3
Gelman & Carlin (2014), p. 644
Type S error as a function of Power
Gelman & Carlin (2014), p. 644
Exaggeration ratio as a function of Power
Practical implications:
Design Analysis strongly suggests larger sample sizes than those commonly used in psychology. In particular, if the sample size is too small in relation to the true effect size, then what appears to be a win (statistical significance) may really be a loss (in the form of a claim that does not replicate).
For a more formal presentation of the DA approach, see Gelman, A. & Tuerlinckx, F. (2000). Type S error rates for classical and Bayesian single and multiple comparison procedures. Computational Statistics, 15, 373–390.
Fake data analysis 4
Fake data analysis: the SGR approach
SGR = Sample Generation by Replacement (Lombardi & Pastore, 2012; Pastore & Lombardi, 2014; Lombardi & Pastore, 2014; Lombardi et al., 2015)
SGR is a data simulation procedure that makes it possible to generate artificial samples of fake discrete/ordinal data. SGR can be used to quantify uncertainty in inferences based on possible fake data, as well as to evaluate the implications of fake data for statistical results. For example, how sensitive are the results to possible fake data? Are the conclusions still valid under one or more scenarios of faking manipulations?
Some «examples»
The SGR logic
The underlying process is usually not directly observable; what is observable is the information (data).
The replacement distribution: original value d, replaced value f
Other examples of replacement distribution
The sgr package (The R Journal, 6(1), 164-177)
The sgr package is available on the CRAN repository
Effect of faking on two items that are originally not correlated (n=50): Spearman correlation as a function of the proportion of subjects with fake responses (faking-good type)
Effect of faking on two items that are originally not correlated (n=100): Spearman correlation as a function of the proportion of subjects with fake responses (faking-good type)
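The sgr-based figures did not survive the transcript; the following base-R sketch of my own (deliberately not using the sgr package API) reproduces the qualitative effect: replacing the responses of a proportion of subjects with the maximum category (an extreme faking-good scenario) inflates the correlation between two originally uncorrelated items.

```r
# Sketch: extreme faking-good replacement on two independent 5-point items.
set.seed(3)
n <- 200
item1 <- sample(1:5, n, replace = TRUE)
item2 <- sample(1:5, n, replace = TRUE)
cor_honest <- cor(item1, item2, method = "spearman")
fakers <- sample(n, size = 0.4 * n)   # 40% of subjects fake both items
item1f <- item1; item2f <- item2
item1f[fakers] <- 5                   # replaced value: maximum category
item2f[fakers] <- 5
cor_faked <- cor(item1f, item2f, method = "spearman")
c(honest = cor_honest, faked = cor_faked)  # faking inflates the correlation
```

The sgr package generalizes this idea with probabilistic replacement distributions instead of the deterministic ceiling used here.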
SGR makes it possible to test and compare different fake data hypotheses (models), including more complex factorial models, and to evaluate their effect on goodness-of-fit (g.o.f.) statistics.
Thank you for your attention!
visit the WS website at http://polorovereto.unitn.it/~luigi.lombardi/WS2016.html