Posted on 19-Jan-2016
Research planning
Planning v. evaluating research
To a large extent, the same thing
Plan a study so that it can yield data from which you could draw a relevant conclusion
Evaluate other studies to check that the conclusions they claim really do follow from their data
Summary
Quality of the research question
- link to previous theory (theories)
- precision
Design and ‘causal’ research questions
Power
Sample size
Effect size
Confidence intervals
Imaginary study
Research question
Do second year students have a ‘sweeter tooth’ than third year students?
• Give WSS to a sample of current y2 and y3 psychology students.
• Predict: My2 > My3
Any good as a research question?
Not a terribly good research question
Theoretically vacuous
why would we expect third years to lose their taste for sweet things?
what psychological theories are supposed to be relevant?
Could be made into a better question
Link the research question, in a specific and precise way, to previous research
The sugar-experience theory claims that as people acquire more memories, they develop a denser neural network. This density requires more sugar for energy and fuel.
The sugar-young theory claims that as people get older, they lose bits of brain stuff, and so the fuel requirements of the brain reduce. Consequently, sugar becomes less desirable.
Of course, it doesn’t have to be a neuropsychological theory
Causal conclusion?
Can’t make a causal conclusion
because:
quasi-experimental design
There may be other differences between second and third year students than just year of study
…so if the result is My2 > My3
Could be because loss of brain stuff due to ageing reduces need for sugar
Or, it could be that:
- larger class size drives you to sugar
- living on campus puts you off sugar
…
Or, we were unlucky, and it’s just one of the 5% of samples that give a significant result by chance even when there is no real effect…
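The "5% of samples" point can be checked with a quick simulation. A minimal sketch, assuming both year groups are drawn from the same normal population (so there is no real effect), a known SD of 1 and a two-tailed z-test at p < .05; the function name and the group size of 30 are hypothetical choices for illustration.

```python
import random
from statistics import NormalDist, fmean


def false_positive_rate(n_per_group=30, n_studies=4000, alpha=0.05, seed=1):
    """Share of simulated studies with p < alpha when there is NO true effect.

    Hypothetical setup: both groups come from the same N(0, 1) population and
    the SD (1) is treated as known, so a two-tailed z-test is used.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    se = (2 / n_per_group) ** 0.5                 # SE of a difference in means when SD = 1
    hits = 0
    for _ in range(n_studies):
        y2 = [rng.gauss(0, 1) for _ in range(n_per_group)]
        y3 = [rng.gauss(0, 1) for _ in range(n_per_group)]
        z = (fmean(y2) - fmean(y3)) / se
        if abs(z) > z_crit:
            hits += 1  # "significant" purely by chance
    return hits / n_studies


print(false_positive_rate())  # roughly 0.05
```

Under these assumptions about 1 in 20 studies "finds" a difference that is not there, which is exactly the unlucky case described above.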
Design of study limits conclusions
Experiment, with random allocation of participants to conditions
could allow a causal conclusion
Quasi-experiment, or correlational study
no causal conclusion yet
Directness of measures
Year of study (2 versus 3) is our IV
However, “Year” stands in for the amount of neural material (one hypothesis says it is lost, the other says it is gained)
Ideally, we would measure that directly.
Aim for the most direct measures you can get
What if there is no significant difference?
What can you conclude?
There really is no effect
There really is an effect, but we did not detect it because…
1. We were unlucky (again!)
2. Measures lack validity
3. Measures lack reliability
4. Sample size too small
Power
Probability that any particular (random) sample will produce a statistically significant effect, given that there really is an effect (of a particular size)
E.g. power = 0.9
90% chance of detecting an effect if there really is an effect
Researchers usually aim to have power at 80-90%
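For a simple two-group comparison, power can be approximated from the standardised effect size d (difference in means divided by the SD) and the group size. A minimal sketch, assuming equal groups, a known population SD and a two-tailed z-test, so it slightly overstates the power of a t-test in small samples; the function name and example values are hypothetical.

```python
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-tailed z-test comparing two equal groups.

    d: standardised effect size (difference in means / SD)
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = d * (n_per_group / 2) ** 0.5  # expected z statistic if the effect is real
    # Chance of landing beyond either critical value
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


for n in (20, 64, 100):
    print(n, round(approx_power(0.5, n), 2))
```

Under these assumptions, d = 0.5 with 64 participants per group gives power of roughly 0.8, at the bottom of the usual 80–90% target. With d = 0 the function returns alpha, since any "detection" is then a false positive.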
Making it easier to detect an effect
Test of F-ratio for ANOVA:
F = (effect we are interested in) / (error variance)
Effect size ↑
Reliability of measures ↑
Other sources of error ↓
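The F-ratio can be computed by hand for a one-way ANOVA to show that shrinking the error variance raises F even when the group means stay the same. A minimal sketch; the function name and the two toy data sets are hypothetical.

```python
from statistics import fmean


def one_way_f(groups):
    """F = between-groups mean square / within-groups (error) mean square."""
    all_scores = [x for g in groups for x in g]
    grand_mean = fmean(all_scores)
    k, n_total = len(groups), len(all_scores)
    # Effect we are interested in: spread of the group means around the grand mean
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Error variance: spread of scores around their own group mean
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


noisy = [[1, 2, 3], [4, 5, 6]]            # group means 2 and 5 ...
precise = [[1.5, 2, 2.5], [4.5, 5, 5.5]]  # ... same means, less error variance
print(one_way_f(noisy), one_way_f(precise))  # 13.5 54.0
```

The effect (difference between group means) is identical in both data sets; only the error variance changed, and F rose fourfold.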
Tip: power & ANOVA
Each effect in the ANOVA has its own power
E.g. 2 × 3 ANOVA
Main effect A
Main effect B
Interaction effect A * B
Tip: power is usually lower for interactions than for main effects
Power and sample size
All else being equal, to get more power you need more participants
Where “all else” means:
reliability of measures
other sources of error variance
p-value (significance level) chosen
the true size of the effect
Small samples
• Fewer repetitions of measurement – less reliability
• Anomalies can have more influence
More likely to be quirky
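A quick illustration of why anomalies matter more in small samples: one unusual score shifts the mean of a small sample much further than the mean of a large one. The scores below are hypothetical.

```python
from statistics import fmean

typical = 40   # hypothetical score shared by most participants
anomaly = 80   # one hypothetical "quirky" participant

small = [typical] * 4 + [anomaly]    # n = 5
large = [typical] * 49 + [anomaly]   # n = 50

# How far the single anomalous score drags each sample mean above 40
print(fmean(small) - typical)            # 8.0
print(round(fmean(large) - typical, 2))  # 0.8
```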
Sample size – ethical issues
Too small a sample
-- too little power to detect real effects
-- waste all participants’ time
Too large a sample
-- waste resources
-- waste the extra participants’ time
Sample size – practical issues
Resources
- Time
- Cost of running each participant
Availability
- Clinical populations are often small
- Access can take time & require permission
Choosing an appropriate sample size
Shortcut
Base sample size on previous research
(but make sure the previous research is of high quality!)
If you know (or can estimate) these…
effect size
variance of measures
…and you have chosen the power and p-value you want, you can work out what the sample size should be
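With those inputs, a standard normal-approximation formula for two equal groups gives the number needed per group: n = 2 × ((z for the p-value + z for the power) / d)², where d is the difference to detect divided by the SD. A minimal sketch under that approximation (exact t-based calculations give a slightly larger n); the function name and planning values are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(diff, sd, power=0.8, alpha=0.05):
    """Approximate participants needed in EACH of two equal groups.

    diff: smallest raw difference in means worth detecting (e.g. from earlier studies)
    sd:   expected SD of the measure (the "variance of measures")
    Normal approximation for a two-tailed test.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the chosen p-value
    z_power = z.inv_cdf(power)          # quantile for the desired power
    d = diff / sd                       # standardised effect size
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)


# Hypothetical planning values: a 2-point difference on a measure with SD 4
print(n_per_group(diff=2, sd=4))  # 63
```

Raising the desired power, lowering the p-value threshold, shrinking the difference to detect, or increasing the SD (less reliable measures) all push the answer up.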
Effect size
Do year 2 students like sweet things more than year 3 students?
Should we order more sugar for the café?
My2 = 42, My3 = 40
Effect size = 42 – 40 = 2
Statistical significance: p < .05
Practical (‘clinical’) significance: is there an effect that matters?
Significance level (p-value) & sample size
a very large sample can detect tiny effects
a small sample can miss even a large effect
A very small p (like p = .001) does not necessarily mean a large effect
Significance and effect size are different things
n = 3000, difference in mean WSS score = 0.1: p < .0001
n = 3, difference in mean WSS score = 3: p > .10
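The contrast above can be reproduced approximately. The slide does not give the SDs, so the values below are hypothetical, n is treated as the size of each group, and a z approximation is used instead of a t-test; the function name is also hypothetical.

```python
from statistics import NormalDist


def two_group_p(diff, sd, n_per_group):
    """Two-tailed p for a difference in means, using a z approximation.

    sd is a hypothetical SD of WSS scores within each group.
    """
    se = sd * (2 / n_per_group) ** 0.5  # standard error of the difference
    z = abs(diff) / se
    return 2 * (1 - NormalDist().cdf(z))


# Tiny difference, huge sample: significant
print(two_group_p(diff=0.1, sd=1, n_per_group=3000))
# Bigger difference, tiny sample: not significant
print(two_group_p(diff=3, sd=5, n_per_group=3))
```

For a sample as tiny as 3 per group the z approximation understates p, so a proper t-test would give an even larger p-value and the same conclusion.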
Standardised effect size
d = (M1 – M2) / σ
M1 and M2 are the respective population means; σ is an estimate of the population SD.
Values typically range 0 – 3
0.2 is "small"; 0.8 is a "large" effect (Cohen, 1977)
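To estimate d from data, the population means are replaced by sample means and σ by a pooled sample SD, which is one common choice (others exist, e.g. using a control group's SD). A minimal sketch; the function name and toy scores are hypothetical.

```python
from statistics import fmean, variance


def cohens_d(group1, group2):
    """Standardised effect size: difference in sample means / pooled SD.

    The pooled SD serves as the sample-based estimate of sigma.
    """
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (fmean(group1) - fmean(group2)) / pooled_var ** 0.5


print(cohens_d([4, 5, 6], [1, 2, 3]))  # 3.0
```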
Confidence intervals (CI)
p-value: is the difference significant?
CI
Is the difference significant?
What is the effect size?
How well have we estimated the difference?
Confidence interval
A range of plausible effect sizes, with the observed (best-estimate) effect size in the middle
CI95 = 2.37 (1.5 – 3.24)
A 95% CI corresponds to a test at the 5% level (p < .05)
If the interval includes 0, the difference is not statistically significant.
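The link between the interval and the significance test can be checked numerically. A minimal sketch using a normal approximation; the function name is hypothetical, and the standard errors are made up (the first is back-calculated so that the interval matches CI95 = 2.37 (1.5 – 3.24)).

```python
from statistics import NormalDist


def ci_and_p(diff, se, level=0.95):
    """Normal-approximation CI for a difference in means, plus its two-tailed p."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - (1 - level) / 2)  # about 1.96 for a 95% interval
    low, high = diff - z_crit * se, diff + z_crit * se
    p = 2 * (1 - z.cdf(abs(diff) / se))
    return low, high, p


# SE back-calculated (hypothetically) from the half-width 0.87 of CI95 = 2.37 (1.5 - 3.24)
low, high, p = ci_and_p(2.37, se=0.87 / 1.96)
print(round(low, 2), round(high, 2), p < 0.05)

# A made-up result whose interval includes 0
low0, high0, p0 = ci_and_p(0.5, se=0.5)
print(low0 < 0 < high0, p0 < 0.05)
```

Under this approximation, the 95% interval excludes 0 exactly when p < .05.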
The 95% confidence interval
The data are consistent with any value in this range
The wider the interval, the less precisely we have measured the effect, and the more uncertainty remains about the true effect size
CI95 = 2.37 (0.5 – 4.24)
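The width of the interval is driven by the standard error, which shrinks only with the square root of the sample size. A minimal sketch, assuming two equal groups and a known SD (the SD of 4 and the function name are hypothetical).

```python
from statistics import NormalDist


def ci95_width(sd, n_per_group):
    """Full width of a normal-approximation 95% CI for a difference in two means."""
    se = sd * (2 / n_per_group) ** 0.5
    return 2 * NormalDist().inv_cdf(0.975) * se


for n in (25, 100, 400):
    print(n, round(ci95_width(sd=4, n_per_group=n), 2))
```

Under these assumptions, quadrupling the sample size only halves the width of the interval.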
Summary
Quality of the research question
- link to previous theory (theories)
- precision
Design and ‘causal’ research questions
Power
Sample size
Effect size
Confidence intervals
These concepts are inter-related
Desired power ↑ → N ↑
Acceptable p-value ↓ → N ↑
Effect size to detect ↓ → N ↑
Reliability of measures ↓ → N ↑
Other error variance ↑ → N ↑