Hypothesis Testing
Introduction to Study Skills & Research Methods (HL10040)
Dr James Betts
Date posted: 15-Jan-2016
Lecture Outline:
• What is Hypothesis Testing?
• Hypothesis Formulation
• Statistical Errors
• Effect of Study Design
• Test Procedures
• Test Selection
Statistics
• Descriptive: organising, summarising & describing data
• Inferential: generalising (significance, sampling error)
• Correlational: relationships
Statistics
The dependent variable (DV) measured in the sample (n) can be generalised to the population (N).
Effective sampling is essential to correctly generalise back to our target population.
What is Hypothesis Testing?
Null Hypothesis: A = B
Alternative Hypothesis: A ≠ B
We also need to establish:
1) How unequal are these observations?
2) Are these observations reflective of the general population?
Example Hypotheses: Isometric Torque
• Is there any difference in the length of time that males and females can sustain an isometric muscular contraction?
Null Hypothesis: ♂ = ♀
Alternative Hypothesis: ♂ ≠ ♀
Null Hypothesis (H0)
There is no significant difference in the DV between males and females.
Alternative Hypothesis (HA), or experimental hypothesis (HE)
There is a significant difference in the DV between males and females.
n.b. These are two-tailed hypotheses (no direction of difference is predicted); they are the most common and generally recommended.
Useful analogy: the criminal trial
Imagine you are the prosecutor
H0 = Defendant not guilty
HA = Defendant guilty
We must assume that the defendant is innocent until proven guilty.
[Figure: distributions of sustained isometric torque (seconds, x-axis approx. 16 to 20) against number of people, for the male and female populations (N♂, N♀) and for samples drawn from them (n♂, n♀)]
n.b. This is why effective sampling is so important...
[Figure: population (N♂, N♀) and sample (n♂, n♀) distributions of sustained isometric torque (seconds) against number of people]
…poor/insufficient sampling can lead to errors…
Statistical Errors
• Type 1 Errors
– Rejecting H0 when it is actually true
– Concluding a difference when one does not actually exist
• Type 2 Errors
– Accepting H0 when it is actually false (e.g. previous slide)
– Concluding no difference when one does exist
Errors can occur due to biased/inadequate sampling, poor experimental design, or the use of inappropriate/non-parametric tests.
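The definition of a Type 1 error can be checked by simulation. The Python sketch below is a hypothetical illustration, not part of the lecture: it repeatedly draws two groups of 25 from the same assumed normal population (mean 18 s, SD 1.7 s, values chosen only to resemble the isometric torque example), so H0 is true by construction, and counts how often a t-test still rejects H0 using the two-tailed 0.05 critical t of 2.011 for df = 48.

```python
import random
import statistics

T_CRITICAL = 2.011  # two-tailed 0.05 critical t for df = 48 (n = 25 per group)


def independent_t(a, b):
    """Independent t statistic, using SEMdiff = sqrt(SEM_a^2 + SEM_b^2)."""
    sem_a = statistics.stdev(a) / len(a) ** 0.5
    sem_b = statistics.stdev(b) / len(b) ** 0.5
    sem_diff = (sem_a ** 2 + sem_b ** 2) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / sem_diff


def type1_error_rate(n=25, trials=2000, seed=1):
    """Proportion of trials in which a TRUE null hypothesis is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Both groups come from the same hypothetical population (mean 18 s, SD 1.7 s),
        # so every 'significant' result here is a Type 1 error.
        a = [rng.gauss(18.0, 1.7) for _ in range(n)]
        b = [rng.gauss(18.0, 1.7) for _ in range(n)]
        if abs(independent_t(a, b)) > T_CRITICAL:
            rejections += 1
    return rejections / trials


rate = type1_error_rate()
print(f"Proportion of tests rejecting a true H0: {rate:.3f}")
```

With enough trials the proportion settles close to 0.05, the significance level chosen: that is the Type 1 error rate the researcher accepts.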
Back to Study Design
• Independent Measures
– Individual scores in each data set are independent of one another
• Repeated Measures
– Individual scores in each data set are dependent/paired/correlated
Pre-Experimental designs
• Independent Measures: 2 distinct groups
• Repeated Measures: same individuals tested twice
[Diagram: design notation using T and O1/O2 symbols]
True-Experimental designs
• Independent Measures: random group assignment (R); whether the design is truly experimental depends on how equivalent groups were achieved
• Repeated Measures: cross-over design
[Diagram: cross-over design notation using O1 to O4, T and P]
Example Hypotheses: Isometric Torque
• Is there any difference in the length of time that males and females can sustain an isometric muscular contraction?
• So the above example is an independent measures design
– Which therefore requires an independent t-test, AKA Student's (Gosset's) t-test.
[Figure: sample distributions (n♂, n♀) of sustained isometric torque (seconds) against number of people]
Independent t-test: Calculation
Group | Mean | SD | n
♀ | 18.5 | 1.74 | 25
♂ | 17.5 | 1.72 | 25
Is this a significant effect?
Step 1:
Calculate the standard error of each mean (SEM)
SEM♀ = SD/√n = 1.74/√25 = 1.74/5 = 0.348
SEM♂ = SD/√n = 1.72/√25 = 1.72/5 = 0.344
Step 2:
Calculate the standard error for the difference in means
$$ \mathrm{SEM}_{\text{diff}} = \sqrt{\mathrm{SEM}_{\text{♀}}^{2} + \mathrm{SEM}_{\text{♂}}^{2}} = \sqrt{0.251} = 0.501 $$
Step 3:
Calculate the t statistic
t = (Mean♀ - Mean♂) / SEMdiff = 2.00
Step 4:
Calculate the degrees of freedom (df)
df = (n♀ - 1) + (n♂ - 1) = 48
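Steps 1 to 4 can be sketched in Python from the summary statistics alone. This is a minimal illustration, not the lecture's own tool, and the function name `independent_t_from_summary` is hypothetical. Running it on the rounded values in the table gives SEMdiff ≈ 0.489 and t ≈ 2.04 rather than the 0.501 and 2.00 shown in Steps 2 and 3, because 0.348² + 0.344² ≈ 0.239, not 0.251; with t this close to the critical value, rounding and arithmetic slips matter.

```python
import math


def independent_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Steps 1-4 of the independent t-test, computed from means, SDs and group sizes."""
    # Step 1: standard error of each mean, SEM = SD / sqrt(n)
    sem1 = sd1 / math.sqrt(n1)
    sem2 = sd2 / math.sqrt(n2)
    # Step 2: standard error of the difference in means
    # (with equal group sizes this equals the pooled-variance standard error)
    sem_diff = math.sqrt(sem1 ** 2 + sem2 ** 2)
    # Step 3: t statistic
    t = (mean1 - mean2) / sem_diff
    # Step 4: degrees of freedom
    df = (n1 - 1) + (n2 - 1)
    return sem1, sem2, sem_diff, t, df


# Female vs male isometric data from the table above
sem_f, sem_m, sem_diff, t, df = independent_t_from_summary(18.5, 1.74, 25, 17.5, 1.72, 25)
print(f"SEM_f = {sem_f:.3f}, SEM_m = {sem_m:.3f}, SEMdiff = {sem_diff:.3f}, t = {t:.2f}, df = {df}")
```

Applied to the swim data in the SPSS output for this test (10 per group), the same function gives t ≈ -2.333 with df = 18, matching SPSS.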
Step 5:
Determine the critical value for t using a t-distribution table (use the 0.05 level for a two-tailed test)
Degrees of Freedom | Critical t-ratio
44 | 2.015
46 | 2.013
48 | 2.011
50 | 2.009
Step 6:
Compare t calculated with t critical
Calculated t = 2.00
Critical t = 2.011 (df = 48)
Therefore, t calculated < t critical
Effect is n.s. (not significant)
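Steps 5 and 6 amount to a table lookup and a comparison. A minimal Python sketch follows; `compare_t` and `CRITICAL_T_005` are hypothetical names, and the lookup only covers the four df values in the Step 5 table.

```python
# Two-tailed 0.05 critical t-ratios copied from the Step 5 table
CRITICAL_T_005 = {44: 2.015, 46: 2.013, 48: 2.011, 50: 2.009}


def compare_t(t_calculated, df, table=CRITICAL_T_005):
    """Step 5: look up t critical; Step 6: compare it with |t calculated|.

    Only df values present in `table` are supported (a KeyError is raised otherwise).
    """
    t_critical = table[df]
    # The sign of t only shows which mean is larger, so it is ignored
    significant = abs(t_calculated) > t_critical
    return t_critical, significant


t_critical, significant = compare_t(2.00, 48)
print(t_critical, "significant (P < 0.05)" if significant else "n.s. (P > 0.05)")
```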
Interpretation:
P > 0.05: Reject HA & Accept H0
Conclusion:
There is not a significant difference in the DV between males and females.
Evaluation:
A wealth of available literature indicates that females can sustain isometric contractions for longer than males. This may suggest that the findings of the present study represent a Type 2 error.
Possible solution: Increase n
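Why increasing n helps can be shown with a small sketch. Assuming, hypothetically, that the means and SDs in the table stayed the same in a larger study, each SEM shrinks with √n, so t grows in proportion to √n: quadrupling n doubles t. The function name `t_for_group_size` is illustrative only.

```python
import math


def t_for_group_size(n, mean_f=18.5, sd_f=1.74, mean_m=17.5, sd_m=1.72):
    """t for two groups of size n, assuming the table's means and SDs do not change."""
    sem_diff = math.sqrt(sd_f ** 2 / n + sd_m ** 2 / n)
    return (mean_f - mean_m) / sem_diff


for n in (25, 50, 100):
    # df = 2n - 2 also rises, so the critical t falls slightly as n grows
    print(f"n = {n} per group: t = {t_for_group_size(n):.2f}, df = {2 * n - 2}")
```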
Independent t-test: SPSS Output
Swim data from SPSS session 8 (variable: SwimTime50m)
Group Statistics
Group | N | Mean | Std. Deviation | Std. Error Mean
Control | 10 | 24.7720 | 1.25246 | .39606
Visualisation | 10 | 26.4680 | 1.92823 | .60976
Independent Samples Test (F and Sig. = Levene's Test for Equality of Variances; remaining columns = t-test for Equality of Means)
Row | F | Sig. | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% CI of the Difference (Lower) | 95% CI of the Difference (Upper)
Equal variances assumed | 7.842 | .012 | -2.333 | 18 | .031 | -1.69600 | .72710 | -3.22358 | -.16842
Equal variances not assumed | | | -2.333 | 15.447 | .034 | -1.69600 | .72710 | -3.24188 | -.15012
Calculated t = -2.333; df = 18, so critical t = 2.101
Ignore the sign: 2.333 > 2.101, so P < 0.05
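The second row of the SPSS table ("Equal variances not assumed") reports df = 15.447. That value comes from Welch's adjustment, which is not covered in this lecture; the hand calculation above uses df = n1 + n2 - 2. The sketch below applies the Welch-Satterthwaite formula to the Group Statistics purely to show where 15.447 comes from; `welch_df` is a hypothetical name.

```python
def welch_df(sd1, n1, sd2, n2):
    """Welch-Satterthwaite degrees of freedom for two independent groups."""
    v1 = sd1 ** 2 / n1  # squared standard error of mean 1
    v2 = sd2 ** 2 / n2  # squared standard error of mean 2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


# SDs and group sizes from the Group Statistics table
print(f"Welch df = {welch_df(1.25246, 10, 1.92823, 10):.3f}")
```

When the two SDs are equal and the groups are the same size, the formula returns n1 + n2 - 2, the df used in the hand calculation.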
Repeated Measures Designs
• As shown earlier, a repeated measures design implies that data in each data set can be paired or correlated with one another
• An independent t-test is inappropriate to analyse such data
• Instead, a paired t-test should be used…
[Figure: number of press-ups (0 to 200) in week 1 and week 2]
Advantages of using Paired Data
• Data from independent samples are heavily influenced by variance between subjects
i.e. these data would have a large SD (variance) associated with an independent t-test simply because some subjects performed better than others
HOWEVER…
[Figure: number of press-ups (0 to 200) in week 1 and week 2]
…using the same participants on two
occasions allows us to pair up the data…
…now we can remove between-subject variance from subsequent analysis…
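The effect of pairing can be seen by analysing the same press-up scores (from the paired t-test table in this section) in two ways. This hypothetical Python sketch first ignores the pairing, so between-subject variance stays in the standard error, and then works on each subject's own week 2 minus week 1 difference, which removes it.

```python
import math
import statistics

week1 = [10, 50, 20, 8, 115, 75, 45, 170]  # press-ups, subjects 1-8
week2 = [12, 52, 25, 10, 120, 80, 50, 175]


def t_treating_as_independent(a, b):
    """Ignores the pairing: between-subject variance stays in the SEM."""
    sem_diff = math.sqrt(statistics.stdev(a) ** 2 / len(a) + statistics.stdev(b) ** 2 / len(b))
    return (statistics.mean(b) - statistics.mean(a)) / sem_diff


def t_using_pairs(a, b):
    """Uses each subject's own difference, removing between-subject variance."""
    d = [y - x for x, y in zip(a, b)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))


t_ind = t_treating_as_independent(week1, week2)
t_pair = t_using_pairs(week1, week2)
print(f"treated as independent: t = {t_ind:.2f}")
print(f"treated as paired:      t = {t_pair:.2f}")
```

The mean improvement is the same in both calculations; only the standard error changes, because the SD of the differences is far smaller than the SD of the raw scores.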
Paired t-test: Calculation
Steps 1 & 2: Complete this table (D = Week 2 - Week 1)
Subject | Week 1 | Week 2 | Diff (D) | Diff² (D²)
1 | 10 | 12 | |
2 | 50 | 52 | |
3 | 20 | 25 | |
4 | 8 | 10 | |
5 | 115 | 120 | |
6 | 75 | 80 | |
7 | 45 | 50 | |
8 | 170 | 175 | |
∑D = ∑D² =
Paired t-test: Calculation
Step 3:
Calculate the t statistic
$$ t = \frac{\sum D}{\sqrt{\dfrac{n \sum D^{2} - \left(\sum D\right)^{2}}{n - 1}}} $$
With ∑D = 31, ∑D² = 137 and n = 8:
$$ t = \frac{31}{\sqrt{\dfrac{8 \times 137 - 31^{2}}{7}}} = 7.06 $$
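Steps 1 to 3 can be checked with a short Python sketch that fills in the D and D² columns from the table and applies the formula above. The function name `paired_t_from_table` is hypothetical.

```python
import math

week1 = [10, 50, 20, 8, 115, 75, 45, 170]
week2 = [12, 52, 25, 10, 120, 80, 50, 175]


def paired_t_from_table(before, after):
    """Paired t-test via the sum-of-differences formula used in Step 3."""
    # Steps 1 & 2: differences (D = after - before) and their squares
    d = [a - b for b, a in zip(before, after)]
    sum_d = sum(d)
    sum_d2 = sum(x * x for x in d)
    n = len(d)
    # Step 3: t = sum(D) / sqrt((n * sum(D^2) - (sum(D))^2) / (n - 1))
    t = sum_d / math.sqrt((n * sum_d2 - sum_d ** 2) / (n - 1))
    return sum_d, sum_d2, t, n - 1  # n - 1 is the df used in Step 4


sum_d, sum_d2, t, df = paired_t_from_table(week1, week2)
print(sum_d, sum_d2, round(t, 2), df)  # 31 137 7.06 7
```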
Paired t-test: Calculation
Steps 4 & 5:
Calculate the df (df = n - 1) and use a t-distribution table to find t critical
Degrees of Freedom | Critical t-ratio (0.05 level) | Critical t-ratio (0.01 level)
1 | 12.71 | 63.657
2 | 4.303 | 9.925
3 | 3.182 | 5.841
4 | 2.776 | 4.604
5 | 2.571 | 4.032
6 | 2.447 | 3.707
7 | 2.365 | 3.499
8 | 2.306 | 3.355
9 | 2.262 | 3.250
Step 6:
Compare t calculated with t critical
Calculated t = 7.06
Critical t = 3.499 (df = 7, 0.01 level)
Therefore, t calculated > t critical
Effect is significant
Measure | Mean | SD | n
Week 1 | 61.6 | 56.6 | 8
Week 2 | 65.5 | 57.5 | 8
Interpretation:
P < 0.01 (and therefore P < 0.05): Reject H0 & Accept HA
Conclusion:
There is a significant difference in the DV between week 1 and week 2.
Paired t-test: SPSS Output
Push-up data from lecture 3
Paired Samples Statistics
Pair 1 | Mean | N | Std. Deviation | Std. Error Mean
VAR00001 | 61.6250 | 8 | 56.64157 | 20.02582
VAR00002 | 65.5000 | 8 | 57.54005 | 20.34348
Paired Samples Test (Pair 1: VAR00001 - VAR00002; the first five columns are the Paired Differences)
Mean | Std. Deviation | Std. Error Mean | 95% CI of the Difference (Lower) | 95% CI of the Difference (Upper) | t | df | Sig. (2-tailed)
-3.87500 | 1.55265 | .54894 | -5.17305 | -2.57695 | -7.059 | 7 | .000
Calculated t = -7.059; df = 7, so critical t = 2.365 (0.05) and 3.499 (0.01)
Ignore the sign: 7.059 > 3.499, so P < 0.01
Parametric versus Non-Parametric
• Both the t-tests just shown are parametric tests
• These test for differences in the mean
• Therefore the mean must be an accurate descriptor of the data
[Figure: normal vs non-normal distributions]
[Figure: normally distributed sustained isometric torque data (seconds) for two groups, with Mean A and Mean B marked]
Normal distribution: the mean is appropriate, so use a t-test
[Figure: non-normally distributed sustained isometric torque data (seconds) for two groups, with Mean A and Mean B marked]
Non-normal distribution: the mean is INappropriate, and using a t-test risks a Type 2 error
…assumptions of parametric analyses
• All means and paired differences are normally distributed (ND); this is the main consideration
• The sample (n) is acquired from the population (N) through random sampling
• Data must be of at least the interval level of measurement (LOM)
• Data must be continuous
…but see Norman (2010), Adv. Health Sci. Educ.
Non-Parametric Tests
• These tests use the median and make no assumption about the shape of the distribution, i.e. 'distribution free'
• Mathematically, value is ignored (i.e. the magnitudes of differences are not compared)
• Instead, data are analysed simply according to rank.
Non-Parametric Tests
• Independent Measures
– Mann-Whitney Test
• Repeated Measures
– Wilcoxon Test
e.g. Exam grades (ordinal) from 14 students in 2 separate schools
Mann-Whitney U: Calculation
Step 1:
Rank all the data from both groups in one series, then total each
School A Student | Grade | Rank | School B Student | Grade | Rank
J. S. | B- | | T. J. | D |
L. D. | B- | | M. M. | C+ |
H. L. | A+ | | K. S. | C+ |
M. J. | D- | | P. S. | B- |
T. M. | B+ | | R. M. | E |
T. S. | A- | | P. W. | C- |
P. H. | F | | A. F. | A- |
School A: Median = B-; ∑RA =
School B: Median = C+; ∑RB =
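Step 1 can be sketched in Python as follows. Letter grades are ordinal, so the sketch first maps them onto an assumed order (F lowest, A+ highest, listing only the grades that occur here) and then ranks all 14 grades in one series, giving tied grades the mean of the ranks they occupy. `GRADE_ORDER` and `average_ranks` are hypothetical names, and the pairing of students to grades follows the table order.

```python
# Assumed low-to-high order of the grades that appear in this example
GRADE_ORDER = ["F", "E", "D-", "D", "C-", "C+", "B-", "B+", "A-", "A+"]


def average_ranks(values):
    """Rank values from 1 (lowest) upwards; tied values get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


school_a = ["B-", "B-", "A+", "D-", "B+", "A-", "F"]
school_b = ["D", "C+", "C+", "B-", "E", "C-", "A-"]

codes = [GRADE_ORDER.index(g) for g in school_a + school_b]
ranks = average_ranks(codes)
sum_ra = sum(ranks[:len(school_a)])
sum_rb = sum(ranks[len(school_a):])
print(f"Sum of ranks: School A = {sum_ra}, School B = {sum_rb}")
```

A useful check: the two rank sums must add up to N(N + 1)/2, which is 105 for 14 students.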
Mann-Whitney U: Calculation
Step 2:
Calculate two versions of the U statistic using:
$$ U_1 = (n_A \times n_B) + \frac{(n_A + 1) \times n_A}{2} - \sum R_A $$
AND…
$$ U_2 = (n_A \times n_B) + \frac{(n_B + 1) \times n_B}{2} - \sum R_B $$
…OR, to save time, you can calculate U1 and then U2 as follows:
$$ U_2 = (n_A \times n_B) - U_1 $$
Mann-Whitney U: Calculation
Step 3:
Select the smaller of the two U statistics (U1 = 17.5; U2 = 31.5)
…now consult a table of critical values for the Mann-Whitney test
n (per group) | Critical U (0.05) | Critical U (0.01)
6 | 5 | 2
7 | 8 | 4
8 | 13 | 7
9 | 17 | 11
Calculated U must be equal to or less than critical U to conclude a significant difference
Conclusion
Median A = Median B
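Steps 2 and 3 can be sketched in Python. The rank sums 59.5 and 45.5 are the Sum of Ranks values in the SPSS output for this example; `mann_whitney_u` and `CRITICAL_U_005` are hypothetical names, and the critical values are copied from the table above for equal group sizes only.

```python
# Critical U values at the 0.05 level (equal group sizes), copied from the table above
CRITICAL_U_005 = {6: 5, 7: 8, 8: 13, 9: 17}


def mann_whitney_u(sum_ra, sum_rb, n_a, n_b):
    """Step 2: both versions of U from the two rank sums."""
    u1 = n_a * n_b + n_a * (n_a + 1) / 2 - sum_ra
    u2 = n_a * n_b + n_b * (n_b + 1) / 2 - sum_rb  # equivalently n_a * n_b - u1
    return u1, u2


u1, u2 = mann_whitney_u(59.5, 45.5, 7, 7)
u = min(u1, u2)                       # Step 3: the smaller U is the test statistic
significant = u <= CRITICAL_U_005[7]  # significant only if U is at or below critical U
print(u1, u2, u, "significant" if significant else "n.s.")
```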
Mann-Whitney U: SPSS Output
Ranks (VAR00001)
Group (VAR00002) | N | Mean Rank | Sum of Ranks
1.00 | 7 | 8.50 | 59.50
2.00 | 7 | 6.50 | 45.50
Total | 14 | |
Test Statistics (b), VAR00001
Mann-Whitney U | 17.500
Wilcoxon W | 45.500
Z | -.900
Asymp. Sig. (2-tailed) | .368
Exact Sig. [2*(1-tailed Sig.)] | .383 (a)
a. Not corrected for ties.
b. Grouping Variable: VAR00002
Calculated U (lower value): 17.5 > 8
So P > 0.05 (n.s.)
Non-Parametric Tests: Repeated Measures (Wilcoxon Test)
e.g. One group, pre-test and post-test, data assumed non-normal
Wilcoxon Signed Ranks: Calculation
Step 1:
Rank all the differences in one series (ignoring signs), then total each
Athlete | Pre-training OBLA (kph) | Post-training OBLA (kph) | Diff. | Signed Rank
J. S. | 15.6 | 16.1 | 0.5 | 6
L. D. | 17.2 | 17.5 | 0.3 | 4.5
H. L. | 17.7 | 16.7 | -1 | -7
M. J. | 16.5 | 16.8 | 0.3 | 4.5
T. M. | 15.9 | 16.0 | 0.1 | 1.5
T. S. | 16.7 | 16.5 | -0.2 | -3
P. H. | 17.0 | 17.1 | 0.1 | 1.5
Medians = 16.7 (pre) and 16.7 (post)
Negative (-) ranks: -7, -3
Positive (+) ranks: 6, 4.5, 4.5, 1.5, 1.5
∑Signed Ranks =
Wilcoxon Signed Ranks: Calculation
Step 2:
The smaller of the T values is our test statistic (T+ = 18; T- = 10)
…now consult a table of critical values for the Wilcoxon test
n | Critical T (0.05)
6 | 0
7 | 2
8 | 3
9 | 5
Calculated T must be equal to or less than critical T to conclude a significant difference
Conclusion
Median A = Median B
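Steps 1 and 2 can be sketched in Python using the OBLA data from the table. This is a minimal illustration with hypothetical names (`average_ranks`, `wilcoxon_t`, `CRITICAL_T_005`); differences are rounded to two decimals so that floating-point noise does not break ties such as the two 0.1 kph differences, and zero differences are dropped, which is one common convention (there are none here).

```python
pre = [15.6, 17.2, 17.7, 16.5, 15.9, 16.7, 17.0]   # pre-training OBLA (kph)
post = [16.1, 17.5, 16.7, 16.8, 16.0, 16.5, 17.1]  # post-training OBLA (kph)

# Two-tailed 0.05 critical T values copied from the table above
CRITICAL_T_005 = {6: 0, 7: 2, 8: 3, 9: 5}


def average_ranks(values):
    """Rank values from 1 (lowest) upwards; tied values get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1..j+1
        i = j + 1
    return ranks


def wilcoxon_t(before, after):
    """Return (T+, T-, n): sums of positive and negative signed ranks."""
    diffs = [round(a - b, 2) for b, a in zip(before, after)]
    diffs = [d for d in diffs if d != 0]
    ranks = average_ranks([abs(d) for d in diffs])  # rank ignoring signs
    t_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    t_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return t_plus, t_minus, len(diffs)


t_plus, t_minus, n = wilcoxon_t(pre, post)
t_stat = min(t_plus, t_minus)              # the smaller T is the test statistic
significant = t_stat <= CRITICAL_T_005[n]  # significant only at or below critical T
print(t_plus, t_minus, t_stat, "significant" if significant else "n.s.")
```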
Wilcoxon Signed Ranks: SPSS Output
Ranks (VAR00002 - VAR00001)
 | N | Mean Rank | Sum of Ranks
Negative Ranks | 2 (a) | 3.00 | 6.00
Positive Ranks | 5 (b) | 4.40 | 22.00
Ties | 0 (c) | |
Total | 7 | |
a. VAR00002 < VAR00001
b. VAR00002 > VAR00001
c. VAR00002 = VAR00001
Test Statistics (b), VAR00002 - VAR00001
Z | -1.364 (a)
Asymp. Sig. (2-tailed) | .172
a. Based on negative ranks.
b. Wilcoxon Signed Ranks Test
Calculated T: 10 > 2
So P > 0.05 (n.s.)
So which stats test should you use?
Q1. What is the level of measurement (LOM)? Nominal, Ordinal or Interval/Ratio
Q2. Are the data normally distributed (ND)? No / Yes
Q3. Are the data paired or independent?
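The three questions can be written as a small decision function. This hypothetical sketch encodes only what this lecture covers: the t-tests need at least interval-level, normally distributed data; otherwise the Mann-Whitney or Wilcoxon test is used. Nominal data fall outside the tests shown here, so the sketch returns None for them rather than guessing.

```python
def choose_test(level, normal, paired):
    """Suggest a test from Q1 (level of measurement), Q2 (ND?) and Q3 (paired?).

    level:  "nominal", "ordinal" or "interval/ratio"
    normal: True if the data (or paired differences) are normally distributed
    paired: True for repeated measures, False for independent measures
    """
    if level not in ("nominal", "ordinal", "interval/ratio"):
        raise ValueError(f"unknown level of measurement: {level!r}")
    if level == "nominal":
        return None  # not covered by the tests in this lecture
    if level == "interval/ratio" and normal:
        return "paired t-test" if paired else "independent t-test"
    # Ordinal data, or interval/ratio data that are not ND: non-parametric test
    return "Wilcoxon test" if paired else "Mann-Whitney test"


print(choose_test("interval/ratio", normal=True, paired=False))
print(choose_test("ordinal", normal=False, paired=False))
```

For example, the isometric torque data (interval/ratio, treated as ND, independent groups) lead to the independent t-test, and the exam grades (ordinal, two schools) lead to the Mann-Whitney test, matching the examples in this lecture.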
Why do we use Hypothesis Testing?
• It is easy (i.e. data in, P value out)
• It provides the ‘Illusion of Scientific Objectivity’
• Everybody else does it.
Problems with Hypothesis Testing?
• P<0.05 is an arbitrary probability (P<0.06?)
• The size of the effect is not expressed
• The variability of this effect is not expressed
• Overall, hypothesis testing ignores ‘judgement’.
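One way to express the size of an effect and its variability, rather than only a P value, is to report the mean difference with a 95% confidence interval, as the independent t-test SPSS output already does. The sketch below reproduces that interval from the swim Group Statistics as mean difference ± t critical × SEMdiff, using the critical t of 2.101 for df = 18; `mean_difference_ci` is a hypothetical name.

```python
import math


def mean_difference_ci(mean1, sd1, n1, mean2, sd2, n2, t_critical):
    """Mean difference (group 1 - group 2) with a CI: diff +/- t_critical * SEMdiff."""
    sem_diff = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    diff = mean1 - mean2
    return diff, diff - t_critical * sem_diff, diff + t_critical * sem_diff


# Swim data from the SPSS Group Statistics; two-tailed 0.05 critical t for df = 18
diff, lower, upper = mean_difference_ci(24.7720, 1.25246, 10, 26.4680, 1.92823, 10, 2.101)
print(f"Mean difference = {diff:.3f}, 95% CI {lower:.3f} to {upper:.3f}")
```

The interval shows both how large the difference is and how precisely it has been estimated; an interval that excludes zero corresponds to P < 0.05.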