Post on 04-Jan-2016
Nonparametric Statistics
Lecture 9
Small Sample, Non-normal Population
If the sample were large, the Central Limit Theorem would apply to tests of hypotheses about the mean.
If the population were normal, the sampling distribution of the mean would be exactly normal to start with.
If the sample is small and the population is non-normal, what do we do?
Nonparametric statistics is a sub-field of statistics that draws inferences about populations that cannot be assumed to follow any particular distribution.
One-Sample Example
Suppose that a nurse has been instructed to perform a procedure in a new way. Researchers recorded the change in the number of minutes it took the nurse to perform the procedure.
The data are: 0.6, -0.5, 1.1, 2.4, 3.5, 2.0, -0.1, 1.0, 2.1, -0.6, -0.2
We would be hard pressed to say that these data even approximately follow a normal distribution.
Assumption of normality for small sample example
There are only 11 observations and we might be uncomfortable claiming that this distribution looks normal. Instead, it looks more uniform.
The Sign Test – 5 Steps
Assumptions: Random, independent sample
Hypotheses:
• Null hypothesis: Median equals zero
• Alternative hypothesis: Median does not equal zero
Test statistic: p = 7/11, the sample proportion of observations greater than zero, which we compare with one-half.
The Sign Test – 5 Steps, cont.
P-value: An exact calculation is needed, since the CLT does not apply with small samples. Exact 95% CI for p: (0.308, 0.891)
Conclusion: Since 0.5 is included in the 95% confidence interval, we cannot say that the median is significantly different from zero at the 0.05 level. (We fail to reject the null hypothesis.)
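The sign test steps above can be sketched in a few lines of standard-library Python. This is only an illustration, not the software used in the lecture: the function names (`binom_cdf`, `sign_test`, `clopper_pearson`) are made up for this sketch, and the interval is assumed to be the exact (Clopper-Pearson) interval, which reproduces the (0.308, 0.891) quoted above to three decimals.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def sign_test(data, median0=0.0):
    """Exact two-sided sign test of H0: median == median0.

    Observations equal to median0 are dropped, as is conventional.
    """
    pos = sum(x > median0 for x in data)
    neg = sum(x < median0 for x in data)
    n = pos + neg
    p_value = min(1.0, 2 * binom_cdf(min(pos, neg), n, 0.5))
    return pos, n, p_value

def _root(g, tol=1e-10):
    """Bisection for the root of an increasing function g on (0, 1)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k, n, conf=0.95):
    """Exact (Clopper-Pearson) confidence interval for a binomial proportion."""
    half_alpha = (1 - conf) / 2
    # Lower limit: the p at which P(X >= k) equals alpha/2.
    low = 0.0 if k == 0 else _root(lambda p: (1 - binom_cdf(k - 1, n, p)) - half_alpha)
    # Upper limit: the p at which P(X <= k) equals alpha/2.
    high = 1.0 if k == n else _root(lambda p: half_alpha - binom_cdf(k, n, p))
    return low, high

# Change in minutes for the nurse example
data = [0.6, -0.5, 1.1, 2.4, 3.5, 2.0, -0.1, 1.0, 2.1, -0.6, -0.2]

pos, n, p_value = sign_test(data)
ci_low, ci_high = clopper_pearson(pos, n)
print(f"{pos}/{n} positive, exact two-sided p = {p_value:.4f}")
print(f"95% CI for p: ({ci_low:.3f}, {ci_high:.3f})")
```

Because the interval contains 0.5 exactly when the exact test fails to reject at the 0.05 level (up to the usual two-sided conventions), either the p-value or the interval can be used for the conclusion.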
The Signed Rank Test – 5 steps
Assumptions:
• The measurement is continuous
• Independent, random sample from the population
• Distribution is symmetric
Hypotheses:
• H0: Median of the distribution is 0
• HA: Median of the distribution is non-zero
Test Statistic: Minimum of the rank sums
P-value: from the computer!
For this example, p=0.0439
Conclusion: As per usual. Since p = 0.0439 is less than 0.05, we reject the null hypothesis at the 0.05 level.
Calculation of Signed Rank Test Statistic
Order observations from smallest to largest in absolute value:
|Y|(1) ≤ |Y|(2) ≤ … ≤ |Y|(n)
So, from the example:
|-0.1| < |-0.2| < |-0.5| < |-0.6| = |0.6| < 1.0 < 1.1 < 2.0 < 2.1 < 2.4 < 3.5
Assign ranks to these absolute values: 1, 2, … , n
In the example: 1, 2, … , 11, with the tied values |-0.6| and |0.6| each receiving the average rank 4.5
Signed Rank Test Statistic, cont…
Arrange the ranks into two groups: those whose actual values are smaller than zero and those whose actual values are larger than zero. Sum the ranks for the negative and positive observations separately.
Here, for negative values, sum of ranks = 1+2+3+4.5 = 10.5
For positive values, sum of ranks = 4.5+6+7+8+9+10+11 = 55.5
Test Statistic = smallest rank sum = 10.5
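The ranking and summing steps can be sketched as follows. The helper names `midranks` and `signed_rank_sums` are hypothetical, chosen for this illustration; tied absolute values get their average rank, and exact zeros are dropped.

```python
def midranks(values):
    """Rank values from smallest to largest; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def signed_rank_sums(data):
    """Return (rank sum of negative values, rank sum of positive values)."""
    nonzero = [x for x in data if x != 0]
    ranks = midranks([abs(x) for x in nonzero])
    neg_sum = sum(r for x, r in zip(nonzero, ranks) if x < 0)
    pos_sum = sum(r for x, r in zip(nonzero, ranks) if x > 0)
    return neg_sum, pos_sum

data = [0.6, -0.5, 1.1, 2.4, 3.5, 2.0, -0.1, 1.0, 2.1, -0.6, -0.2]
neg_sum, pos_sum = signed_rank_sums(data)
statistic = min(neg_sum, pos_sum)
print(neg_sum, pos_sum, statistic)  # 10.5 55.5 10.5
```

A quick sanity check is that the two sums always add up to n(n+1)/2, here 11 × 12 / 2 = 66.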
P-values for signed rank test
For critical values and p-values, use tables or computer-generated p-values.
This procedure is unavailable in the Student version of SPSS. It is available in SAS and the regular version of SPSS.
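For a sample this small, one way to see where a computer-generated p-value can come from is brute-force enumeration: under the null hypothesis each rank is equally likely to belong to a negative or a positive observation, so all 2^11 = 2048 sign patterns are equally likely. The sketch below assumes this exact approach with average ranks for ties (the software behind the lecture's number may work differently); `exact_signed_rank_p` is a made-up name. For the nurse example it gives 90/2048, about 0.0439.

```python
from itertools import product

def exact_signed_rank_p(ranks, observed_min):
    """Exact two-sided p-value for the signed rank test by full enumeration.

    ranks        -- ranks of the absolute values (average ranks for ties)
    observed_min -- the observed test statistic (smaller of the two rank sums)
    """
    count = 0
    for signs in product((0, 1), repeat=len(ranks)):
        # Treat sign 1 as "negative observation" and sum those ranks.
        neg_sum = sum(r for r, s in zip(ranks, signs) if s)
        if neg_sum <= observed_min:
            count += 1
    # The null distribution is symmetric, so double the one-sided tail.
    return min(1.0, 2 * count / 2 ** len(ranks))

# Ranks and statistic from the nurse example (|-0.6| and |0.6| tie at 4.5)
ranks = [1, 2, 3, 4.5, 4.5, 6, 7, 8, 9, 10, 11]
p_exact = exact_signed_rank_p(ranks, 10.5)
print(f"exact two-sided p = {p_exact:.4f}")
```

Enumeration is only practical for small n; larger samples would normally use the normal approximation instead.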
Comments on Signed Rank Test
• More “powerful” than the Sign Test, but requires more assumptions
• One-sided tests are possible
• Robust to outliers
• Some books/programs use the sum of the ranks of the positive values as the test statistic; the p-value is the same either way
• Nonparametric confidence intervals are also available from some software programs
• For tied observations, use the average rank for each tied observation
Nonparametric statistics for small, non-normal samples
Paired Data
• The same as for univariate data, except perform the test using the differences rather than the raw data.
Two Independent Groups: Mann-Whitney Rank Sum Test (Ch. 24)
• Procedure is similar to the Signed Rank test, except that instead of dividing observations according to whether they are positive or negative, we divide observations according to group membership.
• Assumptions include (1) independent, random samples, (2) independently selected groups, and (3) the shape and spread of the two distributions are the same
Paired Differences Example
Wife                          0.4   0.5   1.0   0.2   0.9   1.0   1.2   0.1   0.6   0.4   0.2
Husband                       0.5   0.4   0.7   0.0   0.6   1.2   0.7   0.1   0.5   0.1   0.1
Difference (Wife - Husband)  -0.1   0.1   0.3   0.2   0.3  -0.2   0.5   0.0   0.1   0.3   0.1
Study Hypothesis: Men and women spend different amounts of time reading/watching the news.
The Signed Rank Test – 5 steps
Assumptions:
• The measurement (difference) is continuous
• Independent, random sample from the population
• Distribution of the difference is symmetric
Hypotheses:
• H0: Median of the difference is 0
• HA: Median of the difference is non-zero
Test Statistic: Minimum of the rank sums
P-value: from the computer!
For this example, p = 0.045 (asymptotic, 2-tailed)
Conclusion: As per usual.
Computer Outputs - Paired
Data for wives and husbands are in two separate columns, with matched observations in the same row.
Analyze > Nonparametric tests > 2 Related Samples…
Wilcoxon Signed Ranks Test

Ranks (HUSBAND - WIFE)
                   N       Mean Rank   Sum of Ranks
Negative Ranks      8 (a)  5.88        47.00
Positive Ranks      2 (b)  4.00         8.00
Ties                1 (c)
Total              11
a. HUSBAND < WIFE
b. HUSBAND > WIFE
c. WIFE = HUSBAND

Test Statistics (b)
                          HUSBAND - WIFE
Z                         -2.007 (a)
Asymp. Sig. (2-tailed)      .045
a. Based on positive ranks.
b. Wilcoxon Signed Ranks Test
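The rank sums, Z, and asymptotic p-value in this output can be reproduced from the raw data with the usual normal approximation, including the variance correction for tied ranks and no continuity correction. That this matches the package's internal formula is an assumption based on the numbers agreeing; the variable names below are illustrative only. Differences are rounded to one decimal so that floating-point noise does not break the ties.

```python
from collections import Counter
from math import erfc, sqrt

wife    = [0.4, 0.5, 1.0, 0.2, 0.9, 1.0, 1.2, 0.1, 0.6, 0.4, 0.2]
husband = [0.5, 0.4, 0.7, 0.0, 0.6, 1.2, 0.7, 0.1, 0.5, 0.1, 0.1]

# HUSBAND - WIFE, rounded to the data's precision; zero differences are dropped.
diffs = [round(h - w, 1) for h, w in zip(husband, wife)]
nonzero = [d for d in diffs if d != 0]
n = len(nonzero)

# Average rank for each distinct absolute difference.
tie_counts = Counter(abs(d) for d in nonzero)
rank_of, start = {}, 1
for value in sorted(tie_counts):
    t = tie_counts[value]
    rank_of[value] = start + (t - 1) / 2
    start += t

neg_sum = sum(rank_of[abs(d)] for d in nonzero if d < 0)  # HUSBAND < WIFE
pos_sum = sum(rank_of[abs(d)] for d in nonzero if d > 0)  # HUSBAND > WIFE

# Normal approximation with tie-corrected variance.
mean_w = n * (n + 1) / 4
var_w = n * (n + 1) * (2 * n + 1) / 24 - sum(t**3 - t for t in tie_counts.values()) / 48
z = (min(neg_sum, pos_sum) - mean_w) / sqrt(var_w)
p_asymp = erfc(abs(z) / sqrt(2))  # two-sided tail area of the standard normal

print(f"negative rank sum = {neg_sum}, positive rank sum = {pos_sum}")
print(f"Z = {z:.3f}, asymptotic two-sided p = {p_asymp:.3f}")
```

With only 10 non-zero differences the normal approximation is rough, which is one reason to prefer an exact p-value when the software offers one.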
Computer Outputs - Paired, cont.
Sign Test

Frequencies (HUSBAND - WIFE)
                            N
Negative Differences (a)     8
Positive Differences (b)     2
Ties (c)                     1
Total                       11
a. HUSBAND < WIFE
b. HUSBAND > WIFE
c. WIFE = HUSBAND

Test Statistics (b)
                          HUSBAND - WIFE
Exact Sig. (2-tailed)     .109 (a)
a. Binomial distribution used.
b. Sign Test
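The exact significance of .109 follows directly from the binomial distribution: after dropping the tie, 2 of the 10 differences are positive, and under the null hypothesis the number of positive differences X is Binomial(10, 1/2). The worked arithmetic below uses the conventional doubling of the smaller tail for a two-sided test.

```latex
p \;=\; 2\,P(X \le 2)
  \;=\; 2 \sum_{k=0}^{2} \binom{10}{k} \left(\tfrac{1}{2}\right)^{10}
  \;=\; 2 \cdot \frac{1 + 10 + 45}{1024}
  \;=\; \frac{112}{1024}
  \;\approx\; 0.109
```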
Two Independent Groups Example
Wife      0.4   0.5   1.0   0.2   0.9   1.0   1.2   0.1   0.6   0.4   0.2
Husband   0.5   0.4   0.7   0.0   0.6   1.2   0.7   0.1   0.5   0.1   0.1
Study Hypothesis: Men and women spend different amounts of time reading/watching the news.
The Mann-Whitney Test – 5 steps
Assumptions:
• Independent, random samples
• Independently selected groups
• The shape and spread of the two distributions are the same
Hypotheses:
• H0: Group medians are the same
• HA: Group medians are different
Test Statistic: rank sums
P-value: from the table or computer!
For this example, p = 0.390 (asymptotic, 2-tailed); the exact significance, not corrected for ties, is 0.401
Conclusion: As per usual.
Computer Outputs - Independent
Data for wives & husbands are in the same column; a second column indicates whether each observation is for the wife or husband*.
Analyze > Nonparametric tests > 2 Independent Samples…
Mann-Whitney Test

Ranks (TIME)
GROUP      N    Mean Rank   Sum of Ranks
Husband    11   10.32       113.50
Wife       11   12.68       139.50
Total      22

Test Statistics (b)
                                  TIME
Mann-Whitney U                     47.500
Wilcoxon W                        113.500
Z                                   -.859
Asymp. Sig. (2-tailed)               .390
Exact Sig. [2*(1-tailed Sig.)]       .401 (a)
a. Not corrected for ties.
b. Grouping Variable: GROUP
*: Type of this variable must be Numeric in SPSS.
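The rank sums, U, Z, and asymptotic p-value can be reproduced from the raw data by pooling both groups, assigning average ranks, and applying the normal approximation with a tie-corrected variance and no continuity correction. This is a sketch under those assumptions, with illustrative variable names, not a description of the package's internals.

```python
from collections import Counter
from math import erfc, sqrt

wife    = [0.4, 0.5, 1.0, 0.2, 0.9, 1.0, 1.2, 0.1, 0.6, 0.4, 0.2]
husband = [0.5, 0.4, 0.7, 0.0, 0.6, 1.2, 0.7, 0.1, 0.5, 0.1, 0.1]

# Average rank for each distinct value in the pooled sample.
tie_counts = Counter(wife + husband)
rank_of, start = {}, 1
for value in sorted(tie_counts):
    t = tie_counts[value]
    rank_of[value] = start + (t - 1) / 2
    start += t

n1, n2 = len(wife), len(husband)
total = n1 + n2
r_wife = sum(rank_of[v] for v in wife)
r_husband = sum(rank_of[v] for v in husband)

# U for each group is its rank sum minus the smallest possible rank sum.
u_wife = r_wife - n1 * (n1 + 1) / 2
u_husband = r_husband - n2 * (n2 + 1) / 2
u = min(u_wife, u_husband)

# Normal approximation with tie-corrected variance.
tie_term = sum(t**3 - t for t in tie_counts.values())
var_u = n1 * n2 / 12 * ((total + 1) - tie_term / (total * (total - 1)))
z = (u - n1 * n2 / 2) / sqrt(var_u)
p_asymp = erfc(abs(z) / sqrt(2))  # two-sided tail area of the standard normal

print(f"rank sums: wife = {r_wife}, husband = {r_husband}")
print(f"U = {u}, Z = {z:.3f}, asymptotic two-sided p = {p_asymp:.3f}")
```

The two rank sums must add up to N(N+1)/2 = 22 × 23 / 2 = 253, which is a useful check on any hand calculation.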
Comments on Nonparametric Test for 2 Independent Samples
• Robust to outliers
• One-sided tests are possible
• Nonparametric confidence intervals are also available from some software programs
• For tied observations, use the average rank for each tied observation
Possible Names
• Mann-Whitney Rank Sum Test
• Mann-Whitney Test
• Mann-Whitney U Test
• Wilcoxon Rank Sum Test
Testing for a Relationship between Categorical Variables
Large Sample Size
• Chi-square test
Small Sample Size
• Chi-square test with Yates’ continuity correction
• Fisher’s exact test
Urgent Colonoscopy for the Diagnosis and Treatment of Severe Diverticular Hemorrhage
New England Journal of Medicine 2000;342:78-82
Research Hypothesis

Severe Bleeding   Medical and Surgical Treatment   Medical and Colonoscopic Treatment   Total
No                11                               10                                   21
Yes                6                                0                                    6
Total             17                               10                                   27
Fisher’s Exact Test – 5 steps
Assumptions:
• Independent, random sample from the population
• Two variables are categorical
Hypotheses:
• H0: Response and Predictor are Independent
• HA: Response and Predictor are Associated
Test Statistic: (p-value)
P-value: from the computer!
For this example, p=0.057
Conclusion: As per usual. Since p = 0.057 is greater than 0.05, we fail to reject the null hypothesis at the 0.05 level.
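Fisher's exact test conditions on the row and column totals, so the count in one cell follows a hypergeometric distribution. The sketch below (with the hypothetical helper `fisher_exact_2x2`) takes the two-sided p-value as the total probability of all tables no more likely than the observed one, which is a common convention and reproduces p = 0.057 for this table; other two-sided conventions exist.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns (two_sided_p, one_sided_p), where the one-sided value is the
    smaller of the two tail probabilities for cell a.
    """
    row1, col1, col2 = a + b, a + c, b + d
    n = a + b + c + d

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(col1, x) * comb(col2, row1 - x) / comb(n, row1)

    support = range(max(0, row1 - col2), min(row1, col1) + 1)
    p_obs = prob(a)
    # Two-sided: sum over tables at most as likely as the observed one
    # (small relative tolerance guards against floating-point ties).
    two_sided = sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-9))
    lower = sum(prob(x) for x in support if x <= a)
    upper = sum(prob(x) for x in support if x >= a)
    return min(1.0, two_sided), min(lower, upper)

# Rows: severe bleeding No / Yes; columns: surgical / colonoscopic treatment
p_two_sided, p_one_sided = fisher_exact_2x2(11, 10, 6, 0)
print(f"two-sided p = {p_two_sided:.3f}, one-sided p = {p_one_sided:.3f}")
```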
Data Entry
Weight cases by the variable: count.
Data > Weight Cases…
Computer Outputs - FET
Perform FET (or Chi-square test if sample size is large)
Analyze > Descriptive Statistics > Crosstabs…
Assign “bleeding” for “Row(s)”, “treat” for “Column(s)”
Click “Statistics” to check “Chi-square”
Crosstabs
BLEEDING * TREAT Crosstabulation
Count
                                      TREAT
BLEEDING   Medical and Surgical Treatment   Medical and Colonoscopic Treatment   Total
No         11                               10                                   21
Yes         6                                0                                    6
Total      17                               10                                   27

Chi-Square Tests
                               Value       df   Asymp. Sig. (2-sided)   Exact Sig. (2-sided)   Exact Sig. (1-sided)
Pearson Chi-Square             4.538 (b)   1    .033
Continuity Correction (a)      2.726       1    .099
Likelihood Ratio               6.530       1    .011
Fisher's Exact Test                                                     .057                   .042
Linear-by-Linear Association   4.370       1    .037
N of Valid Cases               27
a. Computed only for a 2x2 table
b. 2 cells (50.0%) have expected count less than 5. The minimum expected count is 2.22.
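The Pearson and continuity-corrected chi-square rows, and the minimum expected count in footnote b, can be checked by hand with the shortcut formulas for a 2x2 table. The sketch below uses only the standard library; for 1 degree of freedom the upper-tail probability of a chi-square value x equals erfc(sqrt(x/2)). Variable names are illustrative.

```python
from math import erfc, sqrt

# Rows: severe bleeding No / Yes; columns: surgical / colonoscopic treatment
a, b = 11, 10
c, d = 6, 0
n = a + b + c + d

rows = (a + b, c + d)
cols = (a + c, b + d)
denom = rows[0] * rows[1] * cols[0] * cols[1]

# Pearson chi-square and Yates' continuity-corrected version (2x2 shortcut formulas).
pearson = n * (a * d - b * c) ** 2 / denom
yates = n * max(0.0, abs(a * d - b * c) - n / 2) ** 2 / denom

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square distribution with 1 degree of freedom."""
    return erfc(sqrt(x / 2))

p_pearson = chi2_sf_df1(pearson)
p_yates = chi2_sf_df1(yates)

# Expected count for each cell is (row total * column total) / n.
min_expected = min(r * k / n for r in rows for k in cols)

print(f"Pearson chi-square = {pearson:.3f}, p = {p_pearson:.3f}")
print(f"Continuity-corrected = {yates:.3f}, p = {p_yates:.3f}")
print(f"minimum expected count = {min_expected:.2f}")
```

The small minimum expected count is why the asymptotic chi-square p-values disagree so much here and why Fisher's exact test is the safer choice for this table.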
The Inexact Use of Fisher’s Exact Test in Six Major Medical Journals
JAMA 1989;261:3430-3433
Table 1. Specification of Use of Fisher’s Exact Test by Journal
Journal                                            No. of Articles That Specified / No. of Articles Reviewed
------------------------------------------------------------------------------------------------------
New England Journal of Medicine                    8 / 9
Annals of Internal Medicine                        2 / 4
British Medical Journal                            3 / 6
The Journal of the American Medical Association    6 / 16
Lancet                                             4 / 14
American Journal of Medicine                       0 / 7
Homework
To be posted, not graded
Solutions will be posted on Monday
Read Chapters 24, 25, 27