
Transcript of nonparametric lecture.ppt

Page 1: nonparametric lecture.ppt

Nonparametric Statistics

Timothy C. Bates

[email protected]

Page 2: nonparametric lecture.ppt

Parametric Statistics 1

Assume data are drawn from populations with a certain distribution (usually normal)

Compute the likelihood that groups are related/unrelated or same/different given that underlying model

t-test, Pearson’s correlation, ANOVA…

Page 3: nonparametric lecture.ppt

Parametric Statistics 2

Assumptions of parametric statistics:

1. Observations are independent

2. Your data are normally distributed

3. Variances are equal across groups
• Can be modified to cope with unequal σ²

Page 4: nonparametric lecture.ppt

Non-parametric Statistics?

Non-parametric statistics do not assume any underlying distribution

They estimate the distribution AND compute the probability that your groups are related/the same or unrelated/different

Page 5: nonparametric lecture.ppt

Nonparametric ≠ No parameters

Model structure is not specified a priori but is instead determined from data.

The data are parameterised by the analysis

AKA: “distribution free”

Page 6: nonparametric lecture.ppt

Non-parametric Statistics

Assumptions of non-parametric statistics:

1. Observations are independent

Page 7: nonparametric lecture.ppt

Non-parametric Statistics?

Non-parametric statistics do not assume any underlying distribution

Estimating or modeling this distribution reduces their power to detect effects…

So never use them unless you have to

Page 8: nonparametric lecture.ppt

Why use a Non-parametric Statistic?

Very small samples (<20 replicates)

High probability of violating the assumption of normality

Leads to spurious Type-I (false alarm) errors

Page 9: nonparametric lecture.ppt

Why use a Non-parametric Statistic?

Outliers more often lead to spurious Type-I (false alarm) errors in parametric statistics.

Nonparametric statistics reduce data to ordinal ranks, which reduces the impact (leverage) of outliers.

Page 10: nonparametric lecture.ppt

Error

Type-I error: False alarm for a bogus effect
Reject the null hypothesis when it is really true

Type-II error: Miss a real effect
Fail to reject the null hypothesis when it is really false

Type-III error: :-) lazy, incompetent, or willful ignorance of the truth

Page 11: nonparametric lecture.ppt

Power

1 − β: the probability of correctly detecting a real effect (β = the Type-II error rate)

Page 12: nonparametric lecture.ppt

Non-parametric choices

Data type?

• Discrete → χ²

• Continuous → Question?
  – Association → Spearman's Rank
  – Different central value → Number of groups?
    · Two groups → Mann-Whitney U / Wilcoxon's Rank Sums
    · More than 2 → Kruskal-Wallis test
  – Difference in σ² → Brown-Forsythe

Page 13: nonparametric lecture.ppt

Non-parametric choices (with parametric equivalents)

Data type?

• Discrete → χ² (no parametric alternative)

• Continuous → Question?
  – Association → Spearman's Rank (like a Pearson's r)
  – Different central value → Number of groups?
    · Two groups → Mann-Whitney U / Wilcoxon's Rank Sums (like Student's t)
    · More than 2 → Kruskal-Wallis test (like ANOVA)
  – Difference in σ² → Brown-Forsythe (like the F-test)

Page 14: nonparametric lecture.ppt

Chi-Squared (χ²)

χ² tests the null hypothesis that observed events occur with an expected frequency

In large samples, the frequencies are distributed as χ²

e.g. H0: "This six-sided die is fair"
Expect all 6 outcomes to occur equally often

Assumptions:
• Observations are independent
• Outcomes are mutually exclusive
• Sample is not small

Small samples require an exact test, e.g. the binomial test

Page 15: nonparametric lecture.ppt

Chi-Squared χ² formula

χ² = Σ (Oᵢ − Eᵢ)² / Eᵢ

i.e., the sum, over cells, of each squared difference between the observed and expected frequencies, divided by its expected frequency
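The arithmetic is easy to check by hand. The lecture's own code is R; as an illustration, the same computation in stdlib-only Python, using the 45 heads / 55 tails counts from the coin-toss example later in the lecture (with df = 1, χ² is the square of a standard normal, so the p-value needs only the error function):

```python
import math

# chi^2 = sum over cells of (observed - expected)^2 / expected
observed = [45, 55]   # coin-toss counts from the worked example
expected = [50, 50]   # a fair coin over 100 tosses
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# With df = 1, chi^2 is Z^2 for a standard normal Z, so
# p = P(chi2_1 > x) = P(|Z| > sqrt(x)) = 1 - erf(sqrt(x / 2))
p = 1 - math.erf(math.sqrt(chi2 / 2))
print(chi2, round(p, 4))  # 1.0 0.3173
```

This agrees with the chisq.test output shown on the coin-toss slide.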

Page 16: nonparametric lecture.ppt

χ² and contingency tables

χ² essentially tests if each cell in a contingency table has its expected value

In a two-way table, the expected value of each cell is (row total × column total) / grand total
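Under the null hypothesis of independence, each cell's expected count is its row total times its column total divided by the grand total. A minimal Python sketch, with made-up 2×2 counts:

```python
# Hypothetical 2x2 contingency table (counts invented for illustration)
observed = [[20, 30],
            [30, 20]]

row_totals = [sum(row) for row in observed]        # [50, 50]
col_totals = [sum(col) for col in zip(*observed)]  # [50, 50]
grand = sum(row_totals)                            # 100

# expected cell count = row total * column total / grand total
expected = [[r * c / grand for c in col_totals] for r in row_totals]

# chi-squared statistic summed over all four cells
chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))
print(expected, chi2)  # [[25.0, 25.0], [25.0, 25.0]] 4.0
```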

Page 17: nonparametric lecture.ppt

Example: coin toss

Random sample of 100 coin tosses, of a coin believed to be fair

We observed 45 heads and 55 tails

Is the coin fair?

Page 18: nonparametric lecture.ppt

Coin toss

If H0 is true, our test statistic is drawn from a χ² distribution with df = 1

(45 − 50)²/50 + (55 − 50)²/50 = 0.5 + 0.5 = 1

χ²(1) = 1, p > 0.3

Page 19: nonparametric lecture.ppt

Coin toss χ² in R

chisq.test(c(45, 55), p = c(.5, .5))

Chi-squared test for given probabilities
X-squared = 1, df = 1, p-value = 0.3173

Page 20: nonparametric lecture.ppt

Spearman Rank test (ρ, rho)

Named after Charles Spearman

Non-parametric measure of correlation

Assesses how well an arbitrary monotonic function describes the relationship between two variables

Does not require the relationship to be linear

Does not require interval measurement

Page 21: nonparametric lecture.ppt

Spearman Rank test (ρ, rho)

Mathematically, it is simply a Pearson's r computed on ranked data

ρ = 1 − 6Σd² / (n(n² − 1))

d = difference in rank of a given pair
n = number of pairs

Alternative test: Kendall's tau (Kendall's τ)
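On tie-free data, ranking both variables and taking Pearson's r gives the same answer as the shortcut formula. A stdlib-only Python sketch (the paired data are invented for illustration):

```python
def ranks(xs):
    # rank 1 = smallest value; assumes no ties, for this simple sketch
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

x = [10, 20, 30, 40, 50]   # hypothetical paired observations
y = [12, 11, 25, 20, 90]
rx, ry = ranks(x), ranks(y)

# Pearson's r on the ranks...
rho_pearson = pearson(rx, ry)

# ...equals 1 - 6*sum(d^2) / (n*(n^2 - 1)) when there are no ties
d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
n = len(x)
rho_formula = 1 - 6 * d2 / (n * (n ** 2 - 1))
# both are 0.8 for this data
```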

Page 22: nonparametric lecture.ppt

Mann-Whitney U

AKA: "Wilcoxon rank-sum test"
(Mann & Whitney, 1947; Wilcoxon, 1945)

Non-parametric test for difference in the medians of two independent samples

Assumptions:
• Samples are independent
• Observations can be ranked (ordinal or better)

Page 23: nonparametric lecture.ppt

Mann-Whitney U

U tests the difference in the medians of two independent samples

U = n1n2 + n1(n1 + 1)/2 − R

n1 = number of obs in sample 1
n2 = number of obs in sample 2
R = sum of ranks of the lower-ranked sample

Page 24: nonparametric lecture.ppt

Mann-Whitney U or t-test?

Should you use it over the t-test?

Yes, if you have a very small sample (<20)
• (central limit assumptions not met)

Possibly, if your data are inherently ordinal

Otherwise, probably not.

It is less prone to Type-I error (spurious significance) due to outliers.

But it does not handle comparisons of samples whose variances differ very well (use the unequal-variance t-test on rank data instead)

Page 25: nonparametric lecture.ppt

Aesop: Mann-Whitney U Example

Suppose that Aesop is dissatisfied with his classic experiment in which one tortoise was found to beat one hare in a race.

He decides to carry out a significance test to discover whether the results could be extended to tortoises and hares in general…

Page 26: nonparametric lecture.ppt

Aesop 2: Mann-Whitney U

He collects a sample of 6 tortoises and 6 hares, and makes them all run his race. The order in which they reach the finishing post (their rank order) is as follows:

tort = c(1, 7, 8, 9, 10, 11)
hare = c(2, 3, 4, 5, 6, 12)

Original tortoise still goes at warp speed, original hare is still lazy, but the others run truer to stereotype.

Page 27: nonparametric lecture.ppt

Aesop 3: Mann-Whitney U

wilcox.test(tort, hare)

W = 25, p-value = 0.31

Tortoises are not faster (but neither are hares)

tort = c(1, 7, 8, 9, 10, 11) (n2 = 6)
hare = c(2, 3, 4, 5, 6, 12) (n1 = 6, R1 = 32)
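The W that R reports can be reproduced by hand from the rank sums. A stdlib-only Python sketch using the finishing ranks above:

```python
# Finishing ranks from the race (the data are already ranks)
tort = [1, 7, 8, 9, 10, 11]
hare = [2, 3, 4, 5, 6, 12]

n1, n2 = len(hare), len(tort)      # 6 and 6
R1 = sum(hare)                     # rank sum of the lower-ranked sample = 32
U_hare = R1 - n1 * (n1 + 1) // 2   # 32 - 21 = 11
U_tort = n1 * n2 - U_hare          # 36 - 11 = 25, the W reported by wilcox.test(tort, hare)
```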

Page 28: nonparametric lecture.ppt

Aesop 4: Mann-Whitney U

Wilcoxon: W = 25, p-value = 0.31
Tortoises are not faster (but neither are hares).

Welch Two Sample t-test
t = 1.1355, df = 10, p-value = 0.28
Alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval: -2.25 to 6.91
Sample estimates: mean of x = 7.6, mean of y = 5.3

Page 29: nonparametric lecture.ppt

Power comparison with continuous normal data

tort = 1 74 79 81 100 121 hare = 4 9 16 17 18 144 Wilcoxon

W = 25, p = 0.31 t.test

t.test(tort, hare, var.equal = TRUE) t(10) = 1.5, p = 0.16

Page 30: nonparametric lecture.ppt

Wilcoxon signed-rank test (related samples)

Same idea as MW U, generalized to matched samples

Equivalent to the paired (non-independent) samples t-test
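As a sketch of the ranking step: take the paired differences, rank their absolute values, and sum the ranks by sign. The paired scores below are invented and chosen tie-free for simplicity; a real analysis would use R's wilcox.test(x, y, paired = TRUE).

```python
# Wilcoxon signed-rank statistic on paired data (hypothetical, tie-free)
before = [10, 12,  9, 15, 11]
after_ = [12, 11, 14, 22, 14]
diffs  = [a - b for a, b in zip(after_, before)]   # [2, -1, 5, 7, 3]

# rank the absolute differences (1 = smallest), then sum ranks by sign
order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
rank_of = {i: r for r, i in enumerate(order, start=1)}
w_plus  = sum(rank_of[i] for i, d in enumerate(diffs) if d > 0)  # 2+4+5+3 = 14
w_minus = sum(rank_of[i] for i, d in enumerate(diffs) if d < 0)  # 1
```

The test statistic is the smaller of the two sums; the two always add up to n(n + 1)/2.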

Page 31: nonparametric lecture.ppt

Kruskal-Wallis

Non-parametric one-way analysis of variance by ranks (named after William Kruskal and W. Allen Wallis)

Tests equality of medians across groups

An extension of the Mann-Whitney U test to 3 or more groups

Does not assume a normal population

Assumes population variances among groups are equal
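For tie-free data the statistic is H = 12 / (N(N + 1)) · Σ Rj²/nj − 3(N + 1), where Rj is the rank sum of group j and N the total number of observations; H is compared against a χ² distribution with (number of groups − 1) degrees of freedom. A stdlib-only Python sketch on hypothetical ranked data:

```python
# Three hypothetical groups whose pooled observations are already the ranks 1..9
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
N = sum(len(g) for g in groups)   # 9 observations in total

# H = 12 / (N*(N+1)) * sum(R_j^2 / n_j) - 3*(N+1), assuming no ties
H = 12 / (N * (N + 1)) * sum(sum(g) ** 2 / len(g) for g in groups) - 3 * (N + 1)
print(round(H, 6))  # 7.2
```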