SOME STATISTICAL TESTS -...
Overview
● Theory of statistical tests
● Test for a difference in mean
● Test for dependence
– Nominal variables
– Continuous variables
– Ordinal variables
● Power of a test
● Degrees of freedom
Test for a difference in mean: t-test
● Outline of the test
– What is given? Independent observations (x1, . . . , xn) and (y1, . . . , ym)
– Null hypothesis: x and y are samples from distributions having the same mean
– Test: t-test
– R command: t.test( x, y )
– Idea of the test: if the sample means are too far apart, reject the null hypothesis
– Approximate test, but rather robust
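The statistic behind t.test() can be made concrete. Below is a minimal sketch (Python, standard library only, with made-up illustrative data) of the Welch t statistic and degrees of freedom that R's t.test(x, y) computes by default (var.equal=FALSE):

```python
import math

def welch_t(x, y):
    """Welch two-sample t statistic and approximate degrees of freedom,
    mirroring the default behaviour of R's t.test(x, y)."""
    n, m = len(x), len(y)
    mx = sum(x) / n
    my = sum(y) / m
    # unbiased sample variances
    vx = sum((v - mx) ** 2 for v in x) / (n - 1)
    vy = sum((v - my) ** 2 for v in y) / (m - 1)
    se2 = vx / n + vy / m          # squared standard error of the difference
    t = (mx - my) / math.sqrt(se2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = se2 ** 2 / ((vx / n) ** 2 / (n - 1) + (vy / m) ** 2 / (m - 1))
    return t, df

# illustrative data (not from the lecture)
t, df = welch_t([1, 2, 3], [2, 4, 6])
print(round(t, 3), round(df, 3))  # -1.549 2.941
```

If |t| is large for the given df, the sample means are "too far apart" and the null hypothesis is rejected.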
Test for a difference in mean: t-test
● Ex 1: Martians
– Dataset containing heights of Martians of different colors
– Reject the null hypothesis
– It was an unpaired t-test (no dependence between the two samples)
> mars <- read.table("mars.txt", header=TRUE)
> head(mars)
      size color
1 65.67974   red
2 65.90436   red
3 67.34730   red
4 60.42924   red
5 55.34526   red
6 62.85024   red
> attach(mars)
> t.test(size[color=="green"], size[color=="blue"])

        Two Sample t-test
data: size[color == "green"] and size[color == "blue"]
t = -3.4244, df = 19.419, p-value = 0.002775
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -16.875514  -4.083647
sample estimates:
mean of x mean of y
 60.86840  71.34798
Test for a difference in mean: t-test
● Ex 2: shoe wear
– Dataset containing wear of shoes made of two materials, A and B
– Paired test, because some boys will cause more wear to their shoes than others
– Reject the null hypothesis
> data(shoes, package="MASS")
> attach(shoes)
> head(shoes)
$A
 [1] 13.2  8.2 10.9 14.3 10.7  6.6  9.5 10.8  8.8 13.3
$B
 [1] 14.0  8.8 11.2 14.2 11.8  6.4  9.8 11.3  9.3 13.6
> t.test(A, B, paired=TRUE)

        Paired t-test
data: A and B
t = -3.3489, df = 9, p-value = 0.008539
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.6869539 -0.1330461
sample estimates:
mean of the differences
                  -0.41
Test for a difference in mean: t-test
● Related tests that might be of interest
– var.test() to test for equality of variances
→ this way you can choose the var.equal option in t.test()
– shapiro.test() to test for normality, for example before computing a Pearson correlation
The null hypothesis of the Shapiro test is that the data are normally distributed
Test for dependence
● The test depends on the data type
– Nominal variables (not ordered, like eye color or gender)
– Ordinal variables (ordered but not continuous, like the result of a die)
– Continuous variables (like body height)
Test for dependence: Nominal (count) variables
● Outline of the test
– What is given? Pairwise observations (x1, y1), (x2, y2), . . . , (xn, yn)
– Null hypothesis: x and y are independent
– Test: χ2-test for independence
– R command: chisq.test( x, y ) or chisq.test( contingency.table )
– Idea of the test: calculate the expected abundances under the assumption of independence. If the observed abundances deviate too much from the expected abundances, reject the null hypothesis.
– Approximate test; see the conditions in the lecture notes
Test for dependence: Nominal (count) variables
● Ex 1: χ2-test
> contingency <- matrix( c(47,3,8,42,60,15,8,33,3), nrow=3 )
> chisq.test(contingency)$expected
[,1] [,2] [,3]
[1,] 25.689498 51.82192 19.488584
[2,] 25.424658 51.28767 19.287671
[3,] 6.885845 13.89041 5.223744
# expected abundances are all above 5, so we may apply the test
> chisq.test(contingency)
Pearson’s Chi-squared test
data: contingency
X-squared = 58.5349, df = 4, p-value = 5.892e-12
● Reject the null hypothesis that the two variables are independent
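The expected abundances that chisq.test()$expected reports can be reproduced by hand: under independence, each expected cell count is (row total × column total) / grand total. A minimal sketch in Python (standard library only), using the same contingency table as the slide:

```python
def chi_square(table):
    """Pearson chi-square statistic for an r x c contingency table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    total = sum(row_tot)
    # expected count under independence: row total * column total / grand total
    expected = [[r * c / total for c in col_tot] for r in row_tot]
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    return stat, expected

# the table from the slide: matrix(c(47,3,8,42,60,15,8,33,3), nrow=3)
table = [[47, 42, 8],
         [3, 60, 33],
         [8, 15, 3]]
stat, expected = chi_square(table)
print(round(expected[0][0], 4))  # 25.6895, matching chisq.test()$expected
print(round(stat, 4))            # 58.5349, matching X-squared above
```

With (rows − 1) × (cols − 1) = 4 degrees of freedom, this statistic yields the p-value reported by R.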
Test for dependence: Nominal (count) variables
● Fisher's exact test
– For 2×2 contingency tables
– Example:
> table <- matrix( c(14,10,21,3), nrow=2 )
> fisher.test(table)
Fisher’s Exact Test for Count Data
data: table
p-value = 0.04899
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
0.03105031 0.99446037
sample estimates:
odds ratio
0.2069884
● We reject the null hypothesis that the two variables are independent
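Fisher's test conditions on the table margins: with all margins fixed, the count in one cell follows a hypergeometric distribution, and the two-sided p-value sums the probabilities of all tables at most as probable as the observed one. A sketch in Python (standard library only), applied to the same 2×2 table; note that the sample odds ratio ad/bc differs slightly from the conditional maximum-likelihood estimate that fisher.test() reports:

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]]."""
    r1, r2 = a + b, c + d              # row totals
    c1 = a + c                         # first column total
    n = r1 + r2
    denom = comb(n, c1)
    def p(k):                          # hypergeometric probability of cell value k
        return comb(r1, k) * comb(r2, c1 - k) / denom
    p_obs = p(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # sum over all tables no more probable than the observed one;
    # the small relative tolerance guards against floating-point ties
    return sum(p(k) for k in range(lo, hi + 1) if p(k) <= p_obs * (1 + 1e-7))

# the table from the slide: matrix(c(14,10,21,3), nrow=2) -> [[14, 21], [10, 3]]
pval = fisher_exact_2x2(14, 21, 10, 3)
print(round(pval, 5))        # 0.04899, matching fisher.test()
print((14 * 3) / (21 * 10))  # sample odds ratio 0.2
```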
Test for dependence: Continuous variables
● Outline of the test
– What is given? Pairwise observations (x1, y1), (x2, y2), . . . , (xn, yn); all values in some interval are possible
– Null hypothesis: x and y are independent
– Test: Pearson's correlation test for independence
– Assumption: x and y are samples from a normal distribution
– R command: cor.test( x, y )
Test for dependence: Continuous variables
● Ex:
– Distance needed to stop from a certain speed, for cars
– Reject the null hypothesis
> data(cars)
> attach(cars)
> str(cars)
> ?cars
> plot(speed, dist)
> cor.test(speed, dist)

        Pearson's product-moment correlation
data: speed and dist
t = 9.464, df = 48, p-value = 1.49e-12
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.6816422 0.8862036
sample estimates:
      cor
0.8068949
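The statistic behind cor.test() is easy to state: the sample correlation r is transformed to t = r·sqrt(n−2)/sqrt(1−r²), which under the null hypothesis follows a t distribution with n−2 degrees of freedom. A minimal sketch in Python (standard library only, with made-up illustrative data rather than the cars dataset):

```python
import math

def pearson_test_stat(x, y):
    """Sample correlation r and the t statistic used by cor.test()."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    # under H0 (independence, normal data): t ~ t-distribution with n-2 df
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return r, t

# illustrative data
r, t = pearson_test_stat([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(round(r, 3), round(t, 3))  # 0.8 2.309
```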
Test for dependence: Ordinal variables
● Outline of the test
– What is given? Pairwise observations (x1, y1), (x2, y2), . . . , (xn, yn); values can be ordered
– Null hypothesis: x and y are uncorrelated
– Test: Spearman's rank correlation rho
– R command: cor.test( x, y, method="spearman" )
> data(cars)
> attach(cars)
> cor.test(speed, dist, method="spearman")

        Spearman's rank correlation rho
data: speed and dist
S = 3532.819, p-value = 8.825e-14
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho
0.8303568

Warning message:
In cor.test.default(speed, dist, method = "spearman") :
  Cannot compute exact p-values with ties
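Spearman's rho is simply the Pearson correlation computed on the ranks of the data, with tied values receiving the average of the ranks they occupy (ties are also the reason for the warning above). A sketch in Python (standard library only, illustrative data):

```python
import math

def ranks(values):
    """Ranks with ties replaced by average ranks (R's default)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    rk = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # find the run of tied values starting at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            rk[order[k]] = avg
        i = j + 1
    return rk

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    # rho = Pearson correlation of the rank-transformed data
    return pearson(ranks(x), ranks(y))

rho = spearman([1, 2, 3, 4, 5], [5, 6, 7, 8, 7])   # note the tie in y
print(round(rho, 4))  # 0.8208
```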
The power of a test
● Alternative hypothesis H1
– Ex: H0: µ=0 and H1: µ≠0
● 2 types of error
– Type I error (or "error of the first kind" or "α error" or "false positive"): rejecting H0 when it is true
– Type II error (or "error of the second kind" or "β error" or "false negative"): failing to reject H0 when it is not true
● Power is 1-β
– If power=0 you will never reject H0
– Ex: if the true value is close to 0, the test has almost no chance to reject H0; rather test against an alternative such as |µ| >= 0.5
● In general the power increases with sample size
– Use power.t.test() (or power.fisher.test() from the statmod package) to calculate the minimum sample size needed
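For a two-sided z-test (σ known) the power at a true mean µ can be written in closed form, which makes the dependence on sample size explicit; R's power.t.test() performs the analogous computation for the t-test. A minimal Python sketch (standard library only):

```python
from statistics import NormalDist

def z_test_power(mu, sigma, n, alpha=0.05):
    """Power of the two-sided z-test of H0: mean = 0 when the true mean is mu."""
    z = NormalDist().inv_cdf(1 - alpha / 2)    # critical value, ~1.96 for alpha=0.05
    shift = abs(mu) * (n ** 0.5) / sigma       # how far the true mean is, in SE units
    phi = NormalDist().cdf
    # probability that the test statistic falls outside [-z, z]
    return phi(shift - z) + phi(-shift - z)

# smallest n reaching 80% power for mu = 0.5, sigma = 1
n = 1
while z_test_power(0.5, 1, n) < 0.80:
    n += 1
print(n)  # 32
```

Since power increases with n, the loop finds the minimum sample size needed to detect |µ| = 0.5 with 80% power.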