Student’s t statistic

Student’s t statistic Use

Test for equality of two means E.g., compare two groups of subjects given

different treatments Test for value of a single mean

E.g., test to see if a single group of subjects differs from a known value

Also ‘matched sample’ test where a single group is compared before and after treatment (test for zero treatment effect)

Advanced Tests of significance of correlation/regression

coefficients.

Student’s t statistic Assumptions

Parent population is normal Sample observations (subjects) are

independent. Robustness

To normality: Affects Type I error and power and may lead to inappropriate interpretation. In real life, we can’t expect exactly normal data but it should not be too much skewed

Student’s t statistic Formula (single group)

Let x1, x2, ….xn be a random sample from a normal population with mean µ and variance σ2, then the following statistic is distributed as Student’s t with (n-1) degrees of freedom.

ns

xt

/

μ−=

Student’s t statistic Formula (two groups)

Case 1: Two matched samples The following statistic follows t distribution with n-1 d.f.

Where, d is the difference of two matched samples and Sd is the standard deviation of the variable d.

ns

dt

d /=


Case 2: Equal Population Standard Deviations: The following statistic is distributed as t distribution with (n1+n2 -2)

d.f.

The pooled standard deviation,

n1 and n2 are the sample sizes and S1 and S2 are the sample standard deviations of two groups.

21

21

11

)(

nnS

xxt

p +

−=

2

)1()1(

21

222

211

−+−+−

=nn

SnSnSp


Case 3: Unequal population standard deviations The following statistic follows t distribution.

The d.f. of this statistic is,

2

22

1

21

2121 )()(

ns

ns

xxt

+

−−−=

μμ

( )

1)/(

1)/(

//

2

22

22

1

21

21

2

2221

21

−+

−

+=

nns

nns

nsnsv

Student’s t statistic One-sided

There can only be on direction of effect The investigator is only interested in one

direction of effect. Greater power to detect difference in

expected direction Two-sided

Difference could go in either direction More conservative

Student’s t statistic

One group Two groups

One sided A single mean differs from a known value in a specific direction. e.g. mean > 0

Two means differ from one another in a specific direction. e.g., mean2 < mean1

Two sided A single mean differs from a known value in either direction. e.g., mean ≠ 0

Two means are not equal. That is, mean1 ≠ mean2

Student’s t statistic

SPSS One Group: Analyze>Compare Means> One-

Sample T Test Two Groups (Matched Samples):

Analyze>Compare Means> Paired Samples T Test

Two Groups: Analyze>Compare Means> Independent Samples T Test

Student’s t statistic R

The default t-test ist.test(x, y = NULL, alternative = "two.sided", mu = 0, paired = False, var.equal = FALSE, conf.level = 0.95)

Where x and y are two data for two numeric variables. We need to change only default settings matching with the case we want to perform. For example,

One Group: t.test(x, alternative=“greater”, mu=30) Two Groups (Matched Samples): t.test(x, y,

alternative= "less", mu = 0, paired = TRUE,) Two Groups: t.test(x,y, alternative=“greater”, mu=0,

var.equal = TRUE)

Student’s t-statistic

MS Excel (in Tools -> Data Analysis…) One Group: Not available Two Groups (Matched Samples):

t-Test: Paired two sample for mean Two Groups (Independent Samples):

t-Test: Two-Sample Assuming Equal Variances t-Test: Two-Sample Assuming Unequal Variances

Example 1

Consider the heights of children 4 to 12 years old in dataset 1 of our course website (variable ‘hgt’). Suppose we want to test if the average height (µ) for this age group in the population is 50 inches, using our sample of 60 children. We will use 5% level of significance.

This is a one-sample, two-sided test.

Example 1 Hypotheses:

H0: µ = 50

Ha: µ ≠ 50

Computation in Excel: Excel does not have a 1-sample test, but we can

fool it. Create a dummy column parallel to the hgt column

with an equal number of cells, all set to 0.0 Run the Matched sample test using hgt and the

dummy column and 50 as the hypothesized mean difference.

The p-value for two tail test is 0.0092

Example 1

Using SPSS: Analyze> Caompare Means >One Sample T

Test > Select hgt > Test value: 50 > ok P-value is .009

Using R, t.test(df1$hgt, mu=50) Two-tail p-value is .0092

Example 2 Suppose we want to compare the height of two groups

(hgt in each sex from dataset). H0: Mean heights are equal for the two sexes. Ha: Mean heights are not equal

Using MS-Excel: Sort data by sex (data>sort>by:sex) In Data Analysis… t-test:Two-sample Assuming equal variance select the range of hgt for all sex = f as Variable 1 Range select the range of hgt for all sex = m as Variable 2 Range P-value for two-sided test = 0.205

Example 2 Using SPSS:

Analyze>Compare Means>Independent-Samples T-test>

Select hgt as a Test Variable Select sex as a Grouping Variable In Define Groups, type f for Group 1 and m for Group

2 Click Continue then OK It gives us the p-value 0.205. We can assume equal

variance as the p-value of F statistic for testing equality of variances is 0.845.

Sign Test (Nonparametric) Use:

(1) Compare the median of a single group with a specified value (instead of single sample t-test).(2) Compare medians of two matched groups (instead of Two matched samples t-test)

Test Statistic: Number of positive difference of (median-c). The number of positive difference follows a Binomial distribution.

Sign Test (Nonparametric) SPSS: Analyze> Nonparametric Tests>

Binomial R: sign.test(x, y = NULL, md = 0,

alternative = "two.sided", conf.level = 0.95) For testing the median (md) of a single

sample, use data only for one variable. To compare paired data, use two paired

variables. NB: This test requires the BSDA package

Wilcoxon Signed-Rank Test: USE:

Compares medians of two paired samples. Test Statistic:

Consider n pairs of data of two variables x and Y, then the following statistic is known as Wilcoxon signed rank statistic.WS = Sum of the rank of positive differences after assigning ranks to the absolute value of differences.

Wilcoxon Rank-Sum Test

Use: Compares medians of two independent groups.

Test Statistic: Let, X and Y be two samples of sizes m and n. Suppose N=m+n. Compute the rank of all N observations. Then, the statistic,Wm= Sum of the ranks of all observations of variable X.

Wilcoxon Signed-Rank Test & Wilcoxon Rank-Sum Test

SPSS: Two Matched Groups: Analyze>

Nonparametric Tests> 2 Related Samples Two Groups: Analyze> Nonparametric

Tests> 2 Independent Samples

Wilcoxon Signed-Rank Test: /Wilcoxon Rank-Sum Test

R: The default test is

wilcox.test(x, y, alternative = "two.sided", mu = 0, paired = FALSE, exact = FALSE, conf.int = FALSE, conf.level = 0.95) Two matched Groups: wilcox.test(x, y, alternative =

“less", paired = TRUE) Two Groups: wilcox.test(x, y, alternative = “greater“)

Subject Hours of Sleep Difference Rank Ignoring Sign

Drug Placebo

1 6.1 5.2 0.9 3.5

2 7.0 7.9 -0.9 3.5

3 8.2 3.9 4.3 10

4 7.6 4.7 2.9 7

5 6.5 5.3 1.2 5

6 8.4 5.4 3.0 8

7 6.9 4.2 2.7 6

8 6.7 6.1 0.6 2

9 7.4 3.8 3.6 9

10 5.8 6.3 -0.5 1

3rd & 4th ranks are tied hence averaged.

P-value of this test is 0.02. Hence the test is significant at any level more than 2%, indicating the drug is more effective than placebo.

Example 3 (two matched samples)

Proportion Tests

Use Test for equality of two Proportions

E.g. proportions of subjects in two treatment groups who benefited from treatment.

Test for the value of a single proportion E.g., to test if the proportion of smokers in a

population is some specified value (less than 1)

Proportion Tests Formula

One Group:

Two Groups:

npp

ppz

)1(

ˆ

00

0

−−

=

.ˆ where

)11

)(ˆ1(ˆ

ˆˆ

21

21

21

21

nn

xxp

nnpp

ppz

++

=

+−

−=

Proportion Test SPSS:

One Group: Analyze> Nonparametric Tests> Binomial Two Groups?

R: The default tests are:

One Group: binom.test(x, n, p = 0.5, alternative = "two.sided", conf.level = 0.95)

Two Groups: prop.test(c(x,y), c(m,n), p = NULL, alternative = "two.sided", conf.level = 0.95, correct = TRUE)

X, Y are the number of successes and m and n are the sample sizes

Example 4: Proportion of males in Dataset 1 R:

n=60 and there are 30 males binom.test(30,60) returns a p-value of 1.0.

SPSS: recode sex as numeric -

Transform> Recode>Into Different Variables> Make all selections there and click on Change after recoding character variable into numeric.

Analyze> Nonparametric test> Binomial> select Test variable> Test proportion Set null hypothesis = 0.5 The p-value = 1.0

Chi-square statistic USE

Testing the population variance σ2= σ02.

Testing the goodness of fit. Testing the independence/ association of attributes

Assumptions Sample observations should be independent. Cell frequencies should be >= 5. Total observed and expected frequencies are

equal

Chi-square statistic Formula: If xi (i=1,2,…n) are independent and

normally distributed with mean µ and standard deviation σ, then,

If we don’t know µ, then we estimate it using a sample mean and then,

d.f.n on with distributi a is 2

1

2

χσ

μ∑=

⎟⎠

⎞⎜⎝

⎛ −n

i

ix

d.f. 1)-(non with distributi a is 2

1

2

χσ∑

=⎟⎠

⎞⎜⎝

⎛ −n

i

i xx

Chi-square statistic For a contingency table we use the following

chi- square test statistic,

Frequency Expected

Frequency Observed

d.f. 1)-(n with as ddistribute ,)( 2

1

22

==

−=∑

=

i

i

n

i i

ii

EO

EEO χχ

Chi-square statistic

SPSS: Analyze> Descriptive stat> Crosstabs>

statistics> Chi-square Select variables. Click on Cell button to select items you

want in cells, rows, and columns.

Example 5 (class demonstration)

Make a contingency table using two variables sex and grp from our dataset. Analyze> Descriptive statistics> crosstabs>

select variables for rows and columns Statistics> Chi-square> Continue> Cells>

selection> ok. It will give us a contingency table and p-value

of Pearson Chi-square Tests. For this particular case, the p-value of Pearson-

Chi-square test is 0.549 and d.f. is 2.

F-statistic

Use: Testing the equality of population

variances. Testing the significance of difference of

several means in analysis of variance.

F-statistic Let X and Y be two independent Chi-square variables with

n1 and n2 d.f. respectively, then the following statistic follows a F distribution with n1 and n2 d.f.

Let, X and Y are two independent normal variables with sample sizes n1 and n2. Then the following statistic follows a F distribution with n1 and n2 d.f.

Where, sx2 and sy

2 are sample variances of X and Y.

2/

1/21 , nY

nXF nn =

2

2

, 21

y

xnn s

sF =

F-statistic

Hypotheses:H0: µ1= µ2=…. =µn

Ha: µ1≠ µ2 ≠ …. ≠µn

Comparison will be done using analysis of variance (ANOVA) technique. ANOVA uses F statistic for this comparison. The ANOVA technique will be covered in another class session.

Student’s t statistic

Documents

Transcript of Student’s t statistic