
Örebro University

Örebro University School of Business

Statistics, Paper, Second level, 15 Credits

Supervisor: Sune Karlsson

Examiner: Thomas Laitila

Spring 2014

Another Student’s T-test

Proposal and evaluation of a modified T-test

Jonas Englund

880131


Abstract

In this paper we propose a way of performing hypothesis tests by utilizing all information that is known under the null hypothesis. W.S. Gosset, also known as Student, derived the famous Student’s T-test in the early twentieth century, and this is the paper's point of departure. It turns out that Student’s T-test does not use all of the information that is known under the null hypothesis. By using all known information we can obtain a better variance estimator than the usual one. The test based on this variance estimator is in this paper called Another Student’s Test (AST). The test is evaluated by simulation and compared to Student’s T-test. The conclusion we arrived at is that AST and Student’s T cannot be said to perform differently, at least not under the settings used in this paper. Nevertheless, AST should be investigated further, and a couple of situations where the ideas can be used are proposed.


Table of contents

1 Introduction
2 Hypothesis testing in general
2.1 Test evaluation
3 Student’s T-test
4 Another Student’s T-test
4.1 Proof of $S_0^2$ as a superior variance estimator under the null
4.2 $S_0^2$ as a maximum likelihood estimator
4.3 Formal derivation of the $T_0$-statistic
5 Probability distribution of $T_0$
5.1 Probability distribution of $T_0$ under $H_0$
5.2 Probability distribution of $T_0$ under $H_1$
6 Evaluation
6.1 Power estimation
6.2 Graphical evaluation
6.3 Non-graphical evaluation
6.4 Analysis
6.5 Summary of evaluation
7 Extensions of the test
8 Summary and conclusions
References
Appendix A: Review of method of evaluation
A.1 Code in R
Appendix B: Lyapunov’s CLT
B.1 Code in R
Appendix C: More graphs


1 Introduction

In this paper we introduce a variant of Student’s T-test which we call Another Student’s Test (AST). The rationale behind this test will be thoroughly reviewed and explained. The characteristics of the test will be examined, such as its probability distribution both under the null and under alternatives. Critical values will be derived via Monte Carlo simulation, and power in various situations will also be estimated via simulation. The test will be evaluated by comparing its estimated power with the power of the one-sample Student’s T-test. Since Student’s T-test is the gold standard for testing whether the mean of a single group is equal to some constant, given that the assumption of normality is met, we will also test the hypothesis of no difference in power between the two tests. We begin with an introduction to hypothesis testing, followed by an introduction to Student’s T-test and then the AST. This is followed by an evaluation of the tests, that is, a comparison of their performance in terms of power, and finally extensions of the test are proposed. The purpose of this paper is simply to examine AST, evaluate it and test whether it is as good as Student’s T-test, all under the assumption of a normally distributed variable.

2 Hypothesis testing in general

“A hypothesis is a statement about a population parameter”, as Casella & Berger (2002, p. 373) express it. In most cases we only have sample data, and the aim of a hypothesis test is to get an indication of whether the null can be rejected or not. In a hypothesis testing situation, a null and an alternative hypothesis are specified before the test is carried out. The null hypothesis can, in mathematical notation, be expressed as

$$H_0: \theta \in \Theta_0,$$

where the alternative hypothesis is, in general, the complement of the null, $H_1: \theta \in \Theta_0^c$. When performing hypothesis tests we assume that the null is true and evaluate whether the result we obtained is probable. In other words, we calculate or estimate the probability of obtaining the observed result, or one more extreme, given that the null is true. Two types of errors can occur when carrying out hypothesis tests: a type I error occurs when the test tells us to reject the null when the null is true (the probability of this occurrence when the null is true is denoted $\alpha$), and a type II error occurs when the test tells us not to reject the null when it is false. The two remaining possibilities are not rejecting the null when it is true, and rejecting the null when it is false; the probability of the latter is the power of the test (often denoted $1-\beta$, where $\beta$ is the probability of a type II error). The larger the probability of correctly rejecting the null, given some level of significance, the better the test is (Casella & Berger, 2002).


2.1 Test evaluation

There are many ways of evaluating tests. In this paper we evaluate, using simulation-based methods, whether AST is as powerful as Student’s T-test. See section 6 for more on this topic.

3 Student’s T-test

Gosset, who published his work under the pseudonym Student, was interested in the behaviour of the probability distribution of the T-statistic in small samples. Early in the twentieth century many statisticians did not distinguish between the true population variance, $\sigma^2$, and the estimated variance, $S^2$. Gosset worked as a brewer in the research department of the Guinness brewery, where research was often based on small samples. Gosset therefore started his work on Student’s T-test, with some help from another famous statistician, Ronald Fisher (Box, 1987).

Gosset and his team at the brewery carried out experiments, and when they treated the sample variance as the population variance they found that the results were not trustworthy. This in turn led him to work out the distribution of the statistic

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}},$$

which he found to have the probability density function

$$f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)}\left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}},$$

where $\Gamma(\cdot)$ denotes the gamma function and $\nu = n - 1$ the degrees of freedom. This finding made it possible to test hypotheses reliably in small samples (Box, 1987; Casella & Berger, 2002).
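The density above can be evaluated directly with the gamma function. The Python sketch below (standard library only; the helper names `student_t_pdf` and `trapezoid` are illustrative, not from the paper) implements the formula and checks two known facts: with one degree of freedom it reduces to the standard Cauchy density, which equals $1/\pi$ at zero, and it integrates to one.

```python
import math


def student_t_pdf(t: float, nu: float) -> float:
    """Density of Student's t-distribution with nu degrees of freedom."""
    # Normalising constant Gamma((nu + 1) / 2) / (sqrt(nu * pi) * Gamma(nu / 2)).
    # math.gamma overflows for arguments above about 171, so keep nu moderate.
    const = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return const * (1 + t * t / nu) ** (-(nu + 1) / 2)


def trapezoid(f, a: float, b: float, steps: int = 100_000) -> float:
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for k in range(1, steps):
        total += f(a + k * h)
    return total * h


if __name__ == "__main__":
    # nu = 1 is the standard Cauchy distribution, whose density at 0 is 1/pi.
    print(student_t_pdf(0.0, 1), 1 / math.pi)
    # With nu = 5 almost all probability mass lies in [-50, 50].
    print(trapezoid(lambda t: student_t_pdf(t, 5), -50.0, 50.0))
```

For large $\nu$ the density approaches the standard normal density, in line with the asymptotic behaviour discussed later for $T_0$.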

4 Another Student’s T-test

This test is a modified one-sample Student’s T-test. An assumption used throughout this section is that $X$ is normally distributed and that the sampled observations are independent and identically distributed (i.i.d.). The ordinary T-test has the following form (Casella & Berger, 2002)

$$T = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}, \qquad (1)$$

where $\bar{X}$ is the sample mean, $\mu_0$ is the hypothesized population mean (expected value) and $S^2$ is the estimated population variance, defined as

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2.$$

The hypotheses in such a test are, in the simplest case, stated as

$$H_0: \mu = \mu_0 \quad \text{versus} \quad H_1: \mu \neq \mu_0. \qquad (2)$$

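When the null hypothesis is true, the test statistic in (1) follows a central t-distribution with $n-1$ degrees of freedom if $X$ is normally distributed (Casella & Berger, 2002). Below, a modified T-test is introduced that uses a variance estimator which, under the null hypothesis, is superior to $S^2$:

$$T_0 = \frac{\bar{X} - \mu_0}{S_0/\sqrt{n}}, \qquad (3)$$

where

$$S_0^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \mu_0\right)^2.$$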

The logical foundation for this test is similar to that of the score test for a binomially distributed variable, for which the test statistic is

$$Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}; \qquad (4)$$

this test statistic is asymptotically standard normal under the null (Casella & Berger, 2002). In this test, the standard error of $\hat{p}$ (the denominator above) is a function of $p$, which equals $p_0$ if the null is true. Following the same idea, we can apply the methodology of (4) to a normally distributed variable, $X$. We begin by showing that, when the null is true, the usual variance estimator $S^2$ is an inferior estimator of $\sigma^2$ compared to $S_0^2$. The usual variance estimator, $S^2$, has expected value $\sigma^2$ and variance $2\sigma^4/(n-1)$ (Wackerly, Mendenhall & Scheaffer, 2002). Ghosh (1979) gives a thorough review of the score statistic in (4). He provides evidence that the power functions of the score test and the usual approximate Z-test cross one another; we therefore hypothesize that $T_0$ may likewise be more powerful than $T$ in some regions. In order to use this modified T-test we need to find the distribution of $T_0$, or at least its critical value at level $\alpha$ for a given sample size.
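To make the contrast in (4) concrete, the sketch below computes the score statistic, whose standard error is evaluated at the hypothesized $p_0$, next to the usual approximate Z-statistic, whose standard error is evaluated at the estimate $\hat{p}$. It is written in Python with only the standard library; the function name and the example counts are hypothetical.

```python
import math


def binomial_z_statistics(x: int, n: int, p0: float) -> tuple[float, float]:
    """Return (score, wald) statistics for H0: p = p0 given x successes in n trials."""
    p_hat = x / n
    # Score test (4): the standard error uses the value p0 known under the null.
    score = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    # Usual approximate Z-test: the standard error uses the estimate p_hat
    # (undefined when p_hat is 0 or 1).
    wald = (p_hat - p0) / math.sqrt(p_hat * (1 - p_hat) / n)
    return score, wald


if __name__ == "__main__":
    # Hypothetical data: 7 successes in 10 trials, testing p0 = 0.5.
    print(binomial_z_statistics(7, 10, 0.5))
```

The only difference between the two statistics is which value of $p$ enters the standard error, which is exactly the change that turns $S^2$ into $S_0^2$ in (3).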

This test can be argued to be more rational than Student’s T-test, since it uses more information: information that is not known in general (otherwise no test would be needed), but that is known under the null hypothesis, which we assume to be true.
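A minimal Python sketch of (1) and (3), using only the standard library, may help fix ideas; the sample values and function names are hypothetical. Since $nS_0^2 = (n-1)S^2 + n(\bar{X} - \mu_0)^2$, the two statistics satisfy the identity $T_0 = T\sqrt{n/(n - 1 + T^2)}$. For a fixed $n$, $T_0$ is therefore a strictly increasing function of $T$, so a critical value for one statistic maps directly to a critical value for the other. The sketch checks this identity numerically.

```python
import math
import random
from statistics import mean, stdev


def student_t(xs: list[float], mu0: float) -> float:
    """Ordinary one-sample T-statistic (1); stdev uses the divisor n - 1."""
    return (mean(xs) - mu0) / (stdev(xs) / math.sqrt(len(xs)))


def ast_t0(xs: list[float], mu0: float) -> float:
    """AST statistic (3), with S_0^2 = (1/n) * sum((x_i - mu0)^2)."""
    n = len(xs)
    s0_sq = sum((x - mu0) ** 2 for x in xs) / n
    return (mean(xs) - mu0) / math.sqrt(s0_sq / n)


def t0_from_t(t: float, n: int) -> float:
    """Map Student's T to T_0 through T_0 = T * sqrt(n / (n - 1 + T^2))."""
    return t * math.sqrt(n / (n - 1 + t * t))


if __name__ == "__main__":
    xs = [5.1, 4.8, 5.6, 5.3, 4.9, 5.4]  # hypothetical sample
    t, t0 = student_t(xs, 5.0), ast_t0(xs, 5.0)
    print(t, t0, t0_from_t(t, len(xs)))

    # Check the identity on many simulated samples (true mean 0.3, mu0 = 0).
    rng = random.Random(2014)
    worst = 0.0
    for _ in range(1000):
        n = rng.randint(2, 30)
        sample = [rng.gauss(0.3, 1.0) for _ in range(n)]
        gap = abs(ast_t0(sample, 0.0) - t0_from_t(student_t(sample, 0.0), n))
        worst = max(worst, gap)
    print("largest discrepancy:", worst)
```

The identity also shows that $|T_0| < \sqrt{n}$, which for $n = 1$ is consistent with the result in section 5.1 that $T_0$ can only take the values $\pm 1$.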

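4.1 Proof of $S_0^2$ as a superior variance estimator under the null

To begin with, the following moments of $X \sim N(\mu, \sigma^2)$ have to be established in order to complete the proof¹:

$$E[X] = \mu,$$

$$E[X^2] = \sigma^2 + \mu^2,$$

$$E[X^3] = \mu^3 + 3\mu\sigma^2,$$

and

$$E[X^4] = \mu^4 + 6\mu^2\sigma^2 + 3\sigma^4.$$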

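The following general results, for random variables $U$ and $V$ and a constant $a$, are also used in the proof:

$$\operatorname{Var}[U + V] = \operatorname{Var}[U] + \operatorname{Var}[V] + 2\operatorname{Cov}[U, V],$$

$$\operatorname{Var}[aU] = a^2\operatorname{Var}[U].$$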

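We can now start by proving that this variance estimator is unbiased under the null:

$$E[S_0^2] = E\left[\frac{1}{n}\sum_{i=1}^{n}(X_i - \mu_0)^2\right] = E[X^2] - 2\mu_0E[X] + \mu_0^2 = \sigma^2 + \mu^2 - 2\mu\mu_0 + \mu_0^2 = \sigma^2 + (\mu - \mu_0)^2.$$

Under $H_0: \mu = \mu_0$ this is equal to $\sigma^2$, and it follows that $E[S_0^2 \mid H_0] = \sigma^2$. So, when the null is true, this variance estimator is unbiased. Next is a proof of its superior (that is, lower) variance,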

1 See Bryc (1995) for a derivation of these results.


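$$\begin{aligned}
\operatorname{Var}[S_0^2] &= \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}\left[(X_i - \mu_0)^2\right] = \frac{1}{n}\operatorname{Var}\left[X^2 - 2\mu_0X + \mu_0^2\right] \\
&= \frac{1}{n}\left(\operatorname{Var}[X^2] + 4\mu_0^2\operatorname{Var}[X] - 4\mu_0\operatorname{Cov}[X^2, X]\right),
\end{aligned} \qquad (5)$$

where the first equality uses the independence of the observations.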

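In order to proceed from here we need to establish a few intermediate results. Using the moments above, we have that

$$\operatorname{Var}[X^2] = E[X^4] - \left(E[X^2]\right)^2 = \mu^4 + 6\mu^2\sigma^2 + 3\sigma^4 - (\sigma^2 + \mu^2)^2 = 2\sigma^4 + 4\mu^2\sigma^2,$$

and

$$\operatorname{Var}[X] = \sigma^2,$$

and

$$\operatorname{Cov}[X^2, X] = E[X^3] - E[X^2]E[X] = \mu^3 + 3\mu\sigma^2 - (\sigma^2 + \mu^2)\mu = 2\mu\sigma^2.$$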

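By inserting these results into (5) and simplifying, we arrive at

$$\operatorname{Var}[S_0^2] = \frac{1}{n}\left(2\sigma^4 + 4\mu^2\sigma^2 + 4\mu_0^2\sigma^2 - 8\mu_0\mu\sigma^2\right) = \frac{2\sigma^4 + 4\sigma^2(\mu - \mu_0)^2}{n}.$$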


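Now, by setting $\mu = \mu_0$, this expression simplifies to

$$\operatorname{Var}[S_0^2 \mid H_0] = \frac{2\sigma^4}{n} < \frac{2\sigma^4}{n-1} = \operatorname{Var}[S^2].$$

We have now established that $S_0^2$ is a superior variance estimator under the null.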
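The moment results can be sanity-checked by simulation. The Python sketch below (standard library only; the function name, sample size, seeds and number of replications are arbitrary choices) draws repeated normal samples and compares the Monte Carlo mean and variance of $S^2$ and $S_0^2$ with $E[S_0^2] = \sigma^2 + (\mu-\mu_0)^2$, $\operatorname{Var}[S^2] = 2\sigma^4/(n-1)$ and $\operatorname{Var}[S_0^2] = (2\sigma^4 + 4\sigma^2(\mu-\mu_0)^2)/n$. It illustrates the results; it does not prove them.

```python
import random


def mc_moments(n: int, mu: float, mu0: float, sigma: float, reps: int, seed: int = 1):
    """Monte Carlo (mean, variance) of S^2 and of S_0^2 over `reps` normal samples."""
    rng = random.Random(seed)
    s2_draws, s02_draws = [], []
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        s2_draws.append(sum((x - xbar) ** 2 for x in xs) / (n - 1))  # S^2
        s02_draws.append(sum((x - mu0) ** 2 for x in xs) / n)  # S_0^2

    def mean_var(values):
        m = sum(values) / len(values)
        return m, sum((v - m) ** 2 for v in values) / (len(values) - 1)

    return mean_var(s2_draws), mean_var(s02_draws)


if __name__ == "__main__":
    n, sigma, reps = 5, 1.0, 30_000
    # Under the null (mu = mu0) the theory gives Var[S^2] = 0.5 and Var[S_0^2] = 0.4.
    (_, var_s2), (_, var_s02) = mc_moments(n, 0.0, 0.0, sigma, reps)
    print(var_s2, var_s02)
    # With mu - mu0 = 1 the theory gives E[S_0^2] = 2 and Var[S_0^2] = (2 + 4) / 5 = 1.2.
    _, (mean_alt, var_alt) = mc_moments(n, 1.0, 0.0, sigma, reps, seed=2)
    print(mean_alt, var_alt)
```

Under the alternative, $S_0^2$ is biased upwards by $(\mu-\mu_0)^2$, which is why its advantage is claimed only under the null.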

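4.2 $S_0^2$ as a maximum likelihood estimator

We can also show that $S_0^2$ is the maximum likelihood estimator of $\sigma^2$ when the mean is known to be $\mu_0$, as it is under the null. For an introduction to maximum likelihood estimation, see, for example, Casella and Berger (2002). Consider the pdf of $X$, which is the pdf of a normally distributed variable, and the likelihood built from it, from which we solve for $\sigma^2$ as described next:

$$f(x \mid \mu_0, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x - \mu_0)^2}{2\sigma^2}\right).$$

We have that

$$\frac{\partial}{\partial\sigma^2}\log L(\sigma^2 \mid \mathbf{x}) = \frac{\partial}{\partial\sigma^2}\left(-n\log\sqrt{2\pi} - n\log\sqrt{\sigma^2} - \sum_{i=1}^{n}\frac{(x_i - \mu_0)^2}{2\sigma^2}\right) = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i - \mu_0)^2;$$

setting this equal to zero and solving for $\sigma^2$, we get

$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu_0)^2 = S_0^2.$$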

We have now established that this estimator is the maximum likelihood estimator and that, under the null, it has a smaller variance than the usual variance estimator. It can also be shown that this estimator is in fact the best unbiased estimator, that is, the unbiased estimator with the least variance. To prove this it suffices to show that its variance equals the Cramér-Rao lower bound, $2\sigma^4/n$; the proof is omitted here, and we instead refer to page 340 in Casella and Berger (2002). Now that arguments for using $S_0^2$ instead of $S^2$ when testing the hypothesis in (2) have been given, we proceed to a more formal derivation of the test in (3).
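As a numerical illustration rather than a proof, the Python sketch below maximizes the log-likelihood in $\sigma^2$ by golden-section search and compares the maximizer with $S_0^2$. Golden-section search is a technique chosen here for simplicity; it applies because the log-likelihood is unimodal in $\sigma^2$ (its derivative is positive below $S_0^2$ and negative above). The data, seeds and search interval are hypothetical.

```python
import math
import random


def log_likelihood(sigma_sq: float, xs: list[float], mu0: float) -> float:
    """Normal log-likelihood of sigma^2 when the mean is fixed at mu0."""
    n = len(xs)
    ss = sum((x - mu0) ** 2 for x in xs)
    return -0.5 * n * math.log(2 * math.pi * sigma_sq) - ss / (2 * sigma_sq)


def golden_section_max(f, lo: float, hi: float, tol: float = 1e-10) -> float:
    """Locate the maximiser of a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2  # about 0.618
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d  # the maximiser lies in [a, d]
        else:
            a = c  # the maximiser lies in [c, b]
    return (a + b) / 2


if __name__ == "__main__":
    rng = random.Random(7)
    mu0 = 0.0
    xs = [rng.gauss(mu0, 2.0) for _ in range(20)]  # hypothetical sample under H0
    s0_sq = sum((x - mu0) ** 2 for x in xs) / len(xs)
    mle = golden_section_max(lambda v: log_likelihood(v, xs, mu0), 1e-6, 100.0)
    print(s0_sq, mle)
```

The two printed numbers should agree up to the limits of floating-point comparison near the flat top of the likelihood.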


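4.3 Formal derivation of the $T_0$-statistic

A Wald statistic is asymptotically a standard normal random variable and can, following Casella and Berger’s (2002) terminology, be constructed as

$$Z_n = \frac{W_n - \theta_0}{S_n}, \qquad S_n = \left(-\frac{\partial^2}{\partial\theta^2}\log L(\theta \mid \mathbf{x})\Big|_{\theta=\hat{\theta}}\right)^{-1/2},$$

where $S_n$ is the standard error of $W_n$ and $W_n$ is an estimate of $\theta$. If $W_n$ is the MLE $\hat{\theta}$, the reciprocal of the observed information number (the quantity in parentheses) is a reasonable estimate of $\operatorname{Var}[W_n]$ (Casella & Berger, 2002). Therefore, we can derive the AST as a Wald statistic. With $\theta = \mu$, $\hat{\theta} = \bar{X}$ and $\sigma^2$ treated as known, the derivation is based on the following results:

$$\log L(\mu \mid \mathbf{x}, \sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2,$$

$$\frac{\partial}{\partial\mu}\log L(\mu \mid \mathbf{x}, \sigma^2) = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu),$$

$$-\frac{\partial^2}{\partial\mu^2}\log L(\mu \mid \mathbf{x}, \sigma^2) = \frac{n}{\sigma^2},$$

from which we can see that

$$Z_n = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}.$$

If we estimate $\sigma^2$ with $S^2$ we obtain the usual T-test. But, as shown above, we have an estimator of $\sigma^2$ that is better than $S^2$ when the null is true, hence the use of $S_0^2$, which gives $T_0$ in (3). We know the asymptotic distribution of this statistic (standard normal), but not its distribution in finite samples, which is examined in the next section.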

5 Probability distribution of $T_0$

In this section we visually present estimated probability distributions of $T_0$ under various circumstances.

5.1 Probability distribution of $T_0$ under $H_0$

The estimated probability distributions are displayed in histograms based on simulation, where the number of runs is denoted $R$. To begin with, estimated probability distributions are given for $T_0$ under the null for several sample sizes. When estimating the probability distribution for a given sample size under the null, we generate a sample under the null and save the $T_0$-statistic, which is repeated $R$ times. We can begin by noting the following in the case of a sample size of one, and then turn to distributions for more interesting sample sizes:

$$T_0 = \frac{X_1 - \mu_0}{\sqrt{(X_1 - \mu_0)^2}} = \frac{X_1 - \mu_0}{|X_1 - \mu_0|} = \begin{cases} 1 & \text{if } X_1 > \mu_0, \\ -1 & \text{if } X_1 < \mu_0. \end{cases}$$


Figure 1: Probability distribution of $T_0$ under $H_0$.

Figure 2: Probability distribution of $T_0$ under $H_0$.

Figure 3: Probability distribution of $T_0$ under $H_0$.

Figure 4: Probability distribution of $T_0$ under $H_0$.

Figure 5: Probability distribution of $T_0$ under $H_0$.


Figure 6: Probability distribution of $T_0$ under $H_0$. A fitted standard normal density curve is also shown in the figure.

As seen above, the distribution for small samples is somewhat peculiar, while it seems to follow a Gaussian distribution in the “asymptotic” case, as expected.

5.2 Probability distribution of $T_0$ under $H_1$

In this section estimated probability distributions are displayed for the same sample sizes as in the previous section. Distributions are also displayed for several values of $d$, where $d$ is a standard measure of effect size called Cohen’s d (Cohen, 1992), defined by

$$d = \frac{\mu - \mu_0}{\sigma}.$$

Estimated probability distributions under different circumstances are displayed next.


Figure 7: Probability distribution of $T_0$ under $H_1$, $n = 2$.

Figure 8: Probability distribution of $T_0$ under $H_1$, $n = 2$.

Figure 9: Probability distribution of $T_0$ under $H_1$, $n = 3$.

Figure 10: Probability distribution of $T_0$ under $H_1$, $n = 3$.

Figure 11: Probability distribution of $T_0$ under $H_1$, $n = 7$.

Figure 12: Probability distribution of $T_0$ under $H_1$, $n = 7$.


Figure 13: Probability distribution of $T_0$ under $H_1$.

Figure 14: Probability distribution of $T_0$ under $H_1$.

Figure 15: Probability distribution of $T_0$ under $H_1$.

Figure 16: Probability distribution of $T_0$ under $H_1$.

Figure 17: Probability distribution of $T_0$ under $H_1$.

Figure 18: Probability distribution of $T_0$ under $H_1$.

Figures where Cohen’s d is positive are not displayed, since they look the same but mirrored about zero. Histograms for positive values of $d$ can be given upon request.

6 Evaluation

In the evaluation of the tests we have to consider various circumstances, such as different sample sizes and effect sizes. Comparisons between the tests are mainly displayed graphically with power function plots, with the exact power of Student’s T-test on the x-axis and the estimated difference in power between the tests on the y-axis. The evaluation is made for all sample sizes between 2 and 30.

6.1 Power estimation

Since we do not know the probability distribution function of AST, its critical values have to be estimated. We estimate critical values under $H_0$, and when estimating power, the null is rejected if the test statistic takes on a value in the rejection region. For estimation of critical values, $R_c = 10^7$ replications have been used, and for power estimation, $R_p$ replications. The method is outlined below:

(1) Generate a vector of sample size $n$, $\mathbf{x} = (x_1, \dots, x_n)$, where $X_i \sim N(\mu_0, \sigma^2)$, $i = 1, \dots, n$.

(2) Calculate $T_0$.

(3) Repeat (1)-(2) $R_c$ times.

(4) Calculate the critical value $c$ as the empirical $(1-\alpha)$ quantile of the simulated values of $|T_0|$, that is, the upper $\alpha/2$ point of the distribution of $T_0$.

(5) Generate $\mathbf{x} = (x_1, \dots, x_n)$, where $X_i \sim N(\mu_0 + d\sigma, \sigma^2)$, $i = 1, \dots, n$.

(6) Calculate $T_0$.

(7) Repeat (5)-(6) $R_p$ times.

(8) Count the proportion of times $|T_0| > c$, which is the power estimate.

When estimating the critical value we assume a symmetric distribution. Moreover, when estimating the power of $T_0$ for different $n$ and $d$, the random numbers are generated independently of each other.
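Steps (1)-(8) can be sketched in Python with the standard library as below. This is not the paper's R code (Appendix A): the function names, the seed and the replication counts, which are far smaller than the $R_c = 10^7$ used in the paper, are hypothetical choices for a quick run. As a check, the sketch uses $T_0 = T\sqrt{n/(n-1+T^2)}$, which follows from $nS_0^2 = (n-1)S^2 + n(\bar{X}-\mu_0)^2$; the simulated two-sided 5 percent critical value for $n = 10$ should then be close to $t_{0.975,9} \approx 2.262157$ mapped through this identity.

```python
import math
import random


def ast_statistic(xs: list[float], mu0: float) -> float:
    """T_0 from (3): (xbar - mu0) / (S_0 / sqrt(n)), with S_0^2 = mean((x - mu0)^2)."""
    n = len(xs)
    xbar = sum(xs) / n
    s0_sq = sum((x - mu0) ** 2 for x in xs) / n
    return (xbar - mu0) / math.sqrt(s0_sq / n)


def estimate_critical_value(n, alpha, reps, rng, mu0=0.0, sigma=1.0):
    """Steps (1)-(4): empirical (1 - alpha) quantile of |T_0| under H0."""
    abs_t0 = sorted(
        abs(ast_statistic([rng.gauss(mu0, sigma) for _ in range(n)], mu0))
        for _ in range(reps)
    )
    # Assuming symmetry, this is the upper alpha/2 point of T_0.
    return abs_t0[math.ceil((1 - alpha) * reps) - 1]


def estimate_power(n, d, c, reps, rng, mu0=0.0, sigma=1.0):
    """Steps (5)-(8): share of samples with |T_0| > c when mu = mu0 + d * sigma."""
    mu = mu0 + d * sigma
    rejections = sum(
        abs(ast_statistic([rng.gauss(mu, sigma) for _ in range(n)], mu0)) > c
        for _ in range(reps)
    )
    return rejections / reps


if __name__ == "__main__":
    rng = random.Random(2014)
    c = estimate_critical_value(n=10, alpha=0.05, reps=30_000, rng=rng)
    t_crit = 2.262157  # t_{0.975, 9}
    print("simulated c:", c, "mapped c:", t_crit * math.sqrt(10 / (9 + t_crit**2)))
    print("estimated power at d = 1:", estimate_power(10, 1.0, c, 10_000, rng))
```

Sorting all simulated values is the simplest way to get the quantile; with $10^7$ replications a selection algorithm or a vectorized language would be the more practical choice.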

6.2 Graphical evaluation

In this section we evaluate the power of AST graphically by plotting the estimated difference in power between AST and Student’s T-test². This is done for each sample size from 2 to 30. In each figure, the estimated difference in power is computed for each value of the power of Student’s T-test ($\pi_T$) from $\alpha + 0.01$ to $0.99$ in steps of $0.01$. In other words, the estimated difference is displayed for $\pi_T$ equal to 0.02, 0.03, …, 0.99 when the alpha level is 0.01. See, for example, the figure below:

2 For a thorough review of how this is done, see Appendix A.


Figure 19: The estimated difference in power between AST and Student’s T-test for a two-sided hypothesis. Dashed lines are exact 5 percent critical values under the null.

In the example above, the difference in power is estimated for $\pi_T \in \{\alpha + 0.01, \alpha + 0.02, \dots, 0.99\}$. For each value of $\pi_T$, the corresponding Cohen’s $d$ is obtained, the power of AST is estimated via $R_p$ simulation runs given that $d$, and the estimated difference in power is plotted. This is repeated for all values of $\pi_T$ described right before Figure 19.

The values of $d$ that give the desired power for Student’s T-test were derived, calculated exactly rather than estimated, by specifying sample size, power, type of test and direction of the hypothesis. For a two-sided test, $d$ is obtained by finding the value of $\lambda$ in the equation below that gives the area, $\pi_T$, outside the critical values,

$$\pi_T = P\left(T' < -t_{1-\alpha/2,\,n-1}\right) + P\left(T' > t_{1-\alpha/2,\,n-1}\right) = F_{n-1,\lambda}\left(-t_{1-\alpha/2,\,n-1}\right) + 1 - F_{n-1,\lambda}\left(t_{1-\alpha/2,\,n-1}\right),$$

where $F_{\nu,\lambda}$ is the distribution function of the noncentral t-distribution with $\nu$ degrees of freedom and non-centrality parameter $\lambda$. A noncentral t-distributed variable can be characterized by

$$T' = \frac{Z + \lambda}{\sqrt{V/\nu}},$$

where $Z$ is standard normal and $V$ is an independent $\chi^2$-distributed random variable with $\nu$ degrees of freedom. In this particular case, $\lambda$ is equal to $d\sqrt{n}$. Since $\pi_T$ is determined by $\lambda$, $n$ and $\alpha$, where $\lambda$ is the only unknown parameter, the equation can be solved for $\lambda$ and hence for $d$. Fortunately, the more than 16 thousand values of $d$ used in this paper were not calculated by hand: G*Power 3.1.7 was used to obtain values of $d$ for different sample sizes and power.
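The noncentral t characterization above also allows a quick Monte Carlo check of an exact power value when software such as G*Power is not at hand. The Python sketch below (standard library only; the function names, seeds and replication counts are hypothetical) estimates two-sided power for $n = 10$, $d = 1$ and $\alpha = 0.05$ in two ways: by drawing $T' = (Z + \lambda)/\sqrt{V/\nu}$ with $\lambda = d\sqrt{n}$, and by computing Student’s T on simulated normal samples. It hardcodes $t_{0.975,9} \approx 2.262157$ because the standard library has no t quantile function.

```python
import math
import random

T_CRIT = 2.262157  # t_{0.975, 9}: two-sided alpha = 0.05 with n = 10


def power_noncentral_t(n, d, t_crit, reps, rng):
    """Power from draws of T' = (Z + lambda) / sqrt(V / nu), lambda = d * sqrt(n)."""
    nu, lam = n - 1, d * math.sqrt(n)
    hits = 0
    for _ in range(reps):
        z = rng.gauss(0.0, 1.0)
        v = rng.gammavariate(nu / 2, 2.0)  # chi-square with nu degrees of freedom
        if abs((z + lam) / math.sqrt(v / nu)) > t_crit:
            hits += 1
    return hits / reps


def power_student_t(n, d, t_crit, reps, rng, sigma=1.0):
    """Power from Student's T on N(d * sigma, sigma^2) samples, testing mu0 = 0."""
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(d * sigma, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
        if abs(xbar / (s / math.sqrt(n))) > t_crit:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    rng = random.Random(3)
    print(power_noncentral_t(10, 1.0, T_CRIT, 20_000, rng))
    print(power_student_t(10, 1.0, T_CRIT, 20_000, rng))
```

The two estimates should agree up to Monte Carlo error; an exact value would require numerical integration of the noncentral t density, which is what dedicated software provides.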

Each power estimate in Figure 19 is computed pseudo-independently³ of every other estimate. The fact that they are pseudo-independent enables us to carry out a simple test of the hypothesis

$$H_0: \pi_{AST} = \pi_T \quad \text{versus} \quad H_1: \pi_{AST} \neq \pi_T,$$

which is treated further in section 6.4. In Figure 19 we can also see dashed lines; these are 5 percent critical values for the difference in power when the null is true (that is, when the tests have equal power), calculated as the 2.5th and 97.5th percentiles of $\hat{\pi}_{AST} - \pi_T$ when $R_p\hat{\pi}_{AST} \sim \text{Bin}(R_p, \pi_T)$.

As we can see in the following sections, the difference in power between the tests appears to be nonexistent. Only a portion of the results is displayed in the following section; see Appendix C for the rest. Some more power function graphs are displayed next.

3 The only dependency between the power estimates is that they are based on the same estimated critical value, but since the critical value is estimated from ten million simulations, the power estimates can be regarded as almost independent of each other, hence the term “pseudo-independent”.


Figure 20: The estimated difference in power between AST and Student’s T-test for a two-sided hypothesis. Dashed lines are exact 5 percent critical values under the null.

Figure 21: The estimated difference in power between AST and Student’s T-test for a two-sided hypothesis. Dashed lines are exact 5 percent critical values under the null.


Figure 22: The estimated difference in power between AST and Student’s T-test for a one-sided hypothesis. Dashed lines are exact 5 percent critical values under the null.

All of the figures above indicate that there is no difference in power between the evaluated tests. The same pattern can also be seen in many other settings; see Appendix C for power function plots for sample sizes from 2 to 30, significance levels 0.01, 0.05 and 0.1, and both one- and two-sided hypotheses.

6.3 Non-graphical evaluation

The distribution of the estimated difference in power is displayed next. Visually it may seem to follow a normal distribution, but this is not the case.


Figure 20: Estimated distribution of the difference in power between AST and Student’s T-test.

Next is a table with descriptive statistics of the estimated difference in power between the tests. The kurtosis estimate (3.53, compared with 3 for a normal distribution) indicates that the distribution is non-normal. It is also apparent that the estimated expected value is in favor of Student’s T, but not significantly so, as we will see in the next section.

Mean Variance Skewness Kurtosis Observations

-0.00000141 0.0000179 0.0134 3.53 16298

Table 1: Descriptive statistics of the estimated difference in power

6.4 Analysis

So far there does not seem to be any difference whatsoever between the tests. Fortunately, we can test this. It is tempting to think that we can test whether the tests perform equally well using a normal large-sample test such as

$$Z = \frac{\bar{D}}{s_D/\sqrt{N}}, \qquad \bar{D} = \frac{1}{N}\sum_{i=1}^{N} D_i.$$

The numerator above is the mean sample difference in power between $T$ and AST, and the denominator is the standard error of $\bar{D}$. By the classical central limit theorem, this statistic is asymptotically standard normal if the null is true and the $D_i$ are independent with $E[D_i] = 0$ and a common variance $\operatorname{Var}[D_i] = \sigma_D^2$, where $D_i$ is the difference in power between $T$ and AST for observation $i$. The common-variance condition does not hold in this case, since the variance of $D_i$ is equal to $\pi_i(1-\pi_i)/R_p$, where $\pi_i$ is the power at point $i$ and can be anything from 0.02 to 0.99. The statistic can, however, still be asymptotically standard normal when the variances differ between observations⁴. So we can carry out the test described above, and from Table 1 we can, after some calculation, find the p-value for a two-sided test, which is 0.966. Another way of testing the hypothesis above is to count the number of power estimates outside of the 95 percent intervals (see section 6.2). Each of these points has probability 0.05 of being in the rejection region (i.e., outside the interval) if the null is true. From this it follows that the number of power estimates in the rejection region is binomially distributed, and thus we can make use of the fact described next (Casella & Berger, 2002):

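Let

$$Y_i = \begin{cases} 1 & \text{if power estimate } i \text{ lies outside its 95 percent interval}, \\ 0 & \text{otherwise}; \end{cases}$$

then, under the null,

$$\sum_{i=1}^{N} Y_i \sim \text{Bin}(N, 0.05), \qquad \hat{p} = \frac{1}{N}\sum_{i=1}^{N} Y_i \overset{a}{\sim} N\!\left(p, \frac{p(1-p)}{N}\right),$$

from which it follows that we can form 95 percent confidence intervals. The following is obtained when analyzing whether statistical evidence exists that $E[Y_i]$ is different from 0.05: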

Mean Lower confidence limit Upper confidence limit

0.0523 0.0489 0.0558

Table 2: Estimated expected proportion of power estimates outside their 95 percent intervals.
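The p-value of 0.966 and the limits in Table 2 can be reproduced from the summary figures under two assumptions: that the test statistic is $\bar{D}/(s_D/\sqrt{N})$ with $s_D^2$ the variance in Table 1, and that the interval in Table 2 is the normal-approximation (Wald) interval $\hat{p} \pm 1.96\sqrt{\hat{p}(1-\hat{p})/N}$. A Python sketch using only the standard library:

```python
import math


def two_sided_normal_p_value(z: float) -> float:
    """P(|Z| > |z|) for a standard normal Z, via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2))


# Table 1: mean and variance of the N = 16298 estimated differences in power.
mean_diff, var_diff, n_obs = -0.00000141, 0.0000179, 16298
z = mean_diff / math.sqrt(var_diff / n_obs)
p_value = two_sided_normal_p_value(z)

# Table 2: proportion of power estimates outside their 95 percent intervals.
p_hat = 0.0523
half_width = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n_obs)
lower, upper = p_hat - half_width, p_hat + half_width

print(f"z = {z:.4f}, p = {p_value:.3f}")
print(f"95% CI: ({lower:.4f}, {upper:.4f})")
```

The recomputed upper limit rounds to 0.0557 rather than 0.0558, presumably because the proportion reported in Table 2 is itself rounded; either way the interval contains 0.05.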

As we can see, the interval contains 0.05, so we do not have statistical evidence that Student’s T-test and AST differ in terms of power, at least not for sample sizes up to 30. We can also base a test on the number of times AST is estimated to be more powerful than Student’s T, which is a binomially distributed variable with probability of success equal to one half under the null. The p-value for this test is 0.033 for a two-sided hypothesis, so this test rejects the null hypothesis of equally powerful tests at the 5 percent level.
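The test based on the number of times AST is estimated to be more powerful is an exact sign test. The Python sketch below (standard library only; the function names are illustrative, and the underlying count is not reported in this section) computes its two-sided p-value with exact integer arithmetic. It also shows the Bonferroni check referred to in section 6.5: with three tests at an overall 5 percent level, each p-value is compared with $0.05/3 \approx 0.0167$, so 0.033 is no longer significant.

```python
def sign_test_p_value(successes: int, trials: int) -> float:
    """Exact two-sided binomial test of H0: P(success) = 1/2."""
    # Bin(trials, 1/2) is symmetric, so double the smaller tail.
    k = min(successes, trials - successes)
    term, tail = 1, 0  # term holds C(trials, i) as an exact integer
    for i in range(k + 1):
        tail += term
        term = term * (trials - i) // (i + 1)
    return min(1.0, 2 * tail / 2**trials)


def bonferroni_rejects(p_value: float, alpha: float, m: int) -> bool:
    """Reject when p_value <= alpha / m (Bonferroni correction for m tests)."""
    return p_value <= alpha / m


if __name__ == "__main__":
    print(sign_test_p_value(0, 10))  # both extreme outcomes, each with probability 1/1024
    print(bonferroni_rejects(0.033, 0.05, 1), bonferroni_rejects(0.033, 0.05, 3))
```

Python integers have arbitrary precision, so the binomial coefficients stay exact even for $N = 16298$ trials; only the final division produces a float.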

4 Identical variances across observations are not necessary, as illustrated in Appendix B with a simple simulation and a reference to Lyapunov’s central limit theorem.


6.5 Summary of evaluation

Three hypothesis tests of the same hypothesis were carried out: two of them failed to reject the null and one rejected it. Had we taken into account that multiple tests were performed, by using, say, a Bonferroni correction, we would not have obtained any significant results. Given the very large sample size and the non-significant results of two of the three tests, it is hard to argue for any practical difference between the usual T-test and the test proposed in this paper. Based on this, we can say with fair confidence that it does not matter which of the two tests discussed in this paper is used, at least under the circumstances examined here and given that the normality assumption is met.

7 Extensions of the test There are many situations where the same ideas can be used. One such situation is in

ordinary least squares regression (OLS). When testing whether a parameter is significantly

different from some value, we usually do not utilize all information known under the null.

The idea is the same as discussed in this paper: instead of estimating the standard error the

usual way, we can estimate it with information known under the null. In OLS we estimate a

model’s parameters in the following way

and the covariance matrix of our estimates are

An unbiased estimate of is

( )

(7)

where $k$ is the number of estimated parameters and $n$ is the number of observations (Greene, 2000). Now, by utilizing the same idea as in the variance estimator in AST, we can get a better estimate of the standard error of a parameter when testing hypotheses about one parameter, or about several parameters via an F-test. This is done by setting the parameter estimates that we wish to test equal to the values stated in the null hypothesis when estimating $\sigma^2$. In other words, instead of using the vector of estimated parameters,

$$\hat{\boldsymbol\beta} = (\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_{k-1})',$$

when estimating $\sigma^2$, we can use the parameter estimates along with the information known under the null. For example, if we want to test whether all parameters except the intercept are zero in a regression model with two independent variables, then $\sigma^2$ would be estimated by using the following vector instead of $\hat{\boldsymbol\beta}$:

$$\tilde{\boldsymbol\beta} = (\hat\beta_0, 0, 0)'.$$
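The following R sketch illustrates equation (7) and the null-based alternative for the two-regressor example, assuming simulated data (the paper gives none). `beta_tilde` keeps the OLS intercept and sets the two slopes to their null value of zero. Dividing its residual sum of squares by $n - k$, as in (7), is our assumption, since the text does not state the divisor.

```r
set.seed(1)
n <- 50
x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1 + 0.3 * x1 + rnorm(n)          # simulated data, illustrative only
X <- cbind(1, x1, x2)                 # design matrix with intercept column
k <- ncol(X)                          # number of estimated parameters

beta_hat <- solve(crossprod(X), crossprod(X, y))  # (X'X)^{-1} X'y
res <- y - X %*% beta_hat
s2 <- sum(res^2) / (n - k)                        # equation (7)
vcov_hat <- s2 * solve(crossprod(X))              # estimated Var(beta_hat)

# Null-based version for H0: beta_1 = beta_2 = 0: keep the intercept
# estimate and set the tested slopes to their null values
beta_tilde <- c(beta_hat[1], 0, 0)
res_tilde <- y - X %*% beta_tilde
s2_tilde <- sum(res_tilde^2) / (n - k)            # ASSUMPTION: same divisor as (7)

# Cross-check the usual quantities against lm()
fit <- lm(y ~ x1 + x2)
print(all.equal(unname(drop(beta_hat)), unname(coef(fit))))
print(all.equal(s2, summary(fit)$sigma^2))
print(c(s2 = s2, s2_tilde = s2_tilde))
```

Because $\hat{\boldsymbol\beta}$ minimizes the residual sum of squares, the null-based residual sum of squares can never be smaller than the usual one.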

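Another situation in which we can make use of the idea discussed in this paper is when testing for equal means in two groups with assumed equal variances. The test statistic in this case is (Wackerly, Mendenhall & Scheaffer, 2002)

$$T = \frac{\bar Y_1 - \bar Y_2}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}},$$

where $S_p^2$ is an estimate of the common population variance $\sigma^2$ and is estimated as follows:

$$S_p^2 = \frac{\sum_{i=1}^{n_1}(Y_{1i} - \bar Y_1)^2 + \sum_{i=1}^{n_2}(Y_{2i} - \bar Y_2)^2}{n_1 + n_2 - 2}.$$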
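A short R sketch of the pooled two-sample statistic above, on simulated data (an assumption, for illustration only). R's `t.test` with `var.equal = TRUE` computes the same statistic and serves as a cross-check.

```r
set.seed(2)
y1 <- rnorm(12, mean = 0, sd = 1)     # simulated group 1
y2 <- rnorm(15, mean = 0, sd = 1)     # simulated group 2
n1 <- length(y1); n2 <- length(y2)

# Pooled estimate of the common variance: deviations from each group's own mean
sp2 <- (sum((y1 - mean(y1))^2) + sum((y2 - mean(y2))^2)) / (n1 + n2 - 2)

# Two-sample T statistic for H0: mu1 = mu2
t_stat <- (mean(y1) - mean(y2)) / (sqrt(sp2) * sqrt(1 / n1 + 1 / n2))

# Cross-check against R's equal-variance t-test
t_ref <- unname(t.test(y1, y2, var.equal = TRUE)$statistic)
print(c(t_stat = t_stat, t_ref = t_ref))
```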

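Another way of estimating the population variance is to use more information, which we are able to do since we assume equal means under the null. Using this information, we can estimate $\sigma^2$ with the statistic

$$\tilde S^2 = \frac{\sum_{i=1}^{n_1}(Y_{1i} - \bar Y)^2 + \sum_{i=1}^{n_2}(Y_{2i} - \bar Y)^2}{n_1 + n_2 - 1},$$

where $\bar Y$ is the estimated total mean from both groups. In mathematical notation:

$$\bar Y = \frac{1}{n_1 + n_2}\left(\sum_{i=1}^{n_1} Y_{1i} + \sum_{i=1}^{n_2} Y_{2i}\right).$$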


In this way we obtain a better estimate of the population variance under the null.
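A sketch in R comparing the usual pooled estimator with the null-based one, on simulated data. The divisor $n_1 + n_2 - 1$ for $\tilde S^2$ is an assumption (one common mean is estimated); with it, $\tilde S^2$ is simply the sample variance of the combined data. The code computes both statistics but no p-value: with $\tilde S$ in the denominator the statistic does not follow Student's t-distribution, so critical values would have to be found separately, for instance by simulation.

```r
set.seed(3)
y1 <- rnorm(12)                      # simulated group 1 (H0 true)
y2 <- rnorm(15)                      # simulated group 2
n1 <- length(y1); n2 <- length(y2)

ybar <- mean(c(y1, y2))              # estimated total mean from both groups

# Usual pooled estimator: deviations from each group's own mean
sp2 <- (sum((y1 - mean(y1))^2) + sum((y2 - mean(y2))^2)) / (n1 + n2 - 2)

# Null-based estimator: deviations from the common mean under H0
# ASSUMPTION: divisor n1 + n2 - 1
s2_tilde <- (sum((y1 - ybar)^2) + sum((y2 - ybar)^2)) / (n1 + n2 - 1)

se_factor <- sqrt(1 / n1 + 1 / n2)
t_usual <- (mean(y1) - mean(y2)) / (sqrt(sp2) * se_factor)
t_null  <- (mean(y1) - mean(y2)) / (sqrt(s2_tilde) * se_factor)
print(c(sp2 = sp2, s2_tilde = s2_tilde, t_usual = t_usual, t_null = t_null))
```

The total sum of squares about $\bar Y$ equals the within-group sum of squares plus $\frac{n_1 n_2}{n_1 + n_2}(\bar Y_1 - \bar Y_2)^2$, so the two estimators differ only through the observed difference in group means.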

8 Summary and conclusions

The AST test has been derived and evaluated with Student's T-test as the reference point. It utilizes information known under the null hypothesis to obtain a better estimate of the population variance. By doing this we hoped to obtain a test that performed better than Student's T-test. It did not! The conclusion is that, in the settings tested in this paper, AST performed neither worse nor better than Student's T-test. A couple of extensions of the test are proposed, and these should be evaluated in further analysis.

The findings here suggest that there is no need at all to use AST instead of Student's T-test. Nor has AST been evaluated in situations where the normality assumption is not met; for those situations there are other, well-explored tests that should be used instead. Nevertheless, test situations in which we can utilize as much information from the null hypothesis as possible should be analysed further. And who knows, the idea may outperform the T-test in the regression example discussed in section 7, but probably not!


References

Box, J. F. (1987). Guinness, Gosset, Fisher, and small samples. Statistical Science, 2, 45-52.

Bryc, W. (1995). The Normal Distribution: Characterizations with Applications. Springer-Verlag.

Casella, G., & Berger, R. (2002). Statistical Inference: Second Edition. Duxbury Press.

Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155-159.

Ghosh, B. K. (1979). A comparison of some approximate confidence intervals for the binomial parameter. Journal of the American Statistical Association, 74, 894-900.

Greene, W. (2000). Econometric Analysis: Fourth Edition. Prentice-Hall, New Jersey.

Wackerly, D., Mendenhall, W., & Scheaffer, R. (2002). Mathematical Statistics with Applications: Sixth Edition. Duxbury Press.


Appendix A: Review of method of evaluation

In order to keep this section as simple as possible, we will only discuss the case where $\alpha = 0.05$; the other cases are carried out in basically the same way. To generate the plots described in 6.2, we start by obtaining values of Cohen's d for each sample size between 2 and 30, and for each point where the power of Student's T-test equals $0.06, 0.07, \ldots, 0.99$. These values are then put into a matrix

$$\mathbf{d} = \begin{bmatrix} d_{1,1} & d_{1,2} & \cdots & d_{1,29} \\ d_{2,1} & d_{2,2} & \cdots & d_{2,29} \\ \vdots & \vdots & \ddots & \vdots \\ d_{94,1} & d_{94,2} & \cdots & d_{94,29} \end{bmatrix},$$

where $d_{ij}$ is the value of Cohen's d that gives Student's T-test an exact power of $0.05 + 0.01i$ when the sample size is $n = j + 1$. The next step is to obtain the critical values of AST for each sample size and put these into a vector (stored as the one-column matrix `crit` in the code),

$$\mathbf{c} = (c_2, c_3, \ldots, c_{30})'.$$

The method of obtaining these critical values is explained in 6.2. When this is done, we can run the code given in section A.1. Only the important parts of the code are explained. In outline, the function works as follows:

1. A function that returns the estimated power of AST for a specified Cohen's d.

2. Run 1 for all power levels, that is, for all rows of $\mathbf{d}$.

3. Save the plot and the power estimates.

4. Run 2 and 3 for each sample size $n = 2, \ldots, 30$.

The matrix `crit` is used to get the correct critical value for each sample size, and the matrix `d` is used to get the correct value of Cohen's d. The code used to carry out these steps is given below and was written in R.

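A.1 Code in R

```r
# Requires two objects in the workspace (see the text above):
#   d    : matrix of Cohen's d values; row i corresponds to an exact power of
#          0.01 * (100 * alfa + i) for Student's T-test, column j to n = j + 1
#   crit : matrix whose first column holds the AST critical value for n = j + 1
prog <- function(M, alfa) {
  e <- 100 * alfa       # offset between loop index and position in tp
  f <- 100 * alfa + 1   # smallest target power, in percent
  g <- 99 - 100 * alfa  # number of target power levels
  l <- length(d[, 1])   # number of rows in d
  if ((alfa == 0.1 || alfa == 0.05 || alfa == 0.01) && g == l) {
    # Estimated power of the two-sided AST: share of N samples of size n from
    # N(mu, sd^2) for which H0: mean = muzero is rejected at critical value cri
    poweronly <- function(N, n, mu, muzero, sd, cri) {
      p <- numeric(N)
      for (i in 1:N) {
        x <- rnorm(n, mu, sd)
        s2 <- sum((x - muzero)^2) / n   # AST variance estimator
        test <- sqrt(n) * (mean(x) - muzero) / sqrt(s2)
        if (abs(test) > cri) p[i] <- 1
      }
      mean(p)
    }
    # Exact power of Student's T-test for each row of d
    tp <- numeric(g)
    for (i in f:99) tp[i - e] <- 0.01 * i
    r <- matrix(0, g, 29)
    for (j in 1:29) {                   # sample size n = j + 1
      a <- crit[j, 1]
      for (i in 1:g) {
        # Estimated AST power minus exact Student's T power
        r[i, j] <- poweronly(M, j + 1, 0, d[i, j], 1, a) - tp[i]
      }
      plot(tp, r[, j], type = "l", xlab = "Actual power for Student's T-test",
           ylab = "Estimated difference in power", ylim = c(-0.018, 0.018))
      # Approximate 95% Monte Carlo error bands for a power estimate from M draws
      curve(1.96 * sqrt(x * (1 - x) / M), lty = "dashed", add = TRUE)
      curve(-1.96 * sqrt(x * (1 - x) / M), lty = "dashed", add = TRUE)
    }
    # return(r)
  } else {
    "Wrong alfa level and/or non-conformable alfa level and matrix d"
  }
}

prog(10000, 0.05)
```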
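The code above assumes that the matrix `d` already exists. One possible way to build it, offered only as a sketch since the text does not say how `d` was computed, is to invert the power function of the one-sample T-test with R's `power.t.test`, solving for `delta` (with `sd = 1`, this is Cohen's d) given `n` and the target power. Setting `strict = TRUE` makes the two-sided power include both rejection tails, which we assume is what "exact power" means here. `make_d` is a hypothetical helper name, and its numerical accuracy for n = 2 with powers close to 1 has not been checked.

```r
# Hypothetical helper: matrix of Cohen's d values giving the two-sided
# one-sample Student's T-test an exact power tp[i] at sample size n_values[j]
make_d <- function(alfa, n_values = 2:30) {
  tp <- seq(alfa + 0.01, 0.99, by = 0.01)   # target powers, as in prog()
  d <- matrix(NA_real_, nrow = length(tp), ncol = length(n_values))
  for (j in seq_along(n_values)) {
    for (i in seq_along(tp)) {
      # delta is left unspecified, so power.t.test solves for it
      d[i, j] <- power.t.test(n = n_values[j], sd = 1, sig.level = alfa,
                              power = tp[i], type = "one.sample",
                              alternative = "two.sided", strict = TRUE)$delta
    }
  }
  d
}
```

For example, `d <- make_d(0.05)` would give a 94 by 29 matrix, matching the dimensions that `prog(10000, 0.05)` checks for.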


Appendix B: Lyapunov's CLT

In this appendix I give both a reference to Lyapunov's CLT and a simulation-based illustration showing that we can make use of a central limit theorem despite non-equal variances of the observations $X_1, \ldots, X_n$. The outline of the simulation is as follows: (1) simulate $n = 2000$ observations independently from the Bernoulli distribution, with success probabilities cycling through $p = 0.04, 0.08, \ldots, 0.80$, so that each $X_i$ has variance $p_i(1 - p_i)$; (2) calculate the test statistic described in (6); (3) repeat steps (1) and (2) 100 times and obtain the p-value from the Shapiro-Wilk normality test applied to the 100 statistics; (4) repeat steps (1), (2) and (3) 10 000 times; (5) count the number of times the p-values are below $\alpha = 0.05$. If (6) is standard normally distributed, then this simulation should yield a proportion between 0.0457 and 0.0543 95 percent of the time. The result of the simulation was 0.0488, consistent with this.
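The quoted interval can be reproduced in R, assuming it was obtained from the normal approximation to a binomial proportion: with 10 000 Shapiro-Wilk p-values and a true rejection rate of 0.05, the observed proportion should fall within $0.05 \pm 1.96\sqrt{0.05 \cdot 0.95 / 10\,000}$ about 95 percent of the time.

```r
alpha <- 0.05      # nominal level of the Shapiro-Wilk test
M <- 10000         # number of Shapiro-Wilk p-values
se <- sqrt(alpha * (1 - alpha) / M)            # standard error of the proportion
band <- alpha + c(-1, 1) * qnorm(0.975) * se   # approximate 95% interval
print(round(band, 4))

observed <- 0.0488                             # result reported above
print(observed >= band[1] && observed <= band[2])
```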

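This simulation merely illustrates the Russian mathematician Alexander Lyapunov's central limit theorem, which states that if we are dealing with independent observations $X_1, \ldots, X_n$ with means $\mu_i$ and unequal variances $\sigma_i^2$, and $s_n^2 = \sum_{i=1}^{n}\sigma_i^2$, then

$$\frac{1}{s_n}\sum_{i=1}^{n}(X_i - \mu_i) \qquad (6)$$

is asymptotically standard normally distributed if

$$\lim_{n\to\infty}\frac{1}{s_n^{2+\delta}}\sum_{i=1}^{n}\operatorname{E}\left[|X_i - \mu_i|^{2+\delta}\right] = 0$$

for some $\delta > 0$.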
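For the design simulated in B.1 the Lyapunov condition can be checked directly. Below is a sketch in R with $\delta = 1$ (our choice; any $\delta > 0$ suffices), using $\operatorname{E}|X_i - p_i|^3 = p_i(1 - p_i)^3 + (1 - p_i)p_i^3$ for a Bernoulli($p_i$) variable. Because the probabilities repeat in blocks of 20, the ratio shrinks like $1/\sqrt{n}$.

```r
p_block <- 0.04 * (1:20)   # success probabilities used in B.1

# Lyapunov ratio with delta = 1 for n Bernoulli(p_i) observations,
# with p_i cycling through p_block
lyapunov_ratio <- function(n) {
  p <- rep(p_block, length.out = n)
  var_i <- p * (1 - p)                       # sigma_i^2
  abs3_i <- p * (1 - p)^3 + (1 - p) * p^3    # E|X_i - p_i|^3
  sum(abs3_i) / sum(var_i)^(3 / 2)           # divided by s_n^(2 + delta)
}

ratios <- sapply(c(20, 200, 2000, 20000), lyapunov_ratio)
print(ratios)
```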

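B.1 Code in R

```r
# Simulation illustrating Lyapunov's CLT (see the outline above)
#   n : observations per statistic (a multiple of 20)
#   N : statistics per Shapiro-Wilk test
#   M : number of Shapiro-Wilk tests
prog <- function(n, N, M) {
  set.seed(12345)
  Z <- numeric(N); shap <- numeric(M); x <- numeric(n); q <- numeric(20)
  # Bernoulli variances p(1 - p) for p = 0.04, 0.08, ..., 0.80
  for (i in 1:20) {q[i] <- 0.04 * i * (1 - 0.04 * i)}
  s <- sqrt(10 * sum(q))
  for (j in 1:M) {
    for (a in 1:N) {
      for (k in 0:(n / 20 - 1)) {
        for (i in 1:20) {
          # Centred Bernoulli observation (sign flipped) with variance q[i]
          x[20 * k + i] <- 0.04 * i - rbinom(1, 1, 0.04 * i)
        }
      }
      # Z is a rescaled version of (6); the scale does not matter because
      # the Shapiro-Wilk test is location and scale invariant
      Z[a] <- mean(x) / s
    }
    shap[j] <- shapiro.test(Z)$p.value
  }
  # Share of Shapiro-Wilk p-values below 0.1, 0.05 and 0.01
  p1 <- shap < 0.1; p05 <- shap < 0.05; p01 <- shap < 0.01
  return(list(mean(p1), mean(p05), mean(p01)))
}

prog(2000, 100, 10000)
```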


Appendix C: More graphs

C.1 Two-sided hypothesis and a significance level of 0.1

[Graphs, one panel for each sample size n = 2, 3, ..., 30.]

C.2 Two-sided hypothesis and a significance level of 0.05

[Graphs, one panel for each sample size n = 2, 3, ..., 30.]

C.3 Two-sided hypothesis and a significance level of 0.01

[Graphs, one panel for each sample size n = 2, 3, ..., 30.]

C.4 One-sided hypothesis and a significance level of 0.1

[Graphs, one panel for each sample size n = 2, 3, ..., 30.]

C.5 One-sided hypothesis and a significance level of 0.05

[Graphs, one panel for each sample size n = 2, 3, ..., 30.]

C.6 One-sided hypothesis and a significance level of 0.01

[Graphs, one panel for each sample size n = 2, 3, ..., 30.]