Örebro University
Örebro University School of Business
Statistics, Paper, Second level, 15 Credits
Supervisor: Sune Karlsson
Examiner: Thomas Laitila
Spring 2014
Another Student’s T-test
Proposal and evaluation of a modified T-test
Jonas Englund
880131
Abstract
In this paper we propose a way of performing hypothesis tests that utilizes all information known under the null hypothesis. W. S. Gosset, also known as Student, derived the famous Student's T-test in the early twentieth century, and that is where this paper departs from. It turns out that Student's T-test does not use all information that is available under the null hypothesis. By using all known information we can obtain a better variance estimator than the usual one. The test based on this variance estimator is here called Another Student's Test (AST). The test is evaluated by simulation and compared with Student's T-test. The conclusion is that AST and Student's T cannot be said to perform differently, at least not under the settings used in this paper. Despite this, further analysis should be carried out to investigate AST, and a couple of situations where the ideas can be used are proposed.
Table of contents
1 Introduction
2 Hypothesis testing in general
2.1 Test evaluation
3 Student's T-test
4 Another student's T-test
4.1 Proof of σ̂² as a superior variance estimator under the null
4.2 σ̂² as a maximum likelihood estimator
4.3 Formal derivation of the T*-statistic
5 Probability distribution of T*
5.1 Probability distribution of T* under H₀
5.2 Probability distribution of T* under H₁
6 Evaluation
6.1 Power estimation
6.2 Graphical evaluation
6.3 Non-graphical evaluation
6.4 Analysis
6.5 Summary of evaluation
7 Extensions of the test
8 Summary and conclusions
References
Appendix A: Review of method of evaluation
A.1 Code in R
Appendix B: Lyapunov's CLT
B.1 Code in R
Appendix C: More graphs
1 Introduction
In this paper we introduce a variant of Student's T-test which we call Another Student's Test (AST). The rationale behind the test is thoroughly reviewed and explained, and its characteristics are examined, such as its probability distribution under both the null and the alternative. Critical values are derived via Monte Carlo simulation, and power in various situations is likewise estimated by simulation. The test is evaluated by comparing its estimated power with the power of the one-sample Student's T-test. Since Student's T-test is the gold standard for testing whether the mean of a single group equals some constant, given that the assumption of normality is met, we also test the hypothesis of no difference between the tests. We begin with an introduction to hypothesis testing, followed by an introduction to Student's T-test and then the AST. This is followed by an evaluation of the tests, that is, a comparison of the tests' performance in terms of power, after which extensions of the test are proposed. The purpose of this paper is simply to examine AST, evaluate it, and test whether it is as good as Student's T-test, all under the assumption of a normally distributed variable.
2 Hypothesis testing in general
“A hypothesis is a statement about a population parameter”, as Casella & Berger (2002, p. 373) express it. In most cases we only have sample data, and the aim of a hypothesis test is to get an indication of whether the null can be rejected or not. In a hypothesis testing situation, a null and an alternative hypothesis are specified before the test is carried out. In mathematical notation the null hypothesis can be expressed as

H₀: θ ∈ Θ₀,

where the alternative hypothesis is, in general, the complement of the null, H₁: θ ∈ Θ₀ᶜ. When performing hypothesis tests we assume that the null is true and evaluate whether the result we obtained is probable; in other words, we calculate or estimate the probability of attaining the observed result, or a more extreme one, given that the null is true. Two types of errors can occur when carrying out hypothesis tests: a type I error is when the test tells us to reject the null when the null is true (the probability of this occurrence, when the null is true, is denoted α), and a type II error is when the test tells us not to reject the null when it is false. There are two further possibilities: not rejecting the null when the null is true, and correctly rejecting the null when it is false; the probability of the latter is the power of the test (often denoted 1 − β). The larger the probability of correctly rejecting the null, given some level of significance, the better the test (Casella & Berger, 2002).
2.1 Test evaluation
There are many ways of evaluating tests. In this paper we evaluate whether AST is as powerful as Student's T-test, which is carried out via simulation-based methods. See section 6 for more on this topic.
3 Student's T-test
Gosset, who used Student as a pseudonym when publishing his work, was interested in the behaviour of the probability distribution of the T-statistic in small samples. Early in the twentieth century many statisticians did not distinguish between the true population variance, σ², and the estimated variance, S². Gosset worked as a brewer in the research department at Guinness' brewery, where the experiments often used small samples. He therefore started his work on Student's T-test, with some help from another famous statistician, Ronald Fisher (Box, 1987).

Gosset and his team at the brewery ran experiments, and when they treated the sample variance as the population variance they found that the results were not trustworthy, which in turn led him to dig into the derivation of Student's T-test. Gosset derived the distribution of the statistic

T = (X̄ − μ) / (S/√n),

which he found to have the probability density function

f(t) = Γ((ν+1)/2) / ( Γ(ν/2) · √(νπ) ) · (1 + t²/ν)^(−(ν+1)/2),

where Γ denotes the gamma function and ν the degrees of freedom. This finding made it possible to test hypotheses reliably in small samples (Box, 1987; Casella & Berger, 2002).
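As a quick numerical illustration (ours, not part of the original paper), Gosset's result can be checked in R: tail probabilities of simulated T-statistics match the t-distribution that R exposes through pt(). The sample size and replication count below are arbitrary choices.

```r
# Simulate Student's T for a small sample and compare a tail probability
# with the exact t-distribution with n - 1 degrees of freedom.
set.seed(1)
n <- 5
T <- replicate(20000, {
  x <- rnorm(n, mean = 0, sd = 2)               # T is pivotal: any sigma works
  (mean(x) - 0) / (sd(x) / sqrt(n))
})
emp  <- mean(T > 2)                             # empirical P(T > 2)
theo <- pt(2, df = n - 1, lower.tail = FALSE)   # exact P(T > 2)
abs(emp - theo)                                 # small
```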
4 Another student's T-test
This test is a modified one-sample Student's T-test. An assumption used throughout this section is that X is normally distributed and that the sampled observations are independent and identically distributed (i.i.d.). The ordinary T-test has the following form (Casella & Berger, 2002)

T = √n (X̄ − μ₀) / S,    (1)

where X̄ is the sample mean, μ₀ is the hypothesized population mean or expected value, and S² is the estimated population variance, defined as

S² = (1/(n − 1)) Σᵢ (Xᵢ − X̄)².

The hypothesis in such a test is, in the simplest case, stated as

H₀: μ = μ₀ versus H₁: μ ≠ μ₀.    (2)

The test statistic in (1) follows a central t-distribution with n − 1 degrees of freedom if X is normally distributed (Casella & Berger, 2002). Below we introduce a modified T-test that uses a variance estimator which is superior to S² under the null hypothesis,

T* = √n (X̄ − μ₀) / σ̂,    (3)

where

σ̂² = (1/n) Σᵢ (Xᵢ − μ₀)².

The logical foundation for this test is somewhat similar to that of the score test for a binomially distributed variable, for which the test statistic is

Z = (p̂ − p₀) / √( p₀(1 − p₀)/n ),    (4)

and this statistic is asymptotically standard normal (Casella & Berger, 2002). In this test, the standard error of p̂ (see the denominator above) is a function of p₀, which is equal to p if the null is true. By utilizing the same idea we can apply the methodology of (4) also to a normally distributed variable, X. We begin by showing that the usual variance estimator, S², is an inferior estimator of σ² when the null is true, compared to σ̂². The usual variance estimator, S², has expected value σ² and variance 2σ⁴/(n − 1) (Wackerly, Mendenhall & Scheaffer, 2002). Ghosh (1979)
gives a thorough review of the score statistic described in (4). He provides evidence that the power functions of the score test and the usual approximate Z-test cross one another; we therefore hypothesize that the test based on T* is also more powerful than Student's T in some regions. In order to use this modified T-test we need to find the distribution of T*, or at least the critical value at level α for a given sample size.

In fact, this test is more rational than Student's T-test since it utilizes more information: information that is known under the null hypothesis, which we assume is true (not known outright, since then we would not have to perform a test).
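To make the definition in (3) concrete, here is a minimal R sketch (the helper name `ast` and the toy data are ours, not the paper's). A useful sanity check is the algebraic identity T*² = nT²/(n − 1 + T²), which follows directly from σ̂² = ((n − 1)S² + n(X̄ − μ₀)²)/n; in particular, T* is a monotone transform of T for a given sample.

```r
# Null-restricted T-statistic from equation (3)
ast <- function(x, mu0) {
  n <- length(x)
  sigma2hat <- mean((x - mu0)^2)        # sigma-hat^2: variation about mu0, divisor n
  sqrt(n) * (mean(x) - mu0) / sqrt(sigma2hat)
}

set.seed(42)
x <- rnorm(10, mean = 0.3, sd = 1)
t_ast   <- ast(x, mu0 = 0)
t_usual <- sqrt(10) * mean(x) / sd(x)   # ordinary one-sample T from (1)
# Identity check: T*^2 = n T^2 / (n - 1 + T^2)
c(t_ast^2, 10 * t_usual^2 / (9 + t_usual^2))
```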
4.1 Proof of σ̂² as a superior variance estimator under the null
To begin with, the following moments of a normally distributed variable X with mean μ and variance σ² have to be established in order to complete the proof1:

E[X] = μ,
E[X²] = μ² + σ²,
E[X³] = μ³ + 3μσ²,

and

E[X⁴] = μ⁴ + 6μ²σ² + 3σ⁴.

The following general results are also used to establish the proof:

Var[X] = E[X²] − (E[X])²,
Var[ Σᵢ Xᵢ ] = Σᵢ Var[Xᵢ] for independent Xᵢ.

Now we can start by giving a proof of the variance estimator's unbiasedness under the null,

E[σ̂²] = E[ (1/n) Σᵢ (Xᵢ − μ₀)² ] = E[(X − μ₀)²] = σ² + (μ − μ₀)².

We can now see that, under H₀ (where μ = μ₀), this is equal to σ², and from this it follows that E[σ̂² | H₀] = σ². So, when the null is true this variance estimator is unbiased, and next is a proof of its superior (that is, lower) variance. Since the terms (Xᵢ − μ₀)² are independent,

Var[σ̂²] = (1/n²) Σᵢ Var[(Xᵢ − μ₀)²] = (1/n) Var[(X − μ₀)²]
= (1/n) ( E[(X − μ₀)⁴] − (E[(X − μ₀)²])² ).    (5)

In order to go forth from here we need to establish a few intermediate results. Writing δ = μ − μ₀ and expanding with the moments above, we have

E[(X − μ₀)²] = σ² + δ²,
E[(X − μ₀)³] = δ³ + 3δσ²,

and

E[(X − μ₀)⁴] = δ⁴ + 6δ²σ² + 3σ⁴.

By inserting these results into (5) and simplifying a little, we arrive at

Var[σ̂²] = (1/n) ( 2σ⁴ + 4δ²σ² ).

Now, by setting μ = μ₀ (so that δ = 0), this expression simplifies to

Var[σ̂² | H₀] = 2σ⁴/n < 2σ⁴/(n − 1) = Var[S²].

We have now established that σ̂² is a superior variance estimator under the null.

1 See Bryc (1995) for a derivation of these results.
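The variance comparison above can be checked by simulation. This sketch is ours (with arbitrary choices n = 10, μ₀ = 0, σ² = 1); it estimates the sampling variance of both estimators under a true null.

```r
# Under H0 (mu = mu0): Var(S^2) = 2 sigma^4/(n-1), Var(sigma-hat^2) = 2 sigma^4/n
set.seed(1)
n <- 10; mu0 <- 0
R <- 20000
s2      <- numeric(R)   # usual estimator S^2
s2_null <- numeric(R)   # null-restricted estimator sigma-hat^2
for (r in 1:R) {
  x <- rnorm(n, mu0, 1)
  s2[r]      <- var(x)
  s2_null[r] <- mean((x - mu0)^2)
}
c(var(s2), var(s2_null))   # roughly 2/9 vs 2/10
```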
4.2 σ̂² as a maximum likelihood estimator
We can also show that σ̂² is the maximum likelihood estimator of σ² when μ = μ₀ is known. For an introduction to maximum likelihood estimation, see, for example, Casella and Berger (2002). Consider the pdf of a normally distributed variable X,

f(x | μ₀, σ²) = (1/√(2πσ²)) exp( −(x − μ₀)² / (2σ²) ),

from which we need to maximize the log-likelihood with respect to σ²,

ℓ(σ² | x) = −(n/2) log(2π) − (n/2) log σ² − (1/(2σ²)) Σᵢ (xᵢ − μ₀)².

We have that

∂ℓ/∂σ² = −n/(2σ²) + (1/(2σ⁴)) Σᵢ (xᵢ − μ₀)²;

by setting this equal to zero and solving for σ², we get

σ̂² = (1/n) Σᵢ (xᵢ − μ₀)².

Now we have established that this estimator is the maximum likelihood estimator and that it has lower variance than the usual variance estimator under the null. It can also be shown that this estimator is, in fact, the best unbiased estimator, that is, the unbiased estimator with least variance; to prove this it suffices to show that its variance equals the Cramér-Rao lower bound. That proof is omitted here; instead we refer to page 340 in Casella and Berger (2002). Now that arguments for using σ̂² instead of S² when testing the hypothesis in (2) have been given, we proceed to a more formal derivation of the test described in (3).
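The closed-form MLE can be cross-checked numerically. This sketch (ours, with arbitrary toy data) maximizes the log-likelihood over σ² with R's optimize() and compares the result with the formula above.

```r
set.seed(2)
mu0 <- 1
x <- rnorm(25, mean = mu0, sd = 2)
# Log-likelihood in sigma^2, with mu = mu0 treated as known
loglik <- function(s2) sum(dnorm(x, mean = mu0, sd = sqrt(s2), log = TRUE))
opt    <- optimize(loglik, interval = c(0.01, 50), maximum = TRUE)
closed <- mean((x - mu0)^2)     # sigma-hat^2 from the derivation above
c(opt$maximum, closed)          # numerically the same point
```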
4.3 Formal derivation of the T*-statistic
A Wald statistic is asymptotically a standard normal random variable and can be derived in the following way, in accordance with Casella and Berger's (2002) terminology,

Zₙ = (θ̂ − θ₀) / SE(θ̂),

where SE(θ̂) is the standard error of θ̂ and θ̂ is an estimate of θ; if θ̂ is the MLE, the observed information number gives a reasonable estimate of the squared reciprocal of the standard error (Casella & Berger, 2002). Therefore, we can derive the AST as a Wald statistic. Based on the results

X̄ | H₀ ~ N(μ₀, σ²/n)

and hence

√n (X̄ − μ₀)/σ | H₀ ~ N(0, 1),

we can see that

T* = √n (X̄ − μ₀) / σ̂.

If we estimate σ with S we would have the usual T-test. But, as said before, we have an estimator of σ² that is better than S² when the null is true, hence the use of σ̂. We know the asymptotic distribution of this statistic, but not its distribution in finite samples, which is reviewed in the next section.
5 Probability distribution of T*
In this section we visually present the estimated probability distribution of T* under various circumstances.
5.1 Probability distribution of T* under H₀
The estimated probability distributions are displayed in histograms based on simulation with N = 10,000,000 runs. To begin with, estimated probability distributions are given for a number of sample sizes under the null. When estimating the probability distribution for a given sample size under the null, we generate a sample under the null and save the T*-statistic, which is repeated N times. We can begin by noting the following in the case of a sample size of one, before turning to the distributions for more interesting sample sizes:

T* = √1 · (X₁ − μ₀) / √((X₁ − μ₀)²) = (X₁ − μ₀)/|X₁ − μ₀| = ±1,

so for n = 1 the statistic takes the values −1 and +1, each with probability 1/2.
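The small-sample shapes can be explored by simulation. The sketch below is ours, with far fewer replications than the paper's ten million, and it also illustrates why the histograms look peculiar: since Σᵢ(Xᵢ − μ₀)² ≥ n(X̄ − μ₀)², the statistic satisfies |T*| ≤ √n, so its support is bounded.

```r
# Estimated null distribution of T* for n = 3 (mu0 = 0, sigma = 1 assumed)
set.seed(3)
n <- 3
tstar <- replicate(50000, {
  x <- rnorm(n, 0, 1)
  sqrt(n) * mean(x) / sqrt(mean(x^2))
})
max(abs(tstar))               # never exceeds sqrt(n)
# hist(tstar, breaks = 100)   # reproduces the bounded, non-Gaussian shape
```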
Figures 1-5: Probability distributions of T* under H₀ for increasing sample sizes.
Figure 6: Probability distribution of T* under H₀ for a large sample size; a fitted standard normal density line is also apparent in the figure.
As seen above, the distribution for small samples is somewhat peculiar, while it seems to follow a Gaussian distribution in the ”asymptotic” case, as expected.
5.2 Probability distribution of T* under H₁
In this section estimated probability distributions are displayed for the same sample sizes as in the last section. Distributions are displayed for several values of d, where d is a standard measure of effect size called Cohen's d (Cohen, 1992), defined by

d = (μ − μ₀)/σ.

Next, estimated probability distributions under different circumstances are displayed.
Figures 7-8: Probability distribution of T* under H₁, n = 2, for two values of Cohen's d.
Figures 9-10: Probability distribution of T* under H₁, n = 3, for two values of Cohen's d.
Figures 11-12: Probability distribution of T* under H₁, n = 7, for two values of Cohen's d.
Figures 13-18: Probability distributions of T* under H₁ for larger sample sizes and several values of Cohen's d.
Figures where Cohen's d is positive are not displayed, since they look the same but in the other ”direction”. Histograms for positive values of d can be given upon request.
6 Evaluation
In the evaluation of the tests we have to consider various circumstances, such as different sample sizes and effect sizes. Comparisons between the tests are mainly displayed graphically with the use of power function plots, with the exact power of Student's T-test on the x-axis and the estimated difference in power between the tests on the y-axis. The evaluation is made for all sample sizes between 2 and 30.
6.1 Power estimation
Since we do not know the probability distribution function of AST, its critical values have to be estimated. We estimate critical values under H₀, and when estimating the power, the null is rejected if the test statistic falls in the rejection region. For estimation of critical values, N = 10,000,000 replications have been used; for power estimation, M = 10,000. The method is outlined below.

(1) Generate a vector x of sample size n, with xᵢ ~ N(μ₀, σ²), i = 1, …, n.
(2) Calculate T*.
(3) Repeat (1)-(2) N times.
(4) Calculate the critical value C as the (1 − α) empirical quantile of |T*|.
(5) Generate x with xᵢ ~ N(μ, σ²), where μ ≠ μ₀.
(6) Calculate T*.
(7) Repeat (5)-(6) M times.
(8) Count the proportion of times |T*| > C, which is the power estimate.

When estimating the critical value we assume a symmetric distribution. Moreover, when estimating power for different sample sizes and effect sizes, the random numbers are generated independently of each other.
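Steps (1)-(8) can be sketched in R as follows (our code; the replication counts are reduced from the paper's N = 10⁷, and the alternative d = 0.8 is an arbitrary illustrative choice).

```r
set.seed(4)
n <- 5; alpha <- 0.05; mu0 <- 0
Nc <- 200000           # critical-value replications (paper: 10^7)
M  <- 10000            # power replications
tstar <- function(x) sqrt(length(x)) * (mean(x) - mu0) / sqrt(mean((x - mu0)^2))
# Steps (1)-(4): critical value = (1 - alpha) quantile of |T*| under H0
null_stats <- replicate(Nc, tstar(rnorm(n, mu0, 1)))
crit <- quantile(abs(null_stats), 1 - alpha)
# Steps (5)-(8): power under an alternative with Cohen's d = 0.8
alt_stats <- replicate(M, tstar(rnorm(n, mu0 + 0.8, 1)))
power_hat <- mean(abs(alt_stats) > crit)
power_hat
```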
6.2 Graphical evaluation
In this section we evaluate the power of AST graphically by plotting the estimated difference in power between AST and Student's T-test2. This is done for each sample size from 2 to 30. In each figure, the estimated difference in power is computed for each value of the power of Student's T-test (π) from α + 0.01 to 0.99 in steps of 0.01. In other words, the estimated difference is displayed for π equal to 0.02, 0.03, …, 0.99 when the alpha level is 0.01. For example, see the figure below:

2 For a thorough review of how this is done, see Appendix A.
Figure 19: The estimated difference in power between AST and Student's T-test for a two-sided hypothesis. Dashed lines are exact 5 percent critical values under the null.

Estimation of the difference in power in the example above is made for each target power π. For a given π, the corresponding Cohen's d is attained, the power of AST given that d is estimated via M = 10,000 simulation runs, and the estimated difference in power is plotted. This is then repeated for all values of π as described right before Figure 19.
The values of Cohen's d that give the desired power for Student's T-test were calculated exactly, not estimated, by specifying the sample size, power, type of test and direction of the hypothesis. The exact power of the T-test is governed by the non-central t-distribution: d is attained by finding the value of the non-centrality parameter δ that places probability 1 − π inside the critical values of the test. The non-centrality parameter can be characterized through

T' = (Z + δ) / √(V/ν),

where Z is standard normal and V is a χ²-distributed random variable with ν degrees of freedom, independent of Z. In this particular case, δ is equal to d√n. Since the power is determined by n, α and δ, where δ is the only unknown quantity, the equation can be solved for d. Fortunately, the more than 16 thousand different values of d used in this paper were not calculated by hand; a program called G*Power 3.1.7 was used to attain values of d for different sample sizes and powers.
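For readers without G*Power, base R's power.t.test() can perform the same inversion: leaving delta unspecified (NULL) makes it solve for the effect size that yields a target exact power, and with sd = 1 that delta equals Cohen's d. The numbers below are illustrative choices, not the paper's.

```r
res <- power.t.test(n = 10, delta = NULL, sd = 1, sig.level = 0.05,
                    power = 0.80, type = "one.sample",
                    alternative = "two.sided")
d <- res$delta          # Cohen's d giving exact power 0.80 at n = 10
# Round trip: plugging d back in recovers the target power
power.t.test(n = 10, delta = d, sd = 1, sig.level = 0.05,
             type = "one.sample")$power
```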
Each power estimate in Figure 19 is carried out pseudo3 independently of each and every other estimate. The fact that they are pseudo independent enables us to carry out a simple test of the hypothesis that the tests are equally powerful,

H₀: π_AST = π_T versus H₁: π_AST ≠ π_T,

but more about this in section 6.5. In Figure 19 we can also see dashed lines; these are 5 percent critical values for the difference in power when the null is true (that is, when the tests have equal power), calculated as

±1.96 · √( π(1 − π)/M ).

As we can see in the following sections, the difference in power between the tests seems to be: none! In the following section only a portion of the results is displayed; see Appendix C for the rest of the results. Next, some more power function graphs are displayed.

3 The only dependency between the power estimates is that they are based on the same estimated critical value, but since the critical value is estimated from ten million simulations we can say that the power estimates are almost independent of each other, thus the term ”pseudo independent”.
Figure 20: The estimated difference in power between AST and Student's T-test for a two-sided hypothesis. Dashed lines are exact 5 percent critical values under the null.

Figure 21: The estimated difference in power between AST and Student's T-test for a two-sided hypothesis. Dashed lines are exact 5 percent critical values under the null.

Figure 22: The estimated difference in power between AST and Student's T-test for a one-sided hypothesis. Dashed lines are exact 5 percent critical values under the null.
As we can see, the figures above all seem to indicate that there is no difference in power between the tests evaluated. The same pattern can also be seen for many other sample sizes, different alpha levels and both uni- and bi-directional hypotheses; see Appendix C for power function plots for sample sizes from 2 to 30, for significance levels 0.01, 0.05 and 0.1, and for both one- and two-sided hypotheses.
6.3 Non-graphical evaluation
The distribution of the estimated difference in power is displayed next; visually it may seem to follow a normal distribution, but this is not the case!

Figure 23: Estimated distribution of the difference in power between AST and Student's T-test.

Next is a table with descriptive statistics for the estimated difference in power between the tests. We can see that the kurtosis estimate indicates that the distribution is non-normal. What is also apparent is that the estimated expected value is in favor of Student's T, though not significantly, as we will see in the next section.

Mean         Variance   Skewness  Kurtosis  Observations
-0.00000141  0.0000179  0.0134    3.53      16298

Table 1: Descriptive statistics of the estimated difference in power.
6.4 Analysis
So far there does not seem to be any difference whatsoever between the tests. Fortunately, we can test this. It is tempting to think that we can test whether the tests are doing equally well with a normal large-sample test as described next,

Z = D̄ / SE(D̄),

where the numerator is the mean sample difference in power between Student's T and AST and the denominator is the standard error of D̄. This statistic is always asymptotically standard normal if the null is true and Var[D₁] = Var[D₂] = …, where Dᵢ is the difference in power between Student's T and AST for observation i. That condition is not met here, since the variance of Dᵢ is equal to πᵢ(1 − πᵢ)/M, where πᵢ is the power at point i and can be anything from α + 0.01 to 0.99. The statistic can still be standard normally distributed in the case of different variances between observations though4. So we can carry out the test described above, and from Table 1 we can, after some calculation, find the p-value for a two-sided test, which is 0.966.

Another way of testing the hypothesis above is to count the number of power estimates outside the 95 percent confidence intervals (see section 6.2). Each of these points has probability 0.05 of being in the rejection region (i.e., outside the confidence interval) if the null is true. From this it follows that the number of power estimates in the rejection region is binomially distributed, and thus we can make use of the following fact (Casella & Berger, 2002):

(p̂ − p) / √( p(1 − p)/n ) → N(0, 1).

Let us define

Yᵢ = 1 if power estimate i falls outside its confidence interval, and Yᵢ = 0 otherwise;

then p̂ = (1/n) Σᵢ Yᵢ, and from this it follows that we can form 95 percent confidence intervals. The following is attained when performing an analysis of whether there is statistical evidence that E[p̂] is different from 0.05:
Mean    Lower confidence limit  Upper confidence limit
0.0523  0.0489                  0.0558

Table 2: Estimated expected proportion of power estimates outside the 95 percent confidence interval of that power estimate.
As we can see, we do not have statistical evidence that Student's T-test and AST differ in terms of power, at least not for samples smaller than or equal to 30. We can also base a test on the number of times AST is estimated to be more powerful than Student's T, which is a binomially distributed variable with probability of success equal to one half under the null. The p-value for this test is 0.033 based on a two-sided hypothesis, so this test does reject the null hypothesis of equally powerful tests.
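The count-based check can also be run exactly with R's binom.test() instead of the normal approximation. The count below is reconstructed from Table 2 (0.0523 of 16,298 estimates), so treat it as approximate.

```r
n_points <- 16298
k <- round(0.0523 * n_points)     # approx. number of estimates outside their CIs
bt <- binom.test(k, n_points, p = 0.05)
bt$p.value                        # not significant
bt$conf.int                       # exact CI, close to Table 2's interval
```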
4 Identical variances across observations are not a necessity, as is demonstrated in Appendix B with a simple simulation and a reference to Lyapunov's central limit theorem.
6.5 Summary of evaluation
Three hypothesis tests of the same hypothesis were carried out, of which two failed to reject the null and one rejected it. Had we taken into account that multiple tests were performed, by using, say, a Bonferroni correction, we would not have obtained any significant results. Given the very large sample size and the high p-values in the hypothesis tests, arguments for any practical difference between the usual T-test and the test proposed in this paper can hardly be made. Based on this, we can fairly confidently say that it does not matter which of the two tests discussed in this paper is used, at least under the circumstances tested here and given that the normality assumption is met.
7 Extensions of the test
There are many situations where the same ideas can be used. One such situation is ordinary least squares (OLS) regression. When testing whether a parameter is significantly different from some value, we usually do not utilize all information known under the null. The idea is the same as discussed in this paper: instead of estimating the standard error the usual way, we can estimate it with the information known under the null. In OLS we estimate a model's parameters as

β̂ = (X′X)⁻¹X′y,

and the covariance matrix of the estimates is

Var(β̂) = σ²(X′X)⁻¹.

An unbiased estimate of σ² is

s² = e′e / (n − K),    (7)

where e = y − Xβ̂ and K is the number of estimated parameters (Greene, 2000). Now, by utilizing the same idea as in the variance estimator in AST, we can get a better estimate of the standard error of a parameter when testing hypotheses about one or many parameters via an F-test. This is carried out by setting the parameter estimates that we wish to test equal to what is stated in the null hypothesis when estimating σ². In other words, instead of using the vector of estimated parameters,

β̂ = (β̂₀, β̂₁, β̂₂)′,

when estimating σ², we use the parameter estimates together with the information known under the null. For example, if we want to test whether all parameters except the intercept are zero in a regression model with two independent variables, then σ² would be estimated using

β̃ = (β̂₀, 0, 0)′

instead of β̂.
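A minimal sketch of the OLS extension follows (our toy data; keeping the unrestricted intercept estimate while zeroing the slopes, and the divisor n − 1 for the restricted estimate, are our reading of the description above).

```r
set.seed(5)
n <- 50
X <- cbind(1, rnorm(n), rnorm(n))      # intercept + two regressors
y <- 1 + rnorm(n)                      # null (both slopes zero) is true here
b <- solve(t(X) %*% X, t(X) %*% y)     # usual OLS estimates
e <- y - X %*% b
s2_usual <- sum(e^2) / (n - 3)         # K = 3 estimated parameters
b_null <- c(b[1], 0, 0)                # slopes set to their H0 values
e0 <- y - X %*% b_null
s2_null <- sum(e0^2) / (n - 1)         # only the intercept estimated (assumed df)
c(s2_usual, s2_null)
```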
Another situation in which we can make use of the idea discussed in this paper is when testing for equal means between two groups under the assumption of equal variances. The test statistic is in this case formed as follows (Wackerly, Mendenhall & Scheaffer, 2002):

T = (X̄₁ − X̄₂) / √( Sₚ²(1/n₁ + 1/n₂) ),

where Sₚ² is an estimate of the common variance σ², estimated as

Sₚ² = ( (n₁ − 1)S₁² + (n₂ − 1)S₂² ) / (n₁ + n₂ − 2).

Another way of estimating the population variance is to use more information, which we are able to do since we are assuming equal means under the null. By using this information we can estimate σ² with the statistic

S̃² = ( Σᵢ (X₁ᵢ − X̄)² + Σⱼ (X₂ⱼ − X̄)² ) / (n₁ + n₂ − 1),

where X̄ is the estimated total mean from both groups; in mathematical notation,

X̄ = (n₁X̄₁ + n₂X̄₂) / (n₁ + n₂).

In this way we get a better estimate, under the null, of the population variance.
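A short sketch of the two-sample variant (our toy data; the divisor n₁ + n₂ − 1 for the null-restricted estimator is an assumption on our part):

```r
set.seed(6)
x1 <- rnorm(12, 5, 2); x2 <- rnorm(15, 5, 2)
n1 <- length(x1); n2 <- length(x2)
# Usual pooled variance (divisor n1 + n2 - 2)
sp2 <- ((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2)
# Null-restricted variant: deviations from the grand mean
xbar <- mean(c(x1, x2))
sp2_null <- (sum((x1 - xbar)^2) + sum((x2 - xbar)^2)) / (n1 + n2 - 1)
t_null <- (mean(x1) - mean(x2)) / sqrt(sp2_null * (1/n1 + 1/n2))
c(sp2, sp2_null, t_null)
```

Note that the restricted sum of squares is always at least as large as the pooled one, since it absorbs the between-group variation that the null attributes to noise.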
8 Summary and conclusions
The AST test has been derived and evaluated with Student's T-test as the reference point. It utilizes information known under the null hypothesis to get a better estimate of the population variance. By doing this we hoped to get a test that performed better than Student's T-test. It did not! The conclusion is that, in the settings tested in this paper, AST performed neither worse nor better than Student's T-test. A couple of extensions of the test are proposed and should be evaluated in further analyses.

The findings here imply that there is no need to use AST instead of Student's T-test. AST has not been evaluated in situations where the assumption of normality is not met; there, other well-explored tests exist and should be used instead. Nevertheless, further analysis of test situations where we can utilize as much information as possible from the null hypothesis should be carried out. And who knows, it may outperform the T-test in the regression example discussed in section 7, but probably not!
References
Box, J. F. (1987). Guinness, Gosset, Fisher, and small samples. Statistical Science, 2, 45-52.

Bryc, W. (1995). The Normal Distribution: Characterizations with Applications. Springer-Verlag.

Casella, G., & Berger, R. (2002). Statistical Inference (2nd ed.). Duxbury Press.

Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155-159.

Ghosh, B. K. (1979). A comparison of some approximate confidence intervals for the binomial parameter. Journal of the American Statistical Association, 74, 894-900.

Greene, W. (2000). Econometric Analysis (4th ed.). Prentice-Hall: New Jersey.

Wackerly, D., Mendenhall, W., & Scheaffer, R. (2002). Mathematical Statistics with Applications (6th ed.). Duxbury Press.
Appendix A: Review of method of evaluation
In order to keep this section as simple as possible we only discuss the case where α = 0.05; the other cases are handled in basically the same way. To generate the plots described in section 6.2 we start by attaining the values of Cohen's d for each sample size between 2 and 30 and for each target power point. These values are put into a matrix d, where, for example, element (i, j) is the value of Cohen's d that gives the i-th target exact power for Student's T-test when the sample size is j + 1. The next step is to attain the critical values for AST for each sample size and put these into a vector, crit. The method of attaining these critical values is explained in section 6.1. When this is done we can run the code given in section A.1; only the important parts of the code are explained, semantically, below:

1. A function that returns the estimated power of AST for a specified Cohen's d.
2. Run 1 for all target power points.
3. Save the plot and the power estimates.
4. Run 2 and 3 for each sample size.

The vector crit is used to get the correct critical value for each sample size and the matrix d is used to get the correct effect sizes. The code used to carry out these steps is given below and was written in R.
A.1 Code in R
prog <- function(M, alfa) {
  e <- 100*alfa; f <- 100*alfa + 1; g <- 99 - 100*alfa; l <- length(d[, 1])
  if ((alfa == 0.1 | alfa == 0.05 | alfa == 0.01) & g == l) {
    # Estimated power of AST for one configuration
    poweronly <- function(N, n, mu, muzero, sd, cri) {
      p <- numeric(N)
      for (i in 1:N) {
        x <- rnorm(n, mu, sd)
        s2 <- sum((x - muzero)^2)/n                      # null-restricted variance
        test <- sqrt(n)*(mean(x) - muzero)/sqrt(s2)      # AST statistic
        if (abs(test) > cri) p[i] <- 1
      }
      mean(p)
    }
    tp <- numeric(g); for (i in f:99) tp[i - e] <- 0.01*i  # target powers
    r <- matrix(0, g, 29)
    for (j in 1:29) {
      a <- crit[j, 1]                                    # critical value for n = j + 1
      for (i in 1:g) {
        r[i, j] <- poweronly(M, j + 1, 0, d[i, j], 1, a) - tp[i]
      }
      plot(tp, r[, j], type = "l",
           xlab = "Actual power for Student's T-test",
           ylab = "Estimated difference in power", ylim = c(-0.018, 0.018))
      curve( 1.96*sqrt(x*(1 - x)/M), lty = "dashed", add = TRUE)
      curve(-1.96*sqrt(x*(1 - x)/M), lty = "dashed", add = TRUE)
    }
    # return(r)
  } else {
    "Wrong alfa level and/or non-conformable alfa level and matrix d"
  }
}
prog(10000, 0.05)
Appendix B: Lyapunov's CLT
In this appendix I give both a reference to Lyapunov's CLT and a simulation-based demonstration that we can make use of a central limit theorem despite unequal variances of the observations. The outline of the simulation is as follows: (1) simulate centered Bernoulli variables X₁, …, Xₙ independently, with success probabilities pᵢ ranging from 0.04 to 0.80, so that Xᵢ has variance pᵢ(1 − pᵢ); (2) calculate the test statistic described in (6); (3) repeat steps (1) and (2) 100 times and attain a p-value from the Shapiro-Wilk normality test; (4) repeat steps (1)-(3) 10,000 times; (5) count the number of times the p-values are below 0.05. If the statistic in (6) is standard normally distributed, this simulation should yield a result between 0.0457 and 0.0543 95 percent of the time. The result of the simulation was 0.0488, as expected.

This simulation merely illustrates the Russian mathematician Aleksandr Lyapunov's central limit theorem, which states that if we are dealing with independent observations Xᵢ with means μᵢ and unequal variances σᵢ², then

Zₙ = (1/sₙ) Σᵢ (Xᵢ − μᵢ), with sₙ² = Σᵢ σᵢ²,    (6)

is asymptotically standard normally distributed if

lim (1/sₙ^(2+δ)) Σᵢ E[ |Xᵢ − μᵢ|^(2+δ) ] = 0 as n → ∞,

for some δ > 0.
B.1 Code in R
prog <- function(n, N, M) {
  set.seed(12345)
  Z <- numeric(N); shap <- numeric(M); x <- numeric(n); q <- numeric(20)
  for (i in 1:20) q[i] <- 0.04*i*(1 - 0.04*i)   # Bernoulli variances p(1 - p)
  s <- sqrt(10*sum(q))
  for (j in 1:M) {
    for (a in 1:N) {
      for (k in 0:(n/20 - 1)) {
        for (i in 1:20) {
          x[20*k + i] <- 0.04*i - rbinom(1, 1, 0.04*i)  # centered Bernoulli(0.04*i)
        }
      }
      Z[a] <- mean(x)/s
    }
    shap[j] <- shapiro.test(Z)$p.value
  }
  p1 <- shap < 0.1; p05 <- shap < 0.05; p01 <- shap < 0.01
  return(list(mean(p1), mean(p05), mean(p01)))
}
prog(2000, 100, 10000)
Appendix C: More graphs
C.1 Two-sided hypothesis and a significance level of 0.1: power-difference plots for n = 2, 3, …, 30.
C.2 Two-sided hypothesis and a significance level of 0.05: power-difference plots for n = 2, 3, …, 30.
C.3 Two-sided hypothesis and a significance level of 0.01: power-difference plots for n = 2, 3, …, 30.
C.4 One-sided hypothesis and a significance level of 0.1: power-difference plots for n = 2, 3, …, 30.
C.5 One-sided hypothesis and a significance level of 0.05: power-difference plots for n = 2, 3, …, 30.
C.6 One-sided hypothesis and a significance level of 0.01: power-difference plots for n = 2, 3, …, 30.