Localization of Brain Activity Using Permutation Analysis


Localization of Brain Activity Using Permutation Analysis

by Hooman Alikhanian

A thesis submitted to the Department of Mathematics and Statistics in conformity with the requirements for the degree of Master of Science

Queen's University
Kingston, Ontario, Canada
June 2014

Copyright © Hooman Alikhanian, 2014


Abstract

In this report we study bootstrap theory and permutation analysis as a hypothesis testing method based on the bootstrap procedure. We investigate asymptotic properties of the bootstrap procedure as well as the accuracy of bootstrap estimates using Edgeworth and Cornish-Fisher expansions. We show that resampling with replacement from the data provides a theoretically sound method that outperforms the Normal approximation of the data distribution in terms of convergence error and accuracy of estimates. We conclude the report by applying permutation analysis to magnetoencephalography (MEG) brain signals to localize human brain activity in pointing/reaching tasks and find regions that are significantly active.


Acknowledgements

I would like to thank my supervisor Gunnar Blohm for his support throughout the years of my research assistantship in the Computational Neuroscience laboratory, for keeping me going when times were tough, for insightful discussions, and for offering invaluable advice.


Contents

Abstract

List of Figures

Chapter 1: Introduction

Chapter 2: Bootstrap Theory
2.1 Bootstrap Confidence Interval
2.2 Iterated Bootstrap

Chapter 3: Hypothesis Testing and Permutation Analysis
3.1 Hypothesis Testing
3.1.1 The Neyman-Pearson Lemma
3.2 P-Values
3.3 Permutation Analysis

Chapter 4: Asymptotic Properties of the Mean

Chapter 5: Bootstrap Accuracy and Edgeworth Expansion
5.1 Edgeworth Expansion
5.2 Bootstrap Edgeworth Expansion
5.3 Bootstrap Confidence Interval Accuracy

Chapter 6: Results
6.1 Methods
6.1.1 Experimental Paradigm
6.1.2 Data Processing
6.2 Permutation Analysis Results

Chapter 7: Conclusion

Bibliography

List of Figures

6.1 The MEG experiment setup. (a) Time course of the experiment. (b) Three postures of the hand were used in different recording blocks. (c) The fixation cross in the middle with two possible target locations on its left- and right-hand sides. (d) Subjects sit upright under the MEG machine performing the pointing task with the wrist only. (e) Task: the target (cue) appears in either green or red to inform the subject of the pro or anti nature of the pointing trials. Dimming of the central fixation cross was the movement instruction for subjects.

6.2 The diagram of the event-related beamformer [8]: The data consist of T trials, each with M channels and N time samples. The covariance matrix of the data is given to the beamformer along with the forward solution for a dipole at each location. Average source activity is then estimated at each voxel, and the dipole orientation is adjusted to maximize power at the corresponding voxel.

6.3 Average brain activation for the pro condition/left target around movement onset (−0.45–0 seconds) in three planes: (a) transverse, (b) sagittal, and (c) coronal.

6.4 Average brain activation for the pro condition/left target around cue onset (0–0.5 seconds) in three planes: (a) transverse, (b) sagittal, and (c) coronal.

6.5 Average brain activation for the anti condition/right target around movement onset (−0.45–0 seconds) in three planes: (a) transverse, (b) sagittal, and (c) coronal.

6.6 Average brain activation for the anti condition/right target around cue onset (0–0.5 seconds) in three planes: (a) transverse, (b) sagittal, and (c) coronal.

6.7 Permutation analysis for the pro condition/left target around movement onset in three planes with 95% p-values. Right panel: positive activity (synchronization); left panel: negative activity (desynchronization).

6.8 Permutation analysis for the pro condition/left target around cue onset in three planes with 95% p-values. The null hypothesis is not rejected for positive activation; negative 95% significant activation is shown.

6.9 Permutation analysis for the anti condition/right target around movement onset in three planes with 95% p-values. Right panel: positive activity (synchronization); left panel: negative activity (desynchronization).

6.10 Permutation analysis for the anti condition/right target around cue onset in three planes with 95% p-values. The null hypothesis is not rejected for positive activation; negative 95% significant activation is shown.

Chapter 1

Introduction

Bootstrap and permutation tests are resampling methods. The main idea of resampling, as the name suggests, is to estimate properties of a population (such as its variance, distribution, or confidence intervals) by resampling from the original data. In practice, access to the whole population is often impossible or uneconomical, and instead a sample drawn from the population is available. The bootstrap procedure provides researchers with a tool to infer population properties by resampling from the data. In this manuscript we study the mathematical framework of the bootstrap to evaluate the validity and accuracy of such an inference procedure.

Bootstrap is not a new idea. When we do not have information about the density of the population under study and wish to infer or estimate some functional of the population from data, we consider the same functional of the sample (or empirical) distribution. Instead of taking new samples from the population, we perform resampling with replacement from the data.

The idea of using Monte Carlo resampling to estimate bootstrap statistics was proposed by Efron (1979). The approximation improves as the number of resamples increases. Often the number of resamples is on the order of thousands, making resampling methods computationally intensive.

Modern computers and software make it possible to use these computationally intensive methods to estimate statistical properties in cases where classical methods are analytically intractable or unusable because their assumptions are not satisfied.

In practice, resampling methods have an advantage over classical inference methods such as Bayesian inference in that they require no assumption on the population distribution. They also work for statistics whose distributions admit no analytical solution. Moreover, they provide concrete analogies to theoretical concepts [18].

This report is organized as follows. In Chapter 2 we study the mathematical formulation of the bootstrap procedure. In Chapter 3 we study hypothesis testing, the Neyman-Pearson lemma, and permutation analysis as a method of solving hypothesis testing problems using the bootstrap procedure. In Chapter 4 we study asymptotic properties of the bootstrap mean estimate. Chapter 5 investigates the accuracy of bootstrap estimates and confidence intervals. Finally, we conclude the report by applying permutation analysis to a brain magnetic signal database to localize brain activity in a reaching task.


Chapter 2

Bootstrap Theory

We have a sample data set of size n that is drawn randomly from a population; that is, we have a set χ = {X_1, X_2, ..., X_n} of independent identically distributed random variables drawn from an unknown population distribution function F. We are interested in some functional θ(F) of the population, e.g., the population mean, for which

θ(F) = ∫ x dF(x). (2.1)

We do not know F, so we cannot solve for θ directly. We estimate θ with θ̂ by estimating the distribution function F. One unbiased estimator that can be used for this purpose is the empirical distribution function F̂, computed from the sample χ:

F̂(x) = (1/n) Σ_{i=1}^{n} I(X_i ≤ x), (2.2)

where I(·) is the indicator function.
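For illustration, here is a minimal sketch of equation (2.2), assuming Python with numpy is available; the function name ecdf and the toy sample are our own, not part of the text.

    import numpy as np

    def ecdf(sample, x):
        # F_hat(x) = (1/n) * sum_i I(X_i <= x), equation (2.2)
        sample = np.asarray(sample)
        return float(np.mean(sample <= x))

    print(ecdf([3.0, 1.0, 2.0, 5.0], 2.5))   # two of four points are <= 2.5 -> 0.5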

The problem is to study the statistical properties of the estimator θ̂, e.g., its variance and confidence intervals. To this end, we need the distribution of θ̂ − θ. The bootstrap procedure gives us a tool to estimate this distribution via resampling from the data χ.

The bootstrap procedure generally involves three steps [18]:

Step 1. Perform resampling with replacement on the data. In resampling, all data points are given the same chance of being chosen, and the resampled data set has the same size as the original sample.

We want to count the number of distinct resamples that can be drawn from the sample set χ with replacement. There is a one-to-one correspondence between the resamples and the ways of placing n indistinguishable objects into n numbered boxes: the number of objects that end up in the ith box is the number of times that data point X_i is chosen. It follows that the number of resamples, N(n), is given by

N(n) = (2n − 1 choose n). (2.3)

Using Stirling's formula n! ∼ (n/e)^n √(2πn), we have

N(n) ∼ (nπ)^{−1/2} 2^{2n−1}. (2.4)

Thus, the number of resamples increases exponentially with n.
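As a quick numerical check of equations (2.3) and (2.4), assuming the Python standard library; the values of n below are arbitrary illustrative choices.

    import math

    for n in (5, 10, 20, 50):
        exact = math.comb(2 * n - 1, n)                      # equation (2.3)
        approx = (n * math.pi) ** -0.5 * 2 ** (2 * n - 1)    # equation (2.4)
        print(f"n={n:3d}  exact={float(exact):.3e}  Stirling={approx:.3e}")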

Step 2. For each resample set χ* = {X*_1, X*_2, ..., X*_n}, calculate θ̂*. The distribution of these statistics is referred to as the bootstrap distribution [18].

For example, if θ = θ(F) = µ is the population mean and F̂ is the empirical distribution function, which assigns weight 1/n to each data point X_i, then

θ̂ = (1/n) Σ_{i=1}^{n} X_i. (2.5)

Thus, θ̂ is the sample mean, and the distribution of µ̂ is estimated by calculating the sample mean of each resample χ*.

Step 3. Use the bootstrap distribution to construct confidence intervals for θ.

Classical inference theory tells us a great deal about the sample mean. For a Normal population, the sample mean is exactly Normally distributed for any sample size. For large sample sizes the sample mean is approximately Normally distributed for a broad range of population distributions, as long as the central limit theorem holds. Moreover, the sample standard deviation is

s = √[ (1/(n−1)) Σ_{i=1}^{n} (X_i − X̄)² ], (2.6)

where X̄ is the sample mean, so that s/√n estimates the standard deviation of the sample mean.

However, for many statistics other than the sample mean, e.g., quantiles, calculating the standard deviation, let alone the distribution, is analytically intractable. One way around this problem is to assume a Normal distribution for the desired statistic and move forward. This approach, however, may fail when the distribution is heavily skewed or has heavy tails. We will see that the bootstrap gives a more accurate estimate of the distribution than the Normal approximation.

In order to estimate the distribution of a statistic, e.g., the sample mean, one might think that instead of resampling from the data, one could draw further sample sets from the population, estimate the statistic from each sample set, and so obtain an estimate of the statistic's distribution. In practice, such an approach is difficult to implement: sampling from the population repeatedly often requires resources, e.g., financial resources, that may not be available, and in some cases the population may not be easily accessible. The idea of the bootstrap is that instead of returning to the population to draw further sample sets, resampling is performed from the data at hand. To the extent that the data are representative of the population distribution, which is a valid assumption if the sampling methodology is sound and the sample size is big enough, resampling from the data is justified. Even when resources are available, it might be better to take one larger sample from the population and resample from it than to refer to the population multiple times to draw smaller sample sets [18].

In the case of estimating the standard deviation of θ̂, SD(θ̂(X)), the bootstrap principle can be summarized as follows (a code sketch is given after the list):

1. Resample with replacement from χ = {X_1, X_2, ..., X_n} to get the bootstrap sample χ* = {X*_1, X*_2, ..., X*_n}, and calculate the bootstrap estimate θ̂*(X).

2. Get B independent bootstrap replicates θ̂*_1(X), θ̂*_2(X), ..., θ̂*_B(X).

3. Estimate SD(θ̂(X)) by the empirical standard deviation of θ̂*_1(X), θ̂*_2(X), ..., θ̂*_B(X).
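A minimal sketch of these three steps, assuming numpy; the statistic (the sample median), the simulated sample, and B = 2000 are illustrative choices, not prescribed by the text.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.exponential(scale=2.0, size=100)   # stand-in for the observed sample chi

    B = 2000
    replicates = np.empty(B)
    for b in range(B):
        resample = rng.choice(x, size=x.size, replace=True)  # step 1: resample with replacement
        replicates[b] = np.median(resample)                  # step 2: bootstrap replicate of theta
    # step 3: empirical standard deviation of the B replicates estimates SD(theta(X))
    print("bootstrap SE of the median:", replicates.std(ddof=1))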

2.1 Bootstrap Confidence Interval

As mentioned above, the idea of the bootstrap is to give an estimate of the distribution of θ̂ − θ. To construct confidence intervals that evaluate the accuracy of the estimator θ̂, we need this distribution. In this section we borrow terminology and ideas from [15, 16].

To find the distribution of θ̂ − θ, we need F₀, the population distribution, and the empirical estimate of it, F₁ = F̂₀. Since we do not know F₀, the bootstrap procedure suggests that we use F₁ instead of F₀, i.e., take our sample as a representative of the population, and take the bootstrap distribution F₂, derived from resampling with replacement from the sample, as an estimate of F₁ [15, 16].

Constructing a two-sided α-confidence interval consists of finding a t that solves

E{f_t(F₀, F₁) | F₀} = 0, (2.7)

where f_t is a functional from a class {f_t : t ∈ T} for some set T, defined as

f_t(F₀, F₁) = I{θ(F₁) − t ≤ θ(F₀) ≤ θ(F₁) + t} − α, (2.8)

where I(·) is the indicator function.

According to the bootstrap principle, instead of finding the t that solves equation (2.7), we find the t̂ that solves

E{f_t(F₁, F₂) | F₁} = 0. (2.9)

Many statistical problems can be formulated as equation (2.7) with different functional classes; equation (2.8) gives one example of a class used to construct confidence intervals. Finding an estimate t̂ of t that solves the approximate equation (2.9) instead of the original equation (2.7) is the essence of the bootstrap idea.

A number of methods have been proposed in the literature to construct confidence intervals [15]. Equation (2.8) is one such method, which we refer to as the Bootstrap Interval. The Bootstrap Percentile-t Interval is another method, in which f_t is defined as

f_t(F₀, F₁) = I{θ(F₁) − tτ(F₁) ≤ θ(F₀) ≤ θ(F₁) + tτ(F₁)} − α. (2.10)

The Bootstrap Percentile-t Interval introduces a scaling factor τ(F₁) into equation (2.8). The difference between the two confidence interval methods lies in the idea of pivotalness. A function of both the data and an unknown parameter is said to be pivotal if it has the same distribution for all values of the unknowns [16]. For example, for a population with a Normal distribution N(µ, σ²), (X̄, σ̂²) is the maximum likelihood estimator of (µ, σ²). The distribution of the sample mean is also Normal, N(µ, σ²/n); thus Z = √n(X̄ − µ) is N(0, σ²). We can immediately see that Z is non-pivotal, because its distribution depends on the unknown σ. The α-confidence interval for the mean based on X̄ can be constructed as

(X̄ − n^{−1/2} x_α σ̂, X̄ + n^{−1/2} x_α σ̂), (2.11)

where x_α is defined by

P(|N| ≤ x_α) = α, (2.12)

for a standard Normal random variable N.

Since the distribution of T = √n(X̄ − µ)/σ̂ is not Normal but Student's t with n − 1 degrees of freedom, the coverage error of the interval in equation (2.11) stems from approximating Student's t distribution by a Normal distribution, and is of order O(n⁻¹). The distribution of T does not depend on any unknowns; therefore T is pivotal. An accurate α-confidence interval for the mean is achieved by substituting t_α for x_α in equation (2.11), where t_α satisfies

P(|V| ≤ t_α) = α, (2.13)

for a Student's t random variable V with n − 1 degrees of freedom. The scaling factor τ in this example is σ̂, the maximum likelihood estimator of the standard deviation.

The α-confidence interval of a statistic θ(F₀) is called accurate when t is an exact solution of equation (2.7) with the functional f_t(F₀, F₁) from equation (2.8), that is,

P(θ(F₁) − t ≤ θ(F₀) ≤ θ(F₁) + t | F₀) = α. (2.14)

If t̂ is only an approximate solution of equation (2.7), as in the bootstrap confidence interval, the probability that θ(F₀) lies in the confidence interval will not be exactly α. The difference

P(θ(F₁) − t̂ ≤ θ(F₀) ≤ θ(F₁) + t̂ | F₀) − α (2.15)

is referred to as the coverage error of the interval.

The bootstrap percentile-t interval can be estimated for any functional θ(F₀). Following the bootstrap procedure, we construct the bootstrap distribution by resampling with replacement from the sample. The bootstrap estimate θ(F₂) of θ and the scaling factor τ(F₂), e.g., the standard deviation σ̂*, are estimated from the bootstrap distribution F₂. The α-confidence interval is then calculated as

(θ(F₂) − t_α τ(F₂), θ(F₂) + t_α τ(F₂)), (2.16)

where t_α is defined by

P(|T| ≤ t_α) = α, (2.17)

for a random variable T with Student's t distribution with n − 1 degrees of freedom for a sample of size n.

In equation (2.16) it is justified to use t_α from the t-table as long as the distribution of θ(F₁) is approximately Normal. If the distribution has heavy tails or is highly skewed, the confidence interval in equation (2.16) will not be a good approximation of the true confidence interval.

In general, the distribution of θ(F₁) is not known. One special case is when θ is the sample mean: if the sample size is large enough, the distribution of the sample mean is approximately Normal by the Central Limit Theorem. What can we do for other statistics?

We can construct the bootstrap distribution of the statistic. If the bootstrap distribution is approximately Normal and not heavily skewed, the confidence interval of equation (2.16) can be used. To see this, we estimate t*_α from the bootstrap distribution of θ(F₁), that is, we find t*_α such that

P(θ(F₂) − t*_α τ(F₂) ≤ θ(F₁) ≤ θ(F₂) + t*_α τ(F₂) | F₁) = α, (2.18)

which can be solved as

t*_α = inf{t : P(θ(F₂) − tτ(F₂) ≤ θ(F₁) ≤ θ(F₂) + tτ(F₂) | F₁) ≥ α}. (2.19)

To solve equation (2.18) using Monte Carlo approximation, we choose integers B ≥ 1 and 1 ≤ ν ≤ B such that ν/(B + 1) = α for a rational α. For instance, if α = 0.95, we can take (ν, B) = (95, 99). Following the bootstrap procedure, we draw B independent resamples from χ with replacement, namely χ*_1, χ*_2, ..., χ*_B, and for each resample we compute the corresponding empirical distribution F_{2,b}, b = 1, 2, ..., B. Define

T*_b = |θ(F_{2,b}) − θ(F₁)| / τ(F_{2,b}). (2.20)

We take the νth order statistic of the T*_b (in increasing order, i.e., the empirical α-quantile) as the Monte Carlo estimate of t*_α; as B → ∞, this estimate converges to t*_α with probability one.

Now that we have estimated t*_α using the bootstrap distribution, we can construct the bootstrap-t confidence interval as

(θ(F₂) − t*_α τ(F₂), θ(F₂) + t*_α τ(F₂)). (2.21)

If this confidence interval matches closely the interval from equation (2.16), in which we took t_α from the t-table, the distribution of θ̂ is approximately Normal. Otherwise, equation (2.21) provides a better approximation of the confidence interval, in the sense of a smaller coverage error.
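A sketch of the Monte Carlo recipe of equations (2.20)-(2.21) for the mean, assuming numpy; here θ(F₁) is the sample mean, τ(F_{2,b}) the resample standard error, and (ν, B) = (95, 99) as in the text. The simulated sample is a hypothetical stand-in.

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.lognormal(size=60)              # an illustrative skewed sample
    n = x.size
    B, nu = 99, 95                          # nu / (B + 1) = alpha = 0.95

    theta1 = x.mean()                       # theta(F1)
    t_star = np.empty(B)
    for b in range(B):
        xb = rng.choice(x, size=n, replace=True)
        tau_b = xb.std(ddof=1) / np.sqrt(n)              # tau(F_{2,b})
        t_star[b] = abs(xb.mean() - theta1) / tau_b      # equation (2.20)

    t_alpha = np.sort(t_star)[nu - 1]       # nu-th order statistic estimates t*_alpha
    tau = x.std(ddof=1) / np.sqrt(n)        # sample estimates used for theta(F2), tau(F2)
    print("bootstrap-t interval:", (theta1 - t_alpha * tau, theta1 + t_alpha * tau))  # (2.21)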

2.2 Iterated Bootstrap

To develop the bootstrap idea, we started by finding the t that solves equation (2.7). A lack of knowledge of the population distribution F₀ led us to substitute F₁ for it in equation (2.7), and to substitute the bootstrap distribution F₂ for F₁ in order to solve for t₁ in equation (2.9) as an approximation of t. We argued that we can use the empirical distribution F₁ instead of the population distribution as long as the sample can be considered representative of the population.

This idea can be taken one step further by resampling with replacement from each resample χ*, and solving for t₂ as an approximation of t in

E{f_t(F₂, F₃) | F₂} = 0, (2.22)

where F₂ is the bootstrap distribution from resampling the data χ, and F₃ is the bootstrap distribution from resampling the resampled data χ*.

In theory we can continue this process ad infinitum, and it can be shown that each iteration improves the coverage error by a factor of order O(n⁻¹). However, we showed that the number of distinct resamples grows exponentially with the sample size n, so each additional level multiplies the computational cost; this makes iterated resampling computationally intractable in practice. In most practical problems, resampling is performed at only one level, with 1000 to 5000 resamples.


Chapter 3

Hypothesis Testing and Permutation Analysis

3.1 Hypothesis Testing

Hypothesis testing is a statistical decision-making procedure for testing whether a hypothesis formulated about a population statistic is correct. The decision leads to either accepting or rejecting the hypothesis in question. The hypothesis concerns some property of a statistic of a population. For instance, to test the hypothesis that the mean of a population is equal to µ₀, we can formulate a hypothesis testing problem with the null hypothesis defined as H : µ = µ₀ and the alternative hypothesis as K : µ ≠ µ₀.

A hypothesis testing procedure uses inferential statistics to learn about a population that is too large or inaccessible. Often, instead of the population, we have access to a sample drawn randomly from it, and we must estimate the statistic from the sample at hand. For instance, to solve a hypothesis testing problem about the mean of a population, we can use the sample mean, which is the unbiased estimator of the mean.

To state the problem, let us assume that we want to form a decision about a random variable X with distribution P_θ that belongs to a class P = {P_θ : θ ∈ Ω}. We want to formulate some hypothesis about θ. The set Ω can be partitioned into the values for which the hypothesis is true and those for which it is false; the resulting two mutually exclusive classes are Ω_H and Ω_K respectively, with Ω_H ∪ Ω_K = Ω.

Four possible outcomes can occur as a result of testing a hypothesis: (1) the null hypothesis is true and we accept it, (2) the null hypothesis is true and we reject it, (3) the null hypothesis is false and we accept it, and (4) the null hypothesis is false and we reject it. In cases (2) and (3) we are making an error in our decision, that is, we form a perception about a property of the population that is in fact not true. Thus, two types of error can occur in the decision-making process: a type I error occurs in case (2), and a type II error occurs in case (3). Let us denote the probabilities of type I and type II errors by α and β respectively.

Ideally, hypothesis testing should be performed in a manner that keeps both error probabilities α and β to a minimum. However, when the number of observations is fixed, both probabilities cannot be controlled simultaneously. In hypothesis testing, researchers collect evidence to reject the null hypothesis; in the process, they assume that the null hypothesis is true unless they can show otherwise. Thus, it is customary to control the probability of committing a type I error [21].

The goal in hypothesis testing is to partition the sample space S into two mutually exclusive sets S₀ and S₁. If X falls in S₀, the null hypothesis is accepted, and if it falls in S₁, we reject the null hypothesis. S₀ and S₁ are referred to as the acceptance and critical regions respectively.

In order to control the probability of a type I error, we put a significance level α, a number between 0 and 1, on the probability of S₁ under the assumption that the null hypothesis is true, that is:

P_θ(X ∈ S₁) ≤ α for all θ ∈ Ω_H. (3.1)

We are in effect limiting the probability of a type I error to α, which can be chosen as an arbitrarily small number such as 0.05. We then find the S₁ that maximizes P_θ(X ∈ S₁) for θ ∈ Ω_K subject to the condition of equation (3.1): we maximize the probability of rejecting the null hypothesis when it is in fact false. This probability is referred to as the power of the critical region.

So far we have considered the case where every outcome x of the random variable X belongs to either S₀ or S₁. We can generalize this idea and assume that x belongs to the rejection region with probability φ(x), and to the acceptance region with probability 1 − φ(x). The hypothesis testing experiment then involves, after observing x, an auxiliary draw with two possible outcomes R and R̄, occurring with probabilities φ(x) and 1 − φ(x) respectively. If R is the outcome of the experiment, we reject the hypothesis; otherwise we accept it.

If the distribution of X is P_θ, the probability of rejection is

E_θ φ(X) = ∫ φ(x) dP_θ(x). (3.2)

The problem is to find the φ(x) that maximizes the test power β_θ, defined as

β_θ = E_θ φ(X) = ∫ φ(x) dP_θ(x) for all θ ∈ Ω_K, (3.3)

under the condition

E_θ φ(X) ≤ α for all θ ∈ Ω_H. (3.4)

3.1.1 The Neyman-Pearson Lemma

The Neyman-Pearson Lemma provides us with a way of finding the best critical

region [21].

Theorem 3.1.1. Let P₀ and P₁ be probability distributions with densities p₀ and p₁ respectively with respect to a measure µ.
(1) Existence: For testing H : p₀ against the alternative K : p₁, there exist a test φ and a constant k such that

E₀φ(X) = α, (3.5)

and

φ(x) = 1 when p₁(x) > kp₀(x),
φ(x) = 0 when p₁(x) < kp₀(x). (3.6)

(2) Sufficient condition for a most powerful test: if a test satisfies equations (3.5) and (3.6) for some k, then it is most powerful for testing p₀ against p₁ at level α.
(3) Necessary condition for a most powerful test: if φ is most powerful for testing p₀ against p₁ at level α, then for some k it satisfies (3.6) almost everywhere (µ). It also satisfies (3.5) unless there exists a test of size less than α and with power 1.

Proof. If we define 0 · ∞ := 0 and allow k to be ∞, the theorem is true for α = 0 and α = 1. So let us assume that 0 < α < 1.

(1) Let α(c) = P₀{p₁(X) > c p₀(X)}. Because the probability is computed under P₀, we only need to consider the inequality on the set where p₀(x) > 0; therefore α(c) is the probability that the random variable p₁(X)/p₀(X) is greater than c, and 1 − α(c) is a cumulative distribution function. Thus α(c) is nonincreasing and continuous on the right, with α(c − 0) − α(c) = P₀{p₁(X)/p₀(X) = c}, α(−∞) = 1, and α(∞) = 0. For the given 0 < α < 1, let c₀ be such that α(c₀) ≤ α ≤ α(c₀ − 0), and consider the test φ defined by

φ(x) = 1 when p₁(x) > c₀p₀(x),
φ(x) = (α − α(c₀)) / (α(c₀ − 0) − α(c₀)) when p₁(x) = c₀p₀(x),
φ(x) = 0 when p₁(x) < c₀p₀(x).

The middle expression is defined unless α(c₀) = α(c₀ − 0); under that condition P₀{p₁(X) = c₀p₀(X)} = 0, and φ is defined almost everywhere. The size of φ is

E₀φ(X) = P₀{p₁(X)/p₀(X) > c₀} + [(α − α(c₀)) / (α(c₀ − 0) − α(c₀))] P₀{p₁(X)/p₀(X) = c₀} = α. (3.7)

Comparing the size in equation (3.7) with equation (3.5), we see that c₀ is the k of the theorem.

(2) To prove sufficiency, let φ* be any other test satisfying E₀φ*(X) ≤ α. Denote by S⁺ and S⁻ the subsets of the sample space on which φ(x) − φ*(x) > 0 and φ(x) − φ*(x) < 0 respectively. For all x in S⁺ and S⁻ we have p₁(x) ≥ kp₀(x) and p₁(x) ≤ kp₀(x) respectively. Thus we have

∫ (φ − φ*)(p₁ − kp₀) dµ = ∫_{S⁺∪S⁻} (φ − φ*)(p₁ − kp₀) dµ ≥ 0. (3.8)

The difference in power is then

∫ (φ − φ*)p₁ dµ ≥ k ∫ (φ − φ*)p₀ dµ ≥ 0, (3.9)

where the last inequality holds because E₀φ(X) = α ≥ E₀φ*(X). Therefore φ is at least as powerful as φ*.

(3) To prove the necessary condition, assume that φ* is most powerful for testing p₀ against p₁ at level α but does not satisfy (3.6). Take S as the intersection of S⁺ ∪ S⁻ with {x : p₁(x) ≠ kp₀(x)}, and suppose that µ(S) > 0. Since (φ − φ*)(p₁ − kp₀) is positive on S, we have

∫_{S⁺∪S⁻} (φ − φ*)(p₁ − kp₀) dµ = ∫_S (φ − φ*)(p₁ − kp₀) dµ > 0, (3.10)

so φ is more powerful against p₁ than φ*, which is a contradiction unless µ(S) = 0. This completes the proof [21, 20].

The proof shows that equations (3.5) and (3.6) give necessary and sufficient conditions for a most powerful test up to sets of measure zero, that is, up to the set {x : p₁(x) = kp₀(x)} when it has µ-measure zero. Note that the theorem applies to discrete distributions as well.

To summarize the idea behind the Neyman-Pearson lemma, suppose that X₁, X₂, ..., X_n is an independent identically distributed (i.i.d.) random sample with joint density function f(x; θ). In testing the null hypothesis H : θ = θ₀ against the alternative K : θ = θ₁, the critical region

C_K = {x : f(x, θ₀)/f(x, θ₁) < K} (3.11)

is most powerful for K > 0, according to the Neyman-Pearson lemma.

As an example, suppose that X represents a single observation from the probability density function f(x, θ) = θx^{θ−1} for 0 < x < 1, and that we test the null hypothesis H : θ₀ = 1 against K : θ₁ = 2 at significance level α = 0.05. We have

f(x, θ₀) / f(x, θ₁) = 1/(2x).

Thus the rejection region is R = {x : x > k′}, where k′ = 1/(2K) and K > 0. To determine the value of k′, we express the size of the test in terms of k′ and solve for the desired test size 0.05:

P{x ∈ R | H} = P{X > k′ | H} = 1 − k′ = 0.05.

Thus k′ = 0.95, and the rejection region is R = {x : x > 0.95}. By the lemma, among all tests of the null hypothesis H : θ = 1 against K : θ = 2 at level 0.05, the test with rejection region R has the smallest type II error probability.
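A Monte Carlo check of this example, assuming numpy: under H, X is uniform on (0, 1); under K its CDF is x², so X can be simulated by the inverse transform U^{1/2}. The exact power is 1 − 0.95² = 0.0975.

    import numpy as np

    rng = np.random.default_rng(2)
    m = 1_000_000
    size = np.mean(rng.uniform(size=m) > 0.95)           # theta = 1: near 0.05
    power = np.mean(rng.uniform(size=m) ** 0.5 > 0.95)   # theta = 2: near 0.0975
    print(f"size ~ {size:.4f} (exact 0.05), power ~ {power:.4f} (exact 0.0975)")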

Our treatment of the problem so far involves simple hypotheses, where each distribution class contains a single distribution. This enables us to solve hypothesis testing problems with a null hypothesis of the form H : θ = θ₀ and an alternative K : θ = θ₁. In practical applications, however, we might be interested in solving a hypothesis testing problem of the form H : θ ≤ θ₀ against K : θ > θ₀, which involves a composite distribution class rather than a simple one.

If there exists a real-valued function T(x) such that for any θ < θ′ the distributions P_θ and P_{θ′} are distinct, and the ratio p_{θ′}(x)/p_θ(x) is a nondecreasing function of T(x), then p_θ(x) is said to have the monotone likelihood ratio property [21].

Theorem 3.1.2. Let the random variable X have probability density p_θ(x) with the monotone likelihood ratio property in a real-valued function T(x), where θ is a real parameter.
(1) For testing H : θ ≤ θ₀ against K : θ > θ₀, the most powerful test is given by

φ(x) = 1 when T(x) > C,
φ(x) = γ when T(x) = C,
φ(x) = 0 when T(x) < C, (3.12)

where C and γ are determined by

E_{θ₀}φ(X) = α. (3.13)

(2) The power function of the test,

β(θ) = E_θφ(X), (3.14)

is strictly increasing for all θ for which 0 < β(θ) < 1.
(3) The test of equations (3.12) and (3.13) is most powerful for testing H′ : θ ≤ θ′ against K′ : θ > θ′ at level α′ = β(θ′) for every θ′.
(4) For θ < θ₀, the test minimizes β(θ), there the probability of a type I error, among all tests satisfying (3.13).

The one-parameter exponential family is an important class of distributions with the monotone likelihood ratio property in a real-valued function T(x); it satisfies the assumptions of the theorem through the following corollary [21]:

Corollary 3.1.3. Let X have probability density function, with respect to some measure µ,

p_θ(x) = C(θ) e^{Q(θ)T(x)} h(x), (3.15)

where θ is real and Q(θ) is strictly monotone. Then φ(x) from equation (3.12) is the most powerful test for testing H : θ ≤ θ₀ against K : θ > θ₀ for increasing Q at level α, where C and γ are determined from equation (3.13). For decreasing Q, the inequalities in equation (3.12) are reversed.

3.2 P-Values

So far we have studied hypothesis testing at a fixed significance level α. In an alternative standard non-Bayesian approach, α is not fixed: for varying α, we determine the smallest significance level at which the null hypothesis would be rejected for a given observation. This significance level is referred to as the p-value of the test.

For a random variable X, suppose that the distribution of p₁(X)/p₀(X) is continuous. Then the most powerful test specifies the rejection region S_α = {x : p₁(x)/p₀(x) > k} with k = k(α) a function of α, where k is determined from the size equation (3.5). Performing the test for varying α creates nested rejection regions, that is,

S_α ⊂ S_{α′} if α < α′. (3.16)

The p-value can now be determined as

p = p(X) = inf{α : X ∈ S_α}. (3.17)


For example, suppose that X is a Normal random variable N(µ, σ²) with σ² known. We formulate a hypothesis testing problem on µ with the null hypothesis H : µ = 0 against the alternative K : µ = µ₁ for some µ₁ > 0. The likelihood ratio can be written as

p₁(x)/p₀(x) = exp[−(x − µ₁)²/(2σ²)] / exp[−x²/(2σ²)] = exp[µ₁x/σ² − µ₁²/(2σ²)].

Thus, in order to have p₁(x)/p₀(x) > k, x should be greater than a constant k′, which is determined from the constraint P₀{X > k′} = α. The rejection region can therefore be written as S_α = {X : X > σz_{1−α}}, where z_{1−α} is the (1 − α) percentile of the standard Normal distribution. From the definition of the percentile, we have

S_α = {X : 1 − Φ(X/σ) < α}.

For a given observed value of X, the infimum over all α with X ∈ S_α is

p = 1 − Φ(X/σ),

which is uniformly distributed on (0, 1) under the null hypothesis.
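For a concrete number, assuming only the Python standard library; the observed X and σ below are hypothetical.

    import math

    X, sigma = 2.1, 1.0                                   # hypothetical observation, known sigma
    Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    print(f"p-value = {1.0 - Phi(X / sigma):.4f}")        # p = 1 - Phi(X/sigma) ~ 0.0179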

3.3 Permutation Analysis

The Neyman-Pearson lemma determines the most powerful test for simple hypotheses, as well as for composite ones with the monotone likelihood ratio property. As we studied in the previous section, to determine the rejection region one also needs to know the distribution of the test statistic under the null hypothesis. Often in practical applications the test statistic distribution under the null hypothesis cannot be found analytically. Permutation tests address this problem by providing researchers with a simple way of estimating the test statistic distribution using the bootstrap idea.

A permutation test is essentially hypothesis testing through bootstrapping. The idea of permutation analysis is to estimate a test statistic's distribution by resampling with replacement under the assumption that the null hypothesis is true. For instance, suppose that we perform an experiment in which a brain magnetic signal is recorded from a brain region while the subject performs a reaching task. The signal is recorded for 2 seconds: during the first 0.5 seconds a baseline signal is recorded while the subject sits still doing nothing, and the last 1.5 seconds are recorded while the subject performs the reaching task. We want to investigate whether the brain region is active during the experiment. To be more specific, suppose that the data are collected at a rate of 600 samples per second; we then have 300 samples of baseline and 900 samples from the task.

One way to approach the problem is to compare the mean of the signal during the task, µ_task, with that of the baseline, µ_baseline. To this end, we formulate a hypothesis testing problem with the null hypothesis H : µ_task − µ_baseline = 0 against the alternative K : µ_task − µ_baseline > 0. As a test statistic we take the difference between sample means, x̄_task − x̄_baseline. The null hypothesis assumes no difference between the baseline and the task. Thus, resampling with replacement under the null hypothesis means that out of the 1200 pooled samples we draw 300 and 900 samples to assign to the baseline and the task respectively, and calculate the difference between sample means for each resample. If we take 2048 resamples, we obtain 2048 such mean differences, from which we compute the empirical (bootstrap) distribution of the differences. To calculate the p-value, we locate the observed sample mean difference in this bootstrap distribution; a code sketch follows.
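A sketch of this test, assuming numpy; the 300/900 split and the 2048 resamples follow the text, while the simulated signal is a hypothetical stand-in for real MEG data. As described above, labels are redrawn with replacement under the null; a classical permutation test would instead shuffle the 1200 pooled samples without replacement.

    import numpy as np

    rng = np.random.default_rng(3)
    baseline = rng.normal(0.0, 1.0, 300)       # hypothetical baseline recording
    task = rng.normal(0.3, 1.0, 900)           # hypothetical task recording
    observed = task.mean() - baseline.mean()

    pooled = np.concatenate([baseline, task])  # under H, no baseline/task difference
    B = 2048
    null_diffs = np.empty(B)
    for b in range(B):
        draw = rng.choice(pooled, size=pooled.size, replace=True)
        null_diffs[b] = draw[300:].mean() - draw[:300].mean()

    # locate the observed difference in the resampled null distribution
    p = np.mean(null_diffs >= observed)
    print(f"observed difference = {observed:.3f}, one-sided p = {p:.4f}")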

Permutation tests scramble data randomly between the groups. Therefore, for the test to be valid, the distributions of the two groups must be the same under the null hypothesis. To account for differences in standard deviation, it is more accurate to consider a pivotal test statistic and normalize by the unbiased standard deviation estimators of the groups.

We can use permutation tests when the problem design and the null hypothesis allow us to resample under the null hypothesis. Permutation tests are suitable for three groups of problems: two-sample problems, when the null hypothesis assumes that the test statistic is the same for both groups; matched-pair designs, when the null hypothesis assumes that there are only random differences within pairs; and problems that study a relationship, e.g., the correlation coefficient between two variables, when the null hypothesis assumes no relationship between them [18].

In the Results chapter we study the above example in more depth.


Chapter 4

Asymptotic Properties of the Mean

In Chapter 2 we reviewed bootstrap theory. In practice, the bootstrap procedure is used when the population and/or statistic distributions are not known. In this chapter we study the validity of the bootstrap procedure for the sample mean, one of the statistics that can be handled analytically.

Let X₁, X₂, ..., X_n be n independent identically distributed random variables with common distribution F, mean µ, and variance σ², both unknown. The sample mean µ̂_n = (1/n) Σ_{i=1}^{n} X_i is the unbiased estimator of the mean µ. If we take σ̂²_n = (1/(n−1)) Σ_{i=1}^{n} (X_i − µ̂_n)² as an estimator of σ², then by the Central Limit Theorem the pivotal statistic Q_n = √n(µ̂_n − µ)/σ̂_n tends to N(0, 1) in distribution.

It is interesting to study the asymptotic behaviour of the bootstrap distribution. We pick n resamples X*₁, X*₂, ..., X*_n with replacement from the sample set; with each data point having the same chance of being picked, probability mass 1/n is assigned to each of the n sample points. The bootstrap sample mean is given as

µ̂*_n = (1/n) Σ_{i=1}^{n} X*_i, (4.1)

and the sample variance as

σ̂*²_n = (1/(n−1)) Σ_{i=1}^{n} (X*_i − µ̂*_n)². (4.2)

We are interested in the asymptotic behaviour of the pivotal bootstrap statistic Q*_n = √n(µ̂*_n − µ̂_n)/σ̂*_n. As discussed in Chapter 2, we construct the pivotal statistic by replacing the sample mean with the bootstrap mean, and the population mean µ with the sample mean µ̂_n. Essentially, we take the bootstrap distribution F*, which assigns the same probability mass to each X_i, i = 1, ..., n, in place of the population distribution F.

Theorem 4.0.1. Let X₁, X₂, ... be an independent identically distributed random sequence with positive variance σ². For almost all sample sequences X₁, X₂, ..., conditional on (X₁, X₂, ..., X_n), as n tends to ∞:
(1) the conditional distribution of √n(µ̂*_n − µ̂_n) converges to N(0, σ²) in distribution;
(2) σ̂*_n → σ in probability.

Parts (1) and (2) of Theorem 4.0.1 and Slutsky's theorem imply that the pivotal bootstrap statistic Q*_n converges to N(0, 1) in distribution. In this report we prove part (1) of the theorem using the ideas and lemmas from Angus (1989) [2]. For the complete proof of part (2) using the law of large numbers, we refer to Politis and Romano (1994) [24].
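An illustrative simulation of part (1), assuming numpy: for one fixed (skewed) sample, a few quantiles of the bootstrap distribution of √n(µ̂*_n − µ̂_n) are compared with those of N(0, σ̂²). The sample size and B are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(4)
    n = 500
    x = rng.exponential(size=n)     # fixed sample; population variance sigma^2 = 1
    mu_n = x.mean()

    B = 5000
    t_star = np.empty(B)
    for b in range(B):
        t_star[b] = np.sqrt(n) * (rng.choice(x, size=n, replace=True).mean() - mu_n)

    sigma_n = x.std(ddof=1)
    for q, z in ((0.05, -1.6449), (0.50, 0.0), (0.95, 1.6449)):
        print(f"q={q:.2f}  bootstrap {np.quantile(t_star, q):+.3f}  Normal {z * sigma_n:+.3f}")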

Lemma 4.0.2 (Borel-Cantelli). Let A_n, n ≥ 1, be a sequence of events in a probability space. If

Σ_{n=1}^{∞} P(A_n) < ∞, (4.3)

then P(A_n i.o.) = 0, where i.o. stands for infinitely often, that is,

{A_n i.o.} = ∩_{i=1}^{∞} ∪_{n=i}^{∞} A_n. (4.4)

Proof. We want to prove that with probability one only a finite number of the events occur. Let I_n = I_{A_n} be the indicator function of A_n. The number of events that occur is N = Σ_{n=1}^{∞} I_n, and P(A_n i.o.) = 0 if and only if P(N < ∞) = 1. By Fubini's theorem, E(N) = Σ_{n=1}^{∞} P(A_n), which is finite by assumption, and E(N) < ∞ implies P(N < ∞) = 1, which completes the proof.

Lemma 4.0.3. Let the sequence X₁, X₂, ... consist of independent identically distributed random variables with E|X₁| < ∞. Then for every ε > 0, P{|X_n| > εn i.o.} = 0.

Proof. It is sufficient to show that the Borel-Cantelli assumption holds. Fix ε > 0:

Σ_{n=1}^{∞} P(|X_n| ≥ εn) = Σ_{n=1}^{∞} Σ_{k=n}^{∞} P{εk ≤ |X₁| < ε(k+1)}
  = (Fubini) Σ_{k=1}^{∞} Σ_{n=1}^{k} P{εk ≤ |X₁| < ε(k+1)}
  = Σ_{k=1}^{∞} k P{εk ≤ |X₁| < ε(k+1)}
  ≤ E|X₁|/ε < ∞.

The Borel-Cantelli lemma completes the proof.

Lemma 4.0.4. Let the sequence X₁, X₂, ... consist of independent identically distributed random variables with E|X₁|² < ∞. Then

n^{−3/2} Σ_{k=1}^{n} |X_k|³ → 0 almost surely as n → ∞.

Proof. Fix ε > 0:

n^{−3/2} Σ_{k=1}^{n} |X_k|³
  = n^{−3/2} Σ_{k=1}^{n} |X_k|³ I{|X_k| ≥ ε√k} + n^{−3/2} Σ_{k=1}^{n} |X_k|³ I{|X_k| < ε√k}
  ≤ n^{−3/2} Σ_{k=1}^{n} |X_k|³ I{|X_k| ≥ ε√k} + ε n^{−3/2} Σ_{k=1}^{n} |X_k|² √k
  ≤ n^{−3/2} Σ_{k=1}^{n} |X_k|³ I{|X_k| ≥ ε√k} + ε n^{−1} Σ_{k=1}^{n} |X_k|². (4.5)

By Lemma 4.0.3,

P{|X_k|³ I{|X_k| ≥ ε√k} ≠ 0 i.o.} = P{|X_k|² ≥ ε²k i.o.} = 0.

Thus |X_k|³ I{|X_k| ≥ ε√k} = 0 almost surely for all but finitely many values of k, and therefore n^{−3/2} Σ_{k=1}^{n} |X_k|³ I{|X_k| ≥ ε√k} → 0 almost surely as n → ∞. By the law of large numbers, the second term in equation (4.5), ε n^{−1} Σ_{k=1}^{n} |X_k|², converges almost surely to εE[X₁²] as n → ∞. Hence lim sup_{n→∞} n^{−3/2} Σ_{k=1}^{n} |X_k|³ ≤ εE[X₁²] almost surely, and since ε > 0 was arbitrary, the limit is 0 almost surely.

Now we are ready to prove Theorem 4.0.1.

Proof. Define T*_n = √n(µ̂*_n − µ̂_n). Conditional on the sample, T*_n can be written as the sum of n independent identically distributed random variables n^{−1/2}(X*_k − µ̂_n), k = 1, ..., n, where each resampled variable X*_k takes values in {X₁, ..., X_n} with equal probability mass 1/n. Thus, the characteristic function of T*_n can be written as

E[exp(itT*_n)] = [ (1/n) Σ_{j=1}^{n} exp( it(X_j − µ̂_n)/√n ) ]^n. (4.6)

By repeated integration by parts, exp(ix) can be written as

exp(ix) = 1 + ix − x²/2 + (x³/6)θ(x), (4.7)

where θ(x) := (3/x³) ∫_0^x i³(x − t)² e^{it} dt is a continuous function of x with |θ(x)| ≤ 1.

Thus, equation (4.6) can be written as

E[exp(itT*_n)] = [ 1 + (1/n) Σ_{j=1}^{n} it(X_j − µ̂_n)/√n − (1/n) Σ_{j=1}^{n} t²(X_j − µ̂_n)²/(2n) + (1/n) Σ_{j=1}^{n} ( t³(X_j − µ̂_n)³/(6n^{3/2}) ) θ( t(X_j − µ̂_n)/√n ) ]^n. (4.8)

The second term in the brackets is in fact exactly zero, since Σ_{j=1}^{n} (X_j − µ̂_n) = 0. Writing σ̃²_n = (1/n) Σ_{j=1}^{n} (X_j − µ̂_n)² and denoting the last term by Q_n, we get

E[exp(itT*_n)] = [ 1 − t²σ̃²_n/(2n) + Q_n ]^n. (4.9)

From |θ(x)| ≤ 1, n|Q_n| ≤ (|t|³/6) n^{−3/2} Σ_{j=1}^{n} |X_j − µ̂_n|³; thus, by Lemma 4.0.4, nQ_n → 0 almost surely as n → ∞. By the law of large numbers, σ̃²_n → σ² almost surely as n → ∞. Hence, as n → ∞,

E[exp(itT*_n) | X₁, X₂, ..., X_n] = [ 1 − t²σ̃²_n/(2n) + Q_n ]^n → exp(−t²σ²/2), (4.10)

which is the characteristic function of N(0, σ²). This completes the proof of part (1).

Thus far, we have shown that the bootstrap procedure works asymptotically for the mean. The delta method guarantees the validity of the procedure for any function with continuous derivatives in a neighbourhood of the mean as well.

Another problem of interest is the order of accuracy of the bootstrap estimate. In particular, is it more accurate to use a Normal approximation for the population than the bootstrap estimate of the distribution? This is the subject of the next chapter, where we show that the answer is no.


Chapter 5

Bootstrap Accuracy and Edgeworth Expansion

5.1 Edgeworth Expansion

The characteristic function of a random variable X is defined as φ(t) = E(e^{itX}); moments of X can be recovered from its derivatives at the origin. The jth moment of X, defined as µ_j = E(X^j), satisfies

µ_j = i^{−j} (d^j φ/dt^j)(0). (5.1)

The Taylor series expansion of the characteristic function at t = 0 can be written as

φ(t) = E(e^{itX}) = E( Σ_{n=0}^{∞} (itX)^n/n! ) = Σ_{n=0}^{∞} µ_n(it)^n/n!, (5.2)

where µ₀ = 1 and 0! = 1.

The cumulant generating function of the random variable X is defined as log(φ(t)), the natural logarithm of the characteristic function. Cumulants are found from the power series expansion of the cumulant generating function:

log(φ(t)) = Σ_{n=1}^{∞} κ_n(it)^n/n!, (5.3)

where κ_n is the nth cumulant of the random variable X.

To find the relationship between the cumulants and moments of the random variable X, we can write log(φ(t)) as

log(φ(t)) = −Σ_{n=1}^{∞} (1/n)(1 − E(e^{itX}))^n = −Σ_{n=1}^{∞} (1/n) ( −Σ_{m=1}^{∞} µ_m(it)^m/m! )^n. (5.4)

Comparing the expansions (5.3) and (5.4), it can be shown that the cumulants are homogeneous polynomials in the moments and vice versa [16]. In particular, we have the following relationships for the first four cumulants:

κ₁ = µ₁ (5.5)
κ₂ = µ₂ − µ₁² = var(X) (5.6)
κ₃ = µ₃ − 3µ₂µ₁ + 2µ₁³ (5.7)
κ₄ = µ₄ − 4µ₃µ₁ − 3µ₂² + 12µ₂µ₁² − 6µ₁⁴. (5.8)

The third and fourth cumulants, κ₃ and κ₄, are referred to as the skewness and kurtosis respectively.
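A numerical illustration of equations (5.5)-(5.8), assuming numpy: sample moments of an Exp(1) sample are combined into cumulant estimates; for Exp(1) the population cumulants are κ_j = (j − 1)!, i.e., 1, 1, 2, 6.

    import numpy as np

    rng = np.random.default_rng(5)
    x = rng.exponential(size=1_000_000)
    m1, m2, m3, m4 = (np.mean(x ** j) for j in range(1, 5))   # sample moments mu_1..mu_4

    k1 = m1                                                   # (5.5)
    k2 = m2 - m1 ** 2                                         # (5.6)
    k3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3                       # (5.7)
    k4 = m4 - 4 * m3 * m1 - 3 * m2 ** 2 + 12 * m2 * m1 ** 2 - 6 * m1 ** 4   # (5.8)
    print(f"k1={k1:.3f} k2={k2:.3f} k3={k3:.3f} k4={k4:.3f}")  # approx 1, 1, 2, 6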

Let X₁, X₂, ... be independent identically distributed random variables with mean µ and variance σ². As mentioned in the previous chapter, by the Central Limit Theorem Q_n = √n(µ̂_n − µ) converges to N(0, σ²) in distribution as n → ∞, where µ̂_n is the sample mean for a sample of size n. Let us assume that µ₁ = µ = 0 and σ² = 1. The problem of interest is to find the cumulative distribution of S_n = √n µ̂_n = n^{−1/2} Σ_{j=1}^{n} X_j; in particular, we are interested in a power series expansion of P(S_n ≤ x). Such an expansion is referred to as an Edgeworth expansion.

To this end, we start from the characteristic function of S_n:

φ_n(t) = E{exp(itS_n)}. (5.9)

From the independence assumption, φ_n(t) can be written as

φ_n(t) = E{exp(itn^{−1/2}X₁)} E{exp(itn^{−1/2}X₂)} ... E{exp(itn^{−1/2}X_n)} = [φ(tn^{−1/2})]^n. (5.10)

From equation (5.3), we have

φ_n(t) = [φ(tn^{−1/2})]^n = [ exp( Σ_{j=1}^{∞} κ_j(itn^{−1/2})^j/j! ) ]^n. (5.11)

Since κ₁ = 0 and κ₂ = 1,

φ_n(t) = exp( −t²/2 + κ₃n^{−1/2}(it)³/6 + ... + κ_j n^{−(j−2)/2}(it)^j/j! + ... ). (5.12)

By expanding the exponent we get

φ_n(t) = e^{−t²/2} + n^{−1/2}r₁(it)e^{−t²/2} + ... + n^{−j/2}r_j(it)e^{−t²/2} + ..., (5.13)

where each r_j is a polynomial of degree 3j and parity j, with coefficients depending on κ₃, ..., κ_{j+2} [16] and independent of n. In particular,

r₁(u) = (1/6)κ₃u³,
r₂(u) = (1/72)κ₃²u⁶ + (1/24)κ₄u⁴.

By definition, the characteristic function of S_n can be written as

φ_n(t) = ∫_{−∞}^{∞} e^{itx} dP(S_n ≤ x). (5.14)

Moreover, the fact that e^{−t²/2} is the characteristic function of the standard Normal suggests that it is possible to invert the expansion of φ_n(t) term by term to get the cumulative distribution function (CDF) of S_n,

P(S_n ≤ x) = Φ(x) + n^{−1/2}R₁(x) + ... + n^{−j/2}R_j(x) + ..., (5.15)

where Φ(x) is the CDF of the standard Normal, and

∫_{−∞}^{∞} e^{itx} dR_j(x) = r_j(it)e^{−t²/2}. (5.16)

The next step is to calculate R_j(x). We have

e^{−t²/2} = ∫_{−∞}^{∞} e^{itx} dΦ(x). (5.17)


Integration by parts on equation (5.17) gives

e^{−t²/2} = (−it)^{−1} ∫_{−∞}^{∞} e^{itx} dΦ^{(1)}(x) = ... = (−it)^{−j} ∫_{−∞}^{∞} e^{itx} dΦ^{(j)}(x),

where Φ^{(j)}(x) = D^jΦ(x) and D is the differentiation operator. Therefore

∫_{−∞}^{∞} e^{itx} d{r_j(−D)Φ(x)} = r_j(it)e^{−t²/2}. (5.18)

From equations (5.16) and (5.18) and the uniqueness of the Fourier transform, we deduce that

R_j(x) = r_j(−D)Φ(x). (5.19)

For j ≥ 1,

(−D)^jΦ(x) = −He_{j−1}(x)φ(x), (5.20)

where He_n(x) is the standardized Hermite polynomial of degree n, with the same parity as n, defined as

He_n(x) = (−1)^n e^{x²/2} (d^n/dx^n) e^{−x²/2}. (5.21)

Therefore R_j(x) can be written as R_j(x) = P_j(x)φ(x), where P_j(x) is a polynomial of degree 3j − 1 with parity opposite to j, and coefficients that depend on the moments of X up to order j + 2. In particular,

P₁(x) = −(1/6)κ₃(x² − 1), (5.22)
P₂(x) = −x{ (1/24)κ₄(x² − 3) + (1/72)κ₃²(x⁴ − 10x² + 15) }. (5.23)

The Edgeworth expansion of the cumulative distribution function P(S_n ≤ x) can then be written as

P(S_n ≤ x) = Φ(x) + n^{−1/2}P₁(x)φ(x) + n^{−1}P₂(x)φ(x) + ... + n^{−j/2}P_j(x)φ(x) + .... (5.24)

For the CDF of a random variable Y, the Edgeworth expansion converges as an infinite series if E{exp((1/4)Y⁴)} < ∞ [10], which is a restrictive condition on the tails of the distribution. However, if the series is stopped after j terms, the remainder is of order n^{−j/2}:

P(S_n ≤ x) = Φ(x) + n^{−1/2}P₁(x)φ(x) + n^{−1}P₂(x)φ(x) + ... + n^{−j/2}P_j(x)φ(x) + o(n^{−j/2}), (5.25)

which is a valid expansion for fixed j as n → ∞. Cramér [10] gives sufficient regularity conditions for the expansion:

E(|X|^{j+2}) < ∞, lim sup_{|t|→∞} |φ(t)| < 1. (5.26)
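A simulation check of the one-term expansion in (5.25), assuming numpy and the standard library: the X_i are standardized Exp(1) variables (κ₃ = 2), n = 10, and the Edgeworth approximation Φ(x) + n^{−1/2}P₁(x)φ(x), with P₁ from (5.22), is compared with the plain Normal approximation. All numerical choices here are our own.

    import math
    import numpy as np

    rng = np.random.default_rng(6)
    n, kappa3, x = 10, 2.0, 2.0
    # S_n = n^{-1/2} * sum of standardized Exp(1) variables (mean 0, variance 1)
    s = (rng.exponential(size=(200_000, n)) - 1.0).sum(axis=1) / math.sqrt(n)

    simulated = np.mean(s <= x)
    Phi = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    phi = math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
    P1 = -(kappa3 / 6.0) * (x * x - 1.0)                  # equation (5.22)
    edgeworth = Phi + n ** -0.5 * P1 * phi
    print(f"simulated {simulated:.4f}  Normal {Phi:.4f}  Edgeworth {edgeworth:.4f}")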

Bhattacharya and Ghosh [5] show that for statistics with continuous derivatives in a neighbourhood of the mean, the expansion (5.25) under the regularity conditions (5.26) converges uniformly in x as n → ∞. As in the case of S_n, the polynomials P_j(x) are of degree 3j − 1 with parity opposite to j (even polynomials for odd j and vice versa), and have coefficients that depend on the moments of X up to order j + 2 and on the derivatives of the statistic.

5.2 Bootstrap Edgeworth Expansion

Let X₁, X₂, ..., X_n be independent identically distributed random samples with common distribution F, and let θ̂ be the estimator of the statistic θ computed from the data set χ = {X₁, X₂, ..., X_n} with empirical distribution F̂. Let us further assume that S = n^{1/2}(θ̂ − θ) is asymptotically N(0, σ²), where σ² = σ²(F) is the asymptotic variance of S. Then the pivotal statistic

T = n^{1/2}(θ̂ − θ)/σ̂, (5.27)

where σ̂² = σ²(F̂), is asymptotically N(0, 1).

From [5], the Edgeworth expansions of the CDFs of S and T are

P(S ≤ x) = Φ(x/σ) + n^{−1/2}P₁(x/σ)φ(x/σ) + n^{−1}P₂(x/σ)φ(x/σ) + ..., (5.28)
P(T ≤ x) = Φ(x) + n^{−1/2}Q₁(x)φ(x) + n^{−1}Q₂(x)φ(x) + ..., (5.29)

where P_j(x) and Q_j(x) are polynomials of degree 3j − 1 with parity opposite to j. Thus the Normal approximations of the CDFs of S and T, P(S ≤ x) ≈ Φ(x/σ) and P(T ≤ x) ≈ Φ(x), are in error by order n^{−1/2}.

To study bootstrap accuracy, let us assume that the bootstrap estimate θ̂* of θ̂ is computed from the resample data set χ* = {X*₁, X*₂, ..., X*_n} with bootstrap distribution F̂*. Then the bootstrap versions of S and T are

S* = n^{1/2}(θ̂* − θ̂), (5.30)
T* = n^{1/2}(θ̂* − θ̂)/σ̂*, (5.31)

where σ̂* is the bootstrap estimate of σ̂.

The Edgeworth expansions of the CDFs of S* and T* are

P(S* ≤ x | χ) = Φ(x/σ̂) + n^{−1/2}P̂₁(x/σ̂)φ(x/σ̂) + n^{−1}P̂₂(x/σ̂)φ(x/σ̂) + ..., (5.32)
P(T* ≤ x | χ) = Φ(x) + n^{−1/2}Q̂₁(x)φ(x) + n^{−1}Q̂₂(x)φ(x) + ..., (5.33)

where P̂_j(x) and Q̂_j(x) are obtained by replacing the unknowns in P_j(x) and Q_j(x) by their bootstrap estimates. The estimated coefficients differ from their counterparts in P_j(x) and Q_j(x) by order n^{−1/2}. Therefore, the accuracy of the bootstrap CDFs of S and T is

P(S* ≤ x | χ) − P(S ≤ x) = Φ(x/σ̂) − Φ(x/σ) + O(n^{−1}), (5.34)
P(T* ≤ x | χ) − P(T ≤ x) = O(n^{−1}). (5.35)

In equation (5.34), the standard deviation estimate σ̂ differs from the standard deviation σ by order n^{−1/2}, σ̂ − σ = O(n^{−1/2}); thus

P(S* ≤ x | χ) − P(S ≤ x) = O(n^{−1/2}). (5.36)


Equations (5.36) and (5.35) describe the bootstrap CDF accuracy for S and T respectively. The bootstrap estimate for S has the same order of accuracy as the Normal approximation, whereas the bootstrap estimate for T is more accurate than the Normal approximation by order n^{−1/2}. This brings us to the advantage of pivotal statistics: T is pivotal while S is not. Since the distribution of T does not depend on any unknowns, the bootstrap's power is directed toward estimating the distribution's skewness, while in the case of the non-pivotal statistic S, it is "wasted" on estimating the standard deviation.

At this stage, the problem of interest is the accuracy of bootstrap confidence intervals.

5.3 Bootstrap Confidence Interval Accuracy

Recall that the Edgeworth expansion of the CDF of the statistic S_n can be written as in equation (5.25). Denote the α-level percentiles of S_n and of the standard Normal distribution by ξ_α and z_α respectively, where

ξ_α = inf{x : P(S_n ≤ x) ≥ α}. (5.37)

By inverting equation (5.25), we can write the series expansion of ξ_α in terms of z_α as

ξ_α = z_α + n^{−1/2}P₁^{cf}(z_α) + ... + n^{−j/2}P_j^{cf}(z_α) + o(n^{−j/2}). (5.38)

This expansion is referred to as the Cornish-Fisher expansion. Cornish and Fisher [9] proved that the asymptotic series converges uniformly in ε < α < 1 − ε for 0 < ε < 1/2.

In the expansion, the P_j^{cf}(x) are polynomials of degree at most j + 1 with parity opposite to j, whose coefficients depend on cumulants up to order j + 2. In particular,

P₁^{cf}(x) = −P₁(x), (5.39)
P₂^{cf}(x) = P₁(x)P₁′(x) − (1/2)xP₁(x)² − P₂(x). (5.40)

In this section we consider two types of α-level confidence intervals, one-sided and two-sided, which we denote by I_1 and I_2 respectively:

I_1 = (−∞, θ̂ + n^{-1/2}σ̂z_α),    (5.41)

I_2 = (θ̂ − n^{-1/2}σ̂x_α, θ̂ + n^{-1/2}σ̂x_α),    (5.42)

where z_α and x_α are defined by

P(N ≤ z_α) = α,    (5.43)

P(|N| ≤ x_α) = α,    (5.44)

for a standard Normal random variable N. Essentially, I_1 and I_2 are constructed under the assumption of a Normal distribution for the population. From the Edgeworth expansion of the statistic T in equation (5.29) we can evaluate the accuracy of this assumption; that is, we calculate the coverage error of I_1:

P(θ ∈ I_1) = P(T ≥ −z_α)
           = 1 − Φ(−z_α) − n^{-1/2}Q_1(−z_α)φ(−z_α) + O(n^{-1})
           = α − n^{-1/2}Q_1(z_α)φ(z_α) + O(n^{-1}),

where the last line uses the symmetry of φ and the fact that Q_1 is even.


Therefore, the coverage error of the one-sided interval is of order n^{-1/2}. Similarly, noting that Q_1(x) and Q_2(x) are even and odd polynomials respectively, we calculate the coverage error of I_2:

P(θ ∈ I_2) = P(T ≤ x_α) − P(T ≤ −x_α)
           = α + 2n^{-1}Q_2(x_α)φ(x_α) + O(n^{-2}),

which indicates that the coverage error of the two-sided interval is of order n^{-1}.
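A small Monte Carlo check (a sketch in Python with NumPy; the exponential population and the replication count are illustrative) makes the two rates visible, with the two-sided coverage approaching the nominal α = 0.95 markedly faster than the one-sided coverage:

    import numpy as np

    rng = np.random.default_rng(1)
    z, x_a = 1.6449, 1.9600     # z_0.95 and x_0.95 of the standard Normal

    for n in (10, 40, 160):
        data = rng.exponential(1.0, size=(50_000, n))  # skewed population, mean 1
        m = data.mean(axis=1)
        s = data.std(axis=1, ddof=1)
        one_sided = np.mean(1.0 <= m + z * s / np.sqrt(n))            # theta in I1
        two_sided = np.mean(np.abs(m - 1.0) <= x_a * s / np.sqrt(n))  # theta in I2
        print(f"n={n:3d}  one-sided coverage={one_sided:.3f}  "
              f"two-sided coverage={two_sided:.3f}")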

We now study the accuracy of bootstrap quantile estimates for the pivotal statistic T. Let us define the α-quantile η_α of T by

P(T ≤ η_α) = α.    (5.45)

Similarly, for the bootstrap version of T, denoted by T* in equation (5.31), the α-quantile η̂_α is defined by

P(T* ≤ η̂_α | χ) = α.    (5.46)

From the Cornish–Fisher expansion, η_α can be written as

η_α = z_α + n^{-1/2}Q^{cf}_1(z_α) + n^{-1}Q^{cf}_2(z_α) + O(n^{-3/2}),    (5.47)


where z_α is the α-quantile of the standard Normal distribution, and Q^{cf}_1 and Q^{cf}_2 are defined as

Q^{cf}_1(x) = −Q_1(x),    (5.48)

Q^{cf}_2(x) = Q_1(x)Q′_1(x) − (1/2) x Q_1(x)² − Q_2(x),    (5.49)

where Q_1(x) and Q_2(x) are the polynomials in the Edgeworth expansion of the CDF of T in equation (5.29). By substituting the coefficients in Q^{cf}_1(x) and Q^{cf}_2(x) with their respective bootstrap estimates we obtain the Cornish–Fisher expansion of η̂_α,

η̂_α = z_α + n^{-1/2}Q̂^{cf}_1(z_α) + n^{-1}Q̂^{cf}_2(z_α) + O(n^{-3/2}).    (5.50)

The estimated coefficients in Q̂^{cf}_j(x) differ from their corresponding values in Q^{cf}_j(x) by order n^{-1/2}. Therefore,

η̂_α = η_α + O(n^{-1}),    (5.51)

that is, the bootstrap quantile estimate is accurate to order n^{-1}, which outperforms the order-n^{-1/2} accuracy of the Normal approximation η_α ≈ z_α in equation (5.47).
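In practice, this result motivates the bootstrap-t interval, which replaces z_α in (5.41) with a bootstrap quantile of T*. A minimal sketch, assuming NumPy (the function name and the resample count B = 2000 are illustrative):

    import numpy as np

    def bootstrap_t_upper_bound(x, alpha=0.95, B=2000, rng=None):
        """One-sided bootstrap-t upper confidence bound for the mean.

        Resamples T* = sqrt(n)(mean* - mean)/sd*, estimates its
        (1 - alpha)-quantile eta_hat, and inverts the pivot:
        P(theta <= mean - sd * eta_hat / sqrt(n)) is approximately alpha.
        """
        rng = rng or np.random.default_rng()
        n, m, s = len(x), x.mean(), x.std(ddof=1)
        t_star = np.empty(B)
        for b in range(B):
            xb = rng.choice(x, n)                   # resample with replacement
            t_star[b] = np.sqrt(n) * (xb.mean() - m) / xb.std(ddof=1)
        eta_hat = np.quantile(t_star, 1.0 - alpha)  # bootstrap quantile of T*
        return m - s * eta_hat / np.sqrt(n)

    # Example: a 95% upper bound for the mean of a small skewed sample.
    x = np.random.default_rng(2).exponential(1.0, 25)
    print(bootstrap_t_upper_bound(x))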

So far we have studied bootstrap theory and investigated permutation analysis as a hypothesis testing method based on the bootstrap principle. Moreover, we have studied the asymptotic behaviour of the bootstrap as well as the accuracy of bootstrap estimates, and shown that bootstrap resampling from the data outperforms the Normal approximation of the data distribution both in accuracy and in asymptotic convergence. In the next chapter we follow the example sketched briefly in Section 3.3 and use permutation analysis to localize statistically significant brain activity in reaching/pointing tasks.


Chapter 6

Results

Recent developments in brain imaging such as electroencephalography (EEG), magnetoencephalography (MEG), positron emission tomography (PET), and functional magnetic resonance imaging (fMRI) have made it possible for researchers to localize brain activity more accurately. Ideal localization of activity is spatially accurate and preserves the timing of the activity. However, in choosing an appropriate neuroimaging technique there has always been a trade-off between temporal and spatial resolution. For instance, fMRI works by detecting variations in levels of blood oxygenation that occur when blood flow increases in active brain areas due to higher oxygen demand. Because of the time lag between neural activity and a detectable change in blood oxygenation level, the temporal resolution of fMRI is low (on the order of seconds). Therefore, while fMRI spatial accuracy is on the millimetre scale, which is high enough for most studies, the timing of the activity is compromised, making fMRI unsuitable for studies in which timing plays a crucial role. On the other hand, EEG captures the timing of activity more accurately by measuring the natural electrical current flow of active neurons. However, spatial localization of the activity cannot be performed accurately, because artifacts of non-cerebral origin, such as eye movements and cardiac artifacts, interfere with EEG data [22].

The neuroimaging method of choice in this study is MEG. The MEG machine uses more than a hundred highly sensitive superconducting magnetic sensors, referred to as superconducting quantum interference devices (SQUIDs), placed around the scalp to measure the radial magnetic fields produced by neuronal electric currents. MEG temporal resolution is on the order of milliseconds, making the method appropriate for real-time brain functional studies. Since the magnetic permeability of the scalp and the tissues beneath it is approximately the same, a neuronal magnetic signal can be measured without much distortion, which is an advantage over EEG, where the largely varying conductances of these tissues distort measurements. Reliable estimates of such conductances must be available in EEG experiments to compensate for the distortion [17].

Localization from the recorded magnetic signals in MEG poses an inverse problem, because the number of sources in the brain from which activity is recorded exceeds the number of measuring sensors. A number of post-processing techniques such as spatial filtering (beamforming) have been proposed in the literature to solve the problem [23, 27, 3, 25, 26, 7, 8]. Moreover, classical and adaptive clustering algorithms have been proposed to improve beamformer spatial resolution [12, 1].

In this report we take the following localization approach. After appropriate preprocessing steps to prepare the data, we discretize the brain into 3 mm³ voxels and, for each voxel, compute the source activity in the 7–35 Hz frequency band using an event-related beamformer [8]. For each participant in the MEG experiment we construct a 3D brain image by registering voxel locations from his/her corresponding MEG system coordinates to the standardized Talairach coordinates [17] using an affine transformation (SPM2; more detail in [1]). An average activation pattern can then be calculated across all the resulting images (one image per subject). However, if the brain activation patterns vary widely across images, the average activation pattern will not be informative enough to find brain areas that are consistently active across subjects. Therefore, we propose permutation analysis to find significantly active brain areas in the resulting images.

This chapter is organized as follows: in Section 6.1 we describe the experimental setup and methodology, as well as the post-processing steps that prepare the data for permutation analysis. Section 6.2 then presents the permutation analysis methodology and results.

6.1 Methods

Participants

Ten healthy adult participants (8 males, 2 females), age range 22–45 years, with no history of neurological dysfunction or injury took part in this study. The study was approved by both the York University and the Hospital for Sick Children Ethics Boards. All participants gave informed consent.

6.1.1 Experimental Paradigm

Figure 6.1 shows the experimental setup. Participants sat upright with their head under the dewar of the MEG machine in an electromagnetically shielded room (Figure 6.1(d)). In each trial, subjects performed a memory-guided reaching task while remaining fixated on a central white cross. After a 500 ms fixation period, a green or red dot (the target) was presented briefly for 200 ms, randomly either to the right or to the left of the centre cross (Figure 6.1(a,c)). We refer to the 500 ms interval before target onset as the baseline period. The centre cross dimmed after 1500 ms as an instruction for subjects to start pointing toward the target (pro) or to its mirror-opposite location (anti) while the eyes remained fixated. The direction of pointing depended on the colour of the target: green and red represented pro and anti trials respectively. Pointing movements were wrist-only, with three different wrist/forearm postures for the right hand (pronation, upright, and down) and one posture for the left hand (pronation) in separate blocks of trials (Figure 6.1(b)). Each pointing trial lasted approximately 3 seconds with a 500 ms inter-trial interval (ITI). With 100 trials for each condition (left hand versus right hand, pro versus anti, and three hand postures), this amounts to 1200 trials for each subject. Movement onset for each subject was measured using bipolar differential electromyography (EMG). For more detail on the experiment and the MEG data acquisition procedure please refer to [1].

6.1.2 Data Processing

Data were collected at a rate of 600 samples per second with a 150 Hz low-pass filter, using synthetic third-order gradiometer noise cancellation. After manual inspection for artifacts, in addition to eye movements, blinks, and premature hand movements, and removal of the corresponding trials from the analysis, on average 98 reaching trials per condition were retained for each subject for subsequent processing.

Brain source activity is estimated from the sensor data using event-related beamforming [8]. The idea of beamforming is depicted in Figure 6.2. The brain is discretized into voxels of volume 3 mm³ for each subject. The beamformer assumes a dipole at each voxel location. Using the dipole forward solution and the sensor covariance matrix, it then solves for a dipole direction that minimizes the power variance at the voxel. The dipole direction at each voxel can be regarded as a spatial filter weight that, when applied to the sensor data, reconstructs the instantaneous power at the corresponding voxel location while rejecting interfering power from adjacent voxels.
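For reference, a standard linearly constrained minimum-variance (LCMV) formulation of such a spatial filter computes, for the M × M sensor covariance matrix C and the M × 1 forward solution h_j of a unit dipole at voxel j,

w_j = C^{-1}h_j / (h_j^T C^{-1}h_j),    ŝ_j(t) = w_j^T m(t),

where m(t) is the M × 1 vector of sensor measurements at time t and ŝ_j(t) is the reconstructed source time series (the "virtual sensor"). This is the textbook form; the event-related variant used in [8] may differ in its weight normalization.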


[Figure 6.2 shows two panels: (A) "Calculation of source activity over time ('virtual sensors')": single-trial data (T trials, M channels, N samples) yield an M × M covariance matrix; together with the forward solution for dipole j(r, u), normalized beamformer weights W_n are computed for source j, giving an average virtual-sensor time series (pseudo-Z versus time). (B) "Imaging instantaneous source amplitude ('event-related' beamformer)": average virtual sensors are computed for a 3-dimensional grid of n voxels covering the entire brain volume, absolute amplitude is mapped at latency t as P(t) = W_n m(t), and the thresholded source image (pseudo-Z roughly 2.0 to 6.0) is superimposed on an MRI.]

Figure 6.2: Diagram of the event-related beamformer [8]: the data consist of T trials, each with M channels and N time samples. The covariance matrix of the data and the forward solution for a dipole at each location are given to the beamformer. Average source activity is then estimated at each voxel, and the dipole orientation is adjusted accordingly to maximize power at the corresponding voxel.

6.2 Permutation Analysis Results

Brain activity is studied in separate frequency bands. In neuroscience terminology, the 7–15 Hz, 15–35 Hz, 35–55 Hz, and 55–120 Hz bands are referred to as the alpha, beta, lower-gamma, and higher-gamma bands respectively [17]. In this study the frequency band of focus is 7–35 Hz, that is, the alpha and beta bands combined. Thus, the beamformer-estimated power time series at each voxel location is bandpass filtered to retain the power in the 7–35 Hz band.

Data at each voxel are aligned at the cue onset, when the target appears, and around the movement onset, when the subject starts the movement according to the EMG measurement. Moreover, data samples are transformed into Z-scores by subtracting the baseline mean from each sample and normalizing by the baseline standard deviation.
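A minimal sketch of this baseline normalization (in Python with NumPy; the function name and the indexing convention are illustrative):

    import numpy as np

    def baseline_zscore(power, baseline_idx):
        """Z-score a voxel's power time series against its baseline period.

        power        : 1-D array, beamformer power time series at one voxel
        baseline_idx : index of the baseline samples (here, the 500 ms
                       before target onset)
        """
        mu = power[baseline_idx].mean()
        sd = power[baseline_idx].std(ddof=1)
        return (power - mu) / sd

    # Example: at 600 samples/s, an epoch starting 0.5 s before cue onset
    # has its first 300 samples as baseline:
    # z = baseline_zscore(voxel_power, np.arange(300))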

In [1] a two-level adaptive cluster analysis was proposed to find the active brain areas in this experiment. It was shown that a large network of brain areas is involved in the reaching task, starting from visual areas in the occipital lobe, continuing to parietal areas that are presumably responsible for sensorimotor transformations, and ending with movement planning and execution in primary motor cortex.

Figure 6.3 shows the average brain z-score power activity from 0.45 seconds before movement onset to movement onset, averaged across all subjects, for right-hand movements toward left targets (the pro-left condition). Activity is shown in three planes: (a) transverse, (b) sagittal, and (c) coronal. As the figure shows, a network of brain areas is active, with positive activations (neuronal synchronization) in occipital visual areas (e.g. V3) and parietal areas, and negative activity (neuronal desynchronization) in the contralateral primary motor cortex (M1) that executes the movement.

Figure 6.4 shows the average brain z-score power around cue onset, from target appearance to 0.5 seconds after, averaged across all subjects, for movements toward left targets (pro-left). The figure shows contralateral desynchronization in primary visual areas, followed by activations in parietal areas such as the mid-posterior intraparietal sulcus (mIPS), angular gyrus (AG), inferior parietal lobule (IPL), and superior parietal lobule (SPL).

Figure 6.5 shows the average brain z-score power activity from 0.45 seconds before movement onset to movement onset, averaged across all subjects, for right-hand movements to the mirror-opposite direction of right targets (the anti-right condition).


[Figure: pseudo-Z map, colour scale from −2.66 to 3.37; labelled areas M1 and V3; condition: pro-left, right-hand movement]

Figure 6.3: Average brain activation for the pro condition/left target around movement onset (−0.45 to 0 seconds) in three planes: (a) transverse, (b) sagittal, and (c) coronal.

This is essentially the same movement as in the pro-left condition, except that the target appears on the right. The reason for including such anti conditions in the experiment was to dissociate the movement from the target stimulus, in order to investigate the parietal brain areas that are responsible for the sensorimotor transformation from retinal to shoulder coordinates needed to execute an accurate movement plan [4]. As can be seen from the figure, the activation pattern around movement onset is the same as that of the pro-left condition in Figure 6.3. The premotor ventral (PMV) area is also indicated in the figure.

Figure 6.6 shows the average brain z-score power around cue onset, from target appearance to 0.5 seconds after, averaged across all subjects, for anti movements/right targets (anti-right). The figure shows contralateral desynchronization in mIPS and AG.

So far we have looked at average activation patterns, which we denote by X̄ in the following argument.


[Figure: pseudo-Z map, colour scale from −2.19 to 0.91; labelled areas IPL, SPL, mIPS, and AG; condition: pro-left cue, right hand]

Figure 6.4: Average brain activation for the pro condition/left target around cue onset (0–0.5 seconds) in three planes: (a) transverse, (b) sagittal, and (c) coronal.

As is evident from the figures, the average activation patterns span wide pseudo-Z ranges. Thus, it is important to investigate which areas are significantly active across all subjects, that is, to set pseudo-Z score thresholds above which the average patterns are statistically significant. To this end, we formulate a hypothesis testing problem with the null hypothesis

H : X̄ = 0,
K : X̄ ≠ 0,    (6.1)

where H and K are the null and alternative hypotheses respectively, and X̄ is the average activation pattern.


[Figure: pseudo-Z map, colour scale from −2.90 to 3.05; labelled area PMV; condition: anti-right, right-hand movement]

Figure 6.5: Average brain activation for the anti condition/right target around movement onset (−0.45 to 0 seconds) in three planes: (a) transverse, (b) sagittal, and (c) coronal.

We limit the probability of a type I error to α = 0.05 to calculate 95% confidence intervals and p-values, and we solve the hypothesis testing problem at each brain voxel individually. We do not know the distributions of the data or of the sample mean, so we cannot solve the hypothesis testing problem analytically. Therefore, we estimate the distribution of the mean using the bootstrap procedure and use permutation to solve the problem.

The bootstrap procedure involves resampling with replacement from the data and calculating the mean of each resample in order to estimate the cumulative distribution function of the mean. As mentioned in Section 3.3, in permutation analysis the resampling is performed in a manner consistent with the null hypothesis. If we assume that the null hypothesis H is true, then the mean of the power signal at each voxel is zero, which is equal to the mean of its sign-inverted version.


[Figure: pseudo-Z map, colour scale from −2.23 to 1.18; labelled areas mIPS and AG; condition: anti-right cue, right hand]

Figure 6.6: Average brain activation for the anti condition/right target around cue onset (0–0.5 seconds) in three planes: (a) transverse, (b) sagittal, and (c) coronal.

Therefore, for each condition, say pro-left around movement onset, for every subject at each brain voxel we have three signals corresponding to the three wrist postures, together with their sign-inverted versions, which amounts to a resampling dataset of size 60 across all subjects at each brain voxel. Notice that the original sample mean is computed from the 30 signals without their inverted counterparts. Moreover, by including all the postures in the resampling dataset, as in the original mean, we ignore posture effects in the signals. The empirical cumulative distribution of the mean is estimated by taking 2048 resamples with replacement from the resampling dataset. If the sample mean is greater than the 95% bootstrap percentile, we reject the null hypothesis.
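A sketch of this voxel-wise sign-flip test (in Python with NumPy; the names are illustrative, and where the analysis above thresholds positive and negative activity separately, this sketch uses a single two-sided threshold for brevity):

    import numpy as np

    def signflip_test(means, n_resamples=2048, alpha=0.05, rng=None):
        """Sign-flip resampling test of H: E[mean] = 0 at one voxel.

        means : 1-D array of the 30 per-subject/per-posture mean z-scores
                at this voxel (10 subjects x 3 postures).
        Under H each signal and its sign-inverted copy are equally likely,
        so resampling draws from the 60-element augmented set.
        """
        rng = rng or np.random.default_rng()
        augmented = np.concatenate([means, -means])      # size-60 null set
        resamples = rng.choice(augmented, size=(n_resamples, len(means)))
        null_means = resamples.mean(axis=1)              # resampled means under H
        threshold = np.quantile(np.abs(null_means), 1.0 - alpha)
        observed = means.mean()
        return observed, threshold, abs(observed) > threshold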

Figure 6.7 shows the permutation analysis result for the pro-left condition around movement onset. The left and right panels show negative and positive activity respectively. The negative and positive thresholds for the 95% significance level are shown in the figure as well.


[Right panel: maximum value = 3.37, absolute threshold = 2.26. Left panel: maximum value = 2.66, absolute threshold = 2.23.]

Figure 6.7: Permutation analysis for the pro condition/left target around movement onset in three planes at the 95% significance level. Right panel: positive activity (synchronization); left panel: negative activity (desynchronization).

As is evident from the figure, the activity in M1 and V3 is statistically significant (cf. Figure 6.3).

Figure 6.8 shows the permutation analysis result for the pro-left condition around cue onset. The null hypothesis is not rejected for positive activation; the significant negative activity is shown in the figure with its corresponding threshold. As the figure shows, only the mean negative activation in Figure 6.4 is significant.

Figure 6.9 shows the permutation analysis result for the anti-right condition around movement onset. The left and right panels show negative and positive activity respectively. The negative and positive thresholds for the 95% significance level are shown in the figure as well.

Figure 6.10 shows the permutation analysis result for the anti-right condition around cue onset. The null hypothesis is not rejected for positive activation; the significant negative activity is shown in the figure with its corresponding threshold. As the figure shows, only the negative activation in mIPS is significant.


[Maximum value = 2.19, absolute threshold = 1.36.]

Figure 6.8: Permutation analysis for the pro condition/left target around cue onset in three planes at the 95% significance level. The null hypothesis is not rejected for positive activation; the negative activation significant at the 95% level is shown.

[Right panel: maximum value = 3.05, absolute threshold = 2.13. Left panel: maximum value = 2.90, absolute threshold = 2.09.]

Figure 6.9: Permutation analysis for the anti condition/right target around movement onset in three planes at the 95% significance level. Right panel: positive activity (synchronization); left panel: negative activity (desynchronization).


[Maximum value = 2.23, absolute threshold = 1.62.]

Figure 6.10: Permutation analysis for the anti condition/right target around cue onset in three planes at the 95% significance level. The null hypothesis is not rejected for positive activation; the negative activation significant at the 95% level is shown.


Chapter 7

Conclusion

In this report we have studied the theory of the bootstrap, a procedure that has recently come into wide use thanks to the computational advances that make its implementation practical.

We studied the mathematical theory of the bootstrap procedure and the construction of bootstrap confidence intervals. We covered hypothesis testing as well as the Neyman–Pearson lemma, an important theorem in hypothesis testing that provides a necessary and sufficient condition for the uniformly most powerful test in a wide range of hypothesis testing problems. Permutation analysis was also investigated as a method that applies the resampling idea of the bootstrap procedure to hypothesis testing problems.

We investigated the asymptotic properties of the bootstrap estimate of the sample mean distribution and showed that it is asymptotically Normal.

Next, we studied the order of accuracy of bootstrap estimates using the tools of Edgeworth and Cornish–Fisher expansions. The idea of resampling with replacement from the data to derive an estimate of the statistic under study might look like a stretch at first sight. Interestingly, the method not only converges to the true value of the statistic asymptotically, but also provides a more accurate estimate than the Normal approximation. Furthermore, the accuracy of confidence interval estimates is improved considerably.

Finally, we applied permutation analysis to a database of brain magnetic signals collected in an MEG reaching experiment in order to locate the brain areas involved in reaching. We showed that permutation analysis provides a statistically sound framework for deriving significance thresholds in brain images, especially when there is large variability in the data that makes the average activity difficult to read. Further investigation into the role of these areas is called for. One idea is to look at time-frequency responses at each voxel, with the frequency axis covering the 7–120 Hz range and the time axis covering the duration of the experiment. The time-frequency response in a brain region would help us study the activation time series at different frequencies around cue onset and movement onset. Furthermore, comparing such patterns between right and left regions would help us find the specific time points at which activity flips from positive to negative, and vice versa, between the two corresponding regions, which might point more specifically to the role of these regions during the experiment.


Bibliography

[1] H. Alikhanian, J. D. Crawford, J. F. Desouza, D. Cheyne, and G. Blohm. Adaptive cluster analysis approach for functional localization using magnetoencephalography. Front. Neurosci., 7:73, 2013.

[2] J. E. Angus. A note on the central limit theorem for the bootstrap mean. Communications in Statistics - Theory and Methods, 18(5):1979–1982, 1989.

[3] G. Barbati, C. Porcaro, F. Zappasodi, P. M. Rossini, and F. Tecchio. Optimization of an independent component analysis approach for artifact identification and removal in magnetoencephalographic signals. Clinical Neurophysiology, 115(5):1220–1232, 2004.

[4] S. M. Beurze, I. Toni, L. Pisella, and W. P. Medendorp. Reference frames for reach planning in human parietofrontal cortex. Journal of Neurophysiology, 104:1736–1745, 2010.

[5] R. N. Bhattacharya and J. K. Ghosh. On the validity of the formal Edgeworth expansion. Ann. Statist., 6(2):434–451, 1978.

[6] P. J. Bickel and D. A. Freedman. Some asymptotic theory for the bootstrap. Ann. Statist., 9(6):1196–1217, 1981.

[7] D. Cheyne, L. Bakhtazad, and W. Gaetz. Spatiotemporal mapping of cortical activity accompanying voluntary movements using an event-related beamforming approach. Hum. Brain Mapp., 27(3):213–229, 2006.

[8] D. Cheyne, A. C. Bostan, W. Gaetz, and E. W. Pang. Event-related beamforming: A robust method for presurgical functional mapping using MEG. Clinical Neurophysiology, 118(8):1691–1704, 2007.

[9] E. A. Cornish and R. A. Fisher. Moments and cumulants in the specification of distributions. Revue de l'Institut International de Statistique, 5:307–322, 1938.

[10] H. Cramér. On the composition of elementary errors. Skandinavisk Aktuarietidskrift, (1):141–180, 1928.

[11] B. Efron. Bootstrap methods: Another look at the jackknife. Ann. Statist., 7(1):1–26, 1979.

[12] J. R. Gilbert, L. R. Shapiro, and G. R. Barnes. A peak-clustering method for MEG group analysis to minimise artefacts due to smoothness. PLoS ONE, 7(9):e45084, 2012.

[13] P. Golland and B. Fischl. Permutation tests for classification: Towards statistical significance in image-based studies. Inf. Process. Med. Imaging, 18:330–341, 2003.

[14] P. Good. Permutation, Parametric, and Bootstrap Tests of Hypotheses. Springer, 2005.

[15] P. Hall. On the bootstrap and confidence intervals. Ann. Statist., 14(4):1431–1452, 1986.

[16] P. Hall. The Bootstrap and Edgeworth Expansion. Springer, 1995.

[17] P. Hansen, M. Kringelbach, and R. Salmelin. MEG: An Introduction to Methods. Oxford University Press, 2010.

[18] T. Hesterberg, D. S. Moore, S. Monaghan, A. Clipson, and R. Epstein. Bootstrap Methods and Permutation Tests. Introduction to the Practice of Statistics, Freeman, New York, 2005.

[19] J. Shao and D. Tu. The Jackknife and Bootstrap. Springer, 1995.

[20] E. L. Lehmann. Some principles of the theory of testing hypotheses. The Annals of Mathematical Statistics, 21(1):1–26, 1950.

[21] E. L. Lehmann and J. P. Romano. Testing Statistical Hypotheses. Springer, 2005.

[22] O. G. Lins, T. W. Picton, P. Berg, and M. Scherg. Ocular artifacts in EEG and event-related potentials I: Scalp topography. Brain Topography, 6(1):51–63, 1993.

[23] W. S. Merrifield, P. G. Simos, A. C. Papanicolaou, L. M. Philpott, and W. W. Sutherling. Hemispheric language dominance in magnetoencephalography: Sensitivity, specificity, and data reduction techniques. Epilepsy & Behavior, 10(1):120–128, 2007.

[24] D. N. Politis and J. P. Romano. Limit theorems for weakly dependent Hilbert space valued random variables with application to the stationary bootstrap. Statistica Sinica, 4:461–476, 1994.

[25] F. Rong and J. L. Contreras-Vidal. Magnetoencephalographic artifact identification and automatic removal based on independent component analysis and categorization approaches. Journal of Neuroscience Methods, 157(2):337–354, 2006.

[26] K. Sekihara, S. S. Nagarajan, D. Poeppel, A. Marantz, and Y. Miyashita. Reconstructing spatio-temporal activities of neural sources using an MEG vector beamformer technique. IEEE Trans. Biomed. Eng., 48(7):760–771, 2001.

[27] S. Taulu and J. Simola. Spatiotemporal signal space separation method for rejecting nearby interference in MEG measurements. Physics in Medicine and Biology, 51(7):1759–1768, 2006.