USES OF THE FAST FOURIER TRANSFORM (FFT)
IN EXACT STATISTICAL INFERENCE
Joseph Beyene
A thesis submitted in conformity with the requirements for the Degree of Doctor of Philosophy
Graduate Department of Community Health, Department of Public Health Sciences
University of Toronto
© Copyright by Joseph Beyene 2001
National Library of Canada / Bibliothèque nationale du Canada
Acquisitions and Bibliographic Services / Acquisitions et services bibliographiques
The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.
The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
Uses of the Fast Fourier Transform (FFT)
in Exact Statistical Inference
Joseph Beyene (Ph.D. 2001)
Graduate Department of Community Health
Department of Public Health Sciences
University of Toronto
Abstract
We present a unified characteristic function-based framework to compute exact statistical inference. The methodology is implemented using the fast Fourier transform (FFT) algorithm. Exact p-values for hypotheses of interest are obtained for generalized linear models (GLMs) commonly used in medical and other applied sciences. Examples are shown to illustrate the ease with which the FFT is used to recover exact probabilities from any known characteristic function.

The framework we developed allowed us to incorporate models based on non-standard underlying error distributions such as the zero-truncated binomial and Poisson distributions. We also have used the methodology to investigate the sensitivity of exact significance levels to misclassification errors and other model misspecifications. Potential sources of errors in using the FFT are discussed.
Acknowledgements
First and foremost, I would like to thank my supervisor, Professor David Andrews, for his encouragement and guidance throughout my studies. I have benefited greatly from his extraordinary talent and intuition about the field of statistics. I am indebted to Professor Paul Corey, who has served on my supervising committee, for providing me with all-round support over the many years I have known him. I would also like to acknowledge the assistance and valuable comments from other members of my supervising committee, Dr. David Tritchler and Dr. Michael Escobar, and my external examiner Professor Marcello Pagano. It is a great pleasure to acknowledge the support and encouragement I received from Drs. Shelley Bull, Mary Corey, Gerarda Darlington and David Tritchler. Danny Lopez and Vartouhi Jazmaji were always there for me when I needed their help.

I would like to thank the Department of Public Health Sciences, University of Toronto, for financial support. I am indebted to my parents, who have taught me the importance of education and believed in me all along. I am grateful to my in-laws for all their support. Last, but not least, I would like to thank my wife, Shafagh Fallah, and my children, Martha and Daniel, for their patience and support.
Contents
List of Tables
List of Figures
1 Introduction 1
1.1 Objective and scope of thesis . . . . . . . . . . . . . . . 1
1.2 Historical notes . . . . . . . . . . . . . . . 4
1.3 Literature review . . . . . . . . . . . . . . . 5
1.4 Summary . . . . . . . . . . . . . . . 7

2 The Fourier transform and its applications 8
2.1 Introduction . . . . . . . . . . . . . . . 8
2.2 The Fourier transform . . . . . . . . . . . . . . . 9
2.2.1 The discrete Fourier transform . . . . . . . . . . . . . . . 10
2.2.2 The fast Fourier transform (FFT) . . . . . . . . . . . . . . . 10
2.2.3 The inverse Fourier transform . . . . . . . . . . . . . . . 11
2.3 From characteristic functions to probability functions . . . . . . . . . . . . . . . 13
2.4 Illustrative examples . . . . . . . . . . . . . . . 15
2.4.1 Simple discrete random variable . . . . . . . . . . . . . . . 15
2.4.2 Generalization of Bernoulli trials . . . . . . . . . . . . . . . 19
2.4.3 Weighted sum of discrete random variables . . . . . . . . . . . . . . . 23
2.5 Summary . . . . . . . . . . . . . . . 30

3 Exact inference in generalized linear models 32
3.1 Introduction . . . . . . . . . . . . . . . 32
3.2 Model specification . . . . . . . . . . . . . . . 34
3.2.1 Some examples of GLMs . . . . . . . . . . . . . . . 35
3.3 Inference in a logistic regression model . . . . . . . . . . . . . . . 37
3.3.1 Comparison of two binomial proportions . . . . . . . . . . . . . . . 41
3.3.2 Dose-response experiments . . . . . . . . . . . . . . . 43
3.4 The Poisson regression model . . . . . . . . . . . . . . . 46
3.4.1 Testing H0 : β1 = 0 in a simple Poisson regression model . . . . . . . . . . . . . . . 47
3.4.2 Comparison of two Poisson rate parameters . . . . . . . . . . . . . . . 51
3.5 The general exponential family . . . . . . . . . . . . . . . 54
3.5.1 Joint and conditional distributions of sufficient statistics . . . . . . . . . . . . . . . 55
3.5.2 Characteristic function for members of the exponential family . . . . . . . . . . . . . . . 57
3.6 Extensions to truncated distributions . . . . . . . . . . . . . . . 59
3.6.1 Truncated Poisson distribution, Pt(λ) . . . . . . . . . . . . . . . 60
3.6.2 Numerical example . . . . . . . . . . . . . . . 63
3.6.3 Truncated binomial distribution, Bt(n, p) . . . . . . . . . . . . . . . 67
3.7 Analysis of error bounds . . . . . . . . . . . . . . . 68
3.7.1 Sources of errors . . . . . . . . . . . . . . . 68
3.7.2 Error in the geometric distribution . . . . . . . . . . . . . . . 71
3.7.3 Error in the Poisson distribution . . . . . . . . . . . . . . . 75
3.8 Summary . . . . . . . . . . . . . . . 76

4 Sensitivity analysis 79
4.1 Introduction . . . . . . . . . . . . . . . 79
4.2 Robustness . . . . . . . . . . . . . . . 80
4.3 Misclassification errors . . . . . . . . . . . . . . . 81
4.3.1 Numerical example . . . . . . . . . . . . . . . 83
4.4 Misspecification of link function . . . . . . . . . . . . . . . 85
4.4.1 Testing for H0 : β1 = 0 . . . . . . . . . . . . . . . 87
4.4.2 Testing for H0 : β1 = c (c ≠ 0) . . . . . . . . . . . . . . . 88
4.5 Summary . . . . . . . . . . . . . . . 91

5 Alternative approaches 93
5.1 Introduction . . . . . . . . . . . . . . . 93
5.2 Small sample asymptotics . . . . . . . . . . . . . . . 94
5.3 Large sample results . . . . . . . . . . . . . . . 96
5.4 Applications to simple logistic regression . . . . . . . . . . . . . . . 97
5.4.1 The likelihood ratio test . . . . . . . . . . . . . . . 98
5.4.2 The Wald test . . . . . . . . . . . . . . . 98
5.4.3 The double saddlepoint approximation . . . . . . . . . . . . . . . 99
5.5 Summary . . . . . . . . . . . . . . . 100

6 Summary and discussion 101

A FFT program for Beta-binomial 112
B FFT program for a weighted sum of random variables 114
C FFT program for binary regression 116
D FFT program for Poisson regression 119
E FFT program for zero-truncated Poisson regression model 122
F FFT program for error analysis in geometric distribution 125
List of Tables
2.1 A simple numerical example of a 4-point DFT . . . . . . . . . . . . . . . 18
3.1 Data for the comparison of two binomial proportions . . . . . . . . . . . . . . . 42
3.2 A dose-response example . . . . . . . . . . . . . . . 43
3.3 Effect of insulin on mice at different dose concentrations (Source: Finney, 1964) . . . . . . . . . . . . . . . 45
3.4 Data for a Poisson regression model . . . . . . . . . . . . . . . 47
3.5 Exact p-values in a Poisson regression model . . . . . . . . . . . . . . . 48
4.1 Sensitivity of exact p-values to misclassification error . . . . . . . . . . . . . . . 85
4.2 Sensitivity of exact p-values to link mis-specification . . . . . . . . . . . . . . . 91
List of Figures
2.1 Plot of binomial probabilities obtained using FFT as well as the dbinom function in S-Plus . . . . . . . . . . . . . . . 24
2.2 Probability mass function (pmf) of a weighted sum S_n = \sum_{k=1}^{n} kX_k, for n = 5, obtained using FFT . . . . . . . . . . . . . . . 31
3.1 Conditional probability mass function (pmf) of T1 | T0 = t0 for the Poisson regression model example obtained using FFT . . . . . . . . . . . . . . . 49
3.2 Conditional probability mass function (pmf) of T1 | T0 = t0 for the zero-truncated Poisson regression model example obtained using FFT . . . . . . . . . . . . . . . 66
3.3 Relative error in percentages of the FFT for calculating P(X = 0) for different input sizes N and parameter values λ = 3, λ = 5, and λ = 7 . . . . . . . . . . . . . . . 77
Chapter 1
Introduction
1.1 Objective and scope of thesis
Traditionally, statistical inference has relied heavily on large-sample approximations for sampling distributions of parameter estimators and test statistics. In particular, results developed for continuous data are used in situations where the underlying distributions are discrete, even when such approximations perform poorly with typical sample sizes. This could be of great concern in that incorrect and deceiving conclusions can be reached in studies where each sample point is so vital and expensive, as in biomedical research (Weerahandi, 1995).
Currently, exact inference is often carried out using specialized software restricted typically to specific models. The main purpose of this thesis is to explore the potential of the fast Fourier transform (FFT) for obtaining exact answers for some practical statistical inference problems covering a wider class of models. Strengths and limitations of the technique are studied.
We show that for common important statistical inference problems of small to moderate size for which a characteristic function is known explicitly, the FFT is a viable tool that allows recovery of exact probabilities. We developed methodology and implemented it with computer programmes that can be used within existing and widely available statistical software. Examples in this thesis typically took less than 5 seconds on a personal computer (Pentium III, 64 MB RAM) running S-Plus 2000 under the Windows operating system.
We considered a unified characteristic function based approach for exact inference in the class of generalized linear models and extended exact inference to distributions that can be expressed as weighted exponential family members. We show that a range of problems can be handled using a characteristic function based framework which may not easily be incorporated into existing branch-and-bound based approaches.
The organization of the thesis is as follows. The second section of this introductory chapter contains brief historical highlights of the evolution of exact methods in statistical inference. Section 3 presents a brief overview of the literature on existing methods for carrying out "exact" statistical inference.
Chapter 2 introduces the discrete Fourier transform and the fast Fourier transform algorithm. Simple examples are worked out in detail to illustrate the usefulness of the FFT in recovering probabilities from a known characteristic function of a given random variable.
In Chapter 3, the FFT is used to compute exact p-values for parameters of interest in two common generalized linear models (GLMs) - the logistic and the Poisson regression models - by generating the conditional distribution of a set of suitable sufficient statistics for the models. In particular, exact p-values are obtained for a hypothesis of the slope parameter treating the intercept as a nuisance parameter. It is shown that the characteristic functions in such models are easy to work with and are available in closed form. This characteristic function based approach allowed, among other things, extensions to models with truncated binomial or truncated Poisson error distributions. This chapter also examines different potential sources of errors in FFT based methods and demonstrates, with specific examples, how a truncation error can be controlled.
Chapter 4 explores the sensitivity of the exact results to some deviations from standard model specifications. The effect of contamination is investigated.

In Chapter 5 we present a brief account of alternative approaches to exact methods. Finally, a summary and general discussion is provided in Chapter 6. A suite of S-Plus programmes was written to implement the examples presented in Chapters 2 to 4. These programmes are provided in the Appendices.
1.2 Historical notes
The need for methods and techniques for dealing with 'small' sample problems has long been recognized. One such recognition is due to Student¹ (1908). He introduced the t-distribution (commonly known as Student's t) for continuous data, realizing that routine use of methods based on a 'large' sample assumption is inappropriate when data are limited. His contribution did not go uncriticized by contemporary advocates of large sample theory (Weerahandi, 1995). In the end, however, the t-distribution and hence the t-test not only survived the critiques, but also played a dominant role in statistical inference.
The other remarkable technique developed in recognition of small sample problems was Fisher's exact test for 2x2 contingency tables. Sir R.A. Fisher stated (Fisher, 1925):

... the traditional machinery of statistical processes is wholly unsuited to the needs of practical research. Not only does it take a cannon to shoot a sparrow, but it misses the sparrow! The elaborate mechanism built on the theory of infinitely large samples is not accurate enough for simple laboratory data. Only by systematically tackling small sample problems on their merits does it seem possible to apply accurate tests to practical data.

¹Student is the pseudonym of William Sealy Gosset, who did not use his real name due to a policy by his employer, the brewery Arthur Guinness Sons and Co., against work done for the firm being made public.
Fisher noted that the technique he proposed was computationally intensive, making it impractical for routine use at that time. That was true when there were no high-speed computers for the laborious computations required by the exact test. However, the advent of faster machines over the last couple of decades, coupled with the development of suitable algorithms, has led to renewed interest in pursuing research on exact methods.
1.3 Literature review
Over the last few decades there has been a dramatic surge in the number of published articles that show different ways of implementing exact methods for a variety of applications (March, 1972; Baker, 1977; Mehta and Patel, 1980, 1983; Hirji et al., 1987; Pagano and Tritchler, 1983; Tritchler, 1984). The different approaches that appeared in the literature fall in one of the following three categories:

- exhaustive enumeration (e.g. March, 1972)
- graph-theory based network algorithm (e.g. Hirji et al., 1987)
- recurrence relations and Fourier transform (e.g. Tritchler, 1984).
Two comprehensive survey papers on exact methods and available algorithms for computing exact p-values are due to Agresti (1992) and Verbeek and Kroonenberg (1985). A recent book titled 'Exact Statistical Methods for Data Analysis' (Weerahandi, 1995) shows various useful methods for exact inference with an emphasis on normal theory methods such as ANOVA and regression.
Weerahandi (1995) notes that the methods described in his book are exact in the sense that the tests and confidence intervals are based on exact probability statements rather than on asymptotic approximations. Inferences based on this approach can be made with any desired accuracy, provided that the assumed parametric model and/or other assumptions are correct.
Exact methods are also widely used in nonparametric settings. These methods provide exact p-values instead of approximate fixed-level tests. Most of the exact nonparametric techniques are based on the idea of conditional inference first introduced by Fisher (1925). The basic principle behind this approach is to eliminate nuisance parameters from the inference problem by conditioning on certain functions of the observable random variables.

In most cases, sufficient statistics are used as conditioning functions. Exact p-values are obtained as conditional probabilities based on extreme regions, and they serve as a measure of how well the data support or discredit the underlying hypothesis.
1.4 Summary

In some disciplines in which small samples are the norm rather than the exception, the traditional large-sample based methods of inference may not be valid. Statistical consultants are familiar with the emphasis most subject-matter researchers put on significance levels (p-values < 0.05!) to the extent that the dissemination of the findings might depend on them. While this practice is dangerous and not something statisticians endorse without question, it is useful however to provide significance values as accurately as possible. This thesis explores the practical uses of one method of accomplishing this task, an approach based on a Fourier transform method and implemented using the fast Fourier transform algorithm, in which exact probabilities are recovered from a known characteristic function.
Chapter 2
The Fourier transform and its
applications
2.1 Introduction
This chapter introduces the theory and applications of the Fourier transform (FT) and establishes its connection with a generating function that is familiar to statisticians, namely, the characteristic function of a random variable or vector. A popular algorithm known as the fast Fourier transform (FFT) is introduced, and simple examples are shown to illustrate the transformation from one domain to another, in this case, from the characteristic function domain to the domain of the probability mass function.
2.2 The Fourier transform
A number of textbooks have been written focusing on different aspects of the Fourier transform (see, for example, Bracewell, 1986). Brigham (1988) provides an integrated account of both the theory as well as the various applications of the fast Fourier transform.

The Fourier transform decomposes or separates a function or a waveform into sinusoids of different frequencies that sum to the original waveform. It identifies the different frequency sinusoids and their respective amplitudes (Brigham, 1988). It is simply a frequency-domain representation (frequency content) of a function and contains exactly the same information as the original waveform or signal. Fourier analysis, therefore, allows one to examine a given function from a different point of view.
The Fourier transform in one dimension is defined as

F(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-i\omega t}\, dt, \qquad (2.1)

where f(t) is a function to be decomposed into a sum of sinusoids. The argument t is traditionally used to represent the variation of a quantity over time (e.g., physiological signals). In general, a Fourier transform of a function is a complex quantity:

F(\omega) = R(\omega) + iS(\omega) = |F(\omega)|\, e^{i\theta(\omega)}, \qquad (2.2)

where R(\omega) is the real part, S(\omega) is the imaginary part, |F(\omega)| is the amplitude, and \theta(\omega) = \tan^{-1}[S(\omega)/R(\omega)] is the phase angle of the Fourier transform.
2.2.1 The discrete Fourier transform
The most important motivation for the development of the discrete Fourier transform (DFT) was the need for numerical computation of transformations using digital computers. Numerical integration of equation (2.1) yields the DFT, formally defined by

F(\omega_k) = \sum_{j=0}^{N-1} f(t_j)\, e^{-i\omega_k t_j}, \qquad k = 0, 1, \ldots, N-1. \qquad (2.3)

For problems that do not yield to a closed-form Fourier transform solution, the discrete Fourier transform offers a potential method of attack. The challenge in applying the DFT to practical problems was that direct computation required excessive machine time for large N, a time complexity proportional to N^2. Therefore a technique to reduce the computing time of the discrete Fourier transform was a necessity if the method was to be of practical importance.
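As an illustrative sketch (not from the thesis), the direct evaluation of equation (2.3) at the Fourier frequencies ω_k = 2πk/N can be written in a few lines; the implicit double loop over j and k is what makes the cost proportional to N²:

```python
import numpy as np

def dft_direct(f):
    """Direct O(N^2) evaluation of the DFT in equation (2.3),
    taking t_j = j and omega_k = 2*pi*k/N."""
    f = np.asarray(f, dtype=complex)
    N = len(f)
    j = np.arange(N)
    # N x N matrix of kernel values e^{-i omega_k t_j}
    kernel = np.exp(-2j * np.pi * np.outer(j, j) / N)
    return kernel @ f
```

Library routines such as NumPy's `np.fft.fft` use the same e^{-i...} sign convention, so the direct computation and the library FFT agree exactly on the same input.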
2.2.2 The fast Fourier transform (FFT)
In 1965, Cooley and Tukey formaiized the fast Fourier transform (FFT), an dg*
r i t h that reduces computing time of the DFT to a tirne proportional to NIog, N,
N being the number of input data points. For Iâfger N, the improvement in efiiciency
gained by using the FFT algorithm over the direct computation is remarhble.
Similarly, in a twudimeonal problem with dimensions N and hi, the total
nuniber of computations is proportional tu NM log, NM. The FFT is considerd
as a fundamental problem-solving tool in the educational, industrial, and military
sectors (Bracewell, 1986). It is ubiquitous and its widespread usage is evidenced
by the wide variety of apparently unrelated application areas including biomedical
engineering, imaghg, analysis of stock market data, and speech signal-processing.
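The following is a minimal radix-2 sketch of the Cooley-Tukey idea (my illustration of the recursion, not the thesis's implementation and not an optimized FFT): a length-N DFT is split into two length-N/2 DFTs of the even- and odd-indexed samples, which is what yields the N log₂ N cost.

```python
import numpy as np

def fft_radix2(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of 2."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    if N == 1:
        return x
    even = fft_radix2(x[0::2])   # DFT of even-indexed samples
    odd = fft_radix2(x[1::2])    # DFT of odd-indexed samples
    # Twiddle factors combine the two half-size transforms
    w = np.exp(-2j * np.pi * np.arange(N // 2) / N)
    return np.concatenate([even + w * odd, even - w * odd])
```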
2.2.3 The inverse Fourier transform
The inverse Fourier transform (IFT) associated with the Fourier transform given in equation (2.1) is defined by

f(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} F(\omega)\, e^{i\omega t}\, d\omega. \qquad (2.4)

Given any bounded Nth-order sequence f(k) (a finite sequence of N terms), the DFT pair is defined as

F(m) = \sum_{k=0}^{N-1} f(k)\, e^{-i2\pi mk/N}, \qquad f(k) = \frac{1}{N} \sum_{m=0}^{N-1} F(m)\, e^{i2\pi mk/N}.

The first equation is called the (direct) DFT, and the second the inverse DFT (IDFT). Generally, both sequences {f(k)} and {F(m)} will be complex, i.e., of the form z = x + iy with real part x and imaginary part y.
Operations performed in one domain have corresponding operations in the other. For instance, the convolution operation in the time domain becomes a multiplication operation in the frequency domain, that is,

f(t) * g(t) \longleftrightarrow F(\omega)\, G(\omega);

in other words, a Fourier transform maps a convolution in the time domain to multiplication in the transform domain.

These facts allow us to move between domains so that operations can be performed where they are easiest or most advantageous. The discrete-time transforms are used to analyze and to process discrete-time signals. Discrete-time signals either exist in their own right - such as daily closing stock market prices - or, more commonly, are obtained by sampling continuous-time systems.

The Fourier series and the discrete-time Fourier transforms (DTFTs) are actually inverses of each other. The sampling theorem shows how densely to sample continuous-time signals so that there is no loss of information.
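To make the convolution property concrete, here is a small sketch (my illustration, not the thesis's code): the pmf of the sum of two independent fair dice is the convolution of their pmfs, obtained by multiplying transforms and inverting. The sequences are zero-padded so that the circular convolution computed by the DFT coincides with the ordinary (linear) one.

```python
import numpy as np

# pmf of one fair die, stored on the support {0, 1, ..., 6}
die = np.array([0.0] + [1/6] * 6)

# Pad to N >= 13 points so circular convolution equals linear convolution
N = 16
F = np.fft.fft(die, N)             # transform of the pmf
pmf_sum = np.fft.ifft(F * F).real  # multiply in frequency domain, invert

# pmf_sum[k] is now P(die1 + die2 = k) for k = 2, ..., 12
```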
2.3 From characteristic functions to probability functions
There exists a one-to-one correspondence between frequency functions and their Fourier transforms. In statistical applications involving random phenomena, the Fourier transform is essentially the characteristic function corresponding to the distribution of the random variable (or vector). Hence distributions can be represented equivalently by either probability distribution or characteristic functions.

In practice, the frequency function is the usual representation, since it is the more intuitive function and the empirical distribution function often serves as the basis for statistical inference. However, the characteristic function is the canonical representation of some useful distributions whose frequency functions cannot be expressed in closed form.
To fix ideas, suppose X is an integer-valued random variable with support in the set {0, 1, ..., N-1}. Then the characteristic function of X is given by

\phi_X(\omega) = E\left[e^{i\omega X}\right] = \sum_{k=0}^{N-1} p_k\, e^{i\omega k}, \qquad (2.5)

where p_k = P(X = k) is the probability mass function (pmf).

Using Euler's relation e^{i\theta} = \cos\theta + i\sin\theta, we can easily verify that \phi_X(\omega) has a period of 2\pi, that is, it satisfies \phi(\omega) = \phi(\omega + 2\pi) for all \omega, since e^{i(\omega + 2\pi)k} = e^{i\omega k}\, e^{i2\pi k} = e^{i\omega k} for every integer k. Also, the characteristic function is real-valued if and only if the corresponding distribution function is symmetric around the origin. An example with a real-valued characteristic function is a random variable that has a standard normal distribution.
To see the connection with the DFT, evaluate the characteristic function at N equally spaced values in the interval [0, 2\pi):

\phi_m = \phi_X\!\left(\frac{2\pi m}{N}\right) = \sum_{k=0}^{N-1} p_k\, e^{i2\pi mk/N}, \qquad m = 0, 1, \ldots, N-1. \qquad (2.6)

Here \phi and p form a Fourier transform pair. The above equation defines the DFT of the sequence of probabilities p_0, \ldots, p_{N-1}. As mentioned earlier, the \phi_m's are in general complex numbers. Also note that extension of the range of m outside the range {0, 1, ..., N-1} will result in a periodic sequence consisting of a repetition of the sequence \phi_0, \ldots, \phi_{N-1}.

Our interest is in recovering the sequence of probabilities from the corresponding sequence of characteristic function values. In other words, we seek to obtain the sequence of p_k's from the sequence of \phi_m's. This can be accomplished by using the inverse DFT operation, which is defined by

p_k = \frac{1}{N} \sum_{m=0}^{N-1} \phi_m\, e^{-i2\pi mk/N}, \qquad k = 0, 1, \ldots, N-1. \qquad (2.7)
2.4 Illustrative examples
There are many situations in which the characteristic function of a random variable is easily computable but the inverse transform is not easily expressed in a closed form. In such cases, the discrete Fourier transform approach can be applied and implemented using the FFT algorithm to obtain the probability distribution of the random variable. In this section, simple examples are worked out in detail to illustrate the inversion process and solidify some of the important concepts.
2.4.1 Simple discrete random variable

For illustration purposes, we will first consider a simple example for which both the probability mass and characteristic functions are known. Suppose the probability distribution of a discrete random variable X is given by

P(X = k) = \begin{cases} 1/4, & \text{if } k = 0 \text{ or } k = 1 \\ 3/8, & \text{if } k = 2 \\ 1/8, & \text{if } k = 3. \end{cases}

Following the notation introduced in the previous section, we have p_0 = p_1 = 1/4, p_2 = 3/8, and p_3 = 1/8. Using equation (2.5), the characteristic function of X is

\phi_X(\omega) = \tfrac{1}{4} + \tfrac{1}{4}\, e^{i\omega} + \tfrac{3}{8}\, e^{i2\omega} + \tfrac{1}{8}\, e^{i3\omega}.
Now let us obtain the DFT for N = 4 by evaluating the characteristic function at the Fourier frequencies \omega_m = 2\pi m/4, for m = 0, 1, 2, 3. Using equation (2.6) we obtain

\phi_0 = 1, \qquad \phi_1 = -\tfrac{1}{8} + \tfrac{1}{8} i, \qquad \phi_2 = \tfrac{1}{4}, \qquad \phi_3 = -\tfrac{1}{8} - \tfrac{1}{8} i.

Assume we only knew the above coefficients and are interested in recovering the underlying probability mass function. From equation (2.7) we have

p_0 = \tfrac{1}{4}\left(\phi_0 + \phi_1 + \phi_2 + \phi_3\right) = \tfrac{1}{4}\left(1 - \tfrac{1}{8} + \tfrac{1}{8} i + \tfrac{1}{4} - \tfrac{1}{8} - \tfrac{1}{8} i\right) = \tfrac{1}{4}.

Similar substitutions of the \phi_m's can be used to recover p_1, p_2 and p_3.
Before we show how the FFT can be used in this very simple example, it is worth mentioning that different software packages may have slightly different implementations of the FFT algorithm. Thus the user should make sure that the output from a given FFT implementation gives the expected results for the applications under study. In this thesis, the S-Plus function fft (S-Plus 2000 for Windows) has been used throughout.

Going back to our example, first we store the \phi_m's in a vector, say cc. Each element of this vector is expressed as a complex number z = x + iy, where x and y are the real and imaginary parts, respectively. Then a normalized FFT is used on cc, which produces the associated probability values exactly, as can be seen in the S-Plus output below. In the output, [1] represents the position of the element that follows, in this case the first element of the vector. Both the characteristic and probability function vectors have 4 elements. Note also that the imaginary parts of the elements in the probability vector are all zero. For some of the examples throughout the thesis it might be instructive to present annotated output from the S-Plus functions given in the Appendices directly. This will help us see the details of the input/output structure.
The following table summarizes the results of the 4-point DFT-IDFT pair for the above example. From the results shown in the table we observe that in general the characteristic function values are complex valued, and the probability mass function is recovered exactly by applying a scaled FFT on the input sequence of the characteristic function values.

Table 2.1: A simple numerical example of a 4-point DFT

m    \phi_m              k    p_k
0    1                   0    1/4
1    -1/8 + (1/8)i       1    1/4
2    1/4                 2    3/8
3    -1/8 - (1/8)i       3    1/8
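The same 4-point example can be sketched in Python as a stand-in for the S-Plus fft call (note the sign convention: NumPy's `np.fft.fft` uses the e^{-i...} kernel, so the scaled forward FFT plays the role of the inverse transform in equation (2.7), exactly the package-dependent detail cautioned about above):

```python
import numpy as np

pmf = np.array([1/4, 1/4, 3/8, 1/8])
N = len(pmf)

# Characteristic function values at the Fourier frequencies (equation 2.6):
# phi_m = sum_k p_k e^{+i 2 pi m k / N}, i.e. N times NumPy's inverse FFT
cc = N * np.fft.ifft(pmf)
# cc = [1, -1/8 + i/8, 1/4, -1/8 - i/8]

# The normalized (scaled) forward FFT recovers the pmf exactly (equation 2.7)
recovered = np.fft.fft(cc).real / N
```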
2.4.2 Generalization of Bernoulli trials
Suppose the success probability in a sequence of Bernoulli trials is allowed to vary from trial to trial, keeping the trials independent. One approach of dealing with such instances is to use compound (or hierarchical) models. It is often not too difficult to derive characteristic functions analytically for cases involving compound (or mixture) distributions. In this example, we consider a compound model which relates Bernoulli/binomial random variables with a beta distribution. We give theoretical details of two separate cases and show an implementation for one of them, and we point out the approach to be taken to implementing the other one.
Case 1: Compound of Bernoulli with beta distribution

First, let

X_j \mid P_j = p_j \sim \text{Bernoulli}(p_j), \qquad j = 1, \ldots, n,

with the P_j following a beta distribution and the trials independent. Denote the total number of successes by Y. We want to obtain the probability distribution of Y by first computing its characteristic function.

The moment generating function of Y is given by

m_Y(t) = \prod_{j=1}^{n} m_{X_j}(t),

where

m_{X_j}(t) = E\left[E\left(e^{tX_j} \mid P_j = p_j\right)\right] = E\left[p_j e^t + 1 - p_j\right] = E\left[p_j(e^t - 1) + 1\right] = (e^t - 1)E(P_j) + 1.

Hence, the characteristic function of Y is given by

\phi_Y(\omega) = \left[1 - \bar{p} + \bar{p}\, e^{i\omega}\right]^n, \qquad \bar{p} = E(P_j),

where the mean of P is assumed not to depend on j (the homogeneous case). We note that this characteristic function corresponds to a binomial random variable with n trials and success probability \bar{p} (beta is a conjugate prior for the binomial).
Case 2: Compound of binomial with beta distribution

Let

X_j \mid P_j \sim \text{Bin}(m, P_j), \qquad j = 1, \ldots, n.

As before, we have

m_Y(t) = \prod_{j=1}^{n} m_{X_j}(t),

where

m_{X_j}(t) = E\left[\left(P_j(e^t - 1) + 1\right)^m\right].

Hence, the characteristic function of Y is given by

\phi_Y(\omega) = \left\{E\left[\left(P(e^{i\omega} - 1) + 1\right)^m\right]\right\}^n.

In this case, we need to know the moments of P (up to order m), and as before we have assumed that the moments do not depend on j. When m = 1, this result reduces to Case 1 above.
The S-Plus program in Appendix A was used to compute the probability distribution of Y as described in Case 1 above. As an example, consider the case n = 20 and p̄ = 1/3. The S-Plus output below shows the support (supp), probabilities obtained using FFT (prob), probabilities obtained using the binomial density dbinom function in S-Plus (bin), and the difference between the latter two (diff). Inspection of each probability mass shows that the recovery was indeed exact. These probabilities are also shown in Figure 2.1.
> betabin(20,1/3)
      supp          prob           bin           diff
 [1,]    0 3.007287e-004 3.007287e-004  7.291260e-017
 [2,]    1 3.007287e-003 3.007287e-003 -9.974660e-018
 [3,]    2 1.428461e-002 1.428461e-002 -1.335737e-016
 [4,]    3 4.285383e-002 4.285383e-002 -2.220446e-016
 [5,]    4 9.106440e-002 9.106440e-002  1.804112e-016
 [6,]    5 1.457030e-001 1.457030e-001 -2.220446e-016
 [7,]    6 1.821288e-001 1.821288e-001 -3.053113e-016
 [8,]    7 1.821288e-001 1.821288e-001 -6.551115e-016
 [9,]    8 1.479796e-001 1.479796e-001 -4.996004e-016
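A Python sketch of the compound computation (my stand-in with a hypothetical helper, not the Appendix A program): for Case 2 the characteristic function φ_Y(ω) = {E[(P(e^{iω} - 1) + 1)^m]}^n needs the beta moments of P up to order m; setting m = 1 reduces to Case 1, so the Binomial(n, p̄) comparison above serves as a check.

```python
import numpy as np
from math import comb

def compound_binbeta_pmf(n, m, a, b):
    """pmf of Y = X_1 + ... + X_n, with X_j | P_j ~ Bin(m, P_j) and
    P_j ~ Beta(a, b), recovered from the characteristic function by a
    scaled FFT on the support {0, ..., n*m} (Case 2; m = 1 gives Case 1)."""
    N = n * m + 1
    omega = 2 * np.pi * np.arange(N) / N
    # Beta moments E[P^r] = prod_{s=0}^{r-1} (a + s) / (a + b + s)
    mom = np.concatenate(
        [[1.0], np.cumprod([(a + s) / (a + b + s) for s in range(m)])]
    )
    z = np.exp(1j * omega) - 1.0
    # Binomial expansion: E[(P z + 1)^m] = sum_r C(m, r) z^r E[P^r]
    phi_x = sum(comb(m, r) * z**r * mom[r] for r in range(m + 1))
    return np.fft.fft(phi_x ** n).real / N
```

With n = 20, m = 1 and a beta mean of 1/3 (say a = 1, b = 2), the recovered probabilities should match the dbinom column of the output above.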
2.4.3 Weighted sum of discrete random variables
A statistic involving a weighted sum of random variables is frequently used in statistical inference. When independence between the random variables is a reasonable assumption, weighted sums have several desirable properties. For instance, calculations of distributional characteristics (e.g., first and second order moments, various generating functions) will be greatly simplified if independence of the variables in the sum may be assumed. The key statistical functionals that will be explored in detail in the next chapter are linear combinations of independent random variables.
Figure 2.1: Plot of binomial probabilities obtained using FFT as well as the dbinom function in S-Plus
Here we demonstrate the uses of the FFT based approach in a relatively small but practical problem.

Let X_1, X_2, \ldots, X_n be i.i.d. discrete random variables with probability mass function (pmf)

f(x) = x/21, \qquad x = 1, 2, \ldots, 6.

Suppose we are interested in the distribution of the weighted sum S_n = \sum_{k=1}^{n} kX_k. It can easily be shown that the necessary condition for the Lindeberg-Feller central limit theorem for weighted sums holds true, and accordingly, it follows that S_n is asymptotically normal; i.e., for sufficiently large n, S_n is approximately normally distributed.
For small n, however, this asymptotic result may not be satisfactory and the need arises for the evaluation of its exact distribution. Although our aim is to illustrate the process of inverting a characteristic function in order to recover a probability function, it would be interesting to discuss at this point an alternative approach for computing exact probabilities in this particular problem. This approach makes use of yet another type of generating function known as the probability generating function (PGF). The PGF, P, of a non-negative integer-valued random variable X is defined by
P(t) = E[t^X] = Σ_{k≥0} P(X = k) t^k.
It is known that P(X=k) can be recovered from P^(k)(0), where P^(k)(t) is the k-th derivative of the PGF. To be specific,
P(X = k) = P^(k)(0) / k!.     (2.9)
Suppose n = 5. Sn as defined above is a non-negative integer-valued random variable with support Supp(Sn) = {15, 16, ..., 90}. Hence we can apply the result in equation (2.9) to recover exact probabilities. The PGF of S is given by
P_S(t) = Π_{k=1}^{5} P_X(t^k),
where P_X is the PGF of the common pmf of the X's,
P_X(t) = Σ_{x=1}^{6} (x/21) t^x.
Note that P_S(t) is a polynomial, and it is easily seen that the exact probability corresponding to support point k is simply the coefficient of the term t^k in the polynomial expansion defining the PGF. Suppose we are interested in computing exact tail probabilities such as P(S < 16) or P(S > 88); we can do so by adding the corresponding polynomial coefficients. Mathematical packages such as Maple or Mathematica can be used to facilitate the computation.
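In place of a computer algebra system, the coefficient extraction can also be sketched with exact integer arithmetic in Python (my own illustration, not part of the thesis): each component kXk contributes a numerator polynomial with coefficient x at exponent kx, and the product of the five components over the common denominator 21^5 is the PGF.

```python
# Exact PGF coefficients of S_5 = sum_{k=1}^{5} k*X_k, where P(X = x) = x/21, x = 1..6.
# Integer polynomial convolution; the common denominator is 21**5 = 4084101.
n = 5
coeffs = [1]                        # polynomial "1" before any component is folded in
for k in range(1, n + 1):
    comp = [0] * (6 * k + 1)        # numerator polynomial of k*X_k: coeff x at power k*x
    for x in range(1, 7):
        comp[k * x] = x
    new = [0] * (len(coeffs) + len(comp) - 1)
    for i, a in enumerate(coeffs):  # polynomial multiplication (convolution)
        if a:
            for j, b in enumerate(comp):
                if b:
                    new[i + j] += a * b
    coeffs = new

denom = 21 ** n                     # 4084101
print(coeffs[90], denom)            # -> 7776 4084101, i.e. P(S=90) = 32/16807
print(coeffs[15])                   # -> 1, i.e. P(S=15) = 1/4084101
print(sum(coeffs) == denom)         # -> True: total probability mass equals 1
```

Intermediate coefficients, such as coeffs[50], can be compared directly against the Maple expansion.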
For this example, we used Maple (Maple V, Waterloo) and the entire expansion of the polynomial PGF is shown on the following page.
[Maple output: the polynomial expansion of the PGF, with terms running from (1/4084101) t^15 up to (32/16807) t^90.]
We can see, for example, that P(S = 90) = 32/16807 (coefficient corresponding to t^90) and P(S = 15) = 1/4084101 (coefficient corresponding to t^15). These extreme cases are simple to check by direct calculation. For example, S=90 can occur only when each of the X's takes on the value 6, and thus,
P(S = 90) = (6/21)^5 = 7776/4084101 = 32/16807.
Similarly, S=15 can happen only when each of the X's takes on the value 1, and, once again, direct calculation can be carried out easily. Cumulative probabilities, such as the probability that the random variable S takes on values at most 87, are obtained by summing the coefficients of t^15 through t^87, from which tail-area probabilities can be computed.
Now let us follow this example further, this time using the Fourier transform approach. Since the characteristic function of the Xi's is given by φ_X(t) = E(e^{itX}) = Σ_{x=1}^{6} (x/21) e^{itx}, the characteristic function of S can easily be derived as φ_S(t) = E(e^{itS}) = Π_{k=1}^{n} φ_X(kt). The recovery of the pmf of S from its characteristic function can be performed as follows (Appendix B gives the S-Plus code that was used in this example). We evaluate the characteristic function at the support points 0 to 90.
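As a sketch of that evaluation-and-inversion step (my own pure-Python illustration, not the Appendix B code), one can sample φ_S at the N = 91 Fourier frequencies 2πj/N and apply the inverse DFT, which is exact here because the support of S lies within {0, ..., 90}:

```python
import cmath

def phi_X(t):
    """Characteristic function of X with pmf f(x) = x/21, x = 1, ..., 6."""
    return sum(x / 21 * cmath.exp(1j * t * x) for x in range(1, 7))

def phi_S(t, n=5):
    """Characteristic function of S_n = sum_k k*X_k: the product of phi_X(k t)."""
    prod = 1 + 0j
    for k in range(1, n + 1):
        prod *= phi_X(k * t)
    return prod

N = 91                      # support of S_5 is {15, ..., 90}, inside {0, ..., 90}
phis = [phi_S(2 * cmath.pi * j / N) for j in range(N)]
pmf = [sum(phis[j] * cmath.exp(-2j * cmath.pi * j * s / N) for j in range(N)).real / N
       for s in range(N)]

print(round(pmf[90], 9))              # 0.001903969, i.e. 32/16807
print(round(pmf[15] * 4084101, 6))    # 1.0, i.e. P(S = 15) = 1/4084101
```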
Note that, as expected, the probabilities are 0 when S takes on values in the range 14 and under. We can also verify the other results against those obtained from the PGF approach. For example, P(S = 90) = 32/16807 = 0.001903969 (this value is shown in the S-Plus output above as the 91st element, since vector indexing starts from 1 in this package). Similarly, the probability that S equals 50 is given by P(S = 50) = 0.01405009, which was earlier shown in its fractional form on the output from Maple as 57382/4084101.
Figure 2.2 shows the pmf of Sn for n=5 as obtained using the FFT approach. A slight skewness to the left can be noticed from this graph.
To give an idea of the improvement in computational complexity achieved by using the FFT in the above example, let us consider the input size (without any padding) N=75. A direct inversion of the DFT would require 75^2 = 5625 multiplications of complex numbers, whereas the number of multiplications required by the FFT will be O(75 log2(75)) ≈ 467, a reduction by a factor of about 12.
This chapter introduced the theory and applications of the Fourier transform (FT) as well as the connection with the characteristic function of a given random variable. The fast Fourier transform algorithm was used to recover exact probabilities in three simple but practical applications.
Figure 2.2: Probability mass function (pmf) of a weighted sum Sn = Σ_{k=1}^{n} kXk, for n=5, obtained using FFT.
Chapter 3
Exact inference in generalized
linear models
3.1 Introduction
Since its introduction by Nelder and Wedderburn in 1972 as a unifying approach to the regression analysis of both continuous and categorical outcome variables, the generalized linear model (GLM) framework has been used successfully in many areas of application. This class of models extends the traditional linear models by allowing non-normal response distributions and suitable transformations to linearity. In medical applications, the logistic regression and the Poisson regression models are two of the most commonly used GLMs. A comprehensive and classic reference describing the theory and applications of GLMs is due to McCullagh and Nelder (1989).
Inferential procedures for GLMs are often based on traditional large-sample theory. However, one can cite several instances in which the need to deal with small-sample situations arises:
• very expensive experiments or 'costly' sacrifices;
• useful information may be available at an early stage of an experiment when only few patients have been recruited;
• in large-scale multi-center clinical trials, a small part of the data may represent the contribution of one of the smaller centers.
In this chapter we show that the FFT approach can be used to conduct exact inference in the most commonly used GLMs. No fancy software is required. This is facilitated by the ease with which characteristic functions of sufficient statistics are computed. In particular, we present the two most commonly used GLMs, the logistic and Poisson regression models, and demonstrate with examples the feasibility of computing exact p-values for hypotheses of interest. We extend our formulation of the exponential family to include weighted exponential families leading, among others, to interesting special cases of the logistic and Poisson models. To get insight into the errors that may result from using the FFT, we investigate the consequences of truncation errors for the Geometric and Poisson distributions.
3.2 Model specification
A generalized linear model is characterized by the following three-part specification:
1) Random Component: each component of Y has a distribution in the exponential family, taking the form
f(y; θ, φ) = exp{ [yθ − b(θ)] / a(φ) + c(y, φ) }
for some specific functions a(.), b(.), and c(.). If φ is known, this is a one-parameter exponential-family model with canonical parameter θ (McCullagh and Nelder, 1989);
2) Systematic Component: a linear combination of the covariates, sometimes known as a linear predictor, given by
η = Xβ;
3) Link function: the link between the random and systematic components is given by
η_i = g(μ_i),
where g(.) is any monotonic differentiable function known as the link function. Special link functions known as canonical links occur when θ_i = η_i.
3.2.1 Some examples of GLMs
1. Linear regression
The standard linear model satisfies the above GLM formulation with a Normal distribution for the random component and the identity function for the link component. The normal distribution belongs to the exponential family.
2. Poisson regression
Assume that Y1, ..., Yn are independent Poisson random variables with means μi, where μi > 0 (i = 1, ..., n). It can be shown that the Poisson distribution belongs to the exponential family. Using a logarithmic link function, the standard Poisson regression model is written as
ln(μi) = x_i'β.
The link function ln(.) is the canonical link and maps the interval [0, ∞) onto (−∞, ∞). The identity link may not be appropriate, in part, because η may be negative while μ must not be.
3. Binary regression
Suppose Y1, ..., Yn are independent Bernoulli random variables with means pi, where 0 ≤ pi ≤ 1 (i = 1, ..., n). A potential link function in this case should map the interval [0,1] onto (−∞, ∞).
The following three link functions are used commonly with the binomial distribution:
1. Logit: g(p) = log[p/(1 − p)];
2. Probit: g(p) = Φ^{-1}(p), where Φ is the standard normal cumulative distribution function;
3. Complementary log-log: g(p) = log[−log(1 − p)].
The logit link is the canonical link for the binomial distribution and is by far the most popular. It is widely used in the health sciences, in particular in epidemiological studies, because of the resulting odds ratio interpretation of the regression parameter estimates. In addition to its useful interpretation, the logit link is also mathematically simpler. Sensitivity of exact tail probabilities to misspecification of a link function is explored later in Chapter 4.
Note that, as in the Poisson regression case, the identity link is less useful with binary regression, partly due to possible predicted probabilities falling below zero or above one.
3.3 Inference in a logistic regression model
Assume that Y = (Y1, ..., Yn)' is a vector of independent binomial random variables, that is, Yi ~ Bin(ni, pi).
The linear logistic regression model is given by
logit(p) = Xβ,
where logit(p) is an n-vector of log-odds, log[pi/(1 − pi)], X is a known n × p full-column-rank matrix of explanatory variables whose components can, in general, be either quantitative or qualitative, and β is an unknown p-vector of regression parameters.
Using Neyman's factorization theorem, it can easily be shown that T = X'y is sufficient for β under the model specified above.
In some situations, it might be appropriate to work with
logit(p) = Xβ + Δw,
where w is an n-vector of constants. In this case, the primary parameter of interest is often the scalar Δ, and the common test of hypothesis is H0 : Δ = 0.
The sufficient statistics are T and S = w'y. Conditional inference for testing the hypothesis Δ = 0 can be carried out by working with a conditional reference distribution. This reference distribution is the conditional distribution of S given T = t, where t is the observed value of T. This conditional distribution does not depend on the regression parameter vector β, which in this case may be treated as a nuisance.
In epidemiology and many other medical applications, it is often convenient to think in terms of the odds of "success", p/(1 − p), rather than the "success" probability p.
In particular, let Y1 ~ Bin(n1, p1) and Y2 ~ Bin(n2, p2) be two independent binomial random variables and ψ = [p1/(1 − p1)] / [p2/(1 − p2)] be the odds ratio, i.e., the ratio of the odds of "success" for Y1 to the odds of "success" for Y2.
The conditional distribution of Y1 given Y1 + Y2 = t is given by
P(Y1 = u | Y1 + Y2 = t) = C(n1, u) C(n2, t − u) ψ^u / Σ_v C(n1, v) C(n2, t − v) ψ^v,
for
max(0, t − n2) ≤ u ≤ min(n1, t),
where the sum in the denominator runs over the same range. This conditional distribution is the non-central hypergeometric distribution. When ψ = 1, equivalently p1 = p2, we obtain an important special case, the (central) hypergeometric distribution with probability mass function
P(Y1 = u | Y1 + Y2 = t) = C(n1, u) C(n2, t − u) / C(n1 + n2, t).
The null hypothesis of the equality of the two binomial parameters p1 and p2, or equivalently the hypothesis that the odds ratio is unity, can be cast as an inferential problem in the class of GLMs in the following way. Let x be a binary variable indicating group membership that can be written (without any loss of generality) as
xi = 1, if i is in Group 1;
xi = 0, otherwise,
and consider the model
logit(pi) = β0 + β1 xi.
Then the null hypothesis of interest is equivalent to testing the hypothesis H0 : β1 = 0. A p-value for a two-sided test is defined in terms of
p−(.) = Pr(. ≤ obs),  p+(.) = Pr(. ≥ obs),
the significance levels corresponding to Uniformly Most Powerful Unbiased (UMPU) lower and upper one-sided tests of H0 (Cox, 1970).
In order to use the FFT approach, we first need to derive the joint characteristic function of the pair of sufficient statistics under the simple logistic regression model shown above. In this case, the vector T = (T0, T1) = (Σ Yi, Σ xi Yi) is sufficient for β = (β0, β1). The joint moment generating function is defined as
M(t0, t1) = E[exp(t0 T0 + t1 T1)] = Π_{i=1}^{n} (1 − pi + pi e^{t0 + t1 xi})^{ni},
where Yi ~ Bin(ni, pi). The joint characteristic function is thus defined as
φ(t0, t1) = M(it0, it1),
where the i preceding t0 and t1 is the usual complex-number notation. Inference about β1 is based on the conditional distribution of T1 given T0 = t0.
The implementation of recovering this conditional distribution is done using the following 4 steps:
1. evaluate the joint characteristic function on a grid of values as demonstrated in Chapter 2;
2. recover the joint probability distribution of T0 and T1 using the FFT (this is the numerator in the formula for the conditional distribution of T1 given T0);
3. compute the marginal distribution of T0 by summing over the entire support of T1 (this is the denominator in the formula for the conditional distribution of T1 given T0);
4. divide the joint distribution (step 2) by the marginal distribution (step 3) to get the complete reference conditional distribution. Useful characteristics of this reference distribution can then easily be extracted. For example, tail area probabilities can be calculated using the additional information on the observed value of the conditional random variable, T1 = t1.
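For the two-binomial comparison taken up next, these four steps can be sketched in pure Python (my own illustration; the grid sizes and names are not from the Appendix C program). Under H0 : β1 = 0 the conditional law of T1 = Y1 given T0 = Y1 + Y2 is free of β0, so the joint characteristic function may be evaluated at pi = 1/2, and the recovered conditional distribution should match the central hypergeometric distribution:

```python
import cmath
from math import comb, pi

n1 = n2 = 22            # two treatment arms of 22 patients each (Table 3.1)
t0_obs = 10             # observed total number of adverse outcomes
N0, N1 = n1 + n2 + 1, n1 + 1     # grid sizes covering the supports of T0 and T1

def phi(s, t, p=0.5):
    """Joint characteristic function of (T0, T1) = (Y1 + Y2, Y1) when p1 = p2 = p."""
    return (((1 - p) + p * cmath.exp(1j * (s + t))) ** n1 *
            ((1 - p) + p * cmath.exp(1j * s)) ** n2)

# Steps 1-2: invert the two-dimensional DFT along the slice T0 = t0_obs (numerator).
joint_row = []
for u in range(N1):
    acc = 0j
    for j in range(N0):
        for k in range(N1):
            acc += (phi(2 * pi * j / N0, 2 * pi * k / N1) *
                    cmath.exp(-2j * pi * (j * t0_obs / N0 + k * u / N1)))
    joint_row.append(acc.real / (N0 * N1))

# Step 3: marginal P(T0 = t0_obs), summing the row over the support of T1.
marg = sum(joint_row)
# Step 4: condition, then compare with the central hypergeometric pmf.
cond = [v / marg for v in joint_row]
hyper = [comb(n1, u) * comb(n2, t0_obs - u) / comb(n1 + n2, t0_obs)
         if 0 <= t0_obs - u <= n2 else 0.0
         for u in range(N1)]
print(max(abs(a - b) for a, b in zip(cond, hyper)))   # the two pmfs agree closely
```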
3.3.1 Comparison of two binomial proportions
For illustration purposes, consider the following hypothetical example (we will consider an example involving real data a little later). A two-arm parallel group design has been carried out in a clinical trial with the purpose of comparing adverse outcomes in each of the treatment groups. Suppose 22 patients were randomly allocated to each treatment arm and after a pre-specified follow-up time the frequency of adverse outcomes was counted. The number and percent of adverse outcomes in the two groups were 2/22 (9.1%) and 8/22 (36.4%), respectively (Table 3.1).
Exact inference under the null hypothesis that the risk of adverse outcome is not different in the two treatment groups was conducted using the S-Plus program given in Appendix C. The output below shows how the function is invoked and the values returned by the function (left-hand side p-value, right-hand side p-value, and 2-tailed p-value).
Note that in the above example the function of Appendix C is invoked with the link function defaulted to logit and no contamination was assumed in the data. In Chapter 4 we will present generalizations in which the sensitivity of exact results to some sort of departure from model assumptions is empirically investigated.

Table 3.1: Data for the comparison of two binomial proportions

Group     Adverse      Normal        Total
1         2  (9.09)    20 (90.91)    22
2         8 (36.36)    14 (63.64)    22
Total     10           34            44
Using the FREQ procedure in the SAS statistical software (SAS Institute Inc., Cary, North Carolina), the left, right, and 2-tail Fisher's exact test probabilities were 0.034, 0.995, and 0.069, respectively.
We note that these p-values are identical to the ones obtained using the FFT. However, it should be noted that there exist different definitions of two-sided p-values that could lead to different results (Agresti, 1992).
3.3.2 Dose-response experiments
Suppose a small dose-response experiment is conducted in which groups of 10 experimental subjects were exposed to 3 different dose concentrations (0, 1, and 2 units). A binary response of interest was recorded, and the data resulting from this experiment are shown below.
Table 3.2: A dose-response example
A simple logistic regression model, logit(pi) = β0 + β1 xi, was assumed to be an appropriate model for this data set. Once again we are interested in the slope regression parameter, β1, and thus the intercept term, β0, will be treated as a nuisance parameter. Using our FFT program (Appendix C), exact p-values were obtained for testing the hypothesis β1 = 0. The one-sided (right-hand) p-value was 0.1352204 and the corresponding 2-tail p-value was 0.2704408.
The S-Plus function of Appendix C returns several quantities of interest along with the tail area probabilities. One quantity which is very helpful in monitoring whether the exact p-values might be inaccurate is the marginal probability of the sufficient statistic corresponding to the nuisance parameter evaluated at the observed value, i.e., P(T0 = t0). In the event this probability is almost zero, the p-values may not be accurate, since this value enters in the denominator of the conditional probabilities. Vollset et al. (1991) and Hirji et al. (1996) show some specific applications in which inaccurate results may be observed with FFT-based methods of exact inference. The observed values for the pair of sufficient statistics T = (T0, T1) are also returned by our S-Plus function.
Using the LogXact package (LogXact-Turbo, 1993), the one-sided p-value for the same hypothesis based on the above data was 0.1352, which is identical to what we obtained above using the FFT method. To continue with this example, suppose an additional 10 subjects were subjected to a 4-unit dose and 7 of them responded. We would now want to analyze the combined data, which consist of 40 experimental units. Repeating the same analysis as above resulted in a one-sided p-value of 0.04626505 using the FFT and 0.0463 with LogXact. To make this example closer to a realistic dose-response experiment, let us consider one more dose level set at 5, and again 10 subjects being exposed to this dose level. Let us also assume there were 8 responders. Analyzing the combined data with a total of 50 experimental units using the FFT method, we obtain a one-sided p-value of 0.009967243. The rounded one-sided p-value obtained from LogXact was 0.0100.
Next we will consider a real data set taken from Finney (1964, Table 17.2). The data describe the effect of insulin on mice. On a suitable log-scale for the dose, the data for a standard preparation of insulin are shown in the table below.
Table 3.3: Effect of insulin on mice at different dose concentrations (Source: Finney, 1964)
The exact one-sided p-value for the test of a linear dose effect obtained using the FFT approach was 0.00004358595, and the corresponding p-value from LogXact (to 6 decimal places) was 0.000044. The above examples all show that the FFT approach produces the same p-values as those obtained using well-known commercial packages.
3.4 The Poisson regression model
In this section, we will study the simple Poisson regression model. As before, we will focus on the null hypothesis that the slope parameter is zero, which can also be cast as the comparison of two or more Poisson mean parameters.
A connection between log-linear models for frequencies and multinomial response models for proportions exists, which stems from the fact that the binomial and multinomial distributions can be derived from a set of independent Poisson random variables conditionally on their total being fixed.
Let Y1, ..., Yk be independent Poisson random variables with means λ1, ..., λk, respectively. One may be interested in testing the composite null hypothesis that the mean parameters are equal. This can be framed through the log-linear model
log(λj) = β0 + β1 xj,
where the xj's are assumed to be fixed known constants. When β1 = 0, it is clear that we are testing the equality of the Poisson mean parameters.
Again, standard theory of significance testing leads to consideration of the test statistic T1 = Σ xj Yj conditionally on the observed value of t0 = Σ Yj, which is the sufficient statistic for β0. Generalization to k-parameter exponential families will be discussed in a subsequent section.
3.4.1 Testing H0 : β1 = 0 in a simple Poisson regression model
The data below show counts yi observed at various values of a covariate x.
Table 3.4: Data for a Poisson regression model
We assume that the responses are independent Poisson random variables with E(Yi) = Var(Yi), where
log E(Yi) = β0 + β1 xi
for i = 1, ..., 7. The canonical link for a Poisson regression model is the log link. Fitting this canonical link and using our FFT program written for the Poisson model (Appendix D), we obtain the following results; the entire conditional reference distribution is depicted in the figure on the next page. The function was invoked as pois2d(x=x,y=y), where the vectors x and y store the covariate and response values, respectively.
Table 3.5: Exact p-values in a Poisson regression model
To our knowledge, there was no software available at the time of writing to conduct exact analysis for an arbitrary Poisson regression model against which our results can be checked. Table 3.5 shows the two one-sided p-values along with the two-tailed p-value corresponding to a test of H0 : β1 = 0 in the simple Poisson regression model. We also give the marginal probability, p = 0.04867797, used as the denominator in the calculation of the conditional probability. Again, when this marginal probability is sufficiently close to zero, numerical instability may result and the p-values may not be calculated accurately using the FFT method. The observed values of the pair of sufficient statistics, also shown in Table 3.5, give us an idea of how big the grid of evaluation of the characteristic function should be.
Figure 3.1: Conditional probability mass function (pmf) of T1 | T0 = t0 for the Poisson regression model example obtained using FFT
3.4.2 Comparison of two Poisson rate parameters
To verify the validity and accuracy of exact tail probabilities under a Poisson regression model, we used an example for which we have a theoretical justification. Suppose that Y1 and Y2 are independent Poisson random variables with means λ and ρλ, respectively. It can be shown that
Y1 + Y2 ~ Poisson(λ(1 + ρ))
and
Y1 | Y1 + Y2 = m ~ Binomial(m, 1/(1 + ρ)).
The second result is the most relevant for our purpose. In particular, if both Y1 and Y2 have a Poisson(λ) distribution, this fact states that conditioning on the sum of Y1 and Y2 being equal to an observed value m yields a Binomial(m, 1/2) distribution for Y1. This special case is equivalent to testing for equality of the two Poisson mean parameters, since
H0 : λ = ρλ is equivalent to ρ = 1.
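This fact is easy to check numerically; the following small Python fragment (mine, not one of the thesis programs) conditions two Poisson pmfs on their total and compares the result with the stated binomial law, here with λ = 1, ρ = 2 and m = 6 as arbitrary choices:

```python
from math import comb, exp, factorial

def pois(y, lam):
    """Poisson pmf at y."""
    return exp(-lam) * lam ** y / factorial(y)

lam, rho, m = 1.0, 2.0, 6     # Y1 ~ Poisson(lam), Y2 ~ Poisson(rho*lam), total fixed at m
p = 1 / (1 + rho)             # success probability of the claimed binomial law

# P(Y1 = u | Y1 + Y2 = m) = P(Y1 = u) P(Y2 = m - u) / P(Y1 + Y2 = m)
denom = pois(m, lam * (1 + rho))          # Y1 + Y2 ~ Poisson(lam * (1 + rho))
cond = [pois(u, lam) * pois(m - u, rho * lam) / denom for u in range(m + 1)]
binom = [comb(m, u) * p ** u * (1 - p) ** (m - u) for u in range(m + 1)]
print(max(abs(a - b) for a, b in zip(cond, binom)))   # essentially zero
```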
Now let us use a simple example to illustrate this fact and, more importantly, confirm that what we get out of our FFT formulation for the Poisson regression problem is indeed an exact conditional probability mass function.
To do this, without loss of generality, fix the Poisson parameter at λ = 1. In the regression framework, this is equivalent to setting both β0 and β1 to zero and using a dummy 1/0 variable for the covariate x. Under these conditions, the pair of sufficient statistics T0 = Σ Yi and T1 = Σ xi Yi reduce to T0 = Y1 + Y2 and T1 = Y1, respectively.
Suppose two independent random values are generated from a Poisson(1) distribution, i.e., a Poisson distribution with mean parameter λ = 1. This can be achieved, for instance, using the rpois function in S-Plus, which generates random numbers from a Poisson distribution.
If the total of the two Poisson variates is fixed at the observed value, in this case at 3, the above theoretical result says that the conditional distribution of Y1 given this total is Binomial(3, 1/2). The probabilities for a Binomial(3, 1/2) random variable corresponding to the support set {0, 1, 2, 3} are:
0.125  0.375  0.375  0.125
And here is the conditional probability distribution of T1 | T0 = 3 obtained using our FFT S-Plus program (Appendix D), which implements the Poisson regression model.
Note that the FFT was used at 36 input sequences (frequencies) to make sure all, or almost all, of the probability mass has been recovered. As we can see, we were able to recover the binomial probabilities exactly (at least to 8 decimal places of accuracy!). In general, we use this exact conditional probability distribution as the basis for inference about the regression parameter of interest. In this example we realize we have used only 2 observed values from a Poisson(1) probability distribution. Does the method still work if we use two different realizations? To investigate this further, let us consider more extreme cases. A value as large as 7 is observed with probability about 7 in 100,000, and a value of 5 with probability about 3 in 1,000, under a Poisson(1) distribution, as can be seen from the probability values computed using the dpois (Poisson density) function in S-Plus and shown below.
This observation may lead us to ask what happens if we took y1 and y2 to be 5 and 7, respectively. According to the theoretical result for the conditional distribution of one Poisson variate given the sum of two independent Poisson random variables, the sum is 12 and hence the conditional distribution will be Binomial(12, 1/2). The probabilities for a Binomial(12, 1/2) random variable are shown first below.
Using the Poisson regression model FFT program (Appendix D), we obtained the conditional probability distribution shown in the following output.
In this case there was no need to carry out the FFT on an extended input sequence. Only 13 inputs (Fourier frequencies) were used and it was possible to recover the exact probabilities, as can be seen by comparing the FFT results with the probabilities generated using the exact Binomial distribution and displayed above.
3.5 The general exponential family
In the preceding few sections, we have studied how to compute exact conditional probabilities to make inferences in two widely used models, the logistic and Poisson regression models. In this section we lay out the theoretical justification of the validity of inverting characteristic functions to recover probability distributions for a more general class of distributions.
The following definition provides a slightly different parameterization from that shown in §3.2 for distributions in the exponential family.
A family of distributions with p.m.f. f(x; θ) is said to belong to the exponential family of distributions if f(x; θ) can be expressed in the form
f(x; θ) = a(θ) b(x) exp{ Σ_{j=1}^{k} c_j(θ) d_j(x) }
for a suitable choice of functions a(.), b(.), c_j(.) and d_j(.), where θ is a vector of parameters.
If c_j(θ) = θ_j, j = 1, ..., k, the family is said to have its natural parameterization. In this case T is a complete sufficient statistic for (θ1, ..., θk) (Lehmann, 1983).
3.5.1 Joint and conditional distributions of sufficient statistics
By way of generalization, below we state a theorem (without proof) and give a corollary (with proof). The corollary is particularly relevant to the inferential problem considered in this thesis.
Theorem 3.1 (Source: Warahandi, 1995) Let T = (Σ T1(Xi), ..., Σ Tk(Xi)) be the complete sufficient statistic based on a random sample from the exponential family having the natural parameterization.
The joint p.m.f. or p.d.f. of T is of the form
f_T(t; θ) = A(θ) B(t) exp{ Σ_{j=1}^{k} θ_j t_j },
where t_j = Σ_{i=1}^{n} T_j(X_i).
Let T = (U, V) be a partition of the set of complete sufficient statistics. Let θ = (θ_u, θ_v) be the corresponding partition of the parameter vector. The following useful corollary concerning the conditional distributions of one given the other (and the marginal distribution of each component) can be deduced from the above theorem.
Corollary 3.1 The conditional distribution of U given V = v forms an exponential family. Moreover, this distribution is independent of the parameters θ_v.
Proof: (without loss of generality, consider the continuous case)
f_{U,V}(u, v; θ) = A(θ) B(u, v) exp(θ_u'u + θ_v'v).
The marginal distribution of V can be obtained by integrating this joint density with respect to u. Therefore, the conditional density function
f_{U|V}(u | v) = B(u, v) exp(θ_u'u) / ∫ B(u, v) exp(θ_u'u) du
is an exponential family independent of θ_v.
What this result says is that, in principle, the conditional distribution needed for inference on model parameters of interest is available in closed form as a member of the exponential family. In practice, however, there is a limitation in that the combinatorics involved could be daunting. With the ever-growing improvements in speed and storage capacity of today's computers, a method such as the FFT can be applied in many situations where the sample size is small to moderate.
3.5.2 Characteristic function for members of the exponential family
If a family of distributions can be expressed in the above form for x ∈ A, where A is a set independent of θ, then the sufficient statistic is k-dimensional. A natural parameterization (canonical form) is obtained when c_j(θ) = θ_j. The canonical form has several advantages. For example, it is much easier to compute moments and other features of sufficient statistics.
For the simple logistic regression model, the density of (Y1, ..., Yn) is a 2-parameter (with respect to β0 and β1) exponential family, and the sufficient statistics corresponding to the natural parameters are
T0 = Σ Yi and T1 = Σ xi Yi.
The joint characteristic function of the sufficient statistics T0 and T1 is derived provided (β0 + s1, β1 + s2) lie in the natural parameter space (true if the space is open and both s1 and s2 are sufficiently close to zero; see Knight, 2000).
This formula can be extended to the k-dimensional case in a straightforward manner. The inversion of the characteristic function will be carried out, in principle, in the same way as in the 2-dimensional case.
Regression models with continuous underlying error distributions (e.g., a model with a gamma error distribution) can be handled using the FFT in the same manner as the models considered in this chapter so far. In such cases, numerical inversion of the characteristic function can be used to obtain numerical estimates of the distribution function. This is accomplished by constructing a finite Fourier series (discrete Fourier transform) that approximates the density over a specified finite interval, as described in §2.2.1. However, an additional source of error, namely the error resulting from discretizing the inherently continuous variable (also known as sampling error), would be introduced in such cases, and one has to use a very dense sampling scheme to minimize this error.
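As a concrete illustration of such a discretized inversion (my own sketch, not taken from the thesis), the standard normal density can be recovered from its characteristic function φ(t) = exp(−t²/2) by truncating the inversion integral to [−T, T] and applying a midpoint rule; the truncation point T and step dt are arbitrary choices that control the two error sources just described:

```python
from math import cos, exp, pi, sqrt

def density_from_cf(x, T=12.0, dt=0.01):
    """Approximate f(x) = (1/(2*pi)) * integral over [-T, T] of phi(t) e^{-itx} dt
    by a midpoint rule, for the N(0,1) characteristic function phi(t) = exp(-t^2/2)."""
    n = int(2 * T / dt)
    total = 0.0
    for j in range(n):
        t = -T + (j + 0.5) * dt
        total += exp(-t * t / 2) * cos(t * x)   # imaginary parts cancel by symmetry
    return total * dt / (2 * pi)

# compare the reconstruction against the closed-form N(0,1) density
for x in (0.0, 1.0, 2.0):
    print(x, density_from_cf(x), exp(-x * x / 2) / sqrt(2 * pi))
```

With a dense grid the agreement is excellent; coarsening dt or shrinking T makes the discretization and truncation errors visible.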
3.6 Extensions to truncated distributions
In this section we show that the framework we have developed allows exact inference in some interesting extensions. We will provide the theoretical detail for two such extensions and highlight that the implementation of the methods involves only a slight modification of the S-Plus programs that have been used and discussed in previous sections.
3.6.1 Truncated Poisson distribution, Pt(λ)
The truncated Poisson distribution (also known as the zero-truncated Poisson distribution) has important applications in some fields of study. For example, the information technology (IT) manager of a certain company may have a record of the number of reported computer crashes (failures) per defined time interval, say per month, for each user in his/her network domain. Typically, users who did not experience any crash will not report, and therefore the value zero will not be observed. The manager is interested in developing a model in an effort to pinpoint factors associated with the rate of failure. In this case, the counts of crashes Y may be assumed to follow a Poisson distribution conditional on Y ≥ 1. This distribution is formally defined as
P(Y = y) = λ^y / [y! (e^λ − 1)],  y = 1, 2, ...,
where λ is the rate of failure.
To see that this distribution belongs to the family of exponential families, we rewrite the probability mass function in the following form:
P(Y = y) = exp{ y log λ − log(e^λ − 1) − log y! }.
The moment generating function, and thus the characteristic function, of Y is easily obtained from first principles as
M_Y(s) = (e^{λ e^s} − 1) / (e^λ − 1),
and so,
φ_Y(t) = (e^{λ e^{it}} − 1) / (e^λ − 1).
Now let us consider the case of n independent, but not identically distributed, truncated Poisson random variables. As before, we will do so by, for example, imposing a parametric model relating the rate parameters to a fixed (measured) covariate. If we want to study one covariate, the model we might be interested to fit and test a hypothesis on could be

log(λᵢ) = β₀ + β₁xᵢ.
The parameter of interest will be the regression coefficient associated with the covariate, β₁. Exact inference is based on the same pair of sufficient statistics we have seen before, as can be justified by the following result: this is a 2-parameter exponential family, and T = (Σ Yᵢ, Σ xᵢYᵢ) = (T₀, T₁) is sufficient for θ = (β₀, β₁).
For the truncated Poisson regression model,

d(β₀, β₁) = Σ_{i=1}^n log(e^{λᵢ} − 1),   where λᵢ = e^{β₀+β₁xᵢ}.
In the i.i.d. case, the maximum likelihood estimator (MLE) for the mean parameter λ of a standard Poisson distribution is the sample mean, λ̂ = ȳ. For the zero-truncated Poisson, on the other hand, there is no closed-form solution for the MLE of λ. The MLE λ̂ satisfies

λ̂ / (1 − e^{−λ̂}) = ȳ.
This equation is often solved using some kind of iterative technique such as the Newton-Raphson or EM algorithm.
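The Newton-Raphson iteration for the score equation λ̂/(1 − e^{−λ̂}) = ȳ can be sketched in a few lines. The following fragment is written in Python rather than the thesis's S-Plus, and the function name is hypothetical:

```python
import math

def trunc_pois_mle(ybar, tol=1e-10, max_iter=100):
    """Solve lam / (1 - exp(-lam)) = ybar for lam by Newton-Raphson.

    Requires ybar > 1, since the truncated mean lam/(1 - exp(-lam))
    always exceeds 1.
    """
    lam = ybar  # starting value; the root lies below the sample mean
    for _ in range(max_iter):
        e = math.exp(-lam)
        f = lam / (1.0 - e) - ybar
        # derivative of lam/(1 - exp(-lam)) with respect to lam
        fprime = (1.0 - e - lam * e) / (1.0 - e) ** 2
        step = f / fprime
        lam -= step
        if abs(step) < tol:
            break
    return lam
```

For instance, a zero-truncated Poisson(2) has mean 2/(1 − e⁻²), and feeding that mean back in recovers λ = 2.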
Using the methodology outlined in this section, it turns out that conducting an exact hypothesis test in the truncated Poisson case poses no additional complexity - conceptual or computational.
3.6.2 Numerical example
As mentioned above, calculation of exact tail probabilities for a regression model based on a zero-truncated Poisson distribution is straightforward using the FFT approach, provided the correct characteristic function is evaluated at the appropriate range of Fourier frequencies. The S-Plus programme in Appendix E provides a function that returns exact tail area probabilities for this problem, and the following example shows the results in comparison with the ordinary Poisson model.
Suppose the following vectors of response (yp) and covariate values (xp) were available for analysis. One can think of the counts of this example as realizations from a Poisson process and the fixed covariate values as quantities measuring intensity of a risk factor.

> yp
[1] 3 4 4 5 4 7 9
> xp
[1] 0 0 1 1 1 2 2
When we analyzed this data set using an ordinary Poisson regression model, we obtained exact tail-area probabilities for testing the hypothesis β₁ = 0. When the same data set was analyzed assuming a zero-truncated Poisson model for the response vector, the following results were obtained:

$pval.two:
[1] 0.05655095
The complete conditional reference distribution of T₁ given T₀ = t₀ was obtained using FFT, assuming an underlying zero-truncated Poisson error distribution, and depicted in the figure below.
Two remarks are in order. First, the choice between a zero-truncated Poisson model versus a Poisson distribution that does not exclude zero is of course dependent upon knowledge of the data generating experiment. Second, although in this example the p-values are quite similar, there is a tendency for them to be different, and therefore these exact calculations could be more important in some situations than others.
Figure 3.2: Conditional probability mass function (pmf) of T₁|T₀ = t₀ for the zero-truncated Poisson regression model example obtained using FFT
3.6.3 Truncated binomial distribution, Bt(n, p)
For reasons similar to those which give rise to the need for a truncated Poisson distribution, the zero-truncated binomial distribution is sometimes used as a model. For example, consider a genetic trait which is not directly observable, but will cause a disease among a certain proportion of the individuals that have it. For families in which one member has the disease, it may be of interest to estimate the proportion p that has the genetic trait and test hypotheses related to this parameter of interest. Let Y be the number of members that have the trait in a family of n members where one has the disease (and thus also the trait). Since Y ≥ 1, the zero-truncated binomial distribution may be appropriate.
The zero-truncated binomial distribution is defined by

P(Y = y) = C(n, y) p^y (1 − p)^{n−y} / [1 − (1 − p)^n],   y = 1, 2, ..., n,

and it is a member of the exponential family, as can be seen from

P(Y = y) = exp{y log[p/(1 − p)] + n log(1 − p) − log[1 − (1 − p)^n] + log C(n, y)}.

Using the canonical link, let log[pᵢ/(1 − pᵢ)] = β₀ + β₁xᵢ. Then the characteristic function of Yᵢ is

φ_{Yᵢ}(t) = [(1 − pᵢ + pᵢe^{it})^{nᵢ} − (1 − pᵢ)^{nᵢ}] / [1 − (1 − pᵢ)^{nᵢ}].
The above result suggests that exact inference can be carried out with a slight modification of the program used for the usual logistic regression, a modification that changes only the form of the characteristic function.
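As a concrete illustration of the modified characteristic function, the sketch below (Python standing in for S-Plus; function names hypothetical) gives the zero-truncated binomial pmf and its characteristic function E[e^{itY}] = [(q + pe^{it})^n − q^n]/(1 − q^n), with q = 1 − p:

```python
import cmath
import math

def tbinom_pmf(y, n, p):
    # zero-truncated binomial: ordinary binomial pmf renormalized over y >= 1
    q = 1.0 - p
    return math.comb(n, y) * p ** y * q ** (n - y) / (1.0 - q ** n)

def tbinom_cf(t, n, p):
    # E[exp(itY)] = ((q + p*exp(it))**n - q**n) / (1 - q**n)
    q = 1.0 - p
    return ((q + p * cmath.exp(1j * t)) ** n - q ** n) / (1.0 - q ** n)
```

The pmf sums to one over y = 1, ..., n, and the characteristic function equals one at t = 0, as any characteristic function must.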
3.7 Analysis of error bounds
3.7.1 Sources of errors
When the FFT is used to speed up computations in numerical inversions of characteristic functions of real random variables, three types of errors may be introduced:
1. sampling error resulting from evaluating the integrand in the Fourier transform (equation 2.1) only at specific points, i.e., approximating an integral by a sum;
2. truncation error introduced by truncating the Fourier series at a finite number of points, i.e., neglecting the integral for frequencies outside a specified range;
3. round-off error introduced by less precise computer arithmetic.
The first source of error, which occurs when a continuous response random variable is involved, can be controlled by the interval between samples. The second source of error, which can occur in both continuous and discrete cases, is controlled by the number of samples taken. The first error is reduced by reducing the width of the intervals, and the second by increasing N, the input size, while keeping the interval width fixed. The round-off error should be negligible with the use of greater numerical precision, e.g., double precision.
In Chapter 2, we have seen that for an integer-valued random variable with support set 0, 1, 2, ..., N − 1 the characteristic function is given by

φ_X(t) = Σ_{k=0}^{N−1} p_k e^{itk},

where p_k = P(X = k) is the probability mass function (pmf). When the characteristic function was known but not the probabilities p_k, φ_X was evaluated at N equally spaced values in the interval [0, 2π), and the resulting sequence c_m, defined by

c_m = φ_X(2πm/N),   m = 0, 1, ..., N − 1,

was used as the input for the FFT, and the probabilities

p_k = (1/N) Σ_{m=0}^{N−1} c_m e^{−2πimk/N}

were recovered exactly.
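The recovery recipe above can be sketched directly; the following Python fragment (Python in place of S-Plus, names hypothetical, and a plain unoptimized DFT standing in for the FFT) samples a known characteristic function at the N Fourier frequencies and inverts it, recovering a Binomial(5, 0.3) pmf exactly:

```python
import cmath
import math

def recover_pmf(cf, N):
    # Sample the characteristic function at the N Fourier frequencies
    # 2*pi*m/N and apply the inverse discrete Fourier transform:
    #   p_k = (1/N) * sum_m cf(2*pi*m/N) * exp(-2*pi*i*m*k/N)
    c = [cf(2.0 * math.pi * m / N) for m in range(N)]
    return [
        sum(c[m] * cmath.exp(-2j * math.pi * m * k / N) for m in range(N)).real / N
        for k in range(N)
    ]

# Binomial(5, 0.3) has bounded support {0,...,5}, so N = 6 recovers it exactly.
n, p = 5, 0.3
cf = lambda t: (1.0 - p + p * cmath.exp(1j * t)) ** n
probs = recover_pmf(cf, n + 1)
```

Because the support is bounded, no truncation error enters here; the recovered probabilities agree with the binomial formula to machine precision.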
There are, however, several useful discrete probability distributions with countably infinite support. The geometric and Poisson probability distributions are two such examples. On the other hand, the FFT, being merely a fast algorithm for the discrete Fourier transform (DFT), works with finite input sequences. It would, therefore, be useful to explore the effect of truncating a non-finite sequence to N finite sample points. As we will see a little later, in some cases it is possible to obtain analytic expressions which show how the error bound depends on the length of the input sequence N and possibly other parameters. In many other situations, it may not be possible to derive analytic solutions, but empirical results may be used to shed light on the magnitude of the inaccuracy resulting from truncation error.
The effect of truncation error in the FFT can be investigated by considering the points c_m, for m = 0, 1, ..., N − 1:

c_m = φ_X(2πm/N) = Σ_{k=0}^∞ p_k e^{2πimk/N} = Σ_{k=0}^{N−1} d_k e^{2πimk/N},

where e^{2πim(k+hN)/N} = e^{2πimk/N}, for h an integer, is used to obtain the second equality, and, for k = 0, ..., N − 1,

d_k = Σ_{h=0}^∞ p_{k+hN}.

It can be seen that the inverse transform of c_m = φ_X(2πm/N) will yield d₀, ..., d_{N−1}; each d_k is the true value p_k plus the error

e_k = Σ_{h=1}^∞ p_{k+hN}.
The error term can be minimized by making N sufficiently large, and one needs to experiment in order to find an optimal value of N that minimizes the error.
3.7.2 Error in the Geometric distribution
The geometric (or Pascal) distribution arises in situations where interest lies in the number of failures before the first success occurs. It is a special case of the negative binomial distribution, the latter being a distribution that describes the number of failures before the first r successes. The geometric distribution has unbounded support, and its probability mass function is given by

p_k = p(1 − p)^k,   k = 0, 1, 2, ....

The error term for p_k is given by

e_k = Σ_{h=1}^∞ p(1 − p)^{k+hN} = p(1 − p)^k (1 − p)^N / [1 − (1 − p)^N].

The percent error term for p_k is

100 e_k / d_k = 100 (1 − p)^N.

The error is less than r = 0.01 if

(1 − p)^N ≤ r,   i.e.,   N ≥ log(r) / log(1 − p).
If p = 0.5, then the input size N should be 7. In general, the size of N depends on the rate of decay of the probability mass function under study. Note that for the geometric distribution the percent error is constant for all values of the support set, i.e., does not depend on k. We confirmed this result numerically using the S-Plus program given in Appendix F, as can be seen from the following output.
The above output shows the percent error from using the FFT with N = 7 sample points to compute the probability mass function of a geometric distribution with parameter 0.5. As shown theoretically, the error, which worked out to be under 1%, is uniform across the support of the random variable. The "geom.fft" function computes geometric probabilities at the sampled points, and the S-Plus function "dgeom" is used to generate the exact probabilities from a geometric distribution with parameter p = 0.5.
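The constant percent error 100(1 − p)^N can also be checked outside S-Plus. The following Python sketch (hypothetical names, a plain DFT standing in for the FFT) inverts the geometric characteristic function p/(1 − (1 − p)e^{it}) on N points and computes the pointwise percent errors:

```python
import cmath
import math

def geom_fft_probs(p, N):
    # Invert the geometric cf p / (1 - (1-p)*exp(it)) sampled at N Fourier
    # frequencies; truncation folds the infinite tail back onto {0,...,N-1}.
    q = 1.0 - p
    c = [p / (1.0 - q * cmath.exp(2j * math.pi * m / N)) for m in range(N)]
    return [
        sum(c[m] * cmath.exp(-2j * math.pi * m * k / N) for m in range(N)).real / N
        for k in range(N)
    ]

p, N = 0.3, 7
dk = geom_fft_probs(p, N)
exact = [p * (1.0 - p) ** k for k in range(N)]
pct = [100.0 * abs(e - d) / d for e, d in zip(exact, dk)]
# every entry equals 100*(1 - p)**N, about 8.24 for p = 0.3 and N = 7
```

The same constant error, around 8.24%, appears at every support point, matching the theoretical result that the percent error does not depend on k.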
The geometric distribution was useful in checking, both analytically and numerically, the degree of truncation error that is introduced when the FFT is used for varying input sizes. For the Poisson distribution, on the other hand, it is easier to investigate the truncation error empirically.

Suppose we are interested in recovering probabilities corresponding to a Poisson distribution with mean parameter 3, i.e., the random variable Y ~ Poisson(3). The S-Plus output below shows pointwise relative percent errors corresponding to different input sizes, N.
> pk <- dpois(0:10,3)
> pk1 <- pois.fft(11,cf.pois(11,3))
> ek <- 100*abs((pk-pk1)/pk1)
> round(ek,digits=2)
 [1] 0.44 0.04 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> pk <- dpois(0:6,3)
> pk1 <- pois.fft(7,cf.pois(7,3))
> ek <- 100*abs((pk-pk1)/pk1)
> round(ek,digits=2)
 [1] 30.26 5.15 1.19 0.36 0.13 0.05 0.03
If we assume a different mean parameter, say λ = 2, and still use an input size of 7, the pointwise errors become:

> pk <- dpois(0:6,2)
> pk1 <- pois.fft(7,cf.pois(7,2))
> ek <- 100*abs((pk-pk1)/pk1)
> round(ek,digits=2)
 [1] 2.48 0.32 0.07 0.02 0.01 0.00 0.00
From the above results we notice that the error depends, of course, on the mean parameter λ and decays as the support value, k, gets large. For the geometric distribution, the dependence of the error was entirely on the parameter p, not the support k. One more example will make this very clear. If Y ~ Geometric(0.3), what would be the pointwise relative percent errors if probabilities are calculated using the FFT on an input size of 7? The answer is given in the following S-Plus output.
> pk <- dgeom(0:6,.3)
> pk1 <- geom.fft(7,cf.geom(7,.3))
> ek <- 100*abs((pk-pk1)/pk1)
> round(ek,digits=2)
 [1] 8.24 8.24 8.24 8.24 8.24 8.24 8.24
The above results show that if the input size to the FFT is N = 7, probabilities for a Geometric(0.3) distribution are recovered with 8.24% relative error, regardless of the particular value of the support of the random variable.
3.7.3 Error in the Poisson distribution
For the Poisson distribution, the dependence of the error term on k (the support), λ (the mean parameter) and N (the FFT input size) can be summarized using sums involving the Poisson probability mass and cumulative distribution functions.
The probability mass function for a Poisson(λ) random variable is defined by

p_k = e^{−λ} λ^k / k!,   k = 0, 1, 2, ....

The error term in the finite FFT (i.e., based on sequences of Fourier frequencies of size N) is given by

e_k = Σ_{h=1}^∞ e^{−λ} λ^{k+hN} / (k + hN)!.

To simplify this expression for the error term further, we will use a change of variable technique. Let y = k + hN. Then the limits of summation in y go from k + N to ∞ by increments of N. We then obtain

e_k = Σ_{y=k+N, k+2N, ...} f_{Y|λ}(y) ≤ 1 − F_λ(k + N − 1),

where F_λ is the cumulative distribution function of a Poisson(λ) random variable. Note that y can only take on values in the set of positive integers since N is a natural number and k ∈ {0, 1, ..., N − 1}.
The percent error for p_k is bounded by

100 e_k / p_k ≤ 100 [1 − F_λ(k + N − 1)] / f_{Y|λ}(k),

where the notation f_{Y|λ}(·) is used here to denote the probability mass function of a Poisson(λ) random variable. In particular, the percent error in computing P(X = 0) = p₀ is bounded by

100 [1 − F_λ(N − 1)] / f_{Y|λ}(0) = 100 e^λ [1 − F_λ(N − 1)].
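The aliasing error e_k can also be computed directly by summing the folded tail. The Python sketch below (hypothetical names; Python in place of S-Plus) reproduces the 30.26% relative error at k = 0 for λ = 3 and N = 7 seen in the output above:

```python
import math

def pois_pmf(k, lam):
    # log-space evaluation avoids overflow for large k
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def alias_error(k, lam, N, terms=50):
    # truncation (aliasing) error e_k = sum over h >= 1 of pmf(k + h*N)
    return sum(pois_pmf(k + h * N, lam) for h in range(1, terms + 1))

lam, N = 3.0, 7
e0 = alias_error(0, lam, N)
# relative percent error at k = 0, measured against the recovered value d_0
pct = 100.0 * e0 / (pois_pmf(0, lam) + e0)
```

Here pct comes out to roughly 30.26, in agreement with the S-Plus output, and the error shrinks rapidly as k grows since the folded tail starts further out.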
The graph below shows the relative error in percentages of the FFT truncation as a function of the input size N for λ = 3, λ = 5, and λ = 7.
In this chapter we showed that a method based on the theory of the Fourier transform and implemented using the fast Fourier transform (FFT) can be fruitfully used to compute exact tail area probabilities in the class of generalized linear models (GLMs). This class of models is suitable for analyses involving inversion of characteristic functions since the joint characteristic function for a vector of sufficient statistics can be explicitly derived relatively easily.
In particular we have illustrated, with numerical examples, that exact inferences can be carried out for hypotheses of interest in two popular regression models - the logistic and Poisson models.

Figure 3.3: Relative error in percentages of the FFT for calculating P(X = 0) for different input sizes N and parameter values λ = 3, λ = 5, and λ = 7.

We have also extended the FFT framework to weighted exponential family models and considered two practical examples - the truncated Poisson and binomial models. We have outlined the general approach for any generalized linear model, including models based on a continuous response vector. Finally, we discussed different sources of errors in any FFT-based approach and derived some crude error bounds for some discrete distributions.
Chapter 4
Sensitivity analysis
4.1 Introduction
Models are simplified representations of a data generating mechanism, with an inherent certainty of departures from model assumptions. Model misspecification may arise in several ways. For example, in a generalized linear models context one may misspecify the underlying error distribution, the link function, or the structural form of the systematic component, or any combination of these. Also, the exposure of the observed data to contamination that may be caused by transcription error or any other misclassification error could have a huge impact on results of statistical inference. For these reasons, it would be useful to investigate whether the important aspects of the model remain stable or robust under some perturbations. In this chapter we use the FFT-based framework to investigate sensitivity of exact tail area probabilities to some of the possible mis-specifications outlined above.
4.2 Robustness
The literature on robustness has grown steadily over the past quarter of a century. One early work is the Princeton robustness study by Andrews et al. (1972). Several authors studied robustness of statistical inference techniques from different perspectives, and as a result one may find a slightly different interpretation and definition for robustness.

Robustness involves protection against model misspecification and resistance of inference conclusions to spurious observations (Lindsey, 1996). Robust estimation aims at characterizing how point estimators of population parameters behave when there are errors in the data (Horowitz and Manski, 1997).
One way of accommodating extreme (outlying) observations is to use distributions with thicker tails, in which case a member of the so-called stable distributions would often be suitable (Lindsey, 1996). Alternatively, a finite mixture distribution might be used to model contaminated observations. An m-component mixture distribution can be expressed as

f(y) = Σ_{i=1}^m πᵢ fᵢ(y),

where Σ πᵢ = 1. In practice m will be small, say 2 or 3.
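A two-component contamination model of this kind can be sketched as follows (Python; the component choices are illustrative and not taken from the thesis):

```python
import math

def mixture_pmf(y, components):
    # components: list of (weight, pmf) pairs with weights summing to one
    return sum(w * f(y) for w, f in components)

def pois(lam):
    # Poisson pmf in log space to avoid overflow for large y
    return lambda y: math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

# mostly 'acceptable' Poisson(2) counts, occasionally extreme Poisson(10) ones
contaminated = [(0.95, pois(2.0)), (0.05, pois(10.0))]
```

The mixture weights play the role of the embedded parameters π₁, ..., π_m described in the text.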
This approach is an example of embedding, with the πᵢ's being the embedded parameters. As an illustration, suppose we have a mixture of 'acceptable' observations together with a few 'unacceptable' or extreme ones. If a model suitable for the central observations is chosen as the data generating mechanism, one would hope the corresponding πᵢ's to be large. A formal test of hypothesis can be conducted to determine the plausibility of a model with πᵢ = 0 for the contaminated observations.
In conducting exact inference, however, not much is known about the influence of contamination and/or model misspecification on computed p-values. For generalized linear models, the vector of sufficient statistics that resulted from a canonical parameterization will no longer be sufficient under the perturbed model. In the absence of any theoretical framework, it seems reasonable to introduce a known perturbation factor and use the same conditional argument to test hypotheses on parameters of interest in a "what-if" kind of approach.
4.3 Misclassification Errors
In binary regression, contamination in y can only take the very simple form of a transposition error between 0 and 1 (i.e., 0 → 1 or 1 → 0).

Suppose that such transpositions happen with a small probability γ, so that the actual recorded response y is governed by a probability p* instead of p,

p* = γ(1 − p) + (1 − γ)p = γ + (1 − 2γ)p,

where p is given by the assumed model and γ (< 0.5) denotes the probability of a transcription error (Copas, 1988; Collett, 1991, section 5.7.3).
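The perturbed success probability p* = γ + (1 − 2γ)p can be coded directly (a Python one-liner; the function name is illustrative):

```python
def contaminate(p, gamma):
    # P(recorded response = 1) under transposition error:
    # p* = gamma*(1 - p) + (1 - gamma)*p = gamma + (1 - 2*gamma)*p
    return gamma + (1.0 - 2.0 * gamma) * p
```

Note that γ = 0.5 forces p* = 1/2 regardless of the model probability p, which is the degenerate case discussed later in connection with the last row of Table 4.1.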
For example, an outlier with y = 1 and p near zero can be explained as a transcription error with probability p* ≈ γ > 0 rather than as an extremely unusual response from the basic model. Transcription error occurs in many practical situations. In studies on prostate cancer, for example, an error in diagnosing whether or not a patient had nodal involvement may result in a transcription error.
Different interpretations could be given to this contaminated model. For example, it can be interpreted as a family of transformations indexed by a transformation parameter γ > 0, with the pure logistic model being obtained when γ = 0.

The above model can be fitted using the method of maximum likelihood by maximizing the log-likelihood function given by

ℓ(β, γ) = Σᵢ { yᵢ log pᵢ* + (1 − yᵢ) log(1 − pᵢ*) }.
A better understanding of the approach to be taken in fitting this model can be gained upon re-writing the model in equation (4.1) in a form which shows that the link function is no longer the logistic link but one that includes the additional unknown parameter γ. Statistical packages that allow user-specified link functions (e.g. GLIM) can be used to fit the model. One often fits this model for a range of values of γ, and the model for which the deviance is minimized is adopted as the one that describes the data well (Collett, 1991).
Similarly, here we investigate the sensitivity of exact tail probabilities to the type of misclassification described above by considering different values of γ using the FFT method (Appendix C).
4.3.1 Numerical Example
Suppose in a certain exposure-response layout experiment the observed vectors of response, group size, and covariate were (2,4,6), (10,12,14), and (0,1,2), respectively. A test of the hypothesis H₀ : β₁ = 0 under a logistic fit with no assumed misclassification results in the following tail probabilities.
In this case, introduction of some degree of contamination did not affect the exact p-values. If, however, the null hypothesis is different from 0, the exact results are affected. Table 4.1 summarizes 'exact' tail probabilities for a range of values of γ for testing H₀ : β₁ = 1. The table also shows the marginal probability being used in the computation of the conditional distribution of T₁|T₀ = t₀. As discussed before, monitoring the value of this marginal probability is vital to make sure that strange conditional probabilities were not obtained as a result of division by a number sufficiently close to zero.
As can be seen from the above table, a significant change in the marginal as well as conditional probabilities could result as the degree of contamination increases, which in turn influences the 'exact' tail probabilities. Note that the last row of Table 4.1 corresponds to p* = 1/2, which in the conditional sense is equivalent to testing the hypothesis β₁ = 0 with no misclassification in the data. Hence, all of the probabilities of interest shown in this row match the values given previously.

Table 4.1: Sensitivity of exact p-values to misclassification error
γ   P(T₀ = t₀)   Left 1-tail p-value   Right 1-tail p-value   2-tail p-value
4.4 Mis-specification of link function
In a typical medical application the outcome of interest, y, is a binary random variable which can be coded as 1/0 representing a patient's response to treatment (e.g. cure/no cure, alive/dead), and the explanatory variable, x, is a vector of treatment and patient characteristics (e.g. dose, age, sex).
Let p = P(y = 1 | x). A binary regression model for a p-component vector x of covariates asserts that

P(y = 1 | x) = F(x′β),

where F is a given response function and β is a p-vector of regression coefficients. As outlined in Chapter 3, commonly used link functions in binary regression models include

F(u) = e^u / (1 + e^u)  →  logistic regression
F(u) = Φ(u)  →  probit regression
F(u) = 1 − exp(−exp(u))  →  complementary log-log regression
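The three response functions can be written down directly; a Python sketch (function names illustrative):

```python
import math

def logistic(u):
    return math.exp(u) / (1.0 + math.exp(u))

def probit(u):
    # standard normal cdf, written via the error function
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def cloglog(u):
    # complementary log-log response function
    return 1.0 - math.exp(-math.exp(u))
```

All three map the real line monotonically onto (0, 1); the logit and probit are symmetric about u = 0, while the complementary log-log is asymmetric (its value at u = 0 is 1 − e⁻¹ ≈ 0.632 rather than 0.5).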
We have previously discussed that for a binary regression model the canonical link is the logit function; when the link is not canonical, the conditional argument which allowed us to conduct exact inference for a given parameter of interest - by conditioning its corresponding sufficient statistic on another set of sufficient statistics associated with the parameters we are not interested in (nuisance, incidental parameters) - will not be valid. For example, for a logistic model with an intercept (nuisance) and slope (parameter of interest) parameter, we have seen that the pair of sufficient statistics are simple linear functions involving the response and the explanatory variable.

A question that may be worth asking and exploring is whether this pair of sufficient statistics can be "reasonably sufficient" to be used as a basis for inference under other link functions. In other words, to what extent would the tail probabilities change if the same statistics as used in logistic regression were to be used with, say, probit or complementary log-log regression models?
4.4.1 Testing for β₁ = 0
Binary regression: Logit link
Consider a simple binary regression model based on 28 patients and a response rate of 13/28 (46%). Suppose also the values of the explanatory variable x can take on only 3 values, namely, 0, 1, and 2. The conditional distribution of T₁ given T₀ = t₀ obtained from fitting a simple linear logistic model is distributed on the support t₁ ∈ {4, 5, ..., 20}. This is summarized in the following summary output (to four decimal places). The arguments to the S-Plus function fft2d.np are the response vector y = (3,4,6), the sample size vector n = (10,10,8) and the covariate vector x = (0,1,2).
The observed value of (T₀, T₁) is (13,16), and the left, right, and two-tail probabilities (to 4 decimal places) are 0.9819, 0.0539, and 0.1079, respectively. For the same reason as in the case of misclassification, changing the link function from logit to probit or complementary log-log did not affect the 'exact' probabilities in testing the hypothesis β₁ = 0.
4.4.2 Testing for H₀ : β₁ = c (c ≠ 0)
Suppose we are testing a null hypothesis different from zero. The following S-Plus output gives the conditional distribution of T₁ given T₀ = t₀ obtained under the null hypothesis H₀ : β₁ = 0.5 using the example data introduced in the preceding section. The same pair of canonical sufficient statistics that have been used so far are being used with both canonical and non-canonical link functions. At best, the interpretation one may give to the probabilities in the non-canonical cases would be to think in terms of "approximate sufficiency".
Inspection of the output below shows that for the logit link (the canonical link in this example), the conditional distribution has non-zero (to four decimal places) probability on the support set t₁ ∈ {6, 7, ..., 21}. For the probit and complementary log-log link functions, the support set associated with non-zero probabilities is t₁ ∈ {8, 9, ..., 21}. The probability mass differs across these three link functions in a non-systematic way. However, for this example, the right hand one-tailed probability increased in the probit and complementary log-log regressions (Table 4.2).
Conditional distribution based on logit, probit, and complementary log-log link functions

> fft2d.np(y=c(3,4,6),n=c(10,10,8),x=c(0,1,2),b1=0.5,link=3)
$link:
[1] "Complementary Log-Log"
$cond.pr:
 [1] 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
 [9] 0.0000 0.0002 0.0012 0.0047 0.0150 0.0392 0.0832 0.1428
[17] 0.1953 0.2088 0.1688 0.0976 0.0363 0.0067 0.0000 0.0000
[25] 0.0000 0.0000 0.0000 0.0000 0.0000
The left, right, and two-tail probabilities, along with the marginal probability P(T₀ = t₀), obtained under the three link functions are summarized in Table 4.2. The framework, which emphasizes using explicitly known characteristic functions to obtain unknown distribution functions, would also allow investigation of the effects of covariate discretization. This is helpful in its own right since in many practical cases it is common to use a 'categorized' version of an inherently continuous variable. Age and income are two such examples. See also Baglivo et al. (1996) for an interesting application of generating functions for sensitivity analysis in the context of permutation distributions.
Table 4.2: Sensitivity of exact p-values to link misspecification
Link   P(T₀ = t₀)   Left 1-tail p-value   Right 1-tail p-value   2-tail p-value
4.5 Summary
In this Chapter we presented results of an empirical investigation on the sensitivity of 'exact' tail area probabilities to some departures from ordinary model assumptions. In particular, we discussed the possibility of misclassification error in the context of a binary regression model and showed that tests of hypotheses on the slope parameter away from zero could result in a marked change in the p-values depending upon the extent of the misclassification rate. Although the sufficient statistics, which are simple linear combinations of the response and covariate values, are no longer sufficient when the link function is not canonical, we have attempted to see if they can still be justified as "approximate sufficient statistics" and in what way the conditional distribution used to make inference is affected by using them. Our empirical investigation shows that the exact p-values obtained assuming a logit link are unaffected when the link is changed to either probit or complementary log-log when testing the null hypothesis H₀ : β₁ = 0.
Chapter 5
Alternative approaches
5.1 Introduction
Broadly speaking, a given statistical inference problem may be approached in one of two ways: exactly or approximately. One obvious drawback of methods for exact inference is that the size of the data that can currently be processed efficiently is limited. When the sample size is small to moderate, exact methods are feasible and should be used as much as possible. However, in practice, one may have a situation where the sample size is somewhere in the middle, big enough to cause computational burden for exact methods but small enough to make the traditional central limit theorem based approach perform poorly.
To overcome this problem, several suggestions have been proposed in the literature in recent years. For instance, some sort of "hybrid" algorithm can be used in which an exact algorithm is applied to some part of a given problem while traditional large sample methods are applied to other aspects of the same problem. Monte Carlo approaches are also becoming useful in such problem solving techniques.
5.2 Small sample asymptotics
There are various asymptotic techniques in use in statistics (Barndorff-Nielsen and Cox, 1989). One such technique, which has been proved by many researchers to be very useful in the asymptotic theory of statistics, is based on saddlepoint approximation (Barndorff-Nielsen and Cox, 1979; Reid, 1988).
Approaches based on saddlepoint expansions are sometimes referred to as small-sample asymptotic methods. The name small-sample asymptotics might have been coined to highlight the fact that, in most practical cases people have been working on, these methods were found to provide extremely accurate results for approximating densities and distribution functions of statistics of interest even when the sample size is small. In problems involving continuous data, these methods are well-studied (Reid, 1988).
First introduced into the statistics literature by Daniels (1954), the method of saddlepoint approximation has been applied in a variety of problems arising in parametric analysis (Bedrick and Hill, 1992; Daniels, 1983, 1987). Saddlepoint approximations have also been used in nonparametric analyses, such as in approximating permutation distributions and in resampling methods (Booth and Butler, 1990; Davison and Hinkley, 1988).
A large-deviation expansion can be obtained for a conditional density (distribution) by separate approximation of numerator and denominator via saddlepoint expansion. This is called the double saddlepoint approximation (Barndorff-Nielsen and Cox, 1979). Davison (1988) used the double saddlepoint approximation and carried out conditional inference in generalized linear models with canonical link functions.

The other approach to using saddlepoint methods in conditional inference is using a single saddlepoint to directly approximate the conditional distribution. Skovgaard (1987) and Wang (1993) describe the usefulness of saddlepoint approximation in conditional inference. Skovgaard (1987) gives saddlepoint expansions for conditional probabilities of the form P(Ȳ ≥ y | X̄ = x) where (X̄, Ȳ) is an average of n independent bivariate random vectors. A general version corresponding to conditioning on a (p − 1)-dimensional linear function of a p-dimensional variable has also been shown.
5.3 Large sample results
Suppose inference concerns a parameter vector θ with p elements, and let the hypothesis of interest be

H₀ : θ = θ₀.

Let the vector of efficient scores be denoted by U(θ), with jth element U_j(θ) = ∂ℓ(θ)/∂θ_j. For sufficiently large sample size,

θ̂ ~ MVN_p(θ, I⁻¹(θ)),

where I(θ) denotes the Fisher expected information matrix, with (j, k)th element I_{jk}(θ) = E[−∂²ℓ(θ)/∂θ_j ∂θ_k]. Also, for sufficiently large sample size, U(θ) ~ MVN_p(0, I(θ)).
There are three asymptotically equivalent test statistics that are often used to make statistical inference.

1. Likelihood ratio statistic

X_L = 2[ℓ(θ̂) − ℓ(θ₀)] ~ χ²_p,

where ℓ(θ̂) is the value of the log-likelihood at the maximum likelihood estimate θ̂.

2. Score statistic

X_S = U(θ₀)ᵀ I⁻¹(θ₀) U(θ₀) ~ χ²_p.

An advantage of this test is that it does not require θ̂.

3. Wald test

X_W = (θ̂ − θ₀)ᵀ I(θ₀)(θ̂ − θ₀) ~ χ²_p.

An alternative version of this test replaces I(θ₀) by I(θ̂). For certain hypotheses in logistic regression, the Wald test can behave in an aberrant manner (Hauck and Donner, 1977).
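For a one-parameter illustration not taken from the thesis - y successes in n Bernoulli trials, testing H₀: p = p₀ - the three statistics reduce to closed forms; a Python sketch:

```python
import math

def lr_score_wald(y, n, p0):
    # One-parameter binomial illustration (requires 0 < y < n):
    # likelihood ratio, score, and Wald statistics for H0: p = p0.
    phat = y / n
    loglik = lambda p: y * math.log(p) + (n - y) * math.log(1.0 - p)
    lr = 2.0 * (loglik(phat) - loglik(p0))               # likelihood ratio
    score = (y - n * p0) ** 2 / (n * p0 * (1.0 - p0))    # score: U^2 / I(p0)
    wald = (phat - p0) ** 2 * n / (phat * (1.0 - phat))  # Wald with I(phat)
    return lr, score, wald
```

Note how the score statistic needs only p₀ (no MLE), while the Wald version here plugs the MLE into the information; all three are referred to a χ²₁ distribution and agree asymptotically but can differ noticeably in small samples.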
5.4 Applications to simple logistic regression
Consider the simple logistic regression model logit(pᵢ) = β₀ + β₁xᵢ. An exact conditional test of β₁ = 0 is based on conditional probabilities of the form

P_c = P(T₁ ≥ t₁ | T₀ = t₀),

where T₁ is the sufficient statistic corresponding to β₁, the parameter of interest, and T₀ the sufficient statistic for the nuisance parameter β₀.
5.4.1 The likelihood ratio test
The deviance or likelihood ratio statistic has the form

W = 2[ℓ(β̂₀, β̂₁) − ℓ(β̃₀, 0)],

where ℓ is the log-likelihood, the full model includes both β₀ and β₁, and the reduced model sets β₁ to zero. For large n, the statistic W follows a χ²₁ distribution.

The signed likelihood ratio test statistic w = sign(β̂₁)√W has an approximate standard normal distribution, and P(w < w₀ | β₁ = 0) can be used to approximate the conditional tail probability P_c.
5.4.2 The Wald test
The Wald chi-square statistic in this simple case has the simple form

X_W = β̂₁² / var(β̂₁),

and has a χ²₁ asymptotic distribution. Its signed square root,

z = β̂₁ / se(β̂₁),

has an approximate standard normal distribution. The conditional probability P_c is approximated by P(z < z₀ | β₁ = 0). Hypotheses involving non-zero null values can be formulated and tested similarly.
5.4.3 The double saddlepoint approximation
A double saddlepoint approximation to the conditional tail probability P_c is given by (Davison, 1988)

P_c ≈ 1 − Φ(w) − φ(w)(1/w − 1/t),

where Φ and φ are the standard normal cumulative distribution and density functions, respectively; w is the signed likelihood ratio test statistic; and t is the signed Wald chi-square statistic.
The quantity denoted by ρ measures the ratio of information for the nuisance parameter in the full and reduced models (Platt, 2000); in its expression, i₂₂ refers to the block of the variance-covariance matrix of the β parameters corresponding to β₁.
Some modifications to the general saddlepoint approximations are often made
for random variables distributed on a lattice (Skovgaard, 1987); these are in-
tended to serve the same purpose as the traditional continuity corrections used,
for example, in approximating a binomial variable by a normal. In
the simple logistic regression case, the modification is to replace z in equation (5.5)
with a continuity-corrected version.
The error of the saddlepoint approximation to the exact conditional tail prob-
ability is O(n^{-3/2}), whereas the unconditional approximations based on the Wald or
the likelihood ratio test have error O(n^{-1/2}), where n is the total sample size.
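The practical payoff of the O(n^{-3/2}) rate can be seen in a case where the exact answer is available. The sketch below applies a Lugannani-Rice-type saddlepoint tail approximation with a lattice-corrected u, of the kind discussed by Skovgaard (1987), to a binomial tail; this is our illustrative stand-in for the conditional problem, not the thesis's own computation:

```python
import math

def binom_tail_saddlepoint(n, p, x):
    """Saddlepoint (Lugannani-Rice) approximation to P(X >= x) for
    X ~ Binomial(n, p), with x above the mean n*p, using the
    lattice-corrected u.  CGF: K(t) = n*log(1 - p + p*e^t)."""
    s = math.log(x * (1.0 - p) / (p * (n - x)))     # saddlepoint: K'(s) = x
    K = n * math.log(1.0 - p + p * math.exp(s))
    K2 = x * (n - x) / n                            # K''(s) at the saddlepoint
    w = math.copysign(math.sqrt(2.0 * (s * x - K)), s)
    u = (1.0 - math.exp(-s)) * math.sqrt(K2)        # lattice correction
    Phi = 0.5 * (1.0 + math.erf(w / math.sqrt(2.0)))
    phi = math.exp(-0.5 * w * w) / math.sqrt(2.0 * math.pi)
    return 1.0 - Phi - phi * (1.0 / w - 1.0 / u)
```

Even at n = 30 the approximation is typically within about one percent of the exact tail, consistent with the error rates quoted above.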
In this chapter, we presented inferential methods with varying degrees of deviation
from exact analysis. At one extreme, the traditional first-order methods (e.g., the Wald
and likelihood ratio tests) rely heavily on having a sufficiently large sample and
only provide approximations of O(n^{-1/2}). A better, higher-order approximation can
be carried out by using the so-called small-sample asymptotic methods. These
techniques, based on a saddlepoint expansion, provide approximations of O(n^{-3/2})
and can be suitable for cases in which the computational burden of carrying out
exact inference is prohibitive.
Chapter 6
Summary and discussion
Samples whose effective size is small or moderate arise very commonly in a variety
of practical applications, making small-sample inference one of the major themes of
research in statistics. In such cases, traditional large-sample methods may give less
accurate and often misleading results. Therefore, techniques which can be used to
compute exact p-values should be used as much as possible. Currently there are a
few algorithms and commercial software packages (usually at a peak price!) which address
the problem of small-sample inference. Some of the available canned packages are
optimized to address a specific problem (e.g., only a logistic regression model).
In this thesis we have presented a unified approach taking advantage of the
unique one-to-one correspondence between the distribution function of a random vari-
able and its characteristic function. Exact inference is based on the conditional
distribution and density functions of statistics sufficient for a parameter of interest,
given ancillary quantities.
We show that for a wide class of regression models, the discrete Fourier trans-
form (DFT) can be used along with the fast Fourier transform (FFT) to go from
the characteristic function domain back to the probability distribution domain. In
problems involving discrete random variables with finite support, the inversion of
the characteristic function is done exactly on a properly chosen finite grid of Fourier
sequences. If discrete variables with infinite support are involved, one can still get
exact results by letting the grid size grow sufficiently large. For example, for a
random variable that has a Poisson distribution, the truncation error that may be
caused by using a finite input size decays quickly (exponentially), and exact results are
obtained within a reasonable maximum grid size.
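The inversion step can be illustrated outside S-Plus as well. The sketch below (Python; a plain O(T²) DFT is used so the whole computation is visible — the FFT produces identical numbers faster) recovers Poisson probabilities from the characteristic function on a T-point grid; the aliasing (truncation) error is negligible once T comfortably exceeds the bulk of the support:

```python
import cmath
import math

def poisson_pmf_via_fft(lam, T):
    """Recover Poisson(lam) probabilities on {0, ..., T-1} by evaluating the
    characteristic function exp(lam*(e^{it} - 1)) at the T Fourier
    frequencies t_j = 2*pi*j/T and inverting with a DFT."""
    cf = [cmath.exp(lam * (cmath.exp(1j * 2 * math.pi * j / T) - 1))
          for j in range(T)]
    # inverse DFT: p_k = (1/T) * sum_j cf[j] * e^{-2*pi*i*j*k/T}
    probs = []
    for k in range(T):
        s = sum(cf[j] * cmath.exp(-2j * math.pi * j * k / T) for j in range(T))
        probs.append((s / T).real)
    return probs
```

With λ = 2 and T = 32, the recovered probabilities agree with exp(−λ)λ^k/k! essentially to machine precision, since the mass beyond the grid, P(X ≥ 32), is astronomically small.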
We have demonstrated that tests of hypotheses of interest for typical bioassay-type bi-
nary regression models and a Poisson regression model involving one discrete covari-
ate can easily be conducted within a widely available statistical software package (S-Plus).
The general case of p covariates does not pose any problem in principle, but the storage
and processing time of the high-dimensional grid in the DFT could become a burden.
This is expected to improve with continuing increases in computing power.
We have also highlighted that exact inference can be made for any member of
the class of generalized linear models (GLMs), including continuous response mod-
els such as a model based on an exponential distribution. The FFT approach has
an additional source of error, called sampling error, in cases involving a continuous
outcome variable. This is because the DFT is an approximation of the continu-
ous Fourier transform (the error incurred when a continuous function is numerically
integrated using sums).
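This extra sampling error can be made concrete: below, the Exponential(1) density is recovered by a Riemann-sum inversion of its characteristic function φ(t) = 1/(1 − it) (Python; the grid limits t_max and dt are illustrative choices, not the thesis's). Unlike the discrete case, both truncating the integral and discretizing it contribute error, which shrinks as t_max grows and dt shrinks:

```python
import cmath
import math

def exp_density_via_cf(x, t_max=500.0, dt=0.02):
    """Approximate the Exponential(1) density at x > 0 by numerically
    inverting its characteristic function phi(t) = 1/(1 - i*t):
        f(x) = (1/2*pi) * Integral phi(t) * e^{-i*t*x} dt,
    using a midpoint Riemann sum over [-t_max, t_max]."""
    total = 0.0 + 0.0j
    n = int(2 * t_max / dt)
    for j in range(n):
        t = -t_max + (j + 0.5) * dt
        total += cmath.exp(-1j * t * x) / (1 - 1j * t)
    return (total * dt / (2 * math.pi)).real
```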
In a simple logistic regression model, the presence of misclassification error
and/or misspecification of some features of the model does have a marked influence
on 'exact' p-values obtained using the same conditional argument of conditioning the
sufficient statistic of a parameter of interest on the sufficient statistic corresponding
to a nuisance parameter. This impact is to be expected since the sufficiency based
on a pair of statistics that are linear combinations of the response variable (and, as
a result, the conditioning principle) will break down when we consider a non-
canonical form for a binary regression. However, the formulation that was put in
place using characteristic functions allows us to do sensitivity analysis to get
some insight into what actually happens.
In general, exact approaches provide more reliable p-values than their asymp-
totic counterparts for cells with sparse data, at the expense of less efficiency (both
in terms of time and storage) for cells with sufficient data. This suggests that to
have the best of both worlds one should try using a mixture of exact and asymptotic
methods by using a hybrid approach.
Also, when exact inference is computationally prohibitive or when the sample
size is not large enough for the first-order methods (based on the central limit
theorem) to be valid, the so-called small-sample asymptotic methods, which are based
on saddlepoint approximations, could be useful alternatives. There has also been
rapid development of small-sample approaches based on simulation and resampling
techniques, including the bootstrap and Markov chain Monte Carlo.
Bibliography
[1] Agresti, A. (1992). A survey of exact inference for contingency tables (with discussion). Statistical Science, 7, 131-177.
[2] Andrews, D.F., Bickel, P.J., Hampel, F.R., Huber, P.J., Rogers, W.H. and Tukey, J.W. (1972). Robust Estimates of Location. Princeton University Press, NJ.
[3] Baglivio, J., Pagano, M. and Spino, C. (1996). Permutation distributions via generating functions with applications to sensitivity analysis of discrete data. J. Amer. Statist. Ass., 91, 1037-1046.
[4] Baker, R.J. (1977). Algorithm AS 112. Exact distributions derived from two-way tables. Appl. Statist., 26, 199-206.
[5] Barndorff-Nielsen, O.E. and Cox, D.R. (1979). Edgeworth and saddlepoint approximations with statistical applications (with discussion). J. Roy. Statist. Soc. B, 41, 279-312.
[6] Barndorff-Nielsen, O.E. and Cox, D.R. (1989). Asymptotic Techniques for Use in Statistics. Chapman and Hall, London.
[7] Barndorff-Nielsen, O.E. and Cox, D.R. (1994). Inference and Asymptotics. Chapman and Hall, London.
[8] Bedrick, E.J. and Hill, J.R. (1992). An empirical assessment of saddlepoint approximations for testing a logistic regression parameter. Biometrics, 48, 529-544.
[9] Booth, J.G. and Butler, R.W. (1990). Randomization distributions and saddlepoint approximations in generalized linear models. Biometrika, 77, 787-796.
[10] Bracewell, R.N. (1986). The Fourier Transform and its Applications (2nd ed.). McGraw-Hill, New York.
[11] Brigham, E.O. (1988). The Fast Fourier Transform and its Applications. Prentice-Hall, Englewood Cliffs, NJ.
[12] Collett, D. (1991). Modelling Binary Data. Chapman and Hall, London.
[13] Cooley, J.W. and Tukey, J.W. (1965). An algorithm for the machine calculation of complex Fourier series. Mathematics of Computation, 19, 297-301.
[14] Copas, J.B. (1988). Binary regression models for contaminated data. Journal of the Royal Statistical Society B, 50, 225-265.
[15] Cox, D.R. (1970). Analysis of Binary Data. Methuen, London.
[16] Daniels, H.E. (1983). Saddlepoint approximations for estimating equations. Biometrika, 70, 89-96.
[17] Daniels, H.E. (1987). Tail probability approximations. Int. Statist. Rev., 54, 37-48.
[18] Davison, A.C. (1988). Approximate conditional inference in generalized linear models. Journal of the Royal Statistical Society B, 50, 445-461.
[19] Davison, A.C. and Hinkley, D.V. (1988). Saddlepoint approximations in resampling methods. Biometrika, 75, 417-431.
[20] Finney, D.J. (1964). Statistical Method in Biological Assay. Charles Griffin, London.
[21] Fisher, R.A. (1925). Statistical Methods for Research Workers. Oliver and Boyd, Edinburgh.
[22] Hauck, W.W., Jr. and Donner, A. (1977). Wald's test as applied to hypotheses in logit analysis. J. Amer. Statist. Ass., 72, 851-853.
[23] Hirji, K.F., Mehta, C.R. and Patel, N.R. (1987). Computing distributions for exact logistic regression. J. Amer. Statist. Ass., 82, 1110-1117.
[24] Hirji, K.F., Vollset, S.E., Reis, I.M. and Afifi, A.A. (1996). Exact tests for interaction in several 2 x 2 tables. J. Computational and Graphical Stat., 5, 209-224.
[25] Horowitz and Manski (1997). In Handbook of Statistics: Robust Inference, edited by Maddala, G.S. and Rao, C.R. North-Holland, New York.
[26] Knight, K. (2000). Mathematical Statistics. Chapman and Hall, London.
[27] Lehmann, E.L. (1983). Theory of Point Estimation. Wiley, New York.
[28] Lindsey, J.K. (1996). Parametric Statistical Inference. Clarendon Press, Oxford.
[29] LogXact-Turbo (1993). LogXact-Turbo: Logistic Regression Software Featuring Exact Methods, Version 1.01. Cytel Software, Cambridge, Massachusetts.
[30] March, D.L. (1972). Exact probabilities for R x C contingency tables. Communications of the Association for Computing Machinery, 15, 991-992.
[31] McCullagh, P. and Nelder, J.A. (1989). Generalized Linear Models (2nd ed.). Chapman and Hall, London.
[32] Mehta, C.R. and Patel, N.R. (1980). A network algorithm for the exact treatment of the 2 x k contingency table. Commun. Statist. Simul. Comput., 9, 649-664.
[33] Mehta, C.R. and Patel, N.R. (1983). A network algorithm for performing Fisher's exact test in r x c contingency tables. J. Amer. Statist. Ass., 78, 427-434.
[34] Nelder, J.A. and Wedderburn, R.W.M. (1972). Generalized linear models. J. R. Statist. Soc. A, 135, 370-384.
[35] Pace, L. and Salvan, A. (1992). A note on conditional cumulants in canonical exponential families. Scandinavian Journal of Statistics, 19, 185-191.
[36] Pagano, M. and Tritchler, D. (1983). Permutation distributions in polynomial time. J. Amer. Statist. Ass., 78, 435-440.
[37] Pierce, D.A. and Peters, D. (1992). Practical use of higher order asymptotics for multiparameter exponential families (with discussion). Journal of the Royal Statistical Society B, 54, 701-737.
[38] Platt, R.W. (2000). Saddlepoint approximations for small sample logistic regression problems. Stat. in Med., 19, 323-334.
[39] Press, W.H., Teukolsky, S.A., Vetterling, W.T. and Flannery, B.P. (1992). Numerical Recipes in C. Cambridge University Press.
[40] Reid, N. (1988). Saddlepoint methods and statistical inference (with discussion). Statist. Sci., 3, 213-238.
[41] Reid, N. (1995). The roles of conditioning in inference. Statistical Science, 10, 138-157.
[42] Skovgaard, I.M. (1987). Saddlepoint expansions for conditional distributions. J. Appl. Prob., 24, 875-887.
[43] S-PLUS (1999). S-PLUS 2000 for Windows, Professional Release 1. MathSoft, Inc., Seattle, Washington.
[44] StatXact-Turbo (1992). StatXact-Turbo: Statistical Software for Exact Nonparametric Inference, Version 2.11. Cytel Software, Cambridge, Massachusetts.
[45] Tritchler, D. (1984). An algorithm for exact logistic regression. J. Amer. Statist. Ass., 79, 709-711.
[46] Venables, W.N. and Ripley, B.D. (1997). Modern Applied Statistics with S-Plus. Springer, New York.
[47] Verbeek, A. and Kroonenberg, P.M. (1985). A survey of algorithms for exact distributions of test statistics in r x c contingency tables with fixed margins. Comp. Statist. Data Analysis, 3, 159-185.
[48] Vollset, S.E., Hirji, K.F. and Elashoff, R.M. (1991). Fast computation of exact confidence limits for the common odds ratio in a series of 2 x 2 tables. J. Amer. Statist. Ass., 86, 404-409.
[49] Wang, S. (1993). Saddlepoint approximations in conditional inference. J. Appl. Prob., 30, 397-404.
[50] Weerahandi, S. (1995). Exact Statistical Methods for Data Analysis. Springer, New York.
Appendix A
FFT program for Beta-binomial
# Purpose: computing probabilities from characteristic function
# via FFT
betabin <- function(n, mu)
{
  T0 <- n + 1
  t0 <- rep(0, T0)
  for(j in (1:T0))
    {t0[j] <- 2*pi*(j-1)/T0}
  ## evaluate the characteristic function at t0
  Ms <- rep(0, T0)
  Ms <- (1 + mu*(exp(t0*(1i)) - 1))^n
  ## get the probability matrix
  prob <- round((Re(fft(Ms)))/T0, digits=20)
  supp <- seq(0, n)
  sum.p <- sum(prob)                    # consistency check
  mean.y <- sum(supp*prob)
  distn <- cbind(supp, prob)
  zero.vec <- rep(0, T0)
  bin <- dbinom(supp, n, mu)
  diff <- prob - bin
  plot(supp, prob)
  for(i in 1:T0) segments(supp[i], zero.vec[i], supp[i], prob[i])
  par(new=T)
  plot(supp, bin, type="l", xlab=" ", ylab=" ")
}
Appendix B
FFT program for a weighted sum
of random variables
# Purpose: FFT for a weighted sum
grid <- function(T0, k) {
  j <- seq(1, T0)
  t0 <- k*2*pi*(j-1)/T0
  return(list(t0=t0))
}
## evaluate the characteristic function at t0
cf <- function(T0, k)
{ Ms <- rep(0, T0)
  x <- seq(1, 6)
  t0 <- grid(T0, k)$t0
  for(i in 1:T0){
    Ms[i] <- sum((x/21)*exp(t0[i]*(1i)*x))
  }
  return(Ms)
}
## get the probability matrix
pmf <- function(T0, Ms)
{ prob <- round((Re(fft(Ms)))/T0, digits=17)
  supp <- seq(0, T0-1)
  distn <- cbind(supp, prob)
  return(prob)
}
Appendix C
FFT program for binary regression
# Purpose: computing probabilities from characteristic function
# via FFT
# (function header and grid set-up were illegible in the original scan;
#  reconstructed here following the pattern of Appendix D)
binreg2d <- function(b0=0, b1=0, x, n, y, link=1, eps=0)
{
  T0 <- sum(n) + 1              # may need to adjust the length of the grid
  T1 <- max(T0, sum(n*x) + 1)   # may need to adjust the length of the grid
  t0.o <- sum(y)
  t1.o <- sum(x*y)
  t0 <- 2*pi*(seq(1, T0) - 1)/T0
  t1 <- 2*pi*(seq(1, T1) - 1)/T1
  ## evaluate the characteristic function at t0 and t1
  cf <- function(t, p, n) {((1 - p + p*exp((t)*(1i)))^n)}
  M1 <- array(t0, c(T0, T1))
  M2 <- t(array(t1, c(T1, T0)))
  Ms <- 1
  eta <- rep(0, length(n))      # initialize the linear predictor
  p <- rep(0, length(n))
  for(i in 1:length(n))
  {
    eta[i] <- b0 + b1*x[i]
    if(link==1) {p[i] <- exp(eta[i])/(1 + exp(eta[i]))}   # logit
    else if(link==2) {p[i] <- pnorm(eta[i])}              # probit
    else {p[i] <- 1 - exp(-exp(eta[i]))}                  # comp. log-log
    p[i] <- (1-eps)*p[i] + eps*(1-p[i])                   # contamination
    Ms <- (Ms*cf((M1 + x[i]*M2), p[i], n[i]))
  }
  ## get the probability matrix
  prob <- Re(fft(Ms)/(T0*T1))
  sum.p <- sum(prob)            # consistency check
  pr <- prob[t0.o+1, ]
  sum.pr <- sum(pr)             # used to monitor this marginal probability
  cond.pr <- pr/sum.pr
  pval.rhs <- 1 - sum(cond.pr[1:t1.o])
  pval.lhs <- sum(cond.pr[1:(t1.o+1)])
  pval.two <- min(2*min(pval.rhs, pval.lhs), 1)
  return(t0.o, t1.o, pval.lhs, pval.rhs, pval.two)
  #return(cond.pr)              # activating this line will generate the
                                # entire conditional distribution
}
Appendix D
FFT program for Poisson
regression
# Purpose: computing probabilities from characteristic function
# via FFT
pois1d <- function(b0=0, b1=0, x, y)
{ n <- length(x)
  T0 <- sum(y) + 1             # may need to adjust the length of the grid
  T1 <- max(T0, sum(x*y) + 1)  # may need to adjust the length of the grid
  t0.o <- sum(y)
  t1.o <- sum(x*y)
  t0 <- rep(0, T0)
  t1 <- rep(0, T1)
  for(j in (1:T0))
    {t0[j] <- 2*pi*(j-1)/T0}
  for(k in 1:T1)
    {t1[k] <- 2*pi*(k-1)/T1}
  ## evaluate the characteristic function at t0 and t1
  ## (this block was illegible in the original; reconstructed from Appendix E)
  M1 <- array(t0, c(T0, T1))
  M2 <- t(array(t1, c(T1, T0)))
  Ms <- 0
  for(i in 1:n)
  {
    lambda.i <- exp(b0 + b1*x[i])
    Ms <- Ms + lambda.i*(exp((M1 + x[i]*M2)*(1i)) - 1)
  }
  Ms <- exp(Ms)
  ## get the probability matrix
  prob <- Re(fft(Ms)/(T0*T1))
  sum.p <- sum(prob)           # consistency check
  pr <- prob[t0.o+1, ]
  sum.pr <- sum(pr)            # vital to monitor this marginal probability
  cond.pr <- pr/sum.pr
  pval.rhs <- 1 - sum(cond.pr[1:t1.o])
  pval.lhs <- sum(cond.pr[1:(t1.o+1)])
  pval.two <- min(2*min(pval.rhs, pval.lhs), 1)
  return(t0.o, t1.o, pval.lhs, pval.rhs, pval.two)
  #return(cond.pr)             # activating this line will generate the
                               # entire conditional distribution
}
Appendix E
FFT program for zero-truncated
Poisson regression model
# Purpose: computing probabilities from characteristic function
# via FFT for a truncated Poisson regression
Tpois2d <- function(x, y, b0=log(mean(y)), b1=0)
{ n <- length(x)
  T0 <- sum(y) + 1             # may need to adjust the length of the grid
  T1 <- max(T0, sum(x*y) + 1)  # may need to adjust the length of the grid
  t0.o <- sum(y)
  t1.o <- sum(x*y)
  t0 <- rep(0, T0)
  t1 <- rep(0, T1)
  for(j in (1:T0))
    {t0[j] <- 2*pi*(j-1)/T0}
  for(k in 1:T1)
    {t1[k] <- 2*pi*(k-1)/T1}
  ## evaluate the characteristic function at t0 and t1
  M1 <- array(t0, c(T0, T1))
  M2 <- t(array(t1, c(T1, T0)))
  Ms <- 0
  for(i in 1:n)
  {
    lambda.i <- exp(b0 + b1*x[i])
    Ms <- (Ms + log((exp(lambda.i*exp((M1 + x[i]*M2)*(1i))) - 1)/(exp(lambda.i) - 1)))
  }
  Ms <- exp(Ms)
  ## get the probability matrix
  prob <- round(Re(fft(Ms)/(T0*T1)), digits=17)
  sum.p <- sum(prob)           # consistency check
  ## p-value computation as in Appendix D (end of listing reconstructed)
  pr <- prob[t0.o+1, ]
  sum.pr <- sum(pr)
  cond.pr <- pr/sum.pr
  pval.rhs <- 1 - sum(cond.pr[1:t1.o])
  pval.lhs <- sum(cond.pr[1:(t1.o+1)])
  pval.two <- min(2*min(pval.rhs, pval.lhs), 1)
  return(t0.o, t1.o, pval.lhs, pval.rhs, pval.two)
}
Appendix F
FFT program for error analysis in
geometric distribution
## evaluate the characteristic function at t0
cf.geom <- function(T0, p)
{
  Ms <- rep(0, T0)
  t0 <- grid.geom(T0)$t0
  for(i in 1:T0){
    Ms[i] <- (1-p)/(1 - p*exp(t0[i]*(1i)))
  }
  return(Ms)
}
## get the probability matrix
geom.fft <- function(T0, Ms)
{
  prob <- round((Re(fft(Ms)))/T0, digits=20)
  supp <- seq(0, T0-1)
  distn <- cbind(supp, prob)
  return(prob)
}