USES OF THE FAST FOURIER TRANSFORM (FFT)
IN EXACT STATISTICAL INFERENCE
Joseph Beyene
A thesis submitted in conformity with the requirements for the Degree of Doctor of Philosophy
Graduate Department of Community Health, Department of Public Health Sciences
University of Toronto
© Copyright by Joseph Beyene 2001
National Library of Canada / Bibliothèque nationale du Canada
Acquisitions and Bibliographic Services / Acquisitions et services bibliographiques
The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.
The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
Uses of the Fast Fourier Transform (FFT)
in Exact Statistical Inference
Joseph Beyene (Ph.D. 2001)
Graduate Department of Community Health
Department of Public Health Sciences
University of Toronto
Abstract
We present a unified characteristic function-based framework to compute exact statistical inference. The methodology is implemented using the fast Fourier transform (FFT) algorithm. Exact p-values for hypotheses of interest are obtained for generalized linear models (GLMs) commonly used in medical and other applied sciences. Examples are shown to illustrate the ease with which the FFT is used to recover exact probabilities from any known characteristic function.

The framework we developed allowed us to incorporate models based on non-standard underlying error distributions such as the zero-truncated binomial and Poisson distributions. We also have used the methodology to investigate the sensitivity of exact significance levels to misclassification errors and other model misspecifications. Potential sources of errors in using the FFT are discussed.
Acknowledgements
First and foremost, I would like to thank my supervisor, Professor David Andrews, for his encouragement and guidance throughout my studies. I have benefited greatly from his extraordinary talent and intuition about the field of statistics. I am indebted to Professor Paul Corey, who has served on my supervising committee, for providing me with all-round support over the many years I have known him. I would also like to acknowledge the assistance and valuable comments from other members of my supervising committee, Dr. David Tritchler and Dr. Michael Escobar, and my external examiner Professor Marcello Pagano. It is a great pleasure to acknowledge the support and encouragement I received from Drs. Shelley Bull, Mary Corey, Gerarda Darlington and David Tritchler. Danny Lopez and Vartouhi Jazmaji were always there for me when I needed their help.

I would like to thank the Department of Public Health Sciences, University of Toronto, for financial support. I am indebted to my parents, who have taught me the importance of education and believed in me all along. I am grateful to my in-laws for all their support. Last, but not least, I would like to thank my wife, Shafagh Fallah, and my children, Martha and Daniel, for their patience and support.
Contents
List of Tables
List of Figures
1 Introduction 1
1.1 Objective and scope of thesis . . . . . . . . . . . . . . . 1
1.2 Historical notes . . . . . . . . . . . . . . . 4
1.3 Literature review . . . . . . . . . . . . . . . 5
1.4 Summary . . . . . . . . . . . . . . . 7

2 The Fourier transform and its applications 8
2.1 Introduction . . . . . . . . . . . . . . . 8
2.2 The Fourier transform . . . . . . . . . . . . . . . 9
2.2.1 The discrete Fourier transform . . . . . . . . . . . . . . . 10
2.2.2 The fast Fourier transform (FFT) . . . . . . . . . . . . . . . 10
2.2.3 The inverse Fourier transform . . . . . . . . . . . . . . . 11
2.3 From characteristic functions to probability functions . . . . . . . . . . . . . . . 13
2.4 Illustrative examples . . . . . . . . . . . . . . . 15
2.4.1 Simple discrete random variable . . . . . . . . . . . . . . . 15
2.4.2 Generalization of Bernoulli trials . . . . . . . . . . . . . . . 19
2.4.3 Weighted sum of discrete random variables . . . . . . . . . . . . . . . 23
2.5 Summary . . . . . . . . . . . . . . . 30

3 Exact inference in generalized linear models 32
3.1 Introduction . . . . . . . . . . . . . . . 32
3.2 Model specification . . . . . . . . . . . . . . . 34
3.2.1 Some examples of GLMs . . . . . . . . . . . . . . . 35
3.3 Inference in a logistic regression model . . . . . . . . . . . . . . . 37
3.3.1 Comparison of two binomial proportions . . . . . . . . . . . . . . . 41
3.3.2 Dose-response experiments . . . . . . . . . . . . . . . 43
3.4 The Poisson regression model . . . . . . . . . . . . . . . 46
3.4.1 Testing H0 : β1 = 0 in a simple Poisson regression model . . . . . . . . . . . . . . . 47
3.4.2 Comparison of two Poisson rate parameters . . . . . . . . . . . . . . . 51
3.5 The general exponential family . . . . . . . . . . . . . . . 54
3.5.1 Joint and conditional distributions of sufficient statistics . . . . . . . . . . . . . . . 55
3.5.2 Characteristic function for members of the exponential family . . . . . . . . . . . . . . . 57
3.6 Extensions to truncated distributions . . . . . . . . . . . . . . . 59
3.6.1 Truncated Poisson distribution, Pt(λ) . . . . . . . . . . . . . . . 60
3.6.2 Numerical example . . . . . . . . . . . . . . . 63
3.6.3 Truncated binomial distribution, Bt(n, p) . . . . . . . . . . . . . . . 67
3.7 Analysis of error bounds . . . . . . . . . . . . . . . 68
3.7.1 Sources of errors . . . . . . . . . . . . . . . 68
3.7.2 Error in the geometric distribution . . . . . . . . . . . . . . . 71
3.7.3 Error in the Poisson distribution . . . . . . . . . . . . . . . 75
3.8 Summary . . . . . . . . . . . . . . . 76

4 Sensitivity analysis 79
4.1 Introduction . . . . . . . . . . . . . . . 79
4.2 Robustness . . . . . . . . . . . . . . . 80
4.3 Misclassification errors . . . . . . . . . . . . . . . 81
4.3.1 Numerical example . . . . . . . . . . . . . . . 83
4.4 Misspecification of link function . . . . . . . . . . . . . . . 85
4.4.1 Testing for H0 : β1 = 0 . . . . . . . . . . . . . . . 87
4.4.2 Testing for H0 : β1 = c (c ≠ 0) . . . . . . . . . . . . . . . 88
4.5 Summary . . . . . . . . . . . . . . . 91

5 Alternative approaches 93
5.1 Introduction . . . . . . . . . . . . . . . 93
5.2 Small sample asymptotics . . . . . . . . . . . . . . . 94
5.3 Large sample results . . . . . . . . . . . . . . . 96
5.4 Applications to simple logistic regression . . . . . . . . . . . . . . . 97
5.4.1 The likelihood ratio test . . . . . . . . . . . . . . . 98
5.4.2 The Wald test . . . . . . . . . . . . . . . 98
5.4.3 The double saddlepoint approximation . . . . . . . . . . . . . . . 99
5.5 Summary . . . . . . . . . . . . . . . 100

6 Summary and discussion 101

A FFT program for Beta-binomial 112
B FFT program for a weighted sum of random variables 114
C FFT program for binary regression 116
D FFT program for Poisson regression 119
E FFT program for zero-truncated Poisson regression model 122
F FFT program for error analysis in geometric distribution 125
List of Tables
2.1 A simple numerical example of a 4-point DFT . . . . . . . . . . . . . . . 18
3.1 Data for the comparison of two binomial proportions . . . . . . . . . . . . . . . 42
3.2 A dose-response example . . . . . . . . . . . . . . . 43
3.3 Effect of insulin on mice at different dose concentrations (Source: Finney, 1964) . . . . . . . . . . . . . . . 45
3.4 Data for a Poisson regression model . . . . . . . . . . . . . . . 47
3.5 Exact p-values in a Poisson regression model . . . . . . . . . . . . . . . 48
4.1 Sensitivity of exact p-values to misclassification error . . . . . . . . . . . . . . . 85
4.2 Sensitivity of exact p-values to link mis-specification . . . . . . . . . . . . . . . 91
List of Figures
2.1 Plot of binomial probabilities obtained using FFT as well as the dbinom function in S-Plus . . . . . . . . . . . . . . . 24
2.2 Probability mass function (pmf) of a weighted sum S_n = \sum_{k=1}^{n} kX_k, for n = 5, obtained using FFT . . . . . . . . . . . . . . . 31
3.1 Conditional probability mass function (pmf) of T1 | T0 = t0 for the Poisson regression model example obtained using FFT . . . . . . . . . . . . . . . 49
3.2 Conditional probability mass function (pmf) of T1 | T0 = t0 for the zero-truncated Poisson regression model example obtained using FFT . . . . . . . . . . . . . . . 66
3.3 Relative error in percentages of the FFT for calculating P(X = 0) for different input sizes N and parameter values λ = 3, λ = 5, and λ = 7 . . . . . . . . . . . . . . . 77
Chapter 1
Introduction
1.1 Objective and scope of thesis
Traditionally, statistical inference has relied heavily on large-sample approximations for sampling distributions of parameter estimators and test statistics. In particular, results developed for continuous data are used in situations where the underlying distributions are discrete, even when such approximations perform poorly with typical sample sizes. This could be of great concern in that incorrect and deceiving conclusions can be reached in studies where each sample point is so vital and expensive, as in biomedical research (Weerahandi, 1995).
Currently, exact inference is often carried out using specialized software restricted typically to specific models. The main purpose of this thesis is to explore the potential of the fast Fourier transform (FFT) for obtaining exact answers for some practical statistical inference problems covering a wider class of models. Strengths and limitations of the technique are studied.
We show that for common important statistical inference problems of small to moderate size for which a characteristic function is known explicitly, the FFT is a viable tool that allows recovery of exact probabilities. We developed methodology and implemented it with computer programmes that can be used within existing and widely available statistical software. Examples in this thesis typically took less than 5 seconds on a personal computer (Pentium III, 64 MB RAM) running S-Plus 2000 under the Windows operating system.
We considered a unified characteristic function based approach for exact inference in the class of generalized linear models and extended exact inference to distributions that can be expressed as weighted exponential family members. We show that a range of problems can be handled using a characteristic function based framework which may not easily be incorporated into existing branch-and-bound based approaches.
The organization of the thesis is as follows. The second section of this introductory chapter contains brief historical highlights of the evolution of exact methods in statistical inference. Section 3 presents a brief overview of the literature on existing methods for carrying out "exact" statistical inference.
Chapter 2 introduces the discrete Fourier transform and the fast Fourier transform algorithm. Simple examples are worked out in detail to illustrate the usefulness of the FFT in recovering probabilities from a known characteristic function of a given random variable.
In Chapter 3, the FFT is used to compute exact p-values for parameters of interest in two common generalized linear models (GLMs) - the logistic and the Poisson regression models - by generating the conditional distribution of a set of suitable sufficient statistics for the models. In particular, exact p-values are obtained for a hypothesis of the slope parameter treating the intercept as a nuisance parameter. It is shown that the characteristic functions in such models are easy to work with and are available in closed form. This characteristic function based approach allowed, among other things, extensions to models with truncated binomial or truncated Poisson error distributions. This chapter also examines different potential sources of errors in FFT based methods and demonstrates, with specific examples, how a truncation error can be controlled.
Chapter 4 explores the sensitivity of the exact results to some deviations from standard model specifications. The effect of contamination is investigated.

In Chapter 5 we present a brief account of alternative approaches to exact methods. Finally, a summary and general discussion is provided in Chapter 6. A suite of S-Plus programmes was written to implement the examples presented in Chapters 2 to 4. These programmes are provided in the Appendices.
1.2 Historical notes
The need for methods and techniques for dealing with 'small' sample problems has long been recognized. One such recognition is due to Student¹ (1908). He introduced the t-distribution (commonly known as Student's t) for continuous data, realizing that routine use of methods based on a 'large' sample assumption is inappropriate when data are limited. His contribution did not go uncriticized by contemporary advocates of large sample theory (Weerahandi, 1995). In the end, however, the t-distribution and hence the t-test not only survived the critiques, but also played a dominant role in statistical inference.
The other remarkable technique developed in recognition of small sample problems was Fisher's exact test for 2x2 contingency tables. Sir R.A. Fisher stated (Fisher, 1925):

... the traditional machinery of statistical processes is wholly unsuited to the needs of practical research. Not only does it take a cannon to shoot a sparrow, but it misses the sparrow! The elaborate mechanism built on the theory of infinitely large samples is not accurate enough for simple laboratory data. Only by systematically tackling small sample problems on their merits does it seem possible to apply accurate tests to practical data.

¹Student is the pseudonym of William Sealy Gosset, who did not use his real name due to a policy by his employer, the brewery Arthur Guinness Sons and Co., against work done for the firm being made public.
Fisher noted that the technique he proposed was computationally intensive, making it impractical for routine use at that time. That was true when there were no high-speed computers for the laborious computations required by the exact test. However, the advent of faster machines over the last couple of decades, coupled with the development of suitable algorithms, has led to renewed interest in pursuing research on exact methods.
1.3 Literature review
Over the last few decades there has been a dramatic surge in the number of published articles that show different ways of implementing exact methods for a variety of applications (March, 1972; Baker, 1977; Mehta and Patel, 1980, 1983; Hirji et al., 1987; Pagano and Tritchler, 1983; Tritchler, 1984). The different approaches that appeared in the literature fall in one of the following three categories:

- exhaustive enumeration (e.g. March, 1972)
- graph-theory based network algorithm (e.g. Hirji et al., 1987)
- recurrence relations and Fourier transform (e.g. Tritchler, 1984).
Two comprehensive survey papers on exact methods and available algorithms for computing exact p-values are due to Agresti (1992) and Verbeek and Kroonenberg (1985). A recent book titled 'Exact Statistical Methods for Data Analysis' (Weerahandi, 1995) shows various useful methods for exact inference with an emphasis on normal theory methods such as ANOVA and regression.
Weerahandi (1995) notes that the methods described in his book are exact in the sense that the tests and confidence intervals are based on exact probability statements rather than on asymptotic approximations. Inferences based on this approach can be made with any desired accuracy, provided that the assumed parametric model and/or other assumptions are correct.
Exact methods are also widely used in nonparametric settings. These methods provide exact p-values instead of approximate fixed-level tests. Most of the exact nonparametric techniques are based on the idea of conditional inference first introduced by Fisher (1925). The basic principle behind this approach is to eliminate nuisance parameters from the inference problem by conditioning on certain functions of the observable random variables.

In most cases, sufficient statistics are used as conditioning functions. Exact p-values are obtained as conditional probabilities based on extreme regions, and they serve as a measure of how well the data support or discredit the underlying hypothesis.
1.4 Summary

In some disciplines in which small samples are the norm rather than the exception, the traditional large-sample based methods of inference may not be valid. Statistical consultants are familiar with the emphasis most subject-matter researchers put on significance levels (p-values < 0.05!) to the extent that the dissemination of the findings might depend on them. While this practice is dangerous and not something statisticians endorse without question, it is useful however to provide significance values as accurately as possible. This thesis explores the practical uses of one method of accomplishing this task, an approach based on a Fourier transform method and implemented using the fast Fourier transform algorithm, in which exact probabilities are recovered from a known characteristic function.
Chapter 2
The Fourier transform and its
applications
2.1 Introduction
This chapter introduces the theory and applications of the Fourier transform (FT) and establishes its connection with a generating function that is familiar to statisticians, namely, the characteristic function of a random variable or vector. A popular algorithm known as the fast Fourier transform (FFT) is introduced, and simple examples are shown to illustrate the transformation from one domain to another, in this case, from the characteristic function domain to the domain of the probability mass function.
2.2 The Fourier transform
A number of textbooks have been written focusing on different aspects of the Fourier transform (see, for example, Bracewell, 1986). Brigham (1988) provides an integrated account of both the theory as well as the various applications of the fast Fourier transform.

The Fourier transform decomposes or separates a function or a waveform into sinusoids of different frequencies that sum to the original waveform. It identifies the different frequency sinusoids and their respective amplitudes (Brigham, 1988). It is simply a frequency-domain representation (frequency content) of a function and contains exactly the same information as the original waveform or signal. Fourier analysis, therefore, allows one to examine a given function from a different point of view.
The Fourier transform in one dimension is defined as

F(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-i\omega t}\, dt, \qquad (2.1)

where f(t) is a function to be decomposed into a sum of sinusoids. The argument t is traditionally used to represent the variation of a quantity over time (e.g., physiological signals). In general, a Fourier transform of a function is a complex quantity:

F(\omega) = R(\omega) + iS(\omega) = |F(\omega)|\, e^{i\theta(\omega)}, \qquad (2.2)

where R(\omega) is the real part, S(\omega) is the imaginary part, |F(\omega)| is the amplitude, and \theta(\omega) = \tan^{-1}[S(\omega)/R(\omega)] is the phase angle of the Fourier transform.
2.2.1 The discrete Fourier transform
The most important motivation for the development of the discrete Fourier transform (DFT) was the need for numerical computation of transformations using digital computers. Numerical integration of equation (2.1) yields the DFT, formally defined by

F(\omega_k) = \sum_{j=0}^{N-1} f(t_j)\, e^{-i\omega_k t_j}, \qquad k = 0, 1, \ldots, N-1. \qquad (2.3)

For problems that do not yield to a closed-form Fourier transform solution, the discrete Fourier transform offers a potential method of attack. The challenge in applying the DFT to practical problems was that direct computation required excessive machine time for large N, a time complexity proportional to N^2. Therefore a technique to reduce the computing time of the discrete Fourier transform was a necessity if the method was to be of practical importance.
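As an illustrative sketch (not from the thesis), the direct evaluation of equation (2.3) at the Fourier frequencies ω_k = 2πk/N can be written in a few lines; the implicit double loop over j and k is what makes the cost proportional to N²:

```python
import numpy as np

def dft_direct(f):
    """Direct O(N^2) evaluation of the DFT in equation (2.3),
    taking t_j = j and omega_k = 2*pi*k/N."""
    f = np.asarray(f, dtype=complex)
    N = len(f)
    j = np.arange(N)
    # N x N matrix of kernel values e^{-i omega_k t_j}
    kernel = np.exp(-2j * np.pi * np.outer(j, j) / N)
    return kernel @ f
```

Library routines such as NumPy's `np.fft.fft` use the same e^{-i...} sign convention, so the direct computation and the library FFT agree exactly on the same input.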
2.2.2 The fast Fourier transform (FFT)
In 1965, Cooley and Tukey formaiized the fast Fourier transform (FFT), an dg*
r i t h that reduces computing time of the DFT to a tirne proportional to NIog, N,
N being the number of input data points. For Iâfger N, the improvement in efiiciency
gained by using the FFT algorithm over the direct computation is remarhble.
Similarly, in a twudimeonal problem with dimensions N and hi, the total
nuniber of computations is proportional tu NM log, NM. The FFT is considerd
as a fundamental problem-solving tool in the educational, industrial, and military
sectors (Bracewell, 1986). It is ubiquitous and its widespread usage is evidenced
by the wide variety of apparently unrelated application areas including biomedical
engineering, imaghg, analysis of stock market data, and speech signal-processing.
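The following is a minimal radix-2 sketch of the Cooley-Tukey idea (my illustration of the recursion, not the thesis's implementation and not an optimized FFT): a length-N DFT is split into two length-N/2 DFTs of the even- and odd-indexed samples, which is what yields the N log₂ N cost.

```python
import numpy as np

def fft_radix2(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of 2."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    if N == 1:
        return x
    even = fft_radix2(x[0::2])   # DFT of even-indexed samples
    odd = fft_radix2(x[1::2])    # DFT of odd-indexed samples
    # Twiddle factors combine the two half-size transforms
    w = np.exp(-2j * np.pi * np.arange(N // 2) / N)
    return np.concatenate([even + w * odd, even - w * odd])
```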
2.2.3 The inverse Fourier transform
The inverse Fourier transform (IFT) associated with the Fourier transform given in equation (2.1) is defined by

f(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} F(\omega)\, e^{i\omega t}\, d\omega. \qquad (2.4)

Given any bounded Nth-order sequence f(k) (a finite sequence of N terms), the DFT pair is defined as

F(m) = \sum_{k=0}^{N-1} f(k)\, e^{-i2\pi mk/N}, \qquad f(k) = \frac{1}{N} \sum_{m=0}^{N-1} F(m)\, e^{i2\pi mk/N}.

The first equation is called the (direct) DFT, and the second the inverse DFT (IDFT). Generally, both sequences {f(k)} and {F(m)} will be complex, i.e., of the form z = x + iy with real part x and imaginary part y.
Operations performed in one domain have corresponding operations in the other. For instance, the convolution operation in the time domain becomes a multiplication operation in the frequency domain, that is,

f(t) * g(t) \longleftrightarrow F(\omega)\, G(\omega);

in other words, a Fourier transform maps a convolution in the time domain to multiplication in the transform domain.

These facts allow us to move between domains so that operations can be performed where they are easiest or most advantageous. The discrete-time transforms are used to analyze and to process discrete-time signals. Discrete-time signals either exist in their own right - such as daily closing stock market prices - or, more commonly, are obtained by sampling continuous-time systems.

The Fourier series and the discrete-time Fourier transforms (DTFTs) are actually inverses of each other. The sampling theorem shows how densely to sample continuous-time signals so that there is no loss of information.
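To make the convolution property concrete, here is a small sketch (my illustration, not the thesis's code): the pmf of the sum of two independent fair dice is the convolution of their pmfs, obtained by multiplying transforms and inverting. The sequences are zero-padded so that the circular convolution computed by the DFT coincides with the ordinary (linear) one.

```python
import numpy as np

# pmf of one fair die, stored on the support {0, 1, ..., 6}
die = np.array([0.0] + [1/6] * 6)

# Pad to N >= 13 points so circular convolution equals linear convolution
N = 16
F = np.fft.fft(die, N)             # transform of the pmf
pmf_sum = np.fft.ifft(F * F).real  # multiply in frequency domain, invert

# pmf_sum[k] is now P(die1 + die2 = k) for k = 2, ..., 12
```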
2.3 From characteristic functions to probability functions
There exists a one-to-one correspondence between frequency functions and their Fourier transforms. In statistical applications involving random phenomena, the Fourier transform is essentially the characteristic function corresponding to the distribution of the random variable (or vector). Hence distributions can be represented equivalently by either probability distribution or characteristic functions.

In practice, the frequency function is the usual representation, since it is the more intuitive function and the empirical distribution function often serves as the basis for statistical inference. However, the characteristic function is the canonical representation of some useful distributions whose frequency functions cannot be expressed in closed form.
To fix ideas, suppose X is an integer-valued random variable with support in the set {0, 1, ..., N-1}. Then the characteristic function of X is given by

\phi_X(\omega) = E\left[e^{i\omega X}\right] = \sum_{k=0}^{N-1} p_k\, e^{i\omega k}, \qquad (2.5)

where p_k = P(X = k) is the probability mass function (pmf).

Using Euler's relation e^{i\theta} = \cos\theta + i\sin\theta, we can easily verify that \phi_X(\omega) has a period of 2\pi, that is, it satisfies \phi(\omega) = \phi(\omega + 2\pi) for all \omega, since e^{i(\omega + 2\pi)k} = e^{i\omega k}\, e^{i2\pi k} = e^{i\omega k} for every integer k. Also, the characteristic function is real-valued if and only if the corresponding distribution function is symmetric around the origin. An example with a real-valued characteristic function is a random variable that has a standard normal distribution.
To see the connection with the DFT, evaluate the characteristic function at N equally spaced values in the interval [0, 2\pi):

\phi_m = \phi_X\!\left(\frac{2\pi m}{N}\right) = \sum_{k=0}^{N-1} p_k\, e^{i2\pi mk/N}, \qquad m = 0, 1, \ldots, N-1. \qquad (2.6)

Here \phi and p form a Fourier transform pair. The above equation defines the DFT of the sequence of probabilities p_0, \ldots, p_{N-1}. As mentioned earlier, the \phi_m's are in general complex numbers. Also note that extension of the range of m outside the range {0, 1, ..., N-1} will result in a periodic sequence consisting of a repetition of the sequence \phi_0, \ldots, \phi_{N-1}.

Our interest is in recovering the sequence of probabilities from the corresponding sequence of characteristic function values. In other words, we seek to obtain the sequence of p_k's from the sequence of \phi_m's. This can be accomplished by using the inverse DFT operation, which is defined by

p_k = \frac{1}{N} \sum_{m=0}^{N-1} \phi_m\, e^{-i2\pi mk/N}, \qquad k = 0, 1, \ldots, N-1. \qquad (2.7)
2.4 Illustrative examples
There are many situations in which the characteristic function of a random variable is easily computable but the inverse transform is not easily expressed in a closed form. In such cases, the discrete Fourier transform approach can be applied and implemented using the FFT algorithm to obtain the probability distribution of the random variable. In this section, simple examples are worked out in detail to illustrate the inversion process and solidify some of the important concepts.
2.4.1 Simple discrete random variable

For illustration purposes, we will first consider a simple example for which both the probability mass and characteristic functions are known. Suppose the probability distribution of a discrete random variable X is given by

P(X = k) = \begin{cases} 1/4, & \text{if } k = 0 \text{ or } k = 1 \\ 3/8, & \text{if } k = 2 \\ 1/8, & \text{if } k = 3. \end{cases}

Following the notation introduced in the previous section, we have p_0 = p_1 = 1/4, p_2 = 3/8, and p_3 = 1/8. Using equation (2.5), the characteristic function of X is

\phi_X(\omega) = \tfrac{1}{4} + \tfrac{1}{4}\, e^{i\omega} + \tfrac{3}{8}\, e^{i2\omega} + \tfrac{1}{8}\, e^{i3\omega}.
Now let us obtain the DFT for N = 4 by evaluating the characteristic function at the Fourier frequencies \omega_m = 2\pi m/4, for m = 0, 1, 2, 3. Using equation (2.6) we obtain

\phi_0 = 1, \qquad \phi_1 = -\tfrac{1}{8} + \tfrac{1}{8} i, \qquad \phi_2 = \tfrac{1}{4}, \qquad \phi_3 = -\tfrac{1}{8} - \tfrac{1}{8} i.

Assume we only knew the above coefficients and are interested in recovering the underlying probability mass function. From equation (2.7) we have

p_0 = \tfrac{1}{4}\left(\phi_0 + \phi_1 + \phi_2 + \phi_3\right) = \tfrac{1}{4}\left(1 - \tfrac{1}{8} + \tfrac{1}{8} i + \tfrac{1}{4} - \tfrac{1}{8} - \tfrac{1}{8} i\right) = \tfrac{1}{4}.

Similar substitutions of the \phi_m's can be used to recover p_1, p_2 and p_3.
Before we show how the FFT can be used in this very simple example, it is worth mentioning that different software packages may have slightly different implementations of the FFT algorithm. Thus the user should make sure that the output from a given FFT implementation gives the expected results for the applications under study. In this thesis, the S-Plus function fft (S-Plus 2000 for Windows) has been used throughout.

Going back to our example, first we store the \phi_m's in a vector, say cc. Each element of this vector is expressed as a complex number z = x + iy, where x and y are the real and imaginary parts, respectively. Then a normalized FFT is used on cc, which produces the associated probability values exactly, as can be seen in the S-Plus output below. In the output, [1] represents the position of the element that follows, in this case the first element of the vector. Both the characteristic and probability function vectors have 4 elements. Note also that the imaginary parts of the elements in the probability vector are all zero. For some of the examples throughout the thesis it might be instructive to present annotated output from the S-Plus functions given in the Appendices directly. This will help us see the details of the input/output structure.
The following table summarizes the results of the 4-point DFT-IDFT pair for the above example. From the results shown in the table we observe that in general the characteristic function values are complex valued, and the probability mass function is recovered exactly by applying a scaled FFT on the input sequence of the characteristic function values.

Table 2.1: A simple numerical example of a 4-point DFT

m    \phi_m              k    p_k
0    1                   0    1/4
1    -1/8 + (1/8)i       1    1/4
2    1/4                 2    3/8
3    -1/8 - (1/8)i       3    1/8
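The same 4-point example can be sketched in Python as a stand-in for the S-Plus fft call (note the sign convention: NumPy's `np.fft.fft` uses the e^{-i...} kernel, so the scaled forward FFT plays the role of the inverse transform in equation (2.7), exactly the package-dependent detail cautioned about above):

```python
import numpy as np

pmf = np.array([1/4, 1/4, 3/8, 1/8])
N = len(pmf)

# Characteristic function values at the Fourier frequencies (equation 2.6):
# phi_m = sum_k p_k e^{+i 2 pi m k / N}, i.e. N times NumPy's inverse FFT
cc = N * np.fft.ifft(pmf)
# cc = [1, -1/8 + i/8, 1/4, -1/8 - i/8]

# The normalized (scaled) forward FFT recovers the pmf exactly (equation 2.7)
recovered = np.fft.fft(cc).real / N
```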
2.4.2 Generalization of Bernoulli trials
Suppose the success probability in a sequence of Bernoulli trials is allowed to vary from trial to trial, keeping the trials independent. One approach of dealing with such instances is to use compound (or hierarchical) models. It is often not too difficult to derive characteristic functions analytically for cases involving compound (or mixture) distributions. In this example, we consider a compound model which relates Bernoulli/binomial random variables with a beta distribution. We give theoretical details of two separate cases and show an implementation for one of them, and we point out the approach to be taken to implementing the other one.
Case 1: Compound of Bernoulli with beta distribution

First, let

X_j \mid P_j = p_j \sim \text{Bernoulli}(p_j), \qquad j = 1, \ldots, n,

with the P_j following a beta distribution and the trials independent. Denote the total number of successes by Y. We want to obtain the probability distribution of Y by first computing its characteristic function.

The moment generating function of Y is given by

m_Y(t) = \prod_{j=1}^{n} m_{X_j}(t),

where

m_{X_j}(t) = E\left[E\left(e^{tX_j} \mid P_j = p_j\right)\right] = E\left[p_j e^t + 1 - p_j\right] = E\left[p_j(e^t - 1) + 1\right] = (e^t - 1)E(P_j) + 1.

Hence, the characteristic function of Y is given by

\phi_Y(\omega) = \left[1 - \bar{p} + \bar{p}\, e^{i\omega}\right]^n, \qquad \bar{p} = E(P_j),

where the mean of P is assumed not to depend on j (the homogeneous case). We note that this characteristic function corresponds to a binomial random variable with n trials and success probability \bar{p} (beta is a conjugate prior for the binomial).
Case 2: Compound of binomial with beta distribution

Let

X_j \mid P_j \sim \text{Bin}(m, P_j), \qquad j = 1, \ldots, n.

As before, we have

m_Y(t) = \prod_{j=1}^{n} m_{X_j}(t),

where

m_{X_j}(t) = E\left[\left(P_j(e^t - 1) + 1\right)^m\right].

Hence, the characteristic function of Y is given by

\phi_Y(\omega) = \left\{E\left[\left(P(e^{i\omega} - 1) + 1\right)^m\right]\right\}^n.

In this case, we need to know the moments of P (up to order m), and as before we have assumed that the moments do not depend on j. When m = 1, this result reduces to Case 1 above.
The S-Plus program in Appendix A was used to compute the probability distribution of Y as described in Case 1 above. As an example, consider the case n = 20 and p̄ = 1/3. The S-Plus output below shows the support (supp), probabilities obtained using FFT (prob), probabilities obtained using the binomial density dbinom function in S-Plus (bin), and the difference between the latter two (diff). Inspection of each probability mass shows that the recovery was indeed exact. These probabilities are also shown in Figure 2.1.
> betabin(20,1/3)
      supp          prob           bin           diff
 [1,]    0 3.007287e-004 3.007287e-004  7.291260e-017
 [2,]    1 3.007287e-003 3.007287e-003 -9.974660e-018
 [3,]    2 1.428461e-002 1.428461e-002 -1.335737e-016
 [4,]    3 4.285383e-002 4.285383e-002 -2.220446e-016
 [5,]    4 9.106440e-002 9.106440e-002  1.804112e-016
 [6,]    5 1.457030e-001 1.457030e-001 -2.220446e-016
 [7,]    6 1.821288e-001 1.821288e-001 -3.053113e-016
 [8,]    7 1.821288e-001 1.821288e-001 -6.551115e-016
 [9,]    8 1.479796e-001 1.479796e-001 -4.996004e-016
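A Python sketch of the compound computation (my stand-in with a hypothetical helper, not the Appendix A program): for Case 2 the characteristic function φ_Y(ω) = {E[(P(e^{iω} - 1) + 1)^m]}^n needs the beta moments of P up to order m; setting m = 1 reduces to Case 1, so the Binomial(n, p̄) comparison above serves as a check.

```python
import numpy as np
from math import comb

def compound_binbeta_pmf(n, m, a, b):
    """pmf of Y = X_1 + ... + X_n, with X_j | P_j ~ Bin(m, P_j) and
    P_j ~ Beta(a, b), recovered from the characteristic function by a
    scaled FFT on the support {0, ..., n*m} (Case 2; m = 1 gives Case 1)."""
    N = n * m + 1
    omega = 2 * np.pi * np.arange(N) / N
    # Beta moments E[P^r] = prod_{s=0}^{r-1} (a + s) / (a + b + s)
    mom = np.concatenate(
        [[1.0], np.cumprod([(a + s) / (a + b + s) for s in range(m)])]
    )
    z = np.exp(1j * omega) - 1.0
    # Binomial expansion: E[(P z + 1)^m] = sum_r C(m, r) z^r E[P^r]
    phi_x = sum(comb(m, r) * z**r * mom[r] for r in range(m + 1))
    return np.fft.fft(phi_x ** n).real / N
```

With n = 20, m = 1 and a beta mean of 1/3 (say a = 1, b = 2), the recovered probabilities should match the dbinom column of the output above.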
2.4.3 Weighted sum of discrete random variables
A statistic involving a weighted sum of random variables is frequently used in statistical inference. When independence between the random variables is a reasonable assumption, weighted sums have several desirable properties. For instance, calculations of distributional characteristics (e.g., first and second order moments, various generating functions) will be greatly simplified if independence of the variables in the sum may be assumed. The key statistical functionals that will be explored in detail in the next chapter are linear combinations of independent random variables.
Figure 2.1: Plot of binomial probabilities obtained using FFT as well as the dbinom function in S-Plus
Here we demonstrate the uses of the FFT based approach in a relatively small but practical problem.

Let X_1, X_2, \ldots, X_n be i.i.d. discrete random variables with probability mass function (pmf)

f(x) = x/21, \qquad x = 1, 2, \ldots, 6.

Suppose we are interested in the distribution of the weighted sum S_n = \sum_{k=1}^{n} kX_k. It can easily be shown that the necessary condition for the Lindeberg-Feller central limit theorem for weighted sums holds true, and accordingly, it follows that S_n is asymptotically normal; i.e., for sufficiently large n, S_n is approximately normally distributed.
For small n, however, this asymptotic result may not be satisfactory and the need arises for the evaluation of its exact distribution. Although our aim is to illustrate the process of inverting a characteristic function in order to recover a probability function, it would be interesting to discuss at this point an alternative approach for computing exact probabilities in this particular problem. This approach makes use of yet another type of generating function known as the probability generating function (PGF). The PGF, P, of a non-negative integer-valued random variable X is defined by
P(t) = E[t^X] = Σ_{k≥0} P(X = k) t^k.
It is known that P(X=k) can be recovered from P^(k)(0), where P^(k)(t) is the k-th derivative of the PGF. To be specific,
P(X = k) = P^(k)(0) / k!.     (2.9)
Suppose n = 5. Sn as defined above is a non-negative integer-valued random variable with support Supp(Sn) = {15, 16, ..., 90}. Hence we can apply the result in equation (2.9) to recover exact probabilities. The PGF of S is given by
P_S(t) = Π_{k=1}^{5} P_X(t^k),
where P_X is the PGF of the common pmf of the X's,
P_X(t) = Σ_{x=1}^{6} (x/21) t^x.
Note that P_S(t) is a polynomial, and it is easily seen that the exact probability corresponding to support point k is simply the coefficient of the term t^k in the polynomial expansion defining the PGF. Suppose we are interested in computing exact tail probabilities such as P(S < 16) or P(S > 88); we can do so by adding the corresponding polynomial coefficients. Mathematical packages such as Maple or Mathematica can be used to facilitate the computation.
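In place of a computer algebra system, the coefficient extraction can also be sketched with exact integer arithmetic in Python (my own illustration, not part of the thesis): each component kXk contributes a numerator polynomial with coefficient x at exponent kx, and the product of the five components over the common denominator 21^5 is the PGF.

```python
# Exact PGF coefficients of S_5 = sum_{k=1}^{5} k*X_k, where P(X = x) = x/21, x = 1..6.
# Integer polynomial convolution; the common denominator is 21**5 = 4084101.
n = 5
coeffs = [1]                        # polynomial "1" before any component is folded in
for k in range(1, n + 1):
    comp = [0] * (6 * k + 1)        # numerator polynomial of k*X_k: coeff x at power k*x
    for x in range(1, 7):
        comp[k * x] = x
    new = [0] * (len(coeffs) + len(comp) - 1)
    for i, a in enumerate(coeffs):  # polynomial multiplication (convolution)
        if a:
            for j, b in enumerate(comp):
                if b:
                    new[i + j] += a * b
    coeffs = new

denom = 21 ** n                     # 4084101
print(coeffs[90], denom)            # -> 7776 4084101, i.e. P(S=90) = 32/16807
print(coeffs[15])                   # -> 1, i.e. P(S=15) = 1/4084101
print(sum(coeffs) == denom)         # -> True: total probability mass equals 1
```

Intermediate coefficients, such as coeffs[50], can be compared directly against the Maple expansion.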
For this example, we used Maple (Maple V, Waterloo) and the entire expansion of the polynomial PGF is shown on the following page.
[Maple output: the polynomial expansion of the PGF, with terms running from (1/4084101) t^15 up to (32/16807) t^90.]
We can see, for example, that P(S = 90) = 32/16807 (coefficient corresponding to t^90) and P(S = 15) = 1/4084101 (coefficient corresponding to t^15). These extreme cases are simple to check by direct calculation. For example, S=90 can occur only when each of the X's takes on the value 6, and thus,
P(S = 90) = (6/21)^5 = 7776/4084101 = 32/16807.
Similarly, S=15 can happen only when each of the X's takes on the value 1, and, once again, direct calculation can be carried out easily. Cumulative probabilities, such as the probability that the random variable S takes on values at most 87, are obtained by summing the coefficients of t^15 through t^87, from which tail-area probabilities can be computed.
Now let us follow this example further, this time using the Fourier transform approach. Since the characteristic function of the Xi's is given by φ_X(t) = E(e^{itX}) = Σ_{x=1}^{6} (x/21) e^{itx}, the characteristic function of S can easily be derived as φ_S(t) = E(e^{itS}) = Π_{k=1}^{n} φ_X(kt). The recovery of the pmf of S from its characteristic function can be performed as follows (Appendix B gives the S-Plus code that was used in this example). We evaluate the characteristic function at the support points 0 to 90.
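As a sketch of that evaluation-and-inversion step (my own pure-Python illustration, not the Appendix B code), one can sample φ_S at the N = 91 Fourier frequencies 2πj/N and apply the inverse DFT, which is exact here because the support of S lies within {0, ..., 90}:

```python
import cmath

def phi_X(t):
    """Characteristic function of X with pmf f(x) = x/21, x = 1, ..., 6."""
    return sum(x / 21 * cmath.exp(1j * t * x) for x in range(1, 7))

def phi_S(t, n=5):
    """Characteristic function of S_n = sum_k k*X_k: the product of phi_X(k t)."""
    prod = 1 + 0j
    for k in range(1, n + 1):
        prod *= phi_X(k * t)
    return prod

N = 91                      # support of S_5 is {15, ..., 90}, inside {0, ..., 90}
phis = [phi_S(2 * cmath.pi * j / N) for j in range(N)]
pmf = [sum(phis[j] * cmath.exp(-2j * cmath.pi * j * s / N) for j in range(N)).real / N
       for s in range(N)]

print(round(pmf[90], 9))              # 0.001903969, i.e. 32/16807
print(round(pmf[15] * 4084101, 6))    # 1.0, i.e. P(S = 15) = 1/4084101
```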
Note that, as expected, the probabilities are 0 when S takes on values in the range 14 and under. We can also verify the other results against those obtained from the PGF approach. For example, P(S = 90) = 32/16807 = 0.001903969 (this value is shown in the S-Plus output above as the 91st element, since vector indexing starts from 1 in this package). Similarly, the probability that S equals 50 is given by P(S = 50) = 0.01405009, which was earlier shown in its fractional form on the output from Maple as 57382/4084101.
Figure 2.2 shows the pmf of Sn for n=5 as obtained using the FFT approach. A slight skewness to the left can be noticed from this graph.
To give an idea of the improvement in computational complexity achieved by using the FFT in the above example, let us consider the input size (without any padding) N=75. A direct inversion of the DFT would require 75^2 = 5625 multiplications of complex numbers, whereas the number of multiplications required by the FFT will be O(75 log2(75)) ≈ 467, a reduction by a factor of about 12.
This chapter introduced the theory and applications of the Fourier transform (FT) as well as the connection with the characteristic function of a given random variable. The fast Fourier transform algorithm was used to recover exact probabilities in three simple but practical applications.
Figure 2.2: Probability mass function (pmf) of a weighted sum Sn = Σ_{k=1}^{n} kXk, for n=5, obtained using FFT.
Chapter 3
Exact inference in generalized
linear models
3.1 Introduction
Since its introduction by Nelder and Wedderburn in 1972 as a unifying approach to the regression analysis of both continuous and categorical outcome variables, the generalized linear model (GLM) framework has been used successfully in many areas of application. This class of models extends the traditional linear models by allowing non-normal response distributions and suitable transformations to linearity. In medical applications, the logistic regression and the Poisson regression models are two of the most commonly used GLMs. A comprehensive and classic reference describing the theory and applications of GLMs is due to McCullagh and Nelder (1989).
Inferential procedures for GLMs are often based on traditional large-sample theory. However, one can cite several instances in which the need to deal with small-sample situations arises:
• very expensive experiments or 'costly' sacrifices;
• useful information may be available at an early stage of an experiment when only few patients have been recruited;
• in large-scale multi-center clinical trials, a small part of the data may represent the contribution of one of the smaller centers.
In this chapter we show that the FFT approach can be used to conduct exact inference in the most commonly used GLMs. No fancy software is required. This is facilitated by the ease with which characteristic functions of sufficient statistics are computed. In particular, we present the two most commonly used GLMs, the logistic and Poisson regression models, and demonstrate with examples the feasibility of computing exact p-values for hypotheses of interest. We extend our formulation of the exponential family to include weighted exponential families leading, among others, to interesting special cases of the logistic and Poisson models. To get insight into the errors that may result from using the FFT, we investigate the consequences of truncation errors for the Geometric and Poisson distributions.
3.2 Model specification
A generalized linear model is characterized by the following three-part specification:
1) Random Component: each component of Y has a distribution in the exponential family, taking the form
f(y; θ, φ) = exp{ [yθ − b(θ)] / a(φ) + c(y, φ) }
for some specific functions a(.), b(.), and c(.). If φ is known, this is a one-parameter exponential-family model with canonical parameter θ (McCullagh and Nelder, 1989);
2) Systematic Component: a linear combination of the covariates, sometimes known as a linear predictor, given by
η = Xβ;
3) Link function: the link between the random and systematic components is given by
η_i = g(μ_i),
where g(.) is any monotonic differentiable function known as the link function. Special link functions known as canonical links occur when θ_i = η_i.
3.2.1 Some examples of GLMs
1. Linear regression
The standard linear model satisfies the above GLM formulation with a Normal distribution for the random component and the identity function for the link component. The normal distribution belongs to the exponential family.
2. Poisson regression
Assume that Y1, ..., Yn are independent Poisson random variables with means μi, where μi > 0 (i = 1, ..., n). It can be shown that the Poisson distribution belongs to the exponential family. Using a logarithmic link function, the standard Poisson regression model is written as
ln(μi) = x_i'β.
The link function ln(.) is the canonical link and maps the interval [0, ∞) onto (−∞, ∞). The identity link may not be appropriate, in part, because η may be negative while μ must not be.
3. Binary regression
Suppose Y1, ..., Yn are independent Bernoulli random variables with means pi, where 0 ≤ pi ≤ 1 (i = 1, ..., n). A potential link function in this case should map the interval [0,1] onto (−∞, ∞).
The following three link functions are used commonly with the binomial distribution:
1. Logit: g(p) = log[p/(1 − p)];
2. Probit: g(p) = Φ^{-1}(p), where Φ is the standard normal cumulative distribution function;
3. Complementary log-log: g(p) = log[−log(1 − p)].
The logit link is the canonical link for the binomial distribution and is by far the most popular. It is widely used in the health sciences, in particular in epidemiological studies, because of the resulting odds ratio interpretation of the regression parameter estimates. In addition to its useful interpretation, the logit link is also mathematically simpler. Sensitivity of exact tail probabilities to misspecification of a link function is explored later in Chapter 4.
Note that, as in the Poisson regression case, the identity link is less useful with binary regression, partly due to possible predicted probabilities falling below zero or above one.
3.3 Inference in a logistic regression model
Assume that Y = (Y1, ..., Yn)' is a vector of independent binomial random variables, that is, Yi ~ Bin(ni, pi).
The linear logistic regression model is given by
logit(p) = Xβ,
where logit(p) is an n-vector of log-odds, log[pi/(1 − pi)], X is a known n × p full-column-rank matrix of explanatory variables whose components can, in general, be either quantitative or qualitative, and β is an unknown p-vector of regression parameters.
Using Neyman's factorization theorem, it can easily be shown that T = X'y is sufficient for β under the model specified above.
In some situations, it might be appropriate to work with
logit(p) = Xβ + Δw,
where w is an n-vector of constants. In this case, the primary parameter of interest is often the scalar Δ, and the common test of hypothesis is H0 : Δ = 0.
The sufficient statistics are T and S = w'y. Conditional inference for testing the hypothesis Δ = 0 can be carried out by working with a conditional reference distribution. This reference distribution is the conditional distribution of S given T = t, where t is the observed value of T. This conditional distribution does not depend on the regression parameter vector β, which in this case may be treated as a nuisance.
In epidemiology and many other medical applications, it is often convenient to think in terms of the odds of "success", p/(1 − p), rather than the "success" probability p.
In particular, let Y1 ~ Bin(n1, p1) and Y2 ~ Bin(n2, p2) be two independent binomial random variables and ψ = [p1/(1 − p1)] / [p2/(1 − p2)] be the odds ratio, i.e., the ratio of the odds of "success" for Y1 to the odds of "success" for Y2.
The conditional distribution of Y1 given Y1 + Y2 = t is given by
P(Y1 = u | Y1 + Y2 = t) = C(n1, u) C(n2, t − u) ψ^u / Σ_v C(n1, v) C(n2, t − v) ψ^v,
for
max(0, t − n2) ≤ u ≤ min(n1, t),
where the sum in the denominator runs over the same range. This conditional distribution is the non-central hypergeometric distribution. When ψ = 1, equivalently p1 = p2, we obtain an important special case, the (central) hypergeometric distribution with probability mass function
P(Y1 = u | Y1 + Y2 = t) = C(n1, u) C(n2, t − u) / C(n1 + n2, t).
The null hypothesis of the equality of the two binomial parameters p1 and p2, or equivalently the hypothesis that the odds ratio is unity, can be cast as an inferential problem in the class of GLMs in the following way. Let x be a binary variable indicating group membership that can be written (without any loss of generality) as
xi = 1, if i is in Group 1;
xi = 0, otherwise,
and consider the model
logit(pi) = β0 + β1 xi.
Then the null hypothesis of interest is equivalent to testing the hypothesis H0 : β1 = 0. A p-value for a two-sided test is defined in terms of
p−(.) = Pr(. ≤ obs),  p+(.) = Pr(. ≥ obs),
the significance levels corresponding to Uniformly Most Powerful Unbiased (UMPU) lower and upper one-sided tests of H0 (Cox, 1970).
In order to use the FFT approach, we first need to derive the joint characteristic function of the pair of sufficient statistics under the simple logistic regression model shown above. In this case, the vector T = (T0, T1) = (Σ Yi, Σ xi Yi) is sufficient for β = (β0, β1). The joint moment generating function is defined as
M(t0, t1) = E[exp(t0 T0 + t1 T1)] = Π_{i=1}^{n} (1 − pi + pi e^{t0 + t1 xi})^{ni},
where Yi ~ Bin(ni, pi). The joint characteristic function is thus defined as
φ(t0, t1) = M(it0, it1),
where the i preceding t0 and t1 is the usual complex-number notation. Inference about β1 is based on the conditional distribution of T1 given T0 = t0.
The implementation of recovering this conditional distribution is done using the following 4 steps:
1. evaluate the joint characteristic function on a grid of values as demonstrated in Chapter 2;
2. recover the joint probability distribution of T0 and T1 using the FFT (this is the numerator in the formula for the conditional distribution of T1 given T0);
3. compute the marginal distribution of T0 by summing over the entire support of T1 (this is the denominator in the formula for the conditional distribution of T1 given T0);
4. divide the joint distribution (step 2) by the marginal distribution (step 3) to get the complete reference conditional distribution. Useful characteristics of this reference distribution can then easily be extracted. For example, tail area probabilities can be calculated using the additional information on the observed value of the conditional random variable, T1 = t1.
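For the two-binomial comparison taken up next, these four steps can be sketched in pure Python (my own illustration; the grid sizes and names are not from the Appendix C program). Under H0 : β1 = 0 the conditional law of T1 = Y1 given T0 = Y1 + Y2 is free of β0, so the joint characteristic function may be evaluated at pi = 1/2, and the recovered conditional distribution should match the central hypergeometric distribution:

```python
import cmath
from math import comb, pi

n1 = n2 = 22            # two treatment arms of 22 patients each (Table 3.1)
t0_obs = 10             # observed total number of adverse outcomes
N0, N1 = n1 + n2 + 1, n1 + 1     # grid sizes covering the supports of T0 and T1

def phi(s, t, p=0.5):
    """Joint characteristic function of (T0, T1) = (Y1 + Y2, Y1) when p1 = p2 = p."""
    return (((1 - p) + p * cmath.exp(1j * (s + t))) ** n1 *
            ((1 - p) + p * cmath.exp(1j * s)) ** n2)

# Steps 1-2: invert the two-dimensional DFT along the slice T0 = t0_obs (numerator).
joint_row = []
for u in range(N1):
    acc = 0j
    for j in range(N0):
        for k in range(N1):
            acc += (phi(2 * pi * j / N0, 2 * pi * k / N1) *
                    cmath.exp(-2j * pi * (j * t0_obs / N0 + k * u / N1)))
    joint_row.append(acc.real / (N0 * N1))

# Step 3: marginal P(T0 = t0_obs), summing the row over the support of T1.
marg = sum(joint_row)
# Step 4: condition, then compare with the central hypergeometric pmf.
cond = [v / marg for v in joint_row]
hyper = [comb(n1, u) * comb(n2, t0_obs - u) / comb(n1 + n2, t0_obs)
         if 0 <= t0_obs - u <= n2 else 0.0
         for u in range(N1)]
print(max(abs(a - b) for a, b in zip(cond, hyper)))   # the two pmfs agree closely
```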
3.3.1 Comparison of two binomial proportions
For illustration purposes, consider the following hypothetical example (we will consider an example involving real data a little later). A two-arm parallel group design has been carried out in a clinical trial with the purpose of comparing adverse outcomes in each of the treatment groups. Suppose 22 patients were randomly allocated to each treatment arm and after a pre-specified follow-up time the frequency of adverse outcomes was counted. The number and percent of adverse outcomes in the two groups were 2/22 (9.1%) and 8/22 (36.4%), respectively (Table 3.1).
Exact inference under the null hypothesis that the risk of adverse outcome is not different in the two treatment groups was conducted using the S-Plus program given in Appendix C. The output below shows how the function is invoked and the values returned by the function (left-hand side p-value, right-hand side p-value, and 2-tailed p-value).
Note that in the above example the function of Appendix C is invoked with the link function defaulted to logit and no contamination was assumed in the data. In Chapter 4 we will present generalizations in which the sensitivity of exact results to some sort of departure from model assumptions is empirically investigated.

Table 3.1: Data for the comparison of two binomial proportions

Group     Adverse      Normal        Total
1         2  (9.09)    20 (90.91)    22
2         8 (36.36)    14 (63.64)    22
Total     10           34            44
Using the FREQ procedure in the SAS statistical software (SAS Institute Inc., Cary, North Carolina), the left, right, and 2-tail Fisher's exact test probabilities were 0.034, 0.995, and 0.069, respectively.
We note that these p-values are identical to the ones obtained using the FFT. However, it should be noted that there exist different definitions of two-sided p-values that could lead to different results (Agresti, 1992).
3.3.2 Dose-response experiments
Suppose a small dose-response experiment is conducted in which groups of 10 experimental subjects were exposed to 3 different dose concentrations (0, 1, and 2 units). A binary response of interest was recorded, and the data resulting from this experiment are shown below.
Table 3.2: A dose-response example
A simple logistic regression model, logit(pi) = β0 + β1 xi, was assumed to be an appropriate model for this data set. Once again we are interested in the slope regression parameter, β1, and thus the intercept term, β0, will be treated as a nuisance parameter. Using our FFT program (Appendix C), exact p-values were obtained for testing the hypothesis β1 = 0. The one-sided (right-hand) p-value was 0.1352204 and the corresponding 2-tail p-value was 0.2704408.
The S-Plus function of Appendix C returns several quantities of interest along with the tail area probabilities. One quantity which is very helpful in monitoring whether the exact p-values might be inaccurate is the marginal probability of the sufficient statistic corresponding to the nuisance parameter evaluated at the observed value, i.e., P(T0 = t0). In the event this probability is almost zero, the p-values may not be accurate, since this value enters in the denominator of the conditional probabilities. Vollset et al. (1991) and Hirji et al. (1996) show some specific applications in which inaccurate results may be observed with FFT-based methods of exact inference. The observed values for the pair of sufficient statistics T = (T0, T1) are also returned by our S-Plus function.
Using the LogXact package (LogXact-Turbo, 1993), the one-sided p-value for the same hypothesis based on the above data was 0.1352, which is identical to what we obtained above using the FFT method. To continue with this example, suppose an additional 10 subjects were subjected to a 4-unit dose and 7 of them responded. We would now want to analyze the combined data, which consist of 40 experimental units. Repeating the same analysis as above resulted in a one-sided p-value of 0.04626505 using the FFT and 0.0463 with LogXact. To make this example closer to a realistic dose-response experiment, let us consider one more dose level set at 5, and again 10 subjects being exposed to this dose level. Let us also assume there were 8 responders. Analyzing the combined data with a total of 50 experimental units using the FFT method, we obtain a one-sided p-value of 0.009967243. The rounded one-sided p-value obtained from LogXact was 0.0100.
Next we will consider a real data set taken from Finney (1964, Table 17.2). The data describe the effect of insulin on mice. On a suitable log-scale for the dose, the data for a standard preparation of insulin are shown in the table below.
Table 3.3: Effect of insulin on mice at different dose concentrations (Source: Finney, 1964)
The exact one-sided p-value for the test of a linear dose effect obtained using the FFT approach was 0.00004358595, and the corresponding p-value from LogXact (to 6 decimal places) was 0.000044. The above examples all show that the FFT approach produces the same p-values as those obtained using well-known commercial packages.
3.4 The Poisson regression model
In this section, we will study the simple Poisson regression model. As before, we will focus on the null hypothesis that the slope parameter is zero, which can also be cast as the comparison of two or more Poisson mean parameters.
A connection between log-linear models for frequencies and multinomial response models for proportions exists, which stems from the fact that the binomial and multinomial distributions can be derived from a set of independent Poisson random variables conditionally on their total being fixed.
Let Y1, ..., Yk be independent Poisson random variables with means λ1, ..., λk, respectively. One may be interested in testing the composite null hypothesis that the mean parameters are equal. This can be framed through the log-linear model
log(λj) = β0 + β1 xj,
where the xj's are assumed to be fixed known constants. When β1 = 0, it is clear that we are testing the equality of the Poisson mean parameters.
Again, standard theory of significance testing leads to consideration of the test statistic T1 = Σ xj Yj conditionally on the observed value of t0 = Σ Yj, which is the sufficient statistic for β0. Generalization to k-parameter exponential families will be discussed in a subsequent section.
3.4.1 Testing H0 : β1 = 0 in a simple Poisson regression model
The data below show counts yi observed at various values of a covariate x.
Table 3.4: Data for a Poisson regression model
We assume that the responses are independent Poisson random variables with E(Yi) = Var(Yi), where
log E(Yi) = β0 + β1 xi
for i = 1, ..., 7. The canonical link for a Poisson regression model is the log link. Fitting this canonical link and using our FFT program written for the Poisson model (Appendix D), we obtain the following results; the entire conditional reference distribution is depicted in the figure on the next page. The function was invoked as pois2d(x=x,y=y), where the vectors x and y store the covariate and response values, respectively.
Table 3.5: Exact p-values in a Poisson regression model
To our knowledge, there was no software available at the time of writing to conduct exact analysis for an arbitrary Poisson regression model against which our results can be checked. Table 3.5 shows the two one-sided p-values along with the two-tailed p-value corresponding to a test of H0 : β1 = 0 in the simple Poisson regression model. We also give the marginal probability, p = 0.04867797, used as the denominator in the calculation of the conditional probability. Again, when this marginal probability is sufficiently close to zero, numerical instability may result and the p-values may not be calculated accurately using the FFT method. The observed values of the pair of sufficient statistics, also shown in Table 3.5, give us an idea of how big the grid of evaluation of the characteristic function should be.
Figure 3.1: Conditional probability mass function (pmf) of T1 | T0 = t0 for the Poisson regression model example obtained using FFT
3.4.2 Comparison of two Poisson rate parameters
To verify the validity and accuracy of exact tail probabilities under a Poisson regression model, we used an example for which we have a theoretical justification. Suppose that Y1 and Y2 are independent Poisson random variables with means λ and ρλ, respectively. It can be shown that
Y1 + Y2 ~ Poisson(λ(1 + ρ))
and
Y1 | Y1 + Y2 = m ~ Binomial(m, 1/(1 + ρ)).
The second result is the most relevant for our purpose. In particular, if both Y1 and Y2 have a Poisson(λ) distribution, this fact states that conditioning on the sum of Y1 and Y2 being equal to an observed value m yields a Binomial(m, 1/2) distribution for Y1. This special case is equivalent to testing for equality of the two Poisson mean parameters, since
H0 : λ = ρλ is equivalent to ρ = 1.
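This fact is easy to check numerically; the following small Python fragment (mine, not one of the thesis programs) conditions two Poisson pmfs on their total and compares the result with the stated binomial law, here with λ = 1, ρ = 2 and m = 6 as arbitrary choices:

```python
from math import comb, exp, factorial

def pois(y, lam):
    """Poisson pmf at y."""
    return exp(-lam) * lam ** y / factorial(y)

lam, rho, m = 1.0, 2.0, 6     # Y1 ~ Poisson(lam), Y2 ~ Poisson(rho*lam), total fixed at m
p = 1 / (1 + rho)             # success probability of the claimed binomial law

# P(Y1 = u | Y1 + Y2 = m) = P(Y1 = u) P(Y2 = m - u) / P(Y1 + Y2 = m)
denom = pois(m, lam * (1 + rho))          # Y1 + Y2 ~ Poisson(lam * (1 + rho))
cond = [pois(u, lam) * pois(m - u, rho * lam) / denom for u in range(m + 1)]
binom = [comb(m, u) * p ** u * (1 - p) ** (m - u) for u in range(m + 1)]
print(max(abs(a - b) for a, b in zip(cond, binom)))   # essentially zero
```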
Now let us use a simple example to illustrate this fact and, more importantly, confirm that what we get out of our FFT formulation for the Poisson regression problem is indeed an exact conditional probability mass function.
To do this, without loss of generality, fix the Poisson parameter at λ = 1. In the regression framework, this is equivalent to setting both β0 and β1 to zero and using a dummy 1/0 variable for the covariate x. Under these conditions, the pair of sufficient statistics T0 = Σ Yi and T1 = Σ xi Yi reduce to T0 = Y1 + Y2 and T1 = Y1, respectively.
Suppose two independent random values are generated from a Poisson(1) distribution, i.e., a Poisson distribution with mean parameter λ = 1. This can be achieved, for instance, using the rpois function in S-Plus, which generates random numbers from a Poisson distribution.
If the total of the two Poisson variates is fixed at the observed value, in this case at 3, the above theoretical result says that the conditional distribution of Y1 given this total is Binomial(3, 1/2). The probabilities for a Binomial(3, 1/2) random variable corresponding to the support set {0, 1, 2, 3} are:
0.125  0.375  0.375  0.125
And here is the conditional probability distribution of T1 | T0 = 3 obtained using our FFT S-Plus program (Appendix D), which implements the Poisson regression model.
Note that the FFT was used at 36 input sequences (frequencies) to make sure all, or almost all, of the probability mass has been recovered. As we can see, we were able to recover the binomial probabilities exactly (at least to 8 decimal places of accuracy!). In general, we use this exact conditional probability distribution as the basis for inference about the regression parameter of interest. In this example we realize we have used only 2 observed values from a Poisson(1) probability distribution. Does the method still work if we use two different realizations? To investigate this further, let us consider more extreme cases. A value as large as 7 is observed with probability about 7 in 100,000, and a value of 5 with probability about 3 in 1,000, under a Poisson(1) distribution, as can be seen from the probability values computed using the dpois (Poisson density) function in S-Plus and shown below.
This observation may lead us to ask what happens if we took y1 and y2 to be 5 and 7, respectively. According to the theoretical result for the conditional distribution of one Poisson variate given the sum of two independent Poisson random variables, the sum is 12 and hence the conditional distribution will be Binomial(12, 1/2). The probabilities for a Binomial(12, 1/2) random variable are shown first below.
Using the Poisson regression model FFT program (Appendix D), we obtained the conditional probability distribution shown in the following output.
In this case there was no need to carry out the FFT on an extended input sequence. Only 13 inputs (Fourier frequencies) were used and it was possible to recover the exact probabilities, as can be seen by comparing the FFT results with the probabilities generated using the exact Binomial distribution and displayed above.
3.5 The general exponential family
In the preceding few sections, we have studied how to compute exact conditional probabilities to make inferences in two widely used models, the logistic and Poisson regression models. In this section we lay out the theoretical justification of the validity of inverting characteristic functions to recover probability distributions for a more general class of distributions.
The following definition provides a slightly different parameterization from that shown in §3.2 for distributions in the exponential family.
A family of distributions with p.m.f. f(x; θ) is said to belong to the exponential family of distributions if f(x; θ) can be expressed in the form
f(x; θ) = a(θ) b(x) exp{ Σ_{j=1}^{k} c_j(θ) d_j(x) }
for a suitable choice of functions a(.), b(.), c_j(.) and d_j(.), where θ is a vector of parameters.
If c_j(θ) = θ_j, j = 1, ..., k, the family is said to have its natural parameterization. In this case T is a complete sufficient statistic for (θ1, ..., θk) (Lehmann, 1983).
3.5.1 Joint and conditional distributions of sufficient statistics
By way of generalization, below we state a theorem (without proof) and give a corollary (with proof). The corollary is particularly relevant to the inferential problem considered in this thesis.
Theorem 3.1 (Source: Warahandi, 1995) Let T = (Σ T1(Xi), ..., Σ Tk(Xi)) be the complete sufficient statistic based on a random sample from the exponential family having the natural parameterization.
The joint p.m.f. or p.d.f. of T is of the form
f_T(t; θ) = A(θ) B(t) exp{ Σ_{j=1}^{k} θ_j t_j },
where t_j = Σ_{i=1}^{n} T_j(X_i).
Let T = (U, V) be a partition of the set of complete sufficient statistics. Let θ = (θ_u, θ_v) be the corresponding partition of the parameter vector. The following useful corollary concerning the conditional distributions of one given the other (and the marginal distribution of each component) can be deduced from the above theorem.
Corollary 3.1 The conditional distribution of U given V = v forms an exponential family. Moreover, this distribution is independent of the parameters θ_v.
Proof: (without loss of generality, consider the continuous case)
f_{U,V}(u, v; θ) = A(θ) B(u, v) exp(θ_u'u + θ_v'v).
The marginal distribution of V can be obtained by integrating this joint density with respect to u. Therefore, the conditional density function
f_{U|V}(u | v) = B(u, v) exp(θ_u'u) / ∫ B(u, v) exp(θ_u'u) du
is an exponential family independent of θ_v.
What this result says is that, in principle, the conditional distribution needed for inference on model parameters of interest is available in closed form as a member of the exponential family. In practice, however, there is a limitation in that the combinatorics involved could be daunting. With the ever-growing improvements in speed and storage capacity of today's computers, a method such as the FFT can be applied in many situations where the sample size is small to moderate.
3.5.2 Characteristic function for members of the exponential family
If a family of distributions can be expressed in the above form for x ∈ A, where A is a set independent of θ, then the sufficient statistic is k-dimensional. A natural parameterization (canonical form) is obtained when c_j(θ) = θ_j. The canonical form has several advantages. For example, it is much easier to compute moments and other features of sufficient statistics.
For the simple logistic regression model, the density of (Y1, ..., Yn) is a 2-parameter (with respect to β0 and β1) exponential family, and the sufficient statistics corresponding to the natural parameters are
T0 = Σ Yi and T1 = Σ xi Yi.
The joint characteristic function of the sufficient statistics T0 and T1 is derived provided (β0 + s1, β1 + s2) lie in the natural parameter space (true if the space is open and both s1 and s2 are sufficiently close to zero; see Knight, 2000).
This formula can be extended to the k-dimensional case in a straightforward manner. The inversion of the characteristic function will be carried out, in principle, in the same way as in the 2-dimensional case.
Regression models with continuous underlying error distributions (e.g., a model with a gamma error distribution) can be handled using the FFT in the same manner as the models considered in this chapter so far. In such cases, numerical inversion of the characteristic function can be used to obtain numerical estimates of the distribution function. This is accomplished by constructing a finite Fourier series (discrete Fourier transform) that approximates the density over a specified finite interval, as described in §2.2.1. However, an additional source of error, namely the error resulting from discretizing the inherently continuous variable (also known as sampling error), would be introduced in such cases, and one has to use a very dense sampling scheme to minimize this error.
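As a concrete illustration of such a discretized inversion (my own sketch, not taken from the thesis), the standard normal density can be recovered from its characteristic function φ(t) = exp(−t²/2) by truncating the inversion integral to [−T, T] and applying a midpoint rule; the truncation point T and step dt are arbitrary choices that control the two error sources just described:

```python
from math import cos, exp, pi, sqrt

def density_from_cf(x, T=12.0, dt=0.01):
    """Approximate f(x) = (1/(2*pi)) * integral over [-T, T] of phi(t) e^{-itx} dt
    by a midpoint rule, for the N(0,1) characteristic function phi(t) = exp(-t^2/2)."""
    n = int(2 * T / dt)
    total = 0.0
    for j in range(n):
        t = -T + (j + 0.5) * dt
        total += exp(-t * t / 2) * cos(t * x)   # imaginary parts cancel by symmetry
    return total * dt / (2 * pi)

# compare the reconstruction against the closed-form N(0,1) density
for x in (0.0, 1.0, 2.0):
    print(x, density_from_cf(x), exp(-x * x / 2) / sqrt(2 * pi))
```

With a dense grid the agreement is excellent; coarsening dt or shrinking T makes the discretization and truncation errors visible.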
3.6 Extensions to truncated distributions
In this section we show that the framework we have developed allows exact inference in some interesting extensions. We will provide the theoretical detail for two such extensions and highlight that the implementation of the methods involves only a slight modification of the S-Plus programs that have been used and discussed in previous sections.
3.6.1 Truncated Poisson distribution, Pt(λ)
The truncated Poisson distribution (also known as the zero-truncated Poisson distribution) has important applications in some fields of study. For example, the information technology (IT) manager of a certain company may have a record of the number of reported computer crashes (failures) per defined time interval, say per month, for each user in his/her network domain. Typically, users who did not experience any crash will not report, and therefore the value zero will not be observed. The manager is interested in developing a model in an effort to pinpoint factors associated with the rate of failure. In this case, the counts of crashes Y may be assumed to follow a Poisson distribution conditional on Y ≥ 1. This distribution is formally defined as
P(Y = y) = λ^y / [y! (e^λ − 1)],  y = 1, 2, ...,
where λ is the rate of failure.
To see that this distribution belongs to the family of exponential families, we rewrite the probability mass function in the following form:
P(Y = y) = exp{ y log λ − log(e^λ − 1) − log y! }.
The moment generating function, and thus the characteristic function, of Y is easily obtained from first principles as
M_Y(s) = (e^{λ e^s} − 1) / (e^λ − 1),
and so,
φ_Y(t) = (e^{λ e^{it}} − 1) / (e^λ − 1).
Now let us consider the case of n independent, but not identically distributed, truncated Poisson random variables. As before, we will do so by, for example, imposing a parametric model relating the rate parameters to a fixed (measured) covariate. If we want to study one covariate, the model we might be interested to fit and test a hypothesis on could be

log(λᵢ) = β₀ + β₁xᵢ.
The parameter of interest will be the regression coefficient associated with the covariate, β₁. Exact inference is based on the same pair of sufficient statistics we have seen before, as can be justified by the following result: this is a 2-parameter exponential family, and T = (Σ Yᵢ, Σ xᵢYᵢ) = (T₀, T₁) is sufficient for θ = (β₀, β₁).
For the truncated Poisson regression model,

d(β₀, β₁) = Σ_{i=1}^n log(e^{λᵢ} − 1),   where λᵢ = e^{β₀+β₁xᵢ}.
In the i.i.d. case, the maximum likelihood estimator (MLE) for the mean parameter λ of a standard Poisson distribution is the sample mean, λ̂ = ȳ. For the zero-truncated Poisson, on the other hand, there is no closed-form solution for the MLE of λ. The MLE λ̂ satisfies

λ̂ / (1 − e^{−λ̂}) = ȳ.
This equation is often solved using some kind of iterative technique such as the Newton-Raphson or EM algorithm.
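The Newton-Raphson iteration for the score equation λ̂/(1 − e^{−λ̂}) = ȳ can be sketched in a few lines. The following fragment is written in Python rather than the thesis's S-Plus, and the function name is hypothetical:

```python
import math

def trunc_pois_mle(ybar, tol=1e-10, max_iter=100):
    """Solve lam / (1 - exp(-lam)) = ybar for lam by Newton-Raphson.

    Requires ybar > 1, since the truncated mean lam/(1 - exp(-lam))
    always exceeds 1.
    """
    lam = ybar  # starting value; the root lies below the sample mean
    for _ in range(max_iter):
        e = math.exp(-lam)
        f = lam / (1.0 - e) - ybar
        # derivative of lam/(1 - exp(-lam)) with respect to lam
        fprime = (1.0 - e - lam * e) / (1.0 - e) ** 2
        step = f / fprime
        lam -= step
        if abs(step) < tol:
            break
    return lam
```

For instance, a zero-truncated Poisson(2) has mean 2/(1 − e⁻²), and feeding that mean back in recovers λ = 2.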
Using the methodology outlined in this section, it turns out that conducting an exact hypothesis test in the truncated Poisson case poses no additional complexity - conceptual or computational.
3.6.2 Numerical example
As mentioned above, calculation of exact tail probabilities for a regression model based on a zero-truncated Poisson distribution is straightforward using the FFT approach, provided the correct characteristic function is evaluated at the appropriate range of Fourier frequencies. The S-Plus programme in Appendix E provides a function that returns exact tail area probabilities for this problem, and the following example shows the results in comparison with the ordinary Poisson model.
Suppose the following vectors of response (yp) and covariate values (xp) were available for analysis. One can think of the counts of this example as realizations from a Poisson process and the fixed covariate values as quantities measuring intensity of a risk factor.

> yp
[1] 3 4 4 5 4 7 9
> xp
[1] 0 0 1 1 1 2 2
When we analyzed this data set using an ordinary Poisson regression model, we obtained exact tail-area probabilities for testing the hypothesis β₁ = 0. When the same data set was analyzed assuming a zero-truncated Poisson model for the response vector, the following results were obtained:

$pval.two:
[1] 0.05655095
The complete conditional reference distribution of T₁ given T₀ = t₀ was obtained using FFT, assuming an underlying zero-truncated Poisson error distribution, and depicted in the figure below.
Two remarks are in order. First, the choice between a zero-truncated Poisson model versus a Poisson distribution that does not exclude zero is of course dependent upon knowledge of the data generating experiment. Second, although in this example the p-values are quite similar, there is a tendency for them to be different, and therefore these exact calculations could be more important in some situations than others.
Figure 3.2: Conditional probability mass function (pmf) of T₁|T₀ = t₀ for the zero-truncated Poisson regression model example obtained using FFT
3.6.3 Truncated binomial distribution, Bt(n, p)
For reasons similar to those which give rise to the need for a truncated Poisson distribution, the zero-truncated binomial distribution is sometimes used as a model. For example, consider a genetic trait which is not directly observable, but will cause a disease among a certain proportion of the individuals that have it. For families in which one member has the disease, it may be of interest to estimate the proportion p that has the genetic trait and test hypotheses related to this parameter of interest. Let Y be the number of members that have the trait in a family of n members where one has the disease (and thus also the trait). Since Y ≥ 1, the zero-truncated binomial distribution may be appropriate.
The zero-truncated binomial distribution is defined by

P(Y = y) = C(n, y) p^y (1 − p)^{n−y} / [1 − (1 − p)^n],   y = 1, 2, ..., n,

and it is a member of the exponential family, as can be seen from

P(Y = y) = exp{y log[p/(1 − p)] + n log(1 − p) − log[1 − (1 − p)^n] + log C(n, y)}.

Using the canonical link, let log[pᵢ/(1 − pᵢ)] = β₀ + β₁xᵢ. Then the characteristic function of Yᵢ is

φ_{Yᵢ}(t) = [(1 − pᵢ + pᵢe^{it})^{nᵢ} − (1 − pᵢ)^{nᵢ}] / [1 − (1 − pᵢ)^{nᵢ}].
The above result suggests that exact inference can be carried out with a slight modification of the program used for the usual logistic regression, a modification that changes only the form of the characteristic function.
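As a concrete illustration of the modified characteristic function, the sketch below (Python standing in for S-Plus; function names hypothetical) gives the zero-truncated binomial pmf and its characteristic function E[e^{itY}] = [(q + pe^{it})^n − q^n]/(1 − q^n), with q = 1 − p:

```python
import cmath
import math

def tbinom_pmf(y, n, p):
    # zero-truncated binomial: ordinary binomial pmf renormalized over y >= 1
    q = 1.0 - p
    return math.comb(n, y) * p ** y * q ** (n - y) / (1.0 - q ** n)

def tbinom_cf(t, n, p):
    # E[exp(itY)] = ((q + p*exp(it))**n - q**n) / (1 - q**n)
    q = 1.0 - p
    return ((q + p * cmath.exp(1j * t)) ** n - q ** n) / (1.0 - q ** n)
```

The pmf sums to one over y = 1, ..., n, and the characteristic function equals one at t = 0, as any characteristic function must.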
3.7 Analysis of error bounds
3.7.1 Sources of errors
When the FFT is used to speed up computations in numerical inversions of characteristic functions of real random variables, three types of errors may be introduced:
1. sampling error resulting from evaluating the integrand in the Fourier transform (equation 2.1) only at specific points, i.e., approximating an integral by a sum;
2. truncation error introduced by truncating the Fourier series at a finite number of points, i.e., neglecting the integral for frequencies outside a specified range;
3. round-off error introduced by less precise computer arithmetic.
The first source of error, which occurs when a continuous response random variable is involved, can be controlled by the interval between samples. The second source of error, which can occur in both continuous and discrete cases, is controlled by the number of samples taken. The first error is reduced by reducing the width of the intervals, and the second by increasing N, the input size, while keeping the interval width fixed. The round-off error should be negligible with the use of greater numerical precision, e.g., double precision.
In Chapter 2, we have seen that for an integer-valued random variable with support set 0, 1, 2, ..., N − 1 the characteristic function is given by

φ_X(t) = Σ_{k=0}^{N−1} p_k e^{itk},

where p_k = P(X = k) is the probability mass function (pmf). When the characteristic function was known but not the probabilities p_k, φ_X was evaluated at N equally spaced values in the interval [0, 2π), and the resulting sequence c_m, defined by

c_m = φ_X(2πm/N),   m = 0, 1, ..., N − 1,

was used as the input for the FFT, and the probabilities

p_k = (1/N) Σ_{m=0}^{N−1} c_m e^{−2πimk/N}

were recovered exactly.
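The recovery recipe above can be sketched directly; the following Python fragment (Python in place of S-Plus, names hypothetical, and a plain unoptimized DFT standing in for the FFT) samples a known characteristic function at the N Fourier frequencies and inverts it, recovering a Binomial(5, 0.3) pmf exactly:

```python
import cmath
import math

def recover_pmf(cf, N):
    # Sample the characteristic function at the N Fourier frequencies
    # 2*pi*m/N and apply the inverse discrete Fourier transform:
    #   p_k = (1/N) * sum_m cf(2*pi*m/N) * exp(-2*pi*i*m*k/N)
    c = [cf(2.0 * math.pi * m / N) for m in range(N)]
    return [
        sum(c[m] * cmath.exp(-2j * math.pi * m * k / N) for m in range(N)).real / N
        for k in range(N)
    ]

# Binomial(5, 0.3) has bounded support {0,...,5}, so N = 6 recovers it exactly.
n, p = 5, 0.3
cf = lambda t: (1.0 - p + p * cmath.exp(1j * t)) ** n
probs = recover_pmf(cf, n + 1)
```

Because the support is bounded, no truncation error enters here; the recovered probabilities agree with the binomial formula to machine precision.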
There are, however, several useful discrete probability distributions with countably infinite support. The geometric and Poisson probability distributions are two such examples. On the other hand, the FFT, being merely a fast algorithm for the discrete Fourier transform (DFT), works with finite input sequences. It would, therefore, be useful to explore the effect of truncating a non-finite sequence to N finite sample points. As we will see a little later, in some cases it is possible to obtain analytic expressions which show how the error bound depends on the length of the input sequence N and possibly other parameters. In many other situations, it may not be possible to derive analytic solutions, but empirical results may be used to shed light on the magnitude of the inaccuracy resulting from truncation error.
The effect of truncation error in the FFT can be investigated by considering the points c_m, for m = 0, 1, ..., N − 1:

c_m = φ_X(2πm/N) = Σ_{k=0}^∞ p_k e^{2πimk/N} = Σ_{k=0}^{N−1} d_k e^{2πimk/N},

where e^{2πim(k+hN)/N} = e^{2πimk/N}, for h an integer, is used to obtain the second equality, and, for k = 0, ..., N − 1,

d_k = Σ_{h=0}^∞ p_{k+hN}.

It can be seen that the inverse transform of c_m = φ_X(2πm/N) will yield d₀, ..., d_{N−1}; each d_k is the true value p_k plus the error

e_k = Σ_{h=1}^∞ p_{k+hN}.
The error term can be minimized by making N sufficiently large, and one needs to experiment in order to find an optimal value of N that minimizes the error.
3.7.2 Error in the Geometric distribution
The geometric (or Pascal) distribution arises in situations where interest lies in the number of failures before the first success occurs. It is a special case of the negative binomial distribution, the latter being a distribution that describes the number of failures before the first r successes. The geometric distribution has unbounded support, and its probability mass function is given by

p_k = p(1 − p)^k,   k = 0, 1, 2, ....

The error term for p_k is given by

e_k = Σ_{h=1}^∞ p(1 − p)^{k+hN} = p(1 − p)^k (1 − p)^N / [1 − (1 − p)^N].

The percent error term for p_k is

100 e_k / d_k = 100 (1 − p)^N.

The error is less than r = 0.01 if

(1 − p)^N ≤ r,   i.e.,   N ≥ log(r) / log(1 − p).
If p = 0.5, then the input size N should be 7. In general, the size of N depends on the rate of decay of the probability mass function under study. Note that for the geometric distribution the percent error is constant for all values of the support set, i.e., does not depend on k. We confirmed this result numerically using the S-Plus program given in Appendix F, as can be seen from the following output.
The above output shows the percent error from using the FFT with N = 7 sample points to compute the probability mass function of a geometric distribution with parameter 0.5. As shown theoretically, the error, which worked out to be under 1%, is uniform across the support of the random variable. The "geom.fft" function computes geometric probabilities at the sampled points, and the S-Plus function "dgeom" is used to generate the exact probabilities from a geometric distribution with parameter p = 0.5.
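The constant percent error 100(1 − p)^N can also be checked outside S-Plus. The following Python sketch (hypothetical names, a plain DFT standing in for the FFT) inverts the geometric characteristic function p/(1 − (1 − p)e^{it}) on N points and computes the pointwise percent errors:

```python
import cmath
import math

def geom_fft_probs(p, N):
    # Invert the geometric cf p / (1 - (1-p)*exp(it)) sampled at N Fourier
    # frequencies; truncation folds the infinite tail back onto {0,...,N-1}.
    q = 1.0 - p
    c = [p / (1.0 - q * cmath.exp(2j * math.pi * m / N)) for m in range(N)]
    return [
        sum(c[m] * cmath.exp(-2j * math.pi * m * k / N) for m in range(N)).real / N
        for k in range(N)
    ]

p, N = 0.3, 7
dk = geom_fft_probs(p, N)
exact = [p * (1.0 - p) ** k for k in range(N)]
pct = [100.0 * abs(e - d) / d for e, d in zip(exact, dk)]
# every entry equals 100*(1 - p)**N, about 8.24 for p = 0.3 and N = 7
```

The same constant error, around 8.24%, appears at every support point, matching the theoretical result that the percent error does not depend on k.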
The geometric distribution was useful in checking, both analytically and numerically, the degree of truncation error that is introduced when the FFT is used for varying input sizes. For the Poisson distribution, on the other hand, it is easier to investigate the truncation error empirically.

Suppose we are interested in recovering probabilities corresponding to a Poisson distribution with mean parameter 3, i.e., the random variable Y ~ Poisson(3). The S-Plus output below shows pointwise relative percent errors corresponding to different input sizes, N.
> pk <- dpois(0:10,3)
> pk1 <- pois.fft(11,cf.pois(11,3))
> ek <- 100*abs((pk-pk1)/pk1)
> round(ek,digits=2)
 [1] 0.44 0.04 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> pk <- dpois(0:6,3)
> pk1 <- pois.fft(7,cf.pois(7,3))
> ek <- 100*abs((pk-pk1)/pk1)
> round(ek,digits=2)
 [1] 30.26 5.15 1.19 0.36 0.13 0.05 0.03
If we assume a different mean parameter, say λ = 2, and still use an input size of 7, the pointwise errors become:

> pk <- dpois(0:6,2)
> pk1 <- pois.fft(7,cf.pois(7,2))
> ek <- 100*abs((pk-pk1)/pk1)
> round(ek,digits=2)
 [1] 2.48 0.32 0.07 0.02 0.01 0.00 0.00
From the above results we notice that the error depends, of course, on the mean parameter λ and decays as the support value, k, gets large. For the geometric distribution, the dependence of the error was entirely on the parameter p, not the support k. One more example will make this very clear. If Y ~ Geometric(0.3), what would be the pointwise relative percent errors if probabilities are calculated using the FFT on an input size of 7? The answer is given in the following S-Plus output.
> pk <- dgeom(0:6,.3)
> pk1 <- geom.fft(7,cf.geom(7,.3))
> ek <- 100*abs((pk-pk1)/pk1)
> round(ek,digits=2)
 [1] 8.24 8.24 8.24 8.24 8.24 8.24 8.24
The above results show that if the input size to the FFT is N = 7, probabilities for a Geometric(0.3) distribution are recovered with 8.24% relative error, regardless of the particular value of the support of the random variable.
3.7.3 Error in the Poisson distribution
For the Poisson distribution, the dependence of the error term on k (the support), λ (the mean parameter) and N (the FFT input size) can be summarized using sums involving the Poisson probability mass and cumulative distribution functions.
The probability mass function for a Poisson(λ) random variable is defined by

p_k = e^{−λ} λ^k / k!,   k = 0, 1, 2, ....

The error term in the finite FFT (i.e., based on sequences of Fourier frequencies of size N) is given by

e_k = Σ_{h=1}^∞ e^{−λ} λ^{k+hN} / (k + hN)!.

To simplify this expression for the error term further, we will use a change of variable technique. Let y = k + hN. Then the limits of summation in y go from k + N to ∞ by increments of N. We then obtain

e_k = Σ_{y=k+N, k+2N, ...} f_{Y|λ}(y) ≤ 1 − F_λ(k + N − 1),

where F_λ is the cumulative distribution function of a Poisson(λ) random variable. Note that y can only take on values in the set of positive integers since N is a natural number and k ∈ {0, 1, ..., N − 1}.
The percent error for p_k is bounded by

100 e_k / p_k ≤ 100 [1 − F_λ(k + N − 1)] / f_{Y|λ}(k),

where the notation f_{Y|λ}(·) is used here to denote the probability mass function of a Poisson(λ) random variable. In particular, the percent error in computing P(X = 0) = p₀ is bounded by

100 [1 − F_λ(N − 1)] / f_{Y|λ}(0) = 100 e^λ [1 − F_λ(N − 1)].
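The aliasing error e_k can also be computed directly by summing the folded tail. The Python sketch below (hypothetical names; Python in place of S-Plus) reproduces the 30.26% relative error at k = 0 for λ = 3 and N = 7 seen in the output above:

```python
import math

def pois_pmf(k, lam):
    # log-space evaluation avoids overflow for large k
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def alias_error(k, lam, N, terms=50):
    # truncation (aliasing) error e_k = sum over h >= 1 of pmf(k + h*N)
    return sum(pois_pmf(k + h * N, lam) for h in range(1, terms + 1))

lam, N = 3.0, 7
e0 = alias_error(0, lam, N)
# relative percent error at k = 0, measured against the recovered value d_0
pct = 100.0 * e0 / (pois_pmf(0, lam) + e0)
```

Here pct comes out to roughly 30.26, in agreement with the S-Plus output, and the error shrinks rapidly as k grows since the folded tail starts further out.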
The graph below shows the relative error in percentages of the FFT truncation as a function of the input size N for λ = 3, λ = 5, and λ = 7.
In this chapter we showed that a method based on the theory of the Fourier transform and implemented using the fast Fourier transform (FFT) can be fruitfully used to compute exact tail area probabilities in the class of generalized linear models (GLMs). This class of models is suitable for analyses involving inversion of characteristic functions since the joint characteristic function for a vector of sufficient statistics can be explicitly derived relatively easily.
In particular we have illustrated, with numerical examples, that exact inferences can be carried out for hypotheses of interest in two popular regression models - the logistic and Poisson models.

Figure 3.3: Relative error in percentages of the FFT for calculating P(X = 0) for different input sizes N and parameter values λ = 3, λ = 5, and λ = 7.

We have also extended the FFT framework to weighted exponential family models and considered two practical examples - the truncated Poisson and binomial models. We have outlined the general approach for any generalized linear model, including models based on a continuous response vector. Finally, we discussed different sources of errors in any FFT-based approach and derived some crude error bounds for some discrete distributions.
Chapter 4
Sensitivity analysis
4.1 Introduction
Models are simplified representations of a data generating mechanism, with an inherent certainty of departures from model assumptions. Model misspecification may arise in several ways. For example, in a generalized linear models context one may misspecify the underlying error distribution, the link function, or the structural form of the systematic component, or any combination of these. Also, the exposure of the observed data to contamination that may be caused by transcription error or any other misclassification error could have a huge impact on results of statistical inference. For these reasons, it would be useful to investigate whether the important aspects of the model remain stable or robust under some perturbations. In this chapter we use the FFT-based framework to investigate sensitivity of exact tail area probabilities to some of the possible mis-specifications outlined above.
4.2 Robustness
The literature on robustness has grown steadily over the past quarter of a century. One early work is the Princeton robustness study by Andrews et al. (1972). Several authors studied robustness of statistical inference techniques from different perspectives, and as a result one may find a slightly different interpretation and definition for robustness.

Robustness involves protection against model misspecification and resistance of inference conclusions to spurious observations (Lindsey, 1996). Robust estimation aims at characterizing how point estimators of population parameters behave when there are errors in the data (Horowitz and Manski, 1997).
One way of accommodating extreme (outlying) observations is to use distributions with thicker tails, in which case a member of the so-called stable distributions would often be suitable (Lindsey, 1996). Alternatively, a finite mixture distribution might be used to model contaminated observations. An m-component mixture distribution can be expressed as

f(y) = Σ_{i=1}^m πᵢ fᵢ(y),

where Σ πᵢ = 1. In practice m will be small, say 2 or 3.
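A two-component contamination model of this kind can be sketched as follows (Python; the component choices are illustrative and not taken from the thesis):

```python
import math

def mixture_pmf(y, components):
    # components: list of (weight, pmf) pairs with weights summing to one
    return sum(w * f(y) for w, f in components)

def pois(lam):
    # Poisson pmf in log space to avoid overflow for large y
    return lambda y: math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

# mostly 'acceptable' Poisson(2) counts, occasionally extreme Poisson(10) ones
contaminated = [(0.95, pois(2.0)), (0.05, pois(10.0))]
```

The mixture weights play the role of the embedded parameters π₁, ..., π_m described in the text.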
This approach is an example of embedding, with the πᵢ's being the embedded parameters. As an illustration, suppose we have a mixture of 'acceptable' observations together with a few 'unacceptable' or extreme ones. If a model suitable for the central observations is chosen as the data generating mechanism, one would hope the corresponding πᵢ's to be large. A formal test of hypothesis can be conducted to determine the plausibility of a model with πᵢ = 0 for the contaminated observations.
In conducting exact inference, however, not much is known about the influence of contamination and/or model misspecification on computed p-values. For generalized linear models, the vector of sufficient statistics that resulted from a canonical parameterization will no longer be sufficient under the perturbed model. In the absence of any theoretical framework, it seems reasonable to introduce a known perturbation factor and use the same conditional argument to test hypotheses on parameters of interest in a "what-if" kind of approach.
4.3 Misclassification Errors
In binary regression, contamination in y can only take the very simple form of a transposition error between 0 and 1 (i.e., 0 → 1 or 1 → 0).

Suppose that such transpositions happen with a small probability γ, so that the actual recorded response y is governed by a probability p* instead of p,

p* = γ(1 − p) + (1 − γ)p = γ + (1 − 2γ)p,

where p is given by the assumed model and γ (< 0.5) denotes the probability of a transcription error (Copas, 1988; Collett, 1991, section 5.7.3).
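The perturbed success probability p* = γ + (1 − 2γ)p can be coded directly (a Python one-liner; the function name is illustrative):

```python
def contaminate(p, gamma):
    # P(recorded response = 1) under transposition error:
    # p* = gamma*(1 - p) + (1 - gamma)*p = gamma + (1 - 2*gamma)*p
    return gamma + (1.0 - 2.0 * gamma) * p
```

Note that γ = 0.5 forces p* = 1/2 regardless of the model probability p, which is the degenerate case discussed later in connection with the last row of Table 4.1.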
For example, an outlier with y = 1 and p near zero can be explained as a transcription error with probability p* ≈ γ > 0 rather than as an extremely unusual response from the basic model. Transcription error occurs in many practical situations. In studies on prostate cancer, for example, an error in diagnosing whether or not a patient had nodal involvement may result in a transcription error.
Different interpretations could be given to this contaminated model. For example, it can be interpreted as a family of transformations indexed by a transformation parameter γ > 0, with the pure logistic model being obtained when γ = 0.

The above model can be fitted using the method of maximum likelihood by maximizing the log-likelihood function given by

ℓ(β, γ) = Σᵢ { yᵢ log pᵢ* + (1 − yᵢ) log(1 − pᵢ*) }.
A better understanding of the approach to be taken in fitting this model can be gained upon re-writing the model in equation (4.1) in a form which shows that the link function is no longer the logistic link but one that includes the additional unknown parameter γ. Statistical packages that allow user-specified link functions (e.g. GLIM) can be used to fit the model. One often fits this model for a range of values of γ, and the model for which the deviance is minimized is adopted as the one that describes the data well (Collett, 1991).
Similarly, here we investigate the sensitivity of exact tail probabilities to the type of misclassification described above by considering different values of γ using the FFT method (Appendix C).
4.3.1 Numerical Example
Suppose in a certain exposure-response layout experiment the observed vectors of response, group size, and covariate were (2,4,6), (10,12,14), and (0,1,2), respectively. A test of the hypothesis H₀ : β₁ = 0 under a logistic fit with no assumed misclassification results in the following tail probabilities.
In this case, introduction of some degree of contamination did not affect the exact p-values. If, however, the null hypothesis is different from 0, the exact results are affected. Table 4.1 summarizes 'exact' tail probabilities for a range of values of γ for testing H₀ : β₁ = 1. The table also shows the marginal probability being used in the computation of the conditional distribution of T₁|T₀ = t₀. As discussed before, monitoring the value of this marginal probability is vital to make sure that strange conditional probabilities were not obtained as a result of division by a number sufficiently close to zero.
As can be seen from the above table, a significant change in the marginal as well as conditional probabilities could result as the degree of contamination increases, which in turn influences the 'exact' tail probabilities. Note that the last row of Table 4.1 corresponds to p* = 1/2, which in the conditional sense is equivalent to testing the hypothesis β₁ = 0 with no misclassification in the data. Hence, all of the probabilities of interest shown in this row match the values given previously.

Table 4.1: Sensitivity of exact p-values to misclassification error
γ   P(T₀ = t₀)   Left 1-tail p-value   Right 1-tail p-value   2-tail p-value
4.4 Mis-specification of link function
In a typical medical application the outcome of interest, y, is a binary random variable which can be coded as 1/0 representing a patient's response to treatment (e.g. cure/no cure, alive/dead), and the explanatory variable, x, is a vector of treatment and patient characteristics (e.g. dose, age, sex).
Let p = P(y = 1 | x). A binary regression model for a p-component vector x of covariates asserts that

P(y = 1 | x) = F(x′β),

where F is a given response function and β is a p-vector of regression coefficients. As outlined in Chapter 3, commonly used link functions in binary regression models include

F(u) = e^u / (1 + e^u)  →  logistic regression
F(u) = Φ(u)  →  probit regression
F(u) = 1 − exp(−exp(u))  →  complementary log-log regression
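The three response functions can be written down directly; a Python sketch (function names illustrative):

```python
import math

def logistic(u):
    return math.exp(u) / (1.0 + math.exp(u))

def probit(u):
    # standard normal cdf, written via the error function
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def cloglog(u):
    # complementary log-log response function
    return 1.0 - math.exp(-math.exp(u))
```

All three map the real line monotonically onto (0, 1); the logit and probit are symmetric about u = 0, while the complementary log-log is asymmetric (its value at u = 0 is 1 − e⁻¹ ≈ 0.632 rather than 0.5).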
We have previously discussed that for a binary regression model the canonical link is the logit function; when the link is not canonical, the conditional argument which allowed us to conduct exact inference for a given parameter of interest - by conditioning its corresponding sufficient statistic on another set of sufficient statistics associated with the parameters we are not interested in (nuisance, incidental parameters) - will not be valid. For example, for a logistic model with an intercept (nuisance) and slope (parameter of interest) parameter, we have seen that the pair of sufficient statistics are simple linear functions involving the response and the explanatory variable.

A question that may be worth asking and exploring is whether this pair of sufficient statistics can be "reasonably sufficient" to be used as a basis for inference under other link functions. In other words, to what extent would the tail probabilities change if the same statistics as used in logistic regression were to be used with, say, probit or complementary log-log regression models?
4.4.1 Testing for β₁ = 0
Binary regression: Logit link
Consider a simple binary regression model based on 28 patients and a response rate of 13/28 (46%). Suppose also the values of the explanatory variable x can take on only 3 values, namely, 0, 1, and 2. The conditional distribution of T₁ given T₀ = t₀ obtained from fitting a simple linear logistic model is distributed on the support t₁ ∈ {4, 5, ..., 20}. This is summarized in the following summary output (to four decimal places). The arguments to the S-Plus function fft2d.np are the response vector y = (3,4,6), the sample size vector n = (10,10,8) and the covariate vector x = (0,1,2).
The observed value of (T₀, T₁) is (13,16), and the left, right, and two-tail probabilities (to 4 decimal places) are 0.9819, 0.0539, and 0.1079, respectively. For the same reason as in the case of misclassification, changing the link function from logit to probit or complementary log-log did not affect the 'exact' probabilities in testing the hypothesis β₁ = 0.
4.4.2 Testing for H₀ : β₁ = c (c ≠ 0)
Suppose we are testing a null hypothesis different from zero. The following S-Plus output gives the conditional distribution of T₁ given T₀ = t₀ obtained under the null hypothesis H₀ : β₁ = 0.5 using the example data introduced in the preceding section. The same pair of canonical sufficient statistics that have been used so far are being used with both canonical and non-canonical link functions. At best, the interpretation one may give to the probabilities in the non-canonical cases would be to think in terms of "approximate sufficiency".
Inspection of the output below shows that for the logit link (the canonical link in this example), the conditional distribution has non-zero (to four decimal places) probability on the support set t₁ ∈ {6, 7, ..., 21}. For the probit and complementary log-log link functions, the support set associated with non-zero probabilities is t₁ ∈ {8, 9, ..., 21}. The probability mass differs across these three link functions in a non-systematic way. However, for this example, the right hand one-tailed probability increased in the probit and complementary log-log regressions (Table 4.2).
Conditional distribution based on logit, probit, and complementary log-log link functions

> fft2d.np(y=c(3,4,6),n=c(10,10,8),x=c(0,1,2),b1=0.5,link=3)
$link:
[1] "Complementary Log-Log"
$cond.pr:
 [1] 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
 [9] 0.0000 0.0002 0.0012 0.0047 0.0150 0.0392 0.0832 0.1428
[17] 0.1953 0.2088 0.1688 0.0976 0.0363 0.0067 0.0000 0.0000
[25] 0.0000 0.0000 0.0000 0.0000 0.0000
The left, right, and two-tail probabilities, along with the marginal probability P(T₀ = t₀), obtained under the three link functions are summarized in Table 4.2. The framework, which emphasizes using explicitly known characteristic functions to obtain unknown distribution functions, would also allow investigation of the effects of covariate discretization. This is helpful in its own right since in many practical cases it is common to use a 'categorized' version of an inherently continuous variable. Age and income are two such examples. See also Baglivo et al. (1996) for an interesting application of generating functions for sensitivity analysis in the context of permutation distributions.
Table 4.2: Sensitivity of exact p-values to link misspecification
Link   P(T₀ = t₀)   Left 1-tail p-value   Right 1-tail p-value   2-tail p-value
4.5 Summary
In this Chapter we presented results of an empirical investigation on the sensitivity of 'exact' tail area probabilities to some departures from ordinary model assumptions. In particular, we discussed the possibility of misclassification error in the context of a binary regression model and showed that tests of hypotheses on the slope parameter away from zero could result in a marked change in the p-values depending upon the extent of the misclassification rate. Although the sufficient statistics, which are simple linear combinations of the response and covariate values, are no longer sufficient when the link function is not canonical, we have attempted to see if they can still be justified as "approximate sufficient statistics" and in what way the conditional distribution used to make inference is affected by using them. Our empirical investigation shows that the exact p-values obtained assuming a logit link are unaffected when the link is changed to either probit or complementary log-log when testing the null hypothesis H₀ : β₁ = 0.
Chapter 5
Alternative approaches
5.1 Introduction
Broadly speaking, a given statistical inference problem may be approached in one of two ways: exactly or approximately. One obvious drawback of methods for exact inference is that the size of the data that can currently be processed efficiently is limited. When the sample size is small to moderate, exact methods are feasible and should be used as much as possible. However, in practice, one may have a situation where the sample size is somewhere in the middle, big enough to cause computational burden for exact methods but small enough to make the traditional central limit theorem based approach perform poorly.
To overcome this problem, several suggestions have been proposed in the literature in recent years. For instance, some sort of "hybrid" algorithm can be used in which an exact algorithm is applied to some part of a given problem while traditional large sample methods are applied to other aspects of the same problem. Monte Carlo approaches are also becoming useful in such problem solving techniques.
5.2 Small sample asymptotics
There are various asymptotic techniques in use in statistics (Barndorff-Nielsen and Cox, 1989). One such technique, which has been proved by many researchers to be very useful in the asymptotic theory of statistics, is based on saddlepoint approximation (Barndorff-Nielsen and Cox, 1979; Reid, 1988).
Approaches based on saddlepoint expansions are sometimes referred to as small-sample asymptotic methods. The name small-sample asymptotics might have been coined to highlight the fact that, in most practical cases people have been working on, these methods were found to provide extremely accurate results for approximating densities and distribution functions of statistics of interest even when the sample size is small. In problems involving continuous data, these methods are well-studied (Reid, 1988).
First introduced into the statistics literature by Daniels (1954), the method of saddlepoint approximation has been applied in a variety of problems arising in parametric analysis (Bedrick and Hill, 1992; Daniels, 1983, 1987). Saddlepoint approximations have also been used in nonparametric analyses, such as in approximating permutation distributions and in resampling methods (Booth and Butler, 1990; Davison and Hinkley, 1988).
A large-deviation expansion can be obtained for a conditional density (distribution) by separate approximation of numerator and denominator via saddlepoint expansion. This is called the double saddlepoint approximation (Barndorff-Nielsen and Cox, 1979). Davison (1988) used the double saddlepoint approximation and carried out conditional inference in generalized linear models with canonical link functions.

The other approach to using saddlepoint methods in conditional inference is using a single saddlepoint to directly approximate the conditional distribution. Skovgaard (1987) and Wang (1993) describe the usefulness of saddlepoint approximation in conditional inference. Skovgaard (1987) gives saddlepoint expansions for conditional probabilities of the form P(Ȳ ≥ y | X̄ = x) where (X̄, Ȳ) is an average of n independent bivariate random vectors. A general version corresponding to conditioning on a (p − 1)-dimensional linear function of a p-dimensional variable has also been shown.
5.3 Large sample results
Suppose inference concerns a parameter vector θ with p elements, and let the hypothesis of interest be

H₀ : θ = θ₀.

Let the vector of efficient scores be denoted by U(θ), with jth element U_j(θ) = ∂ℓ(θ)/∂θ_j. For sufficiently large sample size,

θ̂ ~ MVN_p(θ, I⁻¹(θ)),

where I(θ) denotes the Fisher expected information matrix, with (j, k)th element I_{jk}(θ) = E[−∂²ℓ(θ)/∂θ_j ∂θ_k]. Also, for sufficiently large sample size, U(θ) ~ MVN_p(0, I(θ)).
There are three asymptotically equivalent test statistics that are often used to make statistical inference.

1. Likelihood ratio statistic

X_L = 2[ℓ(θ̂) − ℓ(θ₀)] ~ χ²_p,

where ℓ(θ̂) is the value of the log-likelihood at the maximum likelihood estimate θ̂.

2. Score statistic

X_S = U(θ₀)ᵀ I⁻¹(θ₀) U(θ₀) ~ χ²_p.

An advantage of this test is that it does not require θ̂.

3. Wald test

X_W = (θ̂ − θ₀)ᵀ I(θ₀)(θ̂ − θ₀) ~ χ²_p.

An alternative version of this test replaces I(θ₀) by I(θ̂). For certain hypotheses in logistic regression, the Wald test can behave in an aberrant manner (Hauck and Donner, 1977).
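For a one-parameter illustration not taken from the thesis - y successes in n Bernoulli trials, testing H₀: p = p₀ - the three statistics reduce to closed forms; a Python sketch:

```python
import math

def lr_score_wald(y, n, p0):
    # One-parameter binomial illustration (requires 0 < y < n):
    # likelihood ratio, score, and Wald statistics for H0: p = p0.
    phat = y / n
    loglik = lambda p: y * math.log(p) + (n - y) * math.log(1.0 - p)
    lr = 2.0 * (loglik(phat) - loglik(p0))               # likelihood ratio
    score = (y - n * p0) ** 2 / (n * p0 * (1.0 - p0))    # score: U^2 / I(p0)
    wald = (phat - p0) ** 2 * n / (phat * (1.0 - phat))  # Wald with I(phat)
    return lr, score, wald
```

Note how the score statistic needs only p₀ (no MLE), while the Wald version here plugs the MLE into the information; all three are referred to a χ²₁ distribution and agree asymptotically but can differ noticeably in small samples.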
5.4 Applications to simple logistic regression
Consider the simple logistic regression model logit(pᵢ) = β₀ + β₁xᵢ. An exact conditional test of β₁ = 0 is based on conditional probabilities of the form

P_c = P(T₁ ≥ t₁ | T₀ = t₀),

where T₁ is the sufficient statistic corresponding to β₁, the parameter of interest, and T₀ the sufficient statistic for the nuisance parameter β₀.
5.4.1 The likelihood ratio test
The deviance or likelihood ratio statistic has the form

W = 2[ℓ(β̂₀, β̂₁) − ℓ(β̃₀, 0)],

where ℓ is the log-likelihood, the full model includes both β₀ and β₁, and the reduced model sets β₁ to zero. For large n, the statistic W follows a χ²₁ distribution.

The signed likelihood ratio test statistic w = sign(β̂₁)√W has an approximate standard normal distribution, and P(w < w₀ | β₁ = 0) can be used to approximate the conditional tail probability P_c.
5.4.2 The Wald test
The Wald chi-square statistic in this simple case has the simple form

X_W = β̂₁² / var(β̂₁),

and has a χ²₁ asymptotic distribution. Its signed square root,

z = β̂₁ / se(β̂₁),

has an approximate standard normal distribution. The conditional probability P_c is approximated by P(z < z₀ | β₁ = 0). Hypotheses involving non-zero null values can be formulated and tested similarly.
5.4.3 The double saddlepoint approximation
A double saddlepoint approximation to the conditional tail probability P_c is given by (Davison, 1988)

P_c ≈ 1 − Φ(w) − φ(w)(1/w − 1/t),

where Φ and φ are the standard normal cumulative distribution and density functions, respectively; w is the signed likelihood ratio test statistic; and t is the signed Wald chi-square statistic.
The quantity denoted by ρ measures the ratio of information for the nuisance parameter in the full and reduced models (Platt, 2000); in its expression, i₂₂ refers to the block of the variance-covariance matrix of the β parameters corresponding to β₁.
Some modifications to the general saddlepoint approximations are often made
for random variables distributed on a lattice (Skovgaard, 1987); these are in-
tended to serve the same purpose as the traditional continuity corrections used,
for example, in approximating a binomial variable by a normal. In
the simple logistic regression case, the modification is to replace z in equation (5.5)
with a continuity-corrected version.
The error of the saddlepoint approximation to the exact conditional tail prob-
ability is O(n^{-3/2}), whereas the unconditional approximations based on the Wald or
the likelihood ratio test have error O(n^{-1/2}), where n is the total sample size.
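The practical payoff of the O(n^{-3/2}) rate can be seen in a case where the exact answer is available. The sketch below applies a Lugannani-Rice-type saddlepoint tail approximation with a lattice-corrected u, of the kind discussed by Skovgaard (1987), to a binomial tail; this is our illustrative stand-in for the conditional problem, not the thesis's own computation:

```python
import math

def binom_tail_saddlepoint(n, p, x):
    """Saddlepoint (Lugannani-Rice) approximation to P(X >= x) for
    X ~ Binomial(n, p), with x above the mean n*p, using the
    lattice-corrected u.  CGF: K(t) = n*log(1 - p + p*e^t)."""
    s = math.log(x * (1.0 - p) / (p * (n - x)))     # saddlepoint: K'(s) = x
    K = n * math.log(1.0 - p + p * math.exp(s))
    K2 = x * (n - x) / n                            # K''(s) at the saddlepoint
    w = math.copysign(math.sqrt(2.0 * (s * x - K)), s)
    u = (1.0 - math.exp(-s)) * math.sqrt(K2)        # lattice correction
    Phi = 0.5 * (1.0 + math.erf(w / math.sqrt(2.0)))
    phi = math.exp(-0.5 * w * w) / math.sqrt(2.0 * math.pi)
    return 1.0 - Phi - phi * (1.0 / w - 1.0 / u)
```

Even at n = 30 the approximation is typically within about one percent of the exact tail, consistent with the error rates quoted above.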
In this chapter, we presented inferential methods with varying degrees of deviation
from exact analysis. At one extreme, the traditional first-order methods (e.g., the Wald
and likelihood ratio tests) rely heavily on having a sufficiently large sample and
only provide approximations of O(n^{-1/2}). A better, higher-order approximation can
be carried out by using the so-called small-sample asymptotic methods. These
techniques, based on a saddlepoint expansion, provide approximations of O(n^{-3/2})
and can be suitable for cases in which the computational burden of carrying out
exact inference is prohibitive.
Chapter 6
Summary and discussion
Samples whose effective size is small or moderate arise very commonly in a variety
of practical applications, making small-sample inference one of the major themes of
research in statistics. In such cases, traditional large-sample methods may give less
accurate and often misleading results. Therefore, techniques which can be used to
compute exact p-values should be used as much as possible. Currently there are a
few algorithms and commercial software packages (usually at a peak price!) which address
the problem of small-sample inference. Some of the available canned packages are
optimized to address a specific problem (e.g., only a logistic regression model).
In this thesis we have presented a unified approach taking advantage of the
unique one-to-one correspondence between the distribution function of a random vari-
able and its characteristic function. Exact inference is based on the conditional
distribution and density functions of statistics sufficient for a parameter of interest,
given ancillary quantities.
We show that for a wide class of regression models, the discrete Fourier trans-
form (DFT) can be used along with the fast Fourier transform (FFT) to go from
the characteristic function domain back to the probability distribution domain. In
problems involving discrete random variables with finite support, the inversion of
the characteristic function is done exactly on a properly chosen finite grid of Fourier
sequences. If discrete variables with infinite support are involved, one can still get
exact results by letting the grid size grow sufficiently large. For example, for a
random variable that has a Poisson distribution, the truncation error that may be
caused by using a finite input size decays quickly (exponentially), and exact results are
obtained within a reasonable maximum grid size.
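The inversion step can be illustrated outside S-Plus as well. The sketch below (Python; a plain O(T²) DFT is used so the whole computation is visible — the FFT produces identical numbers faster) recovers Poisson probabilities from the characteristic function on a T-point grid; the aliasing (truncation) error is negligible once T comfortably exceeds the bulk of the support:

```python
import cmath
import math

def poisson_pmf_via_fft(lam, T):
    """Recover Poisson(lam) probabilities on {0, ..., T-1} by evaluating the
    characteristic function exp(lam*(e^{it} - 1)) at the T Fourier
    frequencies t_j = 2*pi*j/T and inverting with a DFT."""
    cf = [cmath.exp(lam * (cmath.exp(1j * 2 * math.pi * j / T) - 1))
          for j in range(T)]
    # inverse DFT: p_k = (1/T) * sum_j cf[j] * e^{-2*pi*i*j*k/T}
    probs = []
    for k in range(T):
        s = sum(cf[j] * cmath.exp(-2j * math.pi * j * k / T) for j in range(T))
        probs.append((s / T).real)
    return probs
```

With λ = 2 and T = 32, the recovered probabilities agree with exp(−λ)λ^k/k! essentially to machine precision, since the mass beyond the grid, P(X ≥ 32), is astronomically small.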
We have demonstrated that tests of hypotheses of interest for typical bioassay-type bi-
nary regression models and a Poisson regression model involving one discrete covari-
ate can easily be conducted within a widely available statistical software package (S-Plus).
The general case of p covariates does not pose any problem in principle, but the storage
and processing time of the high-dimensional grid in the DFT could become a burden.
This is expected to improve with continuing increases in computing power.
We have also highlighted that exact inference can be made for any member of
the class of generalized linear models (GLMs), including continuous response mod-
els such as a model based on an exponential distribution. The FFT approach has
an additional source of error, called sampling error, in cases involving a continuous
outcome variable. This is because the DFT is an approximation of the continu-
ous Fourier transform (the error incurred when a continuous function is numerically
integrated using sums).
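This extra sampling error can be made concrete: below, the Exponential(1) density is recovered by a Riemann-sum inversion of its characteristic function φ(t) = 1/(1 − it) (Python; the grid limits t_max and dt are illustrative choices, not the thesis's). Unlike the discrete case, both truncating the integral and discretizing it contribute error, which shrinks as t_max grows and dt shrinks:

```python
import cmath
import math

def exp_density_via_cf(x, t_max=500.0, dt=0.02):
    """Approximate the Exponential(1) density at x > 0 by numerically
    inverting its characteristic function phi(t) = 1/(1 - i*t):
        f(x) = (1/2*pi) * Integral phi(t) * e^{-i*t*x} dt,
    using a midpoint Riemann sum over [-t_max, t_max]."""
    total = 0.0 + 0.0j
    n = int(2 * t_max / dt)
    for j in range(n):
        t = -t_max + (j + 0.5) * dt
        total += cmath.exp(-1j * t * x) / (1 - 1j * t)
    return (total * dt / (2 * math.pi)).real
```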
In a simple logistic regression model, the presence of misclassification error
and/or misspecification of some features of the model does have a marked influence
on 'exact' p-values obtained using the same conditional argument of conditioning the
sufficient statistic of a parameter of interest on the sufficient statistic corresponding
to a nuisance parameter. This impact is to be expected since the sufficiency based
on a pair of statistics that are linear combinations of the response variable (and, as
a result, the conditioning principle) will break down when we consider a non-
canonical form for a binary regression. However, the formulation that was put in
place using characteristic functions allows us to do sensitivity analysis to get
some insight into what actually happens.
In general, exact approaches provide more reliable p-values than their asymp-
totic counterparts for cells with sparse data, at the expense of less efficiency (both
in terms of time and storage) for cells with sufficient data. This suggests that to
have the best of both worlds one should try using a mixture of exact and asymptotic
methods by using a hybrid approach.
Also, when exact inference is computationally prohibitive or when the sample
size is not large enough for the first-order methods (based on the central limit
theorem) to be valid, the so-called small-sample asymptotic methods, which are based
on saddlepoint approximations, could be useful alternatives. There has also been
rapid development of small-sample approaches based on simulation and resampling
techniques, including the bootstrap and Markov chain Monte Carlo.
Bibliography
[1] Agresti, A. (1992). A survey of exact inference for contingency tables (with discussion). Statistical Science, 7, 131-177.
[2] Andrews, D.F., Bickel, P.J., Hampel, F.R., Huber, P.J., Rogers, W.H. and Tukey, J.W. (1972). Robust Estimates of Location. Princeton University Press, NJ.
[3] Baglivio, J., Pagano, M. and Spino, C. (1996). Permutation distributions via generating functions with applications to sensitivity analysis of discrete data. J. Amer. Statist. Ass., 91, 1037-1046.
[4] Baker, R.J. (1977). Algorithm AS 112. Exact distributions derived from two-way tables. Appl. Statist., 26, 199-206.
[5] Barndorff-Nielsen, O.E. and Cox, D.R. (1979). Edgeworth and saddlepoint approximations with statistical applications (with discussion). J. Roy. Statist. Soc. B, 41, 279-312.
[6] Barndorff-Nielsen, O.E. and Cox, D.R. (1989). Asymptotic Techniques for Use in Statistics. Chapman and Hall, London.
[7] Barndorff-Nielsen, O.E. and Cox, D.R. (1994). Inference and Asymptotics. Chapman and Hall, London.
[8] Bedrick, E.J. and Hill, J.R. (1992). An empirical assessment of saddlepoint approximations for testing a logistic regression parameter. Biometrics, 48, 529-544.
[9] Booth, J.G. and Butler, R.W. (1990). Randomization distributions and saddlepoint approximations in generalized linear models. Biometrika, 77, 787-796.
[10] Bracewell, R.N. (1986). The Fourier Transform and its Applications (2nd ed.). McGraw-Hill, New York.
[11] Brigham, E.O. (1988). The Fast Fourier Transform and its Applications. Prentice-Hall, Englewood Cliffs, NJ.
[12] Collett, D. (1991). Modelling Binary Data. Chapman and Hall, London.
[13] Cooley, J.W. and Tukey, J.W. (1965). An algorithm for the machine calculation of complex Fourier series. Mathematics of Computation, 19, 297-301.
[14] Copas, J.B. (1988). Binary regression models for contaminated data. Journal of the Royal Statistical Society B, 50, 225-265.
[15] Cox, D.R. (1970). Analysis of Binary Data. Methuen, London.
[16] Daniels, H.E. (1983). Saddlepoint approximations for estimating equations. Biometrika, 70, 89-96.
[17] Daniels, H.E. (1987). Tail probability approximations. Int. Statist. Rev., 54, 37-48.
[18] Davison, A.C. (1988). Approximate conditional inference in generalized linear models. Journal of the Royal Statistical Society B, 50, 445-461.
[19] Davison, A.C. and Hinkley, D.V. (1988). Saddlepoint approximations in resampling methods. Biometrika, 75, 417-431.
[20] Finney, D.J. (1964). Statistical Method in Biological Assay. Charles Griffin, London.
[21] Fisher, R.A. (1925). Statistical Methods for Research Workers. Oliver and Boyd, Edinburgh.
[22] Hauck, W.W., Jr. and Donner, A. (1977). Wald's test as applied to hypotheses in logit analysis. J. Amer. Statist. Ass., 72, 851-853.
[23] Hirji, K.F., Mehta, C.R. and Patel, N.R. (1987). Computing distributions for exact logistic regression. J. Amer. Statist. Ass., 82, 1110-1117.
[24] Hirji, K.F., Vollset, S.E., Reis, I.M. and Afifi, A.A. (1996). Exact tests for interaction in several 2 x 2 tables. J. Computational and Graphical Stat., 5, 209-224.
[25] Horowitz and Manski (1997). In Handbook of Statistics: Robust Inference, edited by Maddala, G.S. and Rao, C.R. North-Holland, New York.
[26] Knight, K. (2000). Mathematical Statistics. Chapman and Hall, London.
[27] Lehmann, E.L. (1983). Theory of Point Estimation. Wiley, New York.
[28] Lindsey, J.K. (1996). Parametric Statistical Inference. Clarendon Press, Oxford.
[29] LogXact-Turbo (1993). LogXact-Turbo: Logistic Regression Software Featuring Exact Methods, Version 1.01. Cytel Software, Cambridge, Massachusetts.
[30] March, D.L. (1972). Exact probabilities for R x C contingency tables. Communications of the Association for Computing Machinery, 15, 991-992.
[31] McCullagh, P. and Nelder, J.A. (1989). Generalized Linear Models (2nd ed.). Chapman and Hall, London.
[32] Mehta, C.R. and Patel, N.R. (1980). A network algorithm for the exact treatment of the 2 x k contingency table. Commun. Statist. Simul. Comput., 9, 649-664.
[33] Mehta, C.R. and Patel, N.R. (1983). A network algorithm for performing Fisher's exact test in r x c contingency tables. J. Amer. Statist. Ass., 78, 427-434.
[34] Nelder, J.A. and Wedderburn, R.W.M. (1972). Generalized linear models. J. R. Statist. Soc. A, 135, 370-384.
[35] Pace, L. and Salvan, A. (1992). A note on conditional cumulants in canonical exponential families. Scandinavian Journal of Statistics, 19, 185-191.
[36] Pagano, M. and Tritchler, D. (1983). Permutation distributions in polynomial time. J. Amer. Statist. Ass., 78, 435-440.
[37] Pierce, D.A. and Peters, D. (1992). Practical use of higher order asymptotics for multiparameter exponential families (with discussion). Journal of the Royal Statistical Society B, 54, 701-737.
[38] Platt, R.W. (2000). Saddlepoint approximations for small sample logistic regression problems. Stat. in Med., 19, 323-334.
[39] Press, W.H., Teukolsky, S.A., Vetterling, W.T. and Flannery, B.P. (1992). Numerical Recipes in C. Cambridge University Press.
[40] Reid, N. (1988). Saddlepoint methods and statistical inference (with discussion). Statist. Sci., 3, 213-238.
[41] Reid, N. (1995). The roles of conditioning in inference. Statistical Science, 10, 138-157.
[42] Skovgaard, I.M. (1987). Saddlepoint expansions for conditional distributions. J. Appl. Prob., 24, 875-887.
[43] S-PLUS (1999). S-PLUS 2000 for Windows, Professional Release 1. MathSoft, Inc., Seattle, Washington.
[44] StatXact-Turbo (1992). StatXact-Turbo: Statistical Software for Exact Nonparametric Inference, Version 2.11. Cytel Software, Cambridge, Massachusetts.
[45] Tritchler, D. (1984). An algorithm for exact logistic regression. J. Amer. Statist. Ass., 79, 709-711.
[46] Venables, W.N. and Ripley, B.D. (1997). Modern Applied Statistics with S-Plus. Springer, New York.
[47] Verbeek, A. and Kroonenberg, P.M. (1985). A survey of algorithms for exact distributions of test statistics in r x c contingency tables with fixed margins. Comp. Statist. Data Analysis, 3, 159-185.
[48] Vollset, S.E., Hirji, K.F. and Elashoff, R.M. (1991). Fast computation of exact confidence limits for the common odds ratio in a series of 2 x 2 tables. J. Amer. Statist. Ass., 86, 404-409.
[49] Wang, S. (1993). Saddlepoint approximations in conditional inference. J. Appl. Prob., 30, 397-404.
[50] Weerahandi, S. (1995). Exact Statistical Methods for Data Analysis. Springer, New York.
Appendix A
FFT program for Beta-binomial
# Purpose: computing probabilities from characteristic function
# via FFT
betabin <- function(n, mu)
{
  T0 <- n + 1
  t0 <- rep(0, T0)
  for(j in (1:T0))
    {t0[j] <- 2*pi*(j-1)/T0}
  ## evaluate the characteristic function at t0
  Ms <- rep(0, T0)
  Ms <- (1 + mu*(exp(t0*(1i)) - 1))^n
  ## get the probability matrix
  prob <- round((Re(fft(Ms)))/T0, digits=20)
  supp <- seq(0, n)
  sum.p <- sum(prob)                    # consistency check
  mean.y <- sum(supp*prob)
  distn <- cbind(supp, prob)
  zero.vec <- rep(0, T0)
  bin <- dbinom(supp, n, mu)
  diff <- prob - bin
  plot(supp, prob)
  for(i in 1:T0) segments(supp[i], zero.vec[i], supp[i], prob[i])
  par(new=T)
  plot(supp, bin, type="l", xlab=" ", ylab=" ")
}
Appendix B
FFT program for a weighted sum
of random variables
# Purpose: FFT for a weighted sum
grid <- function(T0, k) {
  j <- seq(1, T0)
  t0 <- k*2*pi*(j-1)/T0
  return(list(t0=t0))
}
## evaluate the characteristic function at t0
cf <- function(T0, k)
{ Ms <- rep(0, T0)
  x <- seq(1, 6)
  t0 <- grid(T0, k)$t0
  for(i in 1:T0){
    Ms[i] <- sum((x/21)*exp(t0[i]*(1i)*x))
  }
  return(Ms)
}
## get the probability matrix
pmf <- function(T0, Ms)
{ prob <- round((Re(fft(Ms)))/T0, digits=17)
  supp <- seq(0, T0-1)
  distn <- cbind(supp, prob)
  return(prob)
}
Appendix C
FFT program for binary regression
# Purpose: computing probabilities from characteristic function
# via FFT
# (function header and grid set-up were illegible in the original scan;
#  reconstructed here following the pattern of Appendix D)
binreg2d <- function(b0=0, b1=0, x, n, y, link=1, eps=0)
{
  T0 <- sum(n) + 1              # may need to adjust the length of the grid
  T1 <- max(T0, sum(n*x) + 1)   # may need to adjust the length of the grid
  t0.o <- sum(y)
  t1.o <- sum(x*y)
  t0 <- 2*pi*(seq(1, T0) - 1)/T0
  t1 <- 2*pi*(seq(1, T1) - 1)/T1
  ## evaluate the characteristic function at t0 and t1
  cf <- function(t, p, n) {((1 - p + p*exp((t)*(1i)))^n)}
  M1 <- array(t0, c(T0, T1))
  M2 <- t(array(t1, c(T1, T0)))
  Ms <- 1
  eta <- rep(0, length(n))      # initialize the linear predictor
  p <- rep(0, length(n))
  for(i in 1:length(n))
  {
    eta[i] <- b0 + b1*x[i]
    if(link==1) {p[i] <- exp(eta[i])/(1 + exp(eta[i]))}   # logit
    else if(link==2) {p[i] <- pnorm(eta[i])}              # probit
    else {p[i] <- 1 - exp(-exp(eta[i]))}                  # comp. log-log
    p[i] <- (1-eps)*p[i] + eps*(1-p[i])                   # contamination
    Ms <- (Ms*cf((M1 + x[i]*M2), p[i], n[i]))
  }
  ## get the probability matrix
  prob <- Re(fft(Ms)/(T0*T1))
  sum.p <- sum(prob)            # consistency check
  pr <- prob[t0.o+1, ]
  sum.pr <- sum(pr)             # used to monitor this marginal probability
  cond.pr <- pr/sum.pr
  pval.rhs <- 1 - sum(cond.pr[1:t1.o])
  pval.lhs <- sum(cond.pr[1:(t1.o+1)])
  pval.two <- min(2*min(pval.rhs, pval.lhs), 1)
  return(t0.o, t1.o, pval.lhs, pval.rhs, pval.two)
  #return(cond.pr)              # activating this line will generate the
                                # entire conditional distribution
}
Appendix D
FFT program for Poisson
regression
# Purpose: computing probabilities from characteristic function
# via FFT
pois1d <- function(b0=0, b1=0, x, y)
{ n <- length(x)
  T0 <- sum(y) + 1             # may need to adjust the length of the grid
  T1 <- max(T0, sum(x*y) + 1)  # may need to adjust the length of the grid
  t0.o <- sum(y)
  t1.o <- sum(x*y)
  t0 <- rep(0, T0)
  t1 <- rep(0, T1)
  for(j in (1:T0))
    {t0[j] <- 2*pi*(j-1)/T0}
  for(k in 1:T1)
    {t1[k] <- 2*pi*(k-1)/T1}
  ## evaluate the characteristic function at t0 and t1
  ## (this block was illegible in the original; reconstructed from Appendix E)
  M1 <- array(t0, c(T0, T1))
  M2 <- t(array(t1, c(T1, T0)))
  Ms <- 0
  for(i in 1:n)
  {
    lambda.i <- exp(b0 + b1*x[i])
    Ms <- Ms + lambda.i*(exp((M1 + x[i]*M2)*(1i)) - 1)
  }
  Ms <- exp(Ms)
  ## get the probability matrix
  prob <- Re(fft(Ms)/(T0*T1))
  sum.p <- sum(prob)           # consistency check
  pr <- prob[t0.o+1, ]
  sum.pr <- sum(pr)            # vital to monitor this marginal probability
  cond.pr <- pr/sum.pr
  pval.rhs <- 1 - sum(cond.pr[1:t1.o])
  pval.lhs <- sum(cond.pr[1:(t1.o+1)])
  pval.two <- min(2*min(pval.rhs, pval.lhs), 1)
  return(t0.o, t1.o, pval.lhs, pval.rhs, pval.two)
  #return(cond.pr)             # activating this line will generate the
                               # entire conditional distribution
}
Appendix E
FFT program for zero-truncated
Poisson regression model
# Purpose: computing probabilities from characteristic function
# via FFT for a truncated Poisson regression
Tpois2d <- function(x, y, b0=log(mean(y)), b1=0)
{ n <- length(x)
  T0 <- sum(y) + 1             # may need to adjust the length of the grid
  T1 <- max(T0, sum(x*y) + 1)  # may need to adjust the length of the grid
  t0.o <- sum(y)
  t1.o <- sum(x*y)
  t0 <- rep(0, T0)
  t1 <- rep(0, T1)
  for(j in (1:T0))
    {t0[j] <- 2*pi*(j-1)/T0}
  for(k in 1:T1)
    {t1[k] <- 2*pi*(k-1)/T1}
  ## evaluate the characteristic function at t0 and t1
  M1 <- array(t0, c(T0, T1))
  M2 <- t(array(t1, c(T1, T0)))
  Ms <- 0
  for(i in 1:n)
  {
    lambda.i <- exp(b0 + b1*x[i])
    Ms <- (Ms + log((exp(lambda.i*exp((M1 + x[i]*M2)*(1i))) - 1)/(exp(lambda.i) - 1)))
  }
  Ms <- exp(Ms)
  ## get the probability matrix
  prob <- round(Re(fft(Ms)/(T0*T1)), digits=17)
  sum.p <- sum(prob)           # consistency check
  ## p-value computation as in Appendix D (end of listing reconstructed)
  pr <- prob[t0.o+1, ]
  sum.pr <- sum(pr)
  cond.pr <- pr/sum.pr
  pval.rhs <- 1 - sum(cond.pr[1:t1.o])
  pval.lhs <- sum(cond.pr[1:(t1.o+1)])
  pval.two <- min(2*min(pval.rhs, pval.lhs), 1)
  return(t0.o, t1.o, pval.lhs, pval.rhs, pval.two)
}
Appendix F
FFT program for error analysis in
geometric distribution
## evaluate the characteristic function at t0
cf.geom <- function(T0, p)
{
  Ms <- rep(0, T0)
  t0 <- grid.geom(T0)$t0
  for(i in 1:T0){
    Ms[i] <- (1-p)/(1 - p*exp(t0[i]*(1i)))
  }
  return(Ms)
}
## get the probability matrix
geom.fft <- function(T0, Ms)
{
  prob <- round((Re(fft(Ms)))/T0, digits=20)
  supp <- seq(0, T0-1)
  distn <- cbind(supp, prob)
  return(prob)
}