Empirical Characteristic Function Approach to Goodness of Fit Tests
-
Upload
sameer-al-subh -
Category
Documents
-
view
235 -
download
0
Transcript of Empirical Characteristic Function Approach to Goodness of Fit Tests
-
8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests
1/277
Journal of Modern Applied Statistical Methods Copyright 2010 JMASM, Inc.
November, 2010, Vol. 9, No. 2, 332-603 1538 9472/10/$95.00
iii
Journal Of Modern Applied Statistical Methods
Shlomo S. Sawilowsky
Editor College of EducationWayne State University
Harvey Keselman Associate Editor
Department of PsychologyUniversity of Manitoba
Bruno D. Zumbo Associate Editor Measurement, Evaluation, & Research Methodology
University of British Columbia
Vance W. Berger Assistant Editor
Biometry Research Group National Cancer Institute
John L. Cuzzocrea Assistant Editor
Educational ResearchUniversity of Akron
Todd C. Headrick Assistant Editor
Educational Psychology and Special EducationSouthern Illinois University-Carbondale
Alan Klockars Assistant Editor
Educational PsychologyUniversity of Washington
-
8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests
2/277
Journal of Modern Applied Statistical Methods Copyright 2010 JMASM, Inc.
November, 2010, Vol. 9, No. 2, 332-603 1538 9472/10/$95.00
iv
Journal Of Modern Applied Statistical Methods
Invited Debate
332 339 Daniel H. Robinson, The Not-So-Quiet Revolution: CautionaryJoel R. Levin Comments on the Rejection of HypothesisTesting in Favor of a Causal ModelingAlternative
340 347 Joseph Lee Rodgers Statistical and Mathematical Modelingversus NHST? Theres No Competition!
348 358 Lisa L. Harlow On Scientific Research: The Role of Statistical Modeling and HypothesisTesting
Regular Articles359 368 Robert H. Pearson, Recommended Sample Size for Conducting
Daniel J. Mundfrom Exploratory Factor Analysis onDichotomous Data
369 378 Marcelo Angelo Cirillo, Generalized Variances Ratio Test for Daniel Furtado Ferreira, Comparing k Covariance Matrices fromThelma Sfadi, Dependent Normal PopulationsEric Batista Ferreira
379 387 Kung-Jong Lui Notes on Hypothesis Testing under a Single-Stage Design in Phase II Trial
388 402 Housila P. Singh, Effect of Measurement Errors on the SeparateNamrata Karpe and Combined Ratio and Product Estimators
in Stratified Random Sampling
403 413 Xian Liu, Reducing Selection Bias in AnalyzingCharles C. Engel, Longitudinal Health Data with HighHan Kang, Mortality RatesKristie L. Gore
414 442 Oluseun Odumade, Use of Two Variables Having Common MeanSarjinder Singh to Improve the Bar-Lev, Bobovitch and
Boukai Randomized Response Model
443 451 Peyman Jafari, A Flexible Method for Testing IndependenceNoori Akhtar-Danesh, in Two-Way Contingency TablesZahra Bagheri
-
8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests
3/277
Journal of Modern Applied Statistical Methods Copyright 2010 JMASM, Inc.
November, 2010, Vol. 9, No. 2, 332-603 1538 9472/10/$95.00
v
452 460 Seema Jaggi, Neighbor Balanced Block Designs for Two
Cini Varghese, FactorsN. R. Abeynayake
461 469 Moustafa Omar Ahmed Adjusted Confidence Interval for theAbu-Shawiesh Population Median of the Exponential
Distribution
470 479 K. A. Bashiru, Nonlinear Trigonometric TransformationO. E. Olowofeso, Time Series ModelingS. A. Owabumoye
480 487 Hilmi F. Kittani Incidence and Prevalence for A TriplyCensored Data
488 494 Mowafaq M. Al-Kassab, A Comparison between Unbiased Ridge andOmar Q. Qwaider Least Squares Regression Methods Using
Simulation Technique
495 501 Hatice Samkar, Ridge Regression Based on Some RobustOzlem Alpu Estimators
502 511 Sanizah Ahmad, Robust Estimators in Logistic Regression:Norazan Mohamed Ramli, A Comparative Simulation StudyHabshah Midi
512 519 Sunil Kumar, A General Class of Chain-Type EstimatorsHousila P. Singh, in the Presence of Non-Response Under Sandeep Bhougal Double Sampling Scheme
520 535 Li-Chih Wang, A GA-Based Sales Forecasting ModelChin-Lien Wang Incorporating Promotion Factors
536 546 Anton Abdulbasah Kamil, Maximum Downside Semi DeviationAdli Mustafa, Stochastic Programming for PortfolioKhilpah Ibrahim Optimization Problem
547 557 Gyan Prakash On Bayesian Shrinkage Setup for ItemFailure Data Under a Family of Life TestingDistribution
558 567 M. T. Alodat, Empirical Characteristic Function ApproachS. A. Al-Subh, to Goodness of Fit Tests for the LogisticKamaruzaman Ibrahim, Distribution under SRS and RSSAbdul Aziz Jemain
-
8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests
4/277
Journal of Modern Applied Statistical Methods Copyright 2010 JMASM, Inc.
November, 2010, Vol. 9, No. 2, 332-603 1538 9472/10/$95.00
vi
568 578 Sheikh Parvaiz Ahmad, Bayesian Analysis of Location-Scale Family
Aquil Ahmed, of Distributions Using S-Plus and R SoftwareAthar Ali Khan
579 583 Ozer Ozdemir, ANN Forecasting Models for ISE National-Atilla Aslanargun, 100 IndexSenay Asma
584 595 Shafiqah Alawadhi, Markov Chain Analysis and StudentMokhtar Konsowa Academic Progress: An Empirical
Comparative Study
Brief Report 596 598 L. V. Nandakishore Bayesian Analysis for Component
Manufacturing Process
Emerging Scholars599 603 Hamid Reza Kamali, Estimating the Non-Existent Mean and
Parisa Shahnazari- Variance of the F-Distribution by SimulationShahrezaei
JMASM is an independent print and electronic journal (http://www.jmasm.com/), publishing (1) new statistical tests or procedures, or the comparison of existing statistical testsor procedures, using computer-intensive Monte Carlo, bootstrap, jackknife, or resamplingmethods, (2) the study of nonparametric, robust, permutation, exact, and approximaterandomization methods, and (3) applications of computer programming, preferably in Fortran(all other programming environments are welcome), related to statistical algorithms, pseudo-random number generators, simulation techniques, and self-contained executable code to carryout new or interesting statistical methods.
Editorial Assistant: Julie M. SmithInternet Sponsor: Paula C. Wood , Dean, College of Education, Wayne State University
-
8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests
5/277
Journal of Modern Applied Statistical Methods Copyright 2010 JMASM, Inc. November 2010, Vol. 9, No. 2, 332-339 1538 9472/10/$95.00
332
INVITED DEBATE The Not-So-Quiet Revolution: Cautionary Comments on the Rejection of
Hypothesis Testing in Favor of a Causal Modeling Alternative
Daniel H. Robinson Joel R. LevinUniversity of Texas University of Arizona
Rodgers (2010) recently applauded a revolution involving the increased use of statistical modelingtechniques. It is argued that such use may have a downside, citing empirical evidence in educational
psychology that modeling techniques are often applied in cross-sectional, correlational studies to produceunjustified causal conclusions and prescriptive statements.
Key words: Modeling, hypothesis testing, SEM, HLM, causation.
Daniel H. Robinson is Professor of EducationalPsychology and editor of Educational
Psychology Review. His research interestsinclude educational technology innovations thatmay facilitate learning, team-based approachesto learning, and examining gender trendsconcerning authoring, reviewing, and editingarticles published in various educational journalsand societies. Email him at:
[email protected] R. Levin is Professor Emeritus at theUniversity of Arizona and the University of Wisconsin-Madison. He is former editor of the
Journal of Educational Psychology , and former Chief Editorial Advisor for journal publicationsof the American Psychological Association. Hisresearch interests include the design and
statistical analysis of educational research, aswell as cognitive-instructional strategies thatimprove students learning. Email him at:
IntroductionOver the years, we have found that Joseph
Rodgers (e.g., Rodgers, Cleveland, van denOord, & Rowe, 2000; Rodgers & Nicewander,1988) has something academically interesting,meaty, and instructive to say. Against that
backdrop, Rodgers most recent essay, provocatively titled The epistemology of mathematical and statistical modeling: A quietmethodological revolution (Rogers, 2010)merits close examination and extensive
-
8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests
6/277
ROBINSON & LEVIN
333
commentary. Rodgers appeared to have missedthe mark in two critical respects; both reflectedin the subtitle A quiet methodologicalrevolution, because as will become apparent inthe following discussion, the revolution isneither quiet nor methodological.
The Null Hypothesis HullabalooRodgers is correct in stating that serious
concerns about null hypothesis significancetesting (NHST) have been mounting over the
past several decades. Yet, as is well representedin Harlow, Mulaik, & Steigers (1997)impressive volume, NHST criticisms havehardly been expressed quietly, but rather withfull sound and fury. Moreover, in making hiscase, Rodgers provided a one-sided view of thecontroversy. Although several sources that indict
NHST were cited, short shrift was given toapproaches that have defended reasonable and
proper applications of statistical hypothesistesting, including, among others, decidingwhether a believed-random process is trulyrandom (e.g., Abelson, 1997), intelligenthypothesis testing (Levin, 1998a), equivalencetesting (e.g., Serlin & Lapsley, 1993), andhypothesis testing supplemented by effect-sizeestimation and/or confidence-intervalconstruction (Steiger, 2004).
In addition, numerous authors have
defended the use of NHST when mindfullyapplied (e.g., Frick, 1996; Hagen, 1997;Robinson & Levin, 1997; Wainer & Robinson,2003). Rodgers cited social-sciences statisticalsage Jacob Cohen (1994) as one who dismissed
NHST practices in his 1994 seminal article,The Earth is Round ( p < .05). Yet, in the samearticle, one could easily interpret Cohens (p.1001) comment about the nonexistence of magical alternatives to NHST as conceding thatfor whatever good NHST does, there are noadequate substitutes.
Rodgers (p. 2) described thefundamental difference between the Fisherianand Neyman-Pearson approaches, with the latter emphasiz[ing] the importance of the individualdecision. However, he characterized NHST as ahybrid and condemned it. Just because atechnique is often misused is not a sufficientreason to abandon it. For example, it is argued
below that in educational psychology we have
observed frequent misapplication of theRodgers favored causal modeling techniques. Inrecognizing that misapplication, however, our goal is not to deter researchers from adoptingmodeling techniques, but rather to encourageresearchers to apply such techniquesappropriately and to interpret wisely the resultsthat they pump out. (Back in the Neanderthalage of computers, grind out would have been amuch more fitting description.)
As researchers who have spent most of our careers conducting randomized experiments,we have sought to apply NHST judiciously,typically adopting or adapting Neyman-Pearsona priori Type I, Type II error, effect-size, andsample-size specification principles.Accordingly, we have found that in experimentsconducted with rationally (or better, optimally)
determined sample sizes - that is, sample sizesassociated with enough statistical power todetect nontrivial differences but with not toomuch power to detect trivial differences (see, for example, Levin, 1998b; and Walster & Cleary,1970) - NHST provides useful informationconcerning whether one has an experimentaleffect worth pursuing. In this context, pursuingmeans that obtaining a statistically significanteffect is followed by a sufficient number of independent replications until the researcher hasconfidence that the initially observed effect is a
statistically reliable one (see, for example, Levin& Robinson, 2003).
In that sense as well, we have regarded NHST primarily as a screening device, similar infunction to what Sir Ronald had in mind (e.g.,Fisher, 1935). Much of the hullabaloo about
NHST is caused by too many researchersfocusing on the results of a single study rather than on a series of studies that are part of a
program of research (Levin & Robinson, 2000).Fisher was never satisfied with an effectidentified in a single study, even if it had a p
value of less than 0.05! Instead, he believed thata treatment was only worth writing home aboutwhen it had consistently appeared in numerousexperiments. As is implied in the followingsection, whatever purported advantagesmodeling techniques have over NHST alsovanish unless researchers test a priori models inmultiple experiments.
Rodgers (p. 3) also condemned the
-
8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests
7/277
REJECTION OF HYPOTHESIS TESTING IN FAVOR OF CAUSAL MODELING
334
NHST jurisprudence model while aptly referringto Tukeys (1977) confirmatory data analysisstrategy as being judicial (or quasi-judicial) innature. Yet, Rodgers mischaracterized Tukeysexploratory data analysis strategy insofar as thedetective nature of that hypothesis-generatingapproach clearly is not jurisprudence. It is thisdetective role that one emphasizes when using
NHST simply as a research-based screening process to determine whether posited effectsexist. To us, convincing a jury of ones peersthat a prescription for practice should be basedon a single research study is rarely, if ever,
justified.Rodgers (p. 9) assertion that a
fundamental problem with NHST is one of testing valueless nil null hypotheses has beenadvanced by many critics. As researchers who
endeavor to use intelligent forms of hypothesistesting with experimental data, we regard the
problem of nil nulls not as a statistical issue butas a methodological one. Specifically, it makeslittle or no conceptual sense to apply NHSTwhen comparing an instructional treatment witha closet (Levin, 1994, p. 233) control group(i.e., a condition in which participants sit in adark room and do nothing), just as it is inane tocompute p-values for reliability correlations(see, for example, Thompson, 1996).Educational psychology is filled with such
examples of comparing new innovations withridiculous straw-person control conditions thatno sane researcher would ever consider using. Amore appropriate formulation of a nil null iswhen an investigator wishes to compare a newlydeveloped and previously untested experimentaltreatment with the best treatment that iscurrently available.
According to Rodgers, the [1999 task force assembled by the American PsychologicalAssociation] concluded that NHST was brokenin [a] certain respect (p. 3). Task-force member
Wainer and the present first author (Wainer &Robinson, 2003) provided a different view of thetask forces brief consideration of therecommendation to issue an outright ban on
NHST. As we have argued previously (e.g.,Levin & Robinson, 1999) and in our precedingdiscussion, adopting such an extreme stancewould be akin to calling for a ban on hammers
because hammerers were hammering their
fingers instead of nails (for additionaldiscussion, see Levin & Robinson, 2003). Eventhe outspoken NHST critic Rozeboom (1997)acknowledged via another tools analogy thatthe sharpest of scalpels can only create a messif misdirected by the hand that drives it, (p.335). Fortunately, in the case of the most recent(6 th) edition of the APA Publication Manual (American Psychological Association, 2010),the hypothesis-testing baby was not thrown outwith the bath water.
Causal Modeling TechniquesContemporary modeling techniques,
including structural equations modeling (SEM)and hierarchical linear modeling (HLM), amongothers, which emerge from atheoretical/conceptual framework, are
statistical/data-analytic and not methodologicalin nature. So, whence Rodgersmethodological revolution? Even he noted on
p. 8 that SEM has been built into a powerfulanalytic method and is a prototype of the firstapproach [a model-comparison framework] to
postrevolutionary modeling (p. 8).That a statistical modeling tail often
wags the methodological dog may havecontributed to what we consider a major misuseof causal modeling: researchers attempting tosqueeze causality out of observational or
correlational data. Because of the unfortunatecausal nomenclature, we fear that manyresearchers may be deluded into believing thatthe statistical control that such techniques
provide for correlational (non-experimental)data is on a par with the genuine experimentalcontrol of randomized experiments (Levin &ODonnell, 2000, p. 211). This in turn results incausal-model appliers issuing causal conclusionsthat they mistakenly believe are scientificallyvalid. As Cliff (1983) previously noted, Literalacceptance of the results of fitting causal
models to correlational data can lead toconclusions that are of questionable value (p.115).
In addition, because causal-modelresearchers conclusions typically flow fromrevised data-driven models rather than from a
priori theory-based model specifications, in theabsence of independent validations those causalconclusions present even more cause for
-
8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests
8/277
ROBINSON & LEVIN
335
concern. As with our previous hammers vs.hammerers distinction, Rodgers is well aware of researchers potential shoddy application of causal modeling techniques. Yet, he could havesent a stronger cautionary message to therelatively uninitiated model builder than hisinnocuous pronouncement that the success of SEM depends on the extent to which it is appliedin many research settings (p. 8).
To illustrate what we mean by prescriptive statements appearing in articles thatinclude statistical modeling techniques, we offer very recent examples that appeared in areputable educational psychology research
journal. To avoid redundancy, we offer only twosuch unjustified causal excerpts here, fromnumerous ones that we have encountered inmultiple teaching-and-learning research journals
that we have recently read or reviewed (seeRobinson, Levin, Thomas, Pituch, & Vaughn,2007, and the following section).
Ciani, Middleton, Summers and Sheldon(2010)s Study
The following summary appeared inCiani et al.s study abstract:
Multilevel modeling was used to teststudent perceptions of three contextual
buffers: classroom community, teachers
autonomy support, and a masteryclassroom goal structureResults
provide practitioners with tools for counteracting potential negativeimplications of emphasizing
performance in the classroom. (p. 88)
There was one predictor variable; one outcomevariable, a three-item scale that measuredstudents motivation to learn; and threemoderator variables, a three-item scale thatmeasured student perceptions of classroom
community, a four-item scale that measuredstudent perceptions of instructor autonomysupport, and a three-item scale that measuredstudent perceptions of the extent to which their teacher emphasizes developing competence inthe classroom. All measures were collected at asingle point in time and HLM was used toanalyze the data. Here are a couple causalconclusions from the discussion section:
However, it appears that comparingstudents achievement publicly, or usingthe work of the highest achievingstudents as an example for everyone,may not be so pernicious a practicewhen students in the classroom perceivea sense of community among their fellow classmates.
[O]ur findings demonstrate that if students feel respected by the teacher,such that their preferences and ways of doing things are acknowledged andaccommodated as much as possible,then a strong performance orientation onthe part of the teacher is not harmful.Autonomy support enables students tointernalize what they are doing, so that
they view their activity as importanteven if it is not enjoyable, or if it createsstress and pressure. Thus, it appears thatemphasizing competition betweenstudents is not necessarily underminingof student mastery goals, if the teacher can communicate and promote the
performance structure in a non-controlling way. These findings arereassuring, showing that performanceorientations are not necessarilycorrosive certainly an important
message, given the performancenecessities that all students face. (p. 95)
As with most of these articles based oncorrelational data and yet that offer prescriptiverecommendations, certain limitations of theresearch are explicitly acknowledged by theauthors:
The most significant limitation to thecurrent study is that all data reported arecorrelational.
Gathering data at one point in time alsocreates a limitation regarding the causalrelationships among the variables in thisstudy. (p. 96)
These limitations aside (or ignored?), the authors proceeded to offer the following prescriptive:
-
8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests
9/277
REJECTION OF HYPOTHESIS TESTING IN FAVOR OF CAUSAL MODELING
336
Our findings, along with other goaltheorists (e.g., Urdan & Midgley, 2003),suggest that given current prevailingattitudes and policy it may be morefruitful to emphasize adaptiveinstructional practices in the classroom,as opposed to trying to reducemaladaptive practices. (p. 97)
Thus, the authors made recommendations for practice (prescriptive statements) in theabsence of convincing evidence that such
practices are clearly causally related to studentoutcomes.
Chen, Wu, Kee, Lin & Shuis (2009) StudyChen et al. used SEM to analyze
relations among fear of failure, achievement
goals, and self-handicapping. Causal relationsamong the variables are implied in theDiscussion section:
This finding shows fear of failure as adistal determinant of self-handicappingand achievement goals (MAv and PAv)as proximal determinants of self-handicapping, demonstrating themotivational process of self-handicapping. (p. 302)
The authors revealed the perceived magicalquality of SEM allowing researchers to coaxcausality from correlational data:
Since SEM analysis examines manyvariables relationships simultaneously, werely on its results as the basis for our conclusions and discussion. (p. 303)
The Limitations section is predictable:
Although we used the SEM approach to
estimate the proposed model, the data inthe study are cross-sectional in natureand causal relations cannot be drawn.The longitudinal approach is preferredin order to ascertain the causal patternand to further clarify the chronic effectsof mastery-avoidance and performance-approach goals on achievement-relatedoutcomes. (p. 304)
In contrast, what follows are the grand prescriptives that appeared in the Implicationsand Conclusions:
We believe that the integrative modelcan help educators develop effectiveinterventions to reduce students self-handicapping, especially since we foundthat the mid-level achievement goals(MAv and PAv) mediate therelationships between fear of failure andself-handicapping it is suggested thatteachers use multiple indices to offer more opportunities for students to attainsuccess. In addition, teachers shouldencourage students to embrace amultiple goals perspective in whichdoing ones best and outperforming
others are not in conflict with eachother. (p. 304)
Rodgers (2010, p. 8) previously proffered caveataside, in both of the just-presented examples,cross-sectional (one time point), correlational(no variables were manipulated) data weretossed into a statistical modeling analysis andwhat popped out were causal conclusions.
Correlational Data and Causal ConclusionsOver the past few years, we have
examined empirical articles published in widelyread teaching- and-learning research journalsand have found that:
1. In one journal survey (Hsieh et al., 2005),the proportion of articles based onintervention and experimental (randomassignment) methodology had decreasedfrom 47% in 1983 to 23% in 2004.
2. In another journal survey (Robinson et al.,2007), the proportion of articles based on
intervention methods had decreased from45% in 1994 to 33% in 2004. Meanwhile,the proportion of nonintervention articlesthat contained prescriptive statementsincreased from 34% in 1994 to 43% in 2004.The proportion of nonintervention (non-experimental and correlational) articles thatincluded prescriptive statements (in the formof causally implied implications for
-
8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests
10/277
ROBINSON & LEVIN
337
educational practice) increased from 33% in1994 to 45% in 2004.
3. In a follow-up to the just-describedRobinson et al. (2007) survey (Shaw, Walls,Dacy, Levin & Robinson, 2010), althoughonly 19 nonintervention studies in 1994included prescriptive statements, thesestatements were repeated in 30 subsequentarticles that had cited the original 19.
For the present article, we examined thefirst two issues of the 1999 volume of the APA-
published journal, the Journal of Educational Psychology , and again for the 2009 volume. Welooked specifically at the comparative
proportions of articles based on correlationalmethods and those that involved interventions
(either randomized experimental or nonrandomized but researcher manipulated), aswell as the proportion of correlational methodsarticles in which prescriptive statements wereoffered. The results are summarized in Table 1.
Although roughly half of the articlesappearing in only one of the five journals thatwere part of Robinson et al.s (2007) study weresurveyed, the findings support the reportedtrends. Intervention studies (both randomizedand nonrandomized) are becoming increasinglyrare and instead researchers are basing their
recommendations for practice on weaker evidence. Moreover, it appears that statistical
and nonrandomized) are becoming increasinglyrare and instead researchers are nonrandomized)modeling techniques are becoming more popular - having increased from only 3% of thecorrelational research articles in 1999 to 40% in2009 - which may in turn contribute to theconcomitant 10-year increase in prescriptivestatements appearing in such articles.
Thus, we have witnessed widespreadapplication of SEM, HLM, and other sophisticated statistical procedures incorrelational data contexts, where causality issought but the critical conditions needed toattribute causality are missing (e.g., Marley &Levin, 2011; Robinson, 2010). Rodgers statesthat researchers who are scientistsshould befocusing on building a modelembedded withinwell-developed theory (p. 4-5). Here we agree
with former Institute for Educational ScienceDirector Grover Whitehurst who argued that - atleast in the field of education - we have enoughtheory development studies and need morestudies that address practical what worksquestions.
It is our fear that a research approachwhere the question, Does the data fit mymodel? is far more dangerous than thequestion, Is there anything here worth
pursuing? As we have seen, an affirmativeanswer to the former question seems to entitle a
researcher to form a model that indicates acausal relationship between, say, students self-efficacy and their achievement. The researcher then develops a self-efficacy scale that measures
Table 1: Summary of Selected Results of Surveyed Articles Appearing in the Journal of Educational Psychology (1999 and 2009) Based on Either Correlational or Intervention Methods
1999 2009
Type of Study Type of Study
Correlational Intervention Correlational Intervention
Number of Articles 18 (60%) 12 (40%) 23 (66%) 12 (34%)
Prescriptive Statements 9 (50%) ------ a 13 (57%) ------ a
Statistical Modeling 1 (3%) 0 (0%) 14 (40%) 2 (6%)
Prescriptive Statements 1 ------ a 7 (50%) ------ a
Note : This table includes preliminary data from a larger study recently completed by Reinhart, Haring,Levin, Patall, and Robinson (2011). a Not assessed in the present survey
-
8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests
11/277
REJECTION OF HYPOTHESIS TESTING IN FAVOR OF CAUSAL MODELING
338
students self-perceptions and also measuresachievement. The data may fit the model but inthe absence of convincing longitudinal data,ruling out alternative explanations, andindependent replications based on the previousnice-fitting model, this practice may lead todangerous causal conclusions. For the just-
presented self-efficacy example, it is just aslikely that high achievers feel better about their effectiveness as learners rather than the other way around. Apparently, many researchers
believe that it is entirely appropriate to applysuch modeling techniques and to interpret theresults as support for prescriptive statementsfounded on causality.
Conclusions About RevolutionsTo summarize, Rodgers (2010) has
written a cogent essay on the vices of statisticalhypothesis testing and the virtues of statisticalmodeling. We believe, however, that his essay
painted a somewhat distorted (and potentiallymisleading) portrait about those statistical arts.In particular, we take issue with two aspects of Rodgers so-called quiet methodologicalrevolution. For one aspect (rejecting statisticalhypothesis testing), we argue that the picture isneither as bleak nor as open and shut as Rodgers
portrayed. As supporting evidence, witness thesustained presence of hypothesis testing, along
with its more intelligent additions andadaptations, in various academic-researchdisciplines - including the research-and-
publication bible of both our very own field of psychology and virtually all social-sciencesdomains, the most recent edition of the APA
Publication Manual (American PsychologicalAssociation, 2010).
For the other aspect of Rodgers essaythat merits critical commentary (acceptingmodeling techniques), we argue that causalmodeling and other related multivariate and
multilevel data-analysis tools frequently causetheir users to think - in accord with Rodgersseductive subtitle - that the procedures aremethodological randomization-compensating
panaceas rather than techniques that do the bestthey can to provide some degree of statisticalcontrol in a multiply confounded variableworld. The unfortunate consequence of thatmethodological understanding, then, is that
when combined with researcher misapplicationof such modern modeling artillery, instead of
being on target with their data analyses andresearch conclusions, weapons are backfiringand researchers are ending up (whether knowingly or not) with a considerable amount of egg on their faces.
ReferencesAbelson, R. P. (1997). The surprising
longevity of flogged horses: Why there is a case for the significance test. Psychological Science , 8, 12-15.
American Psychological Association.(2010). Publication manual of the American
Psychological Association (6 th Ed.). Washington,DC: American Psychological Association.
Chen, L. H., Wu, C.-H., Kee, Y. H., Lin,M.-S., & Shui, S.-H. (2009). Fear of failure, 2 x 2achievement goal and self-handicapping: Anexamination of the hierarchical model of achievement motivation in physical education.Contemporary Educational Psychology , 34, 298-305.
Ciani, K. D., Middleton, M. J., Summers, J.J., & Sheldon, K. M. (2010). Buffering against
performance classroom goal structures: Theimportance of autonomy support and classroomcommunity. Contemporary Educational Psychology ,35, 88-99.
Cliff, N. (1983). Some cautions concerningthe application of causal modeling methods.
Multivariate Behavioral Research , 18, 115-126.
Cohen, J. (1994). The earth is round ( p 0 (1)
The parameter represents the mean number of events per unit time (e.g., the rate of arrivalsor the rate of failure). The exponential
distribution is supported on the interval [0, ).The mean (expected value) of an exponentiallydistributed random variable X with rate
parameter is given by:
1)( == X E (2)
In light of the examples given above, thismakes sense: if a person receives phone calls atan average rate of 2 per hour, then they canexpect to wait one-half hour for every call. Also,note that approximately 63% of the possiblevalues lie below the mean for any exponentialdistribution (Betteley, et al., 1994).
The median of an exponentiallydistributed random variable X with rate
parameter is given by:
69315.0)2(ln
== MD (3)
The maximum likelihood estimator (MLE) for
the rate parameter , given an independent andidentically distributed random sample of size n,
n X X X ,...,, 21 , drawn from the exponential
distribution, ( ) 1 Exp , is given by:
n
ii 1
n 1XX
=
= = . (4)
While this estimate is the most likelyreconstruction of the true parameter , it is onlyan estimate, and as such, the more data pointsavailable the better the estimate will be. Also,the MLEs are consistent estimators of their
parameters and are asymptotically efficient(Casella & Berger, 2002).
The Used EstimatorsThe sample mean, X , and the
-
8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests
136/277
ABU-SHAWIESH M.O.A
463
sample median, 1 MD , which are used in thisstudy for constructing the proposedmodified confidence intervals for theexponential distribution median, MD , arenow introduced.
The Sample Mean, X The sample mean is the most well
known example of a measure of location, or average. It is defined for a set of values as thesum of values divided by the number of valuesand is denoted by X . The sample mean for arandom sample of size n observations
n X X X ,...,, 21 can be defined as follows:
n
X X
n
i i== 1 (5)
The main advantages of the sample mean, X ,are: it is easy to compute, easy to understoodand takes all values into account. Its maindisadvantages are: it is influenced by outliers,can be considered unrepresentative of datawhere outliers occur because many values may
be well away from it and it requires all values inorder to calculate its value (Betteley, et al.,
1994; Francis, 1995).The Sample Median, MD 1
The sample median is perhaps the bestknown of the resistant location estimators. It isinsensitive to behavior in the tails of thedistribution. The sample median is defined for aset of values as the middle value when thevalues are arranged in order of magnitude and itis denoted herein by 1 MD . The sample medianfor a random sample of size n observations
n X X X ,...,, 21 can be defined as follows:
+=
+
+
evenisnif
X X
odd isnif X
MD nn
n
2
122
21
1
(6)
The main advantages of the samplemedian, MD 1, is that, it is easy to determine,requires only the middle values to calculate, can
be used when a distribution is skewed - as in thecase of the exponential distribution, is notaffected by outliers and has a maximal 50%
breakdown point. The main disadvantages of thesample median, MD 1, are that it is difficult tohandle in mathematical equations, it does notuse all available values and it can be misleadingin a distribution with a long tail because itdiscards so much information. The samplemedian, though, is considered as an alternativeaverage to the sample mean (Betteley, et al.,1994; Francis, 1995). However, the samplemedian, MD 1, has become as a good general
purpose estimator and is generally considered asan alternative average to the sample mean, X .
Estimating the Exponential Distribution Median:Two techniques are now introduced for
finding estimates, the method of sample medianand the method of maximum likelihood which isthe most widely used.
The Method of Sample MedianGiven a random sample of size n
observations, n X X X ,...,, 21 , the estimator of the exponential population median , MD , based
on the method of sample median, MD 1, isdenoted by MD MD1 . Now, from equation (3):
1 ln(2)MD ln(2)
MD= =
(8)
Thus, if the exponential population median MD in (8) is estimated by the sample median MD 1,results in the following approximation:
1
)2ln(
MD
= . (9)
Therefore, equating the results in (4) and (9) andsolving, the following approximation isobtained:
-
8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests
137/277
ADJUSTED CONFIDENCE INTERVAL FOR EXPONENTIAL DISTRIBUTION MEDIAN
464
)2ln()2ln( 1
111
MDn X
MD X
n n
iin
ii
==
=
(10)
The Maximum Likelihood Estimator of theMedianGiven a random sample of size n
observations, n X X X ,...,, 21 , the estimator of the exponential population median based on themaximum likelihood method is denoted herein
by MD MLE and can be defined as follows:
)2ln()2ln(1 1
n
X MD
n
ii
MLE ===
(11a)
where
n
ii 1
n 1XX
=
= = . (11b)
Comparing the Two Estimators of theExponential Distribution Median
It is known that the maximum likelihoodestimators are asymptotically unbiased andefficient. Concretely, the estimator MD MLE is
unbiased and2
2)2(ln)(
n MDVar MLE = .
Moreover, the sample median estimator, MD1, isasymptotically normal distributed withasymptotic variance
)(4
1)(
21 MDnf MDVar A = , where (.) f is
the corresponding density and MD is thetheoretical median (Vann deer Vaart, 1998).Asymptotically unbiased means that theaverage value over many random samplesfor the two estimators MD 1 or MD MLE is theexponential distribution median, MD . Tocompare the two estimators MD 1 and MD MLE interms of how far they are from the exponentialdistribution median ( MD ) on the average for many random samples, it is necessary tocompare their root mean square error, RMSE ,given as follows:
==
r
i MDi MDr
RMSE 1
2)(1
(12)
where r MD MD MD ,...,, 21 are the values of the
estimators MD 1 and MD MLE for r replications and MD is the value of the exponential distributiontrue median.
The Confidence Interval for the ExponentialDistribution Median
Next, the confidence interval for themedian of the exponential distribution, MD , isderived by modifying the confidence interval for the mean of the exponential distribution, . Let
n X X X ,...,, 21 be a random sample of size n from the exponential distribution with parameter , that is, ( ) 1~ Exp X , then the exact twosided )%1(100 confidence interval for theexponential distribution mean, , is given by(Trivedi, 2001):
=
-
8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests
138/277
ABU-SHAWIESH M.O.A
465
n n
i ii 1 i 1
2 2(2n , 2) (2n ,1 2)
n n
i ii 1 i 1
2 2(2n, 2) (2n,1 2)
2 X 2 XMD 1
P( )ln(2)
2ln(2) X 2ln(2) XP( MD )
1
= =
= =
< = <
= < <
=
(15)
The MD MLE confidence interval is exact. It is
based on the fact that=
n
1iiX2 follows a
2)2( n distribution. The coverage probability
must be exactly 95%.
The MD MD1 Confidence IntervalThis confidence interval is obtained by
substituting the result from (10) into equation(13) to give the exact )%1(100 confidenceinterval for the exponential distribution median,
MD , as follows:
1 12 2(2n , 2) (2n ,1 2)
1 12 2(2n , 2) (2n,1 2)
2(n MD ln(2)) 2(n MD ln(2))MDP( )
ln(2)
2n MD 2n MDP( MD )
1
< <
= < <
= (16)
The MD1 confidence interval is notexact. Its expression in (16) is based on equation(10) which is only an approximation of thestatistics In order to see that, the performance of the MD1 confidence interval is studied bycalculating the coverage probability, the averagewidth and the standard error using Monte-Carlosimulations. Actual, approximate and exactconfidence intervals based on the sample median
MD 1 can be also constructed using standardmethods.
The Monte Carlo Simulation StudyA Monte Carlo simulation was designed
to compare and study the behavior of the twoestimators MD 1 and MD MLE and investigate the
behavior of the proposed approximateconfidence intervals for the exponentialdistribution median, MD . FORTRAN programswere used to generate the data from theexponential distribution and run the simulationsand to make the necessary tables. Results arefrom the exponential distribution with parameter which was set to 1 and 0.5, to increaseskewness. The more the repetition, the moreaccurate are simulated results, therefore 10,000random samples of sizes n = 10, 15, 20, 30, 40,50 and 100 were generated.
Table (1) shows the simulated results for the root mean square error, RMSE , and theaverage of MD 1s and MD MLEs (AVG) toillustrate that both estimators are approximatelyunbiased for the true median of the exponentialdistribution, MD . The simulated results for thecoverage probability ( P ), average width (AW)and standard error (SE) of the exact confidenceinterval for the exponential mean, , and thetwo proposed approximate confidence intervalsfor the exponential distribution median, MD , areshown in tables (2-4). The criteria used toevaluate the exact and proposed approximateconfidence intervals is the value of the coverage
probability ( P ) and average width (AW); a goodmethod should have an observed coverage
probability ( P ) near to the nominal coverage probability and a small scaled average width(AW).
The simulation results in Table 1 showthat the maximum likelihood estimator of themedian, MD MLE , is a much better estimator for the population median of the exponentialdistribution, MD , than the sample median, MD 1.While both estimators are approximately
unbiased, the root mean square error, RMSE , for the sample median, MD 1, is larger than that of the maximum likelihood estimator of themedian, MD MLE . It should be noted that theaccuracy of the maximum likelihood estimator of the median, MD MLE , increases as the samplesize, n, increases which clearly provides a verygood estimator, even considering that thediscrepancy of these two estimators is very small
-
8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests
139/277
ADJUSTED CONFIDENCE INTERVAL FOR EXPONENTIAL DISTRIBUTION MEDIAN
466
-that is, these two estimators asymptoticallycoincide.
As shown in Tables 2-4, the simulationresults show that the coverage probability ( P )for the confidence interval of the mean and theapproximate confidence interval of the median
based on the MLE method for the exponentialdistribution are the same and very close to thenominal confidence coefficient.
The approximate confidence interval of the median based on the sample median methodfor the exponential distribution provides thelower coverage probability ( P ) and gives the
lowest width among the three methods. Theaverage widths (AW) for the two proposedconfidence interval methods are approximatelythe same for moderate and large sample sizes.However, the estimated average width (AW) for the sample median method is the shortest amongall considered methods, but it has poor coverage
probability. Furthermore, as sample sizesincreases, the performance of the proposedconfidence interval based on the MLE methodimproves, but is still much lower than thenominal confidence.
Table 1: The Root Mean Square Error and Average of the Two Estimators for theExponential Distribution Median
SampleSize (n)
1= (True Median = 0.69315)
RMSE( MD 1) AVG( MD 1) RMSE( MD MLE ) AVG( MD MLE )
10 0.31575 0.74851 0.22369 0.69487
15 0.26754 0.72676 0.18040 0.69332
20 0.22459 0.72003 0.15715 0.69394
30 0.18276 0.71180 0.12808 0.69335
40 0.15907 0.70649 0.11106 0.69342
50 0.14108 0.70459 0.09887 0.69439100 0.10005 0.69894 0.06961 0.69265
SampleSize (n)
5.0= (True Median = 1.38629)
RMSE( MD 1) AVG( MD 1) RMSE( MD MLE ) AVG( MD MLE )
10 0.63150 1.49703 0.44739 1.38973
15 0.53509 1.45352 0.36080 1.38663
20 0.44918 1.44005 0.31430 1.38788
30 0.36553 1.42359 0.25615 1.38671
40 0.31814 1.41299 0.22211 1.38684
50 0.28215 1.40917 0.19773 1.38877
100 0.20010 1.39788 0.13922 1.38530
-
8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests
140/277
ABU-SHAWIESH M.O.A
467
Table 2: Coverage Probabilities, Average Width and Standard Error for the Confidence Intervalof the Mean of the Exponential Distribution
n
95.01 =
1= 5.0=
P AW SE P AW SE
10 94.55 1.504 0.484 94.55 3.007 0.968
15 94.84 1.148 0.299 94.84 2.297 0.59820 94.51 0.964 0.218 94.51 1.928 0.43730 94.58 0.762 0.141 94.58 1.524 0.282
40 94.94 0.650 0.104 94.94 1.299 0.20850 94.79 0.576 0.082 94.79 1.153 0.164
100 95.02 0.399 0.040 95.02 0.798 0.080
Table 3: Coverage Probabilities, Average Width and Standard Error for the Confidence Intervalof the Median of the Exponential Distribution with 1
=
n
95.01 = Confidence Interval Method
MDMLE Method MD MD1 Method
P AW SE P AW SE
10 94.55 1.123 0.336 85.11 1.042 0.46615 94.84 0.834 0.207 82.76 0.796 0.30520 94.51 0.693 0.151 83.92 0.668 0.21530 94.58 0.542 0.098 83.41 0.528 0.139
40 94.94 0.459 0.072 83.00 0.450 0.10350 94.79 0.405 0.057 83.28 0.400 0.081
100 95.02 0.279 0.028 82.95 0.277 0.040
Table 4: Coverage Probabilities, Average Width and Standard Error for the Confidence Intervalof the Median of the Exponential Distribution with 5.0=
n
95.01 = Confidence Interval Method
MDMLE Method MD MD1 Method
P AW SE P AW SE
10 94.55 2.246 0.671 85.11 2.085 0.93315 94.84 1.669 0.414 82.76 1.592 0.60920 94.51 1.387 0.303 83.92 1.337 0.43030 94.58 1.085 0.195 83.41 1.056 0.27740 94.94 0.918 0.144 83.00 0.901 0.20650 94.79 0.811 0.114 83.28 0.799 0.162
100 95.02 0.558 0.056 82.95 0.553 0.080
-
8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests
141/277
ADJUSTED CONFIDENCE INTERVAL FOR EXPONENTIAL DISTRIBUTION MEDIAN
468
Numerical ExampleThis example is taken from Wilk, et al.
(1962); the data set represents the lifetimes (inweeks) of 34 transistors in an accelerated lifetest. The transistors were tested and the testcontinued until all of them failed. The lifetimesfor the 34 transistors (in weeks) were recordedas follows:
3, 4, 5, 6, 6, 7, 8, 8, 9,9, 9, 10, 10, 11, 11, 11, 13, 13,
13, 13, 13, 17, 17, 19, 19, 25, 29,33, 42, 42, 52, 52, 52, 52
The sample mean 912.18= X weeks, theexponential median MD 1 = 13 weeks, theexponential median MD2 = 13.108 weeks and the
skewness is 1.265695, which is highly skeweddistribution. Furthermore,
053.005287.0912.1811 ===
X and -
using the approximation derived earlier - results
in 053.005332.013
69315.0)2ln(1
=== MD
which indicates that the two values are veryclose and therefore the approximation is good.Based on Kibria (2006), the above data set areassumed to come from an exponential
distribution with mean 21= weeks; using theKolmogorov-Smirnov test, the K-S statistic =0.1603 and the p-value = 0.3125, it indicates thatthe sample data are from an exponentialdistribution with mean 21= weeks, andtherefore (from equation (3)) has a median MD = 14.556 weeks. The resulting 95% confidenceinterval and the corresponding confidence width
for the exact confidence interval for theexponential mean and the two proposed methodsfor the exponential median are calculated andgiven in table (5).
From table (5), it is observed that theexact confidence interval for the exponentialmean, as expected, covered the hypothesizedtrue population mean of 21= weeks and alsothe proposed confidence intervals for theexponential median, MD , covered thehypothesized true population median MD =14.556 weeks. However, the proposedconfidence interval for the exponential median,
MD , based on the sample median, MD1, provided the shortest confidence interval width.
ConclusionThe median - one of the most important and
popular measures for location - has many goodfeatures. The median confidence interval isuseful for one parameter families, such as theexponential distribution, and it may not need to
be adjusted if censored observations are present.The maximum likelihood estimation is a popular statistical method used to make inferences about
parameters of the underlying probabilitydistribution from a given data set. This study
proposed an approximate confidence interval for the median of the exponential distribution, MD ,
based on two estimators, the sample median, MD 1, and the maximum likelihood estimator of the median, MD MLE .
The results of this study show that usinga maximum likelihood estimator, MLE , for the
population median of the exponentialdistribution, MD , is better alternative to theclassical estimator based on the sample median,
MD 1. As shown by the study results, themaximum likelihood estimator of the median,
MD MLE , provides a good estimation for the population median of the exponentialdistribution, MD , and the proposed confidenceinterval based on this estimator had a goodcoverage probabilities compared to the samplemedian method. However, it produced slightlywider estimated width. It appears that the samplesize, n, has significant effect on the two
proposed confidence interval methods.Moreover, both of the proposed methods arecomputationally simpler. If scientists and
Table 5: The 95% Confidence Intervals for theLifetimes Data
ConfidenceInterval Method
95% ConfidenceInterval Width
Exact for Mean (13.874 , 27.308) 13.434
MD MLE (9.617 , 18.929) 9.312
MD MD1 (9.537 , 18.772) 9.235
-
8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests
142/277
ABU-SHAWIESH M.O.A
469
researchers are conservative about the smaller width, they might consider confidence interval
based on sample median method as a possibleinterval estimator for the population median of the exponential distribution, MD . Finally, theresults obtained from the simulation studycoincided with that of the numerical example.
AcknowledgementThe author would like to thank the HashemiteUniversity for their cooperation during the
preparation of this article.
ReferencesBalakrishnan, N., & Basu, A. P. (1996).
The exponential distribution: theory, methodsand applications , CRC press.
Bettely, G., Mettric, N., Sweeney, E., &Wilson, D. (1994). Using statistics in industry:Quality improvement through total processcontrol , (1 st Ed. ). London: Prentice HallInternational Ltd.
Casella, G., & Berger, R. L. (2002).Statistical inference . Pacific Grove, CA:Duxbury.
Daniel, W. W. (1990). Applied nonparametric statistics (2nd Ed. ). Duxbury,Canada: Thomson Learning.
Francis, A. (1995). Businessmathematics and statistics . London: DP
Publications, Ltd.Kececioglu, D. (2002). Reliability
engineering handbook . Lancaster, UK: DEStechPublications.
Kibria, B. M. G. (2006). Modifiedconfidence intervals for the mean of theasymmetric distribution. Pakestanian Journal of Statistics , 22 (2), 109-120.
Kinney, J. (1997). Probability: Anintroduction with statistical applications . NewYork: John Wiley and Sons, Inc.
Lewis, E. E. (1996). Introduction to
reliability engineering , (2nd
Ed. ). New York:John Wiley and Sons, Inc.
Maguire, B. A., Pearson, E. S., & Wynn,A. H. A. (1952). The time intervals betweenindustrial accidents. Biometrika , 39 , 168-180.
Montgomery, D. C. (2005). Introductionto statistical quality control (5th Ed. ). New York:John Wiley and Sons, Inc.
Patel, J. K., Kapadia, C. H., & Owen, D.B. (1976). Handbook of statistical distributions .
New York: Marcel Dekker.Sinha, A., & Bhattacharjee, S. (2004).
Test of parameter of an exponential distributionin predictive approach. Pakestanian Journal of Statistics , 20(3), 409-414.
Smithson, M. (2001). Correctconfidence intervals for various regression effectsizes and parameters: The importance of noncentral distributions in computing intervals.
Educational and Psychological Measurement ,61 , 605-532.
Trivedi, K. S. (2001). Probability and statistics with reliability, queuing, and computer science applications (2nd Ed. ). New York: JohnWiley and Sons, Inc.
Van der Vaart, A.W., (1998). Asymptotic Statistics, 1-496. CambridgeUniversity Press, Cambridge.
Wilk, M. B., Gnanadesikan, R., &Huyett, M. J. (1962). Estimation of parametersof the gamma distribution using order statistics.
Biometrika , 49 , 525-545
-
8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests
143/277
Journal of Modern Applied Statistical Methods Copyright 2010 JMASM, Inc. November 2010, Vol. 9, No. 2, 470-479 1538 9472/10/$95.00
470
Nonlinear Trigonometric Transformation Time Series Modeling
K. A. Bashiru O. E. Olowofeso S. A. OwabumoyeOsun State University,
Osogbo, Osun StateFederal University Of Technology,
Akure Ondo State, Nigeria
The nonlinear trigonometric transformation and augmented nonlinear trigonometric transformation with a polynomial of order two was examined. The two models were tested and compared using daily meantemperatures for 6 major towns in Nigeria with different rates of missing values. The results were used todetermine the consistency and efficiency of the models formulated.
Key words: Nonlinear time series, polynomial, consistency, efficiency, missing value, model andforecasting.
Introduction
Time series analysis is an important techniqueused in many disciplines, including physics,engineering, finance, economics, meteorology,
biology, medicine, hydrology, oceanography andgeomorphology (Terasvirta & Anderson, 1992).This technique is primarily used to infer
properties of a system by the analysis of ameasured time record (data) (Priestley, 1988);this is accomplished by fitting a representativemodel to the data with an aim of discovering theunderlying structure as closely as possible.
Traditional time series analysis is based
on assumptions of linearity and stationarity.However, time series analysis (Box & Jenkins,1970; Brock & Potter, 1993) has nonlinear features such as cycles, asymmetries, bursts,
K. A. Bashiru is a lecturer in the Department of Mathematical and Physical Sciences. His areasof interest are Geostatistics and Time seriesEconometrics. Email him at:[email protected]. O. E.
Olowofeso is an Associate Professor of Statisticsin the Mathematical Sciences Department. He isalso currently working in the Central Bank of
Nigeria, Abuja. Email: [email protected]. A. Owabumoye is a Masters student under the supervision of O. E Olowofeso in theMathematical Sciences Department. Email:[email protected].
jumps, chaos, thresholds and heteroscedasticity,
and mixtures of these must also be taken intoaccount. Thus, a problem arises regarding asuitable definition of a nonlinear model becausenot every time series analysis is purely linear:the nonlinear class clearly encompasses a largenumber of possible choices. For these reasons,non-linear time series analysis is a rapidlydeveloping area and there have been major developments in model building and forecasting(De Gooijer & Kumar, 1992).
The growing interest in studyingnonlinear and non-stationary time series models
in many practical problems stems from theinherently non-linear nature of many phenomenain physics, engineering, meteorology, medicine,hydrology, oceanography, economics andfinance, that is, many real world problems donot satisfy the assumptions of linearity and/or stationarity (Bates & Watts, 1988; DeGooijer &Kumar, 1992; Sugihara & May, 1990).Therefore, for many real time series data,nonlinear models are more appropriate thanlinear models for accurately describing thedynamic of the series and making multi-step-
ahead forecast (Tsay, 1986; Barnett, Powell &Tauchen, 1991; Olowofeso, 2006). For example,financial markets and trends are influenced byclimatic factors like daily temperature, amountof rainfall and intensity of sun, these are areaswhere a need exists to explain behaviors that arefar from being even approximately linear.
Nonlinear models would be more appropriate for forecasting and accurately describing returns and
-
8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests
144/277
BASHIRU, OLOWOFESO & OWABUMOYE
471
volatility. Thus, the need for the further development of the theory and applications for nonlinear models is essential, and, because thereare an enormous number of nonlinear modelsavailable for modeling and forecasting economictime series, research should help provideguidance for choosing the best model for a
particular application (Robinson, 1983).
MethodologyThe model proposed by Gallant (1981) calledthe Augmented Nonlinear Parametric TimeSeries Model (ANPTSM) was used in this studyand a second model was formulated based on theLeast Square Method Modified Nonlinear Trigonometric Transformation Time SeriesModel (MNTTTSM).
DataData used in this study were daily mean
of temperatures from 1987 to 1996 for Ikeja,Ibadan, Ilorin, Minna and Zaria. The data werecollected from the Meteorological Centre-Oshodi Lagos.
Model FormulationConsider the format shown in Table 1.
In this model, up to 9 years were considered andthe model is formulated based on the data asshown in Table 2.
Assumption and Notation for the ModelsLet:
X t,i,k = value of occurrence for day t of Month iin the year k;
X t,k = mean occurrence for day t of year k;X*i,k = mean occurrence for month i of year k;X*yK = overall yearly mean for the sampled
month;X*mi = overall monthly mean for the sampled
year;t = the position of the day from the first day of
the Month. 1 t 31;ti = the sum of days in month i for 1 i 12;tik = the sum of days from the initial sampled
month of initial sampled year to month iof year k;
ti* = the sum of days from the initial sampledmonth to month I;
k = the position of a particular year from aninitial sample year for k ;
n = the number of sampled years;m = the number of sampled months; andX* = Grand Mean occurrence for k year(s)
examined.The first model was reviewed based on
the assumption that the sum of the occurrenceswere presented monthly, where i th monthrepresents the month i for 1 i 12 which is to
be modeled using the number of days in eachmonth (see Table 3).
Table 1: Model Formulation for a Particular Year
t/i 1 2 3 4 5 6 7 8 9 10 11 12 xi/12
1 x1,1,k x1,2,k x1,3,k x1,4,k x1,5,k x1,6,k x1,7,k x1,8,k x1,9,k x1,10,k x1,11,k x1,12,k X 1
2 x2,1,k x2,2,k x2,3,k x2,4,k x2,5,k x2,6,k x2,7,k x2,8,k x2,9,k x2,10,k x2,11,k x2,12,k X 2
t x t,1,k xt,2,k xt,3,k xt,4,k xt,5,k xt,6,k xt,7,k xt,8,k xt,9,k xt,10,k xt,11,k xt,12,k X t *
1,k x*2,k x*3,k x*4,k x*5,k x*6,k x*7,k x*8,k x*9,k x*10,k x*11,k x*12,k x*i,k /12
Table 2: Model Data Formulationt/ik 1 2 3 i k xi/ik
1 x1,1,1 x1,2,1 x1,3,1 x 1,i,k X 1,k
2 x2,1,1 x2,2,1 x2,3,1 x 2,i,k X 2,k
t x t,1,1 xt,2,1 xt,3,1 x t,i,k X t,k
x*1,k x*2,k x*3,k x *i,k x*i,k /ik= X
-
8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests
145/277
NONLINEAR TRIGONOMETRIC TRANSFORMATION TIME SERIES MODELING
472
Augmented Nonlinear Parametric Time SeriesModel (ANPTSM)
Trigonometric (sine and cosine)transformation augmented with polynomial of order two was applied to formulate the modelacross the year, that is, the monthly meansample and the least square methods were usedfor estimating the models parameters asfollows. Let the equation be of the form
( ) ( )2t ,i,k 1 2 ik 3 ik iX a a tSin t a t Cos t1 i 12
= + + +
(3.0)
The expected value of X t,i,k is X *i,k then theequation can be reformed as below to estimate
the parameters; a 1, a2 and a 3 using Least SquareMethod.
( ) ( )* 2i,k 1 2 i ik 3 i ik iX a a t Sin t a t Cos t1 i 12
= + + +
(3.1)
( ) ( )( )* 2i i,k 1 2 i ik 3 i ik X a a t Sin t a t Cos t = + +
(3.2)Let i2 = S
( ) ( )( )* 2 2i,k 1 2 i ik 3 i ik S (X a a t Sin t a t Cos t )= + + (3.3)
Differentiating 3.3 with respect to a 1, a2, a3, a3,as
01
aS
results in
( ) ( )* 2i,k 1 2 i ik 3 i ik X ma a t Sin t a t Cos t = + + (3.4)
where m is the number of the monthly samplemean examined. Similarly, as
02
aS
then
( ) ( )* 2 2i ik i,k 1 i ik 2 i ik 3
3 i ik ik
t Sin t X a t Sin t a t Sin (t )
a t Sin(t )Cos(t )
= +
+
(3.5)
and as0
3
aS
then
( ) ( )( ) ( )
( )
2 * 21i ik i,k i ik
32 i ik ik
4 23 i ik
t Cos t X a t Cos t
a t Sin t Cos t
a t Cos t
=
+
+
(3.6)
Simultaneously solving equations 3.4, 3.5 and3.6 using Cramers Rule results in equations 3.7-3.10.
Table 3: Months and Sums of Occurrences ModeledJan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan
i 1 2 3 4 5 6 7 8 9 10 11 12 13 i k
t 31 281/4 31 30 31 30 31 31 30 31 30 31 31 t i
ti 31 591/4 90
1/4 1201/4 151
1/4 1811/4 212
1/4 2431/4 273
1/4 3041/4 334
1/4 3651/4 396
1/4 t
-
8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests
146/277
BASHIRU, OLOWOFESO & OWABUMOYE
473
Therefore, from equations 3.7, 3.8, 3.9 and 3.10,the following result:
a1 =0
1
(3.11)
a2 =0
2
(3.12)
a3 =0
3
(3.13)
Next, substituting 3.11, 3.12 and 3.13 into 3.1gives:
* 231 2i,k ik ik
0 0 0
X tSin(t ) t Cos(t ).
= + +
(3.14)
Because X *i,k is the expected value of X t,i,k ,equation 3.14 can be rewritten as
231 2t,i,k ik ik
0 0 0X tSin(t ) t Cos(t ).
= + +
(3.15)
Models 3.14 and 3.15 would only be visible provided there is an occurrence within a monthof any sampled year.
Modified Nonlinear TrigonometricTransformation Time Series Model(MNTTTSM)
In a situation where a whole month of
data is missing, the above model may bedifficult to apply and a different model would beneeded. The model for such occurrence isformulated as follows. If the data in 3.2 arereformed such that the monthly means are thoseshown in Table 4. Consider:
( )*i,k i iX a bsin t *= + + (3.16)
( ) ( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )
2 2 4 2 3 20 i ik i ik i ik ik
4 2 2 3i ik i ik i ik i ik i ik ik
2 3 2 2 2i ik i ik i ik i i i ik
m{ t Sin t t Cos t ( t Sin t Cos t ) }
t Sin t { t Sin t t Cos t t Cos t t Sin t Cos t }
t Cos t { t Sin t t Sin t Cos t t Cos t t Sin t }
=
+
(3.7)
( ) ( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )
* 2 2 4 2 3 21 i,k i ik i ik i ik ik
* 4 2 2 * 3i ik i ik i,k ik i ik i,k i ik ik
2 * 3 2 * 2 2i ik i ik i,k i ik ik i ik i,k i ik
X { t Sin t t Cos t ( t Sin t Cos t ) }
t Sin t { t Sin t X t Cos t t Cos t X t Sin t Cos t }
t Cos t { t Sin t X t Sin t Cos t t Cos t X t Sin t }
=
+
(3.8)
( ) ( ) ( ) ( ) ( )( ) ( ) ( ) ( ) ( )
( ) ( ) ( ) ( ) ( )
* 4 2 2 * 32 i ik i,k i ik i ik i,k i ik ik
* 4 2 2 3i,k i ik i ik i ik i ik ik
2 2 * 2 *i ik i ik i ik i,k i ik i ik i,k
m{ t Sin t X t Cos t ( t Cos t X t Sin t Cos t }
X { t Sin t t Cos t t Cos t t Sin t Cos t }
t Cos t { t Sin t t Cos t X t Cos t t Sin t X }
=
+
(3.9)
( ) ( ) ( ) ( ) ( )( ) ( ) ( ) ( ) ( )
( ) ( ) ( ) ( ) ( )
2 2 2 * 3 *3 i ik i ik i,k i ik i i ik i,k
2 * 2 *i ik i ik i ik i,k i ik i ik i,k
* 3 2 2 2i,k i ik i ik ik i ik i ik
m{ t Sin t t Cos t X ( t Sin t Cos t t Sin t X }
t Sin t { t Sin t t Cos t X t Cos t t Sin t X }
X { t Sin t t Sin t Cos t t Cos t t Sin t
=
+
(3.10)
-
8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests
147/277
NONLINEAR TRIGONOMETRIC TRANSFORMATION TIME SERIES MODELING
474
where
i1
1 i 12 , 1 t 3654
If the expected value of X *i,k is X *mi, then
equation 3.16 can take the form
( )*m i i iX a b sin t *= + + (3.17)where
i1
1 i 12 , 1 t * 3654
An ordinary least square method was used inestimating the parameters a and b. If S m = i2 = (X*mi -(a+bsin(t i*))2, then differentiating withrespect to a and b
( )( )*m *m i iS 2 X a bsin ta
= +
as
0
a
S m
( )*m *i i X 12a b sin t = + (3.18)
Also,
( ) ( )( )( )* *m *m i i iS 2 (sin t X a bsin t b
= +
as
0
bS m
( ) ( ) ( )*m * 2 *i i i isin t * X a sin t b sin t = + (3.19)
Using Cramers Rule to solve equations 3.18and 3.19 simultaneously, results in:
( )22 * *
4 i i
*m 2 * * *m *5 i i i i i
* *m *m *6 i i i i
12 Sin (t ) Sin(t )
X Sin (t ) Sin(t )X Sin(t )
12 Sin(t )X X Sin(t )
=
=
=
where parameters
( )
*m 2 * * *m *5 i i i i i
22 * *4 i i
X Sin (t ) Sin(t )X Sin(t )a
12 Sin (t ) Sin(t )
= =
(3.20)
and
( )
* *m *m *6 i i i i
22 * *4 i i
12 Sin(t )X X Sin(t ) b
12 Sin (t ) Sin(t )
= =
(3.21)
Therefore, the model for monthly occurrence is
Table 4: Modified Nonlinear Trigonometric Transformation Time Series Model Data
t/i 1 2 3 4 5 6 7 8 9 10 11 12 xi/12
1 X*1,1 X*2,1 X*3,1 X*4,1 X*5,1 X*6,1 X*7,1 X*8,1 X*9,1 X*10,1 X*11,1 X*12,1 X*y1
2X*
1,2 X*
2,2 X*
3,2 X*
4,2 X*
5,2 X*
6,2 X*
7,2 X*
8,2 X*
9,2 X*
10,2 X*
11,2 X*
12,2 X*y
2
k X*1,k X*2,k X*3,k X*4,k X*5,k X*6,k X*7,k X*8,k X*9,k X*10,k X*11,k X*12,k X*yk
x*m1 x*m2 x*m3 x*m4 x*m5 x*m6 x*m7 x*m8 x*m9 x*m10 x*m11 x*m x*m,i = X*12
-
8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests
148/277
BASHIRU, OLOWOFESO & OWABUMOYE
475
* *5 6
4 4
( )mi i X Sin t
= +
(3.22)
Because X *mi is an expected value for X *i,k thenequation 3.22 can be rewritten as
)( *
4
6
4
5,
*ik i t Sin X
+
= (3.23)
Similarly, along the sampled year X *yK = c + dSin( k) for - k + , 15 75. The must
be chosen such that i = 0, i2 is as small as possible.
If S y = i2 = (X*yK -(c+dsin( k ))2 then
y *y
K
S2 (X (c dsin( k))
c
= +
as
0
c
S y
*yK X nc d sin( k) = + (3.24)
Also,
y *yK
S2 (sin( k) X (c dsin( k)))
d
= +
as
0
d
S y
*y 2K sin( k) X c sin( k) d sin ( k) = +
(3.25)
Solving equations 3.24 and 3.25 simultaneously
using Cramers Rule results in2 2
7
*y 2 *y8 k k
*y *y9 k k
n Sin ( k) ( Sin( k))
X Sin ( k) Sin( k)X Sin( k)
n Sin( k)X X Sin( k)
=
=
=
Where the parameters
*y 2 *y8 k k
2 27
X Sin ( k) Sin( k)X Sin( k)c
n Sin ( k) ( Sin( k))
= =
(3.26)and
*y *y9 k k
2 27
n Sin( k)X X Sin( k)d
n Sin ( k) ( Sin( k))
= =
(3.27)
*y 8 9k
7 7
X Sin( k)
= +
(3.28)
The method of placing expected
occurrences in a contingency table of a Chi-square was applied using equations 3.23 and3.28 to obtain the model to find the dailyoccurrences for a particular month of a particular year. Therefore, the model for expected dailyoccurrences is
k y
k y
im
k it X X X n
X ***
,,))((
= (3.29)
Substituting 3.23 and 3.28 into 3.29, results in
+
+
+
=
)(
)()(
7
9
7
8
7
9
7
8*
4
6
4
5
,,
k Sin
k Sint Sinn
X i
k it
(3.30)
ResultsModel Analysis and Discussion
The data on the daily mean temperaturefor Ikeja, Ibadan, Ilorin, Minna and Zariacollected from the Meteorological Centre-Oshodi Lagos were used. The parameters of themodels were estimated and the fitted models for each zone are shown in Table 5 for Ikeja,Ibadan, Ilorin and Minna for ANPTSM. Data for the daily mean temperature was used to estimatethe parameters. The fitted model for Zaria could
-
8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests
149/277
NONLINEAR TRIGONOMETRIC TRANSFORMATION TIME SERIES MODELING
476
not be formulated due to the fact that manymonths of data were missing.
Table 6 shows the fitted models for
Ikeja, Ibadan,Ilorin, Minna and Zaria for MNTTTSM using the daily mean temperaturedata to estimate their parameters. The fittedmodel for Zaria was formulated becauseMNTTTSM has the strength of addressing the
problem of missing values. Thus, although manymonths data were missing from Zarias dailymean temperature, MNTTTSM parameterscould still be estimated. This is one of the
advantages of MNTTTSM over ANPTSM.Table 7 shows that the results of the
Pearson Product Moment Correlation
coefficients and Spearman Browns rank Order Correlation coefficients for Ikeja, Ibadan, Ilorinand Minna are highly and positively correlated,indicating a strong relationship between theactual data and estimated data for the daily meantemperature. In Zaria the correlation coefficientfor MNTTTSM is positive but low which mayindicate a weak relationship between the actualand estimated daily mean temperatures.
Table 5: The Fitted Models for ANPTSMZones Augmented Nonlinear Parametric Time Series Model (ANPTSM)
IKEJA ( ) ( )226.88642582 0.047971536tSin 0.000143793 Cost t tik ik + IBADAN ( ) 2ik ik 26.36612286 0.054847742tSin 0.0000344912t Cost t+ ILORIN ( ) ( )t t ik Cos
2
ik 000833551.04tSin0.0481158726.2476883 t +
MINNA ( ) ( )t t t ik ik CostSin2
00073.0062853.072428.25 +
ZARIA
Table 6: The Fitted Models for MNTTTSMZones Modified Nonlinear Trigonometric Transformation Time Series Model (MNTTTSM)
IKEJA
=+
++10
1
*
)601311.088996.26(
)6013116.088996.26)(420072.187226.26(10
K
i
k Sin
k SinSin t
IBADAN
=
+
++10
1
*
)901311.036761.26(
)9013535.036761.26)(591834.136749.26(10
K
i
k Sin
k SinSin t
ILORIN
=
+
++10
1
*
)45409024.040106.26(
)45409024.040106.26)(816182.145708.26(10
K
i
k Sin
k SinSin t
MINNA=
+
++
10
1
*
)90112148.067047.27(
)90112148.067047.27)(508736.256143.27(10
K
i
k Sin
k SinSin t
ZARIA
=
+
++10
1
*
)90222282.000445.25(
)90222282.000445.25)(210108.198532.24(10
K
i
k Sin
k SinSin t
-
8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests
150/277
BASHIRU, OLOWOFESO & OWABUMOYE
477
Apart from Ibadan, in which thecorrelation coefficient in ANPTSM is greater than MNTTTSM and Ikeja which has equalcorrelation coefficients, all other Zones, thecorrelation coefficient in MNTTTSM is greater than ANPTSM. This indicates that MNTTTSMshows a stronger relationship between the actualand estimated values than does ANPTSM.Although the relationship between actual andestimated values of MNTTTSM in Zaria is weak
but positive, that of ANPTSM could not beestimated due to the large number of missingvalues in the data. Also, all of the correlations
are significant at the 0.01 level (2-tailed).As shown in Table 8, the mean of the
actual and estimated values for each zones of allmodels are almost equal; differences are due toapproximation (truncation error) duringcalculation. Also, the mean of the actual andestimated values of MNTTTSM are closer thanthose of ANPTSM, which implies thatMNTTTSM estimates better than ANPTSM. Itwas also discovered from results in Table 8 thatthe more missing values in the data, the weaker the ANPTSM is in estimating, while inMNTTTSM, the model maintains its precision.
Table 7: Correlation Coefficients
Zones TypesANPTSM MNTTTSM
Coefficients Sig. Coefficients Sig.
IKEJAPearsons r 0.607 .000 0.607 .000
Spearmans Rho 0.620 .000 0.620 .000
IBADANPearsons r 0.594 .000 0.575 .000
Spearmans Rho 0.622 .000 0.584 .000
ILORINPearsons r 0.503 .000 0.589 .000
Spearmans Rho 0.560 .000 0.612 .000
MINNAPearsons r 0.596 .000 0.676 .000
Spearmans Rho 0.656 .000 0.686 .000
ZARIAPearsons r - - 0.419 .000
Spearmans Rho - - 0.445 .000
Table 8: Comparison of ANPTSM and MNTTSM Means
Zones NANPTSM MNTTTSM
Actual Estimated Actual Estimated
IKEJA 3,660 26.9077 26.8759 26.9077 26.8759
IBADAN 3,601 26.3749 26.3756 26.3749 26.3791
ILORIN 3,580 26.4558 26.2443 26.4558 26.4593
MINNA 3,362 27.5489 26.3611 27.5489 27.5559
ZARIA 3,588 - - 25.0514 25.0172
-
8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests
151/277
NONLINEAR TRIGONOMETRIC TRANSFORMATION TIME SERIES MODELING
478
Table 9 shows that the standarddeviations for MNTTTSM are less than those of ANPTSM which indicates that MNTTTSM is
better in estimating and forecasting thanANPTSM. Similarly, apart from the standarderror of ANPTSM and MNTTTSM of Ikeja,which are equal, it may be observed that thestandard errors for MNTTTSM were alsosmaller than those of ANPTSM, which indicatesthat MNTTTSM is better in estimating andforecasting than ANPTSM for time series datawith missing values.
Table 10 shows that at Ikeja, there is a95% chance that the differences between theactual and estimated daily mean temperaturewould lie between -0.00749 and 0.07119 inANPTSM and -0.00748 and 0.07118 inMNTTTSM. Similarly, at Ibadan; -0.0462 and0.04473 in ANPTSM and -0.0496 and 0.04114
in MNTTTSM, at Ilorin; 0.1505 and 0.2723 inANPTSM and -0.0592 and 0.05218 inMNTTTSM, at Minna; 1.1155 and 1.2601 inmodel I and -0.0689 and 0.05482 in MNTTTSMwhile in Zaria is between -0.0546 and 0.12310.
It was also discovered that the range of the confidence interval for MNTTTSM is lessthan that of ANPTSM for Ikeja and Ibadan. InIlorin and Minna, the lower confidence intervalsof differences for ANPTSM are positive whichindicates a 95% chance that the differences
between their actual and estimated dailytemperature (actual estimate) are positivewhile those of MNTTTSM are not. This impliesthat the estimated daily temperatures for ANPTSM at Ilorin and Minna were under-estimated. Hence MNTTTSM is better inestimating and forecasting than ANPTSM whenthere are missing values in the time series.
Table 9: Comparison of ANPTSM and MNTTSMS Standard Deviation andStandard Error of Differences
ZonesANPTSM MNTTTSM
Std. Dev. Std. Error of the Mean Std. Dev.Std. Error of
the MeanIKEJA 1.2138 0.02006 1.2137 0.02006
IBADAN 1.3913 0.02319 1.3882 0.02313
ILORIN 1.8585 0.03106 1.6996 0.02841MINNA 2.1381 0.03688 1.8293 0.03155
ZARIA - - 2.7152 0.04533
Table 10: Comparison of ANPTSM and MNTTTSMs 95 % Confidence Interval of the Difference
ZonesANPTSM MNTTTSM
Lower Upper Lower Upper
IKEJA -0.00749 0.07119 -0.00748 0.07118
IBADAN -0.0462 0.04473 -0.0496 0.04114
ILORIN 0.1505 0.2723 -0.0592 0.05218
MINNA 1.1155 1.2601 -0.0689 0.05482
ZARIA - - -0.0546 0.12310
-
8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests
152/277
BASHIRU, OLOWOFESO & OWABUMOYE
479
ConclusionThe two models tested in this study were theAugmented Nonlinear Parametric Time SeriesModel (ANPTSM) and the Modified Nonlinear Trigonometric Transformation Time SeriesModel (MNTTTSM). Both models were testedusing daily mean temperatures at Ikeja, Ibadan,Ilorin, Minna and Zaria, and the results wereanalyzed. It was discovered that ANPTSM could
be used in forecasting provided the data ishaving few missing values. However MNTTTSM estimates forecasts better thanANPTSM in estimating missing values andforecasting. Based on results of this study,MNTTTSM is more efficient in estimatingmissing values and forecasts better thanANPTSM.
The beauty of a good model developedfor nonlinear time series modeling is the abilityto forecast better, the new method MNTTTTSMis therefore recommended for numericalsolutions for a nonlinear model with missingvalues due to its higher capacity to addressmissing values. It was also noted that themathematical derivative of MNTTTSM issimpler than ANPTSM which did not forecast
better. Further research could be conducted by placing a condition in which data having a year or more of missing values is taken intoconsideration.
ReferencesBarnett, W. A., Powell, J., & Tauchen,
G. E. (1991). Nonparametric and semi parametric methods in econometrics and statistics . Proceedings of the 5 th InternationalSymposium in Economic Theory andEconometrics. Cambridge: CambridgeUniversity Press.
Bates, D. M., & Watts, D. G. (1988). Nonlinear regression analysis and itsapplications . New York: Wiley.
Box, G. E. P., Jenkins, G. M. (1970).Time series analysis, forecasting and control .San Francisco: Holden-Day.
Brock, W. A., Potter, S. M. (1993). Nonlinear time series and macroeconometrics.In: G.S. Maddala, C. R. Rao & H. R. Vinod,Eds., Handbook of Statistics , Vol. 11 , 195-229.Amsterdam: North-Holland.
De Gooijer, J. G., & Kumar, K. (1992).Some recent developments in non-linear timeseries modeling, testing and forecasting.
International Journal of Forecasting , 8, 135-156.
Friedman, J. H., & Stuetzle, W. (1981).Projection pursuit regression. Journal of the
American Statistical Association , 76 , 817-823.Gallant, A. R. (1987). Nonlinear
statistical models . New York: Wiley.Gallant, A. R., & Tauchen, G. (1989).
Semi nonparametric estimation of conditionallyconstrained heterogeneous processes: asset
pricing application. Econometrica , 57 , 1091-1120.
Granger, C. W. J., & Hallman, J. J.(1991a). Nonlinear transformations integratedtime series. Journal of Time Series Analysis , 12 ,207-224.
Olowofeso, O. E. (2006). Varying co-efficient regression models for non-linear time
series: Estimation and testing .13 th Meeting of the Forum for Interdisciplinary Mathematicaland Statistical Techniques.
Priestley, M. (1988). Non-linear and
non-stationary time series analysis . London:Academic Press.
Robinson, P. M. (1983). Non-parametricestimation for time series models. Journal of Time Series Analysis , 4, 185-208.
Sugihara, G., & May, R. M. (1990). Nonlinear forecasting as a way of distinguishingchaos from measurement error in time series.
Nature , 344 , 734-741.Terasvirta, T., & Anderson, H. M.
(1992). Modelling nonlinearities in businesscycles using smooth transition autoregressive
models. Journal of Applied Econometrics , 7 ,S119-S136.
Tsay, R. S. (1986). Nonlinearity tests for time series. Biometrika , 73 , 461-466.
-
8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests
153/277
Journal of Modern Applied Statistical Methods Copyright 2010 JMASM, Inc. November 2010, Vol. 9, No. 2, 480-487 1538 9472/10/$95.00
480
Incidence and Prevalence for A Triply Censored Data
Hilmi F. KittaniThe Hashemite University,
Jordan
The model introduced for the natural history of a progressive disease has four disease states which areexpressed as a joint distribution of three survival random variables. Covariates are included in the modelusing Coxs proportional hazards model with necessary assumptions needed. Effects of the covariates areestimated and tested. Formulas for incidence in the preclinical, clinical and death states are obtained, and
prevalence formulas are obtained for the preclinical and clinical states. Estimates of the sojourn times inthe preclinical and clinical states are obtained.
Key words: Progressive disease model, prevalence, incidence, trivariate hazard function, censored data, proportional hazards model, sojourn times, chronic habitu.
IntroductionLouis, et al. (1978) introduced a natural historymodel for a progressive disease in a set of threearticles: Albert, Gertman and Louis (1978),Albert, Gertman, Louis and Liu(1978) andLouis, Albert and Heghinian (1978). This modelwas extended by Kittani (1995a). Clayton (1978)also developed a model for association for the
bivariate case and Oakes (1982) made inferencesabout the association parameter in Claytonsmodel. Clayton and Cuzick (1985) introduced
the bivariate survival function for two failuretimes and made inferences about the association parameter, . Kittani (1995a, 1996, 1997, 1997-1998) considered the model for the bivariatecase that is, a case with two failure times (X,T) by including covariates and by using Coxs
proportional hazards model.The motivation for this research lies in
the fact that it is necessary to identify a threedimensional survival function for three failuretimes (X, Y, D) with four disease states (diseasefree state, preclinical state, clinical state and
death state). In the model, X is the age uponentering the preclinical state (tumor onset or first
Hilmi Kittani is a Professor of Statistics in theCollege of Science, Department of Mathematics.Email: [email protected].
heart attack), T is the age when entering theclinical state (symptoms first appear or secondheart attack) and D is the age upon entering thedeath state (dying of cancer or acute myocardialinfarction). Kittani (2010) considered estimatingthe parameters using nonparametric approach for a triply censored data.
Background and AssumptionsAs in the Louis, et al. (1978) model, it is
assumed that f XYZ(x,y,z,a) is continuous that is,
X = Y = Z = is not allowed and Y and Z aretermed the sojourn times in the preclinical andclinical states respectively. The model proposed
by Louis, et al. (1978) makes the assumption of no cohort effect, meaning that the distribution of the random variables (X, Y, Z) is independent of the age distribution A, or
f XYZA (x, y, z, a) = f XYZ(x, y, z) f A(a)
andf XYA (x, y, a) = f XY(x, y) f A(a)
where f XYZ(x, y, z) is the joint pdf of (X, Y, Z) ,f XY(x,y) is the joint pdf of X, Y and f A(a) is the
pdf of A( the age distribution of the subject population). In addition, a subject is a chronichabitu of the PCS if, for that subject, X < , Y= , for example, subject never leaves PCS.According to the model, there will be no chronichabitus of the PCS or CS because, if a subject
-
8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests
154/277
KITTANI
481
lives long enough, then he/she will progress tothe next state eventually (Louis et al., 1978).
The X, Y and Z axes are partitioned intoI, J and K intervals according to Chiang, et al.(1989) and Hollford (1976); they assumedconstant baseline hazards in each subinterval, 1i(x) = 1i, x Ii, 2j (y) = 2j , y I j and 3k (z)= 3k , z Ik in the i th, jth and k th intervalsrespectively. The hazard functions for the n th individual whose (X, Y, Z) values fall in thecube I ixI jxIk are modeled by assuming Coxs(1972) proportional hazards model and holds for each X, Y and Z in each respective I i, I j and I k interval.
Assuming , and (regression parameters) for the covariate (p-dimensional)are constant (the same) for all intervals to beestimated . The hazard functions 1, 2 and 3 inthe I th, Jth and K th intervals for n th individualwhose observed (X, Y, Z) value is ( Xn, Yn, Zn)will be defined as
'n(x ) e , x I1i n 1i n i
(a , a ], (y )i i 1 2 j n'
ne ,2 j
=
= +
=
andy I (b , b ], (z )n j j j 1 3k n
'ne , z I3k n k
(c ,c ].k k 1
= +
=
= +
Where 1i, 2j and 3k are baseline hazardfunctions associated with X, Y and Zrespectively. Assuming , and are constant(the same) regression parameters for thecovariate for all intervals and to be estimatedalong with the association parameter .
The joint survival function for the threenon-negative random variables (X, Y, Z) given
by Kittani (1995b) is:
1
-)()()( ]2[z),F(x, 321 ++= z y x eee y .(2.1)
Where > 0, x > 0, y> 0, z > 0, and 1, 2, 3 are the cumulative hazard functions associatedwith X, Y and Z respectively. For example, tocompute 1i(x), which is the cumulative hazardfunction for the n th individual whose x value fallsin the i th interval (assuming a constant hazardover each interval) is as follows:
k x
n 10
r 1 r n i
(x ) (u)du1i
i 1 '(a a ) (x a ) e1r 1i
r 1
+
=
= +
= (2.2)
where 2j(yn) and 3k (zn) are defined in a similar way. Thus, the joint density function (X, Y, Z) is
1 32
1 2
3
3)[ ( x ) ( y ) ( z )]
y
( 1)( 2) (x ) (y)
(z) e U1
(-
f(x, ,z)
+ +
=
+ +
(2.3)
where > 0, x > 0, y > 0, z > 0, and 1, 2 and 3 are base line hazard functions associated with X,Y and Z respectively as
31 2 (z)(x) (y)U e e e 2. = + +
Kittani (1996) derived the likelihood functionfor the uncensored and censored cases in order to estimate the regression parameters bymaximizing the likelihood function, that is, thenth individual that generates data vector wn , andL(wn) is the likelihood function contribution for the n th individual as:
.1 2 N N
L(w ,w ,...,w ) L(w )nn 1
= =
-
8/2/201