Empirical Characteristic Function Approach to Goodness of Fit Tests

8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests

1/277

Journal of Modern Applied Statistical Methods Copyright 2010 JMASM, Inc.

November, 2010, Vol. 9, No. 2, 332-603 1538 9472/10/$95.00

iii

Journal Of Modern Applied Statistical Methods

Shlomo S. Sawilowsky

Editor College of EducationWayne State University

Harvey Keselman Associate Editor

Department of PsychologyUniversity of Manitoba

Bruno D. Zumbo Associate Editor Measurement, Evaluation, & Research Methodology

University of British Columbia

Vance W. Berger Assistant Editor

Biometry Research Group National Cancer Institute

John L. Cuzzocrea Assistant Editor

Educational ResearchUniversity of Akron

Todd C. Headrick Assistant Editor

Educational Psychology and Special EducationSouthern Illinois University-Carbondale

Alan Klockars Assistant Editor

Educational PsychologyUniversity of Washington


2/277


November, 2010, Vol. 9, No. 2, 332-603 1538 9472/10/$95.00

iv

Journal Of Modern Applied Statistical Methods

Invited Debate

332 339 Daniel H. Robinson, The Not-So-Quiet Revolution: CautionaryJoel R. Levin Comments on the Rejection of HypothesisTesting in Favor of a Causal ModelingAlternative

340 347 Joseph Lee Rodgers Statistical and Mathematical Modelingversus NHST? Theres No Competition!

348 358 Lisa L. Harlow On Scientific Research: The Role of Statistical Modeling and HypothesisTesting

Regular Articles359 368 Robert H. Pearson, Recommended Sample Size for Conducting

Daniel J. Mundfrom Exploratory Factor Analysis onDichotomous Data

369 378 Marcelo Angelo Cirillo, Generalized Variances Ratio Test for Daniel Furtado Ferreira, Comparing k Covariance Matrices fromThelma Sfadi, Dependent Normal PopulationsEric Batista Ferreira

379 387 Kung-Jong Lui Notes on Hypothesis Testing under a Single-Stage Design in Phase II Trial

388 402 Housila P. Singh, Effect of Measurement Errors on the SeparateNamrata Karpe and Combined Ratio and Product Estimators

in Stratified Random Sampling

403 413 Xian Liu, Reducing Selection Bias in AnalyzingCharles C. Engel, Longitudinal Health Data with HighHan Kang, Mortality RatesKristie L. Gore

414 442 Oluseun Odumade, Use of Two Variables Having Common MeanSarjinder Singh to Improve the Bar-Lev, Bobovitch and

Boukai Randomized Response Model

443 451 Peyman Jafari, A Flexible Method for Testing IndependenceNoori Akhtar-Danesh, in Two-Way Contingency TablesZahra Bagheri


3/277


November, 2010, Vol. 9, No. 2, 332-603 1538 9472/10/$95.00

v

452 460 Seema Jaggi, Neighbor Balanced Block Designs for Two

Cini Varghese, FactorsN. R. Abeynayake

461 469 Moustafa Omar Ahmed Adjusted Confidence Interval for theAbu-Shawiesh Population Median of the Exponential

Distribution

470 479 K. A. Bashiru, Nonlinear Trigonometric TransformationO. E. Olowofeso, Time Series ModelingS. A. Owabumoye

480 487 Hilmi F. Kittani Incidence and Prevalence for A TriplyCensored Data

488 494 Mowafaq M. Al-Kassab, A Comparison between Unbiased Ridge andOmar Q. Qwaider Least Squares Regression Methods Using

Simulation Technique

495 501 Hatice Samkar, Ridge Regression Based on Some RobustOzlem Alpu Estimators

502 511 Sanizah Ahmad, Robust Estimators in Logistic Regression:Norazan Mohamed Ramli, A Comparative Simulation StudyHabshah Midi

512 519 Sunil Kumar, A General Class of Chain-Type EstimatorsHousila P. Singh, in the Presence of Non-Response Under Sandeep Bhougal Double Sampling Scheme

520 535 Li-Chih Wang, A GA-Based Sales Forecasting ModelChin-Lien Wang Incorporating Promotion Factors

536 546 Anton Abdulbasah Kamil, Maximum Downside Semi DeviationAdli Mustafa, Stochastic Programming for PortfolioKhilpah Ibrahim Optimization Problem

547 557 Gyan Prakash On Bayesian Shrinkage Setup for ItemFailure Data Under a Family of Life TestingDistribution

558 567 M. T. Alodat, Empirical Characteristic Function ApproachS. A. Al-Subh, to Goodness of Fit Tests for the LogisticKamaruzaman Ibrahim, Distribution under SRS and RSSAbdul Aziz Jemain


4/277


November, 2010, Vol. 9, No. 2, 332-603 1538 9472/10/$95.00

vi

568 578 Sheikh Parvaiz Ahmad, Bayesian Analysis of Location-Scale Family

Aquil Ahmed, of Distributions Using S-Plus and R SoftwareAthar Ali Khan

579 583 Ozer Ozdemir, ANN Forecasting Models for ISE National-Atilla Aslanargun, 100 IndexSenay Asma

584 595 Shafiqah Alawadhi, Markov Chain Analysis and StudentMokhtar Konsowa Academic Progress: An Empirical

Comparative Study

Brief Report 596 598 L. V. Nandakishore Bayesian Analysis for Component

Manufacturing Process

Emerging Scholars599 603 Hamid Reza Kamali, Estimating the Non-Existent Mean and

Parisa Shahnazari- Variance of the F-Distribution by SimulationShahrezaei

JMASM is an independent print and electronic journal (http://www.jmasm.com/), publishing (1) new statistical tests or procedures, or the comparison of existing statistical testsor procedures, using computer-intensive Monte Carlo, bootstrap, jackknife, or resamplingmethods, (2) the study of nonparametric, robust, permutation, exact, and approximaterandomization methods, and (3) applications of computer programming, preferably in Fortran(all other programming environments are welcome), related to statistical algorithms, pseudo-random number generators, simulation techniques, and self-contained executable code to carryout new or interesting statistical methods.

Editorial Assistant: Julie M. SmithInternet Sponsor: Paula C. Wood , Dean, College of Education, Wayne State University


5/277

Journal of Modern Applied Statistical Methods Copyright 2010 JMASM, Inc. November 2010, Vol. 9, No. 2, 332-339 1538 9472/10/$95.00

332

INVITED DEBATE The Not-So-Quiet Revolution: Cautionary Comments on the Rejection of

Hypothesis Testing in Favor of a Causal Modeling Alternative

Daniel H. Robinson Joel R. LevinUniversity of Texas University of Arizona

Rodgers (2010) recently applauded a revolution involving the increased use of statistical modelingtechniques. It is argued that such use may have a downside, citing empirical evidence in educational

psychology that modeling techniques are often applied in cross-sectional, correlational studies to produceunjustified causal conclusions and prescriptive statements.

Key words: Modeling, hypothesis testing, SEM, HLM, causation.

Daniel H. Robinson is Professor of EducationalPsychology and editor of Educational

Psychology Review. His research interestsinclude educational technology innovations thatmay facilitate learning, team-based approachesto learning, and examining gender trendsconcerning authoring, reviewing, and editingarticles published in various educational journalsand societies. Email him at:

[email protected] R. Levin is Professor Emeritus at theUniversity of Arizona and the University of Wisconsin-Madison. He is former editor of the

Journal of Educational Psychology , and former Chief Editorial Advisor for journal publicationsof the American Psychological Association. Hisresearch interests include the design and

statistical analysis of educational research, aswell as cognitive-instructional strategies thatimprove students learning. Email him at:

[email protected].

IntroductionOver the years, we have found that Joseph

Rodgers (e.g., Rodgers, Cleveland, van denOord, & Rowe, 2000; Rodgers & Nicewander,1988) has something academically interesting,meaty, and instructive to say. Against that

backdrop, Rodgers most recent essay, provocatively titled The epistemology of mathematical and statistical modeling: A quietmethodological revolution (Rogers, 2010)merits close examination and extensive


6/277

ROBINSON & LEVIN

333

commentary. Rodgers appeared to have missedthe mark in two critical respects; both reflectedin the subtitle A quiet methodologicalrevolution, because as will become apparent inthe following discussion, the revolution isneither quiet nor methodological.

The Null Hypothesis HullabalooRodgers is correct in stating that serious

concerns about null hypothesis significancetesting (NHST) have been mounting over the

past several decades. Yet, as is well representedin Harlow, Mulaik, & Steigers (1997)impressive volume, NHST criticisms havehardly been expressed quietly, but rather withfull sound and fury. Moreover, in making hiscase, Rodgers provided a one-sided view of thecontroversy. Although several sources that indict

NHST were cited, short shrift was given toapproaches that have defended reasonable and

proper applications of statistical hypothesistesting, including, among others, decidingwhether a believed-random process is trulyrandom (e.g., Abelson, 1997), intelligenthypothesis testing (Levin, 1998a), equivalencetesting (e.g., Serlin & Lapsley, 1993), andhypothesis testing supplemented by effect-sizeestimation and/or confidence-intervalconstruction (Steiger, 2004).

In addition, numerous authors have

defended the use of NHST when mindfullyapplied (e.g., Frick, 1996; Hagen, 1997;Robinson & Levin, 1997; Wainer & Robinson,2003). Rodgers cited social-sciences statisticalsage Jacob Cohen (1994) as one who dismissed

NHST practices in his 1994 seminal article,The Earth is Round ( p < .05). Yet, in the samearticle, one could easily interpret Cohens (p.1001) comment about the nonexistence of magical alternatives to NHST as conceding thatfor whatever good NHST does, there are noadequate substitutes.

Rodgers (p. 2) described thefundamental difference between the Fisherianand Neyman-Pearson approaches, with the latter emphasiz[ing] the importance of the individualdecision. However, he characterized NHST as ahybrid and condemned it. Just because atechnique is often misused is not a sufficientreason to abandon it. For example, it is argued

below that in educational psychology we have

observed frequent misapplication of theRodgers favored causal modeling techniques. Inrecognizing that misapplication, however, our goal is not to deter researchers from adoptingmodeling techniques, but rather to encourageresearchers to apply such techniquesappropriately and to interpret wisely the resultsthat they pump out. (Back in the Neanderthalage of computers, grind out would have been amuch more fitting description.)

As researchers who have spent most of our careers conducting randomized experiments,we have sought to apply NHST judiciously,typically adopting or adapting Neyman-Pearsona priori Type I, Type II error, effect-size, andsample-size specification principles.Accordingly, we have found that in experimentsconducted with rationally (or better, optimally)

determined sample sizes - that is, sample sizesassociated with enough statistical power todetect nontrivial differences but with not toomuch power to detect trivial differences (see, for example, Levin, 1998b; and Walster & Cleary,1970) - NHST provides useful informationconcerning whether one has an experimentaleffect worth pursuing. In this context, pursuingmeans that obtaining a statistically significanteffect is followed by a sufficient number of independent replications until the researcher hasconfidence that the initially observed effect is a

statistically reliable one (see, for example, Levin& Robinson, 2003).

In that sense as well, we have regarded NHST primarily as a screening device, similar infunction to what Sir Ronald had in mind (e.g.,Fisher, 1935). Much of the hullabaloo about

NHST is caused by too many researchersfocusing on the results of a single study rather than on a series of studies that are part of a

program of research (Levin & Robinson, 2000).Fisher was never satisfied with an effectidentified in a single study, even if it had a p

value of less than 0.05! Instead, he believed thata treatment was only worth writing home aboutwhen it had consistently appeared in numerousexperiments. As is implied in the followingsection, whatever purported advantagesmodeling techniques have over NHST alsovanish unless researchers test a priori models inmultiple experiments.

Rodgers (p. 3) also condemned the


7/277

REJECTION OF HYPOTHESIS TESTING IN FAVOR OF CAUSAL MODELING

334

NHST jurisprudence model while aptly referringto Tukeys (1977) confirmatory data analysisstrategy as being judicial (or quasi-judicial) innature. Yet, Rodgers mischaracterized Tukeysexploratory data analysis strategy insofar as thedetective nature of that hypothesis-generatingapproach clearly is not jurisprudence. It is thisdetective role that one emphasizes when using

NHST simply as a research-based screening process to determine whether posited effectsexist. To us, convincing a jury of ones peersthat a prescription for practice should be basedon a single research study is rarely, if ever,

justified.Rodgers (p. 9) assertion that a

fundamental problem with NHST is one of testing valueless nil null hypotheses has beenadvanced by many critics. As researchers who

endeavor to use intelligent forms of hypothesistesting with experimental data, we regard the

problem of nil nulls not as a statistical issue butas a methodological one. Specifically, it makeslittle or no conceptual sense to apply NHSTwhen comparing an instructional treatment witha closet (Levin, 1994, p. 233) control group(i.e., a condition in which participants sit in adark room and do nothing), just as it is inane tocompute p-values for reliability correlations(see, for example, Thompson, 1996).Educational psychology is filled with such

examples of comparing new innovations withridiculous straw-person control conditions thatno sane researcher would ever consider using. Amore appropriate formulation of a nil null iswhen an investigator wishes to compare a newlydeveloped and previously untested experimentaltreatment with the best treatment that iscurrently available.

According to Rodgers, the [1999 task force assembled by the American PsychologicalAssociation] concluded that NHST was brokenin [a] certain respect (p. 3). Task-force member

Wainer and the present first author (Wainer &Robinson, 2003) provided a different view of thetask forces brief consideration of therecommendation to issue an outright ban on

NHST. As we have argued previously (e.g.,Levin & Robinson, 1999) and in our precedingdiscussion, adopting such an extreme stancewould be akin to calling for a ban on hammers

because hammerers were hammering their

fingers instead of nails (for additionaldiscussion, see Levin & Robinson, 2003). Eventhe outspoken NHST critic Rozeboom (1997)acknowledged via another tools analogy thatthe sharpest of scalpels can only create a messif misdirected by the hand that drives it, (p.335). Fortunately, in the case of the most recent(6 th) edition of the APA Publication Manual (American Psychological Association, 2010),the hypothesis-testing baby was not thrown outwith the bath water.

Causal Modeling TechniquesContemporary modeling techniques,

including structural equations modeling (SEM)and hierarchical linear modeling (HLM), amongothers, which emerge from atheoretical/conceptual framework, are

statistical/data-analytic and not methodologicalin nature. So, whence Rodgersmethodological revolution? Even he noted on

p. 8 that SEM has been built into a powerfulanalytic method and is a prototype of the firstapproach [a model-comparison framework] to

postrevolutionary modeling (p. 8).That a statistical modeling tail often

wags the methodological dog may havecontributed to what we consider a major misuseof causal modeling: researchers attempting tosqueeze causality out of observational or

correlational data. Because of the unfortunatecausal nomenclature, we fear that manyresearchers may be deluded into believing thatthe statistical control that such techniques

provide for correlational (non-experimental)data is on a par with the genuine experimentalcontrol of randomized experiments (Levin &ODonnell, 2000, p. 211). This in turn results incausal-model appliers issuing causal conclusionsthat they mistakenly believe are scientificallyvalid. As Cliff (1983) previously noted, Literalacceptance of the results of fitting causal

models to correlational data can lead toconclusions that are of questionable value (p.115).

In addition, because causal-modelresearchers conclusions typically flow fromrevised data-driven models rather than from a

priori theory-based model specifications, in theabsence of independent validations those causalconclusions present even more cause for


8/277

ROBINSON & LEVIN

335

concern. As with our previous hammers vs.hammerers distinction, Rodgers is well aware of researchers potential shoddy application of causal modeling techniques. Yet, he could havesent a stronger cautionary message to therelatively uninitiated model builder than hisinnocuous pronouncement that the success of SEM depends on the extent to which it is appliedin many research settings (p. 8).

To illustrate what we mean by prescriptive statements appearing in articles thatinclude statistical modeling techniques, we offer very recent examples that appeared in areputable educational psychology research

journal. To avoid redundancy, we offer only twosuch unjustified causal excerpts here, fromnumerous ones that we have encountered inmultiple teaching-and-learning research journals

that we have recently read or reviewed (seeRobinson, Levin, Thomas, Pituch, & Vaughn,2007, and the following section).

Ciani, Middleton, Summers and Sheldon(2010)s Study

The following summary appeared inCiani et al.s study abstract:

Multilevel modeling was used to teststudent perceptions of three contextual

buffers: classroom community, teachers

autonomy support, and a masteryclassroom goal structureResults

provide practitioners with tools for counteracting potential negativeimplications of emphasizing

performance in the classroom. (p. 88)

There was one predictor variable; one outcomevariable, a three-item scale that measuredstudents motivation to learn; and threemoderator variables, a three-item scale thatmeasured student perceptions of classroom

community, a four-item scale that measuredstudent perceptions of instructor autonomysupport, and a three-item scale that measuredstudent perceptions of the extent to which their teacher emphasizes developing competence inthe classroom. All measures were collected at asingle point in time and HLM was used toanalyze the data. Here are a couple causalconclusions from the discussion section:

However, it appears that comparingstudents achievement publicly, or usingthe work of the highest achievingstudents as an example for everyone,may not be so pernicious a practicewhen students in the classroom perceivea sense of community among their fellow classmates.

[O]ur findings demonstrate that if students feel respected by the teacher,such that their preferences and ways of doing things are acknowledged andaccommodated as much as possible,then a strong performance orientation onthe part of the teacher is not harmful.Autonomy support enables students tointernalize what they are doing, so that

they view their activity as importanteven if it is not enjoyable, or if it createsstress and pressure. Thus, it appears thatemphasizing competition betweenstudents is not necessarily underminingof student mastery goals, if the teacher can communicate and promote the

performance structure in a non-controlling way. These findings arereassuring, showing that performanceorientations are not necessarilycorrosive certainly an important

message, given the performancenecessities that all students face. (p. 95)

As with most of these articles based oncorrelational data and yet that offer prescriptiverecommendations, certain limitations of theresearch are explicitly acknowledged by theauthors:

The most significant limitation to thecurrent study is that all data reported arecorrelational.

Gathering data at one point in time alsocreates a limitation regarding the causalrelationships among the variables in thisstudy. (p. 96)

These limitations aside (or ignored?), the authors proceeded to offer the following prescriptive:


9/277


336

Our findings, along with other goaltheorists (e.g., Urdan & Midgley, 2003),suggest that given current prevailingattitudes and policy it may be morefruitful to emphasize adaptiveinstructional practices in the classroom,as opposed to trying to reducemaladaptive practices. (p. 97)

Thus, the authors made recommendations for practice (prescriptive statements) in theabsence of convincing evidence that such

practices are clearly causally related to studentoutcomes.

Chen, Wu, Kee, Lin & Shuis (2009) StudyChen et al. used SEM to analyze

relations among fear of failure, achievement

goals, and self-handicapping. Causal relationsamong the variables are implied in theDiscussion section:

This finding shows fear of failure as adistal determinant of self-handicappingand achievement goals (MAv and PAv)as proximal determinants of self-handicapping, demonstrating themotivational process of self-handicapping. (p. 302)

The authors revealed the perceived magicalquality of SEM allowing researchers to coaxcausality from correlational data:

Since SEM analysis examines manyvariables relationships simultaneously, werely on its results as the basis for our conclusions and discussion. (p. 303)

The Limitations section is predictable:

Although we used the SEM approach to

estimate the proposed model, the data inthe study are cross-sectional in natureand causal relations cannot be drawn.The longitudinal approach is preferredin order to ascertain the causal patternand to further clarify the chronic effectsof mastery-avoidance and performance-approach goals on achievement-relatedoutcomes. (p. 304)

In contrast, what follows are the grand prescriptives that appeared in the Implicationsand Conclusions:

We believe that the integrative modelcan help educators develop effectiveinterventions to reduce students self-handicapping, especially since we foundthat the mid-level achievement goals(MAv and PAv) mediate therelationships between fear of failure andself-handicapping it is suggested thatteachers use multiple indices to offer more opportunities for students to attainsuccess. In addition, teachers shouldencourage students to embrace amultiple goals perspective in whichdoing ones best and outperforming

others are not in conflict with eachother. (p. 304)

Rodgers (2010, p. 8) previously proffered caveataside, in both of the just-presented examples,cross-sectional (one time point), correlational(no variables were manipulated) data weretossed into a statistical modeling analysis andwhat popped out were causal conclusions.

Correlational Data and Causal ConclusionsOver the past few years, we have

examined empirical articles published in widelyread teaching- and-learning research journalsand have found that:

1. In one journal survey (Hsieh et al., 2005),the proportion of articles based onintervention and experimental (randomassignment) methodology had decreasedfrom 47% in 1983 to 23% in 2004.

2. In another journal survey (Robinson et al.,2007), the proportion of articles based on

intervention methods had decreased from45% in 1994 to 33% in 2004. Meanwhile,the proportion of nonintervention articlesthat contained prescriptive statementsincreased from 34% in 1994 to 43% in 2004.The proportion of nonintervention (non-experimental and correlational) articles thatincluded prescriptive statements (in the formof causally implied implications for


10/277

ROBINSON & LEVIN

337

educational practice) increased from 33% in1994 to 45% in 2004.

3. In a follow-up to the just-describedRobinson et al. (2007) survey (Shaw, Walls,Dacy, Levin & Robinson, 2010), althoughonly 19 nonintervention studies in 1994included prescriptive statements, thesestatements were repeated in 30 subsequentarticles that had cited the original 19.

For the present article, we examined thefirst two issues of the 1999 volume of the APA-

published journal, the Journal of Educational Psychology , and again for the 2009 volume. Welooked specifically at the comparative

proportions of articles based on correlationalmethods and those that involved interventions

(either randomized experimental or nonrandomized but researcher manipulated), aswell as the proportion of correlational methodsarticles in which prescriptive statements wereoffered. The results are summarized in Table 1.

Although roughly half of the articlesappearing in only one of the five journals thatwere part of Robinson et al.s (2007) study weresurveyed, the findings support the reportedtrends. Intervention studies (both randomizedand nonrandomized) are becoming increasinglyrare and instead researchers are basing their

recommendations for practice on weaker evidence. Moreover, it appears that statistical

and nonrandomized) are becoming increasinglyrare and instead researchers are nonrandomized)modeling techniques are becoming more popular - having increased from only 3% of thecorrelational research articles in 1999 to 40% in2009 - which may in turn contribute to theconcomitant 10-year increase in prescriptivestatements appearing in such articles.

Thus, we have witnessed widespreadapplication of SEM, HLM, and other sophisticated statistical procedures incorrelational data contexts, where causality issought but the critical conditions needed toattribute causality are missing (e.g., Marley &Levin, 2011; Robinson, 2010). Rodgers statesthat researchers who are scientistsshould befocusing on building a modelembedded withinwell-developed theory (p. 4-5). Here we agree

with former Institute for Educational ScienceDirector Grover Whitehurst who argued that - atleast in the field of education - we have enoughtheory development studies and need morestudies that address practical what worksquestions.

It is our fear that a research approachwhere the question, Does the data fit mymodel? is far more dangerous than thequestion, Is there anything here worth

pursuing? As we have seen, an affirmativeanswer to the former question seems to entitle a

researcher to form a model that indicates acausal relationship between, say, students self-efficacy and their achievement. The researcher then develops a self-efficacy scale that measures

Table 1: Summary of Selected Results of Surveyed Articles Appearing in the Journal of Educational Psychology (1999 and 2009) Based on Either Correlational or Intervention Methods

1999 2009

Type of Study Type of Study

Correlational Intervention Correlational Intervention

Number of Articles 18 (60%) 12 (40%) 23 (66%) 12 (34%)

Prescriptive Statements 9 (50%) ------ a 13 (57%) ------ a

Statistical Modeling 1 (3%) 0 (0%) 14 (40%) 2 (6%)

Prescriptive Statements 1 ------ a 7 (50%) ------ a

Note : This table includes preliminary data from a larger study recently completed by Reinhart, Haring,Levin, Patall, and Robinson (2011). a Not assessed in the present survey


11/277


338

students self-perceptions and also measuresachievement. The data may fit the model but inthe absence of convincing longitudinal data,ruling out alternative explanations, andindependent replications based on the previousnice-fitting model, this practice may lead todangerous causal conclusions. For the just-

presented self-efficacy example, it is just aslikely that high achievers feel better about their effectiveness as learners rather than the other way around. Apparently, many researchers

believe that it is entirely appropriate to applysuch modeling techniques and to interpret theresults as support for prescriptive statementsfounded on causality.

Conclusions About RevolutionsTo summarize, Rodgers (2010) has

written a cogent essay on the vices of statisticalhypothesis testing and the virtues of statisticalmodeling. We believe, however, that his essay

painted a somewhat distorted (and potentiallymisleading) portrait about those statistical arts.In particular, we take issue with two aspects of Rodgers so-called quiet methodologicalrevolution. For one aspect (rejecting statisticalhypothesis testing), we argue that the picture isneither as bleak nor as open and shut as Rodgers

portrayed. As supporting evidence, witness thesustained presence of hypothesis testing, along

with its more intelligent additions andadaptations, in various academic-researchdisciplines - including the research-and-

publication bible of both our very own field of psychology and virtually all social-sciencesdomains, the most recent edition of the APA

Publication Manual (American PsychologicalAssociation, 2010).

For the other aspect of Rodgers essaythat merits critical commentary (acceptingmodeling techniques), we argue that causalmodeling and other related multivariate and

multilevel data-analysis tools frequently causetheir users to think - in accord with Rodgersseductive subtitle - that the procedures aremethodological randomization-compensating

panaceas rather than techniques that do the bestthey can to provide some degree of statisticalcontrol in a multiply confounded variableworld. The unfortunate consequence of thatmethodological understanding, then, is that

when combined with researcher misapplicationof such modern modeling artillery, instead of

being on target with their data analyses andresearch conclusions, weapons are backfiringand researchers are ending up (whether knowingly or not) with a considerable amount of egg on their faces.

ReferencesAbelson, R. P. (1997). The surprising

longevity of flogged horses: Why there is a case for the significance test. Psychological Science , 8, 12-15.

American Psychological Association.(2010). Publication manual of the American

Psychological Association (6 th Ed.). Washington,DC: American Psychological Association.

Chen, L. H., Wu, C.-H., Kee, Y. H., Lin,M.-S., & Shui, S.-H. (2009). Fear of failure, 2 x 2achievement goal and self-handicapping: Anexamination of the hierarchical model of achievement motivation in physical education.Contemporary Educational Psychology , 34, 298-305.

Ciani, K. D., Middleton, M. J., Summers, J.J., & Sheldon, K. M. (2010). Buffering against

performance classroom goal structures: Theimportance of autonomy support and classroomcommunity. Contemporary Educational Psychology ,35, 88-99.

Cliff, N. (1983). Some cautions concerningthe application of causal modeling methods.

Multivariate Behavioral Research , 18, 115-126.

Cohen, J. (1994). The earth is round ( p 0 (1)

The parameter represents the mean number of events per unit time (e.g., the rate of arrivalsor the rate of failure). The exponential

distribution is supported on the interval [0, ).The mean (expected value) of an exponentiallydistributed random variable X with rate

parameter is given by:

1)( == X E (2)

In light of the examples given above, thismakes sense: if a person receives phone calls atan average rate of 2 per hour, then they canexpect to wait one-half hour for every call. Also,note that approximately 63% of the possiblevalues lie below the mean for any exponentialdistribution (Betteley, et al., 1994).

The median of an exponentiallydistributed random variable X with rate

parameter is given by:

69315.0)2(ln

== MD (3)

The maximum likelihood estimator (MLE) for

the rate parameter , given an independent andidentically distributed random sample of size n,

n X X X ,...,, 21 , drawn from the exponential

distribution, ( ) 1 Exp , is given by:

n

ii 1

n 1XX

=

= = . (4)

While this estimate is the most likelyreconstruction of the true parameter , it is onlyan estimate, and as such, the more data pointsavailable the better the estimate will be. Also,the MLEs are consistent estimators of their

parameters and are asymptotically efficient(Casella & Berger, 2002).

The Used EstimatorsThe sample mean, X , and the


136/277

ABU-SHAWIESH M.O.A

463

sample median, 1 MD , which are used in thisstudy for constructing the proposedmodified confidence intervals for theexponential distribution median, MD , arenow introduced.

The Sample Mean, X The sample mean is the most well

known example of a measure of location, or average. It is defined for a set of values as thesum of values divided by the number of valuesand is denoted by X . The sample mean for arandom sample of size n observations

n X X X ,...,, 21 can be defined as follows:

n

X X

n

i i== 1 (5)

The main advantages of the sample mean, X ,are: it is easy to compute, easy to understoodand takes all values into account. Its maindisadvantages are: it is influenced by outliers,can be considered unrepresentative of datawhere outliers occur because many values may

be well away from it and it requires all values inorder to calculate its value (Betteley, et al.,

1994; Francis, 1995).The Sample Median, MD 1

The sample median is perhaps the bestknown of the resistant location estimators. It isinsensitive to behavior in the tails of thedistribution. The sample median is defined for aset of values as the middle value when thevalues are arranged in order of magnitude and itis denoted herein by 1 MD . The sample medianfor a random sample of size n observations

n X X X ,...,, 21 can be defined as follows:

+=

+

+

evenisnif

X X

odd isnif X

MD nn

n

2

122

21

1

(6)

The main advantages of the samplemedian, MD 1, is that, it is easy to determine,requires only the middle values to calculate, can

be used when a distribution is skewed - as in thecase of the exponential distribution, is notaffected by outliers and has a maximal 50%

breakdown point. The main disadvantages of thesample median, MD 1, are that it is difficult tohandle in mathematical equations, it does notuse all available values and it can be misleadingin a distribution with a long tail because itdiscards so much information. The samplemedian, though, is considered as an alternativeaverage to the sample mean (Betteley, et al.,1994; Francis, 1995). However, the samplemedian, MD 1, has become as a good general

purpose estimator and is generally considered asan alternative average to the sample mean, X .

Estimating the Exponential Distribution Median:Two techniques are now introduced for

finding estimates, the method of sample medianand the method of maximum likelihood which isthe most widely used.

The Method of Sample MedianGiven a random sample of size n

observations, n X X X ,...,, 21 , the estimator of the exponential population median , MD , based

on the method of sample median, MD 1, isdenoted by MD MD1 . Now, from equation (3):

1 ln(2)MD ln(2)

MD= =

(8)

Thus, if the exponential population median MD in (8) is estimated by the sample median MD 1,results in the following approximation:

1

)2ln(

MD

= . (9)

Therefore, equating the results in (4) and (9) andsolving, the following approximation isobtained:


137/277

ADJUSTED CONFIDENCE INTERVAL FOR EXPONENTIAL DISTRIBUTION MEDIAN

464

)2ln()2ln( 1

111

MDn X

MD X

n n

iin

ii

==

=

(10)

The Maximum Likelihood Estimator of theMedianGiven a random sample of size n

observations, n X X X ,...,, 21 , the estimator of the exponential population median based on themaximum likelihood method is denoted herein

by MD MLE and can be defined as follows:

)2ln()2ln(1 1

n

X MD

n

ii

MLE ===

(11a)

where

n

ii 1

n 1XX

=

= = . (11b)

Comparing the Two Estimators of theExponential Distribution Median

It is known that the maximum likelihoodestimators are asymptotically unbiased andefficient. Concretely, the estimator MD MLE is

unbiased and2

2)2(ln)(

n MDVar MLE = .

Moreover, the sample median estimator, MD1, isasymptotically normal distributed withasymptotic variance

)(4

1)(

21 MDnf MDVar A = , where (.) f is

the corresponding density and MD is thetheoretical median (Vann deer Vaart, 1998).Asymptotically unbiased means that theaverage value over many random samplesfor the two estimators MD 1 or MD MLE is theexponential distribution median, MD . Tocompare the two estimators MD 1 and MD MLE interms of how far they are from the exponentialdistribution median ( MD ) on the average for many random samples, it is necessary tocompare their root mean square error, RMSE ,given as follows:

==

r

i MDi MDr

RMSE 1

2)(1

(12)

where r MD MD MD ,...,, 21 are the values of the

estimators MD 1 and MD MLE for r replications and MD is the value of the exponential distributiontrue median.

The Confidence Interval for the ExponentialDistribution Median

Next, the confidence interval for themedian of the exponential distribution, MD , isderived by modifying the confidence interval for the mean of the exponential distribution, . Let

n X X X ,...,, 21 be a random sample of size n from the exponential distribution with parameter , that is, ( ) 1~ Exp X , then the exact twosided )%1(100 confidence interval for theexponential distribution mean, , is given by(Trivedi, 2001):

=


138/277

ABU-SHAWIESH M.O.A

465

n n

i ii 1 i 1

2 2(2n , 2) (2n ,1 2)

n n

i ii 1 i 1

2 2(2n, 2) (2n,1 2)

2 X 2 XMD 1

P( )ln(2)

2ln(2) X 2ln(2) XP( MD )

1

= =

= =

< = <

= < <

=

(15)

The MD MLE confidence interval is exact. It is

based on the fact that=

n

1iiX2 follows a

2)2( n distribution. The coverage probability

must be exactly 95%.

The MD MD1 Confidence IntervalThis confidence interval is obtained by

substituting the result from (10) into equation(13) to give the exact )%1(100 confidenceinterval for the exponential distribution median,

MD , as follows:

1 12 2(2n , 2) (2n ,1 2)

1 12 2(2n , 2) (2n,1 2)

2(n MD ln(2)) 2(n MD ln(2))MDP( )

ln(2)

2n MD 2n MDP( MD )

1

< <

= < <

= (16)

The MD1 confidence interval is notexact. Its expression in (16) is based on equation(10) which is only an approximation of thestatistics In order to see that, the performance of the MD1 confidence interval is studied bycalculating the coverage probability, the averagewidth and the standard error using Monte-Carlosimulations. Actual, approximate and exactconfidence intervals based on the sample median

MD 1 can be also constructed using standardmethods.

The Monte Carlo Simulation StudyA Monte Carlo simulation was designed

to compare and study the behavior of the twoestimators MD 1 and MD MLE and investigate the

behavior of the proposed approximateconfidence intervals for the exponentialdistribution median, MD . FORTRAN programswere used to generate the data from theexponential distribution and run the simulationsand to make the necessary tables. Results arefrom the exponential distribution with parameter which was set to 1 and 0.5, to increaseskewness. The more the repetition, the moreaccurate are simulated results, therefore 10,000random samples of sizes n = 10, 15, 20, 30, 40,50 and 100 were generated.

Table (1) shows the simulated results for the root mean square error, RMSE , and theaverage of MD 1s and MD MLEs (AVG) toillustrate that both estimators are approximatelyunbiased for the true median of the exponentialdistribution, MD . The simulated results for thecoverage probability ( P ), average width (AW)and standard error (SE) of the exact confidenceinterval for the exponential mean, , and thetwo proposed approximate confidence intervalsfor the exponential distribution median, MD , areshown in tables (2-4). The criteria used toevaluate the exact and proposed approximateconfidence intervals is the value of the coverage

probability ( P ) and average width (AW); a goodmethod should have an observed coverage

probability ( P ) near to the nominal coverage probability and a small scaled average width(AW).

The simulation results in Table 1 showthat the maximum likelihood estimator of themedian, MD MLE , is a much better estimator for the population median of the exponentialdistribution, MD , than the sample median, MD 1.While both estimators are approximately

unbiased, the root mean square error, RMSE , for the sample median, MD 1, is larger than that of the maximum likelihood estimator of themedian, MD MLE . It should be noted that theaccuracy of the maximum likelihood estimator of the median, MD MLE , increases as the samplesize, n, increases which clearly provides a verygood estimator, even considering that thediscrepancy of these two estimators is very small


139/277


466

-that is, these two estimators asymptoticallycoincide.

As shown in Tables 2-4, the simulationresults show that the coverage probability ( P )for the confidence interval of the mean and theapproximate confidence interval of the median

based on the MLE method for the exponentialdistribution are the same and very close to thenominal confidence coefficient.

The approximate confidence interval of the median based on the sample median methodfor the exponential distribution provides thelower coverage probability ( P ) and gives the

lowest width among the three methods. Theaverage widths (AW) for the two proposedconfidence interval methods are approximatelythe same for moderate and large sample sizes.However, the estimated average width (AW) for the sample median method is the shortest amongall considered methods, but it has poor coverage

probability. Furthermore, as sample sizesincreases, the performance of the proposedconfidence interval based on the MLE methodimproves, but is still much lower than thenominal confidence.

Table 1: The Root Mean Square Error and Average of the Two Estimators for theExponential Distribution Median

SampleSize (n)

1= (True Median = 0.69315)

RMSE( MD 1) AVG( MD 1) RMSE( MD MLE ) AVG( MD MLE )

10 0.31575 0.74851 0.22369 0.69487

15 0.26754 0.72676 0.18040 0.69332

20 0.22459 0.72003 0.15715 0.69394

30 0.18276 0.71180 0.12808 0.69335

40 0.15907 0.70649 0.11106 0.69342

50 0.14108 0.70459 0.09887 0.69439100 0.10005 0.69894 0.06961 0.69265

SampleSize (n)

5.0= (True Median = 1.38629)

RMSE( MD 1) AVG( MD 1) RMSE( MD MLE ) AVG( MD MLE )

10 0.63150 1.49703 0.44739 1.38973

15 0.53509 1.45352 0.36080 1.38663

20 0.44918 1.44005 0.31430 1.38788

30 0.36553 1.42359 0.25615 1.38671

40 0.31814 1.41299 0.22211 1.38684

50 0.28215 1.40917 0.19773 1.38877

100 0.20010 1.39788 0.13922 1.38530


140/277

ABU-SHAWIESH M.O.A

467

Table 2: Coverage Probabilities, Average Width and Standard Error for the Confidence Intervalof the Mean of the Exponential Distribution

n

95.01 =

1= 5.0=

P AW SE P AW SE

10 94.55 1.504 0.484 94.55 3.007 0.968

15 94.84 1.148 0.299 94.84 2.297 0.59820 94.51 0.964 0.218 94.51 1.928 0.43730 94.58 0.762 0.141 94.58 1.524 0.282

40 94.94 0.650 0.104 94.94 1.299 0.20850 94.79 0.576 0.082 94.79 1.153 0.164

100 95.02 0.399 0.040 95.02 0.798 0.080

Table 3: Coverage Probabilities, Average Width and Standard Error for the Confidence Intervalof the Median of the Exponential Distribution with 1

=

n

95.01 = Confidence Interval Method

MDMLE Method MD MD1 Method

P AW SE P AW SE

10 94.55 1.123 0.336 85.11 1.042 0.46615 94.84 0.834 0.207 82.76 0.796 0.30520 94.51 0.693 0.151 83.92 0.668 0.21530 94.58 0.542 0.098 83.41 0.528 0.139

40 94.94 0.459 0.072 83.00 0.450 0.10350 94.79 0.405 0.057 83.28 0.400 0.081

100 95.02 0.279 0.028 82.95 0.277 0.040

Table 4: Coverage Probabilities, Average Width and Standard Error for the Confidence Intervalof the Median of the Exponential Distribution with 5.0=

n

95.01 = Confidence Interval Method

MDMLE Method MD MD1 Method

P AW SE P AW SE

10 94.55 2.246 0.671 85.11 2.085 0.93315 94.84 1.669 0.414 82.76 1.592 0.60920 94.51 1.387 0.303 83.92 1.337 0.43030 94.58 1.085 0.195 83.41 1.056 0.27740 94.94 0.918 0.144 83.00 0.901 0.20650 94.79 0.811 0.114 83.28 0.799 0.162

100 95.02 0.558 0.056 82.95 0.553 0.080


141/277


468

Numerical ExampleThis example is taken from Wilk, et al.

(1962); the data set represents the lifetimes (inweeks) of 34 transistors in an accelerated lifetest. The transistors were tested and the testcontinued until all of them failed. The lifetimesfor the 34 transistors (in weeks) were recordedas follows:

3, 4, 5, 6, 6, 7, 8, 8, 9,9, 9, 10, 10, 11, 11, 11, 13, 13,

13, 13, 13, 17, 17, 19, 19, 25, 29,33, 42, 42, 52, 52, 52, 52

The sample mean 912.18= X weeks, theexponential median MD 1 = 13 weeks, theexponential median MD2 = 13.108 weeks and the

skewness is 1.265695, which is highly skeweddistribution. Furthermore,

053.005287.0912.1811 ===

X and -

using the approximation derived earlier - results

in 053.005332.013

69315.0)2ln(1

=== MD

which indicates that the two values are veryclose and therefore the approximation is good.Based on Kibria (2006), the above data set areassumed to come from an exponential

distribution with mean 21= weeks; using theKolmogorov-Smirnov test, the K-S statistic =0.1603 and the p-value = 0.3125, it indicates thatthe sample data are from an exponentialdistribution with mean 21= weeks, andtherefore (from equation (3)) has a median MD = 14.556 weeks. The resulting 95% confidenceinterval and the corresponding confidence width

for the exact confidence interval for theexponential mean and the two proposed methodsfor the exponential median are calculated andgiven in table (5).

From table (5), it is observed that theexact confidence interval for the exponentialmean, as expected, covered the hypothesizedtrue population mean of 21= weeks and alsothe proposed confidence intervals for theexponential median, MD , covered thehypothesized true population median MD =14.556 weeks. However, the proposedconfidence interval for the exponential median,

MD , based on the sample median, MD1, provided the shortest confidence interval width.

ConclusionThe median - one of the most important and

popular measures for location - has many goodfeatures. The median confidence interval isuseful for one parameter families, such as theexponential distribution, and it may not need to

be adjusted if censored observations are present.The maximum likelihood estimation is a popular statistical method used to make inferences about

parameters of the underlying probabilitydistribution from a given data set. This study

proposed an approximate confidence interval for the median of the exponential distribution, MD ,

based on two estimators, the sample median, MD 1, and the maximum likelihood estimator of the median, MD MLE .

The results of this study show that usinga maximum likelihood estimator, MLE , for the

population median of the exponentialdistribution, MD , is better alternative to theclassical estimator based on the sample median,

MD 1. As shown by the study results, themaximum likelihood estimator of the median,

MD MLE , provides a good estimation for the population median of the exponentialdistribution, MD , and the proposed confidenceinterval based on this estimator had a goodcoverage probabilities compared to the samplemedian method. However, it produced slightlywider estimated width. It appears that the samplesize, n, has significant effect on the two

proposed confidence interval methods.Moreover, both of the proposed methods arecomputationally simpler. If scientists and

Table 5: The 95% Confidence Intervals for theLifetimes Data

ConfidenceInterval Method

95% ConfidenceInterval Width

Exact for Mean (13.874 , 27.308) 13.434

MD MLE (9.617 , 18.929) 9.312

MD MD1 (9.537 , 18.772) 9.235


142/277

ABU-SHAWIESH M.O.A

469

researchers are conservative about the smaller width, they might consider confidence interval

based on sample median method as a possibleinterval estimator for the population median of the exponential distribution, MD . Finally, theresults obtained from the simulation studycoincided with that of the numerical example.

AcknowledgementThe author would like to thank the HashemiteUniversity for their cooperation during the

preparation of this article.

ReferencesBalakrishnan, N., & Basu, A. P. (1996).

The exponential distribution: theory, methodsand applications , CRC press.

Bettely, G., Mettric, N., Sweeney, E., &Wilson, D. (1994). Using statistics in industry:Quality improvement through total processcontrol , (1 st Ed. ). London: Prentice HallInternational Ltd.

Casella, G., & Berger, R. L. (2002).Statistical inference . Pacific Grove, CA:Duxbury.

Daniel, W. W. (1990). Applied nonparametric statistics (2nd Ed. ). Duxbury,Canada: Thomson Learning.

Francis, A. (1995). Businessmathematics and statistics . London: DP

Publications, Ltd.Kececioglu, D. (2002). Reliability

engineering handbook . Lancaster, UK: DEStechPublications.

Kibria, B. M. G. (2006). Modifiedconfidence intervals for the mean of theasymmetric distribution. Pakestanian Journal of Statistics , 22 (2), 109-120.

Kinney, J. (1997). Probability: Anintroduction with statistical applications . NewYork: John Wiley and Sons, Inc.

Lewis, E. E. (1996). Introduction to

reliability engineering , (2nd

Ed. ). New York:John Wiley and Sons, Inc.

Maguire, B. A., Pearson, E. S., & Wynn,A. H. A. (1952). The time intervals betweenindustrial accidents. Biometrika , 39 , 168-180.

Montgomery, D. C. (2005). Introductionto statistical quality control (5th Ed. ). New York:John Wiley and Sons, Inc.

Patel, J. K., Kapadia, C. H., & Owen, D.B. (1976). Handbook of statistical distributions .

New York: Marcel Dekker.Sinha, A., & Bhattacharjee, S. (2004).

Test of parameter of an exponential distributionin predictive approach. Pakestanian Journal of Statistics , 20(3), 409-414.

Smithson, M. (2001). Correctconfidence intervals for various regression effectsizes and parameters: The importance of noncentral distributions in computing intervals.

Educational and Psychological Measurement ,61 , 605-532.

Trivedi, K. S. (2001). Probability and statistics with reliability, queuing, and computer science applications (2nd Ed. ). New York: JohnWiley and Sons, Inc.

Van der Vaart, A.W., (1998). Asymptotic Statistics, 1-496. CambridgeUniversity Press, Cambridge.

Wilk, M. B., Gnanadesikan, R., &Huyett, M. J. (1962). Estimation of parametersof the gamma distribution using order statistics.

Biometrika , 49 , 525-545


143/277


470

Nonlinear Trigonometric Transformation Time Series Modeling

K. A. Bashiru O. E. Olowofeso S. A. OwabumoyeOsun State University,

Osogbo, Osun StateFederal University Of Technology,

Akure Ondo State, Nigeria

The nonlinear trigonometric transformation and augmented nonlinear trigonometric transformation with a polynomial of order two was examined. The two models were tested and compared using daily meantemperatures for 6 major towns in Nigeria with different rates of missing values. The results were used todetermine the consistency and efficiency of the models formulated.

Key words: Nonlinear time series, polynomial, consistency, efficiency, missing value, model andforecasting.

Introduction

Time series analysis is an important techniqueused in many disciplines, including physics,engineering, finance, economics, meteorology,

biology, medicine, hydrology, oceanography andgeomorphology (Terasvirta & Anderson, 1992).This technique is primarily used to infer

properties of a system by the analysis of ameasured time record (data) (Priestley, 1988);this is accomplished by fitting a representativemodel to the data with an aim of discovering theunderlying structure as closely as possible.

Traditional time series analysis is based

on assumptions of linearity and stationarity.However, time series analysis (Box & Jenkins,1970; Brock & Potter, 1993) has nonlinear features such as cycles, asymmetries, bursts,

K. A. Bashiru is a lecturer in the Department of Mathematical and Physical Sciences. His areasof interest are Geostatistics and Time seriesEconometrics. Email him at:[email protected]. O. E.

Olowofeso is an Associate Professor of Statisticsin the Mathematical Sciences Department. He isalso currently working in the Central Bank of

Nigeria, Abuja. Email: [email protected]. A. Owabumoye is a Masters student under the supervision of O. E Olowofeso in theMathematical Sciences Department. Email:[email protected].

jumps, chaos, thresholds and heteroscedasticity,

and mixtures of these must also be taken intoaccount. Thus, a problem arises regarding asuitable definition of a nonlinear model becausenot every time series analysis is purely linear:the nonlinear class clearly encompasses a largenumber of possible choices. For these reasons,non-linear time series analysis is a rapidlydeveloping area and there have been major developments in model building and forecasting(De Gooijer & Kumar, 1992).

The growing interest in studyingnonlinear and non-stationary time series models

in many practical problems stems from theinherently non-linear nature of many phenomenain physics, engineering, meteorology, medicine,hydrology, oceanography, economics andfinance, that is, many real world problems donot satisfy the assumptions of linearity and/or stationarity (Bates & Watts, 1988; DeGooijer &Kumar, 1992; Sugihara & May, 1990).Therefore, for many real time series data,nonlinear models are more appropriate thanlinear models for accurately describing thedynamic of the series and making multi-step-

ahead forecast (Tsay, 1986; Barnett, Powell &Tauchen, 1991; Olowofeso, 2006). For example,financial markets and trends are influenced byclimatic factors like daily temperature, amountof rainfall and intensity of sun, these are areaswhere a need exists to explain behaviors that arefar from being even approximately linear.

Nonlinear models would be more appropriate for forecasting and accurately describing returns and


144/277

BASHIRU, OLOWOFESO & OWABUMOYE

471

volatility. Thus, the need for the further development of the theory and applications for nonlinear models is essential, and, because thereare an enormous number of nonlinear modelsavailable for modeling and forecasting economictime series, research should help provideguidance for choosing the best model for a

particular application (Robinson, 1983).

MethodologyThe model proposed by Gallant (1981) calledthe Augmented Nonlinear Parametric TimeSeries Model (ANPTSM) was used in this studyand a second model was formulated based on theLeast Square Method Modified Nonlinear Trigonometric Transformation Time SeriesModel (MNTTTSM).

DataData used in this study were daily mean

of temperatures from 1987 to 1996 for Ikeja,Ibadan, Ilorin, Minna and Zaria. The data werecollected from the Meteorological Centre-Oshodi Lagos.

Model FormulationConsider the format shown in Table 1.

In this model, up to 9 years were considered andthe model is formulated based on the data asshown in Table 2.

Assumption and Notation for the ModelsLet:

X t,i,k = value of occurrence for day t of Month iin the year k;

X t,k = mean occurrence for day t of year k;X*i,k = mean occurrence for month i of year k;X*yK = overall yearly mean for the sampled

month;X*mi = overall monthly mean for the sampled

year;t = the position of the day from the first day of

the Month. 1 t 31;ti = the sum of days in month i for 1 i 12;tik = the sum of days from the initial sampled

month of initial sampled year to month iof year k;

ti* = the sum of days from the initial sampledmonth to month I;

k = the position of a particular year from aninitial sample year for k ;

n = the number of sampled years;m = the number of sampled months; andX* = Grand Mean occurrence for k year(s)

examined.The first model was reviewed based on

the assumption that the sum of the occurrenceswere presented monthly, where i th monthrepresents the month i for 1 i 12 which is to

be modeled using the number of days in eachmonth (see Table 3).

Table 1: Model Formulation for a Particular Year

t/i 1 2 3 4 5 6 7 8 9 10 11 12 xi/12

1 x1,1,k x1,2,k x1,3,k x1,4,k x1,5,k x1,6,k x1,7,k x1,8,k x1,9,k x1,10,k x1,11,k x1,12,k X 1

2 x2,1,k x2,2,k x2,3,k x2,4,k x2,5,k x2,6,k x2,7,k x2,8,k x2,9,k x2,10,k x2,11,k x2,12,k X 2

t x t,1,k xt,2,k xt,3,k xt,4,k xt,5,k xt,6,k xt,7,k xt,8,k xt,9,k xt,10,k xt,11,k xt,12,k X t *

1,k x*2,k x*3,k x*4,k x*5,k x*6,k x*7,k x*8,k x*9,k x*10,k x*11,k x*12,k x*i,k /12

Table 2: Model Data Formulationt/ik 1 2 3 i k xi/ik

1 x1,1,1 x1,2,1 x1,3,1 x 1,i,k X 1,k

2 x2,1,1 x2,2,1 x2,3,1 x 2,i,k X 2,k

t x t,1,1 xt,2,1 xt,3,1 x t,i,k X t,k

x*1,k x*2,k x*3,k x *i,k x*i,k /ik= X


145/277

NONLINEAR TRIGONOMETRIC TRANSFORMATION TIME SERIES MODELING

472

Augmented Nonlinear Parametric Time SeriesModel (ANPTSM)

Trigonometric (sine and cosine)transformation augmented with polynomial of order two was applied to formulate the modelacross the year, that is, the monthly meansample and the least square methods were usedfor estimating the models parameters asfollows. Let the equation be of the form

( ) ( )2t ,i,k 1 2 ik 3 ik iX a a tSin t a t Cos t1 i 12

= + + +

(3.0)

The expected value of X t,i,k is X *i,k then theequation can be reformed as below to estimate

the parameters; a 1, a2 and a 3 using Least SquareMethod.

( ) ( )* 2i,k 1 2 i ik 3 i ik iX a a t Sin t a t Cos t1 i 12

= + + +

(3.1)

( ) ( )( )* 2i i,k 1 2 i ik 3 i ik X a a t Sin t a t Cos t = + +

(3.2)Let i2 = S

( ) ( )( )* 2 2i,k 1 2 i ik 3 i ik S (X a a t Sin t a t Cos t )= + + (3.3)

Differentiating 3.3 with respect to a 1, a2, a3, a3,as

01

aS

results in

( ) ( )* 2i,k 1 2 i ik 3 i ik X ma a t Sin t a t Cos t = + + (3.4)

where m is the number of the monthly samplemean examined. Similarly, as

02

aS

then

( ) ( )* 2 2i ik i,k 1 i ik 2 i ik 3

3 i ik ik

t Sin t X a t Sin t a t Sin (t )

a t Sin(t )Cos(t )

= +

+

(3.5)

and as0

3

aS

then

( ) ( )( ) ( )

( )

2 * 21i ik i,k i ik

32 i ik ik

4 23 i ik

t Cos t X a t Cos t

a t Sin t Cos t

a t Cos t

=

+

+

(3.6)

Simultaneously solving equations 3.4, 3.5 and3.6 using Cramers Rule results in equations 3.7-3.10.

Table 3: Months and Sums of Occurrences ModeledJan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan

i 1 2 3 4 5 6 7 8 9 10 11 12 13 i k

t 31 281/4 31 30 31 30 31 31 30 31 30 31 31 t i

ti 31 591/4 90

1/4 1201/4 151

1/4 1811/4 212

1/4 2431/4 273

1/4 3041/4 334

1/4 3651/4 396

1/4 t


146/277


473

Therefore, from equations 3.7, 3.8, 3.9 and 3.10,the following result:

a1 =0

1

(3.11)

a2 =0

2

(3.12)

a3 =0

3

(3.13)

Next, substituting 3.11, 3.12 and 3.13 into 3.1gives:

* 231 2i,k ik ik

0 0 0

X tSin(t ) t Cos(t ).

= + +

(3.14)

Because X *i,k is the expected value of X t,i,k ,equation 3.14 can be rewritten as

231 2t,i,k ik ik

0 0 0X tSin(t ) t Cos(t ).

= + +

(3.15)

Models 3.14 and 3.15 would only be visible provided there is an occurrence within a monthof any sampled year.

Modified Nonlinear TrigonometricTransformation Time Series Model(MNTTTSM)

In a situation where a whole month of

data is missing, the above model may bedifficult to apply and a different model would beneeded. The model for such occurrence isformulated as follows. If the data in 3.2 arereformed such that the monthly means are thoseshown in Table 4. Consider:

( )*i,k i iX a bsin t *= + + (3.16)

( ) ( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )

2 2 4 2 3 20 i ik i ik i ik ik

4 2 2 3i ik i ik i ik i ik i ik ik

2 3 2 2 2i ik i ik i ik i i i ik

m{ t Sin t t Cos t ( t Sin t Cos t ) }

t Sin t { t Sin t t Cos t t Cos t t Sin t Cos t }

t Cos t { t Sin t t Sin t Cos t t Cos t t Sin t }

=

+

(3.7)

( ) ( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )

* 2 2 4 2 3 21 i,k i ik i ik i ik ik

* 4 2 2 * 3i ik i ik i,k ik i ik i,k i ik ik

2 * 3 2 * 2 2i ik i ik i,k i ik ik i ik i,k i ik

X { t Sin t t Cos t ( t Sin t Cos t ) }

t Sin t { t Sin t X t Cos t t Cos t X t Sin t Cos t }

t Cos t { t Sin t X t Sin t Cos t t Cos t X t Sin t }

=

+

(3.8)

( ) ( ) ( ) ( ) ( )( ) ( ) ( ) ( ) ( )

( ) ( ) ( ) ( ) ( )

* 4 2 2 * 32 i ik i,k i ik i ik i,k i ik ik

* 4 2 2 3i,k i ik i ik i ik i ik ik

2 2 * 2 *i ik i ik i ik i,k i ik i ik i,k

m{ t Sin t X t Cos t ( t Cos t X t Sin t Cos t }

X { t Sin t t Cos t t Cos t t Sin t Cos t }

t Cos t { t Sin t t Cos t X t Cos t t Sin t X }

=

+

(3.9)

( ) ( ) ( ) ( ) ( )( ) ( ) ( ) ( ) ( )

( ) ( ) ( ) ( ) ( )

2 2 2 * 3 *3 i ik i ik i,k i ik i i ik i,k

2 * 2 *i ik i ik i ik i,k i ik i ik i,k

* 3 2 2 2i,k i ik i ik ik i ik i ik

m{ t Sin t t Cos t X ( t Sin t Cos t t Sin t X }

t Sin t { t Sin t t Cos t X t Cos t t Sin t X }

X { t Sin t t Sin t Cos t t Cos t t Sin t

=

+

(3.10)


147/277


474

where

i1

1 i 12 , 1 t 3654

If the expected value of X *i,k is X *mi, then

equation 3.16 can take the form

( )*m i i iX a b sin t *= + + (3.17)where

i1

1 i 12 , 1 t * 3654

An ordinary least square method was used inestimating the parameters a and b. If S m = i2 = (X*mi -(a+bsin(t i*))2, then differentiating withrespect to a and b

( )( )*m *m i iS 2 X a bsin ta

= +

as

0

a

S m

( )*m *i i X 12a b sin t = + (3.18)

Also,

( ) ( )( )( )* *m *m i i iS 2 (sin t X a bsin t b

= +

as

0

bS m

( ) ( ) ( )*m * 2 *i i i isin t * X a sin t b sin t = + (3.19)

Using Cramers Rule to solve equations 3.18and 3.19 simultaneously, results in:

( )22 * *

4 i i

*m 2 * * *m *5 i i i i i

* *m *m *6 i i i i

12 Sin (t ) Sin(t )

X Sin (t ) Sin(t )X Sin(t )

12 Sin(t )X X Sin(t )

=

=

=

where parameters

( )

*m 2 * * *m *5 i i i i i

22 * *4 i i

X Sin (t ) Sin(t )X Sin(t )a

12 Sin (t ) Sin(t )

= =

(3.20)

and

( )

* *m *m *6 i i i i

22 * *4 i i

12 Sin(t )X X Sin(t ) b

12 Sin (t ) Sin(t )

= =

(3.21)

Therefore, the model for monthly occurrence is

Table 4: Modified Nonlinear Trigonometric Transformation Time Series Model Data

t/i 1 2 3 4 5 6 7 8 9 10 11 12 xi/12

1 X*1,1 X*2,1 X*3,1 X*4,1 X*5,1 X*6,1 X*7,1 X*8,1 X*9,1 X*10,1 X*11,1 X*12,1 X*y1

2X*

1,2 X*

2,2 X*

3,2 X*

4,2 X*

5,2 X*

6,2 X*

7,2 X*

8,2 X*

9,2 X*

10,2 X*

11,2 X*

12,2 X*y

2

k X*1,k X*2,k X*3,k X*4,k X*5,k X*6,k X*7,k X*8,k X*9,k X*10,k X*11,k X*12,k X*yk

x*m1 x*m2 x*m3 x*m4 x*m5 x*m6 x*m7 x*m8 x*m9 x*m10 x*m11 x*m x*m,i = X*12


148/277


475

* *5 6

4 4

( )mi i X Sin t

= +

(3.22)

Because X *mi is an expected value for X *i,k thenequation 3.22 can be rewritten as

)( *

4

6

4

5,

*ik i t Sin X

+

= (3.23)

Similarly, along the sampled year X *yK = c + dSin( k) for - k + , 15 75. The must

be chosen such that i = 0, i2 is as small as possible.

If S y = i2 = (X*yK -(c+dsin( k ))2 then

y *y

K

S2 (X (c dsin( k))

c

= +

as

0

c

S y

*yK X nc d sin( k) = + (3.24)

Also,

y *yK

S2 (sin( k) X (c dsin( k)))

d

= +

as

0

d

S y

*y 2K sin( k) X c sin( k) d sin ( k) = +

(3.25)

Solving equations 3.24 and 3.25 simultaneously

using Cramers Rule results in2 2

7

*y 2 *y8 k k

*y *y9 k k

n Sin ( k) ( Sin( k))

X Sin ( k) Sin( k)X Sin( k)

n Sin( k)X X Sin( k)

=

=

=

Where the parameters

*y 2 *y8 k k

2 27

X Sin ( k) Sin( k)X Sin( k)c


= =

(3.26)and

*y *y9 k k

2 27

n Sin( k)X X Sin( k)d


= =

(3.27)

*y 8 9k

7 7

X Sin( k)

= +

(3.28)

The method of placing expected

occurrences in a contingency table of a Chi-square was applied using equations 3.23 and3.28 to obtain the model to find the dailyoccurrences for a particular month of a particular year. Therefore, the model for expected dailyoccurrences is

k y

k y

im

k it X X X n

X ***

,,))((

= (3.29)

Substituting 3.23 and 3.28 into 3.29, results in

+

+

+

=

)(

)()(

7

9

7

8

7

9

7

8*

4

6

4

5

,,

k Sin

k Sint Sinn

X i

k it

(3.30)

ResultsModel Analysis and Discussion

The data on the daily mean temperaturefor Ikeja, Ibadan, Ilorin, Minna and Zariacollected from the Meteorological Centre-Oshodi Lagos were used. The parameters of themodels were estimated and the fitted models for each zone are shown in Table 5 for Ikeja,Ibadan, Ilorin and Minna for ANPTSM. Data for the daily mean temperature was used to estimatethe parameters. The fitted model for Zaria could


149/277


476

not be formulated due to the fact that manymonths of data were missing.

Table 6 shows the fitted models for

Ikeja, Ibadan,Ilorin, Minna and Zaria for MNTTTSM using the daily mean temperaturedata to estimate their parameters. The fittedmodel for Zaria was formulated becauseMNTTTSM has the strength of addressing the

problem of missing values. Thus, although manymonths data were missing from Zarias dailymean temperature, MNTTTSM parameterscould still be estimated. This is one of the

advantages of MNTTTSM over ANPTSM.Table 7 shows that the results of the

Pearson Product Moment Correlation

coefficients and Spearman Browns rank Order Correlation coefficients for Ikeja, Ibadan, Ilorinand Minna are highly and positively correlated,indicating a strong relationship between theactual data and estimated data for the daily meantemperature. In Zaria the correlation coefficientfor MNTTTSM is positive but low which mayindicate a weak relationship between the actualand estimated daily mean temperatures.

Table 5: The Fitted Models for ANPTSMZones Augmented Nonlinear Parametric Time Series Model (ANPTSM)

IKEJA ( ) ( )226.88642582 0.047971536tSin 0.000143793 Cost t tik ik + IBADAN ( ) 2ik ik 26.36612286 0.054847742tSin 0.0000344912t Cost t+ ILORIN ( ) ( )t t ik Cos

2

ik 000833551.04tSin0.0481158726.2476883 t +

MINNA ( ) ( )t t t ik ik CostSin2

00073.0062853.072428.25 +

ZARIA

Table 6: The Fitted Models for MNTTTSMZones Modified Nonlinear Trigonometric Transformation Time Series Model (MNTTTSM)

IKEJA

=+

++10

1

*

)601311.088996.26(

)6013116.088996.26)(420072.187226.26(10

K

i

k Sin

k SinSin t

IBADAN

=

+

++10

1

*

)901311.036761.26(

)9013535.036761.26)(591834.136749.26(10

K

i

k Sin

k SinSin t

ILORIN

=

+

++10

1

*

)45409024.040106.26(

)45409024.040106.26)(816182.145708.26(10

K

i

k Sin

k SinSin t

MINNA=

+

++

10

1

*

)90112148.067047.27(

)90112148.067047.27)(508736.256143.27(10

K

i

k Sin

k SinSin t

ZARIA

=

+

++10

1

*

)90222282.000445.25(

)90222282.000445.25)(210108.198532.24(10

K

i

k Sin

k SinSin t


150/277


477

Apart from Ibadan, in which thecorrelation coefficient in ANPTSM is greater than MNTTTSM and Ikeja which has equalcorrelation coefficients, all other Zones, thecorrelation coefficient in MNTTTSM is greater than ANPTSM. This indicates that MNTTTSMshows a stronger relationship between the actualand estimated values than does ANPTSM.Although the relationship between actual andestimated values of MNTTTSM in Zaria is weak

but positive, that of ANPTSM could not beestimated due to the large number of missingvalues in the data. Also, all of the correlations

are significant at the 0.01 level (2-tailed).As shown in Table 8, the mean of the

actual and estimated values for each zones of allmodels are almost equal; differences are due toapproximation (truncation error) duringcalculation. Also, the mean of the actual andestimated values of MNTTTSM are closer thanthose of ANPTSM, which implies thatMNTTTSM estimates better than ANPTSM. Itwas also discovered from results in Table 8 thatthe more missing values in the data, the weaker the ANPTSM is in estimating, while inMNTTTSM, the model maintains its precision.

Table 7: Correlation Coefficients

Zones TypesANPTSM MNTTTSM

Coefficients Sig. Coefficients Sig.

IKEJAPearsons r 0.607 .000 0.607 .000

Spearmans Rho 0.620 .000 0.620 .000

IBADANPearsons r 0.594 .000 0.575 .000

Spearmans Rho 0.622 .000 0.584 .000

ILORINPearsons r 0.503 .000 0.589 .000

Spearmans Rho 0.560 .000 0.612 .000

MINNAPearsons r 0.596 .000 0.676 .000

Spearmans Rho 0.656 .000 0.686 .000

ZARIAPearsons r - - 0.419 .000

Spearmans Rho - - 0.445 .000

Table 8: Comparison of ANPTSM and MNTTSM Means

Zones NANPTSM MNTTTSM

Actual Estimated Actual Estimated

IKEJA 3,660 26.9077 26.8759 26.9077 26.8759

IBADAN 3,601 26.3749 26.3756 26.3749 26.3791

ILORIN 3,580 26.4558 26.2443 26.4558 26.4593

MINNA 3,362 27.5489 26.3611 27.5489 27.5559

ZARIA 3,588 - - 25.0514 25.0172


151/277


478

Table 9 shows that the standarddeviations for MNTTTSM are less than those of ANPTSM which indicates that MNTTTSM is

better in estimating and forecasting thanANPTSM. Similarly, apart from the standarderror of ANPTSM and MNTTTSM of Ikeja,which are equal, it may be observed that thestandard errors for MNTTTSM were alsosmaller than those of ANPTSM, which indicatesthat MNTTTSM is better in estimating andforecasting than ANPTSM for time series datawith missing values.

Table 10 shows that at Ikeja, there is a95% chance that the differences between theactual and estimated daily mean temperaturewould lie between -0.00749 and 0.07119 inANPTSM and -0.00748 and 0.07118 inMNTTTSM. Similarly, at Ibadan; -0.0462 and0.04473 in ANPTSM and -0.0496 and 0.04114

in MNTTTSM, at Ilorin; 0.1505 and 0.2723 inANPTSM and -0.0592 and 0.05218 inMNTTTSM, at Minna; 1.1155 and 1.2601 inmodel I and -0.0689 and 0.05482 in MNTTTSMwhile in Zaria is between -0.0546 and 0.12310.

It was also discovered that the range of the confidence interval for MNTTTSM is lessthan that of ANPTSM for Ikeja and Ibadan. InIlorin and Minna, the lower confidence intervalsof differences for ANPTSM are positive whichindicates a 95% chance that the differences

between their actual and estimated dailytemperature (actual estimate) are positivewhile those of MNTTTSM are not. This impliesthat the estimated daily temperatures for ANPTSM at Ilorin and Minna were under-estimated. Hence MNTTTSM is better inestimating and forecasting than ANPTSM whenthere are missing values in the time series.

Table 9: Comparison of ANPTSM and MNTTSMS Standard Deviation andStandard Error of Differences

ZonesANPTSM MNTTTSM

Std. Dev. Std. Error of the Mean Std. Dev.Std. Error of

the MeanIKEJA 1.2138 0.02006 1.2137 0.02006

IBADAN 1.3913 0.02319 1.3882 0.02313

ILORIN 1.8585 0.03106 1.6996 0.02841MINNA 2.1381 0.03688 1.8293 0.03155

ZARIA - - 2.7152 0.04533

Table 10: Comparison of ANPTSM and MNTTTSMs 95 % Confidence Interval of the Difference

ZonesANPTSM MNTTTSM

Lower Upper Lower Upper

IKEJA -0.00749 0.07119 -0.00748 0.07118

IBADAN -0.0462 0.04473 -0.0496 0.04114

ILORIN 0.1505 0.2723 -0.0592 0.05218

MINNA 1.1155 1.2601 -0.0689 0.05482

ZARIA - - -0.0546 0.12310


152/277


479

ConclusionThe two models tested in this study were theAugmented Nonlinear Parametric Time SeriesModel (ANPTSM) and the Modified Nonlinear Trigonometric Transformation Time SeriesModel (MNTTTSM). Both models were testedusing daily mean temperatures at Ikeja, Ibadan,Ilorin, Minna and Zaria, and the results wereanalyzed. It was discovered that ANPTSM could

be used in forecasting provided the data ishaving few missing values. However MNTTTSM estimates forecasts better thanANPTSM in estimating missing values andforecasting. Based on results of this study,MNTTTSM is more efficient in estimatingmissing values and forecasts better thanANPTSM.

The beauty of a good model developedfor nonlinear time series modeling is the abilityto forecast better, the new method MNTTTTSMis therefore recommended for numericalsolutions for a nonlinear model with missingvalues due to its higher capacity to addressmissing values. It was also noted that themathematical derivative of MNTTTSM issimpler than ANPTSM which did not forecast

better. Further research could be conducted by placing a condition in which data having a year or more of missing values is taken intoconsideration.

ReferencesBarnett, W. A., Powell, J., & Tauchen,

G. E. (1991). Nonparametric and semi parametric methods in econometrics and statistics . Proceedings of the 5 th InternationalSymposium in Economic Theory andEconometrics. Cambridge: CambridgeUniversity Press.

Bates, D. M., & Watts, D. G. (1988). Nonlinear regression analysis and itsapplications . New York: Wiley.

Box, G. E. P., Jenkins, G. M. (1970).Time series analysis, forecasting and control .San Francisco: Holden-Day.

Brock, W. A., Potter, S. M. (1993). Nonlinear time series and macroeconometrics.In: G.S. Maddala, C. R. Rao & H. R. Vinod,Eds., Handbook of Statistics , Vol. 11 , 195-229.Amsterdam: North-Holland.

De Gooijer, J. G., & Kumar, K. (1992).Some recent developments in non-linear timeseries modeling, testing and forecasting.

International Journal of Forecasting , 8, 135-156.

Friedman, J. H., & Stuetzle, W. (1981).Projection pursuit regression. Journal of the

American Statistical Association , 76 , 817-823.Gallant, A. R. (1987). Nonlinear

statistical models . New York: Wiley.Gallant, A. R., & Tauchen, G. (1989).

Semi nonparametric estimation of conditionallyconstrained heterogeneous processes: asset

pricing application. Econometrica , 57 , 1091-1120.

Granger, C. W. J., & Hallman, J. J.(1991a). Nonlinear transformations integratedtime series. Journal of Time Series Analysis , 12 ,207-224.

Olowofeso, O. E. (2006). Varying co-efficient regression models for non-linear time

series: Estimation and testing .13 th Meeting of the Forum for Interdisciplinary Mathematicaland Statistical Techniques.

Priestley, M. (1988). Non-linear and

non-stationary time series analysis . London:Academic Press.

Robinson, P. M. (1983). Non-parametricestimation for time series models. Journal of Time Series Analysis , 4, 185-208.

Sugihara, G., & May, R. M. (1990). Nonlinear forecasting as a way of distinguishingchaos from measurement error in time series.

Nature , 344 , 734-741.Terasvirta, T., & Anderson, H. M.

(1992). Modelling nonlinearities in businesscycles using smooth transition autoregressive

models. Journal of Applied Econometrics , 7 ,S119-S136.

Tsay, R. S. (1986). Nonlinearity tests for time series. Biometrika , 73 , 461-466.


153/277


480

Incidence and Prevalence for A Triply Censored Data

Hilmi F. KittaniThe Hashemite University,

Jordan

The model introduced for the natural history of a progressive disease has four disease states which areexpressed as a joint distribution of three survival random variables. Covariates are included in the modelusing Coxs proportional hazards model with necessary assumptions needed. Effects of the covariates areestimated and tested. Formulas for incidence in the preclinical, clinical and death states are obtained, and

prevalence formulas are obtained for the preclinical and clinical states. Estimates of the sojourn times inthe preclinical and clinical states are obtained.

Key words: Progressive disease model, prevalence, incidence, trivariate hazard function, censored data, proportional hazards model, sojourn times, chronic habitu.

IntroductionLouis, et al. (1978) introduced a natural historymodel for a progressive disease in a set of threearticles: Albert, Gertman and Louis (1978),Albert, Gertman, Louis and Liu(1978) andLouis, Albert and Heghinian (1978). This modelwas extended by Kittani (1995a). Clayton (1978)also developed a model for association for the

bivariate case and Oakes (1982) made inferencesabout the association parameter in Claytonsmodel. Clayton and Cuzick (1985) introduced

the bivariate survival function for two failuretimes and made inferences about the association parameter, . Kittani (1995a, 1996, 1997, 1997-1998) considered the model for the bivariatecase that is, a case with two failure times (X,T) by including covariates and by using Coxs

proportional hazards model.The motivation for this research lies in

the fact that it is necessary to identify a threedimensional survival function for three failuretimes (X, Y, D) with four disease states (diseasefree state, preclinical state, clinical state and

death state). In the model, X is the age uponentering the preclinical state (tumor onset or first

Hilmi Kittani is a Professor of Statistics in theCollege of Science, Department of Mathematics.Email: [email protected].

heart attack), T is the age when entering theclinical state (symptoms first appear or secondheart attack) and D is the age upon entering thedeath state (dying of cancer or acute myocardialinfarction). Kittani (2010) considered estimatingthe parameters using nonparametric approach for a triply censored data.

Background and AssumptionsAs in the Louis, et al. (1978) model, it is

assumed that f XYZ(x,y,z,a) is continuous that is,

X = Y = Z = is not allowed and Y and Z aretermed the sojourn times in the preclinical andclinical states respectively. The model proposed

by Louis, et al. (1978) makes the assumption of no cohort effect, meaning that the distribution of the random variables (X, Y, Z) is independent of the age distribution A, or

f XYZA (x, y, z, a) = f XYZ(x, y, z) f A(a)

andf XYA (x, y, a) = f XY(x, y) f A(a)

where f XYZ(x, y, z) is the joint pdf of (X, Y, Z) ,f XY(x,y) is the joint pdf of X, Y and f A(a) is the

pdf of A( the age distribution of the subject population). In addition, a subject is a chronichabitu of the PCS if, for that subject, X < , Y= , for example, subject never leaves PCS.According to the model, there will be no chronichabitus of the PCS or CS because, if a subject


154/277

KITTANI

481

lives long enough, then he/she will progress tothe next state eventually (Louis et al., 1978).

The X, Y and Z axes are partitioned intoI, J and K intervals according to Chiang, et al.(1989) and Hollford (1976); they assumedconstant baseline hazards in each subinterval, 1i(x) = 1i, x Ii, 2j (y) = 2j , y I j and 3k (z)= 3k , z Ik in the i th, jth and k th intervalsrespectively. The hazard functions for the n th individual whose (X, Y, Z) values fall in thecube I ixI jxIk are modeled by assuming Coxs(1972) proportional hazards model and holds for each X, Y and Z in each respective I i, I j and I k interval.

Assuming , and (regression parameters) for the covariate (p-dimensional)are constant (the same) for all intervals to beestimated . The hazard functions 1, 2 and 3 inthe I th, Jth and K th intervals for n th individualwhose observed (X, Y, Z) value is ( Xn, Yn, Zn)will be defined as

'n(x ) e , x I1i n 1i n i

(a , a ], (y )i i 1 2 j n'

ne ,2 j

=

= +

=

andy I (b , b ], (z )n j j j 1 3k n

'ne , z I3k n k

(c ,c ].k k 1

= +

=

= +

Where 1i, 2j and 3k are baseline hazardfunctions associated with X, Y and Zrespectively. Assuming , and are constant(the same) regression parameters for thecovariate for all intervals and to be estimatedalong with the association parameter .

The joint survival function for the threenon-negative random variables (X, Y, Z) given

by Kittani (1995b) is:

1

-)()()( ]2[z),F(x, 321 ++= z y x eee y .(2.1)

Where > 0, x > 0, y> 0, z > 0, and 1, 2, 3 are the cumulative hazard functions associatedwith X, Y and Z respectively. For example, tocompute 1i(x), which is the cumulative hazardfunction for the n th individual whose x value fallsin the i th interval (assuming a constant hazardover each interval) is as follows:

k x

n 10

r 1 r n i

(x ) (u)du1i

i 1 '(a a ) (x a ) e1r 1i

r 1

+

=

= +

= (2.2)

where 2j(yn) and 3k (zn) are defined in a similar way. Thus, the joint density function (X, Y, Z) is

1 32

1 2

3

3)[ ( x ) ( y ) ( z )]

y

( 1)( 2) (x ) (y)

(z) e U1

(-

f(x, ,z)

+ +

=

+ +

(2.3)

where > 0, x > 0, y > 0, z > 0, and 1, 2 and 3 are base line hazard functions associated with X,Y and Z respectively as

31 2 (z)(x) (y)U e e e 2. = + +

Kittani (1996) derived the likelihood functionfor the uncensored and censored cases in order to estimate the regression parameters bymaximizing the likelihood function, that is, thenth individual that generates data vector wn , andL(wn) is the likelihood function contribution for the n th individual as:

.1 2 N N

L(w ,w ,...,w ) L(w )nn 1

= =

8/2/201

Empirical Characteristic Function Approach to Goodness of Fit Tests

Documents

Transcript of Empirical Characteristic Function Approach to Goodness of Fit Tests