
Transcript of Chap 3 of BHT


Modeling and Inverse Problems in the Presence of Uncertainty

Authors/Affiliations
H. T. Banks, North Carolina State University, Raleigh, USA
Shuhua Hu, North Carolina State University, Raleigh, USA
W. Clayton Thompson, North Carolina State University, Raleigh, USA

This book collects recent research, including the authors' own substantial projects, on uncertainty propagation and quantification. It covers two sources of uncertainty: where uncertainty is present primarily due to measurement errors and where uncertainty is present due to the modeling formulation itself. With many examples throughout addressing problems in physics, biology, and other areas, the book is suitable for applied mathematicians as well as scientists in biology, medicine, engineering, and physics.

Key Features
• Reviews basic probability and statistical concepts, making the book self-contained
• Presents many applications and theoretical results from engineering, biology, and physics
• Covers the general relationship of differential equations driven by white noise (stochastic differential equations) and the ones driven by colored noise (random differential equations) in terms of their resulting probability density functions
• Describes the Prohorov metric framework for nonparametric estimation of a probability measure
• Contains numerous examples and end-of-chapter references to research results, including the authors' technical reports that can be downloaded from North Carolina State University's Center for Research in Scientific Computation

Selected Contents
Introduction. Probability and Statistics Overview. Mathematical and Statistical Aspects of Inverse Problems. Model Comparison Criteria. Estimation of Probability Measures Using Aggregate Population Data. Optimal Design. Propagation of Uncertainty in a Continuous Time Dynamical System. A Stochastic System and Its Corresponding Deterministic System. Frequently Used Notations and Abbreviations. Index.

SAVE 20% when you order online and enter Promo Code EZL18. FREE standard shipping when you order online.

Catalog no. K21506, April 2014, 405 pp.
ISBN: 978-1-4822-0642-5. $89.95 / 57.99

http://www.ncsu.edu/crsc/reports.html

    Contents

    Preface xiii

    1 Introduction 1

2 Probability and Statistics Overview 3
   2.1 Probability and Probability Space 3
      2.1.1 Joint Probability 6
      2.1.2 Conditional Probability 7
   2.2 Random Variables and Their Associated Distribution Functions 8
      2.2.1 Cumulative Distribution Function 9
      2.2.2 Probability Mass Function 12
      2.2.3 Probability Density Function 13
      2.2.4 Equivalence of Two Random Variables 14
      2.2.5 Joint Distribution Function and Marginal Distribution Function 15
      2.2.6 Conditional Distribution Function 17
      2.2.7 Function of a Random Variable 20
   2.3 Statistical Averages of Random Variables 21
      2.3.1 Joint Moments 23
      2.3.2 Conditional Moments 25
      2.3.3 Statistical Averages of Random Vectors 26
      2.3.4 Important Inequalities 26
   2.4 Characteristic Functions of a Random Variable 27
   2.5 Special Probability Distributions 28
      2.5.1 Poisson Distribution 29
      2.5.2 Uniform Distribution 29
      2.5.3 Normal Distribution 31
      2.5.4 Log-Normal Distribution 33
      2.5.5 Multivariate Normal Distribution 35
      2.5.6 Exponential Distribution 36
      2.5.7 Gamma Distribution 39
      2.5.8 Chi-Square Distribution 41
      2.5.9 Student's t Distribution 42
   2.6 Convergence of a Sequence of Random Variables 44
      2.6.1 Convergence in Distribution 44
      2.6.2 Convergence in Probability 46
      2.6.3 Convergence Almost Surely 47
      2.6.4 Mean Square Convergence 49
   References 50

3 Mathematical and Statistical Aspects of Inverse Problems 53
   3.1 Least Squares Inverse Problem Formulations 54
      3.1.1 The Mathematical Model 54
      3.1.2 The Statistical Model 54
   3.2 Methodology: Ordinary, Weighted and Generalized Least Squares 56
      3.2.1 Scalar Ordinary Least Squares 56
      3.2.2 Vector Ordinary Least Squares 58
      3.2.3 Numerical Implementation of the Vector OLS Procedure 60
      3.2.4 Weighted Least Squares (WLS) 60
      3.2.5 Generalized Least Squares Definition and Motivation 62
      3.2.6 Numerical Implementation of the GLS Procedure 64
   3.3 Asymptotic Theory: Theoretical Foundations 64
      3.3.1 Extension to Weighted Least Squares 70
   3.4 Computation of $\Sigma^N$, Standard Errors and Confidence Intervals 73
   3.5 Investigation of Statistical Assumptions 80
      3.5.1 Residual Plots 80
      3.5.2 An Example Using Residual Plots: Logistic Growth 82
      3.5.3 A Second Example Using Residual Plot Analysis: Cell Proliferation 87
   3.6 Bootstrapping vs. Asymptotic Error Analysis 93
      3.6.1 Bootstrapping Algorithm: Constant Variance Data 94
      3.6.2 Bootstrapping Algorithm: Non-Constant Variance Data 96
      3.6.3 Results of Numerical Simulations 97
         3.6.3.1 Constant Variance Data with OLS 97
         3.6.3.2 Non-Constant Variance Data with GLS 100
      3.6.4 Using Incorrect Assumptions on Errors 101
         3.6.4.1 Constant Variance Data Using GLS 103
         3.6.4.2 Non-Constant Variance Data Using OLS 103
   3.7 The Corrective Nature of Bootstrapping Covariance Estimates and Their Effects on Confidence Intervals 106
   3.8 Some Summary Remarks on Asymptotic Theory vs. Bootstrapping 111
   References 112

4 Model Selection Criteria 117
   4.1 Introduction 117
      4.1.1 Statistical and Probability Distribution Models 117
      4.1.2 Risks Involved in the Process of Model Selection 119
      4.1.3 Model Selection Principle 120
   4.2 Likelihood Based Model Selection Criteria: Akaike Information Criterion and Its Variations 121
      4.2.1 Kullback–Leibler Information 122
      4.2.2 Maximum Likelihood Estimation 123
      4.2.3 A Large Sample AIC 125
      4.2.4 A Small Sample AIC 126
         4.2.4.1 Univariate Observations 126
         4.2.4.2 Multivariate Observations 127
      4.2.5 Takeuchi's Information Criterion 128
      4.2.6 Remarks on Akaike Information Criterion and Its Variations 129
         4.2.6.1 Candidate Models 130
         4.2.6.2 The Selected Best Model 130
         4.2.6.3 Pitfalls When Using the AIC 131
   4.3 The AIC under the Framework of Least Squares Estimation 132
      4.3.1 Independent and Identically Normally Distributed Observations 132
      4.3.2 Independent Multivariate Normally Distributed Observations 134
         4.3.2.1 Unequal Number of Observations for Different Observed Components 136
      4.3.3 Independent Gamma Distributed Observations 138
      4.3.4 General Remarks 140
   4.4 Example: CFSE Label Decay 142
   4.5 Residual Sum of Squares Based Model Selection Criterion 146
      4.5.1 Ordinary Least Squares 146
      4.5.2 Application: Cat Brain Diffusion/Convection Problem 149
      4.5.3 Weighted Least Squares 151
      4.5.4 Summary Remarks 152
   References 153

5 Estimation of Probability Measures Using Aggregate Population Data 157
   5.1 Motivation 157
   5.2 Type I: Individual Dynamics/Aggregate Data Inverse Problems 160
      5.2.1 Structured Population Models 160
   5.3 Type II: Aggregate Dynamics/Aggregate Data Inverse Problems 163
      5.3.1 Probability Measure Dependent Systems: Viscoelasticity 163
      5.3.2 Probability Measure Dependent Systems: Maxwell's Equations 167
   5.4 Aggregate Data and the Prohorov Metric Framework 169
   5.5 Consistency of the PMF Estimator 178
   5.6 Further Remarks 181
   5.7 Non-Parametric Maximum Likelihood Estimation 181
      5.7.1 Likelihood Formulation 182
      5.7.2 Computational Techniques 184
   5.8 Final Remarks 187
   References 188

6 Optimal Design 195
   6.1 Introduction 195
   6.2 Mathematical and Statistical Models 197
      6.2.1 Formulation of the Optimal Design Problem 198
   6.3 Algorithmic Considerations 202
   6.4 Example: HIV Model 203
   References 206

7 Propagation of Uncertainty in a Continuous Time Dynamical System 209
   7.1 Introduction to Stochastic Processes 210
      7.1.1 Distribution Functions of a Stochastic Process 211
      7.1.2 Moments, Correlation and Covariance Functions of a Stochastic Process 213
      7.1.3 Classification of a Stochastic Process 215
         7.1.3.1 Stationary Versus Non-Stationary Stochastic Processes 215
         7.1.3.2 Gaussian vs. Non-Gaussian Processes 216
      7.1.4 Methods of Studying a Stochastic Process 217
         7.1.4.1 Sample Function Approach 217
         7.1.4.2 Mean Square Calculus Approach 218
      7.1.5 Markov Processes 220
         7.1.5.1 Characterization of a Markov Process 222
         7.1.5.2 The Chapman–Kolmogorov Equation 222
         7.1.5.3 An Example of a Markov Process: Wiener Process 223
         7.1.5.4 An Example of a Markov Process: Diffusion Process 226
         7.1.5.5 An Example of a Markov Process: Poisson Process 226
         7.1.5.6 Classification of a Markov Process 229
         7.1.5.7 Continuous Time Markov Chain 230
      7.1.6 Martingales 233
         7.1.6.1 Examples of Sample-Continuous Martingales 234
         7.1.6.2 The Role of Martingales in the Development of Stochastic Integration Theory 234
      7.1.7 White Noise vs. Colored Noise 236
         7.1.7.1 The Power Spectral Density Function 236
         7.1.7.2 White Noise 238
         7.1.7.3 Colored Noise 240
   7.2 Stochastic Differential Equations 244
      7.2.1 Itô Stochastic Differential Equations 245
         7.2.1.1 Evolution of the Probability Density Function of $X(t)$ 249
         7.2.1.2 Applications of the Fokker–Planck Equation in Population Dynamics 252
      7.2.2 Stratonovich Stochastic Differential Equations 255
   7.3 Random Differential Equations 257
      7.3.1 Differential Equations with Random Initial Conditions 258
         7.3.1.1 Evolution of the Probability Density Function of $x(t; X_0)$ 259
         7.3.1.2 Applications of Liouville's Equation in Population Dynamics 261
      7.3.2 Differential Equations with Random Model Parameters and Random Initial Conditions 262
         7.3.2.1 Evolution of the Joint Probability Density Function for $(x(t; X_0, Z), Z)^T$ 263
         7.3.2.2 Evolution of Conditional Probability Density Function of $x(t; X_0, Z)$ Given the Realization $z$ of $Z$ 264
         7.3.2.3 Applications in Population Dynamics 265
      7.3.3 Differential Equations Driven by Correlated Stochastic Processes 268
         7.3.3.1 Joint Probability Density Function of the Coupled Stochastic Process 269
         7.3.3.2 The Probability Density Function of $X(t)$ 273
   7.4 Relationships between Random and Stochastic Differential Equations 276
      7.4.1 Markov Operators and Markov Semigroups 277
         7.4.1.1 Random Differential Equations 279
         7.4.1.2 Stochastic Differential Equations 281
      7.4.2 Pointwise Equivalence Results between Stochastic Differential Equations and Random Differential Equations 282
         7.4.2.1 Scalar Affine Differential Equations (Class 1) 283
         7.4.2.2 Scalar Affine Differential Equations (Class 2) 285
         7.4.2.3 Vector Affine Systems 286
         7.4.2.4 Non-Linear Differential Equations 288
         7.4.2.5 Remarks on the Equivalence between the SDE and the RDE 293
         7.4.2.6 Relationship between the FPPS and GRDPS Population Models 295
   References 298


8 A Stochastic System and Its Corresponding Deterministic System 309
   8.1 Overview of Multivariate Continuous Time Markov Chains 310
      8.1.1 Exponentially Distributed Holding Times 310
      8.1.2 Random Time Change Representation 311
         8.1.2.1 Relationship between the Stochastic Equation and the Martingale Problem 312
         8.1.2.2 Relationship between the Martingale Problem and Kolmogorov's Forward Equation 313
   8.2 Simulation Algorithms for Continuous Time Markov Chain Models 314
      8.2.1 Stochastic Simulation Algorithm 314
         8.2.1.1 The Direct Method 315
         8.2.1.2 The First Reaction Method 315
      8.2.2 The Next Reaction Method 316
         8.2.2.1 The Original Next Reaction Method 317
         8.2.2.2 The Modified Next Reaction Method 319
      8.2.3 Tau-Leaping Methods 321
         8.2.3.1 An Explicit Tau-Leaping Method 321
         8.2.3.2 An Implicit Tau-Leaping Method 325
   8.3 Density Dependent Continuous Time Markov Chains and Kurtz's Limit Theorem 327
      8.3.1 Kurtz's Limit Theorem 328
      8.3.2 Implications of Kurtz's Limit Theorem 329
   8.4 Biological Application: Vancomycin-Resistant Enterococcus Infection in a Hospital Unit 331
      8.4.1 The Stochastic VRE Model 331
      8.4.2 The Deterministic VRE Model 333
      8.4.3 Numerical Results 334
   8.5 Biological Application: HIV Infection within a Host 336
      8.5.1 Deterministic HIV Model 336
      8.5.2 Stochastic HIV Models 341
         8.5.2.1 The Stochastic HIV Model Based on the Burst Production Mode 341
         8.5.2.2 The Stochastic HIV Model Based on the Continuous Production Mode 343
      8.5.3 Numerical Results for the Stochastic HIV Model Based on the Burst Production Mode 343
         8.5.3.1 Implementation of the Tau-Leaping Algorithms 343
         8.5.3.2 Comparison of Computational Efficiency of the SSA and the Tau-Leaping Algorithms 346
         8.5.3.3 Accuracy of the Results Obtained by Tau-Leaping Algorithms 348
         8.5.3.4 Stochastic Solution vs. Deterministic Solution 350


    Preface

Writing a research monograph on a hot topic such as uncertainty propagation is a somewhat daunting undertaking. Nonetheless, we decided to collect our own views, supported by our own research efforts over the past 12–15 years on a number of aspects of this topic, and summarize these for the possible enlightenment they might provide (for us, our students and others). The research results discussed below are thus necessarily filled with a preponderance of references to our own research reports and papers. In numerous references below (given at the conclusion of each chapter), we refer to CRSC-TRXX-YY. This refers to early Technical Report versions of manuscripts which can be found on the Center for Research in Scientific Computation website at North Carolina State University, where XX refers to the year, e.g., XX = 03 is 2003, XX = 99 is 1999, while the YY refers to the number of the report in that year. These can be found at and downloaded from http://www.ncsu.edu/crsc/reports.html where they are listed by year.

Our presentation here has an intended audience from the community of investigators in applied mathematics interested in deterministic and/or stochastic models and their interactions as well as scientists in biology, medicine, engineering and physics interested in basic modeling and inverse problems, uncertainty in modeling, propagation of uncertainty and statistical modeling.

We owe great thanks to our former and current students, postdocs and colleagues for their patience in enduring lectures, questions, feedback and some proofreading. Special thanks are due (in no particular order) to Zack Kenz, Keri Rehm, Dustin Kapraun, Jared Catenacci, Katie Link, Kris Rinnovatore, Kevin Flores, John Nardini, Karissa Cross and Laura Poag for careful reading of notes and suggested corrections/revisions on subsets of the material for this monograph. However, in a sincere attempt to give credit where it is due, each of the authors firmly insists that any errors in judgment, mathematical content, grammar or typos in the material presented in this monograph are entirely the responsibility of his/her two co-authors!!

We (especially young members of our research group) have been generously supported by research grants and fellowships from US federal funding agencies including AFOSR, DARPA, NIH, NSF, DED, and DOE. For this support and encouragement we are all most grateful.

H. T. Banks
Shuhua Hu
W. Clayton Thompson


Chapter 1
Introduction

The terms "uncertainty quantification" and "uncertainty propagation" have become so widely used as to almost have little meaning unless they are further explained. Here we focus primarily on two basic types of problems:

1. Modeling and inverse problems where one assumes that a precise mathematical model without modeling error is available. This is a standard assumption underlying a large segment of what is taught in many modern statistics courses with a frequentist philosophy. More precisely, a mathematical model is given by a dynamical system

$$ \frac{dx}{dt}(t) = g(t, x(t), q) \qquad (1.1) $$

$$ x(t_0) = x_0 \qquad (1.2) $$

    with observation process

$$ f(t; \theta) = Cx(t; \theta), \qquad (1.3) $$

where $\theta = (q, x_0)$. The mathematical model is an $n$-dimensional deterministic system and there is a corresponding truth parameter $\theta_0 = (q_0, x_{00})$ so that in the presence of no measurement error the data can be described exactly by the deterministic system at $\theta_0$. Thus, uncertainty is present entirely due to some statistical model of the form

$$ Y_j = f(t_j; \theta_0) + E_j, \quad j = 1, \ldots, N, \qquad (1.4) $$

where $f(t_j; \theta) = Cx(t_j; \theta)$, $j = 1, \ldots, N$, corresponds to the observed part of the solution of the mathematical model (1.1)–(1.2) at the $j$th covariate or observation time and $E_j$ is some type of (possibly state dependent) measurement error. For example, we consider errors that include those of the form $\tilde{E}_j = f^\gamma(t_j; \theta_0) \circ E_j$, where the operation $f^\gamma \circ E$ denotes component-wise exponentiation by $\gamma$ followed by component-wise multiplication, and $\gamma \ge 0$.

2. An alternate problem wherein the mathematical modeling itself is a major source of uncertainty and this uncertainty usually propagates in time. That is, the mathematical model has major uncertainties in its form and/or its parametrization and/or its initial/boundary data, and this uncertainty is propagated dynamically via some framework as yet to be determined.



Before we begin the inverse problem discussions, we give a brief but useful review of certain basic probability and statistical concepts. After the probability and statistics review we present a chapter summarizing both mathematical and statistical aspects of inverse problem methodology which includes ordinary, weighted and generalized least-squares formulations. We discuss asymptotic theories, bootstrapping and issues related to evaluation of the correctness of the assumed form of statistical models. We follow this with a discussion of methods for evaluating and comparing the validity or appropriateness of a collection of models for describing a given data set, including statistically based model selection and model comparison techniques.

In Chapter 5 we present a summary of recent results on the estimation of probability distributions when they are embedded in complex mathematical models and only aggregate (not individual) data are available. This is followed by a brief chapter on optimal design (what to measure? when and where to measure?) of experiments to be carried out in support of inverse problems for given models.

The last two chapters focus on the uncertainty in model formulation itself (the second item listed above as the focus of this monograph). In Chapter 7 we consider the general problem of evolution of probability density functions in time. This is done in the context of associated processes resulting from stochastic differential equations (SDE), which are driven by white noise, and those resulting from random differential equations (RDE), which are driven by colored noise. We also discuss their respective wide applications in a number of different fields including physics and biology. We also consider the general relationship between SDE and RDE and establish that there are classes of problems for which there is an equivalence between the solutions of the two formulations. This equivalence, which we term pointwise equivalence, is in the sense that the respective probability density functions are the same at each time $t$. We show, however, that the stochastic processes resulting from the SDE and its corresponding pointwise equivalent RDE are generally not the same in that they may have different covariance functions.

In a final chapter we consider questions related to the appropriateness of discrete versus continuum models in transitions from small numbers of individuals (particles, populations, molecules, etc.) to large numbers. These investigations are carried out in the context of continuous time Markov chain (CTMC) models and the Kurtz limit theorems for approximations for large number stochastic populations by ordinary differential equations for corresponding mean populations. Algorithms for simulating CTMC models and CTMC models with delays (discrete and random) are explained and simulations are presented for problems arising in specific applications.

The monograph contains illustrative examples throughout, many of them directly related to research projects carried out by our group at North Carolina State University over the past decade.


Chapter 3
Mathematical and Statistical Aspects of Inverse Problems

In inverse or parameter estimation problems, as discussed in the Introduction, an important but practical question is how successful the mathematical model is in describing the physical or biological phenomena represented by the experimental data. In general, it is very unlikely that the residual sum of squares (RSS) in the least-squares formulation is zero. Indeed, due to measurement noise as well as modeling error, there may not be a true set of parameters so that the mathematical model will provide an exact fit to the experimental data.

Even if one begins with a deterministic model and has no initial interest in uncertainty or stochasticity, as soon as one employs experimental data in the investigation, one is led to uncertainty that should not be ignored. In fact, all measurement procedures contain error or uncertainty in the data collection process and hence statistical questions arise regarding that sampling error. To correctly formulate, implement and analyze the corresponding inverse problems, one requires a framework entailing a statistical model as well as a mathematical model.

In this chapter we discuss mathematical, statistical and computational aspects of inverse or parameter estimation problems for deterministic dynamical systems based on the Ordinary Least Squares (OLS), Weighted Least Squares (WLS) and Generalized Least Squares (GLS) methods with appropriate corresponding data noise assumptions of constant variance and non-constant variance (e.g., relative error). Among the topics included are the interplay between the mathematical model, the statistical model, and observation or data assumptions, and some techniques (residual plots) for analyzing the uncertainties associated with inverse problems employing experimental data. We also outline a standard theory underlying the construction of confidence intervals for parameter estimators. This asymptotic theory for confidence intervals can be found in Seber and Wild [34]. Finally, we also compare this asymptotic error approach to the popular bootstrapping approach.



    3.1 Least Squares Inverse Problem Formulations

    3.1.1 The Mathematical Model

We consider inverse or parameter estimation problems in the context of a parameterized (with vector parameter $q \in R^{\kappa_q}$) $n$-dimensional vector dynamical system or mathematical model

$$ \frac{dx}{dt}(t) = g(t, x(t), q), \qquad (3.1) $$

$$ x(t_0) = x_0, \qquad (3.2) $$

with observation process

$$ f(t; \theta) = Cx(t; \theta), \qquad (3.3) $$

where $\theta = (q^T, \tilde{x}_0^T)^T \in R^{\kappa_q + \tilde{n}} = R^{\kappa_\theta}$, $\tilde{n} \le n$, and the observation operator $C$ maps $R^n$ to $R^m$. In most of the discussions below we assume without loss of generality that some subset $\tilde{x}_0$ of the initial values $x_0$ are also unknown.

The mathematical model is a deterministic system; in this chapter we primarily treat ordinary differential equations (ODE), but our discussions are relevant to problems involving parameter-dependent partial differential equations (PDE), delay differential equations, etc., as long as the system is assumed to be well-posed (i.e., to possess unique solutions that depend smoothly on the parameters and initial data). For example, in the CFSE example of this chapter we shall also consider partial differential equation systems such as

$$ \frac{\partial u}{\partial t}(t, x) = G\left(t, x, u, \frac{\partial u}{\partial x}, \frac{\partial^2 u}{\partial x^2}, q\right), \quad t \in [t_0, t_f], \ x \in [x_0, x_f], \qquad (3.4) $$

$$ u(t_0, x) = u_0(x), $$

with appropriate boundary conditions.

Following usual conventions (which correspond to the form of data usually available from experiments), we assume a discrete form of the observations in which one has $N$ longitudinal observations $y_j$ corresponding to

$$ f(t_j; \theta) = Cx(t_j; \theta), \quad j = 1, \ldots, N. \qquad (3.5) $$

In general, the corresponding observations or data $\{y_j\}$ will not be exactly $f(t_j; \theta)$. Due to the nature of the phenomena leading to this discrepancy, we treat this uncertainty pertaining to the observations with a statistical model for the observation process.

    3.1.2 The Statistical Model

    In our discussions here we consider a statistical model of the form

$$ Y_j = f(t_j; \theta_0) + h_j \circ E_j, \quad j = 1, \ldots, N, \qquad (3.6) $$


where $f(t_j; \theta) = Cx(t_j; \theta)$, $j = 1, \ldots, N$, and $C$ is an $m \times n$ matrix. This corresponds to the observed part of the solution of the mathematical model (3.1)–(3.2) at the $j$th covariate or observation time for a particular vector of parameters $\theta \in R^{\kappa_q + \tilde{n}} = R^{\kappa_\theta}$. Here the $m$-vector function $h_j$ is defined by

$$ h_j = \begin{cases} (1, \ldots, 1)^T & \text{for the vector OLS case} \\ (w_{1,j}, \ldots, w_{m,j})^T & \text{for the vector WLS case} \\ (f_1^\gamma(t_j; \theta_0), \ldots, f_m^\gamma(t_j; \theta_0))^T & \text{for the vector GLS case,} \end{cases} \qquad (3.7) $$

for $j = 1, \ldots, N$, and $h_j \circ E_j$ denotes the component-wise multiplication of the vectors $h_j$ and $E_j$. The vector $\theta_0$ represents the truth parameter that generates the observations $\{Y_j\}_{j=1}^N$. (The existence of a truth parameter $\theta_0$ is a standard assumption in statistical formulations and this along with the assumption that the means $E(E_j)$ are zero yields implicitly that (3.1)–(3.2) is a correct description of the process being modeled.) The terms $h_j \circ E_j$ are random variables which can represent observation or measurement error, system fluctuations or other phenomena that cause observations to not fall exactly on the points $f(t_j; \theta_0)$ from the smooth path $f(t, \theta_0)$. Since these fluctuations are unknown to the modeler, we will assume that realizations $\epsilon_j$ of $E_j$ are generated from a probability distribution which reflects the assumptions regarding these phenomena. Thus specific data (realizations) corresponding to (3.6) will be represented by

$$ y_j = f(t_j; \theta_0) + h_j \circ \epsilon_j, \quad j = 1, \ldots, N. \qquad (3.8) $$

We make standard assumptions about the $E_j$ in that they are independent and identically distributed with mean zero and constant covariance matrix. This model (3.8) allows for a fairly wide range of error models, including the usual absolute (or constant variance) error model, when $\gamma = 0$ (the OLS case), as well as the relative (or constant coefficient of variation) error model when $\gamma = 1$. For instance, in a statistical model for pharmacokinetics of drugs in human blood samples, a natural choice for the statistical model might be $\gamma = 0$ and a multivariate normal distribution for $E_j$. In observing (counting) populations, the error may depend on the size of the population itself (i.e., $\gamma = 1$), while studies of flow cytometry data [9] have revealed that a choice of $\gamma = \frac{1}{2}$ may be most appropriate. Each of these cases will be further discussed below.
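To make the role of $\gamma$ concrete, the sketch below generates synthetic scalar data according to (3.8) for the three noise structures just mentioned. The logistic model, the parameter values and all function names are illustrative assumptions, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(t, theta):
    # hypothetical scalar model: logistic growth with theta = (K, r, x0)
    K, r, x0 = theta
    return K * x0 / (x0 + (K - x0) * np.exp(-r * t))

theta0 = (17.5, 0.7, 0.1)        # assumed "truth" parameter theta_0
t = np.linspace(0, 25, 50)       # observation times t_j
sigma0 = 0.5                     # standard deviation of the i.i.d. errors E_j

def simulate(gamma):
    # y_j = f(t_j; theta0) + f(t_j; theta0)**gamma * eps_j,  eps_j i.i.d. with mean 0, variance sigma0^2
    eps = rng.normal(0.0, sigma0, size=t.size)
    ft = f(t, theta0)
    return ft + ft**gamma * eps

y_const = simulate(0.0)   # absolute error (constant variance), gamma = 0
y_rel   = simulate(1.0)   # relative error (constant coefficient of variation), gamma = 1
y_half  = simulate(0.5)   # intermediate case gamma = 1/2, as in the flow cytometry example
```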

The purpose of our presentation in this chapter is to discuss methodology related to estimates $\hat\theta$ for the true value of the parameter $\theta_0$ from a set $\Theta$ of admissible parameters, and the dependence of this methodology on what is assumed about the choice of $\gamma$ and the covariance matrices of the errors $E_j$. We discuss a class of inverse problem methodologies that can be used to calculate estimates $\hat\theta$ for $\theta_0$: the ordinary, the weighted and the generalized least-squares formulations.


We are interested in situations (as is the case in most applications) where the error distribution is unknown to the modeler beyond the assumptions on $E(Y_j)$ embodied in the model and the assumptions made on $\mathrm{Var}(E_j)$. We seek to explore how one should proceed in estimating $\theta_0$ and the covariance matrix in these circumstances.

3.2 Methodology: Ordinary, Weighted and Generalized Least Squares

3.2.1 Scalar Ordinary Least Squares

To simplify notation, we first consider the absolute error statistical model ($\gamma = 0$) in the scalar case. This then takes the form

$$ Y_j = f(t_j; \theta_0) + E_j, \quad j = 1, \ldots, N, \qquad (3.9) $$

where the variance $\mathrm{Var}(E_j) = \sigma_0^2$ is assumed to be unknown to the modeler. (Note also that the distribution of the error need not be specified.) It is assumed that the observation errors are independent across $j$ (i.e., time), which may be a reasonable assumption when the observations are taken with sufficient intermittency or when the primary source of error is measurement error. If we define

$$ \theta_{OLS} = \theta_{OLS}^N(Y) = \arg\min_{\theta \in \Theta} \sum_{j=1}^{N} [Y_j - f(t_j; \theta)]^2, \qquad (3.10) $$

where $Y = (Y_1, Y_2, \ldots, Y_N)^T$, then $\theta_{OLS}$ can be viewed as minimizing the distance between the data and model where all observations are treated as being of equal importance. We note that minimizing the functional in (3.10) corresponds to solving for $\theta$ in

$$ \sum_{j=1}^{N} [Y_j - f(t_j; \theta)] \nabla_\theta f(t_j; \theta) = 0, \qquad (3.11) $$

the so-called normal equations or estimating equations. We point out that $\theta_{OLS}$ is a random vector (because $E_j = Y_j - f(t_j; \theta_0)$ is a random variable); hence if $\{y_j\}_{j=1}^N$ are realizations of the random variables $\{Y_j\}_{j=1}^N$ then solving

$$ \hat\theta_{OLS} = \hat\theta_{OLS}^N = \arg\min_{\theta \in \Theta} \sum_{j=1}^{N} [y_j - f(t_j; \theta)]^2 \qquad (3.12) $$

provides a realization for $\theta_{OLS}$.
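As a concrete illustration of (3.12), the following sketch computes a realization $\hat\theta_{OLS}$ by direct minimization of the residual sum of squares for the hypothetical logistic model introduced earlier (SciPy is assumed; none of these names come from the text).

```python
import numpy as np
from scipy.optimize import least_squares

def ols_estimate(f, t, y, theta_init):
    """Return a realization theta-hat_OLS minimizing sum_j [y_j - f(t_j; theta)]^2."""
    residuals = lambda theta: y - f(t, theta)
    return least_squares(residuals, theta_init).x

# usage with the synthetic constant-variance data generated above:
# theta_hat = ols_estimate(f, t, y_const, theta_init=(15.0, 0.5, 0.2))
```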


We can obtain the standard errors $SE_k(\hat\theta_{OLS})$ (discussed in more detail below) for the $k$th element of $\hat\theta_{OLS}$ by calculating $SE_k(\hat\theta_{OLS}) \approx \sqrt{\hat\Sigma^N_{kk}}$. We note that (3.18) represents the estimate for $\sigma_0^2$ of (3.13) with the factor $\frac{1}{N}$ replaced by the factor $\frac{1}{N - \kappa_\theta}$. In the linear case the estimate with $\frac{1}{N}$ can be shown to be biased downward (i.e., biased too low) and the same behavior can be observed in the general nonlinear case; see Chapter 12 of [34] and p. 28 of [21]. The subtraction of $\kappa_\theta$ degrees of freedom reflects the fact that $\hat\theta$ has been computed to satisfy the normal equations (3.11). We also remark that (3.13) is true even in the general nonlinear case; it does not rely on any asymptotic theories, although it does depend on the assumption of constant variance being correct.

    3.2.2 Vector Ordinary Least Squares

We next consider the more general case in which we have a vector of observations for the $j$th covariate $t_j$. If we still assume the variance is constant in longitudinal data, then the statistical model is reformulated as

$$ Y_j = f(t_j; \theta_0) + E_j, \qquad (3.20) $$

where $f(t_j; \theta_0) \in R^m$ and $E_j$, $j = 1, \ldots, N$, are independent and identically distributed with zero mean and covariance matrix given by

$$ V_0 = \mathrm{Var}(E_j) = \mathrm{diag}(\sigma_{0,1}^2, \ldots, \sigma_{0,m}^2), \qquad (3.21) $$

for $j = 1, \ldots, N$. Here we have allowed for the possibility that the observation coordinates $Y_j$ may have different constant variances $\sigma_{0,i}^2$, i.e., $\sigma_{0,i}^2$ does not necessarily have to equal $\sigma_{0,k}^2$. We note that this formulation also can be used to treat the case where $V_0$ is used to simply scale the observations (i.e., $V_0 = \mathrm{diag}(v_1, \ldots, v_m)$ is known). In this case the formulation is simply a vector OLS (sometimes also called a weighted least squares (WLS)). The problem will consist of finding the minimizer

$$ \theta_{OLS} = \arg\min_{\theta \in \Theta} \sum_{j=1}^{N} [Y_j - f(t_j, \theta)]^T V_0^{-1} [Y_j - f(t_j, \theta)], \qquad (3.22) $$

where the procedure weights elements of the vector $Y_j - f(t_j, \theta)$ according to their variability. (Some authors refer to (3.22) as a generalized least squares (GLS) procedure, but we will make use of this terminology in a different formulation in subsequent discussions.) Just as in the scalar OLS case, $\theta_{OLS}$ is a random vector (again, because $E_j = Y_j - f(t_j, \theta_0)$ is a random vector); hence if $\{y_j\}_{j=1}^N$ is a collection of realizations of the random vectors $\{Y_j\}_{j=1}^N$, then solving

$$ \hat\theta_{OLS} = \arg\min_{\theta \in \Theta} \sum_{j=1}^{N} [y_j - f(t_j, \theta)]^T V_0^{-1} [y_j - f(t_j, \theta)] \qquad (3.23) $$


provides a realization $\hat\theta = \hat\theta_{OLS}$ for $\theta_{OLS}$. By the definition of the covariance matrix we have

$$ V_0 = \mathrm{diag}\left( E\left\{ \frac{1}{N} \sum_{j=1}^{N} [Y_j - f(t_j, \theta_0)][Y_j - f(t_j, \theta_0)]^T \right\} \right)_{ii}. $$

Thus an unbiased approximation for $V_0$ is given by

$$ \hat{V} = \mathrm{diag}\left( \frac{1}{N - \kappa_\theta} \sum_{j=1}^{N} [y_j - f(t_j, \hat\theta)][y_j - f(t_j, \hat\theta)]^T \right)_{ii}. \qquad (3.24) $$

However, the estimate of (3.23) requires the (generally unknown) matrix $V_0$, and $V_0$ requires the unknown vector $\theta_0$, so we will instead use the following expressions to calculate $\hat\theta$ and $\hat{V}$:

$$ \theta_0 \approx \hat\theta = \arg\min_{\theta \in \Theta} \sum_{j=1}^{N} [y_j - f(t_j, \theta)]^T \hat{V}^{-1} [y_j - f(t_j, \theta)], \qquad (3.25) $$

$$ V_0 \approx \hat{V} = \mathrm{diag}\left( \frac{1}{N - \kappa_\theta} \sum_{j=1}^{N} [y_j - f(t_j; \hat\theta)][y_j - f(t_j; \hat\theta)]^T \right)_{ii}. \qquad (3.26) $$

Note that the expressions for $\hat\theta$ and $\hat{V}$ constitute a coupled system of equations that will require greater effort in implementing a numerical scheme, as discussed in the next section.

Just as in the scalar case, we can determine the asymptotic properties of the OLS estimator (3.22). As $N \to \infty$, $\theta_{OLS}$ has the following asymptotic properties [21, 34]:

$$ \theta_{OLS} \sim \mathcal{N}(\theta_0, \Sigma_0^N), \qquad (3.27) $$

where

$$ \Sigma_0^N \approx \left[ \sum_{j=1}^{N} D_j^T(\theta_0)\, V_0^{-1}\, D_j(\theta_0) \right]^{-1}, \qquad (3.28) $$

and the $m \times \kappa_\theta$ matrix $D_j(\theta_0) = D_j^N(\theta_0)$ is given by

$$ D_j(\theta_0) = \begin{pmatrix} \dfrac{\partial f_1(t_j; \theta_0)}{\partial\theta_1} & \dfrac{\partial f_1(t_j; \theta_0)}{\partial\theta_2} & \cdots & \dfrac{\partial f_1(t_j; \theta_0)}{\partial\theta_{\kappa_\theta}} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial f_m(t_j; \theta_0)}{\partial\theta_1} & \dfrac{\partial f_m(t_j; \theta_0)}{\partial\theta_2} & \cdots & \dfrac{\partial f_m(t_j; \theta_0)}{\partial\theta_{\kappa_\theta}} \end{pmatrix}. $$


Since the true values of the parameters $\theta_0$ and $V_0$ are unknown, their estimates $\hat\theta$ and $\hat{V}$ are used to approximate the asymptotic properties of the least-squares estimator $\theta_{OLS}$:

$$ \theta_{OLS} \sim \mathcal{N}(\theta_0, \Sigma_0^N) \approx \mathcal{N}(\hat\theta, \hat\Sigma^N), \qquad (3.29) $$

where

$$ \Sigma_0^N \approx \hat\Sigma^N = \left[ \sum_{j=1}^{N} D_j^T(\hat\theta)\, \hat{V}^{-1}\, D_j(\hat\theta) \right]^{-1}. \qquad (3.30) $$

The standard errors $SE_k(\hat\theta_{OLS})$ can then be calculated for the $k$th element of $\hat\theta_{OLS}$ by $SE_k(\hat\theta_{OLS}) \approx \sqrt{\hat\Sigma^N_{kk}}$.
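A sketch of how (3.29)–(3.30) might be used in practice: the sensitivity matrices $D_j(\hat\theta)$ are approximated by forward differences and the standard errors are read off the diagonal of $\hat\Sigma^N$. All function and variable names here are illustrative assumptions, not from the text.

```python
import numpy as np

def sensitivity(f, t_j, theta_hat, h=1e-6):
    """Forward-difference approximation of the m x kappa_theta matrix D_j(theta-hat)."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    base = np.atleast_1d(f(t_j, theta_hat))
    D = np.empty((base.size, theta_hat.size))
    for k in range(theta_hat.size):
        step = np.zeros_like(theta_hat)
        step[k] = h * max(1.0, abs(theta_hat[k]))
        D[:, k] = (np.atleast_1d(f(t_j, theta_hat + step)) - base) / step[k]
    return D

def ols_standard_errors(f, times, theta_hat, V_hat):
    """Sigma-hat^N = [sum_j D_j^T V-hat^{-1} D_j]^{-1}; SE_k is the square root of its kth diagonal entry."""
    W = np.linalg.inv(np.atleast_2d(V_hat))
    info = sum(sensitivity(f, tj, theta_hat).T @ W @ sensitivity(f, tj, theta_hat) for tj in times)
    Sigma_hat = np.linalg.inv(info)
    return np.sqrt(np.diag(Sigma_hat)), Sigma_hat
```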

3.2.3 Numerical Implementation of the Vector OLS Procedure

In the scalar statistical model (3.9), the estimates $\hat\theta$ and $\hat\sigma^2$ can be solved for separately (this is also true of the vector statistical model in the case $V_0 = \sigma_0^2 I_m$, where $I_m$ is the $m \times m$ identity matrix) and thus the numerical implementation is straightforward. First determine $\hat\theta_{OLS}$ according to (3.12) and then calculate $\hat\sigma^2_{OLS}$ according to (3.18). However, as already noted, the estimates $\hat\theta$ and $\hat{V}$ in the case of the general vector statistical model (3.20) require more effort since the equations (3.25)–(3.26) are coupled. To solve this coupled system the following iterative process can be used:

1. Set $\hat{V} = \hat{V}^{(0)} = I_m$ and solve for the initial estimate $\hat\theta^{(0)}$ using (3.25). Set $l = 0$.

2. Use $\hat\theta^{(l)}$ to calculate $\hat{V}^{(l+1)}$ using (3.26).

3. Re-estimate $\hat\theta$ by solving (3.25) with $\hat{V} = \hat{V}^{(l+1)}$ to obtain $\hat\theta^{(l+1)}$.

4. Set $l = l + 1$ and return to step 2. Terminate the process and set $\hat\theta_{OLS} = \hat\theta^{(l+1)}$ when two successive estimates for $\hat\theta$ are sufficiently close to one another.
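A minimal sketch of steps 1–4 (Python/NumPy/SciPy assumed; the model interface, the stopping rule and the degrees-of-freedom factor follow the presentation above, and all names are illustrative).

```python
import numpy as np
from scipy.optimize import least_squares

def iterative_vector_ols(f, times, ys, theta_init, tol=1e-6, max_iter=50):
    """Coupled estimation of theta-hat and V-hat via steps 1-4 above.

    f(t, theta) returns the m-vector model output; ys is an (N, m) array of data y_j.
    """
    N, m = ys.shape
    p = len(theta_init)                           # kappa_theta
    V = np.eye(m)                                 # step 1: V-hat^(0) = I_m
    theta = np.asarray(theta_init, dtype=float)
    for _ in range(max_iter):
        L = np.linalg.cholesky(np.linalg.inv(V))  # so that ||L^T r||^2 = r^T V^{-1} r
        def weighted_residuals(th):
            r = np.array([ys[j] - f(times[j], th) for j in range(N)])
            return (r @ L).ravel()
        theta_new = least_squares(weighted_residuals, theta).x   # minimize (3.25) with V-hat fixed
        # update V-hat^(l+1) from the diagonal of the residual outer products, as in (3.26)
        R = np.array([ys[j] - f(times[j], theta_new) for j in range(N)])
        V_new = np.diag((R ** 2).sum(axis=0) / (N - p))
        # step 4: terminate when two successive estimates are sufficiently close
        if np.linalg.norm(theta_new - theta) < tol * (1.0 + np.linalg.norm(theta)):
            return theta_new, V_new
        theta, V = theta_new, V_new
    return theta, V
```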

    3.2.4 Weighted Least Squares (WLS)

Although in the above discussion the measurement errors' distribution remained unspecified, we did require that the measurement error remain constant in variance in longitudinal data. That assumption may not be appropriate for data sets whose measurement error is not constant in a longitudinal sense. A common weighted error model, in which the error is weighted according to some known weights, an assumption which might be reasonable when


    where we take the approximation

$$ \sigma_0^2 \approx \hat\sigma^2_{WLS} = \frac{1}{N - \kappa_\theta} \sum_{j=1}^{N} \frac{1}{w_j^2} [y_j - f(t_j; \hat\theta)]^2. $$

We can then approximate the standard errors of $\hat\theta_{WLS}$ by taking the square roots of the diagonal elements of $\hat\Sigma^N$.

3.2.5 Generalized Least Squares Definition and Motivation

A method motivated by the WLS (as we have presented it above) involves the so-called Generalized Least Squares (GLS) estimator. To define the random vector $\theta_{GLS}$ [20, Chapter 3] and [34, p. 69], the following normal equations are solved for the estimator $\theta_{GLS}$:

$$ \sum_{j=1}^{N} f^{-2\gamma}(t_j; \theta_{GLS})\, [Y_j - f(t_j; \theta_{GLS})]\, \nabla f(t_j; \theta_{GLS}) = 0, \qquad (3.37) $$

where $Y_j$ satisfies

$$ Y_j = f(t_j; \theta_0) + f^\gamma(t_j; \theta_0) E_j, $$

and

$$ \nabla f(t_j; \theta) = \left( \frac{\partial f(t_j; \theta)}{\partial\theta_1}, \ldots, \frac{\partial f(t_j; \theta)}{\partial\theta_{\kappa_\theta}} \right)^T. $$

The quantity $\theta_{GLS}$ is a random vector; hence if $\{y_j\}_{j=1}^N$ is a realization of $\{Y_j\}_{j=1}^N$, then solving

$$ \sum_{j=1}^{N} f^{-2\gamma}(t_j; \theta)\, [y_j - f(t_j; \theta)]\, \nabla f(t_j; \theta) = 0 \qquad (3.38) $$

for $\theta$ will provide an estimate $\hat\theta_{GLS}$ for $\theta_{GLS}$.

The GLS equation (3.38) can be motivated by examining the special weighted least-squares estimate

$$ \hat\theta_{WLS} = \arg\min_{\theta \in \Theta} \sum_{j=1}^{N} w_j [y_j - f(t_j; \theta)]^2 \qquad (3.39) $$

for a given $\{w_j\}$. If we differentiate the sum of squares in (3.39) with respect to $\theta$ and then choose $w_j = f^{-2\gamma}(t_j; \theta)$, an estimate $\hat\theta_{GLS}$ is obtained by solving

$$ \sum_{j=1}^{N} w_j [y_j - f(t_j; \theta)]\, \nabla f(t_j; \theta) = 0 $$


for $\theta$, i.e., solving (3.38). However, we note the GLS relationship (3.38) does not follow from minimizing the weighted least squares with weights chosen as $w_j = f^{-2\gamma}(t_j; \theta)$ (see p. 89 of [34]).

Another motivation for the GLS estimating equations (3.37) and (3.38) can be found in [17]. In that text, Carroll and Ruppert claim that if the data are distributed according to the gamma distribution, then the maximum-likelihood estimate for $\theta$ (a standard approach, to be discussed later, in which one assumes that the distribution for the measurement error is completely known) is the solution to

$$ \sum_{j=1}^{N} f^{-2}(t_j; \theta)\, [y_j - f(t_j; \theta)]\, \nabla f(t_j; \theta) = 0, $$

which is equivalent to the corresponding GLS estimating equations (3.38) with $\gamma = 1$. (We refer the reader to Chapter 4 on this as well as the maximum likelihood estimation method.) The connection between the maximum likelihood estimation method and our GLS method is reassuring, but it also poses another interesting question: what if the variance of the data is assumed to be independent of the model output $f(t_j; \theta)$ but dependent on some other function $h(t_j; \theta)$ (i.e., $\mathrm{Var}(Y_j) = \sigma_0^2 h^2(t_j; \theta)$)? Is there a corresponding maximum likelihood estimator of $\theta$ whose form is equivalent to the appropriate GLS estimating equation

$$ \sum_{j=1}^{N} h^{-2}(t_j; \theta)\, [Y_j - f(t_j; \theta)]\, \nabla f(t_j; \theta) = 0\,? \qquad (3.40) $$

In their text, Carroll and Ruppert [17] briefly describe how distributions belonging to the exponential family of distributions (to be discussed later) generate maximum-likelihood estimating equations equivalent to (3.40).

The GLS estimator $\theta_{GLS} = \theta_{GLS}^N$ has the following asymptotic properties [21, 34]:

$$ \theta_{GLS} \sim \mathcal{N}(\theta_0, \Sigma_0^N), \qquad (3.41) $$

where

$$ \Sigma_0^N \approx \sigma_0^2 \left[ F_\theta^T(\theta_0)\, W(\theta_0)\, F_\theta(\theta_0) \right]^{-1}, \qquad (3.42) $$

the sensitivity matrix is given by (3.36) and the matrix $W(\theta)$ is defined by $W^{-1}(\theta) = \mathrm{diag}\left( f^{2\gamma}(t_1; \theta), \ldots, f^{2\gamma}(t_N; \theta) \right)$. Note that because $\theta_0$ and $\sigma_0^2$ are unknown, the estimates $\hat\theta = \hat\theta_{GLS}$ and $\hat\sigma^2 = \hat\sigma^2_{GLS}$ will again be used in (3.42) to calculate

$$ \Sigma_0^N \approx \hat\Sigma^N = \hat\sigma^2 \left[ F_\theta^T(\hat\theta)\, W(\hat\theta)\, F_\theta(\hat\theta) \right]^{-1}, $$

    where we take the approximation

$$ \sigma_0^2 \approx \hat\sigma^2_{GLS} = \frac{1}{N - \kappa_\theta} \sum_{j=1}^{N} \frac{1}{f^{2\gamma}(t_j; \hat\theta)} [y_j - f(t_j; \hat\theta)]^2. $$


We can then approximate the standard errors of $\hat\theta_{GLS}$ by taking the square roots of the diagonal elements of $\hat\Sigma^N$.

    3.2.6 Numerical Implementation of the GLS Procedure

We note that an estimate $\hat\theta_{GLS}$ can be computed either directly according to (3.38) or iteratively using an algorithm similar in spirit to that in Section 3.2.3. This iterative procedure as described in [21] (often referred to as the GLS algorithm) is summarized below:

1. Solve for the initial estimate $\hat\theta^{(0)}$ obtained using the OLS minimization (3.12). Set $l = 0$.

2. Form the weights $w_j = f^{-2\gamma}(t_j; \hat\theta^{(l)})$.

3. Re-estimate $\hat\theta$ by solving

$$ \sum_{j=1}^{N} w_j \left[ y_j - f(t_j, \theta) \right] \nabla f(t_j; \theta) = 0 \qquad (3.43) $$

to obtain $\hat\theta^{(l+1)}$.

4. Set $l = l + 1$ and return to step 2. Terminate the process and set $\hat\theta_{GLS} = \hat\theta^{(l+1)}$ when two of the successive estimates are sufficiently close.

We note that the above iterative procedure was formulated by the equivalent of minimizing, for a given $\hat\theta$,

$$ \sum_{j=1}^{N} f^{-2\gamma}(t_j; \hat\theta)\, [y_j - f(t_j; \theta)]^2 $$

(over $\theta$) and then updating the weights $w_j = f^{-2\gamma}(t_j; \hat\theta)$ after each iteration. One would hope that after a sufficient number of iterations $w_j$ would converge to $f^{-2\gamma}(t_j; \hat\theta_{GLS})$. Fortunately, under reasonable conditions [21], if the process enumerated above is continued a sufficient number of times, then $w_j \to f^{-2\gamma}(t_j; \hat\theta_{GLS})$.
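A minimal sketch of the GLS algorithm above for scalar observations, implemented as iteratively re-weighted least squares (Python/SciPy assumed; the stopping rule and all names are illustrative, and $\gamma = 1$ recovers the weights $w_j = f^{-2}(t_j; \hat\theta^{(l)})$ of the constant coefficient of variation case).

```python
import numpy as np
from scipy.optimize import least_squares

def gls_estimate(f, t, y, theta_init, gamma=1.0, tol=1e-6, max_iter=50):
    """Iteratively re-weighted least squares realizing the GLS algorithm of Section 3.2.6."""
    # step 1: initial OLS estimate (all weights equal to one)
    theta = least_squares(lambda th: y - f(t, th), theta_init).x
    for _ in range(max_iter):
        # step 2: form the weights w_j = f^(-2*gamma)(t_j; theta^(l))
        w = f(t, theta) ** (-2.0 * gamma)
        # step 3: re-estimate theta by minimizing sum_j w_j [y_j - f(t_j; theta)]^2
        theta_new = least_squares(lambda th: np.sqrt(w) * (y - f(t, th)), theta).x
        # step 4: terminate when two successive estimates are sufficiently close
        if np.linalg.norm(theta_new - theta) < tol * (1.0 + np.linalg.norm(theta)):
            return theta_new
        theta = theta_new
    return theta
```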

    3.3 Asymptotic Theory: Theoretical Foundations

Finally we turn to the general least squares problems discussed in previous sections where the statistical models have one of the forms given in (3.6).


For ease in notation we discuss only versions of these models for scalar observation cases in the OLS and WLS formulations; the reader can easily use the vector extensions discussed earlier in this chapter to treat vector observations. We first explain the asymptotic theory for OLS formulations, i.e., for an absolute error model ($\gamma = 0$ in the general formulation) in the scalar case. We then explain the extensions to more general error models, again in the scalar observation case. Thus we consider the statistical model (3.6) with $m = 1$ and $\gamma = 0$ given by

$$ Y_j = f(t_j; \theta_0) + E_j, \quad j = 1, \ldots, N. \qquad (3.44) $$

In the discussions below, for notational simplicity the dependence of the estimators and cost functions on $N$ may be suppressed, e.g., using $J_{OLS}(\theta)$ or $J_{OLS}(\theta; Y)$ instead of

$$ J_{OLS}^N(\theta) = J_{OLS}^N(\theta; Y) = \sum_{j=1}^{N} (Y_j - f(t_j; \theta))^2. $$

This is the case particularly during the discussions of the cell proliferation example in Section 3.5.3 below.

We present existing results on the asymptotic properties of non-linear ordinary least squares estimators. The notation and organization of this section closely follows that of Banks and Fitzpatrick [7] and the extension in [?]. The work of Banks and Fitzpatrick was partially inspired by the work of Gallant [24], and many of their results and assumptions are similar. In the discussion that follows, we comment on any differences between the two approaches that we feel are noteworthy. While Gallant's work is remarkably general, allowing for a misspecified model and a general estimation procedure (both least mean distance estimators and method of moments estimators are included), we do not consider such generalities here. The comments below are limited to ordinary and weighted least squares estimation with a correctly specified model (recall the discussion above).

We first follow [7] in focusing exclusively on asymptotic properties for i.i.d. (absolute) error models. The theoretical results of Gallant similarly focus on i.i.d. errors, though some mathematical tools are discussed [24, Chapter 2, p. 156–157] which help to address more general error models. In fact, these tools are used in a rigorous fashion in the next section to extend the results of [7].

It is assumed that the error random variables $\{E_j\}$ are defined on some probability space $(\Omega, \mathcal{F}, \mathrm{Prob})$ and take their values in Euclidean space $R$. By construction, it follows that the data $Y$ as well as the estimators $\theta_{WLS}^N$ and $\theta_{OLS}^N$ are random vectors defined on this probability space, and hence are functions of $\omega \in \Omega$, so that we may write $Y_j(\omega)$, $\theta_{OLS}^N(\omega)$, etc., as necessary.


For a given sampling set $\{t_j\}_{j=1}^N$, one can define the empirical distribution function

$$ \mu^N(t) = \frac{1}{N} \sum_{j=1}^{N} \Delta_{t_j}(t), \qquad (3.45) $$

where $\Delta_{t_j}$ is the Dirac measure with atom at $t_j$, that is,

$$ \Delta_{t_j}(t) = \begin{cases} 0, & t < t_j \\ 1, & \text{otherwise}. \end{cases} $$

Clearly, $\mu^N \in \mathcal{P}([t_0, t_f])$, the space of probability measures (or, equivalently, the cumulative distribution functions) on $[t_0, t_f]$. Following popular conventions we will not always distinguish between probability measures and their associated cumulative distribution functions.

Again, the results presented below are paraphrased from [7, ?], and comments have been included to indicate the alternative approach of [24]. No proofs are given here, though the interested reader can find a complete set of proofs in [7, ?] and [24]. First, we consider the following set of assumptions:

(A1) The random variables $\{E_j\}$ are independent and identically distributed random variables with distribution function $P$. Moreover, $E(E_j) = 0$ and $\mathrm{Var}(E_j) = \sigma_0^2$.

(A2) $\Theta$ is a compact, separable, finite-dimensional Euclidean space (i.e., $\Theta \subset R^{\kappa_\theta}$) with $\theta_0 \in \mathrm{int}(\Theta)$.

(A3) $[t_0, t_f]$ is a compact subset of $R$ (i.e., $t_0$, $t_f$ are both finite).

(A4) The function $f(\cdot\,; \cdot) \in C([t_0, t_f], C^2(\Theta))$.

(A5) There exists a finite measure $\mu$ on $[t_0, t_f]$ such that the sampling sets $\{t_j\}_{j=1}^N$ satisfy, as $N \to \infty$,

$$ \frac{1}{N} \sum_{j=1}^{N} h(t_j) = \int_{t_0}^{t_f} h(t)\, d\mu^N(t) \to \int_{t_0}^{t_f} h(t)\, d\mu(t) $$

for all continuous functions $h$, where $\mu^N$ is the finite distribution function as defined in (3.45). That is, $\mu^N$ converges to $\mu$ in the weak topology (where $\mathcal{P}([t_0, t_f])$ is viewed as a subset of $C^*([t_0, t_f])$, the topological dual of the space of continuous functions $C([t_0, t_f])$). (Note that this means that the data must be taken in a way such that in the limit it fills up the interval $[t_0, t_f]$.)

(A6) The functional

$$ J^0(\theta) = \sigma_0^2 + \int_{t_0}^{t_f} \left( f(t; \theta_0) - f(t; \theta) \right)^2 d\mu(t) $$

has a unique minimizer at $\theta_0$.


(A7) The matrix

$$ \mathcal{J} = \frac{\partial^2 J^0(\theta_0)}{\partial\theta^2} = 2 \int_{t_0}^{t_f} \nabla_\theta f(t; \theta_0)\, \nabla_\theta f(t; \theta_0)^T\, d\mu(t) $$

is positive definite.

Remark: The most notable difference between the assumptions above (which are those of [7]) and those of [24] is assumption (A5). In its place, Gallant states the following.

(A5') Define the probability measure

$$ \nu(S) = \int_{t_0}^{t_f} \int_{E} I_S(\epsilon, t)\, dP(\epsilon)\, d\mu(t) $$

for an indicator function $I_S$ and set $S \subset E \times [t_0, t_f]$, with $\mu$ defined as above. Then almost every realized pair $(\epsilon_j, t_j)$ is a Cesaro sum generator with respect to $\nu$ and a dominating function $b(\epsilon, t)$ satisfying $\int_{[t_0, t_f]} \int_{E} b\, d\nu < \infty$. That is,

$$ \lim_{N \to \infty} \frac{1}{N} \sum_{j=1}^{N} h(\epsilon_j, t_j) = \int_{t_0}^{t_f} \int_{E} h(\epsilon, t)\, d\nu(\epsilon, t) $$

almost always, for all continuous functions $h$ such that $|h(\epsilon, t)| < b(\epsilon, t)$. Moreover, it is assumed that for each $t \in [t_0, t_f]$, there exists a neighborhood $O_t$ such that

$$ \int_{E} \sup_{O_t} b(\epsilon, t)\, dP(\epsilon) < \infty. $$

    E supO t b(E , t)dP (E ) < .The assumption (A5 ) is stronger than the assumption (A5) as it supposes

    not only the existence of a dominating function, but also involves the behaviorof the probability distribution P , which is generally unknown in practice.The practical importance of the dominating function b arises in the proof of

    consistency for the least squares estimator (see Theorem 3.3.1 below).As has been noted elsewhere (see, e.g., [1], [34, Chapter 12]), the strongconsistency of the estimator is proved by arguing that J 0( ) is the almostsure limit of J N OLS ( ; Y ). Thus, if J N OLS ( ; Y ) is close to J 0( ) and J 0( ) isuniquely minimized by 0 , it makes intuitive sense that N OLS , which minimizesJ N OLS ( ; Y ), should be close to 0 . This task is made difficult by the fact thatthe null set (from the almost sure statement) may depend on the parameter


$\theta$. In [7], the almost sure convergence of $J_{OLS}^N(\theta; Y)$ to $J^0(\theta)$ is demonstrated constructively, that is, by building a set $A \in \mathcal{F}$ (which does not depend on $\theta$) with $\mathrm{Prob}\{A\} = 1$ such that $J_{OLS}^N(\theta; Y(\omega)) \to J^0(\theta)$ for each $\omega \in A$ and for each $\theta \in \Theta$. This construction relies upon the separability of the parameter space $\Theta$ (assumption (A2)) as well as the compactness of the space $[t_0, t_f]$ (assumption (A3)). The alternative approach of Gallant uses a consequence of the Glivenko–Cantelli theorem [24, p. 158] to demonstrate a uniform (with respect to $\theta$) strong law of large numbers. The proof relies upon the dominated convergence theorem [31, p. 246], and hence the dominating function $b$. As a result, Gallant does not need the space $\Theta$ to be separable or the interval $[t_0, t_f]$ to be compact. It should be noted, however, that in most practical applications of interest $\Theta$ and $[t_0, t_f]$ are compact subsets of Euclidean space so that relaxing these assumptions provides little advantage.

While the list of assumptions above is extensive, we remark that the set is not overly restrictive. Assumptions (A2) and (A3) are naturally satisfied for most problem formulations (although the requirement $\theta_0 \in \mathrm{int}(\Theta)$ may be occasionally problematic [7, Remark 4.4]). Assumption (A4) is easily checked. Though assumption (A1) may be difficult to verify, it is much less restrictive than, say, a complete likelihood specification. Moreover, residual plots (see [15, Chapter 3] and the discussions below) can aid in assessing the reliability of the assumption.

Assumption (A5) is more difficult to check in practice as one does not know the limiting distribution $\mu$. Of course, this is simply an assumption regarding the manner in which data is sampled (in the independent variable space $[t_0, t_f]$). Namely, it must be taken in a way that fills up the space in an appropriate sense. Similarly, assumptions (A6) and (A7) cannot be verified directly as one knows neither $\mu$ nor $\theta_0$. In many practical applications of interest, $\mu$ is Lebesgue measure, and one can assess the assumptions at $\hat\theta_{OLS}^N$ (which is hopefully close to $\theta_0$). Of course, if assumption (A7) holds, then assumption (A6) must hold at least for a small region around $\theta_0$, though possibly not on all of $\Theta$.

Assumption (A7) is not strictly necessary if one uses assumption (A5') in the place of (A5). Given assumptions (A2)–(A4), it follows that the function $b$ (and its relevant derivatives) is bounded (and hence dominated by a $\nu$-measurable function) provided the space $E$ in which the random variables $E_j$ take their values is bounded. On one hand, this has the desirable effect of weakening the assumptions placed on the Hessian matrix $\mathcal{J}$. Yet the assumption that $E$ is bounded precludes certain error models, in particular normally distributed errors.

    We now give several results which summarize the asymptotic properties of the ordinary least squares estimator.

Theorem 3.3.1 Given the probability space $(\Omega, \mathcal{F}, \text{Prob})$ as defined above describing the space of sampling points and assumptions (A1)–(A6), $\theta^N_{OLS} \to \theta_0$ with probability one or almost surely. That is,

$$\text{Prob}\left\{\omega \in \Omega : \lim_{N\to\infty} \theta^N_{OLS}(\omega) = \theta_0\right\} = 1.$$

This theorem states that the ordinary least squares estimator is consistent. We remark that the finite dimensionality of the parameter space $\Theta$ (see assumption (A2)) is not necessary in the proof of this theorem, and it is sufficient for the function $f$ to be continuous from $\Theta$ into $C([t_0, t_f])$ rather than to be twice continuously differentiable.
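To see the consistency statement at work numerically, the following minimal sketch (in Python, with a hypothetical exponential decay model and simulated constant-variance data, not an example from the text) fits the OLS estimator for increasing sample sizes; the estimates settle toward the true parameter as $N$ grows.

```python
# Minimal numerical illustration of Theorem 3.3.1 (consistency), assuming a
# hypothetical exponential decay model f(t; theta) = theta_1 * exp(-theta_2 * t)
# and i.i.d. constant-variance noise; not the authors' example.
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)
theta0 = np.array([2.0, 0.5])          # "true" parameter theta_0
f = lambda t, th: th[0] * np.exp(-th[1] * t)

for N in [25, 100, 400, 1600]:
    t = np.linspace(0.0, 10.0, N)      # sampling grid filling [t0, tf]
    y = f(t, theta0) + 0.1 * rng.standard_normal(N)   # realization of {Y_j}
    # OLS estimate: minimize sum_j [y_j - f(t_j; theta)]^2
    fit = least_squares(lambda th: y - f(t, th), x0=[1.0, 1.0])
    print(N, fit.x)                    # estimates approach theta_0 as N grows
```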

    Given Theorem 3.3.1, the following theorem may also be proven.

Theorem 3.3.2 Given assumptions (A1)–(A7), as $N \to \infty$,
$$\sqrt{N}\left(\theta^N_{OLS} - \theta_0\right) \xrightarrow{d} Z \sim \mathcal{N}\left(0,\, 2\sigma_0^2 \mathcal{J}^{-1}\right), \qquad (3.46)$$

    that is, the convergence is in the sense of convergence in distribution.

To reconcile these asymptotic results with the approximations used in practice given in (3.14)–(3.17) and (3.19), we argue as follows. Define

$$\mathcal{J}^N = 2\int_{t_0}^{t_f} \nabla_\theta f(t;\theta_0)\, \nabla_\theta f(t;\theta_0)^T \, d\mu_N(t), \qquad (3.47)$$

which from the definition in (3.45) is the same as

$$2\,\frac{1}{N}\sum_{j=1}^{N} \nabla_\theta f(t_j;\theta_0)\, \nabla_\theta f(t_j;\theta_0)^T = 2\,\frac{1}{N}\left(F^N(\theta_0)\right)^T F^N(\theta_0). \qquad (3.48)$$

Recalling (3.17) and using (A5), we have

$$\frac{1}{2}\mathcal{J}^N = \frac{1}{N}\left(F^N(\theta_0)\right)^T F^N(\theta_0) \longrightarrow \Gamma_0 = \frac{1}{2}\mathcal{J}. \qquad (3.49)$$

Thus the convergence statement in (3.46) is

$$\sqrt{N}\left(\theta^N_{OLS} - \theta_0\right) \xrightarrow{d} Z \sim \mathcal{N}\left(0,\, \sigma_0^2 \Gamma_0^{-1}\right). \qquad (3.50)$$

Given (3.19), in practice we make an approximation in (3.50) given by

$$\mathcal{N}\left(0,\, \sigma_0^2 \Gamma_0^{-1}\right) \approx \mathcal{N}\left(0,\, \sigma_0^2 N \left[\left(F^N(\theta_0)\right)^T F^N(\theta_0)\right]^{-1}\right),$$

which, in light of (3.50), leads to the approximation

$$\theta^N_{OLS} - \theta_0 \xrightarrow{d} \frac{Z}{\sqrt{N}} \sim \mathcal{N}\left(0,\, \sigma_0^2 \left[\left(F^N(\theta_0)\right)^T F^N(\theta_0)\right]^{-1}\right), \qquad (3.51)$$

which is indeed (3.14).


3.3.1 Extension to Weighted Least Squares

The results presented above are quite useful, but they only apply to the class of problems in which the measurement error random variables are independent and identically distributed with constant variance $\sigma_0^2$. While the assumption of independence is common, there are many practical cases of interest in which these random variables are not identically distributed. In many cases one encounters a weighted least squares problem in which $E(\mathcal{E}_j) = 0$ for all $j$ but $\text{Var}(Y_j) = \sigma_0^2 w_j^2$. As discussed previously, in such cases, the results above (in particular, assumption (A1)) fail to apply directly. Here we return to the weighted least squares model (3.31) for the scalar observation case. Thus we recall our problem of interest involves the weighted least squares cost

$$J^N_{WLS}(\theta) = \sum_{j=1}^{N} \left[\frac{Y_j - f(t_j;\theta)}{w(t_j)}\right]^2, \qquad (3.52)$$

where $w_j = w(t_j)$. The weighted least squares estimator is defined as the random vector $\theta^N_{WLS}$ which minimizes the weighted least squares cost function for a given set of random variables $\{Y_j\}$. Hence

$$\theta^N_{WLS} = \arg\min_{\theta \in \Theta} J^N_{WLS}(\theta). \qquad (3.53)$$

In order to extend the results presented above (Theorems 3.3.1 and 3.3.2) to independent, heteroscedastic error models (and hence, to general weighted least squares problems), we turn to a technique suggested by Gallant [24, p. 124] in which one defines a change of variables in an attempt to normalize the heteroscedasticity of the random variables. As has been noted previously, Gallant used this technique under a different set of assumptions in order to obtain results similar to those presented above. This change of variables technique will allow us to extend the results above, originally from [7], in a rigorous fashion, as given in [?].
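As a concrete illustration of the change of variables, the sketch below (a minimal Python example with a hypothetical model and weight function, not the text's) rescales the observations and the model by $w(t_j)$ so that the rescaled problem is an ordinary least squares problem with homoscedastic errors.

```python
# Minimal sketch of the change-of-variables idea, under an assumed
# (hypothetical) model f and weight function w; rescaling by w(t_j) turns the
# weighted least squares problem (3.52) into an ordinary one.
import numpy as np
from scipy.optimize import least_squares

f = lambda t, th: th[0] * np.exp(-th[1] * t)       # placeholder model
w = lambda t: 1.0 + t                              # placeholder weights, w(t) > 0

rng = np.random.default_rng(1)
theta0, N = np.array([2.0, 0.5]), 200
t = np.linspace(0.0, 10.0, N)
y = f(t, theta0) + 0.05 * w(t) * rng.standard_normal(N)   # Var(Y_j) = sigma0^2 w_j^2

# Rescaled observations and model: the transformed errors have constant
# variance, so the ordinary least squares machinery applies to this problem.
resid = lambda th: (y - f(t, th)) / w(t)
theta_wls = least_squares(resid, x0=[1.0, 1.0]).x
print(theta_wls)
```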

    Consider the following assumptions.

(A1$'$a) The error random variables $\{\mathcal{E}_j\}$ are independent, have central moments which satisfy $E(\mathcal{E}_j) = 0$ and $\text{Var}(\mathcal{E}_j) = \sigma_0^2$ and yield observations $Y_j$ satisfying
$$\text{Cov}(\vec{Y}) = \sigma_0^2 W = \sigma_0^2\,\text{diag}(w_1^2, \ldots, w_N^2),$$
where $\vec{Y} = (Y_1, Y_2, \ldots, Y_N)^T$.

(A1$'$b) The function $w$ satisfies $w \in C([t_0, t_f], \mathbb{R}^+)$ and $w(t) \neq 0$ for $t \in [t_0, t_f]$.

(A7$'$) The matrix
$$\tilde{\mathcal{J}} = 2\int_{t_0}^{t_f} \frac{1}{w^2(t)}\,\nabla_\theta f(t;\theta_0)\,\nabla_\theta f(t;\theta_0)^T \, d\mu(t)$$
is positive definite.


(by assumption (A1$'$a)) with constant variance, and thus assumption (A1) is satisfied. Assumptions (A2), (A3) and (A5) are unchanged. For assumption (A4), we must show that $h \in C([t_0, t_f], C^2(\Theta))$ where $h$ is given by $h(t;\theta) = f(t;\theta)/w(t)$. This follows from assumption (A1$'$b). For the analogue of assumption (A6), we must show

$$\tilde{J}_0(\theta) = \sigma_0^2 + \int_{t_0}^{t_f} \left(h(t;\theta_0) - h(t;\theta)\right)^2 d\mu(t) = \sigma_0^2 + \int_{t_0}^{t_f} \left(\frac{f(t;\theta_0) - f(t;\theta)}{w(t)}\right)^2 d\mu(t)$$

has a unique minimizer at $\theta = \theta_0$. Clearly, $\tilde{J}_0(\theta_0) = \sigma_0^2$. Since the function $J_0$ (see assumption (A6)) has a unique minimizer at $\theta = \theta_0$ and $w(t) > 0$, it follows immediately that $\tilde{J}_0(\theta) > \sigma_0^2$ if $\theta \neq \theta_0$ so that $\tilde{J}_0(\theta)$ has a unique minimizer at $\theta = \theta_0$. Assumption (A7) is satisfied for the formulation (3.55) directly by assumption (A7$'$).

In fact, the proof of Theorem 3.3.3 applies to any set of observations in which a change of variables can be used to produce a set of error random variables which are independent and identically distributed. The weighted least squares problem addressed in the above discussion arises from an observation process in which the measurement errors are assumed to be independent but are not necessarily identically distributed. By rescaling the observations in accordance with their variances (which are assumed to be known) one obtains error random variables which are identically distributed as well as independent. Even more generally, it is not strictly necessary that the observations be independent. For instance, one might have observations generated by an autoregressive process of order $r$ [34]. Then, by definition, some linear combination of $r$ observational errors will give rise to errors which are independent and identically distributed. This linear combination is exactly the change of variables necessary to obtain a model which is suitable for ordinary least squares. Thus, even in the most general situation, when one has a general covariance matrix $R$, one may still use the Cholesky decomposition in the manner discussed above, provided one has sufficient assumptions regarding the underlying error process. See [24, Chapter 2] for details.
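A minimal sketch of the Cholesky whitening idea just described (Python, with a hypothetical covariance matrix $R$; illustrative values only) is:

```python
# Sketch of whitening correlated errors with a Cholesky factor, assuming the
# covariance matrix R of the observational errors is known (hypothetical values).
import numpy as np

rng = np.random.default_rng(2)
N = 5
# Hypothetical AR(1)-like covariance R_ij = 0.6^|i-j|
R = 0.6 ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
L = np.linalg.cholesky(R)              # R = L L^T

e = L @ rng.standard_normal(N)         # correlated errors with Cov(e) = R
e_white = np.linalg.solve(L, e)        # L^{-1} e has identity covariance
# Applying L^{-1} to the observations and to the model in the same way yields
# a problem suitable for ordinary least squares.
```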

Analogous with (A7) above, the assumption (A7$'$) is the most problematic to verify in practice. In the proof above for the weighted least squares problem, the assumption (A1) has been replaced with assumptions (A1$'$a)–(A1$'$b), a change which merely accounts for the heteroscedastic statistical model. Then the assumptions (A2)–(A6) for the rescaled model (3.55) can be verified directly from the original assumptions (A2)–(A6) for the ordinary least squares formulation, as shown above. The only exception is the assumption (A7), which cannot be established directly from the ordinary least squares assumptions; hence the need for assumption (A7$'$). On one hand, the existence of a unique minimizer (assumption (A6)) is sufficient to prove


that the matrix $\mathcal{J}$ or $\tilde{\mathcal{J}}$ must be positive semi-definite, so that the assumption (A7) or (A7$'$) may not be overly restrictive. Alternatively, as has been noted before, one can relax the assumptions (A7) or (A7$'$) by assuming the existence of a dominating function $b$. Moreover, provided the weights $w(t)$ satisfy the requirement $w(t) \geq \underline{w} > 0$ for all $t \in [t_0, t_f]$, then one can use the dominating function $b$ (from the ordinary least squares problem) to obtain a new dominating function $\tilde{b}(\mathcal{E}, t) = \underline{w}^{-1} b(\mathcal{E}, t)$ which is also $\mu$-integrable. Even in this case, though, one must still make the additional assumption that $\tilde{\mathcal{J}}$ is invertible.

We note that asymptotic results similar to those of Theorems 3.3.1 and 3.3.2 above can be given for the GLS estimators defined in the estimating equation (3.37) and their analogues for the general vector observation case; see pp. 88–89 of [34]. All of the results above also readily extend to systems governed by partial differential equations; the essential elements of the theory are the form of the discrete observation operator for the dynamical system and the statistical model as given in (3.6).

3.4 Computation of $\Sigma^N$, Standard Errors and Confidence Intervals

We return to the case of $N$ scalar longitudinal observations and consider the OLS case of Section 3.2 (the extension of these ideas to vectors is completely straightforward). Recall that in the ordinary least squares approach, we seek to use a realization $\{y_j\}$ of the observation process $\{Y_j\}$ along with the model to determine a vector $\hat{\theta}^N_{OLS}$ where

$$\hat{\theta}^N_{OLS} = \arg\min_{\theta \in \Theta} J^N_{OLS}(\theta; y) = \arg\min_{\theta \in \Theta} \sum_{j=1}^{N} \left[y_j - f(t_j;\theta)\right]^2. \qquad (3.56)$$

Since $Y_j$ is a random variable, the corresponding estimator $\theta^N_{OLS}$ (here we wish to emphasize the dependence on the sample size $N$) is also a random vector with a distribution called the sampling distribution. Knowledge of this sampling distribution provides uncertainty information (e.g., standard errors) for the numerical values of $\hat{\theta}^N$ obtained using a specific data set $\{y_j\}$. In particular, loosely speaking, the sampling distribution characterizes the distribution of possible values the estimator could take on across all possible realizations with data of size $N$ that could be collected. The standard errors thus approximate the extent of variability in possible parameter values across all possible realizations, and hence provide a measure of the extent of uncertainty involved in estimating $\theta$ using a specific estimator and sample size $N$ in actual data collection.


The uncertainty quantifications (as embodied in standard errors) we discuss here are given in terms of standard nonlinear regression approximation theory ([7, ?, 21, 24, 27], and Chapter 12 of [34]) for asymptotic (as $N \to \infty$) distributions. As we have already mentioned, for $N$ large and a corresponding data set $\vec{Y} = (Y_1, \ldots, Y_N)^T$, the sampling distribution satisfies the approximation (see (3.14))

$$\theta^N_{OLS}(\vec{Y}) \sim \mathcal{N}(\theta_0, \Sigma_0^N) \approx \mathcal{N}\left(\theta_0,\, \sigma_0^2\left[F^N(\theta_0)^T F^N(\theta_0)\right]^{-1}\right). \qquad (3.57)$$

We thus see that the quantity $F^N$ is the fundamental entity in computational aspects of this theory. There are typically several ways to compute the matrix $F^N$ (which actually is composed of the well-known sensitivity functions widely used in applied mathematics and engineering; e.g., see the discussions in [3, 4, 6] and the references therein). First, the elements of the matrix $F^N = (F^N_{jk})$ can always be estimated using the forward difference

$$F^N_{jk}(\theta) = \frac{\partial f(t_j;\theta)}{\partial \theta_k} \approx \frac{f(t_j;\theta + h_k) - f(t_j;\theta)}{|h_k|},$$

where $h_k$ is a vector of the same dimension as $\theta$ with a non-zero entry in only the $k$th component which is chosen small, and $|\cdot|$ is the Euclidean norm on the parameter space. But the choice of $h_k$ can be problematic in practice, i.e., what does "small" mean, especially when the parameters may vary by orders of magnitude? Of course, in some cases the function $f(t_j;\theta)$ may be sufficiently simple so as to allow one to derive analytical expressions for the components of $F^N$.
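A minimal sketch of the forward-difference approach (in Python, with a hypothetical model and relative step sizes chosen only for illustration; not the text's prescription) is:

```python
# Forward-difference approximation of the sensitivity matrix F^N, a sketch with
# a hypothetical model f; relative step sizes are one common (not the only) way
# to deal with parameters that differ by orders of magnitude.
import numpy as np

def forward_diff_F(f, t, theta, rel_step=1e-6):
    """Return the N x p matrix with entries d f(t_j; theta) / d theta_k."""
    theta = np.asarray(theta, dtype=float)
    base = f(t, theta)
    F = np.empty((t.size, theta.size))
    for k in range(theta.size):
        h = np.zeros_like(theta)
        h[k] = rel_step * max(abs(theta[k]), 1.0)   # step scaled to the parameter
        F[:, k] = (f(t, theta + h) - base) / h[k]
    return F

# Example with a placeholder model
f = lambda t, th: th[0] * np.exp(-th[1] * t)
t = np.linspace(0.0, 10.0, 50)
F = forward_diff_F(f, t, [2.0, 0.5])
print(F.shape)   # (50, 2)
```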

Alternatively, if the $f(t_j;\theta)$ correspond to longitudinal observations $f(t_j;\theta) = \mathcal{C}x(t_j;\theta)$ of solutions to a parameterized $n$-vector differential equation system $\dot{x} = g(t, x(t), q)$ as in (3.1)–(3.2), then one can use the $n \times \kappa_\theta$ matrix sensitivity equations (see [4, 6] and the references therein)

$$\frac{d}{dt}\left(\frac{\partial x}{\partial \theta}\right) = \frac{\partial g}{\partial x}\frac{\partial x}{\partial \theta} + \frac{\partial g}{\partial \theta} \qquad (3.58)$$

to obtain

$$\frac{\partial f(t_j;\theta)}{\partial \theta_k} = \mathcal{C}\,\frac{\partial x(t_j;\theta)}{\partial \theta_k}.$$

To be a little more specific, we may examine the variations in the output of a model $f$ resulting from variations in the parameters $q$ and the initial conditions $x_0$. In this section, for notational convenience, we temporarily assume $\tilde{x}_0 = x_0$, i.e., we assume we estimate all of the initial conditions in our discussions of the sensitivity equations. In order to quantify the variation in the state variable $x(t)$ with respect to changes in the parameters $q$ and the initial conditions $x_0$, we are naturally led to consider the individual (traditional) sensitivity functions (TSF) defined by the derivatives

$$s_{q_k}(t) = \frac{\partial x}{\partial q_k}(t) = \frac{\partial x}{\partial q_k}(t, \theta), \qquad k = 1, \ldots, \kappa_q, \qquad (3.59)$$


and

$$r_{x_{0l}}(t) = \frac{\partial x}{\partial x_{0l}}(t) = \frac{\partial x}{\partial x_{0l}}(t, \theta), \qquad l = 1, \ldots, n, \qquad (3.60)$$

where $x_{0l}$ is the $l$th component of the initial condition $x_0$. If the function $g$ is sufficiently regular, the solution $x$ is differentiable with respect to $q_k$ and $x_{0l}$, and therefore the sensitivity functions $s_{q_k}$ and $r_{x_{0l}}$ are well defined.

Often in practice, the model under investigation is simple enough to allow us to combine the sensitivity functions (3.59) and (3.60), as is the case with the logistic growth population example discussed below. However, when one deals with a more complex model, it is often preferable to consider these sensitivity functions separately for clarity purposes.

Because they are defined by partial derivatives which have a local character, the sensitivity functions are also local in nature. Thus sensitivity and insensitivity ($s_{q_k} = \partial x/\partial q_k$ not close to zero and very close to zero, respectively) depend on the time interval, the state values $x$ and the values of $\theta$ for which they are considered. Thus, for example, in a certain time subinterval we might find $s_{q_k}$ small so that the state variable $x$ is insensitive to the parameter $q_k$ on that particular interval. The same function $s_{q_k}$ can take large values on a different subinterval, indicating to us that the state variable $x$ is very sensitive to the parameter $q_k$ on the latter interval. From the sensitivity analysis theory for dynamical systems, one finds that $s = (s_{q_1}, \ldots, s_{q_{\kappa_q}})$ is an $n \times \kappa_q$ vector function that satisfies the ODE system

$$\dot{s}(t) = \frac{\partial g}{\partial x}(t, x(t;\theta), q)\, s(t) + \frac{\partial g}{\partial q}(t, x(t;\theta), q), \qquad (3.61)$$
$$s(t_0) = 0_{n \times \kappa_q},$$

which is obtained by differentiating (3.1)–(3.2) with respect to $q$. Here the dependence of $s$ on $(t, x(t;\theta))$ as well as $q$ is readily apparent.

In a similar manner, the sensitivity functions with respect to the components of the initial condition $x_0$ define an $n \times n$ vector function $r = (r_{x_{01}}, \ldots, r_{x_{0n}})$, which satisfies

$$\dot{r}(t) = \frac{\partial g}{\partial x}(t, x(t;\theta), q)\, r(t), \qquad (3.62)$$
$$r(t_0) = I_n.$$

This is obtained by differentiating (3.1)–(3.2) with respect to the initial conditions $x_0$. Equations (3.61) and (3.62) are used in conjunction with (i.e., usually solved simultaneously with) Equations (3.1)–(3.2) to numerically compute the sensitivities $s$ and $r$ for general cases when the function $g$ is sufficiently complicated to prohibit a closed form solution by direct integration.

These can be succinctly written as a system for $\dfrac{\partial x}{\partial \theta} = \left(\dfrac{\partial x}{\partial q}, \dfrac{\partial x}{\partial x_0}\right)$ given by (3.58).
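A minimal sketch of solving the state and sensitivity equations together (in Python, for a logistic growth model as mentioned above, but with hypothetical parameter values chosen only for illustration) is:

```python
# Sketch: solving the logistic growth model together with its sensitivity
# equations (3.61)-(3.62); parameter values are hypothetical, for illustration.
import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, z, r, K):
    # z = [x, s_r, s_K, r_x0]: state plus sensitivities w.r.t. r, K and x0
    x, s_r, s_K, r_x0 = z
    gx = r * (1.0 - 2.0 * x / K)            # dg/dx
    dx = r * x * (1.0 - x / K)              # logistic right-hand side g
    ds_r = gx * s_r + x * (1.0 - x / K)     # dg/dr
    ds_K = gx * s_K + r * x**2 / K**2       # dg/dK
    dr_x0 = gx * r_x0                       # sensitivity w.r.t. initial condition
    return [dx, ds_r, ds_K, dr_x0]

r, K, x0 = 0.7, 17.5, 0.1                   # hypothetical parameter values
sol = solve_ivp(rhs, (0.0, 25.0), [x0, 0.0, 0.0, 1.0], args=(r, K),
                t_eval=np.linspace(0.0, 25.0, 100))
# Columns of the sensitivity matrix F^N for f(t; theta) = x(t; theta):
F = sol.y[1:].T                             # shape (100, 3): [dx/dr, dx/dK, dx/dx0]
print(F.shape)
```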


Given a small value $\alpha$ (e.g., $\alpha = 0.05$ for 95\% confidence intervals), the critical value $t_{1-\alpha/2}$ is computed from the Student's $t$ distribution $t_{N-\kappa_\theta}$ with $N - \kappa_\theta$ degrees of freedom. The value of $t_{1-\alpha/2}$ is determined by $\text{Prob}\{T \geq t_{1-\alpha/2}\} = \alpha/2$ where $T \sim t_{N-\kappa_\theta}$. In general, a confidence interval is constructed so that, if the confidence interval could be constructed for each possible realization of data of size $N$ that could have been collected, $100(1-\alpha)\%$ of the intervals so constructed would contain the true value $\theta_{0k}$. Thus, a confidence interval provides further information on the extent of uncertainty involved in estimating $\theta_0$ using the given estimator and sample size $N$.
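Putting the pieces together, a minimal sketch of the standard error and confidence interval computation (in Python, with a hypothetical exponential model and simulated constant-variance data; the formulas follow the asymptotic approximations above) is:

```python
# Sketch of standard errors and confidence intervals from the asymptotic theory,
# using a hypothetical exponential model and simulated constant-variance data.
import numpy as np
from scipy.optimize import least_squares
from scipy.stats import t as student_t

f = lambda t, th: th[0] * np.exp(-th[1] * t)
rng = np.random.default_rng(3)
theta0, N = np.array([2.0, 0.5]), 100
tj = np.linspace(0.0, 10.0, N)
y = f(tj, theta0) + 0.1 * rng.standard_normal(N)

fit = least_squares(lambda th: y - f(tj, th), x0=[1.0, 1.0])
theta_hat = fit.x
p = theta_hat.size

F = fit.jac                                   # N x p Jacobian at theta_hat
sigma2 = np.sum(fit.fun**2) / (N - p)         # bias-corrected variance estimate
cov = sigma2 * np.linalg.inv(F.T @ F)         # approximate estimator covariance
se = np.sqrt(np.diag(cov))                    # standard errors

alpha = 0.05
tcrit = student_t.ppf(1.0 - alpha / 2.0, df=N - p)
ci = np.column_stack([theta_hat - tcrit * se, theta_hat + tcrit * se])
print(theta_hat, se, ci, sep="\n")
```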

Remark 3.4.1 We turn to a further comment on the use of the Student's $t$ distribution in computing confidence intervals. We have already argued $\theta^N_{OLS} \sim \mathcal{N}(\theta_0, \Sigma_0^N)$. We can further establish

$$\frac{(N - \kappa_\theta)\, S^2}{\sigma_0^2} \sim \chi^2_{N - \kappa_\theta},$$

where

$$S^2 = \frac{J^N_{OLS}(\theta^N_{OLS}; \vec{Y})}{N - \kappa_\theta}$$

and $\chi^2_{N-\kappa_\theta}$ is a chi-square distribution with $N - \kappa_\theta$ degrees of freedom. Moreover, $\theta^N_{OLS}$ and $S^2$ are independent.

Some comments about these three statements are in order. The first distributional statement has already been established. The third statement regarding independence is exact for linear regression problems. A proof of the statement can be found in [33]. In the non-linear case, the statement is true to within the order of the linear approximation. The second distributional statement requires some explanation. We can argue

$$\frac{(N - \kappa_\theta)\, S^2}{\sigma_0^2} = \frac{1}{\sigma_0^2}\sum_{j=1}^{N} \left(Y_j - f(t_j; \theta^N_{OLS})\right)^2 \approx \frac{1}{\sigma_0^2}\sum_{j=1}^{N} \mathcal{E}_j^2 = \sum_{j=1}^{N} \left(\frac{\mathcal{E}_j}{\sigma_0}\right)^2. \qquad (3.67)$$

If the errors $\mathcal{E}_j$ are assumed to be independent and normally distributed with zero mean and constant variance $\sigma_0^2$, then the final expression above is (by definition) distributed as $\chi^2_N$. The missing $\kappa_\theta$ degrees of freedom follow because the estimator $\theta^N_{OLS}$ must satisfy the normal equations (3.11). If the errors are not assumed to be normally distributed, one has to modify the


arguments. However, we are only interested in behavior as $N \to \infty$. If we assume only that the $\mathcal{E}_j$ are independent and identically distributed with constant variance (as is commonly the case), the Central Limit Theorem (i.e., Theorem 2.6.2) applies and the distribution (whether chi-squared or not) will tend toward the normal. Even in the more general case of non-constant variance, a simple change of variables can be used to reduce to the constant variance case. Hence we really only need the variance terms to be bounded above.

Next, consider only a single element, $\theta^N_l$, of the estimator. Then we have as $N \to \infty$

$$\frac{\theta^N_l - \theta_{0,l}}{\sqrt{\Sigma^N_{0,ll}}} \sim \mathcal{N}(0, 1).$$

Now define

$$T = \frac{\theta^N_l - \theta_{0,l}}{\sqrt{\Sigma^N_{0,ll}}} \Bigg/ \left(\frac{(N-\kappa_\theta)\, S^2}{(N-\kappa_\theta)\, \sigma_0^2}\right)^{1/2}.$$

The first factor above is distributed as a standard normal variable. The second factor above is a chi-squared random variable rescaled by its degrees of freedom, and then raised to the $1/2$ power. Thus $T$ is distributed as $t_{N-\kappa_\theta}$. Moreover, by simple algebra (and using (3.16)),

$$T = \frac{\theta^N_l - \theta_{0,l}}{S\,\sqrt{\left[(N\,\Gamma_0)^{-1}\right]_{ll}}}.$$

One then makes standard arguments (i.e., approximation of $\Gamma_0$; see (3.17)) to arrive at the usual confidence interval calculations.

When one is taking longitudinal samples corresponding to solutions of a dynamical system, the $N \times \kappa_\theta$ sensitivity matrix depends explicitly on where in time the scalar observations are taken when $f(t_j;\theta) = \mathcal{C}x(t_j;\theta)$, as mentioned above. That is, the sensitivity matrix (3.15) depends on the number $N$ and the nature (for example, how they are taken) of the sampling times $\{t_j\}$. Moreover, it is the matrix $[F^T F]^{-1}$ in (3.63) and the parameter $\hat{\sigma}^2$ in (3.64) that ultimately determine the standard errors and confidence intervals. At first investigation of (3.64), it appears that an increased number $N$ of samples might drive $\hat{\sigma}^2 \to \sigma_0^2$ and hence drive the standard error (SE) to zero as long as this is done in a way to maintain a bound on the residual sum of squares in (3.64). However, we observe that the condition number of the Fisher information matrix $F^T F$ is also very important in these considerations and increasing the sampling could potentially adversely affect the numerical inversion of $F^T F$. In this regard, we note that among the important hypotheses in the asymptotic statistical theory (see p. 571 of [34]) is the existence of a matrix function $\Gamma(\theta)$ such that

$$\frac{1}{N}\, F^N(\theta)^T F^N(\theta) \longrightarrow \Gamma(\theta) \quad \text{uniformly in } \theta \in \Theta \text{ as } N \to \infty,$$


with $\Gamma_0 = \Gamma(\theta_0)$ being a non-singular matrix. It is this condition that is rather easily violated in practice when one is dealing with data from differential equation systems, especially near an equilibrium or steady state (see the examples of [6] where this rather common phenomenon is illustrated).
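One simple diagnostic is to monitor the condition number of $F^T F$ as sampling times are added; the sketch below (Python, hypothetical logistic model with illustrative parameter values only) shows how samples taken after the solution has essentially reached steady state can degrade the conditioning.

```python
# Diagnostic sketch: condition number of F^T F for a hypothetical logistic model,
# comparing sampling during the transient with sampling near steady state
# (illustrative values only).
import numpy as np

def logistic(t, th):
    x0, r, K = th
    return K * x0 * np.exp(r * t) / (K + x0 * (np.exp(r * t) - 1.0))

def sens_matrix(f, t, theta, rel_step=1e-6):
    theta = np.asarray(theta, dtype=float)
    base = f(t, theta)
    F = np.empty((t.size, theta.size))
    for k in range(theta.size):
        h = np.zeros_like(theta)
        h[k] = rel_step * max(abs(theta[k]), 1.0)
        F[:, k] = (f(t, theta + h) - base) / h[k]
    return F

theta = [0.1, 0.7, 17.5]                       # hypothetical x0, r, K
for label, t in [("transient", np.linspace(0, 15, 50)),
                 ("near steady state", np.linspace(25, 40, 50))]:
    F = sens_matrix(logistic, t, theta)
    print(label, np.linalg.cond(F.T @ F))      # much larger near steady state
```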

Since the computations for standard errors and confidence intervals (and also model comparison tests to be discussed in a subsequent chapter) depend on an asymptotic limit distribution theory, one should interpret the findings as sometimes crude indicators of uncertainty inherent in the inverse problem findings. Nonetheless, it is useful to consider the formal mathematical requirements underpinning these techniques. We offer the following summary of possibilities:

(1) Among the more readily checked hypotheses are those of the statistical model requiring that the errors $\mathcal{E}_j$, $j = 1, 2, \ldots, N$, are independent and identically distributed (i.i.d.) random variables with mean $E(\mathcal{E}_j) = 0$ and constant variance $\text{Var}(\mathcal{E}_j) = \sigma_0^2$. After carrying out the estimation procedures, one can readily plot the residuals $r_j = y_j - f(t_j; \hat{\theta}^N_{OLS})$ vs. time $t_j$ and the residuals vs. the resulting estimated model/observation $f(t_j; \hat{\theta}^N_{OLS})$ values. A random pattern for the first is strong support for the validity of the independence assumption; a random pattern for the latter suggests the assumption of constant variance may be reasonable. This will be further explained in the next section.

(2) The underlying assumption that the sampling size $N$ must be large (recall the theory is asymptotic in that it holds as $N \to \infty$) is not so readily verified and is often ignored (albeit at the user's peril in regard to the quality of the uncertainty findings). Often asymptotic results provide remarkably good approximations to the true sampling distributions for finite $N$. However, in practice there is no way to ascertain whether all assumptions for the theory hold and $N$ is sufficiently large for a specific example.

All of the above theory readily generalizes to vector systems with partial, non-scalar observations. For example, suppose now we have the vector system (3.1) with partial vector observations given by Equation (3.5). That is, suppose we have $m$ coordinate observations where $m \leq n$. In this case, we have

$$\frac{dx}{dt}(t) = g(t, x(t), q) \qquad (3.68)$$

and

$$\vec{Y}_j = f(t_j; \theta_0) + \vec{\mathcal{E}}_j = \mathcal{C}x(t_j; \theta_0) + \vec{\mathcal{E}}_j, \qquad (3.69)$$

where $\mathcal{C}$ is an $m \times n$ matrix and $f(t_j; \theta_0) \in \mathbb{R}^m$, $x \in \mathbb{R}^n$. As already explained in Section 3.2, if we assume that different observation coordinates $\vec{Y}_j$ may have


different variances $\sigma^2_{0,i}$ associated with different coordinates of the errors $\vec{\mathcal{E}}_j$, then we have that $\vec{\mathcal{E}}_j$ is an $m$-dimensional random vector with

$$E(\vec{\mathcal{E}}_j) = 0_m, \qquad \text{Var}(\vec{\mathcal{E}}_j) = V_0,$$

where $V_0 = \text{diag}(\sigma^2_{0,1}, \ldots, \sigma^2_{0,m})$, and we may follow a similar asymptotic theory to calculate approximate covariances, standard errors and confidence intervals for parameter estimates.

3.5 Investigation of Statistical Assumptions

The form of error in the data (which of course is rarely known) dictates which method from those discussed above one should choose. The OLS method is most appropriate for constant variance observations of the form $Y_j = f(t_j;\theta_0) + \mathcal{E}_j$ whereas the GLS should be used for problems in which we have non-constant variance observations $Y_j = f(t_j;\theta_0) + f(t_j;\theta_0)\mathcal{E}_j$. We emphasize that to obtain the correct standard errors in an inverse problem calculation, the OLS method (and corresponding asymptotic formulas) must be used with constant variance generated data, while the GLS method (and corresponding asymptotic formulas) should be applied to non-constant variance generated data. Not doing so can lead to incorrect conclusions. In either case, the standard error calculations are not valid unless the correct formulas (which depend on the error structure) are employed. Unfortunately, it is very difficult to ascertain the structure of the error, and hence the correct method to use, without a priori information. Although the error structure cannot definitively be determined, the two residual tests can be performed after the estimation procedure has been completed to assist in concluding whether or not the correct asymptotic statistics were used.

    3.5.1 Residual Plots

One can carry out simulation studies with a proposed mathematical model to assist in understanding the behavior of the model in inverse problems with different types of data with respect to misspecification of the statistical model. For example, we consider a statistical model with constant variance (CV) noise ($\gamma = 0$)

$$Y_j = f(t_j;\theta_0) + \mathcal{E}_j, \qquad \text{Var}(Y_j) = \sigma_0^2,$$

and another with non-constant variance (NCV) noise ($\gamma = 1$)

$$Y_j = f(t_j;\theta_0)\left(1 + \mathcal{E}_j\right), \qquad \text{Var}(Y_j) = \sigma_0^2 f^2(t_j;\theta_0).$$


We obtain a data set by considering a realization $\{y_j\}_{j=1}^N$ of the random variables $\{Y_j\}_{j=1}^N$ through a realization of $\{\mathcal{E}_j\}_{j=1}^N$, and then calculate an estimate of $\theta_0$ using the OLS or GLS procedure. Other values for $\gamma$ could also readily be analyzed and in fact will be compared in an example below (in Section 3.5.3) on cell proliferation models. Here we focus on two of the more prominent statistical models for absolute error vs. relative error.

We will then use the residuals $r_j = y_j - f(t_j;\hat{\theta})$ to test whether the data set is i.i.d. and possesses the assumed variance structure. If a data set has constant variance, then

$$Y_j = f(t_j;\theta_0) + \mathcal{E}_j \quad \text{or} \quad \mathcal{E}_j = Y_j - f(t_j;\theta_0),$$

and hence the residuals $r_j$ are approximations to realizations of the errors $\mathcal{E}_j$ (when it is tacitly assumed that $\hat{\theta} \approx \theta_0$). As we have discussed above and want to summarize again, since it is assumed that the errors $\mathcal{E}_j$ are i.i.d., a plot of the residuals $r_j = y_j - f(t_j;\hat{\theta})$ vs. $t_j$ should be random (and neither increasing nor decreasing with time). Also, the error in the constant variance case does not depend on $f(t_j;\theta_0)$, and so a plot of the residuals $r_j = y_j - f(t_j;\hat{\theta})$ vs. $f(t_j;\hat{\theta})$ should also be random (and neither increasing nor decreasing). Therefore, if the error has constant variance, then a plot of the residuals $r_j = y_j - f(t_j;\hat{\theta})$ against $t_j$ and against $f(t_j;\hat{\theta})$ should both be random. If not, then the constant variance assumption is suspect.

We next turn to questions of what to expect if this residual test is applied to a data set that has non-constant variance (NCV) generated error. That is, we wish to investigate what happens if the data are incorrectly assumed to have CV error when in fact they have NCV error. Since in the NCV example $R_j = Y_j - f(t_j;\theta_0) = f(t_j;\theta_0)\mathcal{E}_j$ depends upon the deterministic model $f(t_j;\theta_0)$, we should expect that a plot of the residuals $r_j = y_j - f(t_j;\hat{\theta})$ vs. $t_j$ should exhibit some type of pattern. Also, the residuals actually depend on $f(t_j;\hat{\theta})$ in the NCV case, and so as $f(t_j;\hat{\theta})$ increases the variation of the residuals $r_j = y_j - f(t_j;\hat{\theta})$ should increase as well. Thus $r_j = y_j - f(t_j;\hat{\theta})$ vs. $f(t_j;\hat{\theta})$ should have a fan shape in the NCV case.

If a data set has non-constant variance generated data, then

$$Y_j = f(t_j;\theta_0) + f(t_j;\theta_0)\mathcal{E}_j \quad \text{or} \quad \mathcal{E}_j = \frac{Y_j - f(t_j;\theta_0)}{f(t_j;\theta_0)}.$$

If the distributions of $\mathcal{E}_j$ are i.i.d., then a plot of the modified residuals $r^m_j = (y_j - f(t_j;\hat{\theta}))/f(t_j;\hat{\theta})$ vs. $t_j$ should be random for non-constant variance generated data. A plot of $r^m_j = (y_j - f(t_j;\hat{\theta}))/f(t_j;\hat{\theta})$ vs. $f(t_j;\hat{\theta})$ should also be random.
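A minimal simulation sketch of these residual tests (Python with matplotlib, using a hypothetical model and noise level; not the text's cell proliferation example) is:

```python
# Sketch of the residual tests: generate CV and NCV data from a hypothetical
# model, fit by OLS, then plot residuals and modified residuals vs. time and
# vs. the fitted model values.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import least_squares

f = lambda t, th: th[0] * np.exp(-th[1] * t) + th[2]   # placeholder model
theta0, N, sigma0 = np.array([5.0, 0.4, 1.0]), 200, 0.1
t = np.linspace(0.0, 10.0, N)
rng = np.random.default_rng(4)

for label, y in [("CV", f(t, theta0) + sigma0 * rng.standard_normal(N)),
                 ("NCV", f(t, theta0) * (1.0 + sigma0 * rng.standard_normal(N)))]:
    th = least_squares(lambda p: y - f(t, p), x0=[1.0, 1.0, 0.0]).x
    fhat = f(t, th)
    r = y - fhat                     # ordinary residuals
    rm = r / fhat                    # modified residuals
    fig, ax = plt.subplots(2, 2, figsize=(8, 6))
    ax[0, 0].plot(t, r, ".")
    ax[0, 0].set_title(f"{label}: r vs t")
    ax[0, 1].plot(fhat, r, ".")
    ax[0, 1].set_title(f"{label}: r vs f")
    ax[1, 0].plot(t, rm, ".")
    ax[1, 0].set_title(f"{label}: modified r vs t")
    ax[1, 1].plot(fhat, rm, ".")
    ax[1, 1].set_title(f"{label}: modified r vs f")
    fig.tight_layout()
plt.show()
```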

Another question of interest concerns the case in which the data are incorrectly assumed to have non-constant variance error when in fact they have constant variance error. Since $Y_j - f(t_j;\theta_0) = \mathcal{E}_j$ in the constant variance
