Relating models to data: A review
P.D. O’Neill
University of Nottingham
Caveats
Scope is strictly limited
Review with a view to future challenges
Outline
1. Why relate models to data?
2. How to relate models to data
3. Present and future challenges
1. Why relate models to data?
1. Scientific hypothesis testing
e.g. Can within-host heterogeneity of susceptibility to HIV explain decreasing prevalence?
e.g. Did control measures alone control SARS in Hong Kong?
2. Estimation
e.g. What is R0?
e.g. What is the efficacy of a vaccine?
3. What-if scenarios
e.g. What would have happened if transport restrictions had been in place sooner in the UK foot-and-mouth outbreak?
e.g. How far would school closure prevent the spread of influenza?
4. Real-time analyses
e.g. Has the epidemic finished yet?
e.g. Are control measures effective?
5. Calibration/parameterisation
e.g. What range of parameter values is sensible for simulation studies?
2. How to relate models to data
2.1 Fitting deterministic models
Options include:
(i) “Estimation from the literature”
(ii) Least-squares / minimise metric
(iii) Can be Bayesian (Elderd, Dukic and Dwyer, 2006)
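As an illustration of the least-squares option, the sketch below fits a deterministic SIR model to incidence counts by minimising the sum of squared errors over a parameter grid. Everything here is an assumption made for illustration: the Euler integration, the grid search, the function names and the synthetic data are not taken from the talk.

```python
def sir_incidence(beta, gamma, s0, i0, n_obs, dt=0.1, steps_per_obs=10):
    """Euler-integrate a deterministic SIR model and return the number of
    new cases in each observation interval (steps_per_obs * dt time units)."""
    n_pop = s0 + i0
    s, i = float(s0), float(i0)
    incidence = []
    for _ in range(n_obs):
        new_cases = 0.0
        for _ in range(steps_per_obs):
            infections = beta * s * i / n_pop * dt
            removals = gamma * i * dt
            s -= infections
            i += infections - removals
            new_cases += infections
        incidence.append(new_cases)
    return incidence


def sum_of_squares(beta, gamma, observed, s0, i0):
    """Least-squares metric between model incidence and observed incidence."""
    model = sir_incidence(beta, gamma, s0, i0, len(observed))
    return sum((m - o) ** 2 for m, o in zip(model, observed))


def grid_least_squares(observed, s0, i0, betas, gammas):
    """Return the (beta, gamma) pair on the grid that minimises the metric."""
    return min(((b, g) for b in betas for g in gammas),
               key=lambda bg: sum_of_squares(bg[0], bg[1], observed, s0, i0))


# Synthetic "data" generated from the model itself (hypothetical values).
observed = sir_incidence(1.5, 0.5, 990, 10, 20)
betas = [k / 10 for k in range(5, 26)]    # 0.5, 0.6, ..., 2.5
gammas = [k / 10 for k in range(1, 11)]   # 0.1, 0.2, ..., 1.0
best_beta, best_gamma = grid_least_squares(observed, 990, 10, betas, gammas)
print(best_beta, best_gamma)
```

A grid search is used only to keep the sketch dependency-free; in practice a numerical optimiser and a proper ODE solver would normally replace both loops.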
2.2 Fitting stochastic models
Available methods depend heavily on the model and the data.
(i) Explicit likelihood
e.g. Longini-Koopman model for household data (Longini and Koopman, 1982)
P(avoid infection from outside) = q
P(avoid infection from housemate) = p
Given data on final outcome in (independent) households, can formulate likelihood L(p, q)
SEIR model within household
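A minimal sketch of how such a likelihood L(p, q) can be evaluated, assuming the standard Longini-Koopman final-size recursion m_jk = C(k, j) m_jj q^(k-j) p^(j(k-j)), with m_kk obtained by subtraction. The household counts and the grid search below are hypothetical and serve only to show the mechanics.

```python
from math import comb, log


def final_size_probs(k, p, q):
    """Return [m_0k, ..., m_kk], where m_jk = P(j of k members ever infected).
    q = P(avoid infection from outside), p = P(avoid infection from housemate)."""
    diag = [1.0]  # diag[j] = m_jj, with m_00 = 1
    for n in range(1, k + 1):
        below = sum(comb(n, j) * diag[j] * q ** (n - j) * p ** (j * (n - j))
                    for j in range(n))
        diag.append(1.0 - below)
    probs = [comb(k, j) * diag[j] * q ** (k - j) * p ** (j * (k - j))
             for j in range(k)]
    return probs + [diag[k]]


def log_likelihood(p, q, counts):
    """counts maps (household size, number infected) to number of households."""
    cache = {}
    total = 0.0
    for (k, j), n_households in counts.items():
        if k not in cache:
            cache[k] = final_size_probs(k, p, q)
        total += n_households * log(cache[k][j])
    return total


# Hypothetical final-outcome data, not taken from any study.
counts = {(1, 0): 20, (1, 1): 5,
          (2, 0): 15, (2, 1): 6, (2, 2): 4,
          (3, 0): 8, (3, 1): 4, (3, 2): 3, (3, 3): 2}

grid = [k / 50 for k in range(1, 50)]  # 0.02, 0.04, ..., 0.98
p_hat, q_hat = max(((p, q) for p in grid for q in grid),
                   key=lambda pq: log_likelihood(pq[0], pq[1], counts))
print(p_hat, q_hat)
```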
(i) Explicit likelihood (continued)
Examples of related household models:
Bayesian analysis (O’Neill et al., 2000)
Multi-type models (van Boven et al., 2007)
(i) Explicit likelihood (continued)
Methods include:
Maximum likelihood (e.g. Longini and Koopman, 1982)
EM algorithm (e.g. Becker, 1997)
MCMC (e.g. O’Neill et al., 2000)
Rejection sampling (e.g. Clancy and O’Neill, 2007)
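To illustrate the rejection-sampling idea when the likelihood is explicit, here is a sketch for households of size 2 only. It assumes uniform priors on p and q and uses the saturated multinomial likelihood as an upper bound on L(p, q). The data, the priors and the bound are illustrative assumptions, not the scheme of Clancy and O’Neill (2007).

```python
import random
from math import log

# Hypothetical data: households of size 2 with 0, 1 and 2 members ever infected.
counts = (15, 9, 6)


def log_lik(p, q):
    """Log-likelihood kernel for size-2 households (multinomial constant dropped)."""
    m0 = q * q
    m1 = 2.0 * (1.0 - q) * q * p
    m2 = 1.0 - m0 - m1
    if min(m0, m1, m2) <= 0.0:
        return float("-inf")
    return sum(n * log(m) for n, m in zip(counts, (m0, m1, m2)))


# The kernel is bounded above by its value at the observed proportions.
total = sum(counts)
log_bound = sum(n * log(n / total) for n in counts if n > 0)


def rejection_sample(n_samples, rng):
    """Draw exact posterior samples of (p, q) under independent U(0, 1) priors."""
    samples = []
    while len(samples) < n_samples:
        p, q = rng.random(), rng.random()
        u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is defined
        if log(u) < log_lik(p, q) - log_bound:
            samples.append((p, q))
    return samples


samples = rejection_sample(200, random.Random(1))
mean_q = sum(q for _, q in samples) / len(samples)
print(round(mean_q, 2))
```

Accepted draws are independent samples from the posterior, so no convergence diagnostics are needed; the cost is a low acceptance rate when the posterior is concentrated.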
(ii) No explicit likelihood
Can arise due to model complexity and/or insufficient data
[Figure: two-level mixing model; individuals are ever-infected or never-infected, and part of the population is sampled while the rest is unseen]
Individual-based transmission models involve unseen infection times
Even detailed data from studies generally give only bounds on unseen infection times, e.g. infection occurs between the last -ve test and the first +ve test
(ii) No explicit likelihood
Solutions include:
Use a simpler approximating model, e.g. use pseudolikelihood (e.g. Ball, Mollison and Scalia-Tomba, 1997)
[Figure: two-level mixing model with ever-infected and never-infected individuals; explicit interactions between households]
[Figure: two-level mixing model -> independent households model, with ever-infected and never-infected individuals]
In a large population, households are approximately independent
(ii) No explicit likelihood
Solutions include:
Use a simpler approximating model, e.g. a discrete-time model instead of a continuous-time model (e.g. Lekone and Finkenstädt, 2006)
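To show why a discrete-time approximation gives an explicit likelihood, the sketch below evaluates a chain-binomial log-likelihood. It is a simplified SIR analogue with unit time steps and fully observed counts, not the SEIR model of Lekone and Finkenstädt; the function names and the data are hypothetical.

```python
from math import comb, exp, log, log1p


def log_binom_pmf(x, n, prob):
    """Log of the Binomial(n, prob) probability of x, handling edge cases."""
    if x < 0 or x > n:
        return float("-inf")
    if prob <= 0.0:
        return 0.0 if x == 0 else float("-inf")
    if prob >= 1.0:
        return 0.0 if x == n else float("-inf")
    return log(comb(n, x)) + x * log(prob) + (n - x) * log1p(-prob)


def chain_binomial_loglik(beta, gamma, s0, i0, new_infections, new_removals):
    """Each step: infections ~ Bin(S, 1 - exp(-beta * I / N)) and
    removals ~ Bin(I, 1 - exp(-gamma)), assuming nobody starts removed."""
    n_pop = s0 + i0
    s, i = s0, i0
    total = 0.0
    for b, d in zip(new_infections, new_removals):
        p_inf = 1.0 - exp(-beta * i / n_pop)
        p_rem = 1.0 - exp(-gamma)
        total += log_binom_pmf(b, s, p_inf) + log_binom_pmf(d, i, p_rem)
        s, i = s - b, i + b - d
    return total


# Hypothetical counts per time step for a population of 21.
new_infections = [2, 3, 4, 2, 1, 0]
new_removals = [0, 1, 2, 3, 3, 2]
loglik = chain_binomial_loglik(1.2, 0.4, 20, 1, new_infections, new_removals)
print(loglik)
```

Because every transition is binomial given the current state, the likelihood is a product of closed-form terms and can be passed directly to an optimiser or an MCMC sampler.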
(ii) No explicit likelihood
Solutions include:
Direct approach, e.g. martingale methods (Becker, 1989)
(ii) No explicit likelihood
Solutions include:
Data augmentation: add in “missing data” or extra model parameters to formulate a likelihood
(ii) No explicit likelihood: Data augmentation (continued)
Common example:
- model describes individual-to-individual transmission
- observe times of case ascertainment, test results, etc., but not times of infection/exposure
- augment data with missing infection/exposure times
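Once infection times are added to the data, the likelihood becomes tractable. The sketch below evaluates the augmented log-likelihood for one common choice of model, the Markov SIR model with infection rate beta * S * I / N and exponential infectious periods with rate gamma. That model choice, the function name and the argument names are assumptions for illustration.

```python
from math import log


def augmented_loglik(beta, gamma, infection_times, removal_times, n_pop):
    """Log-likelihood of an SIR epidemic given (imputed) infection times and
    observed removal times, conditional on the time of the first infection."""
    pairs = list(zip(infection_times, removal_times))
    if any(r <= i for i, r in pairs):
        return float("-inf")  # removal must follow infection
    n_inf = len(pairs)
    first = infection_times.index(min(infection_times))
    infectious_time = sum(r - i for i, r in pairs)

    # Integral of S(t) * I(t): never-infected individuals are exposed to every
    # infectious period in full; ever-infected ones only until their infection.
    pressure = (n_pop - n_inf) * infectious_time
    log_rates = 0.0
    for k, i_k in enumerate(infection_times):
        pressure += sum(min(r_j, i_k) - min(i_j, i_k) for i_j, r_j in pairs)
        if k == first:
            continue  # the initial infective is not infected by anyone
        n_infectious = sum(1 for i_j, r_j in pairs if i_j < i_k < r_j)
        if n_infectious == 0:
            return float("-inf")  # nobody was infectious at this infection
        log_rates += log(beta * n_infectious / n_pop)

    return (log_rates - beta * pressure / n_pop
            + n_inf * log(gamma) - gamma * infectious_time)


# Hypothetical augmented data for a population of 10.
value = augmented_loglik(1.5, 0.5, [0.0, 1.0, 1.5], [2.0, 3.0, 4.0], 10)
print(value)
```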
[Figure: timeline showing exposure time (TE), start of infectivity (TI) and end of infectivity, with -ve and +ve test results, split into “Not observed” and “Observed data” (Höhle et al., 2005)]
(ii) No explicit likelihood: Data augmentation (continued)
Data-augmentation methods include:
MCMC (e.g. Gibson and Renshaw, 1998; O’Neill and Roberts, 1999; Auranen et al., 2000)
EM algorithm (e.g. Becker, 1997)
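A rough sketch of data-augmentation MCMC for an SIR model in which removal times are observed and infection times are imputed. It is written in the spirit of the cited MCMC papers but is not a reproduction of any of them. Assumptions: Markov SIR model, Gamma(a, b) priors on beta and gamma, an implicit flat prior on the first infection time, and hypothetical removal data.

```python
import random
from math import log


def summarise(inf_t, rem_t, n_pop):
    """Return (integral of S(t)I(t) dt, sum of log I(t-) at infection events),
    or None if some infection occurs while nobody is infectious."""
    pairs = list(zip(inf_t, rem_t))
    first = inf_t.index(min(inf_t))
    integral = (n_pop - len(pairs)) * sum(r - i for i, r in pairs)
    log_i = 0.0
    for k, i_k in enumerate(inf_t):
        integral += sum(min(r_j, i_k) - min(i_j, i_k) for i_j, r_j in pairs)
        if k != first:
            n_infectious = sum(1 for i_j, r_j in pairs if i_j < i_k < r_j)
            if n_infectious == 0:
                return None
            log_i += log(n_infectious)
    return integral, log_i


def da_mcmc(rem_t, n_pop, n_iter, rng, a=1.0, b=0.001):
    n = len(rem_t)
    # Valid starting point: everyone infected shortly before the first removal.
    inf_t = [min(rem_t) - 1.0 + 0.5 * k / n for k in range(n)]
    state = summarise(inf_t, rem_t, n_pop)
    draws = []
    for _ in range(n_iter):
        integral, log_i = state
        # Gibbs updates; random.gammavariate takes (shape, scale).
        beta = rng.gammavariate(a + n - 1, 1.0 / (b + integral / n_pop))
        period = sum(r - i for i, r in zip(inf_t, rem_t))
        gamma = rng.gammavariate(a + n, 1.0 / (b + period))
        # Metropolis-Hastings update of one infection time, proposing the
        # infectious period from Exp(gamma); that term cancels in the ratio.
        k = rng.randrange(n)
        old = inf_t[k]
        inf_t[k] = rem_t[k] - rng.expovariate(gamma)
        proposal = summarise(inf_t, rem_t, n_pop)
        if proposal is not None:
            log_ratio = ((proposal[1] - log_i)
                         - beta / n_pop * (proposal[0] - integral))
        if proposal is not None and log(1.0 - rng.random()) < log_ratio:
            state = proposal
        else:
            inf_t[k] = old
        draws.append((beta, gamma))
    return draws


removal_times = [3.0, 5.0, 6.0, 6.5, 8.0, 9.0, 11.0]  # hypothetical data
draws = da_mcmc(removal_times, 20, 500, random.Random(1))
print(sum(b for b, _ in draws) / len(draws))
```

Only one infection time is updated per iteration, which keeps the sketch short; mixing generally deteriorates as the number of imputed times grows.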
(ii) No explicit likelihood: Data augmentation (continued)
Data-augmentation methods can also be used in less “obvious” settings, e.g. final size data for complex models
[Figure: two-level mixing model showing the data (ever-infected and never-infected individuals)]
Augment parameter space using links to describe potential infections (Demiris and O’Neill, 2005)
3. Present & future challenges
3.1 Large populations/complex models
Current methods often struggle with large-scale problems,
e.g. large population, many missing data, many hard-to-estimate parameters/covariates
e.g. UK foot-and-mouth outbreak 2001
Keeling et al. (2001): stochastic discrete-time model, parameterised via likelihood estimation and tuning/simulation.
Attempting to fit this kind of model using “standard” Bayesian/MCMC methods does not work well.
Large data sets and many missing data can cause problems for standard (and also non-standard) MCMC
e.g. Measles data
Cauchemez and Ferguson (2008) discuss the problems that arise when fitting a standard SIR model to large-scale, temporally aggregated data in a large population using standard methods.
Problems of this kind are usually tackled via approximations (e.g. of the model itself).
Challenge: Can generic non-approximate methods be found?
3.2 Data augmentation
Comment: this technique is surprisingly powerful and is (probably) under-developed.
e.g. Cauchemez and Ferguson (2008) introduce a novel MCMC data-augmentation scheme in which a diffusion model approximates an SIR epidemic model.
e.g. For final size data, rather than imputing a graph describing infection pathways, one could impute generations of infection (joint work with Simon White).
This can lead to much faster MCMC algorithms.
[Figure: two-level mixing model with ever-infected and never-infected individuals; imputing edges in graph]
[Figure: two-level mixing model in which each ever-infected individual is labelled with its generation of infection: 1, 2, 2, 2, 3, 4, 4, 5]
Infection chain = {1, 3, 1, 2, 1}
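The infection chain is simply the number of individuals in each generation. A small sketch, using the generation labels from the slide; the function name is hypothetical.

```python
from collections import Counter


def infection_chain(generations):
    """Count how many individuals were infected in each generation,
    for generations 1 up to the largest label present."""
    counts = Counter(generations)
    return [counts[g] for g in range(1, max(generations) + 1)]


labels = [1, 2, 2, 2, 3, 4, 4, 5]  # generation of each ever-infected individual
chain = infection_chain(labels)
print(chain)  # [1, 3, 1, 2, 1]
```

Imputing this short vector in place of a full graph of potential infection links greatly reduces the size of the augmented space.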
e.g. Augmented data can also (sometimes) be used to bound quantities of interest.
Clancy and O’Neill (2008) show how to obtain stochastic bounds on R0 and other quantities by considering “minimal” and “maximal” configurations of unobserved infection times in an SIR model.
[Figure: timeline of observed removal times and imputed infection times (marked x)]
[Figure: timeline of observed removal times with infection times imputed as soon as possible]
[Figure: timeline of observed removal times with infection times imputed as late as possible]
Can show that “soon as possible” maximises R0, but the minimal value is not necessarily given by “late as possible”; use linear programming to find the actual solution.
General idea also applicable to final outcome data
3.3 Model fit and model choice
Various methods are used in the literature to assess model fit, e.g.
Simulation-based methods; use of Bayesian predictive distribution; standard methods where applicable; Bayesian p-values
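As one illustration of these ideas, the sketch below computes a Bayesian (posterior predictive) p-value by simulation. The model, in which every household member is infected independently with probability theta under a uniform prior, the discrepancy statistic and the data are all hypothetical choices made for the example.

```python
import random


def posterior_predictive_pvalue(household_cases, size, n_draws, rng):
    """P(T(replicate) >= T(observed)), where T is the number of households
    in which every member was infected."""
    n_households = len(household_cases)
    cases = sum(household_cases)
    non_cases = n_households * size - cases
    observed_stat = sum(1 for c in household_cases if c == size)
    exceed = 0
    for _ in range(n_draws):
        # Posterior draw: Beta(1 + cases, 1 + non-cases) under a U(0, 1) prior.
        theta = rng.betavariate(1 + cases, 1 + non_cases)
        # Replicate data set simulated from the fitted model.
        replicate = [sum(rng.random() < theta for _ in range(size))
                     for _ in range(n_households)]
        if sum(1 for c in replicate if c == size) >= observed_stat:
            exceed += 1
    return exceed / n_draws


# Hypothetical, strongly clustered data: 20 households of size 3.
data = [0] * 12 + [1] * 3 + [3] * 5
p_value = posterior_predictive_pvalue(data, 3, 2000, random.Random(1))
print(p_value)
```

A p-value close to 0 or 1 suggests that the model fails to reproduce the chosen feature of the data, here the clustering of cases within households.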
Likewise for model choice: methods include AIC and RJMCMC
Challenge: Better understanding of the pros and cons of such methods
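For concreteness, a sketch of an AIC comparison (AIC = 2k - 2 log L at the maximum, with k the number of free parameters) between two nested household models for size-2 households. The data, the grid maximisation and the choice of restricted model (p = 1, i.e. no within-household transmission) are assumptions for illustration.

```python
from math import log

# Hypothetical data: households of size 2 with 0, 1 and 2 members ever infected.
counts = (15, 9, 6)


def log_lik(p, q):
    """q = P(avoid infection from outside), p = P(avoid infection from housemate)."""
    m0 = q * q
    m1 = 2.0 * (1.0 - q) * q * p
    m2 = 1.0 - m0 - m1
    return sum(n * log(m) for n, m in zip(counts, (m0, m1, m2)))


def aic(max_log_lik, n_params):
    return 2 * n_params - 2 * max_log_lik


q_grid = [k / 100 for k in range(1, 100)]   # 0.01, ..., 0.99
p_grid = [k / 100 for k in range(1, 101)]   # 0.01, ..., 1.00

# Full model: both p and q free. Restricted model: p fixed at 1.
max_full = max(log_lik(p, q) for p in p_grid for q in q_grid)
max_restricted = max(log_lik(1.0, q) for q in q_grid)

aic_full = aic(max_full, 2)
aic_restricted = aic(max_restricted, 1)
print(aic_full, aic_restricted)
```

The model with the smaller AIC is preferred. RJMCMC instead treats the model index as an unknown and samples it alongside the parameters.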
References
B.D. Elderd, V.M. Dukic and G. Dwyer (2006) Uncertainty in predictions of disease spread and public health responses to bioterrorism and emerging diseases. PNAS 103, 15693-15697.
I.M. Longini, Jr and J.S. Koopman (1982) Household and community transmission parameters from final distributions of infections in households. Biometrics 38, 115-126.
P.D. O’Neill, D.J. Balding, N.G. Becker, M. Eerola and D. Mollison (2000) Analyses of infectious disease data from household outbreaks by Markov chain Monte Carlo methods. Applied Statistics 49, 517-542.
M. van Boven, M. Koopmans, M.D.R. van Beest Holle, A. Meijer, D. Klinkenberg, C.A. Donnelly and H.A.P. Heesterbeek (2007) Detecting emerging transmissibility of avian influenza virus in human households. PLoS Computational Biology 3, 1394-1402.
D. Clancy and P.D. O’Neill (2007) Exact Bayesian inference and model selection for stochastic models of epidemics among a community of households. Scandinavian Journal of Statistics 34, 259-274.
N.G. Becker (1997) Uses of the EM algorithm in the analysis of data on HIV/AIDS and other infectious diseases. Statistical Methods in Medical Research 6, 24-37.
F.G. Ball, D. Mollison and G-P. Scalia-Tomba (1997) Epidemic models with two levels of mixing. Annals of Applied Probability 7, 46-89.
M. Höhle, E. Jørgensen and P.D. O’Neill (2005) Inference in disease transmission experiments by using stochastic epidemic models. Applied Statistics 54, 349-366.
N.G. Becker (1989) Analysis of Infectious Disease Data. Chapman and Hall, London.
G. Gibson and E. Renshaw (1998) Estimating parameters in stochastic compartmental models using Markov chain methods. IMA Journal of Mathematics Applied in Medicine and Biology 15, 19-40.
P.D. O’Neill and G.O. Roberts (1999) Bayesian inference for partially observed stochastic epidemics. Journal of the Royal Statistical Society Series A 162, 121-129.
K. Auranen, E. Arjas, T. Leino and A.K. Takala (2000) Transmission of pneumococcal carriage in families: a latent Markov process model for binary longitudinal data. Journal of the American Statistical Association 95, 1044-1053.
P.E. Lekone and B.F. Finkenstädt (2006) Statistical inference in a stochastic epidemic SEIR model with control intervention: Ebola as a case study. Biometrics 62, 1170-1177.
M.J. Keeling, M.E.J. Woolhouse, D.J. Shaw, L. Matthews, M. Chase-Topping, D.T. Haydon, S.J. Cornell, J. Kappey, J. Wilesmith and B.T. Grenfell (2001) Dynamics of the 2001 UK foot and mouth epidemic: stochastic dispersal in a heterogeneous landscape. Science 294, 813-817.
S. Cauchemez and N.M. Ferguson (2008) Likelihood-based estimation of continuous-time epidemic models from time-series data: application to measles transmission in London. Journal of the Royal Society Interface 5, 885-897.
D. Clancy and P.D. O’Neill (2008) Bayesian estimation of the basic reproduction number in stochastic epidemic models. Bayesian Analysis, in press.