Spatial and Syndromic Surveillance for Public Health Edited by Andrew B. Lawson Department of Epidemiology and Biostatistics University of South Carolina, USA Ken Kleinman Department of Ambulatory Care and Prevention Harvard Medical School, USA


  • Spatial and Syndromic Surveillance for Public Health

    Edited by

    Andrew B. Lawson
    Department of Epidemiology and Biostatistics
    University of South Carolina, USA

    Ken Kleinman
    Department of Ambulatory Care and Prevention
    Harvard Medical School, USA


  • Copyright © 2005 John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England

    Telephone (+44) 1243 779777

    Email (for orders and customer service enquiries): [email protected]

    Visit our Home Page on www.wiley.com

    All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except under the terms of the Copyright, Designs and Patents Act 1988 or under the terms of a licence issued by the Copyright Licensing Agency Ltd, 90 Tottenham Court Road, London W1T 4LP, UK, without the permission in writing of the Publisher. Requests to the Publisher should be addressed to the Permissions Department, John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England, or emailed to [email protected], or faxed to (+44) 1243 770571.

    This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the Publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

    Other Wiley Editorial Offices

    John Wiley & Sons Inc., 111 River Street, Hoboken, NJ 07030, USA

    Jossey-Bass, 989 Market Street, San Francisco, CA 94103-1741, USA

    Wiley-VCH Verlag GmbH, Boschstr. 12, D-69469 Weinheim, Germany

    John Wiley & Sons Australia Ltd, 33 Park Road, Milton, Queensland 4064, Australia

    John Wiley & Sons (Asia) Pte Ltd, 2 Clementi Loop #02-01, Jin Xing Distripark, Singapore 129809

    John Wiley & Sons Canada Ltd, 22 Worcester Road, Etobicoke, Ontario, Canada M9W 1L1

    Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

    British Library Cataloguing in Publication Data

    A catalogue record for this book is available from the British Library

    ISBN 0-470-09248-3 (HB)

    Typeset in 10/12pt Photina by Integra Software Services Pvt. Ltd, Pondicherry, India

    Printed and bound in Great Britain by TJ International Ltd, Padstow, Cornwall

    This book is printed on acid-free paper responsibly manufactured from sustainable forestry in which at least two trees are planted for each one used for paper production.

    http://www.wiley.com

  • Contents

    Preface
    List of Contributors

    1 Introduction: Spatial and syndromic surveillance for public health
      Andrew B. Lawson and Ken Kleinman
      1.1 What is public health surveillance?
        1.1.1 Spatial surveillance
        1.1.2 Syndromic surveillance
      1.2 The increased importance of public health surveillance
      1.3 Geographic information, cluster detection and spatial surveillance
      1.4 Surveillance and screening
      1.5 Overview of process control and mapping
        1.5.1 Process control methodology
        1.5.2 The analysis of maps and surveillance
      1.6 The purpose of this book
        1.6.1 Statistical surveillance and methodological development in a public health context
        1.6.2 The statistician’s role in surveillance
      1.7 The contents of this book

    Part I Introduction to Temporal Surveillance

    2 Overview of temporal surveillance
      Yann Le Strat
      2.1 Introduction
        2.1.1 Surveillance systems
        2.1.2 Surveillance attributes
        2.1.3 Early detection of unusual health events
      2.2 Statistical methods
        2.2.1 Historical limits method
        2.2.2 Process control charts
        2.2.3 Time-series analysis
      2.3 Conclusion

    3 Optimal surveillance
      Marianne Frisén and Christian Sonesson
      3.1 Introduction
      3.2 Optimality for a fixed sample and for on-line surveillance
      3.3 Specification of the statistical surveillance problem
      3.4 Evaluations of systems for surveillance
        3.4.1 Measures for a fixed sample situation adopted for surveillance
        3.4.2 False alarms
        3.4.3 Delay of the alarm
        3.4.4 Predictive value
      3.5 Optimality criteria
        3.5.1 Minimal expected delay
        3.5.2 Minimax optimality
        3.5.3 Average run length
      3.6 Optimality of some standard methods
        3.6.1 The likelihood ratio method
        3.6.2 The Shewhart method
        3.6.3 The CUSUM method
        3.6.4 Moving average and window-based methods
        3.6.5 Exponentially weighted moving average methods
      3.7 Special aspects of optimality for surveillance of public health
        3.7.1 Gradual changes during outbreaks of diseases
        3.7.2 Change between unknown incidences
        3.7.3 Spatial and other multivariate surveillance
      3.8 Concluding remarks
      Acknowledgment

    Part II Basic Methods for Spatial and Syndromic Surveillance

    4 Spatial and spatio-temporal disease analysis
      Andrew B. Lawson
      4.1 Introduction
      4.2 Disease mapping and map reconstruction
      4.3 Disease map restoration
        4.3.1 Simple statistical representations
        4.3.2 Basic models
        4.3.3 A simple overdispersion model
        4.3.4 Advanced Bayesian models
      4.4 Residuals and goodness of fit
      4.5 Spatio-temporal analysis
      4.6 Surveillance issues

    5 Generalized linear models and generalized linear mixed models for small-area surveillance
      Ken Kleinman
      5.1 Introduction
      5.2 Surveillance using small-area modeling
        5.2.1 Example
        5.2.2 Using the model results
      5.3 Alternate model formulations
        5.3.1 Fixed effects logistic regression
        5.3.2 Poisson regression models
      5.4 Practical variations
      5.5 Data
        5.5.1 Developing and defining syndromes
      5.6 Evaluation
        5.6.1 Fixed and random effects monthly models
        5.6.2 Daily versus monthly modeling
      5.7 Conclusion

    6 Spatial surveillance and cumulative sum methods
      Peter A. Rogerson
      6.1 Introduction
      6.2 Statistical process control
        6.2.1 Shewhart charts
        6.2.2 Cumulative sum charts
      6.3 Cumulative sum methods for spatial surveillance
        6.3.1 Maintaining a cumulative sum chart for each region
        6.3.2 Maintaining cumulative sum charts for local neighborhoods around each region
        6.3.3 Cumulative sum charts for global spatial statistics
        6.3.4 Multivariate cumulative sum methods
      6.4 Summary and discussion
      Acknowledgments
      Appendix

    7 Scan statistics for geographical disease surveillance: an overview
      Martin Kulldorff
      7.1 Introduction
        7.1.1 Geographical disease surveillance
        7.1.2 Tests for spatial randomness
        7.1.3 Scan statistics
      7.2 Scan statistics for geographical disease surveillance
        7.2.1 Probability models
        7.2.2 Likelihood ratio test
        7.2.3 Scanning window
        7.2.4 Adjustments
      7.3 Secondary clusters
      7.4 Null and alternative hypotheses
        7.4.1 The null hypothesis
        7.4.2 Spatial autocorrelation
        7.4.3 The alternative hypothesis
      7.5 Power
      7.6 Visualizing the detected clusters
      7.7 A sample of applications
        7.7.1 Cancer surveillance
        7.7.2 Infectious diseases
        7.7.3 Other human diseases
        7.7.4 Veterinary medicine
        7.7.5 Plant diseases
      7.8 Software
      Acknowledgment

    8 Distance-based methods for spatial and spatio-temporal surveillance
      Laura Forsberg, Marco Bonetti, Caroline Jeffery, Al Ozonoff and Marcello Pagano
      8.1 Introduction
      8.2 Motivation
      8.3 Distance-based statistics for surveillance
        8.3.1 MEET statistic
        8.3.2 The interpoint distribution function and the M statistic
      8.4 Spatio-temporal surveillance: an example
        8.4.1 Temporal component
        8.4.2 Bivariate test statistic
        8.4.3 Power calculations
      8.5 Locating clusters
      8.6 Conclusion
      Acknowledgments

    9 Multivariate surveillance
      Christian Sonesson and Marianne Frisén
      9.1 Introduction
      9.2 Specifications
      9.3 Approaches to multivariate surveillance
        9.3.1 Reduction of dimensionality
        9.3.2 Reduction to one scalar statistic for each time
        9.3.3 Parallel surveillance
        9.3.4 Vector accumulation methods
        9.3.5 Simultaneous solution
      9.4 Evaluation of the properties of multivariate surveillance methods
      9.5 Concluding discussion

    Part III Database Mining and Bayesian Methods

    10 Bayesian network approaches to detection
      Weng-Keen Wong and Andrew W. Moore
      10.1 Introduction
      10.2 Association rules
      10.3 WSARE
        10.3.1 Creating the baseline distribution
        10.3.2 Finding the best one-component rule
        10.3.3 Two-component rules
        10.3.4 Obtaining the p-value for each rule
      10.4 Evaluation
        10.4.1 The simulator
        10.4.2 Algorithms
      10.5 Results
      10.6 Conclusion

    11 Efficient scan statistic computations
      Daniel B. Neill and Andrew W. Moore
      11.1 Introduction
        11.1.1 The spatial scan statistic
        11.1.2 Randomization testing
        11.1.3 The naive approach
      11.2 Overlap-multiresolution partitioning
        11.2.1 Score bounds
      11.3 Results
        11.3.1 Comparison to SaTScan
      11.4 Conclusions and future work

    12 Bayesian data mining for health surveillance
      David Madigan
      12.1 Introduction
      12.2 Probabilistic graphical models
      12.3 Hidden Markov models for surveillance: illustrative examples
      12.4 Hidden Markov models for surveillance: further exploration
        12.4.1 Beyond normality
        12.4.2 How many hidden states?
        12.4.3 Label switching
        12.4.4 Multivariate extensions
      12.5 Random observation time hidden Markov models
      12.6 Interpretation of hidden Markov models for surveillance
      12.7 Discussion
      Acknowledgments

    13 Advanced modeling for surveillance: clustering of relative risk changes
      Andrew B. Lawson
      13.1 Introduction
      13.2 Cluster concepts
      13.3 Cluster modeling
        13.3.1 Spatial modeling of case event data
        13.3.2 Spatial modeling of count data
        13.3.3 Spatio-temporal modeling of case and count data
      13.4 Syndromic cluster assessment
        13.4.1 The Bayesian posterior distribution
      13.5 Bayesian version of the optimal surveillance alarm function
        13.5.1 Clustering and f�x�u����
        13.5.2 A simple real-time biohazard model
      13.6 Computational issues
      13.7 Conclusions and future directions

    References

    Index

  • Preface

    This volume aims to fill a growing need for the description of current methodology in public health surveillance. Recent advances in syndromic surveillance and, more generally, in spatial and multivariate surveillance have never been collected in a single volume. The field of syndromic surveillance now attracts a wide audience due to the perceived need to implement wide-ranging monitoring systems to detect possible health-related bioterrorism activity. In addition, many computer systems have been, and are being, developed that have the capability to store and display large volumes of health data and to link dynamically between different data streams. This capability has not been matched with extensive statistical research into the properties of the methods used in these systems. The ability of these systems to correctly sound health alarms when needed is of paramount importance. It is the task of statisticians to develop and evaluate the methodologies and to ensure that the correct interpretation is made of evaluated data. This volume seeks to provide a synopsis of current practice as well as a starting point for the development and evaluation of methods.

    In the production of this volume we have been helped by a great range of people. First we would like to thank our families who, throughout this venture, have been a great source of support. In addition we would like to thank the contributors for their timely submission of interesting articles, as well as our colleagues in our respective departments for helping with evaluation and criticism. Finally we would like to thank the staff of Wiley Europe for their continual help during the sometimes fraught stages of production. In particular, we thank Kathryn Sharples, the Statistics sub-editor, and Lucy Bryan in production, as well as Richard Leigh, the copy-editor.

    Andrew Lawson (Columbia)
    Ken Kleinman (Boston)

  • List of Contributors

    Marco Bonetti, Istituto di Metodi Quantitativi, Università Bocconi, Viale Isonzo 25, Milano, Italy, [email protected]

    Laura Forsberg, Department of Biostatistics, Harvard School of Public Health, 655 Huntington Ave, Boston, MA 02115, USA, [email protected]

    Marianne Frisén, Statistical Research Unit, Göteborg University, Box 660, SE 40530 Göteborg, Sweden, [email protected]

    Caroline Jeffery, Department of Biostatistics, Harvard School of Public Health, 655 Huntington Ave, Boston, MA 02115, USA, [email protected]

    Ken Kleinman, Department of Ambulatory Care and Prevention, Harvard Medical School & Harvard Pilgrim Health Care, 133 Brookline Avenue, Boston, MA 02215, USA, [email protected]

    Martin Kulldorff, Department of Ambulatory Care and Prevention, Harvard Medical School and Harvard Pilgrim Health Care, 133 Brookline Avenue, 6th Floor, Boston, MA 02215, USA, [email protected]

    Andrew B. Lawson, Department of Epidemiology and Biostatistics, Arnold School of Public Health, University of South Carolina, Columbia, SC 29208, [email protected]

    David Madigan, Faculty of Arts and Sciences, Rutgers University, 77 Hamilton Street, New Brunswick, NJ 08901, USA, [email protected]

    Andrew W. Moore, Carnegie Mellon University, School of Computer Science, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA, [email protected]

    Daniel B. Neill, Carnegie Mellon University, School of Computer Science, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA, [email protected]

    Al Ozonoff, Department of Biostatistics, Boston University School of Public Health, 715 Albany Street, Talbot East 512, Boston, MA 02118, [email protected]

    Marcello Pagano, Department of Biostatistics, Harvard School of Public Health, 655 Huntington Ave, Boston, MA 02115, USA, [email protected]

    Peter A. Rogerson, Departments of Geography and Biostatistics and National Center for Geographic Information and Analysis, University at Buffalo, Buffalo, NY 14261, USA, [email protected]

    Christian Sonesson, Statistical Research Unit, Göteborg University, Box 660, SE 40530 Göteborg, Sweden, [email protected]

    Yann Le Strat, Département des Maladies Infectieuses, Institut de Veille Sanitaire, 12 rue du Val d’Osne, 94415 Saint-Maurice Cedex, France

    Weng-Keen Wong, Center for Biomedical Informatics, 200 Lothrop Street, 8084 Forbes Tower, Pittsburgh, PA 15213, USA, [email protected]

  • 1

    Introduction: Spatial and Syndromic Surveillance for Public Health

    Andrew B. Lawson and Ken Kleinman

    1.1 WHAT IS PUBLIC HEALTH SURVEILLANCE?

    The Centers for Disease Control and Prevention (CDC) define public health surveillance as:

    the ongoing, systematic collection, analysis, and interpretation of health data essential to the planning, implementation, and evaluation of public health practice, closely integrated with the timely dissemination of these data to those who need to know. The final link of the surveillance chain is the application of these data to prevention and control. A surveillance system includes a functional capacity for data collection, analysis, and dissemination linked to public health programs.

    (Thacker, 1994)

    This definition is clearly broad and covers a wide range of health-related monitoring methods. From a statistical point of view it is relevant to consider how statistical methods can be developed or employed to best aid the task of surveillance of populations. This will require using all of the relevant data available for analysis. It will certainly include information about where the data were recorded as well as when they were observed.

    1.1.1 Spatial Surveillance

    There is thus a need to combine the thinking in two previously mostly distinct fields of statistical research, namely surveillance, which generally constitutes monitoring statistics for evidence of a change, and spatial techniques, which are often used to find or describe the extent of ‘clustering’ across a map. While both endeavors pre-date the formal study of statistics as a discipline, they have rarely been combined. More often, as in the famous case of John Snow and cholera in London in 1854, notice of an increase in a global statistic has been followed by a spatial analysis to determine whether the increase is localized or general. Interest in spatial monitoring of data as they accrue has been greatly enhanced by two developments: the perceived need to quickly detect bioterrorism after the terrorist dissemination of Bacillus anthracis in October 2001, and the increasing availability of data that contain spatial (geographical location) information.

    1.1.2 Syndromic Surveillance

    Another result of the burgeoning availability of data has been the recognition of a need and an opportunity. The need is for the ability to group symptoms together in broad groups that combine similar types of complaints – this being necessary to ensure that two cases attributable to the same cause are not considered separately due to variable coding practice on the part of health care providers. Misappropriating from medical nomenclature, these groups of symptoms are loosely designated as ‘syndromes’. The opportunity presented by the increasing availability of data is to use, for public health purposes, data that have not often been recognized as useful in this way. Examples include information about school absenteeism and over-the-counter sales of remedies such as anti-diarrheals.

    Together, the grouping of large numbers of symptoms and the use of data from nontraditional sources of information are labeled as ‘syndromic surveillance’. The putative advantage of syndromic surveillance is that detection of adverse effects can be made at the earliest possible time, possibly even before disease diagnoses can be confirmed through unmistakable signs or laboratory confirmation.
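The grouping step described above can be sketched in a few lines. This is only an illustration: the complaint strings, the syndrome names, and the `SYNDROME_OF` table are hypothetical and are not taken from any surveillance system discussed in this book.

```python
from collections import Counter

# Hypothetical mapping from free-text complaints to broad syndromes.
SYNDROME_OF = {
    "cough": "respiratory",
    "shortness of breath": "respiratory",
    "sore throat": "respiratory",
    "diarrhea": "gastrointestinal",
    "vomiting": "gastrointestinal",
}

def syndrome_counts(complaints):
    """Count visits per syndrome; unmapped complaints are counted as 'other'."""
    return Counter(
        SYNDROME_OF.get(c.strip().lower(), "other") for c in complaints
    )

# Variable coding practice (case, stray spaces) is normalized before lookup.
visits = ["Cough", "sore throat", "Vomiting", "rash", "cough "]
counts = syndrome_counts(visits)
print(counts)
```

Monitoring the per-syndrome totals rather than each raw complaint keeps two cases with the same cause from being split across differently coded records.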

    1.2 THE INCREASED IMPORTANCE OF PUBLIC HEALTH SURVEILLANCE

    In addition to the CDC definition, we might consider a dictionary definition of surveillance: ‘the close observation of a person or group, especially one under suspicion’. In this light we would define public health surveillance as the monitoring of the health of the public for the onset or outbreak of illness. The illness surveilled may be rare (plague) or recurrent (influenza), natural or intentional (bioterrorism).

    Since the intentional release of anthrax in the USA in October 2001, there has been a great deal of interest in establishing systems to detect another such attack as early as possible, should it occur. The need for early detection is motivated by two facts. First, many agents that might be used for such an attack have a prodromal phase that is relatively nonspecific, with symptoms that often resemble those of the common cold. This describes anthrax, botulism, plague, smallpox, and tularemia – all of the CDC class A bioterrorism agents except for viral hemorrhagic fevers (CDC, 2004). If the attack can be detected while most victims are in this phase, they may be helped by specialized care, and future onsets may be prevented by prophylaxis. Second, for contagious diseases, earlier interdiction can slow down or stop the epidemic curve; the latter might be impossible if detection were delayed.

    Considerable resources are being expended on mechanical detection of airborne organisms, such as anthrax spores. We do not discuss such efforts here and merely observe that, from the statistical perspective, detection is finished once a spore of anthrax has been positively identified. (Determining the locations in need of treatment or prophylaxis is a separate question.) In this book, we focus on individual human beings, in contrast to disease organisms. While it is true that a single definitive diagnosis of anthrax or any of the organisms cited above also ends the statistical interest in detection, the nonspecific prodrome opens a window for detection of an attack before a definitive diagnosis has been made. To wit, since the prodromal symptoms are so common, one might search for unusual increases in symptoms consistent with the prodrome. Such an increase could be due either to natural variation in the symptom incidence or to an attack with some agent that causes those symptoms in the prodrome.

    In this context, we are most interested in detecting attacks while they are ongoing rather than retrospectively. In statistical terms, we might refer to this as ‘cluster detection’ or ‘incident cluster detection’, where by ‘cluster’ we mean the occurrence of extra cases in a short time span. In the literature on surveillance, this is sometimes referred to as ‘on-line’ surveillance (Chapter 3). Many techniques exist for ongoing monitoring or surveillance of a count; these come from industrial applications – for example, Shewhart control charts and cumulative sum (CUSUM) methods (Chapter 2) – as well as from public health surveillance (Huttwagner, 2003; Sonesson and Bock, 2004).

    1.3 GEOGRAPHIC INFORMATION, CLUSTER DETECTION AND SPATIAL SURVEILLANCE

    The increased need for cluster detection has coincided with an increasing availability of data, especially data on the location of events. This is often obtained by geocoding the addresses of individual cases. This can be done ‘on the fly’ as cases are encountered (Beitel et al., 2004) or with static databases that retain the location of all patients eligible for surveillance (Lazarus et al., 2002). In its simplest form, geocoding could imply merely obtaining the zip or postal code, but it may also include finding the exact latitude and longitude of an address using geographical information systems (GIS). In statistical jargon, such data about location are often referred to as ‘spatial’ data.

    The value of spatial data for cluster detection is twofold. First, all attacks are localized at some spatial scale. That is, an attack could conceivably target a neighborhood, but on a city-wide scale this would be a small area. Alternatively, an attack could include a whole metropolitan area, but on a national scale this would be a small region. When surveillance is limited to a single daily count from a neighborhood or city, even sharp increases in relatively small regional counts may be hidden within the natural variation found in the count across a larger area. Spatial surveillance thus promises to increase the power to detect events that occur in small regions, relative to surveillance of the total count only. Second, if an incident cluster is identified, public health officials will need to respond. If the data are nonspatial, surveillance can only give vague messages of the sort ‘there is an excess of cases in the Boston metropolitan region’; this is unlikely to be of much practical use. In contrast, spatial surveillance would allow more specific messages, such as ‘there are excess cases in zip code 02474’. The job of identifying small regions with extra cases is also referred to as ‘cluster detection’, where the clustering in this case refers to extra cases in an area on the map.

    The coincidence of suddenly increased need and increasingly available spatial data has generated new interest in statistical methods for spatial surveillance, which might be described as the detection of incident clusters in space. The goal of this book is to provide a snapshot of the state of the nascent art of incident spatial cluster detection, provided by statisticians involved in traditional surveillance (of a single statistic), in spatial clustering, and in spatial surveillance.
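The claim in this section that a localized excess can be hidden in an area-wide total can be made concrete with a rough calculation. The sketch below assumes independent Poisson counts and a normal approximation; the number of regions, the expected counts, and the size of the excess are hypothetical values chosen only for illustration.

```python
import math

def z_score(observed, expected):
    """Approximate z-score of a Poisson count against its expected value."""
    return (observed - expected) / math.sqrt(expected)

n_regions = 20              # hypothetical number of zip codes in the city
expected_per_region = 10.0  # hypothetical expected daily count per zip code
excess = 10                 # extra cases, all occurring in a single zip code

# Monitoring the affected zip code on its own.
regional_z = z_score(expected_per_region + excess, expected_per_region)

# Monitoring only the city-wide total.
total_expected = n_regions * expected_per_region
total_z = z_score(total_expected + excess, total_expected)

print(round(regional_z, 2), round(total_z, 2))
```

Under these assumptions the same ten extra cases sit a little over three standard deviations above expectation in the affected zip code, but well under one standard deviation above expectation in the city-wide total.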

    1.4 SURVEILLANCE AND SCREENING

    An idea related to surveillance is that of screening. The use of screening to allow the early detection of disease onset is well established, though possibly controversial, in such areas as cervical or breast cancer. These examples of screening involve testing individuals at regular time points to assess whether onset of a condition has occurred or is likely or imminent. Screening could be applied to populations as well as individuals, in that changes in public health might trigger interventions. Such interventions could be designed to redirect health resources towards attempts to improve the health status of the population. However, screening is usually associated with individual assessment or monitoring, while surveillance is usually carried out at an aggregate population level.

    Surveillance and screening share an implicit temporal dimension: populations or individuals are assessed (often repeatedly over time) to determine whether changes have occurred which may warrant action. In general, a change is defined as exceeding limits that describe the acceptable results of current observation, and action is taken if these limits are passed. In screening individuals, the limits may be based on a previously observed known or stable abnormal baseline or on ‘normal’ standards thought to obtain in healthy persons.

    In surveillance, the ‘normal’ case is rarely known, and most attention is directed to detecting the passing of limits based on observed or expected patterns. These limits may be fixed or may depend on the status of ancillary variables. For example, incidence of influenza-like illness would be expected to vary seasonally, so similar numbers of cases would be more or less alarming at different times of year.

    To carry the screening analogy further, the location of the public health incident is as important as the fact that it occurred. A public health report indicating only a disease outbreak is comparable to a garbled mammography report that indicates a cancer but gives no suggestion of which breast is affected, let alone a location in which a biopsy would be appropriate. In population-level analysis, statisticians use ‘spatial’ statistics to discuss location. Further still, mammography uses the spatial information to help identify the existence of the node in the first place.
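The remark above that limits may depend on ancillary variables such as season can be illustrated with month-specific alarm limits. This is a minimal sketch, assuming Poisson-like counts and a simple rule of mean plus three square roots of the mean; the function name and the historical counts are hypothetical.

```python
import math

def seasonal_limits(history, k=3.0):
    """Map each month (1-12) to an upper limit mean + k * sqrt(mean).

    `history` is a list of (month, count) pairs from earlier years.
    """
    by_month = {}
    for month, count in history:
        by_month.setdefault(month, []).append(count)
    limits = {}
    for month, counts in by_month.items():
        mean = sum(counts) / len(counts)
        limits[month] = mean + k * math.sqrt(mean)
    return limits

# Hypothetical influenza-like illness counts for January and July.
history = [(1, 90), (1, 110), (7, 8), (7, 12)]
limits = seasonal_limits(history)

# The same count of 40 is unremarkable in January but alarming in July.
alarm_january = 40 > limits[1]
alarm_july = 40 > limits[7]
print(alarm_january, alarm_july)
```

A real system would estimate the seasonal baseline from a fitted model rather than raw monthly means; the point here is only that the limit, not the count, carries the seasonal information.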

    1.5 OVERVIEW OF PROCESS CONTROL AND MAPPING

    Process monitoring is necessary for quality control in a manufacturing context. The subject of statistical process control (SPC) has received the most methodological attention of all surveillance questions. SPC has formed the basis for many disease surveillance systems. In this section we describe some basic SPC methods that could be applied in this context.

    1.5.1 Process Control Methodology

    A number of methods have been developed for the detection of changes in populations over time. These methods are characterized by the estimation of changepoints in a sequence of disease events or a time series of population rates, or by the determination or application of control limits to the behavior of a system. In this area there are some simple methods available to assist in the assessment of change or ‘in control’ behavior. Some of these methods are derived from SPC, which was developed for the monitoring of industrial processes over time, and could be applied within a disease surveillance program, with due care. For example, it is well known that the temporal variation in count data can be monitored by using a Poisson control chart (C or U chart), on which limits are plotted; an observation beyond these limits calls for corrective action. These charts are based on normal pivotal approximations.
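A Poisson C chart of the kind mentioned above can be sketched as follows. The sketch uses the usual normal approximation, with limits at the baseline mean plus or minus three times its square root; the function names and the weekly counts are hypothetical and are not drawn from the chapters that follow.

```python
import math

def c_chart_limits(baseline_counts, k=3.0):
    """Return (center, lower, upper) limits for a Poisson C chart.

    Uses the normal approximation: mean +/- k * sqrt(mean), since a
    Poisson count has variance equal to its mean.
    """
    center = sum(baseline_counts) / len(baseline_counts)
    half_width = k * math.sqrt(center)
    lower = max(0.0, center - half_width)  # counts cannot be negative
    upper = center + half_width
    return center, lower, upper

def out_of_control(counts, lower, upper):
    """Indices of observations falling outside the control limits."""
    return [i for i, c in enumerate(counts) if c < lower or c > upper]

# Hypothetical weekly case counts, for illustration only.
baseline = [9, 11, 10, 8, 12, 10, 9, 11]
center, lower, upper = c_chart_limits(baseline)

new_weeks = [10, 13, 21, 9]
flagged = out_of_control(new_weeks, lower, upper)
print(center, round(lower, 2), round(upper, 2), flagged)
```

The approximation is reasonable when the mean count is not too small; for low counts, limits taken from exact Poisson quantiles would be a more careful choice.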


    An exact interval could be constructed for independent Poisson counts in an attempt to utilize SPC methods. However, if the counts were correlated even under the null hypothesis, then some allowance must be made for this correlation in the chart. A further issue, when such methods are to be used within disease monitoring, is how to incorporate any changes in the background ‘at-risk’ population which may arise. One possibility, in the temporal domain, is to employ relative risk estimates. For large aggregation scales, time-series methods have been employed which allow temporal dependence (see Chapter 2 in this volume).

    In addition, special types of chart (CUSUM charts) have been developed specifically to detect changes in pattern over time (changepoints). These are constructed by cumulative recording of events over time, the accumulation being found to be sensitive to changepoints in the process under consideration. Some recent work in the application of these ideas in medical surveillance and monitoring has been done by Frisén and co-workers (Chapters 3 and 9 in this volume). These methods require special adaptations to be developed to deal with the spatial and spatio-temporal nature of geographical surveillance.

    The main issues within temporal surveillance which impact on spatial surveillance and spatio-temporal surveillance can be categorized into three classes: detection of changepoints (mean level, variance), detection of clusters, and the detection of overall process change. Conventional SPC would use control limits to detect shifts in single or multiple parameters where the target parameters are usually constant. However, disease incidence varies naturally in time and so allowance must be made for this variation in any monitoring system, particularly with variation in population at risk. In addition, particular departures from the ‘normal’ variation are often of greater interest than simple shifts of parameters. Changepoints, where jumps in the incidence occur, could be a focus of interest. Alternatively, clusters of disease may be important. Finally, there may be an overall process change, where various parameters change. Any disease surveillance system is likely to be focused on one or all of these changes. Indeed, it is the multiple focus of such systems that is one of the greatest challenges for the development of statistical methodology.
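The cumulative recording behind a CUSUM chart, described earlier in this section, can be sketched with the standard one-sided recursion S_t = max(0, S_{t-1} + x_t - target - k). The target, the allowance `k`, the threshold `h`, and the counts below are hypothetical choices for illustration, and restarting the sum after an alarm is one common convention rather than the only one.

```python
def cusum_alarms(counts, target, k, h):
    """One-sided upper CUSUM: S_t = max(0, S_{t-1} + (x_t - target - k)).

    Returns the running sums and the indices at which S_t exceeds h.
    """
    s = 0.0
    sums, alarms = [], []
    for t, x in enumerate(counts):
        s = max(0.0, s + (x - target - k))
        sums.append(s)
        if s > h:
            alarms.append(t)
            s = 0.0  # restart after an alarm (one common convention)
    return sums, alarms

# Hypothetical daily counts: the mean drifts from about 10 to about 14.
counts = [10, 9, 11, 10, 13, 14, 13, 15, 14]
sums, alarms = cusum_alarms(counts, target=10, k=1.0, h=8.0)
print(sums, alarms)
```

No single count here comes near a three-sigma limit for a mean of 10 (about 19.5), yet the accumulated excess crosses the threshold a few observations after the shift begins, which is the sensitivity to changepoints noted in the text.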

    1.5.2 The Analysis of Maps and Surveillance

    In the spatial case, there is a wide range of methods that can be applied to asingle map of case events within a fixed time frame/period. Many of the methodsapplied in disease mapping, clustering or ecological analysis could be appliedas a surveillance tool. For example, general clustering tests could be appliedor residuals from disease maps fitted in each time period could be examined.Questions which might be appropriate to answer with these methods are suchas: Is there evidence of unusual variation in incidence in the map? Is thereevidence of ‘unusual’ clustering on the map? Is there a spatial trend on the maprelated to, for example, a putative source?

  • Purpose of this book 7

    However, when the question relates to a spatio-temporal pattern or change in pattern, then there are few methods currently available which are designed for this purpose. There is a correspondence between the temporal surveillance foci, and features which are important to detect in the spatial domain. First, localized discontinuities in mean level or variance of risk could be of concern (changepoints). Second, spatial clusters of disease could be a focus. Finally, overall process change could also be envisaged spatially.

    1.6 THE PURPOSE OF THIS BOOK

    We hope that this book may serve a dual purpose. First, we hope that the potential users of spatial surveillance – the public health authorities – will use it as an introduction to the value of spatial data and as a guide to analytic methods competing for scarce resources. Second, we hope that the statistical community will use it as a spur to further development of techniques and to resolution of questions unanswered by the chapters which follow.

    1.6.1 Statistical Surveillance and Methodological Development in a Public Health Context

    The ongoing aim of public health surveillance, since the time of John Snow, has been to identify public health problems as they occur and respond appropriately when they do. Breaking this down into discrete steps, this involves determining from whom to collect data, collecting the data, summarizing the data, evaluating the summarized data, and taking action if the evaluation warrants it. ‘Action’ can be defined as any additional steps that are not performed on a routine basis, which might include everything from asking for additional data to the vigilante removal of a pump handle or launching some other direct interdiction to prevent further illness.

    1.6.2 The Statistician’s Role in Surveillance

    As statistical analysts, it is important that we remember that our role in public health surveillance is in evaluating the summarized data; this may be the main factor in the decision whether to take action now. This is different from academic work and most scientific work in several important ways.

    First of all, we do not have the luxury of academic distance from the subject. This means that the case for making the best-supported decision needs to be made in a way that is powerful and easily absorbed by decision-makers. These authorities may lack the time, inclination, centralization, or training to follow complicated arguments or abstract presentations of our evaluations. One example of simplifying the evaluation is included in Chapter 5, on modeling, where we discuss inverting the p-value as a means of making the unusualness of the result under the null more approachable. A second difference is that the time available to develop analyses is very brief. In academic work, by contrast, the motivation to complete statistical work quickly mainly derives from fear of competition or career goals. The impact of this difference is that simpler, more easily generalizable methods have a practical advantage – they can be deployed in practical situations. Methods tailored to one situation may fit that data better but be essentially useless when the data changes or when applying the method to a different situation. Similarly, health departments may never use methods so complicated that they require the active participation of a Ph.D. statistician. Finally, in our experience, decision-makers often urgently request results immediately after data becomes available. This is another impetus towards easily generalizable, easily used methods.

    On the other hand, while statistical evaluation may be a key datum informing a decision, we should also remember that it is not actually the decision itself. Other data important to decision-makers include financial, political, and organizational feasibility considerations.
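    The idea of inverting the p-value, mentioned above as one way of simplifying an evaluation, can be read as follows: if the null hypothesis holds and tests are independent, a result with p-value p or smaller is expected about once in every 1/p tests. The Python sketch below computes that quantity. It is one plausible reading rather than the definition used in Chapter 5, and the function name recurrence_interval and the parameter tests_per_period are hypothetical.

```python
def recurrence_interval(p_value, tests_per_period=1):
    """Expected number of periods between results at least this extreme.

    Assumes the null hypothesis holds and that tests are independent, so
    results with a p-value of p_value or less occur on average once in
    every 1 / p_value tests. tests_per_period (hypothetical parameter) is
    the number of tests run in each reporting period, for example per day.
    """
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must lie in (0, 1]")
    if tests_per_period <= 0:
        raise ValueError("tests_per_period must be positive")
    return 1.0 / (p_value * tests_per_period)


if __name__ == "__main__":
    # With one test per day, p = 0.01 corresponds to roughly one such
    # result every 100 days under the null.
    print(recurrence_interval(0.01))
```

    A statement such as "we would expect a signal this strong about once every 100 days by chance alone" may be easier for a decision-maker to absorb than "p = 0.01", which is the practical appeal of the transformation.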

    1.7 THE CONTENTS OF THIS BOOK

    In the initial section (Part I), we provide an introduction and grounding in traditional temporal surveillance. This includes the current chapter, plus an overview and an evaluation of methods used in purely temporal surveillance. The goal of these chapters is to bring a reader unfamiliar with surveillance to a level at which subsequent chapters can be more easily digested. This is necessary because those chapters may take as read concepts native to traditional temporal surveillance.

    We begin with a discussion of purely temporal surveillance by Yann Le Strat (Chapter 2). Purely temporal surveillance is commonly used in most public health departments, and is an area studied little outside the areas of statistical process control and surveillance. Thus statistical readers may find a review helpful. The chapter includes an introduction to surveillance as well as a survey of typical methods. Methods considered include historical (nonstatistical) limits, process control charts (Shewhart charts, moving average charts, exponentially weighted moving average charts, CUSUM charts), time-series analysis, combinations of process control and time-series methods, integer-valued autoregressive processes, Serfling’s method, and log-linear and other parametric models.

    We also provide a discussion, by Marianne Frisén and Christian Sonesson (Chapter 3), of optimality in surveillance, and how detection methods might be designed with optimality in mind. This includes a discussion of evaluation metrics for surveillance (including false alarms, delay before alarm, and predictive value of alarms) and optimality criteria (including minimal expected delay, minimax optimality, and average run length). The chapter goes on to discuss the optimality and performance features of several methods described in Chapter 2. It concludes by discussing several features of the public health environment that differentiate it from other applications of surveillance.

    In Part II of the book, we provide a summary and some development of statistical approaches currently applied for spatial surveillance. First, Chapter 4 provides an overview of spatial and spatio-temporal health analysis outside of surveillance. This includes a discussion of disease mapping in the cases where individual locations of each case are known and alternatively when cases are aggregated into regions, as well as assessment of maps through residuals and goodness of fit. Finally, spatio-temporal and surveillance issues are introduced in the spatial context.

    In Chapter 5, a summary of generalized linear models and generalized linear mixed models, including the use of binomial and Poisson models, is offered. Another purpose of the chapter is to note advantages that are realized through Poisson models (including variable-duration cluster signals) and to compare the surveillance resulting from the various models in an example data set.

    In Chapter 6, Peter Rogerson addresses how CUSUM methods can be adapted to spatial surveillance. This includes a discussion of statistical process control that can be tailored for use in spatial applications, followed by a demonstration. Uses include surveillance of multiple local regions as well as of global statistics.

    Martin Kulldorff (Chapter 7) discusses how scan statistics can be used in this context, and recent developments in this approach. The chapter mentions tests of spatial randomness, then introduces scan statistics. This is followed by a thorough introduction to the practical application of scan statistics for spatial health surveillance. This includes a discussion of the null and alternative hypotheses for the test, as well as the power and methods for displaying the suggested clusters. Finally, some applications in cancer clustering, infectious disease, other human diseases, veterinary medicine, and plant disease are surveyed.

    In Chapter 8, Laura Fosberg and co-workers discuss distance methods for cluster detection and identification. This includes a motivation and summary of distance-based methods, the introduction of a new statistic based on distances, and a simulation-based evaluation of the new statistic. An example of syndromic spatial surveillance using the statistic is provided.

    Next Christian Sonesson and Marianne Frisén (Chapter 9) consider multivariate surveillance, what is often described as multiple streams of surveillance data. This topic addresses the common case where either different data sources supply information regarding a single syndrome, or where a single data provider reports on multiple syndromes. The approaches mentioned include a reduction of dimensionality (to one or a few statistics) for each time point, parallel surveillance, vector accumulation methods, and simultaneous solution. They also discuss evaluation in this context.

    In Part III, advanced approaches to syndromic and spatial surveillance are considered, including Bayesian models and data mining techniques. In Chapter 10, Neil and co-workers discuss the use of Bayesian networks and the development of computational algorithms; in Chapter 11 they consider speeding up spatial processing of large data sets. In Chapter 12, David Madigan provides an example of Bayesian modeling of temporal surveillance using hidden Markov models. Finally, in Chapter 13, general issues in the Bayesian analysis of syndromic data and the model-based detection of spatial and spatio-temporal clusters as they evolve in time are discussed.

  • PART I

    Introduction to Temporal Surveillance

  • 2

    Overview of Temporal Surveillance

    Yann Le Strat

    2.1 INTRODUCTION

    The threat of emerging infections and the increased potential for bioterrorist attacks have given surveillance systems additional importance. The main objective of surveillance is to monitor the incidence or prevalence of specific health problems over time, within a well-defined population. But a wide range of objectives can be considered. Among these, detecting or monitoring outbreaks and monitoring trends represent statistical challenges. Temporal health surveillance is a vast domain. After a brief and incomplete description of surveillance systems and attributes, this chapter will focus on a review of statistical methods for the detection of unusual health events.

    2.1.1 Surveillance Systems

    Most disease surveillance systems are passive. A passive approach means that the organization conducting surveillance leaves the initiative for reporting to potential reporters. Unlike in an active system, the organization does not regularly contact physicians or hospitals to obtain reports. Surveillance can be conducted in many ways. To simplify, one can briefly identify two main approaches: exhaustive reporting and voluntary reporting.

    In the former approach, notifiable diseases, essentially infectious diseases, are designated by public health agencies and by law, and their occurrence must be reported. In the USA, each state can designate which diseases are reportable by law. A physician, a laboratory where the diagnosis is made or a hospital where
