GIS approaches to the problem of disease clusters: a brief commentary

Social Science & Medicine 52 (2001) 1751–1754


Tom Koch*, Ken Denike

Department of Geography, University of British Columbia, 1984 West Mall, Vancouver, BC, Canada V6T 1Z2

Abstract

This commentary considers issues raised in a recent article on GIS-based approaches to modeling disease clusters, ‘Modeling exposure opportunities’ (Sabel, Gatrell, Loytonen et al., 2000. Social Science and Medicine, 50, 1121–37), and the general problem of mapping disease clusters. It notes that the authors advocate a fundamentally statistical approach, Kernel estimation, to map the occurrence of a specific illness whose etiology is unknown. Epidemiologists, ironically, have advocated a fundamentally cartographic solution, the cartogram, in addressing the general problem of disease clusters. The advantages and limits of both approaches are reviewed and the potential for their comparison in a single study suggested. Most importantly, perhaps, the commentary seeks to join the epidemiological and medical geographic literatures as they pertain to this analytic problem and medical cartography’s potential (GIS-based or traditional) to understand disease etiology. © 2001 Elsevier Science Ltd. All rights reserved.

Keywords: Cartography; Disease clusters; Epidemiology; GIS

Introduction

In a recent article exploring “clusters” of motor neuron disease (amyotrophic lateral sclerosis), Sabel, Gatrell, Loytonen, Maasilta, and Joelainen (2000) argued for the use of Geographic Information Systems (GIS) as a tool for epidemiological investigation. Ambitiously, they suggested GIS mapping procedures offer specific and unique approaches to the problem of disease clusters and their significance. “In the absence of known pathogens,” they argue, “one should ideally examine the myriad patterns of movement of the population at risk. This is the geography of commuting, socializing, travel and migration; of places of residence, vacation and work” (Sabel et al., 2000, p. 1122).

Their approach transposes into the GIS environment a traditional statistical approach, Kernel estimation, as a robust methodology for the understanding of spatially proximate incidences of specific diseases. Ironically, epidemiologists equally concerned with the significance of such clusters have argued for a fundamentally cartographic solution to the same problem. This brief commentary seeks to add to the work of Sabel and his colleagues by comparing these approaches. Its intention is not to criticize but to emphasize the importance of the problem and its relation to medical cartography in general, and GIS-based mapping in particular.

Disease clusters

Since Dr. John Snow’s mapping of cholera in the 19th century (McLeod, 2000), the hope has been that identifying groupings of disease incidence may assist in identifying the cause of illnesses of unknown origin. The problem of clusters of illnesses with uncertain etiology received popular attention in the late 1970s when pockets of cancer clusters were reported near areas of high electromagnetic occurrence (Brodeur, 1989a, b). The question was, and is, whether these congregations reflected a causal relationship or merely random statistical events with no epidemiological significance.

*Corresponding author. Tel.: +1-604-714-0348; fax: +1-604-822-6150. E-mail addresses: [email protected] (T. Koch), [email protected] (K. Denike).


PII: S0277-9536(00)00275-6

Sabel et al. (2000) offer two distinct arguments in this area. The first is that mapping the movement of persons over time as well as in space may be critical for the understanding of cases in which the causal pathogen is unknown. A second, separate argument advocates Kernel estimation in a GIS-based analysis as a mode of investigation. We agree that time-based geography (in this case, length of residence and pattern of movement) is a potentially important tool in the exploration of disease causation and transmission. Our purpose in this short commentary is to suggest an alternate methodology to Kernel estimation that is both cartographic in nature and already advanced by epidemiologists. Through this, we hope to strengthen ties between mapping-based medical geography, on the one hand, and on the other, the broader field of epidemiological studies of disease incidence.

Kerneling

The central problem faced by those investigating disease clusters is that apparent groupings of a specific condition are meaningless out of context. The significance of a cluster exists only in the context of patterns of population distribution and density at one or another scale of analysis. What appear in sparsely populated areas to be clusters may be only random fluctuations without significance (Selvin, Merrill, Schulman et al., 1988). Ten cases of aseptic meningitis may be a typical incidence rate at a large city hospital, for example, but an epidemic if they occur at a small county medical centre. Thus, “cases of disease plotted on a geopolitical map provide a description of the location but do not accurately reflect the risk [of a disease]” (Selvin, Schulman, & Merrill, 1992, p. 769). Distribution of disease therefore must be considered in a population as well as in space before the significance of incidence can be assessed accurately.

Kernel estimation, which permits a smooth estimate of probability density, is one traditional, statistical approach to this problem. It uses relatively standard algorithms designed to estimate bivariate probability density on the assumption that this task is similar to estimating the intensity of spatial point patterning. Pattern identification typically is achieved by identifying the point pattern of cases (clusters) against a backdrop of point-pattern identification of a specific subject (Silverman, 1986). In the case of Snow’s study of cholera-related deaths in London, for example, the disease cluster centered at the Broad Street water pump (the presumed cause of the outbreak) was significant because incidence there was higher than in areas served by other water sources (McLeod, 2000).
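As a minimal sketch of what such an estimate computes, the Python function below evaluates a smoothed intensity of case points at a chosen location. The Gaussian kernel, the bandwidth, the function name and the sample coordinates are illustrative assumptions only; they are not the configuration used by Sabel et al. (2000).

```python
import math

def kernel_intensity(x, y, points, bandwidth):
    """Gaussian kernel estimate of point intensity (events per unit area) at (x, y)."""
    # Normalising constant of a 2-D Gaussian kernel with the given bandwidth.
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2)
    total = 0.0
    for px, py in points:
        squared_distance = (x - px) ** 2 + (y - py) ** 2
        # Each case contributes most at its own location and less with distance.
        total += norm * math.exp(-squared_distance / (2.0 * bandwidth ** 2))
    return total

# Hypothetical case locations in arbitrary map units: a tight group and one outlier.
cases = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (5.0, 5.0)]
near = kernel_intensity(1.0, 1.0, cases, bandwidth=0.5)  # inside the group
far = kernel_intensity(5.0, 5.0, cases, bandwidth=0.5)   # at the isolated case
```

On its own, a surface of this kind describes only where cases concentrate. Relating it to the population at risk requires a comparable estimate for the backdrop population, for example by taking the ratio of the two surfaces.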

The purpose of Kernel estimation, as Bailey and Gatrell (1995, p. 77) noted earlier, is to arrive at first-order properties describing intensity of incidence so that second-order properties, defining spatial dependence or independence, then can be considered. In their map analysis of ALS (motor neurone disease) in Finland, Sabel et al. (2000) use Kernel estimation to arrive at first-order properties and recommend the use of Monte Carlo simulations, or Openshaw’s cluster detection software (Openshaw, Charlton, Wymer, & Craft, 2000), for this second level of analysis. An advantage of this approach is that it transposes fairly standard, statistical methodologies familiar to epidemiologists and statisticians to GIS mapping. There are, however, significant drawbacks.

First, Kernel estimation does not necessarily correct for edge effect, that is, the fall-off of events occurring at the boundary of a geopolitical space. Is a higher incidence here the result of proximity to events outside an administrative boundary but proximate to the case, or is it caused by something within that mapped jurisdiction? A second problem is that the GIS program used by researchers in this study, Arc/Info GIS, is relatively expensive and has a long learning curve. The utility of this approach is thus restricted from the start to those with the financial resources and training to complete a statistical analysis with non-intuitive map-generating software.

A related problem is the necessity of a separate, statistical analysis of second-order properties of spatial dependence or independence. This adds another layer of complexity and cost (in software and training) and introduces new methodological questions. How do we compare, for example, a Monte Carlo simulation approach and Openshaw’s “cluster detection” software? While the authors recommend the latter, its robustness is necessarily suspect because it is based on proprietary and, thus, unpublished algorithms.¹
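To make the Monte Carlo option concrete, the following Python sketch ranks an observed cluster statistic among random relabelings of cases within the population at risk. It is a generic random-labelling test written for illustration, not Openshaw’s software and not necessarily the simulation design Sabel et al. (2000) have in mind; the function names, the grid population, the radius and the seed are all assumptions.

```python
import random

def monte_carlo_p_value(population, case_indices, statistic, n_sims=999, seed=0):
    """Share of random relabelings whose statistic is at least the observed one."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n_cases = len(case_indices)
    observed = statistic([population[i] for i in case_indices])
    at_least_as_extreme = 0
    for _ in range(n_sims):
        # Under the null hypothesis, any n_cases members of the population
        # at risk are equally likely to be the cases.
        simulated_cases = rng.sample(population, n_cases)
        if statistic(simulated_cases) >= observed:
            at_least_as_extreme += 1
    # Count the observed pattern as one of the n_sims + 1 outcomes.
    return (at_least_as_extreme + 1) / (n_sims + 1)

def cases_within(centre, radius):
    """Build a statistic that counts cases within `radius` of `centre`."""
    cx, cy = centre
    def count(points):
        return sum(1 for x, y in points
                   if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2)
    return count

# Hypothetical population at risk: one person at each node of a 10 x 10 grid.
population = [(x, y) for x in range(10) for y in range(10)]
# Hypothetical cases: four people around a suspected source at (0, 0), one elsewhere.
case_indices = [0, 1, 10, 11, 55]
count_near_origin = cases_within((0, 0), 1.5)
p_value = monte_carlo_p_value(population, case_indices, count_near_origin)
```

Because the simulated cases are drawn from the population at risk rather than from empty space, a small p-value here reflects an excess relative to where people actually are, which is the contextual requirement discussed above. The procedure is fully inspectable, in contrast to an unpublished algorithm.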

Finally, Kernel estimation is typically used when there is point-based data identifying, in medical geography, individual cases of a disease condition. This specificity of data is often unavailable, however. In North America, for example, concerns regarding patient privacy often restrict dissemination of point-based disease incidence data suitable for geocoding, and thus may limit the applicability of this type of GIS-based analysis. Even where precise locational data is available, other potentially relevant data is typically offered only by geopolitical region. Thus attempts to relate disease incidence to sociopolitical factors, including poverty and measures of inequality (Wilkinson, 1996), must still locate those incidents within political jurisdictions of varying population sizes. A polygon-based, areal approach will therefore sooner or later be needed.

1 http://www.ccg.leeds.ac.uk/smart/gam. (May 14, 2000).


Cartograms

For their part, epidemiologists have suggested the incidence of disease presence and related risk may be analyzed and mapped using cartograms. This class of maps portrays relationships between data sets by resizing either areal boundaries or proximity measures defining distance or proximity of related, point-based data (Monmonier, 1996, pp. 16–18). The maps therefore are relational representations including simultaneously a resized backdrop and the data to be considered upon it. In the case of disease clusters of unknown etiology, the map may present disease incidence on a map in which population regions (by county, census, postal code, etc.) are resized to reflect population density, for example (Selvin et al., 1988, p. 215). In effect, the manipulation of spatial relations in a map results in a cartographic, Poisson-based analysis. This approach has been used in the mapping of breast cancer risk areas (Selvin, Merrill, Endman, White, & Ragland, 1998), and separately, the analysis of childhood cancers (Selvin et al., 1992) in the San Francisco, CA, area.
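The numerical counterpart of reading cases against population-sized areas can be sketched as a Poisson comparison of observed and expected counts per region. The Python example below is an illustration under stated assumptions: the region names, populations and case counts are invented, the expected count is simply the overall rate multiplied by regional population, and the function names are hypothetical rather than taken from Selvin et al.

```python
import math

def poisson_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    below = sum(math.exp(-expected) * expected ** k / math.factorial(k)
                for k in range(observed))
    return 1.0 - below

def region_tail_probabilities(cases, population):
    """Tail probability of each region's case count under one common rate."""
    overall_rate = sum(cases.values()) / sum(population.values())
    return {name: poisson_tail(cases[name], overall_rate * population[name])
            for name in cases}

# Hypothetical regions: a large city, a town and a small rural district.
population = {"A": 90000, "B": 9000, "C": 1000}
cases = {"A": 9, "B": 1, "C": 5}
tails = region_tail_probabilities(cases, population)
```

With these invented figures, region A has the most cases but fewer than its population would lead one to expect, while the five cases in the small region C are very unlikely under a common rate. A cartogram conveys the same contrast visually by giving A most of the map and C very little.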

There are significant advantages to this fundamentally cartographic approach to the problem of disease clusters. Most importantly, perhaps, it aggressively presents both backdrop and at-risk populations in a manner that is rigorous, transparent, and graphically clear. In addition, the problem of edge effect is minimized in areal cartograms relating disease incidence to the population in specific geopolitical districts. As a result, the cartogram approach from the start permits disease location data to be presented in the context of a flexible range of social and medical data aggregated to census tract, zip code, etc. This facilitates rather than inhibits secondary analysis based on regionally aggregated medical, sociopolitical and economic data.

Finally, we note that this approach permits the use of less expensive GIS mapping tools requiring less training time. Data can be collected and analyzed using entry-level GIS programs like ArcView, for example, rather than the more complex and more expensive Arc/Info. And while the generation of rigorous cartograms has been traditionally perceived as a complex, non-intuitive task (Selvin et al., 1988, 1992), recent advances in GIS permit their automation. A script suitable for use in ArcView GIS software, one based on Olson’s algorithm for non-contiguous cartograms (Olson, 1976), is available for general review and downloading, for example.² As well, other researchers have presented different algorithmic sets suitable for cartogram generation (Kocmoud & House, 2000).
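The core operation of a non-contiguous cartogram is simple enough to sketch in a few lines of Python. In the version below, each region keeps its shape and position but is shrunk about its own centroid so that its area becomes proportional to its population, with the densest region left at its original size. This is a simplified illustration under those assumptions; it is not the ArcView script mentioned above and is not claimed to reproduce Olson’s (1976) published procedure in detail. The function names and the two square regions are hypothetical.

```python
import math

def polygon_area(vertices):
    """Unsigned area of a simple polygon (shoelace formula)."""
    n = len(vertices)
    twice_area = sum(vertices[i][0] * vertices[(i + 1) % n][1]
                     - vertices[(i + 1) % n][0] * vertices[i][1]
                     for i in range(n))
    return abs(twice_area) / 2.0

def polygon_centroid(vertices):
    """Centroid of a simple polygon, for either vertex orientation."""
    n = len(vertices)
    cross_sum = cx = cy = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        cross_sum += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    # cross_sum is twice the signed area, so 6 * area equals 3 * cross_sum.
    return cx / (3.0 * cross_sum), cy / (3.0 * cross_sum)

def noncontiguous_cartogram(regions, values):
    """Shrink each region about its centroid so that area is proportional to value."""
    density = {name: values[name] / polygon_area(verts)
               for name, verts in regions.items()}
    max_density = max(density.values())
    resized = {}
    for name, verts in regions.items():
        # Linear scale factor; areas scale by its square. 0 <= scale <= 1.
        scale = math.sqrt(density[name] / max_density)
        cx, cy = polygon_centroid(verts)
        resized[name] = [(cx + scale * (x - cx), cy + scale * (y - cy))
                         for x, y in verts]
    return resized

# Two hypothetical districts of equal land area but unequal population.
regions = {
    "dense": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "sparse": [(3, 0), (5, 0), (5, 2), (3, 2)],
}
populations = {"dense": 400, "sparse": 100}
cartogram = noncontiguous_cartogram(regions, populations)
```

Disease cases can then be plotted on the resized districts, so that a given number of cases looks crowded only where the population behind it is small.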

Discussion

In the analysis of clusters of medical conditions with an unknown etiology, one might say that Kernel estimation may undershoot the research mark that cartogram analysis is likely to overshoot. But where the cause of a condition is unknown, we submit it is often more useful to analyze more broadly rather than too narrowly. Another advantage of the cartogram approach is the manner in which it incorporates primary and contextual data in a rigorous and unified manner. A limit of the approach in some cases may be its generality. Depending on the scale and nature of specific data sets, it may lack sufficient specificity to permit the detailed, second-order analysis a definitive declaration of causation would require.

Obviously, it would be most useful to compare the benefits (and limits) of both approaches in the analysis of a single problem of disease clustering. By acknowledging the complementary potential of both approaches, however, researchers may gain better insights into problematic disease clusters. There is also the hope that a more robust and varied comparison of these methodologies will advance the utility of medical cartography to the study of disease etiology, and through this, most importantly, our understanding of problematic disease states themselves.

References

Bailey, T. C., & Gatrell, A. C. (1995). Interactive spatial data analysis. New York: Wiley.

Brodeur, P. (1989a). Annals of radiation: The hazards of electronic fields. New Yorker, June 12–29.

Brodeur, P. (1989b). Currents of death: Power lines, computer terminals, and the attempt to cover up their threat to your health. New York: Simon and Schuster.

Kocmoud, C. J., & House, D. H. (2000). A constraint-based continuous cartogram method. http://www-viz.tamu.edu/faculty/house/carograms/index.html.

McLeod, K. S. (2000). Our sense of Snow; the myth of John Snow in medical geography. Social Science and Medicine, 50, 923–937.

Monmonier, M. (1996). How to lie with maps (2nd ed.). Chicago: University of Chicago Press.

Olson, J. M. (1976). Noncontiguous area cartograms. The Professional Geographer, 28, 317–380.

Openshaw, S., Charlton, M., Wymer, C., & Craft, A. W. (2000). GAM/K software. http://www.ccg.leeds.ac.uk/smart/gam.

Sabel, C. E., Gatrell, A. C., Loytonen, M., Maasilta, P., & Joelainen, M. (2000). Modeling exposure opportunities: Estimating relative risk for motor neurone disease in Finland. Social Science and Medicine, 50, 1121–1137.

Selvin, S., Merrill, D., Schulman, J., Sacks, S., Bedell, L., & Wong, L. (1988). Transformations of maps to investigate clusters of disease. Social Science and Medicine, 26, 215–222.

2 http://gis.esri.com/arcscripts/details.cfm?CFGRIDKEY=-1791867381.


Selvin, S., Merrill, D. W., Endman, C., White, M., & Ragland, K. (1998). Breast cancer detection: Maps of two San Francisco Bay Area counties. American Journal of Public Health, 88, 1186–1192.

Selvin, S., Schulman, J., & Merrill, D. (1992). Distance and risk measures for the analysis of spatial data: A study of childhood cancers. Social Science and Medicine, 34, 767–777.

Silverman, B. W. (1986). Density estimation for statistics and data analysis. New York: Chapman and Hall.

Wilkinson, R. G. (1996). Unhealthy societies: The afflictions of inequality. New York: Routledge.
