GIS approaches to the problem of disease clusters: a brief commentary
Social Science & Medicine 52 (2001) 1751–1754
Tom Koch*, Ken Denike
Department of Geography, University of British Columbia, 1984 West Mall, Vancouver, BC, Canada V6T 1Z2
Abstract
This commentary considers issues raised in a recent article on GIS-based approaches to modeling disease clusters, ‘Modeling exposure opportunities’ (Sabel, Gatrell, Loytonen et al., 2000. Social Science and Medicine, 50, 1121–1137), and the general problem of mapping disease clusters. It notes that the authors advocate a fundamentally statistical approach, Kernel estimation, to map the occurrence of a specific illness whose etiology is unknown. Epidemiologists, ironically, have advocated a fundamentally cartographic solution, the cartogram, in addressing the general problem of disease clusters. The advantages and limits of both approaches are reviewed and the potential for their comparison in a single study suggested. Most importantly, perhaps, the commentary seeks to join the epidemiological and medical geographic literatures as they pertain to this analytic problem and to medical cartography’s potential (GIS-based or traditional) to understand disease etiology. © 2001 Elsevier Science Ltd. All rights reserved.
Keywords: Cartography; Disease clusters; Epidemiology; GIS
Introduction
In a recent article exploring “clusters” of motor neuron disease (amyotrophic lateral sclerosis), Sabel, Gatrell, Loytonen, Maasilta, and Joelainen (2000) argued for the use of Geographic Information Systems (GIS) as a tool for epidemiological investigation. Ambitiously, they suggested GIS mapping procedures offer specific and unique approaches to the problem of disease clusters and their significance. “In the absence of known pathogens,” they argue, “one should ideally examine the myriad patterns of movement of the population at risk. This is the geography of commuting, socializing, travel and migration; of places of residence, vacation and work” (Sabel et al., 2000, p. 1122).
Their approach transposes into the GIS environment a traditional statistical approach, Kernel estimation, as a robust methodology for the understanding of spatially proximate incidences of specific diseases. Ironically, epidemiologists equally concerned with the significance of such clusters have argued for a fundamentally cartographic solution to the same problem. This brief commentary seeks to add to the work of Sabel and his colleagues by comparing these approaches. Its intention is not to criticize but to emphasize the importance of the problem and its relation to medical cartography in general, and GIS-based mapping in particular.
Disease clusters
Since Dr. John Snow’s mapping of cholera in the 19th century (McLeod, 2000), the hope has been that identifying groupings of disease incidence may assist in identifying the cause of illnesses of unknown origin. The problem of clusters of illnesses with uncertain etiology received popular attention in the late 1970s when pockets of cancer clusters were reported near areas of high electromagnetic occurrence (Brodeur, 1989a, b). The question was, and is, whether these congregations reflected a causal relationship or merely random statistical events with no epidemiological significance.
*Corresponding author. Tel.: +1-604-714-0348; fax: +1-604-822-6150.
PII: S0277-9536(00)00275-6
Sabel et al. (2000) offer two distinct arguments in this area. The first is that mapping the movement of persons over time as well as in space may be critical for the understanding of cases in which the causal pathogen is unknown. A second, separate argument advocates Kernel estimation in a GIS-based analysis as a mode of investigation. We agree that time-based geography (in this case, length of residence and pattern of movement) is a potentially important tool in the exploration of disease causation and transmission. Our purpose in this short commentary is to suggest an alternate methodology to Kernel estimation that is both cartographic in nature and already advanced by epidemiologists. Through this, we hope to strengthen ties between mapping-based medical geography, on the one hand, and on the other, the broader field of epidemiological studies of disease incidence.
Kerneling
The central problem faced by those investigating disease clusters is that apparent groupings of a specific condition are meaningless out of context. The significance of a cluster exists only in the context of patterns of population distribution and density at one or another scale of analysis. What appear in sparsely populated areas to be clusters may be only random fluctuations without significance (Selvin, Merrill, Schulman et al., 1988). Ten cases of aseptic meningitis may be a typical incidence rate at a large city hospital, for example, but an epidemic if they occur at a small county medical centre. Thus, “cases of disease plotted on a geopolitical map provide a description of the location but do not accurately reflect the risk [of a disease]” (Selvin, Schulman, & Merrill, 1992, p. 769). Distribution of disease therefore must be considered in a population as well as in space before the significance of incidence can be assessed accurately.
Kernel estimation, which permits a smooth estimate of probability density, is one traditional, statistical approach to this problem. It uses relatively standard algorithms designed to estimate bivariate probability density on the assumption that this task is similar to estimating the intensity of a spatial point pattern. Pattern identification typically is achieved by identifying the point pattern of cases (clusters) against a backdrop point pattern for a specific subject (Silverman, 1986). In the case of Snow’s study of cholera-related deaths in London, for example, the disease cluster centered at the Broad Street water pump (the presumed cause of the outbreak) was significant because incidence there was higher than in other areas served by other water sources (McLeod, 2000).
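To make the idea of comparing a case pattern against a backdrop pattern concrete, the following is a minimal sketch of a Gaussian kernel intensity estimate and a case-to-backdrop density ratio. It is not the procedure used by Sabel et al. (2000): the Gaussian kernel, the fixed bandwidth, the function names and all coordinates are assumptions chosen for illustration only.

```python
import math


def kernel_intensity(x, y, points, bandwidth):
    """Gaussian kernel estimate of point intensity at location (x, y)."""
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2)
    total = 0.0
    for px, py in points:
        d2 = (x - px) ** 2 + (y - py) ** 2
        total += norm * math.exp(-d2 / (2.0 * bandwidth ** 2))
    return total


def density_ratio(x, y, cases, backdrop, bandwidth):
    """Ratio of case density to backdrop density at (x, y).

    Each intensity is divided by its number of points so that the two
    surfaces are comparable probability densities.
    """
    case_density = kernel_intensity(x, y, cases, bandwidth) / len(cases)
    backdrop_density = kernel_intensity(x, y, backdrop, bandwidth) / len(backdrop)
    return case_density / backdrop_density


# Hypothetical data: an evenly spread backdrop population on a 5 x 5 grid
# and four cases, three of which sit close together near (1, 1).
backdrop = [(float(i), float(j)) for i in range(5) for j in range(5)]
cases = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (3.0, 3.5)]

near = density_ratio(1.0, 1.0, cases, backdrop, bandwidth=0.75)
far = density_ratio(4.0, 0.0, cases, backdrop, bandwidth=0.75)
print(f"ratio near the grouping: {near:.2f}")
print(f"ratio far from it:       {far:.4f}")
```

A ratio well above 1 marks a location where cases are denser than the backdrop would suggest. The result is sensitive to the bandwidth, which is one reason the choice of smoothing parameter matters in practice.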
The purpose of Kernel estimation, as Bailey and Gatrell (1995, p. 77) noted earlier, is to arrive at first-order properties describing intensity of incidence so that second-order properties, defining spatial dependence or independence, can then be considered. In their map analysis of ALS (motor neurone disease) in Finland, Sabel et al. (2000) use Kernel estimation to arrive at first-order properties and recommend the use of Monte Carlo simulations, or Openshaw’s cluster detection software (Openshaw, Charlton, Wymer, & Craft, 2000), for this second level of analysis. An advantage of this approach is that it transposes fairly standard statistical methodologies familiar to epidemiologists and statisticians to GIS mapping. There are, however, significant drawbacks.
First, Kernel estimation does not necessarily correct for edge effect, the fall-off of events occurring at the boundary of a geopolitical space. Is a higher incidence here the result of proximity to events outside an administrative boundary but proximate to the case, or is it caused by something within that mapped jurisdiction? A second problem is that the GIS program used by researchers in this study, Arc/Info GIS, is relatively expensive and has a long learning curve. The utility of this approach is thus restricted from the start to those with the financial resources and training to complete a statistical analysis with non-intuitive map-generating software.
A related problem is the necessity of a separate, statistical analysis of second-order properties of spatial dependence or independence. This adds another layer of complexity and cost (in software and training), and introduces new methodological questions. How do we compare, for example, a Monte Carlo simulation approach and Openshaw’s “cluster detection” software? While the authors recommend the latter, its robustness is necessarily suspect because it is based on proprietary and, thus, unpublished algorithms.1
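By way of contrast with unpublished algorithms, a Monte Carlo test can be written out in full. The sketch below is one simple, openly specified variant and not the procedure recommended by Sabel et al. (2000). The test statistic (mean nearest-neighbour distance among cases), the random relabelling of a backdrop population, the function names and the data are all assumptions made for illustration.

```python
import math
import random


def mean_nearest_neighbour(points):
    """Mean distance from each point to its nearest other point."""
    total = 0.0
    for i, (x, y) in enumerate(points):
        total += min(
            math.hypot(x - qx, y - qy)
            for j, (qx, qy) in enumerate(points)
            if j != i
        )
    return total / len(points)


def monte_carlo_p_value(cases, population, n_sims=999, seed=0):
    """Share of random case sets at least as tightly grouped as the observed one.

    Each simulation draws len(cases) locations at random from the
    population, which mimics disease striking without regard to place.
    """
    rng = random.Random(seed)
    observed = mean_nearest_neighbour(cases)
    as_extreme = 0
    for _ in range(n_sims):
        simulated = rng.sample(population, len(cases))
        if mean_nearest_neighbour(simulated) <= observed:
            as_extreme += 1
    # Count the observed pattern as one of the n_sims + 1 realisations.
    return (as_extreme + 1) / (n_sims + 1)


# Hypothetical population at risk: one resident at each cell of a 10 x 10 grid.
population = [(x, y) for x in range(10) for y in range(10)]
clustered_cases = [(2, 2), (2, 3), (3, 2), (3, 3), (2, 4)]
dispersed_cases = [(0, 0), (0, 9), (9, 0), (9, 9), (4, 4)]

p_clustered = monte_carlo_p_value(clustered_cases, population)
p_dispersed = monte_carlo_p_value(dispersed_cases, population)
print("p-value, clustered cases:", p_clustered)
print("p-value, dispersed cases:", p_dispersed)
```

Because every step is visible, a reader can check how the p-value depends on the statistic chosen, the number of simulations and the backdrop population, which is the kind of scrutiny a proprietary package does not allow.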
Finally, Kernel estimation is typically used when there is point-based data identifying, in medical geography, individual cases of a disease condition. This specificity of data is often unavailable, however. In North America, for example, concerns regarding patient privacy often restrict dissemination of point-based disease incidence data suitable for geocoding, and thus may limit the applicability of this type of GIS-based analysis. Even where precise locational data is available, other, potentially relevant data is typically offered only by geopolitical region. Thus attempts to relate disease incidence to sociopolitical factors, including poverty and measures of inequality (Wilkinson, 1996), must still locate those incidents within political jurisdictions of varying population sizes. A polygon-based, areal approach will therefore sooner or later be needed.
1 http://www.ccg.leeds.ac.uk/smart/gam. (May 14, 2000).
Cartograms
For their part, epidemiologists have suggested that the incidence of disease and related risk may be analyzed and mapped using cartograms. This class of maps portrays relationships between data sets by resizing either areal boundaries or the measures defining distance or proximity of related, point-based data (Monmonier, 1996, pp. 16–18). The maps therefore are relational representations including simultaneously a resized backdrop and the data to be considered upon it. In the case of disease clusters of unknown etiology, the map may present disease incidence on a map in which population regions (by county, census tract, postal code, etc.) are resized to reflect population density, for example (Selvin et al., 1988, p. 215). In effect, the manipulation of spatial relations in a map results in a cartographic, Poisson-based analysis. This approach has been used in the mapping of breast cancer risk areas (Selvin, Merrill, Endman, White, & Ragland, 1998) and, separately, the analysis of childhood cancers (Selvin et al., 1992) in the San Francisco, CA, area.
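The Poisson reasoning behind this approach can be illustrated numerically with the earlier example of ten cases in a large city versus a small county. The sketch below assumes a uniform baseline risk, so that the expected count in a district is the baseline rate times its population, and asks how surprising the observed count would be. The baseline rate, the populations and the function name are hypothetical and are not taken from Selvin et al.

```python
import math


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    cumulative = 0.0
    term = math.exp(-expected)  # P(X = 0)
    for k in range(observed):
        cumulative += term
        term *= expected / (k + 1)  # P(X = k + 1) from P(X = k)
    return 1.0 - cumulative


# Hypothetical baseline: 2 cases per 100,000 residents over the study period.
baseline_rate = 2.0 / 100_000

# Hypothetical districts: name -> (population, observed cases).
districts = {
    "large city": (500_000, 10),
    "small county": (20_000, 10),
}

tail_probability = {}
for name, (population, observed) in districts.items():
    expected = baseline_rate * population
    tail_probability[name] = poisson_upper_tail(observed, expected)
    print(f"{name}: expected {expected:.1f}, observed {observed}, "
          f"P(X >= observed) = {tail_probability[name]:.3g}")
```

Under these assumed figures, the same ten cases are unremarkable in the city but extremely unlikely in the county. A population-scaled map conveys this visually by giving the county far less space than the city.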
There are significant advantages to this fundamentally cartographic approach to the problem of disease clusters. Most importantly, perhaps, it aggressively presents both backdrop and at-risk populations in a manner that is rigorous, transparent, and graphically clear. In addition, the problem of edge effect is minimized in areal cartograms relating disease incidence to the population in specific geopolitical districts. As a result, the cartogram approach from the start permits disease location data to be presented in the context of a flexible range of social and medical data aggregated to census tract, zip code, etc. This facilitates rather than inhibits secondary analysis based on regionally aggregated medical, sociopolitical and economic data.
Finally, we note that this approach permits the use of less expensive GIS mapping tools requiring less training time. Data can be collected and analyzed using entry-level GIS programs like ArcView, for example, rather than the more complex and more expensive Arc/Info. And while the generation of rigorous cartograms has traditionally been perceived as a complex, non-intuitive task (Selvin et al., 1988, 1992), recent advances in GIS permit their automation. A script suitable for use in ArcView GIS software, one based on Olson’s algorithm for non-contiguous cartograms (Olson, 1976), is available for general review and downloading, for example.2 As well, other researchers have presented different algorithmic sets suitable for cartogram generation (Kocmoud & House, 2000).
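The general idea of a non-contiguous cartogram can be sketched in a few lines. The code below is not the ArcView script cited above and may differ in detail from Olson's published procedure. It assumes that the densest unit keeps its original size and that every other unit is shrunk about a fixed interior point until area is proportional to population. The unit names, shapes and populations are hypothetical.

```python
import math


def polygon_area(vertices):
    """Unsigned polygon area by the shoelace formula (vertices in order)."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def anchor_point(vertices):
    """Mean of the vertices, used here as a simple stand-in for the centroid."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return sum(xs) / len(xs), sum(ys) / len(ys)


def noncontiguous_cartogram(units):
    """Resize each unit so that area is proportional to population.

    units maps a name to (vertices, population). The densest unit keeps
    its size; the others shrink in place, opening gaps between units.
    """
    density = {name: pop / polygon_area(verts)
               for name, (verts, pop) in units.items()}
    max_density = max(density.values())
    resized = {}
    for name, (verts, _pop) in units.items():
        # Area scales with the square of the linear factor.
        factor = math.sqrt(density[name] / max_density)
        cx, cy = anchor_point(verts)
        resized[name] = [(cx + factor * (x - cx), cy + factor * (y - cy))
                         for x, y in verts]
    return resized


# Hypothetical districts: rectangles with very different densities.
units = {
    "urban": ([(0, 0), (2, 0), (2, 2), (0, 2)], 8000),
    "suburb": ([(2, 0), (6, 0), (6, 5), (2, 5)], 10000),
    "rural": ([(6, 0), (16, 0), (16, 10), (6, 10)], 5000),
}

resized = noncontiguous_cartogram(units)
for name, (verts, pop) in units.items():
    print(name, "area per resident:", polygon_area(resized[name]) / pop)
```

After resizing, every unit has the same area per resident, so cases plotted within the resized units can be read against population rather than against land area.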
Discussion
In the analysis of clusters of medical conditions with an unknown etiology, one might say that Kernel estimation may undershoot the research mark that cartogram analysis is likely to overshoot. But where the cause of a condition is unknown, we submit it is often more useful to analyze too broadly rather than too narrowly. Another advantage of the cartogram approach is the manner in which it incorporates primary and contextual data in a rigorous and unified manner. A limit of the approach in some cases may be its generality. Depending on the scale and nature of specific data sets, it may lack sufficient specificity to permit the detailed, second-order analysis a definitive declaration of causation would require.
Obviously, what would be most useful would be to compare the benefits (and limits) of both approaches in the analysis of a single problem of disease clustering. In the meantime, acknowledging the complementary potential of both approaches may yield better insights into problematic disease clusters. There is also the hope that a more robust and varied comparison of these methodologies will advance the utility of medical cartography to the study of disease etiology, and through this, most importantly, our understanding of problematic disease states themselves.
References
Bailey, T. C., & Gatrell, A. C. (1995). Interactive spatial data
analysis. New York: Wiley.
Brodeur, P. (1989a). Annals of radiation: The hazards of
electronic fields. New Yorker, June 12–29.
Brodeur, P. (1989b). Currents of death: Power lines, computer
terminals, and the attempt to cover up their threat to your
health. New York: Simon and Schuster.
Kocmoud, C. J., & House, D. H. (2000). A constraint-based
continuous cartogram method. http://www-viz.tamu.edu/faculty/house/carograms/index.html.
McLeod, K. S. (2000). Our sense of Snow; the myth of John
Snow in medical geography. Social Science and Medicine,
50, 923–937.
Monmonier, M. (1996). How to lie with maps (2nd ed). Chicago:
University of Chicago Press.
Olson, J. M. (1976). Noncontiguous area cartograms. The
Professional Geographer, 28, 317–380.
Openshaw, S., Charlton, M., Wymer, C., & Craft, A.W. (2000).
GAM/K software. http://www.ccg.leeds.ac.uk/smart/gam.
Sabel, C. E., Gatrell, A. C., Loytonen, M., Maasilta, P., &
Joelainen, M. (2000). Modeling exposure opportunities:
Estimating relative risk for motor neurone disease in
Finland. Social Science and Medicine, 50, 1121–1137.
Selvin, S., Merrill, D., Schulman, J., Sacks, S., Bedell, L., &
Wong, L. (1988). Transformations of maps to investigate
clusters of disease. Social Science and Medicine, 26, 215–222.
2 http://gis.esri.com/arcscripts/details.cfm?CFGRIDKEY=-1791867381.
Selvin, S., Merrill, D. W., Endman, C., White, M., & Ragland,
K. (1998). Breast cancer detection: Maps of two San
Francisco Bay Area counties. American Journal of Public
Health, 88, 1186–1192.
Selvin, S., Schulman, J., & Merrill, D. (1992). Distance and risk
measures for the analysis of spatial data: A study of
childhood cancers. Social Science and Medicine, 34,
767–777.
Silverman, B. W. (1986). Density estimation for statistics and data
analysis. New York: Chapman and Hall.
Wilkinson, R. G. (1996). Unhealthy societies: The afflictions of
inequality. New York: Routledge.