SPATIAL ANALYSIS APPLIED TO EPIDEMIOLOGY Seminar final report Ana Carolina Cuéllar


SPATIAL ANALYSIS

APPLIED TO

EPIDEMIOLOGY Seminar final report

Ana Carolina Cuéllar


Index

1. Introduction

1.1 What is Spatial Analysis?

1.2 Epidemiology and Spatial Analysis

1.3 Basic concepts in spatial analysis

1.4 Data for Spatial Epidemiological Studies

2. Spatial analysis techniques

2.1 Point Pattern Analysis

2.1.1 Ripley's K-function

2.1.2 Kernel estimation

2.2 Areal Patterns

2.2.1 Spatial autocorrelation indexes

2.2.1.1 Global indexes

2.2.1.2 Local autocorrelation

2.3 Geostatistics

2.3.1 Kriging

3. Disease mapping

3.1 Why is disease mapping important?

4. Some Spatial Analysis software that can be used in Epidemiology

5. Some applications of Spatial Analysis (SA) in Epidemiology

6. References


1. Introduction

1.1 What is Spatial Analysis?

Spatial analysis is a broad term that comprises both statistical (spatial statistical) and

non-statistical methods, and starts with the application of exploratory techniques to seek a

good description of the data (like any traditional analysis), and thus to help define

hypotheses and choose appropriate models. The main characteristic of

spatial statistical analysis, compared with traditional statistical models, is that the places

where the events occurred are explicitly represented in the analysis (Pina et al.,

2010).

Spatial analysis (SA) is sometimes defined as a collection of techniques for analyzing

geographical events where the results of analysis depend on the spatial arrangement of the

events. By the term ‘geographical event’ (henceforth, ‘event’) is meant a collection of point,

line or area objects, located in geographical space, attached to which are a set of (one or

more) attribute values. In contrast to other forms of analysis, therefore, SA requires

information both on attribute values and the geographical locations of the objects to which

the collection of attributes is attached.

Based on the systematic collection of quantitative information, the aims of SA are: (1)

the careful and accurate description of events in geographical space (including the

description of pattern); (2) systematic exploration of the pattern of events and the

association between events in space in order to gain a better understanding of the

processes that might be responsible for the observed distribution of events; (3) improving

the ability to predict and control events occurring in geographical space (Haining, 1994).

1.2 Epidemiology and Spatial Analysis

As in any area of applied statistics, the definition and application of appropriate

inferential techniques require a balanced understanding of the questions of interest, the

data available or attainable, and probabilistic models defining or approximating the data

generating process. The central question of interest in most studies in epidemiology is the

identification of factors increasing or decreasing the individual risk of disease as observed


in at-risk populations or samples from the at-risk population. In spatial studies, we refine

this central question to explore spatial variations in the risk of disease in order to identify

locations (and more importantly, individuals) associated with higher risk of the disease

(Waller, 2010).

Geographical epidemiology can be defined as the description of spatial patterns of

disease morbidity and mortality, part of descriptive epidemiological studies, with the aim of

formulating hypotheses about the etiology of diseases. Another term for the same

concept, one that relates epidemiology to spatial analysis, is spatial epidemiology. This is the

description and analysis of geographic variations in disease with respect to demographic,

environmental, behavioral, socioeconomic, genetic, and infectious risk factors. Spatial

epidemiology extends the rich tradition of ecologic studies that use explanations of the

distribution of diseases in different places to better understand the etiology of disease.

1.3 Basic concepts in spatial analysis

The important role of location for spatial data, both in terms of absolute location (i.e.

coordinates in a space) and in terms of relative location (spatial arrangement,

topology), has major implications for the way in which statistical analyses may be carried

out. In fact, location leads to two different types of so-called spatial effects: spatial

dependence and spatial heterogeneity. The former results directly from the First Law of

Geography, which holds that near things are more related than distant things. This law tends

to produce observations that are spatially clustered or, in other words, yields samples of

geographical data that are not independent. From a

geographical perspective, this spatial dependence is the rule rather than the exception, and

it conflicts with the usual assumption of independent observations in statistics. The

dependence in spatial data is often referred to as spatial autocorrelation. The second, but

equally important, spatial effect is related to spatial (or regional) differentiation, which

follows from the intrinsic uniqueness of each location. Such spatial heterogeneity (or

non-stationarity) may be evidenced in spatial regimes for variables, functional forms or model

coefficients (Anselin, 1993).

The computational expression of the concept of spatial dependence is spatial autocorrelation. This term is derived from the statistical concept of correlation, which is used to measure the relationship between two random variables. The prefix "auto" indicates that the correlation is measured on the same random variable, observed at various locations in space. Different indicators can be used to measure spatial autocorrelation, all of them based on how spatial dependence varies when the values of a sample are compared with those of its neighbors. Such an index typically varies from -1 to 1. Values close to zero indicate the absence of significant spatial autocorrelation between the objects' values and their neighbors. Positive values for the index indicate positive spatial autocorrelation, i.e. the value of the attribute of an object tends to be similar to the values of its neighbors. Negative values for the index, in turn, indicate negative autocorrelation (Figure 1).

Figure 1. Spatial association and correlation (Extracted from Lai, So & Chan, 2009)

Another important concept is "stationarity". If the mean of a process whose spatial occurrence is being studied is not approximately constant throughout the region, the process is said not to have "stationarity"; that is, the process is a non-stationary one.

The "stationarity" concept leads us to a concept which is exclusive to spatial statistics: isotropy and anisotropy. A process is isotropic when its behavior is the same in all directions, that is, when the spatial dependence is the same in the north-south direction as in the east-west direction. An example of an anisotropic process is the population density of Brazil: the east-west decrease in density towards the interior of the country is more intense than the one found in the north-south direction (Figure 2).

Figure 2. Isotropy and Anisotropy (Extracted from Santos & Souza, 2007).
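As an illustration of how an autocorrelation index compares each value with its neighbors, the following is a minimal sketch of one common global index, Moran's I. The text does not name a specific index at this point, so the choice of Moran's I, the function names, the row-of-areas neighbourhood and the example values are all assumptions made for the example.

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I for values x_i and spatial weights w_ij (w_ii = 0)."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()                       # deviations from the mean
    num = (w * np.outer(z, z)).sum()       # sum over i, j of w_ij * z_i * z_j
    den = (z ** 2).sum()                   # sum over i of z_i^2
    return (len(x) / w.sum()) * num / den

def chain_weights(n):
    """Hypothetical neighbourhood: n areas in a row, adjacent areas are neighbors."""
    w = np.zeros((n, n))
    idx = np.arange(n - 1)
    w[idx, idx + 1] = 1.0
    w[idx + 1, idx] = 1.0
    return w

w = chain_weights(8)
clustered = [1, 1, 1, 1, 9, 9, 9, 9]       # similar values side by side
alternating = [1, 9, 1, 9, 1, 9, 1, 9]     # dissimilar values side by side

i_clustered = morans_i(clustered, w)
i_alternating = morans_i(alternating, w)
print("clustered:", round(i_clustered, 3))
print("alternating:", round(i_alternating, 3))
```

The clustered arrangement gives a positive index and the alternating arrangement a negative one, matching the interpretation of positive and negative autocorrelation given above.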


1.4 Data for Spatial Epidemiological Studies

There are usually two important types of spatial data: point and area data. Each item of

health data (including population, environmental exposure, mortality and morbidity) may

be connected with a point (a precise spatial position such as a home or a street address) or with an

area, which could be defined as a spatial region by postcode, ward, local authority, province

or country. A public health specialist may also come across spatial data in the form of a

continuous surface, such as the statistical surfaces of pollution interpolated from fixed-point

characteristics (Rezaeian et al., 2006).

Although it is possible to obtain point-based data representing disease occurrences, a

point distribution map is difficult to interpret and often not a desirable option for policy

makers. Moreover, such detailed data are not suitable for public release, given concerns

over personal privacy and data confidentiality. Units of aggregation typically used by many

public health agencies are census enumeration units. These not only provide an acceptable

solution to ensure the protection of data privacy and the individual’s anonymity but also

allow for the incorporation of demographic and socioeconomic analysis of the enumeration

units within which disease events take place.

Point representation is used to portray health data at the most detailed level of

geographic space. In spatial epidemiological studies, the plotting of disease locations as

points may reveal its distributional pattern, but points are not sufficient to disclose the

possible causes or interactions with other factors. Further analyses incorporating

sociodemographic and environmental factors are usually desirable. These analyses can

require disease counts by locations to be aggregated to some census enumeration units

(e.g., province or state, county, township, village) where summary statistics on the

socioeconomic composition of these units (e.g., age and gender groups, median income,

educational attainment) can be studied to supplement the analyses.

The aggregation of point data into areal data “ignores” a large amount of locational

information in the observed point distribution. This aggregation may inadvertently mask

true hot spots as high frequencies. The process may also result in a more “uniform” or

smoothed areal distribution of events across space than would have been observed through

point patterns (Figure 3).


Figure 3. Aggregation of epidemiological observations for mapping (Extracted from Lai, So

& Chan, 2009)
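The smoothing effect of aggregation can be sketched numerically. The example below is only an illustration under stated assumptions: the case locations are simulated (a hypothetical tight cluster plus scattered background cases in a 10 x 10 region), square grid cells stand in for census enumeration units, and the helper names `aggregate` and `peak_ratio` are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical case locations: a tight cluster of 30 cases around (4.5, 4.5)
# plus 70 background cases scattered over the 10 x 10 study region.
cluster = rng.normal(loc=[4.5, 4.5], scale=0.15, size=(30, 2))
background = rng.uniform(0.0, 10.0, size=(70, 2))
cases = np.vstack([cluster, background])

def aggregate(points, cell_size, extent=10.0):
    """Count points per square cell of side cell_size covering [0, extent]^2."""
    edges = np.arange(0.0, extent + cell_size, cell_size)
    counts, _, _ = np.histogram2d(points[:, 0], points[:, 1], bins=[edges, edges])
    return counts

def peak_ratio(counts):
    """How strongly the busiest unit stands out from the average unit."""
    return counts.max() / counts.mean()

fine = aggregate(cases, cell_size=1.0)     # 100 small areal units
coarse = aggregate(cases, cell_size=5.0)   # 4 large areal units

print("fine units, peak/mean:", round(peak_ratio(fine), 2))
print("coarse units, peak/mean:", round(peak_ratio(coarse), 2))
```

With small units the cluster stands out sharply from the average count, while with large units the same cases produce a much flatter areal distribution.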

2. Spatial analysis techniques

2.1 Point Pattern Analysis

The basic working units of disease data in a GIS include point (e.g., patient location),

line (e.g., transmission route), and area (e.g., disease rate by country). Among these three

data units, point data representing disease locations are the most basic and fundamental in

spatial epidemiological studies. Point pattern analysis in spatial epidemiology concerns the

distribution of disease events in space. At the elementary level, the spread of a disease in a

community is revealed through the plotting of disease occurrences (at the residential

locations of infected individuals), enabled by the geocoding or address-matching functions of a

GIS. Point-by-point plotting is the simplest form of mapping disease occurrences (Lai, So &

Chan, 2009).

A pattern is the feature of a set of points that describes the location of these points in terms of the relative distance between each point and the others (Upton and Fingleton, 1985). A central aspect for understanding spatial statistics is the notion of a random pattern. A random pattern implies that any region of the plane of a given area has the same probability of containing a point, which is the defining property of a Poisson process. In general, the assumption that the point pattern presents a random distribution ("complete spatial randomness", CSR) will be the null hypothesis for the analysis. The alternative scenarios are a contagious or aggregated distribution, and an overdispersed or regular one (Figure 4) (Rot, 2006).

Figure 4. Random pattern or Poisson pattern (left), aggregated pattern (center), regular pattern (right) (extracted from Rot, 2006).

The nature of the pattern generated by biological processes can be affected by the scale at which the process is observed. Most natural environments show heterogeneity at a scale large enough to allow the emergence of aggregated patterns. On a smaller scale, the environmental variation may be less pronounced and the pattern will be determined by the intensity and the nature of the interactions between individuals.

Under the assumption that the process is stationary (uniform, or invariant to translation) and isotropic (invariant to rotation), the main features of a point process can be summarized by its first-order property (λ or intensity: the expected number of points per unit of area in any locality) and by its second-order property, which describes the relationships between pairs of points (e.g., the probability of finding a point in the vicinity of another). In the case of regular or uniform patterns, the probability of finding a point in the vicinity of another is lower than it would be in a random pattern, while in aggregated patterns the probability is greater. The most popular estimator of second-order properties is Ripley's K function, which gives an estimate at all scales (Rot, 2006).
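The null model of complete spatial randomness and the first-order property λ can be illustrated with a short simulation. This is a sketch under assumptions: the rectangular study region, the intensity value of 2 events per unit area and the function name `simulate_csr` are chosen only for the example.

```python
import numpy as np

def simulate_csr(intensity, width, height, rng):
    """Simulate a homogeneous Poisson process on a width x height rectangle."""
    n = rng.poisson(intensity * width * height)   # random total number of events
    x = rng.uniform(0.0, width, n)                # locations are uniform over
    y = rng.uniform(0.0, height, n)               # the region and independent
    return np.column_stack([x, y])

rng = np.random.default_rng(42)
points = simulate_csr(intensity=2.0, width=10.0, height=10.0, rng=rng)

# First-order property: estimated intensity = events per unit area.
lambda_hat = len(points) / (10.0 * 10.0)

# Under CSR, regions of equal area have the same expected count.
left = int((points[:, 0] < 5.0).sum())
right = len(points) - left

print("estimated intensity:", lambda_hat)
print("events in left half:", left, "right half:", right)
```

The estimated intensity should be close to the value used in the simulation, and the two halves of the region should hold broadly similar numbers of events, differing only by random variation.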


2.1.1 Ripley's K-function

This function was proposed by Ripley in 1971. Essentially, the K function describes the extent to which there is spatial dependence in the arrangement of events. We will see shortly how this function can be estimated from an observed event distribution but, first, we establish how we would expect it to behave in a particular theoretical situation.

Formally, the point process that gives rise to a completely random arrangement is called a homogeneous Poisson process. We say that an arrangement of events shows complete spatial randomness (CSR) if it is a realization of such a process. As far as the K function for a CSR process is concerned, the important point is that the probability of the occurrence of an event at any point in the study region R is independent of what other events have occurred and is equally likely over the whole of R. Thus, for a homogeneous process with no spatial dependence, the expected number of events within a distance d of a randomly chosen event is simply $\lambda \pi d^2$, where λ is the intensity. In other words, $K(d) = \pi d^2$.

If there is clustering of point events, we would expect to see an excess of events at short distances. Thus, for small values of d, the observed value of $K(d)$ will be greater than $\pi d^2$.

Consider a circle centered on event i, passing through the point j, and let $w_{ij}$ be the proportion of the circumference of this circle which lies within R. Then $w_{ij}$ is the conditional probability that an event is observed in R, given that it is a distance $d_{ij}$ from the ith event. A suitable estimator for $K(d)$ is then

$$\hat{K}(d) = \frac{R}{n^2} \sum_{i=1}^{n} \sum_{j \neq i} \frac{I_d(d_{ij})}{w_{ij}}$$

where R is the area of region R, n is the number of observed events, and $I_d(d_{ij})$ is an indicator function that takes the value 1 when $d_{ij}$ is less than d and 0 otherwise.

We can visualize the estimation of a K function as shown in Figure 5. We may imagine that an event is 'visited' and that around this event a set of concentric circles at a fine spacing is constructed. The cumulative number of events within each of these distance 'bands' is counted. Every other event is similarly 'visited' and the cumulative number of events within distance bands up to a radius d around all events becomes the estimate of $K(d)$ when scaled by $R/n^2$.
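To make the counting procedure concrete, here is a minimal Python sketch of the K estimate. It is written under assumptions that are not part of the text: NumPy is used, the study region is a unit square, the patterns are simulated, and every edge weight is set to 1 (no edge correction), which biases the estimates downwards near the boundary. The function name `ripley_k` is hypothetical.

```python
import numpy as np

def ripley_k(points, distances, area):
    """Naive K estimate: (area / n^2) times the number of ordered pairs
    (i, j), i != j, with d_ij < d. Assumes all edge weights w_ij = 1."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    diff = pts[:, None, :] - pts[None, :, :]
    dij = np.sqrt((diff ** 2).sum(axis=2))     # pairwise distance matrix
    np.fill_diagonal(dij, np.inf)              # exclude pairs with i == j
    return np.array([area / n**2 * (dij < d).sum() for d in distances])

rng = np.random.default_rng(1)

# Simulated CSR pattern: 200 events, uniform over the unit square.
csr = rng.uniform(0.0, 1.0, size=(200, 2))

# Simulated clustered pattern: 20 parent locations with 10 offspring each.
parents = rng.uniform(0.1, 0.9, size=(20, 2))
clustered = np.repeat(parents, 10, axis=0) + rng.normal(0.0, 0.02, size=(200, 2))

distances = np.array([0.02, 0.05, 0.10])
k_csr = ripley_k(csr, distances, area=1.0)
k_clustered = ripley_k(clustered, distances, area=1.0)

for d, kc, kk in zip(distances, k_csr, k_clustered):
    print(f"d={d:.2f}  pi*d^2={np.pi * d**2:.4f}  K_csr={kc:.4f}  K_clustered={kk:.4f}")
```

For the clustered pattern the estimate exceeds both the CSR estimate and the theoretical value at short distances, which is the excess of close pairs described above.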


Figure 5. Estimation of a K function (extracted from Gatrell et al., 1996)

Once calculated, $K(d)$ can be compared with its expected form under particular theoretical situations. For example, as we noted, we expect $K(d) = \pi d^2$ for a homogeneous process with no spatial dependence. Under regularity, $K(d)$ would be less than $\pi d^2$, whereas, under clustering, $K(d)$ would be greater than $\pi d^2$. So we can compare $K(d)$, estimated from the observed data, with $\pi d^2$. This may be done through a plot of $K(d) - \pi d^2$ against d. Peaks in positive values tend to indicate spatial clustering and troughs of negative values indicate regularity, at corresponding scales of distance (Gatrell et al., 1996).

To assess whether the observed peaks or troughs in this plot are significant, simulation techniques may be used. Under the assumption of CSR, we may perform m independent simulations of n events in the study region (where m might be, say, 99). For each simulated point pattern, we can estimate $K(d)$ and use the maximum and minimum of these functions for the simulated patterns to define an upper and lower simulation envelope. If the estimated $K(d)$ lies above the upper envelope, we can speak of aggregation. If it lies below the lower envelope, this is evidence of spatial 'inhibition' or regularity in the arrangement of events.

In practice, the function $L(d) = (K(d)/\pi)^{1/2}$ is used more often, since it additionally has an approximately constant variance and allows an easier interpretation of the test (Figure 6C, Figure 7). Under CSR, $L(d) = d$, and therefore $L(d) - d = 0$ can be tested at each distance d. Clustering occurs when $L(d) - d$ is significantly greater than zero, and a regular pattern when $L(d) - d$ is significantly less than zero (Rot, 2006).
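The simulation envelope and the L function can be sketched as follows. This is an illustration under assumptions rather than a reference implementation: the study region is a unit square, the observed pattern is simulated, the K estimate ignores edge correction (the same simplification is applied to observed and simulated patterns, so the comparison remains like for like), and all function names are hypothetical.

```python
import numpy as np

def k_hat(points, distances, area):
    """Naive K estimate without edge correction (all weights w_ij set to 1)."""
    n = len(points)
    diff = points[:, None, :] - points[None, :, :]
    dij = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dij, np.inf)              # exclude pairs with i == j
    return np.array([area / n**2 * (dij < d).sum() for d in distances])

def l_minus_d(points, distances, area):
    """L(d) - d, where L(d) = sqrt(K(d) / pi); close to zero under CSR."""
    return np.sqrt(k_hat(points, distances, area) / np.pi) - distances

def csr_envelope(n, distances, m=99, seed=0):
    """Minimum and maximum of L(d) - d over m CSR simulations of n events
    in the unit square."""
    rng = np.random.default_rng(seed)
    sims = np.array([
        l_minus_d(rng.uniform(0.0, 1.0, size=(n, 2)), distances, 1.0)
        for _ in range(m)
    ])
    return sims.min(axis=0), sims.max(axis=0)

# Hypothetical observed pattern: 10 clusters of 10 events each.
rng = np.random.default_rng(7)
parents = rng.uniform(0.1, 0.9, size=(10, 2))
observed_points = np.repeat(parents, 10, axis=0) + rng.normal(0.0, 0.02, size=(100, 2))

distances = np.array([0.02, 0.05, 0.10])
observed = l_minus_d(observed_points, distances, area=1.0)
lower, upper = csr_envelope(n=100, distances=distances)

for d, obs, lo, hi in zip(distances, observed, lower, upper):
    verdict = "aggregated" if obs > hi else "regular" if obs < lo else "consistent with CSR"
    print(f"d={d:.2f}  L(d)-d={obs:.3f}  envelope=[{lo:.3f}, {hi:.3f}]  {verdict}")
```

An observed value above the upper envelope at a given distance is read as aggregation at that scale, and a value below the lower envelope as regularity.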


Figure 6. Point pattern distribution (left), K function for the same point pattern. The red line indicates the K function for a random process. The black line shows the K function calculated, dotted blue lines are the max and min values obtained from 99 Monte-Carlo simulations (center). L function of the same point pattern.

Figure 7. L function for the point patterns of Figure 4: random pattern or Poisson pattern (left), aggregated pattern (center), regular pattern (right).

2.1.2 Kernel estimation

The kernel density estimation is a method for examining large-scale trends in point pattern analysis. It analyzes disease patterns and detects hot spots through a moving window technique linked to a quartic kernel algorithm. The approach attempts to estimate how event frequencies vary continuously across the study area based on the point patterns.

Kernel estimation is a generalization of this idea, where the window is replaced with a moving three-dimensional function (the kernel) which weights events within its sphere of influence according to their distance from the point at which the intensity is being

Page 12: SPATIAL ANALYSIS APPLIED TO EPIDEMIOLOGYaulavirtual.ig.conae.gov.ar/moodle/pluginfile.php/513/mod_page... · 3 1. Introduction 1.1 What is Spatial Analysis? Spatial analysis is a

estimated. The method is commonly used in

smooth estimates of univariate (or multivariate) probability densities from an observed

sample of observations. Formally, if s represents a vector location anywhere in R and s, ..., s,

are the vector locations of the n observed events,

Here, k() represents the kernel weighting function which, for convenience, is expressed

in standardized form (that is, cent

curve). This is then centered on s and 'stretched' according to the parameter z > 0, which is

referred to as the band width. The value of z is chosen to provide the required degree of

smoothing in the estimate. Graphically, we may imagine a three

'visits' each point s on the fine grid (Figure

within the region of influence (as controlled by z), are measured and contribute to the

intensity estimate at s according to how close they are to s. We may then u

contouring algorithm, or some form of raster display, to represent the resulting intensit

estimates as a continuous surface showing how intensity varies over the

Figure 8. Kernel estimation of a point pattern

The incident hot spots can then be verified and tested for their statistical significance

against a random distribution.

estimated. The method is commonly used in a more general statistical context to obtain

smooth estimates of univariate (or multivariate) probability densities from an observed

Formally, if s represents a vector location anywhere in R and s, ..., s,

are the vector locations of the n observed events, then the intensity, 2(s), at s is estimated as

Here, k() represents the kernel weighting function which, for convenience, is expressed

in standardized form (that is, centered at the origin and having a total volume of 1 under the

red on s and 'stretched' according to the parameter z > 0, which is

referred to as the band width. The value of z is chosen to provide the required degree of

smoothing in the estimate. Graphically, we may imagine a three-dimensional function that

h point s on the fine grid (Figure 8). Distances to each observed event

within the region of influence (as controlled by z), are measured and contribute to the

intensity estimate at s according to how close they are to s. We may then u

contouring algorithm, or some form of raster display, to represent the resulting intensit

face showing how intensity varies over the entire

Kernel estimation of a point pattern (extracted from Gatrell et al., 1996)

The incident hot spots can then be verified and tested for their statistical significance

against a random distribution. The data collected at sampled locations are representative of

12

text to obtain

smooth estimates of univariate (or multivariate) probability densities from an observed

Formally, if s represents a vector location anywhere in R and s, ..., s,

then the intensity, 2(s), at s is estimated as

Here, k() represents the kernel weighting function which, for convenience, is expressed

red at the origin and having a total volume of 1 under the

red on s and 'stretched' according to the parameter z > 0, which is

referred to as the band width. The value of z is chosen to provide the required degree of

dimensional function that

). Distances to each observed event Si that lies

within the region of influence (as controlled by z), are measured and contribute to the

intensity estimate at s according to how close they are to s. We may then use a suitable

contouring algorithm, or some form of raster display, to represent the resulting intensity

entire area.

Gatrell et al., 1996)

The incident hot spots can then be verified and tested for their statistical significance

data collected at sampled locations are representative of

Page 13: SPATIAL ANALYSIS APPLIED TO EPIDEMIOLOGYaulavirtual.ig.conae.gov.ar/moodle/pluginfile.php/513/mod_page... · 3 1. Introduction 1.1 What is Spatial Analysis? Spatial analysis is a

the spatial distribution. The interpolator will us

variables of interest at other unsampled locations.

The kernel estimate

this is increased, there is more smoothing of the spatial variation in intens

reduced we obtain an increasingly 'spiky' estimate. What value, then should we choose? In

practice, the value of kernel estimation is that one has the flexibilit

different values of z, exploring the surface a,(s) using dif

order to look at the variation in A(s) at different scales. There are

attempt automatically to choose a value of z which optimally balances the reliability of the

estimate against the degree of spatial deta

event locations. We should further note that it is possible to adjust the value of z at different

points in R in order to improve the kernel estimate. Such local adjustment of bandwidth

may be achieved by adaptive kernel estimation. In adaptive smoothing, sub

events are more densely packed than others (and thus where more detailed information on

the variation in intensity is available) are 'visited' by a kernel whose band

than elsewhere, as a means of avoiding smoothing out too much detail.

As a result of a kernel we get a map showing for each point, a value of density according

to the position and the pattern exhibit by the original points. This estima

punctual distribution to a continuous density value map. An example of a map of kernel

density estimator can be seen at figure

Figure 9: Map of kernel density estimator, illustrating the distribution of Canine Visceral

Lishmaniasis cases in each sector of Ihla Solteira (Extracted from Paulan et al., 2012).

the spatial distribution. The interpolator will use these sample points to predict values of

variables of interest at other unsampled locations.

is intended to be sensitive to the choice of bandwidth, r. As

this is increased, there is more smoothing of the spatial variation in intens

reduced we obtain an increasingly 'spiky' estimate. What value, then should we choose? In

practice, the value of kernel estimation is that one has the flexibility to exper

different values of z, exploring the surface a,(s) using different degrees of smoothing in

order to look at the variation in A(s) at different scales. There are also methods which

matically to choose a value of z which optimally balances the reliability of the

estimate against the degree of spatial detail that is retained, given the observed pattern of

event locations. We should further note that it is possible to adjust the value of z at different

points in R in order to improve the kernel estimate. Such local adjustment of bandwidth

adaptive kernel estimation. In adaptive smoothing, sub-areas in which

events are more densely packed than others (and thus where more detailed information on

the variation in intensity is available) are 'visited' by a kernel whose band- width is smaller

an elsewhere, as a means of avoiding smoothing out too much detail. (Gatrell et al., 996).

As a result of a kernel we get a map showing for each point, a value of density according

to the position and the pattern exhibit by the original points. This estimation lead to a

punctual distribution to a continuous density value map. An example of a map of kernel

density estimator can be seen at figure 9.

: Map of kernel density estimator, illustrating the distribution of Canine Visceral

in each sector of Ihla Solteira (Extracted from Paulan et al., 2012).

13

e these sample points to predict values of

sitive to the choice of bandwidth, r. As

this is increased, there is more smoothing of the spatial variation in intensity; as it is

reduced we obtain an increasingly 'spiky' estimate. What value, then should we choose? In

y to experiment with

ferent degrees of smoothing in

also methods which

matically to choose a value of z which optimally balances the reliability of the

il that is retained, given the observed pattern of

event locations. We should further note that it is possible to adjust the value of z at different

points in R in order to improve the kernel estimate. Such local adjustment of bandwidth

areas in which

events are more densely packed than others (and thus where more detailed information on

width is smaller

(Gatrell et al., 996).

As a result of a kernel we get a map showing for each point, a value of density according

tion lead to a

punctual distribution to a continuous density value map. An example of a map of kernel

: Map of kernel density estimator, illustrating the distribution of Canine Visceral

in each sector of Ihla Solteira (Extracted from Paulan et al., 2012).
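To make the role of the bandwidth concrete, the following is a minimal Python sketch of a quartic kernel intensity estimate evaluated on a fine grid. It is only an illustration under stated assumptions: the function names, the grid extent and the three event locations are hypothetical and do not come from the studies cited above, and no edge correction is applied.

```python
import math


def quartic_kernel(u2):
    """Standardized 2-D quartic kernel at squared scaled distance u2.

    It is centered at the origin and has a total volume of 1 over the unit disc.
    """
    if u2 >= 1.0:
        return 0.0
    return (3.0 / math.pi) * (1.0 - u2) ** 2


def kernel_intensity(s, events, tau):
    """Intensity estimate at location s = (x, y) for bandwidth tau > 0."""
    total = 0.0
    for ex, ey in events:
        d2 = (s[0] - ex) ** 2 + (s[1] - ey) ** 2
        # Each event contributes according to its distance from s.
        total += quartic_kernel(d2 / tau ** 2) / tau ** 2
    return total


def intensity_surface(events, tau, xmin, xmax, ymin, ymax, nx, ny):
    """Evaluate the estimate at the centers of an nx-by-ny grid of cells."""
    dx = (xmax - xmin) / nx
    dy = (ymax - ymin) / ny
    return [
        [kernel_intensity((xmin + (i + 0.5) * dx, ymin + (j + 0.5) * dy),
                          events, tau)
         for i in range(nx)]
        for j in range(ny)
    ]


# Hypothetical case locations (illustrative data only).
events = [(5.0, 5.0), (5.5, 5.0), (2.0, 8.0)]

smooth = kernel_intensity((5.0, 5.0), events, tau=2.0)  # wide bandwidth
spiky = kernel_intensity((5.0, 5.0), events, tau=0.5)   # narrow bandwidth
surface = intensity_surface(events, 1.0, 0.0, 10.0, 0.0, 10.0, 100, 100)
```

With the narrow bandwidth the estimate at an event location is much higher than with the wide one, which is the 'spiky' versus smooth behavior described above. Summing the surface over all grid cells (times the cell area) should give approximately the number of events, since each kernel has unit volume.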


2.2 Areal Patterns:

2.2.1 Spatial autocorrelation indexes

2.2.1.1 Global indexes

Global indexes measure how the variable being studied is correlated within a region, and are useful for a holistic characterization providing a single value to the region (Pina et al., 2010).

In global tests for autocorrelation, it is assumed that the relationship between nearby or otherwise connected observations will remain the same everywhere in the study area (referred to as “stationarity” or “structural stability”) (Jerrett, Gale & Kontgis, 2010).

Moran's I is one of the oldest indicators of spatial autocorrelation. It is a popular measure and has remained a de facto standard in examining zones or points with continuous variables associated with them. Moran's I is similar to Pearson's correlation coefficient and gives a score ranging between –1 and 1. A positive score means a “hot” spot, that is, a polygon or point with a high score has other polygons or points with high scores surrounding it. Conversely, an occurrence of a low score indicates a “cold” spot because of low scoring occurrences in the neighborhood. A score of zero indicates that nothing can be assumed about the scores of the neighboring polygons or points. A negative score means a “spatial outlier”, that is, the scores of neighboring locations will be the opposite of the location under examination: a polygon or point with a low score will have high scoring neighbors, and vice versa (Lai, So & Chan, 2009).

Moran's I statistic gives a formal indication of the degree of linear association between a vector of observed values and a weighted average of the neighboring values, or spatial lag. It provides a unique value for each dataset. A test can then be built to assess the statistical significance of the spatial autocorrelation. Formally, Moran's I can be expressed in matrix notation as:

$$I = \frac{n}{S_0}\,\frac{z^{\top} W z}{z^{\top} z} = \frac{n}{S_0}\,\frac{\sum_{i}\sum_{j} W_{ij}\, z_i z_j}{\sum_{i} z_i^{2}}, \qquad z_i = X_i - \bar{X}$$

where S0 stands for the sum of all elements in the spatial weights matrix and the z observations are the deviations from the mean, (Xi − X̄), where Xi is the variable value in a particular spatial unit while Xj is the variable value measured in another location, normally its neighbor.

One way of specifying interdependence among observations is to use a spatial weight matrix. The spatial weight matrix W is an n × n exogenous nonnegative matrix where Wij > 0 indicates that observation i depends upon neighboring observations j. For example, Wij > 0 if region i is contiguous to region j. Also, Wii = 0, so observations cannot be neighbors to themselves. Measures of proximity, such as cardinal distance (e.g., kilometers), ordinal distance (e.g., the m nearest neighbors), and contiguity (Wij > 0 if region i shares a border with region j), have been used to specify W (Pace & LeSage, 2010). Figure 10 shows how the weight matrix is constructed:

Figure 10. Construction of a weight matrix (W). 1 indicates neighboring observations, 0 indicates the observations which are not neighbors.

In 1993 Anselin introduced the Moran scatterplot, an analysis tool that allows the behavior of each spatial unit to be observed in a scatter graph. This novelty was one of the first steps toward local analysis, i.e. the disaggregation of the global value of spatial autocorrelation. The Moran scatterplot can be augmented with a linear regression which has Moran's I as its slope, and which can be used to indicate the degree of fit, the presence of outliers, etc. The scatterplot can be divided into four quadrants (see Figure 11), starting with the first one at the top right and continuing clockwise. The x-axis shows standardized values of a variable for each spatial unit of the studied area, and the y-axis shows standardized values of the average of the values in neighboring units of the same variable (univariate analysis) or another variable (bivariate analysis).
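The construction of W and the computation of Moran's I can be sketched in a few lines of Python. This is an illustrative sketch, not the implementation used by GeoDa or any other package: it assumes a regular grid of spatial units with rook contiguity (shared edges), and all function names and data values are hypothetical. The last lines check numerically that, with row-standardized weights, the slope of the regression of the spatial lag on the variable equals Moran's I.

```python
def rook_weights(nrows, ncols):
    """Binary contiguity matrix W for a regular grid (1 = shared edge)."""
    n = nrows * ncols
    W = [[0.0] * n for _ in range(n)]
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrows and 0 <= cc < ncols:
                    W[i][rr * ncols + cc] = 1.0  # W[i][i] stays 0
    return W


def row_standardize(W):
    """Scale each row so that its weights sum to 1."""
    out = []
    for row in W:
        s = sum(row)
        out.append([w / s if s > 0 else 0.0 for w in row])
    return out


def morans_i(x, W):
    """Global Moran's I: (n / S0) * sum_ij W_ij z_i z_j / sum_i z_i^2."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    s0 = sum(sum(row) for row in W)
    num = sum(W[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(v * v for v in z)
    return (n / s0) * num / den


def spatial_lag(z, W):
    """Weighted average of the neighboring values for each unit."""
    n = len(z)
    return [sum(W[i][j] * z[j] for j in range(n)) for i in range(n)]


def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx


# Hypothetical rates on a 4 x 4 grid of areas (row-major order).
clustered = [10.0, 10.0, 0.0, 0.0] * 4                     # high values on the left
checkerboard = [float((r + c) % 2) for r in range(4) for c in range(4)]
W = rook_weights(4, 4)
i_clustered = morans_i(clustered, W)      # positive autocorrelation
i_checker = morans_i(checkerboard, W)     # perfect negative autocorrelation

# Moran scatterplot: slope of spatial lag on z with row-standardized W.
Ws = row_standardize(W)
mixed = [3.0, 7.0, 1.0, 9.0, 4.0, 6.0, 2.0, 8.0,
         5.0, 5.5, 1.5, 7.5, 3.5, 6.5, 2.5, 9.5]
mean = sum(mixed) / len(mixed)
z = [v - mean for v in mixed]
slope = ols_slope(z, spatial_lag(z, Ws))
```

Binary weights are used for the two pattern examples and row-standardized weights for the scatterplot check; other definitions of proximity (distance bands, nearest neighbors) would only change how W is filled in.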

In quadrant I we can identify the spatial units with values higher than the average that, in turn, also have neighbors with higher values (high-high situation, also known as hot spots in the Moran scatterplot). The reverse situation is recorded in quadrant III (low-low situation, also called cold spots). Both quadrants allow the detection of clusters of spatial units with values similar to those of their neighbors. In contrast, outliers correspond to spatially mixed contexts, in other words, spatial units with low values (lower than the average) whose neighbors have higher values (low-high situation), located in quadrant IV. The opposite scenario (high-low situation) is located in quadrant II (Anselin, 1993; Lai, So & Chan, 2009).

Figure 11. A Moran scatterplot in GeoDa. The slope of the line represents Moran's I.

Local autocorrelation:

In the analysis of spatial association, it has long been recognized that the assumption of stationarity or structural stability over space may be highly unrealistic. Sometimes global relationships are of less interest than local relationships or clusters that may display non-stationarity. In exploratory spatial data analysis (ESDA), the predominant approach to assess the degree of spatial association still ignores this potential instability, as it is based on global statistics such as Moran's I. A focus on local patterns of association (hot spots) and an allowance for local instabilities in overall spatial association has been suggested as a more appropriate perspective.

Local indicators of spatial association (LISA) allow for the decomposition of global indicators, such as Moran's I, into the contribution of each individual observation (Anselin, 1995). These local statistics usually break the study area into smaller regions to determine if local areas have attribute values that are higher or lower than would be expected based on the global average or a random expectation for the entire study area. Local indicators translate how the variable in one area is correlated to the same variable in a close neighborhood; they provide a value for each area and allow the identification of clusters.
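The decomposition of the global statistic into local contributions, and the quadrant labels of the Moran scatterplot, can be sketched as follows. This is a hedged illustration rather than the exact procedure of any package: it assumes row-standardized contiguity weights given as neighbor lists, uses a simple conditional permutation scheme for the pseudo p-value, and all names and data (six areas along a line) are hypothetical.

```python
import random


def local_moran(x, neighbors):
    """Local Moran's I_i with row-standardized weights.

    neighbors[i] lists the indices of the units contiguous to unit i.
    Returns the deviations z, their spatial lags, and the local statistics.
    """
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    m2 = sum(v * v for v in z) / n
    lag = [sum(z[j] for j in neighbors[i]) / len(neighbors[i])
           for i in range(n)]
    local_i = [z[i] * lag[i] / m2 for i in range(n)]
    return z, lag, local_i


def quadrant(z_i, lag_i):
    """Moran scatterplot quadrant of a unit (x-axis: z, y-axis: lag)."""
    if z_i >= 0 and lag_i >= 0:
        return "high-high"   # quadrant I, hot spot
    if z_i >= 0 and lag_i < 0:
        return "high-low"    # quadrant II, spatial outlier
    if z_i < 0 and lag_i < 0:
        return "low-low"     # quadrant III, cold spot
    return "low-high"        # quadrant IV, spatial outlier


def pseudo_p_value(x, neighbors, i, permutations=999, seed=0):
    """Conditional permutation test: keep x[i] fixed, reshuffle the rest."""
    rng = random.Random(seed)
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    k = len(neighbors[i])
    observed = z[i] * sum(z[j] for j in neighbors[i]) / k
    others = [z[j] for j in range(n) if j != i]
    extreme = 0
    for _ in range(permutations):
        simulated = z[i] * sum(rng.sample(others, k)) / k
        if abs(simulated) >= abs(observed):
            extreme += 1
    return (extreme + 1) / (permutations + 1)


# Hypothetical incidence rates for six areas arranged along a line.
rates = [10.0, 9.0, 8.0, 1.0, 2.0, 1.0]
neighbors = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]

z, lag, local_i = local_moran(rates, neighbors)
labels = [quadrant(a, b) for a, b in zip(z, lag)]
p_first = pseudo_p_value(rates, neighbors, 0)
```

With row-standardized weights the average of the local values equals the global Moran's I, which is the sense in which the local indicators decompose the global one. A LISA map would color only the areas whose pseudo p-value falls below the chosen threshold, using the quadrant label as the legend class.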

Local indicators are more sensitive to variations of the values of the variable (Jerrett, Gale & Kontgis, 2010). Local spatial clusters, sometimes referred to as hot spots, may be identified as those locations or sets of contiguous locations for which the LISA is significant. The LISA indicates the presence of spatial dependency (clusters) in some areas, i.e., areas where the incidence rates were significantly correlated (p < 0.05) with the incidence rates of their neighbors. This is shown in what Anselin called a “LISA map” (Anselin, 1995). An example is shown in Figure 12:

Figure 12. LISA Maps showing clusters for men (left) and women (right) of standardized Legionellosis incidence rates by age and by city council in Spain (Extracted from Gomez-Barrosoa et al., 2011).

2.3 Geostatistics

D. G. Krige was for many years a professor at the University of the Witwatersrand, South Africa. He promoted the use of statistical methods in mineral exploration and, in Krige (1951), sowed the seeds for the later development, by Georges Matheron and colleagues at the École des Mines in Fontainebleau, France, of the branch of spatial statistics known as geostatistics. The spatial prediction method known as kriging is named in his honor. In a different scientific setting, the method known as objective analysis, essentially the same thing as kriging, was for a long time the standard tool for constructing spatially continuous weather maps from spatially discrete observations on the ground and in the air (Diggle, 2010).

In geostatistics, use is made of the phenomenon of spatial dependence: the tendency for proximate observations to be more similar than more distant ones. Spatial dependence may be represented by a range of functions. Here, we focus on the variogram. Given n data z(xi) on property z at locations xi, i = 1, 2, ..., n, the sample variogram γ(h) may be estimated for a set of discrete lag (distance and direction) vectors h using:

$$\gamma(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \left[ z(x_i) - z(x_i + h) \right]^{2}$$

where N(h) is the number of paired comparisons at the set of discrete lags h. The variogram represents the spatial dependence (the tendency for proximate points to be more related than more distant ones) in the data (Graham et al., 2004). A semivariogram is a graphic display of semivariance (γ) versus distance or lag. Semivariance is a measure of the degree of spatial dependence or autocorrelation between samples. To compute a semivariogram, we need to determine how variance behaves against distance. At a certain distance, the model levels out. The distance where the model first flattens out is known as the range and the corresponding value on the y axis is called the sill. Samples separated by distances closer than the range are spatially autocorrelated, whereas locations farther apart than the range are not. The range is thus the distance beyond which the deviation in z values does not depend on distance. Once the sample variogram has been estimated, it is necessary to fit a continuous mathematical model to it to allow statistical inference (e.g., spherical, cubic, exponential, Gaussian, linear) (Figure 13). Once the semivariogram has been developed, it is used to estimate distance weights for interpolation.

Figure 13. Semivariogram: graphic display of semivariance (γ) versus distance or lag
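As an illustration of how a sample semivariogram is obtained, the sketch below pairs up all observations, groups the pairs into distance classes, and averages half the squared differences within each class. It is a simplified, omnidirectional version (direction is ignored), and the function names, lag settings and data are assumptions made for the example. A spherical model is included as one possible continuous model with a range and a sill.

```python
import math


def sample_variogram(points, values, lag_width, n_lags):
    """Omnidirectional sample semivariogram.

    Pairs are assigned to the lag class whose center (1, 2, ... times
    lag_width) is nearest to their separation distance.
    Returns a list of (mean pair distance, semivariance, number of pairs).
    """
    sq_sums = [0.0] * (n_lags + 1)
    dist_sums = [0.0] * (n_lags + 1)
    counts = [0] * (n_lags + 1)
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            b = int(d / lag_width + 0.5)
            if 1 <= b <= n_lags:
                sq_sums[b] += (values[i] - values[j]) ** 2
                dist_sums[b] += d
                counts[b] += 1
    result = []
    for b in range(1, n_lags + 1):
        if counts[b] > 0:
            result.append((dist_sums[b] / counts[b],
                           sq_sums[b] / (2 * counts[b]),
                           counts[b]))
    return result


def spherical_model(h, nugget, sill, rng):
    """Spherical semivariogram model: rises from the nugget to the sill
    and stays flat beyond the range rng."""
    if h <= 0:
        return 0.0
    if h >= rng:
        return sill
    r = h / rng
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)


# Hypothetical measurements along a transect with unit spacing.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
values = [1.0, 3.0, 2.0, 5.0, 4.0]
variogram = sample_variogram(points, values, lag_width=1.0, n_lags=2)
```

In practice the parameters of the model (nugget, sill, range) would be fitted to the sample values, for example by weighted least squares; that fitting step is not shown here.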

2.3.1 Kriging

The modelled variogram can be used in a wide range of geostatistical operations. The geostatistical method for spatial prediction known as kriging is optimal (minimum prediction variance and unbiased) under the constraint that the prediction ẑ(x0) at location x0 is a linear weighted sum of j neighbouring data z(xj):

$$\hat{z}(x_0) = \sum_{j} \lambda_j \, z(x_j)$$

where the λj are the j weights. Optimality is obtained by selecting the weights based on the variogram: proximate neighbours receive more weight than more distant data, and the exact weights are determined from the fitted mathematical (variogram) model. Unbiasedness is achieved by setting the sum of the kriging weights to one (Graham et al., 2004).

The result of this process is a map showing the values for a certain variable in all the geographic space (Figure 14).
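The way the weights follow from the variogram can be illustrated with a small ordinary kriging sketch. It is an assumption-laden example, not the procedure of the cited studies: it uses a spherical variogram with no nugget and arbitrary sill and range, solves the kriging system with a plain Gaussian elimination, and all names and data values are hypothetical.

```python
import math


def spherical(h, sill=1.0, rng=10.0):
    """Spherical variogram model without nugget (assumed parameters)."""
    if h >= rng:
        return sill
    r = h / rng
    return sill * (1.5 * r - 0.5 * r ** 3)


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ordinary_kriging(points, values, x0, variogram):
    """Predict z at x0 as a weighted sum of the data.

    The extra row and column of ones (with a Lagrange multiplier)
    force the weights to sum to one, which gives unbiasedness.
    Returns (prediction, kriging variance, weights).
    """
    n = len(points)
    a = [[variogram(math.dist(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [variogram(math.dist(p, x0)) for p in points] + [1.0]
    solution = solve(a, b)
    weights, mu = solution[:n], solution[n]
    prediction = sum(w * v for w, v in zip(weights, values))
    variance = sum(w * g for w, g in zip(weights, b[:n])) + mu
    return prediction, variance, weights


# Hypothetical sampled locations and values.
points = [(0.0, 0.0), (4.0, 0.0), (0.0, 6.0), (5.0, 5.0)]
values = [1.0, 3.0, 2.0, 5.0]

prediction, variance, weights = ordinary_kriging(points, values, (1.0, 1.0), spherical)
at_sample, var_at_sample, _ = ordinary_kriging(points, values, (4.0, 0.0), spherical)
```

Repeating the prediction for every cell of a grid produces the kind of continuous map described above. Because no nugget is assumed, predicting at a sampled location simply returns the observed value with zero kriging variance.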


Figure 14. Map of kriged risk of Childhood Cancer in the West Midlands, England (Extracted from Webster et al., 1994)

3. Disease mapping

Tobler's first law of geography, which states that “everything is related to everything else, but near things are more related than distant things”, is central to core spatial analytical techniques as well as analytical conceptions of geographic space. In the case of disease spread, individuals near or exposed to a contagious person or a tainted environmental setting are deemed more susceptible to certain types of illnesses. Cartographic design and mapping techniques can draw attention to these locations by displaying aggregations of disease occurrences. Cartographic design has the following framework: 1) geographic feature classification, 2) scale determination, 3) symbol categorization, and 4) graphic primitives (Figure 15). The basic working units of disease data include point (e.g., patient locations), line (e.g., transmission route), and area (e.g., disease rate by country). Depending on the data scaling and level of measurement (whether nominal, ordinal, interval, or ratio), the use of certain combinations of symbols and graphic primitives is more effective in conveying spatial distributions.

Figure 15. A framework of cartographic design

In spatial epidemiology, point-based data representing disease or patient locations are the basis of data collection. Geocoded point data derived from address matching form the essential input for disease mapping. Very often, disease mapping involves making maps of point and choropleth patterns (Figure 3.5). Although it is appropriate in secure research settings to represent locations of disease incidence at the local scale, for example, to search for possible disease clusters, point maps for public distribution and consumption may be deemed too revealing and sensitive.

Although point pattern representation is a quasi-accurate account of a health event, its

use is undesirable in portraying disease occurrences of acute sensitivity (such as AIDS and

SARS). To safeguard personal privacy and curtail social segregation, point-based data are

collapsed by enumeration units for visual presentation (as in choropleth, proportional

symbol, and cartogram methods). Aggregating point data by a set of areal units allows

distributional maps to be created to reveal new insights. The point-in-polygon operation in

a GIS, for example, can aggregate point data by administrative zones.
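The point-in-polygon aggregation mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ray-casting test is one common way to implement the operation (a GIS may use a different algorithm), and the zone names, polygons and case coordinates are hypothetical.

```python
from collections import Counter

def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside `polygon`, a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def aggregate_cases(points, zones):
    """Count case points per zone; `zones` maps a zone name to its polygon."""
    counts = Counter({name: 0 for name in zones})
    for x, y in points:
        for name, polygon in zones.items():
            if point_in_polygon(x, y, polygon):
                counts[name] += 1
                break  # assumes zones do not overlap
    return dict(counts)

# Hypothetical administrative zones and geocoded case locations.
zones = {
    "west": [(0, 0), (5, 0), (5, 10), (0, 10)],
    "east": [(5, 0), (10, 0), (10, 10), (5, 10)],
}
cases = [(1, 1), (2, 8), (7, 3), (12, 5)]  # the last point falls outside both zones
counts = aggregate_cases(cases, zones)
print(counts)
```

Only the per-zone counts leave the function, so the individual addresses do not need to appear on the published map.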

Identification and discrimination between map symbols are necessary to represent data

in a meaningful way. Clear and intuitive map symbols are the main components to allow

map viewers a visual understanding of the resultant pattern or intended message. Other

than point pattern maps, the most commonly used mapping technique is by means of

choropleth or shaded area mapping (Figure 16). This method involves grouping numerical

values (e.g., disease rate per 1,000, standard deviation) associated with some enumeration

units (e.g., census tracts) into ordinal classes (e.g., five ranked classes representing very

high, high, medium, low, and very low readings). Each group is assigned a color in which the

darker color represents a higher value and lighter color a lower value. Each enumeration

area is shaded the color of its corresponding class containing the value. To reduce adverse

visualization effects projected by small areas of high values or large areas of small values, it

is recommended that this technique be used to map rates instead of raw readings.
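The steps described above (computing a rate, then grouping the rates into ranked classes) can be sketched as follows. This is a minimal illustration, assuming a simple rank-based (quantile) classification; the tract names, case counts and populations are hypothetical, and other classification schemes such as equal intervals or standard deviations are equally possible.

```python
def rate_per_1000(cases, population):
    """Disease rate expressed per 1,000 inhabitants."""
    return 1000.0 * cases / population

def quantile_classes(values, labels):
    """Assign each value an ordinal label by rank, giving classes of near-equal size."""
    k = len(labels)
    order = sorted(range(len(values)), key=lambda i: values[i])
    classes = [None] * len(values)
    for rank, i in enumerate(order):
        classes[i] = labels[rank * k // len(values)]
    return classes

# Hypothetical enumeration units: (name, cases, population).
tracts = [
    ("A", 50, 50000),
    ("B", 8, 2000),
    ("C", 30, 10000),
    ("D", 2, 4000),
    ("E", 12, 6000),
]
labels = ["very low", "low", "medium", "high", "very high"]
rates = [rate_per_1000(cases, pop) for _, cases, pop in tracts]
classes = quantile_classes(rates, labels)
for (name, cases, _), rate, cls in zip(tracts, rates, classes):
    print(name, cases, rate, cls)
```

Tract A has the largest raw count but one of the lowest rates, which is why mapping rates rather than raw readings gives a less misleading picture.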

The remaining mapping techniques illustrated in Figure 16 are used to portray results

of spatial analytical functions. The examples show a variation of mapping techniques arising

from point-based data. They offer uniquely different visualization of disease or health-

related patterns, development, and trends. For example, the kernel density method is a

means of summarizing points by quadrants or grids of a uniform size (instead of some

administrative zones) through a moving window approach. This method of presentation not

only addresses the issue of data privacy but also diminishes the effects of MAUP (Modifiable

areal unit problem) and area dependence. Point buffers may also be used to indicate more

clearly the patterns of spatial clustering of points and delineate possible hot spot areas. The

choice of suitable mapping techniques relies largely on cartographic experience and

geographic understanding, in addition to creativity on the part of the spatial analyst.
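To make the moving-window idea behind the kernel density method concrete, the following Python sketch evaluates a Gaussian kernel estimate on a regular grid. It is a sketch under stated assumptions: the Gaussian kernel, the bandwidth value, the grid and the case coordinates are all chosen for illustration, and no edge correction is applied.

```python
import numpy as np

def kernel_density(points, grid_x, grid_y, bandwidth):
    """Gaussian kernel intensity estimate (events per unit area) on a regular grid."""
    points = np.asarray(points, dtype=float)
    gx, gy = np.meshgrid(grid_x, grid_y)  # coordinates of the grid nodes
    # Squared distance from every grid node to every point: shape (ny, nx, n).
    d2 = (gx[..., None] - points[:, 0]) ** 2 + (gy[..., None] - points[:, 1]) ** 2
    kernel = np.exp(-d2 / (2.0 * bandwidth ** 2)) / (2.0 * np.pi * bandwidth ** 2)
    # Summing the kernels of all points gives the smoothed surface.
    return kernel.sum(axis=2)

# Hypothetical case locations: three close together and one isolated.
cases = [(2.0, 2.0), (2.5, 2.0), (2.0, 2.5), (8.0, 8.0)]
grid_x = np.linspace(0, 10, 21)
grid_y = np.linspace(0, 10, 21)
density = kernel_density(cases, grid_x, grid_y, bandwidth=1.0)
iy, ix = np.unravel_index(density.argmax(), density.shape)
print(grid_x[ix], grid_y[iy])
```

The surface depends on the bandwidth rather than on administrative boundaries, which is the sense in which the method reduces the influence of the areal units.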

Figure 16. Types of spatial analytical map outputs (Extracted from Lai, So & Chan, 2009)

3.1. Why is disease mapping important?

Disease mapping is the first step toward understanding the spatial aspects of health-related

problems because particular types of information are highlighted in maps. Disease

distributions can be shown through different cartographic symbolization as points, lines,

and patterns. Associative analyses can then be formulated through visual inspection of the

disease maps in conjunction with statistical deduction. Although disease mapping seems only

a tool for preliminary data exploration, it nonetheless offers useful hints in terms of

informing needs for further statistical and empirical analyses as well as different

visualization techniques.

Disease maps provide a rapid visual summary of complex geographic information and

may identify subtle patterns in the data that are missed in tabular presentations. They are

used variously for descriptive purposes, to generate hypotheses as to etiology, for

surveillance to highlight areas at apparently high risk, and to aid policy formation and

resource allocation. They are also useful to help place specific disease clusters and results of

point-source studies in proper context.

4. Some Spatial Analysis software that can be used in Epidemiology

• SaTScan is free software that analyzes spatial, temporal and space-time data using

the spatial, temporal, or space-time scan statistics.

It is designed for any of the following interrelated purposes:

- Perform geographical surveillance of disease, to detect spatial or space-

time disease clusters, and to see if they are statistically significant.

- Test whether a disease is randomly distributed over space, over time or

over space and time.

- Evaluate the statistical significance of disease cluster alarms.

- Perform prospective real-time or time-periodic disease surveillance for the

early detection of disease outbreaks.

SaTScan is available for Windows, Linux and Mac. It can be downloaded

through the software website: http://www.satscan.org/ . The website also provides

a list of published papers related to the epidemiological field, sorted

thematically.

• GeoDa is an interactive environment that combines maps with statistical charts and

graphics, using the technology of dynamically linked windows. It was developed by

Luc Anselin of the Spatial Analysis Laboratory of the University of Illinois, Urbana–

Champaign. Along with its mapping functionality, GeoDa contains the usual exploratory

data analysis (EDA) graphs (e.g., histogram, box plot, scatterplot) and

implements brushing for both maps and statistical plots. The beta release is free for

download for noncommercial use only from http://geodacenter.asu.edu/ .

• R has many spatial analysis packages, e.g., the gstat package.

• GRASS GIS, commonly referred to as GRASS (Geographic Resources Analysis

Support System), is a free and open source Geographic Information System (GIS)

software suite used for geospatial data management and analysis, image processing,

graphics and maps production, spatial modeling, and visualization.

For more information about spatial analysis software, see the following link:

http://en.wikipedia.org/wiki/List_of_spatial_analysis_software

5. Some applications of Spatial Analysis (SA) in Epidemiology

Among the most important applications of SA in Epidemiology are disease mapping,

ecologic studies, cluster identification and environment surveillance (Santos & Souza,

2007).

In the literature there are hundreds of studies that use SA in some aspect related

to Epidemiology. Here, three papers that use different SA techniques were selected from the

literature to be given as examples of application of SA in Epidemiology.

Porcasi et al. (2006) studied the infestation of rural houses by Triatoma infestans

(Hemiptera: Reduviidae) in the southern area of the Gran Chaco in Argentina.

They analyzed the spatial pattern of house infestations by T. infestans before and after

house spraying with deltamethrin in the San Martín Department (an arid Chaco region of

central Argentina). Before house spraying, all houses within this department were

inspected and infestation by T. infestans in the domestic and peridomestic structures was

recorded. All houses within the department were treated with deltamethrin. House

spraying was carried out between November 2003 and June 2004. Latitude and longitude

coordinates of 151 localities were recorded with a GPS and used to build a geographic

database. These localities were the units of analysis and were considered infested if at least

one house of the group was infested by T. infestans. One year after spraying, the

localities recorded before the insecticide application were visited to carry out an active

search for T. infestans. A locality was considered infested if at least one peridomestic

structure was positive for T. infestans. Spatial analysis of the house infestation rate was

carried out with the Bernoulli model of the SaTScan statistics, version 4.0 (Information

Management Services 2003). This point-pattern analysis seeks the existence of groups of

localities with significantly higher or lower infestation rates than the departmental average.

To carry out the analysis, the scan statistic calculates the number of infested houses in

circles of increasing radii and compares this number with the expected number of houses

predicted by the binomial probability distribution.
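The scanning procedure described above can be sketched in Python as follows. This is not the SaTScan implementation: it is a simplified illustration that uses the Bernoulli log likelihood ratio of Kulldorff's spatial scan statistic, looks only for high-rate clusters, and omits the Monte Carlo replications that SaTScan uses to assess significance. The function names, the 50% limit on circle size and the six localities are assumptions made for the example.

```python
import math

def _x_log_ratio(a, b):
    """a * ln(a / b), with the convention 0 * ln(0) = 0."""
    return a * math.log(a / b) if a > 0 else 0.0

def bernoulli_llr(c, n, C, N):
    """Log likelihood ratio for a circle with c infested of n houses (totals: C of N)."""
    if n == 0 or n == N:
        return 0.0
    inside = _x_log_ratio(c, n) + _x_log_ratio(n - c, n)
    outside = _x_log_ratio(C - c, N - n) + _x_log_ratio((N - n) - (C - c), N - n)
    null = _x_log_ratio(C, N) + _x_log_ratio(N - C, N)
    return inside + outside - null

def most_likely_high_cluster(localities, max_fraction=0.5):
    """Scan circles centred on each locality; return (llr, centre, radius, members).

    Each locality is (x, y, houses, infested_houses), with houses > 0.
    """
    N = sum(loc[2] for loc in localities)
    C = sum(loc[3] for loc in localities)
    best = (0.0, None, 0.0, [])
    for k, (xk, yk, _, _) in enumerate(localities):
        # Order all localities by distance to the centre, then grow the circle.
        by_dist = sorted(
            (math.hypot(x - xk, y - yk), j) for j, (x, y, _, _) in enumerate(localities)
        )
        n = c = 0
        members = []
        for dist, j in by_dist:
            n += localities[j][2]
            c += localities[j][3]
            members.append(j)
            if n > max_fraction * N:
                break
            # Keep only circles whose rate inside exceeds the rate outside.
            if c / n > (C - c) / (N - n):
                llr = bernoulli_llr(c, n, C, N)
                if llr > best[0]:
                    best = (llr, k, dist, list(members))
    return best

# Hypothetical localities: three heavily infested ones close together, three not.
localities = [
    (0, 0, 10, 8), (1, 0, 10, 7), (0, 1, 10, 9),
    (10, 10, 10, 1), (11, 10, 10, 0), (10, 11, 10, 2),
]
llr, centre, radius, members = most_likely_high_cluster(localities)
print(llr, centre, radius, members)
```

A search for low-rate clusters would reverse the rate comparison, and a p-value would require repeating the scan on many random relabellings of the infested houses.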

The pattern of house infestation before the insecticide application showed two

clusters of high house infestation rate. A primary cluster was identified to the southwest of

the department, where 15 localities showed a house infestation rate of 68.4% (n = 52

houses). A secondary cluster of high house infestation rate, composed of four localities with

100% of infested houses (n = 12), was located at the northwest of the department. A cluster of

low house infestation rate (11.4%) was located at the northeast of the department,

including four localities (with 35 houses) (Figure 17 A). Another cluster of low house

infestation (20.7%) was located east of the department, including 150 houses in 48

localities (Figure 17 B). A comparison of house infestation before and after the spraying

intervention in 89 localities showed that, of the 47 localities that were infested before the

spraying, 46.8% (n = 22) remained infested 1 yr later. Of the 42 localities that were not

infested before the spraying, 69% (n = 29) remained uninfested 1 yr later (Fig. 2).

All localities included in the cluster of low infestation recorded before spraying were

included in the low infestation cluster after spraying to the east of the San Martín

Department.

Figure 17. (A) Cluster of high infestation (primary and secondary clusters: 68.4 and 100%

infestation, respectively), as closed squares and cluster of low infestation (11.4%) as open

squares before the insecticide application. (B) Cluster of high infestation (46.9%), as closed

circles and cluster of low infestation (20.7%) as open circles after the insecticide

application.

The results observed in this study confirm that the identification of the infestation

rate clustering at the departmental level by using a geographic information system offers

several advantages over the traditional reporting system currently in use by the vector

control programs in Argentina. It allows for a clear identification of heterogeneity in the

infestation rate distribution that could be used as a basis for risk stratification and

differential allocation of resources. The GIS database also allows a more efficient

mechanism for monitoring vector control activities, compared with procedures based on

hard-copy forms. Other activities carried out by the Chagas disease control program (e.g.,

parasitological treatments) or not connected with the program (e.g., vaccination programs)

could use the same information base.

In a study carried out by Chen et al. (2007), the purpose was to present kriging based

on data from a few sites as a solution to the problem of predicting the spatial distribution of

S. japonicum infection over Dangtu county, China. They established a population-based

database containing the human prevalence of schistosomiasis at the village level from 2001

to 2004. Spatial correlation analysis was performed by the semivariogram model, which

provides a measure of variance as a function of distance between data points. The

semivariance was calculated. They used the spatial analyst module of ArcGIS 9.0 and

selected the exponential model to fit the spatial correlation of infection rate with S.

japonicum. They investigated the directional trend of the infection to identify the

presence or absence of trends in a certain direction in the input dataset, and to select the

suitable order for the ordinary kriging analysis in the next step. By using the geostatistical

module of ArcGIS 9.0 they selected the suitable order to carry out the ordinary kriging

analysis and developed the prediction map based on the human infection rate for each

endemic village. They then categorized the predicted infection rate to create the

schistosomiasis endemic map with classified strata, based on the “Chinese Operational

Scheme of Schistosomiasis Control” enacted by the Ministry of Health (MoH) in 2004. At the

same time, the map of the standard error of the prediction, i.e. the uncertainty of the

prediction, was produced to qualify the prediction result, enabling them to put forward a

particular control strategy for each endemic stratum according to their specific

environmental characteristics.
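The semivariance mentioned above is, for a lag distance h, half the mean squared difference between all pairs of observations separated by approximately h. The Python sketch below computes this empirical semivariogram by distance bins. It is an illustration only and not the ArcGIS procedure used by Chen et al. (2007): the function name, the bin edges and the four village values are hypothetical, and fitting a model such as the exponential one to the resulting points is left out.

```python
import numpy as np

def empirical_semivariogram(coords, values, bin_edges):
    """Return (mean lag distance, semivariance, pair count) for each distance bin."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)  # every pair of sites exactly once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    half_sq_diff = 0.5 * (values[i] - values[j]) ** 2
    lags, gammas, counts = [], [], []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (dist > lo) & (dist <= hi)
        if in_bin.any():  # skip bins that contain no pairs
            lags.append(dist[in_bin].mean())
            gammas.append(half_sq_diff[in_bin].mean())
            counts.append(int(in_bin.sum()))
    return np.array(lags), np.array(gammas), np.array(counts)

# Hypothetical villages along a transect with prevalence (%) rising steadily.
coords = [(0, 0), (1, 0), (2, 0), (3, 0)]
prevalence = [1.0, 2.0, 3.0, 4.0]
lags, gammas, counts = empirical_semivariogram(coords, prevalence, [0, 1.5, 2.5, 3.5])
print(lags, gammas, counts)
```

Semivariance that increases with distance, as in this example, indicates that nearby villages are more alike than distant ones; a fitted model then supplies the weights for kriging.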

Based on the result of directional trend analysis, they developed a prediction map by using

ordinary kriging, which is shown in Figure 18. The darker the colour, the higher the

predicted S. japonicum infection rate. The apparent spatial pattern of S. japonicum infection

in Dangtu county presented an infection situation which was the most serious in the north-

west and south-east, while the south-western, north-eastern and central areas were much

less affected with medium infection rates in the transition areas. From the category map,

developed as part of the study, they found that most endemic villages in Dangtu county fell in

the 4th and 5th epidemic strata accounting for 72.7% of all the endemic villages in the

north-west and south-east part of the county. This was followed by the 3rd stratum,

accounting for 14.1% of all the endemic villages, and the 2nd stratum, accounting for 13.2%

of the endemic villages which were found in the centre of the county (Figure 19).

Figure 18. Prediction map of S. japonicum infection in Dangtu county, China

Figure 19. Category map of epidemic strata of S. japonicum infection in Dangtu county.

It is suitable to predict the spatial distribution of the prevalence of S. japonicum

infection with population-based prevalence data for each endemic village. The authors of

this study stated that the directional trend and moderate spatial correlation resulting from

the study are partially due to the spatially correlated distribution of vegetation and

temperature as well as the water-contact behavior related to the Yangtze River and its

branches. Since different control strategies can be defined according to the principle of the

national schistosomiasis control strategy, they recommend that the control strategy be

defined based on the local environmental settings as well as epidemic strata at base level or

village level, at least in the Dangtu county.

According to Hay, Graham and Rogers (2006): “Geostatistical kriging has been

applied to a variety of disease prediction problems. For example, Oliver et al. (1992) and

Webster et al. (1994) were amongst the first to apply geostatistics to characterize and

map disease pattern. Kelsall and Wakefield (2002) used kriging to map colorectal cancer in

Birmingham, UK. Geostatistical cokriging has been applied to map the risk of childhood

cancer (Oliver et al., 1998) and tick habitats from NOAA AVHRR imagery (Estrada-Pena,

1998)”.

6. References

Anselin, L. 1993. The Moran Scatterplot as an ESDA Tool to Assess Local Instability in

Spatial Association. Research Paper 9330. Paper prepared for presentation at the

GISDATA Specialist Meeting on GIS and Spatial Analysis, The Netherlands, December

1-5.

Anselin, L. 1995. Local Indicators of Spatial Association-LISA. Geographical Analysis 27(2).

Chen, Z.; Zhou, X.N.; Yang, K.; Wang, X.H.; Yao, Z.Q.; Wang, T.P.; Yang, G.J.; Yang, Y.J.; Zhang,

S.Q.; Wang, J.; Jia, T.W.; Wu, X.H. 2007. Strategy formulation for Schistosomiasis

japonica control in different environmental settings supported by spatial analysis: a

case study from China. Geospatial Health 2: 223-231.

Diggle, P.J. 2010. Historical Introduction in Spatial Epidemiology in Gelfand, A.E.; Diggle, P.J.;

Fuentes, M. and Guttorp, P. eds, Handbook of spatial statistics. Taylor and Francis,

London.

Estrada-Pena, A. 1998. Geostatistics and remote sensing as predictive tools of tick

distribution: a cokriging system to estimate Ixodes scapularis (Acari: Ixodidae)

habitat suitability in the United States and Canada from advanced very high

resolution radiometer satellite imagery. Journal of Medical Entomology 35: 989–995.

Gatrell, A.C.; Bailey, T.C.; Diggle, P.J.; B.S. Rowlingson. 1996. Spatial Point Pattern Analysis

and Its Application in Geographical Epidemiology. Transactions of the Institute of

British Geographers, New Series 21(1):256-274.

Gómez-Barroso, D.; Nogareda, F.; Cano, R.; Pina, M.F.; Del Barrio, J.L.; Simón, F. 2011.

Patrón espacial de la legionelosis en España, 2003-2007. Gaceta Sanitaria 25(4):

290–295.

Graham, A.J.; Atkinson, P.M. & Danson, F.M. 2004. Spatial analysis for epidemiology. Acta

Tropica 91: 219–225.

Haining, R. 1994. Designing spatial data analysis modules for geographical information

systems in Fotheringham, A.S. and Rogerson, P. eds, Spatial analysis and GIS. Taylor

and Francis, London.

Kelsall, J. & Wakefield, J. 2002. Modeling spatial variation in disease risk: a geostatistical

approach. Journal of the American Statistical Association 97: 692–701.

Lai, P.C.; So F.M. & Chan, K.W. Eds. 2009. Spatial epidemiological approaches in disease

mapping and analysis. CRC Press Taylor and Francis Group, Boca Ratón, FL.

Jerrett, M.; Gale, S.; Kontgis, C. 2010. Spatial Modeling in Environmental and Public Health

Research. International Journal of Environmental Research and Public Health 7: 1302-1329.

Oliver, M.A., Muir, K.R., Webster, R., Parkes, S.E., Cameron, A.H., Stevens, M.C.G. & Mann, J.R.

1992. A geostatistical approach to the analysis of pattern in rare disease. Journal of

Public Health Medicine 14: 280–289.

Oliver, M.A., Webster, R., Lajaunie, C., Muir, K.R., Parkes, S.E., Cameron, A.H., Stevens, M.C.G.

& Mann, J.R. 1998. Binomial cokriging for estimating and mapping the risk of

childhood cancer. IMA Journal of Mathematics Applied in Medicine and Biology 15:

279–297.

Pace, R.K. & LeSage, J. 2010. Spatial Econometrics in Gelfand, A.E.; Diggle, P.J.; Fuentes, M. and

Guttorp, P. eds, Handbook of spatial statistics. Taylor and Francis, London.

Paulan, S.C.; Silva, H.; Freitas Lima, E.; Flores, E.F.; Tachibana, V.M.; Kanda, C.Z.; Noronha,

A.C.F and Dobre, P. 2012. Spatial distribution of canine visceral Leishmaniasis in Ilha

Solteira, Sao Paulo, Brazil. Engenharia Agricola Jaboticabal 32(4): 765-774.

Pina, M.F.; Ferreira Alves, S.; Correia Ribeiro, A.I.; Castro Olhero, A. 2010. Epidemiología

espacial: nuevos enfoques para viejas preguntas. Universitas Odontológica 29(63):

47-65.

Porcasi, X.; Catalán, S.S.; Hrellac, H.; Scavuzzo, M.C. and Gorla, D.E. 2006. Infestation of Rural

Houses by Triatoma Infestans (Hemiptera: Reduviidae) in Southern Area of Gran

Chaco in Argentina. Journal of Medical Entomology 43(5): 1060-1067.

Rezaeian, M.; Dunn, G.; St Leger, S.; Appleby, L. 2007. Geographical epidemiology, spatial

analysis and geographical information systems: a multidisciplinary glossary. Journal

of Epidemiology & Community Health 61: 98-102.

Rot, M. de la C. 2006. Introducción al análisis de datos mapeados o algunas de las (muchas)

cosas que puedo hacer si tengo coordenadas. Ecosistemas 15 (3): 19-39.

Santos, S.M. & Souza, W.V. Eds. 2007. Introdução à Estatística Espacial para a Saúde Pública /

Ministério da Saúde, Fundação Oswaldo Cruz; - Brasília : Ministério da Saúde. (Série

B. Textos Básicos de Saúde) (Série Capacitação e Atualização em Geoprocessamento

em Saúde;3).

Waller, L. 2010. Point Process Models and Methods in Spatial Epidemiology in Gelfand, A.E.;

Diggle, P.J.; Fuentes, M. and Guttorp, P. eds, Handbook of spatial statistics. Taylor and

Francis, London.

Webster, R., Oliver, M.A., Muir, K.R. and Mann, J.R. 1994. Kriging the local risk of a rare

disease from a register of diagnoses. Geographical Analysis 26 (2): 168–185.