Computers, Environment and Urban Systems
29 (2005) 558–579
www.elsevier.com/locate/compenvurbsys
A cokriging method for estimating population density in urban areas
Changshan Wu a,*, Alan T. Murray b,1
a Department of Geography, University of Wisconsin-Milwaukee, P.O. Box 413,
Milwaukee, WI 53201-0413, USA
b Department of Geography, The Ohio State University, Columbus, OH 43210-1361, USA
Abstract
Population information is typically available for analysis in aggregate socioeconomic
reporting zones, such as census blocks in the United States and enumeration districts in the
United Kingdom. However, such data mask underlying individual population distributions
and may be incompatible with other information sources (e.g. school districts, transportation
analysis zones, metropolitan statistical areas, etc.). Moreover, it is well known that there are
potential significance issues associated with scale and reporting units, the modifiable areal unit
problem (MAUP), when such data are used in analysis. This may lead to biased results in spa-
tial modeling approaches. In this study, impervious surface fraction derived from Thematic
Mapper (TM) imagery was applied to derive the underlying population of an urban region.
A cokriging method was developed to interpolate population density by modeling the spatial
correlation and cross-correlation of population and impervious surface fraction. Results sug-
gest that population density can be accurately estimated using cokriging applied to impervious
surface fraction. In particular, the relative population estimation error is −0.3% for the entire
study area and 10–15% at block group and tract levels. Moreover, unlike other interpolation
methods, cokriging gives estimation variance at the TM pixel level.
© 2005 Elsevier Ltd. All rights reserved.
0198-9715/$ - see front matter © 2005 Elsevier Ltd. All rights reserved.
doi:10.1016/j.compenvurbsys.2005.01.006
* Corresponding author. Tel.: +1 414 2294860; fax: +1 414 2293981.
E-mail addresses: [email protected] (C. Wu), [email protected] (A.T. Murray).
1 Tel.: +1 614 688 5441; fax: +1 614 292 6213.
C. Wu, A.T. Murray / Comput., Environ. and Urban Systems 29 (2005) 558–579 559
Keywords: Population interpolation; Cokriging; Remote sensing
1. Introduction
The difficulties associated with the application of zone-based census population
data in geographical analyses have been well documented in previous studies
(Fotheringham & Wong, 1991; Martin, 1989, 1996). One important issue is data
aggregation. In many applications, census data cannot sufficiently represent the underlying geographical distribution of population because it is reported through
aggregating individual population counts in irregular areal units, which can be geo-
graphically meaningless. This aggregation tends to smooth local variability and
requires an assumption of uniformly distributed population within a reporting unit
(Moon & Farmer, 2001). While there are legitimate reasons for reporting census
information in this way (i.e. privacy of census respondents), business and service
planning benefit substantially from greater resolution population data (Longley
& Clarke, 1995). For example, Martin and Williams (1992) and Beguin, Thomas, and Vandenbussche (1992) emphasized the importance of detailed population
information in the location analyses of health-care centers and public libraries.
Moreover, in urban sustainability studies Harris and Longley (2000) point out that
census-based models tend to overestimate residential area because of its coarse
resolution.
Another difficulty with zone-based population data is related to incompatible
spatial information layers (Bracken, 1993; Goodchild, Anselin, & Deichmann,
1993). Different departments and agencies collect and distribute data in varying zonal arrangements (e.g. school districts, transportation analysis zones, metropolitan
statistical areas, etc.). As a consequence, a significant problem arises in regional
analysis and modeling, in which multiple data sources must be integrated before
analysis can be implemented (Goodchild et al., 1993). Moreover, the boundaries
of areal units in census data are not data derived, but rather are the result of enu-
meration and reporting. The modifiable areal unit problem (MAUP) may exist
when utilizing such data in geographical applications. In particular, the relationship
between variables may only be valid for one particular zonal arrangement and scale,
potentially biasing results obtained in statistical and spatial analyses
(Martin, 1996; Openshaw, 1977).
One approach for dealing with the above problems is to transform aggregated
census data to grid-based population estimates using areal interpolation (Langford,
Maguire, & Unwin, 1991; Martin, 1989; Okabe & Sadahiro, 1997). Areal interpola-
tion methods may be grouped into two categories: simple interpolation and intelli-
gent interpolation (Okabe & Sadahiro, 1997). Simple interpolation involves
transferring data from irregular polygons to regular grids without any supplementary data (Lam, 1983; Martin, 1996; Tobler, 1999). This method is preferred when
fast computation is important or additional information is unavailable (Okabe &
Sadahiro, 1997). In contrast, intelligent interpolation transfers data with the help
of additional information (Harris & Longley, 2000; Langford et al., 1991). This
method has proven more accurate than simple interpolation, although greater
computational processing is required (Fisher & Langford, 1995; Sadahiro, 1999).
Regression analyses supplemented with land use and land cover data are often applied
in intelligent interpolation (Langford et al., 1991; Langford & Unwin, 1994).
However, detailed biophysical information is usually lost in producing land use data
from remotely sensed images (Jensen, 1983). As a result, limited land use types are
too coarse for estimating detailed population density. Moreover, the basic assump-
tions of regression analyses (e.g. spatial independence) are unlikely to be satisfied in
geographical applications (Griffith & Can, 1996). Impervious surface fraction in res-
idential areas may be useful for supplementing the developed interpolation process.
Detailed information on residential areas can thus be maintained, providing clues on
population distribution (Ji & Jensen, 1999). Spatial autocorrelation in impervious surface fraction and population, and the cross-correlation between these two spatial
variables, are explored and modeled in this paper using geostatistical techniques.
Based on modeled spatial relationships, cokriging is applied in this paper to deter-
mine population density in Columbus, OH.
The organization of this paper is as follows. Our study area and data sources
are described in Section 2. The process of deriving impervious surface fraction
in residential areas from remotely sensed imagery is described in Section 3. In particular,
we detail the creation of impervious surface fraction from ETM+ imagery
for the entire study region and describe a procedure for delineating residential
areas within this region. Population density estimation using cokriging combined
with residential impervious surface fraction is reported in Section 4. Accuracy
assessment of the population estimates is addressed in Section 5. Section 6 reports
an adjustment of the population estimates. Finally, conclusions and discussion are
provided in Section 7.
2. Study area and data sources
A portion of the Columbus metropolitan area in Franklin County, OH, USA was
chosen as our study region for this research. This region is 47.4 km² and is divided
into 36 tracts, 125 block groups, and 2445 blocks in the 2000 US Census (see Fig. 1).
The 2000 Census data were acquired from the ESRI website in the shapefile format
(United States Census Bureau, 2002). Landsat 7 ETM+ imagery, which was utilized
to derive residential impervious surface fraction, was acquired on July 8, 1999. Additional data, such as Digital Orthophoto Quarterquadrangles (DOQQs) from the
Ohio Geographically Referenced Information Program (OGRIP, 1999) and Na-
tional Land Cover Data (NLCD) from the Multi-Resolution Land Characteristics
Consortium (Multi-Resolution Land Characteristics Consortium, 2002), were utilized
to examine residential classification accuracy and select training samples. Moreover,
parcel data from the Franklin County Auditor (2002) and address-based
employment data from the Mid-Ohio Regional Planning Commission (MORPC,
2002) were utilized to identify possible misclassified pixels since these data maintain
detailed local information about land use and employment.
Fig. 1. Study area as part of the Columbus metropolitan area in Franklin County, OH, USA (left) and
Landsat ETM+ image acquired on July 8, 1999 for this area (right).
3. Estimating impervious surface fraction in residential areas
Impervious surface is any material prohibiting the infiltration of water into soil.
As a major component of urban infrastructure, impervious surface has become a pri-
mary variable in urban planning and environmental management (Ji & Jensen, 1999;
Ridd, 1995). Impervious surface fraction, calculated as the proportion of impervious
surface over a small area, has been found to reveal more information about built-up areas than land use and land cover classification (Ji & Jensen, 1999). For population
estimation, as an example, impervious surface in residential areas generally corre-
sponds to housing, which serves as an indicator of people.
3.1. Impervious surface fraction estimation
Methods for quantifying impervious surface from remotely sensed data are typi-
cally based on either fuzzy classification or spectral mixture analysis (Ji & Jensen,
1999; Phinn, Stanford, Scarth, Murray, & Shyy, 2002; Rashed, Weeks, Gadalla, &
Hill, 2001). In this study, a spectral mixture analysis method was applied to estimate
impervious surface fraction from an ETM+ image (Wu & Murray, 2003). Four end-
members (see Fig. 2), vegetation, high albedo, low albedo and soil, were selected to
represent heterogeneous urban land use and land cover through the analysis of the
spectral feature spaces of a transformed ETM+ image using the maximum noise fraction (MNF) transformation, the details of which are given in Green, Berman,
Switzer, and Craig (1988) and Lee, Woodyatt, and Berman (1990). Consequently,
a fully constrained four-endmember linear mixing model was applied to calculate
each endmember fraction from the Landsat ETM+ data (see Fig. 3). Furthermore,
impervious surface fraction in each ETM+ pixel was modeled by adding the frac-
tions of low albedo and high albedo endmembers after removing the effects of water
and clouds (see Fig. 4).
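For orientation, a fully constrained linear mixing model of the kind referred to above is commonly written in the generic form below. This is a textbook-style sketch rather than the exact formulation of Wu and Murray (2003): the symbols $R_b$ (reflectance of a pixel in band $b$), $R_{i,b}$ (reflectance of endmember $i$ in band $b$), $f_i$ (endmember fraction), $e_b$ (residual) and $f_{\mathrm{imp}}$ (impervious surface fraction) are introduced here for illustration only.

```latex
% Generic fully constrained four-endmember linear mixing model (illustrative notation)
R_b = \sum_{i=1}^{4} f_i \, R_{i,b} + e_b ,
\qquad \sum_{i=1}^{4} f_i = 1 ,
\qquad f_i \ge 0 \quad (i = 1, \dots, 4)

% Impervious surface fraction as the sum of the two albedo fractions,
% evaluated after water and cloud pixels have been removed
f_{\mathrm{imp}} = f_{\mathrm{low\ albedo}} + f_{\mathrm{high\ albedo}}
```

The fractions are typically chosen to minimize the residuals $e_b$ over all bands subject to the two constraints, which is what distinguishes the fully constrained model from an unconstrained least squares fit.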
Fig. 2. ETM+ reflectance spectra of selected endmembers. These endmembers were chosen by analyzing
the spectral feature spaces of the MNF transformed ETM+ image.
Fig. 3. Endmember fraction images calculated through a fully constrained four-endmember linear mixing
model: (a) vegetation fraction image; (b) high albedo fraction image; (c) low albedo fraction image
(including water); (d) soil fraction image.
3.2. Residential area classification
To this point we have detailed impervious surface fraction estimation for the
entire study area. However, we know that population (the major interest in this
research) is generally restricted to residential areas. Therefore, it is necessary to
identify residential land use within the study area. A maximum likelihood classification
was applied to delineate residential pixels. Similar approaches have been utilized in
classifying residential land uses by Lo (1995), Mesev (1998), and Chen (2002). Six
classes, vegetation, soil, water, commercial and transportation, low density residen-
tial, and high density residential, were specified in selecting training samples with the
help of DOQQ data, NLCD data, and the original ETM+ image. The classification
(see Fig. 5) was conducted using a maximum likelihood classifier provided in
ERDAS Imagine 8.4 (ERDAS Imagine, 1997). After deriving this image, we grouped
the six classes into two major classes: residential and non-residential.
Fig. 4. Impervious surface fraction image calculated through adding low albedo and high albedo
endmember fractions after removing the effects of water and clouds.
Fig. 5. A maximum likelihood classification of the ETM+ image for the Columbus metropolitan area.
Since we are estimating detailed population density, residential classification accuracy
is essential in this research. Therefore, we performed post-processing to identify
possible misclassified pixels. In particular, pixels within zero population census
blocks should obviously not be classified as residential land use. Such pixels were
identified and reclassified as non-residential. Conversely, pixels within high population
density census blocks were also subject to further scrutiny. If these pixels are
not classified as residential, they are possibly misclassified and require further analysis.
In this study, we utilized parcel and employment data to identify potential
misclassified pixels. Group-quarter populations, i.e. people in institutions, shelters, and
nursing homes, and students in university dormitories (Plane & Rogerson, 1994), were
typically found in these misclassified pixels. Such areas are difficult to classify using
only remotely sensed data because they share similar spectral signatures with commercial
land uses. With the help of parcel and employment data, we were able to identify
these pixels and reclassify them as residential areas.
Fig. 6. Residential land use classification after the maximum likelihood classification and post-processing.
Table 1
Residential land use classification accuracy assessment
Classified image       Reference image                   Commission error (%)
                       Residential    Non-residential
Residential            146            15                 9.32
Non-residential        25             214                10.46
Omission error (%)     14.62          6.55
Overall accuracy = 90.00%, overall kappa statistic = 0.7942.
The classification accuracy of residential land use after the maximum likelihood
classification and post-processing (see Fig. 6) was examined using 400 stratified ran-
domly selected samples. The DOQQ images acquired between 1994 and 1995 were
used in this study for ground truthing. These DOQQs were co-registered with the
ETM+ image. A 3 by 3 sampling unit was adopted to avoid geometric errors. The
overall classification accuracy is 90% and the overall kappa coefficient is 0.7942 (see Table 1).
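The accuracy figures reported in Table 1 can be reproduced directly from the four cell counts. The following is a minimal sketch, not the authors' workflow: the function name `accuracy_report` and the row/column convention (rows are classified classes, columns are reference classes) are assumptions made for this illustration.

```python
def accuracy_report(matrix):
    """Summarise a square confusion matrix.

    matrix[i][j] is the number of samples classified as class i whose
    reference (ground truth) class is j. This layout is an assumption
    chosen to match the arrangement of Table 1.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]  # classified totals
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]  # reference totals
    correct = sum(matrix[i][i] for i in range(k))

    overall = correct / n
    # Agreement expected by chance, used by Cohen's kappa.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    kappa = (overall - expected) / (1 - expected)
    commission = [1 - matrix[i][i] / row_totals[i] for i in range(k)]
    omission = [1 - matrix[i][i] / col_totals[i] for i in range(k)]
    return {
        "overall": overall,
        "kappa": kappa,
        "commission": commission,
        "omission": omission,
    }


# Cell counts from Table 1: [residential, non-residential]
table1 = [[146, 15], [25, 214]]
report = accuracy_report(table1)
print(f"overall accuracy = {report['overall']:.2%}, kappa = {report['kappa']:.4f}")
```

Commission error is computed along rows (pixels wrongly included in a class) and omission error along columns (reference pixels missed by a class), which is why the two differ for the same class.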
With impervious surface fraction for the entire study area (Fig. 4) and the iden-
tified residential land use areas (Fig. 6), impervious surface fraction in residential
areas was easily obtained (see Fig. 7).
Fig. 7. Impervious surface fraction in residential areas.
4. Interpolating population density using cokriging
Once impervious surface fraction has been obtained for residential areas, it can be utilized
as supplementary data to interpolate population density. Population density is
usually estimated using a regression approach, which models the relationship between
population and supplementary data derived from remote sensing imagery
(Chen, 2002; Harvey, 2002; Lo, 1995). An implicit assumption of regression analysis
is that population density is spatially independent. However, many researchers have
questioned this assumption, claiming that simple regression may lead to biased re-
sults (Griffith, 1993; Griffith & Can, 1996). Therefore, a model considering spatial
autocorrelation is more appropriate. Cokriging may improve the estimation precision by accounting simultaneously for spatial autocorrelation in population density
and impervious surface fraction and the cross-correlation between these spatial vari-
ables. Moreover, it is suitable when the variable to be estimated (e.g. population den-
sity) is under-sampled while other supplementary variables are abundant (e.g.
impervious surface fraction).
Cokriging is a geostatistical method originating from mining applications (Cres-
sie, 1993; Journel & Huijbregts, 1978) and widely applied in soil science (Vauclin,
Vieira, Vachaud, & Nielsen, 1983; Webster, 1985; Webster & Burgess, 1980). Geostatistical methods were introduced in remote sensing in the late 1980s (Curran, 1988;
Woodcock, Strahler, & Jupp, 1988). Now geostatistics are commonly applied in soil
science, biogeography, climatology, and environmental studies (Atkinson, Webster,
& Curran, 1992, 1994; Oliver, Webster, & Gerrard, 1989a, 1989b). A review of geo-
statistical methods and associated applications may be found in Cressie (1993), Cur-
ran and Atkinson (1998), and Curran (2001). Although widely applied in physical
geography, cokriging has rarely been utilized in estimating socio-economic conditions,
such as population densities. In this paper, population density is estimated
using a cokriging method in which the impervious surface fraction is taken as a
secondary variable to improve estimation accuracy.
4.1. Cokriging theory
As an extension of ordinary kriging to two or more variables, cokriging is based
on regionalized variable theory (Journel & Huijbregts, 1978; Oliver et al., 1989a).
According to this theory, any regionalized variable z(x) can be considered a realization
of a random function Z(x), which is a combination of a deterministic component,
m(x), and a random fluctuation, $\varepsilon(x)$:
$$ z(x) = m(x) + \varepsilon(x) \tag{1} $$
where x denotes the geographical coordinates in one, two, or three dimensions; m(x)
indicates a geographical trend or drift; and $\varepsilon(x)$ is the spatially dependent random
error with mean zero. In most applications, the deterministic component, m(x), is
assumed to be locally constant,
$$ m(x) = \mu \tag{2} $$
and for any given distance and direction h, the variance of differences between z(x)
and z(x + h) is finite and independent of x:
$$ \operatorname{var}[z(x) - z(x + h)] = E[\{z(x) - z(x + h)\}^2] = 2\gamma(h) \tag{3} $$
where vector h, the lag, is a given separation distance and direction from x, and $\gamma(h)$
is the variogram. $\gamma(h)$ has been found to be an important tool in modeling spatial
autocorrelation (Journel & Huijbregts, 1978). Moreover, if two or more variables
are needed, a cross-variogram is defined as follows:
$$ \gamma_{uv}(h) = \tfrac{1}{2} E[\{z_u(x) - z_u(x + h)\}\{z_v(x) - z_v(x + h)\}] \tag{4} $$
Based on regionalized variable theory, an under-sampled variable can be estimated
using cokriging. This method ensures unbiased estimates with minimum and
known variance (Curran, 2001). If we consider estimating a variable u in a block B
with sampling points of u and a second variable v, our estimate will be
$$ \hat{z}_u(B) = \sum_{i=1}^{N_u} \lambda_{ui} z_u(x_{ui}) + \sum_{j=1}^{N_v} \lambda_{vj} z_v(x_{vj}) \tag{5} $$
in which $N_u$ and $N_v$ are the numbers of sampling points for variables u and v; $x_{ui}$ and
$x_{vj}$ are the locations of sampling points for variables u and v, respectively; and $\lambda_{ui}$ and
$\lambda_{vj}$ are the weights to be calculated.
In order to ensure unbiasedness, the following constraints must be satisfied
(Aboufirassi & Marino, 1984):
$$ \sum_{i=1}^{N_u} \lambda_{ui} = 1 \tag{6} $$
$$ \sum_{j=1}^{N_v} \lambda_{vj} = 0 \tag{7} $$
The first constraint indicates that at least one observation of the primary variable u is
necessary for cokriging. Moreover, constraint (7) ensures that the summation of the
weights for the secondary variable v is zero. Subject to these constraints, we minimize
the estimation variance:
$$ \sigma_u^2(B) = E[\{z_u(B) - \hat{z}_u(B)\}^2] \tag{8} $$
This is an optimization problem in which $\lambda_{ui}$ and $\lambda_{vj}$ are the decision variables and
$\sigma_u^2(B)$ is the objective function. Standard Lagrangian techniques can be applied to
solve this problem. This results in the following:
$$ \sum_{i=1}^{N_u} \lambda_{ui} \gamma_{uu}(x_{ui}, x_{uk}) + \sum_{j=1}^{N_v} \lambda_{vj} \gamma_{uv}(x_{uk}, x_{vj}) + \psi_u = \bar{\gamma}_{uu}(B, x_{uk}), \quad k = 1, \dots, N_u \tag{9} $$
$$ \sum_{i=1}^{N_u} \lambda_{ui} \gamma_{uv}(x_{ui}, x_{vl}) + \sum_{j=1}^{N_v} \lambda_{vj} \gamma_{vv}(x_{vj}, x_{vl}) + \psi_v = \bar{\gamma}_{uv}(B, x_{vl}), \quad l = 1, \dots, N_v \tag{10} $$
Here $\gamma_{uu}(x_{ui}, x_{uk})$ is the semi-variogram of variable u between sites i and k,
$\gamma_{uv}(x_{uk}, x_{vj})$ is the cross semi-variogram between variables u and v at sites k and j,
and $\psi_u$ and $\psi_v$ are the Lagrange multipliers. Finally, $\bar{\gamma}_{uv}(B, x_{vl})$ is the
cross semi-variogram between variables u and v at block B and site l.
Using this method, there are $N_u + N_v + 2$ equations and $N_u + N_v + 2$ variables,
which can be easily solved by linear algebra. After obtaining the parameters $\lambda_{ui}$
and $\lambda_{vj}$, $\hat{z}_u(B)$ may be estimated using Eq. (5). The cokriging variance can be obtained
as a byproduct of the cokriging process as follows:
$$ \sigma_u^2(B) = \sum_{i=1}^{N_u} \lambda_{ui} \bar{\gamma}_{uu}(B, x_{ui}) + \sum_{j=1}^{N_v} \lambda_{vj} \bar{\gamma}_{uv}(B, x_{vj}) - \psi_u - \bar{\gamma}_{uu}(B, B) \tag{11} $$
Matrix formulations of these equations can be found in Myers (1982), McBratney
and Webster (1983), and Aboufirassi and Marino (1984). Details on solving this
problem using Lagrangian techniques are given in Vauclin et al. (1983) and Atkinson
et al. (1992).
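To make the structure of Eqs. (5)-(7), (9) and (10) concrete, the sketch below assembles and solves the $N_u + N_v + 2$ linear system for a handful of points. It is a simplified illustration under stated assumptions, not the Gstat implementation used in this study: it estimates at a point x0 rather than over a block B (so the block-averaged variograms reduce to point variograms), it borrows the exponential coefficients of Table 2 purely as example values, and the sample coordinates and values are hypothetical. The helper names `gamma`, `solve` and `cokrige` are likewise invented for this example.

```python
import math

# (C0, C1) of the exponential models; values borrowed from Table 2, with r = 1000.
COEF = {
    ("u", "u"): (0.196, 0.176),
    ("v", "v"): (0.007, 0.0089),
    ("u", "v"): (0.012, 0.030),
    ("v", "u"): (0.012, 0.030),
}
RANGE = 1000.0


def gamma(a, b, p, q):
    """(Cross-)variogram between variable a at point p and variable b at point q."""
    h = math.hypot(p[0] - q[0], p[1] - q[1])
    if h == 0.0:
        return 0.0
    c0, c1 = COEF[(a, b)]
    return c0 + c1 * (1.0 - math.exp(-h / RANGE))


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def cokrige(xu, zu, xv, zv, x0):
    """Point cokriging estimate of u at x0; returns (estimate, lambda_u, lambda_v)."""
    nu, nv = len(xu), len(xv)
    sites = [("u", p) for p in xu] + [("v", p) for p in xv]
    a, b = [], []
    for var_k, pk in sites:
        # One row per sample site: Eq. (9) for u sites, Eq. (10) for v sites.
        row = [gamma(var_i, var_k, pi, pk) for var_i, pi in sites]
        row += [1.0, 0.0] if var_k == "u" else [0.0, 1.0]  # psi_u or psi_v
        a.append(row)
        b.append(gamma("u", var_k, x0, pk))
    a.append([1.0] * nu + [0.0] * nv + [0.0, 0.0])  # Eq. (6): weights of u sum to 1
    b.append(1.0)
    a.append([0.0] * nu + [1.0] * nv + [0.0, 0.0])  # Eq. (7): weights of v sum to 0
    b.append(0.0)
    sol = solve(a, b)
    lam_u, lam_v = sol[:nu], sol[nu:nu + nv]
    est = sum(l * z for l, z in zip(lam_u, zu)) + sum(l * z for l, z in zip(lam_v, zv))
    return est, lam_u, lam_v


# Hypothetical samples: u is sparse, v is also known at the target location.
xu = [(0.0, 0.0), (600.0, 0.0), (0.0, 800.0)]
zu = [2.0, 3.0, 1.5]
xv = [(0.0, 0.0), (600.0, 0.0), (0.0, 800.0), (300.0, 400.0)]
zv = [0.30, 0.45, 0.20, 0.35]
estimate, lam_u, lam_v = cokrige(xu, zu, xv, zv, (300.0, 400.0))
print(estimate, sum(lam_u), sum(lam_v))
```

The last two rows of the matrix are the unbiasedness constraints, so the solved weights of the primary variable sum to one and those of the secondary variable sum to zero up to rounding error.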
4.2. Variogram estimation
From Eqs. (6), (7), (9) and (10), it is clear that the weights $\lambda_{ui}$ and $\lambda_{vj}$ depend
on the variograms associated with variables u and v, their cross-variogram, and
block size. In this study, block size is defined to be the same as the TM image resolution
(30 m by 30 m). Therefore, once the variograms and cross-variogram have
been derived, cokriging is a straightforward process (Atkinson et al., 1992; Atkinson,
Webster, & Curran, 1994). In practice, the variograms are typically estimated using
sampling points as follows:
$$ \gamma(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \{z(x_i) - z(x_i + h)\}^2 \tag{12} $$
where $z(x_i)$ are known values of variable u or v at sampling point $x_i$, and N(h) is the
number of sampling point pairs separated by lag h. Similarly, the cross-variogram
can be estimated as follows:
$$ \gamma_{uv}(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \{z_u(x_i) - z_u(x_i + h)\}\{z_v(x_i) - z_v(x_i + h)\} \tag{13} $$
After obtaining the variogram and cross-variogram, a theoretical model is needed to
fit them. Such a model needs to be positive definite and coregionalized to ensure the
cokriging variance is non-negative. More discussion about choosing theoretical functions
can be found in McBratney and Webster (1986) and Curran (1988). In this
study, we chose a model satisfying the positive definite and coregionalized requirements,
the details of which are discussed later in this paper.
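The estimators in Eqs. (12) and (13) can be sketched in a few lines. The example below is a deliberately simplified illustration: it assumes samples on a regularly spaced one-dimensional transect, so that a lag is just an integer number of steps, whereas real census sample points are irregular and would need distance and direction tolerances. The function name and the data values are hypothetical.

```python
def experimental_variogram(zu, zv, lag):
    """Experimental cross-variogram of Eq. (13) on a regular 1-D transect.

    zu and zv hold values of the two variables at the same, equally spaced
    locations; lag is a positive integer number of steps. Passing the same
    list for zu and zv gives the ordinary variogram of Eq. (12).
    """
    if len(zu) != len(zv):
        raise ValueError("both variables must be observed at the same locations")
    pairs = [(i, i + lag) for i in range(len(zu) - lag)]
    if not pairs:
        raise ValueError("lag is too large for this transect")
    total = sum((zu[i] - zu[j]) * (zv[i] - zv[j]) for i, j in pairs)
    return total / (2 * len(pairs))  # N(h) = len(pairs)


# Hypothetical values of a primary (u) and secondary (v) variable.
u = [1.0, 2.0, 4.0, 7.0, 11.0]
v = [0.1, 0.2, 0.3, 0.4, 0.5]

gamma_u = experimental_variogram(u, u, 1)   # Eq. (12)
gamma_uv = experimental_variogram(u, v, 1)  # Eq. (13)
print(gamma_u, gamma_uv)
```

Note that the cross-variogram estimator only uses locations where both variables are observed, which is one reason a representative sampling point carrying both values is needed for each census block.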
4.3. Interpolating population density using cokriging
In this study population density is considered the primary variable to be esti-
mated. In addition, residential impervious surface fraction is considered a secondary
variable used to increase estimation accuracy. One issue is that reported census statistics are not based on a sampling point, but rather on an areal unit like a block. The
centroid of a census block may be used as the sampling point for the assignment of
population density. However, this method is not realistic because there may not
actually be people at the centroid of a block. Martin (1989) solved this problem
by using a population-weighted point as the representative point of a census block.
In a similar manner, in this research the central point of the pixel whose impervious
surface fraction is approximately equal to the block mean is used as a population-weighted
block point. In addition, we assign the impervious surface fraction of the pixel
and the average population density of the block to this sampling point. After obtaining
the impervious surface fraction and population density at these samples, the characteristics
of the data are explored. If they are not second-order stationary, i.e. do not have
constant mean and variance, the accuracy of the estimated experimental variogram and
associated cokriging will be degraded (Cressie, 1993). The histograms for population
density (see Fig. 8a) and impervious surface fraction (see Fig. 9) were captured based
on the sampling points. It is clear that population density is highly positively skewed
and may be approximated by a Poisson distribution with its variance proportional to its mean value (Bailey & Gatrell, 1995; Harvey, 2002). A square root transformation
was performed on population density to stabilize its variance. The histogram of
the transformed population density (see Fig. 8b) shows that its distribution is near
normal and its variance is approximately constant. The histogram of impervious sur-
face fraction is slightly negatively skewed, but may be considered approximately nor-
mal. Thus, no transformation was conducted on impervious surface fraction. We
excluded zero population density census blocks because no interpolation is necessary
for these blocks.
Fig. 8. Histogram of (a) population density and (b) square root of population density at sampling points.
It shows that population density may be described by a Poisson distribution, while the square root
transformation is a reasonable approximation of a normal distribution.
Fig. 9. Histogram of impervious surface fraction at sampling points.
In this study, the primary variable u is the square root of population density, and
the secondary variable v is impervious surface fraction. Experimental variograms
and cross-variograms were calculated using Eqs. (3) and (4). Gstat software was utilized
to fit these variograms to theoretical functions (Pebesma & Wesseling, 1998).
The weighted least squares method and visualization were applied in modeling the
experimental variograms (Cressie, 1985). Directional variograms were also computed
and no obvious anisotropies were found. Therefore, the variograms were assumed
to be isotropic and were fitted using an exponential model of the following
form:
$$ \gamma(h) = \begin{cases} C_0 + C_1\{1 - e^{-h/r}\} & \text{for } h > 0 \\ 0 & \text{for } h = 0 \end{cases} \tag{14} $$
Here $C_0$ is the nugget representing unexplained variance and r defines the spatial
scale of the variation. Because the exponential model approaches its sill $C_0 + C_1$
asymptotically, the practical range is taken as 3r, where the variogram reaches
$C_0 + 0.95C_1$. In this study, the parameters were calculated for the variograms of the
square root of population density and impervious surface fraction, and also for their
cross-variogram (see Table 2 and Fig. 10).
After obtaining the variograms of impervious surface fraction, square root of
population density, and their cross-variogram, a block cokriging was performed to
interpolate population density (see Fig. 11) using Gstat software embedded in
Idrisi (Harmon, 2002). Fig. 11 shows a clear geographical pattern of population
distribution in the study region. In particular, few people live in the central business
district (CBD) except group-quarter populations. High-density household-based
populations are adjacent to the CBD in the southern and northwestern portions of the
study region. Moreover, low-density household-based populations reside relatively far
away from the CBD (in the eastern and southern portions).
Table 2
Coefficients of the theoretical variogram and cross-variogram functions
                                         C0       C1       r
Population density                       0.196    0.176    1000
Impervious surface                       0.007    0.0089   1000
Population density–impervious surface    0.012    0.030    1000
Fig. 10. Variograms of (a) square root of population density, (b) residential impervious surface fraction,
and (c) the cross-variogram between square root of population density and impervious surface fraction.
Exponential functions with r = 1000 are chosen to model these variograms.
Fig. 11. Estimated population density using developed cokriging method. The height indicates the value
of population density for each TM pixel. The average population density is 4.28, with a maximum of 52,
and a minimum of 0.
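As a quick plausibility check on the fitted coefficients, the sketch below evaluates the exponential model of Eq. (14) and tests one common sufficient condition for a valid two-variable coregionalization: for each structure (nugget $C_0$ and partial sill $C_1$), the squared cross coefficient should not exceed the product of the two direct coefficients. This check is an assumption about how the positive definite and coregionalized requirements might be verified; the source does not state which test was used, and the function names are hypothetical.

```python
import math

# (C0, C1) pairs taken from Table 2; all three models share r = 1000.
POPULATION = (0.196, 0.176)
IMPERVIOUS = (0.007, 0.0089)
CROSS = (0.012, 0.030)
R = 1000.0


def exponential(h, c0, c1, r):
    """Exponential variogram model of Eq. (14)."""
    return 0.0 if h == 0 else c0 + c1 * (1.0 - math.exp(-h / r))


def is_valid_coregionalization(cu, cv, cuv):
    """True if the 2x2 matrix [[cu, cuv], [cuv, cv]] is positive semi-definite."""
    return cu >= 0 and cv >= 0 and cuv ** 2 <= cu * cv


# One check per structure: index 0 is the nugget, index 1 the partial sill.
checks = [
    is_valid_coregionalization(POPULATION[k], IMPERVIOUS[k], CROSS[k])
    for k in range(2)
]

# Share of the partial sill reached at the practical range h = 3r.
share_at_3r = (exponential(3 * R, *POPULATION, R) - POPULATION[0]) / POPULATION[1]
print(checks, round(share_at_3r, 3))
```

With the Table 2 values both structures satisfy the inequality, and the model reaches about 95% of its partial sill at a lag of 3r.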
5. Accuracy assessment
Using the cokriging variance approach defined in Eq. (11) for the square root of
population density, the mean cokriging variance is 23.5% (minimum of 21.3% and
maximum of 50.3%). Fig. 12 shows the distribution of cokriging variance in the
study area. In particular, cokriging variance is high along the study area boundary
because few samples are used in estimating population density in this portion of
the region.
Fig. 12. Cokriging variance of the square root of population density estimation. The average cokriging
variance is 0.235, with a maximum of 0.503, and a minimum of 0.213.
It is possible to examine population count estimation accuracies at each census
zonal level using the root mean square error ($E_{RMS}$) and coefficient of variation
(V) to evaluate the absolute and relative error as follows:
$$ E_{RMS} = \left[ \frac{1}{n} \sum_{i=1}^{n} (P_i - \hat{P}_i)^2 \right]^{1/2} \tag{15} $$
$$ V = \frac{1}{P} \sum_{i=1}^{n} |P_i - \hat{P}_i| \tag{16} $$
where n is the number of total census zones; P is the total population in the study
area; $P_i$ is the population count of census zone i; and $\hat{P}_i$ is the estimated population
count for zone i. The overall regional assessment of population count estimation
accuracy can be carried out using the relative estimation error (R):
$$ R = (\hat{P} - P)/P \tag{17} $$
where $\hat{P}$ is the total population estimate for the study area.
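The three error measures of Eqs. (15)-(17) are straightforward to compute once zone-level counts and estimates are available. The sketch below is illustrative only: the function names and the three-zone data set are hypothetical, and the total population P is taken as the sum of the zone counts, which assumes the zones partition the study area.

```python
import math


def rmse(actual, estimated):
    """Root mean square error, Eq. (15)."""
    n = len(actual)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(actual, estimated)) / n)


def coefficient_of_variation(actual, estimated):
    """Sum of absolute zone errors divided by total population, Eq. (16)."""
    return sum(abs(p - q) for p, q in zip(actual, estimated)) / sum(actual)


def relative_error(actual, estimated):
    """Relative error of the total population estimate, Eq. (17)."""
    total = sum(actual)
    return (sum(estimated) - total) / total


# Hypothetical census counts and estimates for three zones.
counts = [100, 200, 300]
estimates = [110, 190, 330]

e_rms = rmse(counts, estimates)
v = coefficient_of_variation(counts, estimates)
r = relative_error(counts, estimates)
print(f"E_RMS = {e_rms:.2f}, V = {v:.1%}, R = {r:.1%}")
```

Because R sums signed errors while V sums absolute errors, over- and under-estimates in different zones can cancel in R but not in V, which is why R can be close to zero even when V is sizeable.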
The cokriging method contrasts with the traditional regression approach used to estimate
population density. The first regression model explores the relationship between
population density and the proportion of low and high density residential
areas within a census block (Langford et al., 1991; Lo, 1995; Chen, 2002). Applied
to our study area, the model is as follows (see Table 3):
$$ \hat{P}_i^T = 2.25526 \times R_i^L + 5.0612 \times R_i^H \tag{18} $$
where $R_i^L$ and $R_i^H$ are the proportions of low and high density residential areas in
census block i and $\hat{P}_i^T$ is the expected population density in that census block using
the traditional regression approach.
An alternative regression model investigates the relationship between
population density and impervious surface fraction in low and high density
residential areas within each census block. Applied to our study area, the model is as
follows (see Table 4):
$$ \hat{P}_i^A = 6.5798 \times I_i^L + 9.4650 \times I_i^H \tag{19} $$
where $I_i^L$ and $I_i^H$ are the fractions of impervious surface in low and high density
residential areas in census block i and $\hat{P}_i^A$ is the expected population density in that
census block using this alternative regression approach. In both regression models, the area of
each census block was chosen as a weighting factor to reduce the effects of zone size.
Moreover, intercepts are not included in these regression models because they are
not statistically significant (furthermore, the meaning of an intercept in population
estimation is not clear). The explanatory variables are statistically significant
(p ≤ 0.0001), which shows the
strong correlation between population density and the chosen explanatory variables
(see Tables 3 and 4).
Table 3
Coefficients of the regression model with residential land cover classes as explanatory variables
Coefficients    Value     Std. error    t value    Pr(>|t|)
R^L             2.2552    0.1727        13.0554    0.0000
R^H             5.0612    0.0958        52.8349    0.0000
Table 4
Coefficients of the regression model with residential impervious surface fraction as explanatory variables
Coefficients    Value     Std. error    t value    Pr(>|t|)
I^L             6.5798    0.4335        15.1793    0.0000
I^H             9.4650    0.1687        56.1212    0.0000
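The regression setup described above (two explanatory variables, no intercept, block area as a weighting factor) can be sketched with the closed-form weighted normal equations. This is an assumption about how the weighting enters the fit, since the source does not give the estimation details; the function name and the four synthetic blocks are hypothetical, and the responses are generated exactly from the coefficients of Eq. (19) so that the fit should recover them.

```python
def wls_through_origin(x1, x2, y, w):
    """Weighted least squares fit of y = b1 * x1 + b2 * x2 (no intercept).

    Solves the 2x2 weighted normal equations with Cramer's rule;
    w holds one non-negative weight per observation.
    """
    s11 = sum(wi * a * a for wi, a in zip(w, x1))
    s22 = sum(wi * b * b for wi, b in zip(w, x2))
    s12 = sum(wi * a * b for wi, a, b in zip(w, x1, x2))
    t1 = sum(wi * a * yi for wi, a, yi in zip(w, x1, y))
    t2 = sum(wi * b * yi for wi, b, yi in zip(w, x2, y))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("explanatory variables are collinear")
    return (t1 * s22 - t2 * s12) / det, (t2 * s11 - t1 * s12) / det


# Synthetic census blocks: impervious fractions in low and high density
# residential areas, and block areas used as weights (all hypothetical).
i_low = [0.10, 0.30, 0.05, 0.20]
i_high = [0.40, 0.00, 0.25, 0.10]
area = [1.0, 2.0, 0.5, 1.5]
# Noise-free densities generated from the coefficients of Eq. (19).
density = [6.5798 * lo + 9.4650 * hi for lo, hi in zip(i_low, i_high)]

b_low, b_high = wls_through_origin(i_low, i_high, density, area)
print(round(b_low, 4), round(b_high, 4))
```

With real, noisy data the weights matter: larger blocks pull the fitted coefficients toward their own density relationship, which is the stated purpose of weighting by area.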
Comparative results (see Table 5) show that the cokriging method is the most
accurate. In particular, the coefficient of variation is relatively low at the census
block level (34.7%), low at the block group and tract levels (15.2% and 10.2% respectively),
and the relative estimation error is near zero for the entire study area (−0.3%). The
estimation accuracies of the two regression models are reported in Table 5 as well.
Neither regression model performs as well as the cokriging method in terms of estimation
accuracy. As an example, the coefficients of variation for the census tract level in the
regression models are 22.9% and 21.0% respectively, substantially higher than the variation
obtained using cokriging (10.2%). Comparing the two regression models, regression
with impervious surface fraction is slightly better than with land use classes (e.g.
21.0% vs. 22.9% estimate error at the census tract level). This result is consistent with
the literature showing that impervious surface fraction performs better than land
use/cover in urban analysis (Ji & Jensen, 1999).
Table 5
Absolute and relative estimation errors of the cokriging and regression models

| Zones | Average population | Cokriging ERMS | Cokriging V | Regression with land cover ERMS | Regression with land cover V | Regression with impervious surface ERMS | Regression with impervious surface V |
|---|---|---|---|---|---|---|---|
| Block (2445) | 40.99 | 45.3 | 34.7% | 47.9 | 48.8% | 45.5 | 46.6% |
| Block group (125) | 801.74 | 215.0 | 15.2% | 325.6 | 27.8% | 290.7 | 25.2% |
| Tract (36) | 2825.84 | 411.0 | 10.2% | 967.6 | 22.9% | 846.0 | 21.0% |
| Total study area | 100,200 | | −0.3% | | 1.0% | | 2.6% |
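To make the kind of comparison reported in Table 5 concrete, the sketch below aggregates pixel-level estimates to reporting zones and computes a root-mean-square error and a signed relative error for the whole area. It is illustrative only: the toy arrays and function names are hypothetical, the sign convention (estimate minus census count) is an assumption, and the paper's own definition of the coefficient of variation V is given elsewhere in the paper and is not reproduced here.

```python
import numpy as np

def zone_totals(pixel_values, zone_ids, n_zones):
    """Sum pixel-level estimates into their reporting zones."""
    return np.bincount(zone_ids, weights=pixel_values, minlength=n_zones)

def rmse(estimated, observed):
    """Root-mean-square error between estimated and observed zone counts."""
    diff = np.asarray(estimated, dtype=float) - np.asarray(observed, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def total_relative_error(estimated, observed):
    """Signed relative error of the area-wide total (assumed convention)."""
    return float((np.sum(estimated) - np.sum(observed)) / np.sum(observed))

# Hypothetical toy data: six pixels in three zones (NOT the study data).
pixel_est = np.array([10.0, 12.0, 8.0, 30.0, 25.0, 5.0])
zone_ids = np.array([0, 0, 1, 1, 2, 2])
census = np.array([20.0, 40.0, 33.0])

est = zone_totals(pixel_est, zone_ids, n_zones=3)
zone_rmse = rmse(est, census)
area_error = total_relative_error(est, census)
```

The same functions can be applied at each level of the census hierarchy by swapping in block, block group or tract identifiers for `zone_ids`.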
6. Population density adjustment
The cokriging approach gives unbiased estimates for the square root of population
density with minimum variance. However, the population count estimation errors
evaluated at the census block level are still somewhat large (34.7%). As discussed
in previous studies (Langford & Unwin, 1994; Fisher & Langford, 1995; Martin,
1996), interpolation methods should preserve population counts in each reporting
zone. One option is adding a volume-preserving constraint in the cokriging model.
However, this will make the model more complex since it has a quadratic objective
function and a quadratic regional constraint. In fact, it is not clear that this resulting
model can be solved, exactly or heuristically. An alternative option is to rescale the
population estimates on every pixel to satisfy this zonal constraint:
$$P'_{ij} = \hat{P}_{ij} \times \frac{P_i}{\hat{P}_i} \qquad (20)$$

Here $P'_{ij}$ is the rescaled population estimate of pixel $j$ in census block $i$, $\hat{P}_{ij}$ is the population estimate obtained through cokriging, and $P_i$ and $\hat{P}_i$ are the population counts of block $i$ (census count and cokriging estimate, respectively). This rescaled population density (see Fig. 13) generally maintains the estimates obtained using cokriging, but emphasizes local variation as well. For example, the cokriging method tends to underestimate population counts in multi-story and high-rise buildings (the middle portion of Fig. 11). In contrast, the rescaling approach adjusts these inaccuracies and obtains more accurate population density estimates.
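The rescaling in Eq. (20) amounts to multiplying every pixel estimate by the ratio of the census count to the estimated count of its block. A minimal sketch, assuming pixel estimates and integer block identifiers are stored in flat arrays (the function name and toy values are hypothetical), is:

```python
import numpy as np

def rescale_to_zone_counts(pixel_est, zone_ids, census_counts):
    """Rescale pixel estimates so each zone sums to its census count.

    Implements the ratio adjustment of Eq. (20): each pixel estimate is
    multiplied by (census count) / (estimated count) of its zone.
    """
    pixel_est = np.asarray(pixel_est, dtype=float)
    zone_ids = np.asarray(zone_ids)
    census_counts = np.asarray(census_counts, dtype=float)

    est_totals = np.bincount(zone_ids, weights=pixel_est,
                             minlength=census_counts.size)
    # Assumption: zones whose estimated total is zero get a factor of 0,
    # because a ratio is undefined there; their counts cannot be preserved.
    ratio = np.divide(census_counts, est_totals,
                      out=np.zeros_like(census_counts),
                      where=est_totals > 0)
    return pixel_est * ratio[zone_ids]

# Hypothetical toy data: six pixels in three blocks (NOT the study data).
pixel_est = np.array([10.0, 12.0, 8.0, 30.0, 25.0, 5.0])
zone_ids = np.array([0, 0, 1, 1, 2, 2])
census = np.array([20.0, 40.0, 33.0])

rescaled = rescale_to_zone_counts(pixel_est, zone_ids, census)
rescaled_totals = np.bincount(zone_ids, weights=rescaled, minlength=3)
```

Within a block the adjustment keeps the relative pattern of the cokriging estimates and changes only their overall level, which is why block totals match the census counts while pixel-to-pixel variation is retained.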
Fig. 13. Adjusted population density that preserves zonal population counts. The height indicates the
value of population density for each TM pixel. The average population density is 4.40, with a maximum of
143, and a minimum of 0.
7. Conclusion
In this paper a cokriging method was developed for interpolating residential population density using census count data and impervious surface fraction. The results are clearly better than regression-based interpolation approaches. In particular, the relative population estimation error for the entire study area is −0.3%, which is better than the results obtained using regression methods (1.0%–2.6% estimation error). Moreover, the estimation errors at the census block group and tract levels (15.2% and 10.2% respectively) are about 10 percentage points lower than those calculated using regression models (about 25–27% and 21–23% respectively). At the census block level, the estimation error is about 13–15 percentage points lower than those reported for the regression models (see Table 5). These results demonstrate that cokriging applied to residential impervious surface fraction is a superior alternative to traditional regression-based interpolation approaches using land use and land cover data.
One reason explaining why cokriging performs well is that it addresses spatial
autocorrelation and cross-autocorrelation associated with the distribution of people
in urban areas. Instead of ignoring spatial dependence, it models the spatial autocor-
relation of population and impervious surface fraction through variograms, and ap-
plies them in population interpolation. Moreover, unlike other interpolation
methods, it provides estimation variance (see Eq. (11) and Fig. 12) at the TM pixel
level (30 m by 30 m). This estimation variance is an important tool for assessing population estimation error, without aggregating to census reporting zones.
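As an illustration of how spatial cross-correlation between two co-located variables can be summarised, the sketch below computes an empirical cross-variogram, i.e. half the mean product of paired increments of the two variables within each distance bin. It is a generic textbook estimator under simple assumptions (isotropy, every pair of points compared), not the variogram fitting procedure used in the study; the function name, bin edges and toy values are hypothetical.

```python
import numpy as np

def empirical_cross_variogram(coords, z, y, bin_edges):
    """Empirical cross-variogram of two variables observed at the same sites.

    For each distance bin, returns 0.5 * mean[(z_i - z_j) * (y_i - y_j)]
    over all point pairs whose separation falls in that bin.
    Bins without any pair are returned as NaN.
    """
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(z, dtype=float)
    y = np.asarray(y, dtype=float)

    i, j = np.triu_indices(len(z), k=1)            # every unordered pair once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    prod = (z[i] - z[j]) * (y[i] - y[j])

    which = np.digitize(dist, bin_edges) - 1       # bin index per pair
    n_bins = len(bin_edges) - 1
    gamma = np.full(n_bins, np.nan)
    for b in range(n_bins):
        mask = which == b
        if mask.any():
            gamma[b] = 0.5 * prod[mask].mean()
    return gamma

# Hypothetical toy data: four sites on a line (NOT the study data).
coords = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
z = np.array([1.0, 2.0, 4.0, 7.0])
gamma = empirical_cross_variogram(coords, z, z, bin_edges=[0.5, 1.5, 2.5])
```

Passing the same variable twice, as in the toy call, reduces the estimator to the ordinary semivariogram, which is a convenient sanity check before applying it to two different variables.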
Another interesting aspect of this work is that residential impervious surface frac-
tion was found to be an effective replacement for land use and land cover data typ-
ically used in modeling population density. This makes sense intuitively given that
impervious surface fraction is closely related to housing development, and thus pop-
ulation density. Moreover, the cross-variogram (see Fig. 10c and Table 2) clearly
shows that population density and impervious surface fraction are co-regionalized
variables, with only 25% variance unexplained. Also, regression analyses show that the regression model with impervious surface fraction consistently performs better
than the other utilizing land use classes.
A final point is that the obtained population estimates are essential for urban
planning applications. As an example, in sustainability studies, residential popula-
tion density is a primary indicator of automobile dependent regions (Harris & Long-
ley, 2000). In addition, the estimates of population density may be utilized in
transportation analyses. The traffic analysis zone (TAZ) is typically used as a basic
unit in traffic demand estimation and trip generation. However, there are significant problems with traditional TAZ definitions as well as difficulties with associated travel distance calculation (Daganzo, 1980; Miller, 1999). Detailed population information may be helpful in redefining TAZs in order to achieve more homogeneous population densities and socio-economic characteristics, thus potentially eliminating the modifiable areal unit problem in a range of transportation analysis approaches.
While the developed approach is a considerable improvement for estimating pop-
ulation density at a fine scale, there are potential improvements that may be worth
exploring. One improvement would be satisfying the volume preserving constraint
during the interpolation process, requiring that interpolated population counts in
every census zone be equal to observed counts. In this study, we satisfied this con-
straint by rescaling population density in every pixel after interpolation. Although
the population counts in every census zone are maintained, this adjustment may introduce bias and increase estimation variance. More sophisticated models might
increase population density estimation accuracy and maintain population counts
in every census zone simultaneously.
References
Aboufirassi, M., & Marino, M. A. (1984). Cokriging of aquifer transmissivities from field measurements of
transmissivity and specific capacity. Mathematical Geology, 16(1), 19–35.
Atkinson, P. M., Webster, R., & Curran, P. J. (1992). Cokriging with ground-based radiometry. Remote
Sensing of Environment, 41, 45–60.
Atkinson, P. M., Webster, R., & Curran, P. J. (1994). Cokriging with airborne MSS imagery. Remote
Sensing of Environment, 50, 335–345.
Bailey, T., & Gatrell, A. C. (1995). Chapter 7: The analysis of area data. Interactive Spatial Data Analysis,
Longman Group Limited.
Beguin, H., Thomas, I., & Vandenbussche, D. (1992). Weight variation with a set of demand points, and
location–allocation issues: A case study of public libraries. Environment and Planning A, 24, 1769–1779.
Bracken, I. (1993). An extensive surface model database for population related information: Concept and
application. Environment and Planning B, 20, 13–27.
Chen, K. (2002). An approach to linking remotely sensed data and areal census data. International Journal
of Remote Sensing, 23, 37–48.
Cressie, N. (1985). Fitting variogram models by weighted least squares. Mathematical Geology, 17,
563–586.
Cressie, N. (1993). Statistics for spatial data (revised edition). New York: Wiley.
Curran, P. J. (1988). The semivariogram in remote sensing: An introduction. Remote Sensing of
Environment, 24, 493–507.
Curran, P. J. (2001). Remote sensing: Using the spatial domain. Environmental and Ecological Statistics, 8,
331–344.
Curran, P. J., & Atkinson, P. M. (1998). Geostatistics and remote sensing. Progress in Physical Geography,
22(1), 61–78.
Daganzo, C. F. (1980). Network representation, continuum approximations and a solution to the spatial
aggregation problem of traffic assignment. Transportation Research, 14B, 229–239.
ERDAS Imagine (1997). ERDAS Imagine tour guides (4th ed.). Atlanta, Georgia: ERDAS, Inc.
Fisher, P. F., & Langford, M. (1995). Modeling the errors in areal interpolation between zonal systems by
Monte Carlo simulation. Environment and Planning A, 27, 211–224.
Fotheringham, A. S., & Wong, D. W. S. (1991). The modifiable areal unit problem in multivariate
statistical analysis. Environment and Planning A, 23, 1025–1034.
Franklin County Auditor (2002). Franklin county auditor's interactive geographic information system.
<http://209.51.193.83/search.html>.
Goodchild, M. F., Anselin, L., & Deichmann, U. (1993). A framework for the areal interpolation of
socioeconomic data. Environment and Planning A, 25, 383–397.
Green, A. A., Berman, M., Switzer, P., & Craig, M. D. (1988). A transformation for ordering multispectral
data in terms of image quality with implications for noise removal. IEEE Transactions on Geoscience
and Remote Sensing, 26, 65–74.
Griffith, D. A. (1993). Spatial regression analysis on the PC: Spatial statistics using SAS. Washington, DC:
Association of American Geographers.
Griffith, D. A., & Can, A. (1996). Spatial statistical/econometric version of simple urban population
density models. In S. L. Arlinghaus & D. A. Griffith (Eds.), Practical handbook of spatial statistics.
CRC Press.
Harmon, D. (2002). Quick Take Reviews: Idrisi32 Release 2. GEOWorld, March, pp. 50–51.
Harris, R. J., & Longley, P. A. (2000). New data and approaches for urban analysis: Modeling residential
densities. Transactions in GIS, 4(3), 217–234.
Harvey, J. T. (2002). Estimating census district populations from satellite imagery: Some approaches and
limitations. International Journal of Remote Sensing, 23(10), 2071–2095.
Jensen, J. R. (1983). Biophysical remote sensing. Annals of the Association of American Geographers, 73,
111–132.
Ji, M., & Jensen, J. R. (1999). Effectiveness of subpixel analysis in detecting and quantifying urban
imperviousness from Landsat Thematic Mapper imagery. Geocarto International, 14(4), 31–39.
Journel, A. G., & Huijbregts, C. J. (1978). Mining geostatistics. New York: Academic Press.
Lam, N. S. (1983). Spatial interpolation methods: A review. American Cartographer, 10(2), 129–149.
Langford, M., Maguire, D. J., & Unwin, D. J. (1991). The areal interpolation problem: Estimating
population using remote sensing in a GIS framework. In I. Masser & M. Blakemore (Eds.), Handling
geographical information: Methodology and potential applications (pp. 55–77). Harlow, Essex:
Longman.
Langford, M., & Unwin, D. J. (1994). Generating and mapping population density surfaces within a
geographical information system. Cartographic Journal, 31, 21–26.
Lee, J. B., Woodyatt, A. S., & Berman, M. (1990). Enhancement of high spectral resolution remote sensing
data by a noise-adjusted principal components transformation. IEEE Transactions on Geoscience and
Remote Sensing, 28, 295–304.
Lo, C. P. (1995). Automated population and dwelling unit estimation from high-resolution satellite
images: A GIS approach. International Journal of Remote Sensing, 16(1), 17–34.
Longley, P., & Clarke, G. (1995). GIS for business and service planning. Cambridge: GeoInformation
International.
Martin, D. (1989). Mapping population data from zone centroid locations. Transactions—Institute of
British Geographers, 14, 90–97.
Martin, D. (1996). An assessment of surface and zonal models of population. International Journal of
Geographical Information Systems, 10(8), 973–989.
Martin, D., & Williams, H. C. W. L. (1992). Market-area analysis and accessibility to primary health-care
centers. Environment and Planning A, 24, 1009–1019.
McBratney, A. B., & Webster, R. (1983). Optimal interpolation and isarithmic mapping of soil properties:
5. Co-regionalization and multiple sampling strategy. Journal of Soil Science, 34(1), 137–162.
McBratney, A. B., & Webster, R. (1986). Choosing functions for semi-variograms of soil properties and
fitting them to sampling estimates. Journal of Soil Science, 37, 617–639.
Mesev, V. (1998). The use of census data in urban image classification. Photogrammetric Engineering and
Remote Sensing, 64, 431–438.
Mid-Ohio Regional Planning Commission (MORPC) (2002). GIS technology. <http://www.morpc-
soft.org/GIS/gis.htm>.
Miller, H. J. (1999). Potential contributions of spatial analysis to geographical information systems for
transportation (GIS-T). Geographical Analysis, 31(4), 373–399.
Moon, Z. K., & Farmer, F. L. (2001). Population density surface: A new approach to an old problem.
Society and Natural Resources, 14, 39–49.
Multi-Resolution Land Characteristics Consortium (MRLC) (2002). National land cover data (NLCD).
<http://www.epa.gov/mrlc/nlcd.html>.
Myers, D. E. (1982). Matrix formulation of co-kriging. Mathematical Geology, 14(3), 250–257.
Ohio Geographically Referenced Information Program (OGRIP) (1999). Digital orthophoto quarter-
quadrangles. <ftp.geodata.gis.state.oh.us/geodata/doqq>.
Okabe, A., & Sadahiro, Y. (1997). Variation in count data transferred from a set of irregular zones to a set
of regular zones through the point-in-polygon method. International Journal of Geographical
Information Science, 11(1), 93–106.
Oliver, M., Webster, R., & Gerrard, J. (1989a). Geostatistics in physical geography, Part I: Theory.
Transactions—Institute of British Geographers, 14, 259–269.
Oliver, M., Webster, R., & Gerrard, J. (1989b). Geostatistics in physical geography, Part II: Applications.
Transactions—Institute of British Geographers, 14, 270–286.
Openshaw, S. (1977). Optimal zoning systems for spatial interaction models. Environment and Planning A,
9, 169–184.
Pebesma, E. J., & Wesseling, C. G. (1998). Gstat: A program for geostatistical modeling, prediction and
simulation. Computers and Geosciences, 24(1), 17–31.
Phinn, S., Stanford, M., Scarth, P., Murray, A. T., & Shyy, T. (2002). Monitoring the composition and
form of urban environments based on the vegetation–impervious surface–soil (VIS) model by sub-pixel
analysis techniques. International Journal of Remote Sensing, 23, 4131–4153.
Plane, D. A., & Rogerson, P. A. (1994). The geographical analysis of population with applications to
business and planning. New York: Wiley.
Rashed, T., Weeks, J. R., Gadalla, M. S., & Hill, A. G. (2001). Revealing the anatomy of cities through
spectral mixture analysis of multispectral satellite imagery: A case study of the Greater Cairo region,
Egypt. Geocarto International, 16(4), 5–15.
Ridd, M. K. (1995). Exploring a V–I–S (vegetation–impervious surface–soil) model for urban ecosystem
analysis through remote sensing: Comparative anatomy for cities. International Journal of Remote
Sensing, 16, 2165–2185.
Sadahiro, Y. (1999). Accuracy of areal interpolation: A comparison of alternative methods. Journal of
Geographical Systems, 1, 323–346.
Tobler, W. (1999). Linear pycnophylactic reallocation—comment on a paper by D. Martin. International
Journal of Geographical Information Science, 13(1), 85–90.
United States Census Bureau (2002). United States Census 2000. <http://www.census.gov/main/www/
cen2000.html>.
Vauclin, M., Vieira, S. R., Vachaud, G., & Nielsen, D. R. (1983). The use of cokriging with limited field
soil observations. Soil Science Society of America Journal, 47(2), 175–184.
Webster, R. (1985). Quantitative spatial analysis of soil in the field. Advances in Soil Science, 3, 1–70.
Webster, R., & Burgess, T. M. (1980). Optimal interpolation and isarithmic mapping of soil properties, III
changing drift and universal kriging. Journal of Soil Science, 31, 505–524.
Woodcock, C. E., Strahler, A. H., & Jupp, D. L. B. (1988). The use of variograms in remote sensing: I.
Scene models and simulated images. Remote Sensing of Environment, 25, 323–348.
Wu, C., & Murray, A. T. (2003). Estimating impervious surface distribution by spectral mixture analysis.
Remote Sensing of Environment, 84, 493–505.