
Computers, Environment and Urban Systems

29 (2005) 558–579

www.elsevier.com/locate/compenvurbsys

A cokriging method for estimating population density in urban areas

Changshan Wu a,*, Alan T. Murray b,1

a Department of Geography, University of Wisconsin-Milwaukee, P.O. Box 413, Milwaukee, WI 53201-0413, USA

b Department of Geography, The Ohio State University, Columbus, OH 43210-1361, USA

Abstract

Population information is typically available for analysis in aggregate socioeconomic reporting zones, such as census blocks in the United States and enumeration districts in the United Kingdom. However, such data mask underlying individual population distributions and may be incompatible with other information sources (e.g. school districts, transportation analysis zones, metropolitan statistical areas, etc.). Moreover, it is well known that there are potentially significant issues associated with scale and reporting units, known as the modifiable areal unit problem (MAUP), when such data are used in analysis. This may lead to biased results in spatial modeling approaches. In this study, impervious surface fraction derived from Thematic Mapper (TM) imagery was applied to derive the underlying population of an urban region. A cokriging method was developed to interpolate population density by modeling the spatial correlation and cross-correlation of population and impervious surface fraction. Results suggest that population density can be accurately estimated using cokriging applied to impervious surface fraction. In particular, the relative population estimation error is −0.3% for the entire study area and 10–15% at the block group and tract levels. Moreover, unlike other interpolation methods, cokriging gives the estimation variance at the TM pixel level.

© 2005 Elsevier Ltd. All rights reserved.

0198-9715/$ - see front matter © 2005 Elsevier Ltd. All rights reserved.

doi:10.1016/j.compenvurbsys.2005.01.006

* Corresponding author. Tel.: +1 414 2294860; fax: +1 414 2293981.

E-mail addresses: [email protected] (C. Wu), [email protected] (A.T. Murray).

1 Tel.: +1 614 688 5441; fax: +1 614 292 6213.



Keywords: Population interpolation; Cokriging; Remote sensing

1. Introduction

The difficulties associated with the application of zone-based census population data in geographical analyses have been well documented in previous studies (Fotheringham & Wong, 1991; Martin, 1989, 1996). One important issue is data aggregation. In many applications, census data cannot sufficiently represent the underlying geographical distribution of population because they are reported by aggregating individual population counts into irregular areal units, which can be geographically meaningless. This aggregation tends to smooth local variability and requires an assumption of uniformly distributed population within a reporting unit (Moon & Farmer, 2001). While there are legitimate reasons for reporting census information in this way (e.g. the privacy of census respondents), business and service planning benefit substantially from higher-resolution population data (Longley & Clarke, 1995). For example, Martin and Williams (1992) and Beguin, Thomas, and Vandenbussche (1992) emphasized the importance of detailed population information in the location analyses of health-care centers and public libraries. Moreover, in urban sustainability studies, Harris and Longley (2000) point out that census-based models tend to overestimate residential area because of their coarse resolution.

Another difficulty with zone-based population data is related to incompatible spatial information layers (Bracken, 1993; Goodchild, Anselin, & Deichmann, 1993). Different departments and agencies collect and distribute data in varying zonal arrangements (e.g. school districts, transportation analysis zones, metropolitan statistical areas, etc.). As a consequence, a significant problem arises in regional analysis and modeling, in which multiple data sources must be integrated before analysis can be implemented (Goodchild et al., 1993). Moreover, the boundaries of areal units in census data are not data derived, but rather are the result of enumeration and reporting. The modifiable areal unit problem (MAUP) may therefore arise when such data are used in geographical applications. In particular, the relationship between variables may only be valid for one particular zonal arrangement and scale, potentially biasing results obtained in statistical and spatial analyses (Martin, 1996; Openshaw, 1977).

One approach for dealing with the above problems is to transform aggregated census data to grid-based population estimates using areal interpolation (Langford, Maguire, & Unwin, 1991; Martin, 1989; Okabe & Sadahiro, 1997). Areal interpolation methods may be grouped into two categories: simple interpolation and intelligent interpolation (Okabe & Sadahiro, 1997). Simple interpolation involves transferring data from irregular polygons to regular grids without any supplementary data (Lam, 1983; Martin, 1996; Tobler, 1999). This method is preferred when fast computation is important or additional information is unavailable (Okabe & Sadahiro, 1997). In contrast, intelligent interpolation transfers data with the help of additional information (Harris & Longley, 2000; Langford et al., 1991). This method has proven more accurate than simple interpolation, although greater computational processing is required (Fisher & Langford, 1995; Sadahiro, 1999). Regression analyses supplemented with land use and land cover data are often applied in intelligent interpolation (Langford et al., 1991; Langford & Unwin, 1994). However, detailed biophysical information is usually lost in producing land use data from remotely sensed images (Jensen, 1983). As a result, the limited set of land use types is too coarse for estimating detailed population density. Moreover, the basic assumptions of regression analyses (e.g. spatial independence) are unlikely to be satisfied in geographical applications (Griffith & Can, 1996). Impervious surface fraction in residential areas may be useful for supplementing the interpolation process: it maintains detailed information on residential areas and thus provides clues about population distribution (Ji & Jensen, 1999). Spatial autocorrelation in impervious surface fraction and population, and the cross-correlation between these two spatial variables, are explored and modeled in this paper using geostatistical techniques. Based on the modeled spatial relationships, cokriging is applied to determine population density in Columbus, OH.

The organization of this paper is as follows. Our study area and data sources are described in Section 2. The process of deriving impervious surface fraction in residential areas from remotely sensed imagery is described in Section 3. In particular, we detail the creation of impervious surface fraction from ETM+ imagery for the entire study region and describe a procedure for delineating residential areas within this region. Population density estimation using cokriging combined with residential impervious surface fraction is reported in Section 4. Accuracy assessment of the population estimates is addressed in Section 5. Section 6 reports an adjustment of the population estimates. Finally, conclusions and discussion are provided in Section 7.

2. Study area and data sources

A portion of the Columbus metropolitan area in Franklin County, OH, USA was chosen as the study region for this research. This region covers 47.4 km² and is divided into 36 tracts, 125 block groups, and 2445 blocks in the 2000 US Census (see Fig. 1). The 2000 Census data were acquired from the ESRI website in shapefile format (United States Census Bureau, 2002). Landsat 7 ETM+ imagery, which was utilized to derive residential impervious surface fraction, was acquired on July 8, 1999. Additional data, such as Digital Orthophoto Quarter Quadrangles (DOQQs) from the Ohio Geographically Referenced Information Program (OGRIP, 1999) and National Land Cover Data (NLCD) from the Multi-Resolution Land Characteristics Consortium (Multi-Resolution Land Characteristics Consortium, 2002), were utilized to examine residential classification accuracy and to select training samples. Moreover, parcel data from the Franklin County Auditor (2002) and address-based employment data from the Mid-Ohio Regional Planning Commission (MORPC, 2002) were utilized to identify possibly misclassified pixels, since these data maintain detailed local information about land use and employment.

Fig. 1. Study area as part of the Columbus metropolitan area in Franklin County, OH, USA (left) and Landsat ETM+ image acquired on July 8, 1999 for this area (right).

3. Estimating impervious surface fraction in residential areas

Impervious surface is any material that prevents the infiltration of water into soil. As a major component of urban infrastructure, impervious surface has become a primary variable in urban planning and environmental management (Ji & Jensen, 1999; Ridd, 1995). Impervious surface fraction, calculated as the proportion of impervious surface over a small area, has been found to reveal more information about built-up areas than land use and land cover classification (Ji & Jensen, 1999). For population estimation, for example, impervious surface in residential areas generally corresponds to housing, which serves as an indicator of people.

3.1. Impervious surface fraction estimation

Methods for quantifying impervious surface from remotely sensed data are typically based on either fuzzy classification or spectral mixture analysis (Ji & Jensen, 1999; Phinn, Stanford, Scarth, Murray, & Shyy, 2002; Rashed, Weeks, Gadalla, & Hill, 2001). In this study, a spectral mixture analysis method was applied to estimate impervious surface fraction from an ETM+ image (Wu & Murray, 2003). Four endmembers (see Fig. 2), vegetation, high albedo, low albedo, and soil, were selected to represent heterogeneous urban land use and land cover by analyzing the spectral feature spaces of the ETM+ image after a maximum noise fraction (MNF) transformation, the details of which are given in Green, Berman, Switzer, and Craig (1988) and Lee, Woodyatt, and Berman (1990). Subsequently, a fully constrained four-endmember linear mixing model was applied to calculate each endmember fraction from the Landsat ETM+ data (see Fig. 3). Finally, impervious surface fraction in each ETM+ pixel was modeled by adding the fractions of the low albedo and high albedo endmembers after removing the effects of water and clouds (see Fig. 4).
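The unmixing step can be illustrated with a small sketch. It assumes hypothetical six-band endmember spectra (not those of Fig. 2) and solves the sum-to-one constrained least-squares problem through its KKT system. Non-negativity is handled by a simple heuristic that drops the most negative endmember and re-solves; this approximates, but is not guaranteed to match, a fully constrained solver or the procedure of Wu and Murray (2003). Impervious surface fraction is then the sum of the high and low albedo fractions, as described above.

```python
from typing import Dict, List, Sequence

# Hypothetical six-band reflectance spectra, for illustration only (not the Fig. 2 endmembers).
ENDMEMBERS: Dict[str, List[float]] = {
    "vegetation":  [0.03, 0.05, 0.04, 0.45, 0.25, 0.12],
    "high_albedo": [0.30, 0.32, 0.35, 0.38, 0.40, 0.36],
    "low_albedo":  [0.05, 0.05, 0.05, 0.06, 0.05, 0.04],
    "soil":        [0.10, 0.14, 0.18, 0.24, 0.32, 0.28],
}


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def sum_to_one_ls(spectra: List[List[float]], pixel: Sequence[float]) -> List[float]:
    """Least-squares fractions with sum(f) = 1, via the KKT system [[E^T E, 1], [1^T, 0]]."""
    k, nb = len(spectra), len(pixel)
    kkt = [[sum(spectra[i][b] * spectra[j][b] for b in range(nb)) for j in range(k)] + [1.0]
           for i in range(k)]
    kkt.append([1.0] * k + [0.0])
    rhs = [sum(spectra[i][b] * pixel[b] for b in range(nb)) for i in range(k)] + [1.0]
    return solve_linear(kkt, rhs)[:k]


def unmix(pixel: Sequence[float]) -> Dict[str, float]:
    """Sum-to-one unmixing; the most negative endmember is dropped and the rest re-solved."""
    active = list(ENDMEMBERS)
    while True:
        fractions = sum_to_one_ls([ENDMEMBERS[name] for name in active], pixel)
        worst = min(range(len(active)), key=fractions.__getitem__)
        if fractions[worst] >= 0.0:
            break
        del active[worst]  # a single remaining endmember always gets fraction 1
    result = {name: 0.0 for name in ENDMEMBERS}
    result.update(zip(active, fractions))
    return result


def impervious_fraction(pixel: Sequence[float]) -> float:
    """Impervious surface fraction = high albedo fraction + low albedo fraction."""
    f = unmix(pixel)
    return f["high_albedo"] + f["low_albedo"]


# A noise-free synthetic pixel: 20% vegetation, 30% high albedo, 40% low albedo, 10% soil.
mixed = [0.2 * v + 0.3 * h + 0.4 * l + 0.1 * s for v, h, l, s in zip(*ENDMEMBERS.values())]
print(round(impervious_fraction(mixed), 6))  # 0.7
```

Water and cloud masking, which the study applies before summing the two albedo fractions, is omitted from this sketch.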

3.2. Residential area classification

To this point we have detailed impervious surface fraction estimation for the entire study area. However, population (the major interest in this research) is generally restricted to residential areas. Therefore, it is necessary to identify residential land use within the study area. A maximum likelihood classification was applied to delineate residential pixels. Similar approaches have been utilized in classifying residential land uses by Lo (1995), Mesev (1998), and Chen (2002). Six classes, vegetation, soil, water, commercial and transportation, low density residential, and high density residential, were specified in selecting training samples with the help of DOQQ data, NLCD data, and the original ETM+ image. The classification (see Fig. 5) was conducted using a maximum likelihood classifier provided in ERDAS Imagine 8.4 (ERDAS Imagine, 1997). After deriving this image, we grouped the six classes into two major classes: residential and non-residential.

Fig. 2. ETM+ reflectance spectra of selected endmembers. These endmembers were chosen by analyzing the spectral feature spaces of the MNF transformed ETM+ image.

Fig. 3. Endmember fraction images calculated through a fully constrained four-endmember linear mixing model: (a) vegetation fraction image; (b) high albedo fraction image; (c) low albedo fraction image (including water); (d) soil fraction image.

Fig. 4. Impervious surface fraction image calculated by adding low albedo and high albedo endmember fractions after removing the effects of water and clouds.

Fig. 5. A maximum likelihood classification of the ETM+ image for the Columbus metropolitan area.

Fig. 6. Residential land use classification after the maximum likelihood classification and post-processing.

Table 1

Residential land use classification accuracy assessment

| Classified image | Reference: Residential | Reference: Non-residential | Commission error (%) |
|---|---|---|---|
| Residential | 146 | 15 | 9.32 |
| Non-residential | 25 | 214 | 10.46 |
| Omission error (%) | 14.62 | 6.55 | |

Overall accuracy = 90.00%, overall kappa statistic = 0.7942.

Since we are estimating detailed population density, residential classification accuracy is essential in this research. Therefore, we performed post-processing to identify possibly misclassified pixels. In particular, pixels within zero-population census blocks should obviously not be classified as residential land use; such pixels were identified and reclassified as non-residential. Conversely, pixels within high population density census blocks were also subject to further scrutiny: if these pixels are not classified as residential, they are possibly misclassified and require further analysis. In this study, we utilized parcel and employment data to identify potentially misclassified pixels. Group-quarter populations, i.e. people in institutions, shelters, and nursing homes, and students in university dormitories (Plane & Rogerson, 1994), were typically found in these misclassified pixels. Such areas are difficult to classify using only remotely sensed data because their spectral signatures are similar to those of commercial land uses. With the help of parcel and employment data, we were able to identify these pixels and reclassify them as residential areas.
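The two post-processing rules can be expressed as a short sketch. The high-density threshold and the `ancillary_residential` flag (standing in for whatever evidence the parcel and employment data provide) are hypothetical; the paper does not state how these data were encoded or what density counted as "high".

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Pixel:
    block_id: str
    residential: bool            # label after grouping the maximum likelihood classes
    ancillary_residential: bool  # hypothetical: parcel/employment data indicate residential use


def post_process(pixels: List[Pixel], block_density: Dict[str, float],
                 high_density: float) -> List[int]:
    """Apply both rules in place and return the indices of pixels that were scrutinized."""
    scrutinized = []
    for idx, px in enumerate(pixels):
        density = block_density[px.block_id]
        if density == 0 and px.residential:
            # Rule 1: a zero-population block cannot contain residential pixels.
            px.residential = False
        elif density >= high_density and not px.residential:
            # Rule 2: a non-residential pixel in a dense block is suspect (e.g. group
            # quarters whose spectra resemble commercial land use); check ancillary data.
            scrutinized.append(idx)
            if px.ancillary_residential:
                px.residential = True
    return scrutinized


pixels = [Pixel("A", True, False), Pixel("B", False, True), Pixel("B", False, False)]
print(post_process(pixels, {"A": 0.0, "B": 50.0}, high_density=20.0))  # [1, 2]
print([px.residential for px in pixels])                              # [False, True, False]
```

Only pixels flagged by rule 2 and confirmed by the ancillary data are relabelled; a flagged pixel without such confirmation keeps its original class.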

The classification accuracy of residential land use after the maximum likelihood classification and post-processing (see Fig. 6) was examined using 400 stratified randomly selected samples. DOQQ images acquired between 1994 and 1995 were used in this study for ground truthing. These DOQQs were co-registered with the ETM+ image, and a 3 by 3 sampling unit was adopted to reduce the effect of geometric (registration) errors. The overall classification accuracy is 90% and the overall kappa coefficient is 0.7942 (see Table 1).
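The accuracy figures in Table 1 follow directly from the confusion matrix. The sketch below, with an illustrative function name, recomputes overall accuracy, Cohen's kappa, and the commission and omission errors from the four counts.

```python
from typing import List, Sequence, Tuple


def accuracy_report(matrix: Sequence[Sequence[int]]) -> Tuple[float, float, List[float], List[float]]:
    """Overall accuracy, Cohen's kappa, commission and omission errors.

    matrix[i][j] counts samples classified as class i whose reference class is j.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_tot = [sum(row) for row in matrix]                              # classified totals
    col_tot = [sum(matrix[i][j] for i in range(k)) for j in range(k)]   # reference totals
    observed = sum(matrix[i][i] for i in range(k)) / n
    expected = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2  # chance agreement
    kappa = (observed - expected) / (1.0 - expected)
    commission = [1.0 - matrix[i][i] / row_tot[i] for i in range(k)]    # wrong among classified
    omission = [1.0 - matrix[j][j] / col_tot[j] for j in range(k)]      # missed among reference
    return observed, kappa, commission, omission


# Table 1 counts: rows = classified (residential, non-residential), columns = reference.
overall, kappa, commission, omission = accuracy_report([[146, 15], [25, 214]])
print(f"{overall:.2%} {kappa:.4f}")  # 90.00% 0.7942
```

On the Table 1 counts this reproduces the commission errors of 9.32% and 10.46% and the omission errors of 14.62% and 6.55%.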

With impervious surface fraction for the entire study area (Fig. 4) and the identified residential land use areas (Fig. 6), impervious surface fraction in residential areas was easily obtained (see Fig. 7).

4. Interpolating population density using cokriging

Once impervious surface fraction has been obtained for residential areas, it can be utilized as supplementary data to interpolate population density. Population density is usually estimated using a regression approach, which models the relationship between population and supplementary data derived from remote sensing imagery (Chen, 2002; Harvey, 2002; Lo, 1995). An implicit assumption of regression analysis is that the observations (here, population densities) are spatially independent. However, many researchers have questioned this assumption, arguing that simple regression may lead to biased results (Griffith, 1993; Griffith & Can, 1996). Therefore, a model that accounts for spatial autocorrelation is more appropriate. Cokriging may improve estimation precision by accounting simultaneously for spatial autocorrelation in population density and impervious surface fraction and for the cross-correlation between these spatial variables. Moreover, it is suitable when the variable to be estimated (e.g. population density) is under-sampled while supplementary variables are abundant (e.g. impervious surface fraction).

Fig. 7. Impervious surface fraction in residential areas.

Cokriging is a geostatistical method originating from mining applications (Cressie, 1993; Journel & Huijbregts, 1978) and widely applied in soil science (Vauclin, Vieira, Vachaud, & Nielsen, 1983; Webster, 1985; Webster & Burgess, 1980). Geostatistical methods were introduced in remote sensing in the late 1980s (Curran, 1988; Woodcock, Strahler, & Jupp, 1988). Geostatistics is now commonly applied in soil science, biogeography, climatology, and environmental studies (Atkinson, Webster, & Curran, 1992, 1994; Oliver, Webster, & Gerrard, 1989a, 1989b). A review of geostatistical methods and associated applications may be found in Cressie (1993), Curran and Atkinson (1998), and Curran (2001). Although widely applied in physical geography, cokriging has rarely been utilized in estimating socio-economic conditions, such as population densities. In this paper, population density is estimated using a cokriging method in which impervious surface fraction is taken as a secondary variable to improve estimation accuracy.


4.1. Cokriging theory

As an extension of ordinary kriging to two or more variables, cokriging is based on regionalized variable theory (Journel & Huijbregts, 1978; Oliver et al., 1989a). According to this theory, any regionalized variable z(x) can be considered a realization of a random function Z(x), which is a combination of a deterministic component, m(x), and a random fluctuation, ε(x):

$$z(x) = m(x) + \varepsilon(x) \qquad (1)$$

where x denotes the geographical coordinates in one, two, or three dimensions; m(x) indicates a geographical trend or drift; and ε(x) is a spatially dependent random error with zero mean. In most applications, the deterministic component, m(x), is assumed to be locally constant,

$$m(x) = \mu \qquad (2)$$

and for any given distance and direction h, the variance of the differences between z(x) and z(x + h) is finite and independent of x:

$$\operatorname{var}[z(x) - z(x+h)] = E\left[\{z(x) - z(x+h)\}^2\right] = 2\gamma(h) \qquad (3)$$

where the vector h, the lag, is a given separation distance and direction from x, and γ(h) is the variogram. γ(h) has been found to be an important tool in modeling spatial autocorrelation (Journel & Huijbregts, 1978). Moreover, if two or more variables are involved, a cross-variogram is defined as follows:

$$\gamma_{uv}(h) = \frac{1}{2} E\left[\{z_u(x) - z_u(x+h)\}\{z_v(x) - z_v(x+h)\}\right] \qquad (4)$$

Based on regionalized variable theory, cokriging can be used to estimate an under-sampled variable. This method ensures unbiased estimates with minimum and known variance (Curran, 2001). If we consider estimating a variable u in a block B with sampling points of u and a second variable v, our estimate will be

$$\hat{z}_u(B) = \sum_{i=1}^{N_u} \lambda_{ui}\, z_u(x_{ui}) + \sum_{j=1}^{N_v} \lambda_{vj}\, z_v(x_{vj}) \qquad (5)$$

in which N_u and N_v are the numbers of sampling points for variables u and v; x_ui and x_vj are the locations of the sampling points for variables u and v, respectively; and λ_ui and λ_vj are the weights to be calculated.

In order to ensure unbiasedness, the following constraints must be satisfied (Aboufirassi & Marino, 1984):

$$\sum_{i=1}^{N_u} \lambda_{ui} = 1 \qquad (6)$$

$$\sum_{j=1}^{N_v} \lambda_{vj} = 0 \qquad (7)$$


Constraint (6) implies that at least one observation of the primary variable u is necessary for cokriging, and constraint (7) requires the weights for the secondary variable v to sum to zero. Subject to these constraints, we minimize the estimation variance:

$$\sigma_u^2(B) = E\left[\{\hat{z}_u(B) - z_u(B)\}^2\right] \qquad (8)$$

This is an optimization problem in which λ_ui and λ_vj are the decision variables and σ²_u(B) is the objective function. Standard Lagrangian techniques can be applied to solve this problem, which results in the following system:

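$$\sum_{i=1}^{N_u} \lambda_{ui}\,\gamma_{uu}(x_{ui}, x_{uk}) + \sum_{j=1}^{N_v} \lambda_{vj}\,\gamma_{uv}(x_{uk}, x_{vj}) + \psi_u = \bar{\gamma}_{uu}(B, x_{uk}), \quad k = 1, \ldots, N_u \qquad (9)$$

$$\sum_{i=1}^{N_u} \lambda_{ui}\,\gamma_{uv}(x_{ui}, x_{vl}) + \sum_{j=1}^{N_v} \lambda_{vj}\,\gamma_{vv}(x_{vj}, x_{vl}) + \psi_v = \bar{\gamma}_{uv}(B, x_{vl}), \quad l = 1, \ldots, N_v \qquad (10)$$

where γ_uu(x_ui, x_uk) is the semivariogram of variable u between sites i and k, γ_vv(x_vj, x_vl) is the semivariogram of variable v between sites j and l, and γ_uv(x_uk, x_vj) is the cross-semivariogram between variables u and v at sites k and j; ψ_u and ψ_v are Lagrange multipliers. Finally, γ̄_uu(B, x_uk) is the average semivariogram of u between block B and site k, and γ̄_uv(B, x_vl) is the average cross-semivariogram between variables u and v for block B and site l.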

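Using this method, there are N_u + N_v + 2 equations (Eqs. (6), (7), (9) and (10)) in N_u + N_v + 2 unknowns (the weights and the two Lagrange multipliers), which can be easily solved by linear algebra. After obtaining the weights λ_ui and λ_vj, ẑ_u(B) may be estimated using Eq. (5). The cokriging variance can be obtained as a byproduct of the cokriging process as follows:

$$\sigma_u^2(B) = \sum_{i=1}^{N_u} \lambda_{ui}\,\bar{\gamma}_{uu}(B, x_{ui}) + \sum_{j=1}^{N_v} \lambda_{vj}\,\bar{\gamma}_{uv}(B, x_{vj}) + \psi_u - \bar{\gamma}_{uu}(B, B) \qquad (11)$$

where γ̄_uu(B, B) is the average semivariogram of u within block B.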

Matrix formulations of these equations can be found in Myers (1982), McBratney and Webster (1983), and Aboufirassi and Marino (1984). Details on solving this problem using Lagrangian techniques are given in Vauclin et al. (1983) and Atkinson et al. (1992).
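To make the structure of Eqs. (5)–(11) concrete, the following sketch assembles and solves the cokriging system for one block and evaluates the cokriging variance. It is a minimal illustration, not the Gstat implementation used in this study: the block is approximated by a 3 by 3 grid of discretization points, the sample coordinates and values are hypothetical, and the exponential variograms use the Table 2 coefficients reported in Section 4.3, with distances assumed to be in metres. The system is written with +ψ_u and +ψ_v on the left-hand side, and the variance is evaluated with the matching sign of ψ_u.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Variogram = Callable[[float], float]


def exponential(c0: float, c1: float, r: float) -> Variogram:
    """Exponential semivariogram: 0 at h = 0, C0 + C1 * (1 - exp(-h / r)) for h > 0."""
    return lambda h: 0.0 if h == 0 else c0 + c1 * (1.0 - math.exp(-h / r))


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def block_cokrige(xu: Sequence[Point], zu: Sequence[float],
                  xv: Sequence[Point], zv: Sequence[float], block: Sequence[Point],
                  g_uu: Variogram, g_vv: Variogram, g_uv: Variogram):
    """Solve Eqs. (6), (7), (9), (10); return (estimate, variance, lambda_u, lambda_v)."""
    nu, nv = len(xu), len(xv)
    n = nu + nv + 2  # N_u + N_v weights plus the multipliers psi_u and psi_v
    d = math.dist

    def gbar(g: Variogram, x: Point) -> float:
        # Average (cross-)semivariogram between the block discretization points and site x.
        return sum(g(d(p, x)) for p in block) / len(block)

    a = [[0.0] * n for _ in range(n)]
    rhs = [0.0] * n
    for k in range(nu):                                   # Eq. (9)
        a[k][:nu] = [g_uu(d(xu[i], xu[k])) for i in range(nu)]
        a[k][nu:nu + nv] = [g_uv(d(xu[k], xv[j])) for j in range(nv)]
        a[k][nu + nv] = 1.0                               # psi_u
        rhs[k] = gbar(g_uu, xu[k])
    for l in range(nv):                                   # Eq. (10)
        a[nu + l][:nu] = [g_uv(d(xu[i], xv[l])) for i in range(nu)]
        a[nu + l][nu:nu + nv] = [g_vv(d(xv[j], xv[l])) for j in range(nv)]
        a[nu + l][nu + nv + 1] = 1.0                      # psi_v
        rhs[nu + l] = gbar(g_uv, xv[l])
    a[nu + nv][:nu] = [1.0] * nu                          # Eq. (6): weights of u sum to 1
    rhs[nu + nv] = 1.0
    a[nu + nv + 1][nu:nu + nv] = [1.0] * nv               # Eq. (7): weights of v sum to 0

    sol = solve_linear(a, rhs)
    lam_u, lam_v, psi_u = sol[:nu], sol[nu:nu + nv], sol[nu + nv]
    estimate = (sum(w * z for w, z in zip(lam_u, zu))     # Eq. (5)
                + sum(w * z for w, z in zip(lam_v, zv)))
    g_bb = sum(g_uu(d(p, q)) for p in block for q in block) / len(block) ** 2
    variance = (sum(w * gbar(g_uu, x) for w, x in zip(lam_u, xu))   # Eq. (11)
                + sum(w * gbar(g_uv, x) for w, x in zip(lam_v, xv))
                + psi_u - g_bb)
    return estimate, variance, lam_u, lam_v


# Hypothetical samples; coordinates assumed to be in metres.
xu = [(0.0, 0.0), (400.0, 100.0), (150.0, 600.0), (700.0, 700.0)]
zu = [2.0, 1.5, 1.8, 1.2]                      # square root of population density
xv = xu + [(300.0, 300.0), (500.0, 450.0)]     # v is also observed at two extra sites
zv = [0.45, 0.40, 0.50, 0.30, 0.42, 0.38]      # impervious surface fraction
# One 30 m by 30 m block centred at (320, 330), discretized as a 3 by 3 grid.
block = [(320.0 + dx, 330.0 + dy) for dx in (-10.0, 0.0, 10.0) for dy in (-10.0, 0.0, 10.0)]
G_UU = exponential(0.196, 0.176, 1000.0)       # Table 2 coefficients
G_VV = exponential(0.007, 0.0089, 1000.0)
G_UV = exponential(0.012, 0.030, 1000.0)
est, var, lam_u, lam_v = block_cokrige(xu, zu, xv, zv, block, G_UU, G_VV, G_UV)
```

A useful check on any implementation: because the weights of u (together with −1/n for each block point) and the weights of v each sum to zero, the Eq. (11) result should equal the variance of that linear combination computed directly from the variograms. This agreement holds only if the sign of the Lagrange multiplier term is the same in the system and in the variance formula.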

4.2. Variogram estimation

From Eqs. (6), (7), (9) and (10), it is clear that the weights λ_ui and λ_vj depend on the variograms associated with variables u and v, their cross-variogram, and the block size. In this study, block size is defined to be the same as the TM image resolution (30 m by 30 m). Therefore, once the variograms and cross-variogram have been derived, cokriging is a straightforward process (Atkinson et al., 1992; Atkinson, Webster, & Curran, 1994). In practice, the variograms are typically estimated from sampling points as follows:

$$\gamma(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \{z(x_i) - z(x_i + h)\}^2 \qquad (12)$$


where z(x_i) are the known values of variable u or v at sampling point x_i, and N(h) is the number of sampling point pairs separated by lag h. Similarly, the cross-variogram can be estimated as follows:

$$\gamma_{uv}(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \{z_u(x_i) - z_u(x_i + h)\}\{z_v(x_i) - z_v(x_i + h)\} \qquad (13)$$

After the experimental variograms and cross-variogram have been obtained, a theoretical model is needed to fit them. Such a model must be positive definite and coregionalized (i.e. jointly valid for the variograms and the cross-variogram) to ensure that the cokriging variance is non-negative. More discussion about choosing theoretical functions can be found in McBratney and Webster (1986) and Curran (1988). In this study, we chose a model satisfying the positive definiteness and coregionalization requirements, the details of which are discussed later in this paper.
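Eqs. (12) and (13) can be computed in a few lines. The sketch below assumes co-located samples (every sampling point carries both u and v, as in Section 4.3) and groups pairs into omnidirectional lag classes of a chosen width. The binning scheme and function name are illustrative assumptions, not Gstat's implementation.

```python
import math
from collections import defaultdict
from typing import Dict, Sequence, Tuple


def experimental_variograms(points: Sequence[Tuple[float, float]],
                            zu: Sequence[float], zv: Sequence[float],
                            lag_width: float, max_lag: float
                            ) -> Dict[float, Tuple[int, float, float, float]]:
    """Eq. (12) for u and v and Eq. (13) for (u, v), per omnidirectional lag class.

    Returns {lag class centre: (N(h), gamma_uu, gamma_vv, gamma_uv)}.
    """
    acc = defaultdict(lambda: [0, 0.0, 0.0, 0.0])  # N, sum du^2, sum dv^2, sum du*dv
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):  # each unordered pair is counted once
            h = math.dist(points[i], points[j])
            if h == 0.0 or h > max_lag:
                continue
            k = int(h / lag_width + 0.5)  # class whose centre k * lag_width is nearest to h
            du, dv = zu[i] - zu[j], zv[i] - zv[j]
            a = acc[k]
            a[0] += 1
            a[1] += du * du
            a[2] += dv * dv
            a[3] += du * dv
    return {k * lag_width: (a[0], a[1] / (2 * a[0]), a[2] / (2 * a[0]), a[3] / (2 * a[0]))
            for k, a in sorted(acc.items())}


# Toy transect: v is exactly 2u, so gamma_vv = 4 * gamma_uu and gamma_uv = 2 * gamma_uu.
print(experimental_variograms([(0, 0), (1, 0), (2, 0), (3, 0)], [0, 1, 0, 1], [0, 2, 0, 2], 1.0, 2.5))
# {1.0: (3, 0.5, 2.0, 1.0), 2.0: (2, 0.0, 0.0, 0.0)}
```

Because the same pair ordering is used for both variables, the product in Eq. (13) keeps its sign, so a negative cross-variogram value signals negatively correlated increments.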

4.3. Interpolating population density using cokriging

In this study, population density is considered the primary variable to be estimated, and residential impervious surface fraction is considered a secondary variable used to increase estimation accuracy. One issue is that reported census statistics are not based on sampling points, but rather on areal units such as blocks. The centroid of a census block may be used as the sampling point for the assignment of population density. However, this method is not realistic because there may not actually be people at the centroid of a block. Martin (1989) solved this problem by using a population-weighted point as the representative point of a census block. In a similar manner, in this research the central point of the pixel whose impervious surface fraction is approximately equal to the block mean is used as a population-weighted block point. We then assign the impervious surface fraction of that pixel and the average population density of the block to this sampling point.

After obtaining the impervious surface fraction and population density for these samples, we explored the characteristics of the data. If they are not second-order stationary, i.e. do not have the same mean and variance throughout the study area, the accuracy of the estimated experimental variogram and associated cokriging will be degraded (Cressie, 1993). The histograms of population density (see Fig. 8a) and impervious surface fraction (see Fig. 9) were computed from the sampling points. Population density is highly positively skewed and may be approximated by a Poisson distribution, whose variance is proportional to its mean (Bailey & Gatrell, 1995; Harvey, 2002). A square root transformation was therefore performed on population density to stabilize its variance. The histogram of the transformed population density (see Fig. 8b) shows that its distribution is near normal and its variance is approximately constant. The histogram of impervious surface fraction is slightly negatively skewed, but may be considered approximately normal; thus, no transformation was applied to impervious surface fraction. We excluded zero population density census blocks because no interpolation is necessary for these blocks.

Fig. 8. Histogram of (a) population density and (b) square root of population density at sampling points. It shows that population density may be described by a Poisson distribution, while the square root transformation is a reasonable approximation of a normal distribution.

Fig. 9. Histogram of impervious surface fraction at sampling points.

In this study, the primary variable u is the square root of population density, and the secondary variable v is impervious surface fraction. Experimental variograms and cross-variograms were calculated using Eqs. (12) and (13). Gstat software was utilized to fit these variograms to theoretical functions (Pebesma & Wesseling, 1998). The weighted least squares method and visual inspection were applied in modeling the experimental variograms (Cressie, 1985). Directional variograms were also computed and no obvious anisotropies were found. Therefore, the variograms were assumed to be isotropic and were fitted using an exponential model of the following form:

$$\gamma(h) = \begin{cases} C_0 + C_1\left\{1 - e^{-h/r}\right\} & h > 0 \\ 0 & h = 0 \end{cases} \qquad (14)$$

Here C_0 is the nugget, representing unexplained variance, and r defines the spatial scale of the variation. Because the exponential model approaches its sill C_0 + C_1 only asymptotically, in practice the semivariogram reaches C_0 + 0.95C_1 at h = 3r, which is taken as the practical range. In this study, the parameters were calculated for the variograms of the square root of population density and impervious surface fraction, and also for their cross-variogram (see Table 2 and Fig. 10).

Table 2

Coefficients of the theoretical variogram and cross-variogram functions

| Variogram | C0 | C1 | r |
|---|---|---|---|
| Population density (square root) | 0.196 | 0.176 | 1000 |
| Impervious surface | 0.007 | 0.0089 | 1000 |
| Population density–impervious surface | 0.012 | 0.030 | 1000 |

Fig. 10. Variograms of (a) square root of population density, (b) residential impervious surface fraction, and (c) the cross-variogram between square root of population density and impervious surface fraction. Exponential functions with r = 1000 are chosen to model these variograms.

After obtaining the variograms of impervious surface fraction and of the square root of population density, and their cross-variogram, block cokriging was performed to interpolate population density (see Fig. 11) using Gstat software embedded in Idrisi (Harmon, 2002). Fig. 11 shows a clear geographical pattern of population distribution in the study region. In particular, few people live in the CBD except for group-quarter populations. High-density household-based populations are adjacent to the CBD in the southern and northwestern portions of the study region, whereas low-density household-based populations reside relatively far away from the CBD (in the eastern and southern portions).

Fig. 11. Estimated population density using the developed cokriging method. The height indicates the value of population density for each TM pixel. The average population density is 4.28, with a maximum of 52 and a minimum of 0.
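One common way to confirm that fitted functions are jointly valid is to treat them as a linear model of coregionalization: when all three structures share the same range r, the model is valid if the 2 by 2 nugget matrix and the 2 by 2 partial-sill matrix are both positive semi-definite. The sketch below applies this check to the Table 2 coefficients and evaluates Eq. (14) at the practical range 3r. It is an illustrative check under that assumption, not necessarily the procedure the authors used.

```python
import math
from typing import Dict, Tuple

# Table 2 coefficients: (C0, C1, r).
TABLE2: Dict[str, Tuple[float, float, float]] = {
    "uu": (0.196, 0.176, 1000.0),   # square root of population density
    "vv": (0.007, 0.0089, 1000.0),  # residential impervious surface fraction
    "uv": (0.012, 0.030, 1000.0),   # cross-variogram
}


def exponential(h: float, c0: float, c1: float, r: float) -> float:
    """Eq. (14): 0 at h = 0, otherwise C0 + C1 * (1 - exp(-h / r))."""
    return 0.0 if h == 0 else c0 + c1 * (1.0 - math.exp(-h / r))


def psd_2x2(a: float, b: float, c: float) -> bool:
    """True if the symmetric matrix [[a, c], [c, b]] is positive semi-definite."""
    return a >= 0 and b >= 0 and a * b - c * c >= 0


def lmc_is_valid(params: Dict[str, Tuple[float, float, float]]) -> bool:
    """Linear model of coregionalization check: shared range, PSD nugget and sill matrices."""
    uu, vv, uv = params["uu"], params["vv"], params["uv"]
    if not (uu[2] == vv[2] == uv[2]):
        return False
    return psd_2x2(uu[0], vv[0], uv[0]) and psd_2x2(uu[1], vv[1], uv[1])


print(lmc_is_valid(TABLE2))  # True
c0, c1, r = TABLE2["uu"]
print(round(exponential(3 * r, c0, c1, r), 4), round(c0 + 0.95 * c1, 4))  # 0.3632 0.3632
```

For Table 2 both determinants are positive (about 0.00123 for the nuggets and 0.00067 for the partial sills). A cross-variogram partial sill of 0.05, for instance, would fail the check because 0.176 × 0.0089 < 0.05².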

5. Accuracy assessment

Using the cokriging variance defined in Eq. (11) for the square root of population density, the mean cokriging variance is 0.235 (minimum 0.213 and maximum 0.503). Fig. 12 shows the distribution of cokriging variance in the study area. In particular, cokriging variance is high along the study area boundary because few samples are available for estimating population density in this portion of the region.

Fig. 12. Cokriging variance of the square root of population density estimation. The average cokriging variance is 0.235, with a maximum of 0.503 and a minimum of 0.213.

It is possible to examine population count estimation accuracy at each census zonal level using the root mean square error (E_RMS) and the coefficient of variation (V) to evaluate the absolute and relative error, respectively:

$$E_{RMS} = \left[\frac{1}{n}\sum_{i=1}^{n}\left(P_i - \hat{P}_i\right)^2\right]^{1/2} \qquad (15)$$

$$V = \frac{1}{P}\sum_{i=1}^{n}\left|P_i - \hat{P}_i\right| \qquad (16)$$

where n is the total number of census zones; P is the total population in the study area; P_i is the population count of census zone i; and P̂_i is the estimated population count for zone i. The overall regional assessment of population count estimation accuracy can be carried out using the relative estimation error (R):

$$R = \frac{\hat{P} - P}{P} \qquad (17)$$

where P̂ is the total population estimate for the study area.
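Eqs. (15)–(17) can be computed as follows. The sketch assumes that the census zones tile the study area, so that P is the sum of the zone counts; the function names and sample counts are illustrative.

```python
import math
from typing import Sequence


def rms_error(observed: Sequence[float], estimated: Sequence[float]) -> float:
    """Eq. (15): root mean square error over the n census zones."""
    n = len(observed)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(observed, estimated)) / n)


def coefficient_of_variation(observed: Sequence[float], estimated: Sequence[float]) -> float:
    """Eq. (16): summed absolute zone errors divided by the total population P."""
    return sum(abs(p - q) for p, q in zip(observed, estimated)) / sum(observed)


def relative_error(observed: Sequence[float], estimated: Sequence[float]) -> float:
    """Eq. (17): (P_hat - P) / P for the whole study area."""
    total = sum(observed)
    return (sum(estimated) - total) / total


census = [100, 200, 300]     # hypothetical zone counts P_i
estimate = [110, 190, 290]   # hypothetical estimates P_hat_i
print(rms_error(census, estimate), coefficient_of_variation(census, estimate))  # 10.0 0.05
```

V sums absolute errors, while R lets over- and underestimates cancel. A method can therefore have a sizable V at the block level and still an R close to zero for the whole study area, which is the pattern reported for cokriging in Table 5.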

The cokriging method is compared with the traditional regression approach used to estimate population density. The first regression model explores the relationship between population density and the proportion of low and high density residential areas within a census block (Langford et al., 1991; Lo, 1995; Chen, 2002). Applied to our study area, the model is as follows (see Table 3):

$$\hat{P}^{T}_i = 2.2552 \times R^{L}_i + 5.0612 \times R^{H}_i \qquad (18)$$

where R^L_i and R^H_i are the proportions of low and high density residential areas in census block i, and P̂^T_i is the expected population density in census block i using the traditional regression approach.

Table 3

Coefficients of the regression model with residential land cover classes as explanatory variables

| Coefficient | Value | Std. error | t value | p value |
|---|---|---|---|---|
| R^L | 2.2552 | 0.1727 | 13.0554 | 0.0000 |
| R^H | 5.0612 | 0.0958 | 52.8349 | 0.0000 |

Table 4

Coefficients of the regression model with residential impervious surface fraction as explanatory variables

| Coefficient | Value | Std. error | t value | p value |
|---|---|---|---|---|
| I^L | 6.5798 | 0.4335 | 15.1793 | 0.0000 |
| I^H | 9.4650 | 0.1687 | 56.1212 | 0.0000 |

A valid alternative regression model investigates the relationship between population density and impervious surface fraction in low and high density residential areas within each census block. Applied to our study area, the model is as follows (see Table 4):

$$\hat{P}^{A}_i = 6.5798 \times I^{L}_i + 9.4650 \times I^{H}_i \qquad (19)$$

where I^L_i and I^H_i are the fractions of impervious surface in low and high density residential areas in census block i, and P̂^A_i is the expected population density in census block i using this alternative regression approach. In both regression models, the area of each census block was chosen as a weighting factor to reduce the effects of zone size. Moreover, the intercepts are not included in these regression models because they are not statistically significant (and their meaning in population estimation is not clear). The explanatory variables are statistically significant (p ≤ 0.0001), which indicates a strong relationship between population density and the chosen explanatory variables (see Tables 3 and 4).
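Both regression models (Eqs. (18) and (19)) are fitted without an intercept and with census block area as the weight. A minimal sketch of such a weighted least-squares fit for two explanatory variables solves the 2 by 2 normal equations directly; the function name and the synthetic data are hypothetical and are not the Columbus data.

```python
from typing import Sequence, Tuple


def wls_no_intercept(x1: Sequence[float], x2: Sequence[float], y: Sequence[float],
                     w: Sequence[float]) -> Tuple[float, float]:
    """Fit y = b1*x1 + b2*x2 (no intercept) minimizing sum(w * residual^2)."""
    s11 = sum(wi * a * a for wi, a in zip(w, x1))
    s22 = sum(wi * b * b for wi, b in zip(w, x2))
    s12 = sum(wi * a * b for wi, a, b in zip(w, x1, x2))
    t1 = sum(wi * a * yi for wi, a, yi in zip(w, x1, y))
    t2 = sum(wi * b * yi for wi, b, yi in zip(w, x2, y))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("explanatory variables are collinear")
    # Cramer's rule on [[s11, s12], [s12, s22]] [b1, b2] = [t1, t2].
    return (t1 * s22 - t2 * s12) / det, (s11 * t2 - s12 * t1) / det


r_low = [0.5, 0.2, 0.0, 0.7, 0.1]     # hypothetical proportion of low density residential
r_high = [0.1, 0.6, 0.9, 0.0, 0.4]    # hypothetical proportion of high density residential
area = [1.0, 2.5, 0.8, 1.2, 3.0]      # hypothetical block areas used as weights
density = [2.0 * a + 5.0 * b for a, b in zip(r_low, r_high)]  # synthetic, noise-free
b1, b2 = wls_no_intercept(r_low, r_high, density, area)
print(round(b1, 6), round(b2, 6))  # 2.0 5.0
```

Dropping the intercept forces a block with no residential land to have zero predicted density, which matches the interpretation given above.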

Comparative results (see Table 5) show that the cokriging method is the most accurate. In particular, the coefficient of variation is 34.7% at the census block level (lower than either regression model), low at the block group and tract levels (15.2% and 10.2%, respectively), and near zero for the entire study area (−0.3%). The estimation accuracies of the two regression models are also reported in Table 5. Neither regression model performs as well as the cokriging method. For example, the coefficients of variation at the census tract level are 22.9% and 21.0% for the two regression models, substantially higher than the 10.2% obtained using cokriging. Of the two regression models, regression with impervious surface fraction performs slightly better than regression with land use classes (e.g. 21.0% vs. 22.9% estimation error at the census tract level). This result is consistent with the literature showing that impervious surface fraction performs better than land use/cover in urban analysis (Ji & Jensen, 1999).

Table 5
Absolute and relative estimation errors of the cokriging and regression models

                                 Cokriging         Regression with     Regression with
                                                   land cover          impervious surface
Zones               Average      ERMS     V        ERMS     V          ERMS     V
                    population
Block (2445)        40.99        45.3     34.7%    47.9     48.8%      45.5     46.6%
Block group (125)   801.74       215.0    15.2%    325.6    27.8%      290.7    25.2%
Tract (36)          2825.84      411.0    10.2%    967.6    22.9%      846.0    21.0%
Total study area    100,200               −0.3%             1.0%                2.6%
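The zone-level comparison in Table 5 requires aggregating pixel estimates to census zones. The sketch below assumes that ERMS is the root-mean-square difference between estimated and census counts over the zones of one level, and that the study-area figure is the signed relative error of the total. The definition of V is not restated in this section, so the sketch does not compute it. All names are hypothetical.

```python
import numpy as np


def zonal_errors(pixel_estimates, zone_ids, census_counts):
    """Aggregate pixel estimates to zones and summarize the errors.

    pixel_estimates: (n_pixels,) estimated population per pixel.
    zone_ids: (n_pixels,) integer zone index 0..n_zones-1 for each pixel.
    census_counts: (n_zones,) observed census count per zone.
    Returns (zone_estimates, rmse, total_relative_error).
    """
    census_counts = np.asarray(census_counts, dtype=float)
    # Sum the pixel estimates falling in each zone.
    zone_est = np.bincount(np.asarray(zone_ids), weights=pixel_estimates,
                           minlength=census_counts.size)
    # Root-mean-square error of zone counts (assumed meaning of ERMS).
    rmse = float(np.sqrt(np.mean((zone_est - census_counts) ** 2)))
    # Signed relative error of the grand total.
    total_rel = float((zone_est.sum() - census_counts.sum()) / census_counts.sum())
    return zone_est, rmse, total_rel
```

For instance, four pixels with estimates 1, 2, 3 and 4 split into two zones with census counts 3 and 8 give zone totals of 3 and 7 and a total relative error of −1/11.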


6. Population density adjustment

The cokriging approach gives unbiased, minimum-variance estimates of the square root of population density. However, the population count estimation errors evaluated at the census block level are still somewhat large (34.7%). As discussed in previous studies (Langford & Unwin, 1994; Fisher & Langford, 1995; Martin, 1996), interpolation methods should preserve the population count in each reporting zone. One option is to add a volume-preserving constraint to the cokriging model. However, this would make the model more complex, since it would then have a quadratic objective function and a quadratic regional constraint. In fact, it is not clear that the resulting model could be solved, either exactly or heuristically. An alternative is to rescale the population estimate of every pixel so that this zonal constraint is satisfied:

$$P^{*}_{ij} = \hat{P}_{ij} \times \frac{P_i}{\hat{P}_i} \qquad (20)$$

Here $P^{*}_{ij}$ is the rescaled population estimate of pixel $j$ in census block $i$, $\hat{P}_{ij}$ is the population estimate obtained through cokriging, and $P_i$ and $\hat{P}_i$ are the population counts of block $i$ (the census count and the cokriging estimate, respectively). This rescaled population density (see Fig. 13) generally preserves the cokriging estimates, but also emphasizes local variation. For example, the cokriging method tends to underestimate population counts in multi-story and high-rise buildings (the middle portion of Fig. 11). The rescaling approach adjusts for these inaccuracies and yields more accurate population density estimates.
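Eq. (20) can be implemented as a per-block scaling factor. A minimal sketch, assuming pixel estimates and block memberships are stored as flat arrays (`block_ids` indexing into `census_counts`). The function name is hypothetical, and the handling of blocks whose cokriging total is zero is an assumption, since Eq. (20) is undefined there.

```python
import numpy as np


def rescale_to_census(pixel_estimates, block_ids, census_counts):
    """Volume-preserving rescaling of Eq. (20): P*_ij = P^_ij * P_i / P^_i."""
    est = np.asarray(pixel_estimates, dtype=float)
    block_ids = np.asarray(block_ids)
    census = np.asarray(census_counts, dtype=float)
    # Cokriging total per block, P^_i.
    block_est = np.bincount(block_ids, weights=est, minlength=census.size)
    # Per-block factor P_i / P^_i. Blocks with a zero estimated total are
    # left at zero (assumption: Eq. (20) is undefined for them).
    factor = np.divide(census, block_est, out=np.zeros_like(census),
                       where=block_est > 0)
    return est * factor[block_ids]
```

Because every pixel in block i is multiplied by the same factor, the block total becomes exactly P_i, while the relative pattern of the cokriging surface inside the block is unchanged.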

Fig. 13. Adjusted population density that preserves zonal population counts. The height indicates the value of population density for each TM pixel. The average population density is 4.40, with a maximum of 143 and a minimum of 0.


7. Conclusion

In this paper a cokriging method was developed for interpolating residential population density using census count data and impervious surface fraction. The results are clearly better than those of regression-based interpolation approaches. In particular, the relative population estimation error for the entire study area is −0.3%, which is better than the results obtained using the regression methods (1.0–2.6% estimation error). Moreover, the estimation errors at the census block group and tract levels (15.2% and 10.2%, respectively) are about 10–13 percentage points lower than those of the regression models (about 25–28% and 21–23%, respectively). At the census block level, the estimation error is about 12–14 percentage points lower than those of the regression models (see Table 5). These results demonstrate that cokriging applied to residential impervious surface fraction is a superior alternative to traditional regression-based interpolation approaches using land use and land cover data.
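The gaps between cokriging and the regression models can be read directly off Table 5. This small sketch transcribes the V values from Table 5 and computes, for each zone level, the difference in percentage points between each regression model and cokriging. The dictionary keys and function name are hypothetical labels.

```python
# Relative errors V (%) transcribed from Table 5.
TABLE5_V = {
    "block":       {"cokriging": 34.7, "land_cover": 48.8, "impervious": 46.6},
    "block_group": {"cokriging": 15.2, "land_cover": 27.8, "impervious": 25.2},
    "tract":       {"cokriging": 10.2, "land_cover": 22.9, "impervious": 21.0},
}


def reduction_vs_regression(level):
    """Percentage-point gap between each regression model and cokriging."""
    row = TABLE5_V[level]
    return {model: round(row[model] - row["cokriging"], 1)
            for model in ("land_cover", "impervious")}


for level in TABLE5_V:
    print(level, reduction_vs_regression(level))
```

For example, `reduction_vs_regression("tract")` gives gaps of 12.7 (land cover) and 10.8 (impervious surface) percentage points.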

One reason cokriging performs well is that it accounts for the spatial autocorrelation and cross-correlation associated with the distribution of people in urban areas. Instead of ignoring spatial dependence, it models the spatial autocorrelation of population and impervious surface fraction through variograms and a cross-variogram, and applies them in population interpolation. Moreover, unlike other interpolation methods, it provides the estimation variance (see Eq. (11) and Fig. 12) at the TM pixel level (30 by 30 m). This estimation variance is an important tool for assessing population estimation error without aggregating to census reporting zones.
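To make the cokriging system and its pixel-level estimation variance concrete, here is a minimal ordinary cokriging sketch in covariance form. It assumes a hypothetical linear model of coregionalization with a single exponential structure (the sills and range below are made up, not the fitted models of Table 2), two-dimensional coordinates, and one target location at a time. It follows a standard textbook formulation and is not necessarily identical to the paper's Eq. (11): the primary weights sum to 1, the secondary weights sum to 0, and the variance is C11(0) − Σ λ_i C11(x_i, x0) − Σ μ_j C21(y_j, x0) − ν1, where ν1 is the Lagrange multiplier of the primary constraint.

```python
import numpy as np

# Hypothetical LMC with one exponential structure:
#   C_kl(h) = SILLS[k, l] * exp(-h / RANGE_M)
# Index 0 = primary variable, index 1 = secondary variable.
# SILLS must be positive definite; all values are made up for illustration.
SILLS = np.array([[1.0, 0.8],
                  [0.8, 1.0]])
RANGE_M = 300.0


def cov(k, l, p, q):
    """C_kl between every point in p (n, 2) and every point in q (m, 2)."""
    h = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)
    return SILLS[k, l] * np.exp(-h / RANGE_M)


def ordinary_cokriging(x1, z1, x2, z2, x0):
    """Ordinary cokriging of the primary variable at one location x0.

    x1, z1: primary sample coordinates (n1, 2) and values (n1,)
    x2, z2: secondary sample coordinates (n2, 2) and values (n2,)
    Returns (estimate, estimation_variance, primary_weights, secondary_weights).
    """
    x1, x2 = np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)
    z1, z2 = np.asarray(z1, dtype=float), np.asarray(z2, dtype=float)
    x0 = np.atleast_2d(np.asarray(x0, dtype=float))
    n1, n2 = len(x1), len(x2)
    n = n1 + n2
    # Left-hand side: auto- and cross-covariances bordered by two constraint rows.
    a = np.zeros((n + 2, n + 2))
    a[:n1, :n1] = cov(0, 0, x1, x1)
    a[:n1, n1:n] = cov(0, 1, x1, x2)
    a[n1:n, :n1] = cov(1, 0, x2, x1)
    a[n1:n, n1:n] = cov(1, 1, x2, x2)
    a[:n1, n] = a[n, :n1] = 1.0            # primary weights sum to 1
    a[n1:n, n + 1] = a[n + 1, n1:n] = 1.0  # secondary weights sum to 0
    # Right-hand side: covariances with the target location.
    b = np.zeros(n + 2)
    b[:n1] = cov(0, 0, x1, x0)[:, 0]
    b[n1:n] = cov(1, 0, x2, x0)[:, 0]
    b[n] = 1.0
    sol = np.linalg.solve(a, b)
    lam, mu, nu1 = sol[:n1], sol[n1:n], sol[n]
    estimate = lam @ z1 + mu @ z2
    variance = SILLS[0, 0] - lam @ b[:n1] - mu @ b[n1:n] - nu1
    return estimate, variance, lam, mu


# Hypothetical sample layout (metres) and values.
x1_demo = np.array([[0.0, 0.0], [120.0, 40.0], [60.0, 200.0]])
z1_demo = np.array([2.0, 3.5, 1.0])
x2_demo = np.array([[30.0, 30.0], [90.0, 10.0], [150.0, 120.0],
                    [10.0, 160.0], [200.0, 200.0]])
z2_demo = np.array([0.4, 0.6, 0.7, 0.3, 0.5])
est_demo, var_demo, lam_demo, mu_demo = ordinary_cokriging(
    x1_demo, z1_demo, x2_demo, z2_demo, [80.0, 90.0])
```

At a primary sample location the solution puts all weight on that sample, so the estimate reproduces the observed value and the variance drops to zero. Away from the data the variance grows, which is what makes a per-pixel uncertainty map possible.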

Another interesting aspect of this work is that residential impervious surface fraction was found to be an effective replacement for the land use and land cover data typically used in modeling population density. This makes intuitive sense, given that impervious surface fraction is closely related to housing development and thus to population density. Moreover, the cross-variogram (see Fig. 10c and Table 2) clearly shows that population density and impervious surface fraction are co-regionalized variables, with only 25% of the variance unexplained. Also, the regression analyses show that the model with impervious surface fraction consistently performs better than the one using land use classes.
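The cross-variogram referred to above is usually estimated empirically before a model is fitted. For lag h it is γ12(h) = 1/(2N(h)) Σ [z1(x) − z1(x+h)][z2(x) − z2(x+h)]. A minimal sketch for two co-registered rasters follows. It assumes lags measured in whole pixels along the east-west axis only (a real analysis would typically pool several directions or use an omnidirectional estimator) and NaN for missing pixels. The function name is hypothetical.

```python
import numpy as np


def cross_variogram_x(z1, z2, max_lag):
    """Empirical cross-variogram of two co-registered grids along the column axis.

    gamma_12(h) = 1 / (2 N(h)) * sum [z1(x) - z1(x+h)] * [z2(x) - z2(x+h)]
    Pixel pairs that are NaN in either grid are skipped.
    Returns an array whose entry h-1 holds the value for lag h = 1..max_lag.
    """
    z1 = np.asarray(z1, dtype=float)
    z2 = np.asarray(z2, dtype=float)
    gamma = np.full(max_lag, np.nan)
    for h in range(1, max_lag + 1):
        d1 = z1[:, :-h] - z1[:, h:]   # increments of variable 1 at lag h
        d2 = z2[:, :-h] - z2[:, h:]   # increments of variable 2 at lag h
        prod = d1 * d2
        valid = ~np.isnan(prod)
        n_pairs = valid.sum()
        if n_pairs > 0:
            gamma[h - 1] = prod[valid].sum() / (2.0 * n_pairs)
    return gamma
```

Passing the same grid twice reduces this to the ordinary (auto) semivariogram, which is how the two direct variograms are obtained alongside the cross-variogram.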

A final point is that the resulting population estimates are essential for urban planning applications. For example, in sustainability studies, residential population density is a primary indicator of automobile-dependent regions (Harris & Longley, 2000). In addition, the population density estimates may be used in transportation analyses. The traffic analysis zone (TAZ) is typically used as the basic unit in traffic demand estimation and trip generation. However, there are significant problems with traditional TAZ definitions, as well as difficulties with the associated travel distance calculations (Daganzo, 1980; Miller, 1999). Detailed population information may help in redefining TAZs to achieve more homogeneous population densities and socio-economic characteristics, thus potentially reducing the effects of the modifiable areal unit problem in a range of transportation analysis approaches.
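Once a pixel-level surface exists, re-aggregating it to any other zoning, such as redefined TAZs, amounts to summing pixels by zone label. A minimal sketch, assuming a hypothetical zone raster co-registered with the population grid, with -1 marking pixels outside every zone. The function name is hypothetical.

```python
import numpy as np


def aggregate_to_zones(pixel_population, zone_raster, n_zones):
    """Sum per-pixel population into arbitrary zones (e.g. hypothetical TAZs).

    pixel_population, zone_raster: co-registered 2-D grids. zone_raster holds
    integer zone ids 0..n_zones-1, or -1 for pixels outside every zone.
    Returns an array of length n_zones with the population of each zone.
    """
    pop = np.asarray(pixel_population, dtype=float).ravel()
    zones = np.asarray(zone_raster).ravel()
    inside = zones >= 0          # ignore pixels that belong to no zone
    return np.bincount(zones[inside], weights=pop[inside], minlength=n_zones)
```

Because the surface is defined per pixel, the same function serves for census blocks, school districts or candidate TAZ designs without re-running the interpolation.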

While the developed approach is a considerable improvement for estimating population density at a fine scale, there are potential improvements worth exploring. One would be to satisfy the volume-preserving constraint during the interpolation process, requiring that the interpolated population count in every census zone equal the observed count. In this study, we satisfied this constraint by rescaling the population density of every pixel after interpolation. Although the population count in every census zone is maintained, this adjustment may introduce bias and increase estimation variance. More sophisticated models might increase the accuracy of population density estimates while simultaneously maintaining the population count in every census zone.

References

Aboufirassi, M., & Marino, M. A. (1984). Cokriging of aquifer transmissivities from field measurements of transmissivity and specific capacity. Mathematical Geology, 16(1), 19–35.

Atkinson, P. M., Webster, R., & Curran, P. J. (1992). Cokriging with ground-based radiometry. Remote Sensing of Environment, 41, 45–60.

Atkinson, P. M., Webster, R., & Curran, P. J. (1994). Cokriging with airborne MSS imagery. Remote Sensing of Environment, 50, 335–345.

Bailey, T., & Gatrell, A. C. (1995). Chapter 7: The analysis of area data. Interactive spatial data analysis. Longman Group Limited.

Beguin, H., Thomas, I., & Vandenbussche, D. (1992). Weight variation with a set of demand points, and location–allocation issues: A case study of public libraries. Environment and Planning A, 24, 1769–1779.

Bracken, I. (1993). An extensive surface model database for population related information: Concept and application. Environment and Planning B, 20, 13–27.

Chen, K. (2002). An approach to linking remotely sensed data and areal census data. International Journal of Remote Sensing, 23, 37–48.

Cressie, N. (1985). Fitting variogram models by weighted least squares. Mathematical Geology, 17, 563–586.

Cressie, N. (1993). Statistics for spatial data (revised edition). New York: Wiley.

Curran, P. J. (1988). The semivariogram in remote sensing: An introduction. Remote Sensing of Environment, 24, 493–507.

Curran, P. J. (2001). Remote sensing: Using the spatial domain. Environmental and Ecological Statistics, 8, 331–344.

Curran, P. J., & Atkinson, P. M. (1998). Geostatistics and remote sensing. Progress in Physical Geography, 22(1), 61–78.

Daganzo, C. F. (1980). Network representation, continuum approximations and a solution to the spatial aggregation problem of traffic assignment. Transportation Research, 14B, 229–239.

ERDAS Imagine (1997). ERDAS Imagine tour guides (4th ed.). Atlanta, Georgia: ERDAS, Inc.

Fisher, P. F., & Langford, M. (1995). Modeling the errors in areal interpolation between zonal systems by Monte Carlo simulation. Environment and Planning A, 27, 211–224.

Fotheringham, A. S., & Wong, D. W. S. (1991). The modifiable areal unit problem in multivariate statistical analysis. Environment and Planning A, 23, 1025–1034.

Franklin County Auditor (2002). Franklin County Auditor's interactive geographic information system. <http://209.51.193.83/search.html>.

Goodchild, M. F., Anselin, L., & Deichmann, U. (1993). A framework for the areal interpolation of socioeconomic data. Environment and Planning A, 25, 383–397.

Green, A. A., Berman, M., Switzer, P., & Craig, M. D. (1988). A transformation for ordering multispectral data in terms of image quality with implications for noise removal. IEEE Transactions on Geoscience and Remote Sensing, 26, 65–74.

Griffith, D. A. (1993). Spatial regression analysis on the PC: Spatial statistics using SAS. Washington, DC: Association of American Geographers.

Griffith, D. A., & Can, A. (1996). Spatial statistical/econometric version of simple urban population density models. In S. L. Arlinghaus & D. A. Griffith (Eds.), Practical handbook of spatial statistics. CRC Press.

Harmon, D. (2002). Quick Take Reviews: Idrisi32 Release 2. GEOWorld, March, pp. 50–51.

Harris, R. J., & Longley, P. A. (2000). New data and approaches for urban analysis: Modeling residential densities. Transactions in GIS, 4(3), 217–234.

Harvey, J. T. (2002). Estimating census district populations from satellite imagery: Some approaches and limitations. International Journal of Remote Sensing, 23(10), 2071–2095.

Jensen, J. R. (1983). Biophysical remote sensing. Annals of the Association of American Geographers, 73, 111–132.

Ji, M., & Jensen, J. R. (1999). Effectiveness of subpixel analysis in detecting and quantifying urban imperviousness from Landsat Thematic Mapper imagery. Geocarto International, 14(4), 31–39.

Journel, A. G., & Huijbregts, C. J. (1978). Mining geostatistics. New York: Academic Press.

Lam, N. S. (1983). Spatial interpolation methods: A review. American Cartographer, 10(2), 129–149.

Langford, M., Maguire, D. J., & Unwin, D. J. (1991). The areal interpolation problem: Estimating population using remote sensing in a GIS framework. In I. Masser & M. Blakemore (Eds.), Handling geographical information: Methodology and potential applications (pp. 55–77). Harlow, Essex: Longman.

Langford, M., & Unwin, D. J. (1994). Generating and mapping population density surfaces within a geographical information system. Cartographic Journal, 31, 21–26.

Lee, J. B., Woodyatt, A. S., & Berman, M. (1990). Enhancement of high spectral resolution remote sensing data by a noise-adjusted principal components transformation. IEEE Transactions on Geoscience and Remote Sensing, 28, 295–304.

Lo, C. P. (1995). Automated population and dwelling unit estimation from high-resolution satellite images: A GIS approach. International Journal of Remote Sensing, 16(1), 17–34.

Longley, P., & Clarke, G. (1995). GIS for business and service planning. Cambridge: GeoInformation International.

Martin, D. (1989). Mapping population data from zone centroid locations. Transactions of the Institute of British Geographers, 14, 90–97.

Martin, D. (1996). An assessment of surface and zonal models of population. International Journal of Geographical Information Systems, 10(8), 973–989.

Martin, D., & Williams, H. C. W. L. (1992). Market-area analysis and accessibility to primary health-care centers. Environment and Planning A, 24, 1009–1019.

McBratney, A. B., & Webster, R. (1983). Optimal interpolation and isarithmic mapping of soil properties: 5. Co-regionalization and multiple sampling strategy. Journal of Soil Science, 34(1), 137–162.

McBratney, A. B., & Webster, R. (1986). Choosing functions for semi-variograms of soil properties and fitting them to sampling estimates. Journal of Soil Science, 37, 617–639.

Mesev, V. (1998). The use of census data in urban image classification. Photogrammetric Engineering and Remote Sensing, 64, 431–438.

Mid-Ohio Regional Planning Commission (MORPC) (2002). GIS technology. <http://www.morpcsoft.org/GIS/gis.htm>.

Miller, H. J. (1999). Potential contributions of spatial analysis to geographical information systems for transportation (GIS-T). Geographical Analysis, 31(4), 373–399.

Moon, Z. K., & Farmer, F. L. (2001). Population density surface: A new approach to an old problem. Society and Natural Resources, 14, 39–49.

Multi-Resolution Land Characteristics Consortium (MRLC) (2002). National land cover data (NLCD). <http://www.epa.gov/mrlc/nlcd.html>.

Myers, D. E. (1982). Matrix formulation of co-kriging. Mathematical Geology, 14(3), 250–257.

Ohio Geographically Referenced Information Program (OGRIP) (1999). Digital orthophoto quarter-quadrangles. <ftp.geodata.gis.state.oh.us/geodata/doqq>.

Okabe, A., & Sadahiro, Y. (1997). Variation in count data transferred from a set of irregular zones to a set of regular zones through the point-in-polygon method. International Journal of Geographical Information Science, 11(1), 93–106.

Oliver, M., Webster, R., & Gerrard, J. (1989a). Geostatistics in physical geography, Part I: Theory. Transactions of the Institute of British Geographers, 14, 259–269.

Oliver, M., Webster, R., & Gerrard, J. (1989b). Geostatistics in physical geography, Part II: Applications. Transactions of the Institute of British Geographers, 14, 270–286.

Openshaw, S. (1977). Optimal zoning systems for spatial interaction models. Environment and Planning A, 9, 169–184.

Pebesma, E. J., & Wesseling, C. G. (1998). Gstat: A program for geostatistical modeling, prediction and simulation. Computers and Geosciences, 24(1), 17–31.

Phinn, S., Stanford, M., Scarth, P., Murray, A. T., & Shyy, T. (2002). Monitoring the composition and form of urban environments based on the vegetation–impervious surface–soil (VIS) model by sub-pixel analysis techniques. International Journal of Remote Sensing, 23, 4131–4153.

Plane, D. A., & Rogerson, P. A. (1994). The geographical analysis of population with applications to business and planning. New York: Wiley.

Rashed, T., Weeks, J. R., Gadalla, M. S., & Hill, A. G. (2001). Revealing the anatomy of cities through spectral mixture analysis of multispectral satellite imagery: A case study of the Greater Cairo region, Egypt. Geocarto International, 16(4), 5–15.

Ridd, M. K. (1995). Exploring a V–I–S (vegetation–impervious surface–soil) model for urban ecosystem analysis through remote sensing: Comparative anatomy for cities. International Journal of Remote Sensing, 16, 2165–2185.

Sadahiro, Y. (1999). Accuracy of areal interpolation: A comparison of alternative methods. Journal of Geographical Systems, 1, 323–346.

Tobler, W. (1999). Linear pycnophylactic reallocation: Comment on a paper by D. Martin. International Journal of Geographical Information Science, 13(1), 85–90.

United States Census Bureau (2002). United States Census 2000. <http://www.census.gov/main/www/cen2000.html>.

Vauclin, M., Vieira, S. R., Vachaud, G., & Nielsen, D. R. (1983). The use of cokriging with limited field soil observations. Soil Science Society of America Journal, 47(2), 175–184.

Webster, R. (1985). Quantitative spatial analysis of soil in the field. Advances in Soil Science, 3, 1–70.

Webster, R., & Burgess, T. M. (1980). Optimal interpolation and isarithmic mapping of soil properties: III. Changing drift and universal kriging. Journal of Soil Science, 31, 505–524.

Woodcock, C. E., Strahler, A. H., & Jupp, D. L. B. (1988). The use of variograms in remote sensing: I. Scene models and simulated images. Remote Sensing of Environment, 25, 323–348.

Wu, C., & Murray, A. T. (2003). Estimating impervious surface distribution by spectral mixture analysis. Remote Sensing of Environment, 84, 493–505.
