Road tracking in aerial images based on human–computer interaction and Bayesian filtering


ISPRS Journal of Photogrammetry & Remote Sensing 61 (2006) 108–124

Jun Zhou a,⁎, Walter F. Bischof a, Terry Caelli b,c

a Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada T6G 2E8
b Canberra Laboratory, National ICT Australia, Locked Bag 8001, Canberra ACT 2601, Australia

c Research School of Information Science and Engineering, Australian National University, Building 115, Canberra ACT 0200, Australia

Received 31 January 2006; received in revised form 11 September 2006; accepted 11 September 2006

Abstract

A typical way to update map road layers is to compare recent aerial images with existing map data, detect new roads and add them as cartographic entities to the road layer. This method cannot be fully automated because computer vision algorithms are still not sufficiently robust and reliable. More importantly, maps require final checking by a human due to the legal implications of errors. In this paper we introduce a road tracking system based on human–computer interactions (HCI) and Bayesian filtering. Bayesian filters, specifically, extended Kalman filters and particle filters, are used in conjunction with human inputs to estimate road axis points and update the tracking algorithms. Experimental results show that this approach is efficient and reliable and that it produces substantial savings over the traditional manual map revision approach. The main contribution of the paper is to propose a general and practical system that optimizes the performance of road tracking when both human and computer resources are involved.
© 2006 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS). Published by Elsevier B.V. All rights reserved.

Keywords: Road tracking; Aerial images; Human–computer interaction; Bayesian filtering; Particle filter; Extended Kalman filter

1. Introduction

Map revision is traditionally a manual task, especially when maps are updated on the basis of aerial images and existing map data. This is a time-consuming and expensive task. For this reason, maps are typically out of date. For example, it has been reported that, for a number of reasons, the revision lag-time for topographic maps from the United States Geological Survey (USGS) is more than 23 years (Groat, 2003).


Road detection and tracking based on computer vision has been one approach to speeding up this process. It requires knowledge about the road database as well as image-related knowledge (Crevier and Lepage, 1997) including the road context, previous processing results, rules, and constraints (Baltsavias, 1997). Many road tracking methods make assumptions about road characteristics (Wang and Newkirk, 1988; Vosselman and Knecht, 1995; Mayer and Steger, 1998; Katartzis et al., 2001; Bentabet et al., 2003; Zlotnick and Carnine, 1993; Mckeown et al., 1998; Klang, 1998; Tupin et al., 2002), including:

• roads are elongated,
• road surfaces are usually homogeneous,



• there is adequate contrast between road and adjacent areas.

One problem with these systems is that such assumptions are pre-defined and fixed, whereas image road features vary considerably. For example:

• roads may not be elongated at crossings, bridges, and ramps,

• road surfaces may be built from various materials that cause radiometric changes,

• ground objects such as trees, houses, vehicles and shadows may occlude the road surface and may strongly influence the road appearance,

• road surfaces may not have adequate contrast with adjacent areas because of road texture, lighting conditions, and weather conditions,

• the resolution of aerial images can have a significant impact on computer vision algorithms.

Such properties cannot be completely predicted, and they constitute the main source of problems with fully automated systems.

One solution to this problem is to adopt a semi-automatic approach that keeps the “human in the loop”, where computer vision algorithms are used to assist humans performing these tasks (Myers et al., 2000; Pavlovic et al., 1997). In this approach, dynamic knowledge can be transferred to the computer when necessary and can be used to guide it.

Several semi-automatic road tracking systems have been proposed in the past. McKeown and Denlinger (1988) introduced a semi-automatic road tracker based on road profile correlation and road edge following. The tracker was initialized by the user to obtain starting values for position, direction and width of the road. Road axis points were then predicted by a road trajectory and correlation model. Vosselman and Knecht (1995) proposed a road tracker based on a single-observation Kalman filter. Human input was used to initialize the state of the Kalman filter and to extract a template road profile. The Kalman filter then recursively updated its state to predict the road axis points using feedback from matching the template profiles to the observed profiles. Baumgartner et al. (2002) developed a prototype system based on the above method. An interaction interface was designed to coordinate human actions with computer predictions. More recent semi-automatic approaches include the least-squares template matching methods (Gruen and Li, 1997) for road centerline extraction by Hu et al. (2004) and Kim et al. (2004), both requiring seed-point input from humans to generate a 2D template.

Another semi-automatic system was introduced by Xiong and Sperling (2004), who presented a tool that enables the user to visually check and correct mistakes in the clustering results used for road network matching.

These semi-automatic systems only allow humans to initiate the tracking process and/or to perform final editing. This makes the road tracking process difficult to control and leaves the combination of human and computer resources suboptimal. Typically, the tracking process is only guided by the most recent human input.

In this paper, we present an approach that uses a semi-automatic road tracking system based on human–computer interaction and Bayesian filtering (Arulampalam et al., 2002). Two models of Bayesian filters, extended Kalman filters and particle filters, are used to estimate the current state of the system based on past and current observations. When the Bayesian filters fail, a human operator observes the reason for the failure and initializes another filter. Observation profiles are generated from 2D features of the road texture, making the tracker more robust. Optimal profile matches are determined from the current state of the Bayesian filters and the multiple observations. The human operator interacts with the road tracker not only at the beginning but throughout the tracking process. User input not only sets the initial state of the Bayesian filters but also reflects knowledge of road profiles. Consequently, the road tracker is more flexible in dealing with different kinds of road situations, including obstructions by vehicles, bridges, road surface changes and more.

The main contribution of the paper is not to develop a new automatic road tracking method, but to propose a general and robust system that effectively combines existing technology with task demands and human performance (Harvey et al., 2004). It is a practical solution for applications in remote sensing and image exploitation, where many automatic algorithms have been developed but most remain unusable in practice (Glatz, 1997).

2. System overview

2.1. Application background

One of the main paper products of the United States Geological Survey (USGS) is the 7.5-minute quadrangle topographic map series (Groat, 2003), which consists of about 55,000 map sheets. This map series is revised through the Raster Graph Revision (RGR) program, a tedious and time-consuming manual process. The RGR program uses existing map sheets as primary input and creates new maps as primary output. The existing maps are displayed on a computer screen together with the latest digital orthophoto quads (DOQs) of the area to be mapped (see Fig. 1). The cartographer then makes a visual comparison of the map and the DOQs. When an inconsistency is found between a feature on the map and the DOQ, the cartographer modifies the map to match the DOQ.

Fig. 1. Map revision environment. Previous map layers are aligned with the latest digital image data.

The DOQs are orthogonally rectified images produced from aerial photos taken at a height of 20,000 ft, with an approximate scale of 1:40,000 and a ground resolution of 1 m. An example of a road scene in a DOQ is shown in Fig. 2.

Fig. 2. An image sample of size 663 by 423 pixels extracted from a DOQ.

2.2. Prototype road tracking system

The road tracking system consists of five components (see Fig. 3):

(1) Human. The human is the center of control and the source of knowledge.

(2) Computer. The computer performs perceptual tasks to replace the human where performance is known to be reliable. It uses vision algorithms for image preprocessing, feature extraction, and tracking.


Fig. 3. Block diagram of the road tracking system. The system is composed of five components: (1) the human; (2) computer vision algorithms to process images and track roads; (3) an interface to track and parse human actions; (4) a database to store knowledge; and (5) evaluation algorithms for feedback purposes, so that the computer can quantitatively evaluate human input and its own performance, and the human can evaluate the performance of the computer.


(3) Human–computer interface. The choice of the interface has a significant effect on what can be attained. The computer records and parses the user's actions in the RGR system through the interface.

(4) Knowledge transfer scheme. The computer selects and optimizes vision algorithm parameters based on their ability to predict human behavior throughout the tracking process.

(5) Performance evaluation. Performance evaluation is a three-way process in HCI. First, evaluation of human inputs enables the computer to eliminate input noise and decide whether to accept or reject the input. Second, the computer evaluates its own performance and allows the human to gain control over the whole process. Third, the user evaluates the performance of the computer and decides what to do next.

The road tracking process starts with an initial human input of a road segment, which indicates the road centerline. From this input, the computer learns relevant road information, such as starting location, direction, width, reference profile, and step size. This information is then used to set the initial state model and related parameters of the automatic road tracker. The computer also preprocesses the image to facilitate extraction of road features. The extracted features are compared with knowledge learned from the human operator through profile matching. The computer tracks the road axis points using a Bayesian filter. During tracking, the computer continuously updates road knowledge while, at the same time, evaluating the tracking results. When it detects a possible tracking problem or a tracking failure, it returns control back to the human. The human observes the road changes, diagnoses the reason for the failure and indicates the correct tracking direction by inputting a new road segment. The new input enables prompt and reliable correction of the state model of the tracker. The new segment can either be input at the current location to resume tracking or at the location where the tracking error happened. In this way errors can be identified and corrected in real time without interrupting the tracking process.

This system architecture enables two knowledge accumulation processes. First, reference profiles extracted from human inputs are stored, and the road tracker gradually accumulates knowledge on these reference profiles. These profiles represent the different road situations that the tracker has seen. This knowledge passing process makes the tracker increasingly robust. Second, the computer also accumulates knowledge by itself: during tracking, it continues to update the matched reference profiles with the latest tracking results. This enables the tracker to adapt to gradual road changes so that human inputs can be reduced.

The tracking performance is continuously evaluated. When there is a lack of confidence over several consecutive positions, the tracker returns control to the human and waits for the next input. The evaluation of new input is based on cross-correlation, i.e. new profiles are defined in terms of their lack of correlation with past ones. In this way, knowledge redundancy is avoided, and the knowledge base does not expand too quickly, thus keeping tracking performance high.


This intelligent tutor/decision maker and apprentice/assistant architecture provides a useful communication path between the human operator and the computer. The computer can learn quickly from humans, and it works more and more independently as tracking goes on.

2.3. Human–computer interface

To enable adequate human–computer interactions we need to track and understand user actions in a software environment. In the USGS map revision system, a simple drawing operation can be implemented either by clicking a tool icon followed by clicking on maps using the mouse, or by entering a key-in command. Each tool in the tool bar corresponds to one cartographic symbol and may encompass a sequence of key-in commands in its execution. Each key-in is considered an event. Events from both inside and outside the system are processed by an input handler and are sent to an input queue. Then a task ID is assigned to each event.

We implemented embedded software to keep track of the states of the event queue and extract detailed information about each event (see Table 1).

We capture and record the time-stamped system-level event sequence, which contains both inter-action and intra-action information. To group the events into meaningful user actions, we analyze and parse the events using natural language processing methods. These include a semantic lexicon for storing action information and a parser for analyzing syntactic and semantic action information. The event sequence is fed into the parser and is segmented into complete actions. A complete description of the human–computer interface is reported in Zhou et al. (2004).

Besides providing analysis of human performance, the interface provides the operator with tools for evaluating the performance of the road tracker. As introduced in later sections, confidence values are generated from the proposed Bayesian filters that allow the performance of the model to be compared directly to the user performance through the real-time tracker interface.

Table 1
Data structure for a system-level event

Event ID        ID of the event
Event name      The key-in command
Event type      Is it a key-in, coordinate, or reset?
Event time      The time when the event is captured
Event source    Where the event comes from
x coordinate    x coordinate of the mouse click
y coordinate    y coordinate of the mouse click
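The fields of Table 1 map naturally onto a small record type. The sketch below (Python, with illustrative field names; the original system was not necessarily implemented this way) shows one possible in-memory representation of a captured event:

from dataclasses import dataclass

@dataclass
class SystemEvent:
    # One captured system-level event, mirroring the fields of Table 1.
    event_id: int        # ID of the event
    event_name: str      # the key-in command
    event_type: str      # key-in, coordinate, or reset
    event_time: float    # time when the event was captured
    event_source: str    # where the event came from
    x: float             # x coordinate of the mouse click
    y: float             # y coordinate of the mouse click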

3. Preprocessing

The preprocessing module consists of three components: image smoothing, road width estimation, and extraction of an initial reference profile. In the smoothing step, the input image is convolved with a 5×5 Gaussian filter

G = \exp\left( -\frac{x^2 + y^2}{2\sigma^2} \right)    (1)

where \sigma = \sqrt{2} pixels. This filter was used to set the analysis scale and to reduce high-frequency noise.
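A minimal sketch of this smoothing step, assuming a NumPy/SciPy environment and a greylevel image array (kernel size and σ taken from Eq. (1)):

import numpy as np
from scipy.ndimage import convolve

def gaussian_kernel_5x5(sigma=np.sqrt(2)):
    # Sample Eq. (1) on a 5x5 grid of offsets and normalize to unit sum.
    ax = np.arange(-2, 3)
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return g / g.sum()

def smooth(image):
    # Convolve the greylevel image with the 5x5 Gaussian filter.
    return convolve(image.astype(float), gaussian_kernel_5x5(), mode="nearest")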

3.1. Road width estimation

3.1.1. Method

Road width determines whether road profiles can be correctly extracted or not. In previous semi-automatic road trackers, the road width was typically entered by the human operator at the beginning of the tracking (McKeown and Denlinger, 1988; Baumgartner et al., 2002), or was estimated automatically in the profile matching step (Vosselman and Knecht, 1995). In our system, the road width is estimated automatically at the beginning of the tracking. A road segment is entered by the human operator with two consecutive mouse clicks, with the axis joining the two points defining the road centerline. We assume that the roadsides are straight and parallel lines on both sides of the road axis. Road width can then be estimated by calculating the distance between the roadsides. Further, knowledge about road characteristics also helps in determining road edges because road width varies as a function of road class.

To detect the road edges, a method based on gradient profiles has been developed. This edge detector first estimates the true upper and lower bounds of the road width, with the USGS road width definitions serving as a reference (USGS, 1996). At each axis point a profile is extracted perpendicular to the axis. The length of the profile is bounded by the road width limits defined by the USGS. The gradient of the profile along the profile direction is calculated, and one point is selected on each side of the axis point where the largest local maximum of the gradient is found. If several equally large local maxima are found, the first two local maxima are used. The distance between the two points is considered as the road width at this axis point. For a road axis segment, we obtain a probability density function p(x)

p(x_i) = \text{number of times } x_i \text{ appears}, \quad 1 \le i \le n    (2)

Fig. 4. Road edge detection results. (a) Cropped image from a DOQ with human input (white blocks). (b) Result of the Canny edge detector: note the presence of multiple road edges. (c) Result of the gradient-profile-based detector: only one pair of road edges is detected.


where x_i is a road width value extracted as described above, and n depends on the road width limits from the USGS and the complexity of the road conditions. Because the image resolution is 1 m, x_i corresponds to a road width of approximately x_i meters. Searching for an x* where

x^* = \arg\max_{x_i} p(x_i), \quad 1 \le i \le n    (3)

yields a dominant road width that appears most of the time. New road bounds are then calculated using

lb = x^* - \epsilon \quad \text{and} \quad ub = x^* + \epsilon    (4)

where lb is the new lower bound, ub is the new upper bound, and \epsilon = 4 is an empirical value that proved to be suitable for our application. Using the new bounds, the edge detector determines the new road width at each axis point and computes the average as the final road width for profile matching.
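The width estimation can be summarized by the sketch below. It assumes that each perpendicular profile is already available as a 1-D NumPy array centered on its axis point, and it approximates the bounded re-detection step by filtering the initial width estimates with the refined bounds of Eq. (4):

import numpy as np
from collections import Counter

def width_at_point(profile):
    # Strongest greylevel gradient on each side of the central axis point.
    grad = np.abs(np.gradient(profile.astype(float)))
    c = len(profile) // 2
    left = c - 1 - np.argmax(grad[:c][::-1])
    right = c + 1 + np.argmax(grad[c + 1:])
    return right - left

def estimate_road_width(profiles, eps=4):
    # Dominant width x* (Eqs. (2)-(3)), then the mean of the widths that fall
    # inside the refined bounds [x* - eps, x* + eps] (Eq. (4)).
    widths = [width_at_point(p) for p in profiles]
    x_star, _ = Counter(widths).most_common(1)[0]
    kept = [w for w in widths if x_star - eps <= w <= x_star + eps]
    return float(np.mean(kept))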

3.1.2. Discussion

Fig. 4 shows road edges detected by the Canny edge detector (Canny, 1986) and by our own gradient-based detector. Since the Gaussian filter has already been applied to the image, the implementation of the Canny edge detector starts from calculating the x- and y-gradients. The magnitude and direction of the gradient are then calculated at each pixel. To perform the non-maximum suppression, we used 0.1 and 0.3 as the low and high thresholds to determine the strong edges. Because the Canny edge detector does not take advantage of the known road direction and the road width limits, multiple edges may be detected, which makes it difficult to find the true road edges. Thus, our gradient-profile-based edge detector performs better than the Canny operator, at least in this specific application.

The mean value of the estimated road width was 10.8 pixels, with a standard deviation of 4.3 pixels. In 93.8% of the cases, the estimated road width varied between 6 and 18 pixels, depending on the real road widths, the road conditions, and the locations of human inputs. The estimation of road width can be affected by several factors. First, pavement markings in multi-lane roads, rather than the true road edges, may generate the maximum gradient value. In our application, the aerial images had a resolution of one meter per pixel; thus, pavement markings were either not wide enough to be displayed or appeared less salient after the smoothing step. Second, off-road areas and the road can be made of the same material, or have the same radiometric properties. For example, both the road and the sidewalk could be concrete. In this case, the maximum gradient is not found at the road edge. Our gradient-based method then either takes the first gradient point on both sides of the road axis as the road edges, or takes the edge of the sidewalk as the road edge. In both situations, the road width is bounded by the limits defined by the USGS, so that it does not deviate far from the ground truth.

The reason for estimating the road width automatically was to allow the operator to focus on the road axis points and road directions, consistent with the operation of plotting roads in real-world map revision systems. However, it should be pointed out that tools with manual input are more accurate, though more time consuming, for estimating road widths, as used, for example, in the ROADMAP system developed by Harvey et al. (2004).

3.2. Profile extraction

An initial reference profile is extracted as a vector of greylevels from the road segment entered by the human operator. Later, new profiles are extracted from new human inputs and placed into a profile list for further use.

To improve the robustness of the system, we use two-dimensional road features, i.e. in addition to searching along a line perpendicular to the road direction, we also search along a line in the road direction. Profiles are extracted in both directions and combined. The parallel profile is useful since greylevel values vary little along the road direction, whereas this is not the case in off-road areas. Thus the risk of off-road tracking is reduced and, in turn, tracking errors are reduced. As will be described later in Section 4.2, the observation profiles are also 2D features.

From each human input, we obtain a profile sequence that contains the road surface texture information, which may include occluding objects. For a sequence of road profiles P = [p_1, p_2, …, p_n], profile extraction proceeds as follows. First, an average profile is calculated. Then each profile in the sequence is cross-correlated with the average profile. Whenever the correlation coefficient is below a threshold (set to 0.8), the profile is removed from the sequence. In this way, all axis points are evaluated, and road profiles extracted from noisy axis points, for example where cars and trucks are present, are removed. The algorithm iterates through all the profiles until a new profile sequence is generated, and the average profile of the new sequence is taken as the final road segment profile.
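A sketch of this noise removal loop, assuming the profiles are equal-length NumPy vectors of greylevels:

import numpy as np

def ncc(a, b):
    # Normalized cross-correlation coefficient of two profiles.
    a = a - a.mean()
    b = b - b.mean()
    d = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / d) if d > 0 else 0.0

def reference_profile(profiles, threshold=0.8):
    # Iteratively drop profiles that correlate poorly with the current average
    # (e.g. axis points covered by vehicles) and return the average of the rest.
    kept = list(profiles)
    while True:
        avg = np.mean(kept, axis=0)
        survivors = [p for p in kept if ncc(p, avg) >= threshold]
        if not survivors or len(survivors) == len(kept):
            return avg
        kept = survivors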

The effectiveness of this noise removal method is affected by road conditions. When occlusions are sparse, the method is quite effective. However, in the case of more populated roads, e.g. roads with a traffic jam or roads under the shadow of trees, noisy reference profiles may be generated. In these cases, the performance of the system drops.

4. Road tracking

If we consider the road tracking process as a time series, it can be modelled by a state-space approach involving state evolution and noisy measurements. The state evolution of the tracking process can be defined as

x_k = f_k(x_{k-1}, v_{k-1}), \quad k \in \mathbb{N}    (5)

where x_k is the state vector at time k, v_k is the process noise, and f_k is a function of x_{k-1} and v_{k-1}.

Given an observation sequence z_{1:k}, the tracker recursively estimates x_k using the prior probability density function p(x_k|x_{k-1}) and the posterior probability density function p(x_k|z_{1:k}). The relationship between observations and states is defined by

z_k = h_k(x_k, n_k), \quad k \in \mathbb{N}    (6)

where n_k is the measurement noise.

Depending on the properties of the state evolution, the observations, and the posterior density, the tracking problem can be solved with different approaches, such as Kalman filters, hidden Markov models, extended Kalman filters and particle filters (Kalman, 1960; Rabiner, 1989; Welch and Bishop, 1995; Arulampalam et al., 2002). In the following subsections, we introduce two solutions to the tracking problem, extended Kalman filtering and particle filtering.

4.1. State model

Road axis points are tracked using recursive estimation following Vosselman and Knecht (1995), who proposed the following state model:

x = \begin{bmatrix} x \\ y \\ \theta \\ \theta' \end{bmatrix}    (7)

where x and y are the coordinates of road axis points, \theta is the direction of the road, and \theta' is the change in road direction. The state model is updated by the following non-linear function

x_k = \begin{bmatrix} x_{k-1} + \tau \cos\!\left(\theta_{k-1} + \frac{\tau \theta'_{k-1}}{2}\right) \\ y_{k-1} + \tau \sin\!\left(\theta_{k-1} + \frac{\tau \theta'_{k-1}}{2}\right) \\ \theta_{k-1} + \tau \theta'_{k-1} \\ \theta'_{k-1} \end{bmatrix}    (8)

Differences between this simplified process and the true road shape are interpreted as process noise v_k, whose covariance matrix is Q_k.

In Eq. (8), \tau is the interval between times k-1 and k, determining the distance the road tracker traverses in each step. Initially it is set to the length of the road width, and it is affected by three parameters. The first parameter is a “jump-over” factor corresponding to the internal evaluation of the road tracker (see Section 4.5). The second parameter corresponds to the prediction scale (see Section 4.6). The third parameter corresponds to the curvature of the road. When the road curvature is high, a smaller \tau is used to avoid off-road tracking.
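The deterministic part of the state update of Eq. (8) is compact; the following sketch (Python, NumPy) applies one step for a given τ:

import numpy as np

def update_state(state, tau):
    # state = [x, y, theta, theta']; one noiseless step of Eq. (8).
    x, y, theta, dtheta = state
    half = theta + tau * dtheta / 2.0
    return np.array([x + tau * np.cos(half),
                     y + tau * np.sin(half),
                     theta + tau * dtheta,
                     dtheta])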

4.2. Observation model

Observations are obtained by matching the reference profiles to the observed profiles, the latter being extracted in 2D at the position estimated by the state models. To minimize disturbances due to background objects on the road and due to road surface changes, a heuristic multiple-observations method is used to search the neighborhood of the estimated points for better matches. Euclidean distances between the matching and observed profiles are calculated, and the position with the minimum distance is selected as the optimal observation in an iteration. The observations z_k are thus calculated as

z_k = \begin{bmatrix} x_k - s_k \sin(\theta_k + \alpha_k) \\ y_k + s_k \cos(\theta_k + \alpha_k) \end{bmatrix},    (9)

where s_k is a shift from the estimated road axis point and \alpha_k is a small change to the estimated road direction.
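A sketch of the multiple-observations search, assuming a hypothetical helper extract_profile(x, y, theta) that returns the 2D profile at a candidate position; the candidate shift and angle ranges are illustrative:

import numpy as np

def best_observation(state, reference, extract_profile,
                     shifts=range(-3, 4), angles=np.radians([-10.0, 0.0, 10.0])):
    # Evaluate Eq. (9) over a small neighbourhood and keep the candidate whose
    # profile has the smallest Euclidean distance to the reference profile.
    x, y, theta, _ = state
    best_z, best_d = None, np.inf
    for s in shifts:
        for a in angles:
            zx = x - s * np.sin(theta + a)
            zy = y + s * np.cos(theta + a)
            d = np.linalg.norm(extract_profile(zx, zy, theta + a) - reference)
            if d < best_d:
                best_z, best_d = np.array([zx, zy]), d
    return best_z, best_d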

4.3. Extended Kalman filtering

We have defined a tracking system based on non-linear state and observation models. Extended Kalman filtering has been widely used to solve such non-linear time series (Brown and Hwang, 1992; Welch and Bishop, 1995), where the posterior density is assumed to be approximately Gaussian.

The tracking task is performed by estimating the optimal state \hat{x} at each iteration. First, we compute \Phi, which contains the coefficients of the linearized time update equations

\Phi_k = \left. \frac{d f_k(x)}{dx} \right|_{x = \hat{x}_{k-1}}.    (10)

The covariance matrix of the predicted state vector becomes

P_{k|k-1} = \Phi_k P_{k-1|k-1} \Phi_k^T + Q_{k-1}.    (11)

After the state update, the extended Kalman filter continues the iteration by solving the following measurement update equations:

K_k = P_{k|k-1} A^T (A P_{k|k-1} A^T + R_k)^{-1}    (12)

\hat{x}_k = \Phi_k \hat{x}_{k-1} + K_k (z_k - A \Phi_k \hat{x}_{k-1})    (13)

P_{k|k} = (I - K_k A) P_{k|k-1}    (14)

In Eq. (12), A is the measurement matrix

A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix},    (15)

and R is the covariance matrix of the measurement noise

R_k = \sigma^2 \begin{bmatrix} \sin^2(\theta_k) & \sin(\theta_k)\cos(\theta_k) \\ \sin(\theta_k)\cos(\theta_k) & \cos^2(\theta_k) \end{bmatrix},    (16)

where \sigma^2 is the variance of the shift s in the observation model.

The initial state of the extended Kalman filter is set to \hat{x}_0 = [x_0 \; y_0 \; \theta_0 \; 0]^T, where x_0 and y_0 are the coordinates of the end point of the road segment input by the human operator, and \theta_0 indicates the direction of the road segment. Starting from the initial state, the extended Kalman filter tracks the road axis points iteratively until x or y falls outside the image boundaries or a stopping condition has been met.

Vosselman and Knecht (1995) suggested that the covariance matrix Q_k of the process noise in road tracking is mainly determined by the difference between the constant road curvature assumption and the actual curvature changes. They set the standard deviation of the process noise in \theta'_k to 1/400 of the radius of the road and propagated it to the standard deviations of the other state variables. We followed this rule in determining the process noise.
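One predict/update cycle following Eqs. (10)–(16) could look as follows. This is a sketch: the Jacobian Φ is derived from Eq. (8), Eq. (13) is applied literally, and Q and σ² are assumed to be supplied by the caller:

import numpy as np

def ekf_step(x_hat, P, z, tau, Q, sigma2):
    # x_hat = [x, y, theta, theta'], P its covariance, z the accepted 2D observation.
    x, y, theta, dtheta = x_hat
    half = theta + tau * dtheta / 2.0
    # Eq. (10): Jacobian of the state transition of Eq. (8) at the previous estimate.
    Phi = np.array([[1.0, 0.0, -tau * np.sin(half), -0.5 * tau**2 * np.sin(half)],
                    [0.0, 1.0,  tau * np.cos(half),  0.5 * tau**2 * np.cos(half)],
                    [0.0, 0.0, 1.0, tau],
                    [0.0, 0.0, 0.0, 1.0]])
    A = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0]])                                  # Eq. (15)
    R = sigma2 * np.array([[np.sin(theta)**2, np.sin(theta) * np.cos(theta)],
                           [np.sin(theta) * np.cos(theta), np.cos(theta)**2]])  # Eq. (16)
    P_pred = Phi @ P @ Phi.T + Q                                          # Eq. (11)
    K = P_pred @ A.T @ np.linalg.inv(A @ P_pred @ A.T + R)                # Eq. (12)
    x_pred = Phi @ x_hat
    x_new = x_pred + K @ (z - A @ x_pred)                                 # Eq. (13)
    P_new = (np.eye(4) - K @ A) @ P_pred                                  # Eq. (14)
    return x_new, P_new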

4.4. Particle filtering

Particle filtering, specifically the CONDENSATION algorithm proposed in Isard and Blake (1998), has been successfully used in modelling non-linear and non-Gaussian processes (Arulampalam et al., 2002; Southall and Taylor, 2001; Lee et al., 2002). The filter approximates the posterior density p(x_k|z_k) by the particle set {s_k^i, w_k^i, i = 1, …, N} in each time step k, where w_k^i is a weight used to characterize the probability of the particle s_k^i.

Given the particle set {s_{k-1}^i, w_{k-1}^i, i = 1, …, N} at time k-1, iteration k of the particle filter can be summarized as follows:

(1) Construct the cumulative density function {c_{k-1}^i} on the current particle set. Sample N particles {x_{k-1}^j, j = 1, …, N} according to the cumulative density function. The sampling of the j-th particle x_{k-1}^j is done by generating a uniform random number u^j on [0, 1] and searching for the first particle s_{k-1}^i with c_{k-1}^i \ge u^j.

(2) Update each particle by Eq. (8) to generate new particles {x_k^j, j = 1, …, N}. In the state update, the road curvature parameter \theta' is influenced by a zero-mean Gaussian random variable with unit variance.

(3) Calculate new weights for each particle based on how well they fit the observation z_k. The weights are normalized and are proportional to the likelihood p(z_k|x_k^j). In this way, a new particle set {s_k^i, w_k^i, i = 1, …, N} is constructed.

The estimated state at time k is then

E(x_k) = \sum_{i=1}^{N} w_k^i s_k^i.    (17)

In our application, we assume that the observation is normally distributed with standard deviation \sigma = \sqrt{2}, and so the likelihood of the observation is

p(z|x^j) \propto \frac{1}{\sqrt{2\pi}} \exp\!\left( -\frac{d_j^2}{2\sigma^2} \right),    (18)

where d_j is the Euclidean distance between the position of particle x^j and the observation. The number of particles is set to 20 times the road width in pixels. The initial density p(x_0) is set to a uniform distribution, which means each particle has the same initial probability. The particle filter gradually adjusts the weights of the particles during the evolution process.
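A minimal sketch of one iteration of this filter (steps 1–3 with Eqs. (17)–(18)), assuming the particles are stored as an (N, 4) NumPy array and that, as in step (2), only the curvature term θ′ is perturbed:

import numpy as np

def particle_filter_step(particles, weights, observation, tau, sigma=np.sqrt(2)):
    n = len(particles)
    # Step 1: resample via the cumulative distribution of the weights.
    cdf = np.cumsum(weights)
    idx = np.searchsorted(cdf, np.random.uniform(size=n))
    chosen = particles[np.minimum(idx, n - 1)]
    # Step 2: propagate each particle with Eq. (8), perturbing theta' with unit-variance noise.
    new = np.empty_like(chosen)
    for j, (x, y, theta, dtheta) in enumerate(chosen):
        dtheta = dtheta + np.random.normal(0.0, 1.0)
        half = theta + tau * dtheta / 2.0
        new[j] = [x + tau * np.cos(half), y + tau * np.sin(half),
                  theta + tau * dtheta, dtheta]
    # Step 3: re-weight by the likelihood of Eq. (18) and normalize.
    d = np.linalg.norm(new[:, :2] - observation, axis=1)
    w = np.exp(-d**2 / (2.0 * sigma**2))
    w /= w.sum()
    estimate = (w[:, None] * new).sum(axis=0)        # Eq. (17): weighted mean of the particles
    return new, w, estimate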

4.5. Stopping criteria

A matching profile is extracted from the observation model and cross-correlated with the reference profile. If the correlation coefficient exceeds a threshold (e.g. 0.8 in Vosselman and Knecht (1995)), the observation is accepted; if the coefficient is below the threshold and some other conditions are met (e.g. a high contrast between the profiles), the observation is rejected. In this case, the Bayesian filters make another state update based on the previous state, using a larger time interval \tau, so that the estimated position without an accepted observation is jumped over. When too many contiguous jumps occur (set to 5 jumps), the Bayesian filter recognizes this as a tracking failure and returns control back to the human operator.

The jump-over strategy is particularly useful in dealing with small occlusions on the road, for example when cars and long trucks are present. In these cases, the profile matching will not generate a high correlation coefficient at the predicted state. The jump-over strategy uses an incremented time interval to skip these road positions, so that a state without occlusions can be reached.

In real applications, however, road characteristics are more complex. Cross-correlation may not always generate a meaningful profile match, which in turn may lead to errors in the tracking process. For example, a constant road profile may generate a high coefficient when cross-correlated with a profile extracted from an off-road area with constant greylevel. Furthermore, the Bayesian filters may often fail because the predicted position may not contain an observation profile that matches the reference profile. For example, when occlusions are present on the road, the reference and observation profiles may generate a small correlation coefficient, and the observation is rejected in turn. The system then requires substantial interactions with the human operator, making the tracking process less efficient and quite annoying for the user.

4.6. Improving efficiency

In previous algorithms (Vosselman and Knecht, 1995; Baumgartner et al., 2002), each time a new reference profile was extracted the old reference profile was discarded. In our system all the reference profiles are retained, and the road tracker gradually accumulates knowledge on road conditions. In profile matching, the latest profile is given the highest priority. When matching fails, the Bayesian filters search the list of reference profiles for a match. To reflect the gradual change of the road texture, the reference profile is updated by successful matches using a weighted sum. We call this the multiple profiles method.

We developed an algorithm to search for the optimal observation–reference profile combination. The search space V = ⟨X, Y, Θ⟩ is defined by the current state x_k, where X, Y and Θ are bounded by a small neighborhood of x, y and θ respectively. The search algorithm is described below:

Algorithm: OPTPROFILE(P = {p_1, p_2, …, p_n}, V)
  for each v_i ∈ V
    extract profile p_i′ at v_i
    c(p_i′, p_1) ← cross-correlation coefficient of p_i′ and p_1
  end for
  c* = max(c(p_i′, p_1))
  if c* > 0.9
    update p_1
    return v*
  else
    for each p_i′
      for each p_j ∈ P, j ≠ 1
        c(p_i′, p_j) ← cross-correlation coefficient of p_i′ and p_j
      end for
    end for
    c* = max(c(p_i′, p_j))
    if c* > 0.9
      p* = p_j corresponding to c*
      switch p_1 and p*
      return v*
    else
      return rejection
    end if
  end if
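For readers who prefer an executable form, a rough Python rendering of OPTPROFILE is given below. The correlation helper, the extract_profile callback and the update weight alpha are assumptions; the paper only specifies a weighted-sum update and the 0.9 threshold:

import numpy as np

def _ncc(a, b):
    # Normalized cross-correlation coefficient of two (possibly 2D) profiles.
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    d = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / d) if d > 0 else 0.0

def opt_profile(P, V, extract_profile, threshold=0.9, alpha=0.2):
    # P: list of reference profiles, P[0] most recent; V: candidate states to search.
    observed = [extract_profile(v) for v in V]
    # First pass: match every candidate against the current reference P[0].
    scores = [_ncc(o, P[0]) for o in observed]
    i_best = int(np.argmax(scores))
    if scores[i_best] > threshold:
        P[0] = (1 - alpha) * P[0] + alpha * observed[i_best]   # weighted-sum update (alpha assumed)
        return V[i_best]
    # Second pass: match against the remaining reference profiles.
    best_c, best_v, best_j = -1.0, None, None
    for o, v in zip(observed, V):
        for j in range(1, len(P)):
            c = _ncc(o, P[j])
            if c > best_c:
                best_c, best_v, best_j = c, v, j
    if best_c > threshold:
        P[0], P[best_j] = P[best_j], P[0]    # promote the matching profile to the front
        return best_v
    return None                              # rejection: control goes back to the operator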

In many tasks, humans use multi-scale attention to focus on important features and to reduce the influence of distractors (LaBerge, 1995). To simulate such behavior, we adopted a step prediction scaling strategy to improve the efficiency of road tracking. A prediction scale is added to the state update model of the Bayesian filters, contributing to the calculation of the time interval \tau. The initial prediction scale is set to 1. When a successful match happens, the scale parameter is incremented by 1. Whenever matching fails, the prediction scale is reset to 1. In this way, the time interval is adjusted automatically. If the road is long, straight and homogeneous in surface, the road tracker can predict the next road axis point using a larger scale and ignore many details on the road, thus increasing the speed of the tracking process.
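A minimal sketch of this bookkeeping; the way the scale stretches τ (multiplying the road-width step) is an assumed reading of the description above:

def update_prediction_scale(scale, match_succeeded):
    # Grow the prediction scale after each successful match; reset it on failure.
    return scale + 1 if match_succeeded else 1

def step_interval(road_width, scale):
    # Assumed relation: the step starts at the road width and is stretched by the scale.
    return road_width * scale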

5. Experimental results

5.1. Data collection

Eight students were required to plot roads manually in the USGS map revision environment, which displays the old map and the latest DOQ simultaneously on the screen. Plotting was performed by selecting tools for specific road classes, followed by mouse clicks on the perceived road axis points in the image. Before performing the actual annotation, each user was given 30 min to understand the road interpretation process as well as operations such as file input, road plotting, viewing change, and error correction. They did so by working on a real map for the Lake Jackson area in Florida. When they felt confident in using the tools and in road recognition, they were assigned 28 tasks to plot roads on the map for the Marietta area in Florida. The users were told that road plotting should be as accurate as possible, i.e. the mouse clicks should be on the true road axis points. Furthermore, the road should be smooth, i.e. abrupt changes in direction should be avoided and no zigzags should occur. Although professional cartographers would be expected to perform such tasks better than the students used here, considering the simplicity of the tasks we believe the performance of the students was close to that of experts. Indeed all users became familiar with the annotation operations in less than 15 min.

Table 2
Statistics on human input

                                     User1   User2   User3   User4   User5   User6   User7   User8
Total number of inputs                510     415     419     849     419     583     492     484
Total time (in seconds)              2765    2784    1050    2481    1558    1966    1576    1552
Average number of inputs per task    18.2    15.2    15.0    30.3    15.0    20.8    17.6    17.3
Average time per task (in seconds)   98.8    99.4    37.5    88.6    55.6    70.2    56.3    55.4

The plotting tasks included a variety of scenes such as trans-national highways, intra-state highways and roads for local transportation. Further, these tasks contained different road types such as straight roads, curves, ramps, crossings, and bridges. They also included various road conditions such as occlusions by vehicles, trees, or shadows.

Both spatial and temporal information on human inputs was recorded and parsed, and only road tracking inputs were kept. We obtained 8 data sets, each containing 28 sequences of road axis coordinates tracked by users. Table 2 shows some statistics on the human data. The total time in the table includes the time that users spent on image interpretation, plotting, and error correction. As shown in the evaluation criteria, these were all taken into account in the efficiency calculation.

The system also allowed us to simulate the human–computer interactions using the recorded human data as virtual users. The road trackers interacted with the virtual users throughout the semi-automatic tracking process. Finally, we compared the performance of the simulated semi-automatic road tracking with that of completely manual tracking for each user.

5.2. Evaluation

Semi-automatic systems can be evaluated in many ways. For a real-world application, the evaluation often includes user experience evaluation, as reported by Harvey et al. (2004). Since our system is still in the simulation stage, we focused on the engineering aspect of the evaluation, where the human factors components are only part of the assessment criteria. The criteria used to evaluate this system included the following:

• Correctness: Were there any tracking errors?
• Completeness: Were there any missing road segments?
• Efficiency: How much could tracking save in terms of human input, tracking distance, and plotting time?
• Accuracy: How much did tracking deviate from manual inputs?

Correctness and completeness have the highest priority in cartography. When errors occur, the human operator has to search for and correct them, and this may take longer than the time that was initially saved. The same problem can occur if the update of a road is incomplete. The most important advantage of the proposed system over fully automatic ones is that the human involvement guarantees correctness and completeness of road tracking. The human operator always follows and interacts with the road tracker, and whenever an error happens the operator can correct it immediately by initializing a new tracking iteration. The tracking process does not stop until the user decides that all roads have been plotted.

Table 3
Comparison of road tracking results between particle filters and the extended Kalman filter

                            Input saving (%)   Time saving (%)   Distance saving (%)   RMSE (pixels)
Extended Kalman filter            71.9              63.7               85.3                1.86
Particle filter (average)         72.3              62.0               85.6                1.90
Particle filter (best)            79.1              71.6               87.9                2.19

Fig. 5. Comparison of tracking performances for the particle filters (PF1 and PF2) and the extended Kalman filter (EKF). PF1 shows the average performance of the particle filter over 10 Monte Carlo trials; PF2 shows the best performance of the particle filter over 10 Monte Carlo trials. The performance is evaluated on the saving in the number of human inputs (upper left graph), the saving in total plotting time (upper right graph), the saving in tracking distance (lower left graph), and the accuracy as the root mean square error of the tracking results against the human-input road axis (lower right graph).

Fig. 6. The influence of the number of particles on the performance of the system on data extracted from user one.

Consequently, in evaluating efficiency, savings in human inputs, in plotting time and in tracking distance have to be considered. The number of human inputs and plotting time are related, so reducing the number of human inputs also decreases plotting time. Given an average time for a human input, which includes the time for observation, plotting and the switching time between the two, we obtain an empirical function for calculating the time cost of the road tracker:

t_c = t_t + \lambda n_h    (19)

where t_c is the total time cost, t_t is the tracking time used by the road tracker, n_h is the number of human inputs required during the tracking, and \lambda is a user-specific variable, calculated as the average time per input

\lambda_i = \frac{\text{total time for user } i}{\text{total number of inputs for user } i}, \quad 1 \le i \le 8    (20)
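As a concrete illustration using the figures in Table 2: User 1 produced 510 inputs in 2765 s, so \lambda_1 = 2765/510 ≈ 5.4 s per input. For a hypothetical tracker run on that user's data requiring n_h = 100 inputs and t_t = 600 s of tracking time, the time cost would be t_c ≈ 600 + 5.4 × 100 ≈ 1140 s.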

In a real application, \lambda could be different, depending on the usability of the human–computer interface provided to the user. The saving in tracking distance is defined as the percentage of roads tracked by the computer. Tracking accuracy is evaluated as the root mean square error between the road tracker and the human input.

Fig. 7. Road tracking from upper left to lower right. White dots are the detected road axis points; the white line segment shows the location of human input.

5.3. Experimental results

We compared the performance of the particle filter with that of the extended Kalman filter. Due to the factored sampling involved in the particle filter, the tracker may perform differently in each Monte Carlo trial. For this reason we evaluated the particle filter over 10 Monte Carlo trials and report both average and best performance. Table 3 shows the performance comparison for the road trackers based on particle filtering and extended Kalman filtering. The proposed road tracking system shows substantial efficiency improvements for both non-linear filtering algorithms compared to a human doing the tasks alone. More detailed performance comparisons for each user are shown in Fig. 5.

In the proposed road tracking application, the system states and observations are subject to noise from different sources, including noise caused by the image generation, disturbances on the road surface, and road curvature changes, as well as other unknown sources. The non-linear state evolution process propagates the noise into the state probability density function (pdf). For this reason it is better to construct a non-Gaussian and multi-modal pdf, and the extended Kalman filter and the particle filter are two sub-optimal methods for solving such systems. The experimental results show that the performance of the extended Kalman filter and the average performance of the particle filter are quite similar. Due to the uncertainty in the state evolution of particle filters, different pdfs of the states can be approximated in different Monte Carlo trials. When the approximation of the state pdf is a better fit to the true pdf, the particle filter outperforms the extended Kalman filter. This is why the best performance of the particle filter is better than that of the extended Kalman filter. We also noticed that the accuracy of the particle filter tracking is slightly compromised. This is caused by the error correction function of the particle filters, which tolerates more deviation in tracking.

Fig. 8. Road tracking from upper left to lower right. White dots are the detected road axis points; the white line segment shows the location of human input. The number of human inputs is reduced by searching multiple reference profile lists, as described in the text.

Notice in Table 3 that both Bayesian filters provide approximately the same level of improvement. This suggests that a combination of both filters may further improve the performance of the tracking system. For example, both filters could perform the tracking task simultaneously using a two-filter competing strategy. A dynamic programming approach could also be used to coordinate the behavior of the trackers, using the correlation traces to decide the optimal tracking path. The correlations could also provide human operators with real-time feedback on the “level of confidence” the system has in the prediction step. This could assist the human operator in monitoring the tracking and in making a decision whether to allow the system to continue tracking.

Fig. 9. Road tracking from upper left to upper right. Black dots are the detected road axis points; the black line segment shows the location of human input. The tracking fails when the road direction changes dramatically.

As can be seen in Fig. 5, the Kalman and particle filters perform differently for different users, suggesting that an adaptive, competitive filter selection strategy should be included in the complete tracking model.

Some tracking results are shown in Figs. 7–12. In Fig. 7, the road tracking starts from the upper left corner, with the white line segment showing the location of human input. The following white dots are the road axis points detected by the road tracker. When the texture of the road surface changes, the road tracker failed to predict the next position. Control was returned to the user, who entered another road segment, marked by a short line segment. The step prediction scaling strategy enables the road tracker to work faster. This can be seen in the image, where larger step sizes are used when consecutive predictions were successful.

Fig. 10. Road tracking from upper right to upper left. Black dots are the detected road axis points; the black line segment shows the location of human input. The tracking fails when the road profile changes at the road connection.

Fig. 8 shows how multiple reference profiles help the tracking. Tracking starts at the upper left corner. When the road changes from white to black, a match cannot be found between the observation and the reference profiles, and human input is required, as indicated by the white line segment. When the road changes back to white, no human input is necessary because the profile for the white road is already in the list of reference profiles. The tracker searches the whole list for an optimal match.

Fig. 11. Road tracking from upper left to lower right. Black dots are the detected road axis points; the black line segment shows the location of human input. This is an extreme case where trees occlude the road. Intensive human inputs are required.

The performance of the system is influenced by several factors. First, human factors play an important role. Human input is not always accurate; hence the road tracker is influenced strongly in the preprocessing step, when initial parameters are set and road profiles extracted. This is reflected in similar trends of the different filtering algorithms when tracking in the same data sets. For example, as shown in Fig. 5, the improvement in efficiency of all trackers is poorest for user 3 and the accuracy is poorest for user 4. This suggests that our system can also be used to model the inputs given by different users. This, in turn, opens the possibility of investigating user-adapted systems.

Second, the number of particles in the particle filter affects the performance of the system, as shown in Fig. 6. When the number of particles is smaller than 20 times the road width, the performance of the system is quite steady. However, when this number continues to increase, the performance of the system drops quickly. Though more particles allow for an improved approximation of the posterior density of the state, the system performance decreases due to the time spent on the particle evolution and likelihood computations.

Third, complex road scenes can cause tracking failures, requiring further human input. Figs. 7 and 8 show cases of tracking failures that are caused by abrupt changes in the radiometric properties of the road. Fig. 9 shows a case where tracking stops because the road direction changes abruptly. Figs. 10 and 11 show tracking failures caused by road profile changes due to a road connection and to occlusions.

When junctions are encountered, the road tracker makes different judgements based on the road condition and the status of the tracker, as shown in Fig. 12. At junction 1, a matching observation could not be found, but further along the direction of the road a matching observation profile was found. Thus, junction 1 was jumped over by state updating with a large step size \tau, due to the jump-over strategy (see Section 4.5) or the step prediction scaling strategy (see Section 4.6). However, at junction 2, no matching profile could be found. Thus, the tracking process stopped and control was returned to the human operator. Ultimately, in all difficult situations, it is the human who has to decide how to proceed with tracking.

Vosselman and Knecht (1995) pointed out that a bias may be introduced into the road center estimation when reference road profiles are updated using successfully matched observation profiles. Unbalanced greylevels of the road sides may lead to a shift of the road center in the reference road profile. In our system, this is avoided by selecting, in each tracking step, an optimal observation profile from multiple observation profiles, so that the quality of the observation profiles can be improved compared to single observations. The experiments show that updating the reference road profile is a compromise: it slightly improves the efficiency of the road tracking, while slightly lowering tracking accuracy. From an HCI point of view, this is a necessary step, as it enables the computer to contribute to knowledge accumulation. We believe this step can be further improved using machine learning approaches.

Fig. 12. Handling of road junctions. Black dots are the detected road axis points. Junction 1 was jumped over, while the tracking stopped at junction 2.

6. Conclusion

This paper introduced a human–computer interaction system for robust and efficient road tracking. It attempts to bridge the gap between human and computer in automatic or semi-automatic systems. This approach has a potentially significant impact on the daily work of map revision. It can greatly reduce human effort in the road revision process. At the same time, it guarantees correct and complete results because the user is never removed from the process.

The proposed framework consists of several components: the user, the human–computer interface, computer vision algorithms, knowledge transfer schemes and evaluation criteria. It can compensate for the deficiencies of computer vision systems in performing tasks usually done by humans. The framework can also be applied to systems that require understanding of different levels of interaction between human and computer.

The road tracking method is based on Bayesian filters that match observation profiles to reference profiles. Particle filters and extended Kalman filters are used to predict road axis points with state update equations and to correct the predictions with measurement update equations. During the measurement update process, multiple observations are obtained at the predicted position. The tracker evaluates the tracking result using normalized cross-correlation between road profiles at the previous and at the current position. When multiple profiles have been obtained from human input, the profile with the highest cross-correlation coefficient is searched for, with the most recently used profile being given the highest priority. The use of two-dimensional features and the multiple-observation and multiple-profile methods has greatly improved the robustness of the road tracker. Finally, when they were combined with the step prediction scaling method, tracking efficiency was further increased.

To progress with making human–machine systems more robust and useful, we need to explore a number of paths.

• The simulated system approximated the human–computer interaction in a real-world system. To further study the effectiveness and usability of the system, we need to implement it on an industrial platform, such as the USGS map revision system.

• The combination of Kalman filters and particle filters should be studied, especially for developing user-adapted systems.

• The system can be made more automated, and its performance further improved, if more knowledge resources are involved (for example, the buildings layer).

• The system framework was developed for the USGS map revision environment, which uses aerial images as the source of revision. It would be interesting to see how this system can be applied to other types of images, for example satellite images.

References

Arulampalam, M., Maskell, S., Gordon, N., Clapp, T., 2002. A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Transactions on Signal Processing 50 (2), 174–188.

Baltsavias, E., 1997. Object extraction and revision by image analysis using existing geodata and knowledge: current status and steps towards operational systems. ISPRS Journal of Photogrammetry and Remote Sensing 58 (3–4), 129–151.

Baumgartner, A., Hinz, S., Wiedemann, C., 2002. Efficient methods and interfaces for road tracking. International Archives of Photogrammetry and Remote Sensing 34 (Part 3B), 28–31.

Bentabet, L., Jodouin, S., Ziou, D., Vaillancourt, J., 2003. Road vectors update using SAR imagery: a snake-based method. IEEE Transactions on Geoscience and Remote Sensing 41 (8), 1785–1803.

Brown, R., Hwang, P., 1992. Introduction to Random Signals and Applied Kalman Filtering, second ed. Wiley.

Canny, J., 1986. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 8 (6), 679–698.

Crevier, D., Lepage, R., 1997. Knowledge-based image understanding systems: a survey. Computer Vision and Image Understanding 67 (2), 161–185.

Glatz, W., 1997. RADIUS and the NEL. In: Firschein, O., Strat, T. (Eds.), RADIUS: Image Understanding for Imagery Intelligence, pp. 3–4.

Groat, C., 2003. The National Map — a continuing, critical need for the nation. Photogrammetric Engineering and Remote Sensing 69 (10), 1087–1090.

Gruen, A., Li, H., 1997. Semi-automatic linear feature extraction by dynamic programming and LSB-snakes. Photogrammetric Engineering and Remote Sensing 63 (8), 985–995.

Harvey, W., McGlone, J., McKeown, D., Irvine, J., 2004. User-centric evaluation of semi-automated road network extraction. Photogrammetric Engineering and Remote Sensing 70 (12), 1353–1364.

Hu, X., Zhang, Z., Tao, C., 2004. A robust method for semi-automatic extraction of road centerlines using a piecewise parabolic model and least square template matching. Photogrammetric Engineering and Remote Sensing 70 (12), 1393–1398.

Isard, M., Blake, A., 1998. CONDENSATION — conditional density propagation for visual tracking. International Journal of Computer Vision 29 (1), 5–28.

Kalman, R., 1960. A new approach to linear filtering and prediction problems. ASME Journal of Basic Engineering 82 (D), 35–45.

Katartzis, A., Sahli, H., Pizurica, V., Cornelis, J., 2001. A model-based approach to the automatic extraction of linear features from airborne images. IEEE Transactions on Geoscience and Remote Sensing 39 (9), 2073–2079.

Kim, T., Park, S., Kim, M., Jeong, S., Kim, K., 2004. Tracking road centerlines from high resolution remote sensing images by least squares correlation matching. Photogrammetric Engineering and Remote Sensing 70 (12), 1417–1422.

Klang, D., 1998. Automatic detection of changes in road databases using satellite imagery. The International Archives of Photogrammetry and Remote Sensing 32 (Part 4), 293–298.

LaBerge, D., 1995. Computational and anatomical models of selective attention in object identification. In: Gazzaniga, M. (Ed.), The Cognitive Neurosciences. MIT Press, Cambridge, pp. 649–664.

Lee, M., Cohen, I., Jung, S., 2002. Particle filter with analytical inference for human body tracking. Proceedings of the IEEE Workshop on Motion and Video Computing, Orlando, Florida, pp. 159–166.

Mayer, H., Steger, C., 1998. Scale-space events and their link to abstraction for road extraction. ISPRS Journal of Photogrammetry and Remote Sensing 53 (2), 62–75.

McKeown, D., Denlinger, J., 1988. Cooperative methods for road tracing in aerial imagery. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Ann Arbor, MI, USA, pp. 662–672.

Mckeown, D., Bullwinkle, G., Cochran, S., Harvey, W., McGlone, C., McMahill, J., Polis, M., Shufelt, J., 1998. Research in image understanding and automated cartography: 1997–1998. Technical Report, School of Computer Science, Carnegie Mellon University.

Myers, B., Hudson, S.E., Pausch, R., 2000. Past, present, and future of user interface software tools. ACM Transactions on Computer–Human Interaction 7 (1), 3–28.

Pavlovic, V., Sharma, R., Huang, T.S., 1997. Visual interpretation of hand gestures for human–computer interaction: a review. IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (7), 677–695.

Rabiner, L., 1989. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 77 (2), 257–286.

Southall, B., Taylor, C., 2001. Stochastic road shape estimation. Proceedings of the Eighth International Conference on Computer Vision, Vancouver, Canada, pp. 205–212.

Tupin, F., Houshmand, B., Datcu, F., 2002. Road detection in dense urban areas using SAR imagery and the usefulness of multiple views. IEEE Transactions on Geoscience and Remote Sensing 40 (11), 2405–2414.

USGS, 1996. Standards for 1:24000-Scale Digital Line Graphs and Quadrangle Maps. U.S. Geological Survey, U.S. Department of the Interior.

Vosselman, G., Knecht, J., 1995. Road tracing by profile matching and Kalman filtering. Proceedings of the Workshop on Automatic Extraction of Man-Made Objects from Aerial and Space Images, Birkhaeuser, Germany, pp. 265–274.

Wang, F., Newkirk, R., 1988. A knowledge-based system for highway network extraction. IEEE Transactions on Geoscience and Remote Sensing 26 (5), 525–531.

Welch, G., Bishop, G., 1995. An introduction to the Kalman filter. Technical Report TR95-041, Department of Computer Science, University of North Carolina at Chapel Hill.

Xiong, D., Sperling, J., 2004. Semiautomated matching for network database integration. ISPRS Journal of Photogrammetry and Remote Sensing 59 (1–2), 35–46.

Zhou, J., Bischof, W.F., Caelli, T., 2004. Understanding human–computer interactions in map revision. Proceedings of the 10th International Workshop on Structural and Syntactic Pattern Recognition, Lisbon, Portugal, pp. 287–295.

Zlotnick, A., Carnine, P., 1993. Finding road seeds in aerial images. CVGIP: Image Understanding 57 (2), 243–260.