Path Planning Algorithms for Autonomous Border Patrol Vehicles
by
George Tin Lam Lau
A thesis submitted in conformity with the requirements
for the degree of Master of Applied Science
Graduate Department of Aerospace Studies
University of Toronto
Copyright © 2012 by George Tin Lam Lau
Abstract
Path Planning Algorithms for Autonomous Border Patrol Vehicles
George Tin Lam Lau
Master of Applied Science
Graduate Department of Aerospace Studies
University of Toronto
2012
This thesis presents an online path planning algorithm developed for unmanned vehicles
in charge of autonomous border patrol. In this Pursuit-Evasion game, the unmanned
vehicle is required to capture multiple trespassers on its own before any of them reach a
target safe house where they are safe from capture. The problem formulation is based on
Isaacs’ Target Guarding problem, but extended to the case of multiple evaders. The pro-
posed path planning method is based on Rapidly-exploring random trees (RRT) and is
capable of producing trajectories within several seconds for capturing two or three evaders. Simulations are carried out to demonstrate that the resulting trajectories approach the optimal
solution produced by a nonlinear programming-based numerical optimal control solver.
Experiments are also conducted on unmanned ground vehicles to show the feasibility of
implementing the proposed online path planning algorithm on physical applications.
Acknowledgements
The author would like to thank Dr. Hugh H.T. Liu for his supervision during the past two
years of this research project. In addition, the author would like to acknowledge Jason
Zhang, Sohrab Haghihat, and Keith Leung for their guidance during various stages of
the research.
Contents
1 Introduction
    1.1 Motivation
    1.2 Objective
    1.3 Overview
2 Literature Review
    2.1 Pursuit-Evasion Games
    2.2 Motion Planning
3 Problem Formulation
    3.1 The Border Patrol Scenario
        3.1.1 General Formulation
        3.1.2 A Specific Example
    3.2 Evader Control
    3.3 Optimal Control Problem
4 Algorithms
    4.1 Rapidly-exploring Random Trees
    4.2 Dynamic Programming
    4.3 RRT Modification for SP2E
        4.3.1 Sampling
        4.3.2 Node Connection
        4.3.3 Cost Estimate
        4.3.4 Rewiring
    4.4 RRT Modification for SP3E
        4.4.1 Tree Hybridization
        4.4.2 Rewiring
5 Simulations
    5.1 GPOPS
    5.2 SP2E
    5.3 SP3E
6 Experiments
    6.1 Setup
    6.2 Controller Design
    6.3 SP2E
    6.4 SP3E
7 Conclusion
    7.1 Contribution
    7.2 Future Work
A SPSE Target Guarding
B C++ SP2E Algorithm
C Kalman Filter for Speed Prediction
Bibliography
Nomenclature
P Pursuer
Ei Evader i
x State vector
u Control vector
xP Pursuer’s state vector
xEi Evader i's state vector
uP Pursuer’s control vector
uEi Evader i's control vector
zi Mobility of Evader i
X Configuration space
Xb Boundary of configuration space
lb Boundary length (m)
Xenter Entry boundary
Xsafe Safe house
rsafe Safe house radius (m)
Xcapt Pursuer’s capture region
rcapt Pursuer’s capture radius (m)
xP Pursuer’s x-coordinate (m)
yP Pursuer’s y-coordinate (m)
vP Pursuer’s speed (m/s)
θP Pursuer’s heading input (rad)
xEi Evader i's x-coordinate (m)
yEi Evader i's y-coordinate (m)
vE Evader’s speed (m/s)
θEi Evader i's heading input (rad)
k Pursuer-evader speed ratio
JP Pursuer’s objective function
JEi Evader i's objective function
φ Terminal cost/objective
L Time-integral cost/objective
λ Costate vector
H Hamiltonian
ψ Terminal constraint
ν Terminal constraint multiplier
Border Patrol Algorithm Nomenclature:
V Node set
E Edge set
Xnear Set of locations of nearby nodes
V^π Expected total reward for following control policy π
πS One-by-one SPSE pursuit policy
zc Compact representation of evader mobility
Experiment Nomenclature:
FV Vicon inertial reference frame
FR Robot reference frame
xg Goal state
FG Goal inertial reference frame
θV Robot heading, Vicon frame
e Pose error, Robot frame
K Control matrix
v Translational speed (m/s)
ω Rotational speed (rad/s)
r Vector from robot to goal
ρ Distance between robot and goal (m)
α Angle between r and x-axis of FR (rad)
β Angle between r and x-axis of FG (rad)
Chapter 1
Introduction
Pursuit-Evasion games are a family of problems where a class of agents, called pursuers,
are assigned the task of capturing another class of agents, called evaders, in a given
environment. Research in this topic is widespread and delves into many variations of the
Pursuit-Evasion game, including the number of pursuers and evaders, the cost function by
which performance is measured, the movement and sensing capabilities of the agents, and
the environment in which the game is played. The problem of Pursuit-Evasion applies in
many situations. When a building's security is breached, searchers must sweep the
corridors and rooms of the building and locate the intruder without letting them escape.
Search and rescue operations are variants of the Pursuit-Evasion problem where a team
of vehicles must search an unknown environment to track down lost individuals. Military
operations requiring the capture of various agents using Unmanned Aerial Vehicles (UAV)
are also related to Pursuit-Evasion problems. Another potential application of this field
of research is automated border patrol, which is the focus of this thesis.
The automated border patrol scenario has several defining features that distinguish
it from most research in Pursuit-Evasion problems. Due to the vast search area involved
in patrolling a national border, assigned agents would be spread out very far apart from
each other. When illegal entry occurs and individuals trespass upon the area, the agent(s)
in closest proximity must capture all the evaders before they can reach a target location,
such as a dense forest or urban area, where tracking becomes much more difficult. The
unique features of this problem are: 1) a few pursuers attempting to capture multiple
evaders, and 2) evaders having a target destination, upon arrival at which they are safe
from the pursuer. The border patrol scenario is set up
with these features to represent the use of a UAV for autonomously patrolling a national
border. Because of the lengths of the Canada-U.S. and U.S.-Mexico borders, using a
system of UAVs would be a significant improvement over human patrol teams in terms
of cost, range, and efficiency.
1.1 Motivation
The utilization of UAVs for border patrol is not a novel concept. In 2005, the U.S.
Department of Homeland Security's Customs and Border Protection began using the
MQ-9 Predator B UAV as part of its taskforce for patrolling the U.S.-Mexico border.^1
This was part of a new security initiative known as SBInet, under which technological
solutions were to be set up to monitor the area.^2 However, after four years, the $4 billion
spent amounted to a system covering only 28 miles, or 1.8%, of the border.^3
Despite the setbacks, the UAV portion of the project was the lone bright spot of the
system. While the overall SBInet project has been halted, more UAVs are being added
for border surveillance, with the expectation that 24 Predator drones will be in operation
^1 P. McLeary. "New Technologies Aid Border Patrol." Internet: http://www.aviationweek.com/aw/generic/story_generic.jsp?channel=dti&id=news/dti/2010/11/01/DT_11_01_2010_p37-262578.xml&headline=null&next=10, Nov. 2010 [Mar. 26, 2011].
^2 "U.S. Customs and Border Protection - Border Security." Internet: http://cbp.gov/xp/cgov/border_security/sbi/sbi_net/, Dec. 2009 [Mar. 26, 2011].
^3 C. McCutcheon. "Securing Our Borders." Internet: http://www.americanprogress.org/issues/2010/04/secure_borders.html, Apr. 2010 [Mar. 26, 2011].
by 2016.^4 The relatively low $16-million cost of the UAV^5 makes them a cost-efficient
and flexible solution for border security. However, existing ones are manually operated
and serve only as a complement to ground agents. The proposed research will provide a
guidance system for pursuing and capturing trespassers so that UAVs can autonomously
complete the task without the aid of human patrol teams.
1.2 Objective
Here, we address a multi-agent Pursuit-Evasion game where a single pursuer, representing
an unmanned vehicle, is patrolling a border and is responsible for capturing multiple
evaders before they are able to arrive at a target destination. The goal of this thesis is
to develop a guidance system that generates a trajectory for the pursuer to efficiently
capture these border trespassers. In order to have a guidance system that is practical
and can be implemented on physical applications, several requirements must be met:
• The guidance system must be able to generate a trajectory for the pursuer to
follow in real-time (i.e. within several seconds of receiving information regarding
the position of the trespassers)
• The guidance system must be able to operate under a wide range of scenarios (i.e.
variations in initial positions, maximum speed, and vehicle capture capabilities)
• The guidance system must generate trajectories that are close to the optimal solu-
tion
Current approaches to solving Pursuit-Evasion games have typically been
analytical or heuristic in nature, which limits their ability to satisfy the above require-
^4 W. Booth. "More Predator drones fly U.S.-Mexico border." Internet: http://www.washingtonpost.com/world/more-predator-drones-fly-us-mexico-border/2011/12/01/gIQANSZz8O_story.html, Dec. 2011 [Jun. 22, 2012].
^5 J. Hing. "Aerial Drones Take Over the Border." Internet: http://colorlines.com/archives/2010/09/aerial_drones_take_over_the_border.html, Sept. 2010 [Mar. 26, 2011].
ments. Analytical solutions are favorable and would yield guidance laws that can produce
desired trajectories in real-time, but unfortunately, such solutions are available only to the
simplest types of problems. Isaacs, in his pioneering work Differential Games provided
minimax solutions for several Single-Pursuer, Single-Evader (SPSE) zero-sum problems
[1]. Analytical solutions have also been found for several other SPSE problems, which
will be presented in the next chapter. For more complex problems, such as problems
with more than one pursuer and/or evader, complicated cost functions, or cluttered or
changing environments, analytical solutions are often intractable. Heuristic methods are
then used in such cases in two ways: to reduce the problem into simpler, manageable
ones, or to generate a pursuit strategy in itself. Such heuristic solutions use human ex-
perience to generate better strategies, but it is difficult to determine whether they are
generating trajectories that are close to the minimax solution or not.
Due to the solution requirements, the nature of the problem, and the limits of current
solutions for Pursuit-Evasion games, the challenges in developing a suitable guidance
system for this application are:
• Having multiple evaders increases the complexity of the problem
• Contrary to most Pursuit-Evasion problems studied, where the cost/objective func-
tion is the time to capture, the pursuer’s cost function is a function of the distance
of the evaders from the goal when they are captured
• The cost/objective function will be different for the pursuer than the evaders, re-
sulting in a non-zero-sum problem
• When an evader is captured, it is immobilized, resulting in a change of vehicle
dynamics and hence creating a multi-stage problem that must be solved
As mentioned before, while analytical solutions are definitely the most appealing, the
complexity of this problem suggests that developing a numerical solution would be more
practical. The algorithm developed in this thesis for the border patrol Pursuit-Evasion
problem is a sampling-based motion planner based on robot path-planning methods. The
rest of this thesis will be devoted to explaining the concept behind this algorithm and
demonstrating the feasibility of implementing the proposed guidance system for physical
applications.
1.3 Overview
The rest of the thesis is organized as follows. Chapter 2 provides a literature review on
existing research on Pursuit-Evasion games as well as robot motion planning algorithms.
In Chapter 3, the border patrol problem is presented for autonomous vehicles. Simplifying
assumptions concerning the problem are also explained: assumptions based on
Isaacs' Target Guarding problem [1] that reduce the problem from a differential game
to an optimal control problem. In Chapter 4, the algorithms developed for the
autonomous vehicle border patrol scenario are presented. Simulation results using the
algorithms are provided in Chapter 5 for different test cases and compared to offline
trajectories generated by existing numerical optimal control solvers to show that the al-
gorithms achieve near-optimal solutions. In Chapter 6, experimental methods and results
using autonomous ground vehicles are presented to demonstrate the algorithms in use for
physical applications. To conclude, a summary of the research completed and possible
future work are given in Chapter 7.
Chapter 2
Literature Review
This chapter summarizes the existing and current research on Pursuit-Evasion games
in general. From this, we show the lack of research and understanding in problems
related to the border patrol scenario, including multiple-player problems and non-zero-
sum problems. Since the border patrol algorithm proposed in Chapter 4 is based on path
planners developed in the field of robotics, a review of different types of motion planning
algorithms and recent development of particular algorithms of interest is provided.
2.1 Pursuit-Evasion Games
Isaacs’ Differential Games was the first comprehensive study of the topic of Pursuit-
Evasion games. Approaching the problem from a purely mathematical perspective, he
derived analytical solutions for several examples, such as the Homicidal Chauffeur prob-
lem and the Isotropic Rocket problem [1]. His study was focused on Single-Pursuer,
Single-Evader (SPSE) problems where the opposing players were trying to minimize
and maximize the same objective function, respectively. These are known as zero-sum
problems and lead to the notion of minimax solutions, where each player optimizes its
objective while considering the other player’s attempt to achieve the opposite. Analyt-
ical studies have also been conducted on singular and universal surfaces that define the
convergence of optimal paths [1, 2].
Unless it is assumed that the pursuer has complete visibility of the entire environment,
Pursuit-Evasion problems are usually divided into two stages: searching and pursuing
the evader. For the search stage, discrete probabilistic frameworks have been developed
in the case where pursuers’ sensor readings may be erroneous [3, 4, 5]. Randomized
sampling-based algorithms have been proposed where the environment is divided into
different roadmaps for searching [6, 7]. The set of possible current locations for previously
detected evaders that have escaped the pursuer’s visibility region has also been studied
[8].
For the pursuit phase, analytical solutions have been found for several SPSE problems,
including a stochastic differential game based on Dubins’ car model [9] and a bounded
game for an interceptor missile [10]. There has also been research done that deals with
problems with obstacles in the environment and multiple players. Maintaining visibility
of evaders around obstacles was addressed [11, 12, 13]. Evolutionary algorithms have
been applied in Pursuit-Evasion games as well [14].
When dealing with multiple pursuers and evaders, the problem becomes more com-
plex. As analytical solutions for such cases are intractable, there have been numerical
solutions where the problem is broken down into several SPSE problems, with periodic
assessments of the global problem at preset intervals [15, 16]: the SPSE engagements are
evaluated, and if deemed to be inefficient, the engagements are reassigned [15]. Particle
swarm optimization has also been proposed for multiple pursuers trying to capture a
single evader [17]. Research in Single-Pursuer, Multiple-Evader (SPME) problems is scarce; geometric bounds on capture
regions have been studied for basic problems [18], but no solution has yet been developed.
Other guidance laws developed for multiple-pursuer or multiple-evaders problems have
been mostly heuristic in nature. These include the planes strategy where the pursuers
form a semi-circle to contain the evader [19], a frontier-based search to trap the evader
[20], assigning guards [21], and reducing sensor overlap [22].
If the pursuer and evader are optimizing over different objective functions, the problem
becomes a non-zero-sum differential game. Theory on the existence and determination of
equilibrium points is not as developed as that for zero-sum problems [23, 24]. Optimal
solutions have been found for the linear-quadratic case [25], but besides this, most studies
on this specific topic have been preliminary in nature [23, 26, 27].
2.2 Motion Planning
Robot motion planning in the presence of obstacles has been a hot topic of research in
recent years, and this has led to the development of many types of solution methods. Of
these approaches, several notable classes of algorithms include roadmap methods, cell
decomposition methods, potential field methods, and sampling-based planning. Roadmap
methods decompose the free space into a graph by incrementally building a map of it as
data about the environment is gathered from the robot’s sensors [28]. Visibility graphs,
Voronoi roadmaps, and silhouette methods are instances of roadmap planners. Most of
these methods are limited to two or three dimensions [29].
Cell decomposition methods also decompose the configuration space for the purpose
of finding an obstacle-free trajectory via graph search, but rather than fitting a graph
onto the free space, the space is split into a finite number of cells, with the boundaries of the cells
usually representing a change in an obstacle's geometry or in the visibility due to that
obstacle [28]. Some decomposition methods include trapezoidal, critical-curve based, and
tree decomposition [29].
Potential field methods map a potential function to the free space so that the vehicle
will move in reaction to the forces defined by this field. In obstacle avoidance, obstacles
are assumed to emit repulsive forces, like those of an electric charge, in order to drive the
vehicle away from them [30]. The trajectory can be generated from the potential function via
gradient descent, but the existence of local minima is a potential problem of such an
approach. Possible solutions to this problem are to use a wave-front planner on a grid
similar to dynamic programming or to define a navigation potential function that only
has one minimum [28].
All the methods described so far require an explicit representation of the free space,
leading to computational inefficiency as the dimensionality increases. Sampling-based
methods avoid this by building a map of the environment by incrementally sampling
random new points, checking the validity of the new points by a collision detection
module, and connecting them to existing ones in the map if valid [31]. The way in which
points are connected vary based on the method applied, but the common feature is that
connections are made between two selected points if the path between them is obstacle-
free, and rejected otherwise. Sampling-based planners have the ability to incorporate
vehicle differential constraints into their state-space search [29].
Sampling-based methods are the most suitable for the border patrol scenario.
Of particular interest is the Rapidly-exploring Random Tree
(RRT), a sampling-based method that uses a tree structure to store and connect the
sample points in the configuration space. First developed by LaValle and Kuffner [32]
and shown to be probabilistically complete [33], it was then extended to deal with system
dynamics [34]. Since then, the RRT algorithm has been modified for different purposes,
including optimal motion planning [35], multiple-UAV obstacle avoidance [36], and urban
driving [37]. Recently, RRTs have also been applied to a Pursuit-Evasion problem, where
an algorithm was developed for an evader trying to reach a goal while avoiding multiple
slower pursuers [38].
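To make the basic mechanism concrete, the following is a minimal sketch of a plain 2-D RRT in an obstacle-free square workspace. The step size, goal bias, and 10 x 10 workspace bounds are illustrative assumptions, and this is the textbook planner rather than the modified algorithm developed in Chapter 4.

```python
import math
import random

def rrt(start, goal, step=0.5, goal_tol=0.5, max_iters=5000, seed=1):
    """Minimal RRT in an obstacle-free 10 x 10 workspace.

    Nodes are stored in a flat list; parent[i] records the tree edge,
    so the path is recovered by walking back from the goal node.
    """
    rng = random.Random(seed)
    nodes = [start]
    parent = {0: None}
    for _ in range(max_iters):
        # Sample a random point, with a small bias toward the goal.
        x_rand = goal if rng.random() < 0.05 else (rng.uniform(0.0, 10.0),
                                                   rng.uniform(0.0, 10.0))
        # Nearest node in the tree (linear scan; a k-d tree scales better).
        i_near = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], x_rand))
        x_near = nodes[i_near]
        d = math.dist(x_near, x_rand)
        if d == 0.0:
            continue
        # Steer from x_near toward x_rand by at most one step length.
        t = min(1.0, step / d)
        x_new = (x_near[0] + t * (x_rand[0] - x_near[0]),
                 x_near[1] + t * (x_rand[1] - x_near[1]))
        # No obstacles here, so every new edge is collision-free by assumption.
        nodes.append(x_new)
        parent[len(nodes) - 1] = i_near
        if math.dist(x_new, goal) < goal_tol:
            path, i = [], len(nodes) - 1
            while i is not None:
                path.append(nodes[i])
                i = parent[i]
            return path[::-1]
    return None
```

Calling `rrt((0.0, 0.0), (9.0, 9.0))` returns a start-to-goal sequence of waypoints spaced at most one step apart; the modifications discussed in Chapter 4 (sampling, node connection, cost estimates, rewiring) operate on this same basic loop.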
In summary, the Rapidly-exploring Random Tree is a sampling-based motion planning
method that is flexible enough to be modified for the border patrol problem of interest.
It is superior in this respect to other motion planning algorithms, as it does not
require an explicit representation of the problem in order to plan paths for the
robot. The reason for turning to motion planning algorithms is the lack of
existing research on Pursuit-Evasion games dealing with properties in the border patrol
scenario, namely problems with a single pursuer and multiple evaders and non-zero-sum
problems. While there have been studies done for simple cases of such problems, most
Pursuit-Evasion studies have been focused on analytical studies of SPSE problems and
heuristic strategies for problems with multiple pursuers and/or evaders.
Chapter 3
Problem Formulation
In this chapter, the formulation of the border patrol Pursuit-Evasion problem of interest
is presented, beginning with a general description and followed by a specific instance
for which simulations and experimental results are included in subsequent sections. The
procedures involved in developing simplifying assumptions about the evaders’ dynamics
based on Isaacs’ Target Guarding Problem are then introduced, followed by a description
of the resulting optimal control problem whose objective function we seek to optimize by
selecting a proper control input for the pursuer.
3.1 The Border Patrol Scenario
3.1.1 General Formulation
The dynamics of the pursuer P and evaders Ei in the problem is governed by the set of
differential equations
\dot{x}_P = f_P(x_P, u_P), \quad x_P(0) = x_{P,0}    (3.1)

\dot{x}_{E_i} = f_{E_i}(x_{E_i}, u_{E_i}), \quad x_{E_i}(0) = x_{E_i,0}    (3.2)
where x_P and u_P are the state and control vectors of the pursuer, and x_Ei and u_Ei are
the state and control vectors of evader i. Assuming the players exist in Rd physical space
with dimensionality d = 2 or 3, let
x_P = [x_{pos,P} \;\; x_{dyn,P}]^T, \quad x_{pos,P} \in \mathbb{R}^d    (3.3)

x_{E_i} = [x_{pos,E_i} \;\; x_{dyn,E_i}]^T, \quad x_{pos,E_i} \in \mathbb{R}^d    (3.4)
such that x_pos,P and x_pos,Ei contain the states describing P and Ei's positions in physical
space, and x_dyn,P and x_dyn,Ei contain additional states necessary to describe P and Ei's
dynamics (e.g. velocity, heading). Then, restrict x_pos,P and x_pos,Ei to remain in the
closed set X ⊂ R^d over some finite horizon t ∈ [0, t_f]. The equations of motion for Ei can
be decomposed as follows:
\dot{x}_{E_i} = \begin{bmatrix} \dot{x}_{pos,E_i} \\ \dot{x}_{dyn,E_i} \end{bmatrix} = f_{E_i}(x_{E_i}, u_{E_i}) = \begin{bmatrix} z_i \cdot f_{pos,E_i}(x_{E_i}, u_{E_i}) \\ f_{dyn,E_i}(x_{E_i}, u_{E_i}) \end{bmatrix}    (3.5)
zi represents the mobility of evader i, and takes the values 1 (evader is active) and 0
(evader is immobilized). For a game with n evaders, the full state and control vectors of
the problem would be
x = [x_P \;\; x_{E_1} \;\; \cdots \;\; x_{E_n}]^T    (3.6)

u = [u_P \;\; u_{E_1} \;\; \cdots \;\; u_{E_n}]^T    (3.7)
Likewise, the full set of differential equations would be
\dot{x} = \begin{bmatrix} f_P(x_P, u_P) \\ f_{E_1}(x_{E_1}, u_{E_1}) \\ \vdots \\ f_{E_n}(x_{E_n}, u_{E_n}) \end{bmatrix}    (3.8)
Letting X_b be the boundary of X and X_enter ⊂ X_b represent the border through
which the evaders trespass into X, the initial positions of the pursuer and evader i are
given by x_pos,P(0) ∈ X and x_pos,Ei(0) ∈ X_enter, respectively. The mobility functions z_i are
all initially set to 1. The evaders seek to enter the subset X_safe ⊂ X representing
the safe house, while the pursuer attempts to capture and immobilize them before any
evader can arrive at X_safe.
Capture of evader i occurs at time t_capt,i when

x_{pos,E_i}(t_{capt,i}) \in X_{capt}(x_P(t_{capt,i})), \quad t_{capt,i} \in [0, t_f]    (3.9)

X_capt ⊂ X is a set pinned to and moving with x_P, analogous to a rigid body in dynamics.
It represents the region surrounding P for which any evader entering it is within range
for the pursuer to instantly capture it. Upon capture, Ei is immobilized, and z_i assumes
the value 0 for all t ≥ t_capt,i.
There are two termination conditions for this finite-horizon problem, one correspond-
ing to the success of the pursuer in capturing all the evaders before any of them arrive
at Xsafe, and the other corresponding to its failure in doing so. The success criteria at
tf can be expressed as
\forall\, i = 1, \ldots, n, \quad z_i(t_f) = 0    (3.10)
and the failure criterion as
\exists\, i \in \{1, \ldots, n\} \;\text{s.t.}\; x_{pos,E_i}(t_f) \in X_{safe}    (3.11)
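A minimal sketch of the capture rule (3.9) and the two termination criteria (3.10) and (3.11), assuming the circular capture region used in the specific example of Section 3.1.2; the function names and the `in_safe` predicate are illustrative, not part of the formulation.

```python
import math

def update_captures(x_p, evaders, z, r_capt):
    """Set z_i = 0 for any active evader inside the circular capture
    region X_capt of radius r_capt centred on the pursuer (Eq. 3.9)."""
    for i, (x_e, y_e) in enumerate(evaders):
        if z[i] == 1 and math.hypot(x_e - x_p[0], y_e - x_p[1]) <= r_capt:
            z[i] = 0
    return z

def pursuer_succeeded(z):
    """Success criterion (3.10): every evader immobilized at t_f."""
    return all(zi == 0 for zi in z)

def pursuer_failed(evaders, z, in_safe):
    """Failure criterion (3.11): some still-active evader has reached X_safe.
    `in_safe` is a predicate testing membership of a position in X_safe."""
    return any(zi == 1 and in_safe(pos) for zi, pos in zip(z, evaders))
```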
The objective functions of the evaders are their respective Euclidean distances away
from the closest point in Xsafe when the problem terminates:
J_{E_i}(x_{E_i}(t_f)) = \min_{x_s \in X_{safe}} \| x_{pos,E_i}(t_f) - x_s \|, \quad i = 1, \ldots, n    (3.12)
In turn, the objective function of the pursuer is the closest Euclidean distance of any of
the evaders away from the closest point in Xsafe when the problem terminates:
J_P(x(t_f)) = \min_{i=1,\ldots,n} J_{E_i}(x_{E_i}(t_f))    (3.13)
The problem is then one where the evaders seek to minimize their own objectives,
\min_{u_{E_i}} J_{E_i}(x_{E_i}(t_f)), \quad i = 1, \ldots, n    (3.14)
while the pursuer seeks to maximize its objective,
\max_{u_P} J_P(x(t_f))    (3.15)
Note that there is no time-integral cost in the objective function, and that the problem
is non-zero-sum.
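The objectives (3.12) and (3.13) reduce to a few lines of code once X_safe is approximated by a finite set of sample points. This discretization is an illustrative simplification (for the semi-circular X_safe of Section 3.1.2 a closed form exists, given in Equation 3.22):

```python
import math

def J_evader(pos_e, safe_pts):
    """Eq. (3.12): distance from the evader's terminal position to the
    closest point of X_safe, represented here by a sampled point set."""
    return min(math.dist(pos_e, s) for s in safe_pts)

def J_pursuer(evader_positions, safe_pts):
    """Eq. (3.13): the smallest of the evaders' objectives. The pursuer
    maximizes this quantity while each evader minimizes its own."""
    return min(J_evader(p, safe_pts) for p in evader_positions)
```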
3.1.2 A Specific Example
Consider the problem in Figure 3.1. X ⊂ R^2 is a square space of side length l_b with boundary
X_b. At t = 0, evaders E1, E2, and E3 are located on X_enter, the dashed left-half portion
of X_b. Their individual goals are to reach X_safe, a semi-circle of radius r_safe attached
to the right side of X with its center located at (x_safe, y_safe). At t = 0, P is in X and
attempts to capture all three evaders before any of them arrive at X_safe. P has a circular
capture region, denoted by X_capt, of radius r_capt centered at x_pos,P; any evader
entering this region is assumed to be captured and immobilized.
Figure 3.1: Problem formulation of the SPME safe house problem
We apply unicycle dynamics to all players, such that they are not constrained by a
minimum turning radius. Also, P and the evaders Ei always travel at their maximum
speeds (vP and vE respectively), so their control inputs are their heading angles (θP and
θEi). The state and control vectors of the problem are as follows:
x = [x_P \;\; x_{E_1} \;\; x_{E_2} \;\; x_{E_3}]^T = [x_P \;\; y_P \;\; x_{E_1} \;\; y_{E_1} \;\; x_{E_2} \;\; y_{E_2} \;\; x_{E_3} \;\; y_{E_3}]^T    (3.16)

u = [u_P \;\; u_{E_1} \;\; u_{E_2} \;\; u_{E_3}]^T = [\theta_P \;\; \theta_{E_1} \;\; \theta_{E_2} \;\; \theta_{E_3}]^T    (3.17)
The differential equations governing the pursuer’s dynamics are:
\dot{x}_P = \begin{bmatrix} \dot{x}_P \\ \dot{y}_P \end{bmatrix} = \begin{bmatrix} v_P \cos\theta_P \\ v_P \sin\theta_P \end{bmatrix}    (3.18)
while the differential equations governing the evaders’ dynamics are:
\dot{x}_{E_1} = \begin{bmatrix} \dot{x}_{E_1} \\ \dot{y}_{E_1} \end{bmatrix} = \begin{bmatrix} z_1 \, v_E \cos\theta_{E_1} \\ z_1 \, v_E \sin\theta_{E_1} \end{bmatrix}    (3.19)

\dot{x}_{E_2} = \begin{bmatrix} \dot{x}_{E_2} \\ \dot{y}_{E_2} \end{bmatrix} = \begin{bmatrix} z_2 \, v_E \cos\theta_{E_2} \\ z_2 \, v_E \sin\theta_{E_2} \end{bmatrix}    (3.20)

\dot{x}_{E_3} = \begin{bmatrix} \dot{x}_{E_3} \\ \dot{y}_{E_3} \end{bmatrix} = \begin{bmatrix} z_3 \, v_E \cos\theta_{E_3} \\ z_3 \, v_E \sin\theta_{E_3} \end{bmatrix}    (3.21)
As Xsafe is a semi-circular region attached to the right edge of X, the following are
the objective functions of evader i and the pursuer respectively:
J_{E_i}(x_{E_i}(t_f)) = \sqrt{(x_{safe} - x_{E_i}(t_f))^2 + (y_{safe} - y_{E_i}(t_f))^2} - r_{safe}, \quad i = 1, 2, 3    (3.22)

J_P(x(t_f)) = \min_{i=1,2,3} J_{E_i}(x_{E_i}(t_f))    (3.23)
This specific instantiation of the border patrol problem will be studied for the re-
mainder of this thesis. In Chapter 4, the algorithms developed will be for this particular
scenario consisting of either two or three evaders.
3.2 Evader Control
With the current problem formulation, finding an optimal solution for the pursuer's
control requires simultaneously finding a solution for the evaders' controls as well.
This is known as a differential game, which is similar to the problems studied
in game theory, except in a continuous setting. The solutions to such problems are called
minimax solutions: any deviation by either the pursuer(s) or the evader(s) from the
strategy presented by the minimax solution will leave that party worse off than if it
had not deviated. There currently exist no methods, analytical or numerical, for
solving the general class of differential games.
In light of the intractability of finding an online algorithm to produce the minimax
solution to this differential game, it is beneficial, both in terms of being able
to generate a solution and in computational cost, to first find a reasonable control law for
the evaders. Knowing, or having a good approximation of, their behavior converts the
differential game into an optimal control problem.
Figure 3.2: SPSE minimax solution, vP/vE = 1
In Isaacs' Differential Games, the minimax solution for the Single-Pursuer, Single-Evader
case when both players share the same speed is briefly introduced as the Target
Guarding Problem [1] and is shown in Figure 3.2. The perpendicular bisector of P
and E's initial positions separates the region of points that P can reach before E from
the region that E can reach before P, assuming each travels the shortest path, a straight
line, to the point in question. The bisector that divides these two regions can be expressed as follows:
y_{div}(x) = -\frac{x_E - x_P}{y_E - y_P}\, x + \frac{1}{2}\left(y_P + y_E + \frac{x_E^2 - x_P^2}{y_E - y_P}\right)    (3.24)
The minimax strategy for both players is to travel in a straight line from their initial positions
to the point on the bisector closest to X_safe. Any trajectory for P or E that deviates
from this strategy is not optimal: assuming its opponent follows the minimax strategy,
the deviating player will end the game with capture closer to X_safe (for P) or further
away from X_safe (for E).
In the example presented in Section 3.1.2, the closest point on the bisector to X_safe is

x^* = \frac{x_{safe} + \frac{1}{2}\left(\frac{x_E - x_P}{y_E - y_P}\right)\left(y_P + y_E - 2y_{safe} + \frac{x_E^2 - x_P^2}{y_E - y_P}\right)}{1 + \left(\frac{x_E - x_P}{y_E - y_P}\right)^2}    (3.25)

y^* = y_{div}(x^*)    (3.26)
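As a numerical check of Equations 3.24 through 3.26, the closest point (x*, y*) can be computed by orthogonally projecting the safe-house center onto the bisector line, which is what the closed form expands to. A minimal sketch, assuming y_E ≠ y_P so that the bisector slope is defined (the positions in the check below are illustrative):

```python
def bisector_point(xP, yP, xE, yE, x_s, y_s):
    """Closest point on the perpendicular bisector of P and E to the
    safe-house centre (x_s, y_s); Eqs. (3.24)-(3.26), equal speeds."""
    m = -(xE - xP) / (yE - yP)                          # bisector slope
    b = 0.5 * (yP + yE + (xE**2 - xP**2) / (yE - yP))   # bisector intercept
    # Orthogonal projection of (x_s, y_s) onto the line y = m x + b.
    x_star = (x_s + m * (y_s - b)) / (1.0 + m * m)
    y_star = m * x_star + b
    return x_star, y_star
```

By construction every point returned is equidistant from P and E, and no other point on the bisector is closer to the safe-house center.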
When the pursuer is faster than the evader, we extend Isaacs' minimax solution for
the same-speed case to the case where the pursuer-to-evader speed ratio, k = v_P / v_E, is greater than 1.
The locus of points dividing the region that P can reach before E from the region
that E can reach before P is given by the following expression:
ydiv(x) =(yP − k2yE)±
√∆(x)
(1− k2)where
∆(x) = −(k2 − 1)2x2 + 2(k2 − 1)(k2xE
− xP )x+ k2(yP − yE)2 + (k2 − 1)(x2P − k2x2E) (3.27)
The optimal path for both players is a straight line from their initial positions to the
point on the locus closest to Xsafe. The paths for different values of k are presented in
Figure 3.3. In the example presented in 3.1.2, the closest point to Xsafe is the solution
to the quadratic equation
a x^{*2} + b x^* + c = 0, \quad \text{where}

a = (k^2 - 1)^2 \left[ \left( (k^2 - 1) y_{safe} + y_P - k^2 y_E \right)^2 + \left( k^2 x_E - x_P - (k^2 - 1) x_{safe} \right)^2 \right]

b = (k^2 - 1) \left[ -2 \left( (k^2 - 1) y_{safe} + y_P - k^2 y_E \right)^2 (k^2 x_E - x_P) - 2 \left( k^2 x_E - x_P - (k^2 - 1) x_{safe} \right)^2 (k^2 x_E - x_P) \right]

c = \left( (k^2 - 1) y_{safe} + y_P - k^2 y_E \right)^2 (k^2 x_E - x_P)^2 - \left( k^2 x_E - x_P - (k^2 - 1) x_{safe} \right)^2 \left( k^2 (y_P - y_E)^2 + (k^2 - 1)(x_P^2 - k^2 x_E^2) \right) \qquad (3.28)
Figure 3.3: SPSE minimax solution, vP/vE > 1
For the rest of this section, the solutions just presented in Equations 3.24-3.28 will be
referred to as the geometric solution. To confirm that this geometric solution is indeed
the minimax solution to the zero-sum problem, consider the following solution technique
presented by Isaacs for general zero-sum differential games. Given n state equations
\dot{x} = f(x, u_P, u_E) and the objective function

J = \phi(x_f, t_f) + \int_0^{t_f} L(x, u_P, u_E)\, dt \qquad (3.29)
that the pursuer and the evader seek to maximize and minimize, respectively, define the
Hamiltonian with costates λ
H(x, u_P, u_E) = L + \lambda^T(t) \cdot f(x, u_P, u_E) \qquad (3.30)
In order to find the minimax solution, Isaacs’ technique is to define a parametric
surface representing the terminal states of the problem, and integrate backwards in time
starting from this surface. Although this will not yield an analytical solution that provides
a control law, the trajectories generated can be compared with those generated by the
geometric solution proposed earlier. If the trajectories from the two solutions coincide,
then the geometric solution is indeed the minimax solution.
For the rest of this thesis, any function that is superscripted with an asterisk (e.g.
u∗) indicates the minimax or optimal value that function can attain. Isaacs states that
the minimax solution imposes the following condition, known as Isaacs’ Main Equation
[2]:
H^*(x_f, u_{P,f}, u_{E,f}, \lambda_f) = 0 \qquad (3.31)
Also, the evolution of the costates in the minimax solution is governed by the fol-
lowing differential equations:

\dot{\lambda}(t) = -\frac{\partial H^*}{\partial x} \qquad (3.32)
The costate and state differential equations form 2n equations that must be integrated
over time to yield the minimax trajectory. This requires 2n final conditions at the
terminal state. The choice of the final states to begin backwards integration over time
provides n such conditions, while Eq. 3.31 provides another condition. Additionally,
define the n − 1 dimensional parametric surface representing the terminal states by the
following:
x_i = h_i(s_1, \ldots, s_{n-1}), \quad i = 1, \ldots, n \qquad (3.33)
The partial derivative of the terminal objective φ with respect to the parametric
surface obeys the following equalities:

\frac{\partial \phi}{\partial s_k} = \sum_{i=1}^{n} \lambda_i \frac{\partial h_i}{\partial s_k}, \quad k = 1, \ldots, n-1 \qquad (3.34)
Eq. 3.34 provides the remaining n−1 equations required for the final conditions. The
details of this solution technique applied to the SPSE problem for different speeds can
be found in Appendix A. Results of the solution trajectory generated by this backward
integration technique and that generated by the geometric solution proposed earlier for
the same initial conditions are compared in Fig. 3.4 for speed ratios of 1 and 1.5. As can
be seen, the trajectories are identical, confirming that the geometric solution does indeed
provide the minimax control law.
Figure 3.4: Comparison of the trajectories generated by the control law and Isaacs’
solution technique for different pursuer-evader speed ratios: (a) v_P/v_E = 1; (b) v_P/v_E = 1.5.
The circles indicate the initial positions of the agents, while the crosses indicate the
capture points.
This is a reasonable model of the evaders’ control inputs, as the evaders are non-
cooperative and seek only to maximize their own objectives without any consideration of
the other evaders’ objectives. Using Isaacs’ minimax strategy is the best each evader
can do in the one-on-one case. Granted, it is possible that border trespassers may deviate
from this strategy, but because this model is an optimal representation of the SPSE safe
house problem, the pursuer will in most cases perform even better when an evader
opts for a different control law.
Note that the minimax strategy only applies when the evader’s reachable region does
not contain any part of X_{safe} (i.e., the pursuer is guaranteed to capture the evader if it
employs the minimax strategy). In the case where the evader’s reachable region contains
some or all of X_{safe}, the evader’s control law is to move in a straight line towards
the closest point on X_{safe} that it can safely reach. Define the evader’s ability to safely
reach the safe house as

\sigma_i = \begin{cases} 1, & \exists\, x_{safe} \in X_{safe} \ \text{s.t.}\ \frac{v_P}{v_E}\, \| x_{E_i} - x_{safe} \| < \| x_P - x_{safe} \| \\ 0, & \text{otherwise} \end{cases} \qquad (3.35)
Then, the evader’s heading is given as

\theta_{E_i} = \begin{cases} \tan^{-1}\!\left( \dfrac{y^* - y_{E_i}}{x^* - x_{E_i}} \right), & \sigma_i = 0 \\[2ex] \tan^{-1}\!\left( \dfrac{y_{safe} - y_{E_i}}{x_{safe} - x_{E_i}} \right), & \sigma_i = 1 \end{cases} \qquad (3.36)
3.3 Optimal Control Problem
With a control law for the evaders in place, the problem is now an optimal control
problem in terms of the pursuer’s control input. The problem is governed by the m state
equations \dot{x} = f(x, u, t) given in Eq. 3.8 with m initial conditions x_0, and we seek
to maximize the following objective function, upon substituting Eqs. 3.12 and 3.13, with
respect to the control input u:
J_P = \phi(x_f, t_f) + \int_0^{t_f} L(x, u, t)\, dt, \quad L(x, u, t) = 0
    = \min_{i=1,\ldots,n} J_{E_i}(x_{E_i,f})
    = \min\left\{ \min_{x_s \in X_{safe}} \| x_{pos,E_1} - x_s \|, \; \ldots, \; \min_{x_s \in X_{safe}} \| x_{pos,E_n} - x_s \| \right\} \qquad (3.37)
The problem is a free-time problem with terminal constraints, meaning there is no
fixed time at which the problem ends, but it ends when specified conditions are met.
Assuming that we are not interested in situations where the pursuer fails to capture the
evaders before they reach the safe house (for which the objective function would have no
quantitative meaning), the success terminal condition, by expressing Eq. 3.10 in another
format, is as follows:
\psi(x_f, t_f) = \sum_{i=1}^{n} z_i(t_f) = 0 \qquad (3.38)
Define the Hamiltonian with costates λ:

H = L + \lambda^T(t) \cdot f(x, u, t)
  = \begin{bmatrix} \lambda_1 & \cdots & \lambda_m \end{bmatrix}
    \begin{bmatrix} f_P(x_P, u_P) \\ f_{E_1}(x_{E_1}, u_{E_1}) \\ \vdots \\ f_{E_n}(x_{E_n}, u_{E_n}) \end{bmatrix} \qquad (3.39)
The m costate equations of the problem are

\dot{\lambda}(t) = -\frac{\partial H}{\partial x} \qquad (3.40)
We also have a terminal constraint multiplier ν that does not change over time, i.e.,

\dot{\nu} = 0 \qquad (3.41)
For optimality (i.e., the control input u is optimal), the m costates are governed by the
following end-point conditions (we will from this point on denote all optimal conditions
with an asterisk):

\lambda^*(t_f) = \frac{\partial \phi(x_f, t_f)}{\partial x} + \frac{\partial \psi}{\partial x} \cdot \nu^T \qquad (3.42)
As this is a problem with free final time, an additional terminal condition is imposed on
the Hamiltonian:

H(x_f, u_f, t_f, \lambda_f) + \frac{\partial \phi(x_f, t_f)}{\partial t} + \nu^T \cdot \frac{\partial \psi(x_f, t_f)}{\partial t} = 0 \qquad (3.43)
whereas the stationarity condition for a solution extremum is given by

\frac{\partial H}{\partial u} = 0 \qquad (3.44)

Finally, to confirm that the extremum is a maximum, the following condition must be met:

\frac{\partial^2 H}{\partial u^2} < 0 \qquad (3.45)
Although the border patrol algorithm proposed in the next chapter is not based
on the standard optimal control problem formulation explained above, it is nonetheless
important to outline this framework. In Chapter 5, the trajectories generated by the
algorithm developed in this thesis are compared with those generated by the GPOPS
numerical optimal control solver, which checks its solution with the optimality conditions
presented here.
Chapter 4
Algorithms
In the previous chapter, the Pursuit-Evasion problem was presented and, using differential
game theory to make assumptions about the evaders’ control law, was converted into an
optimal control problem. Although numerical optimal control methods exist in the literature
and in practice, they may require significant computational resources, do not provide
guarantees of convergence to a feasible trajectory, and generally require an initial guess
close to the optimal solution in order to converge. In this problem, the
goal is to find a practical path planner that can generate such trajectories for autonomous
vehicles in real-time, so any such planner is required to be relatively efficient and
generate paths for the vehicle to follow quickly, to produce feasible trajectories
in any scenario, and to not require a priori knowledge of the solution.
Due to these requirements, we turn to the sampling-based methods of robot motion
planning algorithms. In roadmap and cell decomposition methods, the configuration
space is partitioned based on the location of obstacles, but in our case, the goal is
not obstacle-avoidance but optimizing the vehicle’s ability to capture evaders, so an
efficient decomposition structure may be difficult to generate. In potential field methods,
a potential function is assigned to the configuration space based on the knowledge of
obstacles, but in Pursuit-Evasion games, knowledge of regions of high and low potential
is not readily available. The advantage of sampling-based path planning, in contrast
to these methods, is that it does not require a priori knowledge of the features of the
problem. Points are sampled at random, and trajectories are found as experience with the
environment is accumulated. In particular, the Rapidly-exploring Random Tree (RRT)
is the sampling-based method of interest.
When adopting the RRT algorithm for use, the differences between the Pursuit-
Evasion problem of interest and the problem for which the RRT was intended mean that
simple application of the sampling-based method without modifications will not suffice.
To bridge this gap while maintaining computational and optimality requirements, the
fundamental concepts of the dynamic programming approach to optimization are used. It
will be seen in the explanation of the border patrol algorithms that although the canonical
dynamic programming solution is not adhered to, Bellman’s principle of optimality and
the notion of backward induction play a significant role in their designs.
This chapter provides a brief introduction to existing RRT algorithms and the basic
ideas of dynamic programming, followed by the modifications to the RRT proposed in this
paper for application to the Pursuit-Evasion optimal control problem presented earlier.
4.1 Rapidly-exploring Random Trees
The Rapidly-Exploring Random Tree (RRT) was developed for obstacle avoidance and
finding paths to specified destinations while having the ability to consider nonholonomic
constraints in the planning process [32]. The concept behind the RRT is to incrementally
map out the entire environment by sampling random points within it and connecting these
points to the existing tree structure. First, the initial position of the vehicle is added
as a node to the search tree. Then, random points in the environment are sampled, and
each is connected to the nearest node in the existing tree from which the route to the
new point is obstacle-free. As more sample points are added to the tree as nodes, the
environment is mapped out with obstacle-free paths to each node in the search tree [32].
Algorithm 1 Main program of the RRT algorithm
  V ← {x_init}, E ← ∅
  for i = 1 to N do
    x_new ← Sample()
    (V, E) ← AddNode(V, E, x_new)
  end for

Algorithm 2 AddNode function, RRT
  x_nearest ← Nearest(V, x_new)
  x_free ← ObstacleCheck(x_nearest, x_new)
  if x_free ≠ x_nearest then
    V ← V ∪ {x_free}
    E ← E ∪ {(x_nearest, x_free)}
  end if
  return (V, E)
The algorithm can be found in Algorithms 1 and 2 [31]. V contains the locations of
the nodes in the tree, while E consists of ordered pairs (x_parent, x_child) representing
the edges connecting the nodes. In the main program, random points in the configuration
space are sampled via the Sample function and incrementally added to the tree via the
AddNode function. In this function call, the closest existing node in V, x_nearest, is found
using Nearest and chosen to be the parent of the new node, x_new, to be added to V.
ObstacleCheck is called to find the configuration in obstacle-free space along the path
from x_nearest to x_new that is closest to x_new; if the entire path is obstacle-free, it simply
returns x_new. If this configuration is distinct from x_nearest, it is added to the tree.
The RRT algorithm is probabilistically complete in that the probability of finding a
feasible path from the initial configuration to the goal, if one exists, converges to one as
the number of samples increases [34]. However, the solution is not optimal in the sense of
finding the shortest path to the goal. A more recent algorithm, called the RRT∗, addresses
this issue [35].
The algorithm is presented in Algorithm 3. The RRT∗ modifies the AddNode function
in how it connects the new node to the tree: instead of connecting it to the nearest node
in the tree, all nearby nodes x_near within a certain radius of x_new, found with the Near
function, are evaluated as candidate parents. x_new is then connected to the candidate
node that yields the lowest cost (i.e., shortest path) from the initial configuration to x_new.
After adding x_new to the tree, the algorithm also checks whether rewiring nearby nodes
x_near to x_new would reduce the path length from the initial configuration to x_near. If so,
the existing edge in E containing x_near as the child is removed, and a new edge is added
in which x_new is the parent of x_near. It has been shown that the path returned by the RRT∗
converges to the time-optimal solution if the lower bounds for the AddNode search radius
given in [35] are satisfied.
Algorithm 3 AddNode function, RRT∗
  x_nearest ← Nearest(V, x_new)
  if ObstacleFree(x_nearest, x_new) then
    x_min ← x_nearest
    X_near ← Near(V, x_new, ‖V‖)
    for all x_near ∈ X_near do
      if ObstacleFree(x_near, x_new) then
        if Cost(x_near) + Length(x_near, x_new) < Cost(x_new) then
          x_min ← x_near
        end if
      end if
    end for
    V ← V ∪ {x_new}
    E ← E ∪ {(x_min, x_new)}
    for all x_near ∈ X_near \ {x_min} do
      if ObstacleFree(x_new, x_near) then
        if Cost(x_new) + Length(x_new, x_near) < Cost(x_near) then
          x_parent ← Parent(x_near)
          E ← E \ {(x_parent, x_near)}
          E ← E ∪ {(x_new, x_near)}
        end if
      end if
    end for
  end if
  return (V, E)
4.2 Dynamic Programming
In the RRT algorithm, the goal was a fixed area in the configuration space, but in the
Pursuit-Evasion problem at hand, the capture locations of the evaders (the “goals”) are
not readily known. Even if we assume that we know the control law of the evaders as
defined in Section 3.2 and can predict their trajectories, the continuous setting
of the problem makes it numerically impossible for the RRT to stumble upon these goal
states by picking random points in space for the pursuer to travel to and hoping that,
when the pursuer gets there, one of the evaders happens to be there as well. In order
to address this problem, the dynamic programming approach to solving optimization
problems will be used partially in the design of the border patrol algorithm.
Consider the following discrete-time problem: given a set of admissible states S for
the problem and a set of admissible actions A(s) that depends on the current state s ∈ S,
an instantaneous reward is acquired whenever an action a is taken in a given state s. The
value of this reward is represented by r(s, a), and the objective of the problem is to
acquire the greatest sum of rewards possible starting from any initial state s_0 in n time
steps. The basic idea of dynamic programming in solving problems of this type rests on
Bellman’s principle of optimality:
Any optimal policy has the property that, whatever the current state and decision, the
remaining decisions must constitute an optimal policy with regard to the state resulting
from the current decision. [39]
Define the value V^π(s) as the expected total reward, obtained from the objective
function, of being in s and following a control policy π that defines the action a taken
in any given state s. If V^*(s) denotes the value function for the optimal control policy,
the principle of optimality can be expressed mathematically by Bellman’s equation for
time-discounted problems, where γ ∈ (0, 1) is the discount factor [40]:

V^*(s) = \max_a \left\{ r_{t+1} + \gamma V^*(s_{t+1}) \;\middle|\; s_t = s,\ a_t = a \right\} \qquad (4.1)
In order to solve this equation to find the optimal control policy, the idea of backward
induction is used. Starting from the final time step n of the problem, there is no action
to be taken as the problem has terminated, so it follows that there is no reward to be
acquired. Thus, V^*(s_n) = 0 for all s. Then, go back one time step to n − 1 and calculate
the optimal value functions V^*(s_{n−1}) for all possible states s ∈ S using Eq. 4.1. Repeat
this process of moving back one time step and calculating the value functions for time
t − 1 based on the value functions for t found in the previous iteration, until t = 0 is
reached. When this process is complete, the optimal control policy has been found for
all s and their corresponding value functions computed.
Although dynamic programming provides a systematic method of finding the optimal
control policy, its usefulness in practical applications is limited due to the curse
of dimensionality [40]. As the number of state variables increases, the number of possible
states grows exponentially, and with it the number of control policies that must be
evaluated. Furthermore, dynamic programming requires discrete sets of states, actions,
and time steps: using the dynamic programming approach for continuous problems
requires discretizing these sets, and for the computational time to be reasonable in the
Pursuit-Evasion problem of interest, the resolution of the discretization mesh would be
too poor for meaningful results.
Still, dynamic programming principles can be borrowed to overcome the issue men-
tioned in the beginning of this section, illustrated in Fig. 4.1. In the research problem
described in Chapter 3, a multi-stage problem is to be solved: for instance, if three
evaders are initially on the loose, the problem would have three stages:
• Stage 1: All three evaders are active, while the pursuer seeks to capture/immobilize
one of them
• Stage 2: One evader has been immobilized, while the remaining two evaders are
active. The pursuer seeks to capture/immobilize one of them
• Stage 3: Two evaders have been immobilized, while the remaining evader is active.
The pursuer seeks to capture/immobilize this last evader
Figure 4.1: Dynamic programming applied to the border patrol problem: (a) Stage 3;
(b) Stage 2; (c) Stage 2 refinement; (d) Stage 1.
It is clear that the optimal control policy for Stage 3 is known: it is the SPSE
minimax strategy proposed by Isaacs, provided in Section 3.2. Thus, if an algorithm can
be found that determines the optimal control policies for Stages 1 and 2 of the problem, the
overall solution to the problem is complete. To find the solution to Stage 2 of the problem,
the RRT algorithm can be used to sample random points and try various trajectories for P
to follow; once P has completed such a trajectory, it switches to the SPSE minimax
strategy to capture the evader (Fig. 4.1b). Although using the SPSE strategy is not
optimal in this case, it is important to provide a policy that captures the evader so that
Stage 2 can be connected to Stage 3. By sampling more points to refine and improve the
trajectory, the portion of Stage 2 that relies on the SPSE strategy is reduced (Fig. 4.1c),
and the solution is brought closer to optimal. Finally, once Stage 2 is complete, Stage 1
can be carried out in a procedure similar to that of Stage 2 (Fig. 4.1d).
It will be seen that the sampling-based algorithm proposed for border patrol, in
fact, does not follow the backward induction technique by completing Stage n before
commencing Stage n − 1; all stages are sampled and refined simultaneously. What this
illustration demonstrates, however, is that the principle of optimality forms the backbone
of the algorithm developed for this problem. As we already know the minimax optimal
solution for Stage 3, the optimal value function for Stage 3 is known. We then proceed to
estimate the value functions for Stage 2 based on the optimal values for Stage 3 using the
RRT algorithm, and progressively improve them as more samples are taken. At the same
time, the value functions for Stage 1 are estimated based on the value function estimates
for Stage 2. As the number of samples made increases, the value functions approach the
optimal values, and the non-optimal influence of the SPSE minimax strategy used to
connect the different stages gradually dies out.
4.3 RRT Modification for SP2E
When adopting the RRT for use in Pursuit-Evasion problems, several conceptual
differences first need to be brought to the forefront. In the RRT∗, the accumulated cost
of each node is easily computed as the path length from the starting configuration of
the vehicle to the node. However, in the border patrol scenario, there is no accumulated
cost, as can be seen by simple examination of Eq. 3.13. Since the RRT∗ uses accumulated
cost as the metric for connecting and rewiring nodes in the tree, any suitable algorithm
for the border patrol problem must use a different means to determine how nodes are
connected.
Also, the number of states of the problem has been greatly increased due to the
presence of evaders. Although a natural extension of the RRT would be to include the
states of the evaders in its Sample procedure, this is computationally very expensive:
assuming unicycle dynamics for all players, the state vector size would increase from
2 without evaders to (2 + 2n) for n evaders. Thus, the number of nodes that would
need to be sampled in order to find a feasible trajectory would dramatically increase.
Furthermore, much computational effort would be involved in order to find a control
input for P that would cause the states of the evaders to arrive at the values for the new,
randomly sampled, node. In light of such computational difficulties, the algorithm must
keep the size of the sample space down while retaining information about the states that
are excluded from the sample space.
Algorithm 4 Main program of the SP2E Algorithm
  x_init ← [x_{P,init} x_{E1,init} x_{E2,init}]^T
  V ← {x_init}, E ← ∅
  for i = 1 to N do
    x_{P,new} ← Sample(V)
    (V, E) ← AddNode(V, E, x_{P,new})
  end for
4.3.1 Sampling
The main program of the algorithm developed for the 2-evader border patrol problem
is shown in Algorithm 4 and samples only xP when determining the state x of a new
node. Since the values of xE1 and xE2 are also needed for x, they are computed based
on the edges connecting the new node to the initial configuration (i.e. the root node).
The procedure for computing these states will be explained in 4.3.2.
In order to improve the efficiency of the algorithm, it is desired that the samples,
although random, be more evenly distributed in the sample space. The Sample(V) func-
tion used in Algorithm 4 takes a random sample for xP and checks whether it is at least
a required distance away from the closest node in V . If so, it returns that sample; if
not, it takes a new random sample and continues to do so until it satisfies the minimum
distance requirement.
Algorithm 5 AddNode function, SP2E
  x_nearest ← Nearest(V, x_{P,new})
  (x_{E1,curr}, x_{E2,curr}) ← Steer(x_nearest, x_{P,new})
  x_parent ← x_nearest
  x_new ← [x_{P,new} x_{E1,curr} x_{E2,curr}]^T
  X_near ← Near(V, x_{P,new}, ‖V‖)
  for all x_near ∈ X_near do
    (x_{E1,curr}, x_{E2,curr}) ← Steer(x_near, x_{P,new})
    x_curr ← [x_{P,new} x_{E1,curr} x_{E2,curr}]^T
    if V^{π_S}(x_curr) > V^{π_S}(x_new) then
      x_parent ← x_near
      x_new ← x_curr
    end if
  end for
  V ← V ∪ {x_new}
  E ← E ∪ {(x_parent, x_new)}
  for all x_near ∈ X_near \ {x_parent} do
    (x_{E1,curr}, x_{E2,curr}) ← Steer(x_new, x_{P,near})
    x_curr ← [x_{P,near} x_{E1,curr} x_{E2,curr}]^T
    if V^{π_S}(x_curr) > V^{π_S}(x_near) then
      x_parent ← Parent(x_near)
      V ← V \ {x_near}
      E ← E \ {(x_parent, x_near)}
      x_near ← x_curr
      V ← V ∪ {x_near}
      E ← E ∪ {(x_new, x_near)}
    end if
  end for
  return (V, E)
4.3.2 Node Connection
In the AddNode function of the 2-evader border patrol algorithm shown in Algorithm 5,
the general two-phase structure of the RRT∗ algorithm is used: x_new is added to the tree
and connected to the best candidate parent node, and then nearby nodes are rewired
to x_new if doing so improves their associated costs. When considering different nearby
nodes x_near as parents of x_new, x_{E1,new} and x_{E2,new} will differ depending on which
node is chosen to be the parent. In order to calculate these values, the Steer(x_near, x_{P,new})
function reads the positions of P, E1, and E2 at x_near, and returns the final positions of
E1 and E2 if P were to travel in a straight line from x_{P,near} to x_{P,new} while E1 and E2
followed Isaacs’ control law of Section 3.2 for the duration of P’s travel from
x_{P,near} to x_{P,new}.
4.3.3 Cost Estimate
The metric used for evaluating the benefit of connecting a node to a candidate parent
node is based on the positions of the evaders computed by the Steer function explained in
the previous section, and on the expected reward should P assume a one-by-one SPSE
pursuit policy for the remainder of the problem (as introduced in Section 4.2). Let π_{1,2}(x)
be the pursuit policy where P uses Isaacs’ minimax strategy of Section 3.2 to capture E1
and, once E1 has been immobilized, uses Isaacs’ minimax strategy to capture E2. Similarly,
let π_{2,1}(x) be the pursuit policy where P uses Isaacs’ minimax strategy to capture E2
first and then E1. Also, let V^π(x) be the value, i.e. the expected total reward obtained
from the objective function, of state x if P follows a policy π. Then, the one-by-one SPSE
pursuit policy is

\pi_S(x) = \arg\max_{\pi \in \{\pi_{1,2},\, \pi_{2,1}\}} V^{\pi}(x) \qquad (4.2)
In Algorithm 5, for each candidate parent node x_near, the Steer function computes
the expected positions of E1 and E2 if P travels from x_{P,near} to x_new, and stores them
in x_curr. Then, the value of x_curr under the policy π_S is compared with that of the best
candidate parent found so far. If the value of x_curr is higher than the previous best,
x_near is set as the new best candidate parent, and the positions of E1 and E2 for x_new
are updated. This is repeated for all x_near ∈ X_near, after which x_new is added to the tree
and connected to the best candidate parent found at the end of this selection process.
4.3.4 Rewiring
When rewiring nearby nodes to xnew, the same Steer function described in Section 4.3.2
is called to determine the positions of E1 and E2 should xnear be rewired to xnew as
its parent. Steer(xnew,xP,near) reads the positions of P , E1, and E2 at xnew, and
returns the final positions of E1 and E2 if P were to travel in a straight line from xP,new
to xP,near, and E1 and E2 assumed Isaacs’ control law provided in Section 3.2 for the
duration of P ’s travel from xP,new to xP,near.
Similarly, the same metric described in Section 4.3.3 is used to determine whether setting
x_new as the parent node of x_near would improve the value of x_near. For each nearby
node x_near ∈ X_near, after the Steer function computes the expected positions of E1 and
E2 and stores them in x_curr, the value of x_curr under the policy π_S is compared with
the existing value of x_near. If it is higher, x_new is set as the new parent of x_near, and
the positions of E1 and E2 for x_near are updated.
The class and function definitions in the C++ implementation of the SP2E border
patrol algorithm can be found in Appendix B.
4.4 RRT Modification for SP3E
In the case of 3 evaders, an additional stage is added to the start of the problem where the
pursuer must consider how its actions will influence the movement of all three opponents.
In addition, depending on whether E1, E2, or E3 is captured first during this stage, there
are three possible combinations of remaining evaders still on the loose. Due to this added
complexity, the algorithm developed for this scenario produces a random tree that is a
hybrid of 4 different trees:
• Subtree 0: occurring at the start of the problem when none of the evaders have
been captured yet
• Subtree 1: when E1 has been captured and P needs to capture E2 and E3
• Subtree 2: when E2 has been captured and P needs to capture E1 and E3
• Subtree 3: when E3 has been captured and P needs to capture E1 and E2
Since a third evader is now introduced into the problem, x also includes the state of
E3. Thus, whenever Sample takes random values for xP , the values of xE1 , xE2 , and
xE3 will be computed using Steer.
Furthermore, as the random tree now contains 4 subtrees, each representing a different
scenario of the problem, it is necessary to store that information in x. Let zc be a compact
representation of the mobility of the evaders, i.e. which evaders have been captured. If
evader i is captured, then zc = i. If no evaders have been captured, then zc = 0:
z_c = \begin{cases} 0 & \text{if } z_1 = z_2 = z_3 = 1 \\ 1 & \text{if } z_1 = 0,\ z_2 = z_3 = 1 \\ 2 & \text{if } z_2 = 0,\ z_1 = z_3 = 1 \\ 3 & \text{if } z_3 = 0,\ z_1 = z_2 = 1 \end{cases} \qquad (4.3)
z_c then becomes a state stored in x whose purpose is to represent which subtree each
node belongs to. A node that belongs to Subtree i thus attains z_c = i and will be
called a Type i node for the rest of this thesis. Note that these 4 combinations of z_1, z_2,
and z_3 are the only combinations that the nodes in the random tree can attain.
The SP3E algorithm is presented in Algorithms 6 to 8. The general structure of the
algorithm remains unchanged, with the AddNode function consisting of a parent selection
phase for xnew followed by a rewiring phase where xnew is considered as a parent for
nearby nodes. The main differences between the SP2E and SP3E algorithms lie in the
Algorithm 6 Main program of the SP3E Algorithm
  z_{c,init} ← 0
  x_init ← [x_{P,init} x_{E1,init} x_{E2,init} x_{E3,init} z_{c,init}]^T
  V ← {x_init}, E ← ∅
  for z_c = 1 to 3 do
    x_spawn ← SpawnNode(x_init, z_c)
    V ← V ∪ {x_spawn}
    E ← E ∪ {(x_init, x_spawn)}
  end for
  for i = 1 to N do
    x_{P,new} ← Sample(V)
    (V, E) ← AddNode(V, E, x_{P,new})
  end for
Algorithm 7 AddNode function, SP3E (connection phase)
  x_nearest ← Nearest(V, x_{P,new})
  (x_{E1,curr}, x_{E2,curr}, x_{E3,curr}) ← Steer(x_nearest, x_{P,new})
  x_parent ← x_nearest
  x_new ← [x_{P,new} x_{E1,curr} x_{E2,curr} x_{E3,curr} z_{c,nearest}]^T
  if z_{c,new} = 0 then
    z_{c,spawn} ← ChooseType(V)
    x_spawn ← SpawnNode(x_new, z_{c,spawn})
  else
    x_spawn ← ∅
  end if
  X_near ← Near(V, x_{P,new}, ‖V‖)
  for all x_near ∈ X_near do
    (x_{E1,curr}, x_{E2,curr}, x_{E3,curr}) ← Steer(x_near, x_{P,new})
    x_curr ← [x_{P,new} x_{E1,curr} x_{E2,curr} x_{E3,curr} z_{c,near}]^T
    if z_{c,curr} = 0 then
      z_{c,spawn,curr} ← ChooseType(V)
      x_{spawn,curr} ← SpawnNode(x_curr, z_{c,spawn,curr})
    else
      x_{spawn,curr} ← ∅
    end if
    if V^{π_S}(x_curr) or V^{π_S}(x_{spawn,curr}) > V^{π_S}(x_new) then
      x_parent ← x_near
      x_new ← x_curr
      x_spawn ← x_{spawn,curr}
    end if
  end for
  V ← V ∪ {x_new}
  E ← E ∪ {(x_parent, x_new)}
  if x_spawn ≠ ∅ then
    V ← V ∪ {x_spawn}
    E ← E ∪ {(x_new, x_spawn)}
  end if
Algorithm 8 AddNode function, SP3E (continued - rewiring phase)
  X_check ← {x_new, x_spawn}
  for all x_check ∈ X_check do
    X_near ← Near(V, x_{P,check}, ‖V‖)
    for all x_near ∈ X_near \ Ancestor(x_check) do
      if z_{c,check} = z_{c,near} then
        V_sub ← SubTree(V, E, x_near)
        (x_{E1,curr}, x_{E2,curr}, x_{E3,curr}) ← Steer(x_check, x_{P,near})
        x_curr ← [x_{P,near} x_{E1,curr} x_{E2,curr} x_{E3,curr} z_{c,near}]^T
        V′_sub ← USubTree(V, E, x_near, x_curr)
        if max(V^{π_S}(x ∈ V′_sub)) > max(V^{π_S}(x ∈ V_sub)) then
          x_parent ← Parent(x_near)
          V ← V \ V_sub
          E ← E \ {(x_parent, x_near)}
          x_near ← x_curr
          V ← V ∪ V′_sub
          E ← E ∪ {(x_check, x_near)}
        end if
      end if
    end for
  end for
  return (V, E)
hybridization of the subtrees, the way in which cost estimates are used to evaluate node
connections, and the rewiring process. These differences will be explained below.
4.4.1 Tree Hybridization
In order to join Subtrees 1, 2, and 3 to Subtree 0, it is necessary that the root nodes
of Subtrees 1, 2, and 3 have Type 0 parent nodes. The SpawnNode(x, i) function creates
a Type i node by steering the problem from x to the state where E_i would be captured
if P were to employ Isaacs’ SPSE minimax strategy. The node spawned by SpawnNode
serves as the root node of Subtree i and is connected to the node x that spawned it.
In the main program (Algorithm 6), this function is called for the Type 0 root node of
the random tree, x_init. In order to start up the algorithm so that it contains at least one
node for each subtree, SpawnNode is called three times to create Type 1, 2, and 3 nodes
connected to x_init.
In the AddNode function (Algorithm 7), SpawnNode serves a different purpose. When
a parent node for xnew is found, xnew will assume a value for zc equal to that of xparent
(i.e. if the parent node is Type i, the new node will also be Type i). In order to determine
which candidate parent is best, the values obtained by connecting xnew to the candidate
parent as described in Section 4.3.3 are compared. However, if the candidate parent node
is Type 0, it does not have an associated value V πS , since the policy πS assumes only two
mobile evaders are present, and Type 0 nodes by definition assume three mobile evaders.
Thus, after Steer is called to drive the state from the candidate parent node to xnew,
SpawnNode is called to create a Type 1, 2, or 3 node that will have an associated value;
the value of this spawn node will then be used to compare with that of other candidate
parent nodes (or their respective spawn nodes). The spawn node type is determined by
the ChooseType function, which randomly picks Type 1, 2, or 3 based on the number of
nodes of each type currently in V :
n1 = ‖V(zc = 1)‖ + ε
n2 = ‖V(zc = 2)‖ + ε
n3 = ‖V(zc = 3)‖ + ε

P(ChooseType = i) = ni / (n1 + n2 + n3),   i = 1, 2, 3   (4.4)
Here, ε is a constant that adjusts the bias towards growing the largest current
subtree. The higher the value of ε, the lower the bias and the more even the probabilities
between the different types. If a Type 0 node is ultimately chosen as the parent of xnew,
the spawn node is also added to V as well as xnew.
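As a concrete sketch, the selection rule of Eq. (4.4) can be implemented by partitioning the unit interval in proportion to the biased subtree counts. The names below (`choose_type`, `subtree_counts`, `eps`) are illustrative, not from the thesis implementation; the caller supplies a uniform random sample.

```cpp
#include <vector>

// Sketch of ChooseType, Eq. (4.4): the probability of growing Subtree i is
// proportional to its current node count plus the bias constant eps.
// u is a uniform sample in [0, 1) supplied by the caller.
int choose_type(const std::vector<int> &subtree_counts, double eps, double u) {
    double n1 = subtree_counts[0] + eps;
    double n2 = subtree_counts[1] + eps;
    double n3 = subtree_counts[2] + eps;
    double total = n1 + n2 + n3;
    // partition [0, 1) into three intervals of length n_i / total
    if (u < n1 / total) return 1;
    if (u < (n1 + n2) / total) return 2;
    return 3;
}
```

With `eps = 0` the selection is purely proportional to subtree size; increasing `eps` flattens the distribution towards uniform, matching the bias discussion above.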
4.4.2 Rewiring
The rewiring process described in Algorithm 8 is similar to that in Section 4.3.4, except that
when a specific rewiring is being evaluated, the values of all descendant nodes affected by
that rewiring are checked. The SubTree(V , E,xnear) function returns the branch Vsub
of V whose root is xnear. Meanwhile, the USubTree(V , E,xnear,xcurr) function also
returns the branch V ′sub of V whose root is xnear, but replaces xnear with xcurr and
updates the states of all its descendant nodes. Thus, whenever rewiring xnear to xnew is
considered, these two functions are called to generate the original and updated branch,
respectively. If the highest value of any node in the updated branch V ′sub is greater than
that of the original branch Vsub, the rewiring is carried out. Note that rewiring is only
considered if xnear and xnew are of the same type.
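The rewiring acceptance test can be sketched as a comparison of the best value over the two branches. Node values are passed in directly here for illustration; in the actual algorithm they come from evaluating the policy πS over the nodes returned by SubTree and USubTree.

```cpp
#include <algorithm>
#include <vector>

// Sketch of the rewiring test in Algorithm 8: rewire only if the best value
// over the updated branch V'_sub exceeds the best value over the original
// branch V_sub. Both vectors are assumed non-empty (a branch always
// contains at least its root).
bool should_rewire(const std::vector<double> &orig_branch_values,
                   const std::vector<double> &updated_branch_values) {
    double best_orig = *std::max_element(orig_branch_values.begin(),
                                         orig_branch_values.end());
    double best_new = *std::max_element(updated_branch_values.begin(),
                                        updated_branch_values.end());
    return best_new > best_orig;
}
```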
Chapter 5
Simulations
The algorithms presented in the previous chapter are applied to the formulation given in
Section 3.1.2 for different scenarios. To demonstrate their ability to generate trajectories
that are close to the optimal solution, we compare the results with those generated by
the GPOPS numerical optimal control method [41].
5.1 GPOPS
It is important to find the optimal solution for the control problem presented above
as a benchmark for the online path planning algorithm to be presented in the next
section. This will serve as a proper indicator of how well the algorithm performs. In
this thesis, a numerical method package known as the General Pseudospectral Optimal
Control Software (GPOPS) was used for this purpose. This solver first converts the
optimal control problem into an equivalent nonlinear programming problem and solves
that instead [41]. Although GPOPS is an offline numerical method that requires a good
initial guess of the solution for it to converge and can require several minutes of runtime
to produce the optimal solution, it is used here as a benchmark to compare the border
patrol algorithms to.
GPOPS is an open-source numerical optimal control software that uses the Radau
Pseudospectral Method [41] and is superior to numerical shooting methods for solv-
ing optimal control problems in terms of radii of convergence [42]. It is a collocation
method that converts the dynamic equations of motion into a system of equations using
time-marching methods, thereby changing the optimal control problem into a nonlinear
programming problem of the form
min J(q) (5.1)
subject to
g(q) = 0 (5.2)
where, if xi and ui are the state and control at different times ti and tM = tf , then
q = (x0,x1, ...,xM ,u0,u1, ...,uM−1)
The constraints in Eq. (5.2) include the system of equations resulting from the dis-
cretization of the dynamic equations of motion, and the terminal constraints ψ [42]. In
GPOPS, the collocation points are the Legendre-Gauss-Radau points, and Gaussian
quadrature is used to approximate the integrals in the problem [41]. The software also checks
whether the optimality conditions of the equivalent optimal control problem are indeed
satisfied by the generated numerical solution.
Because the evaders have a switching control law as indicated by Eq. (3.36), a sigmoid
function was used to remove the discontinuity at the point where the evader’s control
input switches for the purpose of allowing GPOPS to converge to a solution:
p(±, xEi, yEi) = 1 / (1 + e^{±ζ[(vP/vE)·((xEi − xD)² + (yEi − yD)²) − ((xP − xD)² + (yP − yD)²)]})   (5.3)
where ζ is the sigmoid sharpness parameter. The higher the value of ζ, the more the
sigmoid function resembles a step function. In other words, increasing the sharpness
Figure 5.1: SP2E Optimal Solution using GPOPS
parameter makes the switch more sudden and accurate, but also brings the function closer
to the original discontinuity, making it less likely that the numerical method will converge to a
solution. Upon incorporating the sigmoid function into the evader’s dynamics, the state
equations of evader i would then be
[ẋEi  ẏEi]T = [zi vE cos θEi  zi vE sin θEi]T   (5.4)

where

θEi = tan⁻¹( (p(−, xEi, yEi)·y* + p(+, xEi, yEi)·ysafe − yEi) / (p(−, xEi, yEi)·x* + p(+, xEi, yEi)·xsafe − xEi) )
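As a sketch of Eq. (5.3), the switching term can be computed directly. The function and parameter names below are illustrative; `sign` selects p(+) or p(−), and `zeta` is the sharpness parameter.

```cpp
#include <cmath>

// Sigmoid switching function of Eq. (5.3). The bracketed argument compares
// the evader's speed-scaled squared distance to the safe house D against
// the pursuer's squared distance to D; a large zeta makes the function
// approach a step. All inputs are illustrative coordinates.
double sigmoid_switch(int sign, double zeta, double vP, double vE,
                      double xE, double yE, double xP, double yP,
                      double xD, double yD) {
    double dE2 = (xE - xD) * (xE - xD) + (yE - yD) * (yE - yD);
    double dP2 = (xP - xD) * (xP - xD) + (yP - yD) * (yP - yD);
    double arg = (vP / vE) * dE2 - dP2;
    return 1.0 / (1.0 + std::exp(sign * zeta * arg));
}
```

Note that p(+) + p(−) = 1 for any argument, so the two sigmoids smoothly blend the evader's two candidate headings in Eq. (5.4).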
The optimal solution for a particular SP2E scenario found using GPOPS is shown
in Fig. 5.1. For more information on GPOPS or numerical optimal control methods, we
refer the reader to [42, 41, 43]. The test case assumes the pursuer and evaders have equal
speeds, and that the pursuer’s capture radius is zero.
(a) N = 80    (b) N = 200
(c) N = 280    (d) N = 400
Figure 5.2: Progression of the random tree produced by the SP2E border patrol algorithm
as the number of nodes increases. The nodes are represented by the cyan dots and the
edges connecting them by the dashed lines.
5.2 SP2E
Algorithms 4 and 5 were implemented in C++ and run for various test cases. In Fig. 5.2,
the pursuer is faster than the evaders by a factor of 1.5, and has a capture radius of 0.
The random tree is shown in cyan, whereas the trajectories of the agents are shown in
solid lines. The pursuer’s trajectory is the path from the root node of the random tree
to the node with the highest value. Once the pursuer has arrived at that node, it then
proceeds to capture the evaders using Isaacs’ SPSE minimax strategy one at a time. The
crosses indicate the positions of the agents when the evaders are captured. The subplots
show the progression of the random tree as N , the number of nodes added to it, increases.
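The trajectory extraction described above (the path from the root to the highest-value node) amounts to walking parent pointers backwards, as the Node class in Appendix B stores them. `BasicNode` below is a stripped-down stand-in for illustration.

```cpp
#include <algorithm>
#include <vector>

// Minimal stand-in for the tree node of Appendix B: each node knows its
// parent (0 for the root) and its value.
struct BasicNode {
    BasicNode *parent;
    double payoff;
};

// Walk from the highest-value node back to the root, then reverse so the
// returned waypoint sequence runs root-first.
std::vector<BasicNode *> extract_path(BasicNode *best) {
    std::vector<BasicNode *> path;
    for (BasicNode *n = best; n != 0; n = n->parent)
        path.push_back(n);
    std::reverse(path.begin(), path.end());
    return path;
}
```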
Simulations were also run for various pursuer-evader speed ratios. The trajectories
for these test cases were presented in [44] and are shown here in Fig. 5.3 by the solid lines.
The pursuer has a capture radius of 0 m, while the speed ratio is varied from 1 to 1.7.
The random tree was preset to grow until it contained 200 nodes. For these scenarios,
the approximate runtime on a 2.10 GHz Intel Core 2 Duo laptop was 7 s.
The same scenarios were run using GPOPS and their trajectories represented in the
same figures by dashed lines. As can be seen, the results are very similar, and the
difference in objective function very minimal: for Fig. 5.3a, the closest capture distance
predicted by GPOPS was 0.98245 m, whereas the RRT algorithm generated a trajectory
whose closest capture distance was 0.97667 m. Though the capture of E1 may occur at a
slightly different location, the capture location of E2 predicted by both methods is the
same. Since E2 is able to get closer to the safe house, the capture location of E1 in this
case has no influence on the objective function.
Simulations were also carried out where the pursuer’s capture radius was varied for
the same player initial positions. Figure 5.4, also presented in [44], shows the trajectories
generated by the RRT algorithm and GPOPS for the scenario where the pursuer-evader
speed ratio was fixed at 1.5, whereas the pursuer’s capture radius was varied from 0 to
0.5 m. Once again, the trajectories produced by the RRT and GPOPS algorithms are
similar.
(a) vP/vE = 1    (b) vP/vE = 1.4
(c) vP/vE = 1.5    (d) vP/vE = 1.7
Figure 5.3: Trajectories generated by the SP2E border patrol algorithm (solid lines) and
the GPOPS numerical method (dashed lines) for different speed ratios. The random tree
produced by the RRT algorithm was grown until it contained 200 nodes.
(a) rcapt = 0    (b) rcapt = 0.20 m
(c) rcapt = 0.35 m    (d) rcapt = 0.50 m
Figure 5.4: Trajectories generated by the SP2E border patrol algorithm (solid lines) and
the GPOPS numerical method (dashed lines) for different capture radii. The random
tree produced by the RRT algorithm was grown until it contained 200 nodes.
(a) N = 80    (b) N = 160
(c) N = 280    (d) N = 400
Figure 5.5: Progression of the random tree produced by the SP3E border patrol algorithm
as the number of nodes increases. The nodes are represented by the dots and the edges
connecting them by the dashed lines.
5.3 SP3E
Algorithms 6 and 7 were implemented in C++ and run for various test cases. In Fig. 5.5,
the pursuer is faster than the evaders by a factor of 1.7, and has a capture radius of 0.
The random tree is shown in various colours: nodes belonging to Subtree 0 are shown in
cyan, Subtree 1 in red, Subtree 2 in magenta, and Subtree 3 in black. As was the case in
SP2E, the pursuer’s trajectory is the path from the root node of the random tree to the
node with the highest value. Once the pursuer has arrived at that node, it then proceeds
to capture the remaining two evaders using Isaacs’ SPSE minimax strategy one at a time.
The crosses indicate the positions of the agents when the evaders are captured.
Simulations were also run for various pursuer-evader speed ratios and the resulting
trajectories shown in Fig. 5.6 by the solid lines. The pursuer has a capture radius of 0
m, while the speed ratio is varied from 1.5 to 2.0. The random tree was preset to grow
until it contained 200 nodes. For these scenarios, the approximate runtime on a 2.10
GHz Intel Core 2 Duo laptop was 12 s.
The same scenarios were run using GPOPS and their trajectories represented by
dashed lines. Again, the results are very similar, and the difference in objective function
very minimal. In fact, in Fig. 5.6b, the border patrol algorithm actually outperforms
the trajectory generated by GPOPS: the closest capture distance predicted by GPOPS
was 0.31391 m, whereas the RRT algorithm generated a trajectory whose closest capture
distance was 0.55008 m. In this case, it is likely that GPOPS is stuck in a local maximum,
while the RRT algorithm presented in this thesis is able to escape such traps due to
its sample-based design. For the other scenarios presented in Fig. 5.6, the distances
from the safe house to the closest capture locations found by the border patrol
algorithm and by GPOPS are very similar.
Simulations were also carried out where the pursuer’s capture radius was varied for
the same player initial positions. Fig. 5.7 shows the trajectories generated by the RRT
algorithm and GPOPS for the scenario where the pursuer-evader speed ratio was fixed
at 1.6, while the pursuer’s capture radius was varied from 0 to 0.6 m. Once again, the
trajectories produced by the RRT and GPOPS algorithms are similar.
(a) vP/vE = 1.5    (b) vP/vE = 1.6
(c) vP/vE = 1.8    (d) vP/vE = 2.0
Figure 5.6: Trajectories generated by the SP3E border patrol algorithm (solid lines) and
the GPOPS numerical method (dashed lines) for different speed ratios. The random tree
produced by the RRT algorithm was grown until it contained 200 nodes.
(a) rcapt = 0    (b) rcapt = 0.20 m
(c) rcapt = 0.40 m    (d) rcapt = 0.60 m
Figure 5.7: Trajectories generated by the SP3E border patrol algorithm (solid lines) and
the GPOPS numerical method (dashed lines) for different capture radii. The random
tree produced by the RRT algorithm was grown until it contained 300 nodes.
Chapter 6
Experiments
To demonstrate the feasibility of using the border patrol algorithms developed earlier for
physical applications, experiments were conducted on ground robots for both the two-
and three-evader scenarios. By developing a suitable closed-loop control law for the pursuer
to follow the trajectory produced by the algorithm, and by incorporating feedback of the
evaders’ positions, a complete picture of the border patrol pursuit system is provided
that can capture border trespassers before they arrive at the safe house.
6.1 Setup
The experimental setup consists of four iRobot Creates, shown in Fig. 6.1, three acting
as the evaders and one as the pursuer. The four robots are identical and acquire their
poses from a 10-camera Vicon motion capture system mounted at the perimeter of the
test facility. The experimental scenario is set up as defined in Section 3.1.2, with lb = 5 m and
rsafe = 1.25 m. The evader robots begin at Xenter, the left-half perimeter of X, while the
pursuer is allowed to begin at any point in X. Once the problem begins, the pursuer
robot must compute a trajectory to follow so it can capture all evaders before they arrive
at Xsafe. An evader is assumed to be captured if it is within the pursuer’s capture region
Xcapt.
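The capture test described above reduces to a distance check against the capture radius; a minimal sketch, with illustrative names:

```cpp
// An evader is captured once it lies within the pursuer's capture region
// X_capt, i.e. within distance r_capt of the pursuer. Comparing squared
// distances avoids the square root.
bool is_captured(double xP, double yP, double xE, double yE, double r_capt) {
    double dx = xE - xP, dy = yE - yP;
    return dx * dx + dy * dy <= r_capt * r_capt;
}
```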
Figure 6.1: Experimental setup consisting of four iRobot Create ground robots in a 5 m
by 5 m environment
The software modules implemented for this experiment, and the data flow between them,
are described in Fig. 6.2. Each robot continually reads its own pose and the pose of its adversaries
from the Vicon Motion Capture System. For the evaders, these poses are sent to the
guidance module (labelled Isaacs algorithm) that computes Isaacs’ minimax destination.
This destination is then sent to the controller, which uses this reference state to calculate
the control input to be sent to the motion driver.
The pursuer’s software architecture is more complex, as the guidance system needs to
switch between the trajectory generated by the RRT border patrol algorithm and that
generated by the Isaacs algorithm. The waypoint planner module handles this task of
determining which guidance module to follow. For the most part, the RRT algorithm’s
waypoints are followed whenever they are published. However, when the RRT algorithm
publishes a waypoint that is the expected capture location of an evader, or when the
pursuer has completed all waypoints published by the RRT algorithm, the waypoint
planner hands control over to the Isaacs algorithm module for the purpose of capturing
a specific evader.
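The waypoint planner's hand-off rule described above can be sketched as a simple mode selection; the enum and flag names below are illustrative, not from the thesis software.

```cpp
enum GuidanceMode { FOLLOW_RRT, ISAACS_PURSUIT };

// Follow RRT waypoints until the current waypoint is an expected capture
// location, or the waypoint list is exhausted; then hand control to the
// Isaacs module to run down a specific evader.
GuidanceMode select_mode(bool waypoint_is_capture_location,
                         bool all_waypoints_done) {
    if (waypoint_is_capture_location || all_waypoints_done)
        return ISAACS_PURSUIT;
    return FOLLOW_RRT;
}
```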
Figure 6.2: Experimental setup data flow
6.2 Controller Design
The waypoints provide the controller module with reference states that the robot needs to
arrive at sequentially before moving on to the next waypoint. Given an inertial reference
frame FV whose coordinates are given by the Vicon Motion Capture System and a goal
state xg = [xg yg θg]T , the control law developed to drive the robot to this state is based
on the motion controller proposed in [45]. Assume a robot reference frame FR whose
origin is pinned to the robot center of mass and whose x-axis points in the direction of
the robot heading θV , as shown in Fig. 6.3. As the RRT algorithm only provides the x
and y coordinates of the goal state xg, the desired final heading θg is specified to be the
angle between the vector from the initial position of the robot to the goal state and the
x-axis of FV . Also assume a goal inertial frame FG translated and rotated with respect
to FV by (xr, yr) and θg. The pose error vector in FR is given by
e = [x  y  θ]T   (6.1)
As the control inputs to the robot are its forward and angular speeds v and ω, the task,
then, is to find a controller of the form
[v(t)  ω(t)]T = K·e,   K = [k11 k12 k13; k21 k22 k23],   kij = k(t, e)   (6.2)
such that the pose error vector e goes to zero in finite time [45].
The kinematics of the robot in the goal inertial frame is given by
[ẋ  ẏ  θ̇]T = [cos θ  0; sin θ  0; 0  1][v  ω]T   (6.3)
Let r be the vector from the position of the robot to the goal state, and α be the
angle between r and the x-axis of FR. Perform a coordinate transformation into polar
coordinates in FG:
ρ = √(∆x² + ∆y²)   (6.4)

α = atan2(∆y, ∆x) − θ   (6.5)

β = −θ − α   (6.6)
where ρ is the distance between the robot and the goal and β is the angle between r and
the x-axis of FG. The kinematics expressed in these coordinates is
Figure 6.3: Robot kinematics
[ρ̇  α̇  β̇]T = [−cos α  0; (sin α)/ρ  −1; −(sin α)/ρ  0][v  ω]T   (6.7)
The proposed control law takes the form
[v  ω]T = [kρ  0  0; 0  kα  kβ][ρ  α  β]T   (6.8)
For this application, as we constrain the vehicles to travel at constant speeds, we set
kρ = kcρ/ρ, where kcρ is constant. The resulting closed loop system has been shown in
[45], for specific ranges of kρ, kα, and kβ, to be locally exponentially stable if α ∈ I1, where

I1 = (−π/2, π/2]   (6.9)
By choosing appropriate gains that lie within the range guaranteeing stability, a control
law was developed such that the robots can follow the specified trajectories to reach their
intended waypoints. If α happens to lie outside of I1 when a new waypoint is published,
the robot temporarily switches to an auxiliary control law where kρ = 0 so that the robot
can perform a stationary turn until α is driven back to within I1.
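The control law of Eqs. (6.4)-(6.8), together with the stationary-turn fallback, can be sketched as below. The gains `kc_rho`, `k_alpha`, `k_beta` are placeholders; [45] gives the ranges that guarantee stability. The sketch exploits the fact that α is the same in any inertial frame, so the errors can be computed directly from the Vicon-frame pose.

```cpp
#include <cmath>

const double kPi = 3.14159265358979323846;

// Wrap an angle into (-pi, pi].
static double wrap(double a) {
    while (a <= -kPi) a += 2.0 * kPi;
    while (a > kPi) a -= 2.0 * kPi;
    return a;
}

// One step of the controller: compute (rho, alpha, beta) per Eqs. (6.4)-(6.6)
// and apply Eq. (6.8). With k_rho = kc_rho / rho, the commanded forward speed
// is the constant kc_rho whenever alpha lies in I1; otherwise the auxiliary
// law (k_rho = 0) performs a stationary turn until alpha re-enters I1.
void control_step(double x, double y, double theta,      // robot pose in F_V
                  double xg, double yg, double thetag,   // goal pose
                  double kc_rho, double k_alpha, double k_beta,
                  double &v, double &omega) {
    double dx = xg - x, dy = yg - y;
    double alpha = wrap(std::atan2(dy, dx) - theta);     // Eq. (6.5)
    double beta = wrap(-(theta - thetag) - alpha);       // Eq. (6.6)
    if (alpha > -kPi / 2.0 && alpha <= kPi / 2.0) {
        v = kc_rho;                                      // (kc_rho/rho)*rho
    } else {
        v = 0.0;                                         // stationary turn
    }
    omega = k_alpha * alpha + k_beta * beta;             // Eq. (6.8)
}
```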
6.3 SP2E
With the setup and control law presented above, experiments were conducted with two
evaders to test the feasibility of the SP2E algorithm. As the robots have a diameter of
0.35 m, the smallest allowable capture radius was 0.5 m since reducing rcapt further would
cause them to crash into each other. The pursuer’s speed command was fixed at 0.15
m/s, and the random tree grown to 200 nodes before the robots began moving.
The results are presented in Figs. 6.4, 6.6, and 6.8. For comparison purposes, trajec-
tories produced by simulations for the same test cases are plotted on the same graphs.
The speeds and the distances traveled by the pursuer, as estimated using a Kalman filter
that uses Vicon data as position measurements, are plotted in Figs. 6.5, 6.7, and 6.9.
More detail on the Kalman filter design can be found in Appendix C. As can be seen, the
experimental trajectories (thick lines) and simulation trajectories (thin lines) are similar,
especially in Fig. 6.6. Although the simulation trajectory predicts slightly better results
in Figs. 6.4 and 6.8, this is mainly because the robot must perform stationary turns
at certain waypoints to satisfy the control law stability requirement in Eq. 6.9. The
turning constraint of the robot was also a factor, though a less significant one, in the
difference between predicted and actual capture locations of the evaders.
Figure 6.4: Experimental result of the SP2E border patrol algorithm, with vP/vE =
1.5, rcapt = 0.5m. The capture distance of the closest evader was 0.443 m.
Figure 6.5: Speed and distance traveled by the pursuer, as estimated by the Kalman
filter, for the case with vP/vE = 1.5, rcapt = 0.5m.
Figure 6.6: Experimental result of the SP2E border patrol algorithm, with vP/vE =
1.7, rcapt = 0.5m. The capture distance of the closest evader was 1.335 m.
Figure 6.7: Speed and distance traveled by the pursuer, as estimated by the Kalman
filter, for the case with vP/vE = 1.7, rcapt = 0.5m.
Figure 6.8: Experimental result of the SP2E border patrol algorithm, with vP/vE =
1.3, rcapt = 0.5m. The capture distance of the closest evader was 0.334 m.
Figure 6.9: Speed and distance traveled by the pursuer, as estimated by the Kalman
filter, for the case with vP/vE = 1.3, rcapt = 0.5m.
6.4 SP3E
Experiments were also conducted with three evaders using the SP3E algorithm. Once
again, the random tree was grown until it contained 200 nodes before the robots began
moving. The results are presented in Figs. 6.10, 6.12, and 6.14, while the speeds and
distances traveled, as estimated by the Kalman filter, are shown in Figs. 6.11, 6.13, and
6.15. The experimental and simulation trajectories are quite similar in all three
test cases, and although the third evader adds a significant level of complexity to the
algorithm, there does not appear to be any degradation in results from the SP2E case.
In all three trials, it can be seen from the speed plots that there are instances where
the pursuer needs to stop and perform a stationary turn so its heading is within the
range specified by Eq. 6.9 for the controller to be stable. This usually occurs right after
the pursuer has caught the first evader and needs to alter its heading significantly to
pursue the next evader; these turns are the main cause of the last evader being captured
closer to the safe house than predicted. There are also instances in which the pursuer performs a
stationary turn after arriving at its first waypoint (i.e. the first troughs in Figs. 6.11 and
6.13), but these do not seem to have a significant effect on the result, as the first evader
in both cases is captured almost exactly at the predicted capture location.
From these experiments, the feasibility of the border patrol algorithms is verified.
Having tested them on unmanned ground vehicles in scenarios where the number of
evaders, speed ratio, and initial positions were varied, the similarities between the
predicted and actual capture locations demonstrate that the algorithms can be used in
physical applications. The one variable in which the experimental setup did not allow
for significant variations was the capture radius, as the robot’s diameter provided a lower
limit of 0.5 m, while the 5 m width of the test facility limited the meaningfulness of trials
with capture radii greater than 0.6 m. Nevertheless, since the algorithm was initially
designed for the case of rcapt = 0, the results for test cases with smaller capture radii
should be even better than those presented in this chapter for larger capture radii.
Figure 6.10: Experimental result of the SP3E border patrol algorithm, with vP/vE =
1.5, rcapt = 0.6m. The capture distance of the closest evader was 0.584 m.
Figure 6.11: Speed and distance traveled by the pursuer, as estimated by the Kalman
filter, for the case with vP/vE = 1.5, rcapt = 0.6m.
Figure 6.12: Experimental result of the SP3E border patrol algorithm, with vP/vE =
1.5, rcapt = 0.5m. The capture distance of the closest evader was 0.598 m.
Figure 6.13: Speed and distance traveled by the pursuer, as estimated by the Kalman
filter, for the case with vP/vE = 1.5, rcapt = 0.5m.
Figure 6.14: Experimental result of the SP3E border patrol algorithm, with vP/vE =
1.7, rcapt = 0.5m. The capture distance of the closest evader was 0.629 m.
Figure 6.15: Speed and distance traveled by the pursuer, as estimated by the Kalman
filter, for the case with vP/vE = 1.7, rcapt = 0.5m.
Chapter 7
Conclusion
7.1 Contribution
In this thesis, an efficient sampling-based algorithm was presented for the border pa-
trol Pursuit-Evasion game, where a single vehicle was responsible for capturing multiple
trespassers before they found refuge in a target safe house. An appropriate model of
the evaders’ control law was first developed based on Isaacs’ Target Guarding problem,
extended to the case where the pursuer is faster than the evader, and demonstrated to be
the minimax solution in the SPSE case. Using this model, an algorithm was developed
for the resulting optimal control problem based on the Rapidly-exploring Random Tree
path planning algorithm and concepts from Dynamic Programming. Separate algorithms
were developed for the case of two and three evaders.
Simulations were run for different scenarios, including two and three evaders with
varying speed ratios, capture radii, and initial positions. By comparing the results with
the solution generated by the GPOPS numerical optimal control method, it was shown
that the trajectories produced by the algorithm were near-optimal and efficient in quickly
generating the solution. Experiments were also carried out to demonstrate the feasibility
of implementing this path planning algorithm on physical applications. A closed-loop
66
feedback system was set up for the ground robots to capture evaders and a control
law developed for following the trajectory produced by the border patrol algorithm.
Experimental results showed that the final trajectories followed by the robot were similar
to that predicted in simulation.
7.2 Future Work
Future work on these algorithms includes incorporating the non-holonomic constraints of the
robot into the algorithms for the purpose of generating trajectories that can be more
accurately followed by the vehicle. A possible method for including such non-holonomic
constraints, such as the minimum turning radius, is to include the heading angle of the
pursuer in the algorithm’s sampling procedure, and connect nodes within the random
tree using Dubins paths, which were shown in [46] to be optimal in terms of path length;
the choice among the six Dubins path types is classified for different scenarios in [47].
The addition of the heading angle during sampling increases the dimensionality of the
configuration space, so improvements in computational efficiency will be necessary for
the non-holonomic version of the algorithm to be efficient enough for real-time guidance
systems.
In addition, proving the algorithm’s convergence to the optimal solution as the number
of nodes in the tree increases and feasibility studies on the region in which the pursuer
is able to catch all evaders are theoretical tasks that would provide a more complete
picture of the border patrol problem. Also, while the algorithm is able to generate
near-optimal trajectories within several seconds, its computational efficiency can still be
improved. Such improvements would allow for expanding the algorithm
such that it continually improves the trajectory even as the robot is moving.
Appendix A
SPSE Target Guarding
State equations for the SPSE problem:

ẋP = vP cos θP    ẏP = vP sin θP
ẋE = vE cos θE    ẏE = vE sin θE

Objective function:

J = φ(xf) + ∫₀^tf L(x, uP, uE) dt,   L(x, uP, uE) = 0
  = √((xE,f − xD)² + (yE,f − yD)²) − rD

Terminal conditions:

ψ = √((xE − xP)² + (yE − yP)²) − rcapt = 0

Modified space:

x1 = xE − xP    x2 = yE − yP
x3 = xD − xE    x4 = yD − yE

The problem, in the modified space, is then

ẋ1 = vE cos θE − vP cos θP    ẋ2 = vE sin θE − vP sin θP
ẋ3 = −vE cos θE    ẋ4 = −vE sin θE
J = √(x3,f² + x4,f²) − rD   (A.1)

ψ = √(x1² + x2²) − rcapt = 0   (A.2)

Isaacs’ Main Equation:

H*(xf, uP,f, uE,f, λf) = 0

−vP(λ1 cos θP* + λ2 sin θP*) − vE((−λ1 + λ3) cos θE* + (−λ2 + λ4) sin θE*) = 0   (A.3)
From [1],

maxθP (λ1 cos θP + λ2 sin θP) = ρ1, where
ρ1 = √(λ1² + λ2²),   cos θP* = λ1/ρ1,   sin θP* = λ2/ρ1

and

minθE ((−λ1 + λ3) cos θE + (−λ2 + λ4) sin θE) = −ρ2, where
ρ2 = √((−λ1 + λ3)² + (−λ2 + λ4)²),   cos θE* = −(−λ1 + λ3)/ρ2,   sin θE* = −(−λ2 + λ4)/ρ2

Eq. A.3 then becomes

−vP ρ1 + vE ρ2 = 0   (A.4)

The costate equations, from Eq. 3.32, are:

λ̇1 = λ̇2 = λ̇3 = λ̇4 = 0   (A.5)

Parametric terminal surface:

x1 = rcapt cos s1    x2 = rcapt sin s1
x3 = s2    x4 = s3
From Eq. 3.34, the end-point conditions are:

∂φ/∂s1 = 0 = λ1(−rcapt sin s1) + λ2(rcapt cos s1)  →  λ2 = λ1 tan s1   (A.6)

∂φ/∂s2 = s2/√(s2² + s3²) = λ3   (A.7)

∂φ/∂s3 = s3/√(s2² + s3²) = λ4   (A.8)

Also, from Eq. A.4,

−vP √(λ1² + λ2²) + vE √((−λ1 + λ3)² + (−λ2 + λ4)²) = 0   (A.9)

Substituting Eqs. A.6, A.7, and A.8 into A.9:

(((vP/vE)² − 1) sec² s1) λ1² + 2((s2 + s3 tan s1)/√(s2² + s3²)) λ1 − 1 = 0   (A.10)
This quadratic equation can be solved for an explicit expression of λ1, while Eqs. A.6,
A.7, and A.8 provide expressions for λ2, λ3, and λ4.
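A hedged sketch of this costate recovery, assuming vP ≠ vE so the quadratic coefficient is nonzero; which root of Eq. (A.10) is admissible depends on the scenario, so the positive-square-root branch is taken here for illustration only.

```cpp
#include <cmath>

// Recover the costates from the terminal-surface parameters (s1, s2, s3)
// via Eqs. (A.6)-(A.10). The quadratic a*l1^2 + b*l1 - 1 = 0 of Eq. (A.10)
// is solved with the standard formula.
void solve_costates(double vP, double vE, double s1, double s2, double s3,
                    double &l1, double &l2, double &l3, double &l4) {
    double sec2 = 1.0 / (std::cos(s1) * std::cos(s1));
    double norm = std::sqrt(s2 * s2 + s3 * s3);
    double a = ((vP / vE) * (vP / vE) - 1.0) * sec2;   // quadratic coeff.
    double b = 2.0 * (s2 + s3 * std::tan(s1)) / norm;  // linear coeff.
    l1 = (-b + std::sqrt(b * b + 4.0 * a)) / (2.0 * a);
    l2 = l1 * std::tan(s1);          // Eq. (A.6)
    l3 = s2 / norm;                  // Eq. (A.7)
    l4 = s3 / norm;                  // Eq. (A.8)
}
```

Any real root of Eq. (A.10) satisfies Eq. (A.9), which provides a quick consistency check on the computed costates.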
Backward integration: τ = tf − t, with ů ≡ du/dτ:

x̊1 = vP cos θP − vE cos θE    x̊2 = vP sin θP − vE sin θE
x̊3 = vE cos θE    x̊4 = vE sin θE
Since λ’s are constant, trajectories are all straight lines emanating from the capture point.
Integration yields the following minimax trajectory:

x1 = [vP(λ1/ρ1) + vE((λ3 − λ1)/ρ2)] τ + rcapt cos s1   (A.11)

x2 = [vP(λ2/ρ1) + vE((λ4 − λ2)/ρ2)] τ + rcapt sin s1   (A.12)

x3 = [−vE((λ3 − λ1)/ρ2)] τ + s2   (A.13)

x4 = [−vE((λ4 − λ2)/ρ2)] τ + s3   (A.14)
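Since the costates are constant, evaluating the trajectory at any backward time τ is a direct computation; a sketch, with illustrative inputs:

```cpp
#include <cmath>

// Evaluate the straight-line minimax trajectory of Eqs. (A.11)-(A.14) at
// backward time tau, given costates (l1..l4) and terminal-surface
// parameters (s1, s2, s3). Assumes rho1 and rho2 are nonzero.
void minimax_state(double vP, double vE, double rcapt,
                   double l1, double l2, double l3, double l4,
                   double s1, double s2, double s3, double tau,
                   double &x1, double &x2, double &x3, double &x4) {
    double rho1 = std::sqrt(l1 * l1 + l2 * l2);
    double rho2 = std::sqrt((l3 - l1) * (l3 - l1) + (l4 - l2) * (l4 - l2));
    x1 = (vP * l1 / rho1 + vE * (l3 - l1) / rho2) * tau + rcapt * std::cos(s1);
    x2 = (vP * l2 / rho1 + vE * (l4 - l2) / rho2) * tau + rcapt * std::sin(s1);
    x3 = -vE * (l3 - l1) / rho2 * tau + s2;
    x4 = -vE * (l4 - l2) / rho2 * tau + s3;
}
```

At τ = 0 the state lies exactly on the parametric terminal surface, which makes a convenient sanity check.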
Appendix B
C++ SP2E Algorithm
Node.h
#ifndef NODE_H
#define NODE_H
#include <vector>
#include "fn_header.h"
using namespace std;
class Node {
public:
int id; // this is only for MATLAB plotting purposes
double x, y;
double payoff;
double xe1,ye1,xe2,ye2;
vector<Node *> children;
Node * parent;
Node(double,double,Node *,int);
~Node();
void set_Epos(double,double,double,double);
void set_payoff(double);
void add_child(Node *);
void change_parent(Node *);
void steer(double,double,double &, double &, double &, double &);
vector<Node *> ancestors();
void update_descendants();
};
//--------------------------------------------------------------------------------
// Constructor
Node::Node(double x_in, double y_in, Node * parent_in, int id_in) {
x = x_in;
y = y_in;
parent = parent_in; // if parent == 0, it is a null pointer and thus indicates a root node
id = id_in;
// if node has a parent, add node to parent’s list to children
if (parent != 0) parent->add_child(this);
}
//--------------------------------------------------------------------------------
// Destructor
Node::~Node() {
	// a node owns its children, not its parent: delete descendants
	// recursively so that destroying the root frees the whole tree
	for (int i = 0; i < (int) children.size(); i++)
		delete children[i];
}
//--------------------------------------------------------------------------------
// set E1 and E2’s positions if P is at this node
void Node::set_Epos(double xe1_in, double ye1_in, double xe2_in, double ye2_in) {
xe1 = xe1_in;
ye1 = ye1_in;
xe2 = xe2_in;
ye2 = ye2_in;
}
//--------------------------------------------------------------------------------
// set payoff of node
void Node::set_payoff(double payoff_in) {
payoff = payoff_in;
}
//--------------------------------------------------------------------------------
// add child
void Node::add_child(Node * child_in) {
children.push_back(child_in);
}
//--------------------------------------------------------------------------------
// change parent
void Node::change_parent(Node * parent_in) {
// erase this node as a child of old parent
Node * old_parent = parent;
for (int i=0; i < (int) old_parent->children.size(); i++) {
if (old_parent->children[i] == this) {
old_parent->children.erase(old_parent->children.begin()+i);
break;
}
}
// change parent from old to new
parent = parent_in;
// add this node as a child of the new parent
parent->children.push_back(this);
}
//--------------------------------------------------------------------------------
// calculate E1 and E2’s coords if P went from current node to specified coords
void Node::steer(double x_f, double y_f, double &xe1_f, double &ye1_f, double &xe2_f, double &ye2_f) {
double xp_curr, yp_curr, xe1_curr, ye1_curr, xe2_curr, ye2_curr;
double theta_p, theta_e1, theta_e2, payoff1_min, payoff2_min;
double length, t_capt, n_raw, dt_use, dl_use;
int n_use;
xp_curr = x;
yp_curr = y;
xe1_curr = xe1;
ye1_curr = ye1;
xe2_curr = xe2;
ye2_curr = ye2;
// P’s heading
theta_p = atan2(y_f-yp_curr,x_f-xp_curr);
// modify time interval slightly s.t. n*dt equals the time needed to travel to the capture pt
length = sqrt(pow(x_f-xp_curr,2) + pow(y_f-yp_curr,2));
t_capt = length/v;
n_raw = t_capt/dt;
n_use = (int) ceil(n_raw);
dt_use = length/(n_use*v);
dl_use = v*dt_use;
int status1 = 0, status2 = 0;
// march forward in time until P arrives at specified coords
for (int t = 0; t < n_use; t++) {
if (status1 == 0) { //reduce # calls to find_heading if E makes it to safehouse
// find the optimal heading of E1
find_heading(xp_curr,yp_curr,xe1_curr,ye1_curr,xe1_f,ye1_f,payoff1_min,status1);
// find E1’s heading angle
theta_e1 = atan2(ye1_f-ye1_curr,xe1_f-xe1_curr);
}
if (status2 == 0) {
// find the optimal heading of E2
find_heading(xp_curr,yp_curr,xe2_curr,ye2_curr,xe2_f,ye2_f,payoff2_min,status2);
// find E2’s heading angle
theta_e2 = atan2(ye2_f-ye2_curr,xe2_f-xe2_curr);
}
// find P’s new position
xp_curr = xp_curr + dl_use*cos(theta_p);
yp_curr = yp_curr + dl_use*sin(theta_p);
// find E1's new position (move only if not in safehouse, matching find_payoff)
if (status1 != 2) {
xe1_curr = xe1_curr + dl_use/k*cos(theta_e1);
ye1_curr = ye1_curr + dl_use/k*sin(theta_e1);
}
// find E2's new position (move only if not in safehouse)
if (status2 != 2) {
xe2_curr = xe2_curr + dl_use/k*cos(theta_e2);
ye2_curr = ye2_curr + dl_use/k*sin(theta_e2);
}
}
// return final E pos
xe1_f = xe1_curr;
ye1_f = ye1_curr;
xe2_f = xe2_curr;
ye2_f = ye2_curr;
}
//--------------------------------------------------------------------------------
// produce vector of specified node’s ancestors
vector<Node *> Node::ancestors() {
Node * node_curr, * node_parent;
vector<Node *> ancestor_nodes;
node_curr = parent;
while (node_curr != 0) {
ancestor_nodes.push_back(node_curr);
node_parent = node_curr->parent;
node_curr = node_parent;
}
return ancestor_nodes;
}
//--------------------------------------------------------------------------------
// update properties of descendants (this function is called when node is rewired to new parent)
void Node::update_descendants() {
double x_child, y_child, xe1, ye1, xe2, ye2, payoff;
Node * node_child;
// for each child node
int num_children = (int) children.size();
for (int i=0; i < num_children; i++) {
// access current child node
node_child = children[i];
x_child = node_child->x;
y_child = node_child->y;
// travel from current node to child node
this->steer(x_child,y_child,xe1,ye1,xe2,ye2);
// find payoff as a result of getting to child node
payoff = find_payoff(x_child,y_child,xe1,ye1,xe2,ye2);
// update child’s E pos and payoff
node_child->set_Epos(xe1, ye1, xe2, ye2);
node_child->set_payoff(payoff);
// recursively call child’s update_descendants method
node_child->update_descendants();
}
}
#endif
Tree.h
#ifndef TREE_H
#define TREE_H
#include <iostream>
#include <fstream>
#include <sstream>
#include <vector>
#include <math.h>
#include "Node.h"
using namespace std;
class Tree {
public:
vector<Node *> node_list;
Tree(Node *,int);
~Tree();
void insertNode(Node *);
Node * nearest(double,double,double &);
vector<Node *> near(double, double, double);
void extend(double,double,int);
Node * max_payoff();
};
//--------------------------------------------------------------------------------
// constructor (root node supplied as argument)
Tree::Tree(Node * root_node, int max_size) {
node_list.push_back(root_node);
node_list.reserve(max_size);
}
//--------------------------------------------------------------------------------
// Destructor
Tree::~Tree() {
int num = (int) node_list.size();
// for each node in tree
for (int i = 0; i < num; i++) {
delete node_list[i];
}
}
//--------------------------------------------------------------------------------
// insert node into tree
void Tree::insertNode(Node * node_in) {
node_list.push_back(node_in);
}
//--------------------------------------------------------------------------------
// find nearest node to specified coordinates
Node * Tree::nearest(double x, double y, double &distance_out) {
Node * node_curr, * node_closest = 0; // null-initialized; the tree always contains at least the root
double x_curr, y_curr, dist_curr, dist_closest = 9999;
int num = (int) node_list.size();
// for each node in tree
for (int i = 0; i < num; i++) {
// access node coords
node_curr = node_list[i];
x_curr = node_curr->x;
y_curr = node_curr->y;
// calculate distance between node and specified coords
dist_curr = sqrt(pow(x_curr-x,2) + pow(y_curr-y,2));
// if this is the closest we’ve found so far
if (dist_curr < dist_closest) {
dist_closest = dist_curr;
node_closest = node_curr;
}
}
distance_out = dist_closest;
return node_closest;
}
//--------------------------------------------------------------------------------
// find all nodes that are within the prescribed distance from specified coords
vector<Node *> Tree::near(double x, double y, double radius) {
vector<Node *> near_nodes;
Node * node_curr;
double x_curr, y_curr, dist_curr;
int num = (int) node_list.size();
// for each node in tree
for (int i = 0; i < num; i++) {
// access node coords
node_curr = node_list[i];
x_curr = node_curr->x;
y_curr = node_curr->y;
// calculate distance between node and specified coords
dist_curr = sqrt(pow(x_curr-x,2) + pow(y_curr-y,2));
// if distance is within radius, add to list of nodes to be returned
if (dist_curr <= radius) {
near_nodes.push_back(node_curr);
}
}
return near_nodes;
}
//--------------------------------------------------------------------------------
// extend tree by adding new node with specified coordinates
void Tree::extend(double x_new, double y_new, int id) {
double distance, xe1, ye1, xe2, ye2, xe1_best, ye1_best, xe2_best, ye2_best, payoff, payoff_best;
Node * node_curr, * parent_best;
double n, radius;
vector<Node *> nodes_nearby;
int num_nearby, i;
double r_factor = 20/4.0; // setting on the size of the circle to check for nearby nodes
double r_decay = 0.5; // setting on how fast the circle decays in size
// find node closest to (x_new,y_new), the new node being added
node_curr = this->nearest(x_new, y_new, distance);
// calculate E1 and E2’s updated pos if P goes from node_curr to new node
node_curr->steer(x_new, y_new, xe1, ye1, xe2, ye2);
// calculate payoff of new node as a result of getting there from node_curr
payoff = find_payoff(x_new, y_new, xe1, ye1, xe2, ye2);
// specify node_curr as the best parent found so far
xe1_best = xe1;
ye1_best = ye1;
xe2_best = xe2;
ye2_best = ye2;
parent_best = node_curr;
payoff_best = payoff;
// find all nodes within search radius of new node
n = (double) node_list.size() + 1;
radius = r_factor*sqrt(log(r_decay*n)/(r_decay*n));
nodes_nearby = this->near(x_new, y_new, radius);
num_nearby = (int) nodes_nearby.size();
// for each nearby node
for (i = 0; i < num_nearby; i++) {
// specify current nearby node as node_curr
node_curr = nodes_nearby[i];
// calculate E1 and E2’s updated pos if P goes from node_curr to new node
node_curr->steer(x_new, y_new, xe1, ye1, xe2, ye2);
// calculate payoff of new node as a result of getting there from node_curr
payoff = find_payoff(x_new, y_new, xe1, ye1, xe2, ye2);
// if this route yields a higher payoff than the best route found so far
if (payoff > payoff_best) {
// specify node_curr as the best parent found so far
xe1_best = xe1;
ye1_best = ye1;
xe2_best = xe2;
ye2_best = ye2;
parent_best = node_curr;
payoff_best = payoff;
}
}
// create new node
Node * node_new = new Node(x_new, y_new, parent_best, id+1);
node_new->set_Epos(xe1_best, ye1_best, xe2_best, ye2_best);
node_new->set_payoff(payoff_best);
// add new node to tree
this->insertNode(node_new);
// ===== REWIRING PHASE =====
int num_ancestors, j;
vector<Node *> ancestor_nodes;
bool is_ancestor;
double x_curr, y_curr, payoff_curr;
// produce list of all ancestors of new node
ancestor_nodes = node_new->ancestors();
num_ancestors = (int) ancestor_nodes.size();
// for each nearby node
for (i = 0; i < num_nearby; i++) {
// specify current nearby node as node_curr
node_curr = nodes_nearby[i];
// make sure that this node is not an ancestor of new node
is_ancestor = false;
for (j = 0; j < num_ancestors; j++) {
if (node_curr == ancestor_nodes[j]) {
is_ancestor = true;
break;
}
}
if (is_ancestor) continue;
// node_curr data
x_curr = node_curr->x;
y_curr = node_curr->y;
payoff_curr = node_curr->payoff;
// calculate E1 and E2’s updated pos if P goes from new node to node_curr
node_new->steer(x_curr, y_curr, xe1, ye1, xe2, ye2);
// calculate payoff of new node as a result of getting there from new node
payoff = find_payoff(x_curr, y_curr, xe1, ye1, xe2, ye2);
// if this route yields a higher payoff than the old one
if (payoff > payoff_curr) {
// make new node the parent of node_curr
node_curr->change_parent(node_new);
// update E pos and payoff of node_curr
node_curr->set_Epos(xe1, ye1, xe2, ye2);
node_curr->set_payoff(payoff);
// need to update E pos and payoffs of all child nodes
node_curr->update_descendants();
}
}
}
//--------------------------------------------------------------------------------
// return node with max payoff
Node * Tree::max_payoff() {
Node * node_best, * node_curr;
double payoff, payoff_best = -9999;
int num = (int) node_list.size();
// for each node in tree
for (int i = 0; i < num; i++) {
// access node
node_curr = node_list[i];
payoff = node_curr->payoff;
// if node has higher payoff than previous best, make this the best one so far
if (payoff > payoff_best) {
payoff_best = payoff;
node_best = node_curr;
}
}
return node_best;
}
#endif
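The search radius computed in Tree::extend above shrinks as the tree grows, following an RRT*-style sqrt(log(n)/n) schedule. As a standalone sketch (restating the same r_factor and r_decay settings from the listing; this helper function is not itself part of the thesis code), the schedule can be written as:

```cpp
#include <cassert>
#include <cmath>

// Standalone restatement of the shrinking search radius used in Tree::extend
// (same r_factor and r_decay values as the listing above). The radius decays
// roughly like sqrt(log(n)/n) as the node count n grows, so rewiring considers
// a large neighbourhood early on and an increasingly local one later.
double search_radius(int num_nodes) {
    const double r_factor = 20/4.0; // size of the circle to check for nearby nodes
    const double r_decay = 0.5;     // how fast the circle decays in size
    double n = (double) num_nodes + 1;
    return r_factor * std::sqrt(std::log(r_decay*n) / (r_decay*n));
}
```

For example, the radius at 10 nodes exceeds the radius at 100 nodes, so the set returned by Tree::near(), and hence the candidate parents considered during rewiring, shrinks as the tree fills the workspace.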
find_heading.cpp
#include <math.h>
#include "constants.h"
void shortest_strat(double xp, double yp, double xe, double ye, double &xf, double &yf,
double &payoff_min);
void intersection_strat(double xp, double yp, double xe, double ye, double &xf, double &yf,
double &payoff_min);
/* find_heading ----------------------------------------------------------------------------
Find the destination point that E will go to such that P will get the lowest payoff possible.
Results are returned through the reference parameters xf, yf, and payoff_min.
*/
void find_heading(double xp, double yp, double xe, double ye, double &xf, double &yf,
double &payoff_min, int &status) {
// status: 0 - E can’t make it to safehouse; 1 - E can make it; 2 - E already in safehouse
// Feb 25: check whether E is already in safehouse
if (sqrt(pow(xd-xe,2) + pow(yd-ye,2))-rd <= 0) {
xf = -1;
yf = -1;
payoff_min = -100;
status = 2;
}
// if E not in safehouse, check if E can get to safehouse without being caught
else {
shortest_strat(xp,yp,xe,ye,xf,yf,payoff_min);
if (xf < 0) status = 0;
else status = 1;
}
// if P is able to capture E, then switch to intersection strategy
if (status == 0) {
intersection_strat(xp,yp,xe,ye,xf,yf,payoff_min);
}
}
/* shortest_strat ----------------------------------------------------------------------------
Determines whether it is possible for E to safely get to the safehouse. If possible, the
function returns the closest coordinates on the safehouse perimeter that E can safely get to.
If not possible (P can capture it), the coordinates are given as (-1,-1).
Results are returned through the reference parameters xf, yf, and payoff_min (fixed at -100).
*/
void shortest_strat(double xp, double yp, double xe, double ye, double &xf, double &yf,
double &payoff_min) {
xf = -1;
yf = -1;
payoff_min = -100;
double re_best = 9999;
double x_per, y_per, re, rp;
// for each point on safehouse perimeter
for (y_per = yd-rd; y_per <= yd+rd; y_per = y_per + 0.05/4.0) {
x_per = xd - sqrt(pow(rd,2) - pow(y_per-yd,2));
// calculate E and P’s distance from that point
re = sqrt(pow(x_per-xe,2) + pow(y_per-ye,2));
rp = sqrt(pow(x_per-xp,2) + pow(y_per-yp,2));
// if E can get there before P does, E can get there safely
if (re < rp/k) {
// if we haven’t found a safe destination prior to this, or if this new point is
// closer than what we’ve previously found
if (xf < 0 || re < re_best) {
// make this new point the E’s new destination
xf = x_per;
yf = y_per;
re_best = re;
//break; // technically incorrect to stop early, but it would save processing time
}
}
}
}
/* intersection_strat ----------------------------------------------------------------------------
Provides the point on the locus of intersection points between P and E that is closest to the safehouse.
Results are returned through the reference parameters xf, yf, and payoff_min.
*/
void intersection_strat(double xp, double yp, double xe, double ye, double &xf, double &yf,
double &payoff_min) {
double x_locus, y_locus1, y_locus2, sqrt_arg,H1,H2;
double Hmin = 9999;
xf = -1;
yf = -1;
// for each point on locus of intersection points
for (x_locus = 0; x_locus <= xb; x_locus = x_locus + 0.1/4.0) {
sqrt_arg = -1*pow(k*k-1,2)*pow(x_locus,2) + 2*(k*k-1)*(k*k*xe-xp)*x_locus + k*k*pow(yp-ye,2)
+ (k*k-1)*(xp*xp - pow(k*xe,2));
// can't take the square root of a negative number, so this point doesn't exist on the locus; skip
if (sqrt_arg < 0) {
continue;
}
// calculate y-coordinates associated with current x-coordinate
y_locus1 = (yp - k*k*ye + sqrt(sqrt_arg))/(1-k*k);
y_locus2 = (yp - k*k*ye - sqrt(sqrt_arg))/(1-k*k);
// if intersection point 1 is within the game environment
if (y_locus1 <= yb && y_locus1 >= 0) {
H1 = sqrt(pow(xd-x_locus,2) + pow(yd-y_locus1,2)) - rd;
// if payoff of intersection point is lower than the lowest found so far, make this the new lowest
if (H1 < Hmin) {
Hmin = H1;
xf = x_locus;
yf = y_locus1;
}
}
// if intersection point 2 is within the game environment
if (y_locus2 <= yb && y_locus2 >= 0) {
H2 = sqrt(pow(xd-x_locus,2) + pow(yd-y_locus2,2)) - rd;
// if payoff of intersection point is lower than the lowest found so far, make this the new lowest
if (H2 < Hmin) {
Hmin = H2;
xf = x_locus;
yf = y_locus2;
}
}
}
payoff_min = Hmin;
}
find_payoff.cpp
#include <math.h>
#include <iostream>
#include "constants.h"
void find_heading(double xp, double yp, double xe, double ye, double &xf, double &yf,
double &payoff_min, int &status);
double find_payoff(double xp, double yp, double xe1, double ye1, double xe2, double ye2) {
double payoff_best = -9999;
double xe1_f, ye1_f, payoff1_min, xe2_f, ye2_f, payoff2_min;
double length, t_capt, n_raw, dt_use, dl_use, payoff_curr;
double theta_seq[3];
int order, rows, n_use;
rows = 2;
double pos[2][6] = {{xp, yp, xe1, ye1, xe2, ye2},{xp, yp, xe2, ye2, xe1, ye1}};
int status1, status2;
// for different engagement sequences
for (order = 0; order < rows; order++) {
status1 = 0;
status2 = 0;
// find the optimal heading of P and E#1 and the corresponding payoff
find_heading(pos[order][0],pos[order][1],pos[order][2],pos[order][3],xe1_f,ye1_f,payoff1_min,status1);
// if P is able to catch E#1, do it
if (status1 == 0) {
// find P and E#1’s heading angles
theta_seq[0] = atan2(ye1_f-pos[order][1],xe1_f-pos[order][0]);
theta_seq[1] = atan2(ye1_f-pos[order][3],xe1_f-pos[order][2]);
// modify time interval slightly s.t. n*dt equals the time needed to travel to the capture pt
length = sqrt(pow(xe1_f-pos[order][0],2) + pow(ye1_f-pos[order][1],2));
t_capt = length/v;
n_raw = t_capt/dt;
n_use = (int) ceil(n_raw);
dt_use = length/(n_use*v);
dl_use = v*dt_use;
// march forward in time until E#1 is captured
for (int t = 0; t < n_use; t++) {
// if E#1 is captured due to capture radius, break loop
if (sqrt(pow(pos[order][0]-pos[order][2],2) + pow(pos[order][1]-pos[order][3],2)) <= rcap) {
payoff1_min = sqrt(pow(pos[order][2]-xd,2) + pow(pos[order][3]-yd,2))-rd;
if (payoff1_min <= 0) payoff1_min = -100;
break;
}
if (status2 == 0) {
// find the optimal heading of E#2
find_heading(pos[order][0],pos[order][1],pos[order][4],pos[order][5],xe2_f,ye2_f,payoff2_min,status2);
// find E#2’s heading angle
theta_seq[2] = atan2(ye2_f-pos[order][5],xe2_f-pos[order][4]);
}
// find P’s new position
pos[order][0] = pos[order][0] + dl_use*cos(theta_seq[0]);
pos[order][1] = pos[order][1] + dl_use*sin(theta_seq[0]);
// find E#1’s new position (move only if not in safehouse)
if (status1 != 2) {
pos[order][2] = pos[order][2] + dl_use/k*cos(theta_seq[1]);
pos[order][3] = pos[order][3] + dl_use/k*sin(theta_seq[1]);
}
// find E#2’s new position (move only if not in safehouse)
if (status2 != 2) {
pos[order][4] = pos[order][4] + dl_use/k*cos(theta_seq[2]);
pos[order][5] = pos[order][5] + dl_use/k*sin(theta_seq[2]);
}
}
}
// find the optimal heading of P and E#2 and the corresponding payoff
find_heading(pos[order][0],pos[order][1],pos[order][4],pos[order][5],xe2_f,ye2_f,payoff2_min,status2);
// find P and E#2’s heading angles
theta_seq[0] = atan2(ye2_f-pos[order][1],xe2_f-pos[order][0]);
theta_seq[2] = atan2(ye2_f-pos[order][5],xe2_f-pos[order][4]);
// modify time interval slightly s.t. n*dt equals the time needed to travel to the capture pt
length = sqrt(pow(xe2_f-pos[order][0],2) + pow(ye2_f-pos[order][1],2));
t_capt = length/v;
n_raw = t_capt/dt;
n_use = (int) ceil(n_raw);
dt_use = length/(n_use*v);
dl_use = v*dt_use;
// march forward in time until E#2 is captured
for (int t = 0; t < n_use; t++) {
// if E#2 is captured due to capture radius, break loop
if (sqrt(pow(pos[order][0]-pos[order][4],2) + pow(pos[order][1]-pos[order][5],2)) <= rcap) {
payoff2_min = sqrt(pow(pos[order][4]-xd,2) + pow(pos[order][5]-yd,2))-rd;
if (payoff2_min <= 0)
payoff2_min = -100;
break;
}
// find P’s new position
pos[order][0] = pos[order][0] + dl_use*cos(theta_seq[0]);
pos[order][1] = pos[order][1] + dl_use*sin(theta_seq[0]);
// find E#2’s new position
pos[order][4] = pos[order][4] + dl_use/k*cos(theta_seq[2]);
pos[order][5] = pos[order][5] + dl_use/k*sin(theta_seq[2]);
}
// if P couldn’t catch E#1, that’s a -100 penalty plus the capture distance of E#2
// (if P can’t catch E#2 either, that’s another -100 penalty for a total of -200)
if (payoff1_min <= 0)
payoff_curr = payoff1_min + payoff2_min;
// if P catches E#1
else {
// if P can’t catch E#2 after, that’s a -100 penalty
if (payoff2_min <= 0)
payoff_curr = payoff1_min + payoff2_min;
// if P catches E#2 as well, then the payoff is the closest capture distance
else {
if (payoff1_min <= payoff2_min) payoff_curr = payoff1_min;
else payoff_curr = payoff2_min;
}
}
// if payoff resulting from this order of engagement is better than what we’ve found so far
if (payoff_curr > payoff_best) {
payoff_best = payoff_curr;
}
}
return payoff_best;
}
Appendix C
Kalman Filter for Speed Prediction
The state vector consists of the position and the x and y components of the velocity:
\[ x = \begin{bmatrix} x & y & v_x & v_y \end{bmatrix}^T \]
The state-space model of the problem is
\[ \dot{x} = Ax + Gw = \begin{bmatrix} 0_{2\times 2} & I_{2\times 2} \\ 0_{2\times 2} & 0_{2\times 2} \end{bmatrix} x + \begin{bmatrix} 0_{2\times 2} \\ I_{2\times 2} \end{bmatrix} w \tag{C.1} \]
\[ y = Cx + v = \begin{bmatrix} I_{2\times 2} & 0_{2\times 2} \end{bmatrix} x + v \tag{C.2} \]
where w and v are the white process and measurement noises, respectively, satisfying
\[ E(w) = E(v) = 0, \quad E(ww^T) = Q, \quad E(vv^T) = R, \quad E(wv^T) = 0 \]
with \( Q = 0.1\,I_{2\times 2} \) and \( R = 0.01\,I_{2\times 2} \).

The observability matrix \( Q_0 \) is
\[ Q_0 = \begin{bmatrix} C \\ CA \\ CA^2 \\ CA^3 \end{bmatrix} = \begin{bmatrix} I_{2\times 2} & 0_{2\times 2} \\ 0_{2\times 2} & I_{2\times 2} \\ 0_{2\times 2} & 0_{2\times 2} \\ 0_{2\times 2} & 0_{2\times 2} \end{bmatrix} \]
Since \( \operatorname{rank}(Q_0) = 4 = n \), the pair (C, A) is observable.

The Kalman filter that minimizes the steady-state error covariance is
\[ \dot{\hat{x}} = A\hat{x} + L(y - C\hat{x}) = (A - LC)\hat{x} + Ly \tag{C.3} \]
\[ \hat{y} = C\hat{x} \tag{C.4} \]
The gain matrix L is found using MATLAB's kalman function, where
\[ L = PC^T R^{-1} \tag{C.5} \]
and the error covariance P is found by solving the filter algebraic Riccati equation
\[ AP + PA^T - PC^T R^{-1} CP + GQG^T = 0 \tag{C.6} \]
There are no control inputs to the system, so no input matrix B appears in the model.
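Since Q and R are diagonal, the model decouples into an independent position-velocity filter for each axis. As a hedged illustration (a discrete-time sketch for one axis, not the continuous-time MATLAB implementation used in the thesis; the variances q and r stand in for the diagonal entries of Q and R), such a filter could look like:

```cpp
#include <cassert>
#include <cmath>

// Illustrative sketch (not the thesis code): a discrete-time Kalman filter for
// one axis of the constant-velocity model above. The state is [position;
// velocity] and only position is measured, matching C = [I 0]. The process
// noise is applied to the velocity, as in the G matrix above; the Euler
// discretization of Q is an assumption of this sketch.
struct KF1D {
    double p_est, v_est; // state estimate [position, velocity]
    double P[2][2];      // error covariance
    double q, r;         // process / measurement noise variances

    KF1D(double q_in, double r_in) : p_est(0), v_est(0), q(q_in), r(r_in) {
        P[0][0] = 1; P[0][1] = 0; P[1][0] = 0; P[1][1] = 1;
    }

    // one predict-update cycle with time step dt and position measurement z
    void step(double dt, double z) {
        // predict: x = F x, P = F P F^T + Q_d, with F = [[1,dt],[0,1]]
        p_est += dt * v_est;
        double P00 = P[0][0] + dt*(P[1][0] + P[0][1]) + dt*dt*P[1][1];
        double P01 = P[0][1] + dt*P[1][1];
        double P10 = P[1][0] + dt*P[1][1];
        double P11 = P[1][1] + q*dt; // process noise enters the velocity state
        P[0][0] = P00; P[0][1] = P01; P[1][0] = P10; P[1][1] = P11;
        // update: K = P H^T / (H P H^T + r) with H = [1, 0]
        double S  = P[0][0] + r;
        double K0 = P[0][0] / S, K1 = P[1][0] / S;
        double innov = z - p_est;
        p_est += K0 * innov;
        v_est += K1 * innov;
        // P = (I - K H) P
        double nP00 = (1 - K0)*P[0][0];
        double nP01 = (1 - K0)*P[0][1];
        double nP10 = P[1][0] - K1*P[0][0];
        double nP11 = P[1][1] - K1*P[0][1];
        P[0][0] = nP00; P[0][1] = nP01; P[1][0] = nP10; P[1][1] = nP11;
    }
};
```

Feeding the filter position-only measurements lets it recover the velocity, which is how a speed prediction is obtained from position data in this appendix.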
Bibliography
[1] R. Isaacs, Differential Games. New York: John Wiley and Sons, Inc., 1965.
[2] J. Lewin, Differential Games: Theory and Methods for Solving Game Problems with
Singular Surfaces. London: Springer-Verlag, 1994.
[3] J. Hespanha, M. Prandini, and S. Sastry, “Probabilistic pursuit-evasion games: A
one-step nash approach,” Proceedings of the 39th IEEE Conference on Decision and
Control, Sydney, Australia, vol. 3, pp. 2272–2277, 2000.
[4] J. Hespanha, H. Kim, and S. Sastry, “Multiple-agent probabilistic pursuit-evasion
games,” Proceedings of the 38th IEEE Conference on Decision and Control, Phoenix,
Arizona, vol. 3, pp. 2432–2437, 1999.
[5] R. Vidal, O. Shakernia, H. Kim, D. Shim, and S. Sastry, “Probabilistic pursuit-
evasion games: Theory, implementation, and experimental evaluation,” IEEE Trans-
actions on Robotics and Automation, vol. 18(5), pp. 662–669, 2002.
[6] S. LaValle, D. Lin, L. Guibas, J. Latombe, and R. Motwani, “Finding an unpre-
dictable target in a workspace with obstacles,” Proceedings of the IEEE International
Conference on Robotics and Automation, Albuquerque, New Mexico, pp. 737–742,
1997.
[7] A. AlDahak and A. Elnagar, “A practical pursuit-evasion algorithm: Detection and
tracking,” IEEE International Conference on Robotics and Automation, Roma, Italy,
pp. 343–348, 2007.
[8] B. Tovar and S. LaValle, “Visibility-based pursuit-evasion with bounded speed,” The
International Journal of Robotics Research, vol. 27(11-12), pp. 1350–1360, 2008.
[9] D. Li and J. Cruz, Jr., “A two-player stochastic pursuit-evasion differential game,”
Proceedings of the 46th IEEE Conference on Decision and Control, New Orleans,
LA, pp. 4057–4062, 2007.
[10] T. Shima and O. Golan, “Bounded differential games guidance law for dual-
controlled missiles,” IEEE Transactions on Control Systems Technology, vol. 14(4),
pp. 719–724, 2006.
[11] S. Bhattacharya, S. Hutchinson, and T. Basar, “Game-theoretic analysis of a visi-
bility based pursuit-evasion game in the presence of obstacles,” Proceedings of the
American Control Conference, St. Louis, Missouri, pp. 373–378, 2009.
[12] S. Bhattacharya and S. Hutchinson, “On the existence of nash equilibrium for a two-
player pursuit-evasion game with visibility constraints,” The International Journal
of Robotics Research, vol. 29(7), pp. 831–839, 2010.
[13] T. Muppirala, S. Hutchinson, and R. Murrieta-Cid, “Optimal motion strategies
based on critical events to maintain visibility of a moving target,” Proceedings of
the IEEE International Conference on Robotics and Automation, Barcelona, Spain,
pp. 3826–3831, 2005.
[14] J. Kim and M. Tahk, “Co-evolutionary computation for constrained min-max prob-
lems and its applications for pursuit-evasion games,” Proceedings of the Congress of
Evolutionary Computation, vol. 2, pp. 1205–1212, 2001.
[15] J. Ge, L. Tang, J. Reimann, and G. Vachtsevanos, “Hierarchical decomposition
approach for pursuit-evasion differential game with multiple players,” Proceedings of
the IEEE Aerospace Conference, 2006.
[16] D. Li, J. Cruz, Jr., G. Chen, C. Kwan, and M. Chang, “A hierarchical approach
to multi-player pursuit-evasion differential games,” Proceedings of the 44th IEEE
Conference on Decision and Control, Seville, Spain, pp. 5674–5679, 2005.
[17] A. Sun and H. Liu, “Multi-pursuer evasion,” AIAA Guidance, Navigation and Con-
trols Conference, Honolulu, HI, 2008.
[18] A. Bolonkin and R. Murphey, “Geometry-based parametric modeling for single-
pursuer/multiple-evader problems,” Journal of Guidance, Control, and Dynamics,
vol. 28(1), pp. 145–149, 2005.
[19] S. Bopardikar, F. Bullo, and J. Hespanha, “On discrete-time pursuit-evasion games
with sensing limitations,” IEEE Transactions on Robotics, vol. 24(6), pp. 1429–1439,
2008.
[20] J. Durham, A. Franchi, and F. Bullo, “Distributed pursuit-evasion with limited-
visibility sensors via frontier-based exploration,” Proceedings of the IEEE Interna-
tional Conference on Robotics and Automation, pp. 3562–3568, 2010.
[21] B. Gerkey, S. Thrun, and G. Gordon, “Visibility-based pursuit-evasion with limited
field of view,” International Journal of Robotics Research, vol. 25(4), pp. 299–315,
2006.
[22] A. Antoniades, H. Kim, and S. Sastry, “Pursuit-evasion strategies for teams of mul-
tiple agents with incomplete information,” Proceedings of the 42nd IEEE Conference
on Decision and Control, Maui, HI, vol. 1, pp. 756–761, 2003.
[23] G. Wang and Z. Yu, “A Pontryagin’s maximum principle for non-zero sum differen-
tial games of BSDEs with applications,” IEEE Transactions on Automatic Control,
vol. 55(7), pp. 1742–1747, 2010.
[24] A. Starr and Y. Ho, “Nonzero-sum differential games,” Journal of Optimization
Theory and Applications, vol. 3(3), pp. 184–206, 1969.
[25] H. Sun, M. Li, and W. Zhang, “Infinite time nonzero-sum linear quadratic stochastic
differential games,” Proceedings of the 29th Chinese Control Conference, Beijing,
China, pp. 1081–1084, 2010.
[26] A. Akhmetzhanov, P. Bernhard, F. Grognard, and L. Mailleret, “Competition be-
tween foraging predators and hiding preys as a nonzero-sum differential game,”
Game Theory for Networks, pp. 357–365, 2009.
[27] A. Lim, X. Zhou, and J. Moore, “Multiple objective risk-sensitive control and
stochastic differential games,” Proceedings of the 38th Conference on Decision &
Control, Phoenix, AZ, pp. 558–563, 1999.
[28] H. Choset, K. Lynch, S. Hutchinson, G. Kantor, W. Burgard, L. Kavraki, and
S. Thrun, Principles of Robot Motion: Theory, Algorithms, and Implementations.
Cambridge, MA: MIT Press, 2005.
[29] C. Goerzen, Z. Kong, and B. Mettler, “A survey of motion planning algorithms from
the perspective of autonomous uav guidance,” Journal of Intelligent and Robotic
Systems, vol. 57, pp. 65–100, 2010.
[30] Y. Hwang and N. Ahuja, “A potential field approach to path planning,” IEEE
Transactions of Robotics and Automation, vol. 8(1), pp. 23–32, 1992.
[31] S. LaValle, Planning Algorithms. Cambridge University Press, 2006.
[32] S. LaValle and J. J. Kuffner, “Randomized kinodynamic planning,” Proceedings of
the IEEE International Conference on Robotics and Automation, Detroit, MI, vol. 1,
pp. 473–479, 1999.
[33] J. Kuffner and S. LaValle, “RRT-connect: An efficient approach to single-query
path planning,” IEEE International Conference on Robotics and Automation, San
Francisco, CA, vol. 2, pp. 995–1001, 2000.
[34] E. Frazzoli, M. Dahleh, and E. Feron, “Real-time motion planning for agile au-
tonomous vehicles,” Journal of Guidance, Control, and Dynamics, vol. 25(1),
pp. 116–129, 2002.
[35] S. Karaman and E. Frazzoli, “Incremental sampling-based algorithms for optimal
motion planning,” Robotics: Science and Systems (RSS), Zaragoza, Spain, 2010.
[36] A. Neto, D. Macharet, and M. Campos, “On the generation of trajectories for multi-
ple uavs in environments with obstacles,” Journal of Intelligent and Robotic Systems,
vol. 57, pp. 123–141, 2010.
[37] Y. Kuwata, J. Teo, G. Fiore, S. Karaman, E. Frazzoli, and J. How, “Real-time mo-
tion planning with applications to autonomous urban driving,” IEEE Transactions
on Control Systems Technology, vol. 17(5), pp. 1105–1118, 2009.
[38] S. Karaman and E. Frazzoli, “Incremental sampling-based algorithms for a class of
pursuit-evasion games,” Workshop on Algorithmic Foundations of Robotics, Singa-
pore, 2010.
[39] S. Bradley, A. Hax, and T. Magnanti, Applied Mathematical Programming. Addison-
Wesley, 1977.
[40] R. Sutton and A. Barto, Reinforcement Learning: An Introduction. MIT Press,
1998.
[41] A. Rao, D. Benson, C. Darby, M. Patterson, C. Francolin, I. Sanders, and
G. Huntington, “Open-source general pseudospectral optimal control software.”
http://www.gpops.org, 2008.
[42] A. Rao, A Primer on Pseudospectral Methods for Solving Optimal Control Problems.
Gainesville, FL, January 2011.
[43] D. Kirk, Optimal Control Theory: An Introduction. Englewood Cliffs, NJ: Prentice
Hall, 1970.
[44] G. Lau and H. Liu, “Online border patrol guidance algorithm for unmanned aerial
vehicles,” Proceedings of the AIAA Guidance, Navigation and Controls Conference,
Minneapolis, Minnesota, 2012.
[45] R. Siegwart and I. Nourbakhsh, Introduction to Autonomous Mobile Robots. MIT
Press, 2004.
[46] L. Dubins, “On curves of minimal length with a constraint on average curvature,
and with prescribed initial and terminal positions and tangents,” American Journal
of Mathematics, vol. 79, pp. 497–516, 1957.
[47] A. Shkel and V. Lumelsky, “Classification of the dubins set,” Robotics and Au-
tonomous Systems, vol. 34, pp. 179–202, 2001.