Lectures on Robotic Planning and...

On Robotic Routing and Stochastic Surveillance

Francesco Bullo

Center for Control,Dynamical Systems & Computation

University of California at Santa Barbara

http://motion.me.ucsb.edu

Shenyang Institute of Automation, Chinese Academy of Sciences,Shenyang, China, Jun ’18

Academy of Mathematics and System Science, Chinese Academy ofScience, Beijing, China, Jun ’18

School of Automation, Beijing Institute of Technology, Beijing, China,Jun ’18

Acknowledgments

Xiaoming DuanUCSB

Mishel GeorgeScotty Labs

Rush PatelNorthropGrumman

PushkariniAgharkarGoogle

Jeff PetersUTRC

NSF AFOSR ARO ONR DOE

New text “Lectures on Network Systems”

Lectures onNetwork Systems

Francesco Bullo

With contributions byJorge Cortés

Florian DörflerSonia Martínez

Lectures on Network SystemsFrancesco Bullo

These lecture notes provide a mathematical introduction to multi-agent

dynamical systems, including their analysis via algebraic graph theory

and their application to engineering design problems. The focus is on

fundamental dynamical phenomena over interconnected network

systems, including consensus and disagreement in averaging systems,

stable equilibria in compartmental flow networks, and synchronization

in coupled oscillators and networked control systems. The theoretical

results are complemented by numerous examples arising from the

analysis of physical and natural systems and from the design of

network estimation, control, and optimization systems.

Francesco Bullo is professor of Mechanical Engineering and member of the Center for Control, Dynamical Systems, and Computation at the

University of California at Santa Barbara. His research focuses on

modeling, dynamics and control of multi-agent network systems, with

applications to robotic coordination, energy systems, and social

networks. He is an award-winning mentor and teacher.

Francesco BulloLectures on N

etwork System

s

Lectures on Network Systems, Francesco Bullo,Createspace, 1 edition, ISBN 978-1-986425-64-3

For students: free PDF for downloadFor instructors: slides and answer keyshttps://www.amazon.com/dp/1986425649300 pages (plus 200 pages solution manual)3K downloads since Jun 2016150 exercises with solutions20 instructors have adopted parts of it

Linear Systems:

1 social, sensor, robotic & compartmental examples,

2 matrix and graph theory, with an emphasis onPerron–Frobenius theory and algebraic graph theory,

3 averaging algorithms in discrete and continuous time,described by static and time-varying matrices, and

4 positive & compartmental systems, dynamical flowsystems, Metzler matrices.

Nonlinear Systems:

5 nonlinear consensus models,

6 population dynamic models in multi-species systems,

7 coupled oscillators, with an emphasis on theKuramoto model and models of power networks

New text “Lectures on Robotic Planning and Kinematics”

Lectures on Robotic Planning and Kinematics, ver .91For students: free PDF for downloadFor instructors: slides and answer keyshttp://motion.me.ucsb.edu/book-lrpk/

Robotic Planning:

1 Sensor-based planning

2 Motion planning via decomposition and search

3 Configuration spaces

4 Sampling and collision detetion

5 Motion planning via sampling

Robotic Kinematics:

6 Intro to kinematics

7 Rotation matrices

8 Displacement matrices and inverse kinematics

9 Linear and angular velocities

Stochastic surveillance and dynamic routing

Design efficient vehicle control strategies to

1 search unpredictably

2 detect anomalies quickly

3 provide service to customers at known locations

4 perform load balancing among vehicles

VehicleRouting

StochasticSurveillance

DataAggregation

Outline

1 vehicle routing2 load balancing and partitioning

3 stochastic surveillance

AeroVironment Inc, “Raven”unmanned aerial vehicle

iRobot Inc, “PackBot”unmanned ground vehicle

Vehicle routing in dynamic stochastic environments

customers appear sequentially randomly space/time

robotic network knows locations and provides service

Goal: distributed adaptive algos, delay vs throughput

F. Bullo, E. Frazzoli, M. Pavone, K. Savla, and S. L. Smith. Dynamic vehicle routing forrobotic systems.Proceedings of the IEEE, 99(9):1482–1504, 2011.doi:10.1109/JPROC.2011.2158181

Algo #1: Receding-horizon shortest-path policy

Receding-horizon Shortest-Path (RH-SP)

For η ∈ (0, 1], single agent performs:1: while no customers, move to center

2: while customers waiting

1 compute shortest path through current

customers

2 service η-fraction of path

shortest path is NP-hard, but effectiveheuristics available

delay is optimal in light traffic

delay is constant-factor optimal in high traffic

Algo #2: Load balancing via territory partitioning

RH-SP + Partitioning

For η ∈ (0, 1], agent i performs:1: compute own cell vi in optimal partition2: apply RH-SP policy on vi

Asymptotically constant-factor optimal in light and high traffic

Outline

1 vehicle routing

2 load balancing and partitioning3 stochastic surveillance



Load balancing via partitioning

ANALYSIS of cooperative distributed behaviors

DESIGN of performance metrics

1 how to cover a region with n minimum-radius overlapping disks?

2 how to design a minimum-distortion (fixed-rate) vector quantizer?

3 where to place mailboxes in a city / cache servers on the internet?

Voronoi+centering algorithm

Voronoi+centering law

At each comm round:

1: acquire neighbors’ positions

2: compute own dominance region

3: move towards center of own

dominance region

Area-center Incenter Circumcenter

S. Mart́ınez, J. Cortés, and F. Bullo. Motion coordination with distributed information.IEEE Control Systems, 27(4):75–88, 2007.doi:10.1109/MCS.2007.384124

Variations on the Voronoi+centering theme

T. Hatanaka, M. Fujita, TokyoTech 3D coverage

Variations on the Voronoi+centering theme

J. W. Durham, R. Carli, P. Frasca, and F. Bullo. Discrete partitioning and coverage controlfor gossiping robots.IEEE Transactions on Robotics, 28(2):364–378, 2012.doi:10.1109/TRO.2011.2170753

Outline

1 vehicle routing

2 load balancing and partitioning

3 stochastic surveillance



Outline

1 Problem setup and motivation

2 Markov chains with maximum return time entropy

3 Performance of proposed solution

4 Conclusion and future directions

Related work on persistent monitoring and surveillance:

1 G. Cannata and A. Sgorbissa. A minimalist algorithm for multirobotcontinuous coverage.

IEEE Transactions on Robotics, 27(2):297–312, 2011.

doi:10.1109/TRO.2011.2104510

2 S. Alamdari, E. Fata, and S. L. Smith. Persistent monitoring in discreteenvironments: Minimizing the maximum weighted latency betweenobservations.

International Journal of Robotics Research, 33(1):138–154, 2014.

doi:10.1177/0278364913504011

3 J. Yu, S. Karaman, and D. Rus. Persistent monitoring of events withstochastic arrivals at multiple stations.

IEEE Transactions on Robotics, 31(3):521–535, 2015.

doi:10.1109/TRO.2015.2409453

Our publications

1 R. Patel, P. Agharkar, and F. Bullo. Robotic surveillance and Markov chains withminimal weighted Kemeny constant.

IEEE Transactions on Automatic Control, 60(12):3156–3167, 2015.

doi:10.1109/TAC.2015.2426317

2 R. Patel, A. Carron, and F. Bullo. The hitting time of multiple random walks.

SIAM Journal on Matrix Analysis and Applications, 37(3):933–954, 2016.

doi:10.1137/15M1010737

3 M. George, S. Jafarpour, and F. Bullo. Markov chains with maximum entropy forrobotic surveillance.

IEEE Transactions on Automatic Control, May 2018.

doi:10.1109/TAC.2018.2844120

4 X. Duan, M. George, and F. Bullo. Markov chains with maximum return timeentropy for robotic surveillance.

IEEE Transactions on Automatic Control, May 2018.

Submitted.

URL: https://arxiv.org/abs/1803.07705

Stochastic surveillance: Motivating example 1/2

Markovian surveillance agents with visit frequency constraints

Intelligent intruders can sense position/observe path of agent

Design optimal unpredictable transitions for the surveillance agents

Stochastic surveillance: Motivating example 2/2

San Francisco

crime rate at 12 locations

complete by-car travel times(quantized in minutes)

define π ∼ crime rate

Rational intruder:

Picks a node i to attack with probability πi for duration τ

Learns the inter-visit time statistics of surveillance agent

Attacks at time which maximizes likelihood of not being detected

Why Markov chains for routing and planning strategies?

rain sun

p21

p12

p22p11

Advantages of adopting Markov chains:

1 quantify and optimize randomness & unpredictability

2 vast body of work on Markov chains (eg, fastest mixing)

3 finite-dimensional opt problem

4 note: TSP may be written as Markov transition matrix

Entropy of random variable

Given a discrete random variable X ∈ {1, . . . , k}, the Shannon entropy is

H(X ) = −k∑

i=1

pi log pi .

Unbiased coin: P[X = Head] = 0.5 H(X ) = log 2 = 0.693Biased coin: P[X = Head] = 0.75 H(X ) = 0.562Predictable coin: P[X = Head] = 1 H(X ) = 0

The entropy of what variable?

visit random locations

sequence of locations

generate random symbols

what would bank robber do?

when does the police visit?

learn the statistics of return times

0 10 20 30 40 50 60Return times

0

0.05

0.1

0.15

0.2

Ret

urn

Proa

bilit

y

0 10 20 30 40 50 60Return times

0

0.05

0.1

0.15

0.2

Ret

urn

Proa

bilit

y

The entropy rate of a Markov chainA classic notion from information theory

entropy rate of sequence of symbols/locations

Hlocation(P) = −n∑

i=1

πi

n∑

j=1

pij log pij

Maximizing the location entropy rate

Given stationary distribution π & adjacency matrix A

maxP

Hlocation(P)

1 P is transition matrix with stationary distribution π

2 P is consistent with A

Return time entropy of Markov chainBetter entropy notion

Consider irreducible digraph with integer travel timesFor a transition matrix P

Tii (P) = first time agent starting at i returns back to i

Return time entropy of Markov chain

Given irreducible Markov chain P over weighted digraph G = {V , E ,W }and stationary distribution π, the return time entropy is

Hret-time(P) =n∑

i=1

πiH(Tii (P))

Main problem statement

Maximize Hret-time ProblemGiven stationary distribution π and a weighted digraph G = {V , E ,W },

maxP

Hret-time(P)

subject to

1 P is transition matrix with stationary distribution π

2 P is consistent with G

X. Duan, M. George, and F. Bullo. Markov chains with maximum return time entropy forrobotic surveillance.IEEE Transactions on Automatic Control, May 2018.Submitted.URL: https://arxiv.org/abs/1803.07705

Outline





Summary of results

Maximize Hret-time ProblemGiven stationary distribution π and a weighted digraph G = {V , E ,W },

maxP

Hret-time(P)

subject to

1 P is transition matrix with stationary distribution π.

2 P is consistent with G.

Thm 1: Hitting time probability dynamics and well-posedness

Thm 2: Upper bound and solution for complete graph

Thm 3: Relations with the location entropy rate

Thm 4: Truncation, approximation and computation

Basic ideas

Tij = min{∑k−1

s=0wXsXs+1 |X0 = i ,Xk = j , k ≥ 1

}

Fk(i , j) = P[Tij = k]

Hret-time(Tii ) = −∞∑

k=1

Fk(i , i) log Fk(i , i)

Recusive formula, for k ∈ Z>0,

Fk(i , j) = pij1{k=wij} +n∑

h=1,h 6=jpihFk−wih(h, j) (1)

where 1{·} indicator function andwhere Fk(i , j) = 0 for all k ≤ 0 and i , j

Little example with summable seriesOnly other example is complete homogeneous graph

rain sun

p21

p12

p22p11

For this special case

P(T11 = k) =

{p11, if k = 1,

p12pk−222 p21, if k ≥ 2.

H(T11) = −p11 log p11 − p12 log(p12p21)−p12p22 log p22

p21

Hret-time(P) = −2π1p11 log(p11)− 2π2p22 log(p22)− 2π1p12 log(p12)− 2π2p21 log(p21).

In general, Hret-time(P) does not admit a closed form.

Thm 1: Hitting time probability dynamics, well-posedness

Thm 1: Hitting time probability dynamics and well-posedness

Given an irreducible Markov chain P ∈ Rn×n on weighted digraph G,1 hitting time probabilities satisfy

vec(Fk) =n∑

i ,j=1

pij([1n − ei ]⊗ eie>j ) vec(Fk−wij )

+ vec(P ◦ 1{k1n1>n =W })

2 discrete-time affine system with delays – is exponentially stable

Hret-time is a continuous function over a compact set

Consider a compact set of Schur stable matrices A ⊂ Rn×n and let

ρA := maxA∈A ρ(A) < 1.

Then for any λ ∈ (ρA, 1) and for any ‖ · ‖, there exists c > 0 such that

‖Ak‖ ≤ cλk , for all A ∈ A and k ∈ Z≥0.

Consider a sequence of functions {fk : X → R}k∈Z>0 . If there exists asequence of Weierstrass scalars {Mk}k∈Z>0 such that

∑∞k=1

Mk 0,

then∑∞

k=1 fk converges uniformly. Today fk = Fk(i , i) log Fk(i , i).

The uniform limit of any sequence of continuous functions is continuous.

Thm 2: Upper bound and solution for complete graph

given expected value µ,

geometric distribution P[Y = k] =(

1− 1µ)k−1

1µ is maxentropic

Given a strongly connected unweighted digraph, stationary distribution π,

1 the return time entropy function is upper bounded by

Hret-time(P) ≤ −n∑

i=1

(πi log πi + (1− πi ) log(1− πi ));

2 if G is complete, the upper bound is achieved with P = 1nπ>.

upper bound only depends on the stationary distribution πcomplete unweighted G: MaxLocationEntropy = MaxReturnEntropy

Thm 3: Relations with the location entropy rate

Given an irreducible Markov chain P ∈ Rn×n over an unweighted digraphG and stationary distribution π, Hret-time(P) and Hlocation(P) satisfy

Hlocation(P) ≤ Hret-time(P) ≤ nHlocation(P).

lower bound: due to concavity of -x log x

lower bound: achieved with P is a permutation matrix, 0 = 0

upper bound: proof by analyzing the entropy of trajectories

upper bound: achieved when different return paths = different lengths

Lesson: Hret-time(P) can be very different from Hlocation(P)

Computational ideas

Given accuracy η, truncation duration Nη and tail probability satisfy

Nη =⌈ wmaxηπmin

⌉− 1 ⇒ P[Tii ≥ Nη + 1] ≤ η.

The conditional return time entropy is of interest:

(Hret-time)cond,η(P) =n∑

i=1

πiH(Tii |Tii ≤ Nη)

= −n∑

i=1

πi

Nη∑

k=1

Fk(i , i)Nη∑k=1

Fk(i , i)

logFk(i , i)

Nη∑k=1

Fk(i , i)

.

In practice, the truncated return time entropy is

(Hret-time)trunc,η(P) = −n∑

i=1

πi

Nη∑

k=1

Fk(i , i) log Fk(i , i).



Given a strongly connected weighted digraph G, stationary distribution π,1 Asymptotic agreement

Hret-time(P) = limη→0+

(Hret-time)cond,η(P) = limη→0+

(Hret-time)trunc,η(P)

2 The gradient of (Hret-time)trunc,η(P) can be computed via

vec(∂(Hret-time)trunc,η(P)

∂P

)= −

n∑

i=1

πi

Nη∑

k=1

∂(Fk(i , i) log Fk(i , i)

)

∂Fk(i , i)G>k e(i−1)n+i ,

where Gk =[∂ vec(Fk )∂p11

· · · ∂ vec(Fk )∂pnn]

satisfies a delayed linear system;

Proof: exp stability of affine delayed system + uniform bound + chain rule

Gradient projection algorithm

1: select: minimum edge weight �� 1,select: truncation accuracy η � 1, andselect: initial condition P0 in P�G,π

2: for iteration parameter s = 0 : (number-of-steps) do3: {Gk}k∈{1,...,Nη} := solution to Thm 4 at Ps4: ∆s := gradient of (Hret-time)trunc,η(Ps)5: Ps+1 := projectionP�G,π(Ps + (step size) ·∆s)6: end for

Outline





Compare three chains

1 MaxReturnEntropymaxP

Hret-time(P)

2 MaxLocationEntropymaxP

Hlocation(P)

entropy rate of sequence of symbols/locations

Hlocation(P) = −n∑

i=1

πi

n∑

j=1

pij log pij

3 MinKemeny: minP

E[K (P)]Minimize the mean first passage time:

ki =∑

i

E[Tij ]πj = kj = Kemeny constant

Comparison over a ring and a grid graph 1/2

Unit travel times.Ring weights = 4 high, 4 low. Grid weights ∼ node degree.

0 10 20 30 40 50 60Return times

0

0.05

0.1

0.15

0.2

Ret

urn

Pro

abili

ty

(a) MaxReturnEntropy

0 10 20 30 40 50 60Return times

0

0.05

0.1

0.15

0.2

Ret

urn

Pro

abili

ty

(b) MaxLocationEntropy

0 10 20 30 40 50 60Return times

0

0.05

0.1

0.15

0.2

Ret

urn

Pro

abili

ty

(c) MinKemeny

0 10 20 30 40 50 60Return times

0

0.05

0.1

0.15

0.2

0.25

0.3

Ret

urn

Pro

abili

ty

(d) MaxReturnEntropy

0 10 20 30 40 50 60Return times

0

0.05

0.1

0.15

0.2

0.25

0.3

Ret

urn

Pro

abili

ty

(e) MaxLocationEntropy

0 10 20 30 40 50 60Return times

0

0.05

0.1

0.15

0.2

0.25

0.3

Ret

urn

Pro

abili

ty

(f) MinKemeny

Comparison over a ring and a grid graph 2/2

Graph Markov chains Hret-time(P) Hlocation(P)Kemenyconstant

8-node ring MaxReturnEntropy 2.49 0.86 10.04MaxLocationEntropy 2.35 0.98 19.53

MinKemeny 1.96 0.46 6.16

4-by-4 gridMaxReturnEntropy 3.65 0.94 16.35

MaxLocationEntropy 3.28 1.40 30.86MinKemeny 2.09 0.21 10.09

MaxReturnEntropy chain combines speed and unpredictability.MaxReturnEntropy is nonreversible and thus faster in general.

Comparison over San Francisco map 1/3Stochastic surveillance: Motivating example 2/2

San Francisco

crime rate at 12 locations

complete by-car travel times(quantized in minutes)

π ∼ crime rate

A rational intruder:

Picks a node i to attack with probability πi for duration τ

Learns the inter-visit time statistics of surveillance agent

Attacks at time which maximizes likelihood of not being detected

Comparison over San Francisco map 2/3

(g) MaxReturnEntropy (h) MinKemeny

Figure: Pixel image of the Markov chains with row sum being 1

MinKemeny chain is close to a shortest tour with self weights

MaxReturnEntropy chain is dense and creates more return entropy

Comparison over San Francisco map 3/3: high vs. low

0 10 20 30 40 50 60Return times

0

0.01

0.02

0.03

0.04

Ret

urn

Pro

abili

ty

0 10 20 30 40 50 60Return times

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

Ret

urn

Pro

abili

ty

0 10 20 30 40 50 60Return times

0

0.005

0.01

0.015

0.02

Ret

urn

Pro

abili

ty

0 10 20 30 40 50 60Return times

0

0.02

0.04

0.06

0.08

0.1

Ret

urn

Pro

abili

ty

Comparison in catching the rational intruder 1/2

Rational intruder:

Picks a node i to attack with probability πi

Collects the inter-visit (return) time statistics of the agent

Attacks when the agent is absent for si timesteps since last visit

si = argmin0≤s≤Si

{∑τk=1

P(Tii = s + k |Tii > s)},

where τ is the attack duration and Si is determined by the degree ofimpatience δ, i.e., P(Tii ≥ Si ) ≤ δ

Comparison in catching the rational intruder 2/2

0 5 10 15 20 25Attack duration

0

0.2

0.4

0.6

0.8

1

Pro

babi

lity

of c

aptu

re

MaxReturnEntropyMaxEntropyRateMinKemeny

0 10 20 30 40 50 60Attack duration

0

0.2

0.4

0.6

0.8

1

Pro

babi

lity

of c

aptu

re

MaxReturnEntropyMinKemeny

4× 4 grid: MaxReturnEntropy > MaxLocationEntropy,MaxReturnEntropy > MinKemeny for short attack duration

SF map: MaxReturnEntropy > MinKemeny for short attack duration

Conclusion and future directions

Conclusion

1 new metric for unpredictability of Markov chains

2 analysis and computation for maximum return time entropy chain

3 applicability (and comparison) in stochastic surveillance

Future Work

1 extensions to multi-vehicle problems

2 scalable computation for large graphs

3 transcription from continuous space/time

Lectures on Robotic Planning and...

Documents

Transcript of Lectures on Robotic Planning and...