Lectures on Robotic Planning and...
Transcript of Lectures on Robotic Planning and...
-
On Robotic Routing and Stochastic Surveillance
Francesco Bullo
Center for Control,Dynamical Systems & Computation
University of California at Santa Barbara
http://motion.me.ucsb.edu
Shenyang Institute of Automation, Chinese Academy of Sciences,Shenyang, China, Jun ’18
Academy of Mathematics and System Science, Chinese Academy ofScience, Beijing, China, Jun ’18
School of Automation, Beijing Institute of Technology, Beijing, China,Jun ’18
Acknowledgments
Xiaoming DuanUCSB
Mishel GeorgeScotty Labs
Rush PatelNorthropGrumman
PushkariniAgharkarGoogle
Jeff PetersUTRC
NSF AFOSR ARO ONR DOE
New text “Lectures on Network Systems”
Lectures onNetwork Systems
Francesco Bullo
With contributions byJorge Cortés
Florian DörflerSonia Martínez
Lectures on Network SystemsFrancesco Bullo
These lecture notes provide a mathematical introduction to multi-agent
dynamical systems, including their analysis via algebraic graph theory
and their application to engineering design problems. The focus is on
fundamental dynamical phenomena over interconnected network
systems, including consensus and disagreement in averaging systems,
stable equilibria in compartmental flow networks, and synchronization
in coupled oscillators and networked control systems. The theoretical
results are complemented by numerous examples arising from the
analysis of physical and natural systems and from the design of
network estimation, control, and optimization systems.
Francesco Bullo is professor of Mechanical Engineering and member of the Center for Control, Dynamical Systems, and Computation at the
University of California at Santa Barbara. His research focuses on
modeling, dynamics and control of multi-agent network systems, with
applications to robotic coordination, energy systems, and social
networks. He is an award-winning mentor and teacher.
Francesco BulloLectures on N
etwork System
s
Lectures on Network Systems, Francesco Bullo,Createspace, 1 edition, ISBN 978-1-986425-64-3
For students: free PDF for downloadFor instructors: slides and answer keyshttps://www.amazon.com/dp/1986425649300 pages (plus 200 pages solution manual)3K downloads since Jun 2016150 exercises with solutions20 instructors have adopted parts of it
Linear Systems:
1 social, sensor, robotic & compartmental examples,
2 matrix and graph theory, with an emphasis onPerron–Frobenius theory and algebraic graph theory,
3 averaging algorithms in discrete and continuous time,described by static and time-varying matrices, and
4 positive & compartmental systems, dynamical flowsystems, Metzler matrices.
Nonlinear Systems:
5 nonlinear consensus models,
6 population dynamic models in multi-species systems,
7 coupled oscillators, with an emphasis on theKuramoto model and models of power networks
New text “Lectures on Robotic Planning and Kinematics”
Lectures on Robotic Planning and Kinematics, ver .91For students: free PDF for downloadFor instructors: slides and answer keyshttp://motion.me.ucsb.edu/book-lrpk/
Robotic Planning:
1 Sensor-based planning
2 Motion planning via decomposition and search
3 Configuration spaces
4 Sampling and collision detetion
5 Motion planning via sampling
Robotic Kinematics:
6 Intro to kinematics
7 Rotation matrices
8 Displacement matrices and inverse kinematics
9 Linear and angular velocities
-
Stochastic surveillance and dynamic routing
Design efficient vehicle control strategies to
1 search unpredictably
2 detect anomalies quickly
3 provide service to customers at known locations
4 perform load balancing among vehicles
VehicleRouting
StochasticSurveillance
DataAggregation
Outline
1 vehicle routing2 load balancing and partitioning
3 stochastic surveillance
AeroVironment Inc, “Raven”unmanned aerial vehicle
iRobot Inc, “PackBot”unmanned ground vehicle
Vehicle routing in dynamic stochastic environments
customers appear sequentially randomly space/time
robotic network knows locations and provides service
Goal: distributed adaptive algos, delay vs throughput
F. Bullo, E. Frazzoli, M. Pavone, K. Savla, and S. L. Smith. Dynamic vehicle routing forrobotic systems.Proceedings of the IEEE, 99(9):1482–1504, 2011.doi:10.1109/JPROC.2011.2158181
Algo #1: Receding-horizon shortest-path policy
Receding-horizon Shortest-Path (RH-SP)
For η ∈ (0, 1], single agent performs:1: while no customers, move to center
2: while customers waiting
1 compute shortest path through current
customers
2 service η-fraction of path
shortest path is NP-hard, but effectiveheuristics available
delay is optimal in light traffic
delay is constant-factor optimal in high traffic
-
Algo #2: Load balancing via territory partitioning
RH-SP + Partitioning
For η ∈ (0, 1], agent i performs:1: compute own cell vi in optimal partition2: apply RH-SP policy on vi
Asymptotically constant-factor optimal in light and high traffic
Outline
1 vehicle routing
2 load balancing and partitioning3 stochastic surveillance
AeroVironment Inc, “Raven”unmanned aerial vehicle
iRobot Inc, “PackBot”unmanned ground vehicle
Load balancing via partitioning
ANALYSIS of cooperative distributed behaviors
DESIGN of performance metrics
1 how to cover a region with n minimum-radius overlapping disks?
2 how to design a minimum-distortion (fixed-rate) vector quantizer?
3 where to place mailboxes in a city / cache servers on the internet?
Voronoi+centering algorithm
Voronoi+centering law
At each comm round:
1: acquire neighbors’ positions
2: compute own dominance region
3: move towards center of own
dominance region
Area-center Incenter Circumcenter
S. Mart́ınez, J. Cortés, and F. Bullo. Motion coordination with distributed information.IEEE Control Systems, 27(4):75–88, 2007.doi:10.1109/MCS.2007.384124
-
Variations on the Voronoi+centering theme
T. Hatanaka, M. Fujita, TokyoTech 3D coverage
Variations on the Voronoi+centering theme
J. W. Durham, R. Carli, P. Frasca, and F. Bullo. Discrete partitioning and coverage controlfor gossiping robots.IEEE Transactions on Robotics, 28(2):364–378, 2012.doi:10.1109/TRO.2011.2170753
Outline
1 vehicle routing
2 load balancing and partitioning
3 stochastic surveillance
AeroVironment Inc, “Raven”unmanned aerial vehicle
iRobot Inc, “PackBot”unmanned ground vehicle
Outline
1 Problem setup and motivation
2 Markov chains with maximum return time entropy
3 Performance of proposed solution
4 Conclusion and future directions
-
Related work on persistent monitoring and surveillance:
1 G. Cannata and A. Sgorbissa. A minimalist algorithm for multirobotcontinuous coverage.
IEEE Transactions on Robotics, 27(2):297–312, 2011.
doi:10.1109/TRO.2011.2104510
2 S. Alamdari, E. Fata, and S. L. Smith. Persistent monitoring in discreteenvironments: Minimizing the maximum weighted latency betweenobservations.
International Journal of Robotics Research, 33(1):138–154, 2014.
doi:10.1177/0278364913504011
3 J. Yu, S. Karaman, and D. Rus. Persistent monitoring of events withstochastic arrivals at multiple stations.
IEEE Transactions on Robotics, 31(3):521–535, 2015.
doi:10.1109/TRO.2015.2409453
Our publications
1 R. Patel, P. Agharkar, and F. Bullo. Robotic surveillance and Markov chains withminimal weighted Kemeny constant.
IEEE Transactions on Automatic Control, 60(12):3156–3167, 2015.
doi:10.1109/TAC.2015.2426317
2 R. Patel, A. Carron, and F. Bullo. The hitting time of multiple random walks.
SIAM Journal on Matrix Analysis and Applications, 37(3):933–954, 2016.
doi:10.1137/15M1010737
3 M. George, S. Jafarpour, and F. Bullo. Markov chains with maximum entropy forrobotic surveillance.
IEEE Transactions on Automatic Control, May 2018.
doi:10.1109/TAC.2018.2844120
4 X. Duan, M. George, and F. Bullo. Markov chains with maximum return timeentropy for robotic surveillance.
IEEE Transactions on Automatic Control, May 2018.
Submitted.
URL: https://arxiv.org/abs/1803.07705
Stochastic surveillance: Motivating example 1/2
Markovian surveillance agents with visit frequency constraints
Intelligent intruders can sense position/observe path of agent
Design optimal unpredictable transitions for the surveillance agents
Stochastic surveillance: Motivating example 2/2
San Francisco
crime rate at 12 locations
complete by-car travel times(quantized in minutes)
define π ∼ crime rate
Rational intruder:
Picks a node i to attack with probability πi for duration τ
Learns the inter-visit time statistics of surveillance agent
Attacks at time which maximizes likelihood of not being detected
-
Why Markov chains for routing and planning strategies?
rain sun
p21
p12
p22p11
Advantages of adopting Markov chains:
1 quantify and optimize randomness & unpredictability
2 vast body of work on Markov chains (eg, fastest mixing)
3 finite-dimensional opt problem
4 note: TSP may be written as Markov transition matrix
Entropy of random variable
Given a discrete random variable X ∈ {1, . . . , k}, the Shannon entropy is
H(X ) = −k∑
i=1
pi log pi .
Unbiased coin: P[X = Head] = 0.5 H(X ) = log 2 = 0.693Biased coin: P[X = Head] = 0.75 H(X ) = 0.562Predictable coin: P[X = Head] = 1 H(X ) = 0
The entropy of what variable?
visit random locations
sequence of locations
generate random symbols
what would bank robber do?
when does the police visit?
learn the statistics of return times
0 10 20 30 40 50 60Return times
0
0.05
0.1
0.15
0.2
Ret
urn
Proa
bilit
y
0 10 20 30 40 50 60Return times
0
0.05
0.1
0.15
0.2
Ret
urn
Proa
bilit
y
The entropy rate of a Markov chainA classic notion from information theory
entropy rate of sequence of symbols/locations
Hlocation(P) = −n∑
i=1
πi
n∑
j=1
pij log pij
Maximizing the location entropy rate
Given stationary distribution π & adjacency matrix A
maxP
Hlocation(P)
1 P is transition matrix with stationary distribution π
2 P is consistent with A
-
Return time entropy of Markov chainBetter entropy notion
Consider irreducible digraph with integer travel timesFor a transition matrix P
Tii (P) = first time agent starting at i returns back to i
Return time entropy of Markov chain
Given irreducible Markov chain P over weighted digraph G = {V , E ,W }and stationary distribution π, the return time entropy is
Hret-time(P) =n∑
i=1
πiH(Tii (P))
Main problem statement
Maximize Hret-time ProblemGiven stationary distribution π and a weighted digraph G = {V , E ,W },
maxP
Hret-time(P)
subject to
1 P is transition matrix with stationary distribution π
2 P is consistent with G
X. Duan, M. George, and F. Bullo. Markov chains with maximum return time entropy forrobotic surveillance.IEEE Transactions on Automatic Control, May 2018.Submitted.URL: https://arxiv.org/abs/1803.07705
Outline
1 Problem setup and motivation
2 Markov chains with maximum return time entropy
3 Performance of proposed solution
4 Conclusion and future directions
Summary of results
Maximize Hret-time ProblemGiven stationary distribution π and a weighted digraph G = {V , E ,W },
maxP
Hret-time(P)
subject to
1 P is transition matrix with stationary distribution π.
2 P is consistent with G.
Thm 1: Hitting time probability dynamics and well-posedness
Thm 2: Upper bound and solution for complete graph
Thm 3: Relations with the location entropy rate
Thm 4: Truncation, approximation and computation
-
Basic ideas
Tij = min{∑k−1
s=0wXsXs+1 |X0 = i ,Xk = j , k ≥ 1
}
Fk(i , j) = P[Tij = k]
Hret-time(Tii ) = −∞∑
k=1
Fk(i , i) log Fk(i , i)
Recusive formula, for k ∈ Z>0,
Fk(i , j) = pij1{k=wij} +n∑
h=1,h 6=jpihFk−wih(h, j) (1)
where 1{·} indicator function andwhere Fk(i , j) = 0 for all k ≤ 0 and i , j
Little example with summable seriesOnly other example is complete homogeneous graph
rain sun
p21
p12
p22p11
For this special case
P(T11 = k) =
{p11, if k = 1,
p12pk−222 p21, if k ≥ 2.
H(T11) = −p11 log p11 − p12 log(p12p21)−p12p22 log p22
p21
Hret-time(P) = −2π1p11 log(p11)− 2π2p22 log(p22)− 2π1p12 log(p12)− 2π2p21 log(p21).
In general, Hret-time(P) does not admit a closed form.
Thm 1: Hitting time probability dynamics, well-posedness
Thm 1: Hitting time probability dynamics and well-posedness
Given an irreducible Markov chain P ∈ Rn×n on weighted digraph G,1 hitting time probabilities satisfy
vec(Fk) =n∑
i ,j=1
pij([1n − ei ]⊗ eie>j ) vec(Fk−wij )
+ vec(P ◦ 1{k1n1>n =W })
2 discrete-time affine system with delays – is exponentially stable
Hret-time is a continuous function over a compact set
Consider a compact set of Schur stable matrices A ⊂ Rn×n and let
ρA := maxA∈A ρ(A) < 1.
Then for any λ ∈ (ρA, 1) and for any ‖ · ‖, there exists c > 0 such that
‖Ak‖ ≤ cλk , for all A ∈ A and k ∈ Z≥0.
Consider a sequence of functions {fk : X → R}k∈Z>0 . If there exists asequence of Weierstrass scalars {Mk}k∈Z>0 such that
∑∞k=1
Mk 0,
then∑∞
k=1 fk converges uniformly. Today fk = Fk(i , i) log Fk(i , i).
The uniform limit of any sequence of continuous functions is continuous.
-
Thm 2: Upper bound and solution for complete graph
given expected value µ,
geometric distribution P[Y = k] =(
1− 1µ)k−1
1µ is maxentropic
Given a strongly connected unweighted digraph, stationary distribution π,
1 the return time entropy function is upper bounded by
Hret-time(P) ≤ −n∑
i=1
(πi log πi + (1− πi ) log(1− πi ));
2 if G is complete, the upper bound is achieved with P = 1nπ>.
upper bound only depends on the stationary distribution πcomplete unweighted G: MaxLocationEntropy = MaxReturnEntropy
Thm 3: Relations with the location entropy rate
Given an irreducible Markov chain P ∈ Rn×n over an unweighted digraphG and stationary distribution π, Hret-time(P) and Hlocation(P) satisfy
Hlocation(P) ≤ Hret-time(P) ≤ nHlocation(P).
lower bound: due to concavity of -x log x
lower bound: achieved with P is a permutation matrix, 0 = 0
upper bound: proof by analyzing the entropy of trajectories
upper bound: achieved when different return paths = different lengths
Lesson: Hret-time(P) can be very different from Hlocation(P)
Computational ideas
Given accuracy η, truncation duration Nη and tail probability satisfy
Nη =⌈ wmaxηπmin
⌉− 1 ⇒ P[Tii ≥ Nη + 1] ≤ η.
The conditional return time entropy is of interest:
(Hret-time)cond,η(P) =n∑
i=1
πiH(Tii |Tii ≤ Nη)
= −n∑
i=1
πi
Nη∑
k=1
Fk(i , i)Nη∑k=1
Fk(i , i)
logFk(i , i)
Nη∑k=1
Fk(i , i)
.
In practice, the truncated return time entropy is
(Hret-time)trunc,η(P) = −n∑
i=1
πi
Nη∑
k=1
Fk(i , i) log Fk(i , i).
Thm 4: Truncation, approximation and computation
Thm 4: Truncation, approximation and computation
Given a strongly connected weighted digraph G, stationary distribution π,1 Asymptotic agreement
Hret-time(P) = limη→0+
(Hret-time)cond,η(P) = limη→0+
(Hret-time)trunc,η(P)
2 The gradient of (Hret-time)trunc,η(P) can be computed via
vec(∂(Hret-time)trunc,η(P)
∂P
)= −
n∑
i=1
πi
Nη∑
k=1
∂(Fk(i , i) log Fk(i , i)
)
∂Fk(i , i)G>k e(i−1)n+i ,
where Gk =[∂ vec(Fk )∂p11
· · · ∂ vec(Fk )∂pnn]
satisfies a delayed linear system;
Proof: exp stability of affine delayed system + uniform bound + chain rule
-
Gradient projection algorithm
1: select: minimum edge weight �� 1,select: truncation accuracy η � 1, andselect: initial condition P0 in P�G,π
2: for iteration parameter s = 0 : (number-of-steps) do3: {Gk}k∈{1,...,Nη} := solution to Thm 4 at Ps4: ∆s := gradient of (Hret-time)trunc,η(Ps)5: Ps+1 := projectionP�G,π(Ps + (step size) ·∆s)6: end for
Outline
1 Problem setup and motivation
2 Markov chains with maximum return time entropy
3 Performance of proposed solution
4 Conclusion and future directions
Compare three chains
1 MaxReturnEntropymaxP
Hret-time(P)
2 MaxLocationEntropymaxP
Hlocation(P)
entropy rate of sequence of symbols/locations
Hlocation(P) = −n∑
i=1
πi
n∑
j=1
pij log pij
3 MinKemeny: minP
E[K (P)]Minimize the mean first passage time:
ki =∑
i
E[Tij ]πj = kj = Kemeny constant
Comparison over a ring and a grid graph 1/2
Unit travel times.Ring weights = 4 high, 4 low. Grid weights ∼ node degree.
0 10 20 30 40 50 60Return times
0
0.05
0.1
0.15
0.2
Ret
urn
Pro
abili
ty
(a) MaxReturnEntropy
0 10 20 30 40 50 60Return times
0
0.05
0.1
0.15
0.2
Ret
urn
Pro
abili
ty
(b) MaxLocationEntropy
0 10 20 30 40 50 60Return times
0
0.05
0.1
0.15
0.2
Ret
urn
Pro
abili
ty
(c) MinKemeny
0 10 20 30 40 50 60Return times
0
0.05
0.1
0.15
0.2
0.25
0.3
Ret
urn
Pro
abili
ty
(d) MaxReturnEntropy
0 10 20 30 40 50 60Return times
0
0.05
0.1
0.15
0.2
0.25
0.3
Ret
urn
Pro
abili
ty
(e) MaxLocationEntropy
0 10 20 30 40 50 60Return times
0
0.05
0.1
0.15
0.2
0.25
0.3
Ret
urn
Pro
abili
ty
(f) MinKemeny
-
Comparison over a ring and a grid graph 2/2
Graph Markov chains Hret-time(P) Hlocation(P)Kemenyconstant
8-node ring MaxReturnEntropy 2.49 0.86 10.04MaxLocationEntropy 2.35 0.98 19.53
MinKemeny 1.96 0.46 6.16
4-by-4 gridMaxReturnEntropy 3.65 0.94 16.35
MaxLocationEntropy 3.28 1.40 30.86MinKemeny 2.09 0.21 10.09
MaxReturnEntropy chain combines speed and unpredictability.MaxReturnEntropy is nonreversible and thus faster in general.
Comparison over San Francisco map 1/3Stochastic surveillance: Motivating example 2/2
San Francisco
crime rate at 12 locations
complete by-car travel times(quantized in minutes)
π ∼ crime rate
A rational intruder:
Picks a node i to attack with probability πi for duration τ
Learns the inter-visit time statistics of surveillance agent
Attacks at time which maximizes likelihood of not being detected
Comparison over San Francisco map 2/3
(g) MaxReturnEntropy (h) MinKemeny
Figure: Pixel image of the Markov chains with row sum being 1
MinKemeny chain is close to a shortest tour with self weights
MaxReturnEntropy chain is dense and creates more return entropy
Comparison over San Francisco map 3/3: high vs. low
0 10 20 30 40 50 60Return times
0
0.01
0.02
0.03
0.04
Ret
urn
Pro
abili
ty
0 10 20 30 40 50 60Return times
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
Ret
urn
Pro
abili
ty
0 10 20 30 40 50 60Return times
0
0.005
0.01
0.015
0.02
Ret
urn
Pro
abili
ty
0 10 20 30 40 50 60Return times
0
0.02
0.04
0.06
0.08
0.1
Ret
urn
Pro
abili
ty
-
Comparison in catching the rational intruder 1/2
Rational intruder:
Picks a node i to attack with probability πi
Collects the inter-visit (return) time statistics of the agent
Attacks when the agent is absent for si timesteps since last visit
si = argmin0≤s≤Si
{∑τk=1
P(Tii = s + k |Tii > s)},
where τ is the attack duration and Si is determined by the degree ofimpatience δ, i.e., P(Tii ≥ Si ) ≤ δ
Comparison in catching the rational intruder 2/2
0 5 10 15 20 25Attack duration
0
0.2
0.4
0.6
0.8
1
Pro
babi
lity
of c
aptu
re
MaxReturnEntropyMaxEntropyRateMinKemeny
0 10 20 30 40 50 60Attack duration
0
0.2
0.4
0.6
0.8
1
Pro
babi
lity
of c
aptu
re
MaxReturnEntropyMinKemeny
4× 4 grid: MaxReturnEntropy > MaxLocationEntropy,MaxReturnEntropy > MinKemeny for short attack duration
SF map: MaxReturnEntropy > MinKemeny for short attack duration
Conclusion and future directions
Conclusion
1 new metric for unpredictability of Markov chains
2 analysis and computation for maximum return time entropy chain
3 applicability (and comparison) in stochastic surveillance
Future Work
1 extensions to multi-vehicle problems
2 scalable computation for large graphs
3 transcription from continuous space/time