Nonmyopic Adaptive Informative Path Planning for Multiple Robots
Amarjeet Singh (UCLA), Andreas Krause (Caltech), William Kaiser (UCLA)

Monitoring rivers and lakes [IJCAI ’07]
Need to monitor large spatial phenomena: temperature, nutrient distribution, fluorescence, …
Use robotic sensors to cover large areas (NIMS, Kaiser et al., UCLA)
Can only make a limited number of measurements!
Predict at unobserved locations
[Figure: actual temperature (shown by color) and predicted temperature, plotted over depth and location across lake]
Where should we sense to get the most accurate predictions?

Urban Search & Rescue
How can we coordinate multiple search & rescue helicopters to quickly locate moving survivors?
[Figure: detection range and detected survivors]

Related work
Information gathering problems considered in
Experimental design (Lindley ’56, Robbins ’52, …), Value of information (Howard ’66), Spatial statistics (Cressie ’91, …), Machine Learning (MacKay ’92, …), Robotics (Sim & Roy ’05, …), Sensor Networks (Zhao et al ’04, …), Operations Research (Nemhauser ’78, …)
Existing algorithms typically
Heuristics: No guarantees! Can do arbitrarily badly.
Find optimal solutions (Mixed integer programming, POMDPs): Very difficult to scale to bigger problems.
Want algorithms that have theoretical guarantees and scale to large problems!

How to quantify collected information?
Sensing quality function $F(A)$ assigns utility to set $A$ of locations, e.g., expected reduction in MSE for predictions based on a GP model
[Figure: two example placements with $F(A_1) = 4$ and $F(A_2) = 10$]
Want to pick sensing locations $A \subseteq V$ to maximize $F(A)$

Selecting sensing locations
Given: finite set $V$ of locations
Want: $A^* \subseteq V$ such that $A^* = \arg\max_{|A| \le k} F(A)$
Typically NP-hard!
Greedy algorithm:
Start with $A = \emptyset$
For $i = 1$ to $k$
    $s^* := \arg\max_s F(A \cup \{s\})$
    $A := A \cup \{s^*\}$
[Figure: greedy picks G1, G2, G3, G4]
How well does the greedy algorithm do?
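
The greedy loop above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the coverage objective `coverage_value` and the location names in `COVERAGE` are hypothetical stand-ins for a real sensing quality function $F$.

```python
def greedy_select(candidates, F, k):
    """Greedily pick up to k elements, each time adding the one with the largest marginal gain."""
    selected = []
    remaining = set(candidates)
    for _ in range(min(k, len(remaining))):
        current = F(frozenset(selected))
        # sorted() makes tie-breaking deterministic (first in alphabetical order wins)
        best = max(sorted(remaining),
                   key=lambda s: F(frozenset(selected + [s])) - current)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: each candidate location covers a set of cells.
COVERAGE = {
    "s1": {1, 2, 3},
    "s2": {3, 4},
    "s3": {4, 5, 6, 7},
    "s4": {1, 7},
}


def coverage_value(A):
    """Assumed stand-in for F(A): number of distinct cells covered by locations in A."""
    covered = set()
    for s in A:
        covered |= COVERAGE[s]
    return len(covered)


picked = greedy_select(COVERAGE, coverage_value, k=2)
print(picked, coverage_value(picked))
```

Each iteration evaluates $F$ once per remaining candidate, so the sketch makes on the order of $k \cdot |V|$ calls to $F$.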

Key observation: Diminishing returns
Selection A = {Y1, Y2}: adding new observation Y’ will help a lot! (Large improvement)
Selection B = {Y1, …, Y5}: adding Y’ doesn’t help much (Small improvement)
Submodularity: For $A \subseteq B$, $F(A \cup \{Y'\}) - F(A) \ge F(B \cup \{Y'\}) - F(B)$
Many sensing quality functions are submodular*:
Information gain [Krause & Guestrin ’05]
Expected Mean Squared Error [Das & Kempe ’08]
Detection time / likelihood [Krause et al. ’08]
…
*See paper for details
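
To make the diminishing-returns inequality concrete, here is a small sketch that checks it exhaustively on a toy set-coverage function. The coverage sets and the names `Y1` … `Y5`, `Y_new` are hypothetical; coverage is used only because it is a standard example of a submodular function, not because the slides use it.

```python
from itertools import combinations

# Hypothetical toy data: each observation covers a few cells.
COVERAGE = {
    "Y1": {1, 2}, "Y2": {2, 3}, "Y3": {3, 4},
    "Y4": {4, 5}, "Y5": {5, 6}, "Y_new": {1, 6, 7},
}


def F(A):
    """Assumed sensing quality: number of distinct cells covered by A."""
    covered = set()
    for y in A:
        covered |= COVERAGE[y]
    return len(covered)


def marginal_gain(A, y):
    """F(A ∪ {y}) - F(A)."""
    return F(set(A) | {y}) - F(A)


def is_submodular(ground):
    """Brute-force check of the diminishing-returns inequality for all A ⊆ B and y ∉ B."""
    ground = list(ground)
    subsets = [frozenset(c) for r in range(len(ground) + 1)
               for c in combinations(ground, r)]
    for B in subsets:
        for A in subsets:
            if not A <= B:
                continue
            for y in ground:
                if y not in B and marginal_gain(A, y) < marginal_gain(B, y):
                    return False
    return True


A = {"Y1", "Y2"}
B = {"Y1", "Y2", "Y3", "Y4", "Y5"}
gain_A = marginal_gain(A, "Y_new")  # gain when added to the small selection
gain_B = marginal_gain(B, "Y_new")  # gain when added to the large selection
print(gain_A, gain_B, is_submodular(COVERAGE))
```

The brute-force check is exponential in the number of elements, so it is only meant for toy instances like this one.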

Selecting sensing locations
Theorem [Nemhauser et al. ’78]: For monotone submodular $F$, the greedy solution $A_G$ satisfies $F(A_G) \ge (1 - 1/e)\, F(\mathrm{OPT})$
Greedy near-optimal!

Challenges for informative path planning
Use robots to monitor environment
Not just select best $k$ locations $A$ for given $F(A)$. Need to
… take into account cost of traveling between locations
… cope with environments that change over time
… efficiently coordinate multiple agents
Want to scale to very large problems and have guarantees

Outline and Contributions
Path Constraints
Dynamic environments
Multi-robot coordination

Informative path planning
So far: $\max_A F(A)$ s.t. $|A| \le k$
Most informative locations might be far apart! Robot needs to travel between selected locations
Locations $V$ are nodes in a graph
$C(A)$ = cost of cheapest path connecting nodes $A$
[Figure: example graph with nodes s1, s2, s3, s4, s5, s10, s11 and edge costs of 1 or 2]
$\max_A F(A)$ s.t. $C(A) \le B$
Known as submodular orienteering problem.
Greedy algorithm fails arbitrarily badly!
Best known algorithms (Chekuri & Pal ’05, Singh et al ’07) are superpolynomial!
Can we exploit additional structure to get better algorithms?
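
As an illustration of the path cost $C(A)$, the sketch below computes the cheapest path from a start node to an end node that visits every node in $A$, by brute force over visiting orders. The graph, edge costs, and the fixed start/end convention are assumptions made for this example; brute force is only workable for a handful of nodes, which is consistent with the hardness noted above.

```python
from itertools import permutations


def all_pairs_shortest(nodes, edges):
    """Floyd-Warshall shortest-path distances on an undirected weighted graph."""
    inf = float("inf")
    dist = {u: {v: (0 if u == v else inf) for v in nodes} for u in nodes}
    for (u, v), w in edges.items():
        dist[u][v] = min(dist[u][v], w)
        dist[v][u] = min(dist[v][u], w)
    for m in nodes:
        for u in nodes:
            for v in nodes:
                if dist[u][m] + dist[m][v] < dist[u][v]:
                    dist[u][v] = dist[u][m] + dist[m][v]
    return dist


def path_cost(A, start, end, dist):
    """C(A): cost of the cheapest start-to-end path visiting all nodes in A."""
    inner = [a for a in A if a not in (start, end)]
    best = float("inf")
    for order in permutations(inner):
        stops = [start, *order, end]
        cost = sum(dist[a][b] for a, b in zip(stops, stops[1:]))
        best = min(best, cost)
    return best


# Hypothetical graph (not the one drawn on the slide).
NODES = ["s1", "s2", "s3", "s4", "s5"]
EDGES = {("s1", "s2"): 2, ("s2", "s3"): 1, ("s3", "s5"): 1,
         ("s1", "s4"): 1, ("s4", "s5"): 2, ("s2", "s4"): 1}

dist = all_pairs_shortest(NODES, EDGES)
budget = 4
cost = path_cost({"s2", "s3", "s4"}, "s1", "s5", dist)
print(cost, cost <= budget)
```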

Additional structure: Locality
If $A$, $B$ are observation sets close by, then $F(A \cup B) < F(A) + F(B)$
If $A$, $B$ are observation sets at least $r$ apart, then $F(A \cup B) \approx F(A) + F(B)$
[we only assume $F(A \cup B) \ge \gamma\,(F(A) + F(B))$]
Call such an $F$ $(r,\gamma)$-local
Sensors that are far apart are approximately independent
Holds for many objective functions (e.g., GPs with decaying covariance etc.)
We showed locality is empirically valid!
[Figure: observation sets A1, A2 and B1, B2 separated by distance $r$, with values $F(A)$ and $F(B)$]
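
A toy example of locality, under assumptions that are not from the slides: sensors sit at integer positions on a line, and each sensor covers the cells within a fixed radius. With this hypothetical objective, sets that are far apart cover disjoint cells, so $F$ is exactly additive on them, while nearby sets overlap and are strictly subadditive.

```python
def covered_cells(A, radius=2):
    """Cells within `radius` of any sensor position in A (1-D integer line)."""
    return {c for p in A for c in range(p - radius, p + radius + 1)}


def F(A, radius=2):
    """Assumed sensing quality: number of covered cells."""
    return len(covered_cells(A, radius))


def additivity_ratio(A, B):
    """F(A ∪ B) / (F(A) + F(B)); a value of 1.0 means the two sets do not interact."""
    return F(A | B) / (F(A) + F(B))


A = {0, 1}
B_near = {2}    # overlaps with the cells covered by A
B_far = {10}    # more than 2 * radius away from every sensor in A

print(additivity_ratio(A, B_near))  # strictly below 1
print(additivity_ratio(A, B_far))   # exactly 1
```

In the terminology above, the smallest such ratio over all pairs of sets at least $r$ apart plays the role of $\gamma$.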

The pSPIELOR Algorithm
Based on sensor placement algorithm by Krause, Guestrin, Gupta, Kleinberg IPSN ’06
pSPIEL: Efficient nonmyopic algorithm (padded Sensor Placements at Informative and cost-Effective Locations)
Select starting and ending location $s_1$ and $s_B$
Decompose sensing region into small, well-separated clusters
Solve cardinality constrained problem per cluster (greedy)
Combine solutions using orienteering algorithm
Smooth resulting path
[Figure: clusters C1 to C4 between $s_1$ and $s_B$, with greedy picks $g_{i,j}$ inside each cluster $C_i$]
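
The per-cluster greedy step can be sketched as follows. This covers only that one step: the clusters are given by hand here (the padded decomposition, the orienteering step, and path smoothing are not shown), and the coverage objective and node names are hypothetical.

```python
# Hypothetical toy data: cells covered by each candidate node.
COVER = {
    "a": {1, 2, 3}, "b": {3, 4},
    "c": {10, 11}, "d": {11, 12, 13}, "e": {13},
}

# Assumed, hand-picked clusters standing in for the decomposition step.
CLUSTERS = {"C1": ["a", "b"], "C2": ["c", "d", "e"]}


def F(A):
    """Assumed sensing quality: number of distinct cells covered by A."""
    covered = set()
    for s in A:
        covered |= COVER[s]
    return len(covered)


def greedy_chain(cluster, F):
    """Order a cluster's nodes greedily; return (node, marginal gain) pairs in pick order."""
    chain, chosen = [], []
    remaining = sorted(cluster)
    while remaining:
        base = F(chosen)
        best = max(remaining, key=lambda s: F(chosen + [s]) - base)
        chain.append((best, F(chosen + [best]) - base))
        chosen.append(best)
        remaining.remove(best)
    return chain


chains = {name: greedy_chain(nodes, F) for name, nodes in CLUSTERS.items()}
for name, chain in chains.items():
    print(name, chain)
```

Each chain lists a cluster's nodes in greedy order with their marginal gains, which is one plausible reading of the $g_{i,j}$ labels in the figure.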

Guarantees for pSPIELOR
Based on results by Krause, Guestrin, Gupta, Kleinberg IPSN ’06
Theorem: For $(r,\gamma)$-local submodular $F$, pSPIEL finds a path $A$ with
submodular utility $F(A) \ge \Omega(\gamma)\, \mathrm{OPT}_F$
path length $C(A) \le O(r)\, \mathrm{OPT}_C$
*See paper for details

pSPIEL Results: Search & Rescue
Sensor Planning Research Challenge
Coordination of multiple mobile sensors to detect survivors of major urban disaster
Buildings obstruct viewfield of camera
$F(A)$ = Expected # of people detected
[Figure: map showing detection range, rescue range, detected survivors, and rescued survivors]
[Plot: expected number of survivors rescued (0 to 80) vs. number of timesteps (0 to 50), comparing pSPIEL, Greedy, and Heuristic (Chao et al)]
pSPIEL outperforms existing algorithms for informative path planning

Outline and Contributions
Path Constraints: pSPIELOR exploits $(r,\gamma)$-locality to near-optimally solve submodular orienteering
Dynamic environments
Multi-robot coordination

Dynamic environments
So far: $\max_A F(A)$ s.t. $C(A) \le B$
Assumes we know the sensing quality $F$ in advance
Plan a fixed (nonadaptive) path / placement $A$
In practice:
Model unknown; need to learn as we go
Environment changes dynamically
Active learning: Find adaptive policy that modifies solution based on observations
Gigantic POMDP (intractable)
Can we efficiently find a good solution?
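
Sequential sensing
[Figure: a sensing policy $\pi$ drawn as a decision tree over observations (X5 = ?, then X3 = ?, X2 = ?, X7 = ?, X12 = ?, X23 = ?, depending on observed values such as X5 = 17 or X5 = 21); one branch yields F(X5=17, X3=16, X7=19) = 3.4, other branches yield F(…) = 2.1 and F(…) = 2.4]
$F(\pi) = 3.1$: expected utility over outcome of observations
Want to pick sensing policy $\pi$ to maximize $F(\pi)$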

NAÏVE Algorithm [Singh, K, Kaiser, IJCAI ’09]
At each timestep $t$
Plan nonadaptive solution $A^* = \arg\max_A F_t(A)$ (Efficient! E.g., using pSPIEL)
Execute first step of nonadaptive solution
Receive observations obs
Update sensing quality $F_{t+1}(A) = F_t(A \mid \text{obs})$ $\forall A$
Defines a Nonmyopic Adaptive informatIVE policy NAIVE
How well does this policy compare to the optimal policy?
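
The plan / execute-first-step / update loop can be sketched as below. Everything specific here is an assumption for illustration: the belief is a per-location survivor probability, the planner `plan_nonadaptive` is a trivial stand-in for pSPIEL (it ignores travel cost), and `update_belief` uses a made-up rule that raises the probability of neighbors when a survivor is found.

```python
def plan_nonadaptive(belief, horizon):
    """Stand-in planner: the `horizon` locations with highest belief (ties broken by name)."""
    return sorted(belief, key=lambda loc: (-belief[loc], loc))[:horizon]


def update_belief(belief, loc, found, neighbors, bump=0.5):
    """Hypothetical update: drop the visited location; raise neighbors if a survivor was found."""
    new = dict(belief)
    new.pop(loc)
    if found:
        for n in neighbors.get(loc, []):
            if n in new:
                new[n] = min(1.0, new[n] + bump)
    return new


def naive_policy(belief, truth, neighbors, steps, horizon=3):
    """Replan at every timestep and execute only the first step of each plan."""
    visited, found_total = [], 0
    for _ in range(steps):
        if not belief:
            break
        plan = plan_nonadaptive(belief, horizon)  # plan nonadaptive solution
        loc = plan[0]                             # execute first step only
        found = truth[loc]                        # receive observation
        found_total += int(found)
        visited.append(loc)
        belief = update_belief(belief, loc, found, neighbors)  # update F_t
    return visited, found_total


# Hypothetical scenario.
initial_belief = {"a": 0.5, "b": 0.4, "c": 0.2, "d": 0.1}
truth = {"a": True, "b": False, "c": False, "d": True}
neighbors = {"a": ["d"], "d": ["a"]}

adaptive_visits, adaptive_found = naive_policy(initial_belief, truth, neighbors, steps=2)
fixed_visits = plan_nonadaptive(initial_belief, 2)
fixed_found = sum(truth[loc] for loc in fixed_visits)
print(adaptive_visits, adaptive_found, fixed_visits, fixed_found)
```

In this toy scenario the adaptive loop redirects to location `d` after finding a survivor at `a`, while the fixed plan from the initial belief does not.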

Guarantees for NAÏVE-pSPIEL [Singh, K, Kaiser IJCAI ’09]
Theorem: (see paper for details) At every timestep $t$ it holds that
$F_t(\text{NAIVE}) = \Omega(1)\, F_t(\text{OPT}) - O(H(\Theta \mid \text{obs}))$
$F_t(\text{OPT})$: value of optimal policy OPT
$H(\Theta \mid \text{obs})$: uncertainty in model parameters
(Slide annotation: Application specific)
Need to trade off exploration (reducing $H(\Theta)$) and exploitation (maximizing $F(A)$)
Key idea: Replace $F_t$ by $G_t(\pi) = F_t(\pi) + \lambda\, I(\Theta \mid \pi)$
where $\lambda \ge 0$ is a learning rate parameter

Exploration-exploitation tradeoff
[Plot: expected number of survivors rescued (0 to 100) vs. number of timesteps (0 to 50), for $\lambda = 0$, $\lambda = 0.1$, $\lambda = 0.5$, $\lambda = 0.9$]
Intermediate values of $\lambda$ lead to best performance

Results: Search & Rescue
[Plot: expected number of survivors rescued (0 to 80) vs. number of timesteps (0 to 50), comparing NAIVE-Greedy, NAIVE-pSPIELOR, Greedy, and pSPIELOR]
Adaptive planning leads to significant performance improvement!

Example paths
[Figure: two maps with both axes in Distance (pixels), 0 to 400, each marking the starting location and initial survivor locations; left panel: Greedy algorithm, right panel: pSPIELOR]

Results: environmental monitoring
Monitor photosynthetically active regions under forest canopy
$F(A)$ = # "critical" regions covered
[Plot: % of critical locations observed (0 to 0.2) vs. number of timesteps (0 to 40), comparing NAIVE-pSPIEL and pSPIEL]
Adaptive planning leads to significant performance improvement!

Outline and Contributions
Path Constraints: pSPIELOR exploits $(r,\gamma)$-locality to near-optimally solve submodular orienteering
Dynamic environments: NAÏVE-pSPIEL implicitly trades off exploration and exploitation to obtain near-optimal adaptive policy
Multi-robot coordination
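
Multi-robot coordination
$\max_{\pi_1, \dots, \pi_k} F(\pi_1 \cup \pi_2 \cup \dots \cup \pi_k)$
s.t. $C(\pi_1) \le B;\ C(\pi_2) \le B;\ \dots;\ C(\pi_k) \le B$
Can use single-robot algorithm to plan joint policy
Exponential increase in complexity with #robots
[Figure: policies $\pi_1, \pi_2, \dots, \pi_k$ running from s to t]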
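
Sequential allocation
[Figure: policies $\pi_1, \pi_2, \dots, \pi_k$ running from s to t]
Use pSPIEL to find policy $P_1$ for the first robot
$\max_{\pi_1} F(\pi_1)$ s.t. $C(\pi_1) \le B$
Optimize for second robot ($P_2$) committing to nodes in $P_1$
$\max_{\pi_2} F(\pi_1 \cup \pi_2)$ s.t. $C(\pi_2) \le B$
Optimize for $k$-th robot ($P_k$) committing to nodes in $P_1, \dots, P_{k-1}$
$\max_{\pi_k} F(\pi_1 \cup \pi_2 \cup \dots \cup \pi_k)$ s.t. $C(\pi_k) \le B$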
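
Sequential allocation can be sketched as a greedy loop over robots, where each robot's plan maximizes the marginal gain given the nodes already committed. In this illustration the single-robot planner is replaced by a brute-force choice among a hypothetical, pre-enumerated list of budget-feasible paths; the coverage objective and node names are also assumptions.

```python
# Hypothetical toy data: cells covered by each node.
COVER = {
    "n1": {1, 2, 3}, "n2": {3, 4}, "n3": {4, 5},
    "n4": {6, 7, 8}, "n5": {1, 8},
}

# Assumed list of paths that already satisfy the budget C(path) <= B.
FEASIBLE_PATHS = [("n1", "n2"), ("n3", "n4"), ("n2", "n5"), ("n1", "n4")]


def F(A):
    """Assumed sensing quality: number of distinct cells covered by A."""
    covered = set()
    for n in A:
        covered |= COVER[n]
    return len(covered)


def sequential_allocation(num_robots, feasible_paths, F):
    """Plan robots one at a time, each maximizing the gain over nodes already committed."""
    committed = set()
    plans = []
    for _ in range(num_robots):
        base = F(committed)
        # Stand-in for the single-robot planner (e.g. pSPIEL in the slides).
        best = max(feasible_paths, key=lambda p: F(committed | set(p)) - base)
        plans.append(best)
        committed |= set(best)
    return plans, F(committed)


plans, total = sequential_allocation(2, FEASIBLE_PATHS, F)
print(plans, total)
```

The loop calls the planner once per robot, so its cost grows linearly with the number of robots rather than exponentially as in joint planning.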

Performance comparison
Greedy selection of nodes with no path cost constraint: arbitrarily poor
NAÏVE-pSPIELOR policy planning: $\mathrm{Reward}_{PS} \ge \mathrm{Reward}_{Opt} / \eta$, with $\eta = O(1/\gamma)$
Sequential allocation for multiple robots (greedy over policies): ??
Theorem: $\mathrm{Reward}_{SA} \ge \mathrm{Reward}_{Opt} / (1 + \eta)$
Works for any single-robot adaptive path planning algorithm!
Independent of number of robots used!
Key tool for analysis: Extension of submodular functions to adaptive policies
Multi-robot results
Diminishing returns as the number of robots increases
0 10 20 30 40 500
20
40
60
80
100
120
Number of timesteps
Aver
age
num
ber o
f sur
vivo
rs re
scue
d
1 Robot
2 Robots
3 Robots

Conclusions
New algorithm pSPIELOR for nonadaptive informative path planning for $(r,\gamma)$-local submodular functions
New algorithm NAÏVE-pSPIELOR for adaptive informative path planning using implicit exploration-exploitation analysis
Extensions to multiple robots by sequential allocation
Perform well on real-world problems