Nonmyopic Adaptive Informative Path Planning for Multiple Robots

Amarjeet Singh (UCLA), Andreas Krause (Caltech), William Kaiser (UCLA)

Monitoring rivers and lakes [IJCAI ’07]

Need to monitor large spatial phenomena: temperature, nutrient distribution, fluorescence, …

Use robotic sensors to cover large areas.

Can only make a limited number of measurements! Predict at unobserved locations.

[Figure: NIMS robotic sensor (Kaiser et al., UCLA). Panels: actual temperature (indicated by color) and predicted temperature; axes: location across lake, depth.]

Where should we sense to get most accurate predictions?

Urban Search & Rescue

How can we coordinate multiple search & rescue helicopters to quickly locate moving survivors?

[Figure: search & rescue scenario; legend: detection range, detected survivors.]

Related work

Information gathering problems considered in

Experimental design (Lindley ’56, Robbins ’52, …), Value of information (Howard ’66), Spatial statistics (Cressie ’91, …), Machine Learning (MacKay ’92, …), Robotics (Sim & Roy ’05, …), Sensor Networks (Zhao et al ’04, …), Operations Research (Nemhauser ’78, …)

Existing algorithms are typically either
Heuristics: No guarantees! Can do arbitrarily badly.
Find optimal solutions (Mixed integer programming, POMDPs): Very difficult to scale to bigger problems.

Want algorithms that have theoretical guarantees and scale to large problems!

How to quantify collected information?

Sensing quality function F(A) assigns utility to set A of locations, e.g., expected reduction in MSE for predictions based on a GP model.

[Figure: two example placements with F(A1) = 4 and F(A2) = 10.]

Want to pick sensing locations A ⊆ V to maximize F(A)

Selecting sensing locations

Given: finite set V of locations
Want: A* ⊆ V such that A* = argmax F(A) subject to |A| ≤ k

Typically NP-hard!

Greedy algorithm:
Start with A = ∅
For i = 1 to k
  s* := argmax_s F(A ∪ {s})
  A := A ∪ {s*}

[Figure: greedy selections G1, G2, G3, G4.]

How well does the greedy algorithm do?
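The greedy rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the names `greedy_select`, `COVERAGE`, and `coverage_value` are hypothetical, and the toy objective (number of distinct cells covered) merely stands in for a real sensing quality function F.

```python
def greedy_select(ground_set, F, k):
    """Pick k elements, each time adding the s that maximizes F(A ∪ {s})."""
    selected = []
    remaining = list(ground_set)
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda s: F(selected + [s]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy objective (an assumption for illustration only):
# each location covers a set of grid cells, F(A) counts distinct covered cells.
COVERAGE = {
    "s1": {1, 2, 3, 4},
    "s2": {4, 5},
    "s3": {5, 6},
    "s4": {1, 6, 7, 8, 9},
}


def coverage_value(A):
    covered = set()
    for s in A:
        covered |= COVERAGE[s]
    return len(covered)


picked = greedy_select(COVERAGE.keys(), coverage_value, k=2)
print(picked, coverage_value(picked))
```

With k = 2 the sketch first takes the location covering the most cells and then the one that adds the most new cells on top of it.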

Key observation: Diminishing returns

Selection A = {Y1, Y2}: adding a new observation Y’ will help a lot (large improvement).
Selection B = {Y1, …, Y5}: adding Y’ doesn’t help much (small improvement).

Submodularity: For A ⊆ B, F(A ∪ {Y’}) − F(A) ≥ F(B ∪ {Y’}) − F(B)

Many sensing quality functions are submodular*:
Information gain [Krause & Guestrin ’05]
Expected Mean Squared Error [Das & Kempe ’08]
Detection time / likelihood [Krause et al. ’08]
…

*See paper for details
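To make the diminishing-returns inequality concrete, the following sketch checks it on a small set-coverage objective. The cell sets in `COVERAGE` are invented for illustration; only the inequality itself comes from the slide.

```python
from itertools import combinations

# Hypothetical toy model (assumption): each observation covers a set of cells.
COVERAGE = {
    "Y1": {1, 2}, "Y2": {2, 3}, "Y3": {3, 4},
    "Y4": {4, 5}, "Y5": {5, 6}, "Y'": {1, 3, 5, 7},
}


def F(A):
    """Number of distinct cells covered by the observations in A."""
    return len(set().union(*(COVERAGE[y] for y in A)))


def marginal_gain(A, y):
    """F(A ∪ {y}) − F(A)."""
    return F(set(A) | {y}) - F(A)


A = {"Y1", "Y2"}
B = {"Y1", "Y2", "Y3", "Y4", "Y5"}
gain_A = marginal_gain(A, "Y'")  # gain of Y' on top of the small selection
gain_B = marginal_gain(B, "Y'")  # gain of Y' on top of the large selection


def is_submodular(ground):
    """Brute-force check of the inequality over all pairs A ⊆ B."""
    items = sorted(ground)
    subsets = [set(c) for n in range(len(items) + 1)
               for c in combinations(items, n)]
    return all(
        marginal_gain(small, y) >= marginal_gain(big, y)
        for small in subsets
        for big in subsets if small <= big
        for y in set(items) - big
    )


print(gain_A, gain_B, is_submodular(COVERAGE))
```

The brute-force check enumerates every pair of nested subsets, so it is only practical for tiny ground sets like this one.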

Selecting sensing locations (continued)

Theorem [Nemhauser et al. ’78]: for monotone submodular F, the greedy solution A_G satisfies F(A_G) ≥ (1 − 1/e) F(OPT)

Greedy near-optimal!
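As a sanity check of the (1 − 1/e) guarantee, the sketch below compares greedy with a brute-force optimum on a tiny coverage instance where greedy is not optimal. The instance (`left`, `middle`, `right`) is a made-up example, not taken from the talk.

```python
import math
from itertools import combinations

# Hypothetical instance (assumption): the middle location looks best alone,
# but the two outer locations together cover more.
COVERAGE = {"left": {1, 2, 3}, "middle": {2, 3, 4, 5}, "right": {4, 5, 6}}


def F(A):
    """Monotone submodular coverage objective."""
    return len(set().union(*(COVERAGE[s] for s in A)))


def greedy(k):
    A = []
    for _ in range(k):
        candidates = [s for s in COVERAGE if s not in A]
        A.append(max(candidates, key=lambda s: F(A + [s])))
    return A


def brute_force(k):
    """Exhaustive search over all size-k subsets (only viable for tiny V)."""
    return max(combinations(COVERAGE, k), key=F)


k = 2
greedy_value = F(greedy(k))
optimal_value = F(brute_force(k))
bound = (1 - 1 / math.e) * optimal_value
print(greedy_value, optimal_value, round(bound, 2))
```

Here greedy reaches 5 of the 6 coverable cells, which is suboptimal but comfortably above the guaranteed fraction.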

Challenges for informative path planning

Use robots to monitor environment.

Not just select best k locations A for given F(A). Need to
… take into account cost of traveling between locations
… cope with environments that change over time
… efficiently coordinate multiple agents

Want to scale to very large problems and have guarantees

Outline and Contributions

Path Constraints
Dynamic environments
Multi-robot coordination

Informative path planning

So far: max F(A) s.t. |A| ≤ k

Most informative locations might be far apart! Robot needs to travel between selected locations.

Locations V are nodes in a graph.
C(A) = cost of cheapest path connecting nodes A

max F(A) s.t. C(A) ≤ B

[Figure: graph over locations s1, s2, s3, s4, s5, …, s10, s11 with edge costs 1 and 2.]

Known as submodular orienteering problem. Greedy algorithm fails arbitrarily badly!

Best known algorithms (Chekuri & Pal ’05, Singh et al ’07) are superpolynomial!

Can we exploit additional structure to get better algorithms?
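The budgeted problem max F(A) s.t. C(A) ≤ B can be illustrated by brute force on a handful of nodes. Everything in this sketch is an assumption made for illustration: the coordinates in `POINTS`, the Euclidean travel cost, the fixed `start` and `end` nodes, and the coverage sets in `SENSED`. It is exponential in the number of nodes and is not one of the algorithms cited above.

```python
import math
from itertools import combinations, permutations

# Hypothetical layout (assumption): planar coordinates, Euclidean travel cost.
POINTS = {"start": (0, 0), "a": (1, 0), "b": (2, 0), "c": (0, 3), "end": (3, 0)}
# Hypothetical sensing model (assumption): cells observed at each location.
SENSED = {"a": {1, 2}, "b": {2, 3}, "c": {4, 5, 6}}


def dist(u, v):
    (x1, y1), (x2, y2) = POINTS[u], POINTS[v]
    return math.hypot(x1 - x2, y1 - y2)


def F(A):
    return len(set().union(*(SENSED[s] for s in A)))


def path_cost(A):
    """C(A): cheapest start-to-end path visiting every node in A."""
    return min(
        sum(dist(u, v) for u, v in zip(("start",) + order, order + ("end",)))
        for order in permutations(A)
    )


def best_feasible_set(budget):
    """Enumerate all subsets and keep the best one with C(A) <= budget."""
    candidates = list(SENSED)
    feasible = [
        set(A)
        for n in range(len(candidates) + 1)
        for A in combinations(candidates, n)
        if path_cost(A) <= budget
    ]
    return max(feasible, key=F)


print(best_feasible_set(4.0), best_feasible_set(8.0))
```

With a tight budget the far-away but informative node `c` is unreachable, so the best set stays on the direct route; a larger budget makes the detour worthwhile.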

Additional structure: Locality

If A, B are observation sets close by, then F(A ∪ B) < F(A) + F(B)
If A, B are observation sets at least r apart, then F(A ∪ B) ≈ F(A) + F(B)

Sensors that are far apart are approximately independent.
Holds for many objective functions (e.g., GPs with decaying covariance, etc.)
We showed locality is empirically valid!

[Figure: observation sets A1, A2 (utility F(A)) and B1, B2 (utility F(B)), separated by distance r.]

Call such an F (r,γ)-local
[we only assume F(A ∪ B) ≥ γ (F(A) + F(B))]
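A quick way to see locality is a coverage objective with a finite sensing radius: sets that are far enough apart cannot overlap, so their utilities add up. The one-dimensional model below (`covered_cells`, radius 2) is a hypothetical stand-in; for it, additivity is exact once the sets are more than twice the radius apart, which corresponds to γ = 1.

```python
def covered_cells(positions, radius=2):
    """Cells within `radius` of any sensor position (hypothetical 1-D model)."""
    return {c for p in positions for c in range(p - radius, p + radius + 1)}


def F(positions):
    return len(covered_cells(positions))


A = [0, 3]
B_near = [2, 5]    # overlaps with A
B_far = [20, 23]   # well separated from A

near_joint = F(A + B_near)
far_joint = F(A + B_far)
print(near_joint, F(A) + F(B_near))  # strictly subadditive when close
print(far_joint, F(A) + F(B_far))    # additive when far apart
```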

The pSPIELOR Algorithm
(based on sensor placement algorithm by Krause, Guestrin, Gupta, Kleinberg, IPSN ’06)

pSPIEL: Efficient nonmyopic algorithm (padded Sensor Placements at Informative and cost-Effective Locations)

Select starting and ending location s1 and sB
Decompose sensing region into small, well-separated clusters
Solve cardinality constrained problem per cluster (greedy)
Combine solutions using orienteering algorithm
Smooth resulting path

[Figure: clusters C1, …, C4 between start S1 and end SB, with per-cluster greedy selections g1,1 … g4,4.]

Guarantees for pSPIELOR
(based on results by Krause, Guestrin, Gupta, Kleinberg, IPSN ’06)

Theorem: For (r,γ)-local submodular F, pSPIEL finds a path A with
submodular utility F(A) ≥ Ω(γ) OPT_F
path length C(A) ≤ O(r) OPT_C

*See paper for details

pSPIEL Results: Search & Rescue

Sensor Planning Research Challenge
Coordination of multiple mobile sensors to detect survivors of major urban disaster
Buildings obstruct viewfield of camera
F(A) = Expected # of people detected

[Figure: simulation map; legend: detection range, rescue range, detected survivors, rescued survivors.]

[Figure: expected number of survivors rescued (0 to 80) vs. number of timesteps (0 to 50) for pSPIEL, Greedy, and Heuristic (Chao et al).]

pSPIEL outperforms existing algorithms for informative path planning

Path Constraints: pSPIELOR exploits (r,γ)-locality to near-optimally solve submodular orienteering

Dynamic environments

Dynamic environments

So far: max_A F(A) s.t. C(A) ≤ B
Assumes we know the sensing quality F in advance
Plan a fixed (nonadaptive) path / placement A

In practice:
Model unknown; need to learn as we go
Environment changes dynamically

Active learning: Find adaptive policy that modifies solution based on observations

Gigantic POMDP (intractable)

Can we efficiently find a good solution?

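Sequential sensing

[Figure: sensing policy π drawn as a decision tree. The policy first observes X5; depending on the outcome (X5 = 17 or X5 = 21) it continues with further observations (X3, X2, X7, X12, X23). Leaves carry the utility of the observations made, e.g., F(X5=17, X3=16, X7=19) = 3.4, F(…) = 2.1, F(…) = 2.4.]

F(π) = 3.1: expected utility over outcome of observations

Want to pick sensing policy π to maximize F(π)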

NAÏVE Algorithm [Singh, K, Kaiser, IJCAI ’09]

At each timestep t:
Plan nonadaptive solution A* = argmax F_t(A) (efficient, e.g., using pSPIEL)
Execute first step of nonadaptive solution
Receive observations obs
Update sensing quality F_{t+1}(A) = F_t(A | obs) ∀ A

Defines a Nonmyopic Adaptive informatIVE policy NAIVE

How well does this policy compare to the optimal policy?
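The plan / execute-first-step / observe / update loop can be sketched as follows. This is a toy illustration under loudly stated assumptions: the planner `plan_nonadaptive` simply ranks cells by current belief (a stand-in for pSPIEL, with no path costs), and `update_belief` is an invented rule in which a detection doubles the belief of neighboring cells and a miss halves it. Neither is taken from the paper.

```python
def plan_nonadaptive(belief, visited, horizon):
    """Stand-in planner (assumption): top-`horizon` unvisited cells by belief."""
    unvisited = [c for c in belief if c not in visited]
    return sorted(unvisited, key=lambda c: -belief[c])[:horizon]


def update_belief(belief, cell, found):
    """Hypothetical update: survivors cluster, so a detection raises the
    belief of adjacent cells and a miss lowers it."""
    new = dict(belief)
    new[cell] = 0.0  # nothing left to learn at an observed cell
    for nb in (cell - 1, cell + 1):
        if nb in new and new[nb] > 0.0:
            new[nb] = min(1.0, new[nb] * 2.0) if found else new[nb] * 0.5
    return new


def run_naive(truth, belief, steps, horizon=3):
    visited, found_total = [], 0
    for _ in range(steps):
        plan = plan_nonadaptive(belief, visited, horizon)  # plan with F_t
        if not plan:
            break
        cell = plan[0]                                     # execute first step
        found = truth[cell]                                # receive observation
        visited.append(cell)
        found_total += int(found)
        belief = update_belief(belief, cell, found)        # F_{t+1} = F_t(. | obs)
    return visited, found_total


truth = {0: False, 1: False, 2: True, 3: True, 4: False, 5: False}
prior = {0: 0.3, 1: 0.2, 2: 0.4, 3: 0.25, 4: 0.1, 5: 0.35}

adaptive_path, adaptive_found = run_naive(truth, prior, steps=3)
fixed_path = plan_nonadaptive(prior, [], 3)  # plan once, never replan
fixed_found = sum(truth[c] for c in fixed_path)
print(adaptive_path, adaptive_found, fixed_path, fixed_found)
```

In this toy run the replanning loop reacts to the first detection and visits the neighboring cell next, whereas the fixed plan keeps following the stale prior.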

Guarantees for NAÏVE-pSPIEL [Singh, K, Kaiser, IJCAI ’09]

Theorem (see paper for details): At every timestep t it holds that
F_t(NAIVE) = Ω(1) F_t(OPT) − O(H(Θ | obs))

F_t(OPT): value of optimal policy OPT
H(Θ | obs): uncertainty in model parameters; application specific

Need to trade off exploration (reducing H(Θ)) and exploitation (maximizing F(A))

Key idea: Replace F_t by G_t(π) = F_t(π) + λ I(Θ | π)

where λ ≥ 0 is a learning rate parameter
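A small numerical sketch of the modified objective: the utility term is augmented by λ times the information gained about the model parameters. The setup is an assumption for illustration only: Θ takes two values with a uniform prior, each candidate action yields one binary observation, and the information term is computed as the mutual information between Θ and that observation. The candidate names and numbers are invented.

```python
import math


def entropy(p):
    """Shannon entropy in bits."""
    return -sum(q * math.log2(q) for q in p if q > 0)


def information_gain(prior, likelihoods):
    """H(Θ) − E_obs[H(Θ | obs)] for a binary observation, where
    likelihoods[i] = P(obs = 1 | Θ = θ_i)."""
    gain = entropy(prior)
    for outcome in (0, 1):
        joint = [p * (l if outcome else 1 - l)
                 for p, l in zip(prior, likelihoods)]
        p_outcome = sum(joint)
        if p_outcome > 0:
            posterior = [j / p_outcome for j in joint]
            gain -= p_outcome * entropy(posterior)
    return gain


def best_action(candidates, prior, lam):
    """Maximize G = F + lam * information gain over the candidates."""
    def G(name):
        utility, likelihoods = candidates[name]
        return utility + lam * information_gain(prior, likelihoods)
    return max(candidates, key=G)


prior = [0.5, 0.5]
# Hypothetical candidates: (expected utility F, P(obs = 1 | θ) for each θ).
candidates = {
    "exploit": (1.0, [0.5, 0.5]),  # high utility, observation tells nothing
    "explore": (0.6, [1.0, 0.0]),  # lower utility, observation reveals Θ
}

print(best_action(candidates, prior, lam=0.0))
print(best_action(candidates, prior, lam=0.5))
```

With λ = 0 the objective reduces to pure exploitation; a positive λ can tip the choice toward the action that resolves the model uncertainty.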

Exploration-exploitation tradeoff

[Figure: expected number of survivors rescued (0 to 100) vs. number of timesteps (0 to 50) for λ = 0, 0.1, 0.5, and 0.9.]

Intermediate values of λ lead to best performance

Results: Search & Rescue

[Figure: expected number of survivors rescued (0 to 80) vs. number of timesteps (0 to 50) for NAIVE-Greedy, NAIVE-pSPIELOR, Greedy, and pSPIELOR.]

Adaptive planning leads to significant performance improvement!

Example paths

[Figure: two panels, "Greedy algorithm" and "pSPIELOR", each showing a path over a 400 × 400 pixel map (both axes: distance in pixels) with the starting location and initial survivor locations marked.]

Results: environmental monitoring

Monitor photosynthetically active regions under forest canopy
F(A) = # “critical” regions covered

[Figure: % of critical locations observed (0 to 0.2) vs. number of timesteps (0 to 40) for NAIVE-pSPIEL and pSPIEL.]

Adaptive planning leads to significant performance improvement!

Path Constraints: pSPIELOR exploits (r,γ)-locality to near-optimally solve submodular orienteering

Dynamic environments: NAÏVE-pSPIEL implicitly trades off exploration and exploitation to obtain near-optimal adaptive policy

Multi-robot coordination

Can use single-robot algorithm to plan joint policy
Exponential increase in complexity with # robots

max_{π1, …, πk} F(π1 ∪ π2 ∪ … ∪ πk)
s.t. C(π1) ≤ B; C(π2) ≤ B; … ; C(πk) ≤ B

[Figure: k paths π1, π2, …, πk from node s to node t.]

Sequential allocation

Use pSPIEL to find policy P1 for the first robot
max_{π1} F(π1) s.t. C(π1) ≤ B

Optimize for second robot (P2) committing to nodes in P1
max_{π2} F(π1 ∪ π2) s.t. C(π2) ≤ B

Optimize for k-th robot (Pk) committing to nodes in P1, …, Pk-1
max_{πk} F(π1 ∪ π2 ∪ … ∪ πk) s.t. C(πk) ≤ B

[Figure: paths π1, π2, …, πk from node s to node t.]
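Sequential allocation can be sketched as a loop that hands each robot the residual objective left over by the robots planned before it. The sketch below is illustrative only: `plan_single_robot` is a brute-force stand-in for a single-robot planner such as pSPIEL, `CANDIDATE_PATHS` is a hypothetical list of s-to-t paths assumed to satisfy the budget B, and `COVER` is an invented coverage model.

```python
# Hypothetical coverage model (assumption): cells observed at each node.
COVER = {"s": set(), "t": set(), "a": {1, 2, 3}, "b": {3, 4},
         "c": {5, 6}, "d": {6, 7, 8}}
# Hypothetical candidate paths, all assumed to satisfy C(path) <= B.
CANDIDATE_PATHS = [("s", "a", "b", "t"), ("s", "c", "d", "t"), ("s", "a", "c", "t")]


def F(nodes):
    return len(set().union(*(COVER[n] for n in nodes)))


def plan_single_robot(objective):
    """Stand-in single-robot planner: best candidate path for `objective`."""
    return max(CANDIDATE_PATHS, key=objective)


def sequential_allocation(num_robots):
    committed = set()  # nodes already claimed by earlier robots
    paths = []
    for _ in range(num_robots):
        # Residual gain of a path given what earlier robots already observe.
        def residual(path, base=frozenset(committed)):
            return F(base | set(path)) - F(base)
        path = plan_single_robot(residual)
        paths.append(path)
        committed |= set(path)
    return paths


paths = sequential_allocation(2)
joint_value = F(set().union(*map(set, paths)))
print(paths, joint_value)
```

In this instance the first robot takes the individually best path and the second robot picks the path that adds the most on top of it; the joint value of 7 falls slightly short of the best pair of paths (8), consistent with sequential allocation being an approximation rather than an exact joint optimization.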

Performance comparison

Greedy selection of nodes with no path cost constraint: arbitrarily poor

NAÏVE-pSPIELOR policy planning: Reward_PS ≥ Reward_Opt / η, with η = O(1/γ)

Sequential allocation for multiple robots (greedy over policies):
Theorem: Reward_SA ≥ Reward_Opt / (1 + η)

Works for any single-robot adaptive path planning algorithm!
Independent of number of robots used!
Key tool for analysis: Extension of submodular functions to adaptive policies

Multi-robot results

[Figure: average number of survivors rescued (0 to 120) vs. number of timesteps (0 to 50) for 1, 2, and 3 robots.]

Diminishing returns as the number of robots increases

Conclusions

New algorithm, pSPIELOR, for nonadaptive informative path planning for (r,γ)-local submodular functions
New algorithm, NAÏVE-pSPIELOR, for adaptive informative path planning using implicit exploration-exploitation analysis
Extensions to multiple robots by sequential allocation
Perform well on real-world problems