Nonmyopic Adaptive Informative Path Planning for Multiple Robots

Amarjeet Singh (UCLA), Andreas Krause (Caltech), William Kaiser (UCLA)
rsrg@caltech ..where theory and practice collide


Transcript of Nonmyopic Adaptive Informative Path Planning for Multiple Robots

Page 1: Nonmyopic Adaptive Informative Path Planning for Multiple Robots

Nonmyopic Adaptive Informative Path Planning for Multiple Robots

Amarjeet Singh (UCLA), Andreas Krause (Caltech), William Kaiser (UCLA)

rsrg@caltech ..where theory and practice collide

Page 2: Nonmyopic Adaptive Informative Path Planning for Multiple Robots

Monitoring rivers and lakes [IJCAI ’07]

Need to monitor large spatial phenomena: temperature, nutrient distribution, fluorescence, …

Use robotic sensors to cover large areas (NIMS, Kaiser et al., UCLA)

Can only make a limited number of measurements!

Predict at unobserved locations

[Figure: actual temperature (indicated by color) and predicted temperature; axes: location across lake, depth]

Where should we sense to get the most accurate predictions?

Page 3: Nonmyopic Adaptive Informative Path Planning for Multiple Robots

Urban Search & Rescue

How can we coordinate multiple search & rescue helicopters to quickly locate moving survivors?

[Figure labels: detection range, detected survivors]

Page 4: Nonmyopic Adaptive Informative Path Planning for Multiple Robots

Related work

Information gathering problems have been considered in:
Experimental design (Lindley ’56, Robbins ’52, …)
Value of information (Howard ’66)
Spatial statistics (Cressie ’91, …)
Machine learning (MacKay ’92, …)
Robotics (Sim & Roy ’05, …)
Sensor networks (Zhao et al. ’04, …)
Operations research (Nemhauser ’78, …)

Existing algorithms typically are either:
Heuristics: no guarantees! Can do arbitrarily badly.
Optimal solvers (mixed integer programming, POMDPs): very difficult to scale to bigger problems.

Want algorithms that have theoretical guarantees and scale to large problems!

Page 5: Nonmyopic Adaptive Informative Path Planning for Multiple Robots

How to quantify collected information?

Sensing quality function F(A) assigns a utility to a set A of locations, e.g., the expected reduction in MSE for predictions based on a GP model.

[Figure: two example selections with F(A1) = 4 and F(A2) = 10]

Want to pick sensing locations A ⊆ V to maximize F(A)

Page 6: Nonmyopic Adaptive Informative Path Planning for Multiple Robots

Selecting sensing locations

Given: finite set V of locations
Want: A* ⊆ V such that A* = argmax F(A) over all A ⊆ V with |A| ≤ k
Typically NP-hard!

Greedy algorithm:
Start with A = ∅
For i = 1 to k
    s* := argmax_s F(A ∪ {s})
    A := A ∪ {s*}

[Figure: greedy selections G1, G2, G3, G4]

How well does the greedy algorithm do?

Page 7: Nonmyopic Adaptive Informative Path Planning for Multiple Robots

Key observation: Diminishing returns

Selection A = {Y1, Y2}: adding a new observation Y’ will help a lot (large improvement).
Selection B = {Y1, …, Y5}: adding Y’ doesn’t help much (small improvement).

Submodularity: for A ⊆ B, F(A ∪ {Y’}) − F(A) ≥ F(B ∪ {Y’}) − F(B)

Many sensing quality functions are submodular*:
Information gain [Krause & Guestrin ’05]
Expected mean squared error [Das & Kempe ’08]
Detection time / likelihood [Krause et al. ’08]
…

*See paper for details
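The diminishing-returns inequality can be checked by brute force on a toy objective. The sketch below assumes a hypothetical coverage function (each sensor observes a fixed set of cells and F counts the distinct cells covered); the sensor names and coverage sets are invented for illustration and do not come from the talk.

```python
from itertools import combinations

# Hypothetical coverage sets: which grid cells each sensor location observes.
COVERS = {
    "Y1": {1, 2, 3},
    "Y2": {3, 4},
    "Y3": {4, 5, 6},
    "Y4": {1, 6},
}

def F(A):
    """Toy sensing quality: number of distinct cells covered by selection A."""
    covered = set()
    for s in A:
        covered |= COVERS[s]
    return len(covered)

def is_submodular(ground):
    """Check F(A + y) - F(A) >= F(B + y) - F(B) for all A subset of B, y not in B."""
    ground = list(ground)
    subsets = [set(c) for r in range(len(ground) + 1)
               for c in combinations(ground, r)]
    for B in subsets:
        for A in subsets:
            if not A <= B:
                continue
            for y in ground:
                if y in B:
                    continue
                if F(A | {y}) - F(A) < F(B | {y}) - F(B):
                    return False
    return True

small = {"Y1"}
large = {"Y1", "Y2", "Y4"}
gain_small = F(small | {"Y3"}) - F(small)   # gain of Y3 on top of the small selection
gain_large = F(large | {"Y3"}) - F(large)   # gain of Y3 on top of the larger selection
print(is_submodular(COVERS), gain_small, gain_large)  # True 3 1
```

The exhaustive check is exponential in the number of locations, so it is only meant as a sanity test on tiny instances.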

Page 8: Nonmyopic Adaptive Informative Path Planning for Multiple Robots

Selecting sensing locations

Given: finite set V of locations
Want: A* ⊆ V such that A* = argmax F(A) over all A ⊆ V with |A| ≤ k
Typically NP-hard!

Greedy algorithm:
Start with A = ∅
For i = 1 to k
    s* := argmax_s F(A ∪ {s})
    A := A ∪ {s*}

[Figure: greedy selections G1, G2, G3, G4]

Theorem [Nemhauser et al. ’78]: F(A_G) ≥ (1 − 1/e) F(OPT)

Greedy near-optimal!
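A minimal sketch of the greedy rule, compared against a brute-force optimum on a tiny instance. It assumes a hypothetical coverage objective (monotone and submodular, the setting in which the Nemhauser et al. bound applies); the location names and coverage sets are made up for illustration.

```python
from itertools import combinations
import math

# Hypothetical coverage sets for five candidate locations.
COVERS = {
    "s1": {1, 2, 3, 4},
    "s2": {1, 2, 5},
    "s3": {3, 4, 6},
    "s4": {5, 6, 7},
    "s5": {7, 8},
}

def F(A):
    """Toy sensing quality: number of distinct cells covered by selection A."""
    covered = set()
    for s in A:
        covered |= COVERS[s]
    return len(covered)

def greedy(V, k):
    """Repeatedly add the location that maximises F(A ∪ {s})."""
    A = []
    for _ in range(k):
        candidates = [s for s in V if s not in A]
        if not candidates:
            break
        best = max(candidates, key=lambda s: F(A + [s]))
        A.append(best)
    return A

def brute_force_opt(V, k):
    """Exact optimum over all size-k subsets (only feasible for tiny V)."""
    return max(F(c) for c in combinations(V, k))

V = list(COVERS)
A_G = greedy(V, 2)
print(A_G, F(A_G), brute_force_opt(V, 2))  # ['s1', 's4'] 7 7
```

On this instance greedy happens to match the optimum; the theorem only promises at least a (1 − 1/e) fraction of it.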

Page 9: Nonmyopic Adaptive Informative Path Planning for Multiple Robots

Challenges for informative path planning

Use robots to monitor the environment

Not just select the best k locations A for a given F(A). Need to:
… take into account the cost of traveling between locations
… cope with environments that change over time
… efficiently coordinate multiple agents

Want to scale to very large problems and have guarantees

Page 10: Nonmyopic Adaptive Informative Path Planning for Multiple Robots

Outline and Contributions

Path constraints

Dynamic environments

Multi-robot coordination

Page 11: Nonmyopic Adaptive Informative Path Planning for Multiple Robots

Informative path planning

So far: max F(A) s.t. |A| ≤ k

Most informative locations might be far apart! The robot needs to travel between selected locations.

[Figure: example graph over sensing locations (s1, s2, s3, s4, s5, s10, s11, …) with edge costs of 1 and 2]

Locations V are nodes in a graph
C(A) = cost of the cheapest path connecting nodes A

max F(A) s.t. C(A) ≤ B

Known as the submodular orienteering problem.

Greedy algorithm fails arbitrarily badly!

Best known algorithms (Chekuri & Pal ’05, Singh et al. ’07) are superpolynomial!

Can we exploit additional structure to get better algorithms?
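To make the path-cost constraint concrete, here is a small sketch that evaluates C(A) by brute force. It assumes, purely for illustration, that locations are points in the plane, that travel cost is Euclidean distance, and that the path runs from a fixed start "s" to a fixed end "t"; the coordinates and names are hypothetical.

```python
from itertools import permutations
from math import dist

# Hypothetical 2-D coordinates for a handful of sensing locations.
POS = {"s": (0, 0), "a": (1, 0), "b": (1, 1), "c": (4, 0), "t": (2, 0)}

def path_cost(A, start="s", end="t"):
    """C(A): length of the cheapest start-to-end path visiting every node in A.

    Tries every visiting order, so it is only usable for very small A.
    """
    best = float("inf")
    for order in permutations(A):
        stops = [start, *order, end]
        length = sum(dist(POS[u], POS[v]) for u, v in zip(stops, stops[1:]))
        best = min(best, length)
    return best

def feasible(A, budget):
    """Budget constraint C(A) <= B."""
    return path_cost(A) <= budget

print(path_cost([]))       # 2.0: go straight from s to t
print(path_cost(["a"]))    # 2.0: a lies on the way
print(path_cost(["c"]))    # 6.0: c forces a detour past t and back
```

With a budget of B = 4, the selection {a, b} is feasible while {c} alone is not, which is why picking locations by informativeness alone can waste the whole budget on travel.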

Page 12: Nonmyopic Adaptive Informative Path Planning for Multiple Robots

Additional structure: Locality

If A, B are observation sets close by, then F(A ∪ B) < F(A) + F(B)
If A, B are observation sets at least r apart, then F(A ∪ B) ≈ F(A) + F(B)

[Figure: observation sets A1, A2 with utility F(A) and B1, B2 with utility F(B), separated by distance r]

Call such an F (r, γ)-local
[we only assume F(A ∪ B) ≥ γ (F(A) + F(B))]

Sensors that are far apart are approximately independent
Holds for many objective functions (e.g., GPs with decaying covariance, etc.)
We showed locality is empirically valid!
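The locality condition can be illustrated on a toy objective. The sketch assumes a hypothetical 1-D coverage function in which a sensor at integer position x observes every cell within RADIUS of x; for this particular F, sets at least r = 2 * RADIUS + 1 apart never share a cell, so the ratio F(A ∪ B) / (F(A) + F(B)) equals 1 (that is, γ = 1). The positions are invented.

```python
RADIUS = 1  # hypothetical sensing radius on a 1-D line of integer positions

def F(A):
    """Toy sensing quality: number of cells within RADIUS of some sensor in A."""
    covered = set()
    for x in A:
        covered.update(range(x - RADIUS, x + RADIUS + 1))
    return len(covered)

def separation(A, B):
    """Smallest distance between a sensor in A and a sensor in B."""
    return min(abs(a - b) for a in A for b in B)

def locality_ratio(A, B):
    """F(A ∪ B) / (F(A) + F(B)); gamma lower-bounds this for far-apart sets."""
    return F(set(A) | set(B)) / (F(A) + F(B))

near = ({0, 1}, {2, 3})     # separation 1: covered cells overlap
far = ({0, 1}, {10, 11})    # separation 9: no shared cells

print(separation(*near), locality_ratio(*near))  # 1 0.75
print(separation(*far), locality_ratio(*far))    # 9 1.0
```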

Page 13: Nonmyopic Adaptive Informative Path Planning for Multiple Robots

The pSPIELOR Algorithm
based on sensor placement algorithm by Krause, Guestrin, Gupta, Kleinberg IPSN ’06

pSPIEL: Efficient nonmyopic algorithm (padded Sensor Placements at Informative and cost-Effective Locations)

Select starting and ending location s1 and sB
Decompose sensing region into small, well-separated clusters
Solve cardinality constrained problem per cluster (greedy)
Combine solutions using orienteering algorithm
Smooth resulting path

[Figure: clusters C1 to C4 between start S1 and end SB, with greedy selections g1,1 to g1,3, g2,1 to g2,3, g3,1 to g3,4, and g4,1 to g4,4 inside the clusters]

Page 14: Nonmyopic Adaptive Informative Path Planning for Multiple Robots

Guarantees for pSPIELOR
based on results by Krause, Guestrin, Gupta, Kleinberg IPSN ’06

Theorem: For (r, γ)-local submodular F, pSPIEL finds a path A with
submodular utility F(A) ≥ Ω(γ) OPT_F
path length C(A) ≤ O(r) OPT_C

*See paper for details

Page 15: Nonmyopic Adaptive Informative Path Planning for Multiple Robots

pSPIEL Results: Search & Rescue

Sensor Planning Research Challenge
Coordination of multiple mobile sensors to detect survivors of a major urban disaster
Buildings obstruct the camera’s field of view
F(A) = expected # of people detected

[Figure labels: detection range, rescue range, detected survivors, rescued survivors]

[Plot: expected number of survivors rescued (0 to 80) vs. number of timesteps (0 to 50) for pSPIEL, Greedy, and Heuristic (Chao et al.)]

pSPIEL outperforms existing algorithms for informative path planning

Page 16: Nonmyopic Adaptive Informative Path Planning for Multiple Robots

Outline and Contributions

Path constraints: pSPIELOR exploits (r, γ)-locality to near-optimally solve submodular orienteering

Dynamic environments

Multi-robot coordination

Page 17: Nonmyopic Adaptive Informative Path Planning for Multiple Robots

Dynamic environments

So far: max_A F(A) s.t. C(A) ≤ B
Assumes we know the sensing quality F in advance
Plan a fixed (nonadaptive) path / placement A

In practice:
Model unknown; need to learn as we go
Environment changes dynamically

Active learning: find an adaptive policy that modifies the solution based on observations

Gigantic POMDP (intractable)

Can we efficiently find a good solution?

Page 18: Nonmyopic Adaptive Informative Path Planning for Multiple Robots

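Sequential sensing

[Figure: sensing policy π drawn as a decision tree over observations (X5 = ?, then X3 = ? or X2 = ?, then X7 = ?, X12 = ?, X23 = ?), with branches such as X5 = 17, X5 = 21, X3 = 16, X7 = 19. Example leaf utilities: F(X5=17, X3=16, X7=19) = 3.4, F(…) = 2.1, F(…) = 2.4]

F(π) = 3.1: expected utility over outcome of observations

Want to pick sensing policy π to maximize F(π)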

Page 19: Nonmyopic Adaptive Informative Path Planning for Multiple Robots

NAÏVE Algorithm [Singh, K, Kaiser, IJCAI ’09]

At each timestep t:
Plan nonadaptive solution A* = argmax_A F_t(A) (efficient, e.g., using pSPIEL)
Execute first step of nonadaptive solution
Receive observations obs
Update sensing quality F_{t+1}(A) = F_t(A | obs) ∀ A

Defines a Nonmyopic Adaptive informatIVE policy NAIVE

How well does this policy compare to the optimal policy?
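The replanning loop above can be sketched as follows. This is an illustrative toy, not the authors' implementation: the planner is a stand-in that simply ranks locations by current belief (the talk plugs in pSPIEL here), and the belief update rule, the prior, and the ground truth are all hypothetical.

```python
def plan_nonadaptive(belief, horizon):
    """Stand-in nonadaptive planner: the `horizon` locations with highest belief."""
    return sorted(belief, key=lambda s: belief[s], reverse=True)[:horizon]

def update_belief(belief, loc, found):
    """Hypothetical update: drop the visited location; a sighting doubles the
    belief at neighbouring locations (capped at 1), a miss halves it."""
    new = {s: p for s, p in belief.items() if s != loc}
    factor = 2.0 if found else 0.5
    for nb in (loc - 1, loc + 1):
        if nb in new:
            new[nb] = min(1.0, new[nb] * factor)
    return new

def naive_policy(belief, truth, steps, horizon=3):
    """Plan, execute only the first step, observe, update, and replan."""
    visited, found = [], 0
    for _ in range(steps):
        plan = plan_nonadaptive(belief, horizon)   # A* = argmax F_t(A)
        if not plan:
            break
        loc = plan[0]                              # execute first step only
        obs = truth[loc]                           # receive observation
        visited.append(loc)
        found += int(obs)
        belief = update_belief(belief, loc, obs)   # F_{t+1}(A) = F_t(A | obs)
    return visited, found

def fixed_plan(belief, truth, steps):
    """Nonadaptive baseline: commit to the initial plan and never replan."""
    plan = plan_nonadaptive(belief, steps)
    return plan, sum(int(truth[s]) for s in plan)

# Invented prior (probability of a survivor per location) and ground truth.
PRIOR = {0: 0.3, 1: 0.1, 2: 0.25, 3: 0.12, 4: 0.35, 5: 0.2}
TRUTH = {0: False, 1: False, 2: False, 3: True, 4: True, 5: True}

print(naive_policy(PRIOR, TRUTH, 3))  # ([4, 5, 0], 2)
print(fixed_plan(PRIOR, TRUTH, 3))    # ([4, 0, 2], 1)
```

In this toy run the adaptive loop redirects to location 5 after the sighting at location 4, while the fixed plan keeps following the prior.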

Page 20: Nonmyopic Adaptive Informative Path Planning for Multiple Robots

Guarantees for NAÏVE-pSPIEL [Singh, K, Kaiser IJCAI ’09]

Theorem (see paper for details): At every timestep t it holds that
F_t(NAIVE) = Ω(1) F_t(OPT) − O(H(Θ | obs))

F_t(OPT): value of optimal policy OPT
H(Θ | obs): uncertainty in model parameters (application specific)

Need to trade off exploration (reducing H(Θ)) and exploitation (maximizing F(A))

Key idea: Replace F_t by G_t(π) = F_t(π) + λ I(Θ | π)
where λ ≥ 0 is a learning rate parameter
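A rough illustration of how the weight λ in G_t = F_t + λ I shifts the preferred plan from exploitation toward exploration. The three candidate plans and their (utility, information gain) values are entirely hypothetical; in the actual method both terms would be computed from the model.

```python
# Hypothetical candidates: name -> (expected utility F_t, information gain I about Theta).
CANDIDATES = {
    "exploit": (5.0, 0.0),
    "balanced": (4.0, 4.0),
    "explore": (1.0, 6.0),
}

def G(plan, lam):
    """Combined objective G_t = F_t + lam * I for one candidate plan."""
    utility, info_gain = CANDIDATES[plan]
    return utility + lam * info_gain

def best_plan(lam):
    """Candidate maximising the combined objective for a given lam >= 0."""
    return max(CANDIDATES, key=lambda p: G(p, lam))

for lam in (0.0, 0.5, 3.0):
    print(lam, best_plan(lam))
# 0.0 exploit
# 0.5 balanced
# 3.0 explore
```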

Page 21: Nonmyopic Adaptive Informative Path Planning for Multiple Robots

Exploration-exploitation tradeoff

[Plot: expected number of survivors rescued (0 to 100) vs. number of timesteps (0 to 50) for λ = 0, λ = 0.1, λ = 0.5, and λ = 0.9]

Intermediate values of λ lead to best performance

Page 22: Nonmyopic Adaptive Informative Path Planning for Multiple Robots

Results: Search & Rescue

[Plot: expected number of survivors rescued (0 to 80) vs. number of timesteps (0 to 50) for NAIVE-Greedy, NAIVE-pSPIELOR, Greedy, and pSPIELOR]

Adaptive planning leads to significant performance improvement!

Page 23: Nonmyopic Adaptive Informative Path Planning for Multiple Robots

Example paths

[Figure: two panels, one for the Greedy algorithm and one for pSPIELOR, each marking the starting location and the initial survivor locations; both axes show distance in pixels (0 to 400)]

Page 24: Nonmyopic Adaptive Informative Path Planning for Multiple Robots

Results: environmental monitoring

Monitor photosynthetically active regions under forest canopy
F(A) = # “critical” regions covered

[Plot: % of critical locations observed (0 to 0.2) vs. number of timesteps (0 to 40) for NAIVE-pSPIEL and pSPIEL]

Adaptive planning leads to significant performance improvement!

Page 25: Nonmyopic Adaptive Informative Path Planning for Multiple Robots

Outline and Contributions

Path constraints: pSPIELOR exploits (r, γ)-locality to near-optimally solve submodular orienteering

Dynamic environments: NAÏVE-pSPIEL implicitly trades off exploration and exploitation to obtain a near-optimal adaptive policy

Multi-robot coordination

Page 26: Nonmyopic Adaptive Informative Path Planning for Multiple Robots

Multi-robot coordination

Can use single-robot algorithm to plan joint policy
Exponential increase in complexity with # robots

max_{π1, …, πk} F(π1 ∪ π2 ∪ … ∪ πk)
s.t. C(π1) ≤ B; C(π2) ≤ B; …; C(πk) ≤ B

[Figure: policies π1, π2, …, πk running from s to t]

Page 27: Nonmyopic Adaptive Informative Path Planning for Multiple Robots

Sequential allocation

[Figure: policies π1, π2, …, πk running from s to t]

Use pSPIEL to find policy P1 for the first robot:
max_{π1} F(π1) s.t. C(π1) ≤ B

Optimize for second robot (P2), committing to nodes in P1:
max_{π2} F(π1 ∪ π2) s.t. C(π2) ≤ B

Optimize for k-th robot (Pk), committing to nodes in P1, …, Pk-1:
max_{πk} F(π1 ∪ π2 ∪ … ∪ πk) s.t. C(πk) ≤ B
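Sequential allocation can be sketched with any single-robot planner plugged in. The version below is a simplified stand-in under stated assumptions: the objective is a hypothetical coverage function, the "path cost" of a robot is just the number of nodes it visits, and the single-robot planner is plain greedy rather than pSPIEL.

```python
# Hypothetical coverage sets per node.
COVERS = {
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {5, 6},
    "d": {1, 2},
    "e": {6, 7, 8},
}

def F(nodes):
    """Toy joint utility: number of distinct cells covered by all given nodes."""
    covered = set()
    for s in nodes:
        covered |= COVERS[s]
    return len(covered)

def plan_single_robot(committed, budget):
    """Stand-in single-robot planner: greedily add up to `budget` nodes that
    maximise the joint utility with nodes already committed by earlier robots."""
    path = []
    for _ in range(budget):
        candidates = [s for s in COVERS if s not in committed and s not in path]
        if not candidates:
            break
        best = max(candidates, key=lambda s: F(committed + path + [s]))
        path.append(best)
    return path

def sequential_allocation(num_robots, budget):
    """Plan robots one after another, each committing to earlier robots' nodes."""
    committed, paths = [], []
    for _ in range(num_robots):
        path = plan_single_robot(committed, budget)
        paths.append(path)
        committed = committed + path
    return paths

paths = sequential_allocation(num_robots=2, budget=2)
print(paths)                                       # [['a', 'e'], ['b', 'c']]
print(F(paths[0]), F(paths[0] + paths[1]))         # 6 8
```

The second robot adds only 2 covered cells on top of the first robot's 6, a small instance of the diminishing returns the objective exhibits.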

Page 28: Nonmyopic Adaptive Informative Path Planning for Multiple Robots

Performance comparison

Greedy selection of nodes with no path cost constraint: arbitrarily poor
NAÏVE-pSPIELOR policy planning: RewardPS ≥ RewardOpt / η, with η = O(1/γ)
Sequential allocation for multiple robots (greedy over policies): ??

Theorem: RewardSA ≥ RewardOpt / (1 + η)

Works for any single-robot adaptive path planning algorithm!
Independent of the number of robots used!
Key tool for analysis: extension of submodular functions to adaptive policies

Page 29: Nonmyopic Adaptive Informative Path Planning for Multiple Robots

Multi-robot results

[Plot: average number of survivors rescued (0 to 120) vs. number of timesteps (0 to 50) for 1 robot, 2 robots, and 3 robots]

Diminishing returns as the number of robots increases

Page 30: Nonmyopic Adaptive Informative Path Planning for Multiple Robots

Conclusions

New algorithm, pSPIELOR, for nonadaptive informative path planning for (r, γ)-local submodular functions
New algorithm, NAÏVE-pSPIELOR, for adaptive informative path planning using implicit exploration-exploitation analysis
Extensions to multiple robots by sequential allocation
Perform well on real-world problems