Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains

Transcript
Page 1: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains

Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains

Li Li and Nilufer Onder
Department of Computer Science
Michigan Technological University

(Presented by: Li Li)
AAAI-08, Chicago, July 16, 2008

Page 2: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains

Outline

Example domain
Two usages of concurrent actions
AO* and CPOAO* algorithms
Heuristics used in CPOAO*
Experiment results
Conclusion and future work

Page 3: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains

A simple Mars rover domain

Locations A, B, C, and D on Mars. [Map figure showing the four locations.]

Page 4: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains

Main features: aspects of complex domains

Deadlines, limited resources
Failures
Oversubscription
Concurrency

Two types of parallel actions:
Different goals (“all finish”)
Redundant (“early finish”)

Aborting actions:
When they succeed
When they fail

Page 5: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains

The actions

Action         Success probability   Description
Move(L1, L2)   100%                  Move the rover from location L1 to location L2
Sample(L)      70%                   Collect a soil sample at location L
Camera(L)      60%                   Take a picture at location L

Page 6: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains

Problem 1

Initial state:
The rover is at location A
No other rewards have been achieved

Rewards:
r1 = 10: Get back to location A
r2 = 2: Take a picture at location B
r3 = 1: Collect a soil sample at location B
r4 = 3: Take a picture at location C

Page 7: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains

Problem 1

Time limit:
The rover is only allowed to operate for 3 time units

Actions:
Each action takes 1 time unit to finish
Actions can be executed in parallel if they are compatible

Page 8: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains

A solution to problem 1:
(1) Move(A, B)
(2) Camera(B) and Sample(B), executed in parallel
(3) Move(B, A)

[Map figure: rewards R1 = 10 at A, R2 = 2 and R3 = 1 at B, and R4 = 3 at C.]
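As an aside (not on the slide), the value of this plan can be checked with a quick calculation from the success probabilities on page 5. The snippet below is an illustrative sketch in Python; the reward names are chosen here for readability.

```python
# Expected reward of the Problem 1 plan: Move(A,B); {Camera(B), Sample(B)}; Move(B,A).
# Probabilities come from the action table (Move 100%, Camera 60%, Sample 70%).
r1, r2, r3 = 10, 2, 1          # get back to A, picture at B, soil sample at B

expected_reward = (
    r1 * 1.00                  # Move(B, A) always succeeds, so the rover returns to A
    + r2 * 0.60                # Camera(B), executed at step 2
    + r3 * 0.70                # Sample(B), executed in parallel with Camera(B)
)
print(expected_reward)         # about 11.9; r4 (the picture at C) is given up due to the 3-unit deadline
```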

Page 9: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains

Add redundant actions

Actions Camera0 (60%) and Camera1 (50%) can be executed concurrently.

There are two rewards:
R1: Take a picture P1 at location A
R2: Take a picture P2 at location A

Page 10: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains

Two ways of using concurrent actions

All finish: speed up the execution. Use concurrent actions to achieve different goals.

Early finish: redundancy for critical tasks. Use concurrent actions to achieve the same goal.

Page 11: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains

Example of all-finish actions

If R1 = 10 and R2 = 10, execute Camera0 to achieve one reward and execute Camera1 to achieve the other (all finish).

The expected total reward = 10 * 60% + 10 * 50% = 11

Page 12: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains

Example of early-finish actions

If R1 = 100 and R2 = 10, use both Camera0 and Camera1 to achieve R1 (early finish).

The expected total reward = 100 * 50% + (100 - 100 * 50%) * 60% = 50 + 30 = 80
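The two calculations above can be reproduced with a small illustrative snippet (Python, not from the paper); p0 and p1 are the success probabilities of Camera0 and Camera1.

```python
p0, p1 = 0.60, 0.50            # success probabilities of Camera0 and Camera1

def all_finish(r1, r2):
    """Each camera pursues a different reward."""
    return r1 * p0 + r2 * p1

def early_finish(r):
    """Both cameras redundantly pursue the same critical reward;
    it is earned if at least one of them succeeds."""
    return r * (1 - (1 - p0) * (1 - p1))

print(all_finish(10, 10))      # 11.0
print(early_finish(100))       # 80.0, the same as 100*50% + (100 - 100*50%)*60%
```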

Page 13: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains

The AO* algorithm

AO* searches in an AND-OR graph (hypergraph).

[Figure: an AND-OR graph with OR branches and AND hyperarcs, a compact way of grouping an action's outcomes.]
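To make the hypergraph structure concrete, here is a minimal illustrative sketch (Python, not the authors' code): each OR node chooses among hyperarcs, and a hyperarc's value is the expectation over its probabilistic outcomes.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Hyperarc:
    # one AND connection: probabilistic outcomes as (probability, child node) pairs
    outcomes: List[Tuple[float, "OrNode"]]

@dataclass
class OrNode:
    value: float = 0.0                      # heuristic estimate at leaves
    hyperarcs: List[Hyperarc] = field(default_factory=list)

def backed_up_value(node: OrNode) -> float:
    """AO*-style backup: an OR node takes the best hyperarc,
    and a hyperarc's value is the expectation over its outcomes."""
    if not node.hyperarcs:                  # unexpanded or terminal node
        return node.value
    return max(
        sum(p * backed_up_value(child) for p, child in arc.outcomes)
        for arc in node.hyperarcs
    )
```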

Page 14: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains

Concurrent Probabilistic Over-subscription AO* (CPOAO*)

Concurrent action sets:
Represent parallel actions rather than individual actions
Use hyperarcs to represent them

State space:
Resource levels are part of a state
Unfinished actions are part of a state
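A hedged sketch of what such a state might look like (illustrative Python; the field names are this transcript's invention, not the paper's): the remaining time budget and the unfinished actions, with their remaining durations, travel with the state.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

@dataclass(frozen=True)
class CPOAOState:
    location: str                                 # e.g. "B"
    time_left: int                                # remaining resource level (time units)
    achieved: FrozenSet[str] = frozenset()        # rewards already collected
    unfinished: Tuple[Tuple[str, int], ...] = ()  # running actions carried over, with
                                                  # remaining durations, e.g.
                                                  # (("Sample(T2)", 2), ("Image_L(T1)", 1))
```

Hyperarcs out of such a state would then be labeled with concurrent action sets rather than single actions.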

Page 15: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains

CPOAO* search example

A Mars rover problem.

Map: [Figure: locations A, B, C, and D, with travel times 3, 4, 4, 5, and 6 on the edges.]

Actions:
Move(Location, Location)
Image_S(Target): 50%, T = 4
Image_L(Target): 60%, T = 5
Sample(Target): 70%, T = 6

Targets:
I1 – Image at location B
I2 – Image at location C
S1 – Sample at location B
S2 – Sample at location D

Rewards:
Have_Picture(I1) = 3
Have_Picture(I2) = 4
Have_Sample(S1) = 4
Have_Sample(S2) = 5
At_Location(A) = 10

Page 16: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains

CPOAO* search example

[Search tree figure: the root node S0 (T = 10) has an expected reward of 18.4, calculated using the heuristics.]

Page 17: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains

CPOAO* search example

[Search tree figure: S0 (T = 10) is expanded with the concurrent action sets {Move(A,B)}, {Move(A,C)}, {Move(A,D)}, and Do-nothing (durations 3, 4, 6, and 0), producing children S1 through S4 with heuristic values and reduced time budgets. Backing up from the children gives S0 an expected reward of 15.2, with {Move(A,B)} marked as the best action. Legend used throughout these slides: leaf values are heuristic estimates, internal values are calculated from children, and the best action at each node is highlighted.]

Page 18: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains

CPOAO* search example

[Search tree figure: S1 (15.2, T = 7), reached via {Move(A,B)}, is expanded with the concurrent action sets {a1, a2, a3}, {Move(B,D)}, and Do-nothing, where a1: Sample(T2), a2: Image_S(T1), a3: Image_L(T1). The 50% outcomes of Image_S(T1) lead to children S5 through S8, whose states record the remaining durations of the unfinished actions (e.g. Sample(T2) = 2, Image_L(T1) = 1).]

Page 19: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains

CPOAO* search example

[Search tree figure: the same snapshot of the subtree under S1 as on the previous slide.]

Page 20: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains

CPOAO* search example

[Search tree figure: the subtree under S1 is expanded further with action sets such as {a1}, {a1, a2}, {a4}, and {a5}, where a4: Move(B,A) and a5: Do-nothing, producing children S9 through S16. After the values are backed up, S1's expected reward drops to 11.5 and S0's expected reward becomes 13.2, now obtained through S2.]

Page 21: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains

CPOAO* search example

[Search tree figure: back at the root, S0 (T = 10) has an expected reward of 13.2; among its children S1 (11.5), S2 (13.2), S3, and S4, the best action set is now {Move(A,C)}, leading to S2.]

Page 22: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains

CPOAO* search example

[Search tree figure: S2 (T = 6) is expanded with the concurrent action sets {a6, a7}, {a8}, and Do-nothing, where a6: Image_S(T3), a7: Image_L(T3), and a8: Move(C,D), producing children S17 through S20. After the backup, S2's expected reward drops to 3.2 and S0's expected reward becomes 11.5, obtained through S1 again.]

Page 23: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains

CPOAO* search example

[Search tree figure: the search returns to the subtree under S1 (11.5) and revises the values of its children S5 through S16; S0's expected reward remains 11.5.]

Page 24: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains

CPOAO* search improvements

[Search tree figure: the same search tree, annotated with the two improvements used by CPOAO*: estimating total expected rewards at newly generated states, and pruning branches.]

Plan found: Move(A,B), Image_S(T1), Move(B,A)

Page 25: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains

Heuristics used in CPOAO*

A heuristic function to estimate the total expected reward of the newly generated states, using a reverse plan graph.

A group of rules to prune the branches of the concurrent action sets.

Page 26: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains

Estimating total rewards

A three-step process using an rpgraph (reverse plan graph):
1. Generate an rpgraph for each goal.
2. Identify the enabled propositions.
3. Compute the probability of achieving each goal, compute the expected rewards based on those probabilities, and sum up the rewards to obtain the value of this state.

(A sketch of this estimate follows below.)
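A rough sketch of steps 2 and 3 (illustrative Python under simplifying assumptions, not the paper's implementation): suppose the rpgraph has already been generated and, for each goal, yields the candidate achieving actions together with their accumulated costs and success probabilities; combining the feasible ways with an independence assumption gives the state's heuristic value.

```python
from typing import Dict, List, Tuple

def estimate_state_value(time_left: int,
                         goal_rewards: Dict[str, float],
                         ways_to_achieve: Dict[str, List[Tuple[float, float]]]) -> float:
    """Heuristic estimate of a state's total expected reward.

    ways_to_achieve[goal] lists (accumulated_cost, success_probability) pairs
    read off the rpgraph, counted back to propositions enabled in this state.
    """
    total = 0.0
    for goal, reward in goal_rewards.items():
        # Step 2 (simplified): keep only ways whose accumulated cost still fits
        # into the remaining time budget.
        feasible = [p for cost, p in ways_to_achieve.get(goal, []) if cost <= time_left]
        if not feasible:
            continue
        # Step 3: probability of achieving the goal if every feasible way is tried
        # (treated as independent here), then weight the reward by it.
        p_fail = 1.0
        for p in feasible:
            p_fail *= (1.0 - p)
        total += reward * (1.0 - p_fail)
    return total
```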

Page 27: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains

Heuristics to estimate the total rewards

Reverse plan graph:
Start from the goals.
Layers of actions and propositions.
Costs are marked on the actions.
Accumulated costs are marked on the propositions.

[Reverse plan graph figure for the goal Have_Picture(I1): the actions Image_S(I1) and Image_L(I1) achieve the goal from At_Location(B), which is in turn reached via Move(A,B) from At_Location(A) or via Move(D,B) from At_Location(D); action costs (e.g. 4, 5) and accumulated costs on propositions (e.g. 3, 7, 8, 9) are marked on the graph.]

Page 28: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains

Heuristics to estimate the total rewards

Given a specific state, e.g. At_Location(A) = T and Time = 7, the propositions enabled in that state are marked (shown in blue on the slide).

[Same reverse plan graph figure, with the enabled propositions highlighted.]

Page 29: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains

Heuristics to estimate the total rewards

The enabled propositions enable more actions and propositions.
Since actions are probabilistic, estimate the probabilities that the propositions and the goals become true.
Sum up the rewards over all goals.

[Same reverse plan graph figure, with the newly enabled actions and propositions highlighted.]

Page 30: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains

Rules to prune branches (when time is the only resource):

Include the action if it does not delete anything.
Example: {action-1, action-2, action-3} is better than {action-2, action-3} if action-1 does not delete anything.

Include the action if it can be aborted later.
Example: {action-1, action-2} is better than {action-1} if the duration of action-2 is longer than the duration of action-1.

Don't abort an action and restart it again immediately.

(A sketch of these rules follows below.)
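Below is a hedged sketch of how these dominance rules might be applied when filtering candidate concurrent action sets (illustrative Python; deletes_nothing, duration, and just_aborted are assumed helpers/inputs, not names from the paper).

```python
from typing import Callable, FrozenSet, Iterable, List, Set

def prune_action_sets(candidates: List[FrozenSet[str]],
                      applicable: Iterable[str],
                      deletes_nothing: Callable[[str], bool],
                      duration: Callable[[str], int],
                      just_aborted: Set[str]) -> List[FrozenSet[str]]:
    """Drop concurrent action sets dominated by the three rules
    (time is assumed to be the only resource)."""
    applicable = list(applicable)
    candidate_pool = set(candidates)
    kept = []
    for s in candidates:
        # Rule 1: s is dominated if an applicable action with no delete effects
        # could simply be added to it.
        if any(a not in s and deletes_nothing(a) and (s | {a}) in candidate_pool
               for a in applicable):
            continue
        # Rule 2: adding an action that runs longer than everything already in s
        # cannot hurt, since it can be aborted later at no extra cost.
        longest = max((duration(a) for a in s), default=0)
        if any(a not in s and duration(a) > longest and (s | {a}) in candidate_pool
               for a in applicable):
            continue
        # Rule 3: never abort an action and restart it immediately.
        if any(a in just_aborted for a in s):
            continue
        kept.append(s)
    return kept
```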

Page 31: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains

Experimental work

Mars rover problem with the following actions:
Move
Collect-Soil-Sample
Camera0
Camera1

Three sets of experiments, gradually increasing complexity:
Base algorithm
With rules to prune branches only
With both heuristics

Page 32: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains

Results of complexity experiments

Problem     Base CPOAO*      CPOAO* with pruning only           CPOAO* with pruning and rpgraph
            Time=20          Time=20          Time=40           Time=20          Time=40
            NG      ET (s)   NG      ET (s)   NG       ET (s)   NG      ET (s)   NG       ET (s)
12-12-12    530     <0.1     120     <0.1     2832     1        56      <0.1     662      <0.1
12-12-23    1170    <0.1     287     <0.1     27914    23       86      <0.1     5269     2
12-21-12    501     <0.1     204     <0.1     11315    3        53      <0.1     1391     <0.1
12-21-23    1230    <0.1     380     <0.1     85203    99       116     <0.1     11833    5
15-16-14    1067    <0.1     180     <0.1     6306     2        60      <0.1     1232     1
15-16-31    1941    <0.1     417     <0.1     49954    61       93      <0.1     7946     2
15-28-14    1121    <0.1     347     <0.1     29760    18       71      <0.1     2460     <0.1
15-28-31    2345    <0.1     694     <0.1     ---      ---      146     <0.1     19815    8

Problem: locations-paths-rewards; NG: number of nodes generated; ET: execution time (sec.)

Page 33: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains

Ongoing and future work

Ongoing:
Add typed objects and lifted actions
Add linear resource consumption

Future:
Explore the possibility of using state caching
Classify domains and develop domain-specific heuristic functions
Approximation techniques

Page 34: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains

Related work

LAO*: A heuristic search algorithm that finds solutions with loops (Hansen & Zilberstein 2001)
CoMDP: Concurrent MDPs (Mausam & Weld 2004)
GSMDP: Generalized semi-Markov decision processes (Younes & Simmons 2004)
mGPT: A probabilistic planner based on heuristic search (Bonet & Geffner 2005)
Over-subscription planning (Smith 2004; Benton, Do, & Kambhampati 2005)
HAO*: Planning with continuous resources in stochastic domains (Mausam, Benazera, Brafman, Meuleau & Hansen 2005)

Page 35: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains

Related work

CPTP: Concurrent probabilistic temporal planning (Mausam & Weld 2005)
Paragraph/Prottle: Concurrent probabilistic planning in the Graphplan framework (Little & Thiebaux 2006)
FPG: Factored Policy Gradient planner (Buffet & Aberdeen 2006)
Probabilistic temporal planning with uncertain durations (Mausam & Weld 2006)
HYBPLAN: A hybridized planner for stochastic domains (Mausam, Bertoli and Weld 2007)

Page 36: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains

Conclusion

An AO*-based modular framework
Use redundant actions to increase robustness
Abort running actions when needed
A heuristic function using a reverse plan graph
Rules to prune branches