Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains


  • Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains
    Li Li and Nilufer Onder
    Department of Computer Science, Michigan Technological University
    (Presented by: Li Li)
    AAAI 08, Chicago, July 16, 2008

  • Outline
    - Example domain
    - Two usages of concurrent actions
    - AO* and CPOAO* algorithms
    - Heuristics used in CPOAO*
    - Experiment results
    - Conclusion and future work

  • A simple Mars rover domain
    Locations A, B, C, and D on Mars. [map figure]

  • Main features
    - Aspects of complex domains: deadlines, limited resources, failures, oversubscription, concurrency
    - Two types of parallel actions: different goals (all finish), redundant (early finish)
    - Aborting actions: when they succeed, when they fail

  • The actions

  • Problem 1
    Initial state: the rover is at location A; no other rewards have been achieved.
    Rewards:
    - r1 = 10: get back to location A
    - r2 = 2: take a picture at location B
    - r3 = 1: collect a soil sample at location B
    - r4 = 3: take a picture at location C

  • Problem 1
    Time limit: the rover is only allowed to operate for 3 time units.
    Actions: each action takes 1 time unit to finish.
    Actions can be executed in parallel if they are compatible.

  • A solution to Problem 1
    (1) Move(A, B)
    (2) Camera(B), Sample(B)
    (3) Move(B, A)

  • Add redundant actions
    Actions Camera0 (60%) and Camera1 (50%) can be executed concurrently.
    There are two rewards:
    - R1: take a picture P1 at location A
    - R2: take a picture P2 at location A

  • Two ways of using concurrent actions
    - All finish: speed up the execution; use concurrent actions to achieve different goals.
    - Early finish: redundancy for critical tasks; use concurrent actions to achieve the same goal.

  • Example of all-finish actions
    If R1 = 10 and R2 = 10, execute Camera0 to achieve one reward and Camera1 to achieve the other (all finish).

    The expected total reward = 10*60% + 10*50% = 11

  • Example of early-finish actions
    If R1 = 100 and R2 = 10, use both Camera0 and Camera1 to achieve R1 (early finish).
    The expected total reward = 100*50% + (100 - 100*50%)*60% = 50 + 30 = 80
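
The arithmetic on the last two slides can be reproduced directly. The snippet below is a minimal sketch (not part of the original presentation) using the success probabilities from the slides, Camera0 = 60% and Camera1 = 50%:

```python
# Expected-reward arithmetic for the two usages of concurrent actions.
# Probabilities and rewards are taken from the slides above.
P_CAM0, P_CAM1 = 0.6, 0.5

def all_finish(r1, r2):
    """Each camera pursues a different reward; the expected rewards simply add up."""
    return r1 * P_CAM0 + r2 * P_CAM1

def early_finish(r1):
    """Both cameras pursue the same reward; it is earned if at least one succeeds."""
    return r1 * (1.0 - (1.0 - P_CAM0) * (1.0 - P_CAM1))

print(all_finish(10, 10))   # 10*0.6 + 10*0.5 = 11.0
print(early_finish(100))    # 100*(1 - 0.4*0.5) = 80.0, matching 50 + 30 on the slide
```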

  • The AO* algorithm
    AO* searches in an AND-OR graph (a hypergraph). [figure: OR and AND nodes]
    Hyperarcs are a compact way to connect a node to the set of outcomes of one action.
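
For orientation, here is a rough sketch of an AO* expansion loop on an acyclic AND-OR graph in a reward-maximizing, cost-free form. It is an assumption-laden illustration, not the authors' CPOAO* code: `expand`, `heuristic`, and `is_goal` are hypothetical callbacks, and details such as action costs and solution labeling are omitted.

```python
# Sketch of AO* on an acyclic AND-OR graph (reward-maximizing form).
# expand(node) -> list of hyperarcs; a hyperarc is (action, [(prob, child), ...]).
# Illustration only, not the CPOAO* implementation from the paper.

def ao_star(root, expand, heuristic, is_goal):
    value = {root: heuristic(root)}          # optimistic reward estimates
    best_arc, parents, arcs_of = {}, {root: set()}, {}
    expanded = set()

    def backup(node):
        """Recompute a node's value and best hyperarc from its children."""
        arcs = arcs_of.get(node, [])
        if not arcs:
            return 0.0, None
        scored = [(sum(p * value[c] for p, c in outs), (act, outs)) for act, outs in arcs]
        return max(scored, key=lambda s: s[0])

    def find_tip(node):
        """Follow best hyperarcs to an unexpanded, non-goal tip of the best partial solution."""
        if is_goal(node):
            return None
        if node not in expanded:
            return node
        if best_arc.get(node) is None:
            return None
        for _, child in best_arc[node][1]:
            tip = find_tip(child)
            if tip is not None:
                return tip
        return None

    while True:
        tip = find_tip(root)
        if tip is None:                       # best partial solution graph is complete
            return best_arc, value
        expanded.add(tip)
        arcs_of[tip] = expand(tip)
        for _, outs in arcs_of[tip]:
            for _, child in outs:
                value.setdefault(child, heuristic(child))
                parents.setdefault(child, set()).add(tip)
        frontier = {tip}                      # propagate revised values to ancestors
        while frontier:
            n = frontier.pop()
            new_val, new_best = backup(n)
            best_arc[n] = new_best
            if n is tip or abs(new_val - value[n]) > 1e-9:
                value[n] = new_val
                frontier |= parents.get(n, set())
```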

  • Concurrent Probabilistic Over-subscription AO* (CPOAO*)
    - Concurrent action sets: represent sets of parallel actions rather than individual actions; use hyperarcs to represent them.
    - State space: resource levels are part of a state; unfinished actions are part of a state.
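
A minimal way to picture such a state, assuming time is the only resource, is sketched below; the class and field names are illustrative choices, not identifiers from the paper.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

# Illustrative sketch of a CPOAO* search state; field names are hypothetical.
@dataclass(frozen=True)
class State:
    facts: FrozenSet[str]                      # propositions true in this state
    time_left: int                             # remaining resource (time units)
    running: Tuple[Tuple[str, int], ...] = ()  # unfinished actions as (name, time remaining)

# A hyperarc corresponds to starting a whole set of compatible actions at once,
# e.g. {Image_S(T1), Sample(T2)}, rather than a single action.
concurrent_action_set = frozenset({"Image_S(T1)", "Sample(T2)"})

s = State(facts=frozenset({"At_Location(B)"}), time_left=7,
          running=(("Image_L(T1)", 1),))
```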

  • CPOAO* search example: a Mars rover problem
    Map: locations A, B, C, D connected by paths. [map figure with travel times]
    Actions:
    - Move(Location, Location)
    - Image_S(Target): 50%, T = 4
    - Image_L(Target): 60%, T = 5
    - Sample(Target): 70%, T = 6
    Targets:
    - I1: image at location B
    - I2: image at location C
    - S1: sample at location B
    - S2: sample at location D
    Rewards:
    - Have_Picture(I1) = 3
    - Have_Picture(I2) = 4
    - Have_Sample(S1) = 4
    - Have_Sample(S2) = 5
    - At_Location(A) = 10
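
The problem on this slide can be written straight down as data. The dictionaries below only restate the durations, probabilities, and rewards listed above; their layout is an illustrative assumption.

```python
# The rover problem from the slide, restated as plain data (layout is illustrative).
ACTIONS = {
    "Image_S": {"prob": 0.5, "duration": 4},
    "Image_L": {"prob": 0.6, "duration": 5},
    "Sample":  {"prob": 0.7, "duration": 6},
    # Move(l1, l2) is deterministic; its duration is the travel time on the map figure.
}

TARGETS = {
    "I1": ("image", "B"),  "I2": ("image", "C"),
    "S1": ("sample", "B"), "S2": ("sample", "D"),
}

REWARDS = {
    "Have_Picture(I1)": 3, "Have_Picture(I2)": 4,
    "Have_Sample(S1)": 4,  "Have_Sample(S2)": 5,
    "At_Location(A)": 10,
}
```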

  • CPOAO* search example
    The root node S0 gets an expected reward of 18.4, calculated using the heuristics (T = 10). [search-graph figure]

  • CPOAO* search example
    S0 is expanded with the concurrent action sets {Move(A,B)}, {Move(A,C)}, {Move(A,D)}, and Do-nothing, producing children S1-S4 with estimates 15.2, 13.2, 0, and 10; S0's expected reward, recalculated from its children, is 15.2, and {Move(A,B)} is the current best action. [search-graph figure; legend: values of terminal nodes, expected reward calculated using the heuristics, expected reward calculated from children, best action]

  • CPOAO* search example
    S1 (reached via Move(A,B)) is expanded with the action sets {a1, a2, a3} (a1: Sample(T2), a2: Image_S(T1), a3: Image_L(T1)), {Move(B,D)}, and Do-nothing. [search-graph figure]

  • CPOAO* search example
    The probabilistic outcomes of {a1, a2, a3} (50% branches) are expanded further with follow-up sets such as {a1}, {a1, a2}, {a4: Move(B,A)}, and {a5: Do-nothing}, producing states S9-S16; the revised estimates are S1 = 11.5 and S0 = 13.2. [search-graph figure]

  • CPOAO* search example
    After the revision, S2's heuristic estimate of 13.2 exceeds S1's revised value of 11.5, so {Move(A,C)} becomes the best action at S0 (value 13.2). [search-graph figure]

  • CPOAO* search example
    S2 is expanded with the action sets {a6, a7} (a6: Image_S(T3), a7: Image_L(T3)), {a8: Move(C,D)}, and Do-nothing; its value drops to 3.2, so the best action at S0 switches back to {Move(A,B)} and S0's value becomes 11.5. [search-graph figure]

  • CPOAO* search example
    The subtree below S1 is re-examined with the updated values; S1 and S0 both settle at an expected reward of 11.5. [search-graph figure]

  • CPOAO* search improvements
    Two improvements guide the search: estimating total expected rewards for new states and pruning branches of concurrent action sets. [search-graph figure]
    Plan found: Move(A,B); Image_S(T1); Move(B,A)

  • Heuristics used in CPOAO*
    - A heuristic function that estimates the total expected reward of newly generated states using a reverse plan graph.
    - A group of rules to prune branches of the concurrent action sets.

  • Estimating total rewards
    A three-step process using an rpgraph (reverse plan graph):
    1. Generate an rpgraph for each goal.
    2. Identify the enabled propositions and compute the probability of achieving each goal.
    3. Compute the expected reward of each goal from its probability and sum the rewards to obtain the value of the state.

  • Heuristics to estimate the total rewards: the reverse plan graph
    - Start from the goals.
    - Layers of actions and propositions.
    - Costs are marked on the actions.
    - Accumulated costs are marked on the propositions.
    [rpgraph figure for Have_Picture(I1): actions Image_S(I1), Image_L(I1), Move(A,B), Move(D,B); propositions At_Location(A), At_Location(B), At_Location(D)]

  • Heuristics to estimate the total rewards: enabled propositions
    Given a specific state (At_Location(A) = T, Time = 7), the enabled propositions are marked in blue. [rpgraph figure]

  • Heuristics to estimate the total rewards: probabilities and rewards
    - Enabled propositions enable more actions and propositions.
    - Actions are probabilistic, so estimate the probability that each proposition, and hence each goal, is true.
    - Sum up the rewards over all goals. (A sketch of this estimate follows below.)
    [rpgraph figure]
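
The sketch below follows the outline of the last three slides: build a reverse plan graph toward each goal, treat the propositions enabled by the current state as true, propagate success probabilities forward, and sum the expected rewards. It is a simplified illustration under strong assumptions (a hand-built graph, independent action outcomes, and the enabled set taken as given rather than derived from accumulated costs and remaining time), not the authors' heuristic code.

```python
# Simplified sketch of the rpgraph-based reward estimate (illustration only).
def estimate_rewards(goals, rpgraph, enabled, rewards):
    """goals: goal propositions; rewards: goal -> reward value;
    rpgraph: proposition -> list of (action, success_prob, preconditions);
    enabled: propositions already true or enabled in the current state."""
    memo = {}

    def p_true(prop):
        if prop in enabled:
            return 1.0
        if prop in memo:
            return memo[prop]
        memo[prop] = 0.0                           # guard against cycles in the graph
        p_false = 1.0
        for action, prob, preconds in rpgraph.get(prop, []):
            p_fire = prob                          # chance the action applies and succeeds
            for pre in preconds:
                p_fire *= p_true(pre)              # preconditions assumed independent
            p_false *= (1.0 - p_fire)              # achievers assumed independent as well
        memo[prop] = 1.0 - p_false
        return memo[prop]

    return sum(rewards[g] * p_true(g) for g in goals)
```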

  • Rules to prune branches (when time is the only resource)
    - Include an action if it does not delete anything. Example: {action-1, action-2, action-3} is better than {action-2, action-3} if action-1 does not delete anything.
    - Include an action if it can be aborted later. Example: {action-1, action-2} is better than {action-1} if the duration of action-2 is longer than the duration of action-1.
    - Don't abort an action and restart it again immediately.
    (A sketch of the first two rules follows below.)
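
The first two rules can be phrased as a dominance test over candidate concurrent action sets, as in the sketch below; the attribute names (`deletes`, `duration`) are assumptions, and the third rule is only indicated by a comment because it depends on what was just aborted along the current search path.

```python
# Hedged sketch of the first two pruning rules; attribute names are illustrative.
def dominated(candidate, alternatives, actions):
    """Return True if the action set 'candidate' can be pruned because a superset in
    'alternatives' is at least as good according to the rules on the slide."""
    for other in alternatives:
        if candidate < other:                      # 'other' is a strict superset
            extra = other - candidate
            # Rule 1: adding delete-free actions can only help
            if all(not actions[a]["deletes"] for a in extra):
                return True
            # Rule 2: adding longer actions can only help, since they can be aborted later
            if candidate and all(actions[a]["duration"] >
                                 min(actions[b]["duration"] for b in candidate)
                                 for a in extra):
                return True
    # Rule 3 (not shown): don't abort an action and restart it immediately.
    return False
```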

  • Experimental work
    Mars rover problem with the following actions: Move, Collect-Soil-Sample, Camera0, Camera1.
    Three sets of experiments, gradually increasing complexity:
    - Base algorithm
    - With rules to prune branches only
    - With both heuristics

  • Results of complexity experiments
    [results table] Problem: locations-paths-rewards; NG: number of nodes generated; ET: execution time (sec.)

  • Ongoing and Future Work
    Ongoing:
    - Add typed objects and lifted actions
    - Add linear resource consumption
    Future:
    - Explore the possibility of using state caching
    - Classify domains and develop domain-specific heuristic functions
    - Approximation techniques

  • Related Work
    - LAO*: A heuristic search algorithm that finds solutions with loops (Hansen & Zilberstein 2001)
    - CoMDP: Concurrent MDPs (Mausam & Weld 2004)
    - GSMDP: Generalized semi-Markov decision processes (Younes & Simmons 2004)
    - mGPT: A probabilistic planner based on heuristic search (Bonet & Geffner 2005)
    - Over-subscription planning (Smith 2004; Benton, Do, & Kambhampati 2005)
    - HAO*: Planning with continuous resources in stochastic domains (Mausam, Benazera, Brafman, Meuleau & Hansen 2005)

  • Related Work (continued)
    - CPTP: Concurrent probabilistic temporal planning (Mausam & Weld 2005)
    - Paragraph/Prottle: Concurrent probabilistic planning in the Graphplan framework (Little & Thiebaux 2006)
    - FPG: Factored Policy Gradient planner (Buffet & Aberdeen 2006)
    - Probabilistic temporal planning with uncertain durations (Mausam & Weld 2006)
    - HYBPLAN: A hybridized planner for stochastic domains (Mausam, Bertoli and Weld 2007)

  • Conclusion
    - An AO*-based modular framework
    - Uses redundant actions to increase robustness
    - Aborts running actions when needed
    - Heuristic function using a reverse plan graph
    - Rules to prune branches

    Where does the example come from? Reasons for choosing AO*:
    1. Straightforward to model probabilistic actions; simple but able to incorporate everything.
    2. The resource constraints ensure the termination of the search.
    3. Modular framework with changeable heuristic functions.

    Use the heuristic function to estimate the total achievable reward for new states; use the pruning rules to prune some branches.