Learning High Level Planning From...

Post on 29-Dec-2019

2 views 0 download

Transcript of Learning High Level Planning From...

Learning High Level Planning From Text

Nate Kushman

S.R.K. Branavan, Tao Lei, Regina Barzilay

1

NLP: Linguistic Relation

Precondition/Effects Relationships

Classical Planning:

2

Goal: Show that planning can be improved by utilizing precondition information in text

castle have magic bricks

have

Castles are built with magic bricks

Castles are built with magic bricks

A pickaxe, which is used to harvest stone, can be made from wood.

pickaxe stone

Text

Preconditions

Plan • Move to location: <3,3>

• Harvest: wood

• Retrieve: harvested wood

• Setup crafting table

• Place on crafting table: wood

• Craft: pickaxe

• Retrieve: pickaxe

• Move to location: <1,2>

• Pickup tool: pickaxe

• Harvest: stone with: pickaxe

• Retrieve: stone

wood

pickaxe

stone

3

How Text Can Help Planning

Challenge: Preconditions from text cannot map directly to planning action preconditions

pickaxe wood

Minecraft : Virtual world allowing tool creation and complex construction.

4

Classical Planning’s Problem: Exponential heuristic search

Traditional Solution: Analyze domain to induce subgoals

Text: A pickaxe, which is used to harvest stone, can be made from wood.

Precondition Relations:

wood pickaxe

pickaxe stone

Key Idea: Map text precondition information to subgoals

Opportunity

stone

pickaxe

wood

Utilize domain specific information in text to induce subgoals Jonsson and Barto, 2005; Wolfe and Barto, 2005; Mehta et al.,2008; Barry et al., 2011

looked only at domain, did not utilize text Learn from only environment feedback

Girju and Moldovan, 2002; Chang and Choi, 2006; Blanco et al., 2008; Beamer and Girju, 2009; Do et al., 2011; Kwiakowski, 2012

Learns from supervised data, does not utilized environment feedback

Utilize text providing abstract domain relationships (not goal specific)

Oates, 2001; Siskind, 2001; Yu and Ballard, 2004; Fleischman and Roy, 2005; Mooney, 2008; Branavan et al., 2009; Liang et al. , 2009; Vogel and Jurafsky, 2010; Branavan et al., 2009; Branavan et al. 2010; Vogel and Jurafsky, 2010; Branavan et al., 2011

Focused on grounding words to objects, does not ground relations

5

Key Departures

Text Log-linear model

Planning

target goal

Plan

Log-linear model

Low-level planner

Model precondition descriptions

Model object relations

in world, and ground

preconditions.

Precondition relations

Sub-goal sequence

6

Hybrid Model

Learn model parameters from planning feedback

• State is represented by a set of predicates

• Actions represented by preconditions and effects

current_location(1,2) = TRUE current_tool(pickaxe) = TRUE

Action: chop_tree(1,2)

Preconditions: tree_at(1,2) = TRUE current_location(1,2) = TRUE

Effect: tree_at(1,2) FALSE have(wood) TRUE

7

Goals and subgoals are represented as predicates

Modeling the World

Cooked fish is obtained when raw fish is cooked in a furnace.

Goal Independent

8

Model Part 1: Predict Precondition Relations

cooked fish

have raw fish

have furnace have

Given Goal State

Cooked fish

Raw fish

Start Fishing

pole

• Model as a Markov process

• Explicitly model preconditions observed via planner

Model Part 2: Predict Subgoal Sequence

… ?

Manual groundings, x Prediction per pair Sentence, w, dependency parse, q

Model Part 1: Predict Precondition Relations from text

Relations between predicates, x

10

Markov Assumption Relations from text, C

Policy Functions

Model Part 2: Predict Subgoal Sequence

Reinforcement Learning Algorithm

(Policy Gradient)

Parameter

Updates

Sub-goal sequence

Model

Model Parameters

Low-level planner

Planner succeeds or fails on each step

11

Learn Parameters Using Feedback from the Planner

Pickaxe Start Pickaxe Fishing

pole Raw fish

Fishing pole

Fishing pole

Pickaxe Raw fish

Separate update for each relation

Negative update for all unnecessary preconditions

Parameter Updates: Relation Prediction

Cooked fish

Start

Fishing pole

Pickaxe

Fishing pole

Start

Start

Start String Raw fish

Fishing pole

Cooked fish

One update for the whole sequence

Parameter Updates: Subgoal Sequence Prediction

Raw fish

Fishing pole

Go fishing

Raw fish

Fishing pole

Start Go to

market Buy fish

14

Success/failure of one subgoal pair standard log-linear gradient

Success or failure of entire sequence Sum over all subgoal pairs

Updates

standard log-linear gradient

Model Part 1: Precondition Relation Prediction

Model Part 2: Subgoal Sequence Prediction

Sentences: 242

Vocabulary: 979

Documents:

User authored wiki articles

Text Statistics:

World:

Minecraft virtual world

Tasks: 98

Avg. plan length: 35

Min. Branching Factor: 8

Planning task Statistics:

15

Experimental Domain

Unmodified Low-level Planner

Fast-Forward – standard baseline in classical planning

No induced subgoals

No Text

Second half of model given no relations from text

All Text

Generate all connections with grounded phrase in same sentence

Second-half of model with this set of connections

Full Model

As described so far

Manual Text Connections

Manually annotate all connections implied by the text Use second half of model with the manual connections

16

Models compared

17

40.80%

0% 20% 40% 60% 80% 100%

Low-level Planner

% of tasks completed successfully

Results

Full Model

18

40.80%

0% 20% 40% 60% 80% 100%

Low-level Planner

% of tasks completed successfully

Results

No Text

Full Model

19

40.8%

0% 20% 40% 60% 80% 100%

Low-level Planner

% of tasks completed successfully

Results

No Text

Full Model

All Text

20

40.8%

0% 20% 40% 60% 80% 100%

Manual Text

Low-level Planner

% of tasks completed successfully

Results

No Text

Full Model

Very close to upper bound

All Text

21

14%

0% 20% 40% 60% 80%

Manual Text

Low-level Planner

% of tasks completed successfully

No Text

Full Model

All Text

Results: Tasks Longer Than 35 Actions

Almost twice the performance of No Text

22

Results: Text Analysis

• Our method can learn to ground textual descriptions of precondition relations

• Precondition relationship information can improve performance on complex planning tasks

Code and data available at:

http://groups.csail.mit.edu/rbg/code/planning/

23

Conclusion