Learning to Plan with Logical Automatakiranvodrahalli.github.io/talks/RSS2019-Presentation.pdf ·...

Post on 19-Aug-2020

1 views 0 download

Transcript of Learning to Plan with Logical Automatakiranvodrahalli.github.io/talks/RSS2019-Presentation.pdf ·...

Learning to Plan with Logical Automata

Brandon Araki1*, Kiran Vodrahalli2*, Thomas Leech1,3,Mark Donahue3, Cristian-Ioan Vasile1, Daniela Rus1

1Massachusetts Institute of Technology2Columbia University

3MIT Lincoln Laboratory

*Equal contributors

1

2

Goals

Learn to plan in an environment with rules1. Learn the rules in a way that they can be easily interpreted by humans

2. Incorporate the rules into planning so that modifying the rules results in predictable changes in behavior

3

Packing a Lunchbox

Pack a burger or a sandwich; then pack a banana

4

Goal 1 – Interpretability

Pack a burger or a sandwich;

then pack a banana

5

InitialState

Picked upor

Packedor

Picked up Packed

GOAL!

Rules

Finite State Automaton

6

Low-level MDP

High-level MDPPack sandwich or burger;

Then pack bananaAvoid obstacles

Factoring the Environment

77

Discrete 2D gridworld

Finite state automaton

Representing the Environment

Pack sandwich or burger;Then pack banana

Avoid obstacles

InitialState

Picked upor

Packedor

Picked up Packed

Goal 2 – Manipulability

Incorporate FSA into planning

8

InitialState

Picked upor

Packedor

Picked up Packed

S0 S1 S2 S3 G

S0 o ØS0

S1

S2

S3

G

T

Learn reward

Learn transitions

Learn transitions of FSA

9

One VIN for each FSA state

Differentiable Recursive Planning

Based on Tamar, Aviv, et al. "Value iteration networks." Advances in Neural Information Processing Systems. 2016.

Experiments - Interpretability

10

Propositions

FSA

Sta

tes

S0 o ØS0

S1

S2

S3

G

T

S0 o ØS0

S1

S2

S3

G

T

Experiments - Interpretability

11

Picking up the sandwich or the hamburger causes a transition to the next state

Experiments – Manipulability

We can modify the FSA so that it will only pick up the burger and not the sandwich.

12

InitialState

Picked upor

Packedor

Picked up Packed

Experiments – Manipulability

We can modify the FSA so that it will only pick up the burger and not the sandwich.

13

InitialState

Picked upor

Packedor

Picked up Packed

Experiments – Manipulability

We can modify the FSA so that it will only pick up the burger and not the sandwich.

14

S0 o ØS0

S1

S2

S3

G

T

S0 o ØS0

S1

S2

S3

G

T

Experiments – Manipulability

We can modify the FSA so that it will only pick up the burger and not the sandwich.

15

S0 o ØS0

S1

S2

S3

G

T

Experiments – Manipulability

We can modify the FSA so that it will only pick up the burger and not the sandwich.

16

Learning to Plan with Logical Automata

Brandon Araki1*, Kiran Vodrahalli2*, Thomas Leech1,3,Mark Donahue3, Cristian-Ioan Vasile1, Daniela Rus1

1Massachusetts Institute of Technology2Columbia University

3MIT Lincoln Laboratory

*Equal contributors

17