Learning to Plan with Logical Automata
Brandon Araki1*, Kiran Vodrahalli2*, Thomas Leech1,3, Mark Donahue3, Cristian-Ioan Vasile1, Daniela Rus1
1Massachusetts Institute of Technology
2Columbia University
3MIT Lincoln Laboratory
*Equal contributors
Goals
Learn to plan in an environment with rules
1. Learn the rules in a form that humans can easily interpret
2. Incorporate the rules into planning so that modifying the rules results in predictable changes in behavior
Packing a Lunchbox
Pack a burger or a sandwich; then pack a banana
Goal 1 – Interpretability
Pack a burger or a sandwich; then pack a banana
Rules
Finite State Automaton
[Figure: finite state automaton for the lunchbox rule, running from the initial state through the picked-up and packed states to the goal.]
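The idea of writing the rule as a finite state automaton can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the state names, event names, and the `run_fsa` helper are all hypothetical, chosen only to mirror "pack a burger or a sandwich; then pack a banana".

```python
# Hypothetical encoding of the lunchbox rule as a finite state automaton.
# State and event names are illustrative assumptions, not taken from the slides.
FSA = {
    "initial": {"pick_burger": "holding_main", "pick_sandwich": "holding_main"},
    "holding_main": {"pack": "main_packed"},
    "main_packed": {"pick_banana": "holding_banana"},
    "holding_banana": {"pack": "goal"},
    "goal": {},
}


def run_fsa(events, state="initial"):
    """Advance the automaton over a sequence of events.

    Events that have no outgoing edge from the current state leave it unchanged.
    """
    for event in events:
        state = FSA[state].get(event, state)
    return state


print(run_fsa(["pick_sandwich", "pack", "pick_banana", "pack"]))  # goal
print(run_fsa(["pick_banana", "pack"]))  # initial
```

Because each edge is a single dictionary entry, the rule can be read off directly, which is the sense of interpretability this sketch is meant to illustrate.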
Factoring the Environment
Low-level MDP
High-level MDP
Pack sandwich or burger; then pack banana
Avoid obstacles
Representing the Environment
Discrete 2D gridworld
Finite state automaton
Pack sandwich or burger; then pack banana
Avoid obstacles
[Figure: the finite state automaton for the lunchbox rule.]
Goal 2 – Manipulability
Incorporate FSA into planning
Differentiable Recursive Planning
[Figure: the lunchbox FSA alongside a transition matrix labeled with the FSA states S0, S1, S2, S3, G, and T.]
Learn reward
Learn transitions
Learn transitions of FSA
One VIN for each FSA state
Based on Tamar, Aviv, et al. "Value iteration networks." Advances in Neural Information Processing Systems. 2016.
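To make "one value function per FSA state" concrete, here is a minimal sketch using plain tabular value iteration over the product of a gridworld and an FSA. It is deliberately not a value iteration network: the transitions and reward are hand-written rather than learned, and nothing is differentiable. The 3x3 grid, the propositions `a` and `b`, the three FSA states, and the reward of 1 for reaching `G` are all assumptions made for illustration.

```python
# Tabular value iteration over the product of a gridworld and an FSA.
# Grid layout, proposition names, and rewards are illustrative assumptions.
GAMMA = 0.9
SIZE = 3
LABELS = {(0, 2): "a", (2, 2): "b"}            # proposition that holds at a cell
DELTA = {("S0", "a"): "S1", ("S1", "b"): "G"}  # FSA edges; anything else self-loops
FSA_STATES = ["S0", "S1", "G"]
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(fsa_state, cell, action):
    """Move in the grid (clipping at walls), then advance the FSA on the new cell's label."""
    d_row, d_col = ACTIONS[action]
    row = min(max(cell[0] + d_row, 0), SIZE - 1)
    col = min(max(cell[1] + d_col, 0), SIZE - 1)
    next_cell = (row, col)
    next_fsa = DELTA.get((fsa_state, LABELS.get(next_cell)), fsa_state)
    reward = 1.0 if next_fsa == "G" and fsa_state != "G" else 0.0
    return next_fsa, next_cell, reward


def value_iteration(sweeps=50):
    """Return one value table per FSA state, indexed as values[fsa_state][cell]."""
    cells = [(r, c) for r in range(SIZE) for c in range(SIZE)]
    values = {f: {cell: 0.0 for cell in cells} for f in FSA_STATES}
    for _ in range(sweeps):
        new_values = {f: dict(values[f]) for f in FSA_STATES}
        for f in FSA_STATES:
            if f == "G":
                continue  # the goal state is absorbing and keeps value 0
            for cell in cells:
                candidates = []
                for action in ACTIONS:
                    next_fsa, next_cell, reward = step(f, cell, action)
                    candidates.append(reward + GAMMA * values[next_fsa][next_cell])
                new_values[f][cell] = max(candidates)
        values = new_values
    return values


def greedy_plan(values, fsa_state="S0", cell=(0, 0), max_steps=20):
    """Follow the greedy action under the computed values until the FSA reaches G."""
    plan = []
    while fsa_state != "G" and len(plan) < max_steps:
        def score(action):
            next_fsa, next_cell, reward = step(fsa_state, cell, action)
            return reward + GAMMA * values[next_fsa][next_cell]
        action = max(ACTIONS, key=score)
        fsa_state, cell, _ = step(fsa_state, cell, action)
        plan.append(action)
    return plan


values = value_iteration()
print(greedy_plan(values))  # ['right', 'right', 'down', 'down']
```

The value table for `S0` steers the agent toward the cell labeled `a`, and the table for `S1` steers it toward `b`, so the FSA state selects which low-level plan is active.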
Experiments – Interpretability
[Figure: two matrices with axes labeled Propositions and FSA States (S0, S1, S2, S3, G, T).]
Picking up the sandwich or the hamburger causes a transition to the next state
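One way such a transition could be read off a learned table is sketched below. The table layout (`T[state][proposition]` giving a distribution over next states), the two-state automaton, the proposition names, and every number in it are made up for illustration; they are not results from the talk.

```python
# Hypothetical transition table: T[state][proposition] -> distribution over next states.
# All values are invented for illustration and are not experimental results.
STATES = ["S0", "S1"]
PROPS = ["sandwich", "burger", "empty"]
T = {
    "S0": {"sandwich": [0.05, 0.95], "burger": [0.10, 0.90], "empty": [0.98, 0.02]},
    "S1": {"sandwich": [0.01, 0.99], "burger": [0.02, 0.98], "empty": [0.00, 1.00]},
}


def read_rules(table, threshold=0.5):
    """List (state, proposition, next_state) triples where the most likely move leaves the state."""
    rules = []
    for state in STATES:
        for prop in PROPS:
            probs = table[state][prop]
            best = max(range(len(probs)), key=probs.__getitem__)
            if STATES[best] != state and probs[best] > threshold:
                rules.append((state, prop, STATES[best]))
    return rules


for rule in read_rules(T):
    print(rule)
# ('S0', 'sandwich', 'S1')
# ('S0', 'burger', 'S1')
```

Only the state-changing entries are reported, so the output reads as a short list of rules rather than a full matrix.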
Experiments – Manipulability
We can modify the FSA so that the agent picks up only the burger and not the sandwich.
[Figure: the FSA and the transition matrices over FSA states S0, S1, S2, S3, G, and T, shown across the steps of the modification.]
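The kind of edit described here can be sketched on a dictionary-based automaton: removing one edge changes which event sequences reach the goal. The state names, event names, and the `final_state` helper are hypothetical, and the sketch edits a hand-written FSA rather than a learned one.

```python
import copy

# Hypothetical lunchbox FSA; names are illustrative assumptions.
FSA = {
    "initial": {"pick_burger": "holding_main", "pick_sandwich": "holding_main"},
    "holding_main": {"pack": "main_packed"},
    "main_packed": {"pick_banana": "holding_banana"},
    "holding_banana": {"pack": "goal"},
    "goal": {},
}


def final_state(fsa, events, state="initial"):
    """Run the automaton; events with no matching edge leave the state unchanged."""
    for event in events:
        state = fsa[state].get(event, state)
    return state


# Remove the sandwich edge so that only the burger advances the automaton.
burger_only = copy.deepcopy(FSA)
del burger_only["initial"]["pick_sandwich"]

sandwich_run = ["pick_sandwich", "pack", "pick_banana", "pack"]
burger_run = ["pick_burger", "pack", "pick_banana", "pack"]

print(final_state(FSA, sandwich_run))          # goal
print(final_state(burger_only, sandwich_run))  # initial
print(final_state(burger_only, burger_run))    # goal
```

The change is local to one edge, and its effect on behavior is predictable: sandwich runs no longer make progress, while burger runs are unaffected.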