Evolving Graphical Planner: Contextual Global Planning for Vision-and-Language Navigation. Neural Information Processing Systems (NeurIPS), 2020. Zhiwei Deng, Karthik Narasimhan, Olga Russakovsky.

Page 1:

Evolving Graphical Planner: Contextual Global Planning for Vision-and-Language Navigation

Neural Information Processing Systems (NeurIPS), 2020

Zhiwei Deng Karthik Narasimhan Olga Russakovsky

Page 2:

Humans communicate with robots:
- through language

Robots interact with environments:
- perceive visual information
- perform planning, take actions

Human language

Environment observation

Planning

Page 3:

Vision-and-Language Navigation Task

Unseen environment

Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments, Peter Anderson et al., CVPR 2018

Page 4:

Vision-and-Language Navigation Task

Unseen environment

Photorealistic images

Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments, Peter Anderson et al., CVPR 2018

Page 5:

Vision-and-Language Navigation Task

Facing the end of the bed, take an immediate right and exit the bedroom through the open doorway. Walk straight until you see a large red painting. At the painting make a turn towards and go through the doorway on the right of the painting. Enter the bedroom, you will reach your destination

Unseen environment

Photorealistic images

Human annotated instructions

Page 6:

Vision-and-Language Navigation Task

Unseen environment

Start

Target

Photorealistic images

Human annotated instructions

Navigation in a room

Page 7:

Vision-and-Language Navigation Task

Unseen environment

Page 8:

Vision-and-Language Navigation Task

Unseen environment

Page 9:

Vision-and-Language Navigation Task

Unseen environment

Page 10:

Unseen environment

Challenge 1: Reason over observations and language

Page 11:

Unseen environment

Challenge 1: Reason over observations and language

Page 12:

Unseen environment

Challenge 2: Perform error correction and recovery

Page 13:

Unseen environment

Challenge 2: Perform error correction and recovery

Deviate from correct path

Page 14:

Unseen environment

Challenge 2: Perform error correction and recovery

Incorrect action

Correct action

Deviate from correct path

Page 15:

Vision-and-Language Navigation Task

Unseen environment

Page 16:

Vision-and-Language Navigation Task

Unseen environment

Page 17:

Current navigation architectures

Unseen environment

Diagram labels: Agent, Action, Encoder, Decision space

Page 18:

Current navigation architectures

Unseen environment

Diagram labels: Agent, Action, Encoder, Decision space

Page 19:

Current navigation architectures

Unseen environment

Existing navigation architectures for VLN: constrained local decision space

Diagram labels: Agent, Action, Encoder, Decision space

Page 20:

Current navigation architectures

Unseen environment

Observation + decision space

Alignment confusion

Page 21:

Current navigation architectures

Unseen environment

Diagram labels: Agent, Decision space, Action, Encoder

Need to make multi-step decisions, making error correction harder

Page 22:

Our work: Evolving Graphical Planner

A differentiable graphical planner

Evolving Graphical Structure

Proxy graphs for planning

Graph Pool

Message Passing

Graph Unpool

Graph-augmented supervision

Condensation

Page 23:

Standard VLN navigation agent

Unseen environment

Diagram labels: Agent, Decision space, Action, Encoder

Page 24:

Our work: Evolving Graphical Planner

A differentiable graphical planner: global decision space helps

Page 25:

Our work: Evolving Graphical Planner

Unseen environment

Diagram labels: Agent, Decision space, Action, Encoder

Topological map

Page 26:

Our work: Evolving Graphical Planner

Unseen environment

Diagram labels: Agent, Decision space, Action, Encoder

Topological map

Visit to expand

Visit to stop

Page 27:

Our work: Evolving Graphical Planner

A differentiable graphical planner: Graphical memory (topological connections + raw features)

Instructions

Observations (visual + angle)

Graphical memory

Topological map: $G_t = (V_t, E_t)$, with node features $v_i^t = (\mathrm{visual}_i^t, \mathrm{angle}_i^t)$
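The graphical memory described on this slide can be sketched as a small data structure. This is only an illustration under stated assumptions: the class names `Node` and `GraphMemory`, the method names `observe` and `candidates`, and the room names are hypothetical and do not come from the slides. The sketch shows the map $G_t = (V_t, E_t)$ growing as the agent moves, with each node holding a visual feature and an angle, and with every seen-but-unvisited node remaining available as a global decision.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One navigable location: raw visual feature plus heading angle."""
    visual: list[float]
    angle: float
    visited: bool = False


@dataclass
class GraphMemory:
    """Topological map G_t = (V_t, E_t) that grows as the agent moves."""
    nodes: dict[str, Node] = field(default_factory=dict)
    edges: set[frozenset[str]] = field(default_factory=set)

    def observe(self, current: str, neighbours: dict[str, Node]) -> None:
        """Mark `current` as visited and add its newly seen neighbours."""
        self.nodes[current].visited = True
        for name, node in neighbours.items():
            self.nodes.setdefault(name, node)  # keep the first stored feature
            self.edges.add(frozenset((current, name)))

    def candidates(self) -> list[str]:
        """Global decision space: every seen-but-unvisited node."""
        return sorted(n for n, node in self.nodes.items() if not node.visited)


# Hypothetical walk: the agent starts in "start", then moves to "hall".
memory = GraphMemory(nodes={"start": Node([0.1, 0.2], 0.0)})
memory.observe("start", {"hall": Node([0.3, 0.1], 1.57),
                         "closet": Node([0.0, 0.9], 3.14)})
memory.observe("hall", {"bedroom": Node([0.5, 0.5], 0.0)})
print(memory.candidates())
```

Because "closet" stays in the candidate list after the agent has moved on to "hall", a later decision can return to it in a single step, which is the property the slides attribute to a global decision space.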

Page 28:

Our work: Evolving Graphical Planner

A differentiable graphical planner: Graphical memory

Instructions

Observations (visual + angle)

Graphical memory

Topological map

o Grounding: global alignment

Page 29:

Our work: Evolving Graphical Planner

A differentiable graphical planner: Graphical memory

Instructions

Observations (visual + angle)

Graphical memory

Topological map

o Follow the memorized path

o Decision made in single step

o Easier error correction

Page 30:

Our work: Evolving Graphical Planner

A differentiable graphical planner: Proxy graphs

Instructions

Observations (visual + angle)

Ever expanding graph…

Page 31:

Our work: Evolving Graphical Planner

A differentiable graphical planner: Proxy graphs

Instructions

Observations (visual + angle)

Ever expanding graph…

Operate on the full graph: high planning cost

Page 32:

Our work: Evolving Graphical Planner

A differentiable graphical planner: Proxy graphs

Ever expanding graph…

Pool

$G_t = (V_t, E_t)$, $\tilde{G}_t = (\tilde{V}_t, \tilde{E}_t)$

$\tilde{V}_t = A_t^{\top} V_t$

$\tilde{E}_t = A_t^{\top} E_t A_t$

Hierarchical Graph Representation Learning with Differentiable Pooling, Ying et al., NeurIPS’18
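The two pooling equations on this slide can be checked on a toy graph. The sketch below is a minimal illustration, assuming the pooling matrix has shape N x M (N original nodes, M proxy nodes) so that the products are well defined. The helper names and the numeric values are made up for the example, and plain Python lists stand in for the tensors a real model would use.

```python
def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def pool(A, V, E):
    """Return proxy node features A^T V and proxy adjacency A^T E A."""
    At = transpose(A)
    return matmul(At, V), matmul(matmul(At, E), A)


# Three original nodes on a chain 0-1-2, pooled into two proxy nodes.
A = [[1.0, 0.0],   # node 0 belongs fully to proxy node 0
     [0.5, 0.5],   # node 1 is shared between both proxy nodes
     [0.0, 1.0]]   # node 2 belongs fully to proxy node 1
V = [[1.0, 0.0],
     [0.0, 2.0],
     [3.0, 1.0]]
E = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]

V_proxy, E_proxy = pool(A, V, E)
print(V_proxy)  # [[1.0, 1.0], [3.0, 2.0]]
print(E_proxy)  # [[1.0, 1.0], [1.0, 1.0]]
```

The proxy graph has a fixed number of nodes however large the original graph grows, which is what keeps the planning cost bounded.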

Page 33:

Our work: Evolving Graphical Planner

A differentiable graphical planner: Proxy graphs

Pool

$G_t = (V_t, E_t)$, $\tilde{G}_t = (\tilde{V}_t, \tilde{E}_t)$

Pooling matrix $A_t$: soft “attention” or aggregation from the original graph

Page 34:

Our work: Evolving Graphical Planner

A differentiable graphical planner: Proxy graphs

Pool

$G_t = (V_t, E_t)$, $\tilde{G}_t = (\tilde{V}_t, \tilde{E}_t)$

Pooling matrix $A_t$: obtained from $f(G_t, \text{language}, \text{agent-state})$
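The slide does not say what form $f$ takes, so the following is only one way such a context-dependent pooling matrix could be produced, not the authors' function. It assumes that the language and agent state are summarised in a single `context` vector with the same dimension as the node features, that each proxy node has a hypothetical query vector, and that a row-wise softmax turns the scores into a soft assignment in the style of differentiable pooling.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pooling_matrix(V, context, proxy_queries):
    """Row i gives node i's soft assignment over the proxy nodes."""
    A = []
    for v in V:
        # Assumption: the context is injected by simple addition.
        conditioned = [x + c for x, c in zip(v, context)]
        scores = [sum(x * q for x, q in zip(conditioned, query))
                  for query in proxy_queries]
        A.append(softmax(scores))
    return A


V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # three nodes, two features each
context = [0.5, -0.5]                       # hypothetical language/agent summary
proxy_queries = [[1.0, 0.0], [0.0, 1.0]]    # hypothetical, one per proxy node

A = pooling_matrix(V, context, proxy_queries)
for row in A:
    print([round(p, 3) for p in row])
```

Each row sums to one, so every original node spreads its features across the proxy nodes, and changing `context` changes the assignment without changing the graph itself.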

Page 35:

Our work: Evolving Graphical Planner

A differentiable graphical planner: Proxy graphs

Pool

$G_t = (V_t, E_t)$, $\tilde{G}_t = (\tilde{V}_t, \tilde{E}_t)$

Neural message passing: $\mathrm{GraphNeuralNetworks}(G_t, k = \text{steps})$

Relational inductive biases, deep learning, and graph networks, arXiv’18
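To make the "k steps of message passing" idea concrete, here is a deliberately simplified sketch. It replaces the learned update functions of a graph neural network with a fixed rule (average the node's own state with the edge-weighted mean of its neighbours), so it illustrates how information spreads over k rounds rather than the network used in the talk. The function name and the toy values are assumptions.

```python
def message_passing(V, E, steps):
    """Run `steps` rounds of weighted-mean neighbour aggregation."""
    H = [row[:] for row in V]
    for _ in range(steps):
        new_H = []
        for i, h_i in enumerate(H):
            weight = sum(E[i])
            if weight == 0:
                new_H.append(h_i[:])  # isolated node keeps its state
                continue
            # Edge-weighted mean of the neighbours' current states.
            msg = [sum(E[i][j] * H[j][d] for j in range(len(H))) / weight
                   for d in range(len(h_i))]
            # Fixed 50/50 mix; a real model would learn this update.
            new_H.append([0.5 * h + 0.5 * m for h, m in zip(h_i, msg)])
        H = new_H
    return H


V = [[1.0], [3.0]]            # two nodes with one feature each
E = [[0.0, 1.0], [1.0, 0.0]]  # a single undirected edge

print(message_passing(V, E, steps=1))  # [[2.0], [2.0]]
```

With k rounds, a node's state can depend on nodes up to k edges away, which is why running this on a small proxy graph is cheaper than running it on the full, ever-expanding map.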

Page 36:

Our work: Evolving Graphical Planner

A differentiable graphical planner: Proxy graphs

Un-pool

$G_t = (V_t, E_t)$, $\tilde{G}_t = (\tilde{V}_t, \tilde{E}_t)$

Pooling matrix $A_t$: transpose as the un-pool matrix
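A dimension check helps here. Assuming, as in the pooling equations, that $A_t$ has shape N x M and pooling applies $A_t^{\top}$ (M x N) to the node features, the transposed operator is $A_t$ itself, which maps M proxy rows back to N original rows. The sketch below illustrates that reading with made-up numbers; the function names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def unpool(A, V_proxy):
    """Map proxy features (M x d) back to the N original nodes: A V_proxy."""
    return matmul(A, V_proxy)


A = [[1.0, 0.0],
     [0.5, 0.5],
     [0.0, 1.0]]          # N = 3 original nodes, M = 2 proxy nodes
V_proxy = [[2.0, 0.0],
           [0.0, 4.0]]    # planned features on the proxy graph

V_back = unpool(A, V_proxy)
print(V_back)  # [[2.0, 0.0], [1.0, 2.0], [0.0, 4.0]]
```

Node 1, which was shared equally between the two proxy nodes when pooling, receives an equal mix of both proxy features on the way back.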

Page 37:

Our work: Evolving Graphical Planner

A differentiable graphical planner: Proxy graphs

$G_t = (V_t, E_t)$, $\tilde{G}_t = (\tilde{V}_t, \tilde{E}_t)$

Un-pool

Propose next action

Page 38:

Our work: Evolving Graphical Planner

A differentiable graphical planner: Proxy graphs (multi-channel)

$G_t = (V_t, E_t)$

$\{\tilde{G}_t^k = (\tilde{V}_t^k, \tilde{E}_t^k)\},\ k = 1, \ldots, K$

Un-pool

Propose next action

Page 39:

Our work: Evolving Graphical Planner

A differentiable graphical planner: how to supervise the imitation learner?

Expert trajectories are provided

Self-Monitoring Navigation Agent for Vision-and-Language Navigation, Ma et al., ICLR’19

Speaker-Follower Models for Vision-and-Language Navigation, Fried & Hu et al., NeurIPS’18

The Regretful Agent: Heuristic-Aided Navigation through Progress Estimation, Ma et al., ICCV’19

Page 40:

Our work: Evolving Graphical Planner

A differentiable graphical planner: how to supervise the imitation learner?

How to use expert trajectory supervision?

Page 41:

Our work: Evolving Graphical Planner

A differentiable graphical planner: how to supervise the imitation learner?

Option 1: “teacher forcing”

Expert trajectory dataset: $D = \{(a_1, a_2, \ldots, a_{T_i})_i\}$

Diagram labels: $a_1$, $a_2$, $a_3$

Page 42:

Our work: Evolving Graphical Planner

A differentiable graphical planner: how to supervise the imitation learner?

Option 1: “teacher forcing”

$P(a_1, a_2, \ldots, a_T \mid s) = P(a_1 \mid s) \prod_{t=2}^{T} P(a_t \mid a_1, a_2, \ldots, a_{t-1}, s)$

Diagram labels: $a_1$, $a_2$, $a_3$
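The factorization above turns directly into a training objective: the negative log-likelihood of the expert's action sequence is a sum of per-step log-probabilities, each computed with the expert's own earlier actions as history. The sketch below assumes the policy's per-step distributions are already available as dictionaries; the action names and probabilities are invented for the example.

```python
import math


def teacher_forcing_nll(step_probs, expert_actions):
    """Negative log-likelihood of an expert action sequence.

    step_probs[t] is the policy's distribution at step t, computed with
    the expert's previous actions as history (that is the "forcing").
    """
    return -sum(math.log(probs[a])
                for probs, a in zip(step_probs, expert_actions))


# Hypothetical two-step episode.
step_probs = [{"left": 0.5, "right": 0.5},
              {"forward": 0.25, "stop": 0.75}]
nll = teacher_forcing_nll(step_probs, ["right", "stop"])

# exp(-nll) recovers the product P(a_1 | s) * P(a_2 | a_1, s) = 0.5 * 0.75.
print(round(math.exp(-nll), 6))
```

Because the history always comes from the expert, the learner never sees the states its own mistakes would lead to, which is the drifting issue raised on the next slide.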

Page 43:

Our work: Evolving Graphical Planner

A differentiable graphical planner: how to supervise the imitation learner?

Option 1: “teacher forcing” – drifting issue in unseen data

Page 44:

Our work: Evolving Graphical Planner

A differentiable graphical planner: how to supervise the imitation learner?

Option 2: “student forcing”

Diagram labels: $a_1^*$, $a_2^*$, $a_3^*$, $a_4^*$

Page 45:

Our work: Evolving Graphical Planner

A differentiable graphical planner: how to supervise the imitation learner?

Option 2: “student forcing” – generate new supervision (shortest path)

$D^* = \{(a_1^*, a_2^*, \ldots, a_{T_i}^*)_i\}$

$D \cup D^*$

Diagram labels: $a_1^*$, $a_2^*$, $a_3^*$, $a_4^*$

Page 46:

Our work: Evolving Graphical Planner

A differentiable graphical planner: how to supervise the imitation learner?

Option 2: “student forcing” – shortest path supervision mismatch

Page 47:

Our work: Evolving Graphical Planner

A differentiable graphical planner: how to supervise the imitation learner?

Option 2: “student forcing” – graph augmented supervision

Decision space

Page 48: Evolving Graphical Planner: Contextual Global Planning for Vision …zhiweid/pdfs/Princeton... · 2021. 4. 23. · Vision-and-Language Navigation Task Unseen environment Evolving

Our work: Evolving Graphical Planner

Facing the end of the bed, take an immediate right and exit the bedroom through the open doorway. Walk straight until you see a large red painting. At the painting make a turn towards and go through the doorway on the right of the painting…

A differentiable graphical planner: how to supervise the imitation learner?

Self-Monitoring Navigation Agent for Vision-and-Language Navigation, Ma et al., ICLR’19
Speaker-Follower Models for Vision-and-Language Navigation, Fried & Hu et al., NeurIPS’18
The Regretful Agent: Heuristic-Aided Navigation through Progress Estimation, Ma et al., CVPR’19

Option 2: “student forcing” – graph augmented supervision

[Figure: graph with nodes labeled $a_1^*, a_2^*, a_3^*, a_4^*, a_5^*$]

$D^* = \{(a_1^*, a_2^*, \ldots, a_{T_i}^*)_i\}$

$D \cup D^*$

Page 49: Evolving Graphical Planner: Contextual Global Planning for Vision …zhiweid/pdfs/Princeton... · 2021. 4. 23. · Vision-and-Language Navigation Task Unseen environment Evolving

Our work: Evolving Graphical Planner

Facing the end of the bed, take an immediate right and exit the bedroom through the open doorway. Walk straight until you see a large red painting. At the painting make a turn towards and go through the doorway on the right of the painting…

A differentiable graphical planner: how to supervise the imitation learner?

Self-Monitoring Navigation Agent for Vision-and-Language Navigation, Ma et al., ICLR’19
Speaker-Follower Models for Vision-and-Language Navigation, Fried & Hu et al., NeurIPS’18
The Regretful Agent: Heuristic-Aided Navigation through Progress Estimation, Ma et al., CVPR’19

Option 2: “student forcing” – graph augmented supervision

o Ground truth always exists

o No mismatch problem

o No need to access the ENV
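
A minimal sketch of how graph-augmented supervision could be computed. It is an illustration under stated assumptions, not the authors' implementation. It assumes the label at each step is the node in the agent's current decision space that is closest to the goal, and that a table of distances to the goal (`dist_to_goal`, hypothetical) is available offline. The names `graph_augmented_target` and `relabel_rollout` are also hypothetical.

```python
from typing import Dict, List, Sequence


def graph_augmented_target(decision_space: Sequence[str],
                           dist_to_goal: Dict[str, float]) -> str:
    """Return the candidate node closest to the goal (assumed labelling rule)."""
    if not decision_space:
        raise ValueError("decision space is empty")
    return min(decision_space, key=lambda node: dist_to_goal[node])


def relabel_rollout(decision_spaces: List[Sequence[str]],
                    dist_to_goal: Dict[str, float]) -> List[str]:
    """Label every step of a student rollout with its target a_t*."""
    return [graph_augmented_target(space, dist_to_goal)
            for space in decision_spaces]


# Hypothetical distances (in meters) from each node to the goal.
dist_to_goal = {"a": 6.0, "b": 4.0, "c": 2.5, "goal": 0.0}

# Decision space seen by the student at each step of one rollout.
rollout = [["a", "b"], ["b", "c"], ["b", "goal"]]

d_star = [relabel_rollout(rollout, dist_to_goal)]  # relabelled set D*
d = [["b", "c", "goal"]]                           # original demonstrations D
d_aug = d + d_star                                 # training set D ∪ D*
```

Because the target is always chosen from nodes the agent already holds in its graph, a label exists at every step as long as the decision space is non-empty.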

Page 50: Evolving Graphical Planner: Contextual Global Planning for Vision …zhiweid/pdfs/Princeton... · 2021. 4. 23. · Vision-and-Language Navigation Task Unseen environment Evolving

Our work: Evolving Graphical Planner

A differentiable graphical planner: full training process

Instructions

Observations (visual + angle)

Graphical memory

Multi-channel planner

Action

Loss
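
The slide names only a "Loss" at the end of the pipeline. The sketch below assumes a standard cross-entropy loss between the planner's scores over the decision space and the index of the supervised node, which is a common choice for imitation learning and not confirmed by the slides. The function name `cross_entropy` and the example scores are hypothetical.

```python
import math
from typing import Sequence


def cross_entropy(scores: Sequence[float], target_index: int) -> float:
    """Negative log-softmax of the target entry, computed stably."""
    peak = max(scores)
    log_partition = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return log_partition - scores[target_index]


# Hypothetical planner scores for three candidate nodes; node 0 is the target.
scores = [2.0, 0.5, -1.0]
loss = cross_entropy(scores, target_index=0)
```

The loss is small when the planner already scores the target node highest and grows as probability mass shifts to other nodes.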

Page 51: Evolving Graphical Planner: Contextual Global Planning for Vision …zhiweid/pdfs/Princeton... · 2021. 4. 23. · Vision-and-Language Navigation Task Unseen environment Evolving

Our work: Evolving Graphical Planner

A differentiable graphical planner: test inference matches the training

Instructions

Observations (visual + angle)

Graphical memory

Multi-channel planner

Action
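
A minimal sketch of a graphical memory that grows during navigation and exposes a global decision space. It assumes the memory stores visited nodes plus every navigable node observed so far, and that any unvisited node in the graph is a valid action. The class `GraphMemory` and its methods are hypothetical names, not the authors' code.

```python
from typing import Dict, Iterable, List, Set


class GraphMemory:
    """Evolving graph of visited and observed-but-unvisited nodes."""

    def __init__(self, start: str) -> None:
        self.current = start
        self.visited: Set[str] = {start}
        self.edges: Dict[str, Set[str]] = {start: set()}

    def expand(self, navigable: Iterable[str]) -> None:
        """Add nodes observed from the current position to the graph."""
        for node in navigable:
            self.edges.setdefault(node, set())
            self.edges[self.current].add(node)
            self.edges[node].add(self.current)

    def decision_space(self) -> List[str]:
        """All unvisited nodes in the graph, not only current neighbors."""
        return sorted(n for n in self.edges if n not in self.visited)

    def move_to(self, node: str) -> None:
        if node not in self.edges:
            raise KeyError(f"unknown node: {node}")
        self.visited.add(node)
        self.current = node


memory = GraphMemory("S")
memory.expand(["a", "b"])   # two nodes observed from the start
memory.move_to("a")
memory.expand(["c"])        # one new node observed from "a"
candidates = memory.decision_space()
```

After the move, node "b" stays in the decision space even though it is not adjacent to the current position, which is what allows the agent to back-track in a single decision.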

Page 52: Evolving Graphical Planner: Contextual Global Planning for Vision …zhiweid/pdfs/Princeton... · 2021. 4. 23. · Vision-and-Language Navigation Task Unseen environment Evolving

Experiments

• Room-to-Room (R2R): all trajectories are shortest paths; emphasis on goal reaching


Page 53: Evolving Graphical Planner: Contextual Global Planning for Vision …zhiweid/pdfs/Princeton... · 2021. 4. 23. · Vision-and-Language Navigation Task Unseen environment Evolving

Contribution of each component

[Figure: Does global decision space help? Success rate (%) for Top-K = 3, Top-K = 5, Top-K = 10, and Top-K = All]

• Room-to-Room (R2R): all trajectories are shortest paths; emphasis on goal reaching

The global decision space, the planner, and the new supervision strategy each improve navigation success rate


[Figure: Proxy graph choices. Success rate (%) for message passing steps = 0, message passing steps = 3, and the multi-channel planner]

[Figure: Supervision strategy for imitation learner. Success rate (%) for shortest path vs. graph-augmented (ours)]

Page 54: Evolving Graphical Planner: Contextual Global Planning for Vision …zhiweid/pdfs/Princeton... · 2021. 4. 23. · Vision-and-Language Navigation Task Unseen environment Evolving

Compare to existing backbones

• Room-to-Room (R2R): all trajectories are shortest paths; emphasis on goal reaching

We outperform previous backbone architectures


Page 55: Evolving Graphical Planner: Contextual Global Planning for Vision …zhiweid/pdfs/Princeton... · 2021. 4. 23. · Vision-and-Language Navigation Task Unseen environment Evolving

Room-for-Room with pure imitation learning

• Room-for-Room (R4R): measured by Coverage weighted by Length Score (CLS), normalized Dynamic Time Warping (nDTW), and Success weighted by normalized Dynamic Time Warping (SDTW); emphasis on path following
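
A minimal sketch of the two dynamic-time-warping metrics named above, using the commonly stated definitions: normalized DTW is exp(-DTW(R, Q) / (|R| · d_th)) for reference path R and agent path Q, and SDTW is that value multiplied by a success indicator. Two simplifications are assumptions of this sketch: points are 2D coordinates compared with Euclidean distance (evaluations on navigation graphs typically use shortest-path distances instead), and the threshold d_th defaults to 3 meters. Function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def dtw(reference: List[Point], query: List[Point]) -> float:
    """Classic dynamic time warping cost between two paths."""
    n, m = len(reference), len(query)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = math.dist(reference[i - 1], query[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],
                                    cost[i][j - 1],
                                    cost[i - 1][j - 1])
    return cost[n][m]


def ndtw(reference: List[Point], query: List[Point],
         threshold: float = 3.0) -> float:
    """Normalized DTW in (0, 1]; 1 means the paths align exactly."""
    return math.exp(-dtw(reference, query) / (len(reference) * threshold))


def sdtw(reference: List[Point], query: List[Point],
         threshold: float = 3.0) -> float:
    """nDTW gated by success (agent stops within the threshold of the goal)."""
    success = math.dist(reference[-1], query[-1]) < threshold
    return ndtw(reference, query, threshold) if success else 0.0


reference_path = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
wrong_path = [(0.0, 0.0), (0.0, 5.0), (0.0, 10.0)]
```

Unlike success rate alone, both scores drop when the agent reaches the goal by a route that deviates from the instructed path, which is why they suit a path-following benchmark.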


Page 56: Evolving Graphical Planner: Contextual Global Planning for Vision …zhiweid/pdfs/Princeton... · 2021. 4. 23. · Vision-and-Language Navigation Task Unseen environment Evolving

Room-for-Room with pure imitation learning

• Room-for-Room (R4R): measured by Coverage weighted by Length Score (CLS), normalized Dynamic Time Warping (nDTW), and Success weighted by normalized Dynamic Time Warping (SDTW)

We achieve state-of-the-art results using pure imitation learning


Page 57: Evolving Graphical Planner: Contextual Global Planning for Vision …zhiweid/pdfs/Princeton... · 2021. 4. 23. · Vision-and-Language Navigation Task Unseen environment Evolving

Contributions

o A differentiable graphical planner that extends the decision space globally

o A new supervision strategy for training imitation-learning agents in navigation

o Proxy graphs that improve the efficiency of planning
