Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a...

78
Jacob Andreas Berkeley Microsoft Semantic Machines MIT Linguistic scaolds for policy learning

Transcript of Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a...

Page 1: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

Jacob Andreas

Berkeley → Microsoft Semantic Machines → MIT

Linguistic scaffolds for policy learning

Page 2: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

Linguistic scaffolds for policy learning

Work on language!

Jacob Andreas

Berkeley → Microsoft Semantic Machines → MIT

Page 3: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

What RL can do for language

What language can do for RL

Under review as a conference paper at ICLR 2017

A TASKS AND SKETCHES

The complete list of tasks, sketches, and symbols is given below. Tasks marked with an asterisk⇤ areheld out for the generalization experiments described in Section 4.4, but included in the multitasktraining experiments in Sections 4.2 and 4.3.

Goal Sketch

Maze environmentgoal1 left leftgoal2 left downgoal3 right downgoal4 up leftgoal5 up rightgoal6 up right upgoal7 down right upgoal8 left left downgoal9 right down downgoal10 left up right

Crafting environmentmake plank get wood use toolshedmake stick get wood use workbenchmake cloth get grass use factorymake rope get grass use toolshedmake bridge get iron get wood use factorymake bed⇤ get wood use toolshed get grass use workbenchmake axe⇤ get wood use workbench get iron use toolshedmake shears get wood use workbench get iron use workbenchget gold get iron get wood use factory use bridgeget gem get wood use workbench get iron use toolshed use axe

12

replacethelastletteroftheworddropheadchangethefinallettertotiaddazifthelastcharacterisaeveryvowelbecomesychangeonlythefirstconsonanttofirst&last3lettersdeleteeveryvowelreplaceallnswithc

Page 4: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

What RL can do for language

DanielFried

RonghangHu

VolkanCirik

w/ Anja Rohrbach, L.P. Morency, Taylor Berg-Kirkpatrick, Trevor Darrell and Dan Klein

Page 5: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

Generation & understanding

[Anderson et al. 18]

Turn right and walk through the kitchen. Go right into the living room and stop by the rug.

Page 6: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

A reference game

[Frank & Goodman 12]

Page 7: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

“glasses"

[Frank & Goodman 12]

Page 8: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

“glasses"

[Frank & Goodman 12]

Page 9: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

“glasses"

[Frank & Goodman 12]

Page 10: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

“glasses"

[Frank & Goodman 12]

Page 11: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

The rational speech acts model

[Frank & Goodman 12]

L0( . | glasses)

L0( . | hat)

1/2 1/20 1

Page 12: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

The rational speech acts model

L0( . | glasses)

L0( . | hat)

1/2 1/20 1

S1( glasses | . ) ∝ L0( . | glasses) 1 1/3S1( hat | . ) 0 2/3

[Frank & Goodman 12]

Page 13: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

The rational speech acts model

3/4 1/40 1

S1( glasses | . ) ∝ L0( . | glasses) 1 1/3S1( hat | . ) 0 2/3

[Frank & Goodman 12]

L1( . | glasses ) ∝ S1( glasses | . )

L1( . | hat )

Page 14: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

Pragmatics

Q: Do you know what time it is?

Page 15: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

Q: Do you know what time it is?

A: Yes

Pragmatics

Page 16: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

Pragmatics

Q: Do you know what time it is?

A: Yes

I find his cooking very interesting.

[Grice 70]

Page 17: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

RSA game tree

hat

glasses

speaker

Page 18: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

RSA game tree: as speaker

hat

glasses

hat

glasses

-1

+1

-1

+1

speaker (listener)

Page 19: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

RSA game tree: as speaker

hat

glasses

hat

glasses

-1

+1

-1

+1

speaker (listener)

Page 20: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

RSA game tree: as speaker

hat

glasses

hat

glasses

-1

+1

-1

+1

speaker (listener)

Page 21: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

RSA game tree: as listener

glasses glasses

?

?

(speaker) listener

?

Page 22: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

glasses glasses

?

?

(speaker) listener

?

RSA game tree: as listener

Language use is gameplay!

Page 23: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

A recipe for pragmatic text generation

smiley plainglassesman

glasses hat &glasses

glasses

1. Train a base listener model

Page 24: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

A recipe for pragmatic text generation

2. Train a reasoning speaker to win when playing with the listener

1. Train a base listener model

hat

glasses

hat

glasses

-1

+1

-1

+1

Page 25: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

Application: image captioning

1. Train an image retrieval / gen model

a snake is slithering away from Jenny

Page 26: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

Application: image captioning

2. Describe images using the listener model for search at inference time

a snake is slithering away

+1

the sun is in the sky

-1

-1

-1[A & Klein 16, Vedantam et al. 17]

Page 27: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

Application: image captioning

2. Describe images using the listener model as a training-time reward (“self-play”)

the sun is in the sky

-1

captioner model retrieval loss

[Yu et al. 16, Mao et al. 16]

Page 28: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

Descriptive captions [Vedantam et al. 17]

Context-aware Captions from Context-agnostic Supervision

Ramakrishna Vedantam1 Samy Bengio2 Kevin Murphy2 Devi Parikh3 Gal Chechik2

1Virginia Tech 3Georgia Institute of Technology [email protected] [email protected] 2{bengio,kpmurphy,gal}@google.com

Abstract

We introduce an inference technique to produce discrim-inative context-aware image captions (captions that de-scribe differences between images or visual concepts) usingonly generic context-agnostic training data (captions thatdescribe a concept or an image in isolation). For exam-ple, given images and captions of “siamese cat” and “tigercat”, we generate language that describes the “siamesecat” in a way that distinguishes it from “tiger cat”. Ourkey novelty is that we show how to do joint inference overa language model that is context-agnostic and a listenerwhich distinguishes closely-related concepts. We first ap-ply our technique to a justification task, namely to describewhy an image contains a particular fine-grained categoryas opposed to another closely-related category of the CUB-200-2011 dataset. We then study discriminative image cap-tioning to generate language that uniquely refers to oneof two semantically-similar images in the COCO dataset.Evaluations with discriminative ground truth for justifica-tion and human studies for discriminative image captioningreveal that our approach outperforms baseline generativeand speaker-listener approaches for discrimination.

1. IntroductionLanguage is the primary modality for communicating,

and representing knowledge. To convey relevant informa-tion, we often use language in a way that takes into accountcontext. For example, instead of describing a situation in a“literal” way, one might pragmatically emphasize selectedaspects in order to be persuasive, impactful or effective.Consider the target image at the bottom left in Fig. 1. Aliteral description “An airplane is flying in the sky” conveysthe semantics of the image, but would be inadequate if thegoal was to disambiguate this image from the distractor im-age (bottom right). For this purpose, a more pragmatic de-scription would be, “A large passenger jet flying through ablue sky”. This description is aware of context, namely, thatthe distractor image also has an airplane. People use suchpragmatic considerations continuously, and effortlessly in

Figure 1: An illustration of two tasks requiring pragmatic reasoning ex-plored in this paper. 1) justification: Given an image of a bird, a target(ground-truth) class (green), and a distractor class (red), describe the targetimage to explain why it belongs to the target class, and not the distractorclass. The distractor class images are only shown for illustration, and arenot provided to the algorithm. 2) discriminative image captioning: Giventwo similar images, produce a sentence to identify a target image (green)from the distractor image (red). Our introspective speaker model improvesover a context-free speaker.

teaching, conversation and discussions.In this vein, it is desirable to endow machines with prag-

matic reasoning. One approach would be to collect trainingdata of language used in context, for example, discrimina-tive ground truth utterances from people describing imagesin context of other images, or justifications explaining whyan image contains a target class as opposed to a distractorclass (Fig. 1). Unfortunately, collecting such data has a pro-hibitive cost, since the space of objects in possible contextsis often too large. Furthermore, in some cases the contextin which we wish to be pragmatic may be unknown apri-ori. For example, a free-form conversation agent may haveto respond in a context-aware or discriminative fashion de-pending upon the history of a conversation. Such scenariosalso arise in human-robot interaction, as in the case where,a robot may need to reason about which spoon a person isasking for. Thus, in this paper, we focus on deriving prag-matic (context-aware) behavior given access only to generic(context-agnostic) ground truth.

1

arX

iv:1

701.

0287

0v3

[cs.C

V]

31 Ju

l 201

7

seq2seq captioner: this bird has a yellow breast with a short pointy bill

pragmatic captioner: a small yellow bird with black stripes on its body and black stripe on the wings.

Page 29: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

Contrastive captions without contrastive data!

(a) (b) (c)

(b vs. a) mike is holding a baseball bat

(b vs. c) the snake is slithering away from mike and jenny

Figure 6: Descriptions of the same image in different contexts.When the target scene (b) is contrasted with the left (a), thesystem describes a bat; when the target scene is contrasted withthe right (c), the system describes a snake.

descriptions from the model.

5 Conclusion

We have presented an approach for learning to gen-erate pragmatic descriptions about general referents,even without training data collected in a pragmaticcontext. Our approach is built from a pair of sim-ple neural base models, a listener and a speaker, anda high-level model that reasons about their outputsin order to produce pragmatic descriptions. In anevaluation on a standard referring expression game,our model’s descriptions produced correct behaviorin human listeners significantly more often than ex-isting baselines.

It is generally true of existing derived approachesto pragmatics that much of the system’s behavior re-quires hand-engineering, and generally true of di-rect approaches (and neural networks in particular)that training is only possible when supervision isavailable for the precise target task. By synthesiz-ing these two approaches, we address both prob-lems, obtaining pragmatic behavior without domainknowledge and without targeted training data. Webelieve that this general strategy of using reasoningto obtain novel contextual behavior from neural de-coding models might be more broadly applied.

References

Anne H. Anderson, Miles Bader, Ellen Gurman Bard,Elizabeth Boyle, Gwyneth Doherty, Simon Garrod,

hard development pairs respectively. In addition to perform-ing slightly worse than our reasoning model, this alternativeapproach relies on the structure of scene representations andcannot be applied to more general pragmatics tasks.

Stephen Isard, Jacqueline Kowtko, Jan McAllister, JimMiller, et al. 1991. The HCRC map task corpus. Lan-

guage and speech, 34(4):351–366.Luciana Benotti and David Traum. 2009. A computa-

tional account of comparative implicatures for a spo-ken dialogue agent. In Proceedings of the Eighth In-

ternational Conference on Computational Semantics,pages 4–17. Proceedings of the Annual Meeting of theAssociation for Computational Linguistics.

Jeffrey Donahue, Lisa Anne Hendricks, Sergio Guadar-rama, Marcus Rohrbach, Subhashini Venugopalan,Kate Saenko, and Trevor Darrell. 2015. Long-termrecurrent convolutional networks for visual recogni-tion and description. In Proceedings of the Conference

on Computer Vision and Pattern Recognition, pages2625–2634.

Nicholas FitzGerald, Yoav Artzi, and Luke Zettlemoyer.2013. Learning distributions over logical forms forreferring expression generation. In Proceedings of

the Conference on Empirical Methods in Natural Lan-

guage Processing.Michael C Frank, Noah D Goodman, Peter Lai, and

Joshua B Tenenbaum. 2009. Informative communi-cation in word production and word learning. In Pro-

ceedings of the 31st annual conference of the cognitive

science society, pages 1228–1233.Albert Gatt, Ielka Van Der Sluis, and Kees Van Deemter.

2007. Evaluating algorithms for the generation of re-ferring expressions using a balanced corpus. In Pro-

ceedings of the Eleventh European Workshop on Nat-

ural Language Generation, pages 49–56. Proceedingsof the Annual Meeting of the Association for Compu-tational Linguistics.

Dave Golland, Percy Liang, and Dan Klein. 2010. Agame-theoretic approach to generating spatial descrip-tions. In Proceedings of the 2010 conference on

Empirical Methods in Natural Language Processing,pages 410–419. Association for Computational Lin-guistics.

Ian Goodfellow, Jonathon Shlens, and Christian Szegedy.2014. Explaining and harnessing adversarial exam-ples. arXiv preprint arXiv:1412.6572.

Herbert P Grice. 1970. Logic and conversation.Daniel Jurafsky, Rebecca Bates, Noah Coccaro, Rachel

Martin, Marie Meteer, Klaus Ries, Elizabeth Shriberg,Audreas Stolcke, Paul Taylor, Van Ess-Dykema, et al.1997. Automatic detection of discourse structure forspeech recognition and understanding. In IEEE Work-

shop on Automatic Speech Recognition and Under-

standing, pages 88–95. IEEE.Sahar Kazemzadeh, Vicente Ordonez, Mark Matten, and

Tamara L Berg. 2014. Referitgame: Referring to ob-jects in photographs of natural scenes. In Proceedings

Mike is holding a baseball bat.

The snake is slithering away from Mike & Jenny. [A & Klein 16]

Page 30: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

2. Train an instruction generation model to get the follower to goal states

1. Train a base instruction following model

Application: instruction generation

Page 31: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

Application: instruction generation

[Fried, Hu, Cirik et al. 18]

speaker-listener: Walk past the dining room table and chairs and take a right into the living room. Stop once you are on the rug. 

seq2seq: Walk past the dining room table and chairs and wait there. 

human: Turn right and walk through the kitchen. Go right into the living room and stop by the rug.

Page 32: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

Listener mode

[Fried, Hu, Cirik et al. 18]

human: Go through the door on the right and continue straight. Stop in the next room in front of the bed.instruction:

Go through the door on the right and continue straight. Stop in the next room in front of the bed.

(a) orange: trajectory without pragmatic inference

(b) green: trajectory with pragmatic inference

top-down overview of trajectories

seq-to-seq speaker-listener

Page 33: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

The rules of the game

glasses glasses

+1

Page 34: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

The rules of the game

hat hat

+1

Page 35: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

Killer robots [Lewis et al. 17]

Bob: i can i i everything else . . . . . . . . . . . . . .

Alice: balls have zero to me to me to me to me to me to me to me to me to

Bob: you i everything else . . . . . . . . . . . . . .

Alice: balls have a ball to me to me to me to me to me to me to me

Page 36: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

Killer robots [Lewis et al. 17]

Bob: i can i i everything else . . . . . . . . . . . . . .

Alice: balls have zero to me to me to me to me to me to me to me to me to

Bob: you i everything else . . . . . . . . . . . . . .

Alice: balls have a ball to me to me to me to me to me to me to me

Page 37: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

Problems to work on

How do we use tools like self-play and treesearch while remaining within the rules of natural language?

How do we do efficient search in string- valued action spaces?

Page 38: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

Problems to work on

How do we use tools like self-play and treesearch while remaining within the rules of natural language?

How do we do efficient search in string- valued action spaces?

Page 39: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

What language can do for RL

w/ Dan Klein and Sergey Levine

Page 40: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

A crafting game

makeplanks makesticks

Page 41: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

Learning with sketches

usesaw

getwood

useaxe

getwood

Page 42: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

The options framework

[Su$onetal.99]

Page 43: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

Unsupervised option learning

[Bacon&Precup16]

+r

Page 44: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

Learning with intermediate rewards

[Kearns&Singh02,Kulkarnietal.16]

+r +r

Page 45: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

Segmenting demonstrations

Ï

[Stolle&Precup02,Fox&Krishnanetal.16]

Page 46: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

Learning from sketches

Ï

getwood usesaw

[A,Klein&Levine17]

Page 47: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

Modular policies

getwood usesaw

getwood useaxe

π1 π2

π1 π3

Page 48: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

Modular policies

getwood usesaw

getwood useaxe

π1 π2

π1 π3

Page 49: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

Modular policies

π1 getwood

TURNLEFT

Page 50: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

Results: crafting game

Page 51: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

Results: crafting game

x106episodes1 2 30

RewardUnsupervised

Sketches:modular

Sketches:joint

Page 52: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

Results: locomotion

Page 53: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

Results: locomotion

x108Tmesteps1 2 30

Reward

Unsupervised

Sketches:modular

Sketches:joint

Page 54: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

Generalization

WhatifIdon’tgetasketchattestTme?

???

Page 55: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

Generalization

0

25

50

75

100

Training AdaptaTon

47

8976

42Sketches

Unsupervised

WhatifIdon’tgetasketchattestTme?

Page 56: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

Moral

A little bit of (structured) language goes a long way!

Page 57: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

Beyond structured sketches

itch itctch

Language learning Learning from demonstrations

emboldens dogtrot loneliness

emboldecs dogtrot locelicessfirst & last 3 letters

vein ???

[A,Klein&Levine17]

Page 58: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

Beyond structured sketches

itch itctch

Language learning Learning from demonstrations

emboldens dogtrot loneliness

emboldecs dogtrot locelicessfirst & last 3 letters

vein ???

Page 59: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

f( · ; �, )

Pretraining via language learning

wonderful wonful

first & last 3 letters

[Branavan et al., 09]

Page 60: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

L(f( · ; �, ), · )

Concept learning

emboldens vein loneliness

emboldecs veic locelicess

Page 61: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

Concept learning

every vowel becomes i

L(f( · ; �, ), · )emboldens vein loneliness

emboldecs veic locelicess

Page 62: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

Concept learning

128.6

L(f( · ; �, ), · )emboldens vein loneliness

emboldecs veic locelicess

every vowel becomes i

Page 63: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

Concept learning

change consonants to c 52.3

every vowel becomes i 128.6

L(f( · ; �, ), · )emboldens vein loneliness

emboldecs veic locelicess

Page 64: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

Concept learning

change consonants to c 52.3

every vowel becomes i 128.6

replace n with c 8.3

L(f( · ; �, ), · )emboldens vein loneliness

emboldecs veic locelicess

Page 65: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

Prediction

replace n with c

L(f( · ; �, ), · )

Page 66: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

Evaluation

replace n with c

loonies L(f( · ; �, ), · )

Page 67: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

f( · ; �, )

Evaluation

loonies loocies

replace n with c

Page 68: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

As multitask learning

wonderful glabrous itch

wonful glaous itctch

Pretraining data Training data

emboldens vein loneliness

emboldecs veic locelicess

veinveicitchitctch

first & last 3 letters replace n with c

argmin⌘

L(f( | ; ⌘, ))<latexit sha1_base64="6DTSKcLHA7PT7ua1kd9L70AxyAc=">AAACQXicbZBBSxwxFMcz1ra62nbbHr0EF2EtZZmRQgu9SNtDDxUVXBV2luVN9s0aTDJj8qawDOvH8NN4K+136EfwJh71YmacQ6s+SPjx/+clef8kV9JRGP4N5p7MP332fGGxtbT84uWr9us3+y4rrMC+yFRmDxNwqKTBPklSeJhbBJ0oPEiOv1b+wU+0TmZmj6Y5DjVMjEylAPLSqB22YrATLc0oRgL+o5t245OTAsY81tJvNX/mlfmen56ur4/anbAX1sUfQtRAhzW1M2pfx+NMFBoNCQXODaIwp2EJlqRQOGvFhcMcxDFMcODRgEY3LOvJZnzNK2OeZtYvQ7xW/+0oQTs31Yk/qYGO3H2vEh/zBgWln4alNHlBaMTdQ2mhOGW8iomPpUVBauoBhJX+r1wcgQVBPsxW/A39LBa3/L3bOVqgzL4rmyRnNcQV+bCi+9E8hP2NXhT2ot0Pnc0vTWwLbIWtsi6L2Ee2yb6zHdZngp2xc/ab/Ql+BRfBZXB1d3QuaHresv8quLkFFy6v0w==</latexit><latexit sha1_base64="6DTSKcLHA7PT7ua1kd9L70AxyAc=">AAACQXicbZBBSxwxFMcz1ra62nbbHr0EF2EtZZmRQgu9SNtDDxUVXBV2luVN9s0aTDJj8qawDOvH8NN4K+136EfwJh71YmacQ6s+SPjx/+clef8kV9JRGP4N5p7MP332fGGxtbT84uWr9us3+y4rrMC+yFRmDxNwqKTBPklSeJhbBJ0oPEiOv1b+wU+0TmZmj6Y5DjVMjEylAPLSqB22YrATLc0oRgL+o5t245OTAsY81tJvNX/mlfmen56ur4/anbAX1sUfQtRAhzW1M2pfx+NMFBoNCQXODaIwp2EJlqRQOGvFhcMcxDFMcODRgEY3LOvJZnzNK2OeZtYvQ7xW/+0oQTs31Yk/qYGO3H2vEh/zBgWln4alNHlBaMTdQ2mhOGW8iomPpUVBauoBhJX+r1wcgQVBPsxW/A39LBa3/L3bOVqgzL4rmyRnNcQV+bCi+9E8hP2NXhT2ot0Pnc0vTWwLbIWtsi6L2Ee2yb6zHdZngp2xc/ab/Ql+BRfBZXB1d3QuaHresv8quLkFFy6v0w==</latexit><latexit sha1_base64="6DTSKcLHA7PT7ua1kd9L70AxyAc=">AAACQXicbZBBSxwxFMcz1ra62nbbHr0EF2EtZZmRQgu9SNtDDxUVXBV2luVN9s0aTDJj8qawDOvH8NN4K+136EfwJh71YmacQ6s+SPjx/+clef8kV9JRGP4N5p7MP332fGGxtbT84uWr9us3+y4rrMC+yFRmDxNwqKTBPklSeJhbBJ0oPEiOv1b+wU+0TmZmj6Y5DjVMjEylAPLSqB22YrATLc0oRgL+o5t245OTAsY81tJvNX/mlfmen56ur4/anbAX1sUfQtRAhzW1M2pfx+NMFBoNCQXODaIwp2EJlqRQOGvFhcMcxDFMcODRgEY3LOvJZnzNK2OeZtYvQ7xW/+0oQTs31Yk/qYGO3H2vEh/zBgWln4alNHlBaMTdQ2mhOGW8iomPpUVBauoBhJX+r1wcgQVBPsxW/A39LBa3/L3bOVqgzL4rmyRnNcQV+bCi+9E8hP2NXhT2ot0Pnc0vTWwLbIWtsi6L2Ee2yb6zHdZngp2xc/ab/Ql+BRfBZXB1d3QuaHresv8quLkFFy6v0w==</latexit><latexit sha1_base64="6DTSKcLHA7PT7ua1kd9L70AxyAc=">AAACQXicbZBBSxwxFMcz1ra62nbbHr0EF2EtZZmRQgu9SNtDDxUVXBV2luVN9s0aTDJj8qawDOvH8NN4K+136EfwJh71YmacQ6s+SPjx/+clef8kV9JRGP4N5p7MP332fGGxtbT84uWr9us3+y4rrMC+yFRmDxNwqKTBPklSeJhbBJ0oPEiOv1b+wU+0TmZmj6Y5DjVMjEylAPLSqB22YrATLc0oRgL+o5t245OTAsY81tJvNX/mlfmen56ur4/anbAX1sUfQtRAhzW1M2pfx+NMFBoNCQXODaIwp2EJlqRQOGvFhcMcxDFMcODRgEY3LOvJZnzNK2OeZtYvQ7xW/+0oQTs31Yk/qYGO3H2vEh/zBgWln4alNHlBaMTdQ2mhOGW8iomPpUVBauoBhJX+r1wcgQVBPsxW/A39LBa3/L3bOVqgzL4rmyRnNcQV+bCi+9E8hP2NXhT2ot0Pnc0vTWwLbIWtsi6L2Ee2yb6zHdZngp2xc/ab/Ql+BRfBZXB1d3QuaHresv8quLkFFy6v0w==</latexit>

argmin⌘

L(f( | ; ⌘, ))<latexit sha1_base64="6DTSKcLHA7PT7ua1kd9L70AxyAc=">AAACQXicbZBBSxwxFMcz1ra62nbbHr0EF2EtZZmRQgu9SNtDDxUVXBV2luVN9s0aTDJj8qawDOvH8NN4K+136EfwJh71YmacQ6s+SPjx/+clef8kV9JRGP4N5p7MP332fGGxtbT84uWr9us3+y4rrMC+yFRmDxNwqKTBPklSeJhbBJ0oPEiOv1b+wU+0TmZmj6Y5DjVMjEylAPLSqB22YrATLc0oRgL+o5t245OTAsY81tJvNX/mlfmen56ur4/anbAX1sUfQtRAhzW1M2pfx+NMFBoNCQXODaIwp2EJlqRQOGvFhcMcxDFMcODRgEY3LOvJZnzNK2OeZtYvQ7xW/+0oQTs31Yk/qYGO3H2vEh/zBgWln4alNHlBaMTdQ2mhOGW8iomPpUVBauoBhJX+r1wcgQVBPsxW/A39LBa3/L3bOVqgzL4rmyRnNcQV+bCi+9E8hP2NXhT2ot0Pnc0vTWwLbIWtsi6L2Ee2yb6zHdZngp2xc/ab/Ql+BRfBZXB1d3QuaHresv8quLkFFy6v0w==</latexit><latexit sha1_base64="6DTSKcLHA7PT7ua1kd9L70AxyAc=">AAACQXicbZBBSxwxFMcz1ra62nbbHr0EF2EtZZmRQgu9SNtDDxUVXBV2luVN9s0aTDJj8qawDOvH8NN4K+136EfwJh71YmacQ6s+SPjx/+clef8kV9JRGP4N5p7MP332fGGxtbT84uWr9us3+y4rrMC+yFRmDxNwqKTBPklSeJhbBJ0oPEiOv1b+wU+0TmZmj6Y5DjVMjEylAPLSqB22YrATLc0oRgL+o5t245OTAsY81tJvNX/mlfmen56ur4/anbAX1sUfQtRAhzW1M2pfx+NMFBoNCQXODaIwp2EJlqRQOGvFhcMcxDFMcODRgEY3LOvJZnzNK2OeZtYvQ7xW/+0oQTs31Yk/qYGO3H2vEh/zBgWln4alNHlBaMTdQ2mhOGW8iomPpUVBauoBhJX+r1wcgQVBPsxW/A39LBa3/L3bOVqgzL4rmyRnNcQV+bCi+9E8hP2NXhT2ot0Pnc0vTWwLbIWtsi6L2Ee2yb6zHdZngp2xc/ab/Ql+BRfBZXB1d3QuaHresv8quLkFFy6v0w==</latexit><latexit sha1_base64="6DTSKcLHA7PT7ua1kd9L70AxyAc=">AAACQXicbZBBSxwxFMcz1ra62nbbHr0EF2EtZZmRQgu9SNtDDxUVXBV2luVN9s0aTDJj8qawDOvH8NN4K+136EfwJh71YmacQ6s+SPjx/+clef8kV9JRGP4N5p7MP332fGGxtbT84uWr9us3+y4rrMC+yFRmDxNwqKTBPklSeJhbBJ0oPEiOv1b+wU+0TmZmj6Y5DjVMjEylAPLSqB22YrATLc0oRgL+o5t245OTAsY81tJvNX/mlfmen56ur4/anbAX1sUfQtRAhzW1M2pfx+NMFBoNCQXODaIwp2EJlqRQOGvFhcMcxDFMcODRgEY3LOvJZnzNK2OeZtYvQ7xW/+0oQTs31Yk/qYGO3H2vEh/zBgWln4alNHlBaMTdQ2mhOGW8iomPpUVBauoBhJX+r1wcgQVBPsxW/A39LBa3/L3bOVqgzL4rmyRnNcQV+bCi+9E8hP2NXhT2ot0Pnc0vTWwLbIWtsi6L2Ee2yb6zHdZngp2xc/ab/Ql+BRfBZXB1d3QuaHresv8quLkFFy6v0w==</latexit><latexit sha1_base64="6DTSKcLHA7PT7ua1kd9L70AxyAc=">AAACQXicbZBBSxwxFMcz1ra62nbbHr0EF2EtZZmRQgu9SNtDDxUVXBV2luVN9s0aTDJj8qawDOvH8NN4K+136EfwJh71YmacQ6s+SPjx/+clef8kV9JRGP4N5p7MP332fGGxtbT84uWr9us3+y4rrMC+yFRmDxNwqKTBPklSeJhbBJ0oPEiOv1b+wU+0TmZmj6Y5DjVMjEylAPLSqB22YrATLc0oRgL+o5t245OTAsY81tJvNX/mlfmen56ur4/anbAX1sUfQtRAhzW1M2pfx+NMFBoNCQXODaIwp2EJlqRQOGvFhcMcxDFMcODRgEY3LOvJZnzNK2OeZtYvQ7xW/+0oQTs31Yk/qYGO3H2vEh/zBgWln4alNHlBaMTdQ2mhOGW8iomPpUVBauoBhJX+r1wcgQVBPsxW/A39LBa3/L3bOVqgzL4rmyRnNcQV+bCi+9E8hP2NXhT2ot0Pnc0vTWwLbIWtsi6L2Ee2yb6zHdZngp2xc/ab/Ql+BRfBZXB1d3QuaHresv8quLkFFy6v0w==</latexit>

???

[Caruana 97]

Page 69: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

As inverse reinforcement learning

vein

replace n with c???

{<latexit sha1_base64="H0oI12VVFcZFlVoYZOQ3lWhXltU=">AAACu3icbVFda9RAFJ3ErxqtbvXRl6FLYatlSUpBQYTiB/igWMFtCzsh3ExutmMnk3RmIiwxf6P/rf/FByfZKHbrhRnOnHPPnbl30koKY8PwyvNv3b5z997G/eDBw81Hj0dbT45NWWuOM17KUp+mYFAKhTMrrMTTSiMUqcST9Pxdp5/8QG1Eqb7ZZYVxAQslcsHBOioZXe4w0ItCqIShBfppkk/YxUUNGWWFcFuPX9NO3HMnd9jdDf54KKN/3ZVwDrBnadp8aJOGWagpM6KgTmnnfd2O+jmUpGs144D1zTQas5Y1yWgcTsM+6E0QDWBMhjhKtrxNlpW8LlBZLsGYeRRWNm5AW8EltgGrDVbAz2GBcwcVFGjipr+zpTuOyWheareUpT37r6OBwphlkbrMrkezrnXk/7R5bfNXcSNUVVtUfHVRXktqS9r9Bs2ERm7l0gHgWri3Un4GGrh1fxaw9+h60fjZ1f1SoQZb6ufNMPK2B6xD156zaskNMFof101wvD+Nwmn09WB8+HYY5QZ5RrbJhETkJTkkH8kRmRFOfnnb3gtvz3/jc/+7L1epvjd4npJr4de/Ad1k1zY=</latexit><latexit sha1_base64="H0oI12VVFcZFlVoYZOQ3lWhXltU=">AAACu3icbVFda9RAFJ3ErxqtbvXRl6FLYatlSUpBQYTiB/igWMFtCzsh3ExutmMnk3RmIiwxf6P/rf/FByfZKHbrhRnOnHPPnbl30koKY8PwyvNv3b5z997G/eDBw81Hj0dbT45NWWuOM17KUp+mYFAKhTMrrMTTSiMUqcST9Pxdp5/8QG1Eqb7ZZYVxAQslcsHBOioZXe4w0ItCqIShBfppkk/YxUUNGWWFcFuPX9NO3HMnd9jdDf54KKN/3ZVwDrBnadp8aJOGWagpM6KgTmnnfd2O+jmUpGs144D1zTQas5Y1yWgcTsM+6E0QDWBMhjhKtrxNlpW8LlBZLsGYeRRWNm5AW8EltgGrDVbAz2GBcwcVFGjipr+zpTuOyWheareUpT37r6OBwphlkbrMrkezrnXk/7R5bfNXcSNUVVtUfHVRXktqS9r9Bs2ERm7l0gHgWri3Un4GGrh1fxaw9+h60fjZ1f1SoQZb6ufNMPK2B6xD156zaskNMFof101wvD+Nwmn09WB8+HYY5QZ5RrbJhETkJTkkH8kRmRFOfnnb3gtvz3/jc/+7L1epvjd4npJr4de/Ad1k1zY=</latexit><latexit sha1_base64="H0oI12VVFcZFlVoYZOQ3lWhXltU=">AAACu3icbVFda9RAFJ3ErxqtbvXRl6FLYatlSUpBQYTiB/igWMFtCzsh3ExutmMnk3RmIiwxf6P/rf/FByfZKHbrhRnOnHPPnbl30koKY8PwyvNv3b5z997G/eDBw81Hj0dbT45NWWuOM17KUp+mYFAKhTMrrMTTSiMUqcST9Pxdp5/8QG1Eqb7ZZYVxAQslcsHBOioZXe4w0ItCqIShBfppkk/YxUUNGWWFcFuPX9NO3HMnd9jdDf54KKN/3ZVwDrBnadp8aJOGWagpM6KgTmnnfd2O+jmUpGs144D1zTQas5Y1yWgcTsM+6E0QDWBMhjhKtrxNlpW8LlBZLsGYeRRWNm5AW8EltgGrDVbAz2GBcwcVFGjipr+zpTuOyWheareUpT37r6OBwphlkbrMrkezrnXk/7R5bfNXcSNUVVtUfHVRXktqS9r9Bs2ERm7l0gHgWri3Un4GGrh1fxaw9+h60fjZ1f1SoQZb6ufNMPK2B6xD156zaskNMFof101wvD+Nwmn09WB8+HYY5QZ5RrbJhETkJTkkH8kRmRFOfnnb3gtvz3/jc/+7L1epvjd4npJr4de/Ad1k1zY=</latexit><latexit sha1_base64="38u3Msd8cLInPlViofgp/9PDR90=">AAACFXicbZDNSgMxFIUz9a+OrbZrN8EiiIsy40aXgi7ciBXsD3SGksnctqGZZEgyQhn6Am59BZ/GnbjybcxMu7CtFwIf5yY395wo5Uwbz/txKju7e/sH1UP3qObWj08atZ6WmaLQpZJLNYiIBs4EdA0zHAapApJEHPrR7K7o919BaSbFi5mnECZkItiYUWKs1Bk1Wl7bKwtvg7+CFlrVqOnUg1jSLAFhKCdaD30vNWFOlGGUw8INMg0poTMygaFFQRLQYV7uucDnVonxWCp7hMGl+vdFThKt50lkbybETPVmrxD/6w0zM74JcybSzICgy4/GGcdG4sI0jpkCavjcAqGK2V0xnRJFqLHRuME9WC8KHu3cpxQUMVJd5gFRk4SJRQlBQWvrLC3Z/PzNtLahd9X2vbb/7KEqOkVn6AL56BrdogfUQV1EUYze0Lvz4Xw6X8ucK84q8CZaK+f7F+W1ogc=</latexit><latexit sha1_base64="907aLh1HkgsKeqMR7wtaRLVIO5o=">AAACsHicbVFNb9QwEHXCVwktbLlysVpV2kK1SriAhJCQAIkDiCKxbaV1FE2cydatnaS2g7Qy+Rv8N/4LB5xsQHTLSLae3/PMeJ7zRgpj4/hnEN66fefuva370YPtnYePJrvbJ6ZuNcc5r2Wtz3IwKEWFcyusxLNGI6hc4ml++bbXT7+hNqKuvtpVg6mCZSVKwcF6Kpv8OGCgl0pUGUML9OO0nLKrqxYKypTw24Bf0V488id/ODyM/uRQRv9mN8JngD3Pc/e+yxyz0FJmhKJe6RZD3Z76PpakGzXTiA3DOI1Fx1w22Y9n8RD0JkhGsE/GOM52gx1W1LxVWFkuwZhFEjc2daCt4BK7iLUGG+CXsMSFhxUoNKkbenb0wDMFLWvtV2XpwP6b4UAZs1K5v9nPaDa1nvyftmht+TJ1ompaixVfNypbSW1N+9+ghdDIrVx5AFwL/1bKz0EDt/7PIvYO/SwaP/m6nxvUYGv91I2WdwNgPbr2nPVI3sBk066b4OT5LIlnyZeYbJEnZI9MSUJekDfkAzkmc8LJr2AveBYcha9DHl6srQ6D0fPH5FqE8jeKt9YP</latexit><latexit sha1_base64="907aLh1HkgsKeqMR7wtaRLVIO5o=">AAACsHicbVFNb9QwEHXCVwktbLlysVpV2kK1SriAhJCQAIkDiCKxbaV1FE2cydatnaS2g7Qy+Rv8N/4LB5xsQHTLSLae3/PMeJ7zRgpj4/hnEN66fefuva370YPtnYePJrvbJ6ZuNcc5r2Wtz3IwKEWFcyusxLNGI6hc4ml++bbXT7+hNqKuvtpVg6mCZSVKwcF6Kpv8OGCgl0pUGUML9OO0nLKrqxYKypTw24Bf0V488id/ODyM/uRQRv9mN8JngD3Pc/e+yxyz0FJmhKJe6RZD3Z76PpakGzXTiA3DOI1Fx1w22Y9n8RD0JkhGsE/GOM52gx1W1LxVWFkuwZhFEjc2daCt4BK7iLUGG+CXsMSFhxUoNKkbenb0wDMFLWvtV2XpwP6b4UAZs1K5v9nPaDa1nvyftmht+TJ1ompaixVfNypbSW1N+9+ghdDIrVx5AFwL/1bKz0EDt/7PIvYO/SwaP/m6nxvUYGv91I2WdwNgPbr2nPVI3sBk066b4OT5LIlnyZeYbJEnZI9MSUJekDfkAzkmc8LJr2AveBYcha9DHl6srQ6D0fPH5FqE8jeKt9YP</latexit><latexit sha1_base64="K8qKatu3reyGM+zdZ8LxlK8+MbY=">AAACu3icbVFNb9QwEHXCVwmUbuHIxeqq0haqVcIFJIRUlSJxAFEktq20jqKJM9ma2klqO0irNH+j/43/wgEnGxDdMpKt5/fmjT3jtJLC2DD86fl37t67/2DjYfDo8eaTrdH20xNT1prjjJey1GcpGJSiwJkVVuJZpRFUKvE0vXjf6ac/UBtRFt/sssJYwaIQueBgHZWMrncZ6IUSRcLQAv00ySfs8rKGjDIl3Nbjt7QT993JHfb2gj8eyuhfdyWcA+x5mjYf2qRhFmrKjFDUKe28r9tRV0NJulYzDljfTKMxa1mTjMbhNOyD3gbRAMZkiONk29tkWclrhYXlEoyZR2Fl4wa0FVxiG7DaYAX8AhY4d7AAhSZu+jtbuuuYjOaldquwtGf/dTSgjFmq1GV2PZp1rSP/p81rm7+JG1FUtcWCry7Ka0ltSbvfoJnQyK1cOgBcC/dWys9BA7fuzwJ2hK4XjZ9d3S8VarClftEMI297wDp04zmrltwAo/Vx3QYnr6ZROI2+huODw2GUG+Q52SETEpHX5IB8JMdkRjj55e14L719/53P/e++XKX63uB5Rm6EX/8G3CTXMg==</latexit><latexit sha1_base64="H0oI12VVFcZFlVoYZOQ3lWhXltU=">AAACu3icbVFda9RAFJ3ErxqtbvXRl6FLYatlSUpBQYTiB/igWMFtCzsh3ExutmMnk3RmIiwxf6P/rf/FByfZKHbrhRnOnHPPnbl30koKY8PwyvNv3b5z997G/eDBw81Hj0dbT45NWWuOM17KUp+mYFAKhTMrrMTTSiMUqcST9Pxdp5/8QG1Eqb7ZZYVxAQslcsHBOioZXe4w0ItCqIShBfppkk/YxUUNGWWFcFuPX9NO3HMnd9jdDf54KKN/3ZVwDrBnadp8aJOGWagpM6KgTmnnfd2O+jmUpGs144D1zTQas5Y1yWgcTsM+6E0QDWBMhjhKtrxNlpW8LlBZLsGYeRRWNm5AW8EltgGrDVbAz2GBcwcVFGjipr+zpTuOyWheareUpT37r6OBwphlkbrMrkezrnXk/7R5bfNXcSNUVVtUfHVRXktqS9r9Bs2ERm7l0gHgWri3Un4GGrh1fxaw9+h60fjZ1f1SoQZb6ufNMPK2B6xD156zaskNMFof101wvD+Nwmn09WB8+HYY5QZ5RrbJhETkJTkkH8kRmRFOfnnb3gtvz3/jc/+7L1epvjd4npJr4de/Ad1k1zY=</latexit><latexit sha1_base64="H0oI12VVFcZFlVoYZOQ3lWhXltU=">AAACu3icbVFda9RAFJ3ErxqtbvXRl6FLYatlSUpBQYTiB/igWMFtCzsh3ExutmMnk3RmIiwxf6P/rf/FByfZKHbrhRnOnHPPnbl30koKY8PwyvNv3b5z997G/eDBw81Hj0dbT45NWWuOM17KUp+mYFAKhTMrrMTTSiMUqcST9Pxdp5/8QG1Eqb7ZZYVxAQslcsHBOioZXe4w0ItCqIShBfppkk/YxUUNGWWFcFuPX9NO3HMnd9jdDf54KKN/3ZVwDrBnadp8aJOGWagpM6KgTmnnfd2O+jmUpGs144D1zTQas5Y1yWgcTsM+6E0QDWBMhjhKtrxNlpW8LlBZLsGYeRRWNm5AW8EltgGrDVbAz2GBcwcVFGjipr+zpTuOyWheareUpT37r6OBwphlkbrMrkezrnXk/7R5bfNXcSNUVVtUfHVRXktqS9r9Bs2ERm7l0gHgWri3Un4GGrh1fxaw9+h60fjZ1f1SoQZb6ufNMPK2B6xD156zaskNMFof101wvD+Nwmn09WB8+HYY5QZ5RrbJhETkJTkkH8kRmRFOfnnb3gtvz3/jc/+7L1epvjd4npJr4de/Ad1k1zY=</latexit><latexit sha1_base64="H0oI12VVFcZFlVoYZOQ3lWhXltU=">AAACu3icbVFda9RAFJ3ErxqtbvXRl6FLYatlSUpBQYTiB/igWMFtCzsh3ExutmMnk3RmIiwxf6P/rf/FByfZKHbrhRnOnHPPnbl30koKY8PwyvNv3b5z997G/eDBw81Hj0dbT45NWWuOM17KUp+mYFAKhTMrrMTTSiMUqcST9Pxdp5/8QG1Eqb7ZZYVxAQslcsHBOioZXe4w0ItCqIShBfppkk/YxUUNGWWFcFuPX9NO3HMnd9jdDf54KKN/3ZVwDrBnadp8aJOGWagpM6KgTmnnfd2O+jmUpGs144D1zTQas5Y1yWgcTsM+6E0QDWBMhjhKtrxNlpW8LlBZLsGYeRRWNm5AW8EltgGrDVbAz2GBcwcVFGjipr+zpTuOyWheareUpT37r6OBwphlkbrMrkezrnXk/7R5bfNXcSNUVVtUfHVRXktqS9r9Bs2ERm7l0gHgWri3Un4GGrh1fxaw9+h60fjZ1f1SoQZb6ufNMPK2B6xD156zaskNMFof101wvD+Nwmn09WB8+HYY5QZ5RrbJhETkJTkkH8kRmRFOfnnb3gtvz3/jc/+7L1epvjd4npJr4de/Ad1k1zY=</latexit><latexit sha1_base64="H0oI12VVFcZFlVoYZOQ3lWhXltU=">AAACu3icbVFda9RAFJ3ErxqtbvXRl6FLYatlSUpBQYTiB/igWMFtCzsh3ExutmMnk3RmIiwxf6P/rf/FByfZKHbrhRnOnHPPnbl30koKY8PwyvNv3b5z997G/eDBw81Hj0dbT45NWWuOM17KUp+mYFAKhTMrrMTTSiMUqcST9Pxdp5/8QG1Eqb7ZZYVxAQslcsHBOioZXe4w0ItCqIShBfppkk/YxUUNGWWFcFuPX9NO3HMnd9jdDf54KKN/3ZVwDrBnadp8aJOGWagpM6KgTmnnfd2O+jmUpGs144D1zTQas5Y1yWgcTsM+6E0QDWBMhjhKtrxNlpW8LlBZLsGYeRRWNm5AW8EltgGrDVbAz2GBcwcVFGjipr+zpTuOyWheareUpT37r6OBwphlkbrMrkezrnXk/7R5bfNXcSNUVVtUfHVRXktqS9r9Bs2ERm7l0gHgWri3Un4GGrh1fxaw9+h60fjZ1f1SoQZb6ufNMPK2B6xD156zaskNMFof101wvD+Nwmn09WB8+HYY5QZ5RrbJhETkJTkkH8kRmRFOfnnb3gtvz3/jc/+7L1epvjd4npJr4de/Ad1k1zY=</latexit><latexit sha1_base64="H0oI12VVFcZFlVoYZOQ3lWhXltU=">AAACu3icbVFda9RAFJ3ErxqtbvXRl6FLYatlSUpBQYTiB/igWMFtCzsh3ExutmMnk3RmIiwxf6P/rf/FByfZKHbrhRnOnHPPnbl30koKY8PwyvNv3b5z997G/eDBw81Hj0dbT45NWWuOM17KUp+mYFAKhTMrrMTTSiMUqcST9Pxdp5/8QG1Eqb7ZZYVxAQslcsHBOioZXe4w0ItCqIShBfppkk/YxUUNGWWFcFuPX9NO3HMnd9jdDf54KKN/3ZVwDrBnadp8aJOGWagpM6KgTmnnfd2O+jmUpGs144D1zTQas5Y1yWgcTsM+6E0QDWBMhjhKtrxNlpW8LlBZLsGYeRRWNm5AW8EltgGrDVbAz2GBcwcVFGjipr+zpTuOyWheareUpT37r6OBwphlkbrMrkezrnXk/7R5bfNXcSNUVVtUfHVRXktqS9r9Bs2ERm7l0gHgWri3Un4GGrh1fxaw9+h60fjZ1f1SoQZb6ufNMPK2B6xD156zaskNMFof101wvD+Nwmn09WB8+HYY5QZ5RrbJhETkJTkkH8kRmRFOfnnb3gtvz3/jc/+7L1epvjd4npJr4de/Ad1k1zY=</latexit><latexit sha1_base64="H0oI12VVFcZFlVoYZOQ3lWhXltU=">AAACu3icbVFda9RAFJ3ErxqtbvXRl6FLYatlSUpBQYTiB/igWMFtCzsh3ExutmMnk3RmIiwxf6P/rf/FByfZKHbrhRnOnHPPnbl30koKY8PwyvNv3b5z997G/eDBw81Hj0dbT45NWWuOM17KUp+mYFAKhTMrrMTTSiMUqcST9Pxdp5/8QG1Eqb7ZZYVxAQslcsHBOioZXe4w0ItCqIShBfppkk/YxUUNGWWFcFuPX9NO3HMnd9jdDf54KKN/3ZVwDrBnadp8aJOGWagpM6KgTmnnfd2O+jmUpGs144D1zTQas5Y1yWgcTsM+6E0QDWBMhjhKtrxNlpW8LlBZLsGYeRRWNm5AW8EltgGrDVbAz2GBcwcVFGjipr+zpTuOyWheareUpT37r6OBwphlkbrMrkezrnXk/7R5bfNXcSNUVVtUfHVRXktqS9r9Bs2ERm7l0gHgWri3Un4GGrh1fxaw9+h60fjZ1f1SoQZb6ufNMPK2B6xD156zaskNMFof101wvD+Nwmn09WB8+HYY5QZ5RrbJhETkJTkkH8kRmRFOfnnb3gtvz3/jc/+7L1epvjd4npJr4de/Ad1k1zY=</latexit>

cost function

argmin E⌧⇠⇡[L(f(⌧ | ; ⌘, ))]<latexit sha1_base64="eHQpI+OSq3iuV/rUS12oFbSDhEE=">AAADHHicnVJNi9RAEO3ErzW6OqtHL43DwKzIkIig4GXxAzysuIKzuzAdQqVTmWk2X9vdEYY2J6/6F/w13sSr4C/xaicTR3fWkwVpXr/qeql63XGVCaV9/4fjXrh46fKVravetevbN24Odm4dqrKWHKe8zEp5HIPCTBQ41UJneFxJhDzO8Cg+edbmj96hVKIs3uplhWEO80KkgoO2VDT4OWIg57koIoYa6P44HbPT0xoSynJhlw4/oW3yvt3Zze6u97uGMrquroStAL2IY/OiiQzTUFOmRE5tppl1ui31vpekG5qht9ZcgDZ/pP5La8Q6Z4zEpGHGG6WUcSE53V+3Hg2G/sTvgp4HQQ+GpI+DaMfZZknJ6xwLzTNQahb4lQ4NSC14ho3HaoUV8BOY48zCAnJUoen6aOjIMglNS2m/QtOO/bvCQK7UMo/tyXZytZlryX/lZrVOH4dGFFWtseCrH6V1RnVJ2+umiZDIdba0ALgUtlfKFyCBa/soPPYc7SwSX1nd1xVK0KW8Z3qLmg6wFp1pZzWSNTDYtOs8OHwwCfxJ8ObhcO9pb+UWuUPukjEJyCOyR16SAzIl3AmdD85H55P72f3ifnW/rY66Tl9zm5wJ9/sv7Qn7Sg==</latexit><latexit sha1_base64="eHQpI+OSq3iuV/rUS12oFbSDhEE=">AAADHHicnVJNi9RAEO3ErzW6OqtHL43DwKzIkIig4GXxAzysuIKzuzAdQqVTmWk2X9vdEYY2J6/6F/w13sSr4C/xaicTR3fWkwVpXr/qeql63XGVCaV9/4fjXrh46fKVravetevbN24Odm4dqrKWHKe8zEp5HIPCTBQ41UJneFxJhDzO8Cg+edbmj96hVKIs3uplhWEO80KkgoO2VDT4OWIg57koIoYa6P44HbPT0xoSynJhlw4/oW3yvt3Zze6u97uGMrquroStAL2IY/OiiQzTUFOmRE5tppl1ui31vpekG5qht9ZcgDZ/pP5La8Q6Z4zEpGHGG6WUcSE53V+3Hg2G/sTvgp4HQQ+GpI+DaMfZZknJ6xwLzTNQahb4lQ4NSC14ho3HaoUV8BOY48zCAnJUoen6aOjIMglNS2m/QtOO/bvCQK7UMo/tyXZytZlryX/lZrVOH4dGFFWtseCrH6V1RnVJ2+umiZDIdba0ALgUtlfKFyCBa/soPPYc7SwSX1nd1xVK0KW8Z3qLmg6wFp1pZzWSNTDYtOs8OHwwCfxJ8ObhcO9pb+UWuUPukjEJyCOyR16SAzIl3AmdD85H55P72f3ifnW/rY66Tl9zm5wJ9/sv7Qn7Sg==</latexit><latexit sha1_base64="eHQpI+OSq3iuV/rUS12oFbSDhEE=">AAADHHicnVJNi9RAEO3ErzW6OqtHL43DwKzIkIig4GXxAzysuIKzuzAdQqVTmWk2X9vdEYY2J6/6F/w13sSr4C/xaicTR3fWkwVpXr/qeql63XGVCaV9/4fjXrh46fKVravetevbN24Odm4dqrKWHKe8zEp5HIPCTBQ41UJneFxJhDzO8Cg+edbmj96hVKIs3uplhWEO80KkgoO2VDT4OWIg57koIoYa6P44HbPT0xoSynJhlw4/oW3yvt3Zze6u97uGMrquroStAL2IY/OiiQzTUFOmRE5tppl1ui31vpekG5qht9ZcgDZ/pP5La8Q6Z4zEpGHGG6WUcSE53V+3Hg2G/sTvgp4HQQ+GpI+DaMfZZknJ6xwLzTNQahb4lQ4NSC14ho3HaoUV8BOY48zCAnJUoen6aOjIMglNS2m/QtOO/bvCQK7UMo/tyXZytZlryX/lZrVOH4dGFFWtseCrH6V1RnVJ2+umiZDIdba0ALgUtlfKFyCBa/soPPYc7SwSX1nd1xVK0KW8Z3qLmg6wFp1pZzWSNTDYtOs8OHwwCfxJ8ObhcO9pb+UWuUPukjEJyCOyR16SAzIl3AmdD85H55P72f3ifnW/rY66Tl9zm5wJ9/sv7Qn7Sg==</latexit><latexit sha1_base64="eHQpI+OSq3iuV/rUS12oFbSDhEE=">AAADHHicnVJNi9RAEO3ErzW6OqtHL43DwKzIkIig4GXxAzysuIKzuzAdQqVTmWk2X9vdEYY2J6/6F/w13sSr4C/xaicTR3fWkwVpXr/qeql63XGVCaV9/4fjXrh46fKVravetevbN24Odm4dqrKWHKe8zEp5HIPCTBQ41UJneFxJhDzO8Cg+edbmj96hVKIs3uplhWEO80KkgoO2VDT4OWIg57koIoYa6P44HbPT0xoSynJhlw4/oW3yvt3Zze6u97uGMrquroStAL2IY/OiiQzTUFOmRE5tppl1ui31vpekG5qht9ZcgDZ/pP5La8Q6Z4zEpGHGG6WUcSE53V+3Hg2G/sTvgp4HQQ+GpI+DaMfZZknJ6xwLzTNQahb4lQ4NSC14ho3HaoUV8BOY48zCAnJUoen6aOjIMglNS2m/QtOO/bvCQK7UMo/tyXZytZlryX/lZrVOH4dGFFWtseCrH6V1RnVJ2+umiZDIdba0ALgUtlfKFyCBa/soPPYc7SwSX1nd1xVK0KW8Z3qLmg6wFp1pZzWSNTDYtOs8OHwwCfxJ8ObhcO9pb+UWuUPukjEJyCOyR16SAzIl3AmdD85H55P72f3ifnW/rY66Tl9zm5wJ9/sv7Qn7Sg==</latexit>

Page 70: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

As a language game…

replace n with c

speaker model listener loss

f � L<latexit sha1_base64="LWz4dIFHTSmXqy3ZapCM+I1HvOk=">AAACxnicbVFda9RAFJ3Ej9ZodauPvgwuha3okojYQl+KH9CHihXctrATws3kZjs0maTzoSwx4F/xZ/lvnGSj2K0XEs6cM+fM3DtpXQhtwvCX59+6fefuxua94P6DrYePRtuPT3VlFccZr4pKnaegsRASZ0aYAs9rhVCmBZ6ll+86/ewrKi0q+cUsa4xLWEiRCw7GUcno5w4DtSiFTBgaoMeTfMKurixklJXC/Xp8QDvxhVu5xe5u8MdDGf3rroVzgLlI0+ZDmzTMgKVMi5I6pZ33uR31fYika5mxC+27aRRmLWuCnDIuFKfHyWgcTsO+6E0QDWBMhjpJtr0tllXcligNL0DreRTWJm5AGcELbANmNdbAL2GBcwcllKjjpj+9pTuOyWheKfdJQ3v2X0cDpdbLMnU7u3b1utaR/9Pm1uT7cSNkbQ1KvjootwU1Fe0ehmZCITfF0gHgSri7Un4BCrhxzxew9+h6UfjR5X6qUYGp1PNmmH7bA9aha9dZteQGGK2P6yY4fTWNwmn0+fX48O0wyk3ylDwjExKRPXJIjsgJmRHubXgvvTfenn/kS9/631ZbfW/wPCHXyv/xGxd+2cM=</latexit><latexit sha1_base64="LWz4dIFHTSmXqy3ZapCM+I1HvOk=">AAACxnicbVFda9RAFJ3Ej9ZodauPvgwuha3okojYQl+KH9CHihXctrATws3kZjs0maTzoSwx4F/xZ/lvnGSj2K0XEs6cM+fM3DtpXQhtwvCX59+6fefuxua94P6DrYePRtuPT3VlFccZr4pKnaegsRASZ0aYAs9rhVCmBZ6ll+86/ewrKi0q+cUsa4xLWEiRCw7GUcno5w4DtSiFTBgaoMeTfMKurixklJXC/Xp8QDvxhVu5xe5u8MdDGf3rroVzgLlI0+ZDmzTMgKVMi5I6pZ33uR31fYika5mxC+27aRRmLWuCnDIuFKfHyWgcTsO+6E0QDWBMhjpJtr0tllXcligNL0DreRTWJm5AGcELbANmNdbAL2GBcwcllKjjpj+9pTuOyWheKfdJQ3v2X0cDpdbLMnU7u3b1utaR/9Pm1uT7cSNkbQ1KvjootwU1Fe0ehmZCITfF0gHgSri7Un4BCrhxzxew9+h6UfjR5X6qUYGp1PNmmH7bA9aha9dZteQGGK2P6yY4fTWNwmn0+fX48O0wyk3ylDwjExKRPXJIjsgJmRHubXgvvTfenn/kS9/631ZbfW/wPCHXyv/xGxd+2cM=</latexit><latexit sha1_base64="LWz4dIFHTSmXqy3ZapCM+I1HvOk=">AAACxnicbVFda9RAFJ3Ej9ZodauPvgwuha3okojYQl+KH9CHihXctrATws3kZjs0maTzoSwx4F/xZ/lvnGSj2K0XEs6cM+fM3DtpXQhtwvCX59+6fefuxua94P6DrYePRtuPT3VlFccZr4pKnaegsRASZ0aYAs9rhVCmBZ6ll+86/ewrKi0q+cUsa4xLWEiRCw7GUcno5w4DtSiFTBgaoMeTfMKurixklJXC/Xp8QDvxhVu5xe5u8MdDGf3rroVzgLlI0+ZDmzTMgKVMi5I6pZ33uR31fYika5mxC+27aRRmLWuCnDIuFKfHyWgcTsO+6E0QDWBMhjpJtr0tllXcligNL0DreRTWJm5AGcELbANmNdbAL2GBcwcllKjjpj+9pTuOyWheKfdJQ3v2X0cDpdbLMnU7u3b1utaR/9Pm1uT7cSNkbQ1KvjootwU1Fe0ehmZCITfF0gHgSri7Un4BCrhxzxew9+h6UfjR5X6qUYGp1PNmmH7bA9aha9dZteQGGK2P6yY4fTWNwmn0+fX48O0wyk3ylDwjExKRPXJIjsgJmRHubXgvvTfenn/kS9/631ZbfW/wPCHXyv/xGxd+2cM=</latexit><latexit sha1_base64="38u3Msd8cLInPlViofgp/9PDR90=">AAACFXicbZDNSgMxFIUz9a+OrbZrN8EiiIsy40aXgi7ciBXsD3SGksnctqGZZEgyQhn6Am59BZ/GnbjybcxMu7CtFwIf5yY395wo5Uwbz/txKju7e/sH1UP3qObWj08atZ6WmaLQpZJLNYiIBs4EdA0zHAapApJEHPrR7K7o919BaSbFi5mnECZkItiYUWKs1Bk1Wl7bKwtvg7+CFlrVqOnUg1jSLAFhKCdaD30vNWFOlGGUw8INMg0poTMygaFFQRLQYV7uucDnVonxWCp7hMGl+vdFThKt50lkbybETPVmrxD/6w0zM74JcybSzICgy4/GGcdG4sI0jpkCavjcAqGK2V0xnRJFqLHRuME9WC8KHu3cpxQUMVJd5gFRk4SJRQlBQWvrLC3Z/PzNtLahd9X2vbb/7KEqOkVn6AL56BrdogfUQV1EUYze0Lvz4Xw6X8ucK84q8CZaK+f7F+W1ogc=</latexit><latexit sha1_base64="YgEGFlhxUQC9mG7/D4825E2/RMY=">AAACu3icbVFda9RAFJ3Ej9bY6tZXXwZLYSu6JD5YwRdBhT5UrOC2hZ0QbiY326GTSTofyhLzY/xZ/hsn2Sh264WEM+fMPTP3TN5IYWwc/wrCO3fv3d/afhA93Nl99Hiyt3Nmaqc5znkta32Rg0EpFM6tsBIvGo1Q5RLP86v3vX7+DbURtfpqVw2mFSyVKAUH66ls8vOAgV5WQmUMLdCTaTll19cOCsoq4X8Dfkt78YVf+cXhYfSnhzL6t7sRvgPsZZ63H7usZRYcZUZU1CvdYvDtqR+jJd3wTL3pME2rsehYG5WUcaE5Pckm+/EsHoreBskI9slYp9lesMuKmrsKleUSjFkkcWPTFrQVXGIXMWewAX4FS1x4qKBCk7bD6R098ExBy1r7T1k6sP92tFAZs6pyv7Mf12xqPfk/beFs+SZthWqcRcXXB5VOUlvT/mFoITRyK1ceANfC35XyS9DArX++iH1AP4vGT973c4MabK2ft2P63QBYj25cZz2SDzDZjOs2OHs1S+JZ8iUm2+QpeUamJCFH5B05JqdkTniwFbwMXgdH4XGoQreOOgzGzJ+QGxV+/w2httiR</latexit><latexit sha1_base64="YgEGFlhxUQC9mG7/D4825E2/RMY=">AAACu3icbVFda9RAFJ3Ej9bY6tZXXwZLYSu6JD5YwRdBhT5UrOC2hZ0QbiY326GTSTofyhLzY/xZ/hsn2Sh264WEM+fMPTP3TN5IYWwc/wrCO3fv3d/afhA93Nl99Hiyt3Nmaqc5znkta32Rg0EpFM6tsBIvGo1Q5RLP86v3vX7+DbURtfpqVw2mFSyVKAUH66ls8vOAgV5WQmUMLdCTaTll19cOCsoq4X8Dfkt78YVf+cXhYfSnhzL6t7sRvgPsZZ63H7usZRYcZUZU1CvdYvDtqR+jJd3wTL3pME2rsehYG5WUcaE5Pckm+/EsHoreBskI9slYp9lesMuKmrsKleUSjFkkcWPTFrQVXGIXMWewAX4FS1x4qKBCk7bD6R098ExBy1r7T1k6sP92tFAZs6pyv7Mf12xqPfk/beFs+SZthWqcRcXXB5VOUlvT/mFoITRyK1ceANfC35XyS9DArX++iH1AP4vGT973c4MabK2ft2P63QBYj25cZz2SDzDZjOs2OHs1S+JZ8iUm2+QpeUamJCFH5B05JqdkTniwFbwMXgdH4XGoQreOOgzGzJ+QGxV+/w2httiR</latexit><latexit sha1_base64="LMvrh5/kVgAFyuB7+QrReexaOGU=">AAACxnicbVFda9RAFJ3Ej9Zo7bY++jK4FLaiS+KDFXwpfkAfKlZw28JOCDeTm+3QySSdmVSWGPCv+LP8N06yUezWCwlnzplzZu6dtJLC2DD85fl37t67v7H5IHj4aOvx9mhn99SUteY446Us9XkKBqVQOLPCSjyvNEKRSjxLL993+tk1aiNK9dUuK4wLWCiRCw7WUcno5x4DvSiEShhaoMeTfMKurmrIKCuE+/X4Le3EF27lFvv7wR8PZfSvuxLOAfYiTZuPbdIwCzVlRhTUKe28z+2o70MkXcuMXWjfTaMxa1kT5JRxoTk9TkbjcBr2RW+DaABjMtRJsuNtsazkdYHKcgnGzKOwsnED2gousQ1YbbACfgkLnDuooEATN/3pLd1zTEbzUrtPWdqz/zoaKIxZFqnb2bVr1rWO/J82r23+Jm6EqmqLiq8OymtJbUm7h6GZ0MitXDoAXAt3V8ovQAO37vkC9gFdLxo/udzPFWqwpX7eDNNve8A6dOM6q5bcAKP1cd0Gp6+mUTiNvoTjw3fDKDfJU/KMTEhEDsghOSInZEa4t+G99F57B/6Rr/za/7ba6nuD5wm5Uf6P3xY+2b8=</latexit><latexit sha1_base64="LWz4dIFHTSmXqy3ZapCM+I1HvOk=">AAACxnicbVFda9RAFJ3Ej9ZodauPvgwuha3okojYQl+KH9CHihXctrATws3kZjs0maTzoSwx4F/xZ/lvnGSj2K0XEs6cM+fM3DtpXQhtwvCX59+6fefuxua94P6DrYePRtuPT3VlFccZr4pKnaegsRASZ0aYAs9rhVCmBZ6ll+86/ewrKi0q+cUsa4xLWEiRCw7GUcno5w4DtSiFTBgaoMeTfMKurixklJXC/Xp8QDvxhVu5xe5u8MdDGf3rroVzgLlI0+ZDmzTMgKVMi5I6pZ33uR31fYika5mxC+27aRRmLWuCnDIuFKfHyWgcTsO+6E0QDWBMhjpJtr0tllXcligNL0DreRTWJm5AGcELbANmNdbAL2GBcwcllKjjpj+9pTuOyWheKfdJQ3v2X0cDpdbLMnU7u3b1utaR/9Pm1uT7cSNkbQ1KvjootwU1Fe0ehmZCITfF0gHgSri7Un4BCrhxzxew9+h6UfjR5X6qUYGp1PNmmH7bA9aha9dZteQGGK2P6yY4fTWNwmn0+fX48O0wyk3ylDwjExKRPXJIjsgJmRHubXgvvTfenn/kS9/631ZbfW/wPCHXyv/xGxd+2cM=</latexit><latexit sha1_base64="LWz4dIFHTSmXqy3ZapCM+I1HvOk=">AAACxnicbVFda9RAFJ3Ej9ZodauPvgwuha3okojYQl+KH9CHihXctrATws3kZjs0maTzoSwx4F/xZ/lvnGSj2K0XEs6cM+fM3DtpXQhtwvCX59+6fefuxua94P6DrYePRtuPT3VlFccZr4pKnaegsRASZ0aYAs9rhVCmBZ6ll+86/ewrKi0q+cUsa4xLWEiRCw7GUcno5w4DtSiFTBgaoMeTfMKurixklJXC/Xp8QDvxhVu5xe5u8MdDGf3rroVzgLlI0+ZDmzTMgKVMi5I6pZ33uR31fYika5mxC+27aRRmLWuCnDIuFKfHyWgcTsO+6E0QDWBMhjpJtr0tllXcligNL0DreRTWJm5AGcELbANmNdbAL2GBcwcllKjjpj+9pTuOyWheKfdJQ3v2X0cDpdbLMnU7u3b1utaR/9Pm1uT7cSNkbQ1KvjootwU1Fe0ehmZCITfF0gHgSri7Un4BCrhxzxew9+h6UfjR5X6qUYGp1PNmmH7bA9aha9dZteQGGK2P6yY4fTWNwmn0+fX48O0wyk3ylDwjExKRPXJIjsgJmRHubXgvvTfenn/kS9/631ZbfW/wPCHXyv/xGxd+2cM=</latexit><latexit sha1_base64="LWz4dIFHTSmXqy3ZapCM+I1HvOk=">AAACxnicbVFda9RAFJ3Ej9ZodauPvgwuha3okojYQl+KH9CHihXctrATws3kZjs0maTzoSwx4F/xZ/lvnGSj2K0XEs6cM+fM3DtpXQhtwvCX59+6fefuxua94P6DrYePRtuPT3VlFccZr4pKnaegsRASZ0aYAs9rhVCmBZ6ll+86/ewrKi0q+cUsa4xLWEiRCw7GUcno5w4DtSiFTBgaoMeTfMKurixklJXC/Xp8QDvxhVu5xe5u8MdDGf3rroVzgLlI0+ZDmzTMgKVMi5I6pZ33uR31fYika5mxC+27aRRmLWuCnDIuFKfHyWgcTsO+6E0QDWBMhjpJtr0tllXcligNL0DreRTWJm5AGcELbANmNdbAL2GBcwcllKjjpj+9pTuOyWheKfdJQ3v2X0cDpdbLMnU7u3b1utaR/9Pm1uT7cSNkbQ1KvjootwU1Fe0ehmZCITfF0gHgSri7Un4BCrhxzxew9+h6UfjR5X6qUYGp1PNmmH7bA9aha9dZteQGGK2P6yY4fTWNwmn0+fX48O0wyk3ylDwjExKRPXJIjsgJmRHubXgvvTfenn/kS9/631ZbfW/wPCHXyv/xGxd+2cM=</latexit><latexit sha1_base64="LWz4dIFHTSmXqy3ZapCM+I1HvOk=">AAACxnicbVFda9RAFJ3Ej9ZodauPvgwuha3okojYQl+KH9CHihXctrATws3kZjs0maTzoSwx4F/xZ/lvnGSj2K0XEs6cM+fM3DtpXQhtwvCX59+6fefuxua94P6DrYePRtuPT3VlFccZr4pKnaegsRASZ0aYAs9rhVCmBZ6ll+86/ewrKi0q+cUsa4xLWEiRCw7GUcno5w4DtSiFTBgaoMeTfMKurixklJXC/Xp8QDvxhVu5xe5u8MdDGf3rroVzgLlI0+ZDmzTMgKVMi5I6pZ33uR31fYika5mxC+27aRRmLWuCnDIuFKfHyWgcTsO+6E0QDWBMhjpJtr0tllXcligNL0DreRTWJm5AGcELbANmNdbAL2GBcwcllKjjpj+9pTuOyWheKfdJQ3v2X0cDpdbLMnU7u3b1utaR/9Pm1uT7cSNkbQ1KvjootwU1Fe0ehmZCITfF0gHgSri7Un4BCrhxzxew9+h6UfjR5X6qUYGp1PNmmH7bA9aha9dZteQGGK2P6yY4fTWNwmn0+fX48O0wyk3ylDwjExKRPXJIjsgJmRHubXgvvTfenn/kS9/631ZbfW/wPCHXyv/xGxd+2cM=</latexit><latexit sha1_base64="LWz4dIFHTSmXqy3ZapCM+I1HvOk=">AAACxnicbVFda9RAFJ3Ej9ZodauPvgwuha3okojYQl+KH9CHihXctrATws3kZjs0maTzoSwx4F/xZ/lvnGSj2K0XEs6cM+fM3DtpXQhtwvCX59+6fefuxua94P6DrYePRtuPT3VlFccZr4pKnaegsRASZ0aYAs9rhVCmBZ6ll+86/ewrKi0q+cUsa4xLWEiRCw7GUcno5w4DtSiFTBgaoMeTfMKurixklJXC/Xp8QDvxhVu5xe5u8MdDGf3rroVzgLlI0+ZDmzTMgKVMi5I6pZ33uR31fYika5mxC+27aRRmLWuCnDIuFKfHyWgcTsO+6E0QDWBMhjpJtr0tllXcligNL0DreRTWJm5AGcELbANmNdbAL2GBcwcllKjjpj+9pTuOyWheKfdJQ3v2X0cDpdbLMnU7u3b1utaR/9Pm1uT7cSNkbQ1KvjootwU1Fe0ehmZCITfF0gHgSri7Un4BCrhxzxew9+h6UfjR5X6qUYGp1PNmmH7bA9aha9dZteQGGK2P6yY4fTWNwmn0+fX48O0wyk3ylDwjExKRPXJIjsgJmRHubXgvvTfenn/kS9/631ZbfW/wPCHXyv/xGxd+2cM=</latexit><latexit sha1_base64="LWz4dIFHTSmXqy3ZapCM+I1HvOk=">AAACxnicbVFda9RAFJ3Ej9ZodauPvgwuha3okojYQl+KH9CHihXctrATws3kZjs0maTzoSwx4F/xZ/lvnGSj2K0XEs6cM+fM3DtpXQhtwvCX59+6fefuxua94P6DrYePRtuPT3VlFccZr4pKnaegsRASZ0aYAs9rhVCmBZ6ll+86/ewrKi0q+cUsa4xLWEiRCw7GUcno5w4DtSiFTBgaoMeTfMKurixklJXC/Xp8QDvxhVu5xe5u8MdDGf3rroVzgLlI0+ZDmzTMgKVMi5I6pZ33uR31fYika5mxC+27aRRmLWuCnDIuFKfHyWgcTsO+6E0QDWBMhjpJtr0tllXcligNL0DreRTWJm5AGcELbANmNdbAL2GBcwcllKjjpj+9pTuOyWheKfdJQ3v2X0cDpdbLMnU7u3b1utaR/9Pm1uT7cSNkbQ1KvjootwU1Fe0ehmZCITfF0gHgSri7Un4BCrhxzxew9+h6UfjR5X6qUYGp1PNmmH7bA9aha9dZteQGGK2P6yY4fTWNwmn0+fX48O0wyk3ylDwjExKRPXJIjsgJmRHubXgvvTfenn/kS9/631ZbfW/wPCHXyv/xGxd+2cM=</latexit>

argmin<latexit sha1_base64="/8RfHiPqR2J1MpofaCATkWdehoU=">AAACz3icbVFdaxQxFM2MX3W0utVHX4LLwlZkmSmigg8ufoAPFVtw28JmWO5k7mxDMx9NMsoSR3z1r/iP/DdmZqdit15IODkn5+bem6SSQpsw/O35167fuHlr63Zw5+72vfuDnQdHuqwVxxkvZalOEtAoRYEzI4zEk0oh5InE4+Tsbasff0GlRVl8NqsK4xyWhcgEB+OoxeDXiIFa5qJYMDRA98fZmJ2f15BSlgu3dfgVbcWn7uQOu7vBhYcy+tddCecAc5ok9n2zsMxATZkWOXVKM+/yttS3PiXdyBm7pF03VmHaMBuMMsq4UJzuBxdvDIbhJOyCXgVRD4akj4PFjrfN0pLXORaGS9B6HoWViS0oI7jEJmC1xgr4GSxx7mABOerYdmU0dOSYlGalcqswtGP/dVjItV7libvZ9q03tZb8nzavTfYytqKoaoMFXz+U1ZKakrY/RFOhkBu5cgC4Eq5Wyk9BATfuHwP2Dl0vCj+6vJ8qVGBK9cT2I2o6wFp0qZx1S26A0ea4roKjvUkUTqLDZ8Ppm36UW+QReUzGJCIvyJR8IAdkRrg38J57r72pf+h/9b/7P9ZXfa/3PCSXwv/5B8AY3SY=</latexit><latexit sha1_base64="/8RfHiPqR2J1MpofaCATkWdehoU=">AAACz3icbVFdaxQxFM2MX3W0utVHX4LLwlZkmSmigg8ufoAPFVtw28JmWO5k7mxDMx9NMsoSR3z1r/iP/DdmZqdit15IODkn5+bem6SSQpsw/O35167fuHlr63Zw5+72vfuDnQdHuqwVxxkvZalOEtAoRYEzI4zEk0oh5InE4+Tsbasff0GlRVl8NqsK4xyWhcgEB+OoxeDXiIFa5qJYMDRA98fZmJ2f15BSlgu3dfgVbcWn7uQOu7vBhYcy+tddCecAc5ok9n2zsMxATZkWOXVKM+/yttS3PiXdyBm7pF03VmHaMBuMMsq4UJzuBxdvDIbhJOyCXgVRD4akj4PFjrfN0pLXORaGS9B6HoWViS0oI7jEJmC1xgr4GSxx7mABOerYdmU0dOSYlGalcqswtGP/dVjItV7libvZ9q03tZb8nzavTfYytqKoaoMFXz+U1ZKakrY/RFOhkBu5cgC4Eq5Wyk9BATfuHwP2Dl0vCj+6vJ8qVGBK9cT2I2o6wFp0qZx1S26A0ea4roKjvUkUTqLDZ8Ppm36UW+QReUzGJCIvyJR8IAdkRrg38J57r72pf+h/9b/7P9ZXfa/3PCSXwv/5B8AY3SY=</latexit><latexit sha1_base64="/8RfHiPqR2J1MpofaCATkWdehoU=">AAACz3icbVFdaxQxFM2MX3W0utVHX4LLwlZkmSmigg8ufoAPFVtw28JmWO5k7mxDMx9NMsoSR3z1r/iP/DdmZqdit15IODkn5+bem6SSQpsw/O35167fuHlr63Zw5+72vfuDnQdHuqwVxxkvZalOEtAoRYEzI4zEk0oh5InE4+Tsbasff0GlRVl8NqsK4xyWhcgEB+OoxeDXiIFa5qJYMDRA98fZmJ2f15BSlgu3dfgVbcWn7uQOu7vBhYcy+tddCecAc5ok9n2zsMxATZkWOXVKM+/yttS3PiXdyBm7pF03VmHaMBuMMsq4UJzuBxdvDIbhJOyCXgVRD4akj4PFjrfN0pLXORaGS9B6HoWViS0oI7jEJmC1xgr4GSxx7mABOerYdmU0dOSYlGalcqswtGP/dVjItV7libvZ9q03tZb8nzavTfYytqKoaoMFXz+U1ZKakrY/RFOhkBu5cgC4Eq5Wyk9BATfuHwP2Dl0vCj+6vJ8qVGBK9cT2I2o6wFp0qZx1S26A0ea4roKjvUkUTqLDZ8Ppm36UW+QReUzGJCIvyJR8IAdkRrg38J57r72pf+h/9b/7P9ZXfa/3PCSXwv/5B8AY3SY=</latexit><latexit sha1_base64="/8RfHiPqR2J1MpofaCATkWdehoU=">AAACz3icbVFdaxQxFM2MX3W0utVHX4LLwlZkmSmigg8ufoAPFVtw28JmWO5k7mxDMx9NMsoSR3z1r/iP/DdmZqdit15IODkn5+bem6SSQpsw/O35167fuHlr63Zw5+72vfuDnQdHuqwVxxkvZalOEtAoRYEzI4zEk0oh5InE4+Tsbasff0GlRVl8NqsK4xyWhcgEB+OoxeDXiIFa5qJYMDRA98fZmJ2f15BSlgu3dfgVbcWn7uQOu7vBhYcy+tddCecAc5ok9n2zsMxATZkWOXVKM+/yttS3PiXdyBm7pF03VmHaMBuMMsq4UJzuBxdvDIbhJOyCXgVRD4akj4PFjrfN0pLXORaGS9B6HoWViS0oI7jEJmC1xgr4GSxx7mABOerYdmU0dOSYlGalcqswtGP/dVjItV7libvZ9q03tZb8nzavTfYytqKoaoMFXz+U1ZKakrY/RFOhkBu5cgC4Eq5Wyk9BATfuHwP2Dl0vCj+6vJ8qVGBK9cT2I2o6wFp0qZx1S26A0ea4roKjvUkUTqLDZ8Ppm36UW+QReUzGJCIvyJR8IAdkRrg38J57r72pf+h/9b/7P9ZXfa/3PCSXwv/5B8AY3SY=</latexit>

???

vein veic

Page 71: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

Results: string editing accuracy

Identity Multitask Meta This Work

18

5062

76

Page 72: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

Results

change any n to a c

replace all n s with c

loocies

loocies

replace consonant - vowel pairings with n

replace pairs of letters consisting of a consonant followed by a

vowel with an n

ntnynd

ntnnd

(a)

(b)

(c)

examples true description true output

pred. description pred. output

emboldens kisses loneliness vein dogtrot

emboldecs kisses locelicess veic dogtrot

loonies

mapper concluding excuse effete contracting

npnr nncnnng exnn efnn nntncnng

betrayed

plummest bereaving eddied struggles evils

plummesti bereavinti eddieti struggleti evilti

change the last letter of the word

into t i

replace the last letter of the word with t i

mistrialti

mistrialti

mistrials

Figure 6: Example predictions for string editing.

posal model. These results are shown in Table 3.A few interesting facts stand out. Under the or-

dinary evaluation condition (with no ground-truthannotations provided), language-learning with nat-ural language data is actually better than language-learning with regular expressions. This might be be-cause the extra diversity helps the model figure outthe relevant axes of variation and avoid overfitting toindividual strings. Allowing the model to do its owninference is also better than providing ground-truthnatural language descriptions, suggesting that it isactually better at generalizing from the relevant con-cepts than our human annotators (who occasionallywrite things like I have no idea for the inferred rule).Unsurprisingly, with ground truth REs (which unlikethe human data are always correct) we can do betterthan any of the models that have to do inference.Coupling our inference procedure with an oracle REevaluator, we essentially recover the synthesis-basedapproach of Devlin et al. (2017). Our findings areconsistent with theirs: when a complete and accu-rate execution engine is available, there is no reasonnot to use it. But we can get almost 90% of the waythere with an execution model learned from scratch.Some examples of model behavior are shown in Fig-ure 6; more may be found in Appendix D.

6 Policy Search

The previous two sections examined supervised set-tings where the learning signal comes from few ex-

amples but is readily accessible. In this section, wemove to a set of reinforcement learning problems,where the learning signal is instead sparse and time-consuming to obtain. We evaluate on a collection of2-D treasure hunting tasks. These tasks require theagent to discover a rule that determines the locationof buried treasure in a large collection of environ-ments of the kind shown in Figure 7. To recoverthe treasure, the agent must navigate (while avoid-ing water) to its goal location, then perform a DIGaction. At this point the episode ends; if the treasureis located in the agent’s current position, it receivesa reward, otherwise it does not. In every task, thetreasure has consistently been buried at a fixed posi-tion relative to some landmark (like the heart in Fig-ure 7). Both the offset and the identity of the targetlandmark are unknown to the agent, and the locationlandmark itself varies across maps. Indeed, thereis nothing about the agent’s observations or actionspace to suggest that landmarks and offsets are eventhe relevant axis of variation across tasks, but thisstructure is made clear in the natural language an-notations. The high-level structure of these tasks issimilar to one used by Hermer-Vazquez et al. (2001)to study concept learning in humans.

The interaction between language and learning inthese tasks is rather different than in the supervisedsettings. In the supervised case, language servedmostly as a guard against overfitting, and could

Figure 7: Example treasure hunting task: the agent isplaced in a random environment and must collect a re-ward that has been hidden at a consistent offset with re-spect to some landmark. At language-learning time, nat-ural language instructions and expert policies are addi-tionally provided. The agent must both learn primitivenavigation skills, like avoiding water, as well as the high-level structure of the reward functions for this domain.

Page 73: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

Problems

How do we bootstrap from (unannotated)exploration of the environment alone?

How good are inferred descriptions as explanations?

Page 74: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

Problems

How do we bootstrap from (unannotated)exploration of the environment alone

How good are inferred descriptions as explanations?

Page 75: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

Conclusions

Page 76: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

Conclusions

Use RL in NLP by formulating language generation / understanding as reward maximization rather than supervised learning.

Page 77: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

Conclusions

Use language in (I)RL as a scaffold for learning options, goal representations, cost functions.

Languages encode 100k years of accumulated knowledge about which abstractions are useful—take advantage of it!

Page 78: Linguistic sca olds for policy learningpeople.eecs.berkeley.edu/~jda/slides/a_rl_neurips18.pdf · a snake is slithering away from Jenny. ... Luciana Benotti and David Traum. 2009.

Thanks!

also… Looking for NLP jobs? Ask me about Microsoft! Applying to PhD programs? Ask me about MIT!

[email protected]