Structure and Interpretation of Neural Codes · 2017. 9. 7. · Jacob Andreas

Page 1

Structure and Interpretation of Neural Codes

Jacob Andreas

Page 2

Translating Neuralese

Jacob Andreas, Anca Drăgan and Dan Klein

Page 3

Learning to Communicate

[Wagner et al. 03, Sukhbaatar et al. 16, Foerster et al. 16]

Page 4

Learning to Communicate


Page 5

Neuralese


[1.0, 2.3, -0.3, 0.4, -1.2, 1.1]

Page 6

Translating neuralese

[1.0, 2.3, -0.3, 0.4, -1.2, 1.1]

all clear


Page 7

• Interoperate with autonomous systems

• Diagnose errors

• Learn from solutions

Translating neuralese

[1.0, 2.3, -0.3, 0.4, -1.2, 1.1]

all clear

[Lazaridou et al. 16]

Page 8

Outline

Natural language & neuralese

Statistical machine translation

Semantic machine translation

Implementation details

Evaluation


Page 14

A statistical MT problem


max_a p(z | a) p(a)

[1.0, 2.3, -0.3, 0.4, -1.2, 1.1]

all clear

[e.g. Koehn 10]
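The statistical criterion on this slide can be made concrete as a noisy-channel translator: pick the candidate translation a that maximizes p(z | a) p(a) for an observed neuralese message z. The sketch below is a toy illustration, not the talk's implementation; all candidate messages and probability tables are invented.

```python
# Noisy-channel translation, mirroring the criterion max_a p(z | a) p(a):
# choose the candidate a that best explains the observed neuralese message z.
# All names and numbers are toy assumptions.

def translate_noisy_channel(z, channel, prior):
    """Return argmax over candidates a of p(z | a) * p(a)."""
    return max(prior, key=lambda a: channel[a].get(z, 0.0) * prior[a])

prior = {"all clear": 0.6, "stop": 0.4}          # p(a)
channel = {                                      # p(z | a)
    "all clear": {"z1": 0.8, "z2": 0.2},
    "stop":      {"z1": 0.1, "z2": 0.9},
}

print(translate_noisy_channel("z1", channel, prior))  # -> all clear
```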

Page 15

A statistical MT problem


How do we induce a translation model?

Page 16

A statistical MT problem


max_a p(z | a) p(a)

max_a Σ_s p(z | s) p(a | s) p(s)
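Since neuralese messages and English messages are never observed side by side, the model pivots through shared states s, scoring a pair (z, a) by marginalizing: Σ_s p(z | s) p(a | s) p(s). A hedged sketch with invented distributions:

```python
# Score candidate translation a for neuralese message z by marginalizing over
# shared states s: sum_s p(z | s) p(a | s) p(s). Toy numbers throughout.

def pivot_score(z, a, p_z_given_s, p_a_given_s, p_s):
    return sum(p_z_given_s[s].get(z, 0.0) * p_a_given_s[s].get(a, 0.0) * p_s[s]
               for s in p_s)

p_s = {"clear": 0.5, "blocked": 0.5}
p_z_given_s = {"clear": {"z1": 0.9, "z2": 0.1},
               "blocked": {"z1": 0.2, "z2": 0.8}}
p_a_given_s = {"clear": {"all clear": 0.7, "stop": 0.3},
               "blocked": {"all clear": 0.1, "stop": 0.9}}

best = max(["all clear", "stop"],
           key=lambda a: pivot_score("z1", a, p_z_given_s, p_a_given_s, p_s))
print(best)  # -> all clear
```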

Page 17

Strategy mismatch


$\zeta(s) = \frac{1}{\Gamma(s)} \int_0^\infty \frac{x^s}{e^x - 1}\,\frac{dx}{x}$

Page 18

Strategy mismatch


not sure

$\zeta(s) = \frac{1}{\Gamma(s)} \int_0^\infty \frac{x^s}{e^x - 1}\,\frac{dx}{x}$

Page 19

Strategy mismatch


not sure

dunno

Page 20

Strategy mismatch


not sure

dunno

yes

Page 21

Strategy mismatch


not sure

dunno

yes

yes

no

yes

Page 22

Strategy mismatch


not sure → yes

Σ_s p(z, s | "not sure") p("not sure")

Page 23

Stat MT criterion doesn’t capture meaning


moving (0,3) → (1,4)

"In the intersection"

Page 24

Outline

Natural language & neuralese

Statistical machine translation

Semantic machine translation

Implementation details

Evaluation


Page 25

A “semantic MT” problem


I’m going north

The meaning of an utterance is given by its truth conditions

[Davidson 67]

Page 26

A “semantic MT” problem

✔ ✔ ✘

I’m going north

The meaning of an utterance is given by its truth conditions

[Davidson 67]

Page 27

A “semantic MT” problem

(loc(goal_blue) north)

I’m going north

The meaning of an utterance is given by its truth conditions

Page 28

A “semantic MT” problem

0.4  0.2  0.001

I’m going north

The meaning of an utterance is given by its truth conditions

the distribution over states in which it is uttered

[Beltagy et al. 14]

Page 29

A “semantic MT” problem

0.4  0.2  0.001

I’m going north

The meaning of an utterance is given by its truth conditions

the distribution over states in which it is uttered

or equivalently, the belief it induces in listeners

[Frank et al. 09, A & Klein 16]

Page 30

Representing meaning


The meaning of an utterance is given by its truth conditions

the distribution over states in which it is uttered

or equivalently, the belief it induces in listeners

Page 31

Representing meaning


The meaning of an utterance is given by its truth conditions

the distribution over states in which it is uttered

or equivalently, the belief it induces in listeners

This distribution is well-defined even if the “utterance” is a vector rather than a sequence of tokens.
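The belief representation can be sketched directly: β(z) is the posterior over states in which z is uttered, p(s | z) ∝ p(z | s) p(s), and the computation is identical whether z is a token sequence or a vector. The toy speaker model below is an assumption for illustration.

```python
# beta(z): distribution over states given that message z was produced,
# computed as p(s | z) proportional to p(z | s) p(s).

def belief(z, p_z_given_s, p_s):
    unnorm = {s: p_z_given_s[s].get(z, 0.0) * p_s[s] for s in p_s}
    total = sum(unnorm.values())
    return {s: v / total for s, v in unnorm.items()}

p_s = {"north": 0.5, "south": 0.5}        # prior over states (assumed)
p_z_given_s = {"north": {"z_vec": 0.8},   # toy speaker model; "z_vec" could
               "south": {"z_vec": 0.2}}   # equally be a real-valued vector key

print(belief("z_vec", p_z_given_s, p_s))  # -> {'north': 0.8, 'south': 0.2}
```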

Page 32

Translating with meaning


[1.0, 2.3, -0.3, 0.4, -1.2, 1.1]

Page 33

Translating with meaning


[1.0, 2.3, -0.3, 0.4, -1.2, 1.1]

In the intersection

Page 34

Translating with meaning


[1.0, 2.3, -0.3, 0.4, -1.2, 1.1]

I’m going north

Page 35

Translating with meaning


[1.0, 2.3, -0.3, 0.4, -1.2, 1.1]

I’m going north

p(s | z)    p(s | a)

Page 36

Translating with meaning


[1.0, 2.3, -0.3, 0.4, -1.2, 1.1]

I’m going north

β(z)    β(a)

Page 37

Interlingua!


source text target text

β(z)    β(a)

Page 38

Translation criterion

argmin_a KL(β(z) || β(a))


Page 42

Computing representations

argmin_a KL(β(z) || β(a))

Page 43

Computing representations: sparsity

argmin_a KL(β(z) || β(a))

p(s | z)    p(s | a)

Page 44

Computing representations: smoothing

argmin_a KL(β(z) || β(a))

actions & messages

agent policy

Page 45

argmin_a KL(β(z) || β(a))

actions & messages

agent policy

agent model

Computing representations: smoothing

Page 46

argmin_a KL(β(z) || β(a))

actions & messages

human

Computing representations: smoothing

Page 47

argmin_a KL(β(z) || β(a))

actions & messages

human policy

human model

Computing representations: smoothing

Page 48

argmin_a KL(β(z) || β(a))

[figure: smoothed probability estimates for messages z and a, e.g. 0.10, 0.05, 0.13, 0.08, 0.01, 0.22]

Computing representations: smoothing

Page 49

Computing KL

argmin_a KL(β(z) || β(a))

Page 50

Computing KL

argmin_a KL(β(z) || β(a))

KL(p || q) = E_p[ log( p(x) / q(x) ) ]

Page 51

Computing KL: sampling

argmin_a KL(β(z) || β(a))

KL(p || q) = Σ_i p(x_i) log( p(x_i) / q(x_i) )
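The discrete KL computation above, as a minimal sketch:

```python
# KL divergence over a shared finite support:
# KL(p || q) = sum_i p_i * log(p_i / q_i), with 0 * log(0/q) taken as 0.
import math

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl(p, p))        # -> 0.0
print(kl(p, q) > 0.0)  # -> True
```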

Page 52

Finding translations

argmin_a KL(β(z) || β(a))

Page 53

Finding translations: brute force

argmin_a KL(β(z) || β(a))

going north: 0.5
crossing the intersection: 2.3
I'm done: 0.2
after you: 9.7
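The brute-force search can be sketched end to end: compute KL(β(z) || β(a)) for every candidate English message a and return the argmin. The candidate beliefs below are invented to mirror the slide's idea that lower scores are better.

```python
# Brute-force translation: score each candidate by KL between the belief
# induced by the neuralese message and the belief induced by the candidate,
# then take the argmin. All beliefs are toy assumptions.
import math

def kl(p, q):
    return sum(p[s] * math.log(p[s] / q[s]) for s in p if p[s] > 0.0)

beta_z = {"north": 0.9, "other": 0.1}   # belief induced by the neuralese message
candidates = {                          # beliefs induced by candidate messages
    "going north": {"north": 0.8, "other": 0.2},
    "after you":   {"north": 0.1, "other": 0.9},
}

best = min(candidates, key=lambda a: kl(beta_z, candidates[a]))
print(best)  # -> going north
```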


Page 55

Finding translations

argmin_a KL(β(z) || β(a))

Page 56

Outline

Natural language & neuralese

Statistical machine translation

Semantic machine translation

Implementation details

Evaluation

Page 57

Referring expression games


[1.0, 2.3, -0.3, 0.4, -1.2, 1.1]

orange bird with black face

Page 58

Evaluation: translator-in-the-loop


[1.0, 2.3, -0.3, 0.4, -1.2, 1.1]

orange bird with black face


Page 60

Experiment: color references


Page 61

Experiment: color references

[chart] Neuralese → Neuralese: 1.00; English → English*: 0.83 (chance: 0.50)

Page 62

[chart] Neuralese → Neuralese: 1.00; English → English*: 0.83
Statistical MT: Neuralese → English*: 0.72; English → Neuralese: 0.70
(chance: 0.50)

Experiment: color references

Page 63

[chart] Neuralese → Neuralese: 1.00; English → English*: 0.83
Statistical MT: Neuralese → English*: 0.72; English → Neuralese: 0.70
Semantic MT: Neuralese → English*: 0.86; English → Neuralese: 0.73
(chance: 0.50)

Experiment: color references

Page 64

translation), we focus on developing automatic measures of system performance. We use the available training data to develop simulated models of human decisions; by first showing that these models track well with human judgments, we can be confident that their use in evaluations will correlate with human understanding. We employ the following two metrics:

Belief evaluation This evaluation focuses on the denotational perspective in semantics that motivated the initial development of our model. We have successfully understood the semantics of a message z_r if, after translating z_r ↦ z_h, a human listener can form a correct belief about the state in which z_r was produced. We construct a simple state-guessing game where the listener is presented with a translated message and two state observations, and must guess which state the speaker was in when the message was emitted.

When translating from natural language to neuralese, we use the learned agent model to directly guess the hidden state. For neuralese to natural language we must first construct a "model human listener" to map from strings back to state representations; we do this by using the training data to fit a simple regression model that scores (state, sentence) pairs using a bag-of-words sentence representation. We find that our "model human" matches the judgments of real humans 83% of the time on the colors task, 77% of the time on the birds task, and 77% of the time on the driving task. This gives us confidence that the model human gives a reasonably accurate proxy for human interpretation.
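The "model human listener" described above scores (state, sentence) pairs with bag-of-words features. A hedged sketch of the scoring step, with invented weights (the paper fits them by regression on training data):

```python
# Score (state, sentence) pairs by summing per-(state, word) weights over the
# sentence's bag of words, then guess the higher-scoring candidate state.
# The weights below are toy values, not learned.

def bow_score(state, sentence, weights):
    return sum(weights.get((state, w), 0.0) for w in sentence.split())

def guess_state(sentence, state_a, state_b, weights):
    sa = bow_score(state_a, sentence, weights)
    sb = bow_score(state_b, sentence, weights)
    return state_a if sa >= sb else state_b

weights = {("red", "crimson"): 1.2, ("red", "rose"): 0.8,
           ("green", "olive"): 1.1}

print(guess_state("crimson rose", "red", "green", weights))  # -> red
```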

Behavior evaluation This evaluation focuses on the cooperative aspects of interpretability: we measure the extent to which learned models are able to interoperate with each other by way of a translation layer. In the case of reference games, the goal of this semantic evaluation is identical to the goal of the game itself (to identify the hidden state of the speaker), so we perform this additional pragmatic evaluation only for the driving game. We found that the most data-efficient and reliable way to make use of human game traces was to construct a "deaf" model human. The evaluation selects a full game trace from a human player, and replays both the human's actions and messages exactly (disregarding any incoming messages); the evaluation measures the quality of the natural-language-to-neuralese translator, and the extent to which the

(a) The colors task:

                 R as speaker                                     H as speaker
R as listener    1.00                                             0.50 random / 0.70 direct / 0.73 belief (ours)
H* as listener   0.50 random / 0.72 direct / 0.86 belief (ours)   0.83

(b) The birds task:

                 R as speaker                                     H as speaker
R as listener    0.95                                             0.50 random / 0.55 direct / 0.60 belief (ours)
H* as listener   0.50 random / 0.57 direct / 0.75 belief (ours)   0.77

Table 1: Evaluation results for reference games. (a) The colors task. (b) The birds task. Whether the model human is in a listener or speaker role, translation based on belief matching outperforms both random and machine translation baselines.

learned agent model can accommodate a (real) human given translations of the human's messages.

Baselines We compare our approach to two baselines: a random baseline that chooses a translation of each input uniformly from messages observed during training, and a direct baseline that directly maximizes p(z′ | z) (by analogy to a conventional machine translation system). This is accomplished by sampling from a DCP speaker in training states labeled with natural language strings.

8 Results

In all below, "R" indicates a DCP agent, "H" indicates a real human, and "H*" indicates a model human player.

Reference games Results for the two reference games are shown in Table 1. The end-to-end trained model achieves nearly perfect accuracy in both

Figure 7: Best-scoring translations generated for the color task: "magenta, hot, rose, violet, purple"; "magenta, hot, violet, rose, purple"; "olive, puke, pea, grey, brown"; "pinkish, grey, dull, pale, light".

Experiment: color references

Page 65: Structure and Interpretation of Neural Codes · 2017. 9. 7. · Jacob Andreas. Translating Neuralese Jacob Andreas, Anca Drăgan and Dan Klein. Learning to Communicate [Wagner et

translation), we focus on developing automaticmeasures of system performance. We use the avail-able training data to develop simulated models ofhuman decisions; by first showing that these mod-els track well with human judgments, we can beconfident that their use in evaluations will corre-late with human understanding. We employ thefollowing two metrics:

Belief evaluation This evaluation focuses on thedenotational perspective in semantics that moti-vated the initial development of our model. Wehave successfully understood the semantics of amessage z

r

if, after translating z

r

7! z

h

, a humanlistener can form a correct belief about the statein which z

r

was produced. We construct a simplestate-guessing game where the listener is presentedwith a translated message and two state observa-tions, and must guess which state the speaker wasin when the message was emitted.

When translating from natural language to neu-ralese, we use the learned agent model to directlyguess the hidden state. For neuralese to naturallanguage we must first construct a “model humanlistener” to map from strings back to state repre-sentations; we do this by using the training data tofit a simple regression model that scores (state, sen-tence) pairs using a bag-of-words sentence repre-sentation. We find that our “model human” matchesthe judgments of real humans 83% of the time onthe colors task, 77% of the time on the birds task,and 77% of the time on the driving task. This givesus confidence that the model human gives a reason-ably accurate proxy for human interpretation.

Behavior evaluation This evaluation focuses onthe cooperative aspects of interpretability: we mea-sure the extent to which learned models are ableto interoperate with each other by way of a transla-tion layer. In the case of reference games, the goalof this semantic evaluation is identical to the goalof the game itself (to identify the hidden state ofthe speaker), so we perform this additional prag-matic evaluation only for the driving game. Wefound that the most data-efficient and reliable wayto make use of human game traces was to constructa “deaf” model human. The evaluation selects afull game trace from a human player, and replaysboth the human’s actions and messages exactly (dis-regarding any incoming messages); the evaluationmeasures the quality of the natural-language-to-neuralese translator, and the extent to which the

(a)

as speakerR H

aslis

tene

r R 1.000.50 random0.70 direct0.73 belief (ours)

H*0.50

0.830.720.86

(b)

as speakerR H

aslis

tene

r R 0.950.50 random0.55 direct0.60 belief (ours)

H*0.50

0.770.570.75

Table 1: Evaluation results for reference games. (a) The colorstask. (b) The birds task. Whether the model human is in alistener or speaker role, translation based on belief matchingoutperforms both random and machine translation baselines.

learned agent model can accommodate a (real) hu-man given translations of the human’s messages.

Baselines We compare our approach to two base-lines: a random baseline that chooses a translationof each input uniformly from messages observedduring training, and a direct baseline that directlymaximizes p(z

0|z) (by analogy to a conventionalmachine translation system). This is accomplishedby sampling from a DCP speaker in training stateslabeled with natural language strings.

8 Results

In all below, “R” indicates a DCP agent, “H” in-dicates a real human, and “H*” indicates a modelhuman player.

Reference games Results for the two referencegames are shown in Table 1. The end-to-end trainedmodel achieves nearly perfect accuracy in both

magenta, hot, rose, violet, purple

magenta, hot, violet, rose, purple

olive, puke, pea, grey, brown

pinkish, grey, dull, pale, light

Figure 7: Best-scoring translations generated for color task.

magenta, hot, rose

magenta, hot, violet

olive, puke, pea

pinkish, grey, dull

0

65

Experiment: color references

Page 66: Structure and Interpretation of Neural Codes · 2017. 9. 7. · Jacob Andreas. Translating Neuralese Jacob Andreas, Anca Drăgan and Dan Klein. Learning to Communicate [Wagner et

translation), we focus on developing automaticmeasures of system performance. We use the avail-able training data to develop simulated models ofhuman decisions; by first showing that these mod-els track well with human judgments, we can beconfident that their use in evaluations will corre-late with human understanding. We employ thefollowing two metrics:

Belief evaluation This evaluation focuses on thedenotational perspective in semantics that moti-vated the initial development of our model. Wehave successfully understood the semantics of amessage z

r

if, after translating z

r

7! z

h

, a humanlistener can form a correct belief about the statein which z

r

was produced. We construct a simplestate-guessing game where the listener is presentedwith a translated message and two state observa-tions, and must guess which state the speaker wasin when the message was emitted.

When translating from natural language to neu-ralese, we use the learned agent model to directlyguess the hidden state. For neuralese to naturallanguage we must first construct a “model humanlistener” to map from strings back to state repre-sentations; we do this by using the training data tofit a simple regression model that scores (state, sen-tence) pairs using a bag-of-words sentence repre-sentation. We find that our “model human” matchesthe judgments of real humans 83% of the time onthe colors task, 77% of the time on the birds task,and 77% of the time on the driving task. This givesus confidence that the model human gives a reason-ably accurate proxy for human interpretation.

Behavior evaluation This evaluation focuses onthe cooperative aspects of interpretability: we mea-sure the extent to which learned models are ableto interoperate with each other by way of a transla-tion layer. In the case of reference games, the goalof this semantic evaluation is identical to the goalof the game itself (to identify the hidden state ofthe speaker), so we perform this additional prag-matic evaluation only for the driving game. Wefound that the most data-efficient and reliable wayto make use of human game traces was to constructa “deaf” model human. The evaluation selects afull game trace from a human player, and replaysboth the human’s actions and messages exactly (dis-regarding any incoming messages); the evaluationmeasures the quality of the natural-language-to-neuralese translator, and the extent to which the

(a)

as speakerR H

aslis

tene

r R 1.000.50 random0.70 direct0.73 belief (ours)

H*0.50

0.830.720.86

(b)

as speakerR H

aslis

tene

r R 0.950.50 random0.55 direct0.60 belief (ours)

H*0.50

0.770.570.75

Table 1: Evaluation results for reference games. (a) The colorstask. (b) The birds task. Whether the model human is in alistener or speaker role, translation based on belief matchingoutperforms both random and machine translation baselines.

learned agent model can accommodate a (real) hu-man given translations of the human’s messages.

Baselines We compare our approach to two base-lines: a random baseline that chooses a translationof each input uniformly from messages observedduring training, and a direct baseline that directlymaximizes p(z

0|z) (by analogy to a conventionalmachine translation system). This is accomplishedby sampling from a DCP speaker in training stateslabeled with natural language strings.

8 Results

In all below, “R” indicates a DCP agent, “H” in-dicates a real human, and “H*” indicates a modelhuman player.

Reference games Results for the two referencegames are shown in Table 1. The end-to-end trainedmodel achieves nearly perfect accuracy in both

magenta, hot, rose, violet, purple

magenta, hot, violet, rose, purple

olive, puke, pea, grey, brown

pinkish, grey, dull, pale, light

Figure 7: Best-scoring translations generated for color task.

magenta, hot, rose

magenta, hot, violet

olive, puke, pea

pinkish, grey, dull

0

66

Experiment: color references

Page 67: Structure and Interpretation of Neural Codes · 2017. 9. 7. · Jacob Andreas. Translating Neuralese Jacob Andreas, Anca Drăgan and Dan Klein. Learning to Communicate [Wagner et

translation), we focus on developing automaticmeasures of system performance. We use the avail-able training data to develop simulated models ofhuman decisions; by first showing that these mod-els track well with human judgments, we can beconfident that their use in evaluations will corre-late with human understanding. We employ thefollowing two metrics:

Belief evaluation This evaluation focuses on thedenotational perspective in semantics that moti-vated the initial development of our model. Wehave successfully understood the semantics of amessage z

r

if, after translating z

r

7! z

h

, a humanlistener can form a correct belief about the statein which z

r

was produced. We construct a simplestate-guessing game where the listener is presentedwith a translated message and two state observa-tions, and must guess which state the speaker wasin when the message was emitted.

When translating from natural language to neu-ralese, we use the learned agent model to directlyguess the hidden state. For neuralese to naturallanguage we must first construct a “model humanlistener” to map from strings back to state repre-sentations; we do this by using the training data tofit a simple regression model that scores (state, sen-tence) pairs using a bag-of-words sentence repre-sentation. We find that our “model human” matchesthe judgments of real humans 83% of the time onthe colors task, 77% of the time on the birds task,and 77% of the time on the driving task. This givesus confidence that the model human gives a reason-ably accurate proxy for human interpretation.

Behavior evaluation This evaluation focuses on the cooperative aspects of interpretability: we measure the extent to which learned models are able to interoperate with each other by way of a translation layer. In the case of reference games, the goal of this semantic evaluation is identical to the goal of the game itself (to identify the hidden state of the speaker), so we perform this additional pragmatic evaluation only for the driving game. We found that the most data-efficient and reliable way to make use of human game traces was to construct a "deaf" model human. The evaluation selects a full game trace from a human player, and replays both the human's actions and messages exactly (disregarding any incoming messages); the evaluation measures the quality of the natural-language-to-neuralese translator, and the extent to which the learned agent model can accommodate a (real) human given translations of the human's messages.

Table 1: Evaluation results for reference games. (a) The colors task. (b) The birds task. Whether the model human is in a listener or speaker role, translation based on belief matching outperforms both random and machine translation baselines.

(a) Colors task
  listener R,  speaker R (no translation): 1.00
  listener R,  speaker H (translated):     random 0.50, direct 0.70, belief (ours) 0.73
  listener H*, speaker R (translated):     random 0.50, direct 0.72, belief (ours) 0.86
  listener H*, speaker H (no translation): 0.83

(b) Birds task
  listener R,  speaker R (no translation): 0.95
  listener R,  speaker H (translated):     random 0.50, direct 0.55, belief (ours) 0.60
  listener H*, speaker R (translated):     random 0.50, direct 0.57, belief (ours) 0.75
  listener H*, speaker H (no translation): 0.77

Baselines We compare our approach to two baselines: a random baseline that chooses a translation of each input uniformly from messages observed during training, and a direct baseline that directly maximizes p(z′|z) (by analogy to a conventional machine translation system). This is accomplished by sampling from a DCP speaker in training states labeled with natural language strings.
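As a rough illustration of the direct baseline, one can harvest (neuralese vector, sentence) pairs from shared training states and translate a new vector to the sentence paired with the most similar stored vector. This nearest-neighbor stand-in for maximizing p(z′|z), and every name in it, is an assumption of this sketch, not the paper's exact procedure.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return sum(a * a for a in u) ** 0.5

def cosine(u, v):
    # Guard against zero vectors to avoid division by zero.
    return dot(u, v) / (norm(u) * norm(v) or 1.0)

def direct_translate(z, parallel):
    """`parallel`: list of (neuralese_vector, sentence) pairs harvested by
    running the DCP speaker in training states labeled with sentences.
    Returns the sentence whose paired vector is most similar to z."""
    return max(parallel, key=lambda pair: cosine(z, pair[0]))[1]

parallel = [([1.0, 0.0], "all clear"),
            ([0.0, 1.0], "car approaching")]
assert direct_translate([0.9, 0.1], parallel) == "all clear"
```

The belief-matching translator differs in that it compares the distributions over states that messages induce, rather than the message vectors themselves.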

8 Results

In all below, "R" indicates a DCP agent, "H" indicates a real human, and "H*" indicates a model human player.

Reference games Results for the two reference games are shown in Table 1. The end-to-end trained model achieves nearly perfect accuracy in both cases.

Figure 7: Best-scoring translations generated for color task.
  magenta, hot, rose, violet, purple
  magenta, hot, violet, rose, purple
  olive, puke, pea, grey, brown
  pinkish, grey, dull, pale, light

Experiment: color references


Page 70

Experiment: image references

[Bar chart: state-guessing accuracy for the image reference task, comparing Statistical MT and Semantic MT translators in the Neuralese → English* and English → Neuralese directions, with Neuralese → Neuralese and English → English* reference bars; values shown: 50, 95, 77, 72, 70, 86, 73, 70.]

Page 71

alization and ablation techniques used in previous work on understanding complex models (Strobelt et al., 2016; Ribeiro et al., 2016).

While structurally quite similar to the task of machine translation between pairs of human languages, interpretation of neuralese poses a number of novel challenges. First, there is no natural source of parallel data: there are no bilingual "speakers" of both neuralese and natural language. Second, there may not be a direct correspondence between the strategy employed by humans and CDP agents: even if it were constrained to communicate using natural language, an automated agent might choose to produce a different message from humans in a given state. We tackle both of these challenges by appealing to the grounding of messages in gameplay. Our approach is based on one of the core insights in natural language semantics: messages (whether in neuralese or natural language) have similar meanings when they induce similar beliefs about the state of the world. We explore several related questions:

• What makes a good translation, and under what conditions is translation possible at all (Section 4)?

• How can we build a model to translate between neuralese and natural language (Section 5)?

• How does this model trade off the competing demands of translatability and accuracy in learned policies (Section 6)?

We conclude by applying our approach to agents trained on a number of different tasks, finding that it outperforms a more conventional machine translation criterion both when attempting to interoperate with a neuralese speaker and when predicting its hidden state.

2 Related work

A variety of approaches for learning deep policies with communication were proposed essentially simultaneously in the past year. We have broadly labeled these as "communicating deep policies"; concrete examples include Lazaridou et al. (2016b), Foerster et al. (2016), and Sukhbaatar et al. (2016). The policy representation we employ in this paper is similar to the latter two of these, although the general framework is agnostic to low-level modeling details and could be straightforwardly applied to other architectures. Analysis of communication strategies in all these papers has been largely ad-hoc, obtained by clustering states from which similar messages are emitted and attempting to manually assign semantics to these clusters. The present work aims at developing tools for performing this analysis automatically.

Most closely related to our approach is that of Lazaridou et al. (2016a), who also develop a model for assigning natural language interpretations to learned messages; however, this approach relies on a combination of supervised cluster labels and is targeted specifically towards referring expression games. Here we attempt to develop an approach that can handle general multiagent interactions without assuming a prior discrete structure in the space of observations.

The literature on learning decentralized multiagent policies in general is considerably larger (Bernstein et al., 2002; Dibangoye et al., 2016). This includes work focused on communication in multiagent settings (Roth et al., 2005) and even communication using natural language messages (Vogel et al., 2013b). All of these approaches employ structured communication schemes with manually engineered messaging protocols; these are, in some sense, automatically interpretable, but at the cost of introducing considerable complexity into both training and inference.

Our evaluation in this paper investigates communication strategies that arise in a number of different games, including reference games and an extended-horizon driving game. Communication strategies for reference games were previously explored by Vogel et al. (2013a), Andreas and Klein (2016) and Kazemzadeh et al. (2014), and refer-

Figure 2: Preview of our approach: best-scoring translations generated for a reference game involving images of birds. The speaking agent's goal is to send a message that uniquely identifies the bird on the left. From these translations it can be seen that the learned model appears to discriminate based on coarse attributes like size and color.
  large bird, black wings, black crown
  small brown, light brown, dark brown


Experiment: image references

Page 72

Experiment: driving game

[Bar chart: driving game reward. Neuralese → Neuralese 1.93; Neuralese ↔ English*: 1.35, Statistical MT 1.49, Semantic MT 1.54.]

Page 73

How to translate


Table 2: Behavior evaluation results for the driving game. Scores are presented in the form "reward / completion rate". While less accurate than either humans or CDPs with a shared language, the models that employ a translation layer obtain higher reward and a greater overall success rate than baselines.

  R / R (no translation): 1.93 / 0.71
  H / H (no translation):    — / 0.77
  R / H via translation:  random 1.35 / 0.64, direct 1.49 / 0.67, belief (ours) 1.54 / 0.67

Reference games Results for the two reference games are shown in Table 1. The end-to-end trained model achieves nearly perfect accuracy in both cases, while a model trained to communicate in natural language achieves somewhat lower performance. Regardless of whether the speaker is a CDP and the listener a model human or vice-versa, translation based on the belief-matching criterion in Section 5 achieves the best performance; indeed, when translating from neuralese to natural language, the listener is able to achieve a higher score than it is natively. This suggests that the automated agent has discovered a more effective strategy than the one demonstrated by humans in the dataset, and that the effectiveness of this strategy is preserved by translation. Example translations from the reference games are depicted in Figure 2 and Figure 7.


Table 3: Belief evaluation results for the driving game. Driving states are challenging to identify based on messages alone (as evidenced by the comparatively low scores obtained by single-language pairs). Translation based on belief achieves the best overall performance in both directions.

  listener R,  speaker R (no translation): 0.85
  listener R,  speaker H (translated):     random 0.50, direct 0.45, belief (ours) 0.61
  listener H*, speaker R (translated):     random 0.50, direct 0.45, belief (ours) 0.57
  listener H*, speaker H (no translation): 0.77

Driving game Behavior evaluation of the driving game is shown in Table 2, and belief evaluation is shown in Table 3. Translation of messages in the driving game is considerably more challenging than in the reference games, and scores are uniformly lower; however, a clear benefit from the belief-matching model is still visible. Belief matching leads to higher scores on the belief evaluation in both directions, and allows agents to obtain a higher reward on average (though task completion rates remain roughly the same across all agents).

Some example translations of driving game messages are shown in Figure 8.

9 Conclusion

We have investigated the problem of interpreting message vectors from communicating deep policies by translating them. After introducing a translation criterion based on matching listener beliefs about speaker states, we presented both theoretical and empirical evidence that this criterion outperforms a more conventional machine translation approach at both recovering the content of message vectors and facilitating collaboration between neuralese and natural language speakers.

Figure 8: Best-scoring translations generated for driving task.
  at goal, done, left to top
  going in intersection, proceed, going
  you first, following, going down


Page 74

• Classical notions of "meaning" apply even to non-language-like things (e.g. RNN states)

• These meanings can be compactly represented without logical forms if we have access to world states

• Communicating policies “say” interpretable things!

Conclusions so far


Page 77

Limitations

argmin_a KL( β(z₀) || β(a) )

KL(p || q) = Σ_i p(i) log [ p(i) / q(i) ]

Page 78

but what about compositionality?

Page 79

Analogs of linguistic structure in deep representations

Jacob Andreas and Dan Klein

Page 80

“Flat” semantics

(a) Colors task:

                 as speaker: R                                    as speaker: H
as listener: R   1.00                                             0.50 random / 0.70 direct / 0.73 belief (ours)
as listener: H*  0.50 random / 0.83 direct / 0.86 belief (ours)   0.72

(b) Birds task:

                 as speaker: R                                    as speaker: H
as listener: R   0.95                                             0.50 random / 0.55 direct / 0.60 belief (ours)
as listener: H*  0.50 random / 0.71 direct / 0.75 belief (ours)   0.57

Table 1: Evaluation results for reference games. (a) The colors task. (b) The birds task. Whether the model human is in a listener or speaker role, translation based on belief matching outperforms both random and machine translation baselines.

R / R: 1.93 / 0.71    H / H: — / 0.77
R / H: 1.35 / 0.64 (random), 1.49 / 0.67 (direct), 1.54 / 0.67 (belief, ours)

Table 2: Behavior evaluation results for the driving game. Scores are presented in the form “reward / completion rate”. While less accurate than either humans or CDPs with a shared language, the models that employ a translation layer obtain higher reward and a greater overall success rate than baselines.

Reference games  Results for the two reference games are shown in Table 1. The end-to-end trained model achieves nearly perfect accuracy in both cases, while a model trained to communicate in natural language achieves somewhat lower performance. Regardless of whether the speaker is a CDP and the listener a model human or vice-versa, translation based on the belief-matching criterion in Section 5 achieves the best performance; indeed, when translating from neuralese to natural language, the listener is able to achieve a higher score than it is natively. This suggests that the automated agent has discovered a more effective strategy than the one demonstrated by humans in the dataset, and that the effectiveness of this strategy is preserved by translation. Example translations from the reference games are depicted in Figure 2 and Figure 7.

magenta, hot, rose, violet, purple

magenta, hot, violet, rose, purple

olive, puke, pea, grey, brown

pinkish, grey, dull, pale, light

Figure 7: Best-scoring translations generated for the colors task.

                 as speaker: R                                    as speaker: H
as listener: R   0.85                                             0.50 random / 0.45 direct / 0.61 belief (ours)
as listener: H*  0.50 random / 0.45 direct / 0.57 belief (ours)   0.77

Table 3: Belief evaluation results for the driving game. Driving states are challenging to identify based on messages alone (as evidenced by the comparatively low scores obtained by single-language pairs). Translation based on belief achieves the best overall performance in both directions.

Driving game  Behavior evaluation of the driving game is shown in Table 2, and belief evaluation is shown in Table 3. Translation of messages in the driving game is considerably more challenging than in the reference games, and scores are uniformly lower; however, a clear benefit from the belief-matching model is still visible. Belief matching leads to higher scores on the belief evaluation in both directions, and allows agents to obtain a higher reward on average (though task completion rates remain roughly the same across all agents).



Page 81

Compositional semantics



Page 83

Compositional semantics

✔ ✔ ✔


message

Page 84

Compositional semantics

✔ ✔ ✔

everything but the blue shapes

orange square and non-squares

[FitzGerald et al. 2013]

Page 85

Compositional semantics

✔ ✔ ✔

lambda x: not(blue(x))

lambda x: or(orange(x), not(square(x)))

[FitzGerald et al. 2013]

Page 86

Compositional semantics

✔ ✔ ✔


???

Page 87

Model architecture



Page 90

Model architecture

✔ ✔ ✔

[1.0, 2.3, -0.3, 0.4, -1.2, 1.1]

Page 91

Computing meaning representations

[1.0, 2.3, -0.3, 0.4, -1.2, 1.1]

on the left

Page 92

[-0.1, 1.3, 0.5, -0.4, 0.2, 1.0]

✔ ✔ ✔

✔ ✔ ✔✔

✔ ✔ ✔

✔ ✔ ✔✔

everything but squares

Computing meaning representations
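The checkmark grids on these slides stand in for a denotation: the truth value a message assigns to each object in the scene. A minimal sketch of that idea (the scene and predicate below are hypothetical, not from the talk):

```python
# A message's "flat" meaning as its denotation: the truth value it
# assigns to each object in a scene.
scene = [
    {"shape": "square", "color": "red"},
    {"shape": "circle", "color": "blue"},
    {"shape": "triangle", "color": "green"},
    {"shape": "square", "color": "blue"},
]

def denotation(pred, scene):
    # evaluate a predicate on every object, yielding a truth-value vector
    return [pred(obj) for obj in scene]

everything_but_squares = denotation(lambda o: o["shape"] != "square", scene)
# the resulting list of booleans plays the role of the checkmark grid
```
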


Page 97

[-0.1, 1.3, 0.5, -0.4, 0.2, 1.0]

✔ ✔ ✔

✔ ✔ ✔✔

✔ ✔ ✔

✔ ✔ ✔✔

lambda x: not(square(x))

Computing meaning representations


Page 99

q(z₀, a) = KL( β(z₀) || β(a) )

[candidate translations scored by q: 0.10, 0.05, 0.13, 0.08, 0.01, 0.22]

Translation criterion
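The criterion above can be sketched directly: score each candidate utterance a by the KL divergence between the listener beliefs induced by the neuralese message z₀ and by a, and keep the argmin. The belief distributions and candidate utterances below are hypothetical:

```python
import numpy as np

def kl(p, q, eps=1e-12):
    # KL(p || q) = sum_i p(i) * log(p(i) / q(i))
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def translate(belief_z0, candidates):
    # choose the utterance a minimizing q(z0, a) = KL(beta(z0) || beta(a))
    return min(candidates, key=lambda a: kl(belief_z0, candidates[a]))

# hypothetical listener beliefs over three speaker states
beta_z0 = [0.7, 0.2, 0.1]
candidates = {
    "all clear": [0.6, 0.3, 0.1],
    "stop":      [0.1, 0.2, 0.7],
}
best = translate(beta_z0, candidates)
```
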

Page 100: Structure and Interpretation of Neural Codes · 2017. 9. 7. · Jacob Andreas. Translating Neuralese Jacob Andreas, Anca Drăgan and Dan Klein. Learning to Communicate [Wagner et

q( , ) =a β( ) E[ ]β( )a0

0 a

100

0

✔ ✔ ✔

✔ ✔ ✔✔

✔ ✔ ✔

✔ ✔ ✔✔

=

Translation criterion

Page 101

Experiments

“High-level” communicative behavior

“Low-level” message structure



Page 103

Comparing strategies


Page 106

everything but squares

✔ ✔ ✔

✔ ✔ ✔✔

=?

[-0.1, 1.3, 0.5, -0.4, 0.2, 1.0]

✔ ✔ ✔

✔ ✔ ✔✔

Comparing strategies

Page 107

✔ ✔ ✔

✔ ✔✔

=?

[-0.1, 1.3, 0.5, -0.4, 0.2, 1.0]

✔ ✔ ✔

✔ ✔ ✔✔

Theories of model behavior: random

Page 108

[-0.1, 1.3, 0.5, -0.4, 0.2, 1.0]

✔ ✔ ✔

✔ ✔ ✔✔

✔ ✔

=?

Theories of model behavior: literal
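Comparing a candidate theory (random, literal) to model behavior comes down to measuring how often the theory's predicted denotation matches the model's actual selections. A sketch with hypothetical selections over four objects:

```python
def agreement(model_sel, theory_sel):
    # fraction of objects on which the model's selection matches the theory's
    return sum(m == t for m, t in zip(model_sel, theory_sel)) / len(model_sel)

# hypothetical selections for "everything but squares"
model   = [False, True, True, False]   # what the listener model actually picks
literal = [False, True, True, False]   # truth-conditional (literal) prediction
random_ = [True, False, True, False]   # one sample from the random theory

lit_score = agreement(model, literal)
rnd_score = agreement(model, random_)
```
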

Page 109

[Bar chart (scale 0.00-1.00): agreement values shown include 0.50, 0.63, and 0.27; bars labeled Literal and Human.]

Evaluation: high-level scene agreement

Page 110

[Bar chart (scale 0.00-1.00): agreement values shown include 0.72, 0.50, 0.92, and 0.74; bars labeled Random, Literal, and Human.]

Evaluation: high-level object agreement

Page 111

Experiments

“High-level” communicative behavior

“Low-level” message structure


Page 112

Collecting translation data


all the red shapes

blue objects

everything but red

green squares

not green squares

Page 113

Collecting translation data


λx.red(x)

λx.blu(x)

λx.¬red(x)

λx.grn(x)∧sqr(x)

λx.¬(grn(x)∧sqr(x))

Page 114

Collecting translation data


λx.red(x)

λx.blu(x)

λx.¬red(x)

λx.grn(x)∧sqr(x)

λx.¬(grn(x)∧sqr(x))

[0.1, -0.3, 0.5, 1.1]

[-0.3, 0.2, 0.1, 0.1]

[1.4, -0.3, -0.5, 0.8]

[0.2, -0.2, 0.5, -0.1]

[0.3, -1.3, -1.5, 0.1]

Page 115

Extracting related pairs


λx.red(x) λx.¬red(x)

λx.grn(x)∧sqr(x) λx.¬(grn(x)∧sqr(x))

[0.1, -0.3, 0.5, 1.1]   [1.4, -0.3, -0.5, 0.8]

[0.2, -0.2, 0.5, -0.1]   [0.3, -1.3, -1.5, 0.1]


Page 117

Learning compositional operators

argmin_N Σ_x ‖ N z_x − z_¬x ‖²
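Under the least-squares objective above, a linear operator can be fit in closed form. A sketch on synthetic data (the operator, dimensions, and data are assumptions, not the talk's actual setup):

```python
import numpy as np

# Fit a linear operator N minimizing sum_x ||N z_x - z_notx||^2 over
# extracted (z_x, z_notx) pairs -- a least-squares sketch on synthetic data.
rng = np.random.default_rng(0)
true_N = rng.normal(size=(4, 4))      # ground-truth operator (synthetic)
Z = rng.normal(size=(50, 4))          # representations z_x, one per row
Z_neg = Z @ true_N.T                  # representations z_notx

# min_N ||Z N^T - Z_neg||_F^2 has a closed-form least-squares solution
N = np.linalg.lstsq(Z, Z_neg, rcond=None)[0].T

new_z = rng.normal(size=4)
predicted_negation = N @ new_z        # apply the learned operator
```
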

Page 118

Evaluating learned operators

λx.red(x) λx.¬red(x)

λx.grn(x)∧sqr(x) λx.¬(grn(x)∧sqr(x))

[0.1, -0.3, 0.5, 1.1]   [1.4, -0.3, -0.5, 0.8]

[0.2, -0.2, 0.5, -0.1]   [0.3, -1.3, -1.5, 0.1]

λx.f(x)

[0.2, -0.2, 0.5, -0.1]


Page 120

Evaluating learned operators

λx.red(x) λx.¬red(x)

λx.grn(x)∧sqr(x) λx.¬(grn(x)∧sqr(x))

[0.1, -0.3, 0.5, 1.1]   [1.4, -0.3, -0.5, 0.8]

[0.2, -0.2, 0.5, -0.1]   [0.3, -1.3, -1.5, 0.1]

λx.f(x)

[0.2, -0.2, 0.5, -0.1]

???

[-0.2, 0.4, -0.3, 0.0]

Page 121

Evaluation: scene agreement for negation

[Bar chart (scale 0.00-1.00): values shown include 0.50 and 0.81.]

Page 122

Visualizing negation

[Scatter plot of message representations (Input, Predicted, True), with labeled points: “all the toys that are not red”, “every thing that is red”, “only the blue and green objects”, “all items that are not blue or green”.]

Page 123

[Bar chart (scale 0.00-1.00): values shown include 0.50, 0.54, and 0.90.]

Evaluation: scene agreement for disjunction

Page 124

[Scatter plot of message representations (Input, Predicted, True), with labeled points: “all of the red objects”, “the blue and red items”, “the blue objects”, “the blue and yellow items”, “all the yellow toys”, “all yellow or red items”.]

Visualizing disjunction

Page 125

• We can translate between neuralese and natural lang. by grounding in distributions over world states

• Under the right conditions, neuralese exhibits interpretable pragmatics & compositional structure

• Not just communication games—language might be a good general-purpose tool for interpreting deep reprs.

Conclusions



Page 128

Conclusions

[slide annotations: not sure, dunno, yes, yes, no, yes]

Page 129

[1.0, 2.3, -0.3, 0.4, -1.2, 1.1]  Thank you!

http://github.com/jacobandreas/{neuralese,rnn-syn}