Transcript of Sequence to Sequence Learning

Page 1:

Quoc V. Le

joint work with many Google Brain collaborators

Sequence to Sequence Learning

brainvision summit

Page 2:

[Chart: y-axis "Projects use GoogleBrain" (0 to 1200); x-axis "Time" (Q1-2012 to Q3-2015)]

Page 3:

[Chart: y-axis "Directories use GoogleBrain" (0 to 1200); x-axis "Time" (Q1-2012 to Q3-2015). Annotations: Significant speech & vision improvements; Sequence to sequence models; TensorFlow launched; SmartReply; RankBrain; Discovery of Cat Detector on YouTube; Google Photos; Word2Vec]

Page 4:

[Chart: y-axis "Directories use GoogleBrain" (0 to 1200); x-axis "Time" (Q1-2012 to Q3-2015). Annotations: Significant speech & vision improvements; TensorFlow launched; SmartReply; RankBrain; Discovery of Cat Detector on YouTube; Google Photos; Word2Vec; Sequence to sequence models]

Page 5:

Machines that understand natural language

Step 1: Understand words

Page 6:

Machines that understand natural language

Step 1: Understand words

Step 2: Understand strings of words

Page 8:

[Figure: table of word vectors, one per vocabulary word (a, aa, aardvark, ..., cat, dog, ..., sat, ..., the, ..., zzz), each with 100 dimensions]

Page 9:

the cat sat on the mat

[Figure: table of word vectors, one per vocabulary word (a, aa, ..., cat, dog, ..., sat, ..., the, ..., zzz), each with 100 dimensions]

Page 10:

[Figure: word vectors. Axis label: first principal component. Word groups: love, like, adore; king, queen, princess, prince; saturday, sunday, monday; man, woman; google, facebook, microsoft, ibm; hello, hi, goodbye; england, france, russia, china; small, smaller, big, bigger]
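A minimal sketch of how closeness between word vectors can be measured, assuming cosine similarity as the metric (the slides do not name one). The three-dimensional vectors and the helper names `cosine` and `nearest` are invented for illustration; the slides describe 100-dimensional vectors learned from data.

```python
import math

# Toy 3-dimensional vectors; the values are made up for illustration only.
vectors = {
    "king":   [0.9, 0.8, 0.1],
    "queen":  [0.9, 0.7, 0.2],
    "monday": [0.1, 0.2, 0.9],
}

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest(word):
    """Return the other vocabulary word whose vector is most similar."""
    others = [w for w in vectors if w != word]
    return max(others, key=lambda w: cosine(vectors[word], vectors[w]))

print(nearest("king"))
```

With these toy values, "king" lands next to "queen" rather than "monday", which mirrors the grouping shown in the figure.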

Page 13:

[Figure: English word vectors for numbers: one, two, three, four, five]

Page 14:

[Figure: English word vectors for numbers (one, two, three, four, five) beside Spanish word vectors for numbers (uno, dos, tres, cuatro, cinco)]

Page 15:

[Figure: English word vectors for animals (horse, cow, dog, cat, pig) beside Spanish word vectors for animals (caballo, vaca, cerdo, gato, perro)]

Page 16:

[Figure: English word vectors for animals (horse, cow, dog, cat, pig) beside Spanish word vectors for animals (caballo, vaca, cerdo, gato, perro)]

Mikolov, Le, Sutskever. Exploiting Similarities among Languages for Machine Translation. arXiv, 2013.
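The similar layouts of the English and Spanish vectors suggest that one space can be mapped onto the other. Below is a rough sketch of that idea, assuming a linear map W fitted by gradient descent on the squared error between mapped English vectors and their Spanish counterparts. The 2-dimensional vectors are invented (the Spanish ones are simply a rotation of the English ones), and `fit_linear_map` and `translate` are hypothetical helper names, not code from the cited paper.

```python
# Toy 2-D vectors; the Spanish vectors are a 90-degree rotation of the English ones.
english = {"one": (1.0, 0.0), "two": (0.8, 0.3), "three": (0.6, 0.6),
           "four": (0.3, 0.8), "five": (0.0, 1.0)}
spanish = {"uno": (0.0, 1.0), "dos": (-0.3, 0.8), "tres": (-0.6, 0.6),
           "cuatro": (-0.8, 0.3), "cinco": (-1.0, 0.0)}

# "five"/"cinco" is held out to check that the map generalizes.
train_pairs = [("one", "uno"), ("two", "dos"), ("three", "tres"), ("four", "cuatro")]

def fit_linear_map(pairs, steps=500, lr=0.1):
    """Fit a 2x2 matrix W minimizing the sum of ||W x - z||^2 over word pairs."""
    W = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(steps):
        grad = [[0.0, 0.0], [0.0, 0.0]]
        for en, es in pairs:
            x, z = english[en], spanish[es]
            for i in range(2):
                err = W[i][0] * x[0] + W[i][1] * x[1] - z[i]
                for j in range(2):
                    grad[i][j] += 2.0 * err * x[j]
        for i in range(2):
            for j in range(2):
                W[i][j] -= lr * grad[i][j]
    return W

W = fit_linear_map(train_pairs)

def translate(word):
    """Map an English vector through W and return the closest Spanish word."""
    x = english[word]
    mapped = [W[i][0] * x[0] + W[i][1] * x[1] for i in range(2)]
    return min(spanish,
               key=lambda es: sum((m - s) ** 2 for m, s in zip(mapped, spanish[es])))

print(translate("five"))
```

Because the toy Spanish vectors are an exact linear transform of the English ones, the held-out word "five" maps onto "cinco"; real embeddings would only match approximately.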

Page 17:

Machines that understand natural language

Step 1: Understand words

Step 2: Understand strings of words

Page 18:

English “sentence” vectors: i love music; i like mathematics; i enjoy computer science

Spanish “sentence” vectors: me encanta la musica; me gustan las matemáticas; me gusta disfrutar de la informática

Page 19:

[Diagram: the network reads "i love music", then takes "<start> me encanta la musica" as input and emits "me encanta la musica <stop>"]

Page 20:

[Diagram: the network reads "i love music", then takes "<start> me encanta la musica" as input and emits "me encanta la musica <stop>"]

“end-to-end training”

Sutskever, Vinyals, Le. Sequence to Sequence Learning with Neural Networks. NIPS, 2014.

Related Approach: Cho & Bengio. Encoder-Decoder RNNs.
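The diagram can be read as a recipe for building training examples and for decoding one token at a time. The sketch below assumes that reading: the decoder input is the target shifted right behind `<start>`, and the decoder target ends with `<stop>`. The lookup table `TOY_TABLE` is a stand-in for a trained network, and every function name here is hypothetical.

```python
START, STOP = "<start>", "<stop>"

def make_training_example(source, target):
    """Build encoder input, decoder input (shifted right) and decoder target."""
    src = source.split()
    tgt = target.split()
    return {
        "encoder_input": src,
        "decoder_input": [START] + tgt,
        "decoder_target": tgt + [STOP],
    }

def greedy_decode(next_token, source_tokens, max_len=20):
    """Feed each emitted token back in until <stop> or max_len is reached."""
    output, prev = [], START
    for _ in range(max_len):
        token = next_token(source_tokens, prev)
        if token == STOP:
            break
        output.append(token)
        prev = token
    return output

# Stand-in for a trained decoder: a fixed table from previous token to next token.
TOY_TABLE = {START: "me", "me": "encanta", "encanta": "la",
             "la": "musica", "musica": STOP}

def toy_next_token(source_tokens, prev):
    """Ignores the source sentence; a real model would condition on it."""
    return TOY_TABLE[prev]

example = make_training_example("i love music", "me encanta la musica")
decoded = greedy_decode(toy_next_token, example["encoder_input"])
print(example["decoder_input"], decoded)
```

In a real system the `next_token` callable would be the trained network, and training would push its outputs toward `decoder_target` at every position.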

Page 21:

[Diagram: the network reads "i love music", then takes "<start> me encanta la" as input and emits "musica <stop>" at the end]

Page 22:

[Diagram: the network reads "i love music", then takes "<start> me encanta la musica" as input and emits "me encanta la musica <stop>"]

Page 23:

Mary admires John

Mary is in love with John

Mary respects John

Page 24:

Mary admires John

Mary is in love with John

Mary respects John

John admires Mary

John is in love with Mary

John respects Mary

Page 25:

Quoc: Hello, I have a problem with my machine
Machine: Hi, this is Shiva. How may I assist you today?
Quoc: I want to access using VPN
Machine: Currently, the machine is connected to corporate network?
Quoc: No
Machine: Check out this solution <URL>
Quoc: Thanks, bye!
Machine: Thank you.

Page 26:

Quoc: What is the purpose of life? Machine: To serve the greater good.

Quoc: What is the purpose of living? Machine: To live forever.

Quoc: What is the purpose of dying? Machine: To have a life.

Vinyals, Le. A Neural Conversational Model. ICML Deep Learning Workshop, 2015.

Page 27:

SmartReply

Page 28:

Machines that understand natural language

Step 1: Understand words

Step 2: Understand strings of words

Step 3: Have memory, logical reasoning

Page 29:

Advances in Memory and Logical Reasoning

“Soft” Attention Mechanism (Bahdanau, Cho, Bengio, 2014)

Augmented Memory (Graves et al, 2014; Weston et al, 2014)

Augmented Logic and Arithmetic Components (Neelakantan, Le, Sutskever, 2016)

Page 30:

[Diagram: the network reads "i love music", then takes "<start> me encanta la musica" as input and emits "me encanta la musica <end>"]

Page 31:

[Diagram: the network reads "i love music", then takes "<start>" as input and emits "me"]

Page 34:

[Diagram: the network reads "i love music", then takes "<start>" as input and emits "me"]

Bahdanau, Cho, Bengio. Neural Machine Translation by Jointly Learning to Align and Translate. ICLR, 2015.
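A small sketch of the “soft” attention idea: score every encoder state against the current decoder state, turn the scores into weights with a softmax, and take the weighted average as context. Dot-product scoring is used here for brevity; the cited paper scores with a small learned network instead. The state vectors and the `attend` helper are invented for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_states):
    """Return attention weights and the weighted sum of encoder states."""
    scores = [sum(d * h for d, h in zip(decoder_state, hs)) for hs in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * hs[k] for w, hs in zip(weights, encoder_states))
               for k in range(dim)]
    return weights, context

# Toy encoder states for "i", "love", "music" (values made up).
encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
# Toy decoder state at the step that emits "me".
decoder_state = [2.0, 0.0]

weights, context = attend(decoder_state, encoder_states)
print(weights, context)
```

With these values the largest weight falls on the first source word, so the context vector leans toward the state for "i" while still mixing in the others; that mixing is what makes the mechanism "soft" and trainable by gradient descent.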

Page 35:

[Diagram: the network reads "i love music", then takes "<start>" as input and emits "me"; an External Memory block is attached]

Graves, Wayne, Danihelka. Neural Turing Machines. arXiv, 2014.

Sukhbaatar, Szlam, Weston, Fergus. End-to-end memory networks. NIPS, 2015.

Page 36:

[Diagram: the network reads "i love music", then takes "<start>" as input and emits "me"; attached blocks: External Memory; Arithmetic & Logic Functions: add(a,b), subtract(a,b), multiply(a,b)]

Neelakantan, Le, Sutskever. Neural Programmer: Inducing Latent Programs with Gradient Descent. ICLR, 2016.

Page 38:

Quoc: I have 2 apples and Tom gives me 3 apples. How many apples do I have?

Machine: 5 apples.

Page 39:

Quoc: I have 2 apples and Tom gives me 3 apples. How many apples do I have?

[Diagram: External Memory; Arithmetic & Logic Functions: add(a,b), subtract(a,b), multiply(a,b)]

Page 43:

Quoc: I have 2 apples and Tom gives me 3 apples. How many apples do I have?

[Diagram: External Memory; Arithmetic & Logic Functions: add(a,b), subtract(a,b), multiply(a,b); value shown: 2]

Page 44:

Quoc: I have 2 apples and Tom gives me 3 apples. How many apples do I have?

[Diagram: External Memory; Arithmetic & Logic Functions: add(a,b), subtract(a,b), multiply(a,b); values shown: 2, 3]

Page 46:

Quoc: I have 2 apples and Tom gives me 3 apples. How many apples do I have?

[Diagram: External Memory; Arithmetic & Logic Functions: add(a,b) = 5, subtract(a,b) = -1, multiply(a,b) = 6; values shown: 2, 3]

Page 47:

Quoc: I have 2 apples and Tom gives me 3 apples. How many apples do I have?

[Diagram: External Memory; Arithmetic & Logic Functions: add(a,b) = 5 * w1, subtract(a,b) = -1 * w2, multiply(a,b) = 6 * w3; values shown: 2, 3]

Page 49:

[Diagram: External Memory; Arithmetic & Logic Functions: add(a,b) = 5 * w1, subtract(a,b) = -1 * w2, multiply(a,b) = 6 * w3; values shown: 2, 3]

Quoc: I have 2 apples and Tom gives me 3 apples. How many apples do I have?

Machine: 5 apples.
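The weighted results on the slide (5 * w1, -1 * w2, 6 * w3) can be read as a soft choice among operations. A hedged sketch of that reading follows: every operation is applied to the two numbers, and the results are blended by weights. Producing the weights with a softmax over logits is an assumption made here, as are the logit values and the `soft_apply` name; the slides only show w1, w2, w3.

```python
import math

# The three operations named on the slide.
OPERATIONS = [
    ("add", lambda a, b: a + b),
    ("subtract", lambda a, b: a - b),
    ("multiply", lambda a, b: a * b),
]

def softmax(logits):
    """Numerically stable softmax; yields weights w1, w2, w3 that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_apply(a, b, logits):
    """Apply every operation and blend the results by the softmax weights."""
    weights = softmax(logits)
    results = [fn(a, b) for _, fn in OPERATIONS]
    blended = sum(w * r for w, r in zip(weights, results))
    return blended, results, weights

# Logits strongly favoring "add"; a trained model would produce these itself.
blended, results, weights = soft_apply(2, 3, [10.0, 0.0, 0.0])
print(results, blended)
```

Because the blend is a differentiable function of the weights, gradient descent can shift weight toward "add" whenever the training answer is 5, without the operation ever being labeled explicitly.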

Page 50:

Machines that understand natural language

Step 1: Understand words

Step 2: Understand strings of words

Step 3: Have memory, logical reasoning

Step 4: Learn from many tasks

Page 51:

[Diagram labels: English; English (unsupervised), German (translation), Tags (parsing)]

Page 52:

[Diagram: One-to-many. Labels: English; English (unsupervised), German (translation), Tags (parsing)]

Page 53:

[Diagram: One-to-many. Labels: English; English (unsupervised), German (translation), Tags (parsing)]

[Diagram: Many-to-one. Labels: English (unsupervised), Image (captioning), German (translation); English]

[Diagram: Many-to-many. Labels: German (translation), English (unsupervised), German (unsupervised), English]

Able to improve accuracies of all related tasks

Page 54:

[Diagram: One-to-many. Labels: English; English (unsupervised), German (translation), Tags (parsing)]

[Diagram: Many-to-one. Labels: English (unsupervised), Image (captioning), German (translation); English]

[Diagram: Many-to-many. Labels: German (translation), English (unsupervised), German (unsupervised), English]

Dai, Le. Semi-supervised Sequence Learning. NIPS, 2015.

Luong et al. Multitask Sequence Learning. ICLR, 2016.

Able to improve accuracies of all related tasks

Page 55:

Machines that understand natural language

Step 1: Understand words

Step 2: Understand strings of words

Step 3: Have memory, logical reasoning

Step 4: Learn from many tasks