Variational Attention - NTNU184pc128.csie.ntnu.edu.tw/presentation/20-01-03/Variational...

Page 1

Variational Attention for Sequence-to-Sequence Models

Source: COLING 2018

Speaker: Ya-Fang, Hsiao

Advisor: Jia-Ling, Koh

Date: 2020/01/03

Page 2

PART

Page 3

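Introduction

Models are grouped by architecture (auto-encoder vs. encoder-decoder) and by whether the latent representation is deterministic or variational:

Deterministic Auto-Encoder (DAE)

Deterministic Encoder-Decoder (DED)

Variational Auto-Encoder (VAE)

Variational Encoder-Decoder (VED)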

Page 4

PART

Page 5

Variational Autoencoder [Bowman et al. 2016]: Generating Sentences from a Continuous Space

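$$\mathcal{L}(\theta, \phi) = \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] - \mathrm{KL}\big(q_\phi(z \mid x) \,\|\, p(z)\big)$$

The first term is the expected log-likelihood of the data under the posterior. Its negative is the cross-entropy reconstruction loss. The second term is the KL divergence of the posterior from the prior.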
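The sketch below estimates this objective with a single sample. It is a minimal NumPy sketch under several assumptions:

* The posterior is a diagonal Gaussian.
* The prior is standard normal, $p(z) = \mathcal{N}(\mathbf{0}, I)$.
* The decoder is a hypothetical toy that scores every token with the same softmax over the vocabulary. A sentence VAE would use an RNN decoder instead.

The function names are illustrative and are not taken from the paper.

```python
import numpy as np


def kl_to_standard_normal(mu, logvar):
    """Closed-form KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dims."""
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar, axis=-1)


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (reparameterization trick)."""
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * logvar) * eps


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def elbo_estimate(tokens, mu, logvar, W, rng):
    """One-sample Monte Carlo estimate of E_q[log p(x|z)] - KL(q(z|x) || p(z)).

    Hypothetical toy decoder: every token of x is scored with the same
    vocabulary distribution softmax(W @ z).
    """
    z = reparameterize(mu, logvar, rng)
    log_probs = log_softmax(W @ z)            # shape: (vocab_size,)
    log_likelihood = log_probs[tokens].sum()  # log p(x | z), i.e. minus the cross entropy
    return log_likelihood - kl_to_standard_normal(mu, logvar)


rng = np.random.default_rng(0)
W = rng.standard_normal((5, 2))                   # vocab_size=5, latent_dim=2
tokens = np.array([1, 3, 3])                      # toy "sentence" of token ids
mu, logvar = np.array([0.5, -0.2]), np.zeros(2)   # made-up encoder outputs
print(elbo_estimate(tokens, mu, logvar, W, rng))
```

Both terms push the estimate toward zero from below. The log-likelihood is at most 0 and the KL is at least 0, so the objective is never positive.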

Page 6

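Variational Seq2Seq model

Bypassing phenomenon: a variational encoder-decoder may be combined with a deterministic attention mechanism. In that case the decoder can read source information directly through the attention context. The variational latent space may therefore be bypassed and become ineffective.

[Figure: model diagrams, panels A, B, C, D]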
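The sketch below shows why a deterministic attention path can carry source information around the latent variable. The context vector is computed directly from the encoder states, with no sampling involved. It is a minimal NumPy illustration. Dot-product scoring is an assumption, not something stated on the slides, and the function names are hypothetical.

```python
import numpy as np


def softmax(scores):
    """Numerically stable softmax over the last axis."""
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def deterministic_attention(source_states, decoder_state):
    """Dot-product attention (assumed scoring function).

    source_states: (src_len, hidden) encoder hidden states
    decoder_state: (hidden,) current decoder hidden state
    Returns the attention weights over source positions and the context vector.
    """
    weights = softmax(source_states @ decoder_state)  # (src_len,)
    context = weights @ source_states                  # (hidden,)
    return weights, context


rng = np.random.default_rng(0)
H = rng.standard_normal((4, 3))   # 4 source tokens, hidden size 3 (toy values)
s = rng.standard_normal(3)        # decoder state
weights, context = deterministic_attention(H, s)
# The context is computed from H without any sampling, so the decoder gets
# source information through a deterministic path alongside the latent z.
print(weights.sum(), context)
```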

Page 7

PART

Page 8

VED+VAttn

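$$\mathcal{L}(\theta, \phi) = \mathbb{E}_{q_\phi(z, a \mid x)}\big[\log p_\theta(y \mid z, a)\big] - \mathrm{KL}\big(q_\phi(z, a \mid x) \,\|\, p(z, a)\big)$$

Variational Attention for Variational Encoder Decoder

Assume the posterior factorizes as $q_\phi(z, a \mid x) = q^{(z)}_\phi(z \mid x)\, q^{(a)}_\phi(a \mid x)$ and the prior as $p(z, a) = p(z)\, p(a)$. The KL term then splits into one term per latent variable:

$$\mathcal{L}(\theta, \phi) = \mathbb{E}_{q^{(z)}_\phi(z \mid x),\, q^{(a)}_\phi(a \mid x)}\big[\log p_\theta(y \mid z, a)\big] - \mathrm{KL}\big(q^{(z)}_\phi(z \mid x) \,\|\, p(z)\big) - \mathrm{KL}\big(q^{(a)}_\phi(a \mid x) \,\|\, p(a)\big)$$

Prior for the attention vector $a$:

1. $p(a) = \mathcal{N}(\mathbf{0}, I)$

2. $p(a) = \mathcal{N}(\bar{h}^{(\mathrm{src})}, I)$, where $\bar{h}^{(\mathrm{src})}$ is the average of the source (encoder) hidden states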
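The sketch below covers one decoding step with a Gaussian attention vector, as a minimal NumPy example. It makes these assumptions:

* The posterior $q^{(a)}_\phi(a \mid x)$ is a diagonal Gaussian with given mean and log-variance.
* Those parameters are random placeholders here, because how the network predicts them is not shown on the slide.

The code samples $a$ with the reparameterization trick. It then evaluates the closed-form KL against each of the two priors. Function names are hypothetical.

```python
import numpy as np


def kl_diag_gaussian_to_unit_var(mu, logvar, prior_mean):
    """KL(N(mu, diag(exp(logvar))) || N(prior_mean, I)), summed over the last axis."""
    return 0.5 * np.sum(np.exp(logvar) + (mu - prior_mean) ** 2 - 1.0 - logvar, axis=-1)


def sample_attention(mu_a, logvar_a, rng):
    """Reparameterized sample a = mu_a + sigma_a * eps, eps ~ N(0, I)."""
    return mu_a + np.exp(0.5 * logvar_a) * rng.standard_normal(np.shape(mu_a))


rng = np.random.default_rng(0)
H = rng.standard_normal((4, 3))   # toy encoder hidden states (src_len=4, hidden=3)
h_bar_src = H.mean(axis=0)        # average source hidden state

# Placeholder posterior parameters for one decoding step (hypothetical values).
mu_a = rng.standard_normal(3)
logvar_a = np.full(3, -1.0)

a = sample_attention(mu_a, logvar_a, rng)
kl_prior_1 = kl_diag_gaussian_to_unit_var(mu_a, logvar_a, np.zeros(3))  # prior 1: N(0, I)
kl_prior_2 = kl_diag_gaussian_to_unit_var(mu_a, logvar_a, h_bar_src)    # prior 2: N(h_bar_src, I)
print(a, kl_prior_1, kl_prior_2)
```

Both priors have unit covariance, so they differ only in the mean that $\mu_a$ is pulled toward.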

Page 9

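VED+VAttn

Variational Attention for Variational Encoder Decoder

$$\mathcal{L}(\theta, \phi) = \mathbb{E}_{q^{(z)}_\phi(z \mid x),\, q^{(a)}_\phi(a \mid x)}\big[\log p_\theta(y \mid z, a)\big] - \lambda_{\mathrm{KL}}\Big[\mathrm{KL}\big(q^{(z)}_\phi(z \mid x) \,\|\, p(z)\big) + \gamma_a\, \mathrm{KL}\big(q^{(a)}_\phi(a \mid x) \,\|\, p(a)\big)\Big]$$

Here $\lambda_{\mathrm{KL}}$ scales the whole KL penalty. $\gamma_a$ scales the KL term of the attention vector relative to that of the latent code $z$.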
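The weighting can be checked with a few lines of arithmetic. The helper below is hypothetical. It takes terms that have already been computed and applies $\lambda_{\mathrm{KL}}$ and $\gamma_a$. Summing the attention KL over decoding steps is an assumption, for the case where $a$ is sampled at every step. The numbers are made up.

```python
import numpy as np


def weighted_objective(log_likelihood, kl_z, kl_a_per_step, lambda_kl, gamma_a):
    """L = log p(y|z,a) - lambda_KL * [KL_z + gamma_a * KL_a].

    kl_a_per_step: attention KL terms; summing them over decoding steps is an
    assumption for the case where a is sampled at every step.
    """
    kl_a = float(np.sum(kl_a_per_step))
    return log_likelihood - lambda_kl * (kl_z + gamma_a * kl_a)


# Made-up numbers, only to show the arithmetic.
value = weighted_objective(log_likelihood=-10.0, kl_z=2.0,
                           kl_a_per_step=[0.5, 1.5], lambda_kl=0.5, gamma_a=0.1)
print(value)  # -10 - 0.5 * (2 + 0.1 * 2) = -11.1
```

Setting $\lambda_{\mathrm{KL}} = \gamma_a = 1$ recovers the unweighted objective. Setting $\lambda_{\mathrm{KL}} = 0$ leaves only the likelihood term.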

Page 10

PART

Page 11

Question Generation: Stanford Question Answering Dataset (SQuAD; Rajpurkar et al., 2016)


Page 14

Case study

Page 15

PART

Page 16

Using variational attention to address the bypassing phenomenon

Generating more diverse outputs while retaining high quality