
Discreteness in Generative Models -- Discrete Sequence Generation

Hao Dong

Peking University

1

• Introduction
  • RNN Language Modelling
  • Generating Sentences from a Continuous Space
  • Recap: Inverse RL vs. GAN

• GAN+RL
  • SeqGAN
  • RankGAN
  • MaskGAN

• GANs
  • Adversarial Text Generation without Reinforcement Learning
  • GANs for Sequence Generation with the Gumbel-softmax

2

Discreteness in Generative Models – Discrete Sequence Generation

• Introduction
  • RNN Language Modelling
  • Generating Sentences from a Continuous Space
  • Recap: Inverse RL vs. GAN

• GAN+RL
  • SeqGAN
  • RankGAN
  • MaskGAN

• GANs
  • Adversarial Text Generation without Reinforcement Learning
  • GANs for Sequence Generation with the Gumbel-softmax

3

4

RNN Language Modelling

• Recap: Recurrent Neural Network

[Figure: recap of RNNs with multiple inputs and multiple outputs (input, hidden state, output at each step), in two flavours: synchronous (simultaneous) and asynchronous (Seq2Seq). Example: input "this is a bird", outputs "is a bird <EOS>".]

For testing, input "this" and the model outputs the rest of the sentence.

• MLE training

• The output of each step is equal to the input of the next step (teacher forcing).
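As a concrete illustration of this teacher-forcing / MLE setup, a minimal PyTorch-style sketch; the model, sizes, and batch layout below are assumptions, not the lecture's code:

```python
import torch
import torch.nn as nn

class RNNLM(nn.Module):
    """Toy RNN language model: embed -> LSTM -> vocabulary logits."""
    def __init__(self, vocab_size, emb=128, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.rnn = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, tokens):
        h, _ = self.rnn(self.embed(tokens))
        return self.out(h)                      # (batch, seq_len, vocab)

# MLE / teacher forcing: the target at step t is the input at step t+1.
#   input  = <BOS> this is a  bird
#   target = this  is   a bird <EOS>
def mle_step(model, optimizer, batch):          # batch: (B, T+1) token ids
    inputs, targets = batch[:, :-1], batch[:, 1:]
    logits = model(inputs)
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```

At every step the loss compares the predicted vocabulary distribution with the ground-truth next word, and the ground truth (not the model's own sample) is fed to the next step.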

RNN Language Modelling

• Synchronous Many-to-Many

RNN Language Modelling

• Limitations

• 1. No latent space – no representation learning

• 2. Exposure bias – makes long-sentence generation problematic
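Exposure bias appears at test time, when the model must consume its own (possibly wrong) previous outputs instead of ground-truth words. A free-running sampling loop sketch (hypothetical names, multinomial sampling assumed) that contrasts with the teacher-forced training above:

```python
import torch

@torch.no_grad()
def sample(model, bos_id, eos_id, max_len=20):
    tokens = [bos_id]
    for _ in range(max_len):
        logits = model(torch.tensor([tokens]))[0, -1]          # logits for the last step
        next_id = torch.multinomial(logits.softmax(-1), 1).item()
        tokens.append(next_id)                                 # the model feeds on its own output
        if next_id == eos_id:
            break
    return tokens
```

Errors made early in the sentence are fed back in and compound, which is why long sentences degrade.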

• Introduction
  • RNN Language Modelling
  • Generating Sentences from a Continuous Space
  • Recap: Inverse RL vs. GAN

• GAN+RL
  • SeqGAN
  • RankGAN
  • MaskGAN

• GANs
  • Adversarial Text Generation without Reinforcement Learning
  • GANs for Sequence Generation with the Gumbel-softmax

7

8

Generating Sentences from a Continuous Space

• Why Continuous Space

Generating Sentences from a Continuous Space. Bowman, Samuel R. Vilnis, Luke. Vinyals, Oriol. Dai, Andrew. Jozefowicz, Rafal. Bengio, Samy. Arxiv 2015.

9

Generating Sentences from a Continuous Space

• VAE Approach

Generating Sentences from a Continuous Space. Bowman, Samuel R. Vilnis, Luke. Vinyals, Oriol. Dai, Andrew. Jozefowicz, Rafal. Bengio, Samy. Arxiv 2015.

[Figure: asynchronous Seq2Seq with multiple data inputs and multiple data outputs (input, hidden state, output), where the VAE reparameterisation ("reparametric") trick is applied to the latent code passed from the encoder to the decoder.]
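A sketch of that reparameterisation step in a sentence VAE (PyTorch assumed; all module names and sizes are illustrative, not the paper's code):

```python
import torch
import torch.nn as nn

class SentenceVAE(nn.Module):
    """Sketch: LSTM encoder -> Gaussian latent z -> LSTM decoder."""
    def __init__(self, vocab_size, emb=128, hidden=256, latent=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True)
        self.to_mu = nn.Linear(hidden, latent)
        self.to_logvar = nn.Linear(hidden, latent)
        self.z_to_h = nn.Linear(latent, hidden)
        self.decoder = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, src, tgt_in):
        _, (h, _) = self.encoder(self.embed(src))
        mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparameterisation trick
        h0 = torch.tanh(self.z_to_h(z)).unsqueeze(0)
        c0 = torch.zeros_like(h0)
        dec, _ = self.decoder(self.embed(tgt_in), (h0, c0))
        return self.out(dec), mu, logvar
```

Writing z = mu + sigma * eps keeps sampling differentiable, so the encoder and decoder can be trained jointly by backpropagation.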

10

Generating Sentences from a Continuous Space

• VAE Approach: Training

Generating Sentences from a Continuous Space. Bowman, Samuel R. Vilnis, Luke. Vinyals, Oriol. Dai, Andrew. Jozefowicz, Rafal. Bengio, Samy. Arxiv 2015.

During training, the latent code z is sampled from the approximate posterior p(z|X) produced by the encoder.
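The training objective is the standard VAE evidence lower bound (a reconstruction term plus a KL penalty pulling the posterior toward the prior); the cited paper additionally anneals the KL weight and uses word dropout:

$$\log p_\theta(X) \;\ge\; \mathbb{E}_{q_\phi(z \mid X)}\big[\log p_\theta(X \mid z)\big] \;-\; \mathrm{KL}\big(q_\phi(z \mid X)\,\|\,p(z)\big)$$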

11

Generating Sentences from a Continuous Space

• VAE Approach: Testing

Generating Sentences from a Continuous Space. Bowman, Samuel R. Vilnis, Luke. Vinyals, Oriol. Dai, Andrew. Jozefowicz, Rafal. Bengio, Samy. Arxiv 2015.

At test time, the latent code z is sampled from the prior p(z) and decoded into a sentence.

12

Generating Sentences from a Continuous Space

• Application

Impute missing words within sentences.

Generating Sentences from a Continuous Space. Bowman, Samuel R. Vilnis, Luke. Vinyals, Oriol. Dai, Andrew. Jozefowicz, Rafal. Bengio, Samy. Arxiv 2015.

13

Generating Sentences from a Continuous Space

• Application

Results from the mean of the posterior distribution and samples from that distribution.

Generating Sentences from a Continuous Space. Bowman, Samuel R. Vilnis, Luke. Vinyals, Oriol. Dai, Andrew. Jozefowicz, Rafal. Bengio, Samy. Arxiv 2015.

14

Generating Sentences from a Continuous Space

• Application

Interpolation between two points in the VAE manifold.

Generating Sentences from a Continuous Space. Bowman, Samuel R. Vilnis, Luke. Vinyals, Oriol. Dai, Andrew. Jozefowicz, Rafal. Bengio, Samy. Arxiv 2015.

15

Generating Sentences from a Continuous Space

• Limitation

Difficult to generate long sentences

Generating Sentences from a Continuous Space. Bowman, Samuel R. Vilnis, Luke. Vinyals, Oriol. Dai, Andrew. Jozefowicz, Rafal. Bengio, Samy. Arxiv 2015.

• Introduction
  • RNN Language Modelling
  • Generating Sentences from a Continuous Space
  • Recap: Inverse RL vs. GAN

• GAN+RL
  • SeqGAN
  • RankGAN
  • MaskGAN

• GANs
  • Adversarial Text Generation without Reinforcement Learning
  • GANs for Sequence Generation with the Gumbel-softmax

16

17

Recap: Inverse RL vs. GAN

• Reinforcement Learning and Imitation Learning

18

Recap: Inverse RL vs. GAN

• Reinforcement Learning and Imitation Learning

• Why?

It is difficult, and for many environments practically impossible, to define a reward function by hand.

19

Recap: Inverse RL vs. GAN

• Reinforcement Learning and Imitation Learning

• Behaviour Cloning

• Inverse RL

20

Recap: Inverse RL vs. GAN

• Reinforcement Learning and Imitation Learning

• Behaviour Cloning == Supervised Learning

• Given the expert's demonstrations: (s_1, a_1), (s_2, a_2), ..., (s_N, a_N)

• Supervised training: the agent/actor is trained to map each state s_i to the expert's action a_i
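A minimal behaviour-cloning sketch (hypothetical policy network and tensors; discrete actions and a cross-entropy loss are assumptions):

```python
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))   # state dim 4, 2 actions
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def behaviour_cloning_step(states, expert_actions):
    """states: (N, 4) float tensor, expert_actions: (N,) long tensor of expert choices."""
    logits = policy(states)
    loss = nn.functional.cross_entropy(logits, expert_actions)   # imitate the expert's action
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```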

21

Recap: Inverse RL vs. GAN

• Reinforcement Learning and Imitation Learning

• Behaviour Cloning == Supervised Learning

• Problem

• The expert only samples a limited set of observations (states). (Solution: Dataset Aggregation, i.e. DAgger)

• If the machine has limited capacity, it may choose the wrong behaviour to copy

• It assumes the training and testing data distributions are the same

22

Recap: Inverse RL vs. GAN

• Reinforcement Learning and Imitation Learning

• Inverse Reinforcement Learning

[Diagram: "common" reinforcement learning: given a reward function and an environment, RL produces the optimal actor.]

23

Recap: Inverse RL vs. GAN

• Reinforcement Learning and Imitation Learning

• Inverse Reinforcement Learning

[Diagram: "inverse" reinforcement learning: given expert demonstrations 𝒯 = (s_1, a_1, s_2, a_2, ..., s_T, a_T) and the environment, inverse RL recovers a reward function.]

24

Recap: Inverse RL vs. GAN

• Reinforcement Learning and Imitation Learning

• Inverse Reinforcement Learning

[Diagram: inverse RL, full loop: expert demonstrations 𝒯 = (s_1, a_1, s_2, a_2, ..., s_T, a_T) plus the environment yield a reward function via "inverse" reinforcement learning; "common" reinforcement learning on that reward function then yields the optimal actor.]

25

Recap: Inverse RL vs. GAN

• Inverse RL: Training

The expert π produces demonstrations 𝒯_1, 𝒯_2, 𝒯_3, ..., 𝒯_N; the actor π̂ produces trajectories 𝒯̂_1, 𝒯̂_2, 𝒯̂_3, ..., 𝒯̂_N, where each 𝒯 = (s_1, a_1, s_2, a_2, ..., s_T, a_T).

Goal: under the learned reward function, the expert is always the best:

    Σ_{n=1}^{N} R(𝒯_n) > Σ_{n=1}^{N} R(𝒯̂_n)

Loop: update the reward function (a neural network) so that the expert out-scores the actor, then update the actor (also a neural network) via "common" reinforcement learning under the updated reward.
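Schematically, that alternation might look like the following sketch (the reward network R, the actor's rollout method, and the rl_update routine are hypothetical placeholders, not the lecture's code):

```python
import torch

def irl_training_loop(expert_trajs, actor, R, rl_update, n_iters=1000):
    """Alternate: push expert trajectories above the actor's under R, then improve the actor."""
    opt_R = torch.optim.Adam(R.parameters(), lr=1e-4)
    for _ in range(n_iters):
        actor_trajs = [actor.rollout() for _ in range(len(expert_trajs))]
        # Reward update: the summed reward of expert trajectories should exceed the actor's.
        gap = sum(R(t) for t in expert_trajs) - sum(R(t) for t in actor_trajs)
        loss_R = -gap / len(expert_trajs)
        opt_R.zero_grad(); loss_R.backward(); opt_R.step()
        # Actor update: "common" RL (e.g. policy gradient) against the current reward R.
        rl_update(actor, R)
```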

26

Recap: Inverse RL vs. GAN

• Inverse RL: the expert π produces demonstrations 𝒯_1, 𝒯_2, 𝒯_3, ..., 𝒯_N (real) and the actor π̂ produces 𝒯̂_1, 𝒯̂_2, 𝒯̂_3, ..., 𝒯̂_N (fake); a reward function scores them, and the actor is improved with "common" reinforcement learning. The sampled 𝒯̂ are not differentiable, so the actor must be updated via RL. Expert demonstrations play the role of real data.

• GAN: the dataset provides real images x_1, x_2, x_3, ..., x_N and the generator produces x̂_1, x̂_2, x̂_3, ..., x̂_N (fake); the discriminator classifies real vs. fake. The generated x̂ are differentiable, so the generator can be updated directly through the discriminator.

• Introduction
  • RNN Language Modelling
  • Generating Sentences from a Continuous Space
  • Recap: Inverse RL vs. GAN

• GAN+RL
  • SeqGAN
  • RankGAN
  • MaskGAN

• GANs
  • Adversarial Text Generation without Reinforcement Learning
  • GANs for Sequence Generation with the Gumbel-softmax

27

28

SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient

• Motivation

MLE has exposure bias

MLE tends to move toward the mean when there is uncertainty: given "this is a ..." vs. "this is one ...", is "a" or "one" the correct next word?

MLE-free sentence generation using adversarial networks could be better

• Challenge

Sentences are discrete

Sentence generation is not differentiable

SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient. Lantao Yu, Weinan Zhang, Jun Wang, Yong Yu. AAAI. 2016.

29

SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient

• Inverse RL: the expert π produces demonstrations 𝒯_1, 𝒯_2, 𝒯_3, ..., 𝒯_N (expert demonstrations, real) and the actor π̂ produces 𝒯̂_1, 𝒯̂_2, 𝒯̂_3, ..., 𝒯̂_N (fake); a reward function scores them, and because the sampled 𝒯̂ are not differentiable, the actor is updated with "common" reinforcement learning.

• SeqGAN: the dataset provides real sentences t_1, t_2, t_3, ..., t_N and the generator produces fake sentences t̂_1, t̂_2, t̂_3, ..., t̂_N; the discriminator classifies real vs. fake. Because the sampled t̂ are not differentiable, the generator is updated via policy gradient (RL), using the discriminator's output as the reward.

30

SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient

• Method Details

SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient. Lantao Yu, Weinan Zhang, Jun Wang, Yong Yu. AAAI. 2016.

[Figure: the generator is trained as a language model, e.g. input "this is a bird", targets "is a bird <EOS>".]

• The generator is an LSTM or GRU language model

• Pretrain the generator with MLE before RL training, for better initialization

• The discriminator is a CNN
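A bare-bones REINFORCE-style generator update in the spirit of SeqGAN (a sketch only: the actual method uses Monte-Carlo rollouts to obtain a reward for every partial sequence, which is omitted here, and the generator/discriminator interfaces below are hypothetical):

```python
import torch

def pg_generator_step(generator, discriminator, g_opt, batch_size=32, max_len=20):
    """Sample sentences, score them with the discriminator, then do a REINFORCE update."""
    tokens, log_probs = generator.sample(batch_size, max_len)   # (B, T) ids, (B, T) log-probs
    with torch.no_grad():
        reward = discriminator(tokens)          # (B,) probability of being "real"
    baseline = reward.mean()                    # simple variance-reduction baseline
    loss = -((reward - baseline).unsqueeze(1) * log_probs).sum(dim=1).mean()
    g_opt.zero_grad(); loss.backward(); g_opt.step()
    return reward.mean().item()
```

Because the reward arrives only through sampled tokens, gradients cannot flow from the discriminator into the generator; the log-probabilities of the sampled tokens carry the learning signal instead.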

31

SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient

• Method Details

SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient. Lantao Yu, Weinan Zhang, Jun Wang, Yong Yu. AAAI. 2016.

32

SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient

• Results

SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient. Lantao Yu, Weinan Zhang, Jun Wang, Yong Yu. AAAI. 2016.

• Introduction
  • RNN Language Modelling
  • Generating Sentences from a Continuous Space
  • Recap: Inverse RL vs. GAN

• GAN+RL
  • SeqGAN
  • RankGAN
  • MaskGAN

• GANs
  • Adversarial Text Generation without Reinforcement Learning
  • GANs for Sequence Generation with the Gumbel-softmax

33

34

RankGAN: Adversarial Ranking for Language Generation

• Motivation

Existing GANs restrict the discriminator to be a binary classifier, which limits their learning capacity on tasks that must synthesise outputs with rich structure, such as natural language descriptions.

So RankGAN uses a ranker as the discriminator: an improved SeqGAN.

RankGAN: Adversarial Ranking for Language Generation. Kevin Lin, Dianqi Li, Xiaodong He, Zhengyou Zhang, Ming-Ting Sun. NeurIPS. 2017

35

RankGAN: Adversarial Ranking for Language Generation

• Method

RankGAN: Adversarial Ranking for Language Generation. Kevin Lin, Dianqi Li, Xiaodong He, Zhengyou Zhang, Ming-Ting Sun. NeurIPS. 2017

The generator tries to make the ranker rank its fake sentence at the top; this ranking score is a better reward function for RL training.

36

RankGAN: Adversarial Ranking for Language Generation

• Method

RankGAN: Adversarial Ranking for Language Generation. Kevin Lin, Dianqi Li, Xiaodong He, Zhengyou Zhang, Ming-Ting Sun. NeurIPS. 2017

• U is a set of real data used as references
• C⁻ is a set of randomly sampled fake data
• C⁺ is a set of randomly sampled real data
• P_h and G_θ are the real and fake (generated) distributions
• The "rank" of a data point is its similarity to the reference set U
• y_u and y_s are the feature vectors of the reference u and the input s (s is a single data point)

The similarity score of the input sequence s given a reference u:

The ranking score of a sequence s given a comparison set C:

The ranking score of the input sentence s:
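The three scores referenced above, reconstructed here from the cited RankGAN paper (γ is a temperature hyper-parameter; the notation follows that paper rather than the slide):

$$\alpha(s \mid u) = \frac{y_s \cdot y_u}{\lVert y_s \rVert \, \lVert y_u \rVert}$$

$$P(s \mid u, C) = \frac{\exp\big(\gamma\,\alpha(s \mid u)\big)}{\sum_{s' \in C'} \exp\big(\gamma\,\alpha(s' \mid u)\big)}, \qquad C' = C \cup \{s\}$$

$$R(s \mid U, C) = \mathbb{E}_{u \sim U}\big[\,P(s \mid u, C)\,\big]$$

This averaged ranking score replaces the binary discriminator probability both in the adversarial objective and in the policy-gradient reward for the generator.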

• Introduction
  • RNN Language Modelling
  • Generating Sentences from a Continuous Space
  • Recap: Inverse RL vs. GAN

• GAN+RL
  • SeqGAN
  • RankGAN
  • MaskGAN

• GANs
  • Adversarial Text Generation without Reinforcement Learning
  • GANs for Sequence Generation with the Gumbel-softmax

37

38

MaskGAN: Better Text Generation via Filling in the ____

• Motivation

SeqGAN and RankGAN use policy gradients for RL training, so there is only a single reward for the whole sentence.

Could we have a reward for each word?

MaskGAN uses actor-critic RL training.

MaskGAN: Better Text Generation via Filling in the ____ . William Fedus, Ian Goodfellow, Andrew M. Dai. ICLR 2018

39

MaskGAN: Better Text Generation via Filling in the ____

• Method

Both Generator and Discriminator are Seq2Seq

MaskGAN: Better Text Generation via Filling in the ____ . William Fedus, Ian Goodfellow, Andrew M. Dai. ICLR 2018

[Figure: the Seq2Seq generator fills in the masked positions of x to produce y; the Seq2Seq discriminator emits a real/fake probability for each filled-in token (unmasked tokens need no prediction), and these per-token probabilities provide the rewards.]
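A sketch of how per-token rewards and an actor-critic baseline might be combined (the reward r_t = log D form, the discounting, and all names below follow the general MaskGAN recipe but are written here as assumptions, not the paper's code):

```python
import torch

def per_token_advantages(d_real_prob, critic_values, gamma=0.99):
    """d_real_prob, critic_values: (B, T) tensors over the filled-in tokens."""
    rewards = torch.log(d_real_prob + 1e-8)           # per-token reward r_t = log D(token_t)
    # Discounted return-to-go at each position t: R_t = sum_k gamma^k * r_{t+k}
    returns = torch.zeros_like(rewards)
    running = torch.zeros(rewards.size(0))
    for t in reversed(range(rewards.size(1))):
        running = rewards[:, t] + gamma * running
        returns[:, t] = running
    return returns - critic_values                     # advantage = return minus critic baseline
```

The advantage at every position then weights that token's log-probability in the generator's policy-gradient update, giving a word-level learning signal instead of a single sentence-level reward.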

40

MaskGAN: Better Text Generation via Filling in the ____

• Results

• Conditional Sentence Generation == Inpainting

• Unconditional Sentence Generation == All words are masked

MaskGAN: Better Text Generation via Filling in the ____ . William Fedus, Ian Goodfellow, Andrew M. Dai. ICLR 2018

• Introduction
  • RNN Language Modelling
  • Generating Sentences from a Continuous Space
  • Recap: Inverse RL vs. GAN

• GAN+RL
  • SeqGAN
  • RankGAN
  • MaskGAN

• GANs
  • Adversarial Text Generation without Reinforcement Learning
  • GANs for Sequence Generation with the Gumbel-softmax

41

42

Adversarial Text Generation without Reinforcement Learning

Adversarial Text Generation without Reinforcement Learning. Donahue, David. Rumshisky, Anna. 2018

• Motivation

RL training is too slow …

• Method: Stage 1

The generator is a standard Seq2Seq model.

43

Adversarial Text Generation without Reinforcement Learning

• Method: Stage 2

Adversarial Text Generation without Reinforcement Learning. Donahue, David. Rumshisky, Anna. 2018
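For context, the cited paper's second stage trains an ordinary GAN over the continuous sentence representations produced by the stage-1 model, so no reinforcement learning is needed. A rough sketch under that reading (every module, size, and the simplified critic loss here are assumptions, not the paper's code):

```python
import torch
import torch.nn as nn

latent = 100
G = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(), nn.Linear(256, latent))  # noise -> fake sentence code
D = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(), nn.Linear(256, 1))       # sentence code -> realness score
g_opt = torch.optim.Adam(G.parameters(), lr=1e-4)
d_opt = torch.optim.Adam(D.parameters(), lr=1e-4)

def gan_step(real_codes):
    """real_codes: (B, latent) sentence embeddings from the stage-1 model."""
    z = torch.randn(real_codes.size(0), latent)
    fake_codes = G(z)
    # Critic/discriminator update: score real codes higher than fake ones.
    d_loss = D(fake_codes.detach()).mean() - D(real_codes).mean()
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()
    # Generator update: everything is continuous, so gradients flow straight through D.
    g_loss = -D(fake_codes).mean()
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()
```

Generated codes are finally decoded back into text by the stage-1 decoder.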

• Introduction
  • RNN Language Modelling
  • Generating Sentences from a Continuous Space
  • Recap: Inverse RL vs. GAN

• GAN+RL
  • SeqGAN
  • RankGAN
  • MaskGAN

• GANs
  • Adversarial Text Generation without Reinforcement Learning
  • GANs for Sequence Generation with the Gumbel-softmax

44

45

GANs for Sequence Generation with Gumbel-softmax

• Motivation: train like a normal GAN (end-to-end differentiable, without RL)

GANs for Sequences of Discrete Elements with the Gumbel-softmax Distribution. Matt Kusner, Jose Miguel Hernandez-Lobato. NeurIPS. 2016.

46

GANs for Sequence Generation with Gumbel-softmax

• SeqGAN

• Gumbel-softmax GAN

GANs for Sequences of Discrete Elements with the Gumbel-softmax Distribution. Matt Kusner, Jose Miguel Hernandez-Lobato. NeurIPS. 2016.

[Figure: side-by-side comparison. SeqGAN: a dataset of real sentences t_1, t_2, t_3, ..., t_N, an LSTM language-model generator producing fake sentences t̂_1, t̂_2, t̂_3, ..., t̂_N, an LSTM-encoder discriminator labelling real vs. fake, and a policy-gradient update for the generator. Gumbel-softmax GAN: the same setup, except the generator is an LSTM decoder driven by a noise vector z that emits Gumbel-softmax samples, so the fake sentences stay differentiable and the generator is updated directly through the discriminator.]
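A minimal Gumbel-softmax sampling sketch (PyTorch's torch.nn.functional.gumbel_softmax assumed; tau is the temperature), showing how a soft one-hot token lets gradients flow back into the generator:

```python
import torch
import torch.nn.functional as F

def differentiable_token(logits, tau=1.0, hard=False):
    """logits: (B, vocab). Returns a (soft) one-hot sample that gradients can flow through."""
    return F.gumbel_softmax(logits, tau=tau, hard=hard)

# The same trick by hand: add Gumbel(0, 1) noise to the logits, then apply a temperature softmax.
def gumbel_softmax_manual(logits, tau=1.0):
    gumbel = -torch.log(-torch.log(torch.rand_like(logits) + 1e-20) + 1e-20)
    return torch.softmax((logits + gumbel) / tau, dim=-1)
```

With hard=True the forward pass uses a discrete one-hot vector while the backward pass uses the soft relaxation (the straight-through variant).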

47

GANs for Sequence Generation with Gumbel-softmax

• Limitation

• The Gumbel-softmax gradients are too large, leading to poor performance.

GANs for Sequences of Discrete Elements with the Gumbel-softmax Distribution. Matt Kusner, Jose Miguel Hernandez-Lobato. NeurIPS. 2016.

• Introduction
  • RNN Language Modelling
  • Generating Sentences from a Continuous Space 2016
  • Recap: Inverse RL vs. GAN

• GAN+RL
  • SeqGAN 2017
  • RankGAN 2017
  • MaskGAN 2018

• GANs
  • Adversarial Text Generation without Reinforcement Learning 2018
  • GANs for Sequence Generation with the Gumbel-softmax 2016

48

Discreteness in Generative Models – Discrete Sequence Generation

Thanks

49