Language Models are Unsupervised Multitask Learners
Author: Alec Radford
OpenAI
Presenter: Faizan Ahmad
https://qdata.github.io/deep2Read
Author: Alec Radford | Language Models are Unsupervised Multitask Learners | Presenter: Faizan Ahmad | https://qdata.github.io/deep2Read | 1 / 14
Outline
1 Introduction
2 Related Work
3 OpenAI Generative Pre-Training (GPT) 2
4 Evaluation and Results
5 Discussion
Introduction
Pre-training and fine-tuning is the current trend in NLP
Can we do without fine-tuning?
Can a single model learn multiple tasks?
Language models to the rescue
Related Work: Multitask Learning
Task-Specific Architectures
Dominant for the last 7-10 years
Single Model Fine-tuned on Different Tasks
BERT by Google
OpenAI GPT
Single Model for Multiple Tasks without Fine-tuning
Reading Comprehension
OpenAI Generative Pre-Training (GPT) 2
Language modeling at the core
Train a large language model and solve multiple tasks with it
Figure: GPT-1 Model
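Language modeling means assigning a probability to text by factoring it token by token, p(x) = prod_t p(x_t | x_1, ..., x_{t-1}); any task that can be phrased as text completion can then be read off this one model. A minimal sketch of that factorization, using a hypothetical toy bigram table in place of a trained Transformer:

```python
import math

# Hypothetical bigram conditionals, standing in for a trained model's
# next-token distribution. Values are illustrative only.
bigram = {
    ("<s>", "the"): 0.5,
    ("the", "cat"): 0.2,
    ("cat", "sat"): 0.3,
}

def sentence_logprob(tokens):
    """log p(x) = sum over t of log p(x_t | previous token)."""
    lp = 0.0
    prev = "<s>"  # start-of-sequence marker
    for tok in tokens:
        lp += math.log(bigram[(prev, tok)])
        prev = tok
    return lp

lp = sentence_logprob(["the", "cat", "sat"])
```

A real language model conditions on the full prefix rather than one previous token, but the chain-rule scoring loop is the same.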
OpenAI Generative Pre-Training (GPT) 2
Which dataset to use?
Character-based input or word-based?
How many parameters?
OpenAI Generative Pre-Training (GPT) 2: Dataset (WebText)
Existing large web-scraped datasets are of poor quality
A new high-quality dataset is needed
How?
Scrape outbound links from Reddit
Keep only links with at least 3 karma
45 million links, 8 million documents
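The karma threshold is a cheap human-curation signal: a link someone upvoted is more likely to point at readable text. A minimal sketch of that filter, using hypothetical record fields (`url`, `karma`) rather than the actual WebText scraping pipeline:

```python
# Hypothetical records standing in for scraped Reddit submissions;
# the field names are illustrative, not the real WebText schema.
links = [
    {"url": "https://example.com/a", "karma": 5},
    {"url": "https://example.com/b", "karma": 1},
    {"url": "https://example.com/c", "karma": 3},
]

# Keep only outbound links whose submission earned at least 3 karma,
# the quality heuristic described for WebText.
kept = [link["url"] for link in links if link["karma"] >= 3]
```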
OpenAI Generative Pre-Training (GPT) 2: Input
Word-level input converges slowly
Character-level input does not work very well
Need a middle ground
Byte Pair Encoding to the rescue (subwords)
Figure: BPE
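BPE builds the subword vocabulary greedily: start from characters, repeatedly find the most frequent adjacent symbol pair in the corpus, and merge it into a new symbol. A minimal sketch on a toy corpus (not the byte-level variant GPT-2 actually uses):

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across all words; return the top pair."""
    pairs = Counter()
    for word in words:
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += 1
    return pairs.most_common(1)[0][0] if pairs else None

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = []
    for word in words:
        out, i = [], 0
        while i < len(word):
            if i < len(word) - 1 and (word[i], word[i + 1]) == pair:
                out.append(word[i] + word[i + 1])
                i += 2
            else:
                out.append(word[i])
                i += 1
        merged.append(out)
    return merged

# Toy corpus: each word starts as a list of characters.
corpus = [list("lower"), list("lowest"), list("low"), list("low")]
for _ in range(3):                       # three greedy merge steps
    corpus = merge_pair(corpus, most_frequent_pair(corpus))
```

After three merges, frequent words like "low" collapse to a single symbol while rarer words stay split into subwords, which is exactly the middle ground between word-level and character-level input.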
OpenAI Generative Pre-Training (GPT) 2: Parameters
Bigger is better
Figure: Effect of Parameters
Results: Zero-shot Setting
Figure: Summary Results
Results: Children’s Book Test
Fill-in-the-blank task
10 candidate choices per blank
New state-of-the-art results
Figure: CBT Results
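A language model handles this zero-shot: fill the blank with each of the 10 candidates in turn and pick the completion the model assigns the highest probability. A minimal sketch, where `toy_logprob` is a hypothetical stand-in for a real LM scorer:

```python
def score_choice(context, choice, logprob_fn):
    """Fill the blank with `choice` and score the completed sentence
    with the language model's log-probability."""
    return logprob_fn(context.replace("_____", choice))

def answer_cloze(context, choices, logprob_fn):
    # Pick the candidate the LM finds most probable in context.
    return max(choices, key=lambda c: score_choice(context, c, logprob_fn))

# Hypothetical scorer for illustration: favors completions with "dog".
toy_logprob = lambda s: 0.0 if "dog" in s else -5.0
best = answer_cloze("The _____ barked loudly.", ["cat", "dog", "bird"],
                    toy_logprob)
```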
Results: LAMBADA
Predict the last word of a passage of at least 50 tokens
New state of the art results
Figure: Summary Results
Results: Qualitative Examples
Discussion
New trend in deep learning for NLP
High quality text generation
Adversarial attacks on these models?