Language Models are Unsupervised Multitask Learners

Author: Alec Radford

OpenAI

Presenter: Faizan Ahmad (https://qdata.github.io/deep2Read)

Outline

1 Introduction

2 Related Work

3 OpenAI Generative Pre-Training (GPT) 2

4 Evaluation and Results

5 Discussion

Introduction

Pre-training and fine-tuning: the current trend in NLP

Can we do without fine-tuning?

Can a single model learn multiple tasks?

Language models to the rescue

Related Work: Multitask Learning

Task-Specific Architectures

Dominant approach over the last 7-10 years

Single Model Finetuned on Different Tasks

BERT by Google

OpenAI GPT

Single Model for Multiple Tasks without Finetuning

Reading Comprehension

OpenAI Generative Pre-Training (GPT) 2

Language modeling at the core

Train a large language model and solve multiple tasks with it

Figure: GPT-1 Model
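
The core idea in code, as a minimal sketch: one trained language model is reused for every task, and the task itself is specified in the prompt, so the model effectively estimates p(output | input, task) with no task-specific weights or fine-tuning. The lm_generate function and the exact prompt strings below are hypothetical stand-ins (the "TL;DR:" summarization cue is the one described in the paper).

    # Hypothetical interface: lm_generate(prompt) returns the model's continuation.
    # The same trained language model is reused for every task; only the prompt changes.
    def lm_generate(prompt: str) -> str:
        # Stand-in for decoding from a trained Transformer language model.
        raise NotImplementedError

    def translate(sentence: str) -> str:
        # The task is specified in natural language, not in the architecture.
        return lm_generate("Translate English to French:\n" + sentence + " =>")

    def summarize(article: str) -> str:
        # "TL;DR:" is the summarization cue described in the paper.
        return lm_generate(article + "\nTL;DR:")

    def answer(question: str, context: str) -> str:
        return lm_generate(context + "\nQ: " + question + "\nA:")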

OpenAI Generative Pre-Training (GPT) 2

Which dataset to use?

Character based input or word based?

How many parameters?

OpenAI Generative Pre-Training (GPT) 2: Dataset (WebText)

Existing large-scale datasets are of poor quality

Need a new, higher-quality dataset

How?

Scrape outbound links from Reddit

Keep links whose posts received at least 3 karma

45 million links, 8 million documents
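
A rough sketch of the filtering recipe above, not the authors' actual pipeline: keep only outbound links whose Reddit submissions met the karma threshold, deduplicate URLs, then fetch the documents. The submissions list and fetch_text helper are hypothetical.

    # Hypothetical records: (url, karma) pairs harvested from Reddit submissions.
    submissions = [
        ("https://example.com/a", 5),
        ("https://example.com/b", 1),
        ("https://example.com/a", 12),  # same link posted twice
    ]

    KARMA_THRESHOLD = 3  # keep links whose posts received at least 3 karma

    def fetch_text(url: str) -> str:
        # Stand-in for downloading a page and extracting its text.
        return "<contents of " + url + ">"

    # Filter by karma, deduplicate URLs, then fetch the surviving documents.
    kept_urls = {url for url, karma in submissions if karma >= KARMA_THRESHOLD}
    documents = [fetch_text(url) for url in sorted(kept_urls)]
    print(len(kept_urls), "links kept,", len(documents), "documents")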

OpenAI Generative Pre-Training (GPT) 2: Input

Word-level input converges slowly

Character-level input does not perform as well

Need a middle ground

Byte Pair Encoding (BPE) to the rescue: subword units

Figure: BPE
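
A toy illustration of the BPE idea (character-level merges on a tiny corpus; the paper actually operates on bytes): repeatedly merge the most frequent adjacent symbol pair, so common words collapse into single tokens while rare words remain decomposable into subword units.

    from collections import Counter

    def most_frequent_pair(vocab):
        # Count adjacent symbol pairs across the corpus (word -> frequency).
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        return max(pairs, key=pairs.get)

    def merge_pair(vocab, pair):
        # Replace every occurrence of `pair` with a single merged symbol.
        merged = {}
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            merged[tuple(out)] = freq
        return merged

    # Toy corpus: each word is split into characters and mapped to its frequency.
    vocab = {tuple("lower"): 5, tuple("lowest"): 2, tuple("newer"): 6}
    for _ in range(3):  # a few merge steps
        vocab = merge_pair(vocab, most_frequent_pair(vocab))
    print(vocab)  # frequent character pairs have been merged into subwords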

OpenAI Generative Pre-Training (GPT) 2: Parameters

Bigger is better: performance keeps improving with model size

Figure: Effect of Parameters
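
For context on "bigger is better", the four model sizes compared in the paper, quoted from memory as a rough reference (a sketch, not an official config file):

    # Approximate GPT-2 model family as reported in the paper:
    # total parameters -> Transformer layers and hidden size d_model.
    gpt2_sizes = {
        "117M":  {"layers": 12, "d_model": 768},
        "345M":  {"layers": 24, "d_model": 1024},
        "762M":  {"layers": 36, "d_model": 1280},
        "1542M": {"layers": 48, "d_model": 1600},
    }
    for name, cfg in gpt2_sizes.items():
        print(name, cfg)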

Results: Zero-shot Setting

Figure: Summary Results

Results: Children's Book Test

Fill in the blanks

10 Choices

New state-of-the-art results

Figure: CBT Results
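
One way a plain language model can answer a cloze task like CBT, as a minimal sketch assuming a hypothetical lm_logprob scorer and a "_____" blank marker (not the authors' exact evaluation code): fill the blank with each of the 10 candidates, score the completed passage, and keep the most probable one.

    def lm_logprob(text: str) -> float:
        # Stand-in for the log-probability a trained LM assigns to `text`.
        raise NotImplementedError

    def answer_cloze(passage_with_blank: str, candidates: list[str]) -> str:
        # Substitute each candidate into the blank and keep the highest-scoring one.
        scores = {
            c: lm_logprob(passage_with_blank.replace("_____", c))
            for c in candidates
        }
        return max(scores, key=scores.get)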

Results: LAMBADA

Predict the final word of a passage that requires at least 50 tokens of context

New state-of-the-art results

Figure: Summary Results
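
LAMBADA evaluation in rough outline, assuming a hypothetical lm_next_word predictor: the model reads the long context, predicts the final word, and accuracy is exact match on that word.

    def lm_next_word(context: str) -> str:
        # Stand-in for greedily decoding the single most likely next word.
        raise NotImplementedError

    def lambada_accuracy(passages):
        # Each passage's target is its final word; everything before it is context.
        correct = 0
        for passage in passages:
            context, target = passage.rsplit(" ", 1)
            correct += int(lm_next_word(context) == target)
        return correct / len(passages)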

Results: Qualitative Results

Discussion

New trend in deep learning for NLP

High-quality text generation

Adversarial attacks on these models?
