Question Answering - Princeton University · 2019-11-19 · Question Answering • Goal: build...
Question Answering
Fall 2019
COS 484: Natural Language Processing
Announcements
• Final project presentation: January 13, 10am-12pm
• Revised project proposal: November 22
• Come meet us during OHs!
Course planning
Transformers + Question Answering
Dialogue
Advanced topics in QA and others
RNNs vs Transformers
[Figure: the example sentence "the movie was terribly exciting !" passed through three stacked Transformer layers (layer 1 to layer 3)]
Transformers
(Vaswani et al, 2017): Attention is all you need
Key concepts:
• (scaled) dot-product attention
• Self-attention
• Multi-head self-attention
Recap: seq2seq with attention
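$$e^t_i = g(h^{enc}_i, h^{dec}_t)$$
$$\alpha^t_i = \frac{\exp(e^t_i)}{\sum_{j=1}^{n} \exp(e^t_j)}$$
$$a_t = \sum_{i=1}^{n} \alpha^t_i \, h^{enc}_i$$
Roles: $h^{enc}_i$ acts as the key (in the score) and as the value (in the weighted sum); $h^{dec}_t$ acts as the query.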
Generalized Attention
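• A query $q$ and a set of key-value pairs $(k_i, v_i)$ are mapped to an output
• Dot-product attention:
$$A(q, \{k_i, v_i\}) = \sum_i \frac{e^{q \cdot k_i}}{\sum_j e^{q \cdot k_j}} \, v_i, \qquad k_i, v_i, q \in \mathbb{R}^{d}$$
• If we have multiple queries:
$$A(Q, K, V) = \mathrm{softmax}(QK^\top)V, \qquad Q \in \mathbb{R}^{n_Q \times d}, \; K, V \in \mathbb{R}^{n \times d}$$
• Scaled dot-product attention:
$$A(Q, K, V) = \mathrm{softmax}\left(\frac{QK^\top}{\sqrt{d}}\right)V$$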
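A minimal sketch of scaled dot-product attention in plain Python, to make the matrix formula concrete. The function names and the list-of-lists layout are choices made for this illustration, not something from the slides; a real implementation would use a tensor library.

```python
import math

def softmax(row):
    # Subtract the max for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """Q: n_Q x d, K: n x d, V: n x d_v (lists of lists). Returns n_Q x d_v."""
    d = len(K[0])
    output = []
    for q in Q:
        # One score per key: (q . k) / sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        output.append([sum(w * v[c] for w, v in zip(weights, V))
                       for c in range(len(V[0]))])
    return output

# A zero query scores every key equally, so the output is the mean of the values.
out = scaled_dot_product_attention([[0.0, 0.0]],
                                   [[1.0, 0.0], [0.0, 1.0]],
                                   [[1.0, 2.0], [3.0, 4.0]])
print(out)
```

Dividing by $\sqrt{d}$ keeps the scores from growing with the dimension, which would otherwise push the softmax toward a near one-hot distribution.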
Self-attention
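• Input: $x_1, x_2, \ldots, x_n \in \mathbb{R}^{d_{in}}$
• Output: $h_1, h_2, \ldots, h_n \in \mathbb{R}^{d}$
• Key idea: let’s use each word as query and compute the attention with all the other words
• Input: $X \in \mathbb{R}^{n \times d_{in}}$
$$A(XW^Q, XW^K, XW^V) \in \mathbb{R}^{n \times d}$$
$$W^Q, W^K, W^V \in \mathbb{R}^{d_{in} \times d}$$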
Multi-head self-attention
One head is not expressive enough. Let’s have multiple heads!
$$A(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h) W^O$$
$$\mathrm{head}_i = A(XW^Q_i, XW^K_i, XW^V_i)$$
In practice, $h = 8$, $d = d_{out}/h$, $W^O \in \mathbb{R}^{d_{out} \times d_{out}}$
https://github.com/jessevig/bertviz
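The following sketch shows how the shapes fit together in multi-head self-attention: each head projects $X$ with its own $W^Q_i, W^K_i, W^V_i$, the head outputs are concatenated, and $W^O$ mixes them. It is an illustration in plain Python with randomly initialised weights; the helper names (`matmul`, `random_matrix`) and the toy sizes are assumptions made here, not part of the lecture.

```python
import math
import random

def matmul(A, B):
    # Plain list-of-lists matrix product.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V
    d = len(K[0])
    K_T = [list(col) for col in zip(*K)]
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(Q, K_T)]
    return matmul([softmax(row) for row in scores], V)

def multi_head_self_attention(X, heads, W_O):
    """heads: one (W_Q, W_K, W_V) triple per head, each of shape d_in x d."""
    head_outputs = [attention(matmul(X, W_Q), matmul(X, W_K), matmul(X, W_V))
                    for W_Q, W_K, W_V in heads]
    # Concatenate along the feature axis: n x (h * d)
    concat = [sum((head[i] for head in head_outputs), []) for i in range(len(X))]
    return matmul(concat, W_O)

def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
n, d_in, d_out, h = 5, 16, 16, 8      # toy sizes chosen for illustration
d = d_out // h                        # per-head dimension, d = d_out / h
X = random_matrix(n, d_in, rng)
heads = [tuple(random_matrix(d_in, d, rng) for _ in range(3)) for _ in range(h)]
W_O = random_matrix(d_out, d_out, rng)
H = multi_head_self_attention(X, heads, W_O)
print(len(H), len(H[0]))  # 5 16
```

Because $d = d_{out}/h$, the concatenated heads have width $d_{out}$ again, so the cost is comparable to a single head of full width.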
Putting it all together
• Each Transformer block has two sub-layers
  • Multi-head attention
  • 2-layer feedforward NN (with ReLU)
• Each sublayer has a residual connection and a layer normalization
LayerNorm(x + SubLayer(x))
(Ba et al, 2016): Layer Normalization
• Input layer has a positional encoding
• BERT_base: 12 layers, 12 heads, hidden size = 768, 110M parameters
• BERT_large: 24 layers, 16 heads, hidden size = 1024, 340M parameters
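As a rough illustration of the `LayerNorm(x + SubLayer(x))` pattern above, here is a sketch for a single vector. It omits the learned gain and bias of layer normalization, and `toy_sublayer` is a hypothetical stand-in for the attention or feedforward sub-layer; both simplifications are assumptions made for brevity.

```python
import math

def layer_norm(x, eps=1e-5):
    # Normalize one vector to zero mean and (approximately) unit variance.
    # The learned gain and bias parameters are left out of this sketch.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def residual_block(x, sublayer):
    # LayerNorm(x + SubLayer(x))
    y = sublayer(x)
    return layer_norm([a + b for a, b in zip(x, y)])

def toy_sublayer(x):
    # Hypothetical stand-in for a real sub-layer: an element-wise ReLU.
    return [max(0.0, v) for v in x]

out = residual_block([1.0, -2.0, 3.0, 0.5], toy_sublayer)
print(out)
```

The residual path lets the input pass through unchanged when the sub-layer contributes little, which is what makes deep stacks of these blocks trainable.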
Encoder-decoder architecture
(Vaswani et al, 2017): Attention is all you need
Question Answering
• Goal: build computer systems to answer questions
Question: When were the first pyramids built?
Answer: 2630 BC
Question: What’s the weather like in Princeton?
Answer: 42 F
Question: Why do we yawn?
Answer: When we’re bored or tired we don’t breathe as deeply as we normally do. This causes a drop in our blood-oxygen levels and yawning helps us counter-balance that.
Question: Where is Einstein’s house?
Answer: 112 Mercer St, Princeton, NJ 08540
Question Answering
• You can easily find these answers on Google today!
Question Answering
• People ask digital personal assistants lots of questions.
Question Answering
IBM Watson defeated two of Jeopardy's greatest champions in 2011
Why care about question answering?
• Lots of immediate applications: search engines, dialogue systems
• Question answering is an important testbed for evaluating how well computer systems understand human language
“Since questions can be devised to query any aspect of text comprehension, the ability to answer questions is the strongest possible demonstration of understanding.”
QA Taxonomy
• Factoid questions vs non-factoid questions
• Answers
  • A short span of text
  • A paragraph
  • Yes/No
  • A database entry
  • A list
• Context
  • A passage, a document, a large collection of documents
  • Knowledge base
  • Semi-structured tables
  • Images
Textual Question Answering
(Rajpurkar et al, 2016): SQuAD: 100,000+ Questions for Machine Comprehension of Text
Also called “Reading Comprehension”
Textual Question Answering
(Richardson et al, 2013): MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text
James the Turtle was always getting in trouble. Sometimes he'd reach into the freezer and empty out all the food. Other times he'd sled on the deck and get a splinter. His aunt Jane tried as hard as she could to keep him out of trouble, but he was sneaky and got into lots of trouble behind her back.
One day, James thought he would go into town and see what kind of trouble he could get into. He went to the grocery store and pulled all the pudding off the shelves and ate two jars. Then he walked to the fast food restaurant and ordered 15 bags of fries. He didn't pay, and instead headed home.
His aunt was waiting for him in his room. She told James that she loved him, but he would have to start acting like a well-behaved turtle.
After about a month, and after getting into lots of trouble, James finally made up his mind to be a better turtle.
1) What is the name of the trouble making turtle?
A) Fries
B) Pudding
C) James
D) Jane
2) What did James pull off of the shelves in the grocery store?
A) pudding
B) fries
C) food
D) splinters
Conversational Question Answering
The Virginia governor’s race, billed as the marquee battle of an otherwise anticlimactic 2013 election cycle, is shaping up to be a foregone conclusion. Democrat Terry McAuliffe, the longtime political fixer and moneyman, hasn’t trailed in a poll since May. Barring a political miracle, Republican Ken Cuccinelli will be delivering a concession speech on Tuesday evening in Richmond. In recent ...
Q: What are the candidates running for? A: Governor
Q: Where? A: Virginia
Q: Who is the democratic candidate? A: Terry McAuliffe
Q: Who is his opponent? A: Ken Cuccinelli
Q: What party does he belong to? A: Republican
Q: Which of them is winning?
(Reddy & Chen et al, 2019): CoQA: A Conversational Question Answering Challenge
Long-form Question Answering
https://ai.facebook.com/blog/longform-qa/ (Fan et al, 2019): ELI5: Long Form Question Answering
Open-domain Question Answering
(Chen et al, 2017): Reading Wikipedia to Answer Open-Domain Questions
DrQA
Knowledge Base Question Answering
(Berant et al, 2013): Semantic Parsing on Freebase from Question-Answer Pairs
Table-based Question Answering
(Pasupat and Liang, 2015): Compositional Semantic Parsing on Semi-Structured Tables.
Visual Question Answering
(Antol et al, 2015): Visual Question Answering
Stanford Question Answering Dataset (SQuAD)
• (passage, question, answer) triples
https://stanford-qa.com (Rajpurkar et al, 2016): SQuAD: 100,000+ Questions for Machine Comprehension of Text
• Passage is from Wikipedia, question is crowd-sourced
• Answer must be a span of text in the passage (a.k.a. “extractive question answering”)
• SQuAD 1.1: 100k answerable questions, SQuAD 2.0: another 50k unanswerable questions
Stanford Question Answering Dataset (SQuAD)
(Rajpurkar et al, 2016): SQuAD: 100,000+ Questions for Machine Comprehension of Text
SQuAD 1.1 evaluation:
• 3 gold answers are collected for each question
• Two metrics: exact match (EM) and F1
• Exact match: 1/0 accuracy on whether you match one of the three answers
• F1: treat each gold answer and the system output as a bag of words, compute precision, recall and their harmonic mean. Take the max of the three scores.
Q: Rather than taxation, what are private schools largely funded by? A: {tuition, charging their students tuition, tuition}
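The two metrics can be sketched as follows. This is a simplified illustration: it only lowercases and splits on whitespace, whereas the official SQuAD evaluation script applies further normalization (for example, stripping punctuation and articles). The function names and the prediction `"charging tuition"` are made up for this example.

```python
from collections import Counter

def tokens(text):
    # Simplified normalization (an assumption of this sketch):
    # lowercase and split on whitespace only.
    return text.lower().split()

def exact_match(prediction, gold_answers):
    # 1 if the prediction matches any gold answer, else 0.
    return int(any(tokens(prediction) == tokens(g) for g in gold_answers))

def f1(prediction, gold):
    pred, ref = tokens(prediction), tokens(gold)
    # Bag-of-words overlap: multiset intersection of the two token lists.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

def max_f1(prediction, gold_answers):
    # Score against each gold answer and keep the best.
    return max(f1(prediction, g) for g in gold_answers)

gold = ["tuition", "charging their students tuition", "tuition"]
print(exact_match("tuition", gold), max_f1("tuition", gold))
print(exact_match("charging tuition", gold), max_f1("charging tuition", gold))
```

For the prediction "charging tuition", EM is 0 but F1 is 2/3 against both distinct gold answers: precision 1/2 and recall 1 against "tuition", precision 1 and recall 1/2 against the longer answer.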
Feature-based models
• Generate a list of candidate answers $\{a_1, a_2, \ldots, a_M\}$
  • Considered only the constituents in parse trees
• Define a feature vector $\phi(p, q, a_i) \in \mathbb{R}^{d}$:
  • Word/bigram frequencies
  • Parse tree matches
  • Dependency labels, length, part-of-speech tags
• Apply a (multi-class) logistic regression model
(Rajpurkar et al, 2016): SQuAD: 100,000+ Questions for Machine Comprehension of Text
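One way to read the last step: each candidate $a_i$ gets a linear score $w \cdot \phi(p, q, a_i)$, and a softmax over the $M$ candidates turns the scores into a distribution. The sketch below assumes that formulation; the feature values and weights are invented placeholders, not features from the paper.

```python
import math

def candidate_probabilities(features, weights):
    """features: one feature vector phi(p, q, a_i) per candidate answer."""
    scores = [sum(w * f for w, f in zip(weights, phi)) for phi in features]
    # Softmax over candidates (max subtracted for numerical stability).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Three hypothetical candidates with three made-up features each.
features = [[1.0, 0.0, 2.0],
            [0.0, 1.0, 1.0],
            [3.0, 1.0, 0.0]]
weights = [0.5, -1.0, 0.25]   # placeholder weights; normally learned from data

probs = candidate_probabilities(features, weights)
best = max(range(len(probs)), key=probs.__getitem__)
print(best, probs)
```

Training would fit `weights` by maximizing the log-probability of the gold candidate for each (passage, question) pair.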
BiLSTM-based models
(Seo et al, 2017): Bidirectional Attention Flow for Machine Comprehension
BiDAF
BiLSTM-based models
(Seo et al, 2017): Bidirectional Attention Flow for Machine Comprehension
• Encode the question using word/character embeddings; pass them to a BiLSTM encoder
• Encode the passage similarly
• Passage-to-question and question-to-passage attention
• The entire model can be trained in an end-to-end way
• Modeling layer: another BiLSTM layer
• Output layer: two classifiers for predicting start and end points
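Given the two output distributions, a common way to pick the answer is to search for the span $(i, j)$ with $i \le j$ that maximizes $p_{start}(i) \cdot p_{end}(j)$. The sketch below assumes that decoding rule; the `max_len` cap and the probability values are illustrative choices, not details taken from the slides.

```python
def best_span(p_start, p_end, max_len=15):
    """Return ((start, end), score) maximizing p_start[i] * p_end[j] with i <= j."""
    best, best_score = (0, 0), -1.0
    for i, ps in enumerate(p_start):
        # Only consider end positions at or after the start, up to max_len tokens.
        for j in range(i, min(i + max_len, len(p_end))):
            score = ps * p_end[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best, best_score

# Made-up distributions over a 4-token passage.
p_start = [0.1, 0.6, 0.2, 0.1]
p_end = [0.5, 0.1, 0.3, 0.1]
span, score = best_span(p_start, p_end)
print(span, score)
```

In this example the most likely end position (index 0) comes before the most likely start (index 1), so taking the two argmaxes independently would give an invalid span; the joint search returns (1, 2) instead.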
BERT-based models
Pre-training
BERT-based models
• Concatenate question and passage as one single sequence separated with a [SEP] token, then pass it to the BERT encoder
• Train two classifiers on top of the passage tokens
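A sketch of how the input sequence might be assembled. BERT prepends a `[CLS]` token and closes each segment with `[SEP]`; segment ids distinguish question from passage. The whitespace tokenization here is a simplification (BERT actually uses WordPiece), and the function name and the example passage are made up for illustration.

```python
def build_bert_input(question, passage):
    # Whitespace tokenization is an assumption of this sketch;
    # a real pipeline would use BERT's WordPiece tokenizer.
    q_tokens = question.split()
    p_tokens = passage.split()
    tokens = ["[CLS]"] + q_tokens + ["[SEP]"] + p_tokens + ["[SEP]"]
    # Segment 0: [CLS] + question + first [SEP]; segment 1: passage + final [SEP].
    segment_ids = [0] * (len(q_tokens) + 2) + [1] * (len(p_tokens) + 1)
    # Positions of passage tokens: the start/end classifiers score only these.
    passage_start = len(q_tokens) + 2
    passage_positions = list(range(passage_start, passage_start + len(p_tokens)))
    return tokens, segment_ids, passage_positions

tokens, segment_ids, passage_positions = build_bert_input(
    "When were the first pyramids built?",
    "The first pyramids were built around 2630 BC",  # made-up passage
)
print(tokens)
print(segment_ids)
print(passage_positions)
```

The start and end classifiers are then applied to the encoder outputs at `passage_positions`, and a span is chosen from their scores.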
Experiments on SQuAD v1.1
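[Figure: bar chart of F1 on SQuAD v1.1 (axis from 40 to 100). Bar labels recovered: Logistic Regression, BiDAF++, "+" (rest of label lost), Human Performance, state-of-the-art (as of Nov 2019). Bar values recovered, in extraction order: 95.1, 91.2, 90.9, 85.8, 81.1, 51.0.]
*: single model only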
Is Reading Comprehension solved?
Nope, maybe the SQuAD dataset is solved.
Is Reading Comprehension solved?
(Jia et al, 2017): Adversarial Examples for Evaluating Reading Comprehension Systems
SQuAD Limitations
• SQuAD has a number of limitations:
  • Only span-based answers (no yes/no, counting, implicit why)
  • Questions were constructed looking at passages
    • Not genuine information needs
    • Generally greater lexical and syntactic matching between question and answer span
  • Barely any multi-fact/sentence inference beyond coreference
• Nevertheless, it is a well-targeted, well-structured, clean dataset
  • The most used and most competed-on QA dataset
  • A useful starting point for building systems in industry (although in-domain data always really helps!)
Slide credit: Chris Manning
DrQA Demo
https://github.com/facebookresearch/DrQA