Using Neural Machines - microsoft.com...RNN, method-for, QA “We introduce a RNN-based ... Y ielded...

Page 1:
Page 2:

Isabelle Augenstein, UCL / UCPH

Machine Reading Using Neural Machines

Page 3:

Goal: Fact Checking

query

“Unemployment in the US is 42%”

MachineReader

unemployment(US, 42%)

What is the stance of HRC on immigration?

stance(HRC, immigration, X)

SemEval 2016 Stance Detection, EMNLP 2016, SemEval 2017 RumourEval, Fake News Challenge 2017

Page 4:

Goal: Understanding Scientific Publications

query

MachineReader

Method A outperforms method B for task C

outperforms(A, B, C)

What models exist for question answering?

method_for_task(QA)

SemEval 2017 ScienceIE (organiser), ACL 2017, CoNLL 2017

Page 5:

What models exist for question answering?

method_for_task(QA)

MachineReader

query

query

Questions

What is the stance of HRC on immigration?

stance(HRC, immigration, X)

Page 6:

What models exist for question answering?

method_for_task(QA)

MachineReader

query

query

retrieval

RNN, method-for, QA

“We introduce a RNN-based method for QA”

“Immigrants welcome!”

Evidence

Questions

What is the stance of HRC on immigration?

stance(HRC, immigration, X)

Page 7:

What models exist for question answering?

method_for_task(QA)

MachineReader

query

query

retrieval

RNN, method-for, QA

“We introduce a RNN-based method for QA”

“Immigrants welcome!”

method_for_task(QA)

Representations

updates

answers

Evidence

Questions

What is the stance of HRC on immigration?

stance(HRC, immigration, X)

Page 8:

What models exist for question answering?

method_for_task(QA)

MachineReader

query

query

Methods based on RNNs are widely used

RNNs

answers

HRC is in favour of immigration

favour

answers

retrieval

RNN, method-for, QA

“We introduce a RNN-based method for QA”

“Immigrants welcome!”

method_for_task(QA)

Representations

updates

answers

Evidence

Questions

Answers

What is the stance of HRC on immigration?

stance(HRC, immigration, X)
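
The query notation on these slides, such as stance(HRC, immigration, X), can be read as a predicate with an open variable that the reader fills in from extracted facts. The following is a minimal sketch of that idea, not the system described in the talk: the names Var, Fact, match, answer and FACTS are hypothetical, the facts are typed in by hand rather than read from text, and method_for_task is given a second argument here purely for illustration.

```python
from typing import NamedTuple, Optional


class Var(NamedTuple):
    """A query variable, such as X in stance(HRC, immigration, X)."""
    name: str


class Fact(NamedTuple):
    """A predicate applied to arguments, e.g. stance(HRC, immigration, favour)."""
    predicate: str
    args: tuple


def match(query: Fact, fact: Fact) -> Optional[dict]:
    """Return variable bindings if the fact satisfies the query, else None."""
    if query.predicate != fact.predicate or len(query.args) != len(fact.args):
        return None
    bindings = {}
    for q_arg, f_arg in zip(query.args, fact.args):
        if isinstance(q_arg, Var):
            # A variable must bind to the same value everywhere it occurs.
            if bindings.setdefault(q_arg.name, f_arg) != f_arg:
                return None
        elif q_arg != f_arg:
            return None
    return bindings


def answer(query: Fact, facts: list) -> list:
    """Collect the bindings of every stored fact that matches the query."""
    results = []
    for fact in facts:
        bindings = match(query, fact)
        if bindings is not None:
            results.append(bindings)
    return results


# Hand-written facts standing in for what a machine reader would extract
# (assumption: the real system derives these from evidence text).
FACTS = [
    Fact("stance", ("HRC", "immigration", "favour")),
    Fact("method_for_task", ("QA", "RNN")),
]

stance_query = Fact("stance", ("HRC", "immigration", Var("X")))
print(answer(stance_query, FACTS))
```

In the talk the facts come from neural representations updated by retrieved evidence; the lookup above only illustrates what the structured query and its answer look like.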

Page 9:

Task: Stance Detection

Document: “… Led Zeppelin’s Robert Plant turned down £500 MILLION to reform supergroup. …”

Headline/Target: “Robert Plant Ripped up $800M Led Zeppelin Reunion Contract”

-> Confirming

Task: How do sequences relate to one another? (e.g. for, against, neutral)

Problems:

• Interpretation depends on headline/target

• Target not always seen in training set

• However, overlap between target + document

Fake News Challenge 2017
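
The last problem above notes that target and document tend to overlap. A minimal sketch of how such overlap could be measured as a feature follows; the function names and the tokenisation are assumptions, and the score only indicates that the two texts are related, not which stance the document takes.

```python
import re


def tokens(text: str) -> set:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def headline_overlap(headline: str, document: str) -> float:
    """Fraction of headline tokens that also occur in the document."""
    head = tokens(headline)
    if not head:
        return 0.0
    return len(head & tokens(document)) / len(head)


headline = "Robert Plant Ripped up $800M Led Zeppelin Reunion Contract"
document = "Led Zeppelin's Robert Plant turned down £500 MILLION to reform supergroup."
print(round(headline_overlap(headline, document), 3))
```

A stance classifier would still need to read both sequences, since a document that denies the headline can share just as many words with it as one that confirms it.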

Page 10:

Task: Stance Detection

Tweet: “A foetus has rights too!”

Target: Legalization of Abortion, Atheism, Pro Life, …

Task: How do sequences relate to one another? (e.g. for, against, neutral)

Problems:

• Interpretation depends on target

• Target not always mentioned in tweet

• No training data for test target -> target-independent / unseen headline approach needed

SemEval 2016 Stance Detection, EMNLP 2016

Page 11:

Task: Scientific Paper Summarisation

A Supervised Approach to Extractive Summarisation of Scientific Papers

Ed Collins, Isabelle Augenstein, Sebastian Riedel

{edward.collins.13 | i.augenstein | s.riedel}@ucl.ac.uk

The Task

Select the sentences from within a paper which best summarise that paper. Treated as a binary classification task - each sentence classified as either summary or not.

Challenges: Length, Data, Context

Data

• 10148 Computer Science publications from Science Direct, each with author-written highlight statements

• Each highlight statement is a good summary even out of context

• Used the ROUGE-L metric to find more sentences like this in each paper, treated these and highlights as positive data, sampled an equal amount of negative data

• Yielded 395160 items of data for training and testing

• 150 full papers used to evaluate summariser performance, measured with ROUGE-L
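
The dataset extension described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: ROUGE-L is computed here as the F1 of the longest common subsequence over whitespace tokens, and the 0.5 similarity threshold, the function names and the seed are all hypothetical.

```python
import random


def lcs_length(a, b) -> int:
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str) -> float:
    """ROUGE-L as LCS-based F1 over lowercased whitespace tokens (assumed variant)."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def build_examples(sentences, highlights, threshold=0.5, seed=0):
    """Label highlights and highlight-like sentences 1, then sample as many 0s."""
    scored = [(s, max(rouge_l(s, h) for h in highlights)) for s in sentences]
    positives = list(highlights) + [s for s, score in scored if score >= threshold]
    negatives = [s for s, score in scored if score < threshold]
    rng = random.Random(seed)
    sampled = rng.sample(negatives, min(len(positives), len(negatives)))
    return [(s, 1) for s in positives] + [(s, 0) for s in sampled]


highlights = ["we present the domain name graph"]
sentences = [
    "We present the domain name graph in section two",
    "The weather was nice",
    "Related work is discussed next",
]
for text, label in build_examples(sentences, highlights):
    print(label, text)
```

The threshold controls how aggressively the positive set grows; the slides do not state which value, if any, was used.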

Approach

• Sentences can be good summaries without any context, tried to identify these for summaries using a neural network

• Dealt with length by including sentence location in paper as a feature to only select from sections most relevant to summaries

• Many classifiers trained: features only, sentence text only, individual features

• Strongest feature was AbstractROUGE, ROUGE-L score of sentence with abstract, taking inspiration from other work on summarising papers

Results & Conclusion

Our model significantly outperforms many baselines. The best models were ensembles of different classifiers. Classifiers which used a neural network to read text suffered no significant changes to performance if features were missing. Classifiers trained on the automatically extended dataset performed better than those trained on the original dataset. Remaining challenges are to encode the whole document, rather than just a sentence, with neural networks to better understand its global context.

Code: https://github.com/EdCo95/scientific-paper-summarisation
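
The poster reports that ensembles of different classifiers worked best but does not say how they were combined. One common choice, shown here purely as an assumption, is to average the probabilities of the individual classifiers. The two toy classifiers and all function names below are hypothetical stand-ins, not models from the linked repository.

```python
from typing import Callable, Sequence


def ensemble_probability(
    classifiers: Sequence[Callable[[str], float]], sentence: str
) -> float:
    """Average the summary probabilities given by each classifier."""
    if not classifiers:
        raise ValueError("at least one classifier is required")
    return sum(clf(sentence) for clf in classifiers) / len(classifiers)


def is_summary_sentence(classifiers, sentence: str, threshold: float = 0.5) -> bool:
    """Binary decision: summary sentence if the averaged probability is high enough."""
    return ensemble_probability(classifiers, sentence) >= threshold


def text_only(sentence: str) -> float:
    """Toy stand-in for a classifier that reads the sentence text."""
    return 0.9 if "we present" in sentence.lower() else 0.2


def features_only(sentence: str) -> float:
    """Toy stand-in for a classifier that uses hand-built features."""
    return 0.6


models = [text_only, features_only]
print(is_summary_sentence(models, "We present the domain name graph."))
print(is_summary_sentence(models, "The rest of the paper is organised as follows."))
```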

Length: Papers are long - a lot of information to summarise

Data: There are no big datasets to train data-hungry learning algorithms

Context: Sentences need to make sense out of context

• Extractive summarisation. Binary classification task: for each sentence, is it a summary statement or not?

• Fine neural encoding of current sentence

• Simple, coarse features for paragraph and global (e.g. location) features

Page 12:

Science Paper Summarisation Dataset

Paper title: Statistical estimation of the names of HTTPS servers

Author-Written Highlights (= Summary Statements!)

- We present the domain name graph (DNG), which is a formal expression that can keep track of cname chains and (…)

- …

Summary statements highlighted in main text:

In this work, we present a novel methodology that aims to (…)

The key contributions of this work are as follows.

We present the domain name graph (DNG), which is a formal expression that can keep track of cname chains (challenge 1) and (…)

Page 13:

Research Challenges

- Small Datasets

  - Weak supervision, multi-task learning, semi-supervision

- Computational cost of neural machine reading

  - Makes small benchmark datasets more attractive for research

- Few datasets with large or multiple documents

- Proliferation of datasets, some toy tasks or not well designed

  - E.g. small vocabulary, easy to “game”

Page 14:

Thank you!

Questions?

Papers at ACL 2017: ACL 2017, CoNLL 2017, SemEval 2017 ScienceIE (organiser), SemEval 2017 RumourEval

[email protected]

@iaugenstein

github.com/isabelleaugenstein

Page 15: