FAKE NEWS RESEARCH – RECENT ADVANCES IN FACT CHECKING AND CLAIM
VERIFICATION
Presented by – Archita Pathak
THE SCIENCE OF FAKE NEWS
Lazer, D. M., Baum, M. A., Benkler, Y., Berinsky, A. J., Greenhill, K. M., Menczer, F., ... & Schudson, M. "The science of fake news." Science, 359(6380), 1094–1096 (2018).
Introduction
■ Fake News: fabricated information that mimics news media content in form but not in
organizational process or intent.
■ Parasitic on standard news outlets, simultaneously benefiting from and undermining their
credibility
■ General trust in the mass media has collapsed to historic lows
Fake News Properties
How common is fake news?
• During the 2016 US presidential election, the average American encountered between one and three fake news stories.
• On Twitter, falsehood spreads faster than truth, especially when the topic is politics.

What is the impact?
• Lesser electoral impact (e.g., influencing a person to vote for another candidate).
• Major social and behavioral impacts: increased cynicism and extremism.

What interventions can stem the flow and influence?
• Empowering individuals (fact checking and educational training).
• Platform-based detection.
• Government intervention.
PROGRESS TO DATE
Pathak, Archita, and Rohini K. Srihari. "BREAKING! Presenting Fake News Corpus for Automated Fact Checking." In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, pp. 357–362 (2019).
Definitions
■ Fake News: Verifiably false or misleading information that is created,
presented and disseminated for monetary gain or to intentionally
deceive the public, and in any event to cause public harm (European
Commission, 2018).
■ Claim: Assertion of fact/event/opinion to influence reader perception.
Research Questions
Given an article containing a set of claims {c1, c2, c3, …, cn}:
1. Can we detect language clues for obvious fake news detection?
2. Can we automatically identify which claims play a key role in influencing reader perception?
3. Can we verify the veracity of claims by finding evidence {e1, e2, …, em} from trusted sources?
4. Can we develop robust, scalable models for claim detection and verification?
5. Can we explain the decisions made by the models?
Previous Work
■ Broad classification into "fake" and "real"
– Based on user response (social media analysis, source identification, etc.)
– Based on linguistic features (stylometry, LIWC features, n-grams, word count, etc.)
– Based on pattern-learning models (machine learning and deep learning models for NLP)
Research Overview
■ Dataset Creation
■ Broad Classification: based on writing style, captured via orthographic and morphological features
■ Fine-Grained Classification: classification based on the veracity of claims, comprising claim detection, claim verification, and explanation (important to overcome issues like confirmation bias)
Completed Tasks
■ Dataset pipeline: starting from working links in the Stanford/NYU dataset, collected fake and compelling articles; finalized 26 + 679 articles on the 2016 US presidential election; automatically cleaned out gibberish like [MORE], [CLICK HERE], etc.; manually categorized articles based on the veracity of their assertions.
■ Manual annotation labels:
– False: invented lies written in a compelling way.
– Half-baked/Partial truth: manipulating true events to suit an agenda.
– Opinions stated as facts: written in a third-person narrative with no disclaimer that the story is a personal opinion.
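The automated cleaning step can be sketched with a simple regular expression. This is an illustrative reconstruction, not the authors' actual cleaning code; the token list and function name are assumptions based on the examples given on the slide.

```python
import re

# Hypothetical cleaning step: strip bracketed boilerplate tokens such as
# [MORE] or [CLICK HERE] from article text (token list is illustrative).
GIBBERISH = re.compile(r"\[(?:MORE|CLICK HERE)\]", flags=re.IGNORECASE)

def clean_article(text: str) -> str:
    """Remove bracketed boilerplate and collapse leftover whitespace."""
    text = GIBBERISH.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()
```

In practice the pattern would be extended with whatever navigation tokens appear in the crawled pages.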
Label Description & Comparison
■ Based on the percentage of claims verified, we categorize the entire article as:
– False: we couldn't find any evidence that supports any of the claims.
– Half-baked/Partial truth: we could find refuting evidence for some claims and supporting evidence for others.
– Opinions stated as facts: there are opinions in the article.
■ Comparison with PolitiFact labels:
– PolitiFact is an organization that manually labels political statements into 6 categories:
– True: the statement is accurate and there is nothing significant missing.
– Mostly True: the statement is accurate but needs clarification.
– Half True: the statement is partially accurate but leaves out important details.
– Mostly False: the statement contains an element of truth but ignores critical facts that would give a different impression.
– False: the statement is not accurate.
– Pants on Fire: the statement is not accurate and makes a ridiculous claim.
Label Description & Comparison
■ Comparison with (Rashkin et al., 2017):
– Satire: mimics real news but still cues the reader that it is not meant to be taken seriously.
– Hoax: convinces readers of the validity of a paranoia-fueled story.
– Propaganda: misleads readers so that they believe a particular political/social agenda.
Corpus Details
■ URL: URL of the article.
■ Authors: can contain anonymous authors.
■ Content: collection of assertions (gibberish removed).
■ Headline: headline of the article.
■ Primary Label: 1. False, 2. Partial Truth, 3. Opinions.
■ Secondary Label: 1. Fake, 2. Questionable.
[Figure: Top 20 most common keywords in the fake news corpus]
Challenge: distinguishing fake news from mainstream news

Attributes       | Mainstream | Questionable
Word count range | 20–100     | 21–100
Char count range | 89–700     | 109–691
Uppercase words  | 0–14       | 0–8

Mainstream example: WASHINGTON - An exhausted Iraqi Army faces daunting obstacles on the battlefield that will most likely delay for months a major offensive on the Islamic State stronghold of Mosul, American and allied officials say. The delay is expected despite American efforts to keep Iraq's creaky war machine on track. Although President Obama vowed to end the United States' role in the war in Iraq, in the last two years the American military has increasingly provided logistics to prop up the Iraqi military, which has struggled to move basics like food, water and ammunition to its troops.

Questionable example: WASHINGTON - Hillary Clinton is being accused of knowingly allowing American weapons into the hands of ISIS terrorists. Weapons that Hillary Clinton sent off to Qatar, ostensibly designed to give to the rebels in Libya, eventually made their way to Syria to assist the overthrow of the Assad regime. The folks fighting against Assad were ISIS and al-Qaeda jihadists.
Classification Model
■ Bi-directional architecture using a bi-LSTM (128 units in each layer) and character embeddings, to learn orthographic and morphological features of the text, implemented using a 1-D CNN with temporal max-pooling.

Split | Random: Questionable (1) | Random: Mainstream (0) | K-Fold: Questionable (1) | K-Fold: Mainstream (0)
Train | 406 | 5334 | 396 | 5343
Test  | 90  | 1345 | 100 | 1336

■ Evaluation results of our model over various metrics: the performance of stratified K-fold is exceptionally good in terms of ROC and F1 scores.
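A minimal PyTorch sketch of the described architecture: character embeddings fed through a 1-D CNN with temporal max-pooling, then a bi-LSTM with 128 units per direction. All hyper-parameters other than the 128 LSTM units, and the input layout (character IDs per word), are illustrative assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

class CharBiLSTMClassifier(nn.Module):
    """Sketch: char-CNN features per word, bi-LSTM over words, binary output
    for questionable (1) vs. mainstream (0)."""
    def __init__(self, n_chars=100, char_dim=32, conv_dim=64):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.conv = nn.Conv1d(char_dim, conv_dim, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(conv_dim, 128, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * 128, 1)

    def forward(self, char_ids):                       # (batch, words, chars)
        b, w, c = char_ids.shape
        x = self.char_emb(char_ids.view(b * w, c))     # (b*w, chars, char_dim)
        x = torch.relu(self.conv(x.transpose(1, 2)))   # (b*w, conv_dim, chars)
        x = x.max(dim=2).values.view(b, w, -1)         # temporal max-pooling
        h, _ = self.lstm(x)                            # (b, words, 256)
        return self.out(h[:, -1])                      # one logit per article
```

A sigmoid over the logit would give the questionable-vs-mainstream probability used for the ROC/F1 evaluation.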
TRUTH OF VARYING SHADES: Analyzing Language in Fake News and Political Fact-Checking
Rashkin, Hannah, Eunsol Choi, Jin Yea Jang, Svitlana Volkova, and Yejin Choi. "Truth of varying shades: Analyzing language in fake news and political fact-checking." In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 2931–2937 (2017).
Overview
■ Presents an analytic study of the language of news media in the context of political fact-checking and fake news detection.
■ Presents a dataset of fake news articles crawled from fake news domains.
■ 4 categories: Trusted News, Hoax, Satire, Propaganda.
■ Compares the language of real news with that of satire, hoaxes, and propaganda to find linguistic characteristics of untrustworthy text.
■ Presents a case study based on PolitiFact.com using its factuality judgments on a 6-point scale.
Fake News Analysis – Linguistic Discussion
■ Ratio refers to how frequently a feature appears in fake articles compared to trusted ones.
Fake News Analysis – News Reliability Prediction
FEVER: A Large-scale Dataset for Fact Extraction and VERification
James Thorne, Andreas Vlachos, Christos Christodoulopoulos, and Arpit Mittal. "FEVER: a large-scale dataset for fact extraction and verification." In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pp. 809–819 (2018).
Overview
■ 185,445 claims manually generated by altering sentences extracted from the June 2017 Wikipedia dump.
■ Manually annotated as SUPPORTED, REFUTED, or NOTENOUGHINFO.
– For the first two classes, annotators also recorded the sentence(s) used as evidence for the judgement.
■ To characterize the challenges posed by this dataset, the authors developed a pipeline approach for the claim verification task.
■ Results:
– 31.87% accuracy when requiring correct evidence to be retrieved for claims labeled SUPPORTED or REFUTED.
– 50.91% accuracy if the correct evidence is ignored.
BASELINE SYSTEM
■ Pipeline approach with the following flow:
– Document Retrieval: k-nearest documents using cosine similarity.
– Sentence Selection: simple IR methods to rank sentences.
– Recognizing Textual Entailment: state-of-the-art model in RTE, decomposable attention (DA).
■ Evaluation measures:
– NOSCOREEV: accuracy of claim verification, neglecting the validity of evidence.
– SCOREEV: accuracy of claim verification with a requirement that the predicted evidence fully covers the gold evidence for SUPPORTED and REFUTED.
– F1: between the predicted evidence sentences and the ones chosen by annotators.
Full pipeline results on the test set:
Claim Verification: NoScoreEv 50.91%, ScoreEv 31.87%
Evidence Identification: Recall 45.89%, Precision 10.79%, F1 17.47%
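The document retrieval stage (k-nearest documents by cosine similarity) can be sketched with a tiny TF-IDF implementation. This is a toy sketch, not the FEVER baseline code, which uses DrQA-style retrieval over the full Wikipedia dump; all function names here are illustrative.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build simple TF-IDF vectors for a small document collection."""
    n = len(docs)
    tokenized = [d.lower().split() for d in docs]
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log(n / df[t]) for t in df}
    vecs = [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized]
    return vecs, idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve(claim, docs, k=2):
    """Return indices of the k documents nearest to the claim."""
    vecs, idf = tfidf_vectors(docs)
    q = {t: c * idf.get(t, 0.0) for t, c in Counter(claim.lower().split()).items()}
    ranked = sorted(range(len(docs)), key=lambda i: cosine(q, vecs[i]), reverse=True)
    return ranked[:k]
```

The retrieved documents then feed sentence selection and the entailment model in the pipeline above.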
TWOWINGOS: A Two-Wing Optimization Strategy for Evidential Claim Verification
Yin, Wenpeng, and Dan Roth. "TwoWingOS: A Two-Wing Optimization Strategy for Evidential Claim Verification." In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (2018).
Problem Statement
■ A set of sentences 𝑆 as the candidate evidence space, a claim 𝑥, and a decision space 𝑌 for claim verification.
■ Problem definition: given a collection of evidence candidates 𝑆 = {𝑠1, 𝑠2, …, 𝑠𝑖, …, 𝑠𝑚}, a claim 𝑥 and a decision set 𝑌 = {𝑦1, …, 𝑦𝑛}, the model TWOWINGOS predicts a binary vector 𝑝 over 𝑆 and a one-hot vector 𝑜 over 𝑌 against the ground truth, a binary vector 𝑞 and a one-hot vector 𝑧, respectively.
■ A binary vector over 𝑆 means a subset of sentences (𝑆𝑒) acts as evidence, and the one-hot vector indicates a single decision (𝑦𝑖) to be made towards the claim 𝑥 given the evidence 𝑆𝑒.
Evidence Identification
■ Coarse-grained representation
– Directly concatenate the representations of 𝑠𝑖 and 𝑥.
■ Fine-grained representation (inspired by "Attention Convolution")
– Step 1: For each word in 𝑠𝑖, calculate its matching score to all words in 𝑥.
– Step 2: Use a convolutional encoder to generate each word's claim-aware representation 𝒊𝑖𝑗.
– Step 3: Compose these claim-aware word representations into the representation for sentence 𝑠𝑖 by max-pooling over 𝒊𝑖𝑗 along 𝑗, generating 𝒊𝑖. Let the entire process be denoted 𝒊𝑖 = 𝑓int(𝑠𝑖, 𝑥).
– Step 4: Build the fine-grained evidence representation for 𝑠𝑖.
Evidence Identification
■ Loss function
– A probability score 𝛼𝑖 ∈ (0,1) is calculated via a non-linear sigmoid function for each sentence 𝑠𝑖, giving the probability that it is evidence.
– The loss 𝑙𝑒𝑣 is calculated against a ground-truth binary vector 𝑞 as binary cross-entropy.
– As the output of this evidence identification module, the probability vector is binarized by 𝑝𝑖 = [𝛼𝑖 > 0.5] ("[x]" is 1 if x is true, 0 otherwise).
– 𝑝𝑖 indicates whether 𝑠𝑖 is evidence or not. All {𝑠𝑖} with 𝑝𝑖 = 1 act as the evidence set 𝑆𝑒.
Claim Verification
■ Coarse-grained representation
– All sentence representations in 𝑆𝑒 are summed up to create a representation 𝒆.
■ Single-channel fine-grained representation
– 𝒊𝑖 = 𝑓int(𝑠𝑖, 𝑥) for 𝑠𝑖 and 𝒙𝑖 = 𝑓int(𝑥, 𝑠𝑖) for 𝑥.
■ Two-channel fine-grained representation
■ Loss function
– [𝒆, 𝒙] is forwarded to a logistic regression layer in order to infer the probability distribution 𝒐 over the label space 𝑌: 𝒐 = softmax(𝑾[𝒆, 𝒙] + 𝑏).
– The loss 𝑙𝑐𝑣 is implemented as negative log-likelihood: 𝑙𝑐𝑣 = −log(𝒐 · 𝒛ᵀ).
– Hence, the overall training loss for joint optimization is 𝑙 = 𝑙𝑒𝑣 + 𝑙𝑐𝑣.
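The two loss terms and the binarization rule described above can be sketched in NumPy. Variable names follow the slide notation (𝛼, 𝑞, 𝒐, 𝒛); the implementation details are a minimal sketch, not the authors' code.

```python
import numpy as np

def joint_loss(alpha, q, logits, z):
    """Joint objective sketch for the two wings:
    alpha: per-sentence evidence probabilities (sigmoid outputs),
    q: gold binary evidence vector,
    logits: claim-decision scores, z: gold one-hot decision vector."""
    eps = 1e-12
    # l_ev: binary cross-entropy over evidence probabilities
    l_ev = -np.mean(q * np.log(alpha + eps) + (1 - q) * np.log(1 - alpha + eps))
    # l_cv: negative log-likelihood of the softmax decision distribution
    o = np.exp(logits - logits.max())
    o /= o.sum()
    l_cv = -np.log(np.dot(o, z) + eps)
    return l_ev + l_cv

def binarize(alpha):
    """p_i = [alpha_i > 0.5] selects the evidence set S_e."""
    return (alpha > 0.5).astype(int)
```

Joint optimization of the sum lets the evidence wing and the verification wing supervise each other during training.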
Results
GEAR: Graph-based Evidence Aggregating and Reasoning for Fact Verification
Jie Zhou, Xu Han, Cheng Yang, Zhiyuan Liu, Lifeng Wang, Changcheng Li, Maosong Sun. "GEAR: Graph-based Evidence Aggregating and Reasoning for Fact Verification." In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 892–901 (2019).
Overview
■ Proposes a graph-based evidence aggregating and reasoning (GEAR) framework.
■ Enables information to transfer across a fully-connected evidence graph.
■ Utilizes different aggregators to collect multi-evidence information.
■ The motivation is to grasp sufficient relational and logical information among the evidence.
Evidence Reasoning Graph
■ A fully-connected evidence graph where each node indicates a piece of evidence.
– The hidden states of the nodes at layer 𝑡 are represented as ℎ^𝑡 = {ℎ1^𝑡, ℎ2^𝑡, …, ℎ𝑚^𝑡}, where ℎ𝑖^𝑡 ∈ ℝ^(𝐹×1).
– The initial hidden state of node 𝑖 at layer 0, ℎ𝑖^0, is initialized by the evidence representation 𝑒𝑖.
– An MLP computes the attention coefficients between node 𝑖 and its neighbor node 𝑗, ∀𝑗 ∈ 𝒩𝑖, where 𝒩𝑖 denotes the set of neighbors of node 𝑖.
■ Attention Aggregator: aggregates the neighbors' information using these attention coefficients.
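One propagation step on the fully-connected evidence graph can be sketched as follows. This is a simplified illustration: the scoring function stands in for the attention MLP as a single weight vector, and the update omits the learned transformations of the actual GEAR model.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gear_layer(h, w):
    """One evidence-propagation step (sketch).
    h: (m, F) node hidden states; w: (2F,) weight vector standing in for
    the attention MLP. In a fully-connected graph every node is every
    node's neighbor, so node i attends over all m nodes and takes the
    attention-weighted sum of their states."""
    m = h.shape[0]
    new_h = np.zeros_like(h, dtype=float)
    for i in range(m):
        # attention coefficient between node i and each neighbor j
        scores = np.array([np.concatenate([h[i], h[j]]) @ w for j in range(m)])
        attn = softmax(scores)
        new_h[i] = np.tanh(attn @ h)   # aggregate neighbor information
    return new_h
```

Stacking several such layers lets information flow between all pieces of evidence before the final aggregator produces the claim label.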
Results
DeClarE: Debunking Fake News and False Claims using Evidence-Aware Deep Learning
Kashyap Popat, Subhabrata Mukherjee, Andrew Yates, and Gerhard Weikum. "DeClarE: Debunking fake news and false claims using evidence-aware deep learning." In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 22–32 (2018).
Overview
■ A neural network model that judiciously aggregates signals from external evidence articles:
– the language of these articles;
– the trustworthiness of their sources;
– informative features for generating user-comprehensible explanations.
■ Problem definition: consider a set of 𝑁 claims ⟨𝐶𝑛⟩ from the respective origins/sources ⟨𝐶𝑆𝑛⟩, where 𝑛 ∈ [1, 𝑁].
– Each claim 𝐶𝑛 is reported by a set of 𝑀 articles ⟨𝐴𝑚,𝑛⟩ along with their respective sources ⟨𝐴𝑆𝑚,𝑛⟩, where 𝑚 ∈ [1, 𝑀].
– Each corresponding tuple of claim and its origin, reporting articles and article sources, ⟨𝐶𝑛, 𝐶𝑆𝑛, 𝐴𝑚,𝑛, 𝐴𝑆𝑚,𝑛⟩, forms a training instance, along with the credibility label of the claim as ground truth during network training.
■ Example (shown on slide).
Framework for credibility assessment
■ The upper part of the pipeline combines the article and claim embeddings to get the claim-specific attention weights.
■ The lower part of the pipeline captures the article representation through a biLSTM.
■ The attention-focused article representation, along with the source embeddings, is passed through dense layers to predict the credibility score of the claim.
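The claim-specific attention pooling in the upper pipeline can be sketched as below. This is a simplified stand-in: the paper uses learned dense layers for scoring, while here a single illustrative weight vector `w` plays that role.

```python
import numpy as np

def claim_attention_pool(word_states, claim_vec, w):
    """Sketch of claim-specific attention pooling: score each article word
    state against the claim representation, softmax the scores, and return
    the attention-weighted article representation.
    word_states: (T, d) biLSTM outputs; claim_vec: (d,); w: (2d,)."""
    scores = np.array([np.concatenate([h, claim_vec]) @ w for h in word_states])
    e = np.exp(scores - scores.max())
    attn = e / e.sum()                 # attention weight per article word
    return attn @ word_states          # attention-focused article vector
```

The resulting vector, concatenated with a source embedding, would feed the dense credibility-scoring layers; the attention weights themselves double as word-level explanations.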
Results – Snopes and PolitiFact
Results – NewsTrust and SemEval
Analysis
Sentence-Level Evidence Embedding for Claim Verification with Hierarchical Attention Networks
Jing Ma, Wei Gao, Shafiq Joty, and Kam-Fai Wong. "Sentence-level evidence embedding for claim verification with hierarchical attention networks." In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 2561–2571 (2019).
Problem Definition
■ A claim verification dataset is defined as {𝐶}, where each instance 𝐶 = (𝑦, 𝑐, 𝑆) is a tuple representing a given claim 𝑐 associated with a ground-truth label 𝑦 and a set of 𝑛 sentences 𝑆 = {𝑠𝑖}, 𝑖 = 1…𝑛, from the relevant documents of the claim.
■ The approach exploits two core semantic relations:
– Coherence of the sentences: a coherence-based attention component that cross-checks whether any sentence 𝑠𝑖 ∈ 𝑆 coheres well with the claim and with other sentences in 𝑆 in terms of topical consistency.
– Textual entailment between the claim and each sentence: an entailment-based attention component that can be pre-trained on another dataset (SNLI) to capture entailment relations based on sentence pairs labelled with NLI-specific classes: entails, contradicts, and neutral.
System Design
■ Based on the attention weights, each sentence can be represented as the weighted sum of all sentences, capturing its overall coherence.
■ Finally, the coherence-based embedding is concatenated with the original embedding to obtain a richer sentence representation.
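The coherence step described above (weighted sum of all sentences, then concatenation with the original embedding) can be sketched as follows; the attention matrix is taken as given here, whereas the model computes it from sentence-pair scores.

```python
import numpy as np

def coherence_embed(S, A):
    """Coherence-based sentence enrichment (sketch).
    S: (n, d) sentence embeddings; A: (n, n) row-stochastic attention
    matrix over sentence pairs. Each sentence's coherence embedding is
    the attention-weighted sum of all sentence vectors, concatenated
    with the original vector for a richer representation."""
    coh = A @ S                              # weighted sum of all sentences
    return np.concatenate([S, coh], axis=1)  # (n, 2d) enriched embeddings
```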
System Design
■ Entailment-based evidence attention: enhances the sentence representation by capturing the entailment relations between the sentences and the claim based on the NLI method.
■ The overall model combines the coherence-based and entailment-based attention components.
Experiment and Results
THANK YOU!