FAKE NEWS RESEARCH – RECENT ADVANCES IN FACT CHECKING AND CLAIM
VERIFICATION
Presented by – Archita Pathak
THE SCIENCE OF FAKE NEWS
Lazer, D. M., Baum, M. A., Benkler, Y., Berinsky, A. J., Greenhill, K. M., Menczer, F., ... & Schudson, M. "The science of fake news." Science, 359(6380), 1094–1096 (2018).
Introduction
■ Fake News: fabricated information that mimics news media content in form but not in
organizational process or intent.
■ Parasitic on standard news outlets, simultaneously benefiting from and undermining their
credibility
■ General trust in the mass media has collapsed to historic lows
Fake News Properties
How common is fake news?
• During the 2016 US presidential election, the average American encountered between one and three fake news stories.
• On Twitter, falsehood spreads faster than truth, especially when the topic is politics.

What is the impact?
• Lesser electoral impact (e.g., influencing a person to vote for another candidate).
• Major social and behavioral impacts: increased cynicism and extremism.

What interventions can stem the flow and influence?
• Empowering individuals (fact checking and educational training).
• Platform-based detection.
• Government intervention.
PROGRESS TO DATE
Pathak, Archita, and Rohini K. Srihari. "BREAKING! Presenting Fake News Corpus for Automated Fact Checking." In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, pp. 357–362 (2019).
Definitions
■ Fake News: Verifiably false or misleading information that is created,
presented and disseminated for monetary gain or to intentionally
deceive the public, and in any event to cause public harm (European
Commission, 2018).
■ Claim: Assertion of fact/event/opinion to influence reader perception.
Research Questions
Given an article containing a set of claims {c1, c2, c3, …, cn}:
1. Can we detect language clues for obvious fake news detection?
2. Can we automatically identify which claims play a key role in influencing reader perception?
3. Can we verify the veracity of claims by finding evidence {e1, e2, …, em} from trusted sources?
4. Can we develop robust, scalable models for claim detection and verification?
5. Can we explain the decisions made by the models?
Previous Work
■ Broad classification into "fake" and "real"
– Based on user response (social media analysis, source identification, etc.)
– Based on linguistic features (stylometry, LIWC features, n-grams, word count, etc.)
– Based on pattern-learning models (machine learning and deep learning models for NLP)
Research Overview
■ Dataset Creation
■ Broad Classification: based on writing style, captured via orthographic and morphological features
■ Fine-Grained Classification: classification based on the veracity of claims, comprising claim detection, claim verification, and explanation (important to overcome issues like confirmation bias)
Completed Tasks
■ Dataset pipeline: starting from working links in the Stanford/NYU dataset, collected fake and compelling articles; finalized 26 + 679 articles on the 2016 US presidential election; automatically cleaned out gibberish like [MORE], [CLICK HERE], etc.; manually categorized articles based on the veracity of their assertions.
■ Manual annotation labels:
– False: invented lies written in a compelling way.
– Half-baked/Partial truth: manipulating true events to suit an agenda.
– Opinions stated as facts: written in a third-person narrative with no disclaimer that the story is a personal opinion.
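The automated cleaning step can be sketched with a simple regular expression. This is an illustrative reconstruction, not the authors' actual cleaning code; the token list and function name are assumptions based on the examples given on the slide.

```python
import re

# Hypothetical cleaning step: strip bracketed boilerplate tokens such as
# [MORE] or [CLICK HERE] from article text (token list is illustrative).
GIBBERISH = re.compile(r"\[(?:MORE|CLICK HERE)\]", flags=re.IGNORECASE)

def clean_article(text: str) -> str:
    """Remove bracketed boilerplate and collapse leftover whitespace."""
    text = GIBBERISH.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()
```

In practice the pattern would be extended with whatever navigation tokens appear in the crawled pages.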
Label Description & Comparison
■ Based on the percentage of claims verified, we categorize the entire article as:
– False: we couldn't find any evidence that supports any of the claims.
– Half-baked/Partial truth: we could find refuting evidence for some claims and supporting evidence for others.
– Opinions stated as facts: there are opinions in the article.
■ Comparison with PolitiFact labels:
– PolitiFact is an organization that manually labels political statements into 6 categories:
– True: the statement is accurate and there is nothing significant missing.
– Mostly True: the statement is accurate but needs clarification.
– Half True: the statement is partially accurate but leaves out important details.
– Mostly False: the statement contains an element of truth but ignores critical facts that would give a different impression.
– False: the statement is not accurate.
– Pants on Fire: the statement is not accurate and makes a ridiculous claim.
Label Description & Comparison
■ Comparison with (Rashkin et al., 2017):
– Satire: mimics real news but still cues the reader that it is not meant to be taken seriously.
– Hoax: convinces readers of the validity of a paranoia-fueled story.
– Propaganda: misleads readers so that they believe a particular political/social agenda.
Corpus Details
■ URL: URL of the article.
■ Authors: can contain anonymous authors.
■ Content: collection of assertions (gibberish removed).
■ Headline: headline of the article.
■ Primary Label: 1. False, 2. Partial Truth, 3. Opinions.
■ Secondary Label: 1. Fake, 2. Questionable.
[Figure: Top 20 most common keywords in the fake news corpus]
Challenge: distinguishing fake news from mainstream news

Attributes       | Mainstream | Questionable
Word count range | 20–100     | 21–100
Char count range | 89–700     | 109–691
Uppercase words  | 0–14       | 0–8

Mainstream example: WASHINGTON - An exhausted Iraqi Army faces daunting obstacles on the battlefield that will most likely delay for months a major offensive on the Islamic State stronghold of Mosul, American and allied officials say. The delay is expected despite American efforts to keep Iraq's creaky war machine on track. Although President Obama vowed to end the United States' role in the war in Iraq, in the last two years the American military has increasingly provided logistics to prop up the Iraqi military, which has struggled to move basics like food, water and ammunition to its troops.

Questionable example: WASHINGTON - Hillary Clinton is being accused of knowingly allowing American weapons into the hands of ISIS terrorists. Weapons that Hillary Clinton sent off to Qatar, ostensibly designed to give to the rebels in Libya, eventually made their way to Syria to assist the overthrow of the Assad regime. The folks fighting against Assad were ISIS and al-Qaeda jihadists.
Classification Model
■ Bi-directional architecture using a bi-LSTM (128 units in each layer) and character embeddings, to learn orthographic and morphological features of the text, implemented using a 1-D CNN with temporal max-pooling.

Split | Random: Questionable (1) | Random: Mainstream (0) | K-Fold: Questionable (1) | K-Fold: Mainstream (0)
Train | 406 | 5334 | 396 | 5343
Test  | 90  | 1345 | 100 | 1336

■ Evaluation results of our model over various metrics: the performance of stratified K-fold is exceptionally good in terms of ROC and F1 scores.
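A minimal PyTorch sketch of the described architecture: character embeddings fed through a 1-D CNN with temporal max-pooling, then a bi-LSTM with 128 units per direction. All hyper-parameters other than the 128 LSTM units, and the input layout (character IDs per word), are illustrative assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

class CharBiLSTMClassifier(nn.Module):
    """Sketch: char-CNN features per word, bi-LSTM over words, binary output
    for questionable (1) vs. mainstream (0)."""
    def __init__(self, n_chars=100, char_dim=32, conv_dim=64):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.conv = nn.Conv1d(char_dim, conv_dim, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(conv_dim, 128, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * 128, 1)

    def forward(self, char_ids):                       # (batch, words, chars)
        b, w, c = char_ids.shape
        x = self.char_emb(char_ids.view(b * w, c))     # (b*w, chars, char_dim)
        x = torch.relu(self.conv(x.transpose(1, 2)))   # (b*w, conv_dim, chars)
        x = x.max(dim=2).values.view(b, w, -1)         # temporal max-pooling
        h, _ = self.lstm(x)                            # (b, words, 256)
        return self.out(h[:, -1])                      # one logit per article
```

A sigmoid over the logit would give the questionable-vs-mainstream probability used for the ROC/F1 evaluation.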
TRUTH OF VARYING SHADES: Analyzing Language in Fake News and Political Fact-Checking
Rashkin, Hannah, Eunsol Choi, Jin Yea Jang, Svitlana Volkova, and Yejin Choi. "Truth of varying shades: Analyzing language in fake news and political fact-checking." In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 2931–2937 (2017).
Overview
■ Presents an analytic study of the language of news media in the context of political fact-checking and fake news detection.
■ Presents a dataset of fake news articles crawled from fake news domains.
■ 4 categories: Trusted News, Hoax, Satire, Propaganda.
■ Compares the language of real news with that of satire, hoaxes, and propaganda to find linguistic characteristics of untrustworthy text.
■ Presents a case study based on PolitiFact.com using its factuality judgments on a 6-point scale.
Fake News Analysis – Linguistic Discussion
■ Ratio refers to how frequently a feature appears in fake articles compared to trusted ones.
Fake News Analysis – News Reliability Prediction
FEVER: A Large-scale Dataset for Fact Extraction and VERification
James Thorne, Andreas Vlachos, Christos Christodoulopoulos, and Arpit Mittal. "FEVER: a large-scale dataset for fact extraction and verification." In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pp. 809–819 (2018).
Overview
■ 185,445 claims manually generated by altering sentences extracted from the June 2017 Wikipedia dump.
■ Manually annotated as SUPPORTED, REFUTED, or NOTENOUGHINFO.
– For the first two classes, annotators also recorded the sentence(s) used as evidence for the judgement.
■ To characterize the challenges posed by this dataset, the authors developed a pipeline approach for the claim verification task.
■ Results:
– 31.87% accuracy when requiring correct evidence to be retrieved for claims labeled SUPPORTED or REFUTED.
– 50.91% accuracy if the correct evidence is ignored.
BASELINE SYSTEM
■ Pipeline approach with the following flow:
– Document Retrieval: k-nearest documents using cosine similarity.
– Sentence Selection: simple IR methods to rank sentences.
– Recognizing Textual Entailment: state-of-the-art model in RTE, decomposable attention (DA).
■ Evaluation measures:
– NOSCOREEV: accuracy of claim verification, neglecting the validity of evidence.
– SCOREEV: accuracy of claim verification with a requirement that the predicted evidence fully covers the gold evidence for SUPPORTED and REFUTED.
– F1: between the predicted evidence sentences and the ones chosen by annotators.
Full pipeline results on the test set:
Claim Verification: NoScoreEv 50.91%, ScoreEv 31.87%
Evidence Identification: Recall 45.89%, Precision 10.79%, F1 17.47%
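The document retrieval stage (k-nearest documents by cosine similarity) can be sketched with a tiny TF-IDF implementation. This is a toy sketch, not the FEVER baseline code, which uses DrQA-style retrieval over the full Wikipedia dump; all function names here are illustrative.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build simple TF-IDF vectors for a small document collection."""
    n = len(docs)
    tokenized = [d.lower().split() for d in docs]
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log(n / df[t]) for t in df}
    vecs = [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized]
    return vecs, idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve(claim, docs, k=2):
    """Return indices of the k documents nearest to the claim."""
    vecs, idf = tfidf_vectors(docs)
    q = {t: c * idf.get(t, 0.0) for t, c in Counter(claim.lower().split()).items()}
    ranked = sorted(range(len(docs)), key=lambda i: cosine(q, vecs[i]), reverse=True)
    return ranked[:k]
```

The retrieved documents then feed sentence selection and the entailment model in the pipeline above.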
TWOWINGOS: A Two-Wing Optimization Strategy for Evidential Claim Verification
Yin, Wenpeng, and Dan Roth. "TwoWingOS: A Two-Wing Optimization Strategy for Evidential Claim Verification." In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (2018).
Problem Statement
■ A set of sentences 𝑆 as the candidate evidence space, a claim 𝑥, and a decision space 𝑌 for claim verification.
■ Problem definition: given a collection of evidence candidates 𝑆 = {𝑠1, 𝑠2, …, 𝑠𝑖, …, 𝑠𝑚}, a claim 𝑥 and a decision set 𝑌 = {𝑦1, …, 𝑦𝑛}, the model TWOWINGOS predicts a binary vector 𝑝 over 𝑆 and a one-hot vector 𝑜 over 𝑌 against the ground truth, a binary vector 𝑞 and a one-hot vector 𝑧, respectively.
■ A binary vector over 𝑆 means a subset of sentences (𝑆𝑒) acts as evidence, and the one-hot vector indicates a single decision (𝑦𝑖) to be made towards the claim 𝑥 given the evidence 𝑆𝑒.
Evidence Identification
■ Coarse-grained representation
– Directly concatenate the representations of 𝑠𝑖 and 𝑥.
■ Fine-grained representation (inspired by "Attention Convolution")
– Step 1: For each word in 𝑠𝑖, calculate its matching score to all words in 𝑥.
– Step 2: Use a convolutional encoder to generate each word's claim-aware representation 𝒊𝑖𝑗.
– Step 3: Compose these claim-aware word representations into the representation for sentence 𝑠𝑖 by max-pooling over 𝒊𝑖𝑗 along 𝑗, generating 𝒊𝑖. Let the entire process be denoted 𝒊𝑖 = 𝑓int(𝑠𝑖, 𝑥).
– Step 4: Build the fine-grained evidence representation for 𝑠𝑖.
Evidence Identification
■ Loss function
– A probability score 𝛼𝑖 ∈ (0,1) is calculated via a non-linear sigmoid function for each sentence 𝑠𝑖, giving the probability that it is evidence.
– The loss 𝑙𝑒𝑣 is calculated against a ground-truth binary vector 𝑞 as binary cross-entropy.
– As the output of this evidence identification module, the probability vector is binarized by 𝑝𝑖 = [𝛼𝑖 > 0.5] ("[x]" is 1 if x is true, 0 otherwise).
– 𝑝𝑖 indicates whether 𝑠𝑖 is evidence or not. All {𝑠𝑖} with 𝑝𝑖 = 1 act as the evidence set 𝑆𝑒.
Claim Verification
■ Coarse-grained representation
– All sentence representations in 𝑆𝑒 are summed up to create a representation 𝒆.
■ Single-channel fine-grained representation
– 𝒊𝑖 = 𝑓int(𝑠𝑖, 𝑥) for 𝑠𝑖 and 𝒙𝑖 = 𝑓int(𝑥, 𝑠𝑖) for 𝑥.
■ Two-channel fine-grained representation
■ Loss function
– [𝒆, 𝒙] is forwarded to a logistic regression layer in order to infer the probability distribution 𝒐 over the label space 𝑌: 𝒐 = softmax(𝑾[𝒆, 𝒙] + 𝑏).
– The loss 𝑙𝑐𝑣 is implemented as negative log-likelihood: 𝑙𝑐𝑣 = −log(𝒐 · 𝒛ᵀ).
– Hence, the overall training loss for joint optimization is 𝑙 = 𝑙𝑒𝑣 + 𝑙𝑐𝑣.
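The two loss terms and the binarization rule described above can be sketched in NumPy. Variable names follow the slide notation (𝛼, 𝑞, 𝒐, 𝒛); the implementation details are a minimal sketch, not the authors' code.

```python
import numpy as np

def joint_loss(alpha, q, logits, z):
    """Joint objective sketch for the two wings:
    alpha: per-sentence evidence probabilities (sigmoid outputs),
    q: gold binary evidence vector,
    logits: claim-decision scores, z: gold one-hot decision vector."""
    eps = 1e-12
    # l_ev: binary cross-entropy over evidence probabilities
    l_ev = -np.mean(q * np.log(alpha + eps) + (1 - q) * np.log(1 - alpha + eps))
    # l_cv: negative log-likelihood of the softmax decision distribution
    o = np.exp(logits - logits.max())
    o /= o.sum()
    l_cv = -np.log(np.dot(o, z) + eps)
    return l_ev + l_cv

def binarize(alpha):
    """p_i = [alpha_i > 0.5] selects the evidence set S_e."""
    return (alpha > 0.5).astype(int)
```

Joint optimization of the sum lets the evidence wing and the verification wing supervise each other during training.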
Results
GEAR: Graph-based Evidence Aggregating and Reasoning for Fact Verification
Jie Zhou, Xu Han, Cheng Yang, Zhiyuan Liu, Lifeng Wang, Changcheng Li, Maosong Sun. "GEAR: Graph-based Evidence Aggregating and Reasoning for Fact Verification." In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 892–901 (2019).
Overview
■ Proposes a graph-based evidence aggregating and reasoning (GEAR) framework.
■ Enables information to transfer across a fully-connected evidence graph.
■ Utilizes different aggregators to collect multi-evidence information.
■ The motivation is to grasp sufficient relational and logical information among the evidence.
Evidence Reasoning Graph
■ A fully-connected evidence graph where each node indicates a piece of evidence.
– The hidden states of the nodes at layer 𝑡 are represented as ℎ^𝑡 = {ℎ1^𝑡, ℎ2^𝑡, …, ℎ𝑚^𝑡}, where ℎ𝑖^𝑡 ∈ ℝ^(𝐹×1).
– The initial hidden state of node 𝑖 at layer 0, ℎ𝑖^0, is initialized by the evidence representation 𝑒𝑖.
– An MLP computes the attention coefficients between node 𝑖 and its neighbor node 𝑗, ∀𝑗 ∈ 𝒩𝑖, where 𝒩𝑖 denotes the set of neighbors of node 𝑖.
■ Attention Aggregator: aggregates the neighbors' information using these attention coefficients.
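One propagation step on the fully-connected evidence graph can be sketched as follows. This is a simplified illustration: the scoring function stands in for the attention MLP as a single weight vector, and the update omits the learned transformations of the actual GEAR model.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gear_layer(h, w):
    """One evidence-propagation step (sketch).
    h: (m, F) node hidden states; w: (2F,) weight vector standing in for
    the attention MLP. In a fully-connected graph every node is every
    node's neighbor, so node i attends over all m nodes and takes the
    attention-weighted sum of their states."""
    m = h.shape[0]
    new_h = np.zeros_like(h, dtype=float)
    for i in range(m):
        # attention coefficient between node i and each neighbor j
        scores = np.array([np.concatenate([h[i], h[j]]) @ w for j in range(m)])
        attn = softmax(scores)
        new_h[i] = np.tanh(attn @ h)   # aggregate neighbor information
    return new_h
```

Stacking several such layers lets information flow between all pieces of evidence before the final aggregator produces the claim label.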
Results
DeClarE: Debunking Fake News and False Claims using Evidence-Aware Deep Learning
Kashyap Popat, Subhabrata Mukherjee, Andrew Yates, and Gerhard Weikum. "DeClarE: Debunking fake news and false claims using evidence-aware deep learning." In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 22–32 (2018).
Overview
■ A neural network model that judiciously aggregates signals from external evidence articles:
– the language of these articles;
– the trustworthiness of their sources;
– informative features for generating user-comprehensible explanations.
■ Problem definition: consider a set of 𝑁 claims ⟨𝐶𝑛⟩ from the respective origins/sources ⟨𝐶𝑆𝑛⟩, where 𝑛 ∈ [1, 𝑁].
– Each claim 𝐶𝑛 is reported by a set of 𝑀 articles ⟨𝐴𝑚,𝑛⟩ along with their respective sources ⟨𝐴𝑆𝑚,𝑛⟩, where 𝑚 ∈ [1, 𝑀].
– Each corresponding tuple of claim and its origin, reporting articles and article sources, ⟨𝐶𝑛, 𝐶𝑆𝑛, 𝐴𝑚,𝑛, 𝐴𝑆𝑚,𝑛⟩, forms a training instance, along with the credibility label of the claim as ground truth during network training.
■ Example (shown on slide).
Framework for credibility assessment
■ The upper part of the pipeline combines the article and claim embeddings to get the claim-specific attention weights.
■ The lower part of the pipeline captures the article representation through a biLSTM.
■ The attention-focused article representation, along with the source embeddings, is passed through dense layers to predict the credibility score of the claim.
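The claim-specific attention pooling in the upper pipeline can be sketched as below. This is a simplified stand-in: the paper uses learned dense layers for scoring, while here a single illustrative weight vector `w` plays that role.

```python
import numpy as np

def claim_attention_pool(word_states, claim_vec, w):
    """Sketch of claim-specific attention pooling: score each article word
    state against the claim representation, softmax the scores, and return
    the attention-weighted article representation.
    word_states: (T, d) biLSTM outputs; claim_vec: (d,); w: (2d,)."""
    scores = np.array([np.concatenate([h, claim_vec]) @ w for h in word_states])
    e = np.exp(scores - scores.max())
    attn = e / e.sum()                 # attention weight per article word
    return attn @ word_states          # attention-focused article vector
```

The resulting vector, concatenated with a source embedding, would feed the dense credibility-scoring layers; the attention weights themselves double as word-level explanations.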
Results – Snopes and PolitiFact
Results – NewsTrust and SemEval
Analysis
Sentence-Level Evidence Embedding for Claim Verification with Hierarchical Attention Networks
Jing Ma, Wei Gao, Shafiq Joty, and Kam-Fai Wong. "Sentence-level evidence embedding for claim verification with hierarchical attention networks." In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 2561–2571 (2019).
Problem Definition
■ A claim verification dataset is defined as {𝐶}, where each instance 𝐶 = (𝑦, 𝑐, 𝑆) is a tuple representing a given claim 𝑐 associated with a ground-truth label 𝑦 and a set of 𝑛 sentences 𝑆 = {𝑠𝑖}, 𝑖 = 1…𝑛, from the relevant documents of the claim.
■ The approach exploits two core semantic relations:
– Coherence of the sentences: a coherence-based attention component that cross-checks whether any sentence 𝑠𝑖 ∈ 𝑆 coheres well with the claim and with other sentences in 𝑆 in terms of topical consistency.
– Textual entailment between the claim and each sentence: an entailment-based attention component that can be pre-trained on another dataset (SNLI) to capture entailment relations based on sentence pairs labelled with NLI-specific classes: entails, contradicts, and neutral.
System Design
■ Based on the attention weights, each sentence can be represented as the weighted sum of all sentences, capturing its overall coherence.
■ Finally, the coherence-based embedding is concatenated with the original embedding to obtain a richer sentence representation.
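The coherence step described above (weighted sum of all sentences, then concatenation with the original embedding) can be sketched as follows; the attention matrix is taken as given here, whereas the model computes it from sentence-pair scores.

```python
import numpy as np

def coherence_embed(S, A):
    """Coherence-based sentence enrichment (sketch).
    S: (n, d) sentence embeddings; A: (n, n) row-stochastic attention
    matrix over sentence pairs. Each sentence's coherence embedding is
    the attention-weighted sum of all sentence vectors, concatenated
    with the original vector for a richer representation."""
    coh = A @ S                              # weighted sum of all sentences
    return np.concatenate([S, coh], axis=1)  # (n, 2d) enriched embeddings
```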
System Design
■ Entailment-based evidence attention: enhances the sentence representation by capturing the entailment relations between the sentences and the claim based on the NLI method.
■ The overall model combines the coherence-based and entailment-based attention components.
Experiment and Results
THANK YOU!