IntroductionMachine Text Generation
Detection ModelsConclusion and Future Work
Machine Generation and Detection of Arabic Manipulated and Fake News
El Moatez Billah Nagoudi1, AbdelRahim Elmadany1, Muhammad Abdul-Mageed1, Tariq Alhindi2, Hasan Cavusoglu3
Natural Language Processing Lab,
1,3 The University of British Columbia
2 Department of Computer Science, Columbia University
1/22El Moatez Billah Nagoudi1, AbdelRahim Elmadany1, Muhammad Abdul-Mageed1, Tariq Alhindi2, Hasan Cavusoglu3Machine Generation and Detection of Arabic Manipulated and Fake News
1 Introduction
2 Machine Text Generation
3 Detection Models
4 Conclusion and Future Work
Challenges & Goals
Machine Text GenerationVeracityHuman Annotation
Machine Text Generation
Authentic news dataset
Part-of-speech (POS) tagger
Word embedding model
Our Method
Illustrative Example
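News Title: محرز ينتقل إلى برشلونة مقابل 120 مليون دولار ("Mahrez moves to Barcelona for 120 million dollars")
1 Step 1: Identify POS tags.
Words → POS Tags
محرز (Mahrez) → NOUN_PROP
ينتقل (moves) → VERB
إلى (to) → PREP
برشلونة (Barcelona) → NOUN_PROP
مقابل (for, in exchange for) → NOUN
120 → NUM
مليون (million) → NOUN
دولار (dollar) → NOUN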
2 Step 2: POS and Token Selection.
Words → POS Tags
محرز (Mahrez) → NOUN_PROP
ينتقل (moves) → VERB
إلى (to) → PREP
برشلونة (Barcelona) → NOUN_PROP
مقابل (for, in exchange for) → NOUN
120 → NUM
مليون (million) → NOUN
دولار (dollar) → NOUN
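Step 2 can be pictured as a filter over the tagged title. This is a minimal sketch, not the authors' code: the tag set `TARGET_TAGS` is a hypothetical choice that picks out exactly the tokens substituted in Step 3 (محرز, برشلونة and 120), and `select_tokens` is a hypothetical helper name.

```python
# Tagged title from Step 1: (word, POS tag) pairs in sentence order.
TAGGED_TITLE = [
    ("محرز", "NOUN_PROP"),
    ("ينتقل", "VERB"),
    ("إلى", "PREP"),
    ("برشلونة", "NOUN_PROP"),
    ("مقابل", "NOUN"),
    ("120", "NUM"),
    ("مليون", "NOUN"),
    ("دولار", "NOUN"),
]

# Hypothetical target tags for this example; the method's own selection rules may differ.
TARGET_TAGS = {"NOUN_PROP", "NUM"}


def select_tokens(tagged, target_tags):
    """Return (position, word) for every token whose POS tag is in target_tags."""
    return [(i, word) for i, (word, tag) in enumerate(tagged) if tag in target_tags]


if __name__ == "__main__":
    for position, word in select_tokens(TAGGED_TITLE, TARGET_TAGS):
        print(position, word)
```

Keeping the token position alongside the word lets the next step replace the token in place without re-tokenizing the sentence.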
3 Step 3: Sentence Manipulation.
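Substitution with the 5 closest words to برشلونة (Barcelona):
محرز ينتقل إلى مدريد مقابل 120 مليون دولار ("Mahrez moves to Madrid for 120 million dollars")
محرز ينتقل إلى باريس مقابل 120 مليون دولار ("Mahrez moves to Paris for 120 million dollars")
محرز ينتقل إلى فالنسيا مقابل 120 مليون دولار ("Mahrez moves to Valencia for 120 million dollars")
محرز ينتقل إلى ميلان مقابل 120 مليون دولار ("Mahrez moves to Milan for 120 million dollars")
محرز ينتقل إلى مانشستر مقابل 120 مليون دولار ("Mahrez moves to Manchester for 120 million dollars")
Substitution with the 5 closest words to برشلونة (Barcelona), محرز (Mahrez) and 120:
صلاح ينتقل إلى ليدز مقابل 350 مليون دولار ("Salah moves to Leeds for 350 million dollars")
ميسي ينتقل إلى مدريد مقابل 450 مليون دولار ("Messi moves to Madrid for 450 million dollars")
رونالدو ينتقل إلى باريس مقابل 155 مليون دولار ("Ronaldo moves to Paris for 155 million dollars")
ماني ينتقل إلى فالنسيا مقابل 280 مليون دولار ("Mané moves to Valencia for 280 million dollars")
أغويرو ينتقل إلى مرسيليا مقابل 70 مليون دولار ("Agüero moves to Marseille for 70 million dollars")
... ...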
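Step 3 replaces a selected token with its nearest neighbours in a word-embedding space. The sketch below assumes cosine similarity as the closeness measure and uses hypothetical toy 3-dimensional vectors (`TOY_EMBEDDINGS`); the slides do not specify the embedding model or the similarity function, and the helper names are illustrative only.

```python
import math

# Hypothetical toy vectors; a real pipeline would query a pretrained Arabic word-embedding model.
TOY_EMBEDDINGS = {
    "برشلونة": [0.90, 0.10, 0.00],
    "مدريد": [0.88, 0.15, 0.02],
    "باريس": [0.85, 0.20, 0.05],
    "فالنسيا": [0.87, 0.12, 0.04],
    "ميلان": [0.83, 0.18, 0.03],
    "مانشستر": [0.80, 0.22, 0.06],
    "دولار": [0.05, 0.10, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbours(word, embeddings, k=5):
    """Return the k vocabulary words most similar to `word`, excluding the word itself."""
    target = embeddings[word]
    scored = [
        (cosine(target, vector), other)
        for other, vector in embeddings.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [other for _, other in scored[:k]]


def manipulate(tokens, position, embeddings, k=5):
    """Build one manipulated sentence per neighbour by replacing tokens[position]."""
    return [
        tokens[:position] + [substitute] + tokens[position + 1:]
        for substitute in nearest_neighbours(tokens[position], embeddings, k)
    ]


if __name__ == "__main__":
    title = "محرز ينتقل إلى برشلونة مقابل 120 مليون دولار".split()
    for sentence in manipulate(title, 3, TOY_EMBEDDINGS):
        print(" ".join(sentence))
```

With these toy vectors, the five city names outrank دولار, so each output differs from the title only at position 3. The right-hand column above substitutes several positions at once; that would repeat the same lookup for each selected token and is not shown here.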
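Veracity of the Manipulated News
1 Stays True.
أفضل هاتف ذكي هو الآيفون → أحسن هاتف ذكي هو الآيفون
("The best smartphone is the iPhone" → "The finest smartphone is the iPhone")
2 Changes.
أرامكو تحقق هذه السنة أعلى أرباح → أمازون تحقق هذه السنة أعلى أرباح
("Aramco achieves the highest profits this year" → "Amazon achieves the highest profits this year")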
Human Annotation
1 Manipulated text detection (Human vs. Machine): κ = 79.46%
2 Veracity of manipulated text (True vs. Fake): κ = 81.07%
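The two agreement figures above are reported as κ. Assuming this is Cohen's kappa between the two annotators (the slide does not name the variant), it compares observed agreement p_o with chance agreement p_e as κ = (p_o − p_e) / (1 − p_e). A minimal sketch; the function name and the example label lists are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    labels = counts_a.keys() | counts_b.keys()
    p_e = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label throughout: perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    annotator_1 = ["Mach", "Mach", "Hum", "Mach"]
    annotator_2 = ["Mach", "Hum", "Hum", "Mach"]
    print(cohens_kappa(annotator_1, annotator_2))
```

Because κ discounts agreement expected by chance, it is lower than the raw percentage agreement reported per category in Table 1.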
POS          #Sent.   Agreement Hum/Mach (%)   Agreement True/Fake (%)   %Fake
Hum            145    97.93                    N/A                       N/A
Mach
  ADJ           27    96.30                    74.07                     48.15
  ADJ_COMP      24    100                      91.67                     58.33
  ADJ_NUM       26    76.92                    73.08                     78.85
  NEG_PART      32    87.50                    90.63                     76.56
  N_NUM         19    100                      73.68                     76.32
  N_PROP        27    92.59                    74.07                     83.33
  Overall      155    94.67                    80                        70.32
Table 1: Inter-annotator agreement
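The Overall row of Table 1 can be checked as a #Sent.-weighted average of the category rows. The sketch below uses only the table's own numbers. The weighting reproduces the Overall True/Fake (80) and %Fake (70.32) values from the 155 Mach sentences. The Overall Hum/Mach value (94.67), however, is matched only when the 145 Hum sentences are included too; that reading is an inference from the arithmetic, not something the slide states.

```python
# Machine-manipulated rows of Table 1: (POS, #Sent., Hum/Mach %, True/Fake %, %Fake)
MACH_ROWS = [
    ("ADJ", 27, 96.30, 74.07, 48.15),
    ("ADJ_COMP", 24, 100.00, 91.67, 58.33),
    ("ADJ_NUM", 26, 76.92, 73.08, 78.85),
    ("NEG_PART", 32, 87.50, 90.63, 76.56),
    ("N_NUM", 19, 100.00, 73.68, 76.32),
    ("N_PROP", 27, 92.59, 74.07, 83.33),
]
HUM_SENT, HUM_AGREEMENT = 145, 97.93  # the Hum row


def weighted_mean(pairs):
    """Mean of values weighted by counts, given (count, value) pairs."""
    total = sum(count for count, _ in pairs)
    return sum(count * value for count, value in pairs) / total


true_fake = weighted_mean([(row[1], row[3]) for row in MACH_ROWS])
pct_fake = weighted_mean([(row[1], row[4]) for row in MACH_ROWS])
hum_mach_mach_only = weighted_mean([(row[1], row[2]) for row in MACH_ROWS])
hum_mach_all = weighted_mean(
    [(row[1], row[2]) for row in MACH_ROWS] + [(HUM_SENT, HUM_AGREEMENT)]
)

if __name__ == "__main__":
    print(f"True/Fake {true_fake:.2f}  %Fake {pct_fake:.2f}")
    print(f"Hum/Mach: Mach only {hum_mach_mach_only:.2f}, Hum + Mach {hum_mach_all:.2f}")
```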
Human Annotation Examples
Each example pairs an original sentence (Gold: Hum) with its machine-manipulated version (Gold: Mach).
POS        Annotator 1 (M/H, T/F)   Annotator 2 (M/H, T/F)   Sentence change (Hum → Mach)
N_PROP     Mach, Fake               Mach, True               المراهق (the teenager) → الطفل (the child)
N_PROP     Hum, Fake                Mach, True               العراق (Iraq) → الأردن (Jordan)
ADJ        Mach, True               Hum, Fake                مفروض (imposed) → مضلل (misleading)
NEG_PART   Mach, Fake               Hum, Fake                لم يتم (has not been) → يتم (is being)
ADJ_NUM    Mach, True               Hum, Fake                ثانية (second) → رابع (fourth)
Table 2: Examples of disagreement between annotators
Models OverviewManipulated Text DetectionFake News Detection
Models Overview
Data Splits
Split   Human     Machine manipulated (#Sent.)
        #Sent.    ADJ     ADJ_COMP  ADJ_NUM  N_NUM   N_PROP  NEG_PART  Total
TRAIN   48,727    9,600   4,513     5,752    9,600   9,600   9,600     48,665
DEV      6,573    1,300     638       844    1,300   1,300   1,300      6,682
TEST     5,895    1,200     592       665    1,200   1,200   1,200      6,057
Table 3: TRAIN, DEV, and TEST splits of ATB+ and AraNews+
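A quick consistency check on Table 3: the machine Total in each split should equal the sum of the six POS categories, and the human and machine halves are roughly balanced. A small sketch using the table's numbers; the dictionary layout and helper names are illustrative.

```python
# Table 3 counts: human sentences and machine-manipulated sentences per POS category.
SPLITS = {
    "TRAIN": {"human": 48_727, "total": 48_665,
              "machine": {"ADJ": 9_600, "ADJ_COMP": 4_513, "ADJ_NUM": 5_752,
                          "N_NUM": 9_600, "N_PROP": 9_600, "NEG_PART": 9_600}},
    "DEV": {"human": 6_573, "total": 6_682,
            "machine": {"ADJ": 1_300, "ADJ_COMP": 638, "ADJ_NUM": 844,
                        "N_NUM": 1_300, "N_PROP": 1_300, "NEG_PART": 1_300}},
    "TEST": {"human": 5_895, "total": 6_057,
             "machine": {"ADJ": 1_200, "ADJ_COMP": 592, "ADJ_NUM": 665,
                         "N_NUM": 1_200, "N_PROP": 1_200, "NEG_PART": 1_200}},
}


def check_split(split):
    """True if the per-category machine counts add up to the reported Total."""
    return sum(split["machine"].values()) == split["total"]


def machine_share(split):
    """Fraction of all sentences in the split (human + machine) that are machine-manipulated."""
    return split["total"] / (split["total"] + split["human"])


if __name__ == "__main__":
    for name, split in SPLITS.items():
        print(name, check_split(split), f"{machine_share(split):.3f}")
```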
Model, Hyper-Parameters, and Data.
1 Models: mBERT (Devlin et al., 2018), AraBERT (Antoun et al., 2020), XLM-R Base (Conneau et al., 2020), XLM-R Large (Conneau et al., 2020).
2 Hyper-parameters: 25 epochs, batch size of 32, maximum sequence length of 128, learning rate of 1e-5.
3 Training & evaluation data: TRAIN split of (a) ATB+ and (b) AraNews+; DEV and TEST splits from either ATB+ or AraNews+.
Fake News Detection
Our generated data: ATB+ and AraNews+
Improving fake news detection on an external dataset
Evaluating on Khouja (2020):
3,072 true sentences
1,475 fake sentences
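The Khouja (2020) evaluation set is imbalanced (3,072 true vs. 1,475 fake sentences, about two true sentences per fake one), so a detector that always answers "true" already looks reasonable on accuracy. A quick sketch of that trivial majority-class baseline on these counts. Reporting macro-averaged F1 alongside accuracy is an illustrative assumption here, not necessarily the metric used in the results slides.

```python
def majority_baseline(n_true, n_fake):
    """Scores of a classifier that labels every sentence 'true' (the majority class)."""
    n = n_true + n_fake
    accuracy = n_true / n
    # For the 'true' class: precision = n_true / n and recall = 1,
    # so F1 = 2 * n_true / (2 * n_true + n_fake).
    f1_true = 2 * n_true / (2 * n_true + n_fake)
    f1_fake = 0.0  # the 'fake' class is never predicted
    macro_f1 = (f1_true + f1_fake) / 2
    return accuracy, f1_true, macro_f1


if __name__ == "__main__":
    acc, f1_true, macro = majority_baseline(3_072, 1_475)
    print(f"accuracy={acc:.4f} F1(true)={f1_true:.4f} macro-F1={macro:.4f}")
```

Accuracy comes out near 0.68 while macro-F1 stays near 0.40, which is why a class-balanced metric is the more informative yardstick on this set.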
Results on Khouja (2020) Data
Conclusion and Future Work
1 New method
2 A new large-scale POS-tagged Arabic news dataset
3 New models:
   Detecting manipulated news text
   Detecting fake news
4 New SOTA on the task of fake news detection
https://github.com/UBC-NLP/wanlp2020_arabic_fake_news_detection