Machine Generation and Detection of Arabic Manipulated and ...


Page 1: Machine Generation and Detection of Arabic Manipulated and ...

Machine Generation and Detection of Arabic Manipulated and Fake News

El Moatez Billah Nagoudi (1), AbdelRahim Elmadany (1), Muhammad Abdul-Mageed (1), Tariq Alhindi (2), Hasan Cavusoglu (3)

[email protected]

(1) Natural Language Processing Lab, (1,3) The University of British Columbia
(2) Department of Computer Science, Columbia University

Page 2: Machine Generation and Detection of Arabic Manipulated and ...

1 Introduction

2 Machine Text Generation

3 Detection Models

4 Conclusion and Future Work

Page 3: Machine Generation and Detection of Arabic Manipulated and ...

Challenges & Goals

Page 4: Machine Generation and Detection of Arabic Manipulated and ...

Machine Text Generation

Authentic news dataset

Part-of-speech (POS) tagger

Word embedding model

Page 5: Machine Generation and Detection of Arabic Manipulated and ...

Our Method

Page 6: Machine Generation and Detection of Arabic Manipulated and ...

Illustrative Example

News Title: محرز ينتقل إلى برشلونة مقابل 120 مليون دولار ("Mahrez moves to Barcelona for 120 million dollars")

1 Step 1: Identify POS tags.

Words → POS Tags
Mahrez → NOUN_PROP
moves → VERB
to → PREP
Barcelona → NOUN_PROP
for (in exchange for) → NOUN
120 → NUM
million → NOUN
dollars → NOUN

Page 7: Machine Generation and Detection of Arabic Manipulated and ...

Illustrative Example

2 Step 2: POS and Token Selection.

Words → POS Tags
Mahrez → NOUN_PROP
moves → VERB
to → PREP
Barcelona → NOUN_PROP
for (in exchange for) → NOUN
120 → NUM
million → NOUN
dollars → NOUN

Page 8: Machine Generation and Detection of Arabic Manipulated and ...

Illustrative Example

3 Step 3: Sentence Manipulation.

Substitution with the 5 closest words of "Barcelona":

Mahrez moves to Madrid for 120 million dollars
Mahrez moves to Paris for 120 million dollars
Mahrez moves to Valencia for 120 million dollars
Mahrez moves to Milan for 120 million dollars
Mahrez moves to Manchester for 120 million dollars
...

Substitution with the 5 closest words of "Barcelona", "Mahrez" and "120":

Salah moves to Leeds for 350 million dollars
Messi moves to Madrid for 450 million dollars
Ronaldo moves to Paris for 155 million dollars
Mané moves to Valencia for 280 million dollars
Agüero moves to Marseille for 70 million dollars
...
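The three steps of the illustrative example can be sketched in a few lines of Python. This is only a minimal illustration, not the authors' implementation: the tokens are English glosses of the example title, the POS tags are the ones listed in Step 1, and the two-dimensional vectors in `TOY_VECTORS` are invented stand-ins for a trained word embedding model. The helper names `closest` and `manipulate` are hypothetical.

```python
import math

# POS-tagged title from the illustrative example (English glosses of the Arabic tokens).
TAGGED_TITLE = [
    ("Mahrez", "NOUN_PROP"), ("moves", "VERB"), ("to", "PREP"),
    ("Barcelona", "NOUN_PROP"), ("for", "NOUN"), ("120", "NUM"),
    ("million", "NOUN"), ("dollars", "NOUN"),
]

# Hypothetical toy vectors; a real system would query a trained embedding model.
TOY_VECTORS = {
    "Barcelona": (1.00, 0.10),
    "Madrid": (0.98, 0.12),
    "Paris": (0.95, 0.20),
    "Valencia": (0.93, 0.25),
    "Milan": (0.90, 0.30),
    "Manchester": (0.88, 0.35),
    "dollars": (0.10, 1.00),
}


def cosine(u, v):
    """Cosine similarity between two 2-d vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def closest(word, k):
    """Return the k vocabulary words most similar to `word`, excluding itself."""
    others = [w for w in TOY_VECTORS if w != word]
    others.sort(key=lambda w: cosine(TOY_VECTORS[word], TOY_VECTORS[w]), reverse=True)
    return others[:k]


def manipulate(tagged, target_pos, k=5):
    """Create one variant per neighbour for each in-vocabulary token tagged `target_pos`."""
    variants = []
    for i, (token, pos) in enumerate(tagged):
        if pos != target_pos or token not in TOY_VECTORS:
            continue  # skip tokens with another tag or without a vector
        for neighbour in closest(token, k):
            words = [t for t, _ in tagged]
            words[i] = neighbour
            variants.append(" ".join(words))
    return variants


for sentence in manipulate(TAGGED_TITLE, "NOUN_PROP"):
    print(sentence)
```

With these toy vectors only "Barcelona" has neighbours, so the loop prints five single-substitution variants. Substituting several tokens at once, as in the second list of the example, would combine the neighbour lists of each selected token.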

Page 9: Machine Generation and Detection of Arabic Manipulated and ...

Veracity of the manipulated News

1 Stays True.

"The best (afdal) smartphone is the iPhone" → "The best (ahsan) smartphone is the iPhone"

2 Veracity Changes.

"Aramco achieves the highest profits this year" → "Amazon achieves the highest profits this year"

Page 10: Machine Generation and Detection of Arabic Manipulated and ...

Human Annotation

1 Manipulated text detection.

Human Vs. Machine

κ = 79.46%

2 Veracity of manipulated text.

True Vs. Fake

κ = 81.07%

Gold   POS        #Sent.   Hum/Mach agreement (%)   True/Fake agreement (%)   %Fake
Hum    -          145      97.93                    N/A                       N/A
Mach   ADJ        27       96.30                    74.07                     48.15
Mach   ADJ_COMP   24       100                      91.67                     58.33
Mach   ADJ_NUM    26       76.92                    73.08                     78.85
Mach   NEG_PART   32       87.50                    90.63                     76.56
Mach   N_NUM      19       100                      73.68                     76.32
Mach   N_PROP     27       92.59                    74.07                     83.33
Mach   Overall    155      94.67                    80                        70.32

Table 1: Inter-annotator agreement
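The slides report κ for both annotation tasks without naming the statistic. Assuming it is Cohen's kappa for two annotators, the following sketch shows how such a value is computed from two label sequences. The ten labels are made up for illustration and are not taken from the annotation study.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two annotators."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item set")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under independence, from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical Human-vs-Machine judgments for ten sentences.
annotator_1 = ["Hum", "Hum", "Mach", "Mach", "Hum", "Mach", "Hum", "Mach", "Mach", "Hum"]
annotator_2 = ["Hum", "Hum", "Mach", "Mach", "Hum", "Mach", "Hum", "Mach", "Hum", "Hum"]
print(round(cohen_kappa(annotator_1, annotator_2), 2))
```

In this toy case the annotators agree on 9 of 10 items and chance agreement is 0.5, so kappa is lower than the raw agreement rate. The same gap separates the κ values on the slide from the percentage agreement in Table 1.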

Page 11: Machine Generation and Detection of Arabic Manipulated and ...

Human Annotation Examples

Annotator 2      Annotator 1      POS        Gold sentences
T/F    M/H       T/F    M/H
True   Mach      Fake   Mach      N_PROP     Hum / Mach sentence pair (Arabic text garbled in extraction)
True   Mach      Fake   Hum       N_PROP     Hum / Mach sentence pair (Arabic text garbled in extraction)
Fake   Hum       True   Mach      ADJ        Hum / Mach sentence pair (Arabic text garbled in extraction)
Fake   Hum       Fake   Mach      NEG_PART   Hum / Mach sentence pair (Arabic text garbled in extraction)
Fake   Hum       True   Mach      ADJ_NUM    Hum / Mach sentence pair (Arabic text garbled in extraction)

Table 2: Examples of disagreement between annotators

Page 12: Machine Generation and Detection of Arabic Manipulated and ...

Models Overview

Page 13: Machine Generation and Detection of Arabic Manipulated and ...

Data Splits

Split    Human      Machine Manipulated
         # Sent.    ADJ     ADJ_COMP   ADJ_NUM   N_NUM   N_PROP   NEG_PART   Total
TRAIN    48,727     9,600   4,513      5,752     9,600   9,600    9,600      48,665
DEV      6,573      1,300   638        844       1,300   1,300    1,300      6,682
TEST     5,895      1,200   592        665       1,200   1,200    1,200      6,057

Table 3: TRAIN, DEV, and TEST splits of ATB+ and AraNews+
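A quick consistency check of Table 3 can be written as follows. It is a small sketch that only re-adds the numbers printed in the table: the per-POS machine counts should sum to the reported totals, and the share of machine-manipulated sentences shows how balanced each split is. The names `SPLITS`, `machine_total` and `machine_share` are illustrative.

```python
# Counts copied from Table 3: (human sentences, per-POS machine counts, reported machine total).
SPLITS = {
    "TRAIN": (48_727, {"ADJ": 9_600, "ADJ_COMP": 4_513, "ADJ_NUM": 5_752,
                       "N_NUM": 9_600, "N_PROP": 9_600, "NEG_PART": 9_600}, 48_665),
    "DEV": (6_573, {"ADJ": 1_300, "ADJ_COMP": 638, "ADJ_NUM": 844,
                    "N_NUM": 1_300, "N_PROP": 1_300, "NEG_PART": 1_300}, 6_682),
    "TEST": (5_895, {"ADJ": 1_200, "ADJ_COMP": 592, "ADJ_NUM": 665,
                     "N_NUM": 1_200, "N_PROP": 1_200, "NEG_PART": 1_200}, 6_057),
}


def machine_total(split):
    """Sum of the per-POS machine-manipulated counts for one split."""
    _, per_pos, _ = SPLITS[split]
    return sum(per_pos.values())


def machine_share(split):
    """Fraction of the split that is machine manipulated."""
    human, per_pos, _ = SPLITS[split]
    machine = sum(per_pos.values())
    return machine / (human + machine)


for name, (_, _, reported) in SPLITS.items():
    status = "ok" if machine_total(name) == reported else "mismatch"
    print(f"{name}: total {machine_total(name)} ({status}), machine share {machine_share(name):.3f}")
```

All three totals match, and each split is close to an even mix of human and machine-manipulated sentences.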

Page 14: Machine Generation and Detection of Arabic Manipulated and ...

Model, Hyper-Parameters, and Data.

1 Models.
mBERT (Devlin et al., 2018).
AraBERT (Antoun et al., 2020).
XLM-R Base (Conneau et al., 2020).
XLM-R Large (Conneau et al., 2020).

2 Hyper-Parameters.
25 epochs.
Batch size of 32.
Max sequence length of 128.
Learning rate of 1e-5.

3 Training & Evaluation Data.
TRAIN split of (a) ATB+ and (b) AraNews+.
DEV and TEST splits from either ATB+ or AraNews+.
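The hyper-parameters on this slide can be collected in a small configuration object, shown here as a sketch rather than the authors' training script. It reads the learning rate as 1e-5 and assumes, for the step count only, that one training run uses all human and machine sentences of the TRAIN row in Table 3 (48,727 + 48,665). The class name `FineTuneConfig` and its methods are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Fine-tuning settings as listed on the slide."""
    epochs: int = 25
    batch_size: int = 32
    max_seq_length: int = 128
    learning_rate: float = 1e-5

    def steps_per_epoch(self, n_examples: int) -> int:
        # The last batch may be smaller, hence the ceiling.
        return math.ceil(n_examples / self.batch_size)

    def total_steps(self, n_examples: int) -> int:
        return self.epochs * self.steps_per_epoch(n_examples)


# Assumption: human + machine sentences of the TRAIN split are used together.
TRAIN_EXAMPLES = 48_727 + 48_665

config = FineTuneConfig()
print(config.steps_per_epoch(TRAIN_EXAMPLES), config.total_steps(TRAIN_EXAMPLES))
```

Under that assumption a run takes 3,044 optimizer steps per epoch and 76,100 steps in total, which gives a rough sense of the training cost per model.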

Page 15: Machine Generation and Detection of Arabic Manipulated and ...

Fake News Detection

Our generated data: ATB+ and AraNews+

Improving fake news detection on an external dataset

Evaluating on Khouja (2020):

3,072 true sentences

1,475 fake sentences
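Because the Khouja (2020) data is imbalanced, it helps to know what a trivial classifier would score before reading any results. The sketch below computes accuracy and macro-averaged F1 for a baseline that always predicts the larger class. The slides do not state which metric is reported, so macro F1 is an assumption here, and `majority_baseline` is an illustrative helper.

```python
def majority_baseline(n_true, n_fake):
    """Accuracy and macro-F1 of always predicting the larger class."""
    total = n_true + n_fake
    majority = max(n_true, n_fake)
    accuracy = majority / total
    # Majority class: every item is predicted as this class, so recall is 1
    # and precision equals the class share.
    precision = majority / total
    f1_majority = 2 * precision / (precision + 1)
    # The minority class is never predicted, so its F1 is taken as 0.
    macro_f1 = (f1_majority + 0.0) / 2
    return accuracy, macro_f1


accuracy, macro_f1 = majority_baseline(n_true=3072, n_fake=1475)
print(f"accuracy={accuracy:.4f} macro_F1={macro_f1:.4f}")
```

Any detection model evaluated on this data should clear both numbers by a wide margin to be considered useful.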

Page 16: Machine Generation and Detection of Arabic Manipulated and ...

Results on Khouja (2020) Data

Page 17: Machine Generation and Detection of Arabic Manipulated and ...

Results on Khouja (2020) Data

Page 18: Machine Generation and Detection of Arabic Manipulated and ...

Results on Khouja (2020) Data

Page 19: Machine Generation and Detection of Arabic Manipulated and ...

Results on Khouja (2020) Data

Page 20: Machine Generation and Detection of Arabic Manipulated and ...

Results on Khouja (2020) Data

Page 21: Machine Generation and Detection of Arabic Manipulated and ...

Conclusion and Future Work

1 New method

2 A new large-scale POS-tagged Arabic news dataset

3 New models:

Detecting manipulated news text

Detecting fake news

4 New SOTA on the task of fake news detection

https://github.com/UBC-NLP/wanlp2020_arabic_fake_news_detection
