Techniques for Automating Quality Assessment of Context-specific Content on Social Media Services
Prateek Dewan, PhD Thesis Defense
November 14, 2017
Committee members:
Dr. Alessandra Sala
Dr. Sanasam Ranbir Singh
Dr. Aditya Telang
Dr. Ponnurangam Kumaraguru (Advisor)
Who am I?
• Data Scientist at Apple
• PhD student since February 2012, IIIT-Delhi
• Masters (2010–2012), IIIT-Delhi
• Collaborations
  – IBM IRL (Delhi and Bengaluru), Symantec Research Labs (Pune), Dublin City University (Ireland), UFMG (Brazil)
• Worked in Privacy and Security on Online Social Media
• Research interests
  – Applied Machine Learning
  – Natural Language Processing
  – Web Security
Online Social Media: The Big Picture
“With great power comes great responsibility”
Thesis statement
• To design and evaluate automated techniques for quality assessment of context-specific content on social media services in real time
• Focus: Facebook
  – Biggest Online Social Media service
  – 2.01 billion monthly active users
  – Two out of every seven human beings on the planet use Facebook
  – Most sought-after OSN for news
Proposed Solution
Identify → Characterize → Model → Prototype → Deploy → Evaluate
Facebook Inspector: Demo
Scope
• Establishing the definition of poor quality content
• What content is poor in quality?
  – Untrustworthy
  – Child unsafe
  – Misleading information
  – Hoaxes, scams, clickbait
  – Violence, hate speech
• Definition conforming to
  – Facebook’s community standards¹
  – Definitions of page spam
¹ https://www.facebook.com/communitystandards
Approach
Characterize
• Poor quality posts published on Facebook
• Facebook pages publishing poor quality content
• Misinformation spread on Facebook through images
Model
• Ground truth extraction using URL blacklists and human annotation
• Experiments with multiple supervised learning techniques
• Two-fold model to identify malicious content in real time
Implement
• Facebook Inspector (FbI) architecture
• Live deployment via REST API and browser plug-ins for Chrome and Firefox
• 3,000+ downloads, 180+ daily active users, 1 million+ posts analyzed
• Evaluation in terms of response time, performance, and usability
Dataset

| Data type | Quantity |
|---|---|
| Unique posts | 4,465,371 |
| Unique entities (users + pages) | 3,373,953 |
| Unique users | 2,983,707 |
| Unique pages | 390,246 |
| Unique URLs | 480,407 |
| Unique posts with one or more URLs | 1,222,137 |
| Unique entities posting URLs | 856,758 |
| Unique posts with one or more malicious URLs | 11,217 |
| Unique entities posting one or more malicious URLs | 7,962 |
| Unique malicious URLs | 4,622 |

Establishing Ground Truth
• Extracted posts containing one or more URLs
  – 1.2 million out of 4.4 million posts in total
  – 480k unique URLs
• Used six URL blacklists
  – Google Safe Browsing (malware/phishing)
  – VirusTotal (spam/malware/phishing)
  – SURBL (spam)
  – Web of Trust (trust score)*
  – Spamhaus (spam)
  – PhishTank (phishing)
• Posts containing one or more blacklisted URLs were marked as poor quality posts (11,217 in all)
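The post-labeling rule above can be sketched as follows. This is a minimal illustration that assumes each blacklist has already been fetched into an in-memory set of URLs. The real services (Google Safe Browsing, VirusTotal, and so on) are queried through their own APIs and usually match on canonicalized URLs or domains, which this sketch does not attempt. The names `blacklists`, `is_poor_quality`, and `label_posts` are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def is_poor_quality(post_urls: Iterable[str], blacklists: Dict[str, Set[str]]) -> bool:
    """Return True if any URL in the post appears on any blacklist (exact match)."""
    return any(url in listed for url in post_urls for listed in blacklists.values())


def label_posts(posts: Dict[str, List[str]], blacklists: Dict[str, Set[str]]) -> Dict[str, bool]:
    """Map each post ID to True (poor quality) or False."""
    return {post_id: is_poor_quality(urls, blacklists) for post_id, urls in posts.items()}


if __name__ == "__main__":
    # Hypothetical blacklist contents and posts, for illustration only.
    blacklists = {
        "phishing": {"http://login-verify.example/"},
        "spam": {"http://cheap-pills.example/buy"},
    }
    posts = {
        "post_1": ["http://news.example/article"],
        "post_2": ["http://news.example/a", "http://cheap-pills.example/buy"],
        "post_3": [],
    }
    print(label_posts(posts, blacklists))
```

Keeping the blacklists in a dictionary keyed by source makes it easy to report which service flagged a URL later.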
Web of Trust
• A URL (e.g., http://www.domain.com) is marked Malicious if either
  – Reputation is Unsatisfactory / Poor / Very poor (less than 60) with High confidence (greater than 10), OR
  – Category is Negative
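A minimal sketch of this rule follows. It assumes the reputation and confidence conditions must both hold, since the slide lists them together. It also assumes categories arrive as lowercase strings. The function name and argument format are hypothetical and are not the WOT API.

```python
from typing import Set


def wot_malicious(reputation: int, confidence: int, categories: Set[str]) -> bool:
    """Apply the Web of Trust rule from the slide (assumed reading).

    reputation: WOT reputation score (0-100); below 60 is Unsatisfactory, Poor, or Very poor.
    confidence: WOT confidence in that score; above 10 is treated as High.
    categories: category labels attached to the URL, e.g. {"negative"} (hypothetical format).
    """
    poor_and_confident = reputation < 60 and confidence > 10
    return poor_and_confident or "negative" in categories


if __name__ == "__main__":
    print(wot_malicious(45, 20, set()))         # poor reputation, high confidence
    print(wot_malicious(90, 50, {"negative"}))  # good score but a negative category
```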
Findings
• Facebook’s current techniques do not suffice
  – 65% of all poor quality posts were still on Facebook after four or more months
  – These posts gathered likes from 52,169 unique users and comments from 8,784 unique users
• Facebook’s partnership with Web of Trust?
  – 88% of all malicious URLs had a poor reputation on WOT
  – No warning pages
Platforms used to post
Distribution of poor quality posts
[Figure: entities and posts, split into pages and users]
Facebook Pages posting poor quality content
Hiding in Plain Sight: Characterizing and Detecting Malicious Facebook Pages. Prateek Dewan, Shrey Bagroy, and Ponnurangam Kumaraguru (Short paper). Published at IEEE/ACM Conference on Advances in Social Networks Analysis and Mining (ASONAM), San Francisco, USA, 2016.
Ground Truth extraction: Facebook pages
4.4 million posts → 10,341 malicious posts (1,557 pages; 5,868 users) → 627 malicious pages
• Criterion for a malicious page: 1 or more malicious URLs in the most recent 100 posts
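The page-level criterion looks only at a page's most recent posts. A minimal sketch follows, assuming each page's posts are available newest-first as lists of URLs and that the malicious URLs have already been collected into a set. The function and parameter names are hypothetical.

```python
from typing import List, Sequence, Set


def is_malicious_page(posts_newest_first: Sequence[List[str]],
                      malicious_urls: Set[str],
                      window: int = 100) -> bool:
    """Flag a page if any of its `window` most recent posts contains a malicious URL.

    posts_newest_first: one list of URLs per post, most recent post first.
    """
    recent = posts_newest_first[:window]
    return any(url in malicious_urls for post_urls in recent for url in post_urls)
```

Passing `window` explicitly makes it easy to check how sensitive the page labels are to the 100-post cutoff.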
Dataset of pages posting poor quality content

| WOT response | No. of pages | No. of posts |
|---|---|---|
| Child unsafe | 387 | 10,891 |
| Untrustworthy | 317 | 8,057 |
| Questionable | 312 | 8,859 |
| Negative | 266 | 5,863 |
| Adult content | 162 | 3,290 |
| Spam | 124 | 4,985 |
| Phishing | 39 | 495 |
| Total | 627 (31) | 20,999 |

• The number in brackets is the number of Verified pages
• A page (and its posts) can carry more than one WOT category, so the rows do not sum to the totals
Content analysis (page names)
• Sentence tokenization → Word tokenization → Case normalization → Stemming → Stopword removal
• N-gram analysis (n = 1, 2, 3)
• Politically polarized entities amongst poor quality pages
  – British National Party (BNP), The Tea Party, English Defence League, American Defense League, American Conservatives, Geert Wilders supporters…
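The preprocessing chain and n-gram counting above can be sketched in plain Python. This is a simplified stand-in and not the thesis pipeline:

- Tokenization uses regular expressions.
- The stemmer is a crude suffix stripper in place of a real one such as Porter's.
- The stopword list is a tiny hypothetical sample.
- The steps follow the slide's order, with stemming before stopword removal. For that reason the stemmer skips short words.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (hypothetical); real pipelines use a full list.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "our", "the", "to", "we"}
SUFFIXES = ("ing", "ed", "s")


def naive_stem(word):
    """Crude suffix stripping, a stand-in for a real stemmer such as Porter's."""
    if len(word) > 4:  # leave short words alone so stopwords survive until removal
        for suffix in SUFFIXES:
            if word.endswith(suffix):
                return word[: -len(suffix)]
    return word


def preprocess(text):
    """Sentence split -> word split -> lowercase -> stem -> drop stopwords."""
    tokens = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = re.findall(r"[A-Za-z']+", sentence)
        words = [naive_stem(w.lower()) for w in words]
        tokens.extend(w for w in words if w not in STOPWORDS)
    return tokens


def ngrams(tokens, n):
    """All contiguous n-token windows, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(page_names, n, k=5):
    """The k most frequent n-grams across a collection of page names."""
    counts = Counter()
    for name in page_names:
        counts.update(ngrams(preprocess(name), n))
    return counts.most_common(k)


if __name__ == "__main__":
    names = ["Patriots United", "United Patriots", "Patriots of England"]
    for n in (1, 2, 3):
        print(n, top_ngrams(names, n, k=3))
```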
Network analysis
• Collusive behavior within pages posting poor quality content
[Figure: networks of pages based on shares, likes, and comments]
Temporal activity
• Activity ratio: […], computed during the complete observation period
• Malicious pages are more active than benign pages
Why? The Human Brain: Images versus text
• The human brain is often said to process images 60,000 times faster than text
Are we doing enough to "understand" images?
• Most research to analyze social media content focuses on text
  – Topic modelling
  – Sentiment analysis
• Does it capture everything?
  – Studies related to images are limited to small scale
  – A few hundred images, manually annotated and analyzed
• What can be done?
  – Automated techniques for image summarization: Deep Learning and Convolutional Neural Networks (CNNs) to scale across large numbers of images
  – Domain transfer learning
  – Optical Character Recognition
Methodology
• Images posted on Facebook during the Paris Attacks, November 2015
• 3-tier pipeline for extracting high-level image descriptors from images

| Data type | Quantity |
|---|---|
| Unique posts | 131,548 |
| Unique users | 106,275 |
| Posts with images | 75,277 |
| Total images extracted | 57,748 |
| Total unique images | 15,123 |

Pipeline: Images → three tiers → Human understandable descriptors
• Tier 1 (Visual Themes): themes from Inception-v3, followed by manual calibration
• Tier 2 (Image Sentiment): DeCAF trained on SentiBank
• Tier 3 (Text embedded in images): Optical Character Recognition, then Text Sentiment (LIWC) + Topics (TF)
Tier I: Visual Themes
• ImageNet Large Scale Visual Recognition Challenge (ILSVRC), 2012
  – 1.2 million images, 1,000 categories
• Model: Google’s Inception-v3 (top-1 error: 17.2%)
  – 48-layer Deep Convolutional Neural Network
Tier I: Visual Themes (contd.)
• All images labeled using Inception-v3
• Validation
  – Random sample of 2,545 images annotated by 3 human annotators
  – 38.87% accuracy (majority voting)
• Manual calibration
  – Renamed 7 out of the top 30 (most frequently occurring) labels
  – New accuracy: 51.3%
• Why rename? An image that Inception-v3 labels “Bolo Tie” is, in our dataset, the PeaceForParis symbol
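A minimal sketch of the two validation steps above rests on these assumptions:

- Each annotator gives a yes/no judgement on whether the model's label fits the image.
- An image counts as correct when a strict majority of annotators say yes.
- Calibration is a lookup table that renames misfiring labels.

The mapping, its label strings, and the function names are hypothetical. Only the Bolo Tie / PeaceForParis pair comes from the slide.

```python
from typing import Dict, List, Sequence

# Hypothetical calibration table: model label -> label that fits this dataset.
CALIBRATION = {"bolo tie": "PeaceForParis"}


def calibrate(labels: Sequence[str], mapping: Dict[str, str]) -> List[str]:
    """Rename labels found in the mapping; all other labels pass through unchanged."""
    return [mapping.get(label, label) for label in labels]


def majority_accuracy(votes: Sequence[Sequence[bool]]) -> float:
    """votes[i] holds each annotator's judgement that image i's label is correct.

    An image counts as correctly labeled when a strict majority says yes.
    """
    if not votes:
        return 0.0
    correct = sum(1 for v in votes if sum(v) * 2 > len(v))
    return correct / len(votes)
```

After relabeling, the renamed images would need fresh judgements before accuracy is recomputed. The sketch keeps the two steps separate for that reason.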
Tier II: Image Sentiment
• Domain Transfer Learning
• Inception-v3’s last layer retrained using SentiBank
  – SentiBank: images collected from Flickr using Adjective Noun Pairs (ANPs) as search queries
  – ANPs: happy dog, adorable baby, abandoned house
  – Weakly labeled dataset of images carrying emotion
• Final training set: 133,108 negative + 305,100 positive sentiment images
• 10-fold random subsampling
• 69.8% accuracy
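Retraining only the last layer means the network's features stay frozen and just a new output classifier is fit on them. A minimal NumPy sketch follows, under these assumptions:

- Inception-v3 activations have already been extracted into a feature matrix. Synthetic data stands in for them here.
- A logistic-regression layer is trained by gradient descent.
- The model is scored over repeated random train/test splits. This is our reading of "10-fold random subsampling".

This is not the thesis code, which retrained the layer inside the network. The 80/20 split, learning rate, and epoch count are assumptions.

```python
import numpy as np


def train_last_layer(features, labels, lr=0.1, epochs=200):
    """Fit a logistic-regression output layer on frozen, precomputed CNN features.

    features: (n_samples, n_dims) array; labels: 0 = negative, 1 = positive sentiment.
    """
    w = np.zeros(features.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(features @ w + b)))  # predicted P(positive)
        grad = p - labels                               # gradient of log loss w.r.t. logits
        w -= lr * (features.T @ grad) / len(labels)
        b -= lr * grad.mean()
    return w, b


def predict(features, w, b):
    """Return 1 where the logit is positive, else 0."""
    return (features @ w + b > 0).astype(int)


def random_subsampling_accuracy(features, labels, rounds=10, test_frac=0.2, seed=0):
    """Mean test accuracy over `rounds` independent random train/test splits."""
    rng = np.random.default_rng(seed)
    n = len(labels)
    n_test = int(n * test_frac)
    scores = []
    for _ in range(rounds):
        idx = rng.permutation(n)
        test, train = idx[:n_test], idx[n_test:]
        w, b = train_last_layer(features[train], labels[train])
        scores.append(np.mean(predict(features[test], w, b) == labels[test]))
    return float(np.mean(scores))


if __name__ == "__main__":
    # Synthetic stand-in for 2048-dimensional Inception-v3 features (hypothetical data).
    rng = np.random.default_rng(42)
    X = rng.normal(size=(500, 32))
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
    print(f"mean accuracy: {random_subsampling_accuracy(X, y):.3f}")
```

With the real training set's class imbalance (roughly 2.3 positive images per negative one), reporting per-class accuracy alongside the mean would be advisable.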
Tier III: Text embedded in images
• Optical Character Recognition (OCR)
  – Tesseract OCR (Python)
• 31,689 images had text
• Manually extracted text from a random sample of 1,000 images
• Compared with OCR output using string similarity metrics
• ~62% accuracy
Tesseract output:
“No-one thinks that these people are representative of Christians. So why do so many think that these people are representative of Muslims?”
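The slide does not say which string similarity metric was used. As one plausible choice, this sketch uses the standard-library `difflib.SequenceMatcher.ratio()` after lowercasing and collapsing whitespace. It then averages the score over the manually transcribed sample. The function names are hypothetical.

```python
from difflib import SequenceMatcher
from statistics import mean
from typing import Iterable, Tuple


def normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def ocr_similarity(ocr_text: str, reference: str) -> float:
    """Similarity in [0, 1] between OCR output and a manual transcription."""
    return SequenceMatcher(None, normalize(ocr_text), normalize(reference)).ratio()


def mean_ocr_accuracy(pairs: Iterable[Tuple[str, str]]) -> float:
    """Average similarity over (ocr_text, reference) pairs."""
    return mean(ocr_similarity(o, r) for o, r in pairs)
```

Edit-distance-based metrics such as normalized Levenshtein distance are a common alternative. The choice can shift the reported accuracy noticeably.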
Image and post text had different topics
• Text embedded in images depicted more negative sentiment than user generated textual content
[Figure panels: Text embedded in images; User generated text]
Sentiment: Images versus text
• Image sentiment was more positive than text sentiment
[Figure: Sentiment Value / Volume Fraction (y-axis, 0 to 0.6) versus No. of hours after the attacks (x-axis, 8 to 280); series: Post Text, Image Text, Image, Volume Fraction]
Poor quality image content popular on Facebook
Revisiting: Establishing Ground Truth
• Dataset I: posts containing one or more URLs were checked against six URL blacklists; posts with one or more blacklisted URLs were marked as poor quality (11,217 in all)
Ground Truth extraction: Dataset II
• What if a post does not have a URL?
• 500 random Facebook posts × 17 events × 3 annotators
• Definition of malicious post
  – “Any irrelevant or unsolicited messages sent over the Internet, typically to large numbers of users, for the purposes of advertising, phishing, spreading malware, etc. are categorized as spam. In terms of online social media, social spam is any content which is irrelevant / unrelated to the event under consideration, and/or aimed at spreading phishing, malware, advertisements, self promotion etc., including bulk messages, profanity, insults, hate speech, malicious links, fraudulent reviews, scams, fake information etc.”
• Final dataset (all 3 annotators agreed on the same label)
  – 571 malicious posts
  – 3,841 benign posts
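Keeping only the posts on which all three annotators agreed is a simple filter. A minimal sketch follows, assuming annotations are stored as a hypothetical mapping from post ID to the list of labels given by the annotators.

```python
from collections import Counter
from typing import Dict, List


def unanimous_labels(annotations: Dict[str, List[str]]) -> Dict[str, str]:
    """Keep only posts on which every annotator chose the same label."""
    return {
        post_id: labels[0]
        for post_id, labels in annotations.items()
        if labels and len(set(labels)) == 1
    }


def label_counts(annotations: Dict[str, List[str]]) -> Counter:
    """Count each label (e.g. malicious / benign) in the unanimous subset."""
    return Counter(unanimous_labels(annotations).values())
```

Dropping disagreements trades dataset size for label reliability. Here roughly 4,400 of the 8,500 annotated posts survive.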
Feature set: Facebook Posts

| Source | Features |
|---|---|
| Entity (9) | isPage, gender, pageCategory, hasUsername, usernameLength, nameLength, numWordsInName, locale, pageLikes |
| Textual content (18) | Presence of !, ?, !!, ??, emoticons (smile, frown), numWords, avgWordLength, numSentences, avgSentenceLength, numDictionaryWords, numHashtags, hashtagsPerWord, numCharacters, numURLs, URLsPerWord, numUppercaseCharacters, numWords / numUniqueWords |
| Metadata (10) | Application, Presence of facebook.com URL, Presence of apps.facebook.com URL, Presence of Facebook event URL, hasMessage, hasStory, hasPicture, hasLink, type, linkLength |
| Link (7) | http/https, numHyphens, numParameters, avgParameterLength, numSubdomains, pathLength |

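A partial sketch of how some of the textual and link features in the table could be computed with the Python standard library. The thesis does not give exact definitions, so these are assumptions:

- Words are whitespace-separated tokens.
- Hashtags are tokens starting with `#`.
- URLs are matched with a simple regular expression.
- `numSubdomains` counts host labels beyond the last two, which miscounts hosts such as `example.co.uk`.

Feature names follow the table. `isHttps` stands in for "http/https".

```python
import re
from urllib.parse import parse_qsl, urlparse


def text_features(message):
    """A subset of the textual-content features for one post."""
    words = message.split()
    num_words = len(words)
    hashtags = [w for w in words if w.startswith("#")]
    urls = re.findall(r"https?://\S+", message)
    return {
        "numWords": num_words,
        "avgWordLength": sum(map(len, words)) / num_words if num_words else 0.0,
        "numHashtags": len(hashtags),
        "hashtagsPerWord": len(hashtags) / num_words if num_words else 0.0,
        "numURLs": len(urls),
        "URLsPerWord": len(urls) / num_words if num_words else 0.0,
        "numUppercaseCharacters": sum(c.isupper() for c in message),
        "hasDoubleExclamation": "!!" in message,
        "numWords/numUniqueWords": num_words / len(set(words)) if num_words else 0.0,
    }


def link_features(url):
    """The link features for one URL (definitions assumed)."""
    parsed = urlparse(url)
    params = parse_qsl(parsed.query, keep_blank_values=True)
    host_labels = parsed.hostname.split(".") if parsed.hostname else []
    return {
        "isHttps": parsed.scheme == "https",
        "numHyphens": url.count("-"),
        "numParameters": len(params),
        "avgParameterLength": (
            sum(len(k) + len(v) for k, v in params) / len(params) if params else 0.0
        ),
        "numSubdomains": max(len(host_labels) - 2, 0),
        "pathLength": len(parsed.path),
    }


if __name__ == "__main__":
    print(text_features("FREE iPhone!! Click http://bit.ly/x #win #free"))
    print(link_features("https://promo.deals.example.com/free-gift/claim?id=42&ref=fb"))
```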
Supervised learning: Dataset I

| Classifier / Features | Entity | Text | Metadata | Link | All | Top 7 |
|---|---|---|---|---|---|---|
| Naïve Bayes | 54.79 | 52.41 | 71.60 | 69.25 | 56.15 | 74.72 |
| Decision Tree | 63.02 | 64.78 | 80.56 | 82.34 | 84.67 | 86.17 |
| Random Forest | 63.47 | 66.25 | 80.67 | 82.56 | 85.05 | 86.62 |
| SVM (RBF kernel) | 61.77 | 64.89 | 78.75 | 81.45 | 75.89 | 83.66 |

Supervised learning: Dataset II

| Classifier / Features | Entity | Text | Metadata | Link | All |
|---|---|---|---|---|---|
| Naïve Bayes | 51.67 | 51.60 | 72.45 | 77.58 | 67.63 |
| Decision Tree | 51.66 | 73.16 | 79.01 | 81.04 | 76.17 |
| Random Forest | 52.86 | 76.56 | 79.87 | 81.49 | 80.56 |
| SVM (RBF kernel) | 53.16 | 76.52 | 78.18 | 80.37 | 73.79 |

Feature set: Facebook Pages

| Feature group | Features |
|---|---|
| Page features | Likes, talking about, description length, bio, category, name, location, check-ins, … |
| Posting behavior | Daily activity ratio, post types, post likes, post comments, post shares, post engagement ratio, post language, average post length, no. of unique URLs in posts, no. of unique domains in posts, etc. |

• Supervised learning
  – Page + post features
    – 55 features from page information
    – 41 features from posting behavior
  – Bag of words
    – Content generated by pages
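Several posting-behavior features are simple aggregates over a page's posts. A minimal sketch follows for three of them: unique URLs, unique domains, and average post length (in characters). It assumes each post is a dict with hypothetical `message` and `urls` keys. The slide does not give the thesis's exact definitions.

```python
from typing import Dict, List
from urllib.parse import urlparse


def posting_behavior(posts: List[Dict]) -> Dict[str, float]:
    """Aggregate a few posting-behavior features over one page's posts."""
    urls = {url for post in posts for url in post.get("urls", [])}
    domains = {urlparse(url).hostname for url in urls} - {None}
    lengths = [len(post.get("message", "")) for post in posts]
    return {
        "numUniqueURLs": len(urls),
        "numUniqueDomains": len(domains),
        "avgPostLength": sum(lengths) / len(lengths) if lengths else 0.0,
    }
```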
Supervised learning: Page + post features

| Classifier | Feature set | Accuracy (%) | ROC AUC |
|---|---|---|---|
| Naïve Bayesian | Page | 63.95 | 0.685 |
| Naïve Bayesian | Post | 69.61 | 0.753 |
| Naïve Bayesian | Page + Post | 70.81 | 0.776 |
| Logistic Regression | Page | 67.38 | 0.745 |
| Logistic Regression | Post | 76.55 | 0.825 |
| Logistic Regression | Page + Post | 76.71 | 0.846 |
| Decision Trees | Page | 65.55 | 0.668 |
| Decision Trees | Post | 71.37 | 0.720 |
| Decision Trees | Page + Post | 70.81 | 0.758 |
| Random Forest | Page | 67.86 | 0.750 |
| Random Forest | Post | 74.95 | 0.829 |
| Random Forest | Page + Post | 75.27 | 0.837 |

Supervised learning: Bag of words

| Classifier | Feature set | Accuracy (%) | ROC AUC |
|---|---|---|---|
| Naïve Bayesian | Unigrams | 68.27 | 0.682 |
| Naïve Bayesian | Bigrams | 69.06 | 0.690 |
| Naïve Bayesian | Trigrams | 69.77 | 0.697 |
| Logistic Regression | Unigrams | 74.18 | 0.795 |
| Logistic Regression | Bigrams | 74.34 | 0.791 |
| Logistic Regression | Trigrams | 73.93 | 0.789 |
| Decision Trees | Unigrams | 68.12 | 0.678 |
| Decision Trees | Bigrams | 67.05 | 0.678 |
| Decision Trees | Trigrams | 66.63 | 0.672 |
| Random Forest | Unigrams | 72.26 | 0.794 |
| Random Forest | Bigrams | 71.80 | 0.802 |
| Random Forest | Trigrams | 72.18 | 0.794 |
| Sparse NN | Unigrams | 81.74 | 0.862 |
| Sparse NN | Bigrams | 84.12 | 0.872 |
| Sparse NN | Trigrams | 84.13 | 0.900 |

Model for real time detection
• The model for pages depends on posts published by pages
  – Can’t be used for detection in real time
• Two-fold supervised learning based model using post features
• Utilizing class probabilities for decision making
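Decision boundary
[Figure: decision boundary in the plane of Classifier 1 and Classifier 2 class probabilities (each axis from 0 to 1, low to high), separating a Malicious region from a Benign region]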
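The slides say only that the two-fold model combines the class probabilities of two classifiers, and they show a boundary between a malicious and a benign region. The exact boundary is not given. The sketch below is one hypothetical way to combine two malicious-class probabilities. It uses made-up thresholds `high` and `joint` and is not FbI's actual decision rule.

```python
def two_fold_decision(p_model1: float, p_model2: float,
                      high: float = 0.8, joint: float = 0.5) -> str:
    """Combine two classifiers' malicious-class probabilities into one verdict.

    Hypothetical rule: flag a post if either classifier is highly confident,
    or if both lean towards malicious.
    """
    for p in (p_model1, p_model2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    if p_model1 >= high or p_model2 >= high:
        return "malicious"
    if p_model1 >= joint and p_model2 >= joint:
        return "malicious"
    return "benign"
```

Using probabilities rather than hard labels is what lets a boundary like this be tuned. Moving `high` and `joint` trades false positives against missed malicious posts.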
Facebook Inspector (FbI): Architecture
FbI stats

| Statistic | Value |
|---|---|
| Date of public launch | August 23, 2015 |
| Total incoming requests | 9 million+ |
| Total public posts analyzed | 3.5 million+ |
| Total downloads | 5,000+ |
| Daily active users | 250+ |
| Total unique browsers | 1,250+ |
| Posts marked as malicious | 615,000+ |
| Posts marked as benign | 2.9 million+ |

FbI evaluation: Response time
• ~80% of posts processed within 3 seconds
• Average time per post: 2.635 seconds
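The two response-time figures measure different things: the share of posts finished under a cutoff, and the mean latency. A minimal sketch follows that computes both from per-post latencies. It assumes latencies are logged in seconds, and the function name is hypothetical.

```python
from statistics import mean
from typing import Dict, Sequence


def response_time_summary(latencies_s: Sequence[float], cutoff_s: float = 3.0) -> Dict[str, float]:
    """Fraction of posts processed within `cutoff_s` seconds, plus the mean latency."""
    if not latencies_s:
        raise ValueError("need at least one latency measurement")
    within = sum(1 for t in latencies_s if t <= cutoff_s) / len(latencies_s)
    return {"fraction_within_cutoff": within, "mean_seconds": mean(latencies_s)}
```

A few slow posts can pull the mean up without changing the fraction under the cutoff much, so reporting both (or a percentile) gives a fuller picture.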
FbI evaluation: Usability
• Usability study with 53 participants
  – SUS score: 81.36 (A grade)
  – Higher perceived usability than >90% of all systems evaluated using the SUS scale
• 98.1% of participants found FbI “easy to use”
• 67.9% of participants would like to use FbI frequently
• Quotes from users
  – “Saves your time spent on spam links and hence enhances user experience.”
  – “[Facebook Inspector] Can be useful for minors and people who lack the judgement to decide how the post is.”
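The SUS score above comes from the standard System Usability Scale. Each participant answers ten items on a 1 to 5 scale. Odd-numbered (positively worded) items contribute r - 1 and even-numbered items contribute 5 - r. The sum is multiplied by 2.5 to give a 0 to 100 score, and the study score is the mean across participants. A minimal sketch follows, with hypothetical function names. The study's raw responses are not in the slides.

```python
from statistics import mean
from typing import Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Standard SUS scoring for one participant's ten answers on a 1-5 scale."""
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs ten responses, each between 1 and 5")
    # Items 1, 3, 5, 7, 9 (even indices) are positively worded: contribute r - 1.
    # Items 2, 4, 6, 8, 10 (odd indices) are negatively worded: contribute 5 - r.
    total = sum(r - 1 if i % 2 == 0 else 5 - r for i, r in enumerate(responses))
    return total * 2.5


def study_sus(all_responses: Sequence[Sequence[int]]) -> float:
    """Mean SUS score across participants."""
    return mean(sus_score(r) for r in all_responses)
```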
Contributions summary
• Identified and characterized poor quality content spread on Facebook, with the purpose of identifying poor quality posts published during news-making events in real time
• Evaluated supervised learning approaches for identifying poor quality posts on Facebook in real time, using entity, textual, metadata, and URL features
• Deployed and evaluated a novel framework and system for real time detection of poor quality posts on Facebook during news-making events
How does it help?
• Social media services are the primary source of information for the majority of Internet users
  – Content is unmoderated and crowd-sourced; not everything you see may be true
• Facebook Inspector provides a useful and usable real world solution to assist users
• Methodology for fast and accurate summarization of image datasets pertaining to a given topic
  – Government agencies / brands can use this methodology to quickly produce high-level summaries of events / products and gauge the pulse of the masses
Real world impact
• Facebook Inspector, the real time system built to identify poor quality content, is used by 250+ Facebook users and has processed over 9 million requests
• A unique dataset of Facebook posts containing malicious URLs, pages posting malicious content, and images depicting misinformation from 20+ news-making events
Limitations and future work
• Current system does not incorporate user feedback
  – We would like to enable users to provide feedback to build a more personalized detection model
• Computer vision techniques have limited accuracy on social media content
  – The object detection, sentiment analysis, and optical character recognition techniques we used have not been tested thoroughly on social media content
• Identify and rank users on the basis of degree of malice
  – The more malicious content a user generates, the higher the ranking
Acknowledgements
• NIXI for travel support (eCRS, 2014)
• IIIT-Delhi for travel support (ASONAM, 2017)
• Govt. of India for funding during PhD
• Collaborators and co-authors: Dr. Anand Kashyap, Shrey Bagroy, Anshuman Suri, Varun Bharadhwaj, Aditi Mithal
• Monitoring committee: Dr. Vinayak and Dr. Sambuddho
• Peers: Dr. Niharika Sachdeva, Anupama Aggarwal, Dr. Paridhi Jain, Dr. Aditi Gupta, Srishti Gupta, Rishabh Kaushal
• Members of Precog@IIITD and CERC
• Everyone else who has been part of my journey…
Publications: Part of thesis
• Dewan, P., Bagroy, S., and Kumaraguru, P. Hiding in Plain Sight: The Anatomy of Malicious Pages on Facebook. Book chapter, Lecture Notes in Social Networks, Springer, 2017 (to appear).
• Dewan, P., Suri, A., Bharadhwaj, V., Mithal, A., and Kumaraguru, P. Towards Understanding Crisis Events On Online Social Networks Through Pictures. IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 2017.
• Dewan, P., and Kumaraguru, P. Facebook Inspector (FbI): Towards Automatic Real Time Detection of Malicious Content on Facebook. Social Network Analysis and Mining Journal (SNAM), 2017. Volume 7, Issue 1.
• Dewan, P., Bagroy, S., and Kumaraguru, P. Hiding in Plain Sight: Characterizing and Detecting Malicious Facebook Pages. IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 2016 (Short paper).
• Dewan, P., and Kumaraguru, P. Towards Automatic Real Time Identification of Malicious Posts on Facebook. Thirteenth Annual Conference on Privacy, Security and Trust (PST), 2015.
• Dewan, P., Kashyap, A., and Kumaraguru, P. Analyzing Social and Stylometric Features to Identify Spear phishing Emails. APWG eCrime Research Symposium (eCRS), 2014.
Publications: Other
• Kaushal, R., Chandok, S., Jain, P., Dewan, P., Gupta, N., and Kumaraguru, P. Nudging Nemo: Helping Users Control Linkability across Social Networks. 9th International Conference on Social Informatics (SocInfo), 2017 (Short paper).
• Deshpande, P., Joshi, S., Dewan, P., Murthy, K., Mohania, M., and Agrawal, S. The Mask of ZoRRo: preventing information leakage from documents. Knowledge and Information Systems Journal, 2014.
• Mittal, S., Gupta, N., Dewan, P., and Kumaraguru, P. Pinned it! A large scale study of the Pinterest network. 1st ACM IKDD Conference on Data Sciences (CoDS), 2014.
• Dewan, P., Gupta, M., Goyal, K., and Kumaraguru, P. MultiOSN: Realtime Monitoring of Real World Events on Multiple Online Social Media. IBM ICARE, 2013.
• Magalhães, T., Dewan, P., Kumaraguru, P., Melo-Minardi, R., and Almeida, V. uTrack: Track Yourself! Monitoring Information on Online Social Media. 22nd International World Wide Web Conference (WWW), 2013.
• Conway, M., Dewan, P., Kumaraguru, P., and McInerney, L. 'White Pride Worldwide': A Meta-analysis of Stormfront.org. Internet, Politics, Policy 2012: Big Data, Big Challenges?, Oxford Internet Institute, University of Oxford.