MediaEval 2015 - Verifying Multimedia Use at MediaEval 2015
Verifying Multimedia Use at MediaEval 2015 Christina Boididou1, Katerina Andreadou1, Symeon Papadopoulos1, Duc-Tien Dang-Nguyen2, Giulia Boato2, Michael Riegler3 & Yiannis Kompatsiaris1
1 Information Technologies Institute (ITI), CERTH, Greece
2 University of Trento, Italy
3 Simula Research Lab, Norway
MediaEval 2015 Workshop, Sept. 14-15, 2015, Wurzen, Germany
Real or Fake
Real photo, captured in April 2011 by the Wall Street Journal, but heavily tweeted during Hurricane Sandy (29 Oct 2012). Tweeted by multiple sources and retweeted many times. Original online at:
http://blogs.wsj.com/metropolis/2011/04/28/weather-journal-clouds-gathered-but-no-tornado-damage/
Task at a Glance
[Diagram: TWEET + IMAGE → MEDIAEVAL SYSTEM → FAKE / REAL]
Systems may use:
• Tweet text
• Tweet metadata
• Twitter user profile
• Image content
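As a rough illustration of how a system might combine these signals, the sketch below extracts a few simple credibility cues from a tweet and its author profile and applies a toy rule-based score. The field names, features, weights, and threshold are all hypothetical, for illustration only; they are not the actual task schema or any participant's method.

```python
# Hypothetical sketch of the feature families a verification system might use.
# Field names and weights are illustrative, not the task's actual schema.

def extract_features(tweet: dict) -> dict:
    """Derive simple credibility cues from a tweet and its author profile."""
    text = tweet.get("text", "")
    user = tweet.get("user", {})
    return {
        "text_length": len(text),
        "num_exclamations": text.count("!"),
        "num_urls": text.count("http"),
        "followers": user.get("followers_count", 0),
        "is_verified_author": user.get("verified", False),
    }

def classify(features: dict, threshold: float = 0.5) -> str:
    """Toy rule-based scorer: more 'suspicious' cues push towards 'fake'."""
    score = 0.0
    if features["num_exclamations"] >= 3:   # sensational punctuation
        score += 0.3
    if features["num_urls"] == 0:           # no source link given
        score += 0.2
    if features["followers"] < 100:         # low-reach account
        score += 0.2
    if not features["is_verified_author"]:  # unverified profile
        score += 0.1
    return "fake" if score >= threshold else "real"
```

A real system would learn such weights from the development corpus rather than hand-tune them.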
A Typology of Fake: Reposting of Real
• Photos from past events reposted as if associated with the current event
A Typology of Fake: Speculations
• Speculations associating persons or actions with the current event
Ground Truth Generation
• Data (tweet) collection
– Historic (known cases discussed online) using Topsy
– Real-time during major events using the Twitter streaming API
• Tweet set expansion
– Near-duplicate image search plus human inspection were used to increase the number of associated tweets
• Label assignment
– Fake/real labels were manually assigned after consulting online reports that were posted after each event
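One common way to implement the near-duplicate search step is perceptual hashing; the sketch below uses average hashing (aHash) on small grayscale images as an illustration. This is an assumption for illustration, not necessarily the method used to build the corpus.

```python
# Illustrative near-duplicate detection via average hashing (aHash).
# Assumes images have already been downscaled to a small grayscale grid.

def average_hash(pixels):
    """Compute a simple perceptual hash from a 2-D list of intensities
    (e.g. an 8x8 downscaled grayscale image)."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    # Each bit records whether a pixel is brighter than the mean.
    return [1 if p > mean else 0 for p in flat]

def hamming(h1, h2):
    """Number of differing bits; small distances suggest near-duplicates."""
    return sum(a != b for a, b in zip(h1, h2))

def is_near_duplicate(pixels_a, pixels_b, max_dist=5):
    """Flag two images as near-duplicates if their hashes are close."""
    return hamming(average_hash(pixels_a), average_hash(pixels_b)) <= max_dist
```

Candidate matches found this way would then go to the human-inspection step described above.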
Annotation Challenges
• Tweets declaring that the embedded image is fake
• Tweets with obvious manipulations
• All such cases were manually checked and removed from both the development and test sets!
Verification Corpus - Dev
Event Name                fake: #images / #tweets / #users   real: #images / #tweets / #users
Hurricane Sandy           62 / 5,559 / 5,432                 148 / 4,664 / 4,446
Boston Marathon bombing   35 / 189 / 187                     28 / 344 / 310
Sochi Olympics            26 / 274 / 252                     -
MH370 Flight              29 / 501 / 493                     -
Bring Back Our Girls      7 / 131 / 126                      -
Columbian Chemicals       15 / 185 / 87                      -
Passport hoax             2 / 44 / 44                        -
Rock Elephant             1 / 13 / 13                        -
Underwater bedroom        3 / 113 / 112                      -
Livr mobile app           4 / 9 / 9                          -
Pig fish                  1 / 14 / 14                        -
Total                     185 / 7,032 / 6,769                176 / 5,008 / 4,756
Verification Corpus - Test
Event Name          fake: #images / #tweets / #users   real: #images / #tweets / #users
Solar Eclipse       6 / 137 / 135                      4 / 140 / 133
Samurai with girl   4 / 218 / 212                      -
Nepal Earthquake    21 / 356 / 343                     11 / 1,004 / 934
Garissa Attack      2 / 6 / 6                          2 / 73 / 72
Syrian boy          1 / 1,786 / 1,692                  -
Varoufakis          1 / 61 / 59                        -
• Evaluation was based on classic IR/ML measures: Precision, Recall, F-measure (target class: fake)
• Participants were allowed to mark a tweet as “unknown” (expected to result in reduced recall)
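The evaluation described above can be sketched as follows. An "unknown" prediction can never count as a true positive, which is why allowing it is expected to lower recall on the fake class; the function below is a generic precision/recall/F-measure computation, not the task's official scoring script.

```python
# Sketch of the evaluation: precision, recall, and F-measure with "fake"
# as the target class. "unknown" predictions simply fail to match the
# target, so each fake tweet marked unknown becomes a false negative.

def evaluate(true_labels, predicted_labels, target="fake"):
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == target and p == target for t, p in pairs)
    fp = sum(t != target and p == target for t, p in pairs)
    fn = sum(t == target and p != target for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure
```

For example, marking one of three fake tweets as "unknown" leaves precision untouched but drops recall to 2/3.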
Results
Team          Run    Recall   Precision   F-Score
MCG-ICT       run1   0.921    0.964       0.942
              run2   0.922    0.937       0.930
UoS-ITI       run1   0.032    1.000       0.063
              run2   0.017    1.000       0.034
              run3   0.034    1.000       0.065
              run4   0.720    1.000       0.837
CERTH-UNITN   run1   0.794    0.733       0.762
              run2   0.749    0.994       0.854
              run3   0.922    0.736       0.819
              run4   0.798    0.860       0.828
              run5   0.967    0.862       0.911
Results: Examples #1
• All participants failed to classify these examples correctly
• True label: Fake / Predicted: Real
Future Plans
• Move beyond tweets + images
– Blog/news articles
– Public Facebook posts (in pages)
– Other?
• Move beyond the simple fake/real distinction
– Real, but inaccurate
– Messages expressing doubt
– Other?
• Use different evaluation measures
– AUC is probably a better choice, especially when there is class imbalance
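As a minimal sketch of why AUC suits imbalanced data, it can be computed as the Mann-Whitney statistic: the probability that a randomly chosen fake item receives a higher score than a randomly chosen real one, which makes it independent of the class ratio. The function below is an illustrative pairwise implementation, not a proposed official scorer.

```python
# AUC via the Mann-Whitney statistic: fraction of (fake, real) pairs where
# the fake item is scored higher; ties count as half. Because it only
# compares pairs across the two classes, it is unaffected by how many
# items each class contains.

def auc(scores_fake, scores_real):
    """AUC for scores of fake (positive) vs real (negative) items."""
    wins = sum((f > r) + 0.5 * (f == r)
               for f in scores_fake
               for r in scores_real)
    return wins / (len(scores_fake) * len(scores_real))
```

A perfect ranking gives 1.0, a random one about 0.5, regardless of class balance.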
Thank you!
• Code:
https://github.com/MKLab-ITI/image-verification-corpus
• Get in touch:
@sympapadopoulos / [email protected]
@CMpoi / [email protected]