Reading the Correct History? Modeling Temporal Intention in Resource Sharing
Reading the Correct History? Modeling Temporal Intention in
Resource Sharing
Hany SalahEldeen & Michael Nelson Reading the Correct History?
Hany M. SalahEldeen & Michael L. Nelson
Old Dominion University Department of Computer Science
Web Science and Digital Libraries Lab.
Motivation
Possible scenario:
• We share web pages
• Web pages change
• Readers explore shared pages
What I share might not be what my readers read.
A temporal inconsistency can arise in the intention of the author regarding the state of the resource between the
tweet time and the read time…
Can we detect and model this difference in intention?
The game plan
Problem Illustration
Training data collection attempts
The TIRM model
Ground truth validation
Data collection
Feature extraction and modeling
Model evaluation
Example: Obama’s press conference on 14th of Jan 2013
Clicking on the link in the tweet …
Using the Twitter expanded interface
The attack on the embassy was in February 2013
Problem: There is an inconsistency between what the tweet’s author intended to share at time t_tweet and what the reader might actually read upon clicking on the link at time t_click.
Implication: Since tweets are considered the first draft of history… the historical
integrity of the tweets could be compromised.
Solution: Detect the correct intention
Option 1 Option 2 Option 3
The game plan
Problem Illustration
Training data collection attempts
The TIRM model
Ground truth validation
Data collection
Feature extraction and modeling
Model evaluation
Amazon’s Mechanical Turk (MT)
• Crowdsourcing Internet marketplace
• Coordinates the use of human intelligence to perform tasks that computers are currently unable to do.*
* http://en.wikipedia.org/wiki/Amazon_Mechanical_Turk
Goal: Collect user intention data via MT
[Figure: tweets dataset → intention classification tasks → user intention data → train classifier]
• Problem:
– It is not as easy as it seems!
How not to classify temporal intention 101
• Given a tweet, is the intended state of the link its:
– Past state?
– Current state?
– No information?
Ground truth collection
• A dataset of 100 tweets classified by:
– Our Web Science and Digital Libraries (WS-DL) research group members
– MT workers
The agreement was very low…
• Reliability of agreement among:
– WS-DL members: Fleiss’ κ = 0.14
– MT workers: Fleiss’ κ = 0.07
• Inter-rater agreement between the collective WS-DL members and MT workers: Cohen’s κ = 0.04
Slight agreement
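For readers unfamiliar with the statistic, the following is a minimal sketch of how Fleiss' κ is computed from a table of rating counts. The function name and the input layout (one row per tweet, one column per category such as past / current / no information) are assumptions made for illustration; the slides do not say which tool produced the reported values.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of
    raters who assigned item i to category j.

    Every item must have the same number of ratings. The result is
    undefined (division by zero) if all ratings fall into one category.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number of ratings")
    n_categories = len(counts[0])
    total_ratings = n_items * n_raters

    # Share of all ratings that fell into each category.
    p_category = [
        sum(row[j] for row in counts) / total_ratings
        for j in range(n_categories)
    ]
    # Per-item agreement: fraction of rater pairs that agree on the item.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    observed = sum(p_item) / n_items
    expected = sum(p * p for p in p_category)
    return (observed - expected) / (1 - expected)
```

A value near 0 means the raters agree about as often as chance would predict, which is the situation the numbers above describe.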
So we removed the guessing part:
• The tweet is presented along with the two snapshots: one at t_tweet and one at t_click
… and classified the 100 tweets again
• Via a face-to-face meeting with WS-DL members.
• Resubmitted the new experiment to MT.
The tweet, current and past snapshots
Past Version Current Version
The results remained very low
• For 9 MT assignments per tweet:
– If we allowed 4-5 splits, we got a 58% match with WS-DL.
– If we allowed 3-6 splits or better, we got a 31% match.
Which is worse than flipping a coin!
Observations
• Assigning a temporal intention is not a trivial task.
• MT workers are accustomed to more straightforward tasks.
• The concept of “time on the web” is foreign to MT workers.
The game plan
Problem Illustration
Training data collection attempts
The TIRM model
Ground truth validation
Data collection
Feature extraction and modeling
Model evaluation
Idea: We need to transform the problem from intention to
relevance.
Relevance tasks are simpler
• MT workers are more accustomed to classification tasks, which require minimal explanation
Is that a cat?
- Yes
- No
Temporal Intention Relevancy Model (TIRM)
Between t_tweet and t_click:
The linked resource could have:
• Changed
• Not changed
The tweet and the linked resource could be:
• Still relevant
• No longer relevant
Resource is changed but relevant
• The resource changed
• But it is still relevant
Intention: need the current version of the resource at any time
Resource is changed and not relevant
• The resource changed
• And it is no longer relevant
Intention: need the past version of the resource at any time
Resource is not changed and relevant
• The resource is not changed
• And it is relevant
Intention: need the past version of the resource at any time
Resource is not changed and not relevant
• The resource is not changed
• But it is not relevant
Intention: I am not sure which version of the resource I need
Relevancy and Intention Mapping
                Relevant   Not relevant
Changed         Current    Past
Not changed     Past       Not Sure
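The four cases above amount to a small lookup from (changed, relevant) to an intended version. A minimal sketch follows; the enum and function names are hypothetical and are not taken from the authors' code.

```python
from enum import Enum


class Intention(Enum):
    CURRENT = "current"
    PAST = "past"
    NOT_SURE = "not sure"


def tirm_intention(changed: bool, relevant: bool) -> Intention:
    """Map the two TIRM observations to the intended resource version,
    following the four cases listed on the slides."""
    if changed:
        # Changed but still relevant: the author wants the live page.
        # Changed and no longer relevant: the author wanted the old state.
        return Intention.CURRENT if relevant else Intention.PAST
    # Unchanged and relevant is mapped to the past version on the slides;
    # unchanged and not relevant cannot be decided.
    return Intention.PAST if relevant else Intention.NOT_SURE
```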
The game plan
Problem Illustration
Training data collection attempts
The TIRM model
Ground truth validation
Data collection
Feature extraction and modeling
Model evaluation
Next step: validation
• Goal: MT workers’ judgments ≡ judgments of the experts (WS-DL members)
• Question asked: Is the content still relevant to the tweet?
Filtering the results
• We accepted raters with:
– At least 1000 accepted HITs
– 95% acceptance rate
• Average completion time = 61 seconds
• We removed:
– Any assignments that took <10 seconds (hasty decisions)
– Low-quality repetitive assignments, and banned their raters
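The filtering rules above can be sketched as a simple predicate over assignments. The `Assignment` record and its field names are hypothetical, and reading "95% acceptance rate" as "at least 95%" is an assumption; the list of banned workers would come from manual review of low-quality, repetitive answers.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List

MIN_ACCEPTED_HITS = 1000      # rater qualification from the slides
MIN_ACCEPTANCE_RATE = 0.95    # assumed to mean "at least 95%"
MIN_SECONDS = 10.0            # anything faster is treated as a hasty decision


@dataclass(frozen=True)
class Assignment:
    worker_id: str
    accepted_hits: int
    acceptance_rate: float    # between 0.0 and 1.0
    seconds_taken: float
    relevant: bool            # the worker's answer


def filter_assignments(
    assignments: Iterable[Assignment],
    banned_workers: FrozenSet[str] = frozenset(),
) -> List[Assignment]:
    """Keep only assignments from qualified, non-banned raters that
    were not completed suspiciously fast."""
    kept = []
    for a in assignments:
        if a.worker_id in banned_workers:
            continue
        if a.accepted_hits < MIN_ACCEPTED_HITS:
            continue
        if a.acceptance_rate < MIN_ACCEPTANCE_RATE:
            continue
        if a.seconds_taken < MIN_SECONDS:
            continue
        kept.append(a)
    return kept
```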
Mechanical Turk Workers Vs. Experts
• For 100 tweets, % of agreement between MT workers and WS-DL members:
Agreement in three or more votes   93%
Agreement in four or more votes    80%
Agreement with all five votes      60%
• Cohen’s κ = 0.854 (almost perfect agreement)
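As a reference for the statistic quoted here, this is a minimal sketch of Cohen's κ for two label sequences, for example the experts' labels and the MT majority labels for the same tweets. The function name and input format are assumptions for illustration.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters labelling the same items.

    Undefined (division by zero) when chance agreement is exactly 1,
    i.e. both raters always use the same single label.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(rater_a)
    # Fraction of items on which the two raters give the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected if each rater labelled independently
    # according to their own label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)
```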
The game plan
Problem Illustration
Training data collection attempts
The TIRM model
Ground truth validation
Data collection
Feature extraction and modeling
Model evaluation
Data collection
• From the SNAP dataset we extracted:
– Tweets in English
– Each has an embedded URI pointing to an external resource.
– The embedded URI is shortened via Bit.ly
– The external resource:
• Still persists.
• Has at least 10 mementos.
• Is unique.
We extracted 5,937 unique instances
Get the closest memento
[Figure: timeline of mementos t1 … tn around the tweet time; since Δ1 < Δ2, pick the memento at t1]
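Picking the closest memento is a nearest-neighbour search over capture times. A minimal sketch, assuming the memento datetimes have already been obtained (for example from a TimeMap); the function name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple


def closest_memento(
    tweet_time: datetime, memento_times: Iterable[datetime]
) -> Tuple[datetime, timedelta]:
    """Return the memento datetime nearest to the tweet time, before or
    after it, together with the absolute time delta.

    Raises ValueError if there are no mementos. On a tie, the first
    candidate in iteration order wins.
    """
    best = min(memento_times, key=lambda t: abs(t - tweet_time))
    return best, abs(best - tweet_time)
```

For example, with a tweet at noon and captures two hours earlier and one day later, the earlier capture is selected.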
Sorted Time Delta between tweet and closest memento
• Randomly selected 1,124 instances
• Time delta range: 3.07 minutes to 56.04 hours
• Average: 25.79 hours (~1 day)
[Figure: sorted time deltas, with mementos captured before and after the tweet time]
Training dataset
• R_current: The state of the resource at current time.
• R_click: The state of the resource at click time.
Assignments                                     Count   %
Relevant assignments                            929     82.65%
Non-relevant assignments                        195     17.35%
5 MT workers agreeing (5-0 split)               589     52.40%
4 MT workers agreeing (4-1 split)               309     27.49%
3 MT workers agreeing (3-2 close call split)    226     20.11%
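The split categories in this table come from aggregating five yes/no votes per tweet. A minimal sketch of that aggregation, assuming an odd number of boolean votes per instance; the function names are hypothetical.

```python
from typing import Sequence, Tuple


def aggregate_votes(votes: Sequence[bool]) -> Tuple[bool, str]:
    """Return the majority label (True means relevant) and the split,
    written as 'majority-minority', e.g. '4-1'.

    Assumes an odd number of votes so that there is always a majority.
    """
    if len(votes) % 2 == 0:
        raise ValueError("expected an odd number of votes")
    yes = sum(votes)
    no = len(votes) - yes
    return yes > no, f"{max(yes, no)}-{min(yes, no)}"


def is_close_call(votes: Sequence[bool]) -> bool:
    """A close call is a decision carried by a single vote (3-2 of five)."""
    yes = sum(votes)
    return abs(yes - (len(votes) - yes)) == 1
```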
The game plan
Problem Illustration
Training data collection attempts
The TIRM model
Ground truth validation
Data collection
Feature extraction and modeling
Model evaluation
Feature extraction
• For each tweet we perform:
– Link analysis
– Social Media Mining
– Archival Existence
– Sentiment Analysis
– Content Similarity
– Entity Identification
Link analysis
• Since the tweets have embedded resources shortened by Bit.ly, we can extract:
– Total number of clicks
– Hourly click logs
– Creation dates
– Referring websites
– Referring countries
• We calculate the depth of the resource in relation to its domain (whether it is a leaf node or a root page):
– We count the number of slashes in the resource’s URI
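One way to turn the slash count into a depth feature is sketched below. Counting non-empty path segments rather than raw slash characters is an assumption made here so that `http://example.com` and `http://example.com/` both score 0; the slides do not specify the exact counting rule.

```python
from urllib.parse import urlparse


def uri_depth(uri: str) -> int:
    """Depth of a resource below its domain root: 0 for a root page,
    larger values for pages deeper in the site."""
    path = urlparse(uri).path
    # Empty segments come from leading, trailing, or doubled slashes.
    return len([segment for segment in path.split("/") if segment])
```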
Social Media Mining
• Twitter:
– Using Topsy.com’s API to extract:
• Total number of tweets.
• The most recent 500.
• Number of tweets by influential users.
The collection of extracted tweets provides an extended context for the resource, authored by users in the twittersphere.
Social Media Mining
• Facebook:
– Also mined for likes, shares, posts, and clicks related to each resource.
Archival Existence
• Using Memento TimeMaps we get:
– Total mementos available.
– Number of different archives.
– The closest archived version to the tweet time.
Sentiment Analysis
• Using the NLTK libraries for natural language text processing
• Extract the most prominent sentiment in the text
Content Similarity
• Steps:
– We download the HTML content using the Lynx browser.
– We apply a boilerplate removal algorithm and full-text extraction.
– We calculate the cosine similarity between the two pages.
Example: 70% similarity
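The last step can be sketched as cosine similarity over raw term-frequency vectors. The tokenizer (lowercased word characters) and the absence of stop-word removal or TF-IDF weighting are assumptions; the slides do not state how the authors built their vectors, and the input is assumed to be text that has already had boilerplate removed.

```python
import math
import re
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between the term-frequency vectors of two
    texts; 0.0 if either text has no tokens."""
    vec_a = Counter(re.findall(r"\w+", text_a.lower()))
    vec_b = Counter(re.findall(r"\w+", text_b.lower()))
    # Only terms present in both texts contribute to the dot product.
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```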
Entity Identification
• By visual inspection we observed that the majority of tweets about celebrities are related to current events.
• We harvested Wikipedia for lists of actors, politicians, and athletes.
• We checked for the existence of a celebrity mention in the tweets.
Example: Actor: Johnny Depp
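The mention check can be sketched as a case-insensitive whole-name match against a list of names. The matching rule and the function name are assumptions; the slides only say that the existence of a mention was checked against lists harvested from Wikipedia.

```python
import re
from typing import Iterable


def mentions_celebrity(tweet: str, names: Iterable[str]) -> bool:
    """True if any full name from the list appears in the tweet as a
    whole phrase, ignoring case."""
    text = tweet.lower()
    for name in names:
        # Word boundaries stop a name from matching inside a longer word.
        pattern = r"\b" + re.escape(name.lower()) + r"\b"
        if re.search(pattern, text):
            return True
    return False
```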
Modeling and Classification
• To remove confusion, we removed the close calls (the 226 instances with a 3-2 split).
• 1,124 − 226 = 898 instances remaining.
The trained classifier
• From the feature extraction phase we extracted 39 different features to train the classifier.
• Using 10-fold cross-validation, the cost-sensitive classifier based on Random Forests gave the highest success rate: 90.32%
Testing the model
10-Fold Cross-Validation Testing
Classifier: Cost-sensitive classifier based on Random Forest
Mean Absolute Error        0.15
Root Mean Squared Error    0.27
Kappa Statistic            0.39
Incorrectly Classified %   9.68%
Correctly Classified %     90.32%
Class              Precision   Recall   F-measure
Relevant           0.93        0.96     0.95
Non-Relevant       0.53        0.37     0.44
Weighted Average   0.89        0.90     0.90
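The F-measure column is the harmonic mean of precision and recall. The short sketch below recomputes it from the rounded precision and recall in the table; because those inputs are already rounded to two decimals, the results agree with the reported values only to within rounding.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Per-class values as printed in the table above.
f_relevant = f_measure(0.93, 0.96)
f_non_relevant = f_measure(0.53, 0.37)
```

The much lower score on the Non-Relevant class is consistent with the class imbalance in the training data.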
Feature significance
• Since we have 39 features, we needed to understand the effect of each feature and which ones affect the classification most strongly.
• We applied a supervised attribute-evaluation algorithm with Ranker search to find the strongest features.
Most significant features sorted by information gain
Rank   Feature                               Gain Ratio
1      Existence of celebrities in tweets    0.149
2      Number of mementos                    0.090
3      Tweet similarity with current page    0.071
4      Similarity: Current & past page       0.0527
5      Similarity: Tweet & past page         0.04401
6      Original URI’s depth                  0.0324
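For reference, gain ratio is information gain normalised by the entropy of the feature's own value distribution. The sketch below shows the textbook definition for a discrete feature; it is not the authors' tooling, and numeric features such as similarity scores would need to be discretised first (an assumption, since the slides do not describe that step).

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy, in bits, of the empirical distribution."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Information gain of a discrete feature about the labels, divided
    by the entropy of the feature itself (its split information)."""
    n = len(labels)
    # Group the class labels by feature value.
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature)
    # A constant feature carries no information and has zero split info.
    return info_gain / split_info if split_info > 0 else 0.0
```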
The game plan
Problem Illustration
Training data collection attempts
The TIRM model
Ground truth validation
Data collection
Feature extraction and modeling
Model evaluation
Model Evaluation
• The next step was to test the trained model against other datasets and examine the results.
• We tested against:
– The remaining 4,813 of the original 5,937 instances after extracting the 1,124 used in training.
– The tweet collections based on historic events (MJ, Obama, Iran, Syria, & H1N1).
Results of testing the model against multiple datasets
Dataset                    Status 200   Status 404 or other   Relevant %   Non-Relevant %
Extended 4,813 instances   96.77%       3.23%                 96.74%       3.26%
MJ’s Death                 57.54%       42.46%                93.24%       6.76%
H1N1 Outbreak              8.96%        91.04%                97.48%       2.52%
Iran Elections             68.21%       31.79%                94.69%       5.31%
Obama’s Nobel Prize        62.86%       37.14%                93.89%       6.11%
Syrian Uprising            80.80%       19.20%                70.26%       29.75%
Recap…
Idea: We need to transform the problem from intention to relevance.
Now we need to transform it back!
Mapping TIRM
• We used 70% similarity as a threshold of relevancy.
Conclusions
• TIRM successfully transforms the temporal intention problem into a temporal relevancy problem.
• Temporal relevancy is easier to solve, and MT workers provide almost perfect agreement with experts’ opinions.
• We successfully collected a gold standard dataset of temporal user intention.
• We found a temporal inconsistency in the shared resource ranging from <1% to 25%, depending on the dataset.