On the Enrichment of a RDF Repository of City Points of Interest based on Social Data
Zied Sellami*, Gianluca Quercini**, Chantal Reynaud*
*IASI Team, Université Paris-Sud 11, France
{sellami, reynaud}@lri.fr
**E3S Team, Supélec, France
WOD’2013 - Paris - 3rd June, 2013
Outline
1. Introduction and Related Issues
2. Reconciliation of POI Data and Social Data
3. Enrichment based on Opinion Mining
4. Experiments and Results
5. Conclusion and Future Work
Introduction and Related Issues
Points of interest (POIs): geographic locations
  Restaurants, museums, hotels, theatres, landmarks, etc.
Formalized as an RDF repository in the context of the DataBridges project (Quercini et al., 2012)
A POI is described by facets (or attributes): name, type, category, address, longitude and latitude.
Example of a POI: Louvre Museum
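As a minimal sketch of the facet list above, a POI can be held in a small record type. The class name, field names and all values below are illustrative assumptions; they are not taken from the DataBridges repository, and the coordinates are approximate.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class POI:
    """One point of interest with the six facets listed on the slide."""
    name: str
    poi_type: str      # hypothetical field name for the "type" facet
    category: str
    address: str
    longitude: float   # decimal degrees
    latitude: float    # decimal degrees


# Illustrative values only (assumption, not data from the repository).
louvre = POI(
    name="Louvre Museum",
    poi_type="Museum",
    category="Art museum",
    address="Rue de Rivoli, 75001 Paris, France",
    longitude=2.3376,
    latitude=48.8606,
)

# Each facet becomes one key/value pair, close to one RDF property per facet.
print(asdict(louvre))
```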
Introduction and Related Issues
POIs are obtained automatically by extracting data from Google Fusion Tables (GFT) (Quercini et al., 2012)
  Some extracted POIs contain few attributes
  Some extracted POIs have imprecise attributes (incomplete address, imprecise geographic location)
  The extracted POIs lack valuable information (user reviews, official website, e-mail, etc.)
Enrich and correct POIs
  Additional elements: phone number, e-mail, official website, etc.
  Useful information for potential visitors (good and bad aspects of the place)
Enrich POIs using what?
  Using Social Networking Systems (social data)
Matching POIs Across Social Networks
Accessing and searching social Web pages concerning POIs
1. Yelp (http://www.yelp.com/)
  Social networking site for retrieving and reviewing POIs
2. Foursquare (https://foursquare.com/)
  Application combining geolocation and social guidance
Similar search method
  Input: name and geographic position
  Output: list of Web pages of POIs related to the geographic position and to the words included in the query
Filtering the list to keep only the relevant Web pages
Matching POIs Across Social Networks
Selecting the appropriate Web pages for a POI
  Computing a similarity value
Several parameters can be used
  Name
  Address
  Category
  Longitude and latitude
Definition of a similarity formula
Matching POIs Across Social Networks: Similarity Measure
Two parameters used: name; longitude and latitude
  Different social data sources describe categories in different ways
    Eiffel Tower (Monument, Garden, etc.; Landmarks, Historical Building)
  Social data is uncontrolled: address strings are incomplete or wrong
    O Pelicano (Portugal); Restaurante O Requinte (Portugal); etc.
String techniques for name pruning and name comparison
  Stemming with the Porter algorithm; stop-word lists
  Levenshtein distance and Jaccard distance
Filtering results by geographic proximity
  Computing the geographic distance between the POI and the Web page from their longitude and latitude
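The slides do not say how the geographic distance is computed. A common choice, assumed here, is the haversine great-circle distance; the function name and the sample coordinates (approximate positions of the Louvre and the Eiffel Tower) are illustrative.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Approximate coordinates (assumption): Louvre vs. Eiffel Tower, roughly 3 km apart.
distance = haversine_m(48.8606, 2.3376, 48.8584, 2.2945)
print(f"{distance:.0f} m")
```

A candidate Web page whose coordinates lie several kilometres from the POI can then be discarded even when its name looks similar.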
Matching POIs Across Social Networks: Similarity Measure
Similarity measure
  WP(x).name: name of an entity x in a Web page
  p.name: name of a POI
Combination of Levenshtein and Jaccard
  Boosts the similarity score between names that use the same words, even in a different order
  Example: Museum of Louvre; Louvre Museum
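The similarity formula itself is not part of this transcript, so the sketch below only shows the two ingredients on the slide's own example and makes no claim about how the authors combine them. The stop-word list is a tiny illustrative assumption, and Porter stemming is omitted to keep the example dependency-free.

```python
import re

STOP_WORDS = {"of", "the", "a", "an"}  # tiny illustrative list (assumption)


def tokens(name):
    """Lowercase the name, split it into words and drop stop words."""
    return [w for w in re.findall(r"\w+", name.lower()) if w not in STOP_WORDS]


def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def levenshtein_sim(a, b):
    """Edit distance rescaled to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def jaccard_sim(a_tokens, b_tokens):
    """Share of distinct words the two names have in common."""
    sa, sb = set(a_tokens), set(b_tokens)
    return 1.0 if not (sa or sb) else len(sa & sb) / len(sa | sb)


a, b = tokens("Museum of Louvre"), tokens("Louvre Museum")
print(levenshtein_sim(" ".join(a), " ".join(b)))  # low: character order differs
print(jaccard_sim(a, b))                          # 1.0: same set of words
```

Levenshtein penalises the reordered words heavily, while Jaccard ignores word order entirely, which is why a combination of the two can reward names such as these.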
Matching POIs Across Social Networks: Filtering Measure
Filtering measure
  δ1 and δ2: name similarity thresholds
  distmax: distance threshold
Threshold values fixed after some experiments
  δ1 = 0.9 and δ2 = 0.7
  distmax = 1000 meters
Opinion Mining
Evaluation of the POI from reviews and comments
  Rating: Good, Very Good, Bad, Very Bad, etc.
  Useful information for a potential visitor:
    What is interesting? (food, ambiance, place, etc.)
    What is to be avoided? (drink, person, etc.)
Going further than conventional sentiment analysis
  Tweet classification (positive, negative or undetermined) (Pak and Paroubek, 2010)
    http://smm.streamcrab.com/
    http://www.sentiment140.com/
Linguistic approach for opinion mining
Opinion Mining: Principle
Identification of positive and negative expressions
  Using verbs and adjectives (Chesley et al., 2006) (Moghaddam and Popowich, 2010) (Li et al., 2012)
  Example: Great food, not good place, I like the place, etc.
Generating a lexicon of positive and negative verbs and adjectives
  Processing a lexicon of positive and negative words with TreeTagger
    http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html
  Positive adjectives (1467) / Negative adjectives (1609)
  Positive verbs (421) / Negative verbs (1243)
Opinion Mining: Phrase Extraction
Definition of lexico-syntactic patterns to identify pertinent expressions
Expressions describing objects
1. (NOT)* ADJ OBJECT (Great food, not interesting place, etc.)
2. OBJECT BE ADJ (Sandwich is good, restaurant is nice, etc.)
Expressions describing sentiments or advice
1. ITS ADJ (it’s interesting, it’s happy, etc.)
2. I FEEL OR SUGGEST OBJECT (I like this place, I advise you to test the hotel, etc.)
3. I FEEL (NOT)* ADJ (I feel happy, I feel very hungry, etc.)
Implementation with Java Regex
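The two object patterns above can be sketched with regular expressions. The authors' tool uses Java regex; the Python version below is only an approximation under stated assumptions: it matches raw words instead of TreeTagger output, the adjective lexicon is a tiny hypothetical sample, and all names (`ADJ_OBJECT`, `OBJECT_BE_ADJ`, `extract`) are illustrative.

```python
import re

# Tiny hypothetical lexicon; the real one has thousands of entries.
POSITIVE = {"great", "good", "nice", "interesting", "beautiful"}
NEGATIVE = {"bad", "slow", "crowded", "grumpy"}
ADJ = "|".join(sorted(POSITIVE | NEGATIVE))

# Pattern 1: (NOT)* ADJ OBJECT, e.g. "not interesting place"
ADJ_OBJECT = re.compile(
    rf"\b(?P<neg>not\s+)?(?P<adj>{ADJ})\s+(?P<obj>[a-z]+)\b", re.IGNORECASE)
# Pattern 2: OBJECT BE ADJ, e.g. "sandwich is good"
OBJECT_BE_ADJ = re.compile(
    rf"\b(?P<obj>[a-z]+)\s+(?:is|are|was|were)\s+(?P<neg>not\s+)?(?P<adj>{ADJ})\b",
    re.IGNORECASE)


def extract(text):
    """Return (object, adjective, polarity) triples; polarity is +1 or -1."""
    found = []
    for pattern in (ADJ_OBJECT, OBJECT_BE_ADJ):
        for m in pattern.finditer(text):
            adj = m.group("adj").lower()
            polarity = 1 if adj in POSITIVE else -1
            if m.group("neg"):  # a preceding "not" flips the polarity
                polarity = -polarity
            found.append((m.group("obj").lower(), adj, polarity))
    return found


review = "Great food, not interesting place. The sandwich is good. The elevator was slow."
for triple in extract(review):
    print(triple)
```

Without part-of-speech tags, the word after an adjective is not guaranteed to be a noun, so a sentence such as "good and cheap" would yield a spurious object. That is the kind of error the tagging step is there to prevent.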
Repository Enrichment: Rating of a POI
Rating measure
Scale for rating a POI
Very bad   Bad   Medium   Undetermined   Fairly Good   Very Good
-10   -6.6   -3.3   0   3.3   6.6   10
Repository Enrichment: Identifying Useful Information
1. General assessment
  Expressions describing sentiments
  Expressions describing objects that refer to the place, the name of the POI, or one of the POI's categories
2. Tips
  Expressions describing advice
3. Specific ideas
  Expressions describing objects other than the place, name or category of the POI
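The three groups above amount to a small decision rule over extracted expressions. The sketch below is one possible reading of it; the function name, the `kind` labels and the generic place-word set are assumptions introduced for illustration.

```python
PLACE_WORDS = {"place"}  # generic words standing for the POI itself (assumption)


def classify(kind, obj, poi_name, poi_categories):
    """Assign an extracted expression to one of the three groups.

    kind: which pattern family matched: "sentiment", "advice" or "object".
    obj:  the object word of the expression (ignored for sentiments and advice).
    """
    if kind == "advice":
        return "tips"
    if kind == "sentiment":
        return "general assessment"
    general_targets = (PLACE_WORDS
                       | {poi_name.lower()}
                       | {c.lower() for c in poi_categories})
    if obj.lower() in general_targets:
        return "general assessment"
    return "specific ideas"


# "beautiful place" and "good museum" describe the POI as a whole;
# "contemporary art" describes something inside it.
print(classify("object", "place", "Louvre Museum", ["museum"]))
print(classify("object", "museum", "Louvre Museum", ["museum"]))
print(classify("object", "art", "Louvre Museum", ["museum"]))
```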
Evaluation of the Similarity Measure
Dataset: 600 POIs compared with Foursquare data
  Comparing our formula with Levenshtein and Jaccard
The combination of Levenshtein and Jaccard improves the similarity precision
Our formula and Levenshtein have the same F-measure
  Precision is the more important parameter
Formula              Precision   Recall   F-measure
Name + Levenshtein   0.84        0.68     0.75
Name + Jaccard       0.85        0.66     0.74
Our formula          0.86        0.66     0.75
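The F-measure column is consistent with the balanced F1 score, the harmonic mean of precision and recall. The slide does not state which variant was used, so F1 is an assumption here, but it reproduces all three reported values from the precision and recall columns.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the table above.
rows = {
    "Name + Levenshtein": (0.84, 0.68),
    "Name + Jaccard": (0.85, 0.66),
    "Our formula": (0.86, 0.66),
}

for name, (p, r) in rows.items():
    print(f"{name}: {f_measure(p, r):.2f}")
```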
Evaluation of the Opinion Mining Approach
40 Yelp reviews of Louvre Museum and Eiffel Tower
  Louvre Museum rating: Very Good (7.23)
  Eiffel Tower rating: Very Good (8.58)
Louvre Museum
  General assessment:
    Positive [magnificent place, beautiful place, good museum, prestigious museum]
    Negative [crowded place, hard museum, uncomfortable museum]
  Tips: go basement, visit basement, not use pyramid entrance
  Specific ideas:
    Positive [contemporary art, contemporary sculpture, original decor, real mummy]
    Negative [sketchy people, strange marble sculpture, massive crowd, grumpy folk]
Eiffel Tower
  General assessment:
    Positive [great place, funny place, beautiful monument]
    Negative []
  Tips: Go top
  Specific ideas:
    Positive [good view, panoramic view, light show]
    Negative [slow elevator, crazy line, illegal Eiffel tower souvenir]
Evaluation of the Opinion Mining Approach
Comparison with sentiment140 (statistical approach)
  Analysis of 20 tweets concerning Louvre Museum and 14 tweets concerning Eiffel Tower

Polarity of Louvre Museum tweets
               sentiment140   Our approach
Positive       13             10
Negative       2              0
Undetermined   5              10

Polarity of Eiffel Tower tweets
               sentiment140   Our approach
Positive       11             10
Negative       1              1
Undetermined   2              3

Results are not contradictory
  Our approach identified 3 sentiments that were not identified by sentiment140
2 tweets analyzed differently
  Our approach identified the correct polarity
Conclusion
Original approach for POI data enrichment
  Definition of a similarity formula to compare POI data
  Linguistic approach to analyze reviews and comments
  Complete tool implemented in Java
Experiments show promising results
  About 86% similarity precision
  Linguistic approach able to precisely identify positive and negative aspects of the POI
Future Work
1. Similarity measure optimisation
  Compare selected Web pages for the POI
2. Filtering positive and negative expressions
  Using metrics like frequency
3. Learning new positive and negative verbs and adjectives
  Using SentiWordNet (Baccianella et al., 2010)
4. Using adverbs in the opinion mining approach (Benamara et al., 2007)
  "Very good food" is stronger than "Good food"