A. Both – KNOW LOD workshop, ESWC 2015 — Portoroz, 2015-05-31 Slide 1
Computing Geo-Spatial Motives from Linked Data for Search-driven Applications
Andreas Both, Liliya Avdiyenko, Christiane Lemke (Unister GmbH)
Unister Group: Multi-brand strategy
eCommerce segments
travel
comparison
ventures
R&D Department
Making web and data technologies useful, focus: B2C search
making the web an exploratory place for geospatial data
GeoKnow in a nutshell
What
Bring geospatial knowledge integration to the Linked Data Web
Billion-triple geospatial reasoning and data provenance
Qualitative interlinking and fusing of geospatial and semantic information
Adaptive geospatial exploration, authoring and curation
Why
Unlock isolated islands of geographic information
80% of all data has some spatial dimension, most of it is not processable today
Geographic data authoring with millions of users requires powerful tools
For whom
Added value for the companies and the Linked Data Web community
Cost-effective data integration for SMEs
Enterprises can add value to their data with volunteered geographic information
Users from the travel industry will benefit from more background information
GeoKnow Consortium
Example
winter holidays with culture
Possible Interpretation
winter holidays: places in the mountains with at least three ski lifts
culture: places that offer cultural points of interest
Problem Description
User's Perspective
data needs to be accessible
B2C context → need easy-to-use interface
Linked Data Perspective
many data sets are available
data sets are interlinked → many features for training
Linked Data Life Cycle
Research Questions
Geospatial Regions of Interest
A motive can be defined as the reason for a search as well as particular conditions, such as how much a product should cost.
Research Questions
Can one use machine learning techniques on features of linked data for computing motives?
Is it beneficial to aggregate data from different sources?
Data Sets (Geo-spatial Entities)
DBpedia
collections of entities with geospatial position
Natural Earth
curated populated places
polygons for countries
GeoNames
attributes for geospatial entities
Note: Data was interlinked already (manually curated).
Approach
Compute Regions of Interest
1. extract related entities
2. define training data by experts for the given motive
3. use training set to learn relevant places
→ check in case study
Case Study
Educational Regions in Central Europe
regional character → analyses per country
Case Study: Education
Process: 3 Steps
1. extract related entities
extract all entities of DBpedia with type http://dbpedia.org/ontology/EducationalInstitution
2. define training data by experts for the given motive
three experts
question: Is the considered place important w.r.t. education in this country?
3. use training set to learn relevant places
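The entity extraction in step 1 could be expressed as a SPARQL query. The following is a minimal sketch that only builds the query text and does not send it anywhere. The function name build_entity_query, the paging parameters, and the use of the W3C WGS84 vocabulary for coordinates are assumptions for illustration; the slides do not state how the extraction was actually implemented.

```python
# Sketch only: builds (but does not send) a SPARQL query for step 1.
# The type URI comes from the slide; everything else is an assumption.
EDUCATIONAL_INSTITUTION = "http://dbpedia.org/ontology/EducationalInstitution"
WGS84 = "http://www.w3.org/2003/01/geo/wgs84_pos#"  # assumed coordinate vocabulary


def build_entity_query(type_uri: str, limit: int = 1000, offset: int = 0) -> str:
    """Return a SPARQL SELECT query for typed entities that carry coordinates."""
    if limit <= 0 or offset < 0:
        raise ValueError("limit must be positive and offset non-negative")
    return (
        f"PREFIX geo: <{WGS84}>\n"
        "SELECT ?entity ?lat ?long WHERE {\n"
        f"  ?entity a <{type_uri}> ;\n"
        "          geo:lat ?lat ;\n"
        "          geo:long ?long .\n"
        "}\n"
        f"LIMIT {limit} OFFSET {offset}"
    )


if __name__ == "__main__":
    # Print the query text so it can be pasted into a SPARQL endpoint UI.
    print(build_entity_query(EDUCATIONAL_INSTITUTION, limit=100))
```

Restricting the result to entities with coordinates is a design choice of this sketch: only positioned entities can later be related to populated places.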
Training
Reminder: Research Questions
Is it beneficial to aggregate data from different sources?
Approach: iterative integration of linked data
1. DBpedia (educational entities) and populated places (Natural Earth)
2. add all features of populated places (Natural Earth)
3. add all features of GeoNames
Note
feature selection after each step
Experimental Settings
Training Data
881 samples of training data
average expert rating normalized from [0, 2] to relevant or irrelevant
Training Process
Weka
200 runs
Breiman’s random forest (cross validation)
Results
Datasets: D1, D2, D3
Features of D1: entities nearby 10 km, entities nearby 25 km, entities nearby 50 km
Features of D2: all features of D1 and max of statese in country, worldcity, pop max, rank max, max areami
Features of D3: all features of D2 and gn pop
Metric      D1    D2    D3
TP rate     0.73  0.79  0.80
FP rate     0.57  0.42  0.41
F-measure   0.71  0.78  0.79
Precision   0.70  0.78  0.79
AUC         0.64  0.82  0.83
Accuracy    0.73  0.79  0.80
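The D1 features count the educational entities near a populated place at three radii. One plausible way to compute such counts is sketched below. It assumes great-circle (haversine) distance on a spherical Earth; the slides do not say how "nearby" was measured, and the names haversine_km and nearby_counts as well as the sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby_counts(place, entities, radii_km=(10, 25, 50)):
    """Count entities within each radius of a place.

    place is a (lat, lon) tuple, entities an iterable of (lat, lon) tuples.
    Returns a dict mapping each radius to the number of entities inside it.
    """
    distances = [haversine_km(place[0], place[1], lat, lon) for lat, lon in entities]
    return {r: sum(d <= r for d in distances) for r in radii_km}


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    place = (51.0, 12.0)
    entities = [(51.0, 12.0), (51.0, 12.2), (51.3, 12.0), (52.0, 12.0)]
    print(nearby_counts(place, entities))
```

The counts are cumulative: an entity within 10 km is also counted at 25 km and 50 km. For large entity sets a spatial index would be preferable to this pairwise scan.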
Conclusion and Future Work
presented an approach for answering common questions
focus: geo-spatial motives in search queries
approach based on machine learning
iterative data integration (linked data)
→ winter holidays with culture
Conclusion and Future Work
Research Questions
Can one use machine learning techniques on features of linked data for computing motives?
approach for deriving regions of interest
geospatial search motives are covered with good quality
however, the approach is time-consuming
Is it beneficial to aggregate data from different sources?
quality increased
even duplicate attributes help
Conclusion and Future Work
Major Open Research tasks
How to reduce the manual effort?
How to handle low quality data sets?
How to compute regions unrelated to populated places?
. . .
Take Away
search-driven applications need to bridge the gap to user intentions
linked data helps in getting the data
geo-spatial motives can be covered
reduction of manual effort is needed
Dr. Andreas Both
Head of Research and Development
Unister GmbH, Leipzig, Germany
+49 341 65050 24496
http://www.unister.de
Metrics
Evaluation metrics for a binary classifier, defined in terms of positives (P), negatives (N), true positives (TP), true negatives (TN) and false positives (FP)
Metric                        Definition
True positive rate (recall)   TPR = TP/P
False positive rate           FPR = FP/N
Precision                     Pr = TP/(TP + FP)
F-measure                     F = 2 * Pr * TPR/(Pr + TPR)
AUC                           area under the curve depicting TPR plotted against FPR
Accuracy                      Acc = (TP + TN)/(P + N)
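As a worked illustration of these definitions, the sketch below computes the threshold-based metrics from confusion-matrix counts. The false positive rate is taken relative to the negatives (FP/N), which is the standard definition. The function name binary_metrics and the example counts are made up for illustration and are not results from the study. AUC is left out because it needs ranked classifier scores rather than counts.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute TPR, FPR, precision, F-measure and accuracy from confusion counts."""
    p = tp + fn  # all actual positives
    n = fp + tn  # all actual negatives
    if p == 0 or n == 0:
        raise ValueError("need at least one positive and one negative sample")
    tpr = tp / p
    fpr = fp / n
    # Precision is undefined when nothing was predicted positive; report 0.0 then.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    denom = precision + tpr
    f_measure = 2 * precision * tpr / denom if denom > 0 else 0.0
    accuracy = (tp + tn) / (p + n)
    return {
        "tpr": tpr,
        "fpr": fpr,
        "precision": precision,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


if __name__ == "__main__":
    # Hypothetical counts: 12 positives and 8 negatives in total.
    for name, value in binary_metrics(tp=8, fp=2, tn=6, fn=4).items():
        print(f"{name}: {value:.3f}")
```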