Computing Geo-Spatial Motives from Linked Data for Search-driven Applications

A. Both – KNOW LOD workshop, ESWC 2015 — Portoroz, 2015-05-31 Slide 1 Computing Geo-Spatial Motives from Linked Data for Search-driven Applications Andreas Both , Liliya Avdiyenko, Christiane Lemke (Unister GmbH)

Unister Group: Multi-brand strategy

eCommerce segments

travel

comparison

ventures

R&D Department

Making web and data technologies useful, focus: B2C search

making the web an exploratory place for geospatial data

GeoKnow in a nutshell

What

Bring geospatial knowledge integration to the Linked Data Web

Billion-triple geospatial reasoning and data provenance

Qualitative interlinking and fusing of geospatial and semantic information

Adaptive geospatial exploration, authoring and curation

Why

Unlock isolated islands of geographic information

80% of all data has some spatial dimension, most of it is not processable today

Geographic data authoring with millions of users requires powerful tools

For whom

Added value for companies and the Linked Data Web community

Cost-effective data integration for SMEs

Enterprises can add value to their data with volunteered geographic information

Users from the travel industry will benefit from more background information

GeoKnow Consortium

Example

winter holidays with culture

Possible Interpretation

winter holidays: places in the mountains with at least three ski lifts

culture: places that offer cultural points of interest
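
One way to read the interpretation above is as a conjunction of two predicates over place records. The following is only a minimal sketch of that reading: the `Place` record and its fields (`in_mountains`, `ski_lifts`, `cultural_pois`) are hypothetical and the sample places are made up; the slides do not describe such a data model.

```python
from dataclasses import dataclass


@dataclass
class Place:
    # Hypothetical record; the field names are illustrative assumptions.
    name: str
    in_mountains: bool
    ski_lifts: int
    cultural_pois: int


def is_winter_holiday_place(place: Place) -> bool:
    # "places in the mountains with at least three ski lifts"
    return place.in_mountains and place.ski_lifts >= 3


def offers_culture(place: Place) -> bool:
    # "places that offer cultural points of interest" (assumed: at least one)
    return place.cultural_pois >= 1


def matches_query(place: Place) -> bool:
    # "winter holidays with culture" read as the conjunction of both motives
    return is_winter_holiday_place(place) and offers_culture(place)


# Made-up sample data, for illustration only.
places = [
    Place("Alpine resort", in_mountains=True, ski_lifts=12, cultural_pois=4),
    Place("Coastal city", in_mountains=False, ski_lifts=0, cultural_pois=9),
    Place("Mountain hamlet", in_mountains=True, ski_lifts=1, cultural_pois=0),
]
matching = [p.name for p in places if matches_query(p)]
```

Hand-written rules like these are shown only to make the example concrete; the approach presented in the talk learns relevant places from expert-labelled training data instead.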

Problem Description

User's Perspective

data needs to be accessible

B2C context → need easy-to-use interface

Linked Data Perspective

many data sets are available

data sets are interlinked → many features for training

Linked Data Life Cycle

Research Questions

Geospatial Regions of Interest

A motive can be defined as the reason for a search as well as particular conditions like how much a product should cost.

Research Questions

Can one use machine learning techniques on features of linked data for computing motives?

Is it beneficial to aggregate data from different sources?

Data Sets (Geo-spatial Entities)

DBpedia

collections of entities with geospatial position

Natural Earth

curated populated places

polygons for countries

GeoNames

attributes for geospatial entities

Note: Data was interlinked already (manually curated).

Approach

Compute Regions of Interest

1. extract related entities

2. define training data by experts for the given motive

3. use training set to learn relevant places

→ check in case study

Case Study

Educational Regions in Central Europe

regional character → analyses per country

Case Study: Education

Process: 3 Steps

1. extract related entities

    extract all entities of DBpedia with type http://dbpedia.org/ontology/EducationalInstitution

2. define training data by experts for the given motive

    3 experts

    question: Is the considered place important w.r.t. education in this country?

3. use training set to learn relevant places
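
Step 1 could, for instance, be expressed as a SPARQL query against DBpedia. The sketch below only builds the query text and a request URL; it does not send anything. It assumes that the entities carry `geo:lat`/`geo:long` coordinates from the W3C WGS84 vocabulary and that the public endpoint at https://dbpedia.org/sparql accepts a `format` parameter. The slides do not say how the extraction was actually performed, and the function names and the `limit` default are illustrative.

```python
from urllib.parse import urlencode

# Public DBpedia SPARQL endpoint (assumed to be the access path; the slides
# only name the DBpedia type that was extracted).
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"
EDU_TYPE = "http://dbpedia.org/ontology/EducationalInstitution"


def build_query(type_uri: str, limit: int = 1000) -> str:
    """Build a SPARQL query for all entities of a type that have coordinates."""
    return f"""
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT ?entity ?lat ?long WHERE {{
  ?entity a <{type_uri}> ;
          geo:lat ?lat ;
          geo:long ?long .
}}
LIMIT {limit}
""".strip()


def build_request_url(query: str) -> str:
    """Encode the query as a GET request URL asking for JSON results."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return DBPEDIA_ENDPOINT + "?" + urlencode(params)


query = build_query(EDU_TYPE)
url = build_request_url(query)
```

A real extraction would have to page through the results (for example with `OFFSET`), since a single query with a fixed `LIMIT` returns only part of the entities.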

Training

Reminder: Research Question

Is it beneficial to aggregate data from different sources?

Approach: iterative integration of linked data

1. DBpedia (educational entities) and populated places (Natural Earth)

2. add all features of populated places (Natural Earth)

3. add all features of GeoNames

Note

feature selection after each step
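
The iterative integration can be pictured as growing a per-place feature table one source at a time, so that each data set contains all features of the previous one. The sketch below assumes the sources are already interlinked through a shared place identifier; the identifiers, feature values and the `integrate` helper are made up for illustration. The feature selection after each step is left out because the slides do not name the method used.

```python
def integrate(base, extra, prefix):
    """Return a new feature table: every row of `base`, extended with the
    features that `extra` holds for the same place, renamed with `prefix`."""
    merged = {}
    for place_id, features in base.items():
        row = dict(features)  # copy, so the earlier data set stays unchanged
        for name, value in extra.get(place_id, {}).items():
            row[prefix + name] = value
        merged[place_id] = row
    return merged


# Toy values, for illustration only.
d1 = {
    "place_a": {"entities_nearby_10km": 3, "entities_nearby_25km": 5},
    "place_b": {"entities_nearby_10km": 0, "entities_nearby_25km": 1},
}
natural_earth = {
    "place_a": {"pop_max": 500000, "worldcity": 0},
    "place_b": {"pop_max": 20000, "worldcity": 0},
}
geonames = {
    "place_a": {"population": 480000},  # no GeoNames match for place_b
}

d2 = integrate(d1, natural_earth, prefix="ne_")  # step 2
d3 = integrate(d2, geonames, prefix="gn_")       # step 3
```

Prefixing the feature names keeps overlapping attributes from different sources (such as two population figures) side by side instead of overwriting one with the other.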

Experimental Settings

Training Data

881 samples of training data

average expert rating (on a scale from 0 to 2) mapped to a binary label: relevant or irrelevant

Training Process

Weka

200 runs

Breiman’s random forest (cross validation)
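
The mapping from expert ratings to binary training labels might look like the sketch below. The slides give the rating range [0, 2] but not the cut-off, so the threshold of 1.0 is purely an assumption, as are the function name and the sample ratings. The classifier itself (Weka's random forest with cross validation) is not reproduced here.

```python
from statistics import mean

# Assumed cut-off: the slides do not state where "relevant" starts.
RELEVANCE_THRESHOLD = 1.0


def binarize(ratings, threshold=RELEVANCE_THRESHOLD):
    """Average the expert ratings (each in [0, 2]) and map them to a label."""
    for rating in ratings:
        if not 0 <= rating <= 2:
            raise ValueError(f"rating out of range [0, 2]: {rating}")
    return "relevant" if mean(ratings) >= threshold else "irrelevant"


# Made-up ratings from three experts per place.
expert_ratings = {
    "place_a": [2, 2, 1],
    "place_b": [0, 1, 0],
    "place_c": [1, 1, 1],
}
labels = {place: binarize(r) for place, r in expert_ratings.items()}
```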

Results

Datasets and features

D1: entities nearby 10km, entities nearby 25km, entities nearby 50km
D2: all features of D1 and max of statese in country, worldcity, pop_max, rank_max, max_areami
D3: all features of D2 and gn_pop

Metric      D1    D2    D3
TP rate     0.73  0.79  0.80
FP rate     0.57  0.42  0.41
F-measure   0.71  0.78  0.79
Precision   0.70  0.78  0.79
AUC         0.64  0.82  0.83
Accuracy    0.73  0.79  0.80
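
The D1 features read like counts of educational entities within a given radius of a populated place. Under that assumption, they could be computed as sketched below, using the haversine great-circle distance on a spherical Earth. The slides do not state the distance measure or whether the features are plain counts, and the coordinates are made up.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby_counts(place, entities, radii_km=(10, 25, 50)):
    """Count the entities within each radius of a place given as (lat, lon)."""
    distances = [haversine_km(place[0], place[1], lat, lon)
                 for lat, lon in entities]
    return {f"entities_nearby_{r}km": sum(d <= r for d in distances)
            for r in radii_km}


# Made-up coordinates: entities roughly 5.6, 22, 44 and 111 km north of the place.
place = (51.0, 12.0)
entities = [(51.05, 12.0), (51.2, 12.0), (51.4, 12.0), (52.0, 12.0)]
features = nearby_counts(place, entities)
```

The counts are cumulative: an entity within 10 km also counts towards the 25 km and 50 km features.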

Conclusion and Future Work

presented an approach for answering common questions

focus: geo-spatial motives in search queries

approach based on machine learning

iterative data integration (linked data)

→ winter holidays with culture

Conclusion and Future Work

Research Questions

Can one use machine learning techniques on features of linked data for computing motives?

approach for deriving regions of interest

geospatial search motives are covered with good quality

however, the approach is time-consuming

Is it beneficial to aggregate data from different sources?

quality increased

even duplicate attributes help

Conclusion and Future Work

Major Open Research tasks

How to reduce the manual effort?

How to handle low quality data sets?

How to compute regions unrelated to populated places?

. . .

Take Away

search-driven applications need to bridge the gap to user intentions

linked data helps in getting the data

geo-spatial motives can be covered

reduction of manual effort is needed

Dr. Andreas Both

Head of Research and Development

Unister GmbH, Leipzig, Germany

[email protected]

+49 341 65050 24496

http://www.unister.de

Metrics

Evaluation metrics for a binary classifier, defined in terms of positives (P), negatives (N), true positives (TP), true negatives (TN) and false positives (FP):

Metric                       Definition
True positive rate (recall)  TPR = TP/P
False positive rate          FPR = FP/N
Precision                    Pr = TP/(TP + FP)
F-measure                    F = 2 * Pr * TPR/(Pr + TPR)
AUC                          area under the curve of TPR plotted against FPR
Accuracy                     Acc = (TP + TN)/(P + N)
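
As a worked example, the count-based metrics can be computed directly from a confusion matrix. The sketch below uses the standard definition of the false positive rate, FP/N, with P = TP + FN and N = FP + TN. The function name and the counts are illustrative and not taken from the experiments. AUC is omitted because it needs classifier scores rather than counts.

```python
def binary_metrics(tp, fn, fp, tn):
    """Compute the count-based metrics from confusion-matrix counts.

    Assumes P > 0, N > 0 and TP + FP > 0; no zero-division handling.
    """
    p = tp + fn  # all actual positives
    n = fp + tn  # all actual negatives
    tpr = tp / p
    fpr = fp / n
    precision = tp / (tp + fp)
    f_measure = 2 * precision * tpr / (precision + tpr)
    accuracy = (tp + tn) / (p + n)
    return {
        "tpr": tpr,
        "fpr": fpr,
        "precision": precision,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Made-up counts: 10 positives and 10 negatives.
metrics = binary_metrics(tp=8, fn=2, fp=1, tn=9)
```

With these counts, TPR is 8/10, FPR is 1/10, precision is 8/9, the F-measure is 16/19 and accuracy is 17/20.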