AD HOC DATA INTEGRATION FOR MOBILE GIS APPLICATIONSramya/kolloquium/Ad-hoc-data-integration... ·...

AD HOC DATA INTEGRATION FOR MOBILE GIS APPLICATIONS - Ramya Venkateswaran ([email protected])

Transcript of AD HOC DATA INTEGRATION FOR MOBILE GIS APPLICATIONSramya/kolloquium/Ad-hoc-data-integration... ·...

Page 1: AD HOC DATA INTEGRATION FOR MOBILE GIS APPLICATIONSramya/kolloquium/Ad-hoc-data-integration... · 2009-11-11 · MRDB Intelligent Ranker Data Integrator. Data sources. 1 1 3 2. The

AD HOC DATA INTEGRATION FOR MOBILE GIS APPLICATIONS

Ramya Venkateswaran ([email protected])

Page 2

Contents

1. Scenario

2. Research Objective

3. Introduction: Overview of the GenW2 project

4. Motivation: Why is Ad hoc Data Integration needed?

5. State of the Art

6. Research Questions: Discuss 3 research questions

7. Methods: TourGuide and friends

8. Next Steps: Data Enrichment and Quality control


Page 3

1. Scenario

Page 4

Scenario of Usage

"I will be vacationing in Paris and I want to visit some of the famous palaces, historical sites and other tourist locations in Paris."

Other sources? Recommendations from:

• People

• Tourist Guides

• Albums & Images

• Tourist & Travel Websites

Page 5

Scenario of Usage

"I’d still like to go to Paris..."

Other sources? Recommendations from:

• People

• Tourist Guides

• Albums & Images

• Tourist & Travel Websites

• TourGuide

Page 6

2. Research Objective

Page 7

Objective of my research

Data Integration

• Flavour Based Integration

• Ad hoc DI vs. Traditional DI

• TourGuide

Data enrichment

• POI Enrichment

• Website credibility

Data quality control

• Completeness

• Correctness

• Credibility

• User feedback

Ad hoc Data Integration


Page 8

3. Overview and Introduction

Page 9

Overview of the GenW2 Project

Short for: Generalization for portrayal in Web and Wireless mapping

Develop new methods for web and wireless mapping

Focus on ad hoc integration of heterogeneous information and on-the-fly map generalization in a mobile context.

Page 10

The GenW2 Framework

[Figure: GenW2 framework diagram. Components: User; Privacy Controller and Firewall; Parser; Ruleset & Association Component; Spatio-Temporal Event Handler; Information Retrieval Component; Data Integrator; Intelligent Ranker; Filter & Relevance Component; Generalization; Visualization. Data stores and sources: Web, Internal Database, Static datasets, Facts DB, MRDB. Data-flow labels: Query, Parsed Query, Unranked dataset, Ranked dataset, Data, Result.]


Page 12

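The GenW2 Framework (Data Integrator in detail)

[Figure: detail of the GenW2 framework. Components: Parser; Data Integrator, comprising a Web Interaction Component, a Wrapper Generator and a Data Transformer; Intelligent Ranker. Data sources: Web (Image metadata, Webservices, Webpages), Static datasets, Facts DB, MRDB. Data-flow labels: Parsed Query, Unranked dataset, Ranked dataset, Data.]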

Page 13

Types of Data sources

• MRDB

• Facts DB

• Static datasets

• Web pages

• Webservices

• Image metadata

Page 14

4. Motivation: Why is Ad hoc Data Integration needed?

Page 15

Motivation

So many data sources, and so little structure

Web as a database – too much information to ignore!

Ad hoc integration – need-based, driven by the scenario and flavour, unlike search engines

Importance of recording certain facts that can enrich the MRDB and the integration process

Page 16

5. State of the Art

Page 17

Relevant Domains

Recommendation Systems

Information Filtering

Information Retrieval

Collaborative Filtering


Ad hoc Data Integration

Page 18

State of the Art

Data Integration

• Flavour Based Integration

• Ad hoc DI vs. Traditional DI

• TourGuide

Data enrichment

• POI Enrichment

• Website credibility

Data quality control

• Completeness

• Correctness

• Credibility

• User feedback

Ad hoc Data Integration


Page 19

Integration, IR and decision systems

Different concepts and methods in data integration:

• Data integration from multiple sources; geospatial data mining and integration (Knoblock et al., 2001; Michalowski et al., 2004)

• Mashing up web data to determine the overall importance of landmarks (Grabler et al., 2008)

• SPIRIT – design, techniques and implementation (Purves et al., 2007; Jones et al., 2002; Bucher et al., 2005)

• Geoparsing, geocoding and IR techniques (Clough et al., 2005)

Page 20

Integration, IR and decision systems

Methods for marking tourist locations, and a guide that is context-aware (Abowd et al., 2004)

Activity-based model of decisions affected by activity-travel behavior, which can also predict activities (Arentze and Timmermans, 2004)

Voluntary information from a community, collaborative semantics, recommendation systems (Schlieder, 2007)

Page 21

Data Enrichment

Methods and algorithms for the provision of auxiliary data and its use for controlling an automated adaptive generalization process (Neun, 2007)


Page 22

Data quality and assessment

Framework for efficient and accurate integration of geospatial data from a large number of sources: positional accuracy, completeness (Thakkar et al., 2007)

VGI (Volunteered Geographic Information): trust models for gazetteers (Keßler et al., 2009)

Page 23

Observations from literature

Considerable work and methods for traditional data integration, and a variety of methods in IR and GIR

Less work on methods for data integration from multiple, dynamic sources (the focus is on semantics rather than on data and context) and on recording reusable facts

Considerable work on user modeling, activities and activity recommendation

Data enrichment work for improving generalization

Page 24

Challenges

• Datasets are dynamic and heterogeneous, not static

• Auxiliary data

• Determining parameters (user categories, activities, habits etc., not a single user or set of preferences)

• Point of complete integration

• Methods to test and evaluate effectiveness

Page 25

6. Research Questions

Page 26

RQ1 – Flavour Based Integration

Given an activity and unrelated data that is heterogeneous and dynamic, what is an effective method of data integration, so that the results are streamlined towards information about events and places for a set of users?

• Flavour based data integration from various sources

• Ad hoc DI vs. Traditional DI

• TourGuide – an example of web data integration

Page 27

RQ2 – Data Enrichment

How can the Generalization for portrayal in Web and Wireless mapping (GenW2) framework record and exploit valuable reusable information obtained from the preceding data integration?

• Facts DB: Activity-Location pairs

• Data source credibility (Keßler et al., 2009)

• User feedback

Page 28

RQ3 – Quality of data

What are the different metrics that can be used to control and/or assess the quality of the integrated data?

Measurement of quality?

• Quality of data by completeness (Thakkar et al., 2007)

• Quality of data by correctness (Thakkar et al., 2007)

Another metric for quality assessment:

• Quality of data by collective user feedback

• Credibility rank of information sources (Keßler et al., 2009)

Evaluation Methodology

Page 29

7. Methods

Page 30

Flavour Based Data Integration

Recommendation Systems

Information Filtering

Information Retrieval

Collaborative Filtering


Page 31

Definition - Flavour Based Data Integration

Recommendation Systems

Information Filtering

Information Retrieval

Collaborative Filtering

“The central idea here is to base personalized recommendations for users on information obtained from other, ideally likeminded, users.” (Billsus and Pazzani, 1998).

“use the opinions of a community of users to help individuals in that community more effectively identify content of interest from a potentially overwhelming set of choices” (Resnick and Varian, 1997).

“a field of study designed for creating a systematic approach to extracting information that a particular person finds important from a larger stream of information” (Canavese, 1994).

“the goal of an information [retrieval] system is for the user to obtain information from the knowledge resource which helps her/him in problem management” (Belkin, 1984)

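The collaborative filtering idea in the first quotation, basing recommendations on likeminded users, can be sketched in Python as a basic user-based scheme. It computes the cosine similarity between users' rating vectors, then sums the nearest neighbours' ratings weighted by that similarity. This is a generic textbook variant chosen for illustration, not a method taken from GenW2 or TourGuide. The users, places and ratings are invented.

```python
from math import sqrt

# Hypothetical ratings (user -> {place: rating}), invented for illustration.
ratings = {
    "anna": {"Louvre": 5, "Versailles": 4},
    "ben": {"Louvre": 5, "Versailles": 5, "Musee d'Orsay": 4},
    "carl": {"Louvre": 1, "Moulin Rouge": 5},
}


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity of two sparse rating vectors; 0.0 if they share no items."""
    common = set(a) & set(b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if not common or norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(a[i] * b[i] for i in common) / (norm_a * norm_b)


def recommend(target: str, ratings: dict, k: int = 2) -> list:
    """Rank places the target has not rated by similarity-weighted sums over the k most similar users."""
    mine = ratings[target]
    neighbours = sorted(
        ((cosine(mine, r), user) for user, r in ratings.items() if user != target),
        reverse=True,
    )[:k]
    scores = {}
    for sim, user in neighbours:
        if sim <= 0:
            continue  # ignore users with nothing in common
        for place, rating in ratings[user].items():
            if place not in mine:
                scores[place] = scores.get(place, 0.0) + sim * rating
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


print(recommend("anna", ratings))
```

In this example, anna's closest neighbour is ben, so ben's unrated place outranks carl's. Plain weighted sums favour places liked by many similar users. Normalizing by the sum of similarities, or mean-centering the ratings, are common refinements.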


Page 37

Keyphrases in FBDI

• Systematic approach to extracting information

• Obtain information from one or many knowledge resource/s

• Recommendations for user groups or user categories

• Opinions of a community of users

• Keyword, flavour or activity such as tourism, history, sport, culture, shopping etc.

Page 38

Definition of FBDI

FBDI is an activity-based, systematic approach to extracting and integrating information from multiple knowledge sources depending on the habits of certain user groups or user categories, and is capable of learning over time.

Flavour = typical activities of a certain user group. Examples: Tourism, Shopping, Sports, Historical excursions, Cultural excursions etc.
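The following is a minimal Python sketch of the definition above. It assumes that a flavour can be modelled as a set of activity keywords. Records from several sources are kept only if they match the flavour, and duplicates of the same place are merged by name. The `Flavour` and `Place` classes, the keyword matching and the sample records are hypothetical. The learning-over-time part of the definition is not modelled.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Flavour:
    """Hypothetical model of a flavour: typical activities of a user group as keywords."""
    name: str
    keywords: set


@dataclass
class Place:
    name: str
    description: str
    sources: set = field(default_factory=set)


def matches(flavour: Flavour, text: str) -> bool:
    """True if any word of the text is one of the flavour's keywords."""
    return bool(set(re.findall(r"\w+", text.lower())) & flavour.keywords)


def integrate(flavour: Flavour, records: list) -> list:
    """Keep records matching the flavour and merge records for the same place across sources."""
    merged = {}
    for rec in records:
        if not matches(flavour, rec["name"] + " " + rec["description"]):
            continue
        key = rec["name"].strip().lower()
        place = merged.setdefault(key, Place(rec["name"].strip(), rec["description"]))
        place.sources.add(rec["source"])
    return list(merged.values())


history = Flavour("History", {"palace", "museum", "castle", "historic"})
records = [  # invented sample records
    {"source": "wikitravel", "name": "Palace of Versailles", "description": "Royal palace and gardens"},
    {"source": "tripadvisor", "name": "Palace of Versailles ", "description": "Former royal residence"},
    {"source": "lonelyplanet", "name": "Galeries Lafayette", "description": "Department store for shopping"},
]
for place in integrate(history, records):
    print(place.name, sorted(place.sources))
```

Under a "Shopping" flavour, the same records would instead keep only the department store. The choice of flavour therefore decides which sources contribute, not just how results are ranked.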


Page 42

Adaptive tour guide for Paris

• Flavour based integration with the web as data source: only the web as the database (Grabler et al., 2008)

• Integration of data on tourism, transport, user feedback, user rating, Facebook profile and Dopplr profile

• Scheduler

Page 43

Data Integrator

Example of web data integration. Functional components (Baumgartner et al., 2009):

• Web interaction component: Lonely Planet, Wikitravel, VirtualTourist, TripAdvisor and the official tourist website

• Wrapper generator: OpenKapow RoboMaker

• Data transformer: DOM parser for RSS and XML formats
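The slide names a DOM parser for RSS and XML as the data transformer. As an illustration only (the TourGuide code is not shown here), Python's standard `xml.dom.minidom` module can flatten RSS items into records. The feed content and the chosen fields are assumptions.

```python
from typing import Optional
from xml.dom import minidom

# Invented sample feed; in the framework this would come from the web interaction component.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>Paris events</title>
<item><title>Night at the Louvre</title><link>http://example.org/louvre</link>
<description>Late opening of the museum</description></item>
<item><title>Fete de la Musique</title><link>http://example.org/fete</link></item>
</channel></rss>"""


def child_text(parent, tag: str) -> Optional[str]:
    """Text of the first <tag> element below parent, or None if it is missing or empty."""
    nodes = parent.getElementsByTagName(tag)
    if not nodes:
        return None
    text = "".join(
        n.data for n in nodes[0].childNodes
        if n.nodeType in (n.TEXT_NODE, n.CDATA_SECTION_NODE)
    ).strip()
    return text or None


def parse_rss(xml_text: str) -> list:
    """Turn each RSS <item> into a flat dict with title, link and description."""
    doc = minidom.parseString(xml_text)
    return [
        {tag: child_text(item, tag) for tag in ("title", "link", "description")}
        for item in doc.getElementsByTagName("item")
    ]


for entry in parse_rss(SAMPLE_RSS):
    print(entry)
```

Missing elements become `None` rather than raising errors. This keeps heterogeneous feeds in one uniform record shape for later integration steps.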


Page 47

Web Data Extraction

Semi-automatic wrappers and automatic wrapper induction

Academic:

• WIEN (Kushmerick et al., 1997)

• Stalker (Muslea et al., 2001)

• DEByE (Laender et al., 2000)

• XWRAP (Liu et al., 2000)

• Lixto (Baumgartner et al., 2001)

• Wargo (Pan et al., 2002)

Commercial:

• RoboMaker (Kapow Technologies)

• WebQL (QL2 Software Inc.)
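WIEN's simplest wrapper class, LR, extracts a field by finding a fixed left delimiter and then reading up to a fixed right delimiter. The Python sketch below is a heavily simplified, single-field version of that idea. It induces the delimiters as the longest common suffix and prefix of the text around hand-labelled examples, without WIEN's full candidate search and validation. The pages and place names are invented.

```python
import os


def common_prefix(strings: list) -> str:
    # os.path.commonprefix compares character by character, so it works on any strings
    return os.path.commonprefix(strings)


def common_suffix(strings: list) -> str:
    return os.path.commonprefix([s[::-1] for s in strings])[::-1]


def learn_lr(examples: list) -> tuple:
    """Induce (left, right) delimiters from (page, [labelled values in page order]) pairs."""
    lefts, rights = [], []
    for page, values in examples:
        cursor = 0
        for value in values:
            start = page.index(value, cursor)  # raises ValueError if a label is not in the page
            end = start + len(value)
            lefts.append(page[:start])
            rights.append(page[end:])
            cursor = end
    left, right = common_suffix(lefts), common_prefix(rights)
    if not left or not right:
        raise ValueError("examples admit no non-empty LR delimiters")
    return left, right


def extract_lr(page: str, left: str, right: str) -> list:
    """Apply an LR wrapper: every string between an occurrence of left and the next right."""
    values, cursor = [], 0
    while True:
        start = page.find(left, cursor)
        if start == -1:
            return values
        start += len(left)
        end = page.find(right, start)
        if end == -1:
            return values
        values.append(page[start:end])
        cursor = end + len(right)


training = [  # invented, hand-labelled pages
    ("<html><ul><li><b>Louvre</b> museum</li><li><b>Pantheon</b> monument</li></ul></html>",
     ["Louvre", "Pantheon"]),
    ("<html><ul><li><b>Eiffel Tower</b> landmark</li></ul></html>", ["Eiffel Tower"]),
]
left, right = learn_lr(training)
new_page = "<html><ul><li><b>Notre-Dame</b> cathedral</li><li><b>Sacre-Coeur</b> basilica</li></ul></html>"
print(repr(left), repr(right), extract_lr(new_page, left, right))
```

The induced delimiters only hold while the page template stays the same. This is why wrapper generation, and regenerating wrappers when sites change, is a separate component of the data integrator.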


Page 50

Data Integrator

Example of web data integration:

• First part of integration: Google

• Second part: functional components (Baumgartner et al., 2009)

  – Web interaction component: Lonely Planet, Wikitravel, VirtualTourist, TripAdvisor and the official tourist website

  – Wrapper generator: OpenKapow RoboMaker

  – Data transformer: DOM parser for RSS and XML formats


Page 52

Intelligent Ranker and Scheduler

• Third step of integration

• Applies different user profiles, such as Facebook and Dopplr, to the data

• Arranges the data in ranked form depending on matches with user interests and activities

• Brute-force cumulative ranking algorithm:

  3 – explicitly mentioned

  2 – description match

  1 – suggested by other users

• Merges data from public transport
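The following Python sketch shows one possible reading of the brute-force cumulative ranking above. Every interest from a user's profile is checked against every place. A place earns 3 points when its name explicitly mentions the interest, otherwise 2 when its description does, plus 1 per suggestion from other users. These are assumptions: how "explicitly mentioned" and "description match" are detected (whole-word matching here), and all sample data. Access to Facebook and Dopplr profiles and the merging of public transport data are not modelled.

```python
import re
from dataclasses import dataclass

# Point values taken from the slide; how each criterion is detected below is an assumption.
EXPLICIT, DESCRIPTION, SUGGESTED = 3, 2, 1


@dataclass
class Candidate:
    name: str
    description: str
    suggested_by: int = 0  # number of other users who suggested this place


def mentions(text: str, term: str) -> bool:
    """Whole-word, case-insensitive match of an interest term in a text."""
    return re.search(r"\b" + re.escape(term.lower()) + r"\b", text.lower()) is not None


def score(place: Candidate, interests: list) -> int:
    """Cumulative score: points from every interest plus one point per suggestion."""
    total = 0
    for interest in interests:  # brute force: every interest is checked against every place
        if mentions(place.name, interest):
            total += EXPLICIT
        elif mentions(place.description, interest):
            total += DESCRIPTION
    return total + SUGGESTED * place.suggested_by


def rank(places: list, interests: list) -> list:
    return sorted(places, key=lambda p: score(p, interests), reverse=True)


interests = ["Louvre", "art", "history"]  # e.g. gathered from hypothetical profile data
places = [
    Candidate("Musee d'Orsay", "impressionist art museum", suggested_by=1),
    Candidate("Moulin Rouge", "cabaret show", suggested_by=4),
    Candidate("Louvre", "art museum with history collections", suggested_by=2),
]
for p in rank(places, interests):
    print(p.name, score(p, interests))
```

Note the effect of the suggestion term in this example. Moulin Rouge matches no interest, yet it still outranks Musee d'Orsay on suggestions alone.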


Page 54

Facts DB

• Location information from the MRDB, and mapping of the map LOD (level of detail) to places

• Activity-Location pairs

• Facts DB structure

Page 55

Facts DB Structure

• High-level structure

• Lower-level structure: a database object maps to several locations

• Limit to two levels; inverse page lookup

Example database object:

Activity | LocationFrom | LocationTo | Name | Rank | User Feedback
Shopping | 47°22′40″N, 8°32′25″E | 47.3671°N, 8.5409°E | Bahnhofstrasse | 3 | Shop for watches, jewelry, clothes
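The example row mixes degree-minute-second and decimal coordinates. A minimal Python sketch, assuming an in-memory store, converts both to decimal degrees (47°22′40″N, 8°32′25″E is about 47.3778°N, 8.5403°E). It then indexes facts by activity as a two-level structure. The class and method names are hypothetical, and the slide's inverse page lookup is not modelled.

```python
from collections import defaultdict
from dataclasses import dataclass


def dms_to_decimal(degrees: int, minutes: int, seconds: float) -> float:
    """Degrees/minutes/seconds to decimal degrees (north and east positive)."""
    return degrees + minutes / 60 + seconds / 3600


@dataclass
class Fact:
    """One database object, following the columns of the example table."""
    activity: str
    location_from: tuple  # (lat, lon) in decimal degrees
    location_to: tuple
    name: str
    rank: int
    user_feedback: str


class FactsDB:
    """Two-level in-memory sketch: activity -> list of facts, each with its own locations."""

    def __init__(self) -> None:
        self._by_activity = defaultdict(list)

    def add(self, fact: Fact) -> None:
        self._by_activity[fact.activity.lower()].append(fact)

    def lookup(self, activity: str) -> list:
        """Facts for an activity, highest rank first (assuming a higher rank is better)."""
        facts = self._by_activity.get(activity.lower(), [])
        return sorted(facts, key=lambda f: f.rank, reverse=True)


db = FactsDB()
db.add(Fact(
    activity="Shopping",
    location_from=(dms_to_decimal(47, 22, 40), dms_to_decimal(8, 32, 25)),
    location_to=(47.3671, 8.5409),
    name="Bahnhofstrasse",
    rank=3,
    user_feedback="Shop for watches, jewelry, clothes",
))
print(db.lookup("shopping")[0].location_from)
```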

Page 56

Data Quality

• Evaluation through completeness and correctness

• Example: shopping stores in Bahnhofstrasse

  – Extract lat-lng

  – Shop name, website, details and contact details

  – Shop opening and closing times

  – Evaluate against manually collected data for completeness and correctness
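Completeness and correctness can be computed in several ways. The Python sketch below assumes one simple formulation; the definitions in Thakkar et al. (2007) may differ in detail. Completeness is taken as the share of manually collected shops that the integration found. Correctness is taken as the share of checked attribute values, such as opening and closing times, that agree with the manual data. Shop names and times are placeholders.

```python
def completeness(extracted: dict, reference: dict) -> float:
    """Fraction of manually collected reference shops that the integration found."""
    if not reference:
        return 1.0
    return len(extracted.keys() & reference.keys()) / len(reference)


def correctness(extracted: dict, reference: dict, attributes: list) -> float:
    """Fraction of checked attribute values (shops found in both sets) that match the reference."""
    checked = correct = 0
    for shop in extracted.keys() & reference.keys():
        for attr in attributes:
            if attr in reference[shop]:
                checked += 1
                if extracted[shop].get(attr) == reference[shop][attr]:
                    correct += 1
    return correct / checked if checked else 1.0


reference = {  # hypothetical manually collected data
    "shop a": {"opens": "09:00", "closes": "20:00"},
    "shop b": {"opens": "10:00", "closes": "19:00"},
    "shop c": {"opens": "09:30", "closes": "18:30"},
    "shop d": {"opens": "09:00", "closes": "18:00"},
}
extracted = {  # hypothetical integration output
    "shop a": {"opens": "09:00", "closes": "20:00"},
    "shop b": {"opens": "10:00", "closes": "18:00"},
    "shop c": {"opens": "09:30", "closes": "18:30"},
}
print(completeness(extracted, reference))  # 0.75
print(correctness(extracted, reference, ["opens", "closes"]))
```

Keeping the two measures separate shows whether a source misses shops entirely or finds them with wrong details. These call for different fixes in the integration process.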

Page 57

8. Next Steps

Page 58

Next Steps

• Formalizing parameters and methods for integration

• Improve the scoring algorithm for places

• Structure of the Facts DB for efficient storage and retrieval

• Develop quality control methods, such as considering user feedback and credibility

Page 59

Open Questions

• At what point is the data integrated? When is it complete? Qualitative vs. quantitative?

• Error recovery and correction mechanism in the Facts DB?

• Mapping of a place’s score to LOD?

Page 60

Milestones

Year 1: Fall 2008 – Spring 2009

Year 2: Fall 2009 – Spring 2010

Year 3: Fall 2010 – Spring 2011

• Literature review

• Develop overall framework

• Start to develop research questions and focus area

• Literature review

• Develop research questions

• Define use cases

• Make a prototype of one use case: TourGuide

• Develop concept and methods for RQ1

• Implement parts of TourGuide

• Develop user tests for input to RQ2 and RQ3

• Continue work on RQ1; formalise parameters

• Analyse input from user tests and combine with other parameters for RQ2

• Continue work on RQ2 and start RQ3

• Formalise parameters for data quality control

• Perform evaluation of data; define and implement quality assessing/controlling parameters for FBDI

• Finalize publications

• Thesis write-up

Page 61

Summary: Expected contributions

A working system and framework for ad hoc data integration that works for certain flavours

Methodology of flavour based data integration (RQ1):

• Structure

• Algorithm for efficient data source selection depending on “flavour”

• Algorithm for scoring different places depending on a number of parameters

Concept and structure of a Facts DB that works with data from the MRDB for enrichment (RQ2)

Improved and adapted parameters, a mechanism for checking the quality of the integrated data, and some test cases (RQ3)


Page 63

Thank you!

Ramya Venkateswaran ([email protected])

Demo and slides at http://www.geo.uzh.ch/~ramya/kolloquium/
