090925 Data Transformation

Guidelines for Interoperability in Tourism: Data Transformation. Wolfram Höpken, [email protected]. NoFrills Travel & Technology Expo, Bergamo, 25.09.2009


Transcript of 090925 Data Transformation

Page 1: 090925 Data Transformation

Guidelines for Interoperability in Tourism

Data Transformation

Wolfram Höpken
[email protected]

NoFrills Travel & Technology Expo, Bergamo
25.09.2009

Page 2: 090925 Data Transformation

Wolfram Höpken, NoFrills Bergamo, 09-09-25

Data transformation – structured data mapping

Structured data mapping
- Schema mapping: Establishing mappings between local data sources (or database schemas)
- Datasource-to-ontology mapping: Establishing mappings between a datasource and an ontology

Mapping languages
- Should be fully declarative in order to
  - efficiently define and describe mappings
  - discover inconsistencies and ambiguities in mappings
- Examples: XSLT, D2R MAP, R2O
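To illustrate why a declarative mapping helps, here is a minimal sketch in Python. It is not one of the mapping languages named on the slide; the field names and the `MAPPING` table are hypothetical. Because the mapping is plain data rather than procedural code, a generic check can inspect it for ambiguities.

```python
from collections import defaultdict

# Hypothetical declarative mapping: (source field, target field).
# The mapping is plain data, so it can be inspected without running it.
MAPPING = [
    ("hotel_name", "name"),
    ("town", "city"),
    ("locality", "city"),  # deliberately ambiguous: second source for "city"
    ("stars", "category"),
]


def find_ambiguities(mapping):
    """Return target fields that more than one source field maps to."""
    sources_by_target = defaultdict(list)
    for source, target in mapping:
        sources_by_target[target].append(source)
    return {t: s for t, s in sources_by_target.items() if len(s) > 1}


def apply_mapping(record, mapping):
    """Copy values from a source record into a target record."""
    return {target: record[source] for source, target in mapping if source in record}


print(find_ambiguities(MAPPING))  # {'city': ['town', 'locality']}
print(apply_mapping({"hotel_name": "Alpenhof", "stars": 4}, MAPPING))
```

The same ambiguity check would be much harder to write against hand-coded transformation logic, which is one reading of the slide's argument for declarative languages.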

Page 3: 090925 Data Transformation


Data transformation – structured data mapping

Types of clashes between data sources
- Different naming: Equivalent concepts have different names in different datasources (fully mappable)
- Different position: Equivalent concepts have different positions within the structure of the datasource (fully mappable)
- Different scope of concepts: Concepts containing the same piece of information in different datasources have different scopes, i.e., the same piece of information might be represented as a single concept or as part of several concepts (fully mappable)
- Different abstraction levels: The same information is represented on different levels of abstraction (partially mappable)
- Different granularity: The same information is represented on different levels of granularity (partially mappable)
- Missing concept: A concept in one datasource has no counterpart in the other datasource (not mappable)
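Some of the clash types listed above can be made concrete with a small Python sketch. The two schemas, all field names and the price threshold are invented for illustration and do not come from the talk.

```python
def map_accommodation(src):
    """Map a record from hypothetical source schema A to hypothetical target schema B."""
    address = src.get("address", {})
    return {
        # Different naming: "hotel_name" in A is called "name" in B.
        "name": src["hotel_name"],
        # Different position: "city" is nested under "address" in A, top-level in B.
        "city": address.get("city"),
        # Different scope: A splits the street address into two concepts, B uses one.
        "street_address": f'{address.get("street", "")} {address.get("house_no", "")}'.strip(),
        # Different granularity: A stores a price per night, B only a price band.
        # Detail is lost, so the mapping cannot be reversed (partially mappable).
        "price_band": "budget" if src["price_per_night"] < 80 else "standard",
        # Missing concept: A has no counterpart for B's "wheelchair_access".
        "wheelchair_access": None,
    }


record_a = {
    "hotel_name": "Alpenhof",
    "address": {"city": "Bergamo", "street": "Via Roma", "house_no": "12"},
    "price_per_night": 65,
}
print(map_accommodation(record_a))
```

The fully mappable cases need only renaming, moving or concatenating values; the partially mappable and unmappable cases force either a loss of information or an empty target field.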

Page 4: 090925 Data Transformation


Data transformation – structured data mapping

Short-term recommendations (1–3 years)
- Use (graphical) mediation tools that automatically map two different data structures
- Introduce reasoning capabilities within resource mediation tools to automatically suggest inconsistencies

Long-term recommendations (3–10 years)
- Use semantic web technologies (e.g. based on RDF) to name and represent (data) resources on the Web so that mapping can be automatically undertaken
- Foster high level general ontologies to describe particular domains of interest so that low-level more concrete ontologies can later be linked or merged within the (general) structure
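The long-term recommendations can be sketched with plain Python tuples standing in for RDF triples. The `rdf:type` and `rdfs:subClassOf` URIs are the standard W3C ones; the two `example.org` ontologies and all class names are hypothetical. The sketch shows how a resource described with a low-level ontology becomes usable under a general one once the two are linked.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

GENERAL = "http://example.org/general#"  # hypothetical high-level ontology
LOCAL = "http://example.org/local#"      # hypothetical low-level ontology

triples = {
    # Linking the local ontology into the general structure.
    (LOCAL + "MountainHut", RDFS_SUBCLASS_OF, GENERAL + "Accommodation"),
    (GENERAL + "Accommodation", RDFS_SUBCLASS_OF, GENERAL + "TourismService"),
    # A data resource named by a URI and typed with a local class.
    (LOCAL + "hut42", RDF_TYPE, LOCAL + "MountainHut"),
}


def superclasses(cls, triples):
    """All classes reachable from cls by following rdfs:subClassOf."""
    found, todo = set(), [cls]
    while todo:
        current = todo.pop()
        for s, p, o in triples:
            if s == current and p == RDFS_SUBCLASS_OF and o not in found:
                found.add(o)
                todo.append(o)
    return found


def types_of(resource, triples):
    """Direct types of a resource plus the types inferred via subclass links."""
    direct = {o for s, p, o in triples if s == resource and p == RDF_TYPE}
    inferred = set()
    for cls in direct:
        inferred |= superclasses(cls, triples)
    return direct | inferred


print(sorted(types_of(LOCAL + "hut42", triples)))
```

A consumer that only knows the general ontology can now find `hut42` as an `Accommodation` without any hand-written mapping for the local schema.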

Page 5: 090925 Data Transformation


Data transformation – semantic annotation

Semantic annotation
- Adding meaning to unstructured, semi-structured or structured content (HTML documents, Word documents, video or audio content, etc.)
- Based on ontologies as the referenced semantics

Tagging
- User-generated semantic annotation
- Often based on taxonomies

Folksonomies
- Community-generated taxonomies
- Especially used for annotation of user-generated content
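As a rough illustration of how a community vocabulary can emerge from individual tagging, the following Python sketch keeps only tags that several distinct users have applied. The tagging data, the normalisation rule and the `min_users` threshold are assumptions made for this example.

```python
# Hypothetical tagging events: (user, item, tag).
taggings = [
    ("anna", "photo1", "Bergamo"),
    ("ben", "photo1", "bergamo"),
    ("anna", "photo1", "old town"),
    ("carl", "photo2", "hiking"),
    ("ben", "photo2", "Hiking "),
    ("carl", "photo2", "alps"),
]


def normalise(tag):
    """Lower-case a tag and collapse surrounding and repeated whitespace."""
    return " ".join(tag.lower().split())


def folksonomy(taggings, min_users=2):
    """Return tags used by at least min_users distinct users, with user counts."""
    users_by_tag = {}
    for user, _item, tag in taggings:
        users_by_tag.setdefault(normalise(tag), set()).add(user)
    return {tag: len(users) for tag, users in users_by_tag.items() if len(users) >= min_users}


print(folksonomy(taggings))  # {'bergamo': 2, 'hiking': 2}
```

Unlike a taxonomy defined up front, nothing here fixes the vocabulary in advance; it is simply whatever the community converges on.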

Page 6: 090925 Data Transformation


Data transformation – semantic annotation

Short-term recommendations (1–3 years)
- Build graphic manual annotation tools that enable transparent semantic annotation and automatic generation of the corresponding source code

Long-term recommendations (3–10 years)
- Support natural language processing annotation techniques
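The "automatic generation of the corresponding source code" in the short-term recommendation could look roughly like the sketch below: a user marks spans of text in a tool, and the tool emits markup with RDFa-style attributes (`vocab`, `typeof`, `property`). The vocabulary URL, the type name and the property names are hypothetical, and the function assumes the marked spans do not overlap.

```python
from html import escape

VOCAB = "http://example.org/tourism#"  # hypothetical vocabulary


def annotate(text, annotations, vocab=VOCAB, type_name="Hotel"):
    """Wrap annotated character spans of text in RDFa-style property spans.

    annotations is a list of (start, end, property_name) tuples that are
    assumed not to overlap.
    """
    parts, cursor = [], 0
    for start, end, prop in sorted(annotations):
        parts.append(escape(text[cursor:start]))
        parts.append(f'<span property="{escape(prop)}">{escape(text[start:end])}</span>')
        cursor = end
    parts.append(escape(text[cursor:]))
    return f'<p vocab="{escape(vocab)}" typeof="{escape(type_name)}">{"".join(parts)}</p>'


text = "Hotel Alpenhof in Bergamo"
name_start = text.index("Hotel Alpenhof")
city_start = text.index("Bergamo")
marked = [
    (name_start, name_start + len("Hotel Alpenhof"), "name"),
    (city_start, city_start + len("Bergamo"), "city"),
]
print(annotate(text, marked))
```

The person annotating never sees the markup, which is what keeps the annotation "transparent" in the sense of the slide.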

Page 7: 090925 Data Transformation


Data transformation – automatic information extraction

Information extraction
- Structuring unstructured data so that it can be automatically analysed, queried and integrated with structured data sources
- Automatic identification of selected types of entities, relations, or events in free text

Named entity recognition
- Explication of references to organisations, institutions, facilities, places, etc.
- Machine learning techniques like maximum entropy or hidden Markov models
- Current approaches reach up to 90% precision

Event extraction
- Normally template-based extraction of information, built on top of named entity recognition approaches
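The layering described above (event templates filled from recognised entities) can be sketched in Python. Note the swap: instead of the maximum entropy or hidden Markov models mentioned on the slide, this sketch uses fixed lookup lists and a regular expression, purely to keep the example short. The gazetteers, entity types and template slots are all hypothetical. The `precision` helper shows how a figure such as "90% precision" is computed: correct predictions divided by all predictions.

```python
import re

# Hypothetical gazetteers; a statistical recogniser would replace these lookups.
FACILITIES = ["Hotel Alpenhof"]
PLACES = ["Bergamo", "Milan"]
DATE_PATTERN = re.compile(r"\b\d{2}\.\d{2}\.\d{4}\b")


def recognise_entities(text):
    """Return (entity_type, surface_string) pairs found in text."""
    entities = []
    for name in FACILITIES:
        if name in text:
            entities.append(("FACILITY", name))
    for name in PLACES:
        if re.search(rf"\b{re.escape(name)}\b", text):
            entities.append(("PLACE", name))
    for match in DATE_PATTERN.finditer(text):
        entities.append(("DATE", match.group()))
    return entities


def extract_event(text):
    """Fill a fixed event template from the recognised entities."""
    template = {"venue": None, "place": None, "date": None}
    slot_for_type = {"FACILITY": "venue", "PLACE": "place", "DATE": "date"}
    for entity_type, value in recognise_entities(text):
        slot = slot_for_type[entity_type]
        if template[slot] is None:  # keep the first candidate per slot
            template[slot] = value
    return template


def precision(predicted, gold):
    """Share of predicted entities that are correct: TP / (TP + FP)."""
    predicted, gold = set(predicted), set(gold)
    return len(predicted & gold) / len(predicted) if predicted else 0.0


sentence = "Hotel Alpenhof hosts a wine tasting in Bergamo on 25.09.2009."
print(extract_event(sentence))
```

Because the template is filled only from what the recogniser finds, any error in entity recognition propagates directly into the extracted event.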

Page 8: 090925 Data Transformation


Data transformation – automatic information extraction

Short-term recommendations (1–3 years)
- Foster the use of semantic web technologies to describe non-structured data on the web by means of resources, to make data machine processable

Long-term recommendations (3–10 years)
- Agree on the labels that particular tourism content ought to have (preferably with the intervention of a recognized body such as the W3C), so that it is made visible for search engines
- Develop software that enables (semi-)automatic information annotation according to the previous recommendation