On the Semantic Mapping of Schema-agnostic Queries: A Preliminary Study

Post on 02-Jul-2015

The growing size, heterogeneity and complexity of databases demand strategies that help users and systems consume data. Ideally, query mechanisms should be schema-agnostic or vocabulary-independent, i.e. they should be able to match user queries, expressed in the users' own vocabulary and syntax, to the data, abstracting data consumers from the representation of the data. Although schema-agnosticism is a central requirement across natural language interfaces and entity search, there is a lack of conceptual analysis of it and of the associated semantic differences between queries and databases. This work provides an initial conceptualization of schema-agnostic queries, with a fine-grained classification that can support the scoping, evaluation and development of semantic matching approaches for schema-agnostic queries.

On the Semantic Mapping of Schema-agnostic Queries: A Preliminary Study

André Freitas, João C. Pereira da Silva, Edward Curry

Insight Centre for Data Analytics

NLIWoD, ISWC 2014

Riva del Garda

Outline

Goals

Semantic Tractability

Dimensions of Query-Database Semantic Heterogeneity

Definitions

Semantic Resolvability

Summary

Motivation

[Figure: a QA/NLI system evaluated over a test collection of query/result pairs (Q0, R0), (Q1, R1), ..., (Qn, Rn), scored by f-measure]

What is being evaluated by the test collection?

semantic matching

Goals

Provide a preliminary categorization of the semantic matching (schema-agnosticism) classes.

Support a conceptual understanding of the semantic phenomena behind schema-agnostic queries.

Applications:

- Help in the design and evaluation of schema-agnostic query mechanisms

- Relevant to Question Answering and Natural Language Interfaces

Semantic Tractability

Popescu et al. (2003)

Towards a Theory of Natural Language Interfaces to Databases

Definition focuses on soundness and completeness conditions for mapping Natural Language Queries to Database elements

Semantic Tractability

Leaves many queries outside the tractability scope

Conditions:

- Query-Database syntactic isomorphism

- Explicit and unambiguous synonymic mapping

The goal here is to provide an all-inclusive categorization system

Dimensions of Query-Database Semantic Heterogeneity

Methodology for the creation of a taxonomy of lexico-semantic differences

Listing of concepts expressed in the existing semantic heterogeneity taxonomies:

- George, 2005

- Colomb, 1997

- Parent & Spaccapietra, 1998

- Kashyap & Sheth, 1996

Elimination of concepts which were not relevant in the context of the query-database semantic differences

Merging and renaming of equivalent concepts

Taxonomy of Semantic Differences

Semantic Mapping

Query Tokens

Dataset Lexical Element

Associated Semantic Knowledge Base (M)

[Diagram: Query Token (token q), semantic knowledge base M, Dataset Lexicon (Σ)]

...

Semantic Reachability

Query-Dataset Semantic mapping:

Semantic Resolvability

Resolved Schema-agnostic Query

Semantic Mapping Types

Classifies each semantic mapping

According to the semantic heterogeneity classes

Taking into account some semantic phenomena (ambiguity, vagueness)

AP: Abstraction Process

Trivial

Lexical

Synonymic

Generalization/specialization

Conceptual

Functional/Aggregation

PS: Predicate Structure

Predication preserving

Predication difference

M: Semantic Knowledge Base

Self-Sufficient

Dependent on External Knowledge Base

SE: Semantic Evidence & Uncertainty

Absolute

Context resolvable

CT: Context

Sufficient

Insufficient

MC: Mapping Cardinality

1:1

1:N

N:1

M:N
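
The six dimensions listed above can be read as the fields of one record per semantic mapping. The following is a minimal Python sketch under that reading. The class names, field names and the `label` helper are hypothetical, introduced only for illustration; the enumeration values are taken from the slide, and the label layout follows the tuple notation used in the system evaluation example.

```python
from dataclasses import dataclass
from enum import Enum


class AbstractionProcess(Enum):  # AP
    TRIVIAL = "trivial"
    LEXICAL = "lexical"
    SYNONYMIC = "synonymic"
    GENERALIZATION_SPECIALIZATION = "generalization/specialization"
    CONCEPTUAL = "conceptual"
    FUNCTIONAL_AGGREGATION = "functional/aggregation"


class PredicateStructure(Enum):  # PS
    PRESERVING = "predication preserving"
    DIFFERENCE = "predication difference"


class KnowledgeBase(Enum):  # M
    SELF_SUFFICIENT = "self-sufficient"
    EXTERNAL = "dependent on external knowledge base"


class SemanticEvidence(Enum):  # SE
    ABSOLUTE = "absolute"
    CONTEXT_RESOLVABLE = "context resolvable"


class Context(Enum):  # CT
    SUFFICIENT = "sufficient"
    INSUFFICIENT = "insufficient"


class MappingCardinality(Enum):  # MC
    ONE_TO_ONE = "1:1"
    ONE_TO_MANY = "1:N"
    MANY_TO_ONE = "N:1"
    MANY_TO_MANY = "M:N"


@dataclass(frozen=True)
class MappingType:
    """One classified semantic mapping (hypothetical container)."""
    ap: AbstractionProcess
    ps: PredicateStructure
    mc: MappingCardinality
    se: SemanticEvidence
    m: KnowledgeBase
    ct: Context

    def label(self) -> str:
        # Same field order as the tuple notation on the evaluation slide.
        return (f"(AP={self.ap.value}, PS={self.ps.value}, MC={self.mc.value}, "
                f"SE={self.se.value}, M={self.m.value}, CT={self.ct.value})")


example = MappingType(
    AbstractionProcess.SYNONYMIC,
    PredicateStructure.PRESERVING,
    MappingCardinality.ONE_TO_ONE,
    SemanticEvidence.ABSOLUTE,
    KnowledgeBase.SELF_SUFFICIENT,
    Context.SUFFICIENT,
)
print(example.label())
```

Making the record frozen keeps it hashable, so classified mappings can be counted or grouped by type.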

Semantic Interpretation Model

Example

Semantic Resolvability Classes

Easier

Harder

Example test collection analysis

Test collection X

Has 4 distinct semantic resolvability classes

50% are trivial mappings

23% are lexical mappings

27% are synonymic mappings

100% of the predicates are structure preserving

100% of the mapping cardinalities are 1:1
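
A profile like the one above could be computed by counting the annotated mappings along each dimension. The sketch below assumes each mapping in a collection has already been annotated by hand with its dimension values. The function name `dimension_profile`, the dictionary keys and the toy data are all hypothetical and do not reproduce the figures of test collection X.

```python
from collections import Counter


def dimension_profile(mappings, dimension):
    """Return the percentage share of each value of `dimension`."""
    counts = Counter(m[dimension] for m in mappings)
    total = sum(counts.values())
    return {value: 100.0 * n / total for value, n in counts.items()}


# Toy annotations (invented for illustration only).
toy_collection = [
    {"AP": "trivial", "PS": "preserving", "MC": "1:1"},
    {"AP": "trivial", "PS": "preserving", "MC": "1:1"},
    {"AP": "lexical", "PS": "preserving", "MC": "1:1"},
    {"AP": "synonymic", "PS": "preserving", "MC": "1:1"},
]

for dim in ("AP", "PS", "MC"):
    print(dim, dimension_profile(toy_collection, dim))
```

On this toy data, half of the mappings are trivial, and every mapping is structure preserving with cardinality 1:1.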

Example system evaluation

System Y

Addresses 5 out of 10 semantic resolvability classes

(AP=conceptual, PS=*, MC=1:1, SE=*, M=*, CT=*)

- map = 0.51, recall = 0.7

...
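
A per-class score of this kind could be obtained by selecting the queries whose mapping type matches a pattern with `*` wildcards and scoring only those. The sketch below computes a micro-averaged recall under that assumption. The record layout (`type`, `relevant`, `found`), the function names and the toy numbers are hypothetical, and the slides do not state how System Y's figures were actually aggregated.

```python
def matches(pattern, mapping_type):
    """True if every non-wildcard entry of `pattern` equals the mapping type's value."""
    return all(v == "*" or mapping_type.get(k) == v for k, v in pattern.items())


def recall_for(pattern, records):
    """Micro-averaged recall over the records selected by `pattern`.

    Each record holds the mapping type of a query, the number of relevant
    answers, and how many of them the system returned. Returns None when
    no selected record has relevant answers.
    """
    selected = [r for r in records if matches(pattern, r["type"])]
    relevant = sum(r["relevant"] for r in selected)
    found = sum(r["found"] for r in selected)
    return found / relevant if relevant else None


# Toy per-query results (invented for illustration only).
records = [
    {"type": {"AP": "conceptual", "PS": "preserving", "MC": "1:1"}, "relevant": 4, "found": 3},
    {"type": {"AP": "conceptual", "PS": "difference", "MC": "1:N"}, "relevant": 5, "found": 1},
    {"type": {"AP": "trivial", "PS": "preserving", "MC": "1:1"}, "relevant": 2, "found": 2},
]

pattern = {"AP": "conceptual", "PS": "*", "MC": "1:1"}
print(recall_for(pattern, records))
```

Reporting one score per pattern, rather than a single figure for the whole collection, shows which resolvability classes a system actually handles.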

Summary

NLI/QA systems have semantic matching (schema-agnosticism) at their center

The proposed categorization can be used for a more principled interpretation of the results of NLI/QA systems

... and also of which dimensions evaluation campaigns actually measure

It supports deeper comparative analysis

Future work includes the categorization of the QALD test collection