Wi presentation

Keyword-driven SPARQL Query Generation Leveraging Background Knowledge Authors: Saeedeh Shekarpour, Sören Auer, Axel-Cyrille Ngonga Ngomo, Daniel Gerber, Sebastian Hellmann, Claus Stadler AKSW group Universität Leipzig WI-IAT conference

Transcript of Wi presentation

Page 1: Wi presentation

Keyword-driven SPARQL Query Generation

Leveraging Background Knowledge

Authors: Saeedeh Shekarpour, Sören Auer, Axel-Cyrille Ngonga Ngomo, Daniel Gerber, Sebastian Hellmann, Claus Stadler

AKSW group

Universität Leipzig

WI-IAT conference

Page 2: Wi presentation

Outline

• Motivation

• Entity recognition phase

• SPARQL query generation

• Evaluation

• Conclusion and future work

AKSW group - Universität Leipzig, 24 August 2011

Page 3: Wi presentation

Querying web of documents


Text retrieval


Page 4: Wi presentation

Web of Data


Page 5: Wi presentation

Motivations

Difficulties of SPARQL

• Knowledge about the underlying ontology structure.

• Proficiency in formulating formal queries.

Keyword paradigm

• Successful experience of keyword-based search in document retrieval

• Satisfactory research results about the usability of this paradigm


Page 6: Wi presentation

Bird's-eye view of the envisioned search approach

Page 7: Wi presentation

Overview of the proposed method


Page 8: Wi presentation

Outline

• Motivation

• Entity recognition phase

• SPARQL query generation phase

• Evaluation

• Conclusion and future work

Page 9: Wi presentation

Mapping keywords to IRIs

• The goal is recognition of entities.

• Mapping is based on string similarity.

• This similarity is applied on all types of entities (i.e., classes, properties and instances).

• As a result, for each keyword, we retrieve a list of IRI candidates called anchor points.
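
The mapping step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the label index, the threshold of 0.8 and the use of `difflib.SequenceMatcher` as the string similarity measure are all assumptions, since the slide does not name a specific measure.

```python
from difflib import SequenceMatcher

# Hypothetical label index: IRI -> (label, entity type). Illustrative data only.
LABELS = {
    "http://dbpedia.org/ontology/Island": ("island", "class"),
    "http://dbpedia.org/resource/Germany": ("Germany", "instance"),
    "http://dbpedia.org/ontology/country": ("country", "property"),
}


def similarity(a, b):
    """Case-insensitive similarity ratio in [0, 1] (assumed measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def anchor_points(keyword, threshold=0.8):
    """Return (IRI, similarity) pairs whose label is close enough to the keyword."""
    scored = [(iri, similarity(keyword, label)) for iri, (label, _kind) in LABELS.items()]
    return [(iri, score) for iri, score in scored if score >= threshold]


germany_candidates = anchor_points("germany")
island_candidates = anchor_points("islands")
```

The same lookup runs over classes, properties and instances alike, so each keyword ends up with its own candidate list.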


Page 10: Wi presentation

Ranking and Selecting Anchor Points

• Ranking is based on specificity degree.

• Specificity degree is defined in terms of string similarity and connectivity degree.

• The string similarity score calculates the similarity of the label of u ∈ AP_{K_i} (the anchor points of keyword K_i) to K_i.

• The connectivity degree CD(u) for each u ∈ AP_{K_i} is computed by counting how often u occurs in the triples of the knowledge base.
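
A rough sketch of the connectivity degree computation is given below. The triples are toy data, and counting a term once per triple regardless of its position (subject, predicate or object) is an assumption about how "occurs in the triples" is meant.

```python
from collections import Counter

# Toy triples (subject, predicate, object); illustrative data only.
TRIPLES = [
    ("dbr:Sylt", "rdf:type", "dbo:Island"),
    ("dbr:Sylt", "dbp:country", "dbr:Germany"),
    ("dbr:Vilm", "rdf:type", "dbo:Island"),
    ("dbr:Vilm", "dbp:country", "dbr:Germany"),
    ("dbr:Berlin", "dbp:country", "dbr:Germany"),
]


def connectivity_degrees(triples):
    """Count, for every term, the number of triples in which it occurs."""
    counts = Counter()
    for triple in triples:
        # set() so that a term repeated inside one triple counts once for it
        counts.update(set(triple))
    return counts


CD = connectivity_degrees(TRIPLES)
```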

Page 11: Wi presentation

Ranking and Selecting Anchor Points

• Specificity degree S(u) is defined as a combination of two terms: the label similarity between u and K_i, and log(CD(u)).

• Sorting anchor points corresponding to each keyword based on specificity degree.

• Selecting the top-n IRIs in each sorted anchor points list.
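
The ranking and top-n selection might look like the sketch below. The way the similarity term and log(CD(u)) are combined is passed in as a function, because the exact operator is not legible in this transcript. The sum used in the example call is only one possible choice, and the candidate IRIs and numbers are made up.

```python
import math


def rank_anchor_points(candidates, combine, n=2):
    """Rank candidates by specificity and keep the n best.

    candidates: list of (iri, label_similarity, connectivity_degree), CD >= 1.
    combine: function (similarity, log_cd) -> score; its form is an assumption.
    """
    scored = [(combine(sim, math.log(cd)), iri) for iri, sim, cd in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [iri for _score, iri in scored[:n]]


# Hypothetical candidates for one keyword; values are invented for illustration.
candidates = [
    ("ex:Germany", 1.0, 50000),
    ("ex:Germany_band", 0.7, 40),
    ("ex:Germania", 0.85, 900),
]
top = rank_anchor_points(candidates, combine=lambda sim, log_cd: sim + log_cd, n=2)
```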

Page 12: Wi presentation

Outline

• Motivation

• Entity recognition phase

• SPARQL query generation phase

• Evaluation

• Conclusion and future work

Page 13: Wi presentation

Graph pattern template

• H is a set of placeholders and V is a set of variable identifiers being disjoint from each other and from C ∪ I ∪ P.

• A graph pattern template is defined as:

GPT = {(s, p, o) | s ∈ (V ∪ E), p ∈ (V ∪ E), o ∈ (V ∪ E)}

• After replacing the placeholders in a graph pattern template with the detected IRIs, a graph pattern with triple patterns of the form (V ∪ I) × (V ∪ P) × (V ∪ I ∪ C) is obtained.
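
Placeholder replacement can be illustrated with a small sketch. The representation of a template as a list of string triples and the convention that variables start with "?" are assumptions made for this example.

```python
def instantiate(template, bindings):
    """Replace placeholders in a graph pattern template with detected IRIs.

    Terms starting with '?' are variables and stay untouched. Terms that have
    no binding (such as the keyword 'a' for rdf:type) are kept as they are.
    """
    return [
        tuple(term if term.startswith("?") else bindings.get(term, term) for term in triple)
        for triple in template
    ]


# Two-triple template with a class placeholder c and an instance placeholder o1.
template = [("?s1", "a", "c"), ("?s1", "?p1", "o1")]
pattern = instantiate(template, {"c": "dbo:Island", "o1": "dbr:Germany"})
```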

Page 14: Wi presentation

Categorization of all graph pattern templates

Category                 Possible Patterns   Pattern Schema

Instance-Property (IP)   IP.P1               (s, p, ?o)
                         IP.P2               (?s, p, o)
                         IP.P3               (?s1, ?p1, o1)(?s1, p2, ?o2)
                         IP.P4               (?s1, ?p1, o1)(?o2, p2, ?s1)
                         IP.P5               (s1, ?p1, ?o1)(?s2, p2, ?o1)
                         IP.P6               (s1, ?p1, ?o1)(?o1, p2, ?o2)

Class-Instance (CI)      CI.P7               (?s1, a, c)(?s1, ?p1, o1)
                         CI.P8               (?s1, a, c)(s2, ?p1, ?s1)

Instance-Instance (II)   II.P9               (s, ?p, o)
                         II.P10              (s, ?p1, ?x)(?x, ?p2, o)
                         II.P11              (s1, ?p1, ?x)(s2, ?p2, ?x)
                         II.P12              (?s, ?p1, o1)(?s, ?p2, o2)

Class-Property (CP)      CP.P13              (?s, a, c)(?s, p, ?o)
                         CP.P14              (?s, a, c)(?x, p, ?s)

Property-Property (PP)   PP.P15              (?s, p1, ?x)(?x, p2, ?o)
                         PP.P16              (?s1, p1, ?o)(?s2, p2, ?o)
                         PP.P17              (?s, p1, ?o1)(?s, p2, ?o2)

Page 15: Wi presentation

Appropriate identified graph pattern templates

Category                 Possible Patterns   Pattern Schema

Instance-Property (IP)   IP.P1               (s, p, ?o)
                         IP.P4               (?s1, ?p1, o1)(?o2, p2, ?s1)
                         IP.P6               (s1, ?p1, ?o1)(?o1, p2, ?o2)

Class-Instance (CI)      CI.P7               (?s1, a, c)(?s1, ?p1, o1)
                         CI.P8               (?s1, a, c)(s2, ?p1, ?s1)

Instance-Instance (II)   II.P9               (s, ?p, o)
                         II.P10              (s, ?p1, ?x)(?x, ?p2, o)

Class-Property (CP)      CP.P14              (?s, a, c)(?x, p, ?s)

Property-Property (PP)   -                   -
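
One way to use this table is as a lookup from the types of two detected IRIs to the templates worth trying. The sketch below assumes exactly two keywords and an unordered pair of entity types. The dictionary layout and function name are illustrative, not taken from the slides.

```python
# Appropriate templates keyed by the unordered pair of entity types.
APPROPRIATE = {
    frozenset(["instance", "property"]): ["IP.P1", "IP.P4", "IP.P6"],
    frozenset(["class", "instance"]): ["CI.P7", "CI.P8"],
    frozenset(["instance"]): ["II.P9", "II.P10"],  # instance-instance
    frozenset(["class", "property"]): ["CP.P14"],
    frozenset(["property"]): [],  # no property-property template was kept
}


def templates_for(type_a, type_b):
    """Return the template identifiers to try for two detected entity types."""
    return APPROPRIATE.get(frozenset([type_a, type_b]), [])


ci_templates = templates_for("instance", "class")
```

Restricting generation to these templates is what limits the search space: combinations outside the table produce no candidate query at all.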

Page 16: Wi presentation

Query generation algorithm


Page 17: Wi presentation

Example

Consider two keywords: "Germany" and "island".

User intention: the list of Germany's islands.

After applying mapping and ranking functions on the user keywords, we obtain two identified IRIs, i.e.

1. http://dbpedia.org/ontology/Island with the type class

2. http://dbpedia.org/resource/Germany with the type instance.

The possible graph pattern templates for these two IRIs are:

1. (?island, a, dbo:Island), (?island, ?p, dbr:Germany)

2. (?island, a, dbo:Island), (dbr:Germany, ?p, ?island)


Page 18: Wi presentation

Example

SPARQL queries are:

SELECT * WHERE { ?island a dbo:Island . ?island ?p dbr:Germany . }

SELECT * WHERE { ?island a dbo:Island . dbr:Germany ?p ?island . }
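
Turning a graph pattern into such a query string is mostly string assembly. A minimal sketch, assuming the pattern is a list of (s, p, o) tuples that already use prefixed names, could be:

```python
def to_sparql(pattern):
    """Render a list of (s, p, o) triple patterns as a SELECT * query string."""
    body = " ".join(f"{s} {p} {o} ." for s, p, o in pattern)
    return f"SELECT * WHERE {{ {body} }}"


first = to_sparql([("?island", "a", "dbo:Island"), ("?island", "?p", "dbr:Germany")])
second = to_sparql([("?island", "a", "dbo:Island"), ("dbr:Germany", "?p", "?island")])
```

A real query would also need PREFIX declarations for dbo: and dbr:, which this sketch leaves out.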

Some desired answers to be retrieved are:

dbr:Rettbergsaue a dbo:Island .

dbr:Rettbergsaue dbp:country dbr:Germany .

dbr:Sylt a dbo:Island .

dbr:Sylt dbp:country dbr:Germany .

dbr:Vilm a dbo:Island .

dbr:Vilm dbp:country dbr:Germany .

dbr:Mainau a dbo:Island .

dbr:Mainau dbp:country dbr:Germany .

Page 19: Wi presentation

Online interface

lod-query.aksw.org

Page 20: Wi presentation

Outline

• Introduction

• Entity recognition phase

• SPARQL query generation phase

• Evaluation

• Conclusion and future work

Page 21: Wi presentation

Accuracy metrics

• The user’s intention in keyword-based search is ambiguous.

• Judging the correctness of the retrieved answers is a challenging task.

• Example: Given the keywords France and President.

• The following RDF graphs (i.e. answers) are presented to the user:

1. Nicolas_Sarkozy nationality France .

Nicolas_Sarkozy a President .

2. Felix_Faure birthplace France .

Felix_Faure a President .

3. Yasser_Arafat deathplace France .

Yasser_Arafat a President .

...


Page 22: Wi presentation

Accuracy metrics

• Besides distinguishing between answers related to different interpretations, we differentiate between pure answers (just containing preferred terms) and those which contain some impurity.

• In fact, the correctness of an answer is not a bivalent value.

• We investigate two questions:

1) For how many of the keyword queries do the templates yield answers at all with respect to the original intention?

2) If answers are returned, how correct are they?


Page 23: Wi presentation

Accuracy metrics

• Correctness rate (CR). For an individual answer, we define the correctness rate as the fraction of correct (preferred) RDF terms occurring in it.

• Average CR (ACR). For a given set of answers of a query q, we define the average correctness rate as the arithmetic mean of the CRs of its individual answers.

• Fuzzy precision metric (FP). It measures the overall correctness of the answers corresponding to a set of keyword queries.
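
The first two metrics can be sketched directly from their definitions. Which terms count as preferred is a human judgment. The preferred set and the simplified answers below (subject omitted) are made up for the France and President example. The fuzzy precision formula is not spelled out in this transcript, so it is left out.

```python
def correctness_rate(answer_terms, preferred_terms):
    """Fraction of the RDF terms of one answer that are preferred (correct)."""
    terms = list(answer_terms)
    return sum(term in preferred_terms for term in terms) / len(terms)


def average_cr(answers, preferred_terms):
    """Arithmetic mean of the correctness rates of the answers of one query."""
    rates = [correctness_rate(answer, preferred_terms) for answer in answers]
    return sum(rates) / len(rates)


# Hypothetical judgment: 'deathplace' is not a preferred term for this intention.
preferred = {"France", "President", "a", "nationality"}
pure_answer = ["nationality", "France", "a", "President"]
impure_answer = ["deathplace", "France", "a", "President"]

cr_pure = correctness_rate(pure_answer, preferred)
cr_impure = correctness_rate(impure_answer, preferred)
acr = average_cr([pure_answer, impure_answer], preferred)
```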


Page 24: Wi presentation

Accuracy metrics

• We also measured the recall as the fraction of keyword queries for which answers were found:
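
Read literally, this recall is the number of keyword queries with at least one answer divided by the number of all keyword queries. A small sketch with invented query data:

```python
def recall(answers_by_query):
    """Fraction of keyword queries for which at least one answer was found."""
    answered = sum(1 for answers in answers_by_query.values() if answers)
    return answered / len(answers_by_query)


# Hypothetical evaluation run: three of four keyword queries returned answers.
run = {
    "Germany island": ["dbr:Sylt", "dbr:Vilm"],
    "France President": ["Nicolas_Sarkozy"],
    "query three": ["some answer"],
    "query four": [],
}
r = recall(run)
```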


Page 25: Wi presentation

Accuracy of each categorized graph pattern


Page 26: Wi presentation

Categorization based on the kind of information sought.

1. Finding special characteristics of an instance - IP.P1, IP.P4, IP.P6

2. Finding similar instances - CI.P7, CI.P8, CP.P14

3. Finding associations between instances - II.P9, II.P10

Page 27: Wi presentation

Samples of keywords and results


Page 28: Wi presentation

Accuracy results for different categories

Category                         Recall   Fuzzy precision   F-score

Similar instances                0.700    0.735             0.717

Characteristics of an instance   0.625    0.700             0.660

Associations between instances   0.500    0.710             0.580

General accuracy                 0.625    0.724             0.670
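
The slides do not state how the F-score column is computed. Assuming it is the usual harmonic mean of recall and fuzzy precision, the sketch below recomputes it from the other two columns. Under that assumption the recomputed values agree with the reported ones to within 0.01.

```python
def f_score(recall, precision):
    """Harmonic mean of recall and precision (assumed definition)."""
    return 2 * recall * precision / (recall + precision)


# (recall, fuzzy precision, reported F-score) as listed in the table.
rows = {
    "Similar instances": (0.700, 0.735, 0.717),
    "Characteristics of an instance": (0.625, 0.700, 0.660),
    "Associations between instances": (0.500, 0.710, 0.580),
    "General accuracy": (0.625, 0.724, 0.670),
}
recomputed = {name: f_score(r, p) for name, (r, p, _reported) in rows.items()}
```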


Page 29: Wi presentation

Outline

• Introduction

• Entity recognition phase

• SPARQL query generation

• Evaluation

• Conclusion and future work

Page 30: Wi presentation

Conclusion and future work

• Analysis of graph patterns for limiting search space.

• We did not separate ontology level and knowledge base level for generating graph patterns.

We aim to:

1. Allow a larger number of keywords.

2. Make more extensive use of linguistic features and techniques.

3. Enable users to refine obtained queries and to add additional constraints.

4. Apply this work to large-scale datasets of the Data Web.

Page 31: Wi presentation

Thank you for your attention. Thanks to my colleagues from the AKSW research group. Any questions?