ANGIE in wonderland


ANGIE in Wonderland

Nicoleta Preda

Motivating example

Long term goal: new intelligent applications such as

Applications that automatically compute vacation plans

Example:

• I would like to travel for 3 weeks in South America

• Visit UNESCO sites

• Old palaces

Automatic computation of vacation plans

• Personal Calendar: Web Services API

• Traveling Related Books: Web Services API

• Flights: Web Services API

• Countries, Cities, Airports: Web Services API

Web Service APIs available on the Web

ProgrammableWeb.com counts >12000 APIs from various domains:

• Search (3200 APIs)

• Social (3000 APIs)

• Traveling (1200 APIs)

• Music (1000 APIs)

• Financial (1200 APIs), Science (600 APIs), Weather (300 APIs)


Query examples

• Places in Peru listed as UNESCO heritage

• Books written by South American Nobel Prize Winners

• Memorial houses of Brazilian Kings


Our research

• Query Evaluation using Web Service APIs

• Mapping Web Services to Knowledge Bases

[Overview diagram: SUSIE combines Web Services and the WWW; ANGIE combines Web Services and a KB; DORIS connects Web Services and a Knowledge Base.]

ANGIE: Web Services and KB

Problem Description

Given a query Q against

• a knowledge base (KB)

• a set of Web services F

• a bound Max for the number of Web service calls

compute answers for Q using at most Max calls
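The role of the bound Max can be illustrated with a minimal sketch. It only shows the budget constraint from the problem statement, not how ANGIE chooses calls; the names `evaluate_with_budget`, `calls` and `birthplace_of_pedro_ii` are hypothetical, and each Web service call is simulated by a Python function that returns a set of facts.

```python
def evaluate_with_budget(calls, answers_from, max_calls):
    """Execute candidate calls in order, never exceeding max_calls.

    calls: list of zero-argument callables, each simulating one Web service
           call and returning a set of (subject, relation, object) facts.
    answers_from: function that extracts the query answers from the facts.
    """
    facts = set()
    used = 0
    for call in calls:
        if used == max_calls:
            break  # budget exhausted: stop issuing calls
        facts |= call()
        used += 1
    return answers_from(facts), used


# Simulated calls (assumption: real calls would go over the network).
calls = [
    lambda: {("Pedro I of Brazil", "hasChild", "Pedro II of Brazil")},
    lambda: {("Pedro II of Brazil", "birthplace",
              "Palace of São Cristóvão, Rio de Janeiro")},
    lambda: {("Queen Victoria of the UK", "birthplace",
              "Kensington Palace, London")},
]


def birthplace_of_pedro_ii(facts):
    # The query: birthplace(Pedro II of Brazil, ?place)
    return {o for s, r, o in facts
            if s == "Pedro II of Brazil" and r == "birthplace"}


answers, used = evaluate_with_budget(calls, birthplace_of_pedro_ii, max_calls=2)
print(answers, used)
```

With a budget of 2, the third call is never issued, so the order in which calls are tried decides whether the budget is enough.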


Representing functions of Web Service APIs

A function is a named parameterized conjunctive query where

• Inputs must be bound to entities before the call execution

• Outputs are bound as the result of the call

• Relations are from a global schema (knowledge base schema)

[Figure: graph pattern of getChildren. Inputs: parent, p_place (linked by birthplace). Outputs: ?child (linked to parent by hasChild), ?c_place (linked to ?child by birthplace).]

getChildren(parent, p_place, ?child, ?c_place) :-

birthplace(parent, p_place),

hasChild(parent, ?child),

birthplace(?child, ?c_place)
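One way to picture such a function definition is as a small data structure. The sketch below is an illustration under assumptions, not ANGIE's implementation: the class name `WebFunction` and the method `can_call` are hypothetical, and atoms are plain (relation, subject, object) tuples.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# An atom over the global (KB) schema: (relation, subject, object).
Atom = Tuple[str, str, str]


@dataclass(frozen=True)
class WebFunction:
    name: str
    inputs: Tuple[str, ...]   # must be bound to entities before the call
    outputs: Tuple[str, ...]  # bound by the result of the call
    body: Tuple[Atom, ...]    # conjunctive query over KB relations

    def can_call(self, bindings: Dict[str, str]) -> bool:
        """A call is executable only if every input parameter is bound."""
        return all(param in bindings for param in self.inputs)


get_children = WebFunction(
    name="getChildren",
    inputs=("parent", "p_place"),
    outputs=("?child", "?c_place"),
    body=(
        ("birthplace", "parent", "p_place"),
        ("hasChild", "parent", "?child"),
        ("birthplace", "?child", "?c_place"),
    ),
)

# p_place is missing, so the call cannot be issued yet.
print(get_children.can_call({"parent": "Pedro I of Brazil"}))
# Both inputs bound: the call can be issued.
print(get_children.can_call({"parent": "Juan VI of Portugal",
                             "p_place": "Ajuda, Lisbon"}))
```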

Query example

Function: getChildren(parent, p_place, ?child, ?c_place)

Query: birthplace(Pedro II of Brazil, ?place)

Baseline Solution (aiming at completeness)

[Figure: chains of getChildren calls along birthplace and hasChild edges. Entities shown: Isabella of Austria (birthplace: Brussels), Queen Victoria of the UK (birthplace: Kensington Palace, London), Pedro II of Brazil (birthplace: Palace of São Cristóvão, Rio de Janeiro). One call is marked X.]

But I only have a small budget of calls!

ANGIE Algorithm: the bang for the buck

[Figure: the query birthplace(Pedro II of Brazil, ?place) is matched against the getChildren pattern (parent, p_place, birthplace, hasChild, ?child, ?c_place). The hasChild edges link Juan VI of Portugal, Pedro I of Brazil and Pedro II of Brazil. Places shown: Ajuda, Lisbon; Queluz Palace, Lisbon; Palace of São Cristóvão, Rio de Janeiro.]

Property

For a pipeline of calls:

W_1 < W_2 < … < W_i < … < W_n < Q

where the inputs are extracted using the local queries

Q_1^KB, Q_2^KB, …, Q_i^KB, …, Q_n^KB

If the knowledge base has answers for Q_i^KB, then

execute only W_i, …, W_n
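The property can be read as a rule for skipping the front of a pipeline. The sketch below is one possible reading, with two labeled assumptions: when several local queries have answers, it starts from the latest one, and when none has answers it executes nothing. The names `calls_to_execute`, `pipeline` and `answered` are hypothetical.

```python
def calls_to_execute(pipeline, kb_has_answers):
    """Return the suffix of the pipeline that still has to be executed.

    pipeline: list of (call, local_query) pairs in execution order,
              where local_query extracts the inputs of that call from the KB.
    kb_has_answers: predicate telling whether the KB answers a local query.
    """
    start = None
    for i, (_, local_query) in enumerate(pipeline):
        if kb_has_answers(local_query):
            start = i  # assumption: prefer the latest answered local query
    if start is None:
        return []  # assumption: no inputs available, nothing can be called
    return [call for call, _ in pipeline[start:]]


pipeline = [("W1", "Q1_KB"), ("W2", "Q2_KB"), ("W3", "Q3_KB")]
answered = {"Q1_KB", "Q2_KB"}  # local queries the KB can already answer

remaining = calls_to_execute(pipeline, answered.__contains__)
print(remaining)
```

Here the KB already holds the inputs of W2, so W1 does not need to be executed.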

Web call composition graph

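[Figure: call composition graph over YAGO for the query birthplace(Pedro II of Brazil, ?place). Function patterns: getPersonId (person, hasId, ?id) and getInfoByPersonId (personid, hasId, birthplace, ?place). Calls shown: GetChildren(Juan VI of Portugal, Ajuda), GetChildren(Pedro I of Brazil), GetPersonId(Pedro II of Brazil), GetInfoByPersonId(id_Pedro-II).]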

[Plot: Books of French Nobel Prize winners. Axes: number of answers and number of calls. Series: DF, F-RDF, F-RDF-R, ANGIE, ANGIE-cost.]

Experiments

50 real Web services from 3 domains:

• Music

• Books

• Movies

ANGIE: Active Knowledge & Interaction Exploration

Query Mediator

• Dynamically computes the Web calls that answer the query

RDF Warehouse

• The local KB stores the results of all executed Web calls

• Stored call results may speed up the evaluation of related queries

Active Knowledge: Dynamically Enriching RDF Knowledge Bases by Web Services. With F. M. Suchanek, G. Kasneci, T. Neumann, W. Yuan, G. Weikum, SIGMOD 2010
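The warehouse idea can be sketched as a local triple store that remembers which calls were already executed. This is a toy illustration, not the system's storage layer: the class `Warehouse`, its fields, and the stand-in service `fake_get_children` are all hypothetical.

```python
class Warehouse:
    """Local KB that stores the results of all executed Web calls."""

    def __init__(self):
        self.triples = set()    # (subject, relation, object)
        self.executed = set()   # (function name, inputs) already called
        self.calls_made = 0

    def call(self, name, inputs, service):
        """Execute the call only if its result is not stored yet."""
        key = (name, tuple(inputs))
        if key not in self.executed:
            self.calls_made += 1
            self.triples.update(service(*inputs))
            self.executed.add(key)
        return self.triples


def fake_get_children(parent, p_place):
    # Stand-in for a remote getChildren call (assumption: fixed answer).
    return {(parent, "hasChild", "Pedro II of Brazil")}


wh = Warehouse()
wh.call("getChildren", ["Pedro I of Brazil", "Queluz Palace, Lisbon"], fake_get_children)
wh.call("getChildren", ["Pedro I of Brazil", "Queluz Palace, Lisbon"], fake_get_children)
print(wh.calls_made)
```

The second, identical call is answered from the stored result, which is how a related query can reuse earlier work without spending budget.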

SUSIE: Web Services and WWW

Problem: Asymmetric accesses

• Consider a source publishing only the Web service:

getLeaderInfo(leader, type, country)

• And the queries:

Q1: getLeaderInfo(Pablo II, ?, ?): easy

Q2: getLeaderInfo(?, ?, Brazil): impossible

Q3: getLeaderInfo(?, king, Brazil): impossible

Trying every entry of a DB of leaders as input: 1 million calls and two will succeed
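Whether a query can be sent directly to the service depends only on which argument positions are bound. The sketch below assumes, as the example suggests, that `leader` is the only input position of getLeaderInfo; the names `answerable`, `LEADER_INFO_INPUTS` and `status` are hypothetical.

```python
def answerable(input_positions, query_args):
    """True if every input position of the service is bound in the query."""
    return all(query_args[i] != "?" for i in input_positions)


# Assumption: getLeaderInfo(leader, type, country) accepts only `leader` as input.
LEADER_INFO_INPUTS = (0,)

queries = {
    "Q1": ("Pablo II", "?", "?"),
    "Q2": ("?", "?", "Brazil"),
    "Q3": ("?", "king", "Brazil"),
}

status = {name: answerable(LEADER_INFO_INPUTS, args)
          for name, args in queries.items()}
print(status)
```

Q2 and Q3 bind only output positions, so the service cannot be called for them without first guessing a leader.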

Our Approach: Use the Web as an Oracle

Example: implement “get head by country and type”

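[Figure: oracle pipeline]

1. The inputs (King, Brazil) become the keyword query “King of Brazil?”

2. Information Extraction (IE) over the returned HTML yields the candidates Lula, Pedro I, Pedro II

3. Each candidate is checked with getLeaderInfo: Pedro I (King, Brazil), Pedro II (King, Brazil), Lula (President, Brazil), rejected (X)

3 calls and 2 will succeed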

Model oracles as functions

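[Figure: inputs [country, head-type] form the keyword query “[type] of [country]”; Information Extraction (IE) over the returned HTML produces the [outputs (verified by WS)].]

oracleGetCandidates(person, type, country)

[Figure: graph pattern of oracleGetCandidates with edges headOf and type over ?person, country and type.]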
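The oracle can be sketched as a function that builds the keyword query, extracts candidate names from the returned text, and keeps only the candidates confirmed by the Web service. Everything below is a toy under assumptions: `search` stands in for a Web search, `verify` stands in for getLeaderInfo, and the regular expression is a deliberately naive substitute for real information extraction.

```python
import re


def oracle_get_candidates(type_, country, search, verify):
    """Return candidates for (type_, country) that the Web service confirms.

    search: callable, keyword query -> page text (hypothetical).
    verify: callable, person -> (type, country) or None (stands in for getLeaderInfo).
    """
    page = search(f"{type_} of {country}")
    # Naive IE: a capitalised word, optionally followed by a roman numeral.
    found = re.findall(r"\b[A-Z][a-z]+(?: [IVX]+)?\b", page)
    candidates = list(dict.fromkeys(found))  # drop duplicates, keep order
    return [c for c in candidates if verify(c) == (type_, country)]


# Simulated Web page and simulated getLeaderInfo service.
PAGE = "Pedro I and Pedro II reigned; later Lula was elected."
LEADERS = {
    "Pedro I": ("King", "Brazil"),
    "Pedro II": ("King", "Brazil"),
    "Lula": ("President", "Brazil"),
}
verified_calls = []


def get_leader_info(person):
    verified_calls.append(person)  # count the service calls
    return LEADERS.get(person)


kings = oracle_get_candidates("King", "Brazil", lambda q: PAGE, get_leader_info)
print(kings, len(verified_calls))
```

The extraction step may be noisy; the verification call is what keeps the final outputs correct, at the cost of one call per candidate.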

Are inefficient plans automatically avoided?

New Query

[Figure: query graph with edges headOf (to Brazil), type (King) and inauguration (?inauguration) on the unknown leader.]

Available functions:

getInaugurationDay(leader, date)

oracleGetCandidates(leader, type, country)

[Figure: oracleGetCandidates returns Pedro I of Brazil; getInaugurationDay then returns 10 March 1826.]

Consider the additional Web services

getCurrentLeader(country, leader)

[Figure: graph pattern with a headOf edge between leader and country.]

getPredecessor(leader, pLeader, pType, pDate, pCountry)

[Figure: graph pattern with a predecessor edge from leader, and headOf, type and inauguration (date) edges on the predecessor.]

Relevant but inefficient results

getCurrentLeader(Brazil, leader1, type1, date1)

getPredecessor(leader1, leader2, type2, date2, country2)

getPredecessor(leader2, leader3, type3, date3, country3)

getPredecessor(leader3, leader, type4, date4, country4)

[Figure: query graph with edges headOf (Brazil), type (King) and inauguration.]

Smart calls vs. relevant but “guess” plans

[Figure: query graph with edges headOf (Brazil), type (King) and inauguration.]

“Guess” plan: getCurrentLeader(Brazil), then getPredecessor(leader) along the predecessor edges

Smart plan: oracleGetCandidates(Brazil, King), then getInaugurationDay(leader)

Smart calls

Given a call W_i that belongs to a plan W_1, …, W_i, …, W_n, we say W_i is a smart call if its consequences are:

• either included in the union of the consequences of the previous functions W_{i-1}, …, W_1

• or atoms of the query

Property:

If a plan consists of only smart calls, and if every call has results, then the plan will deliver an answer for the query.
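The definition can be turned into a simple check over a plan. The sketch below is a simplified reading, with loud assumptions: consequences and query atoms are compared by plain syntactic equality of (relation, subject, object) tuples, and the consequence sets given for each function are illustrative. The names `smart_flags`, `QUERY`, `smart_plan` and `guess_plan` are hypothetical.

```python
def smart_flags(plan, query_atoms):
    """For each call of the plan, tell whether it is a smart call.

    plan: list of (call name, set of consequence atoms) in execution order.
    A call is smart if each of its consequences is a query atom or is
    already among the consequences of the previous calls.
    """
    seen = set()
    flags = []
    for name, consequences in plan:
        smart = all(a in seen or a in query_atoms for a in consequences)
        flags.append((name, smart))
        seen |= consequences
    return flags


# Query: inauguration date of a King of Brazil.
QUERY = {
    ("headOf", "?l", "Brazil"),
    ("type", "?l", "King"),
    ("inauguration", "?l", "?d"),
}

smart_plan = [
    ("oracleGetCandidates", {("headOf", "?l", "Brazil"), ("type", "?l", "King")}),
    ("getInaugurationDay", {("inauguration", "?l", "?d")}),
]

guess_plan = [
    ("getCurrentLeader", {("headOf", "?l1", "Brazil")}),
    ("getPredecessor", {("predecessor", "?l1", "?l"), ("headOf", "?l", "?c")}),
]

print(smart_flags(smart_plan, QUERY))
print(smart_flags(guess_plan, QUERY))
```

In the guess plan, nothing guarantees that the predecessor found is a king of Brazil, which is why its consequences do not line up with the query atoms.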

Experiments

50 Web services from three domains:

• Books
  • isbndb.org
  • librarything.com
  • abebooks.com

• Movies
  • internetvideoarchive.com (IVA)

• Music
  • musicbrainz.org
  • last.fm
  • discogs.com
  • lyricWiki.org

Evaluation results

Get prize winners                               TD   ANGIE   SUSIE
Nobel Prize in Literature                        0       0      14
Golden Pen Award                                 0       0      11
Franz Kafka Prize                                0       0       5
American Book Medal                              0       0      16
Jerusalem Prize                                  0       0      11

Get books of winners of prize                   TD   ANGIE   SUSIE
Nobel Prize Literature                           0       0     198
Golden Pen Award                                 0       0     228
Franz Kafka Prize                                0       0     132
Jerusalem Prize                                  0       0     220

Get books of winners by prize and country       TD   ANGIE   SUSIE
Nobel Prize Literature, France                   0       0     144
Franz Kafka Prize, UK                            0       0      79

Related Work: Answering Queries using Views

• Maximal contained rewritings (MCR)
  • Plans computing the largest number of answers

• Approaches based on reducing the number of irrelevant calls
  • Benedikt et al., PODS 2011, VLDB 2012
  • S. Kambhampati, JIIC 2004

• SUSIE does not target maximal contained rewritings
  • Relevant calls for MCR include all calls that might return results

• Smart calls are a subset of relevant calls.

SUSIE

• Addressed the problem of asymmetric accesses

• A novel approach to answer such queries where the inputs for the Web service call are extracted on the fly, from the Web

• New evaluation algorithm that prioritizes smart calls

• An experimental evaluation using a representative set of queries and real data sources

SUSIE: Search Using Services and Information Extraction. With F. M. Suchanek, W. Yuan, G. Weikum, ICDE 2013

Ongoing work

Given a query Q and a set of functions F, compute all smart plans (those for which it can be proven that they return answers)

DORIS: Web Services and Knowledge Base

Web Service API

• Web Services for applications ≅ Web forms for humans

• An API = a collection of Web services

• A Web Service
  • expects bindings for input parameters
  • returns structured data: XML or JSON

<geonames>
  <country>
    <ccode>AR</ccode>
    <cname>Argentina</cname>
    <isonumeric>032</isonumeric>
    <fipscode>ARG</fipscode>
    <continent>SA</continent>
    <continentName>South America</continentName>
    <capital>Buenos Aires</capital>
    <cities>
      <city>
        <name>Buenos Aires</name>

Goals

For every Web service:

1) Compute a parameterized query (relations are from the KB)

2) Compute an XSLT transformation script to be applied to every call result: it turns the XML result into results for the parameterized query

1) Parameterized query for getCountryByName

getCountryByName(country, name, time-zone, capital, type, lat, lng, city, c_lat, c_lng)

[Figure: query graph rooted at country. Edges from country: label (name), hasTimeZone (time-zone), hasCapital (capital), type (type), latitude (lat), longitude (lng), hasCity (city). Edges from each city: label, latitude (c_lat), longitude (c_lng).]

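[Figure: XML tree of the call result for getCountryByName(Argentina), with nodes r, a to l. Text nodes: “Argentina”, “ARS”, “Buenos Aires”, “Republic”, “-34”, “-64”; city subtrees (“Buenos Aires”, “-34”, “-64”) and (“Córdoba”, “-31.40833”, “-64.18388”).]

[Figure: XML tree of the call result for Romania, same structure. Text nodes: “Romania”, “GMT+2”, “Bucharest”, “Republic”, “44.4”, “26.1”; city subtrees (“Bucharest”, “44.4”, “26.1”) and (“Rm Valcea”, “45.1”, “24”).]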

2) An XSLT transformation for all call results

getCountryByName(Romania, GMT+2, Bucharest, Republic, 44.4, 26.1, Bucharest, 44.4, 26.1)

getCountryByName(Romania, GMT+2, Bucharest, Republic, 44.4, 26.1, Rm Valcea, 45.1, 24)
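The step from one XML call result to the tuples above can be sketched in Python with the standard `xml.etree.ElementTree` module. This is a stand-in for the XSLT script named on the slide, not the script itself. The element names `timezone`, `type`, `lat` and `lng` are assumptions made for the example (only `cname`, `capital`, `cities`, `city` and `name` appear in the sample result), and `to_tuples` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical call result for Romania; several tag names are assumed.
RESULT = """
<geonames><country>
  <cname>Romania</cname><timezone>GMT+2</timezone><capital>Bucharest</capital>
  <type>Republic</type><lat>44.4</lat><lng>26.1</lng>
  <cities>
    <city><name>Bucharest</name><lat>44.4</lat><lng>26.1</lng></city>
    <city><name>Rm Valcea</name><lat>45.1</lat><lng>24</lng></city>
  </cities>
</country></geonames>
"""


def to_tuples(xml_text):
    """Flatten one call result into one tuple per city."""
    country = ET.fromstring(xml_text.strip()).find("country")
    # Country-level fields: findtext looks at direct children only.
    head = tuple(country.findtext(tag)
                 for tag in ("cname", "timezone", "capital", "type", "lat", "lng"))
    # One output tuple per nested <city> element.
    return [head + tuple(city.findtext(tag) for tag in ("name", "lat", "lng"))
            for city in country.iter("city")]


for row in to_tuples(RESULT):
    print(row)
```

The nested city list is what turns a single XML document into several result tuples that repeat the country-level values.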

General Challenges

• Heterogeneity: every Web service has its own schema for outputs

• Schemas are unknown
  • >85% of Web services are implemented using REST
  • REST Web services do not expose schema descriptions

Our approach: use the overlapping between Web services & Knowledge Bases

Intuition

[Figure: the XML tree of the call result for Argentina (root r; text nodes “Argentina”, “ARS”, “Buenos Aires”, “Republic”, “-34”, “-64”, “Córdoba”, “-31.40833”) next to the KB fragment: URI1 label “Argentina”, URI1 hasCapital URI2, URI2 label “Buenos Aires”. The root r is aligned with URI1.]

Three steps algorithm

1) Align root-to-text-node paths with paths that start from the input entity in the KB

2) Compute class and relation alignment candidates satisfying functional constraints

3) For each candidate, compute transformation functions and check inclusion and equivalence for the non-functional relations

Observation:

The first 2 steps alone lead to a precision/recall of around 90%
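Step 1 can be illustrated with a small sketch that matches text nodes of a call result against literals reachable from the input entity in the KB. It is a toy under assumptions, not the DORIS algorithm: matching is exact string equality, the KB is a dictionary from entity to (relation, object) pairs, and the names `text_paths`, `kb_paths` and `align` are hypothetical.

```python
import xml.etree.ElementTree as ET


def text_paths(elem, prefix=""):
    """Yield (root-to-text-node path, text) for every non-empty text node."""
    path = f"{prefix}/{elem.tag}"
    if elem.text and elem.text.strip():
        yield path, elem.text.strip()
    for child in elem:
        yield from text_paths(child, path)


def kb_paths(kb, entity, max_len=2):
    """Yield (relation path, literal) reachable from entity within max_len hops."""
    frontier = [((), entity)]
    for _ in range(max_len):
        next_frontier = []
        for path, node in frontier:
            for rel, obj in kb.get(node, []):
                if obj in kb:   # obj is an entity: keep walking
                    next_frontier.append((path + (rel,), obj))
                else:           # obj is a literal: candidate for alignment
                    yield path + (rel,), obj
        frontier = next_frontier


def align(xml_text, kb, entity):
    """Map each XML path to the KB paths that end in the same literal."""
    by_value = {}
    for path, literal in kb_paths(kb, entity):
        by_value.setdefault(literal, []).append(path)
    root = ET.fromstring(xml_text)
    return {p: by_value[t] for p, t in text_paths(root) if t in by_value}


KB = {
    "URI1": [("label", "Argentina"), ("hasCapital", "URI2")],
    "URI2": [("label", "Buenos Aires")],
}
XML = ("<country><cname>Argentina</cname>"
       "<capital>Buenos Aires</capital><ccode>AR</ccode></country>")

print(align(XML, KB, "URI1"))
```

The `capital` element is aligned with the two-hop path hasCapital then label, which shows why text nodes are matched against KB paths rather than single relations.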


DORIS: Some experimental results

More than 50 Web services from 4 domains:

• Books

• Movies

• Music

• Geo data

KB         Precision               Recall
           Classes   Relations     Classes   Relations
YAGO       0.92      0.91          0.96      0.93
DBpedia    0.89      0.88          0.98      0.95
BNF        1         1             1         1

Summary

• Addressed the problem of inferring views

• An instance based approach to the schema matching problem

• An experimental evaluation using real Web sources

DORIS: Discovering ontological relations in sources. With Mary Koutraki, Dan Vodislav, in preparation

getCountryByName(country, name, time-zone, capital, type, lat, lng, city, c_lat, c_lng)

[Figure: query graph rooted at country. Edges from country: label (name), hasTimeZone (time-zone), hasCapital (capital), type (type), latitude (lat), longitude (lng), hasCity (city). Edges from each city: label, latitude (c_lat), longitude (c_lng).]

<geonames>
  <country>
    <ccode>AR</ccode>
    <cname>Argentina</cname>
    <isonumeric>032</isonumeric>
    <fipscode>ARG</fipscode>
    <continent>SA</continent>
    <continentName>South America</continentName>
    <capital>Buenos Aires</capital>
    <areaInSqKM> <areaInSqKM>

Our work

• Query Evaluation using Web Service APIs

• Mapping Web Services to Knowledge Bases

[Overview diagram: SUSIE combines Web Services and the WWW; ANGIE combines Web Services and a KB; DORIS connects Web Services and a Knowledge Base.]

Same plan as a graph

[Figure: the plan drawn as a graph for the query (type King, Brazil).]

• getCurrentHeadOfState(Brazil): Dilma Rousseff, President, 1 January 2011

• getPredecessor(Dilma Rousseff): Lula da Silva, President, Brazil, 1 January 2003

• getPredecessor(Lula da Silva): Henrique Cardoso, President, Brazil, 1 January 1995

IE: Authors who won prize X

Prize             Precision   Recall
National Book     38%         59%
Phoenix           62%         44%
Jerusalem         23%         52%
Pulitzer          78%         79%
Franz Kafka       25%         73%
Prix Femina       31%         13%
Prix Decembre     28%         6%
Nobel Prize       41%         29%
Golden Pen        25%         73%

Challenges of an instance-based approach

• XML elements do not correspond to entities in KB

• Entities in KB are URIs and are not to be found in call results

• What is an entity in the XML call result?

• Spurious matches (Argentina is a capital and also a person)


Idea: align properties expressed as text or literals first