IR Models based on Predicate Logic · IR and Databases The Logic View Retrieval IDB: given query q,...
Transcript of IR Models based on Predicate Logic · IR and Databases The Logic View Retrieval IDB: given query q,...
IR Models based on Predicate Logic
Norbert Fuhr
1 / 97
The logical view on IR
IR as InferenceIR as uncertain inferencePropositional vs. Predicate Logic
The logical view on IRIR as inference
q - queryd – documentretrieval:search for documents which imply the query: d → q
Example: classical IR:d = {t1, t2, t3}q = {t1, t3}retrieval: q ⊂ d ?
logical view:d = t1 ∧ t2 ∧ t3
q = t1 ∧ t3
retrieval: d → q ?
3 / 97
advantage of inference-based approach:step from term-based to knowledge-based retrievale.g. easy incorporation of additional knowledgeexample:d : ’squares’q: ’rectangles’thesaurus: ’squares’ → ’rectangles’⇒: d → q
4 / 97
IR as uncertain inference
d : ’quadrangles’q: ’rectangles’⇒ uncertain knowledge required
’quadrangles’0.3→ ’rectangles’
[Rijsbergen 86]:IR as uncertain inferenceRetrieval =estimate probability P(d → q) = P(q|d)
t1 t4
t2 t5
t3 t6d
q
5 / 97
Limitations of propositional logic:
conventional indexing (based on propositional logic): d = {tree,house}query: Is there a picture with a tree on the left of the house?⇒ query cannot be expressed in propositional logic
predicate logic:
d: tree(t1). house(h1). left(h1,t1).
?- tree(X) & house(Y) & left(X,Y).
6 / 97
Relational Structures: Datalog
Datalog program: finite set of rules each expressing a conjunctivequery
t(X1, ...,Xk) : −r1(U11, . . . ,U1m1), . . . , rn(Un1, . . . ,Unmn)
where each variable Xi occurs in the body of the rule (this way,every rule is safe). This corresponds to the logical formula
∀X1 . . . ∀Xk∀U11 . . . ∀Unmn
t(X1, ...,Xk) ∧ ¬r1(U11, . . . ,U1m1) ∧ . . . ∧ ¬rn(Un1, . . . ,Unmn)
t(X1, ..., xk) is called the head andr1(U11, . . . ,U1m1), ..., rn(Un1, . . . ,Unmn) the body.
A formula without a head is also called a fact
7 / 97
Datalog Properties
I horn predicate logic
I no functions
I restricted forms of negation allowed
t(X1, ...,Xk) : −r1(U11, . . . ,U1m1), ...,¬rn(Un1, . . . ,Unmn)
I rules may be recursive (head predicate may occur in the body)
r(X ,Y ) : −l(X ,Z ), r(Z ,Y )
I sound and complete evaluation algorithms
8 / 97
IR and DatabasesThe Logic View
Retrieval
I DB: given query q, find objects o with o → q
I IR: given query q, find documents d with high values ofP(d → q)
I DB is a special case of IR!(in a certain sense)
This section: Focusing on the logic view
I Inference
I Vague predicates
I Query language expressiveness
9 / 97
Inference
IR with the Relational ModelThe Probabilistic Relational ModelInterpretation of probabilistic weightsExtensions
Disjoint eventsRelational BayesProbabilistic rules
Relational ModelProjection
indexDOCNO TERM
1 ir1 db2 ir3 db3 oop4 ir4 ai5 db5 oop
topic
irdboopai
Projection: what is the collection about?topic(T) :- index(D,T).
11 / 97
Relational ModelSelection
indexDOCNO TERM
1 ir1 db2 ir3 db3 oop4 ir4 ai5 db5 oop
aboutir
124
Selection: which documents are about IR?aboutir(D) :- index(D,ir).
12 / 97
Relational ModelJoin
indexDOCNO TERM
1 ir1 db2 ir3 db3 oop4 ir4 ai5 db5 oop
authorDOCNO NAME
1 smith2 miller3 johnson4 firefly4 bradford5 bates
irauthor
smithmillerfireflybradford
Join: who writes about IR?irauthor(A):- index(D,ir) & author(D,A).
13 / 97
Relational ModelUnion
indexDOCNO TERM
1 ir1 db2 ir3 db3 oop4 ir4 ai5 db5 oop
irordb
12345
Union: which documents are about IR or DB?irordb(D) :- index(D,ir).
irordb(D) :- index(D,db).
14 / 97
Relational ModelDifference
indexDOCNO TERM
1 ir1 db2 ir3 db3 oop4 ir4 ai5 db5 oop
irnotdb
24
Difference: which documents are about IR, but not DB?irnotdb(D) :- index(D,ir) & not(index(D,db)).
15 / 97
The Probabilistic Relational Model
[Fuhr & Roelleke 97] [Suciu et al 11]
indexβ DOCNO TERM
0.8 1 IR0.7 1 DB0.6 2 IR0.5 3 DB0.8 3 OOP0.9 4 IR0.4 4 AI0.8 5 DB0.3 5 OOP
aboutdb0.7 10.5 30.8 5
aboutirdb0.8*0.7 1
Which documents are about DB?aboutdb(D) :- index(D,db).
Which documents are about IR and DB?aboutirdb(D) :- index(D,ir) & index(D,db).
16 / 97
Extensional vs. intensional semanticsdoctermβ DOC TERM
0.9 d1 ir0.5 d1 db
linkβ S T
0.7 d2 d1
about(D,T) :- docTerm(D,T).
about(D,T) :- link(D,D1) & about(D1,T)
q(D) :- about(D,ir) & about(D,db).
Extensional semantics:weight of derived fact as function of weights of subgoalsP(q(d2)) = P(about(d2,ir)) · P(about(d2,db)) =
(0.7 · 0.9) · (0.7 · 0.5)
Problem“improper treatment of correlated sources of evidence” [Pearl 88]→ extensional semantics only correct for tree-shaped inferencestructures
17 / 97
Intensional semantics
weight of derived fact as function of weights of underlying groundfacts
Method: Event keys and event expressions
doctermβ κ DOC TERM
0.9 dT(d1,ir) d1 ir0.5 dT(d1,db) d1 db
linkβ κ S T
0.7 l(d2,d1) d2 d1
?- docTerm(D,ir) & docTerm(D,db).
givesd1 [dT(d1,ir) & dT(d1,db)] 0.9 · 0.5 = 0.45
18 / 97
Event keys and event expressions
doctermβ κ DOC TERM
0.9 dT(d1,ir) d1 ir0.5 dT(d1,db) d1 db
linkβ κ S T
0.7 l(d2,d1) d2 d1
about(D,T) :- docTerm(D,T).
about(D,T) :- link(D,D1) & about(D1,T)
?- about(D,ir) & about(D,db).
givesd1 [dT(d1,ir) & dT(d1,db)] 0.9 · 0.5 = 0.45d2 [l(d2,d1) & dT(d1,ir) & l(d2,d1) & dT(d1,db)]
0.7 · 0.9 · 0.5 = 0.315
19 / 97
Recursion
about(D,T) :- docTerm(D,T).
about(D,T) :- link(D,D1) & about(D1,T).
d3
docterm
linkd1 d2
0.5
0.40.8
0.9
0.5
ir
db
?- about(D,ir)
d1 [dT(d1,ir) | l(d1,d2) & l(d2,d3) & l(d3,d1) &
dT(d1,ir) | ...] 0.900d3 [l(d3,d1) & dT(d1,ir)] 0.720d2 [l(d2,d3) & l(d3,d1) & dT(d1,ir)] 0.288
?- about(D,ir) & about(D,db)
d1 [dT(d1,ir) & dT(d1,db)] 0.450d3 [l(d3,d1) & dT(d1,ir) & l(d3,d1) & dT(d1,db)] 0.360
d2 [l(d2,d3) & l(d3,d1) & dT(d1,ir) & dT(d1,db)] 0.14420 / 97
Computation of probabilities for event expressions
1. transformation of expression into disjunctive normal form
2. application of sieve formula:I simple case of 2 conjuncts: P(a∨ b) = P(a) + P(b)−P(a∧ b)I general case:
ci – conjunct of event keys
P(c1 ∨ . . . ∨ cn) =n∑
i=1
(−1)i−1∑
1≤j1<...<ji≤n
P(cj1 ∧ . . . ∧ cji ).
I exponential complexity
I use only when necessary for correctness
I see [Dalvi & Suciu 07]
21 / 97
Possible worlds semantics
0.9 docTerm(d1,ir).
P(W1) = 0.9: {docTerm(d1,ir)}P(W2) = 0.1: {}
22 / 97
0.6 docTerm(d1,ir). 0.5 docTerm(d1,db).
Possible interpretations:
I1: P(W1) = 0.3: {docTerm(d1,ir)}P(W2) = 0.3: {docTerm(d1,ir), docTerm(d1,db)}P(W3) = 0.2: {docTerm(d1,db)}P(W4) = 0.2: {}
I2: P(W1) = 0.5: {docTerm(d1,ir)}P(W2) = 0.1: {docTerm(d1,ir), docTerm(d1,db)}P(W3) = 0.4: {docTerm(d1,db)}
I3: P(W1) = 0.1: {docTerm(d1,ir)}P(W2) = 0.5: {docTerm(d1,ir), docTerm(d1,db)}P(W3) = 0.4: {}
probabilistic logic:0.1 ≤ P(docTerm(d1, ir)&docTerm(d1, db)) ≤ 0.5probabilistic Datalog with independence assumptions:P(docTerm(d1, ir)&docTerm(d1, db)) = 0.3
23 / 97
Disjoint events
β City State
0.7 Paris France0.2 Paris Texas0.1 Paris Idaho
Interpretation:P(W1) = 0.7: {cityState(paris, france)}P(W2) = 0.2: {cityState(paris, texas)}P(W3) = 0.1: {cityState(paris, idaho)}
24 / 97
Relational Bayes
[Roelleke et al. 07]
Role of the relational Bayes: Generation of a probabilistic database
database
Non−probabilistic Bayes
database
Probabilistic
25 / 97
Relational BayesExample: P(Nationality | City)
nationality and cityNationality City
”British” ”London””British” ”London””British” ”London””Scottish” ”London””French” ”London””German” ”Hamburg””German” ”Hamburg””Danish” ”Hamburg””British” ”Hamburg””German” ”Dortmund””German” ”Dortmund””Turkish” ”Dortmund””Scottish” ”Glasgow”
=⇒Bayes[$City]()
nationality cityP(Nationality|City) Nationality City
0.600 ”British” ”London”0.200 ”Scottish” ”London”0.200 ”French” ”London”0.500 ”German” ”Hamburg”0.250 ”Danish” ”Hamburg”0.250 ”British” ”Hamburg”0.667 ”German” ”Dortmund”0.333 ”Turkish” ”Dortmund”1.000 ”Scottish” ”Glasgow”
1 # P(Nationality | City) :2 nationality city SUM(Nat, City) :−3 nationality and city (Nat, City) | (City) ;
26 / 97
Relational BayesExample: P(t|d)
termTerm DocId
sailing doc1boats doc1sailing doc2boats doc2sailing doc2east doc3coast doc3sailing doc3sailing doc4boats doc5
p t d space(Term, DocId) :-term(Term, DocId) | (DocId);
P(t|d) Term DocId
0.50 sailing doc10.50 boats doc10.33 sailing doc20.33 boats doc20.33 sailing doc20.33 east doc30.33 coast doc30.33 sailing doc31.00 sailing doc41.00 boats doc5
p t d SUM(Term, DocId) :-term(Term, DocId) | (DocId);
P(t|d) Term DocId
0.50 sailing doc10.50 boats doc10.67 sailing doc20.33 boats doc20.33 east doc30.33 coast doc30.33 sailing doc31.00 sailing doc41.00 boats doc5
27 / 97
Probabilistic rulesRules for deterministic facts:
0.7 likes-sports(X) :- man(X).
0.4 likes-sports(X) :- woman(X).
man(peter).
Interpretation:P(W1) = 0.7: {man(peter), likes-sports(peter)}P(W2) = 0.3: {man(peter)}
28 / 97
Probabilistic rulesRules for uncertain facts:
# gender is disjoint on the first attribute
0.7 l-sports(X) :- gender(X,male).
0.4 l-sports(X) :- gender(X,female).
0.5 gender(X,male) :- human(X).
0.5 gender(X,female) :- human(X).
human(jo).
Interpretation:P(W1) = 0.35: {gender(jo,male), l-sports(jo)}P(W2) = 0.15: {gender(jo,male)}P(W3) = 0.20: {gender(jo,female), l-sports(jo)}P(W4) = 0.30: {gender(jo,female)}
?- l-sports(jo) P(W1) + P(W3) = 0.55
29 / 97
Probabilistic rulesRules for independent events
sameauthor(D1,D2) :- author(D1,X) & author(D2,X).
0.5 link(D1,D2) :- refer(D1,D2).
0.2 link(D1,D2) :- sameauthor(D1,D2).
?? link(D1,D2) :- refer(D1,D2) & sameauthor(D1,D2).
P(l |r), P(l |s) → P(l |r ∧ s)?
30 / 97
Rules for independent eventsModeling probabilistic inference networks
0.7 link(D1,D2) :- refer(D1,D2) & sameauthor(D1,D2).
0.5 link(D1,D2) :- refer(D1,D2) & not(sameauthor(D1,D2)).
0.2 link(D1,D2) :- sameauthor(D1,D2) & not(refer(D1,D2)).
Probabilistic inference networks,rules define link matrix
refer sameauthor
link
31 / 97
Vague Predicates
The Logical View on Vague PredicatesVague Predicates in IR and DatabasesProbabilistic Modeling of Vague Predicates
Vague PredicatesMotivating Example
33 / 97
Propositional vs. Predicate Logic
I Current IR systems are based on proposition logic(query term present/absent in document)
I Similarity of values not considered
I but multimedia IR deals with similarity already
I transition from propositional to predicate logic necessary
34 / 97
Vague Predicates in Probabilistic Datalog
[Fuhr & Roelleke 97] [Fuhr 00]
I Example: Shopping 45 inch LCD TV
I vague predicates as builtin predicates:X ≈ Y
I query(D):- Category(D,tv) &
type(D,lcd) & size(D,X) &
≈(X,45)
X ≈ Y
β X Y
0.7 42 450.8 43 450.9 44 451.0 45 450.9 46 450.8 47 45. . . . . . . . .
35 / 97
Data types and vague predicates in IR
Data type: domain + (vague) predicates
I Language (multilingual documents) /(language-specific stemming)
I Person names / “his name sounds like Jones”
I Dates / “about a month ago”
I Amounts / “orders exceeding 1 Mio $”
I Technical measurements / “at room temperature”
I Chemical formulas
36 / 97
Vague Criteria in Fact Databases
”I am looking for a 45-inch LCD TV with
I wide viewing angle
I high contrast
I low price
I high user rating”
→ vague criteria are very frequent in end-user querying of factdatabases
→ but no appropriate support in SQL
vague conditions → similar to fuzzy predicates
37 / 97
Probabilistic Modeling of Vague Predicates
[Fuhr 90]
I learn vague predicates fromfeedback data
I construct feature vector~x(qi , di ) from query value qi
and document value di
(e.g. relative difference)
I apply logistic regression
38 / 97
Expressiveness
Retrieval Rules, Joins, Aggregations and RestructuringExpressiveness in XML Retrieval
ExpressivenessFormulating Retrieval Rules
about(D,T) :- docTerm(D,T).
consider document linking / anchor textabout(D,T) :- link(D1,D),about(D1,T).
consider term hierarchyabout(D,T) :- subconcept(T,T1) & about(D,T1).
field-specific term weighting0.9 docTerm(D,T) :- occurs(D,T,title).
0.5 docTerm(D,T) :- occurs(D,T,body).
40 / 97
ExpressivenessJoins
IR authors:
irauthor(N):- about(D,ir) & author(D,N).
Smith’s IR papers cited by Miller
?- author(D,smith) & about(D,ir) &
author(D1,miller) & cites(D,D1).
41 / 97
ExpressivenessAggregation (1)
Who are the major IR authors?
indexβ DNO TERM
0.9 1 ir0.8 1 db0.6 2 ir0.8 3 ir0.7 3 ai
authorDNO NAME
1 smith2 miller3 smith
irauthor0.98 smith0.6 miller
irauthor(A):- index(D,ir) & author(D,A).
Aggregation through projection!
42 / 97
ExpressivenessAggregation (2)
Who are the major IR authors?
indexβ DNO TERM
0.9 1 ir0.8 1 db0.6 2 ir0.8 3 ir0.7 3 ai
authorDNO NAME
1 smith2 miller3 smith
irauths1.7 smith0.6 miller
Aggregation through summing:
irauth(D,A):- index(D,ir) & author(D,A).
irauths SUM(Name) :- irdbauth(Doc,Name) | (Name)
43 / 97
Expressiveness in XML Retrieval
[Fuhr & Lalmas 07]
44 / 97
XML structure: 1. Nested Structure
I XML document as hierarchicalstructure
I Retrieval of elements (subtrees)
I Typical query language does notallow for specification of structuralconstraints
I Relevance-oriented selection ofanswer elements: return the mostspecific relevant elements
45 / 97
XML structure: 2. Named Fields
I Reference to elementsthrough field names only
I Context of elements isignored(e.g. author of article vs.author of referenced paper)
I Post-Coordination may leadto false hits(e.g. author name – authoraffiliation)
Example: Dublin Core<oai dc:dc xmlns:dc=
"http://purl.org/dc/elements/1.1/">
<dc:title>Generic Algebras
... </dc:title>
<dc:creator>A. Smith (ESI),
B. Miller (CMU)</dc:creator>
<dc:subject>Orthogonal group,
Symplectic group</dc:subject>
<dc:date>2001-02-27</dc:date>
<dc:format>application/postscript</dc:format>
<dc:identifier>ftp://ftp.esi.ac.at/pub/esi1001.ps</dc:identifier>
<dc:source>ESI preprints
</dc:source>
<dc:language>en</dc:language>
</oai dc:dc>
46 / 97
XML structure: 3. XPath
/document/chapter[about(./heading, XML) AND
about(./section//*,syntax)]
47 / 97
XML structure: 3. XPath (cont’d)
I Full expressiveness for navigation through document tree(+links)
I Parent/child, ancestor/descendantI Following/preceding, following-sibling, preceding-siblingI Attribute, namespace
I Selection of arbitrary elements/subtrees(but answer can be only a single element of the originatingdocument)
48 / 97
XML structure: 4. XQuery
Higher expressiveness, especially for database-like applications:
I Joins (trees → graphs)
I Aggregations
I Constructors for restructuring results
Example: List each publisher and the average price of its books
FOR $p IN distinct(document("bib.xml")//publisher)
LET $a := avg(document("bib.xml")//book[publisher =
$p]/price)
RETURN
<publisher>
<name> $p/text() </name>
<avgprice> $a </avgprice>
</publisher>
49 / 97
XML content typing
50 / 97
XML content typing: 1. Text<book>
<author>John Smith</author>
<title>XML Retrieval</title>
<chapter> <heading>Introduction</heading>
This text explains all about XML and IR.
</chapter>
<chapter>
<heading> XML Query Language XQL
</heading>
<section>
<heading>Examples</heading>
</section>
<section>
<heading>Syntax</heading>
Now we describe the XQL syntax.
</section>
</chapter>
</book>
Example query
//chapter[about(.,
XML query language]
51 / 97
XML content typing: 2. Data Types
I Data type: domain + (vague) predicates(see above)
I Close relationship to XML Schema, butI XMLS supports syntactic type checking onlyI No support for vague predicates
52 / 97
XML content typing: 3. Object TypesBased on Tagging / Named Entity Recognition
I Object types: Persons, Locations. Dates, .....
Pablo Picasso (October 25, 1881 - April 8, 1973) was aSpanish painter and sculptor..... In Paris, Picasso entertaineda distinguished coterie of friends in the Montmartre andMontparnasse quarters, including Andre Breton, GuillaumeApollinaire, and writer Gertrude Stein.
To which other artists did Picasso have close relationships?Did he ever visit the USA?
I Named entity recognition methods allow for automaticmarkup of object types
I Object types support increased precision
53 / 97
XML content typingTag semantics modelled as hierarchies
54 / 97
XML content typingTag semantics modelled in OWL
55 / 97
Retrieval of structured documents: POOL
Structure of POOL programsContexts and augmentationAugmentation and inconsistencies
Goals
I retrieval of structured documents→ hierarchical logical structure
I abstraction from node types→ contexts as untyped nodes
I multimedia retrieval→ expressiveness of restricted predicate logic
57 / 97
Structure of POOL programs
object: identifier + content
context: object with nonempty contenta1[ ], s11[ ], s12[ ]
program: set of clauses
clause: context / proposition / rule
proposition:
I term(image, presentation)
I classification(article(a1), section(s11))
I attribute(s11.author(smith), a1.pubyear(1997))
58 / 97
POOL Example Program
a1[ s11[ image 0.6 retrieval presentation ]
s12[ ss121[ audio indexing ]
ss122[ video not presentation ] ] ]
s11.author(smith)
s121.author(miller) s122.author(jones)
a1.pubyear(1997)
article(a1) section(s11) section(s12)
subsection(ss121) subsection(ss122)
59 / 97
Rules in POOL
rule: head :- body
head: proposition / context containing a proposition
body conjunction of subgoals (propositions or contexts)
docnode(D) :- article(D)
docnode(D) :- section(D)
docnode(D) :- subsection(D)
mm-ir-doc(D) :- docnode(D) &
D[audio & retrieval]
german-paper(D) :- D.author.country(germany)
query: ?- body
?- D[audio & indexing]
60 / 97
Contexts and augmentation
clauses only hold for context where stated
augmentation:propagation of propositions to surrounding contexts
a1[ s11[ image 0.6 retrieval presentation ]
s12[ ss121[ audio indexing ]
ss122[ video not presentation ] ] ]
?- D[audio & video]
s12
a1
61 / 97
Augmentation with Uncertainty:
?- audio
1.00 ss121
0.60 s12
0.36 a1
?- D[audio & video]
0.36 s12
0.22 a1
augmentation with uncertainty prefers most specific context!
62 / 97
Augmentation and Inconsistencies
d1[ s1[ audio indexing ]
s2[ s21[ image retrieval]
s22[ video not retrieval ] ] ]
?- D[audio & indexing]
s1
d1
?- D[video & image]
s2
d1
?- D[video & retrieval]
retrieval is inconsistent in s2
63 / 97
Four-Valued Logic
truth values: unknown, true, false, inconsistent
∧ T F U I
T T F U IF F F F FU U F U FI I F F I
∨ T F U I
T T T T TF T F U IU T U U TI T I T I
¬T FF TU UI I
Interpretation as sets over two values:T={true}F={false}U={true,false}I ={}
64 / 97
Applying Four-Valued Logic
d1[ s1[ audio indexing ]
s2[ s21[ image retrieval]
s22[ video not retrieval ] ] ]
s22:video 7→ trueimage 7→ unknownretrieval 7→ falses2:image 7→ truevideo 7→ trueretrieval 7→ inconsistent
65 / 97
Description logic
ThesaurusIntroduction into OWLSPARQL
regular
triangle
regular
polygon
polygon
rectangle
square
triangle quadrangle ...
thesaurus knowledge:can be expressed in propositional logicsquare = quadrangle ∧ regular-polygon
description logic
I based on semantic networksI more expressive than thesauri
I instances of conceptsI roles between (instances of) concepts
67 / 97
Semantic Web (ontology) languages
RDF: “Resource description language”semantic markup language, only resources and theirproperties, serialisation in XML
RDFS: “RDF Schema”, schema definition language for RDF
OWL: extends RDF/RDFS by richer modelling primitives,OWL Lite/DL/Full
I OWL Lite contains simple primitivesI OWL DL corresponds to expressive description
logicI OWL Full is OWL DL + RDF
knowledge base can be modelled as collection of RDFtriples (RDF/XML serialisation)alternative encoding: abstract syntax (easier to read)
68 / 97
Objects, classes, literals and datatypes
I Two distinct domains:
Classes: for objectsData types: for literals
Person
owl:Class owl:Datatype
Peter 1,89
rdf:type rdf:type
rdf:type rdf:type
xsd:decimal
69 / 97
Classes (1)
owl:Class
ManMale
Animal Person
Femaleowl:disjointWith
rdfs:subClassOfrdfs:subClassOf
rdfs:subClassOf
rdfs:subClassOf
rdf:type
Class(Female partial Animal)
<owl:Class rdf:ID="Female">
<rdfs:subClassOf rdf:resource="#Animal"/>
</owl:Class>
70 / 97
Classes (2)
owl:Class
ManMale
Animal Person
Femaleowl:disjointWith
rdfs:subClassOfrdfs:subClassOf
rdfs:subClassOf
rdfs:subClassOf
rdf:type
Class(Male partial Animal)
DisjointClasses(Male Female)
<owl:Class rdf:ID="Male">
<rdfs:subClassOf rdf:resource="#Animal"/>
<owl:disjointWith rdf:resource="#Female"/>
</owl:Class>
71 / 97
Object properties (1)
owl:ObjectProperty
Animal Animalrdfs:domain
rdfs:rangehasFather Male
rdfs:rangehasParent
rdf:type
owl:subPropertyOf
ObjectProperty(hasParent domain(Animal) range(Animal))
<owl:ObjectProperty rdf:ID="hasParent">
<rdfs:domain rdf:resource="#Animal"/>
<rdfs:range rdf:resource="#Animal"/>
</owl:Class>
72 / 97
Object properties (2)
owl:ObjectProperty
Animal Animalrdfs:domain
rdfs:rangehasFather Male
rdfs:rangehasParent
rdf:type
owl:subPropertyOf
ObjectProperty(hasFather super(hasParent) range(Male))
<owl:ObjectProperty rdf:ID="hasFather">
<rdfs:subPropertyOf rdf:resource="#hasParent"/>
<rdfs:range rdf:resource="#Male"/>
</owl:Class>
73 / 97
Datatype properties
rdfs:domain rdfs:range
rdf:type
owl:DatatypeProperty
shoesizePerson xsd:decimal
rdf:type
owl:FunctionalProperty
DatatypeProperty(shoesize Functional domain(Animal) range(xsd:decimal))
<owl:DatatypeProperty rdf:ID="shoesize">
<rdfs:domain rdf:resource="#Animal"/>
<rdfs:range rdf:resource="xsd:decimal"/>
<rdf:type rdf:resource="owl:FunctionalProperty"/>
</owl:Class>
74 / 97
Property restrictions
Class(Person partial Animal restriction(hasParent allValuesFrom(Person))
restriction(hasParent cardinality(2)))
<owl:Class rdf:ID="Person">
<rdfs:subClassOf rdf:resource="#Animal"/>
<rdfs:subClassOf>
<owl:Restriction>
<owl:onProperty rdf:resource="#hasParent"/>
<owl:allValuesFrom rdf:resource="#Person"/>
</owl:Restriction>
</rdfs:subClassOf>
<rdfs:subClassOf>
<owl:Restriction>
<owl:onProperty rdf:resource="#hasParent"/>
<owl:cardinality>2</owl:cardinality>
</owl:Restriction>
</rdfs:subClassOf>
</owl:Class>
75 / 97
Instances
Individual(Kain type(Male) value(hasFather Adam)
value(hasMother Eve)
value(shoesize 10))
<Male rdf:ID="Kain">
<hasFather rdf:resource="#Adam"/>
<hasMother rdf:resource="#Eve"/>
<shoesize>10</shoesize>
</Male>
76 / 97
Further modelling primitives
owl:inverseOf: inverse property: p(a, b)↔ r(b, a)
owl:TransitiveProperty: p(a, b), p(b, c)→ p(a, c)
owl:SymmetricProperty: p(a, b)→ p(b, a)
owl:InverseFunctionalProperty: inverse property is functional
owl:hasValue at least one property value equals object or datatypevalue
owl:someValuesFrom at least one property value is instance ofclass, expression or datatype
owl:interSectionOf, owl:unionOf, owl:complementOf: booleancombinations of class expressions
owl:oneOf: define class by enumerating its instances
77 / 97
OWL Class Constructors
78 / 97
OWL Axioms
79 / 97
OWL DL Semantics
80 / 97
Limitations of OWL
OWL lacks support for
I uncertainty: only deterministic relationships possible,no weighting or probabilistic facts⇒ “Pr(hasFather(lisa,thomas))=0.9” cannot be expressed
I rules: no general rules,only specific rules like subClassOf, TransitiveProperty . . .⇒ “if hasParent(A,B) and hasParent(C,D) andhasSibling(B,D), then hasCousion(A,C)” cannot be expressed
I n-ary datatype predicates:OWL datatypes are based on XML Schema datatypes, thusproviding only unary datatype predicates⇒ “sameDomain([email protected],[email protected])” cannot beexpressed
⇒ IR queries cannot be expressed directly in OWL
81 / 97
OWL: Conclusion
I OWL extends RDF(S) by additional modelling primitives
I well-defined semantics, based on description logics
I does not support all RDF features (no reification, only threelevels owl:Class, classes and objects)
I lacks important features:I only deterministic features, no probabilistic relationshipsI no rules (but in SWRL)I restricted datatype predicates (due to XML Schema)
I OWL and associated languages become standard in theSemantic Web
82 / 97
Semantic Web Layers
83 / 97
SPARQL
query language for getting information from RDF (OWL) graphsFacilities for
I extract information in the form of URIs, blank nodes, plainand typed literals
I extract RDF subgraphs
I construct new RDF graphs based on information in thequeried graphs
Features:
I matching graph patterns
I variables – global scope; indicated by ’?’ or ‘$‘
84 / 97
SPARQL: Basic Graph Pattern
I Set of Triple PatternsI Triple Pattern – similar to an RDF Triple (subject, predicate,
object), but any component can be a query variable; literalsubjects are allowed?book dc:title ?title
I Matching a triple pattern to a graph: bindings betweenvariables and RDF Terms
I Matching of Basic Graph PatternsI A Pattern Solution of Graph Pattern GP on graph G is any
substitution S such that S(GP) is a subgraph of G.SELECT ?x ?v WHERE ?x ?v ?x
85 / 97
SPARQL: Group Patterns + Value Constraints
Group Pattern: A set of graph patterns which must all matchValue Constraints: restrict RDF terms in a solutionSELECT ?n WHERE
?n profession "Physicist" . ?n isa "Politician"
86 / 97
SPARQL: Query forms
SELECT returns all, or a subset of the variables bound in aquery pattern matchformats : XML or RDF/XML
CONSTRUCT returns an RDF graph constructed by substitutingvariables in a set of triple templates
DESCRIBE returns an RDF graph that describes the resourcesfound.
ASK returns whether a query pattern matches or not.
87 / 97
Conclusion and Outlook
Conclusion
Inference
I Probabilistic relational model supports integration of IR+DB
I Probabilistic Datalog as powerful inference mechanism
I Allows for formulating retrieval strategies as logical rules
Vague predicates
I Natural extension of IR methods to attribute values
I Vague predicates can be learned from feedback data
I Transition from propositional to predicate logic
Expressive query language
I Joins
I Aggregations
I (Re)structuring of results89 / 97
http://www.eecs.qmul.ac.uk/~thor/
http://www.spinque.com/
OutlookIR Systems vs. DBMS
Separation between IRS and IR application?
92 / 97
Towards an IRMS
93 / 97
References I
Dalvi, N. N.; Suciu, D.(2007).Efficient query evaluation on probabilistic databases.VLDB J. 16(4), pages 523–544.
Forst, J. F.; Tombros, A.; Roelleke, T.(2007).POLIS: A Probabilistic Logic for Document Summarisation.In: Proceedings of the 1st International Conference on Theory of InformationRetrieval (ICTIR 07) - Studies in Theory of Information Retrieval, pages201–212.
Frommholz, I.; Fuhr, N.(2006).Probabilistic, Object-oriented Logics for Annotation-based Retrieval in DigitalLibraries.In: Nelson, M.; Marshall, C.; Marchionini, G. (eds.): Opening InformationHorizons – Proc. of the 6th ACM/IEEE Joint Conference on Digital Libraries(JCDL 2006), pages 55–64. ACM, New York.
94 / 97
References II
Fuhr, N.; Lalmas, M.(2007).Advances in XML retrieval: the INEX initiative.In: IWRIDL ’06: Proceedings of the 2006 international workshop on Researchissues in digital libraries, pages 1–6. ACM, New York, NY, USA.
Fuhr, N.; Rolleke, T.(1997).A Probabilistic Relational Algebra for the Integration of Information Retrievaland Database Systems.ACM Transactions on Information Systems 14(1), pages 32–66.
Fuhr, N.; Rolleke, T.(1998).HySpirit – a Probabilistic Inference Engine for Hypermedia Retrieval in LargeDatabases.In: Proceedings of the 6th International Conference on Extending DatabaseTechnology (EDBT), pages 24–38. Springer, Heidelberg et al.
95 / 97
References III
Fuhr, N.(1990).A Probabilistic Framework for Vague Queries and Imprecise Information inDatabases.In: Proceedings of the 16th International Conference on Very Large Databases,pages 696–707. Morgan Kaufman, Los Altos, California.
Fuhr, N.(2000).Probabilistic Datalog: Implementing Logical Information Retrieval for AdvancedApplications.Journal of the American Society for Information Science 51(2), pages 95–110.
Lalmas, M.; Roelleke, T.; Fuhr, N.(2002).Intelligent Hypermedia Retrieval.In: Szczepaniak, P. S.; Segovia, F.; Zadeh, L. A. (eds.): Intelligent Explorationof the Web, pages 324–344. Springer, Heidelberg et al.
Pearl, J.(1988).Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference.Morgan Kaufman, San Mateo, California.
96 / 97
References IV
Rolleke, T.; Wu, H.; Wang, J.; Azzam, H.(2007).Modelling retrieval models in a probabilistic relational algebra with a newoperator: the relational Bayes.The International Journal on Very Large Data Bases (VLDB) 17(1), pages 5–37.
Suciu, D.; Olteanu, D.; Re, C.; Koch, C.(2011).Probabilistic Databases.Morgan & Claypool Publishers.
97 / 97