Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data
Transcript of Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data
Institute for Web Science & Technologies – WeST
Leveraging the Web of Data: Managing, Analysing and Making
Use of Linked Open Data
Thomas Gottron
July 18th, 2014 IRSS
Thomas Gottron IRSS, Athens, 18.7.2014, 2 Leveraging the Web of Data
Linked Open Data – Vision of a Web of Data
§ "Classic" Web: linked documents
§ Web of Data: linked data entities
Linked Open Data – Vision of a Web of Data
§ "Classic" Web
§ Web of Data: every data entity carries an ID
LOD – Base technologies
§ IDs: dereferenceable HTTP URIs
§ Data format: RDF
§ No schema
§ Links to other data sources
1 statement = 1 triple (subject, predicate, object). Example statements:
http://dblp.l3s.de/.../NesterovAM98 rdf:type foaf:Document
http://dblp.l3s.de/.../NesterovAM98 rdf:type swrc:InProceedings
http://dblp.l3s.de/.../NesterovAM98 dc:title "Extracting schema ..."
http://dblp.l3s.de/.../NesterovAM98 dc:creator http://dblp.l3s.de/.../Serge_Abiteboul
http://dblp.l3s.de/.../Serge_Abiteboul rdf:type fb:Computer_Scientist
http://dblp.l3s.de/.../Serge_Abiteboul foaf:name "Serge Abiteboul"
http://dblp.l3s.de/.../Serge_Abiteboul rdfs:seeAlso http://www.bibsonomy.org/.../Serge+Abiteboul
Prefixed names abbreviate URIs:
rdf:type = http://www.w3.org/1999/02/22-rdf-syntax-ns#type
foaf:Document = http://xmlns.com/foaf/0.1/Document
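The prefix notation can be illustrated with a minimal Python sketch. It only assumes the two namespaces spelled out on the slide; the function name `expand` and the subject URI are hypothetical and chosen for illustration.

```python
# Prefix table: the rdf and foaf namespaces are the ones given on the slide.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

def expand(name: str) -> str:
    """Expand a prefixed name such as 'rdf:type' to a full URI."""
    prefix, sep, local = name.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return name  # already a full URI, or a prefix we do not know

# One statement = one (subject, predicate, object) triple.
# The subject URI is a hypothetical placeholder, not taken from the slide.
triple = ("http://example.org/paper/1", "rdf:type", "foaf:Document")
expanded = tuple(expand(term) for term in triple)
```

Full URIs pass through unchanged because their scheme ("http") is not a registered prefix.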
LOD Cloud
"… the Web of Linked Data consisting of more than 30 Billion RDF triples from hundreds of data sources …"
Gerhard Weikum, "Where's the Data in the Big Data Wave?", SIGMOD Blog, 6.3.2013, http://wp.sigmod.org/
Some „Bubbles“ of the LOD Cloud
Making Use of the Linked Data Cloud ...
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
LOD: a rich, huge, diverse, public and distributed knowledge base on the Web.
Pros / Cons: rich, knowledge base, diverse, public, huge, on the Web, distributed
Find technical solutions to overcome challenges
[Overview figure: the three parts of the talk, Managing, Analysing and Making Use. Panels: a search data structure mapping keys k1 … kn to data items d1,1 d1,2 … ("Efficient storage and retrieval"); an example RDF graph with rdf:type, dc:creator and dc:title; a plot of Micro Avg. F1 against Week of Data Snapshot for RDF Type, TS, PS, IPS, ECS and SchemEX.]
Data Format
§ Linked Data as N-Quads: (s, p, o, c)
  - triple (s, p, o): what is the information?
  - context URI c: where does it come from?
Index Models
(Abstract) Index Models
§ D: data elements to be retrieved (payload)
§ K: key elements to access the data (index elements)
§ σ: selection function, how to get data for a key: $\sigma : K \to \wp(D)$
[Figure: search data structure for efficient storage and retrieval, mapping keys k1, k2, k3, …, kn to data items d1,1 d1,2 d1,3 …; d2,1 d2,2; …; dn,1 dn,2 dn,3 …]
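The abstract model (keys K, payload D, selection function σ) can be sketched in a few lines of Python. This is an illustrative sketch, not the implementation from the talk: `Quad`, `build_index` and `sigma` are hypothetical names, and the example quads are invented.

```python
from collections import defaultdict
from typing import NamedTuple

class Quad(NamedTuple):
    s: str  # subject
    p: str  # predicate
    o: str  # object
    c: str  # context (data source)

def build_index(quads, key, payload):
    """Map each key element to the set of payload elements observed with it."""
    index = defaultdict(set)
    for q in quads:
        index[key(q)].add(payload(q))
    return index

def sigma(index, k):
    """Selection function: the (possibly empty) set of data items for key k."""
    return index.get(k, set())

# Invented example data.
quads = [
    Quad("ex:a", "rdf:type", "foaf:Person", "http://example.org/src1"),
    Quad("ex:a", "foaf:knows", "ex:b", "http://example.org/src1"),
    Quad("ex:b", "rdf:type", "foaf:Person", "http://example.org/src2"),
]

# Key = subject, payload = triple.
by_subject = build_index(quads, key=lambda q: q.s, payload=lambda q: (q.s, q.p, q.o))
# Key = type, payload = data source.
type_to_source = build_index(
    (q for q in quads if q.p == "rdf:type"), key=lambda q: q.o, payload=lambda q: q.c
)
```

Different index models then differ only in the `key` and `payload` functions passed in.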
Choices for Key Elements
§ Subject: keys s1, s2, …, sn taken from (s, p, o, c)
§ Context: keys c1, c2, …, cm taken from (s, p, o, c)
§ Literals: keys are terms x, y, z taken from the literal in (s, p, lit, c)
§ Types: keys t1, t2, …, tk taken from (s, rdf:type, t, c)
§ ... Alternative: predicates or objects
Choices for the Payload (what is kept locally, what stays on the Web)
§ Full caching: (s, p, o, c)
§ Triples: (s, p, o)
§ Entities: s
§ Data sources: c
§ ...
Concrete Example: Subject Based Index Model
Key s → payload (s, p1, o1), (s, p2, o2), ...
west:Gottron → (west:Gottron, rdf:type, foaf:Person), (west:Gottron, foaf:knows, west:Staab), ...
west:Staab → (west:Staab, swrc:institution, west:WeST), (west:Staab, foaf:name, "Steffen Staab"), ...
west:Schegi → (west:Schegi, rdf:type, foaf:Person), (west:Schegi, foaf:name, "Stefan Scheglmann")
...
tud:CGottron → (tud:CGottron, swrc:institution, tud:KOM), (tud:CGottron, foaf:knows, west:Gottron), ...
Implemented Index Models
§ Triple based: triple (s, p, o) combined with s, with p, with o, or with a term
§ Meta data: triple (s, p, o) combined with the context c or with the PLD
https://github.com/gottron/lod-index-models
Schema-level Indices
Schema Information on the LOD Cloud
(No) Schema?
Guidelines / best practices
Automatic tools Social effects
Emerging Schema!
Induce from data observations
Examples for Schema Information
§ Property set: an entity x with properties p1, p2, p3 is described by {p1, p2, p3}
§ Type set: an entity y with rdf:type t1 and rdf:type t2 is described by {t1, t2}
Implemented Index Models
§ Triple based
§ Meta data
§ Schema-level: RDF type, type set (TS), property set (PS), incoming property set (IPS), types combined with properties (ECS), SchemEX
https://github.com/gottron/lod-index-models
SchemEX
Schema-based Access to the LOD cloud
[Figure: query graph pattern. An unknown entity x of type foaf:Document and swrc:InProceedings is linked via dc:creator to an entity of type fb:Computer_Scientist.]
SELECT ?x WHERE { ?x rdf:type foaf:Document . ?x rdf:type swrc:InProceedings . ?x dc:creator ?y . ?y rdf:type fb:Computer_Scientist }
Schema-based Access to the LOD cloud
§ The query above is sent to a schema-level index
§ Where? The index returns the data sources: • ACM • DBLP
§ Which schema information?
Typecluster
§ Entities with the same set of types
§ A type cluster TCj is described by its types t1, t2, …, tn and points to the data sources C1, C2, …, Cm
Typecluster: Example
§ tc2309 = {foaf:Document, swrc:InProceedings} → DBLP, ACM
Property Sets
§ Entities with the same set of properties
§ A property set PSi is described by its properties p1, p2, …, pn and points to the data sources C1, C2, …, Cm
Bi-Simulation: Example
§ ps2608 = {dc:creator} → BBC, DBLP
SchemEX: Combination TC and PS
§ Partition of TC based on PS with restrictions on the destination TC (equivalence relation)
[Figure: schema layer with type clusters TCj (types t1, t2, …, tn) and TCk (types t45, t2, …, tn'), a property set PSi (properties p1, p2, …, pn'') and equivalence classes EQCl; payload layer with data sources C1, C2, …, Cm and Cx attached to the equivalence classes.]
SchemEX: Example
§ tc2309 = {foaf:Document, swrc:InProceedings}
§ tc2101 = {fb:Computer_Scientist}
§ ps2608 = {dc:creator}
§ eqc707 → DBLP, ...
SELECT ?x WHERE { ?x rdf:type foaf:Document . ?x rdf:type swrc:InProceedings . ?x dc:creator ?y . ?y rdf:type fb:Computer_Scientist }
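How such a schema-level lookup answers the "where?" question can be sketched as follows. This is a deliberately simplified stand-in for SchemEX, not the actual index: it keys data sources by (type set of subject, property, type set of object) and skips the property-set and equivalence-class layers. All names, entities and source labels are hypothetical.

```python
from collections import defaultdict

def build_schema_index(quads):
    """quads: list of (s, p, o, c). Returns {(types of s, p, types of o): {c, ...}}."""
    types = defaultdict(set)
    for s, p, o, c in quads:
        if p == "rdf:type":
            types[s].add(o)
    index = defaultdict(set)
    for s, p, o, c in quads:
        if p != "rdf:type":
            index[(frozenset(types[s]), p, frozenset(types[o]))].add(c)
    return index

def lookup(index, subject_types, prop, object_types):
    """Data sources holding entities that match the schema pattern."""
    hits = set()
    for (ts, p, to), sources in index.items():
        if p == prop and set(subject_types) <= ts and set(object_types) <= to:
            hits |= sources
    return hits

# Invented data; "DBLP" and "BLOG" are just labels for two contexts.
quads = [
    ("ex:paper", "rdf:type", "foaf:Document", "DBLP"),
    ("ex:paper", "rdf:type", "swrc:InProceedings", "DBLP"),
    ("ex:paper", "dc:creator", "ex:author", "DBLP"),
    ("ex:author", "rdf:type", "fb:Computer_Scientist", "DBLP"),
    ("ex:post", "rdf:type", "foaf:Document", "BLOG"),
    ("ex:post", "dc:creator", "ex:someone", "BLOG"),
]
index = build_schema_index(quads)
sources = lookup(index, {"foaf:Document", "swrc:InProceedings"},
                 "dc:creator", {"fb:Computer_Scientist"})
```

The lookup returns data sources rather than entities; the original query is then run against those sources only.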
Building Indices
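Building Indices: Operators
§ Combination of few simple operations
  - Aggregate, Join, Invert
§ Example: Property Set index
  - Input quads (s, p, o, c): (s1, p1, o1, c1), (s1, p2, o1, c1), (s2, p2, o2, c1), (s3, p1, o3, c1), (s3, p2, o4, c1), (s4, p3, o1, c1)
  - Aggregate (properties per subject): s1 → {p1, p2}; s2 → {p2}; s3 → {p1, p2}; s4 → {p3}
  - Invert (subjects per property set): {p1, p2} → {s1, s3}; {p2} → {s2}; {p3} → {s4}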
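The two operators in the Property Set example can be sketched in Python with the slide's own data. The function names `aggregate` and `invert` mirror the operator names; their signatures are assumptions made for this sketch.

```python
from collections import defaultdict

# Quads (s, p, o, c) from the Property Set index example.
quads = [
    ("s1", "p1", "o1", "c1"),
    ("s1", "p2", "o1", "c1"),
    ("s2", "p2", "o2", "c1"),
    ("s3", "p1", "o3", "c1"),
    ("s3", "p2", "o4", "c1"),
    ("s4", "p3", "o1", "c1"),
]

def aggregate(quads):
    """Aggregate: collect the set of properties used by each subject."""
    props = defaultdict(set)
    for s, p, o, c in quads:
        props[s].add(p)
    return {s: frozenset(ps) for s, ps in props.items()}

def invert(mapping):
    """Invert: turn subject -> property set into property set -> subjects."""
    inverted = defaultdict(set)
    for subject, property_set in mapping.items():
        inverted[property_set].add(subject)
    return dict(inverted)

property_sets = aggregate(quads)
ps_index = invert(property_sets)
```

`frozenset` is used so that a property set can serve as a dictionary key.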
SchemEX: Computation
§ Precise computation: 3 Aggregates + Join + Invert
§ Better approach? (Faster, more scalable)
Stream-based Computation of SchemEX
§ LOD Crawler: stream of N-Quads … Q16, Q15, Q14, Q13, Q12, Q11, Q10, Q9, Q8, Q7, Q6, Q5, Q4, Q3, Q2, Q1
§ FiFo
[Figure: the quads pass through a FiFo queue of entities with their types (t1, t2, t3); the SchemEX structure (type clusters, property sets, equivalence classes and payload) is built from the entities in the queue.]
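The effect of a bounded FiFo window on a quad stream can be sketched as follows. For brevity the sketch builds a property-set index rather than the full SchemEX structure, and the eviction policy shown is an assumption for illustration; `stream_property_set_index` is a hypothetical name.

```python
from collections import OrderedDict, defaultdict

def stream_property_set_index(quad_stream, window_size):
    """Build property set -> contexts from a quad stream with a bounded FiFo window."""
    window = OrderedDict()    # subject -> (properties, contexts), oldest first
    index = defaultdict(set)  # property set -> contexts

    def flush(entry):
        props, contexts = entry
        index[frozenset(props)] |= contexts

    for s, p, o, c in quad_stream:
        if s not in window:
            if len(window) >= window_size:
                # Window full: evict the oldest subject and write it to the index.
                _, oldest = window.popitem(last=False)
                flush(oldest)
            window[s] = (set(), set())
        props, contexts = window[s]
        props.add(p)
        contexts.add(c)
    for entry in window.values():  # write out what is left at the end of the stream
        flush(entry)
    return dict(index)

# Invented stream: the quads of s1 are interrupted by a quad of s2.
stream = [("s1", "p1", "o", "c1"), ("s2", "p2", "o", "c1"), ("s1", "p2", "o", "c1")]
exact = stream_property_set_index(stream, window_size=2)
approx = stream_property_set_index(stream, window_size=1)
```

With a window that is too small, s1 is evicted before its second quad arrives and its schema information is split, which is why the stream-based index is only an approximation.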
Quality of Approximated Index
§ Stream-based computation vs. precise computation
  - Data set of 11 million triples
SchemEX: 1st place @ BTC 2011
§ SchemEX
  - Allows complex queries (Star, Chain)
  - Scalable computation
  - High quality
§ Index over BTC 2011 data
  - 2.17 billion triples
  - Index: 55 million triples
§ Commodity hardware
  - VM: 1 core, 4 GB RAM
  - Throughput: 39,500 triples / second
  - Computation of full index: 15 h
Redundancy of Schema Information
Explicit and Implicit Schema Information on LOD
§ Explicit: assigning class types (Entity rdf:type Class Type)
  - e.g. E1 rdf:type foaf:Document; E1 rdf:type swrc:InProceedings
§ Implicit: modelling attributes (Entity Property Entity 2)
  - e.g. E1 dc:creator E2; E1 dc:title "Bad News ..."
§ To which degree?
Probabilistic model of schema information
§ Joint distribution P(T,R) over
  - Type sets: TS (e.g. {foaf:Document, swrc:InProceedings})
  - Property sets: PS (e.g. {dc:creator, dc:title})
§ P(T=t, R=r): probability of a resource to have type set t and property set r

P(T,R)   r1    r2    r3    r4    P(T)
t1       14%   2%    5%    8%    29%
t2       5%    15%   2%    3%    25%
t3       7%    3%    30%   5%    45%
P(R)     26%   20%   37%   17%

P(T) and P(R) are the marginal distributions.
Estimating probabilities
§ Todo (on large data sets)
  - Determine schema use
  - Aggregate counts
§ Data background: segments from BTC'12

Data set   Triples   TS          PS
Rest       22.3M     793         7,522
Datahub    910.1M    28,924      14,712
DBpedia    198.1M    1,026,272   391,170
Freebase   101.2M    69,732      162,023
Timbl      204.8M    4,139       9,619

§ "Query" schema-level index: $p(t, r) = \frac{|\{ d \in \sigma(t, r) \}|}{N}$
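The estimate of p(t, r) from aggregated counts can be sketched as below. The sketch assumes that N is the total number of resources and that each resource has already been reduced to its (type set, property set) pair; the function name and the example resources are hypothetical.

```python
from collections import Counter

def joint_distribution(resources):
    """resources: dict resource -> (type_set, property_set), both frozensets.

    Returns {(type_set, property_set): relative frequency}.
    Assumption: N is the total number of resources.
    """
    counts = Counter(resources.values())
    n = len(resources)
    return {tr: k / n for tr, k in counts.items()}

# Invented example with four resources.
doc = frozenset({"foaf:Document"})
person = frozenset({"foaf:Person"})
resources = {
    "e1": (doc, frozenset({"dc:creator"})),
    "e2": (doc, frozenset({"dc:creator"})),
    "e3": (doc, frozenset({"dc:title"})),
    "e4": (person, frozenset({"foaf:name"})),
}
p = joint_distribution(resources)
```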
Redundancy
To which degree can one type of information (either types or properties) explain the respective other?
§ Mutual Information:
$I(T,R) = \sum_{t \in TS} \sum_{r \in PS} p(t,r) \cdot \log\left( \frac{p(t,r)}{P(T=t) \cdot P(R=r)} \right)$
§ Normalized MI:
$I_0(T,R) = \frac{I(T,R)}{\min\left( H(T), H(R) \right)}$
  • H(T) and H(R): entropy of the marginal distributions.
Example: Normalized Mutual Information

P(T,R)   r1    r2    r3    r4    P(T)
t1       14%   2%    5%    8%    29%
t2       5%    15%   2%    3%    25%
t3       7%    3%    30%   5%    45%
P(R)     26%   20%   37%   17%
I0(T,R) = 0.239

P(T,R)   r1    r2    r3    r4    P(T)
t1       22%   1%    0%    1%    24%
t2       1%    23%   1%    0%    26%
t3       1%    0%    48%   1%    50%
P(R)     24%   24%   49%   2%
I0(T,R) = 0.766
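The normalized mutual information can be computed from such a joint table with a short Python sketch. The cell values are the percentages from the two example tables; since they are rounded and do not sum to exactly 100%, the sketch renormalizes them, so its results may deviate slightly from the values quoted on the slide.

```python
import math

def normalized_mutual_information(joint):
    """joint: rows = type sets, columns = property sets (any non-negative weights)."""
    total = sum(sum(row) for row in joint)
    p = [[v / total for v in row] for row in joint]  # renormalize to sum 1
    p_t = [sum(row) for row in p]                    # marginal P(T)
    p_r = [sum(col) for col in zip(*p)]              # marginal P(R)
    mi = sum(
        p[i][j] * math.log(p[i][j] / (p_t[i] * p_r[j]))
        for i in range(len(p))
        for j in range(len(p[i]))
        if p[i][j] > 0  # cells with probability 0 contribute nothing
    )

    def entropy(dist):
        return -sum(q * math.log(q) for q in dist if q > 0)

    return mi / min(entropy(p_t), entropy(p_r))

table_1 = [[14, 2, 5, 8], [5, 15, 2, 3], [7, 3, 30, 5]]
table_2 = [[22, 1, 0, 1], [1, 23, 1, 0], [1, 0, 48, 1]]
nmi_1 = normalized_mutual_information(table_1)
nmi_2 = normalized_mutual_information(table_2)
```

The base of the logarithm cancels in the ratio, so the natural logarithm is as good as any other.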
Normalized Mutual Information on BTC'12
§ Tendencies:
  - Relatively high redundancy
  - Freebase: (weakly) pre-defined schema
  - Timbl: narrow domain (FOAF profiles)
  - DBpedia: de-centralized schema

Data set   I0(T,R)   Rank
Rest       0.881     1
Datahub    0.747     4
DBpedia    0.635     5
Freebase   0.860     2
Timbl      0.850     3
Finding Alternative Descriptions
Searching for a Suitable Description
SELECT ?x WHERE { ?x rdf:type foaf:Document }
SELECT ?x WHERE { ?x rdf:type foaf:Document . ?x rdf:type foaf:PersonalProfileDocument }
SELECT ?x WHERE { ?x rdf:type foaf:Document . ?x rdf:type sioc:Post . }
Declarative descriptions
Operations on the Declarative Description
Starting from an entity set described by the classes C1 and C2:
§ Add: C1, C2, C3
§ Delete: C1
§ Refine: C1sub, C2
§ Generalize: C1sup, C2
Just (Small) Baby Steps ...
Formal Concept Analysis
§ Constructs concepts and their hierarchy from data objects
§ Input (formal context)
  - Set O (the objects)
  - Set A (the attributes)
  - Relation I ⊆ O×A (which object has which attributes)
§ Derivation Operator
  - (Sub)Set of objects X: attributes common to all objects in X
    $X' := \{ y \in A : (x, y) \in I, \forall x \in X \}$
  - (Sub)Set of attributes Y: objects which have all attributes in Y
    $Y' := \{ x \in O : (x, y) \in I, \forall y \in Y \}$
Formal Concept Analysis
§ (X,Y) is a formal concept, if X'=Y and Y'=X
  - X is the extent
  - Y is the intent
  - Example: ({1,5,10},{a,c})
§ Partial order for formal concepts:
  - (X1,Y1) ≤ (X2,Y2) if X1 ⊂ X2
  - Equivalent to Y1 ⊃ Y2
  - Example: ({1,10},{a,b,c}) ≤ ({1,5,10},{a,b})
§ Defines a lattice structure on the concepts
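The derivation operators and the concept test can be sketched directly from these definitions. The small context used here is hypothetical and is not the one from the talk; the function names are chosen for this sketch.

```python
def common_attributes(X, I, A):
    """X': attributes shared by all objects in X."""
    return {y for y in A if all((x, y) in I for x in X)}

def common_objects(Y, I, O):
    """Y': objects that have all attributes in Y."""
    return {x for x in O if all((x, y) in I for y in Y)}

def is_formal_concept(X, Y, I, O, A):
    """(X, Y) is a formal concept if X' = Y and Y' = X."""
    return common_attributes(X, I, A) == set(Y) and common_objects(Y, I, O) == set(X)

# Hypothetical formal context: three objects, three attributes.
O = {1, 2, 3}
A = {"a", "b", "c"}
I = {(1, "a"), (1, "b"), (2, "a"), (3, "b"), (3, "c")}

shared = common_attributes({1, 2}, I, A)  # attributes common to objects 1 and 2
extent = common_objects({"a"}, I, O)      # objects that have attribute a
```

For the empty object set the operator returns all of A, because the condition holds vacuously.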
[Excerpt from the accompanying paper, shown on the slide]
Table 1. Example of two formal contexts over two different attribute sets. [Objects 1 to 10 against the attributes a, b, c and against the attributes x, y, z; the column positions of the cross marks were lost in extraction.]
Figure 2. Formal concept lattice structures based on the relations in Table 1. The concepts are represented by their intent, which provides a better overview. [Left lattice nodes: ∅, {a}, {b}, {c}, {a, b}, {a, c}, {b, c}, {a, b, c}. Right lattice nodes: ∅, {x}, {y}, {x, y}, {y, z}, {x, y, z}.]
4.1 Parallel Formal Concept Lattices
Assume we have two sets M1 and M2 which can serve as attributes to describe the objects in G. Accordingly, there are two relations I1 and I2. Then, we can construct two parallel formal concept lattices B(G, M1, I1) and B(G, M2, I2). Note that while the intent of the concepts in parallel lattices is defined over two different sets of attributes, the extent of the concepts is always based on the same set G. The idea of parallel lattices can easily be extended to an arbitrary number of attribute sets.
Example 2 (Parallel Concept Lattice). In Table 1, we have listed a second relation I2 over the set of attributes M2 = {x, y, z}. In I2 the same objects are related to a different set of attributes. If we construct a formal concept lattice over this relation we obtain the lattice on the right hand side in Figure 2.
4.2 Extension and Reduction Mappings Between Parallel Concept Lattices
We now introduce two mappings between parallel lattices which are defined over the extent of the formal concepts in the lattices. Such mappings will allow for the approximation of the extent of a concept from a base lattice using the extent of a concept in an alternative lattice. The concept in an alternative lattice provides an alternative representation via its intent composed over a different set of attributes.
Formal Concept Lattice (extent)
≤
Formal Concept Lattice (intent)
Top-Concept: (O,Ø)
Bottom-Concept: (Ø,A)
Navigating the Lattice
Remove constraints
Extend object set
Add constraints
Reduce object set
Nice formalization, but ...
… still Baby Steps
Parallel Lattices
§ Availability of several attribute sets
  - Facet dimensions
  - "Natural" subdivision
  - Different descriptions of the same data
General Idea for Mapping
§ A description (C1, C2) derives an entity set
§ An alternative description (C3, C4, C5) derives an approximate entity set
§ The approximate entity set should approximate the original entity set
Implementing Mappings
§ Minimal Extension
  - Top-Down
§ Maximal Reduction
  - Bottom-Up
§ Alternative description for {b,c}: {b,c}' = {1,4,6,9,10}
[Figure: the alternative lattice annotated with the object sets 1; 1,9; 10; 2,8; 3,5,7]
§ Precision? Recall?
Observations
§ On LOD: mapping type sets onto property sets
  - Evaluation on 20 data sets (subset of BTC'12)
§ Quality of approximations
  - max-red:
    • High recall: mainly > 0.8
    • Better for smaller concepts
  - min-ext:
    • Good precision: mainly > 0.5
    • Better for larger concepts
§ Example: {rss:Item, sioc:MicroblogPost} and {foaf:maker, sioc:has_discussion, dcterms:date}
Evolution of Linked Data
Evolution of LOD
[Figure: LOD cloud diagrams from 2007, 2008, 2009, 2010 and 2011]
Evolution of LOD
[Figure: volume of the triples provided by data sources over time]
§ Insertion, deletion, modification
Effects on Indices: Decline in accuracy
[Plot: Micro Avg. F1 (0.0 to 1.0) against Week of Data Snapshot (0 to 80) for the index models RDF Type, TS, PS, IPS, ECS and SchemEX]
Updates of Indices and Caches
Which sources to prioritise in an update?
Change Metrics
§ Comparison of two RDF data sets (e.g. from different points in time)
  - $X_i$: set of triple statements
  - Numeric expression for "distance": $\Delta(X_1, X_2) \in [0, \infty)$
§ Example: $\Delta_{Jaccard}(X_1, X_2) = 1 - \frac{|X_1 \cap X_2|}{|X_1 \cup X_2|}$
Suitable to measure dynamics???
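The Jaccard-based change metric on two sets of triples can be written as a short Python sketch. Returning 0 for two empty data sets is a convention chosen here, not something stated on the slide; the example triples are invented.

```python
def jaccard_distance(x1, x2):
    """1 - |X1 ∩ X2| / |X1 ∪ X2| for two sets of triples."""
    union = x1 | x2
    if not union:
        return 0.0  # convention for two empty data sets (an assumption)
    return 1.0 - len(x1 & x2) / len(union)

# Invented snapshots: one triple is replaced by another.
old = {("ex:a", "ex:p", "ex:b"), ("ex:a", "ex:q", "ex:c")}
new = {("ex:a", "ex:p", "ex:b"), ("ex:a", "ex:q", "ex:d")}
distance = jaccard_distance(old, new)  # 1 shared triple out of 3 in the union
```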
Toy example: Changes Analysis of LOD
[Figure: three snapshots of a membership graph over Institute ZBW, Institute WeST, Institute Paluno (2nd snapshot) and the persons Thomas, Gerd, Ansgar and Renata]
§ Changes detected between 1st and 2nd snapshot
  1. Deleted: <InstituteWEST hasMember Gerd>
  2. New: <InstitutePaluno hasMember Gerd>
§ Changes detected between 2nd and 3rd snapshot
  1. New: <InstituteWEST hasMember Gerd>
  2. Deleted: <InstitutePaluno hasMember Gerd>
§ Changes detected between 1st and 3rd snapshot: None!?!
Change metrics capture differences. We want to measure dynamics!
Measuring Dynamics: Requirements
§ Dynamics function Θ
  - quantify the evolution of a dataset X over a period of time:
    $\Theta_{t_i}^{t_j}(X) = \Theta(X_{t_j}) - \Theta(X_{t_i}) \geq 0$
§ Dynamics as amount of evolution
[Figure: Θ of X over time, between $t_i$ and $t_j$]
Constructing a Dynamics Function
§ Function Θ difficult to define directly
§ Indirect definition over a change rate function $c(X_t)$:
  $\Theta(X_{t_j}) - \Theta(X_{t_i}) = \int_{t_i}^{t_j} c(X_t) \, dt$
[Figure: Θ and c of X over time, between $t_i$ and $t_j$]
Change Rate Function
§ Also $c(X_t)$ not explicitly known!
§ But can be approximated!
  - Given snapshots of the data in small time intervals: $X_{t_1}, X_{t_2}, X_{t_3}, \ldots, X_{t_n}$
  - The change rate can be approximated via change metrics:
    $\frac{\Delta(X_{t_i}, X_{t_{i-1}})}{t_i - t_{i-1}} \xrightarrow{t_{i-1} \to t_i} c(X_{t_i}) = \frac{d}{dt} \Theta(X_{t_i})$
Dynamics Framework
§ Approximating $c(X_t)$ as step function:
  $\Theta_{t_1}^{t_n}(X) = \sum_{i=2}^{n} \Delta(X_{t_i}, X_{t_{i-1}})$
§ Choice of Δ: flexible use of different notions of change!
[Figure: Θ and the step function c of X over time, between $t_i$ and $t_j$]
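The sum over consecutive snapshots can be sketched in Python, here with the Jaccard distance as the choice of Δ and a reduced version of the toy example. The membership triple for Thomas is an assumption added so that the snapshots are not trivially small; the function names are hypothetical.

```python
def jaccard_distance(x1, x2):
    union = x1 | x2
    return 0.0 if not union else 1.0 - len(x1 & x2) / len(union)

def dynamics(snapshots, delta=jaccard_distance):
    """Sum of the change metric over all pairs of consecutive snapshots."""
    return sum(delta(curr, prev) for prev, curr in zip(snapshots, snapshots[1:]))

# Reduced toy example: Gerd moves from WeST to Paluno and back again.
stable = ("InstituteWEST", "hasMember", "Thomas")  # assumed unchanged triple
x1 = {stable, ("InstituteWEST", "hasMember", "Gerd")}
x2 = {stable, ("InstitutePaluno", "hasMember", "Gerd")}
x3 = {stable, ("InstituteWEST", "hasMember", "Gerd")}

direct_change = jaccard_distance(x1, x3)  # first vs. last snapshot only
theta = dynamics([x1, x2, x3])            # accumulates both intermediate changes
```

Comparing only the first and last snapshot reports no change, while the accumulated dynamics registers both moves.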
Introduction of Decay
§ So far:
  - Impact of evolution independent of moment in time
  - Desirable: focus on certain periods of time
    • e.g. recent past
§ Solution:
  - Decay function f to assign weights to moments in time
[Figure: change rate c, decay function f and the weighted rate f·c over time, between $t_i$ and $t_j$]
Implementing a Decay Function
§ Exponential decay function:
  $f(t) = e^{-\lambda t}$
§ Incorporated in the framework:
  $\Theta(X_{t_j}) - \Theta(X_{t_i}) = \int_{t_i}^{t_j} e^{-\lambda (t_j - t)} \cdot c(X_t) \, dt$
§ When using the step function approximation of $c(X_t)$:
  $\Theta_{t_1}^{t_n}(X) = \sum_{i=2}^{n} e^{-\lambda (t_n - t_i)} \cdot \Delta(X_{t_i}, X_{t_{i-1}})$
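The decayed sum can be sketched in Python as well. The snapshot times, the value of λ and the example sets are assumptions for illustration; Δ is again the Jaccard distance.

```python
import math

def jaccard_distance(x1, x2):
    union = x1 | x2
    return 0.0 if not union else 1.0 - len(x1 & x2) / len(union)

def decayed_dynamics(snapshots, lam, delta=jaccard_distance):
    """snapshots: list of (time, set of triples), sorted by time.

    Each change between consecutive snapshots is weighted by
    exp(-lam * (t_n - t_i)), so older changes count less.
    """
    t_n = snapshots[-1][0]
    return sum(
        math.exp(-lam * (t_n - t_i)) * delta(x_i, x_prev)
        for (_, x_prev), (t_i, x_i) in zip(snapshots, snapshots[1:])
    )

# Invented snapshots at times 0, 1, 2: one triple flips back and forth.
a = {("s", "p", "o1"), ("s", "q", "o")}
b = {("s", "p", "o2"), ("s", "q", "o")}
series = [(0, a), (1, b), (2, a)]

no_decay = decayed_dynamics(series, lam=0.0)    # equals the undecayed sum
with_decay = decayed_dynamics(series, lam=1.0)  # older change is down-weighted
```

With λ = 0 every weight is 1, so the decayed variant reduces to the plain sum of change metrics.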
Change Rate Function of Selected Data Sources
[Four plots of the change rate (0 to 1) over the snapshot dates from 2012-05-06 to November 2013]
§ dbpedia.org: Θ = 55.71, Θdecay = 23.42
§ identi.ca: Θ = 58.45, Θdecay = 18.48
§ linkedct.org: Θ = 51.75, Θdecay = 25.03
§ dbtune.org: Θ = 20.90, Θdecay = 8.33
[Overview figure with three parts. Managing: search data structure (keys k1 ... kn pointing to entries d1,1 d1,2 ...) for efficient storage and retrieval. Analysing: schema graph (E1, E2, rdf:type, dc:creator, dc:title, foaf:Document, swrc:InProceedings) and a plot of Micro Avg. F1 (0.0 to 1.0) over Week of Data Snapshot (0 to 80) for RDF Type, TS, PS, IPS, ECS and SchemEX. Making Use.]
Schema-level Search on LOD
Schema-based Access to the LOD cloud
Schema-level Index
Where?
• ACM
• DBLP
SELECT ?x
WHERE {
  ?x rdf:type foaf:Document .
  ?x rdf:type swrc:InProceedings .
  ?x dc:creator ?y .
  ?y rdf:type fb:Computer_Scientist
}
LODatio: Schema-level Search of LOD
LODatio: Query transformation
[Figure: the query graph (x with types foaf:Document and swrc:InProceedings, linked via dc:creator to fb:Computer_Scientist) is mapped onto the schema-level index elements tc2309, tc2101, eqc707 and ps2608, which point to the data source DBLP.]
Original query:
SELECT ?x
WHERE {
  ?x rdf:type foaf:Document .
  ?x rdf:type swrc:InProceedings .
  ?x dc:creator ?y .
  ?y rdf:type fb:Computer_Scientist
}
Transformed query on the schema-level index:
SELECT ?c
WHERE {
  ?eqc schemex:hasDataset ?c .
  ?tc_A schemex:hasSubset ?eqc .
  ?tc_A schemex:hasClass foaf:Document .
  ?tc_A schemex:hasClass swrc:InProceedings .
  ?bs void:subjectsTarget ?eqc .
  ?bs void:objectsTarget ?tc_B .
  ?bs void:property dc:creator .
  ?tc_B schemex:hasClass fb:Computer_Scientist
}
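The rewriting shown on this slide can be mimicked with a small string-building helper. This is a rough sketch, not LODatio's actual implementation: the function name `to_schema_level_query` is hypothetical, it only covers the single pattern used in the example (one subject with a set of types, one property, one object with a set of types), and the vocabulary terms are copied from the transformed query above.

```python
def to_schema_level_query(subject_types, prop, object_types):
    """Build a schema-level query for the pattern
    '?x rdf:type <subject_types> ; <prop> ?y . ?y rdf:type <object_types>'.

    Hypothetical helper for illustration; terms are taken from the slide.
    """
    lines = [
        "?eqc schemex:hasDataset ?c .",
        "?tc_A schemex:hasSubset ?eqc .",
    ]
    # One hasClass constraint per type required of the subject.
    lines += [f"?tc_A schemex:hasClass {t} ." for t in subject_types]
    lines += [
        "?bs void:subjectsTarget ?eqc .",
        "?bs void:objectsTarget ?tc_B .",
        f"?bs void:property {prop} .",
    ]
    # One hasClass constraint per type required of the object.
    lines += [f"?tc_B schemex:hasClass {t} ." for t in object_types]
    body = "\n".join("  " + line for line in lines)
    return "SELECT ?c\nWHERE {\n" + body + "\n}"


query = to_schema_level_query(
    ["foaf:Document", "swrc:InProceedings"],
    "dc:creator",
    ["fb:Computer_Scientist"],
)
print(query)
```

The generated text matches the slide's query except for a trailing dot after the last triple pattern, which SPARQL permits.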
LODatio: Retrieval Results
[Figure: result entry "DS 23" linked to C1 and EQCl, annotated with "Entity count" and "Example entities" (URI 1, URI 2, URI 3).]
LODatio: User Support
§ Currently implemented:
  w Moderate reductions / extensions
§ Next release:
  w Include alternative description based on parallel lattices
[Figure: schema-level index excerpt (DBLP; tc2309, tc2101, eqc707, ps2608; foaf:Document, swrc:InProceedings, fb:Computer_Scientist, dc:creator; DS 23) annotated with "further properties" and "further types".]
LODatio: next steps
Keyword search
Better recommendations
Other payload entities
Visual exploration
Related datasources
Coverage
Focused Exploration (work in progress)
Use Case: Social Media Coverage of Events
LinkedGeoData
[Figure: LinkedGeoData, OSM and owl:sameAs links. Open question: other locations?]
Extending LinkedGeoData
Seed Exploration Overlay
Task of Focused Exploration (use case: locations)
§ Prioritise/select object URIs for exploration
[Figure: a subject s of type umbel:Village with coordinates wgs84:long -1.404 and wgs84:lat 50.897. Its outgoing predicates (dbponto:isPartOf, dbponto:wikiPageExternalLink, dbponto:governmentType, dbpprop:settlementType, dbpprop:subdivisionName, dbpprop:postalCode, dcterms:subject) lead to objects o1 to o11, some of which carry wgs84:long and wgs84:lat values (xxx, yyy) themselves.]
Exploration based on Schema Semantics
§ Exploit rdfs:range definitions of predicates
§ Follow edges which lead to locations with higher priority
Example:
  dbponto:twinCity rdfs:range dbpedia:City
  dbpedia:City rdfs:subClassOf dbpedia:Place
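A minimal sketch of this idea, assuming the schema is available as two plain dictionaries (predicate to rdfs:range, class to a single rdfs:subClassOf parent). The function names and the choice of dbpedia:Place as the location class are illustrative assumptions, and real schemas may have several superclasses per class.

```python
# Schema facts from the example; the dictionary layout is an assumption.
RANGES = {"dbponto:twinCity": "dbpedia:City"}        # predicate -> rdfs:range
SUBCLASS_OF = {"dbpedia:City": "dbpedia:Place"}      # class -> rdfs:subClassOf


def is_subclass_of(cls, target, subclass_of):
    """Walk the rdfs:subClassOf chain upwards from cls, looking for target."""
    seen = set()
    while cls is not None and cls not in seen:
        if cls == target:
            return True
        seen.add(cls)
        cls = subclass_of.get(cls)
    return False


def prioritise(edges, ranges, subclass_of, location_class="dbpedia:Place"):
    """Put (predicate, object) edges whose predicate range is a location class first."""
    def leads_to_location(edge):
        rng = ranges.get(edge[0])
        return rng is not None and is_subclass_of(rng, location_class, subclass_of)

    # sorted() is stable, so the original order is kept within each group.
    return sorted(edges, key=lambda edge: not leads_to_location(edge))


edges = [("dcterms:subject", "o1"), ("dbponto:twinCity", "o2")]
ordered = prioritise(edges, RANGES, SUBCLASS_OF)
```

Predicates without an rdfs:range definition fall into the low-priority group, so this approach can only promote edges whose schema is explicitly declared.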
Supervised Machine Learning
§ Use incoming predicates as features
  w Learn predicates typically leading to locations
§ Train a classifier (e.g. Naive Bayes)
[Figure: an object o with wgs84:long and wgs84:lat values (xxx, yyy), a second object o', and incoming predicates p2, p3, p4 and p6.]
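To make the setup concrete, here is a small pure-Python sketch of a Naive Bayes classifier over incoming predicates. The slides do not state which variant was used, so the Bernoulli-style model (presence or absence of each predicate) with add-one smoothing is an assumption, as are the function names and the toy training data.

```python
import math
from collections import Counter


def train_nb(examples):
    """examples: list of (set_of_incoming_predicates, is_location)."""
    vocab = set().union(*(preds for preds, _ in examples))
    class_counts = Counter(label for _, label in examples)
    pred_counts = {label: Counter() for label in class_counts}
    for preds, label in examples:
        pred_counts[label].update(preds)
    return vocab, class_counts, pred_counts


def log_posterior(preds, label, model):
    """Unnormalised log P(label | preds) under the Bernoulli assumption."""
    vocab, class_counts, pred_counts = model
    n = class_counts[label]
    score = math.log(n / sum(class_counts.values()))
    for p in vocab:
        # Add-one smoothed probability that predicate p is present for this class.
        prob = (pred_counts[label][p] + 1) / (n + 2)
        score += math.log(prob if p in preds else 1.0 - prob)
    return score


def predict(preds, model):
    """Return the label with the highest posterior score."""
    _, class_counts, _ = model
    return max(class_counts, key=lambda label: log_posterior(preds, label, model))


# Toy training data (made up for illustration).
examples = [
    ({"dbponto:isPartOf", "dbponto:twinCity"}, True),
    ({"dbponto:isPartOf"}, True),
    ({"dcterms:subject"}, False),
    ({"dcterms:subject", "dbponto:wikiPageExternalLink"}, False),
]
model = train_nb(examples)
```

In a focused exploration setting, `predict` would be applied to each candidate object URI before dereferencing it, using only the predicates already seen pointing at that URI.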
IR Inspired Approaches
§ Model discriminativeness of predicates
  w Inspired by tf-idf
§ Property relevance frequency (prf):
$$\mathrm{prf} = c(p, L)$$
    • Normalised version (prr)
§ Inverse property frequency:
$$\mathrm{ipf} = \log\left(\frac{c(*,*)}{c(p,*)}\right)$$
§ Rank by combined measure: prf-ipf
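The two measures can be computed from simple counts. The sketch below makes several assumptions that the slide does not spell out: c(p, L) is read as the number of triples with predicate p whose object is a known location, c(p, *) as the number of triples with predicate p, c(*, *) as the total number of triples, and the combined score is taken as the product prf · ipf by analogy with tf-idf. The function name and toy data are illustrative.

```python
import math
from collections import Counter


def prf_ipf_scores(triples, locations):
    """Score each predicate by prf * ipf.

    triples: iterable of (subject, predicate, object)
    locations: set of objects known to be locations
    """
    c_p_any = Counter()  # c(p, *)
    c_p_loc = Counter()  # c(p, L)
    for _, p, o in triples:
        c_p_any[p] += 1
        if o in locations:
            c_p_loc[p] += 1
    c_any_any = sum(c_p_any.values())  # c(*, *)

    scores = {}
    for p, n in c_p_any.items():
        prf = c_p_loc[p]
        ipf = math.log(c_any_any / n)
        scores[p] = prf * ipf
    return scores


# Toy data (made up): objects "a" and "b" are known locations.
triples = [
    ("s1", "twinCity", "a"),
    ("s2", "twinCity", "b"),
    ("s1", "subject", "c"),
    ("s2", "subject", "a"),
    ("s3", "subject", "d"),
    ("s3", "label", "e"),
]
scores = prf_ipf_scores(triples, {"a", "b"})
ranking = sorted(scores, key=scores.get, reverse=True)
```

Predicates that often lead to locations but are rare overall receive the highest scores, mirroring how tf-idf favours terms that are frequent in a document but rare in the collection.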
Performance
[Figure: ROC curves for random, Schema Semantics, NB (all predicates), NB (present predicates), prf-ipf and prr-ipf. Both axes range from 0 to 1; an inset zooms into the region from 0 to 0.05 on the x-axis and from 0.95 to 1 on the y-axis.]
Performance
Table 2. Average performance of approaches († indicates significant improvements at confidence level ρ = 0.01)
Method                     Recall      Precision   F1          Accuracy    AUC
Schema Semantics           0.1188      0.8119      0.2073      0.7262      0.5552
NB (all predicates)        0.9906      0.9491      † 0.9694    † 0.9812    0.9970
NB (observed predicates)   0.9943      0.9436      0.9683      0.9804      0.9968
prf-ipf                    0.8512      † 0.9754    0.9091      0.9487      0.9958
prr-ipf                    † 0.9973    0.9240      0.9592      0.9745      0.9769
… performance in bold. Furthermore, we marked the results where we had a significant improvement over the second best method at confidence level of ρ = 0.01. The aggregated values basically confirm the observations made above. In general, when considering the measures F1, Accuracy and AUC, the Naive Bayes classifier making use of all predicates performs best. However, the advantage in comparison to the Naive Bayes classifier using only observed terms is negligible. In application scenarios where a high Recall is of importance, the prr-ipf approach instead achieves the best results with more than 99.7%. When focusing on Precision, prf-ipf performs best and demonstrated the highest values. More than 97% of the objects predicted to have geo-coordinates actually did provide such information. In a setting where we want to focus on promising items this might be the kind of performance the end user is looking for.
One explanation for the very high accuracy in general might also be the dataset. Given that we started the exploration from location entities on DBPedia and LinkedGeoData, the overall dataset was biased towards entities from DBPedia. Hence, we intend to extend the evaluation to see if the quality of the supervised approaches remains at a comparable level when using larger and even more diverse datasets.
6 Related Work
Previous work related to this paper can be found in three areas, each of which will be described below: (a) Extraction of geographic entities provides a starting point for our approach. The fields of (b) focused crawling on the WWW and (c) machine learning applied to Linked Data in general each share some similarities with our classification and ranking task, although differences do exist.
6.1 Extraction of Geographic Entities
Work done in the TRIDEC project [7] examined how geographic databases such as Geonames, OpenStreetMap and GooglePlaces could be used to avoid the need for error prone named entity recognition and thus increase the overall precision when geoparsing large volumes of Twitter reports for crisis mapping. This work directly compared crisis maps from Twitter with official post-disaster environment agency impact assessments, highlighting just how accurate maps based on large-scale geospatial report crowd sourcing can be. We are building on this approach within the REVEAL project and extending
[Overview figure repeated: Managing (search data structure, efficient storage and retrieval), Analysing (schema graph and plot of Micro Avg. F1 over Week of Data Snapshot), Making Use.]
Summary
[Figure: properties of LOD (rich, huge, diverse, public, distributed, knowledge base, on the Web) arranged under "Pros" and "Cons"; "diverse" is listed twice. The search data structure sketch (k1 ... kn, d1,1 d1,2 ...) is shown alongside.]
Technical solutions to some of the problems
Thank you!
Contact: Thomas Gottron WeST – Institute for Web Science and Technologies Universität Koblenz-Landau [email protected]
References
1. M. Konrath, T. Gottron, and A. Scherp, “Schemex – web-scale indexed schema extraction of linked open data,” in Semantic Web Challenge, Submission to the Billion Triple Track, 2011.
2. M. Konrath, T. Gottron, S. Staab, and A. Scherp, “Schemex—efficient construction of a data catalogue by stream-based indexing of linked data,” Journal of Web Semantics, 2012.
3. T. Gottron, M. Knauf, S. Scheglmann, and A. Scherp, “Explicit and implicit schema information on the linked open data cloud: Joined forces or antagonists?,” Tech. Rep. 06/2012, Institut WeST, Universität Koblenz-Landau, 2012.
4. T. Gottron and R. Pickhardt, “A detailed analysis of the quality of stream-based schema construction on linked open data,” in CSWS’12: Proceedings of the Chinese Semantic Web Symposium, 2012.
5. T. Gottron, A. Scherp, B. Krayer, and A. Peters, “Get the google feeling: Supporting users in finding relevant sources of linked open data at web-scale,” in Semantic Web Challenge, Submission to the Billion Triple Track, 2012.
6. T. Gottron, A. Scherp, B. Krayer, and A. Peters, “LODatio: Using a Schema-Based Index to Support Users in Finding Relevant Sources of Linked Data,” in K-CAP’13: Proceedings of the Conference on Knowledge Capture, 2013.
7. T. Gottron, M. Knauf, S. Scheglmann, and A. Scherp, “A Systematic Investigation of Explicit and Implicit Schema Information on the Linked Open Data Cloud,” in ESWC’13: Proceedings of the 10th Extended Semantic Web Conference, 2013.
8. J. Schaible, T. Gottron, S. Scheglmann, and A. Scherp, “LOVER: Support for Modeling Data Using Linked Open Vocabularies,” in LWDM’13: 3rd International Workshop on Linked Web Data Management, 2013.
9. R. Dividino, A. Scherp, G. Gröner, and T. Gottron, “Change-a-LOD: Does the Schema on the Linked Data Cloud Change or Not?,” in COLD’13: International Workshop on Consuming Linked Data, 2013.
10. T. Gottron, M. Knauf, and A. Scherp, “Analysis of schema structures in the linked open data graph based on unique subject uris, pay-level domains, and vocabulary usage,” Distributed and Parallel Databases, pp. 1–39, 2014.
11. T. Gottron and C. Gottron, “Perplexity of index models over evolving linked data,” in ESWC’14: Proceedings of the Extended Semantic Web Conference, 2014.
12. T. Gottron, A. Scherp, and S. Scheglmann, “Providing alternative declarative descriptions for entity sets using parallel concept lattices,” in ESWC’14: Proceedings of the Extended Semantic Web Conference, 2014.
13. G. Carothers, “RDF 1.1 N-Quads,” W3C Recommendation, Feb. 2014, http://www.w3.org/TR/2014/REC-n-quads-20140225/ (accessed 14 March 2014).
14. T. Käfer, A. Abdelrahman, J. Umbrich, P. O’Byrne, and A. Hogan, “Observing linked data dynamics,” in The Semantic Web: Semantics and Big Data, Lecture Notes in Computer Science, vol. 7882, pp. 213–227, Springer Berlin Heidelberg, 2013.
15. T. Gottron, “Of Sampling and Smoothing: Approximating Distributions over Linked Open Data,” in PROFILES’14: Proceedings of the Workshop on Dataset Profiling and Federated Search for Linked Data, 2014.
16. R. Dividino, T. Gottron, A. Scherp, and G. Gröner, “From Changes to Dynamics: Dynamics Analysis of Linked Open Data Sources,” in PROFILES’14: Proceedings of the Workshop on Dataset Profiling and Federated Search for Linked Data, 2014.
17. R. Dividino, A. Kramer, and T. Gottron, “An Investigation of HTTP Header Information for Detecting Changes of Linked Open Data Sources,” in ESWC’14: Proceedings of the Extended Semantic Web Conference, 2014.
Sources
• Photograph of three of Nevins Memorial Library's earliest librarians. Wikimedia Commons collection, http://commons.wikimedia.org/wiki/File:Nevins_Library_First_Librarians.jpg
• Wide-angle view of the ALMA correlator. This Wikipedia and Wikimedia Commons image is from the European Southern Observatory (ESO) and is freely available at http://commons.wikimedia.org/wiki/File:Wide-angle_view_of_the_ALMA_correlator.jpg under Creative Commons Attribution 3.0 Unported license.
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/, This work is available under a CC-BY-SA license.