Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

Institute for Web Science & Technologies – WeST Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data Thomas Gottron July 18th, 2014 IRSS

Transcript of Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

Page 1: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

Institute for Web Science & Technologies – WeST

Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

Thomas Gottron

July 18th, 2014 IRSS

Page 2: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

Thomas Gottron IRSS, Athens, 18.7.2014, 2 Leveraging the Web of Data

Linked Open Data – Vision of a Web of Data

• „Classic“ Web
  - Linked documents

• Web of Data
  - Linked data entities

Page 3: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

Linked Open Data – Vision of a Web of Data

• „Classic“ Web

• Web of Data

[Figure: in the Web of Data, the linked entities carry IDs.]

Page 4: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

LOD – Base technologies

• IDs: dereferenceable HTTP URIs
• Data format: RDF
• No schema
• Links to other data sources

[Figure: example RDF graph with the nodes http://dblp.l3s.de/.../NesterovAM98, http://dblp.l3s.de/.../Serge_Abiteboul, http://www.bibsonomy.org/.../Serge+Abiteboul, foaf:Document, swrc:InProceedings and fb:Computer_Scientist, and the literals „Extracting schema ...“ and „Serge Abiteboul“. The edges are labelled rdf:type, dc:creator, dc:title, foaf:name and rdfs:seeAlso.]

1 statement = 1 triple (subject, predicate, object)

Prefixed names abbreviate full URIs:

rdf:type = http://www.w3.org/1999/02/22-rdf-syntax-ns#type
foaf:Document = http://xmlns.com/foaf/0.1/Document
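The prefix notation can be illustrated with a minimal Python sketch. The two namespace URIs are the ones given on the slide; the function name `expand`, the `PREFIXES` table and the subject URI `http://example.org/paper/1` are hypothetical and only serve the illustration.

```python
# Namespace prefixes as given on the slide; everything else is illustrative.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(name: str, prefixes: dict = PREFIXES) -> str:
    """Expand a prefixed name such as 'rdf:type' into a full URI."""
    prefix, _, local = name.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"unknown prefix: {prefix!r}")
    return prefixes[prefix] + local


# One statement = one triple (subject, predicate, object).
# The subject URI is a hypothetical placeholder, not a real identifier.
triple = (
    "http://example.org/paper/1",
    expand("rdf:type"),
    expand("foaf:Document"),
)
```

Each component of `triple` is a full URI, so the statement stays unambiguous regardless of which prefixes another data source uses.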

Page 5: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

LOD Cloud

… the Web of Linked Data consisting of more than 30 Billion RDF triples from hundreds of data sources …

Gerhard Weikum: Where’s the Data in the Big Data Wave? SIGMOD Blog, 6.3.2013, http://wp.sigmod.org/

Page 6: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

Some „Bubbles“ of the LOD Cloud

Page 7: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

Making Use of the Linked Data Cloud ...

Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

LOD: a rich, huge, diverse, public and distributed knowledge base on the Web.

[Figure: the properties from the sentence above (rich, knowledge base, diverse, public, huge, on the Web, distributed) sorted into pros and cons; „diverse“ appears twice.]

Find technical solutions to overcome challenges

Page 8: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

[Overview figure: the three parts of the talk, Managing, Analysing and Making Use, illustrated by (1) an index in which a search data structure maps keys k1 … kn to data items d1,1 … dn,3 for efficient storage and retrieval, (2) a plot of micro-averaged F1 over the week of the data snapshot for RDF Type, TS, PS, IPS, ECS and SchemEX, and (3) an example graph around the entities E1 and E2 with rdf:type, dc:creator, dc:title, foaf:Document, swrc:InProceedings and the title „Bad News ...“.]

Page 9: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

[Overview figure repeated: Managing (index for efficient storage and retrieval), Analysing, Making Use.]

Page 10: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

Data Format

• Linked Data as N-Quads: (s, p, o, c)
  - Triple (s, p, o): what is the information?
  - Context URI c: where does it come from?
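As an illustration of the quad format, the following Python sketch splits one line into subject, predicate, object and context. It is a deliberately simplified reader, not a complete N-Quads parser (for instance, it requires the context term, which the W3C format treats as optional). The names `Quad`, `TERM`, `QUAD_LINE` and `parse_quad` as well as the example URIs are assumptions made for this sketch.

```python
import re
from typing import NamedTuple


class Quad(NamedTuple):
    s: str  # subject
    p: str  # predicate
    o: str  # object
    c: str  # context: where the triple comes from


# One term: an IRI in angle brackets, a blank node, or a quoted literal
# with an optional language tag or datatype. Simplified on purpose.
TERM = r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:@[A-Za-z-]+|\^\^<[^>]*>)?)'
QUAD_LINE = re.compile(rf"^\s*{TERM}\s+{TERM}\s+{TERM}\s+{TERM}\s*\.\s*$")


def parse_quad(line: str) -> Quad:
    """Parse a single (simplified) N-Quads line into a Quad."""
    match = QUAD_LINE.match(line)
    if match is None:
        raise ValueError(f"not a (simplified) N-Quads line: {line!r}")
    return Quad(*match.groups())


# Hypothetical example line; the example.org URIs are placeholders.
example = parse_quad(
    '<http://example.org/s> <http://xmlns.com/foaf/0.1/name> '
    '"Steffen Staab" <http://example.org/doc> .'
)
```

The triple part is available as `example[:3]`, the provenance as `example.c`.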

Page 11: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

Index Models

Page 12: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

(Abstract) Index Models

• D: data elements to be retrieved (payload)
• K: key elements to access the data (index elements)
• σ: selection function, i.e. how to get data for a key

\[ \sigma : K \rightarrow \wp(D) \]

[Figure: a search data structure for efficient storage and retrieval maps the keys k1, k2, k3, ..., kn to their data items (payload): d1,1 d1,2 d1,3 ...; d2,1 d2,2; d3,1 d3,2 d3,3 ...; dn,1 dn,2 dn,3 ...]
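The abstract model (keys K, payload D, selection function σ) could be sketched in Python as follows. The class `IndexModel`, its method names and the example quads are hypothetical; the sketch only shows how the choice of key and payload is kept separate from the lookup itself.

```python
from collections import defaultdict


class IndexModel:
    """Abstract index: keys K, payload D and a selection function sigma."""

    def __init__(self, key_of, payload_of):
        self._key_of = key_of          # quad -> key element (in K)
        self._payload_of = payload_of  # quad -> data element (in D)
        self._table = defaultdict(set)

    def add(self, quad):
        """Register one quad (s, p, o, c) under its key."""
        self._table[self._key_of(quad)].add(self._payload_of(quad))

    def sigma(self, key):
        """Selection function: a subset of D, empty for unknown keys."""
        return frozenset(self._table.get(key, ()))


# Hypothetical quads (s, p, o, c), for illustration only.
quads = [
    ("ex:a", "rdf:type", "foaf:Person", "ex:source1"),
    ("ex:b", "rdf:type", "foaf:Person", "ex:source2"),
    ("ex:a", "foaf:knows", "ex:b", "ex:source1"),
]

# One possible choice: key = type, payload = data source (context).
type_to_sources = IndexModel(key_of=lambda q: q[2], payload_of=lambda q: q[3])
for quad in quads:
    if quad[1] == "rdf:type":
        type_to_sources.add(quad)
```

Swapping the two lambdas yields a different index model over the same data, which is the point of treating key and payload as independent choices.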

Page 13: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

Choices for Key Elements

Each choice takes one element of the quad (s, p, o, c) as the key of the search data structure:

• Subject: keys s1, s2, ..., sn
• Context: keys c1, c2, ..., cm
• Literals: keys x, y, z taken from literal objects in quads (s, p, lit, c)
• Types: keys t1, t2, ..., tk taken from quads (s, rdf:type, t, c)
• ...

Alternative: predicates or objects

Page 14: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

Choices for the Payload

• Full caching: quads (s, p, o, c)
• Triples: (s, p, o)
• Entities: s
• Data sources: c
• ...

[Figure: each option is drawn as a split between what is held locally and what remains on the Web.]

Page 15: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

Concrete Example: Subject Based Index Model

Key s → payload (s, p1, o1), (s, p2, o2), ...

west:Gottron → (west:Gottron, rdf:type, foaf:Person), (west:Gottron, foaf:knows, west:Staab), ...
west:Staab → (west:Staab, swrc:institution, west:WeST), (west:Staab, foaf:name, „Steffen Staab“), ...
west:Schegi → (west:Schegi, rdf:type, foaf:Person), (west:Schegi, foaf:name, „Stefan Scheglmann“)
...
tud:CGottron → (tud:CGottron, swrc:institution, tud:KOM), (tud:CGottron, foaf:knows, west:Gottron), ...
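The subject based index from this example can be built with a few lines of Python. The triples are the ones listed on the slide; the function name `build_subject_index` and the use of an in-memory dictionary are assumptions of this sketch, not a description of the actual implementation.

```python
from collections import defaultdict

# Triples as listed on the slide (the elided "..." entries are left out).
TRIPLES = [
    ("west:Gottron", "rdf:type", "foaf:Person"),
    ("west:Gottron", "foaf:knows", "west:Staab"),
    ("west:Staab", "swrc:institution", "west:WeST"),
    ("west:Staab", "foaf:name", "Steffen Staab"),
    ("west:Schegi", "rdf:type", "foaf:Person"),
    ("west:Schegi", "foaf:name", "Stefan Scheglmann"),
    ("tud:CGottron", "swrc:institution", "tud:KOM"),
    ("tud:CGottron", "foaf:knows", "west:Gottron"),
]


def build_subject_index(triples):
    """Key: subject URI. Payload: all triples with that subject."""
    index = defaultdict(list)
    for s, p, o in triples:
        index[s].append((s, p, o))
    return dict(index)


subject_index = build_subject_index(TRIPLES)
```

A lookup such as `subject_index["west:Staab"]` then returns every statement about that entity in one step.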

Page 16: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

Implemented Index Models

• Triple based: the key of a triple (s, p, o) is its subject s, its predicate p, its object o, or a term
• Meta data: the key is the context c or the PLD (pay-level domain)

https://github.com/gottron/lod-index-models

Page 17: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

Schema-level Indices

Page 18: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

Schema Information on the LOD Cloud

(No) Schema?

Guidelines / best practices

Automatic tools Social effects

Emerging Schema!

Induce from data observations

Page 19: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

Examples for Schema Information

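• Property set: the set of properties {p1, p2, p3} used by an entity x; the index maps this set to the entities x, ...
• Type set: the set of types {t1, t2} assigned via rdf:type to an entity y; the index maps this set to the entities y, ...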

Page 20: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

Implemented Index Models

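• Triple based: the key of a triple (s, p, o) is its subject s, its predicate p, its object o, or a term
• Meta data: the key is the context c or the PLD (pay-level domain)
• Schema-level: the key for an entity s is a single type, a set of types (t, t, ...), a set of properties (p, p, ...), a set of inverse properties (p⁻¹, p⁻¹, ...) of an object o, a combination of types and properties, or its SchemEX class

https://github.com/gottron/lod-index-models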

Page 21: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

SchemEX

Page 22: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

Schema-based Access to the LOD cloud

[Figure: query pattern for an unknown entity x with the types foaf:Document and swrc:InProceedings and a dc:creator of type fb:Computer_Scientist.]

SELECT ?x
WHERE {
  ?x rdf:type foaf:Document .
  ?x rdf:type swrc:InProceedings .
  ?x dc:creator ?y .
  ?y rdf:type fb:Computer_Scientist
}

Page 23: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

Schema-based Access to the LOD cloud

Schema-level Index

Where?
• ACM
• DBLP

SELECT ?x
WHERE {
  ?x rdf:type foaf:Document .
  ?x rdf:type swrc:InProceedings .
  ?x dc:creator ?y .
  ?y rdf:type fb:Computer_Scientist
}

Which schema information?

Page 24: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

Typecluster

• Entities with the same set of types

[Figure: a type cluster TCj is defined by the types t1, t2, ..., tn and points to the data sources C1, C2, ..., Cm.]

Page 25: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

Typecluster: Example

[Figure: type cluster tc2309 with the types foaf:Document and swrc:InProceedings, pointing to the data sources DBLP and ACM.]

Page 26: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

Property Sets

• Entities with the same set of properties

[Figure: a property set PSi is defined by the properties p1, p2, ..., pn and points to the data sources C1, C2, ..., Cm.]

Page 27: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

Bi-Simulation: Example

[Figure: property set ps2608 with the property dc:creator, pointing to the data sources BBC and DBLP.]

Page 28: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

SchemEX: Combination TC and PS

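• Partition of TC based on PS with restrictions on the destination TC (equivalence relation)

[Figure, schema level: a type cluster TCj (types t1, t2, ..., tn) is split into equivalence classes EQCl; a property set PSi (properties p1, p2, ..., pn'') links an equivalence class to a destination type cluster TCk (types t45, t2, ..., tn'). Payload level: the equivalence classes point to the data sources C1, C2, ..., Cm and Cx.]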

Page 29: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

SchemEX: Example

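[Figure: type cluster tc2309 (foaf:Document, swrc:InProceedings) contains the equivalence class eqc707, which is linked via property set ps2608 (dc:creator) to type cluster tc2101 (fb:Computer_Scientist) and points to the data sources DBLP, ...]

SELECT ?x
WHERE {
  ?x rdf:type foaf:Document .
  ?x rdf:type swrc:InProceedings .
  ?x dc:creator ?y .
  ?y rdf:type fb:Computer_Scientist
}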
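How such a schema-level lookup might answer the question "which data sources can contribute to this query?" is sketched below in Python. The identifiers tc2309, tc2101, eqc707 and the source DBLP come from the slide; the dictionary layout, the function `data_sources` and the restriction to exact type-set matches and a single property are simplifying assumptions of this sketch, not the actual SchemEX data structure.

```python
# Type clusters: set of types -> cluster id (ids as shown on the slide).
TYPE_CLUSTERS = {
    frozenset({"foaf:Document", "swrc:InProceedings"}): "tc2309",
    frozenset({"fb:Computer_Scientist"}): "tc2101",
}

# Assumed layout: (source type cluster, property, destination type cluster)
# -> (equivalence class id, data sources).
EQUIVALENCE_CLASSES = {
    ("tc2309", "dc:creator", "tc2101"): ("eqc707", {"DBLP"}),
}


def data_sources(subject_types, prop, object_types):
    """Return the data sources that match the schema pattern, if any."""
    tc_subject = TYPE_CLUSTERS.get(frozenset(subject_types))
    tc_object = TYPE_CLUSTERS.get(frozenset(object_types))
    entry = EQUIVALENCE_CLASSES.get((tc_subject, prop, tc_object))
    return set() if entry is None else set(entry[1])


# Schema pattern of the SPARQL query above.
sources = data_sources(
    {"foaf:Document", "swrc:InProceedings"},
    "dc:creator",
    {"fb:Computer_Scientist"},
)
```

Only the returned sources would then have to be queried for the actual entities, instead of the whole LOD cloud.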

Page 30: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

Building Indices

Page 31: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

Building Indices: Operators

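• Combination of few simple operations
  - Aggregate, Join, Invert

• Example: Property Set index

Input quads (s, p, o, c):
(s1, p1, o1, c1)
(s1, p2, o1, c1)
(s2, p2, o2, c1)
(s3, p1, o3, c1)
(s3, p2, o4, c1)
(s4, p3, o1, c1)

After Aggregate (subject → property set):
s1 → {p1, p2}
s2 → {p2}
s3 → {p1, p2}
s4 → {p3}

After Invert (property set → subjects):
{p1, p2} → {s1, s3}
{p2} → {s2}
{p3} → {s4}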
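The two operators used in this example can be written down directly. The following Python sketch reproduces the slide's data; the function names `aggregate` and `invert` mirror the operator names, but their signatures are assumptions of this sketch.

```python
from collections import defaultdict

# Quads (s, p, o, c) from the slide's example.
QUADS = [
    ("s1", "p1", "o1", "c1"),
    ("s1", "p2", "o1", "c1"),
    ("s2", "p2", "o2", "c1"),
    ("s3", "p1", "o3", "c1"),
    ("s3", "p2", "o4", "c1"),
    ("s4", "p3", "o1", "c1"),
]


def aggregate(quads):
    """Aggregate: collect the set of properties used by each subject."""
    props = defaultdict(set)
    for s, p, _o, _c in quads:
        props[s].add(p)
    return {s: frozenset(ps) for s, ps in props.items()}


def invert(mapping):
    """Invert: turn subject -> property set into property set -> subjects."""
    inverted = defaultdict(set)
    for s, property_set in mapping.items():
        inverted[property_set].add(s)
    return dict(inverted)


property_set_index = invert(aggregate(QUADS))
```

The frozen sets make the property sets hashable, so they can serve as dictionary keys after the inversion.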

Page 32: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

SchemEX: Computation

• Precise computation: 3 Aggregates + Join + Invert

[Figure: SchemEX structure with type clusters (TC), property sets (PS), equivalence classes (EQC) on the schema level and data sources (C1, C2, ..., Cm, Cx) on the payload level.]

Better approach? (Faster, more scalable)

Page 33: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

Stream-based Computation of SchemEX

• LOD crawler: stream of N-Quads

… Q16, Q15, Q14, Q13, Q12, Q11, Q10, Q9, Q8, Q7, Q6, Q5, Q4, Q3, Q2, Q1

[Figure: the quads of the stream pass through a FiFo cache; the type information collected there (t1, t2, t3) feeds the SchemEX structure of type clusters (TC), property sets (PS), equivalence classes (EQC) and data sources (C1, C2, ..., Cm, Cx).]
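One way to picture the stream-based approach is a bounded FIFO cache of entities: schema information is collected while an entity is in the cache and written to the index when the entity is evicted. The Python sketch below applies this idea to a plain type-set index only. The function `type_cluster_index`, the cache policy details and the tiny example stream are assumptions made for illustration, not the SchemEX implementation.

```python
from collections import OrderedDict, defaultdict

RDF_TYPE = "rdf:type"


def type_cluster_index(quads, cache_size):
    """Build a type set -> contexts index from a quad stream with a FIFO cache."""
    if cache_size < 1:
        raise ValueError("cache_size must be at least 1")
    cache = OrderedDict()     # subject -> (types seen, contexts seen)
    index = defaultdict(set)  # frozenset of types -> contexts

    def flush(types, contexts):
        if types:
            index[frozenset(types)].update(contexts)

    for s, p, o, c in quads:
        if s not in cache:
            if len(cache) >= cache_size:
                # Evict the entity that entered the cache first (FIFO).
                _, (old_types, old_contexts) = cache.popitem(last=False)
                flush(old_types, old_contexts)
            cache[s] = (set(), set())
        types, contexts = cache[s]
        contexts.add(c)
        if p == RDF_TYPE:
            types.add(o)

    for types, contexts in cache.values():
        flush(types, contexts)
    return dict(index)


# Hypothetical stream: the second type of e1 arrives after e2.
STREAM = [
    ("e1", RDF_TYPE, "t1", "c1"),
    ("e2", RDF_TYPE, "t2", "c1"),
    ("e1", RDF_TYPE, "t2", "c1"),
]
large_cache = type_cluster_index(STREAM, cache_size=2)
small_cache = type_cluster_index(STREAM, cache_size=1)
```

With the larger cache, e1 ends up in the type set {t1, t2}; with a cache of size 1 it is evicted too early and is indexed under {t1} and later under {t2}. This is the kind of approximation error a bounded cache introduces.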

Page 34: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

Quality of Approximated Index

• Stream-based computation vs. precise computation
  - Data set of 11 million triples

Page 35: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

SchemEX: 1st place @ BTC 2011

• SchemEX
  - Allows complex queries (star, chain)
  - Scalable computation
  - High quality

• Index over BTC 2011 data
  - 2.17 billion triples
  - Index: 55 million triples

• Commodity hardware
  - VM: 1 core, 4 GB RAM
  - Throughput: 39,500 triples / second
  - Computation of full index: 15 h
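The reported figures are consistent with each other, as a quick calculation shows. The numbers are the ones from the slide; the variable names are chosen for this sketch only.

```python
# Figures as reported on the slide.
TRIPLES_INDEXED = 2.17e9   # triples in the BTC 2011 data
INDEX_SIZE = 55e6          # triples in the resulting index
THROUGHPUT = 39_500        # triples per second

seconds = TRIPLES_INDEXED / THROUGHPUT
hours = seconds / 3600
size_ratio = TRIPLES_INDEXED / INDEX_SIZE
```

This gives roughly 15.3 hours of processing time, in line with the stated 15 h, and an index that is smaller than the data by a factor of about 39.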

Page 36: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

[Overview figure repeated: Managing (index for efficient storage and retrieval), Analysing, Making Use.]

Page 37: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

Redundancy of Schema Information

Page 38: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

Explicit and Implicit Schema Information on LOD

• Explicit: assigning class types (Entity → rdf:type → Class Type)
• Implicit: modelling attributes (Entity → Property → Entity 2)

[Figure: example entity E1 with rdf:type foaf:Document and rdf:type swrc:InProceedings (explicit), and with dc:title „Bad News ...“ and dc:creator E2 (implicit).]

To which degree?

Page 39: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

Probabilistic model of schema information

• Joint distribution P(T,R) over
  - Type sets: TS (e.g. foaf:Document, swrc:InProceedings)
  - Property sets: PS (e.g. dc:creator, dc:title)

• P(T=t, R=r): probability that a resource has type set t and property set r

P(T,R)   r1    r2    r3    r4    P(T)
t1       14%   2%    5%    8%    29%
t2       5%    15%   2%    3%    25%
t3       7%    3%    30%   5%    45%
P(R)     26%   20%   37%   17%

P(T) and P(R) are the marginal distributions.

Page 40: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

Estimating probabilities

• To do (on large data sets)
  - Determine schema use
  - Aggregate counts

• „Query“ schema-level index:

\[ p(t, r) = \frac{\left| \{ d \in \sigma(t, r) \} \right|}{N} \]

• Data background: segments from BTC'12

Data set   Triples   TS          PS
Rest       22.3M     793         7,522
Datahub    910.1M    28,924      14,712
DBpedia    198.1M    1,026,272   391,170
Freebase   101.2M    69,732      162,023
Timbl      204.8M    4,139       9,619
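The estimate amounts to counting how many resources share a combination of type set and property set and dividing by the total. A minimal Python sketch follows, under the assumption that N is the number of resources considered; the function `joint_distribution` and the toy data are hypothetical.

```python
from collections import Counter


def joint_distribution(resources):
    """Estimate p(t, r) from (type set, property set) pairs, one per resource."""
    counts = Counter((frozenset(t), frozenset(r)) for t, r in resources)
    n = sum(counts.values())  # assumption: N = number of resources
    return {key: count / n for key, count in counts.items()}


# Hypothetical toy data: four resources.
RESOURCES = [
    ({"t1"}, {"p1"}),
    ({"t1"}, {"p1"}),
    ({"t1"}, {"p2"}),
    ({"t2"}, {"p2"}),
]
p = joint_distribution(RESOURCES)
```

In a schema-level index, the count for each combination is simply the size of the payload returned for that key, so no second pass over the raw data is needed.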

Page 41: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

Redundancy

To which degree can one type of information (either types or properties) explain the respective other?

• Mutual Information

\[ I(T,R) = \sum_{t \in TS} \sum_{r \in PS} p(t,r) \cdot \log \left( \frac{p(t,r)}{P(T=t) \cdot P(R=r)} \right) \]

• Normalized MI:

\[ I_0(T,R) = \frac{I(T,R)}{\min\left( H(T), H(R) \right)} \]

    • H(T) and H(R): entropy of the marginal distributions.
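Both quantities are straightforward to compute from a joint distribution. The Python sketch below follows the two formulas; the function name and the convention of renormalising the input are assumptions of this sketch. Feeding it rounded percentages will generally not reproduce values that were computed from exact counts.

```python
import math
from collections import defaultdict


def normalized_mutual_information(joint):
    """joint: dict (t, r) -> probability or count; renormalised to sum to 1."""
    total = sum(joint.values())
    p = {key: value / total for key, value in joint.items() if value > 0}

    # Marginal distributions P(T) and P(R).
    p_t, p_r = defaultdict(float), defaultdict(float)
    for (t, r), value in p.items():
        p_t[t] += value
        p_r[r] += value

    mutual_information = sum(
        value * math.log(value / (p_t[t] * p_r[r])) for (t, r), value in p.items()
    )
    entropy_t = -sum(value * math.log(value) for value in p_t.values())
    entropy_r = -sum(value * math.log(value) for value in p_r.values())
    denominator = min(entropy_t, entropy_r)
    return mutual_information / denominator if denominator > 0 else 0.0


# Two extreme cases: types fully determine properties, or not at all.
dependent = normalized_mutual_information({("t1", "r1"): 0.5, ("t2", "r2"): 0.5})
independent = normalized_mutual_information(
    {("t1", "r1"): 0.25, ("t1", "r2"): 0.25, ("t2", "r1"): 0.25, ("t2", "r2"): 0.25}
)
```

The base of the logarithm cancels in the normalised value, so natural logarithms are as good as any other choice.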

Page 42: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

Example: Normalized Mutual Information

P(T,R)   r1    r2    r3    r4    P(T)
t1       14%   2%    5%    8%    29%
t2       5%    15%   2%    3%    25%
t3       7%    3%    30%   5%    45%
P(R)     26%   20%   37%   17%

I0(T,R) = 0.239

P(T,R)   r1    r2    r3    r4    P(T)
t1       22%   1%    0%    1%    24%
t2       1%    23%   1%    0%    26%
t3       1%    0%    48%   1%    50%
P(R)     24%   24%   49%   2%

I0(T,R) = 0.766

Page 43: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

Normalized Mutual Information on BTC‘12

• Tendencies:
  - Relatively high redundancy
  - Freebase: (weakly) pre-defined schema
  - Timbl: narrow domain (FOAF profiles)
  - DBpedia: de-centralized schema

Data set   I0(T,R)   Rank
Rest       0.881     1
Datahub    0.747     4
DBpedia    0.635     5
Freebase   0.860     2
Timbl      0.850     3

Page 44: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

Finding Alternative Descriptions

Page 45: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

Searching for a Suitable Description

SELECT ?x WHERE { ?x rdf:type foaf:Document }

SELECT ?x WHERE { ?x rdf:type foaf:Document . ?x rdf:type foaf:PersonalProfileDocument }

SELECT ?x WHERE { ?x rdf:type foaf:Document . ?x rdf:type sioc:Post . }

Declarative descriptions

Page 46: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

Operations on the Declarative Description

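[Figure: an entity set described by the classes C1 and C2, and four operations on this description: Add (a further class C3), Delete (drop a class), Refine (replace C1 by a subclass C1sub) and Generalize (replace C1 by a superclass C1sup).]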

Page 47: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

Just (Small) Baby Steps ...

Page 48: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

Formal Concept Analysis

• Constructs concepts and their hierarchy from data objects

• Input (formal context)
  - Set O (the objects)
  - Set A (the attributes)
  - Relation I ⊆ O×A (which object has which attributes)

• Derivation operator
  - For a (sub)set of objects X ⊆ O: the attributes common to all objects in X

\[ X' := \{ y \in A : (x, y) \in I, \ \forall x \in X \} \]

  - For a (sub)set of attributes Y ⊆ A: the objects which have all attributes in Y

\[ Y' := \{ x \in O : (x, y) \in I, \ \forall y \in Y \} \]

Page 49: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

Formal Concept Analysis

• (X,Y) is a formal concept if X' = Y and Y' = X
  - X is the extent
  - Y is the intent
  - Example: ({1,5,10},{a,c})

• Partial order for formal concepts:
  - (X1,Y1) ≤ (X2,Y2) if X1 ⊆ X2
  - Equivalent to Y1 ⊇ Y2
  - Example: ({1,10},{a,b,c}) ≤ ({1,5,10},{a,c})

• Defines a lattice structure on the concepts
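The derivation operators and the concept test translate almost literally into set operations. The Python sketch below uses a small hypothetical context, not the table from the slides; the names `CONTEXT`, `common_attributes`, `objects_having` and `is_formal_concept` are assumptions of this sketch.

```python
# Hypothetical toy context: object -> set of attributes it has.
CONTEXT = {
    1: {"a", "b", "c"},
    2: {"a"},
    3: {"b", "c"},
    4: {"a", "c"},
}
ATTRIBUTES = {"a", "b", "c"}


def common_attributes(objects, context=CONTEXT, attributes=ATTRIBUTES):
    """X': the attributes shared by all objects in X."""
    result = set(attributes)
    for x in objects:
        result &= context[x]
    return result


def objects_having(attrs, context=CONTEXT):
    """Y': the objects that have every attribute in Y."""
    wanted = set(attrs)
    return {x for x, has in context.items() if wanted <= has}


def is_formal_concept(objects, attrs):
    """(X, Y) is a formal concept if X' = Y and Y' = X."""
    return common_attributes(objects) == set(attrs) and objects_having(attrs) == set(objects)
```

In this toy context, `({1, 3}, {"b", "c"})` is a formal concept, while `({1, 2}, {"a"})` is not, because object 4 also has attribute a.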

[Paper excerpt shown on the slide:]

Table 1. Example of two formal contexts over two different attribute sets.

[Table: the objects 1 to 10, each marked with a subset of the attributes a, b, c (first context) and of the attributes x, y, z (second context).]

[Lattice diagrams: one over the intents ∅, {a}, {b}, {c}, {a, b}, {a, c}, {b, c}, {a, b, c} and one over the intents ∅, {x}, {y}, {x, y}, {y, z}, {x, y, z}.]

Figure 2. Formal concept lattice structures based on the relations in Table 1. The concepts are represented by their intent, which provides a better overview.

4.1 Parallel Formal Concept Lattices

Assume we have two sets M1 and M2 which can serve as attributes to describe the objects in G. Accordingly, there are two relations I1 and I2. Then, we can construct two parallel formal concept lattices B(G, M1, I1) and B(G, M2, I2). Note that while the intent of the concepts in parallel lattices is defined over two different sets of attributes, the extent of the concepts is always based on the same set G. The idea of parallel lattices can easily be extended to an arbitrary number of attribute sets.

Example 2 (Parallel Concept Lattice). In Table 1, we have listed a second relation I2 over the set of attributes M2 = {x, y, z}. In I2 the same objects are related to a different set of attributes. If we construct a formal concept lattice over this relation we obtain the lattice on the right hand side in Figure 2.

4.2 Extension and Reduction Mappings Between Parallel Concept Lattices

We now introduce two mappings between parallel lattices which are defined over the extent of the formal concepts in the lattices. Such mappings will allow for the approximation of the extent of a concept from a base lattice using the extent of a concept in an alternative lattice. The concept in an alternative lattice provides an alternative representation via its intent composed over a different set of attributes.

Page 50: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

Formal Concept Lattice (extent)


Page 51: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

Formal Concept Lattice (intent)

Top-Concept: (O,Ø)

Bottom-Concept: (Ø,A)


Page 52: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

Navigating the Lattice

Remove constraints

Extend object set

Add constraints

Reduce object set

Nice formalization, but ...


Page 53: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

… still Baby Steps

Page 54: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

Parallel Lattices

• Availability of several attribute sets
  - Facet dimensions
  - „Natural“ subdivision
  - Different descriptions of the same data

Page 55: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

Parallel Lattices

Page 56: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

General Idea for Mapping

[Figure: from an entity set, a description (C1, C2) is derived; an alternative description (C3, C4, C5) is derived as well, and the approximate entity set it describes approximates the original entity set.]

Page 57: Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open Data

Implementing Mappings

•  Minimal extension
   -  Top-down
•  Maximal reduction
   -  Bottom-up

Example: {b,c}' = {1,4,6,9,10}. Alternative description for {b,c}

[Lattice figure; node labels: 1; 1,9; 10; 2,8; 3,5,7]

Precision? Recall?
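The two mappings can be sketched in a few lines of Python. This is only an illustration under assumptions: the context `CONTEXT` is a made-up toy relation (not the $I_2$ of the paper), "minimal extension" is read as the smallest concept extent that contains the target set, and "maximal reduction" as the largest concept extent(s) contained in it. All function names are hypothetical.

```python
from itertools import combinations

# Hypothetical alternative context: object -> attributes from M2 = {x, y, z}.
CONTEXT = {
    1: {"x", "y"},
    2: {"x"},
    3: {"y", "z"},
    4: {"x", "y", "z"},
    5: {"z"},
}
ATTRIBUTES = {"x", "y", "z"}


def common_attributes(objects):
    """Derivation A -> A': attributes shared by all given objects."""
    shared = set(ATTRIBUTES)
    for obj in objects:
        shared &= CONTEXT[obj]
    return shared


def objects_with(attributes):
    """Derivation B -> B': objects carrying all given attributes."""
    return {obj for obj, attrs in CONTEXT.items() if attributes <= attrs}


def all_extents():
    """Extents of all formal concepts (brute force over attribute subsets)."""
    attrs = sorted(ATTRIBUTES)
    return {
        frozenset(objects_with(set(subset)))
        for size in range(len(attrs) + 1)
        for subset in combinations(attrs, size)
    }


def minimal_extension(target):
    """Smallest concept extent that contains the target set (closure A'')."""
    return objects_with(common_attributes(target))


def maximal_reductions(target):
    """Largest concept extents that are fully contained in the target set."""
    inside = [extent for extent in all_extents() if extent <= target]
    if not inside:
        return []
    best = max(len(extent) for extent in inside)
    return [set(extent) for extent in inside if len(extent) == best]


def precision_recall(approximation, target):
    """Quality of an approximated entity set with respect to the target set."""
    hits = len(approximation & target)
    precision = hits / len(approximation) if approximation else 0.0
    recall = hits / len(target) if target else 0.0
    return precision, recall


target = {1, 2, 4, 5}
extension = minimal_extension(target)
reductions = maximal_reductions(target)
```

Under this reading an extension can only add entities (recall stays at 1, precision may drop), while a reduction can only lose entities (precision stays at 1, recall may drop), which is why the two mappings are judged by different measures.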


Observations

•  On LOD: mapping type sets onto property sets
   -  Evaluation on 20 data sets (subset of BTC'12)
•  Quality of approximations
   -  max-red:
      *  High recall: mainly > 0.8
      *  Better for smaller concepts
   -  min-ext:
      *  Good precision: mainly > 0.5
      *  Better for larger concepts

[Example figure; labels: rss:Item, sioc:MicroblogPost, foaf:maker, sioc:has_discussion, dcterms:date]


Evolution of Linked Data


Evolution of LOD

[Figure panels: 2007, 2008, 2009, 2010, 2011]


[Figure: volume of triples provided by data sources over time; changes through insertion, deletion and modification]


Effects on Indices: Decline in accuracy

[Plot: micro avg. F1 (0.0 to 1.0) against week of data snapshot (0 to 80) for RDF Type, TS, PS, IPS, ECS and SchemEX]


Updates of Indices and Caches

Which sources to prioritise in an update?


Change Metrics

•  Comparison of two RDF data sets (e.g. from different points in time)
   -  $X_i$: set of triple statements
   -  Numeric expression for "distance": $\Delta(X_1, X_2) \in [0, \infty)$
•  Example:

$$\Delta_{\mathrm{Jaccard}}(X_1, X_2) = 1 - \frac{|X_1 \cap X_2|}{|X_1 \cup X_2|}$$

Suitable to measure dynamics???
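As a small illustration, the Jaccard distance between two sets of triples can be computed as follows. The triples are made up for this example, and treating two empty data sets as identical is a convention chosen here, not something the slide specifies.

```python
def jaccard_distance(x1, x2):
    """1 - |X1 ∩ X2| / |X1 ∪ X2| for two sets of triples."""
    union = x1 | x2
    if not union:
        return 0.0  # assumption: two empty data sets count as identical
    return 1.0 - len(x1 & x2) / len(union)


# Hypothetical snapshots, each a set of (subject, predicate, object) triples.
x1 = {
    ("ex:a", "ex:knows", "ex:b"),
    ("ex:a", "ex:name", "A"),
    ("ex:b", "ex:name", "B"),
}
x2 = {
    ("ex:a", "ex:knows", "ex:b"),
    ("ex:a", "ex:name", "A"),
    ("ex:c", "ex:name", "C"),
}

distance = jaccard_distance(x1, x2)  # 2 shared triples out of 4 distinct ones
```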

Toy example: Changes Analysis of LOD

[Figure: three snapshots of a small graph with the nodes Institute ZBW, Institute WeST, Thomas, Gerd, Ansgar and Renata; Institute Paluno appears in the 2nd snapshot]

Changes detected between 1st and 2nd snapshot:
1.  Deleted: <InstituteWEST hasMember Gerd>
2.  New: <InstitutePaluno hasMember Gerd>

Changes detected between 2nd and 3rd snapshot:
1.  New: <InstituteWEST hasMember Gerd>
2.  Deleted: <InstitutePaluno hasMember Gerd>

Changes detected between 1st and 3rd snapshot: None!?!

Change metrics capture differences. We want to measure dynamics!
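The toy example can be replayed with plain set operations. The two triples about Gerd come from the slides; the triple about Thomas is an assumption added only so that the snapshots contain an unchanged statement.

```python
def diff(old, new):
    """Return (deleted, added) triples between two snapshots."""
    return old - new, new - old


WEST = ("InstituteWEST", "hasMember", "Gerd")
PALUNO = ("InstitutePaluno", "hasMember", "Gerd")
THOMAS = ("InstituteWEST", "hasMember", "Thomas")  # assumed, not on the slide

snapshot1 = {WEST, THOMAS}
snapshot2 = {PALUNO, THOMAS}
snapshot3 = {WEST, THOMAS}

first_to_second = diff(snapshot1, snapshot2)
second_to_third = diff(snapshot2, snapshot3)
first_to_third = diff(snapshot1, snapshot3)  # no difference is visible here
```

Comparing only the first and the last snapshot hides both intermediate changes, which is the motivation for measuring dynamics rather than differences.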


Measuring Dynamics: Requirements

•  Dynamics function Θ
   -  Quantifies the evolution of a dataset X over a period of time

$$\Theta_{t_i}^{t_j}(X) = \Theta(X_{t_j}) - \Theta(X_{t_i}) \geq 0$$

[Figure: Θ of dataset X plotted over time between $t_i$ and $t_j$; dynamics as amount of evolution]


Constructing a Dynamics Function

•  Function Θ difficult to define directly
•  Indirect definition over a change rate function $c(X_t)$

$$\Theta(X_{t_j}) - \Theta(X_{t_i}) = \int_{t_i}^{t_j} c(X_t) \, dt$$

[Figure: Θ and c for dataset X plotted over time between $t_i$ and $t_j$]


Change Rate Function

•  Also $c(X_t)$ is not explicitly known!
•  But it can be approximated!
   -  Given snapshots of the data in small time intervals: $X_{t_1}, X_{t_2}, X_{t_3}, \ldots, X_{t_n}$
   -  The change rate can be approximated via change metrics:

$$\frac{\Delta(X_{t_i}, X_{t_{i-1}})}{t_i - t_{i-1}} \xrightarrow{\; t_{i-1} \to t_i \;} c(X_{t_i}) = \frac{d}{dt} \Theta(X_{t_i})$$


Dynamics Framework

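•  Approximating $c(X_t)$ as step function

$$\Theta_{t_1}^{t_n}(X) = \sum_{i=2}^{n} \Delta(X_{t_i}, X_{t_{i-1}})$$

Choice of Δ: flexible use of different notions of change!

[Figure: Θ and the step-function approximation of c for dataset X over time, between $t_i$ and $t_j$]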
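The summation over consecutive snapshots is easy to sketch. The change metric used here (number of inserted plus deleted triples) is only one possible choice of Δ, and the snapshots are hypothetical stand-ins for the toy example: a triple is removed and later restored.

```python
def changed_triples(newer, older):
    """Example change metric Δ: number of inserted plus deleted triples."""
    return len(newer ^ older)


def dynamics(snapshots, delta=changed_triples):
    """Θ over a snapshot series: sum of Δ(X_ti, X_ti-1) for i = 2..n."""
    return sum(
        delta(snapshots[i], snapshots[i - 1]) for i in range(1, len(snapshots))
    )


WEST = ("InstituteWEST", "hasMember", "Gerd")
PALUNO = ("InstitutePaluno", "hasMember", "Gerd")
THOMAS = ("InstituteWEST", "hasMember", "Thomas")  # assumed unchanged triple

snapshots = [{WEST, THOMAS}, {PALUNO, THOMAS}, {WEST, THOMAS}]

theta = dynamics(snapshots)                          # accumulates both changes
direct = changed_triples(snapshots[2], snapshots[0])  # first vs. last only
```

Because `delta` is a parameter, any other change metric with the same signature can be plugged in without touching the summation.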


Introduction of Decay

•  So far:
   -  Impact of evolution independent of moment in time
   -  Desirable: focus on certain periods of time
      *  e.g. recent past
•  Solution:
   -  Decay function f to assign weights to moments in time

[Figure: change rate c, decay function f and weighted change rate f·c over time between $t_i$ and $t_j$]


Implementing a Decay Function

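•  Exponential decay function:

$$f(t) = e^{-\lambda t}$$

•  Incorporated in the framework:

$$\Theta(X_{t_j}) - \Theta(X_{t_i}) = \int_{t_i}^{t_j} e^{-\lambda (t_j - t)} \cdot c(X_t) \, dt$$

•  When using the step function approximation of $c(X_t)$:

$$\Theta_{t_1}^{t_n}(X) = \sum_{i=2}^{n} e^{-\lambda (t_n - t_i)} \cdot \Delta(X_{t_i}, X_{t_{i-1}})$$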
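A possible implementation of the decayed sum is sketched below. It assumes the change values Δ have already been computed for each pair of consecutive snapshots; the function name, the time unit and the example numbers are made up for illustration.

```python
import math


def decayed_dynamics(times, changes, decay_rate):
    """Sum of exp(-λ (t_n - t_i)) * Δ(X_ti, X_ti-1) for i = 2..n.

    times:   snapshot times t_1..t_n
    changes: Δ values for i = 2..n, so one fewer entry than times
    """
    if len(changes) != len(times) - 1:
        raise ValueError("need exactly one change value per consecutive pair")
    t_n = times[-1]
    return sum(
        math.exp(-decay_rate * (t_n - t_i)) * change
        for t_i, change in zip(times[1:], changes)
    )


times = [0, 1, 2]   # hypothetical snapshot times, e.g. in weeks
changes = [2, 2]    # hypothetical Δ values between consecutive snapshots

undecayed = decayed_dynamics(times, changes, decay_rate=0.0)
decayed = decayed_dynamics(times, changes, decay_rate=math.log(2))
```

With λ = 0 the plain sum is recovered; with λ = ln 2 a change that lies one time unit back counts only half as much as the most recent one.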


Change Rate Function of Selected Data Sources

[Four plots: change rate (0 to 1) per snapshot date, from 2012-05-06 to November 2013, one plot per data source]

•  dbpedia.org: Θ = 55.71, Θdecay = 23.42
•  identi.ca: Θ = 58.45, Θdecay = 18.48
•  linkedct.org: Θ = 51.75, Θdecay = 25.03
•  dbtune.org: Θ = 20.90, Θdecay = 8.33


[Overview figure with the parts Managing, Analysing and Making Use; further labels: search data structure, efficient storage and retrieval]


Schema-level Search on LOD


Schema-based Access to the LOD cloud

Schema-level Index

Where?
•  ACM
•  DBLP

SELECT ?x
WHERE {
  ?x rdf:type foaf:Document .
  ?x rdf:type swrc:InProceedings .
  ?x dc:creator ?y .
  ?y rdf:type fb:Computer_Scientist
}


LODatio: Schema-level Search of LOD


LODatio: Query transformation

[Figure: the query pattern (x with types foaf:Document and swrc:InProceedings, dc:creator, fb:Computer_Scientist) next to index elements tc2309, tc2101, eqc707 and ps2608, and the data source DBLP]

Original query:

SELECT ?x
WHERE {
  ?x rdf:type foaf:Document .
  ?x rdf:type swrc:InProceedings .
  ?x dc:creator ?y .
  ?y rdf:type fb:Computer_Scientist
}

Transformed query over the schema-level index:

SELECT ?c
WHERE {
  ?eqc schemex:hasDataset ?c .
  ?tc_A schemex:hasSubset ?eqc .
  ?tc_A schemex:hasClass foaf:Document .
  ?tc_A schemex:hasClass swrc:InProceedings .
  ?bs void:subjectsTarget ?eqc .
  ?bs void:objectsTarget ?tc_B .
  ?bs void:property dc:creator .
  ?tc_B schemex:hasClass fb:Computer_Scientist
}
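The shape of the transformed query can be reproduced with simple string templating. This is only a sketch that mirrors the single pattern shown on the slide (typed subject, one property, typed object); it is not LODatio's actual implementation, and the function name is hypothetical.

```python
def schema_level_query(subject_types, prop, object_types):
    """Build a SPARQL query over the schema index for one typed edge pattern."""
    lines = [
        "SELECT ?c WHERE {",
        "  ?eqc schemex:hasDataset ?c .",
        "  ?tc_A schemex:hasSubset ?eqc .",
    ]
    # One hasClass pattern per rdf:type constraint on the subject.
    lines += [f"  ?tc_A schemex:hasClass {t} ." for t in subject_types]
    lines += [
        "  ?bs void:subjectsTarget ?eqc .",
        "  ?bs void:objectsTarget ?tc_B .",
        f"  ?bs void:property {prop} .",
    ]
    # One hasClass pattern per rdf:type constraint on the object.
    lines += [f"  ?tc_B schemex:hasClass {t} ." for t in object_types]
    lines.append("}")
    return "\n".join(lines)


query = schema_level_query(
    subject_types=["foaf:Document", "swrc:InProceedings"],
    prop="dc:creator",
    object_types=["fb:Computer_Scientist"],
)
```

The generated text still needs the matching PREFIX declarations before it can be sent to an endpoint.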


LODatio: Retrieval Results

[Figure labels: C1, EQCl, DS, 23, URI 1, URI 2, URI 3; annotations: entity count, example entities]


LODatio: User Support

•  Currently implemented:
   -  Moderate reductions / extensions
•  Next release:
   -  Include alternative description based on parallel lattices

[Figure: index elements tc2309, tc2101, eqc707 and ps2608 for foaf:Document, swrc:InProceedings, fb:Computer_Scientist and dc:creator, with DBLP and DS 23; annotations: further properties, further types]


LODatio: next steps

Keyword search

Better recommendations

Other payload entities

Visual exploration

Related datasources

Coverage


Focused Exploration (work in progress)


Use Case: Social Media Coverage of Events


LinkedGeoData

OSM

owl:sameAs

??? Other locations?


Extending LinkedGeoData

Seed Exploration Overlay


Task of Focused Exploration (use case: locations)

•  Prioritise/select object URIs for exploration

[Figure: a subject s of type umbel:Village with wgs84:long -1.404 and wgs84:lat 50.897, linked to objects o1 to o11 via predicates such as dbponto:isPartOf, dbponto:wikiPageExternalLink, dbponto:governmentType, dbpprop:settlementType, dbpprop:subdivisionName, dbpprop:postalCode and dcterms:subject; wgs84:long / wgs84:lat placeholders (xxx, yyy) on the object side]


Exploration based on Schema Semantics

•  Exploit rdfs:range definitions of predicates
•  Follow edges which lead to locations with higher priority

Example: dbponto:twinCity rdfs:range dbpedia:City; dbpedia:City rdfs:subClassOf dbpedia:Place
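A minimal sketch of this idea, assuming the schema is available as two small lookup tables: the `dbponto:twinCity` and `dbpedia:City` entries come from the slide, while `ex:author` and `ex:Person` are made up as a counter-example. Single inheritance is assumed to keep the class walk short.

```python
# rdfs:range of predicates (second entry is hypothetical).
RANGE = {
    "dbponto:twinCity": "dbpedia:City",
    "ex:author": "ex:Person",
}
# rdfs:subClassOf links, one parent per class (simplifying assumption).
SUBCLASS_OF = {"dbpedia:City": "dbpedia:Place"}


def superclasses(cls):
    """The class itself plus all ancestors reachable via rdfs:subClassOf."""
    seen = []
    while cls is not None and cls not in seen:
        seen.append(cls)
        cls = SUBCLASS_OF.get(cls)
    return seen


def leads_to_location(predicate, location_class="dbpedia:Place"):
    """True if the predicate's declared range is a kind of location."""
    declared = RANGE.get(predicate)
    return declared is not None and location_class in superclasses(declared)


def prioritise(edges):
    """Order (predicate, object) edges so that likely locations come first."""
    return sorted(edges, key=lambda edge: not leads_to_location(edge[0]))


edges = [("ex:author", "o1"), ("dbponto:twinCity", "o2")]
ordered = prioritise(edges)
```

Predicates without a declared range are never prioritised in this sketch, so the approach depends on how completely the vocabularies declare ranges.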


Supervised Machine Learning

•  Use incoming predicates as features
   -  Learn predicates typically leading to locations
•  Train a classifier (e.g. Naive Bayes)

[Figure: objects o and o' with incoming predicates p2, p3, p4 and p6; wgs84:long / wgs84:lat values (xxx, yyy)]


IR Inspired Approaches

•  Model discriminativeness of predicates
   -  Inspired by tf-idf
•  Property relevance frequency (prf):

$$\mathrm{prf} = c(p, L)$$

   -  Normalised version (prr)
•  Inverse property frequency:

$$\mathrm{ipf} = \log\left(\frac{c(*,*)}{c(p,*)}\right)$$

•  Rank by combined measure: prf-ipf
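The following sketch computes prf-ipf scores from labelled observations. Several points are assumptions: c(p, L) is read as the number of observed edges with predicate p that lead to a location, c(p, ∗) as all edges with predicate p, c(∗, ∗) as all edges, the two parts are multiplied in analogy to tf-idf, and the natural logarithm is used. The predicates are made up.

```python
import math
from collections import Counter


def prf_ipf_scores(observations):
    """Score predicates from (predicate, object_is_location) observations."""
    total = 0                  # c(*, *): all observed edges
    per_predicate = Counter()  # c(p, *): edges with predicate p
    to_location = Counter()    # c(p, L): edges with predicate p ending in a location
    for predicate, is_location in observations:
        total += 1
        per_predicate[predicate] += 1
        if is_location:
            to_location[predicate] += 1
    return {
        p: to_location[p] * math.log(total / count)
        for p, count in per_predicate.items()
    }


# Hypothetical training data: 2 edges leading to locations, 6 that do not.
observations = [("ex:twinCity", True)] * 2 + [("ex:label", False)] * 6
scores = prf_ipf_scores(observations)
ranking = sorted(scores, key=scores.get, reverse=True)
```

A frequent predicate that never leads to a location gets a score of zero, while a rarer predicate that reliably leads to locations is ranked first.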


Performance

[Plot: ROC curves (both axes 0 to 1) for random, Schema Semantics, NB (all predicates), NB (present predicates), prf-ipf and prr-ipf; inset for the range 0 to 0.05 on the x-axis and 0.95 to 1 on the y-axis]


Performance

Table 2. Average performance of approaches († indicates significant improvements at confidence level ρ = 0.01)

Method                     Recall    Precision  F1        Accuracy  AUC
Schema Semantics           0.1188    0.8119     0.2073    0.7262    0.5552
NB (all predicates)        0.9906    0.9491     †0.9694   †0.9812   0.9970
NB (observed predicates)   0.9943    0.9436     0.9683    0.9804    0.9968
prf-ipf                    0.8512    †0.9754    0.9091    0.9487    0.9958
prr-ipf                    †0.9973   0.9240     0.9592    0.9745    0.9769

performance in bold. Furthermore, we marked the results where we had a significant improvement over the second best method at confidence level of ρ = 0.01. The aggregated values basically confirm the observations made above. In general, when considering the measures F1, Accuracy and AUC, the Naive Bayes classifier making use of all predicates performs best. However, the advantage in comparison to the Naive Bayes classifier using only observed terms is negligible. In application scenarios, where a high Recall is of importance, instead, the prr-ipf approach achieves the best results with more than 99.7%. When focusing on Precision, prf-ipf performs best and demonstrated the highest values. More than 97% of the objects predicted to have geo-coordinates actually did provide such information. In a setting where we want to focus on promising items this might be the kind of performance the end user is looking for.

One explanation for the very high accuracy in general might also be the dataset. Given that we started the exploration from location entities on DBPedia and LinkedGeoData, the overall dataset was biased towards entities from DBPedia. Hence, we intend to extend the evaluation to see if the quality of the supervised approaches remains at a comparable level, when using larger and even more diverse datasets.

6 Related Work

Previous work related to this paper can be found in three areas, each of which will be described below: (a) Extraction of geographic entities provides a starting point for our approach. The fields of (b) focused crawling on the WWW and (c) machine learning applied to Linked Data in general each share some similarities with our classification and ranking task, although differences do exist.

6.1 Extraction of Geographic Entities

Work done in the TRIDEC project [7] examined how geographic databases such as Geonames, OpenStreetMap and GooglePlaces could be used to avoid the need for error prone named entity recognition and thus increase the overall precision when geoparsing large volumes of Twitter reports for crisis mapping. This work directly compared crisis maps from Twitter with official post-disaster environment agency impact assessments, highlighting just how accurate maps based on large-scale geospatial report crowd sourcing can be. We are building on this approach within the REVEAL project and extending


[Overview figure with the parts Managing, Analysing and Making Use; further labels: search data structure, efficient storage and retrieval]


Summary

[Figure: pros and cons of Linked Open Data; labels: rich knowledge base, diverse, public, huge, on the Web, distributed; index structure sketch]

Technical solutions to some of the problems


Thank you!

Contact: Thomas Gottron, WeST – Institute for Web Science and Technologies, Universität Koblenz-Landau


Sources

•  Photograph of three of Nevins Memorial Library's earliest librarians. Wikimedia Commons collection, http://commons.wikimedia.org/wiki/File:Nevins_Library_First_Librarians.jpg

•  Wide-angle view of the ALMA correlator. This Wikipedia and Wikimedia Commons image is from the European Southern Observatory (ESO) and is freely available at http://commons.wikimedia.org/wiki/File:Wide-angle_view_of_the_ALMA_correlator.jpg under Creative Commons Attribution 3.0 Unported license.

•  Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/, This work is available under a CC-BY-SA license.