Information Content based Ranking Metric for Linked Open Vocabularies
Ghislain A. Atemezing (@gatemezing)
Raphaël Troncy (@rtroncy)
Information Content based
Ranking Metric for Linked Open
Vocabularies
Goal and Agenda
Goal: Present a new ranking metric for vocabulary reuse
Motivation
Combine Information Theory with metadata information
Find a new assessment metric for vocabularies
Current situation
Reliance on a single popularity-based metric (e.g. prefix.cc or lodstats)
Only ONE dimension used for assessing vocabularies
Proposal: compute the informativeness of LOV terms
Experiments and Results
Applications
2014/09/05 - SEMANTICS 2014 - Leipzig, Germany
Vocabulary Purpose
Model to understand a domain’s semantics
Vocabulary terms contain information
A term = Class, Object Property, Data Property
Essential for publishing data on the Web
How to quantify the value of a term?
Informativeness is inversely related to probability: the rarer a term, the more informative it is
Existing catalogs of vocabularies
Some catalogs of vocabularies
Linked Open Vocabularies (LOV)
A curated list of vocabularies
More than 420 vocabularies
Each is described with the Vocabulary of a Friend (voaf) schema
Track the (temporal) evolution of vocabularies
Some related services
SPARQL endpoint: http://lov.okfn.org/endpoint/lov
Search function: http://lov.okfn.org/dataset/lov/search
An aggregator endpoint: http://lov.okfn.org/endpoint/lov_aggregator
An intelligent bot agent for updates: http://lov.okfn.org/dataset/lov/bot
LOV DESCRIPTION: http://lov.okfn.org/dataset/lov/
CORE FEATURES OF THE FRAMEWORK
Domain: General
Intended use: Promote and facilitate the reuse of vocabularies in the linked data ecosystem.
Collection: Submitted by any user via LOV-Suggest tool.
Gatekeeping: Manual curation and automatic URI validation
Number of ontologies: 450+
Dynamics: Growing
Search metadata: Yes, with visual depiction
Search within ontology: Yes
Search across ontologies: Keyword-based; structured search (query-based)
Navigation criteria: Ordered by prefix, namespace, title and visual links navigation
Metrics: Reuse popularity on the LOD Cloud
Comments and review: N/A - Only by the curators
Ranking: Metric-based
Web service access: API
SPARQL endpoint: Yes
Content available: Ontology metadata, URI
Read/Write: Read
Ontology directory: Yes
Ontology registry: Yes
Application platform: Yes
LOV DESCRIPTION WITH THE FRAMEWORK OF [d’Aquin-Noy2012-Survey]
LOV Evolution since March, 2011
Quasi-linear growth, starting from 75 vocabularies
The glitch in 2012 corresponds to the migration to OKFN
Proposal: Metrics for Ranking LOV
Metrics
Information Content Metric (IC): value of information associated with a given entity
Partition Information Content Metric (PIC)
Propose a ranking based on IC and PIC
Method
Adapt the IC and PIC functions to the semantics of vocabularies
Select candidate vocabularies in the LOV catalog
Compute the scores
Information Content Metrics for LOV
Information Content
Formula:
N = maximum number of occurrences of a term in LOV
φ(t) = number of occurrences of term t in LOV
Partitioned IC
LOV is a semantic network of resources
Formula:
wf = weight for vocabulary f
+objectURI+ = owl:ObjectProperty/DatatypeProperty; rdf:Property
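The formula images themselves are not part of this transcript. As a hedged sketch only, assuming the usual information-theoretic form (informativeness as the negative logarithm of a relative frequency) and assuming that PIC is the weighted sum of the IC values of a vocabulary's terms, the two metrics could be written as follows. The base-2 logarithm and the symbol T_f (the set of candidate terms of vocabulary f) are assumptions introduced here for illustration, not taken from the slides.

```latex
% Assumed form of the Information Content of a term t
IC(t) = -\log_2\left(\frac{\varphi(t)}{N}\right)

% Assumed form of the Partitioned Information Content of a vocabulary f,
% where T_f is the (hypothetical) set of candidate terms of f
PIC(f) = w_f \sum_{t \in T_f} IC(t)
```

Under this reading, the most frequent term in LOV has φ(t) = N and therefore IC(t) = 0, while rarer terms receive higher IC values.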
Information Content Metrics for LOV
(Light) weighting scheme
wf=1 if the vocabulary reuses other vocabularies
wf=2 if datasets are using the vocabulary
wf=3 if the vocabulary is reused elsewhere
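A minimal sketch of how this weighting scheme might be computed. It assumes the three weights are additive when several conditions hold, as the running example's "wf = 1+ 2+3 = 6" suggests. The function name and its boolean parameters are hypothetical, not part of LOV.

```python
def vocabulary_weight(reuses_other_vocabularies: bool,
                      used_by_datasets: bool,
                      reused_elsewhere: bool) -> int:
    """Return the weight wf of a vocabulary.

    Assumption: each satisfied condition contributes its own weight
    (1, 2 or 3) and the contributions are summed.
    """
    weight = 0
    if reuses_other_vocabularies:
        weight += 1  # the vocabulary reuses other vocabularies
    if used_by_datasets:
        weight += 2  # datasets are using the vocabulary
    if reused_elsewhere:
        weight += 3  # the vocabulary is reused elsewhere
    return weight


# A vocabulary satisfying all three conditions gets 1 + 2 + 3.
print(vocabulary_weight(True, True, True))
```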
Ranking Algorithm
1- Candidate terms selection in LOV
2- Grouping terms by namespace & weight assignment
3- Compute IC score
4- Compute PIC score
Output ranking
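The four steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: IC is taken as -log2(φ(t)/N), PIC as the vocabulary weight times the sum of its terms' IC values, and the namespace of a term as everything up to its last "#" or "/". The occurrence counts below are invented toy values, and all function names are hypothetical.

```python
import math
from collections import defaultdict


def namespace_of(term_uri: str) -> str:
    """Return the URI up to and including its last '#' or '/'."""
    cut = max(term_uri.rfind("#"), term_uri.rfind("/"))
    return term_uri[:cut + 1]


def rank_vocabularies(term_occurrences, weights):
    """Rank vocabularies by (assumed) PIC, highest first.

    term_occurrences: term URI -> number of occurrences (step 1 output)
    weights: namespace URI -> weight wf (defaults to 1 if missing)
    """
    n_max = max(term_occurrences.values())  # N: maximum occurrence count

    # Step 2: group candidate terms by namespace.
    groups = defaultdict(list)
    for term in term_occurrences:
        groups[namespace_of(term)].append(term)

    # Step 3: IC score per term (assumed form).
    ic = {t: -math.log2(c / n_max) for t, c in term_occurrences.items()}

    # Step 4: PIC score per vocabulary (assumed weighted sum of IC).
    pic = {ns: weights.get(ns, 1) * sum(ic[t] for t in terms)
           for ns, terms in groups.items()}

    # Output ranking.
    return sorted(pic.items(), key=lambda item: item[1], reverse=True)


# Toy occurrence counts, invented for illustration only.
counts = {
    "http://purl.org/dc/terms/title": 8,
    "http://purl.org/dc/terms/creator": 4,
    "http://xmlns.com/foaf/0.1/name": 16,
    "http://xmlns.com/foaf/0.1/Person": 4,
}
weights = {"http://purl.org/dc/terms/": 6, "http://xmlns.com/foaf/0.1/": 6}

for namespace, score in rank_vocabularies(counts, weights):
    print(namespace, score)
```

With these toy counts the dcterms namespace ranks above foaf because its terms are rarer relative to the most frequent term, which is the behaviour the metric is meant to capture.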
Running Example: dcterms vs foaf
dcterms: http://purl.org/dc/terms/
Candidate terms: 53 (39 properties + 14 classes)
wf = 1 + 2 + 3 = 6
PIC = 1724.844
foaf: http://xmlns.com/foaf/0.1/
Candidate terms: 35 (26 properties + 9 classes)
wf = 1 + 2 + 3 = 6
PIC = 1033.197
PIC(dcterms) > PIC(foaf)
Results on Ranking
Top-15 terms (IC value) Top-15 vocabs (PIC value)
Comparison
Relatively stable position of foaf in the prefix.cc, vocab.cc and lodstats catalogues.
LOV-PIC/LODstats: skos and dcterms have a “relatively” stable ranking.
List of “most popular” vocabularies: foaf, skos, dcterms, time, dce, prov.
Applications of the Ranking Metrics
Vocabulary life-cycle management
Help assess the use of terms and vocabulary updates
Monitor the use of http://www.w3.org/2003/06/sw-vocab-status/ns#:term_status or owl:deprecated
Semantic Web applications
Vocabularies with higher PIC could be proposed to users first, e.g. when choosing properties to display in a faceted browsing interface
Interlinking datasets
Generate sameAs links between datasets based on vocabulary terms with lower IC values
Conclusion and Future Work
We have presented new metrics for ranking vocabularies
By applying the Information Content concept to LOV
By taking more dimensions into account in the ranking metrics
The metrics can be applied to vocabulary reuse, ontology modelling and visualization
Future work
Add equivalence axioms in the ranking model
Compare (P)IC with other graph-based rankings (e.g. PageRank)
Investigate dependency-based ranking between vocabularies
Q/A Session
Thanks for your attention!