Presentation at MTSR 2012
Date: 30/11/2012
SSONDE: Semantic Similarity On liNked Data Entities
Riccardo Albertoni
Ontology Engineering Group, Departamento de Inteligencia Artificial, Facultad de Informática, Universidad Politécnica de Madrid
Joint work with Monica De Martino (CNR-IMATI-GE)
MTSR 2012,
6th Metadata and Semantics Research Conference
28-30 November 2012 - Cádiz (Spain)
2
Presentation Outline
1. How SSONDE fits with other linked data technologies
   • What is it for? What is it not for?
2. Characteristics of instance similarity in SSONDE
   • The theory behind SSONDE's similarity is detailed in: Albertoni R. and De Martino M.; Asymmetric and context dependent semantic similarity among ontology instances, Journal on Data Semantics, LNCS, 2008.
3. SSONDE Architecture and Examples on Linked Data
Riccardo Albertoni
3
Linked Data Crawling Architectural Pattern
[Figure: the crawling architectural pattern, with LDSpider/Fuseki and LDIF handling linked data consumption and SSONDE at the application layer, where it is used to build analysis services such as cluster analysis and explorative search on resources.]
Tom Heath and Christian Bizer (2011). Linked Data: Evolving the Web into a Global Data Space (1st edition), pp. 1-136. Morgan & Claypool.
4
SSONDE Instance Similarity
• is not meant to align ontologies/schemas, nor to interlink/consolidate entities;
• aims at
  • providing a method for comparing entities represented as instances in ontology-driven repositories or exposed as linked data;
  • supporting explorative searches;
• assumes that all the integration steps are done: it works at the Application Layer of the Linked Data Crawling Architectural Pattern;
• has two main characteristics that make SSONDE unique of its kind:
  • Context, to represent the similarity criteria (algorithm parameters);
  • Asymmetry, to emphasize containment between instances.
Example: comparing researchers
6
Example: Researchers' comparison
[Figure: researchers linked to their publications, their research topics and their projects.]
7
Different Contexts (the researchers, publications, … are instances)
Contexts and the researchers' features (data/object properties) considered in the similarity:
• Researcher's Experience: age, number of publications, number of projects.
• Researchers' Scientific Interest: common publications, common research projects, similar research interests.
Publications and projects are used in both contexts; the remaining features are used only in their own context.
8
Formalization of Application Context
A function that, for each recursion path, specifies which data/object properties and which operations to consider.
Example: Researchers' Scientific Interest (common publications, common research projects, similar research interests)
[ResearchStaff] -> {{}, {(Publication, Inter), (WorkAtProject, Inter), (interest, Simil)}}
[ResearchStaff, Interest] -> {{(TopicName, Inter)}, {(RelatedTopic, Inter)}}
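The context above can be read as a lookup table from recursion paths to the properties and operations to use. A minimal Python sketch, assuming a plain dictionary stands in for the context function, that paths not listed contribute no properties, and that "Simil" marks the properties whose values are compared recursively (all three are assumptions, not SSONDE's actual encoding):

```python
from typing import Dict, List, Tuple

# (property, operation) pairs, e.g. ("Publication", "Inter")
Operation = Tuple[str, str]
# recursion path -> (data-property operations, object-property operations)
Context = Dict[Tuple[str, ...], Tuple[List[Operation], List[Operation]]]

# "Researchers' Scientific Interest" context from the slide (hypothetical encoding).
SCIENTIFIC_INTEREST: Context = {
    ("ResearchStaff",): (
        [],  # no data properties at the top level
        [("Publication", "Inter"), ("WorkAtProject", "Inter"), ("interest", "Simil")],
    ),
    ("ResearchStaff", "Interest"): (
        [("TopicName", "Inter")],
        [("RelatedTopic", "Inter")],
    ),
}


def operations_for(context: Context, path: Tuple[str, ...]) -> Tuple[List[Operation], List[Operation]]:
    """Return (data ops, object ops) for a recursion path; paths outside the context get nothing."""
    return context.get(path, ([], []))


def recursive_properties(context: Context, path: Tuple[str, ...]) -> List[str]:
    """Object properties compared with "Simil" need a recursive comparison of their values."""
    _, object_ops = operations_for(context, path)
    return [prop for prop, op in object_ops if op == "Simil"]


print(operations_for(SCIENTIFIC_INTEREST, ("ResearchStaff", "Interest")))
print(recursive_properties(SCIENTIFIC_INTEREST, ("ResearchStaff",)))  # ['interest']
```

Encoding the context as data rather than code keeps the similarity criteria configurable, which matches the role of the context as the algorithm's parameters.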
9
Why an Asymmetric Similarity?
• Sim(a,b) might differ from Sim(b,a).
• Sim is not the inverse of a metric distance, so metric properties (e.g., the triangle inequality) cannot be exploited to prune comparisons.
Here asymmetry is adopted to highlight the containment between instances A and B.
Example of containment (comparing with respect to publications only):
• A is a Ph.D. student who has always published with his tutor B.
[Figure: A, B and their publications pub 1, pub 2 and pub 3; all of A's publications are also B's.]
• A is contained in B (A << B): A can be replaced by B.
• B is not contained in A: if you replace B with A, some experience gets lost.
10
What SSONDE's Asymmetric Similarity returns
• Sim(A,B) ranges in [0,1].
• It is proportional to the number of data and object property values that A shares with B:
  • if A is contained in B, Sim(A,B) = 1;
  • if A is not contained in B, Sim(A,B) < 1;
  • if A and B don't share any "features", Sim(A,B) = 0;
  • if A has exactly the same characteristics as B (A << B and B << A), Sim(A,B) = Sim(B,A) = 1.
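The properties listed above can be checked with a deliberately flat sketch: each entity is a hypothetical mapping from property names to sets of values, and Sim(A,B) is the share of A's values that B also has. This ignores the recursion and the context-specific operations of SSONDE's actual measure; it only illustrates the asymmetry and the containment cases.

```python
from typing import Dict, Set

# Hypothetical flat representation: property name -> set of values.
Entity = Dict[str, Set[str]]


def asym_sim(a: Entity, b: Entity) -> float:
    """Fraction of A's property values that B shares (1.0 when A is contained in B)."""
    total = sum(len(values) for values in a.values())
    if total == 0:
        return 0.0  # assumption: an entity with no values shares nothing
    shared = sum(len(values & b.get(prop, set())) for prop, values in a.items())
    return shared / total


# Containment example from the slides, comparing publications only
# (which publication belongs to whom is assumed for illustration).
phd_student_a = {"publication": {"pub1", "pub2"}}
tutor_b = {"publication": {"pub1", "pub2", "pub3"}}

print(asym_sim(phd_student_a, tutor_b))  # 1.0: A << B
print(asym_sim(tutor_b, phd_student_a))  # 0.666...: B is not contained in A
```

Normalising by A's values only, rather than by the union of A's and B's values, is what makes the score directional.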
11
Results comparing young and senior researchers of IMATI
[Figure: similarity matrices for the Research Experience and Research Interest contexts.]
The darker a matrix cell, the higher the similarity.
13
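SSONDE Architecture
[Figure: SSONDE architecture, showing SSONDE on top of a local data store/cache fed from the Web of Data.]
• SSONDE computes similarity through three layers: Context Layer, Ontology Layer and Data Layer.
• Configuration: kind of store; list of instances (a Java class generates the list); reference to the context; reference to rules (e.g., JENA rules).
• Data wrappers: JENA TDB, JENA SDB, JENA MEM, Virtuoso wrapper, …; they give access to TDB repositories, SDB repositories, RDF dumps and Virtuoso.
• Output: similarity matrix in CSV; n-most similar entities in JSON; …
• Web of Data: third parties serve linked datasets as RDF dumps, HTTP dereferenceable URIs and SPARQL endpoints. The linked data consumption components of the crawling architectural pattern (LDIF, LDSpider + Fuseki) gather these data into the local data store/cache used by SSONDE.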
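The data wrappers decouple the similarity layers from the kind of store. A minimal Python sketch of that idea, assuming a hypothetical `DataWrapper` interface with a single `values` method and an in-memory implementation fed with plain (subject, property, object) tuples; the real wrappers are Java classes built on JENA and Virtuoso, and their API is not shown in these slides.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


class DataWrapper(ABC):
    """Hypothetical data-layer interface: upper layers only ask for property values."""

    @abstractmethod
    def values(self, subject: str, prop: str) -> Set[str]:
        """Return the objects of the statements (subject, prop, ?o)."""


class InMemoryWrapper(DataWrapper):
    """Stand-in for an in-memory store loaded from an RDF dump."""

    def __init__(self, triples: Iterable[Triple]) -> None:
        self._index: Dict[Tuple[str, str], Set[str]] = {}
        for s, p, o in triples:
            self._index.setdefault((s, p), set()).add(o)

    def values(self, subject: str, prop: str) -> Set[str]:
        return set(self._index.get((subject, prop), set()))


# Hypothetical registry keyed by a "KindOfStore"-like value.
WRAPPERS: Dict[str, Callable[[Iterable[Triple]], DataWrapper]] = {"MEM": InMemoryWrapper}


def make_wrapper(kind_of_store: str, triples: Iterable[Triple]) -> DataWrapper:
    """Pick the wrapper implementation from the configured kind of store."""
    if kind_of_store not in WRAPPERS:
        raise ValueError(f"unsupported kind of store: {kind_of_store}")
    return WRAPPERS[kind_of_store](triples)


store = make_wrapper("MEM", [("res225", "dc:subject", "topic25"), ("res225", "dc:subject", "topic26")])
print(sorted(store.values("res225", "dc:subject")))  # ['topic25', 'topic26']
```

Adding another store would then mean registering one more class, without touching the context or ontology layers.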
14
SSONDE: a building block for new analysis services
SSONDE applied to "real linked data":
• Analysing habitats and species
  • data published in NatureSDIplus (ECP-2007-GEO-317007), a European project developing a Spatial Data Infrastructure for Nature Conservation;
  • to rank habitats according to the species they host, giving an insight into the inter-dependencies between habitats and species.
• Analysing overlaps among scientific interests
  • a subset of the linked dataset provided by data.cnr.it as part of the SemanticScout framework, by third parties (Gangemi et al.);
  • to compare IMATI-CNR researchers according to their research interests.
15
Applying SSONDE on data.cnr.it
1. Identify crawling seeds: the URIs of the entities to be involved in the analysis.
2. Identify the RDF properties to be used in the instance comparison.
3. Run LDSpider, constraining the crawling to the selected properties.
4. Configure SSONDE: JSON configuration file and context definition.
5. Run SSONDE.
6. Analyse results.
Downloading the RDF statements: http://code.google.com/p/ssonde/wiki/RDF_statements_download
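Step 3 restricts the crawl to the properties chosen in step 2. The following Python sketch illustrates the idea with a breadth-first walk over a hypothetical in-memory graph that stands in for URI dereferencing; it is not LDSpider's API or configuration, and the depth limit and toy identifiers are assumptions.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
# Stand-in for dereferencing: URI -> list of (property, object) statements about it.
Graph = Dict[str, List[Tuple[str, str]]]


def constrained_crawl(graph: Graph, seeds: Iterable[str], allowed: Set[str], max_depth: int = 2) -> List[Triple]:
    """Collect statements reachable from the seeds, following only the allowed properties.

    Seeds are at depth 0; URIs reached at depth max_depth or beyond are not dereferenced.
    """
    seeds = list(seeds)
    seen: Set[str] = set(seeds)
    queue = deque((uri, 0) for uri in seeds)
    collected: List[Triple] = []
    while queue:
        uri, depth = queue.popleft()
        for prop, obj in graph.get(uri, []):
            if prop not in allowed:
                continue  # statement outside the selected properties: not kept, not followed
            collected.append((uri, prop, obj))
            if obj not in seen and depth + 1 < max_depth:
                seen.add(obj)
                queue.append((obj, depth + 1))
    return collected


toy_graph: Graph = {
    "res225": [("pub:autoreCNRDi", "pub22"), ("foaf:name", "R. 225"), ("dc:subject", "topic25")],
    "topic25": [("skos:broader", "topic2")],
    "pub22": [("dc:title", "Some title")],
}
triples = constrained_crawl(toy_graph, ["res225"], {"pub:autoreCNRDi", "dc:subject", "skos:broader"})
print(len(triples))  # 3
```

Filtering during the crawl, rather than after it, keeps unrelated statements (here `foaf:name` and `dc:title`) out of the local store altogether.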
17
Configuration file 1
{ "StoreConfiguration":{
"KindOfStore":"JENATDB",
"RDFDocumentURIs":[ ],
"TDBDirectory":"data/CNRIT/TDB-0.8.9/CNRR/"
},
"InstanceConfiguration":{
"InstanceURIsClass":"application.dataCNRIt.GetResearcherIMATIplusCoauthor"
},
"OutputConfiguration":{
"KindOfOutput":"JSONOrderedResult",
"NumberOfOrderedResult":"20",
"FilePath":"conf/dataCNRIt/ComplexContextResearchInterest/CRRIIntPub.res.json"
},
"ContextConfiguration":{
"ContextFilePath":"conf/dataCNRIt/ComplexContextResearchInterest/CCRIIntPub.ctx"
}
}
• InstanceConfiguration: the list of LOD entity URIs, produced by a Java class implementing ListOfInputInstances.
• OutputConfiguration: the similarity matrix in CSV, or a JSON encoding of the top n most similar entities.
• ContextConfiguration: the context, encoded in an in-house text format (hopefully soon in JSON).
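A configuration like the one above can be sanity-checked before running SSONDE. The Python sketch below is a hypothetical helper, not part of SSONDE: it assumes all four sections are required and that `NumberOfOrderedResult` may be given either as a string or as a number. Because it uses a strict JSON parser, typographic quotes (such as ”) in place of straight ones are rejected.

```python
import json

# Assumption: these four sections must all be present.
REQUIRED_SECTIONS = ("StoreConfiguration", "InstanceConfiguration",
                     "OutputConfiguration", "ContextConfiguration")


def load_config(text: str) -> dict:
    """Parse a configuration, check its sections and normalise NumberOfOrderedResult to an int."""
    conf = json.loads(text)  # raises json.JSONDecodeError (a ValueError) on malformed JSON
    missing = [name for name in REQUIRED_SECTIONS if name not in conf]
    if missing:
        raise ValueError(f"missing configuration sections: {missing}")
    output = conf["OutputConfiguration"]
    if "NumberOfOrderedResult" in output:
        output["NumberOfOrderedResult"] = int(output["NumberOfOrderedResult"])
    return conf


SAMPLE = """{
  "StoreConfiguration": {"KindOfStore": "JENATDB", "RDFDocumentURIs": [],
                         "TDBDirectory": "data/CNRIT/TDB-0.8.9/CNRR/"},
  "InstanceConfiguration": {"InstanceURIsClass": "application.dataCNRIt.GetResearcherIMATIplusCoauthor"},
  "OutputConfiguration": {"KindOfOutput": "JSONOrderedResult", "NumberOfOrderedResult": "20",
                          "FilePath": "conf/dataCNRIt/ComplexContextResearchInterest/CRRIIntPub.res.json"},
  "ContextConfiguration": {"ContextFilePath": "conf/dataCNRIt/ComplexContextResearchInterest/CCRIIntPub.ctx"}
}"""

conf = load_config(SAMPLE)
print(conf["OutputConfiguration"]["NumberOfOrderedResult"])  # 20
```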
18
Data.cnr.it: defining a context
[Figure: researchers (e.g., Res 225, Res 226) linked to their publications (e.g., pub 22, pub 26, pub 29) through pub:autoreCNRdi and to their interest topics through dc:subject; topics are related through skos:broader. Part of the data is crawled from data.cnr.it and part from DBpedia.]
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX pub: <http://www.cnr.it/ontology/cnr/pubblicazioni.owl#>
[owl:Thing, dc:subject] -> {{}, {(skos:broader, Inter)}}   (Interest Hierarchy)
[owl:Thing] -> {{}, {(pub:autoreCNRDi, Inter), (dc:subject, Simil)}}   (Publications, Interests)
No data properties are considered in this context.
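To make the two context rules concrete, here is a simplified recursive evaluation in Python on a hypothetical toy graph whose identifiers loosely echo the figure (the links themselves are invented). It is a sketch under stated assumptions, not SSONDE's measure: "Inter" scores the share of A's values also held by B, "Simil" averages over A's values the best recursive match among B's values, identical resources score 1, properties that A lacks are skipped, and all operations get equal weight.

```python
from typing import Dict, List, Set, Tuple

Path = Tuple[str, ...]
# recursion path -> object-property operations (no data properties in this context)
CONTEXT: Dict[Path, List[Tuple[str, str]]] = {
    ("owl:Thing",): [("pub:autoreCNRDi", "Inter"), ("dc:subject", "Simil")],
    ("owl:Thing", "dc:subject"): [("skos:broader", "Inter")],
}

# Hypothetical toy data: resource -> property -> set of values.
GRAPH: Dict[str, Dict[str, Set[str]]] = {
    "res225": {"pub:autoreCNRDi": {"pub22", "pub26"}, "dc:subject": {"topic25"}},
    "res226": {"pub:autoreCNRDi": {"pub22", "pub26", "pub29"}, "dc:subject": {"topic25", "topic26"}},
    "topic25": {"skos:broader": {"topic2"}},
    "topic26": {"skos:broader": {"topic2", "topic23"}},
}


def values(resource: str, prop: str) -> Set[str]:
    return GRAPH.get(resource, {}).get(prop, set())


def sim(a: str, b: str, path: Path = ("owl:Thing",)) -> float:
    """Asymmetric, context-driven similarity of a with respect to b (simplified)."""
    if a == b:
        return 1.0
    scores: List[float] = []
    for prop, op in CONTEXT.get(path, []):
        va, vb = values(a, prop), values(b, prop)
        if not va:
            continue  # assumption: properties that A lacks are not scored
        if op == "Inter":
            scores.append(len(va & vb) / len(va))
        elif op == "Simil":
            sub_path = path + (prop,)  # recurse one level deeper in the context
            best = [max((sim(x, y, sub_path) for y in vb), default=0.0) for x in va]
            scores.append(sum(best) / len(va))
    return sum(scores) / len(scores) if scores else 0.0


print(sim("res225", "res226"))  # 1.0: res225 is contained in res226
print(sim("res226", "res225"))  # about 0.708 (17/24)
```

The recursion stops naturally because the path `[owl:Thing, dc:subject]` lists only an "Inter" operation.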
19
Similarity Matrix
The data are more recent but less accurate; however, more researchers are represented, and containment is still highlighted.
20
Hierarchical clustering: scientific clusters are discovered
[Figure: dendrogram of the researchers with clusters labelled Computer graphics, Grid Computing, Knowledge management, E-Learning, and Visiting researchers / Technicians / Associates.]
Clustering computed with Hierarchical Clustering Explorer 3.0, Human-Computer Interaction Lab, University of Maryland. http://www.cs.umd.edu/hcil/multi-cluster/
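Hierarchical clustering needs a symmetric dissimilarity, while SSONDE's similarity is asymmetric. The slides do not say how the matrix was prepared for Hierarchical Clustering Explorer; the Python sketch below assumes one simple option, d(a,b) = 1 - (Sim(a,b) + Sim(b,a))/2, followed by a naive average-linkage agglomerative clustering. Researcher names and scores are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Tuple


def to_distance(sim: Dict[Tuple[str, str], float], items: Sequence[str]) -> Dict[FrozenSet[str], float]:
    """Symmetrise an asymmetric similarity: d(a, b) = 1 - (Sim(a,b) + Sim(b,a)) / 2 (an assumption)."""
    return {frozenset((a, b)): 1.0 - (sim[(a, b)] + sim[(b, a)]) / 2 for a, b in combinations(items, 2)}


def average_linkage(items: Sequence[str], dist: Dict[FrozenSet[str], float], k: int) -> List[FrozenSet[str]]:
    """Merge the two closest clusters (mean pairwise distance) until only k clusters remain."""
    clusters = [frozenset([x]) for x in items]

    def cluster_dist(c1: FrozenSet[str], c2: FrozenSet[str]) -> float:
        return sum(dist[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: cluster_dist(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] | clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return clusters


researchers = ["r1", "r2", "r3", "r4"]
strong = {("r1", "r2"): 1.0, ("r2", "r1"): 0.8, ("r3", "r4"): 0.9, ("r4", "r3"): 0.7}
sim_matrix = {(a, b): strong.get((a, b), 0.1) for a in researchers for b in researchers if a != b}
print(average_linkage(researchers, to_distance(sim_matrix, researchers), k=2))
```

Averaging the two directions loses the containment information, so the asymmetric matrix remains the place to inspect who is contained in whom.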
21
What next?
1. Semantic similarity optimization:
   (i) caching of intermediate similarity results;
   (ii) adoption of the MapReduce paradigm to speed up the assessment of semantic similarity.
2. Domain-driven extensions at the data layer:
   (iii) new data layer measures suited for geo-referenced entities;
   (iv) multilingual similarity.
3. Interfaces sifting entities according to their similarity, exploiting visualization frameworks such as Exhibit, Google Visualization and the JavaScript InfoVis Toolkit.
22
THANKS for your kind attention!
Questions / Discussion / Suggestions
SSONDE framework
• pushes our instance similarity as a ready-to-go tool for the analysis of linked data;
• its Java code is available on Google Code: http://purl.oclc.org/NET/SSONDE
• licensed as open source code (GNU GPL v3).
Do not hesitate to contact us if
• SSONDE can be deployed in some of your future projects (proposals);
• you are interested in contributing to the SSONDE open framework.
23
A complete list of references on SSONDE and its Instance Similarity
SSONDE Framework
• Albertoni R., De Martino M.: SSONDE: Semantic Similarity On liNked Data Entities. 6th Metadata and Semantics Research Conference (MTSR 2012), 28-30 November 2012, Cádiz (Spain) [to appear].
• Framework installation & use: http://code.google.com/p/ssonde/wiki/GettingStarted
Semantic Similarity Theoretical Framework
• Albertoni R., De Martino M.: Asymmetric and context dependent semantic similarity among ontology instances. Journal on Data Semantics, LNCS, 2008.
• Albertoni R., De Martino M.: Semantic similarity of ontology instances tailored on the application context. Full paper at On the Move to Meaningful Internet Systems 2006: CoopIS, DOA, GADA, and ODBASE, volume 4275 of LNCS, pages 1020–1038. Springer, 2006.
Issues adapting the theoretical framework to Linked Data
• Albertoni R., De Martino M.: Semantic Similarity and Selection of Resources Published According to Linked Data Best Practice. OnToContent 2010, part of OTM'10.
Further Applications
Comparing EUNIS habitats with respect to their species
• Albertoni R., De Martino M.: Semantic Technology to Exploit Digital Content Exposed as Linked Data. eChallenges e-2011, 26-28 October 2011, Florence, Italy.
Comparing shape metadata (not Linked Data)
• Albertoni R., De Martino M.: Using Context Dependent Semantic Similarity to Browse Information Resources: an Application for the Industrial Design. First Workshop on Multimedia Annotation and Retrieval enabled by Shared Ontologies, Genoa, Italy, 2007.