Transcript of "The evolution of semantic technology evaluation in my own flesh (The 15 tips for technology evaluation)"
Speaker: Raúl García-Castro Talk at IMATI-CNR, October 15th, Genova, Italy
The evolution of semantic technology evaluation
in my own flesh (The 15 tips for technology evaluation)
Raúl García-Castro
Ontology Engineering Group. Universidad Politécnica de Madrid, Spain
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Index
• Self-awareness • Crawling (Graduation Project) • Walking (Ph.D. Thesis) • Cruising (Postdoctoral Research) • Insight
2
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Who am I?
• Assistant Professor - Ontology Engineering Group - Computer Science School at Universidad Politécnica de Madrid (UPM)
• Research lines
  - Evaluation and benchmarking of semantic technologies
    • Conformance and interoperability of ontology engineering tools
    • Evaluation infrastructures
  - Ontological engineering
    • Sensors, ALM, energy efficiency, context, software evaluation
  - Application integration
3
http://www.garcia-castro.com/
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Semantic Web technologies
The Semantic Web is:
• "An extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation" [Berners-Lee et al., 2001]
• A common framework for sharing and reusing data across applications
• Distinctive characteristics:
  - Use of W3C standards
  - Use of ontologies as data models
  - Inference of new information
  - Open world assumption
• High heterogeneity:
  - Different functionalities (in general and in particular)
  - Different KR formalisms (different expressivity, different reasoning capabilities)
4
[Figure: component-based framework for Semantic Web applications, grouping some sixty component types (ontology editor, browser, evaluator, visualizer, profiler, learner, ranker, selector, searcher, modularizer, localizer, versioner, aligner, matcher, merger, integrator, transformer, reconciler, populator, manual and automatic annotation, semantic query editor and processor, query answering, service discoverer, composer, orchestration and choreography engines, process mediator, and several distributed repositories and registries) into the categories ONTOLOGY DEVELOPMENT & MANAGEMENT, ONTOLOGY CUSTOMIZATION, ONTOLOGY EVOLUTION, ONTOLOGY ALIGNMENT, ONTOLOGY INSTANCE GENERATION, QUERYING AND REASONING, DATA MANAGEMENT, and SEMANTIC WEB SERVICES.]
García-Castro, R.; Muñoz-García, O.; Gómez-Pérez, A.; Nixon L. "Towards a component-based framework for developing Semantic Web applications". 3rd Asian Semantic Web Conference (ASWC 2008). 2-5 February, 2009. Bangkok, Thailand.
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Ontology engineering tools
Allow the creation and management of ontologies: • Ontology editors
- User oriented
• Ontology language APIs - Programming oriented
5
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Index
• Self-awareness • Crawling (Graduation Project) • Walking (Ph.D. Thesis) • Cruising (Postdoctoral Research) • Insight
6
http://www.phdcomics.com/comics/archive.php?comicid=1012
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Evaluation goal
GQM paradigm: Any software measurement activity should be preceded by:
1.- The identification of a software engineering goal ...
Goal: To improve the performance and the scalability of the methods provided by the ontology management APIs of ontology development tools
2.- ... which leads to questions ...
• What is the actual performance of the API methods?
• Is the performance of the methods stable?
• Are there any anomalies in the performance of the methods?
• Do changes in a method's parameters affect its performance?
• Does tool load affect the performance of the methods?
3.- ... which in turn lead to actual metrics.
Metric: execution times of the API methods with different load factors
• Latency:
  - Execution time of each method
  - Variance of the execution times of each method
  - Percentage of execution times out of range in each method's sample
  - Execution time with parameter A vs. execution time with parameter B
• Scalability:
  - Tool load versus execution time relationship
7
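As an aside, the GQM decomposition above is easy to capture as plain data; the sketch below (Python, illustrative only) simply pairs each question from this slide with the metric that answers it.

```python
# Illustrative only: the GQM decomposition from this slide as plain data.
gqm = {
    "goal": ("Improve the performance and scalability of the methods provided "
             "by the ontology management APIs of ontology development tools"),
    "questions_to_metrics": {
        "What is the actual performance of the API methods?":
            "Execution time of each method",
        "Is the performance of the methods stable?":
            "Variance of the execution times of each method",
        "Are there any anomalies in the performance of the methods?":
            "Percentage of execution times out of range in each method's sample",
        "Do changes in a method's parameters affect its performance?":
            "Execution time with parameter A vs. execution time with parameter B",
        "Does tool load affect the performance of the methods?":
            "Tool load versus execution time relationship",
    },
}

for question, metric in gqm["questions_to_metrics"].items():
    print(f"{question}\n  -> {metric}")
```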
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Evaluation data
8
• Atomic operations of the ontology management API
• Multiple benchmarks defined for each method according to changes in its parameters
• Benchmarks parameterised according to the number of consecutive executions of the method
insertConcept insertRelation insertClassAttribute insertInstanceAttribute insertConstant insertReasoningElement insertInstance updateConcept updateRelation updateClassAttribute updateInstanceAttribute updateConstant updateReasoningElement updateInstance .......
(72 methods)
Example for insertConcept(String ontology, String concept):
• benchmark1_1_08(N): "Inserts N concepts in 1 ontology" (Concept_1 ... Concept_N into Ontology_1)
• benchmark1_1_09(N): "Inserts 1 concept in N ontologies" (Concept_1 into Ontology_1 ... Ontology_N)
(128 benchmarks)
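To make the benchmark parameterisation concrete, here is a minimal sketch of what benchmark1_1_08 could look like; the tool object and its createOntology/insertConcept methods are hypothetical stand-ins for an ontology management API, not the actual interface of the evaluated tool.

```python
import time

def benchmark1_1_08(tool, n):
    """Hypothetical sketch: insert N concepts into one ontology, recording the
    wall-clock execution time (in ms) of each insertConcept call."""
    tool.createOntology("Ontology_1")                 # assumed API call, for illustration
    times_ms = []
    for i in range(1, n + 1):
        start = time.perf_counter()
        tool.insertConcept("Ontology_1", f"Concept_{i}")
        times_ms.append((time.perf_counter() - start) * 1000.0)
    return times_ms

# benchmark1_1_09 would instead create N ontologies and insert one concept into each.
```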
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro 9
Workload generator
• Generates synthetic ontologies and inserts them into the tool according to:
  - Load factor (X): defines the size of the ontology data
  - Ontology structure dependent on the benchmarks

Benchmark        | Operation                            | Execution needs
benchmark1_1_08  | Inserts N concepts in an ontology    | 1 ontology
benchmark1_1_09  | Inserts a concept in N ontologies    | N ontologies
benchmark1_3_20  | Removes N concepts from an ontology  | 1 ontology with N concepts
benchmark1_3_21  | Removes a concept from N ontologies  | N ontologies with 1 concept
For executing all the benchmarks, the ontology structure includes the execution needs of all the benchmarks
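A minimal sketch of the workload generator idea, assuming the same hypothetical tool API as above: the synthetic ontology grows with the load factor X so that every benchmark finds the elements it needs.

```python
def generate_workload(tool, load_factor):
    """Populate the tool with a synthetic ontology whose size depends on the
    load factor X (names and API calls are illustrative, not the real interface)."""
    tool.createOntology("LoadOntology")
    for i in range(load_factor):
        tool.insertConcept("LoadOntology", f"LoadConcept_{i}")        # X concepts
        tool.insertInstance("LoadOntology", f"LoadConcept_{i}",       # X instances
                            f"LoadInstance_{i}")
```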
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro 10
Evaluation infrastructure
[Diagram: the evaluation infrastructure consists of a Benchmark Suite Executor, a Workload Generator, the Performance Benchmark Suite, the Ontology Development Tool under evaluation, a Measurement Data Library and a Statistical Analyser; part of it has to be instantiated for each tool.]
http://knowledgeweb.semanticweb.org/wpbs/
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro 11
Statistical analyser

[Diagram: within the evaluation infrastructure, the Statistical Analyser (implemented with statistical software and BenchStats) reads the measurements stored in the Measurement Data Library and produces the analysis results.]

Measurement Data Library (example):
• benchmark1_1_08: 400 measurements (2134 ms, 2300 ms, 2242 ms, 2809 ms, ...)
• benchmark1_1_09: 400 measurements (1399 ms, 2180 ms, ...)
• benchmark1_3_20: 400 measurements (2032 ms, 1459 ms, ...)
• ...

Analysis results (example):

Benchmark        | Load | N   | Median | IQR | LQ  | UQ  | % Outliers | Function
benchmark1_1_08  | 5000 | 400 | 60     | 0   | 60  | 60  | 1.25       | y=62.0-0.009x
benchmark1_1_09  | 5000 | 400 | 911    | 11  | 901 | 912 | 1.75       | y=910.25-0.003x
benchmark1_3_20  | 5000 | 400 | 150    | 10  | 150 | 160 | 1.25       | y=155.25-0.003x
benchmark1_3_21  | 5000 | 400 | 151    | 10  | 150 | 160 | 0.25       | y=154.96-0.001x
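The statistics behind this table are standard; below is a minimal sketch (Python 3.10+, standard library only) of how the median, interquartile range, percentage of outliers and the scalability slope could be computed from one benchmark's measurements. The 1.5 x IQR outlier criterion is an assumption, chosen as the usual boxplot rule.

```python
import statistics

def summarise(times_ms, whisker=1.5):
    """Median, interquartile range and percentage of outliers of a benchmark's
    execution times (outliers: outside [LQ - whisker*IQR, UQ + whisker*IQR])."""
    lq, _, uq = statistics.quantiles(times_ms, n=4)
    iqr = uq - lq
    outliers = [t for t in times_ms if t < lq - whisker * iqr or t > uq + whisker * iqr]
    return statistics.median(times_ms), iqr, 100.0 * len(outliers) / len(times_ms)

def scalability_slope(medians_by_load):
    """Slope of a simple linear regression of the median execution time against
    the load factor (e.g., X = 500 .. 5000), as used in the scalability analysis."""
    loads, medians = zip(*sorted(medians_by_load.items()))
    return statistics.linear_regression(loads, medians).slope
```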
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro 12
Result analysis - Latency

• Metric for the execution time: the median of the execution times of a method.
  [Chart omitted; N=400, X=5000] 8 methods with execution times > 800 ms
• Metric for the variability of the execution time: the interquartile range of the execution times of a method.
  [Chart omitted; N=400, X=5000] 3 methods with IQR > 11 ms
• Metric for anomalies in the execution times: percentage of outliers in the execution times of a method.
  [Chart omitted; N=400, X=5000] 2 methods with % outliers > 5%
• Effect of changes in method parameters: comparison of the medians of the execution times of the benchmarks that use the same method.
  [Chart omitted; N=400, X=5000] 5 methods with differences in execution times > 60 ms
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro 13
Result analysis - Scalability
Effect of changes in WebODE’s load:
Slope of the function estimated by simple linear regression of the medians of the execution times from a minimum load (X=500) to a maximum one (X=5000).
8 methods with slope>0.1 ms.
N=400, X=500..5000
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro 14
Limitations
• Evaluating other tools is expensive
• Analysis of results was difficult
  - The evaluation was executed 10 times with different load factors: 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, and 5000
  - 128 benchmarks x 10 executions = 1280 files with results!
[Diagram: the evaluation infrastructure (Benchmark Suite Executor, Workload Generator, Performance Benchmark Suite, Measurement Data Library, Statistical Analyser) has to be instantiated again for each additional Ontology Development Tool to be evaluated.]
García-Castro R., Gómez-Pérez A "Guidelines for Benchmarking the Performance of Ontology Management APIs" 4th International Semantic Web Conference (ISWC2005), LNCS 3729. November 2005. Galway, Ireland.
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
The 15 tips for technology evaluation
• Know the technology
• Support different types of technology
• Automate the evaluation framework
• Expect reproducibility
• Beware of result analysis
• Learn statistics
• Plan for evaluation requirements
15
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Index
• Self-awareness • Crawling (Graduation Project) • Walking (Ph.D. Thesis) • Cruising (Postdoctoral Research) • Insight
16
KHAAAAN!
http://www.phdcomics.com/comics/archive.php?comicid=500
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Interoperability in the Semantic Web
• Interoperability is the ability of Semantic Web technologies to interchange ontologies and use them
  - At the information level, not at the system level
  - In terms of knowledge reuse, not information integration
• In the real world it is not feasible to use a single system or a single formalism
• Different behaviours in interchanges between different formalisms:
17
[Figure: interchange examples with two disjoint classes A and B and a subclass C. When source and target use the same formalism the ontology is interchanged without loss; when the formalisms differ, the disjointness may be lost (LOSS) or approximated, e.g. through an ad-hoc myDisjoint property (LESS loss).]
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Evaluation goal
To evaluate and improve the interoperability of Semantic Web technologies using RDF(S) and OWL as interchange languages
18
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Evaluation workflow - Manual
• Import (Tool X): an RDF(S)/OWL ontology Oi is imported into Tool X, yielding Oi'; Oi = Oi' + α - α'
• Export (Tool X): the ontology Oi modelled in Tool X is exported to RDF(S)/OWL, yielding Oi'; Oi = Oi' + β - β'
• Interoperability (Tool X → Tool Y): Oi passes through Tool X (Oi') and then through Tool Y, yielding Oi''; Oi = Oi'' + α - α' + β - β'
(The α and β terms stand for the information added or lost in the import and export steps.)
19
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Evaluation workflow - Automatic
20
• Existing ontologies O1..On are interchanged between Tool X and Tool Y through the interchange language (RDF(S)/OWL), producing the intermediate ontologies O1', O1'', O1''', O1''''
• Step 1 (Tool X): Import + Export; O1 = O1'' + α - α'
• Step 2 (Tool Y): Import + Export; O1'' = O1'''' + β - β'
• Interchange: O1 = O1'''' + α - α' + β - β'
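A minimal sketch of the ontology comparison at the end of such an interchange, using rdflib; the import/export steps performed by the tools are outside the sketch, and the file names are placeholders.

```python
from rdflib import Graph
from rdflib.compare import to_isomorphic, graph_diff

def interchange_diff(original_path, final_path):
    """Compare the original ontology O1 with the ontology obtained after the two
    import/export steps; returns the information lost and added, in triples."""
    original = to_isomorphic(Graph().parse(original_path))   # format guessed from extension
    final = to_isomorphic(Graph().parse(final_path))
    _, only_in_original, only_in_final = graph_diff(original, final)
    return only_in_original, only_in_final                   # (lost, added)
```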
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Evaluation data - OWL Lite Import Test Suite
21
RDF/XML syntax variants, e.g. the following two descriptions are equivalent:
  <rdf:Description rdf:about="#class1"> <rdf:type rdf:resource="&rdfs;Class"/> </rdf:Description>
  =
  <rdfs:Class rdf:about="#class1"> </rdfs:Class>

Component combinations, e.g.: subclass of class, subclass of restriction, value constraints, cardinality + object property, cardinality + datatype property, set operators.

Group                                                                | No.
Class hierarchies                                                    | 17
Class equivalences                                                   | 12
Classes defined with set operators                                   | 2
Property hierarchies                                                 | 4
Properties with domain and range                                     | 10
Relations between properties                                         | 3
Global cardinality constraints and logical property characteristics  | 5
Single individuals                                                   | 3
Named individuals and properties                                     | 5
Anonymous individuals and properties                                 | 3
Individual identity                                                  | 3
Syntax and abbreviation                                              | 15
TOTAL                                                                | 82
David S., García-Castro, R.; Gómez-Pérez, A. "Defining a Benchmark Suite for Evaluating the Import of OWL Lite Ontologies". Second International Workshop OWL: Experiences and Directions 2006 (OWL2006). November, 2006. Athens, Georgia, USA.
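The syntax-variant benchmarks rely on the fact that different RDF/XML spellings denote the same graph. A small sketch with rdflib, lightly adapted from the example above (full namespace URI instead of the &rdfs; entity, plus an explicit xml:base), checks that the two variants are indeed isomorphic.

```python
from rdflib import Graph
from rdflib.compare import isomorphic

variant_a = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
        xml:base="http://example.org/ontology">
  <rdf:Description rdf:about="#class1">
    <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
  </rdf:Description>
</rdf:RDF>"""

variant_b = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
        xml:base="http://example.org/ontology">
  <rdfs:Class rdf:about="#class1"/>
</rdf:RDF>"""

print(isomorphic(Graph().parse(data=variant_a, format="xml"),
                 Graph().parse(data=variant_b, format="xml")))   # True
```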
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Evaluation criteria
22
• Execution informs about the correct execution:
  - OK: no execution problem
  - FAIL: some execution problem
  - Comparer Error (C.E.): comparer exception
  - Not Executed (N.E.): second step not executed
• Information added or lost, in terms of triples: Oi = Oi' + α - α'
• Interchange informs whether the ontology has been interchanged correctly, with no addition or loss of information (Oi = Oi' ?):
  - SAME if Execution is OK and Information added and Information lost are void
  - DIFFERENT if Execution is OK but Information added or Information lost are not void
  - NO if Execution is FAIL, N.E. or C.E.
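These criteria reduce to a small decision rule; the sketch below restates it directly, taking the execution outcome and the added/lost triple sets as inputs.

```python
def interchange_verdict(execution, lost_triples, added_triples):
    """SAME, DIFFERENT or NO, following the criteria on this slide."""
    if execution != "OK":          # FAIL, C.E. or N.E.
        return "NO"
    if not lost_triples and not added_triples:
        return "SAME"
    return "DIFFERENT"
```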
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Evaluation campaigns
23
[Figure: tools participating in the campaigns. RDF(S) Interoperability Benchmarking: 6 tools (3 ontology development tools and 3 ontology repositories, frame-based knowledge models). OWL Interoperability Benchmarking: 9 tools (5 ontology development tools, 3 ontology repositories and 1 ontology-based annotation tool, with frame-based and OWL knowledge models, including SemTalk).]
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Evaluation infrastructure - IRIBA
24
[Screenshots of the IRIBA web application]
http://knowledgeweb.semanticweb.org/iriba/
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Evaluation infrastructure - IBSE
• Automatically executes experiments between all the tools
• Allows configuring different execution parameters
• Uses ontologies to represent benchmarks and results
• Depends on external ontology comparers (KAON2 OWL Tools and RDF-utils)
25
[Diagram: IBSE workflow over the OWL Lite Import Benchmark Suite. (1) Describe benchmarks: benchmark descriptions are RDF/OWL documents (instances of a benchmark ontology). (2) Execute benchmarks: the benchmarks are run over the tools, producing execution results (instances of a result ontology). (3) Generate reports: HTML and SVG reports are generated from the results.]
http://knowledgeweb.semanticweb.org/benchmarking_interoperability/ibse/
García-Castro, R.; Gómez-Pérez, A., Prieto-González J. "IBSE: An OWL Interoperability Evaluation Infrastructure". Third International Workshop OWL: Experiences and Directions 2007 (OWL2007). June, 2007. Innsbruck, Austria.
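To illustrate what "using ontologies to represent benchmarks and results" can look like in practice, here is a small rdflib sketch; the namespace, class and property names are hypothetical, only the idea of machine-processable descriptions is taken from IBSE (the actual ontologies are available at the URL above).

```python
from rdflib import Graph, Namespace, Literal, RDF

BMK = Namespace("http://example.org/benchmarkOntology#")   # hypothetical namespace

g = Graph()
g.add((BMK.benchmark01, RDF.type, BMK.Benchmark))
g.add((BMK.result01, RDF.type, BMK.Result))
g.add((BMK.result01, BMK.ofBenchmark, BMK.benchmark01))
g.add((BMK.result01, BMK.verdict, Literal("SAME")))

print(g.serialize(format="turtle"))
```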
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Evaluation results - Variability
• High variability in evaluation results
• Different perspectives for analysis - Results per tool / pair of tools - Results per component - Result evolution over time - …
26
[Charts: categories used to analyse the results. Tool import/export: models and executes; does not model and executes; models and fails; does not model and fails; not executed. Ontology comparison: same information; more information; less information; tool fails; comparer fails; not valid ontology. One chart shows the evolution of these categories over four evaluation rounds (04-2005, 05-2005, 10-2005, 01-2006).]
Combination (no. of benchmarks)                                      | K-K | P-P | W-W | K-P | K-W | P-W | K-P-W
Classes (2)                                                          |  Y  |  Y  |  Y  |  Y  |  Y  |  Y  |  Y
Classes instance of a single metaclass (4)                           |  Y  |  Y  |  -  |  N  |  -  |  -  |  -
Classes instance of multiple metaclasses (1)                         |  Y  |  N  |  -  |  N  |  -  |  -  |  -
Class hierarchies without cycles (3)                                 |  Y  |  Y  |  Y  |  Y  |  Y  |  Y  |  Y
Class hierarchies with cycles (2)                                    |  -  |  -  |  -  |  -  |  -  |  -  |  -
Classes related through object or datatype properties (6)            |  -  |  -  |  -  |  -  |  -  |  -  |  -
Datatype properties without domain or range (7)                      |  Y  |  Y  |  -  |  N  |  -  |  -  |  -
Datatype properties with multiple domains (3)                        |  Y  |  -  |  -  |  -  |  -  |  -  |  -
Datatype properties whose range is String (5)                        |  Y  |  Y  |  Y  |  N  |  N  |  Y  |  N
Datatype properties whose range is a XML Schema datatype (2)         |  Y  |  -  |  Y  |  -  |  Y  |  -  |  -
Object properties without domain or range (8)                        |  Y  |  Y  |  -  |  Y  |  -  |  -  |  -
Object properties with a domain and a range (2)                      |  Y  |  Y  |  Y  |  Y  |  Y  |  Y  |  Y
Object properties with multiple domains or ranges (5)                |  Y  |  -  |  -  |  -  |  -  |  -  |  -
Instances of undefined resources (1)                                 |  -  |  -  |  -  |  -  |  -  |  -  |  -
Instances of a single class (2)                                      |  Y  |  Y  |  Y  |  Y  |  Y  |  Y  |  Y
Instances of multiple classes (1)                                    |  Y  |  N  |  -  |  N  |  -  |  -  |  -
Instances related via object properties (7)                          |  Y  |  Y  |  Y  |  Y  |  Y  |  Y  |  Y
Instances related via datatype properties (2)                        |  Y  |  Y  |  Y  |  N  |  Y  |  Y  |  N
Instances related via datatype properties with range a XML Schema datatype (2) | - | - | Y | - | - | - | -
Instances related via undefined object or datatype properties (3)    |  -  |  -  |  -  |  -  |  -  |  -  |  -
Results per pair of tools (ORIGIN rows, DESTINATION columns; JE=Jena, PO=Protégé-OWL, SW=SWIProlog, K2=KAON2, GA=GATE, ST=SemTalk, WE=WebODE, PF=Protégé-Frames):

ORIGIN \ DESTINATION |  JE |  PO |  SW |  K2 |  GA |  ST |  WE |  PF
Jena                 | 100 | 100 | 100 |  78 |  85 |  16 |  17 |   5
Protégé-OWL          | 100 | 100 |  95 |  78 |  89 |  16 |  17 |   5
SWIProlog            | 100 | 100 | 100 |  78 |  55 |  45 |  17 |   5
KAON2                |  78 |  78 |  78 |  78 |  40 |  39 |   6 |   0
GATE                 |  96 |  52 |  79 |  74 |  46 |  13 |  15 |  13
SemTalk              |  45 |  46 |  46 |  27 |  24 |  46 |  17 |   0
WebODE               |  17 |  18 |   0 |   6 |  16 |  17 |  17 |  12
Protégé-Frames       |   5 |   5 |   0 |   0 |   4 |   5 |   0 |  13
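Each cell of such a matrix is an aggregation of the per-benchmark verdicts; below is a minimal sketch, assuming the verdicts are available as (origin, destination, verdict) tuples and that the percentages count SAME interchanges.

```python
from collections import defaultdict

def interoperability_matrix(verdicts):
    """Percentage of 'SAME' interchanges for every ordered (origin, destination) pair."""
    totals, same = defaultdict(int), defaultdict(int)
    for origin, destination, verdict in verdicts:
        totals[(origin, destination)] += 1
        same[(origin, destination)] += (verdict == "SAME")
    return {pair: 100.0 * same[pair] / totals[pair] for pair in totals}
```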
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Evaluation results - Interoperability
27
Clear picture of the interoperability between different tools
• Low interoperability and few clusters of interoperable tools
• Interoperability depends on:
  - Ontology translation (tool knowledge model)
  - Specification (development decisions)
  - Robustness (tool defects)
  - Tools participating in the interchange (each behaves differently)
• Tools have improved
• Involvement of tool developers is needed
  - Tool developers have been informed
  - Tool improvement is out of our scope
• Results are expected to change
  - Continuous evaluation is needed
García-Castro, R.; Gómez-Pérez, A. "Interoperability results for Semantic Web technologies using OWL as the interchange language". Web Semantics: Science, Services and Agents in the World Wide Web. ISSN: 1570-8268. Elsevier. Volume 8, number 4. pp. 278-291. November 2010.
García-Castro, R.; Gómez-Pérez, A. "RDF(S) Interoperability Results for Semantic Web Technologies". International Journal of Software Engineering and Knowledge Engineering. ISSN: 0218-1940. Editor: Shi-Kuo Chang. Volume 19, number 8. pp. 1083-1108. December 2009.
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Benchmarking interoperability
28
Method for benchmarking interoperability
• Common for different Semantic Web technologies
• Problem-focused instead of tool-focused
• Manual vs automatic experiments:
  - It depends on the specific needs of the benchmarking
  - Automatic: cheaper, more flexible and extensible
  - Manual: higher quality of results

Resources for benchmarking interoperability
• All the benchmark suites, software and results are publicly available
• Independent of:
  - The interchange language
  - The input ontologies

[Diagram: the evaluation resources (IBSE, rdfsbs and IRIBA, used for automatic and manual experiments) and the benchmark suites (OWL Lite Import, RDF(S) Import, RDF(S) Export and RDF(S) Interoperability), as used in the RDF(S) and OWL Interoperability Benchmarking activities.]
García-Castro, R. "Benchmarking Semantic Web technology". Studies on the Semantic Web vol. 3. AKA Verlag – IOS Press. ISBN: 978-3-89838-622-7. January 2010.
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Limitations
• The number of results to analyse grew considerably: 2168 executions in the RDF(S) benchmarking activity and 6642 in the OWL one
• Hard to support and maintain different test data and tools
• Every tool to be evaluated had to be deployed on the same computer
29
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
The 15 tips for technology evaluation
• Know the technology
• Support different test data
• Support different types of technology
• Use machine-processable descriptions of evaluation resources
• Automate the evaluation framework
• Expect reproducibility
• Beware of result analysis
• Learn statistics
• Plan for evaluation requirements
• Organize (or join) evaluation campaigns
30
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Index
• Self-awareness • Crawling (Graduation Project) • Walking (Ph.D. Thesis) • Cruising (Postdoctoral Research) • Insight
31
http://www.phdcomics.com/comics/archive.php?comicid=570
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
The SEALS Project (RI-238975)
32
Universidad Politécnica de Madrid, Spain (Coordinator); University of Sheffield, UK; Forschungszentrum Informatik, Germany; University of Innsbruck, Austria; Institut National de Recherche en Informatique et en Automatique, France; University of Mannheim, Germany; University of Zurich, Switzerland; STI International, Austria; Open University, UK; Oxford University, UK

Project Coordinator: Asunción Gómez Pérez <[email protected]>
http://www.seals-project.eu/
EC contribution: 3,500,000 €
Duration: June 2009 - June 2012
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Semantic technology evaluation @ SEALS
33
SEALS Platform
SEALS Evaluation Campaigns, SEALS Community
SEALS Evaluation Services
Wrigley S.; García-Castro R.; Nixon L. "Semantic Evaluation At Large Scale (SEALS)". 21st International World Wide Web Conference (WWW 2012). European projects track. pp. 299-302. Lyon, France. 16-20 April 2012.
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
The SEALS entities
34
[Diagram: the SEALS entities are Tools, Test Data, Evaluations and Results (Raw Results and Interpretations), covering five technology areas: ontology engineering, storage and reasoning, ontology matching, semantic search, and semantic web services.]
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Structure of the SEALS entities
35
[Diagram: each entity comprises Data and Metadata; the metadata supports discovery, validation and exploitation. Entity data can take several forms, such as Java binaries, shell scripts, bundles, BPEL workflows and ontologies.]
SEALS Ontologies: http://www.seals-project.eu/ontologies/
García-Castro R.; Esteban-Gutiérrez M.; Kerrigan M.; Grimm S. "An Ontology Model to Support the Automatic Evaluation of Software". 22nd International Conference on Software Engineering and Knowledge Engineering (SEKE 2010). pp. 129-134. Redwood City, USA. 1-3 July 2010.
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
SEALS logical architecture
36
[Diagram: SEALS logical architecture. The SEALS Portal and software agents access the SEALS Service Manager, which coordinates the Runtime Evaluation Service and the SEALS Repositories (Test Data Repository Service, Tools Repository Service, Results Repository Service, Evaluation Descriptions Repository Service). Its users are technology providers, evaluation organisers and technology adopters.]
García-Castro R.; Esteban-Gutiérrez M.; Gómez-Pérez A. "Towards an Infrastructure for the Evaluation of Semantic Technologies". eChallenges e-2010 Conference (e-2010). pp. 1-8. Warsaw, Poland. 27-29 October 2010.
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Challenges
• Tool heterogeneity
  - Hardware requirements
  - Software requirements
• Reproducibility
  - Ensure that the execution environment offers the same initial status

37

Virtualization as a technology enabler. Virtualization solution:
• VMWare Server 2.0.2
• VMWare vSphere 4
• Amazon EC2 (in progress)

[Diagram: each tool runs inside a virtual machine on an execution node, managed from a processing node.]
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Evaluation campaign methodology
38
SEALS Methodology for Evaluation Campaigns
Raúl García-Castro and Stuart N. Wrigley
September 2011
Phases: Initiation, Involvement, Preparation & Execution, Dissemination, Finalization
• SEALS-independent
• Includes: actors, process, recommendations, alternatives, terms of participation, use rights
García Castro R.; Martin-Recuerda F.; Wrigley S. "SEALS. Deliverable 3.8 SEALS Methodology for Evaluation Campaigns v2". Technical Report. SEALS project. July 2011.
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Current SEALS evaluation services
39

Ontology engineering
• Evaluations: conformance, interoperability, scalability
• Test data: conformance & interoperability: RDF(S); OWL Lite, DL and Full; OWL 2 expressive (x3); OWL 2 Full. Scalability: real-world, LUBM, real-world+, LUBM+

Ontology reasoning
• Evaluations: DL reasoning (classification, class satisfiability, ontology satisfiability, entailment, non-entailment, instance retrieval); RDF reasoning (conformance)
• Test data: DL reasoning (Gardiner test suite, Wang et al. repository, versions of GALEN, ontologies from EU projects, instance retrieval test data); RDF reasoning (OWL 2 Full)

Ontology matching
• Evaluations: matching accuracy; multilingual matching accuracy; scalability (ontology size, # CPU)
• Test data: Benchmark, Anatomy, Conference, MultiFarm, Large Biomed (supported by SEALS)

Semantic search
• Evaluations: search accuracy and efficiency (automated); usability and satisfaction (user-in-the-loop)
• Test data: automated (EvoOnt, MusicBrainz from QALD-1); user-in-the-loop (Mooney, Mooney+)

Semantic web services
• Evaluations: SWS discovery
• Test data: OWLS-TC 4.0, SAWSDL-TC 3.0, WSMO-LITE-TC
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
New evaluation data – Conformance and interoperability
40

• OWL DL test suite → keyword-driven approach
  - Manual definition of tests in a CSV/spreadsheet using a keyword library
  - [Diagram: a preprocessor expands the test suite definition script; an interpreter and a test suite generator use the keyword library to produce the test suite (ontology01.owl, ontology02.owl, ontology03.owl, plus metadata).]
• OWL 2 test suite → automatically generated ontologies of increasing expressiveness
  - Using ontologies in the Web
  - Maximizing expressiveness
  - [Diagram: ontology search over online ontologies and ontology module extraction (via the OWL API) produce the original test suite; increasing and maximizing expressivity then yield an expressive and a full-expressive test suite, each with metadata.]
OWLDLGenerator (http://knowledgeweb.semanticweb.org/benchmarking_interoperability/OWLDLGenerator/)
García-Castro R.; Gómez-Pérez A. "A Keyword-driven Approach for Generating Ontology Language Conformance Test Data". Engineering Applications of Artificial Intelligence. ISSN: 0952-1976. Elsevier. Editor: B. Grabot.
Grangel-González I.; García-Castro R. "Automatic Conformance Test Data Generation Using Existing Ontologies in the Web". Second International Workshop on Evaluation of Semantic Technologies (IWEST 2012). 28 May 2012. Heraklion, Greece.
OWL2EG (http://knowledgeweb.semanticweb.org/benchmarking_interoperability/OWL2EG/)
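To give a flavour of the keyword-driven approach, here is a purely illustrative sketch: a toy keyword library mapping keywords to axiom templates and the expansion of one test definition row. The real keyword library and test definitions are those described in the cited paper, not these.

```python
# Toy keyword library: keyword -> RDF/XML axiom template (illustrative only).
KEYWORD_LIBRARY = {
    "Class":      '<owl:Class rdf:about="#{0}"/>',
    "SubClassOf": '<owl:Class rdf:about="#{0}"><rdfs:subClassOf rdf:resource="#{1}"/></owl:Class>',
}

def expand(test_row):
    """Expand one CSV-like test definition, e.g. ["Class C1", "SubClassOf C2 C1"],
    into the RDF/XML fragments of the corresponding test ontology."""
    fragments = []
    for cell in test_row:
        keyword, *args = cell.split()
        fragments.append(KEYWORD_LIBRARY[keyword].format(*args))
    return fragments

print(expand(["Class C1", "SubClassOf C2 C1"]))
```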
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
1st Evaluation Campaign
41
Tools per campaign area (tool, provider, country):
Ontology engineering: Jena (HP Labs, UK); Sesame (Aduna, Netherlands); Protégé 4 (University of Stanford, USA); Protégé OWL (University of Stanford, USA); NEON toolkit (NEON Foundation, Europe); OWL API (University of Manchester, UK)
Reasoning: HermiT (University of Oxford, UK); jcel (Tec. Universitat Dresden, Germany); FaCT++ (University of Manchester, UK)
Matching: AROMA (INRIA, France); ASMOV (INFOTECH Soft, USA); Aroma (Nantes University, France); Falcon-AO (Southeast University, China); Lily (Southeast University, China); RiMOM (Tsinghua University, China); Mapso (FZI, Germany); CODI (University of Mannheim, Germany); AgreeMaker (Advances in Computing Lab, USA); Gerome* (RWTH Aachen, Germany); Ef2Match (Nanyang Tec. University, China)
Semantic search: K-Search (K-Now Ltd, UK); Ginseng (University of Zurich, Switzerland); NLP-Reduce (University of Zurich, Switzerland); PowerAqua (KMi, Open University, UK); Jena Arq (HP Labs, Talis, UK)
Semantic web service: 4 OWLS-MX variants (DFKI, Germany)

29 tools from 8 countries
Nixon L.; García-Castro R.; Wrigley S.; Yatskevich M.; Trojahn-dos-Santos C.; Cabral L. "The state of semantic technology today – overview of the first SEALS evaluation campaigns". 7th International Conference on Semantic Systems (I-SEMANTICS2011). Graz, Austria. 7-9 September 2011.
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
2nd Evaluation Campaign
42

Tools per work package (tool, provider, country):
WP10 - Ontology engineering: Jena (HP Labs, UK); Sesame (Aduna, Netherlands); Protégé 4 (University of Stanford, USA); Protégé OWL (University of Stanford, USA); NeOn toolkit (NeOn Foundation, Europe); OWL API (University of Manchester, UK)
WP11 - Reasoning: HermiT (University of Oxford, UK); jcel (Technischen Universitat Dresden, Germany); FaCT++ (University of Manchester, UK); WSReasoner (University of New Brunswick, Canada)
WP12 - Ontology matching: AgrMaker (University of Illinois at Chicago, USA); Aroma (INRIA Grenoble Rhone-Alpes, France); AUTOMSv2 (VTT Technical Research Centre, Finland); CIDER (Universidad Politecnica de Madrid, Spain); CODI (Universitat Mannheim, Germany); CSA (University of Ho Chi Minh City, Vietnam); GOMMA (Universitat Leipzig, Germany); Hertuda (Technische Universitat Darmstadt, Germany); LDOA (Tunis-El Manar University, Tunisia); Lily (Southeast University, China); LogMap (University of Oxford, UK); LogMapLt (University of Oxford, UK); MaasMtch (Maastricht University, Netherlands); MapEVO (FZI Forschungszentrum Informatik, Germany); MapPSO (FZI Forschungszentrum Informatik, Germany); MapSSS (Wright State University, USA); Optima (University of Georgia, USA); WeSeEMtch (Technische Universitat Darmstadt, Germany); YAM++ (LIRMM, France)
WP13 - Semantic search: K-Search (K-Now Ltd, UK); Ginseng (University of Zurich, Switzerland); NLP-Reduce (University of Zurich, Switzerland); PowerAqua (KMi, Open University, UK); Jena Arq v2.8.2 (HP Labs, Talis, UK); Jena Arq v2.9.0 (HP Labs, Talis, UK); rdfQuery v0.5.1-beta (University of Southampton, UK); Semantic Crystal (University of Zurich, Switzerland); Affective Graphs (University of Sheffield, UK)
WP14 - Semantic web services: WSMO-LITE-OU (KMi, Open University, UK); SAWSDL-OU (KMi, Open University, UK); OWLS-URJC (University of Rey Juan Carlos, Spain); OWLS-M0 (DFKI, Germany)

41 tools from 13 countries
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Evaluation services
43
[Diagram: with the SEALS evaluation services you can exploit the stored results, execute the stored evaluations over the stored tools and test data (or over your own tool and test data) to obtain your own results, update existing evaluation resources, or define your own evaluation.]
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Quality model for semantic technologies
44
Tool / Measures             | Raw Results | Interpretations | Quality Measures | Quality sub-characteristics
Ontology engineering tools  | 7           | 20              | 8                | 6
Ontology matching tools     | 1           | 4               | 4                | 2
Reasoning systems           | 11          | 0               | 16               | 5
Semantic search tools       | 12          | 8               | 18               | 7
Semantic web service tools  | 5           | 9               | 10               | 2
Total                       | 34          | 41              | 55               | 17
Radulovic, F., Garcia-Castro, R., Extending Software Quality Models - A Sample In The Domain of Semantic Technologies. 23rd International Conference on Software Engineering and Knowledge Engineering (SEKE2011). Miami, USA. July, 2011
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Semantic technology recommendation
45
"I need a robust ontology engineering tool and a semantic search tool with the highest precision."
[Diagram: the user's quality requirements are matched, through the quality model, against the SEALS Platform's Tools Repository Service and Results Repository Service to produce a semantic technology recommendation, e.g.: "You should use Sesame v2.6.5 and Arq v2.9.0. The reason for this is... Alternatively, you can use ..."]
Radulovic F.; García-Castro R. "Semantic Technology Recommendation Based on the Analytic Network Process". 24th Int. Conference on Software Engineering and Knowledge Engineering (SEKE 2012). Redwood City, CA, USA. 1-3 July 2012. 3rd Best Paper Award!
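As a rough intuition for the recommendation step (not the Analytic Network Process used in the paper, which also models dependencies between criteria), a simple weighted-sum ranking over normalised quality measures already captures the idea of matching user requirements against stored results; all data below is made up.

```python
def recommend(tools, requirement_weights):
    """Rank tools by a weighted sum of normalised quality measures
    (a simplified stand-in for the ANP-based recommendation)."""
    def score(measures):
        return sum(weight * measures.get(criterion, 0.0)
                   for criterion, weight in requirement_weights.items())
    return sorted(tools, key=lambda name: score(tools[name]), reverse=True)

# Hypothetical quality measures in [0, 1] and user weights.
tools = {"Sesame v2.6.5": {"robustness": 0.9}, "OtherStore": {"robustness": 0.7}}
print(recommend(tools, {"robustness": 1.0}))   # ['Sesame v2.6.5', 'OtherStore']
```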
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
You can use the SEALS Platform
• The SEALS Platform facilitates: - Comparing tools under common settings - Reproducibility of evaluations - Reusing evaluation resources, completely or partially - Or defining new ones - Managing evaluation resources using platform services - Computational resources for demanding evaluations
• Don’t start your evaluation from scratch!
46
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
The 15 tips for technology evaluation
• Know the technology
• Support different test data
• Facilitate test data definition
• Support different types of technology
• Define declarative evaluation workflows
• Use machine-processable descriptions of evaluation resources
• Automate the evaluation framework
• Expect reproducibility
• Beware of result analysis
• Learn statistics
• Plan for evaluation requirements
• Use a quality model
• Organize (or join) evaluation campaigns
• Share evaluation resources
• Exploit evaluation results
47
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Index
• Self-awareness • Crawling (Graduation Project) • Walking (Ph.D. Thesis) • Cruising (Postdoctoral Research) • Insight
48
insight noun […] [mass noun] Psychiatry awareness by a mentally ill person that their mental experiences are not based in external reality.
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Evolution towards maturity
49
TABLE I. LEVELS AND THEMES OF SOFTWARE EVALUATION TECHNOLOGY MATURITY
(Themes: formalization of the evaluation workflow; software support to the evaluation; applicability to multiple software types; usability of test data; exploitability of results; representativeness of participants.)

Initial
• Workflow: ad-hoc workflow informally defined
• Software support: manual evaluation, no software support
• Applicability: small number of software products of the same type
• Test data: informally defined
• Results: informally defined, not verifiable
• Participants: one team

Repeatable
• Workflow: ad-hoc workflow defined
• Software support: ad-hoc evaluation software
• Applicability: small number of software products of the same type; ad-hoc access to software products
• Test data: defined
• Results: machine-processable; combined for some software products of the same type
• Participants: one or few teams

Reusable
• Workflow: technology-specific workflow defined
• Software support: reusable evaluation software (multiple software products, multiple test data)
• Applicability: multiple software products of the same type; generic access to software products
• Test data: machine-processable
• Results: machine-processable; combined for many software products of the same type
• Participants: several teams

Integrated
• Workflow: generic workflow defined; machine-processable and built reusing common parts; evaluation resources built upon shared principles
• Software support: evaluation infrastructure (multiple types of software products, multiple test data)
• Applicability: multiple software products of different types; generic access to software products
• Test data: machine-processable; reused across evaluations
• Results: machine-processable; combined for many software products of different types
• Participants: several teams, stakeholders

Optimized
• Workflow: generic workflow defined; machine-processable and built reusing common parts; evaluation resources built upon shared principles; measured and optimized
• Software support: federation of evaluation infrastructures (autonomous infrastructures, interchange of evaluation resources, data access and use policies)
• Applicability: multiple software products of different types; generic access to software products; support for any software product requirement
• Test data: machine-processable; reused across evaluations; customizable, optimized and curated
• Results: machine-processable; combined for many software products of different types; high availability and quality
• Participants: community
characteristics of such software products. This workflow is supported by evaluation software that can be used to assess any software product of the type covered by the evaluation; the software product must have previously implemented the required mechanisms to be integrated with the evaluation software. Test data and evaluation results are machine-processable; therefore, they can be reused. Furthermore, the results can be combined for all the software products of the same type.

D. Level 4. Integrated
At this level, several teams in collaboration with relevant stakeholders (e.g., users or providers) define a generic evaluation framework that can be used with any type of software product. This generic framework for software evaluation allows building evaluation resources (i.e., evaluation workflow, tools, test data, and results) upon shared principles and reusing common parts. Here, evaluation workflows are defined in a machine-interpretable format so they can be automated. An evaluation infrastructure gives support both to the evaluation of multiple types of software products, taking into account their different characteristics, and to the management of the different evaluation resources. Test data can be reused across different evaluations, and the evaluation results can be combined for software products of different types.

E. Level 5. Optimized
At this level the whole community has adopted a generic framework for software evaluation in which evaluation workflows are measured and optimized. The centralized scenario of the previous levels has now evolved into a federation of autonomous evaluation infrastructures. These evaluation infrastructures must support not only the evaluation workflow but also new requirements, such as the interchange of evaluation resources or the implementation of policies for data access, interchange, and use. This federation of infrastructures permits satisfying any software or hardware requirements of the different software products; customizing, optimizing, and curating test data; and improving the availability and quality of the evaluation results.

One of the notions behind the maturity model, as Figure 2 shows, is that a higher maturity level implies higher integration of evaluation efforts in one field, ranging from isolated evaluations in the lower maturity level to fully-integrated evaluations in the higher level. In this scenario, maturity evolves from a starting point of decentralized efforts into centralized infrastructures and ends with networks of federated infrastructures.

Another notion to consider in this model is that of cost. While the cost of defining new evaluations decreases when the maturity level increases, mainly due to the reuse of existing resources, the cost associated to the evaluation infrastructure (hardware and infrastructure development and maintenance) significantly increases.

VI. ASSESSMENTS IN THE SEMANTIC RESEARCH FIELD

This section presents how we have used SET-MM to assess the maturity of software evaluation technologies in a specific research field.

Other maturity models provide appraisal methods for comparing with the maturity model. However, we do not propose any appraisal method because our scope is a whole research field and, therefore, it would be difficult to obtain objective metrics since any judgment would be subjective.

Therefore, our approach has been, first, to identify some evaluation efforts that stand out because of their impact in the field and, second, to try to assess the maturity of the software evaluation technologies used in them.
Software Evaluation Technology Maturity Model (SET-MM)
[Diagram: the interoperability benchmarking resources shown earlier (IBSE, rdfsbs and IRIBA with the OWL Lite Import, RDF(S) Import, RDF(S) Export and RDF(S) Interoperability benchmark suites, labelled UPM-FBI) positioned with respect to the maturity model.]
García-Castro R. "SET-MM – A Software Evaluation Technology Maturity Model". 23rd International Conference on Software Engineering and Knowledge Engineering (SEKE2011). pp. 660-665. Miami Beach, USA. 7-9 July 2011.
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
The 15 tips for technology evaluation
• Know the technology
• Support different test data
• Facilitate test data definition
• Support different types of technology
• Define declarative evaluation workflows
• Use machine-processable descriptions of evaluation resources
• Automate the evaluation framework
• Expect reproducibility
• Beware of result analysis
• Learn statistics
• Plan for evaluation requirements
• Use a quality model
• Organize (or join) evaluation campaigns
• Share evaluation resources
• Exploit evaluation results
50
Speaker: Raúl García-Castro Talk at IMATI-CNR, October 15th, Genova, Italy
Thank you for your attention!