Transcript of "The evolution of semantic technology evaluation in my own flesh (The 15 tips for technology evaluation)"
Speaker: Raúl García-Castro Talk at IMATI-CNR, October 15th, Genova, Italy
The evolution of semantic technology evaluation
in my own flesh (The 15 tips for technology evaluation)
Raúl García-Castro
Ontology Engineering Group. Universidad Politécnica de Madrid, Spain
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Index
• Self-awareness • Crawling (Graduation Project) • Walking (Ph.D. Thesis) • Cruising (Postdoctoral Research) • Insight
2
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Who am I?
• Assistant Professor - Ontology Engineering Group - Computer Science School at Universidad Politécnica de Madrid (UPM)
• Research lines
  - Evaluation and benchmarking of semantic technologies
    • Conformance and interoperability of ontology engineering tools
    • Evaluation infrastructures
  - Ontological engineering
    • Sensors, ALM, energy efficiency, context, software evaluation
  - Application integration
3
http://www.garcia-castro.com/
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Semantic Web technologies
The Semantic Web is:
• "An extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation" [Berners-Lee et al., 2001]
• A common framework for sharing and reusing data across applications
• Distinctive characteristics:
  - Use of W3C standards
  - Use of ontologies as data models
  - Inference of new information
  - Open world assumption
• High heterogeneity:
  - Different functionalities (in general and in particular)
  - Different KR formalisms (different expressivity, different reasoning capabilities)
4
[Figure: component-based framework for Semantic Web applications, grouping some sixty component types (ontology editor, browser, evaluator, visualizer, profiler, learner, ranker, selector, searcher, modularizer, localizer, versioner, aligner, matcher, merger, integrator, transformer, reconciler, populator, manual and automatic annotation, semantic query editor and processor, query answering, service discoverer, composer, orchestration and choreography engines, process mediator, and several distributed repositories and registries) into the categories ONTOLOGY DEVELOPMENT & MANAGEMENT, ONTOLOGY CUSTOMIZATION, ONTOLOGY EVOLUTION, ONTOLOGY ALIGNMENT, ONTOLOGY INSTANCE GENERATION, QUERYING AND REASONING, DATA MANAGEMENT, and SEMANTIC WEB SERVICES.]
García-Castro, R.; Muñoz-García, O.; Gómez-Pérez, A.; Nixon L. "Towards a component-based framework for developing Semantic Web applications". 3rd Asian Semantic Web Conference (ASWC 2008). 2-5 February, 2009. Bangkok, Thailand.
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Ontology engineering tools
Allow the creation and management of ontologies: • Ontology editors
- User oriented
• Ontology language APIs - Programming oriented
5
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Index
• Self-awareness • Crawling (Graduation Project) • Walking (Ph.D. Thesis) • Cruising (Postdoctoral Research) • Insight
6
http://www.phdcomics.com/comics/archive.php?comicid=1012
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Evaluation goal
GQM paradigm: Any software measurement activity should be preceded by:
1.- The identification of a software engineering goal ...
Goal: To improve the performance and the scalability of the methods provided by the ontology management APIs of ontology development tools
2.- ... which leads to questions ...
• What is the actual performance of the API methods?
• Is the performance of the methods stable?
• Are there any anomalies in the performance of the methods?
• Do changes in a method's parameters affect its performance?
• Does tool load affect the performance of the methods?
3.- ... which in turn lead to actual metrics.
Metric: execution times of the API methods with different load factors
• Latency:
  - Execution time of each method
  - Variance of the execution times of each method
  - Percentage of execution times out of range in each method's sample
  - Execution time with parameter A vs. execution time with parameter B
• Scalability:
  - Tool load versus execution time relationship
7
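As an aside, the GQM decomposition above is easy to capture as plain data; the sketch below (Python, illustrative only) simply pairs each question from this slide with the metric that answers it.

```python
# Illustrative only: the GQM decomposition from this slide as plain data.
gqm = {
    "goal": ("Improve the performance and scalability of the methods provided "
             "by the ontology management APIs of ontology development tools"),
    "questions_to_metrics": {
        "What is the actual performance of the API methods?":
            "Execution time of each method",
        "Is the performance of the methods stable?":
            "Variance of the execution times of each method",
        "Are there any anomalies in the performance of the methods?":
            "Percentage of execution times out of range in each method's sample",
        "Do changes in a method's parameters affect its performance?":
            "Execution time with parameter A vs. execution time with parameter B",
        "Does tool load affect the performance of the methods?":
            "Tool load versus execution time relationship",
    },
}

for question, metric in gqm["questions_to_metrics"].items():
    print(f"{question}\n  -> {metric}")
```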
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Evaluation data
8
• Atomic operations of the ontology management API
• Multiple benchmarks defined for each method according to changes in its parameters
• Benchmarks parameterised according to the number of consecutive executions of the method
insertConcept insertRelation insertClassAttribute insertInstanceAttribute insertConstant insertReasoningElement insertInstance updateConcept updateRelation updateClassAttribute updateInstanceAttribute updateConstant updateReasoningElement updateInstance .......
(72 methods)
Example for insertConcept(String ontology, String concept):
• benchmark1_1_08(N): "Inserts N concepts in 1 ontology" (Concept_1 ... Concept_N into Ontology_1)
• benchmark1_1_09(N): "Inserts 1 concept in N ontologies" (Concept_1 into Ontology_1 ... Ontology_N)
(128 benchmarks)
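To make the benchmark parameterisation concrete, here is a minimal sketch of what benchmark1_1_08 could look like; the tool object and its createOntology/insertConcept methods are hypothetical stand-ins for an ontology management API, not the actual interface of the evaluated tool.

```python
import time

def benchmark1_1_08(tool, n):
    """Hypothetical sketch: insert N concepts into one ontology, recording the
    wall-clock execution time (in ms) of each insertConcept call."""
    tool.createOntology("Ontology_1")                 # assumed API call, for illustration
    times_ms = []
    for i in range(1, n + 1):
        start = time.perf_counter()
        tool.insertConcept("Ontology_1", f"Concept_{i}")
        times_ms.append((time.perf_counter() - start) * 1000.0)
    return times_ms

# benchmark1_1_09 would instead create N ontologies and insert one concept into each.
```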
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro 9
Workload generator
• Generates synthetic ontologies and inserts them into the tool according to:
  - Load factor (X): defines the size of the ontology data
  - Ontology structure dependent on the benchmarks

Benchmark        | Operation                            | Execution needs
benchmark1_1_08  | Inserts N concepts in an ontology    | 1 ontology
benchmark1_1_09  | Inserts a concept in N ontologies    | N ontologies
benchmark1_3_20  | Removes N concepts from an ontology  | 1 ontology with N concepts
benchmark1_3_21  | Removes a concept from N ontologies  | N ontologies with 1 concept
For executing all the benchmarks, the ontology structure includes the execution needs of all the benchmarks
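A minimal sketch of the workload generator idea, assuming the same hypothetical tool API as above: the synthetic ontology grows with the load factor X so that every benchmark finds the elements it needs.

```python
def generate_workload(tool, load_factor):
    """Populate the tool with a synthetic ontology whose size depends on the
    load factor X (names and API calls are illustrative, not the real interface)."""
    tool.createOntology("LoadOntology")
    for i in range(load_factor):
        tool.insertConcept("LoadOntology", f"LoadConcept_{i}")        # X concepts
        tool.insertInstance("LoadOntology", f"LoadConcept_{i}",       # X instances
                            f"LoadInstance_{i}")
```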
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro 10
Evaluation infrastructure
[Diagram: the evaluation infrastructure consists of a Benchmark Suite Executor, a Workload Generator, the Performance Benchmark Suite, the Ontology Development Tool under evaluation, a Measurement Data Library and a Statistical Analyser; part of it has to be instantiated for each tool.]
http://knowledgeweb.semanticweb.org/wpbs/
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro 11
Statistical analyser

[Diagram: within the evaluation infrastructure, the Statistical Analyser (implemented with statistical software and BenchStats) reads the measurements stored in the Measurement Data Library and produces the analysis results.]

Measurement Data Library (example):
• benchmark1_1_08: 400 measurements (2134 ms, 2300 ms, 2242 ms, 2809 ms, ...)
• benchmark1_1_09: 400 measurements (1399 ms, 2180 ms, ...)
• benchmark1_3_20: 400 measurements (2032 ms, 1459 ms, ...)
• ...

Analysis results (example):

Benchmark        | Load | N   | Median | IQR | LQ  | UQ  | % Outliers | Function
benchmark1_1_08  | 5000 | 400 | 60     | 0   | 60  | 60  | 1.25       | y=62.0-0.009x
benchmark1_1_09  | 5000 | 400 | 911    | 11  | 901 | 912 | 1.75       | y=910.25-0.003x
benchmark1_3_20  | 5000 | 400 | 150    | 10  | 150 | 160 | 1.25       | y=155.25-0.003x
benchmark1_3_21  | 5000 | 400 | 151    | 10  | 150 | 160 | 0.25       | y=154.96-0.001x
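The statistics behind this table are standard; below is a minimal sketch (Python 3.10+, standard library only) of how the median, interquartile range, percentage of outliers and the scalability slope could be computed from one benchmark's measurements. The 1.5 x IQR outlier criterion is an assumption, chosen as the usual boxplot rule.

```python
import statistics

def summarise(times_ms, whisker=1.5):
    """Median, interquartile range and percentage of outliers of a benchmark's
    execution times (outliers: outside [LQ - whisker*IQR, UQ + whisker*IQR])."""
    lq, _, uq = statistics.quantiles(times_ms, n=4)
    iqr = uq - lq
    outliers = [t for t in times_ms if t < lq - whisker * iqr or t > uq + whisker * iqr]
    return statistics.median(times_ms), iqr, 100.0 * len(outliers) / len(times_ms)

def scalability_slope(medians_by_load):
    """Slope of a simple linear regression of the median execution time against
    the load factor (e.g., X = 500 .. 5000), as used in the scalability analysis."""
    loads, medians = zip(*sorted(medians_by_load.items()))
    return statistics.linear_regression(loads, medians).slope
```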
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro 12
Result analysis - Latency

• Metric for the execution time: the median of the execution times of a method.
  [Chart omitted; N=400, X=5000] 8 methods with execution times > 800 ms
• Metric for the variability of the execution time: the interquartile range of the execution times of a method.
  [Chart omitted; N=400, X=5000] 3 methods with IQR > 11 ms
• Metric for anomalies in the execution times: percentage of outliers in the execution times of a method.
  [Chart omitted; N=400, X=5000] 2 methods with % outliers > 5%
• Effect of changes in method parameters: comparison of the medians of the execution times of the benchmarks that use the same method.
  [Chart omitted; N=400, X=5000] 5 methods with differences in execution times > 60 ms
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro 13
Result analysis - Scalability
Effect of changes in WebODE’s load:
Slope of the function estimated by simple linear regression of the medians of the execution times from a minimum load (X=500) to a maximum one (X=5000).
8 methods with slope>0.1 ms.
N=400, X=500..5000
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro 14
Limitations
• Evaluating other tools is expensive
• Analysis of results was difficult
  - The evaluation was executed 10 times with different load factors: 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, and 5000
  - 128 benchmarks x 10 executions = 1280 files with results!
[Diagram: the evaluation infrastructure (Benchmark Suite Executor, Workload Generator, Performance Benchmark Suite, Measurement Data Library, Statistical Analyser) has to be instantiated again for each additional Ontology Development Tool to be evaluated.]
García-Castro R., Gómez-Pérez A "Guidelines for Benchmarking the Performance of Ontology Management APIs" 4th International Semantic Web Conference (ISWC2005), LNCS 3729. November 2005. Galway, Ireland.
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
The 15 tips for technology evaluation
• Know the technology
• Support different types of technology
• Automate the evaluation framework
• Expect reproducibility
• Beware of result analysis
• Learn statistics
• Plan for evaluation requirements
15
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Index
• Self-awareness • Crawling (Graduation Project) • Walking (Ph.D. Thesis) • Cruising (Postdoctoral Research) • Insight
16
KHAAAAN!
http://www.phdcomics.com/comics/archive.php?comicid=500
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Interoperability in the Semantic Web
• Interoperability is the ability of Semantic Web technologies to interchange ontologies and use them
  - At the information level, not at the system level
  - In terms of knowledge reuse, not information integration
• In the real world it is not feasible to use a single system or a single formalism
• Different behaviours in interchanges between different formalisms:
17
[Figure: interchange examples with two disjoint classes A and B and a subclass C. When source and target use the same formalism the ontology is interchanged without loss; when the formalisms differ, the disjointness may be lost (LOSS) or approximated, e.g. through an ad-hoc myDisjoint property (LESS loss).]
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Evaluation goal
To evaluate and improve the interoperability of Semantic Web technologies using RDF(S) and OWL as interchange languages
18
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Evaluation workflow - Manual
• Import (Tool X): an RDF(S)/OWL ontology Oi is imported into Tool X, yielding Oi'; Oi = Oi' + α - α'
• Export (Tool X): the ontology Oi modelled in Tool X is exported to RDF(S)/OWL, yielding Oi'; Oi = Oi' + β - β'
• Interoperability (Tool X → Tool Y): Oi passes through Tool X (Oi') and then through Tool Y, yielding Oi''; Oi = Oi'' + α - α' + β - β'
(The α and β terms stand for the information added or lost in the import and export steps.)
19
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Evaluation workflow - Automatic
20
• Existing ontologies O1..On are interchanged between Tool X and Tool Y through the interchange language (RDF(S)/OWL), producing the intermediate ontologies O1', O1'', O1''', O1''''
• Step 1 (Tool X): Import + Export; O1 = O1'' + α - α'
• Step 2 (Tool Y): Import + Export; O1'' = O1'''' + β - β'
• Interchange: O1 = O1'''' + α - α' + β - β'
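A minimal sketch of the ontology comparison at the end of such an interchange, using rdflib; the import/export steps performed by the tools are outside the sketch, and the file names are placeholders.

```python
from rdflib import Graph
from rdflib.compare import to_isomorphic, graph_diff

def interchange_diff(original_path, final_path):
    """Compare the original ontology O1 with the ontology obtained after the two
    import/export steps; returns the information lost and added, in triples."""
    original = to_isomorphic(Graph().parse(original_path))   # format guessed from extension
    final = to_isomorphic(Graph().parse(final_path))
    _, only_in_original, only_in_final = graph_diff(original, final)
    return only_in_original, only_in_final                   # (lost, added)
```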
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Evaluation data - OWL Lite Import Test Suite
21
RDF/XML syntax variants, e.g. the following two descriptions are equivalent:
  <rdf:Description rdf:about="#class1"> <rdf:type rdf:resource="&rdfs;Class"/> </rdf:Description>
  =
  <rdfs:Class rdf:about="#class1"> </rdfs:Class>

Component combinations, e.g.: subclass of class, subclass of restriction, value constraints, cardinality + object property, cardinality + datatype property, set operators.

Group                                                                | No.
Class hierarchies                                                    | 17
Class equivalences                                                   | 12
Classes defined with set operators                                   | 2
Property hierarchies                                                 | 4
Properties with domain and range                                     | 10
Relations between properties                                         | 3
Global cardinality constraints and logical property characteristics  | 5
Single individuals                                                   | 3
Named individuals and properties                                     | 5
Anonymous individuals and properties                                 | 3
Individual identity                                                  | 3
Syntax and abbreviation                                              | 15
TOTAL                                                                | 82
David S., García-Castro, R.; Gómez-Pérez, A. "Defining a Benchmark Suite for Evaluating the Import of OWL Lite Ontologies". Second International Workshop OWL: Experiences and Directions 2006 (OWL2006). November, 2006. Athens, Georgia, USA.
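The syntax-variant benchmarks rely on the fact that different RDF/XML spellings denote the same graph. A small sketch with rdflib, lightly adapted from the example above (full namespace URI instead of the &rdfs; entity, plus an explicit xml:base), checks that the two variants are indeed isomorphic.

```python
from rdflib import Graph
from rdflib.compare import isomorphic

variant_a = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
        xml:base="http://example.org/ontology">
  <rdf:Description rdf:about="#class1">
    <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
  </rdf:Description>
</rdf:RDF>"""

variant_b = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
        xml:base="http://example.org/ontology">
  <rdfs:Class rdf:about="#class1"/>
</rdf:RDF>"""

print(isomorphic(Graph().parse(data=variant_a, format="xml"),
                 Graph().parse(data=variant_b, format="xml")))   # True
```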
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Evaluation criteria
22
• Execution informs about the correct execution:
  - OK: no execution problem
  - FAIL: some execution problem
  - Comparer Error (C.E.): comparer exception
  - Not Executed (N.E.): second step not executed
• Information added or lost, in terms of triples: Oi = Oi' + α - α'
• Interchange informs whether the ontology has been interchanged correctly, with no addition or loss of information (Oi = Oi' ?):
  - SAME if Execution is OK and Information added and Information lost are void
  - DIFFERENT if Execution is OK but Information added or Information lost are not void
  - NO if Execution is FAIL, N.E. or C.E.
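These criteria reduce to a small decision rule; the sketch below restates it directly, taking the execution outcome and the added/lost triple sets as inputs.

```python
def interchange_verdict(execution, lost_triples, added_triples):
    """SAME, DIFFERENT or NO, following the criteria on this slide."""
    if execution != "OK":          # FAIL, C.E. or N.E.
        return "NO"
    if not lost_triples and not added_triples:
        return "SAME"
    return "DIFFERENT"
```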
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Evaluation campaigns
23
[Figure: tools participating in the campaigns. RDF(S) Interoperability Benchmarking: 6 tools (3 ontology development tools and 3 ontology repositories, frame-based knowledge models). OWL Interoperability Benchmarking: 9 tools (5 ontology development tools, 3 ontology repositories and 1 ontology-based annotation tool, with frame-based and OWL knowledge models, including SemTalk).]
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Evaluation infrastructure - IRIBA
24
[Screenshots of the IRIBA web application]
http://knowledgeweb.semanticweb.org/iriba/
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Evaluation infrastructure - IBSE
• Automatically executes experiments between all the tools
• Allows configuring different execution parameters
• Uses ontologies to represent benchmarks and results
• Depends on external ontology comparers (KAON2 OWL Tools and RDF-utils)
25
[Diagram: IBSE workflow over the OWL Lite Import Benchmark Suite. (1) Describe benchmarks: benchmark descriptions are RDF/OWL documents (instances of a benchmark ontology). (2) Execute benchmarks: the benchmarks are run over the tools, producing execution results (instances of a result ontology). (3) Generate reports: HTML and SVG reports are generated from the results.]
http://knowledgeweb.semanticweb.org/benchmarking_interoperability/ibse/
García-Castro, R.; Gómez-Pérez, A., Prieto-González J. "IBSE: An OWL Interoperability Evaluation Infrastructure". Third International Workshop OWL: Experiences and Directions 2007 (OWL2007). June, 2007. Innsbruck, Austria.
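To illustrate what "using ontologies to represent benchmarks and results" can look like in practice, here is a small rdflib sketch; the namespace, class and property names are hypothetical, only the idea of machine-processable descriptions is taken from IBSE (the actual ontologies are available at the URL above).

```python
from rdflib import Graph, Namespace, Literal, RDF

BMK = Namespace("http://example.org/benchmarkOntology#")   # hypothetical namespace

g = Graph()
g.add((BMK.benchmark01, RDF.type, BMK.Benchmark))
g.add((BMK.result01, RDF.type, BMK.Result))
g.add((BMK.result01, BMK.ofBenchmark, BMK.benchmark01))
g.add((BMK.result01, BMK.verdict, Literal("SAME")))

print(g.serialize(format="turtle"))
```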
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Evaluation results - Variability
• High variability in evaluation results
• Different perspectives for analysis - Results per tool / pair of tools - Results per component - Result evolution over time - …
26
[Charts: categories used to analyse the results. Tool import/export: models and executes; does not model and executes; models and fails; does not model and fails; not executed. Ontology comparison: same information; more information; less information; tool fails; comparer fails; not valid ontology. One chart shows the evolution of these categories over four evaluation rounds (04-2005, 05-2005, 10-2005, 01-2006).]
Combination (no. of benchmarks)                                      | K-K | P-P | W-W | K-P | K-W | P-W | K-P-W
Classes (2)                                                          |  Y  |  Y  |  Y  |  Y  |  Y  |  Y  |  Y
Classes instance of a single metaclass (4)                           |  Y  |  Y  |  -  |  N  |  -  |  -  |  -
Classes instance of multiple metaclasses (1)                         |  Y  |  N  |  -  |  N  |  -  |  -  |  -
Class hierarchies without cycles (3)                                 |  Y  |  Y  |  Y  |  Y  |  Y  |  Y  |  Y
Class hierarchies with cycles (2)                                    |  -  |  -  |  -  |  -  |  -  |  -  |  -
Classes related through object or datatype properties (6)            |  -  |  -  |  -  |  -  |  -  |  -  |  -
Datatype properties without domain or range (7)                      |  Y  |  Y  |  -  |  N  |  -  |  -  |  -
Datatype properties with multiple domains (3)                        |  Y  |  -  |  -  |  -  |  -  |  -  |  -
Datatype properties whose range is String (5)                        |  Y  |  Y  |  Y  |  N  |  N  |  Y  |  N
Datatype properties whose range is a XML Schema datatype (2)         |  Y  |  -  |  Y  |  -  |  Y  |  -  |  -
Object properties without domain or range (8)                        |  Y  |  Y  |  -  |  Y  |  -  |  -  |  -
Object properties with a domain and a range (2)                      |  Y  |  Y  |  Y  |  Y  |  Y  |  Y  |  Y
Object properties with multiple domains or ranges (5)                |  Y  |  -  |  -  |  -  |  -  |  -  |  -
Instances of undefined resources (1)                                 |  -  |  -  |  -  |  -  |  -  |  -  |  -
Instances of a single class (2)                                      |  Y  |  Y  |  Y  |  Y  |  Y  |  Y  |  Y
Instances of multiple classes (1)                                    |  Y  |  N  |  -  |  N  |  -  |  -  |  -
Instances related via object properties (7)                          |  Y  |  Y  |  Y  |  Y  |  Y  |  Y  |  Y
Instances related via datatype properties (2)                        |  Y  |  Y  |  Y  |  N  |  Y  |  Y  |  N
Instances related via datatype properties with range a XML Schema datatype (2) | - | - | Y | - | - | - | -
Instances related via undefined object or datatype properties (3)    |  -  |  -  |  -  |  -  |  -  |  -  |  -
Results per pair of tools (ORIGIN rows, DESTINATION columns; JE=Jena, PO=Protégé-OWL, SW=SWIProlog, K2=KAON2, GA=GATE, ST=SemTalk, WE=WebODE, PF=Protégé-Frames):

ORIGIN \ DESTINATION |  JE |  PO |  SW |  K2 |  GA |  ST |  WE |  PF
Jena                 | 100 | 100 | 100 |  78 |  85 |  16 |  17 |   5
Protégé-OWL          | 100 | 100 |  95 |  78 |  89 |  16 |  17 |   5
SWIProlog            | 100 | 100 | 100 |  78 |  55 |  45 |  17 |   5
KAON2                |  78 |  78 |  78 |  78 |  40 |  39 |   6 |   0
GATE                 |  96 |  52 |  79 |  74 |  46 |  13 |  15 |  13
SemTalk              |  45 |  46 |  46 |  27 |  24 |  46 |  17 |   0
WebODE               |  17 |  18 |   0 |   6 |  16 |  17 |  17 |  12
Protégé-Frames       |   5 |   5 |   0 |   0 |   4 |   5 |   0 |  13
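Each cell of such a matrix is an aggregation of the per-benchmark verdicts; below is a minimal sketch, assuming the verdicts are available as (origin, destination, verdict) tuples and that the percentages count SAME interchanges.

```python
from collections import defaultdict

def interoperability_matrix(verdicts):
    """Percentage of 'SAME' interchanges for every ordered (origin, destination) pair."""
    totals, same = defaultdict(int), defaultdict(int)
    for origin, destination, verdict in verdicts:
        totals[(origin, destination)] += 1
        same[(origin, destination)] += (verdict == "SAME")
    return {pair: 100.0 * same[pair] / totals[pair] for pair in totals}
```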
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Evaluation results - Interoperability
27
Clear picture of the interoperability between different tools
• Low interoperability and few clusters of interoperable tools
• Interoperability depends on:
  - Ontology translation (tool knowledge model)
  - Specification (development decisions)
  - Robustness (tool defects)
  - Tools participating in the interchange (each behaves differently)
• Tools have improved
• Involvement of tool developers is needed
  - Tool developers have been informed
  - Tool improvement is out of our scope
• Results are expected to change
  - Continuous evaluation is needed
García-Castro, R.; Gómez-Pérez, A. "Interoperability results for Semantic Web technologies using OWL as the interchange language". Web Semantics: Science, Services and Agents in the World Wide Web. ISSN: 1570-8268. Elsevier. Volume 8, number 4. pp. 278-291. November 2010.
García-Castro, R.; Gómez-Pérez, A. "RDF(S) Interoperability Results for Semantic Web Technologies". International Journal of Software Engineering and Knowledge Engineering. ISSN: 0218-1940. Editor: Shi-Kuo Chang. Volume 19, number 8. pp. 1083-1108. December 2009.
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Benchmarking interoperability
28
Method for benchmarking interoperability
• Common for different Semantic Web technologies
• Problem-focused instead of tool-focused
• Manual vs automatic experiments:
  - It depends on the specific needs of the benchmarking
  - Automatic: cheaper, more flexible and extensible
  - Manual: higher quality of results

Resources for benchmarking interoperability
• All the benchmark suites, software and results are publicly available
• Independent of:
  - The interchange language
  - The input ontologies

[Diagram: the evaluation resources (IBSE, rdfsbs and IRIBA, used for automatic and manual experiments) and the benchmark suites (OWL Lite Import, RDF(S) Import, RDF(S) Export and RDF(S) Interoperability), as used in the RDF(S) and OWL Interoperability Benchmarking activities.]
García-Castro, R. "Benchmarking Semantic Web technology". Studies on the Semantic Web vol. 3. AKA Verlag – IOS Press. ISBN: 978-3-89838-622-7. January 2010.
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Limitations
• The number of results to analyse grew considerably: 2168 executions in the RDF(S) benchmarking activity and 6642 in the OWL one
• Hard to support and maintain different test data and tools
• Every tool to be evaluated had to be deployed on the same computer
29
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
The 15 tips for technology evaluation
• Know the technology
• Support different test data
• Support different types of technology
• Use machine-processable descriptions of evaluation resources
• Automate the evaluation framework
• Expect reproducibility
• Beware of result analysis
• Learn statistics
• Plan for evaluation requirements
• Organize (or join) evaluation campaigns
30
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Index
• Self-awareness • Crawling (Graduation Project) • Walking (Ph.D. Thesis) • Cruising (Postdoctoral Research) • Insight
31
http://www.phdcomics.com/comics/archive.php?comicid=570
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
The SEALS Project (RI-238975)
32
Universidad Politécnica de Madrid, Spain (Coordinator); University of Sheffield, UK; Forschungszentrum Informatik, Germany; University of Innsbruck, Austria; Institut National de Recherche en Informatique et en Automatique, France; University of Mannheim, Germany; University of Zurich, Switzerland; STI International, Austria; Open University, UK; Oxford University, UK

Project Coordinator: Asunción Gómez Pérez <[email protected]>
http://www.seals-project.eu/
EC contribution: 3,500,000 €
Duration: June 2009 - June 2012
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Semantic technology evaluation @ SEALS
33
SEALS Platform
SEALS Evaluation Campaigns, SEALS Community
SEALS Evaluation Services
Wrigley S.; García-Castro R.; Nixon L. "Semantic Evaluation At Large Scale (SEALS)". 21st International World Wide Web Conference (WWW 2012). European projects track. pp. 299-302. Lyon, France. 16-20 April 2012.
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
The SEALS entities
34
[Diagram: the SEALS entities are Tools, Test Data, Evaluations and Results (Raw Results and Interpretations), covering five technology areas: ontology engineering, storage and reasoning, ontology matching, semantic search, and semantic web services.]
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Structure of the SEALS entities
35
[Diagram: each entity comprises Data and Metadata; the metadata supports discovery, validation and exploitation. Entity data can take several forms, such as Java binaries, shell scripts, bundles, BPEL workflows and ontologies.]
SEALS Ontologies: http://www.seals-project.eu/ontologies/
García-Castro R.; Esteban-Gutiérrez M.; Kerrigan M.; Grimm S. "An Ontology Model to Support the Automatic Evaluation of Software". 22nd International Conference on Software Engineering and Knowledge Engineering (SEKE 2010). pp. 129-134. Redwood City, USA. 1-3 July 2010.
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
SEALS logical architecture
36
[Diagram: SEALS logical architecture. The SEALS Portal and software agents access the SEALS Service Manager, which coordinates the Runtime Evaluation Service and the SEALS Repositories (Test Data Repository Service, Tools Repository Service, Results Repository Service, Evaluation Descriptions Repository Service). Its users are technology providers, evaluation organisers and technology adopters.]
García-Castro R.; Esteban-Gutiérrez M.; Gómez-Pérez A. "Towards an Infrastructure for the Evaluation of Semantic Technologies". eChallenges e-2010 Conference (e-2010). pp. 1-8. Warsaw, Poland. 27-29 October 2010.
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Challenges
• Tool heterogeneity
  - Hardware requirements
  - Software requirements
• Reproducibility
  - Ensure that the execution environment offers the same initial status

37

Virtualization as a technology enabler. Virtualization solution:
• VMWare Server 2.0.2
• VMWare vSphere 4
• Amazon EC2 (in progress)

[Diagram: each tool runs inside a virtual machine on an execution node, managed from a processing node.]
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Evaluation campaign methodology
38
SEALS Methodology for Evaluation Campaigns
Raúl García-Castro and Stuart N. Wrigley
September 2011
Phases: Initiation, Involvement, Preparation & Execution, Dissemination, Finalization
• SEALS-independent
• Includes: actors, process, recommendations, alternatives, terms of participation, use rights
García Castro R.; Martin-Recuerda F.; Wrigley S. "SEALS. Deliverable 3.8 SEALS Methodology for Evaluation Campaigns v2". Technical Report. SEALS project. July 2011.
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Current SEALS evaluation services
39

Ontology engineering
• Evaluations: conformance, interoperability, scalability
• Test data: conformance & interoperability: RDF(S); OWL Lite, DL and Full; OWL 2 expressive (x3); OWL 2 Full. Scalability: real-world, LUBM, real-world+, LUBM+

Ontology reasoning
• Evaluations: DL reasoning (classification, class satisfiability, ontology satisfiability, entailment, non-entailment, instance retrieval); RDF reasoning (conformance)
• Test data: DL reasoning (Gardiner test suite, Wang et al. repository, versions of GALEN, ontologies from EU projects, instance retrieval test data); RDF reasoning (OWL 2 Full)

Ontology matching
• Evaluations: matching accuracy; multilingual matching accuracy; scalability (ontology size, # CPU)
• Test data: Benchmark, Anatomy, Conference, MultiFarm, Large Biomed (supported by SEALS)

Semantic search
• Evaluations: search accuracy and efficiency (automated); usability and satisfaction (user-in-the-loop)
• Test data: automated (EvoOnt, MusicBrainz from QALD-1); user-in-the-loop (Mooney, Mooney+)

Semantic web services
• Evaluations: SWS discovery
• Test data: OWLS-TC 4.0, SAWSDL-TC 3.0, WSMO-LITE-TC
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
New evaluation data – Conformance and interoperability
40

• OWL DL test suite → keyword-driven approach
  - Manual definition of tests in a CSV/spreadsheet using a keyword library
  - [Diagram: a preprocessor expands the test suite definition script; an interpreter and a test suite generator use the keyword library to produce the test suite (ontology01.owl, ontology02.owl, ontology03.owl, plus metadata).]
• OWL 2 test suite → automatically generated ontologies of increasing expressiveness
  - Using ontologies in the Web
  - Maximizing expressiveness
  - [Diagram: ontology search over online ontologies and ontology module extraction (via the OWL API) produce the original test suite; increasing and maximizing expressivity then yield an expressive and a full-expressive test suite, each with metadata.]
OWLDLGenerator (http://knowledgeweb.semanticweb.org/benchmarking_interoperability/OWLDLGenerator/)
García-Castro R.; Gómez-Pérez A. "A Keyword-driven Approach for Generating Ontology Language Conformance Test Data". Engineering Applications of Artificial Intelligence. ISSN: 0952-1976. Elsevier. Editor: B. Grabot.
Grangel-González I.; García-Castro R. "Automatic Conformance Test Data Generation Using Existing Ontologies in the Web". Second International Workshop on Evaluation of Semantic Technologies (IWEST 2012). 28 May 2012. Heraklion, Greece.
OWL2EG (http://knowledgeweb.semanticweb.org/benchmarking_interoperability/OWL2EG/)
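To give a flavour of the keyword-driven approach, here is a purely illustrative sketch: a toy keyword library mapping keywords to axiom templates and the expansion of one test definition row. The real keyword library and test definitions are those described in the cited paper, not these.

```python
# Toy keyword library: keyword -> RDF/XML axiom template (illustrative only).
KEYWORD_LIBRARY = {
    "Class":      '<owl:Class rdf:about="#{0}"/>',
    "SubClassOf": '<owl:Class rdf:about="#{0}"><rdfs:subClassOf rdf:resource="#{1}"/></owl:Class>',
}

def expand(test_row):
    """Expand one CSV-like test definition, e.g. ["Class C1", "SubClassOf C2 C1"],
    into the RDF/XML fragments of the corresponding test ontology."""
    fragments = []
    for cell in test_row:
        keyword, *args = cell.split()
        fragments.append(KEYWORD_LIBRARY[keyword].format(*args))
    return fragments

print(expand(["Class C1", "SubClassOf C2 C1"]))
```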
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
1st Evaluation Campaign
41
Tools per campaign area (tool, provider, country):
Ontology engineering: Jena (HP Labs, UK); Sesame (Aduna, Netherlands); Protégé 4 (University of Stanford, USA); Protégé OWL (University of Stanford, USA); NEON toolkit (NEON Foundation, Europe); OWL API (University of Manchester, UK)
Reasoning: HermiT (University of Oxford, UK); jcel (Tec. Universitat Dresden, Germany); FaCT++ (University of Manchester, UK)
Matching: AROMA (INRIA, France); ASMOV (INFOTECH Soft, USA); Aroma (Nantes University, France); Falcon-AO (Southeast University, China); Lily (Southeast University, China); RiMOM (Tsinghua University, China); Mapso (FZI, Germany); CODI (University of Mannheim, Germany); AgreeMaker (Advances in Computing Lab, USA); Gerome* (RWTH Aachen, Germany); Ef2Match (Nanyang Tec. University, China)
Semantic search: K-Search (K-Now Ltd, UK); Ginseng (University of Zurich, Switzerland); NLP-Reduce (University of Zurich, Switzerland); PowerAqua (KMi, Open University, UK); Jena Arq (HP Labs, Talis, UK)
Semantic web service: 4 OWLS-MX variants (DFKI, Germany)

29 tools from 8 countries
Nixon L.; García-Castro R.; Wrigley S.; Yatskevich M.; Trojahn-dos-Santos C.; Cabral L. "The state of semantic technology today – overview of the first SEALS evaluation campaigns". 7th International Conference on Semantic Systems (I-SEMANTICS2011). Graz, Austria. 7-9 September 2011.
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
2nd Evaluation Campaign
42

Tools per work package (tool, provider, country):
WP10 - Ontology engineering: Jena (HP Labs, UK); Sesame (Aduna, Netherlands); Protégé 4 (University of Stanford, USA); Protégé OWL (University of Stanford, USA); NeOn toolkit (NeOn Foundation, Europe); OWL API (University of Manchester, UK)
WP11 - Reasoning: HermiT (University of Oxford, UK); jcel (Technischen Universitat Dresden, Germany); FaCT++ (University of Manchester, UK); WSReasoner (University of New Brunswick, Canada)
WP12 - Ontology matching: AgrMaker (University of Illinois at Chicago, USA); Aroma (INRIA Grenoble Rhone-Alpes, France); AUTOMSv2 (VTT Technical Research Centre, Finland); CIDER (Universidad Politecnica de Madrid, Spain); CODI (Universitat Mannheim, Germany); CSA (University of Ho Chi Minh City, Vietnam); GOMMA (Universitat Leipzig, Germany); Hertuda (Technische Universitat Darmstadt, Germany); LDOA (Tunis-El Manar University, Tunisia); Lily (Southeast University, China); LogMap (University of Oxford, UK); LogMapLt (University of Oxford, UK); MaasMtch (Maastricht University, Netherlands); MapEVO (FZI Forschungszentrum Informatik, Germany); MapPSO (FZI Forschungszentrum Informatik, Germany); MapSSS (Wright State University, USA); Optima (University of Georgia, USA); WeSeEMtch (Technische Universitat Darmstadt, Germany); YAM++ (LIRMM, France)
WP13 - Semantic search: K-Search (K-Now Ltd, UK); Ginseng (University of Zurich, Switzerland); NLP-Reduce (University of Zurich, Switzerland); PowerAqua (KMi, Open University, UK); Jena Arq v2.8.2 (HP Labs, Talis, UK); Jena Arq v2.9.0 (HP Labs, Talis, UK); rdfQuery v0.5.1-beta (University of Southampton, UK); Semantic Crystal (University of Zurich, Switzerland); Affective Graphs (University of Sheffield, UK)
WP14 - Semantic web services: WSMO-LITE-OU (KMi, Open University, UK); SAWSDL-OU (KMi, Open University, UK); OWLS-URJC (University of Rey Juan Carlos, Spain); OWLS-M0 (DFKI, Germany)

41 tools from 13 countries
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Evaluation services
43
[Diagram: with the SEALS evaluation services you can exploit the stored results, execute the stored evaluations over the stored tools and test data (or over your own tool and test data) to obtain your own results, update existing evaluation resources, or define your own evaluation.]
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Quality model for semantic technologies
44
Tool / Measures             | Raw Results | Interpretations | Quality Measures | Quality sub-characteristics
Ontology engineering tools  | 7           | 20              | 8                | 6
Ontology matching tools     | 1           | 4               | 4                | 2
Reasoning systems           | 11          | 0               | 16               | 5
Semantic search tools       | 12          | 8               | 18               | 7
Semantic web service tools  | 5           | 9               | 10               | 2
Total                       | 34          | 41              | 55               | 17
Radulovic, F., Garcia-Castro, R., Extending Software Quality Models - A Sample In The Domain of Semantic Technologies. 23rd International Conference on Software Engineering and Knowledge Engineering (SEKE2011). Miami, USA. July, 2011
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Semantic technology recommendation
45
"I need a robust ontology engineering tool and a semantic search tool with the highest precision."
[Diagram: the user's quality requirements are matched, through the quality model, against the SEALS Platform's Tools Repository Service and Results Repository Service to produce a semantic technology recommendation, e.g.: "You should use Sesame v2.6.5 and Arq v2.9.0. The reason for this is... Alternatively, you can use ..."]
Radulovic F.; García-Castro R. "Semantic Technology Recommendation Based on the Analytic Network Process". 24th Int. Conference on Software Engineering and Knowledge Engineering (SEKE 2012). Redwood City, CA, USA. 1-3 July 2012. 3rd Best Paper Award!
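As a rough intuition for the recommendation step (not the Analytic Network Process used in the paper, which also models dependencies between criteria), a simple weighted-sum ranking over normalised quality measures already captures the idea of matching user requirements against stored results; all data below is made up.

```python
def recommend(tools, requirement_weights):
    """Rank tools by a weighted sum of normalised quality measures
    (a simplified stand-in for the ANP-based recommendation)."""
    def score(measures):
        return sum(weight * measures.get(criterion, 0.0)
                   for criterion, weight in requirement_weights.items())
    return sorted(tools, key=lambda name: score(tools[name]), reverse=True)

# Hypothetical quality measures in [0, 1] and user weights.
tools = {"Sesame v2.6.5": {"robustness": 0.9}, "OtherStore": {"robustness": 0.7}}
print(recommend(tools, {"robustness": 1.0}))   # ['Sesame v2.6.5', 'OtherStore']
```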
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
You can use the SEALS Platform
• The SEALS Platform facilitates: - Comparing tools under common settings - Reproducibility of evaluations - Reusing evaluation resources, completely or partially - Or defining new ones - Managing evaluation resources using platform services - Computational resources for demanding evaluations
• Don’t start your evaluation from scratch!
46
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
The 15 tips for technology evaluation
• Know the technology
• Support different test data
• Facilitate test data definition
• Support different types of technology
• Define declarative evaluation workflows
• Use machine-processable descriptions of evaluation resources
• Automate the evaluation framework
• Expect reproducibility
• Beware of result analysis
• Learn statistics
• Plan for evaluation requirements
• Use a quality model
• Organize (or join) evaluation campaigns
• Share evaluation resources
• Exploit evaluation results
47
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Index
• Self-awareness • Crawling (Graduation Project) • Walking (Ph.D. Thesis) • Cruising (Postdoctoral Research) • Insight
48
insight noun […] [mass noun] Psychiatry awareness by a mentally ill person that their mental experiences are not based in external reality.
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
Evolution towards maturity
49
TABLE I. LEVELS AND THEMES OF SOFTWARE EVALUATION TECHNOLOGY MATURITY
(Themes: formalization of the evaluation workflow; software support to the evaluation; applicability to multiple software types; usability of test data; exploitability of results; representativeness of participants.)

Initial
• Workflow: ad-hoc workflow informally defined
• Software support: manual evaluation, no software support
• Applicability: small number of software products of the same type
• Test data: informally defined
• Results: informally defined, not verifiable
• Participants: one team

Repeatable
• Workflow: ad-hoc workflow defined
• Software support: ad-hoc evaluation software
• Applicability: small number of software products of the same type; ad-hoc access to software products
• Test data: defined
• Results: machine-processable; combined for some software products of the same type
• Participants: one or few teams

Reusable
• Workflow: technology-specific workflow defined
• Software support: reusable evaluation software (multiple software products, multiple test data)
• Applicability: multiple software products of the same type; generic access to software products
• Test data: machine-processable
• Results: machine-processable; combined for many software products of the same type
• Participants: several teams

Integrated
• Workflow: generic workflow defined; machine-processable and built reusing common parts; evaluation resources built upon shared principles
• Software support: evaluation infrastructure (multiple types of software products, multiple test data)
• Applicability: multiple software products of different types; generic access to software products
• Test data: machine-processable; reused across evaluations
• Results: machine-processable; combined for many software products of different types
• Participants: several teams, stakeholders

Optimized
• Workflow: generic workflow defined; machine-processable and built reusing common parts; evaluation resources built upon shared principles; measured and optimized
• Software support: federation of evaluation infrastructures (autonomous infrastructures, interchange of evaluation resources, data access and use policies)
• Applicability: multiple software products of different types; generic access to software products; support for any software product requirement
• Test data: machine-processable; reused across evaluations; customizable, optimized and curated
• Results: machine-processable; combined for many software products of different types; high availability and quality
• Participants: community
characteristics of such software products. This workflow is supported by evaluation software that can be used to assess any software product of the type covered by the evaluation; the software product must have previously implemented the required mechanisms to be integrated with the evaluation software. Test data and evaluation results are machine-processable; therefore, they can be reused. Furthermore, the results can be combined for all the software products of the same type.

D. Level 4. Integrated
At this level, several teams in collaboration with relevant stakeholders (e.g., users or providers) define a generic evaluation framework that can be used with any type of software product. This generic framework for software evaluation allows building evaluation resources (i.e., evaluation workflow, tools, test data, and results) upon shared principles and reusing common parts. Here, evaluation workflows are defined in a machine-interpretable format so they can be automated. An evaluation infrastructure gives support both to the evaluation of multiple types of software products, taking into account their different characteristics, and to the management of the different evaluation resources. Test data can be reused across different evaluations, and the evaluation results can be combined for software products of different types.

E. Level 5. Optimized
At this level the whole community has adopted a generic framework for software evaluation in which evaluation workflows are measured and optimized. The centralized scenario of the previous levels has now evolved into a federation of autonomous evaluation infrastructures. These evaluation infrastructures must support not only the evaluation workflow but also new requirements, such as the interchange of evaluation resources or the implementation of policies for data access, interchange, and use. This federation of infrastructures permits satisfying any software or hardware requirements of the different software products; customizing, optimizing, and curating test data; and improving the availability and quality of the evaluation results.

One of the notions behind the maturity model, as Figure 2 shows, is that a higher maturity level implies higher integration of evaluation efforts in one field, ranging from isolated evaluations in the lower maturity level to fully-integrated evaluations in the higher level. In this scenario, maturity evolves from a starting point of decentralized efforts into centralized infrastructures and ends with networks of federated infrastructures.

Another notion to consider in this model is that of cost. While the cost of defining new evaluations decreases when the maturity level increases, mainly due to the reuse of existing resources, the cost associated to the evaluation infrastructure (hardware and infrastructure development and maintenance) significantly increases.

VI. ASSESSMENTS IN THE SEMANTIC RESEARCH FIELD

This section presents how we have used SET-MM to assess the maturity of software evaluation technologies in a specific research field.

Other maturity models provide appraisal methods for comparing with the maturity model. However, we do not propose any appraisal method because our scope is a whole research field and, therefore, it would be difficult to obtain objective metrics since any judgment would be subjective.

Therefore, our approach has been, first, to identify some evaluation efforts that stand out because of their impact in the field and, second, to try to assess the maturity of the software evaluation technologies used in them.
Software Evaluation Technology Maturity Model (SET-MM)
[Diagram: the interoperability benchmarking resources shown earlier (IBSE, rdfsbs and IRIBA with the OWL Lite Import, RDF(S) Import, RDF(S) Export and RDF(S) Interoperability benchmark suites, labelled UPM-FBI) positioned with respect to the maturity model.]
García-Castro R. "SET-MM – A Software Evaluation Technology Maturity Model". 23rd International Conference on Software Engineering and Knowledge Engineering (SEKE2011). pp. 660-665. Miami Beach, USA. 7-9 July 2011.
Talk at IMATI-CNR. 15th October 2013 © Raúl García Castro
The 15 tips for technology evaluation
• Know the technology
• Support different test data
• Facilitate test data definition
• Support different types of technology
• Define declarative evaluation workflows
• Use machine-processable descriptions of evaluation resources
• Automate the evaluation framework
• Expect reproducibility
• Beware of result analysis
• Learn statistics
• Plan for evaluation requirements
• Use a quality model
• Organize (or join) evaluation campaigns
• Share evaluation resources
• Exploit evaluation results
50
Speaker: Raúl García-Castro Talk at IMATI-CNR, October 15th, Genova, Italy
Thank you for your attention!