
University of Rome Tor Vergata

VocBench

A Web Application for Collaborative Development of Multilingual Thesauri

Armando Stellato+, Sachit Rajbhandari*, Andrea Turbati+, Manuel Fiorelli+

Caterina Caracciolo*, Tiziano Lorenzetti+, Johannes Keizer*, Maria Teresa Pazienza+

+ART Group, Dept of Enterprise Engineering, University of Rome Tor Vergata, Via del Politecnico 1, 00133 Rome, Italy

*Food and Agricultural Organization of the United Nations (FAO), Viale delle Terme di Caracalla, 00153 Rome, Italy

Contacts: {surname}@info.uniroma2.it {name.surname}@fao.org

Portoroz, Slovenia, 31st May - 4th June 2015

Why was it built?

AGROVOC (a large agricultural vocabulary developed by FAO)

– More than 32 000 concepts in up to 22 languages

– Maintained by a global group of terminologists

– No tool to support their work

– No existing tool met all of FAO’s needs

03/06/2015 12th ESWC, Portoroz, Slovenia 2

V1.0 – 2010

• Google Web Toolkit (for the Web Application)

• Lucene (for label indexing & free-text search)

• Protégé API

– DB backend

• (later) OWLART API

• MySQL

• Custom model for thesaurus representation

[Architecture diagram. Layers: GWT / Presentation; Business logic; Protégé 3.4 / OWLART API; MySQL]


V1.x Problems

• Could not support other triple stores (tightly coupled to the Protégé API)

• Custom representation model

• No support for emerging standards, e.g. SKOS

• I/O

– No import

– Complicated export

• No support for alignments

– AGROVOC aligned to a dozen other vocabularies

• No SPARQL interface


Towards VB2.0…

Many of VB1.x’s limitations derived from the absence of a true RDF backend

• not just a connection to an RDF triple store

• but a proper abstraction layer providing high-level functionalities for ontology/thesaurus management

Guiding principles for VB2.0

• A completely rebuilt backing framework for the service and data layers, based on an already existing open source project:

Semantic Turkey [1]

– Based on OSGi (Open Services Gateway initiative)

– Open connectivity to the most notable RDF middleware and triple store technologies (Sesame2, OWLIM, AllegroGraph, Jena (not maintained))

– Native support for SKOS and SKOS-XL over RDF (no more conversions from internal legacy models), in addition to OWL

• Major reworking

– All changes under the hood, leaving the user experience almost unchanged.

– New features added in subsequent versions

[1] Maria Teresa Pazienza, Noemi Scarpato, Armando Stellato and Andrea Turbati. Semantic Turkey: A Browser-Integrated Environment for Knowledge Acquisition and Management. Semantic Web Journal, vol. 3, no. 3, 2012. http://semanticturkey.uniroma2.it/

Objectives and Requisites for a V2

R1. Multilingualism

R2. Controlled Collaboration

R3. Data Interoperability and Consistency

R4. Software Interoperability/Extensibility

R5. Scalability

R6. Under-the-hood data access/modification

R7. Ease-of-use for both users and system administrators


…and here we are

R1. Multilingualism

Multilingual editing…

…and visualization

…and multilingual UI (currently: English, Spanish, Dutch, Thai)

R2. Controlled Collaboration (1/3)

Role-based Access Control


R2. Controlled Collaboration (2/3)

Formal Editorial Workflow

• Following the full life-cycle of concepts/terms, from proposal to deprecation

• Supported by Role-based Access Control

An example of a typical workflow (role, action, resulting status):

1. GUEST <concept-create>: Proposed by guest

2. VALIDATOR <validates>: Validated

3. PUBLISHER <publishes>: Published

4. TERM EDITOR <concept-edit>: Revised

5. ADMINISTRATOR <validates>: Published

6. ONTOLOGY EDITOR <concept-delete>: Proposed deprecated

7. PUBLISHER <validates>: Deprecated
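The example workflow can be read as a small state machine whose transitions are guarded by roles. The following is a minimal sketch in Python, assuming only the roles, actions and statuses named in the example; the `TRANSITIONS` table and the `apply_action` and `run_example` helpers are hypothetical and are not VocBench's actual implementation (which may, for instance, let several roles perform the same action).

```python
# Illustrative transition table read off the example workflow; not VocBench code.
# Key: (current status, action). Value: (role allowed to act, resulting status).
TRANSITIONS = {
    (None, "concept-create"): ("GUEST", "Proposed by guest"),
    ("Proposed by guest", "validates"): ("VALIDATOR", "Validated"),
    ("Validated", "publishes"): ("PUBLISHER", "Published"),
    ("Published", "concept-edit"): ("TERM EDITOR", "Revised"),
    ("Revised", "validates"): ("ADMINISTRATOR", "Published"),
    ("Published", "concept-delete"): ("ONTOLOGY EDITOR", "Proposed deprecated"),
    ("Proposed deprecated", "validates"): ("PUBLISHER", "Deprecated"),
}


def apply_action(status, role, action):
    """Return the next status, or raise if the step is undefined or not permitted."""
    key = (status, action)
    if key not in TRANSITIONS:
        raise ValueError(f"{action!r} is not defined for status {status!r}")
    allowed_role, next_status = TRANSITIONS[key]
    if role != allowed_role:
        raise PermissionError(f"{role} may not perform {action!r} here")
    return next_status


def run_example():
    """Replay the seven steps of the example and collect the resulting statuses."""
    steps = [
        ("GUEST", "concept-create"),
        ("VALIDATOR", "validates"),
        ("PUBLISHER", "publishes"),
        ("TERM EDITOR", "concept-edit"),
        ("ADMINISTRATOR", "validates"),
        ("ONTOLOGY EDITOR", "concept-delete"),
        ("PUBLISHER", "validates"),
    ]
    status = None
    history = []
    for role, action in steps:
        status = apply_action(status, role, action)
        history.append(status)
    return history
```

Keeping the permitted role next to each transition makes the link between the editorial workflow and role-based access control explicit: a step fails either because it does not exist for the current status or because the acting role is not entitled to it.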


R2. Controlled Collaboration (3/3)

Recent Changes

• Available through a dedicated module

• or as RSS feeds

Includes both:

– User changes

– Content changes

R3. Data Interoperability and Consistency (1/3)

Formats

• Import/Export in all popular RDF serialization formats

• The concrete availability of the various formats depends, however, on the connected triple store/RDF middleware

Models

• VocBench adopts a SKOS-XL + reified skos:definition model

• Import of SKOS core data

– Refactoring from SKOS to SKOS-XL, with skos:definition reification

• Export

– SKOS-XL:

• “All contents” or

• Filtered export based on broader concepts/schemes

– SKOS: options for removing/keeping reified labels and definitions
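To make the SKOS to SKOS-XL refactoring concrete, here is a minimal sketch in Python that works on plain (subject, predicate, object) tuples rather than on a real triple store. The function name `skos_to_skosxl` and the `#xl_<n>` pattern used to mint label URIs are assumptions made for illustration; only the SKOS and SKOS-XL namespaces and property names come from the W3C vocabularies. The reification of skos:definition is not covered here.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
SKOSXL = "http://www.w3.org/2008/05/skos-xl#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# SKOS core label properties and their SKOS-XL counterparts.
LABEL_PROPS = {
    SKOS + "prefLabel": SKOSXL + "prefLabel",
    SKOS + "altLabel": SKOSXL + "altLabel",
    SKOS + "hiddenLabel": SKOSXL + "hiddenLabel",
}


def skos_to_skosxl(triples):
    """Replace plain SKOS labels with reified skosxl:Label resources.

    `triples` is an iterable of (subject, predicate, object) tuples; literal
    objects are (text, language) pairs. Label URIs follow a hypothetical
    "<concept>#xl_<n>" pattern.
    """
    out = []
    counter = 0
    for s, p, o in triples:
        if p in LABEL_PROPS:
            counter += 1
            label = f"{s}#xl_{counter}"
            out.append((s, LABEL_PROPS[p], label))            # concept -> label resource
            out.append((label, RDF_TYPE, SKOSXL + "Label"))   # the label is a resource
            out.append((label, SKOSXL + "literalForm", o))    # carrying the literal
        else:
            out.append((s, p, o))                             # everything else is kept
    return out
```

Because each label becomes a resource of its own, further statements (status, timestamps, editorial notes) can be attached to it, which is what a plain literal does not allow.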

Vocabularies

• Possibility to owl:import any existing vocabulary, from the web or from local files.

• Availability of a caching mirror for previously imported vocabularies

R3. Data Interoperability and Consistency (2/3)

Integrity / Consistency

• VB features complex multi-scheme management of thesauri

• Actions that could break the structure (e.g. breaking the reachability of a concept) are forbidden

• To deal with imported data, Integrity Constraint Validation (ICV) checks have been included in the platform

– Currently, only dangling concepts are dealt with

– More to come, already available as services from ST
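As an illustration of what a dangling-concept check might look like, here is a minimal sketch in Python. It assumes that a concept is "dangling" when it has no broader concept and is not a top concept of the scheme; this definition, the function name `find_dangling` and the data layout are assumptions made for the example, not the actual ICV service exposed by Semantic Turkey.

```python
def find_dangling(concepts, broader, top_concepts):
    """Return the concepts that cannot be reached from the top of the hierarchy.

    concepts:     set of concept identifiers
    broader:      dict mapping a concept to the set of its broader concepts
    top_concepts: set of concepts declared as top concepts of the scheme

    Assumed definition: a concept is dangling if it is not a top concept
    and has no broader concept at all.
    """
    return sorted(
        c for c in concepts
        if c not in top_concepts and not broader.get(c)
    )
```

For example, with concepts {"A", "B", "C"}, top concept "A" and "B" having "A" as broader, the check reports only "C", which an editor would then attach to a parent or promote to top concept.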

R3. Data Interoperability and Consistency (3/3)

Alignment

R4. Software Interoperability/Extensibility

Triple Store Agnostic

• The OWLART API provides:

– a very thin layer over existing middleware (e.g. Sesame, Jena)

– a high-level “vocabulary layer” for OWL, SKOS, SKOS-XL

• Which triple stores do we currently support, and which connectors are actively maintained?

– Sesame2 (standard internal triple stores, both in-memory and native)

– GraphDB/OWLIM (through a Sesame remote connection, with an optional parameter expressly dedicated to covering its different management of graphs with respect to Sesame)

– Other partners have experimented with other triple stores: https://art-uniroma2.atlassian.net/wiki/display/ST/Accessing+Various+Triplestores

– Past experiments with AllegroGraph and the Jena middleware

• For GraphDB/OWLIM SE, we exploit its free-text indexing capabilities
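The idea of a thin store-neutral layer with a vocabulary layer on top can be sketched as follows. This is an illustration in Python, not the OWLART API (which is a Java library): the `TripleStore` protocol, `InMemoryStore` and `SkosVocabulary` are hypothetical names, and prefixed strings such as "skos:broader" stand in for full URIs.

```python
from typing import Iterable, Optional, Protocol, Tuple

Triple = Tuple[str, str, str]


class TripleStore(Protocol):
    """Thin, store-neutral interface; each connector would implement it."""

    def add(self, triple: Triple) -> None: ...

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterable[Triple]: ...


class InMemoryStore:
    """Toy connector standing in for a vendor triple store."""

    def __init__(self) -> None:
        self._triples = set()

    def add(self, triple: Triple) -> None:
        self._triples.add(triple)

    def match(self, s=None, p=None, o=None):
        # None acts as a wildcard for that position.
        return [t for t in self._triples
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)]


class SkosVocabulary:
    """High-level vocabulary layer written only against TripleStore."""

    def __init__(self, store: TripleStore) -> None:
        self.store = store

    def add_concept(self, concept: str, broader: Optional[str] = None) -> None:
        self.store.add((concept, "rdf:type", "skos:Concept"))
        if broader is not None:
            self.store.add((concept, "skos:broader", broader))

    def narrower_of(self, concept: str):
        return sorted(s for s, _, _ in self.store.match(None, "skos:broader", concept))
```

Since `SkosVocabulary` only depends on the two methods of the protocol, swapping the backing store means providing another class with the same methods; the vocabulary-level code stays unchanged.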


[Layer diagram. Components: Vendor triple store, Vendor data access layer, OWLART API, Semantic Turkey, Business logic, GWT/Presentation. Annotations: raw triple access, high-level data access]

R5. Scalability

Performance

• Information is provided to the frontend incrementally wherever possible (e.g., each level of the concept hierarchy is loaded as nodes are expanded).

• Interfaces revert to limited content and search-filtering for results that could grow very large
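A minimal sketch of this incremental strategy, in Python: a request returns only one level of the hierarchy, flags which children can be expanded further, and caps the number of results. The function name `expand`, the `limit` parameter and the response layout are assumptions for illustration, not the actual service contract.

```python
def expand(node, narrower, limit=100):
    """Return one level of the concept hierarchy below `node`.

    narrower: dict mapping a concept to the list of its direct narrower concepts
    limit:    maximum number of children returned in one response
    """
    children = sorted(narrower.get(node, ()))
    page = children[:limit]
    return {
        # Each child says whether it can itself be expanded, so the UI can
        # draw an expand handle without loading the next level.
        "children": [{"id": c, "has_children": bool(narrower.get(c))} for c in page],
        # When the list was cut, the UI can switch to search-filtering instead.
        "truncated": len(children) > limit,
    }
```

The client calls `expand` again only when the user opens a node, so the cost of a request depends on the size of one level rather than on the size of the whole thesaurus.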

Maintenance

• ST offers a meaningful core set of RDF services…

• …however, many functionalities (especially in the UI) require the composition of several calls.

• Solution: a combination of:

– per-service ad-hoc solutions (heavyweight single services realizing specific functionalities)

– general development facilities for injecting additional information into common API calls (e.g. the rendering of RDF resources is available as an extension point, with different implementations being dynamically injectable into the SPARQL queries of several services).


R6. Under-the-hood data access/modification

Embedded SPARQL Editor

Syntax highlighting…

…completion…

…and validation

R7. Ease-of-use for both users and system administrators

Continuous check-on-start life cycle

• VB technically never recognizes itself as installed/deployed

• At each startup it checks that the complete set of prerequisites for a correct start is satisfied.

• Whenever a new VB version is installed, if new features have been introduced, mandatory configuration options added, or the database requires update batches, the system will identify these needs and react accordingly, interacting with the user when necessary
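The check-on-start life cycle can be sketched as a function that is run at every startup and returns whatever still needs to be done. This is an illustration in Python under assumed names: `check_on_start`, the `state` dictionary, the option names and the integer schema versions are hypothetical and do not describe VocBench's real configuration.

```python
def check_on_start(state, required_options, schema_version):
    """Return the actions needed before the application can start.

    state:            dict with the stored "config" (dict) and "db_version" (int)
    required_options: configuration keys the running version requires
    schema_version:   database schema version the running version expects

    An empty list means every prerequisite is satisfied. There is no
    "installed" flag: the same checks run on every start.
    """
    actions = []
    config = state.get("config", {})
    missing = [opt for opt in required_options if opt not in config]
    if missing:
        # New mandatory options: the administrator has to supply them.
        actions.append(("ask_user", missing))
    current = state.get("db_version", 0)
    if current < schema_version:
        # Pending update batches, applied in order.
        actions.append(("run_db_updates", list(range(current + 1, schema_version + 1))))
    return actions
```

A fresh installation and an upgrade are handled by the same code path: both simply show up as unsatisfied prerequisites on the next start.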


Statistics Module


Vocbench 2.0 (and ST) Architecture

Three-layered extensible architecture

• Presentation Layer

– GWT (Google Web Toolkit) Vocbench User Interface (Mozilla apps in the original framework)

• Services Layer

– Enables communication between the client (Vocbench UI) and the ontology persistence layer.

– HTTP-based services accessed through the Ajax paradigm

– OSGi extensible servicing system

• Persistence Layer

– Access to ontological knowledge.

– Based on a dedicated ontology API, which can be implemented using different technologies.

Vocbench 2.0 (and ST) Architecture


[Architecture diagram spanning front end and back end. Components: Google Web Toolkit (GWT), GWT Incubator, Graph Visualization, Gilead, Service Wrapper Layer, Web services, Hibernate Layer, Administrative Database (MySQL), Semantic Turkey / OWLART API, Triple Store Middleware]

VB “desktop version”: Semantic Turkey for Firefox

Related Works (1/2)

• PoolParty: http://www.poolparty.biz/ [18]. Web-based Editor for Thesauri using Linked Data

– Support for SKOS (optional add-on for SKOS-XL)

– Use: Commercial license (evaluated thanks to a free evaluation account for PoolParty Advanced Server version 4.5.1 (rev 5429))

– Version tracking is supported, as the system performs access control to some extent.

– An add-on further enables an approval workflow based on the existing role-based access control mechanism.

– Editing history is shown both at project level and at entity level.

– Alignment: lookup over LOD, or different projects can be linked together

– Publishes a SPARQL endpoint, dereferenceable URIs, and a wiki with limited editing capabilities.

– Quality criteria: can be enforced interactively (i.e., illegal operations are blocked), or violations are simply recorded in a quality report.

– Backed by Sesame middleware

– Incorrect multiple scheme support (violates the non-entailment of scheme containment along concept hierarchies, section 4.6.4 of the SKOS Reference [1])

• TemaTres: http://www.vocabularyserver.com/ Web-based Editor for Controlled Vocabularies

– Term-based meta-model, no native support for SKOS

– Use: Free and open-source

– Due to the term-based nature of the model, the export to SKOS is often confusing (e.g. two synonymous terms exported as two different concepts)

– Monolingual (though alignments between vocabularies are possible)

– No multiple scheme support (each thesaurus is a scheme)

– Rigid access control mechanism based on user roles (administrator, editor, guest).

– Workflow management: terms move from candidate status to either accepted or rejected. “Accepted” cannot be reverted, even after modifications

– Data quality: metrics and a flexible report generator.

– Connectivity: an API and a few plugins (e.g. for publication over different platforms, such as WordPress) are available


Related Works (2/2)

• TopBraid EVN: Web-based Editor for Business Vocabularies http://www.topquadrant.com/products/topbraid-enterprise-vocabulary-net/

– Support for SKOS, OWL Ontologies and Content tag sets

– Use: Commercial license (we did not carry out an extensive evaluation, as we did not receive the evaluation license we requested)

• SKOSEd: https://code.google.com/p/skoseditor Plugin for Protégé 4.x for editing SKOS thesauri

– Support for SKOS

– Use: free to use and open-source. We evaluated version 1.0-alpha (build04) on Protégé 4.1, as 2.0-alpha has a bug related to scheme management

– Desktop tool (no web application)

– Ontology editing: SKOSEd allows interweaving SKOS and OWL constructs (defect: same form for skos:Concept and skos:ConceptScheme)

– Incorrect concept scheme management (same as PoolParty)

– Being an extension of Protégé 4.x, SKOSEd may not be used in conjunction with the collaboration framework developed for Protégé 3.x

• Web Protégé: http://webprotege.stanford.edu [16] Collaborative Web-based Ontology Editor

– No support for SKOS/SKOS-XL (supports OWL/OBO editing)

– Use: local installation or service via public portal. Free to use in both cases

– Collaboration: based on the collaboration plugin for Protégé 3 [17], providing:

• Change tracking

• Inline discussions and notifications.

• Access control mechanism for user groups, based on configurable policies enforced at various granularities.

– Completely configurable user interface

– Available API


Functional Comparison

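• VocBench
– License: GNU GPL v3 (web application), Mozilla Public License MPL (Semantic Turkey)
– Free to use: Yes
– Deployment: Web application
– Data Models: SKOS-XL, SKOS through offline scaling tool
– Import/Export: SKOS(-XL), versatile spreadsheet import (through ST Firefox UI)
– Scheme Management: Yes
– Custom Relations: Creation, Import, use
– Reasoner: Depends on triple store
– Data quality: Metrics
– Extendibility / Interoperability: API, shared backend, pluggable
– ACL: Yes
– Workflow Management: Yes
– Collaboration, Content Validation: Change feed, validation
– RDF Middleware: OWL ART API (connectors to others: Sesame2 bundled)
– RDF Backend: provided by Sesame2, or other connectors
– SPARQL Querying: Yes
– Semantic Integration: assisted (browse & search) linking of resources from other projects / manual linking of LOD resources. Extensions for RDF lifting from unstructured content

• PoolParty
– License: Proprietary
– Free to use: No
– Deployment: Web application
– Data Models: SKOS, SKOS-XL add-on
– Import/Export: SKOS(-XL), static spreadsheet import
– Scheme Management: Only top concepts
– Custom Relations: Creation, Import, use
– Reasoner: Depends on triple store
– Data quality: Metrics, validation rules
– Extendibility / Interoperability: REST API
– ACL: Yes
– Workflow Management: Yes (add-on)
– Collaboration, Content Validation: History, versioning, validation
– RDF Middleware: Sesame SAIL API
– RDF Backend: provided by Sesame2
– SPARQL Querying: Yes
– Semantic Integration: Linking, Text Mining & Entity Extraction, Search function

• WebProtégé
– License: Mozilla Public License (MPL)
– Free to use: Yes
– Deployment: Web application
– Data Models: OWL 2, OBO
– Import/Export: OWL
– Scheme Management: Not applicable
– Custom Relations: Creation, Import, use
– Reasoner: No, external reasoning possible
– Data quality: Metrics
– Extendibility / Interoperability: API, shared backend, pluggable
– ACL: Yes
– Workflow Management: No
– Collaboration, Content Validation: Discussion, watching, changes feed
– RDF Middleware: OWL API
– RDF Backend: provided by Protégé 3
– SPARQL Querying: No
– Semantic Integration: linking to BioPortal

• TemaTres
– License: GNU General Public License version 2.0 (GPLv2)
– Free to use: Yes
– Deployment: Web application
– Data Models: Term-based thesaurus organization
– Import/Export: MADS, SKOS-Core, Zthes, others; import from: SKOS-Core, tabulated or tagged text file
– Scheme Management: One scheme per vocabulary
– Custom Relations: Creation, use
– Reasoner: No
– Data quality: Metrics, Reports
– Extendibility / Interoperability: API
– ACL: Yes; limited
– Workflow Management: Yes; limited
– Collaboration, Content Validation: Limited validation
– RDF Middleware: No RDF middleware, SKOS RDF/XML available only as an export
– RDF Backend: Relational database (MySQL by default)
– SPARQL Querying: Not native, no realtime, can export data to a SPARQL endpoint through ARC2 (RDF library for PHP)
– Semantic Integration: Linking between vocabularies, Entity Extraction (via add-on)

• SKOSEd
– License: GNU Lesser GPL
– Free to use: Yes
– Deployment: Desktop application
– Data Models: SKOS
– Import/Export: SKOS
– Scheme Management: Only top concepts
– Custom Relations: Creation, Import, use
– Reasoner: Depends on available plugins
– Data quality: KB consistency
– Extendibility / Interoperability: Pluggable
– ACL: No
– Workflow Management: No
– Collaboration, Content Validation: No
– RDF Middleware: OWL API (used by Protégé 4)
– RDF Backend: provided by Protégé 4 (OWL API)
– SPARQL Querying: Yes (inherited from Protégé 4)
– Semantic Integration: N/A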

Example 9 (non-entailment)

<A> skos:narrower <B> .

<A> skos:inScheme <MyScheme> .

does not entail

<B> skos:inScheme <MyScheme> .
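The difference between correct and incorrect scheme handling can be shown on the two triples of Example 9. The following is a small sketch in Python over plain tuples; the function names `in_scheme` and `propagated_in_scheme` are hypothetical, and the second function deliberately implements the behaviour that the SKOS Reference rules out.

```python
TRIPLES = {
    ("A", "skos:narrower", "B"),
    ("A", "skos:inScheme", "MyScheme"),
}


def in_scheme(concept, scheme, triples):
    """Correct reading: only asserted skos:inScheme statements count."""
    return (concept, "skos:inScheme", scheme) in triples


def propagated_in_scheme(concept, scheme, triples):
    """Incorrect reading: membership is also inherited from a direct broader concept."""
    if in_scheme(concept, scheme, triples):
        return True
    return any(
        p == "skos:narrower" and o == concept and in_scheme(s, scheme, triples)
        for s, p, o in triples
    )
```

With these triples, `in_scheme("B", "MyScheme", TRIPLES)` is false while `propagated_in_scheme("B", "MyScheme", TRIPLES)` is true; a tool behaving like the second function places B in a scheme it was never assigned to.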


User Community and Evaluation (1/2)

Here is a list of relevant organizations* adopting, or close to adopting, VB2.0:

• Food and Agriculture Organization (FAO) > AGROVOC, Biotechnology, Land and Water, FAO Topics

• EU Documentation Office > EUROVOC

• Italian Senate > TESEO

• European Environment Agency (EEA) > GEMET

• Harvard University > Unified Astronomy Thesaurus (UAT)

• EC Parliament Library

• Agence Nationale de la Recherche > Infrastructure nationale AnaEE France

• CABI

• United Nations Convention to Combat Desertification (UNCCD)

• Scottish Government > Gov metadata

• Columbia University > IEDA Thesaurus

* http://aims.fao.org/tools/vocbench/partners

2 VB community mailing lists:

– VB user forum

– VB developer forum

User Community and Evaluation (2/2)

USE values

                 Usefulness   Ease of use   Ease of learning   Satisfaction
Global           5.34         4.49          5.11               4.93
Experienced      5.58         4.66          5.18               5.02
Inexperienced    4.97         4.19          5.00               4.79

Feature evaluation

                                  Easy to use   Effective   Interesting
History                           5.38          5.50        6.33
SPARQL Querying                   4.00          5.40        6.29
Publication Workflow Management   5.50          5.63        6.22
Collaborative Management          5.75          5.88        6.11
Scheme Management                 4.83          5.17        5.57
Role-based Access Control         5.33          5.22        5.40
Reasoning                         4.29          4.43        5.38
Triple Store Connectivity         3.67          4.50        5.00

Values range from 1 to 7; 11 anonymous responses were collected.

Online Questionnaire: http://vocbench.uniroma2.it/purl/VocBench-User-Questionnaire_2014-10.zip

USE* questionnaire: http://hcibib.org/perlman/question.cgi?form=USE

* Lund, A.M. (2001) Measuring Usability with the USE Questionnaire. STC Usability SIG Newsletter, 8:2.

Why should I "buy" it?

Collaborative Management

– Validation&Publication Workflow (propose, validate, publish, revise, deprecate…)

– Fine grained user management

• Both users and functionalities may be organized in groups

• Functionalities (or groups of) may be assigned to different users (or groups of)

– Full editing history (not only concepts, but most of the actions can be subject to validation too)

– RSS Feeds

– Fine-grained metadata and editorial notes: SKOS-XL and reified definitions allow for timestamped status and rich editorial notes

Multilinguality

– Strong support for multi-lingual thesauri management

– The application itself is also multilingual (currently supporting English, Dutch and Spanish, with more languages coming)

Native RDF support

– Support for different triple stores

– Possibility to run SPARQL queries/updates through a dedicated interface with syntax completion/highlighting

– SKOS-XL management

• If preferred, SKOS-core export through available conversion tools

Large scale thesauri management

– Scalability limited only by the underlying triple store

Extensibility

– OSGi connectable services

Advanced skos:ConceptScheme Management

– SKOS allows for non-trivial management of multiple conceptual schemes, which is fully supported by VB

And, last but not least: Free and Open Source! (http://vocbench.uniroma2.it)


Future works

• A more dynamic framework for content validation

– Trade-off between extensibility/flexibility and the strongly controlled approach

• ICV: more checks (available from the ST engine)

• Overcome extensibility limitations of GWT

• More interaction with Linked Data

– Improved Alignment

– Generation of VoID / LIME descriptions


Contacts

VocBench site: http://vocbench.uniroma2.it/

VocBench pages@FAO: http://aims.fao.org/tools/vocbench-2/

You can also follow VB by registering at:

• AIMS Community Site: http://aims.fao.org/ (you can select the topics you are interested in)

• VocBench Mailing Lists:

– User: http://groups.google.com/group/vocbench-user

– Developer: http://groups.google.com/group/vocbench-developer

• Semantic Turkey Mailing Lists:

– User: http://groups.google.com/group/semanticturkey-user

– Developer: http://groups.google.com/group/semanticturkey-developer


TIME FOR QUESTIONS :-)


…oh, and there’s a demo tomorrow!