
Semantic Web 0 (2012) 1–0, IOS Press

An information model for managing resources and their metadata

Editor(s): Krzysztof Janowicz, University of California, Santa Barbara, USA
Solicited review(s): Sven Schade, European Commission – Joint Research Centre, Italy; Tudor Groza, The University of Queensland, Australia; Paul Groth, VU Amsterdam, The Netherlands

Hannes Ebner a,b,* and Matthias Palmér a,b
a KTH Royal Institute of Technology, 100 44 Stockholm, Sweden. E-mail: {hebner,matthias}@kth.se
b MetaSolutions AB, 133 31 Saltsjöbaden, Sweden. E-mail: {hannes,matthias}@metasolutions.se

Abstract. Today, information is managed within increasingly complicated Web applications, which often rely on similar information models. Finding a reusable and sufficiently generic information model for managing resources and their metadata would greatly simplify the development of Web applications. This article presents such an information model, namely the Resource and Metadata Management Model (ReM3). The information model builds upon Web architecture and standards, more specifically the Linked Data principles, when managing resources together with their metadata. It makes it possible to express relations between metadata and to keep track of provenance and access control. In addition to this information model, the architecture of the reference implementation is described along with a Web application that builds upon it. To show the approach in practice, several real-world examples are presented as showcases. The information model and its reference implementation have been evaluated from several perspectives, such as the suitability for resource annotation, a preliminary scalability analysis and the adoption in a number of projects. This evaluation along various complementary dimensions shows that ReM3 has been successfully applied in practice and can be considered a serious alternative when developing Web applications where resources are managed along with their metadata.

Keywords: information model, metadata management, resource annotation, linked data, provenance

1. Introduction

Most libraries and educational institutions manage their content in repositories, where small groups of domain experts annotate and manage resources and their metadata. Such repositories are homogeneous within the respective institution that also has the authority to decide upon which standards to support and what level of interoperability to strive for. This poses a problem in a heterogeneous landscape where interoperability between repositories is sought after. An appropriate information model is needed to be able to cope with different metadata standards, separated management of resources and their metadata, and transfer and enrichment of metadata across systems.

*Corresponding author. E-mail: [email protected].

The research described in this article was carried out within projects that focused on publication, enrichment and management of heterogeneous metadata. The common denominator in all of these projects was the annotation of resources with metadata and the collection and enrichment of already existing metadata originating from a multitude of content repositories. The heterogeneity of the metadata requires the capability of handling arbitrary formats, such as established standards and their variations, or also proprietary formats.

1570-0844/12/$27.50 © 2012 – IOS Press and the authors. All rights reserved


The following requirements were relevant for several content-heavy projects (such as the Organic.Edunet project [24]), where it was necessary to

– manage resources together with their metadata,
– handle a wide variety of different metadata expressions,
– support Web technologies to enable modern Web applications, and
– provide a unified approach for the integration of metadata from legacy (non-Web) systems.

The requirements above led to a set of general problems that needed to be addressed in the course of developing an appropriate information model.

1.1. Problem statement

The general problems are summarized in the following paragraphs, which consist of the questions that form the cornerstones of the information model.

Management of resources and their metadata. How are situations distinguished where either both resource and metadata, only metadata, or neither metadata nor resource are in the system?

Enrichment of metadata. Adding domain- or subject-specific metadata in addition to generic metadata is a primary use case in the projects in which the information model is being used. How can metadata be enriched while keeping different descriptions separate?

Organization of metadata. Metadata harvesting does not come with a built-in mechanism that connects different metadata about the same resource. What is needed to maintain the original metadata and to keep track of enrichments?

Integration of heterogeneous information sources. Metadata expressed in different models and standards should be used together, e.g. generic metadata in connection with domain-specific educational metadata. What is required from the information model that manages these metadata in a common carrier (see 2.2)?

Support for Web architecture. What should an information model for managing resources and their metadata look like if it is to be used in Web applications? A prerequisite is the support for Web architecture and standards, in particular Linked Data.

The concept of named graphs, and particularly the use of the Resource Description Framework (RDF), have already been suggested as a partial solution. However, even though named graphs are loosely specified in different articles and standards such as [4,31,18], they lack clear guidelines for how they should be used in the context of Web architecture. In addition to answers to the questions above, the described information model and its reference implementation sought best practices for how to:

– express that named graphs are related, for instance if they identify metadata that describe the same resource.
– retrieve and modify named graphs using a standard protocol such as HTTP.
– keep track of provenance and access control of named graphs and resources described within them.

To solve the problems as stated above, this article introduces an information model together with a reference implementation. They can be used to manage resources and their metadata, to express relations between metadata, and to handle provenance and access control. The described approaches are intended to provide solutions that make it possible to bring already existing metadata into the world of Web standards. Such resources and metadata are then uniquely identifiable, accessible and modifiable using HTTP URIs and REST-based services following the Linked Data principles. In addition, a short summary of several showcases is presented, including some conclusions regarding the general applicability and possible future applications. The information model and its reference implementation have been evaluated from several perspectives, such as the suitability for resource annotation, a preliminary scalability analysis and the adoption in a number of projects.

1.2. Important terms

Several terms occur frequently in this article. The most important ones are explained in this section in order to disambiguate their meaning in the context of the work described here.

Uniform resource identifier and resource. This article uses the definitions of the terms uniform resource identifier (URI) and resource that are provided in section 2, Identification, of the Architecture of the World Wide Web [21]:

By design a URI identifies one resource. We do not limit the scope of what might be a resource. The term “resource” is used in a general sense for whatever might be identified by a URI. It is conventional on the hypertext Web to describe Web pages, images, product catalogs, etc. as “resources”. The distinguishing characteristic of these resources is that all of their essential characteristics can be conveyed in a message. We identify this set as “information resources”.

Metadata. A common and widely accepted definition of metadata is that it is “data about data”; see also the considerations in [1], where it is defined as “Metadata is machine understandable information about web resources or other things”. In addition, the term meta-metadata is of relevance for the information model described here, so the axioms “metadata is data” and “metadata can describe metadata” are important to mention.

Resource annotation. When metadata is created or modified in order to describe a resource, we call this resource annotation. More specifically, the meaning of annotation in this article is that “metadata about one document can occur within a separate document which may be transferred accompanying the document” [1]. The term document in this definition is equivalent to resource.

Repository. A repository is a server from which resources and metadata can be retrieved using a standard protocol such as HTTP.

Harvesting. Metadata harvesting is used to retrieve metadata “records” from one or more repositories into another and can be used to build large collections of metadata. The harvesting process is usually carried out using the “Open Archives Initiative Protocol for Metadata Harvesting” [22]. It uses XML over HTTP and requires as a minimum Dublin Core (DC) metadata [6,7], but other representations may be used in addition to DC.
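As a rough illustration of the harvesting step, the following Python sketch builds OAI-PMH ListRecords request URLs. The verb and the metadataPrefix and resumptionToken arguments come from the OAI-PMH specification; the repository base URL, the token value and the function name are placeholders assumed for this example, and no request is actually sent.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build the URL of an OAI-PMH ListRecords request.

    base_url is a hypothetical repository endpoint. "oai_dc" is the
    Dublin Core format that OAI-PMH requires as a minimum.
    """
    if resumption_token is not None:
        # A resumption token is an exclusive argument: it continues a
        # previous, incomplete list and replaces all other arguments.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and token, used only for illustration.
first_page = list_records_url("http://repository.example.org/oai")
next_page = list_records_url("http://repository.example.org/oai",
                             resumption_token="token-123")
```

Every record obtained this way is a copy held by the harvesting system, which is why harvesting alone does not connect different metadata about the same resource.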

Provenance. As discussed in the introduction of the PROV Model Primer [19], there are different uses of provenance. The information model described here makes use of both agent-centered provenance and object-centered provenance, which are described in 3.3.

1.3. Structure of this article

This article is organized as follows. Section 2 gives an account of the relevant state of the art for metadata management and resource annotation. The information model is introduced in section 3, which is followed by a presentation of the reference implementation in section 4. A Web application which implements a user interface to the reference implementation is described in section 5. This is followed by some showcases in section 6. The evaluations in section 7 discuss scalability, the applicability for resource annotation, and the adoption in real applications. The conclusions in section 8 summarize the work carried out and depict how the problems that have been stated in section 1 were solved. The article concludes with the planned next steps and possible future work in section 9.

2. State of the art

This section briefly analyzes and summarizes the state of the art which is of relevance for the research described in this article.

2.1. Document- vs. graph-centric metadata

Traditional ways of annotating resources often take a document-centric approach and use the XML format as it is an established standard for expressing information. Unfortunately, when document-centric metadata are transferred between systems (e.g. using a harvesting protocol like OAI-PMH [22]), the metadata is copied and a fork takes place. The alternative, to reuse metadata without making a copy, requires that the original instance can be uniquely identified. This is most often not possible with the current approach of metadata repositories as everything is based on harvesting metadata from one system into another, leading to copies and forks instead of references. Information is unnecessarily duplicated and numerous variations of descriptions of the same resource are created without being able to reconstruct their history.

2.2. RDF as common carrier for metadata

To be able to create flexible annotations of resources it is necessary to use a data model which is designed to allow multiple metadata expressions following different standards to coexist. RDF is such a (meta) data model [23]. However, expressing metadata in RDF requires a thorough mapping to be crafted, which often involves an analysis of the exact semantics of the standard. Good knowledge of RDF and related standards is required as it is good practice to reuse established terms from other RDF-based vocabularies whenever possible. There are situations where the conceptual model cannot be cleanly mapped to the RDF model and information may be lost. To avoid such situations, RDF should be considered as a basis for metadata interoperability, a common carrier, when adapting existing or creating new metadata standards. For a longer discussion on this subject see [28].

The most important metadata standards in the context of this article are Dublin Core metadata [6,7], its abstract model (DCAM) [30] and IEEE Learning Object Metadata (LOM) [8]. They are used within the showcases described in section 6. IEEE LOM is mostly used in its draft mapping into the DCAM to be able to store it in RDF.

2.3. Named graphs to manage sets of triples and provenance

The Semantic Web [3] allows statements about identifiable resources to be expressed using RDF triples which may also be made available on the Web for others to discover. When new statements are made, there is no need to duplicate information. Additional statements about the same identifiable resources can be expressed as new RDF triples and be published on the Web separately from the first set of triples. If all available triples describing the same resource are merged into a single big graph, a holistic view about a resource can be constructed. With only triples as the source of this information it is impossible to identify triples or sets of them, which creates several problems. To mention only a few, it is difficult to detect which triples have been replaced in more recent revisions, it is hard to keep track of the history of a resource’s descriptions, and it is almost impossible to provide information which depends on its purpose (i.e. contextualization).

The concept of named graphs (NG) [4,18] enables us to work around this by making it possible to uniquely identify sets of triples with URIs. This generic approach should be compared to how specific metadata standards have solved the same problem, for instance IEEE LOM [8] with its Metametadata identifier expressed in XML.

Another issue is related to searching and indexing. If a query matches one or more triples it is unclear where those triples originate from and in which context they express information about the described resource. This can be partially solved by using NGs as this allows for uniquely identifying the relevant triples. The use of named graphs in this article is basically identical to how they are treated by the SPARQL query language [31].
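The difference between a merged graph and named graphs can be sketched in a few lines of Python. This is a toy in-memory store written for illustration only, not the reference implementation: the class, the graph URIs and the abbreviated property names are all assumed. It shows that a lookup over named graphs can report where a matching triple originates from, similar in spirit to a SPARQL GRAPH pattern, whereas the merged graph cannot.

```python
from collections import defaultdict


class QuadStore:
    """Toy store that keeps triples grouped by the URI of their named graph."""

    def __init__(self):
        self._graphs = defaultdict(set)

    def add(self, graph_uri, subject, predicate, obj):
        self._graphs[graph_uri].add((subject, predicate, obj))

    def merged(self):
        """Union of all triples; the information about their origin is lost."""
        return set().union(*self._graphs.values())

    def graphs_matching(self, subject=None, predicate=None, obj=None):
        """Return the URIs of all named graphs holding a matching triple.

        None acts as a wildcard for the corresponding position.
        """
        pattern = (subject, predicate, obj)
        return sorted(
            graph_uri
            for graph_uri, triples in self._graphs.items()
            if any(
                all(p is None or p == t for p, t in zip(pattern, triple))
                for triple in triples
            )
        )


# Hypothetical URIs: original (harvested) and enriched descriptions of one document.
store = QuadStore()
doc = "http://example.org/doc/1"
store.add("http://example.org/graph/original", doc, "dcterms:title", "Soil health")
store.add("http://example.org/graph/enriched", doc, "dcterms:title", "Soil health")
store.add("http://example.org/graph/enriched", doc, "dcterms:subject", "organic farming")

title_sources = store.graphs_matching(subject=doc, predicate="dcterms:title")
```

Here the merged view contains the title only once, while title_sources lists both graphs that state it.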

Named graphs provide an approach for denoting collections of triples which are annotated with relevant provenance information. However, in [26] it is mentioned that most approaches building on named graphs for provenance lack a clear specification of how provenance should be represented.

2.4. Representational state transfer

Representational State Transfer (REST) [17] is an architectural style for distributed hypermedia systems and is a popular design pattern used for resource-based web services. REST itself is protocol-agnostic, but in this article it is used in the context of HTTP. Its architectural elements are resource identifiers, resources, resource representations and their metadata. RDF on the other hand only provides resource descriptions via resource identifiers without any knowledge of how to access those resources via resource representations. However, when named graphs are given URIs they are effectively resources that contain sets of triples and it is quite natural that they should have resource representations, which makes REST a logical choice to access RDF-based systems.

“Pure” REST is difficult to achieve and most of the offered RESTful web services are REST-oriented but also contain other concepts such as RPC-oriented methods [36]. An implementation taking advantage of HTTP makes it easier to align with the Linked Data principles as described below.
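To make the idea of a named graph as an HTTP resource more concrete, the sketch below prepares a read and a replace request for such a graph using Python's standard library. The graph URI and the choice of Turtle as representation are assumptions made for this example; the text does not prescribe these details, and the requests are only constructed, not sent.

```python
from urllib.request import Request

# Hypothetical URI of a named graph holding metadata about a resource.
GRAPH_URI = "http://repository.example.org/context/1/metadata/42"


def read_graph_request(graph_uri, media_type="text/turtle"):
    """Prepare a GET request asking for one RDF serialization of the graph."""
    return Request(graph_uri, headers={"Accept": media_type}, method="GET")


def replace_graph_request(graph_uri, turtle_document):
    """Prepare a PUT request that replaces the graph with the given Turtle."""
    return Request(
        graph_uri,
        data=turtle_document.encode("utf-8"),
        headers={"Content-Type": "text/turtle"},
        method="PUT",
    )


read = read_graph_request(GRAPH_URI)
write = replace_graph_request(
    GRAPH_URI,
    '<http://example.org/doc/1> <http://purl.org/dc/terms/title> "Soil health" .',
)
# urllib.request.urlopen(read) would send the request; it is left out here
# because the URI above does not point to a real server.
```

Content negotiation via the Accept header lets the same graph URI serve several RDF serializations, which keeps the identifier independent of the representation.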

2.5. Linked Data

Linked Data (LD) is a recommended best practice for interlinking and exploiting data. This enables, for example, the exploration of the Web of Data, which is constructed by related documents on the web. The focus lies on links between uniquely identifiable things described using RDF. The term “thing” as used in the Linked Data rules is equivalent to the term “resource” in this article. Linked Data implements the following four rules [2]:

1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL) [23,31].
4. Include links to other URIs so that they can discover more things.

LD suggests the use of URIs, HTTP and RDF, which makes it more specific than REST. However, RESTful web services can operate on the Web of Data when the offered data conforms to the Linked Data principles. The growing LOD cloud [5] is easily extended by simply providing statements which link to the existing published datasets. This is also one of the big differences to traditional repositories: instead of harvesting and copying data, it is sufficient to refer to things, look up identifiers and fetch from the original source. To improve performance the data can be cached, but this does not affect the basic principles. The mainstream of learning repositories [28] has not arrived in the LOD cloud yet and it will be necessary to provide a “bridge” between these two worlds. How this can work is also a topic of this article and is described further down.

2.6. Access control and RDF

Web Access Control (WAC) as described in [39] is an existing decentralized system that makes use of an ontology for access control on the Web and has several implementations. Users and groups are identified by HTTP URIs and the model allows various forms of access to resources. In addition to URIs, users are identified by WebIDs as specified in [34]. The URIs are dereferenceable, which means that users and groups can be looked up across systems and given access to resources even if they do not exist in the local system. How access control is realized in ReM3 is discussed in 3.4.

3. An information model for managing resources and their metadata

This section describes the Resource and Metadata Management Model (ReM3) by first providing a conceptual overview and then explaining in detail how various aspects such as provenance and access control are expressed inside the model. The information model is based on and relates to the state of the art as described in the previous section.

3.1. Conceptual overview of ReM3

ReM3 is an information model for keeping track of resources and their metadata. It is based on the concepts of contexts and entries. A context is a container for a set of entries that are managed together; at a minimum, it provides default ownership of the contained entries. An entry contains a resource, descriptive metadata about the resource, and some administrative information about the entry, which will be referred to as the entry information. The entry information also keeps track of access control and provenance. Access control can be managed on both context and entry level, depending on how fine-grained the access control needs to be. The entry also keeps track of relationships from other entries via a special relations graph. See figure 1 for a conceptual representation of a ReM3 entry.

Figure 1. ReM3 Entry and its linked information. [Figure not reproduced. Labels in the figure: Entry Type, Graph Type, Resource Type, Access Control and Provenance; Resource URI, Local Metadata Graph URI, External Metadata URI, Cached External Metadata Graph URI and Relations Graph URI; Resource, Local Metadata Graph, External Metadata, Cached External Metadata Graph and Relations Graph. The cardinalities between the entities are 1:1.]

Each entry has three different kinds of types that determine where the resource and its metadata reside, how the resource is represented and how it should be treated:

– Entry Type indicates if neither, one, or both of the entry’s resource and metadata is maintained within the local system. This is the most important type for the showcases shown in section 6 as it differentiates between local and external (remote and harvested) resources.
– Resource Type tells whether a resource has a digital representation or not.
– Graph Type indicates whether a resource gets a special treatment within the implementation of the model.

Ideally, the entry information, that is, the information about an entry, is represented in a single RDF graph which can be requested and updated as a whole or in part. If an application needs additional information about a resource, it can be represented in the same RDF graph by adding additional properties. Within the entry information, the resource, the describing metadata graphs and the relations graph are identified by URIs that can be discovered via special properties from the entry URI. The URIs for the metadata, the relations graph and sometimes the resource (in case the resource is an RDF graph) point to named graphs.

However, the availability of named graphs for an entry also depends on the entry type, which indicates if metadata and the resource are to be found locally or externally. More specifically, the possible values for the entry type are as follows (see also figure 2).

– Local - both metadata and resource are maintained in the entry’s context.
– Link - the metadata, but not the resource, are maintained in the entry’s context.
– Reference - the resource and the metadata are maintained outside the entry’s context.
– Link reference - the resource and the metadata are maintained outside the entry’s context, in addition there are complementary metadata maintained in the entry’s context.

Figure 2. The ReM3 Entry Type and its implications for the location of metadata

Whenever metadata are maintained outside of the entry’s context, they may be cached locally (which makes them cached external metadata) to increase reliability and performance, and to avoid pushing the responsibility of doing metadata format transformations to application developers. The entry information is always kept in the corresponding context, independently of the entry type used.
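The four entry types and their implications can be summarized in a small Python sketch. The enumeration mirrors the list above; the class, field and function names are chosen for this illustration and are not taken from the ReM3 specification.

```python
from dataclasses import dataclass
from enum import Enum


@dataclass(frozen=True)
class Location:
    resource_local: bool     # resource is maintained in the entry's context
    local_metadata: bool     # metadata are maintained in the entry's context
    external_metadata: bool  # metadata are maintained outside the context


class EntryType(Enum):
    LOCAL = Location(True, True, False)
    LINK = Location(False, True, False)
    REFERENCE = Location(False, False, True)
    LINK_REFERENCE = Location(False, True, True)


def may_cache_external_metadata(entry_type):
    """Only entries with externally maintained metadata have anything to cache."""
    return entry_type.value.external_metadata


cacheable = [t.name for t in EntryType if may_cache_external_metadata(t)]
```

In this reading, only Reference and Link reference entries can carry a cached external metadata graph, while the entry information itself stays in the context for all four types.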

The resource type indicates to which extent a resource is an information resource. A resource is anything that can be identified by a URI whereas an information resource is a resource whose essential characteristics can be conveyed in a message. Examples are documents, images, videos, etc., of various sorts which have representations, e.g. HTML, ODT, PNG, etc., which can be transferred in a message body that is the result of an HTTP request. The idea behind the resource type is based on the Architecture of the World Wide Web [21], the W3C TAG discussions on HTTP dereferencing [38] and the W3C Interest Group Note on “Cool URIs” [32].

The two possible values for the resource type are:

– Information resource - The resource has a representation, in the repository or elsewhere.
– Named resource - The resource is not an information resource; the resource can be referred to in communication but not transferred in a message.

The graph type was introduced to be able to easily recognize resources which need special treatment by the implementation. Examples are the graph types used for access control, namely User and Group; Context for container entries, and List to indicate an ordered list of entries within a context.

The ReM3 terms as shown in figure 3 have been formalized as an RDF schema [11] and are described in the ReM3 specification [12].

3.2. Named graphs in ReM3

The information model is RDF-oriented and relies on the concept of named graphs, which is part of the SPARQL protocol and query language specification. The formal semantics of named graphs are described by Carroll et al. in [4] and also in the SPARQL specification [31]. As every NG is identified by a URI, it is possible to keep track of the provenance of each NG through the entry information as described above. The entry information contains expressions that describe the relationships between graphs. This is used to express that NGs are related, as is the case when the same resource is described in different contexts. In the case of ReM3, NGs are used for expressing the following pieces of information:

– The entry itself, this is the “main” NG where all other NGs belonging to the same entry are linked together. This effectively makes it the meta-metadata.
– Metadata, if present.
– Cached external metadata, if present.
– Resource, if the resource can be expressed in RDF.

Figure 3. ReM3 terms described in an RDF schema

In addition to being the foundation for ReM3, the use of NGs makes contextualisation of metadata possible. Without naming the graphs it would be hard to differentiate between triples originating from different sources.

3.3. Expressing provenance in ReM3

ReM3 supports agent-centered provenance, which means that it keeps track of information about which users were involved in creating or modifying information, and object-centered provenance, that is, keeping track of the origins of a resource or its metadata. The same terminology and definitions are also used in the PROV Model Primer [19]. However, the origins of ReM3 date back to a time (see [10]) when the PROV Model did not exist, so it could not be taken into consideration when ReM3 was implemented.

ReM3 keeps track of who created or contributed what, when, where and perhaps even why. All these pieces of information are kept in the entry information and are available if the resource originates from a local ReM3-based repository. The following provenance-related properties are a minimum for being able to keep track of annotation cases where both local and external metadata are involved, i.e. entries with entry type Link Reference:

– Creator and contributor
– Creation and modification date
– Reference to the resource
– Reference to the external (possibly original) metadata
– Date when the external metadata was cached

Provenance information can be both metadata and meta-metadata, e.g. when keeping track of the origin of a resource it is metadata, while it becomes meta-metadata when it is used to express similar information for metadata. Restrictions apply if the metadata originates from an external system, i.e. provenance for the resource and metadata is only known if this is managed and exposed by the system where the information is fetched from. If this is not the case then the “provenance trail” starts at the time the external metadata is cached in the ReM3 system.
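The minimum set of provenance properties and the rule about where the provenance trail starts can be sketched as follows. All field names are hypothetical and do not correspond to the property names of the ReM3 RDF schema; in particular, external_origin_known_since is an assumed stand-in for whatever provenance an external system may expose.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class EntryProvenance:
    creator: str
    created: datetime
    resource_uri: str
    contributors: List[str] = field(default_factory=list)
    modified: Optional[datetime] = None
    external_metadata_uri: Optional[str] = None
    external_metadata_cached: Optional[datetime] = None
    # Set only if the external system manages and exposes provenance itself.
    external_origin_known_since: Optional[datetime] = None


def external_trail_start(p):
    """Earliest point from which the history of the external metadata is known."""
    if p.external_metadata_uri is None:
        return None  # no external metadata involved
    if p.external_origin_known_since is not None:
        return p.external_origin_known_since
    # Otherwise the trail starts when the metadata was cached locally.
    return p.external_metadata_cached


# Hypothetical entry of type Link Reference with harvested metadata.
harvested = EntryProvenance(
    creator="http://example.org/user/alice",
    created=datetime(2012, 1, 10),
    resource_uri="http://example.org/doc/1",
    external_metadata_uri="http://repository.example.org/oai/record/1",
    external_metadata_cached=datetime(2012, 1, 11),
)
```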

One of the currently existing restrictions of the model is the lack of revisions and versioning, both for the metadata and the described resource. Previous work that can be used in this context [37] is being considered for revisions of this information model.

The information above about provenance in ReM3 is about what the model supports in its entry information. In addition it is possible to add any information to the metadata or meta-metadata graphs, even if ReM3 is unable to interpret it directly. The extent to which the PROV Model [19] can be used is also currently being explored.

3.4. Expressing access control in ReM3

Just as provenance is expressed in the entry information, so is access control. The purpose of the access control in ReM3 is to control who has rights to access entries. Access to the entry, metadata and resource is determined by specific ACL statements using the URIs of the entry, the metadata and the resource, respectively. The access control information for the resource is only relevant when it can be enforced by an implementation, i.e. if the resource is located in the same system (entry type is local). Similarly, access control for metadata is only relevant when it is in the same system (when entry type is local, link or link reference but not reference).

Access control information is expressed as a set of read and write permissions for users and groups on the entry, the metadata and the resource. Any explicit permission given on entry level automatically applies to the resource and metadata and does not need to be repeated. An exception is that by default anyone has read access to the entry information, but not to the resource or metadata. Anyone who has been given write access to an entry is considered to be an owner of that entry.

Contexts are also represented as entries. Access control for a context, expressed on its entry, has a special meaning with regard to all entries located in that context:

– Permissions given for the metadata of a context have no effect on the entries in the context.

– Permissions given for the resource of a context apply to all entries in the context that lack their own access control, i.e. if an entry holds ACL information then those permissions override any permissions inherited from the context's resource.

– Ownership of a context (write permissions on the entry level of a context) implies ownership of all entries in the context regardless of any access control specified on them.
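The interplay of entry-level permissions, context inheritance and ownership described above can be sketched as a single access check. This is a minimal illustration and not the EntryStore implementation: the `Acl` class and `has_access` function are hypothetical, and two points are assumptions made for the sketch, namely that owners are granted every right and that permissions inherited from the context's resource apply to all parts of an entry.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Acl:
    # part ("entry", "metadata", "resource") -> right ("read", "write") -> principals
    grants: Dict[str, Dict[str, Set[str]]] = field(default_factory=dict)

    def allows(self, part: str, right: str, principals: Set[str]) -> bool:
        return bool(self.grants.get(part, {}).get(right, set()) & principals)

    def is_empty(self) -> bool:
        return not any(p for rights in self.grants.values() for p in rights.values())


def has_access(principals: Set[str], part: str, right: str,
               entry_acl: Acl, context_acl: Acl) -> bool:
    # Owners of the context own all entries, regardless of the entries' own ACL.
    if context_acl.allows("entry", "write", principals):
        return True
    # By default anyone may read the entry information.
    if part == "entry" and right == "read":
        return True
    # Entries without their own ACL inherit from the context's resource permissions.
    # Permissions on the context's metadata are never consulted.
    if entry_acl.is_empty():
        return context_acl.allows("resource", right, principals)
    # Write access on the entry means ownership (assumed here to grant everything).
    if entry_acl.allows("entry", "write", principals):
        return True
    # Entry-level permissions also apply to metadata and resource.
    return (entry_acl.allows("entry", right, principals)
            or entry_acl.allows(part, right, principals))
```

The `principals` set is meant to contain the user together with all groups the user belongs to, so that group permissions are evaluated in the same pass.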

Users and groups that can be given permissions are represented as entries with the special graph type User and Group respectively. There are two default users and two default groups. First, “_guest” represents any user who has not authenticated, while “_users” is the group of all users that can authenticate themselves in the system. Second, “_admin” is a predefined superuser and “_admins” is a group to which users that should have superuser privileges can be added.

There are two special rules with regard to lists, that is, entries with graph type list:

– Entries which are created as children of a list with custom ACL automatically inherit permissions from that list.

– An entry that belongs to a single list cannot be removed from that list (making it “unlisted”) without also removing the entry itself unless the user has write permissions in the context.

Web Access Control (WAC) as summarized in 2.6 has not been implemented because the authors decided to start with a more light-weight and non-distributed access control model. WAC in its current form is limited to information resources, whereas ReM3 differentiates between resource and metadata. However, the WAC ontology [39] and WebID [34] are being considered for integration into ReM3 in the future.

4. Exposing ReM3 using Web technologies

This section describes how ReM3 can be accessed using standard Web technologies. The reference implementation and its reliance on Web architecture are summarized and implications for interoperability are explained.


4.1. Reference implementation

The research questions stated in the beginning were the main driver behind developing a dedicated framework as described in [10]. It should make it possible to manage data and its metadata in an interoperable and conceptually clean way, remaining compatible with traditional data sources while making it possible to use Semantic Web technologies and to link data at the same time. The information structure of ReM3 is suitable to be exposed as a REST-ful HTTP API, see also 2.4. The described work resulted in a reference implementation called EntryStore [16] on which the following sections focus. The framework is built on top of the quadruple store OpenRDF Sesame¹, making it possible to identify sets of triples using named graphs as mentioned above.

4.2. REST-based interface

There are three basic kinds of REST resources in a context: resource, metadata, and entry. There are two additional kinds of resources, the relations resource that contains relations from other entries, as well as the cached-external-metadata resource that contains a cache of the external metadata if the entry type is reference or link reference.

The pattern below shows the URIs and allowed HTTP operations for the multiple kinds of REST resources:

{http-verb} {base-uri}/{context-id}/{kind}/{entry-id}

– http-verb is one of GET, PUT, POST or DELETE.
– base-uri is the base URI (namespace) that is specific for each system.
– context-id is a unique identifier for a context.
– kind is one of the kinds of REST resources.
– entry-id is an identifier for an entry that must be unique within each context.
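To make the pattern concrete, the following sketch assembles a request line from its components. The function name and the example base URI are hypothetical, and it is an assumption that the path segment for each kind is spelled exactly like the kind names used in the text; the authoritative description of the API is found in [10].

```python
from urllib.parse import quote

# Kinds of REST resources named in the text; the exact path spelling is an assumption.
KINDS = {"resource", "metadata", "entry", "relations", "cached-external-metadata"}
VERBS = {"GET", "PUT", "POST", "DELETE"}


def rest_request_line(verb: str, base_uri: str, context_id: str,
                      kind: str, entry_id: str) -> str:
    """Build '{http-verb} {base-uri}/{context-id}/{kind}/{entry-id}'."""
    if verb not in VERBS:
        raise ValueError(f"unsupported HTTP verb: {verb}")
    if kind not in KINDS:
        raise ValueError(f"unknown kind of REST resource: {kind}")
    # Identifiers are percent-encoded so that they stay single path segments.
    path = "/".join([quote(context_id, safe=""), kind, quote(entry_id, safe="")])
    return f"{verb} {base_uri.rstrip('/')}/{path}"
```

For example, `rest_request_line("GET", "http://example.org/store/", "1", "metadata", "42")` yields `GET http://example.org/store/1/metadata/42`.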

Providing an easy-to-use and REST-oriented interface together with ReM3 allows for enrichment of metadata as the protocol makes communication in both directions possible. Resources in other systems can be described by linking to them and building a connection between the metadata and the resource. Such connections are in turn exposed using Linked Data which integrates heterogeneous information sources. The HTTP API of EntryStore is only summarized here; a more detailed description can be found in an earlier paper [10].

¹ OpenRDF Sesame, http://www.openrdf.org

4.3. The use of Linked Data

As indicated in 2.5, ReM3 can be used to build a bridge between traditional repositories and the Linked Data cloud. The main point for linking information in ReM3 is the entry, but lists are also used to build indirect relations. RDF triples in the entry information graph are used to link entries, resources and their metadata together. All involved entities are identified by dereferenceable URIs whenever possible and HTTP is the standard protocol.

An EntryStore repository can also be queried through a SPARQL endpoint. The ACL model of ReM3 limits which metadata can be exposed. The SPARQL protocol does not support any access control, so this had to be solved on the level of the repository by exposing only public metadata. Other metadata, no matter whether completely private to the creator or restricted to groups, is not exposed at all through SPARQL. There are endpoints on two different levels:

1. A global endpoint for the whole EntryStore repository, including all contexts and their entries.

2. An endpoint per context, including all entries of a context. This makes it possible to restrict queries to a limited number of entries and speeds up queries.

Information about named graphs is also exposed using the GRAPH keyword, which makes it possible to create views of contextualized resource metadata in SPARQL query results.
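As an illustration of such a view, the sketch below builds a SPARQL query that returns every statement about a resource together with the named graph it comes from, and encodes it as a GET request in the way the SPARQL protocol defines (a `query` parameter). The endpoint URLs in the usage note are hypothetical; the actual endpoint paths of an EntryStore installation are not specified here.

```python
from urllib.parse import urlencode

# ?g is bound to the named graph each statement was found in.
GRAPH_QUERY = """\
SELECT ?g ?p ?o
WHERE {
  GRAPH ?g { <%(resource)s> ?p ?o }
}
"""


def sparql_get_url(endpoint: str, resource_uri: str) -> str:
    """Return a SPARQL protocol GET URL for the per-graph view of one resource."""
    query = GRAPH_QUERY % {"resource": resource_uri}
    return endpoint + "?" + urlencode({"query": query})
```

The same function could be pointed at a global endpoint (for instance a hypothetical `http://example.org/store/sparql`) or at a per-context endpoint (for instance a hypothetical `http://example.org/store/1/sparql`); only public metadata would appear in the results in either case.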

4.4. Additional interfaces

EntryStore also has support for additional protocols, mainly aimed at harvesting and querying, such as OAI-PMH [22] and SQI [33]. EntryStore supports both directions, that is, querying and harvesting other systems as well as being queried and harvested itself. The architecture of EntryStore makes it possible to hook in additional protocols if required. The same applies to metadata converters as the infrastructure includes support for mapping metadata to and from RDF.

Legacy standards and protocols are supported to make integration into already existing repository landscapes possible.

4.5. Interoperability and implementation experiences

The metadata editor in use allows RDF graphs to be edited directly and sent to the backend. Dublin Core-based application profiles (AP) are a natural choice because they map easily into RDF. As an example, to be able to do the same with Learning Object Metadata (LOM v1.0)-based profiles, a mapping from LOM to the Dublin Core Abstract Model (DCAM) was necessary. The DCMI developed such a mapping and published a draft in their wiki². On top of that, additional mappings were created to support the LRE v3.0 AP used by the Organic.Edunet project [24,9] which is based on LOM and replaces or enhances some vocabularies. Dublin Core terms are (re)used wherever possible; only metadata properties specific to LOM were given their own identifiers.

EntryStore supports HTTP content negotiation and performs conversions between metadata formats as needed. It is, for example, possible to send LOM/XML to the server and request RDF for the same metadata graph. The formats differ, but the information is the same due to a careful mapping that balances accuracy against discarding of information that cannot be translated in a good enough manner.
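The round trip described above can be sketched with two HTTP requests that differ only in their headers. The requests are constructed but not sent; the function name, the example URI and the media type chosen for LOM/XML are assumptions, and the media types actually accepted by EntryStore may differ.

```python
from urllib.request import Request


def conversion_requests(metadata_uri: str, lom_xml: bytes):
    """Build a PUT that uploads LOM/XML and a GET that asks for RDF/XML."""
    # Upload: Content-Type tells the server which format is being sent (assumed media type).
    put = Request(metadata_uri, data=lom_xml, method="PUT",
                  headers={"Content-Type": "application/xml"})
    # Download: Accept tells the server which representation the client prefers.
    get = Request(metadata_uri, method="GET",
                  headers={"Accept": "application/rdf+xml"})
    return put, get
```

Passing either object to `urllib.request.urlopen` would perform the request; the server is then responsible for converting between the two formats of the same metadata graph.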

4.6. Free-text queries

There is one exception to the overall good performance: free-text queries on literals. SPARQL queries using FILTER and regular expressions are very expensive. To solve this problem an Apache Solr index³ is used for searches in metadata literals. EntryStore implements listener interfaces and notifies Solr of events in the repository, which (re-)indexes entries and their metadata as soon as a change is made. This is important to keep the repository and the search index in sync. The combination of SPARQL and Solr queries allows for powerful and efficient searches even in large repositories.

The Solr API is not exposed directly as it would not be possible to respect the ACL. Instead, EntryStore queries Solr internally and sends the result to the client after handling the access rules. Every ReM3 entry has one corresponding Solr document which includes all necessary fields for free-text searches in both metadata and resource. There is experimental support for full-text search in resources if they contain text and are provided in a common format. Apache Tika⁴ is used for this.
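The principle of querying the index internally and filtering the hits before they reach the client can be sketched as follows. This is a deliberately simplified stand-in: a dictionary with substring matching replaces the Solr index, and `acl_filtered_search` and its parameters are hypothetical names, not part of EntryStore or Solr.

```python
from typing import Callable, Dict, List, Set


def acl_filtered_search(query: str,
                        index: Dict[str, str],
                        can_read: Callable[[str, Set[str]], bool],
                        principals: Set[str]) -> List[str]:
    """Search an in-memory stand-in for the index, then apply the access rules.

    index maps an entry URI to its indexed text (one document per entry).
    """
    # Step 1: the internal query, never exposed to the client directly.
    hits = [uri for uri, text in index.items() if query.lower() in text.lower()]
    # Step 2: only entries the requesting user may read are returned.
    return [uri for uri in hits if can_read(uri, principals)]
```

Filtering after the query keeps the index itself free of access control logic, at the cost of possibly discarding hits that were already retrieved.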

² DCMI Education Community Wiki, http://dublincore.org/educationwiki/
³ Apache Solr, http://lucene.apache.org/solr/
⁴ Apache Tika, http://tika.apache.org

5. Building Web applications using ReM3

Since the recommended way to utilize ReM3 is via its REST-based interface we chose to focus on developing Web applications based on JavaScript, i.e. applications that maintain state on the client side and use a REST-ful approach to retrieve and update data.

There are no hard restrictions on which applications can be built on top of ReM3. In fact, the information model is very generic and it should be possible to use it in a wide variety of applications. Still, certain applications are easier to build than others due to the nature of the information model. This section focuses on an application that more or less directly exposes the capabilities of ReM3, namely the EntryScape Web application (included in the EntryStore project [16]) which previously was known as Confolio. EntryScape is by no means the only or necessarily the best way to expose the capabilities of ReM3. However, it sufficiently exposes some of the complexity of building user interfaces that make use of the full flexibility of ReM3.

EntryScape provides portfolios (which can be seen as personal spaces) for individuals and groups. Each portfolio provides a place to store resources, in the form of uploaded files, web content, physical entities or abstract concepts, together with descriptive metadata. A portfolio is represented as a ReM3 context in EntryStore, and a resource together with its metadata corresponds to a ReM3 entry. Figure 4 shows a work view of a portfolio with a listing to the left and details of a selected entry to the right.

The metadata expressions may differ greatly between entries because:

– entries may represent different things, for example web pages or physical objects.

– entries may be described for different purposes and different target groups.

– entries may originate from different information sources which use different standards.

The use of RDF as common carrier allows these metadata expressions to co-exist, both between entries and sometimes within a single metadata expression. This flexibility presents a challenge when presenting and editing metadata since very little can be taken for granted. The solution taken in EntryScape is to rely on the library RForms⁵ that generates user interfaces for both presentation and editing of metadata from a configuration mechanism called annotation profiles (AP). The details on how RForms and APs are used to transform an RDF graph into a form are beyond the scope of this article and the interested reader is encouraged to look at [15,29,14] for details where also relations to other initiatives such as DCAP DSP [27] are discussed.

⁵ RForms is a JavaScript re-implementation of the SHAME Java library

Figure 4. A screenshot of EntryScape which was extended to work with the Europeana search API and metadata

To generate an editor, RForms must be told which annotation profile or which combination of APs to use. In theory, the user could be asked which AP to use in each situation given that enough descriptive information is provided to make an informed decision. However, from a usability perspective it is often better to present users with a reasonable default and allow it to be changed into something more specific when needed. Each EntryScape installation may configure a default annotation profile for every entry type it wants to support.

In figure 5 we see basic Dublin Core metadata combined with a copyright statement from IEEE LOM.

In presentation mode the same Annotation Profile will be used, but only fields that have been filled in will be shown. If RForms detects that there is more metadata available than can be shown with the current Annotation Profile, it will look for other Annotation Profiles as a fallback. Such a situation can occur when entries originate from another system or when the user has switched back and forth between Application Profiles or intentionally combined them.
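The fallback behaviour in presentation mode can be pictured as a coverage problem: start with the default profile and add further profiles only while some properties in the metadata remain unshown. The sketch below is an illustration of that idea only; it does not reproduce how RForms selects profiles, and the function name, profile names and property names are hypothetical.

```python
from typing import Dict, List, Set


def profiles_for_presentation(used_properties: Set[str],
                              default_profile: str,
                              profiles: Dict[str, Set[str]]) -> List[str]:
    """Pick the default profile plus fallbacks that cover the remaining properties.

    profiles maps a profile name to the set of properties it can display.
    """
    chosen = [default_profile]
    uncovered = set(used_properties) - profiles[default_profile]
    for name in sorted(profiles):          # deterministic order for the sketch
        if not uncovered:
            break
        if name == default_profile:
            continue
        if profiles[name] & uncovered:     # this profile shows something new
            chosen.append(name)
            uncovered -= profiles[name]
    return chosen
```

With a Dublin Core profile as default and a second profile covering a LOM copyright property, metadata that uses both would be presented with both profiles, while plain Dublin Core metadata would only use the default.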

6. Showcases

The following showcases are all centered around learning resource descriptions. They involve annotation of resources which are uploaded into or linked from the EntryScape web application as well as enhancement and contextualisation of metadata which is harvested from other repositories.

6.1. Organic.Edunet

The goal of the now successfully completed Organic.Edunet project was to facilitate access, usage and exploitation of digital educational content related to Organic Agriculture and Agroecology. The combination of EntryStore and EntryScape (in Organic.Edunet still called “Confolio”) – in the context of this project referred to as “Organic.Edunet repository tools” – was used from the very beginning of the content population process. The Organic.Edunet federation consists of numerous repository tool installations which are harvested using OAI-PMH by the Organic.Edunet portal [24] on a regular basis. More than 11 000 educational resources have been described with educational metadata by several hundred contributors so far. Roughly half of the learning resources were already described with some basic metadata without educational information. These already existing metadata instances were harvested using OAI-PMH and converted and mapped into RDF and LOM/DCAM.

Figure 5. An editable metadata form which blends Dublin Core and LOM metadata

Additional educational metadata was added in the Organic.Edunet repositories. This approach is greatly supported by the ReM3 model, which allows a differentiation between local and external resources and metadata. Such a differentiation in combination with the use of separate metadata graphs is used to enhance harvested resource descriptions from e.g. the Intute repository⁶. In this case, two metadata graphs are used per resource: one with cached external metadata (in simple DC format harvested using OAI-PMH) and one with local educational metadata using LOM/DCAM. If Intute modifies the metadata in its repository, the change will be reflected in EntryStore after the next re-harvest. The locally annotated educational metadata remains untouched, which is only possible by keeping metadata from different origins in separate graphs.

⁶ http://www.intute.ac.uk
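The benefit of keeping the two graphs apart can be shown with a small model in which a re-harvest replaces the cached external graph as a whole and never touches the local graph. This is a sketch under stated assumptions: triples are plain tuples, and the class, method and property names are hypothetical rather than taken from EntryStore.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple

Triple = Tuple[str, str, str]


@dataclass
class LinkReferenceEntry:
    resource_uri: str
    cached_external_metadata: Set[Triple] = field(default_factory=set)
    local_metadata: Set[Triple] = field(default_factory=set)

    def reharvest(self, harvested: Set[Triple]) -> None:
        """Replace the cached external graph; the local graph is left as it is."""
        self.cached_external_metadata = set(harvested)

    def merged_view(self) -> Set[Triple]:
        """Union of both graphs, e.g. for presentation."""
        return self.cached_external_metadata | self.local_metadata
```

If both kinds of metadata were stored in one graph, a re-harvest would either overwrite the local annotations or require a merge strategy; with separate graphs neither is necessary.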

6.2. ARIADNE

Following up on the results from Organic.Edunet and as proof-of-concept for the general applicability of ReM3 and the reference implementation, the OAI-PMH target of the ARIADNE foundation⁷ (see [35] for a description of ARIADNE’s architecture) was harvested and triplified, resulting in around 50 million triples within 1.2 million metadata graphs in one EntryStore repository. The provided LOM metadata was mapped into the DCAM and converted into RDF during the harvesting process. As in the case of Organic.Edunet, a scaffolding approach to describing learning resources can be taken. The surrounding context of a learning resource can be bootstrapped using Link References, e.g. by providing different descriptions for different learning scenarios.

Another benefit of having all ARIADNE metadata in RDF is the possibility of running SPARQL queries against a large number of learning resource descriptions. SPARQL can be used to formulate complex queries based on the LOM/DCAM elements to query and build graphs in the repository. An example is requesting a list of all LOM Learning Resource Types that a specific person has used when annotating learning materials. More complex queries can be formulated by using additional metadata elements and advanced query logic. A use case is the contextualization of learning resources, to get information on how different persons described the same resource with different metadata to reflect their specific use within various educational (or other) activities. The number of triples will increase in the future as the implementation of the LOM/DCAM mapping is refined and completed.

⁷ ARIADNE foundation, http://www.ariadne-eu.org
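The example of listing the Learning Resource Types used by one person could be phrased along the following lines. The query is a sketch only: the property URI for the learning resource type is a placeholder in the example.org namespace, and it is an assumption that the annotator is linked to a metadata graph via `dcterms:creator` in the entry information; the identifiers defined by the actual LOM/DCAM mapping should be substituted.

```python
def learning_resource_types_query(
        annotator_uri: str,
        type_property: str = "http://example.org/lom#learningResourceType") -> str:
    """Build a SPARQL query for the distinct resource types used by one annotator."""
    return f"""\
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT DISTINCT ?type
WHERE {{
  GRAPH ?entryInfo {{ ?metadataGraph dcterms:creator <{annotator_uri}> }}
  GRAPH ?metadataGraph {{ ?resource <{type_property}> ?type }}
}}
"""
```

The first graph pattern selects the metadata graphs attributed to the person, the second reads the type values from exactly those graphs, which is what the use of named graphs makes possible.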


6.3. Europeana

The “Hack4Europe!” competition in Stockholm⁸, organized by the Europeana project⁹, had the goal to show the potential of the Europeana content by building applications to showcase the social and business value of open cultural heritage data.

Within the scope of the hack day we developed another showcase to demonstrate how heterogeneous metadata can be managed using ReM3. As in Organic.Edunet, a combination of EntryStore and EntryScape was used. Both applications were extended so that they can search in Europeana and extract Europeana metadata from the search results. This allows for adding resources directly from a Europeana search result to a user’s personal portfolio for further annotation with contextual metadata. The demonstrated use case, Europeana portfolio¹⁰, was to search for resources which are suitable to be used in an educational context and to turn them into learning resources by annotating them with educational metadata in EntryScape. Technically this means searching and caching metadata described using the Europeana Data Model [13] and adding educational metadata (e.g. in LOM/DCAM) using a ReM3 Link Reference in EntryStore. Everything is integrated into the EntryScape interface and the end user does not have to know anything about where the metadata originates from or which formats are used.

7. Evaluation

So far, ReM3 and its reference implementation EntryStore have been evaluated with respect to scalability, their suitability for resource annotation processes, and the adoption in production environments. These aspects are described below and provide a picture of the applicability of the platform in question.

7.1. Scalability analysis

A series of load tests has been carried out to get an impression of the scalability of the ReM3 reference implementation EntryStore. In the sections below the test environment and the results from different scenarios will be discussed. The overall goal was to find out how many concurrent requests could be run while still staying below 100 ms response time to ensure the user has a feeling of instantaneous response [25]. Taking round-trip times, request creation and parsing, and also user interface updates into consideration, the tests below aimed for response times at a maximum of 50 ms. The focus was on requests of entries for reading and metadata graphs for modification. There were no tests carried out with binary resources as their size depends too much on the specific use case and their handling does not involve any complex operations in the triple store.

⁸ http://www.hack4europe.se
⁹ http://europeana.eu
¹⁰ http://hack4europe.se/information/meta-solutions-europeana-portfolio/

7.1.1. Test environment

An EntryStore instance was deployed in a KVM-based virtual machine (VM) on a Linux host. The VM had access to four Intel Xeon X3440 CPU cores running at 2.53 GHz and 4 GB of memory. The client for creating the traffic to the server came with an Intel Core i5 with two CPU cores at 2.4 GHz. To create, log and graph the requests the application JMeter was used in version 1.6. The network connection between client and server was 100 Mbit/s (duplex) with an average round-trip time of 5 ms. EntryStore was configured to use Sesame’s native store (using one “cspo” index) and all communication with the EntryStore instance was done via its HTTP API. The graphs as seen in this evaluation were created with Loadosophia.org. The tested scenario was a medium-sized repository as it was used in the Organic.Edunet project. The EntryStore instance was seeded with a copy of the Organic.Edunet data from the KTH installation. It consisted of 1 024 898 triples in 57 353 named graphs, holding around 11 000 ReM3 entries.

7.1.2. Test results

To get a first feeling for the relation between response times and the number of concurrent request threads, a first test run was made. For this a ramp-up scheme was used, where the number of active client threads was increased up to a maximum of 300 concurrent threads during 360 seconds. In JMeter a virtual user (VU in the figures) is equivalent to one client thread. Every thread continuously requested a random entry from EntryStore, to retrieve a JSON object which assembles information from up to four named graphs (entry, metadata, cached external metadata, and resource). The results of this first test run, as can be seen in figure 6 and figure 7, show that the response times increase with the number of concurrent client threads, while the transactions per second stay at the same level. Interpreting these two graphs led to the conclusion that around 20 concurrent threads is most likely the number which provides low response times (below 50 ms) and high request throughput in this specific setting.

Figure 6. Response time and read transactions per second with 300 active client threads

Figure 7. Read transactions per second with 300 active client threads

The following tests were run for 360 seconds with 20 concurrent threads, which resulted in an average response time of 38 ms while sustaining an average throughput of 478 requests per second (see figure 8). The spikes in the graph are probably caused by garbage collection in the JVM, but this was not further investigated.

Figure 8. Response time and read transactions per second with 20 active client threads

Another test was carried out with modifying requests where metadata graphs of random entries were updated with graphs consisting of 40 triples (this was the average number of triples per metadata graph in the Organic.Edunet repository). Modifying transactions are not treated concurrently by EntryStore and according to an analysis of the Apache log files of the Organic.Edunet installation only slightly more than 1% of all requests were modifying. As a consequence the number of concurrent threads was decreased to 5 in this test run. The average response time in this case was 34 ms while maintaining an average of 144 transactions per second, see figure 9.

Figure 9. Response time and write transactions per second with 5 active client threads
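The reported averages can be cross-checked with Little's law, which relates the number of requests in flight to throughput multiplied by response time. The calculation below is a sanity check added for illustration and is not part of the original evaluation; it assumes a closed-loop test in which every client thread issues its next request as soon as the previous response has arrived.

```python
def implied_concurrency(throughput_per_s: float, response_time_ms: float) -> float:
    """Little's law: requests in flight = throughput x response time."""
    return throughput_per_s * response_time_ms / 1000.0


# Read test: 478 requests/s at 38 ms average response time.
read_threads = implied_concurrency(478, 38)    # about 18.2
# Write test: 144 transactions/s at 34 ms average response time.
write_threads = implied_concurrency(144, 34)   # about 4.9
```

Both values lie close to the number of client threads stated in the text (20 for reading, 5 for modifying); the small gap is plausibly time spent on the client side between requests.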

The significantly lower number of transactions per second can be explained by several factors which are specific for modifications in the repository:

– To ensure consistency, there is no support for concurrent write transactions in Sesame’s Native Store, so modifying requests block each other (this does not affect reading requests).

– Each modification triggers not only updates in the triple store, but also in the Solr index as all literals are indexed for free-text search.

– The request body in RDF/JSON has to be parsed by the server, which is not the case for reading requests.

7.1.3. Discussion and possible improvements

The described test is only the beginning of a more structured series of tests in which various different scenarios will be designed. This is out of scope for this article and will be published separately. However, the numbers above give a preliminary picture of the scalability of EntryStore in the context of the Organic.Edunet scenario. The next step is to also test with different backends; so far only Sesame's Native Store was used. There are other backends which implement the Sesame Sail API¹¹ and claim to scale well (and also support concurrent write operations in contrast to Native Store), such as Bigdata¹², OWLIM¹³ and Virtuoso¹⁴. Big data sets with realistic data on both a single server and in a cluster are to be tested, and provisional tests with an EntryStore instance holding slightly above 2 million entries on a single machine have shown good results. However, there is no thorough analysis of EntryStore instances of this size yet, so this was not included in this article.

7.2. Suitability for resource annotation processes

A survey and structured interview were carried out in order to analyze the current practices and technologies for resource annotation processes in heterogeneous metadata repositories. The results are primarily valid for learning repositories because of the background of the participating experts, but can also be applied to metadata management in generic repositories because of the common characteristics of these systems. User experience (UX) was not investigated even though some of the discussed issues may have implications for the UX as well.

7.2.1. Survey results

The survey consisted of 5 multiple choice questions and 7 structured interview questions (see the appendix for a list of questions). The results are based on the answers from 16 experts. All of the respondents have professional experience with resource annotation, the majority of them (69%) within research. Most of them (63%) develop systems for resource annotation, and some also annotate themselves (38%). Some are or were involved in the development of metadata specifications or other similar activities (25%). All of the participants were affiliated with a university at the time the survey was carried out. In total 12 organizations from 7 European countries were represented.

¹¹ http://www.openrdf.org/doc/sesame2/system/ch05.html
¹² http://www.systap.com/bigdata.htm
¹³ http://www.ontotext.com/owlim
¹⁴ http://virtuoso.openlinksw.com

Most of the annotation activities in which the participants were involved were about annotation from scratch (69%) and enhancement of already existing metadata either in the same (50%) or a different context (44%). In the cases where existing metadata was enhanced (64%) the most common solution was to copy the original metadata into the local system and modify it there, i.e. causing a fork. In only 21% of the cases the original metadata was preserved and linked to instead of forked. All respondents were involved in the annotation of information resources. Some have experience with annotation of parties such as persons and organizations (31%), but also with physical objects such as non-human objects or substances (13%). The primary classification techniques were ontologies (88%) and tags (56%). In almost half of the cases also formal relationships (i.e. predicates) between resources and tags or concepts (44%) were applied.

7.2.2. Structured interview results

In the structured interview the respondents were asked to discuss several issues in metadata management in the context of resource annotation. The most significant issues, and the most relevant ones for this evaluation, were: important features for managing information; possible problems when copying (as is the case in harvesting) metadata between systems; and problems that could occur when enhanced metadata are linked to their original metadata (instead of copied).

Most important features. The respondents were asked to give an account of the features they would prioritize if they were to build a tool which is capable of handling resources and their metadata. The primary feature requirement was interoperability of protocols and formats; almost all respondents mentioned it in one way or another. Abstract models such as DCAM or MLR were recommended over XML-based models. The handling and transformation of metadata in various schemas and formats was a requirement. The reuse of existing metadata was mentioned together with the possibility of contextualizing resources. A majority of the respondents also considered provenance and the tracking of contributions as important. Another requested feature was the capability to maintain a clear separation between metadata from different sources. This included a demand for links between all metadata and resources. Interlinking resources (i.e. creating relations between resources) and automatic or semi-automatic term extraction were other features that were on the wish list.


Issues when copying. When asked which issues come to mind when metadata is copied (i.e. harvested) between repositories, most respondents were concerned about redundancy and conflicting changes in the metadata. Redundancy arises because the same or slightly modified versions of the same descriptions coexist in several different systems. Conflicts arise because changes in one repository are not reflected in other places or, in the worst case, are overridden. Issues of data consistency and multiple copies of the same data are caused by cascaded harvesting, i.e. aggregators with intersecting harvesting targets. This demands sophisticated identification management and provenance of metadata records. Inconsistencies between the semantics of the same metadata elements in different systems have also been experienced. The attribution of metadata authors in connection with IPR and metadata quality (e.g. certification or validation) is another issue which was considered non-trivial, as this requires meta-metadata to be transferred between systems in addition to the usual metadata record.
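
The redundancy caused by cascaded harvesting can be illustrated with a minimal Python sketch. It assumes that every harvested record carries the identifier assigned by its originating repository, an ISO 8601 datestamp, and the chain of repositories it passed through. The class HarvestedRecord, its fields, and the function deduplicate are hypothetical names introduced only for this illustration; they are not part of ReM3 or EntryStore.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class HarvestedRecord:
    origin_id: str        # identifier assigned by the originating repository (assumed stable)
    datestamp: str        # ISO 8601 date of last modification, e.g. "2012-03-01"
    via: Tuple[str, ...]  # provenance: repositories the record passed through
    title: str


def deduplicate(records: Iterable[HarvestedRecord]) -> List[HarvestedRecord]:
    """Keep one record per origin identifier, preferring the newest datestamp."""
    latest: Dict[str, HarvestedRecord] = {}
    for record in records:
        current = latest.get(record.origin_id)
        # ISO 8601 dates in the same format compare correctly as plain strings.
        if current is None or record.datestamp > current.datestamp:
            latest[record.origin_id] = record
    return sorted(latest.values(), key=lambda r: r.origin_id)


# Two aggregators with intersecting targets deliver the same original record.
records = [
    HarvestedRecord("oai:repo-a:1", "2012-01-10", ("repo-a", "aggregator-x"), "Soil basics"),
    HarvestedRecord("oai:repo-a:1", "2012-03-01", ("repo-a", "aggregator-y"), "Soil basics (revised)"),
    HarvestedRecord("oai:repo-b:7", "2011-11-30", ("repo-b", "aggregator-x"), "Composting"),
]
unique = deduplicate(records)
```

The sketch only resolves duplicates; conflicting edits made independently in two repositories would still require the provenance information discussed by the respondents.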

Issues when linking. When the respondents described possible issues that arise when metadata is linked to instead of copied, the answers were very different from the previous case (copying). All of them focused on performance issues and broken links or unavailable information. They stressed that performance is usually poor when external systems are involved and that metadata may disappear from the originating system and therefore become unavailable. Nobody mentioned caching or similar techniques as a way to avoid performance deterioration. Instead of conflicting changes as in the previous case, it seemed to be a problem that original metadata can only be extended but not changed. Another potential problem mentioned was the maintenance of the semantics of the established links, both between metadata and resources, in all combinations.

It should be mentioned that very specific requirements, issues or ideas, possibly touched on by only one person, were not included in the summaries above.

7.2.3. ReM3 in the light of the interview results

This section takes a look at the capabilities of ReM3 in the context of the results of the structured interview as summarized above.

ReM3 and its reference implementation EntryStore have good prerequisites for solving interoperability issues. The model itself is based on RDF, which can be considered a model that works well to support metadata harmonization (see [28], chapter 6 “Horizontal harmonization”). ReM3’s reference implementation EntryStore rests upon Web architecture and its standard protocols. Linking and a clear separation between different metadata graphs describing the same resource are built into the model; this is the core of a ReM3 entry. The meta-metadata is part of an entry and keeps track of provenance, which satisfies requirements such as metadata attribution, licensing, and quality certification. The performance issues that almost all respondents worried about are circumvented with a special metadata graph that only holds cached external metadata, as described in 3.1. This makes the system robust in cases where the external system is slow or offline, or where the resource or its metadata have been removed completely from the original system. The semantics of relationships were mentioned as problematic because of potentially different interpretations. The relationships in ReM3 are defined in RDFS [12], and if other relationships (i.e. predicates) are used they originate from a well-defined vocabulary or ontology.
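
The role of the cached external metadata graph can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the EntryStore implementation: triples are plain tuples, and the class Entry, its attributes, and the functions refresh_cache and merged_metadata are hypothetical names chosen for this example. The fetch callable stands in for whatever mechanism retrieves metadata from the external system.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object), simplified


@dataclass
class Entry:
    resource_uri: str
    external_metadata_uri: Optional[str] = None
    local_metadata: Set[Triple] = field(default_factory=set)
    cached_external_metadata: Set[Triple] = field(default_factory=set)


def refresh_cache(entry: Entry, fetch: Callable[[str], Iterable[Triple]]) -> bool:
    """Try to update the cached graph; keep the old cache if the source fails."""
    if entry.external_metadata_uri is None:
        return False
    try:
        entry.cached_external_metadata = set(fetch(entry.external_metadata_uri))
        return True
    except OSError:  # covers ConnectionError and TimeoutError
        return False


def merged_metadata(entry: Entry) -> Set[Triple]:
    """Combine local and cached external metadata without contacting the source."""
    return entry.local_metadata | entry.cached_external_metadata


def fetch_ok(uri: str) -> Iterable[Triple]:
    return [("http://example.org/res/1", "dcterms:title", "Organic farming")]


def fetch_down(uri: str) -> Iterable[Triple]:
    raise ConnectionError("external system is offline")


entry = Entry("http://example.org/res/1", "http://example.org/meta/1")
entry.local_metadata.add(("http://example.org/res/1", "dcterms:subject", "agriculture"))
first = refresh_cache(entry, fetch_ok)     # external system reachable
second = refresh_cache(entry, fetch_down)  # external system offline, cache is kept
```

Because reads go through merged_metadata, a slow or unavailable external system affects only how fresh the cached graph is, not whether the entry can be served.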

In the light of the interview results, the combination of ReM3 and EntryStore can be considered very close to the capabilities of repositories as imagined by the experts. To clarify, they were asked to think about how they would design a system that conforms to their requirements, but they were not asked to comment on ReM3 or EntryStore.

7.3. Adoption

The showcases in section 6 cover three quite different scenarios, one of which is the use of EntryStore and EntryScape as repository tools in the Organic.Edunet project (see 6.1). This is a production setting, with a federation consisting of four autonomous repositories. The other two briefly described showcases have a more experimental character. There is the proof of concept of using EntryStore for a large-scale triplification of ARIADNE (see 6.2), which basically could be used in production but has not undergone thorough testing of specifics relevant for large datasets. There is also the Hack4Europe! contribution described in 6.3, where EntryStore and EntryScape integrate with the Europeana APIs and provide an interface for the contextualization of the cultural heritage data provided by the Europeana foundation.

In addition to Organic.Edunet, there are several other projects which make use of EntryStore and its EntryScape frontend in a production setting:

Virtual Open Access Agriculture & Aquaculture Repository (VOA3R). The VOA3R project¹⁵ uses EntryScape to offer providers of Open Educational Resources (OER) a means to upload, link, and describe learning material which is compliant with the FRBR-based VOA3R Application Profile. The project is ongoing and it is as yet unclear how many resources are going to be described using EntryScape.

TEL-Map. EntryScape is used in the TEL-Map project¹⁶ to provide advanced descriptions of collaborative EC-funded projects as well as projects by individual researchers (such as PhD students). These metadata describe the overall project aspects, which include the projects’ contexts, assumptions, goals, impacts, and approaches. Relationships between projects or between projects and organisations can be formulated using semantic connections, e.g. an organisation can be partner, stakeholder, collaborator, etc., whereas projects can be successors, predecessors or simply related to other projects.

Hematology Net. In Hematology Net¹⁷ a combination of EntryScape and the EHA CV passport¹⁸ was used. A user can upload a resource and link it to a learning goal in the CV passport. Users have the possibility to provide their own competence profile through a digital version of the EHA CV passport in EntryScape. This allows them to match their competence profile with their learning goals to get recommendations for educational material relevant for reaching their goals.

8. Conclusions

In this section the problems from 1.1 are revisited along with some considerations regarding the applicability of the information model and its reference implementation.

8.1. Problems in retrospect

Management of resources and their metadata. ReM3 makes it possible to distinguish between situations where both resource and metadata, only metadata, or neither metadata nor resource are handled in the system. The solution is the introduction of the entry type (see 3.1), which is also the most important type for the showcases in section 6, as it provides a clear distinction between local and external (remote and harvested) resources.

¹⁵ Funded by the EC through ICT PSP, http://voa3r.eu

¹⁶ Funded by the EC through FP7, http://telmap.org

¹⁷ Funded by the EC through Leonardo da Vinci, http://www.hematologynet.eu

¹⁸ http://www.ehaweb.org/education/online-learning-tools/curriculum-cv-passport/

Enrichment of metadata. The same resource can be described with different metadata, identified by named graphs. Basic metadata with e.g. title and description can be enhanced with educational metadata in an additional metadata graph and used in an educational context. A resource can be contextualized just by annotating it with additional metadata, building upon the already existing descriptions. Write access to the existing descriptions is not necessary: the metadata belong to the respective users, and since the LD principles are followed, anyone can point to and describe anything.
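
The following Python sketch illustrates this kind of enrichment with separate named graphs. It is a toy model under stated assumptions: graphs are sets of plain tuples keyed by a graph URI, and ownership is a single user name per graph. The class GraphStore and its methods are hypothetical names introduced for this example and do not reflect the EntryStore API or its access control model.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object), simplified


class GraphStore:
    """Toy store in which every named graph has exactly one owner."""

    def __init__(self) -> None:
        self._graphs: Dict[str, Set[Triple]] = {}
        self._owners: Dict[str, str] = {}

    def add_graph(self, graph_uri: str, owner: str, triples: Set[Triple]) -> None:
        if graph_uri in self._graphs:
            raise ValueError("graph already exists: " + graph_uri)
        self._graphs[graph_uri] = set(triples)
        self._owners[graph_uri] = owner

    def update_graph(self, graph_uri: str, user: str, triples: Set[Triple]) -> None:
        # Only the owner may change an existing description.
        if self._owners[graph_uri] != user:
            raise PermissionError(user + " may not modify " + graph_uri)
        self._graphs[graph_uri] = set(triples)

    def describe(self, resource_uri: str) -> Dict[str, Set[Triple]]:
        """Return all graphs that say something about the resource, kept apart."""
        result: Dict[str, Set[Triple]] = {}
        for graph_uri, triples in self._graphs.items():
            about = {t for t in triples if t[0] == resource_uri}
            if about:
                result[graph_uri] = about
        return result


RES = "http://example.org/resource/42"
store = GraphStore()
# Basic description, owned by the original contributor.
store.add_graph("http://example.org/graph/basic", "alice",
                {(RES, "dcterms:title", "Crop rotation")})
# Educational enrichment by another user, stored in its own graph.
store.add_graph("http://example.org/graph/edu", "bob",
                {(RES, "lom:typicalAgeRange", "16-18")})
descriptions = store.describe(RES)
```

Here the second user never needs write access to the first graph; the two descriptions coexist and remain individually attributable.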

Organization of metadata. A ReM3 entry keeps track of all involved pieces of information and links back to the original source of information. Metadata are cached locally for performance reasons. This linking approach is also part of the solution to the problem of how to expose the combination of resources and their metadata in a Linked Data way. The ReM3 entry contains statements to keep resources and their metadata together, with the possibility of including additional relations.

Integration of heterogeneous information sources. It was unclear how to design an information model that can integrate heterogeneous information sources and support harmonization of metadata expressions from different standards using a common carrier. The RDF data model is used as a common carrier by the reference implementation. Together with the ReM3 entry type, this provides a solution to the problem of how to integrate heterogeneous metadata and how to enrich metadata originating from other sources, as mentioned above. The latter also benefits from a clear distinction between separate descriptions, implemented using named graphs and kept together by ReM3 entries.

Support for Web architecture and best practices for using named graphs in Web applications. All entities in the ReM3 model (see 3.1), such as the entry itself, metadata, relations, and principals, are identified by named graphs which are given dereferenceable URIs. The resources are linked with their metadata through the entry information. Relationships between named graphs are expressed in the entry information, where provenance and access control are also handled. Named graphs can be retrieved and queried using HTTP and ReM3’s URL schema (see 4.2). The use of HTTP and RDF wherever possible fulfills the requirement of supporting Web architecture and facilitates the creation of Web applications.

8.2. Applicability and evaluation

The information model has a reference implementation which is described and evaluated in sections 4 and 7. It shows that ReM3 is more than just an idea and that it can be applied in a meaningful way in real-world applications. Based on the answers from experts, an evaluation showed that ReM3 implements a feasible approach for resource annotation processes (see 7.2). The extent of the adoption of EntryStore in several successfully completed and ongoing projects shows that ReM3 works in the described scenarios (see 7.3) and showcases (see 6).

The focus on Semantic Web technologies and the support for SPARQL and Linked Data in general make it possible to contribute to the LOD cloud. Even though the system is currently mostly deployed in the context of learning repositories with a focus on educational metadata, there are no restrictions regarding the metadata standards or application profiles in use. The supported protocols for metadata harvesting and querying can easily be extended.

9. Future work and next steps

In the context of metadata and resource management it is relevant to provide means for creating additional links between resources. Currently, interlinking is mostly based on lists (assuming that resources contained in the same list have something in common) and on entries that keep together different metadata graphs describing the same resource in different contexts. In addition, it would be useful to provide semantics for lists (e.g. programmes, courses, course modules) and explicit semantic relations between resources, and to expose both in the user interface.

The system works well with large repositories (several million entries in one installation), but there is little knowledge about where the limits are for concurrent access to very large systems in typical usage scenarios. An elaborate set of benchmarks has to be developed to collect significant evidence regarding performance and scalability. However, this is more related to the reference implementation than to the information model.

The REST-based interface has its limitations in certain collaborative use cases. If an update of a resource or metadata is performed by a client, all clients which have data cached, e.g. in the browser cache, have to make a conditional reload in order to see the changes. Due to the nature of HTTP, the clients do not get notified of any changes on the server side; they have to poll instead. This is an issue which can be solved by adding support for push technologies such as WebSockets [20] to EntryStore.
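
The polling behaviour described above can be made concrete with a small Python sketch of a conditional reload. It mimics the standard HTTP mechanism of entity tags (an If-None-Match request answered with 304 Not Modified when nothing changed) entirely in memory, so no network is involved. The classes FakeServer and PollingClient and the function etag_for are hypothetical stand-ins for this illustration and do not represent the EntryStore interface.

```python
import hashlib
from typing import Optional, Tuple


def etag_for(body: str) -> str:
    """Derive an entity tag from the current representation."""
    return '"' + hashlib.sha256(body.encode("utf-8")).hexdigest()[:16] + '"'


class FakeServer:
    """In-memory stand-in for one HTTP resource."""

    def __init__(self, body: str) -> None:
        self.body = body

    def get(self, if_none_match: Optional[str] = None) -> Tuple[int, Optional[str], str]:
        etag = etag_for(self.body)
        if if_none_match == etag:
            return 304, None, etag   # Not Modified: the client cache is still valid
        return 200, self.body, etag  # full representation


class PollingClient:
    def __init__(self, server: FakeServer) -> None:
        self.server = server
        self.cached_body: Optional[str] = None
        self.cached_etag: Optional[str] = None

    def poll(self) -> bool:
        """Perform a conditional reload; return True if the cache was updated."""
        status, body, etag = self.server.get(self.cached_etag)
        if status == 304:
            return False
        self.cached_body, self.cached_etag = body, etag
        return True


server = FakeServer("title: Crop rotation")
client = PollingClient(server)
changed_first = client.poll()    # initial load fills the cache
changed_second = client.poll()   # nothing changed on the server
server.body = "title: Crop rotation (2nd edition)"  # another client updates the resource
changed_third = client.poll()    # the change is seen only at the next poll
```

The update becomes visible to the client only when it polls again, which is the latency a push channel would remove.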

In addition to pushing information updates to the clients, it can be of interest to keep a version history of a resource and its metadata. Such snapshots, in combination with a short summary of changes, can provide input for collaborators on e.g. what has been changed by whom, when and why; to put it simply, information about how a resource has evolved over time.

Acknowledgements

The work presented in this article has been partially carried out with financial support from the EIT ICT Labs activity Data Bridges in the thematic action line Digital Cities of the Future, and the EC-funded projects Organic.Edunet (grant agreement ECP-2006-EDU-410012), TEL-Map (grant agreement 257822) and Responsive Open Learning Environments (ROLE) (grant agreement 231396), which the authors gratefully acknowledge.

Appendix

Questions of the structured interview as summarized in section 7.2

– What is your relationship with metadata annotation?

– What was the purpose of the annotation process you were involved in?

– If existing metadata were enhanced, how was this managed?

– What kinds of resources have been annotated?

– Were any classification techniques used?

– From an information management perspective, which features do you consider as most important for an annotation tool?

– If you copy (harvest) metadata between systems and modify it, which potential problems come to your mind?

– If you, instead of copying, link between metadata instances (original and enhancement), which potential problems come to your mind?

– Which system(s) did or do you use for annotation?

– Are you familiar with or aware of any tool that allows for annotating information resources by linking metadata instead of copying?

– Do you have any other comments or questions that you consider relevant and that you think should have been asked?

References

[1] T. Berners-Lee. Design issues – axioms of web architecture: Metadata. http://www.w3.org/DesignIssues/Metadata.html, 1997.

[2] T. Berners-Lee. Linked Data - Design Issues. http://www.w3.org/DesignIssues/LinkedData.html, 2009.

[3] T. Berners-Lee, J. Hendler, and O. Lassila. The Semantic Web. Scientific American, 284:34–43, 2001.

[4] J. Carroll, C. Bizer, P. Hayes, and P. Stickler. Named graphs. Web Semantics: Science, Services and Agents on the World Wide Web, 3(4):247–267, 2005.

[5] R. Cyganiak. The Linking Open Data cloud diagram. http://richard.cyganiak.de/2007/10/lod/, 2012.

[6] Dublin Core Metadata Initiative recommendation – Dublin Core metadata element set, version 1.1. http://dublincore.org/documents/dces/, 2010.

[7] Dublin Core Metadata Initiative recommendation – DCMI metadata terms. http://dublincore.org/documents/dcmi-terms/, 2010.

[8] E. Duval. Final draft standard for Learning Object Metadata (LOM) IEEE 1484.12.1-2002. http://ltsc.ieee.org/wg12/files/LOM_1484_12_1_v1_Final_Draft.pdf, 2002.

[9] H. Ebner, N. Manouselis, M. Palmér, F. Enoksson, N. Palavitsinis, K. Kastrantas, and A. Naeve. Learning object annotation for agricultural learning repositories. In Proceedings of the Ninth IEEE International Conference on Advanced Learning Technologies, 2009. ICALT 2009, pages 438–442. IEEE, 2009. http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=5194270.

[10] H. Ebner and M. Palmér. A mashup-friendly resource and metadata management framework. In F. Wild, M. Kalz, and M. Palmer, editors, Proceedings of the First International Workshop on Mashup Personal Learning Environments (MUPPLE08). Maastricht, The Netherlands, volume 388, pages 14–17, September 2008. http://ftp.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-388/ebner.pdf.

[11] H. Ebner and M. Palmér. RDFS of the Resource and Metadata Management Model (ReM3). http://entrystore.googlecode.com/svn/store/trunk/doc/terms.rdf, 2012.

[12] H. Ebner and M. Palmér. Specification of the Resource and Metadata Management Model (ReM3). http://code.google.com/p/entrystore/wiki/ReM3, 2012.

[13] Europeana Data Model documentation. http://pro.europeana.eu/edm-documentation, 2012.

[14] F. Enoksson. Flexible Authoring of Metadata for Learning: Assembling forms from a declarative data and view model. Licentiate thesis, KTH Royal Institute of Technology, School of Computer Science and Communication, 2011.

[15] F. Enoksson, M. Palmér, and A. Naeve. An RDF modification protocol, based on the needs of editing tools. In Metadata and Semantics, Post-proceedings of the 2nd International Conference on Metadata and Semantics Research, MTSR 2007, Corfu Island, Greece. Springer, October 2007.

[16] EntryStore project at Google Code. http://code.google.com/p/entrystore/, 2012.

[17] R. T. Fielding. Architectural Styles and the Design of Network-based Software Architectures. PhD thesis, University of California, Irvine, 2000. http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm.

[18] F. Gandon and O. Corby. Name That Graph or the need to provide a model and syntax extension to specify the provenance of RDF graphs. Proceedings of the W3C Workshop - RDF Next Steps, 2010.

[19] Y. Gil, S. Miles, K. Belhajjame, H. Deus, D. Garijo, G. Klyne, P. Missier, S. Soiland-Reyes, and S. Zednik. PROV model primer. W3C Working Draft, 2012. http://www.w3.org/TR/prov-primer/.

[20] I. Hickson. The WebSocket API. W3C Working Draft, 2012. http://www.w3.org/TR/websockets/.

[21] I. Jacobs and N. Walsh. Architecture of the World Wide Web. W3C Recommendation, 1:1–37, 2004. http://www.w3.org/TR/webarch/.

[22] C. Lagoze, H. Van de Sompel, M. Nelson, and S. Warner. The Open Archives Initiative Protocol for Metadata Harvesting. http://www.openarchives.org/OAI/openarchivesprotocol.html, 2002.

[23] F. Manola and E. Miller. RDF primer. W3C Recommendation, 10(February):1–107, 2004. http://www.w3.org/TR/rdf-primer/.

[24] N. Manouselis, K. Kastrantas, S. Sanchez-Alonso, J. Cáceres, H. Ebner, M. Palmer, and A. Naeve. Architecture of the Organic.Edunet web portal. International Journal of Web Portals (IJWB), 1(1):71–91, 2009.

[25] R. B. Miller. Response time in man-computer conversational transactions. In Proceedings of the December 9-11, 1968, fall joint computer conference, part I, AFIPS ’68 (Fall, part I), pages 267–277, New York, NY, USA, 1968. ACM.

[26] L. Moreau. The foundations for provenance on the web. Foundations and Trends in Web Science, 2(2-3):99–241, 2010.

[27] M. Nilsson. Description Set Profiles: A constraint language for Dublin Core application profiles. http://dublincore.org/documents/dc-dsp/, 2008.

[28] M. Nilsson. From Interoperability to Harmonization in Metadata Standardization: Designing an Evolvable Framework for Metadata Harmonization. PhD thesis, KTH Royal Institute of Technology, School of Computer Science and Communication, 2010. http://kmr.nada.kth.se/papers/SemanticWeb/FromInteropToHarm-MikaelsThesis.pdf.

[29] M. Palmér, F. Enoksson, M. Nilsson, and A. Naeve. Annotation profiles: configuring forms to edit RDF. In Proceedings of the 2007 international conference on Dublin Core and Metadata Applications, DCMI ’07, pages 10–21. Dublin Core Metadata Initiative, 2007.

[30] A. Powell, M. Nilsson, A. Naeve, P. Johnston, and T. Baker. DCMI abstract model. http://dublincore.org/documents/abstract-model, 2007.

[31] E. Prud’hommeaux and A. Seaborne. SPARQL query language for RDF. W3C Recommendation, 2009(January):1–106, 2008. http://www.w3.org/TR/rdf-sparql-query/.

[32] L. Sauermann, R. Cyganiak, and M. Völkel. Cool URIs for the Semantic Web. W3C Interest Group Note, 49(December 2008):1–15, 2008. http://www.w3.org/TR/cooluris/.

[33] B. Simon, D. Massart, F. Van Assche, S. Ternier, E. Duval, S. Brantner, D. Olmedilla, and Z. Miklos. A simple query interface for interoperable learning repositories. In Proceedings of the 1st Workshop on Interoperability of Web-Based Educational Systems in conjunction with 14th International World Wide Web Conference (WWW’05), pages 11–18. CEUR, 2005. http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-143/paper02.pdf.

[34] H. Story, S. Corlosquet, M. Sporny, T. Inkster, B. Harbulot, and R. Backmann-Gmur. WebID 1.0 – web identification and discovery. W3C Editor’s Draft, 2011. http://webid.info/spec/.

[35] S. Ternier, K. Verbert, G. Parra, B. Vandeputte, J. Klerkx, E. Duval, V. Ordóñez, and X. Ochoa. The ARIADNE infrastructure for managing and storing metadata. IEEE Internet Computing, 13(4):18–25, 2009.

[36] R. Thurlow. RPC: Remote procedure call protocol specification version 2. Internet Network Working Group Request for Comments, page 25, 2009. http://www.ietf.org/rfc/rfc5531.txt.

[37] H. Van De Sompel, R. Sanderson, M. L. Nelson, L. L. Balakireva, H. Shankar, and S. Ainsworth. An HTTP-based versioning mechanism for Linked Data. In Proceedings of the workshop on Linked Data on the Web (LDOW), April 2010.

[38] World Wide Web Consortium Technical Architecture Group issues list: httpRange-14: What is the range of the HTTP dereference function? http://www.w3.org/2001/tag/issues.html#httpRange-14.

[39] Web Access Control Ontology. http://www.w3.org/wiki/WebAccessControl, 2012.