
Page 1: Platform Interoperability Guidelinesopenminted.eu/wp-content/uploads/2017/10/OpenMinTeD_D5.5...Platform Interoperability Guidelines March 16, 2017 Deliverable Code: D5.5 Version: 1.0

Platform Interoperability Guidelines

March 16, 2017

Deliverable Code: D5.5

Version: 1.0 – Final

Dissemination level: Public

First version of the guidelines for infrastructure interoperability structured into sets that target the stakeholder groups (providers of content and software resources)

H2020-EINFRA-2014-2015 / H2020-EINFRA-2014-2
Topic: EINFRA-1-2014 Managing, preserving and computing with big research data
Research & Innovation action
Grant Agreement 654021

Platform Interoperability Guidelines

∙ ∙ ∙

Public Page 1 of 30

Document Description D5.5 – Platform Interoperability Guidelines

WP5 – Interoperability Framework

WP participating organizations: ARC, USFD, UNIMAN, AK, UoG, GRNET

Contractual Delivery Date: 9/2016

Actual Delivery Date: 3/2017

Nature: Report

Version: 1.0 Final

Public Deliverable

Preparation slip (name, organization, date)

From: Penny Labropoulou (ARC), Dimitris Galanis (ARC), Angus Roberts (USFD), Matt Shardlow (UNIMAN), Giulia Dore (UoG), Thomas Margoni (UoG), Byron Georgantopoulos (GRNET), Panagiotis Zervas (AK), Pythagoras Karampiperis (AK), Richard Eckart de Castilho (UKP-TUDA); 21/02/2017

Edited by: Penny Labropoulou (ARC); 16/03/2017

Reviewed by: Vangelis Floros (GRNET), Christian O'Reilly (EPFL), Mappet Walker (FRONTIERS), Lucas Anastasiou (OU); 07/03/2017

Approved by: Androniki Pavlidou (ARC); 16/03/2017

For delivery: Mike Hatzopoulos (ARC); 21/03/2017

Document change record (issue; item; reason for change; author and organization)

V0.1; Draft version; Initial version sent for comments; Penny Labropoulou (ARC)

V0.2; Draft version; Version sent to internal reviewers; Penny Labropoulou (ARC)

V0.3; Draft version; Version from internal reviewers; Vangelis Floros (GRNET), Christian O'Reilly (EPFL), Mappet Walker (FRONTIERS)

V0.4; Draft version; Version sent to internal reviewers (second round); Penny Labropoulou (ARC)

V0.5; Draft version; Versions from internal reviewers; Vangelis Floros (GRNET), Christian O'Reilly (EPFL), Mappet Walker (FRONTIERS), Lucas Anastasiou (OU)

V0.9; Pre-final version; Version incorporating the internal reviewers' comments, pending final approval; Penny Labropoulou (ARC)

V1.0; Final version; Final version incorporating all comments; Penny Labropoulou (ARC)

Table of Contents

1. Introduction

2. The OpenMinTeD platform

3. Target audience

4. Background and methodology of work

5. The OMTD-SHARE metadata schema

6. Structure of the guidelines

Appendix A – References

Appendix B – Acknowledgements & Contributors

Appendix C – Guidelines

Table of Figures

Figure 1. Overview of the OMTD-SHARE data model

Disclaimer

This document contains a description of the OpenMinTeD project findings, work and products. Certain parts of it might be under partner Intellectual Property Right (IPR) rules, so, prior to using its content, please contact the consortium head for approval.

In case you believe that this document harms in any way IPR held by you as a person or as a representative of an entity, please do notify us immediately.

The authors of this document have taken every available measure to ensure that its content is accurate, consistent and lawful. However, neither the project consortium as a whole nor the individual partners that implicitly or explicitly participated in the creation and publication of this document bear any responsibility for consequences that might arise from using its content.

This publication has been produced with the assistance of the European Union. The content of this publication is the sole responsibility of the OpenMinTeD consortium and can in no way be taken to reflect the views of the European Union.

The European Union is established in accordance with the Treaty on European Union (Maastricht). There are currently 28 Member States of the Union. It is based on the European Communities and the member states' cooperation in the fields of Common Foreign and Security Policy and Justice and Home Affairs. The five main institutions of the European Union are the European Parliament, the Council of Ministers, the European Commission, the Court of Justice and the Court of Auditors. (http://europa.eu.int/)

OpenMinTeD is a project funded by the European Union (Grant Agreement No 654021).

Acronyms

API: Application Programming Interface
LR: Language Resource
NLP: Natural Language Processing
ML: Machine Learning
OA: Open Access
OAI-PMH: Open Archives Initiative Protocol for Metadata Harvesting
OKFN: Open Knowledge Foundation
OMTD: OpenMinTeD
OWL: Web Ontology Language
PDF: Portable Document Format
RDF: Resource Description Framework
REST: Representational State Transfer
RI: Research Infrastructure
SKOS: Simple Knowledge Organization System
SOAP: Simple Object Access Protocol
TDM: Text and Data Mining
VM: Virtual Machine
WP: Workpackage
XML: Extensible Markup Language
XSD: XML Schema Definition

Glossary

annotation (text/corpus annotation) A note by way of explanation or comment added to a text or diagram [OED, https://en.oxforddictionaries.com/definition/annotation]. In OpenMinTeD, the term refers mainly to text or corpus annotation, which is the practice of adding interpretative linguistic information grounded in a knowledge resource to a text or corpus respectively. For example, one common type of annotation is the addition of tags, or labels, indicating the word class to which lexical units in a text belong; these tags come from a predefined set (e.g. Noun, Verb, Preposition, etc.). Semantic labeling with terms and concepts from an ontology is another common example of annotation. Relationships such as syntactic dependencies or semantic relations that link entities of the text are also annotations.

annotation resource Any resource that can be used for annotating a text, including part-of-speech tagsets, annotation schemes, domain-specific ontologies, etc.

annotation scheme A set of elements and values designed to annotate data. An annotation scheme usually aims to represent a specific level of information, such as morphological features of words, syntactic dependency relations between phrases, discourse level information, etc. It can consist of a flat structure of elements and values (e.g. part-of-speech tags) or it can be more complex with interrelated elements (e.g. specific morphological features to be used for each part-of-speech).

application Any software program (or group of programs seen as a whole) intended for the end-user and addressing one or multiple related user needs.

component (software component) An algorithm wrapped in a standard way so that it can be integrated as a reusable tool or service within a particular component-oriented framework such as UIMA, GATE, etc.

corpus A structured collection of pieces of data (textual, audio, video, multimodal/multimedia, etc.) typically of considerable size and selected according to criteria external to these data (e.g. size, type of language, type of producers or expected audience, etc.) to represent as comprehensively as possible the object of study.

data model A data model is an abstract model that organizes elements of data and standardizes how they relate to one another and to properties of the real world entities. [Wikipedia, https://en.wikipedia.org/wiki/Data_model]

distribution Any form by which a resource can be shared; it can be a downloadable PDF or a plain text file, a form of a corpus accessible only through a web interface, or the source code of a piece of software, etc.

document A piece of written, printed, or electronic matter that is primarily intended for reading.

interoperability Interoperability describes the extent to which systems and devices can work together, exchange data, and interpret that shared data. For two systems to be interoperable, they must be able to exchange data and subsequently present that data such that it can be understood by a user. [Research Data Alliance, http://smw-rda.esc.rzg.mpg.de/index.php/Interoperability]

licence A permission, or written evidence of a permission, that confers on the licensee the right to do something that would otherwise be prohibited by law.

licence compatibility/interoperability The condition or state in which two or more licences can co-exist or be combined without conflicting with each other. In OpenMinTeD, licence compatibility and licence interoperability are used as synonyms.

knowledge resource A resource (data and/or tool) containing, producing or representing knowledge; knowledge is specific information that is relevant for the linguistic and conceptual interpretation of data. For OpenMinTeD purposes, this information is exploited or produced by TDM modules and tools, or exchanged between them.

language description The resource describes a language or some aspect(s) of a language via a systematic documentation of linguistic structures. [Open Language Archives Community, http://www.language-archives.org/REC/type.html#language_description] Examples include sketch grammars, computational grammars, etc.

language resource Language Resources (LRs) encompass (a) data sets (textual, multimodal/multimedia and lexical data, grammars, language models, etc.) in machine readable form, used to assist and augment language processing applications, but also, in a broader sense, in language and language-mediated research studies and applications, and (b) tools/technologies/services used for their processing.

lexical/conceptual resource A resource organised on the basis of lexical or conceptual entries (lexical items, terms, concepts, etc.) with their supplementary information (e.g. grammatical, semantic, statistical information, etc.). In OpenMinTeD, they can be used for annotation purposes.

machine learning (ML) model The process of training an ML model involves providing an ML algorithm (that is, the learning algorithm) with training data to learn from. The term ML model refers to the model artifact that is created by the training process. [http://docs.aws.amazon.com/machine-learning/latest/dg/training-ml-models.html]

metadata Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. Metadata is often called data about data or information about information. [National Information Standards Organization, Understanding metadata, http://www.niso.org/publications/press/UnderstandingMetadata.pdf]

open access (OA) The free and online availability of literature, which allows users to read, download, copy, distribute, print, search, or link to the full text, crawl articles for indexing, pass them as data to software, or use them for any other useful purpose. An availability that is granted without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself, and those related to giving authors control over the integrity of their work and the right to be properly acknowledged and cited [Budapest OA Initiative 2002; Bethesda Statement on OA Publishing 2003; Berlin Declaration on OA Knowledge in Science and Humanities 2003]

OpenMinTeD infrastructure An infrastructure refers to the basic structures and facilities required for the operation of a system. The OpenMinTeD infrastructure consists of different layers of resources: content resources that can be mined, ancillary knowledge resources, tools and web services. Any resource that can be registered in the OpenMinTeD registry is part of the underlying infrastructure.

OpenMinTeD platform The OpenMinTeD platform brings together all the services that facilitate the interoperability aspects of the underlying infrastructure (e.g. registration, search and browsing, creation of workflows, processing, annotation, etc.) and, thus, becomes an infrastructural service of the wider research ecosystem.

publication A book, article, etc., that has been made available to the public either via a formal publication service or over the internet and is stored at an archive or repository. For OpenMinTeD purposes, this mainly covers scholarly publications.

resource Something that you can use to help you to achieve something, especially in your work or study. [Macmillan dictionary, http://www.macmillandictionary.com/dictionary/british/resource_1]

rights statement Formal or official statement asserting the copyright status and/or the licensing conditions for a given resource. It can be issued by an authoritative body (e.g. http://rightsstatements.org/). For OpenMinTeD purposes, it can be deemed similar to a "licence category", grouping licences that share similar features.

Text and Data Mining Text and Data Mining (TDM) was initially defined as “the discovery by computer of new, previously unknown information, by automatically extracting and relating information from different (…) resources, to reveal otherwise hidden meanings” (Hearst, 1999), in other words, “an exploratory data analysis that leads to the discovery of heretofore unknown information, or to answers for questions for which the answer is not currently known” (Hearst, 1999). [FutureTDM, http://www.futuretdm.eu/news/tdm-definition/]

service / web service A piece of software accessible through remote invocation typically using some REST-style APIs or SOAP protocols.

tool A piece of (standalone) software, typically for a very limited technical purpose, such as a particular implementation of a part-of-speech tagger (e.g. TreeTagger), a tree parsing program (e.g. mstparser), etc. Preferred terms in OpenMinTeD include 'component' and 'workflow'.

workflow A series of software components assembled together in order to perform a specific task.
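The glossary terms annotation, component and workflow fit together in a simple way. The Python sketch below is a minimal illustration under stated assumptions: it models standoff annotations as labelled character spans and a workflow as an ordered list of components, each of which adds annotations to a shared document. The `Annotation`, `Document` and `run_workflow` names, the toy lexicon and its tagset are hypothetical and do not reflect the UIMA, GATE or OpenMinTeD APIs.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Annotation:
    """A standoff annotation: a label attached to a character span of the text."""
    start: int
    end: int
    label: str


@dataclass
class Document:
    text: str
    annotations: List[Annotation] = field(default_factory=list)


# Hypothetical component interface: a callable that reads a document and adds annotations.
Component = Callable[[Document], None]


def whitespace_tokenizer(doc: Document) -> None:
    """Add a 'Token' annotation for every whitespace-separated word."""
    pos = 0
    for word in doc.text.split():
        start = doc.text.index(word, pos)
        end = start + len(word)
        doc.annotations.append(Annotation(start, end, "Token"))
        pos = end


# Toy lexicon standing in for a knowledge resource with a predefined tagset.
TOY_LEXICON = {"text": "Noun", "mining": "Noun", "reveals": "Verb",
               "hidden": "Adjective", "knowledge": "Noun"}


def lexicon_pos_tagger(doc: Document) -> None:
    """Label each existing Token with a tag from the toy tagset ('Unknown' if absent)."""
    tokens = [a for a in doc.annotations if a.label == "Token"]
    for tok in tokens:
        word = doc.text[tok.start:tok.end].lower()
        doc.annotations.append(Annotation(tok.start, tok.end, TOY_LEXICON.get(word, "Unknown")))


def run_workflow(doc: Document, components: List[Component]) -> Document:
    """Run components in order; each sees the annotations added by earlier ones."""
    for component in components:
        component(doc)
    return doc


doc = run_workflow(Document("Text mining reveals hidden knowledge"),
                   [whitespace_tokenizer, lexicon_pos_tagger])
for ann in doc.annotations:
    if ann.label != "Token":
        print(doc.text[ann.start:ann.end], ann.label)
```

Because the tagger only works if an earlier component has produced `Token` annotations, the order and the agreed annotation types matter; this dependency between components is where interoperability becomes a practical concern.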

Publishable Summary

The current deliverable brings together the guidelines that interested stakeholders must follow in order to be compatible with OpenMinTeD interoperability specifications.

The guidelines aim to present, in a user-friendly way, the specifications set for enabling interoperability between content and software resources, especially in the framework of the OpenMinTeD platform. They are, therefore, based on input from

● D5.2 and its updated version D5.3 - Interoperability Requirements Reports (in progress), which include the interoperability specifications set for OpenMinTeD,

● D6.1 - Platform Architectural Specification that describes the architecture and functions of the OpenMinTeD platform, and

● the data model adopted by OpenMinTeD for describing resources involved in TDM and implemented in the OMTD-SHARE metadata schema.

The deliverable presents the work and methodology according to which the guidelines have been created, while the actual guidelines are annexed to this report and published online at https://guidelines.openminted.eu.

Four sets of guidelines have been created, targeting respectively the providers of publications, corpora, ancillary knowledge resources and TDM software resources. The specifications address technical (e.g. data representation formats, transfer protocols), legal and documentation (metadata) issues. Two levels of compliance are foreseen, corresponding to mandatory and recommended specifications, allowing for gradual adoption by stakeholder groups.

Public review will be solicited from the stakeholder groups and their comments, together with additional requirements from the ongoing work on the project, will be taken into account for the next version of the guidelines (D5.6).

1. Introduction

OpenMinTeD enables the creation of an infrastructure that fosters and facilitates the use of text and data mining technologies in the scientific publications world and beyond, by both application domain users (e.g. scientists, technicians) and text mining experts. OpenMinTeD builds upon existing tools and text mining platforms. It aims to make them discoverable through the OpenMinTeD registry and interoperable through the interoperability layer, which is likewise based on existing and emerging standards and best practices.

The current deliverable puts together the guidelines that interested parties must follow in order to be compatible with OpenMinTeD interoperability specifications. To better serve the needs of the target stakeholder groups and the peculiarities of each resource type, separate guidelines are available per resource type and provider group. The deliverable is structured as follows:

● a short presentation of the OpenMinTeD platform and the objectives it serves,

● a short presentation of the audience targeted by the guidelines,

● the background and methodology of the work, and

● a synopsis of the OMTD-SHARE metadata schema, which is used for the documentation of all resources in OpenMinTeD, and of the data model it supports.

The guidelines themselves are presented in Appendix C, while an online version is available at: https://guidelines.openminted.eu. Given that the project is still in progress, two new releases are planned within the next twelve months, taking into account stakeholders' feedback and additional specifications coming from the project; backwards compatibility of the new versions will be a priority and, where needed, tools for converting to the new version will be made available.

2. The OpenMinTeD platform

TDM involves a wide range of resource types:

● the content resources to be mined (scholarly publications in the OpenMinTeD project),

● the text mining software, and

● ancillary knowledge resources used for the operation of the software (e.g. annotation schemas, linguistic tagsets, lexical or ontological resources used for annotating the resources to be mined, machine learning models, annotated textual corpora).

The OpenMinTeD platform¹ integrates these resources and supports their interaction through appropriate services:

● a Registry service for storing, browsing, downloading, searching and managing the various resources, which will be registered in OpenMinTeD by using a set of specifications/protocols (e.g. OAI-PMH [https://www.openarchives.org/pmh/], Maven [https://maven.apache.org/]) and documented with high-quality metadata;

● the Workflow Editor service of the platform to guide users (via an appropriate User Interface) in creating interoperable workflows of TDM components, which will be executed by the Workflow Execution service in a cloud infrastructure (or on a local machine);

● the Annotation Editor service to allow users to annotate the publications (texts) in order to create datasets that can be used in workflows, e.g. for evaluation purposes.
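OAI-PMH, listed above as one of the registration protocols, is a plain HTTP request/response protocol: a harvester issues `ListRecords` requests and keeps following the `resumptionToken` until the repository returns an empty one. The following Python sketch shows only that harvester-side logic with the standard library. The repository URL and the sample response are hypothetical, network access is left out, and this is not the OpenMinTeD harvester itself. It requests Dublin Core (`oai_dc`), the metadata format that every OAI-PMH repository is required to support.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, resumption_token: Optional[str] = None) -> str:
    """Build a ListRecords request URL.

    The first request names the metadata format; follow-up requests send only
    the verb and the resumptionToken, which OAI-PMH treats as an exclusive argument.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text: str) -> Tuple[List[Tuple[str, str]], Optional[str]]:
    """Return (identifier, title) pairs and the next resumption token, if any."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        records.append((identifier, title))
    token_el = root.find(".//oai:resumptionToken", NS)
    # An absent or empty resumptionToken means this was the last page.
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return records, token or None


# Hypothetical response from a hypothetical repository.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2017-03-16T12:00:00Z</responseDate>
  <request verb="ListRecords" metadataPrefix="oai_dc">https://repository.example.org/oai</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:1</identifier>
        <datestamp>2017-01-10</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample article</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_list_records(SAMPLE_RESPONSE)
print(records)  # [('oai:repository.example.org:1', 'A sample article')]
print(list_records_url("https://repository.example.org/oai", token))
# https://repository.example.org/oai?verb=ListRecords&resumptionToken=page-2
```

A real harvester would loop, fetching each URL and parsing the response, until `parse_list_records` returns `None` as the token.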

The OpenMinTeD platform was designed and is being implemented as a facilitator of TDM in an ecosystem of e-infrastructures and repositories, collecting, transforming and making available resources only as needed for TDM purposes. In other words, it is not yet another registry of content and services, and it does not seek to collect and provide information about all resources that might be of interest to TDM stakeholders. Resources are uploaded and stored only as required by the processing. Thus, for instance, knowledge resources can be registered in the OpenMinTeD registry and continue to reside at locations outside the platform, to be accessed only at the time of processing. Publications, on the other hand, are harvested and stored locally at OpenMinTeD storage facilities to meet processing requirements and reduce processing time.

Resources are to be registered into OpenMinTeD only if they can be accessed and deployed in the context of a TDM processing operation.

For this reason, it is imperative that

• the resource itself can be accessed in a single-step process and in a transparent way through the OpenMinTeD mechanisms;

• the resource is properly described with the metadata schema adopted by OpenMinTeD, i.e. the OMTD-SHARE schema (see section 5), at least at the minimal level, ensuring that it can be discovered by platform users through the search-or-browse interface, and that it can be instantiated when required by the software components at the time of execution of a workflow;

• the resource is in a form that can be exploited as is in the OpenMinTeD context (or can be easily transformed into one of the OpenMinTeD acceptable forms through one of the conversion tools included in the platform);

• the resource adheres to the specifications set by OpenMinTeD (at least at the minimal level) that seek to achieve interoperability among all resources, as described in the guidelines.

¹ For a full description of the platform, see D6.1 - Platform Architectural Specification.

The resources will be registered into OpenMinTeD by trustworthy sources, i.e. registered individuals. Bilateral agreements with repositories, infrastructures and other registries containing useful resources will also be made to facilitate this process.

In addition, new resources created using the OpenMinTeD toolbox and resources (i.e. corpora built by users by selecting scholarly publications, workflows created by TDM developers with components registered in OpenMinTeD, and outputs of running TDM tools and services in the platform) are also registered, stored and made available to end users through OpenMinTeD² and must follow the above principles. The descriptions of new resources are produced semi-automatically, based on information from the resources used in their composition, and can be edited and enriched by users.

Providers of resources interact with the OpenMinTeD Registry service through a specially designed interface, guiding them through the process of registration (uploading resources and their descriptions). All users can browse resources through the catalogue, select a specific resource and view its detailed description; moreover, resources are fed internally through the system into the Workflow and Annotation services, where they are presented to expert users for further operations.

² There is an ongoing discussion on the archiving and distribution of the output resources; more information will be made available once decisions are reached on this issue.

3. Target audience

OpenMinTeD targets the following groups:

● End users as consumers of the e-Infrastructure, who are further divided into:

o Domain-specific researchers and research communities (e.g. research labs around the world): users who are not knowledgeable about TDM and who want to find end-to-end applications (e.g. web services) that fulfil their needs off the shelf.

o Application developers / Research e-Infrastructures data scientists: People who understand the basic usage of NLP and TDM services, but not the (algorithmic) details. They are aware of the research community needs, limitations and goals. They know how to connect and configure components, and which content they must use to get the required results. They need to develop end-to-end applications.

o e-Infrastructure operators: users agnostic to the internal specifics of TDM, but who need to integrate TDM services into the daily workflows that serve their constituency and to operate them; the group includes, for instance, researchers of an RI, of a national e-Infrastructure or of a research institution.

● Contributors of content and software resources:

o For content to be mined (scholarly publications), a potentially wide group of stakeholders can be envisaged; in the current phase, the focus is on publishers and repository managers (research libraries).

o For TDM software resources, two subgroups are identified:

▪ A well-established community of language technology experts, who use specific technologies and frameworks (e.g. UIMA, GATE) to develop and enhance their software, which can be used for TDM purposes. Examples of such software include Named Entity Recognizers and Term Extractors that incorporate grammatical taggers and parsers.

▪ Developers who are not NLP experts and who create TDM modules based on off-the-shelf libraries and tools (e.g. Python NLTK³, Tidytext⁴, Scikit-learn⁵, Gensim⁶, OKFN’s relevant initiative ContentMine⁷). They are not familiar with NLP frameworks and terminology but are eager to publish their TDM software.

o For ancillary resources, contributions are expected from two main sources:

▪ The TDM software developers (see above), who usually bundle the required resources in their software but also make them available as separate entities; this includes, for instance, ML models that come together with the software that uses them but may also be distributed separately and, thus, re-used with other software.

▪ Language resource developers (e.g. terminologists, lexicographers, NLP experts producing annotation resources) and members of the various domain communities that already use resources such as ontologies, terminological lexica, thesauri, etc. in their work. For this phase, the focus is on the communities targeted by the OpenMinTeD use cases, i.e. research analytics, life sciences, agriculture & biodiversity, and social sciences.

³ https://pypi.python.org/pypi/nltk
⁴ https://cran.r-project.org/web/packages/tidytext/index.html
⁵ http://scikit-learn.org/stable/
⁶ https://radimrehurek.com/gensim/

At the present stage, the guidelines target only the second group, i.e. contributors of resources. They supply instructions and advice on the registration and uploading process, as well as on the proper packaging and documentation of resources required for importing them into the OpenMinTeD platform. They also provide recommendations on technical features and properties that contribute to interoperability.

It should be noted, though, that the needs, expertise, habits and expectations of the first group have also influenced the descriptive schema of the resources as well as the functionalities and services supported by the platform. In addition, to further assist end users, the creation of guidelines targeting them, with examples and suggested pathways for using the OpenMinTeD platform, will be investigated during the second phase of the project.

4. Background and methodology of work

The guidelines provide instructions on how to prepare, package and add new resources using the Registry interface. Their production has been based mainly on the interoperability specifications (WP5), taking into account the overall OpenMinTeD architecture and the platform implementation (WP6) as well as the user requirements (cf. Appendix B for acknowledgements).

⁷ http://contentmine.org/

The four working groups participating in WP5 have set a number of abstract requirements, which are described in D5.2 - Interoperability Standards and Specifications Report. These requirements specify ways of assessing and improving interoperability between content resources and software components involved in TDM operations. The next step in this endeavour has been the formulation of concrete requirements (to be included in the updated version of that deliverable, i.e. D5.3 - Interoperability Standards and Specifications Report (2nd edition)) that recommend specific implementation strategies, techniques and features to ensure interoperability as envisaged in OpenMinTeD. These requirements have fed and will continue to feed the Guidelines, given that this is still ongoing work and that updated versions will be released during subsequent phases of the project.

An important instrument designed to support interoperability in OpenMinTeD is the OMTD-SHARE metadata schema, which is used for the description of the resources (see next section). The Guidelines include separate sections on the use of the metadata schema for each resource type, focusing in the first phase on the minimal level, which includes mandatory and strongly recommended elements. The next release of the Guidelines will also include full documentation with examples for all resource types, FAQs and tips/advice. Given the size and complexity of the schema, we have decided to adopt this stepwise process in order to have a first testbed for the user-friendliness of the guidelines, and then build upon it following recommendations from the stakeholders.

Additional input for the Guidelines will come from discussions on policies regarding the registration of providers and resources in the platform. Key issues include:

● the interaction with other infrastructures, data and software repositories, in order to manually or automatically harvest all or selected resources from them,

● the involvement of organizations vs. individuals in the process of registering and uploading resources,

● the criteria for accepting resources,

● the criteria for assigning user privileges.


The structure and the content of the Guidelines reflect these decisions by addressing issues related to specific types of providers in specific sections.

The current version is limited to the resource providers listed hereafter, but future releases will broaden this scope to cover additional stakeholders (e.g. software developers who are not NLP experts). More specifically:

● for content resources (scholarly publications), we expect to get input through big aggregators, i.e. OpenAIRE and CORE, which aggregate open access content from various sources, such as repositories, publishers, journals etc. To further support the task of data collection, a connector is being implemented in OpenMinTeD specifically targeting content from traditional publishers of open access publications;

● for software resources, we expect input mainly from the Consortium partners, collected through software repositories (e.g. Maven Central), but also through META-SHARE, which hosts resources intended for Language Technology development. In both cases, these resources come mainly from the expert communities of language technology developers;

● for ancillary resources, such as lexica, ontologies, ML models etc., we expect input from (a) TDM software developers, who wrap ancillary resources (especially typesystems, models and tagsets) with their software modules; in the first phase, we are again focusing on the Consortium partners; (b) developers of language resources, who describe and store their resources in repositories intended for that purpose, such as META-SHARE, and/or in discipline repositories (especially as regards terminologies and ontologies); the main focus will be on the disciplines targeted by the WP9 use cases.

5. The OMTD-SHARE metadata schema

The OMTD-SHARE metadata schema8 is the recommended schema for the description of the resources. It has been designed to support interoperability between the various resources used in TDM processes. This interoperability is achieved by homogenising descriptions of TDM resources from the different scientific communities using a common core vocabulary, which is linked to pre-existing domain-specific vocabularies. Standards and best practices of the source communities are integrated whenever possible. The main principles and strategies employed in the design of the OMTD-SHARE schema are the following:

● cover needs of resource discoverability and TDM processing

● cover documentation needs of all resource types involved in TDM

● be flexible enough to support varying degrees of documentation completeness

● organize the schema elements and accommodate common vs. particular features of resources

● reuse what is available vs. create new elements and values

● normalize user input vs. allow for free user input

● document processing procedures and outputs.

8 The full OMTD-SHARE schema is documented at: https://openminted.github.io/releases/omtd-share/.

It is largely based on the META-SHARE metadata schema9 [Gavrilidou et al. 2012], which caters for the description of language resources, encompassing both data (textual, multimodal/multimedia and lexical data, grammars, language models, etc.) and technologies (tools/services) used for their processing. OMTD-SHARE is narrower in scope, in the sense that it focuses on text resources only; at the same time, it extends the basic schema to include TDM-specific concepts and enhances the description of processing procedures and workflows.

As in META-SHARE, the schema documents the full lifecycle of a resource, including at least minimal documentation of its satellite entities (see Figure 1), especially their interrelations. The OMTD-SHARE data model thus comprises the following entities:

● the resources, further classified into:

○ corpora, i.e. datasets of text documents, mainly scholarly publications in OMTD-SHARE

○ lexical/conceptual resources, including lexica, ontologies, term lists, gazetteers, etc., but also tagsets and annotation schemas, which are used for annotating corpora

○ language descriptions, which mainly refer to computational grammars

○ machine learning and statistical models10

○ software components, i.e. pieces of software and tools offered as locally executable code or as web services, wrapped in a workflow or as standalone end-to-end applications, and, finally,

○ publications, which constitute a special resource type, as they are viewed in OpenMinTeD only in a collective form, as a "corpus";

● the satellite entities, such as actors (persons or organizations) that have created the resources, or the projects using or funding them.

9 http://metashare.ilsp.gr/knowledgebase/homePage

10 Models could be considered a subtype of language descriptions, but we decided to keep them distinct because they have many properties that differentiate them from grammars; keeping them apart was also considered to enhance their discoverability.

Figure 1. Overview of the OMTD-SHARE data model

The schema is composed of metadata elements that are used to describe properties and relationships. Some of these elements, especially those that pertain to administrative features, are common to all types of resources (e.g. identification, contact, licensing information, etc.), while others, mainly technical features about the contents and format of resources, differ across types. As mentioned above, publications differ from other resource types: their recommended metadata elements mainly describe criteria used for their selection in the corpus building process.


One of the characteristic features of the META-SHARE family of schemas11 is the adoption of the component-based mechanism (Component MetaData Infrastructure, CMDI), according to which semantically coherent elements are grouped together to form components12 [Broeder et al., 2008]. For instance, the licensing module includes elements such as the name and URL of a licence, attribution text, copyright holders, etc. For the sake of simplicity, the container elements used for this grouping will not be presented in the guidelines unless required.
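To illustrate the module-based grouping described above, the following Python sketch parses a small, entirely hypothetical record in which licensing elements sit inside container elements, and then flattens it to its leaf elements, which is roughly how the guidelines present the schema when containers are omitted. The root `corpusInfo`, the containers `rightsInfo` and `licenceInfo`, and the element `attributionText` are illustrative assumptions, not names copied from the OMTD-SHARE XSD; only `resourceName` and `licence` echo element names listed in the guidelines.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: "corpusInfo", "rightsInfo", "licenceInfo" and
# "attributionText" are illustrative names, not taken from the OMTD-SHARE XSD.
RECORD = """
<corpusInfo>
  <resourceName>Example corpus</resourceName>
  <rightsInfo>
    <licenceInfo>
      <licence>CC-BY-4.0</licence>
    </licenceInfo>
    <attributionText>Example attribution</attributionText>
  </rightsInfo>
</corpusInfo>
"""


def leaf_elements(elem, path=""):
    """Yield (path, text) for each leaf element; containers only contribute to the path."""
    here = f"{path}/{elem.tag}" if path else elem.tag
    children = list(elem)
    if not children:
        yield here, (elem.text or "").strip()
    for child in children:
        yield from leaf_elements(child, here)


def flatten(xml_text):
    """Map leaf tag -> text, hiding the grouping containers (modules)."""
    root = ET.fromstring(xml_text.strip())
    # If two leaves share a tag, the later one overwrites the earlier one.
    return {path.rsplit("/", 1)[-1]: text for path, text in leaf_elements(root)}


if __name__ == "__main__":
    for name, value in flatten(RECORD).items():
        print(f"{name}: {value}")
```

Keeping the full path in `leaf_elements` preserves the module structure for anyone who needs it, while `flatten` gives the simplified, container-free view.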

The OMTD-SHARE schema classifies elements into three levels of optionality:

● mandatory: elements that are necessary for the intended purposes, i.e. for discovering resources and for triggering operations between content and software components

● recommended: elements that can facilitate the current or future use of the resource, or record useful information that providers have not yet standardized

● optional: all remaining information related to the lifecycle of a resource.
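The three optionality levels can be read as a simple completeness check on a metadata record. The Python sketch below treats missing mandatory elements as blocking and missing recommended elements as warnings, and ignores optional ones. Which elements fall into each level differs per resource type; the two sets used here are hypothetical, chosen only from element names that appear in the guidelines, and the record is assumed to be a flat dictionary rather than actual OMTD-SHARE XML.

```python
from dataclasses import dataclass, field

# Hypothetical assignment of elements to optionality levels (illustration only;
# the real assignment is defined per resource type in OMTD-SHARE).
MANDATORY = {"resourceName", "identifier", "licence", "distributionMedium"}
RECOMMENDED = {"description", "contactEmail", "landingPage", "mustBeCitedWith"}
# Every other element is treated as optional and is not checked.


@dataclass
class CheckResult:
    missing_mandatory: list = field(default_factory=list)
    missing_recommended: list = field(default_factory=list)

    @property
    def acceptable(self) -> bool:
        # In this sketch only missing mandatory elements make a record unacceptable.
        return not self.missing_mandatory


def check_record(record: dict) -> CheckResult:
    """Compare a flat {element name: value} record against the two sets above."""
    # Elements with empty values count as missing.
    present = {name for name, value in record.items() if value not in (None, "")}
    return CheckResult(
        missing_mandatory=sorted(MANDATORY - present),
        missing_recommended=sorted(RECOMMENDED - present),
    )


if __name__ == "__main__":
    result = check_record({
        "resourceName": "Example lexicon",
        "identifier": "hypothetical-id-001",
        "licence": "CC-BY-4.0",
        "description": "",
    })
    print(result.acceptable)  # False: distributionMedium is missing
    print(result.missing_recommended)
```

Separating the two lists lets a registration interface reject incomplete records while still nudging providers towards the recommended level.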

The XML Schema Definition (XSD) that formally describes the schema has been made publicly available13. An important difference from META-SHARE lies in the organisation vis-à-vis the different resource types covered: while META-SHARE describes all resource types in one common XSD, OMTD-SHARE describes them in a more modular way, as separate sets of XSDs.

Work is also ongoing on an RDF/OWL version, which will be documented in the next release of the guidelines.

6. Structure of the guidelines

The current release includes four guidelines (cf. Appendix C). Three of them correspond to the three major distinctions of resources involved in TDM processes:

● content resources to be mined, i.e. scholarly publications,

● ancillary (knowledge) resources used for the operation of the software (e.g. annotation schemas, linguistic tagsets, lexical or ontological resources used for annotating the resources to be mined, machine learning models),

● TDM and TDM-related software,

and the fourth covers

● corpora, as they can be used either as ancillary resources or as resources to be mined.

11 Based on the META-SHARE schema, further adaptations are now available, including ELRC-SHARE, clarin:el and OMTD-SHARE. The META-SHARE schema has also been implemented as an RDF/OWL ontology in collaboration with the W3C LD4LT group.

12 To avoid confusion with the term "component" also used for software components, we will from now on refer to this concept as "modules".

13 The current version of the XSDs is available at: https://github.com/openminted/omtd-share_metadata_schema and the documentation of v1.0.0 at: https://openminted.github.io/releases/omtd-share/.

Each set of guidelines contains the following information:

● a brief introduction, specifying the resources expected, potential sources, minimal requirements for the contributions

● packaging and registering instructions for the OpenMinTeD registry

● technical and metadata requirements that empower interoperability

● for each resource type, an overview of the OMTD-SHARE metadata schema (minimal level) with definitions, explanations, recommended usage and mappings to other widespread metadata schemas

● further instructions per type of contributor or resource type/subtype where required.


Appendix A - References

Broeder, D., T. Declerck, E. Hinrichs, S. Piperidis, L. Romary, N. Calzolari and P. Wittenburg (2008). "Foundation of a Component-based Flexible Registry for Language Resources and Technology". In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008). Available at: http://www.lrec-conf.org/proceedings/lrec2008/

Gavrilidou, M., P. Labropoulou, E. Desipri, S. Piperidis, H. Papageorgiou, M. Monachini, F. Frontini, T. Declerck, G. Francopoulo, V. Arranz and V. Mapelli (2012). "The META-SHARE Metadata Schema for the Description of Language Resources". In Proceedings of LREC 2012, Istanbul, Turkey. Available at: http://www.lrec-conf.org/proceedings/lrec2012/pdf/998_Paper.pdf


Appendix B – Acknowledgements & Contributors

The guidelines have been the product of work carried out mainly in the OpenMinTeD WP5 Interoperability Framework. The following internal and external experts have exchanged ideas and participated in discussions that have formulated the interoperability requirements, which these guidelines purport to describe:

Internal experts

• Sophia Ananiadou (University of Manchester, UK)
• Lucas Anastasiou (Open University, UK)
• Sophie Aubin (INRA, France)
• Mouhamadou Ba (INRA, France)
• Kalina Bontcheva (University of Sheffield, UK)
• Robert Bossy (INRA, France)
• Jacob Carter (University of Manchester, UK)
• Louise Deléger (INRA, France)
• Giulia Dore (University of Glasgow, UK)
• Richard Eckart de Castilho (TU Darmstadt, Germany)
• Fred Fenter (Frontiers Media SA, Switzerland)
• Dimitris Galanis (Athena RC, Greece)
• Maria Gavriilidou (Athena RC, Greece)
• Patricia Geretto (INRA, France)
• Mark Greenwood (University of Sheffield, UK)
• Lucie Guibault (University of Amsterdam, Netherlands)
• Masoud Kiaeeha (TU Darmstadt, Germany)
• Petr Knoth (Open University, UK)
• Penny Labropoulou (Athena RC, Greece)
• Antonis Lempesis (Athena RC, Greece)
• Miguel Madrid (CNIO)
• Natalia Manola (Athena RC, Greece)
• Thomas Margoni (University of Glasgow, UK)
• John McNaught (University of Manchester, UK)
• Claire Nedellec (INRA, France)
• Wim Peters (University of Sheffield, UK)
• Stelios Piperidis (Athena RC, Greece)
• Prokopis Prokopidis (Athena RC, Greece)
• Piotr Przybyla (University of Manchester, UK)
• Angus Roberts (University of Sheffield, UK)
• Matt Shardlow (University of Manchester, UK)
• Mappet Walker (Frontiers Media SA, Switzerland)

External experts

• Giulia Ajmone Marsan (The Organisation for Economic Co-operation and Development)
• Enrique Alonso (Consejo de Estado)
• Geoffrey Bilder (CrossRef)
• Lukasz Bolikowski (University of Warsaw, Poland)
• Maurizio Borghi (Bournemouth University, UK)
• Steve Cassidy (Macquarie University Sydney, Australia)
• Christopher Cieri (LDC, USA)
• Christian Chiarcos (Goethe-Universität Frankfurt am Main, Germany)
• Liam Earney (JISC, UK)
• Kristofer Erickson (CREATe)
• Dominique Estival (Western Sydney University, Australia)
• Gwen Franck (Creative Commons, EIFL)
• Thilo Götz (IBM)
• Nancy Ide (Vassar College, USA)
• Pawel Kamocki (Institut für Deutsche Sprache, Germany)
• Andreas Kempf (Deutsche Zentralbibliothek für Wirtschaftswissenschaften, Germany)
• Jin-Dong Kim (Database Center for Life Science, Research Organization of Information and Systems)
• John McCrae (National University of Ireland, Galway, Ireland)
• Federico Morando (Nexa Center for Internet & Society, Italy)
• Eric Nyberg (Carnegie Mellon University, USA)
• Mark Perry (University of New England, Australia)
• Diane Peters (Creative Commons HQ)
• Rafal Rak (UberResearch, UK)
• Jochen Schirrwagen (Universität Bielefeld, Germany)
• Ineke Schuurman (CCL, University of Leuven)
• Peter Suber (Berkman Klein Center, Harvard University)
• Keith Suderman (Vassar College, LAPPS)
• Prodromos Tsiavos (The Media Institute)
• Paul Uhlir (National Academy of Sciences)
• Maarten van Gompel (Radboud University Nijmegen)
• Marc Verhagen (Brandeis University, LAPPS)
• Piek Vossen (VU University Amsterdam, Netherlands)
• Menzo Windhouwer (MPI for Psycholinguistics, Netherlands)
• Maarten Zeinstra (Kennisland)


Appendix C - Guidelines

Table of Contents
1.1 OpenMinTeD guidelines
1.2 Acknowledgements & Contributors
1.3 Guidelines for providers of publications
1.3.1 Introduction
1.3.2 Instructions for publication repositories, libraries, journals, publishers, etc.
1.3.2.1 How to register your resources
1.3.2.2 How to make your resources interoperable
1.3.2.3 How to document your resources
1.3.3 Instructions for aggregators
1.3.3.1 How to register your resources
1.3.3.2 How to document your resources
1.3.4 Further requirements for annotated publications
1.3.5 Recommended schema for publications
1.3.5.1 documentType
1.3.5.2 publicationType
1.3.5.3 identifier
1.3.5.4 title
1.3.5.5 licence
1.3.5.6 rightsStmtName
1.3.5.7 rightsStmtURL
1.3.5.8 nonStandardLicenceName
1.3.5.9 nonStandardLicenceTermsURL
1.3.5.10 version of licence
1.3.5.11 distributionMedium
1.3.5.12 downloadURL
1.3.5.13 documentLanguage
1.3.5.14 fullText
1.3.5.15 abstract
1.3.5.16 author
1.3.5.17 publisher

1.3.5.18 journal
1.3.5.19 mimeType
1.3.5.20 characterEncoding
1.3.5.21 publicationDate
1.3.5.22 subject
1.3.5.23 keyword
1.3.5.24 collectedFrom (repositoryName or repositoryIdentifier)
1.3.5.25 sourceMetadataLink
1.3.5.26 originalDataProviderType
1.3.5.27 originalDataProviderRepository
1.3.5.28 originalDataProviderJournal
1.3.5.29 originalDataProviderPublisher
1.3.5.30 relationType
1.3.5.31 relatedResource1
1.3.5.32 relatedResource2
1.3.6 Metadata schema for annotated publications
1.3.6.1 annotationLevel
1.3.6.2 annotationStandoff
1.3.6.3 mimeType
1.3.6.4 documentationURL
1.3.6.5 dataFormatSpecific
1.3.6.6 characterEncoding
1.3.6.7 typesystem
1.3.6.8 tagset
1.3.6.9 annotationMode
1.3.6.10 isAnnotatedBy
1.3.6.11 annotationDate
1.4 Guidelines for providers of corpora
1.4.1 Introduction
1.4.2 Instructions for providers of corpora
1.4.2.1 How to register your resources
1.4.2.2 How to make your resources interoperable
1.4.2.3 How to document your resources
1.4.2.4 Further requirements for annotated corpora

1.4.2.5 Recommended schema for corpora
1.4.2.5.1 resourceName
1.4.2.5.2 resourceType
1.4.2.5.3 description
1.4.2.5.4 identifier
1.4.2.5.5 version
1.4.2.5.6 licence
1.4.2.5.7 rightsStmtName
1.4.2.5.8 rightsStmtURL
1.4.2.5.9 version of licence
1.4.2.5.10 nonStandardLicenceName
1.4.2.5.11 nonStandardLicenceTermsURL
1.4.2.5.12 distributionMedium
1.4.2.5.13 downloadURL
1.4.2.5.14 contactEmail
1.4.2.5.15 landingPage
1.4.2.5.16 contactPerson (identifier or personName)
1.4.2.5.17 contactGroup (identifier or organizationName)
1.4.2.5.18 mustBeCitedWith
1.4.2.5.19 resourceCreator
1.4.2.5.20 creationDate
1.4.2.5.21 corpusType
1.4.2.5.22 mediaType
1.4.2.5.23 lingualityType
1.4.2.5.24 multilingualityType
1.4.2.5.25 language
1.4.2.5.26 sizePerLanguage
1.4.2.5.27 size
1.4.2.5.28 mimeType
1.4.2.5.29 characterEncoding
1.4.2.5.30 domain
1.4.2.5.31 subject
1.4.2.5.32 keyword

1.4.2.5.33 userQuery
1.4.2.5.34 relationType
1.4.2.5.35 relatedResource1
1.4.2.5.36 relatedResource2
1.4.2.6 Metadata schema for annotated corpora
1.5 Guidelines for providers of knowledge resources
1.5.1 Introduction
1.5.2 Instructions for providers of ancillary knowledge resources
1.5.2.1 How to register your knowledge resources
1.5.2.2 How to make your knowledge resources interoperable
1.5.2.3 How to document your knowledge resources
1.5.2.4 Recommended schema for lexical/conceptual resources, incl. annotation resources
1.5.2.4.1 resourceType
1.5.2.4.2 resourceName
1.5.2.4.3 description
1.5.2.4.4 identifier
1.5.2.4.5 licence
1.5.2.4.6 rightsStmtName
1.5.2.4.7 rightsStmtURL
1.5.2.4.8 nonStandardLicenceName
1.5.2.4.9 nonStandardLicenceTermsURL
1.5.2.4.10 version of licence
1.5.2.4.11 distributionMedium
1.5.2.4.12 downloadURL
1.5.2.4.13 accessURL
1.5.2.4.14 contactEmail
1.5.2.4.15 landingPage
1.5.2.4.16 contactPerson (identifier or personName)
1.5.2.4.17 contactGroup (identifier or organizationName)
1.5.2.4.18 mustBeCitedWith
1.5.2.4.19 lexicalConceptualResourceType
1.5.2.4.20 encodingLevel
1.5.2.4.21 linguisticInformation

1.5.2.4.22 conformanceToStandardsBestPractices
1.5.2.4.23 lingualityType
1.5.2.4.24 language
1.5.2.4.25 metalanguage
1.5.2.4.26 size
1.5.2.4.27 domain
1.5.2.4.28 characterEncoding
1.5.2.4.29 mimeType
1.5.2.4.30 relationType
1.5.2.4.31 relatedResource1
1.5.2.4.32 relatedResource2
1.5.2.5 Recommended schema for models
1.5.2.5.1 resourceType
1.5.2.5.2 resourceName
1.5.2.5.3 identifier
1.5.2.5.4 description
1.5.2.5.5 version
1.5.2.5.6 licence
1.5.2.5.7 rightsStmtName
1.5.2.5.8 rightsStmtURL
1.5.2.5.9 nonStandardLicenceName
1.5.2.5.10 nonStandardLicenceTermsURL
1.5.2.5.11 version of licence
1.5.2.5.12 distributionMedium
1.5.2.5.13 downloadURL
1.5.2.5.14 contactEmail
1.5.2.5.15 landingPage
1.5.2.5.16 contactPerson (identifier or personName)
1.5.2.5.17 contactGroup (identifier or organizationName)
1.5.2.5.18 mustBeCitedWith
1.5.2.5.19 resourceCreator (person or organization, described with identifier or name)
1.5.2.5.20 variantName
1.5.2.5.21 tagset

1.5.2.5.22 typesystem
1.5.2.5.23 algorithm
1.5.2.5.24 trainingCorpusDetails
1.5.2.5.25 mediaType
1.5.2.5.26 lingualityType
1.5.2.5.27 language
1.5.2.5.28 size
1.5.2.5.29 mimeType
1.5.2.5.30 characterEncoding
1.5.2.5.31 relationType
1.5.2.5.32 relatedResource1
1.5.2.5.33 relatedResource2
1.6 Guidelines for providers of software resources
1.6.1 Introduction
1.6.2 Instructions for providers of software components
1.6.2.1 How to register your components
1.6.2.2 How to make your components interoperable
1.6.2.3 How to document your components
1.6.2.4 Guide for deploying UIMA components in the Argo platform
1.6.3 Recommended ancillary knowledge resources
1.6.4 Recommended schema for software resources
1.6.4.1 resourceType
1.6.4.2 resourceName
1.6.4.3 description
1.6.4.4 identifier
1.6.4.5 version
1.6.4.6 componentType
1.6.4.7 licence
1.6.4.8 rightsStmtName
1.6.4.9 rightsStmtURL
1.6.4.10 nonStandardLicenceTermsURL
1.6.4.11 version of licence
1.6.4.12 componentDistributionMedium
1.6.4.13 accessURL

1.6.4.14 downloadURL
1.6.4.15 contactEmail
1.6.4.16 landingPage
1.6.4.17 contactPerson (identifier or personName)
1.6.4.18 contactGroup (identifier or organizationName)
1.6.4.19 mailingListInfo
1.6.4.20 onlineHelpURL
1.6.4.21 issueTracker
1.6.4.22 mustBeCitedWith
1.6.4.23 resourceCreator (person or organization, described with identifier or name)
1.6.4.24 mediaType inside inputContentResourceInfo or outputResourceInfo
1.6.4.25 resourceType inside inputContentResourceInfo or outputResourceInfo
1.6.4.26 language inside inputContentResourceInfo or outputResourceInfo
1.6.4.27 characterEncoding inside inputContentResourceInfo or outputResourceInfo
1.6.4.28 mimeType inside inputContentResourceInfo or outputResourceInfo
1.6.4.29 dataFormatSpecific inside inputContentResourceInfo or outputResourceInfo
1.6.4.30 typesystem inside inputContentResourceInfo or outputResourceInfo
1.6.4.31 tagset inside inputContentResourceInfo or outputResourceInfo
1.6.4.32 annotationLevel inside inputContentResourceInfo or outputResourceInfo
1.6.4.33 typesystem inside componentDependencies
1.6.4.34 tagset inside componentDependencies
1.6.4.35 annotationResource inside componentDependencies
1.6.4.36 framework
1.6.4.37 relationType
1.6.4.38 relatedResource1
1.6.4.39 relatedResource2
1.7 The OMTD-SHARE metadata schema
1.8 Glossary


OpenMinTeD guidelines

Welcome to the OpenMinTeD Guidelines!

OpenMinTeD enables the creation of an infrastructure that fosters and facilitates the use of Text and Data Mining (TDM) technologies in the scientific publications world, builds on existing TDM tools and platforms, and renders them discoverable and interoperable through appropriate registries and a standards-based interoperability layer, respectively.

This is where you'll find information on:

● how to make your resources interoperable with other resources for TDM purposes

● how to register your resources at the OpenMinTeD platform (https://services.openminted.eu/)

● how to contribute to the guidelines.

TDM involves a wide range of resource types:

● the content resources to be mined, i.e. scholarly publications in the current phase,

● the TDM software and

● ancillary knowledge resources used for the operation of the software (e.g. annotation schemes, linguistic tagsets, lexical or ontological resources used for annotating the resources to be mined, machine learning models, annotated textual corpora).

Four guidelines are released targeting providers of these resources:

● Guidelines for providers of publications

● Guidelines for providers of corpora

● Guidelines for providers of software resources

● Guidelines for providers of knowledge resources

The OpenMinTeD platform serves as a facilitator of TDM in an ecosystem of e-infrastructures and repositories, collecting, transforming and making available resources only as needed for TDM purposes. In other words, it is not one more registry of content and services, and it does not seek to collect and provide information about every resource that might be of interest to TDM stakeholders.

Important notice

Resources are to be registered into OpenMinTeD only if they can be accessed and deployed in the context of a TDM processing operation.


Each set of guidelines contains the following information:

● a brief introduction, specifying the resources expected, potential sources, minimal requirements for the contributions

● packaging and registering instructions for the OpenMinTeD registry

● technical and metadata requirements that empower interoperability

● for each resource type, an overview of the OMTD-SHARE metadata schema (minimal level) with definitions, explanations, recommended usage and mappings to other popular metadata schemas

● further instructions per type of contributor or resource type/subtype where required.


Acknowledgements & Contributors

The guidelines have been the product of work carried out mainly in the OpenMinTeD WP5 Interoperability Framework. The following internal and external experts have exchanged ideas and participated in discussions that have formulated the interoperability requirements, which these guidelines purport to describe:

Internal experts

• Sophia Ananiadou (University of Manchester, UK)
• Lucas Anastasiou (Open University, UK)
• Sophie Aubin (INRA, France)
• Mouhamadou Ba (INRA, France)
• Kalina Bontcheva (University of Sheffield, UK)
• Robert Bossy (INRA, France)
• Jacob Carter (University of Manchester, UK)
• Louise Deléger (INRA, France)
• Giulia Dore (University of Glasgow, UK)
• Richard Eckart de Castilho (TU Darmstadt, Germany)
• Fred Fenter (Frontiers Media SA, Switzerland)
• Dimitris Galanis (Athena RC, Greece)
• Maria Gavriilidou (Athena RC, Greece)
• Patricia Geretto (INRA, France)
• Mark Greenwood (University of Sheffield, UK)
• Lucie Guibault (University of Amsterdam, Netherlands)
• Masoud Kiaeeha (TU Darmstadt, Germany)
• Petr Knoth (Open University, UK)
• Penny Labropoulou (Athena RC, Greece)
• Antonis Lempesis (Athena RC, Greece)
• Miguel Madrid (CNIO)
• Natalia Manola (Athena RC, Greece)
• Thomas Margoni (University of Glasgow, UK)
• John McNaught (University of Manchester, UK)
• Claire Nedellec (INRA, France)
• Wim Peters (University of Sheffield, UK)
• Stelios Piperidis (Athena RC, Greece)
• Prokopis Prokopidis (Athena RC, Greece)
• Piotr Przybyla (University of Manchester, UK)
• Angus Roberts (University of Sheffield, UK)
• Matt Shardlow (University of Manchester, UK)
• Mappet Walker (Frontiers Media SA, Switzerland)

External experts

• Giulia Ajmone Marsan (The Organisation for Economic Co-operation and Development)
• Enrique Alonso (Consejo de Estado)
• Geoffrey Bilder (CrossRef)
• Lukasz Bolikowski (University of Warsaw, Poland)
• Maurizio Borghi (Bournemouth University, UK)
• Steve Cassidy (Macquarie University Sydney, Australia)
• Christopher Cieri (LDC, USA)
• Christian Chiarcos (Goethe-Universität Frankfurt am Main, Germany)
• Liam Earney (JISC, UK)
• Kristofer Erickson (CREATe)
• Dominique Estival (Western Sydney University, Australia)
• Gwen Franck (Creative Commons, EIFL)
• Thilo Götz (IBM)
• Nancy Ide (Vassar College, USA)
• Pawel Kamocki (Institut für Deutsche Sprache, Germany)
• Andreas Kempf (Deutsche Zentralbibliothek für Wirtschaftswissenschaften, Germany)
• Jin-Dong Kim (Database Center for Life Science, Research Organization of Information and Systems)
• John McCrae (National University of Ireland, Galway, Ireland)
• Federico Morando (Nexa Center for Internet & Society, Italy)
• Eric Nyberg (Carnegie Mellon University, USA)
• Mark Perry (University of New England, Australia)
• Diane Peters (Creative Commons HQ)
• Rafal Rak (UberResearch, UK)
• Jochen Schirrwagen (Universität Bielefeld, Germany)
• Ineke Schuurman (CCL, University of Leuven)
• Peter Suber (Berkman Klein Center, Harvard University)
• Keith Suderman (Vassar College, LAPPS)
• Prodromos Tsiavos (The Media Institute)
• Paul Uhlir (National Academy of Sciences)
• Maarten van Gompel (Radboud University Nijmegen)
• Marc Verhagen (Brandeis University, LAPPS)
• Piek Vossen (VU University Amsterdam, Netherlands)
• Menzo Windhouwer (MPI for Psycholinguistics, Netherlands)
• Maarten Zeinstra (Kennisland)


Guidelines for providers of publications

Introduction
Instructions for publication repositories, libraries, publishers etc.
Instructions for aggregators
Further requirements for annotated publications
Recommended schema for publications


Introduction

OpenMinTeD facilitates the use of TDM technologies in the world of scientific publications, ranging from generic scholarly communication to literature related to specific disciplines. Scholarly publications come from a wide range of stakeholders, e.g. institutional and discipline repositories, academic journals, scientific publishers, etc. For the first phase, the focus is on literature repositories and publishers as regards sources, and on Open Access content as regards access conditions.

Important notice

It should be noted that only publications that provide the full text or, at least, an abstract are candidates for inclusion in OpenMinTeD.

OpenMinTeD relies on existing infrastructures and standards/best practices for its operation. Thus, to access scholarly publications, it relies on the two main aggregators of such content, OpenAIRE and CORE. Providers of scholarly publications are asked to contribute their resources by depositing them with one of these aggregators, following their respective guidelines and procedures. In addition, OpenAIRE and CORE are developing a content connector that allows harvesting of open access publications through the APIs of those publishers that allow this.

Scholarly publications are imported into OpenMinTeD for TDM processing via the creation of corpora upon queries submitted by the end-users. Researchers come to OpenMinTeD not to read publications, but to build a corpus by selecting publications from various sources based on specific criteria, e.g. "a corpus of English articles in the biomedicine area", in order to run TDM services on them.

OpenMinTeD has considered several architectural options for integrating existing content providers (such as, but not limited to, OpenAIRE and CORE) and has chosen an approach whereby content is managed in those external services but is accessible in the OpenMinTeD platform through a federated search strategy. Content is made available to the OpenMinTeD platform through a simple API that defines simple operations to search and retrieve content.
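The federated-search idea can be pictured with a minimal Python sketch. It assumes each provider is a plain callable that takes a query string and returns a list of record dicts, and that duplicate hits are merged on their "id"; the stub providers, field names and merge policy are illustrative assumptions, not the OpenMinTeD implementation.

```python
from typing import Callable, Dict, List

# Hypothetical provider type: takes a query string, returns metadata records.
Provider = Callable[[str], List[dict]]


def federated_search(query: str, providers: Dict[str, Provider]) -> List[dict]:
    """Send the same query to every registered provider and merge the hits.

    Content stays with each provider; only the result lists are combined.
    Records sharing an "id" are kept once and tagged with every provider
    that returned them (an assumed de-duplication policy).
    """
    merged: Dict[str, dict] = {}
    for name, search in providers.items():
        for record in search(query):
            entry = merged.setdefault(record["id"], {**record, "collectedFrom": []})
            entry["collectedFrom"].append(name)
    return list(merged.values())


# Two toy in-memory providers standing in for remote services.
def openaire_stub(q: str) -> List[dict]:
    return [{"id": "10.1234/a", "title": "Text mining A"}] if "mining" in q else []


def core_stub(q: str) -> List[dict]:
    if "mining" not in q:
        return []
    return [{"id": "10.1234/a", "title": "Text mining A"},
            {"id": "10.1234/b", "title": "Text mining B"}]


hits = federated_search("text mining", {"OpenAIRE": openaire_stub, "CORE": core_stub})
```

Here the record "10.1234/a" comes back once, tagged as collected from both OpenAIRE and CORE, while "10.1234/b" is tagged with CORE only.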

As one of the first steps of building a corpus of scholarly publications, end-users are expected to issue a query in the OpenMinTeD registry: they are presented with a faceted view of the OpenMinTeD contents (i.e. of all registered content providers) and, by selecting from a range of criteria, gradually build a query. Results from all registered content providers are presented to the end-user and, after refinement and careful elicitation of the final query, the associated content is transferred to OpenMinTeD's registry and becomes available for the subsequent steps of a TDM workflow. A lazy deposit/caching strategy is employed to avoid redundant queries: in simple terms, a record is fetched only the first time it is requested and then persists locally for further requests. Extra care is taken to ensure the reproducibility of the created corpus by storing the exact version of the content used in it.
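The lazy deposit/caching strategy can be sketched as follows. The sketch assumes a hypothetical fetch_remote callable for the provider and uses an in-memory dict in place of persistent local storage.

```python
from typing import Callable, Dict


class LazyRecordCache:
    """Fetch a record from the remote provider only on first request.

    Later requests are served from the local store, so repeated queries do
    not hit the provider again. A plain dict stands in for persistent local
    storage (an assumption for this sketch).
    """

    def __init__(self, fetch_remote: Callable[[str], bytes]):
        self._fetch_remote = fetch_remote  # hypothetical provider call
        self._store: Dict[str, bytes] = {}
        self.remote_calls = 0

    def get(self, record_id: str) -> bytes:
        if record_id not in self._store:
            # First request: fetch once and keep the exact bytes, so the
            # corpus stays reproducible even if the source changes later.
            self._store[record_id] = self._fetch_remote(record_id)
            self.remote_calls += 1
        return self._store[record_id]


cache = LazyRecordCache(lambda rid: f"<record id='{rid}'/>".encode("utf-8"))
first = cache.get("oai:example:1")
second = cache.get("oai:example:1")  # served locally, no second remote call
```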

Thus, a corpus included in the OpenMinTeD Registry essentially consists of a list of publications. Each publication is identified (as if by a primary key) by the hash value of its content (full-text PDF), and is described by a set of metadata files (in the OMTD-SHARE schema). In most cases, this set consists of just one item, but several metadata files may describe the same resource (for example, different metadata files from CORE or OpenAIRE, updates in metadata fields, richer metadata from a content provider, etc.).

The following sections present the instructions, requirements and recommendations that publications must meet in order to interact with TDM resources.
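A sketch of a corpus keyed by content hash, under two assumptions: the hash is SHA-256 (the guidelines only say "hash value"), and metadata files are referred to by file name. Adding the same bytes a second time attaches another metadata file to the existing entry instead of creating a new publication.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


def content_key(full_text: bytes) -> str:
    """Primary key of a publication: hash of its full-text content.

    SHA-256 is an assumption; the guidelines only speak of a hash value.
    """
    return hashlib.sha256(full_text).hexdigest()


@dataclass
class CorpusEntry:
    key: str
    metadata_files: List[str] = field(default_factory=list)  # OMTD-SHARE records


class Corpus:
    def __init__(self) -> None:
        self.entries: Dict[str, CorpusEntry] = {}

    def add(self, full_text: bytes, metadata_file: str) -> str:
        key = content_key(full_text)
        entry = self.entries.setdefault(key, CorpusEntry(key))
        # Same content seen again (e.g. via CORE after OpenAIRE): attach the
        # extra metadata record rather than duplicating the publication.
        if metadata_file not in entry.metadata_files:
            entry.metadata_files.append(metadata_file)
        return key


corpus = Corpus()
k1 = corpus.add(b"%PDF-1.4 same bytes", "openaire_record.xml")
k2 = corpus.add(b"%PDF-1.4 same bytes", "core_record.xml")
```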


Instructions for publication repositories, libraries, journals, publishers, etc.

How to register your resources
How to make your resources interoperable
How to document your resources


How to register your resources

If you wish to register publications that can be harvested for TDM purposes through OpenMinTeD, you can do so:

by registering through OpenAIRE, following the procedures and guidelines at: https://www.openaire.eu/validator/welcome.action

OR

by registering through CORE, following the procedures at: https://core.ac.uk/join

For a publication to be valid for import into OpenMinTeD, a metadata record conformant with the OMTD-SHARE minimal schema and a file with its contents must be delivered.


How to make your resources interoperable

To be fully compatible with OpenMinTeD, you must:

provide a file with the actual contents of each publication, in any format you choose (e.g. PDF, HTML, etc.).

In addition, if you wish your material to be easily processable and interoperable with TDM tools and services, you should adopt the following recommendations:

The preferred formats for delivering textual material are plain text, XML and PDF (non-proprietary, and certainly not PDFs of scanned images), which can be read by one of the existing readers.

If appropriate for your material, use one of the more specific data formats that are already supported by the readers and converters included in the OpenMinTeD registry (cf. dataFormatSpecific).

The preferred character encoding is UTF-8.

Please note that not all of the above requirements are absolute: if your material does not comply with them, it may still be processable, but adopting them makes it better equipped for TDM and NLP processing.
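As an illustration, a provider could run a quick self-check along these lines before depositing. Mapping the preferred formats to file extensions, and checking only the text formats for UTF-8, are simplifying assumptions; since the recommendations are not absolute, the function only reports warnings and rejects nothing.

```python
from pathlib import Path
from typing import List

# Preferred delivery formats named above (plain text, XML, PDF); mapping them
# to file extensions is an assumption of this sketch.
PREFERRED_EXTENSIONS = {".txt", ".xml", ".pdf"}


def interoperability_warnings(filename: str, data: bytes) -> List[str]:
    """Return advisory warnings; an empty list means all recommendations are met."""
    warnings = []
    suffix = Path(filename).suffix.lower()
    if suffix not in PREFERRED_EXTENSIONS:
        warnings.append(f"{suffix or 'no extension'}: not a preferred format")
    if suffix in {".txt", ".xml"}:
        # Only text formats are checked for the preferred UTF-8 encoding.
        try:
            data.decode("utf-8")
        except UnicodeDecodeError:
            warnings.append("text is not valid UTF-8")
    return warnings
```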


How to document your resources

To be fully compatible with OpenMinTeD, you must:

provide a metadata record for each publication with at least bibliographic information about it, preferably following the OpenAIRE guidelines;

ensure that the publications are distributed under Open Access conditions;

include in the metadata record of each publication a link to the licence document that describes the terms and conditions under which it is provided, and attach the licence document to the publication;

if you already have a PID for your publication (preferably a DOI), make sure it is included in the metadata record (cf. identifier for more information on identifier schemes).

The following recommendations will help interaction with your resources, but they are not mandatory.

Further adoption of standards, such as the JATS article tag suite or the TEI P5 guidelines, for annotating the inner structure of publications is recommended. Use standard classification vocabularies (such as MeSH, DDC, LCSH etc.) for adding classification tags to your material, and specify the vocabulary you use in the metadata record; provide at least one broad category for your material (e.g. life sciences, computing etc.). Wherever links to other resources or entities (e.g. persons, projects etc.) are added to the metadata records, please make them, to the extent possible, through unique and persistent identifiers from authority lists and sources, also documenting the authority and/or scheme they adhere to.


Instructions for aggregators

For the first phase of the project, OpenAIRE and CORE will bring content resources into OpenMinTeD through user queries. In later versions, interested content providers will be able to contribute directly to OpenMinTeD if they implement the following:

map the metadata of their contents to the OMTD-SHARE schema;
provide search capabilities on the metadata;
provide the actual content (e.g. the full text, in the case of publications).

More specific instructions are given in the following sections.

How to register your resources
How to document your resources


How to register your resources

Interested content providers must implement a Java interface called ContentConnector, which can be found at https://github.com/openminted/content-connector-api. The implementation is then included in the code of the ContentService of the OpenMinTeD platform. The interface specifies three methods:

search, which accepts a Query object describing a query and returns a page of metadata. This method is used for browsing the metadata of the provider and supports keyword search, advanced search in a number of fields, and faceted search. The result of the method is (a) a page (of user-specified size) of metadata, (b) the statistics of the results (total number of hits, etc.), and (c) the facets (if requested).

fetchMetadata, which accepts a Query but, unlike the previous method, returns all the metadata of the result, without any statistics or facets. The result is a stream containing a single XML element (called "publications"), which in turn contains all the metadata of the content. This method is called when a corpus is being built.

downloadFullText, which, given a publication identifier (as contained in the metadata), returns a stream containing the actual content. This method is also used when the platform is building a corpus.

Additional technical information is provided in the Java code of the interface.
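For orientation only, the three operations can be mirrored in a small Python analogue. This is not the ContentConnector API: the authoritative contract is the Java interface in the repository above, and the Query fields, the SearchResult shape, the snake_case method names and the per-record XML layout below are all hypothetical.

```python
import io
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Query:  # hypothetical stand-in for the Java Query object
    keyword: str = ""
    offset: int = 0
    page_size: int = 10
    facets: List[str] = field(default_factory=list)


@dataclass
class SearchResult:  # hypothetical: page of metadata + statistics + facets
    records: List[dict]
    total_hits: int
    facets: Dict[str, Dict[str, int]]


class InMemoryConnector:
    """Toy connector over record dicts with 'id', 'title' and 'language' keys."""

    def __init__(self, records: List[dict], full_texts: Dict[str, bytes]):
        self._records = records
        self._full_texts = full_texts

    def _matches(self, query: Query) -> List[dict]:
        kw = query.keyword.lower()
        return [r for r in self._records if kw in r["title"].lower()]

    def search(self, query: Query) -> SearchResult:
        hits = self._matches(query)
        page = hits[query.offset:query.offset + query.page_size]
        facets: Dict[str, Dict[str, int]] = {}
        for name in query.facets:  # count values over all hits, only if requested
            counts: Dict[str, int] = {}
            for r in hits:
                counts[r[name]] = counts.get(r[name], 0) + 1
            facets[name] = counts
        return SearchResult(page, len(hits), facets)

    def fetch_metadata(self, query: Query) -> io.BytesIO:
        # All matching metadata, with no paging, statistics or facets,
        # wrapped in a single <publications> element.
        root = ET.Element("publications")
        for r in self._matches(query):
            pub = ET.SubElement(root, "publication", id=r["id"])
            ET.SubElement(pub, "title").text = r["title"]
        return io.BytesIO(ET.tostring(root, encoding="utf-8"))

    def download_full_text(self, publication_id: str) -> Optional[io.BytesIO]:
        data = self._full_texts.get(publication_id)
        return io.BytesIO(data) if data is not None else None


conn = InMemoryConnector(
    [{"id": "p1", "title": "Text mining in biology", "language": "en"},
     {"id": "p2", "title": "Mining legal text", "language": "de"},
     {"id": "p3", "title": "Parsing", "language": "en"}],
    {"p1": b"full text of p1"},
)
```

The split mirrors the description above: search serves interactive browsing with paging and facets, while fetch_metadata and download_full_text are the bulk calls used when a corpus is built.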


How to document your resources

In the case of publications, the required metadata records come at two levels:

one for the whole query-generated corpus of publications, in compliance with the OMTD-SHARE schema for corpora, which is automatically constructed on the basis of the user filters and manually enriched by the user;

one per publication, with a minimal set of metadata elements in compliance with the OMTD-SHARE schema for publications, automatically converted from the providers' current schemas.

It should be noted that the original resource providers (e.g. publication repositories, publishers etc.) that offer publications via OpenAIRE and CORE do not have to change their current schemas. Mappings and conversions between the OpenAIRE[1] and CORE metadata and the OMTD-SHARE schema are made by the providers themselves in the framework of OpenMinTeD[2].

All metadata records for publications must be delivered in XML format.

[1] The OpenAIRE schema and guidelines are currently under revision; collaboration with the relevant actors has been established to take the new features into account and, where desired, to influence the changes so as to support TDM processes in accordance with the interoperability requirements.

[2] Mappings with other metadata schemas, including OpenAIRE and CORE, are included in the presentation of the recommended metadata schema.


Further requirements for annotated publications

Scholarly publications will normally be imported into the OpenMinTeD platform in an unprocessed format and will be annotated by TDM software that is also registered in the platform.

However, some providers may decide to run the TDM or annotation software on their own premises and upload the results of the processing directly into OpenMinTeD (e.g. annotating the publications with structural markup, extracting acknowledgement or citation sections etc.).

In these cases, the annotated output is considered a new resource and should therefore be registered as a resource separate from the raw publication, in a folder called "annotated files", with its own metadata record, following the instructions for annotated publications.

It should be noted that publications annotated by means of the OpenMinTeD platform are automatically assigned the appropriate values for these elements.


Recommended schema for publications

This section gives an overview of the recommended OMTD-SHARE schema for publications, i.e. the subset of M(andatory) and strongly R(ecommended) metadata elements. Only elements related to the description of the resource are presented here; additional elements required for the management of the metadata record (e.g. metadataCreationDate, metadataCreator etc.) are handled internally by the OpenMinTeD platform.

For annotated publications, see the corresponding schema for annotated publications.

OMTD-SHARE element: Usage (M = mandatory, R = recommended)

documentType: M
publicationType: M
identifier: M
title: M
licence or rightsStmtName & rightsStmtURL: M (one of the two must be provided)
nonStandardLicenceName: R when applicable
nonStandardLicenceTermsURL: M when applicable
version of licence: M
distributionMedium: M
downloadURL: M when applicable
documentLanguage: M
fullText: R
abstract: R
author: R
publisher: R
journal: R
mimeType: R
characterEncoding: R
publicationDate: R
subject: R
keyword: R
collectedFrom (repositoryName or repositoryIdentifier): R
sourceMetadataLink: R
originalDataProviderType: R
originalDataProviderRepository: R when applicable
originalDataProviderJournal: R when applicable
originalDataProviderPublisher: R when applicable
relationType: R
relatedResource1: M when applicable
relatedResource2: M when applicable
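A sketch of how a provider might check a record against the table before delivery. It assumes a hypothetical flat dict keyed by element name (real records are XML) and skips the conditional rows. Requiring "version of licence" only when a licence is present is an interpretation of the table, and "licenceVersion" is a hypothetical key name.

```python
from typing import Dict, List

# Elements marked plain "M" in the overview table. Conditional rows
# ("M when applicable") are not checked in this sketch.
MANDATORY = ["documentType", "publicationType", "identifier", "title",
             "distributionMedium", "documentLanguage"]


def missing_mandatory(record: Dict[str, str]) -> List[str]:
    """Return the mandatory elements that are absent or empty in `record`."""
    missing = [name for name in MANDATORY if not record.get(name)]
    has_licence = bool(record.get("licence"))
    has_rights = bool(record.get("rightsStmtName") and record.get("rightsStmtURL"))
    if not (has_licence or has_rights):
        missing.append("licence or rightsStmtName & rightsStmtURL")
    # "version of licence" is M in the table; here it is only required when a
    # licence is given. "licenceVersion" is a hypothetical key name.
    if has_licence and not record.get("licenceVersion"):
        missing.append("version of licence")
    return missing
```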


documentType

Usage

Mandatory

Type

Closed controlled vocabulary

Controlled vocabulary reference and/or values

ms-omtd:documentType: bibliographicRecordOnly, abstract, fullText

Definition/Explanations

Specifies whether the metadata record provides access to the full text or the abstract, or serves only as a bibliographic record (i.e. includes only metadata)

Recommended usage

Please select one of the values provided to indicate whether the metadata record includes the full text (either as a link or as a free-text field inside the record), the abstract (again, as a link or as a free-text description in a metadata element) or neither. If the record includes both the abstract and the full text, the preferred option is to select "fullText".

Relation to other metadata schemas

DCMI: skos:narrowMatch dct:type
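The selection rule, including the preference for "fullText" when both texts are present, fits in a few lines. The argument names are illustrative; each argument holds the text or a link to it, or None/"" if absent.

```python
from typing import Optional


def document_type(full_text: Optional[str], abstract: Optional[str]) -> str:
    """Pick the ms-omtd:documentType value for a metadata record.

    If both the full text and the abstract are present, "fullText" wins,
    as recommended above.
    """
    if full_text:
        return "fullText"
    if abstract:
        return "abstract"
    return "bibliographicRecordOnly"
```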


publicationType

Usage

Mandatory

Type

Open controlled vocabulary

Controlled vocabulary reference and/or values

ms-omtd:publicationType: article, bachelorThesis, masterThesis, doctoralThesis, book, bookPart, review, conferenceObject, lecture, workingPaper, prePrint, report, annotation, contributionToPeriodical, patent, inProceedings, booklet, manual, techReport, inCollection, unpublished, other

Definition/Explanations

Specifies the type of the publication (e.g. whether it is a journal article, or an oral paper or a poster in the proceedings of a conference, etc.)

Recommended usage

Please select one of the values from the list (compatible with the CASRAI research/scholarly output types, http://dictionary.casrai.org/Output_Types); if none of the values fits, please use "other".

Relation to other metadata schemas

OpenAIRE current version: computed from instanceType
OpenAIRE v4.0: dc:type
CORE: article.types
DCMI: skos:narrowMatch dct:type
DataCite 4.0: skos:closeMatch datacite:resourceTypeGeneral & datacite:resourceType; the recommended usage for publications is to use "text" for datacite:resourceTypeGeneral and one of the CASRAI values for datacite:resourceType (e.g. text/ConferenceObject)
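Because the vocabulary is open and unmatched types fall back to "other", normalising an incoming type string might look like this. Case-insensitive matching is an assumption of the sketch.

```python
# The ms-omtd:publicationType values listed above.
PUBLICATION_TYPES = {
    "article", "bachelorThesis", "masterThesis", "doctoralThesis", "book",
    "bookPart", "review", "conferenceObject", "lecture", "workingPaper",
    "prePrint", "report", "annotation", "contributionToPeriodical", "patent",
    "inProceedings", "booklet", "manual", "techReport", "inCollection",
    "unpublished", "other",
}


def publication_type(value: str) -> str:
    """Return the matching vocabulary value, or "other" if none fits."""
    lookup = {v.lower(): v for v in PUBLICATION_TYPES}
    return lookup.get(value.strip().lower(), "other")
```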


identifier

Usage

Mandatory

Type

Free text

Attributes

ms-omtd:publicationIdentifierSchemeName or ms-omtd:schemeURI

Definition/Explanations

Reference to a DOI (recommended) or any other kind of identifier used for the publication

Recommended usage

Provide a unique identifier already assigned by an authoritative source; the preferred identifier for publications is the DOI. To document the identifier scheme, use the attribute "publicationIdentifierSchemeName" and select one of the pre-defined values (e.g. DOI, ISBN etc.); if the scheme is not listed among them, select the "other" value and use the attribute "schemeURI" to provide a link to the URI that documents the scheme it adheres to.

Relation to other metadata schemas

OpenAIRE current version: doi/pmc/etc. identifiers
OpenAIRE v4.0: dc:identifier
CORE: article.id & article.identifiers
DCMI: skos:closeMatch dct:identifier
DataCite 4.0: skos:broadMatch datacite:identifier (identifierType can only be DOI); datacite:contributor with contributorType="ContactPerson", contributorName (familyName & givenName) or nameIdentifier and nameIdentifierScheme and schemeURI
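To help providers supply a clean DOI, here is a minimal normalisation sketch. It strips common resolver prefixes and applies the regular expression Crossref suggests for matching modern DOIs, which is a heuristic rather than a full syntax check. The prefix list is an assumption.

```python
import re
from typing import Optional

# Pattern Crossref suggests for matching the vast majority of modern DOIs;
# a heuristic, not a complete validator of the DOI syntax.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)

# Assumed list of prefixes commonly found in harvested records.
PREFIXES = ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/",
            "http://dx.doi.org/", "doi:")


def normalise_doi(raw: str) -> Optional[str]:
    """Strip a resolver/URI prefix and return the bare DOI, or None if it does not match."""
    value = raw.strip()
    for prefix in PREFIXES:
        if value.lower().startswith(prefix):
            value = value[len(prefix):]
            break
    return value if DOI_PATTERN.match(value) else None
```

The bare DOI is the value one would place in the identifier element, together with the DOI value of the "publicationIdentifierSchemeName" attribute.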


title

Usage

Mandatory

Type

Multilingual free text

Attributes

xs:lang and ms-omtd:titleType

Definition/Explanations

The title of the publication

Recommended usage

Please provide the title as it appears in the original metadata record; the "lang" attribute can be used to specify the language of the title, and the "titleType" attribute (after DataCite) to differentiate between main title, alternative or translated title, and subtitle.

Relation to other metadata schemas

OpenAIRE current version: title
OpenAIRE v4.0: dc:title
CORE: article.title
DCMI: skos:exactMatch dct:title
DataCite 4.0: skos:exactMatch datacite:title


licence

Usage

Mandatory under conditions

Conditions for usage

Either licence or rightsStmt must be filled in

Type

Open controlled vocabulary

Controlled vocabulary reference and/or values

ms:licence: CC-BY, CC-BY-NC, CC-BY-NC-ND, CC-BY-NC-SA, CC-BY-ND, CC-BY-SA, CC-ZERO, PDDL, ODC-BY, ODbL, MS-NoReD, MS-NoReD-FF, MS-NoReD-ND, MS-NoReD-ND-FF, MS-NC-NoReD, MS-NC-NoReD-FF, MS-NC-NoReD-ND, MS-NC-NoReD-ND-FF, ELRA_END_USER, ELRA_EVALUATION, ELRA_VAR, CLARIN_PUB, CLARIN_ACA, CLARIN_ACA-NC, CLARIN_RES, AGPL, ApacheLicence_2.0, BSD_4-clause, BSD_3-clause, FreeBSD, GFDL, GPL, LGPL, MIT, Princeton_Wordnet, proprietary, underNegotiation, nonStandardLicenceTerms

Definition/Explanations

The licence under which the resource may be used

Recommended usage

You can provide information on the rights of accessing and using a resource in one of the following ways, in order of preference:

Use the element "licence" and select one of the recommended licences. Please note that the list mixes licences intended for data resources and for components: for components, the recommended licences are the Open Source licences; for data resources, please use a standard licence such as one of the CC family. If the licence you use is not included in the list, you can use the "nonStandardLicenceTerms" or the "proprietary" value and give further information on your licence in the elements "nonStandardLicenceName", "nonStandardLicenceTermsURL" and "nonStandardLicenceTermsText".

You can also use the "rightsStmtName" and "rightsStmtURL" elements (the latter linking to a URL with more explanations on usage) if the resource is provided with a general statement of use rather than an official licence document. Please note that this option is intended mainly to help end-users access your resource; you are strongly advised to license your resource properly.

For publications harvested from OpenAIRE and CORE, please provide the original licence value if it was included in the original metadata record; in any case, the "rightsStmtName" element must additionally be used for all publications.

Relation to other metadata schemas

OpenAIRE current version: bestlicense provides info for NonStandardLicenceTerms and RightsStatementInfo
OpenAIRE v4.0: dc:rights & file/dc:accessRights
DCMI: skos:closeMatch dct:license
DataCite 4.0: skos:closeMatch datacite:rights
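The order of preference above can be sketched as a small helper for harvested publications. Only a few of the ms:licence values are included, the output is a hypothetical flat dict, and "openAccess" as the default rights statement is an assumption.

```python
from typing import Dict, Optional

# A few of the ms:licence values listed above (not the full list).
STANDARD_LICENCES = {"CC-BY", "CC-BY-SA", "CC-BY-NC", "CC-ZERO", "ODC-BY", "MIT", "GPL"}


def rights_fields(licence: Optional[str], licence_url: Optional[str],
                  access: str = "openAccess") -> Dict[str, str]:
    """Build the rights-related elements for a harvested publication.

    Preference: a recommended licence first; otherwise a non-standard licence
    described by name and URL. rightsStmtName is always added, since it must
    be used for all publications.
    """
    fields = {"rightsStmtName": access}
    if licence in STANDARD_LICENCES:
        fields["licence"] = licence
    elif licence:
        fields["licence"] = "nonStandardLicenceTerms"
        fields["nonStandardLicenceName"] = licence
        if licence_url:
            fields["nonStandardLicenceTermsURL"] = licence_url
    return fields
```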


rightsStmtName

Usage

Mandatory under conditions

Conditions for usage

Either licence or rightsStmt must be filled in

Type

Open controlled vocabulary

Controlled vocabulary reference and/or values

ms-omtd:rightsStmtName: openAccess, closedAccess, embargoedAccess, restrictedAccess

Definition/Explanations

The name of an official statement indicative of the licensing terms for the use of a resource (e.g. open access, free to read etc.); its semantics should be clear, preferably formally expressed and stored at a URL.

The current list of predefined values comes from OpenAIRE, but it is under revision.

Recommended usage

The "rightsStmtName" and "rightsStmtURL" elements can be used in addition to the "licence" value to help users understand the licensing terms of a resource.

Relation to other metadata schemas

OpenAIRE current version: conversion from bestlicence classname
DCMI: skos:closeMatch dct:accessRights
DataCite 4.0: skos:closeMatch datacite:rights


rightsStmtURL

Usage

Mandatory under conditions

Conditions for usage

Either licence or rightsStmt must be filled in

Type

URL pattern

Definition/Explanations

Link to the URL with the text that formally explains the licensing conditions imposed by the rights statement.

Recommended usage

The "rightsStmtName" and "rightsStmtURL" elements can be used in addition to the "licence" value to help users understand the licensing terms of a resource.

Relation to other metadata schemas

OpenAIRE current version: http://api.openaire.eu/vocabularies/dnet:access_modes
DCMI: skos:closeMatch dct:accessRights
DataCite 4.0: skos:closeMatch datacite:rightsURI


nonStandardLicenceName

Usage

Mandatory under conditions

Conditions for usage

To be used with ms:licence "other" (i.e. for licences not included in the list of recommended ones)

Type

Free text

Definition/Explanations

The name by which a licence is known; to be used for licences not included in the pre-defined list of recommended licences

Recommended usage

Please provide the name of the licence if it is already known, or supply one that uniquely identifies it.

Relation to other metadata schemas

OpenAIRE current version: bestlicense
DCMI: skos:closeMatch dct:title (for dct:LicenseDocument)


nonStandardLicenceTermsURL

Usage

Mandatory under conditions

Conditions for usage

To be used with ms:licence "other" (i.e. for licences not included in the list of recommended ones)

Type

URL pattern

Definition/Explanations

Used to provide a hyperlink to a URL containing the text of a licence not included in the predefined list, or describing the terms of use for a language resource or the terms of service for web services

Recommended usage

Please provide the link to the full-text document of the licence. This is the preferred option over inserting the licence text in the element "nonStandardLicenceTermsText", as it provides a permanent location for the licence that is accessible to all.

Relation to other metadata schemas

OpenAIRE current version: bestlicense classid
DCMI: skos:closeMatch dct:license
DataCite 4.0: skos:closeMatch datacite:rightsURI


version of licence

Usage

Mandatory

Type

Free text

Definition/Explanations

The version of the licence

Recommended usage

You are advised to indicate the version of the licence of your resource; the latest version is the preferred option, e.g. "4.0" for all CC licences and "2.0" for the META-SHARE NoReD ones.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:hasVersion (for dct:LicenseDocument)


distributionMedium

Usage

Mandatory

Type

Open controlled vocabulary

Controlled vocabulary reference and/or values

ms:distributionMedium: webExecutable, paperCopy, hardDisk, bluRay, DVD-R, CD-ROM, downloadable, accessibleThroughInterface, other

Definition/Explanations

Specifies the medium (channel) used for delivering or providing access to the resource

Recommended usage

Please use one of the provided values to indicate the medium of distribution. For publications harvested from OpenAIRE and CORE, the default value is "downloadable" if the documentType is "abstract" or "fullText". Please note that if the publication is distributed in different media under different terms of use or licences, you can repeat the whole set of elements ("distributionInfo") to describe each of them.

Relation to other metadata schemas

OpenAIRE v4.0: distributionInfo is related to webresource or url
DCMI: skos:closeMatch dct:medium
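The default rule for harvested publications can be written as a small function. Returning None where the guidelines name no default, and leaving that choice to the provider, is an interpretation rather than a stated rule.

```python
from typing import Optional


def default_distribution_medium(document_type: str, harvested_from: str) -> Optional[str]:
    """Default ms:distributionMedium for publications harvested from OpenAIRE or CORE."""
    if harvested_from in {"OpenAIRE", "CORE"} and document_type in {"abstract", "fullText"}:
        return "downloadable"
    return None  # no default stated; the provider chooses a value
```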


downloadURL

Usage

Recommended under conditions

Conditions for usage

If distributionMedium = downloadable

Definition/Explanations

Any URL from which the resource can be downloaded

Recommended usage

Please use this element for publications whose actual content is not already uploaded in OpenMinTeD; in this case, please ensure that the URL leads to the actual content of the publication and not to a landing page. For publications harvested from OpenAIRE and CORE, the full content must be uploaded in OpenMinTeD according to the approved guidelines for user-built corpora of publications.

Relation to other metadata schemas

OpenAIRE current version: url
CORE: article.fulltextURLs


documentLanguage

Usage

Recommended

Type

Open controlled vocabulary

Controlled vocabulary reference and/or values

ms-omtd:documentLanguage (a combination of languageId, scriptId, regionId and variantId according to the IETF BCP 47 guidelines)

Definition/Explanations

The language in which the document is written, according to the IETF BCP 47 guidelines

Recommended usage

Please enter the language and, if needed, the region, script and variant identifiers that best fit the language of the document (e.g. en-US), according to the IETF BCP 47 guidelines.

Relation to other metadata schemas

OpenAIRE current version: language (to be mapped from ISO 639-2 three-letter codes to the BCP 47 values used here)
OpenAIRE v4.0: dc:language
CORE: article.language
DCMI: skos:closeMatch dct:language
DataCite 4.0: skos:closeMatch datacite:Language
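Because the current OpenAIRE records carry ISO 639-2 three-letter codes, a mapping step to BCP 47 is needed. The sketch below uses a small, hand-picked subset of the code table (a real implementation would load the full ISO 639 tables) and assembles subtags in BCP 47 order: language, then script, then region.

```python
from typing import Dict, Optional

# Illustrative subset of ISO 639-2 codes (bibliographic and terminological
# forms) mapped to the two-letter ISO 639-1 codes that BCP 47 prefers.
ISO639_2_TO_1: Dict[str, str] = {
    "eng": "en", "fre": "fr", "fra": "fr", "ger": "de", "deu": "de",
    "gre": "el", "ell": "el", "spa": "es", "ita": "it", "dut": "nl", "nld": "nl",
}


def to_bcp47(code: str, region: Optional[str] = None, script: Optional[str] = None) -> str:
    """Turn an ISO 639-2 code into a BCP 47 tag such as "en-US".

    Codes missing from the small table above are passed through unchanged
    (lower-cased); BCP 47 accepts three-letter language subtags when no
    two-letter code exists.
    """
    lang = code.strip().lower()
    parts = [ISO639_2_TO_1.get(lang, lang)]
    if script:
        parts.append(script.title())  # script subtags are title case, e.g. "Latn"
    if region:
        parts.append(region.upper())  # region subtags are upper case, e.g. "US"
    return "-".join(parts)
```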


fullText

Usage

Recommended

Type

Free text

Attributes

xs:lang

Definition/Explanations

The full text of the publication in plain-text format

Recommended usage

You can use this metadata element to include the full text of the publication in plain-text format instead of uploading it as a separate file.

Relation to other metadata schemas

OpenAIRE v4.0: file/objectType
CORE: article.fulltext


abstract

Usage

Recommended

Type

Free text

Attributes

xs:lang

Definition/Explanations

The abstract of the document in plain-text format

Recommended usage

You can use this metadata element to include the abstract of the publication in plain-text format; the element can be repeated for different language versions, using the "lang" attribute to specify the language.

Relation to other metadata schemas

OpenAIRE current version: dc:description
OpenAIRE v4.0: dc:description
CORE: article.description
DCMI: skos:exactMatch dct:abstract
DataCite 4.0: skos:exactMatch datacite:description with the value "abstract" for datacite:descriptionType


author

Usage

Recommended

Type

Identifier or multilingual free text

Attributes

ms:personIdentifierSchemeName (for identifiers) or xs:lang (for names)

Definition/Explanations

Groups information on the person(s) who authored the publication

Recommended usage

The recommended way of referring to a person is by giving their identifier, preferably their ORCID. If you provide the identifier, please also select the relevant value from the list of values in the attribute "personIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the person, you may provide the name, preferably in the format "Surname, Firstname", at least in English; if you want to add names in other languages, you can use the "lang" attribute. The element can also be repeated to encode multiple persons.

Relation to other metadata schemas

OpenAIRE current version: rels/rel
OpenAIRE v4.0: datacite:creator
CORE: article.authors
DCMI: skos:closeMatch dct:creator
DataCite 4.0: skos:closeMatch datacite:Creator with datacite:creatorName (familyName & givenName) or datacite:nameIdentifier & datacite:nameIdentifierScheme & datacite:schemeURI
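When an ORCID iD is supplied, its checksum can be verified before it is written into the record. ORCID iDs end in an ISO 7064 MOD 11-2 check character. The sketch assumes the bare 16-character form without the https://orcid.org/ prefix.

```python
import re

ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character over the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the format and checksum of a bare ORCID iD."""
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:-1]) == digits[-1]
```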


publisher

Usage

Recommended

Type

person or organization, both encoded with an identifier or multilingual free text

Attributes

for person: ms:personIdentifierSchemeName (for identifiers) or xs:lang (for name); for organization: ms:organizationIdentifierSchemeName (for identifiers) or xs:lang (for name)

Definition/Explanations

Groups information on the person(s) or organization(s) that published the publication

Recommended usage

The recommended way to refer to a person is by giving their identifier, preferably their ORCID iD. If you provide the identifier, please also select the relevant value from the list of values of the attribute "personIdentifierSchemeName". If none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the person's identifier, you may provide their name, preferably in the format "Surname, Firstname", at least in English; to add names in other languages, use the "lang" attribute. The recommended way to refer to an organization is by giving its identifier (e.g. ISNI). If you provide the identifier, please also select the relevant value from the list of values of the attribute "organizationIdentifierSchemeName". If none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the organization's identifier, you may provide its name, at least in English; to add names in other languages, use the "lang" attribute. The element can also be repeated to encode multiple persons/organizations.

Relation to other metadata schemas

OpenAIRE current version: publisher

OpenAIRE v4.0: dc:publisher

CORE: article.publisher

DCMI: skos:exactMatch dct:publisher

DataCite 4.0: skos:exactMatch datacite:Publisher
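The same identifier-or-name rule recurs for persons, organizations, journals, repositories and related resources. You give an identifier plus its scheme name, or use "other" plus a schemeURI as a fallback, or give a name with a language tag. The Python sketch below expresses that rule as a small validation routine. The `EntityReference` class and its field names are hypothetical: they model the rule, not the OMTD-SHARE XML serialization.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EntityReference:
    """Hypothetical model of an identifier-or-name reference."""
    identifier: Optional[str] = None
    scheme_name: Optional[str] = None  # e.g. "ORCID", "ISNI" or "other"
    scheme_uri: Optional[str] = None   # needed when scheme_name is "other"
    name: Optional[str] = None         # fallback when no identifier is known
    lang: str = "en"                   # names should exist at least in English


def reference_problems(ref: EntityReference) -> list[str]:
    """Return rule violations; an empty list means the reference is usable."""
    issues = []
    if ref.identifier:
        if not ref.scheme_name:
            issues.append("identifier given without a scheme name")
        elif ref.scheme_name == "other" and not ref.scheme_uri:
            issues.append('scheme "other" requires a schemeURI')
    elif not ref.name:
        issues.append("neither an identifier nor a name is given")
    return issues
```

A reference that carries only a name passes, which reflects the guidelines' position that a name is an acceptable, if less interoperable, fallback.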

journal

Usage

Mandatory if applicable

Conditions for usage

If the article comes from a journal

Type

identifier or multilingual free text

Attributes

ms-omtd:journalIdentifierSchemeName (for identifiers) or xs:lang (for title)

Definition/Explanations

Groups information on the journal in which the publication has appeared

Recommended usage

The recommended way to refer to a journal is by giving its identifier (e.g. ISSN, DOI). If you provide the identifier, please also select the relevant value from the list of values of the attribute "journalIdentifierSchemeName". If none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the journal's identifier, you may provide its title, at least in English; to add titles in other languages, use the "lang" attribute.

Relation to other metadata schemas

OpenAIRE current version: journal

CORE: article.journals

DCMI: skos:exactMatch dct:title (for journals)
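ISSN is one of the suggested journal identifiers. An ISSN ends in a check character computed modulo 11 over the first seven digits, with weights 8 down to 2 (ISO 3297). A mistyped ISSN can therefore usually be detected before "ISSN" is selected as the journalIdentifierSchemeName. This is a standalone Python sketch, and the function name is illustrative.

```python
import re

# Four digits, a hyphen, three digits and a check character (0-9 or X).
ISSN_PATTERN = re.compile(r"^\d{4}-\d{3}[\dX]$")


def is_valid_issn(value: str) -> bool:
    """Validate an ISSN: weights 8..2 on the first seven digits, modulus 11."""
    issn = value.strip().upper()
    if not ISSN_PATTERN.match(issn):
        return False
    digits = issn.replace("-", "")
    total = sum(int(d) * w for d, w in zip(digits[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return digits[7] == expected
```

For instance, `is_valid_issn("0317-8471")` returns True and `is_valid_issn("0317-8472")` returns False.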

mimeType

Usage

Recommended

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

ms:mimetype (a subset of the IANA media type vocabulary, limited to the values most popular for text files): text/plain, application/vnd.xmi+xml, text/xml, application/x-tmx+xml, application/x-xces+xml, application/tei+xml, application/rdf+xml, application/xhtml+xml, application/emma+xml, application/pls+xml, application/postscript, application/voicexml+xml, text/sgml, text/html, application/x-tex, application/rtf, application/json+ld, application/x-latex, text/csv, text/tab-separated-values, application/pdf, application/x-msaccess, audio/mp4, audio/mpeg, audio/wav, image/bmp, image/gif, image/jpeg, image/png, image/svg+xml, image/tiff, video/jpeg, video/mp4, video/mpeg, video/x-flv, video/x-msvideo, video/x-ms-wmv, application/msword, application/vnd.ms-excel, audio/mpeg3, text/turtle, other, audio/PCMA, audio/flac, audio/speex, audio/vorbis, video/mp2t

Definition/Explanations

The mime-type of the resource (a formalized specifier for its format) or a mime-type that the component accepts, in conformance with the values registered by IANA (Internet Assigned Numbers Authority)

Recommended usage

Please select one of the predefined values (which are the most popular ones for text files) or add a value, preferably one of the media types registered by IANA (http://www.iana.org/assignments/media-types/media-types.xhtml).

Relation to other metadata schemas

OpenAIRE v4.0: format & file/mimetype

DCMI: skos:closeMatch dct:format

DataCite 4.0: skos:closeMatch datacite:Format
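A submission tool might check a mimeType value in three steps:

- Is it one of the listed values?
- If not, is it at least a well-formed type/subtype string, which should then be checked against the IANA registry?
- Otherwise, it is invalid.

The Python sketch below copies only a few of the listed values. Its syntax check is simplified and loosely based on the RFC 6838 naming rules. It also uses the standard `mimetypes` module to suggest a value from a file extension. All function names are illustrative.

```python
import mimetypes
import re
from typing import Optional

# A few values copied from the ms:mimetype list (not the full vocabulary).
LISTED_MIMETYPES = {
    "text/plain", "text/xml", "text/html", "text/csv", "application/pdf",
    "application/tei+xml", "application/xhtml+xml", "application/vnd.xmi+xml",
}

# Simplified type/subtype syntax, loosely following RFC 6838 naming rules.
MIME_SYNTAX = re.compile(r"^[a-z0-9][a-z0-9!#$&^_.+-]*/[a-z0-9][a-z0-9!#$&^_.+-]*$")


def classify_mimetype(value: str) -> str:
    """Return 'listed', 'custom' (well-formed, check IANA) or 'invalid'."""
    value = value.strip().lower()
    if value in LISTED_MIMETYPES:
        return "listed"
    if MIME_SYNTAX.match(value):
        return "custom"
    return "invalid"


def guess_mimetype(filename: str) -> Optional[str]:
    """Suggest a value from the file extension using the standard library."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed
```

A "custom" result is allowed by the open vocabulary, but it is the case where a human should confirm the value in the IANA registry.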

characterEncoding

Usage

Recommended

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

ms:characterEncoding: a long list of popular character encodings

Definition/Explanations

The name of the character encoding used in the resource or accepted by the component

Recommended usage

Please select one of the predefined values. Note, however, that UTF-8 is the preferred character encoding for OpenMinTeD, as it ensures interoperability between content and components.
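Because UTF-8 is the preferred encoding, a provider may want to verify that content really is UTF-8 and convert it if not. This Python sketch relies only on the standard `codecs` module and `bytes.decode`. Python's canonical codec names (e.g. `iso8859-1`) are not necessarily the labels used in the ms:characterEncoding list.

```python
import codecs


def canonical_encoding(name: str) -> str:
    """Normalize an encoding label via Python's codec registry (e.g. 'UTF8' -> 'utf-8')."""
    return codecs.lookup(name).name


def is_utf8(data: bytes) -> bool:
    """True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode content from its declared source encoding to UTF-8."""
    return data.decode(source_encoding).encode("utf-8")
```

Note that a successful UTF-8 decode does not prove the text was intended as UTF-8 (pure ASCII passes, for example), but a failed decode reliably shows that it is not UTF-8.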

publicationDate

Usage

Recommended

Type

date pattern (year, year and month, or full date)

Definition/Explanations

The publication date or, for an unpublished work, the date it was written

Recommended usage

If possible, provide at least the year of publication (or creation)

Relation to other metadata schemas

OpenAIRE current version: dateofacceptance

OpenAIRE v4.0: datacite:date with dateType: accepted

CORE: Article.datePublished

DCMI: skos:closeMatch dct:created

DataCite 4.0: skos:closeMatch datacite:CreationDate
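The type is described only as a date pattern with year, year-and-month or full-date precision. The Python sketch below assumes an ISO-style serialization (YYYY, YYYY-MM, YYYY-MM-DD, as in the XML Schema types gYear, gYearMonth and date); this is not taken from the schema. Under that assumption, it checks a value and reports its precision. The function name is illustrative.

```python
import re
from datetime import datetime
from typing import Optional

# Assumed serializations: YYYY, YYYY-MM or YYYY-MM-DD.
DATE_PATTERN = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")
FORMATS = {4: ("%Y", "year"), 7: ("%Y-%m", "year and month"), 10: ("%Y-%m-%d", "full date")}


def date_precision(value: str) -> Optional[str]:
    """Return the precision of a valid date string, or None if it is invalid."""
    value = value.strip()
    if not DATE_PATTERN.match(value):
        return None
    fmt, label = FORMATS[len(value)]
    try:
        datetime.strptime(value, fmt)  # rejects month 13, 30 February, etc.
    except ValueError:
        return None
    return label
```

The regular expression enforces the exact shape, because `strptime` alone would also accept forms such as `2017-3`. `strptime` then rejects calendar-invalid values.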

subject

Usage

Recommended

Type

free text

Attributes

ms:classificationSchemeName and ms:schemeURI

Definition/Explanations

Subject or topic of the document

Recommended usage

It is recommended that subjects are taken from an authoritative source, such as DDC (Dewey Decimal Classification, http://www.oclc.org/dewey/) or UDC (Universal Decimal Classification, http://www.udcc.org/), and that the source is identified. If so, please use the "classificationSchemeName" attribute to indicate the source. If the source is not included in that attribute's list of values, please use "schemeURI" with a link to a URL with more information on the scheme. The recommended way to encode a subject value is by its identifier in the scheme; further instructions on standardizing the format will be provided.

Relation to other metadata schemas

OpenAIRE current version: subject with schemeid & schemename (after mapping to our values)

OpenAIRE v4.0: dc:subject

CORE: article.subjects & article.topics

DCMI: skos:narrowMatch dct:subject

DataCite 4.0: skos:exactMatch datacite:Subject with datacite:subjectScheme, datacite:schemeURI and datacite:valueURI
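When subjects are taken from DDC, the first digit of the class number already gives a broad category. The Python sketch below first checks the basic shape of a DDC number: three digits, optionally followed by a decimal extension. It then maps the number to one of the ten DDC main classes. It does not verify that the number exists in the current DDC edition, and the function name is illustrative.

```python
import re
from typing import Optional

# The ten main classes of the Dewey Decimal Classification (DDC 23 summaries).
DDC_MAIN_CLASSES = {
    "0": "000 Computer science, information & general works",
    "1": "100 Philosophy & psychology",
    "2": "200 Religion",
    "3": "300 Social sciences",
    "4": "400 Language",
    "5": "500 Science",
    "6": "600 Technology",
    "7": "700 Arts & recreation",
    "8": "800 Literature",
    "9": "900 History & geography",
}

# Three digits, optionally followed by a decimal extension (e.g. 572.8).
DDC_NOTATION = re.compile(r"^\d{3}(\.\d+)?$")


def ddc_broad_category(notation: str) -> Optional[str]:
    """Map a DDC class number such as '572.8' to its main class, or None."""
    notation = notation.strip()
    if not DDC_NOTATION.match(notation):
        return None
    return DDC_MAIN_CLASSES[notation[0]]
```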

keyword

Usage

Recommended

Type

free text

Definition/Explanations

Words used for indexing the document

Recommended usage

A free-text element used for encoding keywords for the classification of the publication, in English only. Please encode one word or phrase per element and repeat the element for multiple keywords.

Relation to other metadata schemas

OpenAIRE current version: subject with classid equal to keyword

DCMI: skos:narrowMatch dct:subject
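Source systems often deliver keywords as one delimited string, whereas the guidelines ask for one keyword per element. The Python sketch below assumes a semicolon-delimited input. It splits, trims and deduplicates the keywords, then emits one element per keyword with the standard `xml.etree.ElementTree` module. The bare `keyword` element, without namespace or parent, is a simplification of the real record structure.

```python
import xml.etree.ElementTree as ET


def split_keywords(raw: str, separator: str = ";") -> list[str]:
    """Split a delimited string into trimmed keywords, dropping case-insensitive duplicates."""
    seen = set()
    keywords = []
    for part in raw.split(separator):
        keyword = part.strip()
        if keyword and keyword.lower() not in seen:
            seen.add(keyword.lower())
            keywords.append(keyword)
    return keywords


def keyword_elements(keywords: list[str]) -> list[ET.Element]:
    """Build one <keyword> element per keyword (repeated, never merged)."""
    elements = []
    for keyword in keywords:
        element = ET.Element("keyword")
        element.text = keyword
        elements.append(element)
    return elements
```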

collectedFrom (repositoryName or repositoryIdentifier)

Usage

Recommended

Type

identifier (repositoryIdentifier) or multilingual free text (repositoryName)

Attributes

ms-omtd:repositoryIdentifierSchemeName (for identifiers) or xs:lang (for title)

Definition/Explanations

Refers to the entity (repository, aggregator etc.) from which the metadata record has been harvested into OMTD

Recommended usage

The recommended way to refer to a repository is by giving its identifier (e.g. OpenDOAR). If you provide the identifier, please also select the relevant value from the list of values of the attribute "repositoryIdentifierSchemeName". If none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the repository's identifier, you may provide its name, at least in English; to add names in other languages, use the "lang" attribute.

Relation to other metadata schemas

OpenAIRE v4.0: dc:source

sourceMetadataLink

Usage

Recommended

Type

URL pattern

Definition/Explanations

A link to the original metadata record, in cases of harvesting

Recommended usage

This element can be filled in automatically by OMTD when records are harvested.

Relation to other metadata schemas

CORE: article.id

DCMI: skos:narrowMatch dct:source
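The "URL pattern" type is not specified further here. The Python sketch below assumes that only absolute http or https links are acceptable and checks for that with the standard `urllib.parse` module. The same check would apply to documentationURL, and the function name is illustrative.

```python
from urllib.parse import urlparse


def is_http_url(value: str) -> bool:
    """True for an absolute http(s) URL with a host part."""
    parsed = urlparse(value.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

This checks syntax only; it does not test whether the link resolves.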

originalDataProviderType

Usage

Recommended

Type

closed controlled vocabulary

Controlled vocabulary reference and/or values

ms-omtd:originalDataProviderType: repository, journal, publisher

Definition/Explanations

Refers to the type of the original data provider (repository/journal/publisher). It applies when the metadata record carries information taken from previous repositories/journals/publishers, e.g. when the OMTD record's source is an aggregator.

Recommended usage

Please select one of the predefined values as appropriate. For records harvested from OpenAIRE and CORE, use this element to record the type of the original data provider (i.e. the repository, journal or publisher) from which OpenAIRE or CORE themselves harvested the record.

Relation to other metadata schemas

OpenAIRE current version: has to be computed from the identifier of collectedFrom in OpenAIRE

originalDataProviderRepository

Usage

Recommended under conditions

Conditions for usage

if originalDataProviderType = repository

Type

identifier or multilingual free text

Attributes

ms-omtd:repositoryIdentifierSchemeName (for identifiers) or xs:lang (for title)

Definition/Explanations

Refers to the entity (repository, aggregator etc.) from which the metadata record has been harvested

Recommended usage

The recommended way to refer to a repository is by giving its identifier (e.g. from OpenDOAR, re3data etc.). If you provide the identifier, please also select the relevant value from the list of values of the attribute "repositoryIdentifierSchemeName". If none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the repository's identifier, you may provide its name, at least in English; to add names in other languages, use the "lang" attribute.

Relation to other metadata schemas

OpenAIRE current version: collectedFrom

CORE: article.repositories

DCMI: skos:narrowMatch dct:source

originalDataProviderJournal

Usage

Recommended under conditions

Conditions for usage

if originalDataProviderType = journal

Type

identifier or multilingual free text

Attributes

ms-omtd:journalIdentifierSchemeName (for identifiers) or xs:lang (for title)

Definition/Explanations

Refers to the journal from which the metadata record has been harvested

Recommended usage

The recommended way to refer to a journal is by giving its identifier (e.g. ISSN, DOI). If you provide the identifier, please also select the relevant value from the list of values of the attribute "journalIdentifierSchemeName". If none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the journal's identifier, you may provide its title, at least in English; to add titles in other languages, use the "lang" attribute.

Relation to other metadata schemas

OpenAIRE current version: collectedFrom

CORE: article.journals

DCMI: skos:narrowMatch dct:source

originalDataProviderPublisher

Usage

Recommended under conditions

Conditions for usage

if originalDataProviderType = publisher

Type

organization encoded with an identifier or multilingual free text

Attributes

ms:organizationIdentifierSchemeName (for identifiers) or xs:lang (for name)

Definition/Explanations

Refers to the publisher from which the metadata record has been harvested

Recommended usage

The recommended way to refer to an organization is by giving its identifier (e.g. ISNI). If you provide the identifier, please also select the relevant value from the list of values of the attribute "organizationIdentifierSchemeName". If none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the organization's identifier, you may provide its name, at least in English; to add names in other languages, use the "lang" attribute.

Relation to other metadata schemas

OpenAIRE current version: collectedFrom

DCMI: skos:narrowMatch dct:source
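Which of the three originalDataProvider* elements applies depends on the value of originalDataProviderType. The Python sketch below shows how a harvesting pipeline could check that dependency on a record. The flat dictionary record format is hypothetical. Flagging a conditional element that does not match the type is a stricter rule than the guidelines state, and it is included here only as an option.

```python
PROVIDER_TYPES = {"repository", "journal", "publisher"}  # closed vocabulary

# The conditional element that becomes recommended for each provider type.
CONDITIONAL_ELEMENT = {
    "repository": "originalDataProviderRepository",
    "journal": "originalDataProviderJournal",
    "publisher": "originalDataProviderPublisher",
}


def check_original_provider(record: dict) -> list[str]:
    """Return warnings for the originalDataProvider* elements of a record dict."""
    provider_type = record.get("originalDataProviderType")
    if provider_type is None:
        return []
    if provider_type not in PROVIDER_TYPES:
        return [f"invalid originalDataProviderType: {provider_type!r}"]
    expected = CONDITIONAL_ELEMENT[provider_type]
    warnings = []
    if not record.get(expected):
        warnings.append(f"{expected} is recommended when type is {provider_type}")
    # Stricter assumption of this sketch: elements for other types are flagged.
    for other in CONDITIONAL_ELEMENT.values():
        if other != expected and record.get(other):
            warnings.append(f"{other} does not match type {provider_type}")
    return warnings
```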

relationType

Usage

Recommended

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

ms:relationType: isPartOf, isPartWith, hasPart, hasOutcome, isCombinedWith, requiresLR, requiresSoftware, isexactMatch, isSimilarTo, isContinuationOf, isVersionOf, replaces, isReplacedWith, isCreatedBy, isElicitedBy, isRecordedBy, isEditedBy, isAnalysedBy, isEvaluatedBy, isQueriedBy, isAccessedBy, isArchivedBy, isDisplayedBy, isCompatibleWith

Definition/Explanations

Specifies the type of relation holding between two entities. Examples are two resources that together comprise one new resource, a corpus and the s/w component that has been used for its creation, or a corpus and the publication that describes it.

Recommended usage

For publications, the recommended relations are isVersionOf and isSimilarTo, but any relationType can be used as appropriate.

Relation to other metadata schemas

DCMI: skos:narrowMatch hasVersion

DataCite 4.0: skos:closeMatch datacite:relationType

relatedResource1

Usage

Mandatory when applicable

Conditions for usage

when relationType is filled in

Type

ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)

Definition/Explanations

A name or an identifier (e.g. a URL reference) of the source resource, which is related to the target resource (relatedResource2) through the relation described in relationType

Recommended usage

The recommended way to refer to a resource is by giving its identifier. If you provide the identifier, please also select the relevant value from the list of values of the attribute "resourceIdentifierSchemeName". If none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the resource's identifier, you may provide its name, at least in English; to add names in other languages, use the "lang" attribute. For interoperability reasons, it is recommended to describe all related resources in the OpenMinTeD registry and refer to them through their identifier.

relatedResource2

Usage

Mandatory when applicable

Conditions for usage

when relationType is filled in

Type

ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)

Definition/Explanations

A name or an identifier (e.g. a URL reference) of the target resource, which is related to the source resource (relatedResource1) through the relation described in relationType

Recommended usage

The recommended way to refer to a resource is by giving its identifier. If you provide the identifier, please also select the relevant value from the list of values of the attribute "resourceIdentifierSchemeName". If none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the resource's identifier, you may provide its name, at least in English; to add names in other languages, use the "lang" attribute. For interoperability reasons, it is recommended to describe all related resources in the OpenMinTeD registry and refer to them through their identifier.
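A relation is effectively a triple: relatedResource1 (source), relationType, relatedResource2 (target). The Python sketch below models it as a hypothetical data class and enforces the condition that both resources are mandatory once relationType is filled in. Because the vocabulary is open, a value outside the listed ones is reported as a note rather than rejected.

```python
from dataclasses import dataclass
from typing import Optional

# The ms:relationType values listed above (the vocabulary is open).
LISTED_RELATION_TYPES = {
    "isPartOf", "isPartWith", "hasPart", "hasOutcome", "isCombinedWith",
    "requiresLR", "requiresSoftware", "isexactMatch", "isSimilarTo",
    "isContinuationOf", "isVersionOf", "replaces", "isReplacedWith",
    "isCreatedBy", "isElicitedBy", "isRecordedBy", "isEditedBy",
    "isAnalysedBy", "isEvaluatedBy", "isQueriedBy", "isAccessedBy",
    "isArchivedBy", "isDisplayedBy", "isCompatibleWith",
}


@dataclass
class Relation:
    """Hypothetical triple: relatedResource1 --relationType--> relatedResource2."""
    relation_type: Optional[str] = None
    related_resource_1: Optional[str] = None  # source resource (identifier or name)
    related_resource_2: Optional[str] = None  # target resource (identifier or name)


def check_relation(rel: Relation) -> list[str]:
    """Return notes and violations for a relation; empty if it is complete."""
    issues = []
    if rel.relation_type is None:
        return issues
    if rel.relation_type not in LISTED_RELATION_TYPES:
        issues.append(f"{rel.relation_type!r} is not a listed value (open vocabulary)")
    if not rel.related_resource_1:
        issues.append("relatedResource1 is mandatory when relationType is filled in")
    if not rel.related_resource_2:
        issues.append("relatedResource2 is mandatory when relationType is filled in")
    return issues
```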

Metadata schema for annotated publications

Annotated publications are documented as separate resources. Each one has a link to the raw publication and its own set of metadata elements, which provide information on the annotation process, tool etc.

OMTD-SHARE element Usage (M = mandatory, R = recommended)

publicationIdentifier M

annotationLevel M

annotationStandoff R

mimeType R

dataFormatSpecific R

documentationURL R

characterEncoding R

typesystem R

tagset R

annotationMode R

isAnnotatedBy R

annotationDate R
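The usage column of the table above can drive a simple completeness report for an annotated-publication record. In this Python sketch the record is a hypothetical flat dictionary keyed by element name. annotationLevel is treated as unconditionally mandatory, although its own description says "when applicable".

```python
# Usage column of the table above: M = mandatory, R = recommended.
ANNOTATED_PUBLICATION_USAGE = {
    "publicationIdentifier": "M",
    "annotationLevel": "M",
    "annotationStandoff": "R",
    "mimeType": "R",
    "dataFormatSpecific": "R",
    "documentationURL": "R",
    "characterEncoding": "R",
    "typesystem": "R",
    "tagset": "R",
    "annotationMode": "R",
    "isAnnotatedBy": "R",
    "annotationDate": "R",
}


def missing_elements(record: dict) -> dict[str, list[str]]:
    """Group elements absent from a flat record dict by their usage level."""
    missing = {"mandatory": [], "recommended": []}
    for element, usage in ANNOTATED_PUBLICATION_USAGE.items():
        # None, "" and [] count as absent; False (e.g. annotationStandoff) is a value.
        if record.get(element) in (None, "", []):
            missing["mandatory" if usage == "M" else "recommended"].append(element)
    return missing
```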

annotationLevel

Usage

Mandatory when applicable

Conditions for usage

for all annotated resources

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

ms:annotationLevel: alignment, discourseAnnotation, discourseAnnotation-argumentation, discourseAnnotation-audienceReactions, discourseAnnotation-coreference, discourseAnnotation-dialogueActs, discourseAnnotation-discourseRelations, lemmatization, morphosyntacticAnnotation-bPosTagging, morphosyntacticAnnotation-posTagging, segmentation, semanticAnnotation, semanticAnnotation-certaintyLevel, semanticAnnotation-emotions, semanticAnnotation-events, semanticAnnotation-namedEntities, semanticAnnotation-polarity, semanticAnnotation-questionTopicalTarget, semanticAnnotation-readabilty, semanticAnnotation-semanticClasses, semanticAnnotation-semanticRelations, semanticAnnotation-semanticRoles, semanticAnnotation-speechActs, semanticAnnotation-subjectivity, semanticAnnotation-temporalExpressions, semanticAnnotation-textualEntailment, semanticAnnotation-wordSenses, syntacticAnnotation-semanticFrames, speechAnnotation, speechAnnotation-orthographicTranscription, speechAnnotation-paralanguageAnnotation, speechAnnotation-phoneticTranscription, speechAnnotation-prosodicAnnotation, speechAnnotation-soundEvents, speechAnnotation-soundToTextAlignment, speechAnnotation-speakerIdentification, speechAnnotation-speakerTurns, stemming, structuralAnnotation, structuralAnnotation-documentDivisions, structuralAnnotation-sentences, structuralAnnotation-clauses, structuralAnnotation-phrases, structuralAnnotation-words, syntacticAnnotation-subcategorizationFrames, syntacticAnnotation-dependencyTrees, syntacticAnnotation-constituencyTrees, syntacticAnnotation-chunks, syntacticosemanticAnnotation-links, translation, transliteration, modalityAnnotation-bodyMovements, modalityAnnotation-facialExpressions, modalityAnnotation-gazeEyeMovements, modalityAnnotation-handArmGestures, modalityAnnotation-handManipulationOfObjects, modalityAnnotation-headMovements, modalityAnnotation-lipMovements, other

Definition/Explanations

The annotation level of the annotated resource, or the annotation level that a s/w component consumes as input or produces as output

annotationStandoff

Usage

Recommended

Type

boolean

Definition/Explanations

Indicates whether the annotation is created inline or in a stand-off fashion.

For interoperability reasons, the stand-off mode is the recommended format.
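The difference between the two modes is easiest to see on an example. In stand-off annotation, the text is left untouched and each annotation points into it by character offsets. Inline annotation interleaves markup with the text, which cannot express overlapping spans. The Python sketch below stores stand-off spans and renders them inline. The tag-style labels are purely illustrative and do not correspond to any specific annotation format.

```python
from typing import NamedTuple
from xml.sax.saxutils import escape


class Span(NamedTuple):
    """A stand-off annotation: character offsets into the text plus a label."""
    start: int
    end: int
    label: str


def to_inline(text: str, spans: list[Span]) -> str:
    """Render non-overlapping stand-off spans as inline, tag-style markup."""
    parts = []
    cursor = 0
    for span in sorted(spans):
        if span.start < cursor:
            # Inline markup cannot express overlaps; stand-off can.
            raise ValueError("overlapping spans cannot be rendered inline")
        parts.append(escape(text[cursor:span.start]))
        parts.append(f"<{span.label}>{escape(text[span.start:span.end])}</{span.label}>")
        cursor = span.end
    parts.append(escape(text[cursor:]))
    return "".join(parts)
```

For example, the spans `Span(0, 5, "Person")` and `Span(14, 19, "Location")` over "Alice visited Paris" render as `<Person>Alice</Person> visited <Location>Paris</Location>`, while the stand-off version keeps the original text byte-for-byte.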

mimeType

Usage

Recommended

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

ms:mimetype (a subset of the IANA media type vocabulary, limited to the values most popular for text files): text/plain, application/vnd.xmi+xml, text/xml, application/x-tmx+xml, application/x-xces+xml, application/tei+xml, application/rdf+xml, application/xhtml+xml, application/emma+xml, application/pls+xml, application/postscript, application/voicexml+xml, text/sgml, text/html, application/x-tex, application/rtf, application/json+ld, application/x-latex, text/csv, text/tab-separated-values, application/pdf, application/x-msaccess, audio/mp4, audio/mpeg, audio/wav, image/bmp, image/gif, image/jpeg, image/png, image/svg+xml, image/tiff, video/jpeg, video/mp4, video/mpeg, video/x-flv, video/x-msvideo, video/x-ms-wmv, application/msword, application/vnd.ms-excel, audio/mpeg3, text/turtle, other, audio/PCMA, audio/flac, audio/speex, audio/vorbis, video/mp2t

Definition/Explanations

The mime-type of the resource (a formalized specifier for its format) or a mime-type that the component accepts, in conformance with the values registered by IANA (Internet Assigned Numbers Authority)

Recommended usage

Please select one of the predefined values (which are the most popular ones for text files) or add a value, preferably one of the media types registered by IANA (http://www.iana.org/assignments/media-types/media-types.xhtml). The element can be repeated for corpora that include files in various formats.

documentationURL

Usage

Recommended

Type

URL pattern

Definition/Explanations

Link to the documentation of the specific data format (explanations and examples)

dataFormatSpecific

Usage

Recommended

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

ms:dataFormatSpecific: aclAnthology, aimedCorpus, alvisEnrichedDocument, bioNLP, bioNLP;format-variant=ST2013a1_a2, bnc, cadixeJSON, conll2000, conll2002, conll2006, conll2007, conll2009, conll2012, dataSift, factoredTagLem, gate, genia, graf, html5Microdata, i2b2, imsCwb, jdbc, keaCorpus, lll, negraExport, pml, ptb;format-variant=chunked, ptb;format-variant=combined, relp, tiger, tupp-dz, twitter, uimaBinaryCas, uimaCASDump, web1t, xces;format-variant=ilsp

Definition/Explanations

The supplementary, more specific level of the data format (complementing mimeType)
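Some values carry a variant parameter after a semicolon (e.g. `ptb;format-variant=chunked`). That form is reconstructed from the list above, whose punctuation may not be exact. Assuming it, a reader or converter could split a value into its base format and parameters as in the Python sketch below. The function name is illustrative.

```python
def parse_format_value(value: str) -> tuple[str, dict[str, str]]:
    """Split e.g. 'ptb;format-variant=chunked' into ('ptb', {'format-variant': 'chunked'})."""
    base, *params = [part.strip() for part in value.split(";")]
    options = {}
    for param in params:
        if not param:
            continue  # tolerate stray separators such as 'ptb;'
        key, sep, val = param.partition("=")
        if not sep:
            raise ValueError(f"malformed parameter: {param!r}")
        options[key.strip()] = val.strip()
    return base, options
```

Dispatching on the base name first and then on the variant keeps one reader per format family, which is how values such as `ptb;format-variant=chunked` and `ptb;format-variant=combined` relate to each other.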

characterEncoding

Usage

Recommended

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

ms:characterEncoding: a long list of popular character encodings

Definition/Explanations

The name of the character encoding used in the resource or accepted by the component

Recommended usage

Please select one of the predefined values. Note, however, that UTF-8 is the preferred character encoding for OpenMinTeD, as it ensures interoperability between content and components. The element can be repeated for corpora that include files in various character encodings.

typesystem

Usage

Recommended

Type

identifier or multilingual free text

Attributes

ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)

Definition/Explanations

A name or an identifier (e.g. a URL reference) of the type system used in the annotation of the resource or used by the component

Recommended usage

The recommended way to refer to a resource is by giving its identifier. If you provide the identifier, please also select the relevant value from the list of values of the attribute "resourceIdentifierSchemeName". If none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the resource's identifier, you may provide its name, at least in English; to add names in other languages, use the "lang" attribute. For interoperability reasons, it is recommended to describe type systems, tagsets, annotation resources etc. in the OpenMinTeD registry and refer to them through their identifier.

tagset

Usage

Recommended

Type

identifier or multilingual free text

Attributes

ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)

Definition/Explanations

A name or an identifier (e.g. a URL reference) of the tagset used in the annotation of the resource or used by the component

Recommended usage

The recommended way to refer to a resource is by giving its identifier. If you provide the identifier, please also select the relevant value from the list of values of the attribute "resourceIdentifierSchemeName". If none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the resource's identifier, you may provide its name, at least in English; to add names in other languages, use the "lang" attribute. For interoperability reasons, it is recommended to describe type systems, tagsets, annotation resources etc. in the OpenMinTeD registry and refer to them through their identifier.

annotationMode

Usage

Recommended

Type

controlled vocabulary

Controlled vocabulary reference and/or values

ms:annotationMode: manual, automatic, mixed, interactive

Definition/Explanations

Indicates whether the resource has been annotated manually, by automatic processes, or by a combination of the two (mixed or interactive annotation)

isAnnotatedBy

Usage

Recommended

Type

identifier or multilingual free text

Attributes

ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)

Definition/Explanations

A name or an identifier (e.g. a URL reference) of the component used for the annotation of the resource

Recommended usage

The recommended way to refer to a resource is by giving its identifier. If you provide the identifier, please also select the relevant value from the list of values of the attribute "resourceIdentifierSchemeName". If none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the resource's identifier, you may provide its name, at least in English; to add names in other languages, use the "lang" attribute. For interoperability reasons, it is recommended to describe type systems, tagsets, annotation resources etc. in the OpenMinTeD registry and refer to them through their identifier.

annotationDate

Usage

Recommended

Type

date or range of dates

Definition/Explanations

The date, or range of dates, on which the annotation process took place
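How a range of dates is serialized is not stated here. The Python sketch below assumes the ISO 8601 convention of two dates separated by a slash (e.g. `2016-01-15/2016-03-31`) and parses and checks a value on that basis. The schema may in fact model ranges differently, for example with separate start and end elements. The function name is illustrative.

```python
from datetime import date


def parse_date_or_range(value: str) -> tuple[date, date]:
    """Parse 'YYYY-MM-DD' or 'YYYY-MM-DD/YYYY-MM-DD' into a (start, end) pair."""
    start_text, separator, end_text = value.strip().partition("/")
    start = date.fromisoformat(start_text)
    # A single date is treated as a range that starts and ends on the same day.
    end = date.fromisoformat(end_text) if separator else start
    if end < start:
        raise ValueError("date range ends before it starts")
    return start, end
```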

Guidelines for providers of corpora

Introduction

Instructions for providers of corpora

Introduction

OpenMinTeD facilitates the use of TDM technologies in the world of scientific publications, ranging from generic scholarly communication to literature related to specific disciplines.

Corpora in the OpenMinTeD framework mainly refer to the collections of publications that will be used as a mining source in the TDM process. In fact, the OpenMinTeD platform includes a mechanism for automatically generating corpora based on user criteria selected from a faceted view of all publications provided by the OpenMinTeD partners; more details are included in the Guidelines for publications.

Corpora may also come from repositories of language resources, such as META-SHARE and CLARIN, or from discipline-specific repositories. In that case they do not have to be composed of scholarly publications. Examples include:

- reference corpora (i.e. corpora deemed representative of general language or sublanguage usage);
- news corpora;
- collections of domain-specific texts, such as manuals;
- annotated corpora, such as treebanks and morphologically tagged gold-standard corpora.

These corpora are not targeted as a mining source. They can, however, be used for training components (e.g. to train a language model), for evaluating their performance, or for ancillary purposes.

To be valid for registration in OpenMinTeD, every corpus must be accompanied by a metadata record conformant with the OMTD-SHARE schema, and a file with its contents must be readily accessible during processing.

The following sections present the instructions, requirements and recommendations that corpora must meet in order to interact with TDM resources.

Instructions for providers of corpora

How to register your resources

How to make your resources interoperable

How to document your resources

Further requirements for annotated corpora

Recommended schema for corpora

How to register your resources

Corpora can be registered by authorised users.

If you wish to register a corpus, you must:

- provide a metadata record compliant with the OMTD-SHARE schema for corpora, at least at the minimal level, which you can upload to the Registry as an XML file and/or edit with the OpenMinTeD metadata editor;
- provide a zipped file with the contents of the corpus, or a link to a URL where the corpus is directly accessible (i.e. not a landing page). Where possible, the zipped file should follow the folder structure recommended for OpenMinTeD publications, i.e. separate folders for contents, metadata records and licence documents.

The corpus may be stored in the repository of a network or infrastructure that allows harvesting (normally upon agreements made with OpenMinTeD). In that case, you can also provide the relevant identifier, and the corpus will be uploaded with the appropriate description. Where possible (and this will be appropriately indicated), the metadata description will be automatically converted to the OMTD-SHARE schema and presented to the user for further editing.
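The recommended zip layout keeps contents, metadata records and licence documents in separate folders. The Python sketch below builds such an archive in memory with the standard `zipfile` module. The folder names `fulltext/`, `metadata/` and `licence/` are hypothetical placeholders; take the actual names from the guidelines for publications.

```python
import io
import zipfile

# Hypothetical folder names; use the ones given in the guidelines for publications.
FOLDERS = {"content": "fulltext/", "metadata": "metadata/", "licence": "licence/"}


def build_corpus_zip(files: dict[str, dict[str, bytes]]) -> bytes:
    """Pack {kind: {filename: bytes}} into a zip with one folder per kind."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for kind, members in files.items():
            folder = FOLDERS[kind]  # KeyError for an unknown kind of file
            for name, data in members.items():
                zf.writestr(folder + name, data)
    return buffer.getvalue()
```

Usage: `build_corpus_zip({"content": {"doc1.txt": text_bytes}, "metadata": {"doc1.xml": record_bytes}, "licence": {"LICENSE.txt": licence_bytes}})` returns the archive bytes, ready to be written to disk or uploaded.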

Howtomakeyourresourcesinteroperable

InordertoensurethatyourcorporacanbeminedintheOpenMinTeDplatform,youmustfollowthesamerequirementsthataresetforscholarlypublications.Youmusttherefore

providedirectaccesstothecontentsofeachcorpusdescribeeachcorpuswithametadatarecordcompatiblewiththeOTMD-SHAREminimalschema.

In addition, the following recommendations contribute to interoperability and make your corpora easier to process:

- The preferred formats for delivering textual material are plain text, XML and PDF (but not proprietary formats, and certainly not PDFs of scanned images), which can be read by one of the existing readers.
- If appropriate for your material, use one of the more specific data formats that are already covered by readers and converters (cf. dataFormatSpecific).
- The preferred character encoding is UTF-8.
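Before uploading, it can be worth checking that every file is valid UTF-8 and converting those that are not. The Python sketch below assumes that the original encoding of a non-UTF-8 file is already known (it does not try to detect it); "latin-1" in the example is only an illustration, and the function names are hypothetical.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode text from a known source encoding to UTF-8.

    Raises UnicodeDecodeError if `raw` is not valid in `source_encoding`.
    """
    return raw.decode(source_encoding).encode("utf-8")


latin1_bytes = "Café".encode("latin-1")      # b'Caf\xe9'
print(is_valid_utf8(latin1_bytes))           # False
utf8_bytes = to_utf8(latin1_bytes, "latin-1")
print(is_valid_utf8(utf8_bytes))             # True
```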

If you fail to abide by these requirements and recommendations, it might still be possible to process your corpora via the OpenMinTeD platform, but this cannot be guaranteed, and interoperability with other resources will suffer.

So, if you intend to create a new corpus, it is important to take into account, from the early stages of its design, the requirements, standards, best practices and recommendations promoted by OpenMinTeD and other cooperating infrastructures.

Please note that there are no general requirements yet for corpora to be used for ancillary purposes (e.g. for training a tool), as these depend on the requirements of the software that will use them and on the purpose of use.


How to document your resources

To be fully compliant with OpenMinTeD, you must:

- ensure that the corpus is distributed under Open Access conditions;
- include in the metadata record a link to the licence document that describes the terms and conditions under which the corpus is provided, and attach the licence document to the resource;
- if you already have a PID for your corpus (preferably a DOI), make sure it is included in the metadata record (cf. identifier for more information on identifier schemes).

The following recommendations will further contribute to the interoperability of your resources:

- The adoption of standards such as the JATS article tag suite or the TEI P5 guidelines for annotating the inner structure of texts is recommended.
- Please ensure that you version all your resources and label the versions unambiguously, preferably following the Semantic Versioning recommendations.
- It is important that you provide the appropriate documentation for your resource (e.g. publications about the design and construction of the corpus), which you should also version along with the corpus and add as a reference to your metadata record.
- Recommend one of the publications about your resource as the one to be cited for scholarly attribution, and add this information to the metadata record.
- Make sure that you fill in all the metadata elements required for citing your resource, i.e. the creator of the resource, a title, the resource type and an identifier, and optionally the publication date, the version and the publisher or distributor.
- Use standard classification vocabularies (e.g. MeSH, DDC, LCSH) for adding classification tags to your material, and specify the vocabulary you use in the metadata record; provide at least one broad category for your material (e.g. life sciences, computing).
- Whenever you add links to other resources or entities (e.g. persons, projects) in the metadata records, please try, to the extent possible, to do this through unique and persistent identifiers from authority lists and sources, also documenting the authority and/or scheme they adhere to.
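To show how the citation elements listed above fit together, the Python sketch below assembles a citation string and skips the optional parts when they are absent. The layout (Creator (Year): Title. Version. Publisher. Resource type. Identifier) is loosely modelled on the DataCite recommended citation format and is an assumption, as are the class and function names and the example values (including the DOI).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CitationData:
    # Elements required for citing the resource.
    creator: str
    title: str
    resource_type: str
    identifier: str
    # Optional elements.
    publication_year: Optional[str] = None
    version: Optional[str] = None
    publisher: Optional[str] = None


def format_citation(c: CitationData) -> str:
    """Assemble a citation string, skipping optional parts that are missing."""
    head = c.creator
    if c.publication_year:
        head += f" ({c.publication_year})"
    parts = [f"{head}: {c.title}"]
    if c.version:
        parts.append(f"Version {c.version}")
    if c.publisher:
        parts.append(c.publisher)
    parts.append(c.resource_type)
    parts.append(c.identifier)
    return ". ".join(parts)


example = CitationData(
    creator="Doe, Jane",
    title="Example Corpus",
    resource_type="Dataset",
    identifier="https://doi.org/10.1234/example-corpus",  # hypothetical DOI
    publication_year="2017",
    version="1.0.0",
    publisher="Example Publisher",
)
print(format_citation(example))
# Doe, Jane (2017): Example Corpus. Version 1.0.0. Example Publisher. Dataset. https://doi.org/10.1234/example-corpus
```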

For citation, OpenMinTeD endorses the Joint Declaration of Data Citation Principles, as well as the more specialised RDA recommendations for data citation of evolving data and the DataCite guidelines.


Further requirements for annotated/processed corpora

Corpora can be registered in the OpenMinTeD platform:

- in an unprocessed format, to be annotated by the operation of TDM software also registered in the platform, and/or
- in an already processed format; in this case, they must be included as a separate resource with its own metadata record, including a specific set of metadata elements (the same as for annotated publications).

It should be noted that corpora annotated by means of the OpenMinTeD platform will automatically be assigned the appropriate values for these elements.


Recommended schema for corpora

Overview

This section includes a synopsis of the minimal schema for corpora, i.e. the subset of M(andatory) and strongly R(ecommended) metadata elements. Additional elements required for the management of the metadata record (e.g. metadataCreationDate, metadataCreator) are not presented here, as they are handled by the OMTD platform.

For annotated corpora, see also the section "Further requirements for annotated/processed corpora".

| OMTD-SHARE element | Usage |
|---|---|
| resourceType | M |
| resourceName | M |
| description | M |
| identifier | M |
| version | M |
| licence or rightsStmtName & rightsStmtURL (one of the two must be provided) | M |
| nonStandardLicenceName | R when applicable |
| nonStandardLicenceTermsURL | M when applicable |
| version of licence | M |
| distributionMedium | M |
| downloadURL | M when applicable |
| contactEmail or landingPage (one of the two must be provided) | M |
| contactPerson (identifier or personName) | R |
| contactGroup (identifier or organizationName) | R |
| mustBeCitedWith | R |
| resourceCreator | R |
| creationDate | R (M for query-built corpora) |
| corpusType | M |
| mediaType | M |
| lingualityType | M |
| multilingualityType | M when applicable |
| language | M |
| sizePerLanguage | M |
| size | M |
| mimeType | R |
| characterEncoding | R |
| domain | R |
| subject | R |
| keyword | R |
| userQuery | M when applicable |
| relationType | R |
| relatedResource1 | R |
| relatedResource2 | R |
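The synopsis above can be turned into a simple completeness check. The following is a minimal Python sketch, assuming a metadata record flattened into a dictionary keyed by element name; the real OMTD-SHARE record is a nested XML document, the key `versionOfLicence` is a hypothetical stand-in for the "version of licence" element, and userQuery is left out because its condition ("when applicable") is not spelled out here.

```python
from typing import Dict, List

# Unconditionally mandatory elements from the synopsis table.
# "versionOfLicence" is a hypothetical flat key for "version of licence".
ALWAYS_MANDATORY = [
    "resourceType", "resourceName", "description", "identifier", "version",
    "versionOfLicence", "distributionMedium", "corpusType", "mediaType",
    "lingualityType", "language", "sizePerLanguage", "size",
]


def missing_elements(record: Dict[str, object]) -> List[str]:
    """Return the mandatory (or conditionally mandatory) elements missing from a record."""
    problems = [name for name in ALWAYS_MANDATORY if not record.get(name)]

    # licence, or rightsStmtName & rightsStmtURL: one of the two.
    has_rights_stmt = record.get("rightsStmtName") and record.get("rightsStmtURL")
    if not record.get("licence") and not has_rights_stmt:
        problems.append("licence or rightsStmtName & rightsStmtURL")

    # contactEmail or landingPage: one of the two.
    if not record.get("contactEmail") and not record.get("landingPage"):
        problems.append("contactEmail or landingPage")

    # Elements that become mandatory under conditions.
    if (record.get("licence") == "nonStandardLicenceTerms"
            and not record.get("nonStandardLicenceTermsURL")):
        problems.append("nonStandardLicenceTermsURL")
    if record.get("distributionMedium") == "downloadable" and not record.get("downloadURL"):
        problems.append("downloadURL")
    if (record.get("lingualityType") in ("bilingual", "multilingual")
            and not record.get("multilingualityType")):
        problems.append("multilingualityType")
    return problems


# Hypothetical record that forgets the downloadURL of a downloadable corpus.
record = {
    "resourceType": "corpus",
    "resourceName": "Example corpus",
    "description": "A monolingual English corpus of example abstracts.",
    "identifier": "10.1234/example-corpus",
    "version": "1.0.0",
    "licence": "CC-BY",
    "versionOfLicence": "4.0",
    "distributionMedium": "downloadable",
    "contactEmail": "contact@example.org",
    "corpusType": "raw",
    "mediaType": "text",
    "lingualityType": "monolingual",
    "language": ["en"],
    "sizePerLanguage": {"en": "20000 words"},
    "size": "20000 words",
}
print(missing_elements(record))  # ['downloadURL']
```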


resourceName

Usage

Mandatory

Type

Multilingual free text

Attributes

xs:lang

Definition/Explanations

The full name by which the resource is known

Recommended usage

Please provide a short but descriptive and unique name for the resource, e.g. "British National Corpus" instead of just "corpus of English". Provide the name in English; if you want to add names in other languages, you can use the "lang" attribute. For corpora created through the OMTD corpus building process, please use an indicative name with the sources and the dates (e.g. "Subcorpus of OpenAIRE with biochemistry articles created on 4/10/2016").

Relation to other metadata schemas

DCMI: skos:exactMatch dct:title
DataCite 4.0: skos:exactMatch datacite:title


resourceType

Usage

Mandatory

Type

Closed controlled vocabulary

Attributes

Controlled vocabulary reference and/or values

ms:resourceType: corpus, lexicalConceptualResource, languageDescription, model, component

Definition/Explanations

Specifies the type of the resource being described, or the type of the resource that a tool or service takes as input or produces as output

Recommended usage

For corpora, the fixed value "corpus" must be added automatically.

Relation to other metadata schemas

DCMI: skos:narrowMatch dct:type
DataCite 4.0: skos:closeMatch datacite:resourceTypeGeneral & datacite:resourceType; the recommended value for text corpora is "dataset", but the values "collection" and "text" can also be used


description

Usage

Mandatory

Type

Multilingual free text

Attributes

xs:lang

Definition/Explanations

Provides the description of the resource in prose

Recommended usage

Give a brief yet informative description of the corpus contents, mentioning at least the language(s) and subject(s)/domain(s) and, if possible, the size and provenance. Please provide the text in English; if you want to add texts in other languages, you can add them using the "lang" attribute to specify the language.

Relation to other metadata schemas

DCMI: skos:exactMatch dct:abstract
DataCite 4.0: skos:exactMatch datacite:description with the value "abstract" for datacite:descriptionType


identifier

Usage

Mandatory

Type

free text

Attributes

ms:resourceIdentifierSchemeName or ms:schemeURI

Definition/Explanations

Reference to a PID, DOI or any kind of identifier used by the resource provider for the resource

Recommended usage

Provide a unique identifier already assigned by an authoritative source; you can use either:

- the attribute "resourceIdentifierSchemeName" to specify the scheme, by selecting one of the predefined values (e.g. DOI, HDL, ISLRN), or,
- if the scheme is not listed among them, the "other" value, together with the attribute "schemeURI" to provide a link to the URL that documents the scheme the identifier adheres to.

If the resource doesn't have a unique identifier, one will be assigned by OpenMinTeD. For corpora created through the OMTD corpus building process, the identifier must be assigned automatically.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:identifier
DataCite 4.0: skos:broadMatch datacite:identifier (identifierType can only be DOI)
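As an illustration of the two options above, the Python sketch below returns the attributes to attach to an identifier. The set of known schemes contains only the three examples named in the text (the full list is defined by the schema), the function name and the example scheme URL are hypothetical, and the DOI check is a deliberately loose syntax test ("10.", registrant code, slash, suffix), not a resolution check.

```python
import re
from typing import Dict, Optional

# Illustrative subset of the predefined values for resourceIdentifierSchemeName.
KNOWN_SCHEMES = {"DOI", "HDL", "ISLRN"}

# Loose DOI syntax check: "10." + registrant code + "/" + suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def identifier_attributes(value: str, scheme: str,
                          scheme_uri: Optional[str] = None) -> Dict[str, str]:
    """Return the attributes to attach to an identifier element."""
    if scheme in KNOWN_SCHEMES:
        if scheme == "DOI" and not DOI_PATTERN.match(value):
            raise ValueError(f"not a DOI: {value!r}")
        return {"resourceIdentifierSchemeName": scheme}
    # Unlisted scheme: use "other" and document the scheme through schemeURI.
    if not scheme_uri:
        raise ValueError("schemeURI is required when the scheme is not predefined")
    return {"resourceIdentifierSchemeName": "other", "schemeURI": scheme_uri}


print(identifier_attributes("10.1234/example-corpus", "DOI"))
# {'resourceIdentifierSchemeName': 'DOI'}
print(identifier_attributes("ark:/12345/x", "ARK", "https://example.org/ark-scheme"))
# {'resourceIdentifierSchemeName': 'other', 'schemeURI': 'https://example.org/ark-scheme'}
```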


version

Usage

Recommended

Type

free text

Definition/Explanations

Any string, usually a number, that identifies the version of a resource

Recommended usage

Please use this element only for versions of the same resource (e.g. corrected or enlarged versions) and not for variants or for versions with additional or different information. Versioning should follow the Semantic Versioning guidelines (http://semver.org/).

Relation to other metadata schemas

DCMI: skos:exactMatch dct:hasVersion
DataCite 4.0: skos:exactMatch datacite:Version
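A version label can be checked against the Semantic Versioning 2.0.0 syntax using the regular expression suggested on semver.org. The `bump` helper is a hypothetical convenience; mapping a correction to a patch release and an enlargement to a minor release is one possible convention for corpora, not a rule of these guidelines.

```python
import re

# Regular expression suggested on semver.org for Semantic Versioning 2.0.0.
SEMVER = re.compile(
    r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    r"(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$"
)


def is_semver(version: str) -> bool:
    return SEMVER.match(version) is not None


def bump(version: str, part: str) -> str:
    """Increment the major, minor or patch number and reset the lower ones."""
    m = SEMVER.match(version)
    if m is None:
        raise ValueError(f"not a semantic version: {version!r}")
    major, minor, patch = (int(g) for g in m.group(1, 2, 3))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


print(is_semver("1.0"))            # False
print(is_semver("2.1.0-beta.1"))   # True
print(bump("1.4.2", "minor"))      # 1.5.0 (e.g. an enlarged corpus)
```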


licence

Usage

Mandatory under conditions

Conditions for usage

either licence or rightsStmt must be filled in

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

ms:licence: CC-BY, CC-BY-NC, CC-BY-NC-ND, CC-BY-NC-SA, CC-BY-ND, CC-BY-SA, CC-ZERO, PDDL, ODC-BY, ODbL, MS-NoReD, MS-NoReD-FF, MS-NoReD-ND, MS-NoReD-ND-FF, MS-NC-NoReD, MS-NC-NoReD-FF, MS-NC-NoReD-ND, MS-NC-NoReD-ND-FF, ELRA_END_USER, ELRA_EVALUATION, ELRA_VAR, CLARIN_PUB, CLARIN_ACA, CLARIN_ACA-NC, CLARIN_RES, AGPL, ApacheLicence_2.0, BSD_4-clause, BSD_3-clause, FreeBSD, GFDL, GPL, LGPL, MIT, Princeton_Wordnet, proprietary, underNegotiation, nonStandardLicenceTerms

Definition/Explanations

The licence of use for the resource

Recommended usage

You can provide information on the rights of accessing and using a resource in one of the following ways, in order of preference:

- use the element "licence" and select one of the recommended licences; please note that the list mixes licences intended for data resources and for components; for components, the recommended licences are the Open Source licences; for data resources, please use a standard licence such as one of the CC family; if the licence you use is not included in the list, you can use the "nonStandardLicenceTerms" or the "proprietary" value and give further information on your licence in the elements "nonStandardLicenceName", "nonStandardLicenceTermsURL" and "nonStandardLicenceTermsText";
- use the "rightsStmtName" and "rightsStmtURL" elements (the latter with a link to a URL with more explanations on its usage) if the resource is provided with a general statement of use and not an official licence document; please note that this option is meant mainly to help end-users access your resource, and you are strongly advised to license your resource properly.

For corpora created through the OMTD corpus building process, the licence values can be automatically aggregated from the licence values of the metadata records included in them; in any case, the "rightsStmtName" can also be computed automatically.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:license
DataCite 4.0: skos:closeMatch datacite:rights
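The conditions described for "licence" and its related elements can be checked mechanically. The following Python sketch assumes a hypothetical flattened record (a dictionary keyed by element name, plus the hypothetical key `versionOfLicence`). It treats both "nonStandardLicenceTerms" and "proprietary" as requiring a nonStandardLicenceTermsURL, which is one reading of the recommended usage above, and it skips CC-ZERO in the version check because CC0 is versioned separately (1.0).

```python
from typing import Dict, List

# Values of ms:licence as listed above.
LICENCES = set("""
CC-BY CC-BY-NC CC-BY-NC-ND CC-BY-NC-SA CC-BY-ND CC-BY-SA CC-ZERO PDDL ODC-BY
ODbL MS-NoReD MS-NoReD-FF MS-NoReD-ND MS-NoReD-ND-FF MS-NC-NoReD
MS-NC-NoReD-FF MS-NC-NoReD-ND MS-NC-NoReD-ND-FF ELRA_END_USER ELRA_EVALUATION
ELRA_VAR CLARIN_PUB CLARIN_ACA CLARIN_ACA-NC CLARIN_RES AGPL ApacheLicence_2.0
BSD_4-clause BSD_3-clause FreeBSD GFDL GPL LGPL MIT Princeton_Wordnet
proprietary underNegotiation nonStandardLicenceTerms
""".split())


def licence_problems(info: Dict[str, str]) -> List[str]:
    """Check the licence-related elements of a flattened (hypothetical) record."""
    problems: List[str] = []
    licence = info.get("licence")
    if licence is None:
        # Without a licence, both rights statement elements are needed.
        if not (info.get("rightsStmtName") and info.get("rightsStmtURL")):
            problems.append("either licence or rightsStmtName & rightsStmtURL is required")
        return problems
    if licence not in LICENCES:
        problems.append(f"{licence!r} is not listed; use nonStandardLicenceTerms or proprietary")
    if licence in ("nonStandardLicenceTerms", "proprietary"):
        if not info.get("nonStandardLicenceTermsURL"):
            problems.append("nonStandardLicenceTermsURL is required")
        if not info.get("nonStandardLicenceName"):
            problems.append("nonStandardLicenceName is recommended")
    # Latest versions suggested under "version of licence"; CC-ZERO is skipped.
    if licence.startswith("CC-") and licence != "CC-ZERO":
        latest = "4.0"
    elif licence.startswith("MS-"):
        latest = "2.0"
    else:
        latest = None
    if latest and info.get("versionOfLicence") != latest:
        problems.append(f"the latest version ({latest}) is preferred")
    return problems


print(licence_problems({"licence": "CC-BY", "versionOfLicence": "3.0"}))
# ['the latest version (4.0) is preferred']
```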


rightsStmtName

Usage

Mandatory under conditions

Conditions for usage

either licence or rightsStmt must be filled in

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

ms-omtd:rightsStmtName: openAccess, closedAccess, embargoedAccess, restrictedAccess

Definition/Explanations

The name of an official statement indicative of licensing terms for the use of a resource (e.g. open access, free to read); its semantics should be clear, preferably formally expressed and stored at a URL.

The current list of predefined values comes from OpenAIRE, but it is under revision.

Recommended usage

The "rightsStmtName" and "rightsStmtURL" elements can be used in addition to the "licence" value to help users understand the licensing terms of a resource.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:accessRights
DataCite 4.0: skos:closeMatch datacite:rights


rightsStmtURL

Usage

Mandatory under conditions

Conditions for usage

either licence or rightsStmt must be filled in

Type

URL pattern

Definition/Explanations

Link to the URL with the text that formally explains the licensing conditions imposed by the rights statement.

Recommended usage

The "rightsStmtName" and "rightsStmtURL" elements can be used in addition to the "licence" value to help users understand the licensing terms of a resource.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:accessRights
DataCite 4.0: skos:closeMatch datacite:rightsURI


version of licence

Usage

Mandatory

Type

free text

Definition/Explanations

The version of the licence

Recommended usage

You are advised to indicate the version of the licence of your resource; the latest version is the preferred option, e.g. "4.0" for all CC licences and "2.0" for the META-SHARE NoReD ones.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:hasVersion (for dct:licenseDocument)


nonStandardLicenceName

Usage

Mandatory under conditions

Conditions for usage

to be used with ms:licence other (i.e. for licences not included in the list of recommended ones)

Type

free text

Definition/Explanations

The name by which a licence is known; to be used for licences not included in the predefined list of recommended licences

Recommended usage

Please provide the name of the licence if it is already known, or supply one that can uniquely identify it.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:title (for dct:licenseDocument)


nonStandardLicenceTermsURL

Usage

Mandatory under conditions

Conditions for usage

to be used with ms:licence other (i.e. for licences not included in the list of recommended ones)

Type

URL pattern

Definition/Explanations

Used to provide a hyperlink to a URL containing the text of a licence not included in the predefined list, or describing the terms of use for a language resource or the terms of service for web services

Recommended usage

Please provide the link to the full-text document of the licence. Note that this is preferred over inserting the licence text in the element "nonStandardLicenceTermsText", as it provides a permanent location for the licence that is accessible to all.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:license
DataCite 4.0: skos:closeMatch datacite:rightsURI


distributionMedium

Usage

Mandatory

Type

Open controlled vocabulary

Controlled vocabulary reference and/or values

ms:distributionMedium: webExecutable, paperCopy, hardDisk, bluRay, DVD-R, CD-ROM, downloadable, accessibleThroughInterface, other

Definition/Explanations

Specifies the medium (channel) used for delivering or providing access to the resource

Recommended usage

Please use one of the provided values to indicate the medium of distribution. For corpora created through the OMTD corpus building process, the default value is "downloadable". Note that if the corpus is distributed in different media under different terms of use or licences, you can repeat the whole set of elements ("distributionInfo") to describe each of them.


downloadURL

Usage

Mandatory under conditions

Conditions for usage

if distributionMedium = downloadable

Definition/Explanations

Any URL from which the resource can be downloaded

Recommended usage

Please use this element for corpora whose actual content is not uploaded in OpenMinTeD; in this case, please ensure that the URL leads to the actual content of the corpus and not to a landing page. For corpora created through the OMTD corpus building process, the full content is already uploaded in OpenMinTeD, and therefore the downloadURL (the public URL from which the corpus can be downloaded) is inserted automatically.


contactEmail

Usage

Mandatory under conditions

Conditions for usage

An email or a landingPage must be provided

Type

email pattern

Definition/Explanations

A general email address that can be used as a contact point for a resource

Recommended usage

You can indicate a contact point where users can solicit further information in one of the following ways, in order of preference:

- give a general email address in the "contactEmail" element, or
- provide in "landingPage" the link to a web page that documents the resource (e.g. a page with documentation, examples and links to the resource itself).

You can also indicate the person(s) or group(s) responsible for communication in the "contactPerson" and "contactGroup" elements. For corpora created through the OMTD corpus building process, the contactEmail is automatically filled in with the email address of the user who built the corpus.


landingPage

Usage

Mandatory under conditions

Conditions for usage

An email or a landingPage must be provided

Type

URL pattern

Definition/Explanations

A URL used as the landing page of a resource, providing general information; for instance, it may present a description of the resource and its creators, and possibly include links to the URL from which it can be accessed

Recommended usage

You can indicate a contact point where users can solicit further information in one of the following ways, in order of preference:

- give a general email address in the "contactEmail" element, or
- provide in "landingPage" the link to a web page that documents the resource (e.g. a page with documentation, examples and links to the resource itself).

You can also indicate the person(s) or group(s) responsible for communication in the "contactPerson" and "contactGroup" elements. For corpora created through the OMTD corpus building process, a landingPage will also be created automatically, with information on the user query and the contents of the results.


contactPerson (identifier or personName)

Usage

Recommended

Type

identifier or multilingual free text

Attributes

ms:personIdentifierSchemeName (for identifiers) or xs:lang (for names)

Definition/Explanations

Groups information on the person(s) responsible for providing further information regarding the resource

Recommended usage

The recommended way to refer to a person is by giving their identifier, preferably the ORCID; if you provide the identifier, please also select the relevant value from the list of values in the attribute "personIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the person, you may provide the name, preferably in the format "Surname, Firstname", at least in English; if you want to add names in other languages, you can use the "lang" attribute. The element can also be repeated to encode multiple persons. If you decide to add a contactPerson instead of a general contactEmail, please ensure that the data (including the email) of this person are also uploaded in OpenMinTeD.

Relation to other metadata schemas

DataCite 4.0: contributor with datacite:contributorType="ContactPerson", datacite:contributorName (familyName & givenName) or datacite:nameIdentifier and datacite:nameIdentifierScheme and datacite:schemeURI
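ORCID iDs end with a check character computed with the ISO 7064 MOD 11-2 algorithm, so a mistyped iD can be caught before it is entered in the metadata record. A minimal Python sketch follows; it checks syntax and checksum only and does not confirm that the iD is registered. The function names are hypothetical, and 0000-0002-1825-0097 is the sample iD used in ORCID's documentation.

```python
import re

# Four blocks of four characters; only the final character may be "X".
ORCID_FORMAT = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")
ORCID_URL_PREFIX = "https://orcid.org/"


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the 15 base digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the syntax and the check character of an ORCID iD."""
    if orcid.startswith(ORCID_URL_PREFIX):
        orcid = orcid[len(ORCID_URL_PREFIX):]
    if not ORCID_FORMAT.match(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:-1]) == digits[-1]


print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False
```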


contactGroup (identifier or organizationName)

Usage

Recommended

Type

identifier or multilingual free text

Attributes

ms:organizationIdentifierSchemeName (for identifiers) or xs:lang (for names)

Definition/Explanations

Groups information on the group(s) responsible for providing further information regarding the resource

Recommended usage

The recommended way to refer to a group (currently modelled as an organization) is by giving its identifier (e.g. ISNI, FundRef); if you provide the identifier, please also select the relevant value from the list of values in the attribute "organizationIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the group (organization), you may provide its name, at least in English; if you want to add names in other languages, you can use the "lang" attribute. If you decide to add a contactGroup instead of another contact option, please ensure that the data (including the communication data) of this group (organization) are also uploaded in OpenMinTeD.


mustBeCitedWith

Usage

Recommended

Type

free text or identifier

Definition/Explanations

Publication to be used for citation purposes as requested by resource providers (usually a scientific article that describes the resource)

Recommended usage

The preferred option for referring to a publication is by providing its unique identifier already assigned by an authoritative source; the preferred identifier for publications is DOI; you can use either:

- the attribute "publicationIdentifierSchemeName" to specify the scheme, by selecting one of the predefined values (e.g. DOI, ISBN), or,
- if the scheme is not listed among them, the "other" value, together with the attribute "schemeURI" to provide a link to the URL that documents the scheme the identifier adheres to.

If you don't know the publication identifier, you can provide the full bibliographic record as free text. N.B. The citation publication should not be confused with the attribution data, which are a legal obligation; citation through publications is a common practice in research.


resourceCreator

Usage

Recommended

Type

person or organization, both encoded with identifier or multilingual free text

Attributes

for person: ms:personIdentifierSchemeName (for identifiers) or xs:lang (for names); for organization: ms:organizationIdentifierSchemeName (for identifiers) or xs:lang (for names)

Definition/Explanations

Groups information on the person(s) or organization(s) that has/have created the resource

Recommended usage

The recommended way to refer to a person is by giving their identifier, preferably the ORCID; if you provide the identifier, please also select the relevant value from the list of values in the attribute "personIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the person, you may provide the name, preferably in the format "Surname, Firstname", at least in English; if you want to add names in other languages, you can use the "lang" attribute. The recommended way to refer to an organization is by giving its identifier (e.g. ISNI); if you provide the identifier, please also select the relevant value from the list of values in the attribute "organizationIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the organization, you may provide its name, at least in English; if you want to add names in other languages, you can use the "lang" attribute. The element can also be repeated to encode multiple persons/organizations. For corpora created through the OMTD corpus building process, the resource creator is considered to be the person who has put together the corpus through the user query.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:creator
DataCite 4.0: skos:closeMatch datacite:Creator with datacite:creatorName (familyName & givenName) or datacite:nameIdentifier & datacite:nameIdentifierScheme & datacite:schemeURI


creationDate

Usage

Recommended

Type

date pattern or date range

Definition/Explanations

The date of the creation of the resource, expressed as an exact date or as a range between a start and an end date

Recommended usage

Please indicate at least the year of creation, or a time interval. For corpora created through the OMTD corpus building process, the creationDate is inserted automatically.

Relation to other metadata schemas

DCMI: skos:exactMatch dct:created
DataCite 4.0: skos:exactMatch datacite:CreationDate


corpusType

Usage

Mandatory

Type

controlled vocabulary

Controlled vocabulary reference and/or values

ms:corpusType: raw, annotated, annotations

Definition/Explanations

The subtype of the corpus in terms of processing (i.e. whether it is raw/unprocessed, annotated, or composed only of annotations with links to the original raw corpus)

Recommended usage

Please select the appropriate value. For corpora created through the OMTD corpus building process, the value is automatically set to "raw".

Relation to other metadata schemas

DCMI: skos:narrowMatch dc:type


mediaType

Usage

Mandatory

Type

controlled vocabulary

Controlled vocabulary reference and/or values

ms:mediaType: text, audio, video, image

Definition/Explanations

Specifies the media type of the resource and basically corresponds to the physical medium of the content representation. Each media type is described through a distinctive set of features. A resource may consist of parts attributed to different types of media. A component may take as input/output more than one media type.

Recommended usage

OpenMinTeD only handles text resources, so the default value is set to "text".


lingualityType

Usage

Mandatory

Type

controlled vocabulary

Controlled vocabulary reference and/or values

ms:lingualityType: monolingual, bilingual, multilingual

Definition/Explanations

Indicates whether the resource contains one, two or more languages

Recommended usage

Please select one of the values. For corpora created through the OMTD corpus building process, the value can be computed automatically.
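Since lingualityType only records whether one, two or more languages are present, it can be derived from the language tags of the corpus documents. The Python sketch below assumes that a BCP 47 tag is available for each document and treats regional or script variants of the same language (e.g. en-US and en-GB) as one language; both are assumptions of the sketch, not requirements of the schema, and the function name is hypothetical.

```python
from typing import Iterable


def linguality_type(language_tags: Iterable[str]) -> str:
    """Derive an ms:lingualityType value from per-document BCP 47 tags.

    Only the primary language subtag is compared, so "en-US" and "en-GB"
    count as one language.
    """
    languages = {tag.split("-")[0].lower() for tag in language_tags}
    if not languages:
        raise ValueError("at least one language tag is required")
    if len(languages) == 1:
        return "monolingual"
    if len(languages) == 2:
        return "bilingual"
    return "multilingual"


print(linguality_type(["en", "en-GB", "de"]))  # bilingual
```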


multilingualityType

Usage

Mandatory under conditions

Conditions for usage

if lingualityType = bilingual or multilingual

Type

controlled vocabulary

Controlled vocabulary reference and/or values

ms:multilingualityType: parallel, comparable, multilingualSingleText, originalTranslationsInSameText, other

Definition/Explanations

Indicates whether the corpus is parallel, comparable or mixed

Recommended usage

Please select one of the values.


language

Usage

Mandatory

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

ms:language (a combination of languageId, scriptId, regionId and variantId according to the IETF BCP 47 guidelines)

Definition/Explanations

The language(s) of the corpus according to the IETF BCP 47 guidelines.

For corpora created through the OMTD corpus building process, the value can be computed automatically.

The element can be repeated to encode multiple languages.

Recommended usage

Please enter the language and, if needed, the region, script and variant identifiers that best fit the language of the document (e.g. en-US), according to the IETF BCP 47 guidelines.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:language
DataCite 4.0: skos:closeMatch datacite:Language
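The Python sketch below combines the languageId, scriptId, regionId and variantId parts into a BCP 47 tag in the order the standard prescribes, applying its recommended letter case (lowercase language, titlecase script, uppercase region). It is only a formatting helper under stated assumptions: it does not validate the subtags against the IANA Language Subtag Registry, and the function name is hypothetical.

```python
from typing import Optional


def bcp47_tag(language_id: str, script_id: Optional[str] = None,
              region_id: Optional[str] = None,
              variant_id: Optional[str] = None) -> str:
    """Join BCP 47 subtags in order, using the conventional letter case."""
    subtags = [language_id.lower()]           # e.g. "en"
    if script_id:
        subtags.append(script_id.title())     # e.g. "Latn"
    if region_id:
        subtags.append(region_id.upper())     # e.g. "US", or a numeric code like "419"
    if variant_id:
        subtags.append(variant_id.lower())    # e.g. "1996"
    return "-".join(subtags)


print(bcp47_tag("en", region_id="us"))                      # en-US
print(bcp47_tag("sr", script_id="latn", region_id="rs"))    # sr-Latn-RS
```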


sizePerLanguage

Usage

Recommended

Type

size pattern (size and sizeUnit)

Definition/Explanations

Provides information on the size per language subset

Recommended usage

You may indicate the size of the subsets of the corpus per language; to do so, fill in the appropriate number (without spaces) and select the appropriate sizeUnit (e.g. 20000 words). For corpora created through the OMTD corpus building process, this can be computed automatically, for instance in files/publications.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:extent
DataCite 4.0: skos:closeMatch datacite:size
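For corpora whose documents carry a language tag, sizePerLanguage (and the overall size) can be computed with a simple count. This Python sketch counts files and whitespace-separated tokens per language; treating such tokens as words is a rough approximation, and the input shape (pairs of language tag and text) and the function name are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def size_per_language(documents: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, int]]:
    """Count files and words per language.

    `documents` yields (language tag, text) pairs; words are approximated
    as whitespace-separated tokens.
    """
    files: Counter = Counter()
    words: Counter = Counter()
    for language, text in documents:
        files[language] += 1
        words[language] += len(text.split())
    return {lang: {"files": files[lang], "words": words[lang]} for lang in files}


docs = [("en", "The cat sat."), ("en", "Hello world"), ("de", "Guten Tag")]
result = size_per_language(docs)
total_words = sum(counts["words"] for counts in result.values())
print(result)       # {'en': {'files': 2, 'words': 5}, 'de': {'files': 1, 'words': 2}}
print(total_words)  # 7
```

The total obtained this way can also serve as the value of the "size" element, with "words" as the sizeUnit.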


size

Usage

Mandatory

Type

size pattern (size and sizeUnit)

Definition/Explanations

Provides information on the size of the resource or of resource parts.

Recommended usage

You may indicate the size of the entire corpus (or corpus parts) by filling in the appropriate number and selecting the appropriate sizeUnit (e.g. 20000 words). The preferred sizeUnit is words or sentences. If nothing else is known, please indicate at least the number of files. For corpora created through the OMTD corpus building process, this can be computed automatically, for instance in files/publications.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:extent
DataCite 4.0: skos:closeMatch datacite:size


mimeType

Usage

Recommended

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

ms:mimetype (a subset of values, the most popular ones for text files, from the IANA mimetype controlled vocabulary): text/plain, application/vnd.xmi+xml, text/xml, application/x-tmx+xml, application/x-xces+xml, application/tei+xml, application/rdf+xml, application/xhtml+xml, application/emma+xml, application/pls+xml, application/postscript, application/voicexml+xml, text/sgml, text/html, application/x-tex, application/rtf, application/json+ld, application/x-latex, text/csv, text/tab-separated-values, application/pdf, application/x-msaccess, audio/mp4, audio/mpeg, audio/wav, image/bmp, image/gif, image/jpeg, image/png, image/svg+xml, image/tiff, video/jpeg, video/mp4, video/mpeg, video/x-flv, video/x-msvideo, video/x-ms-wmv, application/msword, application/vnd.ms-excel, audio/mpeg3, text/turtle, other, audio/PCMA, audio/flac, audio/speex, audio/vorbis, video/mp2t

Definition/Explanations

The mime-type of the resource (a formalized specifier for the format) or a mime-type that the component accepts, in conformance with the values registered by the IANA (Internet Assigned Numbers Authority)

Recommended usage

Please select one of the pre-defined values (which are the most popular ones for text files) or add a value, PREFERABLY FROM THE IANA MEDIA MIMETYPE RECOMMENDED VALUES (http://www.iana.org/assignments/media-types/media-types.xhtml). The element can be repeated for corpora that include files of various formats.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:format
DataCite 4.0: skos:closeMatch datacite:Format
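A sketch, using only Python's standard mimetypes module, of how the repeatable mimeType values for a corpus might be collected from file names. It guesses from file extensions alone (not file contents), and MS_MIMETYPES reproduces only a few values of the vocabulary above; anything unrecognised is mapped to "other" for manual review. The helper name is hypothetical.

```python
import mimetypes
from typing import Iterable, List

# A few values from the ms:mimetype list above; extend as needed.
MS_MIMETYPES = {
    "text/plain", "text/xml", "text/html", "text/csv",
    "application/pdf", "application/xhtml+xml", "application/rtf",
}

# A fresh MimeTypes instance starts from Python's built-in table only, so the
# guesses do not depend on the host's mime.types files or registry.
_GUESSER = mimetypes.MimeTypes()


def collect_mime_types(filenames: Iterable[str]) -> List[str]:
    """Return the sorted, de-duplicated mimeType values for a set of files.

    Types that cannot be guessed, or that fall outside MS_MIMETYPES, are
    reported as "other" so that they can be reviewed by hand.
    """
    found = set()
    for name in filenames:
        guessed, _encoding = _GUESSER.guess_type(name)
        found.add(guessed if guessed in MS_MIMETYPES else "other")
    return sorted(found)
```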


characterEncoding

Usage

Recommended

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

ms:characterEncoding: a long list of popular character encodings

Definition/Explanations

The name of the character encoding used in the resource or accepted by the component

Recommended usage

Please select one of the pre-defined values; note, however, that for OpenMinTeD the preferred character encoding is UTF-8, to ensure interoperability between content and components. The element can be repeated for corpora that include files in various character encodings.
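Since UTF-8 is the preferred encoding, a provider might check files before upload. The following sketch assumes the raw bytes are available; note that a successful UTF-8 decode only shows that the bytes are valid UTF-8 (plain ASCII also passes), not that UTF-8 was the intended encoding. The helper names are hypothetical, and latin1_to_utf8 assumes the source is already known to be ISO-8859-1.

```python
from pathlib import Path
from typing import Iterable, List


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as UTF-8 (a leading BOM is tolerated)."""
    try:
        data.decode("utf-8-sig")
    except UnicodeDecodeError:
        return False
    return True


def non_utf8_files(paths: Iterable[Path]) -> List[Path]:
    """List the files whose bytes are not valid UTF-8 and may need re-encoding."""
    return [p for p in paths if not is_valid_utf8(p.read_bytes())]


def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode bytes that are known to be ISO-8859-1 (Latin-1) as UTF-8."""
    return data.decode("latin-1").encode("utf-8")
```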


domain

Usage

Recommended

Type

free text

Attributes

ms:classificationSchemeName and ms:schemeURI

Definition/Explanations

Domain of the corpus

Recommended usage

It is recommended that domain values are taken from an authoritative source, such as DDC (Dewey Decimal Classification, http://www.oclc.org/dewey/) or UDC (Universal Decimal Classification, http://www.udcc.org/), and that the source is identified; if you do so, please use the "classificationSchemeName" attribute to indicate the source; if the source is not included in its list of values, please use "schemeURI" with a link to a URL with more information on the scheme. The recommended way of adding domain values is to use the identifier of the domain in the scheme; further instructions on the standardization of the format will be provided.

Relation to other metadata schemas

DCMI: skos:narrowMatch dct:subject
DataCite 4.0: skos:exactMatch datacite:Subject with datacite:subjectScheme, datacite:schemeURI and datacite:valueURI
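A minimal sketch of recording a domain value together with its scheme. It assumes the scheme label "DDC" (a hypothetical stand-in for whatever value the classificationSchemeName list actually uses) and checks only the shape of a DDC notation (three digits, optionally followed by a point and further digits), not whether the class exists. Requiring a scheme name or URI is a stricter choice than the recommendation above; the function names are hypothetical.

```python
import re
from typing import Dict, Optional

# DDC notation shape: three digits, optionally a point and more digits.
_DDC_NOTATION = re.compile(r"[0-9]{3}(?:\.[0-9]+)?")


def is_ddc_notation(value: str) -> bool:
    """Shape check only; it does not verify that the class exists in DDC."""
    return _DDC_NOTATION.fullmatch(value) is not None


def domain_entry(value: str, scheme_name: Optional[str] = None,
                 scheme_uri: Optional[str] = None) -> Dict[str, str]:
    """Build a domain value carrying classificationSchemeName and/or schemeURI."""
    if scheme_name is None and scheme_uri is None:
        raise ValueError("identify the classification scheme (name or URI)")
    # "DDC" is a hypothetical scheme label used for illustration.
    if scheme_name == "DDC" and not is_ddc_notation(value):
        raise ValueError(f"not a DDC notation: {value!r}")
    entry = {"value": value}
    if scheme_name is not None:
        entry["classificationSchemeName"] = scheme_name
    if scheme_uri is not None:
        entry["schemeURI"] = scheme_uri
    return entry
```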


subject

Usage

Recommended

Type

free text

Attributes

ms:classificationSchemeName and ms:schemeURI

Definition/Explanations

Subject or topic of the corpus

Recommended usage

It is recommended that subjects are taken from an authoritative source, such as DDC (Dewey Decimal Classification, http://www.oclc.org/dewey/) or UDC (Universal Decimal Classification, http://www.udcc.org/), and that the source is identified; if you do so, please use the "classificationSchemeName" attribute to indicate the source; if the source is not included in its list of values, please use "schemeURI" with a link to a URL with more information on the scheme. The recommended way of adding subject values is to use the identifier of the subject in the scheme; further instructions on the standardization of the format will be provided.

Relation to other metadata schemas

DCMI: skos:narrowMatch dct:subject
DataCite 4.0: skos:exactMatch datacite:Subject with datacite:subjectScheme, datacite:schemeURI and datacite:valueURI


keyword

Usage

Recommended

Type

free text

Definition/Explanations

Words used for indexing the corpus

Recommended usage

A free-text element used for encoding keywords for the classification of the corpus, in English only; please encode one word/phrase at a time and repeat the element for multiple keywords.

Relation to other metadata schemas

DCMI: skos:narrowMatch dct:subject


userQuery

Usage

Mandatory when applicable

Type

free text

Definition/Explanations

The query text that was used to create the corpus of scholarly publications

Recommended usage

To be filled in automatically during the OMTD corpus building process


relationType

Usage

Recommended

Type

Open controlled vocabulary

Controlled vocabulary reference and/or values

ms:relationType: isPartOf, isPartWith, hasPart, hasOutcome, isCombinedWith, requiresLR, requiresSoftware, isexactMatch, isSimilarTo, isContinuationOf, isVersionOf, replaces, isReplacedWith, isCreatedBy, isElicitedBy, isRecordedBy, isEditedBy, isAnalysedBy, isEvaluatedBy, isQueriedBy, isAccessedBy, isArchivedBy, isDisplayedBy, isCompatibleWith

Definition/Explanations

Specifies the type of relation holding between two entities (e.g. two resources that together comprise one new resource, a corpus and the s/w component that has been used for its creation, or a corpus and the publication that describes it)

Recommended usage

For corpora, the recommended relations are isVersionOf and isSimilarTo, but any relationType can be used as appropriate.

Relation to other metadata schemas

DataCite 4.0: skos:closeMatch datacite:relationType
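The sketch below, under the assumption that a relation is handled as a (relatedResource1, relationType, relatedResource2) triple, classifies a relationType value against the listed vocabulary. Because the vocabulary is open, unlisted values are flagged for review rather than rejected; the names Relation and classify_relation_type are hypothetical.

```python
from typing import NamedTuple

# ms:relationType values as listed above (the vocabulary is open).
MS_RELATION_TYPES = {
    "isPartOf", "isPartWith", "hasPart", "hasOutcome", "isCombinedWith",
    "requiresLR", "requiresSoftware", "isexactMatch", "isSimilarTo",
    "isContinuationOf", "isVersionOf", "replaces", "isReplacedWith",
    "isCreatedBy", "isElicitedBy", "isRecordedBy", "isEditedBy",
    "isAnalysedBy", "isEvaluatedBy", "isQueriedBy", "isAccessedBy",
    "isArchivedBy", "isDisplayedBy", "isCompatibleWith",
}
RECOMMENDED_FOR_CORPORA = {"isVersionOf", "isSimilarTo"}


class Relation(NamedTuple):
    """relatedResource1 --relationType--> relatedResource2."""
    source: str         # relatedResource1
    relation_type: str  # relationType
    target: str         # relatedResource2


def classify_relation_type(value: str) -> str:
    """Return "recommended", "listed" or "unlisted" for a corpus relationType."""
    if value in RECOMMENDED_FOR_CORPORA:
        return "recommended"
    if value in MS_RELATION_TYPES:
        return "listed"
    return "unlisted"  # still allowed, but worth a manual review
```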


relatedResource1

Usage

Mandatory when applicable

Conditions for usage

when relationType is filled in

Type

ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)

Definition/Explanations

A name or an identifier (e.g. a URL reference) of the source resource that is related to the target resource (relatedResource2) through the relation described in relationType

Recommended usage

The recommended way of referring to a resource is by giving its identifier; if you provide the identifier, please also select the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide its name, at least in English; if you want to add names in other languages, you can use the "lang" attribute. For interoperability reasons, it is recommended to describe all related resources in the OpenMinTeD registry and refer to them through their identifiers.


relatedResource2

Usage

Mandatory when applicable

Conditions for usage

when relationType is filled in

Type

ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)

Definition/Explanations

A name or an identifier (e.g. a URL reference) of the target resource that is related to the source resource (relatedResource1) through the relation described in relationType

Recommended usage

The recommended way of referring to a resource is by giving its identifier; if you provide the identifier, please also select the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide its name, at least in English; if you want to add names in other languages, you can use the "lang" attribute. For interoperability reasons, it is recommended to describe all related resources in the OpenMinTeD registry and refer to them through their identifiers.
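A sketch of the conditional rule shared by relatedResource1 and relatedResource2: both become mandatory once relationType is filled in, an identifier needs a scheme, and the scheme "other" needs a schemeURI. The ResourceRef field names are hypothetical, and the checks are an interpretation of the recommended usage above, not an official validator.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ResourceRef:
    """A relatedResource1/relatedResource2 value (field names are illustrative)."""
    identifier: Optional[str] = None
    scheme_name: Optional[str] = None  # ms:resourceIdentifierSchemeName
    scheme_uri: Optional[str] = None   # ms:schemeURI, needed when scheme_name == "other"
    name_en: Optional[str] = None      # fallback when no identifier is known


def check_related(relation_type: Optional[str],
                  res1: Optional[ResourceRef],
                  res2: Optional[ResourceRef]) -> List[str]:
    """Return a list of problems; an empty list means the references look complete."""
    problems: List[str] = []
    if relation_type is None:
        return problems  # both elements are only mandatory alongside a relationType
    for label, ref in (("relatedResource1", res1), ("relatedResource2", res2)):
        if ref is None:
            problems.append(f"{label} is missing")
        elif ref.identifier is None and ref.name_en is None:
            problems.append(f"{label} needs an identifier or an English name")
        elif ref.identifier is not None and ref.scheme_name is None:
            problems.append(f"{label}: select a resourceIdentifierSchemeName")
        elif ref.scheme_name == "other" and ref.scheme_uri is None:
            problems.append(f"{label}: scheme 'other' requires a schemeURI")
    return problems
```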


Metadata schema for annotated corpora

Annotated corpora are documented either:

- as separate resources including only the annotated data, with a link to the raw corpus and their own set of metadata elements providing information on the annotation process, tool etc., or
- as a set of raw and annotated files together, with a metadata record that includes all the appropriate elements for raw corpora (cf. above) together with the additional set of metadata elements for annotations, i.e. all the following elements except for "resourceIdentifier".

OMTD-SHARE element Usage

resourceIdentifier M

annotationLevel M

annotationStandoff R

mimeType R

dataFormatSpecific R

documentationURL R

characterEncoding R

typesystem R

tagset R

annotationMode R

isAnnotatedBy R

annotationDate R
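A sketch of a completeness check against the table above, assuming a metadata record held as a plain dictionary keyed by element name. The combined flag covers the second documentation option (raw and annotated files in one record), where resourceIdentifier is not expected; the function name is hypothetical.

```python
from typing import Dict, List

# Usage codes from the table above (M = mandatory, R = recommended).
ANNOTATION_ELEMENTS: Dict[str, str] = {
    "resourceIdentifier": "M", "annotationLevel": "M",
    "annotationStandoff": "R", "mimeType": "R", "dataFormatSpecific": "R",
    "documentationURL": "R", "characterEncoding": "R", "typesystem": "R",
    "tagset": "R", "annotationMode": "R", "isAnnotatedBy": "R",
    "annotationDate": "R",
}


def check_annotation_metadata(record: Dict[str, object], combined: bool) -> Dict[str, List[str]]:
    """Report missing mandatory and recommended annotation elements.

    combined=True means raw and annotated files share one record, so
    resourceIdentifier is skipped, as stated in the text above.
    """
    missing: Dict[str, List[str]] = {"mandatory": [], "recommended": []}
    for element, usage in ANNOTATION_ELEMENTS.items():
        if combined and element == "resourceIdentifier":
            continue
        if record.get(element) in (None, "", []):
            missing["mandatory" if usage == "M" else "recommended"].append(element)
    return missing
```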


Guidelines for providers of ancillary knowledge resources

- Introduction
- Instructions for providers of ancillary knowledge resources


Introduction

Many TDM tools and services make use of ancillary knowledge resources. By knowledge resources, we mean information from some domain or area of human endeavor (e.g. linguistics, agriculture, or the social sciences), represented in a form that can be used to solve problems computationally in that domain or area [1]. The creation of such knowledge resources is widespread both in linguistics and in many domains where informatics is applied. These knowledge resources typically include controlled vocabularies, terminologies, lexica, ontologies, and so on.

As OpenMinTeD is about applying TDM to end-user domains, the resources used in those domains are of primary importance. Similarly, since text is central to OpenMinTeD tools and services, linguistic resources (e.g. resources that describe parts of speech) are also important.

OpenMinTeD tools and services may make use of these resources in order to process text. For example, a service may make use of a dictionary of archaeological terms when processing object descriptions. Alternatively, a service may make use of parts of speech to find the adjectives in a document, and use this information to help determine the sentiment of the document.

In order to make it easier to share the results of TDM, and to allow TDM tools and services to work together, OpenMinTeD makes a number of recommendations about how knowledge resources are represented. Knowledge resources that do not follow these recommendations can of course be used; however, interoperability will be reduced.

The OpenMinTeD recommendations on knowledge resources are based on the Linked Data paradigm. By "Linked Data", we mean data that is created and made available with the use of semantic web technologies and formats (e.g. RDF, OWL, SPARQL) and, most importantly, that is interrelated with other data.

[1] Poole, David and Alan Mackworth (2010) Artificial Intelligence, Cambridge University Press


Instructions for providers of ancillary knowledge resources

- How to register your knowledge resources
- How to make your knowledge resources interoperable
- How to document your knowledge resources
- Recommended schema for lexical/conceptual resources, incl. annotation resources
- Recommended schema for models


How to register your knowledge resources

Ancillary knowledge resources can be registered by authorised users, as decided in the OpenMinTeD Policies.

If you wish to register such a resource, the following requirements apply, depending on how the resource is registered:

- if the resource is being provided for upload to the OpenMinTeD registry, please package it as a zip file preserving the recommended folder structure;
- if the resource is available as part of a Maven artifact, please provide the appropriate Maven coordinates;
- if the resource is offered through a SPARQL endpoint or at a URL, please type in the relevant link.

In all cases, you must also provide a metadata record compliant with the OMTD-SHARE schema.

Where possible, e.g. when providing a Maven artifact, metadata may be converted, at least partially, from the existing descriptors. In all cases, you will be notified of the availability of converted metadata at the time of uploading.
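When a resource is registered through a Maven artifact, the coordinates are usually written as groupId:artifactId:version, optionally with packaging and classifier (groupId:artifactId:packaging:classifier:version). The following sketch parses that common string form; the example coordinates are hypothetical, and the sketch does not check that the artifact exists in any repository.

```python
from typing import NamedTuple, Optional


class MavenCoordinates(NamedTuple):
    group_id: str
    artifact_id: str
    version: str
    packaging: Optional[str] = None
    classifier: Optional[str] = None


def parse_coordinates(text: str) -> MavenCoordinates:
    """Parse groupId:artifactId[:packaging[:classifier]]:version."""
    parts = text.strip().split(":")
    if any(not p for p in parts):
        raise ValueError(f"empty field in {text!r}")
    if len(parts) == 3:
        g, a, v = parts
        return MavenCoordinates(g, a, v)
    if len(parts) == 4:
        g, a, p, v = parts
        return MavenCoordinates(g, a, v, packaging=p)
    if len(parts) == 5:
        g, a, p, c, v = parts
        return MavenCoordinates(g, a, v, packaging=p, classifier=c)
    raise ValueError(f"expected 3 to 5 colon-separated fields in {text!r}")


# Hypothetical coordinates for a knowledge resource packaged on its own.
example = parse_coordinates("org.example:my-lexicon:1.2.0")
```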


How to make your knowledge resources interoperable

In addition, if you want to be fully compliant with the OpenMinTeD interoperability requirements, please ensure that:

- you provide the resource in a standard format, preferably in an XML- or JSON-based syntax, or in any other RDF serialisation format (e.g. Turtle or N3);
- all elements in the knowledge resource are identified with a URI; for Linked Data resources, the following identifiers should be used:
  - JSON-LD: the @id keyword
  - RDF/XML: the attributes xml:base, rdf:ID and rdf:about
  - XML: the xml:id attribute
- you register knowledge resources independently of any component that uses them, e.g. in a separate Maven artifact.
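A minimal JSON-LD sketch of the URI requirement, built with Python's json module: every node in the graph carries an @id. The base URI http://example.org/lexicon/ and the two SKOS concepts are hypothetical, and missing_ids is an illustrative helper, not part of any OpenMinTeD tool.

```python
import json
from typing import List

# Hypothetical base URI for the resource's elements.
BASE = "http://example.org/lexicon/"

document = {
    "@context": {"skos": "http://www.w3.org/2004/02/skos/core#"},
    "@graph": [
        {
            "@id": BASE + "term/oak",
            "@type": "skos:Concept",
            "skos:prefLabel": {"@value": "oak", "@language": "en"},
        },
        {
            "@id": BASE + "term/beech",
            "@type": "skos:Concept",
            "skos:prefLabel": {"@value": "beech", "@language": "en"},
        },
    ],
}


def missing_ids(doc: dict) -> List[int]:
    """Return the positions of top-level graph nodes that lack an @id."""
    return [i for i, node in enumerate(doc.get("@graph", [])) if "@id" not in node]


serialized = json.dumps(document, indent=2, ensure_ascii=False)
```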

If you provide the resource:

- in another format, given that adherence to Linked Data standards is not imposed, or
- packaged in Maven artifacts together with the components that use it (at the expense, however, of reusability),

you still qualify for partial compliance.


How to document your knowledge resources

To be fully compatible with OpenMinTeD, you must:

- ensure that the resource is distributed under Open Access conditions;
- include in the metadata record a link to the licence document that describes the terms and conditions under which the resource is provided, and attach the licence document to the resource;
- if you already have a PID for your resource (e.g. a URI or a HANDLE), make sure it is included in the metadata record (cf. identifier for more information);
- provide linkage between your resource and other resources (domain-specific or generic); for links between knowledge resources in the Linked Data paradigm, mappings should be expressed through RDF statements, using relations from SKOS together with the following OWL and RDFS properties: owl:sameAs, owl:equivalentClass, owl:equivalentProperty, rdfs:subClassOf, rdfs:subPropertyOf;
- version all your resources and label the versions unambiguously, preferably following the Semantic Versioning recommendations.
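Two of the requirements above lend themselves to simple automated checks. The sketch below (a) writes a mapping as an N-Triples line, accepting only the SKOS mapping relations (exactMatch, closeMatch, broadMatch, narrowMatch, relatedMatch) plus the OWL and RDFS properties listed, and (b) tests a version label against the regular expression suggested in the Semantic Versioning 2.0.0 FAQ. Restricting SKOS to its mapping relations is an assumption, and the function names are hypothetical.

```python
import re

SKOS = "http://www.w3.org/2004/02/skos/core#"
OWL = "http://www.w3.org/2002/07/owl#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# SKOS mapping relations plus the OWL/RDFS properties named in the text.
ALLOWED = {
    "skos:exactMatch": SKOS + "exactMatch", "skos:closeMatch": SKOS + "closeMatch",
    "skos:broadMatch": SKOS + "broadMatch", "skos:narrowMatch": SKOS + "narrowMatch",
    "skos:relatedMatch": SKOS + "relatedMatch",
    "owl:sameAs": OWL + "sameAs", "owl:equivalentClass": OWL + "equivalentClass",
    "owl:equivalentProperty": OWL + "equivalentProperty",
    "rdfs:subClassOf": RDFS + "subClassOf", "rdfs:subPropertyOf": RDFS + "subPropertyOf",
}


def mapping_triple(subject_uri: str, prop: str, object_uri: str) -> str:
    """Serialise one mapping between two URIs as an N-Triples line."""
    if prop not in ALLOWED:
        raise ValueError(f"{prop} is not one of the recommended mapping properties")
    return f"<{subject_uri}> <{ALLOWED[prop]}> <{object_uri}> ."


# MAJOR.MINOR.PATCH with optional pre-release and build metadata (SemVer 2.0.0).
SEMVER = re.compile(
    r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    r"(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?"
)


def is_semver(label: str) -> bool:
    """Return True if the whole label is a Semantic Versioning 2.0.0 version."""
    return SEMVER.fullmatch(label) is not None
```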

The following recommendations contribute to interoperability but are not yet enforced:

- Provide appropriate documentation for your resource (e.g. publications about the design and construction of the resource), which you should also version along with the knowledge resource and add as a reference to your metadata record.
- Recommend one of the publications about your resource as the one to be cited for scholarly attribution and add this information to the metadata record.
- Make sure that you fill in all the metadata elements required for citing your resource, i.e. the creator of the resource, a title, the resource type and an identifier, and, optionally, the publication date, the version and the publisher or distributor of the resource.
- Use standard classification vocabularies (e.g. MeSH, DDC, LCSH etc.) for adding classification tags to your material, and specify the vocabulary you use in the metadata record; provide at least one broad category for your material (e.g. life sciences, computing etc.).
- In all cases where links to other resources or entities (e.g. persons, projects etc.) are added to the metadata record, please use, to the extent possible, unique and persistent identifiers from authority lists and sources, and document the authority and/or scheme they adhere to.
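A sketch of a citation check, assuming the metadata record is a dictionary with the hypothetical keys creator, title, resourceType and identifier, plus the optional publicationYear, version and publisher. The output layout is loosely modelled on DataCite's recommended citation format; the exact punctuation is an assumption.

```python
from typing import Dict, List

REQUIRED_FOR_CITATION = ("creator", "title", "resourceType", "identifier")


def missing_citation_elements(record: Dict[str, str]) -> List[str]:
    """List the required citation elements that are absent or empty."""
    return [k for k in REQUIRED_FOR_CITATION if not record.get(k)]


def citation(record: Dict[str, str]) -> str:
    """Build a citation string, raising if a required element is missing."""
    missing = missing_citation_elements(record)
    if missing:
        raise ValueError("missing for citation: " + ", ".join(missing))
    head = record["creator"]
    if record.get("publicationYear"):
        head += f" ({record['publicationYear']})"
    parts = [head, record["title"]]
    # Optional elements are included only when present.
    parts += [record[k] for k in ("version", "publisher") if record.get(k)]
    parts += [record["resourceType"], record["identifier"]]
    return ". ".join(parts)
```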

The following sections include a synopsis of the minimal schemas for ancillary knowledge resources, i.e. the subset of M(andatory) and strongly R(ecommended) metadata elements per resource type, given that knowledge resources may take one of the following resource types:

- lexical/conceptual resource: reserved not only for lexica, ontologies, term lists, glossaries etc. but also for any resource that can be used for annotation purposes, e.g. linguistic tagsets, type systems etc.;
- language description: reserved mainly for computational grammars;
- model: for machine learning and statistical models.

It should also be noted that additional elements required for the management of the metadata record (e.g. metadataCreationDate, metadataCreator etc.) are not presented here, as they are to be handled by the OMTD platform.

For citation, OpenMinTeD endorses the Joint Declaration of Data Citation Principles, as well as the more specialised RDA recommendations for data citation of evolving data and the DataCite guidelines.


Recommended schema for lexical/conceptual resources, incl. annotation resources


OMTD-SHARE element Usage

resourceType M

resourceName M

description M

identifier M

version M

licence or rightsStmtName & rightsStmtURL (one of the two must be provided) M

version of licence M

distributionMedium M

downloadURL M when applicable

contactEmail or landingPage (one of the two must be provided) M

contactPerson (identifier or personName) R

contactGroup (identifier or organizationName) R

mustBeCitedWith R

lexicalConceptualResourceType M

encodingLevel R

linguisticInformation R

conformanceToStandardsBestPractices R

lingualityType M

language M

metalanguage R

size&sizeUnit M

mimeType R

characterEncoding R

domain R

relationType R

relatedResource1 R

relatedResource2 R


resourceType

Usage

Mandatory

Type

Closed controlled vocabulary

Attributes

Controlled vocabulary reference and/or values

ms:resourceType: corpus, lexicalConceptualResource, languageDescription, model, component

Definition/Explanations

Specifies the type of the resource being described or the type of the resource that a component takes as input or produces as output

Recommended usage

For lexical/conceptual resources, the fixed value "lexicalConceptualResource" must be added automatically.

Relation to other metadata schemas

DCMI: skos:narrowMatch dct:type
DataCite 4.0: skos:closeMatch datacite:resourceTypeGeneral & datacite:resourceType; the recommended usage for lexical/conceptual resources is to use "Dataset", but the values "Collection" and "Text" can also be used
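A sketch of the automatic fill-in described above, plus a lookup of the DataCite resourceTypeGeneral value. Only the lexical/conceptual mapping ("Dataset") is stated in this section, so other types deliberately return None; the function names are hypothetical.

```python
from typing import Dict, Optional

# Closed vocabulary ms:resourceType, as listed above.
MS_RESOURCE_TYPES = {
    "corpus", "lexicalConceptualResource", "languageDescription", "model", "component",
}

# Only the lexical/conceptual mapping is stated in this section ("Collection"
# and "Text" are the allowed alternatives); other types are left undecided.
DATACITE_GENERAL = {"lexicalConceptualResource": "Dataset"}


def with_fixed_resource_type(record: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of a lexical/conceptual resource record with resourceType set."""
    existing = record.get("resourceType")
    if existing not in (None, "lexicalConceptualResource"):
        raise ValueError(f"unexpected resourceType {existing!r}")
    return {**record, "resourceType": "lexicalConceptualResource"}


def datacite_resource_type_general(ms_type: str) -> Optional[str]:
    """Map an ms:resourceType value to DataCite resourceTypeGeneral, if stated."""
    if ms_type not in MS_RESOURCE_TYPES:
        raise ValueError(f"{ms_type!r} is not in the closed ms:resourceType vocabulary")
    return DATACITE_GENERAL.get(ms_type)
```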


resourceName

Usage

Mandatory

Type

Multilingual free text

Attributes

xs:lang

Definition/Explanations

The full name by which the resource is known

Recommended usage

Please provide a short but descriptive and unique name for the resource, e.g. "Greek PAROLE lexicon" instead of just "a monolingual lexicon of Greek". Provide the name in English; if you want to add names in other languages, you can use the "lang" attribute.

Relation to other metadata schemas

DCMI: skos:exactMatch dct:title
DataCite 4.0: skos:exactMatch datacite:title
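A sketch of producing one resourceName element per language with Python's xml.etree.ElementTree, assuming the language attribute is serialised as the standard xml:lang attribute and that namespaces for the OMTD-SHARE elements can be omitted for illustration. English is required and placed first, following the recommendation above; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Clark notation for xml:lang; ElementTree serialises it with the reserved "xml" prefix.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def resource_names(names: Dict[str, str]) -> List[ET.Element]:
    """Build one resourceName element per language, English first."""
    if "en" not in names:
        raise ValueError("an English resourceName is required")
    ordered = ["en"] + sorted(lang for lang in names if lang != "en")
    elements = []
    for lang in ordered:
        element = ET.Element("resourceName", {XML_LANG: lang})
        element.text = names[lang]
        elements.append(element)
    return elements
```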


description

Usage

Mandatory

Type

Multilingual free text

Attributes

xs:lang

Definition/Explanations

Provides the description of the resource in prose

Recommended usage

Give a brief yet informative description of the resource contents, mentioning at least the language(s), subject(s)/domain(s) and, if possible, the size and provenance. Please provide the text in English; if you want to add texts in other languages, you can add them using the "lang" attribute to specify the language.

Relation to other metadata schemas

DCMI: skos:exactMatch dct:abstract
DataCite 4.0: skos:exactMatch datacite:description with the value "Abstract" for datacite:descriptionType


identifier

Usage

Mandatory

Type

free text

Attributes

ms:resourceIdentifierSchemeName or ms:schemeURI

Definition/Explanations

Reference to a PID, DOI or any kind of identifier used by the resource provider for the resource

Recommended usage

Provide a unique identifier already assigned by an authoritative source, and either:

- use the attribute "resourceIdentifierSchemeName" to specify the scheme, by selecting one of the pre-defined values (e.g. DOI, HDL, ISLRN etc.), or,
- if the scheme is not listed among them, select the "other" value and use the attribute "schemeURI" to provide a link to the URL that documents the scheme the identifier adheres to.

If the resource doesn't have a unique identifier, an identifier will be assigned by OpenMinTeD.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:identifier
DataCite 4.0: skos:broadMatch datacite:identifier (identifierType can only be DOI)
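A minimal sketch pairing an identifier with its scheme attributes. The DOI check is only a loose shape test ("10." + registrant code + "/" + suffix) and does not resolve the DOI; scheme values other than "DOI" and "other" are passed through unchecked. The function name is hypothetical.

```python
import re
from typing import Dict, Optional

# Loose DOI shape: "10." + 4-9 digit registrant code + "/" + non-blank suffix.
_DOI = re.compile(r"10\.\d{4,9}/\S+")


def identifier_entry(value: str, scheme: str,
                     scheme_uri: Optional[str] = None) -> Dict[str, str]:
    """Attach resourceIdentifierSchemeName (and schemeURI for "other") to an identifier."""
    if scheme == "DOI" and not _DOI.fullmatch(value):
        raise ValueError(f"{value!r} does not look like a bare DOI (10.xxxx/...)")
    entry = {"value": value, "resourceIdentifierSchemeName": scheme}
    if scheme == "other":
        if not scheme_uri:
            raise ValueError('scheme "other" requires a schemeURI')
        entry["schemeURI"] = scheme_uri
    return entry
```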


licence

Usage

Mandatory under conditions

Conditions for usage

either licence or rightsStmt must be filled in

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

ms:licence: CC-BY, CC-BY-NC, CC-BY-NC-ND, CC-BY-NC-SA, CC-BY-ND, CC-BY-SA, CC-ZERO, PDDL, ODC-BY, ODbL, MS-NoReD, MS-NoReD-FF, MS-NoReD-ND, MS-NoReD-ND-FF, MS-NC-NoReD, MS-NC-NoReD-FF, MS-NC-NoReD-ND, MS-NC-NoReD-ND-FF, ELRA_END_USER, ELRA_EVALUATION, ELRA_VAR, CLARIN_PUB, CLARIN_ACA, CLARIN_ACA-NC, CLARIN_RES, AGPL, ApacheLicence_2.0, BSD_4-clause, BSD_3-clause, FreeBSD, GFDL, GPL, LGPL, MIT, Princeton_Wordnet, proprietary, underNegotiation, nonStandardLicenceTerms

Definition/Explanations

The licence of use for the resource

Recommended usage

You can provide information on the rights of accessing and using a resource in one of the following ways, in order of preference:

- use the element "licence" and select one of the recommended licences; please note that the list mixes licences intended for data resources and for components; for components, the recommended licences are the Open Source licences; for data resources, please use a standard licence, such as one of the CC family; if the licence you use is not included in the list, you can use the "nonStandardLicenceTerms" or the "proprietary" values and give further information on your licence in the elements "nonStandardLicenceName", "nonStandardLicenceTermsURL" and "nonStandardLicenceTermsText";
- you can also use the "rightsStmtName" and the "rightsStmtURL" elements (with a link to a URL with more explanations on its usage) if the resource is provided with a general statement of use and not an official licence document; please note that this option is meant mainly to help end-users access your resource, and you are strongly advised to license your resource properly.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:license
DataCite 4.0: skos:closeMatch datacite:rights
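A sketch of the licence and rights-statement conditions described in this and the following element definitions, assuming a record held as a dictionary keyed by element name. Unlisted rightsStmtName values produce a notice rather than an error, since that vocabulary is open; check_rights is a hypothetical helper.

```python
from typing import Dict, List

NON_STANDARD = {"nonStandardLicenceTerms", "proprietary"}
RIGHTS_STMT_NAMES = {"openAccess", "closedAccess", "embargoedAccess", "restrictedAccess"}


def check_rights(record: Dict[str, str]) -> List[str]:
    """Return problems with the licence / rights statement part of a record."""
    problems: List[str] = []
    licence = record.get("licence")
    has_stmt = bool(record.get("rightsStmtName") or record.get("rightsStmtURL"))
    if not licence and not has_stmt:
        problems.append("either licence or a rights statement must be filled in")
    if licence in NON_STANDARD:
        if not record.get("nonStandardLicenceName"):
            problems.append("give nonStandardLicenceName for a non-standard licence")
        if not (record.get("nonStandardLicenceTermsURL")
                or record.get("nonStandardLicenceTermsText")):
            problems.append("give nonStandardLicenceTermsURL (preferred) or nonStandardLicenceTermsText")
    stmt = record.get("rightsStmtName")
    if stmt and stmt not in RIGHTS_STMT_NAMES:
        problems.append(f"{stmt!r} is not a predefined rightsStmtName (the list is open)")
    return problems
```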


rightsStmtName

Usage

Mandatory under conditions

Conditions for usage

either licence or rightsStmt must be filled in

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

ms-omtd:rightsStmtName: openAccess, closedAccess, embargoedAccess, restrictedAccess

Definition/Explanations

The name of an official statement indicative of the licensing terms for the use of a resource (e.g. open access, free to read etc.); its semantics should be clear, preferably formally expressed and stored at a URL.

The current list of predefined values comes from OpenAIRE, but it is under revision.

Recommended usage

The "rightsStmtName" and "rightsStmtURL" elements can be used in addition to the "licence" value in order to help users understand the licensing terms of a resource.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:accessRights
DataCite 4.0: skos:closeMatch datacite:rights


rightsStmtURL

Usage

Mandatory under conditions

Conditions for usage

either licence or rightsStmt must be filled in

Type

URL pattern

Definition/Explanations

Link to the URL with the text that formally explains the licensing conditions imposed by the rights statement.

Recommended usage

The "rightsStmtName" and "rightsStmtURL" elements can be used in addition to the "licence" value in order to help users understand the licensing terms of a resource.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:accessRights
DataCite 4.0: skos:closeMatch datacite:rightsURI


nonStandardLicenceName

Usage

Mandatory under conditions

Conditions for usage

to be used with ms:licence other (i.e. for licences not included in the list of recommended ones)

Type

free text

Definition/Explanations

The name by which a licence is known; to be used for licences not included in the pre-defined list of recommended licences

Recommended usage

Please provide the name of the licence if it is already known, or supply one that can uniquely identify it.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:title (for dct:licenseDocument)


nonStandardLicenceTermsURL

Usage

Mandatory under conditions

Conditions for usage

to be used with ms:licence other (i.e. for licences not included in the list of recommended ones)

Type

URL pattern

Definition/Explanations

Used to provide a hyperlink to a URL containing the text of a licence not included in the predefined list, or describing the terms of use for a language resource or the terms of service for web services

Recommended usage

Please provide the link to the full-text document of the licence. Note that this is preferred over inserting the licence text in the element "nonStandardLicenceTermsText", as it provides a permanent location for the licence that is accessible to all.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:license
DataCite 4.0: skos:closeMatch datacite:rightsURI


version of licence

Usage

Mandatory

Type

free text

Definition/Explanations

The version of the licence

Recommended usage

You are advised to indicate the version of the licence of your resource; the latest version is the preferred option, e.g. "4.0" for all CC licences and "2.0" for the META-SHARE NoReD ones.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:hasVersion (for dct:licenseDocument)


distributionMedium

Usage

Mandatory

Type

Open controlled vocabulary

Controlled vocabulary reference and/or values

ms:distributionMedium: webExecutable, paperCopy, hardDisk, bluRay, DVD-R, CD-ROM, downloadable, accessibleThroughInterface, other

Definition/Explanations

Specifies the medium (channel) used for delivering or providing access to the resource

Recommended usage

Please use one of the provided values to indicate the medium of distribution. For interoperability reasons, the recommended way of providing annotation resources (e.g. tagsets, ontologies etc.) is to distribute them in a downloadable form or in a way that can be easily accessed by software.


downloadURL

Usage

Mandatory under conditions

Conditions for usage

if distributionMedium = downloadable

Definition/Explanations

Any URL from which the resource can be downloaded

Recommended usage

Please use this element for resources whose actual content is not uploaded to OpenMinTeD; in this case, please ensure that the URL leads to the resource itself and not to a landing page.


accessURL

Usage

Mandatory under conditions

Conditions for usage

if distributionMedium = webExecutable or accessibleThroughInterface

Definition/Explanations

A landing page, feed, SPARQL endpoint etc. that gives access to the resource, or where the web service/workflow is executed

Recommended usage

Please use this element for resources that are "accessibleThroughInterface" or "webExecutable".
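The conditional rules for downloadURL and accessURL can be checked mechanically before a record is submitted. The following is a minimal sketch, assuming a record represented as a plain Python dict keyed by element name (real OMTD-SHARE records are XML) and assuming that only http(s) URLs are acceptable; check_distribution and is_http_url are hypothetical helpers, not part of any OpenMinTeD API.

```python
from urllib.parse import urlparse

# Hypothetical helper: records are assumed to be plain dicts keyed by
# OMTD-SHARE element name; real OMTD-SHARE records are XML documents.
ACCESS_MEDIA = {"webExecutable", "accessibleThroughInterface"}


def is_http_url(value):
    """Return True if value looks like an absolute http(s) URL (assumption)."""
    parts = urlparse(value or "")
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def check_distribution(record):
    """Check the conditional downloadURL/accessURL rules for one record."""
    problems = []
    medium = record.get("distributionMedium")
    if medium == "downloadable" and not is_http_url(record.get("downloadURL")):
        problems.append("downloadURL is required when distributionMedium is downloadable")
    if medium in ACCESS_MEDIA and not is_http_url(record.get("accessURL")):
        problems.append("accessURL is required when distributionMedium is " + medium)
    return problems


print(check_distribution({"distributionMedium": "downloadable",
                          "downloadURL": "https://example.org/lexicon.zip"}))  # []
```

Other media (e.g. CD-ROM) impose neither URL in this sketch, mirroring the conditions stated for the two elements.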


contactEmail

Usage

Mandatory under conditions

Conditions for usage

An email or a landingPage must be provided

Type

email pattern

Definition/Explanations

A general email address that can be used as a contact point for a resource

Recommended usage

You can indicate a contact point where users can solicit further information in one of the following ways, in order of preference:

give a general email address in the "contactEmail" element, or provide in "landingPage" the link to a web page that documents the resource (e.g. a page with documentation, examples and links to the resource itself). You can also indicate the person(s) or group(s) responsible for communication in the "contactPerson" and "contactGroup" elements.


landingPage

Usage

Mandatory under conditions

Conditions for usage

An email or a landingPage must be provided

Type

URL pattern

Definition/Explanations

A URL used as the landing page of a resource, providing general information; for instance, it may present a description of the resource and its creators, and possibly include links to the URL from which it can be accessed

Recommended usage

You can indicate a contact point where users can solicit further information in one of the following ways, in order of preference:

give a general email address in the "contactEmail" element, or provide in "landingPage" the link to a web page that documents the resource (e.g. a page with documentation, examples and links to the resource itself). You can also indicate the person(s) or group(s) responsible for communication in the "contactPerson" and "contactGroup" elements.
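The rule shared by contactEmail and landingPage (at least one of the two must be provided) can be expressed as a small check. This is a minimal sketch, assuming a record held as a plain dict keyed by element name; the email regular expression is a deliberately loose assumption and is not the actual "email pattern" defined by OMTD-SHARE.

```python
import re

# Deliberately loose shape check; the exact "email pattern" used by
# OMTD-SHARE is not reproduced here (assumption).
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def check_contact(record):
    """Apply the rule that a contactEmail or a landingPage must be given."""
    email = record.get("contactEmail")
    page = record.get("landingPage")
    if not email and not page:
        return ["either contactEmail or landingPage must be provided"]
    if email and not EMAIL_RE.match(email):
        return ["contactEmail does not look like an email address"]
    return []


print(check_contact({"landingPage": "https://example.org/lexicon"}))  # []
```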


contactPerson (identifier or personName)

Usage

Recommended

Type

identifier or multilingual free text

Attributes

ms:personIdentifierSchemeName (for identifiers) or xs:lang (for names)

Definition/Explanations

Groups information on the person(s) responsible for providing further information regarding the resource

Recommended usage

The recommended way of referring to a person is by giving their identifier, preferably the ORCID; if you provide the identifier, please also select the relevant value from the list of values in the attribute "personIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the person, you may provide the name, preferably in the format "Surname, Firstname", at least in English; if you want to add names in other languages, you can use the "lang" attribute. The element can also be repeated to encode multiple persons. If you decide to add a contactPerson instead of a general contactEmail, please ensure that the data (including the email) of this person are also uploaded in OpenMinTeD.

Relation to other metadata schemas

DataCite 4.0: contributor with datacite:contributorType="ContactPerson" (datacite:contributorName (familyName & givenName) or datacite:nameIdentifier, datacite:nameIdentifierScheme and datacite:schemeURI)
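Since ORCID is the preferred identifier for a contactPerson, a provider may wish to verify ORCIDs before entering them. The sketch below implements the ISO 7064 MOD 11-2 check character used by ORCID identifiers; accepting the https://orcid.org/ URI form alongside the bare 16-character form is an assumption about what a registry would accept, and is_valid_orcid is a hypothetical helper name.

```python
import re

ORCID_RE = re.compile(r"^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$")
ORCID_PREFIX = "https://orcid.org/"


def orcid_check_char(base_digits):
    """ISO 7064 MOD 11-2 check character over the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(value):
    """Accept '0000-0002-1825-0097' or its https://orcid.org/ URI form."""
    value = value.strip()
    if value.startswith(ORCID_PREFIX):
        value = value[len(ORCID_PREFIX):]
    if not ORCID_RE.match(value):
        return False
    digits = value.replace("-", "")
    return orcid_check_char(digits[:15]) == digits[15]


print(is_valid_orcid("0000-0002-1825-0097"))  # True
```

A name such as "Surname, Firstname" fails this check, which is the cue to fill in the name field rather than the identifier.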


contactGroup (identifier or organizationName)

Usage

Recommended

Type

identifier or multilingual free text

Attributes

ms:organizationIdentifierSchemeName (for identifiers) or xs:lang (for names)

Definition/Explanations

Groups information on the group(s) responsible for providing further information regarding the resource

Recommended usage

The recommended way of referring to a group (currently modelled as an organization) is by giving its identifier (e.g. ISNI, FundRef); if you provide the identifier, please also select the relevant value from the list of values in the attribute "organizationIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the group (organization), you may provide the name, at least in English; if you want to add names in other languages, you can use the "lang" attribute. If you decide to add a contactGroup instead of another contact option, please ensure that the data (including the communication data) of this group (organization) are also uploaded in OpenMinTeD.


mustBeCitedWith

Usage

Recommended

Type

identifier or free text

Definition/Explanations

Publication to be used for citation purposes as requested by resource providers (usually a scientific article that describes the resource)

Recommended usage

The preferred option for referring to a publication is to provide its unique identifier, already assigned by an authoritative source; the preferred identifier for publications is the DOI. Use the attribute "publicationIdentifierSchemeName" to specify the scheme by selecting one of the pre-defined values (e.g. DOI, ISBN etc.) or, if the scheme is not listed among them, select the "other" value and use the attribute "schemeURI" to provide a link to the URL that documents the scheme it adheres to. If you don't know the publication identifier, you can provide the full bibliographic record in free-text format. N.B. The citation publication should not be confused with the attribution data, which is a legal obligation; citation through publications is a common practice in research.
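To decide between filling in a DOI (with publicationIdentifierSchemeName = DOI) and falling back to a free-text bibliographic record, a provider might recognise DOIs automatically. This is a minimal sketch using the pattern Crossref suggests for modern DOIs, which covers most DOIs in use but is not a complete definition of DOI syntax; classify_citation and the list of accepted prefixes are assumptions for illustration.

```python
import re

# Pattern suggested by Crossref for modern DOIs; it covers most DOIs in
# use but is not a complete definition of DOI syntax.
DOI_RE = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)
# Prefixes stripped before matching (an assumed, non-exhaustive list).
DOI_PREFIXES = ("https://doi.org/", "http://doi.org/", "http://dx.doi.org/", "doi:")


def classify_citation(value):
    """Return ("DOI", doi) for a recognisable DOI, else (None, free text)."""
    text = value.strip()
    candidate = text
    for prefix in DOI_PREFIXES:
        if candidate.lower().startswith(prefix):
            candidate = candidate[len(prefix):]
            break
    if DOI_RE.match(candidate):
        return ("DOI", candidate)
    return (None, text)


print(classify_citation("https://doi.org/10.1000/182"))  # ('DOI', '10.1000/182')
```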


lexicalConceptualResourceType

Usage

Mandatory

Type

Open controlled vocabulary

Controlled vocabulary reference and/or values

ms:lexicalConceptualResourceType: wordList, computationalLexicon, ontology, wordnet, thesaurus, framenet, terminologicalResource, machineReadableDictionary, lexicon, typesystem, tagset, mappingOfResources, other

Definition/Explanations

Specifies the type of lexical/conceptual resource

Relation to other metadata schemas

DCMI: skos:narrowMatch dct:type


encodingLevel

Usage

Recommended

Type

Open controlled vocabulary

Controlled vocabulary reference and/or values

ms:encodingLevel: phonetics, phonology, semantics, morphology, syntax, pragmatics, other

Definition/Explanations

Information on the contents of the lexicalConceptualResource as regards the linguistic level of analysis


linguisticInformation

Usage

Recommended

Type

Open controlled vocabulary

Controlled vocabulary reference and/or values

ms:linguisticInformation: accentuation, lemma, lemma-MultiWordUnits, lemma-Variants, lemma-Abbreviations, lemma-Compounds, lemma-CliticForms, partOfSpeech, morpho-Features, morpho-Case, morpho-Gender, morpho-Number, morpho-Degree, morpho-IrregularForms, morpho-Mood, morpho-Tense, morpho-Person, morpho-Aspect, morpho-Voice, morpho-Auxiliary, morpho-Inflection, morpho-Reflexivity, syntax-SubcatFrame, semantics-Traits, semantics-SemanticClass, semantics-CrossReferences, semantics-Relations, semantics-Relations-Hyponyms, semantics-Relations-Hyperonyms, semantics-Relations-Synonyms, semantics-Relations-Antonyms, semantics-Relations-Troponyms, semantics-Relations-Meronyms, usage-Frequency, usage-Register, usage-Collocations, usage-Examples, usage-Notes, definition/gloss, translationEquivalent, phonetics-Transcription, semantics-Domain, semantics-EventType, semantics-SemanticRoles, statisticalProperties, morpho-Derivation, semantics-QualiaStructure, syntacticoSemanticLinks, other

Definition/Explanations

A more detailed account of the linguistic information contained in the lexicalConceptualResource

Relation to other metadata schemas

DataCite 4.0: creator with creatorName or nameIdentifier & nameIdentifierScheme & schemeURI; N.B. creatorName: familyName & givenName in v4


conformanceToStandardsBestPractices

Usage

Recommended

Type

Open controlled vocabulary

Controlled vocabulary reference and/or values

ms:conformanceToStandardsBestPractices: AgroVoc, ALVIS, ARGO, BML, CES, DKPro_Core, EAGLES, EDAMontology, ELSST, EML, EMMA, GATE, GESIS, GMX, GrAF, HamNoSys, HASSET, InkML, ILSP_NLP, ISO12620, ISO16642, ISO1987, ISO26162, ISO30042, ISO704, JATS, LAF, LAPPS, Lemon, LMF, MAF, MLIF, MOSES, MULTEXT, MUMIN, multimodalInteractionFramework, OAXAL, OLIA, OWL, PANACEA, pennTreeBank, pragueTreebank, RDF, SemAF, SemAF_DA, SemAF_NE, SemAF_SRL, SemAF_DS, SKOS, SRX, SynAF, TBX, TMX, TEI, TEI_P3, TEI_P4, TEI_P5, TimeML, XCES, XLIFF, UD, WordNet, other

Definition/Explanations

Specifies the standards or best practices to which the tagset used for the annotation conforms


lingualityType

Usage

Mandatory

Type

Closed controlled vocabulary

Controlled vocabulary reference and/or values

ms:lingualityType: monolingual, bilingual, multilingual

Definition/Explanations

Indicates whether the resource contains one, two or more languages

Recommended usage

Please select one of the values. Please note that the element concerns the language of the resource itself and not the language used for its description; for instance, a lexicon of English with definitions in both English and French is considered monolingual.
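The value of lingualityType follows from the number of languages of the resource contents, excluding the metalanguage. The sketch below derives it from a list of BCP 47 tags; comparing only the primary language subtag (so that en-US and en-GB count as one language) is an assumption, and linguality_type is a hypothetical helper name.

```python
def linguality_type(languages):
    """Derive lingualityType from the BCP 47 tags of the resource content.

    Only the primary language subtag is compared (an assumption), so en-US
    and en-GB count as one language. Metalanguages are not passed in.
    """
    primary = {tag.split("-")[0].lower() for tag in languages}
    if not primary:
        raise ValueError("at least one language is required")
    if len(primary) == 1:
        return "monolingual"
    if len(primary) == 2:
        return "bilingual"
    return "multilingual"


# The English lexicon with English and French definitions: only "en" is
# passed, because the definition languages are metalanguages.
print(linguality_type(["en"]))  # monolingual
```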


language

Usage

Mandatory

Type

Open controlled vocabulary

Controlled vocabulary reference and/or values

ms:language (a combination of languageId, scriptId, regionId and variantId according to the IETF BCP 47 guidelines)

Definition/Explanations

The language(s) of the resource according to the IETF BCP 47 guidelines.

Recommended usage

Please enter the language and, if needed, the region, script and variant identifier that best fit the language of the contents of the resource (e.g. en-US), according to the IETF BCP 47 guidelines; not to be confused with "metalanguage", which is used for the language used to describe the resource. For instance, a lexicon of English with definitions in English and French must be encoded with "language" "English" and two "metalanguage" values, "English" and "French". The element can be repeated for multiple languages.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:language; DataCite 4.0: skos:closeMatch datacite:Language


metalanguage

Usage

Recommended

Type

Open controlled vocabulary

Controlled vocabulary reference and/or values

ms:language (a combination of languageId, scriptId, regionId and variantId according to the IETF BCP 47 guidelines)

Definition/Explanations

The language(s) used to describe the contents of the resource (the "metalanguage"), according to the IETF BCP 47 guidelines.

Recommended usage

Please enter the language and, if needed, the region, script and variant identifier that best fit the language used to describe the resource (e.g. en-US), according to the IETF BCP 47 guidelines; not to be confused with "language", which is used for the language of the contents of the resource. For instance, a lexicon of English with definitions in English and French must be encoded with "language" "English" and two "metalanguage" values, "English" and "French". The element can be repeated for multiple languages.
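Both language and metalanguage take BCP 47 tags, so the English lexicon example becomes language = en and metalanguage = en, fr. The sketch below checks values against a simplified BCP 47 shape (language, optional script, optional region, optional variants); it deliberately ignores extensions, private-use and grandfathered tags, so it is an approximation rather than a full RFC 5646 validator, and the dict representation of the record is an assumption.

```python
import re

# Simplified BCP 47 shape: language[-script][-region][-variant]...
# Extensions, private-use and grandfathered tags are not handled.
BCP47_SIMPLE = re.compile(
    r"^[A-Za-z]{2,3}"                                   # language, e.g. en
    r"(?:-[A-Za-z]{4})?"                                # script, e.g. Latn
    r"(?:-(?:[A-Za-z]{2}|[0-9]{3}))?"                   # region, e.g. US, 419
    r"(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*$"  # variants, e.g. 1996
)


def invalid_language_tags(record):
    """Return language/metalanguage values that fail the simplified check."""
    tags = record.get("language", []) + record.get("metalanguage", [])
    return [tag for tag in tags if not BCP47_SIMPLE.match(tag)]


# The lexicon of English with definitions in English and French:
lexicon = {"language": ["en"], "metalanguage": ["en", "fr"]}
print(invalid_language_tags(lexicon))  # []
```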


size

Usage

Mandatory

Type

size pattern (size and sizeUnit)

Definition/Explanations

Provides information on the size of the resource or of resource parts.

Recommended usage

You may indicate the size of the lexical/conceptual resource by filling in the appropriate number and selecting the appropriate sizeUnit (e.g. 20000 words).

Relation to other metadata schemas

DCMI: skos:closeMatch dct:extent; DataCite 4.0: skos:closeMatch datacite:size
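The size pattern separates the number from its unit, so "20000 words" becomes size = 20000 and sizeUnit = words. The following sketch splits a human-written size into those two fields; the accepted input convention (digits, optional thousands commas, a single-word unit) and the Size class are assumptions for illustration only.

```python
import re
from dataclasses import dataclass


@dataclass
class Size:
    size: int
    size_unit: str


# Accepts "20000 words" or "20,000 entries" (an assumed input convention).
SIZE_RE = re.compile(r"^\s*(\d[\d,]*)\s+([A-Za-z]+)\s*$")


def parse_size(text):
    """Split a human-written size into the size and sizeUnit fields."""
    match = SIZE_RE.match(text)
    if match is None:
        raise ValueError(f"cannot parse size: {text!r}")
    return Size(int(match.group(1).replace(",", "")), match.group(2))


print(parse_size("20000 words"))  # Size(size=20000, size_unit='words')
```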


domain

Usage

Recommended

Type

free text

Attributes

ms:classificationSchemeName and ms:schemeURI

Definition/Explanations

Domain of the lexical/conceptual resource

Recommended usage

It is recommended that domain values are taken from an authoritative source, such as DDC (Dewey Decimal Classification, http://www.oclc.org/dewey/) or UDC (Universal Decimal Classification, http://www.udcc.org/), and that the source is identified; if you do so, please use the "classificationSchemeName" attribute to indicate the source; if the source is not included in the list of values, please use "schemeURI" with a link to a URL with more information on the scheme. The recommended way of adding the domain values is to use the identifier of the domain in the scheme; further instructions on the standardization of the format will be provided.

Relation to other metadata schemas

DCMI: skos:narrowMatch dct:subject; DataCite 4.0: skos:exactMatch datacite:Subject with datacite:subjectScheme, datacite:schemeURI and datacite:valueURI


characterEncoding

Usage

Recommended

Type

Open controlled vocabulary

Controlled vocabulary reference and/or values

ms:characterEncoding: a long list of popular character encodings

Definition/Explanations

The name of the character encoding used in the resource or accepted by the component

Recommended usage

Please select one of the pre-defined values; note, however, that for OpenMinTeD the preferred character encoding is UTF-8, to ensure interoperability between content and components.
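Since UTF-8 is the preferred encoding, a provider may want to confirm that a file really is UTF-8 before declaring it, and convert it otherwise. This is a minimal sketch using only Python's built-in codecs; is_utf8 and to_utf8 are hypothetical helper names, and the source encoding passed to to_utf8 must already be known. Note that a clean decode does not prove the declared encoding: pure ASCII content, for example, is valid in both Latin-1 and UTF-8.

```python
def is_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode content from a known source encoding into UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


latin1 = "Café".encode("latin-1")
print(is_utf8(latin1), is_utf8(to_utf8(latin1, "latin-1")))  # False True
```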


mimeType

Usage

Recommended

Type

Open controlled vocabulary

Controlled vocabulary reference and/or values

ms:mimetype (a subset of values, the most popular ones for text files, from the IANA mimetype controlled vocabulary): text/plain, application/vnd.xmi+xml, text/xml, application/x-tmx+xml, application/x-xces+xml, application/tei+xml, application/rdf+xml, application/xhtml+xml, application/emma+xml, application/pls+xml, application/postscript, application/voicexml+xml, text/sgml, text/html, application/x-tex, application/rtf, application/json+ld, application/x-latex, text/csv, text/tab-separated-values, application/pdf, application/x-msaccess, audio/mp4, audio/mpeg, audio/wav, image/bmp, image/gif, image/jpeg, image/png, image/svg+xml, image/tiff, video/jpeg, video/mp4, video/mpeg, video/x-flv, video/x-msvideo, video/x-ms-wmv, application/msword, application/vnd.ms-excel, audio/mpeg3, text/turtle, other, audio/PCMA, audio/flac, audio/speex, audio/vorbis, video/mp2t

Definition/Explanations

The mime-type of the resource (a formalized specifier for the format) or a mime-type that the component accepts, in conformance with the values of the IANA (Internet Assigned Numbers Authority)

Recommended usage

Please select one of the pre-defined values (which are the most popular ones for text files) or add a value, preferably from the IANA recommended media type values (http://www.iana.org/assignments/media-types/media-types.xhtml). The element can be repeated for resources that include files of various formats.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:format; DataCite 4.0: skos:closeMatch datacite:Format
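Because mimeType can be repeated, a resource made of several files gets one value per distinct format. The sketch below derives those values from file names with Python's standard mimetypes module, falling back to "other" when the guess is not in the recommended list; the RECOMMENDED set is only a small excerpt of the values above, and guessing from file extensions is an assumption (a file's extension may not match its actual content).

```python
import mimetypes

# A small excerpt of the recommended ms:mimetype values (not the full list).
RECOMMENDED = {"text/plain", "text/xml", "text/html", "text/csv", "application/pdf"}

# In CPython, a fresh MimeTypes instance is built from the built-in table
# only, so the result does not depend on system mime.types files.
_TYPES = mimetypes.MimeTypes()


def mime_type_for(filename):
    """Guess a mimeType value from the file extension, else 'other'."""
    guessed, _encoding = _TYPES.guess_type(filename)
    return guessed if guessed in RECOMMENDED else "other"


def mime_types_for(filenames):
    """One mimeType value per distinct format, as the element may repeat."""
    return sorted({mime_type_for(name) for name in filenames})


print(mime_types_for(["entries.txt", "readme.txt", "guide.pdf"]))
# ['application/pdf', 'text/plain']
```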


relationType

Usage

Recommended

Type

Open controlled vocabulary

Controlled vocabulary reference and/or values

ms:relationType: isPartOf, isPartWith, hasPart, hasOutcome, isCombinedWith, requiresLR, requiresSoftware, isexactMatch, isSimilarTo, isContinuationOf, isVersionOf, replaces, isReplacedWith, isCreatedBy, isElicitedBy, isRecordedBy, isEditedBy, isAnalysedBy, isEvaluatedBy, isQueriedBy, isAccessedBy, isArchivedBy, isDisplayedBy, isCompatibleWith

Definition/Explanations

Specifies the type of relation holding between two entities (e.g. two resources that together comprise one new resource, a corpus and the s/w component that has been used for its creation, or a corpus and the publication that describes it)

Recommended usage

For lexical/conceptual resources, the recommended relations are isVersionOf and requiresSoftware, but any relationType can be used as appropriate.

Relation to other metadata schemas

DataCite 4.0: skos:closeMatch datacite:relationType
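A relation is effectively a triple: relatedResource1 (source), relationType and relatedResource2 (target), where both resources become mandatory once relationType is filled in. The sketch below checks one such triple, assuming it is held as a plain dict; since the vocabulary is open, an unlisted relationType is reported as a warning-style message rather than treated as a hard error. The resource identifiers in the example are hypothetical.

```python
# The full ms:relationType list given above.
RELATION_TYPES = {
    "isPartOf", "isPartWith", "hasPart", "hasOutcome", "isCombinedWith",
    "requiresLR", "requiresSoftware", "isexactMatch", "isSimilarTo",
    "isContinuationOf", "isVersionOf", "replaces", "isReplacedWith",
    "isCreatedBy", "isElicitedBy", "isRecordedBy", "isEditedBy",
    "isAnalysedBy", "isEvaluatedBy", "isQueriedBy", "isAccessedBy",
    "isArchivedBy", "isDisplayedBy", "isCompatibleWith",
}


def check_relation(relation):
    """Check one relation given as a dict with relationType and both ends."""
    problems = []
    rel_type = relation.get("relationType")
    if not rel_type:
        return problems
    if rel_type not in RELATION_TYPES:
        problems.append(f"{rel_type!r} is not one of the predefined relationType values")
    for key in ("relatedResource1", "relatedResource2"):
        if not relation.get(key):
            problems.append(f"{key} is required when relationType is filled in")
    return problems


# Hypothetical identifiers: version 2 of a lexicon is a version of version 1.
print(check_relation({"relatedResource1": "lexicon-v2",
                      "relationType": "isVersionOf",
                      "relatedResource2": "lexicon-v1"}))  # []
```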


relatedResource1

Usage

Mandatory when applicable

Conditions for usage

when relationType is filled in

Type

ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)

Definition/Explanations

A name or an identifier (e.g. URL reference) of the source resource, which is related to the target resource (relatedResource2) through the relation described in relationType

Recommended usage

The recommended way of referring to a resource is by giving its identifier; if you provide the identifier, please also select the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide the name, at least in English; if you want to add names in other languages, you can use the "lang" attribute. For interoperability reasons, it is recommended to describe all related resources in the OpenMinTeD registry and refer to them through their identifier.


relatedResource2

Usage

Mandatory when applicable

Conditions for usage

when relationType is filled in

Type

ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)

Definition/Explanations

A name or an identifier (e.g. URL reference) of the target resource, which is related to the source resource (relatedResource1) through the relation described in relationType

Recommended usage

The recommended way of referring to a resource is by giving its identifier; if you provide the identifier, please also select the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide the name, at least in English; if you want to add names in other languages, you can use the "lang" attribute. For interoperability reasons, it is recommended to describe all related resources in the OpenMinTeD registry and refer to them through their identifier.


Recommended schema for models


OMTD-SHARE element: Usage (M = mandatory, R = recommended)

resourceType: M
resourceName: M
description: M
identifier: M
version: M
distributionMedium: M
licence or rightsStmtName & rightsStmtURL (one of the two must be provided): M
versionoflicence: M
downloadURL: M when applicable
contactEmail or landingPage (one of the two must be provided): M
contactPerson (identifier or personName): R
contactGroup (identifier or organizationName): R
mustBeCitedWith: R
resourceCreator (person or organization, described with identifier or name): R
variantName: M
tagset: R
typesystem: R
algorithm: R
trainingCorpusDetails: R
mediaType: M
lingualityType: M
multilingualityType: M when applicable
language: M
size: M
relationType = isCompatibleWith (external relation between models and components that can use them): R


resourceType

Usage

Mandatory

Type

Closed controlled vocabulary

Attributes

Controlled vocabulary reference and/or values

ms:resourceType: corpus, lexicalConceptualResource, languageDescription, model, component

Definition/Explanations

Specifies the type of the resource being described, or the type of the resource that a tool or service takes as input or produces as output

Recommended usage

For models, the fixed value "model" must be added automatically.

Relation to other metadata schemas

DCMI: skos:narrowMatch dct:type; DataCite 4.0: skos:closeMatch datacite:resourceTypeGeneral & datacite:resourceType; the recommended usage for models is to use "model", but the value "dataset" can also be used


resourceName

Usage

Mandatory

Type

Multilingual free text

Attributes

xs:lang

Definition/Explanations

The full name by which the resource is known

Recommended usage

Please provide a short but descriptive and unique name for the resource, e.g. “OpenNLP POS tagger model for English” instead of just “model for English POS tags”. Provide the name in English; if you want to add names in other languages, you can use the “lang” attribute.

Relation to other metadata schemas

DCMI: skos:exactMatch dct:title; DataCite 4.0: skos:exactMatch datacite:title


identifier

Usage

Mandatory

Type

free text

Attributes

ms:resourceIdentifierSchemeName or ms:schemeURI

Definition/Explanations

Reference to a PID, DOI or any kind of identifier used by the resource provider for the resource

Recommended usage

Provide a unique identifier already assigned by an authoritative source. Use the attribute "resourceIdentifierSchemeName" to specify the scheme by selecting one of the pre-defined values (e.g. DOI, HDL, ISLRN etc.) or, if the scheme is not listed among them, select the "other" value and use the attribute "schemeURI" to provide a link to the URL that documents the scheme it adheres to. If the resource doesn't have a unique identifier, an identifier will be assigned by OpenMinTeD.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:identifier; DataCite 4.0: skos:broadMatch datacite:identifier (identifierType can only be DOI)


description

Usage

Mandatory

Type

Multilingual free text

Attributes

xs:lang

Definition/Explanations

Provides the description of the resource in prose

Recommended usage

Give a brief yet informative description of the model, e.g. the language(s) it applies to, the corpus it has been trained on, the theoretical approaches used etc. Please provide the text in English; if you want to add texts in other languages, you can add them using the “lang” attribute to specify the language.

Relation to other metadata schemas

DCMI: skos:exactMatch dct:abstract; DataCite 4.0: skos:exactMatch datacite:description with value "abstract" for datacite:descriptionType


version

Usage

Recommended

Type

free text

Definition/Explanations

Any string, usually a number, that identifies the version of a resource

Recommended usage

Please use this element only for versions of the same resource (e.g. corrected, enlarged etc.) and not for variants or for versions with additional or different information. Versioning should follow the semantic versioning guidelines (http://semver.org/).

Relation to other metadata schemas

DCMI: skos:exactMatch dct:hasVersion; DataCite 4.0: skos:exactMatch datacite:Version
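To check that a version string follows semantic versioning, a provider can use the regular expression suggested on semver.org for SemVer 2.0.0. The sketch below wraps it in a hypothetical parse_version helper that returns the (major, minor, patch) numbers; pre-release and build metadata are accepted but not returned.

```python
import re

# Regular expression suggested on semver.org for SemVer 2.0.0 strings.
SEMVER_RE = re.compile(
    r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    r"(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$"
)


def parse_version(value):
    """Return (major, minor, patch) for a SemVer string, else raise."""
    match = SEMVER_RE.match(value)
    if match is None:
        raise ValueError(f"not a semantic version: {value!r}")
    return tuple(int(match.group(i)) for i in (1, 2, 3))


print(parse_version("2.1.3-beta.1"))  # (2, 1, 3)
```

Values such as "1.0" or "01.0.0" are rejected, so a provider using two-part version numbers would need to add a patch number.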


licence

Usage

Mandatory under conditions

Conditions for usage

either licence or rightsStmt must be filled in

Type

Open controlled vocabulary

Controlled vocabulary reference and/or values

ms:licence: CC-BY, CC-BY-NC, CC-BY-NC-ND, CC-BY-NC-SA, CC-BY-ND, CC-BY-SA, CC-ZERO, PDDL, ODC-BY, ODbL, MS-NoReD, MS-NoReD-FF, MS-NoReD-ND, MS-NoReD-ND-FF, MS-NC-NoReD, MS-NC-NoReD-FF, MS-NC-NoReD-ND, MS-NC-NoReD-ND-FF, ELRA_END_USER, ELRA_EVALUATION, ELRA_VAR, CLARIN_PUB, CLARIN_ACA, CLARIN_ACA-NC, CLARIN_RES, AGPL, ApacheLicence_2.0, BSD_4-clause, BSD_3-clause, FreeBSD, GFDL, GPL, LGPL, MIT, Princeton_Wordnet, proprietary, underNegotiation, nonStandardLicenceTerms

Definition/Explanations

The licence of use for the resource

Recommended usage

You can provide information on the rights of accessing and using a resource in one of the following ways, in order of preference:

use the element "licence" and select one of the recommended licences; please note that the list contains licences intended for data resources and for components mixed together; for components the recommended licences are the Open Source licences; for data resources, please use a standard licence such as one of the CC family; if the licence you use is not included in the list, you can use the "nonStandardLicenceTerms" or the "proprietary" values and give further information on your licence in the elements "nonStandardLicenceName", "nonStandardLicenceTermsURL" and "nonStandardLicenceTermsText";

use the "rightsStmtName" and "rightsStmtURL" elements (the latter with a link to a URL with more explanations on its usage) if the resource is provided with a general statement of use and not an official licence document; please note that this option is intended mainly to facilitate end-users in accessing your resource, and you are strongly advised to properly license your resource.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:license; DataCite 4.0: skos:closeMatch datacite:rights


rightsStmtName

Usage

Mandatory under conditions

Conditions for usage

either licence or rightsStmt must be filled in

Type

Open controlled vocabulary

Controlled vocabulary reference and/or values

ms-omtd:rightsStmtName: openAccess, closedAccess, embargoedAccess, restrictedAccess

Definition/Explanations

The name of an official statement indicative of licensing terms for the use of a resource (e.g. open access, free to read etc.); its semantics should be clear, preferably formally expressed and stored at a URL.

The current list of predefined values comes from OpenAIRE, but it is under revision.

Recommended usage

The "rightsStmtName" and "rightsStmtURL" elements can be used in addition to the "licence" value to help users understand the licensing terms of a resource.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:accessRights; DataCite 4.0: skos:closeMatch datacite:rights


rightsStmtURL

Usage

Mandatory under conditions

Conditions for usage

either licence or rightsStmt must be filled in

Type

URL pattern

Definition/Explanations

Link to the URL with the text that formally explains the licensing conditions imposed by the rights statement.

Recommended usage

The "rightsStmtName" and "rightsStmtURL" elements can be used in addition to the "licence" value to help users understand the licensing terms of a resource.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:accessRights; DataCite 4.0: skos:closeMatch datacite:rightsURI


nonStandardLicenceName

Usage

Mandatory under conditions

Conditions for usage

to be used with ms:licence other (i.e. for licences not included in the list of recommended ones)

Type

free text

Definition/Explanations

The name by which a licence is known; to be used for licences not included in the pre-defined list of recommended licences

Recommended usage

Please provide the name of the licence if it is already known, or supply one that can uniquely identify it.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:title (for dct:licenseDocument)


nonStandardLicenceTermsURL

Usage

Mandatory under conditions

Conditions for usage

To be used with ms:licence "other" (i.e. for licences not included in the list of recommended ones)

Type

URL pattern

Definition/Explanations

Used to provide a hyperlink to a URL containing the text of a licence not included in the predefined list, or describing the terms of use for a language resource or the terms of service for web services

Recommended usage

Please provide the link to the full-text document of the licence. Note that this is preferred over inserting the licence text in the element "nonStandardLicenceTermsText", as it provides a permanent location for the licence that is accessible to all.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:license
DataCite 4.0: skos:closeMatch datacite:rightsURI
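The two conditional elements above only come into play when the licence value is "other". A minimal sketch of that check follows, assuming a flat dict record and reading the recommendation above as "a terms URL is preferred, but a terms text is accepted when no URL is given"; both readings are assumptions, and the licence names are made up.

```python
# Minimal sketch of the conditions for licences outside the recommended list.
# ASSUMPTIONS: the record is a flat dict; "nonStandardLicenceTermsText" is
# accepted when no URL is given, the URL being the preferred option.
from urllib.parse import urlparse

def _is_http_url(value) -> bool:
    """True for absolute http(s) URLs with a host part."""
    if not isinstance(value, str):
        return False
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)

def non_standard_licence_errors(record: dict) -> list:
    """Return a list of problems; an empty list means the conditions are met."""
    if record.get("licence") != "other":
        return []  # the conditions only apply when licence = "other"
    errors = []
    if not str(record.get("nonStandardLicenceName") or "").strip():
        errors.append("nonStandardLicenceName is missing")
    url = record.get("nonStandardLicenceTermsURL")
    text = str(record.get("nonStandardLicenceTermsText") or "").strip()
    if url is not None and not _is_http_url(url):
        errors.append("nonStandardLicenceTermsURL is not an http(s) URL")
    elif url is None and not text:
        errors.append("no licence terms URL (preferred) or terms text given")
    return errors
```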


version of licence

Usage

Mandatory

Type

Free text

Definition/Explanations

The version of the licence

Recommended usage

You are advised to indicate the version of the licence of your resource; the latest version is the preferred option, e.g. "4.0" for all CC licences and "2.0" for the META-SHARE-NoReD ones.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:hasVersion (for dct:licenseDocument)


distributionMedium

Usage

Mandatory

Type

Open controlled vocabulary

Controlled vocabulary reference and/or values

ms:distributionMedium: webExecutable, paperCopy, hardDisk, bluRay, DVD-R, CD-ROM, downloadable, accessibleThroughInterface, other

Definition/Explanations

Specifies the medium (channel) used for delivering, or providing access to, the resource

Recommended usage

Please use one of the provided values to indicate the medium of distribution. For models, the expected value is "downloadable". Note that if the model is distributed in different media and/or under different terms of use or licences, you can repeat the whole set of elements to describe them.


downloadURL

Usage

Mandatory under conditions

Conditions for usage

If distributionMedium = downloadable

Type

URL pattern

Definition/Explanations

Any URL from which the resource can be downloaded

Recommended usage

Please indicate where the model can be downloaded; this element is of particular importance if you have not uploaded the resource to the repository.
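The interplay between distributionMedium and downloadURL can be sketched as a small check over one distribution set of elements. This is a hedged illustration: the dict representation is assumed, and because the vocabulary is open, values outside the predefined list are reported as warnings rather than errors (one plausible treatment, not a documented OpenMinTeD rule).

```python
# Sketch: validate one distribution set of elements.
# ASSUMPTIONS: dict representation; unknown media are warnings, not errors,
# because ms:distributionMedium is an open vocabulary.
DISTRIBUTION_MEDIA = {
    "webExecutable", "paperCopy", "hardDisk", "bluRay", "DVD-R", "CD-ROM",
    "downloadable", "accessibleThroughInterface", "other",
}

def check_distribution(distribution: dict) -> tuple:
    """Return (errors, warnings) for one distribution set of elements."""
    errors, warnings = [], []
    medium = distribution.get("distributionMedium")
    if not medium:
        errors.append("distributionMedium is mandatory")
    elif medium not in DISTRIBUTION_MEDIA:
        warnings.append(f"distributionMedium {medium!r} is not a predefined value")
    if medium == "downloadable" and not distribution.get("downloadURL"):
        errors.append("downloadURL is mandatory when distributionMedium = downloadable")
    return errors, warnings
```

Since the whole set of elements may be repeated, a record with several distributions would simply call this once per set.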


contactEmail

Usage

Mandatory under conditions

Conditions for usage

An email or a landing page must be provided

Type

Email pattern

Definition/Explanations

A general email address that can be used as a contact point for a resource

Recommended usage

You can indicate a contact point where users can request further information in one of the following ways, in order of preference:

- give a general email address in the "contactEmail" element, or
- provide in "landingPage" the link to a web page that documents the resource (e.g. a page with documentation, examples and links to the resource itself).

You can also indicate the person(s) or group(s) responsible for communication in the "contactPerson" and "contactGroup" elements.


landingPage

Usage

Mandatory under conditions

Conditions for usage

An email or a landing page must be provided

Type

URL pattern

Definition/Explanations

A URL used as the landing page of a resource, providing general information; for instance, it may present a description of the resource and its creators, and possibly include links to the URL from which it can be accessed

Recommended usage

You can indicate a contact point where users can request further information in one of the following ways, in order of preference:

- give a general email address in the "contactEmail" element, or
- provide in "landingPage" the link to a web page that documents the resource (e.g. a page with documentation, examples and links to the resource itself).

You can also indicate the person(s) or group(s) responsible for communication in the "contactPerson" and "contactGroup" elements.
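The shared condition of contactEmail and landingPage ("an email or a landing page must be provided") reduces to a simple disjunction. In the sketch below the email pattern is a deliberately loose assumption (one "@", no spaces, a dot in the domain), not the email pattern defined by OMTD-SHARE, and the record is again assumed to be a flat dict.

```python
# Sketch of the rule "an email or a landing page must be provided".
# ASSUMPTION: the loose email pattern below is illustrative and is not the
# pattern defined by the OMTD-SHARE schema.
import re
from urllib.parse import urlparse

_EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def has_contact_point(record: dict) -> bool:
    email = record.get("contactEmail") or ""
    page = record.get("landingPage") or ""
    email_ok = bool(_EMAIL.match(email))
    parsed = urlparse(page)
    page_ok = parsed.scheme in ("http", "https") and bool(parsed.netloc)
    return email_ok or page_ok
```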


contactPerson (identifier or personName)

Usage

Recommended

Type

Identifier or multilingual free text

Attributes

ms:personIdentifierSchemeName (for identifiers) or xs:lang (for names)

Definition/Explanations

Groups information on the person(s) responsible for providing further information regarding the resource

Recommended usage

The recommended way of referring to a person is by giving their identifier, preferably their ORCID; if you provide the identifier, please also select the relevant value from the list of values in the attribute "personIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the person, you may provide their name, preferably in the format "Surname, Firstname", at least in English; if you want to add names in other languages, you can use the "lang" attribute. The element can also be repeated to encode multiple persons. If you decide to add a contactPerson instead of a general contactEmail, please ensure that the data (including the email) of this person are also uploaded in OpenMinTeD.

Relation to other metadata schemas

DataCite 4.0: contributor with datacite:contributorType="ContactPerson" (datacite:contributorName (familyName & givenName), or datacite:nameIdentifier and datacite:nameIdentifierScheme and datacite:schemeURI)
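Because ORCID is the preferred person identifier, a quick local sanity check can catch typos before the record is submitted. ORCID iDs end in a check character computed with the ISO 7064 MOD 11-2 algorithm; the sketch below implements that check for the bare 16-character form only (it does not strip "https://orcid.org/" prefixes, and it cannot tell whether an iD is actually registered).

```python
# Sketch: validate the form and check character of an ORCID iD
# (ISO 7064 MOD 11-2). Only the bare form "0000-0002-1825-0097" is handled.
import re

_ORCID = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")

def orcid_check_char(base_digits: str) -> str:
    """Compute the check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def is_valid_orcid(orcid: str) -> bool:
    if not _ORCID.match(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_char(digits[:15]) == digits[15]
```

The same check applies to persons listed under resourceCreator.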


contactGroup (identifier or organizationName)

Usage

Recommended

Type

Identifier or multilingual free text

Attributes

ms:organizationIdentifierSchemeName (for identifiers) or xs:lang (for names)

Definition/Explanations

Groups information on the group(s) responsible for providing further information regarding the resource

Recommended usage

The recommended way of referring to a group (currently modelled as an organization) is by giving its identifier (e.g. ISNI, FundRef); if you provide the identifier, please also select the relevant value from the list of values in the attribute "organizationIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the group (organization), you may provide its name, at least in English; if you want to add names in other languages, you can use the "lang" attribute. If you decide to add a contactGroup instead of another contact option, please ensure that the data (including the communication data) of this group (organization) are also uploaded in OpenMinTeD.


mustBeCitedWith

Usage

Recommended

Type

Free text or identifier

Definition/Explanations

Publication to be used for citation purposes as requested by the resource providers (usually a scientific article that describes the resource)

Recommended usage

The preferred option for referring to a publication is by providing its unique identifier, already assigned by an authoritative source; the preferred identifier for publications is the DOI. When you provide an identifier, use the attribute "publicationIdentifierSchemeName" to specify the scheme by selecting one of the pre-defined values (e.g. DOI, ISBN, etc.); if the scheme is not listed among them, select the "other" value and use the attribute "schemeURI" to provide a link to the URL that documents the scheme it adheres to. If you don't know the publication identifier, you can provide the full bibliographic record in free-text format. N.B. The citation publication should not be confused with the attribution data, which is a legal obligation; citation through publications is a common practice in research.
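The choice between an identifier and a free-text bibliographic record can be automated when the provider pastes whatever they have. In this sketch a DOI is recognised by a simple pattern (optional "https://doi.org/" or "doi:" prefix, then "10.", a registrant code, "/" and a suffix); the pattern is a heuristic, and the output dict keys (including "freeText") are illustrative assumptions rather than the exact OMTD-SHARE element structure.

```python
# Sketch: turn a user-supplied citation into either a DOI identifier with its
# scheme name, or a free-text bibliographic record.
# ASSUMPTIONS: heuristic DOI pattern; dict keys are illustrative only.
import re

_DOI = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:)?(10\.\d{4,9}/\S+)$",
                  re.IGNORECASE)

def citation_entry(value: str) -> dict:
    match = _DOI.match(value.strip())
    if match:
        # identifier form: keep the bare DOI and declare the scheme
        return {"publicationIdentifier": match.group(1),
                "publicationIdentifierSchemeName": "DOI"}
    # fall back to the full bibliographic record as free text
    return {"freeText": value.strip()}
```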


resourceCreator (person or organization, described with identifier or name)

Usage

Recommended

Type

Identifier or multilingual free text

Attributes

For person: ms:personIdentifierSchemeName (for identifiers) or xs:lang (for names); for organization: ms:organizationIdentifierSchemeName (for identifiers) or xs:lang (for names)

Definition/Explanations

Groups information on the person(s) or organization(s) that has/have created the resource

Recommended usage

The recommended way of referring to a person is by giving their identifier, preferably their ORCID; if you provide the identifier, please also select the relevant value from the list of values in the attribute "personIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the person, you may provide their name, preferably in the format "Surname, Firstname", at least in English; if you want to add names in other languages, you can use the "lang" attribute. The recommended way of referring to an organization is by giving its identifier (e.g. ISNI); if you provide the identifier, please also select the relevant value from the list of values in the attribute "organizationIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the organization, you may provide its name, at least in English; if you want to add names in other languages, you can use the "lang" attribute. The element can also be repeated to encode multiple persons/organizations. For corpora created through the OMTD corpus building process, the resource creator is considered to be the person who has put together the corpus through the user query.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:creator
DataCite 4.0: skos:closeMatch datacite:Creator with datacite:creatorName (familyName & givenName) or datacite:nameIdentifier & datacite:nameIdentifierScheme & datacite:schemeURI


variantName

Usage

Mandatory

Type

Free text

Definition/Explanations

Variant name used for the model


tagset

Usage

Recommended

Type

Identifier or multilingual free text

Attributes

ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)

Definition/Explanations

A name or an identifier (e.g. URL reference) of the tagset used in the annotation of the resource or used by the component

Recommended usage

The recommended way of referring to a resource is by giving its identifier; if you provide the identifier, please also select the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide its name, at least in English; if you want to add names in other languages, you can use the "lang" attribute. For interoperability reasons, it is recommended to describe type systems, tagsets, annotation resources, etc. in the OpenMinTeD registry and refer to them through their identifiers.


typesystem

Usage

Recommended

Type

Identifier or multilingual free text

Attributes

ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)

Definition/Explanations

A name or an identifier (e.g. URL reference) of the type system used in the annotation of the resource or used by the component

Recommended usage

The recommended way of referring to a resource is by giving its identifier; if you provide the identifier, please also select the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide its name, at least in English; if you want to add names in other languages, you can use the "lang" attribute. For interoperability reasons, it is recommended to describe type systems, tagsets, annotation resources, etc. in the OpenMinTeD registry and refer to them through their identifiers.


algorithm

Usage

Recommended

Type

Free text

Definition/Explanations

Training algorithm used for the model (e.g. maximum entropy, SVM, etc.)

Recommended usage

Please provide the name of the algorithm, not details about it.


trainingCorpusDetails

Usage

Recommended

Type

Free text

Definition/Explanations

Detailed description of the training corpus (e.g. size, number of features, etc.)


mediaType

Usage

Mandatory

Type

Closed controlled vocabulary

Controlled vocabulary reference and/or values

ms:mediaType: text, audio, video, image

Definition/Explanations

Specifies the media type of the resource and basically corresponds to the physical medium of the content representation. Each media type is described through a distinctive set of features. A resource may consist of parts attributed to different types of media. A component may take more than one media type as input/output.

Recommended usage

OpenMinTeD only handles text resources, so the default value is set to "text".


lingualityType

Usage

Mandatory

Type

Closed controlled vocabulary

Controlled vocabulary reference and/or values

ms:lingualityType: monolingual, bilingual, multilingual

Definition/Explanations

Indicates whether the resource contains one, two or more languages

Recommended usage

Please select one of the values.


language

Usage

Mandatory

Type

Open controlled vocabulary

Controlled vocabulary reference and/or values

ms:language (a combination of languageId, scriptId, regionId and variantId according to the IETF BCP 47 guidelines)

Definition/Explanations

The language(s) for which the model has been trained, expressed according to the IETF BCP 47 guidelines.

Recommended usage

Please enter the language and, if needed, the region, script and variant identifiers that best fit the language the model can be used for (e.g. en-US), according to the IETF BCP 47 guidelines. The element can be repeated to encode multiple languages.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:language
DataCite 4.0: skos:closeMatch datacite:Language
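A language value combines up to four BCP 47 subtags (language, script, region, variant). The sketch below splits a tag into those parts and rejects anything outside that subset. It is a simplification under stated assumptions: it expects the conventional letter case (e.g. "sr-Latn-RS"), does not handle extensions, private-use or grandfathered tags, and does not look subtags up in the IANA Language Subtag Registry.

```python
# Sketch: simplified parser for the language-script-region-variant subset of
# BCP 47 used by ms:language. Extensions, private-use and grandfathered tags
# are NOT covered; subtags are not checked against the IANA registry.
import re

_TAG = re.compile(
    r"^(?P<language>[a-z]{2,3})"             # ISO 639 language, e.g. "en"
    r"(?:-(?P<script>[A-Z][a-z]{3}))?"        # ISO 15924 script, e.g. "Latn"
    r"(?:-(?P<region>[A-Z]{2}|\d{3}))?"       # ISO 3166-1 or UN M.49 region
    r"(?P<variants>(?:-(?:[a-z\d]{5,8}|\d[a-z\d]{3}))*)$"  # variant subtags
)

def parse_language_tag(tag: str):
    """Return a dict of subtags, or None if the tag is outside this subset."""
    m = _TAG.match(tag)
    if m is None:
        return None
    parts = m.groupdict()
    parts["variants"] = [v for v in parts["variants"].split("-") if v]
    return parts
```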


size

Usage

Mandatory

Type

Size pattern (size and sizeUnit)

Definition/Explanations

Provides information on the size of the resource or of resource parts.

Recommended usage

You may indicate the size of the entire model by filling in the appropriate number and selecting the appropriate sizeUnit (e.g. 20000 words). The preferred sizeUnit is words or sentences. If nothing else is known, please indicate at least the number of files.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:extent
DataCite 4.0: skos:closeMatch datacite:size


mimeType

Usage

Recommended

Type

Open controlled vocabulary

Controlled vocabulary reference and/or values

ms:mimetype (a subset of values, the most popular ones for text files, from the IANA MIME type controlled vocabulary): text/plain, application/vnd.xmi+xml, text/xml, application/x-tmx+xml, application/x-xces+xml, application/tei+xml, application/rdf+xml, application/xhtml+xml, application/emma+xml, application/pls+xml, application/postscript, application/voicexml+xml, text/sgml, text/html, application/x-tex, application/rtf, application/json+ld, application/x-latex, text/csv, text/tab-separated-values, application/pdf, application/x-msaccess, audio/mp4, audio/mpeg, audio/wav, image/bmp, image/gif, image/jpeg, image/png, image/svg+xml, image/tiff, video/jpeg, video/mp4, video/mpeg, video/x-flv, video/x-msvideo, video/x-ms-wmv, application/msword, application/vnd.ms-excel, audio/mpeg3, text/turtle, other, audio/PCMA, audio/flac, audio/speex, audio/vorbis, video/mp2t

Definition/Explanations

The MIME type of the resource (a formalized specifier for the format) or a MIME type that the component accepts, in conformance with the values of the IANA (Internet Assigned Numbers Authority)

Recommended usage

Please select one of the pre-defined values (which are the most popular ones for text files) or add a value, preferably from the IANA media type recommended values (http://www.iana.org/assignments/media-types/media-types.xhtml). The element can be repeated for corpora that include files of various formats.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:format
DataCite 4.0: skos:closeMatch datacite:Format
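When a value outside the predefined list is added, it should at least have the "type/subtype" shape of an IANA media type. The sketch below checks only that shape, following the restricted-name syntax of RFC 6838 and a fixed set of top-level types; it does not confirm that the value is actually registered with IANA.

```python
# Sketch: syntactic check for an added mimeType value, following the
# "type/subtype" restricted-name syntax of RFC 6838. It does NOT check that
# the value is registered with IANA.
import re

_NAME = r"[A-Za-z0-9][A-Za-z0-9!#$&^_.+-]{0,126}"
_MEDIA_TYPE = re.compile(rf"^({_NAME})/({_NAME})$")
_TOP_LEVEL = {"application", "audio", "example", "font", "image",
              "message", "model", "multipart", "text", "video"}

def is_well_formed_mime_type(value: str) -> bool:
    m = _MEDIA_TYPE.match(value.strip())
    return bool(m) and m.group(1).lower() in _TOP_LEVEL
```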


characterEncoding

Usage

Recommended

Type

Open controlled vocabulary

Controlled vocabulary reference and/or values

ms:characterEncoding: a long list of popular character encodings

Definition/Explanations

The name of the character encoding used in the resource or accepted by the component

Recommended usage

Please select one of the pre-defined values; note, however, that for OpenMinTeD the preferred character encoding is UTF-8, to ensure interoperability between content and components.
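Before declaring characterEncoding as UTF-8, it is worth confirming that the files really decode as UTF-8. A minimal sketch follows; the directory layout and the default "*.txt" pattern are assumptions for illustration, and a successful decode shows the bytes are valid UTF-8 rather than proving the author intended that encoding.

```python
# Sketch: find files that are not valid UTF-8 before declaring
# characterEncoding = UTF-8. ASSUMPTION: resource files live under one root
# directory and match a glob pattern (default "*.txt").
from pathlib import Path

def is_utf8(data: bytes) -> bool:
    """True if the bytes decode as UTF-8 without errors."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

def non_utf8_files(root: str, pattern: str = "*.txt") -> list:
    """Return the paths (relative to root) of files that are not valid UTF-8."""
    return [str(p.relative_to(root)) for p in sorted(Path(root).rglob(pattern))
            if p.is_file() and not is_utf8(p.read_bytes())]
```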


relationType

Usage

Recommended

Type

Open controlled vocabulary

Controlled vocabulary reference and/or values

ms:relationType: isPartOf, isPartWith, hasPart, hasOutcome, isCombinedWith, requiresLR, requiresSoftware, isexactMatch, isSimilarTo, isContinuationOf, isVersionOf, replaces, isReplacedWith, isCreatedBy, isElicitedBy, isRecordedBy, isEditedBy, isAnalysedBy, isEvaluatedBy, isQueriedBy, isAccessedBy, isArchivedBy, isDisplayedBy, isCompatibleWith

Definition/Explanations

Specifies the type of relation holding between two entities (e.g. two resources that together comprise one new resource, a corpus and the s/w component that has been used for its creation, or a corpus and the publication that describes it)

Recommended usage

For models, the recommended relation is isCompatibleWith, holding with software components, but any relationType can be used as appropriate.

Relation to other metadata schemas

DataCite 4.0: skos:closeMatch datacite:relationType


relatedResource1

Usage

Mandatory when applicable

Conditions for usage

When relationType is filled in

Type

ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)

Definition/Explanations

A name or an identifier (e.g. URL reference) of the source resource, related to the target resource (relatedResource2) through the relation described in relationType

Recommended usage

The recommended way of referring to a resource is by giving its identifier; if you provide the identifier, please also select the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide its name, at least in English; if you want to add names in other languages, you can use the "lang" attribute. For interoperability reasons, it is recommended to describe all related resources in the OpenMinTeD registry and refer to them through their identifiers.


relatedResource2

Usage

Mandatory when applicable

Conditions for usage

When relationType is filled in

Type

ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)

Definition/Explanations

A name or an identifier (e.g. URL reference) of the target resource, related to the source resource (relatedResource1) through the relation described in relationType

Recommended usage

The recommended way of referring to a resource is by giving its identifier; if you provide the identifier, please also select the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide its name, at least in English; if you want to add names in other languages, you can use the "lang" attribute. For interoperability reasons, it is recommended to describe all related resources in the OpenMinTeD registry and refer to them through their identifiers.
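The three relation elements form a directed triple: relatedResource1, then relationType, then relatedResource2, where both resources become mandatory once relationType is filled in. The sketch below models that triple; the dataclass itself and the identifier strings used in examples (such as "omtd:model-123") are hypothetical, and values outside the listed vocabulary are reported rather than rejected because the vocabulary is open.

```python
# Sketch of a relation triple: relatedResource1 --relationType--> relatedResource2.
# ASSUMPTIONS: resources are identifier strings; this dataclass is illustrative
# and not part of OMTD-SHARE.
from dataclasses import dataclass
from typing import Optional

RELATION_TYPES = {
    "isPartOf", "isPartWith", "hasPart", "hasOutcome", "isCombinedWith",
    "requiresLR", "requiresSoftware", "isexactMatch", "isSimilarTo",
    "isContinuationOf", "isVersionOf", "replaces", "isReplacedWith",
    "isCreatedBy", "isElicitedBy", "isRecordedBy", "isEditedBy",
    "isAnalysedBy", "isEvaluatedBy", "isQueriedBy", "isAccessedBy",
    "isArchivedBy", "isDisplayedBy", "isCompatibleWith",
}

@dataclass
class Relation:
    relationType: Optional[str] = None
    relatedResource1: Optional[str] = None  # source resource
    relatedResource2: Optional[str] = None  # target resource

    def missing_fields(self) -> list:
        """Related resources that are mandatory but absent."""
        if not self.relationType:
            return []  # nothing is required when no relation is declared
        return [name for name in ("relatedResource1", "relatedResource2")
                if not getattr(self, name)]

    def uses_predefined_type(self) -> bool:
        """False for values outside the list (allowed: the vocabulary is open)."""
        return self.relationType in RELATION_TYPES
```

For a model, Relation("isCompatibleWith", "omtd:model-123", "omtd:component-456") would express the recommended link to a compatible software component.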


Guidelines for providers of software resources

Introduction
Instructions for providers of software components
Recommended ancillary knowledge resources
Recommended metadata schema for software resources


Introduction

OpenMinTeD targets scholarly researchers who are agnostic to software details and peculiarities, as well as TDM developers. It therefore allows the registration of:

- applications, which can be used as-is to perform TDM operations on content resources, and
- software components, i.e. pieces of software that can, by means of the OpenMinTeD Workflow Editor, be put together and tuned with various ancillary resources in order to create workflows that will be delivered to the end-users and/or further integrated into other workflows.

All of these will be made available to researchers in a way that will not require any kind of expertise from them, either as locally downloadable and executable tools or as web services.

The OpenMinTeD platform, at the current stage, supports the integration of software components wrapped for the GATE or UIMA/uimaFIT frameworks.

To be fully compatible with OpenMinTeD, you must provide:

- a metadata record compliant with the OMTD-SHARE schema, at least at the minimal level (which you can upload to the Registry as an XML file and/or edit with the OpenMinTeD metadata editor), and
- the software in an executable form, by uploading it in a compressed file or providing a link to a URL location from which it can be directly accessed (i.e. not a landing page).


Instructions for providers of software components

How to register your components
How to make your components interoperable
How to document your components
Guide for deploying UIMA components in the Argo platform


How to register your components

The recommended way of providing software components is through the Maven Central repository, according to the following instructions:

1. Put together in a single folder (in the form that is required by the technologies/frameworks used):
   - all files that implement the component (e.g. Java classes, etc.);
   - the licence text(s), preferably named "LICENCE.TXT" in order to be unambiguously recognised; in the case of multiple licences, they should all be aggregated in the same file;
   - a readme notice that describes the contents of the folder, as well as any important notice for the compilation and execution of the component;
   - all descriptors (UIMA/uimaFIT, GATE CREOLE [1], OMTD-SHARE, etc.) available for the component according to the implementation framework;
   - a Maven POM XML file.
2. Pack them as a JAR using the respective Maven plugin.
3. Upload them to the Maven repository according to the Maven guidelines.
4. Finally, submit the Maven coordinates in the OMTD registry; in this case, the metadata record will be partially converted from the Maven POM file and, potentially, from elements included in the metadata descriptors supported by OpenMinTeD (UIMA/uimaFIT, CREOLE), and you can then enrich it using the OpenMinTeD editor.
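A quick pre-packaging check can confirm that the folder assembled in step 1 contains the expected files before it is packed as a JAR. In this sketch, "LICENCE.TXT" comes from the guideline and "pom.xml" is Maven's conventional POM file name; accepting any file whose name starts with "README" as the readme notice is an assumption, since no file name is prescribed for it. Descriptors and implementation files are not checked.

```python
# Sketch: report files from step 1 that are missing from a component folder.
# ASSUMPTION: any file named "README*" (case-insensitive) counts as the readme
# notice; descriptors and implementation files are not checked here.
from pathlib import Path

def missing_from_listing(names) -> list:
    """Check a collection of file names for the required packaging files."""
    names = set(names)
    missing = [required for required in ("LICENCE.TXT", "pom.xml")
               if required not in names]
    if not any(name.upper().startswith("README") for name in names):
        missing.append("README")
    return missing

def missing_packaging_files(folder: str) -> list:
    """Apply the check to the top-level files of an actual folder."""
    return missing_from_listing(p.name for p in Path(folder).iterdir() if p.is_file())
```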

[1] Details of GATE descriptors can be found at https://gate.ac.uk/userguide/sec:creole-model:config, although they do not currently contain much (if any) of the information needed to complete the OMTD-SHARE metadata descriptor. Note that this is changing to include more OpenMinTeD-like information, much of which will be specified in a Maven POM rather than as CREOLE metadata. This is currently not documented, as it relates to the next version of GATE, which is still under active development.


How to make your components interoperable

In addition, if you want to be fully compliant with the OpenMinTeD interoperability requirements, please ensure that you adopt the following rules; if you fail to abide by them, it might still be possible to operate your software resources via the OpenMinTeD platform, but this cannot be guaranteed and interoperability with other resources will suffer.

- Please keep ancillary knowledge resources, e.g. models, annotation resources, etc., separate from the component itself; document and upload these to the OpenMinTeD Registry as well, following the procedure described in Guidelines for providers of ancillary knowledge resources. If you want to refer to these resources from the software metadata record, please use the resource identifier for the linking.
- To ensure that provided software components can be scaled as required for different workloads, it is recommended that they are implemented in a stateless fashion, i.e. without the need to maintain information about one or more documents and to share this information with other instances of the same component. For example, a component that counts all tokens in a corpus cannot be trivially scaled.
- In addition to plain UIMA/uimaFIT and GATE CREOLE descriptors, OpenMinTeD also supports Argo descriptors; further instructions for deploying UIMA components in Argo are found in the Guide for deploying UIMA components in the Argo platform.
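The stateless recommendation can be illustrated outside any particular framework. In the sketch below, Python functions stand in for UIMA/GATE components and a whitespace tokenizer stands in for real processing (both are assumptions for illustration): the stateless annotator depends only on its input document, so documents can be spread across parallel workers, whereas the corpus-wide counter accumulates state that parallel instances would each hold only partially.

```python
# Sketch contrasting a stateless annotator with a stateful one. Python stands
# in for a UIMA/GATE component; whitespace tokenization is a placeholder.
from concurrent.futures import ThreadPoolExecutor

def annotate_tokens(document: str) -> list:
    """Stateless: output depends only on the input document."""
    spans, offset = [], 0
    for token in document.split():
        start = document.index(token, offset)
        spans.append((start, start + len(token)))
        offset = start + len(token)
    return spans

class CorpusTokenCounter:
    """Stateful: the running total spans documents, so parallel instances
    would each hold a partial count that has to be merged afterwards."""
    def __init__(self):
        self.total = 0

    def process(self, document: str) -> None:
        self.total += len(document.split())

def annotate_corpus(documents: list, workers: int = 4) -> list:
    # Documents can be distributed freely because annotate_tokens keeps no state.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(annotate_tokens, documents))
```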


How to document your components

To be fully compatible with OpenMinTeD, you must:

- ensure that the software is distributed under a perpetual, world-wide, no-charge, royalty-free copyright/patent licence that permits unrestricted use and allows unlimited redistribution;
- include in the metadata record a link to the licence document(s) with the terms and conditions under which it is provided, and attach the licence document(s) together with the resource;
- if you already have a PID for your resource (e.g. a URI or a HANDLE), make sure it is included in the metadata record (cf. identifier for more information);
- ensure that you version all your software resources and label the versions in an unambiguous way, preferably following the Semantic Versioning recommendations;
- ensure that you provide with your software resource appropriate machine-readable metadata, embedded in the source code (where possible) and according to the relevant framework (e.g. uimaFIT Java annotations, etc.); make sure that the metadata descriptors are properly identified in an unambiguous way that makes them easy to distinguish and extract;
- for Java-based components, ensure that you use the Java fully qualified class naming conventions for naming your components; together with the Maven practices for registering, packaging and versioning, this contributes to unique identifiers for the components;
- describe all the execution requirements for the proper operation of the software, i.e. required software libraries, ancillary resources, annotation schema dependencies, etc.;
- describe the input and output requirements for your software, at least as regards the type of resource, the language (if required), the data format and character encoding, and the annotation types of the input/output resource;
- declare in the metadata whether the software is downloadable or can only be accessed as a web service;
- ensure that you describe the functionalities of the software appropriately, both through the OMTD-SHARE component type vocabulary and in a free-text description, supplying more information for the user.
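Version labels that follow Semantic Versioning can be checked automatically, for instance in a release script. The sketch uses the regular expression suggested on semver.org for Semantic Versioning 2.0.0 and compares release versions numerically by MAJOR.MINOR.PATCH; full pre-release precedence rules are intentionally omitted, so treat it as a minimal sketch rather than a complete SemVer implementation.

```python
# Sketch: validate a Semantic Versioning 2.0.0 label and compare releases.
# Pre-release precedence rules are omitted for brevity.
import re

SEMVER = re.compile(
    r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-((?:0|[1-9]\d*|\d*[A-Za-z-][0-9A-Za-z-]*)"
    r"(?:\.(?:0|[1-9]\d*|\d*[A-Za-z-][0-9A-Za-z-]*))*))?"
    r"(?:\+([0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))?$"
)

def is_semver(label: str) -> bool:
    return SEMVER.match(label) is not None

def release_key(label: str) -> tuple:
    """(major, minor, patch) as integers, so that 1.10.0 sorts after 1.9.3."""
    m = SEMVER.match(label)
    if m is None:
        raise ValueError(f"not a semantic version: {label!r}")
    return tuple(int(part) for part in m.group(1, 2, 3))
```

Comparing numerically matters because a plain string comparison would wrongly place "1.10.0" before "1.9.3".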

Further recommendations that contribute to interoperability include the following:

- It is important that you provide the appropriate documentation for your resource (e.g. manuals, help files, etc.), which you should also version along with the software and add as a reference to your metadata record.
- Recommend one of the publications about your resource as the one to be cited for scholarly attribution, and add this information to the metadata record.


- Make sure that you fill in the metadata record all the elements required for citing your resource [2], i.e. the creator of the resource, a title, the resource type and an identifier, and, optionally, the publication date, the version and the publisher or distributor.
- In all cases where you link to other resources or entities (e.g. persons, projects, etc.), please try to do this through unique and persistent identifiers of authority lists and sources, to the extent possible, also documenting the authority and/or scheme they adhere to.

[2] For citation, OpenMinTeD endorses the Joint Declaration of Data Citation Principles, as well as the more specialised RDA recommendations for data citation of evolving data and the DataCite guidelines.


Guide for deploying UIMA components in the Argo platform

Argo is able to use standard Java UIMA components; however, they must first be packaged as UIMA PEAR (Processing Engine ARchive) files before they can be deployed within the Argo platform.

It is strongly recommended to use Maven, a build automation tool, to manage UIMA component projects. A Maven pom.xml template is provided further below; the highlighted values within it are those that component developers are expected to configure.

The minimum set of files required to produce a working UIMA component is:

1. A standard UIMA XML descriptor (located under the desc folder at the root of the project).

2. A Java class containing the implementation of the component (located under src/main/java).

3. A Maven pom.xml (adapted from the template).

Figure 1 shows the recommended layout of a very simple component project managed by Maven, using the example placeholder values found in the Maven pom.xml template. The UIMA XML descriptor should be named after the Maven artifactId value (e.g. uima-component) and reside under the desc directory, inside a nested set of directories representing the Maven groupId value (e.g. xyz.company.uima), giving desc/xyz/company/uima/uima-component.xml.

Figure 1: Basic layout of a Maven-based UIMA component project

It is recommended to use the Maven artifactId and groupId to produce the UIMA Component ID (e.g. the groupId xyz.company.uima and the artifactId uima-component should result in the Component ID xyz.company.uima.uima-component). The default configuration of the PEAR Packaging Maven plugin, within the pom.xml template, automates this procedure. A Component ID is intended to be unique and is not intended to be visible to Argo end-users.
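The two conventions above amount to simple string transformations. This is a minimal sketch using the template's placeholder coordinates; ArgoNaming is a hypothetical helper, and in practice the PEAR Packaging Maven plugin in the template computes the Component ID itself from ${project.groupId}.${project.artifactId}.

```java
// Hypothetical helper illustrating the naming conventions for Argo components.
final class ArgoNaming {
    private ArgoNaming() {}

    /** Component ID = groupId + "." + artifactId. */
    static String componentId(String groupId, String artifactId) {
        return groupId + "." + artifactId;
    }

    /** UIMA descriptor path: desc/<groupId as nested directories>/<artifactId>.xml */
    static String uimaDescriptorPath(String groupId, String artifactId) {
        return "desc/" + groupId.replace('.', '/') + "/" + artifactId + ".xml";
    }
}
```

For groupId xyz.company.uima and artifactId uima-component this yields the Component ID xyz.company.uima.uima-component and the descriptor path desc/xyz/company/uima/uima-component.xml, which is the mainComponentDesc value used in the template.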

Any Java dependencies of a UIMA component are expected to be included within the component's PEAR file. The pom.xml template is configured to package the Maven dependencies automatically when building a PEAR file. However, to achieve Argo compatibility, it is important to exclude the uimaj-core artifact and any artifacts representing UIMA Type Systems. In the pom.xml template this is achieved by supplying the excludeArtifactIds configuration parameter of the Maven Dependency plugin with a comma-delimited list of the affected artifactIds. Argo expects UIMA Type Systems to be installed separately and packaged as PEAR files, as for UIMA components.
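To make the effect of excludeArtifactIds concrete, the sketch below filters a list of runtime artifactIds against a comma-delimited exclusion list, such as the template's value U_compareTypeSystem,uimaj-core. It only illustrates the outcome (which JARs end up in the PEAR's lib folder); it is not the Maven Dependency plugin's implementation, and PearLibFilter is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Illustrative only: mimics the effect of a comma-delimited excludeArtifactIds
// value on the runtime dependencies copied into the PEAR's lib folder.
final class PearLibFilter {
    private PearLibFilter() {}

    static List<String> artifactsToPackage(List<String> runtimeArtifactIds, String excludeArtifactIds) {
        Set<String> excluded = Arrays.stream(excludeArtifactIds.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toSet());
        // Keep every dependency whose artifactId is not on the exclusion list.
        return runtimeArtifactIds.stream()
                .filter(id -> !excluded.contains(id))
                .collect(Collectors.toList());
    }
}
```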

A component may also contain an Argo XML descriptor file, although this is entirely optional. It is intended to provide additional metadata for a component. An Argo XML descriptor must:

- reside in the same directory as the component's UIMA XML descriptor;
- have the same file name as the UIMA XML descriptor, but with a .argo.xml suffix.
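The naming rule can be sketched with java.nio.file. The helper below is hypothetical and assumes that the UIMA descriptor's file name ends in .xml; it replaces that extension with .argo.xml and keeps the same directory.

```java
import java.nio.file.Path;

// Hypothetical helper: derives the Argo XML descriptor location from the UIMA
// descriptor (same directory, same base name, ".argo.xml" suffix).
final class ArgoDescriptorPath {
    private ArgoDescriptorPath() {}

    static Path argoDescriptorFor(Path uimaDescriptor) {
        String file = uimaDescriptor.getFileName().toString();
        if (!file.endsWith(".xml")) {
            throw new IllegalArgumentException("Not an XML descriptor: " + uimaDescriptor);
        }
        String base = file.substring(0, file.length() - ".xml".length());
        return uimaDescriptor.resolveSibling(base + ".argo.xml");
    }
}
```

Applied to desc/xyz/company/uima/uima-component.xml, it returns desc/xyz/company/uima/uima-component.argo.xml.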

Figure 2 shows the location and name of an Argo XML descriptor file for a component with the ID xyz.company.uima.uima-component, while Figure 3 shows the general format of the descriptor file itself.

Figure 2: Example file structure of a component containing an Argo XML descriptor file

<argoDescriptor>

<tags>

<tag>{string}</tag>

...

</tags>

<minimumMemoryInMbs>{integer}</minimumMemoryInMbs>

<interactive>[true/false]</interactive>

<configurationParametersMetaData>

<configurationParameterMetaData>...</configurationParameterMetaData>

...

</configurationParametersMetaData>

</argoDescriptor>

Figure 3: Structure of an Argo XML descriptor


Within an Argo descriptor file, all of the sub-elements directly under the argoDescriptor element are optional.

The tags element can contain multiple tag elements, each containing a string value. These tag values are intended to be used within Argo's component search facility, to help end-users find relevant components.

The minimumMemoryInMbs element holds an integer value, setting the default value for the minimum amount of memory (in megabytes) required by this component when it is run in a distributed workflow. This is important for determining the allocation of components to machines.

The interactive element contains a boolean value. This value is set to true when a component contains a custom web user interface that requires interaction with the annotation model during a workflow execution. The only existing Argo component with this value set to true is the Manual Annotation Editor.
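Because every sub-element of argoDescriptor is optional, a reader has to cope with any of them being absent. The following minimal sketch uses the JDK's DOM parser (javax.xml.parsers) to read tags, minimumMemoryInMbs and interactive. The fallback values (no tags, 0 MB, non-interactive) are assumptions made for the example, not documented Argo defaults, and ArgoDescriptorReader is a hypothetical class.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical reader for the optional top-level elements of an Argo XML descriptor.
final class ArgoDescriptorReader {
    final List<String> tags = new ArrayList<>(); // empty if <tags> is absent
    int minimumMemoryInMbs = 0;                  // assumed fallback
    boolean interactive = false;                 // assumed fallback

    static ArgoDescriptorReader parse(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Malformed Argo descriptor", e);
        }
        Element root = doc.getDocumentElement(); // <argoDescriptor>
        ArgoDescriptorReader d = new ArgoDescriptorReader();

        NodeList tagNodes = root.getElementsByTagName("tag"); // matches <tag>, not <tags>
        for (int i = 0; i < tagNodes.getLength(); i++) {
            d.tags.add(tagNodes.item(i).getTextContent().trim());
        }
        String mem = firstText(root, "minimumMemoryInMbs");
        if (mem != null) d.minimumMemoryInMbs = Integer.parseInt(mem);
        String inter = firstText(root, "interactive");
        if (inter != null) d.interactive = Boolean.parseBoolean(inter);
        return d;
    }

    /** Text of the first descendant element with the given name, or null if absent. */
    private static String firstText(Element parent, String name) {
        NodeList nodes = parent.getElementsByTagName(name);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }
}
```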

The configurationParametersMetaData element can contain multiple configurationParameterMetaData elements, each providing additional information about a component configuration parameter found within the matching UIMA XML descriptor. A configurationParameterMetaData element must contain a name sub-element (whose value is the name of the configuration parameter it refers to in the UIMA descriptor) and a uiType sub-element (which Argo uses to provide the most appropriate UI widget to the end-user). Valid values for uiType are time, date, datetime, enum, password, type, document and text.
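The closed list of uiType values lends itself to a Java enum, so that a misspelt value in a descriptor can be caught before deployment. This is a sketch only: UiType is a hypothetical type, and accepting exactly the lower-case spellings listed above is an assumption about how strictly Argo compares values.

```java
import java.util.Locale;

// Hypothetical enum of the uiType values accepted in configurationParameterMetaData.
enum UiType {
    TIME, DATE, DATETIME, ENUM, PASSWORD, TYPE, DOCUMENT, TEXT;

    /** Maps a descriptor value such as "datetime" to a constant; rejects anything else. */
    static UiType fromDescriptorValue(String value) {
        for (UiType t : values()) {
            if (t.name().toLowerCase(Locale.ROOT).equals(value)) {
                return t;
            }
        }
        throw new IllegalArgumentException("Unknown uiType: " + value);
    }
}
```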

Figure 4 shows how configurationParameterMetaData elements are configured if their uiType value is time, date or datetime. The corresponding UIMA configuration parameter must be of type String. Argo needs to know how to format the value that the end-user picks with a calendar UI widget, so the pattern has to be specified in the format sub-element, as demonstrated in Figure 4.

<configurationParameterMetaData>

<name>timeParam</name>

<uiType>time</uiType>

<uiConfiguration>

<format>HH:mm:ss</format>

</uiConfiguration>

</configurationParameterMetaData>

Figure 4: A date, time or datetime configuration parameter
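The format sub-element determines how the value picked in the calendar widget becomes the String passed to the UIMA parameter. The sketch below assumes the patterns follow java.time.format.DateTimeFormatter syntax, with which the patterns used in this guide (such as HH:mm:ss and yyyy/MM/dd) are compatible; whether Argo itself uses this class is not stated here, and FormatSketch is a hypothetical name.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch: formats a chosen time, date or date-time with the
// pattern taken from the <format> sub-element.
final class FormatSketch {
    private FormatSketch() {}

    static String format(LocalTime t, String pattern) {
        return t.format(DateTimeFormatter.ofPattern(pattern));
    }

    static String format(LocalDate d, String pattern) {
        return d.format(DateTimeFormatter.ofPattern(pattern));
    }

    static String format(LocalDateTime dt, String pattern) {
        return dt.format(DateTimeFormatter.ofPattern(pattern));
    }
}
```

For instance, 14:05:09 on 16 March 2017 is rendered as 14:05:09 with HH:mm:ss and as 2017/03/16 with yyyy/MM/dd.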


For configuration parameters that have a fixed set of values, a uiType value of enum is required. These fixed values should be listed as a set of value elements, nested within a values element, as shown in Figure 5.

<configurationParameterMetaData>

<name>enumParam</name>

<uiType>enum</uiType>

<values>

<value>red</value>

<value>green</value>

<value>blue</value>

</values>

</configurationParameterMetaData>

Figure 5: An enum configuration parameter
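The following sketch reads the fixed values of an enum parameter, in the shape shown in Figure 5, and checks a user's choice against them. It uses only the JDK's DOM parser; EnumParameter is a hypothetical class, and it assumes the metadata element is available as a stand-alone XML string.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical reader for an enum configurationParameterMetaData element.
final class EnumParameter {
    final String name;
    final List<String> allowed = new ArrayList<>();

    private EnumParameter(String name) { this.name = name; }

    static EnumParameter parse(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement(); // <configurationParameterMetaData>
        } catch (Exception e) {
            throw new IllegalArgumentException("Malformed parameter metadata", e);
        }
        EnumParameter p = new EnumParameter(
                root.getElementsByTagName("name").item(0).getTextContent().trim());
        NodeList values = root.getElementsByTagName("value"); // matches <value>, not <values>
        for (int i = 0; i < values.getLength(); i++) {
            p.allowed.add(values.item(i).getTextContent().trim());
        }
        return p;
    }

    /** True if the choice is one of the fixed values. */
    boolean accepts(String choice) { return allowed.contains(choice); }
}
```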

Configuration parameters containing sensitive information, such as passwords, should use a uiType value of password. This hides the value of the parameter from the user; once entered, the value is not transmitted back to the Argo UI, for additional security. It is also possible to specify the minimum and/or maximum number of characters that the value can hold, using min and max elements within the valueConstraints element. See Figure 6 for an example.

<configurationParameterMetaData>

<name>passwordParam</name>

<uiType>password</uiType>

<valueConstraints>

<min>5</min>

<max>10</max>

</valueConstraints>

</configurationParameterMetaData>

Figure 6: A password configuration parameter
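The valueConstraints check for a password parameter amounts to an inclusive range test on the number of characters. In this sketch (LengthConstraint is a hypothetical name) either bound may be omitted, and characters are counted as Unicode code points rather than UTF-16 units; that choice is an assumption, since this guide does not say how Argo counts them.

```java
// Hypothetical sketch of the min/max valueConstraints check.
final class LengthConstraint {
    final Integer min; // null = no lower bound
    final Integer max; // null = no upper bound

    LengthConstraint(Integer min, Integer max) {
        this.min = min;
        this.max = max;
    }

    boolean accepts(String value) {
        int n = value.codePointCount(0, value.length()); // characters, not UTF-16 units
        return (min == null || n >= min) && (max == null || n <= max);
    }
}
```

With min 5 and max 10, as in Figure 6, a 5-character value is accepted, while 4- or 11-character values are rejected.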

To make it easier for a user to select UIMA type(s) within the Argo UI, any configuration parameters representing types should have a uiType value of type. This will result in a searchable list of all types known to Argo being displayed to the end-user when configuring the component, from which the required types can be selected. See Figure 7 for an example.

<configurationParameterMetaData>

<name>typeParam</name>

<uiType>type</uiType>

</configurationParameterMetaData>


Figure 7: A type configuration parameter

Configuration parameters that refer to local files and/or directories should have the uiType value of document. This allows an end-user to select files from the Argo File Store using a file selector dialog. Figure 8 shows an example configuration, and Figure 9 lists the UI configuration parameters available for configuring the file dialog.

<configurationParameterMetaData>

<name>documentParam</name>

<uiType>document</uiType>

<uiConfiguration>

<selectFile>true</selectFile>

<selectFolder>false</selectFolder>

<selectFilesRecursively>false</selectFilesRecursively>

<hideFiles>false</hideFiles>

<windowCaption>Save file as...</windowCaption>

</uiConfiguration>

</configurationParameterMetaData>

Figure 8: A document configuration parameter

Element | Type | Description
selectFile | Boolean | Allows a user to select a file in the dialog
selectFolder | Boolean | Allows a user to select a folder in the dialog
selectFilesRecursively | Boolean | Recursively selects all of the files and/or folders under the selected folders
hideFiles | Boolean | Only shows directories in the dialog
windowCaption | String | A caption to display in the file browser window

Figure 9: uiConfiguration elements

Configuration parameters that are likely to hold a large amount of text should use a uiType value of text. This results in a larger text box being made available to the end-user. The size of the text area is configured using characterWidth and visibleLines elements, nested within the uiConfiguration element, as shown in Figure 10.

<configurationParameterMetaData>

<name>textAreaParam</name>

<uiType>text</uiType>

<uiConfiguration>

<characterWidth>30</characterWidth>

<visibleLines>5</visibleLines>

</uiConfiguration>

</configurationParameterMetaData>

Figure 10: A text area configuration parameter


An example of a UIMA XML descriptor, along with its corresponding Argo XML descriptor, can be found further below.

Maven pom.xml template for Argo components

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">

<modelVersion>4.0.0</modelVersion>

<groupId>xyz.company.uima</groupId>

<artifactId>uima-component</artifactId>

<version>1.0</version>

<build>

<resources>

<resource>

<directory>desc</directory>

</resource>

<resource>

<directory>src/main/resources</directory>

</resource>

</resources>

<plugins>

<plugin>

<groupId>org.apache.maven.plugins</groupId>

<artifactId>maven-dependency-plugin</artifactId>

<version>2.4</version>

<executions>

<execution>

<id>copy-dependencies</id>

<phase>prepare-package</phase>

<goals>

<goal>copy-dependencies</goal>

</goals>

<configuration>

<stripVersion>true</stripVersion>

<outputDirectory>${project.build.directory}/pearPackaging/lib</outputDirectory>

<overWriteReleases>true</overWriteReleases>

<overWriteSnapshots>true</overWriteSnapshots>

<includeScope>runtime</includeScope>

<excludeArtifactIds>U_compareTypeSystem,uimaj-core</excludeArtifactIds>

</configuration>

</execution>

</executions>

</plugin>

<plugin>

<groupId>org.apache.uima</groupId>

<artifactId>PearPackagingMavenPlugin</artifactId>

<version>2.4.0</version>

<extensions>true</extensions>

<executions>


<execution>

<phase>package</phase>

<configuration>

<mainComponentDesc>desc/xyz/company/uima/uima-component.xml</mainComponentDesc>

<componentId>${project.groupId}.${project.artifactId}</componentId>

</configuration>

<goals>

<goal>package</goal>

</goals>

</execution>

</executions>

</plugin>

<plugin>

<groupId>org.apache.maven.plugins</groupId>

<artifactId>maven-install-plugin</artifactId>

<version>2.3.1</version>

<executions>

<execution>

<phase>install</phase>

<configuration>

<packaging>pear</packaging>

<groupId>${project.groupId}</groupId>

<artifactId>${project.artifactId}</artifactId>

<version>${project.version}</version>

<file>${project.build.directory}/${project.groupId}.${project.artifactId}.pear</file>

</configuration>

<goals>

<goal>install-file</goal>

</goals>

</execution>

</executions>

</plugin>

</plugins>

<pluginManagement>

<plugins>

<plugin>

<groupId>org.eclipse.m2e</groupId>

<artifactId>lifecycle-mapping</artifactId>

<version>1.0.0</version>

<configuration>

<lifecycleMappingMetadata>

<pluginExecutions>

<pluginExecution>

<pluginExecutionFilter>

<groupId>org.apache.maven.plugins</groupId>

<artifactId>maven-dependency-plugin</artifactId>

<versionRange>[1.0.0,)</versionRange>

<goals>

<goal>copy-dependencies</goal>

</goals>

</pluginExecutionFilter>

<action>


<execute>

<runOnIncremental>false</runOnIncremental>

</execute>

</action>

</pluginExecution>

</pluginExecutions>

</lifecycleMappingMetadata>

</configuration>

</plugin>

</plugins>

</pluginManagement>

</build>

<dependencies>

<dependency>

<groupId>org.apache.uima</groupId>

<artifactId>uimaj-core</artifactId>

<version>2.7.0</version>

</dependency>

<dependency>

<groupId>org.u_compare</groupId>

<artifactId>U_compareTypeSystem</artifactId>

<version>1.1</version>

</dependency>

</dependencies>

</project>

Argo XML Descriptor example

<argoDescriptor>

<tags>

<tag>categoryA</tag>

<tag>finance</tag>

</tags>

<minimumMemoryInMbs>256</minimumMemoryInMbs>

<interactive>false</interactive>

<configurationParametersMetaData>

<configurationParameterMetaData>

<name>timeParam</name>

<uiType>time</uiType>

<uiConfiguration>

<format>HH:mm:ss</format>

</uiConfiguration>

</configurationParameterMetaData>

<configurationParameterMetaData>

<name>dateParam</name>

<uiType>date</uiType>

<uiConfiguration>

<format>yyyy/MM/dd</format>


</uiConfiguration>

</configurationParameterMetaData>

<configurationParameterMetaData>

<name>dateTimeParam</name>

<uiType>datetime</uiType>

<uiConfiguration>

<format>yyyy/MM/dd HH:mm:ss</format>

</uiConfiguration>

</configurationParameterMetaData>

<configurationParameterMetaData>

<name>enumParam</name>

<uiType>enum</uiType>

<values>

<value>red</value>

<value>green</value>

<value>blue</value>

</values>

</configurationParameterMetaData>

<configurationParameterMetaData>

<name>passwordParam</name>

<uiType>password</uiType>

<uiConfiguration>

</uiConfiguration>

<valueConstraints>

<min>5</min>

<max>10</max>

</valueConstraints>

</configurationParameterMetaData>

<configurationParameterMetaData>

<name>typeParam</name>

<uiType>type</uiType>

<uiConfiguration>

</uiConfiguration>

</configurationParameterMetaData>

<configurationParameterMetaData>

<name>documentParam</name>

<uiType>document</uiType>

<uiConfiguration>

<selectFile>true</selectFile>

<selectFolder>false</selectFolder>

<selectFilesRecursively>false</selectFilesRecursively>

<hideFiles>false</hideFiles>

<windowCaption>Save file as...</windowCaption>

</uiConfiguration>

</configurationParameterMetaData>

<configurationParameterMetaData>

<name>textAreaParam</name>

<uiType>text</uiType>

<uiConfiguration>

<characterWidth>30</characterWidth>

<visibleLines>5</visibleLines>

</uiConfiguration>

</configurationParameterMetaData>


</configurationParametersMetaData>

</argoDescriptor>

UIMA Analysis Engine XML Descriptor referenced by the Argo XML Descriptor

<?xml version="1.0" encoding="UTF-8"?>

<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">

<frameworkImplementation>org.apache.uima.java</frameworkImplementation>

<primitive>true</primitive>

<annotatorImplementationName>xyz.company.uima.UimaComponent</annotatorImplementationName>

<analysisEngineMetaData>

<name>UIMAComponent</name>

<description/>

<version>1.0</version>

<vendor/>

<configurationParameters>

<configurationParameter>

<name>timeParam</name>

<type>String</type>

<multiValued>false</multiValued>

<mandatory>false</mandatory>

</configurationParameter>

<configurationParameter>

<name>dateParam</name>

<type>String</type>

<multiValued>false</multiValued>

<mandatory>false</mandatory>

</configurationParameter>

<configurationParameter>

<name>dateTimeParam</name>

<type>String</type>

<multiValued>false</multiValued>

<mandatory>false</mandatory>

</configurationParameter>

<configurationParameter>

<name>enumParam</name>

<type>String</type>

<multiValued>false</multiValued>

<mandatory>false</mandatory>

</configurationParameter>

<configurationParameter>

<name>passwordParam</name>

<type>String</type>

<multiValued>false</multiValued>

<mandatory>false</mandatory>

</configurationParameter>

<configurationParameter>

<name>typeParam</name>

<type>String</type>

<multiValued>false</multiValued>


<mandatory>false</mandatory>

</configurationParameter>

<configurationParameter>

<name>documentParam</name>

<type>String</type>

<multiValued>false</multiValued>

<mandatory>false</mandatory>

</configurationParameter>

<configurationParameter>

<name>textAreaParam</name>

<type>String</type>

<multiValued>false</multiValued>

<mandatory>false</mandatory>

</configurationParameter>

</configurationParameters>

<configurationParameterSettings/>

<typeSystemDescription/>

<typePriorities/>

<fsIndexCollection/>

<capabilities>

<capability>

<inputs/>

<outputs/>

<languagesSupported/>

</capability>

</capabilities>

<operationalProperties>

<modifiesCas>true</modifiesCas>

<multipleDeploymentAllowed>true</multipleDeploymentAllowed>

<outputsNewCASes>false</outputsNewCASes>

</operationalProperties>

</analysisEngineMetaData>

<resourceManagerConfiguration/>

</analysisEngineDescription>


Recommended ancillary knowledge resources

In order to further encourage interoperability, OpenMinTeD makes specific recommendations about particular knowledge resources that TDM tools and services should use. These recommendations cover linguistics and the initial domains of use targeted by OpenMinTeD. The current recommendations should not be seen as a final and static set: they will evolve with experience, and as OpenMinTeD is used for TDM in new domains. Users are therefore encouraged to use the existing recommendations, but to make use of other resources where these are not suitable.

TDM tools and services should use resources from the following initial list where possible. Where this is not possible, knowledge resource authors are encouraged to provide linkages between their own resource and those given here, or to any other widely used or standard Linked Data knowledge resource. This list of recommended resources should be seen as a first version, and will be extended.

Social sciences resources: TheSoz

Agriculture and agronomy resources: Agrovoc; ontologies from AgroPortal

Life sciences resources: OboInOwl; MeSH (available in LOD); BioC; NeuroLex; BioLexicon

Linguistic resources: LAPPS (vocabulary of core linguistic objects); Universal Dependencies (part-of-speech tags, features for morphology and syntactic dependencies); OLIA (reference model and annotation models for morphology, morphosyntax, dependencies); Penn Treebank (part-of-speech tags and features of morphology); ISOcat/CCR[1] (linguistic and metadata terminology); GOLD (linguistic ontology)

Type systems*


* Used by the software components integrated in the OpenMinTeD platform (GATE, DKPRO, ALVIS, ARGO and ILSP).

[1] ISOcat has recently moved to the Clarin Concept Registry (CCR) and is currently under curation.


Recommended schema for software resources

This section includes a synopsis of the recommended schema for software resources, i.e. the subset of M(andatory) and strongly R(ecommended) metadata elements, covering only elements related to the resource itself. Additional elements required for the management of the metadata record (e.g. metadataCreationDate, metadataCreator etc.) are not presented here, as they are handled by the OMTD platform.

OMTD-SHARE element | Usage
resourceType | M
resourceName | M
description | M
identifier | M
version | M
componentDistributionMedium | M
componentType | M
licence or rightsStmtName & rightsStmtURL (one of the two must be provided) | M
version of licence | M
contactEmail or landingPage (one of the two must be provided) | M
contactPerson (identifier or personName) | R
contactGroup (identifier or organizationName) | R
mailingList (mailingListName, subscribe, unsubscribe, post, archive, otherArchive) | R
issueTracker | R
onlineHelpURL | R
mustBeCitedWith | R
downloadURL or accessURL (one of the two should be provided) | M when applicable
resourceCreator (person or organization, described with identifier or name) | R
mediaType inside inputContentResourceInfo or outputResourceInfo (i.e. mediaType of input and output resource) | M when applicable
resourceType inside inputContentResourceInfo or outputResourceInfo | R when applicable


language inside inputContentResourceInfo or outputResourceInfo | R when applicable
characterEncoding inside inputContentResourceInfo or outputResourceInfo | R when applicable
mimeType inside inputContentResourceInfo or outputResourceInfo | R when applicable
dataFormatSpecific inside inputContentResourceInfo or outputResourceInfo | R when applicable
typesystem inside inputContentResourceInfo or outputResourceInfo | R when applicable
tagset inside inputContentResourceInfo or outputResourceInfo | R when applicable
annotationLevel inside inputContentResourceInfo or outputResourceInfo | R when applicable
typesystem | R
tagset | R
annotationResource | R
framework | R
for parameters: parameterName, description, parameterType, mandatory, multiValue | M when applicable
relationType=isCompatibleWith (external relation; link to models, annotation resources etc. that can be used with the component) | R
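A provider can check a record against the mandatory elements in the table before submitting it. The sketch below assumes a flat map from element name to value, which is a simplification (real OMTD-SHARE records are nested XML). It covers the unconditional M elements and the two "one of the two" rules, and leaves out "version of licence" and the "M when applicable" rows, whose element paths are not spelt out here. MandatoryCheck is a hypothetical class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical pre-submission check against the mandatory OMTD-SHARE elements.
final class MandatoryCheck {
    private static final String[] ALWAYS_REQUIRED = {
        "resourceType", "resourceName", "description", "identifier",
        "version", "componentDistributionMedium", "componentType"
    };

    private MandatoryCheck() {}

    /** Returns the unmet requirements; an empty list means the record passes. */
    static List<String> missing(Map<String, String> record) {
        List<String> problems = new ArrayList<>();
        for (String element : ALWAYS_REQUIRED) {
            if (!present(record, element)) problems.add(element);
        }
        // licence, or rightsStmtName together with rightsStmtURL
        boolean rights = present(record, "rightsStmtName") && present(record, "rightsStmtURL");
        if (!present(record, "licence") && !rights) {
            problems.add("licence | rightsStmtName+rightsStmtURL");
        }
        // contactEmail or landingPage
        if (!present(record, "contactEmail") && !present(record, "landingPage")) {
            problems.add("contactEmail | landingPage");
        }
        return problems;
    }

    private static boolean present(Map<String, String> record, String key) {
        String v = record.get(key);
        return v != null && !v.trim().isEmpty();
    }
}
```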


resourceType

Usage: Mandatory
Type: Closed controlled vocabulary
Attributes: (none)
Controlled vocabulary reference and/or values: ms:resourceType: corpus, lexicalConceptualResource, languageDescription, model, component
Definition/Explanations: Specifies the type of the resource being described, or the type of the resource that a tool or service takes as input or produces as output
Recommended usage: For components, the fixed value "component" must be added automatically
Relation to other metadata schemas: DCMI: skos:narrowMatch dct:type; DataCite 4.0: skos:closeMatch datacite:resourceTypeGeneral & datacite:resourceType. The recommended practice is to use one of the values "software", "service" or "workflow" for datacite:resourceTypeGeneral.


resourceName

Usage: Mandatory
Type: Multilingual free text
Attributes: xs:lang
Definition/Explanations: The full name by which the resource is known
Recommended usage: Please provide a short but descriptive and unique name for the resource, e.g. "OpenNLP tagger" instead of just "tagger of English". Provide the name in English; if you want to add names in other languages, you can use the "lang" attribute. N.B. This element is intended for a human-readable/human-understandable name for the resource.
Relation to other metadata schemas: Maven POM 4.0.0: name; GATE: name; UIMA/uimaFIT: name; DCMI: skos:exactMatch dct:title; DataCite 4.0: skos:exactMatch datacite:title


description

Usage: Mandatory
Type: Multilingual free text
Attributes: xs:lang
Definition/Explanations: Provides the description of the resource in prose
Recommended usage: Give a brief yet informative description of the functionalities of the component, the language(s) it works on, its input requirements etc. Please provide the text in English; if you want to add texts in other languages, you can add them using the "lang" attribute to specify the language.
Relation to other metadata schemas: Maven POM 4.0.0: description; GATE: comment; UIMA/uimaFIT: description; DCMI: skos:exactMatch dct:description; DataCite 4.0: description & descriptionType with value "abstract"


identifier

Usage: Mandatory
Type: Free text
Attributes: ms:resourceIdentifierSchemeName or ms:schemeURI
Definition/Explanations: Reference to a PID, DOI or any kind of identifier used by the resource provider for the resource
Recommended usage: Provide a unique identifier already assigned by an authoritative source. To specify the scheme, either use the attribute "resourceIdentifierSchemeName" and select one of the predefined values (e.g. DOI, HDL, ISLRN etc.) or, if the scheme is not listed among them, select the "other" value and use the attribute "schemeURI" to provide a link to the URL that documents the scheme the identifier adheres to. If the resource does not have a unique identifier, an identifier will be assigned by OpenMinTeD. For components harvested from Maven, the Maven ID can be used with a reference to the Maven scheme (https://maven.apache.org/pom.html#Maven_Coordinates). This is combined with the Java fully qualified class naming conventions to give the following coordinates: groupId:artifactId:version:(packaging):(classifier)#class
Relation to other metadata schemas: Maven POM 4.0.0: groupId & artifactId & version & packaging & classifier, with resourceIdentifierSchemeURI="https://maven.apache.org/pom.html#Maven_Coordinates"; GATE: class; UIMA/uimaFIT: class; DCMI: skos:narrowMatch dct:identifier; DataCite 4.0: skos:broadMatch datacite:identifier (identifierType can only be DOI)
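The coordinate string described in the recommended usage above can be assembled as follows. Reading the parenthesised parts as optional trailing segments, with a classifier allowed only after a packaging, is an interpretation made for this sketch rather than a rule stated by OMTD-SHARE; MavenComponentId is a hypothetical helper.

```java
// Hypothetical helper building groupId:artifactId:version[:packaging[:classifier]]#class
final class MavenComponentId {
    private MavenComponentId() {}

    static String of(String groupId, String artifactId, String version,
                     String packaging, String classifier, String className) {
        StringBuilder sb = new StringBuilder()
                .append(groupId).append(':').append(artifactId).append(':').append(version);
        if (packaging != null) {
            sb.append(':').append(packaging);
            if (classifier != null) sb.append(':').append(classifier);
        } else if (classifier != null) {
            // Without a packaging segment the classifier position would be ambiguous.
            throw new IllegalArgumentException("classifier requires packaging");
        }
        return sb.append('#').append(className).toString();
    }
}
```

With the template's placeholder values this gives xyz.company.uima:uima-component:1.0#xyz.company.uima.UimaComponent.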


version

Usage: Recommended
Definition/Explanations: Any string, usually a number, that identifies the version of a resource
Recommended usage: For components, the recommended practice is to follow semantic versioning (http://semver.org/). N.B. "version" should not be confused with the relation that links a specific resource with its various enriched or modified versions (e.g. annotated version, subset etc.).
Relation to other metadata schemas: Maven POM 4.0.0: version; DCMI: skos:exactMatch dct:hasVersion; DataCite 4.0: skos:exactMatch datacite:Version
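Whether a version label follows Semantic Versioning 2.0.0 can be checked with the regular expression suggested on semver.org, compiled here with java.util.regex. SemVer is a hypothetical class name, and the check covers syntax only, not whether version increments are meaningful.

```java
import java.util.regex.Pattern;

// Syntax check for Semantic Versioning 2.0.0 (MAJOR.MINOR.PATCH with optional
// pre-release and build metadata), using the regular expression from semver.org.
final class SemVer {
    private static final Pattern SEMVER = Pattern.compile(
            "^(0|[1-9]\\d*)\\.(0|[1-9]\\d*)\\.(0|[1-9]\\d*)"
          + "(?:-((?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*)"
          + "(?:\\.(?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
          + "(?:\\+([0-9a-zA-Z-]+(?:\\.[0-9a-zA-Z-]+)*))?$");

    private SemVer() {}

    static boolean isValid(String version) {
        return version != null && SEMVER.matcher(version).matches();
    }
}
```

Note that a two-part label such as 1.0, as used in the pom.xml template above, does not pass this strict check, whereas 1.0.0 does.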


componentType

Usage: Mandatory
Type: Open controlled vocabulary
Controlled vocabulary reference and/or values: ms-omtd:componentType: access, reader, writer, supportComponent, visualizer, debugger, validator, viewer, corpusViewer, lexiconViewer, editor, mlTrainer, mlPredictor, featureExtractor, dataSplitter, dataMerger, converter, evaluator, flowController, scriptBasedAnalyzer, matcher, gazetteerBasedComponent, crowdSourcingComponent, dataCollector, crawler, processingComponent, annotator, segmenter, stemmer, lemmatizer, tagger, chunker, parser, coreferenceAnnotator, namedEntityRecognizer, semanticsAnnotator, srlAnnotator, readabilityAnnotator, aligner, generator, summarizer, simplifier, naturalLanguageGenerator, prePostProcessor, spellingChecker, grammarChecker, normalizer, filters, extractor, topicExtractor, documentClassifier, languageIdentifier, sentimentAnalyzer, keywordsExtractor, terminologyExtractor, contradictionDetector, emotionRecognizer, eventExtractor, persuasiveExpressionMiner, informationExtractor, lexiconExtractorFromCorpora, lexiconExtractorFromLexica, wordSenseDisambiguator, qualitativeAnalyser
Definition/Explanations: Specifies the type of the component in terms of the function/task it performs
Recommended usage: Please select one of the predefined values. Note that the values are hierarchically organised, so it is recommended to select the most specific applicable value (e.g. "visualizer" rather than the broader "supportComponent"). The current list of values is intended mainly for use by simple components rather than workflows or full applications. The list will be further enriched with values that also target end-users.
Relation to other metadata schemas: DCMI: skos:narrowMatch dct:type


licence

Usage: Mandatory upon conditions
Conditions for usage: either licence or rightsStmt must be filled in
Type: Open controlled vocabulary
Controlled vocabulary reference and/or values: ms:licence: CC-BY, CC-BY-NC, CC-BY-NC-ND, CC-BY-NC-SA, CC-BY-ND, CC-BY-SA, CC-ZERO, PDDL, ODC-BY, ODbL, MS-NoReD, MS-NoReD-FF, MS-NoReD-ND, MS-NoReD-ND-FF, MS-NC-NoReD, MS-NC-NoReD-FF, MS-NC-NoReD-ND, MS-NC-NoReD-ND-FF, ELRA_END_USER, ELRA_EVALUATION, ELRA_VAR, CLARIN_PUB, CLARIN_ACA, CLARIN_ACA-NC, CLARIN_RES, AGPL, ApacheLicence_2.0, BSD_4-clause, BSD_3-clause, FreeBSD, GFDL, GPL, LGPL, MIT, Princeton_Wordnet, proprietary, underNegotiation, nonStandardLicenceTerms
Definition/Explanations: The licence of use for the resource
Recommended usage: You can provide information on the rights of accessing and using a resource in one of the following ways, in order of preference:
- use the element "licence" and select one of the recommended licences; please note that the list mixes licences intended for data resources and for components; for components, the recommended licences are the Open Source licences; for data resources, please use a standard licence such as one of the CC family; if the licence you use is not included in the list, you can use the "nonStandardLicenceTerms" or the "proprietary" value and give further information on your licence in the elements "nonStandardLicenceName", "nonStandardLicenceTermsURL" and "nonStandardLicenceTermsText";
- you can also use "rightsStmtName" and "rightsStmtURL" (with a link to a URL with more explanations on its usage) if the resource is provided with a general statement of use rather than an official licence document; please note that this option is intended mainly to help end-users access your resource, and you are strongly advised to license your resource properly.
Relation to other metadata schemas: Maven POM 4.0.0: license/name; DCMI: skos:closeMatch dct:license; DataCite 4.0: skos:closeMatch datacite:rights


rightsStmtName

Usage

Mandatory upon conditions

Conditions for usage

Either licence or rightsStmt must be filled in

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

openAccess, closedAccess, embargoedAccess, restrictedAccess

Definition/Explanations

The name of an official statement indicative of licensing terms for the use of a resource (e.g. open access, free to read, etc.); its semantics should be clear, preferably formally expressed and stored at a URL.

The current list of predefined values comes from OpenAIRE, but it is under revision.

Recommended usage

The "rightsStmtName" and "rightsStmtURL" elements can be used in addition to the "licence" value to help users understand the licensing terms of a resource.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:accessRights
DataCite 4.0: skos:closeMatch datacite:rights


rightsStmtURL

Usage

Mandatory upon conditions

Conditions for usage

Either licence or rightsStmt must be filled in

Type

URL pattern

Definition/Explanations

Link to the URL with the text that formally explains the licensing conditions imposed by the rights statement.

Recommended usage

The "rightsStmtName" and "rightsStmtURL" elements can be used in addition to the "licence" value to help users understand the licensing terms of a resource.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:accessRights
DataCite 4.0: skos:closeMatch datacite:rightsURI
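Several elements in this section (rightsStmtURL, nonStandardLicenceTermsURL, accessURL, downloadURL, landingPage, onlineHelpURL, issueTracker) are typed as "URL pattern". The guidelines do not spell out the exact pattern, so the sketch below assumes that an absolute http(s) URL with a host is what is expected. It uses only Python's standard `urllib.parse`; the helper name `looks_like_url` is hypothetical.

```python
from urllib.parse import urlparse


def looks_like_url(value: str) -> bool:
    """Rough check for an absolute http(s) URL with a host name.

    Assumption: the schema's "URL pattern" accepts http and https URLs.
    This does not check that the URL actually resolves.
    """
    try:
        parts = urlparse(value.strip())
    except ValueError:
        # urlparse raises ValueError for some malformed input (e.g. an unclosed IPv6 bracket).
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


print(looks_like_url("https://creativecommons.org/licenses/by/4.0/"))  # True
print(looks_like_url("creativecommons.org/licenses/by/4.0/"))  # False: no scheme
```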


nonStandardLicenceTermsURL

Usage

Mandatory upon conditions

Conditions for usage

When one of the values "nonStandardLicenceTerms" or "proprietary" is selected for "licence"

Type

URL pattern

Definition/Explanations

Used to provide a hyperlink to a URL containing the text of a licence not included in the predefined list, or describing the terms of use for a language resource or the terms of service for web services

Recommended usage

Please provide the link to the full-text document of the licence. This is preferred over inserting the licence text in the element "nonStandardLicenceTermsText", as it provides a permanent location for the licence that is accessible to all.

Relation to other metadata schemas

Maven POM 4.0.0: license/url
DCMI: skos:closeMatch dct:license
DataCite 4.0: skos:closeMatch datacite:rightsURI


version of licence

Usage

Mandatory

Type

free text

Definition/Explanations

The version of the licence

Recommended usage

You are advised to indicate the version of the licence of your resource; the latest version is the preferred option, e.g. "4.0" for all CC licences and "2.0" for the META-SHARE NoReD ones.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:hasVersion (for dct:LicenseDocument)


componentDistributionMedium

Usage

Mandatory

Type

closed controlled vocabulary

Controlled vocabulary reference and/or values

ms-omtd:componentDistributionMedium: webService, sourceCode, executableCode, sourceAndExecutableCode

Definition/Explanations

The medium/form of the distribution (e.g. downloadable resource, accessible through interface, source code, etc.)


accessURL

Usage

Recommended under conditions

Type

URL pattern

Definition/Explanations

A landing page, feed, SPARQL endpoint, etc. that gives access to the resource or where the web service/workflow is executed

Recommended usage

Please use for components that are executable as web services.


downloadURL

Usage

Recommended under conditions

Type

URL pattern

Definition/Explanations

Any URL from which the resource can be downloaded

Recommended usage

Please use if the component is distributed as source and/or executable code and has to be downloaded in order to be executed; this element is of particular importance if you have not uploaded the resource to the repository.

Relation to other metadata schemas

Maven POM 4.0.0: can be done through ID
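The recommendations for componentDistributionMedium, accessURL and downloadURL fit together: web services should point to an accessURL, while source or executable code should point to a downloadURL. A minimal sketch of that mapping, assuming a flat Python dict per record (a hypothetical layout, not the schema's own serialisation); `missing_url_elements` is a hypothetical helper name.

```python
# Mapping from the closed vocabulary ms-omtd:componentDistributionMedium to the
# URL element the guidelines recommend for each medium.
RECOMMENDED_URL_ELEMENT = {
    "webService": "accessURL",
    "sourceCode": "downloadURL",
    "executableCode": "downloadURL",
    "sourceAndExecutableCode": "downloadURL",
}


def missing_url_elements(record: dict) -> list[str]:
    """Return the recommended URL element if it is absent for the record's medium."""
    medium = record.get("componentDistributionMedium")
    if medium not in RECOMMENDED_URL_ELEMENT:
        # The vocabulary is closed, so any other value (or none) is an error.
        raise ValueError(f"unknown componentDistributionMedium: {medium!r}")
    element = RECOMMENDED_URL_ELEMENT[medium]
    return [] if record.get(element) else [element]


print(missing_url_elements({"componentDistributionMedium": "webService"}))  # ['accessURL']
```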


contactEmail

Usage

Mandatory under conditions

Conditions for usage

An email or a landingPage must be provided

Type

email pattern

Definition/Explanations

A general email address that can be used as a contact point for a resource

Recommended usage

You can indicate a contact point where users can solicit further information in one of the following ways, in order of preference:

- give a general email address in the "contactEmail" element, or
- provide in "landingPage" the link to a web page that documents the resource (e.g. a page with documentation, examples and links to the resource itself).

You can also indicate the person(s) or group(s) responsible for communication in the "contactPerson" and "contactGroup" elements.


landingPage

Usage

Mandatory under conditions

Conditions for usage

An email or a landingPage must be provided

Type

URL pattern

Definition/Explanations

A URL used as the landing page of a resource, providing general information; for instance, it may present a description of the resource and its creators, and possibly include links to the URL from which it can be accessed

Recommended usage

You can indicate a contact point where users can solicit further information in one of the following ways, in order of preference:

- give a general email address in the "contactEmail" element, or
- provide in "landingPage" the link to a web page that documents the resource (e.g. a page with documentation, examples and links to the resource itself).

You can also indicate the person(s) or group(s) responsible for communication in the "contactPerson" and "contactGroup" elements.

Relation to other metadata schemas

Maven POM 4.0.0: url
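The rule shared by contactEmail and landingPage (at least one must be provided) can be expressed as below. This is a sketch under stated assumptions: the flat dict record and the helper name `check_contact` are hypothetical, and the email regex is a deliberately loose check, not a full RFC 5322 validator.

```python
import re

# Loose email check: local-part@domain.tld with no spaces. Not RFC 5322 complete.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def check_contact(record: dict) -> list[str]:
    """Check the 'contactEmail or landingPage' rule and the email pattern."""
    problems = []
    email = record.get("contactEmail")
    landing = record.get("landingPage")
    if not email and not landing:
        problems.append("either contactEmail or landingPage must be provided")
    if email and not EMAIL_RE.match(email):
        problems.append(f"contactEmail does not look like an email address: {email!r}")
    return problems


print(check_contact({"landingPage": "https://example.org/my-tool"}))  # []
```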


contactPerson (identifier or personName)

Usage

Recommended

Type

identifier or multilingual free text

Attributes

ms:personIdentifierSchemeName (for identifiers) or xs:lang (for names)

Definition/Explanations

Groups information on the person(s) responsible for providing further information regarding the resource

Recommended usage

The recommended way of referring to a person is by giving their identifier, preferably the ORCID; if you provide the identifier, please also select the relevant value from the list of values in the attribute "personIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the person, you may provide the name, preferably in the format "Surname, Firstname", at least in English; if you want to add names in other languages, you can use the "lang" attribute. The element can also be repeated to encode multiple persons. If you decide to add a contactPerson instead of a general contactEmail, please ensure that the data (including the email) of this person are also uploaded in OpenMinTeD.

Relation to other metadata schemas

DataCite 4.0: contributor with datacite:contributorType="ContactPerson", *datacite:contributorName (familyName & givenName) or (datacite:nameIdentifier, datacite:nameIdentifierScheme and datacite:schemeURI)
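Since the ORCID iD is the preferred person identifier, a checksum test can catch typos before submission. ORCID iDs end in a check character computed with the ISO 7064 MOD 11-2 algorithm. A minimal sketch (function names are hypothetical; it validates only the layout and the check character, not that the iD is actually registered, and it expects the bare form without an https://orcid.org/ prefix):

```python
import re

# Bare ORCID layout: four groups of four, the last character may be 'X'.
ORCID_RE = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_check_char(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(value: str) -> bool:
    """Check the 0000-0000-0000-000X layout and the final check character."""
    if not ORCID_RE.match(value):
        return False
    digits = value.replace("-", "")
    return orcid_check_char(digits[:15]) == digits[15]


print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False
```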


contactGroup (identifier or organizationName)

Usage

Recommended

Type

identifier or multilingual free text

Attributes

ms:organizationIdentifierSchemeName (for identifiers) or xs:lang (for names)

Definition/Explanations

Groups information on the group(s) responsible for providing further information regarding the resource

Recommended usage

The recommended way of referring to a group (currently modelled as an organization) is by giving its identifier (e.g. ISNI, FundRef); if you provide the identifier, please also select the relevant value from the list of values in the attribute "organizationIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the group (organization), you may provide its name, at least in English; if you want to add names in other languages, you can use the "lang" attribute. If you decide to add a contactGroup instead of another contact option, please ensure that the data (including the communication data) of this group (organization) are also uploaded in OpenMinTeD.

Relation to other metadata schemas

Maven POM 4.0.0: developers


mailingListInfo

Usage

Recommended

Type

set of metadata elements

Definition/Explanations

Set of metadata elements (name, subscribe, unsubscribe, post, archive, otherArchive) required for documenting a mailing list

Recommended usage

Mailing lists are important for tracking information useful for developers and/or users; the whole set of elements in the mailingList group can be repeated to record multiple mailing lists.

Relation to other metadata schemas

Maven POM 4.0.0: mailingList
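Because mailingListInfo is a repeatable group rather than a single value, one way to model it in code is a small dataclass per mailing list, kept in a list. The field names below mirror the sub-elements listed in the definition (with otherArchive written as `other_archive`), but the Python representation is only an illustrative assumption: which sub-elements are optional is not stated here, so every field except `name` defaults to empty, and `other_archive` is assumed to be repeatable.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MailingListInfo:
    """One mailingListInfo group; field names follow the listed sub-elements."""
    name: str
    subscribe: Optional[str] = None
    unsubscribe: Optional[str] = None
    post: Optional[str] = None
    archive: Optional[str] = None
    # otherArchive: assumed repeatable here (e.g. mirror archives).
    other_archive: list[str] = field(default_factory=list)


# The whole group is repeatable, so a component can carry several lists.
lists = [
    MailingListInfo(name="users", subscribe="users-subscribe@example.org",
                    post="users@example.org"),
    MailingListInfo(name="developers", archive="https://example.org/dev-archive"),
]
print([m.name for m in lists])  # ['users', 'developers']
```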


onlineHelpURL

Usage

Recommended

Type

URL pattern

Definition/Explanations

A URL intended for end-users, providing useful information regarding the component's usage/application, e.g. execution tips, FAQs, help forums, etc.

Relation to other metadata schemas

GATE: helpurl


issueTracker

Usage

Recommended

Type

URL pattern

Definition/Explanations

The URL where issues, bugs and feature requests should be submitted; this information is important for software developers

Relation to other metadata schemas

Maven POM 4.0.0: issueManagement/url


mustBeCitedWith

Usage

Recommended

Type

free text or identifier

Definition/Explanations

Publication to be used for citation purposes as requested by resource providers (usually a scientific article that describes the resource)

Recommended usage

The preferred way of referring to a publication is by providing its unique identifier, already assigned by an authoritative source; the preferred identifier for publications is the DOI. To specify the scheme, use the attribute "publicationIdentifierSchemeName" and select one of the predefined values (e.g. DOI, ISBN, etc.); if the scheme is not listed among them, select the "other" value and use the attribute "schemeURI" to provide a link to the URL that documents the scheme. If you don't know the publication identifier, you can provide the full bibliographic record in free-text format. N.B. The citation publication should not be confused with the attribution data, which is a legal obligation; citation through publications is a common practice in research.
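As the DOI is the preferred publication identifier, a quick syntactic check can catch obvious mistakes, such as an ISBN entered where a DOI was intended. The sketch below uses a simplified pattern ("10." plus a numeric registrant code, a slash, and a non-empty suffix); it approximates the DOI syntax rather than implementing it fully. It also strips a leading doi.org resolver or "doi:" prefix, which is often pasted along with the DOI. The function name is hypothetical.

```python
import re
from typing import Optional

# Simplified DOI syntax: "10." + registrant code (dot-separated digits) + "/" + suffix.
DOI_RE = re.compile(r"^10\.\d{4,9}(\.\d+)*/\S+$")
RESOLVER_PREFIXES = ("https://doi.org/", "http://doi.org/", "doi:")


def normalise_doi(value: str) -> Optional[str]:
    """Return the bare DOI if value looks like one, otherwise None."""
    doi = value.strip()
    for prefix in RESOLVER_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi if DOI_RE.match(doi) else None


print(normalise_doi("https://doi.org/10.1000/182"))  # 10.1000/182
```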


resourceCreator (person or organization, described with identifier or name)

Usage

Recommended

Type

identifier or multilingual free text

Attributes

for person: ms:personIdentifierSchemeName (for identifiers) or xs:lang (for names); for organization: ms:organizationIdentifierSchemeName (for identifiers) or xs:lang (for names)

Definition/Explanations

Groups information on the person(s) or organization(s) that created the resource

Recommended usage

The recommended way of referring to a person is by giving their identifier, preferably the ORCID; if you provide the identifier, please also select the relevant value from the list of values in the attribute "personIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the person, you may provide the name, preferably in the format "Surname, Firstname", at least in English; if you want to add names in other languages, you can use the "lang" attribute. The recommended way of referring to an organization is by giving its identifier (e.g. ISNI); if you provide the identifier, please also select the relevant value from the list of values in the attribute "organizationIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the organization, you may provide its name, at least in English; if you want to add names in other languages, you can use the "lang" attribute. The element can also be repeated to encode multiple persons/organizations. For corpora created through the OMTD corpus-building process, the resource creator is considered to be the person who put together the corpus through the user query.

Relation to other metadata schemas

Maven POM 4.0.0: developers
DCMI: skos:closeMatch dct:creator
DataCite 4.0: creator with creatorName, or nameIdentifier & nameIdentifierScheme & schemeURI; N.B. in v4, creatorName is complemented by familyName & givenName


mediaType inside inputContentResourceInfo or outputResourceInfo

Usage

Mandatory when applicable

Conditions for usage

If the input content resource (i.e. the resource to be mined) or the output resource (the results of the processing) is to be described, this element is obligatory

Type

Closed controlled vocabulary

Controlled vocabulary reference and/or values

ms:mediaType: text, audio, video, image

Definition/Explanations

Specifies the media type of the resource that the component processes and/or produces.

Recommended usage

OpenMinTeD only handles text resources, so only the value "text" is allowed.


resourceType inside inputContentResourceInfo or outputResourceInfo

Usage

Mandatory when applicable

Conditions for usage

If the input content resource (i.e. the resource to be mined) or the output resource (the results of the processing) is to be described, this element is obligatory

Type

controlled vocabulary

Controlled vocabulary reference and/or values

ms:resourceType: corpus, document, userInputText, lexicalConceptualResource, languageDescription

Definition/Explanations

The type of the resource that the component takes as input or produces as output

Recommended usage

Please use especially for readers and writers, in order to specify the resource type they can process or produce; e.g. for readers, whether they take as input a document (single file) or a collection of files (corpus).

Relation to other metadata schemas

GATE: parameters
UIMA/UIMA-fit: parameters, input/output types


language inside inputContentResourceInfo or outputResourceInfo

Usage

Mandatory when applicable

Conditions for usage

If the input content resource (i.e. the resource to be mined) or the output resource (the results of the processing) is to be described, this element is obligatory

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

ms:language (a combination of languageId, scriptId, regionId and variantId according to the IETF BCP 47 guidelines)

Definition/Explanations

The language(s) of the text that the component supports (takes as input and/or produces), expressed according to the IETF BCP 47 guidelines. The element can be repeated to encode multiple languages.

Recommended usage

Please enter the language and, if needed, the region, script and variant identifiers that best fit the language of the document (e.g. en-US) that the component supports (takes as input and/or produces), expressed according to the IETF BCP 47 guidelines. The element can be repeated for components that support multiple languages.

Relation to other metadata schemas

UIMA/UIMA-fit: @LanguageCapability
DataCite 4.0: language (but this is the language of the resource and not of the input/output)
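The language element expects a BCP 47 tag built from languageId, scriptId, regionId and variantId. The sketch below splits such a tag into those four parts using a simplified pattern that covers only the language[-script][-region][-variant] subset; it ignores extensions, private-use subtags and grandfathered tags, so it is an illustrative approximation, not a conforming BCP 47 parser. The function name is hypothetical, and returning variants as a list is a design choice (BCP 47 allows several variant subtags).

```python
import re

# Simplified BCP 47 subset: language[-script][-region][-variant...]
LANG_TAG_RE = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|\d{3}))?"
    r"(?P<variants>(?:-(?:[A-Za-z0-9]{5,8}|\d[A-Za-z0-9]{3}))*)$"
)


def split_language_tag(tag: str) -> dict:
    """Split a tag such as 'sr-Latn-RS' into languageId, scriptId, regionId, variantId."""
    m = LANG_TAG_RE.match(tag)
    if m is None:
        raise ValueError(f"not a supported language tag: {tag!r}")
    variants = [v for v in m.group("variants").split("-") if v]
    return {
        # Normalise letter case following BCP 47 conventions.
        "languageId": m.group("language").lower(),
        "scriptId": m.group("script").title() if m.group("script") else None,
        "regionId": m.group("region").upper() if m.group("region") else None,
        "variantId": variants,
    }


print(split_language_tag("en-US"))
```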


characterEncoding inside inputContentResourceInfo or outputResourceInfo

Usage

Mandatory when applicable

Conditions for usage

If the input content resource (i.e. the resource to be mined) or the output resource (the results of the processing) is to be described, this element is obligatory

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

ms:characterEncoding: a long list of popular character encodings

Definition/Explanations

The name of the character encoding used in the resource or supported by the component

Recommended usage

Please select one of the predefined values; note, however, that for OpenMinTeD the preferred character encoding is UTF-8, to ensure interoperability between content and components. The element can be repeated for components that support multiple character encodings.

Relation to other metadata schemas

GATE: parameters/encoding
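Since UTF-8 is the preferred encoding, a content provider might check files before registering them. A minimal sketch using only the standard library: it attempts a strict UTF-8 decode and reports the byte offset of the first invalid sequence. The function name is hypothetical. A successful decode only shows that the bytes are valid UTF-8, not that UTF-8 was the intended encoding (plain ASCII, for example, is also valid UTF-8).

```python
from typing import Optional


def utf8_error_offset(data: bytes) -> Optional[int]:
    """Return None if data is valid UTF-8, else the offset of the first bad byte."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first byte that could not be decoded.
        return exc.start
    return None


print(utf8_error_offset("Ελληνικά".encode("utf-8")))  # None
print(utf8_error_offset("café".encode("latin-1")))  # 3
```

For a file on disk, the bytes can be read with `open(path, "rb").read()` and passed to the function.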


mimeType inside inputContentResourceInfo or outputResourceInfo

Usage

Mandatory when applicable

Conditions for usage

If the input content resource (i.e. the resource to be mined) or the output resource (the results of the processing) is to be described, this element is obligatory

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

ms:mimetype (a subset of values, the most popular ones for text files, from the IANA MIME type controlled vocabulary): text/plain, application/vnd.xmi+xml, text/xml, application/x-tmx+xml, application/x-xces+xml, application/tei+xml, application/rdf+xml, application/xhtml+xml, application/emma+xml, application/pls+xml, application/postscript, application/voicexml+xml, text/sgml, text/html, application/x-tex, application/rtf, application/json+ld, application/x-latex, text/csv, text/tab-separated-values, application/pdf, application/x-msaccess, audio/mp4, audio/mpeg, audio/wav, image/bmp, image/gif, image/jpeg, image/png, image/svg+xml, image/tiff, video/jpeg, video/mp4, video/mpeg, video/x-flv, video/x-msvideo, video/x-ms-wmv, application/msword, application/vnd.ms-excel, audio/mpeg3, text/turtle, other, audio/PCMA, audio/flac, audio/speex, audio/vorbis, video/mp2t

Definition/Explanations

The MIME type of the resource (a formalized specifier for the format) or a MIME type that the component supports, in conformance with the values of the IANA (Internet Assigned Numbers Authority)

Recommended usage

Please select one of the predefined values (which are the most popular ones for text files) or add a value, preferably from the IANA recommended media (MIME) type values (http://www.iana.org/assignments/media-types/media-types.xhtml). The element can be repeated for components that support multiple MIME types.

Relation to other metadata schemas

UIMA/UIMA-fit: @MimeTypeCapability
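When filling in mimeType for a set of input files, Python's standard `mimetypes` module can suggest a value from the file extension. This is a hedged sketch: guesses rely on the extension alone, so they should still be checked against the list above. The `RECOMMENDED` set copies only a few of the listed values for brevity, and the helper name is hypothetical. A fresh `mimetypes.MimeTypes()` instance is built from Python's built-in table rather than the platform's MIME configuration files, which keeps the suggestions predictable across machines.

```python
import mimetypes
from typing import Optional

# A few values taken from the recommended ms:mimetype list (not the full list).
RECOMMENDED = {
    "text/plain", "text/xml", "text/html", "text/csv",
    "text/tab-separated-values", "application/pdf", "application/xhtml+xml",
}

# Fresh instance: uses Python's built-in extension table only.
_GUESSER = mimetypes.MimeTypes()


def suggest_mime_type(filename: str) -> Optional[str]:
    """Suggest a recommended mimeType value from a file name, or None."""
    guessed, _encoding = _GUESSER.guess_type(filename)
    return guessed if guessed in RECOMMENDED else None


print(suggest_mime_type("abstracts.txt"))  # text/plain
```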


dataFormatSpecific inside inputContentResourceInfo or outputResourceInfo

Usage

Mandatory when applicable

Conditions for usage

If the input content resource (i.e. the resource to be mined) or the output resource (the results of the processing) is to be described, this element is obligatory

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

aclAnthologyaimedCorpusalvisEnrichedDocumentbioNLPbioNLP;format-variant=ST2013a1_a2bnccadixeJSONconll2000conll2002conll2006conll2007conll2009conll2012dataSiftfactoredTagLemgategeniagrafhtml5Microdatai2b2imsCwbjdbckeaCorpuslllnegraExportpmlptb;format-variant=chunkedptb;format-variant=combinedrelptigertupp-dztwitteruimaBinaryCasuimaCASDumpweb1txces;format-variant=ilsp:

Definition/Explanations

The supplementary level of data format

Recommended usage

Please use this element to further specify the format of the resource supported by the component (as input or output). For interoperability reasons, it is important to standardise this element as far as possible; this is why a list of values is provided, including the formats currently supported by components in the OMTD registry. Where possible, it is also recommended to use the "documentationURL" element with information and examples about the specific data format.

Relation to other metadata schemas

UIMA/UIMA-fit: @MimeTypeCapability
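Some dataFormatSpecific values carry a parameter after a semicolon, e.g. ptb;format-variant=chunked or xces;format-variant=ilsp. Assuming that convention (a base format name followed by optional ;key=value pairs, as in the listed values), a minimal sketch for splitting such a value could look like this; the function name is hypothetical.

```python
def parse_data_format(value: str) -> tuple[str, dict[str, str]]:
    """Split 'base;key=value;...' into the base format name and its parameters."""
    base, *params = value.split(";")
    options = {}
    for param in params:
        key, sep, val = param.partition("=")
        if not sep:
            # Every parameter is expected to be a key=value pair.
            raise ValueError(f"parameter without '=': {param!r}")
        options[key.strip()] = val.strip()
    return base.strip(), options


print(parse_data_format("ptb;format-variant=chunked"))
# ('ptb', {'format-variant': 'chunked'})
print(parse_data_format("conll2009"))  # ('conll2009', {})
```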


typesystem inside inputContentResourceInfo or outputResourceInfo

Usage

Mandatory when applicable

Conditions for usage

When the software component takes as input (or provides as output) a resource that uses a specific type system

Type

identifier or multilingual free text

Attributes

ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)

Definition/Explanations

A name or an identifier (e.g. URL reference) of the type system used in the annotation of the resource or used by the component

Recommended usage

The recommended way of referring to a resource is by giving its identifier; if you provide the identifier, please also select the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide its name, at least in English; if you want to add names in other languages, you can use the "lang" attribute. For interoperability reasons, it is recommended to describe type systems, tagsets, annotation resources, etc. in the OpenMinTeD registry and refer to them through their identifier.
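The identifier-or-name pattern described here (and reused by tagset, annotationResource and relatedResource1) follows one rule set: prefer an identifier together with its scheme, require schemeURI when the scheme is "other", and otherwise fall back to a name, at least in English. A sketch of those rules as a small Python dataclass; the class, its field names and the example URL are hypothetical stand-ins for the attributes ms:resourceIdentifierSchemeName, ms:schemeURI and xs:lang.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResourceRef:
    """Reference to a type system, tagset or other resource (illustrative model)."""
    identifier: Optional[str] = None
    scheme_name: Optional[str] = None  # stands in for resourceIdentifierSchemeName
    scheme_uri: Optional[str] = None   # stands in for schemeURI
    names: Optional[dict[str, str]] = None  # xs:lang code -> name

    def problems(self) -> list[str]:
        """Return the rule violations for this reference (empty list if none)."""
        issues = []
        if self.identifier:
            if not self.scheme_name:
                issues.append("identifier given without a scheme name")
            elif self.scheme_name == "other" and not self.scheme_uri:
                issues.append('scheme "other" requires schemeURI')
        elif not self.names or "en" not in self.names:
            issues.append("without an identifier, give a name at least in English")
        return issues


ref = ResourceRef(identifier="https://example.org/typesystems/my-ts", scheme_name="other")
print(ref.problems())  # ['scheme "other" requires schemeURI']
```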


tagset inside inputContentResourceInfo or outputResourceInfo

Usage

Mandatory when applicable

Conditions for usage

If the input content resource (i.e. the resource to be mined) or the output resource (the results of the processing) is to be described, this element is obligatory

Type

identifier or multilingual free text

Attributes

ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)

Definition/Explanations

A name or an identifier (e.g. URL reference) of the tagset used in the annotation of the resource or used by the component

Recommended usage


annotationLevel inside inputContentResourceInfo or outputResourceInfo

Usage

Mandatory when applicable

Conditions for usage

If the input content resource (i.e. the resource to be mined) or the output resource (the results of the processing) is to be described, this element is obligatory

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

ms:annotationLevel: alignment, discourseAnnotation, discourseAnnotation-argumentation, discourseAnnotation-audienceReactions, discourseAnnotation-coreference, discourseAnnotation-dialogueActs, discourseAnnotation-discourseRelations, lemmatization, morphosyntacticAnnotation-bPosTagging, morphosyntacticAnnotation-posTagging, segmentation, semanticAnnotation, semanticAnnotation-certaintyLevel, semanticAnnotation-emotions, semanticAnnotation-events, semanticAnnotation-namedEntities, semanticAnnotation-polarity, semanticAnnotation-questionTopicalTarget, semanticAnnotation-readabilty, semanticAnnotation-semanticClasses, semanticAnnotation-semanticRelations, semanticAnnotation-semanticRoles, semanticAnnotation-speechActs, semanticAnnotation-subjectivity, semanticAnnotation-temporalExpressions, semanticAnnotation-textualEntailment, semanticAnnotation-wordSenses, syntacticAnnotation-semanticFrames, speechAnnotation, speechAnnotation-orthographicTranscription, speechAnnotation-paralanguageAnnotation, speechAnnotation-phoneticTranscription, speechAnnotation-prosodicAnnotation, speechAnnotation-soundEvents, speechAnnotation-soundToTextAlignment, speechAnnotation-speakerIdentification, speechAnnotation-speakerTurns, stemming, structuralAnnotation, structuralAnnotation-documentDivisions, structuralAnnotation-sentences, structuralAnnotation-clauses, structuralAnnotation-phrases, structuralAnnotation-words, syntacticAnnotation-subcategorizationFrames, syntacticAnnotation-dependencyTrees, syntacticAnnotation-constituencyTrees, syntacticAnnotation-chunks, syntacticosemanticAnnotation-links, translation, transliteration, modalityAnnotation-bodyMovements, modalityAnnotation-facialExpressions, modalityAnnotation-gazeEyeMovements, modalityAnnotation-handArmGestures, modalityAnnotation-handManipulationOfObjects, modalityAnnotation-headMovements, modalityAnnotation-lipMovements, other

Definition/Explanations

The annotation level of the annotated resource, or the annotation level that a software component consumes or produces as output

Relation to other metadata schemas

UIMA/UIMA-fit: @TypeCapability


typesystem inside componentDependencies

Usage

Mandatory when applicable

Conditions for usage

When the software component uses a specific type system for its operation

Type

identifier or multilingual free text

Attributes

ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)

Definition/Explanations

A name or an identifier (e.g. URL reference) of the type system used in the annotation of the resource or used by the component

Recommended usage

The recommended way of referring to a resource is by giving its identifier; if you provide the identifier, please also select the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide its name, at least in English; if you want to add names in other languages, you can use the "lang" attribute. For interoperability reasons, it is recommended to describe type systems, tagsets, annotation resources, etc. in the OpenMinTeD registry and refer to them through their identifier.


tagset inside componentDependencies

Usage

Mandatory when applicable

Conditions for usage

When the software component uses a specific tagset for its operation

Type

identifier or multilingual free text

Attributes

ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)

Definition/Explanations

A name or an identifier (e.g. URL reference) of the tagset used in the annotation of the resource or used by the component

Recommended usage

The recommended way of referring to a resource is by giving its identifier; if you provide the identifier, please also select the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide its name, at least in English; if you want to add names in other languages, you can use the "lang" attribute. For interoperability reasons, it is recommended to describe type systems, tagsets, annotation resources, etc. in the OpenMinTeD registry and refer to them through their identifier.


annotationResource inside componentDependencies

Usage

Mandatory when applicable

Conditions for usage

When the software component uses a specific annotation resource for its operation

Type

identifier or multilingual free text

Attributes

ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)

Definition/Explanations

A resource (e.g. ontology, terminological resource) used for annotating a document, corpus, sentence, etc.

Recommended usage

The recommended way of referring to a resource is by giving its identifier; if you provide the identifier, please also select the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide its name, at least in English; if you want to add names in other languages, you can use the "lang" attribute. For interoperability reasons, it is recommended to describe type systems, tagsets, annotation resources, etc. in the OpenMinTeD registry and refer to them through their identifier.


framework

Usage

Recommended

Controlled vocabulary reference and/or values

UIMA, GATE, AlvisNLP, other

Definition/Explanations

The framework used for developing and deploying the component


relationType

Usage

Recommended

Type

Open controlled vocabulary

Controlled vocabulary reference and/or values

ms:relationType: isPartOf, isPartWith, hasPart, hasOutcome, isCombinedWith, requiresLR, requiresSoftware, isexactMatch, isSimilarTo, isContinuationOf, isVersionOf, replaces, isReplacedWith, isCreatedBy, isElicitedBy, isRecordedBy, isEditedBy, isAnalysedBy, isEvaluatedBy, isQueriedBy, isAccessedBy, isArchivedBy, isDisplayedBy, isCompatibleWith

Definition/Explanations

Specifies the type of relation holding between two entities (e.g. two resources that together comprise one new resource, a corpus and the s/w component that has been used for its creation, or a corpus and the publication that describes it).

Recommended usage

For components, the recommended relation is isCompatibleWith, holding between the component and the models it can be used with, but any relationType value can be used as appropriate.

Relation to other metadata schemas

DataCite 4.0: skos:closeMatch datacite:relationType
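Since relationType is an open vocabulary, a checker should flag unlisted values rather than reject them. The following hypothetical helper copies the listed values verbatim (including isexactMatch as written above), and also reports when a component description uses a relation other than the recommended isCompatibleWith. It returns advisory notes only; "isCitedBy" in the tests is simply an example of an unlisted value.

```python
# Values listed for ms:relationType, copied verbatim (including "isexactMatch").
RELATION_TYPES = frozenset({
    "isPartOf", "isPartWith", "hasPart", "hasOutcome", "isCombinedWith",
    "requiresLR", "requiresSoftware", "isexactMatch", "isSimilarTo",
    "isContinuationOf", "isVersionOf", "replaces", "isReplacedWith",
    "isCreatedBy", "isElicitedBy", "isRecordedBy", "isEditedBy",
    "isAnalysedBy", "isEvaluatedBy", "isQueriedBy", "isAccessedBy",
    "isArchivedBy", "isDisplayedBy", "isCompatibleWith",
})


def check_relation_type(value, describes_component=False):
    """Return advisory notes for a relationType value; never rejects it.

    The vocabulary is open, so an unlisted value is reported, not refused.
    """
    notes = []
    if value not in RELATION_TYPES:
        notes.append(f"'{value}' is not a listed value (allowed: the vocabulary is open)")
    if describes_component and value != "isCompatibleWith":
        notes.append("for components, isCompatibleWith is the recommended relation")
    return notes
```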


relatedResource1

Usage

Mandatory when applicable

Conditions for usage

when relationType is filled in

Type

ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)

Definition/Explanations

A name or an identifier (e.g. URL reference) of the source resource, which is related to the target resource (relatedResource2) through the relation described in relationType

Recommended usage

The recommended way of referring to a resource is by giving its identifier; if you provide the identifier, please also select the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide its name, at least in English; if you want to add names in other languages, you can use the "lang" attribute. For interoperability reasons, it is recommended to describe all related resources in the OpenMinTeD registry and refer to them through their identifier.


relatedResource2

Usage

Mandatory when applicable

Conditions for usage

when relationType is filled in

Type

ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)

Definition/Explanations

A name or an identifier (e.g. URL reference) of the target resource, which is related to the source resource (relatedResource1) through the relation described in relationType

Recommended usage

The recommended way of referring to a resource is by giving its identifier; if you provide the identifier, please also select the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide its name, at least in English; if you want to add names in other languages, you can use the "lang" attribute. For interoperability reasons, it is recommended to describe all related resources in the OpenMinTeD registry and refer to them through their identifier.
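relationType, relatedResource1 and relatedResource2 together express one directed relation: source, relation, target. The dataclass below is a hypothetical sketch of the "mandatory when applicable" condition shared by the two related-resource elements: once relationType is filled in, both become required. Each related resource is reduced to a plain string (identifier or name) for brevity, and the example identifier is made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Relation:
    """relatedResource1 (source) --relationType--> relatedResource2 (target).

    Related resources are plain strings (identifier or name) in this sketch.
    """
    relation_type: Optional[str] = None
    related_resource1: Optional[str] = None
    related_resource2: Optional[str] = None

    def missing_elements(self) -> List[str]:
        """List elements that become mandatory once relationType is filled in."""
        if not self.relation_type:
            return []  # no relation stated, so nothing is required
        missing = []
        if not self.related_resource1:
            missing.append("relatedResource1")
        if not self.related_resource2:
            missing.append("relatedResource2")
        return missing


if __name__ == "__main__":
    # A corpus (source) created by a s/w component (target); the identifier is made up.
    rel = Relation("isCreatedBy", "corpus:example-123")
    print(rel.missing_elements())  # ['relatedResource2']
```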


The OMTD-SHARE metadata schema

The OMTD-SHARE metadata schema¹ is the recommended schema for the description of the resources. It has been conceived and designed to serve as a facilitator, providing the interoperability bridge between the various resource types involved in TDM processes, and as an intermediary with the target audience, including TDM developers and end users.

Its design takes into consideration the fact that both resources and users come from different scientific communities, and it tries to achieve interoperability through a common core vocabulary for the description of resources and their properties, establishing links to the vocabularies already used by the various sources for this purpose. Standards and best practices of the source communities are taken on board to the greatest extent possible. The main principles and strategies employed in the design of the OMTD-SHARE schema are the following:

- cover the needs of resource discoverability and TDM processing
- cover the documentation needs of all resource types involved in TDM
- be flexible enough to support varying degrees of documentation completeness
- organize the schema elements and accommodate common vs. particular features of resources
- reuse what is available vs. create and recommend new elements and values
- standardize/normalize user input vs. allow for free user input
- document the processing procedure and outputs.


It has largely been based on the META-SHARE metadata schema [Gavrilidou et al. 2012], which caters for the description of language resources, encompassing both data (textual, multimodal/multimedia and lexical data, grammars, language models, etc.) and technologies (tools/services) used for their processing. OMTD-SHARE is more restricted, in the sense that it focuses on text resources only, while it also extends the basic schema in order to include TDM-specific concepts and to describe processing procedures and workflows in an enhanced way.

As in META-SHARE, the schema sets out to document the full lifecycle of a resource, which also includes at least a minimal documentation of the satellite entities that participate in it, especially the relations that hold between them. The OMTD-SHARE data model thus comprises the following entities:

- the resources, further classified into:
  - corpora, i.e. datasets of text documents, mainly scholarly publications in OMTD-SHARE
  - lexical/conceptual resources, including lexica, ontologies, term lists, gazetteers, etc., but also tagsets and annotation schemas, which are used for annotating corpora
  - language descriptions, which mainly refer to computational grammars
  - machine learning and statistical models²
  - software components, i.e. pieces of software, tools offered as locally executable code or as web services, wrapped in a workflow or as standalone end-to-end applications, and, finally,
  - publications, which constitute a peculiar resource type, as they are viewed in OpenMinTeD only in a collective form, as a "corpus",
- but also satellite entities, such as the actors, be they persons or organizations, that have created the resources, or the projects that have funded them or in which they are used.

Obviously, lexical/conceptual resources, language descriptions and models are ancillary resources used for the TDM operation. Corpora are an in-between case: they may be corpora used for the TDM operation, such as training or evaluation corpora, and thus play a supportive role, or they can be composed of scholarly publications, in which case they are approached as a proper content resource to be mined.
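The resource types listed above, and the roles the preceding paragraph assigns to them, can be captured in a small classification. This is a sketch under stated assumptions: the enum names are hypothetical, the "supportive role" of training or evaluation corpora is folded into "ancillary" for simplicity, and the function raises an error for software components and publications, whose role the paragraph does not classify.

```python
from enum import Enum


class ResourceType(Enum):
    CORPUS = "corpus"
    LEXICAL_CONCEPTUAL = "lexical/conceptual resource"
    LANGUAGE_DESCRIPTION = "language description"
    MODEL = "model"
    SOFTWARE_COMPONENT = "software component"
    PUBLICATION = "publication"


# Types the text calls ancillary resources for the TDM operation.
ANCILLARY = {ResourceType.LEXICAL_CONCEPTUAL, ResourceType.LANGUAGE_DESCRIPTION, ResourceType.MODEL}


def tdm_role(rtype, composed_of_publications=False):
    """Return 'ancillary' or 'content' following the paragraph above."""
    if rtype in ANCILLARY:
        return "ancillary"
    if rtype is ResourceType.CORPUS:
        # In-between case: training/evaluation corpora support the operation,
        # corpora of scholarly publications are the content to be mined.
        return "content" if composed_of_publications else "ancillary"
    raise ValueError(f"role of {rtype.value!r} is not classified here")
```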

The schema is composed of metadata elements that are used to describe properties of, and relations between, all these entities. Some of these elements, especially those that pertain to administrative features (e.g. identification, contact, licensing information, etc.), are common to all types of resources, while other elements, mainly those representing technical features about the contents and format of resources, differ across types. As mentioned above, publications differ from other resource types: the metadata elements recommended for their description mainly derive from the need to serve as selection criteria in the corpus building process.


One of the characteristic features of the SHARE family of schemas³ is the adoption of the component-based mechanism (Component MetaData Infrastructure, CMDI), according to which semantically coherent elements are grouped together to form components⁴ [Broeder et al., 2008]. For instance, the licensing module includes elements such as the name and URL of a licence, attribution text, copyright holders, etc. For the sake of simplicity, the container elements used for this grouping will not be presented in the guidelines unless required.
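To show what omitting container elements means in practice, the sketch below models a record as nested dictionaries in which container levels (such as a licensing module) hold leaf elements, and flattens it into just the leaf elements that the guidelines present. The container and element names (distributionInfo, licenceInfo, licence, attributionText, copyrightHolder) and their values are hypothetical stand-ins, not the OMTD-SHARE element names.

```python
def leaf_elements(node):
    """Yield (element, value) pairs from nested modules, skipping container levels."""
    for key, value in node.items():
        if isinstance(value, dict):
            yield from leaf_elements(value)  # container: descend without emitting it
        else:
            yield key, value


# Hypothetical record: the container and element names are placeholders.
record = {
    "distributionInfo": {
        "licenceInfo": {
            "licence": "CC-BY-4.0",
            "attributionText": "Example corpus by Example Org",
            "copyrightHolder": ["Example Org"],
        }
    }
}

if __name__ == "__main__":
    for element, value in leaf_elements(record):
        print(element, value)
```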

The OMTD-SHARE schema classifies elements into three levels of optionality:

- mandatory: elements that are necessary for the intended purposes, i.e. for discovering resources and for triggering operations between content and s/w components
- recommended: elements that can help the current or future use of the resource, or useful information that providers have not yet standardized
- optional: all remaining information related to the lifecycle of a resource.
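A provider-side check could treat the three levels differently: missing mandatory elements as errors, missing recommended ones as warnings, and optional ones not at all. In this sketch, framework and relationType are Recommended as stated in this document; resourceName (mandatory) and keyword (optional) are hypothetical entries added only to exercise the other two levels, and conditional "mandatory when applicable" elements are not modelled.

```python
from enum import Enum


class Optionality(Enum):
    MANDATORY = "mandatory"
    RECOMMENDED = "recommended"
    OPTIONAL = "optional"


# framework and relationType are Recommended in this document;
# resourceName and keyword are hypothetical entries for the other two levels.
LEVELS = {
    "resourceName": Optionality.MANDATORY,
    "framework": Optionality.RECOMMENDED,
    "relationType": Optionality.RECOMMENDED,
    "keyword": Optionality.OPTIONAL,
}


def check_record(record):
    """Return (errors, warnings): missing mandatory and missing recommended elements."""
    errors = [name for name, level in LEVELS.items()
              if level is Optionality.MANDATORY and name not in record]
    warnings = [name for name, level in LEVELS.items()
                if level is Optionality.RECOMMENDED and name not in record]
    # Optional elements are never reported.
    return errors, warnings
```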

The schema is currently implemented as an XSD⁵. An important difference from META-SHARE lies in the organisation vis-à-vis the different resource types covered: while META-SHARE describes all resource types in one common XSD, in OMTD-SHARE the resource types are described in a more modular way, as separate sets of XSDs.

Work is ongoing to also produce an RDF/OWL version, which will be documented in the next release of the guidelines.

1. The full OMTD-SHARE schema is documented at: https://openminted.github.io/releases/omtd-share/

2. Models could be considered as a subtype of language descriptions, but we decided to keep them distinct because they have many properties that differentiate them from grammars; at this point, it was also considered better to keep them apart, as this would enhance their discoverability.

3. Based on the META-SHARE schema, four more adaptations are now available: ELRC-SHARE, clarin:el, and OMTD-SHARE. The META-SHARE schema has also been implemented as an RDF/OWL ontology in collaboration with the ld4lt W3C group.

4. To avoid confusion with the term "component", also used for software components, we will from now on refer to this concept as "modules".

5. The current version of the XSDs is available at: https://github.com/openminted/omtd-share_metadata_schema and the documentation of v1.0.0 at: https://openminted.github.io/releases/omtd-share/1.0.0/


Glossary

annotation (text/corpus annotation): A note by way of explanation or comment added to a text or diagram [Oxford English Dictionary, https://en.oxforddictionaries.com/definition/annotation]. In OpenMinTeD, the term refers mainly to text or corpus annotation, which is the practice of adding interpretative linguistic information grounded in a knowledge resource to a text or corpus, respectively. For example, one common type of annotation is the addition of tags, or labels, indicating the word class to which lexical units in a text belong; these tags come from a predefined set (e.g. Noun, Verb, Preposition, etc.). Semantic labeling with terms and concepts from an ontology is another common example of annotation. Relationships, such as syntactic dependencies or semantic relations, that link entities of the text are also annotations.

annotation resource: Any resource that can be used for annotating a text, including part-of-speech tagsets, annotation schemes, domain-specific ontologies, etc.

annotation scheme: A set of elements and values designed to annotate data. An annotation scheme usually aims to represent a specific level of information, such as morphological features of words, syntactic dependency relations between phrases, discourse-level information, etc. It can consist of a flat structure of elements and values (e.g. part-of-speech tags) or it can be more complex, with interrelated elements (e.g. specific morphological features to be used for each part of speech).

application: Any software program (or group of programs seen as a whole) intended for the end user and addressing one or multiple related user needs.

component (software component): An algorithm wrapped in a standard way so that it can be integrated as a reusable tool or service within a particular component-oriented framework, such as UIMA, GATE, etc.

corpus: A structured collection of pieces of data (textual, audio, video, multimodal/multimedia, etc.), typically of considerable size and selected according to criteria external to the data (e.g. size, type of language, type of producers or expected audience, etc.) to represent the object of study as comprehensively as possible.

data model: A data model is an abstract model that organizes elements of data and standardizes how they relate to one another and to properties of real-world entities. [Wikipedia, https://en.wikipedia.org/wiki/Data_model]

distribution: Any form in which a resource can be shared; it can be a downloadable PDF or a plain text file, a form of a corpus accessible only through a web interface, the source code of a piece of software, etc.

document: A piece of written, printed, or electronic matter that is primarily intended for reading.

interoperability: Interoperability describes the extent to which systems and devices can work together, exchange data, and interpret that shared data. For two systems to be interoperable, they must be able to exchange data and subsequently present that data such that it can be understood by a user. [Research Data Alliance, http://smw-rda.esc.rzg.mpg.de/index.php/Interoperability]

licence: A permission, or written evidence of a permission, that confers on the licensee the right to do something that would otherwise be prevented by law.

licence compatibility/interoperability: The condition or state in which two or more licences can co-exist or be combined without conflicting with each other. In OpenMinTeD, licence compatibility and licence interoperability are used as synonyms.

knowledge resource: A resource (data and/or tool) containing, producing or representing knowledge; knowledge is specific information that is relevant for the linguistic and conceptual interpretation of data. For OpenMinTeD purposes, this information is exploited or produced by TDM modules and tools, or exchanged between them.

language description: The resource describes a language or some aspect(s) of a language via a systematic documentation of linguistic structures. [Open Language Archives Community, http://www.language-archives.org/REC/type.html#language_description] Examples include sketch grammars, computational grammars, etc.

language resource: Language Resources (LRs) encompass (a) datasets (textual, multimodal/multimedia and lexical data, grammars, language models, etc.) in machine-readable form, used to assist and augment language processing applications, but also, in a broader sense, language and language-mediated research studies and applications, and (b) tools/technologies/services used for their processing.

lexical/conceptual resource: A resource organised on the basis of lexical or conceptual entries (lexical items, terms, concepts, etc.) with their supplementary information (e.g. grammatical, semantic, statistical information, etc.). In OpenMinTeD, they can be used for annotation purposes.


machine learning (ML) model: The process of training an ML model involves providing an ML algorithm (that is, the learning algorithm) with training data to learn from. The term ML model refers to the model artifact that is created by the training process. [http://docs.aws.amazon.com/machine-learning/latest/dg/training-ml-models.html]

metadata: Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. Metadata is often called data about data or information about information. [National Information Standards Organization, Understanding Metadata, http://www.niso.org/publications/press/UnderstandingMetadata.pdf]

open access (OA): The free and online availability of literature, which allows users to read, download, copy, distribute, print, search, or link to the full texts, crawl articles for indexing, pass them as data to software, or use them for any other useful purpose. This availability is granted without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself, and those related to giving authors control over the integrity of their work and the right to be properly acknowledged and cited [Budapest OA Initiative 2002; Bethesda Statement on OA Publishing 2003; Berlin Declaration on OA to Knowledge in the Sciences and Humanities 2003].

OpenMinTeD infrastructure: An infrastructure refers to the basic structures and facilities required for the operation of a system. The OpenMinTeD infrastructure consists of different layers of resources: content resources that can be mined, ancillary knowledge resources, tools and web services. Any resource that can be registered in the OpenMinTeD registry is part of the underlying infrastructure.

OpenMinTeD platform: The OpenMinTeD platform brings together all the services that facilitate the interoperability aspects of the underlying infrastructure (e.g. registration, search and browsing, creation of workflows, processing, annotation, etc.) and thus becomes an infrastructural service of the wider research ecosystem.

publication: A book, article, etc., that has been made available to the public either via a formal publication service or over the internet and is stored in an archive or repository. For OpenMinTeD purposes, this mainly covers scholarly publications.

resource: Something that you can use to help you achieve something, especially in your work or study. [Macmillan Dictionary, http://www.macmillandictionary.com/dictionary/british/resource_1]

rights statement: A formal or official statement asserting the copyright status and/or the licensing conditions of a given resource. It can be issued by an authoritative body (e.g. http://rightsstatements.org/). For OpenMinTeD purposes, it can be deemed similar to a "licence category", grouping licences that share similar features.

Text and Data Mining: Text and Data Mining (TDM) was initially defined as "the discovery by computer of new, previously unknown information, by automatically extracting and relating information from different (…) resources, to reveal otherwise hidden meanings" (Hearst, 1999), in other words, "an exploratory data analysis that leads to the discovery of heretofore unknown information, or to answers for questions for which the answer is not currently known" (Hearst, 1999). [FutureTDM, http://www.futuretdm.eu/news/tdm-definition/]

service/web service: A piece of software accessible through remote invocation, typically using REST-style APIs or SOAP protocols.


tool: A piece of (standalone) software, typically serving a very limited technical purpose, such as a particular implementation of a part-of-speech tagger (e.g. TreeTagger), a tree parsing program (e.g. mstparser), etc. Preferred terms in OpenMinTeD include "component" and "workflow".

workflow: A series of software components assembled together in order to perform a specific task.
