Post on 05-Jan-2016
www.sti-innsbruck.at © Copyright 2013 STI INNSBRUCK www.sti-innsbruck.at
“How to put an annotation in HTML?”
Ioannis Stavrakantonakis
Outline
• Research question
• ITS 2.0
• NIF
• What about Microdata?
• Demo
• References
Research question
We want to annotate Springfield with a URI to make sure that the computer understands we mean the Springfield in Massachusetts.
HTML:
<p>It is well known, that Springfield has mild summers and short, but hard winters.</p>
HTML with annotation (something like that):
<p>It is well known, that
<span about="http://sws.geonames.org/4951788/">Springfield</span>
has mild summers and short, but hard winters.</p>
We don't want to add whole triples, but just annotate the HTML and say "this element refers to the following URI".
From: Denny Vrandečić
Sent: Wednesday, April 24, 2013 1:59 PM
To: semantic-web at W3C
Subject: How to put an annotation in HTML? [1]
ITS 2.0
• International Tag Set (ITS) [2]
– enhances the foundation for integrating automated processing of human language into core Web technologies;
– focuses on HTML and XML-based formats in general, and can leverage processing based on the XML Localization Interchange File Format (XLIFF) as well as the Natural Language Processing Interchange Format (NIF);
– is a technology for adding metadata to Web content for the benefit of localization, language technologies, and internationalization (see [5] for localization (l10n) versus internationalization (i18n)).
ITS 2.0
• Potential Users of ITS [2]:
– Schema developers starting a schema from the ground up (the specification offers proposals for attribute and element names to include in the new schema)
– Schema developers working with an existing schema (they should check whether the schema supports the markup proposed in the specification and, where appropriate, add that markup)
– Vendors of content-related tools (e.g. tools for authoring, translation, etc.)
– Content producers (who may use ITS to mark up specific bits of content)
– Machine Translation Systems
– Text Analytics (automatically generated metadata for improving localization, data integration or knowledge management workflows)
– Localization Workflow Managers
ITS 2.0
The Text Analysis use case:
• This data category is used to annotate content with lexical or conceptual information for the purpose of contextual disambiguation.
• It carries three pieces of annotation:
– Confidence: The confidence of the agent (that produced the annotation) in its own computation – XSD double data type (e.g. 0.63)
– Entity type: The type of entity, or concept class of the text analysis target – IRI (e.g. http://nerd.eurecom.fr/ontology#Location [8])
– Entity identifier: A unique identifier for the text analysis target – IRI or String (e.g. http://dbpedia.org/page/Innsbruck or the identifier for “Capital” from WordNet [9])
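The three pieces of annotation can be sketched as a small Python record that renders itself as HTML attributes. This is a minimal illustration, not part of the original slides: the class and function names are hypothetical, and the attribute names (its-ta-confidence, its-ta-class-ref, its-ta-ident-ref) follow the HTML serialization as described in [2]. Per [2], a confidence value is also expected to be accompanied by an annotator reference, which this sketch leaves out.

```python
from dataclasses import dataclass
from html import escape
from typing import Optional


@dataclass
class TextAnalysisAnnotation:
    """Hypothetical container for the three pieces of annotation."""
    ident_ref: Optional[str] = None      # entity identifier, given as an IRI
    class_ref: Optional[str] = None      # entity type, given as an IRI
    confidence: Optional[float] = None   # agent's confidence, between 0 and 1

    def to_html_attributes(self) -> str:
        """Render only the fields that are set, as ITS 2.0 HTML attributes."""
        if self.confidence is not None and not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie between 0 and 1")
        pairs = [
            ("its-ta-confidence", self.confidence),
            ("its-ta-class-ref", self.class_ref),
            ("its-ta-ident-ref", self.ident_ref),
        ]
        return " ".join(
            f'{name}="{escape(str(value), quote=True)}"'
            for name, value in pairs
            if value is not None
        )


def annotate(text: str, annotation: TextAnalysisAnnotation) -> str:
    """Wrap a piece of text in a span carrying the annotation."""
    return f"<span {annotation.to_html_attributes()}>{escape(text)}</span>"
```

Calling annotate("Innsbruck", ...) with the example values from this slide yields a span that carries all three attributes.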
ITS 2.0
Rendered HTML (screenshot not preserved): a heading reading "Welcome to Innsbruck in Austria!"
HTML with ITS metadata:
<html xmlns="http://www.w3.org/1999/xhtml"><body>
<h2 translate="yes">Welcome to <span its-ta-ident-ref="http://dbpedia.org/page/Innsbruck" its-within-text="yes" translate="no">Innsbruck</span> in <b translate="no" its-within-text="yes">Austria</b>!</h2>
</body></html>
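To show how a consumer might read such markup, here is a minimal Python sketch that collects every element carrying its-ta-ident-ref together with its text. It uses only the standard library html.parser module; the class and function names are hypothetical, and the handling of unclosed elements is a simplification rather than full HTML parsing.

```python
from html.parser import HTMLParser


class ItsIdentRefExtractor(HTMLParser):
    """Collects (text, IRI) pairs for elements carrying its-ta-ident-ref."""

    def __init__(self):
        super().__init__()
        self.annotations = []  # finished (text, iri) pairs
        self._open = []        # stack of [tag, iri or None, text parts]

    def handle_starttag(self, tag, attrs):
        iri = dict(attrs).get("its-ta-ident-ref")
        self._open.append([tag, iri, []])

    def handle_data(self, data):
        # Text belongs to every annotated element that is currently open.
        for entry in self._open:
            if entry[1] is not None:
                entry[2].append(data)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; elements above it
        # on the stack were never closed explicitly and are closed with it.
        for i in range(len(self._open) - 1, -1, -1):
            if self._open[i][0] == tag:
                for _, iri, parts in reversed(self._open[i:]):
                    if iri is not None:
                        self.annotations.append(("".join(parts), iri))
                del self._open[i:]
                return


def extract_ident_refs(html: str):
    """Return the list of (text, IRI) pairs found in an HTML string."""
    parser = ItsIdentRefExtractor()
    parser.feed(html)
    parser.close()
    return parser.annotations
```

Applied to the markup above, the function returns the single pair for "Innsbruck"; the "Austria" element carries no identifier and is skipped.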
ITS 2.0
• Conversion to NIF [2]:
– XML or HTML documents that contain ITS metadata can be converted to an RDF representation based on NIF.
– The conversion algorithm that generates NIF consists of seven steps. Its output uses the ITS RDF ontology [7].
– The conversion to NIF is a possible basis for a natural language processing (NLP) application that creates, for example, named entity annotations.
– The reverse direction, integrating the RDF annotations back into the original input document, is described in [6] (NIF2ITS).
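The flavour of the RDF output can be sketched in a few lines of Python. This is not the seven-step algorithm from [2]; it is a simplified illustration that takes already extracted plain text plus (surface text, IRI) pairs and writes Turtle by hand. The function names and the document IRI are hypothetical, and the namespaces, class names and property names are taken from the NIF Core ontology [3] and the ITS RDF ontology [7] as I understand them, so they should be checked against those documents.

```python
from typing import Sequence, Tuple

NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
ITSRDF = "http://www.w3.org/2005/11/its/rdf#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def _turtle_string(value: str) -> str:
    """Quote a Python string as a Turtle literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_nif_turtle(doc_iri: str, text: str,
                  annotations: Sequence[Tuple[str, str]]) -> str:
    """Emit a context resource plus one string resource per (surface, IRI) pair."""
    context = f"<{doc_iri}#char=0,{len(text)}>"
    out = [
        f"@prefix nif: <{NIF}> .",
        f"@prefix itsrdf: <{ITSRDF}> .",
        f"@prefix xsd: <{XSD}> .",
        "",
        f"{context} a nif:Context, nif:RFC5147String ;",
        f"    nif:isString {_turtle_string(text)} .",
    ]
    cursor = 0  # annotations are assumed to be listed in document order
    for surface, iri in annotations:
        begin = text.find(surface, cursor)
        if begin < 0:
            raise ValueError(f"{surface!r} not found after offset {cursor}")
        end = begin + len(surface)
        cursor = end
        out += [
            "",
            f"<{doc_iri}#char={begin},{end}> a nif:RFC5147String ;",
            f"    nif:referenceContext {context} ;",
            f"    nif:anchorOf {_turtle_string(surface)} ;",
            f'    nif:beginIndex "{begin}"^^xsd:nonNegativeInteger ;',
            f'    nif:endIndex "{end}"^^xsd:nonNegativeInteger ;',
            f"    itsrdf:taIdentRef <{iri}> .",
        ]
    return "\n".join(out)
```

For the text "Welcome to Innsbruck in Austria!" and the pair ("Innsbruck", "http://dbpedia.org/page/Innsbruck"), the annotated string resource covers characters 11 to 20 of a 32-character context.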
NLP Interchange Format (NIF)
• NIF is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations.
• NIF will soon be a normative part of ITS 2.0.
• NIF and its community project NLP2RDF serve as an umbrella project liaising with other communities of practice, especially:
– LOD2 FP7 EU project
– MultilingualWeb-LT Working Group
– Best Practices for Multilingual Linked Open Data Community Group
– Ontology-Lexica Community Group
– Named Entity Recognition and Disambiguation (NERD)
– Ontologies of Linguistic Annotation (OLiA)
• University of Leipzig
How is it different to Microdata annotations?
What is the latitude and longitude of the <span ?=?>Empire State Building</span>?
ITS 2.0 + DBpedia resource:
<span its-ta-ident-ref="http://live.dbpedia.org/page/Empire_State_Building">
Empire State Building</span>
Microdata + schema.org:
<div itemscope itemtype="http://schema.org/Place">
What is the latitude and longitude of the
<span itemprop="name">Empire State Building</span>?
</div>
How is it different to Microdata annotations?
What is the latitude and longitude of the <span ?=?>Empire State Building</span>?
Semantics of ITS 2.0 annotations:
Specify entity identifiers (IRIs) for the presented information item.
Semantics of Microdata annotations:
Specify the type of information that is presented.
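The difference shows up when both kinds of markup are read by the same program. The following Python sketch (hypothetical names, standard library only) records the entity identifiers that ITS 2.0 markup provides and the types and property names that Microdata markup provides.

```python
from html.parser import HTMLParser


class AnnotationSummary(HTMLParser):
    """Records what ITS 2.0 and Microdata attributes say about a snippet."""

    def __init__(self):
        super().__init__()
        self.ident_refs = []  # ITS 2.0: which entity is meant (IRIs)
        self.item_types = []  # Microdata: what type of thing is presented
        self.item_props = []  # Microdata: which property the text fills

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if attributes.get("its-ta-ident-ref"):
            self.ident_refs.append(attributes["its-ta-ident-ref"])
        if attributes.get("itemtype"):
            self.item_types.append(attributes["itemtype"])
        if attributes.get("itemprop"):
            self.item_props.append(attributes["itemprop"])


def summarize(html: str) -> dict:
    """Return the identifiers, types and properties found in an HTML string."""
    parser = AnnotationSummary()
    parser.feed(html)
    parser.close()
    return {
        "ident_refs": parser.ident_refs,
        "item_types": parser.item_types,
        "item_props": parser.item_props,
    }
```

For the ITS 2.0 variant of the Empire State Building example only ident_refs is filled, with the DBpedia IRI. For the Microdata variant only item_types and item_props are filled, so the program learns that a Place with a name is presented but not which place.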
Hands-on / Demo
• HTML with ITS metadata
• Transformation of HTML with ITS metadata to NIF
Notes:
• Based on the XSLT files shared by the W3C Working Group member Felix Sasaki (@fsasaki) [4]
• The Java internal XSLTC processor fails to compile the XSLTs. Use Saxon 9 HE.
References
[1] W3C semantic web list thread: http://lists.w3.org/Archives/Public/semantic-web/2013Apr/0218.html
[2] ITS 2.0 W3C working draft: http://www.w3.org/TR/its20/
[3] NIF Core Ontology: http://persistence.uni-leipzig.org/nlp2rdf/
[4] Felix Sasaki ITS 2.0 extractor (github): https://github.com/fsasaki/its20-extractor
[5] W3C, Localization vs. Internationalization: http://www.w3.org/International/questions/qa-i18n
[6] W3C, Conversion NIF2ITS: http://www.w3.org/TR/its20/#nif-backconversion
[7] W3C, ITS 2.0 / RDF Ontology: http://www.w3.org/2005/11/its/rdf-content/its-rdf.html
[8] Named Entity Recognition and Disambiguation (NERD): http://nerd.eurecom.fr/ontology
[9] WordNet Search 3.1: http://wordnetweb.princeton.edu/perl/webwn