WebNLG 2016 Proceedings of the 2nd International Workshop on Natural Language Generation and the Semantic Web 6 September 2016 Edinburgh, Scotland


Page 1: Proceedings of the 2nd International Workshop on Natural Language Generation and the Semantic Web, to be held on September 6th, 2016 in Edinburgh, Scotland.

WebNLG 2016

Proceedings of the 2nd International Workshop
on Natural Language Generation
and the Semantic Web

6 September 2016
Edinburgh, Scotland


WebNLG 2016 is sponsored by:

the French National Research Agency Project ANR-14-CE24-0033 "Generating Text from Semantic Web Data" (WebNLG)

©2016 The Association for Computational Linguistics

Order copies of this and other ACL proceedings from:

Association for Computational Linguistics (ACL)
209 N. Eighth Street
Stroudsburg, PA 18360
USA
Tel: +1-570-476-8006
Fax: [email protected]


Introduction

It is with great pleasure that we present the current volume of papers accepted for presentation at the 2nd International Workshop on Natural Language Generation and the Semantic Web, to be held on September 6th, 2016 in Edinburgh, Scotland.

The WebNLG 2016 workshop is a follow-up to the first WebNLG workshop, which was held in Nancy on June 12th, 2015. Funded by the French ANR WebNLG Project, these two workshops aim to provide a forum for presenting and discussing research on Natural Language Generation from Semantic Web data.

WebNLG 2016 invited submissions on all topics related to natural language generation and the Semantic Web. We received 15 submissions from all over the world. Of these, 5 long papers and 8 short papers were accepted for presentation. The long papers will be presented orally, and the short papers as posters.

In addition, WebNLG 2016 hosts an invited talk by Roberto Navigli from Sapienza University (Rome, Italy) on the past, present and future of BabelNet.

We are indebted to the authors and to the members of our program committee for their work, which contributed to making this a very enjoyable workshop. We are also delighted that Roberto Navigli agreed to give an invited talk at WebNLG 2016. Last but not least, many thanks go to the local organisation team, Emilie Colin, Bikash Gyawali, Mariem Mahfoudh and Laura Perez-Beltrachini, for handling the website and the preparation of the meeting.

Claire Gardent and Aldo Gangemi
Program co-Chairs for WebNLG 2016


Program Chairs:

Aldo Gangemi, Université Paris 13, Paris (France)
Claire Gardent, CNRS/LORIA, Nancy (France)

Organisation Committee:

Bikash Gyawali, CNRS/LORIA, Nancy (France), Chair
Laura Perez-Beltrachini, CNRS/LORIA, Nancy (France), Chair
Emilie Colin, CNRS/LORIA, Nancy (France), Webmaster
Mariem Mahfoudh, CNRS/LORIA, Nancy (France)

Program Committee:

Mehwish Alam, LIPN, Université Paris 13 (France)
Nathalie Aussenac-Gilles, CNRS/IRIT, Toulouse (France)
Valerio Basile, INRIA, Sophia Antipolis (France)
Gerard Casamayor, Universitat Pompeu Fabra (Spain)
Vinay Chaudhri, SRI International, Menlo Park (USA)
Mathieu Dacquin, The Open University (UK)
Claudia d'Amato, Bari University (Italy)
Brian Davis, INSIGHT, Galway (Ireland)
Marc Dymetman, XRCE, Grenoble (France)
Enrico Franconi, KRDB, Bolzano (Italy)
Bikash Gyawali, CNRS/LORIA, Nancy (France)
Guy Lapalme, RALI / Université de Montréal (Canada)
Shao-Fen Liang, King's College London (UK)
Elena Lloret, University of Alicante (Spain)
Vanessa Lopez, IBM Ireland Research Lab, Dublin (Ireland)
Mariem Mahfoudh, CNRS/LORIA, Nancy (France)
Yassine M'rabet, U.S. National Library of Medicine, Bethesda (USA)
Shashi Narayan, University of Edinburgh (UK)
Axel-Cyrille Ngonga-Ngomo, University of Leipzig (Germany)
Laura Perez-Beltrachini, CNRS/LORIA, Nancy (France)
Sergio Tessaris, KRDB, Bolzano (Italy)
Allan Third, The Open University (UK)
Yannick Toussaint, INRIA/LORIA, Nancy (France)
Christina Unger, CITEC, Universität Bielefeld (Germany)


Invited Speaker

Roberto Navigli, Sapienza Università di Roma, Rome, Italy

BabelNet: past, present and future

In this talk I will overview work done in my group at the Linguistic Computing Laboratory in the Computer Science Department of the Sapienza University of Rome which addresses key problems in multilingual lexical semantics. I will start from a brief introduction to BabelNet, the largest multilingual semantic network and encyclopedic dictionary, covering 14 million concepts and entities and 271 languages, also at the core of the so-called Linguistic Linked Open Data cloud. I will move on to Word Sense Disambiguation and Entity Linking in arbitrary languages with "zero training" (Babelfy), and then present recent latent and explicit vector representations of meaning which obtain state-of-the-art results in several NLP tasks. Finally, I will present my plan for making BabelNet a sustainable, continuously-improved resource. This is joint work with several people from my NLP group at Sapienza.

Roberto Navigli is an Associate Professor in the Department of Computer Science of the Sapienza University of Rome. He was awarded the Marco Somalvico 2013 AI*IA Prize for the best young researcher in AI. He was the first Italian recipient of an ERC Starting Grant in computer science (2011-2016), and a co-PI of a Google Focused Research Award on Natural Language Understanding. In 2015 he received the META prize for groundbreaking work in overcoming language barriers with BabelNet. His research lies in the field of multilingual Natural Language Processing. He is currently an Associate Editor of the Artificial Intelligence Journal.


Table of Contents

Generating Sets of Related Sentences from Input Seed Features
Cristina Barros and Elena Lloret . . . . . . . . 1

A Repository of Frame Instance Lexicalizations for Generation
Basile Valerio . . . . . . . . 5

Processing Document Collections to Automatically Extract Linked Data: Semantic Storytelling Technologies for Smart Curation Workflows
Peter Bourgonje, Julian Moreno Schneider, Georg Rehm and Felix Sasaki . . . . . . . . 13

On the Robustness of Standalone Referring Expression Generation Algorithms Using RDF Data
Pablo Duboue, Martin Ariel Dominguez and Paula Estrella . . . . . . . . 17

Content Selection through Paraphrase Detection: Capturing different Semantic Realisations of the Same Idea
Elena Lloret and Claire Gardent . . . . . . . . 25

Aligning Texts and Knowledge Bases with Semantic Sentence Simplification
Yassine Mrabet, Pavlos Vougiouklis, Halil Kilicoglu, Claire Gardent, Dina Demner-Fushman, Jonathon Hare and Elena Simperl . . . . . . . . 29

Building a System for Stock News Generation in Russian
Liubov Nesterenko . . . . . . . . 37

Content selection as Semantic-Based Ontology Exploration
Laura Perez-Beltrachini, Claire Gardent, Anselme Revuz and Saptarashmi Bandyopadhyay . . . . . . . . 41

ReadME generation from an OWL ontology describing NLP tools
Driss Sadoun, Satenik Mkhitaryan, Damien Nouvel and Mathieu Valette . . . . . . . . 46

Comparing the Template-Based Approach to GF: the case of Afrikaans
Lauren Sanby, Ion Todd and C. Maria Keet . . . . . . . . 50

Generating Paraphrases from DBPedia using Deep Learning
Amin Sleimi and Claire Gardent . . . . . . . . 54

Automatic Tweet Generation From Traffic Incident Data
Khoa Tran and Fred Popowich . . . . . . . . 59

Analysing the Integration of Semantic Web Features for Document Planning across Genres
Marta Vicente and Elena Lloret . . . . . . . . 67


Conference Program

Tuesday, September 6, 2016

9:00–10:00 Invited Talk

BabelNet: Past, Present and Future
Roberto Navigli

10:00–10:45 Posters

Analysing the Integration of Semantic Web Features for Document Planning across Genres
Marta Vicente and Elena Lloret

Building a System for Stock News Generation in Russian
Liubov Nesterenko

Comparing the template-based approach to GF: the case of Afrikaans
Lauren Sanby, Ion Todd and C. Maria Keet

Content selection as Semantic-Based Ontology Exploration
Laura Perez-Beltrachini, Claire Gardent, Anselme Revuz and Saptarashmi Bandyopadhyay

Content Selection through Paraphrase Detection: Capturing different Semantic Realisations of the Same Idea
Elena Lloret and Claire Gardent

Generating Sets of Related Sentences from Input Seed Features
Cristina Barros and Elena Lloret

Processing Document Collections to Automatically Extract Linked Data: Semantic Storytelling Technologies for Smart Curation Workflows
Peter Bourgonje, Julian Moreno Schneider, Georg Rehm and Felix Sasaki

ReadME Generation from an OWL Ontology Describing NLP Tools
Driss Sadoun, Satenik Mkhitaryan, Damien Nouvel and Mathieu Valette

Oral presentations

10:45–11:05 Generating Paraphrases from DBPedia using Deep Learning
Amin Sleimi and Claire Gardent

11:05–11:25 Aligning Texts and Knowledge Bases with Semantic Sentence Simplification
Yassine Mrabet, Pavlos Vougiouklis, Halil Kilicoglu, Claire Gardent, Dina Demner-Fushman, Jonathon Hare and Elena Simperl


Oral presentations (continued)

11:25–11:45 On the Robustness of Standalone Referring Expression Generation Algorithms Using RDF Data
Pablo Duboue, Martin Ariel Dominguez and Paula Estrella

11:45–12:05 A Repository of Frame Instance Lexicalizations for Generation
Basile Valerio

12:05–12:25 Automatic Tweet Generation From Traffic Incident Data
Khoa Tran and Fred Popowich


Proceedings of the 2nd International Workshop on Natural Language Generation and the Semantic Web (WebNLG), pages 1–4, Edinburgh, Scotland, September 6th, 2016. ©2016 Association for Computational Linguistics

Generating sets of related sentences from input seed features

Cristina Barros
Department of Software and Computing Systems
University of Alicante
Apdo. de Correos 99
E-03080 Alicante, Spain
[email protected]

Elena Lloret
Department of Software and Computing Systems
University of Alicante
Apdo. de Correos 99
E-03080 Alicante, Spain
[email protected]

1 Introduction

The Semantic Web (SW) can provide Natural Language Generation (NLG) with technologies capable of facilitating access to structured Web data. This type of data can be useful to this research area, which aims to automatically produce human utterances, in its different subtasks, such as content selection or document structuring.

NLG has been widely applied to several fields, for instance to the generation of recommendations (Lim-Cheng et al., 2014). However, generation systems are currently designed for very specific domains (Ramos-Soto et al., 2015) and predefined purposes (Ge et al., 2015). The use of SW technologies can facilitate the development of more flexible and domain-independent systems that could be adapted to the target audience or purpose, which would considerably advance the state of the art in NLG. The main objective of this paper is to propose a multidomain and multilingual statistical approach focused on the surface realisation stage using factored language models. Our proposed approach will be tested in the context of two different domains (fairy tales and movie reviews) and for the English and Spanish languages, in order to show its appropriateness to different non-related scenarios. The main novelty studied in this approach is the generation of related sentences (sentences with related topics) for different domains, with the aim of achieving cohesion between sentences and moving forward towards the generation of coherent and cohesive texts. The approach can be flexible enough thanks to the use of an input seed feature that guides the whole generation process. Within our scope, the seed feature can be seen as an abstract object that determines what the sentence will be like in terms of content. For example, this seed feature could be a phoneme, a property or an RDF triple from which the proposed approach could generate a sentence.

2 Factored Language Models and NLG

Factored language models (FLM) are an extension of language models proposed in (Bilmes and Kirchhoff, 2003). In this model, a word is viewed as a vector of K factors such that w_t ≡ {f_t^1, f_t^2, ..., f_t^K}. These factors can be anything, including the Part-Of-Speech (POS) tag, lemma, stem or any other lexical, syntactic or semantic feature. Once a set of factors is selected, the main objective of a FLM is to create a statistical model P(f | f_1, ..., f_N) where the prediction of a feature f is based on N parents {f_1, ..., f_N}. For example, if w represents a word token and t represents a POS tag, the expression P(w_i | w_{i-2}, w_{i-1}, t_{i-1}) provides a model for determining the current word token based on a traditional n-gram model together with the POS tag of the previous word. Therefore, in the development of such models there are two main issues to consider: 1) choosing an appropriate set of factors, and 2) finding the best statistical model over these factors.
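To make the factor-prediction idea concrete, here is a minimal sketch (our illustration, not code from the paper) of a maximum-likelihood estimate of P(w_i | w_{i-2}, w_{i-1}, t_{i-1}) from a toy POS-tagged corpus; smoothing and generalized backoff, which real FLM toolkits such as SRILM provide, are omitted.

```python
from collections import defaultdict

# Illustrative sketch: estimate P(w_i | w_{i-2}, w_{i-1}, t_{i-1}) by
# counting factor histories. Real FLM toolkits add smoothing and
# generalized parallel backoff, both omitted here for brevity.

def train_flm(tagged_sentences):
    """tagged_sentences: list of sentences, each a list of (word, pos)."""
    history_counts = defaultdict(int)
    joint_counts = defaultdict(int)
    for sent in tagged_sentences:
        padded = [("<s>", "<S>"), ("<s>", "<S>")] + sent
        for i in range(2, len(padded)):
            w2 = padded[i - 2][0]
            w1, t1 = padded[i - 1]
            w = padded[i][0]
            history = (w2, w1, t1)
            history_counts[history] += 1
            joint_counts[history + (w,)] += 1

    def prob(w, w2, w1, t1):
        history = (w2, w1, t1)
        if history_counts[history] == 0:
            return 0.0
        return joint_counts[history + (w,)] / history_counts[history]

    return prob

# Tiny toy corpus of POS-tagged sentences.
corpus = [[("the", "DT"), ("princess", "NN"), ("sings", "VBZ")],
          [("the", "DT"), ("princess", "NN"), ("sleeps", "VBZ")]]
p = train_flm(corpus)
print(p("princess", "<s>", "the", "DT"))   # 1.0
print(p("sings", "the", "princess", "NN"))  # 0.5
```

Without backoff, any unseen history yields probability zero, which is exactly why practical FLM implementations search over backoff paths across factors.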

In recent years, FLM have been used in several areas of Computational Linguistics, mostly in machine translation (Crego and Yvon, 2010; Axelrod, 2006) and speech recognition (Tachbelie et al., 2011; Vergyri et al., 2004). To a lesser extent, they have also been employed for generating language, mainly in English. This is the case of the BAGEL system (Mairesse and Young, 2014), where FLM (with semantic concepts as factors) are used to predict the semantic structure of the sentence that is going to be generated; or OpenCCG (White and Rajkumar, 2009), a surface realisation tool, where FLM (with POS tags and supertags as factors) are used to score partial and complete realisations to be later selected. More recently, FLM (with POS tag, word and lemma as factors) were used to rank generated sentences in Portuguese (Novais and Paraboni, 2012).

Generating connected and related sentences is a challenge in itself and, to the best of our knowledge, there is no existing research with the restriction of containing words with a specific seed feature, thus leading to a more flexible NLG approach that could be easily adapted to different purposes, domains and languages.

3 Generating Related Sentences Using FLM

We propose an almost fully language-independent statistical approach focused on the surface realisation stage and based on over-generation and ranking techniques, which can generate related sentences for different domains. This is achieved through the use of input seed features, which are abstract objects (e.g., a phoneme, a semantic class, a domain, a topic, or an RDF triple) that will guide the generation process in relation to the most suitable vocabulary for a given purpose or domain.

Starting from a training corpus, a test corpus and a seed feature as the input of our approach, a FLM will be learnt over the training corpus, and a bag of words containing words related to the seed feature will be obtained from the test corpus. Then, based on the FLM and the bag of words previously obtained, the process will generate several sentences for a given seed feature, which will subsequently be ranked. This process will prioritise the selection of words from the bag of words to guarantee that the generated sentences contain the maximum number of words related to the input seed feature. Once several sentences are generated, only one of them will be selected based on the sentence probability, which will be computed using a linear combination of FLMs.
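The bag-of-words step can be pictured with a toy sketch (our simplification, not the authors' code): for a phoneme seed we naively match on spelling here, whereas a real system would operate on phonetic transcriptions.

```python
# Toy sketch of the bag-of-words step: collect the words of a test corpus
# that carry the seed feature. For a phoneme seed we naively match on
# spelling; a real system would use phonetic transcriptions instead.

def seed_bag_of_words(corpus_tokens, seed):
    """Return the distinct corpus words containing the seed, sorted."""
    return sorted({w for w in corpus_tokens if seed in w.lower()})

tokens = ["These", "child", "say", "the", "princess", "shadow", "story"]
print(seed_bag_of_words(tokens, "s"))
# ['These', 'princess', 'say', 'shadow', 'story']
```

The generator then prefers words from this bag during over-generation, so that candidate sentences are dense in the seed feature before ranking.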

When a sentence is generated, we will perform POS tagging, syntactic parsing and/or semantic parsing to identify several linguistic components of the sentence (such as the subject, named entities, etc.) that will also provide clues about its structural shape. This will allow us to generate the next sentence taking into account the shape of the previously generated one and the structure we want to obtain (e.g., generating sentences about the same subject with complementary information).

4 Experimental scenarios and resources

For our experimentation, we want to consider two different scenarios: NLG for assistive technologies and sentiment-based NLG. Within the first scenario, the experimentation will focus on the domain of fairy tales. The purpose in this scenario is the generation of stories that can be useful in dyslalia speech therapies (Rvachew et al., 1999). Dyslalia is a disorder in phoneme articulation, so the repetition of words with problematic phonemes can improve their pronunciation. Therefore, in this scenario, the selected seed feature will be a phoneme, and the generated sentences will contain a large number of words with a concrete phoneme. As corpora, a collection of Hans Christian Andersen tales will be used, since its vocabulary is suitable for a young audience: dyslalia mostly affects the child population, with a 5-10% incidence among children (Conde-Guzon et al., 2014).

Regarding the second scenario, the experimentation will focus on generating opinionated sentences (i.e., sentences with a positive or negative polarity) in the domain of movie reviews. Taking into account that there are many websites where users express their opinions by means of non-linguistic rates in the form of numeric values or symbols1, the generation of this kind of sentences can be used to produce sentences from visual numeric rates. Given this proposed scenario, we will employ the Spanish Movie Reviews corpus2 and the Sentiment Polarity Dataset (Pang and Lee, 2004) as our corpora for Spanish and English, respectively.

In order to learn the FLM that will be used during the generation, we will use SRILM (Stolcke, 2002), a software toolkit for building and applying statistical language models, which also includes an implementation of FLM.

In addition, the Freeling language analyser (Padro and Stanilovsky, 2012) will also be employed to tag the corpus with lexical information, as well as to perform the syntactic analysis and named entity recognition of the generated sentences. Furthermore, in order to obtain and evaluate the polarity for our second proposed domain, we will employ the sentiment analysis classifier described and developed in (Fernandez et al., 2013).

1 An example of such a website can be found at: http://www.reviewsdepeliculas.com/

2 http://www.lsi.us.es/~fermin/corpusCine.zip


5 Preliminary Experimentation

As an initial experimentation, we designed a simple grammar (based on the basic clause structure that divides a sentence into subject and predicate) to generate sets of sentences whose topics (nouns) are related to each other, since these topics will recur within the set.

In this case, we generate the sentences with the structure shown in Figure 1, where we use the direct object of the previously generated sentence as the subject of the following sentence to be produced, so that we can obtain a preliminary set of related sentences.

The words contained in these preliminary related sentences are in lemma form, since this configuration proved to work better than others; they can be further inflected in order to obtain several inflections of the sentences, from which the final generated one will be chosen.

S → NP VP
NP → D N
VP → V NP

Figure 1: Basic clause structure grammar.
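The chaining scheme can be sketched as follows; this is a hypothetical illustration with an invented lexicon, not the authors' implementation, which over-generates candidates with FLMs and ranks them.

```python
import random

# Hypothetical sketch of the chaining idea: each sentence follows the
# grammar S -> NP VP, NP -> D N, VP -> V NP, and the direct object of one
# sentence becomes the subject noun of the next. The lexicon is invented.

def generate_related(lexicon, n_sentences, rng):
    sentences = []
    subject = rng.choice(lexicon["N"])
    for _ in range(n_sentences):
        det_subj = rng.choice(lexicon["D"])
        det_obj = rng.choice(lexicon["D"])
        verb = rng.choice(lexicon["V"])
        # Pick a direct object distinct from the current subject.
        obj = rng.choice([n for n in lexicon["N"] if n != subject])
        sentences.append(f"{det_subj} {subject} {verb} {det_obj} {obj}")
        subject = obj  # chain: direct object -> next subject
    return sentences

rng = random.Random(0)
lexicon = {"D": ["the", "this"],
           "N": ["princess", "shadow", "story"],
           "V": ["say", "pass"]}
sentences = generate_related(lexicon, 3, rng)
for s in sentences:
    print(s)
```

Each sentence realises the five-word D N V D N pattern, and the last noun of one line reappears as the subject noun of the next, mirroring the sets shown in Figures 2 and 3.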

With this structure, we generated a set of 3 related sentences for each phoneme in both languages, Spanish and English, and another set of 3 related sentences for positive and negative polarities in the same two languages.

These sentences have the structure seen above and were ranked according to the approach outlined in Section 3, with the linear combination of FLM being: P(w_i) = λ1 P(f_i | f_{i-2}, f_{i-1}) + λ2 P(f_i | p_{i-2}, p_{i-1}) + λ3 P(p_i | f_{i-2}, f_{i-1}), where f can be either a lemma or a word, p refers to a POS tag, and the weights are set to λ1 = 0.25, λ2 = 0.25 and λ3 = 0.5. These values were empirically determined.
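The interpolation can be sketched directly (the per-word component probabilities below are invented placeholders standing in for trained FLM estimates):

```python
# Sketch of the ranking score: a linear interpolation of three factored
# models with weights lambda1 = 0.25, lambda2 = 0.25, lambda3 = 0.5.
# The candidate probabilities below are invented for illustration.

LAMBDA1, LAMBDA2, LAMBDA3 = 0.25, 0.25, 0.5

def interpolated_prob(p_f_given_ff, p_f_given_pp, p_p_given_ff):
    """Weighted sum of the three factored components for one word."""
    return (LAMBDA1 * p_f_given_ff
            + LAMBDA2 * p_f_given_pp
            + LAMBDA3 * p_p_given_ff)

def sentence_score(per_word_probs):
    """Product of interpolated per-word probabilities for one sentence."""
    score = 1.0
    for triple in per_word_probs:
        score *= interpolated_prob(*triple)
    return score

# Rank two candidate sentences by their interpolated probability.
cand_a = [(0.2, 0.1, 0.6), (0.4, 0.3, 0.5)]
cand_b = [(0.1, 0.1, 0.2), (0.2, 0.2, 0.3)]
print(sentence_score(cand_a) > sentence_score(cand_b))  # True
```

Giving λ3 the largest weight favours candidates whose POS sequence is likely given the preceding lemmas or words, which rewards well-formed structure over rare word combinations.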

Some examples of the generated sentences for the first scenario, concerning the generation of sentences for assistive technologies, are shown in Figure 2. In some of the sets of generated sentences, the same noun appears as a direct object in both the first and the third generated sentences of that set. Examples of sets of sentences generated in both English and Spanish for the second experimentation scenario (movie reviews domain) are shown in Figure 3.

Generally, the generated sentences for our two experimentation scenarios conform to what was specified in Section 4, although in some cases the verbs

Spanish
Phoneme: /n/
Cuanto cosa tener nuestro pensamiento.
(How much thing have our thinking.)
Cuanto pensamiento tener nuestro corazon.
(How much thought have our heart.)
Cuanto corazon tener nuestro pensamiento.
(How much heart have our thinking.)

English
Phoneme: /s/
These child say the princess.
Each princess say the shadow.
Each shadow pass this story.

Figure 2: Example generated sentences for the assistive technologies scenario.

in these sentences need the inclusion of a preposition in order to make the generated sentences more correct.

Spanish
Polarity: Negative
Este defecto ser el asesino.
(This defect being the murderer.)
Su asesino ser el policía.
(His murderer be the police.)
El policía interpretar este papel.
(The police play this role.)

English
Polarity: Negative
Many critic reject the plot.
This plot confuse the problem.
The problem lie this mess.

Figure 3: Example generated sentences for the movie reviews domain in our second scenario.

At this stage, these preliminary sets of generated related sentences are a promising step towards our final goal, since the sentences contain a high proportion of words carrying the seed feature, meeting the overall objective for which they were generated. Although the grammar used in the generation of these sentences only captures the basic structure for the two languages studied, the use of more complex grammars could give us insights to improve some aspects of the generation of these preliminary sentences in the future.

6 Ongoing research steps

In order to enrich this approach and meet the final goal, we want to research in depth some of the representation languages used by the SW, such as OWL, as well as the SW technologies that fit our proposed approach. Obtaining information related to a certain topic is tough without using any kind of external technology, so employing SW languages, such as RDF, can make it easier for us to access this type of information.

In the future, we would like to analyse how the generated sentences could be connected using discourse markers. We would also like to test the generation of sentences using other structural shapes, such as sentences sharing the same subject, or sentences sharing the same predicative objects with different subjects. The generation of related sentences is not a trivial task, with cohesion and coherence between sentences being very hard to check automatically. We therefore plan to conduct an exhaustive user evaluation of the generated sentences using crowdsourcing platforms.

Acknowledgments

This research work has been funded by the University of Alicante, Generalitat Valenciana, the Spanish Government and the European Commission through the projects GRE13-15, PROMETEOII/2014/001, TIN2015-65100-R, TIN2015-65136-C2-2-R, and FP7-611312, respectively.

References

Amittai Axelrod. 2006. Factored language models for statistical machine translation. Master's thesis, University of Edinburgh.

Jeff A. Bilmes and Katrin Kirchhoff. 2003. Factored language models and generalized parallel backoff. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: Companion Volume of the Proceedings of HLT-NAACL 2003 - Short Papers, Volume 2, pages 4–6.

Pablo Conde-Guzon, Pilar Quiros-Exposito, María Jesus Conde-Guzon, and María Teresa Bartolome-Albistegui. 2014. Perfil neuropsicológico de niños con dislalias: alteraciones mnésicas y atencionales. Anales de Psicología, 30:1105–1114.

Josep M. Crego and François Yvon. 2010. Factored bilingual n-gram language models for statistical machine translation. Machine Translation, 24(2):159–175.

Javi Fernandez, Yoan Gutierrez, Jose Manuel Gomez, Patricio Martínez-Barco, Andres Montoyo, and Rafael Munoz. 2013. Sentiment analysis of Spanish tweets using a ranking algorithm and skipgrams. In Proceedings of the TASS Workshop at SEPLN 2013, pages 133–142.

Tao Ge, Wenzhe Pei, Heng Ji, Sujian Li, Baobao Chang, and Zhifang Sui. 2015. Bring you to the past: Automatic generation of topically relevant event chronicles. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, pages 575–585, July.

Natalie R. Lim-Cheng, Gabriel Isidro G. Fabia, Marco Emil G. Quebral, and Miguelito T. Yu. 2014. SHED: An online diet counselling system. In DLSU Research Congress 2014.

François Mairesse and Steve Young. 2014. Stochastic language generation in dialogue using factored language models. Computational Linguistics, 40(4):763–799.

Eder Miranda Novais and Ivandre Paraboni. 2012. Portuguese text generation using factored language models. Journal of the Brazilian Computer Society, 19(2):135–146.

Lluís Padró and Evgeny Stanilovsky. 2012. FreeLing 3.0: Towards wider multilinguality. In Proceedings of the Eighth International Conference on Language Resources and Evaluation.

Bo Pang and Lillian Lee. 2004. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of the ACL.

A. Ramos-Soto, A. J. Bugarin, S. Barro, and J. Taboada. 2015. Linguistic descriptions for automatic generation of textual short-term weather forecasts on real prediction data. IEEE Transactions on Fuzzy Systems, 23(1):44–57.

Susan Rvachew, Susan Rafaat, and Monique Martin. 1999. Stimulability, speech perception skills, and the treatment of phonological disorders. American Journal of Speech-Language Pathology, 8(1):33–43.

Andreas Stolcke. 2002. SRILM - an extensible language modeling toolkit. In Proceedings of the International Conference on Spoken Language Processing, volume 2, pages 901–904.

Martha Yifiru Tachbelie, Solomon Teferra Abate, and Wolfgang Menzel. 2011. Morpheme-based and factored language modeling for Amharic speech recognition. In Human Language Technology. Challenges for Computer Science and Linguistics: 4th Language and Technology Conference, pages 82–93. Springer Berlin Heidelberg.

Dimitra Vergyri, Katrin Kirchhoff, Kevin Duh, and Andreas Stolcke. 2004. Morphology-based language modeling for Arabic speech recognition. In INTERSPEECH, volume 4, pages 2245–2248.

Michael White and Rajakrishnan Rajkumar. 2009. Perceptron reranking for CCG realization. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 410–419. Association for Computational Linguistics.


Proceedings of the 2nd International Workshop on Natural Language Generation and the Semantic Web (WebNLG), pages 5–12, Edinburgh, Scotland, September 6th, 2016. ©2016 Association for Computational Linguistics

A Repository of Frame Instance Lexicalizations for Generation

Valerio Basile
Université Côte d'Azur, Inria, CNRS, I3S, France

[email protected]

Abstract

Robust, statistical Natural Language Generation from Web knowledge bases is hindered by the lack of text-aligned resources. We aim to fill this gap by presenting a method for extracting knowledge from natural language text and encoding it in a format based on frame semantics, ready to be distributed in the Linked Open Data space. We run an implementation of this methodology on a collection of short documents and build a repository of frame instances equipped with fine-grained lexicalizations. Finally, we conduct a pilot study to investigate the feasibility of an approach to NLG based on said resource. We perform error analysis to assess the quality of the resource and manually evaluate the output of the NLG prototype.

1 Introduction

Statistical Natural Language Generation, generally speaking, is based on learning a mapping between natural language expressions (words, phrases, sentences) and abstract representations of their meaning or syntactic structure. In fact, such representations vary greatly in their degree of abstraction, from shallow syntactic trees to full-fledged logical formulas, depending on factors like downstream applications and the role of the generation module in a larger framework.

In order to be useful for statistical generation, the abstract representation needs to be aligned with the surface form. Depending on the format, the level of abstraction and the target degree of granularity of the alignment, it may be more or less straightforward to produce a collection of pairs <abstract representation, surface form>. Moreover, statistical methods typically need a large number of examples to properly learn a mapping and generalize efficiently.

While several resources have been successfully employed as training material for statistical NLG (see the related work section), they lack a direct link with world knowledge. Linked Open Data resources, in particular general knowledge bases such as DBpedia1, on the other hand, are not straightforward to use as a basis for generation, while at the same time they are rich in extra-linguistic information such as type hierarchies and semantic relations. Having the entities and concepts of an abstract meaning representation linked to a knowledge base allows a generator to use all the information coming from links to other resources in the LOD cloud. This kind of input to an NLG pipeline is therefore richer than word-based structures, although its increased level of abstraction makes the generation process more complex.

Shifting the level of abstraction, the representation format must be changed accordingly. In the case of many formats proposed in the literature (e.g., the format of the Surface Realization shared task), the input for NLG is made of structures closely resembling sentences. The notion of sentence, however, might no longer be adequate when the abstract representation of meaning aims to meet the standards of the Web. A good compromise is a representation based on frame semantics (Fillmore, 1982). A frame is a unit of meaning denoting a situation of a particular type, e.g., Operate vehicle. Attached to the frame are a number of frame elements, indicating roles that the entities involved in the frame can play, e.g., Driver or Vehicle. Rouces et al. (2015) propose a LOD version of frame semantics implemented in the resource called FrameBase, essentially a scheme for representing instances of frames and frame elements in a Web-based format. The FrameBase project also produced a repository of instances created by automatically translating existing Web resources. Moreover, they made available a large set of (de)reification rules, that is, bidirectional rules to convert between binary relations and frame-based representations. For instance, the binary relation drivesVehicle can be transformed by a reification rule into an Operate vehicle frame with the two members of the original relation filling the roles of Driver and Vehicle. The reification mechanism provides an interesting use case for NLG: if a system is able to generate natural language from a frame instance, then it is also able to generate from the corresponding binary relation.

1 http://dbpedia.org
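The (de)reification mechanism can be illustrated with a short sketch. This is not FrameBase's actual machinery (its rules are expressed in RDF); the rule table and function names below are hypothetical stand-ins showing the bidirectional mapping between a binary relation and a frame instance:

```python
# Sketch of FrameBase-style (de)reification (illustrative only).
# A rule maps a binary relation to a frame type plus the two roles
# filled by the subject and object of the relation.
REIFICATION_RULES = {
    "drivesVehicle": ("Operate_vehicle", "Driver", "Vehicle"),
}

def reify(relation, subj, obj, instance_id="fi-1"):
    """Turn a binary relation triple into a frame instance (dict of role fillers)."""
    frame, role_subj, role_obj = REIFICATION_RULES[relation]
    return {"id": instance_id, "frame": frame, role_subj: subj, role_obj: obj}

def dereify(instance, relation):
    """Inverse direction: recover the binary triple from a frame instance."""
    _, role_subj, role_obj = REIFICATION_RULES[relation]
    return (instance[role_subj], relation, instance[role_obj])

frame = reify("drivesVehicle", "dbr:Robot", "wn:02961779-n")
assert frame["frame"] == "Operate_vehicle"
assert dereify(frame, "drivesVehicle") == ("dbr:Robot", "drivesVehicle", "wn:02961779-n")
```

Because the rules run in both directions, a generator that handles frame instances gets generation from the corresponding binary relations for free, as noted above.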

In this paper, we present ongoing work towards the construction of a domain-agnostic, LOD-compliant knowledge base of semantic frame instances. Frames, roles and entities are aligned to the natural language words and phrases that express them, extracted from a large corpus of text. Thanks to this alignment, the resource can be used to create lexicalizations for new, unseen configurations of entities and frames.

2 Related Work

Several resources have been used to train statistical generators to learn lexicalizations for various types of representations. The Surface Realization Shared Task (Belz et al., 2011), for instance, provides a double dataset of shallow and deep input representations obtained by preprocessing the CoNLL 2008 Shared Task data (Surdeanu et al., 2008). Resources used for NLG include the Penn Treebank (Marcus et al., 1993) for Probabilistic Lexical Functional Grammar (Cahill and Genabith, 2006) and CCGBank (Hockenmaier and Steedman, 2007) for Combinatory Categorial Grammar syntax trees (White et al., 2007). More recently, the Groningen Meaning Bank (Basile et al., 2012) has been proposed as a resource for NLG from abstract meaning representations, leveraging the fine-grained alignment between logical forms and their respective surface forms given by the Discourse Representation Graph formalism (Basile and Bos, 2013).

The process of generating natural language from databases of structured information, including ones following Web standards, has been studied in the past, although often in specific application-oriented contexts. Bouayad-Agha et al. (2012) propose an architecture for generation based on three RDF/OWL ontologies, separating the domain knowledge from the communication knowledge. Gyawali and Gardent (2014) propose a statistical approach to NLG from knowledge bases based on tree adjoining grammars. WordNet is relatively less used for generation purposes. Examples of the use of WordNet in the context of NLG include the methods to address specific NLG-related tasks proposed by Jing (1998) and the algorithm for lexical choice of Basile (2014).

3 Aligning Text and Semantics

Basile and Bos (2013) devise a strategy to align arbitrary natural language expressions to formal representations of their meaning, encoded as Discourse Representation Structures (DRS, Kamp and Reyle (1993)). DRSs are logical formulas comprising predicates and relations over discourse referents. For the English language, we are able to obtain DRSs for a given text using the C&C tools collection of linguistic analysis tools (Curran et al., 2007), which includes Boxer (Bos, 2008), a rule-based system that builds DRSs on top of the CCG parse tree produced by the C&C parser. Boxer implements neo-Davidsonian representations of meaning, that is, formulas centered around events to which participant entities are connected by filling thematic roles. Figure 1 shows an example of a DRS for the sentence “A robot is driving the car” as produced by Boxer. In this example the neo-Davidsonian semantics is evident: the ROBOT is the AGENT of the event DRIVE, while the CAR is the THEME.

e1 x1 x2

ROBOT(x1)
DRIVE(e1)
CAR(x2)
AGENT(e1, x1)
THEME(e1, x2)

Figure 1: DRS representing the meaning of the sentence “A robot is driving the car”
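For illustration, the neo-Davidsonian structure in Figure 1 can be modeled as a plain set of predicate tuples over discourse referents (a minimal sketch, not Boxer's internal representation):

```python
# The DRS of "A robot is driving the car" as a set of predicate tuples:
# unary predicates over referents, binary thematic roles over (event, entity) pairs.
drs = {
    ("ROBOT", "x1"),
    ("DRIVE", "e1"),
    ("CAR", "x2"),
    ("AGENT", "e1", "x1"),
    ("THEME", "e1", "x2"),
}

def role_fillers(drs, event):
    """Map each thematic role of an event to its filler referent."""
    return {p[0]: p[2] for p in drs if len(p) == 3 and p[1] == event}

assert role_fillers(drs, "e1") == {"AGENT": "x1", "THEME": "x2"}
```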

The alignment method proposed by Basile and Bos (2013) is based on a translation from the DRS format into a Discourse Representation Graph (DRG), where the semantic information is preserved but expressed in a flat, non-recursive formalism. The surface form is then aligned at the word level to the appropriate tuples. Figure 2 shows the DRG corresponding to the DRS in Figure 1, where the alignment with the surface form is contained in the two rightmost columns. For the details of how the alignment is encoded we refer the reader to the aforementioned paper (Basile and Bos, 2013).

k1 referent x1 1 [A]
k1 referent e1
k1 referent x2 1 [the]
k1 event DRIVE
k1 concept ROBOT
k1 role AGENT
k1 concept CAR
k1 role THEME
ROBOT instance x1 2 [robot]
DRIVE instance e1 2 [is, driving]
AGENT internal e1 1
AGENT external x1
CAR instance x2 2 [car]
THEME internal e1 3
THEME external x2

Figure 2: DRG aligned with the surface form, representing the meaning of the sentence “A robot is driving the car”.

In order for the semantic representations, and their alignment to the surface, to be useful in contexts such as knowledge representation and automatic reasoning, these logical forms need to be linked to some kind of knowledge base. Otherwise, the predicate symbols in a DRG like the one depicted in Figure 2 are just interchangeable symbols (although Boxer uses lemmas for predicate names) devoid of meaning.

Popular resources in the LOD ecosystem are well-suited for serving as knowledge bases for grounding the symbols: WordNet (Miller, 1995) can be used to represent concepts and events, while DBpedia has a very large coverage of named entities. FrameNet (Baker et al., 1998), an inventory of frames and frame elements inspired by Fillmore’s frame semantics (Fillmore, 1982), has a structure that superimposes easily on the neo-Davidsonian semantics of Boxer’s DRGs. The inventory of thematic roles used by Boxer is taken from VerbNet (Schuler, 2005). By linking the discourse referents representing concepts in a DRG to WordNet synsets, entities to DBpedia and events to FrameNet frames, we are able to extract complete representations of frames from natural language text linked to LOD knowledge bases.

4 Collecting Frame Lexicalizations

We developed a pipeline of NLP tools to automatically extract instances of frames from text. The pipeline comprises the C&C tools and Boxer, a module for word sense disambiguation and a module for entity linking. The latter two modules can be configured to use different external software to perform their task.

The analysis of a text consists of the following steps:

1. Run the C&C tools and Boxer, saving both the XML and the DRG output. The XML output of Boxer contains, for each predicate of the DRS that has been constructed, a link to the part of the surface form that introduced it.

2. Run the WSD and entity linking components, preserving the same tokenization. The software then uses the links to the text provided by Boxer to map the word senses and DBpedia entities to the DRS predicates.

3. The word senses corresponding to events are mapped to FrameNet frames, using the mapping provided by Rouces et al. (2015). The VerbNet roles are converted into FrameNet roles using the mapping provided by Loper et al. (2007).

4. The partial surface forms in the DRG output of Boxer are attached to the frames, semantic roles and frame elements.

Figure 3: Architectural Scheme of KNEWS.
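The four steps above can be sketched as a single driver function. The step functions here (parse_with_boxer, disambiguate, link_entities, map_to_frames) are hypothetical stand-ins for the external tools the pipeline wraps (C&C/Boxer, a WSD module, an entity linker, and the FrameBase/SemLink mappings); only the orchestration is shown:

```python
# Illustrative orchestration of the four-step analysis; the step functions
# are hypothetical stand-ins for the external tools KNEWS wraps.
def extract_frame_instances(text, parse_with_boxer, disambiguate,
                            link_entities, map_to_frames):
    # Step 1: semantic parsing yields DRS predicates plus, per discourse
    # referent, the span of surface text that introduced it (from the DRG).
    drs, alignments = parse_with_boxer(text)
    # Step 2: WSD and entity linking over the same tokenization.
    senses = disambiguate(text)
    entities = link_entities(text)
    # Step 3: map event senses to frames, VerbNet roles to FrameNet roles.
    instances = map_to_frames(drs, senses, entities)
    # Step 4: attach the aligned partial surface forms to each instance.
    for instance in instances:
        instance["lexicalizations"] = {
            ref: alignments.get(ref, "") for ref in instance["referents"]
        }
    return instances
```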

This pipeline is implemented in the KNEWS system, available for download at https://github.com/valeriobasile/learningbyreading. In the following paragraphs we describe the internal details of the components of KNEWS.

Semantic parsing The semantic parsing module employs the C&C tools and Boxer to process the input text and output a complete formal representation of its meaning. The C&C pipeline of statistical NLP tools includes a tokenizer, a lemmatizer, a named entity and part-of-speech tagger, and a parser that creates a Combinatory Categorial Grammar representation of the natural language syntax. Boxer builds a DRS on top of the CCG analysis. The predicates of a DRS are expressed over a set of discourse referents representing entities, concepts and events. Such structures contain, among other information, predicates representing the roles of the entities with respect to the detected events, e.g., event(A), entity(B), agent(A,B) to represent B playing the role of the agent of the event A.

Word sense disambiguation and entity linking KNEWS uses WordNet to represent concepts and events, DBpedia for named entities, and FrameNet’s frames to represent events, integrating the mapping with the WordNet synsets provided by FrameBase. The inventory of thematic roles used by Boxer is taken from VerbNet (Schuler, 2005), while KNEWS employs the mapping provided by SemLink (Palmer, 2009) to link them (whenever possible) to FrameNet roles. KNEWS can be configured to use either UKB (Agirre and Soroa, 2009) or Babelfy (Moro et al., 2014) to perform the word sense disambiguation, and DBpedia Spotlight (Daiber et al., 2013) or Babelfy for entity linking.

Output modes KNEWS’s default output consists of frame instances: sets of RDF triples that contain a unique identifier, the type of the frame, the thematic roles involved in the instance, and the concepts or entities that fill the roles. The format follows the scheme of FrameBase, which offers the advantage of interoperability with other resources in the Linked Open Data cloud, as well as the possibility of using FrameBase’s (de)reification rules to automatically generate a large number of binary predicates. An example of a frame instance, extracted from the sentence “A robot is driving the car.”, is given in Figure 4. This output mode of KNEWS has been employed in Basile et al. (2016) to create a repository of general knowledge about objects.

For the purpose of NLG, we extended KNEWS with a new output mode, similar to the previous one (frame instances) with the difference that it contains, as additional information, the alignment with the text. We exploit the DRG output of Boxer to link the discourse referents to surface forms, i.e., spans of the original input text, resulting in the word-aligned representation shown in Figure 5. This new output mode of KNEWS consists of an XML list of frameinstance elements. Each frame instance is equipped with its complete lexicalization (the instancelexicalization tag), the incomplete surface form associated with the event (the framelexicalization tag) and a sequence of frameelements. A frameelement represents a role in the frame instance. The concept tag contains a DBpedia or WordNet resource (depending on the output of the disambiguation module), a lexicalization of the role filler (the conceptlexicalization tag), and the incomplete surface form obtained by composing the surface forms of the role filler and the frame. In the next section we describe an automatically built resource created by parsing text with this configuration of KNEWS.

KNEWS also has an additional output mode: first-order logic. With this output mode, KNEWS is able to generate first-order logic formulae representing the natural language text given as input. The symbols for the predicates are WordNet symbols, allowing the output of KNEWS to be integrated with a reasoning engine, e.g., to select background knowledge in a much more focused manner, as proposed in Furbach and Schon (2016).

5 Evaluation

In order to test our approach to knowledge ex-traction, we parsed a corpus of short texts, takenfrom the ESL Yes website of material for Englishlearners.2 We find this data particularly apt in themore general context of extracting general knowl-edge from text, being made of short, clear sen-tences about simple and generic topics. The cor-pus comprises 725 short stories, that we dividedinto 14,140 sentences. Parsing the ESL Yes corpuswith KNEWS we collected 30,217 frame instances(420 unique frames), 1,455 concepts (1,201 Word-Net synsets and 254 DBpedia entities) filling in41,945 roles (161 unique roles). 29,409 role in-stances could not be mapped to FrameNet, so theyare expressed by one of 18 VerbNet roles.

We evaluate the information extraction methodology by assessing the quality of this automatically produced resource. For each frame instance,

2 http://www.eslyes.com/


@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix fb: <http://framebase.org/ns/> .
@prefix dbr: <http://dbpedia.org/resource/> .
@prefix wn: <http://wordnet-rdf.princeton.edu/wn31/> .

fb:fi-Operate_vehicle_dc59afa6 rdf:type fb:frame-Operate_vehicle-drive.v .
fb:fi-Operate_vehicle_dc59afa6 fb:fe-Driver dbr:Robot .
fb:fi-Operate_vehicle_dc59afa6 fb:fe-Vehicle wn:02961779-n .

Figure 4: RDF triples extracted by KNEWS from the sentence “A robot is driving the car”, constituting one frame instance.

<frameinstance id="Operate_vehicle_dc59afa6" type="Operate_vehicle-drive.v" internalvariable="e1">
  <framelexicalization>k3:x1 is driving k3:x2</framelexicalization>
  <instancelexicalization>The robot is driving the car .</instancelexicalization>
  <frameelements>
    <frameelement role="Driver" internalvariable="x1">
      <concept>http://dbpedia.org/resource/Robot</concept>
      <rolelexicalization>The robot is driving x2</rolelexicalization>
      <conceptlexicalization>The robot</conceptlexicalization>
    </frameelement>
    <frameelement role="Vehicle" internalvariable="x2">
      <concept>http://wordnet-rdf.princeton.edu/wn31/02961779-n</concept>
      <rolelexicalization>x1 is driving the car .</rolelexicalization>
      <conceptlexicalization>the car .</conceptlexicalization>
    </frameelement>
  </frameelements>
</frameinstance>

Figure 5: XML output of KNEWS describing a frame instance extracted from the sentence “A robot is driving the car”.
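For illustration, the word-aligned XML output can be consumed with the standard library's XML parser. This is a minimal sketch (the document below is a trimmed version of Figure 5, with the rolelexicalization tags omitted for brevity):

```python
import xml.etree.ElementTree as ET

# A trimmed frame-instance document in the word-aligned output format.
xml_doc = """<frameinstance id="Operate_vehicle_dc59afa6"
    type="Operate_vehicle-drive.v" internalvariable="e1">
  <framelexicalization>k3:x1 is driving k3:x2</framelexicalization>
  <instancelexicalization>The robot is driving the car .</instancelexicalization>
  <frameelements>
    <frameelement role="Driver" internalvariable="x1">
      <concept>http://dbpedia.org/resource/Robot</concept>
      <conceptlexicalization>The robot</conceptlexicalization>
    </frameelement>
    <frameelement role="Vehicle" internalvariable="x2">
      <concept>http://wordnet-rdf.princeton.edu/wn31/02961779-n</concept>
      <conceptlexicalization>the car .</conceptlexicalization>
    </frameelement>
  </frameelements>
</frameinstance>"""

root = ET.fromstring(xml_doc)
# Collect, per role, the linked concept URI and its lexicalization.
roles = {
    fe.get("role"): (fe.findtext("concept"), fe.findtext("conceptlexicalization"))
    for fe in root.iter("frameelement")
}
assert roles["Driver"] == ("http://dbpedia.org/resource/Robot", "The robot")
```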

if all the information is present and complete, it should be possible to recreate the instance lexicalization by applying the composition method of Basile and Bos (2013). The incomplete surface forms corresponding to the frame and the frame elements are automatically composed and compared to the original frame lexicalization. We ran this evaluation procedure on the resource and found that 7,366 instances are correctly regenerated, that is, about one in four instances. Of the remaining instances, 11,996 present incorrect instance lexicalizations, usually containing variables instead of being complete surface forms. These occurrences are caused by misalignments in the representation produced by Boxer, so that the composition algorithm cannot recreate the original surface form. For instance, for the sentence “The mother gave her baby a red apple”, the lexicalized DRG produced by Boxer, when the composition algorithm is applied to it, produces “The mother gave k5:x3 baby k4:x2”. We also found that in 5,211 cases the presence of subordination prevents the realization algorithm from working correctly, because no lexicalization is found for the discourse referent corresponding to the subordinate clause. In 1,865 cases, issues are caused by the presence of phrasal verbs (e.g. “He picked up his clothes”) or adverbs, which are analyzed by Boxer using the relation manner between the event and the adverb or preposition; thus, as in the previous case, no lexicalization is found for all the discourse referents. Finally, 3,779 instances failed the test due to a variety of reasons, e.g., failure of the entity linking module or a wrong syntactic analysis. Table 1 summarizes the findings exposed so far, broken down by the number of frame elements in the frame instances.

Table 1: Error analysis of the automatically produced, text-aligned frame instance collection, broken down by number of frame elements.

roles          1       2      3    all
correct        4,774   2,374  218  7,366
subordination  4,824   368    19   5,211
adverb         1,288   561    16   1,865
realization    5,885   5,508  603  11,996
other          2,672   1,009  98   3,779
total          19,443  9,820  954  30,217

As the number of frame elements per frame instance increases, the issues with subordinate constructions dramatically decrease: they amount to 24% of the cases with one frame element, against 3% and 1% with two and three frame elements respectively. Conversely, wrong realizations due to representation misalignments tend to get worse, involving 30% of the instances with one frame element, 56% with two, and 63% with three.
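These shares follow directly from the counts in Table 1; a quick recomputation (percentages truncated to whole numbers):

```python
# Recompute the per-column shares from the Table 1 counts.
# Columns: instances with one, two and three frame elements.
table = {
    "correct":       (4774, 2374, 218),
    "subordination": (4824, 368, 19),
    "adverb":        (1288, 561, 16),
    "realization":   (5885, 5508, 603),
    "other":         (2672, 1009, 98),
}
totals = [sum(row[i] for row in table.values()) for i in range(3)]

def share(category, column):
    """Percentage of the column total falling into a category (truncated)."""
    return int(100 * table[category][column] / totals[column])

assert totals == [19443, 9820, 954]
assert [share("subordination", i) for i in range(3)] == [24, 3, 1]
assert [share("realization", i) for i in range(3)] == [30, 56, 63]
```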


6 Generation of Frame Lexicalizations

The first and most obvious use for the resource presented here in the context of NLG is given by the set of lexicalizations it provides for concepts and entities. In the example in Figure 5, for instance, the DBpedia entity Robot is lexicalized as “A robot” and the synset 02961779-n as “the car”. Moreover, the frame is also given the lexicalization with two open variables “x1 is driving x2”. Indeed, the surface forms provided by the DRG can be incomplete, that is, containing variables that can be used to compose a full surface form from the single ones corresponding to the discourse referents, e.g., x1: “A robot” and e1: “x1 is driving x2” compose to form e1: “A robot is driving x2”, and so on.

This composition mechanism gives us the opportunity to devise a simple method to produce new frame lexicalizations. Given new concepts or entities with their respective lexicalizations and roles (e.g., Driver: “Valentino Rossi”, Vehicle: “the motorbike”), they can be replaced in the appropriate frame instance so that the variables x1 and x2 are linked respectively to “Valentino Rossi” and “the motorbike”. A subsequent step of composition will then yield the new frame lexicalization “Valentino Rossi is driving the motorbike”.
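The replacement-and-composition step can be sketched as substitution over the open variables of the frame lexicalization. This is a toy string-based sketch; the actual system composes DRG tuples rather than raw strings:

```python
# Toy sketch: slot new role lexicalizations into the open variables
# of a frame's partial surface form (naive string replacement).
def compose(frame_lex, fillers):
    """Replace each open variable (e.g. 'x1') with its lexicalization."""
    for var, lex in fillers.items():
        frame_lex = frame_lex.replace(var, lex)
    return frame_lex

frame_lex = "x1 is driving x2"
new_fillers = {"x1": "Valentino Rossi", "x2": "the motorbike"}
assert compose(frame_lex, new_fillers) == "Valentino Rossi is driving the motorbike"
```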

We developed a simple prototype in order to test this approach to NLG from frame instances. This prototype is based on the resource described in Section 5, restricted to the instances with exactly two frame elements and associated with a complete surface form. The procedure we use to evaluate the system is the following:

1. For each frame instance, produce four new frame instances by replacing one or both frame elements, either with similar concepts or with randomly chosen concepts.

2. Generate the lexicalization of the new frames by composing the frame lexicalization structure with the new concept lexicalizations.

3. For each of the four scenarios, randomly select one hundred instance lexicalizations for the evaluation.

4. Manually inspect the selected lexicalizations according to three possible classes of fluency: nonsensical (the sentence is not grammatical and it does not make sense), informative (the grammar contains mistakes but the information is clearly transmitted), and fluent (the lexicalization correctly conveys the input knowledge).

Table 2: Result of the manual evaluation of the NLG prototype based on the collection of lexicalized frame instances.

Replaced frame elements    Judgment (nonsensical/informative/fluent)
1, most similar            23/33/44
2, most similar            24/53/23
1, random                  23/35/42
2, random                  54/23/23

When we replace one frame element or both of them with similar concepts, we rely on the WUP similarity defined by Wu and Palmer (1994) for pairs of WordNet synsets, a measure of path distance weighted according to the depth of the WordNet taxonomy. We compute the WUP similarity for each pair of concepts in our collection and replace one or both frame elements with their most similar concepts. For example, the frame element corresponding to the Vehicle in the frame instance in Figure 5 is associated with the concept http://wordnet-rdf.princeton.edu/wn31/02961779-n (car, automobile). This concept could be replaced, for the sake of the evaluation, by the similar concept (according to the WUP metric) http://wordnet-rdf.princeton.edu/wn31/104497386-n (truck), if this is also in the collection. A new lexicalization is then produced by composition: “A robot is driving the truck”. The lexicalization for the replaced concept is chosen as the most frequent lexicalization of that particular concept, to minimize the occurrence of awkward realizations like “A robot is driving of the truck”.
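The WUP measure scores two synsets as twice the depth of their least common subsumer divided by the sum of their depths. A sketch on a hypothetical miniature taxonomy (real implementations, such as NLTK's wup_similarity, operate on the full WordNet graph):

```python
# WUP similarity on a toy is-a taxonomy (child -> parent); illustrative only.
PARENT = {
    "car": "motor_vehicle", "truck": "motor_vehicle",
    "motor_vehicle": "vehicle", "bicycle": "vehicle",
    "vehicle": "entity",
}

def ancestors(node):
    """Path from a node up to the root, node included."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path

def depth(node):
    return len(ancestors(node))  # the root has depth 1

def wup(a, b):
    # Least common subsumer: the deepest shared ancestor.
    lcs = max(set(ancestors(a)) & set(ancestors(b)), key=depth)
    return 2 * depth(lcs) / (depth(a) + depth(b))

# car and truck share motor_vehicle (depth 3); both have depth 4.
assert wup("car", "truck") == 0.75
assert wup("car", "truck") > wup("car", "bicycle")
```

The pair with the highest score (here car/truck rather than car/bicycle) is the one used for the replacement.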

Note that we only judge fluency. An evaluation of adequacy or other content-oriented metrics should also take into account the input, and would be more difficult to carry out in this setting, since here the input is artificially produced by replacing elements of the frame instances.

The manual inspection of the produced frame instance lexicalizations resulted in the figures shown in Table 2. As expected, replacing both frame elements instead of just one leads to more errors in the realizations. This problem can be mitigated by increasing the coverage of the resource. With a larger collection, the chance of retrieving a frame instance with at least one frame element in common with the new input is higher, thus there will be more cases where only one frame element is new. Interestingly, the choice of concepts to generate with respect to the frame (similar vs. random) does not seem to influence the outcome. The results of this pilot study are encouraging, in that a sufficiently large number of correct realizations are produced by a simple mechanism. However, a more thorough evaluation is needed, especially with respect to the coverage (and thus the scalability) of our approach.

7 Conclusion and Future Work

In this paper we introduced a novel methodology to extract knowledge from text and encode it in formal structures compatible with the standards of the Web. Such structures are essentially instances of frames with their frame elements linked to concepts in WordNet or DBpedia. This methodology is implemented in the freely available software package KNEWS. Next, we presented a collection of frame instances aligned with natural language, automatically created by parsing text for English learners. Finally, we presented a pilot study on how to use this resource to generate natural language from new frame instances.

In terms of future directions for this work, the low-hanging fruit is the enlargement of the resource, which will lead to a higher number of “good” instances to use for direct generation (as shown in Section 6) and more data to use for a statistical approach to generation. Since the resource is produced automatically by parsing raw text with KNEWS, and natural language is abundant on the Web, this is a direction we intend to take in the foreseeable future.

The approach to NLG based on the collection of lexicalized frame instances introduced here is at a preliminary stage, and many refinements can be made to the algorithm. Given a new frame instance to generate, its frame elements could be matched to the lexicalizations in the resource with more sophisticated methods, e.g., using distributional similarity.

As a possible extension to the resource, information such as lemma and number could be included in the lexicalization of concepts. With such information in place, the NLG algorithm could be interfaced with the SimpleNLG surface realization library (Gatt and Reiter, 2009) to produce more fluent lexicalizations.

The main selling point of a large knowledge base aligned with text is that its size allows researchers to develop statistical methods to learn a mapping between the formally encoded knowledge and natural language. While this could be a very challenging enterprise, as highlighted by the work presented in Basile (2015), this work constitutes a first step in this direction.

References

Eneko Agirre and Aitor Soroa. 2009. Personalizing PageRank for word sense disambiguation. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, EACL ’09, pages 33–41, Stroudsburg, PA, USA. Association for Computational Linguistics.

Collin F. Baker, Charles J. Fillmore, and John B. Lowe. 1998. The Berkeley FrameNet project. In Proceedings of the 17th International Conference on Computational Linguistics - Volume 1, COLING ’98, pages 86–90, Stroudsburg, PA, USA. Association for Computational Linguistics.

Valerio Basile and Johan Bos. 2013. Aligning formal meaning representations with surface strings for wide-coverage text generation. In ENLG 2013, page 1.

Valerio Basile, Johan Bos, Kilian Evang, and Noortje Venhuizen. 2012. Developing a large semantically annotated corpus. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012), pages 3196–3200, Istanbul, Turkey.

Valerio Basile, Elena Cabrio, and Fabien Gandon. 2016. Building a general knowledge base of physical objects for robots. In The Semantic Web. Latest Advances and New Domains.

Valerio Basile. 2014. A Lesk-inspired unsupervised algorithm for lexical choice from WordNet synsets. In The First Italian Conference on Computational Linguistics CLiC-it 2014, page 48.

Valerio Basile. 2015. From Logic to Language: Natural Language Generation from Logical Forms. Ph.D. thesis.

Anja Belz, Michael White, Dominic Espinosa, Eric Kow, Deirdre Hogan, and Amanda Stent. 2011. The first surface realisation shared task: Overview and evaluation results. In Proceedings of the 13th European Workshop on Natural Language Generation, ENLG ’11, pages 217–226, Stroudsburg, PA, USA. Association for Computational Linguistics.

Johan Bos. 2008. Wide-coverage semantic analysis with Boxer. In Semantics in Text Processing. STEP 2008 Conference Proceedings, volume 1, pages 277–286.

Nadjet Bouayad-Agha, Gerard Casamayor, Simon Mille, Marco Rospocher, Horacio Saggion, Luciano Serafini, and Leo Wanner. 2012. From ontology to NL: Generation of multilingual user-oriented environmental reports. In Gosse Bouma, Ashwin Ittoo, Elisabeth Metais, and Hans Wortmann, editors, Natural Language Processing and Information Systems, volume 7337 of Lecture Notes in Computer Science, pages 216–221. Springer Berlin Heidelberg.

Aoife Cahill and Josef Van Genabith. 2006. Robust PCFG-based generation using automatically acquired LFG approximations. In Proceedings of the 44th ACL.

James R. Curran, Stephen Clark, and Johan Bos. 2007. Linguistically motivated large-scale NLP with C&C and Boxer. In Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions, ACL ’07, pages 33–36, Stroudsburg, PA, USA. Association for Computational Linguistics.

Joachim Daiber, Max Jakob, Chris Hokamp, and Pablo N. Mendes. 2013. Improving efficiency and accuracy in multilingual entity extraction. In Proceedings of the 9th International Conference on Semantic Systems (I-Semantics).

Charles Fillmore. 1982. Frame semantics. Linguistics in the Morning Calm, pages 111–137.

Ulrich Furbach and Claudia Schon. 2016. Commonsense reasoning meets theorem proving. In Proceedings of the Workshop on Bridging the Gap between Human and Automated Reasoning.

Albert Gatt and Ehud Reiter. 2009. SimpleNLG: A realisation engine for practical applications. In Proceedings of the 12th European Workshop on Natural Language Generation, ENLG ’09, pages 90–93, Stroudsburg, PA, USA. Association for Computational Linguistics.

Bikash Gyawali and Claire Gardent. 2014. Surface realisation from knowledge-bases. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 424–434, Baltimore, Maryland, June. Association for Computational Linguistics.

Julia Hockenmaier and Mark Steedman. 2007. CCGbank: A corpus of CCG derivations and dependency structures extracted from the Penn Treebank. Computational Linguistics, 33(3):355–396, September.

Hongyan Jing. 1998. Usage of WordNet in natural language generation. In Proceedings of the COLING-ACL’98 Workshop on Usage of WordNet in Natural Language Processing Systems, pages 128–134.

Hans Kamp and Uwe Reyle. 1993. From Discourse to Logic. Introduction to Modeltheoretic Semantics of Natural Language, Formal Logic and Discourse Representation Theory. Kluwer, Dordrecht.

Edward Loper, Szu-ting Yi, and Martha Palmer. 2007. Combining lexical resources: Mapping between PropBank and VerbNet. In Proceedings of the 7th International Workshop on Computational Linguistics.

Mitchell P. Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz. 1993. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313–330.

George A. Miller. 1995. WordNet: A lexical database for English. Communications of the ACM, 38(11):39–41, November.

Andrea Moro, Alessandro Raganato, and Roberto Navigli. 2014. Entity linking meets word sense disambiguation: A unified approach. Transactions of the Association for Computational Linguistics (TACL), 2:231–244.

Martha Palmer. 2009. SemLink: Linking PropBank, VerbNet and FrameNet. In Proceedings of the Generative Lexicon Conference, Pisa, Italy, September.

Jacobo Rouces, Gerard de Melo, and Katja Hose. 2015. FrameBase: Representing n-ary relations using semantic frames. In Fabien Gandon, Marta Sabou, Harald Sack, Claudia d’Amato, Philippe Cudré-Mauroux, and Antoine Zimmermann, editors, ESWC, volume 9088 of Lecture Notes in Computer Science, pages 505–521. Springer.

Karin Kipper Schuler. 2005. VerbNet: A Broad-coverage, Comprehensive Verb Lexicon. Ph.D. thesis, Philadelphia, PA, USA. AAI3179808.

Mihai Surdeanu, Richard Johansson, Adam Meyers, Lluís Màrquez, and Joakim Nivre. 2008. The CoNLL-2008 shared task on joint parsing of syntactic and semantic dependencies. In Proceedings of the Twelfth Conference on Computational Natural Language Learning, CoNLL ’08, pages 159–177, Stroudsburg, PA, USA. Association for Computational Linguistics.

Michael White, Rajakrishnan Rajkumar, and Scott Martin. 2007. Towards broad coverage surface realization with CCG. In Proceedings of the Workshop on Using Corpora for NLG: Language Generation and Machine Translation (UCNLG+MT).

Zhibiao Wu and Martha Palmer. 1994. Verb semantics and lexical selection. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, ACL ’94, pages 133–138, Stroudsburg, PA, USA. Association for Computational Linguistics.

12

Page 21: Proceedings of the 2nd International Workshop on …2nd International Workshop on Natural Language Generation and the Semantic Web to be held on September 6th, 2016 in Edinburgh, Scotland.

Proceedings of the 2nd International Workshop on Natural Language Generation and the Semantic Web (WebNLG), pages 13–16, Edinburgh, Scotland, September 6th, 2016. ©2016 Association for Computational Linguistics

Processing Document Collections to Automatically Extract Linked Data: Semantic Storytelling Technologies for Smart Curation Workflows

Peter Bourgonje, Julian Moreno Schneider, Georg Rehm, Felix Sasaki
DFKI GmbH, Language Technology Lab

Alt-Moabit 91c, 10559 Berlin, Germany

{peter.bourgonje,julian.moreno_schneider,georg.rehm,felix.sasaki}@dfki.de

Abstract

We develop a system that operates on a document collection and represents the contained information to enable the intuitive and efficient exploration of the collection. Using various NLP, IE and Semantic Web methods, we generate a semantic layer on top of the collection, from which we take the key concepts. We define templates for structured reorganisation and rearrange the information related to the key concepts to fit the respective template. The use case of the system is to support knowledge workers (journalists, editors, curators, etc.) in their task of processing large amounts of documents by summarising the information contained in these documents and suggesting potential story paths that the knowledge worker can then process further.

1 Introduction and Context

Journalists writing a story typically have access to large databases that contain information relevant to their topic. Due to the novelty requirement, they are under pressure to produce a story in a short amount of time. They have to provide relevant background, but also present information that is new and eye-opening. Curators who design showrooms or museum exhibitions often cannot afford to spend much time on getting familiar with a new domain, due to the large variety of unrelated domains they work in. Several other job profiles rely on extracting key concepts from a large document collection on a specific domain and understanding how they are related to one another. We refer to the group of people facing this challenge as knowledge workers. The common ground of their tasks is the curation of digital information. In our two-year project we collaborate with four SME partner companies that cover four different use cases and sectors (Rehm and Sasaki, 2015). We develop technologies that enable knowledge workers to delegate routine tasks to the machine so that they can concentrate on their core task, i.e., producing a story that is based on a specific genre or text type and that relies on facts contained in a document collection. Among the tools that we develop and integrate into the emerging platform are semantic storytelling, named entity recognition, entity linking, temporal analysis, machine translation, summarisation, classification and clustering. We currently focus on making available RESTful APIs to our SME partners that provide basic functionalities that can be integrated into their own in-house systems. In addition, we work on implementing a system for semantic storytelling. This system will process large document collections, extract entities and relations between them, and extract temporal information and events in order to automatically produce a hypertext view of the collection that enables knowledge workers to familiarise themselves with the document collection in a fast and efficient way. We also experiment with automatically generating story paths through this hypertext cluster that can then be used as the foundation of a new piece of content. The focus of this paper is on the approach we use to fill templates that assist the user in the generation of a story, and on selecting possible topics for a new story.

2 Related Work

The emerging platform we develop connects all RESTful APIs that perform the individual analyses by using the FREME framework (Sasaki et al., 2015) throughout. A system with a similar setup is described in (Lewis et al., 2014), but unlike our platform, it is mainly targeted at the localisation industry and deploys a different approach to representing curated data. Other systems aimed at collecting and processing semantically enriched content are developed in the context of the NewsReader project [1], specifically targeted at the news domain, and the SUMMA project [2], which focuses on broad coverage of languages through the use of machine translation. Our approach towards language generation consists of filling templates to create building blocks for a story (which are not grammatical sentences); it deviates from, a.o., (Cimiano et al., 2013), (Galanis and Androutsopoulos, 2007) and (Bontcheva and Wilks, 2004) in that it does not focus on one specific domain and it includes collecting the information that goes into the ontology. Our approach is based on extracting relations between entities. Because of our focus on domain adaptability, we plan to avoid the labour-intensive selection of seed patterns that comes with (semi-)supervised approaches as described in (Xu et al., 2013), (Brin, 1998), (Agichtein and Gravano, 2000) and (Etzioni et al., 2005). We use an unsupervised approach instead, with an open set of relation types. A similar system is described in (Yates et al., 2007). For ease of integration, we implement an in-house approach to relation extraction, as described in Section 4.

3 Linked Data Generation

We produce a semantic layer over the document collection consisting of a set of annotations for every document, represented in NIF [3]. Frequent named entities are interpreted as key concepts. An essential step in producing the semantic layer is therefore the spotting and linking of named entities. One of our APIs performs entity spotting based on pre-trained sample models (Nothman et al., 2012) and linking based on DBpedia Spotlight [4]. Because the users of our tools work on a variety of specific domains, the platform provides the possibility to train a model, or to plug in a domain-specific ontology and upload a key-value structured dictionary. The NER module is based on the corresponding Apache OpenNLP module (Apache Software Foundation, 2016). The NER API generates a NIF representation of the input in which every recognised entity is annotated with the corresponding URI in DBpedia. Specialised SPARQL queries are used to retrieve type-specific related information (such as birth and death dates and places for persons, or geo-coordinates for locations). This information is added to the semantic layer and is used to fill particular slots in our story templates.

[1] http://www.newsreader-project.eu/
[2] http://www.summa-project.eu/
[3] See http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core/nif-core.html
[4] https://github.com/dkt-projekt/e-NLP contains code and documentation
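As an illustration, a type-specific query for persons might look as follows. This is a hypothetical sketch, not the query actually used by the system (the paper does not publish its queries); the property names are standard DBpedia ontology terms.

```python
def build_person_query(entity_uri: str) -> str:
    """Build an illustrative SPARQL query fetching biography-slot
    values (birth/death date and place) for a DBpedia person URI.
    OPTIONAL blocks keep the query from failing when a slot is absent."""
    return f"""
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?birthDate ?birthPlace ?deathDate ?deathPlace WHERE {{
  OPTIONAL {{ <{entity_uri}> dbo:birthDate  ?birthDate  . }}
  OPTIONAL {{ <{entity_uri}> dbo:birthPlace ?birthPlace . }}
  OPTIONAL {{ <{entity_uri}> dbo:deathDate  ?deathDate  . }}
  OPTIONAL {{ <{entity_uri}> dbo:deathPlace ?deathPlace . }}
}}"""

query = build_person_query("http://dbpedia.org/resource/Erich_Mendelsohn")
```

The results of such a query would be mapped onto the corresponding template slots before falling back to relation extraction for anything still missing.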

4 Semantic Storytelling

For the task of storytelling we use a template-filling approach. The user selects one or more concepts and one or more templates that we attempt to fill using the semantic layer. We have defined a biography template and an event template. The biography template contains the following slots: full name, pseudonym, date of birth, place of birth, date of death, place of death, mother, father, siblings, spouse, children, occupation, key persons and key locations. The event template contains slots for date(s), key persons and key locations. To collect the information needed to fill the individual slots in the templates, the results of the type-specific SPARQL queries are used. Information in the ontology is typically of higher quality than information extracted using a relation extraction component. However, because the user often works in domains for which no ontology is available, relation extraction is used to search for the missing information in the document collection. In addition to filling slots in templates, the output of relation extraction between entities allows the user to learn about relations between key concepts that are not directly related to any slots in the templates. This allows the user to get a better understanding of the document collection, but can also be the basis for selecting and populating a new event template, hence generating an angle for a new story. Relations are extracted in the following way: a document collection is assumed to be relatively homogeneous from a content perspective (file types and information type (running text or metadata) may vary, but the information in a collection is assumed to be about one domain). The goal is to aggregate information and extract relevant relations regardless of which document these originate from. The assumption is that in the NER procedure all relevant concepts have been annotated and are present in the semantic layer.
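A minimal data structure for the biography template described above might look like this. The slot names follow the paper; the class itself and the example values are our own illustration, not the system's actual representation.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional

@dataclass
class BiographyTemplate:
    # Slots as listed in the paper; None/empty means "not yet filled".
    full_name: Optional[str] = None
    pseudonym: Optional[str] = None
    date_of_birth: Optional[str] = None
    place_of_birth: Optional[str] = None
    date_of_death: Optional[str] = None
    place_of_death: Optional[str] = None
    mother: Optional[str] = None
    father: Optional[str] = None
    siblings: List[str] = field(default_factory=list)
    spouse: Optional[str] = None
    children: List[str] = field(default_factory=list)
    occupation: Optional[str] = None
    key_persons: List[str] = field(default_factory=list)
    key_locations: List[str] = field(default_factory=list)

    def unfilled_slots(self) -> List[str]:
        """Slots still missing, i.e. candidates for relation extraction."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

# Ontology lookup filled one slot; the rest would go to relation extraction.
bio = BiographyTemplate(full_name="Erich Mendelsohn", occupation="architect")
```

The `unfilled_slots` helper makes explicit which slots still need the fallback described next: searching the document collection itself.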
For every sentence containing two or more entities, a dependency tree is generated using the Stanford CoreNLP dependency parser (Manning et al., 2014). The entities are located in the tree, and a relation between entity A and entity B is established if A has a subject-type dependency relation to a verb node in the dependency graph and B has an object-type dependency relation to this same verb node. The value of this verb node (i.e. the token) is taken to express the relation. For collecting missing information related to specific slots in the template, we plan to include a dedicated relation extraction system and train it with a number of seeds that correspond directly to the slots in the template we want to fill. Because of the limited number of relation types needed for this, a (semi-)supervised, seed-based approach is likely to produce useful results. For selecting new angles for a story (in the form of events as the basis for a new template), we want to capture a larger set of relation types. Since the domain typically determines the predominant relation types present in a collection, this calls for a more open approach that is not limited to the relation types covered by the seeds. To give an idea of the current stage of our relation extraction component, a demo interface based on the Viking corpus is available at http://dev.digitale-kuratierung.de/ds/selectstory.php.
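The extraction rule above can be sketched as follows. This is a self-contained illustration over a toy pre-parsed sentence, not the actual CoreNLP-based implementation; the dependency labels are simplified to `nsubj`/`obj` prefixes.

```python
# Each token is (index, word, head_index, dep_label); head_index -1 marks
# the root. Toy dependency parse of "Edward married Edith".
def extract_relations(tokens, entity_indices):
    """Return (subject, verb, object) triples where two annotated entities
    attach to the same verb node via subject-/object-type relations."""
    relations = []
    for a in entity_indices:
        for b in entity_indices:
            if a == b:
                continue
            _, word_a, head_a, dep_a = tokens[a]
            _, word_b, head_b, dep_b = tokens[b]
            # Same governing verb, subject-type on A, object-type on B.
            if head_a == head_b and dep_a.startswith("nsubj") \
                    and dep_b.startswith("obj"):
                relations.append((word_a, tokens[head_a][1], word_b))
    return relations

sentence = [
    (0, "Edward", 1, "nsubj"),
    (1, "married", -1, "root"),
    (2, "Edith", 1, "obj"),
]
print(extract_relations(sentence, [0, 2]))  # [('Edward', 'married', 'Edith')]
```

This reproduces the [Edward, marry, Edith] style of triple reported from the Viking corpus; a real implementation would read the tree produced by the parser rather than hand-built tuples.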

5 Evaluation

To assess the suitability of the platform for the tasks of template filling and interactive collection exploration, we have asked four humans to perform these tasks and, in the process, evaluate the platform. For evaluation, three collections were used, two of them real-world use cases provided by the SME partners of the project: (i) a collection of letters sent by the architect Erich Mendelsohn and his wife [5] and (ii) a private document collection on Vikings. The third collection is the WikiWars corpus [6].

The subjects were asked to check how many slots in the templates could be filled with the relations extracted. For the biography template some relevant relations were found in the Viking corpus: [Edward, marry, Edith], which can fill the spouse slot, and [William, raise, Malcolm], which could fill the children slot (though it of course does not imply that Malcolm really is a child of William). Because we have not created a gold standard from these corpora, measuring recall is problematic. As a result, it is not clear whether the approach failed to extract the relations that could lead to populating templates, or whether this information was not present in the corpus. It is clear, however, that the real-world scenario would primarily rely on getting information from the ontology. An observation made by all test subjects was that the quality of the relations extracted from the WikiWars and Viking corpora was much higher than that of relations extracted from the Mendelsohn corpus. This is probably due to the fact that the first two corpora are meant to be descriptive and clear records of historical events, whereas the Mendelsohn corpus consists of private letters that were probably not intended to be descriptive for the general public. In addition, the Mendelsohn letters contained many cases of you and I, which were not recognised by the NER component (because resolution to a URI is not in all cases straightforward); hence these sentences were not considered. With regard to the task of exploring the document collection and selecting possible events to base a new story on, the subjects reported that several useful relations were extracted, again with the exception of the Mendelsohn corpus. Examples are [William [[extort, Harold], [leave, Aldred]]] from the Viking corpus and [Rome, annex, [Sardinia, Corsica]] and [Tripartite Pact, unite, [Italy, Germany, Japan]] from the WikiWars corpus. These pieces of information also display the advantage of a non-seed-based approach: to capture the above relations, we would have needed to train for relations of the extort-, leave-, annex- and unite-type. The point, however, is that the potentially interesting relation types are not known upfront.

[5] http://ema.smb.museum/en/home/
[6] http://timexportal.wikidot.com/forum/t-275092/wikiwars-is-available-for-download

A drawback of the current approach is that only binary relations (a predicate with two arguments) are extracted. Ditransitive verbs are not fully captured; instead, we end up with relations like [Edward, give, William], where an important piece of the expressed relation is missing. Another drawback is the requirement that the two entities be connected through the same verb node in the dependency graph. We have no concrete figures on recall, but the number of extracted relations was relatively low for all corpora. A third observed issue was errors introduced by the dependency parser. Relations like [United States, member, Allies] and [Poland, better, Russian Empire] were extracted, where apparently member and better were tagged as verbs and hence considered by our analysis.

6 Conclusion

We present a platform that generates a semantic layer over a document collection and applies relation extraction to fill templates that serve as building blocks for the generation of new stories. This information is extracted, analysed, in some cases rearranged, and presented to the user (i) in the form of filled-out templates or (ii) through an interactive tree-based exploratory view. Test users pointed out its usefulness for exploring the contents of a collection in a fast and intuitive way, as well as its shortcomings with regard to non-binary relations, low recall and lack of parsing precision. The most important suggestions for future work are looking into means of extracting indirect objects (or, more generally, relevant additional information for specific relations, such as location or time) and defining paths through the dependency graphs rather than requiring direct connections in the graph. In addition, we plan to plug in a verb ontology to be able to group the different types of relations we find.

References

Eugene Agichtein and Luis Gravano. 2000. Snowball: Extracting relations from large plain-text collections. In Proceedings of the Fifth ACM Conference on Digital Libraries, DL '00, pages 85–94, New York, NY, USA. ACM.

Apache Software Foundation. 2016. Apache OpenNLP. http://opennlp.apache.org.

Kalina Bontcheva and Yorick Wilks. 2004. Automatic report generation from ontologies: The MIAKT approach. In Farid Meziane and Elisabeth Métais, editors, Natural Language Processing and Information Systems, 9th International Conference on Applications of Natural Languages to Information Systems, NLDB 2004, Salford, UK, June 23–25, 2004, Proceedings, volume 3136 of Lecture Notes in Computer Science, pages 324–335. Springer.

Sergey Brin. 1998. Extracting patterns and relations from the world wide web. In WebDB Workshop at the 6th International Conference on Extending Database Technology, EDBT '98, pages 172–183.

Philipp Cimiano, Janna Lüker, David Nagel, and Christina Unger. 2013. Exploiting ontology lexica for generating natural language texts from RDF data. In Proceedings of the 14th European Workshop on Natural Language Generation, pages 10–19, Sofia, Bulgaria, August. Association for Computational Linguistics.

Oren Etzioni, Michael Cafarella, Doug Downey, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, and Alexander Yates. 2005. Unsupervised named-entity extraction from the web: An experimental study. Artificial Intelligence, 165(1):91–134.

D. Galanis and I. Androutsopoulos. 2007. Generating Multilingual Descriptions from Linguistically Annotated OWL Ontologies: the NaturalOWL System. In Proceedings of the 11th European Workshop on Natural Language Generation (ENLG 2007), pages 143–146, Schloss Dagstuhl, Germany.

David Lewis, Asunción Gómez-Pérez, Sebastian Hellmann, and Felix Sasaki. 2014. The role of linked data for content annotation and translation. In Proceedings of the 2014 European Data Forum, EDF '14.

Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Association for Computational Linguistics (ACL) System Demonstrations, pages 55–60.

Joel Nothman, Nicky Ringland, Will Radford, Tara Murphy, and James R. Curran. 2012. Learning multilingual named entity recognition from Wikipedia. Artificial Intelligence, 194:151–175.

Georg Rehm and Felix Sasaki. 2015. Digitale Kuratierungstechnologien – Verfahren für die effiziente Verarbeitung, Erstellung und Verteilung qualitativ hochwertiger Medieninhalte. In Proceedings of the 2015 International Conference of the German Society for Computational Linguistics and Language Technology, GSCL '15, pages 138–139.

Felix Sasaki, T. Gornostay, M. Dojchinovski, M. Osella, E. Mannens, G. Stoitsis, Philip Richie, T. Declerck, and Kevin Koidl. 2015. Introducing FREME: Deploying linguistic linked data. In Proceedings of the 4th Workshop of the Multilingual Semantic Web, MSW '15.

Feiyu Xu, Hans Uszkoreit, Hong Li, Peter Adolphs, and Xiwen Cheng. 2013. Domain Adaptive Relation Extraction for Semantic Web, chapter X. Springer.

Alexander Yates, Michael Cafarella, Michele Banko, Oren Etzioni, Matthew Broadhead, and Stephen Soderland. 2007. TextRunner: Open information extraction on the web. In Proceedings of Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations, NAACL-Demonstrations '07, pages 25–26, Stroudsburg, PA, USA. Association for Computational Linguistics.


Proceedings of the 2nd International Workshop on Natural Language Generation and the Semantic Web (WebNLG), pages 17–24, Edinburgh, Scotland, September 6th, 2016. ©2016 Association for Computational Linguistics

On the Robustness of Standalone Referring Expression Generation Algorithms Using RDF Data

Pablo Ariel Duboue, Martín Ariel Domínguez and Paula Estrella
Facultad de Matemática, Astronomía y Física
Universidad Nacional de Córdoba
Córdoba, Argentina

Abstract

A sub-task of Natural Language Generation (NLG) is the generation of referring expressions (REG). REG algorithms are expected to select attributes that unambiguously identify an entity with respect to a set of distractors. In previous work we defined a methodology to evaluate REG algorithms using real-life examples. In the present work, we evaluate REG algorithms using a dataset that contains alterations in the properties of referring entities. We found that naturally occurring ontological re-engineering can have a devastating impact on the performance of REG algorithms, with some algorithms more robust in the presence of these changes than others. The ultimate goal of this work is to observe the behaviour and estimate the performance of a series of REG algorithms as the entities in the data set evolve over time.

1 Introduction

The main research focus in NLG is the creation of computer systems capable of generating human-like language. According to the consensus Natural Language Generation (NLG) architecture (Cahill et al., 2001), the NLG task takes as input non-linguistic data and operates over it in a series of enrichment steps, culminating in fully specified sentences from which output strings can be read out. Such a generation pipeline mimics, up to a certain extent, a Natural Language Understanding (NLU) pipeline. In NLU, however, it is expected that the text upon which the system is being run might contain a variety of errors. These errors include wrongly written text that a human might find difficult to understand, or idiosyncratic deviations from well-accepted prose (what is called “improper grammar” or “orthographic mistakes” by defenders of prescriptive grammar). The fact that plenty of texts of interest to NLU exhibit poor quality explains NLU's focus on robust approaches. Such approaches attempt to cope gracefully with inputs that do not conform to the standards of the original texts employed for building the system (either as working examples or as training data in a machine learning sense). In NLG, on the other hand, current approaches rarely explore fallback strategies for those cases where the data is not fully compliant with the expected input and, thus, there is little intuition about possible outputs of a system under such circumstances.

In this work we aim to explore robustness for the particular case of Referring Expression Generation (REG) algorithms by means of different versions of an ontology. We can therefore combine REG algorithms with ontologies to study their behaviour as the entities in the chosen ontology change. In our case, we have chosen the ontology built from Wikipedia, through different versions of DBpedia, and three REG algorithms on which we measure robustness, defined here as an algorithm's resilience to changes in the data, or its capability to gracefully deal with noisy data. In a sense, we are interested in two different phenomena: (1) whether an NLG subcomponent (REG in particular) can be used with outdated ontological data to fulfil its task, and (2) which implementation of the said subcomponent is better suited in this setting. Our research is driven by the second question, but the current work sheds more light on the first one. See Section 7 for details.

This paper is structured as follows: the next section briefly reviews relevant related work; Sections 3 and 4 describe the algorithms applied and the data used, respectively; Section 5 describes the setup used in the experiments; Section 6 presents the results, which are further discussed in Section 7.

2 Related work

The need to account for changes in ontologies has long been acknowledged, given that ontologies may not be useful in real-world applications if the representation of the knowledge they contain is outdated. Eder and Koncilia (2004) present a formalism to represent ontologies as graphs that contain a time model, including time intervals and valid times for concepts. They base their formalism on techniques developed for temporal databases, namely the versioning of databases instead of their evolution, and they provide some guidelines on its possible implementation.

Another source of ontology transformation is spatiotemporal change. Dealing with spatial changes on historical data (or over time series) is crucial for some NLP tasks, such as information retrieval (Kauppinen and Hyvönen, 2007). In their case, the authors deal with the evolution of the ontology's underlying domain instead of its versioning or evolution due to developments or refinements. Their main result is the definition of partial overlaps between concepts in a given time series, which was applied to build a Finnish Temporal Region Ontology, showing promising results.

More recently, with the development of one of the largest ontologies, DBpedia (Bizer et al., 2009), much research has been devoted to exploiting this resource in NLP or NLG tasks as well as to modeling its changes. For example, there is research on modeling DBpedia's currency (Rula et al., 2014), that is, the age of the data in it and the speed at which those changes can be captured by any system. Although currency could be computed based on the modification/creation dates of the resources, this information is not always present in Wikipedia pages. To overcome this, the authors propose a model to estimate currency combining information from the original related pages and a couple of currency metrics measuring the speed of retrieval by a system and basic currency or timestamp. Their experiments suggest that entities with high system currency are associated with more complete DBpedia resources, and entities with low system currency appear associated with Wikipedia pages that are not easily tractable (or that “could not provide real world information”, according to the authors).

Closer to our work, Kutlak et al. (2013) use DBpedia for REG. As opposed to classical work in the field, this work experiments with entities that potentially have a large number of distractors, a situation that may be difficult to handle for classical REG algorithms. Under this hypothesis, the authors propose a new corpus-based algorithm inspired by the notion of Communal Common Ground (CCG), defined as information shared by a particular community whose members assume that everyone knows it. In an extrinsic evaluation, CCG outperformed the more classical Incremental Algorithm (IA) (Dale and Reiter, 1995); the authors thus suggest that CCG might be more suitable for large domains than other algorithms.

In previous work (Pacheco et al., 2012), we employed several REG algorithms to generate referring expressions from DBpedia; in particular, we used Full Brevity (Bohnet, 2007), a Constraint Satisfaction approach (Gardent, 2002) and the Incremental Algorithm (Dale and Reiter, 1995). Exploiting articles from Wikinews to generate the referring expressions for a randomly selected group of entities (a technique we also used in this work), we found that DBpedia contained information about half of the entities mentioned in the news, and that the Incremental Algorithm and the Constraint Satisfaction approach were able to generate a definite description in about 98% of the contexts extracted from the news articles. However, the algorithms produced satisfactory definite descriptions in only about 40% of the cases. The main problems identified in these experiments were that some properties are quite unique but lead to descriptions of little use to most people evaluating them (thus producing the low results obtained), and the rather odd choice of the preference order of the Incremental Algorithm. Our focus on robustness, however, differs from that work.

In an earlier version of this work (Duboue et al., 2015), we analysed only people, rather than people and organisations, and had an experimental mishap where the old version included tuples from the new version of DBpedia, producing results that were too optimistic. Even with this issue, the previous results encouraged us to use robustness as a metric for learning the IA ordering, an intriguing idea we explored recently, with mixed results (Duboue and Domínguez, 2016).

18

Page 27: Proceedings of the 2nd International Workshop on …2nd International Workshop on Natural Language Generation and the Semantic Web to be held on September 6th, 2016 in Edinburgh, Scotland.

3 Algorithms

In this section we describe the REG algorithms used in this work. We chose three representative algorithms that can deal with single-entity referents: a classic algorithm (Dale and Reiter, 1995), an algorithm generating negations (Gardent, 2002) and an algorithm using graph theory (Krahmer et al., 2003). We describe each of the algorithms using the following notation: R is the referent, C is the set of distractors and P is a list of properties, triples of the form (entity, attribute, value), describing R.

REG algorithms pick properties that might ultimately be used to generate nominal syntactic units that identify the entity that is the referent. [1] We define the context as the set of entities that the receiver is currently paying attention to. The distractor set is then the context set without the intended referent. Once this is defined, the algorithms treat the components of a referring expression as rules to set aside, or keep considering, the members of the contrast set. For example, if the speaker wants to identify a small black dog in a situation where the distractor set consists of a big white dog and a small black cat, we could choose the adjective black to rule out the white dog and then the noun dog to rule out the cat. The resulting referring expression would be the black dog, which refers to R but not to the other entities in this context; thus, R is unmistakably identified.

The algorithms listed here are implemented in the Java programming language and are publicly available under an Open Source license as part of the Alusivo REG project. [2]

3.1 Incremental Algorithm

The Incremental Algorithm assumes the properties in P are ordered according to an established criterion. The algorithm then iterates over P, adding each triple one at a time and removing from C all entities ruled out by the new triple. Triples that do not eliminate any new entity from C are ignored. The algorithm terminates when C is empty. This algorithm was created in 1995 (Dale and Reiter, 1995) as a simplification of previous work on the development of REG algorithms. Given its simplicity, it is considered a baseline in many NLG articles.

[1] At this level in the generation pipeline, the system operates on abstract semantic representations; the actual words and syntactic forms are left to other components, such as the lexical chooser and the surface generator. In this discussion we use nominal phrases to illustrate the topics, but with the understanding that they are the output of the full system, not of the REG algorithm alone.
[2] https://github.com/DrDub/Alusivo

This algorithm is strongly influenced by the preference order among attributes. We used the ordering for people in Wikipedia developed by Pacheco (2012), which is shipped with Alusivo.
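The Incremental Algorithm described above can be sketched in a few lines. This is an illustrative Python rendering of the textbook algorithm, not the Java code shipped with Alusivo; properties are (attribute, value) pairs and the preference order is given explicitly.

```python
def incremental_algorithm(referent_props, distractors, preference_order):
    """Dale & Reiter (1995)-style property selection.
    referent_props: dict attribute -> value for the referent R.
    distractors: list of dicts (attribute -> value), the set C.
    preference_order: attribute names in order of preference."""
    selected = []
    remaining = list(distractors)
    for attr in preference_order:
        if attr not in referent_props:
            continue
        value = referent_props[attr]
        # Does this property rule out at least one remaining distractor?
        survivors = [d for d in remaining if d.get(attr) == value]
        if len(survivors) < len(remaining):
            selected.append((attr, value))
            remaining = survivors
        if not remaining:
            return selected  # referent unambiguously identified
    return None  # failure: some distractors could not be ruled out

# The small-black-dog example from the text.
dog = {"type": "dog", "colour": "black", "size": "small"}
context = [{"type": "dog", "colour": "white", "size": "big"},
           {"type": "cat", "colour": "black", "size": "small"}]
print(incremental_algorithm(dog, context, ["colour", "type", "size"]))
# → [('colour', 'black'), ('type', 'dog')]
```

With the preference order colour > type > size, black rules out the white dog and dog rules out the cat, yielding "the black dog" exactly as in the running example.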

3.2 Gardent Algorithm

The algorithm (Gardent, 2002) is based on the idea that, in many languages, a possible way to unambiguously describe entities is to identify a set of related referents and provide a quantified expression; for example, "the team where Messi has played that is not in Argentina" should suffice to identify Barcelona Fútbol Club as the referred entity. The speaker offers enough information for the listener to identify the set of objects the speaker is talking about. From a generation perspective, this means that, starting with a set of objects and their properties known to both the speaker and the listener, a distinctive description must be created that allows the user to unmistakably identify the referred objects. The solution from this standpoint is an algorithm that generates minimal distinct descriptions, that is, descriptions with the least number of literals needed to identify the target. By definition, these will not be unnecessarily long, redundant, nor ambiguous. The algorithm performs this task using Constraint Satisfaction Programming (CSP) (Lassez, 1987)3 to solve two basic constraints: find a set of positive properties P+ and a set of negative properties P−, such that all properties in P+ are true for the referent and all in P− are false, and P+ ∪ P− is the smallest set such that for every c ∈ C there exists a property in P+

that does not hold for c or a property in P− thatholds for c.

We further reuse the orderings from the incremental algorithm for the search process used by the constraint solver.
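The two constraints can be illustrated with a brute-force search for the smallest P+ ∪ P− (our sketch; the actual implementation uses the Choco CSP solver, and the entities below are invented):

```python
from itertools import combinations

# Brute-force stand-in for the CSP: try literal sets smallest-first and
# return the first one that is distinctive for the referent.

def minimal_description(referent, distractors, properties):
    """properties: entity -> set of (attribute, value) pairs.
    Returns (P_plus, P_minus) with the fewest literals, or None."""
    universe = set().union(*properties.values())
    positives = properties[referent]
    candidates = sorted(universe)
    for size in range(1, len(candidates) + 1):
        for combo in combinations(candidates, size):
            p_plus = {p for p in combo if p in positives}
            p_minus = {p for p in combo if p not in positives}
            # Distinctive: every distractor violates some chosen literal.
            if all(any(p not in properties[d] for p in p_plus) or
                   any(p in properties[d] for p in p_minus)
                   for d in distractors):
                return p_plus, p_minus
    return None

teams = {
    "FC_Barcelona": {("player", "Messi"), ("country", "Spain")},
    "Newells":      {("player", "Messi"), ("country", "Argentina")},
    "PSG":          {("player", "Messi"), ("country", "France")},
}
# A single positive literal suffices here.
print(minimal_description("FC_Barcelona", ["Newells", "PSG"], teams))
```

The smallest-first enumeration guarantees the returned description is minimal, which is what the CSP search achieves far more efficiently on realistic property sets.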

3.3 Graph Algorithm

The graph-based REG algorithm (Krahmer et al., 2003) constructs an initial directed graph G that models the original problem. The nodes in G belong to {R} ∪ C and the edges represent the properties P. The algorithm recursively constructs a

3We employed the Choco CSP solver Java library: http://choco-solver.org/.


sub-graph of G, V, starting with node R. At each step it explores the space of successors of the nodes in G that are not in V. The search is guided by a cost function that is calculated each time a node and edge are added to V. The goal of the algorithm is to check whether the properties in V serve to distinguish R from C, meaning that V is distinctive. To verify whether V is distinctive, at each step the algorithm searches for the existence of a sub-graph Gc isomorphic to V, where Gc contains a node c ∈ C; if no such graph exists, then V is distinctive. Finally, the least expensive distinctive graph is returned.

Different cost functions are possible. For our experiments, we use a cost function equal to the number of edges plus the number of vertices in the sub-graph, and we order the search over edges using the same ordering as the incremental algorithm. This is the particular parameterization of the graph algorithm that we compare against the other algorithms.

4 Data

We have chosen Wikipedia as our source of entities, as it represents one of the biggest freely available knowledge bases. Started in January 2001, it contains at present over 37 million articles in 284 languages, and it continues to grow thanks to the collaborative creation of content by thousands of users around the globe.

Given that the content in Wikipedia pages is stored in a structured way, it is possible to extract and organize it in an ontology-like manner, as implemented in the DBpedia community project. This is accomplished by mapping the Wikipedia infoboxes from each page to a curated shared ontology that contains 529 classes and around 2,330 different properties. DBpedia contains the knowledge from 111 different language editions of Wikipedia and, for English, the knowledge base consists of more than 400 million facts describing 3.7 million things (Lehmann et al., 2015). A notable feature of this resource is that it is freely available for download in the form of dumps, or it can be consulted using specific tools developed to query it.

These dumps contain the information encoded in a language called Resource Description Framework (RDF) (Lassila et al., 1998). The World Wide Web Consortium (W3C) developed RDF to encode the knowledge present in web pages so that it is comprehensible and exploitable by agents during any

information search. RDF is based on the concept of making statements about (Web) resources using expressions in subject-predicate-object form. These expressions are known as triples, where the subject denotes the resource being described and the predicate denotes a characteristic of the subject, describing the relation between the subject and the object. A collection of such RDF statements can be formally represented as a labeled directed multi-graph, naturally appropriate for representing ontologies.
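As an illustration (ours, not from the original), a handful of RDF-style triples can be stored and read back as a labeled directed multi-graph; the resource names follow DBpedia conventions but are chosen for the example:

```python
from collections import defaultdict

# RDF-style (subject, predicate, object) triples viewed as a labeled
# directed multi-graph: node -> list of (edge_label, target) pairs.
triples = [
    ("dbr:Lionel_Messi", "dbo:team", "dbr:FC_Barcelona"),
    ("dbr:Lionel_Messi", "dbo:birthDate", "1987-06-24"),
    ("dbr:FC_Barcelona", "dbo:ground", "dbr:Camp_Nou"),
]

graph = defaultdict(list)
for s, p, o in triples:
    graph[s].append((p, o))

print(graph["dbr:Lionel_Messi"])
# [('dbo:team', 'dbr:FC_Barcelona'), ('dbo:birthDate', '1987-06-24')]
```

This adjacency view is exactly the structure the graph-based REG algorithm of Section 3.3 operates on.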

We have chosen to use the dumps of two different versions of DBpedia, namely version 2014 (09/2014) and version 3.6 (01/2011). The DBpedia 3.6 ontology encompasses 359 classes and 1,775 properties (800 object properties, 859 datatype properties using standard units, 116 datatype properties using specialized units), and the DBpedia 2014 ontology encompasses 685 classes and 2,795 properties (1,079 object properties, 1,600 datatype properties using standard units, 116 datatype properties using specialized units).4 These versions were specifically selected: the 2014 version for current up-to-date data and the 3.6 version for comparison with the results by Pacheco et al. (2012).

5 Experimental setup

We follow an approach similar to Pacheco et al. (2012) to extract REG tasks from journalistic text: we extract all people that appear explicitly linked in a given Wikinews article. By using Wikinews, we ensure all the people are disambiguated to their DBpedia URIs by construction (Figure 1).5 We selected the Wikinews dump closest to our target DBpedia (20140901). From there, we define all URIs for which DBpedia has a birthDate relation (761,830 entities) as "people" and all entities with a foundDate as "organizations" (19,694 entities). We extract all such people and organizations that appear in the same Wikinews article using the provided inter-wiki SQL links file. For each article, we randomly chose a person as the referent, turning it into a fully defined REG task. This approach produced 4,741 different REG tasks over 9,660 different people, and 3,062 tasks over 8,539

4Statistics taken from the DBpedia change log available at http://wiki.dbpedia.org/services-resources/datasets/change-log.

5These are potential REG tasks, but not actual REG tasks. We use the news articles to extract naturally co-occurring entities.


Algorithm      Execution Errors   Dice   Omission Errors   Inclusion Errors
People
Incremental    232 (5%)           0.48   1,406 (50%)       145 (5%)
Gardent        0 (0%)             0.58   1,089 (36%)       554 (18%)
Graph          15 (0%)            0.38   1,870 (62%)       20 (0%)
Organizations
Incremental    1,386 (45%)        0.69   305 (31%)         3 (0%)
Gardent        829 (27%)          0.70   338 (22%)         357 (23%)
Graph          934 (31%)          0.06   1,347 (94%)       2 (0%)

Table 1: Our results, over 3,051 different REG tasks for people and 2,370 for organizations. The error percentages are computed over the total number of executed tasks.

organizations. Each REG task has an average of 4.89 people and 2.79 organizations.

We then created a subset of the relevant tuples for these people (291,039 tuples on DBpedia 2014 and 129,782 on DBpedia 3.6, a 224% increase6) and organizations (468,041 tuples on DBpedia 2014 and 216,730 on DBpedia 3.6, a 216% increase) by extracting all tuples where any of the people or organizations were involved, either as subject or object of the statement. Our algorithms were executed over these subsets.

As we are interested in REs occurring after first mentions, we filter out properties that unequivocally identify the entity, such as a person's full name or the GPS location of an organization's headquarters.

6 Results

We ran the three representative algorithms that can deal with single-entity referents, described in Section 3. As all our algorithms can fail to produce a RE, the first column in our results table (Table 1) contains the number of execution errors. The algorithms can fail either because they have a timeout argument (as in Graph) or because they have a restricted set of heuristics (as in IA) that might fail to produce a unique description. In the event of unknown entities (36% of people tasks and 23% of organization tasks contain an entity unknown to DBpedia 3.6), the Gardent algorithm can attempt to produce a RE using negations (under a closed world assumption), while our implementation of the Graph algorithm will also attempt to build a RE if the unknown entity is not the referent. This seldom happens, and we report only numbers on

6In DBpedia 2014, there was an average of 30.12 properties per person, while in DBpedia 3.6 the average was 17.3.

fully defined tasks, which means the algorithms could be run on only 64% of people tasks and 77% of organization tasks. Using the RE obtained on the old version of DBpedia, we executed it on the new version and computed the Dice set similarity coefficient between the two sets. However, Dice has its own problems when it comes to evaluating REG results (van Deemter and Gatt, 2009), and we thus computed two extra metrics: inclusion errors and omission errors. Inclusion errors mean the RE chosen on the old version of DBpedia included extra distractors when applied to the new version of DBpedia. Omission errors mean the RE chosen on the old version of DBpedia failed to include the referent in the new version (this figure includes all execution errors).
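The three metrics can be made precise with a short sketch (our rendering, not the paper's code; entity names are invented):

```python
# Dice set similarity plus omission/inclusion error flags for a referring
# expression (RE) evaluated on old vs. new DBpedia.

def dice(a, b):
    """Dice coefficient between the entity sets an RE selects in each version."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

def error_flags(referent, old_set, new_set):
    """Omission: the referent is lost in the new version.
    Inclusion: the RE picks up extra distractors in the new version."""
    omission = referent not in new_set
    inclusion = bool(new_set - old_set)
    return omission, inclusion

old = {"dbr:A"}              # the RE uniquely identified A on old DBpedia
new = {"dbr:A", "dbr:B"}     # on new DBpedia the same RE also matches B
print(dice(old, new))                  # 2/3: sets overlap on A only
print(error_flags("dbr:A", old, new))  # (False, True): an inclusion error
```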

The number of inclusion errors is somewhat in line with our expectations for a more detailed ontological resource: as more data is known about the world, properties that seemed a good fit in a knowledge-poor situation all of a sudden become too general. For example, trying to distinguish a politician A from two other politicians B, C when we do not have a record that B and C are politicians will lead us to a RE (x is a politician) that will overgenerate on a newer, richer ontology.

The number of omission errors was, however, puzzling at first sight and wholly unacceptable. A certain level of omission errors can be expected (referring to a former prime minister as a prime minister will result in an omission error), but a 3-year span cannot justify 50% omission errors. Further error analysis reveals two key changes in DBpedia that result in this behavior: a re-engineering of its type system (dropping a large number of types, such as 'Politician') and dropping language annotations for values (from "22nd"@en to "22nd", i.e., dropping the @en). These two changes account for 90% of the omission errors and an even greater percentage of the errors produced by the Graph algorithm on organizations (the 0.06 Dice in the table).

Now, with further engineering on our behalf (or by choosing a different version of DBpedia where these changes have already been ironed out), we could run an experiment that sheds more light on which REG algorithm is most robust. However, for the sake of our first research question, we find these results quite enlightening: these ontological changes were difficult to spot (the new version was 200% bigger, so there was little reason to expect that many types were dropped). Moreover, type information is key for most REG algorithms. We believe this highlights a different point of view needed when designing NLG subcomponents that operate on real data in a changing world.

Given this, our results are too preliminary to fully compare REG algorithms, but as a first result, the CSP approach seems to perform consistently at the top. We also found a few interesting examples, presented in the appendix.

All our code and data are available online for further analysis.7

7 Conclusions

We set out to investigate the robustness of an NLG subcomponent when applied to Web data in RDF format. We were interested in two different phenomena: (1) whether an NLG subcomponent (REG in particular) can be used with outdated ontological data to fulfill its task, and (2) which implementation of said subcomponent is better suited to this setting. Our research is driven by the second question, but this current work sheds more light on the first one: we found that, off the bat, one quarter of the entities of interest, using three-year-old data, will be unknown. Handling unknown ontological entities is a problem that has received no attention in NLG as far as we can tell (compare this to dealing with OOV words in NLU, a well-defined task and problem). Moreover, we found that in the particular span we have chosen, the type system underwent massive re-engineering, which in turn renders the old referring expres-

7https://github.com/DrDub/Alusivo andhttps://duboue.net/data.

Former [[New Mexico]] {{w|Governor of New Mexico|governor}} {{w|Gary Johnson}} ended his campaign for the {{w|Republican Party (United States)|Republican Party}} (GOP) presidential nomination to seek the backing of the {{w|Libertarian Party (United States)|Libertarian Party}} (LP).

Figure 1: Wikinews example, from http://en.wikinews.org/wiki/U.S._presidential_candidate_Gary_Johnson_leaves_GOP_to_vie_for_the_LP_nom, adapted from Pacheco et al. (2012).

sions meaningless for this exercise8 (and renders their associated resources, such as lexicons, stale). Given these errors, it is still too early to conclude which of the three REG algorithms we analyzed fares better in this setting, but we found early evidence in favor of the constraint satisfaction algorithm proposed by Gardent (2002). We also believe there is space for a new REG algorithm designed with resiliency in mind, one that seeks to produce REs that hold better over time.

Our comparison has been done over three specifically parameterized versions of the chosen algorithms. We cannot conclude whether the differences among them are due to differences in the algorithms themselves or in their parameterizations. We believe a follow-up study measuring the impact of different parameterizations in this setting is merited.

Also in future work, we plan to simulate natural perturbations on the data in order to find the conditions under which REG algorithms start to fail (for example, a simulated DBpedia of 25 years in the future).

Acknowledgments

We would like to thank the anonymous reviewers as well as Annie Ying and Victoria Reggiardo.

References

C. Bizer, J. Lehmann, G. Kobilarov, S. Auer, C. Becker,

R. Cyganiak, and S. Hellmann. 2009. DBpedia - a crystallization point for the web of data. Web Semantics: Science, Services and Agents on the World Wide Web, 7(3):154–165.

B. Bohnet. 2007. is-fbn, is-fbs, is-iac: The adaptation of two classic algorithms for the generation of referring expressions in order to produce expressions like

8Whether or not the REs still hold in the real world is a topic we did not investigate, given the size of our evaluation.


humans do. In MT Summit XI, UCNLG+MT, pages 84–86.

Lynne Cahill, John Carroll, Roger Evans, Daniel Paiva, Richard Power, Donia Scott, and Kees van Deemter. 2001. From RAGS to riches: exploiting the potential of a flexible generation architecture. In Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics, pages 106–113. Association for Computational Linguistics.

R. Dale and E. Reiter. 1995. Computational interpretations of the Gricean maxims in the generation of referring expressions. Cognitive Science, 19(2):233–263.

Pablo Duboue and Martín Domínguez. 2016. Using robustness to learn to order semantic properties in referring expression generation. In Proceedings of Iberamia 2016 (to appear).

Pablo Duboue, Martín Domínguez, and Paula Estrella. 2015. Evaluating robustness of referring expression generation algorithms. In Proceedings of the Mexican International Conference on Artificial Intelligence 2015. IEEE Computer Society.

Johann Eder and Christian Koncilia. 2004. Modelling changes in ontologies. In Proceedings of On The Move - Federated Conferences (OTM 2004), LNCS 3292, pages 662–673. Springer.

C. Gardent. 2002. Generating minimal definite descriptions. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 96–103. Association for Computational Linguistics.

Tomi Kauppinen and Eero Hyvönen. 2007. Modeling and reasoning about changes in ontology time series. In Integrated Series in Information Systems, pages 319–338. Springer-Verlag.

Emiel Krahmer, Sebastiaan van Erk, and André Verleg. 2003. Graph-based generation of referring expressions. Computational Linguistics, 29:53–72.

Roman Kutlak, Kees van Deemter, and Chris Mellish. 2013. Generation of referring expressions in large domains. PRE-CogSci, Berlin.

Jean-Louis Lassez. 1987. Logic programming. InProceedings of the Fourth International Conference.

Ora Lassila and Ralph R. Swick. 1998. Resource Description Framework (RDF) model and syntax specification. World Wide Web Consortium (W3C).

Jens Lehmann, Robert Isele, Max Jakob, Anja Jentzsch, Dimitris Kontokostas, Pablo N. Mendes, Sebastian Hellmann, Mohamed Morsey, Patrick van Kleef, Sören Auer, and Christian Bizer. 2015. DBpedia - a large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web Journal, 6(2):167–195.

Fabian Pacheco, Pablo Ariel Duboue, and Martín Ariel Domínguez. 2012. On the feasibility of open domain referring expression generation using large scale folksonomies. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT '12, pages 641–645, Stroudsburg, PA, USA. Association for Computational Linguistics.

Fabian Pacheco. 2012. Evaluación en gran escala de algoritmos clásicos para generación de expresiones referenciales. Master's thesis, Facultad de Matemática, Astronomía y Física, Universidad Nacional de Córdoba.

Anisa Rula, Luca Panziera, Matteo Palmonari, and Andrea Maurino. 2014. Capturing the currency of DBpedia descriptions and get insight into their validity. In Proceedings of the 5th International Workshop on Consuming Linked Data (COLD 2014), co-located with the 13th International Semantic Web Conference (ISWC 2014), Riva del Garda, Italy, October 20, 2014.

Kees van Deemter and Albert Gatt. 2009. Beyond Dice: measuring the quality of a referring expression. In Proceedings of the Workshop on Production of Referring Expressions: Bridging Computational and Psycholinguistic Approaches.


Figure 2: Appendix: Selected Runs

• Distinguish Saddam Hussein from Paul Volcker, Kofi Annan, Boutros Boutros-Ghali

Incr     orderInOffice: "President of Iraq"
Gardent  orderInOffice: "President of Iraq"
Graph    activeYearsEndDate: 2006-12-30

All algorithms perform well.

• Distinguish Daniel Vettori from Kyle Mills

Incr     country: New Zealand
Gardent  NOT country: New Zealand national cricket team
*Graph   description: "New Zealand cricketer"

Here the accuracy of the information comes into play. Both people are New Zealand cricketers, but the information on Mills is poorer than on Vettori. The REs for all algorithms are incorrect but work due to the lack of data on Mills. In the new version of DBpedia the description attribute for Mills has been added, and now Graph fails. The other two algorithms still work, even though they should not.

• Distinguish Park Geun-hye from Martin Dempsey

Incr     type: Office Holder
Gardent  NOT type: Military Person
*Graph   birth year: 1952

This is very curious: both people were born in the same year, but that information was missing in the old version of DBpedia for the distractor.

• Distinguish Paul McCartney from Ringo Starr, John Lennon, George Harrison

Incr     instrument: Höfner 500/1
Gardent  NOT associated musical artist: Plastic Ono Band
*Graph   background: solo singer

In the old DBpedia, McCartney was the only Beatle marked as a solo singer, while in the new version all of them are. Note how Gardent picks not having played in the Plastic Ono Band as McCartney's most distinguishing feature from the rest.

• Distinguish Vazgen Sargsyan from Karen Demirchyan, Paruyr Hayrikyan, Serzh Sargsyan, Hovik Abrahamyan

Incr     type: Office Holder, Politician, Prime Minister
Gardent  type: Prime Minister
*Graph   death date: 1999-10-27

Both Sargsyan and Demirchyan died in a tragic shooting at the Armenian parliament. That information was not recorded in the old DBpedia for Demirchyan, leading to the error.


Proceedings of the 2nd International Workshop on Natural Language Generation and the Semantic Web (WebNLG), pages 25–28, Edinburgh, Scotland, September 6th, 2016. ©2016 Association for Computational Linguistics

Content Selection through Paraphrase Detection: Capturing Different Semantic Realisations of the Same Idea

Elena Lloret
University of Alicante

Alicante, [email protected]

Claire Gardent
CNRS/LORIA
Nancy, France

[email protected]

1 Introduction

Summarisation can be seen as an instance of Natural Language Generation (NLG), where "what to say" corresponds to the identification of relevant information, and "how to say it" is associated with the final creation of the summary. When dealing with data coming from the Semantic Web (e.g., RDF triples), the challenge arises of how a good summary can be produced. For instance, given the RDF properties from an infobox of a Wikipedia page, how could a summary expressed in natural language text be generated? And how could this summary sound as natural as possible (i.e., be an abstractive summary), far from being only a bunch of selected sentences output together (i.e., an extractive summary)? This would imply being able to successfully map the RDF information to a semantic representation of natural language sentences (e.g., predicate-argument (pred-arg) structures). Towards the long-term objective of generating abstractive summaries from Semantic Web data, the specific goal of this paper is to propose and validate an approach to map linguistic structures that can encode the same meaning with different words (e.g., sentence-to-sentence, pred-arg-to-pred-arg, RDF-to-TEXT) using continuous semantic representations of text. The idea is to decide the level of document representation to work with, convert the text into that representation, and perform a pairwise comparison to decide to what extent two pairs can be mapped or not. To achieve this, different methods were analysed, including traditional WordNet-based ones, as well as more recent ones based on word embeddings. Our approach was tested and validated in the context of document-abstract sentence mapping, to check whether it is appropriate for identifying important information. The results showed good performance, thus indicating that we can rely on the

approach and apply it to further contexts (e.g., mapping RDFs into natural language).

The remainder of this paper is organised as follows: Section 2 outlines related work. Section 3 explains the proposed approach for mapping linguistic units. Section 4 describes our dataset and experiments. Section 5 provides the results and discussion. Finally, Section 6 draws the main conclusions and highlights possible future directions.

2 Related Work

Abstractive summarisation is one of the most challenging issues to address automatically, since it requires both deep language understanding and generation with a strong semantic component. For tackling this task, approaches usually need to define an internal representation of the text, which can be in the form of SVO triples (Genest and Lapalme, 2011), basic semantic units consisting of actor-action-receiver (Li, 2015), or pred-arg structures (Khan et al., 2015). In this latter work, pred-arg structures extracted from different related documents are compared, so that common or redundant information can be grouped into clusters. For computing a similarity matrix, WordNet1-based similarity metrics are used, mainly relying on the semantic distance between concepts given WordNet's hierarchy.

On the other hand, previous works on linguistic structure mapping can be related to paraphrase identification (Fernando and Stevenson, 2008; Xu et al., 2015), as well as to pred-arg alignment (Wolfe et al., 2015; Roth and Frank, 2015). However, these works only use semantic similarity metrics based on WordNet or other semantic resources, such as ConceptNet2 or FrameNet3.

1 https://wordnet.princeton.edu/
2 http://conceptnet5.media.mit.edu/
3 https://framenet.icsi.berkeley.edu/fndrupal/


The use of continuous semantic representations, and in particular the learning or use of Word Embeddings (WE), has been shown to be a more appropriate and powerful approach for representing linguistic elements (words, sentences, paragraphs or documents) (Turian et al., 2010; Dai et al., 2015). Given their good performance, they have recently been applied to many natural language generation tasks (Collobert et al., 2011; Kågebäck et al., 2014). The work presented in (Perez-Beltrachini and Gardent, 2016) proposes a method to learn embeddings to lexicalise RDF properties, showing also the potential of using this type of representation for the Semantic Web.

3 Our Mapping Approach

Our approach mainly consists of three stages: i) identification and extraction of text semantic structures; ii) representation of these semantic structures in a continuous vector space; and iii) definition and computation of the similarity between two representations.

For the first stage, depending on the level defined for the linguistic elements (e.g., a clause, a sentence, a paragraph), text processing is carried out using the appropriate tools to obtain the desired structures (e.g., sentence segmentation, semantic role labelling, syntactic parsing, etc.). Then, in the second stage, we represent each structure through its WEs. If the structure consists of more than one element, we compute the final vector as the composition of the WEs of each of the elements it contains. This is a common strategy that has been previously adopted, in which addition or product normally lead to the best results (Mitchell and Lapata, 2008; Blacoe and Lapata, 2012; Kågebäck et al., 2014). Finally, the aim of the third stage is to define a similarity metric between the vectors obtained in the second stage.
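A minimal sketch of stages (ii) and (iii), using additive composition and cosine similarity (our illustration; the toy 2-dimensional vectors stand in for real word embeddings, which have 50–300 dimensions):

```python
import math

def compose(vectors):
    """Additive composition: element-wise sum of the word vectors."""
    return [sum(v[i] for v in vectors) for i in range(len(vectors[0]))]

def cosine(u, v):
    """Cosine similarity between two composed vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

structure_a = [[1.0, 0.0], [0.0, 1.0]]   # word vectors of one structure
structure_b = [[2.0, 0.0], [0.0, 2.0]]   # word vectors of another
print(cosine(compose(structure_a), compose(structure_b)))   # 1.0
```

The same pipeline applies whether the structure is a full sentence or a pred-arg structure; only the set of word vectors being composed changes.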

4 Dataset and Approach Configuration

The English training collection of documents and abstracts from the Single document Summarization task (MSS)4 of MultiLing 2015 was used as corpus. It consisted of 30 Wikipedia documents on heterogeneous topics (e.g., the history of Texas University, the fauna of Australia, or Magic Johnson) and their abstracts, which corresponded to the introductory paragraphs of the Wikipedia

4http://multiling.iit.demokritos.gr/pages/view/1532/task-mss-single-document-summarization-data-and-information

page. Documents were rather long, with 3,972 words on average (the longest document had 8,348 words and the shortest 2,091), whereas abstracts were 274 words on average (the maximum value was 305 words and the minimum 243), thus resulting in a very low compression ratio5 of around 7%.
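A quick arithmetic check of the reported ratio from the average lengths:

```python
# Average abstract length over average document length gives the ~7% figure.
avg_abstract_words = 274
avg_document_words = 3972

compression_ratio = avg_abstract_words / avg_document_words
print(f"{compression_ratio:.1%}")   # 6.9%
```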

For carrying out the experiments, our approach receives document-abstract pairs as input. These correspond to the source documents, as well as the abstracts associated with those documents. Following the stages defined in Section 3, both were segmented into sentences, and the pred-arg structures were automatically identified using the SENNA semantic role labeller.6 Different configurations were tested for the WE and the similarity metrics used in the second and third stages. For representing either sentences or pred-arg structures, GloVe pre-trained WE vectors (Pennington et al., 2014) were used, specifically the ones derived from the Wikipedia 2014 + Gigaword 5 corpora, containing around 6 billion tokens, and the ones derived from a Common Crawl, with 840 billion tokens. Regarding the similarity metrics, the WordNet-based metrics included the shortest path between synsets, Leacock-Chodorow similarity, Wu-Palmer similarity, Resnik similarity, Jiang-Conrath similarity, and Lin similarity, all of them implemented in NLTK.7 For the WE settings, the similarity metrics were computed on the basis of cosine similarity and Euclidean distance. These latter metrics were applied upon the two composition methods for sentence embedding representations: addition and product, as described in (Blacoe and Lapata, 2012). In the end, a total of 38 distinct configurations were obtained.

5 Evaluation and Discussion

We addressed the validation of the source document-abstract pairs mapping as an extrinsic task using ROUGE (Lin, 2004). ROUGE is a well-known tool employed for summarisation evaluation, which computes the overlap between an automatic and a reference summary in terms of n-grams (unigrams: ROUGE-1; bigrams: ROUGE-2, etc.). Our assumption behind this type of evaluation was that, considering the

5The compression ratio is the size of the summary with respect to the source document, i.e., the percentage of relevant information to be kept.

6 http://ronan.collobert.com/senna/
7 http://www.nltk.org/


                         ROUGE-1             ROUGE-2             ROUGE-SU4
                         R     P     F       R     P     F       R     P     F
TEXT baseline            41.63 40.64 41.11   10.11 9.87  9.99    15.67 15.29 15.45
Best TEXT + WORDNET      42.04 41.58 41.78   11.40 11.25 11.32   16.55 16.34 16.43
Best TEXT + WE           50.36 47.99 49.12   17.35 16.56 16.94   22.51 21.46 21.96
PRED-ARG baseline        34.64 34.05 34.24   7.19  7.09  7.12    12.25 12.04 12.10
Best PRED-ARG + WORDNET  38.45 38.45 38.39   9.97  9.98  9.96    14.79 14.80 14.77
Best PRED-ARG + WE       46.88 45.17 45.97   15.18 14.53 14.84   20.02 19.23 15.60

Table 1: Results (in percentages) for the extrinsic validation of the mapping.

source document snippets of the top-ranked mapping pairs and directly building a summary with them (i.e., an extractive summary), good ROUGE results should be obtained if the mapping was good enough.
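The n-gram overlap that ROUGE computes can be sketched as clipped n-gram recall (a simplified stand-in for the official toolkit, which adds stemming, stopword options, and precision/F variants):

```python
from collections import Counter

def rouge_n_recall(candidate, reference, n=1):
    """Fraction of reference n-grams covered by the candidate (clipped)."""
    def ngrams(text):
        tokens = text.split()
        return Counter(tuple(tokens[i:i + n])
                       for i in range(len(tokens) - n + 1))
    cand, ref = ngrams(candidate), ngrams(reference)
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / max(1, sum(ref.values()))

print(rouge_n_recall("the fauna of australia is unique",
                     "the fauna of australia consists of unique animals"))
# 0.625: 5 of the 8 reference unigrams are covered
```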

Table 1 reports the most relevant results obtained. As baselines, we considered the direct ROUGE comparison between the sentences (or pred-arg structures) of the source document and the ones in the abstract (TEXT baseline and PRED-ARG baseline, respectively). We report the results for ROUGE-1, ROUGE-2 and ROUGE-SU4.8 The results obtained show that representing the semantics of a sentence or pred-arg structure using WE leads to the best results, improving on those from traditional WordNet-based similarity metrics. The best approach for the WE configuration corresponds to the addition composition method with cosine similarity, using the pre-trained WE derived from Wikipedia+Gigaword. Compared to the state of the art in summarisation, the results with WE are also encouraging, since previously published results on the same corpus (Alcon and Lloret, 2015) are close to 44% (F-measure for ROUGE-1).

Concerning the comparison between using the whole text and using only the pred-arg structures, the former obtains better results. This is logical, since the more text there is to compare, the higher the chances of obtaining similar n-grams when evaluating with ROUGE. However, this also limits the capability of abstractive summarisation systems, since we would end up selecting the sentences as they are, thus restricting the method to a purely extractive one. Nevertheless, the results obtained with pred-arg structures are still reasonably acceptable, and this type of structure would make it possible to generalise the key content to be selected, which should later be rephrased as a proper sentence, producing an abstractive summary. Next, we provide as examples the top 3 pair alignments (source document — abstract) of the highest performing configuration using pred-arg structures. The value in brackets is the similarity percentage obtained by our approach.

8 ROUGE-SU4 accounts for skip-bigrams with a maximum gap length of 4.

protected areas — protected areas (100%)

the insects comprising 75% of Australia's known species of animals — The fauna of Australia consists of a huge variety of strange and unique animals; some 83% of mammals, 89% of reptiles, 90% of fish and insects (99.94%)

European settlement, direct exploitation of native fauna, habitat destruction and the introduction of exotic predators and competitive herbivores led to the extinction of some 27 mammal, 23 bird and 4 frog species. — Hunting, the introduction of non-native species, and land-management practices involving the modification or destruction of habitats led to numerous extinctions (99.93%)

Finally, our intuition behind the results obtained (maximum values of 50%) is that not all the information in the abstract can be mapped to information in the source document, indicating that a proper abstract may contain extra information that comes from the world knowledge of its author.

6 Conclusion and Future Work

This paper presented an approach to automatically map linguistic structures using continuous semantic representations of sentences. The analysis conducted over a wide set of configurations showed that the use of WEs improves the results compared to traditional WordNet-based metrics, thus being suitable for data-to-text NLG approaches that need to align content from the Semantic Web with text in natural language. As future work, we plan to evaluate the approach intrinsically and apply it to map non-linguistic information (e.g., RDF) to natural language. We would also like to use the proposed method to create positive and negative training instances to learn classification models for content selection.

Acknowledgments

This research work has been partially funded by the University of Alicante, Generalitat Valenciana, the Spanish Government and the European Commission through the projects "Explotación y tratamiento de la información disponible en Internet para la anotación y generación de textos adaptados al usuario" (GRE13-15) and "DIIM2.0: Desarrollo de técnicas Inteligentes e Interactivas de Minería y generación de información sobre la web 2.0" (PROMETEOII/2014/001), TIN2015-65100-R, TIN2015-65136-C2-2-R, and SAM (FP7-611312), respectively.

References

Oscar Alcón and Elena Lloret. 2015. Estudio de la influencia de incorporar conocimiento léxico-semántico a la técnica de análisis de componentes principales para la generación de resúmenes multilingües. Linguamática, 7(1):53–63, July.

William Blacoe and Mirella Lapata. 2012. A comparison of vector-based representations for semantic composition. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 546–556, Jeju Island, Korea, July. Association for Computational Linguistics.

Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. 2011. Natural language processing (almost) from scratch. J. Mach. Learn. Res., 12:2493–2537, November.

Andrew M. Dai, Christopher Olah, and Quoc V. Le. 2015. Document embedding with paragraph vectors. CoRR, abs/1507.07998.

Samuel Fernando and Mark Stevenson. 2008. A semantic similarity approach to paraphrase detection. In Computational Linguistics UK (CLUK 2008) 11th Annual Research Colloquium.

Pierre-Etienne Genest and Guy Lapalme. 2011. Framework for abstractive summarization using text-to-text generation. In Proceedings of the Workshop on Monolingual Text-To-Text Generation, MTTG '11, pages 64–73, Stroudsburg, PA, USA. Association for Computational Linguistics.

Atif Khan, Naomie Salim, and Yogan Jaya Kumar. 2015. A framework for multi-document abstractive summarization based on semantic role labelling. Appl. Soft Comput., 30(C):737–747, May.

Mikael Kågebäck, Olof Mogren, Nina Tahmasebi, and Devdatt Dubhashi. 2014. Extractive summarization using continuous vector space models. In Proceedings of the 2nd Workshop on Continuous Vector Space Models and their Compositionality (CVSC), pages 31–39, Gothenburg, Sweden, April. Association for Computational Linguistics.

Wei Li. 2015. Abstractive multi-document summarization with semantic information extraction. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1908–1913, Lisbon, Portugal, September. Association for Computational Linguistics.

Chin-Yew Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Marie-Francine Moens and Stan Szpakowicz, editors, Text Summarization Branches Out: Proceedings of the ACL-04 Workshop, pages 74–81, Barcelona, Spain, July. Association for Computational Linguistics.

Jeff Mitchell and Mirella Lapata. 2008. Vector-based models of semantic composition. In Proceedings of ACL-08: HLT, pages 236–244, Columbus, Ohio, June. Association for Computational Linguistics.

Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.

Laura Perez-Beltrachini and Claire Gardent. 2016. Learning embeddings to lexicalise RDF properties. In *SEM 2016: The Fifth Joint Conference on Lexical and Computational Semantics, Berlin, Germany.

Michael Roth and Anette Frank. 2015. Inducing implicit arguments from comparable texts: A framework and its applications. Computational Linguistics, 41:625–664.

Joseph Turian, Lev Ratinov, and Yoshua Bengio. 2010. Word representations: A simple and general method for semi-supervised learning. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL '10, pages 384–394, Stroudsburg, PA, USA. Association for Computational Linguistics.

Travis Wolfe, Mark Dredze, and Benjamin Van Durme. 2015. Predicate argument alignment using a global coherence model. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 11–20, Denver, Colorado, May–June. Association for Computational Linguistics.

Wei Xu, Chris Callison-Burch, and Bill Dolan. 2015. SemEval-2015 Task 1: Paraphrase and semantic similarity in Twitter (PIT). In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pages 1–11, Denver, Colorado, June. Association for Computational Linguistics.


Proceedings of the 2nd International Workshop on Natural Language Generation and the Semantic Web (WebNLG), pages 29–36, Edinburgh, Scotland, September 6th, 2016. ©2016 Association for Computational Linguistics

Aligning Texts and Knowledge Bases with Semantic Sentence Simplification

Yassine Mrabet1,3, Pavlos Vougiouklis2, Halil Kilicoglu1, Claire Gardent3, Dina Demner-Fushman1, Jonathon Hare2, and Elena Simperl2

1 Lister Hill National Center for Biomedical Communications, National Library of Medicine, USA
{mrabety,kilicogluh,ddemner}@mail.nih.gov

2 Web and Internet Science Research Group, University of Southampton, UK
{pv1e13,jsh2,es}@ecs.soton.ac.uk

3 CNRS/LORIA, France
[email protected]

Abstract

Finding the natural language equivalent of structured data is both a challenging and promising task. In particular, an efficient alignment of knowledge bases with texts would benefit many applications, including natural language generation, information retrieval and text simplification. In this paper, we present an approach to build a dataset of triples aligned with equivalent sentences written in natural language. Our approach consists of three main steps. First, target sentences are annotated automatically with knowledge base (KB) concepts and instances. The triples linking these elements in the KB are extracted as candidate facts to be aligned with the annotated sentence. Second, we use textual mentions referring to the subject and object of these facts to semantically simplify the target sentence via crowdsourcing. Third, the sentences provided by different contributors are post-processed to keep only the most relevant simplifications for the alignment with KB facts. We present different filtering methods, and share the constructed datasets in the public domain. These datasets contain 1,050 sentences aligned with 1,885 triples. They can be used to train natural language generators as well as semantic or contextual text simplifiers.

1 Introduction

A large part of the information on the Web is contained in databases and is not suited to be directly accessed by human users. A proper exploitation of these data requires relevant visualization techniques which may range from simple tabular presentation with meaningful queries, to graph generation and textual description. This last type of visualization is particularly interesting as it produces an additional raw resource that can be read by both computational agents (e.g. search engines) and human users. From this perspective, the ability to generate high quality text from knowledge and data bases could be a game changer.

In the Natural Language Processing community, this task is known as Natural Language Generation (NLG). Efficient NLG solutions would allow displaying the content of knowledge and data bases to lay users; generating explanations, descriptions and summaries from ontologies and linked open data1; or guiding the user in formulating knowledge-base queries.

However, one strong and persistent limitation to the development of adequate NLG solutions for the Semantic Web is the lack of appropriate datasets on which to train NLG models. The difficulty is that the semantic data available in knowledge and data bases need to be aligned with the corresponding text. Unfortunately, this alignment task is far from straightforward. In fact, both human beings and machines perform poorly on it.

1 http://www.linkeddata.org


Nonetheless, there has been much work on data-to-text generation and different strategies have been used to create the data-to-text corpora that are required for learning and testing. Two main such strategies can be identified. One strategy consists in creating a small, domain-specific corpus where data and text are manually aligned by a small group of experts (often the researchers who work on developing the NLG system). Typically, such corpora are domain specific and of relatively small size while their linguistic variability is often restricted.

A second strategy consists in automatically building a large data-to-text corpus in which the alignment between data and text is much looser. For instance, Lebret et al. (2016) extracted a corpus consisting of 728,321 biography articles from English Wikipedia and created a data-to-text corpus by simply associating the infobox of each article with its introduction section. The resulting dataset has a vocabulary of 403k words but there is no guarantee that the text actually matches the content of the infobox.

In this paper, we explore a middle-ground approach and introduce a new methodology for semi-automatically building large, high quality data-to-text corpora. More precisely, our approach relies on a semantic sentence simplification method which allows transforming existing corpora into sentences aligned with KB facts. Contrary to manual methods, our approach does not rely on having a small group of experts to identify alignments between text and data. Instead, this task is performed (i) by multiple, independent contributors through a crowdsourcing platform, and (ii) by an automatic scoring of the quality of the contributions, which enables a faster and more reliable data creation process. Our approach also departs from the fully automatic approaches (e.g., Lebret et al. (2016)) in that it ensures a systematic alignment between text and data.

In the following section we present work related to corpus generation for NLG. In section 3 we describe our approach. Section 4 presents the experiments, evaluations, and the statistics on the initial corpora and the generated (aligned) datasets.

2 Related Work

Many studies have tackled the construction of datasets for natural language generation. Several available datasets were created by researchers and developers working on NLG systems. Chen and Mooney (2008) created a dataset of text and data describing the Robocup game. To collect the data, they used the Robocup simulator (www.robocup.org) and derived symbolic representations of game events from the simulator traces using a rule-based system. The extracted events are represented as atomic formulas in predicate logic with timestamps. For the natural language portion of the data, they had humans comment on games while watching them on the simulator. They manually aligned logical formulas to their corresponding sentences. The resulting data-to-text corpus contains 1,919 scenarios where each scenario consists of a single sentence representing a fragment of a commentary on the game, paired with a set of logical formulas.

The SumTime-Meteo corpus was created by the SumTime project (Sripada et al., 2002). The corpus was collected from the commercial output of five different human forecasters, and each instance in the corpus consists of three numerical data files produced by three different weather simulators, and the weather forecast file written by the forecaster. To train a sentence generator, Belz (2008) created a version of the SumTime-Meteo corpus which is restricted to wind data. The resulting corpus consists of 2,123 instances for a total of 22,985 words and was used by other researchers working on NLG and semantic parsing (Angeli et al., 2012).

Other data-to-text corpora were proposed for training and testing generation systems, including WeatherGov (Liang et al., 2009), the ATIS dataset, the Restaurant Corpus (Wen et al., 2015) and the BAGEL dataset (Mairesse et al., 2010). WeatherGov consists of 29,528 weather scenarios for 3,753 major US cities. In the air travel domain, the ATIS dataset (Dahl et al., 1994) consists of 5,426 scenarios. These are transcriptions of spontaneous utterances of users interacting with a hypothetical online flight-booking system. The Restaurant Corpus contains utterances that a spoken dialogue system might produce in an interaction with a human user, together with the corresponding dialogue act. Similarly, the BAGEL dataset is concerned with restaurant information in a dialogue setting.

In all these approaches, datasets are created using heuristics, often involving extensive manual labour and/or programming. The data is mostly created artificially from sensor or web data rather than extracted from an existing knowledge base. As the data are often domain specific, the vocabulary size and the linguistic variability of the target text are often restricted.

Other approaches tackled the benchmarking of NLG systems and provided the constructed dataset as a publicly available resource. For instance, a Surface Realisation shared task was organised in 2011 to compare and evaluate sentence generators (Belz et al., 2011). The dataset prepared by the organisers was derived from the Penn Treebank and associates sentences with both a shallow representation (dependency trees) and a deep representation where edges are labelled with semantic roles (e.g., agent, patient) and the structure is a graph rather than a tree. While the data-to-text corpus that was made available from this shared task was very large, the representation associated with each sentence is a linguistic representation and is not related to a data schema.

The KBGen shared task (Banik et al., 2013) followed a different approach and focused on generating sentences from knowledge bases. For this task, knowledge base fragments were extracted semi-automatically from an existing biology knowledge base (namely, BioKB101 (Chaudhri et al., 2013)) and expert biologists were asked to associate each KB fragment with a sentence verbalising its meaning. The resulting dataset was small (207 data-text instances for training, 70 for testing) and the creation process relied heavily on domain experts, thereby limiting its portability.

In sum, there exists so far no standard methodology for rapidly creating data-to-text corpora that are both sufficiently large to support the training and testing of NLG systems and sufficiently precise to support the development of natural language generation approaches that can map KB data to sentences. The procedures designed by individual researchers to test their own proposals yield data in non-standard formats (e.g., tabular information, dialogue acts, infoboxes) and are often limited in size. Data used in shared tasks either fail to associate sentences with knowledge base data (SR shared task) or require extensive manual work and expert validation.

3 Methods

Our approach tackles the conversion of existing textual corpora into a dataset of sentences aligned with <subject, predicate, object> triples collected from existing KBs. It is independent of the selected corpus, domain, or KB.

In the first step, we automatically annotate the target textual corpus by linking textual mentions to knowledge base concepts and instances (KB entities for short). In the second step, we collect triples from the knowledge bases that link the entities mentioned in a given sentence. In the third step, we keep only the mentions that refer to the subject and object of the same triple and perform semantic simplification with a crowdsourcing task. Finally, we apply several post-processing algorithms, including clustering and scoring, to keep the most relevant semantic simplifications of each sentence as a natural language expression of the set of collected triples.

The alignment that we aim to achieve is not binary: as an output of our approach, one sentence can be aligned with N triples (N ≥ 1). This property is particularly interesting for NLG, as it allows training generation systems on expressing sets of triples in the same sentence, enabling the production of more fluent texts.

3.1 Corpus Annotation and Initial Sentence Selection

In the following we present our methods to obtain automatic initial annotations of the target corpora and to select the sentences that will be used in the final aligned dataset.

3.1.1 Corpus Annotation

In order to have varied empirical observations, we use two different methods for initial corpus annotation. In the first annotation method, we do not check whether the candidate triples are actually expressed in the sentence, only their subjects and objects. This method is particularly suitable for discovering new linguistic expressions of triple predicates, and can provide actual expressions of the triple by accumulating observations from different sentences.

To implement this method we use KODA (Mrabet et al., 2015) to link textual mentions to KB entities. KODA is an unsupervised entity linking tool that relies only on the KB contents to detect and disambiguate textual mentions. More precisely, it detects candidate textual mentions with a TF-IDF search on the labels of KB entities, and disambiguates them by maximizing the coherence between the candidate KB entities retrieved for each


mention using KB relations.

In the second step, we query the KB (e.g., the SPARQL endpoint of DBpedia) to obtain the predicates that link the KB entities mentioned in the sentence and keep them as candidate facts. For instance, the 8 highlighted terms in figure 1 were linked to DBpedia entities, but only 4 terms mention KB entities that are linked in DBpedia triples.

This first method is scalable w.r.t. the domain of interest, as it can be ported to other KBs with the same implementation.
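A minimal sketch of this candidate-fact collection step, assuming the relevant KB triples have already been retrieved into memory (in practice this would be a SPARQL query such as SELECT ?p WHERE { dbr:Albert_Sacco ?p dbr:STS-73 } against the DBpedia endpoint). The triples and entity names below are the ones from the example in figure 1:

```python
# Illustrative stand-in for the result of SPARQL queries against DBpedia.
TRIPLES = {
    ("dbr:Albert_Sacco", "dbo:mission", "dbr:STS-73"),
    ("dbr:STS-73", "dbp:landingSite", "dbr:Kennedy_Space_Center"),
    ("dbr:STS-73", "dbp:launchSite", "dbr:Kennedy_Space_Center"),
}

def candidate_facts(entities, triples=TRIPLES):
    """Keep the KB triples whose subject AND object are both mentioned
    in the sentence (i.e., appear among the linked entities)."""
    ents = set(entities)
    return sorted(t for t in triples if t[0] in ents and t[2] in ents)

# Entities linked by the entity linker for the example sentence.
linked = ["dbr:Albert_Sacco", "dbr:STS-73", "dbr:Kennedy_Space_Center",
          "dbr:Payload_Specialist", "dbr:1995"]
facts = candidate_facts(linked)
```

Entities such as dbr:Payload_Specialist are linked in the text but participate in no triple between mentioned entities, so they yield no candidate fact.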

In the second annotation method, we perform the automatic annotation by checking that the triples are actually expressed in the sentence. We use SemRep (Rindflesch and Fiszman, 2003), a biomedical relation extraction system. SemRep extracts binary relations from unstructured texts. The subject and object of these relations are concepts from the UMLS Metathesaurus (Lindberg et al., 1993) and the predicate is a relation type from an expanded version of the UMLS Semantic Network (e.g., treats, diagnoses, stimulates, inhibits). SemRep uses MetaMap (Aronson and Lang, 2010) to link noun phrases to UMLS Metathesaurus concepts. For example, the 4 highlighted terms in figure 2 were linked to UMLS concepts and all terms mention either the subject or the object of a relation extracted with SemRep.

In both methods, we keep only the annotationsthat refer to subjects and objects of candidate facts.

3.1.2 Initial Sentence Selection

Due to the unsupervised nature of automatic annotation and the incompleteness of the KBs, some sentences are expected to be annotated more heavily than others, and some sentences are expected to have more triples associated with them than others. In practice, different targets of annotation (e.g., specific semantic categories) could also lead to similar discrepancies.

In order to train automatic sentence simplifiers with our datasets, we have to consider different levels of coverage that can correspond to different annotation tools and dissimilar annotation goals. Accordingly, once the initial corpus is annotated, we select three sets of sentences: (1) a first set of sentences that are heavily annotated w.r.t. the number of triples (e.g., between 5 and 10 tokens per triple), (2) a second set with average annotation coverage (e.g., between 10 and 20 tokens per triple), and (3) a third set of weakly annotated sentences (e.g., above 20 tokens per triple).
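The selection above reduces to a tokens-per-triple ratio test. The sketch below uses the example thresholds given in the text; the function name and the handling of sentences without triples are illustrative assumptions:

```python
def coverage_bucket(n_tokens, n_triples):
    """Assign a sentence to a coverage set by its tokens-per-triple ratio
    (thresholds follow the example ranges given in the text)."""
    if n_triples == 0:
        return None  # no candidate facts: the sentence cannot be aligned
    ratio = n_tokens / n_triples
    if ratio <= 10:
        return "heavy"    # e.g. between 5 and 10 tokens per triple
    if ratio <= 20:
        return "average"  # e.g. between 10 and 20 tokens per triple
    return "weak"         # above 20 tokens per triple
```

For example, an 18-token sentence with 2 candidate triples (ratio 9) falls in the heavily annotated set, while a 25-token sentence with a single triple falls in the weakly annotated one.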

3.2 Semantic Sentence Simplification (S3)

In order to obtain the final dataset of KB facts aligned with natural language sentences from the initial automatically annotated corpus, we define the task of Semantic Sentence Simplification (S3) and introduce the crowdsourcing process used to perform it.

Definition. Given a sentence S, a set of textual mentions M(S) linked to a set of KB instances and concepts E(S), and a set of triples T(S) = {ti(ei1, pi, ei2)} such that ei1 ∈ E(S) and ei2 ∈ E(S), the semantic simplification task consists of shortening the sentence S as much as possible according to the following rules:

• Keep the textual mentions referring to the subject and object of candidate facts.

• Keep the relations expressed between these textual mentions in the sentence.

• Keep the order of the words from the original sentence as much as possible.

• Ensure that the simplified sentence is grammatical and meaningful.

• Avoid using external words to the extent possible.

Crowdsourcing. We asked contributors to provide simplifications for each sentence through a crowdsourcing platform. We highlighted the textual mentions referring to subjects and objects of candidate facts in these sentences. The contributors were then asked to follow the S3 requirements to shorten the sentences. The quality requirement set during the experiment was that each contributor should dedicate at least 15 seconds to each set of 3 sentences.

After several preliminary experiments, we opted for a crowdsourcing process without quiz questions to attract more participants, and we monitored the process closely to filter out irrelevant contributors such as spammers (e.g., people typing in random letters), foreign-language speakers who misunderstood the task and tried to provide translations of the original sentence, and contributors who simply copied the original sentence. By flagging such contributors we also significantly optimized the monitoring for the second corpus.
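Part of this monitoring can be automated. The sketch below shows two illustrative heuristics, not the exact procedure used in the experiments: flagging verbatim copies of the original sentence, and flagging contributions that share almost no vocabulary with it (random typing, attempted translations). The 0.3 threshold is an assumption:

```python
def is_suspect(original, simplification):
    """Return a flag for obviously irrelevant contributions, or None.
    Heuristics and threshold are illustrative assumptions."""
    if simplification.strip() == original.strip():
        return "copy"  # contributor simply copied the original sentence
    orig_words = set(original.lower().split())
    simp_words = set(simplification.lower().split())
    # Fraction of the contribution's vocabulary that occurs in the original.
    overlap = len(orig_words & simp_words) / max(len(simp_words), 1)
    if overlap < 0.3:
        return "off-task"  # random letters, translation attempts, etc.
    return None
```

Contributions flagged this way can be queued for manual review rather than rejected outright, since a legitimate simplification always reuses words of the original by construction of the S3 rules.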


Sacco flew as a payload specialist on STS-73, which launched on October 20, 1995, and landed at the Kennedy Space Center on November 5, 1995.

Mention                 DBpedia Entity
Sacco                   dbr:Albert_Sacco
payload specialist      dbr:Payload_Specialist
STS-73                  dbr:STS-73
October 20              dbr:October_20
1995                    dbr:1995
Kennedy Space Center    dbr:Kennedy_Space_Center
November 5              dbr:November_5

Triples
dbr:Albert_Sacco dbo:mission dbr:STS-73
dbr:STS-73 dbp:landingSite dbr:Kennedy_Space_Center
dbr:STS-73 dbp:launchSite dbr:Kennedy_Space_Center

Figure 1: Example sentence annotated with DBpedia entities and its candidate triples.

The antiviral agent amantadine has been used to manage Parkinson's disease or levodopa-induced dyskinesias for nearly 5 decades.

Mention                       UMLS Entity
amantadine                    C0002403
antiviral agent               C0003451
Parkinson's disease           C0030567
levodopa-induced dyskinesias  C1970038

Triples
Amantadine isa Antiviral Agents
Amantadine treats Parkinson Disease
Amantadine treats Levodopa-induced dyskinesias

Figure 2: Example sentence annotated with UMLS concepts and triples.

3.3 Selecting the best simplification

In order to select the most relevant simplification for a given sentence from the set of N simplifications proposed by contributors, we test two baseline methods and two advanced scoring methods.

3.3.1 Baselines

The first baseline method is simply the selection of the simplification that has the most votes. We will refer to it as Vote in the remainder of the paper. The second baseline method, called Clustering, is based on the K-Means clustering algorithm. It uses the Euclidean distance measured between word vectors to cluster the set of N simplifications of a given sentence into K clusters. The cluster with the highest cumulative number of votes is selected as the most significant cluster, and the shortest sentence in that cluster is selected as the candidate simplification.
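The selection step of the Clustering baseline can be sketched as follows, assuming cluster labels have already been computed by K-Means over the word vectors (the clustering itself is omitted; the sentences, votes and labels below are illustrative):

```python
from collections import defaultdict

def select_by_cluster(simplifications, votes, labels):
    """Clustering baseline: pick the cluster with the highest cumulative
    number of votes, then return its shortest sentence (in tokens)."""
    totals = defaultdict(int)
    for lab, v in zip(labels, votes):
        totals[lab] += v
    best_cluster = max(totals, key=totals.get)
    members = [s for s, lab in zip(simplifications, labels)
               if lab == best_cluster]
    return min(members, key=lambda s: len(s.split()))

# Illustrative contributions for one sentence, with votes and K-Means labels.
simp = ["Sacco flew on STS-73",
        "Sacco flew as a payload specialist on STS-73",
        "Sacco landed in 1995"]
best = select_by_cluster(simp, votes=[3, 4, 1], labels=[0, 0, 1])
```

Here cluster 0 accumulates 7 votes against 1 for cluster 1, and its shortest member is returned as the candidate simplification.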

3.3.2 Scoring Methods

Our first selection method scores a simplification according to the original sentence and to the simplification goals expressed in section 3.2. We define four elementary measures to compute a semantic score: lexical integrity, semantic preservation, conformity and relative shortening. Given an initial sentence so and a simplification si proposed for so, these measures are defined as follows.

Conformity (cnf). The conformity score represents how much the simplification si conforms to the rules of the S3 task. It combines lexical integrity and semantic preservation:

cnf(si, so) = ζ(si, so) × ι(si, so)   (1)

Lexical integrity (ι). ι(si, so) is the proportion of words in si that are in so. ι values are in the [0,1] range. The value is lower than 1 if new external words are used.

Semantic Preservation (ζ). Semantic preservation indicates how much of the textual mentions that are linked to KB entities and KB triples are present in the simplification. More precisely, ζ(si, so) is the ratio of annotations from so that are present in si. ζ values are in the [0,1] range.
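A minimal sketch of equation 1 on token lists. The treatment of multi-word mentions (a mention counts as preserved only if all of its tokens survive) is an assumption made for illustration:

```python
def lexical_integrity(simp_tokens, orig_tokens):
    """iota: proportion of the simplification's words that occur in the original."""
    orig = set(orig_tokens)
    return sum(1 for w in simp_tokens if w in orig) / len(simp_tokens)

def semantic_preservation(simp_tokens, annotations):
    """zeta: ratio of annotated mentions of the original kept in the
    simplification (a multi-word mention must survive whole: assumption)."""
    kept = sum(1 for m in annotations
               if all(w in simp_tokens for w in m.split()))
    return kept / len(annotations)

def conformity(simp_tokens, orig_tokens, annotations):
    """cnf = zeta * iota (equation 1)."""
    return (semantic_preservation(simp_tokens, annotations)
            * lexical_integrity(simp_tokens, orig_tokens))

orig = "Sacco flew as a payload specialist on STS-73".split()
simp = "Sacco flew on STS-73".split()
score = conformity(simp, orig, ["Sacco", "STS-73", "payload specialist"])
```

In this example every word of the simplification comes from the original (iota = 1), but the mention "payload specialist" is dropped, so zeta = 2/3 and cnf = 2/3.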

Relative Shortening (η). Simplifications that are too short might miss important relations or entities, whereas simplifications that are too long might be too close (or equal) to the original sentence. We represent both aspects through a Gaussian and make use of the "wisdom of the crowd" by setting the maximum value at the average length of the simplifications proposed by the contributors. In order to have a moderate decrease around the average, we set both the maximum value and the standard deviation to 1. Length is measured in terms of tokens.

η(si, so) = exp(−(length(si) − length_avg)² / 2)   (2)

Semantic score (ψ). We compute the semantic score for a simplification si of so by combining the above elements. This combination, expressed in equation 3, is based on the following intuitions: (1) between two simplifications of the same sentence, the difference in conformity should have more impact than the difference in shortening, (2) for the same conformity value, simplifications that are farther from the original sentence are preferred, and (3) simplifications that have a more common shortening extent should be ranked higher.

ψ(si, so) = η(si, so) × exp(cnf(si, so) × euclidean(si, so))   (3)

The euclidean function is the Euclidean distance between the original sentence and the simplification in terms of tokens. Our second scoring method relies first on the clustering of the contributors' sentences. Like the baseline, it identifies the cluster with the most votes as the most significant. However, the representative sentence is selected according to the semantic score ψ, instead of simply taking the shortest sentence of the cluster. We denote this in-cluster scoring method ξ.
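Equations 2 and 3 can be sketched directly; the conformity value and the token-level Euclidean distance are passed in as precomputed arguments:

```python
import math

def eta(simp_len, avg_len):
    """Relative shortening (equation 2): Gaussian centred on the average
    simplification length, with maximum value and standard deviation 1."""
    return math.exp(-((simp_len - avg_len) ** 2) / 2)

def psi(simp_len, avg_len, cnf, euclid):
    """Semantic score (equation 3): eta * exp(cnf * euclidean distance)."""
    return eta(simp_len, avg_len) * math.exp(cnf * euclid)
```

As the intuitions require, psi grows exponentially with conformity (intuition 1) and with the distance from the original sentence (intuition 2), while eta penalises lengths far from the crowd average (intuition 3).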

4 Experiments and Results

In the first experiments, we build two datasets of natural language sentences aligned with KB facts.

Corpora and knowledge bases. Our first dataset is built by aligning all astronaut pages on Wikipedia2 (Wiki) with triples from DBpedia3. The main motivation behind the choice of this corpus is to have both general and specific relations. We used KODA as described in section 3.1.1 to obtain initial annotations.

2 https://en.wikipedia.org/wiki/List_of_astronauts_by_name
3 http://dbpedia.org

Our second dataset is built by aligning the medical encyclopedia part of Medline Plus4 (MLP) with triples extracted with SemRep. The motivation behind the selection of this corpus is twofold: a) to experiment with a domain-specific corpus, and b) to test the simplification when the triples are extracted from the text itself. Table 1 presents raw statistics on each corpus.

                        Wiki      MLP
Documents                668    4,360
Sentences             22,820   16,575
Tokens               478,214  421,272
Tokens per Sentence    20.95    25.41
Triples               15,641   30,342
Triples per Sentence    0.68     1.83
Mentions              64,621  145,308
Arguments             13,751   47,742

Table 1: Basic Statistics on Initial Corpora

Crowdsourcing. We used CrowdFlower5 as a crowdsourcing platform. We submitted 600 annotated sentences for the Wiki corpus and 450 sentences for the MLP corpus.

Selection of relevant simplifications. We implemented several methods to select the best simplification among the 15 contributions for each sentence (cf. section 3.3). To evaluate these methods we randomly selected 90 initial sentences from each dataset, then extracted the best simplification according to each of the 4 scoring metrics. The authors then rated each simplification from 1 to 5, with 1 indicating a very bad simplification and 5 indicating an excellent simplification. One of the authors prepared the evaluation tables, anonymized the method names and did not participate in the evaluation. The remaining 6 authors shared the 180 sentences and performed the ratings. Table 2 presents the final average rating for each selection method.

          Baselines            Scoring
          Vote    Clustering   ξ      ψ
Wiki      3.62    3.06         3.51   3.22
MLP       3.51    2.87         3.61   3.30
Overall   3.56    2.97         3.56   3.26

Table 2: Evaluation of S3 Selection Methods (average rating)
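The Vote baseline in Table 2 can be sketched as a majority vote over the crowd contributions collected for one sentence. This is an illustrative reconstruction under a simple normalisation assumption, not the authors' code:

```python
from collections import Counter

def vote_select(simplifications):
    """Vote baseline: return the simplification submitted most often
    by the crowd, after light normalisation (case and whitespace)."""
    normalised = [s.strip().lower() for s in simplifications]
    winner, _ = Counter(normalised).most_common(1)[0]
    # return an original-cased contribution matching the winner
    return next(s for s in simplifications
                if s.strip().lower() == winner)

contributions = ["He was an astronaut.", "he was an astronaut.",
                 "He flew to space.", "He was an astronaut."]
best = vote_select(contributions)
```

The competing ξ and ψ metrics score contributions using the semantics of the S3 process instead of raw agreement; only the voting idea is shown here.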

Final statistics on aligned datasets. After evaluating the selection methods we selected the most relevant simplification for each sentence in the dataset according to ξ (i.e., in-cluster scoring), and generated the final datasets that link the best simplification to the facts associated with its original sentence. Table 3 presents the final statistics on the aligned datasets. Both datasets are made available online6.

4 https://www.nlm.nih.gov/medlineplus/
5 http://www.crowdflower.com

            Wiki                 MLP
Sentences   600                  450
Triples     1,119                766
Predicates  146                  30
Tokens      13,641               11,167
            (after S3: 9,011)    (after S3: 6,854)

Table 3: Statistics on the Aligned Dataset

Table 4 presents the first 10 predicate names and their distribution for each dataset.

Wiki                          MLP
Predicate           %         Predicate          %
rdf:type            15.6      location of        24.93
dbo:type            10.18     is a               20.75
dbo:mission         9.11      process of         14.09
dbo:crewMembers     6.34      treats             7.04
dbo:birthPlace      5.45      causes             6.78
dbo:occupation      4.64      part of            5.87
dbo:nationality     3.30      administered to    3.13
dbo:rank            3.03      coexists with      2.61
dbp:crew2Up         2.94      affects            2.08
dbo:country         1.96      uses               1.43

Table 4: Top 10 predicates

5 Discussion

Automatic Annotation. From our observations on both datasets, we came to the conclusion that some degree of uncertainty is required in the selection of candidate triples. This is because relations extracted from the text itself follow the patterns that were used to find them (e.g., regular expressions or classifier models), and these patterns do not yield enough variation to enrich NLG systems. From this perspective, the best option would be to rank candidate triples according to their probability of occurrence in the sentence and filter out the triples with very low probability. This ranking and filtering are planned for the final version of our open-domain corpus.

Initial sentence selection. The second goal of our datasets is to be able to train automatic semantic simplifiers that would reduce the need for manual simplification in the long term. Therefore, our first method took into account different levels of annotation coverage in order to cope with the different performance/coverage of annotation tools and dissimilar goals in terms of the semantic categories of the mentions. However, for NLG, it is also important to have a balanced number of samples for each unique predicate. The first extension of our datasets will provide a better balance of examples for each predicate while keeping the balance in terms of annotation coverage to the extent possible.

Crowdsourcing. Our crowdsourcing experiment showed that it is possible to obtain relevant semantic simplifications with no specific expertise. This is supported by the fact that the Vote baseline in the selection of the final simplification obtained the same best performance as our scoring method that relies on the semantics of the S3 process. Overall, the experiment cost was only $180 for 15,750 simplifications collected for 1,050 sentences. Our results also show that collecting only 10 simplifications for each sentence (instead of 15 in our experiments) would be more than adequate, which reduces the costs even further. The two jobs created for each dataset were generally well-rated by the contributors (cf. Table 5). The MLP corpus was clearly more accessible than the Wiki corpus, with an ease of job estimated at 4.4 vs. 3.8 (on a 5-point scale). Interestingly, the identical instructions were also rated differently according to the dataset (4.2 vs. 3.8). The Wiki corpus was harder to process, due to the high heterogeneity of the relations and entity categories. There are also fewer triples per sentence in the Wiki corpus (0.68 vs. 1.83) for a close average sentence length (20.95 vs. 25.41 tokens; cf. Table 1).

6 https://github.com/pvougiou/KB-Text-Alignment

                          Wiki   MLP
Number of participants    48     41
Clarity of Instructions   3.8    4.2
Ease of Job               3.8    4.4
Overall Rating of Job     3.9    4.4

Table 5: Number of participants and contributors' ratings (on a 1 to 5 scale)


6 Conclusions

We presented a novel approach to build a corpus of natural language sentences aligned with knowledge base facts, and shared the first constructed datasets in the public domain. We introduced the task of semantic sentence simplification, which retains only the natural language elements that correspond minimally to KB facts. While our simplification method relied on crowdsourcing, our mid-term goal is to collect enough data to train automatic simplifiers that would perform the same task efficiently. Besides the simplification aspect and the portability of the method, the shared datasets are also a valuable resource for natural language generation systems. Future work includes the expansion of these datasets and the improvement of sentence selection using grammatical-quality factors.

Acknowledgments

This work was supported by the intramural research program at the U.S. National Library of Medicine, National Institutes of Health.


Proceedings of the 2nd International Workshop on Natural Language Generation and the Semantic Web (WebNLG), pages 37–40, Edinburgh, Scotland, September 6th, 2016. © 2016 Association for Computational Linguistics

Short paper: Building a System for Stock News Generation in Russian

Liubov Nesterenko
National Research University Higher School of Economics, Moscow

[email protected]

Abstract

In this paper we present an implementation of an NLG system for stock news generation. The system has two modules: an analysis module and an NLG module. The first is intended for processing the data on stock index changes, the second for generating the news texts using a template-based NLG approach. The evaluation showed that both modules give relatively accurate results and that the system can be used as a newsbot for stock news generation.

1 Introduction

The subject of this paper is the natural language generation of stock news.

Stock quotes change considerably during the day, which means that financial media should react quickly to each remarkable change and announce the news very often and very fast. That is why it becomes necessary to build a system that receives the information about the latest changes and generates short news items on the basis of it. In 1983, K. Kukich made an attempt at financial news generation in English (Kukich, 1983). In this paper we present an NLG system for stock news generation in Russian created from scratch; it generates the news taking as input the daily data on stock index changes.

Our goal is to describe the process of the system implementation and to discuss the problems we had to deal with. These are the tasks we solved while working on the system development: choosing methods for index change analysis, examining the features of financial news texts, collecting a financial lexicon, choosing an NLG approach, writing a Python program and evaluating the results. The NLG component uses a template-based approach. One of the reasons for that was the lack of an appropriate target corpus of stock news that could make a statistical approach, like in (Sutskever et al., 2011), possible. The template-based approach is sometimes underestimated, but some researchers consider that it can be as good as the standard approach (Van Deemter et al., 2005). It could even be combined with a statistical approach, as was done in (Howald et al., 2013). Moreover, stock news texts have a very clear and simple structure and content, which is why the template-based approach works quite well here.

2 Preliminary work

2.1 News format

Our first step was to do some research on index changes and index behavior in general. After that we consulted with experts on stock news and defined the types of news that the program should generate. We decided that two types of news were needed, the morning news and the evening news, and that they should be about the changes of two main Russian indexes: MOEX (Moscow Exchange) and RTSI (Russian Trading System Index). Then we explored some financial media resources in order to understand what stock news items are like and what features they have. The majority of the stock news on Russian financial media resources appeared to be either too short (one sentence) or too long (containing lots of information regarding predictions of future index behavior and discussions). That induced us to create our own text layouts for rather short but informative news.

The content and the structure of the news texts are as follows:

Morning News
1. Trades beginning: the indexes' performance during the first minutes after trades started (general tendency).


2. MOEX index behaviour.
3. RTSI index behaviour.
4. The characteristic of the last trading day (in general and in particular).
5. MOEX index behaviour yesterday.
6. RTSI index behaviour yesterday.
7. The trades value for the previous trading day.

Evening News
1. General tendency during the day.
2. MOEX index behaviour.
3. RTSI index behaviour.
4. MOEX final change.
5. RTSI final change.
6. The trades value for the day.

These text structures serve as layouts for the news. For each position one should create suitable templates that will build up the text.

3 Methodology

The system has two components: the analysis module and the NLG module. The first gets the daily data on indexes as input and determines the index changes using the algorithm we developed for that purpose. As output it gives so-called 'events' (an 'event' = the index changes during the day), e.g., 'no significant changes', 'fluctuations', 'increase', etc., or tendencies (e.g., index behavior during the first hour after the trades opening). After that, the NLG module takes the event as input and generates a text according to the changes the indexes had.

In the next two subsections we illustrate how these modules function.

3.1 Analysis module

The analysis module uses the data on index values to detect the behavior the indexes had during one particular period of time. For the morning news generation we take the previous trading day data and the data on the first hour of the trades; for the evening news, the current day data. As mentioned before, stock news generation in our case demands detecting tendencies and events. Determining tendencies is relatively easy: one computes the difference between two index values, the opening value and the last value. The tendencies can be, for example, 'increase', 'decrease', etc. Determining events is a more complicated procedure, and the events can have a more complex structure than the tendencies. We distinguish simple events like 'increase', 'decrease', 'fluctuations' and also such events as 'significant increase & no changes' or 'increase & insignificant decrease' and other similar compound events. For event determination we use the following algorithm:

1. Check if the index fluctuated.

We check this by calculating the Adjusted R² value; if it appears to be less than 0.5 then we claim that the index has fluctuated.

2. Check if there are intervals without any significant changes.

For each interval [t1, t2] we calculate the evaluative function using this formula:

E = α(t1 − t2) + β/(σ1,2² + 10⁻⁴),

where σ1,2 is the standard deviation in the interval and α, β are coefficients. Thus, the bigger the interval is, the bigger the evaluative function value, and the less the index fluctuates, the bigger the evaluative function value.

3. Fit polynomials of degrees 2 and 3 to the index values data.

4. Choose the best approximation.

Having detected the interval with no significant changes and fitted the polynomials, we choose the approximation that best corresponds to the index behavior. If the interval with no significant changes is between 1/3 and 1/2 of the whole trading day length and it is located in the first or the second half of the day, then we choose this approximation. Otherwise, we should choose between the two polynomials. Most cases are well described by a quadratic polynomial, so we have to determine whether the cubic polynomial is needed or not. To accomplish that, one should find the inflection point of the cubic polynomial and the difference between the Adjusted R² values of the polynomials.

5. Apply the rules to determine the event.

If the check for the intervals with no significant changes was positive, then the resulting 'event' will consist of a 'no significant change' part and an 'increase/decrease etc.' part, e.g., 'no significant change & significant decrease' or 'increase & no significant change'. If the quadratic polynomial was chosen as the suitable approximation, then one needs such parameters as the sign of the x² coefficient, the vertex location on the time axis and the threshold crossing (if the changes exceeded 2% relative to the opening then we call such changes significant, and this affects the event) to determine the event. If the cubic polynomial was chosen as the suitable approximation, then one needs to know the sign of the x³ coefficient to determine the event.

By means of this algorithm one can determine different types of index changes, both simple ones like 'decrease', 'increase', 'fluctuations' and compound changes like 'no significant changes & increase'. The information about the index changes is then used by the NLG module for news generation.
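A much-simplified sketch of this classification logic follows. It keeps the 2% significance threshold and the R²-below-0.5 fluctuation test stated above, but fits only a straight line instead of the degree-2/3 polynomials and omits the evaluative function E and compound events; the function names and the 5-sample series are illustrative:

```python
def linear_r2(values):
    """R-squared of a least-squares straight-line fit to the series
    (a stand-in for the paper's Adjusted R-squared on polynomial fits)."""
    n = len(values)
    mx = (n - 1) / 2
    my = sum(values) / n
    sxx = sum((x - mx) ** 2 for x in range(n))
    sxy = sum((x - mx) * (y - my) for x, y in enumerate(values))
    ss_tot = sum((y - my) ** 2 for y in values)
    if ss_tot == 0:
        return 1.0
    ss_res = ss_tot - sxy ** 2 / sxx
    return 1 - ss_res / ss_tot

def classify(values, threshold=0.02, r2_cut=0.5):
    """Map a series of index values to a simple event label."""
    change = (values[-1] - values[0]) / values[0]   # relative to opening
    if linear_r2(values) < r2_cut:                  # poor fit: fluctuations
        return "fluctuations"
    if abs(change) <= threshold:                    # below 2%: not significant
        return "no significant changes"
    return "significant increase" if change > 0 else "significant decrease"
```

For instance, a steadily rising series that gains more than 2% yields "significant increase", while a jagged series with a near-zero net move yields "fluctuations".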

3.2 NLG module

In this section we describe the NLG process in our system. This module uses text layouts, rules, sentence templates and a financial lexicon that was collected during the work with media texts. As a result we get short texts like the one below.

(1) Russian
Segodnya torgi prohodili v krasnoy zone. Utrom indeks MMVB nachal torgi ponizheniem i prodolzhal ustremlyat'sya vniz. V to zhe vremya, ruhnuv utrom, indeks RTS prodolzhil sil'noe padenie. Tak indeks MMVB ponizilsya na 0.39% do otmetki v 1748 punktov, a indeks RTS — snizilsya na 4.4% i dostig 804 punktov. Ob"em torgov po itogam dnya sostavil 700 millionov dollarov SSHA.

English translation
Today the trades ran in the red zone. In the morning the MOEX index started to reduce and continued its lowering. At the same time RTSI fell and proceeded to decrease. So MOEX lost 0.39% and made up 1748 points, and RTSI fell by 4.4% and reached the grade of 804 points. The volume of the trading session was 700 million US dollars.

First of all the program takes the appropriate text layout (morning/evening). In traditional descriptions of NLG architecture, as in (Reiter et al., 2000) or (Martin and Jurafsky, 2000), one of the implementation steps is macro planning. In some NLG systems it is done automatically, but in our system the macro planning is predetermined. Then the program fills in the positions with different sentence templates. For each position there is more than one suitable template. The templates are clauses with some constituents missing. Some of them have more slots, some fewer, depending on how much variance is needed. Most of the templates are independent clauses, but some of them turn out to be constituents of one compound sentence in the result. Here are some examples of the templates the program uses for generation.

(2) [timeExpr] [subject] began to [predicate].

(3) [timeExpr] MOEX index [predicate] [value]% to [value] points, RTSI index [predicate] [value]% to [value] points.

(4) a. [timeExpr] MOEX index [predicate] [value]% to [value] points.
b. , [link] RTSI index [predicate] [value]% to [value] points.

In examples (2) and (3) we presented templates for independent clauses, while in (4) there are two clauses connected by a linking word. The words in square brackets represent the missing constituents, or slots, of the templates.

The next step is filling in the slots in the templates with words from our lexicon. The information about the types of index changes, or events, determined the contents and the structure of the lexicon (both predicates and connective words) used by the program. There are groups of lexemes which characterize the changes and correspond to particular events. For example, there are such groups as 'negative change predicates', e.g., to fall, to decrease; 'positive change predicates', e.g., to rise, to increase; and 'no change predicates', e.g., to remain constant. There are also groups of words like 'nouns of change' related to the verbs, e.g., rise, growth, and 'intensifiers', e.g., considerably, a lot, etc.

Since we generate news about two indexes which are independent and can have different changes on the same day, it is highly important to use plenty of connectives to provide fluency to the texts. When the sentence templates have been chosen and most of the slots filled in, the program applies the rules of template combining. For example, if in template (4a) the predicate is 'to increase (by)' and in (4b) it is 'to fall (by)', then the program chooses an adversative conjunction as the link for these clauses. In general, the choice of connectives depends on the correlation of the changes that the two indexes demonstrate. It is taken into account whether the indexes have the same change tendencies or differ in their behavior, and by how much.

When all the previous steps are finished, the program does post-processing such as adding punctuation marks and capitalisation where needed.
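The slot-filling and template-combining steps can be sketched as follows. The English template, the tiny lexicon and the link-word rule are illustrative stand-ins for the system's Russian resources, and all names are hypothetical:

```python
import random

# illustrative lexicon groups; the real system uses Russian financial lexemes
LEXICON = {
    "negative": ["fell", "decreased", "lost"],
    "positive": ["rose", "increased", "gained"],
}

# counterpart of templates (3)/(4): two clauses joined by a link word
TEMPLATE = ("{time} MOEX index {pred1} {v1}% to {p1} points, "
            "{link} RTSI index {pred2} {v2}% to {p2} points.")

def realise(time_expr, moex, rtsi):
    """moex/rtsi are (event, percent, points) tuples. The link word is
    adversative when the two indexes moved in opposite directions."""
    link = "and" if moex[0] == rtsi[0] else "but"
    return TEMPLATE.format(
        time=time_expr,
        pred1=random.choice(LEXICON[moex[0]]), v1=moex[1], p1=moex[2],
        pred2=random.choice(LEXICON[rtsi[0]]), v2=rtsi[1], p2=rtsi[2],
        link=link)

news = realise("Today", ("negative", 0.39, 1748), ("negative", 4.4, 804))
```

The random choice among equivalent predicates is what produces the lexical variety the templates are designed for; the link rule is a one-line version of the template-combining rules described above.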

4 Evaluation

The system evaluation was divided into two stages, because the modules were evaluated separately.

The analysis module evaluation was done in the following way. For 100 data samples of index changes during the day we automatically determined their events and then manually checked how many of these were determined correctly. The percentage of right answers we got was 87%.

The NLG module was evaluated both manually and automatically using the BLEU metric (Papineni et al., 2002). For the manual evaluation we took 100 generated texts. These texts were rated according to the following scale: 2 = 'fluent', 1 = 'understandable', 0 = 'disfluent'. It turned out that 61% of the texts were fluent, 28% were understandable and 5% were disfluent. The BLEU value appeared to be 0.66; for the calculation we used 70 gold standard sentences and 50 automatically generated sentences that describe the index changes. We decided to pick them for evaluation because, unlike the other sentences in the news, the sentences about changes have a high level of variation. We also admit the lack of gold standard material might have affected the BLEU results.
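For reference, BLEU (Papineni et al., 2002) combines clipped n-gram precisions with a brevity penalty. A minimal single-reference sketch (uniform weights, no smoothing), not the exact evaluation setup used in this paper:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_bleu(reference, candidate, max_n=4):
    """Minimal BLEU: geometric mean of clipped 1..max_n-gram
    precisions, times a brevity penalty. Single reference only."""
    ref, cand = reference.split(), candidate.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        if clipped == 0:          # unsmoothed: any zero precision gives 0
            return 0.0
        log_precisions.append(math.log(clipped / total))
    bp = (1.0 if len(cand) > len(ref)
          else math.exp(1 - len(ref) / max(len(cand), 1)))
    return bp * math.exp(sum(log_precisions) / max_n)
```

An identical candidate and reference score 1.0; a candidate sharing no words with the reference scores 0.0. Real toolkits add smoothing for short sentences, which matters at the sentence lengths evaluated here.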

5 Acknowledgements

I would like to thank Alexei Nesterenko, PhD, for his advice, help and encouragement while I was doing the math for this research. I am also very thankful to Anastasia Bonch-Osmolovskaya, PhD, for her support and help during the whole time I worked on this paper, and to Andrei Babitsky for his expert opinion on what the output news texts should be like.

References

Blake Howald, Ravi Kondadadi, and Frank Schilder. 2013. Domain adaptable semantic clustering in statistical NLG. In Proceedings of the 10th International Conference on Computational Semantics (IWCS 2013), pages 143–154.

Karen Kukich. 1983. Design of a knowledge-based report generator. In Proceedings of the 21st Annual Meeting of the Association for Computational Linguistics, pages 145–150. Association for Computational Linguistics.

James H. Martin and Daniel Jurafsky. 2000. Speech and Language Processing. International Edition, 710.

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318. Association for Computational Linguistics.

Ehud Reiter, Robert Dale, and Zhiwei Feng. 2000. Building Natural Language Generation Systems, volume 33. MIT Press.

Ilya Sutskever, James Martens, and Geoffrey E. Hinton. 2011. Generating text with recurrent neural networks. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 1017–1024.

Kees Van Deemter, Emiel Krahmer, and Mariet Theune. 2005. Real versus template-based natural language generation: A false opposition? Computational Linguistics, 31(1):15–24.


Proceedings of the 2nd International Workshop on Natural Language Generation and the Semantic Web (WebNLG), pages 41–45, Edinburgh, Scotland, September 6th, 2016. © 2016 Association for Computational Linguistics

Content selection as semantic-based ontology exploration

Laura Perez-Beltrachini
CNRS/LORIA

Nancy (France)

[email protected]

Claire Gardent
CNRS/LORIA

Nancy (France)

[email protected]

Anselme Revuz
I.U.T. Blagnac
Toulouse (France)

[email protected]

Saptarashmi Bandyopadhyay
IIEST

Shibpur (India)

[email protected]

1 Introduction

Natural Language (NL) based access to information contained in Knowledge Bases (KBs) has been tackled by approaches following different paradigms. One strand of research deals with the task of ontology-based data access and data exploration (Franconi et al., 2010; Franconi et al., 2011). This type of approach relies on two pillar components. The first is an ontology describing the underlying domain, together with a set of reasoning-based query construction operations. This component guides the lay user in the formulation of a KB query by proposing alternatives for query expansion. The second is a Natural Language Generation (NLG) system that hides the details of the formal query language from the user. Our ultimate goal is the automatic creation of a corpus of KB queries for the development and evaluation of NLG systems.

The task we address is the following: given an ontology K, automatically select from K descriptions q which yield sensible user queries. The difficulty lies in the fact that ontologies often omit important disjointness axioms and adequate domain or range restrictions (Rector et al., 2004; Poveda-Villalon et al., 2012). For instance, the toy ontology shown in Figure 1 licenses the meaningless query in (1). This happens because there is no disjointness axiom between the Song and Rectangular concepts and/or because the domain of the marriedTo relation is not restricted to persons.

(1) Who are the rectangular songs married to a person?

Song ⊓ Rectangular ⊓ ∃marriedTo.Person

⊤ ⊑ ∀marriedTo.Person
Person ⊑ ⊤
Song ⊑ ⊤
Rectangular ⊑ Shape

Figure 1: Toy ontology.

In this work, we explore to what extent vector space models can help to improve the coherence of automatically formulated KB queries. These models are learnt from large corpora and provide general shared common semantic knowledge. Such models have been proposed for related tasks. For example, (Freitas et al., 2014) proposes a distributional semantic approach for the exploration of paths in a knowledge graph and (Corman et al., 2015) uses distributional semantics for spotting common sense inconsistencies in large KBs.

Our approach draws on the fact that natural language is used to name elements, i.e. concepts and relations, in ontologies (Mellish and Sun, 2006). Hence, the idea is to exploit lexical semantics to detect incoherent query expansions during the automatic query formulation process. Following ideas from the work in (Kruszewski and Baroni, 2015; Van de Cruys, 2014), our approach uses word vector representations as lexical semantic resources. We train two semantic "compatibility" models, namely DISCOMP and DRCOMP. The first models incompatibility between concepts in a candidate query expansion, and the second incompatibility between concepts and candidate properties.

2 Query language and operations

[Figure: example query tree with nodes x:{Car, New}, w:{CarDealer}, z:{City} and edges x -soldBy-> w, w -locatedIn-> z]

Following (Guagliardo, 2009), a KB query is a labelled tree where edges are labelled with a relation name and nodes are labelled with a variable and a non-empty set of concept names from the ontology.

The query construction process starts from the initial KB query with a single node. The four operations available for iteratively refining the KB query (cf. (Guagliardo, 2009) for a formal definition of the operations) are: add, for the addition of new concepts and relations; substitution, for replacing a portion of the query with a more general, specific or compatible concept; deletion, for removing a selected part of the query; and weaken, for making the query as general as possible. A sequence of query formulation steps illustrating these operations is shown in Figure 2.

I am looking for something. (Initial request)

... for a new car. (Substitution)

... for a new car sold by a car dealer. (Add relation)

... for a new car, a coupe sold by a car dealer. (Add concept)

... for a new car sold by a car dealer. (Deletion)

... for a car sold by a car dealer. (Weaken)

Figure 2: Query formulation sequence.
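The labelled-tree representation and the refinement operations above can be sketched as a small data structure. This is an illustrative reconstruction under the paper's definitions, not (Guagliardo, 2009)'s formalisation; the class and method names are hypothetical:

```python
class QueryNode:
    """A KB query as a labelled tree: each node carries a variable
    name, a set of concept labels, and relation-labelled edges."""
    def __init__(self, var, concepts):
        self.var = var
        self.concepts = set(concepts)
        self.edges = {}                      # relation name -> child node

    def add_concept(self, concept):          # 'add' (concept)
        self.concepts.add(concept)

    def add_relation(self, relation, child): # 'add' (relation + range)
        self.edges[relation] = child

    def delete_relation(self, relation):     # 'deletion'
        self.edges.pop(relation, None)

# the running example: a new car sold by a car dealer located in a city
x = QueryNode("x", {"Car"})
x.add_concept("New")
w = QueryNode("w", {"CarDealer"})
x.add_relation("soldBy", w)
w.add_relation("locatedIn", QueryNode("z", {"City"}))
x.concepts.discard("New")                    # 'weaken': drop a restriction
```

The substitution operation would replace a node's concept set with a more general, more specific or compatible one; it is omitted here because it requires the ontology's subsumption hierarchy.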

3 Extracting KB queries

To automatically select queries from a KB, we randomise the application of the add operation. That is, starting from a query tree with one node, the operation is iteratively applied at a randomly selected node up to a maximum number of steps1. The add operation divides into add compatible concepts and add compatible relations (cf. (Guagliardo, 2009)). Given a node n labelled with concept s, the first adds another concept label s′ (e.g., Car and New in the example query tree in Section 2), the second attaches a relation and its range (p, o) to the node (e.g., (CarDealer, locatedIn, City)).

The add operation picks a concept (relation) from a list of candidate concepts (relations) to expand the current query. These candidates are computed using reasoning operations on the query built so far and the underlying ontology. As discussed in Section 1, the lack of axioms in the ontology enables the inference and selection of incoherent candidate content such as (Song, Rectangular) and (Song, marriedTo, Person).

To filter out incoherent suggestions made by the add operation, we propose the following models2.

Concept compatibility model (DISCOMP). As explained in (Kruszewski and Baroni, 2015), distributional semantic representations provide models for semantic relatedness and have shown good performance in many lexical semantic tasks (Baroni et al., 2014). While they model semantic relatedness (for instance, car and tyre are related concepts), they fail to capture the notion of semantic compatibility: there is no thing that can be both a car and a tyre at the same time. Thus, they propose a Neural Network (NN) model that learns semantic characteristics of concepts, classifying them as (in)compatible. We adapt their best performing model, namely 2L-interaction, to our task of detecting whether two ontology concepts (s, s′) are incompatible.

1 A parameter to the random query generation process.
2 Note that another alternative would be to use the models we introduce to help with the enrichment of ontologies.

Selectional compatibility model (DRCOMP). Selectional constraints concern the semantic type imposed by predicates on the arguments they take. For instance, the predicate sell imposes the constraint that its subjects be, for instance, of type Organisation or Person. Thus, it would be acceptable to say A car dealer sells new cars, while it would be odd to say A tyre sells new cars.

Our idea is to apply the notion of selectional preferences to ontology relations and the concepts they can be combined with: that is, to decide whether a candidate relation p to be attached to a node labelled with concept s, i.e. forming the triple (s, p, o)³, is a plausible candidate. Along the lines of the work in (Van de Cruys, 2014), we train an NN model to predict (in)compatible subject concept-relation (s, p) pairs⁴.
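The training regime described in footnote 4 (a hinge loss over labels in (-1, 1) on top of a static embedding layer initialised with pre-trained vectors) can be illustrated with a deliberately simplified linear scorer over concatenated embeddings. The toy 2-d vectors and the single linear layer are assumptions for illustration only, not the paper's actual 2L-interaction or selectional preference architectures.

```python
import random

def train_compat(pairs, labels, emb, lr=0.1, epochs=10000, seed=0):
    """Train a linear scorer over concatenated (frozen) pre-trained
    embeddings with a hinge loss on labels in {-1, +1}; a simplified
    stand-in for the paper's NN compatibility models."""
    rng = random.Random(seed)
    xs = [emb[a] + emb[b] for a, b in pairs]        # vector concatenation
    d = len(xs[0])
    w = [rng.uniform(-0.1, 0.1) for _ in range(d)]
    bias = 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for x, y in zip(xs, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + bias
            if y * score < 1.0:                      # hinge subgradient
                for i in range(d):
                    gw[i] -= y * x[i]
                gb -= y
        w = [wi - lr * gi / len(xs) for wi, gi in zip(w, gw)]
        bias -= lr * gb / len(xs)
    return w, bias

def predict(pair, emb, w, bias):
    x = emb[pair[0]] + emb[pair[1]]
    score = sum(wi * xi for wi, xi in zip(w, x)) + bias
    return 1 if score >= 0 else -1

# toy 2-d embeddings standing in for pre-trained word vectors
emb = {"car": [1.0, 0.0], "new": [0.9, 0.1],
       "tyre": [0.0, 1.0], "round": [0.1, 0.9]}
train_pairs = [("car", "new"), ("tyre", "round"),
               ("car", "tyre"), ("new", "round")]
train_labels = [1, 1, -1, -1]
w, bias = train_compat(train_pairs, train_labels, emb)
```

Because the embedding layer is frozen, only the scorer's weights are learned; the real models add hidden layers between the concatenated input and the output score.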

4 Experimental setup

Both models use the best performing word vectors available at http://clic.cimec.unitn.it/composes/semantic-vectors.html (Baroni et al., 2014).

DISCOMP dataset. This dataset consists of compatible and incompatible example pairs. We extract them by combining a set of manually annotated pairs with a set of automatically extracted ones.

As manually annotated examples, we use the dataset of (Kruszewski and Baroni, 2015) plus additional examples extracted from the results of different runs of the add operation, which were annotated manually. These provide 7764 examples.

³Note that the concept o taking the object argument place corresponds to the range of the relation. Thus, at this stage, we do not attempt to model relation-object concept (in)compatibility.
⁴The architecture of our NN is similar to that proposed by (Van de Cruys, 2014). However, rather than using a ranking loss function, we approximate this by training the network with a hinge loss function over labels (-1, 1). Another difference is that our input embedding layer is static and initialised with pre-trained vectors.

In addition, we automatically extracted compatible and incompatible pairs of concepts from existing ontologies. For incompatible pairs (5273 examples), we extracted disjointness axioms from 52 ontologies crawled from the web and from YAGO (Suchanek et al., 2008). The compatible pairs (57968 examples) were extracted from YAGO using the class membership of individuals: we assume that if an instance a is defined as a member of both class A and class B, then the two classes are compatible.
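The class-membership heuristic for compatible pairs can be sketched as follows; the instance-type data below is a made-up stand-in for YAGO's type assertions.

```python
from itertools import combinations

def compatible_pairs(instance_types):
    """Treat two classes as compatible if some instance is a member
    of both at the same time (the class-membership heuristic)."""
    pairs = set()
    for types in instance_types.values():
        # every unordered pair of co-occurring classes counts as compatible
        for a, b in combinations(sorted(types), 2):
            pairs.add((a, b))
    return pairs

# hypothetical instance -> class memberships
instance_types = {
    "Stan_Kenton": {"Person", "Artist"},
    "Edinburgh": {"City", "Place"},
}
print(sorted(compatible_pairs(instance_types)))
```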

The final dataset contains 71918 instances. We take 80% for training and the rest for testing.

DRCOMP dataset. We automatically extract subject-predicate pairs (s, p) from two different sources, namely nsubj dependencies from parsed sentences and domain restrictions in ontologies.

For the extraction of pairs from text, we use the ukWaCKy corpus (Baroni et al., 2009), calling this subset of pairs ukWaCKy.SP, and the Matoll corpus (Walter et al., 2013), calling it the WikiDBP.SP subset. Both corpora contain dependency parsed sentences. In addition, the Matoll corpus provides annotations linking entities mentioned in the text to DBPedia entities. For the first SP dataset, we take the head and dependent participating in nsubj dependency relations as training pairs (s, p). For the second SP dataset, we use the DBPedia annotations associated with nsubj dependents: we create (s, p) pairs where the s component, rather than being the head entity mention, is the DBPedia concept to which this entity belongs. For instance, given the dependency nsubj(Stan Kenton, winning), because Stan Kenton is annotated with the DBPedia entity http://dbpedia.org/resource/Stan_Kenton, and this entity is defined to be of type Person and Artist, among others, we can create (s, p) pairs such as (person, winning) and (artist, winning).
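A sketch of this (s, p) pair construction, with hypothetical toy data standing in for the parsed corpora and DBPedia annotations:

```python
def sp_pairs(dependencies, entity_types):
    """Create (s, p) training pairs from nsubj dependencies, given as
    (predicate, subject mention) tuples. When the subject mention is
    linked to a DBPedia entity, one pair is created per entity type
    (the WikiDBP.SP construction); otherwise the mention itself is
    used (the ukWaCKy.SP construction)."""
    pairs = []
    for predicate, mention in dependencies:
        for t in entity_types.get(mention, [mention.lower()]):
            pairs.append((t.lower(), predicate.lower()))
    return pairs

# hypothetical annotated dependency and entity-type table
deps = [("winning", "Stan Kenton")]
types = {"Stan Kenton": ["Person", "Artist"]}
print(sp_pairs(deps, types))
```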

For the pairs based on ontology definitions, we use the 52 ontologies crawled from the web. We call this subset of pairs KB.SP.

For training the model, we generate negative instances by corrupting the extracted data: for each (s, p) pair in the dataset, we generate an (s′, p) pair where s′ is not seen occurring with p in the training corpus. The final dataset contains 610522 training instances (30796 from ukWaCKy.SP, 571564 from WikiDBP.SP and 8162 from KB.SP). We take out 600 cases, 300 from ukWaCKy.SP and 300 from KB.SP, for testing the model on specific text and KB pairs.
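The corruption-based negative sampling can be sketched as follows (the pairs and subject vocabulary below are toy stand-ins for the extracted corpus data):

```python
import random

def corrupt(pairs, subjects, seed=0):
    """Generate one negative (s', p) per observed (s, p) by swapping in
    a subject never seen occurring with p in the training data."""
    rng = random.Random(seed)
    seen = {}
    for s, p in pairs:
        seen.setdefault(p, set()).add(s)
    negatives = []
    for s, p in pairs:
        options = [s2 for s2 in subjects if s2 not in seen[p]]
        if options:
            negatives.append((rng.choice(options), p))
    return negatives

# toy observed pairs and subject vocabulary
pairs = [("dealer", "sell"), ("person", "sell"), ("car", "drive")]
subjects = ["dealer", "person", "car", "tyre"]
negs = corrupt(pairs, subjects)
```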

5 Evaluation

We separately evaluate the performance of each model on a held-out test set. Table 1 shows the results for the DISCOMP model; Table 2 shows the results obtained when evaluating the DRCOMP model. Both models perform well in the intrinsic evaluation.

Test dataset                     Accuracy
(Kruszewski and Baroni, 2015)    0.72
DISCOMP                          0.98

Table 1: Results reported by (Kruszewski and Baroni, 2015) and results obtained with the DISCOMP model.

            Test dataset    Accuracy
Emb. + NN   ukWaCKy.SP      0.69
            KB.SP           0.77

Table 2: Results after training (Emb. + NN) with the union of the ukWaCKy.SP, WikiDBP.SP and KB.SP training sets. Note that if we train only with the ukWaCKy.SP training set and evaluate on the ukWaCKy.SP test set, we get an accuracy of 0.86, which is similar to the results reported in (Van de Cruys, 2014).

We also assess the performance of the models on the task of meaningful query generation. We run the random query generation process over 5 ontologies of different domains, namely cars, travel, wines, conferences and human disabilities. At each query expansion operation, we apply the models to the sets of candidate concepts or relations. We compare the DISCOMP and DRCOMP models against a baseline cosine similarity (COS) score⁵. For this score we use GloVe (Pennington et al., 2014) word embeddings and simple addition for composing multiword concept and relation names. We use a threshold of 0.3 that was determined empirically⁶. During the query generation process, we registered the candidate sets as well as the predictions of the models. In total, we collected 67 candidate sets corresponding to the add compatible relation query extension and 39 corresponding to the add compatible concept operation. The candidate sets were manually annotated with (in)compatibility human judgements. We use these sets as a gold standard to compute precision, recall, F-measure and specificity on the task of detecting incompatible candidates, as well as the accuracy of the models. Figure 3 shows one example for each of the query expansion operations, the annotated candidates and the predictions made by each of the models (only incompatible predictions are shown).

⁵For the case of add candidate relations, the COS model checks for semantic relatedness between the subject concept and the relation and between the subject concept and the object concept, i.e. (s, p) and (s, o).
⁶We compare the COS baseline with a threshold of 0.3

        addRelation           addCompatible
      COS     DRCOMP        COS     DISCOMP
P     0.51    0.67          0.90    0.88
R     0.30    0.33          0.41    0.85
F     0.38    0.44          0.56    0.87
S     0.79    0.88          0.77    0.46
A     0.59    0.65          0.47    0.78

Table 3: Precision (P), recall (R), F-measure (F), specificity (S) and accuracy (A) results for DISCOMP, DRCOMP and COS on the add compatible relation (addRelation) and add compatible concept (addCompatible) query expansion operations.

Table 3 shows the results. Unsurprisingly, given the quite strict similarity threshold used for the COS baseline, it has good precision at spotting incompatible candidates, though quite low recall. In contrast, as shown by the F-measure values, the compatibility models achieve a better compromise between these measures. We include the specificity measure as an indicator of the ability of the models to avoid false alarms, that is, to avoid predicting a candidate as incompatible when it is not.
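These measures can be computed from the annotated candidate sets as follows. The gold and prediction vectors below are made up for illustration; the positive class means the candidate is incompatible.

```python
def scores(gold, pred):
    """Precision, recall, F-measure, specificity and accuracy for the
    task of detecting incompatible candidates (positive = incompatible)."""
    tp = sum(1 for g, q in zip(gold, pred) if g and q)
    fp = sum(1 for g, q in zip(gold, pred) if not g and q)
    fn = sum(1 for g, q in zip(gold, pred) if g and not q)
    tn = sum(1 for g, q in zip(gold, pred) if not g and not q)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    fm = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0   # ability to avoid false alarms
    acc = (tp + tn) / len(gold)
    return prec, rec, fm, spec, acc

# gold: True = candidate annotated incompatible; pred: model flags incompatible
gold = [True, True, False, False, True]
pred = [True, False, False, True, True]
p, r, f, s, a = scores(gold, pred)
```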

6 Conclusions and future work

We applied two compatibility models to get around the lack of disjointness axioms and domain restrictions in ontologies and to facilitate the (semi-)automatic generation of a large set of sensible user KB queries. These compatibility models were previously proposed for two semantic tasks: one for term compatibility (Kruszewski and Baroni, 2015) and the other for selectional preference modelling (Van de Cruys, 2014). We automatically created training datasets from several text and knowledge base resources with the intention of providing a more adequate training signal for our specific task.

⁶(cont.) and the COS baseline with a threshold of 0.5. Setting this threshold is really a trade-off between precision and recall; the use of the 0.5 threshold resulted in the rejection of most of the candidates, including compatible ones.

[Add compatible concept] [Assistant]
[CANDIDATES] [Author:0, SubjectArea:1, Administrator:0, Member PC:0, Science Worker:0, Volunteer:0, Scholar:0, Regular:1, Student:0]
[COS] [Member PC]
[DISCOMP] [SubjectArea, Volunteer, Regular]

[Add relation] [Poster]
[CANDIDATES] [dealsWith:0, writtenBy:0]
[COS] [dealsWith]
[DRCOMP] [ ]

Figure 3: Example of gold standard annotations for the add compatible concept and relation operations, and the predictions made by the different systems.

As future work, we aim at running a larger task-based extrinsic evaluation of these models. We plan to generate a set of KB queries, verbalise them using the techniques proposed in (Gardent and Perez-Beltrachini, 2016; Perez-Beltrachini and Gardent, 2016), and ask for human judgements about the meaningfulness of the generated queries. In this larger evaluation, we plan to test the models on larger general purpose KBs such as DBPedia.

Further work for improving on the current results could explore the adaptation of the models to specific domain vocabularies and the use of better composition modelling for multiword concepts and relations.

Acknowledgements

We thank the French National Research Agency for funding the research presented in this paper in the context of the WebNLG project. We would also like to thank Sebastian Walter for kindly providing us with the MATOLL corpus, and the volunteer annotator for contributing to the evaluation.

References

Marco Baroni, Silvia Bernardini, Adriano Ferraresi, and Eros Zanchetta. 2009. The WaCky wide web: a collection of very large linguistically processed web-crawled corpora. Language Resources and Evaluation, 43(3):209–226.

Marco Baroni, Georgiana Dinu, and German Kruszewski. 2014. Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In ACL (1), pages 238–247.

Julien Corman, Laure Vieu, and Nathalie Aussenac-Gilles. 2015. Distributional semantics for ontology verification. In Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics, pages 30–39, Denver, Colorado, June. Association for Computational Linguistics.

Enrico Franconi, Paolo Guagliardo, and Marco Trevisan. 2010. Quelo: a NL-based intelligent query interface. In Pre-Proceedings of the Second Workshop on Controlled Natural Languages, volume 622.

Enrico Franconi, Paolo Guagliardo, Marco Trevisan, and Sergio Tessaris. 2011. Quelo: an ontology-driven query interface. In Description Logics.

Andre Freitas, Joao Carlos Pereira da Silva, Edward Curry, and Paul Buitelaar. 2014. A distributional semantics approach for selective reasoning on commonsense graph knowledge bases. In Natural Language Processing and Information Systems, pages 21–32. Springer.

Claire Gardent and Laura Perez-Beltrachini. 2016. A statistical, grammar-based approach to microplanning. Computational Linguistics.

Paolo Guagliardo. 2009. Theoretical foundations of an ontology-based visual tool for query formulation support. Master's thesis, KRDB Research Centre, Free University of Bozen-Bolzano, October.

German Kruszewski and Marco Baroni. 2015. So similar and yet incompatible: Toward automated identification of semantically compatible words, pages 964–969.

Chris Mellish and Xiantang Sun. 2006. The semantic web as a linguistic resource: Opportunities for natural language generation. Knowledge-Based Systems, 19(5):298–303.

Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.

Laura Perez-Beltrachini and Claire Gardent. 2016. Learning embeddings to lexicalise RDF properties. In *SEM 2016: The Fifth Joint Conference on Lexical and Computational Semantics, Berlin, Germany.

María Poveda-Villalón, Mari Carmen Suárez-Figueroa, and Asunción Gómez-Pérez. 2012. Validating ontologies with OOPS! In International Conference on Knowledge Engineering and Knowledge Management, pages 267–281. Springer.

Alan Rector, Nick Drummond, Matthew Horridge, Jeremy Rogers, Holger Knublauch, Robert Stevens, Hai Wang, and Chris Wroe. 2004. OWL pizzas: Practical experience of teaching OWL-DL: Common errors & common patterns. In Proc. of EKAW 2004, pages 63–81. Springer.

Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. 2008. YAGO: A large ontology from Wikipedia and WordNet. Web Semantics: Science, Services and Agents on the World Wide Web, 6(3):203–217.

Tim Van de Cruys. 2014. A neural network approach to selectional preference acquisition. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 26–35.

Sebastian Walter, Christina Unger, and Philipp Cimiano. 2013. A corpus-based approach for the induction of ontology lexica. In Natural Language Processing and Information Systems, pages 102–113. Springer.


Proceedings of the 2nd International Workshop on Natural Language Generation and the Semantic Web (WebNLG), pages 46–49, Edinburgh, Scotland, September 6th, 2016. ©2016 Association for Computational Linguistics

ReadME generation from an OWL ontology describing NLP tools

Driss Sadoun, Satenik Mkhitaryan, Damien Nouvel, Mathieu Valette
ERTIM, INALCO, Paris, France
[email protected]

Abstract

This paper deals with the generation of ReadME files from an ontology-based description of NLP tools. ReadME files are structured and organised according to properties defined in the ontology. One of the problems is dealing with the multilingual generation of texts. To do so, we propose to map the ontology elements to multilingual knowledge defined in a SKOS ontology.

1 Introduction

A ReadMe file is a simple and short written document that is commonly distributed along with a piece of computer software, forming part of its documentation. It is generally written by the developer and is supposed to contain the basic and crucial information that the user reads before installing and running the software.

Existing NLP software may range from unstable prototypes to industrial applications. Many tools are developed by researchers in the framework of temporary projects (training, PhD theses, funded projects). As their use is often restricted to their developers, they do not always meet Information Technology (IT) requirements in terms of documentation and re-usability. This is especially the case for tools for under-resourced languages, which are often developed by researchers and released without standard documentation, or with documentation written fully or partly in the developer's native language.

Providing a clear ReadMe file is essential for effective software distribution and use: a confusing one could prevent the user from using the software. However, there are no well-established guidelines or good practices for writing a ReadMe.

In this paper we propose an ontology-based approach for the generation of ordered and structured ReadMe files for NLP tools. The ontology defines a meta-data model built from a joint study of NLP tool documentation practices and existing meta-data models for language resources (cf. section 2). Translation functions (TFs) for different languages (currently eight) are associated to the ontology properties characterising NLP tools. These TFs are defined within the Simple Knowledge Organization System (SKOS) (cf. section 2.2). The ontology is filled via an on-line platform by NLP experts speaking different languages; each expert describes the NLP tools processing the languages he speaks (cf. section 3). A ReadMe file is then generated in different languages for each tool described within the ontology (cf. section 3). Figure 1 depicts the whole process of multilingual ReadMe generation.

Figure 1: ReadMe generation process (NLP experts describe NLP tools via an on-line platform; the OWL ontology and the TFs defined in the SKOS ontology drive the multilingual generation of ReadMe files)

2 NLP tools ontology

This work takes place in the framework of the MultiTal project, which aims at making NLP tool descriptions available through an on-line platform containing factual information and verbose descriptions that should ease the installation and use of the considered NLP tools. This project involves numerous NLP experts


in diverse languages, currently Arabic, English, French, Hindi, Japanese, Mandarin Chinese, Russian, Ukrainian and Tibetan. Our objective is to take advantage of the NLP experts' knowledge both to retrieve NLP tools in their languages and to generate multilingual ReadMe files for the retrieved NLP tools. A first step towards this goal is to propose a conceptual model whose elements are as independent of the language as possible, and then to associate to each conceptual element a lexicalisation for each targeted language.

2.1 Ontology conceptualisation

In order to conceptualise an ontology that structures and standardises the description of NLP tools, we proceeded to a joint study of:

• Documentation for various NLP tools processing the aforementioned languages, which have been installed and closely tested;

• A large collection (around ten thousand) of structured ReadMe files in the Markdown format, crawled from GitHub repositories;

• Meta-data models for Language Resources (LR) such as CMDI (Broeder et al., 2012) or the META-SHARE meta-data model ontology (McCrae et al., 2015).

This study gave us guidelines to define bundles of properties sharing similar semantics: for example, properties referring to the affiliation of the tool (such as hasAuthor, hasLaboratory or hasProjet), to its installation, or to its usage.

We distinguish two levels of meta-data: 1) a mandatory level providing the basic elements that constitute a ReadMe file, and 2) a non-mandatory level that contains additional information such as relations to other tools, fields or methods. The latter serves tool indexation within the on-line platform.

Figure 2 details the major bundles of properties that we conceptualised to describe an NLP tool. The processed languages are defined within the bundle Task: indeed, an NLP tool may have different tasks, which may apply to different languages.

As our ambition is to propose pragmatic descriptions detailing the possible installation and execution procedures, we particularly focused on the decomposition of these procedures into atomic actions.

Figure 2: Bundles of properties representing ReadMe sections (About, Installation, Affiliation, Licence, Task, Configuration and Description, split into mandatory and non-mandatory properties)

2.2 Multilingual translation functions

Within the ontology, NLP tools are characterised by their properties. Values allocated to these properties are as much as possible independent of the language (date of creation and last update, developer or license names, operating system information, ...). Hence, what needs to be lexicalised is the semantics of each defined property. Each NLP expert associates to each property a translation function (TF) that formalises the lexical formulation of the property in the language he speaks. TFs are defined once for each language; the amount of work did not exceed half a day per language to associate TFs to the roughly eighty properties of the ontology. In order to ensure a clean separation between the conceptual and the lexical layers, TFs are defined within a SKOS ontology. The SKOS ontology structure is automatically created from the OWL ontology. Thus, adding a new language essentially consists in adding, within SKOS, TFs in that particular language for each OWL property. Translation functions are of two kinds:

1. P(V1) ; * V1 * @lang

2. P(V1,V2) ; * V1 * V2 * or * V2 * V1 * @lang

with P a property, * a set of words that can be empty, V1 and V2 values of the property P, and @lang an OWL language tag that determines the language in which the property is lexicalised. Below are two examples of translation functions for Japanese, associated to the properties authorFirstName and download:

• authorFirstName(V1) ; 作成者名: V1 @jp

• download(V1,V2) ; V2 から V1 をダウンロードする。@jp
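A minimal sketch of how such TFs can be applied, assuming a simple template store keyed by property and language tag. The English templates, the V1/V2 slot convention and the helper function are illustrative assumptions, not the system's actual SKOS machinery:

```python
def lexicalise(prop, values, tfs, lang):
    """Apply a translation function: look up the template for (property,
    language) and substitute the property's values into its V1, V2, ...
    slots. Values themselves are not translated."""
    template = tfs[(prop, lang)]
    out = template
    for i, v in enumerate(values, start=1):
        out = out.replace("V" + str(i), v)
    return out

# hypothetical TF store keyed by (property, language tag)
tfs = {
    ("authorFirstName", "en"): "Author first name: V1",
    ("download", "en"): "download V1 from V2",
}
print(lexicalise("download", ["jieba-0.38.zip", "PyPI"], tfs, "en"))
```

Adding a new language then amounts to adding one template per property under a new language tag, mirroring the paper's SKOS-based design.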


3 Natural language generation of multilingual ReadMe files

In our framework, each NLP expert finds, installs and uses available NLP tools processing the language he speaks. Then, he describes every tool that runs correctly via an on-line platform connected to the ontology (cf. Figure 1). Elements of description do not only come from an existing ReadMe: when ReadMes exist, they are rarely exhaustive. Hence, experts also gather tool information from the web and while installing and testing each tool.

At this step, the OWL ontology is filled and the translation functions of each property are defined within the SKOS ontology. Our aim is to generate ordered and structured ReadMe files in different languages. To do so, we use Natural Language Generation (NLG) techniques adapted to the Semantic Web (also named ontology verbalisation) (Staykova, 2014; Bouayad-Agha et al., 2014; Cojocaru and Trãuşan-Matu, 2015; Keet and Khumalo, 2016). NLG can be divided into several tasks (Reiter and Dale, 2000; Staykova, 2014). Our approach currently includes content selection, document structuring, knowledge aggregation, and lexicalisation. The use of more advanced tasks such as referring expression aggregation, linguistic realisation and structure realisation is among our perspectives.

3.1 Ontology content selection and structuring

Unlike the majority of ontology verbalisation approaches, we do not intend to verbalise the whole content of the ontology. We simply verbalise the properties and values that characterise pertinent information that has to appear in a ReadMe file. The concerned properties are those which belong to the mandatory level (cf. section 2.1).

The structure of ReadMe files is formalised within the ontology. First, ReadMe files are organised into sections based on the bundles of properties defined in the ontology (cf. Figure 2). Within each section, the order of the properties is predefined. Both installation and execution procedures are decomposed into their atomic actions. These actions are automatically numbered according to their order of execution (cf. Figure 3). Different installation and execution procedures may exist according to the operating system (Linux, Windows, ...), architecture (32 bits, 64 bits, x86, ...), language platform (JAVA 8, Python 3, ...) and so on. Likewise, execution procedures depend on the tasks the NLP tool performs and on the languages it processes. Thus, each procedure is distinguished and its information grouped under its own heading. Moreover, execution procedures are also ordered, as an NLP tool may have to perform tasks in a particular ordered sequence. This structuring is part of the ontology conceptualisation: it consists in defining property and sub-property relations and in associating a sequence number to each property that has to be lexicalised.

3.2 Ontology content aggregation and lexicalisation

Following the heuristics proposed in (Androutsopoulos et al., 2014) and (Cojocaru and Trãuşan-Matu, 2015) to obtain concise text, OWL property values are aggregated when they characterise the same object. For example, if an execution procedure (epi) has two values for operating system (e.g. Linux and Mac), then the two values are merged as follows:

hasOS(epi, Linux) ∧ hasOS(epi, Mac) ⇒ hasOS(epi, Linux and Mac)
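This aggregation rule can be sketched as follows; the triple representation and helper function are illustrative assumptions:

```python
def aggregate(triples):
    """Merge property values that characterise the same object, e.g.
    hasOS(ep1, Linux) and hasOS(ep1, Mac) -> hasOS(ep1, 'Linux and Mac')."""
    merged = {}
    for prop, obj, value in triples:
        merged.setdefault((prop, obj), []).append(value)
    return {key: " and ".join(vals) for key, vals in merged.items()}

# hypothetical property assertions for two execution procedures
triples = [("hasOS", "ep1", "Linux"), ("hasOS", "ep1", "Mac"),
           ("hasOS", "ep2", "Windows")]
print(aggregate(triples))
```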

The last step consists in property lexicalisation. While a number of approaches rely on the names and labels of ontology elements (often in English) to infer a lexicalisation (Bontcheva, 2005; Sun and Mellish, 2006; Williams et al., 2011), in our approach the lexicalisation of properties depends only on their translation functions. During ontology verbalisation, each targeted language is processed one after the other: the TF of each encountered property for the current language is retrieved and used to lexicalise the property. Property values are treated as variables of the TFs; they are not translated, as we ensure that they are as much as possible independent of the language. Figure 3 gives an example of two installation procedures for the NLP tool Jieba, which processes Chinese. In this example, actions are lexicalised in English, and the lexicalised command lines appear between brackets.

As a result of this generation, all ReadMe files have the same structure, organisation and, as much as possible, level of detail, especially regarding the installation and execution procedures


which represent the key information for a tool's usage. The resulting texts are simple, which suits a ReadMe. However, it could be valuable to use more advanced NLG techniques such as referring expression aggregation, linguistic realisation and structure realisation to produce less simplified natural language texts.

Procedure name: wget - ubuntu
1- download jieba-0.38.zip via wget (wget https://pypi.python.org/packages/f6/86/9e721cc52075a07b7d07eb12bcb5dde771d35332a3dae1e14ae4290a197a/jieba-0.38.zip)
2- unzip jieba-0.38.zip (unzip jieba-0.38.zip)
3- go to the directory jieba-0.38 (cd jieba-0.38/)
4- type the command: python setup.py install

Procedure name: pip - ubuntu
1- type the command: sudo pip install jieba

Figure 3: Two installation procedures of the NLP tool Jieba, lexicalised in English.

4 Conclusion

We proposed an ontology-based approach for generating simple, structured and organised ReadMe files in different languages. ReadMe structuring and lexicalisation are guided by the ontology properties and their associated translation functions for the targeted languages. The generated ReadMes are intended to be accessible via an on-line platform that documents, in several languages, NLP tools processing different languages. In the near future, we plan to evaluate the complexity, for end-users of different levels of expertise, of installing and executing NLP tools using our generated ReadMe files. We also hope that, as a side product, the proposed conceptualisation may provide a starting point for establishing the guidelines and best practices that NLP tool documentation often lacks, especially for under-resourced languages.

References

Ion Androutsopoulos, Gerasimos Lampouras, and Dimitrios Galanis. 2014. Generating natural language descriptions from OWL ontologies: the NaturalOWL system. CoRR, abs/1405.6164.

Kalina Bontcheva. 2005. Generating tailored textual summaries from ontologies. In The Semantic Web: Research and Applications: Second European Semantic Web Conference, ESWC, pages 531–545.

Nadjet Bouayad-Agha, Gerard Casamayor, and Leo Wanner. 2014. Natural language generation in the context of the semantic web. Semantic Web, 5(6):493–513.

Daan Broeder, Dieter Van Uytvanck, Maria Gavrilidou, Thorsten Trippel, and Menzo Windhouwer. 2012. Standardizing a component metadata infrastructure. In LREC, pages 1387–1390.

Dragoş Alexandru Cojocaru and Ştefan Trãuşan-Matu. 2015. Text generation starting from an ontology. In Proceedings of the Romanian National Human-Computer Interaction Conference (RoCHI), pages 55–60.

C. Maria Keet and Langa Khumalo. 2016. Toward a knowledge-to-text controlled natural language of isiZulu. Language Resources and Evaluation, pages 1–27.

John P. McCrae, Penny Labropoulou, Jorge Gracia, Marta Villegas, Víctor Rodríguez-Doncel, and Philipp Cimiano. 2015. One ontology to bind them all: The META-SHARE OWL ontology for the interoperability of linguistic datasets on the web. In ESWC (Satellite Events), pages 271–282.

Ehud Reiter and Robert Dale. 2000. Building Natural Language Generation Systems. Cambridge University Press.

Kamenka Staykova. 2014. Natural language generation and semantic technologies. Cybernetics and Information Technologies, 14(2):3–23.

Xiantang Sun and Chris Mellish. 2006. Domain independent sentence generation from RDF representations for the semantic web. In Combined Workshop on Language-Enabled Educational Technology and Development and Evaluation of Robust Spoken Dialogue Systems, European Conference on AI.

Sandra Williams, Allan Third, and Richard Power. 2011. Levels of organisation in ontology verbalisation. In Proceedings of the 13th European Workshop on Natural Language Generation, pages 158–163.


Proceedings of the 2nd International Workshop on Natural Language Generation and the Semantic Web (WebNLG), pages 50–53, Edinburgh, Scotland, September 6th, 2016. ©2016 Association for Computational Linguistics

Comparing the template-based approach to GF: the case of Afrikaans

Lauren Sanby
Computer Science Dept.
University of Cape Town
Cape Town, South Africa
{lauren.sanby,ion.todd}@alumni.uct.ac.za

Ion Todd
Computer Science Dept.
University of Cape Town
Cape Town, South Africa

C. Maria Keet
Computer Science Dept.
University of Cape Town
Cape Town, South [email protected]

Abstract

Ontologies are usually represented in OWL, which is not easy for domain experts to grasp. A solution to bridge this gap is to use a controlled natural language or natural language generation (NLG), which allows the knowledge in the ontology to be rendered automatically into a natural language. Several approaches exist to realise this. We used both templates and the Grammatical Framework (GF) and examined the feasibility of each by developing NLG modules for a language that had none: Afrikaans. The template system requires manual translation of the ontology's vocabulary into Afrikaans, if not already done, while the GF system can translate the terms automatically. Yet, the template system is found to produce more grammatically correct sentences and verbalises the ontology slightly faster than the GF system. The template-based approach also seems easier to extend for future development.

1 Introduction

The knowledge acquisition bottleneck has been well known for many years, and many proposals have been made to ameliorate this problem. One such avenue is to avail of a natural language interface. This has gained traction in the Semantic Web community in the past 10 years; two recent surveys on this intersection serve to illustrate its relevance (Bouayad-Agha et al., 2014; Safwat and Davis, 2016). While a template-based approach to generating natural language from OWL files is popular (e.g., (Androutsopoulos et al., 2013; Third et al., 2011)), other approaches have been proposed, from 'patterns' (Keet and Khumalo, 2014) to specific grammars for controlled natural languages (Kuhn, 2013) to the comprehensive Grammatical Framework (GF), which principally serves to translate between natural languages (Gruzitis et al., 2010; Ranta, 2011). It is not clear what would be the 'best' approach and technology to generate sentences from OWL files, if any; this may depend more on the system requirements or on the grammar of the language. To this end, we ran a fairly controlled experiment: two people with similar backgrounds in computer science built, in the same time frame, two NLG modules that take OWL files as input for a language that had none, Afrikaans. The template-based approach included a formal specification of correctness of the encoding and a proof-of-concept implementation. The GF-based approach used GF and required substantial software development. Both were evaluated and compared.

The tools, source code, template specification and GF file, test data, output files, and further information on design, proofs, and analyses of the experiments are online at: http://pubs.cs.uct.ac.za/honsproj/cgi-bin/view/2015/sanby_todd.zip/.

2 Design of the verbalisers

The template-based approach followed an NLG system-oriented development process (Reiter, 1997), focussing on: surface realisation as to what should go in the templates, a formal proof of correctness of the templates with respect to OWL, and subsequent implementation. Surface realisation included design choices; e.g., 'must be' vs. 'at least one', which is illustrated here for 'is part of' in Tak ⊑ ∃is deel van.Boom:

(1) Elke tak moet deel van 'n boom wees 'each branch must be part of a tree'

(2) Elke tak is deel van ten minste een boom 'each branch is part of at least one tree'



noting that the second option is easier in a template-based approach, because then the name of the OWL object property can be used directly in the template rather than requiring additional string and verb processing.

GF is a functional programming language that has an abstract grammar as an intermediate language and a concrete grammar that defines how components should be put together in a sentence (Ranta, 2011). The latter is language-dependent and thus needs to be changed when adding a new language (e.g., (Angelov and Ranta, 2009)). GF has an Afrikaans library, but it needed to be extended so as to create the specific grammar files needed for verbalising OWL 2 DL ontologies in Afrikaans. The GF system also required software development for GF↔OWL file interaction, resulting in the architecture depicted in Fig. 1.

Figure 1: Architecture diagram for the GF approach. Preprocessing focuses on 'normalising' names from different ontologies, such as the property names is-part-of and isPartOf.
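The name normalisation done in preprocessing can be sketched as follows; normalise is a hypothetical helper illustrating the idea, not the authors' released code. It assumes variant spellings differ only in hyphens, underscores and letter case:

```python
import re

def normalise(name):
    """Map variant property spellings (is-part-of, is_part_of, isPartOf)
    to one canonical hyphen-free camelCase form."""
    # Split on hyphens/underscores, then on lowercase-to-uppercase boundaries.
    words = []
    for part in re.split(r"[-_]", name):
        words.extend(re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", part))
    words = [w.lower() for w in words if w]
    # Re-assemble as camelCase: first word lowercase, the rest capitalised.
    return words[0] + "".join(w.capitalize() for w in words[1:])
```

With this convention, normalise("is-part-of") and normalise("isPartOf") both yield isPartOf, so the two ontologies' property names can be matched against a single lexicon key.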

Let us illustrate the comparison of the declarative components with OWL's disjoint classes axiom. The template is specified in XML as follows:

<Constraint type="Disjoint">
  <Text>'n</Text><Object index="0"/>
  <Text>is nie 'n</Text><Object index="1"/>
  <Text>nie</Text>
</Constraint>

and the corresponding GF concrete grammar as:

DisjointClasses x y =
  {s = "'n " ++ x.s ++ " is nie 'n " ++ y.s ++ " nie"};

Then, with the first Object index, or x, being, say, Dier 'animal' and the second one (y) Plant 'plant', the sentence 'n dier is nie 'n plant nie 'an animal is not a plant' is generated. These templates and GF concrete grammars have been specified for most OWL 2 DL language features (see online material).
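To make the template mechanism concrete, the XML disjointness template can be filled in with a few lines of Python; this is an illustrative re-implementation (including the lowercasing of class names seen in the example output), not the authors' actual template engine:

```python
import xml.etree.ElementTree as ET

TEMPLATE = """<Constraint type="Disjoint">
  <Text>'n</Text><Object index="0"/>
  <Text>is nie 'n</Text><Object index="1"/>
  <Text>nie</Text>
</Constraint>"""

def verbalise(template_xml, objects):
    """Walk the template's children in document order, substituting
    <Object index="i"/> slots with class names and keeping <Text>
    fragments verbatim."""
    root = ET.fromstring(template_xml)
    tokens = []
    for el in root:
        if el.tag == "Text":
            tokens.append(el.text)
        elif el.tag == "Object":
            tokens.append(objects[int(el.get("index"))].lower())
    return " ".join(tokens)

print(verbalise(TEMPLATE, ["Dier", "Plant"]))  # 'n dier is nie 'n plant nie
```

Lowercasing here stands in for the capitalisation handling described in Section 3; a full engine would also normalise the ontology's vocabulary before substitution.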

3 Verbalisations compared

Six ontologies were verbalised in Afrikaans. Due to space limitations, we only include here a selection of a range of types of axioms to illustrate the output and compare the two approaches; the observations hold equally for the other axioms of the same type and across ontologies. The wine ontology is used, as that is an important subject domain in the Western Cape, where Afrikaans is widely spoken.

Taxonomic subsumption of named classes for, e.g., Chianti ⊑ ItalianWine 'Chianti is an Italian wine' is verbalised as:

T: Elke Chianti is ’n ItalianWyn.

G: elke [Chianti] is [ItalianWyn]

where G (GF-based) misses the indefinite article 'n that T (template-based) correctly has. Equivalent classes for 'Sweet wine is wine and has sugar with value sweet' are verbalised as:

T: Elke SoetWyn (is Wyn en het suiker ten minste een Soet)

G: 'n [SoetWyn] is [Wyn] en [HetSuiker] [Soet]

The ontology is ambiguous in this regard due to the use of "value", making it unclear whether ten minste een 'at least one' is the best rendering. A class expression with a disjunction or a one-of construct uses 'or' in both cases, illustrated here with a one-of in Zinfandel ⊑ ∀hasFlavor.{Strong, Moderate} 'Each Zinfandel has flavour only strong or moderate':

T: Elke Zinfandel het net geur Gematig of Sterk

G: elke [Zinfandel] is iets wat [Gematig] of [Sterk] [HetGeur]

noting that T's net 'only' is more precise with respect to the universal quantification in the axiom it verbalises, and G's HetGeur 'has flavour' is in the wrong place. Disjointness is correct in both cases, as in, e.g., LateHarvest ⊑ ¬EarlyHarvest 'Late harvest is not an early harvest':

T: 'n Laatoes is nie 'n VroeeOes nie.

G: 'n [Laatoes] is nie 'n [VroeeOes] nie

Object property range has valid alternatives; e.g., 'has sugar has as range wine sugar':

T: Iets het net suiker WynSuiker



G: Elke ding wat is [HetSuiker], is [WynSuiker]

Inverse object properties also have two valid alternatives, although T is easier to read thanks to the shorthand teenoorgestelde 'opposite/inverse', as in, e.g., 'made from grape is inverse of made into wine':

T: "gemaak in wyn" is die teenoorgestelde van "is gemaak van druiwe" (As X gemaak in wyn Y; Y is gemaak van druiwe X.).

G: As X [GemaakInWyn] Y dan Y [IsGemaakVanDruiwe] X. As X [IsGemaakVanDruiwe] Y dan Y [GemaakInWyn] X.

The verbalisations of a functional property are correct and understandable, respectively, although both ignore whether a domain was declared, which was 'wine' in 'Each thing has color at most one wine color':

T: Elke objek kan net een kleur hê.

G: Elke ding het een [HetKleur]

The templates cover object property characteristics better; e.g., with oorganklike 'transitive' in T, where G misses the implication, and G does not have a verbalisation for symmetric properties.

ABox assertions are correct in both approaches as well, although the template-based one is again more readable, also for different-individuals axioms, as in 'OffDry, dry, and sweet are all different', for almal 'all [of them]' fits with a sequence of more than two items:

T: AfDroe en Droe en Soet is almal verskillend.

G: [AfDroe], [Droe] en [Soet] is al verskillende

The verbalisation of object subproperties was incorrect in both cases. G verbalised it as symmetric, whereas T strung the two verbalised parts of the axiom together in the wrong order. Notwithstanding, we include it here, because it demonstrates that referring expressions were incorporated correctly, indicated with dit 'it':

T: As iets het wyn descriptor, dit het suiker.

(which ought to have read 'if something has sugar then it has a wine descriptor'). Data properties were not included in the GF-based approach, but were in the template-based approach, which contributes to the difference in the number of axioms verbalised (see Table 1). As the template-based approach took less time to develop, this freed up time to make the sentences more natural-language-like, such as addressing the capitalisation and changing the object property names from, e.g., IsGemaakVanDruiwe 'madeFromGrape' (as in the wine ontology) to is gemaak van druiwe.

4 Evaluation

We conducted two experiments with the proof-of-concept software to compare the two approaches.

4.1 Experiment set-up

The template- and GF-based programs were tested using six OWL ontologies. Measures such as the number of axioms verbalised and the time taken were collected. In addition, a general comprehensibility evaluation of the verbalisations was conducted with a human domain expert who is an Afrikaans mother-tongue speaker. Each of the six sets of sentences was assigned a quality category on a 5-point Likert scale: 1. Incomprehensible; 2. Almost completely incomprehensible; 3. Somewhat understandable; 4. Understandable but obvious errors; 5. Easy to understand, no obvious errors.

4.2 Results and Discussion

Table 1 includes the main quantitative results. Although the template program does not have 100% coverage for any of the ontologies, the only missing axioms are those that are explicitly ignored (data types, keys, universal class and property). The "sentences written" column shows that there is a difference in the number of sentences generated for all the ontologies for that reason; a detailed analysis is included in the online supplementary material.

The evaluation of the sentence quality is shown in Table 2. The quality of the template-based approach is higher on average, though not statistically significantly so (Mann-Whitney, p=0.12852). This is mainly because more time was available to refine the templates than GF's concrete grammar.

As can be seen in Table 1, the template-based approach outperformed the GF-based approach on almost all measured metrics. It should be noted, however, that the GF-based approach also includes a translation module and generates the GF files dynamically. Also, the GF↔OWL file interaction took extra time to develop, taking away time to refine the GF-based verbalisations. The on-the-fly translation takes more time to compute the results



Ontology          |ax.|   Ax. verb.    Pct. verb.     Sent. written    Time (s)     Time/sent. (ms)
                          T      G     T      G       T      G         T     G      T       G
Pizza             712     707    711   99.3   99.9    707    711       1.7   2.1    2.4     3.0
AfricanWildlife   56      55     56    98.2   100.0   57     57        1.3   1.7    23.2    30.9
ComputerScience   52      48     44    92.3   84.6    48     44        1.1   1.7    23.1    38.4
Wine              657     635    628   96.7   95.6    635    628       5.6   5.5    8.8     8.7
University        95      91     94    95.8   99.0    91     94        1.1   1.4    12.0    15.2
Stuff             136     134    110   98.5   80.9    175    110       1.2   1.5    6.9     13.7
Average                                96.8   93.3    285.5  273.8     2.0   2.3    12.7    18.3

Table 1: Percentage ontology verbalisation for templates (T) and the GF program (G); |ax.| = total number of axioms; ax. verb. = axioms verbalised; pct. verb. = percentage verbalised; sent. = sentences.

Ontology          Template   GF
African Wildlife  5          4
Computer Science  4          4
Pizza             5          3
Stuff             4          2
University        4          4
Wine              4          4
Average           4.4        3.5

Table 2: Qualitative evaluation for the template-based and grammar-based approaches.

compared to matching the OWL file with the templates in the XML file. While this is relatively minor with a small ontology, having a user wait 6 seconds even with the wine ontology might become prohibitively slow with larger ontologies, as well as in use cases that require runtime sentence generation. Finally, what also contributed to the templates' success is Afrikaans itself, which has very few morphological issues and no complex system of concordial agreement like, e.g., isiZulu (Keet and Khumalo, 2014), which is also spoken in South Africa.

5 Conclusions

There is no clear 'winner' between a template-based approach and GF when one has to start from scratch with a natural language that is relatively amenable to templates, such as Afrikaans. Both are feasible, with the GF-based approach requiring more upfront investment and the template-based approach being easier to understand and therefore easier to refine and extend, provided a multilingual system does not become a requirement.

References

I. Androutsopoulos, G. Lampouras, and D. Galanis. 2013. Generating natural language descriptions from OWL ontologies: the NaturalOWL system. Journal of Artificial Intelligence Research, 48:671–715.

K. Angelov and A. Ranta. 2009. Implementing controlled languages in GF. In Proc. of CNL'09.

N. Bouayad-Agha, G. Casamayor, and L. Wanner. 2014. Natural language generation in the context of the semantic web. Semantic Web J., 5(6):493–513.

N. Gruzitis, G. Nespore, and B. Saulite. 2010. Verbalizing ontologies in controlled Baltic languages. In Proc. of HLT–The Baltic Perspective, volume 219 of FAIA, pages 187–194. IOS Press. Riga, Latvia.

C. M. Keet and L. Khumalo. 2014. Basics for a grammar engine to verbalize logical theories in isiZulu. In Proc. of RuleML'14, volume 8620 of LNCS, pages 216–225. Springer. Aug. 18-20, 2014, Prague, Czech Republic.

T. Kuhn. 2013. A principled approach to grammars for controlled natural languages and predictive editors. J. of Logic, Lang. and Inf., 22(1):33–70.

A. Ranta. 2011. Grammatical Framework: Programming with Multilingual Grammars. CSLI Publications, Stanford.

E. Reiter and R. Dale. 1997. Building applied natural language generation systems. Natural Language Engineering, 3:57–87.

H. Safwat and B. Davis. 2016. CNLs for the semantic web: a state of the art. Language Resources & Evaluation, in print: DOI 10.1007/s10579-016-9351-x.

A. Third, S. Williams, and R. Power. 2011. OWL to English: a tool for generating organised easily-navigated hypertexts from ontologies. Poster/demo paper, Open University UK. ISWC'11, 23-27 Oct 2011, Bonn, Germany.



Proceedings of the 2nd International Workshop on Natural Language Generation and the Semantic Web (WebNLG), pages 54–57, Edinburgh, Scotland, September 6th, 2016. ©2016 Association for Computational Linguistics

Generating Paraphrases from DBPedia using Deep Learning

Amin Sleimi
Université de Lorraine, Nancy (France)

[email protected]

Claire Gardent
CNRS/LORIA, Nancy (France)
[email protected]

Abstract

Recent deep learning approaches to Natural Language Generation mostly rely on sequence-to-sequence models. In these approaches, the input is treated as a sequence, whereas in most cases the input to generation is a tree or a graph. In this paper, we describe an experiment showing how enriching a sequential input with structural information improves results and helps support the generation of paraphrases.

1 Introduction

Following work by (Karpathy and Fei-Fei, 2015; Kiros et al., 2014; Vinyals et al., 2015; Fang et al., 2015; Xu et al., 2015; Devlin et al., 2014; Sutskever et al., 2011; Bahdanau et al., 2014; Luong et al., 2014), there has been much recent work on using deep learning techniques to generate text from data. (Wen et al., 2015) use a recurrent neural network to generate text from dialogue speech acts. Using biography articles and infoboxes from the WikiProject Biography, (Lebret et al., 2016) learn a conditional neural language model to generate text from infoboxes.

A basic feature of these approaches is that both the input and the output data are represented as sequences, so that generation can be modeled using a Long Short-Term Memory model (LSTM) or a conditional language model.

Mostly, however, the data taken as input by natural language generation systems is tree- or graph-structured, not linear.

In this paper, we investigate a constrained generation approach where the input is enriched with constraints on the syntactic shape of the sentence to be generated. As illustrated in Figure 1, there is a strong correlation between the shape

[Figure 1 shows two input shapes and their verbalisations:
T1 (chain): A —mission→ B —operator→ C
  S1.1  A participated in mission B operated by C
  S1.2  A participated in mission B which was operated by C
T2 (tree): A —occupation→ D; A —birthPlace→ E
  S2.1  A was born in E. She worked as an engineer.
  S2.2  A was born in E and worked as an engineer.]

Figure 1: Input and Output Shapes (A = Susan Helms, B = STS 78, C = NASA, D = engineer, E = Charlotte, North Carolina).

of the input and the shape of the corresponding sentence. The chaining structure T1, where B is shared by two predications (mission and operator), will favour the use of a participial or a passive subject relative clause. In contrast, the tree structure T2 will favour the use of a new clause with pronominal subject or a coordinated VP. Using synthetic data, we explore different ways of integrating structural constraints in the training data. We focus on the following two questions.

1. Does structural information improve performance?

We compare an approach where the structure of the input and of the corresponding paraphrase is made explicit in the training data with one where it is left implicit. We show that a model trained on a corpus making this information explicit helps improve the quality of the generated sentences.

2. Can structural information be used to generate paraphrases?



Our experiments indicate that training on corpora making structural information explicit in the input data permits generating not one but several sentences from the same input.

[Figure 2 shows an input graph with shared subject A and edges A —mission→ B, A —birthDate→ C, A —birthPlace→ D.]

Figure 2: Example Input Graph (subject and object names have been replaced by capital letters)

In this first case study, we restrict ourselves to input data of the form illustrated in Figure 2 (i.e., input data consisting of three DBPedia triples related by a shared subject: (e p1 e1) (e p2 e2) (e p3 e3)) and explore different strategies for learning to generate paraphrases using the sequence-to-sequence model described in (Sutskever et al., 2011).

2 Training Corpus

To learn our sequence-to-sequence models for generation and to test our hypotheses, we build a synthetic data-to-text training corpus for generation which consists of 18,397 (data, text) pairs split into 11,039 pairs for training, 7,358 for development and 7,358 for testing.

We build this corpus by extracting data from DBPedia using SPARQL queries and by generating text using an existing surface realiser. As a result, each training item associates a given input shape (the shape of the RDF tree from DBPedia) with several output shapes (the syntactic shapes of the sentences generated from the RDF data by our surface realiser). Figure 3 shows an example input data and the corresponding paraphrases.

2.1 Data

RDF triples consist of (subject property object) tuples such as (Alan Bean occupation Test pilot). As illustrated in Figure 1, RDF data can be represented by a graph in which edges are labelled with properties and vertices with subject and object resources.

To construct a corpus of RDF data units which can serve as input for NLG, we retrieve sets of RDF triples from the DBPedia SPARQL endpoint.

Given a DBPedia category (e.g., Astronaut), we define a SPARQL query that searches for all entities of this category which have a given set of properties. The query then returns all sets of RDF triples which satisfy it. For instance, for the category Astronaut, we use the SPARQL query shown in Figure 4. Using this query, we extract sets of DBPedia triples corresponding to 634 entities (astronauts).
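The per-category query construction can be sketched as a small query-building function; build_query is a hypothetical helper mirroring the OPTIONAL-pattern shape of the query in Figure 4, not the authors' extraction code:

```python
def build_query(category, properties):
    """Build a SPARQL query that retrieves all entities of a DBPedia
    category, wrapping each requested property in an OPTIONAL pattern
    so that entities missing some property are still returned."""
    select_vars = " ".join("?" + p for p in properties)
    optionals = "\n".join(
        "  OPTIONAL {{ ?x dbpedia2:{p} ?{p} . }}".format(p=p)
        for p in properties)
    return ("PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
            "SELECT ?x " + select_vars + "\n"
            "WHERE {\n"
            "  ?x rdf:type <http://dbpedia.org/ontology/" + category + "> .\n"
            + optionals + "\n"
            "}")

print(build_query("Astronaut", ["birthDate", "birthPlace", "mission"]))
```

Submitting such a query to the public endpoint would then yield one row per entity, with unbound variables for absent properties.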

2.2 Text

To associate data with text, we build lexical entries for DBPedia properties and use a small handwritten grammar to automatically generate text from sets of DBPedia triples using the GenI generator (Gardent and Kow, 2007).

Lexicon. The lexicon is constructed semi-automatically by tokenizing the RDF triples and creating a lexical entry for each RDF resource. Subject and object RDF resources trigger the automatic creation of a noun phrase whose string is simply the name of the corresponding resource (e.g., John E Blaha, San Antonio, ...). For properties, we manually create verb entries and assign each property a given lexicalisation. For instance, the property birthDate is mapped to the lexicalisation was born on.

Grammar. We use a simple Feature-Based Lexicalised Tree Adjoining Grammar which captures canonical clauses (1a), subject relative clauses (1b), VP coordination (1c) and sentence coordination (1d). Given this grammar, the lexicon described in the previous section and the RDF triple shown in (1a), the GenI generator generates the verbalisations shown in (1b–f).

(1) a. John E Blaha was born on 1942 08 26
    b. John E Blaha who was born in San Antonio worked as a fighter pilot
    c. John E Blaha was born on 1942 08 26 and worked as a fighter pilot.
    d. John E Blaha was born on 1942 08 26. He is from United States
    e. John E Blaha was born on 1942 08 26. He was born in San Antonio and worked as a fighter pilot



Input:        (JohnBlaha birthDate 1942 08 26) (JohnBlaha birthPlace SanAntonio) (JohnBlaha occupation Fighterpilot)
Simpl. Input: JohnBlaha birthDate 1942 08 26 birthPlace SanAntonio occupation Fighterpilot
S1: John Blaha who was born on 1942 08 26 was born in San Antonio. He worked as Fighter pilot
S2: John Blaha was born on 1942 08 26 and worked as Fighter pilot. He was born in San Antonio
S3: John Blaha was born on 1942 08 26 and was born in San Antonio. He is from United States
S4: John Blaha was born on 1942 08 26. He was born in San Antonio and worked as Fighter pilot
S5: John Blaha was born on 1942 08 26. He is from United States and was born in San Antonio
C-Input:      JohnBlaha ( birthDate 1942 08 26 ) birthPlace SanAntonio . occupation Fighterpilot

Figure 3: Example Data, Associated Paraphrases and Constrained Input from the Training Corpus

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT ?x ?birthDate (SAMPLE(?bP) as ?birthPlace)
       ?deathDate (SAMPLE(?dP) as ?deathPlace) ?occupation
       ?status ?nationality ?mission
WHERE {
  ?x rdf:type <http://dbpedia.org/ontology/Astronaut> .
  OPTIONAL { ?x dbpedia2:birthPlace ?bP . }
  OPTIONAL { ?x dbpedia2:birthDate ?birthDate . }
  OPTIONAL { ?x dbpedia2:deathPlace ?dP . }
  OPTIONAL { ?x dbpedia2:deathDate ?deathDate . }
  OPTIONAL { ?x dbpedia2:occupation ?occupation . }
  OPTIONAL { ?x dbpedia2:status ?status . }
  OPTIONAL { ?x dbpedia2:nationality ?nationality . }
  OPTIONAL { ?x dbpedia2:mission ?mission . }
}

Figure 4: The SPARQL query to the DBPedia endpoint for the Astronaut corpus



3 Learning

To learn a sequence-to-sequence model that can generate sentences from RDF data, we use the neural model described in (Sutskever et al., 2011) and the code distributed by Google Inc.1

We experiment with different versions of the training corpus.

Raw corpus (BL). This is our baseline system. In this case, the model is trained on the corpus of (data, text) pairs as is. No explicit information about the structure of the output is added to the data.

Raw Corpus + Structure Identifier (R+I). Each input data is associated with a structure identifier corresponding to one of the five syntactic shapes shown in Figure 3.

Raw Corpus + Infix Connectors (R+C). The input data is enriched with infix connectors where & specifies conjunction, parentheses indicate a relative clause and "." sentence segmentation. The last line in Figure 3 shows the R+C input for S1.
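A sketch of the R+C encoding, assuming the connector conventions described above (& for conjunction, parentheses for a relative clause, "." for a sentence break); the exact serialisation used in the experiments may differ:

```python
def encode_rc(subject, triples, structure):
    """Linearise (property, object) pairs for one subject, inserting the
    infix connector chosen for each pair: 'rel' wraps the pair in
    parentheses (relative clause), 'and' prefixes '&' (conjunction),
    'new' prefixes '.' (new sentence), None leaves the pair bare."""
    out = [subject]
    for (prop, obj), conn in zip(triples, structure):
        chunk = "{} {}".format(prop, obj)
        if conn == "rel":
            out.append("( {} )".format(chunk))
        elif conn == "and":
            out.append("& {}".format(chunk))
        elif conn == "new":
            out.append(". {}".format(chunk))
        else:
            out.append(chunk)
    return " ".join(out)

# Reproduces the shape of the C-Input line in Figure 3
# (the connector assignment for this example is our reading of it):
encode_rc("JohnBlaha",
          [("birthDate", "1942_08_26"),
           ("birthPlace", "SanAntonio"),
           ("occupation", "Fighterpilot")],
          ["rel", None, "new"])
```

Each choice of structure for the same set of triples yields a distinct C-Input, which is how one input graph comes to be paired with several target paraphrases.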

4 Evaluation and Results

We evaluate the results by computing the BLEU-4 score of the generated sentences against the reference sentence. Table 1 shows the results.
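For reference, sentence-level BLEU-4 against a single reference can be computed as below; this is a textbook implementation with a brevity penalty, not necessarily the exact scorer used in the experiments:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu4(candidate, reference):
    """Sentence-level BLEU-4: geometric mean of modified 1..4-gram
    precisions, scaled by a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, 5):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        precisions.append(overlap / max(sum(cand_counts.values()), 1))
    if min(precisions) == 0:
        return 0.0  # zero n-gram overlap at some order
    log_avg = sum(math.log(p) for p in precisions) / 4
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(log_avg)
```

In practice a smoothed variant is usually preferred at the sentence level, since short sentences often have no 4-gram overlap with the reference.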

The baseline and the R+I model have very low results. For the baseline model, this indicates that training on a corpus where the same input is associated with several distinct paraphrases makes it difficult to learn a good data-to-text generation model.

The marked difference between the R+I and the R+C model shows that simply associating each input with an identifier labelling the syntactic structure of the associated sentence is not sufficient to learn a model that predicts different syntactic structures for differently labelled inputs. Interestingly, training on a corpus where the input data is enriched with infixed connectors giving indications about the structure of the associated sentence yields much better results.

5 Conclusion

Using synthetic data, we presented an experiment which suggests that enriching the data input to

1 https://github.com/tensorflow/tensorflow/tree/master/tensorflow/models/rnn/translate

System   S1     S2     S3     S4     S5
BL       3.6    5.9    6.6    5.9    7.5
R+I      4.0    6.5    6.9    6.5    8.2
R+C      98.2   91.7   91.6   88.8   89.1

Table 1: BLEU-4 scores

generation with information about the corresponding sentence structure (i) helps improve performance and (ii) permits generating paraphrases.

Further work involves three main directions. First, the results obtained in this first case study should be tested for genericity. That is, the synthetic-data approach we presented here should be tested on a larger scale, taking into account input structures of different types (chaining vs branching) and different sizes.

Second, the approach should be extended and tested on "real data", i.e., on a training corpus where the DBPedia triples used as input data are associated with sentences produced by humans, and where there is, consequently, no direct information about their structure.

Third, we plan to investigate how various deep learning techniques, in particular recursive neural networks, could be used to capture the correlation between input data and sentence structure.

Acknowledgments

We thank the French National Research Agency for funding the research presented in this paper in the context of the WebNLG project ANR-14-CE24-0033.

References

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.

Jacob Devlin, Rabih Zbib, Zhongqiang Huang, Thomas Lamar, Richard M Schwartz, and John Makhoul. 2014. Fast and robust neural network joint models for statistical machine translation. In ACL (1), pages 1370–1380. Citeseer.

Hao Fang, Saurabh Gupta, Forrest Iandola, Rupesh K Srivastava, Li Deng, Piotr Dollar, Jianfeng Gao, Xiaodong He, Margaret Mitchell, John C Platt, et al. 2015. From captions to visual concepts and back. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1473–1482.

57

Page 66: Proceedings of the 2nd International Workshop on …2nd International Workshop on Natural Language Generation and the Semantic Web to be held on September 6th, 2016 in Edinburgh, Scotland.

C. Gardent and E. Kow. 2007. A symbolic approach to near-deterministic surface realisation using tree adjoining grammar. In ACL07.

Andrej Karpathy and Li Fei-Fei. 2015. Deep visual-semantic alignments for generating image descriptions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3128–3137.

Ryan Kiros, Ruslan Salakhutdinov, and Richard S Zemel. 2014. Unifying visual-semantic embeddings with multimodal neural language models. arXiv preprint arXiv:1411.2539.

R. Lebret, D. Grangier, and M. Auli. 2016. Generating Text from Structured Data with Application to the Biography Domain. ArXiv e-prints, March.

Minh-Thang Luong, Ilya Sutskever, Quoc V Le, Oriol Vinyals, and Wojciech Zaremba. 2014. Addressing the rare word problem in neural machine translation. arXiv preprint arXiv:1410.8206.

Ilya Sutskever, James Martens, and Geoffrey E Hinton. 2011. Generating text with recurrent neural networks. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 1017–1024.

Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. 2015. Show and tell: A neural image caption generator. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3156–3164.

Tsung-Hsien Wen, Milica Gasic, Nikola Mrksic, Pei-Hao Su, David Vandyke, and Steve Young. 2015. Semantically conditioned LSTM-based natural language generation for spoken dialogue systems. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1711–1721.

Kelvin Xu, Jimmy Ba, Ryan Kiros, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, and Yoshua Bengio. 2015. Show, attend and tell: Neural image caption generation with visual attention. arXiv preprint arXiv:1502.03044.



Proceedings of the 2nd International Workshop on Natural Language Generation and the Semantic Web (WebNLG), pages 59–66, Edinburgh, Scotland, September 6th, 2016. ©2016 Association for Computational Linguistics

Automatic Tweet Generation From Traffic Incident Data

Khoa Tran
School of Computing Science
Simon Fraser University
Burnaby, BC, CANADA
khoa [email protected]

Fred Popowich
School of Computing Science
Simon Fraser University
Burnaby, BC, CANADA
[email protected]

Abstract

We examine the use of traffic information with other knowledge sources to automatically generate natural language tweets similar to those created by humans. We consider how different forms of information can be combined to provide tweets customized to a particular location and/or a specific user. Our approach is based on data-driven natural language generation (NLG) techniques using corpora containing examples of natural language tweets. It specifically draws upon semantic data and knowledge developed and used in the web-based Connected Vehicles and Smart Transportation system. We introduce an alignment model, a generation model and a location-based user model which together support location-relevant information delivery. We provide examples of our system output and discuss evaluation issues with generated tweets.

1 Introduction

Traffic congestion continues to be a major problem in large cities around the world, and a source of frustration for commuters, commercial drivers, tourists, and even occasional drivers. Current efforts to reduce congestion and frustration often involve providing road users with real-time traffic information to help them estimate travel time accurately, resulting in better route planning and travel decisions (Tseng et al., 2013). The different approaches to delivering traffic information include radio, smart navigation devices and social networks.

Information from radio and social networks is delivered as messages, which consist primarily of natural language. When delivered on smart navigation devices, information is presented with colour and icons on interactive maps; for example, congested road segments are usually in red while clear road segments are in green.1

Text and audio messages associated with radio and social network channels are mainly human-generated, requiring time and effort. The information sources for these messages are primarily the same data used by smart navigation devices, in conjunction with camera images, eye-witness reports and other sources, which collectively require substantial effort and time. Although several social network channels may use computer programs (i.e., "bots") to generate messages automatically from a data source, these messages are constructed using strict templates, which appear to users as cold, unnatural, distant and unreliable.

2 Our Approach

We look at the role of natural language generation (NLG) in the context of a system that automatically generates messages about traffic incidents. Our approach is based on data-driven NLG techniques where corpora containing examples of natural language tweets are used to train the model to generate natural language texts. It draws upon semantic data and knowledge developed and used in the web-based Connected Vehicles and Smart Transportation (CVST) system (Tizghadam and Leon-Garcia, 2015). We introduce an alignment model, a generation model and a location-based user model which together support location-relevant information delivery.

We design a traffic notification system having a location-based user model to predict a user’s routes and deliver real-time notifications if traffic incidents occur. Figure 1 shows the design of our proposed system. The GPS location of a user is

¹Google Maps uses this colour code, as described in https://support.google.com/maps/answer/3092439?hl=en&rd=1


Page 68: Proceedings of the 2nd International Workshop on Natural Language Generation and the Semantic Web, September 6th, 2016, Edinburgh, Scotland.

Figure 1: The traffic notification system, which delivers location-relevant traffic information to road users.

processed through the location-based user model, which predicts a ranked list of routes and destinations the user could take. Concurrently, a live stream of traffic incidents is collected and forwarded to the location-relevant information filter. This component applies a location filter to the traffic incident data based on the predicted routes and destinations. The output is the set of data scenarios for traffic incidents that happen on or near the routes the user may take. Next, the generation model composes short messages describing nearby traffic incidents. These messages are sent to users as textual notifications or as speech using a text-to-speech system.
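The pipeline described above can be sketched as follows. This is purely illustrative stub code under our own assumptions: the component names, data fields and stub logic are not the CVST implementation, and the real user model and generator are learned components rather than the hard-coded stand-ins shown here.

```python
# Minimal sketch of the notification pipeline: user model -> location
# filter -> generation model -> notification. All logic is illustrative.

def predict_routes(gps_location):
    """Location-based user model stub: return a ranked list of likely
    routes (each route is a list of road-segment ids)."""
    # In the real system this would be a learned probabilistic model.
    return [["401-eb-yonge", "401-eb-bayview"], ["dvp-sb-lawrence"]]

def location_filter(incidents, routes):
    """Keep only incidents occurring on (or near) a predicted route."""
    on_route = {seg for route in routes for seg in route}
    return [inc for inc in incidents if inc["segment"] in on_route]

def generate_message(incident):
    """Template stand-in for the data-driven generation model."""
    return f"{incident['reason']} on {incident['segment']}"

def notify(gps_location, incident_stream):
    routes = predict_routes(gps_location)
    relevant = location_filter(incident_stream, routes)
    return [generate_message(inc) for inc in relevant]

incidents = [
    {"segment": "401-eb-yonge", "reason": "disabled vehicle"},
    {"segment": "gardiner-wb", "reason": "collision"},
]
# Only the incident on a predicted route survives the filter.
messages = notify((43.7, -79.4), incidents)
```

The point of the sketch is the data flow: incidents irrelevant to the user's predicted routes never reach the generation model.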

We construct our corpus from two data-sets from the CVST APIs². The first data-set is a collection of 13,667 tweets mentioning traffic incidents in the greater Toronto area of Canada. The second data-set consists of 27,795 records concerning road incidents in greater Toronto. Using incident times and locations from the two data-sets, we are able to match tweets with the road incidents to construct a corpus of road incidents with their corresponding tweets. We also explored other traffic-related data-sets that could be used to train our system; however, the use of such data is restricted, as discussed in Section 4.1. On the other hand, for each road incident in our constructed corpus, we are able to collect more than one tweet from different users mentioning the event. Utilizing these human-generated texts ensures better output quality for our NLG system.

Using the constructed corpus, we apply an existing semantic alignment model (Section 4.2) to learn the semantic correspondences between data

²http://portal.cvst.ca

records and their textual descriptions in the tweets. Then, we apply a model for concept-to-text generation (Section 4.3) to generate tweets about traffic incidents from given records. However, our system’s output is not limited to tweet generation. Output can be personalized, for example, as a virtual assistant, to generate traffic notifications for users based on their driving routines, incorporating daily routes, departure and arrival times, and specific locations. Previous work on capturing users’ locations and route prediction (Section 4.4) can be applied to select only the traffic information of potential interest to each user and deliver it to the drivers.

The evaluation of automatically generated tweets can be approached from several perspectives. Evaluation in the context of the task outlined in Figure 1 can involve human subjects, looking at metrics such as the usefulness of tweets (using rating criteria like those for rating the helpfulness of reviews or comments) and the quality of tweets (involving fluency and readability). Detailed human evaluation of the tweets is beyond the scope of the current research. Our plan is to focus on automated techniques in the evaluation of the automatically generated tweets, given that we have a gold standard of human-generated data. To evaluate our models, we build on previous evaluation techniques such as BLEU and METEOR (Konstas, 2014).
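As a rough illustration of this kind of automatic evaluation, the core of BLEU is a clipped (modified) n-gram precision, which can be computed as follows. This is a simplified sketch, not a full BLEU implementation: there is no brevity penalty, no smoothing, and only a single n-gram order at a time; the example sentences are invented.

```python
from collections import Counter

# Simplified sketch of BLEU's clipped n-gram precision:
# each candidate n-gram count is clipped by the maximum number of
# times it occurs in any single reference.

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def clipped_precision(candidate, references, n=1):
    cand = ngrams(candidate, n)
    if not cand:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for g, c in ngrams(ref, n).items():
            max_ref[g] = max(max_ref[g], c)
    clipped = sum(min(c, max_ref[g]) for g, c in cand.items())
    return clipped / sum(cand.values())

cand = "eb 401 blocked at yonge".split()
refs = ["401 eb blocked at yonge st".split()]
p1 = clipped_precision(cand, refs, n=1)  # all 5 unigrams occur in the reference
p2 = clipped_precision(cand, refs, n=2)  # only 2 of 4 bigrams match
```

A full BLEU score would combine several n-gram orders geometrically and apply a brevity penalty; the clipping step shown here is what prevents a candidate from being rewarded for repeating a matching word.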

3 Related work

Recent work in automatic tweet generation focuses on the tasks of automatic text summarization and topic classification. Lloret and Palomar (2013) present a framework that automatically



generates Twitter headlines for journal articles using text summarization approaches. Analogously, by applying different text processing techniques including content grouping, topic classification and text summarization, Lofi and Krestel (2012) develop a system that generates tweets from government documents. Krokos and Samet (2014) also utilize several sentiment analysis and classification methods in their approach to automatically discover and generate hashtags for tweets that do not have user-generated hashtags. On the other hand, Sidhaye and Cheung (2015) use different metrics and statistics to show that most tweets cannot be reconstructed from the original articles that they reference, concluding that extractive summarization methods are of limited use for generating indicative tweets.

Our work focuses on another aspect, where tweets are constructed from structured data; in our case the data can come from a real-time web application. Our generation task is also known as data-to-text, concept-to-text or linguistic description of data generation. The domain of our NLG work is novel with respect to previous work in different domains including weather forecasts (Ramos-Soto et al., 2015a), educational reports (Bontcheva and Wilks, 2004; Ramos-Soto et al., 2015b) and clinical reports (Portet et al., 2009).

Although the results from previous work are promising, and some have proven to be better than human-generated content, they still have limitations. Most current approaches are based on very specific rules or grammars; therefore, adapting these systems to a different data-set or domain usually requires re-designing the entire system. However, data-driven techniques are applicable to different domains (Liang et al., 2009; Angeli et al., 2010; Kim and Mooney, 2010; Konstas, 2014). These approaches define probabilistic models that can be trained to learn the patterns and hidden alignments between data and text, thereby avoiding the construction of rules and grammars that require domain-specific knowledge.

We focus on a generation system that can apply to different types of traffic data-sets (road closures, road incidents, traffic flow, etc.). We use an existing alignment model (Liang et al., 2009) to learn the semantic correspondences between traffic data and its textual description. Konstas (2014)’s concept-to-text generation approach is used for the automatic generation of tweets to be

ultimately incorporated into a real-time system. Overall, we chose our data-driven approach

since it is location independent; different cities have different kinds of traffic data (road closures, road incidents, road conditions), each with different kinds of data structures. We can handle different data-sets without changing the model structure, incorporating it into an end-to-end system with surface realisation and content planning in one model.

4 The Task

A data entry d consists of a set of records r = {r1, r2, ..., rn}. Each record is described with a record type ri.t, 1 ≤ i ≤ |r|, and a set of fields f. Each field fj ∈ f, 1 ≤ j ≤ |f|, has a field type fj.t and a value fj.v. A scenario in the training corpus is a pair (d, w) where w is the text describing the data entry d. Our goal is to train a model that represents the hidden alignments between the data entry d and the observed text w in the training corpus. Then, the trained model that captures the alignment is used to generate text g from a new entry d not contained in the training corpus.
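The task formalism above can be encoded directly as data structures. The following sketch is our own illustrative encoding (the class and attribute names are not from the paper), populated with field values taken from the example in Table 3:

```python
from dataclasses import dataclass

# Illustrative encoding of the task formalism: a data entry d is a set of
# records, each with a record type r_i.t and fields (f_j.t, f_j.v);
# a training scenario pairs a data entry d with its describing text w.

@dataclass
class Field:
    ftype: str   # f_j.t
    value: str   # f_j.v

@dataclass
class Record:
    rtype: str           # r_i.t
    fields: list         # the field set f

@dataclass
class Scenario:
    entry: list          # d: the set of records r
    text: str            # w: the text describing d

d = [
    Record("Main road", [Field("Name", "401"), Field("Direction", "Eastbound")]),
    Record("Condition", [Field("Value", "Bad traffic")]),
]
scenario = Scenario(d, "bad traffic on hwy 401 eastbound")
```

During training, the alignment between the records of `entry` and the spans of `text` is hidden and must be inferred; at generation time only `entry` is given and `text` is produced.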

4.1 Dataset

There are various types of traffic-related data, including traffic flow, traffic incidents, road constructions and road closures. Such data is usually available through different map and road navigation APIs such as Tom Tom Traffic³, Google Maps⁴ and Bing Maps⁵, or through government open data sources. Despite the wide availability of traffic-related data, most of the data are only useful for visualisation purposes since they lack corresponding textual descriptions. A few of the data sources have a short description associated with each data entry, such as Dublin City Council’s road works and maintenance⁶ and Bing Maps’ Traffic Incidents⁷. However, the text description is not sufficiently detailed to cover the essential information in the data entry.

The CVST project has APIs for different traffic-related data-sets for the greater Toronto area including traffic cameras and sensors, road closures and incidents, public transportation and tweets. We use two data-sets from the CVST APIs, road incidents and twitter incidents, to construct our corpus. The road incidents data-set has details about traffic incidents such as time, location, type and reason. The twitter incidents data-set contains basic information about each incident and its related tweets.

³http://developer.tomtom.com/
⁴https://developers.google.com/maps/
⁵https://www.bingmapsportal.com/
⁶https://data.dublinked.ie/dataset/roads-maintenance-annual-works-programme
⁷https://msdn.microsoft.com/en-us/library/hh441726.aspx

By matching times and locations of records in the two data-sets, we construct a parallel corpus of traffic incidents with their related tweets. However, the times and locations from the two data-sets are not always exactly matched. Therefore, we allow errors when matching these values. We consider two incidents from the two data-sets to be matched if:

• the events’ locations are within 100 meters of each other, and

• the events’ start times are within 90 minutes of each other.

The data is collected from January 2015 to May 2016. There are 27,795 records in the road incident data-set and 13,134 records with 13,667 tweets in the twitter incident data-set (some records have more than one associated tweet). After matching the two data-sets using the described rules, we have a corpus of 1,388 incidents and 2,829 tweets. The tweets are crawled from Twitter and are generated by both humans and machines.
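The matching rule above (locations within 100 m, start times within 90 minutes) can be sketched as follows. The record layout (`lat`, `lon`, `start` keys) is our own assumption for illustration; the haversine formula is a standard way to compute the distance between two coordinates.

```python
import math
from datetime import datetime, timedelta

# Sketch of the corpus-matching heuristic: pair a road incident with a
# tweet incident if they are within 100 m and 90 minutes of each other.
# The dictionary field names below are illustrative assumptions.

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in metres."""
    r = 6371000.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def is_match(road_incident, tweet_incident,
             max_dist_m=100.0, max_dt=timedelta(minutes=90)):
    dist = haversine_m(road_incident["lat"], road_incident["lon"],
                       tweet_incident["lat"], tweet_incident["lon"])
    dt = abs(road_incident["start"] - tweet_incident["start"])
    return dist <= max_dist_m and dt <= max_dt

a = {"lat": 43.7615, "lon": -79.4111, "start": datetime(2015, 3, 1, 8, 0)}
b = {"lat": 43.7618, "lon": -79.4112, "start": datetime(2015, 3, 1, 8, 45)}  # ~34 m away
c = {"lat": 43.8000, "lon": -79.4111, "start": datetime(2015, 3, 1, 8, 30)}  # ~4.3 km away
```

In practice, pairing every road incident against every tweet incident this way is quadratic; bucketing by time or a spatial index would make the matching pass cheap even at the data-set sizes reported above.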

4.2 The alignment model

Liang et al. (2009) introduce a hierarchical semi-Markov model to learn the correspondences between a world state and an unsegmented stream of text. Their approach is a generative process with three main components:

• Record choice: choose a sequence of records r = (r1, ..., r|r|) where each ri ∈ d and has a record type ri.t. The choice of consecutive records depends on their types.

• Field choice: for each chosen record ri, select a sequence of fields fi = (fi1, ..., fi|fi|) where each fij ∈ {1, ..., m}.

• Word choice: for each chosen field fij, choose a number cij > 0 and generate a sequence of cij words.

Their record choice model is described as a Markov chain of records conditioned on record types. Their intention is to capture salience and coherence. Formally:

p(r \mid d) = \prod_{i=1}^{|r|} p(r_i.t \mid r_{i-1}.t) \, \frac{1}{|s(r_i.t)|}

where s(ri.t) is the set of records in d that have record type ri.t and r0.t is the START record type. Their model also includes a special NULL record type responsible for generating text that does not belong to any real record type. Analogously, the field choice model is a Markov chain of fields conditioned on the choice of records:

p(f \mid r) = \prod_{i=1}^{|r|} \prod_{j=1}^{|f_i|} p(f_{ij} \mid f_{i(j-1)})

Two special fields — START and STOP — are also implemented to capture the transitions at the boundaries of the phrases. In addition, each record type has a NULL field aligned to words that refer to that record type in general. The final step of the process is the word choice model, where words are generated from the choice of records and fields. Specifically, for each field fij, we generate a number of words cij, chosen uniformly. Then the words w are generated conditioned on the field f:

p(w \mid r, f, c, d) = \prod_{k=1}^{|w|} p_w(w_k \mid r(k).t_{f(k)}, r(k).v_{f(k)})

where r(k) and f(k) are the record and field responsible for generating word wk, and pw(wk | t, v) is the distribution of words given a field type t and field value v. Their model supports three different field types. Depending on the field type, Liang et al. (2009) define different methods for generating words:

• Integer type: generate the exact value, rounding up, rounding down, or adding or subtracting unexplained noise ε+ or ε−

• String type: generate a word chosen uniformly from those in the field value

• Categorical type: maintain a separate multinomial distribution over words for each field value in the category.
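As a concrete illustration of the record choice component, the probability of a chosen record-type sequence under the Markov chain, including the uniform 1/|s(ri.t)| factor, can be computed as below. The transition probabilities here are made-up toy values, not learned parameters from the paper.

```python
from collections import Counter

# Toy illustration of the record choice model p(r | d): a Markov chain
# over record types (starting from the special START type), with a
# uniform 1/|s(r_i.t)| choice among records of that type in d.
# The transition table is invented for the example.

trans = {  # p(type_j | type_i)
    ("START", "Main road"): 0.7,
    ("Main road", "Condition"): 0.5,
}

def p_record_sequence(chosen_types, d_types):
    """chosen_types: record types r_1.t ... r_|r|.t of the chosen sequence;
    d_types: record types of all records in the data entry d."""
    counts = Counter(d_types)           # |s(t)| for each type t in d
    prob, prev = 1.0, "START"
    for t in chosen_types:
        prob *= trans[(prev, t)] * (1.0 / counts[t])
        prev = t
    return prob

d_types = ["Main road", "Condition", "Condition"]
# 0.7 * (1/1) for "Main road", then 0.5 * (1/2) for one of two "Condition"
# records: 0.7 * 0.5 * 0.5 = 0.175
p = p_record_sequence(["Main road", "Condition"], d_types)
```

The field choice and word choice factors multiply into the same product in exactly the same chain-of-conditionals fashion.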


Page 71: Proceedings of the 2nd International Workshop on …2nd International Workshop on Natural Language Generation and the Semantic Web to be held on September 6th, 2016 in Edinburgh, Scotland.

GCS:
1. S → R(start)   [Pr = 1]
2. R(ri.t) → FS(rj, start) R(rj.t)   [P(rj.t | ri.t) · 1/|s(rj.t)|]
3. R(ri.t) → FS(rj, start)   [P(rj.t | ri.t) · 1/|s(rj.t)|]
4. FS(r, r.fi) → F(r, r.fj) FS(r, r.fj)   [P(fj | fi)]
5. FS(r, r.fi) → F(r, r.fj)   [P(fj | fi)]
6. F(r, r.f) → W(r, r.f) F(r, r.f)   [P(w | w−1, r, r.f)]
7. F(r, r.f) → W(r, r.f)   [P(w | w−1, r, r.f)]

GSURF:
8. W(r, r.f) → α   [P(α | r, r.f, f.t, f.v), f.t = cat or null]
9. W(r, r.f) → gen(f.v)   [P(gen(f.v).mode | r, r.f, f.t = int) × P(f.v | gen(f.v).mode)]
10. W(r, r.f) → gen_str(f.v, i)   [Pr = 1]

Table 1: Grammar rules used for generation with their corresponding weights.

4.3 The generation model

Konstas (2014) recasts an earlier model (Liang et al., 2009) into a set of context-free grammar (CFG) rules. To capture word-to-word dependencies during the generation process, he added more rules to emit a chain of words, rather than words in isolation. Table 1 shows his defined grammar rules with their corresponding weights.

The first rule in the grammar expands from a start symbol S to a special START record R(start). Then, the chain of two consecutive records, ri and rj, is defined through rules (2) and (3). Their weight is the probability of emitting record rj given record ri, and corresponds to the record choice model of Liang et al. (2009). Equivalently, rules (4) and (5) define the chain of two consecutive fields, fi followed by fj, and their weight corresponds to the field choice model. Rules (6) and (7) are added to specify the expansion of field F to a sequence of words W. Their weight is the bigram probability of the current word given its previous word, the current record and field. Finally, rules (8)–(10) are responsible for generating words. If the field type is categorical (denoted as cat) or NULL (denoted as null), rule (8) is applied to generate a single word α from the vocabulary of the training set. Its weight is the probability of seeing α given the current record and field, where the field type is cat or null. Rule (9) is applied if the field type is integer (denoted as int). gen(f.v) is a function that accepts the field value (an integer) as its input and returns an integer using one of the six methods described by Liang et al. (2009). The weight is a multinomial distribution over the six integer generation function choices, given the record field f, times P(f.v | gen(f.v).mode), which is set to the geometric distribution of noise ε+ and ε−, or to 1 otherwise (Konstas, 2014). Rule (10) is responsible for generating a word for a string-type field. gen_str(f.v, i) is a function that simply returns the ith word of the string in the field value f.v.

After defining the grammar rules, Konstas (2014) treats the generation problem as a parsing problem using the CFG rules. He uses a modified version of the CYK algorithm (Kasami, 1966; Younger, 1967) to find the best text w given a structured data entry d. His basic decoder is presented as a deductive proof system (Shieber et al., 1995) in Table 2. The decoding process works in a bottom-up fashion. It starts by choosing N, the length (number of words) of the output text. Konstas (2014) determines N using a simple linear regression model with record-field pairs in the data entry d as features. Then, for each position i in the output text, it searches for the best scoring item that spans from i to i + 1 (one single word). Next, items are visited and combined into larger spans until the process reaches the goal item [S, 0, N]: symbol S spanning from position 0 to N.
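The bottom-up chart filling can be sketched generically as follows. This is a simplified illustration of the deductive system in Table 2, not Konstas (2014)'s decoder: the grammar and axiom scores are toy values, only the best score per item is tracked (no derivations, no k-best lists, no language model intersection).

```python
# Minimal sketch of bottom-up decoding over items [A, i, j]: the best
# score for symbol A spanning positions i..j, combined with weighted
# unary (A -> B) and binary (A -> B C) rules.

def decode(n, axioms, unary, binary, goal="S"):
    """axioms: {(A, i, i+1): score} for single-word items;
    unary: list of (A, B, weight); binary: list of (A, B, C, weight)."""
    chart = dict(axioms)
    for span in range(1, n + 1):
        for i in range(0, n - span + 1):
            j = i + span
            # binary rules: combine [B, i, k] and [C, k, j] into [A, i, j]
            for k in range(i + 1, j):
                for a, b, c, w in binary:
                    s1, s2 = chart.get((b, i, k)), chart.get((c, k, j))
                    if s1 is not None and s2 is not None:
                        score = w * s1 * s2
                        if score > chart.get((a, i, j), 0.0):
                            chart[(a, i, j)] = score
            # unary rules, applied to convergence over this span
            changed = True
            while changed:
                changed = False
                for a, b, w in unary:
                    s = chart.get((b, i, j))
                    if s is not None and w * s > chart.get((a, i, j), 0.0):
                        chart[(a, i, j)] = w * s
                        changed = True
    return chart.get((goal, 0, n))

# Toy grammar: F -> W, S -> F (unary), F -> W F (binary), over N = 2 words.
axioms = {("W", 0, 1): 0.5, ("W", 1, 2): 0.4}
unary = [("F", "W", 1.0), ("S", "F", 0.9)]
binary = [("F", "W", "F", 0.8)]
# Goal item [S, 0, 2]: 0.9 * (0.8 * 0.5 * 0.4) = 0.144
best = decode(2, axioms, unary, binary, goal="S")
```

The k-best extension mentioned below would store a sorted list of scored derivations per chart cell instead of a single float, merging candidate lists at each combination step.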

The basic decoder always chooses the best scoring item during the parsing process. Konstas (2014) extends the basic decoder to a k-best decoder in which a list of the k best derivations is kept for each item. The extension significantly improves output quality by avoiding local optima. He also intersects the grammar rules with a tri-gram language model and a dependency model to ensure fluency and grammaticality of the output text.

4.4 Location-based user model

There has been a wide range of work on location-based user models, learning and predicting users’ routes and destinations. These tasks involve some



Items: [A, i, j]; R(A → B); R(A → B C)

Axioms: [W, i, i+1] : s, where W → g_{i+1}, g_{i+1} ∈ {α, gen(), gen_str()}

Inference rules:
(1) from R(A → B) : s and [B, i, j] : s1, derive [A, i, j] : s · s1
(2) from R(A → B C) : s, [B, i, k] : s1 and [C, k, j] : s2, derive [A, i, j] : s · s1 · s2

Goal: [S, 0, N]

Table 2: The basic decoder deductive system.

uncertainties. Much work relies on GPS signal data to identify a user’s location, which may not be accurate. In addition, intended destinations are not always certain, since they may be affected by factors such as weather, traffic, day of week, and time of day. Due to the many uncertainties arising from the task, most systems build probabilistic models to identify and predict users’ locations. Marmasse and Schmandt (2000, 2002) apply pattern recognition techniques to learn users’ patterns of traveling and frequent destinations. These frequent locations can be added or removed manually by users. Then, each location is assigned a to-do list which is displayed whenever users travel to that location. In Krumm et al. (2013)’s model, the map is modelled as a directed graph where road intersections are vertices and road segments connecting these intersections are edges. A probabilistic model is used to rank the potential destinations based on the current trip (the previous intersections users have passed). After collecting a list of candidate destinations and their probabilities, route probabilities are computed by summing all destination probabilities along the fastest routes. Therefore, a corpus of the driver’s regular routes is not necessary in this model. Simmons et al. (2006) use a Hidden Markov Model (HMM), in an extended version that considers factors such as day of week and time of travel in the prediction algorithms, while Liao et al. (2007) use a more complex HMM with the ability to infer the user’s mode of transportation.
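The "sum destination probabilities along the fastest routes" step can be illustrated roughly as follows. All destinations, segment lists and probabilities below are invented toy values, and this is only our reading of the computation, not code from Krumm et al. (2013):

```python
from collections import defaultdict

# Rough illustration: the probability mass on a road segment is the sum
# of the probabilities of all candidate destinations whose fastest route
# passes through that segment. Values are toy examples.

dest_prob = {"home": 0.6, "office": 0.3, "mall": 0.1}
fastest_route = {            # destination -> list of road segments
    "home":   ["a", "b", "c"],
    "office": ["a", "b", "d"],
    "mall":   ["a", "e"],
}

def segment_probs(dest_prob, fastest_route):
    p = defaultdict(float)
    for dest, route in fastest_route.items():
        for seg in route:
            p[seg] += dest_prob[dest]
    return dict(p)

# Segment "b" lies on the fastest routes to both "home" and "office",
# so it accumulates 0.6 + 0.3 = 0.9.
probs = segment_probs(dest_prob, fastest_route)
```

This is why no corpus of the driver's regular routes is needed: the segment probabilities fall out of the destination distribution and the map alone.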

5 Examples

Table 3 presents an example of the input and output of the generation model with different settings. In the first setting, we train the weights of the grammar rules using the whole corpus. Next, we use the k-best decoder integrated with a tri-gram language model to generate the text given the input scenario.

We try different values of k (the number of best derivations kept for each item during the generation process). The generation system generates output 1a and output 1b for k = 10 and k = 20 respectively. We could try larger values of k; however, this increases the generation time, which becomes a factor if the system is incorporated into a real-time system. In the second setting, instead of using the whole corpus, we use only tweets from a specific user to train the model. The chosen tweets are generated by the user “680 NEWS Traffic”, who has the majority of tweets in the corpus. We again try different values of k (k = 10 and k = 20); however, the results are the same and are presented in output 2.

Some essential records such as “Reference road” and “Reason” from the input scenario are not chosen by the generator for inclusion in the generated tweets in output 1a and output 2 respectively. On the other hand, extra information that the input does not cover is included arbitrarily, such as “collision”, “the right lane” or “the left lane”. There are two main reasons for this behavior:

• inaccurate alignments between data and text: all the alignments are inferred from an unannotated corpus. A fully or partially annotated corpus would improve the accuracy of the alignment model.

• corpus: usually, the tweets contain more information than the structured data. This extra information can create noise in the training process, especially without supervision.

6 Conclusions and Future Work

The preliminary results for the data-driven approaches show that it is possible to generate real-time tweets for inclusion in a real-time traffic notification system, using techniques that are otherwise applicable to different domains and data-sets. There are various types and sources of traffic-related data useful to drivers (e.g. traffic flow, road construction, ...), and we have only scratched the surface of the issues concerning personalized tweets. Further evaluation is required, and we will present the preliminary results from automatic evaluation during the workshop.

Input:
  Main road: Name = 401, Direction = Eastbound
  Reference road: Name = Yonge St
  Lane: Value = collectors
  Stream: Value = collectors
  Condition: Value = Bad traffic
  Reason: Value = Disabled vehicle
  Incident type: Value = Disabled vehicle
Output 1a: collision: collision: #hwy401 eb express at yonge st
Output 1b: 401 collectors - right lane blocked with a collision.
Output 2: eb 401 at yonge express, blocking the left lane

Table 3: Example input and output of the generation system with different settings.

A high priority for the ongoing research is the content-preference model: some users desire certain information more than other information (e.g. the reason for the accident, detour information, ...). A content-preference model can be integrated into the grammar to re-rank the generated sentences with the information users need.

Given the relatively constrained domain, we want to consider how template-based models can be used with the data-driven approach introduced in this paper. A template approach requires a different set of patterns and rules for each traffic data type, but integrating techniques involving semantic role labels (Lindberg et al., 2013) may assist in applying our approach to different data-sets and different locations.

Another aspect we need to consider to improve the system is how we can optimize it in terms of output quality and generation time. For output quality, considering the limitation described in Section 5, we may want to collect more data and potentially annotate parts of the data to achieve better alignment accuracy. In addition, applying different data pre-processing and normalization techniques can also help clean up the data before training the model. To improve the generation time, we can apply an approximate search approach such as cube pruning (Chiang, 2007).

Finally, we will evaluate the system using metrics such as BLEU and METEOR, given that we have human-generated data. These two metrics are also used for evaluation in Konstas (2014)’s work.

However, the data we have collected comprises both human-generated and machine-generated texts. Therefore, we need to develop a technique to separate the two sets. One simple approach is to use the Twitter username that generated the tweet. In addition, we can also set up experiments comparing the results when the system is trained on only human-generated texts versus both sets.

Acknowledgements

The authors would like to thank the reviewers for detailed and constructive feedback. Comments and insights from Anoop Sarkar and Milan Tofiloski were also greatly appreciated in improving the quality of the paper. Special thanks to Ali Tizghadam for assistance in providing access to the necessary data. This research was supported in part by the Natural Sciences and Engineering Research Council of Canada.

References

Gabor Angeli, Percy Liang, and Dan Klein. 2010. A simple domain-independent probabilistic approach to generation. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, EMNLP ’10, pages 502–512, Stroudsburg, PA, USA. Association for Computational Linguistics.

Kalina Bontcheva and Yorick Wilks. 2004. Automatic Report Generation from Ontologies: The MIAKT Approach, pages 324–335. Springer Berlin Heidelberg, Berlin, Heidelberg.

David Chiang. 2007. Hierarchical phrase-based translation. Comput. Linguist., 33(2):201–228, June.

T. Kasami. 1966. An efficient recognition and syntax-analysis algorithm for context-free languages. Technical Report AF-CRL-65-758, Air Force Cambridge Research Laboratory, Bedford, Massachusetts.



Joohyun Kim and Raymond J. Mooney. 2010. Generative alignment and semantic parsing for learning from ambiguous supervision. In Proceedings of the 23rd International Conference on Computational Linguistics: Posters, COLING ’10, pages 543–551, Stroudsburg, PA, USA. Association for Computational Linguistics.

Ioannis Konstas. 2014. Joint Models for Concept-to-Text Generation. Ph.D. thesis, University of Edinburgh.

Eric Krokos and Hanan Samet. 2014. A look into twitter hashtag discovery and generation. In Proceedings of the 7th ACM SIGSPATIAL Workshop on Location-Based Social Networks (LBSN14), Dallas, TX, November.

John Krumm, Robert Gruen, and Daniel Delling. 2013. From destination prediction to route prediction. J. Locat. Based Serv., 7(2):98–120, June.

Percy Liang, Michael I. Jordan, and Dan Klein. 2009. Learning semantic correspondences with less supervision. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1, ACL ’09, pages 91–99, Stroudsburg, PA, USA. Association for Computational Linguistics.

Lin Liao, Donald J. Patterson, Dieter Fox, and Henry Kautz. 2007. Learning and inferring transportation routines. Artificial Intelligence, 171(5):311–331.

David Lindberg, Fred Popowich, John Nesbit, and Phil Winne. 2013. Generating natural language questions to support learning on-line. In ENLG 2013, pages 105–114.

Elena Lloret and Manuel Palomar. 2013. Towards automatic tweet generation: A comparative study from the text summarization perspective in the journalism genre. Expert Systems with Applications, 40(16):6624–6630.

Christoph Lofi and Ralf Krestel. 2012. iParticipate: Automatic Tweet Generation from Local Government Data, pages 295–298. Springer Berlin Heidelberg, Berlin, Heidelberg.

Natalia Marmasse and Chris Schmandt. 2000. Location-aware information delivery with commotion. In Proceedings of the 2nd International Symposium on Handheld and Ubiquitous Computing, HUC ’00, pages 157–171, London, UK. Springer-Verlag.

Natalia Marmasse and Chris Schmandt. 2002. A user-centered location model. Personal Ubiquitous Comput., 6(5-6):318–321, January.

François Portet, Ehud Reiter, Albert Gatt, Jim Hunter, Somayajulu Sripada, Yvonne Freer, and Cindy Sykes. 2009. Automatic generation of textual summaries from neonatal intensive care data. Artificial Intelligence, 173(7):789–816.

A. Ramos-Soto, A. J. Bugarín, S. Barro, and J. Taboada. 2015a. Linguistic descriptions for automatic generation of textual short-term weather forecasts on real prediction data. IEEE Transactions on Fuzzy Systems, 23(1):44–57, February.

A. Ramos-Soto, M. Lama, B. Vázquez-Barreiros, A. Bugarín, M. Mucientes, and S. Barro. 2015b. Towards textual reporting in learning analytics dashboards. In 2015 IEEE 15th International Conference on Advanced Learning Technologies, pages 260–264, July.

Stuart M. Shieber, Yves Schabes, and Fernando C. N. Pereira. 1995. Principles and implementation of deductive parsing. The Journal of Logic Programming, 24(1):3–36.

Priya Sidhaye and Jackie Chi Kit Cheung. 2015. Indicative tweet generation: An extractive summarization problem? In Lluís Màrquez, Chris Callison-Burch, Jian Su, Daniele Pighin, and Yuval Marton, editors, EMNLP, pages 138–147. The Association for Computational Linguistics.

R. Simmons, B. Browning, Yilu Zhang, and V. Sadekar. 2006. Learning to predict driver route and destination intent. In 2006 IEEE Intelligent Transportation Systems Conference, pages 127–132, September.

Ali Tizghadam and Alberto Leon-Garcia. 2015. Application platform for smart transportation. In Future Access Enablers of Ubiquitous and Intelligent Infrastructures, pages 26–32. Springer.

Yin-Yen Tseng, Jasper Knockaert, and Erik T. Verhoef. 2013. A revealed-preference study of behavioural impacts of real-time traffic information. Transportation Research Part C: Emerging Technologies, 30:196–209.

Daniel H. Younger. 1967. Recognition and parsing of context-free languages in time n³. Information and Control, 10(2):189–208.



Proceedings of the 2nd International Workshop on Natural Language Generation and the Semantic Web (WebNLG), pages 67–70, Edinburgh, Scotland, September 6th, 2016. ©2016 Association for Computational Linguistics

Analysing the Integration of Semantic Web Features for Document Planning across Genres

Marta Vicente
Department of Software and Computing Systems
University of Alicante
Apdo. de Correos 99, E-03080, Alicante, Spain
[email protected]

Elena Lloret
Department of Software and Computing Systems
University of Alicante
Apdo. de Correos 99, E-03080, Alicante, Spain
[email protected]

Abstract

Language is usually studied and analysed from different disciplines, generally on the premise that it constitutes a form of communication which pursues a specific objective. The discourse, in that sense, can be understood as a text which is constructed to express such an objective. When a discourse is created, its production is related to some textual genre, usually connected with pragmatic features, like the intention of the writer or the audience to whom it is addressed, both conditioning the use of language. But genres can be considered as well as compounds of different pieces of text with a certain degree of order, each one pursuing more concrete objectives. This paper presents a proposal to learn such features as a way to generate richer document plans, applying clustering techniques over annotated documents.

1 Motivation and Research Context

The current research is carried out from a conception of Natural Language Generation (NLG) for which the creation of a text requires an intermediate output called a document plan. It is at the macroplanning stage that the system provides this plan of selected and ordered content. At present, our work is focused on how to elaborate that plan in order to meet some requisites regarding the flexibility of the system: it should be able to produce different outcomes conditioned by the communicative goal, the audience, ... the context as a whole. Hence, the main aim of our current research is to enrich the pragmatic facet of the NLG process. The expected outcome is a scheme or ordering of the ideas that should be realised in a set of cohesive and coherent sentences and paragraphs.

According to some theories of discourse (Bakhtin, 2010; Halliday et al., 2014), genres can be understood as social constructions that settle a connection between the discourse and the situation in which it is produced, reflected both in its structure and its content. According to Swales (1990):

“A genre comprises a class of communicative events, the members of which share some set of communicative purposes. These purposes are recognised by the expert members of the parent discourse community, and thereby constitute the rationale for the genre. This rationale shapes the schematic structure of the discourse and influences and constrains choice of content and style.”

Besides, genres become interesting because they are related to communicative purposes in different manners, from a global viewpoint down to fine-grained levels. As an example, consider a person who is looking for recommendations on review pages. Recommending would be the main, global purpose of the text they consult, established when it was created. But it is possible that the writer also wanted to explain the motivation of the journey (narrative, personal experience) or to describe the facilities in order to complete the review. Narration, description, recommendation: these represent low-level functions of the text related to the intention of the writer and, in some cases, they can be identified as different sets of sentences. This leads us to the possibility of learning the structure of the text and its features, which differs from one genre to another. In reviews, the presence and order of the parts is not strict.

One traveller may not share a personal story but nevertheless describes the room and recommends the brand, while another first evaluates and then describes. An example to illustrate this can be found in Table 1. Conversely, it would make no sense to write a scientific article that reports the results before explaining the methodology, or without explaining it at all.

Review 1
  Personal Experience:
    On our last trip to Hawaii my husband and I...
    As an added bonus, we were given...
    We decided to take advantage of...
  Description:
    The lobby is adorned with lush gardens...
    Alongside the gardens are tropical birds...
    The rooms are spacious.
  Recommendation:
    If you are ever fortunate enough to visit the beautiful island of Kauai, try to stay at the H Regency, you won't be disappointed.

Review 2
  Description:
    The W New York is on Lexington right...
    The rooms are just as small as before...
    The lobby of the hotel is also...
  Personal Experience:
    Being a corporate lawyer I travel...
    The first time I was in a small room...
    The second time I could not believe...
  Description:
    Although the room size is awful, the hotel does have some nice touches.
    Another benefit of the hotel is that...

Table 1: Review ordering from a functional approach. Even from the first words of the sentences, some characteristic features can be appreciated (verb tenses, person: third vs. first, ...).

Therefore, our hypothesis is that it is possible to characterise subparts of a discourse (related to a genre) according to their functionality and, at the same time, learn about their (flexible) ordering. Due to the lack of corpora annotated with discourse information about such communicative purposes, we propose to work with unsupervised techniques to achieve that goal. We expect to obtain the knowledge necessary to produce appropriate document plans. Taking into account several genres that normally exhibit a pre-defined or known-in-advance structure, such as news, Wikipedia pages, or scientific articles, we would be able to validate our suggested approach on other textual genres that lack such a well-defined structure a priori.

Our methodology relies on pattern detection techniques. Until now, we have tried clustering that does not require previous knowledge of the number of clusters. Over an annotated corpus we apply an Expectation-Maximisation (EM) algorithm, including among the features linguistic information related to their placement.
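As an illustration of this step, the number of components for an EM-based mixture model can be chosen automatically by fitting candidate models and keeping the one with the lowest Bayesian Information Criterion. The sketch below uses scikit-learn on toy data; it shows the generic technique only, not the authors' actual toolkit, features, or corpus.

```python
# Sketch: EM clustering without a fixed cluster count, via BIC model selection.
# The toy vectors below stand in for per-region linguistic feature vectors.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two well-separated synthetic "regions" in a 4-dimensional feature space.
X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(5, 1, (50, 4))])

best_model, best_bic = None, np.inf
for k in range(1, 7):
    gm = GaussianMixture(n_components=k, random_state=0).fit(X)
    bic = gm.bic(X)
    if bic < best_bic:  # lower BIC = better fit/complexity trade-off
        best_model, best_bic = gm, bic

labels = best_model.predict(X)
print(best_model.n_components)
```

With clearly separated data, the BIC criterion recovers the two underlying groups without being told the cluster count in advance.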

The remainder of this paper is organised as follows. Section 2 summarises related work concerning text classification efforts and genre studies related to communicative objectives. Section 3 describes the kind of linguistic and lexical features that we have been using in our experiments so far. After that, Section 4 describes some resources from the Semantic Web environment that could complement and enrich those features. Finally, Section 5 describes the experiments already performed and outlines future research opportunities.

2 Related Work

Back in 1997, Hearst tried to detect the structure of text using patterns of lexical co-occurrence to identify paragraphs related to the same topic (Hearst, 1997). In this case, term repetition proved to be enough to detect subtopics in explanatory texts, but the approach did not consider other traits of the discourse (e.g. syntactic constructions, verb tenses, number of adjectives in each region), nor did it recover meaning beyond topic identification, such as the purpose intended in the paragraph(s). Besides, the author remarked that the results had proved highly valuable when applied to explanatory text, but would be less significant for other text types.
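The lexical co-occurrence idea can be sketched in a few lines: score each gap between sentences by the cosine similarity of the word counts in the windows on either side, so that low-similarity valleys suggest subtopic boundaries. This is only an illustration of the principle behind TextTiling, not Hearst's full algorithm (which adds smoothing, depth scoring, and token-sequence units).

```python
# Minimal lexical-cohesion sketch: low cosine similarity across a gap
# between sentence windows hints at a subtopic boundary.
import math
from collections import Counter

def cosine(a, b):
    num = sum(a[w] * b[w] for w in a)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def gap_scores(sentences, window=2):
    bags = [Counter(s.lower().split()) for s in sentences]
    scores = []
    for i in range(window, len(bags) - window + 1):
        left = sum(bags[i - window:i], Counter())    # words before the gap
        right = sum(bags[i:i + window], Counter())   # words after the gap
        scores.append(cosine(left, right))
    return scores

sents = ["the hotel lobby has gardens",
         "tropical birds live in the gardens",
         "my flight was delayed badly",
         "the airline lost my luggage"]
scores = gap_scores(sents, window=2)
print(scores)  # the single mid-document gap separates two subtopics
```

The point of the paper's critique stands out here: only surface word overlap is consulted, with no notion of verb tense, syntax, or communicative purpose.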

From another point of view, Bachand et al. (2014) develop research focused on the relations between text type, discourse structures and rhetorical relations. Again, the experiments conducted rely on a single type of feature, this time rhetorical relations and markers. The good results obtained by the authors indicate that our approach, which is grounded in similar intuitions, can reach comparable developments that we expect will enrich our capacity for generating accurate document plans.

Regarding reviews, most of the work developed refers to sentiment analysis or polarity classification (Cambria et al., 2013). A few research works have focused on the structure related to textual genres, relying on Systemic Functional Theory (Taboada, 2011). That work reveals the relations of different parts of the text with several purposes, focusing its analysis on the domain of movie reviews and showing at the same time the variability of the ordering in such documents.

Finally, special mention must be made of Systemic Functional Theory (Halliday et al., 2014). It provides a notion of genre that connects situation types with semantic/lexico-grammatical patterns, from a conception of language highly related to its socio-semiotic origin. A textual typology is depicted in these terms, connected as well with the context of the discourse and the semantic choices made to organise it (Matthiessen, 2014). As a more precise example, the typology of processes that Halliday and Matthiessen describe directly influences the classification accomplished by ADESSE, one of the resources applied in our experiments over Spanish reviews, explained in the next section.

3 Analysis of the Features

Having pointed out the scope of the related work, our approach aims to overcome its limitations: on the one hand, by being suitable for any genre, not a particular one; on the other hand, by focusing on several types of features at the same time, in order to propose a more comprehensive description of the parts of a discourse.

To accomplish such a project, the selection and design of the proper features becomes a challenging task in itself, strongly related to the aim of the investigation. Specifically, we try to detect the features that may reveal links with the functionality or purpose of the paragraph that includes them. We have begun annotating several aspects by means of linguistic tools and resources: Freeling (Padró and Stanilovsky, 2012) for PoS annotation and entity recognition, and ADESSE (García-Miguel et al., 2010) as a source of verb senses from a semantic perspective.
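To make this concrete, a per-region feature vector can be built from relative frequencies of PoS tags and ADESSE-style verb sense classes. The sketch below is illustrative: the tag names and the input format are placeholders, not Freeling's or ADESSE's actual output.

```python
# Hedged sketch: turning a PoS/verb-sense annotated region into a numeric
# feature vector suitable for clustering. Tag names are placeholders.
from collections import Counter

POS_TAGS = ["noun", "adjective", "pronoun", "verb"]
VERB_SENSES = ["mental", "material", "relational",
               "verbal", "existential", "modulation"]

def region_features(tokens):
    """tokens: list of (word, pos, sense_or_None) triples for one region."""
    pos_counts = Counter(pos for _, pos, _ in tokens)
    sense_counts = Counter(s for _, _, s in tokens if s)
    n = len(tokens) or 1
    # Relative frequencies keep regions of different lengths comparable.
    return ([pos_counts[t] / n for t in POS_TAGS]
            + [sense_counts[s] / n for s in VERB_SENSES])

region = [("rooms", "noun", None), ("are", "verb", "relational"),
          ("spacious", "adjective", None), ("I", "pronoun", None),
          ("recommend", "verb", "mental")]
vec = region_features(region)
print(vec)
```

Vectors of this shape, one per pseudo-paragraph, are the kind of input a clustering algorithm such as EM could then group by function.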

4 Semantic Web to enrich the Data Set

We believe that, in order to become more meaningful, the quality of the features could be improved by means of resources rooted in Semantic Web technologies. There is some research related to genres that can be useful in our project. In the

ADESSE verb senses:
  mental, material, relational, verbal, existential, modulation
FREELING features:
  PoS tagging: noun, adjective, pronoun, verb (tense, aspect, ...), etc.

Table 2: Features annotated over the corpus of reviews.

realm of reviews, opinion and sentiment annotation, we can take advantage, for example, of the MARL Ontology Specification1, a data schema that has been used in the EuroSentiment Project (Buitelaar et al., 2013), or of work directly related to reviews from a sentiment analysis perspective (Santosh and Vardhan, 2015). Other genres have been targeted for similar developments. With regard to the news genre, the BBC provides a set of ontologies related to their contents, allowing more significant annotation of the documents. DBpedia has already proved useful for researchers working on Wikipedia articles. Drammar (Lombardo and Damiano, 2012) and OntoMedia (Jewell et al., 2005) are ontology-based models for annotating features of media and cultural narratives. All of them represent resources that may lead to different results in our clustering task and analysis.
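As one hedged example of how such resources could be tapped, DBpedia exposes a public SPARQL endpoint; a query for a resource's abstract could enrich the features of a Wikipedia-related document. The helper below only builds the query string; actually sending it would require a SPARQL client (e.g. SPARQLWrapper) against https://dbpedia.org/sparql, and the choice of property (dbo:abstract) is illustrative.

```python
# Illustrative sketch: building a SPARQL query for a DBpedia resource's
# abstract. No network access happens here; this only constructs the query.
def dbpedia_abstract_query(resource, lang="en"):
    return f"""
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?abstract WHERE {{
  dbr:{resource} dbo:abstract ?abstract .
  FILTER (lang(?abstract) = "{lang}")
}}"""

query = dbpedia_abstract_query("Edinburgh")
print(query)
```

The returned abstract (or types, categories, etc.) could then be folded into the region-level feature vectors before clustering.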

5 On-going Work

Until now, some experiments have been performed over a corpus of Spanish reviews extracted from TripAdvisor. The reviews were segmented into sentences, and some figures regarding semantic and morphological features were computed after dividing each document into regions (sets of sentences), increasing their number from one up to four blocks of sentences. Table 3 shows some statistics of the corpus employed.

Number of reviews: 1,400
Sentences: 12,467
Words labelled: around 200,000

Table 3: Corpus statistics.
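The proportional division into blocks described above can be sketched as follows; the rounding scheme is one reasonable choice, not necessarily the exact one used in the experiments.

```python
# Sketch: splitting a document's sentences into k pseudo-paragraphs of
# near-equal size, for k = 1..4 as in the experiments described above.
def split_into_blocks(sentences, k):
    n = len(sentences)
    # Proportional division: block i covers sentences [i*n//k, (i+1)*n//k).
    return [sentences[i * n // k:(i + 1) * n // k] for i in range(k)]

sents = [f"s{i}" for i in range(10)]
sizes = {k: [len(b) for b in split_into_blocks(sents, k)] for k in range(1, 5)}
print(sizes)
```

Each block then yields one feature vector, so a document contributes between one and four data points to the clustering.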

In order to strengthen the results, corpora of other genres with different degrees of flexibility in their structure are being analysed: tales, news and Wikipedia articles are to be compared with the former outcomes. For now, the length of the blocks is the result of a proportional division of the length of the document. As the research advances, new experiments will be developed to determine a more accurate size for the pseudo-paragraphs. Following the ideas introduced in Section 4, our next step includes improving the significance of the features with which the clustering algorithms have to work, trying to reveal an inner structure of the text related to its genre and purposes. The better our features are, the more precise the descriptions we can make of the discourse areas.

1http://www.gsi.dit.upm.es/ontologies/marl

Acknowledgments

This research work has been supported by the Generalitat Valenciana through the grant ACIF/2016/501. It has also been funded by the University of Alicante, the Spanish Government and the European Commission through the projects "Explotación y tratamiento de la información disponible en Internet para la anotación y generación de textos adaptados al usuario" (GRE13-15), "DIIM2.0: Desarrollo de técnicas Inteligentes e Interactivas de Minería y generación de información sobre la web 2.0" (PROMETEOII/2014/001), TIN2015-65100-R, TIN2015-65136-C2-2-R, and SAM (FP7-611312), respectively.

References

Felix-Herve Bachand, Elnaz Davoodi, and Leila Kosseim. 2014. An investigation on the influence of genres and textual organisation on the use of discourse relations. In International Conference on Intelligent Text Processing and Computational Linguistics, pages 454–468. Springer.

Mikhail Mikhailovich Bakhtin. 2010. Speech genres and other late essays. University of Texas Press.

Paul Buitelaar, Mihael Arcan, Carlos Angel Iglesias Fernandez, Juan Fernando Sanchez Rada, and Carlo Strapparava. 2013. Linguistic linked data for sentiment analysis.

Erik Cambria, Björn Schuller, Yunqing Xia, and Catherine Havasi. 2013. New avenues in opinion mining and sentiment analysis. IEEE Intelligent Systems, 28(2):15–21.

José M. García-Miguel, Gael Vaamonde, and Fita González Domínguez. 2010. ADESSE, a database with syntactic and semantic annotation of a corpus of Spanish. In LREC.

M.A.K. Halliday and Christian M.I.M. Matthiessen. 2014. An introduction to functional grammar. Routledge.

Marti A. Hearst. 1997. TextTiling: Segmenting text into multi-paragraph subtopic passages. Computational Linguistics, 23(1):33–64.

Michael O. Jewell, K. Faith Lawrence, Mischa M. Tuffield, Adam Prügel-Bennett, David E. Millard, Mark S. Nixon, Nigel R. Shadbolt, et al. 2005. OntoMedia: An ontology for the representation of heterogeneous media. In Proceedings of the SIGIR Workshop on Multimedia Information Retrieval. ACM SIGIR.

Vincenzo Lombardo and Rossana Damiano. 2012. Semantic annotation of narrative media objects. Multimedia Tools and Applications, 59(2):407–439.

Christian M.I.M. Matthiessen. 2014. Registerial cartography: context-based mapping of text types and their rhetorical-relational organization.

Lluís Padró and Evgeny Stanilovsky. 2012. FreeLing 3.0: Towards wider multilinguality. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12). European Language Resources Association (ELRA).

D. Teja Santosh and B. Vishnu Vardhan. 2015. Feature and sentiment based linked instance RDF data towards ontology based review categorization. In Proceedings of the World Congress on Engineering, volume 1.

John Swales. 1990. Genre analysis: English in academic and research settings. Cambridge University Press.

Maite Taboada. 2011. Stages in an online review genre. Text & Talk: An Interdisciplinary Journal of Language, Discourse & Communication Studies, 31(2):247–269.


Author Index

Bandyopadhyay, Saptarashmi, 41
Barros, Cristina, 1
Basile, Valerio, 5
Bourgonje, Peter, 13

Demner-Fushman, Dina, 29
Dominguez, Martin, 17
Duboue, Pablo, 17

Estrella, Paula, 17

Gardent, Claire, 25, 29, 41, 54

Hare, Jonathon, 29

Keet, C. Maria, 50
Kilicoglu, Halil, 29

Lloret, Elena, 1, 25, 67

Mkhitaryan, Satenik, 46
Moreno Schneider, Julian, 13
Mrabet, Yassine, 29

Nesterenko, Liubov, 37
Nouvel, Damien, 46

Perez-Beltrachini, Laura, 41
Popowich, Fred, 59

Rehm, Georg, 13
Revuz, Anselme, 41

Sadoun, Driss, 46
Sanby, Lauren, 50
Sasaki, Felix, 13
Simperl, Elena, 29
Sleimi, Amin, 54

Todd, Ion, 50
Tran, Khoa, 59

Valette, Mathieu, 46
Vicente, Marta, 67
Vougiouklis, Pavlos, 29
