CLTL: Description of web services and software. Nijmegen 2013
![Page 1: CLTL: Description of web services and sofware. Nijmegen 2013](https://reader034.fdocuments.in/reader034/viewer/2022051411/5478ea99b4795972098b4651/html5/thumbnails/1.jpg)
CLTL Software and Web Services
Rubén Izquierdo Beviá
![Page 2: CLTL: Description of web services and sofware. Nijmegen 2013](https://reader034.fdocuments.in/reader034/viewer/2022051411/5478ea99b4795972098b4651/html5/thumbnails/2.jpg)
Rubén Izquierdo Beviá
About me
5-year degree in Computer Science (University of Alicante, Alicante, Spain)
National NLP projects and 1 European project (QALLME) (University of Alicante, Alicante, Spain)
Thesis on NLP & Word Sense Disambiguation (University of Alicante, Alicante, Spain. Sept 2010)
Postdoc position at the DutchSemCor project (Tilburg University, Tilburg. Sept 2011-Sept 2012)
Postdoc position at the OpeNER project (Vrije Universiteit Amsterdam. Since Sept 2012)
![Page 3: CLTL: Description of web services and sofware. Nijmegen 2013](https://reader034.fdocuments.in/reader034/viewer/2022051411/5478ea99b4795972098b4651/html5/thumbnails/3.jpg)
CLTL software
In general, a common input/output format:
KAF
NAF, as an extension of KAF
Single components performing single tasks
Integration of existing modules
Adaptation of input/output formats
Development of new ones
![Page 4: CLTL: Description of web services and sofware. Nijmegen 2013](https://reader034.fdocuments.in/reader034/viewer/2022051411/5478ea99b4795972098b4651/html5/thumbnails/4.jpg)
KAF: Kyoto Annotation Format
Stand-off, layered, XML-based representation format
Different types of information are stored in different layers
Layers are linked by means of references
Suitable for creating pipelines based on this format
Layers:
Text: tokens
Term: lemmas, part-of-speech, term sentiment, word senses
Entities, chunks, opinions…
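To make the stand-off, layered idea concrete, here is a minimal Python sketch. It assumes a heavily simplified KAF-like document with only a text layer and a term layer; the sample sentence and the helper name `resolve_terms` are illustrative, and real KAF files carry more attributes and layers than shown here.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written KAF-like sample (an assumption for illustration):
# the text layer holds tokens, the term layer points back to them by id.
KAF_SAMPLE = """<KAF xml:lang="en">
  <text>
    <wf wid="w1" sent="1">Hotels</wf>
    <wf wid="w2" sent="1">matter</wf>
  </text>
  <terms>
    <term tid="t1" lemma="hotel" pos="N">
      <span><target id="w1"/></span>
    </term>
    <term tid="t2" lemma="matter" pos="V">
      <span><target id="w2"/></span>
    </term>
  </terms>
</KAF>"""


def resolve_terms(kaf_xml):
    """Follow the references from the term layer back to the text layer."""
    root = ET.fromstring(kaf_xml)
    # Index the text layer: token id -> surface form.
    tokens = {wf.get("wid"): wf.text for wf in root.iter("wf")}
    resolved = []
    for term in root.iter("term"):
        # Each term stores only ids; the words live in the text layer.
        words = [tokens[target.get("id")] for target in term.iter("target")]
        resolved.append((term.get("tid"), term.get("lemma"), term.get("pos"), words))
    return resolved


for tid, lemma, pos, words in resolve_terms(KAF_SAMPLE):
    print(tid, lemma, pos, words)
```

Because the term layer never copies the token text, a later module can add another layer (entities, opinions) that references term ids without touching the layers below it.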
![Page 6: CLTL: Description of web services and sofware. Nijmegen 2013](https://reader034.fdocuments.in/reader034/viewer/2022051411/5478ea99b4795972098b4651/html5/thumbnails/6.jpg)
NAF: NewsReader Annotation Format
Extension of KAF
Allows cross-document processing: event coreference
IDs are converted into valid URIs
Stores the same type of information provided by different tools: for example, the results of two different POS taggers
![Page 7: CLTL: Description of web services and sofware. Nijmegen 2013](https://reader034.fdocuments.in/reader034/viewer/2022051411/5478ea99b4795972098b4651/html5/thumbnails/7.jpg)
How the software is provided I
All modules are publicly available on GitHub
CLTL GitHub: http://github.com/cltl
NewsReader GitHub: http://github.com/newsreader
OpeNER GitHub: http://github.com/opener-project/
![Page 8: CLTL: Description of web services and sofware. Nijmegen 2013](https://reader034.fdocuments.in/reader034/viewer/2022051411/5478ea99b4795972098b4651/html5/thumbnails/8.jpg)
How the software is provided II
Some are available as Web Services
Exposed as REST web services
Accept an input stream (KAF/NAF)
Generate an output stream (KAF/NAF)
Easy to call from the command line with cURL
Easy to create module pipelines in the same way you create a Linux command pipeline
http://wordpress.let.vupr.nl/web-services/
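The pipe-like chaining of services can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names `rest_stage` and `run_pipeline` are hypothetical, and the sketch assumes each service accepts the KAF/NAF document as the raw body of a POST request, which the slides do not specify. The actual endpoints are listed on the web-services page above and are not reproduced here.

```python
from functools import reduce
from typing import Callable, Iterable
import urllib.request

# A stage takes a KAF/NAF document as bytes and returns the enriched document.
Stage = Callable[[bytes], bytes]


def rest_stage(url: str, timeout: float = 60.0) -> Stage:
    """Wrap a REST module as a stage (assumes the document is the POST body)."""
    def call(document: bytes) -> bytes:
        request = urllib.request.Request(
            url,
            data=document,
            headers={"Content-Type": "application/xml"},
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read()
    return call


def run_pipeline(document: bytes, stages: Iterable[Stage]) -> bytes:
    """Feed the output of each stage into the next, like a shell pipe."""
    return reduce(lambda doc, stage: stage(doc), stages, document)


# Local stand-in stages so the sketch runs without network access.
def add_marker(marker: bytes) -> Stage:
    return lambda document: document + b"<!-- " + marker + b" -->"


result = run_pipeline(b"<KAF/>", [add_marker(b"tokenized"), add_marker(b"tagged")])
print(result.decode())
```

Replacing the stand-in stages with `rest_stage(...)` calls for real endpoints would give the same left-to-right flow as a chain of cURL commands connected with `|`.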
![Page 11: CLTL: Description of web services and sofware. Nijmegen 2013](https://reader034.fdocuments.in/reader034/viewer/2022051411/5478ea99b4795972098b4651/html5/thumbnails/11.jpg)
Our software I
General modules (integrated)
Tokenizers: whitespace-based, OpenNLP-trained...
Sentence splitters: rule-based, OpenNLP
POS taggers: TreeTagger, OpenNLP POS taggers
Chunker: trained on Alpino data with OpenNLP
Parsers: Alpino (nl), Stanford (en)
![Page 12: CLTL: Description of web services and sofware. Nijmegen 2013](https://reader034.fdocuments.in/reader034/viewer/2022051411/5478ea99b4795972098b4651/html5/thumbnails/12.jpg)
Our software II
General modules (developed by us)
WordNet tools: functions to use a WordNet in LMF format
Word Sense Disambiguation systems: UKB (unsupervised), SVM (supervised; for nl, derived from DutchSemCor)
Multiword tagger: tags multiword sequences of terms according to WordNet
OntoTagger: inserts (semantic) labels into the KAF representation on the basis of lemma or WordNet synset representations of the text
![Page 13: CLTL: Description of web services and sofware. Nijmegen 2013](https://reader034.fdocuments.in/reader034/viewer/2022051411/5478ea99b4795972098b4651/html5/thumbnails/13.jpg)
Our software III
General modules (developed by us)
Named Entity Recognizer: detects dates and locations using specific resources + GeoNames
KyBot: extracts tuples and relations from a set of profiles formulated using semantic and structural properties
![Page 14: CLTL: Description of web services and sofware. Nijmegen 2013](https://reader034.fdocuments.in/reader034/viewer/2022051411/5478ea99b4795972098b4651/html5/thumbnails/14.jpg)
Our software IV
OpeNER related (developed by us)
Hotel property tagger: detects aspects related to cleanliness, staff, breakfast, rooms…
Term polarity tagger: positive/negative terms, intensifiers, negators…
Opinion miner: detects opinions (target + holder + expression); 2 rule-based versions, 1 machine learning version
![Page 15: CLTL: Description of web services and sofware. Nijmegen 2013](https://reader034.fdocuments.in/reader034/viewer/2022051411/5478ea99b4795972098b4651/html5/thumbnails/15.jpg)
Our software V
NewsReader related (developed by us)
Discourse Module: splits incoming texts into headers and paragraphs
Factuality Classifier: classifies whether a statement is factual/probable/possible or not
Event Coreference: compares descriptions of events within and across documents to decide if they refer to the same events