KafNafParserPy: a Python library for parsing/creating KAF and NAF files
KafNafParserPy: A Python library for parsing KAF/NAF
Ruben Izquierdo Bevia
Vrije University of Amsterdam
CLTL meeting 19th Nov 2014
What is KAF / NAF?
• Annotation formats to represent linguistic information
o XML based
o Different information in different layers, interconnected
o Easy to use in NLP pipelines
• KAF
o https://github.com/opener-project/kaf/wiki/KAF-structure-overview
• NAF
o http://www.newsreader-project.eu/files/2013/01/techreport.pdf
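To make the "layers interconnected" point concrete, here is a minimal sketch that uses only Python's standard xml.etree.ElementTree (not KafNafParserPy) on a tiny, hand-made NAF-like snippet. The snippet is simplified and is not a complete NAF document, and the function name describe_layers is hypothetical. It lists the layers under the root and checks that every span target in the term layer points to an existing token id in the text layer.

```python
import xml.etree.ElementTree as ET

# Simplified NAF-like sample (illustrative only, not a complete NAF file)
NAF_SAMPLE = """<NAF xml:lang="en">
  <text>
    <wf id="w1" sent="1">World</wf>
    <wf id="w2" sent="1">War</wf>
  </text>
  <terms>
    <term id="t1" lemma="world" pos="N"><span><target id="w1"/></span></term>
    <term id="t2" lemma="war" pos="N"><span><target id="w2"/></span></term>
  </terms>
</NAF>"""


def describe_layers(naf_xml):
    """Return the layer names and any span targets that point to no token."""
    root = ET.fromstring(naf_xml)
    layers = [child.tag for child in root]
    token_ids = {wf.get("id") for wf in root.iter("wf")}
    # Layers are linked by identifiers: term spans refer to token ids
    dangling = [t.get("id") for t in root.iter("target")
                if t.get("id") not in token_ids]
    return layers, dangling


print(describe_layers(NAF_SAMPLE))  # (['text', 'terms'], [])
```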
What is KafNafParserPy?
• It is a Python module/library
• It allows you to parse a KAF or NAF file
o Reads all the layers
o Provides access to the information by means of Python classes (methods and attributes)
• It allows you to generate new KAF/NAF files
o Create new layers
o Modify existing ones
• It allows you to convert between NAF and KAF
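The slides do not say how the NAF/KAF conversion works internally. As a rough illustration only, the hypothetical function below renames the identifier attributes that differ between the two formats in the text and term layers (NAF id versus KAF wid on tokens and tid on terms). It ignores every other layer and all header or version differences, so it is a sketch and not a full converter.

```python
import xml.etree.ElementTree as ET


def naf_to_kaf_sketch(naf_xml):
    """Hypothetical, partial NAF -> KAF conversion (text and term layers only)."""
    root = ET.fromstring(naf_xml)
    root.tag = "KAF"
    for wf in root.iter("wf"):
        # KAF identifies word forms with 'wid' instead of 'id'
        if "id" in wf.attrib:
            wf.set("wid", wf.attrib.pop("id"))
    for term in root.iter("term"):
        # KAF identifies terms with 'tid' instead of 'id'
        if "id" in term.attrib:
            term.set("tid", term.attrib.pop("id"))
    # Span targets keep their 'id' attribute in both formats
    return ET.tostring(root, encoding="unicode")


naf = ('<NAF><text><wf id="w1">War</wf></text>'
       '<terms><term id="t1" lemma="war"><span><target id="w1"/></span></term></terms></NAF>')
print(naf_to_kaf_sketch(naf))
```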
KafNafParserPy philosophy
• No validation against the DTD (the file only has to be well-formed XML)
• A Python object for each XML element (header, text, token, terms…)
• The attributes are not “parsed/read” up front
o The KAF/NAF attributes are not defined as attributes of a class
o Only the pointer to the XML element is stored
• It provides access to all the attributes in “real time”
• Modifications are made “on the fly”
o If you change the object in memory, you need to dump it to a new file to keep the results
KafNafParserPy philosophy
• Class Cterm (encapsulates a KAF/NAF term): the naive design, which is NOT the one used
o Attributes:
• string lemma
• string pos
• string morphofeat
• Cspan span
• …
o Methods
• get_lemma(…) returns the lemma attribute
• get_pos(…) returns the pos attribute
• …
KafNafParserPy philosophy
• Class Cterm (encapsulates a KAF/NAF term): the actual design
o Attributes:
• string type (NAF or KAF?)
• Pointer to the XML element
o Methods
• get_lemma(…) returns xml_element.get('lemma')
• get_pos(…) returns xml_element.get('pos')
• get_id(…)
o xml_element.get('id') for NAF
o xml_element.get('tid') for KAF
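A minimal sketch of this design, using the standard library's ElementTree instead of lxml. TermSketch is a hypothetical stand-in, not the library's Cterm: it keeps only the format type and a pointer to the element, reads every value from the element on each call, and picks the id attribute by format as described above.

```python
import xml.etree.ElementTree as ET


class TermSketch:
    """Hypothetical stand-in for the Cterm design: it stores only the format
    type and a pointer to the XML element; no attribute is copied."""

    def __init__(self, node, doc_type="NAF"):
        self.type = doc_type  # "NAF" or "KAF"
        self.node = node      # pointer to the <term> element

    def get_lemma(self):
        return self.node.get("lemma")

    def get_pos(self):
        return self.node.get("pos")

    def get_id(self):
        # The identifier attribute depends on the format
        return self.node.get("id" if self.type == "NAF" else "tid")


naf_term = TermSketch(ET.fromstring('<term id="t1" lemma="war" pos="N"/>'), "NAF")
kaf_term = TermSketch(ET.fromstring('<term tid="t1" lemma="war" pos="N"/>'), "KAF")
print(naf_term.get_id(), kaf_term.get_id())  # t1 t1

# Values are read from the element on every call, so a change made in the
# XML tree is visible immediately ("real time" access)
naf_term.node.set("lemma", "battle")
print(naf_term.get_lemma())  # battle
```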
Getting Started I
• https://github.com/cltl/KafNafParserPy
• Basic steps:
o Install the lxml library for Python
• pip install lxml
o Clone the repository
• git clone https://github.com/cltl/KafNafParserPy
o Make it available to Python
• Put it in the same folder as the scripts that will import it
• Add the folder that contains it to PYTHONPATH
• Create a symbolic link in your virtualenv
• …
Getting Started II
• Documentation:
o HTML: http://kyoto.let.vu.nl/~izquierdo/api/KafNafParserPy/
o PDF: http://kyoto.let.vu.nl/~izquierdo/api/KafNafParserPy/api.pdf
• Entry point, always:
o Module KafNafParserPy
o Class KafNafParser
Getting tokens
• How can I get them?
o We only have a “KafNafParser” object
• Check the methods of the KafNafParser class in the API documentation
Getting tokens
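The slide with the token example is not part of this transcript. In the library, the method to look for is, as far as we know, KafNafParser.get_tokens(), which yields token objects (check the API documentation for their exact methods). The sketch below only imitates that iterator style with ElementTree on a simplified NAF snippet; MiniParser is hypothetical and yields raw <wf> elements rather than library objects.

```python
import xml.etree.ElementTree as ET


class MiniParser:
    """Hypothetical stand-in for KafNafParser, reduced to the text layer."""

    def __init__(self, xml_string):
        self.root = ET.fromstring(xml_string)

    def get_tokens(self):
        # Yield the <wf> token elements lazily, in document order
        text_layer = self.root.find("text")
        if text_layer is not None:
            yield from text_layer.iter("wf")


doc = MiniParser('<NAF><text><wf id="w1">World</wf><wf id="w2">War</wf></text></NAF>')
for wf in doc.get_tokens():
    print(wf.get("id"), wf.text)  # w1 World, then w2 War
```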
Getting terms
• Use KafNafParser::get_terms(…)
• Use the methods of Cterm (get_lemma(…), get_pos(…), …)
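A standard-library sketch of what iterating over terms gives you: for each term, its lemma and pos attributes plus the text of the tokens its span points to. terms_with_text is a hypothetical helper working on a simplified NAF snippet; with the library you would iterate over get_terms(…) and call the Cterm methods instead.

```python
import xml.etree.ElementTree as ET

NAF_SAMPLE = """<NAF>
  <text><wf id="w6">World</wf><wf id="w7">War</wf></text>
  <terms>
    <term id="t6" lemma="world" pos="N"><span><target id="w6"/></span></term>
    <term id="t7" lemma="war" pos="N"><span><target id="w7"/></span></term>
  </terms>
</NAF>"""


def terms_with_text(naf_xml):
    """Hypothetical helper: (id, lemma, pos, covered text) for every term."""
    root = ET.fromstring(naf_xml)
    token_text = {wf.get("id"): wf.text for wf in root.iter("wf")}
    rows = []
    for term in root.iter("term"):
        # Follow the span targets back to the text layer
        span_ids = [t.get("id") for t in term.iter("target")]
        covered = " ".join(token_text[i] for i in span_ids)
        rows.append((term.get("id"), term.get("lemma"), term.get("pos"), covered))
    return rows


for row in terms_with_text(NAF_SAMPLE):
    print(row)
```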
Modifying one token
• Change token w7 from “War” to “Battle”
Modifying one token
• The object “my_parser”, after set_text(…):
o is updated with “Battle” in memory
o The original file “entities_example.naf” is not changed
• If we want to keep the changes:
o When the program closes, the memory is freed and the changes are lost
o We need to dump the object to a new file
• The target can be a filename (string) or an open file
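The modify-in-memory, then-dump workflow can be sketched with ElementTree as follows. MiniParser, set_token_text and dump are hypothetical stand-ins here (in the library the change is made with set_text(…)). As on the slide, the dump target can be a filename or an open file, and the original input is left untouched.

```python
import io
import xml.etree.ElementTree as ET

NAF_SAMPLE = '<NAF><text><wf id="w6">World</wf><wf id="w7">War</wf></text></NAF>'


class MiniParser:
    """Hypothetical stand-in: changes live in memory until dump() is called."""

    def __init__(self, xml_string):
        self.tree = ET.ElementTree(ET.fromstring(xml_string))

    def set_token_text(self, token_id, new_text):
        # Change the token in the in-memory tree only; return False if not found
        for wf in self.tree.iter("wf"):
            if wf.get("id") == token_id:
                wf.text = new_text
                return True
        return False

    def dump(self, target):
        # target may be a filename (str) or an already open text file
        self.tree.write(target, encoding="unicode")


parser = MiniParser(NAF_SAMPLE)
parser.set_token_text("w7", "Battle")
out = io.StringIO()
parser.dump(out)  # or parser.dump("modified.naf") to write a new file
print(out.getvalue())
```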
Read entities
• KafNafParser::get_entities() is an iterator over entities
• Centity::get_external_references() is an iterator over external references
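A standard-library sketch of the two nested iterations on a simplified NAF entity layer. The generator names mirror get_entities() and get_external_references() but are hypothetical functions, not the library's methods, and the resource and reference values are made-up examples.

```python
import xml.etree.ElementTree as ET

NAF_SAMPLE = """<NAF><entities>
  <entity id="e1" type="MISC">
    <references><span><target id="t6"/><target id="t7"/></span></references>
    <externalReferences>
      <externalRef resource="my_linker" reference="http://example.org/World_War" confidence="0.9"/>
    </externalReferences>
  </entity>
</entities></NAF>"""


def get_entities(root):
    # Iterate over the <entity> elements of the entity layer (if present)
    layer = root.find("entities")
    if layer is not None:
        yield from layer.iter("entity")


def get_external_references(entity):
    # Iterate over the <externalRef> elements of one entity (if any)
    refs = entity.find("externalReferences")
    if refs is not None:
        yield from refs.iter("externalRef")


root = ET.fromstring(NAF_SAMPLE)
for entity in get_entities(root):
    for ref in get_external_references(entity):
        print(entity.get("id"), ref.get("resource"), ref.get("reference"))
```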
Adding a new external reference
1. Create the new external reference object
o “from KafNafParserPy import KafNafParser” only imports the parser class
o “from KafNafParserPy import *” is needed to also get the other classes (e.g. the external reference class)
2. Set the attributes with the set_XYZ() methods
3. Add the new object to the layer/tree
o By adding it to the specific element (the entity, if we have it)
o By adding it to the general parser object, providing the identifier (sometimes not implemented)
Adding a new external reference
• Create the new external reference
• Find the element where we want to add it
• Use the “adding” method of the element
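A sketch of the element-level route with ElementTree. ExternalRefSketch is a hypothetical stand-in for the library's external reference class, filled through set_XYZ()-style setters; add_external_reference is a hypothetical "adding" method that creates the externalReferences container if the entity has none yet.

```python
import xml.etree.ElementTree as ET


class ExternalRefSketch:
    """Hypothetical stand-in: created empty, then filled with set_XYZ() setters."""

    def __init__(self):
        self.node = ET.Element("externalRef")

    def set_resource(self, value):
        self.node.set("resource", value)

    def set_reference(self, value):
        self.node.set("reference", value)

    def set_confidence(self, value):
        self.node.set("confidence", str(value))


def add_external_reference(entity, ext_ref):
    """Hypothetical element-level 'adding' method for an <entity> element."""
    container = entity.find("externalReferences")
    if container is None:
        container = ET.SubElement(entity, "externalReferences")
    container.append(ext_ref.node)


root = ET.fromstring('<NAF><entities><entity id="e1" type="LOC"/></entities></NAF>')
ref = ExternalRefSketch()
ref.set_resource("my_linker")
ref.set_reference("http://example.org/Amsterdam")
ref.set_confidence(0.8)
# First find the element, then use its adding method
entity = root.find(".//entity[@id='e1']")
add_external_reference(entity, ref)
print(ET.tostring(entity, encoding="unicode"))
```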
Adding a new external reference
• Create the new external reference
• Use the “adding” method of the parser, providing the id
• Not always implemented (but quite easy to add)
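Where the parser-level method is missing, it could look roughly like this. MiniParser and add_external_reference_to_entity are hypothetical names for this sketch: the method looks the entity up by its id, attaches the new externalRef element, and reports whether the id was found.

```python
import xml.etree.ElementTree as ET


class MiniParser:
    """Hypothetical stand-in for a parser object holding the whole tree."""

    def __init__(self, xml_string):
        self.root = ET.fromstring(xml_string)

    def add_external_reference_to_entity(self, entity_id, ext_ref):
        # Look the entity up by id; attach ext_ref (an <externalRef> element)
        for entity in self.root.iter("entity"):
            if entity.get("id") == entity_id:
                container = entity.find("externalReferences")
                if container is None:
                    container = ET.SubElement(entity, "externalReferences")
                container.append(ext_ref)
                return True
        return False  # unknown id: nothing was added


parser = MiniParser('<NAF><entities><entity id="e1" type="LOC"/></entities></NAF>')
ref = ET.Element("externalRef", resource="my_linker", reference="http://example.org/Amsterdam")
print(parser.add_external_reference_to_entity("e1", ref))   # True
print(parser.add_external_reference_to_entity("e99", ref))  # False
```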
KafNafParserPy
Ruben Izquierdo Bevia
http://rubenizquierdobevia.com
GitHub: https://github.com/cltl/KafNafParserPy
API HTML: http://kyoto.let.vu.nl/~izquierdo/api/KafNafParserPy/
API PDF: http://kyoto.let.vu.nl/~izquierdo/api/KafNafParserPy/api.pdf