Annotation for Hindi PropBank
Outline
• Introduction to the project
• Basic linguistic concepts
  – Verb & Argument
  – Making information explicit
  – Null arguments
• Tasks to be carried out
• Tools for annotation
• Timesheets, tips
• Practice
Creation of Resources
• For machines rather than humans
• Imagine a dictionary/thesaurus for computers
• A requirement for Natural Language Processing: large annotated resources
• Annotation implies addition of linguistic information
• Tailored to language-specific requirements
• Needs to be as consistent as possible
• Used for applications like Semantic Role Labelling, Parsing, Word Sense Disambiguation
Hindi-Urdu Treebank Project
• One of the first efforts to make a large-scale resource for Hindi-Urdu
• Similar resources exist for Chinese, Arabic and English
• Three main components
  – Hindi-Urdu dependency treebank
  – Hindi-Urdu PropBank
  – Hindi-Urdu phrase structure treebank [derived]
PropBank
• PropBank resource creation at CU Boulder
• We annotate semantic information on top of syntactic information
• PropBank involves annotation of predicate argument structure
  – Mainly concerned with verbs & their arguments
  – And the semantic nature of the arguments
What are verbs?
• Verbs are predicating elements, e.g. daud ‘run’, pii ‘drink’, baras ‘rain’
• Encode (very broadly) actions and states
• Also have two kinds of grammatical information
  – Tense, aspect (present, future; perfect, continuous)
  – Gender, number, person (masc/fem; sing, pl; 1st, 2nd, 3rd)
What are arguments?
• In a sentence, e.g. Ram ate an apple / Raam ne seb khaaya:
  – A verb, ‘eat’ or ‘khaa’: PREDICATE
  – A person eating, ‘Raam’: ARGUMENT
  – Thing eaten, ‘apple’ / ‘seb’: ARGUMENT
• Without arguments, the meaning of the verb ‘ate’ is not realized completely
• Together, they make up the predicate argument structure of the sentence
Arguments show what’s important
• Raam ne jaldi se seb khaaya (‘Ram ate the apple quickly’)
  – Raam, seb are arguments
  – But ‘jaldi se’ (‘quickly’) is not
• It’s all about the verb
  – It projects its need for certain arguments
  – Sift what’s mandatory from what’s optional
Like Unix commands
• Some commands require only one argument
  – cd /home/student/ashwini
• Others require two
  – cp hmwk1.txt hmwk2.txt
• If the command is typed with too many or too few arguments…
Error!
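The Unix analogy can be sketched in a few lines of Python. This is only an illustration: the table `REQUIRED_ARGS`, its entries, and the function `check_arguments` are hypothetical names invented here, and real Hindi sentences can leave arguments unpronounced without being "errors" (see the slide on null arguments).

```python
# Hypothetical valency table: verb root -> number of core arguments it needs.
# The entries are illustrative assumptions, not taken from PropBank frame files.
REQUIRED_ARGS = {
    "khaa": 2,  # 'eat': the eater and the thing eaten
    "daud": 1,  # 'run': the runner
}


def check_arguments(verb, args):
    """Return True if the verb got exactly the arguments it needs, else raise ValueError."""
    expected = REQUIRED_ARGS[verb]
    if len(args) != expected:
        raise ValueError(
            f"{verb}: expected {expected} argument(s), got {len(args)}"
        )
    return True
```

Calling `check_arguments("khaa", ["Raam", "seb"])` succeeds, while `check_arguments("khaa", ["Raam"])` raises a `ValueError`, much as `cp` complains when given a single file name.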
Making information explicit
• As speakers of Hindi or English, we already have knowledge of predicate argument structure
• E.g. hari ___ pahuMcaa
  – Capturing this knowledge for the machine is essential
  – Ram ne seb khaaya aur paani piyaa
  – Who drank the water?
Identify arguments
• In PropBank, we first identify arguments of a verb
• When explicitly present, they are called ARG
• Further, they are numbered as ARG0, ARG1, ARG2 etc.
• Often, you have ARG as well as ARG-M
  – [Ram]ARG0 ne [jaldi se]ARG-M [seb]ARG1 khaaya
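One way to picture the labelled sentence is as a list of (text, label) pairs. The sketch below is a minimal illustration, assuming a made-up in-memory representation; it is not the file format used by Sanchay or by PropBank, and `ANNOTATED` and `split_labels` are hypothetical names.

```python
import re

# The example sentence from the slide as (text, label) pairs.
# None marks material that carries no argument label in this sketch
# (the case marker 'ne' and the verb itself).
ANNOTATED = [
    ("Ram", "ARG0"),
    ("ne", None),
    ("jaldi se", "ARG-M"),
    ("seb", "ARG1"),
    ("khaaya", None),
]


def split_labels(annotated):
    """Separate numbered arguments (ARG0, ARG1, ...) from modifiers (ARG-M)."""
    numbered = {}
    modifiers = []
    for text, label in annotated:
        if label is None:
            continue
        if re.fullmatch(r"ARG\d+", label):
            numbered[label] = text
        elif label == "ARG-M":
            modifiers.append(text)
    return numbered, modifiers


numbered, modifiers = split_labels(ANNOTATED)
```

Here `numbered` holds the arguments the verb needs (Ram, seb) and `modifiers` holds the optional material (jaldi se), mirroring the mandatory/optional distinction made earlier.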
Null arguments
• What if arguments are not explicit?
  – E.g. Ram ne seb khaaya aur ___ paani piyaa
  – Ram is also the person drinking water
  – It can be dropped, because of conjunction aur
  – For the machine, it must be retrieved from the sentence
• We also mark these missing or null arguments
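The idea of retrieving a dropped argument can be sketched as follows. This is a toy illustration under stated assumptions: clauses are plain dictionaries, the drinker is assumed to be ARG0 of piyaa, and the null element is written as a small dict pointing back to its antecedent. None of this reflects how Sanchay actually stores null arguments, and `insert_null_subject` is a hypothetical helper.

```python
def insert_null_subject(clauses):
    """Copy the clauses, adding a null ARG0 wherever a clause lacks one.

    The null element is a dict with 'null': True and an 'antecedent'
    taken from the ARG0 of the first clause. This representation is an
    illustrative assumption, not an actual annotation format.
    """
    first_subject = clauses[0]["args"].get("ARG0")
    result = []
    for clause in clauses:
        args = dict(clause["args"])  # copy so the input is left untouched
        if "ARG0" not in args and first_subject is not None:
            args["ARG0"] = {"null": True, "antecedent": first_subject}
        result.append({"verb": clause["verb"], "args": args})
    return result


# Ram ne seb khaaya aur ___ paani piyaa
sentence = [
    {"verb": "khaaya", "args": {"ARG0": "Ram", "ARG1": "seb"}},
    {"verb": "piyaa", "args": {"ARG1": "paani"}},
]
completed = insert_null_subject(sentence)
```

After the call, the second clause in `completed` carries a null ARG0 whose antecedent is Ram, which makes explicit for the machine who drank the water. Real null argument insertion depends on the syntactic construction, so a rule this simple would only cover the coordination case shown on the slide.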
Tasks to be carried out
• Null argument insertion
• Argument annotation
Tools to be used
• Sanchay – GUI for annotators. We use it especially for Null argument insertion
• Use your verbs account to access Sanchay
• Wiki for annotator resources
Timesheets & tips
• Being honest about filling out timesheets is quite important
• We can access the amount of time you spend on verbs
• I will ask you to keep track of the number of annotations per hour as a cross-check
• Turn in the timesheets at my CINC mailbox in physical form, with your signature
Practice
• We need to learn about four kinds of empty categories
• Plan to proceed
  – Recognizing syntactic constructions
  – Getting familiar with the tool
  – Practice with the corpus
  – Q & A based on null argument insertion