Annotation for Hindi PropBank. Outline Introduction to the project Basic linguistic concepts –...

Annotation for Hindi PropBank

Outline

• Introduction to the project

• Basic linguistic concepts
– Verb & Argument
– Making information explicit
– Null arguments

• Tasks to be carried out

• Tools for annotation

• Timesheets, tips

• Practice

Creation of Resources

• For machines rather than humans

• Imagine a dictionary/thesaurus for computers

• A requirement for Natural Language Processing
– Large annotated resources

• Annotation implies addition of linguistic information

• Tailored to language-specific requirements

• Needs to be as consistent as possible
– Used for applications like Semantic Role Labelling, Parsing, Word Sense Disambiguation

Hindi-Urdu Treebank Project

• One of the first efforts to make a large-scale resource for Hindi-Urdu

• Similar resources exist for Chinese, Arabic and English

• Three main components
– Hindi-Urdu dependency treebank
– Hindi-Urdu PropBank
– Hindi-Urdu phrase structure treebank [derived]

PropBank

• PropBank resource creation at CU Boulder

• We annotate semantic information on top of syntactic information

• PropBank involves annotation of predicate argument structure
– Mainly concerned with verbs & their arguments
– And the semantic nature of the arguments

What are verbs?

• Verbs are predicating elements, e.g. daud ‘run’, pii ‘drink’, baras ‘rain’

• Encode (very broadly) actions and states

• Also have two kinds of grammatical information
– Tense, aspect (present, future; perfect, continuous)
– Gender, number, person (masc/fem; sing, pl; 1st, 2nd, 3rd)

What are arguments?

• In a sentence, e.g. Ram ate an apple / Raam ne seb khaaya:
– A verb, ‘eat’ / ‘khaa’: PREDICATE
– A person eating, ‘Raam’: ARGUMENT
– Thing eaten, ‘apple’ / ‘seb’: ARGUMENT

• Without arguments, the meaning of the verb ‘ate’ is not realized completely

• Together, they make up the predicate argument structure of the sentence

Arguments show what’s important

• Raam ne jaldi se seb khaaya
– Raam, seb are arguments
– But ‘jaldi se’ is not

• It’s all about the verb
– It projects its need for certain arguments
– Sift what’s mandatory from what’s optional

Like Unix commands

• Some commands require only one argument
– cd /home/student/ashwini

• Others require two
– cp hmwk1.txt hmwk2.txt

• If the command is typed with too many or too few arguments…

Error!
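
The same behaviour can be sketched in a few lines of Python. This is only an illustration of the analogy, not part of the PropBank tooling: the function `khaa` and the helper `try_call` are hypothetical names made up for this example. A function that declares two parameters behaves like a verb that needs two arguments, and calling it with too few or too many fails.

```python
def khaa(eater, thing_eaten):
    """Hypothetical 'eat' predicate: it needs exactly two arguments."""
    return f"{eater} ne {thing_eaten} khaaya"


def try_call(func, *args):
    """Call func with args; report 'Error!' if the argument count is wrong."""
    try:
        return func(*args)
    except TypeError:
        # Python raises TypeError for too many or too few positional arguments.
        return "Error!"


print(try_call(khaa, "Raam", "seb"))           # right number of arguments
print(try_call(khaa, "Raam"))                  # too few
print(try_call(khaa, "Raam", "seb", "paani"))  # too many
```

Like the shell, Python complains as soon as the argument count does not match what the function expects; a verb "complains" in the same way when one of its required arguments is missing.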

Making information explicit

• As speakers of Hindi or English, we already have knowledge of predicate argument structure

– E.g. hari ___ pahuMcaa

• Capturing this knowledge for the machine is essential
– Ram ne seb khaaya aur paani piyaa
– Who drank the water?

Identify arguments

• In PropBank, we first identify arguments of a verb

• When explicitly present, they are called ARG

• Further, they are numbered as ARG0, ARG1, ARG2 etc.

• Often, you have ARG as well as ARG-M
– [Ram]ARG0 ne [jaldi se]ARG-M [seb]ARG1 khaaya
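
One way to picture what the annotation records is a small data structure. The sketch below is only an assumption made for illustration: the class names `Argument` and `Proposition` and their methods are hypothetical, and this is not the file format used by Sanchay or PropBank. It stores the labels from the example sentence and separates the numbered arguments from the ARG-M modifier.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Argument:
    text: str   # the words of the argument, e.g. "Ram"
    label: str  # "ARG0", "ARG1", ... or "ARG-M"


@dataclass
class Proposition:
    predicate: str
    arguments: List[Argument] = field(default_factory=list)

    def numbered(self) -> List[Argument]:
        """Arguments with a numbered label (ARG0, ARG1, ARG2, ...)."""
        return [a for a in self.arguments if a.label[3:].isdigit()]

    def modifiers(self) -> List[Argument]:
        """Arguments labelled ARG-M."""
        return [a for a in self.arguments if a.label == "ARG-M"]


# The example sentence: Ram ne jaldi se seb khaaya
prop = Proposition(
    predicate="khaaya",
    arguments=[
        Argument("Ram", "ARG0"),
        Argument("jaldi se", "ARG-M"),
        Argument("seb", "ARG1"),
    ],
)

print([a.text for a in prop.numbered()])
print([a.text for a in prop.modifiers()])
```

Keeping the label next to the text of each argument is what lets a program later ask questions such as "who did the eating?" without re-reading the sentence.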

Null arguments

• What if arguments are not explicit?
– E.g. Ram ne seb khaaya aur ___ paani piyaa
– Ram is also the person drinking water
– It can be dropped, because of conjunction aur
– For the machine, it must be retrieved from the sentence

• We also mark these missing or null arguments
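
As a toy illustration of what "retrieving" a dropped argument means, the sketch below fills in the missing eater/drinker for the second clause. Everything here is an assumption for the sake of the example: the function name `insert_null_arg0`, the dictionary layout and the `"NULL:"` marker are hypothetical, and the single rule it applies (copy ARG0 across the conjunction) is a simplification. In the project itself, null arguments are inserted by annotators, not by a rule like this.

```python
from typing import Dict


def insert_null_arg0(first: Dict[str, str], second: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of `second` in which a missing ARG0 is marked as a
    null argument pointing back to the ARG0 of `first`."""
    filled = dict(second)  # copy, so the original clause is left unchanged
    if "ARG0" not in filled and "ARG0" in first:
        filled["ARG0"] = "NULL:" + first["ARG0"]
    return filled


# Ram ne seb khaaya aur ___ paani piyaa
clause1 = {"ARG0": "Ram", "ARG1": "seb", "verb": "khaaya"}
clause2 = {"ARG1": "paani", "verb": "piyaa"}

filled = insert_null_arg0(clause1, clause2)
print(filled["ARG0"])
```

The marker keeps two facts visible at once: the argument is not pronounced in the sentence, and it refers to Ram.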

Tasks to be carried out

• Null argument insertion

• Argument annotation

Tools to be used

• Sanchay – GUI for annotators. We use it especially for Null argument insertion

• Use your verbs account to access Sanchay

• Wiki for annotator resources

Timesheets & tips

• Being honest about filling out timesheets is quite important

• We can access the amount of time you spend on verbs

• I will ask you to keep track of the number of annotations per hour, as a cross-check

• Turn in the timesheets at my CINC mailbox in physical form, with your signature

Practice

• We need to learn about four kinds of empty categories

• Plan to proceed
– Recognizing syntactic constructions
– Getting familiar with the tool
– Practice with the corpus
– Q & A based on null argument insertion