A modality lexicon and its use in automatic tagging

27 January 2010


Kathryn Baker, Michael Bloodgood, Bonnie Dorr, Nathaniel W. Filardo, Lori Levin, Christine Piatko

May 20, 2010

Presented by Lori Levin

Language Technologies Institute, Carnegie Mellon University

Context

- SCALE 2009: Summer Camp in Applied Language Engineering, Johns Hopkins University Human Language Technology Center of Excellence
- SIMT: Semantically informed MT
- Can we improve statistical MT with semantic knowledge?
  - experiments with modality and named entities

Modality Tagger Output

Example 1:

Input: Americans should know that we can not hand over Dr. Khan to them.

Output: Americans <TrigRequire should> <TargRequire know> that we <TrigAble can> <TrigNegation not> <TargNOTAble hand> over Dr. Khan to them

Example 2:

Input: He managed to hold general elections in the year 2002, but he can not be ignorant of the fact that the world at large did not accept these elections

Output: He <TrigSucceed managed> to <TargSucceed hold> general elections in the year 2002, but he <TrigAble can> <TrigNegation not> <TargNOTAble be> ignorant of the fact that the world at large did <TrigNegation not> <TrigBelief accept> these <TargBelief elections>

Trigger: lexical item that carries a modal meaning.

Target: head of the proposition that it scopes over.

Holder: the experiencer or cognizer of the modality.
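The bracketed output format in the examples above can be read back into structured records. The following is a minimal sketch, assuming tags always have the shape `<TrigX word>` or `<TargX word>` with an optional `NOT` prefix on the modality name; the function name `parse_tags` and the tuple layout are illustrative and not part of the original tagger.

```python
import re

# Matches tags such as "<TrigRequire should>" or "<TargNOTAble hand>".
# Assumption: a "NOT" prefix on the modality name marks a negated modality.
TAG = re.compile(r"<(Trig|Targ)(NOT)?([A-Za-z]+) ([^<>]+)>")


def parse_tags(tagged):
    """Return (role, modality, negated, word) tuples in text order."""
    return [
        (
            "trigger" if m.group(1) == "Trig" else "target",
            m.group(3),
            m.group(2) is not None,
            m.group(4),
        )
        for m in TAG.finditer(tagged)
    ]


example = (
    "Americans <TrigRequire should> <TargRequire know> that we "
    "<TrigAble can> <TrigNegation not> <TargNOTAble hand> over Dr. Khan to them"
)
tags = parse_tags(example)
```

Each trigger and target comes back as one tuple, so downstream code can pair a trigger with the target that shares its modality name.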

Outline

- A modality annotation scheme
- A modality lexicon
- A string based modality tagger
- A tree based modality tagger
- Evaluation of the taggers
- Semantically informed MT

Core Cases of Modality

Epistemic
- Necessity: John must have arrived
- Possibility: John may have arrived

Deontic/Situational
- Necessity: John has to leave now
- Possibility: You may leave now. One can get to Staten Island using a ferry.

(van der Auwera and Ammann, World Atlas of Language Structures)

Related Concepts: Factivity

Did the proposition happen or not?
- John went to New York.
- John may go to New York.
- If John goes to New York, he will visit MOMA.
- John bought a ticket to go to NY.

FactBank: Saurí and Pustejovsky

Related Concepts: Evidentiality

Source of information

First hand experience or hearsay
- They say that John went to NY.

Sensory information
- I heard that John went to NY.

Conclusion from evidence
- I don’t see John, so he must have gone to NY.

Other Related Concepts

- Speaker attitude and sentiment
- Conditionality
- Hypotheticality
- Realis and Irrealis mood
- Tense, aspect, etc.



Modality Annotation and Tagging

Annotation: Humans add labels to text, following instructions from a coding manual that defines an annotation scheme.

Tagging: A program automatically assigns labels

Goals:
- Design an annotation scheme that can be followed with high intercoder agreement and low annotation time and cost
- Train a tagger on human annotated data
- Build a tagger based on the annotation scheme

The inventory of modalities in the annotation scheme

- Belief: with what strength does H believe P?
- Requirement: does H require P?
- Permissive: does H allow P?
- Intention: does H intend P?
- Effort: does H try to do P?
- Ability: can H do P?
- Success: does H succeed in P?
- Want: does H want P?

Joint work with Sergei Nirenburg, Marge McShane, Teruko Mitamura, Owen Rambow, Mona Diab, Eduard Hovy, Bonnie Dorr, Christine Piatko, Michael Bloodgood

H = Holder (experiencer or cognizer)

P = Proposition

The Annotation Scheme

Identify a modality target P and then choose one of these modalities (choose the first one that applies):
- H requires [P to be true/false]
- H permits [P to be true/false]
- H succeeds in [making P true/false]
- H does not succeed in [making P true/false]
- H is trying [to make P true/false]
- H is not trying [to make P true/false]
- H intends [to make P true/false]
- H does not intend [to make P true/false]
- H is able [to make P true/false]
- H is not able [to make P true/false]
- H wants [P to be true/false]
- H firmly believes [P is true/false]
- H believes [P may be true/false]

Six Simplifications

- Transparency to negation
- Duality of require and permit
- Ordering for entailment
- Annotators were not asked to nest modalities.
- Default is Firmly Believe
- Annotators were not asked to mark the holder.

Simplifications: Transparency to negation

Some modalities have negatives in the annotation scheme: not intend, not try, not be able, not succeed

Believe and want do not have negatives in our annotation scheme because of the similarity of
- I don’t want him to go/I want him not to go.
  - Both are coded as H wants P to be false
- I don’t believe he will go/I believe he will not go.
  - Both are coded as H believes P to be false.

Simplifications: Duality of require and permit

Require and permit do not have negations in the annotation scheme because
- Not require P to be true means Permit P to be false
- Not permit P to be true means Require P to be false
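The duality above amounts to a small rewrite rule: a negated require or permit is replaced by the other modality with the polarity of P flipped. A minimal sketch follows; the function name `normalize` and the boolean encoding of polarity are assumptions made for illustration.

```python
# Each of the two modalities is the dual of the other.
DUAL = {"require": "permit", "permit": "require"}


def normalize(modality, negated, p_true):
    """Fold a negated require/permit into its dual.

    modality: "require" or "permit"
    negated:  True when the modal itself is negated ("not require")
    p_true:   True for "P to be true", False for "P to be false"
    Returns a (modality, p_true) pair that needs no negation marker.
    """
    if negated:
        return DUAL[modality], not p_true
    return modality, p_true


not_require_true = normalize("require", True, True)
not_permit_true = normalize("permit", True, True)
```

With this rule, the scheme needs only two labels per modality (P true, P false) instead of four.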

Simplifications: Ordering for entailment

John managed to go to NY.

What modality is this? Success? Intent? Effort? Desire? Ability?

Two entailment groupings ordered with respect to each other:
1. {requires, permits}
2. {succeeds, tries, intends, is able, wants}

Both apply before “believe”, which is not in an entailment relation with either grouping.

The annotators are instructed to choose the first modality in the list that applies.
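The "first modality in the list that applies" instruction can be sketched as a lookup over an ordered list. This is only an illustration: the `ORDER` list collapses the scheme to the two groupings above followed by belief, and the set of applicable modalities for the example sentence is supplied by hand rather than derived.

```python
# Grouping 1, then grouping 2, then belief, following the ordering above.
ORDER = [
    "requires", "permits",
    "succeeds", "tries", "intends", "is able", "wants",
    "believes",
]


def choose(applicable):
    """Return the first modality in ORDER that the annotator judges applicable."""
    for modality in ORDER:
        if modality in applicable:
            return modality
    return None


# "John managed to go to NY." entails success, effort, intent, ability and desire.
label = choose({"wants", "is able", "intends", "tries", "succeeds"})
```

Because success comes first within its grouping, the sentence receives a single label even though several modalities hold.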

Simplifications: No embedding of modalities

He might be able to swim
- Only ability is tagged

Modals are never considered as targets of other modals in the annotation process


English Modality Lexicon

Modality trigger words
- might, should, require, permit, need, try, possible, fail, etc.
- About 150 lemmas
  - plus five forms for each verb where applicable: bare infinitive, present tense –s, past tense, past participle, present participle

English Modality Lexicon Example

need
- Pos: VB
- Modality: Require
- Trigger word: Need
- Subcategorization codes
  - V3-passive-basic: Large helicopters are needed to dispatch urgent relief materials.
  - V3-I3-basic: The government will need to work continuously for at least a year. We will need them to work continuously.
  - T1-monotransitive-for-V3-verbs: We need a Sir Sayyed again to maintain this sentiment.
  - T1-passive-for-V3-verb: He is needed to work continuously.
  - modal-auxiliary-basic: He need not go.
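One way to hold such an entry in memory is a small record type. The sketch below is an assumption about representation only: the class name `LexiconEntry` and its field names are hypothetical, and the actual file format of the lexicon is not shown in the slides. The field values are copied from the "need" entry above.

```python
from dataclasses import dataclass, field


@dataclass
class LexiconEntry:
    lemma: str
    pos: str
    modality: str
    # Maps each subcategorization code to its example sentences.
    subcat: dict = field(default_factory=dict)


need = LexiconEntry(
    lemma="need",
    pos="VB",
    modality="Require",
    subcat={
        "V3-passive-basic": [
            "Large helicopters are needed to dispatch urgent relief materials.",
        ],
        "V3-I3-basic": [
            "The government will need to work continuously for at least a year.",
            "We will need them to work continuously.",
        ],
        "T1-monotransitive-for-V3-verbs": [
            "We need a Sir Sayyed again to maintain this sentiment.",
        ],
        "T1-passive-for-V3-verb": ["He is needed to work continuously."],
        "modal-auxiliary-basic": ["He need not go."],
    },
)
```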



String Based English Modality Tagger

Input
- Text that has been tagged with parts of speech.

Mark Triggers
- Mark spans of words that are exact matches to entries in the modality lexicon and that have the same part of speech.

Mark Targets
- Next non-auxiliary verb to the right of a trigger

Spans of words can be marked multiple times with different triggers and targets.
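The trigger and target steps above can be sketched in a few lines. This is a simplified illustration, not the authors' tagger: it handles single-word triggers only, ignores negation, and uses a tiny hand-made lexicon and auxiliary list (`LEXICON`, `AUXILIARIES`) that are assumptions for the example.

```python
# Hypothetical miniature lexicon: (word, POS) -> modality.
LEXICON = {("should", "MD"): "Require", ("can", "MD"): "Able"}

# Assumed auxiliary forms that are skipped when looking for a target.
AUXILIARIES = {"be", "is", "are", "was", "were", "been", "being",
               "have", "has", "had", "do", "does", "did"}


def tag(tokens):
    """tokens: list of (word, POS). Returns (index, label) pairs."""
    marks = []
    for i, (word, pos) in enumerate(tokens):
        modality = LEXICON.get((word.lower(), pos))
        if modality is None:
            continue
        marks.append((i, "Trig" + modality))
        # Target: next non-auxiliary verb to the right of the trigger.
        for j in range(i + 1, len(tokens)):
            w, p = tokens[j]
            if p.startswith("VB") and w.lower() not in AUXILIARIES:
                marks.append((j, "Targ" + modality))
                break
    return marks


sentence = [("Americans", "NNPS"), ("should", "MD"), ("know", "VB"),
            ("that", "IN"), ("we", "PRP"), ("can", "MD"), ("not", "RB"),
            ("hand", "VB"), ("over", "RP"), ("Dr.", "NNP"), ("Khan", "NNP"),
            ("to", "TO"), ("them", "PRP")]
marks = tag(sentence)
```

Returning index and label pairs, rather than rewriting the text, lets the same word carry several marks, as the last point above allows.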

[Figure: part-of-speech-tagged parse tree of Example 1]

(S (NP Americans/NNPS) (VP should/MD know/VB that (S (NP we/PRP) (VP can/MD not/RB hand/VB over (NP Dr/NNP Khan/NNP) (PP to them)))))

Modality Tagging

Used T-Surgeon (Stanford NLP tools) to find trees that match templates and mark modality triggers and targets.

[Figure: template tree, a VP whose MD child "should" is the Trigger and whose VB child is the Target]
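The idea of matching a template against a parse tree and relabeling the matched nodes can be illustrated without the Stanford tools. The sketch below uses plain Python lists as trees and is not T-Surgeon syntax; the template it encodes (a VP with an MD trigger child followed by a VB sibling) is taken from the figure, while the function names and the tree encoding are assumptions.

```python
def is_leaf(node):
    """A leaf is [POS, word]; an internal node is [label, child, ...]."""
    return len(node) == 2 and isinstance(node[1], str)


def mark(node, lexicon):
    """Relabel MD trigger / VB target pairs under each VP, in place."""
    if is_leaf(node):
        return
    kids = node[1:]
    if node[0] == "VP":
        for i, kid in enumerate(kids):
            if is_leaf(kid) and kid[0] == "MD" and kid[1] in lexicon:
                modality = lexicon[kid[1]]
                # The first VB sibling to the right is taken as the target.
                for sib in kids[i + 1:]:
                    if is_leaf(sib) and sib[0] == "VB":
                        kid[0] = "MD-Trig" + modality
                        sib[0] = "VB-Targ" + modality
                        break
    for kid in kids:
        mark(kid, lexicon)


tree = ["S",
        ["NP", ["NNPS", "Americans"]],
        ["VP", ["MD", "should"], ["VB", "know"]]]
mark(tree, {"should": "Require"})
```

Unlike the string-based approach, the match here is restricted to sisters under one VP, which is what makes the structure-based tagger sensitive to syntax.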


The Structure-Based English Modality Tagger

[Figure: parse tree of Example 1 after modality tagging]

(S (NP Americans/NNPS) (VP-require (MD-TrigRequire should) (VB-TargRequire know) that (S (NP we/PRP) (VP-NOTAble (MD-TrigAble can) (RB-TrigNegation not) (VB-TargNOTAble hand) over (NP Dr/NNP Khan/NNP) (PP to them)))))

1. T-surgeon
2. Percolation

What was covered

- 15 subcategorization patterns
- 150 lemmas
- Expressions of modality with lexical triggers

What wasn’t covered

- Non-lexical modality
- Imperatives
- Other constructions
  - It will be a long time/a cold day in hell before…
- Targets in coordinate structures

To do next

Word sense disambiguation
- Can, must: deontic or epistemic
- Manage: manage to do something vs manage a project

Transitivity alternations: alternate mappings between grammatical relations and semantic roles
- The plan succeeded
- The government succeeded in its plan.
- The government succeeded ????

Evaluation: agreement between string-based and structure-based taggers

Calculated Kappa on the basis of 88108 sentences from the English side of the Urdu-English corpus for MTEval 2009

Example: TargPermit (John is allowed to <TargPermit go> to NY)
- 585 Matching both taggers
- 163 Matching just structure-based tagger
- 194 Matching just string-based tagger
- 87166 No match either tagger

Triggers: Kappa = .82

Targets: Kappa = .76
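As a worked check on the TargPermit counts above, the sketch below computes a two-rater agreement score from the four cells. It assumes the reported Kappa is Cohen's kappa, which the slides do not state explicitly; the function name is illustrative.

```python
def cohen_kappa(both, only_a, only_b, neither):
    """Cohen's kappa for two taggers making a yes/no decision per sentence."""
    n = both + only_a + only_b + neither
    observed = (both + neither) / n          # proportion of agreement
    a_yes = (both + only_a) / n              # tagger A says yes
    b_yes = (both + only_b) / n              # tagger B says yes
    expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)  # chance agreement
    return (observed - expected) / (1 - expected)


# TargPermit: both, structure-based only, string-based only, neither.
kappa = cohen_kappa(585, 163, 194, 87166)
```

The four counts sum to the 88108 sentences of the corpus. Raw agreement is above 99% because most sentences carry no TargPermit tag at all, which is why a chance-corrected measure is used; for this one tag the value comes out at roughly 0.76.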

Evaluation: Structure Based Tagger

Recall: not feasible to look for all expressions of modality that we didn’t tag.
- No gold-standard annotated corpus.

Precision:
- 249 sentences that were tagged with triggers and targets
- From the English side of the MTEval 2009 training sentences
- 86.3% correct
  - But ranges from about 82% to about 92% depending on genre

Precision: Errors

Light verb or noun is correct syntactic target but not the correct semantic target.
- Earthquake affected areas in Pakistan will be provided the required number of tents and blankets by November 15.
- The decision should be taken on delayed cases on the basis of merit.

Wrong word sense
- In Bayas, Sikhs attacked a train under cover of night and killed everyone.
- The process of provision of relief goods to needy people should be managed by the Army and the Edhi Trust.
- Should be allowed to work like this in the future.
  - Like: succeed in something

Precision: Errors

Wrong subcategorization pattern.
- The officials should consider themselves as servants of the people.

Coordinate Structures
- Many large helicopters are needed to dispatch urgent relief materials to the many affected in far flung areas of the Neelam Valley and only America can help us in this regard.

Recall: what did we miss?

Special forms of negation
- There was no place to seek shelter.
- The buildings should be reconstructed, not with the RCC, but with the wood and steel sheets.

Constructional and phrasal triggers
- President Pervaiz Musharraf has said that he will not rest unless the process of rehabilitation is completed.

Random lexical omissions
- It is not possible in the middle of winter to re-open the roads.

SIMT: Semantically Informed MT

[Figure: the modality-tagged parse tree of Example 1, produced by (1) T-surgeon and (2) percolation]

Integration of the modality tagger with Syntax Based SMT

Joshua syntax-based SMT system (Callison-Burch)

Tag modalities on the English side of the training data.

Without modality tags: BLEU 26.4

With modality tags: BLEU 26.7

Advantages of SIMT

- Good for translation between a less commonly taught language (LCTL) and a common language
  - Modality can be analyzed on the common language and projected via word alignments to the LCTL
- Depth of semantic analysis
- Robustness of statistical approach
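Projection via word alignments can be pictured as copying each tag from a common-language token to the token it is aligned with. The sketch below is only an illustration of that idea: the alignment pairs and token indices are made up, and the function name `project` is hypothetical rather than taken from the SIMT system.

```python
def project(tags, alignment):
    """Copy tags across a word alignment.

    tags:      {common-language token index: tag}
    alignment: list of (common-language index, LCTL index) pairs
    Returns    {LCTL token index: [tags]}
    """
    projected = {}
    for src, tgt in alignment:
        if src in tags:
            projected.setdefault(tgt, []).append(tags[src])
    return projected


# Hypothetical three-word sentence pair with a reordered alignment.
english_tags = {1: "TrigRequire", 2: "TargRequire"}
alignment = [(0, 0), (1, 2), (2, 1)]
lctl_tags = project(english_tags, alignment)
```

Unaligned tagged words simply produce no projected tag, and a target-side word aligned to several tagged words collects all of their tags.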

Summary

- Modality annotation scheme
- Modality lexicon
- Automatic modality tagger
- A method for integrating semantics into SMT
  - Good for translation between LCTLs and common languages

Future work

Improvements to the tagger
- Add patterns for constructions without simple lexical triggers.
- Word sense disambiguation (manage, attack, etc.)
- Semantic composition of multiple modalities and negation.
- Tagging of holders

Applications of the tagger
- Further experiments with SIMT
- Integration into tagger for Committed Belief (factivity)
