A modality lexicon and its use in automatic tagging
27 January 2010
Kathryn Baker, Michael Bloodgood, Bonnie Dorr, Nathanial W. Filardo, Lori Levin, Christine Piatko
May 20, 2010
Presented by Lori Levin
Language Technologies Institute, Carnegie Mellon University
Context
SCALE 2009: Summer Camp in Applied Language Engineering
Johns Hopkins University Human Language Technology Center of Excellence
SIMT: Semantically informed MT
Can we improve statistical MT with semantic knowledge?
- experiments with modality and named entities
Modality Tagger Output
Example 1:
Input: Americans should know that we can not hand over Dr. Khan to them.
Output: Americans <TrigRequire should> <TargRequire know> that we <TrigAble can> <TrigNegation not> <TargNOTAble hand> over Dr. Khan to them
Example 2:
Input: He managed to hold general elections in the year 2002, but he can not be ignorant of the fact that the world at large did not accept these elections
Output: He <TrigSucceed managed> to <TargSucceed hold> general elections in the year 2002, but he <TrigAble can> <TrigNegation not> <TargNOTAble be> ignorant of the fact that the world at large did <TrigNegation not> <TrigBelief accept> these <TargBelief elections>
Trigger: lexical item that carries a modal meaning.
Target: head of the proposition that the trigger scopes over.
Holder: the experiencer or cognizer of the modality.
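The tagged output above can be read back into trigger and target records with a small parser. This is a minimal sketch, assuming that every tag has the shape `<TrigX word>` or `<TargX word>` as in the two examples; the function names `parse_tags` and `strip_tags` are illustrative and not part of the tagger.

```python
import re

# Matches tags such as <TrigRequire should> or <TargNOTAble hand>.
# Assumption: tags are never nested and contain no angle brackets.
TAG_RE = re.compile(r"<(Trig|Targ)(\w+) ([^<>]+)>")

ROLES = {"Trig": "trigger", "Targ": "target"}


def parse_tags(tagged):
    """Return (role, modality, text) triples in left-to-right order."""
    return [
        (ROLES[m.group(1)], m.group(2), m.group(3))
        for m in TAG_RE.finditer(tagged)
    ]


def strip_tags(tagged):
    """Recover the plain sentence by keeping only the tagged words."""
    return TAG_RE.sub(lambda m: m.group(3), tagged)


example = (
    "Americans <TrigRequire should> <TargRequire know> that we "
    "<TrigAble can> <TrigNegation not> <TargNOTAble hand> over Dr. Khan to them"
)

for role, modality, text in parse_tags(example):
    print(role, modality, text)
```

Note that negation is itself marked as a trigger (`TrigNegation`), so it comes back with role "trigger" and modality "Negation".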
Outline
A modality annotation scheme
A modality lexicon
A string based modality tagger
A tree based modality tagger
Evaluation of the taggers
Semantically informed MT
Core Cases of Modality
Epistemic
- Necessity: John must have arrived.
- Possibility: John may have arrived.
Deontic/Situational
- Necessity: John has to leave now.
- Possibility: You may leave now. One can get to Staten Island using a ferry.
(van der Auwera and Ammann, World Atlas of Language Structures)
Related Concepts: Factivity
Did the proposition happen or not?
- John went to New York.
- John may go to New York.
- If John goes to New York, he will visit MOMA.
- John bought a ticket to go to NY.
FactBank: Saurí and Pustejovsky
Related Concepts: Evidentiality
Source of information
First hand experience or hearsay
- They say that John went to NY.
Sensory information
- I heard that John went to NY.
Conclusion from evidence
- I don’t see John, so he must have gone to NY.
Other Related Concepts
Speaker attitude and sentiment
Conditionality
Hypotheticality
Realis and Irrealis mood
Tense, aspect, etc.
Modality Annotation and Tagging
Annotation: Humans add labels to text, following instructions from a coding manual that defines an annotation scheme.
Tagging: A program automatically assigns labels.
Goals:
- Design an annotation scheme that can be followed with high intercoder agreement and low annotation time and cost
- Train a tagger on human annotated data
- Build a tagger based on the annotation scheme
The inventory of modalities in the annotation scheme
Belief: with what strength does H believe P?
Requirement: does H require P?
Permissive: does H allow P?
Intention: does H intend P?
Effort: does H try to do P?
Ability: can H do P?
Success: does H succeed in P?
Want: does H want P?
H = Holder (experiencer or cognizer)
P = Proposition
Joint work with Sergei Nirenburg, Marge McShane, Teruko Mitamura, Owen Rambow, Mona Diab, Eduard Hovy, Bonnie Dorr, Christine Piatko, Michael Bloodgood
The Annotation Scheme
Identify a modality target P and then choose one of these modalities (choose the first one that applies):
- H requires [P to be true/false]
- H permits [P to be true/false]
- H succeeds in [making P true/false]
- H does not succeed in [making P true/false]
- H is trying [to make P true/false]
- H is not trying [to make P true/false]
- H intends [to make P true/false]
- H does not intend [to make P true/false]
- H is able [to make P true/false]
- H is not able [to make P true/false]
- H wants [P to be true/false]
- H firmly believes [P is true/false]
- H believes [P may be true/false]
Six Simplifications
1. Transparency to negation
2. Duality of require and permit
3. Ordering for entailment
4. Annotators were not asked to nest modalities.
5. Default is Firmly Believe
6. Annotators were not asked to mark the holder.
Simplifications: Transparency to negation
Some modalities have negatives in the annotation scheme: not intend, not try, not be able, not succeed.
Believe and want do not have negatives in our annotation scheme because the two members of each pair below mean nearly the same thing:
I don’t want him to go / I want him not to go.
- Both are coded as H wants P to be false.
I don’t believe he will go / I believe he will not go.
- Both are coded as H believes P to be false.
Simplifications: Duality of require and permit
Require and permit do not have negations in the annotation scheme because:
- Not require P to be true means Permit P to be false
- Not permit P to be true means Require P to be false
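The two negation simplifications can be written as one normalization step. The following is a sketch under assumptions: the label strings ("Require", "NotAble", and so on) and the function name `normalize` are illustrative, and the real coding manual may word its labels differently.

```python
# Modalities whose negated form is itself a label in the scheme
# (not intend, not try, not be able, not succeed).
HAS_NEGATIVE_LABEL = {"Intend", "Try", "Able", "Succeed"}

# Require and permit are duals: negating one gives the other with P flipped.
DUAL = {"Require": "Permit", "Permit": "Require"}

# Want and believe are transparent to negation: the negation moves onto P.
TRANSPARENT = {"Want", "Believe"}


def normalize(modality, modality_negated, p_true):
    """Map (modality, negated?, polarity of P) to a label the scheme allows.

    Returns a pair (label, p_true).
    """
    if not modality_negated:
        return modality, p_true
    if modality in DUAL:
        return DUAL[modality], not p_true
    if modality in HAS_NEGATIVE_LABEL:
        return "Not" + modality, p_true
    if modality in TRANSPARENT:
        return modality, not p_true
    raise ValueError("unknown modality: " + modality)


# "I don't want him to go" is coded as H wants P to be false.
print(normalize("Want", True, True))
# "Not require P to be true" is coded as Permit P to be false.
print(normalize("Require", True, True))
```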
Simplifications: Ordering for entailment
John managed to go to NY.
What modality is this? Success? Intent? Effort? Desire? Ability?
Two entailment groupings, ordered with respect to each other:
1. {requires, permits}
2. {succeeds, tries, intends, is able, wants}
Both apply before “believe”, which is not in an entailment relation with either grouping.
The annotators are instructed to choose the first modality in the list that applies.
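The "first one that applies" rule amounts to a lookup in a fixed priority list. A minimal sketch, assuming the order given in the annotation scheme list and using illustrative label names; the fallback to Firmly Believe reflects the stated default.

```python
# Priority order taken from the annotation scheme list; label names are
# illustrative, not the coding manual's exact wording.
ORDER = [
    "Require", "Permit",                          # grouping 1
    "Succeed", "Try", "Intend", "Able", "Want",   # grouping 2
    "FirmlyBelieve", "Believe",                   # belief comes last
]


def choose_modality(applicable):
    """Return the first modality in ORDER that is in `applicable`."""
    for modality in ORDER:
        if modality in applicable:
            return modality
    return "FirmlyBelieve"  # default when nothing else applies


# "John managed to go to NY": success entails effort, intent, ability, desire,
# so only Succeed is recorded.
print(choose_modality({"Want", "Able", "Intend", "Try", "Succeed"}))
```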
Simplifications: No embedding of modalities
He might be able to swim.
- Only ability is tagged.
Modals are never considered as targets of other modals in the annotation process.
English Modality Lexicon
Modality trigger words: might, should, require, permit, need, try, possible, fail, etc.
About 150 lemmas, plus five forms for each verb where applicable:
- bare infinitive, present tense -s, past tense, past participle, present participle
English Modality Lexicon Example
need
Pos: VB
Modality: Require
Trigger word: Need
Subcategorization codes:
- V3-passive-basic: Large helicopters are needed to dispatch urgent relief materials.
- V3-I3-basic: The government will need to work continuously for at least a year. We will need them to work continuously.
- T1-monotransitive-for-V3-verbs: We need a Sir Sayyed again to maintain this sentiment.
- T1-passive-for-V3-verb: He is needed to work continuously.
- modal-auxiliary-basic: He need not go.
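One way to hold such an entry in code is a small record plus an index from surface forms to entries. This is a sketch only: the class `LexiconEntry`, its field names, and `build_index` are assumptions for illustration, not the format of the actual lexicon file. The subcategorization codes are copied from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class LexiconEntry:
    lemma: str
    pos: str
    modality: str
    # Bare infinitive, present -s, past tense, past participle, present participle.
    forms: list
    subcat_codes: list = field(default_factory=list)


need = LexiconEntry(
    lemma="need",
    pos="VB",
    modality="Require",
    forms=["need", "needs", "needed", "needed", "needing"],
    subcat_codes=[
        "V3-passive-basic",
        "V3-I3-basic",
        "T1-monotransitive-for-V3-verbs",
        "T1-passive-for-V3-verb",
        "modal-auxiliary-basic",
    ],
)


def build_index(entries):
    """Map each distinct surface form to the entries that list it."""
    index = {}
    for entry in entries:
        for form in set(entry.forms):
            index.setdefault(form, []).append(entry)
    return index


index = build_index([need])
print(index["needed"][0].modality)
```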
String Based English Modality Tagger
Input: Text that has been tagged with parts of speech.
Mark Triggers: Mark spans of words that are exact matches to entries in the modality lexicon and that have the same part of speech.
Mark Targets: Next non-auxiliary verb to the right of a trigger.
Spans of words can be marked multiple times with different triggers and targets.
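The trigger and target steps can be sketched in a few lines. Assumptions are marked in the comments: the toy lexicon, the auxiliary list, and Penn Treebank style POS tags are illustrative, only single-word entries are handled, and negation is not composed with the modality (so "can not hand" yields TargAble here, not TargNOTAble).

```python
# Hypothetical toy lexicon: (lowercased word, POS) -> modality label.
LEXICON = {
    ("should", "MD"): "Require",
    ("can", "MD"): "Able",
    ("managed", "VBD"): "Succeed",
}

# Assumed list of auxiliary forms to skip when looking for a target.
AUXILIARIES = {
    "be", "is", "are", "was", "were", "been", "being",
    "have", "has", "had", "do", "does", "did",
}


def tag(tokens):
    """tokens: list of (word, pos). Returns (token index, tag) pairs."""
    marks = []
    for i, (word, pos) in enumerate(tokens):
        modality = LEXICON.get((word.lower(), pos))
        if modality is None:
            continue
        marks.append((i, "Trig" + modality))
        # Target: next non-auxiliary verb to the right of the trigger.
        for j in range(i + 1, len(tokens)):
            next_word, next_pos = tokens[j]
            if next_pos.startswith("VB") and next_word.lower() not in AUXILIARIES:
                marks.append((j, "Targ" + modality))
                break
    return marks


sentence = [
    ("Americans", "NNPS"), ("should", "MD"), ("know", "VB"), ("that", "IN"),
    ("we", "PRP"), ("can", "MD"), ("not", "RB"), ("hand", "VB"), ("over", "RP"),
    ("Dr.", "NNP"), ("Khan", "NNP"), ("to", "TO"), ("them", "PRP"),
]
print(tag(sentence))
```

Because the search runs only over the word string, it cannot tell whether the verb it finds is inside the trigger's clause.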
[Parse tree of Example 1]
(S (NP Americans/NNPS)
   (VP should/MD know/VB that
       (S (NP we/PRP)
          (VP can/MD not/RB hand/VB over
              (NP Dr/NNP Khan/NNP)
              (PP to them)))))
The Structure-Based English Modality Tagger
Used T-Surgeon (Stanford NLP tools) to find trees that match templates and mark modality triggers and targets.
Template for modality tagging: a VP that contains MD "should" (the Trigger) and a VB (the Target).
The Structure-Based English Modality Tagger
[Parse tree of Example 1 after modality tagging]
(S (NP Americans/NNPS)
   (VP-require (MD-TrigRequire should) (VB-TargRequire know) that
       (S (NP we/PRP)
          (VP-NOTAble (MD-TrigAble can) (RB-TrigNegation not) (VB-TargNOTAble hand) over
              (NP Dr/NNP Khan/NNP)
              (PP to them)))))
1. T-surgeon
2. Percolation
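The effect of the two steps on this tree can be imitated in plain Python. This is not T-Surgeon and not the templates used in the project: it is a sketch of a single template (a VP with a modal trigger and a VB target), with an assumed toy lexicon, assumed POS labels on the words that the slide leaves unlabelled (that, over, to, them), and one reading of "percolation" as copying the modality label up to the VP node.

```python
TRIGGERS = {"should": "Require", "can": "Able"}  # hypothetical toy lexicon


def is_preterminal(node):
    """A preterminal is [POS, word]."""
    return isinstance(node, list) and len(node) == 2 and isinstance(node[1], str)


def find(children, pos, words=None):
    """First preterminal child with this POS (and, optionally, one of `words`)."""
    for child in children:
        if is_preterminal(child) and child[0] == pos:
            if words is None or child[1].lower() in words:
                return child
    return None


def tag_tree(node):
    """Relabel trigger, negation, target and VP nodes in place."""
    if not isinstance(node, list):
        return node
    children = node[1:]
    if node[0] == "VP":
        trigger = find(children, "MD", TRIGGERS)
        target = find(children, "VB")
        if trigger is not None and target is not None:
            modality = TRIGGERS[trigger[1].lower()]
            trigger[0] = "MD-Trig" + modality
            negation = find(children, "RB", {"not"})
            if negation is not None:
                negation[0] = "RB-TrigNegation"
                modality = "NOT" + modality
            target[0] = "VB-Targ" + modality
            node[0] = "VP-" + modality  # percolate the label up to the VP
    for child in children:
        tag_tree(child)
    return node


tree = ["S",
        ["NP", ["NNPS", "Americans"]],
        ["VP", ["MD", "should"], ["VB", "know"], ["IN", "that"],
         ["S",
          ["NP", ["PRP", "we"]],
          ["VP", ["MD", "can"], ["RB", "not"], ["VB", "hand"], ["RP", "over"],
           ["NP", ["NNP", "Dr"], ["NNP", "Khan"]],
           ["PP", ["TO", "to"], ["PRP", "them"]]]]]]
tag_tree(tree)
print(tree)
```

Unlike the string-based sketch of "next verb to the right", the target here must be a sister of the trigger inside the same VP.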
What was covered
15 subcategorization patterns
150 lemmas
Expressions of modality with lexical triggers
What wasn’t covered
Non-lexical modality
Imperatives
Other constructions
- It will be a long time/a cold day in hell before…
Targets in coordinate structures
To do next
Word sense disambiguation
- Can, must: deontic or epistemic
- Manage: manage to do something vs manage a project
Transitivity alternations: alternate mappings between grammatical relations and semantic roles
- The plan succeeded.
- The government succeeded in its plan.
- The government succeeded ????
Evaluation: agreement between string-based and structure-based taggers
Calculated Kappa on the basis of 88108 sentences from the English side of the Urdu-English corpus for MTEval 2009
Example: TargPermit (John is allowed to <TargPermit go> to NY)
- 585 Matching both taggers
- 163 Matching just structure-based tagger
- 194 Matching just string-based tagger
- 87166 No match either tagger
Triggers: Kappa = .82
Targets: Kappa = .76
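The agreement figure for a single label can be reproduced from the four counts. The slides do not say which kappa variant was used or how per-label values were combined; the sketch below assumes Cohen's kappa over per-sentence yes/no decisions by the two taggers, applied to the TargPermit counts only.

```python
def cohen_kappa(both, only_a, only_b, neither):
    """Cohen's kappa for two raters making a binary decision per item."""
    n = both + only_a + only_b + neither
    observed = (both + neither) / n
    a_yes = (both + only_a) / n
    b_yes = (both + only_b) / n
    # Agreement expected by chance: both say yes or both say no.
    expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    return (observed - expected) / (1 - expected)


# TargPermit counts: a = structure-based tagger, b = string-based tagger.
kappa = cohen_kappa(both=585, only_a=163, only_b=194, neither=87166)
print(round(kappa, 2))
```

Raw agreement on these counts is above 99% because almost every sentence has no TargPermit at all; kappa discounts that, which is why it is much lower.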
Evaluation: Structure Based Tagger
Recall: not feasible to look for all expressions of modality that we didn’t tag.
- No gold-standard annotated corpus.
Precision:
- 249 sentences that were tagged with triggers and targets
- From the English side of the MTEval 2009 training sentences
- 86.3% correct
- But ranges from about 82% to about 92% depending on genre
Precision: Errors
Light verb or noun is correct syntactic target but not the correct semantic target.
- Earthquake affected areas in Pakistan will be provided the required number of tents and blankets by November 15.
- The decision should be taken on delayed cases on the basis of merit.
Wrong word sense
- In Bayas, Sikhs attacked a train under cover of night and killed everyone.
- The process of provision of relief goods to needy people should be managed by the Army and the Edhi Trust.
- Should be allowed to work like this in the future.
- Like: succeed in something
Precision: Errors
Wrong subcategorization pattern.
- The officials should consider themselves as servants of the people.
Coordinate Structures
- Many large helicopters are needed to dispatch urgent relief materials to the many affected in far flung areas of the Neelam Valley and only America can help us in this regard.
Recall: what did we miss?
Special forms of negation
- There was no place to seek shelter.
- The buildings should be reconstructed, not with the RCC, but with the wood and steel sheets.
Constructional and phrasal triggers
- President Pervaiz Musharraf has said that he will not rest unless the process of rehabilitation is completed.
Random lexical omissions
- It is not possible in the middle of winter to re-open the roads.
SIMT: Semantically Informed MT
(S (NP Americans/NNPS)
   (VP-require (MD-TrigRequire should) (VB-TargRequire know) that
       (S (NP we/PRP)
          (VP-NOTAble (MD-TrigAble can) (RB-TrigNegation not) (VB-TargNOTAble hand) over
              (NP Dr/NNP Khan/NNP)
              (PP to them)))))
1. T-surgeon
2. Percolation
Integration of the modality tagger with Syntax Based SMT
Joshua Syntax Based SMT system (Callison-Burch)
Tag modalities on the English side of the training data.
Without modality tags: BLEU 26.4
With modality tags: BLEU 26.7
Advantages of SIMT
Good for translation between a less commonly taught language (LCTL) and a common language
- Modality can be analyzed on the common language and projected via word alignments to the LCTL
Depth of semantic analysis
Robustness of statistical approach
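Projection via word alignments can be pictured as copying each tag from an English token to whatever target-side tokens it is aligned with. A minimal sketch, assuming alignments are given as (English index, LCTL index) pairs; the function name and the toy alignment are hypothetical and do not come from the SIMT system.

```python
def project_tags(source_tags, alignments):
    """Copy tags across word alignments.

    source_tags: {english_index: tag}
    alignments:  iterable of (english_index, lctl_index) pairs
    Returns {lctl_index: [tags]}.
    """
    projected = {}
    for src, tgt in alignments:
        if src in source_tags:
            projected.setdefault(tgt, []).append(source_tags[src])
    return projected


# English tokens 1 and 2 carry tags; token 2 aligns to two LCTL tokens.
source_tags = {1: "TrigRequire", 2: "TargRequire"}
alignments = [(0, 0), (1, 3), (2, 2), (2, 4)]
print(project_tags(source_tags, alignments))
```

Unaligned English tokens simply project nothing, so the LCTL side needs no modality analysis of its own.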
Summary
Modality annotation scheme
Modality lexicon
Automatic modality tagger
A method for integrating semantics into SMT
- Good for translation between LCTLs and common languages
Future work
Improvements to the tagger
- Add patterns for constructions without simple lexical triggers.
- Word sense disambiguation (manage, attack, etc.)
- Semantic composition of multiple modalities and negation.
- Tagging of holders
Applications of the tagger
- Further experiments with SIMT
- Integration into tagger for Committed Belief (factivity)
END