Experiences of a Temporal Annotation Project for French
Chinese Temporal/Discourse Annotation Workshop - Los Angeles - 1st June 2010
Andre Bittar
Universite Paris Diderot - Alpage
1 June 2010
Contents
1 Project Outline
2 Methodology
3 Text Sampling
4 Development of Annotation Guidelines
    Adaptations for French
5 Automatic Annotation System
6 The Annotation Strategy
    Automatic Pre-annotation
    Manual Correction
7 Preliminary Evaluation
    Evaluating the Process
    Evaluating the Corpus
8 Conclusion
Andre Bittar (Universite Paris Diderot - Alpage)Experiences of a Temporal Annotation Project for French1er juin 2010 2 / 29
Project Outline
1 Create a TimeML annotated corpus for French
    • Freely distributable
    • Good quality
    • Large
    • A resource for machine learning
    • A basis for linguistic study
2 By-products
    • Annotation guide
    • Automatic annotation system
3 Methodological inquiry
    • How complicated is adapting TimeML to French?
    • How should we organize the annotation task?
    • What are the effects of pre-annotation?
    • How can we ensure the quality of the data?
Methodology
1 Text sampling
    • Redistributable texts
    • Extra-linguistic criteria
    • Linguistic criteria
2 Development of annotation guidelines
3 Development of automatic annotation system
4 Annotation strategy
    • Automatic pre-annotation + manual correction
    • Organization of TimeML task into stages: markables (spans, attributes), relations
    • Pairwise annotator adjudication after each stage
    • Consistency check for temporal graphs
Text Sampling
Source
• Est Republicain newspaper corpus
• Full newspaper issues from 1999, 2002, 2003
Sampling Policy
• Objective: achieve variety according to a number of criteria
• Extra-linguistic (external) criteria: date, sex of author, genre, length
• Linguistic (internal) criterion: presence of events and temporal expressions
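The slides do not say how variety was enforced in practice. As a rough sketch only, one way to combine the two kinds of criteria is to filter on the linguistic criterion first and then draw a fixed number of texts from each combination of external criteria. The record fields (`year`, `genre`, `has_temporal`), the genre values and the function name below are hypothetical, not taken from the project.

```python
import random
from collections import defaultdict

def sample_by_strata(articles, keys, per_stratum, seed=0):
    """Pick up to `per_stratum` articles from each combination of `keys`."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for art in articles:
        strata[tuple(art[k] for k in keys)].append(art)
    chosen = []
    for stratum in sorted(strata):
        pool = strata[stratum]
        rng.shuffle(pool)
        chosen.extend(pool[:per_stratum])
    return chosen

# Hypothetical metadata records (illustration only)
articles = [
    {"id": 1, "year": 1999, "genre": "sport", "has_temporal": True},
    {"id": 2, "year": 1999, "genre": "sport", "has_temporal": True},
    {"id": 3, "year": 2002, "genre": "local", "has_temporal": True},
    {"id": 4, "year": 2003, "genre": "sport", "has_temporal": False},
    {"id": 5, "year": 2003, "genre": "local", "has_temporal": True},
]

# Linguistic criterion: keep only texts with events / temporal expressions
candidates = [a for a in articles if a["has_temporal"]]
# External criteria: one text per (year, genre) combination
sample = sample_by_strata(candidates, keys=("year", "genre"), per_stratum=1)
```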
Development of Annotation Guidelines
Language-independent adaptations
• Event containers
    • happen, take place, occur
• Aspectual variants of support verb constructions
    • carry out an attack vs. launch an attack
• Normalized values for modality
Adaptations for French
• Tense and aspect system
• Aspectual expressions
• Modals
• Mood
Tense and aspect system
• Generally not problematic as relatively similar to English (cf. Chinese)
• Determined correspondence between TimeML tense and aspect values and French grammatical tense and aspect

Verb group      tense       aspect
mange           PRESENT     NONE
a mangé         PRESENT     PERFECTIVE
mangea          PAST        NONE
mangeait        IMPERFECT   NONE
avait mangé     PAST        PERFECTIVE
mangera         FUTURE      NONE
...             ...         ...
Tab.: Some TimeML tense and aspect values for French.
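The correspondence in the table can be held as a simple lookup. The sketch below covers only the six rows shown; the French tense names used as keys are conventional labels added here for readability, and the function name is hypothetical.

```python
# (tense, aspect) values copied from the table; keys are added labels
TENSE_ASPECT = {
    "présent": ("PRESENT", "NONE"),                # mange
    "passé composé": ("PRESENT", "PERFECTIVE"),    # a mangé
    "passé simple": ("PAST", "NONE"),              # mangea
    "imparfait": ("IMPERFECT", "NONE"),            # mangeait
    "plus-que-parfait": ("PAST", "PERFECTIVE"),    # avait mangé
    "futur simple": ("FUTURE", "NONE"),            # mangera
}

def timeml_tense_aspect(french_tense):
    """Return (tense, aspect), or None if the tense is not in the table."""
    return TENSE_ASPECT.get(french_tense)
```

Returning `None` for tenses outside the table keeps the elided rows ("...") visibly unhandled instead of guessing at them.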
Aspectual expressions
• But French also often expresses aspect through periphrasis:

Periphrasis                 aspect
en train de + VInf          PROGRESSIVE
en passe de + VInf          PROSPECTIVE
en voie de + VInf           PROSPECTIVE
sur le point de + VInf      PROSPECTIVE
venir de + VInf             PERFECTIVE
aller + VInf                PROSPECTIVE
en voie de + N              PROGRESSIVE
en cours de + N             PROGRESSIVE
...                         ...
Tab.: TimeML tense and aspect values for some periphrastic expressions.
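A minimal sketch of how the periphrases in the table could be matched over lemmatized, POS-tagged tokens. It assumes tokens are `(form, lemma, pos)` triples and that the tagger uses the tags `VINF` and `N`; both the token format and the tag names are assumptions, and this is not the project's FST cascade.

```python
# (lemma sequence, POS required on the next token, aspect), from the table
PERIPHRASES = [
    (("en", "train", "de"), "VINF", "PROGRESSIVE"),
    (("en", "passe", "de"), "VINF", "PROSPECTIVE"),
    (("en", "voie", "de"), "VINF", "PROSPECTIVE"),
    (("sur", "le", "point", "de"), "VINF", "PROSPECTIVE"),
    (("venir", "de"), "VINF", "PERFECTIVE"),
    (("aller",), "VINF", "PROSPECTIVE"),
    (("en", "voie", "de"), "N", "PROGRESSIVE"),
    (("en", "cours", "de"), "N", "PROGRESSIVE"),
]

def find_periphrases(tokens):
    """tokens: list of (form, lemma, pos).

    Returns (start, head, aspect) triples: `start` is the index of the first
    token of the periphrasis, `head` the index of the infinitive or noun.
    """
    lemmas = [lemma for _, lemma, _ in tokens]
    hits = []
    for i in range(len(tokens)):
        for words, next_pos, aspect in PERIPHRASES:
            j = i + len(words)
            if (j < len(tokens)
                    and tuple(lemmas[i:j]) == words
                    and tokens[j][2] == next_pos):
                hits.append((i, j, aspect))
    return hits

# "Il vient de partir"
sent1 = [("Il", "il", "PRO"), ("vient", "venir", "V"),
         ("de", "de", "P"), ("partir", "partir", "VINF")]
# "Le pont est en cours de construction"
sent2 = [("Le", "le", "DET"), ("pont", "pont", "N"), ("est", "être", "V"),
         ("en", "en", "P"), ("cours", "cours", "N"), ("de", "de", "P"),
         ("construction", "construction", "N")]
```

Matching on lemmas rather than surface forms lets the same patterns cover inflected forms of venir and aller in any tense.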
Modals
• Modal verbs and adjectives need to be annotated in French
    • May appear in most tenses (unlike English)
    • May carry important tense, modality and polarity information
    • <EVENT class="MODAL">

Modal expression                        modality
(in)certain, sûr                        CERTAINTY
falloir, devoir, nécessaire             NECESSITY
devoir, obliger, obligatoire            OBLIGATION
pouvoir, permettre, permis, interdit    PERMISSION
se pouvoir, (im)possible, éventuel      POSSIBILITY
risquer de, (im)probable                PROBABILITY
journalistic conditional                CONJECTURAL
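The table reads naturally as a lexicon from modal expressions to normalized modality values. The sketch below builds such a lookup from the rows above; the data structure and function name are assumptions. Note that devoir appears in two rows, so a lookup can only return candidates and the choice must come from context. The journalistic conditional is a verb form, not a lexical item, so it is left out.

```python
from collections import defaultdict

# Rows copied from the table; "(in)certain" etc. are expanded to both forms
ROWS = [
    (("certain", "incertain", "sûr"), "CERTAINTY"),
    (("falloir", "devoir", "nécessaire"), "NECESSITY"),
    (("devoir", "obliger", "obligatoire"), "OBLIGATION"),
    (("pouvoir", "permettre", "permis", "interdit"), "PERMISSION"),
    (("se pouvoir", "possible", "impossible", "éventuel"), "POSSIBILITY"),
    (("risquer de", "probable", "improbable"), "PROBABILITY"),
]

MODALITY_LEXICON = defaultdict(set)
for expressions, modality in ROWS:
    for expression in expressions:
        MODALITY_LEXICON[expression].add(modality)

def candidate_modalities(expression):
    """Return the sorted modality values listed for an expression."""
    return sorted(MODALITY_LEXICON.get(expression, ()))
```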
Automatic Annotation System
• Rule-based
• Preprocessing: POS tagging, morphological analysis, chunking
• Cascade of FSTs acting on local chunk context
• Annotates temporal expressions (TIMEX3)
• Annotates events (EVENT)
• Annotates relation markers (SIGNAL)
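To give a feel for rule-based TIMEX3 annotation, here is a deliberately small sketch that tags fully specified French calendar dates and normalizes their value. It uses a single regular expression in place of the FST cascade described above, handles no deictic or underspecified expressions, and the function name is hypothetical. The determiner is kept inside the span, as the guidelines require.

```python
import re

MONTHS = {"janvier": 1, "février": 2, "mars": 3, "avril": 4, "mai": 5,
          "juin": 6, "juillet": 7, "août": 8, "septembre": 9,
          "octobre": 10, "novembre": 11, "décembre": 12}

# Optional determiner "le", day (with optional "er"), month name, 4-digit year
DATE_RE = re.compile(
    r"\b(?:le\s+)?(\d{1,2})(?:er)?\s+(" + "|".join(MONTHS) + r")\s+(\d{4})\b",
    re.IGNORECASE,
)

def tag_dates(text):
    """Wrap each full date in a TIMEX3 tag with a normalized value."""
    counter = 0

    def wrap(match):
        nonlocal counter
        counter += 1
        day = int(match.group(1))
        month = MONTHS[match.group(2).lower()]
        year = int(match.group(3))
        value = f"{year:04d}-{month:02d}-{day:02d}"
        return (f'<TIMEX3 tid="t{counter}" type="DATE" value="{value}">'
                f"{match.group(0)}</TIMEX3>")

    return DATE_RE.sub(wrap, text)
```

For example, `tag_dates("Il est arrivé le 23 novembre 1999.")` wraps "le 23 novembre 1999" with `value="1999-11-23"`.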
System architecture
[Figure: pre-annotator pipeline diagram. Labels recoverable from the slide:
input: Pre-processed text;
TIMEX3 stage: TIMEX3 pre-processing, TIMEX3 annotation;
modules: Verb Classify Local, Verb Classify Contextual, Context Filter (appears twice), Adjective Filter, Tag Adjectives, Noun Lookup, Noun Filter, Noun Classify;
lexicons: Verb Lexicon, Noun Lexicon, Modality Lexicon;
output: Pre-annotated text.]
Tab.: Schema of pre-annotator system architecture.
Evaluation
TIMEX3          EVENT           SIGNAL
Span    Attr    Span    Attr    Span
0.87    0.82    0.85    0.82    0.70
Tab.: F-scores for automatic annotation.
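For reference, span-level scores of this kind are usually computed as below. The slides do not say whether spans were compared by exact match or by overlap; this sketch assumes exact match on `(start, end)` offsets, and the function name is hypothetical.

```python
def span_prf(system, gold):
    """Exact-match precision, recall and F1 over (start, end) spans."""
    system, gold = set(system), set(gold)
    true_positives = len(system & gold)
    precision = true_positives / len(system) if system else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Invented offsets, for illustration only
system_spans = [(0, 2), (5, 8), (10, 12)]
gold_spans = [(0, 2), (5, 8), (14, 15), (20, 22)]
p, r, f = span_prf(system_spans, gold_spans)
```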
The Annotation Strategy
[Figure: annotation workflow diagram. Flow recoverable from the slide labels:
Pre-annotated text
-> EVENT correction, TIMEX3 correction, SIGNAL correction
-> Adjudication
-> Annotated Markables
-> LINK annotation
-> Adjudication
-> Annotated Markables + LINKs
-> Coherence check
-> Gold Standard]
Tab.: Schema of adopted annotation strategy
Automatic Pre-annotation
Side-effects
• Cuts annotation time nearly in half
    • For 8 texts: manual = 84 mins, pre-annotated = 48 mins (about 43% less)
    • 40 TIMEX3, 200 EVENT, 20 SIGNAL
• Clearly reduces human error for TIMEX3:
    • omission of attributes (e.g. 20% of type)
    • erroneous value (esp. deictic expressions)
    • bad format (e.g. 01-6-1999TU)
    • erroneous span, e.g. omission of determiners (le 23 novembre)
• Useful for EVENT span (easy task)
• class is the most problematic attribute, as it is highly context-dependent
Automatic Pre-annotation
Agreement of fully manual annotation with the adjudicated gold standard

TIMEX3          EVENT           SIGNAL
Span    Attr    Span    Attr    Span
0.75    0.73    0.94    0.89    0.76
Manual Correction
• 3 human annotators: 2 inexperienced, paid undergraduate interns from the computational linguistics program + myself
• Tools: Callisto and Tango
    • freely available and easy to learn
    • not compatible with latest TimeML
• Quickref Annotation Guide: concise, with illustrative examples
• Corpus journal: log of problem cases + adopted solutions
• Adjudication: pairwise comparison to settle discrepancies
• Consistency: temporal graphs checked for consistency through saturation, corrected to ensure coherence and usability of data
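The idea behind checking consistency through saturation can be sketched on a reduced relation set. The code below handles only BEFORE and AFTER links, computes their transitive closure, and reports a graph as incoherent if some entity ends up before itself. A real check would cover the full set of TimeML relation types with a composition table; this is a simplified illustration with hypothetical function names, not the tool used in the project.

```python
def saturate_before(links):
    """links: iterable of (source, relType, target), relType BEFORE or AFTER.

    Returns the transitive closure as a set of (earlier, later) pairs.
    """
    before = set()
    for source, rel_type, target in links:
        if rel_type == "BEFORE":
            before.add((source, target))
        elif rel_type == "AFTER":
            before.add((target, source))
        else:
            raise ValueError(f"relation not handled in this sketch: {rel_type}")
    changed = True
    while changed:  # keep composing pairs until nothing new is inferred
        changed = False
        for a, b in list(before):
            for c, d in list(before):
                if b == c and (a, d) not in before:
                    before.add((a, d))
                    changed = True
    return before

def is_coherent(links):
    """A graph is incoherent if saturation puts an entity before itself."""
    return all(a != b for a, b in saturate_before(links))

good = [("e1", "BEFORE", "e2"), ("e2", "BEFORE", "t1")]
bad = good + [("e1", "AFTER", "t1")]
```

On `good`, saturation also infers that e1 is before t1; adding the contradictory link in `bad` closes a cycle, which the check reports.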
Evaluating the Process
On average in 1 hour:

Markable    #           Relation    #
EVENT       21          TLINK       21
TIMEX3      7     or    SLINK       4
SIGNAL      3           ALINK       0.33

To achieve the same size as TimeBank 1.2 for English: 870 person-hours
Quantitative Evaluation
In approx. 200 hours of human effort
Qualitative Evaluation
Inter-annotator agreement: as average precision and recall.

                Texts   TIMEX3          EVENT           SIGNAL
                        Span    Attr    Span    Attr    Span
Pair 1          39      0.90    0.88    0.91    0.91    0.74
Pair 2          49      0.86    0.82    0.87    0.83    0.75
Pair 3          41      0.91    0.87    0.80    0.82    0.77
Average                 0.89    0.86    0.86    0.85    0.75
TimeBank 1.2    10      0.83    0.95    0.78    0.95    0.77
Tab.: F-scores for inter-annotator agreement.
Zero incoherent graphs (versus 18/183 in TimeBank 1.2).
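As a quick check on the table, an unweighted mean of the three pair rows, rounded to two decimals, reproduces the Average row. The variable names below are chosen for this illustration.

```python
from statistics import mean

# Columns: TIMEX3 span, TIMEX3 attr, EVENT span, EVENT attr, SIGNAL span
pairs = {
    "Pair 1": [0.90, 0.88, 0.91, 0.91, 0.74],
    "Pair 2": [0.86, 0.82, 0.87, 0.83, 0.75],
    "Pair 3": [0.91, 0.87, 0.80, 0.82, 0.77],
}

# Average each column over the three annotator pairs
averages = [round(mean(column), 2) for column in zip(*pairs.values())]
```

The mean is not weighted by the number of texts per pair (39, 49, 41).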
Conclusion
• TimeML easily adapted to French
• Modularity in the annotation task avoids confusion
• Pre-annotation has clear benefits
    • reduces time and errors
    • definitely effective for TIMEX3
    • useful for EVENT (but careful with attributes)
• An annotation tool (and data quality) could benefit from
    • modularity (cf. BAT)
    • checking for presence and format of attributes (DTD/schema)
    • temporal graph coherence checking
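A check for the presence and format of attributes could look roughly like the sketch below, which inspects TIMEX3 elements in an XML fragment. The list of required attributes and the date pattern are assumptions made for illustration. The pattern is deliberately narrow (plain YYYY, YYYY-MM or YYYY-MM-DD) and would wrongly reject other legitimate TimeML values such as underspecified dates, so a real tool should validate against the official DTD or schema.

```python
import re
import xml.etree.ElementTree as ET

REQUIRED = ("tid", "type", "value")                # assumed minimum set
DATE_VALUE = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")  # narrow, illustrative

def check_timex3(xml_fragment):
    """Return a list of problems found on TIMEX3 elements in the fragment."""
    problems = []
    root = ET.fromstring(f"<doc>{xml_fragment}</doc>")
    for element in root.iter("TIMEX3"):
        label = element.get("tid", "?")
        for attribute in REQUIRED:
            if element.get(attribute) is None:
                problems.append(f"{label}: missing {attribute}")
        value = element.get("value")
        if (element.get("type") == "DATE" and value is not None
                and not DATE_VALUE.fullmatch(value)):
            problems.append(f"{label}: bad DATE value {value!r}")
    return problems

fragment = (
    '<TIMEX3 tid="t1" type="DATE" value="1999-11-23">le 23 novembre</TIMEX3> '
    '<TIMEX3 tid="t2" value="1999-06-01">...</TIMEX3> '
    '<TIMEX3 tid="t3" type="DATE" value="01-6-1999TU">...</TIMEX3>'
)
report = check_timex3(fragment)
```

Here `t2` is flagged for its missing `type` and `t3` for the badly formatted value cited on the pre-annotation slide.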
Thank you