Experiences of a Temporal Annotation Project for French

Chinese Temporal/Discourse Annotation Workshop - Los Angeles - 1st June 2010

André Bittar

Université Paris Diderot - Alpage


Contents

1 Project Outline
2 Methodology
3 Text Sampling
4 Development of Annotation Guidelines
  • Adaptations for French
5 Automatic Annotation System
6 The Annotation Strategy
  • Automatic Pre-annotation
  • Manual Correction
7 Preliminary Evaluation
  • Evaluating the Process
  • Evaluating the Corpus
8 Conclusion


Project Outline

1 Create a TimeML annotated corpus for French
  • Freely distributable
  • Good quality
  • Large
  • A resource for machine learning
  • A basis for linguistic study

2 By-products
  • Annotation guide
  • Automatic annotation system

3 Methodological inquiry
  • How complicated is adapting TimeML to French?
  • How should we organize the annotation task?
  • What are the effects of pre-annotation?
  • How can we ensure the quality of the data?


Methodology

1 Text sampling
  • Redistributable texts
  • Extra-linguistic criteria
  • Linguistic criteria

2 Development of annotation guidelines

3 Development of automatic annotation system

4 Annotation strategy
  • Automatic pre-annotation + manual correction
  • Organization of TimeML task into stages: markables (spans, attributes), relations
  • Pairwise annotator adjudication after each stage
  • Consistency check for temporal graphs


Text Sampling

Source

• Est Républicain newspaper corpus

• Full newspaper issues from 1999, 2002, 2003

Sampling Policy

• Objective: achieve variety according to a number of criteria

• Extra-linguistic (external) criteria: date, sex of author, genre, length

• Linguistic (internal) criterion: presence of events and temporal expressions


Development of Annotation Guidelines

Language-independent adaptations

• Event containers
  • happen, take place, occur

• Aspectual variants of support verb constructions
  • carry out an attack vs. launch an attack

• Normalized values for modality

Adaptations for French

• Tense and aspect system

• Aspectual expressions

• Modals

• Mood


Tense and aspect system

• Generally not problematic, as the system is relatively similar to English (cf. Chinese)

• Determined correspondence between TimeML tense and aspect values and French grammatical tense and aspect

Verb group    tense      aspect
mange         PRESENT    NONE
a mangé       PRESENT    PERFECTIVE
mangea        PAST       NONE
mangeait      IMPERFECT  NONE
avait mangé   PAST       PERFECTIVE
mangera       FUTURE     NONE
...           ...        ...

Tab.: Some TimeML tense and aspect values for French.


Aspectual expressions

• But French also often expresses aspect through periphrasis:

Periphrasis              aspect
en train de + VInf       PROGRESSIVE
en passe de + VInf       PROSPECTIVE
en voie de + VInf        PROSPECTIVE
sur le point de + VInf   PROSPECTIVE
venir de + VInf          PERFECTIVE
aller + VInf             PROSPECTIVE
en voie de + N           PROGRESSIVE
en cours de + N          PROGRESSIVE
...                      ...

Tab.: TimeML tense and aspect values for some periphrastic expressions.
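
The periphrasis table can be read as a lookup keyed on both the marker and the category of the word that follows it, since en voie de takes a different aspect value before an infinitive than before a noun. The sketch below is only an illustration of that reading: the function name, the dictionary name and the category labels "VInf" and "N" as strings are assumptions, not part of the system described in these slides.

```python
# Aspect values keyed by (periphrastic marker, category of the following word).
# Entries mirror the table above; names and category labels are illustrative.
PERIPHRASIS_ASPECT = {
    ("en train de", "VInf"): "PROGRESSIVE",
    ("en passe de", "VInf"): "PROSPECTIVE",
    ("en voie de", "VInf"): "PROSPECTIVE",
    ("sur le point de", "VInf"): "PROSPECTIVE",
    ("venir de", "VInf"): "PERFECTIVE",
    ("aller", "VInf"): "PROSPECTIVE",
    ("en voie de", "N"): "PROGRESSIVE",
    ("en cours de", "N"): "PROGRESSIVE",
}


def periphrasis_aspect(marker, next_category):
    """Return the aspect value for a periphrasis, or None if it is not listed."""
    return PERIPHRASIS_ASPECT.get((marker.lower(), next_category))


# The same marker yields different values depending on what follows it.
print(periphrasis_aspect("en voie de", "VInf"))
print(periphrasis_aspect("en voie de", "N"))
```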


Modals

• Modal verbs and adjectives need to be annotated in French
  • May appear in most tenses (unlike English)
  • May carry important tense, modality and polarity information
  • <EVENT class="MODAL">

Modal expression                       modality
(in)certain, sûr                       CERTAINTY
falloir, devoir, nécessaire            NECESSITY
devoir, obliger, obligatoire           OBLIGATION
pouvoir, permettre, permis, interdit   PERMISSION
se pouvoir, (im)possible, éventuel     POSSIBILITY
risquer de, (im)probable               PROBABILITY
journalistic conditional               CONJECTURAL
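
One point the table makes implicitly is that a modal expression does not always determine a single value: devoir is listed under both NECESSITY and OBLIGATION. A minimal sketch of a lexicon lookup that returns every candidate value is given below. The data structure and function name are assumptions made for illustration, the parenthesised prefixes are expanded by hand, and the journalistic conditional is left out because it is a verb form rather than a lexical entry.

```python
# Modality values and the lexical expressions listed for each in the table above.
# The layout of this lexicon is hypothetical, not the project's actual resource.
MODALITY_TABLE = {
    "CERTAINTY": ["certain", "incertain", "sûr"],
    "NECESSITY": ["falloir", "devoir", "nécessaire"],
    "OBLIGATION": ["devoir", "obliger", "obligatoire"],
    "PERMISSION": ["pouvoir", "permettre", "permis", "interdit"],
    "POSSIBILITY": ["se pouvoir", "possible", "impossible", "éventuel"],
    "PROBABILITY": ["risquer de", "probable", "improbable"],
}


def candidate_modalities(expression):
    """Return all modality values listed for an expression, in alphabetical order."""
    return sorted(
        modality
        for modality, expressions in MODALITY_TABLE.items()
        if expression in expressions
    )


# An ambiguous entry returns several candidates; context must decide between them.
print(candidate_modalities("devoir"))
print(candidate_modalities("pouvoir"))
```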


Automatic Annotation System

• Rule-based

• Preprocessing: POS tagging, morphological analysis, chunking

• Cascade of finite-state transducers (FSTs) acting on local chunk context

• Annotates temporal expressions (TIMEX3)

• Annotates events (EVENT)

• Annotates relation markers (SIGNAL)


System architecture

[Figure: pipeline from pre-processed text to pre-annotated text. Box labels, in the order extracted: TIMEX3 + Pre-processing; TIMEX3 annotation; Verb Classify Local; Adjective Filter; Noun Lookup; Verb Classify Contextual; Context Filter; Tag Adjectives; Noun Filter; Noun Classify; Context Filter. Resources: Verb Lexicon, Noun Lexicon, Modality Lexicon.]

Tab.: Schema of pre-annotator system architecture.


Evaluation

TIMEX3         EVENT          SIGNAL
Span   Attr    Span   Attr    Span
0.87   0.82    0.85   0.82    0.70

Tab.: F-scores for automatic annotation.


The Annotation Strategy

[Figure: Pre-annotated text → TIMEX3, EVENT and SIGNAL correction → Adjudication → Annotated Markables → LINK annotation → Adjudication → Annotated Markables + LINKs → Coherence check → Gold Standard]

Tab.: Schema of adopted annotation strategy


Automatic Pre-annotation

Side-effects

• Cuts annotation time nearly in half
  • For 8 texts: manual = 84 mins, pre-annotated = 48 mins (about 43% less)
  • 40 TIMEX3, 200 EVENT, 20 SIGNAL

• Clearly reduces human error for TIMEX3:
  • omission of attributes (e.g. 20% of type)
  • erroneous value (esp. deictic expressions)
  • bad format (e.g. 01-6-1999TU)
  • erroneous span, e.g. omission of determiners (le 23 novembre)

• Useful for EVENT span (easy task)

• class is the most problematic attribute, as it is highly context-dependent
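
Two of the TIMEX3 errors listed above, omitted attributes and badly formatted values, lend themselves to a mechanical check. The following is a rough sketch of such a check, not the project's tooling. It assumes a TIMEX3 is available as a dictionary of attributes, it treats tid, type and value as required, and its value pattern only covers calendar dates with an optional part-of-day code. The set of codes is itself an assumption, and real TIMEX3 values also include durations, times and other forms that this pattern would reject.

```python
import re

# Simplified value pattern: YYYY, YYYY-MM or YYYY-MM-DD, optionally followed by
# "T" and a part-of-day code. The list of codes is an assumption for this sketch.
VALUE_PATTERN = re.compile(r"\d{4}(-\d{2}(-\d{2}(T(MO|MI|AF|EV|NI|DT))?)?)?")

# Attributes this sketch treats as mandatory on every TIMEX3.
REQUIRED_ATTRIBUTES = ("tid", "type", "value")


def check_timex3(attributes):
    """Return a list of problems found in one TIMEX3 attribute dictionary."""
    problems = [
        f"missing attribute: {name}"
        for name in REQUIRED_ATTRIBUTES
        if name not in attributes
    ]
    value = attributes.get("value")
    if value is not None and not VALUE_PATTERN.fullmatch(value):
        problems.append(f"badly formatted value: {value}")
    return problems


# A well-formed date passes; the badly formatted example from the slide does not.
print(check_timex3({"tid": "t1", "type": "DATE", "value": "1999-06-01"}))
print(check_timex3({"tid": "t2", "value": "01-6-1999TU"}))
```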


Automatic Pre-annotation

Agreement of manual with adjudicated gold

TIMEX3         EVENT          SIGNAL
Span   Attr    Span   Attr    Span
0.75   0.73    0.94   0.89    0.76


Manual Correction

• 3 human annotators: 2 inexperienced, paid undergraduate interns from the computational linguistics program + myself

• Tools: Callisto and Tango
  • freely available and easy to learn
  • not compatible with latest TimeML

• Quickref Annotation Guide: concise, with illustrative examples

• Corpus journal: log of problem cases + adopted solutions

• Adjudication: pairwise comparison to settle discrepancies

• Consistency: temporal graphs checked for consistency through saturation, corrected to ensure coherence and usability of data
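
Checking a temporal graph through saturation means deriving every relation implied by the annotated ones and looking for a contradiction. The sketch below shows the idea on a deliberately reduced case, assuming that only "before" relations are present (an "after" link can be stored with its arguments swapped). A full check over TimeML relations would need a composition table for all relation types, which is not attempted here. The function names are illustrative.

```python
def saturate_before(before_pairs):
    """Return the transitive closure of a set of (earlier, later) pairs."""
    closure = set(before_pairs)
    changed = True
    while changed:
        changed = False
        # Compose every pair (a, b) with every pair (b, d) to derive (a, d).
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def is_coherent(before_pairs):
    """A graph is incoherent if saturation derives that something precedes itself."""
    return all(a != b for a, b in saturate_before(before_pairs))


# e1 before e2 before t1 is coherent; adding t1 before e1 creates a cycle.
links = {("e1", "e2"), ("e2", "t1")}
print(is_coherent(links))
print(is_coherent(links | {("t1", "e1")}))
```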


Evaluating the Process

On average in 1 hour:

Markable   #         Relation   #
EVENT      21        TLINK      21
TIMEX3     7    or   SLINK      4
SIGNAL     3         ALINK      0.33

To achieve the same size as TimeBank 1.2 for English: 870 person-hours


Quantitative Evaluation

In approx. 200 hours of human effort


Qualitative Evaluation

Inter-annotator agreement, measured as average precision and recall:

               Texts   TIMEX3         EVENT          SIGNAL
                       Span   Attr    Span   Attr    Span
Pair 1         39      0.9    0.88    0.91   0.91    0.74
Pair 2         49      0.86   0.82    0.87   0.83    0.75
Pair 3         41      0.91   0.87    0.80   0.82    0.77
Average                0.89   0.86    0.86   0.85    0.75
TimeBank 1.2   10      0.83   0.95    0.78   0.95    0.77

Tab.: F-scores for inter-annotator agreement.
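
For reference, precision, recall and F-score between two annotators can be computed by treating one annotator's spans as the reference and the other's as the response. The sketch below assumes exact span matching on (start, end) offsets. The slides do not state how spans were matched, so that choice, like the function name and the sample offsets, is an assumption.

```python
def precision_recall_f(annotated, reference):
    """Compare two collections of spans and return (precision, recall, F-score)."""
    annotated, reference = set(annotated), set(reference)
    matched = len(annotated & reference)
    precision = matched / len(annotated) if annotated else 0.0
    recall = matched / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Invented character offsets for two annotators marking the same text.
annotator_a = [(0, 2), (5, 6), (9, 12), (15, 16)]
annotator_b = [(0, 2), (5, 6), (9, 11)]
precision, recall, f_score = precision_recall_f(annotator_a, annotator_b)
print(f"P={precision:.2f} R={recall:.2f} F={f_score:.2f}")
```

Swapping the two annotators exchanges precision and recall but leaves the F-score unchanged, which is why it is a convenient symmetric agreement figure.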

Zero incoherent graphs (versus 18/183 in TimeBank 1.2).


Conclusion

• TimeML easily adapted to French

• Modularity in the annotation task avoids confusion

• Pre-annotation has clear benefits
  • reduces time and errors
  • definitely effective for TIMEX3
  • useful for EVENT (but careful with attributes)

• An annotation tool (and data quality) could benefit from
  • modularity (cf. BAT)
  • checking for presence and format of attributes (DTD/schema)
  • temporal graph coherence checking


Thank you
