Using Contextual Speller Techniques and Language Modeling for ESL Error Correction Michael Gamon,...

Using Contextual Speller Techniques and Language Modeling for ESL Error Correction Michael Gamon, Jianfeng Gao, Chris Brockett, Alexandre Klementiev, William B. Dolan, Dmitriy Belenko, Lucy Vanderwende Reporter: Chia-Ying Lee Advisor: Hsin-Hsi Chen Microsoft Research & University of Illinois IJCNLP 2008


Using Contextual Speller Techniques and Language Modeling for ESL Error Correction

Michael Gamon, Jianfeng Gao, Chris Brockett, Alexandre Klementiev, William B. Dolan, Dmitriy Belenko, Lucy Vanderwende

Reporter: Chia-Ying Lee Advisor: Hsin-Hsi Chen

Microsoft Research & University of Illinois

IJCNLP 2008


Introduction

About 750M (74%) people use English as a second language (Crystal 1997)

Non-native writers encounter particular problems (e.g., prepositions)

Challenge: writing errors often have a semantic dimension (e.g., "at school" refers to a place, "in school" refers to time)


Target Error Types

1. Preposition presence and choice: In the other hand, ... (On the other hand ...)

2. Definite and indefinite determiner presence and choice: I am teacher... (am a teacher)

3. Gerund/infinitive confusion: I am interesting in this book. (interested in)

4. Auxiliary verb presence and choice: My teacher does is a good teacher (my teacher is...)

5. Over-regularized verb inflection: I writed a letter (wrote)

6. Adjective/noun confusion: This is a China book (Chinese book)

7. Word order (adjective sequences and nominal compounds): I am a student of university (university student)

8. Noun pluralization: They have many knowledges (much knowledge)


Problem Definition

Present a modular system for detection and correction of errors made by non-native writers.

Focus on preposition- and determiner-related errors.


Related Work

Turner and Charniak (2007) utilize a language model based on a statistical parser for determiner and preposition selection

De Felice and Pulman (2007) utilize a set of sophisticated syntactic and semantic analysis features to predict 5 common English prepositions

Han et al. (2004, 2006) use a maximum entropy classifier to propose article corrections

Izumi et al. (2003) and Chodorow et al. (2007) present techniques for automatic preposition choice modeling


System Description

0. Preprocessing: the input is tokenized and POS-tagged

1. Suggestion Provider (SP): error detection and correction

2. Language Model (LM): deletes the suggestions whose score is lower than that of the original

3. Example Provider (EP): queries the web for example sentences
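The modular flow of stages 1 to 3 could be wired together roughly as follows. This is a minimal sketch, not the authors' implementation: `Suggestion`, `run_pipeline`, and the three callables are hypothetical names, preprocessing (stage 0) is assumed to happen inside `suggest`, and the toy scorer reuses the two scores quoted on the Language Model slide plus one made-up value.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Suggestion:
    corrected: str  # full sentence with the proposed correction applied


def run_pipeline(
    sentence: str,
    suggest: Callable[[str], List[Suggestion]],  # stage 1: Suggestion Provider
    lm_score: Callable[[str], float],            # stage 2: Language Model
    find_examples: Callable[[str], List[str]],   # stage 3: Example Provider
) -> List[Tuple[Suggestion, List[str]]]:
    """Return the surviving suggestions, each paired with example sentences."""
    original_score = lm_score(sentence)
    kept = []
    for suggestion in suggest(sentence):
        # Drop suggestions that the language model scores lower than the original.
        if lm_score(suggestion.corrected) < original_score:
            continue
        kept.append((suggestion, find_examples(suggestion.corrected)))
    return kept


# Toy stand-ins (hypothetical); the first two scores are the ones from the slides.
toy_scores = {
    "I am teacher from Korea.": 0.19,
    "I am a teacher from Korea.": 0.60,
    "I am the teacher from Korea.": 0.10,  # made-up value for illustration
}
result = run_pipeline(
    "I am teacher from Korea.",
    suggest=lambda s: [Suggestion("I am a teacher from Korea."),
                       Suggestion("I am the teacher from Korea.")],
    lm_score=toy_scores.__getitem__,
    find_examples=lambda s: [],  # no web access in this sketch
)
print([s.corrected for s, _ in result])
```

Passing the stages in as callables mirrors the modular design: any one of them can be swapped out without touching the others.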


Suggestion Provider (1/3)

Classifiers:

Presence/absence (pa) classifier, ex: p(article + teacher) = 0.54

Choice (ch) classifier, ex: p(the) = 0.04, p(a/an) = 0.96

Potential insertion sites are determined heuristically from the sequence of POS tags
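One plausible way to combine the two classifier outputs at a single insertion site is sketched below. The function name `propose_article` and the 0.5 presence threshold are assumptions; the slides give the example probabilities but do not state how they are thresholded.

```python
from typing import Dict, Optional


def propose_article(
    p_presence: float,
    choice_probs: Dict[str, float],
    presence_threshold: float = 0.5,  # assumed value, not from the slides
) -> Optional[str]:
    """Return the article to insert at this site, or None for no insertion."""
    # Step 1 (pa classifier): is an article needed here at all?
    if p_presence < presence_threshold:
        return None
    # Step 2 (ch classifier): which article is most probable?
    return max(choice_probs, key=choice_probs.get)


# Probabilities from the slide example for the site before "teacher".
choice = propose_article(0.54, {"the": 0.04, "a/an": 0.96})
print(choice)
```

Splitting the decision in two keeps the frequent "no article" case out of the choice classifier, which then only has to separate the actual candidates.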


Suggestion Provider (2/3)

Features (±6 tokens): relative position, token string, POS tags

Example: 0/I/PRP 1/am/VBP 2/teacher/NN 3/from/IN 4/Korea/NNP 5/./.

Decision tree classifiers (WinMine toolkit, Chickering 2002) performed better than linear SVMs
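The window features could be extracted along these lines. This is a sketch under assumptions: the function name `window_features` and the `tok[+1]`/`pos[-1]` key format are invented here, and the slides do not specify how the features are encoded for the decision trees.

```python
from typing import Dict, List, Tuple


def window_features(
    tagged: List[Tuple[str, str]],  # (token, POS tag) pairs for one sentence
    site: int,                      # index of the token at the potential insertion site
    width: int = 6,                 # tokens of context on each side
) -> Dict[str, str]:
    """Collect token strings and POS tags keyed by position relative to the site."""
    features = {}
    start = max(0, site - width)
    end = min(len(tagged), site + width + 1)
    for i in range(start, end):
        offset = i - site  # relative position, e.g. -1 for the preceding token
        token, pos = tagged[i]
        features[f"tok[{offset:+d}]"] = token
        features[f"pos[{offset:+d}]"] = pos
    return features


# The tagged example sentence from the slide; site 2 is the position before "teacher".
tagged = [("I", "PRP"), ("am", "VBP"), ("teacher", "NN"),
          ("from", "IN"), ("Korea", "NNP"), (".", ".")]
features = window_features(tagged, site=2)
print(features["tok[-1]"], features["pos[+0]"])
```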


Suggestion Provider (3/3)

Data set:

English Encarta encyclopedia (560k sentences)

A random set of 1M sentences from a Reuters news data set

Prepositions from the NICT Japanese Learners of English corpus: about, as, at, by, for, from, in, like, of, on, since, to, with, than, "other"


Language Model

5-gram model trained on the English Gigaword corpus (LDC2005T12)

120K-word vocabulary

54 million bigrams, 338 million trigrams, 801 million 4-grams and 12 billion 5-grams

Uses interpolated Kneser-Ney smoothing (Kneser and Ney 1995) without count cutoff

Score:

I am teacher from Korea. score = 0.19

I am a teacher from Korea. score = 0.60
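To make the smoothing method concrete, here is a bigram-only sketch of interpolated Kneser-Ney. It is a simplification for illustration: the system described in the slides is a 5-gram model trained on Gigaword, whereas the function name `train_kn_bigram`, the single fixed discount of 0.75, and the two-sentence toy corpus are all assumptions made here.

```python
from collections import Counter, defaultdict
from typing import Callable, Iterable


def train_kn_bigram(sentences: Iterable[str], discount: float = 0.75) -> Callable[[str, str], float]:
    """Return prob(word, context) for an interpolated Kneser-Ney bigram model."""
    bigrams = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        bigrams.update(zip(tokens, tokens[1:]))

    context_count = Counter()     # c(v): how often v occurs as a context
    followers = defaultdict(set)  # distinct words seen after v
    preceders = defaultdict(set)  # distinct contexts seen before w
    for (v, w), count in bigrams.items():
        context_count[v] += count
        followers[v].add(w)
        preceders[w].add(v)
    bigram_types = len(bigrams)

    def prob(word: str, context: str) -> float:
        # Continuation probability: in how many distinct contexts does `word` appear?
        p_continuation = len(preceders.get(word, ())) / bigram_types
        if context_count[context] == 0:
            return p_continuation  # unseen context: fall back entirely
        discounted = max(bigrams[(context, word)] - discount, 0) / context_count[context]
        backoff_weight = discount * len(followers[context]) / context_count[context]
        return discounted + backoff_weight * p_continuation

    return prob


prob = train_kn_bigram(["i am a teacher", "she is a teacher"])
print(prob("teacher", "a") > prob("teacher", "am"))
```

The discount taken from each seen bigram is redistributed through the continuation probability, so the probabilities after any seen context still sum to one.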


Example Provider (1/2)

Web Search

String query in a small window

Ranking rules: in the same sentence, sentence length, context overlap


Example Provider (2/2)

Original: I want to travel Disneyland in March.

Suggestion: I want to travel to Disneyland in March.

Top 3 examples:

1. Timothy's wish was to travel to Disneyland in California.

2. Should you travel to Disneyland in California or to Disney World in Florida?

3. The tourists who travel to Disneyland in California can either choose to stay in Disney resorts or in the hotel for Disneyland vacations.
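The three ranking rules (same sentence, sentence length, context overlap) could be applied to retrieved sentences roughly as follows. This is a sketch under stated assumptions: the function name `rank_examples`, the preference for shorter sentences, the priority of overlap over length, and whitespace tokenization are choices made here, not details given in the slides.

```python
from typing import List


def rank_examples(query: str, user_sentence: str, candidates: List[str]) -> List[str]:
    """Keep candidates containing the query string and order them best first."""
    user_tokens = set(user_sentence.lower().split())

    def sort_key(sentence: str):
        tokens = sentence.lower().split()
        overlap = len(user_tokens & set(tokens))  # context shared with the user's input
        # Assumed ordering: more overlap first, then shorter sentences.
        return (-overlap, len(tokens))

    # "In the same sentence": the whole query must occur within one candidate sentence.
    matching = [c for c in candidates if query.lower() in c.lower()]
    return sorted(matching, key=sort_key)


candidates = [
    "The tourists who travel to Disneyland in California can either choose to stay "
    "in Disney resorts or in the hotel for Disneyland vacations.",
    "Disneyland opened in 1955.",  # made-up sentence that lacks the query string
    "Should you travel to Disneyland in California or to Disney World in Florida?",
    "Timothy's wish was to travel to Disneyland in California.",
]
ranked = rank_examples("travel to Disneyland",
                       "I want to travel to Disneyland in March.", candidates)
for sentence in ranked:
    print(sentence)
```

With these assumptions the three slide examples share the same overlap with the user's sentence, so sentence length decides their order.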


Evaluation (1/5)

Suggestion provider: determiner choice, preposition choice

Language model

Human evaluation

70% for training; 30% for testing

Combined accuracy:


Evaluation (2/5)

Suggestion provider: determiner choice

Baseline: 69.9% (choosing the most frequent class label, none)

State of the art: Turner and Charniak (Penn Treebank): 86.74%
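A most-frequent-class baseline of the kind quoted above is computed as shown in this short sketch. The function name `majority_baseline` and the ten toy labels are invented for illustration; they are not the paper's data.

```python
from collections import Counter
from typing import List, Tuple


def majority_baseline(labels: List[str]) -> Tuple[str, float]:
    """Return the most frequent label and the accuracy of always predicting it."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


# Toy gold labels for ten potential determiner sites (not real data).
gold = ["none"] * 7 + ["the"] * 2 + ["a/an"]
label, accuracy = majority_baseline(gold)
print(label, accuracy)
```

A classifier is only useful to the extent that it beats this number, which is why the baseline is reported alongside the results.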


Evaluation (3/5)

Suggestion provider: preposition choice

Baseline: 28.94% (using no preposition)


Evaluation (4/5)

Language Model

Reduced the number of preposition corrections by 66.8% and the number of determiner corrections by 50.7%

Increased precision dramatically

Accuracy of preposition suggestions:

LM score + classifier probability: 62.32%

LM score alone: 58.36%


Evaluation (5/5)

Human evaluation


CLEC: Chinese Learners of English Corpus (Gui and Yang 2003)


Conclusion and Future Work

The system successfully combines contextual-speller-based methods with language model scoring and web-based examples.

The system can work even on extremely noisy text with reasonable accuracy.

Future work: using web counts to build a learned ranker that combines information from the language model and the classifiers


Thank you!
