Using Contextual Speller Techniques and Language Modeling for ESL Error Correction Michael Gamon,...

Post on 30-Dec-2015


Using Contextual Speller Techniques and Language Modeling for ESL Error Correction

Michael Gamon, Jianfeng Gao, Chris Brockett, Alexandre Klementiev, William B. Dolan, Dmitriy Belenko, Lucy Vanderwende

Reporter: Chia-Ying Lee

Advisor: Hsin-Hsi Chen

Microsoft Research & University of Illinois

IJCNLP 2008

Introduction

About 750M (74%) people use English as a second language (Crystal 1997).

Non-native writers encounter particular problems (e.g., prepositions).

Challenge: writing errors often have a semantic dimension (e.g., "at school" indicates a place, while "in school" indicates a time).

Target Error Types

1. Preposition presence and choice: In the other hand, ... (On the other hand ...)
2. Definite and indefinite determiner presence and choice: I am teacher... (am a teacher)
3. Gerund/infinitive confusion: I am interesting in this book. (interested in)
4. Auxiliary verb presence and choice: My teacher does is a good teacher (my teacher is...)
5. Over-regularized verb inflection: I writed a letter (wrote)
6. Adjective/noun confusion: This is a China book (Chinese book)
7. Word order (adjective sequences and nominal compounds): I am a student of university (university student)
8. Noun pluralization: They have many knowledges (much knowledge)

Problem Definition

Present a modular system for detection and correction of errors made by non-native writers.

Focus on preposition- and determiner-related errors.

Related Work

Turner and Charniak (2007) utilize a language model based on a statistical parser for determiner and preposition selection

De Felice and Pulman (2007) utilize a set of sophisticated syntactic and semantic analysis features to predict 5 common English prepositions

Han et al. (2004, 2006) use a maximum entropy classifier to propose article corrections

Izumi et al. (2003) and Chodorow et al. (2007) present techniques of automatic preposition choice modeling


System Description

0. Preprocessing: the input is tokenized and POS tagged
1. Suggestion Provider (SP): detection and correction
2. Language Model (LM): delete the suggestions whose score is lower than that of the original
3. Example Provider (EP): query the web for exemplary sentences
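The stages above can be read as a simple filter pipeline. The following is a minimal Python sketch under assumptions, not the authors' implementation: the function `correct`, its three callables, and `TOY_SCORES` are hypothetical names, and preprocessing is assumed to happen inside `suggest`. The only behavior taken from the slide is that suggestions scoring lower than the original sentence are deleted.

```python
from typing import Callable, Dict, List


def correct(sentence: str,
            suggest: Callable[[str], List[str]],
            lm_score: Callable[[str], float],
            find_examples: Callable[[str], List[str]]) -> Dict[str, List[str]]:
    """Run SP -> LM filter -> EP; every callable is a hypothetical stand-in."""
    original_score = lm_score(sentence)
    # 1. Suggestion Provider proposes candidate rewrites.
    candidates = suggest(sentence)
    # 2. Language Model: delete suggestions that score lower than the original.
    kept = [c for c in candidates if lm_score(c) >= original_score]
    # 3. Example Provider: attach example sentences to each surviving suggestion.
    return {c: find_examples(c) for c in kept}


# Toy stand-ins. The first two scores are the ones quoted on the Language Model
# slide; the third sentence and its score are invented for illustration only.
TOY_SCORES = {
    "I am teacher from Korea.": 0.19,
    "I am a teacher from Korea.": 0.60,
    "I am teacher from the Korea.": 0.05,
}

result = correct(
    "I am teacher from Korea.",
    suggest=lambda s: ["I am a teacher from Korea.", "I am teacher from the Korea."],
    lm_score=lambda s: TOY_SCORES[s],
    find_examples=lambda s: [],  # a real Example Provider would query the web
)
print(list(result))  # ['I am a teacher from Korea.']
```

Passing the components in as callables keeps the sketch modular, in the spirit of the modular system the slides describe: any stage can be swapped without touching the others.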


Suggestion Provider (1/3)

Classifiers:

Presence/absence ("pa") classifier, e.g., p(article + teacher) = 0.54

Choice ("ch") classifier, e.g., p(the) = 0.04, p(a/an) = 0.96

Potential insertion sites are determined heuristically from the sequence of POS tags
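One way to read the two classifiers together is as a two-step decision at each potential insertion site: first decide whether an article belongs there at all, then pick which one. The sketch below assumes this reading. The function name `suggest_article` and the 0.5 threshold are hypothetical, since the slides do not say how the two probabilities are combined. The probabilities are the ones from the slide.

```python
from typing import Dict, Optional


def suggest_article(p_presence: float,
                    p_choice: Dict[str, float],
                    threshold: float = 0.5) -> Optional[str]:
    """Return the suggested article for one insertion site, or None.

    p_presence: output of the presence/absence (pa) classifier.
    p_choice:   output of the choice (ch) classifier, label -> probability.
    threshold:  assumed cut-off for the pa decision (not given in the slides).
    """
    if p_presence <= threshold:
        return None  # pa classifier says no article is needed here
    # ch classifier: take the most probable label.
    return max(p_choice, key=lambda label: p_choice[label])


# Insertion site before "teacher" in "I am teacher from Korea."
suggestion = suggest_article(0.54, {"the": 0.04, "a/an": 0.96})
print(suggestion)  # a/an
```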


Suggestion Provider (2/3)

Features (window of ±6 tokens):

Relative position

Token string

POS tags

Example: 0/I/PRP 1/am/VBP 2/teacher/NN 3/from/IN 4/Korea/NNP 5/./.

Decision tree classifiers (WinMine toolkit, Chickering 2002): better than a linear SVM
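The feature description above can be sketched as a small extraction routine. This is an illustration under assumptions: the helper names `parse_tagged` and `context_features`, the feature-key format, and the convention that tokens to the left of the site get negative offsets and tokens to the right get positive ones are all hypothetical. The slides only state that relative position, token string, and POS tag are used within six tokens of the site.

```python
from typing import Dict, List, Tuple

WINDOW = 6  # the slide gives a context of +/-6 tokens


def parse_tagged(line: str) -> List[Tuple[str, str]]:
    """Parse 'index/token/POS' items into (token, POS) pairs."""
    pairs = []
    for item in line.split():
        _, rest = item.split("/", 1)      # drop the leading index
        token, pos = rest.rsplit("/", 1)  # the POS tag is the last field
        pairs.append((token, pos))
    return pairs


def context_features(tagged: List[Tuple[str, str]], site: int) -> Dict[str, str]:
    """Features for a potential insertion site located just before tagged[site]."""
    feats: Dict[str, str] = {}
    start = max(0, site - WINDOW)
    stop = min(len(tagged), site + WINDOW)
    for j in range(start, stop):
        # Left context: -1, -2, ...; right context: +1, +2, ...
        rel = j - site if j < site else j - site + 1
        token, pos = tagged[j]
        feats[f"tok[{rel:+d}]"] = token
        feats[f"pos[{rel:+d}]"] = pos
    return feats


tagged = parse_tagged("0/I/PRP 1/am/VBP 2/teacher/NN 3/from/IN 4/Korea/NNP 5/./.")
feats = context_features(tagged, site=2)  # candidate site before "teacher"
print(feats["tok[-1]"], feats["pos[+1]"])  # am NN
```

A dictionary of string-valued features like this would still need to be encoded (for example, one-hot) before being passed to a decision tree learner.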


Suggestion Provider (3/3)

Data sets:

English Encarta encyclopedia (560k sentences)

A random set of 1M sentences from a Reuters news data set

Prepositions from the NICT Japanese Learners of English corpus: about, as, at, by, for, from, in, like, of, on, since, to, with, than, "other"

Language Model

5-gram model trained on the English Gigaword corpus (LDC2005T12)

120K-word vocabulary

54 million bigrams, 338 million trigrams, 801 million 4-grams and 12 billion 5-grams

Uses interpolated Kneser-Ney smoothing (Kneser and Ney 1995) without count cutoff

Score:

I am teacher from Korea. score = 0.19

I am a teacher from Korea. score = 0.60
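For reference, the highest-order interpolated Kneser-Ney estimate is usually written as below. The slides do not give the formula, so this is the standard textbook form rather than necessarily the exact variant the authors used. The notation is an assumption: $c(\cdot)$ is a training count, $D$ is the discount, $n = 5$ for this model, and $N_{1+}(h\,\bullet)$ is the number of distinct word types observed after history $h$.

```latex
P_{\mathrm{KN}}(w_i \mid w_{i-n+1}^{i-1}) =
  \frac{\max\bigl(c(w_{i-n+1}^{i}) - D,\; 0\bigr)}{c(w_{i-n+1}^{i-1})}
  + \lambda(w_{i-n+1}^{i-1})\, P_{\mathrm{KN}}(w_i \mid w_{i-n+2}^{i-1}),
\qquad
\lambda(w_{i-n+1}^{i-1}) =
  \frac{D}{c(w_{i-n+1}^{i-1})}\, N_{1+}(w_{i-n+1}^{i-1}\,\bullet)
```

The lower-order distributions in the recursion are built from continuation counts rather than raw counts. "Without count cutoff" means that no n-grams are discarded for being rare, so singleton contexts still contribute.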


Example Provider (1/2)

Web Search

String query in a small window

Ranking rules:

In the same sentence

Sentence length

Context overlap
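The ranking criteria can be sketched as a sort key over candidate sentences. The sketch makes several assumptions that the slides do not confirm: "in the same sentence" is read as requiring the whole query string to occur within one sentence, shorter sentences are preferred, overlap with the user's sentence is counted on lowercased whitespace tokens, and overlap takes priority over length. The name `rank_examples` is hypothetical, and the fourth candidate sentence is invented to show filtering.

```python
from typing import List, Tuple


def rank_examples(query: str, context: str, sentences: List[str],
                  top_k: int = 3) -> List[str]:
    """Keep sentences containing the query string, then rank them."""
    context_words = set(context.lower().split())

    def sort_key(sentence: str) -> Tuple[int, int]:
        words = sentence.lower().split()
        overlap = len(context_words & set(words))
        # More context overlap first, then shorter sentences.
        return (-overlap, len(words))

    hits = [s for s in sentences if query.lower() in s.lower()]
    return sorted(hits, key=sort_key)[:top_k]


candidates = [
    "The tourists who travel to Disneyland in California can either choose "
    "to stay in Disney resorts or in the hotel for Disneyland vacations.",
    "We went to Disneyland last year.",  # invented; lacks the query string
    "Timothy's wish was to travel to Disneyland in California.",
    "Should you travel to Disneyland in California or to Disney World in Florida?",
]

ranked = rank_examples(
    query="travel to Disneyland",
    context="I want to travel to Disneyland in March.",
    sentences=candidates,
)
for sentence in ranked:
    print(sentence)
```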


Example Provider (2/2)

Original: I want to travel Disneyland in March.

Suggestion: I want to travel to Disneyland in March.

Top 3 examples:

1. Timothy's wish was to travel to Disneyland in California.
2. Should you travel to Disneyland in California or to Disney World in Florida?
3. The tourists who travel to Disneyland in California can either choose to stay in Disney resorts or in the hotel for Disneyland vacations.

Evaluation (1/5)

Suggestion provider

Determiner choice

Preposition choice

Language model

Human evaluation

70% for training; 30% for testing

Combined accuracy:

Evaluation (2/5)

Suggestion provider: determiner choice

Baseline: 69.9% (choosing the most frequent class label, "none")

State of the art: Turner and Charniak (Penn Treebank): 86.74%

Evaluation (3/5)

Suggestion provider: preposition choice

Baseline: 28.94% (using no preposition)

Evaluation (4/5)

Language Model

The language model reduced the number of preposition corrections by 66.8% and the number of determiner corrections by 50.7%.

Precision increased dramatically.

Accuracy of preposition suggestions:

LM score + classifier probability: 62.32%

LM score alone: 58.36%

Evaluation (5/5)

Human evaluation

CLEC: Chinese Learners of English Corpus (Gui and Yang 2003)

Conclusion and Future Work

The system successfully combines contextual-speller-based methods with language model scoring and provides web-based examples.

The system can work even in extremely noisy text with reasonable accuracy.

Future work: using web counts to build a learned ranker that combines information from the language model and the classifiers.

Thank you!
