Statistical Machine Translation of Texts with Misspelled Words
Nicola Bertoldi, Mauro Cettolo, Marcello Federico
FBK - Fondazione Bruno Kessler,
Trento, Italy
ACL 2010
Outline
Introduction System Data Evaluation Conclusions
Introduction
non-word error
Introduction
real-word error
Introduction
Six different typing error operations
◆ Substitution
Target: [We] had just come in from Australia.
Error : [Ww] had just come in from Australia.
◆ Insertion
Target: is a good place to stay, if you are looking for a hotel [around] LAX airport.
Error : is a good place to stay, if you are looking for a hotel [arround] LAX airport.
◆ Deletion
Target: The room was [excellent] but the hallway was [filthy].
Error : The room was [exellent] but the hallway was [filty].
Introduction
◆ Transposition
Target: The staff was [friendly].
Error : The staff was [freindly].
◆ Run-On
Target: I saw a teacher[.] who cares?
Error : I saw a teacher[ ] who cares?
◆ Split
Target: [We] had just come in from Australia.
Error : [W e] had just come in from Australia.
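The six operations above can be told apart mechanically when a target/error pair differs by a single edit. The following is a minimal sketch, not the authors' method: the function name `classify_typo` is hypothetical, and run-on is modeled as a dropped space between words, which is an assumption (the slide's own run-on example drops a period instead).

```python
def classify_typo(target: str, error: str) -> str:
    """Name the single edit that turns `target` into `error`, else 'other'."""
    if target == error:
        return "none"
    # Split / run-on: the letters agree and only the spacing differs.
    if target.replace(" ", "") == error.replace(" ", ""):
        return "split" if len(error) > len(target) else "run-on"
    # Skip the common prefix; the edit starts at the first mismatch.
    p = 0
    while p < min(len(target), len(error)) and target[p] == error[p]:
        p += 1
    t_rest, e_rest = target[p:], error[p:]
    if len(e_rest) == len(t_rest) + 1 and e_rest[1:] == t_rest:
        return "insertion"      # error has one extra character
    if len(t_rest) == len(e_rest) + 1 and t_rest[1:] == e_rest:
        return "deletion"       # error lacks one character
    if len(t_rest) == len(e_rest):
        if t_rest[1:] == e_rest[1:]:
            return "substitution"   # one character replaced
        if (len(t_rest) >= 2 and t_rest[0] == e_rest[1]
                and t_rest[1] == e_rest[0] and t_rest[2:] == e_rest[2:]):
            return "transposition"  # two adjacent characters swapped
    return "other"


# Word pairs taken from the slide examples.
for tgt, err in [("We", "Ww"), ("around", "arround"), ("excellent", "exellent"),
                 ("friendly", "freindly"), ("We", "W e")]:
    print(f"{tgt!r} -> {err!r}: {classify_typo(tgt, err)}")
```

Pairs that differ by more than one edit fall through to `other`; a fuller treatment would use an edit-distance alignment rather than a common-prefix scan.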
System
System
Step 1.
Step 2.
System
Step 3.
System
Step 4.
System
Step 5. Translation of the CN (e) is performed with the Moses decoder (Koehn et al., 2007).
Data
Data
Evaluation Data
Non-word Noise
Words in the text are randomly replaced according to a list of 4,100 frequent non-word errors provided in Wikipedia.
Real-word Noise
Real-word errors are automatically introduced using another list of frequently misused words in Wikipedia.
Random-word Noise
The original text is corrupted by randomly replacing, inserting, and deleting characters.
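The noise sources described above could be simulated roughly as follows. This is an illustrative sketch under stated assumptions, not the procedure used in the paper: the function names, the `rate` and `seed` parameters, the lowercase-ASCII alphabet, and the tiny confusion dictionaries are all hypothetical stand-ins (the real lists come from Wikipedia and are not reproduced here).

```python
import random


def add_random_noise(text: str, rate: float, seed: int = 0) -> str:
    """Character-level noise: with probability `rate`, a character is
    replaced, deleted, or preceded by a randomly inserted letter."""
    rng = random.Random(seed)
    alphabet = "abcdefghijklmnopqrstuvwxyz"  # assumed alphabet
    out = []
    for ch in text:
        if rng.random() >= rate:
            out.append(ch)  # leave this character untouched
            continue
        op = rng.choice(("replace", "insert", "delete"))
        if op == "replace":
            out.append(rng.choice(alphabet))
        elif op == "insert":
            out.append(rng.choice(alphabet))
            out.append(ch)
        # "delete": emit nothing for this character
    return "".join(out)


def add_list_noise(text: str, confusions: dict, rate: float, seed: int = 0) -> str:
    """Word-level noise: with probability `rate`, a word that appears in
    `confusions` is swapped for one of its listed misspellings."""
    rng = random.Random(seed)
    words = []
    for word in text.split():
        options = confusions.get(word)
        if options and rng.random() < rate:
            words.append(rng.choice(options))
        else:
            words.append(word)
    return " ".join(words)


# Hypothetical sample entries, standing in for the Wikipedia lists.
non_word = {"friendly": ["freindly"], "excellent": ["exellent"]}
real_word = {"their": ["there"], "then": ["than"]}

print(add_list_noise("the staff was friendly", non_word, rate=1.0))
print(add_list_noise("their room was clean", real_word, rate=1.0))
print(add_random_noise("the room was excellent", rate=0.1, seed=42))
```

The same word-level function serves both list-based sources; only the lookup table differs. Fixing `seed` keeps a corrupted test set reproducible across noise levels.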
Evaluation
Conclusions
◆ This paper addressed the issue of automatically translating written texts that are corrupted by misspelling errors.
◆ The enhanced MT system has been tested on texts corrupted with increasing noise levels of three different sources: random, non-word, and real-word errors.
◆ The impact of misspelling errors on MT performance depends on the noise rate, but not on the noise source.