
Spotting Translationese: An Empirical Approach

Pau Giménez Flores

Supervisors: Carme Colominas and Toni Badia

Universitat Pompeu Fabra

Content

1. Translationese
2. Goals
3. Translation Universals
4. Empirical Methods in Translation Studies
5. Theoretical Framework
6. Hypotheses
7. Methodology
8. Working Plan
9. Commented Bibliography

Translationese

• A product of the translator's incompetence (translation errors):

– “unusual distribution of features is clearly a result of the translator’s inexperience or lack of competence in the target language” (Baker, 1998: 248)

• Translation-specific language or dialect, without any negative connotations (translation universals):

– A third code “which arises out of the bilateral consideration of the matrix and target codes: it is, in a sense, a sub-code of each of the codes involved” (Frawley, 1984: 168).

– Translationese: a set of linguistic features of translated texts that differ from both the source language and the target language (Gellerstam, 1986).

Goals

• Main goal: to validate the translationese hypothesis empirically.
– Capturing the linguistic properties of translationese in observable and refutable facts.
– Automatically detecting and classifying translated vs. non-translated texts based on their syntactic and lexical properties.

Translation Universals (1)

“Features which typically occur in translated text rather than original utterances and which are not the result of interference from specific linguistic systems” (Baker, 1993: 243)

Translation Universals (2)

• Explicitation or explicitness: translations tend to be more explicit than source texts.
– Repetition of redundant grammatical items (e.g. prepositions).
– The optional connective ‘that’ is used more often in reported speech in translated English than in original English (Olohan and Baker, 2000).
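The that/zero-that contrast can be approximated by counting how often a reporting verb is followed by ‘that’ and how often it is followed by something else. The sketch below is a hypothetical heuristic, not Olohan and Baker's procedure. The verb-form lists (forms of SAY and TELL) and the one-word object window for TELL are assumptions. Real counts would still need manual checking of concordance lines, because not every non-‘that’ continuation is a reported clause.

```python
import re
from typing import Tuple

# Hypothetical lists of reporting-verb forms (assumption: SAY and TELL only).
SAY_FORMS = {"say", "says", "said", "saying"}
TELL_FORMS = {"tell", "tells", "told", "telling"}

def count_reporting_that(text: str) -> Tuple[int, int]:
    """Return (with_that, zero_that) counts for SAY/TELL forms in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    with_that = zero_that = 0
    for i, tok in enumerate(tokens):
        if tok in SAY_FORMS:
            window = tokens[i + 1:i + 2]   # "said that ..."
        elif tok in TELL_FORMS:
            window = tokens[i + 1:i + 3]   # "told me that ...": skip one object word
        else:
            continue
        if not window:
            continue  # verb at the very end of the text: nothing to classify
        if "that" in window:
            with_that += 1
        else:
            zero_that += 1
    return with_that, zero_that
```

Under the explicitation hypothesis, the share `with_that / (with_that + zero_that)` would be higher in a translated corpus than in a comparable original one.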

Translation Universals (3)

• Simplification: the language of translations is assumed to be lexically and syntactically simpler than that of non-translated target-language texts.
– Narrower range of vocabulary: lower type-token ratio.
– Lower level of information load: lower lexical density.
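Both simplification measures are simple ratios. Type-token ratio (TTR) divides the number of distinct word forms by the number of running words. Raw TTR drops as texts get longer, so corpus tools such as WordSmith Tools also offer a standardized TTR averaged over fixed-size chunks. Lexical density is the share of content (lexical) words among all tokens.

The sketch below assumes the input is already tokenized and POS-tagged with Universal POS tags. Which tags count as content words (here NOUN, PROPN, VERB, ADJ and ADV) is an assumption, and studies differ on this point.

```python
from typing import Iterable, List, Tuple

# Assumed set of content-word tags (Universal POS tagset).
CONTENT_TAGS = {"NOUN", "PROPN", "VERB", "ADJ", "ADV"}

def type_token_ratio(tokens: List[str]) -> float:
    """Distinct word forms divided by running words (case-insensitive)."""
    if not tokens:
        return 0.0
    return len({t.lower() for t in tokens}) / len(tokens)

def standardized_ttr(tokens: List[str], chunk: int = 1000) -> float:
    """Mean TTR over consecutive full chunks of `chunk` tokens.

    Falls back to plain TTR when the text is shorter than one chunk.
    """
    chunks = [tokens[i:i + chunk]
              for i in range(0, len(tokens) - chunk + 1, chunk)]
    if not chunks:
        return type_token_ratio(tokens)
    return sum(type_token_ratio(c) for c in chunks) / len(chunks)

def lexical_density(tagged: Iterable[Tuple[str, str]]) -> float:
    """Share of (token, tag) pairs whose tag marks a content word."""
    tags = [tag for _, tag in tagged]
    if not tags:
        return 0.0
    return sum(tag in CONTENT_TAGS for tag in tags) / len(tags)
```

Under the simplification hypothesis, both values would be lower for the translated corpus than for the original one, provided the texts are comparable in size and genre.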

Translation Universals (4)

• Normalization: exaggeration of typical features of the target language. Translations tend to be more unmarked and conventional, less creative and more conservative.
– Conventionalization of metaphors and idioms.
– Less frequent use of dialectal and colloquial expressions.
– Lexical choice of the ‘standard translation’ (Gellerstam, 1986).

Translation Universals (5)

• Interference from the source text and language (Toury, 1995; Mauranen, 2000). It can occur at the morphological, lexical and syntactic levels, among others.

• Unique items hypothesis (Tirkkonen-Condit, 2002): translated texts “manifest lower frequencies of linguistic elements that lack linguistic counterparts in the source languages such that these could also be used as translation equivalents” (a case of Simplification or Normalization?)
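Testing the unique items hypothesis comes down to comparing the normalized frequencies of a chosen set of items in translated and non-translated texts. In the sketch below, the item list is a hypothetical placeholder. Real items would have to be selected for the target language and the source languages involved. Matching is on whole lowercased tokens, and frequencies are expressed per 10,000 tokens so that corpora of different sizes can be compared.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

def per_10k(tokens: List[str], items: Iterable[str]) -> Dict[str, float]:
    """Frequency of each item per 10,000 tokens (case-insensitive)."""
    counts = Counter(t.lower() for t in tokens)
    n = len(tokens)
    return {item: (counts[item.lower()] * 10_000 / n if n else 0.0)
            for item in items}

def compare_unique_items(translated: List[str], original: List[str],
                         items: List[str]) -> Dict[str, Tuple[float, float]]:
    """Map each item to (frequency in translated, frequency in original)."""
    tr = per_10k(translated, items)
    orig = per_10k(original, items)
    return {item: (tr[item], orig[item]) for item in items}
```

Under the hypothesis, the first value would tend to be lower than the second. Whether a given difference is significant needs a separate statistical test.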

Translation Universals (6)

However:

“The as yet relatively small amount of research into potential translation universals has produced contradictory results, which seems to suggest that a search for real, ‘unrestricted’ universals in the field of translation might turn out to be unsuccessful.”

Puurtinen (2003: 403)

Empirical Methods in TS (1)

• Laviosa-Braithwaite (1996): a study of the linguistic nature of translated English in a subsection of the English Comparable Corpus (ECC).

• Øverås (1998): investigation of explicitation in translational English and translational Norwegian.

• Olohan and Baker (2000): a test of the explicitation hypothesis based on the omission and inclusion of the reporting connective ‘that’ in translational and original English.

Empirical Methods in TS (2)

• Borin and Prütz (2001): a comparison of original newspaper articles in British and American English with articles translated from Swedish into English, based on POS-tag n-grams.

• Puurtinen (2003): research into potential features of translationese in a corpus of Finnish translations of children’s books.
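A POS n-gram comparison like Borin and Prütz's starts from counting tag sequences. The sketch below assumes each sentence is already a list of POS tags; the tag names in the usage check are hypothetical. Padding each sentence with boundary markers means that patterns such as a preposition-initial sentence appear as n-grams of their own.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

def pos_ngrams(sentences: Sequence[Sequence[str]], n: int = 3) -> Counter:
    """Count POS-tag n-grams, padding each sentence with <s> and </s>."""
    counts: Counter = Counter()
    for tags in sentences:
        padded = ["<s>"] + list(tags) + ["</s>"]
        for i in range(len(padded) - n + 1):
            counts[tuple(padded[i:i + n])] += 1
    return counts

def relative_freqs(counts: Counter) -> Dict[Tuple[str, ...], float]:
    """Turn raw n-gram counts into proportions of all n-grams."""
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}
```

Comparing `relative_freqs` for a translated and an original corpus, n-gram by n-gram, yields the over- and underuse lists such studies report.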

Empirical Methods in TS (3)

• Baroni and Bernardini (2006): application of supervised machine learning techniques (SVMs) to detect translationese in two monolingual corpora of translated and original Italian texts.
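An SVM learns a separating hyperplane that maximizes the margin between the two classes. The following is a from-scratch sketch of a linear SVM trained with the Pegasos stochastic sub-gradient method. It is not the setup used by Baroni and Bernardini, and real experiments would use a mature implementation such as Weka's SMO or LIBSVM. It assumes each text has already been turned into a numeric feature vector labelled +1 (translated) or -1 (original). The parameter values are illustrative assumptions.

```python
import random
from typing import List, Sequence

def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 200,
                     seed: int = 0) -> List[float]:
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss.

    A constant 1.0 feature is appended to every vector, so the last weight
    acts as a (regularized) bias term. Labels must be +1 or -1.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # Shrink the weights (gradient of the regularizer) ...
            w = [(1 - eta * lam) * wi for wi in w]
            # ... and, if the hinge loss is active, step towards the example.
            if margin < 1:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w

def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Classify a feature vector: +1 (translated) or -1 (original)."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

A linear kernel keeps each weight interpretable: large positive weights point to features typical of translations.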

Empirical Methods in TS (4)

• Rayson et al. (2008): a descriptive study of translationese comparing the frequencies of keywords, key word classes (POS) and key semantic tags in original Chinese, translated English and edited translated English corpora.

• Tirkkonen-Condit (2002): Translationese – a myth or an empirical fact? Human translators were not good at identifying whether a text was translated or not.
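Keyword (keyness) comparisons between corpora are commonly based on the log-likelihood (G²) statistic. It tests whether an item is significantly more frequent in one corpus than in another. The sketch below implements the usual two-corpus formula for a single item. Running it for every word, POS tag or semantic tag and ranking by the result gives a keyness list. The default threshold of 3.84 (the chi-square critical value for p < 0.05 with one degree of freedom) is a conventional choice, not one taken from the studies cited here.

```python
import math

def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """G2 for an item seen a times in corpus 1 (c tokens) and b times in corpus 2 (d tokens)."""
    e1 = c * (a + b) / (c + d)  # expected frequency in corpus 1
    e2 = d * (a + b) / (c + d)  # expected frequency in corpus 2
    ll = 0.0
    if a > 0:  # terms with a zero observed count contribute nothing
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def overused_in_first(a: int, b: int, c: int, d: int,
                      critical: float = 3.84) -> bool:
    """True if the item is relatively more frequent in corpus 1 and G2 exceeds `critical`."""
    return a / c > b / d and log_likelihood(a, b, c, d) > critical
```

For example, an item seen 20 times in 1,000 translated tokens and 5 times in 1,000 original tokens gives G² of about 9.64, which is above the 3.84 threshold.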

Theoretical Framework

At the crossroads of Corpus Linguistics, Translation Studies and Computational Linguistics:

• It is empirical research in which corpora are the main source of both data and hypotheses (Laviosa-Braithwaite, 1996; Olohan and Baker, 2000, etc.).

• It tries to validate the existence of translationese and to define the linguistic properties of translated language as a product (Gellerstam, 1986; Baker, 1993, etc.).

• Use of Computational Linguistics techniques such as information extraction and machine learning algorithms (Kindermann et al., 2003; Baroni and Bernardini, 2006).

Hypotheses

1. Translationese exists and it is observable across languages.

2. This fact can be demonstrated with empirical methods applied to corpora in different languages.

Methodology (1)

Preliminary study: two monolingual comparable corpora of original and translated Catalan texts on art and architecture, 300,000 tokens each.

• Corpus Building
– Corpus compilation
– Tokenization, tagging and parsing with CatCG (Alsina, Badia et al. 2002)

• Corpus Exploitation
– Exploitation with WordSmith Tools (wordlists, frequency lists, type-token ratio, lexical density, concordance lists)
– Implementation of scripts to extract collocations and POS n-grams with Python and NLTK

• Implementation of a Machine Learning System
– Machine learning techniques (SVMs) to automatically classify texts as translated or non-translated.
– Training on one part of the corpus and testing on the rest (Weka software).
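To illustrate the collocation-extraction step, the sketch below ranks adjacent word pairs by pointwise mutual information, PMI = log2(c(w1 w2) · N / (c(w1) · c(w2))). It is written in plain Python so that it runs without dependencies. NLTK provides the same idea through `nltk.collocations.BigramCollocationFinder` together with `BigramAssocMeasures.pmi`. The minimum-frequency cut-off is an assumed setting: PMI over-rewards rare pairs, so low-count bigrams are usually discarded.

```python
import math
from collections import Counter
from typing import List, Tuple

def pmi_bigrams(tokens: List[str], min_freq: int = 2,
                top: int = 10) -> List[Tuple[Tuple[str, str], float]]:
    """Rank adjacent word pairs by PMI = log2(c(w1 w2) * N / (c(w1) * c(w2)))."""
    words = [t.lower() for t in tokens]
    n = len(words)
    unigrams = Counter(words)
    bigrams = Counter(zip(words, words[1:]))
    scored = []
    for (w1, w2), c12 in bigrams.items():
        if c12 < min_freq:
            continue  # drop rare pairs, whose PMI is unreliable
        pmi = math.log2(c12 * n / (unigrams[w1] * unigrams[w2]))
        scored.append(((w1, w2), pmi))
    # Highest PMI first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top]
```

Comparing the top-ranked pairs in the translated and original Catalan corpora would show whether translations favour more conventional collocations.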

Methodology (2)

Main experiment

• Corpus Building
– Corpus compilation (Spanish, French, English, German)
– Tokenization, tagging and parsing

• Corpus Exploitation
– Exploitation with WordSmith Tools (wordlists, frequency lists, type-token ratio, lexical density, concordance lists)
– Implementation of scripts to extract collocations and POS n-grams with Python and NLTK

• Implementation of a Machine Learning System
– Machine learning techniques (SVMs) to automatically classify texts as translated or non-translated.
– Training on one part of the corpus and testing on the rest (Weka software).
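Weka reads data in its ARFF format: a `@RELATION` line, one `@ATTRIBUTE` declaration per feature, a nominal class attribute, and comma-separated rows after `@DATA`. The sketch below shows one way to pass features computed in Python to Weka, for instance to its SMO support vector classifier. The relation name, feature names and class labels are hypothetical.

```python
from typing import Dict, List

def to_arff(relation: str, feature_names: List[str],
            rows: List[Dict[str, float]], labels: List[str]) -> str:
    """Serialize numeric feature rows plus a nominal class attribute as ARFF text.

    Names and labels must be simple identifiers: this sketch does no quoting.
    """
    classes = sorted(set(labels))
    lines = [f"@RELATION {relation}", ""]
    for name in feature_names:
        lines.append(f"@ATTRIBUTE {name} NUMERIC")
    lines.append("@ATTRIBUTE class {" + ",".join(classes) + "}")
    lines += ["", "@DATA"]
    for row, label in zip(rows, labels):
        # Missing features default to 0.0; the class value goes last.
        values = [repr(float(row.get(name, 0.0))) for name in feature_names]
        lines.append(",".join(values + [label]))
    return "\n".join(lines) + "\n"
```

Writing the returned string to a `.arff` file makes it loadable in the Weka Explorer, with the class as the last attribute.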

Working Plan

Commented Bibliography (1)

• Baker, M. (1995). Corpora in Translation Studies: An Overview and Some Suggestions for Future Research. Target 7(2): 223-243.

– Definition of a new type of corpus, the monolingual comparable corpus, in order to “effect a shift away from comparing either ST with TT or language A with language B to comparing text production per se with translation.”
– Type-token ratio and lexical density measures.

• Borin, L. and Prütz, K. (2001). Through a Glass Darkly: Part-of-speech Distribution in Original and Translated Text. In Computational Linguistics in the Netherlands 2000, 30-44.

– Comparison of POS n-grams to determine whether there are significant syntactic differences between original and translated language.
– Overuse in translated English of preposition-initial sentences and sentence-initial adverbs.

Commented Bibliography (2)

• Kindermann et al. (2003). Authorship attribution with support vector machines. Applied Intelligence 19: 109-123.

– Different statistical techniques for authorship attribution are described: the log-likelihood ratio statistic, naïve Bayes probabilistic classifiers, multi-layer perceptrons, k-nearest neighbour classification (kNN), Support Vector Machines (SVMs), etc.
– SVMs achieve better results than other classifiers in authorship attribution: they are fast and can take a large number of features as input.

Commented Bibliography (3)

• Baroni, M. and Bernardini, S. (2006). A New Approach to the Study of Translationese: Machine-Learning the Difference between Original and Translated Text. Literary and Linguistic Computing 21(3): 259-274.

– A new explicit criterion to prove the existence of translationese: learnability by a machine.
– SVMs can exploit a large number of features.
– The SVM classifier achieves better results than professional human translators.
– Their results show that translations are recognizable on purely grammatical/syntactic grounds (function word distribution and shallow syntactic patterns).
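One way to turn function-word distribution into classifier input is a vector of relative frequencies over a fixed list of function words, one vector per text. The list below is a short hypothetical sample of Catalan function words for illustration, not the feature set Baroni and Bernardini used. The tokenizer is a naive regular expression that, for example, splits elided forms such as l'art into two tokens.

```python
import re
from typing import List

# Hypothetical sample of Catalan function words (illustration only).
FUNCTION_WORDS = ["de", "la", "el", "i", "que", "a", "en",
                  "per", "amb", "un", "una", "es"]

def function_word_vector(text: str,
                         vocab: List[str] = FUNCTION_WORDS) -> List[float]:
    """Relative frequency of each vocabulary word among all word tokens."""
    tokens = re.findall(r"\w+", text.lower())  # \w also matches à, è, ç, ...
    n = len(tokens)
    if n == 0:
        return [0.0] * len(vocab)
    counts = {w: 0 for w in vocab}
    for tok in tokens:
        if tok in counts:
            counts[tok] += 1
    return [counts[w] / n for w in vocab]
```

Vectors like these, labelled as translated or original, are the kind of input a linear SVM would be trained on.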

Commented Bibliography (4)

• Tirkkonen-Condit, S. (2002). Translationese – a Myth or an Empirical Fact? Target 14(2): 207-220.

– The translationese hypothesis is controversial, to say the least, whereas the unique items hypothesis better describes the translated or non-translated nature of a text.
– Translated texts “manifest lower frequencies of linguistic elements that lack linguistic counterparts in the source languages such that these could also be used as translation equivalents”.