USING THE INTERNET FOR SPECIALISED TRANSLATION
Translation Technology
“much translation work is carried out in a computer-assisted translation (CAT) environment, which may vary from a standard desktop equipped with word processing software and a browser to a full-blown translator workstation consisting of a multiplicity of tools specifically created for translators of technical texts and localizers.”
“Translation agencies organize their workflow around project management systems that distribute translation tasks, memories and terminologies to and around individual translators.”
(F. Zanettin 2014, “Corpora in Translation”)
Translation technologies
• electronic dictionaries and terminological databases, the arrival of the Internet with its numerous possibilities for research, documentation and communication, and the emergence of computer-assisted translation tools.
Alcina A. (2008) «Translation technologies - Scope, tools and resources». Target 20:1, 79–102
Degrees of Translation automation
• The term traditional human translation is understood to refer to translation without any kind of automation
• Fully automatic high quality translation (FAHQT) means translation that is performed wholly by the computer, without any kind of human involvement, and is of “high quality”
• Human-aided machine translation (HAMT) refers to systems in which the translation is essentially carried out by the program itself, but some aid is required from humans
• Machine-aided human translation (MAHT) comprises any process or degree of automation in the translation process, provided that the mechanical intervention provides some kind of linguistic support.
Tools vs. Resources
• The word tool refers to computer programs that enable translators to carry out a series of functions or tasks with a set of data that they have prepared and, at the same time, allow a particular kind of result to be obtained.
  ◦ Internet search engines
  ◦ Word processors
  ◦ Trados, Wordfast, Déjà Vu, Across, OmegaT, …
  ◦ AntConc, WordSmith, …
• By resources we refer to all sets of previously gathered linguistic data which are organized in a particular manner and made available in some electronic format so that they can be looked up or used by translators in the course of some phase of processing.
  ◦ Terminological databases (e.g. IATE), glossaries, …
  ◦ (online) dictionaries
  ◦ British National Corpus, …
Why and how can we mine the web?
• the study of words “by presenting them in the company they usually keep - that is to say, an element of their meaning is indicated when their habitual word accompaniments are shown”
• “Extended units of meaning” at work in language (Sinclair, 1996)
Extended units of meaning
Words must be studied in context rather than in isolation
• Differences in Italian between (from Taylor, 1998: 61):
  ◦ “pressione alta” = “high (blood) pressure” [medical]
  ◦ “alta pressione” = “(banks of) high pressure” [meteorological]
• collocation
• colligation
• semantic preference
• semantic prosody
Collocation
• “Tendency of certain words to co-occur regularly in a given language” (Mona Baker, 1992: 47)
• As observed in actual texts (vs. intuition)
• Key features of collocations
  ◦ language-specific (collocations vary from language to language)
• Collocations are not stable or fixed
  ◦ they may change diachronically (over time) in general language
  ◦ they may change in LSP vs. general language
  ◦ they may change across LSP domains
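Observing co-occurrence “in actual texts” can be sketched in a few lines of Python. This is only a minimal illustration: the sample sentences are invented, the window of two words on either side is an arbitrary assumption, and raw counts stand in for the association measures a real corpus tool would compute.

```python
import re
from collections import Counter

def collocates(text, node, span=2):
    """Count the words found within `span` tokens to the left or right of `node`."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts

# Invented sample text (not taken from any corpus).
sample = (
    "High blood pressure is common. The doctor measured her blood pressure. "
    "A bank of high pressure is moving in from the west."
)

counts = collocates(sample, "pressure")
print(counts.most_common(3))
```

On a corpus of realistic size, the same counting would be the first step towards telling the medical collocation apart from the meteorological one.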
Semantic prosody
• “A consistent aura of meaning with which a form is imbued by its collocates” (Louw 1993)
• “Feeling” or “aura” that is evoked by using certain words (reinforced by collocates, due to co-selectional implications and restrictions)
• Usually this feeling is “positive or negative”
  ◦ “Provide” tends to occur with words denoting things which are desirable, necessary or good, such as “information”, “service(s)”, “support”, “help”, “money”, “protection”, “food”, “care” (cf. Italian “fornire” and “elargire”)
  ◦ “Cause” tends to occur with words denoting negative repercussions/consequences, such as “pain”, “damage”, “harm” (cf. Italian “causare”)
• Not necessarily accessible to intuition.
Semantic preference
• Relation between a lemma and a set of semantically related words (Stubbs, 2001: 65)
  ◦ Lemma: base form (lexeme) or dictionary entry of a word
  ◦ “Commit” is used with a group of semantically similar words, e.g. “murder”, “crime”, “suicide” (cf. Italian “commettere”)
  ◦ “Revoke” is used with e.g. “licence”, “permit”, “authorization”
• Semantic prosody → positive/negative evaluation
• Semantic preference → relation to words belonging to a particular, definable semantic field
Colligation
• Relation between a pair of grammatical categories or a pairing of lexis and grammar (Stubbs, 2001: 65)
• hear, notice, see, watch enter into colligation with the sequence of object + either the bare infinitive or the -ing form; e.g.
  ◦ We heard the visitors leave/leaving.
  ◦ We noticed him walk away/walking away.
  ◦ We heard Pavarotti sing/singing.
  ◦ We saw it fall/falling.
• Corresponding collocations and colligations in Italian for “break the law”?
Conclusion on using the Web for specialised translation: main advantages
• massive amount of texts and multi-source information can be searched
• content is constantly “refreshed” (i.e. updated and extended)
• a lot of sources, text types and domains/topics are represented
• many languages (English is dominant, good presence of Italian)
• replicable search techniques across (your working/target) languages
• it is available at any time, at virtually no cost!
Main disadvantages and problems
o need to differentiate good/reliable sources from questionable information
  § for facts (limited control over user-generated content like Wikipedia)
  § for linguistic usage (badly translated, non-native texts, poor authors)
  § it may be difficult to identify differences between expert/non-expert use
o data/results still need to be interpreted
o Google focuses on content/information, rather than linguistic forms
  • the ranking and sorting of results are performed according to criteria like “popularity” of the websites, or geographic relevance
  • the same search can yield different numbers of hits, depending on unpredictable and uncontrollable factors such as the time of day, or the location from which the query is made
    - word counts are not reliable
    - it is difficult to compare frequencies to verify translation hypotheses
  • data on which searches are performed is unstable/changes
Particularly relevant to linguists/translators:
§ no possible/meaningful sorting of hits/results (esp. L/R-hand collocates)
  - e.g. alphabetical sorting of collocates, from least to most frequent, etc.
  - think of e.g. the “a * range/array of”, “on the verge of” exercises
§ punctuation and upper case (capitals) are ignored, e.g. “aids” vs. “AIDS”
§ impossible to search parts of words, e.g. start with “geo…”, end in “-itis”
§ no lemmatised searches
  - hard to calculate frequencies of specific word combinations
  - e.g. to calculate how frequent the combination “tirare l’acqua al proprio mulino” is, all inflected forms must be searched for
§ no POS-sensitive searches
  - e.g. to search for ‘spot’ as a noun vs. as a verb
§ no possibility to specify the span occurring between two search terms
  - i.e. the * wildcard can include zero to n words
«Googleology is bad science» (Kilgarriff 2007)
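By way of contrast, three of the searches listed above as impossible in a web search engine are routine once the texts are held locally. The sketch below uses Python regular expressions on an invented sample text; the patterns are illustrative assumptions, not a description of how any particular corpus tool works.

```python
import re

# Invented sample text.
text = (
    "AIDS research needs funding. Foreign aid aids recovery. "
    "Geology and geography are taught here. "
    "The shop offers a wide range of products and a truly impressive range of services."
)

# Case-sensitive search: "AIDS" and "aids" are kept apart.
upper = re.findall(r"\bAIDS\b", text)
lower = re.findall(r"\baids\b", text)

# Partial-word search: every word starting with "geo", in any case.
geo_words = re.findall(r"\bgeo\w*", text, flags=re.IGNORECASE)

# Bounded span: "a", then one or two words, then "range of".
ranged = re.findall(r"\ba (?:\w+ ){1,2}range of\b", text)

print(upper, lower, geo_words, ranged)
```

Lemmatised and POS-sensitive searches need more than regular expressions, namely a corpus that has been lemmatised and tagged beforehand.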
MACHINE TRANSLATION (MT)
Machine translation (MT): definition and key terms
• Definition of machine translation:
“computerised systems responsible for the production of translations
from one natural language into another, with or without human
assistance” (Hutchins & Somers, 1992: 3)
o Human intervention is not necessarily excluded, but if it does occur it is
subordinated to the prevailing action of the computer
• Some key terms:
o MT system / engine / service = the software that produces the translation
o input = the source text (i.e. original that we are trying to translate)
o [raw] output = [unedited] target text (i.e. the translation that we obtain)
MT: popular conceptions
Probably the translation technology that attracts the most public attention, esp. among non-translators.
Two extreme positions about MT:
1. MT is totally useless and a waste of time and money, as the quality of the output is generally very low (funny anecdotes)
   Underestimates possibilities
2. MT will bring down language barriers; in a few years’ time MT will be as good as human translation, no more need for translators
   Underestimates limitations
Quality varies according to language pairs, integrated tools (MT that learns) and pre-editing
There will be more pre-editing and post-editing jobs, for which human expertise is required → new spheres of activity for translators/language professionals
“L'inglese di Expo non sembra Google Translate, è Google Translate” (“Expo's English doesn't look like Google Translate, it is Google Translate”)
From http://www.linkiesta.it/it/blog-post/2015/02/12/linglese-di-expo-non-sembra-google-translate-e-google-translate/22476/
Machine translation (MT): main architectures of MT systems
[Figure: texts in SL paired with texts in TL]
Parallel corpora: a collection of original texts in language L1 and their translations into a given L2
Machine translation (MT): why is MT so difficult? Or why is translation difficult for computers?
• So why is translation difficult for computers?
  o Some blame the computer’s lack of “real-world knowledge”
  o Focus on potential translation problems for EN-IT (with a computer!!)
  o A simple example: lexical gaps and lexical asymmetries (concrete nouns)
    § legno / bosco / foresta in IT (+ EN, FR, DE and your other languages…)
IT: legno | bosco | foresta
EN: wood | forest
FR: bois | forêt
• Partly because the translation often depends on the context / situation, which the computer is not able to take into account
“The ball is in your court”
  ◦ “Il pallone è nella vostra metà campo” (the manager to the players)
  ◦ “Il ballo è nella vostra corte” (the chamberlain to the king)
• Lexical ambiguities (gramm. category <-> meaning <-> translation), for example, in EN: round
  j) My team was eliminated in the first round (Noun: girone)
  k) The cowboy started to round up the cattle (Verb: radunare)
  l) We can use the round table for dinner (Adjective: rotondo)
  m) Maggie is going on a cruise round the world (Preposition: intorno al)
• These sentences are ambiguous and very complex (for MT!):
  Time flies like an arrow
  Gas pump prices rose last time oil stocks fell
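The “round” examples can be turned into a toy demonstration of why one-to-one lexical substitution fails. The glossary below is hypothetical and deliberately naive (one Italian equivalent per English word); it is not how any actual MT engine works, only an illustration of the ambiguity problem.

```python
# Hypothetical EN->IT glossary with a single equivalent per word (illustrative only).
glossary = {"the": "il", "first": "primo", "round": "rotondo", "table": "tavolo"}

def word_for_word(sentence):
    """Replace each known word with its single glossary equivalent."""
    return " ".join(glossary.get(word, word) for word in sentence.lower().split())

# Sense inventory taken from the four examples above: category -> Italian equivalent.
round_senses = {
    "noun": "girone",
    "verb": "radunare",
    "adjective": "rotondo",
    "preposition": "intorno al",
}

# "round" is a noun here, but the glossary only knows the adjective.
print(word_for_word("the first round"))
print("expected equivalent for the noun:", round_senses["noun"])
```

Choosing the right entry in `round_senses` requires knowing the grammatical category in context, which is exactly the information the naive lookup lacks.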
Machine translation (MT): some linguistic phenomena that are particularly difficult for MT
• The case / example of pronominal anaphora (resolution), difficult for MT
1) The chimp eats the banana because it is greedy. (it = the chimp)
2) The chimp eats the banana because it is ripe. (it = the banana)
3) The chimp eats the banana because it is lunchtime. (it = ? no antecedent in the sentence)
MT post-editing
• The aim of post-editing is to make the revised output usable or understandable, with the least possible effort (quickly)
• The priority is to save time and money
• The extent and the accuracy of post-editing are negotiated/specified on a case by case basis, depending on the needs and requirements
• Different “types” and levels of post-editing (in companies, organisations):
  ◦ no post-editing: internal circulation, almost never external publication
  ◦ minimum post-editing: internal circulation, rarely external publication
  ◦ full/complete post-editing (but… is it worth it?): very rarely internal circulation, mostly external publication
MT post-editing: introduction
• new skill that is acquired with experience, different from translation
• in this scenario one has to balance and optimise quality-speed-cost, in relation to the intended use/duration of the translation
• length of use of the document
• needs and expectations of the end user(s)
• ability of the readers/addressees to make use of the doc.
• type, length and “visibility” of the document
• available and viable options
Aims and level of PE (vs. translation/proofreading!)
• aims and level of PE (minimum/full-complete) are decided specifically
• Factors to be considered (prioritised)
  ◦ save time and money (quality is less relevant)
  ◦ understandability and correctness of general meaning are key
• Factors to be ignored (irrelevant in PE)
  ◦ any detail or nuance
  ◦ elegance, fluency, naturalness of expression, etc.
On average, PE is paid roughly 50% of the rate for “real/proper” translation
MT pre-editing
Limit input domain / topic
• There are two possibilities to limit the texts / language in / for MT:
  ◦ adopt a controlled language (restricted input)
  ◦ use the sublanguage approach
• Common aims with both options (to the advantage of MT):
  ◦ limited vocabulary
  ◦ more certainty on interpretation
  ◦ reduce syntactic variation
Controlled language
• Prescriptive rules aimed at normalising the style of the input (ST), e.g.
  ◦ do not write sentences with more than 20 words (general, language-neutral)
  ◦ avoid passive constructions, use only active verb forms
  ◦ avoid anaphoras, make all subjects and pronominal references explicit
  ◦ in EN: do not omit “that” in relative clauses (language-specific)
  ◦ in IT: do not use “solo” as an adverb, but use “soltanto/solamente”
  ◦ in IT: use the word “minuto” only as a noun (i.e. to mean 60 seconds); for the adjectival meaning, use only “piccolo”
  ◦ etc.
• The result of controlled language is restricted input
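Two of the rules above are mechanical enough to be checked automatically. The following is a rough sketch, assuming naive sentence splitting on final punctuation; the function name and messages are made up for illustration. Note that the “solo” check flags every occurrence of the word, because telling the adverb from the adjective would need POS tagging.

```python
import re

MAX_WORDS = 20  # limit taken from the first rule above

def check_controlled(text):
    """Return (sentence, problem) pairs for sentences that break one of two rules."""
    problems = []
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    for sentence in sentences:
        n_words = len(sentence.split())
        if n_words > MAX_WORDS:
            problems.append((sentence, f"{n_words} words (max {MAX_WORDS})"))
        # Simplification: any "solo" is flagged, adverb or not.
        if re.search(r"\bsolo\b", sentence, flags=re.IGNORECASE):
            problems.append((sentence, 'contains "solo"; use "soltanto/solamente"'))
    return problems

# Invented test input: one short Italian sentence and one 21-word dummy sentence.
draft = "Premere solo il tasto verde. " + " ".join(["parola"] * 21) + "."
for sentence, problem in check_controlled(draft):
    print(problem)
```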
Sublanguage (1/2)
• Natural/normal behaviour of language within a well-defined domain (~ LSP, specialised language, jargon, etc.)
  ◦ “sub-” in the mathematical sense as in “subset”, not derogatory!
  ◦ referred to very well-defined, enclosed, limited domains and texts
• A sublanguage exists and is used regardless of MT, but one can design an MT system that takes advantage of this sublanguage
• vocabulary
  ◦ limited (relatively few concepts to be covered/expressed)
  ◦ finite/closed (innovation/deviation tend to be avoided)
  ◦ a few homographs, in general limited use of synonyms and coreferences
• syntax
  ◦ limited range of structures and constructions (regularity + repetitiveness)
• usually sublanguages are very similar cross-linguistically between SL/TL(s)
Machine translation (MT): restrictions to the use of MT
• Input must be in (or converted into) electronic format
• Correct formatting and layout of the input are very important
  o the word “e r r o r” (spaced letters) would not be recognised / translated
  o spelling and typos are crucial: THEY BOOKS A ROOM …
    (anybody would understand banal mistakes, but not an MT system!)
• Limited availability of language combinations (improving with SMT)
  o coverage mostly limited to “usual” big languages with commercial interest
COMPUTER-ASSISTED TRANSLATION (CAT) TOOLS
Computer-assisted translation (CAT) tools
• Computer-assisted translation or computer(machine)-aided translation (CAT) refers to a variety of tools, a family of software products designed to support professional translators in their work.
• CAT is a “recent” development, derived from MT over the last 20 years
• The actual development of commercial CAT tools started in the 1990s with the so-called “translator’s workstation / workbench”, which includes
  ◦ terminology management packages
  ◦ translation memory (TM) software (+ text alignment software, etc.)
• CAT tools are pieces of software designed to enhance the work of translators:
  ◦ maximise speed → higher productivity
  ◦ improve coherence and precision → higher quality
CAT tools, example 1: terminology management packages
• Used to create, store, retrieve and manipulate bi-/multilingual termbases/glossaries
• As searching for terminology can be highly time-consuming (even up to 75% of translators’ time), setting up a database which gathers the terminology you come across is vital.
• Lists in word processors / spreadsheets (e.g. Excel) → limited options for presenting and sorting data
• The terminology covered is usually that of a given (sub-)discipline or the terms needed for a specific translation project.
• Terminology records consist of a number of flexible fields
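What a terminology record with “flexible fields” might look like can be sketched as a small data structure. Everything here is an assumption made for illustration: the field names (`concept_id`, `domain`, `terms`, `notes`) are hypothetical and do not reproduce the schema of any real terminology management package; the sample entry reuses the printer example from these slides.

```python
from dataclasses import dataclass, field

@dataclass
class TermRecord:
    """One concept-oriented entry of a hypothetical termbase."""
    concept_id: int
    domain: str
    terms: dict = field(default_factory=dict)  # language code -> term
    notes: dict = field(default_factory=dict)  # free-form optional fields

def lookup(termbase, term, source_lang, target_lang):
    """Return the target-language equivalents recorded for a source-language term."""
    return [
        record.terms[target_lang]
        for record in termbase
        if record.terms.get(source_lang) == term and target_lang in record.terms
    ]

termbase = [
    TermRecord(
        concept_id=1,
        domain="printers",
        terms={"en": "print settings", "it": "impostazioni di stampa"},
        notes={"source": "previous user manual"},
    ),
]

print(lookup(termbase, "print settings", "en", "it"))
```

Unlike a two-column spreadsheet, a record of this kind can carry any number of optional fields and can be queried in either language direction.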
CAT tools, example 2: translation memory (TM) software
• Translation memory (TM): “multilingual text archive containing […] multilingual texts, allowing storage and retrieval of aligned text segments against various search conditions” (EAGLES* 1995)
  * Evaluation of Natural Language Processing Systems
• This roughly means: a “filing cabinet” (i.e. a database) of old translations whose bits can be retrieved and used when / as needed by the translator
  ◦ essentially a textual database that can be searched
  ◦ pairs of source-text and target-text segments
Note: Translation memory indicates both the software tool and the contents of the database, i.e. the whole set of aligned text segments that it includes
Translation memory (TM) software
• Key idea: recycle similar past translations, never translate the same (or a similar) text twice
• How it works:
  ◦ TM tools divide the source text, which must be in (or turned into, e.g. with OCR) electronic/digital format, into segments, which translators can translate one by one in the traditional way.
  ◦ These segments (usually sentences, or even phrases) are then sent to a built-in database. When there is a new source segment equal or similar to one already translated, the memory retrieves the previous translation from the database.
• When is this most useful:
  ◦ for the translation of any text that has a high degree of repeated terms and phrases which must be translated consistently, as is the case with e.g. user manuals, computer products and subsequent versions of the same document (e.g. website updates).
  ◦ mostly relevant to technical/specialised translation (not literature)
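The segment-store-retrieve cycle described above can be sketched as follows. This is a deliberately minimal model, assuming naive sentence segmentation and exact matching only; the class and method names are invented, and the second test sentence (“Load paper into the tray.”) is made up.

```python
import re

def segment(text):
    """Split a text into sentence-like segments (naive: break after . ! ?)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

class TranslationMemory:
    """A toy TM: a lookup table from source segments to target segments."""

    def __init__(self):
        self.units = {}

    def store(self, source, target):
        self.units[source] = target

    def exact_match(self, source):
        """Return the stored translation, or None if the segment is new."""
        return self.units.get(source)

tm = TranslationMemory()
tm.store(
    "There are 4 ways to change print settings for this printer.",
    "Ci sono 4 modi per cambiare le impostazioni di stampa di questa stampante.",
)

new_text = (
    "There are 4 ways to change print settings for this printer. "
    "Load paper into the tray."
)
for seg in segment(new_text):
    print(seg, "->", tm.exact_match(seg) or "[no match: translate manually]")
```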
Using translation memory (TM) software
• Scenario
  ◦ you have to translate the user manual of a printer (new model) from English into Italian
  ◦ a lot of repetition within the document itself
  ◦ overlap and repetitions across updated (old-new) versions of the documentation
  ◦ you have a relevant TM (similar topic / domain / texts / clients)
  ◦ you translated the previous manual(s)
  ◦ TM provided by client / translation agency / colleague
• Translation of a printer manual English (A) → Italian (B)
Source text (in language A)
ST: There are 4 ways to change print settings for this printer
Exact/Perfect match (everything in the segment is exactly the same)
A: There are 4 ways to change print settings for this printer
B: Ci sono 4 modi per cambiare le impostazioni di stampa di questa stampante
Full match (only figures, dates and similar small details are different)
A: There are 2 ways to change print settings for this printer
B: Ci sono 2 modi per cambiare le impostazioni di stampa di questa stampante
Source text (in language A)
ST: “There are 4 ways to change print settings for this printer”
Fuzzy match 85% similar (a few words in translation unit are different)
A: “There are several ways to change print settings for the printer”
B: “Ci sono vari modi per cambiare le impostazioni di stampa alla stampante”
Fuzzy match 60% similar (some words in translation unit are different)
A: “There are several ways to modify the default setting of your printer”
B: “Ci sono vari modi per modificare l’impostazione standard della tua stampante”
• With the acceptability threshold of the TM tool set at 75%, no candidate translation unit under that level of similarity is retrieved and shown to the translator!
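Threshold-based retrieval can be sketched with Python's standard `difflib` module. This is an assumption-laden stand-in: commercial TM tools use their own similarity measures, so the scores computed here will not reproduce the 85% and 60% figures quoted above. The function names are invented; the segments are the ones from the printer example.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Character-based similarity in [0, 1], used here in place of a real TM metric."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def retrieve(new_segment, memory, threshold=0.75):
    """Return (score, source, target) for stored units at or above the threshold, best first."""
    scored = [(similarity(new_segment, src), src, tgt) for src, tgt in memory]
    hits = [hit for hit in scored if hit[0] >= threshold]
    return sorted(hits, key=lambda hit: hit[0], reverse=True)

memory = [
    ("There are 4 ways to change print settings for this printer",
     "Ci sono 4 modi per cambiare le impostazioni di stampa di questa stampante"),
    ("There are 2 ways to change print settings for this printer",
     "Ci sono 2 modi per cambiare le impostazioni di stampa di questa stampante"),
    ("There are several ways to change print settings for the printer",
     "Ci sono vari modi per cambiare le impostazioni di stampa alla stampante"),
    ("There are several ways to modify the default setting of your printer",
     "Ci sono vari modi per modificare l’impostazione standard della tua stampante"),
]

st = "There are 4 ways to change print settings for this printer"
hits = retrieve(st, memory)
for score, src, tgt in hits:
    print(f"{score:.0%}  {src}")
```

Raising or lowering `threshold` changes how many candidates the translator gets to see, which is the trade-off the 75% setting on the slide refers to.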
• CAT tools - Advantages
  ◦ can speed up the translation process and increase productivity
  ◦ can improve translation quality (by enhancing terminological and phraseological coherence)
  ◦ can help translators provide quotations
  ◦ allow for collaboration over large projects
    - TMs/termbases can be shared by several translators and updated in real time
• Useless for some text types (e.g. literature)
• Essential for many specialized/technical domains
  ◦ Translation agencies require translators to use (specific types of) CAT tools
Some issues about TMs
• Technical/practical issues
  ◦ different approaches: some CAT tools have a proprietary, stand-alone text editor, others are «integrated» (e.g. into a word processor), some recent ones are fully online
  ◦ proprietary vs. interchange formats
  ◦ no matches calculated below sentence level (e.g. at phrase level)
    - but the Concordance function is becoming standard
  ◦ criteria used to define similarity / matches
    - matching is calculated not on the basis of sentence or word meaning, but on the basis of character-string similarity
      TP: I bambini giocano in gruppo con il pallone
      FM1: I pampini giovano il grullo con il tallone (94% match)
      FM2: I bimbi si divertono giocando a calcio insieme (42% match)
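The point that matching looks at character strings rather than meaning can be checked on the three Italian sentences quoted in this slide. The sketch uses `difflib.SequenceMatcher` as an assumed stand-in for a TM tool's matcher, so the percentages it prints will differ from the 94% and 42% reported above; only the ranking matters here.

```python
from difflib import SequenceMatcher

def char_similarity(a, b):
    """Character-string similarity in [0, 1]; meaning plays no role."""
    return SequenceMatcher(None, a, b).ratio()

source = "I bambini giocano in gruppo con il pallone"
fm1 = "I pampini giovano il grullo con il tallone"      # similar spelling, nonsense
fm2 = "I bimbi si divertono giocando a calcio insieme"  # similar meaning, different wording

score_fm1 = char_similarity(source, fm1)
score_fm2 = char_similarity(source, fm2)
print(f"FM1: {score_fm1:.0%}   FM2: {score_fm2:.0%}")
```

The nonsensical FM1 outranks the meaningful paraphrase FM2, which is why retrieved fuzzy matches always need to be read critically.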
• Language/translation issues
  ◦ segmentation implies that overall perception of the ST/TT is lost → ST structure tends to be reproduced in TT
  ◦ cross-linguistic differences in e.g. cohesive patterns might be overlooked
  ◦ using TMs limits the translator’s creativity, as s/he is usually expected to use the terminology and phraseology included in the TM
  ◦ TMs can sometimes be reversed, as if translation direction did not matter…
  ◦ need to control the reliability of translations within TM
CORPORA AND TRANSLATION
What is a corpus? Some (authoritative) definitions
• “a collection of naturally-occurring language text, chosen to characterize a state or variety of a language” (Sinclair, 1991: 171)
• “a collection of texts assumed to be representative of a given language, dialect, or other subset of a language, to be used for linguistic analysis” (Francis, 1992: 7)
• “a closed set of texts in machine-readable form established for general or specific purposes by previously defined criteria” (Engwall, 1992: 167)
• “a finite-sized body of machine-readable text, sampled in order to be maximally representative of the language variety under consideration” (McEnery & Wilson, 1996: 23)
• “a collection of (1) machine-readable (2) authentic texts […] which is (3) sampled to be (4) representative of a particular language or language variety” (McEnery et al., 2006: 5)
What is / is not a corpus…?
• A newspaper archive on CD-ROM?
• An online glossary?
• A digital library (e.g. Project Gutenberg)?
• All RAI1 programmes (e.g. for spoken TV language)?
The answer is always “NO” (see definition)
Corpora vs. web
• Corpora:
  – Usually stable
    • searches can be replicated
  – Control over contents
    • we can select the texts to be included, or have control over selection strategies
  – Ad-hoc linguistically-aware software to investigate them
    • concordancers can sort/organise concordance lines
• Web (as accessed via Google or other search engines):
  – Very unstable
    • results can change at any time for reasons beyond our control
  – No control over contents
    • what/how many texts are indexed by Google’s robots?
  – Limited control over search results
    • cannot sort or organise hits meaningfully; from a linguistic point of view they are presented in no useful order
What types of corpora exist? A brief overview
• A corpus is a principled collection of naturally occurring electronic texts designed to be a representative sample of language in actual use
• Some of the main features and criteria used to describe and classify corpora:
  ◦ general vs. specialised
  ◦ written vs. spoken (transcribed) vs. multimodal (audio/video)
  ◦ balanced (sample) vs. opportunistic
  ◦ synchronic vs. diachronic
  ◦ static vs. dynamic
  ◦ closed / finite vs. open-ended (monitor)
  ◦ raw (pre-corpus) vs. marked-up (augmented), POS-tagged (augmented), annotated (augmented)
  ◦ monolingual vs. bi- / multilingual
  ◦ parallel vs. comparable
An example of planned balance: the British National Corpus
• 100 m words of contemporary spoken and written British English
• Representative of British English “as a whole”
• Designed to be appropriate for a variety of uses: lexicography, education, research, commercial applications (computational tools)
• Balanced with regard to genre, subject matter and style
• Sampling and representativeness very difficult to ensure
Dynamic (Monitor) vs static (Finite)
A static corpus will give a snapshot of language use at a given time
• Easier to control balance of content
• May limit usefulness, esp. as time passes
A dynamic corpus is ever-changing
• Called “monitor” corpus because it allows us to monitor language change over time
Concordance for nodeword “eyes” (sorted 1L) generated from the BNC
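A concordance sorted on the first word to the left (1L) of the node can be approximated in a few lines. This is a simplified sketch on invented sentences, not BNC data; the function names are assumptions, and real concordancers offer many more sorting and display options.

```python
import re

def kwic(text, node, width=30):
    """Return (left context, node, right context) for every occurrence of `node`."""
    lines = []
    for match in re.finditer(rf"\b{re.escape(node)}\b", text, flags=re.IGNORECASE):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append((left, match.group(), right))
    return lines

def first_left_word(line):
    """Sort key: the word immediately to the left of the node (1L)."""
    words = re.findall(r"\w+", line[0].lower())
    return words[-1] if words else ""

# Invented sample text.
sample = (
    "She closed her eyes. He rubbed his eyes and yawned. "
    "Blue eyes stared back. She opened her eyes again."
)

concordance = sorted(kwic(sample, "eyes"), key=first_left_word)
for left, node, right in concordance:
    print(f"{left:>30} [{node}] {right}")
```

Sorting on 1L groups the lines by left-hand collocate, which is the kind of ordering a web search engine cannot provide.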
Parallel vs. comparable multilingual corpora
Parallel (translational) corpora
• contain translationally “equivalent” texts: STs and their corresponding TTs
• need to be aligned, usually at the sentence level, i.e. SL sentence X matched to TL sentence X’
• context is provided to account for “equivalence” and “translation shifts” between ST and TT
• translation direction needs to be clear, i.e. which are SL and TL components of the corpus
Comparable corpora
• texts originally produced (not translated) in the respective languages
• consist of independent texts which are “similar” according to some pre-determined criteria
• the various language components share a set of common features, e.g. text type, genre, publication span, domain, topic
• parameters defining this similarity vary widely
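Sentence-level alignment is what makes a parallel corpus searchable in both languages at once. The sketch below models an aligned corpus as a list of (SL sentence, TL sentence) pairs and retrieves every pair whose source side contains the query word. It is a toy under stated assumptions: the two pairs reuse the EN-IT printer sentences from the TM slides, and the function name is invented.

```python
import re

# Toy sentence-aligned EN-IT corpus: (SL sentence, TL sentence).
aligned = [
    ("There are 4 ways to change print settings for this printer",
     "Ci sono 4 modi per cambiare le impostazioni di stampa di questa stampante"),
    ("There are several ways to modify the default setting of your printer",
     "Ci sono vari modi per modificare l’impostazione standard della tua stampante"),
]

def bilingual_concordance(pairs, query):
    """Return the aligned pairs whose SL sentence contains `query` as a whole word."""
    pattern = re.compile(rf"\b{re.escape(query)}\b", flags=re.IGNORECASE)
    return [(src, tgt) for src, tgt in pairs if pattern.search(src)]

for src, tgt in bilingual_concordance(aligned, "printer"):
    print(src)
    print("   ", tgt)
```

Because each hit comes with its aligned TL sentence, the translator sees candidate equivalents in context rather than in isolation.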
Bilingual parallel corpora on the web
• OPUS corpus, opus.lingfil.uu.se
  ◦ A variety of multilingual parallel corpora
  ◦ European Parliament debates (EuroParl corpus)
  ◦ European Central Bank corpus
  ◦ UN documents
  ◦ Subtitles (opensubtitle project)
  ◦ Software manuals (PHP, OO)
  ◦ …
[Screenshot: http://opus.lingfil.uu.se/ → EuroParl v7 search interface, with callouts: Choose SL, Choose TL(s), Query, Sort + Launch the query, Other useful functions, help]
Comparable Eng/Ita corpus on botany
Summing up: corpus use in translation
Main uses:
• Test/generate hypotheses as to interpretation of the source text, and as to appropriate translations
  ◦ helpful when you’re dealing with little known text-types / domains
  ◦ helpful when you’re dealing with a little known language
• Improve quality: capture subtleties of source text, produce translations which read like native speaker texts
More precisely,
• Reference corpora provide insights on phraseological regularities in discourse
• Comparable corpora (automatic and manual) can be used for (contrastive) specialised/genre-controlled text analysis
• Parallel corpora provide equivalents in context/evidence of translation strategies (and are more versatile than TMs)