Translation quality assessment redefined
TRANSLATION QUALITY ASSESSMENT REDEFINED: from TQI to competences and suitability
Demid Tishin
All Correct Language Solutions
www.allcorrect.ru
What are they thinking about when they look at the target text?
Client:
Will it blend?* Let’s find a flaw…
*Just a joke. “Will it do”, I mean
Quality manager:
Will it blend? I wish the client said OK…
HR / Vendor Manager:
What kind of work can I trust to this provider? What can I not?
How quickly can we train him?
Project Manager:
Return for improvement or correct by other resources?
To answer these questions, the target text needs assessment
TRANSLATION ASSESSMENT: THE ARMORY
What assessment techniques do you know?
TRANSLATION ASSESSMENT: THE ARMORY
Subjective assessment (“good / bad”)
Comparing with the source according to a parameter checklist
Automated comparison with a reference translation (BLEU etc.)
Weighing errors and calculating TQI
SUBJECTIVE ASSESSMENT (“GOOD / BAD”)
Pros:
- Speed
Cons:
- Results not repeatable
- Results not reproducible
- Difficult for client and service provider to arrive at the same opinion
- Impossible to give detailed reasons
- Tells nothing of provider’s abilities
COMPARING WITH THE SOURCE ACCORDING TO A PARAMETER CHECKLIST
Pros:
- Some reasoning for assessment results
Cons:
- Results not reproducible
- Difficult for client and service provider to arrive at the same opinion
- Results not repeatable
- Tells nothing of provider’s abilities
AUTOMATED COMPARISON WITH A REFERENCE TRANSLATION
The more word sequences correlate between the target and the reference, the better the translation
BLEU (BiLingual Evaluation Understudy), ROUGE, NIST, METEOR etc.
An overview of BLEU: Tomedes Blog http://blog.tomedes.com/measuring-machine-translation-quality/
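To make the idea concrete, here is a minimal Python sketch of such a metric: modified n-gram precision combined with a brevity penalty, in the spirit of BLEU but simplified (real BLEU uses n-grams up to 4 and supports multiple references):

```python
import math
from collections import Counter

def simple_bleu(candidate, reference, max_n=2):
    """Simplified BLEU: geometric mean of modified n-gram precisions
    (n = 1..max_n) multiplied by a brevity penalty."""
    cand = candidate.split()
    ref = reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        # Clip each candidate n-gram count by its count in the reference
        overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        total = sum(cand_ngrams.values())
        precisions.append(overlap / total if total else 0.0)
    if min(precisions) == 0:
        return 0.0
    # Brevity penalty punishes candidates shorter than the reference
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(math.log(p) for p in precisions) / len(precisions))
```

An identical candidate and reference score 1.0; a candidate sharing no words with the reference scores 0.0.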
AUTOMATED COMPARISON WITH A REFERENCE TRANSLATION
Pros:
- Speed
Cons:
- Does not account for individual style
- Limited scope (today limited to MT output)
- Does not correlate with human assessment
- A number of reference translations must be prepared before assessment (justified for batch assessment of different translations of the same source sample)
- Tells nothing of provider’s abilities
- How should the acceptability threshold be defined?
WEIGHING ERRORS AND CALCULATING TQI
Who? Lionbridge, Aliquantum, Logrus, All Correct Language Solutions … and many others
Publicly available techniques:
- SAE J2450
- ATA Framework for Standard Error Marking
- LISA QA Model 3.1
An overview of translation quality index techniques and guidelines to create your own: http://www.aliquantum.biz/downloads.htm
WEIGHING ERRORS AND CALCULATING TQI
What components you will need:
- Error classifier
- Error weighing guidelines
- Translation assessment guidelines, which yield repeatable and reproducible results
- Expert (competent and unambiguous)
- Assessment results form
WEIGHING ERRORS AND CALCULATING TQI
TQI (Translation Quality Index) is the usual practical result of translation quality measurement
ATA Framework: TQI = EP * (250 / W) - BP
SAE J2450: TQI = EP / W
LISA QA Model: TQI = (1 - EP / W) * 100
where EP = total Error Points
W = number of words in sample
BP = Bonus Points for outstanding translation passages (ATA), max. 3 points
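The three formulas translate directly into code. A minimal sketch (the function names are mine, not part of the standards):

```python
def tqi_ata(error_points, words, bonus_points=0):
    """ATA Framework: error points normalised to a 250-word sample,
    minus bonus points for outstanding passages (max. 3)."""
    return error_points * (250 / words) - min(bonus_points, 3)

def tqi_sae(error_points, words):
    """SAE J2450: weighted error points per word (lower is better)."""
    return error_points / words

def tqi_lisa(error_points, words):
    """LISA QA Model: percentage score (higher is better)."""
    return (1 - error_points / words) * 100

# Example: 5 error points in a 250-word sample
print(tqi_ata(5, 250))   # 5.0
print(tqi_lisa(5, 250))  # 98.0
```

Note that the three indices point in different directions: for SAE J2450 lower is better, for the LISA QA Model higher is better, so thresholds cannot be compared across schemes.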
WEIGHING ERRORS AND CALCULATING TQI
Pros:
- Results highly reproducible (SAE J2450)
- Results highly repeatable (SAE J2450)
- Detailed error classifier with explanations and examples (LISA QA Model)
- Easy to use for quality feedback to providers
- Convenient for grading providers according to their TQI for a specific project
- TQI is a simple numeric index, which you can store in a database and use in your balanced scorecard, KPIs, etc.
WEIGHING ERRORS AND CALCULATING TQI
Cons:
- Limited scope (SAE J2450)
- Low reproducibility of results (ATA Framework)
- A threshold of acceptable TQI is required (e.g. 94.5), while clients do not tolerate any explicitly stated imperfection
- Assessment is time-consuming (5-20 minutes per sample, provided that the expert has carefully studied the source)
- Subjective or underdeveloped error weight assignment – an attempt at forecasting error consequences (LISA QA Model)
- Tells very little of provider’s abilities
WEIGHING ERRORS AND CALCULATING TQI
Cons:
- Underdeveloped translation assessment guidelines, including but not limited to:
  - requirements for the translation sample (size, presence of terminology, etc.)
  - how to evaluate repeated typical (pattern) errors?
  - how to assess flaws in the target that are rooted in obvious flaws in the source?
  - how to prevent several minor errors from resulting in the same score as one major error?
  - how to handle obviously accidental errors that change the factual meaning?
WEIGHING ERRORS AND CALCULATING TQI
Cons:
- TQI is valid only for:
  - a specific subject field (e.g. gas turbines, food production, etc.)
  - a specific text type (Legal, Technical and Research, or Advertising and Journalism)
- A slight change in any of the above (subject, text type) means that one cannot forecast the provider’s TQI based on former evaluations → a new (tailored) assessment is required → unjustified expenses
TRANSLATION ASSESSMENT: THE ARMORY
None of the translation assessment methods answers the questions:
- Will it blend?
- What kind of work can I trust to this provider? What can I not?
- How quickly can we train him?
- Return for improvement or correct by other resources?
TRANSLATION ASSESSMENT: THE ARMORY
Translation assessment techniques need improvement!
IMPROVEMENT 1: TWO ERROR DIMENSIONS
Split all errors into 2 major groups:
- Factual = errors in the designation of objects and phenomena, their logical relations, and the degree of event probability / necessity
- Connotative = errors in conveying emotional and stylistic information, non-compliance with rules, standards, checklists, guidelines, etc.
IMPROVEMENT 1: TWO ERROR DIMENSIONS
That’s a factory (source)
That’s a restaurant (factual error)
That’s a damn fctory (2 connotative errors, though no factual errors)
IMPROVEMENT 1: TWO ERROR DIMENSIONS
Each text element (word, phrase, sentence, etc.) can contain:
- 1 connotative error, or
- 1 factual error, or
- 1 connotative and 1 factual error simultaneously
IMPROVEMENT 1: TWO ERROR DIMENSIONS
An accidental error (e.g. an obvious typo) that obscures factual information counts as two errors (e.g. language and factual).
You can at once give specific instructions to the provider (e.g. be more careful) and consider the client’s interest (e.g. absence of factual distortions, whatever the reason).
“To kill Edward fear not, good it is” / “To kill Edward fear, not good it is” (Isabella of France): an error in the comma → critical factual distortion
IMPROVEMENT 2: COMPETENCES
Map each error in the classifier to the competences that are required to avoid it
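A sketch of what such a mapping might look like in code; the error types and competence names below are illustrative placeholders, not the actual classifier:

```python
# Hypothetical fragment of an error-to-competence mapping
ERROR_TO_COMPETENCES = {
    "term confusion":       ["subject matter"],
    "mistranslation":       ["source language rules", "subject matter"],
    "target grammar error": ["target language rules"],
    "register violation":   ["target mode of expression"],
    "broken tag":           ["technical"],
    "typo":                 ["carefulness"],
}

def competences_involved(errors):
    """Collect the set of competences implicated by a list of found errors."""
    found = set()
    for error in errors:
        found.update(ERROR_TO_COMPETENCES.get(error, []))
    return found

print(sorted(competences_involved(["typo", "term confusion"])))
# ['carefulness', 'subject matter']
```

With such a table, every marked error in a sample automatically points at the competences the provider needs to strengthen.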
IMPROVEMENT 2: COMPETENCES
Competence types:
- Competences of acquisition
- Competences of production
- Auxiliary (general) competences
IMPROVEMENT 2: COMPETENCES
Competence levels:
- Unsatisfactory = provider cannot do the corresponding work
- Basic = can work
- Advanced = can revise and correct the work of others or train others
IMPROVEMENT 2: COMPETENCES
Competences of acquisition:
- Source language rules
- Source literary
- Source cultural
- Subject matter
IMPROVEMENT 2: COMPETENCES
Competences of production:
- Target language rules
- Target literary
- Target cultural
- Target mode of expression (= register, functional style)
IMPROVEMENT 2: COMPETENCES
Auxiliary (general) competences:
- Research
- Technical
- General carefulness, responsibility and self-organisation
- Communication (relevant for translation as a service, not the product)
- Information security (relevant for translation as a service, not the product)
IMPROVEMENT 2: COMPETENCES
The client can formulate precise and objective requirements for the provider
Assessment immediately shows which competences meet the required level and which don’t
IMPROVEMENT 3: WORKFLOW ROLES
Map each workflow role (e.g. translate, compile project glossary, revise language, etc.) to a set of required competences
IMPROVEMENT 3: WORKFLOW ROLES
Role: Can translate
Example of a competence set:
- Self-organisation = basic
- Subject matter = basic
- Source language rules = basic
- Target language rules = basic
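Assuming competence levels are ordered (unsatisfactory < basic < advanced), checking whether a provider qualifies for a role reduces to a level comparison. A hypothetical sketch:

```python
LEVELS = {"unsatisfactory": 0, "basic": 1, "advanced": 2}

# Role requirements from the example above
CAN_TRANSLATE = {
    "self-organisation": "basic",
    "subject matter": "basic",
    "source language rules": "basic",
    "target language rules": "basic",
}

def qualifies(provider_competences, role_requirements):
    """True if the provider meets or exceeds every required level;
    unassessed competences default to 'unsatisfactory'."""
    return all(
        LEVELS[provider_competences.get(comp, "unsatisfactory")] >= LEVELS[level]
        for comp, level in role_requirements.items()
    )

provider = {
    "self-organisation": "advanced",
    "subject matter": "basic",
    "source language rules": "basic",
    "target language rules": "basic",
}
print(qualifies(provider, CAN_TRANSLATE))  # True
```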
IMPROVEMENT 3: WORKFLOW ROLES
The Vendor Manager / Project Manager quickly assigns workflow roles and schedules the project → saves time
IMPROVEMENT 4: ERROR ALLOWABILITY
In each case the client indicates which error types are allowed and which are not. The expert puts down the client requirements in a list.
One “not allowed” error in the sample → text fails (client perspective)
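The resulting pass / fail check is trivial to automate. A minimal sketch, with illustrative error-type names:

```python
def passes(found_errors, disallowed_types):
    """Fail the text as soon as one disallowed error type appears."""
    return not any(e in disallowed_types for e in found_errors)

# Hypothetical client requirements: no factual distortions, no term confusion
disallowed = {"factual distortion", "term confusion"}

print(passes(["typo", "register violation"], disallowed))  # True
print(passes(["typo", "factual distortion"], disallowed))  # False
```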
IMPROVEMENT 4: ERROR ALLOWABILITY
Assessment corresponds to the real client needs (“pass / fail”)
IMPROVEMENT 5: PROVIDER TRAINABILITY
Single out 2 major error groups:
- Correcting the error requires minimum training / instructions; the provider can find and correct all errors of the type in his work himself
- Correcting the error requires prolonged training; the provider cannot find all his errors in the text
IMPROVEMENT 5: PROVIDER TRAINABILITY
Errors that require minimum training:
- the original order of text sections is broken
- broken cross-references
- text omissions
- numbers / dates do not correspond to the source
- glossary / style guide violated
- non-compliance with reference sources
- inconsistent terminology
- non-compliance with regional formatting standards
- broken tags, line length
- obvious language errors
- etc.
IMPROVEMENT 5: PROVIDER TRAINABILITY
Errors that require prolonged training:
- understanding the source
- confusion of special notions (subject competence)
- stylistic devices and expressive means (literary competence)
- cultural phenomena
- etc.
IMPROVEMENT 5: PROVIDER TRAINABILITY
What is the percentage of errors requiring minimum training?
The PM can instantly take a decision – return the product for further improvement or correct with other resources → saves time
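That decision rule can be sketched as a threshold on the share of easy-to-correct errors; the 70% threshold below is an arbitrary illustration, not part of the technique:

```python
def minimum_training_share(errors):
    """errors: list of (error_type, needs_only_minimum_training) pairs."""
    if not errors:
        return 1.0
    easy = sum(1 for _, minimal in errors if minimal)
    return easy / len(errors)

def pm_decision(errors, threshold=0.7):
    """Return the product to the same provider if most errors are easy
    for him to self-correct; otherwise correct with other resources."""
    if minimum_training_share(errors) >= threshold:
        return "return for improvement"
    return "correct by other resources"

errors = [("broken tag", True), ("typo", True), ("term confusion", False)]
print(pm_decision(errors))  # correct by other resources (2/3 easy, below 0.7)
```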
IMPROVEMENT 5: PROVIDER TRAINABILITY
If all errors influencing the competence are easy to correct, the competence is assessed in two ways (at once):
- Current state (“as is”)
- Potential state (after a short training)
NOTES
- The provider has to work in normal conditions (enough time, work instructions)
- The sample should be restricted to one main subject field according to the subject classifier
- The source text should be meaningful and coherent
- It is important to differentiate between errors and preferential choices
NOTES
- To assess a sample the expert has to possess all competences at the "advanced" level. As it is difficult to find such experts in reality, several people can be assigned to assess one sample (e.g. one assesses terminology, another assesses all the other aspects)
- Quality predictions for rush jobs cannot be based on normal competence assessment (as rush output quality is normally lower)
EXAMPLE
CONCLUSION
The new assessment model answers all the questions:
- Will it blend? – pass / fail
- What kind of work can I trust to this provider? What can I not? – competences and workflow roles
- How quickly can I train the provider? – potential competences
- Return for improvement or correct by other resources? – percentage of errors requiring minimum training
BENEFITS
- Provider and client speak the same “language” (error types and competences) → fewer debates
- Saves time when testing providers
- Simplifies planning of a minimum and sufficient workflow, optimizes resources
- Allows avoiding extra text processing stages when not necessary → better turnaround → more flexible budgets → higher rates → provider loyalty and a good image for the company
- Detailed feedback and training → provider loyalty
THE FUTURE OF THE TECHNIQUE
- Adjustment and testing
- Dedicated software tool
- Integration with QA tools
QUALITY MANAGEMENT PROCESS