Translation and adaptation issues influencing the normative interpretation of assessment instruments

Cross-Cultural Adaptation

The adaptation of assessment instruments for new target populations is generally required when the new target population differs appreciably from the original population with which the assessment device is used in terms of culture, country, or language. Most cross-cultural adaptations of assessment instruments involve the translation of an instrument from one language into another (Geisinger, 1994). In some instances, however, adaptations are needed even when the language remains the same, because the cultures or life experiences of those speaking the same language differ. Numerous instruments have been adapted for use with groups that differ experientially from those with whom the instrument was originally used (Geisinger, 1994).

Example: The Minnesota Multiphasic Personality Inventory-Adolescent (MMPI-A) is an adaptation of the MMPI-2 developed for use with adolescents, whose life experiences differ systematically from those of adults. Other measures have been developed with different versions for men and women, again on the basis of the generally different life experiences and resultant orientations of men and women (Butcher, 1985; Butcher & Pancheri, 1976).

In recent years, the term test adaptation has frequently replaced test translation. This shift in terminology reflects the changes in cultural references, content, and wording that are needed, beyond simple translation, when revising a test to meet the requirements of a circumstance that differs qualitatively from the original use of the assessment device. Few guidelines regarding test adaptation have been developed, and those that do exist have not been widely circulated. The International Test Commission, working in conjunction with the European Association of Psychological Assessment, the International Association of Applied Psychology, the International Association of Cross-Cultural Psychology, the International Association for the Evaluation of Educational Achievement, the International Language Testing Association, and the International Union of Psychological Science, has begun systematizing the procedures recommended in test adaptation (International Test Commission, 1993, p. 1).

Procedures for Adapting Assessment Measures for New Populations

Any time a measure is used with a population that differs qualitatively from the one for which it was originally developed, one must check its continued validity and usefulness in that new population, even if the test itself remains unchanged. When it is likely that a measure needs to be adapted for use with a new population, the onus on those adapting the measure to demonstrate its usefulness is even greater (Geisinger, 1994).

Does a Given Measure Need to Be Adapted?

An early decision to be made is whether an assessment device actually needs to be adapted for its new intended use. The answer is generally easy at the extremes. When a new target population is not likely to differ appreciably from the original population with whom the assessment device was used, it is unlikely that the measure needs to be adjusted for that new population. A measure developed with students from Pennsylvania State University, for example, would not likely need to be adapted for administration to students from the University of North Carolina. (However, if some items on the instrument were found to use local vernacular or to rely on information known only to residents of central Pennsylvania, then some minor accommodations would be needed.) At the other extreme, when one wishes to administer a measure to individuals who cannot communicate in the language in which the measure was originally written, the measure obviously must be translated and perhaps otherwise altered. Indeed, such a translation or adaptation must typically consider cultural as well as language differences between the original and target populations. Descriptions of adaptations of the MMPI for Turkey, Hong Kong, Greece, and Chile may be found in Savasir and Erol (1990), Cheung (1985), Manos (1985), and Rissetti and Maltes (1985), respectively; they serve as excellent examples of both language and cultural adaptations of that particular measure (Geisinger, 1994).

Steps for Adapting a Measure

The procedures provided below are intended to serve as general guidelines for adapting an assessment instrument to a new culture and language population. Others (e.g., Bracken & Barona, 1991; Butcher, in press-a) have provided lists of suggested steps for adapting assessment instruments from one language and culture to another. In some cases, all of the steps provided here would be required. In other situations, additional steps would be recommended, or some of the steps provided here might not be needed. Furthermore, in some instances, specific steps would need to be repeated in an iterative fashion. For example, if validation research failed to yield acceptable evidence supporting the use of the assessment instrument, the adaptation process might begin anew; alternatively, minor changes might be required, followed by renewed field testing, and so forth. Nevertheless, the steps detailed here provide a test translator with some common approaches to adapting a measure for a new target population (Mininel et al., 2012).

Cross-cultural adaptation procedures

The method followed international guidelines for studies of this kind, comprising the following steps: (a) initial translation, (b) synthesis of translations, (c) back translation, (d) evaluation by an expert committee, and (e) testing of the penultimate version. These steps made it possible to obtain conceptual, semantic, idiomatic, experiential, and operational equivalence, in addition to content validity (Mininel et al., 2012).

1. Translate and adapt the measure. Sometimes an instrument can be translated or adapted on a question-by-question basis; at other times, it must be adapted and translated only in concept. A personality inventory item that asks an adolescent whether he or she prefers going to watch a movie or attending a dance will make little sense to an individual from a non-Western, developing country. The choice presented in the item must therefore be adapted culturally so that it is both a meaningful decision and one comparable to the item as responded to by members of the original population. Those translating or adapting an assessment instrument must meet a rigorous set of requirements: they must be fluent in both languages, extremely knowledgeable about both cultures, and expert in the characteristics and content measured by the instrument and the uses to which it will be put. Back translation was once the technique of choice in test translation. In this procedure, an original translator would render items from the original version of the instrument into the second language, and a second translator, one not familiar with the instrument, would translate the instrument back into the original language. This second translation would then be compared with the original version of the instrument. Unfortunately, it has been found that, when translators knew their work was going to be subjected to back translation, they would use wording that ensured a second translation would faithfully reproduce the original version, rather than wording that was optimal in the target language (Hambleton, 1993).
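
As a concrete illustration of the comparison step in back translation, the sketch below screens item pairs whose back translation diverges sharply from the original wording and flags them for human review. This is a minimal sketch in Python with a hypothetical helper name and threshold, not a published procedure; a low similarity score is only a prompt for review, since legitimate paraphrase also lowers it, which is exactly why the panel review in Step 2 remains essential.

```python
from difflib import SequenceMatcher

def flag_items_for_review(originals, back_translations, threshold=0.6):
    """Flag item pairs whose surface similarity after back translation
    falls below `threshold`. A low ratio is a prompt for human review,
    not proof of a bad translation: paraphrase is often desirable."""
    flagged = []
    for idx, (orig, back) in enumerate(zip(originals, back_translations)):
        ratio = SequenceMatcher(None, orig.lower(), back.lower()).ratio()
        if ratio < threshold:
            flagged.append((idx, round(ratio, 2)))
    return flagged

# Item 1 drifted in meaning during translation and gets flagged.
items = ["I prefer going to the movies to attending a dance.",
         "I often feel tense before examinations."]
back = ["I prefer going to the movies rather than to a dance.",
        "Tests rarely worry me."]
print(flag_items_for_review(items, back))
```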

2. Review the translated or adapted version of the instrument. Rather than perform a back translation, a more effective technique for ensuring that the translation or adaptation has been conducted appropriately is to have a group of individuals meeting the same criteria as the test translator (as described in Step 1) carefully review the quality of the translation or adaptation. This editorial review could be accomplished in a group meeting, through individual reviews by the group members (much like reviews of journal articles), or through some combination thereof. Perhaps the most effective strategy would be to have the individuals composing the panel (a) review the items and react in writing, (b) share their comments with one another, and (c) meet to consider the points made by each other and to reconcile any differences of opinion.

3. Adapt the draft instrument on the basis of the comments of the reviewers. The original test translator or adapter needs to consider the comments made by the panel of experts described in Step 2. Such deliberations should occur only after the panel has met, without the translator or adapter present, to discuss its concerns. The translator or adapter can then meet with the panel to explain the reasons for drafting the instrument in the manner used, and the panel can explain why they reacted to the draft as they did. Through this discursive process, the final instrument will reflect the best judgment of the entire group (Geisinger, 1994).

4. Pilot test the instrument. A few trial administrations of the assessment device should be made simply to learn of the potential problems faced by those responding to it. At this time, a small sample of individuals comparable to the eventual target population should be identified; administered the assessment device; and then interviewed as to the understandability of the instructions, the acceptability of the time limits, the wording of the items, and so on. The instrument should be changed in light of these early findings (Geisinger, 1994).

5. Field test the instrument. Once pilot testing has been performed and the instrument revised accordingly, it is ready to be field tested. At this stage, the instrument is administered to a large sample representative of the eventual population to be assessed. (This stage can also be performed in several waves, each involving a specific data-collection project.) A number of initial analyses should be performed with the resulting data. Internal consistency reliability analyses should be performed, for example, and, if possible, test-retest reliability analyses should also be included in this phase of the adaptation process. Traditional or item response theory item analyses should be performed. For all of these analyses, comparisons should be made with the corresponding results for the instrument in the original language or culture. It is also possible to compare performance on individual items with data collected with the original instrument in its country of origin. Especially useful with cognitive-type assessments, this set of procedures is called differential item functioning (DIF) analysis and was once called item bias detection. (The use of these procedures in cross-cultural testing situations may be found in O'Brien, 1992, or van de Vijver & Poortinga, 1991; for a brief discussion of some of the techniques themselves, see Cole & Moss, 1989.)
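
To make two of the analyses named in this step concrete, the sketch below computes coefficient alpha (an internal consistency estimate) and a Mantel-Haenszel common odds ratio for a single dichotomously scored item, stratifying respondents on total score. This is a minimal sketch assuming a 0/1 item-score matrix and a two-group design; the function names are illustrative, and an operational DIF analysis would add continuity corrections, significance tests, and purified matching criteria.

```python
import numpy as np

def cronbach_alpha(scores):
    """Coefficient alpha for an (n_respondents, n_items) score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_var = scores.var(axis=0, ddof=1).sum()  # sum of item variances
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1.0 - item_var / total_var)

def mantel_haenszel_odds_ratio(scores, group, item):
    """Mantel-Haenszel common odds ratio for one 0/1 item, stratifying
    on total score. Values near 1 suggest little DIF; values far from 1
    suggest the item functions differently in the two groups."""
    scores = np.asarray(scores)
    group = np.asarray(group)   # 0 = reference group, 1 = focal group
    totals = scores.sum(axis=1)
    num = den = 0.0
    for s in np.unique(totals):  # one stratum per observed total score
        m = totals == s
        a = np.sum((group[m] == 0) & (scores[m, item] == 1))  # ref right
        b = np.sum((group[m] == 0) & (scores[m, item] == 0))  # ref wrong
        c = np.sum((group[m] == 1) & (scores[m, item] == 1))  # focal right
        d = np.sum((group[m] == 1) & (scores[m, item] == 0))  # focal wrong
        n = a + b + c + d
        if n > 0:
            num += a * d / n
            den += b * c / n
    return num / den if den else float("nan")
```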

6. Standardize the scores and, if desirable and appropriate, equate them with scores on the original version. It is sufficient here to state that most instruments translated or adapted for use with a new population, unless they are used purely as research measures (especially unpublished ones), present their scores on a standard scale such as the T scale (mean of 50, standard deviation of 10). Using data from the field testing, if the sample is large enough (probably at least 750 to 1,000 participants, depending on how many subgroups in the population need adequate representation), it is probably possible to perform a standardization. If the sample is not large enough or not fully representative, then a new standardization sample must be collected. The analytic procedures involved in such processes are straightforward and are provided in Petersen, Kolen, and Hoover (1989).
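
A minimal sketch of the T-scale conversion mentioned above, assuming raw scores and a normative sample are available as NumPy arrays. The linear version simply rescales to a mean of 50 and a standard deviation of 10 in the norm group; the normalized (area) version maps percentile ranks through the inverse normal distribution, a common choice when the raw-score distribution is skewed.

```python
import numpy as np
from scipy.stats import norm

def linear_t_scores(raw, norm_sample):
    """Linear T scores: mean 50, SD 10 in the normative sample."""
    mu = np.mean(norm_sample)
    sd = np.std(norm_sample, ddof=1)
    return 50 + 10 * (np.asarray(raw, dtype=float) - mu) / sd

def normalized_t_scores(raw, norm_sample):
    """Area-transformation T scores: percentile ranks in the normative
    sample mapped through the inverse normal distribution."""
    sorted_norm = np.sort(np.asarray(norm_sample, dtype=float))
    pct = np.searchsorted(sorted_norm, raw, side="right") / len(sorted_norm)
    pct = np.clip(pct, 0.001, 0.999)  # keep tail scores finite
    return 50 + 10 * norm.ppf(pct)
```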

7. Perform validation research as appropriate. Research is needed to document that (a) the assessment device measures the same qualities in both languages or versions and (b) the new version continues to provide scores that are interpretable in the manner proposed (Geisinger, 1994).

8. Develop a manual and other documents for the users of the assessment device. "Test developers have a responsibility to provide evidence regarding reliability and validity for stated test purposes, as well as manuals and norms, when appropriate, to guide proper interpretation" (American Educational Research Association et al., 1985, p. 25). Simply adapting a measure for use in a second setting, no matter how much it might be needed in that setting, is not professionally acceptable. Documents attesting to the value of using the adapted instrument and describing its appropriate use are a required part of the adaptation of any measure for professional use (Geisinger, 1994).

9. Train users. Standard 6.6 of the Standards for Educational and Psychological Testing states that "responsibility for test use should be assumed by or delegated only to those individuals who have the training and experience necessary to handle this responsibility in a professional and technically adequate manner. Any special qualifications for test administration or interpretation noted in the manual should be met" (American Educational Research Association et al., 1985, p. 42).

10. Collect reactions from users. As one implements the newly revised instrument, it is frequently propitious to gather comments from those actually using it. Such comments can guide future changes in the instrument and suggest avenues for needed research. It is also appropriate to attempt to determine whether the revised assessment device is being misused or misinterpreted by users (Geisinger, 1994).

What Do Scores on the Adapted Measure Mean?

The interpretation of any score on an assessment device depends on many kinds of information. Clearly, validation information (including information on fairness and bias; see Geisinger, 1992a) and reliability information are critical to test interpretation. Normative information tells a professional where an individual falls within a population distribution of test takers, permitting the professional to interpret the likely meaning of the score. This interpretive meaning of test scores is especially useful for tests with well-established validation information and norms (Geisinger, 1994).
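
As a toy illustration of normative placement, the sketch below returns the percentage of a normative sample scoring at or below a given raw score; the simulated data stand in for a real norm table and are purely hypothetical.

```python
import numpy as np

def percentile_rank(score, norm_scores):
    """Percentage of the normative sample scoring at or below `score`."""
    norm_scores = np.sort(np.asarray(norm_scores, dtype=float))
    below = np.searchsorted(norm_scores, score, side="right")
    return 100.0 * below / len(norm_scores)

# Hypothetical norm group: where does a raw score of 31 fall?
rng = np.random.default_rng(0)
norms = rng.normal(loc=25, scale=5, size=1000)  # simulated norm sample
print(f"Raw 31 is at the {percentile_rank(31, norms):.1f}th percentile")
```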

Example: A professional admissions counselor at a university can look at an individual's scores on the SAT or a similar measure and approximate the probability that the prospective student will succeed at the university. Similarly, an informed admissions committee making decisions on applications for graduate study in psychology can consider applicants' Graduate Record Examination scores in light of accumulated (validation) information on how former applicants have fared in graduate study, using both local norms (whether formally computed or not) and the examination's national norms. Clinicians use scores from well-established measures such as the MMPI in much the same way. Experienced test users can make highly skilled interpretations on the basis of the proper test information, especially if they can also gather the other information needed to fine-tune their judgments. Merely translating a test and using the same scoring algorithm may not lead to meaningful scores, however, even when the interpretation of specific scores is well established in the original language, nation, or culture. Cultural and other differences between the original and target populations, as well as language differences across the two forms of the assessment device, may make it substantially less meaningful to treat the same scores as having identical meaning across test forms and cultures (Lu, 1967, as quoted by Cheung, 1985, p. 131).

References

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1985). Standards for educational and psychological testing. Washington, DC: American Psychological Association.

American Psychological Association (Producer). (1974-1992). Psychological abstracts. (From PsycLIT, Version 3.11 [CD-ROM]. Washington, DC: Producer. Boston: Silver Platter International [Distributor].)

Angoff, W. H. (1971). Scales, norms and equivalent scores. In R. L. Thorndike (Ed.), Educational measurement (2nd ed., pp. 508-600). Washington, DC: American Council on Education.

Ben-Porath, Y. S. (1990). Cross-cultural assessment of personality: The case for replicatory factor analysis. In J. N. Butcher & C. D. Spielberger (Eds.), Advances in personality assessment (Vol. 8, pp. 1-26). Hillsdale, NJ: Erlbaum.

Butcher, J. N. (1985). Current developments in MMPI use: An international perspective. In J. N. Butcher & C. D. Spielberger (Eds.), Advances in personality assessment (Vol. 4, pp. 83-94). Hillsdale, NJ: Erlbaum.

Cheung, F. M. (1985). Cross-cultural considerations for the translation and adaptation of the Chinese MMPI in Hong Kong. In J. N. Butcher & C. D. Spielberger (Eds.), Advances in personality assessment (Vol. 4, pp. 131-158). Hillsdale, NJ: Erlbaum.

Geisinger, K. F. (1994). Cross-cultural normative assessment: Translation and adaptation issues influencing the normative interpretation of assessment instruments. Psychological Assessment, 6(4), 304.

International Test Commission, European Association of Psychological Assessment, International Association of Applied Psychology, International Association of Cross-Cultural Psychology, International Association for the Evaluation of Educational Achievement, International Language Testing Association, & International Union of Psychological Science. (1993). Standards for adapting instruments and establishing score equivalence. Manuscript in preparation.

Mininel, V. A., Felli, V. E. A., Loisel, P., & Marziale, M. H. P. (2012). Cross-cultural adaptation of the Work Disability Diagnosis Interview (WoDDI) for the Brazilian context. Revista Latino-Americana de Enfermagem, 20, 27-34.

Pennock-Roman, M. (1990). Test validity and language background: A study of Hispanic-American students at six universities. New York: College Entrance Examination Board.

Pennock-Roman, M. (1992). Interpreting test performance in selective admissions for Hispanic students. In K. F. Geisinger (Ed.), Psychological testing of Hispanics (pp. 99-135). Washington, DC: American Psychological Association.