Journal

Developmental Psychology, 1977, Vol. 13, No. 5, 535-536

Kohlberg's Moral Judgment Scale: Some Methodological Considerations

KENNETH H. RUBIN AND KRISTIN T. TROTTER
University of Waterloo

Forty children in Grades 3 and 5 were administered the first three dilemmas of Kohlberg's moral judgment scale. The children were divided into two groups. The first group received the scale 2 weeks after first administration. The second group received a multiple-choice variant of the scale. Data analyses revealed low test-retest reliability for scores attained on the three dilemmas together as well as individually. Scores attained on items within each dilemma were intercorrelated and found to be low and generally nonsignificant. Reliability coefficients of internal consistency were .77, .73, and .82 for the 3 dilemmas, respectively, and .78 for the total scale. Children who received the multiple-choice variant of the scale scored at significantly higher moral levels than did those who received the typical verbal production version of the scale.

Kohlberg (Note 1) has described the development of moral thought and knowledge as a process in which individuals pass through six qualitatively different stages in a universal and invariant sequential fashion. The method by which Kohlberg has assessed an individual's stage of moral development, the Moral Judgment Scale, generally consists of presenting a subject with a series of moral dilemmas and asking him/her to verbally resolve the dilemmas. On the basis of these responses the subject is assigned a moral judgment level. Recently, Kurtines and Greif (1974) questioned the reliability and validity of the Moral Judgment Scale. These authors noted a relative lack of data concerning both the test-retest reliability and the consistency of an individual's moral judgment stage from one dilemma to the next. Moreover, there is evidence that projective test scores are influenced by the subject's verbal facility. For example, according to Rest (1976), a person can recognize and discriminate an idea before he can spontaneously verbalize the idea in response to a story dilemma. As a result, a child's level of moral judgment may be underestimated when assessed verbally. The purpose of the present study was to consider each of the aforementioned methodological issues.

Requests for reprints should be sent to Kenneth H. Rubin, Department of Psychology, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1.

The subjects were 40 Caucasian, middle-class children attending Grades 3 and 5 in southwestern Ontario. The Grade 3 children (M age = 9 years 1 month) included 15 males and 7 females. The Grade 5 subjects (M age = 11 years 2 months) included 9 males and 9 females. To avoid possible administration problems associated with the longer interview procedure (Kurtines & Greif, 1974), only the first three of Kohlberg's (Note 1) original dilemmas were administered to each child by the second author. These dilemmas were designed to investigate attitudes towards "life and punishment" (Heinz), "contract and personal relationship" (Joe and his father), and "property and conscience" (Bob and Karl). The most frequently occurring stage in response to the questions across all three dilemmas was considered to be the subject's dominant, overall stage of moral judgment (global scoring). In addition, the most frequently occurring stage within each dilemma was calculated. Transcripts of 20 of the initial interviews were given to the first author to assess interjudge agreement. Agreement for the scoring of global moral judgment levels was 85%. Following initial testing, the subjects were grouped into matched pairs according to both their ages and levels of moral judgment. This was done to assure the initial comparability of groups prior to assessing subsequent performance on verbal versus forced choice moral judgment tests. Statistical t-tests were calculated to determine the initial equivalence of the two samples on each of the three dilemmas. As expected, the results of the t-tests were nonsignificant. Children's moral levels on the three dilemmas ranged from 1 to 3B.
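The global and per-dilemma scoring rule described above (take the most frequently occurring stage) can be illustrated with a minimal sketch. The response data below are hypothetical and are not taken from the study; the handling of ties (the stage encountered first wins) is likewise an assumption, since the report does not say how ties were resolved.

```python
from collections import Counter


def modal_stage(stages):
    """Return the most frequent stage label.

    Ties go to the label encountered first; this tie rule is an
    assumption, not something stated in the report.
    """
    if not stages:
        raise ValueError("no scored responses")
    return Counter(stages).most_common(1)[0][0]


# Hypothetical scored responses for one child, grouped by dilemma.
responses = {
    "Heinz": ["2", "2", "3B", "2"],
    "Joe and his father": ["2", "3B", "3B"],
    "Bob and Karl": ["3B", "2", "2"],
}

# Modal stage within each dilemma.
per_dilemma = {name: modal_stage(s) for name, s in responses.items()}

# Global scoring: modal stage over all questions from all three dilemmas.
global_stage = modal_stage([s for scores in responses.values() for s in scores])

print(per_dilemma)
print(global_stage)
```

In this made-up case the child's modal stage differs between dilemmas while the global score reports a single stage, which is the kind of masking the later analyses examine.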

To assess test-retest reliability one group (Group 1) was given the same Moral Judgment Scale 2 weeks after initial testing. The second group (Group 2) was administered a multiple choice variant of the scale. For Dilemma 1, only Kohlberg's (Note 1) Questions 1, 3, and 6 were included in the multiple choice format, and comparative statistics between those who took this test and the verbal production format were performed only for these questions. Similarly, Questions 1, 3, and 5 were chosen for Dilemma 2 and Questions 1, 3, 5, and 8a were chosen for Dilemma 3. The scoring system for the verbal production versus forced choice comparison went as follows. Each verbally produced answer was assigned a moral judgment level. Moreover, each forced choice question was followed by 5 answers, representative of the first 5 levels of moral judgment but randomly arranged on the answer sheet. These alternatives derived from Kohlberg's (Note 1) scoring forms, which were based upon actual subject responses.

The stages at which Grade 3 and 5 children fell for the second testings ranged from 1 to 3B for the verbally produced answers and from 1 to 4B for the multiple choice answers. There were no statistically significant differences between the levels of the two age groups. All data were pooled for further analysis. Pearson product-moment correlation coefficients were calculated between the global moral levels from Week 1 to Week 3 for Group 1. A statistically significant relationship was found when the level of moral judgment was calculated as the modal response level across all three dilemmas, r(19) = .44, p < .05. While this coefficient is statistically significant, by conventional standards it actually represents a low test-retest reliability given the small sample size and the relatively short period between tests. Perhaps had all nine Kohlberg dilemmas been utilized, the reliability may have been greater. Such a study remains to be carried out. Statistically significant test-retest coefficients were found for the levels calculated individually for Dilemmas 2, r(19) = .39, p < .09, and 3, r(19) = .62, p < .003.
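As a rough check on the reported coefficients, a Pearson r can be converted to a t statistic with t = r * sqrt(df / (1 - r^2)). The sketch below assumes that the number in parentheses (19) is the degrees of freedom and uses the standard two-tailed .05 critical value of t for 19 degrees of freedom (2.093); neither the function name nor the labels come from the report.

```python
import math


def t_from_r(r, df):
    """t statistic for testing a Pearson correlation against zero."""
    return r * math.sqrt(df / (1.0 - r * r))


# Two-tailed .05 critical value of t for df = 19 (standard table value).
T_CRIT_05_DF19 = 2.093

# Coefficients as reported in the text; df = 19 is taken at face value.
reported = {"Global": 0.44, "Dilemma 2": 0.39, "Dilemma 3": 0.62}

for label, r in reported.items():
    t = t_from_r(r, 19)
    exceeds = abs(t) > T_CRIT_05_DF19
    print(f"{label}: r = {r:.2f}, t(19) = {t:.2f}, exceeds .05 cutoff: {exceeds}")
```

Under these assumptions the global and Dilemma 3 coefficients clear the .05 cutoff, while the Dilemma 2 coefficient does not, which is consistent with the p < .09 value reported for it.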

While the separate issues of "life and punishment," "contract and personal relationship," and "property and conscience" are considered in Dilemmas 1, 2, and 3, respectively, the global method of scoring may obscure the possibility of the attainment of different levels of moral development given different questions across and within dilemmas. To examine this possibility, an average item intercorrelation was calculated for the questions within each individual dilemma for all 40 pretest subjects. These Pearson product-moment correlations were .25 (10 questions), .23 (9 questions), and .29 (11 questions) for the three dilemmas, respectively, and .11 for the total scale. Internal consistency was calculated by using Guilford's (1956, p. 463) modification of the Spearman-Brown formula. The reliability coefficients were .77, .73, and .82 for the three dilemmas, respectively, and .78 for the total scale. Since the number of items involved in each dilemma was small, the coefficients may actually be overestimates of dilemma reliability (Guilford, 1956). In addition, correlation coefficients calculated between the moral levels attained for each of the three dilemmas were all nonsignificant. Thus, the global method of scoring, which is based on the assumption that moral knowledge is a unitary construct, may be inadequate as an index of moral development.
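The relation between the average item intercorrelations and the reliability coefficients can be sketched with the generalized Spearman-Brown formula, k * r / (1 + (k - 1) * r), where r is the mean inter-item correlation and k the number of items. This is an illustration only: it assumes that Guilford's modification reduces to this form and that the total scale has 30 items (10 + 9 + 11), neither of which is stated in the report.

```python
def spearman_brown(mean_r, k):
    """Reliability of a k-item composite from the mean inter-item correlation."""
    return k * mean_r / (1.0 + (k - 1) * mean_r)


# (label, mean inter-item r as reported, number of questions).
# The 30-item total is an assumption: 10 + 9 + 11 questions.
reported = [
    ("Dilemma 1", 0.25, 10),
    ("Dilemma 2", 0.23, 9),
    ("Dilemma 3", 0.29, 11),
    ("Total scale", 0.11, 30),
]

estimates = {name: round(spearman_brown(r, k), 2) for name, r, k in reported}

for name, value in estimates.items():
    print(f"{name}: {value:.2f}")
```

With the rounded inputs this gives .77, .73, and .82 for the three dilemmas, matching the reported values, and .79 for the total scale against the reported .78; the small gap may simply reflect rounding of the mean correlation. The calculation also shows how modest inter-item correlations yield reliabilities in the .70s and .80s once they are stepped up over 9 to 11 items.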

Finally, a series of t-tests was conducted between Group 2 means for the verbal production versus the multiple choice tasks for each dilemma question as well as for each of the total dilemma levels. All t-tests were significant at the p < .001 level,¹ indicating that the subjects' multiple choice scores were consistently higher than the verbal production scores. Similarly, when the Group 2 multiple choice responses were compared to the Group 1 verbally produced second test responses, the former means were all found to be significantly higher (p < .001) than the latter means. In conclusion, the present study describes a number of deficiencies with the Moral Judgment Scale. Future research of a large scale nature would do well to consider the psychometric properties of this measure.

¹ All t-test and mean data are available from the first author.

REFERENCE NOTE

1. Kohlberg, L. Instructions for standard scoring, Form A. Unpublished manuscript, Harvard University, 1973.

REFERENCES

Guilford, J. P. Fundamental statistics in psychology and education. New York: McGraw-Hill, 1956.

Kurtines, W., & Greif, E. B. The development of moral thought: Review and evaluation of Kohlberg's approach. Psychological Bulletin, 1974, 81, 453-470.

Rest, J. R. New approaches in the assessment of moral judgment. In T. Lickona (Ed.), Moral development and behavior. New York: Holt, Rinehart, & Winston, 1976.

(Received January 14, 1977)
