1973, 6, 517-531

METHODOLOGICAL AND ASSESSMENT CONSIDERATIONS IN EVALUATING REINFORCEMENT PROGRAMS IN APPLIED SETTINGS¹

ALAN E. KAZDIN

THE PENNSYLVANIA STATE UNIVERSITY

The extensive use of reinforcement programs in applied settings has led to experimentation that often fails to consider potential problems in design. The logic of the within-subject design is reviewed and specific designs employed in reinforcement programs are discussed. For each design (ABAB, or multiple-baseline design across behaviors, individuals, or situations), effects are discussed that make that design less powerful with respect to demonstrating the effect of the experimental variable. Problems in interpreting results of experiments in this area of inquiry are evaluated from the standpoint of internal and external validity. The issue of control groups is presented with considerations as to situations that require their use. Finally, the assessment strategy for evaluating operant programs is discussed and recommendations are made for measurement of behaviors in addition to the target response.

The application of reinforcement systems to various populations in treatment and educational settings has proliferated in recent years (Bandura, 1969; Kazdin and Bootzin, 1972; O'Leary and Drabman, 1971). In spite of the apparent success of programs applying contingent social and/or token reinforcement, the evaluation of such programs, in many instances, has revealed a failure to recognize certain methodological factors that may influence the results of experiments or their interpretation. The present paper attempts to discuss certain issues of experimental design that need to be considered in evaluating behavior modification (operant conditioning) programs. The discussion is restricted in general to those studies specifically evaluating reinforcement programs because they usually employ a within-subject experimental design.

Studies have evaluated reinforcement programs in a variety of settings, including psychiatric hospitals (Ayllon and Azrin, 1965), classrooms (Wolf, Giles, and Hall, 1968), sheltered workshops (Zimmerman, Stuckey, Garlick, and Miller, 1969), institutions (Burchard, 1967), home-style treatment facilities (Phillips, 1968), the home (Wahler, 1969), and several others. The design employed in such studies is referred to as the intrasubject replication design (Sidman, 1960). A brief conspectus of the rationale behind this design provides background for the methodological issues that follow. The reader is referred to several treatises with excellent descriptions of design in this area of inquiry (Baer, 1968, 1971; Baer, Wolf, and Risley, 1968; Bijou, Peterson, and Ault, 1968; Bijou, Peterson, Harris, Allen, and Johnston, 1969; McNamara and MacDonough, 1972; Risley, 1970; Risley and Baer, 1973; Risley and Wolf, 1972; Sidman, 1960; Thoresen, 1972; Wolf and Risley, 1971).

¹The author gratefully acknowledges K. Daniel O'Leary for his reading of the manuscript. Monographs of this article are available for $1.00 from the Business Office of the Journal of Applied Behavior Analysis, Department of Human Development, University of Kansas, Lawrence, Kansas 66044. Ask for Monograph #3.

Briefly, the basic logic of the design is to determine operations that relate functionally to the performance of behavior. The effect of a variable (e.g., contingent praise) on behavior is demonstrated by the consecutive presentation, removal, and re-presentation of the variable to a subject. Control over a behavior is demonstrated if the behavior can be altered at will by altering the experimental operations. This research strategy is contrasted with the between-group approach, which seeks to demonstrate group differences after manipulation of the independent variable(s), usually in a single session. In that approach, the data are subjected to statistical evaluation, and the focus is on mean differences instead of the behavior of individual subjects.

NUMBER 3 (FALL 1973) JOURNAL OF APPLIED BEHAVIOR ANALYSIS

In the within-subject design, the effects of the experimental variables may be observed immediately in response rates. Under such circumstances, it has been recommended that variables be examined in an "improvised and rapidly changing design" (Skinner, 1969, p. 112), with the goal of achieving control over behavior. This approach bypasses the variability due to intersubject differences, which is built into the design of between-group experiments. This is a desirable feature from several standpoints. First, intersubject variability is not a feature of the behavioral process of the individual subject, but is an effect due in part to the method of study. Second, in evaluating experimental findings in traditional group designs, intersubject variability serves as the base for statistical evaluation of the results; because of this, as Sidman (1960) noted, lawful effects of variables may be obscured. A related problem is that averages from group data usually have no analogue in the behavioral process of individuals. The form or shape of the function obtained with group data does not necessarily represent the behavior change process of the individual. Several subjects in the group may be affected differently by the experimental manipulation, and this is obscured in the between-group analysis. Having briefly outlined the rationale for the within-subject approach, we can now identify the types of these designs in use and the sources of problems specific to each.
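The point that a group function need not resemble any individual function can be made concrete with a small Python sketch. The data are invented for illustration and are not drawn from any study cited here: three hypothetical subjects each shift abruptly from a low to a high response rate, but at different sessions.

```python
# Hypothetical illustration: every subject changes abruptly, each at a different
# session, yet the averaged curve suggests a gradual change that no subject showed.

def step_series(n_sessions: int, change_at: int, low: float, high: float) -> list[float]:
    """Response rate that jumps from `low` to `high` at session index `change_at`."""
    return [low if s < change_at else high for s in range(n_sessions)]

def group_mean(series: list[list[float]]) -> list[float]:
    """Session-by-session mean across subjects (the between-group summary)."""
    return [sum(values) / len(values) for values in zip(*series)]

def distinct_levels(values: list[float]) -> int:
    """Number of different response levels appearing in a record."""
    return len(set(values))

subjects = [
    step_series(8, change_at=2, low=10.0, high=40.0),
    step_series(8, change_at=4, low=10.0, high=40.0),
    step_series(8, change_at=6, low=10.0, high=40.0),
]
mean_curve = group_mean(subjects)

for i, record in enumerate(subjects, start=1):
    print(f"subject {i}: {record}")
print("group mean:", mean_curve)
```

Each individual record has only two levels, while the averaged record climbs through four, a staircase that describes none of the subjects.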

Specific Designs

The first design is frequently referred to as the ABAB design (where A refers to baseline conditions, and B refers to the experimental condition). Other names for the design include reversal technique (Baer et al., 1968), intrasubject replication design (Sidman, 1960), equivalent time-samples (Campbell and Stanley, 1963), and intensive design (Chassan, 1967). The design employs alternate presentations of the baseline and experimental conditions within a subject or group of subjects. Several variations may be used in this design. More than one experimental condition may be presented before the second baseline (or reversal) phase. For example, O'Leary, Becker, Evans, and Saudargas (1969) included several experimental phases in evaluating the effect of token reinforcement on classroom deportment. Separate phases introduced rules alone, educational structure, praise for appropriate and extinction of inappropriate behavior, and token reinforcement. After the token phase, a reversal of conditions was effected. The reversal condition (usually a return to baseline) is an essential ingredient in this design: only such a reversal can demonstrate that behavior changes only when the experimental condition is in effect. This design is quite powerful and rules out several alternative explanations that may account for behavior change.
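The ABAB comparison can be sketched in a few lines of Python. The session values below are invented, and the decision rule (each phase mean must move in the predicted direction relative to the phase before it) is a deliberately simple stand-in for inspecting graphed data, not a procedure given in this paper. The sketch assumes the target behavior is expected to increase under the experimental condition B.

```python
from statistics import mean

def phase_means(sessions: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Collapse (phase_label, rate) records into consecutive phase means.

    A new phase starts whenever the label changes, so an A, B, A, B sequence
    yields four entries even though only two labels are used.
    """
    phases: list[tuple[str, list[float]]] = []
    for label, rate in sessions:
        if not phases or phases[-1][0] != label:
            phases.append((label, []))
        phases[-1][1].append(rate)
    return [(label, mean(rates)) for label, rates in phases]

def shows_abab_control(sessions: list[tuple[str, float]]) -> bool:
    """True if behavior rises with each B phase and falls with each return to A.

    Hypothetical helper; assumes B is expected to increase the behavior.
    """
    means = phase_means(sessions)
    for (_, prev_mean), (label, m) in zip(means, means[1:]):
        if label == "B" and not m > prev_mean:
            return False
        if label == "A" and not m < prev_mean:
            return False
    return True

# Hypothetical percentages of intervals with appropriate behavior.
data = ([("A", r) for r in (20, 25, 22)] + [("B", r) for r in (60, 70, 65)]
        + [("A", r) for r in (30, 28, 26)] + [("B", r) for r in (68, 72, 70)])
print(phase_means(data))
print(shows_abab_control(data))
```

If the second A phase stayed at the level of the first B phase, the function would return False, which is exactly the situation in which the design can no longer show that behavior changes only when the experimental condition is in effect.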

In spite of the usefulness and power of the design, it makes a major presupposition, namely, that behavior changes made under various experimental conditions are reversible when baseline conditions are reinstated. The demonstration of a functional relationship between the presence of the experimental condition and performance requires that the changes made be transient and therefore reversible. However, the changes made in reinforcement programs might not be reversible when the experimental condition is withdrawn (e.g., Surratt, Ulrich, and Hawkins, 1969). Indeed, we would hope that they are not always reversible, because ready reversibility is tantamount to demonstrating only slight resistance to extinction.

If the effects of a reinforcement program are not reversible, the effect of the experimental condition(s) is not clear. For example, Hewett, Taylor, and Artuso (1968) employed six classes in evaluating token reinforcement programs. Some classes received 17 weeks of baseline followed by 17 weeks of the program; other classes received 17 weeks of the program followed by 17 weeks of baseline (i.e., no program). Two other classes were included in the design, receiving 34 weeks of the program and 34 weeks of baseline, respectively. For our purpose, it is important to note that the two classes that received the token program followed by baseline conditions failed to show a decline in target behaviors. Further, appropriate behaviors increased when the programs were removed. While speculations may be offered to account for this (such as claiming that stimuli in the environment other than tokens have become reinforcers and maintain behavior), it is unclear whether reinforcement was controlling any behavior in these two classes, because no reversal occurred when the contingencies were removed. Addressing themselves to this problem, Bijou and associates (Bijou et al., 1969) recommended using short experimental periods, which would facilitate obtaining a reversal of effects. This recommendation is useful when the goal is to determine short-term effects of experimental conditions. However, the ABAB design by itself may be inadequate when one wishes to study non-transient effects. For example, if an investigator desires to implement a condition that leads to relatively permanent changes in behavior, this is difficult to demonstrate in this design. Hence, if an investigator adds to some experimental condition a variable that should enhance resistance to extinction, its effect cannot be clearly demonstrated in this design: when behavior does not reverse, other explanations for the effect remain tenable. However, variables that may result in non-transient effects can be evaluated after a reversal phase has been successfully employed (Kazdin and Polster, 1973).
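One way to express how completely behavior reverses is the fraction of the baseline-to-treatment gain that is lost when baseline conditions are reinstated. The index below is an illustrative assumption, not a statistic proposed in this paper, and the phase means are invented. Values near 1 indicate a full reversal; values at or below 0 correspond to the pattern described for the classes that received the token program first, where behavior was maintained or even improved after the program was removed.

```python
def reversal_fraction(baseline1: float, treatment1: float, baseline2: float) -> float:
    """Share of the A1 -> B1 gain that disappears in the A2 phase.

    1.0 -> behavior fell all the way back to the first baseline level
    0.0 -> behavior stayed at the treatment level (no reversal)
    < 0 -> behavior improved further after the contingency was withdrawn
    Hypothetical helper operating on phase means; requires B1 != A1.
    """
    gain = treatment1 - baseline1
    if gain == 0:
        raise ValueError("no change from A1 to B1; nothing to reverse")
    return (treatment1 - baseline2) / gain

# Invented phase means (percentage of appropriate behavior).
print(reversal_fraction(20.0, 70.0, 25.0))  # most of the gain is lost
print(reversal_fraction(20.0, 70.0, 75.0))  # behavior kept improving
```

A small or negative value is exactly the case in which, as the text notes, alternative explanations for the original change remain tenable.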

Recently, other within-subject designs have been discussed that are not susceptible to the reversibility problem outlined above. These designs are particularly useful in situations where effecting a reversal would be undesirable because of exigencies of the situation, where a reversal in responses would not be expected (as in training competence in academic skills), or where an experimental condition is expected to inhibit a reversal of responses. The multiple-element baseline design (Sidman, 1960), or multiple-baseline design (Baer et al., 1968), provides a valuable alternative to the ABAB design. In this design, no reversal of conditions is required to demonstrate the efficacy of the contingencies. Instead, data are collected across behaviors, across individuals, or across situations.

In the multiple-baseline design across behaviors, two or more behaviors are observed for the subject(s). After the behaviors have reached stable rates, the experimental condition is implemented for only one of the behaviors while baseline conditions are continued for the other(s). The behavior exposed to the experimental condition should change while the other behavior remains at baseline levels. When rates are stable for both behaviors, the second behavior is brought into the contingency. This procedure is continued until all behaviors for which multiple-baseline data were gathered are sequentially brought into the contingency. Ideally, each behavior changes only as it is included in the experimental contingency and not before. This is a powerful demonstration that the experimental condition exerts control over the behavior. The strength of the demonstration stems from the consideration that events other than the experimental condition that occur over time cannot plausibly account for the specific changes in behavior. This is demonstrated without a reversal of experimental conditions.

A major area of concern in using this design is that one must be reasonably assured beforehand that the target behaviors are not interdependent or highly interrelated. If they are, implementing a contingency for the performance of one behavior may be expected to alter the behavior(s) for which continued baseline data are collected. For example, in a classroom situation it may not be the most appropriate design to gather multiple-baseline data across inappropriate motor behavior, inappropriate verbalizations, and inappropriate tasks as three separate target behaviors. Although these behaviors are used as distinct categories that can be reliably observed, they are also moderately intercorrelated in terms of frequency within individual children's repertoires (Kazdin, 1973c).² Change in one of these responses may result in changes in the others. An even greater demonstration of this problem of response correlations is evident from the literature on generalized imitation. Several studies have shown that reinforcement for some imitative behavior leads to a generalized set for imitative behaviors. A multiple-baseline design across behaviors might not be able to demonstrate that responses are not imitated until the contingency is applied to the specific imitative behavior. As soon as some imitative behavior is reinforced, other responses would change even though not reinforced (Baer, Peterson, and Sherman, 1967; Metz, 1965; Peterson and Whitehurst, 1971), unless they are topographically dissimilar responses (Garcia, Baer, and Firestone, 1971).

The problem of intercorrelations among responses has not been evident in the few multiple-baseline studies across behaviors (e.g., McAllister, Stachowiak, Baer, and Conderman, 1969; Wolf et al., 1968). However, it is a consideration an investigator should make when deciding on the type of design best suited to demonstrate the efficacy of experimental conditions (cf. Buell, Stoddard, Harris, and Baer, 1968; Pendergrass, 1972).

²This problem stems, in part, from the definition of an operant. As a response class, any operant may include responses or elements that are part of other operants. Skinner (1953, p. 94) noted this in stating that: "In reinforcing one operant we often produce a noticeable increase in the strength of another." and "The reinforcement of a response increases the probability of all responses containing the same elements." For a discussion of problems associated with defining operants, the reader is referred to Schick (1971).

In the multiple-baseline design across individuals, baseline data are gathered for at least one behavior across several persons. After behavior stabilizes across subjects, the experimental condition is invoked for one subject while baseline conditions are continued for the other subject(s). Again, as the experimental condition is extended to include separate individuals, the response frequency changes. This demonstration shows that behavior of the subject does not change until he is included in the experimental condition.
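The staggered logic of this design amounts to a check that each subject's behavior departs from its starting level only after the contingency reaches that subject. The Python sketch below uses invented data and an arbitrary rule (a session counts as "changed" if it differs from the subject's first few baseline sessions by more than a fixed margin); the function name, margin, and data are hypothetical. In the example, the third pupil improves while still on baseline, the kind of spread to a nontarget peer described in the studies that follow.

```python
from statistics import mean

def premature_changes(series: dict[str, list[float]], onsets: dict[str, int],
                      n_initial: int = 3, margin: float = 10.0) -> list[str]:
    """Names of subjects whose behavior shifted before their own onset session.

    series:    subject -> session-by-session measure (same sessions for everyone)
    onsets:    subject -> index of the first session under the contingency
    n_initial: leading sessions used to estimate each subject's starting level
    margin:    departure from that level counted as "change" (hypothetical rule)
    """
    flagged = []
    for name, values in series.items():
        start_level = mean(values[:n_initial])
        still_baseline = values[n_initial:onsets[name]]
        if any(abs(v - start_level) > margin for v in still_baseline):
            flagged.append(name)
    return flagged

# Hypothetical percentages of attentive behavior; contingency onsets are staggered.
series = {
    "pupil_1": [20, 22, 21, 23, 60, 65, 70, 68, 66, 70, 69, 71],
    "pupil_2": [25, 24, 26, 25, 26, 24, 25, 62, 66, 70, 68, 69],
    "pupil_3": [18, 20, 19, 21, 35, 40, 42, 41, 45, 44, 70, 72],
}
onsets = {"pupil_1": 4, "pupil_2": 7, "pupil_3": 10}
print(premature_changes(series, onsets))
```

Any name returned signals that the demonstration is weakened for that subject, since his behavior changed while baseline conditions were still in effect for him.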

As with the previous multiple-baseline design, one major aspect may make this design problematic. If altering the behavior of one subject may influence the behavior of other subjects, the design loses power. In this situation, implementing the experimental condition for the first subject may dramatically alter the behavior of another subject for whom baseline conditions are continued. For example, Broden, Bruce, Mitchell, Carter, and Hall (1970) showed that altering the attentive behaviors of one subject in the classroom through contingent reinforcement changed the behavior of an adjacent peer. Further, data they collected indicated that when the attentive behaviors were changed in one subject, the source of distraction and possibly of social reinforcement for the other subject decreased, so he improved because there was less opportunity and peer reinforcement for inattentiveness. Similarly, Kazdin (1973b) demonstrated that contingent reinforcement of attentive behavior for a target subject increased attentive behavior in an adjacent subject. However, it was also shown that reinforcement of inattentive behavior in the target subject reliably increased attentive behavior in the nontarget subject. The discriminative stimulus value of social reinforcement appeared to account for the improvement in behavior of the adjacent subject. The point of these studies is that introducing contingencies for some individuals in a situation may be expected to alter the behaviors of other individuals under some circumstances, viz., when the behavior of one subject influences the behavior of an adjacent peer (Broden et al., 1970), when social reinforcement provides a discriminative stimulus for probable reinforcement for an adjacent peer (Kazdin, 1973b), or when there is a limit to the amount of reinforcement available (Sechrest, 1963). In such situations, a multiple-baseline design across individuals would not effectively demonstrate the specific effects of the experimental condition on the target behavior. The implementation of the contingency for the first subject might change the behavior of others, even though baseline conditions were continued for them.

In the multiple-baseline design across situations, data are collected for a target behavior for one or more subjects across different circumstances or situations. For example, in altering the promptness of individuals in an elementary school, one might collect data (number of students late and number of minutes late) across several situations (arrival to class in the morning, after recess, after lunch, after assemblies). After baseline data are collected in all situations, the experimental contingency is instituted to control the behavior in one situation. Baseline data are continued for behavior in all other situations until each is consecutively included in the contingency. As with the previous multiple-baseline designs, this design is more effective when there is little or no correlation of behavior across these situations. If behavior change in one situation is expected to alter behavior in another situation, this design is a less powerful demonstration of the effects of the contingencies. For example, Hunt and Zimmerman (1969) demonstrated in a simulated sheltered workshop that contingent reinforcement for productivity in one time period increased productivity in a time period in which no reinforcement was delivered. Here, the specific operation of the contingencies is open to question.
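For the promptness example, the two measures named in the text (number of students late and number of minutes late) could be tallied per situation from arrival records. In this Python sketch the record layout, situation names, clock-time encoding, and arrival times are all hypothetical; computed day by day, each situation's pair of numbers would supply one point in that situation's baseline series.

```python
def hm(hours: int, minutes: int) -> int:
    """Clock time as minutes after midnight (a hypothetical encoding)."""
    return hours * 60 + minutes

def lateness_by_situation(records):
    """Tally students late and total minutes late for each situation.

    records: iterable of (situation, scheduled_time, arrival_time) tuples,
    with times in minutes after midnight.
    Returns {situation: (number_late, minutes_late)}.
    """
    summary: dict[str, list[int]] = {}
    for situation, scheduled, arrived in records:
        tally = summary.setdefault(situation, [0, 0])  # [students late, minutes late]
        delay = arrived - scheduled
        if delay > 0:  # arriving exactly on time does not count as late
            tally[0] += 1
            tally[1] += delay
    return {s: (n, m) for s, (n, m) in summary.items()}

# One hypothetical day of arrival records.
records = [
    ("morning", hm(8, 30), hm(8, 28)),
    ("morning", hm(8, 30), hm(8, 33)),
    ("morning", hm(8, 30), hm(8, 36)),
    ("after_recess", hm(10, 15), hm(10, 15)),
    ("after_recess", hm(10, 15), hm(10, 19)),
    ("after_lunch", hm(12, 30), hm(12, 29)),
    ("after_lunch", hm(12, 30), hm(12, 31)),
    ("after_lunch", hm(12, 30), hm(12, 40)),
]
summary = lateness_by_situation(records)
print(summary)
```

Keeping the situations as separate keys is what allows the contingency to be introduced in one situation while the others continue as baselines.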

In the three basic types of multiple-baseline designs discussed, each has one potential weakness in powerfully demonstrating the effect of a particular experimental condition, viz., concomitant changes in the areas for which baseline data are collected. Whether such changes will occur in the particular instance in which an investigator decides to use a design must be determined primarily from experience. Concomitant changes that may occur as a result of implementing a contingency for one behavior (individual or situation) have to be determined empirically. In some instances, the investigator can rely on the well-documented experience of others. For example, if one were interested in evaluating classroom deportment in a multiple-baseline design across situations or time (such as morning and afternoon class periods), there already is consistent evidence showing that changes made in one of these time periods do not appreciably alter behavior in the other (Becker, Madsen, Arnold, and Thomas, 1967; Kuypers, Becker, and O'Leary, 1968; Meichenbaum, Bowers, and Ross, 1968; O'Leary et al., 1969).

The within-subject designs outlined above can be employed in most situations. In spite of the advantages of these designs outlined early in the paper, one must remain cognizant of possible interactions between the experimental manipulation and the design employed in determining behavior. There is evidence bearing on this from several quarters in the experimental literature. For example, in experiments varying conditioned stimulus intensity in eyelid conditioning and signal intensity in reaction time experiments, the effects of variations of the stimuli depend upon whether they are evaluated within or between groups (Grice and Hunter, 1964). Similarly, different amounts of reinforcement in discrimination learning tasks differentially affect correct responses, again depending on whether one group is exposed to different levels of reinforcement or separate groups are used (Lawson, 1957; Schrier, 1958). These experiments and others indicate that the effects obtained may depend upon the design of the experiment. There is a greater lesson to be learned from this than the simple recommendation that one should not become overly dependent on one design in examining the effects of a particular operation. Since the major interest in operant work is determining how environmental manipulations functionally control behavior, it is important to be more analytic about an experimental design that, in part, dictates the results. What about the design influences behavior, and how can these influences be brought under control, minimized, or altered? When an experimental design is examined in this light, it becomes another experimental operation that exerts some functional relation to behavior. When the design does determine the result in some way, it is important to determine precisely how it accomplishes this and over what parametric levels of the experimental manipulation. Sidman (1960, pp. 334-340) recommended comparing the effects of experimental manipulations when scheduled separately or as combined with other manipulations. This is similar to recommending a closer scrutiny of the effects of our designs.

Evaluation of Results

Investigations of reinforcement programs in applied settings may introduce problems in evaluating the results. To frame some of these problems, the distinction Campbell and Stanley (1963) drew between two validities of experimentation is useful. These authors refer to internal validity as the degree to which the results of an experiment are considered to be due to the experimental manipulation. External validity refers to the extent to which the findings obtained in a study may be extended or generalized to other groups and settings. Campbell and Stanley (1963) noted that the equivalent time-samples design (a version of the ABAB or reversal design) is quite strong with respect to the possible sources of threat to internal validity. In such designs, several rival hypotheses accounting for the results are ruled out. It is unlikely that events outside the experimental manipulation that occur over time (history), growth or developmental processes within the subject (maturation), systematic shifts in performance over time resulting from the unreliability of measurement (e.g., regression and changes in the measurement device), selective loss of subjects (mortality), repeated assessment (testing), and other factors account for the results. As for external validity, however, they list some factors that may delimit the generalizability of the results. Both types of validity will be discussed in light of research on reinforcement programs in applied settings.

Considering internal validity, investigators must be reasonably assured that there are no factors in the design that can account for the results other than the intended manipulated operation. This statement seems so basic that it might not warrant protracted discussion. However, in several studies it is evident that there are factors that covary with implementation of the experimental operation. In some instances, these extraneous factors plausibly cause the changes attributed to the experimental operation. Even when such extraneous factors cannot account for the change entirely, they can interact with the experimental operation as a codeterminant of the results. Examples of extraneous factors that covary with experimental conditions are evident in several studies. One obvious factor that covaries with conditions is instructions that convey to subjects how they are supposed to perform. For example, Ayllon and Azrin (1965) instructed subjects that they could continue to work even though they would receive token reinforcement for not working (i.e., a "vacation with pay", p. 366). The rapid changes noted in performance were attributed to the reinforcement contingencies. However, different instructions preceding each experimental phase appeared to contribute, in part, to the abrupt effect of contingent reinforcement. Previous work has shown that although instructions may not be sufficient to sustain performance relative to contingent operant consequences, they are effective in initiating behavior change (Ayllon and Azrin, 1964; Hopkins, 1968; Packard, 1970). Employing reinforcement contingencies alone results in behavior changes that are less dramatic than when they are accompanied by contingency instructions (Herman and Tramontana, 1971), although there are exceptions to this (Kazdin, 1973c).

The role of instructions is especially important to distinguish from that of reinforcement when the target behaviors are not apparent in the repertoire of the subjects. In this situation, the effect of reinforcement should be distinguished from initial training or practice in the reinforced responses. These are usually confounded in reinforcement studies. In one exception (Suchotliff, Greaves, Stecker, and Berke, 1970), psychiatric patients were exposed to training in grooming skills (classes instructing them in the execution of the skills). Subsequently, a token reinforcement system was instituted to reinforce these behaviors. The results indicated that grooming behaviors did not increase during token reinforcement relative to their level during training. Token reinforcement maintained the behaviors developed during the formal instruction period; however, the increments in performance over baseline were due to training and instruction. In several studies, the effect of instructions, in this sense, cannot be separated from the reinforcement contingencies themselves.

An additional factor when evaluating reinforcement is the set of experimental operations included in the contingent application of an incentive per se. The introduction of reinforcement for appropriate behavior includes several factors that lead to behavior change. For example, aside from an incentive function, reinforcement also serves an informational function (Bandura, 1971). Effects obtained with reinforcement may be unjustifiably attributed to the incentive function of consequating events rather than to the informational value of these events. Yet providing feedback alone to a subject about the adequacy of his performance may affect performance dramatically. While this may demonstrate that feedback serves as reinforcement for behavior (Mathis, Cotton, and Sechrest, 1971, p. 77), it does not help when investigators wish to draw conclusions about the effect of, say, token reinforcement in changing behavior. The introduction of token reinforcement includes both the effect of incentives and that of informative feedback, which may be reduced to at least two distinct experimental operations. For example, Zimmerman et al. (1969) evaluated independently the effects of feedback and token reinforcement on productivity of clients in a sheltered workshop. Before the token reinforcement phase, subjects were told that they could practise (by working well) so that they would know how to earn tokens. During this phase, subjects were told how many tokens they would have earned if tokens were given out. Subsequently, a token reinforcement phase was instituted. Feedback alone increased production, and token reinforcement resulted in further increases. Although token reinforcement was more effective, few studies have used the feedback implicit in token reinforcement as a base for comparison. Hence, conclusions about the effect of token reinforcement per se (particularly those about the magnitude of change) often cannot be drawn from the experimental manipulation.
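The arithmetic behind this caution can be shown with phase means from a baseline, feedback-only, token sequence of the kind Zimmerman et al. (1969) used. The numbers below are invented, not their data, and the simple subtraction ignores any effect of phase order. The point is that comparing the token phase directly with baseline credits the token contingency with change that is partly produced by the feedback implicit in it.

```python
def decompose_gain(baseline: float, feedback_only: float, token: float) -> dict[str, float]:
    """Split the baseline-to-token change into feedback and token-beyond-feedback parts.

    Hypothetical helper operating on phase means; assumes the components add.
    """
    return {
        "naive_token_effect": token - baseline,          # token phase vs. baseline
        "feedback_component": feedback_only - baseline,  # feedback alone vs. baseline
        "token_beyond_feedback": token - feedback_only,  # what tokens added to feedback
    }

# Hypothetical mean units produced per session in each phase.
parts = decompose_gain(baseline=40.0, feedback_only=55.0, token=70.0)
print(parts)
```

Under these made-up numbers, half of the change that a baseline-versus-token comparison would attribute to tokens is already present with feedback alone.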

Instructions to the agents who administer the reinforcement program may also covary with the implementation and removal of the reinforcement contingencies. For example, teachers may be told, or led to believe, that certain experimental phases will result in dramatic changes in behavior. Such expectations may alter teacher behavior, which in turn influences the target behaviors. Studies have not carefully examined this possibility.

In considering external validity, several issues in reinforcement programs require mention. The issues discussed here concern the extent to which findings hold for subjects and settings other than those included in an experiment.

First, there is a possibility of multiple-treatment interference (Campbell and Stanley, 1963), which may delimit generalization of the results. Whenever multiple phases are applied to a group, the conclusions derived from a later treatment may depend on previous phases, because the effects of each phase are not erasable. For example, the effects of contingent reinforcement may be evaluated in a design in which it is preceded by noncontingent reinforcement or punishment (e.g., verbal reprimands by teachers). The results can then be generalized only to other individuals exposed to a similar sequence of events. Also, the conclusions may be generalized only to conditions where the reinforcement condition is introduced repeatedly, interspersed with other conditions, and not to situations in which the reinforcement is continually present or introduced only once. For example, in two of their experiments, O'Leary and Becker (O'Leary and Becker, 1967; O'Leary et al., 1969) implemented token reinforcement programs in classroom situations. In one study, instructions, praise for appropriate behavior, ignoring of inappropriate behavior, and token reinforcement were introduced simultaneously; in the other study, these procedures were introduced sequentially in a cumulative fashion. A comparison of these studies makes it evident that the simultaneous introduction of the conditions led to greater change than the sequential introduction of the component parts. Thus, in generalizing the effects of token reinforcement procedures, one must be careful to specify the manner in which the component parts of the procedures (informative feedback, instructions, approval, ignoring) are introduced. The manner in which the program is introduced, as dictated by the experimental design, may have consequences for the conclusions derived and their generalizability. In laboratory studies of punishment, for example, the manner in which the aversive event is introduced (at high intensity initially, or by gradual approximation of high intensity) dictates the efficacy of the procedure (Azrin and Holz, 1966, p. 393).

The problem of multiple-treatment interference also arises in situations in which a baseline (or reversal) phase is interspersed with different experimental conditions. If different versions of a reinforcement condition are each preceded by a baseline condition, it is difficult to separate the effects of the particular order of conditions from the effects of the conditions themselves. Again, any conclusions reached about a particular condition are restricted to those individuals who are exposed to the particular sequence of conditions. When multiple reversals are included in the design, it is often assumed that, if behavior returns to baseline levels, any new condition may be compared with any other condition, because performance is equivalent during the reversal phases that precede them. There are several problems, related to sequence effects, in comparing treatments within subjects. First, equivalent performance during reversals demonstrates only an equivalence in response rates at the time the reversal phase is in effect. It does not show that the subject is unchanged: reinstating the target behavior may become easier after repeated exposure to experimental conditions. It has already been demonstrated that repeated exposure to a series of alternating conditioning and extinction trials results in a gradual decline in responding over the extinction trials (Perkins and Cacioppo, 1950). A similar phenomenon no doubt occurs with repeated exposure to the reconditioning phases themselves.

A related factor that affects the external validity of experiments in applied settings has been called the reactive effect of experimental arrangements (Campbell and Stanley, 1963). This refers to the fact that the particular experimental arrangement may preclude generalization of treatment effects across time, situations, and individuals. As this relates to reinforcement programs, several aspects delimit generalization of the effects of the program, and the definition of generalization varies depending upon the particular problem considered. To begin with, the development of reinforcement procedures in applied settings is usually accomplished by having observers record target behaviors of the clients or subjects. Although observers are assumed not to affect the differential performance of, for example, school students, their very presence may constitute a reactive arrangement that restricts generalization of the findings to situations where observers are present. In classroom settings, both teacher and students are usually aware of the presence of the observers and may perform differently in situations where the observers are absent. There is only sparse evidence bearing on this point. Surratt et al. (1969) obtained data indicating that the presence of an observer in the classroom is a sufficient discriminative stimulus for performance of the target response. At the end of a reinforcement program designed to increase "time working" in students, two postchecks were conducted to determine to what extent the behaviors were maintained. The data from the first postcheck were gathered by TV camera. Data for the second postcheck were gathered by an observer whose presence had previously been associated with the experimental condition. "Time working" was consistently higher (in all four subjects) in the data collected by the observer. This suggests that there might be reactive effects of using observers in applied settings, particularly if the experimental conditions become associated with their presence. Subsequent introduction of the observer, now functioning as a discriminative stimulus (SD), may occasion the target behaviors, and the findings cannot be generalized beyond situations in which the observer is present.

A related factor not to be neglected is the effect of the investigator on the behavior of the agents delivering reinforcement. The presence of the experimenter in the situation may occasion the desired response in those agents. Again considering the classroom, the presence of the experimenter may serve as a reminder to the teacher to deliver reinforcement or to increase her rate of social approval. (When it is evident that observers are recording teacher behaviors, observers may also control these responses.) The major point is that the results obtained may be limited to those situations in which the experimenter is present (either in the situation itself or available for consultation). Although the experimenter's presence has rarely been evaluated directly in reinforcement programs, there is evidence from other contexts that it can exert such control. Peterson and Whitehurst (1971) showed that systematically varying the experimenter's presence exerts functional control over imitative behavior in children.
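The observer comparison described above can be made concrete by recording the same target behavior under two arrangements (an unobtrusive camera and an observer previously paired with the experimental condition) and comparing them subject by subject, in keeping with the within-subject logic of these designs. The sketch below is an illustrative assumption, not the procedure of Surratt et al. (1969); the function name `observer_reactivity` and the data layout are hypothetical.

```python
def observer_reactivity(camera, observer):
    """Compare per-subject scores (e.g., percent of intervals 'working')
    recorded without and with an observer present.

    camera, observer: dicts mapping subject id -> score for comparable
    postcheck periods. Returns per-subject differences (observer minus
    camera) and the number of subjects who scored higher when observed."""
    if camera.keys() != observer.keys():
        raise ValueError("both recordings must cover the same subjects")
    diffs = {s: observer[s] - camera[s] for s in camera}
    n_higher = sum(1 for d in diffs.values() if d > 0)
    return diffs, n_higher
```

A consistent positive difference across subjects, as Surratt et al. reported for all four of theirs, suggests that the observer has become a discriminative stimulus and that the findings may not generalize to unobserved settings.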

The Use of Control Groups

In discussions of work in the functional analysis of behavior in applied settings, the matter of control groups arises infrequently. There is ample justification for forgoing comparison groups in most instances. Treatises on experimental design (e.g., Underwood, 1957) usually recognize certain instances in which no control group is required to evaluate the effect of the experimental manipulation unambiguously. The design that fits this situation is an extended time-series design, in which data are available for the subject(s) over a long period of time, or a design in which data are available for a relatively short period but behavior changes with the presentation and removal of the experimental variable (i.e., the ABAB design discussed earlier).

Other reasons for eschewing comparison groups stem from philosophical or presuppositional considerations rather than simply from convenience or design. The use of comparison groups usually implies a statistical evaluation of the data in terms of measures of central tendency and variability. Several assumptions are required for the use of various statistical techniques, and the risks attendant on their violation must be considered. Further, in the experimental analysis of behavior, the goal transcends obtaining mean differences between a group exposed to an experimental operation and one that is not. The aim of achieving functional control over behavior directs the investigator toward determining effective variables that will alter the individual's behavior, and these variables necessarily entail within-subject manipulations. Also, examining group means does not really demonstrate that the experimenter achieved control over behavior in individual cases. There is also the problem of subject generality (Sidman, 1960), that is, the representativeness of the findings. Comparisons of treated with untreated groups obscure a closer examination of the effect of treatment on the individuals in the experimental groups. Nevertheless, there are instances, particularly in recent years, that merit the use of experimental groups in between- as well as within-subject comparisons. The salient instances are presented below.

Whereas the initial interest in operant work in applied settings was focused almost entirely upon examining effective experimental operations that were functionally related to behavior, this interest has broadened (Staats, 1970). There is now greater interest in finding effective treatments that cannot easily be evaluated in single-subject or single-group designs. For example, Staats and his associates began work with reinforcement procedures for training reading skills by carefully scrutinizing the performance of individual subjects (e.g., Staats and Butterfield, 1965). This led to refinements and extensions of procedures which, though modified, were evaluated (statistically) as treatments and compared with a control group (e.g., Staats, Minke, and Butts, 1970). The extension from single-group to between-group designs has been required in light of the broader aims of reinforcement procedures in applied settings. As a recent example, the initial token reinforcement programs in psychiatric hospitals (e.g., Ayllon and Azrin, 1965) focused on evaluating the program in terms of within-hospital behavior rather than global measures such as discharge and readmission rates (Ayllon and Azrin, 1968, p. 27). However, as the efficacy of the procedures became established, it became important to determine how effective reinforcement procedures were relative to traditional techniques (Birky, Chambliss, and Wasden, 1971; Marks, Sonoda, and Schalock, 1968; Hartlage, 1970) and how well they fared when compared to untreated groups in terms of follow-up success (Stayer and Jones, unpublished). Such comparisons, of course, imply the use of control groups.

Another situation that may require the use of comparison groups is related to the above. When different levels of variables are evaluated, the within-group design may restrict the external validity of the results. For example, if the effect of praise is evaluated and one wishes to compare it with the effect of praise and token reinforcement combined, the goal may not be achieved by merely introducing praise in one phase and adding token reinforcement in a subsequent experimental phase. The question of interest may be the effect of praise compared to that of praise-plus-token reinforcement when each is introduced initially to a group not previously exposed to any experimental phase. This requires separate groups.
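Where individual assignment is feasible, the separate-groups comparison just described might be set up as sketched below, so that each condition is the first and only phase its subjects experience. This is a minimal illustration that assumes subjects can be assigned individually at random, which, as the next paragraph notes, is often not possible in applied settings; the function name `assign_groups` and the condition labels are hypothetical.

```python
import random

def assign_groups(subjects, conditions, seed=None):
    """Randomly assign subjects to between-group conditions so that each
    condition is introduced first, to subjects with no prior phases.

    Subjects are shuffled and then dealt round-robin, so group sizes
    differ by at most one. `conditions` must be a list (it is indexed)."""
    rng = random.Random(seed)          # seeded for a reproducible assignment
    pool = list(subjects)
    rng.shuffle(pool)
    groups = {c: [] for c in conditions}
    for i, s in enumerate(pool):
        groups[conditions[i % len(conditions)]].append(s)
    return groups

# Hypothetical example: 20 subjects split between two conditions.
groups = assign_groups(range(1, 21), ["praise", "praise_plus_tokens"], seed=1)
```

Random assignment is what allows a between-group difference to be attributed to the conditions rather than to pre-existing differences between, say, two wards or two classrooms.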

In advocating the use of control groups, it is important to recognize the limitations imposed by doing research in applied settings. Even where one can envision situations in which a comparison group would provide desirable information, there are usually restrictions on the information it can provide. With relatively rare exceptions in the literature (e.g., Herman and Tramontana, 1971), subjects cannot be matched and assigned randomly to classes, hospital wards, institutional settings, or classrooms in which the procedures will be evaluated. In such instances, it is desirable to select groups that will best control for the factors one is interested in controlling. For example, in a psychiatric setting, it is desirable to control for new staff, new ward facilities, and diagnostic group in evaluating the program. Some investigations in psychiatric settings have selected patients for the therapeutic program and placed them on a special ward (Heap, Boblitt, Moore, and Hord, 1970). This makes evaluation of any program ambiguous, because the change of ward, in and of itself, may lead to behavior change (DeVries, 1968; Higgs, 1970). An excellent example of avoiding some of these problems can be found in the study by Schaefer and Martin (1966). The patients on a ward were randomly assigned to receive either contingent or noncontingent token reinforcement. Not only did the controls receive tokens and live on the same ward as the experimental subjects, they also received contingent praise for desirable behaviors. This type of control group effectively rules out a number of factors that might otherwise account for differences between groups.

The use of controls may also be crucial when the experimenter wishes to determine whether relatively permanent effects of exposure to treatment conditions are obtained, and the magnitude of these effects. For example, Wolf et al. (1968) evaluated the effects of token reinforcement in altering academic skills in a remedial classroom. Although the results were dramatic in within-group comparisons, the evaluation of the program was strengthened by comparing students who received the program with those who did not. Over the period in which the program was evaluated (1 yr), both groups made significant gains on a standard achievement test and in grades. However, the experimental group made substantial gains over and above those of the controls. Clearly, a control group helped determine the magnitude of the gains made as a result of the program.
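The arithmetic behind this point can be sketched as follows: a within-group comparison reports the experimental group's gain, whereas the control group supplies an estimate of the gain that would likely have occurred anyway (for example, from a year of ordinary schooling). This is an illustrative assumption about how such data might be summarized, not the analysis reported by Wolf et al. (1968); the function name is hypothetical, and a full evaluation would also need to test whether the difference is reliable.

```python
from statistics import mean

def gains_beyond_control(exp_pre, exp_post, ctl_pre, ctl_post):
    """Return (experimental gain, control gain, difference), where each gain
    is the mean of individual post-minus-pre scores.

    Pre and post lists must list the same subjects in the same order."""
    exp_gain = mean(post - pre for pre, post in zip(exp_pre, exp_post))
    ctl_gain = mean(post - pre for pre, post in zip(ctl_pre, ctl_post))
    # The difference estimates the gain attributable to the program itself.
    return exp_gain, ctl_gain, exp_gain - ctl_gain
```

Without the control group, only the first value is available, and it overstates the program's effect by whatever the control gain would have been.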

Responses Assessed in Reinforcement Programs

The selection of responses to evaluate the efficacy of procedures is usually dictated by the purpose of the study and the goals of the treatment or training institution in which the program is conducted. Response frequency has been employed as the most useful measure of these responses, and it has a number of features to recommend it (see Bijou et al., 1968, 1969; Ferster, 1953; Honig, 1966).

The focus on observable behaviors is perhaps one of the major advantages that accrue to behavior modification procedures in general. This assessment procedure differs markedly from traditional approaches, in which inferential leaps may be made in diagnosing, treating, and assessing dynamics, dispositions, or traits (Mischel, 1968; Stuart, 1970).

In spite of the advantages of this assessment approach, the observations have been restricted to the single target behavior of initial focus. While changes in target behaviors are the raison d'être for undertaking treatment or training programs, concomitant changes may take place as well. If so, these should be assessed. It is one thing to assess and evaluate changes in a target behavior, but quite another to insist on excluding non-target measures. Investigators may be short-changing themselves in evaluating their programs. Recently, other areas of behavior modification have attempted to assess behaviors that might change as a result of treatment but were not of direct therapeutic focus (e.g., Bandura, Blanchard, and Ritter, 1969; Kazdin, 1973a; Paul, 1967). To reiterate, the use of non-target behavioral measures in reinforcement programs is to be encouraged. However, these measures are not to be made at the expense of the primary data on target performance.

There are several potential advantages in using measures of non-target behaviors as well as the usual target response measures. One advantage is that such assessment would permit response generalization to be determined. If certain response frequencies are increased or decreased, other related operants might be expected to be influenced as well, and it would be a desirable addition to determine the generalization of beneficial response changes by looking at behaviors related to the target response. In addition, changes in the frequency of responses might also correlate with topographical alterations. For example, a reduction of inappropriate responses may result in a concomitant reduction in the severity of the responses (Burchard and Tyler, 1965; Hawkins, Peterson, Schweid, and Bijou, 1966).

As the application of reinforcement programs proliferates further, there will be increased concern with the concomitant effects of such programs, aside from their obvious virtue of reliably altering the target behavior(s). Examples of this can be seen in a few studies. Mulligan, Kaplan, and Reppucci (1971) evaluated changes in "cognitive" variables resulting from a token program for appropriate classroom behaviors and completion of tasks. The elementary school subjects in the program showed increases in IQ scores and arithmetic achievement scores, and a slight decrease in anxiety. The authors concluded that cognitive as well as behavioral improvement may occur in a token reinforcement program. (This particular study has methodological problems, such as the possibility that regression artifacts contributed to the results.) Even so, the findings suggest that important gains were made outside of the obvious target response changes.

In using non-target as well as target measures to assess changes made in reinforcement programs, it is important to maintain emphasis on the target behavioral measures. It is this measure that can determine whether the program has functional control over behavior. For example, Gripp and Magaro (1971) compared schizophrenic patients on one ward exposed to token reinforcement with control wards not exposed to this treatment. Ratings were made on several indirect measures of psychotic behavior, ward atmosphere, and social behaviors, and conclusions were drawn on a number of dimensions relevant to psychopathology. The comparisons were based on preprogram and postprogram ratings. The experimental ward showed greater improvement than the controls, both in the number of dimensions on which gains occurred and in the degree of gains. However, because the investigation did not focus on direct behavioral targets on which the program could be evaluated, and hence could not demonstrate functional control, the conclusions are ambiguous. The effect of the program may have been due to collateral changes in staff behavior or to a host of other factors. Indirect or multiple measures of treatment efficacy are relevant only when it has been demonstrated that the contingencies themselves are responsible for the direct changes in behavior. Once the program has demonstrated control over behavior, it would be interesting to determine non-target correlates that may change as well.

SUMMARY

In summing up, there are several advantages in using various within-subject designs. Generally, these designs are quite powerful in demonstrating the effect of a particular experimental operation. However, the designs currently employed have potential weaknesses. The choice of design may be influenced by whether a reversal in behavior is expected, or desirable, when the experimental condition is withdrawn. In designs without a reversal of conditions (multiple-baseline designs), other problems may arise, such as interdependence of performance (across behaviors, individuals, or situations). Although these problems may prove to be infrequent in future research, their occurrence in a given instance may be fatal to the evaluation of the results.

In interpreting the results of investigations in this area, it is important to be cognizant of potential influences that may covary with the presentation and withdrawal of experimental operations. Also, various elements of the experiment may delimit generalization of the results. Salient influences relevant to internal and external validity were discussed.

The use of control groups was advocated for examining certain effects of reinforcement programs. Although the functional analysis is the major aspect of the design, the proliferation of experiments in this area has led to questions that, in many instances, can be adequately answered only by comparisons between or across groups. As a final point, the use of multiple-response measures was encouraged. Changes in target measures are the main purpose of undertaking reinforcement programs in applied settings. However, concomitant response changes may take place along with changes in the target responses. If so, these should be documented.


REFERENCES

Ayllon, T. and Azrin, N. H. Reinforcement and instructions with mental patients. Journal of the Experimental Analysis of Behavior, 1964, 7, 327-331.

Ayllon, T. and Azrin, N. H. The measurement and reinforcement of behavior of psychotics. Journal of the Experimental Analysis of Behavior, 1965, 8, 357-383.

Ayllon, T. and Azrin, N. H. The token economy: a motivational system for therapy and rehabilitation. New York: Appleton-Century-Crofts, 1968.

Azrin, N. H. and Holz, W. C. Punishment. In W. K. Honig (Ed.), Operant behavior: areas of research and application. New York: Appleton-Century-Crofts, 1966. Pp. 380-447.

Baer, D. M. Some remedial uses of the reinforcement contingency. In J. Shlien (Ed.), Research in psychotherapy, Volume III. Washington, D.C.: American Psychological Association, 1968. Pp. 3-20.

Baer, D. M. Behavior modification: you shouldn't. In E. A. Ramp and B. L. Hopkins (Eds.), A new direction for education: behavior analysis, 1971. Lawrence, Kansas: University of Kansas Support and Development Center for Follow Through, 1971. Pp. 358-369.

Baer, D. M., Peterson, R. F., and Sherman, J. The development of imitation by reinforcing behavioral similarity to a model. Journal of the Experimental Analysis of Behavior, 1967, 10, 405-416.

Baer, D. M., Wolf, M. M., and Risley, T. R. Some current dimensions of applied behavior analysis. Journal of Applied Behavior Analysis, 1968, 1, 91-97.

Bandura, A. Principles of behavior modification. New York: Holt, Rinehart, & Winston, 1969.

Bandura, A. Vicarious- and self-reinforcement processes. In R. Glaser (Ed.), The nature of reinforcement. New York: Academic Press, 1971. Pp. 228-278.

Bandura, A., Blanchard, E. B., and Ritter, B. Relative efficacy of desensitization and modeling approaches for inducing behavioral, affective, and attitudinal changes. Journal of Personality and Social Psychology, 1969, 13, 173-199.

Becker, W. C., Madsen, C. H., Arnold, C. R., and Thomas, D. R. The contingent use of teacher attention and praising in reducing classroom behavior problems. Journal of Special Education, 1967, 1, 287-307.

Bijou, S. W., Peterson, R. F., and Ault, M. H. A method to integrate descriptive and experimental field studies at the level of data and empirical concepts. Journal of Applied Behavior Analysis, 1968, 1, 175-191.

Bijou, S. W., Peterson, R. F., Harris, F. R., Allen, K. E., and Johnston, M. S. Methodology for experimental studies of young children in natural settings. Psychological Record, 1969, 19, 177-210.

Birky, H. J., Chambliss, J. E., and Wasden, R. A comparison of residents discharged from a token economy and two traditional psychiatric programs. Behavior Therapy, 1971, 2, 46-51.

Broden, M., Bruce, M., Mitchell, M., Carter, V., and Hall, R. V. Effects of teacher attention on attending behavior of two boys at adjacent desks. Journal of Applied Behavior Analysis, 1970, 3, 199-203.

Buell, J., Stoddard, P., Harris, F., and Baer, D. M. Collateral social development accompanying reinforcement of outdoor play in a preschool child. Journal of Applied Behavior Analysis, 1968, 1, 167-173.

Burchard, J. D. Systematic socialization: A programmed environment for the habilitation of antisocial retardates. Psychological Record, 1967, 17, 461-476.

Burchard, J. D. and Tyler, V. O. The modification of delinquent behaviour through operant conditioning. Behaviour Research and Therapy, 1965, 2, 245-250.

Campbell, D. T. and Stanley, J. C. Experimental and quasi-experimental designs for research and teaching. In N. L. Gage (Ed.), Handbook of research on teaching. Chicago: Rand McNally, 1963. Pp. 171-246.

DeVries, D. L. Effects of environmental change and of participation on the behavior of mental patients. Journal of Consulting and Clinical Psychology, 1968, 32, 532-536.

Ferster, C. B. The use of the free operant in the analysis of behavior. Psychological Bulletin, 1953, 50, 264-274.

Garcia, E., Baer, D. M., and Firestone, I. The development of generalized imitation within topographically determined boundaries. Journal of Applied Behavior Analysis, 1971, 4, 101-112.

Grice, G. R. and Hunter, J. J. Stimulus intensity effects depend upon the type of experimental design. Psychological Review, 1964, 71, 247-256.

Gripp, R. F. and Magaro, P. A. A token economy program evaluation with untreated control ward comparisons. Behaviour Research and Therapy, 1971, 9, 137-149.

Hartlage, L. C. Subprofessional therapists' use of reinforcement versus traditional psychotherapeutic techniques with schizophrenics. Journal of Consulting and Clinical Psychology, 1970, 34, 181-183.

Hawkins, R. P., Peterson, R. F., Schweid, E., and Bijou, S. W. Behavior therapy in the home: Amelioration of problem parent-child relations with the parent in a therapeutic role. Journal of Experimental Child Psychology, 1966, 4, 99-107.

Heap, R. F., Boblitt, W. E., Moore, C. H., and Hord, J. E. Behavior-milieu therapy with chronic neuropsychiatric patients. Journal of Abnormal Psychology, 1970, 76, 349-354.

Herman, S. and Tramontana, J. Instructions and group versus individual reinforcement in modifying disruptive group behavior. Journal of Applied Behavior Analysis, 1971, 4, 113-119.

Hewett, F. M., Taylor, F. D., and Artuso, A. A. The Santa Monica Project: Evaluation of an engineered classroom design with emotionally disturbed children. Exceptional Children, 1969, 35, 523-529.

Higgs, W. J. Effects of gross environmental change upon behavior of schizophrenics: A cautionary note. Journal of Abnormal Psychology, 1970, 76, 421-422.

Honig, W. K. Introduction. Operant behavior: areas of research and application. New York: Appleton-Century-Crofts, 1966.

Hopkins, B. L. Effects of candy and social reinforcement schedule learning on the modification and maintenance of smiling. Journal of Applied Behavior Analysis, 1968, 1, 121-128.

Hunt, J. G. and Zimmerman, J. Stimulating productivity in a simulated workshop setting. American Journal of Mental Deficiency, 1969, 74, 43-49.

Kazdin, A. E. The effect of response cost and aversive stimulation in suppressing punished and nonpunished speech disfluencies. Behavior Therapy, 1973, 4, 73-82. (a)

Kazdin, A. E. The effect of vicarious reinforcement on attentive behavior in the classroom. Journal of Applied Behavior Analysis, 1973, 6, 71-78. (b)

Kazdin, A. E. The role of instructions and reinforcement in behavior changes in token reinforcement programs. Journal of Educational Psychology, 1973, 64, 63-71. (c)

Kazdin, A. E. and Bootzin, R. R. The token economy: an evaluative review. Journal of Applied Behavior Analysis, 1972, 5, 343-372.

Kazdin, A. E. and Polster, R. Intermittent token reinforcement and response maintenance in extinction. Behavior Therapy, 1973, 4, 386-391.

Kuypers, D. S., Becker, W. C., and O'Leary, K. D. How to make a token system fail. Exceptional Children, 1968, 11, 101-108.

Lawson, R. Brightness discrimination performance and secondary reward strength as a function of primary reward amount. Journal of Comparative and Physiological Psychology, 1957, 50, 35-39.

McAllister, L. W., Stachowiak, J. G., Baer, D. M., and Conderman, L. The application of operant conditioning techniques in a secondary school classroom. Journal of Applied Behavior Analysis, 1969, 2, 277-285.

McNamara, J. R. and MacDonough, T. S. Some methodological considerations in the design and implementation of behavior therapy research. Behavior Therapy, 1972, 3, 361-378.

Marks, J., Sonoda, B., and Schalock, R. Reinforcement vs. relationship therapy for schizophrenics. Journal of Abnormal Psychology, 1968, 73, 397-402.

Mathis, B. C., Cotton, J. W., and Sechrest, L. Psychological foundations of education. New York: Academic Press, 1971.

Meichenbaum, D. H., Bowers, K., and Ross, R. R. Modification of classroom behavior of institutionalized female adolescent offenders. Behaviour Research and Therapy, 1968, 6, 343-353.

Metz, J. R. Conditioning generalized imitation in autistic children. Journal of Experimental Child Psychology, 1965, 2, 389-399.

Mischel, W. Personality and assessment. New York: Wiley, 1968.

Mulligan, W., Kaplan, R. D., and Reppucci, N. D. Changes in cognitive variables among behavior problem elementary school boys treated in a token economy special classroom. Unpublished paper presented at Association for the Advancement of Behavior Therapy, Washington, D.C., September 1971.

O'Leary, K. D. and Becker, W. C. Behavior modification of an adjustment class: A token reinforcement program. Exceptional Children, 1967, 9, 637-642.

O'Leary, K. D., Becker, W. C., Evans, M. B., and Saudargas, R. A. A token reinforcement program in a public school: A replication and systematic analysis. Journal of Applied Behavior Analysis, 1969, 2, 3-31.

O'Leary, K. D. and Drabman, R. Token reinforcement programs in the classroom: A review. Psychological Bulletin, 1971, 75, 379-398.

Packard, R. G. The control of "classroom attention": a group contingency for complex behavior. Journal of Applied Behavior Analysis, 1970, 3, 13-28.

Paul, G. L. Insight versus desensitization in psychotherapy two years after termination. Journal of Consulting Psychology, 1967, 31, 333-348.

Pendergrass, V. E. Timeout from positive reinforcement following persistent, high-rate behavior in retardates. Journal of Applied Behavior Analysis, 1972, 5, 85-91.

Perkins, C. C. and Cacioppo, A. J. The effect of intermittent reinforcement on the change in extinction rate following successive reconditionings. Journal of Experimental Psychology, 1950, 40, 794-801.

Peterson, R. F. and Whitehurst, G. J. A variable influencing the performance of generalized imitative behaviors. Journal of Applied Behavior Analysis, 1971, 4, 1-9.

Phillips, E. L. Achievement Place: token reinforcement procedures in a home-style rehabilitation setting for "predelinquent" boys. Journal of Applied Behavior Analysis, 1968, 1, 213-223.


Risley, T. R. Behavior modification: An experi-mental-therapeutic endeavor. In L. A. Hamer-lynck, P. 0. Davidson, and L. E. Acker (Eds.),Behavior modification and ideal mental healthservices. Calgary, Alberta, Canada: University ofCalgary Press, 1970. Pp. 103-127.

Risley, T. R. and Baer, D. M. Operant behavior modification: The deliberate development of child behavior. In B. Caldwell and H. Ricciuti (Eds.), Review of child development research, Volume III: Social action, 1973, in press.

Risley, T. R. and Wolf, M. M. Strategies for analyzing behavioral change over time. In J. Nesselroade and H. Reese (Eds.), Life-span developmental psychology: methodological issues. New York: Academic Press, 1972.

Schaefer, H. H. and Martin, P. L. Behavior therapy for "apathy" of hospitalized schizophrenics. Psychological Reports, 1966, 19, 1147-1158.

Schick, K. Operants. Journal of the Experimental Analysis of Behavior, 1971, 15, 413-423.

Schrier, A. M. Comparison of two methods of investigating the effects of amount of reward on performance. Journal of Comparative and Physiological Psychology, 1958, 51, 725-731.

Sechrest, L. Implicit reinforcement of responses. Journal of Educational Psychology, 1963, 54, 197-201.

Sidman, M. Tactics of scientific research. New York: Basic Books, 1960.

Skinner, B. F. Science and human behavior. New York: Macmillan, 1953.

Skinner, B. F. Contingencies of reinforcement: a theoretical analysis. New York: Appleton-Century-Crofts, 1969.

Staats, A. W. Reinforcer systems in the solution of human problems. In G. A. Fargo, C. Behrns, and P. Nolen (Eds.), Behavior modification in the classroom. Belmont, California: Wadsworth, 1970. Pp. 6-31.

Staats, A. W. and Butterfield, W. H. Treatment of nonreading in a culturally deprived juvenile delinquent: An application of learning principles. Child Development, 1965, 4, 925-942.

Staats, A. W., Minke, K. A., and Butts, P. A token-reinforcement remedial reading program administered by black therapy technicians to problem black children. Behavior Therapy, 1970, 1, 331-353.

Stayer, S. J. and Jones, F. Ward 108: Behavior modification and the delinquent soldier. Unpublished paper presented at Behavioral Engineering Conference, Walter Reed General Hospital, 1969.

Stuart, R. B. Trick or treatment: how and when psychotherapy fails. Champaign, Illinois: Research Press, 1970.

Suchodiff, L., Greaves, S., Stecker, H., and Berke, R. Critical variables in the token economy. Proceedings of the 78th Annual Convention of the American Psychological Association, 1970, 5, 517-518.

Surratt, P. R., Ulrich, R. E., and Hawkins, R. P. An elementary student as a behavioral engineer. Journal of Applied Behavior Analysis, 1969, 2, 85-92.

Thoresen, C. E. The intensive design: an intimate approach to counseling research. Unpublished paper presented at meeting of American Educational Research Association, Chicago, April, 1972.

Underwood, B. J. Psychological research. New York: Appleton-Century-Crofts, 1957.

Wahler, R. G. Setting generality: some specific and general effects of child behavior therapy. Journal of Applied Behavior Analysis, 1969, 2, 239-246.

Wolf, M. M., Giles, D. K., and Hall, R. V. Experiments with token reinforcement in a remedial classroom. Behaviour Research and Therapy, 1968, 6, 51-64.

Wolf, M. M. and Risley, T. R. Reinforcement: applied research. In R. Glaser (Ed.), The nature of reinforcement. New York: Academic Press, 1971. Pp. 310-325.

Zimmerman, J., Stuckey, T. E., Garlick, B. J., and Miller, M. Effects of token reinforcement on productivity in multiply handicapped clients in a sheltered workshop. Rehabilitation Literature, 1969, 30, 34-41.

Received 21 March 1972. (Revision requested 27 December 1972.) (Final acceptance 29 January 1973.)

Comments by reviewers on following pages