Evaluation and Program Planning 21 (1998) 237-242

Book reviews

Evaluation for the 21st Century: A Handbook. Edited by Eleanor Chelimsky and William R. Shadish. Thousand Oaks, CA: Sage, 1997.

Reviewer: Michael Hendricks, MH Associates, Portland, OR, U.S.A.

Keywords: Diversity of approaches to evaluation; Infrastructure for evaluation; Visibility of evaluators.

When the editor of this journal first asked me to review this book, I hesitated. After all, who has time in a busy schedule to review 501 pages? But then I realized two things: First, if Eleanor Chelimsky and Will Shadish, two of our most respected evaluators, are willing to explicate the issues which will define evaluation as we enter the next century, then I should either know those issues or stop my consulting business. But I also knew I simply wouldn't make the time to read this book unless pushed to do so. So I forced myself to accept Jon Morell's offer, and I'm glad I did.

Since I would like this review to be as helpful as possible, please allow me to offer (1) a concise description of the book's contents, (2) four overall themes which, in my opinion, run through the various chapters, (3) four important topics which receive too little attention in the book, and (4) my overall conclusion and recommendation. I hope you find these offerings useful in some way.

To begin, this book is an edited collection of 32 papers originally presented at the International Evaluation Conference held in Vancouver, British Columbia in November 1995. As befits that conference, the 36 authors come from 11 different countries, with almost half (15) currently working outside the United States. So this book presents a true international perspective, for which I commend the editors.

Dedicated to the memory of Donald T. Campbell (whose infectious good humor this reviewer also misses greatly), the book begins with a short preface in which the editors establish the watchword for the entire volume: diversity. Diversity of actors, of geographic locations, of topics to evaluate, of methods, of uses, and even of purposes for doing evaluation. As the editors say, "Evaluation in the next century will be far more diverse than it is today. So we must face the discomfort of stirring ourselves if we are to avoid being left behind." (p. xiii).

This 'stirring' is done in the book's seven sections, each of which amply illustrates this diversity. Since these seven sections differ radically in their lengths, let me order them along that dimension, since the length of each section will determine how a potential reader would spend her or his time.

The longest section of the book (at 130 pages) presents 'A Sampler of the Current Methodological Tool Kit'. This is not especially surprising, given our traditional emphasis on methods, but it is somewhat ironic. In an important early chapter in the book, Thomas D. Cook calls for the field of evaluation to build its foundation upon the stability of a three-legged stool, with separate but equal legs for methods, theories of evaluation, and syntheses of past findings. Cook warns us that our 'methods' leg is already the longest and strongest of the three, so he urges us to give special attention to the other two legs. Perhaps Eleanor and Will are simply reflecting the types of papers presented in Vancouver, or perhaps they have inadvertently continued our past habits, but the length of this section provides an amusing example that we do, as Tom Cook claims, seem to focus much of our attention on methods.

The eight chapters in this section discuss multiple methods, cross-design synthesis, research synthesis for public health policy, empowerment evaluation, cluster evaluation, scientific realist evaluation, single-case evaluation, and interrupted time series designs. For a lapsed methodologist, these chapters are rewarding, although quite uneven. Several authors provide such practical and detailed text, and include so many illustrative tables and graphs, that they essentially offer us mini-primers on the techniques themselves. This is true for Lois-ellin Datta on multimethod evaluation, Judith A. Droitcour on cross-design synthesis, James R. Sanders on cluster evaluation, and Robert G. Orwin on interrupted time series. Most evaluators should not only read, but actually study, these chapters. The remaining four chapters in this section are interesting, although two authors (David M. Fetterman on empowerment evaluation and Mansoor A. F. Kazi on single-case evaluation) may actually tend to undermine their own causes by presenting examples of their approaches which, at least to me, are simply unconvincing.

The next longest section (at 122 pages) showcases 'New Topics for Evaluation', and it powerfully illustrates the impressive diversity of topics we evaluators now address. The nine chapters discuss recent evaluations of human rights violations, immigration policy, institutional change, impacts of development projects on women, foreign aid, U.S. nuclear capability, Russian nuclear power policy, and the global environment. Quite a change from the mid-1960s, when the preponderance of funding from Title I and the Public Health Service ensured that most evaluations addressed either education or health care.

In fact, if any part of this book could be used to market the field of evaluation, it would be this section. Each chapter is worth reading, especially Ignacio Cano's analyses of massive human rights violations and Michael Bamberger's report of studying gender impacts in Tunisia. However, I was most impressed by Kristina Svensson's thorough and thoroughly insightful chapter on "The Analysis and Evaluation of Foreign Aid". Perhaps others have known of Ms. Svensson, a member of the Swedish Parliament who is currently in Burundi launching development projects, but this is my first exposure. However, I certainly hope all of us can learn more in the future from this experienced and wise voice.


The next longest section (at 70 pages) discusses 'International Evaluation', certainly an apt topic given the book's origin. Imagine my surprise, then, when I found this section to be strangely uninspiring, even though I have a strong personal interest in this topic. Who knows why, but to me these five chapters seem flat and devoid of context, especially when contrasted to the rich, compelling stories in the section discussed just above.

The editors praise Masafumi Nagao's use of evaluation in the small Japanese town of Oguni, and I agree that this chapter is thought-provoking. The other four chapters, however, offer useful but somewhat dry descriptions of evaluation in China, Denmark, Latin America, and the World Bank. The first three of these descriptions are reasonably balanced, but Robert Picciotto's tribute to evaluation in the World Bank is embarrassingly one-sided. A more balanced discussion of both strengths and limitations (such as the discussion Jim Sanders provides regarding cluster evaluation) would give the reader a more accurate, and therefore more useful, understanding of the reality of evaluation efforts in this vitally important organization.

The next longest section (at 52 pages) describes the increasingly close connections between 'Auditing and Evaluation'. Within the book, this section is actually the first substantive section, and that position quite accurately reflects, in my opinion, the importance of this convergence between evaluation and performance auditing. As an example of this phenomenon, several of us recently established an Oregon Program Evaluators Network in Portland, Oregon. At our steering committee meeting to choose speakers for our first meeting, several mainstream evaluators independently suggested that we invite the local County Auditor, who eagerly accepted.

Each of the four chapters in this section takes pains to show the similarities and differences between auditing and evaluation in different locations (Canada, Sweden, Europe, and Minnesota), and each points out the increasing overlap of purposes and methods. However, each chapter also contains nuggets of valuable constructive criticism of evaluation, including L. Denis Desautels' observation of "the inability of evaluators to demonstrate the value added by their activities" (p. 77), Inga-Britt Ahlenius' gentle exhortation to do more than simply criticize those who bravely go first and do things, Christopher Pollitt and Hilkka Summa's reminder of the importance of a strong institutional base, and Roger A. Brooks' finding that neither auditing nor evaluation alone can provide all the answers needed.

The next longest section (at 42 pages) actually opens the book by presenting 'Evaluation: Yesterday and Today'. This section establishes an historical context for the remainder of the book, and it contains only two chapters. In one, Thomas D. Cook reports what we've learned over the past 25 years, and in the other, Eleanor Chelimsky discusses the political environment of evaluation. My review of this section is simple: read these two chapters. Whenever Tom or Eleanor speak, all evaluators should listen, and these two chapters are no exception.

I offer exactly the same advice for the second-shortest section (at 34 pages) on 'An Enduring Argument About the Purpose of Evaluation'. This section also contains only two chapters, but the authors are Robert E. Stake and Michael Scriven, and they end the book with a bang, certainly not with a whimper. They debate the very essence of our role (what does it mean to be an evaluator?), and they clearly do not agree.

Bob Stake can envision situations in which revealing the whole truth might harm efforts he knows in his heart to be 'positive forces', so he suggests that advocacy in evaluation might be a necessary evil at times. As an example, he describes an evaluation in which he, "operating under an ethic of minimized disruption" (p. 473), very consciously did not report 'weaknesses' and 'misdirections' as publicly or as persistently as he reported 'strengths' and 'good moves'. Michael Scriven, on the other hand, views an evaluator (not an evaluation consultant, which he views as an entirely different role) as an expert witness at a trial, obligated to tell "the truth, the whole truth, and nothing but the truth." From his perspective, distancing and objectivity are the highest ideals of our profession. I personally find myself siding more with Scriven's position, but the juxtaposition of these two chapters certainly makes the reader think.

The shortest section in the book (at only 24 pages) is devoted to, surprisingly, 'Performance Measurement and Evaluation'. Surprising to me, at least, since many persons consider the veritable explosion of performance measurement efforts to be, for better or for worse (or for both), one of the most significant developments in the evaluation arena in recent years. But this book captures papers presented almost three years ago, and I suspect that if this book were updated today, performance measurement would receive significantly more attention. In any event, Joseph S. Wholey and Caroline Mawhood do a fine job of describing performance measurement efforts in the U.S. and the U.K., respectively, as of late 1995.

Having described the book's basic contents, please allow me to suggest four important themes which weave throughout the book. Theme 1 is obvious: the increasing diversity of evaluation. As noted earlier, diversity is this book's watchword, and the editors do a fine job of illustrating the many positive dimensions of that diversity.

I wonder, however, if this same diversity might also carry within it some risks to the field. In other words, might this diversity ever be a liability? The editors are justifiably proud that we evaluators are able to wear different hats (Eleanor Chelimsky's 'perspectives') and utilize different methods to answer different sorts of questions. But would an outsider perceive us as being diverse, or possibly as simply not having a core identity of our own? In other words, does 'diverse' ever become 'diffuse'?

Theme 2 involves the need for reliable infrastructures to support the practice of evaluation. This theme is perhaps most obvious in the calls for creating more solid evaluation institutions in countries such as China, but several authors note that our U.S. infrastructures are not exactly rock-solid themselves (consider the almost total dismantling of the once-outstanding Program Evaluation and Methodology Division at GAO, for a sad example). In addition, more than one of our audit colleagues notes that, from their perspective, we evaluators seem to be operating from shaky ground, indeed. To my knowledge (admittedly limited), there is no systematic effort underway within the U.S. to catalogue and characterize our evaluation infrastructures, much less to strengthen them. Might such an effort be worth discussing?

Theme 3 is a question: why are we evaluators not more visible, with a louder, more respected voice in policy discussions at all levels? Several authors admit that, as a profession, we are neither as highly respected nor as influential as we should, and could, be. Tom Cook suggests we should build more effective links with other professions, as well as conduct and publicize syntheses. L. Denis Desautels urges us to show the value we add to a discussion. Would these steps help? What else would be needed? What, if anything, should the American Evaluation Association undertake to boost our profession's standing?

Theme 4 involves money, and money practically permeates the book. Should we 'follow the money' in choosing topics to evaluate? Do shrinking budgets create a greater or lesser demand for our services? Can't we evaluators offer more in the way of financial analyses? And speaking of financial analyses, isn't it interesting that auditors and evaluators are coming closer, but only the auditors seem to be moving? That is, aren't auditors broadening their skills by learning ours, while we evaluators, by and large, are failing to broaden our skills by learning theirs? Does anyone else worry that this might be a sure-fire recipe for being eventually replaced by auditors?

Having suggested these four cross-cutting themes in the book, let me also mention four topics which, to my mind, are somewhat underrepresented in the book. This mention is certainly not a criticism of the editors, who were obviously limited to the selection of papers presented in Vancouver. On the contrary, they have done an admirable job of covering topics important for evaluators.

But in my opinion, a new evaluator reading this book might be misinformed unless she or he also learns that we devote considerable thought to (in alphabetical order) (1) electronic databases, including the types of data they contain, the potential for using those data in evaluative work, and problems accessing and using those data; (2) ethics and guiding principles, especially in the often-ambiguous situations in which we find ourselves; (3) prospective evaluation and its potential for injecting our knowledge into discussions at an earlier moment; and (4) the value of true randomized experiments and the surprising frequency with which they can be implemented. Each of these topics is certainly mentioned in the book, but perhaps not to the full breadth or depth they might deserve.

In conclusion, let me state emphatically that this is a fine book. Not a great book, and not the best book either Eleanor Chelimsky or Will Shadish has ever produced, but a fine book nonetheless. The editors set out to illustrate the amazing diversity within which we evaluators will need to operate in the 21st century, and they succeed admirably. While none of us would recommend this book for beginning students or for casual readers, those of us who plan to be active practitioners in the year 2000 will benefit considerably from having our horizons stretched by this volume.

Evaluating Treatment Environments: The Quality of Psychiatric and Substance Abuse Programs, by Rudolf H. Moos, 2nd edn, revised and expanded. New Brunswick, NJ: Transaction Publishers, 1997.

Reviewer: William A. Hargreaves, University of California, U.S.A.

Keywords: Psychiatric inpatient care; Psychiatric residential care; Practice guidelines; Social ecology; Mental health treatment; Substance abuse treatment.

Early in his career, Rudolf Moos carved out a research area that he called the "social ecology" of treatment environments. The early work was summarized in the first edition of this book in 1974. Now revised and expanded, the second edition covers over 30 years of research on the determinants and the effects of treatment environments. Much of this work has been carried out by Moos and his colleagues. Most of the work has focused on mental health and substance abuse treatment programs, including inpatient, other residential, and other community programs.

programs. He first examined hospital environments. Moos and col-

leagues developed the Ward Atmosphere Scale (WAS) to be completed by program staff and by patients. This instrument describes treatment environments using ten subscales: Involve- ment, Support, Spontaneity. Autonomy, Practical Orientation. Personal Problem Orientation. Anger and Aggression. Order and Organization. Program Clarity. and Staff Control. These subscales are xeen as reflecting three underlying domains: relationship. personal growth. and system maintenance. The

investigators gathered normative data in ;I hampIe of 160 psy- chiatric programs in 44 hospitals in 16 states. They also studied 36 programs in the United Kingdom. Moos discusses use of the scale in monitoring and improving hospital programs. illus-

trating methods with case examples and quasi-experiments, and presents findings suggesting aiu naturally occurring types of hospital programs: therapeutic community. Irelationship oriented. action oriented. insight oriented. control oriented. and undifferentiated.

By the early 1970s, Moos and his colleagues had developed a second but parallel instrument called the Community Oriented Programs Environment Scale (COPES), adapted for community programs such as halfway houses, partial hospitalization, day treatment, and rehabilitation workshops. While item content necessarily differs, the scale yields conceptually similar scores on the same ten subscales. Normative data were gathered on 54 community programs in the United States. Moos presents case examples illustrating the use of the COPES to change interpersonal process aspects of community treatment programs. He presents data contrasting the characteristics of substance abuse vs mental health programs, and programs with professional staff vs programs with only paraprofessional staff. From a cluster analysis of clients' perceptions of 78 programs, Moos derives a typology of six clusters with the same names as the six types of inpatient programs, although the characteristics of some types differ somewhat from the characteristics of the corresponding inpatient type.

Moos highlights the variability of both hospital and community treatment programs. He notes that "the quality of the treatment milieu cannot be inferred simply from knowledge