Independent Quality Assessment of UNODC Evaluation Reports 2016
Synthesis Report

Prepared for:
Independent Evaluation Unit, UNODC

by John Mathiason and Ann Sutherland
December 22, 2016
CONTENTS

INTRODUCTION
METHODOLOGY
FINDINGS
   A. Strengths of the Evaluation Reports
   B. Weaknesses of the Evaluation Reports
   C. Improvements from First Draft in Final
   D. Best Practices Observed
   E. Challenges for Evaluators
   F. Mainstreaming Human Rights and Gender Considerations
RECOMMENDATIONS
Appendix 1. Comparison of Final Report with Draft Report
Appendix 2. UNODC EQA Template as Used in the Review
Appendix 3. Recommended Revisions to UNODC EQA Template
INTRODUCTION
The Independent Evaluation Unit (IEU) leads and guides evaluations in order to provide objective information on the performance of the United Nations Office on Drugs and Crime (UNODC). As a member of the United Nations Evaluation Group (UNEG), the IEU follows its Norms and Standards.
The number of independent project evaluations and in-depth evaluations at UNODC has increased significantly over the past years, generating important learning opportunities for all stakeholders, including the IEU. UNODC's efforts to create better mechanisms for discussing and propelling strategic issues forward are being informed by the IEU, whereby important insights are generated in the planning, implementation and finalization of evaluations. As part of its efforts to ensure the agency's evaluations provide credible information to inform planning processes, the IEU has commissioned independent quality assessments of evaluation reports produced since 2014.
This document synthesizes the Evaluation Quality Assessment (EQA) of all evaluation reports published in 2016 and makes comparisons with EQAs from 2014 and 2015. Moreover, the quality of a random sample of 30% of first-draft evaluation reports has been reviewed and assessed to analyze the qualitative changes between the first draft and the published version.
This assignment was carried out from mid-November to mid-December 2016 by two independent consultants, Dr. John Mathiason (Team Leader) and Ann Sutherland (Team Member). Both have extensive experience in conducting evaluations and meta-evaluations for international organizations. They are the Managing Director and Principal Associate, respectively, of Associates for International Management Services (AIMS).
METHODOLOGY
The first phase of this assignment involved making minor revisions to the EQA template. Checkboxes for rating the extent to which each criterion was met (met, partially met, not met) were incorporated into the template. This was found to be advantageous in several respects: the comments are less redundant, as it is no longer necessary to re-state the criteria; the reviewer is forced to consider all elements; and the comments section is used only to highlight issues that are particularly strong or weak and to provide any additional justification needed to explain the overall score given for the respective section. The revised EQA template used for the assessment is shown in the Annex.
The reviewers then examined the quality of all of the evaluation reports published during 2016. The total number of evaluations for the year was 19: three were in-depth evaluations and 16 were independent evaluations. The reviewers also compared the final drafts of a third of the evaluations, randomly selected, with the first drafts to observe any differences that could be attributed to the process of IEU commenting on the first drafts. All were drafts of independent evaluations.
To ensure consistency of the review process, both reviewers independently assessed one In-Depth and three Independent Project Evaluation reports. These reports were also randomly selected and were rated according to the nine EQA criteria categories. The reviewers then compared their comments and scores for each criterion as well as the overall score. In all cases, the overall scores were the same for both reviewers. There were minor differences in criteria scores and non-material differences in comments; the differences were discussed and resolved. As it was evident that the team was consistent, the remaining reports were each rated by one person; assignments were based on each reviewer's area of expertise and language skills. Overall, approximately one third of the evaluation reports were assessed by both reviewers.
The team used the "Unsatisfactory" rating only when the criteria elements were missing or very poorly addressed. Similarly, "Very Good" was used only when all criteria were fully met. As a result, "Fair" was used when the criteria were generally not met, and that score can therefore be understood to mean that the reports, or pieces of the reports, were not well done.
FINDINGS
All of the 19 reports produced in 2016 were of adequate quality. As seen in Table 1, four reports received a Very Good rating and no reports were rated as Unsatisfactory.
Table 1: Overall Rating of 2016 Reports compared with previous EQA cycles

                             Unsatisfactory   Fair   Good   Very Good   Total
# of Reports - 2016                0            8      7        4         19
% of Reports - 2016                0%          42%    37%      21%       100%
# of Reports - 2015 [1]            0           12      9        1         22
% of Reports - 2015                0%          53%    41%       5%       100%
# of Reports - 2014/15 [2]         0            9     22        2         33
% of Reports - 2014/15             0%          27%    67%       6%       100%

[1] This includes reports published from June through December 2015.
[2] This includes the first batch of EQA assessments, which considered reports published from January 2014 through May 2015.

Table 1 also shows there have been substantial improvements compared to 2015, with an increased number of Very Good reports and fewer rated as Fair. The drop in ratings for the latter half of 2015 compared to those of 2014 to mid-2015 may be explained by changes made to the EQA template, in particular the addition of a Reliability criterion and an increased emphasis on gender analysis.

In 2016 there were, as last year, some differences by criterion. Table 2 shows that the Executive Summary and Reliability sections were the most likely to have negative (Fair and Unsatisfactory) ratings. Recommendations, Lessons Learned and GEEW were also less strong. The Conclusions and Presentation/Structure sections were the most likely to have positive ratings.
Table 2: Report Rating by Criteria
[Stacked bar chart: number of reports rated Unsatisfactory, Fair, Good and Very Good on each criterion (Presentation/Structure, Executive Summary, Context & Purpose, Scope & Method, Reliability, Findings, Conclusions, Recommendations, Lessons Learned, GEEW).]
Table 3 shows the scores by subject area. Given the low numbers of reports, no specific sector can be said to have stronger reports. However, it is notable that two of the In-depth reports were rated as Very Good and one as Good. This is an improvement from 2015 when, of the eight In-depth reports, 25% were rated as Fair and 75% as Good.
Table 3: Report Rating by Subject
[Stacked bar chart: number of reports rated Fair, Good and Very Good in each subject area (Corruption, Organized Crime & Drugs, Justice, Prev/Treat/Reinteg, Research Trends, Terrorism Prevention, In-depth).]
A. Strengths of the Evaluation Reports

As mentioned above, the evaluation reports for 2016 showed improvement, with 58% receiving a rating of Good or Very Good. There were four primary strengths observed across those reviewed:
Consistency with DAC and UNEG Norms: As was the case for 2015, the reports generally conformed to the accepted norms and guidelines.

Conclusions: The conclusions section was the best in meeting criteria, showing that the evaluators were able to synthesize the results effectively in 16 of 19 cases (84% were rated as Good or Very Good).

Presentation and Structure: This section of the reports received the second-highest score, with 14 of 17 reports receiving a rating of Very Good or Good.

Inclusion of Gender Analysis: There were notable improvements in the 2016 reports; these are discussed in the "Mainstreaming Human Rights and Gender Considerations" section below.
B. Weaknesses of the Evaluation Reports

There continue to be areas in which UNODC evaluations could improve overall. These include:
Executive Summaries: This was the weakest section of the 2016 evaluation reports. The main issues were that the section often (a) exceeds the maximum page length, (b) does not reference the evaluation methodology used, (c) is overly focused on findings rather than conclusions, and (d) includes too many and/or unclear recommendations in the Summary Matrix.
Reliability of Data: A number of good practices in the field of development evaluation are not sufficiently used in the reports reviewed. Common shortcomings are:
• The absence of sampling frames to address reliability of data. Without this, it is not possible to know if the data are representative.
• The sources of data from stakeholders consulted during the evaluation process not being evident in the Findings. In approximately one third of reports, the findings are presented without reference to where the perspectives came from, or with only very general reference to source (e.g. "interviewees said . . ."). This can leave the impression that the Findings are solely or mostly the perspective of the evaluator.
• Responses not being broken down by stakeholder group or by gender.
• Data that was collected but not reported on. In a number of cases, the methodology included data collection processes such as surveys or Most Significant Change (MSC) questions, but there was no specific reference to these in the Findings. BOL/Y15 is a case in point: the methodology included interviews with 467 students, yet none of the data from this exercise was shown. In such instances, the reader cannot know whether the data was not analyzed or simply not useful.
• Lack of quantitative data collected during the evaluation (primary source).
• Overuse of data from document review (secondary source).
• No, or overly vague, explanation of how data was analyzed, which suggests that data is not being reviewed in a systematic manner.
Absence of Logframes: As was the case for 2015, the most significant concern is the absence of results frameworks and clear indicators to measure the progress or achievement of the intervention against its planned results. Logframes were rarely included in the reports, but in four cases the evaluators prepared an improved framework (and these were evaluation reports rated Very Good). Logframes should be mandatory components of evaluations, since they express what has been promised in terms of outcomes and objectives to be achieved. If evaluators find existing logframes inadequate, they should attempt to set one up in order to be clear about what the evaluation process is measuring. At a minimum, the evaluators should articulate the proposed chain of results or program theory. This practice should be suggested in the inception reports.
Inadequate attention to Effectiveness: Effectiveness should be the main summative focus of the Findings. Often, more attention is given to one or more of Relevance, Design, Efficiency, and Impact (when the latter is incorrectly used). While all are important to comment on, these should be less of a focus than progress towards achieving the results of the intervention.
Varied understandings of Effectiveness and Impact: Frequently, the Effectiveness section is used to present the activities that have been carried out without mentioning, or linking them to, the outcomes they are expected to contribute towards. This happens even though, in most cases, the main evaluation questions in the ToRs are explicit about the need for the Effectiveness section to focus on outcomes. An additional concern is that the Impact section is frequently pitched at the level of outcome results, and sometimes at the output level (e.g. increased knowledge gained from training workshops). If the project has not yet had the opportunity to show substantial progress towards results, in most cases it would be sufficient for the evaluator simply to state that in the Impact section.
Other common shortcomings: These include lack of consultation with, or involvement of, CLPs or other stakeholders beyond their being interviewed; not providing information about the evaluators and their suitability for the assignment (in one case the name of the evaluator was not provided); not providing the dates/timeframe of the evaluation process; minimal use of visual aids; and using interview protocols that are not adapted for specific stakeholder groups.
C. Improvements from First Draft in Final

As requested, the reviewers compared the final reports of just over one third (seven) of the evaluations with their first drafts, on which IEU will have commented. These were randomly selected from the first batch of 17 evaluations received. The results of the comparison are shown in Table 4, and the specific analysis is shown in Annex 1.
Table 4. Comparison of First and Final Drafts of a Sample of Evaluations

Evaluation | Date of first draft / final draft | Substantive Changes | Rating
MEXZ44 Preventing Corruption Mexico | October 2015 / October 2015 | Few | Fair
XAWU72 Airport Communications Project Final | July 2016 / September 2016 | Many | Good
LAOX26 Criminal Justice Responses to Human Trafficking in Lao | Aug/Sept 2016 / September 2016 | Many | Very Good
PERG34 Proyecto Sistema de Monitoreo de Cultivos Ilícitos SIMCI | July 2015 / February 2016 | Some | Fair
BRAX16 Expressive Youth - Citizenship, Access to Justice, Peace Brazil | November 2014 / June 2015 (but issued in 2016) | Some | Good
PSEX02 Forensic Human Resource & Governance Palestine Midterm | February 2016 / June 2016 | Some | Good
XAPX37 Counter-Terrorism East and Southeast Asia Partnership Criminal Justice Responses | April 2016 / June 2016 | Few | Very Good
The final documents showed general improvement from the draft versions. Although it did not appear that the changes would have affected the overall EQA ratings of any of the reports, there were several cases where the category scores would have improved, particularly GEEW scores. There was considerable variation in the extent of changes between the draft and final versions: of the seven reports, two had minimal changes, two had substantial changes and three were in between. In the two reports that were in Spanish, the main difference was that the Executive Summary (and summary matrix) was translated into English. Overall, the revisions were mostly helpful in one or more of the following ways: increasing readability through editing and formatting changes, providing more evidence to back up findings, disaggregating the list of stakeholders consulted by gender, including more gender analysis, and reformulating the Conclusions. In some instances the revisions were not beneficial, most notably a report where the Executive Summary was increased from four to seven pages.
There were substantial differences in the time between the first draft and the publication of the final version of the report, ranging from the same month to over a year.
D. Best Practices Observed

Revisions to logical frameworks: Where the evaluators reviewed and then, when necessary, revised the logical framework of the evaluation, they were able to obtain data that show outcomes and give an initial sense of impact, or to indicate where data could not be obtained. The good logical frameworks clearly showed the expected causal connection between outputs and outcomes. The two In-depth evaluations rated as Very Good provide good examples of this practice.

Evaluation Matrix: The matrix included in the report for LAOX26 Strengthening Criminal Justice Responses to Human Trafficking clearly shows the indicators/benchmarks and findings in relation to each of the evaluation questions.

Robust methodology: RASH13 Prevention of Transmission of HIV Among Drug Users is exemplary for its strong methodology, which is able to show effectiveness, relevance and sustainability.

Treatment of findings: The Cluster Evaluation of SMART (GLOJ88) & Forensic (GLOU54) has a strong presentation of Effectiveness, Efficiency, Impact, and Partnerships.
Presentation of Evaluation Tools: TKMX57 Strengthening Customs Service in Turkmenistan is an exemplary report for its inclusion of evaluation tools tailored to different stakeholder groups.
E. Challenges for Evaluators

Although some progress has been made in this regard, the ToRs mostly continue to have too many questions to be answered by the evaluations. The stronger evaluations are generally those where the evaluators have narrowed down the number of questions.
F. Mainstreaming Human Rights and Gender Considerations

As in 2015, the evaluations this year consistently included dedicated sub-sections within the Findings for Human Rights and Gender.
Table 5 shows that the average scores for considering GEEW have improved according to the UN-SWAP criteria. The SWAP tool assesses the extent to which gender equality and the empowerment of women (GEEW) is integrated into evaluation processes. There are four criteria, each rated on a scale of 0-3: 0 is awarded when there is no integration, 1 when there is partial integration, 2 when there is satisfactory integration, and 3 when the evaluation process exceeds requirements. The overall scores for 2016 would have been even higher had it not been for one report that received an Unsatisfactory rating.
Table 5: Average scores for the integration of GEEW (UN-SWAP)

Quality Assessment Criteria | Average Score (0-3), 2015 | Average Score (0-3), 2016
a. GEEW is integrated in the evaluation scope of analysis and indicators are designed in a way that ensures GEEW-related data will be collected. | 1.125 | 1.63
b. Evaluation criteria and evaluation questions specifically address how GEEW has been integrated into design, planning, implementation of the intervention and the results achieved. | 1.625 | 1.63
c. Gender-responsive evaluation methodology, methods and tools, and data analysis techniques are selected. | 0.875 | 1.37
d. Evaluation findings, conclusions and recommendations reflect a gender analysis. | 1.875 | 1.84
Overall Score | 5.5 | 6.53
Overall Rating | Fair | Fair
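As Table 5 implies, the overall UN-SWAP score is simply the sum of the four criterion averages, each on a 0-3 scale. A minimal sketch of that aggregation, assuming simple summation (the report does not state the thresholds mapping a total score to a rating such as "Fair", so the function below only sums; note that the displayed 2016 averages are rounded to two decimals, which is why their sum, 6.47, differs slightly from the reported 6.53):

```python
def unswap_overall(criterion_scores):
    """Sum the four UN-SWAP criterion scores (each on a 0-3 scale) into an
    overall score out of 12. The mapping from overall score to a rating
    (e.g. 'Fair') is not stated in this report, so none is derived here."""
    assert len(criterion_scores) == 4, "UN-SWAP uses exactly four criteria (a-d)"
    assert all(0 <= s <= 3 for s in criterion_scores), "each criterion is scored 0-3"
    return sum(criterion_scores)

# Average criterion scores for 2015, as reported in Table 5
scores_2015 = [1.125, 1.625, 0.875, 1.875]
print(unswap_overall(scores_2015))  # 5.5, rated Fair in Table 5
```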
Reports that were particularly strong in analysing gender issues included:

PSEX02 Forensic Human Resource & Governance Palestine Midterm: a good example of an analysis that goes beyond looking at the representation of women in events and career positions to broader issues of GEEW.
XAPX37 Counter-Terrorism East and Southeast Asia Partnership Criminal Justice Responses and LAOX26 Criminal Justice Responses to Human Trafficking in Lao were both exemplary for the explanation in the methodology section of how gender was considered. The latter report referenced the UN Women handbook "How to Manage a Gender-Responsive Evaluation" and used this guidance to structure the evaluation approach (see Figure 1).
Figure 1: Excerpt of Methodology Section from LAOX26 Evaluation Report
RECOMMENDATIONS
The main recommendations for the IEU to consider are:
Updated guidance for managers of evaluation processes and for evaluators: It is suggested that the review and revision of UNODC's guidance for evaluation processes take into account the recommendations made in the EQA Synthesis Reports for both 2014 and 2015, as the issues raised continue to be relevant. The guidance should include the expectations for reviewing draft reports and should be available in Spanish and French.
Additional recommendations beyond those in past reports are:
More emphasis on robust methodology: Evaluators should include quantitative data collection and analysis processes. If surveys are not feasible, interview protocols and other tools should be designed to elicit at least some responses that can be more easily quantified through content analysis. Evaluators also need to be transparent about how they have analyzed the data collected through both quantitative and qualitative processes. Additionally, evaluators should disaggregate responses by stakeholder group and by gender. It is unlikely that all stakeholders hold the same views, and the varying perspectives should be illuminated in the Findings.
Further refinement of the EQA template: Further refinement of the template is suggested. The new design, with a box to rate each criterion element, has proven helpful, but its use has revealed some redundancies to be addressed. The recommended changes are shown in Annex 3.
Appendix 1. Comparison of Final Report with Draft Report
MEXZ44
Added English-language Executive Summary and Summary Matrix. Very minor editorial changes.
XAWU72
There were extensive changes between the versions; most, but not all, were improvements. The Executive Summary was increased from 4 to 7 pages, in part due to an overly detailed description of the project and changes, and the already excessive number of footnotes was increased. The ES improvements included criteria subheadings for the findings/conclusions and a note about the role of the CLP. The text in the Summary Matrix is somewhat more clear and concise. In the Background section, a description of the project was removed, which negatively affected the EQA rating for that section. In the Methodology, Conclusions and Recommendations sections, references to HR and Gender were added, and in the annexed list of stakeholders, the numbers interviewed were gender disaggregated. Text was adjusted throughout the Findings, mostly for clarification or additional evidence. There was a substantial amount of editing (mostly minor issues); long paragraphs were broken up and some were rearranged. The Conclusions were completely changed to include all criteria. One lesson was moved to become a conclusion. There were a number of formatting changes throughout the document, most of them helpful, but in a couple of places maps/figures ended up covering text, which was not the case in the earlier draft.
LAOX26
There were a number of changes that improved the final version, and the GEEW score. The final version noted that the consultants were independent and that one was a substantive expert. The text of the Executive Summary was not included in the draft. The Summary Matrix was adjusted in the final version to indicate to whom recommendations were directed and to include more in the evidence column, and one recommendation (and the corresponding lesson) about handing back victims of trafficking was dropped. In several cases, direct excerpts from documents (some quite lengthy) included in the draft were removed or shortened, and the main points from those were integrated into the main text; this was an improvement. More specificity was added about institutions that were interviewed and were part of the Project Steering Committee. The full set of evaluation questions (41) was removed from the methodology section, with a reference to where they could be found in the annex. The Efficiency section was expanded to include more financial information (budget/outcome and spending rate), more analysis of the project logframe, and more evidence to support the shortcomings identified. The Effectiveness section included additional information about the outcomes of the training. The Conclusions section, which was organized by criteria, originally did not address Design or Human Rights & Gender. There was minor re-wording of the Recommendations. The names of people consulted during the evaluation were removed from the listing in the annex, and the gender disaggregation of this group was added.
PERG34
Added English-language Executive Summary and Summary Matrix. Moved a table on Evaluación de Hallazgos from the Findings section to an annex. Added a recommendation for UNODC Vienna: "4. It is recommended that headquarters in Vienna establish a training programme on new technologies, aimed among others at the project's technical staff, covering the use and handling of latest-generation satellite products and the new systems for measuring crop extent and production, and define their applicability in the country." Added five recommendations for the project team and edited the results shown in the Summary Matrix. Added a table on budgetary contributions by source. Added a paragraph in the methodology section describing limitations in the interviews that were conducted. Material in the Findings dealing with design, relevance, efficiency, partnerships and effectiveness was expanded in the final. There were no other changes in the conclusions or recommendations.
BRAX16
The first draft did not have an executive summary. The final draft translated the front page from Portuguese to English. Annexes were added to the final draft.
PSEX02
There were minor changes throughout most of the document. The font used in the draft was in accordance with the Guidelines, but this was changed in the final. Minor revisions included moving one recommendation from 'important' to 'key' in the Summary Matrix and noting at the beginning of the Recommendations section to whom the majority of recommendations were directed. More substantial editing changes included moving portions of the text to the footnotes (footnotes were already used rather frequently), gender disaggregation of the list of stakeholders, adding more explanatory detail in some cases, and making some paragraphs/sections more succinct. The latter was particularly evident in the Efficiency section, where the financial discussion was shortened and the lengthy subsection on "Reasons explicating Efficiency" was reduced from 3.5 to 2 pages. Although most of the editing changes appeared helpful, in a few cases references to the perspectives of stakeholders were removed, and it was noted in the EQA that this was a shortcoming of the report. The draft included just the ToR of the lead evaluator, but the final version included the full ToR (with ToRs for both consultants), a practice that is noted in the EQA Summary Reports as being redundant.
XAPX37
There were some editorial changes (including making some statements less definitive). The section on human rights and gender equality was expanded. Text on partnerships was moved up ahead of effectiveness in the findings.
Appendix 2. UNODC EQA Template as Used in the Review

General Project Information
Project/Programme Number and Name
Thematic Area
Geographic Area (Region, Country)
Approved budget at the time of the evaluation (USD)
Type of Evaluation (In-Depth/Independent Project; final/midterm; other)
Evaluator(s)
Date of Evaluation (from MM/YYYY to MM/YYYY)
Date of Evaluation Report (MM/YYYY)
Quality Assessment conducted on/by

OVERALL QUALITY RATING:
SUMMARY:
Quality Assessment Criteria. Assessment Levels: Very Good - Good - Fair - Unsatisfactory. Meets Criteria: Y = Yes, N = No, P = Partially
1. Structure, Completeness and Clarity of Report  RATING:
a. Format (headings, font) accords to IEU Guidelines and Templates for Evaluation Reports.
b. Structure accords to IEU Guidelines for Evaluation Reports with the following sequence: Executive Summary; Summary Matrix of Findings, Evidence and Recommendations; Introduction (Background and Context, Evaluation Scope and Methodology, Limitations to the Evaluation); Findings (Design, Relevance, Efficiency, Partnership and Cooperation, Effectiveness, Impact, Sustainability, Human Rights and Gender Equality/mainstreaming, Innovation); Conclusions; Recommendations; Lessons Learned.
c. Language is empowering and inclusive, avoiding gender, heterosexual, age, cultural and religious bias, among others.
d. Report is easy to read and understand (i.e. written in accessible non-technical language appropriate for the intended audience). Visual aids, such as maps and graphs, are used to convey key information. List of acronyms is included.
e. Report is free from any grammar, spelling, or punctuation errors.
f. Objectives stated in the terms of reference are adequately addressed.
g. Issues of human rights and gender equality/mainstreaming are adequately addressed.
h. Report contains a logical sequence: evidence - assessment - findings - conclusions - recommendations.
i. Composition of Evaluation Team is included and has gender and geographic expertise. Preferably it is gender balanced and includes professionals from the countries or regions concerned.
j. Annexes include at a minimum: evaluation terms of reference; list of persons interviewed and sites visited; list of documents consulted; evaluation tools used.
2. Executive Summary  RATING:
a. Written as a stand-alone section that provides an overview of the evaluation and presents its main results.
b. Generally follows the structure of: i) Purpose, including intended audience(s); ii) Objectives and brief description of intervention; iii) Methodology; iv) Main Conclusions; v) Recommendations.
c. Summary Matrix presents only the key and most important recommendations from the evaluation report.
d. Findings, sources and recommendations in the Summary Matrix are clear and cohesive, and specify the stakeholder to whom they are addressed.
e. Maximum length 4 pages, excluding the Summary Matrix.

3. Evaluation Context and Purpose  RATING:
a. Clear description of the project evaluated is presented.
b. Logic model and/or the expected results chain, and/or program theory (that at a minimum identifies and links objectives, outcomes and indicators of the project) is clearly described.
c. Context of key cultural, gender-related, social, political, economic, demographic, and institutional factors is described, and the key stakeholders involved in the project implementation and their roles are identified.
d. Project's status is described, including its phase of implementation and any significant changes (e.g. strategies, logical frameworks) that have occurred.
e. Purpose of the evaluation is clearly defined, including why it was needed at that point in time, who needed the information, what information is needed, how the information will be used, and the target audience.
4. Scope and Methodology  RATING:
a. Evaluation scope is clearly explained, including the main evaluation criteria, questions, and justification of what the evaluation did and did not cover.
b. Transparent description presented of methodology applied; how it was designed to address the evaluation purpose, objectives, questions and criteria is explained.
c. Methodology allows for drawing causal connections between outputs and expected outcomes.
d. Gender-sensitive methodology, aware of power relations during an evaluation process, inclusive and participatory.
e. Data collection methods and analysis, and data sources, are clearly described, as are the rationale for selecting them and their limitations. Reference indicators and benchmarks are included where relevant.
f. Sampling frame clearly described, including area and population to be represented, rationale for selection, mechanics of selection (including whether random), numbers selected out of potential subjects, and limitations of the sample.
g. Methods are appropriate for analysing gender equality/mainstreaming and human rights issues identified in the evaluation scope.
h. High degree of participation of internal and external stakeholders, including the Core Learning Partners, throughout the evaluation process is planned for and made explicit. When there are thematic or approach gaps (e.g. gender equality/mainstreaming) among stakeholders, other key informants not directly involved in the project were invited for consultation.
5. Reliability of Data (to ensure quality of data and robust data collection processes)  RATING:
a. Triangulation principles (using multiple sources of data and methods) were applied to validate findings.
b. Qualitative and quantitative data sources were used, and included the range of stakeholder groups and additional key informants (when necessary) defined in the evaluation scope.
c. Limitations that emerged in primary and secondary data sources and collection processes (bias, data gaps, etc.) are identified and, if relevant, actions taken to minimize such issues are explained.
d. Evidence provided of how data was collected with sensitivity to issues of discrimination and other ethical considerations.
e. Adequate disaggregation of data by relevant stakeholder undertaken (gender, ethnicity, age, under-represented groups, etc.). If this has not been possible, it is explained.
6. FINDINGS AND ANALYSIS (to ensure sound analysis and credible findings)  RATING:
Findings:
a. Have been formulated clearly, take into account any identified benchmarks, and are based on rigorous analysis of the data collected.
b. Address all evaluation criteria and questions raised in the ToR, including relevance, efficiency, effectiveness, impact and sustainability, as well as UNODC's additional criteria of design, partnership and cooperation, innovation, and the cross-cutting themes of human rights and gender.
c. Address any limitations or gaps in the evidence and discuss any impacts on responding to evaluation questions raised in the ToR.
d. Discuss any variances between planned and actual results of the project (in terms of objectives, outcomes, outputs).
e. Are presented in a clear manner.
Analysis:
a. Interpretations are based on carefully described assumptions.
b. Contextual factors are identified (including reasons for accomplishments and failures, and continuing constraints).
c. Cause-and-effect links between an intervention and its end results (including unintended results) are explained.
d. Includes substantive analysis of gender equality/mainstreaming issues.
e. Includes substantive analysis of human rights issues.

7. CONCLUSIONS  RATING:
a. Take into consideration all evaluation criteria and questions, including human rights and gender equality/mainstreaming criteria.
b. Have been formulated clearly and are based on findings and substantiated by evidence collected.
c. Convey the evaluators' unbiased judgement of the intervention.
d. Developed with the involvement of relevant stakeholders.
e. Present a comprehensive picture of both the strengths and weaknesses of the project.
f. Go beyond the findings and provide a thorough understanding of the underlying issues of the project and add value to the findings.

8. RECOMMENDATIONS  RATING:
a. Are clearly formulated, based on the conclusions, and substantiated by evidence collected.
b. Address flaws, if any, in the project's data acquisition processes.
c. Are specific, realistic, time-bound and actionable, and of a manageable number.
d. Are clustered and prioritized.
e. Reflect stakeholders' consultations whilst remaining balanced and impartial.
f. Clearly identify a target group for action.

9. LESSONS LEARNED  RATING:
a. Are correctly identified, innovative and add value to common knowledge.
b. Are based on specific evidence and analysis drawn from the evaluation.
c. Have wider applicability and relevance to the specific subject and context.
SCORING

Element of the Evaluation           Points per Category   Points Awarded
                                    (Rating: Very Good / Good / Fair / Unsatisfactory)
Presentation and Completeness                10
Executive Summary                             5
Evaluation Context and Purpose                5
Evaluation Scope and Methodology             10
Reliability of Data                           5
Findings and Analysis                        35
Conclusions                                  10
Recommendations                              15
Lessons Learned                               5
Total Maximum Score                         100

Very Good      -> very confident to use
Good           -> confident to use
Fair           -> use with caution
Unsatisfactory -> not confident to use
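The points-based scoring above can be sketched in code. The maximum points per element come from the SCORING table (they total 100), but the template does not specify what fraction of an element's maximum is awarded at each rating level, so the `RATING_FRACTION` values below are purely illustrative assumptions, not part of the EQA methodology.

```python
# Maximum points per element, taken from the SCORING table (total = 100).
MAX_POINTS = {
    "Presentation and Completeness": 10,
    "Executive Summary": 5,
    "Evaluation Context and Purpose": 5,
    "Evaluation Scope and Methodology": 10,
    "Reliability of Data": 5,
    "Findings and Analysis": 35,
    "Conclusions": 10,
    "Recommendations": 15,
    "Lessons Learned": 5,
}

# Hypothetical share of the maximum awarded per rating level; the
# template leaves this allocation to the assessor's judgement.
RATING_FRACTION = {
    "Very Good": 1.0,
    "Good": 0.75,
    "Fair": 0.5,
    "Unsatisfactory": 0.25,
}

def total_score(ratings):
    """Sum the points awarded across all elements, given one rating per element."""
    return sum(MAX_POINTS[elem] * RATING_FRACTION[r] for elem, r in ratings.items())
```

For example, a report rated "Very Good" on every element would score the full 100 points under these assumptions, while one rated "Good" throughout would score 75.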
ASSESSMENT OF THE INTEGRATION OF GENDER EQUALITY AND EMPOWERMENT OF WOMEN (GEEW) for UN-SWAP

Quality Assessment Criteria                                   Comments    Score (0-3)
a. GEEW is integrated in the evaluation scope of analysis, and indicators are designed in a way that ensures GEEW-related data will be collected.
b. Evaluation criteria and evaluation questions specifically address how GEEW has been integrated into the design, planning and implementation of the intervention and the results achieved.
c. Gender-responsive evaluation methodology, methods and tools, and data analysis techniques are selected.
d. Evaluation findings, conclusions and recommendations reflect a gender analysis.
Overall Score:
Overall Rating:

UN-SWAP Scoring System
Exceeding Requirements:   3 - Fully integrated. Applies when all of the elements under a criterion are met, used and fully integrated in the evaluation, and no remedial action is required.
Meeting Requirements:     2 - Satisfactorily integrated. Applies when a satisfactory level has been reached and many of the elements are met, but improvement could still be made.
Approaching Requirements: 1 - Partially integrated. Applies when some minimal elements are met, but further progress is needed and remedial action to meet the standard is required.
Missing:                  0 - Not at all integrated. Applies when none of the elements under a criterion are met.

Overall Calculation: 11-12 = very good; 8-10 = good; 4-7 = fair; 0-3 = unsatisfactory.
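The Overall Calculation rule can be sketched as a short function: the four GEEW criteria (a-d) are each scored 0-3, summed, and the total mapped to an overall rating. The function name is ours, but the thresholds are taken directly from the scoring system above.

```python
def unswap_overall_rating(criterion_scores):
    """Map the four GEEW criterion scores (a-d, each 0-3) to the overall
    UN-SWAP rating: 11-12 = very good, 8-10 = good, 4-7 = fair,
    0-3 = unsatisfactory."""
    total = sum(criterion_scores)
    if not 0 <= total <= 12:
        raise ValueError("four scores of 0-3 must total between 0 and 12")
    if total >= 11:
        return "very good"
    if total >= 8:
        return "good"
    if total >= 4:
        return "fair"
    return "unsatisfactory"
```

So a report scoring 3, 3, 3 and 2 on the four criteria (total 11) would be rated "very good", while one scoring 2 on each (total 8) would be rated "good".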
Appendix 3. Recommended Revisions to UNODC EQA Template
General Project Information

Project/Programme Number and Name
Thematic Area
Geographic Area (Region, Country)
Approved budget at the time of the evaluation (USD)
Type of Evaluation (In-Depth/Independent Project; final/midterm; other)
Evaluator(s)
Date of Evaluation (from MM/YYYY to MM/YYYY)
Date of Evaluation Report (MM/YYYY)
Quality Assessment conducted on/by

OVERALL QUALITY RATING:
SUMMARY:

Quality Assessment Criteria
Assessment Levels: Very Good - Good - Fair - Unsatisfactory
Meets Criteria: Y = Yes; N = No; P = Partially; N/A = Not Applicable
1. Structure, Completeness and Clarity of Report
RATING:
a. Format (headings, font) accords with the IEU Guidelines and Templates for Evaluation Reports.
b. Structure accords with the IEU Guidelines for Evaluation Reports, with the following logical sequence: List of Acronyms; Executive Summary; Summary Matrix of Findings, Evidence and Recommendations; Introduction (Background and Context, Evaluation Scope and Methodology, Limitations to the Evaluation); Findings (Design, Relevance, Efficiency, Partnership and Cooperation, Effectiveness, Impact, Sustainability, Human Rights and Gender Equality/Mainstreaming, Innovation); Conclusions; Recommendations; Lessons Learned.
c. Objectives stated in the terms of reference are adequately addressed.
d. Issues of human rights and gender equality/mainstreaming are adequately addressed.
e. Report is easy to read and understand (i.e. written in accessible, non-technical language appropriate for the intended audience).
f. Language is empowering and inclusive, avoiding gender, heterosexual, age, cultural and religious bias, among others.
g. Report is free from grammar, spelling or punctuation errors.
h. Visual aids, such as maps and graphs, are used to convey key information.
i. Report contains a logical sequence: evidence, assessment, findings, conclusions, recommendations.
j. Composition of the Evaluation Team is included and has gender and geographic expertise. Preferably it is gender-balanced and includes professionals from the countries or regions concerned.
k. Annexes include at a minimum: the evaluation terms of reference; logic model and/or evaluation matrix; list of persons interviewed and sites visited; list of documents consulted; and evaluation tools used.

2. Executive Summary
RATING:
a. Written as a stand-alone section that provides an overview of the evaluation and presents its main results.
b. Generally follows the structure: i) Purpose, including intended audience(s); ii) Objectives and brief description of the intervention; iii) Methodology; iv) Main Conclusions; v) Recommendations.
c. Summary Matrix presents only the key and most important recommendations from the evaluation report.
d. Findings, sources and recommendations in the Summary Matrix are clear and cohesive, and specify the stakeholder to whom they are addressed.
e. Maximum length 4 pages, excluding the Summary Matrix.

3. Evaluation Context and Purpose
RATING:
a. Clear description of the project evaluated is presented.
b. Logic model and/or the expected results chain, and/or program theory (that at a minimum identifies and links objectives, outcomes and indicators of the project) is clearly described.
c. Context of key cultural, gender-related, social, political, economic, demographic and institutional factors is described, and the key stakeholders involved in the project implementation and their roles are identified.
d. Project status is described, including its phase of implementation and any significant changes (e.g. to strategies, logical frameworks) that have occurred.
e. Purpose of the evaluation is clearly defined, including why it was needed at that point in time, who needed the information, what information is needed, how the information will be used, and the target audience.

4. Scope and Methodology
RATING:
a. Evaluation scope is clearly explained, including the main evaluation criteria, questions and justification of what the evaluation did and did not cover.
b. Transparent description of the methodology applied is presented, including how it was designed to address the evaluation purpose, objectives, questions and criteria.
c. Methodology allows for drawing causal connections between outputs and expected outcomes.
d. Methods are appropriate for analysing gender equality/mainstreaming and human rights issues identified in the evaluation scope; a gender-sensitive methodology takes into account power relations during the evaluation process and is inclusive and participatory.
e. Data collection methods, analysis and data sources are clearly described, as are the rationale for selecting them and their limitations. Reference indicators and benchmarks are included where relevant.
f. Sampling frame is clearly described and includes the area and population to be represented, the rationale for selection, the mechanics of selection (including whether random), the numbers selected out of potential subjects, and the limitations of the sample.
g.
h. High degree of participation of internal and external stakeholders, including the Core Learning Partners, throughout the evaluation process is planned for and made explicit. When there are thematic or approach gaps (i.e. gender equality/mainstreaming) among stakeholders, external key informants not directly involved in the project were invited for consultation.
5. Reliability of Data: to ensure quality of data and robust data collection processes
RATING:
a. Triangulation principles (using multiple sources of data and methods) were applied to validate findings.
b. Qualitative and quantitative data sources were used, and included the range of stakeholder groups and additional key informants (when necessary) defined in the evaluation scope.
c. Limitations that emerged in primary and secondary data sources and collection processes (bias, data gaps, etc.) are identified and, if relevant, actions taken to minimize such issues are explained.
d. Evidence provided of how data was collected with a sensitivity to issues of discrimination and other ethical considerations.
e. Adequate disaggregation of data by relevant stakeholder undertaken (gender, ethnicity, age, under-represented groups, etc.). If this has not been possible, it is explained.
6. FINDINGS AND ANALYSIS: to ensure sound analysis and credible findings
RATING:
Findings:
a. Are clearly formulated and presented.
b. Are based on rigorous analysis of the data collected; take into account any identified benchmarks.
c. Address all evaluation criteria and questions raised in the ToR, including relevance, efficiency, effectiveness, impact and sustainability, as well as UNODC's additional criteria of design, partnership and cooperation, innovation, and the cross-cutting themes of human rights and gender.
d. Address any limitations or gaps in the evidence and discuss any impacts on responding to evaluation questions raised in the ToR.
e. Discuss any variances between planned and actual results of the project (in terms of objectives, outcomes, outputs).
f. Are presented in a clear manner.

Analysis:
a. Interpretations are based on carefully described assumptions.
b. Contextual factors are identified (including reasons for accomplishments and failures, and continuing constraints).
c. Cause-and-effect links between an intervention and its end results (including unintended results) are explained.
d. Includes substantive analysis of gender equality/mainstreaming issues.
e. Includes substantive analysis of human rights issues.

7. CONCLUSIONS
RATING:
a. Take into consideration all evaluation criteria and questions, including human rights and gender equality/mainstreaming criteria.
b. Have been formulated clearly and are based on findings and substantiated by evidence collected.
c. Convey the evaluators' unbiased judgement of the intervention.
d. Developed with the involvement of relevant stakeholders.
e. Present a comprehensive picture of both the strengths and weaknesses of the project.
f. Go beyond the findings and provide a thorough understanding of the underlying issues of the project and add value to the findings.

8. RECOMMENDATIONS
RATING:
a. Are clearly formulated, based on the conclusions, and substantiated by evidence collected.
b. Address flaws, if any, in the project's data acquisition processes.
c. Are specific, realistic, time-bound and actionable, and of a manageable number.
d. Are clustered and prioritized.
e. Reflect stakeholders' consultations whilst remaining balanced and impartial.
f. Clearly identify a target group for action.

9. LESSONS LEARNED
RATING:
a. Are correctly identified, innovative and add value to common knowledge.
b. Are based on specific evidence and analysis drawn from the evaluation.
c. Have wider applicability and relevance to the specific subject and context.

SCORING

Element of the Evaluation           Points per Category   Points Awarded
                                    (Rating: Very Good / Good / Fair / Unsatisfactory)
Presentation and Completeness                10
Executive Summary                             5
Evaluation Context and Purpose                5
Evaluation Scope and Methodology             10
Reliability of Data                           5
Findings and Analysis                        35
Conclusions                                  10
Recommendations                              15
Lessons Learned                               5
Total Maximum Score                         100

Very Good      -> very confident to use
Good           -> confident to use
Fair           -> use with caution
Unsatisfactory -> not confident to use
ASSESSMENT OF THE INTEGRATION OF GENDER EQUALITY AND EMPOWERMENT OF WOMEN (GEEW) for UN-SWAP

Quality Assessment Criteria                                   Comments    Score (0-3)
a. GEEW is integrated in the evaluation scope of analysis, and indicators are designed in a way that ensures GEEW-related data will be collected.
b. Evaluation criteria and evaluation questions specifically address how GEEW has been integrated into the design, planning and implementation of the intervention and the results achieved.
c. Gender-responsive evaluation methodology, methods and tools, and data analysis techniques are selected.
d. Evaluation findings, conclusions and recommendations reflect a gender analysis.
Overall Score:
Overall Rating:

UN-SWAP Scoring System
Exceeding Requirements:   3 - Fully integrated. Applies when all of the elements under a criterion are met, used and fully integrated in the evaluation, and no remedial action is required.
Meeting Requirements:     2 - Satisfactorily integrated. Applies when a satisfactory level has been reached and many of the elements are met, but improvement could still be made.
Approaching Requirements: 1 - Partially integrated. Applies when some minimal elements are met, but further progress is needed and remedial action to meet the standard is required.
Missing:                  0 - Not at all integrated. Applies when none of the elements under a criterion are met.

Overall Calculation: 11-12 = very good; 8-10 = good; 4-7 = fair; 0-3 = unsatisfactory.