
Guideline

Initiating and sponsoring research and evaluation

This document is currently under review. Please direct any queries regarding this document to the policy officer listed in Table 1.

Summary

This guideline is designed to ensure that the quality of evidence provided to key decision-makers in the Department for Education and Child Development (DECD) is of a high standard. It requires evaluations to meet particular standards and aims to ensure all significant DECD services are evaluated on a regular planned basis. Whilst the guidelines are predominantly applied to evaluation, they also apply where DECD sponsors or initiates research.

Table 1 - Document details

Publication date: 23 March 2016

File number: DECD 14/11161

Related legislation: This guideline is consistent with the requirements and expectations of the Public Sector Act 2009, the Public Finance and Audit Act 1987 and Treasurer's Instructions. NOTE: Treasurer's Instruction 17 instructs Chief Executives to ensure that officers of their authorities evaluate public sector initiatives.

Related policies, procedures, guidelines, standards, frameworks:
Research and Evaluation Policy
Procurement Governance Policy
Conducting Research and Evaluation in DECD Procedure
Reporting on Research and Evaluation Guideline

Version: 1.1

Replaces: 1.0

Policy officer (position): Manager Strategic Data Development

Policy officer (phone): (08) 8204 1262

Policy sponsor (position): Director, Business Intelligence, Office for Strategy and Performance


Executive director responsible (position and office): A/Executive Director, Office for Strategy and Performance

Applies to: All DECD staff

Key words: Research, Evaluation, Evidence-based Policy, Evidence-based Practice

Status: Approved

Approved by: A/Executive Director, Office for Strategy and Performance

Approval date: 22 March 2016

Review date: 22 March 2017


Table 2 - Revision record

Date | Version | Revision description
2014 | 1.0 | New guideline
22 March 2016 | 1.1 | Minor edits, approved by A/Executive Director, Office for Strategy and Performance.


Table of Contents

Guideline
Initiating and Sponsoring Research and Evaluation
Summary
Table of Contents
1. Title
2. Purpose
3. Scope
4. Guideline detail
4.1 Aims and benefits of evaluation
4.2 Planning Evaluation – New Programs
4.3 Planning Evaluation – Existing Programs
4.4 Planning Evaluation – Coordination
4.5 Undertaking Evaluation
4.6 Initiating, Partnering or Influencing Research within DECD
5. Roles and responsibilities
6. Monitoring, evaluation and review
7. Definitions and abbreviations
8. Supporting documents
9. References
Appendices
Appendix 1: Evaluation Planning Documentation Requirements
Appendix 2: Standards for Quality Evaluation
Appendix 3: Guidance Notes for Assessing the Strength of Research and Evaluation Evidence for a Single Research and Evaluation Report


1. Title

Initiating and Sponsoring Research and Evaluation Guideline

2. Purpose

This guideline is necessary to ensure that the quality of evidence provided to key decision-makers in the Department for Education and Child Development (DECD) is of a high standard.

This guideline aims to:

• ensure that the Senior Executive Group within DECD has timely, strategically focussed, objective and evidence-based information on the performance of its programs to assist in planning and delivering improved outcomes for children, young people and families;

• improve the quality of evaluation design, management and reporting;

• bring about a culture of evaluation and evidence-informed policy within DECD.

This guideline aims to ensure that research and evaluation that is procured, sponsored, funded, or undertaken by DECD or by DECD and one or more partners:

• Is of a high standard in terms of research methodology, design and reporting

• Is likely to provide evidence that can reliably be used to inform policy and practice

• Utilises existing DECD data sets where available

• Is cost effective and avoids duplication of other research and evaluation activities

• Ensures the safety and well-being of students, children and young people, parents, staff and volunteers.

3. Scope

This guideline sets out how research and evaluation should be specified in any contracts or agreements that relate to DECD.

Whilst this guideline predominantly applies to evaluation, it should be applied to all research and evaluation activity within DECD, or that DECD is involved in or can influence. It also applies where DECD sponsors or initiates research.

The scope of these guidelines includes programs, projects, policies, interventions, initiatives, activities, strategies and directions undertaken or managed through DECD whether alone or in partnership with other entities or organisations.


4. Guideline detail

4.1 Aims and benefits of evaluation

Evaluation is about improving the quality of decisions by informing current and future programs with lessons learned from previous experience. Evaluation has the potential to provide a better understanding of the impacts of programs and how they affect key outcomes.

Evaluation is an essential element of good governance and generates evidence-based information for decision and policy making in relation to existing and proposed programs. It provides mechanisms for organisational assurance and accountability and acts as a catalyst for reform and improvement. Evaluation provides management with objective information to assist in assessing value and setting priorities.

Evaluation within DECD has two main purposes:

• Learning: for the purpose of program improvement, evaluation for learning discovers what has worked well, for whom and in what circumstances and also, equally importantly, what has not worked.

• Accountability: for the purpose of holding DECD to account for its performance, evaluation for accountability discovers evidence of performance and outcomes.

4.2 Planning Evaluation – New Programs

All proposals for new major or significant programs should include an evaluation plan and documented program logic. This requires that the proposal seeking approval for the new initiative include the program theory and logic, which describes the theory of change, assumptions, dependencies, inputs, outputs and the expected short and long term outcomes. The program logic is generally required as part of the business case for a new initiative. It follows that the changes in outcomes that are put forward as benefits in the business case are the same outcomes that should be assessed as part of the evaluation.

The proposed timing of the evaluation should be specified in the evaluation plan and justified based on the nature of the new initiative. The timeframe for completion of an evaluation should be specified before a new program commences. This timeframe should relate to the expected time required to commence the program and for the outcomes to change as set out in the business case.

4.3 Planning Evaluation – Existing Programs

All existing programs are expected to be evaluated on a regular basis. The timing of evaluations will depend on the nature of the program. However, the value of undertaking a formal evaluation of an existing program should be considered by the relevant program owner at least every 5 years.

Some existing programs do not currently have a formal program logic documented. When the program is next evaluated, program logic should be developed and included as part of the specification for the evaluation.

Before preparing an evaluation brief, the first stage of the planning process is for the program area to conduct due diligence on the feasibility of what is desired. This should include an assessment of whether the program can actually be evaluated within the proposed scope and requirements, given the purpose of the evaluation, and whether all the evaluation questions can be answered to the level required.


4.4 Planning Evaluation – Coordination

The Business Intelligence Unit may work with program managers to support evaluations undertaken in DECD to ensure that:

• Evaluation activity and support will be targeted to areas of greatest strategic importance

• Evaluations are undertaken by reputable evaluators. The competence and integrity of the evaluator is critical to the credibility of evaluation.

• Credibility is strengthened by having a high degree of transparency of the evaluation process.

• Evaluations are cost-effective and utilise DECD standard data sets and other existing resources where possible.

• There is a managed regular evaluation program that is planned in advance.

4.5 Undertaking Evaluation

All evaluations should meet the DECD Standards for Quality Evaluation (refer Appendix 2).

All evaluations should be conducted independently from the organisational unit with responsibility for the program being evaluated.

The procurement of all evaluations will be in accordance with the DECD Procurement Governance Policy. To assist business units in assuring that evaluation reports are of a consistently high quality appropriate to DECD needs, the Evaluation Planning Documentation Requirements (Appendix 1) set out minimum expectations as to what should be included in a specification for the acquisition of consultancy or contractor services relevant to evaluation. These expectations are particular to evaluation and do not detract in any way from the overarching and general requirements for any proposed acquisition.

The program manager may prepare the specification in consultation with the Business Intelligence Unit.

The specifications for all proposed evaluations may be reviewed by the Business Intelligence Unit prior to being approved by the relevant Executive Director. Where applicable, worksites should then seek Procurement Unit advice on recommended procurement methods, on a case-by-case basis, in accordance with the DECD Procurement Governance Policy.

The specification for the evaluation will require the evaluation to:

• be an impartial assessment rather than a paid endorsement of a particular program;

• make a clear distinction between value judgements and statements of fact based on reliable methods of observation and inference;

• involve program stakeholders and program staff;

• ensure differences in perspectives between stakeholders are properly taken into account in planning, implementing and reporting an evaluation. Evaluations should be free of bias and reflect a fair and balanced view;


• ensure the process is as open as possible and the results made appropriately available. To enhance transparency, methods of data collection and analysis should be clearly disclosed; factual findings and conclusions explicitly justified; standards for value judgements clearly stated; and value judgements and recommendations clearly distinguished from factual findings and conclusions;

• meet the requirements of the Conducting Research and Evaluation in DECD Procedure for externally and internally initiated research, where the evaluation requires involvement of students, children and young people. This includes meeting the procedure's requirement that any research and evaluation that involves full or partial funding or in-kind support from DECD must obtain the approval of the Chief Executive.

Draft and final reports may be provided to the Business Intelligence Unit for feedback to ensure compliance with the Standards for Quality Evaluation. A separate guideline, the Reporting on Research and Evaluation Guideline, sets out the requirements for how evaluation reports are assessed and disseminated within DECD.

4.6 Initiating, Partnering or Influencing Research within DECD

DECD may initiate, partner or significantly influence research such as that initiated at a national level on behalf of all education authorities. In these situations DECD staff are required to apply the same principles outlined in these guidelines to ensure that the research evidence is of benefit to children and young people.

Separate Guidance Notes for Assessing the Strength of Research and Evaluation Evidence for a Single Research and Evaluation Report are included in Appendix 3. These detail how evidence from a single research study will be assessed by DECD. As such, they provide helpful guidance for DECD when specifying research proposals. The Business Intelligence Unit may be able to provide specific advice on research specifications.

5. Roles and responsibilities

Table 3 - Roles and responsibilities

Business Intelligence Unit
• Promote a culture of evaluation and evidence-informed policy.
• Provide quality assurance checks on proposals for evaluation by external contractors.
• May provide quality assurance on evaluation specifications going to tender for external contractors prior to approval by the Executive Director.
• Monitor and review these guidelines.

Assistant Director, Procurement
• Provide advice on recommended procurement methods and, where applicable, the submission of an Acquisition Plan to the Assistant Director, Procurement and Contracting.

Evaluators
• Meet the DECD Standards for Quality Evaluation and the specifications for the evaluation.
• May provide draft and final reports to the Business Intelligence Unit for feedback.

Executive Directors
• Ensure that all proposals for new programs incorporate an evaluation plan that includes the proposed timing of the evaluation and program logic.
• Ensure that all existing programs are evaluated at least every 5 years.
• Ensure there are adequate financial resources to support planned evaluations.
• Approve the specifications for all proposed evaluations.
• Ensure an evaluation schedule is maintained.
• Promote a culture of evaluation and evidence-informed policy within their area of responsibility.

Program Managers
• Prepare specifications and planning documentation for evaluation.

Senior Executive Group (SEG)
The SEG has the overarching responsibility to establish a sustainable evaluation culture within DECD, and specifically shall:
• Review the Evaluation Schedule to ensure that planned evaluations meet the identified critical strategic directions of the agency and, in particular, outcomes for children, young people and families;
• Ensure that the Evaluation Plan covers programs critical to the Department.

6. Monitoring, evaluation and review

These guidelines will be reviewed by the Business Intelligence Unit on an annual basis commencing 12 months after the approval date.


7. Definitions and abbreviations

Table 4 - Definitions and abbreviations

Activity (Activities): What happens between the inputs and outputs; the events or actions that take place, the things that are done to carry out the program (processes, tools, actions). Activities transform inputs into outputs.

Evaluation: The systematic collection and analysis of information about the activities, characteristics and outcomes of programs to make judgements about the program, guide improvement in program effectiveness and/or inform decisions about future programming. Programs include services, projects, policies, interventions, initiatives, business processes and activities.

Goal: A general description of an intended vision, which may be 'aspirational'.

Inputs: The resources available and used to run the program (e.g. money, facilities, staff, materials, equipment, customers, clients).

Key decision-makers: The Senior Executive Group within DECD, the Chief Executive, the Minister for Education and Child Development and Cabinet.

Need: The problem or issue that an intervention aims to address.

New Initiatives: A new program, intervention, service, legislation, project, business process or any other activity of strategic importance.

Office: For the purpose of this policy, 'Office' refers to a part of the organisation for which a Head, Deputy Chief Executive or Educational Director is responsible.

Output(s): What is immediately produced through the activities (e.g. products, policies, distributions, materials, events).

Outcome(s): The changes or benefits to the participants, intended and unintended, resulting from an intervention (i.e. the end result from the application of the inputs to perform the activities that produce the outputs).

Objective(s): The specific means to reach the goal; a fairly specific description of an intended outcome; the steps taken to achieve the desired outcome.

Program: In the context of evaluation and this policy, 'program' is interpreted widely to include projects, policies, legislation, interventions, initiatives, activities, business processes, strategies and directions undertaken by DECD, whether alone or in partnership with other entities or organisations, that are of strategic importance. The term 'program' is used throughout this policy document to embrace all of the above activities.

Research: Research is defined as "the creation of new knowledge or the use of existing knowledge in new, creative and systematic ways so as to generate new concepts, methodologies and understandings" (OECD, 2002). Also: a studious inquiry or examination; especially investigation or experimentation aimed at the discovery and interpretation of facts, revision of accepted theories or laws in the light of new facts, or practical application of such new or revised theories or laws (http://www.merriam-webster.com/dictionary/research).

8. Supporting documents

Appendix 1: Evaluation Planning Documentation Requirements

Appendix 2: Standards for Quality Evaluation

Appendix 3: Guidance Notes for Assessing the Strength of Research and Evaluation Evidence for a Single Research and Evaluation Report

9. References

Davidson, Jane (2005). Evaluation methodology basics: the nuts and bolts of sound evaluation. Sage Publications, Thousand Oaks, California.

American Evaluation Association (2004). Guiding Principles for Evaluators. Retrieved March 2014 from http://www.eval.org/p/cm/ld/fid=51

Australasian Evaluation Society (2013). Guidelines for the Ethical Conduct of Evaluations. Retrieved March 2014 from http://www.aes.asn.au/join-the-aes/membership-ethical-guidelines.html

Department for Education and Child Development (2009). Research and Evaluation Framework. Retrieved February 2014 from https://www.decd.sa.gov.au/department/research-and-data

Department for Education and Child Development (2011). Self Review Standards. Retrieved March 2014.

OECD (2013). Synergies for Better Learning: An International Perspective on Evaluation and Assessment. Retrieved March 2014 from http://www.oecd.org/edu/school/synergies-for-better-learning.htm

Department of Education and Early Childhood Development (2013). Evaluation in DEECD Guide. Retrieved March 2014 from https://edugate.eduweb.vic.gov.au/Services/frameworks/evaluation/Pages/default.aspx


Appendix 1: Evaluation Planning Documentation Requirements

To avoid confusion with the general use of the term ‘specification’ in procurement documentation, the specific expectations relevant to the procurement of evaluation/review services will be referred to hereafter as ‘terms of reference’. Each of the headings in these requirements (which now follow) should be included in the terms of reference (specification) for an evaluation or review to be conducted by an external contractor or consultant.

1 Context/background

This section should be aimed at providing potential contractors/consultants with a succinct description of the program to be evaluated. Typical information might include:

• The fundamental, underlying 'need' for the program – what problem is being addressed by the program? (Sometimes referred to as the problem statement.)

• The fundamental, underlying 'theory' as to how, and to what extent, the program will address the 'need' or problem. (Sometimes referred to as the program theory.)

• The size and reach of the program (in terms of the 'dollar' size of the program and/or the number of schools, students, teachers, etc. involved).

• The 'history' of the program or its maturity or stage of development – for example, the number of years it has been running.

• The stated goals, targets, KPIs or other measures.

• A listing of the key stakeholders.

Generally, a program logic model should be included in this section. The model should show the inputs, activities, outputs and outcomes (generally as short, medium and long term outcomes). The logic model may also include influences and assumptions. Logic models are generally depicted diagrammatically. A number of logic model templates are available. If a ‘diagrammatic’ logic model is not included, there should be a text based equivalent that fully describes the program’s theory and the logical links that lead to the planned outcomes. This section (context/background) might also include or reference, where available:

• Previous evaluations or research.

• A literature review.

It should be remembered that the program to be evaluated will be unfamiliar to most potential evaluators. This section has to “bring them, succinctly, up to speed” on what the program is all about – its why, how and when!


2 Purpose

The specification must include a statement of the purpose of the evaluation – that is, ‘why is the evaluation being conducted?’ The objective/s of the evaluation (as distinct from those of the program) should also be clearly stated. Davidson (2005) suggests that the main purpose/s of an evaluation are generally:

• to determine the overall quality or value of a program or initiative;

• to find areas for improvement; or

• both of the above.

The key (decision-making) audience for the evaluation report should also be stated (for example, the Chief Executive).

A description of how the results will be used might also be appropriate for this sub-section (for example, “the results of the evaluation will be used to inform decisions about whether the program should continue in the future”).

3 Scope

The scope defines “the depth and breadth of the evaluation”. In general, the broader the evaluation, the less depth will be covered – that is, while a broad evaluation may cover a lot of ground, results might be superficial or not specific. Conversely, the greater the depth of investigation, the narrower the focus might be.

The scope might also define the point in time from which evaluation consideration should begin, and/or the localities or regions, types of school or other demographic considerations to be covered. Where it is important that all types of location (e.g. metropolitan, outer-metro, rural, regional, remote) are represented, this should be made clear and distinguished from a scope that is to specifically include all regions. There may be significant cost implications for the scope required, which is why it is important that these are made explicit in the specification.

What is not in scope must also be made clear.

The specification cannot state the department's budget for the evaluation, so the 'scope' is an opportunity to describe the extent of what is expected – that is, to give prospective tenderers a 'sense' of whether to price a garden shed or a mansion (or something in between).

4 Key evaluation questions

Defining very clear explicit key questions that must be answered by the evaluation contractor is absolutely core to the specification for an evaluation. What, specifically, do we (DECD) want the evaluation to tell us? That is, what are the big picture question/s for which we need answers? Typically there will be two to five key questions. Additional sub-questions may be defined under each key question.


Generalised questions (which can be re-framed for particular programs) include:

• What differences are we making for children? (Or, for what children in what situations?)


• What specific outcomes for students have been achieved as a result of program x? (Improved NAPLAN scores, attendance, SACE completions, etc.)

• Is program x the most cost effective means for achieving the planned outcomes?

• Was program x executed as planned?

• How sustainable is program x (under various budget scenarios)?

5 Governance

The support that will be (and, significantly, will not be) provided by the department should be made absolutely explicit.

That a contact person will (or will not) be available should be clarified and arrangements for liaison with steering or other groups made explicit. This might include responsibility for making arrangements for meetings and, for example, interim reporting and frequency.

The requirements for contact with schools must also be made explicit and should include the requirements of the Conducting Research and Evaluation in DECD Procedure, where the evaluation requires involvement of students, children and young people.

If not detailed elsewhere, the ownership of data, evaluation outcomes and/or intellectual property must be clarified. For example, rights to publish papers based on the evaluation and data obtained from it must be made explicit.

6 Data specification

DECD has significant data holdings that can inform and support evaluations. The department also has an open information policy that makes it clear that these data sets are a public asset and are freely available for planning and evaluation purposes.

The terms of reference should make it clear that appropriate data sets (eg, enrolments, NAPLAN results, HR/staffing demographics, student demographics, etc.) and data elements will be made available to the evaluator. Any limitations associated with the data sets should also be made clear.

The Business Intelligence Unit in the Office for Strategy and Performance is available to assist program managers to identify relevant data sets.

Potential evaluators should also be made aware of all previous studies such as evaluations, reviews or research.

7 Deliverables and time frame

The terms of reference must make clear all required deliverables. These would almost certainly include a final report but might also include interim reports and/or presentations.

It is essential that the time frame for delivery of all reports is absolutely specific.

The form (electronic and/or hard copy) in which reports should be presented should also be made clear. Similarly, to whom they are to be presented.

It should be noted that it is common practice for a draft final report to be presented to the department to give the opportunity for feedback or clarification. The time frame for the department to comment on the draft report should also be made explicit.


Clearly, if other artefacts are required to be delivered as part of the evaluation (‘short’ report version, PowerPoint summary, draft policies, guidelines, professional development sessions, draft media releases, etc) the requirements for these, and timelines, should be made explicit.

8 Methodology

Unless there is some overwhelming reason, the terms of reference should not mandate the methodology – that is the area of expertise of the evaluator. To mandate a particular procedure (e.g. survey methodologies or focus groups) may restrict the evaluator which, in turn, might lead to a less informative report. (This is why the key evaluation questions and the 'purpose' of the evaluation are extremely important.)

However, if the department requires there to be, for example, regular meetings (say, with a reference group), this must be stated so that the evaluator can factor this into costings.


Appendix 2: Standards for Quality Evaluation

The SA Public Service High Performance Framework (HPF) and the DECD Improvement and Accountability Policy require DECD staff to undertake robust analysis of performance and the evaluation of practices, programs and improvement strategies to inform future directions.

Evaluation involves the determination of the merit, worth, value, quality and usefulness of a program, policy, service or any other intervention or initiative against its aims and objectives. It is an essential organisational practice that provides a valuable knowledge base for planning and implementing programs and services, and enables practitioners to meet accountability requirements and to more systematically document, share and communicate effective practices.

Quality evaluations often encompass common criteria and standards that can be identified to assist individuals and the department in ensuring that the evaluations that take place are rigorous and of sufficient quality. For the purposes of this document, a standard refers to an agreed set of expectations about a process or practice. Standards for evaluation are needed to ensure that staff are provided with the relevant and necessary information to enable them to assess the quality and rigour of an evaluation, in addition to providing guidance and focus in the planning, conduct and review of evaluations. The criteria that apply to any evaluation are typically determined by the purpose, scope and scale of the evaluation.

These standards are underpinned by a body of evaluation research, effective practice and other evaluation standards, including the standards of the American Evaluation Association (AEA) that have been endorsed by the board of the Australasian Evaluation Society (AES).

Standard 1 Effective management of the evaluation process

Criteria include:

• Processes have a clear, known sequence of actions, roles and responsibilities, and identified time-frames and end points for each evaluation activity;

• Planned and implemented effectively, including acquiring the services or support of appropriately skilled, responsible and qualified personnel;

• Clear documentation of why the program or strategy is needed, the underpinning research or theory, and who the target audience is;

• Fiscal responsibility, where appropriate (i.e. evaluations should account for all expended resources);

• Consideration is given to the context surrounding the evaluation, including competing demands between national, departmental and local needs and priorities;

• Any potential risks are identified and managed, including conflicts of interest, individual and/or otherwise (e.g. funding).

Standard 2 The evaluation is strategically, purposefully and robustly designed

Criteria include:

• Clearly articulates the purposes of the evaluation and describes the objectives of the program or service in terms that are measurable or quantifiable;

• Formulates a set of key evaluation questions or hypotheses;

• The relationship between the goals, inputs, activities/strategies undertaken, outputs and any short-term, medium-term or long-term outcomes proposed for the program or service being evaluated is clearly described;

• Describes what evidence (literature, past evaluations etc.) underpins the rationale for the evaluation;

• Uses an evaluation methodology that is appropriate to the evaluation purpose (e.g. experimental, comparison group, qualitative, formative/summative, cost-benefit analysis, pre/post test);

• Incorporates due consideration of the costs and available resources (e.g. funding, skills, time), including details of how the investment of resources in evaluation will provide a benefit;

• Describes the degree of program integrity by identifying whether the program, practice or strategy has been delivered according to how it was intended and on a consistent basis (e.g. over time periods or multiple sites, if relevant);

• Identifies the benefits (or otherwise) to children and young people, with clearly articulated links to emerging priority areas, educational goals and/or care, safety, wellbeing and learning outcomes of children, young people and their families;

• Understands the role of independent, external professional or academic partners in broadening the evaluation and ensuring evaluation processes are robustly and objectively applied.

Standard 3 The evaluation is ethically conducted

Criteria include:

• Maintains ethical standards and codes of conduct, including exercising informed judgements in applying evaluation processes in all contexts and dealings with stakeholders;

• Understands the relevant legislative, administrative and organisational policies and processes required;

• Establishes and maintains respectful collaborative relationships with providers of information, including respecting the values, culture and diversity of stakeholder views, ensuring the confidentiality of sensitive information and minimising any administrative and response burden (e.g. utilises existing data sources where data is available and relevant).

Standard 4 The evaluation produces valid and reliable findings

Criteria include:

• Collects quality data to achieve valid, reliable, rigorous and contextual evaluative information, based on valid and reliable measures (often multiple) of quantitative and qualitative evidence;

• Selects from a range of appropriate data collection methods that best support timely, representative, inclusive and accurate data to be collected (e.g. surveys, interviews, focus groups, observations, assessment/screening tools etc.);

• Selects from a range of appropriate data analysis techniques that adequately answer the evaluation questions and justify the conclusions drawn.

Standard 5 The evaluation is responsive, communicated and used

Criteria include:

• Reports are constructed clearly and accurately for target audiences in a timely manner, and discuss any strengths, limitations or unexpected outcomes of the evaluation;

• Conclusions and findings are clearly linked to the data collected and analysed as part of the evaluation;

• The main implications of the findings are discussed in the report which, where applicable, offers recommendations for policy and practice, including on the cost-efficacy of the program;

• Engages with stakeholders, decision makers and intended users in opportunities aimed at sharing and applying the evaluation (e.g. published findings, formal briefings, collaborative discussion etc.).


Appendix 3: Guidance Notes for Assessing the Strength of Research and Evaluation Evidence for a Single Research and Evaluation Report

Note: This protocol is based largely on the UK Department for International Development's 2014 Guidance Note on Assessing the Strength of Evidence (https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/291982/HTN-strength-evidence-march2014.pdf).

Additional guidance notes for assessing the strength of evidence are described in Appendix 2 of the "Reporting on Research and Evaluation Evidence, Findings and Recommendations within DECD" Guidelines.

Describing a single study

Single studies should be described and categorised as follows:
i. by type
ii. by design
iii. by method.

Type of research

Research and evaluation studies should be categorised by overarching type as follows:

i. Primary research studies empirically observe a phenomenon at first hand, collecting, analysing or presenting 'raw' data.

ii. Secondary review studies interrogate primary research studies, summarising and interrogating their data and findings.

iii. Theoretical or conceptual studies: most studies (primary and secondary) include some discussion of theory, but some focus almost exclusively on the construction of new theories rather than generating or synthesising empirical data.

Research Designs, Research Methods

A research design is a framework in which a research study is undertaken. It employs one or more research methods to:

i. collect data

ii. analyse data.

Data collection can be either quantitative or qualitative.

Data analysis methods can also be quantitative (using mathematical techniques to illustrate data or explore causal relationships) or qualitative (collating 'rich' data and inferring meaning).

The line between quantitative and qualitative research is blurred by mixed method designs. Mixed methods may involve the quantitative analysis of qualitative data or the interrogation of quantitative data through a qualitative lens. In that sense, different research designs and methods can be 'nested' as part of a flexible methodological approach to a research question.



Some designs are better suited for demonstrating the presence of a causal relationship, others are more appropriate for explaining such causal relationships while some designs are more useful for describing political, social and environmental contexts.

Primary research studies tend to employ one of the following research designs. As noted above, they may employ more than one research method.

i. Experimental research designs (also called 'intervention designs', 'randomised designs' and Randomised Control Trials [RCTs]) have two key features. First, they manipulate an independent variable (for example, the researchers administer a treatment, like giving a drug to a person, or fertilising crops in a field). Second, and crucially, they randomly assign subjects to treatment groups (also called intervention groups) and to control groups. Depending on the group to which the subject is randomly assigned, they will or will not get the treatment.

The two key features of experimental studies increase the chances that any effect recorded after the administration of the treatment is a direct result of that treatment (and not a result of pre-existing differences between the subjects who did or did not receive it). Experimental research designs use quantitative analysis (often 'descriptive statistics' followed by 'inferential statistics'). The combination of random assignment and quantitative analysis enables the construction of a robust 'counter-factual' argument (i.e. "this is what would have happened in the absence of the intervention or treatment"). Such designs are useful for demonstrating the presence, and size, of causal linkages (e.g. "a causes b") with a high degree of confidence.

ii. Quasi-experimental research designs typically include one, but not both, of the key features of an experimental design. A quasi-experiment might involve the manipulation of an independent variable (e.g. the administration of a drug to a group of patients), but participants will not be randomly assigned to treatment or control groups. In the second type of quasi-experiment, it is the manipulation of the independent variable that is absent. For example, researchers might seek to explore the impact of the award of scholarships on student attainment, but it would be unethical to deliberately manipulate such an intervention. Instead, the researchers exploit other naturally occurring features of the subject groups to control for (i.e. eliminate) differences between subjects in the study (i.e. they 'simulate' randomisation). A regression-discontinuity design is an example of a quasi-experiment.

iii. Observational (sometimes called 'non-experimental') research designs display neither of the key features of experimental designs. They may be concerned with the effect of a treatment (e.g. a drug, a herbicide) on a particular subject sample group, but the researcher does not deliberately manipulate the intervention, and does not assign subjects to treatment or control groups. Instead, the researcher is merely an observer of a particular action, activity or phenomenon. A range of methods can be deployed within observational research designs:

A variety of observational methods use quantitative data collection and data analysis techniques to infer causal relationships between phenomena: for example, cohort and/or longitudinal designs, case control designs, cross-sectional designs (supplemented by quantitative data analysis) and large-n surveys are all types of observational research.

Interviews, focus groups, case studies, historical analyses, ethnographies and political economy analysis are also all forms of observational research design, usually relying more on qualitative methods to gain a rich understanding of the perspectives of people and communities. When such studies are underpinned by structured design frameworks that enable their repetition in multiple contexts, they can form a powerful basis for comparative research.


Secondary review studies tend to employ one of the following research designs:

i. Systematic Review designs adopt exhaustive, systematic methods to search for literature on a given topic. They interrogate multiple databases and search bibliographies for references. They screen the studies identified for relevance, appraise for quality (on the basis of the research design, methods and the rigour with which these were applied), and synthesise the findings using formal quantitative or qualitative methods. DECD Systematic Reviews are always labelled as such. They represent a robust, high quality technique for evidence synthesis. Even Systematic Reviews must demonstrate that they have compared 'like with like' studies.

ii. Non-Systematic Review designs also summarise or synthesise literature on a given topic. Some non-systematic reviews will borrow some systematic techniques for searching for and appraising research studies and will generate rigorous findings, but many will not.

iii. Theoretical or conceptual research studies may adopt structured designs and methods, but they do not generate empirical evidence. Theoretical or conceptual research may be useful in designing policy or programmes and in interrogating underlying assumptions and empirical studies, but should not be referred to as ‘evidence’. Nor should existing policy papers or institutional literature.

Why categorise studies by type, design & method?

The different types of study, different designs and methods outlined above are more or less appropriate for answering different types of research question. Categorising studies by type provides the reader with an initial, general understanding of how the study’s findings were produced, and helps them begin to make some general judgements about the appropriateness of the design for the research question. The following descriptors should be used to describe single research studies by type:

Research Type | Research Design
Primary (P) | Experimental (EXP) + state method used
Primary (P) | Quasi-Experimental (QEX) + state method used
Primary (P) | Observational (OBS) + state method used
Secondary (S) | Systematic Review (SR)
Secondary (S) | Other Review (OR)
Theoretical or Conceptual (TC) | N/A
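As an illustration only, the descriptor scheme above lends itself to a simple lookup structure that a reviewer could use to record a study's type and design consistently. The sketch below is a hypothetical Python fragment, not a DECD tool; the dictionary, the function name describe_study and its behaviour are assumptions made for this example.

# Illustrative sketch only: encodes the type/design descriptors from the table above.
# The dictionary keys and the helper function are hypothetical, not DECD-prescribed.
RESEARCH_DESCRIPTORS = {
    "P": {   # Primary research
        "EXP": "Experimental (state method used)",
        "QEX": "Quasi-Experimental (state method used)",
        "OBS": "Observational (state method used)",
    },
    "S": {   # Secondary review
        "SR": "Systematic Review",
        "OR": "Other Review",
    },
    "TC": {  # Theoretical or Conceptual
        None: "N/A (no empirical design)",
    },
}

def describe_study(research_type, design, method=None):
    """Return a short descriptor such as 'P/QEX (regression-discontinuity)'."""
    designs = RESEARCH_DESCRIPTORS[research_type]
    if design not in designs:
        raise ValueError(f"Design {design!r} is not valid for type {research_type!r}")
    label = f"{research_type}/{design}" if design else research_type
    return f"{label} ({method})" if method else label

# Examples: a primary quasi-experimental study, and a theoretical/conceptual study.
print(describe_study("P", "QEX", "regression-discontinuity"))  # P/QEX (regression-discontinuity)
print(describe_study("TC", None))                              # TC

For instance, a primary observational study that relied on focus groups would be recorded as describe_study("P", "OBS", "focus groups"), i.e. 'P/OBS (focus groups)'.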

Assessing the quality of single studies

Following the description of a single, primary research study by type, design and method, the reviewer or user should aim to consider its quality (or the degree to which it minimises the risk of bias). Although this is not a trivial exercise, there are some general rules of thumb that all staff will be able to apply.

When assessing the credibility of a study, the reviewer is looking principally to assess the quality of the study in its own right and its appropriateness for answering the research question posed by the author of the study. An assessment of the relevance or applicability of the study to a specific policy question or business case is an important, but separate, part of evidence synthesis, which is covered later in these guidelines.

Proxies for quality: journal rankings

Journal ranking systems can provide an indicative, though contested, proxy guide to the scrutiny to which an academic study has been subjected prior to publication. The principal journal ranking system is the 'Impact Factor' rating. Journals often publish their Impact Factor ranking somewhere on their website. An alternative is the 'H-Index', which ranks individual academics according to productivity and impact. However, not all well-designed and robustly applied research is to be found in peer-reviewed journals, and not all studies in peer-reviewed journals are of high quality. Journal rankings do not always include publications from southern academic organisations or those that feature in online journals, so a broad and inclusive approach is required to capture all relevant studies. Both 'Impact Factor' and 'H-Index' scores may give an incomplete picture of academic quality.

Principles of high quality research studies

The following principles of credible research enquiry are relevant to all research studies. Reviewers of any research literature will have to think carefully about how exactly to apply these principles depending on the nature of the study. Assessors of studies will always have to make a judgement about study quality based on a combination of the following criteria. It is taken as read that research has been ethically conducted.

i. Conceptual framing: high quality studies acknowledge existing research or theory. They make clear how their analysis sits within the context of existing work. They typically construct a conceptual or theoretical framework, which sets out their major assumptions, and describes how they think about the issue at hand. High quality studies pose specific research questions and may investigate specific hypotheses.

ii. Transparency: High quality studies are transparent about the design and methods that they employ, the data that has been gathered and analysed, and the location/geography in which that data was gathered. This allows for the study results to be reproduced by other researchers, or modified with alternative formulations. Failure to disclose the data and code on which analysis is based raises major questions over the credibility of the research. Transparency includes openness about any funding behind a study: research conducted with support from a party with vested interests (e.g. a drug company) may be less credible than that conducted independently.

iii. Appropriateness: There are three main types of research design (see above), and many types of methods. Some designs and methods are more appropriate for some types of research exercise than others. Typically, experimental research designs tend to be more appropriate for identifying, with confidence, the presence of causal linkages between observable phenomena. The implementation of an experimental design is not, in itself, a sign of good quality. The diverse array of observational (or 'non-experimental') designs may be more appropriate for questions that either cannot be explored through experimental designs due to ethical or practical considerations, or for the investigation of perspectives, people and behaviours that lie at the heart of most development processes.

iv. Cultural sensitivity: Even research designs that appear well-suited to answering the question at hand may generate findings that are not credible if they fail to consider local, cultural factors that might affect any behaviours and trends observed. For example, take a study that investigates efforts to boost girls' enrolment rates at schools in a religiously conservative country. If the study fails to explicitly consider the socio-cultural factors which influence parental support for girls' education, it is likely to miss the real reasons why the intervention worked or didn't work. High quality studies will demonstrate that they have taken adequate steps to consider the effect of local cultural dynamics on their research, or on a development intervention.

v. Validity: There are four principal types of validity.

Measurement validity: Many studies seek to measure something: e.g. agricultural productivity, climate change, health. Measurement validity relates to whether or not the specific indicator chosen to measure a concept is well suited to measuring it. For example, is income a valid measure of family welfare, or are specific measures of individual health and happiness more appropriate? Identifying valid measures is especially challenging and important in international development research: just because an indicator is a valid measure in one country or region does not mean it will be equally valid in another.

Internal validity: Some research is concerned with exploring the effect of one (independent) variable on another (dependent) variable. It can do so using a range of designs and methods. As described above, some designs and methods (e.g. experiments and quasi-experiments) are better able than others to determine such cause and effect linkages: they will minimise the possibility that some 'confounding', unseen variable is affecting changes in the dependent variable, and consequently they are said to demonstrate strong internal validity. Take the example of a study that explores the relationship between levels of corruption and firm efficiency. An internally valid study would employ a technique capable of demonstrating that corruption does indeed cause firms to become more inefficient. A study lacking in internal validity, on the other hand, might employ a technique which leaves open the possibility of reverse causality: i.e. that a firm is actually more likely to engage in corrupt behaviours because it is inefficient, and to compensate for its inefficiency.

External validity: This describes the extent to which the findings of a study are likely to be replicable across multiple contexts. Do they apply only to the subjects investigated during this particular study, or are they likely to apply to a wider population/country group? Quantitative researchers typically seek to address issues of external validity by constructing 'representative samples' (i.e. groups of subjects that are representative of a wider community/society).

Ecological validity: this dimension of validity relates to the degree to which any research is really able to capture or accurately represent the real world, and to do so without the research itself somehow impacting upon the subjects it seeks to study. Any time a researcher carries out an investigation in the field (asking questions, measuring something), s/he introduces something artificial into that context. Ecologically valid studies will explicitly consider how far the research findings may have been biased by the activity of doing research itself. Such consideration is sometimes referred to as 'reflexivity'.

vi. Reliability: Three types of reliability are explored here.

Stability: if validity is about measuring the right 'thing', then stability is about measuring it 'right'. Assume that a study seeks to investigate the health of newborn children. Assume that 'birth weight' is a valid measure. For birth weight to be measured reliably, the investigator will require a reliable instrument (e.g. accurate weighing scales) with which to gather data. Alternatively, consider data which is gathered on the basis of questionnaires or interviews being conducted by multiple researchers: what steps, if any, have been taken to ensure that the researchers are consistent in the way they ask questions and gather data?

Internal reliability: many concepts can be measured using multiple indicators, scales and indices. For example, corruption could be measured by recorded incidence of embezzlement from public sector organisations, and with the use of a corruption perceptions index. If very significant discrepancies exist between indicators (e.g. if a country appears to experience low levels of corruption when embezzlement is measured, but high levels of corruption when perceptions are explored), then the internal reliability of one or other of the measures is open to question. High quality research will consider such issues, with specific attention to whether or not particular measures are well-suited to the cultural context in which they are taken.

Analytical reliability: the findings of a research study are open to question if the application of a different analytical technique (or ‘specification’) to the same set of data produces dramatically different results.


vii. Cogency: A high quality study will provide a clear, logical thread that runs through the entire paper. This will link the conceptual (theoretical) framework to the data and analysis, and, in turn, to the conclusions. High quality studies will signpost the reader through the different sections of the paper, and avoid making claims in their conclusions that are not clearly backed up by the data and findings. High quality studies will also be self-critical, identifying limitations of the work, or exploring alternative interpretations of the analysis.

A really rigorous review of the evidence on a given topic should give due consideration to all seven of these aspects of study quality. It is possible to construct checklists or scorecards to grade evidence based on these criteria, and it is expected that DECD Evidence Papers will do so.

Principles of Research Quality

Conceptual framing – associated questions:
• Does the study acknowledge existing research?
• Does the study construct a conceptual framework?
• Does the study pose a research question or outline a hypothesis?

Transparency – associated questions:
• Does the study present or link to the raw data it analyses?
• What is the geography/context in which the study was conducted?
• Does the study declare sources of support/funding?

Appropriateness – associated questions:
• Does the study identify a research design?
• Does the study identify a research method?
• Does the study demonstrate why the chosen design and method are well suited to the research question?

Cultural sensitivity – associated questions:
• Does the study explicitly consider any context-specific cultural factors that may bias the analysis/findings?

Validity – associated questions:
• To what extent does the study demonstrate measurement validity?
• To what extent is the study internally valid?
• To what extent is the study externally valid?
• To what extent is the study ecologically valid?

Reliability – associated questions:
• To what extent are the measures used in the study stable?
• To what extent are the measures used in the study internally reliable?
• To what extent are the findings likely to be sensitive/changeable depending on the analytical technique used?

Cogency – associated questions:
• Does the author 'signpost' the reader throughout?
• To what extent does the author consider the study's limitations and/or alternative interpretations of the analysis?
• Are the conclusions clearly based on the study's results?

The following descriptors should be used when assessing the quality of single research studies. Assignment of a particular ‘grade’ to a study is a matter of judgement for the reviewer. It should be based on consideration of each of the criteria outlined above to ensure consistency of approach across studies.

Study quality | Definition
High | Comprehensively addresses multiple principles of quality.
Moderate | Some deficiencies in attention to principles of quality.
Low | Major deficiencies in attention to principles of quality.
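The guidance above notes that checklists or scorecards can be constructed from these principles. As a purely illustrative sketch (not a DECD requirement), the hypothetical Python fragment below records a reviewer's judgement against each of the seven quality principles and derives an overall High/Moderate/Low grade; the 0-2 scoring scale and the grading thresholds are assumptions made for this example.

# Illustrative scorecard sketch only. The principles come from the table above;
# the 0-2 scale and the High/Moderate/Low thresholds are assumptions, not DECD rules.
PRINCIPLES = [
    "Conceptual framing", "Transparency", "Appropriateness",
    "Cultural sensitivity", "Validity", "Reliability", "Cogency",
]

def grade_study(scores):
    """scores maps each principle to 0 (major deficiency), 1 (some deficiency) or 2 (comprehensively addressed)."""
    missing = [p for p in PRINCIPLES if p not in scores]
    if missing:
        raise ValueError(f"No judgement recorded for: {missing}")
    if all(scores[p] == 2 for p in PRINCIPLES):
        return "High"      # comprehensively addresses the principles of quality
    if any(scores[p] == 0 for p in PRINCIPLES):
        return "Low"       # major deficiencies in attention to principles of quality
    return "Moderate"      # some deficiencies in attention to principles of quality

# Example: a study that is sound overall but has some deficiencies in validity.
example_scores = {principle: 2 for principle in PRINCIPLES}
example_scores["Validity"] = 1
print(grade_study(example_scores))  # Moderate

Keeping a record of the score given against each principle, as suggested under 'Principles in practice' below, also supports inter-rater reliability when several reviewers assess the same studies.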

Principles in practice

Reviewers must be prepared to defend their assessment based on the quality criteria spelled out and are advised to keep a record of their observations on the following aspects of a study to demonstrate the basis of their assessment. Where many studies are reviewed by different analysts, this is particularly important to enable inter-rater reliability (i.e. that assessments of studies are more or less stable across multiple reviewers).

Those citing evidence should not confuse studies which present "evidence of no effect" (i.e. they actually show that 'x' has no effect on 'y') with those which "find no evidence for an effect" (which means that there may be an effect of 'x' on 'y', but it has not been isolated in the current study).