
InnoSI

Case Study Research and Evaluation Guide

(Work package 4)

Sue Baines

Chris Fox

Jessica Ozan

Florian Sipos


Contents

01 Introduction
02 Evaluation framework overview
03 Case study evaluation resources
04 Needs assessment
05 Theories of change
06 Process evaluation
07 Impact evaluation
08 Economic evaluation
09 Bibliography


1. Introduction

1.1 Aim and objectives of Work Package 4

The aim of Work Package 4 is, through a number of case studies, to document and

evaluate a wide range of innovative approaches to delivering social investment policy at

a regional or local level. The specific objectives are:

1. Identify and evaluate innovative and strategic approaches to social welfare

reform at the regional and local level

2. For each approach identified evaluate the distribution of the policy, social and

managerial roles between public, private and third sectors

3. For each approach identified evaluate the legal framework used

4. For each approach identified evaluate the interaction and complementarity with

broader social welfare policies in the medium to long term

5. For each approach identified evaluate the social outcomes, social returns and

effectiveness of interventions for the various actors, contributors and

beneficiaries concerned

6. For each approach identified evaluate the social and psychological impact of

social welfare reform on individuals and communities, including the ways

individuals’ sense of identity is shaped by their interactions with welfare policy

and its reform (including gender and generational issues)

7. For each approach identified evaluate whether, from the perspective of

recipients, policy initiatives strengthen or weaken the public sphere

A key point to note is that all objectives are evaluative.

1.2 What we committed to in the bid

In the proposal we described how, for each case study, we would collect and analyse:

a range of policy/programme documentation supplemented by key informant

interviews and a range of secondary (administrative) data covering

development, implementation and delivery of the policy/programme and its

financial and non-financial outcomes

primary quantitative data (e.g. small scale surveys where data is limited)


designed to explore the use of social innovation and the wider social value

delivered by the policy/programme

primary qualitative data (e.g. interviews and/or focus groups) with recipients

and potential recipients of the social welfare reform and groups that represent

them

analysis of media coverage of the social welfare reform being evaluated

In 10 case study areas we will recruit Community Reporters who will provide

additional, rich qualitative data (WP5).

In the bid we described how analysis of the data would allow us to:

describe the innovative elements of the programme/reform

understand the context (including regional context) of the programme/reform

describe the implementation process

identify the impact of the policy/reform on key outcomes: financial, non-financial and social value

explore the social and psychological impact of welfare reform on individuals and

communities

Key points to note in this description are: first, that we have committed to gathering

empirical data in each case study site, not just relying on existing documentation or

secondary data; secondly, that empirical data collection extends to welfare recipients

and other affected communities; and thirdly, that data from the Community Reporter

programme should supplement the work of the research partner, not be a substitute for

it.

1.3 Outputs from each case study

Three deliverables are specified in the Description of Work and are as follows.

WP4 Deliverables

D4.1: Selection of case studies


This will be a short report confirming the final selection of 20 case studies. A short

description of each case study will be included. This report will be prepared by MMU

based on the information that partners have supplied.

Submission: January 2016

D4.2: Evaluation report

This will be an evaluation report on each case study. A standard template is included in

Appendix A that must be completed. It contains sections on literature review, needs

analysis, implementation evaluation, impact evaluation and economic evaluation (see

below for more details). A report on each case study will be written by the relevant

academic using the standard template. These reports will then be reviewed by MMU

who will compile them into a single evaluation report and submit the report to the

Commission.

Submission: October 2016. Populated templates must therefore reach

MMU by 15th October. MMU will then review templates and integrate them into a

report by 31st October. It is important that partners are prepared to respond to

reviewer comments during October.

D4.3: A synthesis of findings

A synthesis of findings from the 20 case studies. The synthesis report will be written by

MMU and Debrecen.

Submission: December 2016

In addition, MMU require the following outputs from each case study.

Additional required case study outputs

D.WP4A Literature review


Two literature reviews, one to support each case study. These should document both

policy and research literature relevant to the case study. Some relevant material will

already have been submitted as part of WP2. Further guidance is provided below.

Submission to MMU: 29th February 2016

D.WP4B Evaluation framework

A description of the evaluation framework that will be used for each case study. This

guidance document makes suggestions about the likely components of the evaluation

framework.

Submission to MMU: 29th February 2016

D.WP4C Interim Report

One interim report for each case study covering the needs analysis and implementation

evaluation.

Submission to MMU: 1st July 2016

Revised versions of each of these additional outputs will also feature in the Case Study

Template in Appendix A and so all ultimately contribute to D4.2.

1.4 Resources

Academic partners will typically have 110-130 days of researcher time per case study.

2. Evaluation framework: Overview

Evaluation is at the heart of each case study. There is significant variation between case

studies. They focus on different aspects of welfare, different types of recipients and are

located in widely varying regions with distinct social, economic, political and cultural

contexts. It is therefore not possible to specify a ‘standard’ evaluation methodology to

be implemented in all case studies. However, all case studies should contain some


common elements and these are set out in this section. We therefore start this section

by distinguishing some broad approaches to evaluation, which are likely to be relevant

to developing an evaluation framework for WP4 case studies. These are:

formative and summative evaluation;

impact and process evaluation; and

economic evaluation.

We go on to highlight some considerations likely to shape the evaluation framework,

particularly in relation to evaluation cycles. We then describe the elements of

evaluation that should be included in each case study:

Literature review

Needs assessment

Programme theory

Process evaluation

Impact evaluation

Economic evaluation

A more detailed set of evaluation resources describing each of these elements is included in a separate document.

2.1 Formative and summative evaluation

Scriven (1967) makes a distinction between formative and summative evaluation,

which Lincoln and Guba (1986) suggest are, broadly speaking, aims of evaluation:

The aim of formative evaluation is to provide descriptive and judgmental

information, leading to refinement, improvement, alterations, and/or

modification in the evaluand, while the aim of summative evaluation is to

determine its impacts, outcomes, or results. (Lincoln and Guba 1986: 550)

Given the objectives for WP4 we suggest that evaluation frameworks for each case

study should include both formative and summative elements.


2.2 Process (implementation) and impact (outcome) evaluation

Another common distinction in the evaluation world is between process (sometimes

referred to as implementation) and impact (sometimes referred to as outcome)

evaluation. Typical questions addressed in an impact evaluation might include (based

on HM Treasury 2013):

What were the policy, programme or project outcomes?

Did the policy, programme or project achieve its stated objectives?

Were there any observed changes, and if so, how big were the changes and how much

could be said to have been caused by the policy, programme or project as

opposed to other factors?

How did any changes vary across different individuals, stakeholders, sections of

society and so on, and how did they compare with what was anticipated?

Did any outcomes occur which were not originally intended, and if so, what and

how significant were they?

Process evaluation answers the question ‘how was the policy, programme or project

delivered’ (HM Treasury 2013) or the ‘what is going on’ question (Robson 2011).

Impact evaluation therefore looks similar to summative evaluation and process

evaluation looks similar to formative evaluation, but there are distinctions. For example,

a summative evaluation occurs at the end of a programme, whereas an impact

evaluation need not necessarily do so.

2.3 Economic evaluation

A summative or outcome evaluation might demonstrate the impact of a policy,

programme or project but will not on its own show whether those outcomes justified

the investment (HM Treasury 2013). Evaluators may ask (based on Dhiri and Brand

1999):

What was the true cost of an intervention?

Did the outcome(s) achieved justify the investment of resources?

Was this the most efficient way of realising the desired outcome(s) or could the


same outcome(s) have been achieved at a lower cost through an alternative course

of action?

How should additional resources be spent?

Given the focus in InnoSI on understanding the value of innovative forms of funding social investment, case study evaluation frameworks are likely to include an economic

component.

2.4 Evaluation cycle

Different case studies are examining policies and programmes that are at different

stages of development. The evaluation framework will need to relate to the stage in the

policy or programme life-cycle. The Public Sector Transformation Network (2014)

identify a series of stages in a project or programme life-cycle: development and design;

implementation; delivery; and scaling-up. They suggest that different types of

evaluation are likely to be relevant at different stages in this life-cycle, as illustrated in

Figure 2.1.

[Figure 2.1 here: a diagram of the evaluation cycle linking the stages of service delivery (development and design; implementation; delivery; scaling up) to types of evaluation (process evaluation; impact evaluation) and supporting activities (logic model; evaluation plan; baseline testing; regular monitoring, continuous learning and improvement; cost benefit analysis).]


Figure 2.1: Evaluation through the life-cycle of a policy or programme (Public Sector

Transformation Network 2014: Figure 1.1)

Evaluation during development and design of a policy or programme

If the policy or programme selected for the case study is in its early stages evaluation

activities might include looking at the needs that a programme is intended to address

(needs assessment) (Rossi et al. 2004) and exploring the logic model or theory of

change upon which the programme is based (discussed in more detail below).

Evaluation activity might also include ex ante economic evaluation where, before a

programme is implemented, a mixture of available empirical data, evidence from the wider literature and assumptions is used to model the likely efficiency of the

intervention.
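To make the mechanics of an ex ante model concrete, the sketch below combines programme cost data with an effect size assumed from the wider literature. Every figure and variable name is a hypothetical illustration of ours, not drawn from any InnoSI case study.

```python
# Minimal ex ante economic model. All inputs are illustrative assumptions,
# e.g. an effect size borrowed from evaluations of similar programmes.

participants = 500          # expected number of programme participants
assumed_effect = 0.08       # assumed extra positive outcomes per participant
value_per_outcome = 12_000  # assumed value of one outcome (EUR)
programme_cost = 300_000    # projected total programme cost (EUR)

expected_benefit = participants * assumed_effect * value_per_outcome
benefit_cost_ratio = expected_benefit / programme_cost

print(f"Expected benefit: EUR {expected_benefit:,.0f}")  # EUR 480,000
print(f"Benefit-cost ratio: {benefit_cost_ratio:.2f}")   # 1.60
```

Such a model is only as credible as its assumptions, which is why they should be documented explicitly and, ideally, subjected to sensitivity analysis.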

Evaluation during implementation of a programme

In the early stages of a new programme, evaluation might involve developing an

understanding of the theories of change (see below for more detail) or undertaking a

formative, process evaluation to inform programme improvement (see below for more

detail).

A common challenge in the evaluation world is attempting to undertake impact

evaluation too early in the programme cycle. Evaluating the outcomes of a programme

when it is still in a state of flux is problematic. In technical terms this presents a

challenge to ‘construct validity’. Construct validity refers to how well a measure

conforms to theoretical expectations (Punch 2014) and is discussed in more detail

below.

Evaluation of an established programme

Both formative and summative evaluations might be undertaken on an established

programme. Often the need at this stage in the programme life-cycle is for summative

evaluation to understand what impact, if any, the programme has had. Economic

evaluation to assess the efficiency of the programme may well follow.

Formative evaluation can still be of relevance, particularly if the programme in question

may be scaled-up in the future and insight into its replicability is needed.


2.5 An evaluation framework for WP4 case studies

Taking account of the considerations set out above we suggest that the evaluation

framework for WP4 case studies will always include the following elements:

Literature review: Covering key policy and research literature relating to the

policy or programme that is the subject of the case study. This could include

previous evaluations of the same or similar programmes. Policy documents that

help explain the development of the policy or programme. Academic papers that

analyse the policy area.

Needs assessment: Social programmes exist to alleviate a social problem (Rossi

et al. 2004) and a needs assessment assesses the nature, magnitude and

distribution of the social problem and the extent to which there is a need for the

intervention (Rossi et al. 2004).

Programme theory: Evaluations often start with an assessment of the logic

model or theory of change that underpins the programme. This programme theory may not be set out explicitly during the design of the programme. During an assessment of programme theory evaluators ask questions about the way a programme is conceptualised and designed (Rossi et al. 2004). In relation to the InnoSI case

studies this will entail not only describing the mechanisms for implementing the

particular project or programme being studied, but also elaborating the policies

that inform the project / programme and the value sets or ideologies behind the

policies.

Process evaluation: A process or implementation evaluation examines whether

and how the programme was implemented and run. Even with a plausible theory

about how to intervene a programme must still be implemented well to have a

reasonable chance of making an impact. The main issues process evaluation will

concentrate on are: the distribution of the policy, social and managerial roles

between public, private and third sectors; the legal framework used;

and the interaction and complementarity with broader social welfare policies.

In addition, evaluation frameworks will consider the following, but will need to tailor

the approach according to the relevant context:


Impact evaluation: Effective implementation doesn’t guarantee that the

programme has the desired impact. An impact evaluation asks whether the

desired impact was achieved and whether there were unintended side effects

(Rossi et al. 2004). Different impact evaluation designs are possible. (Quasi)

experimental designs are often favoured where the aim is to provide estimates of

effect that are most robust in terms of internal validity. But, such designs have

limitations: they assume that the intervention is fixed and focused on a narrow set of well-defined outcomes, and that while the intervention may be ‘complicated’ it is not ‘complex’. Where these assumptions do not hold, alternative impact evaluation designs are possible, including theory-led designs such as realist evaluation

(Pawson and Tilley 1997) and case-based designs (Byrne and Ragin 2009). More

detail is provided below.

Economic evaluation: Even if a programme has a positive effect on the target

population this does not guarantee that it is efficient. Some effective

programmes may incur costs that are high relative to their impact in comparison

to other alternatives (Rossi et al. 2004). An economic evaluation examines the

relationship between the programme's costs and its effectiveness and commonly takes the form of either a cost-effectiveness or a cost-benefit analysis. However, the possibility of using such designs depends on the type of impact design used, where in the implementation cycle the policy or programme is, and the resources available to the evaluation team. Therefore some case studies may consider using a form of ex ante economic evaluation or alternative economic evaluation models such as Social Return on Investment (SROI). More detail is provided below, and a brief worked sketch of the arithmetic behind these designs follows this list.
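As a purely illustrative sketch of the arithmetic these designs rest on (the data, outcome measure and figures below are invented, not from any InnoSI case study), a simple comparison-group impact estimate can feed directly into a cost-effectiveness ratio:

```python
# Invented numbers throughout: impact estimate from a comparison-group
# design, followed by a simple cost-effectiveness ratio.

treated_outcome_rate = 0.42     # e.g. share of participants in work at follow-up
comparison_outcome_rate = 0.35  # same measure for a matched comparison group
n_treated = 400                 # number of programme participants

# Impact: additional outcomes attributable to the programme, assuming the
# comparison group is a credible counterfactual.
additional_outcomes = (treated_outcome_rate - comparison_outcome_rate) * n_treated

programme_cost = 350_000  # total delivery cost (EUR)

# Cost-effectiveness: cost per additional unit of outcome achieved.
cost_per_outcome = programme_cost / additional_outcomes

print(f"Additional outcomes: {additional_outcomes:.0f}")            # 28
print(f"Cost per additional outcome: EUR {cost_per_outcome:,.0f}")  # EUR 12,500
```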

The following decision tree can help you decide which approach is most relevant to your

case study.


Does the case study have an ongoing evaluation?

  No: Explore all options.

  Yes: Is it an impact evaluation?
    No: Explore this option.
    Yes: Will the results or interim data be available before September 2016?
      No: Are control groups available?
        No: Use alternative impact evaluation designs.
        Yes: Use a quasi-experimental design.
      Yes: Are they combined with process data?
        No: See process evaluation.
        Yes: Talk with stakeholders to identify additional evaluation needs.

  Is it a process evaluation?
    No: Explore this option.
    Yes: Will the results or interim data be available before September 2016?

  Is it an economic evaluation?
    No: Explore this option.
    Yes: Will the results or interim data be available before September 2016?
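For partners who find it easier to read procedurally, the impact-evaluation branch of the tree can be sketched as a small function. The question keys and answer strings below are ours, not part of the guide, and the process and economic branches follow the same pattern.

```python
# Sketch of the impact-evaluation branch of the decision tree above.
# Question keys and answer strings are illustrative only.

def choose_impact_approach(answers: dict) -> str:
    if not answers["ongoing_evaluation"]:
        return "Explore all options"
    if not answers["is_impact_evaluation"]:
        return "Explore this option"
    if not answers["results_before_sept_2016"]:
        if answers["control_groups_available"]:
            return "Use a quasi-experimental design"
        return "Use alternative impact evaluation designs"
    if answers["combined_with_process_data"]:
        return "Talk with stakeholders to identify additional evaluation needs"
    return "See process evaluation"

# Example: an ongoing impact evaluation with no interim results before
# September 2016 and no control groups available.
print(choose_impact_approach({
    "ongoing_evaluation": True,
    "is_impact_evaluation": True,
    "results_before_sept_2016": False,
    "control_groups_available": False,
}))  # -> Use alternative impact evaluation designs
```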


The table below suggests how the resources available for each case study might be allocated across the different areas of case study work.

Area of work Indicative days (total days 110 – 130)

Planning the case study 5 days

Literature review 15 days

Needs assessment 10 days

Programme theory 15 days

Process evaluation 25 days

Interim case study report 5 days

Impact evaluation 25 days

Economic evaluation 15 days

Final case study report 5 days
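As a quick arithmetic check (our own, not part of the guide), the indicative allocation sums to 120 days, which sits comfortably within the 110-130 day envelope:

```python
# The indicative allocation from the table above.
indicative_days = {
    "Planning the case study": 5,
    "Literature review": 15,
    "Needs assessment": 10,
    "Programme theory": 15,
    "Process evaluation": 25,
    "Interim case study report": 5,
    "Impact evaluation": 25,
    "Economic evaluation": 15,
    "Final case study report": 5,
}

total = sum(indicative_days.values())
print(total, 110 <= total <= 130)  # 120 True
```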

2.6 Quality assurance

The Work Package Leaders (MMU and Debrecen) have responsibility for the overall

delivery of the Work Package and the Deliverables.

In order to ensure that the individual case study reports are of a high and comparable

standard the following have been put in place:

The requirement to produce an evaluation framework by February 29th

(D.WP4B)

Either MMU or Debrecen will visit each academic partner between March and

June 2016 in order to spend one or two days with the case study research team

looking in detail at the work they are doing.


In order to make the synthesis report (D4.3) manageable and to ensure that all material

is of a high and comparable standard, various interim deliverables are requested. These

will allow MMU and Debrecen to start the synthesis during 2016 and avoid leaving all of

the synthesis until November/December 2016.


3. INNOSI WP4 CASE STUDY EVALUATION RESOURCES

Sue Baines

Chris Fox

Robert Grimm

This guidance draws on Fox, C., Caldeira, R. and Grimm, R. (forthcoming) An Introduction to Evaluation, London: Sage.


Introduction

This document is a resource for academic partners on the INNOSI project designing and

undertaking case studies of Social Investment policies and programmes as part of Work

Package 4.

Taking account of the objectives of WP4, we suggest that the evaluation framework for WP4

case studies will always include the following elements:

Literature review: Covering key policy and research literature relating to the

policy or programme that is the subject of the case study. This could include

previous evaluations of the same or similar programmes. Policy documents that

help explain the development of the policy or programme. Academic papers that

analyse the policy area.

Needs assessment: Social programmes exist to alleviate a social problem (Rossi

et al. 2004) and a needs assessment assesses the nature, magnitude and

distribution of the social problem and the extent to which there is a need for the

intervention (Rossi et al. 2004).

Programme theory: Evaluations often start with an assessment of the logic

model or theory of change that underpins the programme. This programme theory may not be set out explicitly during the design of the programme. During an assessment of programme theory evaluators ask questions about the way a programme is conceptualised and designed (Rossi et al. 2004).

Process evaluation: A process or implementation evaluation examines whether

and how the programme was implemented and run. Even with a plausible theory

about how to intervene a programme must still be implemented well to have a

reasonable chance of making an impact. The main issues process evaluation will

concentrate on are: the distribution of the policy, social and managerial roles

between public, private and third sectors; the legal framework used;

and the interaction and complementarity with broader social welfare policies.

In addition, evaluation frameworks will consider the following, but will need to tailor

the approach according to the relevant context:


Impact evaluation: Effective implementation doesn’t guarantee that the

programme has the desired impact. An impact evaluation asks whether the

desired impact was achieved and whether there were unintended side effects

(Rossi et al. 2004). Different impact evaluation designs are possible. (Quasi)

experimental designs are often favoured where the aim is to provide estimates of

effect that are most robust in terms of internal validity. But, such designs have

limitations: they assume that the intervention is fixed and focused on a narrow set of well-defined outcomes, and that while the intervention may be ‘complicated’ it is not ‘complex’. Where these assumptions do not hold, alternative impact evaluation designs are possible, including theory-led designs such as realist evaluation

(Pawson and Tilley 1997) and case-based designs (Byrne and Ragin 2009). More

detail is provided below.

Economic evaluation: Even if a programme has a positive effect on the target

population this does not guarantee that it is efficient. Some effective

programmes may incur costs that are high relative to their impact in comparison

to other alternatives (Rossi et al. 2004). An economic evaluation examines the

relationship between the programme's costs and its effectiveness and commonly takes the form of either a cost-effectiveness or a cost-benefit analysis. However, the possibility of using such designs depends on the type of impact design used, where in the implementation cycle the policy or programme is, and the resources available to the evaluation team. Therefore some case studies may

consider using a form of ex ante economic evaluation or alternative economic

evaluation models such as Social Return on Investment (SROI). More detail is

provided below.

This document provides detailed evaluation resources to support academic partners as

they design their case studies.

4. Needs assessment

4.1 Introduction

As Rossi et al. (2004: 102) note:


“Evaluation questions about the nature of a social problem that a program is

intended to alleviate are fundamental to the evaluation of that program.”

A needs assessment is the process by which an evaluator determines whether there is a

need for the policy or programme (Rossi et al. 2004). Needs assessment is important

because a programme cannot be effective at addressing a social problem if there is no

problem to begin with. Questions about the need for services might include (based in

part on Rossi et al. 2004):

What are the nature and magnitude of the problem to be addressed?

What are the characteristics of the population in need?

What are the needs of the population?

What services are needed?

How much service is needed and over what time period?

What service delivery arrangements are needed to provide those services to the

population?

Answering questions such as these will help address several of the objectives for WP4

including the identification of innovative social investment programmes (one aspect of

‘innovation’ could be meeting a need that has not previously been addressed or the

needs of a group that has not previously been engaged) and the social and psychological

impact of social welfare reform on individuals and communities (we first need to

understand these communities and their needs).

4.2 Using an existing needs assessment

In many cases it will be possible for the InnoSI research team to make use of an existing

needs assessment undertaken as part of the development of the policy or programme

being studied. However, it would be important to scrutinise an existing needs

assessment and ask some critical questions of it:

Has the target population been described clearly? Where social indicators

have been used to project or estimate the population are the indicators reliable

and is the estimation methodology defensible?

Are the needs of the target population described clearly? Where multiple

problems are to be addressed is the relationship between problems


(dependencies, causal order, etc.) described clearly?

Is the extent of the need described clearly? This will include quantifying the

size of the problem and also paying attention to its geographical and temporal

distribution.

Are the limitations in knowledge of the need recognised? Limitations may

result from a range of sources including the quality of available data, the

regularity with which data is collected and difficulties in the interpretation of

data.

Are differing interpretations of need recognised? Social problems are not

objective phenomena, but are social constructs and the differing interpretations

and understandings of different stakeholder groups will influence the outputs

from the needs assessment. The engagement of stakeholder groups is therefore

crucial in needs assessment (see below).

Depending on answers to these questions, additional work to develop a needs

assessment within which the researchers have more confidence may be required.

4.3 Undertaking a needs assessment

The first stage of a needs assessment is to define the need in terms of target population

and types of need experienced by the target population. This is likely to involve

reviewing policy or programme documentation and potentially interviewing key

stakeholders. A key consideration here is that defining a social problem and specifying

the goals of an intervention are fundamentally political processes (Rossi et al. 2004) and

the InnoSI researchers should build an awareness of this into the research process.

To undertake a needs assessment, the research team might analyse existing data

sources, use existing social indicators or undertake primary research (based in part on

Rossi et al. 2004):

Social indicators are often a good starting point for assessing needs. Such data

can often be used to assess change over time and form the basis of forecasts for

future need. The key challenge to using social indicators will be their relevance

and coverage, particularly where a policy or programme is regional and social

indicators are only available at a national level.


Organisation or service data collected by service delivery organisations can be

useful in a needs assessment. Where an organisation keeps detailed case files on

each client it might be possible to analyse such data. How useful it is will depend

on the coverage of the population of interest, the types of data collected and the

completeness and quality of the available data.

Surveys and censuses might contain relevant information. Often however, such

data is of limited use due to infrequent collection and a limited number of data

fields relevant to the need in question.

Key informant surveys or interviews represent a relatively straightforward

way to gather data on needs. Well-placed key informants might be able to

provide not just professional opinions, but also data from the organisations they

represent. The challenge in using such data is to assess the strength and quality

of key informants’ knowledge and to triangulate between the accounts of

different key informants.

When developing a needs assessment it is often useful to apply some of the following

concepts to describing the target population and their need (based in part on Rossi et al.

2004):

A population at risk is a public health concept that describes those persons

with a significant probability of developing the condition or need that the policy

or programme seeks to address.

A population in need is a group of people that currently manifest the need.

Incidence refers to the number of new cases of a particular need that are

identified or arise in a specified area and/or over a specified time, whereas

prevalence refers to the total number of existing cases in an area at a specified

time.

Incidence or prevalence can be expressed as a rate within an area or population.
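A small worked example may help fix the distinction; all figures below are invented for illustration.

```python
# Invented figures: incidence and prevalence expressed as rates per 1,000.

area_population = 250_000    # population of the study area
new_cases_this_year = 1_250  # cases newly identified over the year (incidence)
existing_cases_now = 7_500   # all current cases at a point in time (prevalence)

incidence_rate = new_cases_this_year / area_population * 1_000
prevalence_rate = existing_cases_now / area_population * 1_000

print(f"Incidence: {incidence_rate:.1f} per 1,000 per year")  # 5.0
print(f"Prevalence: {prevalence_rate:.1f} per 1,000")         # 30.0
```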


5. Theories of change

5.1 Introduction

It is common practice for evaluations, regardless of the paradigm within which they are

located, to include an elaboration of the Theory of Change (TOC) that underpins the

policy or programme.

5.2 What is a theory of change?

The theory of change was fully articulated in the 1990s at the Aspen Institute

Roundtable on Community Change. Evaluating Complex Community Initiatives (CCIs) was found to be challenging (Kubisch et al. 1998) due to:

Horizontal complexity

Vertical complexity

Contextual issues

Flexible and evolving intervention

Broad range of outcomes

Absence of a comparison community or control group.

Weiss (2000) hypothesises that a key reason that CCIs and other complex programmes

are difficult to evaluate is that the theories of change that underpin them are poorly

articulated.

Developing a theory of change involves stating the desired (long-term) change based on

a number of assumptions that hypothesise, project or calculate how change can be

enabled. More specifically it requires thinking through:

Context for the initiative, including social, political and environmental

conditions, the current state of the problem the project is seeking to influence

and other actors able to influence change

Long-term change that the initiative seeks to support and for whose ultimate

benefit

Process/sequence of change anticipated to lead to the desired long-term

outcome


Assumptions about how these changes might happen, as a check on whether the

activities and outputs are appropriate for influencing change in the desired

direction in this context. (Vogel 2012: 4).

Assumptions are crucial:

The central idea in theory of change thinking is making assumptions explicit.

Assumptions act as ‘rules of thumb’ that influence our choices, as individuals and

organisations. Assumptions reflect deeply held values, norms and ideological

perspectives. These inform the design and implementation of programmes.

Making assumptions explicit, especially seemingly obvious ones, allows them to

be checked, debated and enriched to strengthen programmes. (Vogel 2012: 4)

As the focus on assumptions would imply, the TOC process is fundamentally

participatory and should include a variety of stakeholders, and therefore a variety of perceptions.

Finally, theories of change may be developed at different points in the life-cycle of a

programme. They can be prospective and developed at the initial phase –

conceptualisation, planning and design. They can also be retrospective and be

‘reconstructed’ or pieced together after the programme is fully underway.

5.3 Developing a theory of change

There is not one single process to develop a theory of change. Over the years, many

different processes that arrive at a programmatic TOC have been conceptualised.

Broadly, these can be grouped into one of two processes, or a mix of both:

Researcher-led: Development of the TOC follows a rigorous, research-like process in which the elements relevant to the TOC, e.g. the context, are researched and investigated. Assumptions may also be formulated more like research hypotheses that can later be tested in greater depth.

Stakeholder-led: Researchers/ programme managers facilitate a process in

which stakeholders are central. Stakeholders are provided with the basic

information, e.g. about the context, but their own perceptions are taken into account. This constitutes a collective induction exercise whose objective is to generate the

collective vision underlying the programme.


Reconstructing a programme’s TOC does involve a lot of work. The evaluator may start

with programme documentation such as funding bids, project plans or steering group

minutes. Often the evaluator needs to conduct a series of structured and semi-

structured interviews with key informants and stakeholders to piece together reasoning

that was never consciously, or at least never formally, articulated.

Other techniques like focus group discussions or stakeholder meetings can also be

used. Yet ultimately the evaluator will act as the TOC proponent based on what s/he

pieces together. The last step is to seek validation and finally generate agreement

around what the TOC of the programme could have been, had it been conceptualised at

the planning stage.

Multiple theories of change

There is not, and there should not be, anything problematic with different stakeholders

bringing different perspectives to bear in the process of developing a theory of change.

If anything, TOCs are strengthened by this diversity of perceptions, which grounds projects in their complexity and works with it. Additionally, consensus is not always the reality, and

power relations permeate all social relations.

Under-developed theories of change

Central to the TOC approach, and more so to its ability to be tested and evaluated, are

the assumptions. Often these are expected to be substantiated by evidence and in many

cases by social analysis. However:

[O]ne potential problem is that Theories of Change can be based on weak and

selective evidence bases and build in all kinds of assumptions about the world

that are not sufficiently problematised. In this respect they can reinforce and

mask the problem they purport to solve, creating a misleading sense of security

about the level of critical analysis a programme has been subjected to. (Valters

2014: 4)

This lack of initial analysis will, in turn, affect theory-based evaluations that will be

testing assumptions that were not sufficiently problematised and may not even hold as


assumptions, let alone as a basis for change. In these cases it is right to say that

assumptions will be windmills and evaluators Don Quixotes.

When is a Theory of Change sufficient?

According to Connell and Kubisch (1998) a TOC should be:

Plausible. There must be available evidence that sustains the assumptions, and hence supports the change potential of the activities to be

implemented.

Doable. The necessary resources – from financial to institutional – must be in place to ensure that the TOC-informed initiative can be operationalised.

Testable. It must be specific and complete enough for the evaluator to assess

progress and evaluate contribution to change.

The emphasis of the TOC is on the social change that one wants to enable. As an

approach the TOC’s aim is to arrive at a measurable description of this change, and this

is the link between TOC and evaluation.

6. Process evaluation

6.1 Introduction

Process evaluation ‘verifies what the program is and whether or not it is delivered as

intended to the targeted recipients’ (Scheirer 1994, cited in Rossi et al. 2004). It also

considers unintended or wider delivery issues encountered during implementation. The

process evaluation of WP4 case studies is likely to consider most, if not all of the

following questions:

Has the intervention been implemented as intended? We will want to

understand whether our case study is typical or atypical.

What are the mechanisms by which the programme achieves its goals? In

particular we are interested in the distribution of the policy, social and

managerial roles between public, private and third sectors and the legal

framework used. These are key questions for InnoSI.

Has the intervention reached the target population?


How has the intervention been experienced both by those implementing it and

receiving it? Looking at how direct recipients and broader communities

experience the implementation of the policy or programme starts to address our

objective of exploring the social and psychological impact of welfare reform on

individuals and communities (also addressed under ‘impact evaluation’).

What contextual factors are critical to effective implementation? In particular,

what is the interaction and complementarity with broader social welfare

policies? This is a key question for InnoSI.

Were unintended or wider delivery issues encountered during implementation?

There is sometimes a sense among those who are new to evaluation that designing and

delivering a process evaluation might be easier than designing and delivering an

outcome (impact) evaluation. We tend to disagree. As Moore et al. (2014) note, high

quality outcome evaluations require a range of skills, but generally, research questions

are easier to define and there is much literature to turn to for guidance, whereas:

Process evaluations, in contrast, involve deciding from a wide range of

potentially important research questions, integrating complex theories that

cross disciplinary boundaries, and combining quantitative and qualitative

methods of data collection and analysis. (Moore et al. 2014: 56)

6.2 Theory-driven process evaluation

Process evaluations usually study a complex mix of individual and organisational

dynamics. There are many theories from disciplines including psychology, sociology and

management science that evaluators can draw on to shape their understanding of the

phenomenon that they are evaluating. It is beyond the scope of this guide to provide a

comprehensive survey of this literature, but potentially useful literatures include:

Organisational dynamics: Rogers and Williams (2006) suggest six perspectives

on organisational dynamics that evaluators should consider, to which we have

added a seventh: managerial hierarchy perspective; street-level bureaucrat

perspective; the organisational development perspective; the conflict and

bargaining perspective; the chance and chaos perspective; the external influence


perspective; and the partnership perspective.

Change and innovation: Process evaluations are increasingly concerned not

just with whether an intervention is implemented correctly, but also with the change

mechanisms through which implementation is achieved (Moore et al. 2014).

There are many theories describing individual and organisational processes of

change. For example, Moore et al. (2014) cite the work of Hawe and colleagues

(2009) who describe interventions as events within systems, which “either leave

a lasting footprint or wash out, depending how well system dynamics are

harnessed” (Moore et al. 2014: 38). They also note how theories from sociology

and social psychology emphasise the processes through which interventions

become a fully integrated part of their setting, using the terms ‘routinisation’ or

‘normalisation’ respectively to describe these.

Systems and complexity: ‘Systems thinking’ originated in the natural sciences

before being applied to social inquiry. As applied to organisations it “suggests

that issues, events, forces and incidents should not be viewed as isolated

phenomena but seen as interconnected, interdependent components of a

complex entity.” (Iles and Sutherland 2001: 17). Complexity theorists distinguish

‘complex’ interventions from ‘complicated’ ones. The former are

characterized by unpredictability, emergence (complex patterns of behaviour

arising out of a combination of relatively simple interactions), and non-linearity

of outcomes (Moore et al., 2014). Addressing the challenge of evaluating complex

interventions or interventions delivered in a complex context was key to the

development of theories of change (Weiss 1995, Kubisch et al. 1998). The

implication for process evaluation is that it must do more than describe whether

an intervention was implemented as intended; it must also generate

understanding (theory) about how mechanisms of change operate in the context

of complex organisational settings.

6.3 Designing a process evaluation

We assume that, given the nature of InnoSI and the questions we are addressing,

process evaluation is likely to involve quantitative and qualitative methods.


Qualitative methods

Qualitative methods are particularly useful in cases where interventions are set in

complex contexts, affected by a plethora of non-controllable independent (and

exogenous) variables, and with extended and non-linear causal chains. When well-

designed and effectively implemented they can capture emerging changes in

implementation, experiences of the intervention and can be used in the generation of

new theory. Often qualitative methods are used to capture perceptions and behaviours

that are not fully captured by quantitative methods. Qualitative methods typically used

in qualitative evaluation designs are categorised by some as ‘data enhancers’, the

assumption being that “when data are enhanced, it is possible to see key aspects of

cases more clearly” (Ragin 1994: 13). The fact that data is enhanced as opposed to being

condensed (Ritchie and Lewis 2003) introduces a number of advantages in terms of the

analysis and consequent operationalization of evaluations’ findings and

recommendations, namely:

Analysis is more aligned with participants’ own analytical categories and closer to the emic viewpoint, resonating with stakeholders’ language, perceptions and analytical categories.

In-depth study of cases and cross-case comparisons lead to analysis that

provides rich illustrative information regarding complex phenomena and the

relationships that they shape and that shape them in context.

Overall, the main methodological implications of this broad approach to social science

research and evaluation in particular are purpose and participation. Any qualitative

evaluation design can only be applied in situations where:

The purpose of the evaluation is clear enough for the evaluator to be able to

unequivocally identify pertinent sampling techniques (such as case studies) and

stakeholders.

Internal and external stakeholders can be involved and participate throughout

the evaluation process.


Quality assurance

An influential set of quality standards for qualitative evaluation was drawn up by the

British Government (Spencer et al. 2003). The framework is based around four guiding

principles – that research should be (2003:20):

Contributory in advancing wider knowledge or understanding about policy,

practice, theory or a particular substantive field.

Defensible in design by providing a research strategy that can address the

evaluative questions posed.

Rigorous in conduct through the systematic and transparent collection, analysis

and interpretation of qualitative data.

Credible in claim through offering well-founded and plausible arguments about

the significance of the evidence generated.

The framework has been designed to be applied to appraisals of the outputs of

qualitative evaluations. It is designed to aid the informed judgement of quality, but not

to be prescriptive or to encourage the mechanistic following of rules. The questions are

open-ended to reflect the fact that appraisals of quality must allow judgement, and that

standards are inevitably shaped by the context and purpose of assessment (Spencer et

al. 2003).

Area Appraisal questions

Possible quality indicators

De

sig

n

How defensible is the research design?

Discussion of how overall research strategy was designed to meet aims of study

Discussion of rationale for study design

Convincing argument for different features of research design (e.g. reasons given for different components or stages of research; purpose of particular methods or data sources, multiple methods, time frames etc.)

Use of different features of design/data sources evident in findings presented

Discussion of limitations of research design and their implications for the study evidence

Page 30: InnoSI Case Study Research and Evaluation Guide (Work ...innosi.eu/wp-content/uploads/2017/09/WP4-Case-Study-guidance.pdf · There is significant variation between case studies. They

30

Sa

mp

le

How well defended is the sample design/target selection of cases/documents?

Description of study locations/areas and how and why chosen

Description of population of interest and how sample selection relates to it (e.g. typical, extreme case, diverse constituencies etc.)

Rationale for basis of selection of target sample/settings/documents (e.g. characteristics/features of target sample/settings/documents, basis for inclusions and exclusions, discussion of sample size/number of cases/setting selected etc.)

Discussion of how sample/selections allowed required comparisons to be made

Sa

mp

le

Sample composition/case inclusion – how well is the eventual coverage described?

Detailed profile of achieved sample/case coverage

Maximising inclusion (e.g. language matching or translation; specialised recruitment; organised transport for group attendance)

Discussion of any missing coverage in achieved samples/cases and implications for study evidence (e.g. through comparison of target and achieved samples, comparison with population etc.)

Documentation of reasons for non-participation among sample approached/non-inclusion of selected cases/documents

Discussion of access and methods of approach and how these might have affected participation/coverage

Da

ta c

olle

ction

How well was the data collection carried out?

Discussion of:

• who conducted data collection

• procedures/documents used for collection/recording

• checks on origin/status/authorship of documents

Audio or video recording of interviews/discussions/conversations (if not recorded, were justifiable reasons given?)

Description of conventions for taking fieldnotes (e.g. to identify what form of observations were required/to distinguish description from researcher commentary/analysis)

Discussion of how fieldwork methods or settings may have influenced data collected.

Demonstration, through portrayal and use of data, that depth, detail and richness were achieved in collection

An

aly

sis

How well has the approach to and formulation of the analysis been conveyed?

Description of form of original data (e.g. use of verbatim transcripts, observation or interview notes, documents, etc.)

Clear rationale for choice of data management method/tool/package

Evidence of how descriptive analytic categories, classes, labels etc. have been generated and used (i.e. either through explicit discussion or portrayal in the commentary)

Discussion, with examples, of how any constructed analytic concepts/typologies etc. have been devised and applied

Page 31: InnoSI Case Study Research and Evaluation Guide (Work ...innosi.eu/wp-content/uploads/2017/09/WP4-Case-Study-guidance.pdf · There is significant variation between case studies. They

31

An

aly

sis

Contexts of data sources – how well are they retained and portrayed?

Description of background or historical developments and social/organisational characteristics of study sites or settings

Participants’ perspectives/observations placed in personal context (e.g. use of case studies/vignettes/individual profiles, textual extracts annotated with details of contributors)

Explanation of origins/history of written documents

Use of data management methods that preserve context (i.e. facilitate within case description and analysis)

An

aly

sis

How well has diversity of perspective and content been explored?

Discussion of contribution of sample design/case selection in generating diversity

Description and illumination of diversity/multiple perspectives/alternative positions in the evidence displayed

Evidence of attention to negative cases, outliers or exceptions

Typologies/models of variation derived and discussed

Examination of origins/influences on opposing or differing positions

Identification of patterns of association/linkages with divergent positions/groups

An

aly

sis

How well has detail, depth and complexity (i.e. richness) of the data been conveyed?

Use and exploration of contributors’ terms, concepts and meanings

Unpacking and portrayal of nuance/subtlety/intricacy within data

Discussion of explicit and implicit explanations

Detection of underlying factors/influences

Identification and discussion of patterns of association/conceptual linkages within data

Presentation of illuminating textual extracts/observations

Analysis: how clear are the links between data, interpretation and conclusions – i.e. how well can the route to any conclusions be seen?

Clear conceptual links between analytic commentary and presentations of original data (i.e. commentary and cited data relate; there is an analytic context to cited data, not simply repeated description)

Discussion of how/why particular interpretation/significance is assigned to specific aspects of data – with illustrative extracts of original data

Discussion of how explanations/theories/conclusions were derived – and how they relate to interpretations and content of original data (i.e. how warranted); whether alternative explanations explored

Display of negative cases and how they lie outside main proposition/theory/ hypothesis etc.; or how proposition etc. revised to include them

Figure ??: Assessing quality in qualitative evaluation (based on Spencer et al. 2003)

Quantitative methods

Process evaluations often start with a description of the quality (fidelity), quantity

(dose) and the extent to which the intervention reached its intended audience (Moore

et al. 2014). Such process evaluation questions are often addressed using quantitative


approaches. Quantitative methods might also be used to quantify key process variables

and/or to allow testing of pre-hypothesised change mechanisms and contextual

moderators (Moore et al. 2014).

Process evaluators wishing to integrate quantitative data into their evaluation have two

broad options to consider. Should they collect primary data or should they rely on

secondary analysis of routine monitoring data? Primary quantitative data collection

might typically involve some form of survey with staff delivering the intervention

and/or beneficiaries of the service (service users). Structured observations are also

commonly used to gather quantitative data. Analysis of secondary data is common

practice in process evaluations and carries a number of advantages as well as creating

some important challenges. It also raises broader questions about the overlap between

monitoring and evaluation. We discuss these further below.

Secondary analysis of monitoring data

There are some clear advantages to making use of data already being routinely gathered

as part of programme implementation.

Depending on the nature of the available monitoring data it may be possible to

collect some quantitative data from all cases, staff or sites, allowing for a level of

coverage that might not be possible if the evaluation team had to collect its own

data.

Large volumes of data can be collected at little additional cost to the evaluation

team.

Use of monitoring data as opposed to new primary data collection can reduce the

additional burden that evaluation activity places on programme staff.

Monitoring data is routinely collected and therefore the behaviour of programme

staff will not be changed in ways that it might be if new data was collected by the

evaluation team. This can help avoid bias resulting from the presence of

evaluators (sometimes referred to as the Hawthorne effect). This is not to

suggest that monitoring data doesn’t change the behaviour of programme staff,

but, if such monitoring would also be part of a scaled up intervention, any effect

that it had would also be reproduced (Moore et al. 2014).


There are also limitations and challenges to consider. The key challenge is that

monitoring data, because it is constructed primarily to assist in the management and

governance of a programme, may not capture the aspects of implementation that are of

most interest to the evaluation. One potential solution is for evaluators to be involved in

the development of monitoring systems and this may be possible in some case studies

where the policy or programme is at an early stage of development.

Another common challenge is to ascertain the validity and reliability of the data (Moore

et al. 2014). There are several questions that evaluators need to ask:

How consistently is monitoring data collected? For example, in an evaluation

involving multiple sites there may be different staff cultures when it comes to the

importance of collecting monitoring data.

How consistently are the requirements for monitoring data interpreted? For

example, in a large organisation, different staff groups might interpret

definitions in a monitoring system differently.

What time and resources are given to collecting monitoring data? For example, if some managers allocate time for staff to enter monitoring data into a database while other managers do not, then the quality of monitoring data between the two groups of staff may vary.

Various strategies to address these issues include using small-scale observations or a

programme of interviews to provide indications of validity (Moore et al. 2014). For

example, evaluators could observe what data staff record as they perform key tasks or

they could interview staff to ascertain how staff understand and interpret requirements

for them to collect and record monitoring data.

Another common challenge arises from negotiating complex governance processes that

often arise when gaining access to monitoring systems (Moore et al. 2014). Issues

ranging from data confidentiality to the compatibility of different databases can hinder

an evaluator’s attempts to access monitoring data. Allowing time and resource during

the evaluation planning process to negotiate with data ‘gatekeepers’ and find workable

solutions is crucial.


The overlap between monitoring and evaluation

In our experience, when process evaluations are undertaken that incorporate the

secondary analysis of monitoring data it is common for confusion to arise. The aims of a

process evaluation will often overlap with management practices (Moore et al. 2014)

and overlaps are more apparent when evaluators make use of monitoring data. The

confusion arising from these overlaps can be problematic. Sometimes they call into

question the independence of the evaluator or the confidentiality of data that evaluators

collect and analyse. On other occasions evaluators can find themselves being drawn too

far into programme management, taking resources away from important evaluation

tasks.

Integrating quantitative and qualitative analysis

When designing a mixed methods process evaluation it is important to make sure that

quantitative and qualitative analysis will build upon one another (Moore et al. 2014).

So, for example qualitative data might be used to explain quantitative findings or

quantitative data might be used to test hypotheses or theory developed through

qualitative analysis.

Where a process evaluation is undertaken alongside an outcome (impact) evaluation it

is good practice to analyse and report on qualitative process data prior to knowing the

results of the outcome evaluation (Moore et al. 2014). This avoids evaluator bias when

interpreting the process evaluation data but, while this is the ideal, it is not always practical or

possible and is unlikely to be practical for the InnoSi case studies.

7. Impact evaluation

7.1 Introduction

The basic question impact evaluations often seek to answer is ‘did the intervention

work?’ or ‘did the intervention cause the impact?’

Impact questions likely to be asked of WP4 case studies include (based in part on HM

Treasury 2013):


Did the policy, programme or project achieve its stated objectives?

What were the social and psychological impacts of social welfare reform on

individuals and communities, including the ways individuals’ sense of identity is

shaped by their interactions with welfare policy and its reform (including gender

and generational issues)? This is a key question for InnoSi.

What were the social outcomes and effectiveness of interventions for the various

actors, contributors and beneficiaries concerned? This will be a key question for

InnoSi.

From the perspective of recipients, did policy initiatives strengthen or weaken

the public sphere? This is a key question for InnoSi.

Did any outcomes occur which were not originally intended, and if so, what and

how significant were they?

Questions such as these assume that it is possible to attribute the impact observed to the

intervention being evaluated. The most widely deployed approaches to answering this

kind of question are experimental and quasi-experimental designs. But these designs

work best when the intervention is narrowly defined and when the link between

intervention and outcome is relatively direct and short-term (Stern et al. 2012).

Where programmes are long-term, embedded in a changing context and with extended

causal chains then a more useful impact question might be ‘did the intervention make a

difference?’ (Stern et al. 2012). This allows space for combinations of causes rather than

assuming that the intervention is a cause acting on its own (Stern et al. 2012).

Experimental and quasi-experimental designs are likely to be less successful at

answering this type of question, but other alternative approaches to impact evaluation

exist and we discuss some of the more common ones, including theory-based and case-

based designs.

In this section we provide a brief overview of experimental and quasi-experimental

impact designs as well as a brief introduction to alternative impact evaluation designs.

We conclude with a brief word on methods. To understand both the advantages and

limitations of the (quasi) experimental approach to impact evaluation and to help

distinguish what is different about alternative designs, we start by considering what


makes for a trustworthy impact evaluation. We discuss four types of ‘validity’: statistical

validity, internal validity, construct validity and external validity.

7.2 Establishing trustworthiness: Validity

In the experimental evaluation tradition and more widely among evaluators who prefer

quantitative approaches to impact evaluation the ‘trustworthiness’ of an evaluation

design is discussed in terms of its ‘validity’. Validity can be divided into four categories.

Statistical conclusion validity

Statistical conclusion validity is concerned with whether the presumed cause (the

intervention) and the presumed effect (the outcome) are related (Farrington 2003).

Technically this is known as ‘covariance’. A challenge is whether the evaluation is

‘sensitive’ enough to provide reasonable evidence that the presumed cause and effect

‘covary’. An evaluation having insufficient statistical power to detect an effect is a key

threat to statistical conclusion validity (Cook and Campbell 1979). The Government

Social Research Unit (2007a) notes that the history of evaluating social programmes in

North America and the United Kingdom suggests that the effects of social programmes

are often modest. When we combine this with the fact that individuals subject to social

interventions tend to be relatively heterogeneous, the implication is that samples in

programme evaluations will often have to be large in order to detect programme

impacts (Government Social Research Unit 2007a).
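To give a sense of the arithmetic involved, the sketch below uses Python and the statsmodels library to estimate the sample size needed per group to detect effects of different sizes in a simple two-group comparison of means. The effect sizes, significance level and power target are conventional illustrative defaults, not InnoSI requirements.

```python
# A minimal power-calculation sketch for a two-group comparison of means.
# All thresholds below are conventional illustrative defaults.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

for effect_size in (0.2, 0.5, 0.8):  # Cohen's d: small, medium, large
    n_per_group = analysis.solve_power(
        effect_size=effect_size,
        alpha=0.05,   # significance level
        power=0.8,    # target probability of detecting a true effect
        ratio=1.0,    # equal-sized groups
    )
    print(f"d = {effect_size}: about {n_per_group:.0f} participants per group")
```

For a 'modest' effect (d = 0.2) this suggests roughly 400 participants per group, which illustrates why small case-study samples often cannot support this kind of analysis.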

Internal validity

Internal validity refers to whether the evaluation can demonstrate plausibly a causal

relationship between the treatment and the outcome (Robson 2011). In other words is

the relationship between an independent and dependent variables a causal relationship.

Once it is established that two variables covary we need to decide whether there is

really a causal relationship between the two and which direction causality flows in. A

number of possible threats to internal validity have been identified (based on Cook and

Campbell 1979) including:

history refers to other things that change in the participant’s environment but

that are not related to the intervention or treatment being evaluated;


maturation is a threat when the effect that we observe might be due to the

people being evaluated growing older, wiser, stronger or more experienced

rather than to the intervention we are evaluating;

testing is a threat when the effect we observe might be due to the number of

times particular responses are measured;

instrumentation is a threat when an effect might be due to a change in the

measuring instrument between pre and post-test measurements and not to the

effect of the treatment;

regression (statistical regression to give it its technical name) is a threat if

participants in an evaluation are chosen because they are unusual or atypical;

and

selection is a threat when an effect may be due to a difference between the kinds

of people in one experimental group compared to another.

Construct validity

Construct validity refers to how well a measure conforms to theoretical expectations

(Punch 2014) or, more formally, the validity with which we can make generalisations

about higher order constructs. In other words, are we measuring what we think we are

measuring (Robson 2011). It recognises that when evaluators examine the relationships

between variables they move from the specifics of what they are measuring to an

abstract level where relationships between variables are turned into theoretical

constructs. The threats to construct validity include:

Inadequate ‘operationalisation’ of concepts occurs where the process of

turning concepts into a set of measures for which data will be collected during

the evaluation is not based on an appropriate conceptual analysis of the

construct.

Mono-operation bias refers to situations where an evaluation is designed with

only one measure to represent each of the constructs being evaluated.

Mono-method bias addresses the scenario where, although there is more than

one measure for each construct, there is reliance on a single method of

measurement.


External validity

External validity refers to whether results from the evaluation can be generalised to

other situations. More formally it is the validity with which we can infer that a causal

relationship that we observe during the evaluation can be generalised across different

types of persons, settings and times (Cook and Campbell 1979). Cook and Campbell

identify three threats to external validity:

Interaction of selection and treatment: The challenge here is whether the

findings can safely be generalized beyond the group used in the evaluation.

Interaction of setting and treatment: The challenge here is whether results

obtained in one setting could be obtained in other settings.

Interaction of history and treatment: The challenge here is whether a causal

relationship can be generalized in the future.

7.3 Different approaches to impact

The table below maps out potential links between impact questions and evaluation

designs. Impact evaluation designs are discussed below.

Key impact evaluation question: To what extent can a specific (net) impact be attributed to the intervention?
Related evaluation questions: What is the net effect of the intervention? How much of the impact can be attributed to the intervention? What would have happened without the intervention?
Designs that may be suitable: Experiment; quasi-experiment.

Key impact evaluation question: Has the intervention made a difference?
Related evaluation questions: What causes are necessary or sufficient for the effect? Was the intervention needed to produce the effect? Would these impacts have happened anyhow?
Designs that may be suitable: Experiment; quasi-experiment; case-based design.

Key impact evaluation question: How has the intervention made a difference?
Related evaluation questions: How and why have the impacts come about? What causal factors have resulted in the observed impacts? Has the intervention resulted in any unintended impacts, and if so, what and how significant were they? For whom has the intervention made a difference?
Designs that may be suitable: Theory-based evaluation, particularly ‘realist’ versions.

Figure ??: Impact questions and relevant impact designs (based in part on Stern et al. 2012: Table

4.2)

Experiments

Many evaluators working in the scientific tradition argue that the randomized field trial

is the ‘gold standard’ research design for assessing causal effects (Rossi et al. 2004). It is

unlikely that there will be the opportunity to implement a randomised field trial to

evaluate the impact of an InnoSi case study, in part because of the need for the evaluator

to have some control over the implementation of the intervention being evaluated in

order to establish randomisation. However, a brief description of a randomised field

trial design follows.

A randomized experiment is “An experiment in which units are assigned to receive the

treatment or an alternative condition by a random process such as the toss of a coin or a

table of random numbers” (Shadish et al. 2002: 12). In social policy the term

randomized experiment is sometimes used synonymously with the term ‘randomized

trial’ or ‘randomized controlled trial’ (RCT). This reflects the influence of clinical

research on social science research over decades.


The simplest randomized field experiment involves random allocation of units (these

may be people, classrooms, neighbourhoods, etc.) to two different conditions and a

post-test assessment of units. In the simplest experimental design the control group

gets nothing (a ‘placebo’). However, in a social experiment the use of a placebo is

unusual and it is more likely that the control group will receive either ‘treatment as

usual’ or an alternative treatment.
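A minimal sketch of this design follows; the number of units, the simulated outcome distribution and the assumed average benefit of the intervention are all invented for illustration.

```python
# A minimal sketch of random allocation to two conditions followed by a
# post-test comparison of means. All data are simulated for illustration.
import random
import statistics

random.seed(42)                        # reproducible for illustration

units = list(range(200))               # units might be people, classrooms, sites
random.shuffle(units)                  # the 'toss of a coin', done in bulk
treatment = set(units[:100])
control = set(units[100:])

# Simulated post-test outcomes: the control group gets 'treatment as usual';
# an average benefit of 0.3 for the intervention group is assumed.
outcomes = {u: random.gauss(0.3 if u in treatment else 0.0, 1.0)
            for u in range(200)}

effect = (statistics.mean(outcomes[u] for u in treatment)
          - statistics.mean(outcomes[u] for u in control))
print(f"Estimated average effect of the intervention: {effect:.2f}")
```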

Shadish et al. (2002) set out the situations which increase the probability of doing a successful randomized field experiment. These include:

When demand outstrips supply, randomization can provide a credible strategy for distributing the service fairly.

When an innovation cannot be delivered to all units at once, the order in which units receive it can sometimes be randomized.

When experimental units are spatially separated or inter-unit communication is low, randomisation might be possible.

There are many variants on the basic design of a random field trial described above.

Some of the more common ones are:

The inclusion of before and after measures of outcome.

Longitudinal designs with repeated measures of outcome.

Factorial designs that use two or more treatments or interventions (independent

variables).

Randomised field trials have some clear advantages. Random assignment to treatment

and control groups, when undertaken properly, overcomes many of the threats to

internal validity. Results from simple randomized field experiments are usually easy to

understand. Such designs also have potential weaknesses including:

The integrity of a randomized field experiment can be easily threatened if the

requirements for randomisation between intervention and control group are

difficult to maintain.

Experimental designs work best when the intervention is tightly defined and


standardized.

Randomized field experiments provide average impact estimates for the different groups within the trial, and these averages may hide important variation between individuals or subgroups.

Randomized field experiments do not explain why an intervention works. This

criticism is sometimes expressed using the metaphor of a ‘black box’. A

randomized field experiment is likened to a black box: we measure the inputs

and the outcomes but gain relatively little insight into what is causing the

outcomes and why.

Quasi experiments

Very often, where experimental designs are not possible, quasi-experimental designs

are. The classic definition of a quasi-experiment is given by Cook and Campbell:

“Experiments that have treatments, outcome measures, and experimental units,

but do not use random assignment to create the comparisons from which

treatment-caused change is inferred. Instead, the comparisons depend on non-

equivalent groups that differ from each other in many ways other than the

presence of the treatment whose effects are being tested”. (Cook and Campbell

1979: 6)

Put more simply, quasi-experimental designs are experiments that lack random

assignment of units but that otherwise are similar in purpose and structure to

randomized field experiments (Shadish et al. 2002). Many different designs of quasi-experiment are possible; some of those more likely to be relevant to InnoSi case studies are set out here.


Non-equivalent control group designs: Probably the most common quasi-experiment is the

‘untreated control group design with dependent pre-test and post-test samples’ often called

the ‘non-equivalent comparison group design’ (Shadish et al. 2002). The basic components

of the design are an intervention and control group that are not created through random

assignment, hence they are non-equivalent. Data is collected on the outcome measure

(dependent variable) both before and after treatment for both groups. This design allows

some of the threats to internal validity to be avoided. There are a number of ways in which

the non-equivalent comparison group design can be improved. If the subjects in the

intervention and control groups can be matched this can increase group similarity. Adding

multiple pre-tests and/or post-tests can increase interpretability. Another strategy involves

a post-test measurement of two plausibly related outcome variables, one of which the

intervention is expected to change (dependent variable) and one (the non-equivalent

dependent variable) that is not expected to change as a result of the intervention. The latter

variable must be expected to respond to some or all of the contextually important internal

validity threats that the dependent variable is subject to (Shadish et al. 2002). Other

improvements focus on the comparison group and include the use of multiple comparison

cohorts or the use of a cohort control group. A cohort is a group or groups that move

through an institution together. Designs can also be strengthened if it is possible to either

stop the intervention for the intervention group after a period of time and observe the

effects, or even go one stage further and re-start the intervention after it has been

stopped.
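One common way of analysing the pre-test and post-test data generated by this design is a difference-in-differences comparison, in which the comparison group's change over time stands in for what would have happened to the intervention group anyway. The sketch below uses invented group means and rests on the (untestable) assumption of parallel trends.

```python
# A minimal difference-in-differences sketch for a non-equivalent comparison
# group design with one pre-test and one post-test. Group means are invented.
pre = {"intervention": 48.0, "comparison": 52.0}    # groups differ at baseline
post = {"intervention": 58.0, "comparison": 55.0}

change_intervention = post["intervention"] - pre["intervention"]   # +10.0
change_comparison = post["comparison"] - pre["comparison"]         # +3.0

# Assuming parallel trends, the comparison group's change approximates the
# counterfactual change for the intervention group.
did_estimate = change_intervention - change_comparison
print(f"Difference-in-differences estimate: {did_estimate:+.1f}")  # +7.0
```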


Interrupted time series designs: A time series is a large series of observations of the same

variable made over time. An interrupted time series design is an evaluation in which the specific point in the series at which an intervention is made is known. If the treatment has an

impact then the causal hypothesis is that observations after the date of treatment will be

different to the ones before (technically, if arranged on a graph they will have a different

slope or gradient) (Shadish et al. 2002). Thus the series will show an ‘interruption’ – hence

the name. In this design it is important to consider delayed effects. Immediate effects are

easier to interpret, but delayed effects can be interpreted if there is a theoretical

understanding of the delay. For example, we would expect a delay of at least 9 months

between the introduction of new advice on birth control and the first effects on birth rate.

As a ‘rule of thumb’ about 100 observations are required in order to model trends and

adjust for factors such as seasonal change. However, Shadish et al. (2002) are strong advocates of ‘short time series’, i.e. where there are fewer than 100 observations available.

They suggest that while statistical analysis might be difficult or impossible, having a number

of pre-test and post-tests can still help address some threats to internal validity and allow

for a better understanding of the nature of the causal impact.
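A common way of analysing an interrupted time series is segmented regression, estimating the pre-intervention trend, the change in level at the interruption and the change in slope afterwards. The sketch below simulates a series; the intervention point, trend and effect sizes are invented, and a real analysis would also need to model autocorrelation and seasonality.

```python
# A minimal segmented-regression sketch for an interrupted time series with
# a known intervention point. The series is simulated for illustration.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
t = np.arange(48)                              # e.g. 48 monthly observations
intervention_at = 24
after = (t >= intervention_at).astype(float)   # 1 from the intervention onwards
time_since = np.where(t >= intervention_at, t - intervention_at, 0)

# Simulated series: gentle upward trend, then an assumed level drop of 5
# at the intervention and a slightly changed slope (the 'interruption').
y = 20 + 0.2 * t - 5 * after - 0.1 * time_since + rng.normal(0, 1, t.size)

X = sm.add_constant(np.column_stack([t, after, time_since]))
fit = sm.OLS(y, X).fit()
# Parameters: intercept, baseline trend, change in level, change in slope.
print(fit.params.round(2))
```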

Regression discontinuity design: The regression discontinuity is not particularly common as

an evaluation design, but its advocates argue that it could be used more often. It is

mentioned here because many argue that, when it comes to making causal inferences, this

is the strongest evaluation design other than a randomized field experiment. It might also

be applicable when studying welfare reforms where access to a service is based on meeting

standard eligibility criteria. In a regression discontinuity design subjects are allocated to an

intervention or control group based on whether they fall above or below a cut-off score.

The variable on which they are scored (the assignment variable) can be any variable

measured before the intervention including the outcome variable (dependent variable). This

design has often been used in scenarios where the assignment variable is an assessment of

need or merit. This design is therefore particularly relevant in situations where there is

criticism of the use of random assignment because this would be perceived to be

inequitable.
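The sketch below illustrates the logic with simulated data, assuming a 'sharp' design in which everyone scoring below an invented cut-off of 50 on a needs assessment receives the intervention; the bandwidth, sample size and treatment effect are likewise assumptions.

```python
# A minimal sharp regression discontinuity sketch: units scoring below a
# cut-off on an assignment variable receive the intervention. Data simulated.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
score = rng.uniform(0, 100, 1000)        # assignment variable (e.g. need score)
treated = (score < 50).astype(float)     # eligibility cut-off at 50

# Outcome varies smoothly with the score, plus an assumed treatment jump of 4.
y = 0.1 * score + 4 * treated + rng.normal(0, 2, score.size)

# Local linear regression within a bandwidth either side of the cut-off,
# allowing different slopes on each side of the threshold.
bandwidth = 15
keep = np.abs(score - 50) < bandwidth
centred = score[keep] - 50
X = sm.add_constant(np.column_stack(
    [treated[keep], centred, treated[keep] * centred]))
fit = sm.OLS(y[keep], X).fit()
print(f"Estimated effect at the cut-off: {fit.params[1]:.2f}")  # close to 4
```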

Alternative impact evaluation designs


There are many alternative impact evaluation designs. In this section we highlight two

broad approaches: theory-led designs and case-based designs. These are not simply

‘qualitative’ alternatives to ‘quantitative’ impact evaluation. Their proponents are

generally critical of relativist perspectives associated with some researchers working in

the qualitative tradition. They propose impact designs that, they argue, improve

internal validity by foregrounding participants’ perspectives and an understanding of

the context in which impact occurs. They are also concerned with external validity. So,

for example, Byrne, in a defence of case-based methods, starts from the proposition that

“the central project of any science is the elucidation of causes that extend beyond the

unique specific instance” (Byrne 2009: 1).

Their starting point is a recognition of the complexity of social programmes, which

often involve partnership approaches and contain multiple goals (Pawson and Tilley

1997, Blamey and Mackenzie 2007, Marchal et al. 2012) and the challenge posed by new

‘arms-length’ modes of government practice in which the reform of public services is

‘depoliticised’ (Diamond 2013) and greater emphasis is placed on evaluators to deliver

‘evidence-based policy’.

A key difference that distinguishes these alternative impact evaluation designs from the

(quasi) experimental approach is a different understanding of causation. Put crudely,

these alternative approaches to impact design see causation as more ‘complex’. For

example, Pawson and Tilley (1997) draw a distinction between the ‘successionist’

approach to causation assumed by experimentalists and the ‘generative’ approach

assumed by scientific realist evaluators. Successionist causation is ‘external’ in the sense

that we do not and cannot observe certain causal forces at work (Pawson and Tilley

1997). Generative causation sees causation ‘internally’ and describes the transformative

potential of phenomena (Pawson and Tilley 1994). Case-based approaches might also

subscribe to generative understandings of causation or to the idea of ‘multiple

causation’ (Byrne et al. 2009, Stern et al. 2012).

Theory-led designs for impact evaluation recognise that interventions in social policy

are complex and that an understanding of context is crucial to explaining impact. This is

in contrast to the (quasi)experimental approach which ‘smuggles’ in a particular set of

understandings about what programmes are and how they work (Pawson and Tilley

1994).


One example of a theory-led approach is ‘scientific realism’. For the scientific realists,

interventions or programmes are not an external, impinging 'force' to which subjects

'respond', but instead work (outcomes) by introducing appropriate ideas and

opportunities (mechanisms) to groups in the appropriate social and cultural conditions

(context) (Pawson and Tilley 1997). At the heart of impact evaluation is therefore the

study of Context-Mechanism-Outcome configurations (Pawson and Tilley 1997).

Different evaluations will require different design elements and the use of different methods, but broadly the starting point might be to collect 'before' and 'after' data to give an overall picture of outcomes; the focus is then on data which can be used to explore mechanism and context variation, with comparisons of variation in outcome patterns across groups. But these would not be the standard experimental-versus-

control-group comparisons. Instead, comparisons would be defined by the

mechanism/context framework (Pawson and Tilley 1994).

Relatively recent methodologies for systematic causal analysis using case designs must

be distinguished from traditional understandings of ‘case studies’ (Stern et al. 2012).

The tradition in evaluation of naturalistic, constructivist and interpretive case studies

that generally focus on the unique characteristics of a single case might contribute to

richer understanding of causation but cannot themselves support causal analysis (Stern

et al. 2012). By contrast, new case-based approaches are interested in generalising beyond a

single case but distinguish ‘generalising’ from ‘universalizing’ (Byrne 2009). Cases are

generally seen as complex systems. A key distinction between case-based approaches

and experimental designs is the rejection of analysis based on variables (Byrne 2009).

The case is a complex entity in which multiple causes interact:

“It is how these causes interact as a set that allows an understanding of cases . . . .

This view does not ignore individual causes or variables but examines them as ‘configurations’ or ‘sets’ in their context.” (Stern et al. 2012: 31)

Case-based methods are varied but typically involve multiple case studies founded on

systematic comparison (Byrne 2009). Generally, quantitative and qualitative data is

used and a sharp distinction between quantitative and qualitative methods is rejected

(Stern et al. 2012). Analytical techniques can be complex. Kent (2009) emphasises that

case-based methods are not restricted to ‘small-n research’. He goes on to describe a

range of quantitative methods that include Bayesian statistics, configurational analysis


(including Qualitative Comparative Analysis – QCA), fuzzy-set analysis, neural network

analysis and analysis of the tipping point. Some of these require advanced analytical

skills (eg Bayesian analysis) and/or substantial computing power (eg QCA).
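As a simple illustration of the configurational logic underpinning crisp-set QCA (the cases, conditions and outcomes below are invented, and a full analysis would go on to minimise the configurations using Boolean algebra), a truth table can be built like this:

```python
# A minimal crisp-set QCA-style sketch: cases coded 1/0 on candidate causal
# conditions and the outcome are grouped into configurations. All invented.
from collections import defaultdict

# Conditions: (partnership working, stable funding, strong local leadership)
cases = {
    "Case A": ((1, 1, 1), 1),
    "Case B": ((1, 0, 1), 1),
    "Case C": ((0, 1, 0), 0),
    "Case D": ((1, 0, 1), 1),
    "Case E": ((0, 0, 1), 0),
}

truth_table = defaultdict(list)
for name, (conditions, outcome) in cases.items():
    truth_table[conditions].append(outcome)

# Each row is a configuration of conditions; 'consistency' is the share of
# cases with that configuration in which the outcome occurred.
for conditions, outcomes in sorted(truth_table.items(), reverse=True):
    consistency = sum(outcomes) / len(outcomes)
    print(conditions, f"n={len(outcomes)}", f"consistency={consistency:.2f}")
```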

7.4 Methods

It is likely, given the designs discussed above, that impact evaluations will draw on a

range of quantitative and qualitative methods.

While quantitative methods such as surveys of users or analysis of management data

might be best suited to addressing questions of overall programme impact, a mix of

quantitative and qualitative methods may be more appropriate when assessing the

social and psychological impact of social welfare reform on individuals and

communities, including the ways individuals’ sense of identity is shaped by their

interactions with welfare policy and its reform (including gender and generational

issues).

Community Reporters

Community Reporters will be recruited in one of the case study sites. The data from

Community Reporters is an additional data stream. It is not intended to replace the

evaluation team’s own data collection.

8. Economic evaluation

8.1 Introduction

Rossi et al. (2004: 332) note that:

Whether programs have been implemented successfully and the degree to which

they are effective are at the heart of evaluation. However, it is just as critical to

be informed about the cost of program outcomes and whether the benefits

achieved justify those costs.


In the InnoSi case studies we can therefore see economic evaluation as an extension of

impact evaluation.

Rossi et al. (2004: 332) describe economic evaluation as providing “a frame of reference

for relating costs to program results”. Economic evaluations ask questions such as

(based in part on Dhiri and Brand 1999):

What was the true cost of the policy or programme?

Did the outcome(s) achieved justify the investment of resources?

What were the social returns of interventions for the various actors, contributors

and beneficiaries concerned? This is a key question for InnoSi.

Was this policy or programme the most efficient way of realising the desired

outcome(s) or could the same outcome(s) have been achieved at a lower cost

through an alternate course of action?

Attempts to address these issues have traditionally fallen into one of two forms:

Cost effectiveness analysis: A form of economic evaluation where the outcomes

of an intervention are measured using the most appropriate natural effects or

physical units (Drummond et al. 2005) such as burglaries avoided or the cost of

converting each smoker into a non-smoker. The outcomes are not expressed in

monetary terms. Instead the results are expressed as a cost-effectiveness ratio

such as £1,000 per burglary avoided or $1,000 per smoker converted into

a non-smoker (Rossi et al. 2004).

Cost-Benefit Analysis: A form of economic evaluation where the outcomes are valued in monetary terms (Drummond et al. 2005). A Cost-Benefit Analysis of a programme to reduce cigarette smoking would examine the difference in dollars between the costs of the programme and the savings from reduced medical care for smoking-related diseases (Rossi et al. 2004). Potentially this makes it the broadest form of economic evaluation method; however, as we will discuss later, difficulties in capturing and measuring the wider consequences of an intervention mean that, in reality, its scope can be limited (Roman 2004).

The distinction between cost effectiveness and CBA is more fundamental than taking

the additional step of valuing the outcomes in a study. Drummond et al. (2005), in the


context of health care, argue that while a CEA is based on decision-makers reviewing

results and deciding on the relative values assigned to competing priorities, a CBA is

rooted in welfare economics where the relevant source of values is believed to be

individual consumers because they are best placed to judge their own welfare.

Over recent years growing interest in economic evaluation has led to a proliferation of

approaches to ‘economic’ evaluation that are designed to be accessible to evaluators

without an economics training, one of the most widely used being Social Return On

Investment.

In the remainder of this section we describe briefly the main stages in undertaking a

Cost-Benefit Analysis and a Social Return on Investment analysis.

8.2 Stages in a Cost-Benefit Analysis

There are a number of stages in a CBA:

Define the scope of the analysis

Key issues to decide at this stage include the perspective to take in the analysis (for

example will the perspective be that of the state, a specific agency or the whole of

society), what outcomes are to be measured and the alternatives to be compared (for

example, participation in a programme versus non-participation) (Welsh and

Farrington 2003). The starting point for a cost analysis is to establish the viewpoint for

analysis (Drummond et al. 2005) in other words, ‘who pays?’; the viewpoint taken could

have a radical effect on the analysis undertaken. Common viewpoints for the analysis of

social projects are: those of individual participants; programme funders or sponsors; or

the communal social unit involved in the programme, such as municipality, region, state

or nation (Rossi et al. 2004).

Assemble cost data

It is useful to divide cost into three categories (Rossi et al. 2004):

Direct project expenditure: in many programmes a substantial proportion of

the direct project expenditure goes on staff and their associated ‘on-costs’

(salaries, employer contributions such as – in the UK - National Insurance and

pensions).


Costs incurred by programme recipients: these might include time spent

participating in an activity or travel costs.

Costs incurred by co-operating agencies: These might include costs resulting

from a referral to another agency. For example, a project working with young

people at risk of offending might make a referral to a social service provider,

which will then incur additional cost.

Across all of these cost areas evaluators must take account of the costs of services or facilities used by the programme which are ‘free’ or discounted. (Economists would generally hold, philosophically, that nothing is ‘free’ – there is always an opportunity cost; here we mean that the good or service is not paid for out of the intervention budget.) An example of this would be the use of (un-paid) volunteers’ time or where a project makes use of (formerly surplus) office space provided gratis.

Evaluators must also identify where resources have been diverted as a result of the

intervention. Resources which would have been mobilised anyway, in the absence of the

intervention, are generally excluded from cost analyses (Dhiri and Brand 1999). This is

the concept of ‘additionality of costs’ (Dhiri and Brand 1999).

There are different ways of gathering data on the costs of a programme. Estimates are

often developed through: a review of financial reports; invoices and progress reports to

funders; and interviews with key staff (Roman 2004). In some cases these might be

supplemented by surveys, activity diaries or activity sampling exercises covering

programme staff (Dhiri and Brand 1999). Client costs – for example, time spent by clients or their transport costs – might be imputed (Rossi et al. 2004), or data could be gathered

from clients via interviews or surveys.

The concept of ‘additionality’ requires a counterfactual to estimate the difference

between the costs incurred by the intervention and costs which would have been

incurred anyway. The counterfactual might take the form of comparing current budgets

to the baseline (pre-intervention) level of resources (Dhiri and Brand 1999) or the cost

of the next best approach to delivering the desired outcome.

Once relevant costs have been identified, individual items must be measured and valued

(Drummond et al. 2005). A general principle is that the economic cost of an input should

be estimated. Economic evaluations often distinguish between the average cost of


delivering a unit of output and marginal costs. Marginal cost can be defined as the cost

of producing one extra unit of output. When calculating the marginal cost, only those

inputs which are required to achieve the extra unit of output are included. Fixed costs

such as premises or staff are excluded unless they are required to achieve this extra unit

(Dhiri and Brand 1999).
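A minimal worked example of the distinction, with invented figures:

```python
# Average versus marginal cost, with illustrative figures only.
fixed_costs = 100_000.0          # premises, core staff: unchanged by one more unit
variable_cost_per_unit = 250.0   # inputs needed for each extra unit of output
units_of_output = 400

total_cost = fixed_costs + variable_cost_per_unit * units_of_output
average_cost = total_cost / units_of_output   # spreads fixed costs: 500.00
marginal_cost = variable_cost_per_unit        # only the extra inputs: 250.00

print(f"Average cost per unit:  {average_cost:.2f}")
print(f"Marginal cost per unit: {marginal_cost:.2f}")
```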

Estimate impact of programme

There are different ways to measure programme effects, but most economists favour an

outcome evaluation in which an experimental or quasi-experimental design has been

used. We discussed these approaches to impact evaluation above.

Estimate the monetary value of outcomes

The defining feature of a CBA is that the effects of the intervention – the outcomes – are

valued in standardized monetary units, such as the dollar or the pound. Thus the benefit

of the intervention, expressed in monetary terms can be compared directly with the

costs of the intervention, also expressed in monetary terms. A key stage in a CBA is to

put a value on the costs and benefits of the programme outcomes. For example, in an economic evaluation of an employment programme, helping more people into work has a range of economic benefits for the individuals and for society. For individuals there

might be financial benefits if they earn more in employment than they received in state

benefits. There might also be benefits in terms of improved mental or physical health.

For society, benefits might include more tax revenues, lower benefit payments and savings

for health services. There may also be multiplier effects – those in work have spare

income to generate demand for goods and services. There might also be some costs. For

individuals these might include the need to pay more tax; for society, an example of a cost could be that, as more mothers move into the workplace, the demand for and hence cost of childcare increases.

Some costs and benefits might be relatively simple to value by using existing market

data. For example, in the example above it might be relatively simple to estimate the decrease in benefit payments and increase in tax revenues. Economists understand the

monetary value of non-market goods in terms of the impact these things have on utility

which, in a broad sense, is the satisfaction a person gets from consumption of a good, or the change in their welfare or well-being (HM Treasury 2013). The preferred method


of estimating this change in welfare is to estimate people’s ‘willingness to pay’ (WTP) or ‘willingness to accept’ (WTA) for the programme’s outputs or outcomes (HM Treasury 2013). One way to estimate people’s WTP or WTA is to look at preferences they ‘reveal’

in a similar or related market. Where it is not possible to identify WTP or WTA through

revealed preferences another option is to construct surveys that describe a hypothetical

choice in a hypothetical market and ask people to state their preferences.

For the InnoSi case studies it will be advisable to first look at existing estimates of the

value of costs and benefits and consider whether these can be applied in the current

study. If there are no existing or reliable estimates of value, a decision must be made whether to undertake new research to estimate the value of benefits. This is likely to be

costly and so the potential benefit new insights will provide must be weighed against

the cost of the research required to generate them. One option might be to pool

resources across several case studies.

Calculate present value and assess efficiency

If the monetary expressions of the costs and benefits of an intervention are to be

compared directly then it is important we recognise not all of these costs and benefits

accrue at the same point in time. Therefore a process of discounting is used to calculate the Net Present Value of all costs and benefits. Once the Net Present Value of costs and benefits has

been calculated then the intervention’s efficiency can be calculated in the form of a

benefit-cost ratio (benefits divided by costs) or net value (benefits minus costs) (Welsh

and Farrington 2001).
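A minimal sketch of these calculations follows; the yearly cash flows are invented, and the 3.5% discount rate is used purely as an example.

```python
# Discounting costs and benefits to present values, then computing a
# benefit-cost ratio and net value. All figures are illustrative.
def present_value(flows, rate):
    """Discount a list of yearly amounts (year 0 first) to present value."""
    return sum(amount / (1 + rate) ** year for year, amount in enumerate(flows))

discount_rate = 0.035
costs_by_year = [120_000, 40_000, 40_000]    # programme costs, years 0-2
benefits_by_year = [0, 90_000, 110_000]      # benefits tend to arrive later

pv_costs = present_value(costs_by_year, discount_rate)
pv_benefits = present_value(benefits_by_year, discount_rate)

print(f"Benefit-cost ratio: {pv_benefits / pv_costs:.2f}")
print(f"Net present value:  {pv_benefits - pv_costs:,.0f}")
```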

Describe the distribution of costs and benefits

Describing the distribution of programme costs and benefits involves identifying who

gained and lost from the intervention (Welsh and Farrington 2001). An economic

evaluation assesses efficiency expressed as “the extent to which a programme delivers

additional benefits, however expressed, relative to the additional costs used to provide

the programme” (Palfrey et al. 2012: 127). But, as Palfrey et al. note, simple technical

efficiency is not sufficient for establishing priorities within and between publicly funded services; for that, allocative efficiency must be considered.


Conduct sensitivity analysis

Once the intervention’s efficiency has been calculated it is important to check how

sensitive the resulting figure is to variations in the estimates that have been used in the

CBA.
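Continuing the sketch above, a simple sensitivity check recomputes the benefit-cost ratio while varying the most uncertain estimates; the ranges used here are arbitrary illustrations.

```python
# One-way/two-way sensitivity sketch: vary the discount rate and scale the
# estimated benefits up and down, recomputing the ratio each time.
def present_value(flows, rate):
    return sum(a / (1 + rate) ** t for t, a in enumerate(flows))

costs = [120_000, 40_000, 40_000]
benefits = [0, 90_000, 110_000]

for rate in (0.015, 0.035, 0.07):
    for scale in (0.8, 1.0, 1.2):     # benefits 20% lower / as estimated / higher
        scaled = [b * scale for b in benefits]
        ratio = present_value(scaled, rate) / present_value(costs, rate)
        print(f"rate={rate:.1%}, benefits x{scale}: ratio={ratio:.2f}")
```

If the conclusion (for example, that benefits exceed costs) survives plausible variation in these estimates, more confidence can be placed in it.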

Present the results

Results should be transparent and replicable.

All of these stages are required for a CBA. For a CEA the fourth step is omitted. Below

we look at some of these stages in more detail.

8.3 Social Return on Investment

Social Return on Investment (SROI) has been developed in recognition that there is a

need for better ways to account for the social, economic and environmental value that results from activities across the public, not-for-profit and private sectors (The SROI Network

2012). It also recognises a need for a methodology that is more accessible than

traditional approaches to CBA.

SROI was developed from social accounting and CBA and is based on seven principles

that underpin how SROI should be applied (The SROI Network 2012):

involve stakeholders;

understand what changes;

value the things that matter;

only include what is material;

do not over-claim;

be transparent;

verify the result.

A SROI involves the following stages:

1. establishing scope and identifying key stakeholders;

2. mapping outcomes;


3. evidencing outcomes and giving them a value;

4. establishing impact;

5. calculating the SROI;

6. reporting, using and embedding.

Some elements of a SROI evaluation are almost identical to those in a CBA. In this

section we concentrate on areas that are different to the process of undertaking a CBA

as described above.

Establishing scope and identifying key stakeholders.

The SROI Guidance emphasises the importance of clear boundaries around what the

SROI analysis will cover, who will be involved in the process and how. Several steps are

highlighted including establishing the scope and identifying stakeholders. The

importance of thinking about unintended outcomes and negative outcomes and the

implications of these in the identification of stakeholders is emphasised (The SROI

Network 2012).

Mapping outcomes.

Through engaging with stakeholders an impact map will be developed. The Impact Map

is central to the SROI analysis (The SROI Network 2012) and is similar to, or in some cases the same as, a theory of change or a logic model (see above).

Evidencing outcomes and giving them a value.

This stage involves finding data to show whether outcomes have happened and then

valuing them (The SROI Network 2012). The first stage is to develop outcome indicators

and it is suggested that more than one indicator and a mix of complementary, subjective

(or self-reported) and objective indicators is desirable. Next, outcomes data is collected.

The third stage is to establish how long outcomes last. Estimates of the duration of each

outcome should be determined through consultation with stakeholders or reference to

other research. Where the duration of an outcome is for many years – the example given

is a parenting intervention with children from deprived areas that may potentially have

effects that last into adulthood – it is recommended that longitudinal data is gathered to

support estimates of duration (The SROI Network 2012).


Putting a value on the outcome

The process of valuation has strong similarities with those described above for CBA. A

distinction is made between proxies that are easy to source because there is an obvious

market value and proxies that are more challenging. For the latter, techniques such as

stated preference (willingness to pay, or accept compensation) and revealed preference

are suggested (The SROI Network 2012). These are discussed in detail earlier in this

chapter.

Establishing impact.

The SROI Guidance suggests that having collected evidence on outcomes and monetised

them, “those aspects of change that would have happened anyway or are a result of

other factors are eliminated from consideration” (The SROI Network 2012: 55). Several

ways of doing this are outlined.

The first is to calculate ‘deadweight’, which is “a measure of the amount of outcome that

would have happened even if the activity had not taken place” (The SROI Network 2012:

55). Another component of the estimate of impact is to consider ‘displacement’ which is

“an assessment of how much of the outcome displaced other outcomes” (The SROI

Network 2012: 57) and may apply to some outcomes. Also important is ‘attribution’

which is an assessment of how much of the outcome was caused by the contribution of

other organisations or people. Three main approaches to estimating attribution are

suggested (The SROI Network 2012):

basing an estimate on the evaluator’s experience. The SROI Guidance suggests that if an organisation has been working with other organisations for a number of years, those involved may have a good idea of how each contributes to outcomes;

asking stakeholders;

consulting with the other organisations to which the evaluator thinks there

might be attribution.

Finally, ‘drop-off’ considers how long the outcomes last and is calculated for outcomes

that last more than one year (The SROI Network 2012). Once deadweight, displacement, attribution and drop-off are estimated, an impact for each outcome can be calculated by

multiplying the financial proxy by the quantity of the outcome and deducting any


percentages for deadweight or attribution. This is then repeated for each outcome (The

SROI Network 2012).
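A minimal sketch of this calculation for a single outcome is shown below; every figure (the financial proxy, quantity, percentages and duration) is invented for illustration.

```python
# SROI-style impact calculation for one outcome: financial proxy x quantity,
# reduced for deadweight, displacement and attribution, with drop-off
# applied in years after the first. All figures are illustrative.
proxy_value = 2_000.0    # financial proxy per instance of the outcome
quantity = 50            # instances of the outcome evidenced
deadweight = 0.30        # share that would have happened anyway
displacement = 0.10      # share that displaced other outcomes
attribution = 0.20       # share caused by others' contributions
drop_off = 0.25          # yearly decay, for outcomes lasting more than a year
duration_years = 3

value = proxy_value * quantity
year_one = value * (1 - deadweight) * (1 - displacement) * (1 - attribution)

impact_by_year = [year_one * (1 - drop_off) ** year
                  for year in range(duration_years)]
print([round(v) for v in impact_by_year])   # [50400, 37800, 28350]
```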

Calculating the SROI.

This stage involves adding up all the benefits, subtracting any negatives and comparing

the result to the investment, after which the sensitivity of the results can be tested (The

SROI Network 2012). A final, optional stage in the analysis is to calculate the ‘payback

period’: the length of time it takes for the value of the impact to repay the initial investment.
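Continuing the sketch above, the final calculation discounts each year's adjusted impact, compares the total with the investment and, optionally, estimates the payback period; the investment figure and 3.5% discount rate are again assumptions.

```python
# SROI ratio and payback period, continuing the illustrative figures above.
investment = 60_000.0
discount_rate = 0.035
impact_by_year = [50_400, 37_800, 28_350]   # adjusted impacts, years 1-3

pv_impact = sum(v / (1 + discount_rate) ** (year + 1)
                for year, v in enumerate(impact_by_year))
print(f"SROI ratio: {pv_impact / investment:.2f} : 1")

# Payback period: years until cumulative impact repays the investment.
cumulative = 0.0
for years, v in enumerate(impact_by_year, start=1):
    cumulative += v
    if cumulative >= investment:
        print(f"Investment repaid within {years} year(s)")
        break
else:
    print("Investment not repaid within the period modelled")
```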

8.4 Social Return on Investment and Cost-Benefit Analysis distinguished

The SROI approach has some clear similarities with CBA. However, it also draws on two other traditions: sustainability accounting and financial accounting (The SROI Network Undated), and was developed specifically to be accessible to non-economists. There are

therefore some important differences between SROI and CBA.

The first difference is the role of stakeholder involvement, which is fundamental to the

SROI, particularly when trying to determine outcomes, or the changes that result from

an activity (The SROI Network Undated). Secondly, the SROI Network (Undated:

unnumbered) suggests that the principle of only including what is material is “the most

notable difference between the two approaches”. They argue that while CBA is an

application of welfare economics and so begins from the perspective that all welfare

effects will be included, in practice, “it often focuses on a particular policy outcome with

some recognition of unintended consequences” which “risks omission of important

effects” (The SROI Network Undated: Unnumbered). By contrast the SROI Network

argues that SROI “recognises these limitations and aims to include material outcomes,

drawing on financial and sustainability reporting, which hold materiality as a central

tenet” (The SROI Network undated: unnumbered). The third difference seems to us just as notable: the approach to methodological rigour. One example is the rigour with which the impact of the intervention is established. In a CBA preference is given to experimental or quasi-experimental approaches, whereas “SROI principles can be used at any level of rigour, as long as it is ‘good enough’ for the type of decision it is being used to inform” (The SROI Network, Undated: unnumbered). This different application


of rigour in SROI applies to valuation as well. Again “an estimate for a value may be

good enough for a particular audience” (The SROI Network, Undated: unnumbered).


9. Bibliography

Blamey A and Mackenzie M (2007) 'Theories of Change and Realistic Evaluation: Peas in

a Pod or Apples and Oranges?' Evaluation 13 pp. 439-455.

Byrne D (2009) ‘Case-Based Methods: Why We Need Them; What They Are; How To Do

Them’ in Byrne D, and Ragin CC (eds.) The SAGE Handbook of Case-Based Methods,

London: Sage.

Connell JP and Kubisch AC (1998). Applying a theory of change approach to the

evaluation of comprehensive community initiatives: progress, prospects, and problems.

In: Fulbright-Anderson K, Kubisch, AC and Connell JP (eds.) New Approaches to

Evaluating Community Initiatives: Theory, Measurement, and Analysis. Washington, DC:

The Aspen Institute.

Cook T and Campbell D (1979) Quasi-experimental Design and Analysis Issues for Field

Settings, Boston: Houghton Mifflin Company.

Dhiri S and Brand S (1999) ‘Analysis of costs and benefits: guidance for evaluators’,

Crime Reduction Programme – Guidance Note 1, London: Home Office.

Drummond M, Sculpher M, Torrance G, O’Brien B and Stoddart G (2005) Methods for the

Economic Evaluation of Health Care Programmes (Third Edition), Oxford: Oxford

University Press.

Farrington D (2003) 'Methodological Quality Standards for Evaluation Research', The ANNALS of the American Academy of Political and Social Science 587 pp.49-68.

Hawe P, Shiell A and Riley T (2009) 'Theorizing interventions as events in systems', American Journal of Community Psychology 43 pp.267-276.

HM Treasury (2013) The Green Book: Appraisal and Evaluation in Central Government,

London: HM Treasury.

Iles V and Sutherland K (2001) Organisational change: a review for health care

managers, professionals, and researchers. London: NCCSDO.

Kubisch AC, Fulbright-Anderson K and Connell J (1998) Evaluating community initiatives: A progress report. In: Fulbright-Anderson K, Kubisch AC and Connell JP (eds.) New Approaches to Evaluating Community Initiatives: Theory, Measurement, and Analysis. Washington, DC: The Aspen Institute.

Lincoln YS and Guba EG (1986) 'Research, Evaluation and Policy Analysis: Heuristics for Disciplined Inquiry', Policy Studies Review 5(3) pp.546-565.

Marchal B, van Belle S, van Olmen J, Hoerée T and Kegels G (2012) 'Is realist evaluation keeping its promise? A review of published empirical studies in the field of health systems research', Evaluation 18(2) pp.192-212.

Moore G, Audrey S, Barker M, Bond L, Bonell C, Hardeman W, Moore L, O'Cathain A, Tinati T, Wight D and Baird J (2014) Process evaluation of complex interventions: Medical Research Council guidance. London: MRC Population Health Science Research Network.

Palfrey C, Thomas P and Phillips C (2012) Evaluation for the Real World: The Impact of

Evidence in Policy Making, Bristol: The Policy Press.

Pawson R and Tilley N (1994) ‘What works in evaluation research?’ British Journal of

Criminology 34 pp.291-306.

Pawson R and Tilley N (1997) Realistic evaluation. London: Sage.

Public Sector Transformation Network (2014) Public Service Transformation:

Introductory guide to evaluation, London: Public Sector Transformation Network.

Punch K (2014) Introduction to Social Research: Quantitative and Qualitative

Approaches, London: Sage.

Ritchie J and Lewis J (eds.) (2003) Qualitative Research Practice. A Guide for Social

Science Students and Researchers. London: Sage.

Robson C (2011) Real World Research (Third Edition), Chichester: Wiley.

Rogers PJ and Williams B (2006) Evaluation for practice improvement and organizational learning. In: Shaw IF, Greene JC and Mark MM (eds.) The SAGE Handbook of Evaluation. London: Sage, pp.77-98.

Roman J (2004) 'Can Cost-Benefit Analysis Answer Criminal Justice Policy Questions, And If So, How?' Journal of Contemporary Criminal Justice 20 pp.257-275.

Rossi P, Lipsey M and Freeman H (2004) Evaluation: A Systematic Approach, London: Sage.


Scriven M (1967) 'The methodology of evaluation'. AERA Monograph Series in Curriculum Evaluation, No. 1. Chicago: Rand McNally.

Shadish WR, Cook TD and Campbell DT (2002) Experimental and quasi-experimental designs for generalized causal inference. Boston: Houghton Mifflin.

Spencer L, Ritchie J, Lewis J and Dillon L (2003) Quality in Qualitative Evaluation: A Framework for Assessing Research Evidence, London: National Centre for Social Research / Government Chief Social Researcher's Office, Strategy Unit.

Stern E, Stame N, Mayne J, Forss K, Davies R and Befani B (2012) Broadening the Range of Designs and Methods for Impact Evaluations: Report of a Study Commissioned by the Department for International Development, London: DFID.

The SROI Network (2012) A Guide to Social Return on Investment

www.thesroinetwork.org [accessed 24th April 2015].

The SROI Network (Undated) SROI and Cost-Benefit Analysis: Spot the Difference, or Chalk and Cheese, blog entry, http://www.thesroinetwork.org/blog/410-sroi-and-cost-benefit-analysis [accessed 24th April 2015].

Vogel I (2012) Review of the use of 'Theory of Change' in international development. DFID Research Paper, London: Department for International Development.

Weiss C (1995) Nothing as practical as good theory: exploring theory-based evaluation for comprehensive community initiatives for children and families. In: Connell JP, Kubisch AC, Schorr LB and Weiss CH (eds.) New Approaches to Evaluating Community Initiatives: Concepts, Methods and Contexts. Washington, DC: The Aspen Institute.

Weiss C (2000) Which links in which theories shall we evaluate? In: Rogers PJ, Hacsi T, Petrosino A and Huebner TA (eds.) Program Theory in Evaluation: Challenges and Opportunities. New Directions for Evaluation, San Francisco: Jossey-Bass.

Welsh B and Farrington D (2001) 'Assessing the Economic Costs and Benefits of Crime Prevention' in Welsh B, Farrington D and Sherman L (eds.) Costs and Benefits of Preventing Crime, Oxford: Westview Press.