The Neglected “R” in the Risk-Needs-Responsivity Model: A ...
Responsivity AssessmentApproach for Assessing Responsivity to
Correctional Interventions
Authors
St. Paul, MN 55108-5219
January 2019
Prevailing correctional practice holds that offenders should be assigned to interventions
on the basis of assessments for risk, needs, and responsivity. Assessments of responsivity,
however, typically consist of little more than a checklist of items such as motivation,
gender, language, or culture. We introduce a new actuarial approach for assessing
responsivity, which focuses on predicting whether individuals will desist after
participating in an intervention. We assess responsivity by using multiple classification
methods and predictive performance metrics to analyze various approaches for
prioritizing individuals for correctional interventions. The results suggest that adding an
actuarial responsivity assessment to the existing risk and needs assessments would likely
improve treatment assignments and further enhance the effectiveness of an already effective
intervention. We conclude by discussing the implications of more rigorous responsivity
assessments for correctional research, policy and practice.
Introduction
As correctional agencies have increasingly embraced the idea of evidence-based
practices, risk-needs-responsivity (RNR) has become the prevailing model to guide the
delivery of correctional interventions. The risk principle holds that programming
resources should be reserved for higher-risk individuals, whereas the needs principle
dictates that interventions should target criminogenic needs areas, or dynamic risk factors
that are susceptible to change. The responsivity principle, meanwhile, suggests that
programming should be tailored to the strengths, abilities, and learning styles of
individuals.
Because the RNR paradigm holds that the effective delivery of programming
should be customized to an individual’s risk, needs, and responsivity, the use of
assessment instruments is central to this model. Currently, most of the widely-used tools
simultaneously assess for risk and needs. Risk assessment involves predicting who is
most likely to recidivate, while needs assessment focuses on identifying which
interventions would be the most appropriate for an individual (Gottfredson and Moriarty,
2006).
Among existing assessment instruments, the only component that is truly actuarial
is the assessment of recidivism risk. That is, these instruments rely on statistical methods
to estimate the likelihood an individual will commit a new crime in the future. The
assessment of criminogenic needs, on the other hand, does not use an actuarial approach.
Instead, the common strategy involves tallying up the number of items related to each
criminogenic needs area. The needs areas, or domains, with the highest scores are those
that presumably should be targeted for programming. For example, if an individual scores
highest for the substance abuse domain, then substance abuse treatment would be
considered an appropriate—if not the most appropriate—intervention.
As with the assessment of criminogenic needs, contemporary instruments do not
use actuarial methods to assess responsivity. Of the three principles within the RNR
model, responsivity is generally an afterthought. Indeed, rather than being described as
risk, needs, and responsivity assessment tools, the most widely-used instruments are
typically referred to as risk and needs assessments. And even when there is an attempt to
account for responsivity, the assessment of responsivity is barely more than a checklist of
items such as motivation, gender, and culture.
Present Study
In this study, we introduce an actuarial approach for assessing responsivity, which
involves estimating the likelihood that an individual’s participation in an intervention will
result in desistance. If an individual participated in, say, substance abuse treatment, what
is the probability it would lead to desistance? With risk assessment, the focus is on
identifying who will recidivate. It is the opposite for responsivity assessment, where the
focus is on identifying who will desist, or not recidivate. Yet, because responsivity
assessment also considers participation in correctional interventions, it attempts to predict
whether participating in an intervention will result in desistance. In doing so, the
responsivity assessment we present in this study also accounts for the efficacy of an
intervention.
Our sample consists of more than 23,000 offenders released from Minnesota
prisons between 2003 and 2011. We focus on prisoner participation in prison-based
chemical dependency treatment, which has been found to be effective in reducing
recidivism. Because there are important gender differences, we conducted separate
analyses for male and female offenders.
Each of the offenders had been assessed for chemical dependency (CD) needs
upon their entry to prison. We developed baseline recidivism prediction models (i.e., risk
assessment) along with responsivity assessment models that predict desistance following
participation in prison-based CD treatment. Using the risk, needs, and responsivity
assessment data, we then examined the performance of various prioritization schemes in
reducing recidivism. In addition to prioritizing on the basis of risk and needs, we
prioritized offenders on the basis of risk-needs-responsivity, risk and responsivity, and
needs and responsivity. We estimate the overall impact on recidivism and conclude by
discussing the implications for correctional research, policy, and practice.
Risk, Needs and Responsivity Assessments
Over the last half century, risk assessment within corrections has transitioned
from reliance on professional judgment in making classification decisions to the
widespread use of empirically-based, actuarial instruments. Even though more objective,
actuarial methods for assessing risk had been around since the late 1920s (Burgess,
1928), it was not until the 1970s that clinical judgment began to give way to the
development of what Bonta and Andrews (2007) refer to as second-generation risk
assessment instruments. Consisting mostly of static items such as criminal history, these
actuarial instruments, which have been found to consistently outperform clinical
judgment in predicting recidivism (Brennan, Dieterich, and Ehret, 2009), were developed
through statistical analyses. Following the emergence of the “what works” literature and
the growing acceptance of the risk-needs-responsivity (RNR) model, which places an
emphasis on assessing and targeting an offender’s criminogenic needs (dynamic risk
factors) for interventions, third-generation instruments began to incorporate both static
and dynamic predictors of recidivism. Continuing this focus on assessing static and
dynamic risk factors, fourth-generation risk assessment tools have been designed to
follow individuals from intake to case closure, be administered on multiple occasions,
and better integrate protective factors (i.e., factors that reduce recidivism risk) within the
assessment process (Brennan et al., 2009).
In calling for a concentration of programming resources on the highest-risk
offenders, the risk principle makes sense at both the individual and aggregate levels.
While an intervention has an aggregate effect size, its effects on individuals will vary.
After completing an intervention, even an effective one, some individuals will
recidivate while others will desist. For example, let us assume we have an intervention
that reduces recidivism by 25 percent. If we applied this intervention to, say, 100 higher-
risk individuals whose baseline recidivism probability was 80 percent, we would expect
the intervention to lower recidivism by 25 percent, resulting in 60 recidivists. In other
words, the intervention produced desistance for 20 of the 100 offenders.
But what if we applied the intervention to a lower-risk group whose baseline
recidivism probability was 40 percent? If we assume the intervention lowers recidivism
by 25 percent, then there would be 30 recidivists; that is, the intervention produced
desistance for 10 of the 100 offenders, which is half the number we observed for the
higher-risk group. Conceptually, adhering to the risk principle can help maximize an
effective intervention's impact on recidivism.
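
The arithmetic in the two examples above can be checked with a short sketch. It assumes, as the examples do, that the intervention produces the same 25 percent relative reduction regardless of baseline risk. The function name `expected_outcomes` is ours, chosen for illustration only.

```python
def expected_outcomes(n, baseline_rate, relative_reduction):
    """Return (recidivists without treatment, recidivists with treatment, desistors gained)."""
    without = n * baseline_rate
    with_treatment = without * (1 - relative_reduction)
    return without, with_treatment, without - with_treatment


# Same group size and same relative reduction, different baseline risk
for label, rate in [("higher-risk", 0.80), ("lower-risk", 0.40)]:
    without, with_treatment, gained = expected_outcomes(100, rate, 0.25)
    print(f"{label}: {without:.0f} -> {with_treatment:.0f} recidivists, "
          f"{gained:.0f} desistors attributable to the intervention")
```

Under this assumption, the number of desistors gained scales directly with the baseline rate, which is why the higher-risk group yields twice as many.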
One outstanding question, however, is whether we would still observe a 25
percent reduction for the higher-risk group compared to the lower-risk group. For the
higher-risk group, who may be more entrenched in a criminal lifestyle, it could be that
one intervention is insufficient to bring about desistance. To be sure, the extant literature
suggests that higher-risk individuals require more intensive programming (Bonta,
Wallace-Capretti, and Rooney, 2000; Lowenkamp and Latessa, 2005). But the use of risk
and needs assessments operates on the assumption that we assign individuals to
interventions on the basis of risk (high) and needs (high); that is, if a high-risk individual
has a high substance abuse need, we would presumably want to prioritize this person for
CD treatment. But would CD treatment be the most appropriate intervention or, more
specifically, the most effective in reducing recidivism risk for this individual?
While risk assessment involves predicting who is most likely to recidivate, the
goal of needs assessment is, or at least should be, to identify the areas in which
interventions would likely have the greatest impact in lowering recidivism risk. If an
individual is in prison for, say, 6 months and can only participate in one intervention,
which one would have the greatest impact on recidivism? Ostensibly, needs assessment
should be able to help us identify what type of intervention would be most beneficial.
None of the existing risk and needs assessments, however, have demonstrated
they can validly predict or identify needs. The existing literature seems to assume that if a
tool predicts recidivism, then it also predicts needs. But examining how well a tool
performs in predicting recidivism is an evaluation of its ability to assess risk, not needs.
Indeed, the factors that heighten the need for an intervention within a particular area may
not be predictive of recidivism. For example, the extent of chemical use in the 12 months
prior to prison may be more indicative of the need for CD treatment than it is for
recidivism risk.
But even if current assessments were able to accurately predict recidivism and
identify the salient needs areas of offenders, it is still critical to assess responsivity to
programming. For instance, a potential problem can arise from assigning high-risk, high-
need individuals to interventions that do not reduce recidivism because they are either
ineffective in general or insufficient for higher-risk individuals. Put another way, general
responsivity refers to types of programming that are most effective in reducing
recidivism, such as cognitive-behavioral interventions. Specific responsivity, on the other
hand, includes individual barriers that may limit the likelihood for program participation
and successful completion (Bonta and Andrews, 2007). Examples of specific responsivity
include motivation, anxiety, different forms of learning styles, language, transportation,
gender, and culture (Cullen, 2002).
Existing responsivity assessments have not been developed through the use of
statistical methods, or an actuarial approach. Instead, these assessments are, for the most
part, barely more than a list of items for practitioners to consider when making program
placement decisions. None of the responsivity assessments used on correctional
populations have, to our knowledge, been evaluated to determine whether they are valid
or reliable. It is therefore unclear whether these assessments perform well in identifying
responsivity factors or whether their use has led to more appropriate program
assignments and, ultimately, better recidivism outcomes.
In this study, we introduce a more rigorous, actuarial approach for assessing both
general and specific responsivity. In particular, we assess responsivity by estimating the
likelihood that an individual’s participation in an intervention will result in desistance. If
an individual participated in, say, CD treatment, what is the probability it would lead to
desistance? As we demonstrate later, this approach to responsivity assessment not only
accounts for the efficacy of an intervention but also the varying effects an intervention
has on individuals. By improving the process in which individuals are assigned to
interventions, we propose that an actuarial approach to responsivity assessment can help
achieve better recidivism outcomes.
Chemical Dependency Treatment in MnDOC
Shortly after their admission to prison in Minnesota, prisoners with at least six
months to serve in prison undergo a brief (20-40 minutes) chemical dependency (CD)
assessment conducted by a licensed assessor. CD assessors use DSM-IV criteria for
substance abuse in their diagnoses, which are based on both self-report and collateral
information. The criteria for abuse include problems at work or school, not taking care of
personal responsibilities, financial problems, engaging in dangerous behavior while
intoxicated, legal problems, problems at home or in relationships, and continued use
despite experiencing problems. The criteria for dependence, on the other hand, include
increased tolerance; withdrawal symptoms; greater use than intended over a relatively
long period of time; inability to cut down or quit; a lot of time spent acquiring, using, or
recovering from use; missing important family, work, or social activities; and knowledge
that continued use would exacerbate a serious medical or psychological condition. After
completing the assessment, CD assessors assign prisoners a rating of no need, moderate
need, or high need for CD treatment.
Even though most newly admitted offenders are considered to be chemically
abusive or dependent, the number of prisoners directed to CD treatment greatly exceeds
the number of CD treatment beds available. In fact, among prisoners who receive a CD
assessment, roughly one-fourth enter CD treatment during their confinement. As a result,
the Minnesota Department of Corrections (MnDOC) has used a relatively simple,
summative algorithm to prioritize prisoners for CD treatment.
The algorithm produces a score that ranges from a low of 0 points to a high of 40
points. Of the 40 possible points, 10 are based on the assessed need for CD treatment.
More specifically, prisoners are given 0 points for no need, 5 points for moderate need,
and 10 points for high need. Likewise, 10 of the 40 points are based on assessed
recidivism risk. As noted below, our sample contains prisoners released from Minnesota
prisons between 2003 and 2011. During this time, the MnDOC used the Level of Service
Inventory-Revised (LSI-R) to assess recidivism risk. Depending on their LSI-R score,
prisoners were given either 10 points (very high risk), 7 points (high risk), 4 points
(medium risk), or 0 points (low risk) for recidivism risk.
While risk and needs make up half of the 40 points in the algorithm, the offense
for which offenders are imprisoned accounts for a total of 10 points. In particular, offenders
in prison for a felony DWI are given 10 points while those in prison for other offenses
receive 0 points. The final 10 points cover items related to factors such as mental illness,
traumatic brain injury, and a history of assaultive behavior. Based on the score (ranging
from 0 to 40) from the algorithm, prisoners are then given a CD treatment priority level
of 1 (score of 20 or higher), 2 (score between 14 and 19), or 3 (score of 13 or lower).
Priority level 1 prisoners are most likely to receive a CD treatment offer, followed by
those in priority level 2 and priority level 3.
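
The summative algorithm described above can be sketched as follows. This is an illustration under stated assumptions, not the MnDOC's implementation. The text does not give the LSI-R cutoffs, so the risk category is passed in as a label, and it does not give the weights of the items making up the final 10 points, so that subtotal is passed in directly as the hypothetical parameter `other_points`.

```python
# Points by category, as described in the text
NEED_POINTS = {"none": 0, "moderate": 5, "high": 10}
RISK_POINTS = {"low": 0, "medium": 4, "high": 7, "very high": 10}


def cd_priority(need, risk, felony_dwi, other_points):
    """Return (score, priority_level) for one prisoner.

    other_points is the 0-10 subtotal for the remaining items (mental
    illness, traumatic brain injury, assaultive history, and so on).
    Their individual weights are not given in the text, so this is an
    assumed input rather than something computed here.
    """
    if not 0 <= other_points <= 10:
        raise ValueError("other_points must be between 0 and 10")
    score = (NEED_POINTS[need] + RISK_POINTS[risk]
             + (10 if felony_dwi else 0) + other_points)
    if score >= 20:
        level = 1
    elif score >= 14:
        level = 2
    else:
        level = 3
    return score, level


print(cd_priority("high", "very high", False, 0))   # (20, 1)
print(cd_priority("moderate", "high", False, 3))    # (15, 2)
print(cd_priority("none", "low", False, 10))        # (10, 3)
```

Note that a felony DWI commitment alone contributes as many points as a high assessed need, so offense type can move a prisoner across a priority threshold independently of risk or need.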
A prior evaluation of the MnDOC's CD treatment showed it is effective in
reducing recidivism. Using propensity score matching to match 926 treated offenders
released in 2005 with 926 inmates who had been untreated, Duwe (2010) found that
treatment decreased the risk of recidivism by 17 percent for rearrest, 21 percent for
reconviction, and 25 percent for reimprisonment for a new felony offense. Moreover,
consistent with earlier research (Wexler et al., 1990), the results showed that increased
treatment time appeared to lower the risk of recidivism, but only up to a point. While
short-term (90 days) and medium-term (180 days) programs had a statistically significant
impact on all three recidivism measures, no significant effects were found for long-term
(365 days) programming.
Data and Method
Our overall sample consists of 23,034 offenders released from Minnesota prisons
between 2003 and 2011 who had been assessed for chemical dependency. Within this
sample, there were 2,314 females and 20,720 males. Each of the 23,034 prisoners was
given a treatment need level from one of three categories—high need for treatment,
moderate need, and no need—as well as a CD treatment priority level. Of the 23,034
prisoners, a total of 5,414 (24 percent) participated in CD treatment during their
confinement prior to their release.
While the treatment need level (high, moderate or no need) provides us with the
assessed CD treatment needs for the prisoners in our sample, we developed predictive
models for assessments of recidivism risk and responsivity. More specifically, because
there are important gender differences with respect to risk and needs, we initially
separated our overall sample into males (N = 20,720) and females (N = 2,314). Next, we
separated these samples into three sets by the year prisoners were released from prison.
Our first set, the training set, consisted of individuals (10,517 males and 1,250
females) released from Minnesota prisons between 2003 and 2007. Our second set, the
test set, contained individuals (4,876 males and 555 females) released from prison
between 2008 and 2009. Our final set, the validation set, consisted of individuals (5,327
males and 509 females) released from prison in either 2010 or 2011.
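
The split described above is temporal rather than random: models are fit on earlier release cohorts and evaluated on later ones. A minimal sketch of that assignment follows; the record layout and the function name `assign_split` are hypothetical and used only for illustration.

```python
def assign_split(release_year):
    """Map a release year to the training, test, or validation set."""
    if 2003 <= release_year <= 2007:
        return "training"
    if release_year in (2008, 2009):
        return "test"
    if release_year in (2010, 2011):
        return "validation"
    raise ValueError(f"release year {release_year} is outside 2003-2011")


# Hypothetical records: (person_id, sex, release_year)
records = [(1, "M", 2004), (2, "F", 2008), (3, "M", 2011), (4, "F", 2007)]

# Group by sex first, then by split, mirroring the separate male/female analyses
splits = {}
for person_id, sex, year in records:
    splits.setdefault((sex, assign_split(year)), []).append(person_id)
print(splits)
```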
Focusing first on the assessment of recidivism risk, we developed predictive
models on the training set data. As shown in Tables 1 and 2, our dataset contained a total
of 36 predictors that are available when a MnDOC prisoner goes through intake at the
time of admission to prison. These predictors encompass items commonly found to be
predictive of recidivism, such as criminal history, age at release, gang affiliation, and
marital status. We also include items such as prison admission and offense type. Our
measure of recidivism is reconviction for a misdemeanor, gross misdemeanor, or felony
within three years of release from prison. We obtained reconviction data on all 23,034
prisoners from the Minnesota Bureau of Criminal Apprehension. As shown in Tables 1
and 2, females had lower recidivism rates compared to males.
For both males and females, we used two different classification methods—
logistic regression and random forests—to develop predictive models on the training set.
Over the last few decades, regression modeling has been increasingly used to develop
prediction tools in the criminal justice field (Brennan and Oliver, 2000; Duwe, 2012;
Duwe, 2014; Duwe and Freske, 2012; Lowenkamp and Whetzel, 2009), while the use of
Table 1. Descriptive Statistics for Male Prisoner Sample
Predictor | Description | Training Mean | Training SD | Test Mean | Test SD | Validation Mean | Validation SD
Static/Criminal History
Total Convictions | Total # of convictions (any offense level) | 11.05 | 8.47 | 13.05 | 9.49 | 14.15 | 10.46
Felony Convictions | Total # of felony convictions | 1.70 | 1.68 | 2.29 | 2.05 | 2.82 | 2.38
Felony Specialization/Diversity | Degree of specialization/diversity in felony offenses | 0.86 | 0.26 | 0.87 | 0.24 | 0.85 | 0.25
Violent Convictions | Total # of violent offense convictions | 1.46 | 1.81 | 1.69 | 2.04 | 1.87 | 2.09
Violent Specialization/Diversity | Degree of specialization/diversity in violent offenses | 0.91 | 0.21 | 0.92 | 0.19 | 0.92 | 0.19
Total Assault Convictions | Total # of assault offense convictions | 0.93 | 1.51 | 1.11 | 1.70 | 1.24 | 1.77
Total Robbery Convictions | Total # of robbery convictions | 0.17 | 0.57 | 0.20 | 0.68 | 0.20 | 0.67
VOFP Convictions | Total VOFP, stalking and harassment convictions | 0.14 | 0.54 | 0.22 | 0.73 | 0.32 | 0.89
Disorderly Conduct Convictions | Total # of disorderly conduct convictions | 0.09 | 0.34 | 0.15 | 0.46 | 0.24 | 0.63
Prostitution Convictions | Total # of prostitution offense convictions | 0.01 | 0.12 | 0.01 | 0.14 | 0.01 | 0.15
Drug Offense Convictions | Total # of drug offense convictions | 0.99 | 1.34 | 1.11 | 1.49 | 1.12 | 1.60
Drug Offense Specialization/Diversity | Degree of specialization/diversity in drug offenses | 0.93 | 0.18 | 0.95 | 0.15 | 0.96 | 0.14
False Information to Police Cons. | Total # of false information to police convictions | 0.44 | 0.90 | 0.52 | 0.99 | 0.55 | 1.03
Flee/Escape Convictions | Total # of flee/escape police convictions | 0.23 | 0.61 | 0.27 | 0.66 | 0.31 | 0.71
Weapons Offense Convictions | Total # of weapons offense convictions | 0.08 | 0.32 | 0.10 | 0.36 | 0.12 | 0.40
Total Property Convictions | Total # of property offense convictions | 2.91 | 4.09 | 3.03 | 4.38 | 3.29 | 4.77
Property Offense Specialization/Diversity | Degree of specialization/diversity in prop. offenses | 0.89 | 0.19 | 0.92 | 0.15 | 0.92 | 0.15
Driving While Intoxicated (DWI) Convictions | Total # of DWI convictions | 0.29 | 0.67 | 0.59 | 1.03 | 0.66 | 1.12
Failure to Register (FTR) Convictions | Total # of FTR convictions | 0.05 | 0.28 | 0.08 | 0.36 | 0.09 | 0.40
Total Supervision Failures | Total # of revocations on probation and parole | 1.08 | 1.36 | 1.17 | 1.50 | 1.30 | 1.63
Intake
Metro County of Commitment | Commit from Twin Cities metro-area county | 0.51 | 0.50 | 0.51 | 0.50 | 0.51 | 0.50
Length of Stay in Prison (Months) | Difference in months between admission and release | 18.72 | 16.15 | 21.14 | 20.83 | 22.03 | 22.90
New Court Commitment | Admitted to prison directly from court | 0.60 | 0.49 | 0.65 | 0.48 | 0.68 | 0.47
Probation Violator | Admitted to prison for probation violation | 0.36 | 0.48 | 0.34 | 0.47 | 0.31 | 0.46
Release Violator | Admitted to prison for parole violation | 0.04 | 0.21 | 0.01 | 0.12 | 0.01 | 0.11
Person Offense | Most serious index offense is person offense = 1 | 0.21 | 0.41 | 0.22 | 0.41 | 0.23 | 0.42
Sex Offense | Most serious index offense is sex offense = 1 | 0.08 | 0.26 | 0.06 | 0.24 | 0.06 | 0.24
Drug Offense | Most serious index offense is drugs = 1 | 0.30 | 0.46 | 0.27 | 0.45 | 0.24 | 0.43
Property Offense | Most serious index offense is property = 1 | 0.24 | 0.42 | 0.19 | 0.39 | 0.17 | 0.38
DWI Offense | Most serious index offense is DWI = 1 | 0.05 | 0.21 | 0.12 | 0.32 | 0.09 | 0.29
Other Offense | Most serious index offense is “Other” offense = 1 | 0.13 | 0.34 | 0.15 | 0.35 | 0.20 | 0.40
Suicidal History | Suicidal history = 1; no history = 0 | 0.11 | 0.32 | 0.16 | 0.36 | 0.17 | 0.37
Security Threat Group (STG) | Total # of STG criteria (0-10) | 0.92 | 1.69 | 0.88 | 1.66 | 0.94 | 1.69
Marital Status | Married = 1; unmarried = 0 | 0.11 | 0.32 | 0.11 | 0.31 | 0.11 | 0.31
Age at Release | Age in years at time of release | 33.15 | 9.36 | 34.51 | 9.85 | 34.60 | 10.02
Unsupervised Release | Released to no correctional supervision | 0.02 | 0.15 | 0.01 | 0.12 | 0.01 | 0.10
Recidivism within 3 Years
General Recidivism | Reconviction for misd., gross misd., or felony | 0.68 | 0.47 | 0.62 | 0.49 | 0.63 | 0.48
N: Training 10,517; Test 4,876; Validation 5,327
machine learning algorithms such as random forests has been more recent (Barnes and
Hyatt, 2012). Created by Breiman (2001), random forests is an ensemble method that
involves growing a forest of many trees, each of which is grown on an independent
bootstrap sample from the training data. At each node of each tree, only a random subset
of the predictor variables is considered. Random forests then find the best split based on the
selected predictor variables. The trees are grown to a maximum depth, and a consensus
prediction is obtained by taking a vote across the trees.
Recent research has advocated testing multiple classification methods when
developing a predictive model (Duwe and Kim, 2016; Ridgeway, 2013), given that there
is no single best algorithm that yields the best performance in every situation (Caruana
and Niculescu-Mizil, 2006; Wolpert, 1996).1 To identify the best predictive models, we
evaluated performance on the test set data. After doing so, we then applied the best-
performing models to the validation set data. In the validation sets, each prisoner received
a predicted probability that reflects his or her likelihood of recidivating within three years
of release from prison.
To assess responsivity, we also developed predictive models on the training set
data. The main difference between the recidivism risk and responsivity assessments had
to do with the outcome being predicted. With the recidivism risk assessment, the
predicted outcome was whether individuals recidivated within three years. With
1 Prior research has provided mixed evidence on the performance of machine learning algorithms, such as
random forests, versus older, more traditional approaches like logistic regression. Some studies have found
little or no difference between these two sets of classification methods (Hamilton et al., 2015; Liu et al.,
2011; Tollenaar and van der Heijden, 2013), whereas others have observed a performance advantage for
machine learning approaches (Berk and Bleich, 2013; Caruana et al., 2006; Duwe and Kim, 2015, 2016;
Hess and Turner, 2013). The evidence seems to be clearer that statistical and machine learning algorithms
outperform simplistic, Burgess-style methods (Duwe and Kim, 2016). Given the fact there is no single best
algorithm that performs the best in every situation, research has advocated testing multiple algorithms
(Duwe and Kim, 2015, 2016; Ridgeway, 2013), which is the approach we have followed here.
Table 2. Descriptive Statistics for Female Prisoner Sample
Predictor | Description | Training Mean | Training SD | Test Mean | Test SD | Validation Mean | Validation SD
Static/Criminal History
Total Convictions | Total # of convictions (any offense level) | 9.24 | 7.67 | 10.75 | 9.38 | 11.86 | 10.23
Felony Convictions | Total # of felony convictions | 1.58 | 1.58 | 2.20 | 2.07 | 2.66 | 2.47
Felony Specialization/Diversity | Degree of specialization/diversity in felony offenses | 0.84 | 0.29 | 0.83 | 0.27 | 0.81 | 0.28
Violent Convictions | Total # of violent offense convictions | 0.86 | 1.89 | 0.80 | 1.49 | 0.98 | 1.98
Violent Specialization/Diversity | Degree of specialization/diversity in violent offenses | 0.94 | 0.19 | 0.96 | 0.13 | 0.94 | 0.18
Total Assault Convictions | Total # of assault offense convictions | 0.37 | 0.91 | 0.41 | 0.99 | 0.53 | 1.35
Total Robbery Convictions | Total # of robbery convictions | 0.07 | 0.32 | 0.07 | 0.30 | 0.10 | 0.48
VOFP Convictions | Total VOFP, stalking and harassment convictions | 0.03 | 0.23 | 0.05 | 0.31 | 0.08 | 0.39
Disorderly Conduct Convictions | Total # of disorderly conduct convictions | 0.04 | 0.22 | 0.11 | 0.41 | 0.19 | 0.54
Prostitution Convictions | Total # of prostitution offense convictions | 0.34 | 1.46 | 0.18 | 0.88 | 0.24 | 1.17
Drug Offense Convictions | Total # of drug offense convictions | 1.11 | 1.35 | 1.21 | 1.61 | 1.31 | 1.74
Drug Offense Specialization/Diversity | Degree of specialization/diversity in drug offenses | 0.89 | 0.25 | 0.91 | 0.21 | 0.92 | 0.19
False Information to Police Cons. | Total # of false information to police convictions | 0.49 | 0.97 | 0.64 | 1.29 | 0.56 | 1.05
Flee/Escape Convictions | Total # of flee/escape police convictions | 0.10 | 0.36 | 0.10 | 0.38 | 0.08 | 0.30
Weapons Offense Convictions | Total # of weapons offense convictions | 0.01 | 0.09 | 0.01 | 0.15 | 0.02 | 0.15
Total Property Convictions | Total # of property offense convictions | 3.38 | 4.87 | 3.58 | 5.61 | 3.78 | 5.67
Property Offense Specialization/Diversity | Degree of specialization/diversity in prop. offenses | 0.83 | 0.24 | 0.86 | 0.22 | 0.86 | 0.21
Driving While Intoxicated (DWI) Convictions | Total # of DWI convictions | 0.20 | 0.59 | 0.49 | 0.90 | 0.58 | 1.04
Failure to Register (FTR) Convictions | Total # of FTR convictions | 0.00 | 0.03 | 0.01 | 0.11 | 0.01 | 0.13
Total Supervision Failures | Total # of revocations on probation and parole | 0.85 | 0.92 | 1.04 | 1.06 | 0.97 | 1.12
Intake
Metro County of Commitment | Commit from Twin Cities metro-area county | 0.50 | 0.50 | 0.42 | 0.49 | 0.43 | 0.50
Length of Stay in Prison (Months) | Difference in months between admission and release | 11.93 | 11.61 | 14.30 | 14.26 | 17.11 | 15.79
New Court Commitment | Admitted to prison directly from court | 0.49 | 0.50 | 0.44 | 0.50 | 0.51 | 0.50
Probation Violator | Admitted to prison for probation violation | 0.47 | 0.50 | 0.50 | 0.50 | 0.48 | 0.50
Release Violator | Admitted to prison for parole violation | 0.04 | 0.20 | 0.06 | 0.24 | 0.01 | 0.10
Person Offense | Most serious index offense is person offense = 1 | 0.13 | 0.34 | 0.13 | 0.34 | 0.15 | 0.36
Sex Offense | Most serious index offense is sex offense = 1 | 0.01 | 0.08 | 0.01 | 0.08 | 0.01 | 0.10
Drug Offense | Most serious index offense is drugs = 1 | 0.44 | 0.50 | 0.44 | 0.50 | 0.44 | 0.50
Property Offense | Most serious index offense is property = 1 | 0.33 | 0.47 | 0.28 | 0.45 | 0.24 | 0.42
DWI Offense | Most serious index offense is DWI = 1 | 0.03 | 0.17 | 0.07 | 0.26 | 0.11 | 0.31
Other Offense | Most serious index offense is “Other” offense = 1 | 0.06 | 0.23 | 0.07 | 0.26 | 0.06 | 0.24
Suicidal History | Suicidal history = 1; no history = 0 | 0.21 | 0.41 | 0.32 | 0.47 | 0.36 | 0.48
Security Threat Group (STG) | Total # of STG criteria (0-10) | 0.13 | 0.51 | 0.16 | 0.60 | 0.12 | 0.53
Marital Status | Married = 1; unmarried = 0 | 0.10 | 0.30 | 0.09 | 0.29 | 0.12 | 0.33
Age at Release | Age in years at time of release | 34.61 | 8.45 | 35.70 | 9.36 | 35.88 | 9.33
Unsupervised Release | Released to no correctional supervision | 0.03 | 0.18 | 0.06 | 0.23 | 0.00 | 0.00
Recidivism
General | Reconviction for misd., gross misd., or felony | 0.59 | 0.49 | 0.57 | 0.50 | 0.50 | 0.50
N: Training 1,250; Test 555; Validation 509
responsivity assessment, the predicted outcome was whether individuals had 1)
participated in CD treatment and 2) desisted within three years of release from prison.
Therefore, for the entire dataset, we created a variable, CD treatment desistance, that
assigned a value of “1” to desistors who participated in CD treatment and a value of “0”
to all other offenders. As a result, offenders who participated in CD treatment but
recidivated were given a value of “0”. Likewise, offenders who desisted but did not
participate in CD treatment were assigned a value of “0” for this item.
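
The coding rule for this outcome variable can be written out directly. The sketch below uses hypothetical argument names; only the rule itself (a value of 1 for treated desistors, 0 for everyone else) comes from the description above.

```python
def cd_treatment_desistance(participated_in_cd, reconvicted_within_3_years):
    """Return 1 only for individuals who were treated and were not reconvicted."""
    return int(participated_in_cd and not reconvicted_within_3_years)


# All four combinations of treatment participation and recidivism outcome
for treated in (True, False):
    for reconvicted in (True, False):
        print(treated, reconvicted, cd_treatment_desistance(treated, reconvicted))
```

Only one of the four combinations is coded as a positive, so this outcome is necessarily rarer than desistance alone.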
After developing responsivity assessment models on the training set data for both
males and females, we evaluated predictive performance on the test sets. We then applied
the best-performing models to the validation sets for males and females. In the validation
sets, each offender received a predicted probability that reflects his or her likelihood of
desisting as a result of CD treatment.
Predictive Performance Metrics
To measure predictive performance, we used six metrics to capture the three main
areas of predictive validity—accuracy, discrimination, and calibration. To evaluate
predictive accuracy, which assesses how well a model makes correct classification
decisions, we used accuracy (ACC). For predictive discrimination, which measures the
degree to which the model separates the recidivists from the desistors, we used three
separate metrics—the AUC, the H measure developed by Hand (2009), and the precision-
recall curve (PRC). The AUC has been one of the most widely used predictive
performance metrics, and it is relatively robust across different recidivism base rates and
selection ratios (Smith, 1996). Still, the AUC can provide overly optimistic estimates of
predictive discrimination for imbalanced datasets (Davis and Goadrich, 2006), and it can
provide misleading results if receiver operating characteristic (ROC) curves cross (Hand,
2009). As a result, we also used Hand’s H-measure, which uses a common cost
distribution for all classifiers (Hand, 2009), and the precision-recall curve (PRC), which
assesses discrimination with the precision and recall values. Precision measures the
percent of positive predictions that were correct (based on the 50 percent threshold),
whereas recall reflects the percentage of positives (i.e., recidivists) that were captured.
Compared to the AUC, the PRC has been found to be a better metric for highly
imbalanced datasets (i.e., making predictions for an infrequently occurring outcome)
(Davis and Goadrich, 2006).
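As a minimal illustration of the threshold-based quantities described above, the sketch below computes accuracy, precision, and recall from predicted probabilities at a 50 percent threshold. The function name threshold_metrics and the toy data are assumptions made for illustration; they are not drawn from the study's data or code.

```python
def threshold_metrics(y_true, y_prob, threshold=0.5):
    """Return (accuracy, precision, recall) for binary outcomes.

    y_true: observed outcomes, 1 = recidivist (positive class), 0 = desistor.
    y_prob: predicted probabilities of recidivism.
    """
    # Convert probabilities to class predictions at the chosen threshold.
    y_pred = [1 if p >= threshold else 0 for p in y_prob]

    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Precision: share of positive predictions that were correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual positives that were captured.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical toy data, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.7, 0.1, 0.8, 0.3]
acc, prec, rec = threshold_metrics(y_true, y_prob)
print(acc, prec, rec)
```

A precision-recall curve repeats the precision and recall calculation across many thresholds rather than a single one; the sketch shows only the single-threshold case.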
Calibration assesses how well the predicted probabilities from a model correspond
with the observed outcome being predicted. For our calibration metric, we used root
mean square error (RMSE), which measures the square root of the average squared
difference between observed recidivism and predicted probabilities. The sixth metric we
used is the SAR (squared error, accuracy, and ROC area) statistic developed by Caruana,
Niculescu-Mizil, Crew, and Ksikes (2004). SAR is a combined measure of
discrimination, accuracy and calibration, and the formula for SAR is: (ACC + AUC + (1
– RMSE))/3 (Caruana, Niculescu-Mizil, Crew, and Ksikes, 2004).
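The RMSE and SAR definitions above can be sketched in a few lines. The function names rmse and sar are hypothetical; the SAR inputs in the usage lines are the logistic regression and random forests values that appear in Table 3.

```python
import math


def rmse(y_true, y_prob):
    """Square root of the mean squared difference between observed
    outcomes (0 or 1) and predicted probabilities."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_prob)) / n)


def sar(acc, auc, rmse_value):
    """SAR = (ACC + AUC + (1 - RMSE)) / 3, as given in the text."""
    return (acc + auc + (1.0 - rmse_value)) / 3.0


# ACC, AUC, and RMSE values taken from the two data rows of Table 3.
sar_lr = sar(0.806, 0.781, 0.367)
sar_rf = sar(0.829, 0.839, 0.345)
print(round(sar_lr, 3), round(sar_rf, 3))
```

Because RMSE is an error measure, it enters the formula as 1 - RMSE so that higher values of all three components indicate better performance.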
Prioritizing Prisoners for CD Treatment
In the validation sets for the male and female prisoners, each offender had been
assessed for risk, needs, and responsivity. Put another way, each of the 5,327 males and
509 females in the validation sets had values for 1) recidivism risk probability, 2) CD
treatment need, and 3) responsivity probability. The values for both recidivism risk and
responsivity ranged from a low of 0 percent to a high of 100 percent. A higher predicted
probability for recidivism signifies a higher risk for recidivism. On the other hand, a
higher predicted probability for responsivity denotes a greater likelihood that an
individual will desist after participating in CD treatment. The values for CD treatment
need consisted of “1” for no need, “2” for moderate need, and “3” for high need.
Using these values from the risk, needs, and responsivity assessments, we
examined several different ways of prioritizing prisoners for CD treatment. In particular,
we prioritized offenders on the basis of 1) risk and needs, 2) risk and responsivity, 3)
needs and responsivity, and 4) risk, needs, and responsivity. For example, in prioritizing
prisoners by risk and needs, we added the values from the risk and needs assessments to
form a total risk-needs score. Likewise, to prioritize prisoners by risk, needs, and
responsivity, we added the values from the risk, needs, and responsivity assessments to
form a total risk-needs-responsivity score. Therefore, individuals with the highest scores
are presumably those with the highest risk, needs, and responsivity to CD treatment.
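The summative scoring described above might be sketched as follows. This is an illustration under stated assumptions rather than the study's code: the Assessment class, the function names, and the four example records are hypothetical, and the probabilities are assumed to be expressed on a 0 to 1 scale before being added to the 1 to 3 need value.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    person_id: str
    risk: float          # predicted probability of recidivism (assumed 0-1 scale)
    need: int            # CD treatment need: 1 = none, 2 = moderate, 3 = high
    responsivity: float  # predicted probability of desisting after treatment (assumed 0-1)


def composite_score(a, scheme):
    """Add the components named in `scheme`, e.g. ("risk", "need")."""
    return sum(getattr(a, name) for name in scheme)


def top_fraction(assessments, scheme, fraction=0.25):
    """Return the highest-scoring share of people under a prioritization scheme."""
    ranked = sorted(assessments, key=lambda a: composite_score(a, scheme), reverse=True)
    k = max(1, round(len(ranked) * fraction))
    return ranked[:k]


# Hypothetical records, for illustration only.
people = [
    Assessment("A", 0.80, 3, 0.70),
    Assessment("B", 0.90, 3, 0.10),
    Assessment("C", 0.30, 1, 0.90),
    Assessment("D", 0.60, 2, 0.50),
]
rn = top_fraction(people, ("risk", "need"))
rnr = top_fraction(people, ("risk", "need", "responsivity"))
print([a.person_id for a in rn], [a.person_id for a in rnr])
```

In this toy example the risk-needs scheme selects person B, whose responsivity is low, while adding responsivity selects person A. The scale chosen for the probabilities determines how much weight each component carries in a simple sum, so a different scaling would change the rankings.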
A little more than one-fourth of the prisoners in the male and female validation
sets entered CD treatment. For example, 1,377 (26%) of the 5,327 male offenders in the
validation set participated in CD treatment. The recidivism rate for the treated offenders
was 49 percent versus 68 percent for those who were untreated. For females, 145 (28%)
of the 509 offenders participated in CD treatment. The recidivism rate for the treated
offenders was 35 percent compared to 56 percent for those who were untreated.
To determine how each prioritization scheme might perform in assigning
individuals for CD treatment, we organized the validation sets into quartiles and then
analyzed recidivism outcomes by CD treatment participation. To illustrate with the 5,327
male offenders in the validation set, the recidivism rate was 49 percent for the 1,377
(26%) who entered CD treatment and 68 percent for the 3,950 (74%) who did not. The
rate was therefore 27 percent lower for the treated offenders. With a recidivism rate of 49
percent among the 1,377 treated offenders, there were still 677 who were recidivists. Yet,
if we assumed that none of the 1,377 were able to enter treatment and the recidivism rate
for untreated offenders is 68 percent, then 932 would have been recidivists. Delivering
CD treatment to the 1,377 offenders is thus associated with a reduction of 255 recidivists
(932 minus 677).
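The arithmetic in this illustration can be reproduced with a short sketch. The function names are hypothetical, and the inputs are the unrounded rates of 49.2 and 67.7 percent reported with Table 4 rather than the rounded 49 and 68 percent.

```python
def prevented_recidivists(n_treated, treated_rate, untreated_rate):
    """Return (observed, expected, prevented) recidivist counts.

    observed: recidivists among the treated at the treated rate.
    expected: recidivists among the same people had they recidivated
              at the untreated rate.
    """
    observed = round(n_treated * treated_rate)
    expected = round(n_treated * untreated_rate)
    return observed, expected, expected - observed


def relative_reduction(treated_rate, untreated_rate):
    """Proportional reduction in the recidivism rate (the effect size in the text)."""
    return (untreated_rate - treated_rate) / untreated_rate


males = prevented_recidivists(1377, 0.492, 0.677)
effect = relative_reduction(0.492, 0.677)
print(males, round(effect, 3))
```

The same functions applied to the female validation set (145 treated, rates of 35.2 and 56.0 percent) give 51 observed, 81 expected, and 30 prevented recidivists.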
If we prioritized the top one-fourth of offenders (i.e., CD treatment capacity) on
the basis of risk-needs, risk-responsivity, needs-responsivity, or risk-needs-responsivity,
would we still see a 27 percent reduction? Similarly, if we prioritized the top one-fourth
on the basis of these four prioritization schemes, would we still observe 255 prevented
recidivists? Would the treatment effect sizes and number of prevented recidivists be
smaller, larger, or about the same? To answer these questions, we present the findings in
the following section.
Results
In Table 3, we present the predictive performance results from the recidivism and
responsivity assessments for males and females. As noted above, we used two types of
classification methods—logistic regression and random forests. The results in Table 3
indicate the recidivism risk models for both classification methods predicted recidivism
relatively well for both males and females. For male offenders, the logistic regression
model slightly outperformed the random forests model across each of the six predictive
performance metrics in both the test and validation sets. For female offenders, the
random forests model slightly outperformed logistic regression.
Table 3. Predictive Performance Results
Metrics                  ACC    AUC    H      PRC    RMSE   SAR
TEST SET
  Recidivism Baseline    [values not recoverable]
  Females                [values not recoverable]
  Responsivity
    Males                [values not recoverable]
    Females              [values not recoverable]
VALIDATION SET
  Recidivism Baseline    [values not recoverable]
  Females                [values not recoverable]
  Responsivity
    Males                [values not recoverable]
    Females
      Logistic Regression  0.806  0.781  0.205  0.396  0.367  0.740
      Random Forests       0.829  0.839  0.319  0.486  0.345  0.774
Notes: ACC = Accuracy; AUC = Area Under the Curve; PRC = Precision-Recall Curve;
RMSE = Root Mean Squared Error; SAR = Squared Error, Accuracy, ROC (Receiver
Operating Characteristic)
When we focus on assessing responsivity to CD treatment, the random forest
models performed best for male and female offenders in both the test and validation sets.
In particular, the validation set results for males indicate the random forests model had
good predictive discrimination (AUC = 0.82) and it yielded a correct classification rate of
87 percent. With an SAR value of 0.79, the random forests model had strong overall
predictive performance. We observed similar findings for females. The random forests
model achieved an accuracy rate of 83 percent, an AUC of 0.84, and a SAR of 0.77.
Overall, the responsivity assessment models had better predictive performance than those
for recidivism risk.
Table 4. Male Prisoner Results
                    Recidivism Rates                          Number of Recidivists
Measures            Treated  N      Untreated  N      Effect  Treated  Untreated  Prevented  Total N
Overall             0.492    1,377  0.677      3,950  0.273   677      932        255        5,327
Overall Adjusted    0.492    1,332  0.677      3,950  0.273   655      902        247        5,282
Risk-Needs-Responsivity
1 (Top 25%)         0.593    479    0.856      853    0.307   790      1,140      350        1,332
2 (26-50%)          0.550    350    0.760      982    0.276   733      1,012      279        1,332
3 (51-75%)          0.390    323    0.610      1,009  0.361   519      813        294        1,332
4 (Bottom 25%)      0.320    221    0.530      1,110  0.396   426      705        279        1,331
Risk-Needs
1 (Top 25%)         0.770    184    0.860      1,148  0.105   1,026    1,146      120        1,332
2 (26-50%)          0.630    329    0.720      1,003  0.125   839      959        120        1,332
3 (51-75%)          0.460    490    0.570      842    0.193   613      759        146        1,332
4 (Bottom 25%)      0.280    374    0.480      957    0.417   373      639        266        1,331
Risk-Responsivity
1 (Top 25%)         0.588    466    0.851      866    0.309   783      1,134      351        1,332
2 (26-50%)          0.553    349    0.764      983    0.276   737      1,018      281        1,332
3 (51-75%)          0.448    290    0.662      1,042  0.323   597      882        285        1,332
4 (Bottom 25%)      0.294    272    0.468      1,059  0.372   391      623        232        1,331
Needs-Responsivity
1 (Top 25%)         0.398    732    0.443      600    0.102   530      590        60         1,332
2 (26-50%)          0.549    419    0.634      913    0.134   731      844        113        1,332
3 (51-75%)          0.760    129    0.739      1,203  +0.03   1,012    984        -28        1,332
4 (Bottom 25%)      0.598    97     0.762      1,234  0.215   796      1,014      218        1,331
In Tables 4 and 5, we present the results for the male and female offenders. Here,
we focus only on the classification method that produced the best results. Whereas
random forests yielded the best outcomes for males, it was logistic regression for
females. For both males and females, we analyzed the results according to the four
different schemes for prioritizing offenders for CD treatment. Therefore, we compared
the overall results from the validation set with the four prioritization schemes: 1) risk-
needs, 2) risk-responsivity, 3) needs-responsivity, and 4) risk-needs-responsivity.
Table 5. Female Prisoner Results
                    Recidivism Rates                          Number of Recidivists
Measures            Treated  N      Untreated  N      Effect  Treated  Untreated  Prevented  Total N
Overall             0.352    145    0.560      364    0.371   51       81         30         509
Overall Adjusted    0.352    127    0.560      364    0.371   45       71         26         491
Risk-Needs-Responsivity
1 (Top 25%)         0.500    34     0.828      93     0.396   64       105        41         127
2 (26-50%)          0.333    30     0.619      97     0.462   42       79         37         127
3 (51-75%)          0.386    44     0.482      83     0.199   49       61         12         127
4 (Bottom 25%)      0.189    37     0.297      91     0.364   24       38         14         128
Risk-Needs
1 (Top 25%)         0.706    17     0.855      110    0.174   90       109        19         127
2 (26-50%)          0.571    21     0.581      106    0.017   73       74         1          127
3 (51-75%)          0.333    48     0.354      79     0.059   42       45         3          127
4 (Bottom 25%)      0.186    59     0.290      69     0.359   24       37         13         128
Risk-Responsivity
1 (Top 25%)         0.529    34     0.817      93     0.353   67       104        37         127
2 (26-50%)          0.286    28     0.616      99     0.536   36       78         42         127
3 (51-75%)          0.439    41     0.512      86     0.143   56       65         9          127
4 (Bottom 25%)      0.167    42     0.267      86     0.374   21       34         13         128
Needs-Responsivity
1 (Top 25%)         0.261    69     0.293      58     0.109   33       37         4          127
2 (26-50%)          0.349    43     0.417      84     0.163   44       53         9          127
3 (51-75%)          0.632    19     0.648      108    0.025   80       82         2          127
4 (Bottom 25%)      0.429    14     0.719      114    0.403   55       92         37         128
In Table 4, the results show there were 1,377 treated offenders and 3,950
untreated offenders. The three-year reconviction rate was 49.2 percent for the treated and
67.7 percent for the untreated. The treated rate was therefore 27 percent lower than the
untreated rate. Among the 1,377 who were treated, there were 677 recidivists. If the
1,377 treated offenders had not been treated and their rate was 67.7 percent, then 932
would have been recidivists. As a result, CD treatment prevented 255 recidivists. We also
show the overall adjusted figures based on one-fourth (N = 1,332) participating in CD
treatment.
In Table 5, the results for females show there were 145 treated offenders and 364
untreated offenders. The three-year reconviction rate was 35.2 percent for the treated and
56.0 percent for the untreated. The treated rate was therefore 37 percent lower than the
untreated rate. Among the 145 who were treated, there were 51 recidivists. If the 145
treated offenders had not been treated and their rate was 56.0 percent, then 81 would have
been recidivists. As a result, CD treatment prevented 30 recidivists. We also show the
overall adjusted figures based on one-fourth (N = 127) participating in CD treatment.
As shown in Tables 4 and 5, the risk-needs-responsivity (RNR) scheme
performed the best for both males and females, followed by risk-responsivity (RR), risk-
needs (RN), and needs-responsivity (NR). For males, the RNR and RR schemes
performed roughly the same, while the RNR scheme for females was clearly better than
the RR scheme. For both males and females, the RNR and RR prioritization schemes
increased the number of prevented recidivists while preserving the treatment effect size.
To illustrate, when we focus on the RNR scheme for males, we see that the effect
size among the top one-fourth (30.7 percent reduction) is actually a little larger than it is
for the overall sample (27.3 percent reduction). Moreover, because the RNR scheme
effectively isolated the higher-risk offenders, it prevented a larger number of recidivists.
Indeed, the number of prevented recidivists in the RNR scheme (350) was more than 100
higher than the number (247) for the overall adjusted sample. Focusing on the RNR
scheme for females, we see the effect size (39.6 percent) is larger than the overall effect
size (37.1 percent). The number of recidivists prevented (41) is also 15 higher than that
observed (26) for the overall adjusted sample. Combined, the RNR scheme accounted for
118 additional prevented recidivists, whereas the RR scheme was responsible for 115
prevented recidivists.
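The combined figures for the RNR and RR schemes follow from the Prevented column of Tables 4 and 5. The sketch below reproduces that subtraction; the dictionary layout and function name are assumptions made for illustration.

```python
# Prevented recidivists in the "Overall Adjusted" rows of Tables 4 and 5.
BASELINE = {"males": 247, "females": 26}

# Prevented recidivists in the top quartile under each scheme (Tables 4 and 5).
TOP_QUARTILE = {
    "RNR": {"males": 350, "females": 41},
    "RR": {"males": 351, "females": 37},
}


def additional_prevented(scheme):
    """Prevented recidivists beyond the adjusted baseline, males plus females."""
    return sum(TOP_QUARTILE[scheme][group] - BASELINE[group] for group in BASELINE)


print(additional_prevented("RNR"), additional_prevented("RR"))
```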
In contrast, neither the NR nor RN schemes performed well for either males or
females. For example, the NR scheme would produce an estimated 207 fewer prevented
recidivists, whereas the RN scheme yielded 133 fewer prevented recidivists. Although
the NR scheme may have performed well in identifying who needs CD treatment and
who would benefit from it the most, it is still important to account for recidivism risk.
Likewise, the RN scheme was effective in identifying higher-risk offenders, but it did not
perform well in identifying those who would benefit from CD treatment. Recidivism
rates were higher for offenders in the upper quartiles, but the treatment effect size was
smaller. It is possible that for some higher-risk offenders, CD treatment alone is
insufficient to help bring about desistance. These offenders may need another
intervention or, more precisely, multiple interventions. Assessing for responsivity,
however, helps identify who would benefit the most from CD treatment, even among the
higher-risk offenders.
In Table 6, we estimate the overall impact that each prioritization scheme might
have on recidivism. Combined, the male and female validation sets included 5,836
released prisoners, of whom 1,522 were treated. The recidivism rate was 61.8 percent for
these offenders, resulting in 3,606 recidivists. If none of the 1,522 had been treated, the
estimated rate would have been 66.7 percent, resulting in 3,891 recidivists. The
prioritization scheme used by the MnDOC yielded 285 prevented recidivists.
Table 6. Overall Results
Scheme                    Total N  Treated N  Recidivists  Recidivism Rate  Prevented  NNT
No Treatment              5,836    0          3,891        66.7%
Current State             5,836    1,522      3,606        61.8%            285        5.34
Needs-Responsivity        5,836    1,522      3,826        65.6%            65         23.42
Risk-Needs                5,836    1,522      3,749        64.2%            142        10.72
Risk-Responsivity         5,836    1,522      3,487        59.8%            404        3.77
Risk-Needs-Responsivity   5,836    1,522      3,481        59.7%            410        3.71
NNT = Number Needed to Treat
The number needed to treat (NNT) is a statistic that has been used, often in
epidemiology, to measure the efficacy of different types of treatment. NNT quantifies the
number of participants who would need to participate in an intervention in order to
produce one beneficial outcome. The NNT formula for this study is: 1 / ((recidivism rate
for untreated prisoners) – (recidivism rate for treated prisoners)). With 1,522 receiving
treatment, the number needed to treat (NNT) to achieve one desistor was 5.34.
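A short sketch of the NNT calculation follows, assuming the rate difference is taken among the 1,522 treated prisoners, so that it equals the number of prevented recidivists divided by the number treated. The function name is hypothetical; the inputs are the Treated and Prevented figures in Table 6.

```python
def number_needed_to_treat(n_treated, n_prevented):
    """NNT = 1 / (untreated rate - treated rate) among the treated group.

    The rate difference (absolute risk reduction) is n_prevented / n_treated.
    """
    if n_prevented <= 0:
        raise ValueError("NNT is undefined when no recidivists are prevented")
    absolute_risk_reduction = n_prevented / n_treated
    return 1.0 / absolute_risk_reduction


# Prevented recidivists per scheme, from Table 6 (1,522 treated in each case).
prevented = {
    "Current State": 285,
    "Needs-Responsivity": 65,
    "Risk-Needs": 142,
    "Risk-Responsivity": 404,
    "Risk-Needs-Responsivity": 410,
}
nnt = {name: round(number_needed_to_treat(1522, n), 2) for name, n in prevented.items()}
print(nnt)
```

A lower NNT means fewer people must be treated to prevent one recidivist, which is why the schemes that include risk and responsivity compare favorably here.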
When we examine the overall impact for each of the four prioritization schemes,
we see that both the NR and RN schemes performed worse than the current scheme used
by the MnDOC. The NR model achieved 65 desistors for a NNT of 23.42, whereas the
RN scheme was slightly better with 142 desistors and a NNT of 10.72. In contrast, the
RR model netted 404 desistors, resulting in a NNT of 3.77. The RNR model yielded 410
desistors, resulting in a NNT of 3.71. Compared to the current scheme used by the
MnDOC, the RNR model would produce 125 more desistors, lowering the recidivism
rate by a little more than two percentage points.
Conclusion
Often consisting of little more than a checklist of items, the assessment of
responsivity has been the neglected “R” in the RNR model. Here we introduced a more
rigorous, actuarial approach for assessing responsivity by attempting to predict which
prisoners would desist from crime after participating in a correctional intervention. The
responsivity assessment we presented in this study not only accounts for the efficacy of
an intervention, but it can also be combined with risk and needs assessments to
potentially produce better treatment assignments.
The results showed the responsivity assessments had relatively high levels of
predictive performance for male and female prisoners. More important, however, the
findings suggest that including an actuarial assessment for responsivity can help further
enhance the effectiveness of an effective intervention. We observed the best recidivism
outcomes when we combined the responsivity assessments with those for risk and needs.
Prioritizing the highest risk and need offenders who would likely benefit the most from
CD treatment increased the treatment effect size, improved the NNT metric, and lowered
the overall recidivism rate by two percentage points. Even though the prevention of more
than 100 individuals from becoming recidivists may not seem substantial, a reduction of
this magnitude is notable because crime is costly. Indeed, the costs resulting from crime
include victimization costs, criminal justice system (law enforcement, courts, and
corrections) costs, offender lost productivity, and public willingness-to-pay costs (Cohen
and Piquero, 2009). Although property offenses generally incur a relatively low cost, it
has been estimated that a violent crime such as a sex offense can cost society up to a half
million dollars or, more significantly, that one murder costs between $10 and $20 million
(in 2018 dollars) (Cohen and Piquero, 2009; DeLisi, Kosloski, Sween, Hachmeister,
Moore, and Drury, 2010; McCollister, French, and Fang, 2010).
While the findings suggest that using actuarial responsivity assessments may help
maximize the public safety benefits from effective interventions by prioritizing offenders
more effectively, several limitations are worth highlighting. Most notably, we examined
only one intervention (CD treatment) for one needs area (substance abuse) for prisoners
from one jurisdiction (Minnesota). In addition, we examined only two types of
classification methods (logistic regression and random forests), and we used a very
simplistic, summative approach for combining the risk, needs, and responsivity
assessments. Therefore, it is unclear to what extent the findings presented here,
which should be considered preliminary, are generalizable. Still, because the findings are
promising, below we discuss the implications they may have for correctional research,
policy, and practice.
First, the results suggest that factors commonly associated with recidivism, such
as criminal history, gang affiliation, or marital status, may also have an impact on
responsivity. Indeed, it is worth reiterating that our responsivity assessment models had
better predictive performance than those for recidivism. Therefore, factors affecting
responsivity to correctional interventions may not only include those typically considered
such as gender, culture, language, and motivation, but also those more commonly
associated with recidivism risk.
Second, in addition to considering factors normally associated with recidivism,
the approach for assessing responsivity we introduced here has the advantage of helping
empirically determine whether an intervention would be effective in reducing recidivism
for individual offenders. Within the current RNR framework, offenders are assigned to
interventions on the basis of risk, needs and, in some instances, responsivity. It is
generally unclear, however, whether the intervention is actually effective or, even if it is,
whether the individual would benefit from the intervention. Just because the literature
indicates that prison-based drug treatment is generally effective does not mean that a
specific drug treatment program will be effective in reducing recidivism. After all, issues
such as a lack of program integrity can compromise the effectiveness of a correctional
intervention (Duwe and Clark, 2015). Yet, by assigning individuals to effective
interventions that are, in turn, the best interventions for those individuals, the use of an
actuarial approach for assessing responsivity holds the potential of delivering better
recidivism outcomes overall.
Third, even though the RNR model recommends assigning offenders on the basis
of risk, needs, and responsivity, treatment assignment decisions are often made strictly on
the basis of risk and needs due to the absence of any formal assessments for responsivity.
As such, offenders who are prioritized for programming are those with the highest risk
and needs. Our findings suggest, however, that assigning offenders strictly on the basis of
risk and needs may not deliver the desired results. Indeed, when we assigned offenders
just on the basis of risk and needs, we observed a reduced effect size for CD treatment, a
higher NNT, and fewer prevented recidivists. What these findings suggest is that many of
the highest-risk individuals may be too entrenched in a criminal lifestyle to desist as a
result of participating in CD treatment. While CD treatment may be enough to get lower-
risk prisoners to desist, more programming is needed for the higher-risk offenders. This
finding is consistent with the notion that greater doses of programming (i.e., multiple
interventions that address multiple needs areas) are needed for the highest-risk offenders
to help bring about desistance (Lowenkamp and Latessa, 2005).
Finally, notwithstanding the focus on a single correctional intervention in this
study, we suggest that simultaneously assessing responsivity to multiple interventions
may yield the greatest benefits. Correctional agencies typically have more than one
intervention to offer offenders and, as noted above, a single intervention may be
insufficient to bring about desistance for those with a higher risk for recidivism.
Therefore, the goal should involve conducting responsivity assessments for all
interventions an agency may have to provide offenders.
For example, let us assume a corrections agency has five interventions to which
offenders can be assigned on the basis of a risk and needs assessment. Responsivity
assessments for each of the five interventions may help better identify which programs
would work best for each individual offender. Moreover, for the higher-risk offenders
with longer confinement periods, which would allow for participation in multiple
programs, the responsivity assessment could evaluate which combinations of
interventions would most likely lead to desistance.
To illustrate, let us assume we have a very high risk individual who will be in
prison for two years, which is ample time to participate in multiple interventions. Let us
further assume a single intervention is unlikely to result in desistance for this individual.
If completing, say, CD treatment is unlikely to help this individual desist, what would his
probability for desistance be after completing CD treatment and an employment program
or cognitive-behavioral therapy? Responsivity assessments to multiple interventions
might reveal the best combination of programming for this individual and, in doing so,
would help deliver better recidivism outcomes overall.
As indicated by the limitations noted earlier, this study should be considered a
first step towards taking a more rigorous, actuarial approach to responsivity assessment.
Future research should examine whether this approach is effective for other types of
interventions for different offender populations in other jurisdictions. Along the same
lines, future studies should look at whether actuarial responsivity assessments can
accommodate multiple interventions so as to identify which intervention might work best
for an individual or whether multiple interventions are needed to achieve desistance for
higher-risk offenders. In addition, because we used a simple summative approach in
combining the values from the risk, needs, and responsivity assessments, future research
should examine whether there are more effective procedures for consolidating values into
a composite score.
If an actuarial approach for assessing responsivity is proven to be viable and
generalizable, there would undoubtedly be questions about how best to implement this
approach in practice. Given the reliance on historical programming data to assess
responsivity, the method we introduced here would seem to favor a more customized
assessment process that is specific to an agency and the programming it provides. This
does not mean, however, that a more generic actuarial responsivity assessment could not
be developed and integrated with global, off-the-shelf risk and needs assessments that are
used across multiple jurisdictions. Regardless of whether a valid and reliable generic
assessment can be developed, our findings suggest that actuarial responsivity assessment
is an area in need of more research in the future due to the potential impact it could have
on the programming assignment process and, more broadly, public safety.
REFERENCES
Barnes, G.C. & Hyatt, J.M. (2012). Classifying adult probationers by forecasting future
offending. National Institute of Justice: Washington, DC.
Berk, R.A., & Bleich, J. (2013). Statistical procedures for forecasting criminal behavior:
A comparative assessment. Criminology & Public Policy 12: 513-544.
Bonta, J. & Andrews, D.A. (2007). Risk-Needs-Responsivity Model for Offender
Assessment and Rehabilitation. Ottawa: Public Safety Canada.
Bonta, J., S. Wallace-Capretta, & J. Rooney, (2000). A Quasi-Experimental Evaluation of
an Intensive Rehabilitation Supervision Program. Criminal Justice and Behavior,
27, 312-329.
Breiman, L. (2001). Random forests. Machine Learning, 45, 5-32.
Brennan, T., Dieterich, W., & Ehret, B. (2009). Evaluating the predictive validity of the
COMPAS risk and needs assessment system.
Brennan, T., & Oliver, W.L. (2000). Evaluation of Reliability and Validity of COMPAS
Scales: National Aggregate Sample. Traverse City, MI: Northpointe Institute for
Public Management.
Burgess, E.W. (1928). Factors determining success or failure on parole. In A.A. Bruce,
E.W. Burgess, J. Landesco, & A.J. Harno (Eds.), The workings of the
indeterminate sentence law and the parole system in Illinois, (pp. 221–234).
Springfield, IL: Illinois State Board of Parole.
Caruana, R., Niculescu-Mizil, A., Crew, G., & Ksikes, A. (2004). Ensemble selection
from libraries of models, in Proceedings of the 21st International Conference on
Machine Learning, Canada: Banff, 1-12.
Caruana, R. & Niculescu-Mizil, A. (2006). An empirical comparison of supervised
learning algorithms using different performance metrics, in Proceedings of the
23rd International Conference on Machine Learning, New York: Association for
Computing Machinery, 161-168.
Cohen, M. A., & Piquero, A.R. (2009). New evidence on the monetary value of saving a
high risk youth. Journal of Quantitative Criminology, 25, 25-49.
Cullen, F. T. (2002) “Rehabilitation and Treatment Programs.” In J. Q. Wilson and J.
Petersilia (eds.), Crime: Public Policies for Crime Control, 2nd edition. San
Francisco: ICS Press.
Davis, J. & Goadrich, M. (2006). The relationship between precision-recall and ROC
curves, in Proceedings of the 23rd International Conference on Machine
Learning, Canada: Banff, 1-12.
DeLisi, M., Kosloski, A., Sween, M., Hachmeister, E., Moore, M., & Drury, A. (2010).
Murder by numbers: Monetary costs imposed by a sample of homicide offenders.
The Journal of Forensic Psychiatry & Psychology, 21:501-513.
Duwe, G. (2010). Prison-based chemical dependency treatment in Minnesota: An
outcome evaluation. The Journal of Experimental Criminology, 6: 57-81.
Duwe, G. (2012). Predicting first-time sexual offending among prisoners without a prior
sex offense history: The Minnesota Sexual Criminal Offending Risk Estimate
(MnSCORE). Criminal Justice and Behavior, 39, 1,434-1,454.
Duwe, G. (2014). The development, validity, and reliability of the Minnesota Screening
Tool Assessing Recidivism Risk (MnSTARR). Criminal Justice Policy Review,
25, 579-613.
Duwe, G. & Clark, V. (2015). Importance of program integrity: Outcome evaluation of a
gender-responsive, cognitive-behavioral program for female offenders.
Criminology & Public Policy, 14, 301-328.
Duwe, G. & Freske, P. (2012). Using logistic regression modeling to predict sex offense
recidivism: The Minnesota Sex Offender Screening Tool-3 (MnSOST-3). Sexual
Abuse: A Journal of Research and Treatment, 24, 350-377.
Duwe, G. & Kim, K. (2016). Sacrificing accuracy for transparency in recidivism
risk assessment: The impact of classification method on predictive performance.
Corrections: Policy, Practice and Research, 1, 155-176.
Gottfredson, S.D. & Moriarty, L.J. (2006). Statistical risk assessment: Old problems and
new applications. Crime and Delinquency, 52(1), 178–200.
Hamilton, Z., Neuilly, M-A., Lee, S., & Barnoski, R. (2014). Isolating modeling effects
in offender risk assessment. Journal of Experimental Criminology. DOI:
10.1007/s11292-014-9221-8.
Hand, D. J. (2009). Measuring classifier performance: a coherent alternative to the area
under the ROC curve. Machine Learning, 77, 103-123.
Hess, J. & Turner, S. (2013). Risk Assessment Accuracy in Corrections Population
Management: Testing the Promise of Tree Based Ensemble Predictions. Center
for Evidence-Based Corrections: The University of California, Irvine.
Liu, Y.Y., Yang, M., Ramsey, M., Li, X.S., & Cold, J.W. (2011). A comparison of
logistic regression, classification and regression tree, and neural network models
in predicting violent re-offending. Journal of Quantitative Criminology, 27, 547-
573.
placement. Criminology and Public Policy, 4, 501-528.
Lowenkamp, C.T. & Whetzel, J. (2009). The development of an actuarial risk assessment
instrument for U.S. Pretrial Services. Federal Probation, 73, 33-36.
McCollister, K.E., French, M.T., & Fang, H. (2010). The cost of crime to society: New
crime-specific estimates for policy and program evaluation. Drug and Alcohol
Dependence, 108, 98-109.
Ridgeway, G. (2013). The Pitfalls of Prediction. National Institute of Justice Journal,
Issue No. 271.
Smith, W. (1996). The effects of base rate and cutoff point choice on commonly used
measures of association and accuracy in recidivism research. Journal of
Quantitative Criminology, 12, 83-111.
Tollenaar, N., & van der Heijden, P.G.M. (2013). Which method predicts recidivism
best? A comparison of statistical, machine learning and data mining predictive
methods. Journal of the Royal Statistical Society, Series A 176 (part 2): 565-584.
Wexler, H.K., Falkin, G.P. & Lipton, D.S. (1990). Outcome evaluation of a prison
therapeutic community for substance abuse treatment. Criminal Justice and
Behavior, 17, 71-92.
Wolpert, D.H. (1996). The lack of a priori distinctions between learning algorithms.
Neural Computation, 8, 1,341-1,390.
Authors
St. Paul, MN 55108-5219
St. Paul, Minnesota 55108-5219
January 2019
This information will be made available in alternative format upon request.
Printed on recycled paper with at least 10 percent post-consumer waste
Prevailing correctional practice holds that offenders should be assigned to interventions
on the basis of assessments for risk, needs, and responsivity. Assessments of responsivity,
however, typically consist of little more than a checklist of items such as motivation,
gender, language, or culture. We introduce a new actuarial approach for assessing
responsivity, which focuses on predicting whether individuals will desist after
participating in an intervention. We assess responsivity by using multiple classification
methods and predictive performance metrics to analyze various approaches for
prioritizing individuals for correctional interventions. The results suggest that adding an
actuarial responsivity assessment to the existing risk and needs assessments would likely
improve treatment assignments and further enhance the effectiveness of an effective
intervention. We conclude by discussing the implications of more rigorous responsivity
assessments for correctional research, policy and practice.
1
Introduction
As correctional agencies have increasingly embraced the idea of evidence-based
practices, risk-needs-responsivity (RNR) has become the prevailing model to guide the
delivery of correctional interventions. The risk principle holds that programming
resources should be reserved for higher-risk individuals, whereas the needs principle
dictates that interventions should target criminogenic needs areas, or dynamic risk factors
that are susceptible to change. The responsivity principle, meanwhile, suggests that
programming should be tailored to the strengths, abilities, and learning styles of
individuals.
Because the RNR paradigm holds that the effective delivery of programming
should be customized to an individual’s risk, needs, and responsivity, the use of
assessment instruments is central to this model. Currently, most of the widely-used tools
simultaneously assess for risk and needs. Risk assessment involves predicting who is
most likely to recidivate, while needs assessment focuses on identifying which
interventions would be the most appropriate for an individual (Gottfredson and Moriarty,
2006).
Among existing assessment instruments, the only component that is truly actuarial
is the assessment of recidivism risk. That is, these instruments rely on statistical methods
to estimate the likelihood an individual will commit a new crime in the future. The
assessment of criminogenic needs, on the other hand, does not use an actuarial approach.
Instead, the common strategy involves tallying up the number of items related to each
criminogenic needs area. The needs areas, or domains, with the highest scores are those
that presumably should be targeted for programming. For example, if an individual scores
highest for the substance abuse domain, then substance abuse treatment would be
considered an appropriate—if not the most appropriate—intervention.
As with the assessment of criminogenic needs, contemporary instruments do not
use actuarial methods to assess responsivity. Of the three principles within the RNR
model, responsivity is generally an afterthought. Indeed, rather than being described as
risk, needs, and responsivity assessment tools, the most widely-used instruments are
typically referred to as risk and needs assessments. And even when there is an attempt to
account for responsivity, the assessment of responsivity is barely more than a checklist of
items such as motivation, gender, and culture.
Present Study
In this study, we introduce an actuarial approach for assessing responsivity, which
involves estimating the likelihood that an individual’s participation in an intervention will
result in desistance. If an individual participated in, say, substance abuse treatment, what
is the probability it would lead to desistance? With risk assessment, the focus is on
identifying who will recidivate. It is the opposite for responsivity assessment, where the
focus is on identifying who will desist, or not recidivate. Yet, because responsivity
assessment also considers participation in correctional interventions, it attempts to predict
whether participating in an intervention will result in desistance. In doing so, the
responsivity assessment we present in this study also accounts for the efficacy of an
intervention.
Our sample consists of more than 23,000 offenders released from Minnesota
prisons between 2003 and 2011. We focus on prisoner participation in prison-based
chemical dependency treatment, which has been found to be effective in reducing
recidivism (Duwe, 2010). Because there are important gender differences with respect to
risk and needs, we conducted separate analyses for male and female offenders.
Each of the offenders had been assessed for chemical dependency (CD) needs
upon their entry to prison. We developed baseline recidivism prediction models (i.e., risk
assessment) along with responsivity assessment models that predict desistance following
participation in prison-based CD treatment. Using the risk, needs, and responsivity
assessment data, we then examined the performance of various prioritization schemes in
reducing recidivism. In addition to prioritizing on the basis of risk and needs, we
prioritized offenders on the basis of risk-needs-responsivity, risk and responsivity, and
needs and responsivity. We estimate the overall impact on recidivism and conclude by
discussing the implications for correctional research, policy, and practice.
Risk, Needs and Responsivity Assessments
Over the last half century, risk assessment within corrections has transitioned
from reliance on professional judgment in making classification decisions to the
widespread use of empirically-based, actuarial instruments. Even though more objective,
actuarial methods for assessing risk had been around since the late 1920s (Burgess,
1928), it was not until the 1970s that clinical judgment began to give way to the
development of what Bonta and Andrews (2007) refer to as second-generation risk
assessment instruments. Consisting mostly of static items such as criminal history, these
actuarial instruments, which have been found to consistently outperform clinical
judgment in predicting recidivism (Brennan, Dieterich, and Ehret, 2009), were developed
through statistical analyses. Following the emergence of the “what works” literature and
the growing acceptance of the risk-needs-responsivity (RNR) model, which places an
emphasis on assessing and targeting an offender’s criminogenic needs (dynamic risk
factors) for interventions, third-generation instruments began to incorporate both static
and dynamic predictors of recidivism. Continuing this focus on assessing static and
dynamic risk factors, fourth-generation risk assessment tools have been designed to
follow individuals from intake to case closure, be administered on multiple occasions,
and better integrate protective factors (i.e., factors that reduce recidivism risk) within the
assessment process (Brennan et al., 2009).
In calling for a concentration of programming resources on the highest-risk
offenders, the risk principle makes sense at both the individual and aggregate levels.
While an intervention has an aggregate effect size, its effects on individuals will vary.
Even after completing an effective intervention, some individuals will
recidivate while others will desist. For example, let us assume we have an intervention
that reduces recidivism by 25 percent. If we applied this intervention to, say, 100 higher-
risk individuals whose baseline recidivism probability was 80 percent, we would expect
the intervention to lower recidivism by 25 percent, resulting in 60 recidivists. In other
words, the intervention produced desistance for 20 of the 100 offenders.
But what if we applied the intervention to a lower-risk group whose baseline
recidivism probability was 40 percent? If we assume the intervention lowers recidivism
by 25 percent, then there would be 30 recidivists; that is, the intervention produced
desistance for 10 of the 100 offenders, which is half the number we observed for the
higher-risk group. Conceptually, adhering to the risk principle can help maximize an
effective intervention's impact on recidivism.
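The arithmetic behind this example can be sketched in a few lines. This is only an illustration of the hypothetical scenario described above; the function name and its parameters are our own labels, and the 25 percent relative reduction is assumed to apply equally to both groups.

```python
def expected_desisters(n, baseline_rate, relative_reduction):
    """Expected number of people who desist because of the intervention.

    n                  -- group size (100 in the example)
    baseline_rate      -- recidivism probability without the intervention
    relative_reduction -- proportional reduction in recidivism (0.25 here)
    """
    recidivists_without = n * baseline_rate
    recidivists_with = recidivists_without * (1 - relative_reduction)
    return recidivists_without - recidivists_with


# Hypothetical groups from the example: same intervention, different baseline risk.
higher_risk = expected_desisters(100, 0.80, 0.25)
lower_risk = expected_desisters(100, 0.40, 0.25)
print(higher_risk, lower_risk)
```

Under these assumptions, the same intervention prevents twice as many recidivists in the higher-risk group as in the lower-risk group.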
One outstanding question, however, is whether we would still observe a 25
percent reduction for the higher-risk group compared to the lower-risk group. For the
higher-risk group, who may be more entrenched in a criminal lifestyle, it could be that
one intervention is insufficient to bring about desistance. To be sure, the extant literature
suggests that higher-risk individuals require more intensive programming (Bonta,
Wallace-Capretti, and Rooney, 2000; Lowenkamp and Latessa, 2005). But the use of risk
and needs assessments operates on the assumption that we assign individuals to
interventions on the basis of risk (high) and needs (high); that is, if a high-risk individual
has a high substance abuse need, we would presumably want to prioritize this person for
CD treatment. But would CD treatment be the most appropriate intervention or, more
specifically, the most effective in reducing recidivism risk for this individual?
While risk assessment involves predicting who is most likely to recidivate, the
goal of needs assessment is, or at least should be, to identify the areas in which
interventions would likely have the greatest impact in lowering recidivism risk. If an
individual is in prison for, say, 6 months and can only participate in one intervention,
which one would have the greatest impact on recidivism? Ostensibly, needs assessment
should be able to help us identify what type of intervention would be most beneficial.
None of the existing risk and needs assessments, however, have demonstrated
they can validly predict or identify needs. The existing literature seems to assume that if a
tool predicts recidivism, then it also predicts needs. But examining how well a tool
performs in predicting recidivism is an evaluation of its ability to assess risk, not needs.
Indeed, the factors that heighten the need for an intervention within a particular area may
not be predictive of recidivism. For example, the extent of chemical use in the 12 months
prior to prison may be more indicative of the need for CD treatment than it is for
recidivism risk.
But even if current assessments were able to accurately predict recidivism and
identify the salient needs areas of offenders, it is still critical to assess responsivity to
programming. For instance, a potential problem can arise from assigning high-risk, high-
need individuals to interventions that do not reduce recidivism because they are either
ineffective in general or insufficient for higher-risk individuals. Put another way, general
responsivity refers to types of programming that are most effective in reducing
recidivism, such as cognitive-behavioral interventions. Specific responsivity, on the other
hand, includes individual barriers that may limit the likelihood for program participation
and successful completion (Bonta and Andrews, 2007). Examples of specific responsivity
include motivation, anxiety, different forms of learning styles, language, transportation,
gender, and culture (Cullen, 2002).
Existing responsivity assessments have not been developed through the use of
statistical methods, or an actuarial approach. Instead, these assessments are, for the most
part, barely more than a list of items for practitioners to consider when making program
placement decisions. None of the responsivity assessments used on correctional
populations have, to our knowledge, been evaluated to determine whether they are valid
or reliable. It is therefore unclear whether these assessments perform well in identifying
responsivity factors or whether their use has led to more appropriate program
assignments and, ultimately, better recidivism outcomes.
In this study, we introduce a more rigorous, actuarial approach for assessing both
general and specific responsivity. In particular, we assess responsivity by estimating the
likelihood that an individual’s participation in an intervention will result in desistance. If
an individual participated in, say, CD treatment, what is the probability it would lead to
desistance? As we demonstrate later, this approach to responsivity assessment not only
accounts for the efficacy of an intervention but also the varying effects an intervention
has on individuals. By improving the process in which individuals are assigned to
interventions, we propose that an actuarial approach to responsivity assessment can help
achieve better recidivism outcomes.
Chemical Dependency Treatment in MnDOC
Shortly after their admission to prison in Minnesota, prisoners with at least six
months to serve in prison undergo a brief (20-40 minutes) chemical dependency (CD)
assessment conducted by a licensed assessor. CD assessors use DSM-IV criteria for
substance abuse in their diagnoses, which are based on both self-report and collateral
information. The criteria for abuse include problems at work or school, not taking care of
personal responsibilities, financial problems, engaging in dangerous behavior while
intoxicated, legal problems, problems at home or in relationships, and continued use
despite experiencing problems. The criteria for dependence, on the other hand, include
increased tolerance; withdrawal symptoms; greater use than intended over a relatively
long period of time; inability to cut down or quit; a lot of time spent acquiring, using, or
recovering from use; missing important family, work, or social activities; and knowledge
that continued use would exacerbate a serious medical or psychological condition. After
completing the assessment, CD assessors assign prisoners a rating of no need, moderate
need, or high need for CD treatment.
Even though most newly admitted offenders are considered to be chemically
abusive or dependent, the number of prisoners directed to CD treatment greatly exceeds
the number of CD treatment beds available. In fact, among prisoners who receive a CD
assessment, roughly one-fourth enter CD treatment during their confinement. As a result,
the Minnesota Department of Corrections (MnDOC) has used a relatively simple,
summative algorithm to prioritize prisoners for CD treatment.
The algorithm produces a score that ranges from a low of 0 points to a high of 40
points. Of the 40 possible points, 10 are based on the assessed need for CD treatment.
More specifically, prisoners are given 0 points for no need, 5 points for moderate need,
and 10 points for high need. Likewise, 10 of the 40 points are based on assessed
recidivism risk. As noted below, our sample contains prisoners released from Minnesota
prisons between 2003 and 2011. During this time, the MnDOC used the Level of Service
Inventory-Revised (LSI-R) to assess recidivism risk. Depending on their LSI-R score,
prisoners were given either 10 points (very high risk), 7 points (high risk), 4 points
(medium risk), or 0 points (low risk) for recidivism risk.
While risk and needs make up half of the 40 points in the algorithm, the offense
for which offenders are imprisoned accounts for a total of 10 points. In particular, offenders
in prison for a felony DWI are given 10 points while those in prison for other offenses
receive 0 points. The final 10 points cover items related to factors such as mental illness,
traumatic brain injury, and a history of assaultive behavior. Based on the score (ranging
from 0 to 40) from the algorithm, prisoners are then given a CD treatment priority level
of 1 (score of 20 or higher), 2 (score between 14 and 19), or 3 (score of 13 or lower).
Priority level 1 prisoners are most likely to receive a CD treatment offer, followed by
those in priority level 2 and priority level 3.
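The summative algorithm described above can be sketched as follows. This is a minimal illustration rather than the MnDOC's actual implementation: the function names, the category labels, and the `other_points` argument are hypothetical. Because the text does not itemize how the final 10 points are allotted, the sketch simply takes them as an input between 0 and 10.

```python
# Point values taken from the description above; the dictionary keys are our own labels.
NEED_POINTS = {"none": 0, "moderate": 5, "high": 10}
RISK_POINTS = {"low": 0, "medium": 4, "high": 7, "very high": 10}


def cd_priority_score(need, risk, felony_dwi, other_points):
    """Sum the four components of the 0-40 point prioritization score."""
    if not 0 <= other_points <= 10:
        raise ValueError("other_points must be between 0 and 10")
    offense_points = 10 if felony_dwi else 0
    return NEED_POINTS[need] + RISK_POINTS[risk] + offense_points + other_points


def cd_priority_level(score):
    """Map a score to priority level 1 (20+), 2 (14-19), or 3 (13 or lower)."""
    if score >= 20:
        return 1
    if score >= 14:
        return 2
    return 3


# Hypothetical prisoner: high need, very high risk, no felony DWI, no other points.
example_score = cd_priority_score("high", "very high", False, 0)
example_level = cd_priority_level(example_score)
print(example_score, example_level)
```

Note that a prisoner with the maximum need and risk points reaches priority level 1 without any offense-related or other points, whereas a felony DWI alone contributes only 10 points.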
A prior evaluation of the MnDOC's CD treatment showed it is effective in
reducing recidivism. Using propensity score matching to match 926 treated offenders
released in 2005 with 926 inmates who had been untreated, Duwe (2010) found that
treatment decreased the risk of recidivism by 17 percent for rearrest, 21 percent for
reconviction, and 25 percent for reimprisonment for a new felony offense. Moreover,
consistent with earlier research (Wexler et al., 1990), the results showed that increased
treatment time appeared to lower the risk of recidivism, but only up to a point. While
short-term (90 days) and medium-term (180 days) programs had a statistically significant
impact on all three recidivism measures, no significant effects were found for long-term
(365 days) programming.
Data and Method
Our overall sample consists of 23,034 offenders released from Minnesota prisons
between 2003 and 2011 who had been assessed for chemical dependency. Within this
sample, there were 2,314 females and 20,720 males. Each of the 23,034 prisoners was
given a treatment need level from one of three categories—high need for treatment,
moderate need, and no need—as well as a CD treatment priority level. Of the 23,034
prisoners, a total of 5,414 (24 percent) participated in CD treatment during their
confinement prior to their release.
While the treatment need level (high, moderate or no need) provides us with the
assessed CD treatment needs for the prisoners in our sample, we developed predictive
models for assessments of recidivism risk and responsivity. More specifically, because
there are important gender differences with respect to risk and needs, we initially
separated our overall sample into males (N = 20,720) and females (N = 2,314). Next, we
separated these samples into three sets by the year prisoners were released from prison.
Our first set, the training set, consisted of individuals (10,517 males and 1,250
females) released from Minnesota prisons between 2003 and 2007. Our second set, the
test set, contained individuals (4,876 males and 555 females) released from prison
between 2008 and 2009. Our final set, the validation set, consisted of individuals (5,327
males and 509 females) released from prison in either 2010 or 2011.
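The assignment of prisoners to the three sets can be expressed as a simple rule on release year. The sketch below is illustrative only; the function name and the example records are hypothetical.

```python
def assign_split(release_year):
    """Assign a prisoner to the training, test, or validation set by release year."""
    if 2003 <= release_year <= 2007:
        return "training"
    if release_year in (2008, 2009):
        return "test"
    if release_year in (2010, 2011):
        return "validation"
    raise ValueError("release year outside the 2003-2011 sample window")


# Hypothetical records: (prisoner id, release year).
records = [("p1", 2003), ("p2", 2007), ("p3", 2008), ("p4", 2011)]
splits = {pid: assign_split(year) for pid, year in records}
print(splits)
```

Because the sets are defined by release year rather than by random assignment, the test and validation sets contain more recent release cohorts than the training set.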
Focusing first on the assessment of recidivism risk, we developed predictive
models on the training set data. As shown in Tables 1 and 2, our dataset contained a total
of 36 predictors that are available when a MnDOC prisoner goes through intake at the
time of admission to prison. These predictors encompass items commonly found to be
predictive of recidivism, such as criminal history, age at release, gang affiliation, and
marital status. We also include items such as prison admission and offense type. Our
measure of recidivism is reconviction for a misdemeanor, gross misdemeanor, or felony
within three years of release from prison. We obtained reconviction data on all 23,034
prisoners from the Minnesota Bureau of Criminal Apprehension. As shown in Tables 1
and 2, females had lower recidivism rates compared to males.
For both males and females, we used two different classification methods—
logistic regression and random forests—to develop predictive models on the training set.
Over the last few decades, regression modeling has been increasingly used to develop
prediction tools in the criminal justice field (Brennan and Oliver, 2000; Duwe, 2012;
Duwe, 2014; Duwe and Freske, 2012; Lowenkamp and Whetzel, 2009), while the use of
Table 1. Descriptive Statistics for Male Prisoner Sample
Predictors | Description | Training Mean | Training SD | Test Mean | Test SD | Validation Mean | Validation SD
Static/Criminal History
Total Convictions | Total # of convictions (any offense level) | 11.05 | 8.47 | 13.05 | 9.49 | 14.15 | 10.46
Felony Convictions | Total # of felony convictions | 1.70 | 1.68 | 2.29 | 2.05 | 2.82 | 2.38
Felony Specialization/Diversity | Degree of specialization/diversity in felony offenses | 0.86 | 0.26 | 0.87 | 0.24 | 0.85 | 0.25
Violent Convictions | Total # of violent offense convictions | 1.46 | 1.81 | 1.69 | 2.04 | 1.87 | 2.09
Violent Specialization/Diversity | Degree of specialization/diversity in violent offenses | 0.91 | 0.21 | 0.92 | 0.19 | 0.92 | 0.19
Total Assault Convictions | Total # of assault offense convictions | 0.93 | 1.51 | 1.11 | 1.70 | 1.24 | 1.77
Total Robbery Convictions | Total # of robbery convictions | 0.17 | 0.57 | 0.20 | 0.68 | 0.20 | 0.67
VOFP Convictions | Total VOFP, stalking and harassment convictions | 0.14 | 0.54 | 0.22 | 0.73 | 0.32 | 0.89
Disorderly Conduct Convictions | Total # of disorderly conduct convictions | 0.09 | 0.34 | 0.15 | 0.46 | 0.24 | 0.63
Prostitution Convictions | Total # of prostitution offense convictions | 0.01 | 0.12 | 0.01 | 0.14 | 0.01 | 0.15
Drug Offense Convictions | Total # of drug offense convictions | 0.99 | 1.34 | 1.11 | 1.49 | 1.12 | 1.60
Drug Offense Specialization/Diversity | Degree of specialization/diversity in drug offenses | 0.93 | 0.18 | 0.95 | 0.15 | 0.96 | 0.14
False Information to Police Cons. | Total # of false information to police convictions | 0.44 | 0.90 | 0.52 | 0.99 | 0.55 | 1.03
Flee/Escape Convictions | Total # of flee/escape police convictions | 0.23 | 0.61 | 0.27 | 0.66 | 0.31 | 0.71
Weapons Offense Convictions | Total # of weapons offense convictions | 0.08 | 0.32 | 0.10 | 0.36 | 0.12 | 0.40
Total Property Convictions | Total # of property offense convictions | 2.91 | 4.09 | 3.03 | 4.38 | 3.29 | 4.77
Property Offense Specialization/Diversity | Degree of specialization/diversity in prop. offenses | 0.89 | 0.19 | 0.92 | 0.15 | 0.92 | 0.15
Driving While Intoxicated (DWI) Convictions | Total # of DWI convictions | 0.29 | 0.67 | 0.59 | 1.03 | 0.66 | 1.12
Failure to Register (FTR) Convictions | Total # of FTR convictions | 0.05 | 0.28 | 0.08 | 0.36 | 0.09 | 0.40
Total Supervision Failures | Total # of revocations on probation and parole | 1.08 | 1.36 | 1.17 | 1.50 | 1.30 | 1.63
Intake
Metro County of Commitment | Commit from Twin Cities metro-area county | 0.51 | 0.50 | 0.51 | 0.50 | 0.51 | 0.50
Length of Stay in Prison (Months) | Difference in months between admission and release | 18.72 | 16.15 | 21.14 | 20.83 | 22.03 | 22.90
New Court Commitment | Admitted to prison directly from court | 0.60 | 0.49 | 0.65 | 0.48 | 0.68 | 0.47
Probation Violator | Admitted to prison for probation violation | 0.36 | 0.48 | 0.34 | 0.47 | 0.31 | 0.46
Release Violator | Admitted to prison for parole violation | 0.04 | 0.21 | 0.01 | 0.12 | 0.01 | 0.11
Person Offense | Most serious index offense is person offense = 1 | 0.21 | 0.41 | 0.22 | 0.41 | 0.23 | 0.42
Sex Offense | Most serious index offense is sex offense = 1 | 0.08 | 0.26 | 0.06 | 0.24 | 0.06 | 0.24
Drug Offense | Most serious index offense is drugs = 1 | 0.30 | 0.46 | 0.27 | 0.45 | 0.24 | 0.43
Property Offense | Most serious index offense is property = 1 | 0.24 | 0.42 | 0.19 | 0.39 | 0.17 | 0.38
DWI Offense | Most serious index offense is DWI = 1 | 0.05 | 0.21 | 0.12 | 0.32 | 0.09 | 0.29
Other Offense | Most serious index offense is “Other” offense = 1 | 0.13 | 0.34 | 0.15 | 0.35 | 0.20 | 0.40
Suicidal History | Suicidal history = 1; no history = 0 | 0.11 | 0.32 | 0.16 | 0.36 | 0.17 | 0.37
Security Threat Group (STG) | Total # of STG criteria (0-10) | 0.92 | 1.69 | 0.88 | 1.66 | 0.94 | 1.69
Marital Status | Married = 1; unmarried = 0 | 0.11 | 0.32 | 0.11 | 0.31 | 0.11 | 0.31
Age at Release | Age in years at time of release | 33.15 | 9.36 | 34.51 | 9.85 | 34.60 | 10.02
Unsupervised Release | Released to no correctional supervision | 0.02 | 0.15 | 0.01 | 0.12 | 0.01 | 0.10
Recidivism within 3 Years
General Recidivism | Reconviction for misd., gross misd., or felony | 0.68 | 0.47 | 0.62 | 0.49 | 0.63 | 0.48
N | | 10,517 | | 4,876 | | 5,327 |
machine learning algorithms such as random forests has been more recent (Barnes and
Hyatt, 2012). Created by Breiman (2001), random forests is an ensemble method that
involves growing a forest of many trees, each of which is grown on an independent
bootstrap sample from the training data. At each node of each tree, only a random subset
of the predictor variables is considered, and the best split is chosen from among the
selected variables. The trees are grown to a maximum depth, and a consensus
prediction is obtained by taking a vote across the trees.
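The three ingredients of this description (bootstrap sampling, a random subset of candidate predictors at each split, and voting across trees) can be illustrated with a short sketch. This is not a full random forest and not the software used in this study; the helper names are hypothetical, and the tree-growing step itself is omitted.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement; each tree is grown on one such sample."""
    return [rng.choice(rows) for _ in rows]


def candidate_features(n_features, k, rng):
    """Pick k distinct predictor indices to consider for a single split."""
    return rng.sample(range(n_features), k)


def majority_vote(votes):
    """Return the most common class label among the trees' predictions."""
    return Counter(votes).most_common(1)[0][0]


rng = random.Random(0)            # fixed seed so the sketch is reproducible
rows = list(range(10))            # stand-in for ten training cases
sample = bootstrap_sample(rows, rng)
features = candidate_features(36, 6, rng)   # 36 predictors; 6 per split is an assumption
consensus = majority_vote([1, 0, 1, 1, 0])  # hypothetical votes from five trees
print(len(sample), sorted(features), consensus)
```

The number of predictors considered at each split (6 here) is an arbitrary choice for illustration; the value used in the analysis is not stated in the text.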
Recent research has advocated testing multiple classification methods when
developing a predictive model (Duwe and Kim, 2016; Ridgeway, 2013), given that there
is no single best algorithm that yields the best performance in every situation (Caruana
and Niculescu-Mizil, 2006; Wolpert, 1996).1 To identify the best predictive models, we
evaluated performance on the test set data. After doing so, we then applied the best-
performing models to the validation set data. In the validation sets, each prisoner received
a predicted probability that reflects his or her likelihood of recidivating within three years
of release from prison.
To assess responsivity, we also developed predictive models on the training set
data. The main difference between the recidivism risk and responsivity assessments had
to do with the outcome being predicted. With the recidivism risk assessment, the
predicted outcome was whether individuals recidivated within three years. With
1 Prior research has provided mixed evidence on the performance of machine learning algorithms, such as
random forests, versus older, more traditional approaches like logistic regression. Some studies have found
little or no difference between these two sets of classification methods (Hamilton et al., 2015; Liu et al.,
2011; Tollenaar and van der Heijden, 2013), whereas others have observed a performance advantage for
machine learning approaches (Berk and Bleich, 2013; Caruana and Niculescu-Mizil, 2006; Duwe and Kim,
2015, 2016; Hess and Turner, 2013). The evidence is clearer that statistical and machine learning algorithms
outperform simplistic, Burgess-style methods (Duwe and Kim, 2016). Given that no single
algorithm performs best in every situation, research has advocated testing multiple algorithms
(Duwe and Kim, 2015, 2016; Ridgeway, 2013), which is the approach we have followed here.
Table 2. Descriptive Statistics for Female Prisoner Sample
Predictors | Description | Training Mean | Training SD | Test Mean | Test SD | Validation Mean | Validation SD
Static/Criminal History
Total Convictions | Total # of convictions (any offense level) | 9.24 | 7.67 | 10.75 | 9.38 | 11.86 | 10.23
Felony Convictions | Total # of felony convictions | 1.58 | 1.58 | 2.20 | 2.07 | 2.66 | 2.47
Felony Specialization/Diversity | Degree of specialization/diversity in felony offenses | 0.84 | 0.29 | 0.83 | 0.27 | 0.81 | 0.28
Violent Convictions | Total # of violent offense convictions | 0.86 | 1.89 | 0.80 | 1.49 | 0.98 | 1.98
Violent Specialization/Diversity | Degree of specialization/diversity in violent offenses | 0.94 | 0.19 | 0.96 | 0.13 | 0.94 | 0.18
Total Assault Convictions | Total # of assault offense convictions | 0.37 | 0.91 | 0.41 | 0.99 | 0.53 | 1.35
Total Robbery Convictions | Total # of robbery convictions | 0.07 | 0.32 | 0.07 | 0.30 | 0.10 | 0.48
VOFP Convictions | Total VOFP, stalking and harassment convictions | 0.03 | 0.23 | 0.05 | 0.31 | 0.08 | 0.39
Disorderly Conduct Convictions | Total # of disorderly conduct convictions | 0.04 | 0.22 | 0.11 | 0.41 | 0.19 | 0.54
Prostitution Convictions | Total # of prostitution offense convictions | 0.34 | 1.46 | 0.18 | 0.88 | 0.24 | 1.17
Drug Offense Convictions | Total # of drug offense convictions | 1.11 | 1.35 | 1.21 | 1.61 | 1.31 | 1.74
Drug Offense Specialization/Diversity | Degree of specialization/diversity in drug offenses | 0.89 | 0.25 | 0.91 | 0.21 | 0.92 | 0.19
False Information to Police Cons. | Total # of false information to police convictions | 0.49 | 0.97 | 0.64 | 1.29 | 0.56 | 1.05
Flee/Escape Convictions | Total # of flee/escape police convictions | 0.10 | 0.36 | 0.10 | 0.38 | 0.08 | 0.30
Weapons Offense Convictions | Total # of weapons offense convictions | 0.01 | 0.09 | 0.01 | 0.15 | 0.02 | 0.15
Total Property Convictions | Total # of property offense convictions | 3.38 | 4.87 | 3.58 | 5.61 | 3.78 | 5.67
Property Offense Specialization/Diversity | Degree of specialization/diversity in prop. offenses | 0.83 | 0.24 | 0.86 | 0.22 | 0.86 | 0.21
Driving While Intoxicated (DWI) Convictions | Total # of DWI convictions | 0.20 | 0.59 | 0.49 | 0.90 | 0.58 | 1.04
Failure to Register (FTR) Convictions | Total # of FTR convictions | 0.00 | 0.03 | 0.01 | 0.11 | 0.01 | 0.13
Total Supervision Failures | Total # of revocations on probation and parole | 0.85 | 0.92 | 1.04 | 1.06 | 0.97 | 1.12
Intake
Metro County of Commitment | Commit from Twin Cities metro-area county | 0.50 | 0.50 | 0.42 | 0.49 | 0.43 | 0.50
Length of Stay in Prison (Months) | Difference in months between admission and release | 11.93 | 11.61 | 14.30 | 14.26 | 17.11 | 15.79
New Court Commitment | Admitted to prison directly from court | 0.49 | 0.50 | 0.44 | 0.50 | 0.51 | 0.50
Probation Violator | Admitted to prison for probation violation | 0.47 | 0.50 | 0.50 | 0.50 | 0.48 | 0.50
Release Violator | Admitted to prison for parole violation | 0.04 | 0.20 | 0.06 | 0.24 | 0.01 | 0.10
Person Offense | Most serious index offense is person offense = 1 | 0.13 | 0.34 | 0.13 | 0.34 | 0.15 | 0.36
Sex Offense | Most serious index offense is sex offense = 1 | 0.01 | 0.08 | 0.01 | 0.08 | 0.01 | 0.10
Drug Offense | Most serious index offense is drugs = 1 | 0.44 | 0.50 | 0.44 | 0.50 | 0.44 | 0.50
Property Offense | Most serious index offense is property = 1 | 0.33 | 0.47 | 0.28 | 0.45 | 0.24 | 0.42
DWI Offense | Most serious index offense is DWI = 1 | 0.03 | 0.17 | 0.07 | 0.26 | 0.11 | 0.31
Other Offense | Most serious index offense is “Other” offense = 1 | 0.06 | 0.23 | 0.07 | 0.26 | 0.06 | 0.24
Suicidal History | Suicidal history = 1; no history = 0 | 0.21 | 0.41 | 0.32 | 0.47 | 0.36 | 0.48
Security Threat Group (STG) | Total # of STG criteria (0-10) | 0.13 | 0.51 | 0.16 | 0.60 | 0.12 | 0.53
Marital Status | Married = 1; unmarried = 0 | 0.10 | 0.30 | 0.09 | 0.29 | 0.12 | 0.33
Age at Release | Age in years at time of release | 34.61 | 8.45 | 35.70 | 9.36 | 35.88 | 9.33
Unsupervised Release | Released to no correctional supervision | 0.03 | 0.18 | 0.06 | 0.23 | 0.00 | 0.00
Recidivism
General | Reconviction for misd., gross misd., or felony | 0.59 | 0.49 | 0.57 | 0.50 | 0.50 | 0.50
N | | 1,250 | | 555 | | 509 |
responsivity assessment, the predicted outcome was whether individuals had 1)
participated in CD treatment and 2) desisted within three years of release from prison.
Therefore, for the entire dataset, we created a variable, CD treatment desistance, that
assigned a value of “1” to desistors who participated in CD treatment and a value of “0”
to all other offenders. As a result, offenders who participated in CD treatment but
recidivated were given a value of “0”. Likewise, offenders who desisted but did not
participate in CD treatment were assigned a value of “0” for this item.
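The coding rule for the CD treatment desistance variable can be written out explicitly. The sketch below is illustrative; the function and argument names are hypothetical.

```python
def cd_treatment_desistance(participated, recidivated):
    """Return 1 only for offenders who participated in CD treatment and desisted."""
    return 1 if participated and not recidivated else 0


# All four combinations of treatment participation and recidivism.
coding = {
    (participated, recidivated): cd_treatment_desistance(participated, recidivated)
    for participated in (True, False)
    for recidivated in (True, False)
}
print(coding)
```

Only one of the four combinations (treated and not reconvicted within three years) is coded as "1", which is why the responsivity outcome occurs less often than either treatment participation or desistance alone.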
After developing responsivity assessment models on the training set data for both
males and females, we evaluated predictive performance on the test sets. We then applied
the best-performing models to the validation sets for males and females. In the validation
sets, each offender received a predicted probability that reflects his or her likelihood of
desisting as a result of CD treatment.
Predictive Performance Metrics
To measure predictive performance, we used six metrics to capture the three main
areas of predictive validity—accuracy, discrimination, and calibration. To evaluate
predictive accuracy, which assesses how well a model makes correct classification
decisions, we used accuracy (ACC). For predictive discrimination, which measures the
degree to which the model separates the recidivists from the desistors, we used three
separate metrics—the AUC, the H measure developed by Hand (2009), and the precision-
recall curve (PRC). The AUC has been one of the most widely used predictive
performance metrics, and it is relatively robust across different recidivism base rates and
selection ratios (Smith, 1996). Still, the AUC can provide overly optimistic estimates of
predictive discrimination for imbalanced datasets (Davis and Goadrich, 2006), and it can
provide misleading results if receiver operating characteristic (ROC) curves cross (Hand,
2009). As a result, we also used Hand’s H-measure, which uses a common cost
distribution for all classifiers (Hand, 2009), and the precision-recall curve (PRC), which
assesses discrimination with the precision and recall values. Precision measures the
percentage of positive predictions that were correct (based on the 50 percent threshold),
whereas recall reflects the percentage of positives (i.e., recidivists) that were captured.
Compared to the AUC, the PRC has been found to be a better metric for highly
imbalanced datasets (i.e., making predictions for an infrequently occurring outcome)
(Davis and Goadrich, 2006).
Calibration assesses how well the predicted probabilities from a model correspond
with the observed outcome being predicted. For our calibration metric, we used root
mean square error (RMSE), which measures the square root of the average squared
difference between observed recidivism and predicted probabilities. The sixth metric we
used is the SAR (squared error, accuracy, and ROC area) statistic developed by Caruana,
Niculescu-Mizil, Crew, and Ksikes (2004). SAR is a combined measure of
discrimination, accuracy, and calibration, and the formula for SAR is:
(ACC + AUC + (1 - RMSE))/3.
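Several of these metrics can be computed directly from a vector of observed outcomes and predicted probabilities. The sketch below is a minimal illustration on made-up data, not the code used in this study. It covers ACC, precision and recall at the 50 percent threshold, the AUC (computed as the share of positive-negative pairs ranked correctly), RMSE, and SAR. The H measure and the area under the precision-recall curve are omitted, and treating a probability of exactly 0.50 as a positive prediction is our assumption.

```python
import math


def accuracy(y_true, y_prob, threshold=0.5):
    """Share of cases classified correctly at the given threshold."""
    preds = [1 if p >= threshold else 0 for p in y_prob]
    return sum(int(p == y) for p, y in zip(preds, y_true)) / len(y_true)


def precision_recall(y_true, y_prob, threshold=0.5):
    """Precision and recall for the positive class at the given threshold."""
    preds = [1 if p >= threshold else 0 for p in y_prob]
    true_pos = sum(1 for p, y in zip(preds, y_true) if p == 1 and y == 1)
    predicted_pos = sum(preds)
    actual_pos = sum(y_true)
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    return precision, recall


def auc(y_true, y_prob):
    """Probability that a random positive case outranks a random negative case."""
    pos = [p for p, y in zip(y_prob, y_true) if y == 1]
    neg = [p for p, y in zip(y_prob, y_true) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rmse(y_true, y_prob):
    """Square root of the mean squared difference between outcome and probability."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true))


def sar(y_true, y_prob):
    """SAR = (ACC + AUC + (1 - RMSE)) / 3."""
    return (accuracy(y_true, y_prob) + auc(y_true, y_prob) + (1 - rmse(y_true, y_prob))) / 3


# Made-up outcomes (1 = positive) and predicted probabilities for eight cases.
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
print(accuracy(y_true, y_prob), auc(y_true, y_prob), rmse(y_true, y_prob), sar(y_true, y_prob))
```

Because SAR averages ACC, AUC, and 1 minus RMSE with equal weights, a model can only score well on it by performing reasonably on accuracy, discrimination, and calibration together.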
Prioritizing Prisoners for CD Treatment
In the validation sets for the male and female prisoners, each offender had been
assessed for risk, needs, and responsivity. Put another way, each of the 5,327 males and
509 females in the validation sets had values for 1) recidivism risk probability, 2) CD
treatment need, and 3) responsivity probability. The values for both recidivism risk and
responsivity ranged from a low of 0 percent to a high of 100 percent. A higher predicted
probability for recidivism signifies a higher risk for recidivism. On the other hand, a
higher predicted probability for responsivity denotes a greater likelihood that an
individual will desist after participating in CD treatment. The values for CD treatment
need consisted of “1” for no need, “2” for moderate need, and “3” for high need.
Using these values from the risk, needs, and responsivity assessments, we
examined several different ways of prioritizing prisoners for CD treatment. In particular,
we prioritized offenders on the basis of 1) risk and needs, 2) risk and responsivity, 3)
needs and responsivity, and 4) risk, needs, and responsivity. For example, in prioritizing
prisoners by risk and needs, we added the values from the risk and needs assessments to
form a total risk-needs score. Likewise, to prioritize prisoners by risk, needs, and
responsivity, we added the values from the risk, needs, and responsivity assessments to
form a total risk-needs-responsivity score. Therefore, individuals with the highest scores
are presumably those with the highest risk, needs, and responsivity to CD treatment.
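The summative scoring described above can be sketched as follows. This is a minimal illustration under stated assumptions: the `Assessment` fields, the `person_id` values, and the choice to express risk and responsivity on a 0-1 scale before adding them to the 1-3 needs value are all hypothetical, since the text does not specify the scale used when summing.

```python
from dataclasses import dataclass

@dataclass
class Assessment:
    person_id: str       # hypothetical identifier
    risk: float          # predicted recidivism probability (assumed 0-1 scale)
    need: int            # CD treatment need: 1 = none, 2 = moderate, 3 = high
    responsivity: float  # predicted probability of desisting after treatment (assumed 0-1)

# The four prioritization schemes examined in the study.
SCHEMES = {
    "risk-needs": ("risk", "need"),
    "risk-responsivity": ("risk", "responsivity"),
    "needs-responsivity": ("need", "responsivity"),
    "risk-needs-responsivity": ("risk", "need", "responsivity"),
}

def total_score(a, scheme):
    """Add the assessment values that belong to the chosen scheme."""
    return sum(getattr(a, field) for field in SCHEMES[scheme])

def prioritize(assessments, scheme, capacity_share=0.25):
    """Return the highest-scoring individuals, up to the treatment capacity."""
    ranked = sorted(assessments, key=lambda a: total_score(a, scheme), reverse=True)
    n_slots = int(len(ranked) * capacity_share)
    return ranked[:n_slots]

# Hypothetical individuals, for illustration only.
people = [
    Assessment("A", risk=0.9, need=3, responsivity=0.2),
    Assessment("B", risk=0.6, need=3, responsivity=0.8),
    Assessment("C", risk=0.3, need=1, responsivity=0.9),
    Assessment("D", risk=0.8, need=2, responsivity=0.5),
]
top_rn = prioritize(people, "risk-needs")
top_rnr = prioritize(people, "risk-needs-responsivity")
```

With these made-up values, the risk-needs scheme selects person A (high risk, low responsivity), whereas adding responsivity shifts the single available slot to person B.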
A little more than one-fourth of the prisoners in the male and female validation
sets entered CD treatment. For example, 1,377 (26%) of the 5,327 male offenders in the
validation set participated in CD treatment. The recidivism rate for the treated offenders
was 49 percent versus 68 percent for those who were untreated. For females, 145 (28%)
of the 509 offenders participated in CD treatment. The recidivism rate for the treated
offenders was 35 percent compared to 56 percent for those who were untreated.
To determine how each prioritization scheme might perform in assigning
individuals for CD treatment, we organized the validation sets into quartiles and then
analyzed recidivism outcomes by CD treatment participation. To illustrate with the 5,327
male offenders in the validation set, the recidivism rate was 49 percent for the 1,377
(26%) who entered CD treatment and 68 percent for the 3,950 (74%) who did not. The
rate was therefore 27 percent lower for the treated offenders. With a recidivism rate of 49
percent among the 1,377 treated offenders, there were still 677 who were recidivists. Yet,
if we assumed that none of the 1,377 were able to enter treatment and the recidivism rate
for untreated offenders is 68 percent, then 932 would have been recidivists. Delivering
CD treatment to the 1,377 offenders is thus associated with a reduction of 255 recidivists
(932 minus 677).
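The arithmetic in this illustration can be reproduced with a short sketch. The function names are hypothetical, and the unrounded rates (0.492 and 0.677 for males, 0.352 and 0.560 for females) are taken from the "Overall" rows of Tables 4 and 5 rather than the rounded percentages quoted in the text.

```python
def prevented_recidivists(n_treated, treated_rate, untreated_rate):
    """Return (observed recidivists, expected recidivists if untreated, prevented)."""
    observed = round(n_treated * treated_rate)
    expected = round(n_treated * untreated_rate)
    return observed, expected, expected - observed

def relative_reduction(treated_rate, untreated_rate):
    """Effect size: proportional drop in the recidivism rate among the treated."""
    return (untreated_rate - treated_rate) / untreated_rate

males = prevented_recidivists(1377, 0.492, 0.677)    # (677, 932, 255)
females = prevented_recidivists(145, 0.352, 0.560)   # (51, 81, 30)
male_effect = relative_reduction(0.492, 0.677)       # about 0.273
female_effect = relative_reduction(0.352, 0.560)     # about 0.371
```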
If we prioritized the top one-fourth of offenders (i.e., CD treatment capacity) on
the basis of risk-needs, risk-responsivity, needs-responsivity, or risk-needs-responsivity,
would we still see a 27 percent reduction? Similarly, if we prioritized the top one-fourth
on the basis of these four prioritization schemes, would we still observe 255 prevented
recidivists? Would the treatment effect sizes and number of prevented recidivists be
smaller, larger, or about the same? To answer these questions, we present the findings in
the following section.
Results
In Table 3, we present the predictive performance results from the recidivism and
responsivity assessments for males and females. As noted above, we used two types of
classification methods—logistic regression and random forests. The results in Table 3
indicate the recidivism risk models for both classification methods predicted recidivism
relatively well for both males and females. For male offenders, the logistic regression
model slightly outperformed the random forests model across each of the six predictive
performance metrics in both the test and validation sets. For female offenders, the
random forests model slightly outperformed logistic regression.
Metrics ACC AUC H PRC RMSE SAR
TEST SET
Recidivism Baseline
Females
Responsivity
Males
Females
VALIDATION SET
Recidivism Baseline
Females
Responsivity
Males
Females
Logistic Regression 0.806 0.781 0.205 0.396 0.367 0.740
Random Forests 0.829 0.839 0.319 0.486 0.345 0.774
Notes: ACC = Accuracy; AUC = Area Under the Curve; PRC = Precision-Recall Curve;
RMSE = Root Mean Squared Error; SAR = Squared Error, Accuracy, ROC (Receiver
Operating Characteristic)
When we focus on assessing responsivity to CD treatment, the random forests
models performed best for male and female offenders in both the test and validation sets.
In particular, the validation set results for males indicate the random forests model had
good predictive discrimination (AUC = 0.82) and it yielded a correct classification rate of
87 percent. With an SAR value of 0.79, the random forests model had strong overall
predictive performance. We observed similar findings for females. The random forests
model achieved an accuracy rate of 83 percent, an AUC of 0.84, and a SAR of 0.77.
Overall, the responsivity assessment models had better predictive performance than those
for recidivism risk.
Table 4. Male Prisoner Results
Measures Recidivism Rates Number of Recidivists
Treated N Untreated N Effect Treated Untreated Prevented Total N
Overall 0.492 1,377 0.677 3,950 0.273 677 932 255 5,327
Overall Adjusted 0.492 1,332 0.677 3,950 0.273 655 902 247 5,282
Risk-Needs-Responsivity
1 (Top 25%) 0.593 479 0.856 853 0.307 790 1,140 350 1,332
2 (26-50%) 0.550 350 0.760 982 0.276 733 1,012 279 1,332
3 (51-75%) 0.390 323 0.610 1,009 0.361 519 813 294 1,332
4 (Bottom 25%) 0.320 221 0.530 1,110 0.396 426 705 279 1,331
Risk-Needs
1 (Top 25%) 0.770 184 0.860 1,148 0.105 1,026 1,146 120 1,332
2 (26-50%) 0.630 329 0.720 1,003 0.125 839 959 120 1,332
3 (51-75%) 0.460 490 0.570 842 0.193 613 759 146 1,332
4 (Bottom 25%) 0.280 374 0.480 957 0.417 373 639 266 1,331
Risk-Responsivity
1 (Top 25%) 0.588 466 0.851 866 0.309 783 1,134 351 1,332
2 (26-50%) 0.553 349 0.764 983 0.276 737 1,018 281 1,332
3 (51-75%) 0.448 290 0.662 1,042 0.323 597 882 285 1,332
4 (Bottom 25%) 0.294 272 0.468 1,059 0.372 391 623 232 1,331
Needs-Responsivity
1 (Top 25%) 0.398 732 0.443 600 0.102 530 590 60 1,332
2 (26-50%) 0.549 419 0.634 913 0.134 731 844 113 1,332
3 (51-75%) 0.760 129 0.739 1,203 +0.03 1,012 984 -28 1,332
4 (Bottom 25%) 0.598 97 0.762 1,234 0.215 796 1,014 218 1,331
In Tables 4 and 5, we present the results for the male and female offenders. Here,
we focus only on the classification method that produced the best results. Whereas
random forests yielded the best outcomes for males, it was logistic regression for
females. For both males and females, we analyzed the results according to the four
different schemes for prioritizing offenders for CD treatment. Therefore, we compared
the overall results from the validation set with the four prioritization schemes: 1) risk-
needs, 2) risk-responsivity, 3) needs-responsivity, and 4) risk-needs-responsivity.
Table 5. Female Prisoner Results
Measures Recidivism Rates Number of Recidivists
Treated N Untreated N Effect Treated Untreated Prevented Total N
Overall 0.352 145 0.560 364 0.371 51 81 30 509
Overall Adjusted 0.352 127 0.560 364 0.371 45 71 26 491
Risk-Needs-Responsivity
1 (Top 25%) 0.500 34 0.828 93 0.396 64 105 41 127
2 (26-50%) 0.333 30 0.619 97 0.462 42 79 37 127
3 (51-75%) 0.386 44 0.482 83 0.199 49 61 12 127
4 (Bottom 25%) 0.189 37 0.297 91 0.364 24 38 14 128
Risk-Needs
1 (Top 25%) 0.706 17 0.855 110 0.174 90 109 19 127
2 (26-50%) 0.571 21 0.581 106 0.017 73 74 1 127
3 (51-75%) 0.333 48 0.354 79 0.059 42 45 3 127
4 (Bottom 25%) 0.186 59 0.290 69 0.359 24 37 13 128
Risk-Responsivity
1 (Top 25%) 0.529 34 0.817 93 0.353 67 104 37 127
2 (26-50%) 0.286 28 0.616 99 0.536 36 78 42 127
3 (51-75%) 0.439 41 0.512 86 0.143 56 65 9 127
4 (Bottom 25%) 0.167 42 0.267 86 0.374 21 34 13 128
Needs-Responsivity
1 (Top 25%) 0.261 69 0.293 58 0.109 33 37 4 127
2 (26-50%) 0.349 43 0.417 84 0.163 44 53 9 127
3 (51-75%) 0.632 19 0.648 108 0.025 80 82 2 127
4 (Bottom 25%) 0.429 14 0.719 114 0.403 55 92 37 128
In Table 4, the results show there were 1,377 treated offenders and 3,950
untreated offenders. The three-year reconviction rate was 49.2 percent for the treated and
67.7 percent for the untreated. The treated rate was therefore 27 percent lower than the
untreated rate. Among the 1,377 who were treated, there were 677 recidivists. If the
1,377 treated offenders had not been treated and their rate was 67.7 percent, then 932
would have been recidivists. As a result, CD treatment prevented 255 recidivists. We also
show the overall adjusted figures based on one-fourth (N = 1,332) participating in CD
treatment.
In Table 5, the results for females show there were 145 treated offenders and 364
untreated offenders. The three-year reconviction rate was 35.2 percent for the treated and
56.0 percent for the untreated. The treated rate was therefore 37 percent lower than the
untreated rate. Among the 145 who were treated, there were 51 recidivists. If the 145
treated offenders had not been treated and their rate was 56.0 percent, then 81 would have
been recidivists. As a result, CD treatment prevented 30 recidivists. We also show the
overall adjusted figures based on one-fourth (N = 127) participating in CD treatment.
As shown in Tables 4 and 5, the risk-needs-responsivity (RNR) scheme
performed the best for both males and females, followed by risk-responsivity (RR), risk-
needs (RN), and needs-responsivity (NR). For males, the RNR and RR schemes
performed roughly the same, while the RNR scheme for females was clearly better than
the RR scheme. For both males and females, the RNR and RR prioritization schemes
increased the number of prevented recidivists while preserving the treatment effect size.
To illustrate, when we focus on the RNR scheme for males, we see that the effect
size among the top one-fourth (30.7 percent reduction) is actually a little larger than it is
for the overall sample (27.3 percent reduction). Moreover, because the RNR scheme
effectively isolated the higher-risk offenders, it prevented a larger number of recidivists.
Indeed, the number of prevented recidivists in the RNR scheme (350) was more than 100
higher than the number (247) for the overall adjusted sample. Focusing on the RNR
scheme for females, we see the effect size (39.6 percent) is larger than the overall effect
size (37.1 percent). The number of recidivists prevented (41) is also 15 higher than that
observed (26) for the overall adjusted sample. Combined, the RNR scheme accounted for
118 additional prevented recidivists, whereas the RR scheme was responsible for 115
additional prevented recidivists.
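The comparison between a top quartile and the overall adjusted sample can be worked through with the rates in Tables 4 and 5. The sketch below is an illustration, not the authors' code: the names `prevented`, `OVERALL`, `TOP_QUARTILE`, and `additional_prevented` are hypothetical, and projected counts are rounded to whole persons as the tables appear to do.

```python
def prevented(n, treated_rate, untreated_rate):
    """Projected recidivists if all n were untreated minus projected if all were treated."""
    return round(n * untreated_rate) - round(n * treated_rate)

# (treatment capacity N, treated rate, untreated rate) from Tables 4 and 5.
OVERALL = {
    "male": (1332, 0.492, 0.677),
    "female": (127, 0.352, 0.560),
}
TOP_QUARTILE = {
    "RNR": {"male": (1332, 0.593, 0.856), "female": (127, 0.500, 0.828)},
    "RR": {"male": (1332, 0.588, 0.851), "female": (127, 0.529, 0.817)},
}

def additional_prevented(scheme):
    """Prevented recidivists in the scheme's top quartile beyond the overall adjusted figure."""
    return sum(
        prevented(*TOP_QUARTILE[scheme][sex]) - prevented(*OVERALL[sex])
        for sex in OVERALL
    )

rnr_gain = additional_prevented("RNR")  # (350 - 247) + (41 - 26)
rr_gain = additional_prevented("RR")    # (351 - 247) + (37 - 26)
```

Under these inputs, `rnr_gain` and `rr_gain` come to 118 and 115, matching the combined figures reported for the two schemes.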
In contrast, neither the NR nor RN schemes performed well for either males or
females. For example, the NR scheme would produce an estimated 207 fewer prevented
recidivists, whereas the RN scheme yielded 133 fewer prevented recidivists. Although
the NR scheme may have performed well in identifying who needs CD treatment and
who would benefit from it the most, it is still important to account for recidivism risk.
Likewise, the RN scheme was effective in identifying higher-risk offenders, but it did not
perform well in identifying those who would benefit from CD treatment. Recidivism
rates were higher for offenders in the upper quartiles, but the treatment effect size was
smaller. It is possible that for some higher-risk offenders, CD treatment alone is
insufficient to help bring about desistance. These offenders may need another
intervention or, more precisely, multiple interventions. Assessing for responsivity,
however, helps identify who would benefit the most from CD treatment, even among the
higher-risk offenders.
In Table 6, we estimate the overall impact that each prioritization scheme might
have on recidivism. Combined, the male and female validation sets included 5,836
released prisoners, of whom 1,522 were treated. The overall recidivism rate for the 5,836
released prisoners was 61.8 percent, resulting in 3,606 recidivists. If none of the 1,522 had been treated, the
estimated rate would have been 66.7 percent, resulting in 3,891 recidivists. The
prioritization scheme used by the MnDOC yielded 285 prevented recidivists.
Table 6. Overall Results
Scheme Total N Treated N Recidivists Recidivism Rate Prevented Recidivists NNT
No Treatment 5,836 0 3,891 66.7%
Current State 5,836 1,522 3,606 61.8% 285 5.34
Needs-Responsivity 5,836 1,522 3,826 65.6% 65 23.42
Risk-Needs 5,836 1,522 3,749 64.2% 142 10.72
Risk-Responsivity 5,836 1,522 3,487 59.8% 404 3.77
Risk-Needs-Responsivity 5,836 1,522 3,481 59.7% 410 3.71
NNT = Number Needed to Treat
The number needed to treat (NNT) is a statistic that has been used, often in
epidemiology, to measure the efficacy of different types of treatment. NNT quantifies the
number of participants who would need to participate in an intervention in order to
produce one beneficial outcome. The NNT formula for this study is: 1 / [(recidivism rate
for untreated prisoners) – (recidivism rate for treated prisoners)]. With 1,522 receiving
treatment, the number needed to treat (NNT) to achieve one desistor was 5.34.
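As a worked check, the NNT values in Table 6 can be recovered from the recidivist counts. The sketch below assumes the NNT is computed among the 1,522 treated prisoners, so that the difference in rates equals prevented recidivists divided by the number treated; the function and variable names are hypothetical.

```python
def nnt_from_rates(untreated_rate, treated_rate):
    """NNT = 1 / (untreated recidivism rate - treated recidivism rate)."""
    return 1 / (untreated_rate - treated_rate)

def nnt_from_counts(n_treated, prevented):
    """Equivalent form: the rate difference among the treated is prevented / n_treated."""
    return n_treated / prevented

N_TREATED = 1522
NO_TREATMENT_RECIDIVISTS = 3891  # Table 6, "No Treatment" row

# Recidivist counts per scheme, from Table 6.
RECIDIVISTS = {
    "Current State": 3606,
    "Needs-Responsivity": 3826,
    "Risk-Needs": 3749,
    "Risk-Responsivity": 3487,
    "Risk-Needs-Responsivity": 3481,
}

nnt_by_scheme = {
    scheme: round(nnt_from_counts(N_TREATED, NO_TREATMENT_RECIDIVISTS - count), 2)
    for scheme, count in RECIDIVISTS.items()
}
```

A lower NNT means fewer people must be treated to prevent one recidivist, which is why the schemes that include responsivity compare favorably with the current state.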
When we examine the overall impact for each of the four prioritization schemes,
we see that both the NR and RN schemes performed worse than the current scheme used
by the MnDOC. The NR model achieved 65 desistors for a NNT of 23.42, whereas the
RN scheme was slightly better with 142 desistors and a NNT of 10.72. In contrast, the
RR model netted 404 desistors, resulting in a NNT of 3.77. The RNR model yielded 410
desistors, resulting in a NNT of 3.71. Compared to the current scheme used by the
MnDOC, the RNR model would produce 125 more desistors, lowering the recidivism
rate by a little more than two percentage points.
Conclusion
Often consisting of little more than a checklist of items, the assessment of
responsivity has been the neglected “R” in the RNR model. Here we introduced a more
rigorous, actuarial approach for assessing responsivity by attempting to predict which
prisoners would desist from crime after participating in a correctional intervention. The
responsivity assessment we presented in this study not only accounts for the efficacy of
an intervention, but it can also be combined with risk and needs assessments to
potentially produce better treatment assignments.
The results showed the responsivity assessments had relatively high levels of
predictive performance for male and female prisoners. More important, however, the
findings suggest that including an actuarial assessment for responsivity can help further
enhance the effectiveness of an effective intervention. We observed the best recidivism
outcomes when we combined the responsivity assessments with those for risk and needs.
Prioritizing the highest risk and need offenders who would likely benefit the most from
CD treatment increased the treatment effect size, improved the NNT metric, and lowered
the overall recidivism rate by two percentage points. Even though the prevention of more
than 100 individuals from becoming recidivists may not seem substantial, a reduction of
this magnitude is notable because crime is costly. Indeed, the costs resulting from crime
include victimization costs, criminal justice system (law enforcement, courts, and
corrections) costs, offender lost productivity, and public willingness-to-pay costs (Cohen
and Piquero, 2009). Although property offenses generally incur a relatively low cost, it
has been estimated that violent crimes such as a sex offense can cost society up to a half
million dollars or, more significantly, that one murder costs between $10 and $20 million
(in 2018 dollars) (Cohen and Piquero, 2009; DeLisi, Kosloski, Sween, Hachmeister,
Moore, and Drury, 2010; McCollister, French, and Fang, 2010).
While the findings suggest that using actuarial responsivity assessments may help
maximize the public safety benefits from effective interventions by prioritizing offenders
more effectively, several limitations are worth highlighting. Most notably, we examined
only one intervention (CD treatment) for one needs area (substance abuse) for prisoners
from one jurisdiction (Minnesota). In addition, we examined only two types of
classification methods (logistic regression and random forests), and we used a very
simplistic, summative approach for combining the risk, needs, and responsivity
assessments. Therefore, the extent to which the findings presented here, which
should be considered preliminary, are generalizable remains unclear. Still, because the findings are
promising, below we discuss the implications they may have for correctional research,
policy, and practice.
First, the results suggest that factors commonly associated with recidivism, such
as criminal history, gang affiliation, or marital status, may also have an impact on
responsivity. Indeed, it is worth reiterating that our responsivity assessment models had
better predictive performance than those for recidivism. Therefore, factors affecting
responsivity to correctional interventions may not only include those typically considered
such as gender, culture, language, and motivation, but also those more commonly
associated with recidivism risk.
Second, in addition to considering factors normally associated with recidivism,
the approach for assessing responsivity we introduced here has the advantage of helping
empirically determine whether an intervention would be effective in reducing recidivism
for individual offenders. Within the current RNR framework, offenders are assigned to
interventions on the basis of risk, needs and, in some instances, responsivity. It is
generally unclear, however, whether the intervention is actually effective or, even if it is,
whether the individual would benefit from the intervention. Just because the literature
indicates that prison-based drug treatment is generally effective does not mean that a
specific drug treatment program will be effective in reducing recidivism. After all, issues
such as a lack of program integrity can compromise the effectiveness of a correctional
intervention (Duwe and Clark, 2015). Yet, by assigning individuals to effective
interventions that are, in turn, the best interventions for those individuals, the use of an
actuarial approach for assessing responsivity holds the potential of delivering better
recidivism outcomes overall.
Third, even though the RNR model recommends assigning offenders on the basis
of risk, needs, and responsivity, treatment assignment decisions are often made strictly on
the basis of risk and needs due to the absence of any formal assessments for responsivity.
As such, offenders who are prioritized for programming are those with the highest risk
and needs. Our findings suggest, however, that assigning offenders strictly on the basis of
risk and needs may not deliver the desired results. Indeed, when we assigned offenders
just on the basis of risk and needs, we observed a reduced effect size for CD treatment, a
higher NNT, and fewer prevented recidivists. What these findings suggest is that many of
the highest-risk individuals may be too entrenched in a criminal lifestyle to desist as a
result of participating in CD treatment. While CD treatment may be enough to get lower-
risk prisoners to desist, more programming is needed for the higher-risk offenders. This
finding is consistent with the notion that greater doses of programming (i.e., multiple
interventions that address multiple needs areas) are needed for the highest-risk offenders
to help bring about desistance (Lowenkamp and Latessa, 2005).
Finally, notwithstanding the focus on a single correctional intervention in this
study, we suggest that simultaneously assessing responsivity to multiple interventions
may yield the greatest benefits. Correctional agencies typically have more than one
intervention to offer offenders and, as noted above, a single intervention may be
insufficient to bring about desistance for those with a higher risk for recidivism.
Therefore, the goal should involve conducting responsivity assessments for all
interventions an agency may have to provide offenders.
For example, let us assume a corrections agency has five interventions to which
offenders can be assigned on the basis of a risk and needs assessment. Responsivity
assessments for each of the five interventions may help better identify which programs
would work best for each individual offender. Moreover, for the higher-risk offenders
with longer confinement periods, which would allow for participation in multiple
programs, the responsivity assessment could evaluate which combinations of
interventions would most likely lead to desistance.
To illustrate, let us assume we have a very high risk individual who will be in
prison for two years, which is ample time to participate in multiple interventions. Let us
further assume a single intervention is unlikely to result in desistance for this individual.
If completing, say, CD treatment is unlikely to help this individual desist, what would his
probability for desistance be after completing CD treatment and an employment program
or cognitive-behavioral therapy? Responsivity assessments to multiple interventions
might reveal the best combination of programming for this individual and, in doing so,
would help deliver better recidivism outcomes overall.
As indicated by the limitations noted earlier, this study should be considered a
first step towards taking a more rigorous, actuarial approach to responsivity assessment.
Future research should examine whether this approach is effective for other types of
interventions for different offender populations in other jurisdictions. Along the same
lines, future studies should look at whether actuarial responsivity assessments can
accommodate multiple interventions so as to identify which intervention might work best
for an individual or whether multiple interventions are needed to achieve desistance for
higher-risk offenders. In addition, because we used a simple summative approach in
combining the values from the risk, needs, and responsivity assessments, future research
should examine whether there are more effective procedures for consolidating values into
a composite score.
If an actuarial approach for assessing responsivity is proven to be viable and
generalizable, there would undoubtedly be questions about how best to implement this
approach in practice. Given the reliance on historical programming data to assess
responsivity, the method we introduced here would seem to favor a more customized
assessment process that is specific to an agency and the programming it provides. This
does not mean, however, that a more generic actuarial responsivity assessment could not
be developed and integrated with global, off-the-shelf risk and needs assessments that are
used across multiple jurisdictions. Regardless of whether a valid and reliable generic
assessment can be developed, our findings suggest that actuarial responsivity assessment
is an area in need of more research in the future due to the potential impact it could have
on the programming assignment process and, more broadly, public safety.
REFERENCES
Barnes, G.C. & Hyatt, J.M. (2012). Classifying adult probationers by forecasting future
offending. National Institute of Justice: Washington, DC.
Berk, R.A., & Bleich, J. (2013). Statistical procedures for forecasting criminal behavior:
A comparative assessment. Criminology & Public Policy 12: 513-544.
Bonta, J. & Andrews, D.A. (2007). Risk-Needs-Responsivity Model for Offender
Assessment and Rehabilitation. Ottawa: Public Safety Canada.
Bonta, J., S. Wallace-Capretta, & J. Rooney, (2000). A Quasi-Experimental Evaluation of
an Intensive Rehabilitation Supervision Program. Criminal Justice and Behavior,
27, 312-329.
Breiman, L. (2001). Random forests. Machine Learning, 45, 5-32.
Brennan, T., Dieterich, W., & Ehret, B. (2009). Evaluating the predictive validity of the
COMPAS risk and needs assessment system.
Brennan, T., & Oliver, W.L. (2000). Evaluation of Reliability and Validity of COMPAS
Scales: National Aggregate Sample. Traverse City, MI: Northpointe Institute for
Public Management.
Burgess, E.W. (1928). Factors determining success or failure on parole. In A.A. Bruce,
E.W. Burgess, J. Landesco, & A.J. Harno (Eds.), The workings of the
indeterminate sentence law and the parole system in Illinois, (pp. 221–234).
Springfield, IL: Illinois State Board of Parole.
Caruana, R., Niculescu-Mizil, A., Crew, G., & Ksikes, A. (2004). Ensemble selection
from libraries of models, in Proceedings of the 21st International Conference on
Machine Learning, Canada: Banff, 1-12.
Caruana, R. & Niculescu-Mizil, A. (2006). An empirical comparison of supervised
learning algorithms using different performance metrics, in Proceedings of the
23rd International Conference on Machine Learning, New York: Association for
Computing Machinery, 161-168.
Cohen, M. A., & Piquero, A.R. (2009). New evidence on the monetary value of saving a
high risk youth. Journal of Quantitative Criminology, 25, 25-49.
Cullen, F. T. (2002) “Rehabilitation and Treatment Programs.” In J. Q. Wilson and J.
Petersilia (eds.), Crime: Public Policies for Crime Control, 2nd edition. San
Francisco: ICS Press.
Davis, J. & Goadrich, M. (2006). The relationship between precision-recall and ROC
curves, in Proceedings of the 23rd International Conference on Machine
Learning, Canada: Banff, 1-12.
DeLisi, M., Kosloski, A., Sween, M., Hachmeister, E., Moore, M., & Drury, A. (2010).
Murder by numbers: Monetary costs imposed by a sample of homicide offenders.
The Journal of Forensic Psychiatry & Psychology, 21:501-513.
Duwe, G. (2010). Prison-based chemical dependency treatment in Minnesota: An
outcome evaluation. The Journal of Experimental Criminology, 6: 57-81.
Duwe, G. (2012). Predicting first-time sexual offending among prisoners without a prior
sex offense history: The Minnesota Sexual Criminal Offending Risk Estimate
(MnSCORE). Criminal Justice and Behavior, 39, 1,434-1,454.
Duwe, G. (2014). The development, validity, and reliability of the Minnesota Screening
Tool Assessing Recidivism Risk (MnSTARR). Criminal Justice Policy Review,
25, 579-613.
Duwe, G. & Clark, V. (2015). Importance of program integrity: Outcome evaluation of a
gender-responsive, cognitive-behavioral program for female offenders.
Criminology & Public Policy, 14, 301-328.
Duwe, G. & Freske, P. (2012). Using logistic regression modeling to predict sex offense
recidivism: The Minnesota Sex Offender Screening Tool-3 (MnSOST-3). Sexual
Abuse: A Journal of Research and Treatment, 24, 350-377.
Duwe, G. & Kim, K. (2016). Sacrificing accuracy for transparency in recidivism
risk assessment: The impact of classification method on predictive performance.
Corrections: Policy, Practice and Research, 1, 155-176.
Gottfredson, S.D. & Moriarty, L.J. (2006). Statistical risk assessment: Old problems and
new applications. Crime and Delinquency, 52(1), 178–200.
Hamilton, Z., Neuilly, M-A., Lee, S., & Barnoski, R. (2014). Isolating modeling effects
in offender risk assessment. Journal of Experimental Criminology. DOI:
10.1007/s11292-014-9221-8.
Hand, D. J. (2009). Measuring classifier performance: a coherent alternative to the area
under the ROC curve. Machine Learning, 77, 103-123.
Hess, J. & Turner, S. (2013). Risk Assessment Accuracy in Corrections Population
Management: Testing the Promise of Tree Based Ensemble Predictions. Center
for Evidence-Based Corrections: The University of California, Irvine.
Liu, Y.Y., Yang, M., Ramsey, M., Li, X.S., & Coid, J.W. (2011). A comparison of
logistic regression, classification and regression tree, and neural network models
in predicting violent re-offending. Journal of Quantitative Criminology, 27, 547-
573.
placement. Criminology and Public Policy, 4, 501-528.
Lowenkamp, C.T. & Whetzel, J. (2009). The development of an actuarial risk assessment
instrument for U.S. Pretrial Services. Federal Probation, 73, 33-36.
McCollister, K.E., French, M.T., & Fang, H. (2010). The cost of crime to society: New
crime-specific estimates for policy and program evaluation. Drug and Alcohol
Dependence, 108, 98-109.
Ridgeway, G. (2013). The Pitfalls of Prediction. National Institute of Justice Journal,
Issue No. 271.
Smith, W. (1996). The effects of base rate and cutoff point choice on commonly used
measures of association and accuracy in recidivism research. Journal of
Quantitative Criminology, 12, 83-111.
Tollenaar, N., & van der Heijden, P.G.M. (2013). Which method predicts recidivism
best? A comparison of statistical, machine learning and data mining predictive
methods. Journal of the Royal Statistical Society, Series A 176 (part 2): 565-584.
Wexler, H.K., Falkin, G.P. & Lipton, D.S. (1990). Outcome evaluation of a prison
therapeutic community for substance abuse treatment. Criminal Justice and
Behavior, 17, 71-92.
Wolpert, D.H. (1996). The lack of a priori distinctions between learning algorithms.
Neural Computation, 8, 1,341-1,390.