A Regression Analysis on the Determinants of Crime Rates Across Philippine
Provinces
A study presented to
Ms. Angela D. Nalica
Professor, Stat 136
In partial fulfilment of the requirements for
STAT 136: Introduction to Regression Analysis
University of the Philippines, Diliman, Quezon City
CRUZ, Clemence-Fatima
MACARAIG, Miguel Rodrigo
SANTOS, Marvin Allan
May 28, 2010
Abstract
This study focuses on the provincial crime rate in the Philippines for the year
2000. The aim is to ascertain the possible factors that affect the crime rate in the
Philippines using multiple linear regression. The results show that, at a 0.05 level of
significance, the following variables contribute to crime rate: population density, poverty
incidence, number of policemen, and number of courts.
Introduction
Crime is a truth that exists for all, whether it is taken as a moral or legal construct.
This is a truth that we would have to accept, no matter how appalling it may seem. Many
efforts have been exerted in order to eradicate crime. Unfortunately, a world or a country
without crime is strictly utopian. As such, an existence without crime is impossible to
achieve. To eradicate the concept of crime, we would first have to remove the
concepts it violates, namely morals and laws, which is again a utopian task. That
discourse is meant for philosophical minds and is therefore beyond our
concerns. Thus, the best course of action is to deter crime, that is, to lower
the crime rate that prevails in our country.
Here in the Philippines, crime is one of the foremost problems. However,
in the midst of more urgent problems such as poverty, corruption, and hunger, crime
loses most of its significance and is relegated to the bottom of the Philippines's long
list of problems. The discussion of crime deterrence becomes limited to debates on revising
punishments for crimes and reinstating the death penalty, a punishment which has no
proven deterrent effect on crime. What country officials fail to recognize is that band-aid
solutions such as imposing severe punishments do not work on large-scale problems
such as this. If so, what do we have to do in order to address this problem?
To any problem, the solution is to ascertain its true cause and attack it from its
roots. This may sound easy and simple enough; however, in our country, where problems
are more tangled than politics, this would be a complicated task. In most cases, people do
not seem to know for certain which problem causes which. For example: is the
Philippines poor because there is a high incidence of corruption? Or is the Philippines
corrupt because there is poverty? The same goes for crime rate. Is there a high rate of
crime because the Philippines is poor? Or is it that the Philippines is poor because there is
a high crime rate? Here lies the dilemma.
If the possible factors that affect crime rate are correctly or sufficiently identified,
a more feasible solution, by means of addressing or alleviating these factors, may be
formulated. And thus, we ask this question: what are the possible factors that affect crime
rate here in the Philippines, and by how much do these factors affect crime rate? It is in
hopes of answering these questions that this study was conducted.
This study aims to ascertain the possible factors that affect the crime rate here in
the Philippines, and to provide an estimate regarding the impact that these factors present
through the use of multiple linear regression. This study will focus on crime rate in the
Philippines using provincial data from the National Statistical Coordination Board
(NSCB). All data retrieved were from the year 2000.
Definition of Terms†
Crime Rate** – the number of reported crimes per 100,000 population.
Cohort Survival Rate – the percentage of enrollees at the beginning grade or year in a
given school year who reached the final grade or year of the elementary or
secondary level.
Consumer Price Index (CPI) – Indicator of the change in the average prices of a fixed
basket of goods and services commonly purchased by households relative to a
base year.
Enrolment - total number of pupils/students who register/enlist in a school year.
Family Income – includes primary income and receipts from other sources received by
all family members during the calendar year as participants in any economic
activity or as recipients of transfers, pensions, grants, etc. (2000 FIES, NSO)
Primary income includes:
• Salaries and wages from employment.
• Commissions, tips, bonuses, family and clothing allowance, transportation and
representation allowance and honoraria.
• Other forms of compensation and net receipts derived from the operation of
family-operated enterprises/activities and the practice of profession or trade.
Income from other sources include:
• Imputed rental values of owner-occupied dwelling units.
• Interests.
• Rentals including landowner's share of agricultural products
• Pensions
• Support and value of food and non-food items received as gifts by the family (as
well as the imputed value of services rendered free of charge to the family).
• Receipts from family sustenance activities, which are not considered as family
operated enterprise.
Family Expenditures – refers to the expenses or disbursements made by the family
purely for personal consumption during the reference period. They exclude all
expenses in relation to farm or business operations, investment ventures, purchase
of real property and other disbursements which do not involve personal
consumption. Gifts, support, assistance or relief in goods and services received by
the family from friends, relatives, etc. and consumed during the reference period
are included in the family expenditures. Value consumed from net share of crops,
fruits and vegetables produced or livestock raised by other households, family
sustenance and entrepreneurial activities are also considered as family
expenditures.
Functional Literacy – a significantly higher level of literacy which includes not
only reading and writing skills but also numeracy skills. This skill must be
sufficiently advanced to enable the individual to participate fully and effectively
in activities commonly occurring in his life situation that require a reasonable
capability of communicating by written language.
Gini Ratio – the ratio of the area between the Lorenz curve and the diagonal (the line of
perfect equality) to the area below the diagonal.
Notes: It is a measure of the extent to which the distribution of
income/ expenditure among families/individuals deviates
from a perfectly equal distribution, with limits 0 for perfect
equality and 1 for perfect inequality.
Gross Regional Domestic Product - aggregate of the gross value added or income from
each industry or economic activity of the regional economy.
Human Development Index - a measure of how well a country has performed, not only
in terms of real income growth, but also in terms of social indicators of people's
ability to lead a long and healthy life, to acquire knowledge and skills, and to have
access to the resources needed to afford a decent standard of living.
Literacy rate, Simple/Basic – the percentage of the population 10 years old and over,
who can read, write and understand simple messages in any language or dialect.
Population Density – refers to the number of persons per unit of land area (usually in
square kilometers). This measure is more meaningful if given as population per
unit of arable land.
Poverty Incidence – the proportion of families/individuals with per capita income /
expenditure less than the per capita poverty threshold to the total number of
families/individuals.
Province – the largest unit in the political structure of the Philippines. It consists, in
varying numbers, of municipalities and, in some cases, of component cities. Its
functions and duties in relation to its component cities and municipalities are
generally coordinative and supervisory.
Social Services - this covers expenditures for education, health, social security, labor and
employment, housing and community development and other social activities.
Unemployment Rate – proportion in percent of the total number of unemployed persons
to the total number of persons in the labor force.
__________________
†National Statistical Coordination Board. (2009). Philippine statistical yearbook. (2009 edition). Makati City,
Philippines: Author.
**Note that crime rate = (total crime incidence/population)*100,000. 100,000 is a magnifier, and as such any power of
ten may be used. Usually, 1,000 and 100,000 are used as magnifiers.
Review of Related Literature
Crime was formerly viewed as a moral and social construct. Over the years,
however, there has been a shift from a social point of view to an economic one. At the
forefront of this economic view on crime is Gary Becker's work Crime and Punishment:
An Economic Approach, published in 1968. Here, Becker views crime as an economic
construct, one that involves opportunity and economic costs and has a supply of offenses.
He further asserts that "some persons become 'criminals'...not because their basic
motivation differs...but because their benefits and costs differ." Thus, according to
Becker, crime is not entirely psychological or social, but also economic, with choices and
utility being of importance.
Ehrlich (On the Relation Between Education and Crime, 1975) attempts to
establish a link between education and crime, again from an economic perspective. In his
work, Ehrlich states that education may be viewed as an opportunity-maker. Education,
Ehrlich postulates, is important for on-the-job training. These two, in turn, help determine
labour distribution and personal income. He further states that it is not educational
attainment that is closely related to crimes. Rather, it is the "inequalities in the
distribution of schooling" that is "strongly related to the incidence of many crimes."
Ehrlich echoes Becker's statement that the behaviour of crime is not merely
psychological or social, but also economic. Specifically, this is due to the "relative
earnings" of offenders between legitimate and illegitimate activities. As a form of
rehabilitation, Ehrlich suggests training geared towards legitimate activities before
convicts are released from prison.
According to Wadsworth (2001), employment is an important factor that affects
crime rate. He asserts, "both industrial composition and labor force participation…have
direct and indirect effects on violent and property crime rates. These effects cannot be
explained entirely by the fact that individuals who are unemployed commit more crimes.
There is a contextual influence of weak labor market opportunity that operates above and
beyond influencing individual employment experiences." As such, it is not merely the
fact that they are unemployed that turns them into criminals. It is mostly due to the
individual experiences a person has of employment or lack thereof. This is a perspective
that Ehrlich, Becker, and Wadsworth seem to share.
As for Reynolds (2000), implementing more 'get-tough policies' would be
a helpful deterrent to crime, since 'federal programs to reduce the so-called root
causes have done…more harm than good.' Put bluntly, Reynolds believes that mere
programs to alleviate the 'root causes' of crime would not deter it; serious and strict
policy-making is the key to deterring crime. Other factors may be the urbanization of
a place (since urbanization opens new avenues for crimes) and police visibility
(Sanidad-Leones, 2010).
Yasir, et al. (2009) also state in their study of crime rate in Pakistan that poverty,
unemployment, inflation, and volatile policies may contribute to the rise of crime rate.
They further assert that a possible way to alleviate this would be the formulation of stable
economic policies. In general, economic factors such as those mentioned above affect crime
rate. This is especially true for the "policy-sensitive variables." A possible solution to this
is the "combination of counter-cyclical redistributive policies…and increases in the
resources of apprehending and convicting criminals…especially during economic
recessions" (Fajnzylber, et al., 1998).
Gillado and Cruz (2004) constructed a regression model for three different
classifications of crime—against property, against person, and rape. It is in this work that
they incorporated social, demographic, and economic factors. The following variables
were considered for the three models: per capita regional domestic products, average
income of people in rural and urban areas, cohort survival rates in elementary and
secondary education, corruption index, police population, population density, alcohol
consumption, Gini coefficient, unemployment rate, and consumer price index.
Although crime may still be a social construct, it can be observed that there is a
shift from a social perspective to an economic one. However, considering only one
perspective would be insufficient, since both perspectives are applicable to crime.
Although there may be opposing views regarding the factors of crime, it cannot be denied
that these factors, both economic and social, play their roles in affecting crime.
The variables considered for this study are heavily based on the works
abovementioned. For this particular study, the variables that were taken into
consideration are poverty incidence, unemployment rate, police force population, CPI,
and population density. These variables coincide with the variables mentioned in the
study of Gillado and Cruz. Data on cohort survival rates, however, are not available for
provinces. In order to account for the possible effect of education on crime, the
researchers have included literacy, functional literacy, and enrolment as variables. Other
variables that were not included in the abovementioned works but are included in this study are
expenditures on social services, number of courts, family income and expenditure, human
development index (HDI), and geographical setting (based on archipelagic division).
Methodological Sketch
The data used in this study are obtained from the National Statistical Coordination
Board's publication The Philippine Countryside in Figures, which is available both in
print and in electronic format. The electronic format may be accessed via
http://www.nscb.gov.ph/countryside/default.asp. Note that in the NSCB Philippine
Statistical Yearbook, crime rate is defined as the number of crimes per 100,000
population. However, for this study, crime rate is defined per 1,000 population.
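The rescaling between the NSCB convention (per 100,000) and the one used in this study (per 1,000) can be illustrated with a minimal sketch. The function name and the example counts below are hypothetical and are not taken from the data set.

```python
def crime_rate(reported_crimes: int, population: int, per: int = 1_000) -> float:
    """Reported crimes per `per` residents.

    The default magnifier of 1,000 follows this study; passing 100_000
    gives the NSCB yearbook convention.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    return reported_crimes / population * per


# Hypothetical province: 430 reported crimes among 100,000 residents.
rate_study = crime_rate(430, 100_000)           # roughly 4.3 per 1,000
rate_nscb = crime_rate(430, 100_000, 100_000)   # roughly 430 per 100,000
```

The two figures differ only by a factor of 100, so the choice of magnifier rescales the regression coefficients without changing which variables are significant.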
A level of significance of 0.05 was set prior to any fitting or testing procedure.
This level of significance was chosen since studies in crime rate do not present very
severe consequences. However, the subject matter itself is of importance, since it is one
of the foremost problems present in the Philippines.
Before fitting, the data is subjected to checking. Here we check for any missing
values for the variables, especially for the dependent variable. Since SAS would omit
observations with missing values, these observations were deleted from the data set. The
data set was also checked for any possible encoding errors. For example, the province of
Camiguin recorded a number of 73549 policemen, whereas the rest of the observations
would range from about 400 to 2000 only. This observation was, then, deleted as it
presents a possible encoding error.
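The screening step can be sketched as follows. This is only an illustration: the study identified the Camiguin entry by inspection, whereas the rule used here (flagging a value more than ten times the median) is an assumption, and every record except the Camiguin police count of 73549 is hypothetical.

```python
from statistics import median


def screen_records(records, field, max_ratio=10.0):
    """Split records into (kept, missing, flagged).

    missing: records with at least one missing (None) value.
    flagged: complete records whose `field` exceeds max_ratio times the
             median of that field (an assumed rule, not the study's).
    """
    complete = [r for r in records if all(v is not None for v in r.values())]
    missing = [r for r in records if any(v is None for v in r.values())]
    typical = median(r[field] for r in complete)
    kept = [r for r in complete if r[field] <= max_ratio * typical]
    flagged = [r for r in complete if r[field] > max_ratio * typical]
    return kept, missing, flagged


# Hypothetical records; only the Camiguin police count comes from the text.
records = [
    {"province": "Province A", "pnp": 450, "crime": 3.1},
    {"province": "Province B", "pnp": 1200, "crime": None},
    {"province": "Province C", "pnp": 1900, "crime": 5.6},
    {"province": "Camiguin", "pnp": 73549, "crime": 2.0},
    {"province": "Province D", "pnp": 800, "crime": 4.4},
]
kept, missing, flagged = screen_records(records, "pnp")
```

Here the record with a missing crime rate is set aside, and the Camiguin record is flagged for review because its police count is far outside the range of the others.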
The variable crime rate was then regressed on the seventeen variables population
density, poverty incidence, family income, family expenditures, literacy rate, functional
literacy rate, consumer price index, human development index, unemployment rate,
expenditures on social services, number of courts, number of policemen, enrolment rate,
cohort survival rates for elementary and secondary education, and the two dummy
variables for location. In order to check whether at least one of the independent variables
would be able to explain the variability found in crime rate, the F-test was used. The
significance of each independent variable was assessed through the t-test, in which a
p-value greater than the stipulated 0.05 level of significance leads to the removal of
that independent variable. The independent variable with the
highest p-value is removed first. Crime rate is then regressed on the remaining
independent variables, and the same process is repeated until all the independent
variables become significant.
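One step of this elimination procedure can be sketched as below, using the p-values reported in Table 2 for the initial model. The function name and the `keep` argument are illustrative assumptions; refitting the model after each removal is left to the statistical package.

```python
def next_variable_to_drop(p_values, alpha=0.05, keep=()):
    """Return the variable with the largest p-value above alpha, or None.

    The intercept and any variables listed in `keep` are never candidates.
    """
    candidates = {
        name: p
        for name, p in p_values.items()
        if name != "Intercept" and name not in keep and p > alpha
    }
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# p-values of the initial model, copied from Table 2.
table2_p = {
    "Intercept": 0.7695, "POPDEN": 0.0333, "POVINC": 0.0309,
    "UNEMPLOYR": 0.1340, "PNP": 0.0001, "FAMINC": 0.0972,
    "FAMEXP": 0.3285, "LITR": 0.4783, "FLIT": 0.2282,
    "ENROLMENT": 0.4564, "CPI": 0.6306, "GEOG1": 0.5507,
    "GEOG2": 0.4818, "HDI": 0.0784, "SOCSERV": 0.1082,
    "COURTS": 0.0003, "COHSURVE": 0.0351, "COHSURVS": 0.6053,
}
first_removed = next_variable_to_drop(table2_p)
```

The `keep` argument reflects the fact that a variable may be retained on theoretical grounds even when its p-value exceeds the level of significance, as is later done for unemployment rate.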
The coefficient of multiple determination, R², is not the only criterion in checking
the soundness of the model. In order to assess whether or not the model is good, several
diagnostic tests have to be performed. These include tests on multicollinearity, normality,
heteroskedasticity, linearity, and autocorrelation. Furthermore, an assessment of outliers
is essential to ascertain whether or not any outliers would greatly influence the model.
Multicollinearity among the independent variables was checked using condition
indices and the proportion of variation. Normality was checked using the Shapiro-Wilk,
Kolmogorov-Smirnov, Cramér-von Mises, and Anderson-Darling tests, for which
p-values should not be less than the 0.05 level of significance. Heteroskedasticity was
checked through the shape of the residual plot (residuals versus the predicted value of y);
a funnel-shaped plot would indicate heteroskedasticity. In order to be more certain
about problems with heteroskedasticity, White's test and the spec option were utilized.
Autocorrelation was checked using the Durbin-Watson statistic, for which a value of d
close to 2 is desired. Departures from linearity were checked using partial regression
plots. Outliers have to be detected in two areas: the dependent variable and the
independent variables. Outliers in the dependent variable are detected using Studentized
residuals: an observation whose Studentized residual exceeds the corresponding critical
value from the t table is considered an outlier. Outliers in the independent variables are
detected through the leverage (the diagonal elements of the hat matrix): an observation
whose leverage exceeds the cut-off 2p/n is considered an outlier. Not all outliers are
influential, so it is also necessary to check their influence, which may be done through
Cook's D, DFFITS, and DFBETAS.
If any of these criteria is violated, necessary actions such as transformations and
removal of unimportant independent variables would have to be performed. On the other
hand, if the proposed model meets all the criteria, then the model can reasonably predict
crime rate.
Results and Discussion
Preliminary results and Diagnostic Checking
In order to ascertain which factors influence crime rate, a regression model was
built. The initial model for crime rate is
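\[
\begin{aligned}
\text{crime} ={}& \beta_0 + \beta_1\,\text{popden} + \beta_2\,\text{povinc} + \beta_3\,\text{unemployr} + \beta_4\,\text{pnp} + \beta_5\,\text{faminc} \\
&+ \beta_6\,\text{famexp} + \beta_7\,\text{litr} + \beta_8\,\text{flit} + \beta_9\,\text{enrolment} + \beta_{10}\,\text{cpi} + \beta_{11}\,\text{hdi} \\
&+ \beta_{12}\,\text{socserv} + \beta_{13}\,\text{courts} + \beta_{14}\,\text{geog1} + \beta_{15}\,\text{geog2} + \beta_{16}\,\text{cohsurve} \\
&+ \beta_{17}\,\text{cohsurvs} + \varepsilon
\end{aligned}
\]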
where crime = crime rate per province
popden = population density per province
povinc = poverty incidence per province
unemployr = unemployment rate per province
pnp = number of policemen per province
faminc = average family income per province
famexp = average family expenditure per province
litr = literacy rate per province
flit = functional literacy per province
enrolment = enrolment rate per province
cpi = consumer price index per province
hdi = human development index per province
socserv = expenditures on social services per province
courts = number of courts per province
geog1 = 1 if the province is in Luzon, 0 if otherwise
geog2 = 1 if the province is in Visayas, 0 if otherwise.
cohsurve = cohort survival rate for elementary education
cohsurvs = cohort survival rate for secondary education
ε ~ N(0, σ²)
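The two location dummies can be illustrated with a short sketch. It assumes that the "otherwise" category shared by geog1 and geog2 is Mindanao, which follows from the archipelagic division but is not stated explicitly in the definitions; the function name is hypothetical.

```python
def encode_island_group(island_group: str) -> tuple[int, int]:
    """Return (geog1, geog2) for a province's island group.

    geog1 = 1 for Luzon, geog2 = 1 for Visayas; Mindanao is assumed to be
    the baseline category with both dummies equal to 0.
    """
    group = island_group.strip().lower()
    if group not in {"luzon", "visayas", "mindanao"}:
        raise ValueError(f"unknown island group: {island_group!r}")
    return (int(group == "luzon"), int(group == "visayas"))


luzon = encode_island_group("Luzon")
visayas = encode_island_group("Visayas")
mindanao = encode_island_group("Mindanao")
```

Using two dummies for three groups avoids perfect collinearity with the intercept; the coefficients of geog1 and geog2 are then read as differences from the baseline group.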
Using the F-test under ANOVA, it is apparent that at least one of the independent
variables can explain crime rate. And in checking the t-values, there are, indeed, some
independent variables that are significant.
Table 1. Analysis of Variance Results
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              17        1102.64607       64.86153       7.42    <.0001
Error              40         349.53181        8.73830
Corrected Total    57        1452.17788

Root MSE            2.95606    R-Square    0.7593
Dependent Mean      4.30810    Adj R-Sq    0.6570
Coeff Var          68.61629
Table 2. Individual T-tests
Parameter Estimates
Variable Label DF Parameter Estimate Standard Error t Value Pr > |t|
Intercept Intercept 1 6.50746 22.06194 0.29 0.7695
POPDEN POPDEN 1 0.00550 0.00250 2.20 0.0333
POVINC POVINC 1 -0.14712 0.06577 -2.24 0.0309
UNEMPLOYR UNEMPLOYR 1 -0.15076 0.09856 -1.53 0.1340
PNP PNP 1 -0.00532 0.00126 -4.22 0.0001
FAMINC FAMINC 1 -0.00013465 0.00007928 -1.70 0.0972
FAMEXP FAMEXP 1 0.00009779 0.00009885 0.99 0.3285
LITR LITR 1 0.12483 0.17442 0.72 0.4783
FLIT FLIT 1 -0.13971 0.11416 -1.22 0.2282
ENROLMENT ENROLMENT 1 -0.07913 0.10521 -0.75 0.4564
CPI CPI 1 -0.02358 0.04865 -0.48 0.6306
GEOG1 GEOG1 1 1.23942 2.05934 0.60 0.5507
GEOG2 GEOG2 1 1.09009 1.53537 0.71 0.4818
HDI HDI 1 49.39113 27.34424 1.81 0.0784
SOCSERV SOCSERV 1 0.00952 0.00579 1.64 0.1082
COURTS COURTS 1 0.45229 0.11380 3.97 0.0003
COHSURVE COHSURVE 1 -0.27471 0.12593 -2.18 0.0351
COHSURVS COHSURVS 1 0.11458 0.21994 0.52 0.6053
The coefficient of multiple determination has a value of 0.7593. This means that
the model formulated can explain 75.93 percent of the variability found in crime rate. The
mean sum of squares due to regression is also relatively large compared to the mean sum
of squares due to error. This means that the variability found in crime rate may be
attributed to the regression model rather than the error.
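The summary statistics in Table 1 follow directly from the sums of squares and degrees of freedom, as the sketch below shows. The function name is hypothetical; the formulas are the usual ANOVA identities for a model with an intercept.

```python
import math


def anova_summary(ss_model, ss_error, df_model, df_error):
    """Derive F, R-square, adjusted R-square and root MSE from an ANOVA table."""
    ss_total = ss_model + ss_error
    df_total = df_model + df_error
    ms_model = ss_model / df_model
    ms_error = ss_error / df_error
    r2 = ss_model / ss_total
    # Adjusted R-square penalises R-square for the number of predictors.
    adj_r2 = 1 - (1 - r2) * df_total / df_error
    return {
        "F": ms_model / ms_error,
        "R2": r2,
        "adj_R2": adj_r2,
        "root_mse": math.sqrt(ms_error),
    }


# Sums of squares and degrees of freedom from Table 1.
initial_model = anova_summary(1102.64607, 349.53181, 17, 40)
```

The gap between R² and adjusted R² in the initial model reflects the large number of predictors relative to the 58 observations used.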
Out of the seventeen independent variables in the model, only five turned out to
be significant at the 0.05 level. These are population density, poverty incidence, number of
policemen, number of courts, and cohort survival rate for elementary education. The first
variable to be removed is CPI, since it has the highest p-value among the independent
variables (0.6306).
Table 3. Individual T-tests without CPI
Parameter Estimates
Variable Label DF Parameter Estimate Standard Error t Value Pr > |t|
Intercept Intercept 1 1.22757 19.00492 0.06 0.9488
POPDEN POPDEN 1 0.00516 0.00237 2.18 0.0354
POVINC POVINC 1 -0.14110 0.06398 -2.21 0.0331
UNEMPLOYR UNEMPLOYR 1 -0.15067 0.09764 -1.54 0.1305
PNP PNP 1 -0.00531 0.00125 -4.25 0.0001
FAMINC FAMINC 1 -0.00013025 0.00007802 -1.67 0.1027
FAMEXP FAMEXP 1 0.00009276 0.00009738 0.95 0.3464
LITR LITR 1 0.11264 0.17098 0.66 0.5137
FLIT FLIT 1 -0.14148 0.11303 -1.25 0.2178
ENROLMENT ENROLMENT 1 -0.07440 0.10378 -0.72 0.4775
GEOG1 GEOG1 1 1.01635 1.98843 0.51 0.6120
GEOG2 GEOG2 1 0.98527 1.50581 0.65 0.5166
HDI HDI 1 52.12947 26.50338 1.97 0.0560
SOCSERV SOCSERV 1 0.00910 0.00567 1.60 0.1165
COURTS COURTS 1 0.44333 0.11123 3.99 0.0003
COHSURVE COHSURVE 1 -0.26792 0.12397 -2.16 0.0366
COHSURVS COHSURVS 1 0.12224 0.21731 0.56 0.5768
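After removing CPI, cohort survival rate for elementary education remained significant
(p = 0.0366, compared with 0.0351 in the initial model). The
R² dropped from 0.7593 to 0.7579. GEOG1 was then removed from the model because it has
the highest p-value among the remaining variables (0.6120).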
Table 4. Individual T-tests without CPI and GEOG1
Parameter Estimates
Variable Label DF Parameter Estimate Standard Error t Value Pr > |t|
Intercept Intercept 1 -3.28597 16.68016 -0.20 0.8448
POPDEN POPDEN 1 0.00495 0.00232 2.14 0.0384
POVINC POVINC 1 -0.12928 0.05913 -2.19 0.0344
UNEMPLOYR UNEMPLOYR 1 -0.14410 0.09593 -1.50 0.1406
PNP PNP 1 -0.00547 0.00120 -4.55 <.0001
FAMINC FAMINC 1 -0.00013425 0.00007695 -1.74 0.0884
FAMEXP FAMEXP 1 0.00010185 0.00009490 1.07 0.2893
LITR LITR 1 0.09829 0.16717 0.59 0.5597
FLIT FLIT 1 -0.14496 0.11183 -1.30 0.2020
ENROLMENT ENROLMENT 1 -0.05650 0.09682 -0.58 0.5626
GEOG2 GEOG2 1 0.58537 1.27523 0.46 0.6486
HDI HDI 1 53.08694 26.20356 2.03 0.0492
SOCSERV SOCSERV 1 0.00877 0.00559 1.57 0.1240
COURTS COURTS 1 0.46781 0.09950 4.70 <.0001
COHSURVE COHSURVE 1 -0.25504 0.12031 -2.12 0.0400
COHSURVS COHSURVS 1 0.16347 0.20001 0.82 0.4184
For this model the R² is 0.7563, which is still close to the initial
model's R². HDI is now significant (p = 0.0492), in addition to population density,
poverty incidence, number of policemen, number of courts, and cohort survival rate for
elementary education. The variable GEOG2 is then deleted since its p-value is the
highest among the remaining variables.
Table 5. Individual T-tests without CPI, GEOG1, and GEOG2
Parameter Estimates
Variable Label DF Parameter Estimate Standard Error t Value Pr > |t|
Intercept Intercept 1 -2.28893 16.38564 -0.14 0.8896
POPDEN POPDEN 1 0.00495 0.00230 2.16 0.0366
POVINC POVINC 1 -0.12928 0.05859 -2.21 0.0327
UNEMPLOYR UNEMPLOYR 1 -0.13270 0.09181 -1.45 0.1556
PNP PNP 1 -0.00533 0.00115 -4.63 <.0001
FAMINC FAMINC 1 -0.00013622 0.00007612 -1.79 0.0806
FAMEXP FAMEXP 1 0.00010224 0.00009402 1.09 0.2829
LITR LITR 1 0.09915 0.16562 0.60 0.5525
FLIT FLIT 1 -0.14972 0.11032 -1.36 0.1818
ENROLMENT ENROLMENT 1 -0.05250 0.09554 -0.55 0.5855
HDI HDI 1 53.44495 25.95045 2.06 0.0455
SOCSERV SOCSERV 1 0.00898 0.00552 1.63 0.1107
COURTS COURTS 1 0.46252 0.09792 4.72 <.0001
COHSURVE COHSURVE 1 -0.24798 0.11823 -2.10 0.0419
COHSURVS COHSURVS 1 0.14043 0.19183 0.73 0.4681
Here, it can be observed that there are six significant variables after removing
geog2. The coefficient of multiple determination dropped from 0.7563 to 0.7551, which is
still not far from the initial R².
The next variables to be deleted were enrolment, literacy rate, cohort survival rate
for secondary education, functional literacy, family expenditure, family income, human
development index, cohort survival rate for elementary education, and expenditures on
social services, deleted one at a time. The results are shown on the following table.
Table 6. ANOVA Results and Individual T-tests for the Modified Model
Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               5        1014.48120      202.89624      23.29    <.0001
Error              56         487.93771        8.71317
Corrected Total    61        1502.41891

Root MSE            2.95181    R-Square    0.6752
Dependent Mean      4.07475    Adj R-Sq    0.6462
Coeff Var          72.44154

Parameter Estimates
Variable Label DF Parameter Estimate Standard Error t Value Pr > |t|
Intercept Intercept 1 9.07796 1.84835 4.91 <.0001
POPDEN POPDEN 1 0.00669 0.00182 3.68 0.0005
POVINC POVINC 1 -0.11473 0.03338 -3.44 0.0011
UNEMPLOYR UNEMPLOYR 1 -0.12234 0.07180 -1.70 0.0940
PNP PNP 1 -0.00414 0.00096282 -4.30 <.0001
COURTS COURTS 1 0.39710 0.08870 4.48 <.0001
Note that in this modified model, there are four significant variables at a level of
significance of 0.05: population density, poverty incidence, number of policemen, and
number of courts. It can be observed that unemployment rate is still not significant.
However, it is retained since theoretically, unemployment rate would have an effect on
crime rate. The F-test for this model implies that at least one of the independent variables
will be able to explain the variability found in crime rate. The R², however, dropped.
From an initial value of 0.7593, it is now only 0.6752. Therefore, this model can only
explain 67.52 percent of the variability found in crime rate. This is understandable,
though, since 12 variables were removed from the model.
The new model is given by:
\[
\text{crime} = \beta_0 + \beta_1\,\text{popden} + \beta_2\,\text{povinc} + \beta_3\,\text{unemployr} + \beta_4\,\text{pnp} + \beta_5\,\text{courts} + \varepsilon
\]
where crime = crime rate per province
popden = population density per province
povinc = poverty incidence per province
unemployr = unemployment rate per province
pnp = number of policemen per province
courts = number of courts per province
ε ~ N(0, σ²)
Before this model can be accepted as the best model, diagnostic checking is
necessary. In checking for normality, we have the following result.
Table 7. Tests for Normality
Test                  Statistic            p Value
Shapiro-Wilk          W      0.90266      Pr < W       0.0001
Kolmogorov-Smirnov    D      0.121649     Pr > D       0.0222
Cramer-von Mises      W-Sq   0.211265     Pr > W-Sq   <0.0050
Anderson-Darling      A-Sq   1.377216     Pr > A-Sq   <0.0050
Note that for all the tests for normality, the p-values are less than the level of
significance, 0.05. The null hypothesis of normality of error terms is then rejected. It is
necessary for remedial measures such as transformations to be performed. These will be
discussed later on in this section.
After normality, homoskedasticity is checked. A residual plot (versus the
predicted value of crime rate) is utilized in order to check for homoskedasticity. The
residual plot does not seem to exhibit any shape (funnel or diamond) that would imply
heteroskedasticity. In order to ascertain this, the spec option under the regression
procedure was used. The result indicates that the null hypothesis of constant variance
should not be rejected. Autocorrelation and multicollinearity were also checked. For
autocorrelation, the test statistic under the Durbin-Watson test is d = 2.246 with a first order
autocorrelation of -0.127. For a negative value of the first order autocorrelation, the
statistic 4 – d is used instead of d. This yields a value of 1.754, which is
compared to the tabulated values for the Durbin-Watson test. If the statistic exceeds du,
the null hypothesis is not rejected; if it is below dl, the null hypothesis is rejected. The
tabulated values for n = 65 and k′ = 5 are dl = 1.438 and du = 1.767. Since 1.754 lies
between dl and du, the test is inconclusive. As for multicollinearity, it
can be observed that the condition indices do not exceed 30. Thus, the model is free of
problems on multicollinearity. Results for these are shown on Table 8.
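The Durbin-Watson decision rule described above can be written as a small function. This is a sketch of the bounds test only; the function name is hypothetical, and the critical values dl and du are taken from the text rather than computed.

```python
def durbin_watson_decision(d, first_order_autocorr, d_lower, d_upper):
    """Apply the Durbin-Watson bounds test.

    For a negative first order autocorrelation the statistic 4 - d is
    compared with the bounds instead of d.
    Returns (statistic_used, decision).
    """
    stat = 4 - d if first_order_autocorr < 0 else d
    if stat < d_lower:
        decision = "reject"
    elif stat > d_upper:
        decision = "do not reject"
    else:
        decision = "inconclusive"
    return stat, decision


# Values reported for the modified model.
stat, decision = durbin_watson_decision(2.246, -0.127, 1.438, 1.767)
```

With the reported values the statistic used is about 1.754, which falls between the two bounds, so the sketch returns "inconclusive" in agreement with the text.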
Table 8. Results for Spec Option, Durbin-Watson Test, and Multicollinearity Indicators
Collinearity Diagnostics*
Number Eigenvalue Condition Index
1 4.83022 1.00000
2 0.63842 2.75062
3 0.24895 4.40480
4 0.13439 5.99517
5 0.11903 6.37016
6 0.02898 12.90959
Test of First and Second
Moment Specification
DF Chi-Square Pr > ChiSq
20 16.96 0.6558
Durbin-Watson D 2.246
Number of Observations 62
1st Order Autocorrelation -0.127
*Proportion of variation is omitted.
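The condition indices in Table 8 can be reproduced from the eigenvalues, since each index is the square root of the ratio of the largest eigenvalue to the eigenvalue in question. The sketch below uses the eigenvalues as printed, so small differences in the later decimals are due to rounding; the function name is hypothetical.

```python
import math


def condition_indices(eigenvalues):
    """Condition index of each eigenvalue: sqrt(largest / eigenvalue)."""
    largest = max(eigenvalues)
    return [math.sqrt(largest / ev) for ev in eigenvalues]


# Eigenvalues from Table 8.
eigenvalues = [4.83022, 0.63842, 0.24895, 0.13439, 0.11903, 0.02898]
indices = condition_indices(eigenvalues)
no_severe_collinearity = max(indices) < 30
```

The largest index is about 12.9, well below the threshold of 30 used in this study.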
Linearity is also checked to ascertain whether any departures from it may be
observed. Partial regression plots were obtained for crime rate versus each of the
independent variables population density, poverty incidence, number of policemen,
and number of courts. The plots show no distinct departure from linearity (plots
found in the appendices).
For outliers, it can be observed that there are three possible outliers among the
observations. Outliers in the dependent variable are detected through Studentized
residuals, and outliers in the independent variables through leverages. The cut-off
for Studentized residuals is equal to two. As for the leverage, the cut-off computed
is 0.1935. Observations 12, 18, and 58 are possible outliers. The same observations
may be influential as well. In checking for influence, Cook's D, DFFITS, and DFBETAS
have to be consulted. The cut-offs for DFFITS and DFBETAS are 0.622 and 0.254
respectively.
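The cut-offs quoted above are consistent with the usual rules of thumb 2p/n for leverage, 2√(p/n) for DFFITS, and 2/√n for DFBETAS. The sketch below assumes these are the formulas that were used, with n = 62 observations and p = 6 estimated parameters (the intercept plus five regressors); the variable names are illustrative.

```python
import math

n = 62  # number of observations (Table 8)
p = 6   # estimated parameters: intercept + 5 regressors (assumption)

leverage_cutoff = 2 * p / n            # rule of thumb for hat-matrix diagonals
dffits_cutoff = 2 * math.sqrt(p / n)   # rule of thumb for DFFITS
dfbetas_cutoff = 2 / math.sqrt(n)      # rule of thumb for DFBETAS

print(round(leverage_cutoff, 4), round(dffits_cutoff, 3), round(dfbetas_cutoff, 3))
```

Under these assumptions the three values round to the cut-offs reported in the text.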
Table 9. Outliers and Influential Observations
Output Statistics
      Dependent  Predicted  Std Error               Std Error  Student   Cook's
Obs   Variable   Value      Mean Predict  Residual  Residual   Residual  D
12    25.6633    19.7931    1.6627        5.8702    2.439      2.407     0.449
18    10.0296     4.4596    0.8913        5.5701    2.814      1.979     0.066
58    22.6754    14.3599    1.2853        8.3155    2.657      3.129     0.382
                 Hat Diag   Cov
Obs   RStudent   H          Ratio     DFFITS
12    2.5191     0.3173     0.8476    1.7174
18    2.0341     0.0912     0.7934    0.6442
58    3.4141     0.1896     0.4339    1.6514
Observation 12 has a Studentized residual equal to 2.407 and a Studentized
deleted residual of 2.5191. Its leverage is equal to 0.3173. These values exceed the
cut-offs computed. This observation may be considered influential since its Cook's D
has the highest value relative to the other observations. Moreover, its DFFITS is
equal to 1.7174, which is well beyond the cut-off. Its DFBETAS for the variables
UNEMPLOYR, PNP, and COURTS also exceed the cut-off, which further marks it as a
candidate influential observation.
Observation 18 has a Studentized deleted residual equal to 2.0341 but its leverage
is equal to 0.0912. This implies that the value of crime rate may be an outlier for this
observation while its independent variables are not. Its Cook's D is not relatively high;
however, its DFFITS is equal to 0.6442, which exceeds the cut-off of 0.622. Its DFBETAS
for the variables POVINC, PNP, and COURTS also exceed the cut-off, so it may likewise
be a candidate influential observation.
Observation 58 has a Studentized residual equal to 3.129, a Studentized deleted
residual equal to 3.4141, and leverage equal to 0.1896. Both residuals exceed the
cut-off of two, while the leverage falls just below the cut-off of 0.1935. These
support the supposition that observation 58 may be an outlier in crime rate. It may
also be influential since its DFFITS is equal to 1.6514, and its DFBETAS for POPDEN
and POVINC exceed the cut-off.
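The screening of the three observations can be reproduced from the values in Table 9. The following is a minimal sketch that compares each statistic with its cut-off; the dictionary layout and the helper name are hypothetical, and DFBETAS are left out because their values are not tabulated here.

```python
# Diagnostics for the flagged observations, copied from Table 9.
observations = {
    12: {"rstudent": 2.5191, "leverage": 0.3173, "dffits": 1.7174},
    18: {"rstudent": 2.0341, "leverage": 0.0912, "dffits": 0.6442},
    58: {"rstudent": 3.4141, "leverage": 0.1896, "dffits": 1.6514},
}

# Cut-offs quoted in the text.
CUTOFFS = {"rstudent": 2.0, "leverage": 0.1935, "dffits": 0.622}


def flags(stats):
    """Return, per statistic, whether its absolute value exceeds the cut-off."""
    return {name: abs(value) > CUTOFFS[name] for name, value in stats.items()}


for obs, stats in observations.items():
    print(obs, flags(stats))
```

Only observation 12 exceeds the leverage cut-off; all three exceed the cut-offs for the Studentized deleted residual and for DFFITS.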
Corrective Measures
Since there is a problem in normality, it is necessary to perform corrective
measures. One possible corrective measure is transformation of variables. For this
particular study, several transformations were tried, among them natural logarithms
and square roots, along with several combinations of transformed variables. In the
end, taking the square roots of crime rate, poverty incidence, and number of courts
(coded as sqcrime, sqpovinc, and sqcourts respectively), while leaving population
density and the remaining variables untransformed, corrected the problem of
normality.
Thus, the transformed model is
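$$\mathit{sqcrime} = \beta_0 + \beta_1\,\mathit{popden} + \beta_2\,\mathit{sqpovinc} + \beta_3\,\mathit{unemployr} + \beta_4\,\mathit{pnp} + \beta_5\,\mathit{sqcourts} + \varepsilon$$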
where sqcrime = square root of the crime rate per province
popden = population density per province
sqpovinc = square root of the poverty incidence per province
unemployr = unemployment rate per province
pnp = number of policemen per province
sqcourts = square root of the number of courts per province
ε ~ N(0, σ²)
Again, this model is checked for normality, autocorrelation, heteroskedasticity,
linearity, multicollinearity, and outliers.
After transformation, there was an increase in the R² of the model. Recall that
the R² of the previous model was 0.6752. The R² of the transformed model is equal
to 0.7141. This means that there is an improvement in the amount of variability the
transformed model can explain. The four independent variables are still significant
at a level of significance of 0.05, while unemployment rate remains insignificant.
These results are found in the following table.
Table 10. ANOVA Results and Parameter Estimates for the Transformed Model
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 5 45.29466 9.05893 27.98 <.0001
Error 56 18.13036 0.32376
Corrected Total 61 63.42501
Root MSE 0.56900 R-Square 0.7141
Dependent Mean 1.74693 Adj R-Sq 0.6886
Coeff Var 32.57121
Parameter Estimates
                              Parameter     Standard
Variable    Label       DF    Estimate      Error         t Value   Pr > |t|
Intercept   Intercept   1     2.95684       0.55258       5.35      <.0001
POPDEN      POPDEN      1     0.00081505    0.00036395    2.24      0.0291
SQPOVINC    SQPOVINC    1     -0.29978      0.07181       -4.17     0.0001
UNEMPLOYR   UNEMPLOYR   1     -0.00672      0.01379       -0.49     0.6278
SQCOURTS    SQCOURTS    1     0.55679       0.09904       5.62      <.0001
PNP         PNP         1     -0.00096767   0.00018859    -5.13     <.0001
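As a check on how the summary statistics in Table 10 relate to the ANOVA sums of squares, the sketch below recomputes R², adjusted R², the F value, and the root MSE from the tabulated sums of squares and degrees of freedom. The variable names are illustrative; the formulas are the standard ones for a linear model with an intercept.

```python
import math

# Sums of squares and degrees of freedom from Table 10.
ss_model, df_model = 45.29466, 5
ss_error, df_error = 18.13036, 56
ss_total, df_total = 63.42501, 61

ms_model = ss_model / df_model
ms_error = ss_error / df_error

r_squared = 1 - ss_error / ss_total
adj_r_squared = 1 - ms_error / (ss_total / df_total)
f_value = ms_model / ms_error
root_mse = math.sqrt(ms_error)

print(round(r_squared, 4), round(adj_r_squared, 4), round(f_value, 2), round(root_mse, 3))
```

The recomputed values agree with the R-Square, Adj R-Sq, F Value, and Root MSE entries of the table to the precision shown.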
The first tests to be performed after transformation are tests for heteroskedasticity,
multicollinearity, and autocorrelation. The results are shown in Table 11.
Table 11. Tests for Multicollinearity, Heteroskedasticity and Autocorrelation
Collinearity Diagnostics
Number Eigenvalue Condition Index
1 5.15775 1.00000
2 0.45889 3.35256
3 0.18424 5.29102
4 0.10986 6.85197
5 0.07841 8.11067
6 0.01086 21.79524
Collinearity Diagnostics
-----------------------------Proportion of Variation----------------------------
Number Intercept POPDEN SQPOVINC UNEMPLOYR PNP SQCOURTS
1 0.00060965 0.00725 0.00093940 0.00474 0.00512 0.00305
2 0.00234 0.40328 0.01132 0.02230 0.00123 0.00690
3 0.00195 0.22684 0.00249 0.14538 0.45895 0.04345
4 0.01717 0.01756 0.05286 0.71655 0.14420 0.02612
5 0.00027899 0.25731 0.02890 0.04728 0.36060 0.72823
6 0.97765 0.08777 0.90349 0.06375 0.02990 0.19225
Test of First and Second
Moment Specification
DF Chi-Square Pr > ChiSq
20 16.12 0.7094
Durbin-Watson D 2.063
Number of Observations 62
1st Order Autocorrelation -0.037
Note that the condition indices are less than 30. This implies that there is no
multicollinearity after transforming the model. The Durbin-Watson test statistic has a
value equal to 2.063, and a first order autocorrelation equal to -0.037. For this, consider
instead the statistic 4 – d. This yields a value of 1.937, which is close to 2. Also, if this
statistic is compared to the value on the Durbin-Watson table (recall that du is equal to
1.767), it can be observed that the value for 4 – d exceeds du. This will lead to the non-
rejection of the null hypothesis of no autocorrelation. Thus, it can be concluded that there
is no problem of autocorrelation.
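The condition indices in Table 11 follow from the eigenvalues in the same table: each index is the square root of the largest eigenvalue divided by the eigenvalue in question. The sketch below recomputes them from the rounded eigenvalues, so the last digits may differ slightly from the SAS output; the variable names are illustrative.

```python
import math

# Eigenvalues reported in Table 11 (rounded as printed).
eigenvalues = [5.15775, 0.45889, 0.18424, 0.10986, 0.07841, 0.01086]

largest = max(eigenvalues)
condition_indices = [math.sqrt(largest / ev) for ev in eigenvalues]

for number, (ev, ci) in enumerate(zip(eigenvalues, condition_indices), start=1):
    print(number, ev, round(ci, 2))

# The rule of thumb used in the text: an index above 30 signals a problem.
has_problem = any(ci > 30 for ci in condition_indices)
print("multicollinearity flagged:", has_problem)
```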
Next, the outliers and influential observations were addressed. Recall that there
are three outliers and influential observations. These observations were removed one at a
time, and after each removal a diagnostic check was performed. At each stage,
the model was checked for multicollinearity, autocorrelation, normality, outliers, and
heteroskedasticity. After removing these three observations, the final model was again
checked against the previously mentioned criteria. The results for these are found in
Appendices A-30 onwards.
Table 12. ANOVA, Parameter Estimates, and Multicollinearity Tests
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 5 33.58547 6.71709 30.99 <.0001
Error 52 11.27129 0.21676
Corrected Total 57 44.85676
Root MSE 0.46557 R-Square 0.7487
Dependent Mean 1.61632 Adj R-Sq 0.7246
Coeff Var 28.80432
Parameter Estimates
Parameter Standard Variance
Variable Label DF Estimate Error t Value Pr > |t| Inflation
Intercept Intercept 1 2.59768 0.48076 5.40 <.0001 0
POPDEN POPDEN 1 0.00076341 0.00031984 2.39 0.0207 1.60853
SQPOVINC SQPOVINC 1 -0.25736 0.06291 -4.09 0.0001 1.44434
UNEMPLOYR UNEMPLOYR 1 -0.00848 0.01183 -0.72 0.4767 1.02209
PNP PNP 1 -0.00082873 0.00015836 -5.23 <.0001 1.54602
SQCOURTS SQCOURTS 1 0.53004 0.08455 6.27 <.0001 2.10035
Collinearity Diagnostics
Number Eigenvalue Condition Index
1 5.17743 1.00000
2 0.44097 3.42650
3 0.18328 5.31492
4 0.10913 6.88793
5 0.07905 8.09268
6 0.01014 22.60067
-----------------------------Proportion of Variation-----------------------------
Number Intercept POPDEN SQPOVINC UNEMPLOYR PNP SQCOURTS
1 0.00057169 0.00744 0.00086302 0.00465 0.00501 0.00302
2 0.00239 0.43102 0.01025 0.02384 0.00092325 0.00725
3 0.00261 0.23433 0.00378 0.10920 0.47734 0.04222
4 0.01606 0.02305 0.05667 0.79816 0.04533 0.00171
5 0.00010800 0.22789 0.01191 0.00065690 0.43714 0.75883
6 0.97826 0.07627 0.91652 0.06349 0.03426 0.18696
Table 13. Results for Spec Option, Durbin-Watson Test, and Tests for Normality
Test of First and Second
Moment Specification
DF Chi-Square Pr > ChiSq
20 11.33 0.9372
Durbin-Watson D 2.142
Number of Observations 58
1st Order Autocorrelation -0.081
Tests for Normality
Test --Statistic--- -----p Value------
Shapiro-Wilk W 0.980824 Pr < W 0.4877
Kolmogorov-Smirnov D 0.096603 Pr > D >0.1500
Cramer-von Mises W-Sq 0.095625 Pr > W-Sq 0.1290
Anderson-Darling A-Sq 0.486635 Pr > A-Sq 0.2244
After these corrective measures, the final model has a coefficient of multiple
determination equal to 0.7487. This means that the model can explain 74.87 percent of
the variability found in the transformed crime rate. Note that its VIFs are all well
below 10 (the largest is 2.10) and its condition indices are all less than 30. This
means that multicollinearity is not a problem with this model. Linearity is checked
using partial regression plots, and again, there seem to be no distinct departures
from linearity. Also, the Durbin-Watson test statistic is equal to 2.142 with a
first-order autocorrelation equal to -0.081. Again, the statistic 4 – d is considered
instead, and a value of 1.858 is obtained, which is still close to 2. Compared with
the tabulated value (under the Durbin-Watson table, n = 60, k' = 5), 4 – d = 1.858 is
greater than du = 1.767. The null hypothesis of no autocorrelation is not rejected,
so there is no evidence of autocorrelation. For the normality tests in Table 13, all
p-values exceed the level of significance of 0.05. The null hypothesis that the error
terms are normally distributed is therefore not rejected, and the normality assumption
is taken to hold. In testing for heteroskedasticity, the spec option in SAS is used.
Since the Chi-square value computed is equal to 11.33, and its p-value is equal to
0.9372, the null hypothesis of constant variance is not rejected. Thus, there is no
evidence of heteroskedasticity.
Conclusion
The final model of crime rate is then
$$\mathit{crime}^{*} = \beta_0 + \beta_1\,\mathit{popden} + \beta_2\,\mathit{povinc}^{*} + \beta_3\,\mathit{unemployr} + \beta_4\,\mathit{pnp} + \beta_5\,\mathit{courts}^{*} + \varepsilon$$
where crime* = sqcrime
povinc* = sqpovinc
courts* = sqcourts.
The estimated model for crime rate is then equal to
$$\mathit{crime}^{*} = 2.1856 + 0.0008\,\mathit{popden} - 0.2657\,\mathit{povinc}^{*} - 0.0001\,\mathit{unemployr} - 0.0008\,\mathit{pnp} + 0.5014\,\mathit{courts}^{*}$$
where each parameter estimate after β0 represents the increase or decrease in the
estimated mean of the transformed crime rate (crime*) per unit increase in the
corresponding independent variable, transformed where starred, holding all other
variables constant.
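To illustrate how the estimated model would be used, the sketch below evaluates it for a made-up province and squares the result to return to the original crime-rate scale. The input values are purely hypothetical and do not come from the data; the function name is illustrative. Squaring the predicted square root is a naive back-transformation, used here only to show the mechanics.

```python
import math

# Coefficients of the estimated model as stated in the text.
COEF = {
    "intercept": 2.1856,
    "popden": 0.0008,
    "sqpovinc": -0.2657,
    "unemployr": -0.0001,
    "pnp": -0.0008,
    "sqcourts": 0.5014,
}


def predict(popden, povinc, unemployr, pnp, courts):
    """Return (predicted sqrt of crime rate, naive back-transformed crime rate)."""
    sqcrime = (
        COEF["intercept"]
        + COEF["popden"] * popden
        + COEF["sqpovinc"] * math.sqrt(povinc)   # poverty incidence enters as a square root
        + COEF["unemployr"] * unemployr
        + COEF["pnp"] * pnp
        + COEF["sqcourts"] * math.sqrt(courts)   # number of courts enters as a square root
    )
    return sqcrime, sqcrime ** 2


# Hypothetical province, for illustration only.
sqcrime, crime = predict(popden=200, povinc=36, unemployr=10, pnp=500, courts=9)
print(sqcrime, crime)
```

Because the dependent variable is a square root, a coefficient such as 0.0008 for popden is a change in the square root of the crime rate, not in the crime rate itself.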
Note that for the independent variables poverty incidence (povinc),
unemployment rate (unemployr), and number of courts (courts), the signs of the
coefficients are contrary to theoretical expectations. As common sense would dictate, a
rise in poverty incidence would entail a rise in the crime rate. The same can be said for
unemployment rate. On the other hand, a rise in the number of courts would mean a
decrease in the crime rate. However, for this model, the relationships between poverty
incidence and crime rate, and unemployment rate and crime rate are inverted. That is, as
poverty incidence or unemployment rate increases, the estimated crime rate decreases,
although the coefficient of unemployment rate is not statistically significant. This
may be owed to the political and economic instability of the year 2000 that surrounded
the impeachment and eventual ouster of former President Joseph Estrada. If the poverty
incidence, unemployment rate, and crime rate for this year are compared to the others,
it can be observed that the poverty incidence and the unemployment rate for the year
2000 are high and the crime rate for the same year is low, relative to other years.
As for the number of courts, it can be observed that there is a direct relationship
between this and crime rate. This may be because, as the number of courts
increases, the opportunity for people to file cases also increases. Thus, there would
also be an increase in the number of reported crimes, which would lead to an increase in
the crime rate. Although theory and empirical data differ in this study, the model
obtained is not an unusual one and remains plausible.
Thus, through this model, we were able to establish a linear relationship between
crime and the factors population density, poverty incidence, number of police per
province, and number of courts per province. Since this model has satisfied the
conditions and assumptions, the model may be a plausible predictor of crime rate in the
Philippines.
Recommendations
As previously mentioned, this particular study focused on crime rate in the
Philippines from provincial data for the year 2000 only. As a means of improving the
study, the group recommends considering data from other time periods as well as data
from municipalities or cities. The group also recommends formulating separate
regression lines for the different classifications of crimes (index and non-index crimes,
and crimes against property and person) as different factors may affect each category of
crime. Separating the regression lines would allow for classification of factors among
different types of crime. This may lead to a better model in terms of the coefficient of
determination.
References
The Philippine countryside in figures. (n.d.). Retrieved May 18, 2010, from
http://www.nscb.gov.ph/countryside/default.asp
Becker, G. (1968). Crime and punishment: an economic approach. The Journal of
Political Economy, 76(2), 169-217. Retrieved from http://www.jstor.org/
Ehrlich, I. (1975). The deterrent effect of capital punishment: a question of life and death.
The American Economic Review, 65(3), 397-417. Retrieved from
http://www.jstor.org/
Ehrlich, I. (1975). On the relation between education and crime. In F.T. Juster (Ed.),
Education, income, and human behavior (pp. 313–338). United States of America:
National Bureau of Economic Research.
Fajnzylber, P., et al. (1998). Determinants of crime rates in Latin America and the world:
an empirical assessment. Washington, D.C., United States of America: The
World Bank.
Gillado, M. F., & Cruz, T.T. (2004, October 4-5). Panel data estimation of crime rates in
the Philippines. Retrieved from
www.nscb.gov.ph/ncs/9thncs/papers/publicOrder_PanelData.pdf
National Statistical Coordination Board. (2003). The Philippine countryside in figures
(2003 edition). Makati City, Philippines: Author.
Reynolds, M. (2000). Crime and punishment in Texas in the 1990s. Retrieved from
http://www.ncpa.org/pub/st237?pg=6.
Sanidad-Leones, C. (2010). The current situation of crime associated with urbanization:
problems experienced and countermeasures initiated in the Philippines. Retrieved
from http://www.unafei.or.jp/english/pdf/PDF_rms/no68/09_Leones-1_p133-150.pdf.
Wadsworth, T. (2001). Employment, crime, and context: a multi-level analysis of the
relationship between work and crime (Doctoral Dissertation, University of
Washington). Available from the National Criminal Justice Reference Service
(NCJRS) website http://www.ncjrs.gov/pdffiles1/nij/grants/198118.pdf.
Yasir, S., et al. (2009). Unemployment, poverty, inflation, and crime nexus: cointegration
and causality analysis of Pakistan. Pakistan Economic and Social Review, 47(1),
79-98.
Appendices
Appendix A. Results (SAS Outputs)
Appendix B. Durbin-Watson Table