A Regression Analysis on the Determinants of Crime Rates Across Philippine Provinces - Revised

A Regression Analysis on the Determinants of Crime Rates Across Philippine Provinces

A study presented to

Ms. Angela D. Nalica

Professor, Stat 136

In partial fulfilment of the requirements for

STAT 136: Introduction to Regression Analysis

University of the Philippines, Diliman, Quezon City

CRUZ, Clemence-Fatima

MACARAIG, Miguel Rodrigo

SANTOS, Marvin Allan

May 28, 2010


Abstract

This study focuses on the provincial crime rate in the Philippines for the year 2000. Its aim is to identify the factors that affect the crime rate in the Philippines using multiple linear regression. The results show that, at the 0.05 level of significance, population density, poverty incidence, number of policemen, and number of courts significantly affect the crime rate.

Introduction

Crime is a reality for every society, whether it is taken as a moral or a legal construct. This is a truth we have to accept, however appalling it may seem. Many efforts have been exerted to eradicate crime; unfortunately, a world or a country without crime is strictly utopian. To eradicate crime itself, we would first have to remove the concepts it violates, namely morals and laws, which is, again, a utopian task. That discourse belongs to philosophy and is therefore beyond our concerns. Thus, the best course of action is to deter crime and lower the crime rate that prevails in our country.

In the Philippines, crime is one of the foremost problems. However, in the midst of more urgent problems such as poverty, corruption, and hunger, crime loses much of its significance and is relegated to the bottom of the country's long list of problems. Discussion of crime deterrence becomes limited to debates on revising punishments and reinstating the death penalty, a punishment with no proven effect in deterring crime. What officials fail to recognize is that band-aid solutions such as imposing severe punishments do not work on large-scale problems such as this. What, then, must be done to address this problem?

For any problem, the solution is to ascertain its true cause and attack it at its roots. This may sound simple enough; however, in our country, where problems are more tangled than politics, it is a complicated task. In most cases, people do not know for certain which problem causes which. For example, is the Philippines poor because there is a high incidence of corruption, or is the Philippines corrupt because there is poverty? The same goes for crime rate. Is there a high crime rate because the Philippines is poor, or is the Philippines poor because there is a high crime rate? Herein lies the dilemma.

If the factors that affect crime rate are correctly and sufficiently identified, a more feasible solution, one that addresses or alleviates these factors, may be formulated. We therefore ask: what are the possible factors that affect crime rate in the Philippines, and by how much do these factors affect it? This study was conducted in the hope of answering these questions.

This study aims to identify the possible factors that affect the crime rate in the Philippines and, through multiple linear regression, to estimate the impact of each factor. The analysis uses provincial data from the National Statistical Coordination Board (NSCB); all data are for the year 2000.

Definition of Terms†

Crime Rate** – the number of reported crimes per 100,000 population.

Cohort Survival Rate – the percentage of enrollees at the beginning grade or year in a given school year who reached the final grade or year of the elementary or secondary level.

Consumer Price Index (CPI) – an indicator of the change in the average prices of a fixed basket of goods and services commonly purchased by households relative to a base year.

Enrolment - total number of pupils/students who register/enlist in a school year.

Family Income – includes primary income and receipts from other sources received by

all family members during the calendar year as participants in any economic

activity or as recipients of transfers, pensions, grants, etc. (2000 FIES, NSO)

Primary income includes:

• Salaries and wages from employment.

• Commissions, tips, bonuses, family and clothing allowance, transportation and

representation allowance and honoraria.

• Other forms of compensation and net receipts derived from the operation of

family-operated enterprises/activities and the practice of profession or trade.

Income from other sources includes:

• Imputed rental values of owner-occupied dwelling units.

• Interests.

• Rentals, including the landowner’s share of agricultural products.

• Pensions

• Support and value of food and non-food items received as gifts by the family (as

well as the imputed value of services rendered free of charge to the family).

• Receipts from family sustenance activities, which are not considered family-operated enterprises.

Family Expenditures – refers to the expenses or disbursements made by the family

purely for personal consumption during the reference period. They exclude all

expenses in relation to farm or business operations, investment ventures, purchase

of real property and other disbursements which do not involve personal

consumption. Gifts, support, assistance or relief in goods and services received by

the family from friends, relatives, etc. and consumed during the reference period

are included in the family expenditures. Value consumed from net share of crops,

fruits and vegetables produced or livestock raised by other households, family

sustenance and entrepreneurial activities are also considered as family

expenditures.

Functional Literacy – a significantly higher level of literacy that includes not only reading and writing skills but also numeracy skills. This skill must be sufficiently advanced to enable the individual to participate fully and effectively in activities commonly occurring in his life situation that require a reasonable capability of communicating by written language.

Gini Ratio – the ratio of the area between the Lorenz curve and the diagonal (the line of

perfect equality) to the area below the diagonal.

Notes: It is a measure of the extent to which the distribution of

income/ expenditure among families/individuals deviates

from a perfectly equal distribution, with limits 0 for perfect

equality and 1 for perfect inequality.
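To make the Gini ratio definition concrete, the following is a minimal Python sketch (not part of the original analysis) that builds the Lorenz curve from a list of family incomes and returns the area between the diagonal and the curve divided by the area under the diagonal. The function name `gini_ratio` and the sample incomes are hypothetical; the area under the Lorenz curve is computed with the trapezoidal rule over the sorted incomes.

```python
def gini_ratio(incomes):
    """Gini ratio = (area between diagonal and Lorenz curve) / (area under diagonal)."""
    values = sorted(incomes)
    n = len(values)
    total = sum(values)
    if n == 0 or total <= 0:
        raise ValueError("need at least one family and a positive total income")
    # Lorenz curve: x = cumulative share of families, y = cumulative share of income.
    # Each family adds a step of 1/n in x; integrate with trapezoids.
    area_under_lorenz = 0.0
    cum_prev = 0.0
    cum = 0.0
    for v in values:
        cum += v
        area_under_lorenz += (cum_prev + cum) / total / (2 * n)
        cum_prev = cum
    area_under_diagonal = 0.5  # triangle under the line of perfect equality
    return (area_under_diagonal - area_under_lorenz) / area_under_diagonal


print(gini_ratio([100, 100, 100, 100]))  # perfect equality
print(gini_ratio([0, 0, 0, 400]))        # all income held by one of four families
```

With n families, giving all income to one family yields (n − 1)/n (here 0.75) rather than exactly 1; the ratio approaches the upper limit of 1 as n grows.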

Gross Regional Domestic Product - aggregate of the gross value added or income from

each industry or economic activity of the regional economy.

Human Development Index - a measure of how well a country has performed, not only

in terms of real income growth, but also in terms of social indicators of people’s

ability to lead a long and healthy life, to acquire knowledge and skills, and to have

access to the resources needed to afford a decent standard of living.

Literacy rate, Simple/Basic – the percentage of the population 10 years old and over,

who can read, write and understand simple messages in any language or dialect.

Population Density – refers to the number of persons per unit of land area (usually in

square kilometers). This measure is more meaningful if given as population per

unit of arable land.

Poverty Incidence – the proportion of families/individuals with per capita income /

expenditure less than the per capita poverty threshold to the total number of

families/individuals.

Province – the largest unit in the political structure of the Philippines. It consists, in

varying numbers, of municipalities and, in some cases, of component cities. Its

functions and duties in relation to its component cities and municipalities are

generally coordinative and supervisory.

Social Services - this covers expenditures for education, health, social security, labor and

employment, housing and community development and other social activities.

Unemployment Rate – proportion in percent of the total number of unemployed persons

to the total number of persons in the labor force.

__________________

†National Statistical Coordination Board. (2009). Philippine statistical yearbook (2009 ed.). Makati City, Philippines: Author.

**Note that crime rate = (total crime incidence / population) × 100,000. The factor 100,000 is a magnifier, so any power of ten may be used; 1,000 and 100,000 are the usual choices.
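The magnifier in the note above only rescales the rate. A minimal Python sketch, using hypothetical counts rather than NSCB figures, shows the computation and that a rate per 1,000 (the definition used later in this study) equals the rate per 100,000 divided by 100.

```python
def crime_rate(total_crimes, population, magnifier=100_000):
    """Reported crimes per `magnifier` persons: (total crime incidence / population) * magnifier."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing so round numbers stay exact in floating point.
    return total_crimes * magnifier / population


# Hypothetical province: 1,200 reported crimes in a population of 400,000.
per_100k = crime_rate(1_200, 400_000)                 # 300.0 per 100,000
per_1k = crime_rate(1_200, 400_000, magnifier=1_000)  # 3.0 per 1,000
```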

Review of Related Literature

Crime was traditionally viewed as a moral and social construct. Over the years, however, there has been a shift from a social point of view to an economic one. At the forefront of this economic view is Gary Becker’s 1968 work, Crime and Punishment: An Economic Approach. Becker views crime as an economic construct, one that involves opportunity and economic costs and has a supply of offenses. He further asserts that “some persons become ‘criminals’...not because their basic motivation differs...but because their benefits and costs differ.” Thus, according to Becker, crime is not purely psychological or social but also economic, a matter in which choices and utility are important.

Ehrlich (On the Relation Between Education and Crime, 1975) attempts to establish a link between education and crime, again from an economic perspective. Ehrlich views education as an opportunity-maker and postulates that it is important for on-the-job training. These two, in turn, help determine labour distribution and personal income. He further states that it is not educational attainment itself that is closely related to crime; rather, it is the “inequalities in the distribution of schooling” that are “strongly related to the incidence of many crimes.” Ehrlich echoes Becker’s statement that criminal behaviour is not merely psychological or social, but also economic. Specifically, it depends on the “relative earnings” of offenders between legitimate and illegitimate activities. As a form of rehabilitation, Ehrlich suggests training geared towards legitimate activities before convicts are released from prison.

According to Wadsworth (2001), employment is an important factor that affects crime rate. He asserts that “both industrial composition and labor force participation…have direct and indirect effects on violent and property crime rates. These effects cannot be explained entirely by the fact that individuals who are unemployed commit more crimes. There is a contextual influence of weak labor market opportunity that operates above and beyond influencing individual employment experiences.” As such, it is not merely the fact of being unemployed that turns people into criminals; weak labor market opportunities in the surrounding area also matter, beyond each person’s own experience of employment or the lack of it. This economic perspective is one that Ehrlich, Becker, and Wadsworth seem to share.

For Reynolds (2000), implementing more ‘get-tough policies’ would help deter crime, since ‘federal programs to reduce the so-called root causes have done…more harm than good.’ Put simply, Reynolds believes that programs to alleviate the ‘root causes’ of crime will not deter it; serious and strict policy-making is the key to deterring crime. Other factors may be the urbanization of a place (since urbanization opens new avenues for crime) and police visibility (Sanidad-Leones, 2010).

Yasir et al. (2009) also state in their study of crime rate in Pakistan that poverty, unemployment, inflation, and volatile policies may contribute to a rise in the crime rate. They further assert that a possible way to alleviate this is the formulation of stable economic policies. In general, economic factors such as those mentioned above affect crime rate, especially the “policy-sensitive variables.” A possible solution is the “combination of counter-cyclical redistributive policies…and increases in the resources of apprehending and convicting criminals…especially during economic recessions” (Fajnzylber et al., 1998).

Gillado and Cruz (2004) constructed regression models for three classifications of crime: crimes against property, crimes against persons, and rape. In this work, they incorporated social, demographic, and economic factors. The following variables were considered for the three models: per capita regional domestic product, average income of people in rural and urban areas, cohort survival rates in elementary and secondary education, corruption index, police population, population density, alcohol consumption, Gini coefficient, unemployment rate, and consumer price index.

Although crime may still be a social construct, a shift from a social perspective to an economic one can be observed. Considering only one perspective would be insufficient, however, since both are applicable to crime. Whatever the opposing views on its determinants, economic and social factors alike play a role in affecting crime.

The variables considered for this study are heavily based on the works mentioned above. In particular, poverty incidence, unemployment rate, police force population, CPI, and population density coincide with variables in the study of Gillado and Cruz. Data on cohort survival rates, however, are not available for provinces. To account for the possible effect of education on crime, the researchers included literacy, functional literacy, and enrolment as variables. Other variables not included in the works above but included in this study are expenditures on social services, number of courts, family income and expenditure, human development index (HDI), and geographical setting (based on archipelagic division).

Methodological Sketch

The data used in this study were obtained from the National Statistical Coordination Board’s publication The Philippine Countryside in Figures, which is available both in print and in electronic format. The electronic format may be accessed via http://www.nscb.gov.ph/countryside/default.asp. Note that the NSCB Philippine Statistical Yearbook defines crime rate as the number of crimes per 100,000 population; for this study, however, crime rate is defined per 1,000 population.

A level of significance of 0.05 was set prior to any fitting or testing procedure. This level was chosen because an erroneous conclusion in a study of crime rates does not carry very severe consequences, although the subject matter itself is important, crime being one of the foremost problems in the Philippines.

Before fitting, the data were checked for missing values, especially in the dependent variable. Since SAS omits observations with missing values from the regression, these observations were deleted from the data set. The data set was also checked for possible encoding errors. For example, the province of Camiguin recorded 73,549 policemen, whereas the other observations range from only about 400 to 2,000. This observation was therefore deleted as a probable encoding error.
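The two screening checks above (deleting observations with missing values and deleting implausible entries such as Camiguin’s policemen count) can be sketched in Python as follows. This is an illustration, not the authors’ SAS code: the record layout, the province names other than Camiguin, the crime values, and the plausible range of 0 to 10,000 policemen are all hypothetical.

```python
def screen(records, variables, plausible_ranges):
    """Split province records into (kept, dropped).

    records: list of dicts, one per province (hypothetical layout).
    variables: names that must be non-missing for the regression.
    plausible_ranges: {name: (low, high)}; values outside flag a likely encoding error.
    """
    kept, dropped = [], []
    for rec in records:
        # Check 1: SAS would omit these rows anyway, so drop them explicitly.
        missing = [v for v in variables if rec.get(v) is None]
        if missing:
            dropped.append((rec["province"], "missing: " + ", ".join(missing)))
            continue
        # Check 2: values far outside the range seen in the other provinces.
        implausible = [
            v for v, (low, high) in plausible_ranges.items()
            if rec.get(v) is not None and not low <= rec[v] <= high
        ]
        if implausible:
            dropped.append((rec["province"], "implausible: " + ", ".join(implausible)))
            continue
        kept.append(rec)
    return kept, dropped


data = [
    {"province": "Province A", "crime": 3.1, "pnp": 850},    # hypothetical
    {"province": "Province B", "crime": None, "pnp": 1200},  # hypothetical
    {"province": "Camiguin", "crime": 2.4, "pnp": 73549},    # pnp from the text; crime hypothetical
]
kept, dropped = screen(data, ["crime", "pnp"], {"pnp": (0, 10_000)})
# kept: Province A only; dropped: Province B (missing crime), Camiguin (implausible pnp)
```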

Crime rate was then regressed on seventeen variables: population density, poverty incidence, family income, family expenditures, literacy rate, functional literacy rate, consumer price index, human development index, unemployment rate, expenditures on social services, number of courts, number of policemen, enrolment rate, cohort survival rates for elementary and secondary education, and the two dummy variables for location. The F-test was used to check whether at least one of the independent variables explains the variability in crime rate. The significance of each independent variable was then assessed through its t-test, and variables were removed by backward elimination: the independent variable with the highest p-value, provided that p-value exceeds the 0.05 level of significance, is removed first; crime rate is regressed on the remaining independent variables; and the process is repeated until all remaining independent variables are significant.
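The elimination procedure above can be sketched in Python. Because the fitting itself was done in SAS, the sketch takes the model fit as a parameter: `fit_pvalues` is a hypothetical callable that refits crime rate on the given variables and returns each variable’s t-test p-value. The optional `keep` set is an assumption added to mirror the later decision to retain unemployment rate on theoretical grounds.

```python
def backward_eliminate(variables, fit_pvalues, alpha=0.05, keep=()):
    """Backward elimination on t-test p-values.

    fit_pvalues(vars) -> {variable: p-value} for a model refitted on `vars`
    (hypothetical; the intercept is never a removal candidate).
    Variables in `keep` are never removed.
    Returns (final variables, removal order).
    """
    current = list(variables)
    removed = []
    while current:
        pvalues = fit_pvalues(current)  # refit after every removal
        candidates = {v: p for v, p in pvalues.items() if v in current and v not in keep}
        if not candidates:
            break
        worst = max(candidates, key=candidates.get)
        if candidates[worst] <= alpha:
            break  # every remaining candidate is significant
        current.remove(worst)
        removed.append(worst)
    return current, removed


# Toy stand-in for a real fit: fixed, made-up p-values that ignore refitting effects.
toy = {"popden": 0.02, "povinc": 0.01, "pnp": 0.001, "courts": 0.002,
       "cpi": 0.60, "geog1": 0.55, "unemployr": 0.20}
final, order = backward_eliminate(list(toy), lambda vs: {v: toy[v] for v in vs},
                                  keep={"unemployr"})
```

In a real run the p-values change after every refit, which is why the procedure refits and removes one variable at a time instead of dropping every insignificant variable at once.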

The coefficient of multiple determination, R², is not the only criterion for judging the soundness of the model. To assess whether the model is adequate, several diagnostic tests have to be performed: tests for multicollinearity, normality, heteroskedasticity, linearity, and autocorrelation. Furthermore, outliers must be assessed to ascertain whether any of them greatly influence the model.

Multicollinearity among independent variables was checked using condition indices and proportions of variation. Normality was checked using the Shapiro-Wilk, Kolmogorov-Smirnov, Cramer-von Mises, and Anderson-Darling tests, whose p-values should not be less than the 0.05 level of significance. Heteroskedasticity was checked through the shape of the plot of residuals against the predicted values of y; a funnel-shaped plot would indicate heteroskedasticity. To be more certain, White’s test, available through the SPEC option of the SAS REG procedure, was also used. Autocorrelation was checked using the Durbin-Watson statistic d, for which a value close to 2 is desired. Departures from linearity were checked using partial regression plots.

Outliers have to be detected in two areas: in the dependent variable and in the independent variables. Outliers in the dependent variable are detected using Studentized residuals: an observation whose Studentized residual exceeds the corresponding critical value from the t table is considered an outlier. Outliers in the independent variables are detected through the leverages (the diagonal elements of the hat matrix): an observation whose leverage exceeds the cut-off 2p/n, where p is the number of parameters and n the number of observations, is considered an outlier. Not all outliers are influential, so the influence of each outlier must also be checked, using Cook’s D, DFFITS, and DFBETAS.

If any of these criteria is violated, necessary actions such as transformations and removal of unimportant independent variables have to be performed. If, on the other hand, the proposed model meets all the criteria, then it can reasonably be used to predict crime rate.

Results and Discussion

Preliminary results and Diagnostic Checking

In order to ascertain which factors influence crime rate, a regression model was

built. The initial model for crime rate is

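$$
\begin{aligned}
\text{crime} = {} & \beta_0 + \beta_1\,\text{popden} + \beta_2\,\text{povinc} + \beta_3\,\text{unemployr} + \beta_4\,\text{pnp} + \beta_5\,\text{faminc} \\
& + \beta_6\,\text{famexp} + \beta_7\,\text{litr} + \beta_8\,\text{flit} + \beta_9\,\text{enrolment} + \beta_{10}\,\text{cpi} + \beta_{11}\,\text{hdi} \\
& + \beta_{12}\,\text{socserv} + \beta_{13}\,\text{courts} + \beta_{14}\,\text{geog1} + \beta_{15}\,\text{geog2} + \beta_{16}\,\text{cohsurve} \\
& + \beta_{17}\,\text{cohsurvs} + \varepsilon
\end{aligned}
$$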

where crime = crime rate per province

popden = population density per province

povinc = poverty incidence per province

unemployr = unemployment rate per province

pnp = number of policemen per province

faminc = average family income per province

famexp = average family expenditure per province

litr = literacy rate per province

flit = functional literacy per province

enrolment = enrolment rate per province

cpi = consumer price index per province

hdi = human development index per province

socserv = expenditures on social services per province

courts = number of courts per province

geog1 = 1 if the province is in Luzon, 0 if otherwise

geog2 = 1 if the province is in Visayas, 0 if otherwise.

cohsurve = cohort survival rate for elementary education

cohsurvs = cohort survival rate for secondary education

ε ~ N(0, σ²)

The ANOVA F-test (F = 7.42, p < .0001) indicates that at least one of the independent variables explains crime rate, and the individual t-tests confirm that some of the independent variables are significant.

Table 1. Analysis of Variance Results

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             17       1102.64607      64.86153      7.42   <.0001
Error             40        349.53181       8.73830
Corrected Total   57       1452.17788

Root MSE             2.95606    R-Square    0.7593
Dependent Mean       4.30810    Adj R-Sq    0.6570
Coeff Var           68.61629

Table 2. Individual T-tests: Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1              6.50746         22.06194      0.29     0.7695
POPDEN       1              0.00550          0.00250      2.20     0.0333
POVINC       1             -0.14712          0.06577     -2.24     0.0309
UNEMPLOYR    1             -0.15076          0.09856     -1.53     0.1340
PNP          1             -0.00532          0.00126     -4.22     0.0001
FAMINC       1          -0.00013465       0.00007928     -1.70     0.0972
FAMEXP       1           0.00009779       0.00009885      0.99     0.3285
LITR         1              0.12483          0.17442      0.72     0.4783
FLIT         1             -0.13971          0.11416     -1.22     0.2282
ENROLMENT    1             -0.07913          0.10521     -0.75     0.4564
CPI          1             -0.02358          0.04865     -0.48     0.6306
GEOG1        1              1.23942          2.05934      0.60     0.5507
GEOG2        1              1.09009          1.53537      0.71     0.4818
HDI          1             49.39113         27.34424      1.81     0.0784
SOCSERV      1              0.00952          0.00579      1.64     0.1082
COURTS       1              0.45229          0.11380      3.97     0.0003
COHSURVE     1             -0.27471          0.12593     -2.18     0.0351
COHSURVS     1              0.11458          0.21994      0.52     0.6053

The coefficient of multiple determination is 0.7593, which means that the model can explain 75.93 percent of the variability in crime rate (the adjusted R² is 0.6570). The mean square due to regression (64.86) is also large relative to the mean square error (8.74), so much of the variability in crime rate may be attributed to the regression model rather than to error.
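The summary statistics discussed above follow directly from the ANOVA table. A short Python sketch (the function name `anova_summary` is hypothetical) recomputes them from the sums of squares and degrees of freedom in Table 1; the same function applied to Table 6 gives the R² of about 0.6752 reported for the modified model.

```python
import math


def anova_summary(ss_model, ss_error, df_model, df_error):
    """R-square, adjusted R-square, F statistic and root MSE from ANOVA sums of squares."""
    ss_total = ss_model + ss_error
    df_total = df_model + df_error  # equals n - 1
    ms_model = ss_model / df_model
    ms_error = ss_error / df_error
    return {
        "r2": ss_model / ss_total,
        "adj_r2": 1 - ms_error / (ss_total / df_total),
        "f": ms_model / ms_error,
        "root_mse": math.sqrt(ms_error),
    }


table1 = anova_summary(1102.64607, 349.53181, 17, 40)
# r2 about 0.7593, adj_r2 about 0.6570, f about 7.42, root_mse about 2.956 (Table 1)
table6 = anova_summary(1014.48120, 487.93771, 5, 56)
# r2 about 0.6752, f about 23.29 (Table 6)
```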

Of the seventeen independent variables in the model, only five turned out to be significant: population density, poverty incidence, number of policemen, number of courts, and cohort survival rate for elementary education. The first variable to be removed is CPI, since it has the highest p-value (0.6306).

Table 3. Individual T-tests without CPI

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1              1.22757         19.00492      0.06     0.9488
POPDEN       1              0.00516          0.00237      2.18     0.0354
POVINC       1             -0.14110          0.06398     -2.21     0.0331
PNP          1             -0.00531          0.00125     -4.25     0.0001
FAMINC       1          -0.00013025       0.00007802     -1.67     0.1027
FAMEXP       1           0.00009276       0.00009738      0.95     0.3464
LITR         1              0.11264          0.17098      0.66     0.5137
FLIT         1             -0.14148          0.11303     -1.25     0.2178
GEOG1        1              1.01635          1.98843      0.51     0.6120
GEOG2        1              0.98527          1.50581      0.65     0.5166
HDI          1             52.12947         26.50338      1.97     0.0560
SOCSERV      1              0.00910          0.00567      1.60     0.1165
COURTS       1              0.44333          0.11123      3.99     0.0003

After removing CPI, cohort survival rate for elementary education remained significant, and R² dropped only slightly, from 0.7593 to 0.7579. GEOG1 was then removed from the model because of its high p-value (0.6120).

Table 4. Individual T-tests without CPI and GEOG1

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1             -3.28597         16.68016     -0.20     0.8448
POPDEN       1              0.00495          0.00232      2.14     0.0384
POVINC       1             -0.12928          0.05913     -2.19     0.0344
PNP          1             -0.00547          0.00120     -4.55     <.0001
FAMINC       1          -0.00013425       0.00007695     -1.74     0.0884
FAMEXP       1           0.00010185       0.00009490      1.07     0.2893
LITR         1              0.09829          0.16717      0.59     0.5597
FLIT         1             -0.14496          0.11183     -1.30     0.2020
GEOG2        1              0.58537          1.27523      0.46     0.6486
HDI          1             53.08694         26.20356      2.03     0.0492
SOCSERV      1              0.00877          0.00559      1.57     0.1240
COURTS       1              0.46781          0.09950      4.70     <.0001

For this model, R² is 0.7563, which is still close to the initial model’s R² of 0.7593. One change among the t-tests is that HDI became significant (p = 0.0492). The variable GEOG2 is then deleted, since its p-value (0.6486) is the highest among the remaining variables.

Table 5. Individual T-tests without CPI, GEOG1, and GEOG2

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1             -2.28893         16.38564     -0.14     0.8896
POPDEN       1              0.00495          0.00230      2.16     0.0366
POVINC       1             -0.12928          0.05859     -2.21     0.0327
PNP          1             -0.00533          0.00115     -4.63     <.0001
FAMINC       1          -0.00013622       0.00007612     -1.79     0.0806
FAMEXP       1           0.00010224       0.00009402      1.09     0.2829
LITR         1              0.09915          0.16562      0.60     0.5525
FLIT         1             -0.14972          0.11032     -1.36     0.1818
HDI          1             53.44495         25.95045      2.06     0.0455
SOCSERV      1              0.00898          0.00552      1.63     0.1107
COURTS       1              0.46252          0.09792      4.72     <.0001

Here, it can be observed that there are six significant variables after removing GEOG2. The coefficient of multiple determination dropped from 0.7563 to 0.7551, which is still not far from the initial R².

The next variables to be deleted, one at a time, were enrolment, literacy rate, cohort survival rate for secondary education, functional literacy, family expenditure, family income, human development index, cohort survival rate for elementary education, and expenditures on social services. The results are shown in the following table.

Table 6. ANOVA Results and Individual T-tests for the Modified Model

Analysis of Variance

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              5       1014.48120     202.89624     23.29   <.0001
Error             56        487.93771       8.71317

Coeff Var           72.44154

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1              9.07796          1.84835      4.91     <.0001
POPDEN       1              0.00669          0.00182      3.68     0.0005
POVINC       1             -0.11473          0.03338     -3.44     0.0011
PNP          1             -0.00414       0.00096282     -4.30     <.0001
COURTS       1              0.39710          0.08870      4.48     <.0001

Note that in this modified model there are four significant variables at the 0.05 level of significance: population density, poverty incidence, number of policemen, and number of courts. Unemployment rate is still not significant; however, it is retained because, in theory, unemployment rate should affect crime rate. The F-test for this model implies that at least one of the independent variables explains the variability in crime rate. R², however, dropped from an initial value of 0.7593 to 0.6752, so this model can explain only 67.52 percent of the variability in crime rate. This is understandable, since 12 variables were removed from the model.

The new model is given by:

$$\text{crime} = \beta_0 + \beta_1\,\text{popden} + \beta_2\,\text{povinc} + \beta_3\,\text{unemployr} + \beta_4\,\text{pnp} + \beta_5\,\text{courts} + \varepsilon$$

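where crime = crime rate per province

popden = population density per province

povinc = poverty incidence per province

unemployr = unemployment rate per province

pnp = number of policemen per province

courts = number of courts per province

ε ~ N(0, σ²)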

Before this model can be accepted as the best model, diagnostic checking is necessary. The results of the tests for normality are as follows.

Table 7. Tests for Normality

Test                 Statistic            p Value
Shapiro-Wilk         W      0.90266       Pr < W      0.0001
Kolmogorov-Smirnov   D      0.121649      Pr > D      0.0222
Cramer-von Mises     W-Sq   0.211265      Pr > W-Sq   <0.0050
Anderson-Darling     A-Sq   1.377216      Pr > A-Sq   <0.0050

For all the tests for normality, the p-values are less than the 0.05 level of significance, so the null hypothesis that the error terms are normally distributed is rejected. Remedial measures such as transformations are therefore necessary; these are discussed later in this section.

After normality, homoskedasticity is checked using a plot of the residuals against the predicted values of crime rate. The residual plot does not seem to exhibit any shape (funnel or diamond) that would imply heteroskedasticity. To confirm this, the SPEC option of the REG procedure was used; the result (chi-square = 16.96 with 20 degrees of freedom, p = 0.6558) indicates that the null hypothesis of constant variance should not be rejected.

Autocorrelation and multicollinearity were also checked. For autocorrelation, the Durbin-Watson statistic is d = 2.246, with a first-order autocorrelation of -0.127. For a negative first-order autocorrelation, the statistic 4 – d is used instead of d, which yields 1.754. This value is compared with the tabulated bounds of the Durbin-Watson test: if the statistic exceeds du, the null hypothesis of no autocorrelation is not rejected, and if it is less than dl, the null hypothesis is rejected. From the table, with n = 65 and k′ = 5, dl = 1.438 and du = 1.767. Since 1.754 lies between dl and du, the test is inconclusive. As for multicollinearity, none of the condition indices exceeds 30 (the largest is 12.91), so multicollinearity does not appear to be a problem in the model. These results are shown in Table 8.
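The Durbin-Watson bounds test described above can be sketched in Python. This is an illustrative helper (both function names are hypothetical), not SAS output: `durbin_watson` computes d from a residual series, and `dw_decision` applies the rule used in the text, switching to 4 – d when d exceeds 2 (since d ≈ 2(1 − r₁), d > 2 corresponds to a negative first-order autocorrelation r₁). The bounds are the tabulated values quoted above.

```python
def durbin_watson(residuals):
    """d = sum of squared successive differences / sum of squared residuals."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e ** 2 for e in residuals)
    return num / den


def dw_decision(d, d_lower, d_upper):
    """Bounds test; uses 4 - d when d > 2 (negative first-order autocorrelation)."""
    stat = 4 - d if d > 2 else d
    if stat < d_lower:
        return stat, "reject H0 (autocorrelation present)"
    if stat > d_upper:
        return stat, "do not reject H0"
    return stat, "inconclusive"


stat, verdict = dw_decision(2.246, d_lower=1.438, d_upper=1.767)
# stat is 1.754 (up to floating-point rounding); verdict is "inconclusive"
```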

Table 8. Results for Spec Option, Durbin-Watson Test, and Multicollinearity Indicators

Collinearity Diagnostics*
Number   Eigenvalue   Condition Index
1           4.83022           1.00000
2           0.63842           2.75062
3           0.24895           4.40480
4           0.13439           5.99517
5           0.11903           6.37016
6           0.02898          12.90959

Test of First and Second Moment Specification
DF   Chi-Square   Pr > ChiSq
20        16.96       0.6558

Durbin-Watson D              2.246
Number of Observations          62
1st Order Autocorrelation   -0.127

*Proportion of variation is omitted.
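Each condition index in Table 8 is the square root of the largest eigenvalue divided by the corresponding eigenvalue. The Python sketch below (function name hypothetical) recomputes the indices from the printed eigenvalues and applies the threshold of 30 used in the text; small discrepancies in the third decimal come from the eigenvalues being rounded to five decimals.

```python
import math


def condition_indices(eigenvalues):
    """Condition index_j = sqrt(largest eigenvalue / eigenvalue_j)."""
    largest = max(eigenvalues)
    return [math.sqrt(largest / lam) for lam in eigenvalues]


eigs = [4.83022, 0.63842, 0.24895, 0.13439, 0.11903, 0.02898]  # from Table 8
indices = condition_indices(eigs)
print([round(ci, 2) for ci in indices])
print("any index above 30:", max(indices) > 30)
```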

Linearity is also checked to ascertain whether any departures from it may be observed. Partial regression plots were obtained for crime rate against each independent variable: population density, poverty incidence, number of policemen, and number of courts. The plots show no distinct departure from linearity (plots are found in the appendices).

Three observations are flagged as possible outliers. Outliers in the dependent variable are detected through Studentized residuals, and outliers in the independent variables through leverages. The cut-off for Studentized residuals is 2; the cut-off for leverage is 2p/n = 2(6)/62 = 0.1935. Observations 12, 18, and 58 are possible outliers, and the same observations may also be influential. To check for influence, Cook’s D, DFFITS, and DFBETAS have to be consulted. The cut-offs for DFFITS and DFBETAS are 2√(p/n) = 0.622 and 2/√n = 0.254, respectively.
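The three cut-offs above follow the usual rules of thumb 2p/n for leverage, 2√(p/n) for DFFITS, and 2/√n for DFBETAS, with n = 62 observations (Table 8) and p = 6 parameters (the intercept and five regressors). A short Python sketch (function name hypothetical) confirms that these rules reproduce the quoted values.

```python
import math


def diagnostic_cutoffs(n, p):
    """Size-adjusted cut-offs; p counts all parameters including the intercept."""
    return {
        "leverage": 2 * p / n,
        "dffits": 2 * math.sqrt(p / n),
        "dfbetas": 2 / math.sqrt(n),
    }


cut = diagnostic_cutoffs(n=62, p=6)
# leverage about 0.1935, dffits about 0.622, dfbetas about 0.254
```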

Table 9. Outliers and Influential Observations

Output Statistics

Obs   Dependent Variable   Predicted Value   Std Error Mean Predict   Residual   Std Error Residual   Student Residual   Cook's D
 12              25.6633           19.7931                   1.6627     5.8702                2.439              2.407      0.449
 18              10.0296            4.4596                   0.8913     5.5701                2.814              1.979      0.066
 58              22.6754           14.3599                   1.2853     8.3155                2.657              3.129      0.382

Obs   RStudent   Hat Diag H   Cov Ratio   DFFITS
 12     2.5191       0.3173      0.8476   1.7174
 18     2.0341       0.0912      0.7934   0.6442
 58     3.4141       0.1896      0.4339   1.6514

Observation 12 has a Studentized residual of 2.407 and a Studentized deleted residual of 2.5191; its leverage is 0.3173. All of these exceed the cut-offs computed. The observation may be considered influential, since its Cook’s D is the highest among the observations. Moreover, its DFFITS of 1.7174 is well beyond the cut-off, and its DFBETAS for UNEMPLOYR, PNP, and COURTS exceed the cut-off, so it may be influential on those coefficients.

Observation 18 has a Studentized deleted residual of 2.0341, but its leverage is only 0.0912. This implies that its value of crime rate may be an outlier while its independent variables are not. Its Cook’s D is not relatively high; however, its DFFITS of 0.6442 exceeds the cut-off, and its DFBETAS for POVINC, PNP, and COURTS also exceed the cut-off, so it too may be influential.

Observation 58 has a Studentized residual of 3.129 and a Studentized deleted residual of 3.4141. These support the supposition that observation 58 may be an outlier in crime rate, while its leverage of 0.1896 is just below the cut-off of 0.1935. It may also be influential, since its DFFITS is 1.6514 and the DFBETAS under POPDEN and POVINC exceed the cut-off.
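The RStudent, Cook's D, and DFFITS columns of Table 9 are not independent of one another. Each follows from the Studentized residual r and the leverage h through the standard single-case deletion formulas. The Python sketch below recomputes them from Table 9's r and h as a consistency check. It assumes n = 62 provinces and p = 6 parameters, inferred from the error degrees of freedom in Table 10. Small differences in the fourth decimal are expected because r is reported to only three decimals.

```python
import math

N_OBS, N_PARAMS = 62, 6  # assumed: 62 provinces; intercept + 5 slopes

# (Studentized residual r, leverage h) for the flagged observations, from Table 9
TABLE9 = {12: (2.407, 0.3173), 18: (1.979, 0.0912), 58: (3.129, 0.1896)}


def deletion_diagnostics(r, h, n=N_OBS, p=N_PARAMS):
    """Return (RStudent, Cook's D, DFFITS) for one observation, given its
    internally Studentized residual r and its leverage h."""
    cooks_d = (r ** 2 / p) * (h / (1 - h))
    # Externally Studentized (deleted) residual, based on n - p error df
    rstudent = r * math.sqrt((n - p - 1) / (n - p - r ** 2))
    dffits = rstudent * math.sqrt(h / (1 - h))
    return rstudent, cooks_d, dffits


results = {obs: deletion_diagnostics(r, h) for obs, (r, h) in TABLE9.items()}
for obs, (t, d, f) in results.items():
    print(f"obs {obs}: RStudent={t:.4f}  Cook's D={d:.3f}  DFFITS={f:.4f}")
```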

Corrective Measures

Since there is a problem with normality, corrective measures are necessary. One possible corrective measure is transformation of variables. For this study, several transformations were tried, among them natural logarithms and square roots, and several combinations of transformed variables were considered in order to correct the problem. In the end, taking the square roots of crime rate, poverty incidence, and number of courts (coded as sqcrime, sqpovinc, and sqcourts, respectively) corrected the problem of normality. Population density, unemployment rate, and number of policemen were left untransformed.
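The skewness coefficient can illustrate why a square root helps. For non-negative, right-skewed data, the square root compresses large values more than small ones and pulls the distribution toward symmetry. The Python sketch below uses made-up values, not data from this study. It only shows the direction of the effect; it does not reproduce the formal normality tests the study relied on.

```python
import math


def skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population form)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed, non-negative values (NOT data from this study)
raw = [1.0, 4.0, 9.0, 16.0, 100.0]
rooted = [math.sqrt(v) for v in raw]  # square root requires values >= 0

print(f"skewness of raw values:   {skewness(raw):.3f}")
print(f"skewness of square roots: {skewness(rooted):.3f}")
```

The square root is defined here because crime rate, poverty incidence, and the number of courts are all non-negative.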

Thus, the transformed model is

$$\mathit{sqcrime} = \beta_0 + \beta_1\,\mathit{popden} + \beta_2\,\mathit{sqpovinc} + \beta_3\,\mathit{unemployr} + \beta_4\,\mathit{pnp} + \beta_5\,\mathit{sqcourts} + \varepsilon$$

where sqcrime = square root of the crime rate per province

sqpovinc = square root of the poverty incidence per province

sqcourts = square root of the number of courts per province

$\varepsilon \sim N(0, \sigma^2)$

Again, this model is checked for normality, autocorrelation, heteroskedasticity,

linearity, multicollinearity, and outliers.

After transformation, the R² of the model increased. Recall that the R² of the previous model was 0.6752; the R² of the transformed model is 0.7141. This indicates an improvement in the proportion of variability the transformed model can explain. Population density, poverty incidence, number of policemen, and number of courts are still significant at the 0.05 level of significance, while unemployment rate remains insignificant. These results are shown in Table 10.

Table 10. ANOVA Results and Parameter Estimates for the Transformed Model

Analysis of Variance

Source   DF   Sum of Squares   Mean Square   F Value   Pr > F
Model     5         45.29466       9.05893     27.98   <.0001
Error    56         18.13036       0.32376

Coeff Var   32.57121

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1    2.95684             0.55258            5.35    <.0001
POPDEN       1    0.00081505          0.00036395         2.24    0.0291
SQPOVINC     1   -0.29978             0.07181           -4.17    0.0001
UNEMPLOYR    1   -0.00672             0.01379           -0.49    0.6278
SQCOURTS     1    0.55679             0.09904            5.62    <.0001
PNP          1   -0.00096767          0.00018859        -5.13    <.0001
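The R² values quoted in the text can be recovered directly from the ANOVA sums of squares as R² = SSR / (SSR + SSE). The F value is the ratio of the mean squares, and each t value is the estimate divided by its standard error. The Python sketch below uses the Table 10 and Table 12 figures. The adjusted R² it also computes is not reported in this section.

```python
def anova_summary(ss_model, df_model, ss_error, df_error):
    """R-square, adjusted R-square and overall F value from an ANOVA table."""
    ss_total = ss_model + ss_error
    r_square = ss_model / ss_total
    adj_r_square = 1 - (ss_error / df_error) / (ss_total / (df_model + df_error))
    f_value = (ss_model / df_model) / (ss_error / df_error)
    return r_square, adj_r_square, f_value


# Sums of squares and degrees of freedom as reported in Tables 10 and 12
transformed = anova_summary(45.29466, 5, 18.13036, 56)
final = anova_summary(33.58547, 5, 11.27129, 52)
print("Table 10: R2=%.4f  adj R2=%.4f  F=%.2f" % transformed)
print("Table 12: R2=%.4f  adj R2=%.4f  F=%.2f" % final)

# t value = parameter estimate / standard error (Table 10)
estimates = {
    "Intercept": (2.95684, 0.55258),
    "POPDEN": (0.00081505, 0.00036395),
    "SQPOVINC": (-0.29978, 0.07181),
    "UNEMPLOYR": (-0.00672, 0.01379),
    "SQCOURTS": (0.55679, 0.09904),
    "PNP": (-0.00096767, 0.00018859),
}
t_values = {name: est / se for name, (est, se) in estimates.items()}
for name, t in t_values.items():
    print(f"{name:>10}: t = {t:.2f}")
```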

The first tests performed after the transformation are the tests for heteroskedasticity, multicollinearity, and autocorrelation. The results are shown in Table 11.

Table 11. Tests for Multicollinearity, Heteroskedasticity and Autocorrelation

Collinearity Diagnostics

Number   Eigenvalue   Condition Index
1        5.15775       1.00000
2        0.45889       3.35256
3        0.18424       5.29102
4        0.10986       6.85197
5        0.07841       8.11067
6        0.01086      21.79524

Proportion of Variation

Number   Intercept    POPDEN    SQPOVINC     UNEMPLOYR   PNP       SQCOURTS
1        0.00060965   0.00725   0.00093940   0.00474     0.00512   0.00305
2        0.00234      0.40328   0.01132      0.02230     0.00123   0.00690
3        0.00195      0.22684   0.00249      0.14538     0.45895   0.04345
4        0.01717      0.01756   0.05286      0.71655     0.14420   0.02612
5        0.00027899   0.25731   0.02890      0.04728     0.36060   0.72823
6        0.97765      0.08777   0.90349      0.06375     0.02990   0.19225

Test of First and Second Moment Specification

DF   Chi-Square   Pr > ChiSq
20   16.12        0.7094

Note that all condition indices are less than 30. This implies that there is no serious multicollinearity in the transformed model. The specification test in Table 11 gives a chi-square value of 16.12 with a p-value of 0.7094, so the null hypothesis of constant variance is not rejected. The Durbin-Watson test statistic is 2.063, with a first-order autocorrelation of -0.037. Since d exceeds 2, which points toward negative rather than positive autocorrelation, the statistic 4 – d is considered instead. This yields 1.937, which is close to 2. Moreover, 4 – d exceeds the tabulated upper bound du = 1.767, so the null hypothesis of no autocorrelation is not rejected. Thus, there is no problem of autocorrelation.
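Each condition index in Table 11 is the square root of the ratio of the largest eigenvalue to the eigenvalue in that row. The Python sketch below assumes that definition. It reproduces the column from the listed eigenvalues, up to rounding of the eigenvalues, and applies the cut-off of 30 used in the text.

```python
import math

# Eigenvalues and reported condition indices from Table 11
EIGENVALUES = [5.15775, 0.45889, 0.18424, 0.10986, 0.07841, 0.01086]
REPORTED = [1.00000, 3.35256, 5.29102, 6.85197, 8.11067, 21.79524]


def condition_indices(eigenvalues):
    """sqrt(largest eigenvalue / eigenvalue) for every eigenvalue."""
    largest = max(eigenvalues)
    return [math.sqrt(largest / ev) for ev in eigenvalues]


indices = condition_indices(EIGENVALUES)
for computed, reported in zip(indices, REPORTED):
    print(f"computed {computed:8.4f}   reported {reported:8.4f}")
print("all condition indices below 30:", max(indices) < 30)
```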

Next, the outliers and influential observations were addressed. Recall that three observations were flagged as possible outliers and influential observations. These observations were removed one at a time, and after each removal a diagnostic check was performed: the model was checked for multicollinearity, autocorrelation, normality, outliers, and heteroskedasticity. After the three observations were removed, the final model was again checked against the same criteria. The results are found in Appendices A-30 onwards.

Table 12. ANOVA, Parameter Estimates, and Multicollinearity Tests

Analysis of Variance

Source   DF   Sum of Squares   Mean Square   F Value   Pr > F
Model     5         33.58547       6.71709     30.99   <.0001
Error    52         11.27129       0.21676

Coeff Var   28.80432

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Variance Inflation
Intercept    1    2.59768             0.48076            5.40    <.0001     0
POPDEN       1    0.00076341          0.00031984         2.39    0.0207     1.60853
SQPOVINC     1   -0.25736             0.06291           -4.09    0.0001     1.44434
UNEMPLOYR    1   -0.00848             0.01183           -0.72    0.4767     1.02209
PNP          1   -0.00082873          0.00015836        -5.23    <.0001     1.54602
SQCOURTS     1    0.53004             0.08455            6.27    <.0001     2.10035

Collinearity Diagnostics

Number   Eigenvalue   Condition Index
1        5.17743       1.00000
2        0.44097       3.42650
3        0.18328       5.31492
4        0.10913       6.88793
5        0.07905       8.09268
6        0.01014      22.60067

Proportion of Variation

Number   Intercept    POPDEN    SQPOVINC     UNEMPLOYR    PNP          SQCOURTS
1        0.00057169   0.00744   0.00086302   0.00465      0.00501      0.00302
2        0.00239      0.43102   0.01025      0.02384      0.00092325   0.00725
3        0.00261      0.23433   0.00378      0.10920      0.47734      0.04222
4        0.01606      0.02305   0.05667      0.79816      0.04533      0.00171
5        0.00010800   0.22789   0.01191      0.00065690   0.43714      0.75883
6        0.97826      0.07627   0.91652      0.06349      0.03426      0.18696

Table 13. Results for Spec Option, Durbin-Watson Test, and Tests for Normality

Test of First and Second Moment Specification

DF   Chi-Square   Pr > ChiSq
20   11.33        0.9372

Tests for Normality

Test                 Statistic          p Value
Shapiro-Wilk         W      0.980824    Pr < W      0.4877
Kolmogorov-Smirnov   D      0.096603    Pr > D      >0.1500
Cramer-von Mises     W-Sq   0.095625    Pr > W-Sq   0.1290
Anderson-Darling     A-Sq   0.486635    Pr > A-Sq   0.2244

After these corrective measures, the final model has a coefficient of multiple determination of 0.7487. This means that the model explains 74.87 percent of the variability in the square root of crime rate. Its VIFs are all below 10 (the largest is 2.10035, for SQCOURTS), and its condition indices are all less than 30, so multicollinearity is not a problem in this model. Linearity is checked using partial regression plots, and again there seem to be no distinct departures from linearity. The Durbin-Watson test statistic is 2.142, with a first-order autocorrelation of -0.081. Again, the statistic 4 – d is considered instead, giving 1.858, which is still close to 2. Compared with the tabulated value (Durbin-Watson table, n = 60, k' = 5), 4 – d = 1.858 is greater than du = 1.767, so the null hypothesis of no autocorrelation is not rejected. Thus, there is no autocorrelation present. The p-values of all four tests for normality in Table 13 exceed the 0.05 level of significance. The null hypothesis that the error terms are normally distributed is therefore not rejected, and the error terms may be assumed to follow a normal distribution. To test for heteroskedasticity, the spec option in SAS is used. Since the computed chi-square value is 11.33 with a p-value of 0.9372, the null hypothesis of constant variance is not rejected. Thus, there is no evidence of heteroskedasticity.
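The Durbin-Watson decision used in this section can be written out explicitly. In the Python sketch below, `durbin_watson` computes d from a residual series. `dw_decision` applies the bounds test to d when d ≤ 2 and to 4 – d when d > 2. The upper bound du = 1.767 is the value quoted in the text. The lower bound dL = 1.408 is an assumed tabulated value for n = 60 and k' = 5 that this section does not report, so it should be checked against the Durbin-Watson table in Appendix B.

```python
def durbin_watson(residuals):
    """Durbin-Watson d: sum of squared successive differences over sum of squares."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


def dw_decision(d, d_lower, d_upper):
    """Bounds test: d is compared with (d_lower, d_upper) when d <= 2
    (positive autocorrelation); 4 - d is used when d > 2 (negative)."""
    stat = d if d <= 2 else 4 - d
    if stat < d_lower:
        return "reject H0: autocorrelation present"
    if stat > d_upper:
        return "do not reject H0: no autocorrelation"
    return "inconclusive"


D_LOWER = 1.408  # ASSUMED tabulated dL for n = 60, k' = 5; not given in the text
D_UPPER = 1.767  # du as quoted in the text

for label, d in [("transformed model", 2.063), ("final model", 2.142)]:
    print(f"{label}: d = {d}, 4 - d = {4 - d:.3f} -> {dw_decision(d, D_LOWER, D_UPPER)}")
```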

Conclusion

The final model of crime rate is then

$$\mathit{Crime}^{*} = \beta_0 + \beta_1\,\mathit{popden} + \beta_2\,\mathit{povinc}^{*} + \beta_3\,\mathit{unemployr} + \beta_4\,\mathit{pnp} + \beta_5\,\mathit{courts}^{*} + \varepsilon$$

where Crime* = sqcrime

povinc* = sqpovinc

courts* = sqcourts.

The estimated model for crime rate is then equal to

$$\widehat{\mathit{Crime}}^{*} = 2.1856 + 0.0008\,\mathit{popden} - 0.2657\,\mathit{povinc}^{*} - 0.0001\,\mathit{unemployr} - 0.0008\,\mathit{pnp} + 0.5014\,\mathit{courts}^{*}$$

where each parameter estimate after the intercept represents the increase or decrease in the estimated mean of the square root of crime rate (Crime*) per unit increase in the corresponding independent variable (or in its square root, for povinc* and courts*), holding all other variables constant.
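Because the model is fitted on the square-root scale, predicting a crime rate takes two steps. First, evaluate the fitted equation using the square roots of poverty incidence and number of courts. Second, square the result. The Python sketch below uses the coefficients as printed in the estimated equation above; the function name and the province values in the usage line are hypothetical. Squaring the fitted value gives only a rough point prediction. Since E[Y] = (E[√Y])² + Var(√Y), it tends to understate the mean crime rate.

```python
import math

# Coefficients as printed in the estimated equation above (square-root scale)
COEFS = {
    "intercept": 2.1856,
    "popden": 0.0008,
    "sqpovinc": -0.2657,
    "unemployr": -0.0001,
    "pnp": -0.0008,
    "sqcourts": 0.5014,
}


def predict_crime_rate(popden, povinc, unemployr, pnp, courts, coefs=COEFS):
    """Hypothetical helper: evaluate the model on the square-root scale,
    then square the fitted value to return to the crime-rate scale."""
    sq_pred = (
        coefs["intercept"]
        + coefs["popden"] * popden
        + coefs["sqpovinc"] * math.sqrt(povinc)
        + coefs["unemployr"] * unemployr
        + coefs["pnp"] * pnp
        + coefs["sqcourts"] * math.sqrt(courts)
    )
    # A negative fitted square root has no meaning, so clip it at zero first
    return max(sq_pred, 0.0) ** 2


# Hypothetical province, not an observation from this study
print(f"{predict_crime_rate(popden=300, povinc=36, unemployr=10, pnp=1000, courts=16):.2f}")
```

The estimates in Table 12 can be passed through the `coefs` argument instead.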

Note that for the independent variables poverty incidence (povinc), unemployment rate (unemployr), and number of courts (courts), the signs of the coefficients are contrary to theoretical expectations. Common sense would dictate that a rise in poverty incidence entails a rise in the crime rate, and the same can be said for unemployment rate. On the other hand, a rise in the number of courts would be expected to decrease the crime rate. In this model, however, the relationships of poverty incidence and of unemployment rate with crime rate are inverted: as poverty incidence increases, crime rate decreases, and as unemployment rate increases, crime rate also decreases. This may be attributed to the political and economic instability in 2000 that led to the ouster of former President Joseph Estrada. If the poverty incidence, unemployment rate, and crime rate for this year are compared with those of other years, it can be observed that poverty incidence and unemployment rate were high in 2000 while the crime rate was low, relative to other years.

As for the number of courts, it can be observed that there is a direct relationship between it and crime rate. This may be because, as the number of courts increases, so does the opportunity for people to file cases. This would increase the number of reported crimes and hence the crime rate. Although theory and the empirical data disagree here, the model obtained is not unusual and is still plausible.

Thus, through this model, we were able to establish a linear relationship between crime rate and population density, poverty incidence, number of policemen per province, and number of courts per province. Since this model satisfies the regression assumptions checked above, it may be a plausible predictor of crime rate in the Philippines.

Recommendations

As previously mentioned, this study focused on crime rate in the Philippines using provincial data for the year 2000 only. To improve the study, the group recommends considering data from other time periods as well as data from municipalities or cities. The group also recommends fitting separate regression models for the different classifications of crime (index and non-index crimes, and crimes against property and against persons), as different factors may affect each category. Separate models would allow the factors to be compared across types of crime and may lead to a better model in terms of the coefficient of determination.

References

The Philippine countryside in figures. (n.d.). Retrieved May 18, 2010, from

http://www.nscb.gov.ph/countryside/default.asp

Becker, G. (1968). Crime and punishment: an economic approach. The Journal of

Political Economy, 76(2), 169-217. Retrieved from http://www.jstor.org/

Ehrlich, I. (1975). The deterrent effect of capital punishment: a question of life and death.

The American Economic Review, 65(3), 397-417. Retrieved from

http://www.jstor.org/

Ehrlich, I. (1975). On the relation between education and crime. In F. T. Juster (Ed.), Education, income, and human behavior (pp. 313–338). United States of America: National Bureau of Economic Research.

Fajnzylber, P., et al. (1998). Determinants of crime rates in Latin America and the world:

an empirical assessment. Washington, D.C., United States of America: The

World Bank.

Gillado, M. F., & Cruz, T.T. (2004, October 4-5). Panel data estimation of crime rates in

the Philippines. Retrieved from

www.nscb.gov.ph/ncs/9thncs/papers/publicOrder_PanelData.pdf

National Statistical Coordination Board. (2003). The Philippine countryside in figures

(2003 edition). Makati City, Philippines: Author.

Reynolds, M. (2000). Crime and punishment in Texas in the 1990s. Retrieved from

http://www.ncpa.org/pub/st237?pg=6.

Sanidad-Leones, C. (2010). The current situation of crime associated with urbanization: problems experienced and countermeasures initiated in the Philippines. Retrieved from http://www.unafei.or.jp/english/pdf/PDF_rms/no68/09_Leones-1_p133-150.pdf

Wadsworth, T. (2001). Employment, crime, and context: a multi-level analysis of the

relationship between work and crime (Doctoral Dissertation, University of

Washington). Available from the National Criminal Justice Reference Service

(NCJRS) website http://www.ncjrs.gov/pdffiles1/nij/grants/198118.pdf.

Yasir, S., et al. (2009). Unemployment, poverty, inflation, and crime nexus: cointegration and causality analysis of Pakistan. Pakistan Economic and Social Review, 47(1), 79-98.

Appendices

Appendix A. Results (SAS Outputs)

Appendix B. Durbin-Watson Table
