Crime Hotspot Mapping and Analysis

Paul Zandbergen
Department of Geography
University of New Mexico

Workshop Presentation Posted

http://www.paulzandbergen.com/workshops

Outline

• Introduction
  – Definition of crime hotspots
  – Purpose of hotspot mapping
  – Crime data for hotspot mapping
• Hotspot techniques
  – Grid-based mapping
  – Local Moran’s I
  – Gi*
  – Kernel density
  – Nearest Neighbor Hierarchical clustering (NNH)
  – Spatial and Temporal Analysis of Crime (STAC)
• Hands-on demonstrations
• Comparison of hotspot techniques
• Recent developments
  – Predictive crime mapping
  – Hotspot visualization

Definition of Crime Hotspot

A hotspot is an area that has a greater than average number of criminal or disorder events, or an area where people have a higher than average risk of victimization

• Ways to express hotspots:
  – Counts
  – Densities (counts per sq. km)
  – Rates (counts per population at risk)

Purpose of Hotspot Mapping

• Describe large volume of data in a meaningful way
• Detect spatial and temporal patterns and trends
• Strategically allocate resources
• Test whether policies are working
• Identify underlying causes for crime events

Geocoding Crime Events

Global vs. Local Clustering

• Global clustering
  – Determines whether a pattern is clustered
  – Produces a single statistic with confidence intervals
  – e.g. Nearest Neighbor Index (NNI)
• Local clustering
  – Determines where clusters are located
  – Produces a map of clusters (“hotspots”)
  – e.g. kernel density

Methods for Global Clustering

• Methods:
  – Nearest Neighbor Index
  – K-nearest Neighbor
  – Quadrat analysis
  – Ripley’s K-function
  – Moran’s I
  – Getis-Ord General G
• Determine whether a pattern is clustered, but not where.

Example: Nearest Neighbor Index

NNI = Observed Mean Distance / Expected Mean Distance

$$\mathrm{NNI} = \frac{\bar{d}_o}{d_e}$$

Expected mean distance:

$$d_e = \frac{0.5}{\sqrt{n/A}}$$

Standard error:

$$SE = \frac{0.26136}{\sqrt{n^2/A}}$$

Example values: n = 100, A = 100,000

[Figure: example point pattern]
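The calculation above can be sketched in a few lines of NumPy. This is a minimal illustration, not the ArcGIS implementation: the function name `nearest_neighbor_index` is hypothetical, distances are computed by brute force (fine for a few thousand points), and the z-score is assumed to be the observed minus expected mean distance divided by the standard error.

```python
import numpy as np

def nearest_neighbor_index(points, area):
    """Return (NNI, z-score) for an array of projected x/y coordinates."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    # Pairwise distances; the diagonal is masked so a point is not its own neighbor.
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)
    d_obs = dist.min(axis=1).mean()          # observed mean nearest-neighbor distance
    d_exp = 0.5 / np.sqrt(n / area)          # expected mean distance under randomness
    se = 0.26136 / np.sqrt(n ** 2 / area)    # standard error
    return d_obs / d_exp, (d_obs - d_exp) / se

# Four points packed into one corner of a 10 x 10 study area.
nni, z = nearest_neighbor_index([(0, 0), (1, 0), (0, 1), (1, 1)], area=100.0)
print(nni, z)
```

An NNI below 1 with a strongly negative z-score points to clustering; values near 1 are consistent with a random pattern. The result depends heavily on the study area A supplied.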

[Figures: homicide locations; homicide results from the Average Nearest Neighbor tool]

Z-scores and p-values

• P-value: probability of seeing a pattern at least this extreme if it had been created by a random process
• Z-score: number of standard deviations corresponding to a certain p-value
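As a rough illustration of how a z-score maps to a p-value, the sketch below assumes a two-sided test against a standard normal distribution. The function name `two_sided_p_value` is hypothetical; only the Python standard library is used.

```python
import math

def two_sided_p_value(z):
    """Two-sided p-value for a z-score under a standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))

for z in (1.96, -1.96, 2.58):
    print(z, two_sided_p_value(z))
```

A z-score of about ±1.96 corresponds to p ≈ 0.05 and ±2.58 to p ≈ 0.01, which is why those values are common significance cut-offs.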

Example: Nearest Neighbor Index

Crime Type      NNI     Z-score    Conclusion
Auto burglary   0.346   -134.24    Highly clustered
Homicide        0.467   -9.17      Highly clustered
Robbery         0.345   -54.67     Highly clustered

[Maps: homicides, auto burglary, robbery]

Methods for Local Clustering

• Methods:
  – Grid-based mapping
  – Local Moran’s I
  – Gi*
  – Kernel density
  – Nearest Neighbor Hierarchical clustering (NNH)
  – Spatial and Temporal Analysis of Crime (STAC)
• Determine where clustering occurs
• Clusters of high values are called “hotspots”

Aggregated Data vs. Point Data

[Figure: alternative forms of Aggregation and alternative forms of Point Pattern Analysis]

Why Aggregation?

• Determine crime rates (e.g. # per 10,000 residents)
• Allows for different types of statistical tests

Hotspot Methods

• Aggregated data
  – Grid-based thematic mapping
  – Local Moran’s I
  – Gi*
• Point data
  – Kernel density
  – Nearest Neighbor Hierarchical clustering (NNH)
  – Spatial and Temporal Analysis of Crime (STAC)

Grid-based Thematic Mapping

Hands-on: Grid-based Thematic Mapping

• Overlay a grid over the study area
  – Several hundred meters to a few km
  – ArcGIS: Create Fishnet Tool
• Generate a count of crimes per grid cell
  – ArcGIS: Spatial Join
• Remove grid cells with count = 0
  – ArcGIS: SQL (Select by Attribute, Select)
• Determine a threshold value for a “hotspot”
  – Typically a quintile classification
  – ArcGIS: data classification or SQL
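The four steps above can also be approximated outside ArcGIS. The sketch below is a minimal NumPy stand-in under stated assumptions: coordinates are projected (meters), the grid origin is the minimum x/y of the data, and "hotspot" means the top quintile of non-empty cells. The function name `grid_hotspots` and its parameters are hypothetical.

```python
import numpy as np

def grid_hotspots(x, y, cell_size, top_fraction=0.2):
    """Count points per square grid cell and flag the top share of non-empty cells."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Step 1: overlay a grid (enough cells to cover the full extent of the data).
    nx = int(np.floor((x.max() - x.min()) / cell_size)) + 1
    ny = int(np.floor((y.max() - y.min()) / cell_size)) + 1
    x_edges = x.min() + cell_size * np.arange(nx + 1)
    y_edges = y.min() + cell_size * np.arange(ny + 1)
    # Step 2: count crimes per cell.
    counts, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges])
    # Step 3: ignore empty cells when deriving the threshold.
    nonzero = counts[counts > 0]
    # Step 4: threshold at the chosen quantile (0.8 for the top quintile).
    threshold = np.quantile(nonzero, 1.0 - top_fraction)
    return counts, counts >= threshold

x = [0.5, 0.6, 0.7, 3.5, 1.5, 3.9]
y = [0.5, 0.6, 0.7, 3.5, 3.5, 0.2]
counts, hot = grid_hotspots(x, y, cell_size=2.0)
print(counts)
print(hot)
```

The cell size and the quantile are the two choices that drive the result, so it is worth rerunning with a few values of each before settling on a map.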

[Maps: 2 km grid with 10 class quintile classification; highest class]

Demo: Grid‐based Thematic Mapping

Local Moran’s I and Gi*

• Basic question: Are nearby features similar?
• Two approaches for aggregated data:

1. Measuring the similarity of nearby features
  – Global: Moran’s I
  – Local: Local Moran’s I

2. Measuring the concentration of high or low values
  – Global: General G-statistic
  – Local: Gi*

Global: Moran’s I and General G-statistic

• Global measures of spatial autocorrelation
  – i.e. a single statistic for one spatial pattern
• Significance testing
  – using Z-scores derived from standard error
• Analysis based on normalized data
  – i.e. using densities and rates, not raw counts
• Defining what “nearby” means is critical
  – this is referred to as “spatial weights”

Moran’s I

$$I = \frac{n}{\sum_i \sum_j w_{ij}} \cdot \frac{\sum_i \sum_j w_{ij}\,(x_i - x')(x_j - x')}{\sum_i (x_i - x')^2}$$

x' is the mean value for all features
i is the index for the target feature
j is the index for the neighbor feature
n is the number of features
w_ij is the spatial weight for the pair
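A minimal NumPy sketch of the global Moran’s I statistic follows, using the symbols defined above (x' is the mean, w_ij the spatial weight). The function name `morans_i` is hypothetical, the weights matrix is assumed to have a zero diagonal, and the significance test (Z-score) is left out.

```python
import numpy as np

def morans_i(x, w):
    """Global Moran's I for values x and an n x n spatial weights matrix w."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    n = len(x)
    dev = x - x.mean()                       # (x_i - x')
    cross = (w * np.outer(dev, dev)).sum()   # sum_i sum_j w_ij (x_i - x')(x_j - x')
    return (n / w.sum()) * cross / (dev ** 2).sum()

# Four features in a row; each is a neighbor of the feature next to it.
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
print(morans_i([1, 2, 3, 4], w))   # positive: similar values sit next to each other
print(morans_i([1, 4, 1, 4], w))   # negative: high and low values alternate
```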

Expected Value for Moran’s I

• Values of I range from -1 to 1
• Random distribution: I ~ 0
• Clustered: I > 0 (positive spatial autocorrelation)
• Dispersed: I < 0 (negative spatial autocorrelation)
• Significance test based on standard deviation
• Z-scores are used in the reporting

Moran’s I Example

[Map: crime density (# per sq. km)]

General G-Statistic

$$G(d) = \frac{\sum_i \sum_j w_{ij}(d)\, x_i x_j}{\sum_i \sum_j x_i x_j}, \quad j \neq i$$

i is the index for the target feature
j is the index for the neighbor feature
n is the number of features
w_ij is the spatial weight for the pair

Expected value for General G-statistic

• The General G-Statistic is a relative statistic, so no conclusions can be drawn from its absolute value. This is because the sum of weights for a dataset can vary.
• The theoretical range is from 0 to 1, but values close to 1 are rare.
• If the observed G-statistic is higher than the expected value, there is a concentration of high values.
• If the observed G-statistic is lower than the expected value, there is a concentration of low values.

$$E[G(d)] = \frac{W}{n(n-1)}, \quad W = \sum_i \sum_j w_{ij}(d)$$
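The comparison of observed and expected G can be sketched as follows. This is an illustration only: the function name `general_g` is hypothetical, the weights are assumed to be binary with a zero diagonal, W is taken to be the sum of all weights, and the Z-score is not computed.

```python
import numpy as np

def general_g(x, w):
    """Return (observed G, expected G) for non-negative values x and weights w."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    n = len(x)
    products = np.outer(x, x)
    off_diagonal = ~np.eye(n, dtype=bool)    # enforce j != i
    observed = (w * products)[off_diagonal].sum() / products[off_diagonal].sum()
    expected = w[off_diagonal].sum() / (n * (n - 1))
    return observed, expected

w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
observed, expected = general_g([1, 2, 3, 4], w)
print(observed, expected)
```

Only the difference between observed and expected G is interpretable, which is why both are returned together.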

General G-statistic Example

[Map: crime density (# per sq. km)]

Spatial Neighborhoods

• Two basic types:
  – Adjacency (i.e. sharing a boundary)
  – Distance-based
• Adjacency not as widely used, since it is very sensitive to specific local boundaries
• Many variations for distance-based neighborhoods exist
• Neighborhood relationships are stored in a spatial weights matrix

Spatial weights matrix

• Matrix for all features in a spatial dataset
  – n features produce an n x n matrix
• Values in matrix indicate the “weight”
  – Can be 0s and 1s to indicate “no neighbor” and “neighbor”
  – Can be a value that indicates relative weights, i.e. some neighbors count more than others
• Weights can be row-normalized to account for potential biases
  – e.g. number of neighbors may vary with size of polygons, so you can normalize for the number of neighbors

Example: Kansas Counties

[Figure: spatial weights matrix with target counties (001, 003, 005, ..., 209) as rows and neighbor counties (001, 003, 005, ..., 209) as columns; 105 x 105 matrix]

Example: Polygon Contiguity

Regular spatial weights matrix (row for target 167 shown)

        001   ...   009     052     053     105     141     165     ...   209
001
167     0     0     1       1       1       1       1       1       0     0
...
209

Row-standardized spatial weights matrix (row for target 167 shown)

        001   ...   009     052     053     105     141     165     ...   209
001
167     0     0     0.167   0.167   0.167   0.167   0.167   0.167   0     0
...
209
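Row standardization, as in the example above, simply divides each row of the weights matrix by its row sum. A minimal NumPy sketch is shown below; the function name `row_standardize` is hypothetical, and features without any neighbors are assumed to keep an all-zero row.

```python
import numpy as np

def row_standardize(w):
    """Scale each row of a spatial weights matrix so that it sums to 1."""
    w = np.asarray(w, dtype=float)
    row_sums = w.sum(axis=1, keepdims=True)
    # Rows without neighbors stay all zero instead of dividing by zero.
    return np.divide(w, row_sums, out=np.zeros_like(w), where=row_sums > 0)

# One target with six contiguous neighbors, as in the row shown above.
row = np.array([[0, 0, 1, 1, 1, 1, 1, 1, 0, 0]])
print(np.round(row_standardize(row), 3))
```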

Distance-based Neighborhoods

• Distance is based on centroids of polygons
• Data needs to be projected
• Fixed threshold distance
  – within the distance all neighbors count the same
• Inverse distance weighted
  – Regular or squared
  – Up to a threshold distance or to infinity (end of study area)

Example: Fixed Distance

• Distance determination uses centroids
• Threshold distance should be based on an understanding of what makes up a meaningful “neighborhood” for the variable in question
• Row standardization is recommended for fixed distances

Example: Inverse Distance Weighted

• Distance determination uses centroids
• Can be regular or squared
• Typically no threshold distance
• Row standardization is not recommended since it makes all distances relative
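Both kinds of distance-based weights can be built from polygon centroids with a few lines of NumPy. The sketch below is illustrative only: the function names are hypothetical, centroids are assumed to be in projected coordinates, and a feature is never treated as its own neighbor.

```python
import numpy as np

def centroid_distances(centroids):
    """Pairwise Euclidean distances between projected centroid coordinates."""
    c = np.asarray(centroids, dtype=float)
    diff = c[:, None, :] - c[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def fixed_distance_weights(centroids, threshold):
    """Weight 1 for every neighbor within the threshold distance, else 0."""
    d = centroid_distances(centroids)
    w = (d <= threshold).astype(float)
    np.fill_diagonal(w, 0.0)
    return w

def inverse_distance_weights(centroids, power=1, threshold=None):
    """Weight 1/d (power=1) or 1/d^2 (power=2), optionally cut off at a threshold."""
    d = centroid_distances(centroids)
    w = np.zeros_like(d)
    mask = d > 0                      # skips the diagonal and coincident centroids
    if threshold is not None:
        mask &= d <= threshold
    w[mask] = 1.0 / d[mask] ** power
    return w

centroids = [(0, 0), (1, 0), (3, 0)]
print(fixed_distance_weights(centroids, threshold=1.5))
print(inverse_distance_weights(centroids, power=2))
```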

Hands-on: Moran’s I and General G

• Aggregate data to polygons
  – ArcGIS: Spatial Join
• Determine densities or rates:
  – ArcGIS: Add Field, Field Calculator
• Determine patterns
  – ArcGIS: Spatial Autocorrelation (Moran’s I)
  – ArcGIS: High/Low Clustering (Getis-Ord General G)
  – Select spatial weights

Demo: Moran’s I and General G

Local Moran’s I

• Local version of Moran’s I; for every feature:
  – a value for local Moran’s I
  – Z-score and corresponding p-value
  – type of cluster
• Spatial weights: inverse distance
• Interpretation
  – Values for Moran’s I indicate whether clustering occurs
  – Z-values indicate whether the result is statistically significant

Local Moran’s I Results

[Maps: % of people w/ diabetes; local clustering result]

Local Moran’s I Interpretation

• Clusters:
  – HH (high-high): cluster of high values
  – LL (low-low): cluster of low values
• Outliers:
  – HL (high-low): high value surrounded by low values
  – LH (low-high): low value surrounded by high values
• Not significant
  – p-value > 0.05
• For crime data we are interested in HH clusters
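The four cluster and outlier types can be illustrated with a small NumPy sketch. It is a simplification under stated assumptions: the function name `local_moran_types` is hypothetical, weights are row-standardized inside the function, the local statistic uses one common textbook form (deviation times the weighted mean deviation of the neighbors, scaled by the variance), and no significance test is run, so in practice features with p > 0.05 would still need to be screened out.

```python
import numpy as np

def local_moran_types(x, w):
    """Return (local I values, HH/LL/HL/LH labels) for values x and weights w."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    dev = x - x.mean()
    row_sums = w.sum(axis=1, keepdims=True)
    w_std = np.divide(w, row_sums, out=np.zeros_like(w), where=row_sums > 0)
    lag = w_std @ dev                          # mean deviation of the neighbors
    local_i = dev * lag / (dev ** 2).mean()
    labels = [("H" if d > 0 else "L") + ("H" if l > 0 else "L")
              for d, l in zip(dev, lag)]
    return local_i, labels

# Five features in a row with one high value in the middle.
w = np.zeros((5, 5))
for i in range(4):
    w[i, i + 1] = w[i + 1, i] = 1
local_i, labels = local_moran_types([1, 1, 10, 1, 1], w)
print(labels)
```

A positive local I goes with the HH and LL labels (similar neighbors) and a negative local I with HL and LH (outliers).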

Local Moran’s I Example

[Maps: crime density (# per sq km); statistically significant high-high clusters]

Gi*

• Local version of General G-Statistic
  – a Z-score and corresponding p-value for every feature
  – no local Gi* value reported
• Spatial weights: fixed distance
• Two types:
  – Gi: does not include the target feature in the neighborhood
  – Gi*: includes the target feature in the neighborhood

Gi* Results

[Maps: % of people w/ diabetes; local clustering result]

Gi* Interpretation

• Positive Z-scores > 1.96
  – Clustering of high values
• Negative Z-scores < -1.96
  – Clustering of low values
• Z-scores between -1.96 and 1.96
  – Not significant
• For crime data we are interested in clustering of high values
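A compact NumPy sketch of a Gi* Z-score is given below. It assumes the commonly published Getis-Ord form of the statistic, which is not spelled out in these slides, with binary fixed-distance weights that include the target feature itself (diagonal = 1). The function name `gi_star_z` is hypothetical.

```python
import numpy as np

def gi_star_z(x, w):
    """Gi* Z-scores for values x and weights w (target included in its own neighborhood)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    n = len(x)
    mean = x.mean()
    s = np.sqrt((x ** 2).mean() - mean ** 2)
    w_sum = w.sum(axis=1)
    w_sq_sum = (w ** 2).sum(axis=1)
    numerator = w @ x - mean * w_sum
    denominator = s * np.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
    return numerator / denominator

# Ten features in a row; the neighborhood is the feature plus its direct neighbors.
x = np.array([10, 10, 10, 1, 1, 1, 1, 1, 1, 1])
idx = np.arange(len(x))
w = (np.abs(idx[:, None] - idx[None, :]) <= 1).astype(float)
z = gi_star_z(x, w)
print(np.round(z, 2))
print(np.where(z > 1.96, "hot", np.where(z < -1.96, "cold", "n.s.")))
```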

Gi* Example

[Maps: crime density (# per sq km); clustering of high values]

Local Moran’s I vs. Gi*

• Both indicate spatial clustering of high crime areas
• Local Moran’s I is more robust, since it is not dependent on whether high or low values cluster
• Gi* is most meaningful if either high or low values cluster, not both
  – This is typically the case for crime densities, so Gi* is well suited for crime hotspot mapping

Hands-on: Local Moran’s I and Gi*

• Aggregate data to polygons
  – ArcGIS: Spatial Join
• Determine densities or rates:
  – ArcGIS: Add Field, Field Calculator
  – Densities: normalize by area
  – Rates: normalize by population at risk
• Determine patterns
  – ArcGIS: Cluster and Outlier Analysis (Anselin Local Moran’s I)
  – ArcGIS: Hot Spot Analysis (Getis-Ord Gi*)
  – Select spatial weights

Demo: Local Moran’s I and Gi*

Mapping Density

• Point values can be used to calculate a local density.
  – Essentially, the point values are spread out over a surface. The measured quantity of the points is distributed throughout a landscape, and a density value is calculated for each cell in the output raster.
• Apply a search radius or bandwidth
  – A circular search area is applied to each cell in the output raster being created. The search area determines the distance to search for points in order to calculate a density value for each cell in the output raster.
• Calculating density is NOT a type of interpolation.

Surface Density Calculation

When mapping density, you determine the count per unit of area for a surface. This process takes a discrete set of points and creates a raster surface where each cell is given a density value. The calculation is the count divided by the area of the circle defined by a user-specified search radius, which is centered on the cell and applied to each cell in the raster.

Surface Density Calculation

The simple method for creating a density surface uses a circular search area, or neighborhood, to calculate cell values. In a density surface, individual cell values are calculated by dividing the number of features that fall within the search area (e.g., observations) by the size of the area (e.g., 2.88 acres). The resulting value is then assigned to the cell. Every cell in the surface is processed in the same way.

Density Calculations

• You can calculate density using simple or kernel calculations:
• In a simple density calculation, points that fall within the search area are summed and then divided by the search area size to get each cell’s density value.
• The kernel density calculation works the same as the simple density calculation, except the points lying near the center of a raster cell’s search area are weighted more heavily than those lying near the edge. The result is a smoother distribution of values.
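The difference between the two calculations can be shown for a single raster cell. In the sketch below the function names are hypothetical, and the kernel is assumed to be a quartic (biweight) function that integrates to 1 over the search circle; the slides do not state which kernel shape is used.

```python
import numpy as np

def _distances(cell_xy, points):
    """Distances from a cell center to each point (projected coordinates)."""
    pts = np.asarray(points, dtype=float).reshape(-1, 2)
    return np.hypot(pts[:, 0] - cell_xy[0], pts[:, 1] - cell_xy[1])

def simple_density(cell_xy, points, radius):
    """Number of points inside the search circle divided by the circle's area."""
    d = _distances(cell_xy, points)
    return np.count_nonzero(d <= radius) / (np.pi * radius ** 2)

def kernel_density(cell_xy, points, radius):
    """Like simple density, but points near the cell center count more (quartic kernel)."""
    d = _distances(cell_xy, points)
    inside = d < radius
    weights = (1.0 - (d[inside] / radius) ** 2) ** 2
    return 3.0 / (np.pi * radius ** 2) * weights.sum()

cell = (0.0, 0.0)
near, far = [(0.0, 0.0)], [(0.9, 0.0)]
print(simple_density(cell, near, 1.0), simple_density(cell, far, 1.0))
print(kernel_density(cell, near, 1.0), kernel_density(cell, far, 1.0))
```

The simple method gives both points the same contribution, while the kernel method lets the contribution fade toward the edge of the search radius, which is what produces the smoother surface.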

Density Types

[Figures: simple vs. kernel density surfaces]

Kernel Density

[Figures: kernel density examples]

Hands-on: Kernel Density Example

• Start with point data
  – Create weight field if needed
• Create kernel density surface
  – ArcGIS: Kernel Density
  – Select bandwidth (search radius)
  – Select output raster cell size
• Create hotspots from surface
  – ArcGIS: Classify or Reclassify
  – Select threshold values

Demo: Kernel density

Importance of Search Radius

• Boundaries vary greatly with search radius
• No single best way to pick the best value
• Size should correspond to scale of analysis
  – regional vs. local vs. micro
• General guidelines
  – consistency: keep radius the same for comparisons
  – uniform, i.e. no adaptive radius (very confusing)
  – “typical” values range from 100 to 500 m for “local” analysis

Importance of Threshold

• What density should be considered “hot”?
• No single best technique
• One recommended technique
  – Remove cells with density = 0
  – Determine average density (e.g. x crimes per sq km)
  – Map multiple of the mean (< 1*mean, 1-2*mean, etc.)
• What NOT to use:
  – Software defaults
  – Classifications driven by distribution (e.g. Natural Breaks)
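The recommended multiple-of-the-mean technique is easy to sketch in NumPy. The function name `classify_by_mean_multiple` and the class coding (-1 for empty cells, 0 for below the mean, 1 for 1-2 times the mean, and so on) are illustrative choices, not something prescribed by the slides.

```python
import numpy as np

def classify_by_mean_multiple(density):
    """Return (mean of non-zero cells, class per cell as a multiple of that mean)."""
    d = np.asarray(density, dtype=float)
    mean = d[d > 0].mean()                       # zero cells are excluded from the mean
    classes = np.where(d > 0, np.floor(d / mean), -1).astype(int)
    return mean, classes

mean, classes = classify_by_mean_multiple([0, 1, 2, 3, 6])
print(mean, classes)
```

Because the classes are tied to the mean rather than to the shape of the distribution, maps for different periods or crime types remain comparable.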

Demo: Kernel density

Nearest Neighbor Hierarchical (NNH) Clustering

• Identifies clusters of points based on:
  – Number of points (user defined)
  – Distance to points (user defined or based on randomness)
• Output:
  – Ellipses of clusters by order (1st, 2nd, etc.)
  – Convex hulls of clusters by order (1st, 2nd, etc.)

Hands-on: NNH Clustering

• Prepare points in ArcGIS
  – Coordinate system, units
  – XY coordinates
• Open data in CrimeStat
  – Specify units, fields
• Run NNH in CrimeStat
  – Set parameters
  – Specify output location (for multiple files)
• Open results in ArcGIS for mapping

NNH in CrimeStat

[Screenshots: NNH settings in CrimeStat]

Nearest Neighbor Hierarchical Clustering

[Map: NNH clustering result]

Spatial and Temporal Analysis of Crime

• Identifies clusters based on:
  – Search neighborhood (user defined)
  – Number of points (user defined)
  – Overlapping clusters combined into one
• Outputs
  – Ellipses of clusters (one order only, vary in size)

Comparison of Hotspot Techniques

Hotspot Technique     Creating map   Key parameters                                         Interpretation   Statistical significance   Software
Grid-based thematic   Easy           Grid-cell size                                         Easy             No                         ArcGIS
Local Moran’s I       Moderate       Polygon type & size, spatial weights, normalization    Difficult        Yes                        ArcGIS
Gi*                   Moderate       Polygon type & size, spatial weights, normalization    Difficult        Yes                        ArcGIS
Kernel density        Easy           Search radius                                          Easy             No                         ArcGIS
NNH                   Difficult      Search radius, minimum count                           Moderate         No                         CrimeStat
STAC                  Difficult      Search radius, minimum count                           Moderate         No                         CrimeStat

Recent Developments

• Predictive crime mapping
• Hotspot visualization
• Spatio-temporal hotspots

Predictive Crime Mapping

[Figure: hotspots in Time Period 1 and unknown (?) hotspots in Time Period 2]

Time periods can be years, seasons, months, weeks, shifts, etc.

Measures of Hotspot Reliability

• Hit Rate (%)
  – Percentage of crimes in period 2 that fall in the hotspot derived from period 1
  – Higher values are better
• Predictive Accuracy Index (PAI)
  – Ratio of hit rate to the area percentage
  – Measures predictive accuracy of hot spot
  – Higher values are better
• Recapture Rate Index (RRI)
  – Ratio of hot spot crime densities for periods 2 and 1
  – Standardized for change in total number of crimes
  – Higher values are better

Example Calculation: Assaults in Las Vegas

Overlay 1 km grid
1,035 km2 total area
1,366 crimes in 2007
653 crimes in 2007 within hotspot of 68 km2
1,531 crimes in 2008, of which 580 within hotspot based on 2007 data

Hit Rate = 580 / 1,531 = 37.9%
Predictive Accuracy Index = (580 / 1,531) / (68 / 1,035) = 5.77
Recapture Rate Index = (580 / 653) * (1,366 / 1,531) = 0.79
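The three measures are simple ratios, so they are easy to script for many hotspot maps at once. The sketch below reproduces the Las Vegas calculation with the figures listed above; the function and argument names are hypothetical.

```python
def hit_rate(hits_p2, crimes_p2):
    """Share of period 2 crimes that fall inside the period 1 hotspot."""
    return hits_p2 / crimes_p2

def predictive_accuracy_index(hits_p2, crimes_p2, hotspot_area, total_area):
    """Hit rate divided by the share of the study area covered by the hotspot."""
    return hit_rate(hits_p2, crimes_p2) / (hotspot_area / total_area)

def recapture_rate_index(hits_p2, hotspot_crimes_p1, crimes_p1, crimes_p2):
    """Period 2 vs. period 1 hotspot crimes, standardized for the change in total crimes."""
    return (hits_p2 / hotspot_crimes_p1) * (crimes_p1 / crimes_p2)

# Las Vegas assaults: 2007 is period 1, 2008 is period 2.
hr = hit_rate(580, 1531)
pai = predictive_accuracy_index(580, 1531, hotspot_area=68, total_area=1035)
rri = recapture_rate_index(580, hotspot_crimes_p1=653, crimes_p1=1366, crimes_p2=1531)
print(f"Hit rate: {hr:.1%}  PAI: {pai:.2f}  RRI: {rri:.2f}")
```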

Accuracy of Crime Predictions
Charlotte, NC: assaults

Hotspot technique                          Hit Rate   Predictive Accuracy Index   Recapture Rate Index
Grid-based thematic (250 m grid)           45.6       15.8                        0.81
Local Moran’s I (250 m grid)               62.3       9.2                         0.86
Local Moran’s I (blockgroups)              38.1       7.8                         1.02
Gi* (250 m grid)                           59.5       7.6                         0.95
Gi* (blockgroups)                          24.1       8.4                         1.02
Kernel density (200 m radius)              34.8       14.5                        0.60
Nearest neighbor hierarchical clustering   5.8        563                         0.70
Spatial and Temporal Analysis of Crime     3.2        1,083                       0.70

Hotspot method has a major effect

Effects of Hotspot Parameters
Las Vegas, NV: auto thefts

Map panel   HR      PAI    RRI
1           88.0%   2.48   0.96
2           67.2%   3.79   0.94
3           47.6%   5.01   0.90
4           39.8%   5.51   0.87
5           11.6%   9.67   0.82

Density thresholds shown on the maps: 100, 50, 42, 25 and 10 crimes/km2

There are trade-offs among accuracy metrics

Effect of Hotspot Parameters
Charlotte, NC: assaults, kernel density

Kernel density bandwidth   Hit Rate   Predictive Accuracy Index   Recapture Rate Index
50 m                       33.6       214.2                       0.59
100 m                      35.0       65.4                        0.69
200 m                      40.0       25.3                        0.75
300 m                      45.1       17.1                        0.83
400 m                      47.9       13.4                        0.87
500 m                      50.2       11.6                        0.92
1,000 m                    53.3       8.0                         0.98

Hotspot parameters have a major effect

Hotspot Visualization

3D View of KDE

Temporal Patterns

Isosurfaces

Resources

https://www.ncjrs.gov/pdffiles1/nij/209393.pdf


Contact

Paul Zandbergen, University of New Mexico
[email protected]
www.paulzandbergen.com