Means & Medians to Machine Learning: Spatial Statistics Basics …€¦ · 1:30p From Means and...

Post on 02-Oct-2020

3 views 0 download

Transcript of Means & Medians to Machine Learning: Spatial Statistics Basics …€¦ · 1:30p From Means and...

Lauren Bennett, Flora Vale, Alberto Nieto

Means & Medians to Machine Learning: Spatial Statistics Basics and Innovations

esriurl.com/spatialstats

What are Spatial Statistics?

Spatial Statistics are a set of exploratory techniques for

describing and modeling spatial distributions, patterns,

processes, and relationships.

proximityarea

lengthorientation

connectivitycoincidence

direction

Spreadsheets

Data or Information?

Maps

Data or Information?

When you look at a spreadsheet…

You ask for more

• Mean

• Standard Deviations

• Min and Max

• …

Same goes for maps!

We can do more

Means and Medians

Machine Learningclustering methods

summarizing spatial distributions

Means and Medianssummarizing spatial distributions

Central Feature

identifies the most centrally located feature in a point, line, or polygon feature class

Y

X

X

Y

X

Y

X

Y

X

Y

X

Y

central feature

Mean Center

identifies the geographic center (or the center of concentration) for a set of features

X

Y

(25,24)

(24,16)

(22,23)

(18,12)

(14,8)

(12,12)

(14,14)

(9,18)

(13,12)

X

Y (25,24)

(24,16)

(22,23)

(18,12)

(14,8)

(12,12)

(14,14)

(9,18)

(13,12)

(17,15)

mean center

mean =

(17,15)

Median Center

identifies the location that minimizes overall Euclidean distance to the features in a dataset

Y

X

(25,24)

(24,16)

(22,23)

(18,12)

(14,8)

(12,12)

(14,14)

(9,18)

(13,12)

X

YX: 25 . 24 . 22 . 18 . 14 . 14 . 13 . 12 . 9

Y: 24 . 23 . 18 . 16 . 14 . 12 . 12 . 12 . 8

(14,14)

median center

median =

Mean vs Median?

X

Y(176,138)

(25,24)

(24,16)

(22,23)

(18,12)

(14,8)

(12,12)

(14,14)

(9,18)

(13,12)

X

Y

Linear Directional Mean

identifies the mean direction, length, and geographic center for a set of lines

Y

X

Y

X

Directional Distribution(Standard Deviational Ellipse)

creates standard deviational ellipses to summarize the spatial characteristics of geographic features: central tendency, dispersion, and directional trends

X

Y

mean center

X

Y

Demo

Machine Learningclustering methods

Density-based Clustering

finds clusters based on feature locations

Cluster 1

Cluster 2

Cluster 3

Cluster 4

Cluster 5

Noise

DBSCAN – defined distance

HDBSCAN – self adjusting

OPTICS – multi-scale

DBSCAN – defined distance

DBSCAN – defined distance

DBSCAN – defined distance

DBSCAN – defined distance

DBSCAN – defined distance

DBSCAN – defined distance

DBSCAN – defined distance

HDBSCAN – self adjusting

OPTICS – multi-scale

OPTICS – multi-scale

OPTICS – multi-scale

OPTICS – multi-scaleR

each

abili

ty d

ista

nce

Feature order

OPTICS – multi-scale

OPTICS – multi-scale

DBSCAN HDBSCAN OPTICS

• Uses fixed search distance

• Clusters of similar densities

• Fast

• Uses range of search distances to find clusters of varying densities

• Data driven, requires least user input

• Uses neighbor distances to create reachability plot

• Most flexibility for fine tuning

• Can be computationally intensive

Demo

Multivariate Clustering

finds clusters based on feature attributes

3 groups 4 groupsK Means

2 groups 3 groups 4 groups

K Means

2 groups 3 groups 4 groups

K Means

2 groups 3 groups 4 groups

K Means

% Below 138 FPL

% Uninsured

% No High School

% Latino

Eligible Uninsured Americans

Spatially Constrained Multivariate Clustering

finds clusters based on feature attributes and proximity

https://xkcd.com/1364/

Minimum Spanning Tree

Minimum Spanning Tree

Minimum Spanning Tree

Minimum Spanning Tree

Minimum Spanning Tree

Minimum Spanning Tree

Minimum Spanning Tree

Crime in Chicago

Median Income

HS Dropout Rate

UnemploymentUnemployment

Crime Count

Demo

dissimilar between

dissimilar between

uniformgroups

Build Balanced Zones

creates spatially contiguous zones in your study area using a genetic growth algorithm based on criteria that you specify

Criteria

Zone building

• Attribute target

• Number of zones and attribute target

• Number of zones

Criteria

Zone selection

• Equal area

• Compactness

• Equal number of features

• Attribute to consider

fitness score

9.14

9.14

10.22

top 50% move on to next generation

top 50% move on to next generation

top 50% move on to next generation

and crossover to create new offspring

top 50% move on to next generation

and crossover to create new offspring

top 50% move on to next generation

and crossover to create new offspring

top 50% move on to next generation

then the next fittest 50% moves on and crosses over to create the next generation

ConvergenceFi

tne

ss S

core

Generation

Demo

Want to learn more???

esriurl.com/spatialstats

TUESDAY_________________________________________

1:45p Data Visualization for Spatial Analysis 146C

3:00p Machine Learning in ArcGIS 146C

4:15p From Means and Medians to Machine Learning: Spatial Statistics Basics and Innovations 146C

WEDNESDAY______________________________________

8:30a Machine Learning in ArcGIS 146C

11a Data Visualization for Spatial Analysis 146C

1:30p From Means and Medians to Machine Learning: Spatial Statistics Basics and Innovations 146C

2:45p Spatial Data Mining: Cluster Analysis and Space Time Analysis 146C

4:00p Beyond Where: Modeling Spatial Relationships and Making Predictions 146C

5:15p The Forest for the Trees: Making Predictions Using Forest-Based Classification and Regression 146C

Please fill out a course survey!!!

lbennett@esri.comanieto@esri.comfvale@esri.com

Want to learn more???

esriurl.com/spatialstats

TUESDAY_________________________________________

1:45p Data Visualization for Spatial Analysis 146C

3:00p Machine Learning in ArcGIS 146C

4:15p From Means and Medians to Machine Learning: Spatial Statistics Basics and Innovations 146C

WEDNESDAY______________________________________

8:30a Machine Learning in ArcGIS 146C

11a Data Visualization for Spatial Analysis 146C

1:30p From Means and Medians to Machine Learning: Spatial Statistics Basics and Innovations 146C

2:45p Spatial Data Mining: Cluster Analysis and Space Time Analysis 146C

4:00p Beyond Where: Modeling Spatial Relationships and Making Predictions 146C

5:15p The Forest for the Trees: Making Predictions Using Forest-Based Classification and Regression 146C

Please fill out a course survey!!!

lbennett@esri.comanieto@esri.comfvale@esri.com