Statistics - Zhejiangzhejiang.cs.ua.edu/.../SpatialBigData/slides/SpatialStatistics.pdf · What is...

Spatial Statistics
Zhe Jiang
[email protected]



What is Spatial Statistics?
• Statistics
  • The study of the collection, analysis, and interpretation of data
  • Descriptive vs. inferential
• Spatial statistics
  • Statistics for spatial data (point, line, polygon, raster)
  • Variables indexed in 2D or 3D, random locations
  • Unique properties
    • Non-i.i.d.
    • Spatial autocorrelation
    • Isotropy vs. anisotropy
    • Stationarity vs. non-stationarity


Categories of Spatial Statistics
• Geostatistics – point reference data
  • $\{Y(s) : s \in D\}$, $D$ is $r$-dimensional Euclidean space
  • $Y$ is random, $s$ is fixed
• Lattice statistics – areal data
  • $\{Y(s) : s \in D\}$, $D$ is a tessellation of Euclidean space
  • $Y$ is random, $s$ is fixed
• Spatial point process – spatial point event data
  • $\{s_1, s_2, s_3, \ldots, s_n\}$, the $s_i$'s are point locations
  • $s$ is random

[Figures: U.S. river stream gauge observations (Source: USGS.gov); election result by county in 2016 (Source: http://brilliantmaps.com); shootings, Chicago 2010 (Source: http://assets.dnainfo.com)]


Part I: Geostatistics
• Point reference data
  • A stochastic process: $\{Y(s) : s \in D\}$, $D$ is $r$-dimensional Euclidean space
  • Example: [Figures: U.S. river stream gauge observations; measures at mining sites]
• What is Geostatistics used for?
  • Exploratory data analysis
  • Spatial interpolation


Applications
• Estimate precipitation based on records at a set of weather stations
• Infer ground water level based on sensor readings of a set of gauges
• Predict mineral resources based on samples at a limited number of sites

[Figure: U.S. river stream gauge observations]


Spatial Stationarity
• $\{Y(s) : s \in D\}$, $D$ is $r$-dimensional Euclidean space
• $Y(s)$ is strictly stationary when
  • Its distribution is unchanged when locations are shifted
  • For any $n \ge 1$, any $n$ locations $s_1, s_2, \ldots, s_n$, and any $h \in \mathbb{R}^r$, $(Y(s_1), Y(s_2), \ldots, Y(s_n))$ and $(Y(s_1 + h), Y(s_2 + h), \ldots, Y(s_n + h))$ have the same distribution
  • Often too strong, not realistic!
• $Y(s)$ is weakly stationary when
  • Its mean and (co)variance are unchanged when locations are shifted
  • $E[Y(s)] = \mu_s \equiv \mu$ (constant mean)
  • $\mathrm{Cov}(Y(s), Y(s + h)) = C(h)$ for all $h \in \mathbb{R}^r$
  • The covariance between any two locations is simply a function of $h$!

[Illustrative example source: http://azvoleff.com/]


Variogram
• Tobler’s first law of geography:
  • “Everything is related to everything else, but near things are more related than distant things.”
• How does the “difference” (“irrelevance”) between two observations increase with distance?
• How can we measure the range of spatial “relatedness”?
• $Y(s)$ is intrinsically stationary when
  • $E[Y(s)] = \mu_s \equiv \mu$ (constant mean)
  • $E[(Y(s + h) - Y(s))^2] \equiv 2r(h)$ for any $s$, $h$
  • $2r(h)$ is called the variogram
  • $r(h)$ is called the semivariogram
  • The “difference” between two locations depends only on $h$!
• $Y(s)$ is isotropic if $r(h) \equiv r(\lVert h \rVert)$
  • With isotropy, we can plot a curve for $r(\lVert h \rVert)$!
  • But is this assumption always valid?

[Figures: average annual precipitation (wrcc.dri.edu); illustration of the first law of geography with a location $s$ and an offset $h$ (source: http://azvoleff.com/)]


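Variogram Plot
• Models of the semivariogram, with nugget $\tau^2$, partial sill $\sigma^2$, and decay parameter $\phi$:
  • Linear:
$$ r(d) = \begin{cases} \tau^2 + \sigma^2 d & \text{if } d > 0 \\ 0 & \text{otherwise} \end{cases} $$
  • Spherical (reaches the sill $\tau^2 + \sigma^2$ at the range $d = 1/\phi$):
$$ r(d) = \begin{cases} \tau^2 + \sigma^2 & \text{if } d > 1/\phi \\ \tau^2 + \sigma^2 \left[ \dfrac{3\phi d}{2} - \dfrac{(\phi d)^3}{2} \right] & \text{if } 0 < d \le 1/\phi \\ 0 & \text{otherwise} \end{cases} $$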
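Both models can be written directly as piecewise functions. A minimal Python sketch, assuming the parameterization above (nugget $\tau^2$, partial sill $\sigma^2$, decay $\phi$ with range $1/\phi$); the function names are illustrative and not taken from any library.

```python
def linear_semivariogram(d: float, tau2: float, sigma2: float) -> float:
    """Linear model: r(d) = tau2 + sigma2 * d for d > 0, and r(0) = 0."""
    return tau2 + sigma2 * d if d > 0 else 0.0


def spherical_semivariogram(d: float, tau2: float, sigma2: float, phi: float) -> float:
    """Spherical model: rises from the nugget tau2 to the sill tau2 + sigma2 at d = 1/phi."""
    if d <= 0:
        return 0.0
    if d > 1.0 / phi:
        return tau2 + sigma2
    x = phi * d
    return tau2 + sigma2 * (1.5 * x - 0.5 * x ** 3)


if __name__ == "__main__":
    # Illustrative parameters: nugget 0.1, partial sill 1.0, range 1/phi = 2.0.
    for d in (0.0, 0.5, 1.0, 2.0, 3.0):
        print(d, linear_semivariogram(d, 0.1, 1.0), spherical_semivariogram(d, 0.1, 1.0, 0.5))
```

Both curves jump from 0 to the nugget just above $d = 0$; the spherical curve then flattens out exactly at the range and stays at the sill beyond it.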


Variogram Example

[Figure: variogram of diameter at breast height on trees]

See R example!
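The R example itself is not reproduced here. As a stand-in, a minimal Python sketch of the classical (Matheron-style) empirical semivariogram, assuming isotropy and simple equal-width distance bins; the function and parameter names are hypothetical.

```python
import math
from collections import defaultdict


def empirical_semivariogram(coords, values, bin_width, max_dist):
    """Binned estimate of r(h) under isotropy.

    For each distance bin, r_hat = sum((y_i - y_j)^2) / (2 * N_pairs) over all
    pairs whose separation falls in that bin. Returns a sorted list of
    (bin_center, r_hat, n_pairs).
    """
    sq_diff = defaultdict(float)
    counts = defaultdict(int)
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            if d > max_dist:
                continue
            k = int(d // bin_width)  # index of the distance bin [k*w, (k+1)*w)
            sq_diff[k] += (values[i] - values[j]) ** 2
            counts[k] += 1
    return [
        ((k + 0.5) * bin_width, sq_diff[k] / (2 * counts[k]), counts[k])
        for k in sorted(counts)
    ]
```

Plotting `r_hat` against the bin centers gives the variogram plot; a model such as the linear or spherical one can then be fitted to these points. The bin width and maximum distance are tuning choices.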


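Variogram vs. Covariogram
• $E[(Y(s + h) - Y(s))^2] = 2r(h)$
• Because the mean is constant, $E[Y(s + h) - Y(s)] = 0$, so under weak stationarity
  $E[(Y(s + h) - Y(s))^2] = \mathrm{Var}(Y(s + h)) + \mathrm{Var}(Y(s)) - 2\,\mathrm{Cov}(Y(s + h), Y(s)) = 2C(0) - 2C(h)$
• Thus $r(h) = C(0) - C(h)$: $r(h)$ and $C(h)$ are related!
• The covariance $C(h)$ (the covariogram) is a function of $h$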


Ordinary Kriging
• Problem:
  • Given observations $y(s_1), y(s_2), \ldots, y(s_n)$,
  • How to predict $y(s_0)$?
• Assumptions:
  • Weak (intrinsic) stationarity
  • Known covariance $C(h)$
  • Unknown constant mean $E[Y] \equiv \mu$
  • Linear estimation: $\hat{y}(s_0) = \sum_{i=1}^{n} l_i\, y(s_i)$
• Approach:
  • Minimize the expected square loss!
$$ E\Big[\big(y(s_0) - \sum_{i=1}^{n} l_i\, y(s_i)\big)^2\Big] $$


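Ordinary Kriging
Notation: $C_{i,j} \equiv \mathrm{Cov}(Y(s_i), Y(s_j))$, $C = [C_{i,j}]_{i,j=1}^{n}$, $C_{0*} = (C_{0,1}, \ldots, C_{0,n})^T$, and $\mathbf{1}$ is the all-ones vector.
Minimize: $E\big[(y(s_0) - \sum_{i=1}^{n} l_i\, y(s_i))^2\big]$
• Unbiasedness: $E\big[y(s_0) - \sum_{i=1}^{n} l_i\, y(s_i)\big] = 0$. Since $E(y(s_0)) \equiv E(y(s_i)) \equiv \mu$, this requires $1 - \sum_{i=1}^{n} l_i = 0$, i.e., $\mathbf{1}^T l - 1 = 0$.
• Because the prediction error then has mean zero, the expected square loss equals its variance:
$$ \begin{aligned} E\Big[\big(y(s_0) - \textstyle\sum_{i} l_i\, y(s_i)\big)^2\Big] &= \mathrm{Var}(y(s_0)) - 2\,\mathrm{Cov}\big(y(s_0), \textstyle\sum_{i} l_i\, y(s_i)\big) + \textstyle\sum_{i}\sum_{j} l_i l_j\, \mathrm{Cov}(y(s_i), y(s_j)) \\ &= C_{0,0} - 2\textstyle\sum_{i} l_i C_{0,i} + \textstyle\sum_{i}\sum_{j} l_i l_j C_{i,j} \\ &= C_{0,0} - 2\,C_{0*}^T l + l^T C\, l \end{aligned} $$
• Minimizing this subject to $\mathbf{1}^T l - 1 = 0$ is a constrained convex optimization problem! Using a Lagrange multiplier, the optimal solution is
$$ l^* = C^{-1}\left[ C_{0*} - \frac{\mathbf{1}^T C^{-1} C_{0*} - 1}{\mathbf{1}^T C^{-1} \mathbf{1}}\, \mathbf{1} \right], \qquad \hat{y}(s_0) = l^{*T} Y $$
• Remember $\mathrm{Cov}(Y(s), Y(s + h)) \equiv C(h)$: the covariogram $C(h)$ can be estimated from data!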
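The closed-form weights $l^*$ can be computed with two linear solves. A minimal NumPy sketch, assuming an exponential covariogram $C(h) = \sigma^2 e^{-\phi \lVert h \rVert}$ purely for illustration (in practice $C(h)$ would be fitted from the data); the function names are hypothetical.

```python
import numpy as np


def exp_cov(d, sigma2=1.0, phi=1.0):
    """Illustrative covariogram C(h) = sigma2 * exp(-phi * |h|) (an assumption, not from the slides)."""
    return sigma2 * np.exp(-phi * d)


def ordinary_kriging(coords, y, s0, cov=exp_cov):
    """Ordinary kriging prediction at s0 using l* = C^{-1}[C_0* - lam * 1]."""
    coords = np.asarray(coords, dtype=float)
    y = np.asarray(y, dtype=float)
    s0 = np.asarray(s0, dtype=float)
    # C[i, j] = Cov(Y(s_i), Y(s_j)); c0[i] = Cov(Y(s_0), Y(s_i)).
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    C = cov(dists)
    c0 = cov(np.linalg.norm(coords - s0, axis=1))
    ones = np.ones(len(y))
    # Solve linear systems instead of forming C^{-1} explicitly.
    Cinv_c0 = np.linalg.solve(C, c0)
    Cinv_1 = np.linalg.solve(C, ones)
    lam = (ones @ Cinv_c0 - 1.0) / (ones @ Cinv_1)  # Lagrange multiplier term
    weights = Cinv_c0 - lam * Cinv_1
    return weights @ y, weights
```

The weights always sum to one, so the predictor stays unbiased for any unknown constant $\mu$. With this covariogram (no nugget), predicting at an observed location returns the observed value, i.e., kriging interpolates the data exactly.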


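Universal Kriging
• Problem:
  • Given observations $y = (y(s_1), \ldots, y(s_n))^T$, covariates $x(s_1), \ldots, x(s_n)$, and $x(s_0)$, predict $y(s_0)$
• Assumptions:
  • $y(s_i) = x(s_i)^T \beta + \epsilon_i$, i.e., $\mathbf{Y} = \mathbf{X}\beta + \boldsymbol{\epsilon}$
  • $\boldsymbol{\epsilon} \sim N(0, \Sigma)$, where $\Sigma = \sigma^2 H(\phi) + \tau^2 I$
• Estimator: $\hat{y}(s_0) = h(y)$
  • How to find the optimal $h(y)$?
  • Minimize the expected square loss: $E[(y(s_0) - h(y))^2]$
  • Optimal prediction: $\hat{y}(s_0) = h(y) = E[y(s_0) \mid y]$

[Figure: matrix diagram of $\mathbf{Y} = \mathbf{X} \times \beta + \boldsymbol{\epsilon}$]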


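Universal Kriging
• $\mathbf{Y} = \mathbf{X}\beta + \boldsymbol{\epsilon}$, $\boldsymbol{\epsilon} \sim N(0, \Sigma)$, where $\Sigma = \sigma^2 H(\phi) + \tau^2 I$

[Figure: matrix diagram of $\mathbf{Y} = \mathbf{X} \times \beta + \boldsymbol{\epsilon}$]

• Assuming a Gaussian process, i.e., any set of observations $Y$ follows a Gaussian distribution:
$$ \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} \sim N\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Omega_{11} & \Omega_{12} \\ \Omega_{21} & \Omega_{22} \end{pmatrix} \right) $$
$$ E(Y_1 \mid Y_2) = \mu_1 + \Omega_{12}\Omega_{22}^{-1}(Y_2 - \mu_2), \qquad \mathrm{Var}(Y_1 \mid Y_2) = \Omega_{11} - \Omega_{12}\Omega_{22}^{-1}\Omega_{21} $$
• Optimal prediction: $\hat{y}(s_0) = h(y) = E[y(s_0) \mid y]$, with
  • $Y_1 = y(s_0)$, $Y_2 = y = (y(s_1), \ldots, y(s_n))^T$
  • $\mu_1 = x(s_0)^T\beta$, $\mu_2 = (x(s_1)^T\beta, \ldots, x(s_n)^T\beta)^T = \mathbf{X}\beta$
  • $\Omega_{12} = C_{0*}^T$, $\Omega_{22} = \Sigma$
$$ E[y(s_0) \mid y] = x(s_0)^T\beta + C_{0*}^T \Sigma^{-1}(y - \mathbf{X}\beta) $$
• In practice, $\beta$ is replaced by an estimate $\hat{\beta}$, and $\Sigma$ is computed from the covariogram as $C(h) + \tau^2 I$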
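A minimal NumPy sketch of the plug-in predictor $x(s_0)^T\hat{\beta} + C_{0*}^T \Sigma^{-1}(y - \mathbf{X}\hat{\beta})$. It assumes, for illustration only, an exponential covariance $\sigma^2 e^{-\phi d}$ plus a nugget $\tau^2$ on the diagonal, and estimates $\beta$ by generalized least squares; the slides do not specify the estimator, and the function name is hypothetical.

```python
import numpy as np


def universal_kriging(X, y, coords, x0, s0, sigma2=1.0, phi=1.0, tau2=0.0):
    """Plug-in predictor E[y(s0) | y] = x0^T b + c0^T Sigma^{-1} (y - X b).

    Assumptions (illustrative): Sigma = sigma2 * exp(-phi * d) + tau2 * I, and
    b is the GLS estimate (X^T Sigma^{-1} X)^{-1} X^T Sigma^{-1} y.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    coords = np.asarray(coords, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    s0 = np.asarray(s0, dtype=float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    Sigma = sigma2 * np.exp(-phi * d) + tau2 * np.eye(len(y))
    # c0[i] = Cov(y(s0), y(s_i)); the nugget does not enter off-diagonal covariances.
    c0 = sigma2 * np.exp(-phi * np.linalg.norm(coords - s0, axis=1))
    Sinv_X = np.linalg.solve(Sigma, X)
    Sinv_y = np.linalg.solve(Sigma, y)
    beta_hat = np.linalg.solve(X.T @ Sinv_X, X.T @ Sinv_y)
    resid = y - X @ beta_hat
    return x0 @ beta_hat + c0 @ np.linalg.solve(Sigma, resid), beta_hat
```

The first term is the regression (trend) prediction; the second is a kriging correction from spatially correlated residuals. Far from all data the correction vanishes and the prediction falls back to $x(s_0)^T\hat{\beta}$.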


Review Questions:
• For each of the following statements, True or False?
1. In Geostatistics, strict stationarity is often assumed.
2. A variogram can help select a distance threshold for a spatial neighborhood (range of spatial dependency).
3. Kriging assumes weak or intrinsic stationarity.
4. Ordinary Kriging assumes weak or intrinsic stationarity.


Part II: Lattice Statistics
• Areal data model
  • A tessellation of continuous space into (regular or irregular) cells
  • Mapping each unit to a non-spatial attribute value
• Lattice Statistics:
  • $\{Y(s) : s \in D\}$, $D$ is a set of cells in areal data
• What is Lattice Statistics used for?
  • Explore spatial patterns in areal maps
  • Model areal maps for interpretation

[Figure source: http://brilliantmaps.com]


[Figure: election result by county in 2016 (Source: http://brilliantmaps.com)]

Q1: Is the map spatially stationary?
Q2: Is there strong autocorrelation globally and locally?
Q3: How to model and interpret coefficients that impact results?


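W-Matrix
• Spatial neighborhood matrix
  • $W_{ij} > 0$ when $i$ and $j$ are neighbors
  • $W_{ij} = 0$ when $i$ and $j$ are not neighbors
• Example: an areal data set with 8 units (numbered 1 to 8), with a rook neighborhood (units sharing an edge) and a queen neighborhood (units sharing an edge or a corner)

[Figure: the 8-unit areal data set, shown with its rook and queen neighborhoods]

A rook neighborhood (row $i$ lists $W_{i1}, \ldots, W_{i8}$):
0 1 0 1 1 0 0 0
1 0 1 0 1 0 0 0
0 1 0 0 0 1 0 0
1 0 0 0 1 0 1 0
1 1 0 1 0 0 0 1
0 0 1 0 1 0 0 1
0 0 0 1 0 0 0 1
0 0 0 0 1 1 1 0

A queen neighborhood:
0 1 0 1 1 0 0 0
1 0 1 0 1 1 0 0
0 1 0 0 1 1 0 0
1 0 0 0 1 0 1 1
1 1 1 1 0 1 1 1
0 1 1 0 1 0 0 1
0 0 0 1 1 0 0 1
0 0 0 1 1 0 1 0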
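The 8-unit example uses irregular cells whose geometry is only in the figure. For a regular grid, the same rook and queen rules can be sketched in Python as below, assuming cells are numbered row by row; the helper names are hypothetical. Row standardization is shown as a common optional choice that makes each row of W sum to 1.

```python
def grid_weights(n_rows, n_cols, kind="rook"):
    """Binary W for a regular n_rows x n_cols grid, cells numbered row by row.

    rook: neighbors share an edge; queen: neighbors share an edge or a corner.
    """
    n = n_rows * n_cols
    W = [[0] * n for _ in range(n)]
    if kind == "rook":
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif kind == "queen":
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    else:
        raise ValueError("kind must be 'rook' or 'queen'")
    for r in range(n_rows):
        for c in range(n_cols):
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    W[r * n_cols + c][rr * n_cols + cc] = 1
    return W


def row_standardize(W):
    """Divide each row by its sum so that each row of W sums to 1 (isolated units stay 0)."""
    out = []
    for row in W:
        s = sum(row)
        out.append([w / s if s else 0.0 for w in row])
    return out
```

On a 3 × 3 grid the center cell has 4 rook neighbors and 8 queen neighbors, while a corner cell has 2 and 3.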


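Spatial Autocorrelation
• Measures the level of global spatial association
• Moran’s I:
$$ I = \frac{n \sum_{i}\sum_{j} w_{ij}(Y_i - \bar{Y})(Y_j - \bar{Y})}{\left(\sum_{i}\sum_{j} w_{ij}\right)\sum_{i}(Y_i - \bar{Y})^2} $$
  where $i$ and $j$ are locations.
• $I$ usually falls in $[-1, 1]$ (the exact bounds depend on $W$); high values show strong positive spatial association
• Example with rook neighborhood (5 × 5 grids):

0 1 0 1 0
1 0 1 0 1
0 1 0 1 0
1 0 1 0 1
0 1 0 1 0
$I = -1$ (every pair of neighbors differs)

1 1 1 0 0
1 1 1 0 0
1 1 1 0 0
1 1 1 0 0
1 1 1 0 0
$I \approx 0.73$ (strong positive association)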
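A pure-Python sketch that evaluates Moran's I on the two 5 × 5 grids above with a binary rook W. It assumes cells are numbered row by row; the helper names are hypothetical.

```python
def rook_weights(n_rows, n_cols):
    """Binary rook W for a grid whose cells are numbered row by row."""
    n = n_rows * n_cols
    W = [[0] * n for _ in range(n)]
    for r in range(n_rows):
        for c in range(n_cols):
            for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    W[r * n_cols + c][rr * n_cols + cc] = 1
    return W


def morans_i(y, W):
    """Global Moran's I: n * sum w_ij z_i z_j / (S0 * sum z_i^2), with z = y - mean."""
    n = len(y)
    mean = sum(y) / n
    z = [v - mean for v in y]
    s0 = sum(sum(row) for row in W)
    num = sum(W[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(v * v for v in z)
    return n * num / (s0 * den)


checker = [(r + c) % 2 for r in range(5) for c in range(5)]    # 0 1 0 1 0 / 1 0 1 0 1 / ...
stripes = [1 if c < 3 else 0 for r in range(5) for c in range(5)]  # 1 1 1 0 0 in every row
W = rook_weights(5, 5)
print(round(morans_i(checker, W), 4), round(morans_i(stripes, W), 4))
```

The checkerboard gives $I = -1$ because every rook pair is a 0–1 pair. The striped grid gives $35/48 \approx 0.73$: only the joins along the boundary between the third and fourth columns pull it below 1.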


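Spatial Autocorrelation
• Geary’s C:
$$ C = \frac{(n - 1)\sum_{i}\sum_{j} w_{ij}(Y_i - Y_j)^2}{2\left(\sum_{i}\sum_{j} w_{ij}\right)\sum_{i}(Y_i - \bar{Y})^2} $$
  where $i$ and $j$ are locations.
• $C \ge 0$; low values ($C < 1$) show strong positive spatial association, $C \approx 1$ suggests no autocorrelation, and $C > 1$ suggests negative association
• Example with rook neighborhood (5 × 5 grids):

0 1 0 1 0
1 0 1 0 1
0 1 0 1 0
1 0 1 0 1
0 1 0 1 0
$C = ?$

1 1 1 0 0
1 1 1 0 0
1 1 1 0 0
1 1 1 0 0
1 1 1 0 0
$C = ?$

Which one has the higher C?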
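A pure-Python sketch that evaluates Geary's C on the same two grids, so the question can be checked numerically. It assumes a binary rook W with cells numbered row by row; the helper names are hypothetical.

```python
def rook_weights(n_rows, n_cols):
    """Binary rook W for a grid whose cells are numbered row by row."""
    n = n_rows * n_cols
    W = [[0] * n for _ in range(n)]
    for r in range(n_rows):
        for c in range(n_cols):
            for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    W[r * n_cols + c][rr * n_cols + cc] = 1
    return W


def gearys_c(y, W):
    """Global Geary's C: (n - 1) * sum w_ij (y_i - y_j)^2 / (2 * S0 * sum (y_i - mean)^2)."""
    n = len(y)
    mean = sum(y) / n
    s0 = sum(sum(row) for row in W)
    num = sum(W[i][j] * (y[i] - y[j]) ** 2 for i in range(n) for j in range(n))
    den = sum((v - mean) ** 2 for v in y)
    return (n - 1) * num / (2 * s0 * den)


checker = [(r + c) % 2 for r in range(5) for c in range(5)]
stripes = [1 if c < 3 else 0 for r in range(5) for c in range(5)]
W = rook_weights(5, 5)
print(round(gearys_c(checker, W), 4), round(gearys_c(stripes, W), 4))
```

The checkerboard gives $C \approx 1.92$ (above 1, negative association) and the striped grid gives $C = 0.25$ (well below 1, positive association), so the checkerboard has the higher C.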


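Local Spatial Autocorrelation
• Local indicator of spatial association (LISA)
  • When data are not homogeneous, local behaviors may differ from the global behavior (outliers)
• Local Moran’s I:
$$ I_i = \frac{Y_i - \bar{Y}}{m_2} \sum_{j} w_{ij}(Y_j - \bar{Y}), \qquad m_2 = \frac{\sum_{i}(Y_i - \bar{Y})^2}{n} $$
  • $I = \frac{1}{n}\sum_{i} I_i$ when $W$ is row-standardized (in general, $I = \frac{1}{\sum_i\sum_j w_{ij}}\sum_{i} I_i$)
• Local Geary’s C:
$$ C_i = \frac{1}{m_2}\sum_{j} w_{ij}(Y_i - Y_j)^2 $$
  • $C \propto \sum_{i} C_i$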
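A pure-Python sketch of local Moran's I that also checks the link to the global statistic stated above. It assumes a 3 × 3 rook grid with made-up values, numbered row by row; the helper names are hypothetical.

```python
def rook_weights(n_rows, n_cols, standardize=False):
    """Rook W for a grid numbered row by row; optionally row-standardized."""
    n = n_rows * n_cols
    W = [[0.0] * n for _ in range(n)]
    for r in range(n_rows):
        for c in range(n_cols):
            for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    W[r * n_cols + c][rr * n_cols + cc] = 1.0
    if standardize:
        W = [[w / sum(row) for w in row] for row in W]
    return W


def local_morans_i(y, W):
    """I_i = (y_i - mean) / m2 * sum_j w_ij (y_j - mean), with m2 = sum (y_i - mean)^2 / n."""
    n = len(y)
    mean = sum(y) / n
    z = [v - mean for v in y]
    m2 = sum(v * v for v in z) / n
    return [z[i] / m2 * sum(W[i][j] * z[j] for j in range(n)) for i in range(n)]


def global_morans_i(y, W):
    """Global Moran's I with the general normalization by S0 = sum of all w_ij."""
    n = len(y)
    mean = sum(y) / n
    z = [v - mean for v in y]
    s0 = sum(map(sum, W))
    num = sum(W[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return n * num / (s0 * sum(v * v for v in z))


y = [3, 1, 4, 1, 5, 9, 2, 6, 5]  # made-up values on a 3 x 3 grid
W_std = rook_weights(3, 3, standardize=True)
local = local_morans_i(y, W_std)
print([round(v, 3) for v in local])
print(round(sum(local) / len(y), 6), round(global_morans_i(y, W_std), 6))
```

Positive $I_i$ marks a location that resembles its neighbors (high among high, or low among low); negative $I_i$ marks a local outlier.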


Spatial Autocorrelation for Nominal Data
• Black-Black Joint Count

W B W B W
B W B W B
W B W B W
B W B W B
W B W B W

B B B W W
B B B W W
B B B W W
B B B W W
B B B W W

Suppose $n$ locations, $n_W$ white, $n_B$ black, with $P_B = n_B/n$ and $P_W = n_W/n$.
$$ JC_{BB} = \frac{1}{2} \sum_{i} \sum_{j \ne i} w_{ij}\, I(y_i = B, y_j = B) $$
Test: $\dfrac{JC_{BB} - E(JC_{BB})}{\sigma}$, where $E(JC_{BB}) = \frac{1}{2} \sum_{i} \sum_{j \ne i} w_{ij}\, P_B P_B$ and $\mathrm{Var}(JC_{BB}) = \sigma^2$, assuming a Gaussian distribution.
$E(JC_{BB})$ and $\mathrm{Var}(JC_{BB})$ can also be generated from random permutation!
H0: B and W are independently distributed; $JC_{BB}$ is asymptotically normally distributed.
H1: B and W are not independent; BB tends to cluster.
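The random-permutation route mentioned above can be sketched as follows: shuffle the B/W labels over the fixed locations many times and see how often the shuffled $JC_{BB}$ reaches the observed one. A minimal pure-Python sketch, assuming binary rook neighbors on the two 5 × 5 grids above and a one-sided test for BB clustering; the helper names and the 999-permutation default are illustrative choices.

```python
import random


def rook_pairs(n_rows, n_cols):
    """Each unordered rook-neighbor pair (i, j) once, cells numbered row by row."""
    pairs = []
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            if c + 1 < n_cols:
                pairs.append((i, i + 1))
            if r + 1 < n_rows:
                pairs.append((i, i + n_cols))
    return pairs


def bb_join_count(labels, pairs):
    """JC_BB: counting each unordered pair once equals the 1/2 * double sum for binary symmetric W."""
    return sum(1 for i, j in pairs if labels[i] == "B" and labels[j] == "B")


def permutation_test(labels, pairs, n_perm=999, seed=0):
    """One-sided p-value for BB clustering by shuffling labels over locations."""
    rng = random.Random(seed)
    observed = bb_join_count(labels, pairs)
    shuffled = list(labels)
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if bb_join_count(shuffled, pairs) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_perm + 1)


checker = ["B" if (r + c) % 2 else "W" for r in range(5) for c in range(5)]
stripes = ["B" if c < 3 else "W" for r in range(5) for c in range(5)]
pairs = rook_pairs(5, 5)
print(permutation_test(checker, pairs))
print(permutation_test(stripes, pairs))
```

The checkerboard has no BB joins at all (p-value 1), while the striped grid has 22 BB joins, which shuffled arrangements almost never reach, so its permutation p-value is tiny.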


Markov Random Field
• Problem: How to model the joint distribution of a field?
• Brook’s Lemma:
  • The joint distribution can be determined by the full conditional distributions (under a positivity condition)
  • $p(y_1, y_2, \ldots, y_n) \Leftarrow p(y_i \mid y_j, j \ne i)$
  • However, the conditional distribution above is too complex!
• Markov property:
  • $p(y_i \mid y_j, j \ne i) \equiv p(y_i \mid y_j, j \in N(i))$
  • The conditional distribution of the observation at a location depends only on its neighbors
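The Markov property is what makes Gibbs sampling from an MRF cheap: each site is redrawn from a full conditional that looks only at its neighbors $N(i)$. A minimal sketch for a hypothetical autologistic model on a grid (binary values, rook neighbors, $\mathrm{logit}\, P(y_i = 1 \mid \text{rest}) = \alpha + \beta \sum_{j \in N(i)} y_j$). The model choice and the parameter names are assumptions for illustration, not taken from the slides.

```python
import math
import random


def gibbs_autologistic(n_rows, n_cols, alpha, beta, n_sweeps=50, seed=0):
    """Gibbs sampler for a binary field whose full conditionals use only rook neighbors.

    Hypothetical autologistic model: P(y_i = 1 | rest) = logistic(alpha + beta * S_i),
    where S_i is the sum of y_j over the rook neighbors N(i) of cell i.
    """
    rng = random.Random(seed)
    y = [[rng.randint(0, 1) for _ in range(n_cols)] for _ in range(n_rows)]
    for _ in range(n_sweeps):
        for r in range(n_rows):
            for c in range(n_cols):
                # Only the neighbors N(i) enter the conditional (Markov property).
                s = sum(
                    y[rr][cc]
                    for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    if 0 <= rr < n_rows and 0 <= cc < n_cols
                )
                p_one = 1.0 / (1.0 + math.exp(-(alpha + beta * s)))
                y[r][c] = 1 if rng.random() < p_one else 0
    return y
```

A positive $\beta$ encourages neighboring cells to share the value 1, which produces clustered fields; $\beta = 0$ reduces the model to independent cells.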


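Spatial Autoregressive Model (SAR)
• $\mathbf{Y} = \rho W \mathbf{Y} + \mathbf{X}\beta + \boldsymbol{\epsilon}$
  • $\rho W \mathbf{Y}$: autoregressive term
  • $\mathbf{X}\beta$: covariates
  • $\boldsymbol{\epsilon}$: independent noise

[Figure: matrix diagram of $\mathbf{Y} = \rho \times W \times \mathbf{Y} + \mathbf{X} \times \beta + \boldsymbol{\epsilon}$]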
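Because $\mathbf{Y}$ appears on both sides, a SAR field is usually simulated from its reduced form $\mathbf{Y} = (I - \rho W)^{-1}(\mathbf{X}\beta + \boldsymbol{\epsilon})$. A minimal NumPy sketch, assuming Gaussian noise and a $W$ for which $I - \rho W$ is invertible (for example, a row-standardized $W$ with $|\rho| < 1$); the function name is hypothetical.

```python
import numpy as np


def simulate_sar(W, X, beta, rho, noise_sd=1.0, seed=0):
    """Draw Y from Y = rho*W*Y + X*beta + eps by solving (I - rho*W) Y = X*beta + eps.

    Assumes (I - rho*W) is invertible, e.g. |rho| < 1 with row-standardized W.
    """
    W = np.asarray(W, dtype=float)
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, noise_sd, size=W.shape[0])
    A = np.eye(W.shape[0]) - rho * W
    Y = np.linalg.solve(A, X @ np.asarray(beta, dtype=float) + eps)
    return Y, eps
```

Solving the linear system instead of inverting $I - \rho W$ is cheaper and more stable, and the returned noise makes it easy to verify that the draw satisfies the model equation.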


Part III: Spatial Point Process
• Spatial point process – spatial point event data
  • $\{s_1, s_2, s_3, \ldots, s_n\}$, the $s_i$'s are event locations, with a fixed event type

[Figure: 1854 Broad Street cholera outbreak (solid black rectangles show victims)]



Spatial Point Process
• Example
  • Crime event locations
  • Disease event locations
• Questions to answer
  • Do points tend to cluster/de-cluster? (K-function statistics)
  • Is point intensity homogeneous, or is there a hotspot? (Spatial scan statistics)

[Figure: shootings, Chicago 2010 (Source: http://assets.dnainfo.com)]

[Embedded figure from “What’s Special About Spatial Data Mining?” (Clustering: find groups of tuples; statistical significance; complete spatial randomness, cluster, and decluster): Figure 9: Inputs: Complete Spatial Random (CSR), Cluster, and Decluster; Figure 10: Classical Clustering; Figure 11: Spatial Clustering (region labels 1: unusually dense, 2: dense, 3: mean dense, 4: sparse; annotations “Data is of Complete Spatial Randomness” and “Data is of Decluster Pattern”). Panels: complete spatial random (CSR), clustering, declustering; CSR vs. hotspot]


Homogeneous Poisson Point Process
• Complete spatial randomness (CSR)
  • Intensity parameter $\lambda$; $N(A)$ is the number of points in an area $A$
  • For any $A$, $N(A)$ follows $\mathrm{Poisson}(\lambda |A|)$, where $|A|$ is the area of $A$
  • For any disjoint $A_1$ and $A_2$, $N(A_1)$ and $N(A_2)$ are independent
  • In any area $A$, conditioned on $N(A) = n$, these $n$ points are independent and uniformly distributed in $A$

What can we say about CSR?
• The intensity of points is the same everywhere
• Once $N(A)$ is realized, point locations are independent
• CSR is often used as a null distribution
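The two defining properties translate directly into a two-step simulation: draw the count, then place that many points independently and uniformly. A minimal NumPy sketch on a rectangle $[0, w] \times [0, h]$ (the rectangular window and the function name are assumptions for illustration).

```python
import numpy as np


def simulate_csr(lam, width, height, seed=0):
    """Homogeneous Poisson process on [0, width] x [0, height].

    Step 1: N ~ Poisson(lam * area). Step 2: given N, place N points
    independently and uniformly in the rectangle.
    """
    rng = np.random.default_rng(seed)
    n = rng.poisson(lam * width * height)
    xs = rng.uniform(0.0, width, size=n)
    ys = rng.uniform(0.0, height, size=n)
    return np.column_stack([xs, ys])
```

Repeating this many times gives Monte Carlo realizations of the CSR null, which is how the K-function and scan-statistic tests below can be calibrated.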


Ripley’s K Function
• Test if points tend to cluster with each other (micro)
• Hypothesis testing
  • H0: homogeneous Poisson point process (independent)
  • H1: points tend to cluster with each other
• Test statistic:
  • $K(d) = \lambda^{-1} E(\#\text{ of points within radius } d \text{ of a point})$
  • $\hat{K}(d) = \hat{\lambda}^{-1} \sum_{i} \sum_{j \ne i} I(d_{ij} \le d) / n$, with $\hat{\lambda} = n/|A|$
• Under H0, $K(d) = \pi d^2$; $\hat{K}(d)$ above $\pi d^2$ suggests clustering, below suggests declustering

[Embedded figure: CSR, cluster, and decluster point patterns (Figures 9, 10, 11 of “What’s Special About Spatial Data Mining?”)]
[Figure: example of K function plot, with curves for CSR, clustering, and declustering]
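A minimal pure-Python sketch of the estimator $\hat{K}(d)$ above, assuming $\hat{\lambda} = n/|A|$ and no edge correction (so near the boundary the estimate is biased low); the function name is hypothetical.

```python
import math


def ripley_k(points, d, area):
    """Naive K-hat(d) = lambda_hat^{-1} * sum_{i != j} I(d_ij <= d) / n, with lambda_hat = n / area.

    No edge correction: points near the boundary have part of their disc outside the study area.
    """
    n = len(points)
    lam_hat = n / area
    close_pairs = sum(
        1
        for i in range(n)
        for j in range(n)
        if i != j and math.dist(points[i], points[j]) <= d
    )
    return close_pairs / (n * lam_hat)
```

In practice $\hat{K}(d)$ is evaluated over a range of $d$ and compared with $\pi d^2$, or with envelopes from repeated CSR simulations of the same number of points.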


Ripley’s Cross K Function
• Test if two types of events tend to cluster together
• H0: event types $i$ and $j$ are independent
• H1: event types $i$ and $j$ tend to cluster together
• Test statistic:
  • $K_{ij}(d) = \lambda_j^{-1} E(\#\text{ of points of type } j \text{ within } d \text{ of a point of type } i)$
  • $\hat{K}_{ij}(d) = \hat{\lambda}_j^{-1} \sum_{i} \sum_{j} I(d_{ij} \le d) / n_i = (\hat{\lambda}_i \hat{\lambda}_j |A|)^{-1} \sum_{i} \sum_{j} I(d_{ij} \le d)$, where the sums run over type-$i$ and type-$j$ points and $n_i = \hat{\lambda}_i |A|$
• Under H0, $K_{ij}(d) = \pi d^2$

[Embedded figure “Illustration of Cross-Correlation” (Figure 6: Cross K-function for Example Data): cross-K function vs. distance h for pairs of spatial features (o and *; x and +; * and x; * and +), with the reference curve y = πh²]

[Embedded figure “Cross-Correlation”: cross K-function definition $K_{ij}(h) = \lambda_j^{-1} E[\text{number of type } j \text{ events within distance } h \text{ of a randomly chosen type } i \text{ event}]$; cross K-function of some pair of spatial feature types; example: which pairs are frequently co-located, and are they statistically significant? (Figure 5: Example Data (o and *; x and +))]
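A minimal pure-Python sketch of the second form of $\hat{K}_{ij}(d)$ above, assuming $\hat{\lambda}_i = n_i/|A|$, $\hat{\lambda}_j = n_j/|A|$, and no edge correction; the function name is hypothetical.

```python
import math


def cross_k(points_i, points_j, d, area):
    """Naive cross K-hat_ij(d) = sum_{a in i} sum_{b in j} I(d_ab <= d) / (lam_i * lam_j * area).

    lam_i = n_i / area and lam_j = n_j / area; no edge correction.
    """
    lam_i = len(points_i) / area
    lam_j = len(points_j) / area
    close = sum(1 for a in points_i for b in points_j if math.dist(a, b) <= d)
    return close / (lam_i * lam_j * area)
```

Values clearly above $\pi d^2$ suggest that type-$j$ events tend to occur near type-$i$ events (co-location). As with the univariate K function, significance is usually judged against simulations under H0.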


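Spatial Scan Statistics
• Test if point intensity is homogeneous everywhere
• Hypothesis testing (assume Poisson point process)
  • H0: homogeneous intensity, $\lambda_{in} = \lambda_{out}$ for window W
  • H1: inhomogeneous intensity, $\lambda_{in} > \lambda_{out}$ for window W
• Test statistic (likelihood ratio):
$$ LR = \frac{\max \text{Likelihood}(Data \mid H_1)}{\max \text{Likelihood}(Data \mid H_0)} = \frac{\sup_{W,\, \lambda_{in} > \lambda_{out}} L(Data; W, \lambda_{in}, \lambda_{out})}{\sup_{\lambda_{in} = \lambda_{out}} L(Data; W, \lambda_{in}, \lambda_{out})} $$
$$ LR = \sup_{W} \left(\frac{n_W}{E_W}\right)^{n_W} \left(\frac{n_{W'}}{E_{W'}}\right)^{n_{W'}} I(\cdot) $$
  where $n_W$ is the observed # in W, $E_W$ the expected # in W, $n_{W'}$ the observed # outside W, $E_{W'}$ the expected # outside W, and $I(\cdot) = 1$ if the density in W is higher than outside ($n_W/E_W > n_{W'}/E_{W'}$), 0 otherwise
• Significance
  • P-value via Monte Carlo simulation

[Figure: CSR vs. hotspot point patterns]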


Spatial Scan Statistics: An Example
• Input: 13 points in a study area of size 4 × 4 = 16 cells, enumerating windows W of size 1
• Compute the test statistic:
$$ LR = \sup_{W} \left(\frac{n_W}{E_W}\right)^{n_W} \left(\frac{n_{W'}}{E_{W'}}\right)^{n_{W'}} I(\cdot) $$
  • $n_W = 7$ (observed # in W)
  • $n_{W'} = 6$ (observed # out of W)
  • $E_W = \frac{13}{16} \times 1 \approx 0.81$ (expected # in W)
  • $E_{W'} = \frac{13}{16} \times 15 \approx 12.19$ (expected # out of W)
  • $I(\cdot) = 1$ (density in W is higher than out)
• For the window W shown:
$$ LR = \left(\frac{7}{13/16}\right)^{7} \left(\frac{6}{13 \times 15/16}\right)^{6} \approx 5.02 \times 10^{4} $$
• Significance: Monte Carlo simulation

[Figure: illustrative example (Zhe Jiang, University of Alabama, Group Seminar Slides, September 22, 2016): scanning window Z (W) and the outside region Z′ (W′) in the 4 × 4 study area, with |Z| = 1 and |Z′| = 15]
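A minimal pure-Python sketch of this example, assuming the 13 points have already been counted into the 16 cells and that candidate windows are single cells. The cell counts below are hypothetical, chosen only so that one cell holds 7 of the 13 points as in the example. The statistic is computed on the log scale to avoid overflow, and the Monte Carlo p-value redistributes the same number of points uniformly over the cells under H0. All names are illustrative.

```python
import math
import random


def log_lr(n_in, e_in, n_out, e_out):
    """log of (n_in/e_in)^n_in * (n_out/e_out)^n_out, or -inf when I(.) = 0."""
    if n_in / e_in <= n_out / e_out:
        return float("-inf")  # I(.) = 0: window is not denser than outside
    term_in = n_in * math.log(n_in / e_in)
    term_out = n_out * math.log(n_out / e_out) if n_out > 0 else 0.0
    return term_in + term_out


def scan_cells(counts):
    """Scan every single-cell window of a grid of counts; return (best_log_lr, best_cell)."""
    cells = [(r, c) for r in range(len(counts)) for c in range(len(counts[0]))]
    total = sum(counts[r][c] for r, c in cells)
    e_in = total / len(cells)  # expected count in a window of size 1 under H0
    e_out = total - e_in
    best = (float("-inf"), None)
    for r, c in cells:
        n_in = counts[r][c]
        score = log_lr(n_in, e_in, total - n_in, e_out)
        if score > best[0]:
            best = (score, (r, c))
    return best


def monte_carlo_p(counts, n_sim=999, seed=0):
    """Rank the observed max LR among max LRs of n_sim uniform reallocations of the points."""
    rng = random.Random(seed)
    n_rows, n_cols = len(counts), len(counts[0])
    total = sum(map(sum, counts))
    observed, _ = scan_cells(counts)
    at_least = 0
    for _ in range(n_sim):
        sim = [[0] * n_cols for _ in range(n_rows)]
        for _ in range(total):
            sim[rng.randrange(n_rows)][rng.randrange(n_cols)] += 1
        if scan_cells(sim)[0] >= observed:
            at_least += 1
    return (at_least + 1) / (n_sim + 1)


counts = [[7, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 1, 1]]  # hypothetical, 13 points
best_log_lr, best_cell = scan_cells(counts)
print(best_cell, math.exp(best_log_lr), monte_carlo_p(counts))
```

The best window is the cell holding 7 points, with LR ≈ 5.02 × 10^4 as computed above. Because a cell with 7 of 13 points almost never occurs under uniform reallocation, the Monte Carlo p-value is very small.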