Statistics - Zhejiangzhejiang.cs.ua.edu/.../SpatialBigData/slides/SpatialStatistics.pdf · What is...

Spatial Statistics

Zhe Jiang

What is Spatial Statistics?
• Statistics
  • The study of the collection, analysis, and interpretation of data
  • Descriptive vs. inferential
• Spatial statistics
  • Statistics for spatial data (point, line, polygon, raster)
  • Variables indexed in 2D or 3D, random locations
  • Unique properties
    • Non-i.i.d.
    • Spatial autocorrelation
    • Isotropy vs. anisotropy
    • Stationarity vs. non-stationarity

Categories of Spatial Statistics
• Geostatistics: point reference data
  • $\{Y(s) : s \in D\}$, $D$ is $r$-dimensional Euclidean space
  • $Y$ is random, $s$ is fixed
• Lattice statistics: areal data
  • $\{Y(s) : s \in D\}$, $D$ is a tessellation of Euclidean space
  • $Y$ is random, $s$ is fixed
• Spatial point process: spatial point event data
  • $\{s_1, s_2, s_3, \dots, s_n\}$, where the $s_i$ are point locations
  • $s$ is random

[Figures: U.S. river stream gauge observations (source: USGS.gov); election result by county in 2016 (source: http://brilliantmaps.com); shootings, Chicago 2010 (source: http://assets.dnainfo.com)]

Part I: Geostatistics
• Point reference data
  • A stochastic process: $\{Y(s) : s \in D\}$, $D$ is $r$-dimensional Euclidean space
  • Example: see the figures below
• What is geostatistics used for?
  • Exploratory data analysis
  • Spatial interpolation

[Figures: U.S. river stream gauge observations; measures at mining sites]

Applications
• Estimate precipitation based on records at a set of weather stations
• Infer groundwater level based on sensor readings from a set of gauges
• Predict mineral resources based on samples at a limited number of sites

[Figure: U.S. river stream gauge observations]

Spatial Stationarity
• $\{Y(s) : s \in D\}$, $D$ is $r$-dimensional Euclidean space
• $Y(s)$ is strictly stationary when
  • The distribution is unchanged when locations are shifted
  • For any $n \ge 1$, any $n$ locations $\{s_1, s_2, s_3, \dots, s_n\}$, and any $h \in R^r$:
  • $(Y(s_1), Y(s_2), \dots, Y(s_n))$ and $(Y(s_1 + h), Y(s_2 + h), \dots, Y(s_n + h))$ have the same distribution
• $Y(s)$ is weakly stationary when
  • Mean and (co)variance are unchanged when locations are shifted
  • $E[Y(s)] = \mu_s \equiv \mu$ (constant mean)
  • $Cov(Y(s), Y(s + h)) = C(h)$ for all $h \in R^r$

Covariance across any two locations is simply a function of $h$!

Often too strong, not realistic!

[Illustrative example source: http://azvoleff.com/]

Variogram
• Tobler's first law of geography:
  • "Everything is related to everything else, but near things are more related than distant things."
• How does the "difference" ("irrelevance") between two observations increase with distance?
• How can the range of spatial "relatedness" be measured?
• $Y(s)$ is intrinsically stationary when
  • $E[Y(s)] = \mu_s \equiv \mu$ (constant mean)
  • $E[Y(s + h) - Y(s)]^2 \equiv 2r(h)$ for any $s$, $h$
  • $2r(h)$ is called the variogram
  • $r(h)$ is called the semi-variogram
• $Y(s)$ is isotropic if $r(h) \equiv r(\|h\|)$

"Difference" across two locations only depends on $h$!

With isotropy, can plot a curve for $r(|h|)$!

But is this assumption always valid?

[Figures: average annual precipitation (wrcc.dri.edu); illustration of the first law of geography, with separation $h$ and location $s$ marked (source: http://azvoleff.com/)]
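The semi-variogram definition above suggests a simple estimator: average half the squared differences over all pairs of observations whose separation falls in a distance bin. The following is a minimal sketch under the isotropy assumption; the function name `empirical_semivariogram`, the binning scheme, and the synthetic data are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def empirical_semivariogram(coords, values, bin_edges):
    """Estimate r(|h|) as half the mean squared difference per distance bin."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    # All unordered pairs (i < j), their distances and squared differences.
    i, j = np.triu_indices(len(values), k=1)
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sqdiff = (values[i] - values[j]) ** 2
    gamma = np.full(len(bin_edges) - 1, np.nan)  # NaN marks empty bins
    for b in range(len(bin_edges) - 1):
        in_bin = (dist > bin_edges[b]) & (dist <= bin_edges[b + 1])
        if in_bin.any():
            gamma[b] = 0.5 * sqdiff[in_bin].mean()
    return gamma

# Synthetic data (assumed for illustration): a smooth surface plus small noise.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(200, 2))
values = np.sin(coords[:, 0]) + np.cos(coords[:, 1]) + 0.1 * rng.normal(size=200)
bin_edges = np.linspace(0, 5, 11)
gamma = empirical_semivariogram(coords, values, bin_edges)
print(np.round(gamma, 3))
```

Plotting `gamma` against the bin midpoints gives the kind of curve a variogram plot shows; nearby pairs should have a smaller semi-variance than distant pairs.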

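Variogram Plot
• Models of semi-variogram

Linear:
$$r(d) = \begin{cases} \tau^2 + \sigma^2 d & \text{if } d > 0 \\ 0 & \text{otherwise} \end{cases}$$

Spherical:
$$r(d) = \begin{cases} \tau^2 + \sigma^2 & \text{if } d > 1/\phi \\ \tau^2 + \sigma^2 \left[ \frac{3}{2} \phi d - \frac{1}{2} (\phi d)^3 \right] & \text{if } 0 < d \le 1/\phi \\ 0 & \text{otherwise} \end{cases}$$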
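As a rough sketch, the two semi-variogram models can be written as vectorized functions. This assumes the standard parameterization with nugget $\tau^2$, partial sill $\sigma^2$, and range $1/\phi$, and the standard spherical form with a cubic term; the function names are illustrative.

```python
import numpy as np

def linear_semivariogram(d, tau2, sigma2):
    """Linear model: tau2 + sigma2 * d for d > 0, and 0 at d = 0."""
    d = np.asarray(d, dtype=float)
    return np.where(d > 0, tau2 + sigma2 * d, 0.0)

def spherical_semivariogram(d, tau2, sigma2, phi):
    """Spherical model: rises to the sill tau2 + sigma2 at distance 1/phi."""
    d = np.asarray(d, dtype=float)
    u = phi * d
    rising = tau2 + sigma2 * (1.5 * u - 0.5 * u ** 3)
    sill = tau2 + sigma2
    return np.where(d <= 0, 0.0, np.where(u <= 1, rising, sill))

d = np.array([0.0, 0.5, 1.0, 2.0, 4.0])
sph = spherical_semivariogram(d, tau2=0.1, sigma2=1.0, phi=0.5)
lin = linear_semivariogram(d, tau2=0.1, sigma2=1.0)
print(sph)
print(lin)
```

With `phi=0.5` the range is 2, so the spherical curve is flat at the sill for every distance of 2 or more.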

Variogram Example

[Figure: diameter at breast height on trees]

See R example!

Variogram vs. Covariogram
• $E[Y(s + h) - Y(s)]^2 = 2r(h)$
• $E[Y(s + h) - Y(s)]^2 = Var(Y(s + h)) + Var(Y(s)) - 2Cov(Y(s + h), Y(s)) = 2C(0) - 2C(h)$ (under weak stationarity, where the mean is constant)
• Thus $r(h) = C(0) - C(h)$: $r(h)$ and $C(h)$ are related!
• Covariance $C(h)$ is a function of $h$

Ordinary Kriging
• Problem:
  • Given observations $y(s_1), y(s_2), \dots, y(s_n)$ at locations $s_1, \dots, s_n$,
  • How to predict $y(s_0)$?
• Assumptions:
  • Weak (intrinsic) stationarity
  • Known covariance $C(h)$
  • Unknown constant $E[Y] \equiv \mu$
  • Linear estimation: $\hat{y}(s_0) = \sum_{i=1}^{n} l_i y(s_i)$
• Approach:
  • Minimize expected square loss!

$$E\left[ \left( y(s_0) - \sum_{i=1}^{n} l_i y(s_i) \right)^2 \right]$$

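Ordinary Kriging

Minimize:
$$E\left[ \left( y(s_0) - \sum_{i=1}^{n} l_i y(s_i) \right)^2 \right]$$

Unbiasedness constraint:
$$E\left[ y(s_0) - \sum_{i=1}^{n} l_i y(s_i) \right] = 0$$
Since $E(y(s_0)) \equiv E(y(s_i)) \equiv \mu$, this gives
$$1 - \sum_{i=1}^{n} l_i = 0$$

Expanding the objective:
$$\begin{aligned}
E\left[ \left( y(s_0) - \sum_{i=1}^{n} l_i y(s_i) \right)^2 \right]
&= Var(y(s_0)) - 2 Cov\left( y(s_0), \sum_{i=1}^{n} l_i y(s_i) \right) + \sum_{i=1}^{n} \sum_{j=1}^{n} l_i l_j Cov(y(s_i), y(s_j)) \\
&= C_{0,0} - 2 \sum_{i=1}^{n} l_i C_{0,i} + \sum_{i=1}^{n} \sum_{j=1}^{n} l_i l_j C_{i,j} \\
&= C_{0,0} - 2 C_{0,*}^T l + l^T C l
\end{aligned}$$
subject to $\mathbf{1}^T l - 1 = 0$.

Constrained convex optimization problem! Using a Lagrangian multiplier, the optimal solution is:
$$l^* = C^{-1} \left[ C_{0*} - \frac{\mathbf{1}^T C^{-1} C_{0*} - 1}{\mathbf{1}^T C^{-1} \mathbf{1}} \mathbf{1} \right], \qquad \hat{y}(s_0) = l^{*T} Y$$

Remember: $Cov(Y(s), Y(s + h)) \equiv C(h)$. The covariogram $C(h)$ can be estimated from data!

Notation: $C_{i,j} \equiv Cov(Y(s_i), Y(s_j))$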
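The closed-form weights above translate almost directly into NumPy. The sketch below assumes an exponential covariogram $C(h) = \sigma^2 e^{-\phi h}$ purely for illustration (the slides only require that $C(h)$ is known); the names `exp_cov` and `ordinary_kriging` are hypothetical.

```python
import numpy as np

def exp_cov(h, sigma2=1.0, phi=1.0):
    """Assumed covariogram C(h) = sigma2 * exp(-phi * h); any valid C(h) works."""
    return sigma2 * np.exp(-phi * h)

def ordinary_kriging(coords, y, s0, cov=exp_cov):
    coords = np.asarray(coords, dtype=float)
    y = np.asarray(y, dtype=float)
    s0 = np.asarray(s0, dtype=float)
    # C[i, j] = C(|s_i - s_j|) and c0[i] = C(|s_0 - s_i|).
    C = cov(np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2))
    c0 = cov(np.linalg.norm(coords - s0, axis=1))
    ones = np.ones(len(y))
    Cinv_c0 = np.linalg.solve(C, c0)
    Cinv_1 = np.linalg.solve(C, ones)
    # Lagrange multiplier term that enforces sum(weights) = 1.
    lam = (ones @ Cinv_c0 - 1.0) / (ones @ Cinv_1)
    weights = Cinv_c0 - lam * Cinv_1
    return weights @ y, weights

# Four observations on the corners of a unit square, predicted at the center.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])
pred, weights = ordinary_kriging(coords, y, s0=[0.5, 0.5])
pred_at_obs, _ = ordinary_kriging(coords, y, s0=coords[0])
print(pred, weights)
```

By symmetry the four corners receive equal weight at the center, and predicting at an observed location reproduces the observation, which is a useful sanity check on any kriging code.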

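Universal Kriging
• Problem:
  • Given observations $y = (y(s_1), \dots, y(s_n))$ and covariates $x(s_1), \dots, x(s_n), x(s_0)$, predict $y(s_0)$
• Assumptions:
  • $y(s_i) = x(s_i)^T \beta + \epsilon_i$, i.e. $Y = X\beta + \epsilon$
  • $\epsilon \sim N(0, \Sigma)$, where $\Sigma = \sigma^2 H(\phi) + \tau^2 I$
• Estimator: $\hat{y}(s_0) = h(y)$
  • How to find the optimal $h(y)$?
  • Minimize expected square loss: $E\left[ (y(s_0) - h(y))^2 \right]$
  • Optimal prediction: $\hat{y}(s_0) = h(y) = E[y(s_0) \mid y]$

[Diagram: $Y = X \times \beta + \epsilon$ drawn as matrix blocks]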

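Universal Kriging

$Y = X\beta + \epsilon$, with $\epsilon \sim N(0, \Sigma)$, where $\Sigma = \sigma^2 H(\phi) + \tau^2 I$

[Diagram: $Y = X \times \beta + \epsilon$ drawn as matrix blocks]

Assuming a Gaussian process, i.e., any set of observations $Y$ follows a Gaussian distribution:
$$\begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} \sim N\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Omega_{11} & \Omega_{12} \\ \Omega_{21} & \Omega_{22} \end{pmatrix} \right)$$
$$E(Y_1 \mid Y_2) = \mu_1 + \Omega_{12} \Omega_{22}^{-1} (Y_2 - \mu_2)$$
$$Var(Y_1 \mid Y_2) = \Omega_{11} - \Omega_{12} \Omega_{22}^{-1} \Omega_{21}$$

Optimal prediction: $\hat{y}(s_0) = h(y) = E[y(s_0) \mid y]$, with
• $Y_1 = y(s_0)$, $Y_2 = y = (y(s_1), \dots, y(s_n))^T$
• $\mu_1 = x(s_0)^T \beta$, $\mu_2 = (x(s_1)^T \beta, \dots, x(s_n)^T \beta)^T = X\beta$

$$E[y(s_0) \mid y] = x(s_0)^T \beta + C_{0*}^T \Sigma^{-1} (y - X\beta)$$

Annotations: $\hat{\beta}$ (for $\beta$); $C(h) + \tau^2 I$ (for $\Sigma$)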
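The conditional-mean formula can be sketched as a plug-in predictor. The code below makes two assumptions that are not stated on the slide: $\beta$ is estimated by generalized least squares, and $H(\phi)$ is an exponential correlation $e^{-\phi d}$. The function name `universal_kriging` and the toy data are illustrative.

```python
import numpy as np

def universal_kriging(X, y, Sigma, x0, c0):
    """Plug-in prediction x0' beta_hat + c0' Sigma^{-1} (y - X beta_hat)."""
    Sinv_X = np.linalg.solve(Sigma, X)
    Sinv_y = np.linalg.solve(Sigma, y)
    # Generalized least squares estimate of beta (an assumed choice of beta_hat).
    beta_hat = np.linalg.solve(X.T @ Sinv_X, X.T @ Sinv_y)
    resid = y - X @ beta_hat
    pred = x0 @ beta_hat + c0 @ np.linalg.solve(Sigma, resid)
    return pred, beta_hat

coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.5]])
s0 = np.array([0.5, 0.5])
sigma2, phi, tau2 = 1.0, 1.0, 0.1

# Sigma = sigma2 * H(phi) + tau2 * I with an assumed exponential H.
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
Sigma = sigma2 * np.exp(-phi * dist) + tau2 * np.eye(len(coords))
c0 = sigma2 * np.exp(-phi * np.linalg.norm(coords - s0, axis=1))

# Covariates: intercept and x-coordinate; y is exactly linear in them.
X = np.column_stack([np.ones(len(coords)), coords[:, 0]])
x0 = np.array([1.0, s0[0]])
y = 1.0 + 2.0 * coords[:, 0]
pred, beta_hat = universal_kriging(X, y, Sigma, x0, c0)
print(pred, beta_hat)
```

Because the toy response is exactly linear in the covariates, the residual term vanishes and the prediction reduces to the regression part $x(s_0)^T \hat{\beta}$.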

Review Questions
• For each of the following statements, true or false?
1. In geostatistics, strict stationarity is often assumed.
2. A variogram can help select a distance threshold for a spatial neighborhood (range of spatial dependency).
3. Kriging assumes weak or intrinsic stationarity.
4. Ordinary kriging assumes weak or intrinsic stationarity.

Part II: Lattice Statistics
• Areal data model
  • A tessellation of continuous space into (regular or irregular) cells
  • Mapping each unit to a non-spatial attribute value
• Lattice statistics:
  • $\{Y(s) : s \in D\}$, $D$ is a set of cells in areal data
• What is lattice statistics used for?
  • Explore spatial patterns in areal maps
  • Model areal maps for interpretation

[Figure: election result by county in 2016. Source: http://brilliantmaps.com]

Q1: Is the map spatially stationary?
Q2: Is there strong autocorrelation globally and locally?
Q3: How to model and interpret coefficients that impact results?

W-Matrix
• Spatial neighborhood matrix
  • $W_{ij} > 0$ when $i$ and $j$ are neighbors
  • $W_{ij} = 0$ when $i$ and $j$ are not neighbors
• Example

[Figure: an areal dataset with 8 units, labeled 1 to 8]

A rook neighborhood:

0 1 0 1 1 0 0 0
1 0 1 0 1 0 0 0
0 1 0 0 0 1 0 0
1 0 0 0 1 0 1 0
1 1 0 1 0 0 0 1
0 0 1 0 1 0 0 1
0 0 0 1 0 0 0 1
0 0 0 0 1 1 1 0

A queen neighborhood:

0 1 0 1 1 0 0 0
1 0 1 0 1 1 0 0
0 1 0 0 1 1 0 0
1 0 0 0 1 0 1 1
1 1 1 1 0 1 1 1
0 1 1 0 1 0 0 1
0 0 0 1 1 0 0 1
0 0 0 1 1 0 1 0
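For a regular grid, rook and queen W-matrices can be generated mechanically. The sketch below is for a hypothetical 3x3 grid rather than the irregular 8-unit map in the figure, and the helper name `grid_neighbors` is an assumption.

```python
import numpy as np

def grid_neighbors(n_rows, n_cols, kind="rook"):
    """Binary W-matrix for a regular grid; cells are numbered row by row."""
    n = n_rows * n_cols
    W = np.zeros((n, n), dtype=int)
    for r in range(n_rows):
        for c in range(n_cols):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue  # a cell is not its own neighbor
                    if kind == "rook" and dr != 0 and dc != 0:
                        continue  # rook: shared edges only, no diagonals
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < n_rows and 0 <= cc < n_cols:
                        W[r * n_cols + c, rr * n_cols + cc] = 1
    return W

W_rook = grid_neighbors(3, 3, "rook")
W_queen = grid_neighbors(3, 3, "queen")
print(W_rook)
print(W_queen)
```

Rook neighbors share an edge, queen neighbors share an edge or a corner, so the queen matrix always contains the rook matrix.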

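Spatial Autocorrelation
• Measures the level of global spatial association
• Moran's I:

$$I = \frac{n \sum_i \sum_j w_{ij} (Y_i - \bar{Y})(Y_j - \bar{Y})}{\left( \sum_{i \ne j} w_{ij} \right) \sum_i (Y_i - \bar{Y})^2}$$

where $i$ and $j$ are locations.

• $I \in [-1, 1]$; values near 1 show strong positive spatial association (similar values cluster), values near $-1$ show strong negative association (neighbors differ)
• Example with rook neighborhood:

0 1 0 1 0
1 0 1 0 1
0 1 0 1 0
1 0 1 0 1
0 1 0 1 0

$I \approx -1$

1 1 1 0 0
1 1 1 0 0
1 1 1 0 0
1 1 1 0 0
1 1 1 0 0

$I \approx 1$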
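The two rook-neighborhood examples can be checked numerically. This is a small sketch assuming binary (unstandardized) rook weights on a 5x5 grid; the helper names `rook_weights` and `morans_i` are illustrative.

```python
import numpy as np

def rook_weights(n_rows, n_cols):
    """Binary rook W-matrix for a grid with cells numbered row by row."""
    n = n_rows * n_cols
    W = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            if r + 1 < n_rows:  # neighbor below
                W[i, i + n_cols] = W[i + n_cols, i] = 1
            if c + 1 < n_cols:  # neighbor to the right
                W[i, i + 1] = W[i + 1, i] = 1
    return W

def morans_i(y, W):
    y = np.asarray(y, dtype=float).ravel()
    dev = y - y.mean()
    return len(y) * (dev @ W @ dev) / (W.sum() * (dev @ dev))

W = rook_weights(5, 5)
checker = np.indices((5, 5)).sum(axis=0) % 2   # alternating 0/1 pattern
block = np.tile([1, 1, 1, 0, 0], (5, 1))       # three columns of 1, two of 0
i_checker = morans_i(checker, W)
i_block = morans_i(block, W)
print(i_checker, i_block)
```

With these binary weights the checkerboard gives exactly $-1$ and the block pattern gives about 0.73, the strongly positive case labeled $I \approx 1$ in the example.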

Spatial Autocorrelation
• Geary's C:

$$C = \frac{(n - 1) \sum_i \sum_j w_{ij} (Y_i - Y_j)^2}{2 \left( \sum_{i \ne j} w_{ij} \right) \sum_i (Y_i - \bar{Y})^2}$$

where $i$ and $j$ are locations.

• $C \ge 0$; low values show strong positive spatial association
• Example with rook neighborhood:

0 1 0 1 0
1 0 1 0 1
0 1 0 1 0
1 0 1 0 1
0 1 0 1 0

$C = ?$

1 1 1 0 0
1 1 1 0 0
1 1 1 0 0
1 1 1 0 0
1 1 1 0 0

$C = ?$

Which one has higher $C$?
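One way to check an answer to the question above is to compute Geary's C for both grids. The sketch assumes binary rook weights; `rook_weights` and `gearys_c` are illustrative helper names.

```python
import numpy as np

def rook_weights(n_rows, n_cols):
    """Binary rook W-matrix for a grid with cells numbered row by row."""
    n = n_rows * n_cols
    W = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            if r + 1 < n_rows:
                W[i, i + n_cols] = W[i + n_cols, i] = 1
            if c + 1 < n_cols:
                W[i, i + 1] = W[i + 1, i] = 1
    return W

def gearys_c(y, W):
    y = np.asarray(y, dtype=float).ravel()
    dev = y - y.mean()
    sq_diff = (y[:, None] - y[None, :]) ** 2  # (Y_i - Y_j)^2 for every pair
    return (len(y) - 1) * (W * sq_diff).sum() / (2.0 * W.sum() * (dev @ dev))

W = rook_weights(5, 5)
checker = np.indices((5, 5)).sum(axis=0) % 2
block = np.tile([1, 1, 1, 0, 0], (5, 1))
c_checker = gearys_c(checker, W)
c_block = gearys_c(block, W)
print(c_checker, c_block)
```

Geary's C is built from squared differences between neighbors, so a pattern in which every neighbor pair differs scores higher than one in which most neighbors agree.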

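Local Spatial Autocorrelation
• Local indicator of spatial association (LISA)
• When data is not homogeneous, local behaviors may differ from global behavior (outliers)
• Local Moran's I:
  • $I_i = \frac{Y_i - \bar{Y}}{m_2} \sum_j w_{ij} (Y_j - \bar{Y})$, where $m_2 = \frac{\sum_i (Y_i - \bar{Y})^2}{n}$
  • $I = \frac{1}{n} \sum_i I_i$ (when $W$ is row-standardized, so that $\sum_{i,j} w_{ij} = n$)
• Local Geary's C:
  • $C_i = \frac{1}{m_2} \sum_j w_{ij} (Y_i - Y_j)^2$
  • $C \propto \sum_i C_i$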
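A short sketch of the local statistics on a hypothetical chain of six units with one outlier at the end. The row-standardized W, the data, and the function names are assumptions made for illustration.

```python
import numpy as np

def local_morans_i(y, W):
    y = np.asarray(y, dtype=float)
    dev = y - y.mean()
    m2 = (dev @ dev) / len(y)
    return dev / m2 * (W @ dev)

def local_gearys_c(y, W):
    y = np.asarray(y, dtype=float)
    dev = y - y.mean()
    m2 = (dev @ dev) / len(y)
    return (W * (y[:, None] - y[None, :]) ** 2).sum(axis=1) / m2

# Chain of 6 units: each unit neighbors the unit before and after it.
n = 6
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
W = W / W.sum(axis=1, keepdims=True)  # row-standardize so each row sums to 1

y = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 5.0])  # last unit is a local outlier
I_local = local_morans_i(y, W)
C_local = local_gearys_c(y, W)

dev = y - y.mean()
I_global = n * (dev @ W @ dev) / (W.sum() * (dev @ dev))
print(I_local, C_local, I_global)
```

The outlier shows up as a negative local Moran's I and the largest local Geary's C, and with row-standardized weights the mean of the local values equals the global Moran's I.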

Spatial Autocorrelation for Nominal Data
• Black-Black join count

W B W B W
B W B W B
W B W B W
B W B W B
W B W B W

B B B W W
B B B W W
B B B W W
B B B W W
B B B W W

Suppose $n$ locations, $n_W$ white, $n_B$ black:
$$P_W = \frac{n_W}{n}, \qquad P_B = \frac{n_B}{n}$$
$$JC_{BB} = \frac{1}{2} \sum_i \sum_{j \ne i} w_{ij} I(y_i = B, y_j = B)$$

Test statistic: $\frac{JC_{BB} - E(JC_{BB})}{\sigma}$, where
$$E(JC_{BB}) = \frac{1}{2} \sum_i \sum_{j \ne i} w_{ij} P_B P_B, \qquad Var(JC_{BB}) = \sigma^2,$$
assuming a Gaussian distribution.

$E(JC_{BB})$ and $Var(JC_{BB})$ can also be generated from random permutation!

• H0: B and W are independently distributed; $JC_{BB}$ is asymptotically normally distributed
• H1: B and W are not independent; BB tends to cluster
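The join count and its permutation-based test can be sketched as follows, using binary rook weights on the two 5x5 grids. The helper names, the number of permutations, and the seed are assumptions for illustration.

```python
import numpy as np

def rook_weights(n_rows, n_cols):
    """Binary rook W-matrix for a grid with cells numbered row by row."""
    n = n_rows * n_cols
    W = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            if r + 1 < n_rows:
                W[i, i + n_cols] = W[i + n_cols, i] = 1
            if c + 1 < n_cols:
                W[i, i + 1] = W[i + 1, i] = 1
    return W

def join_count_bb(is_black, W):
    """JC_BB = 1/2 * sum_i sum_{j != i} w_ij * 1(y_i = B, y_j = B)."""
    b = np.asarray(is_black, dtype=float).ravel()
    return 0.5 * (b @ W @ b)  # W has a zero diagonal, so j != i is implied

def permutation_z(is_black, W, n_perm=999, seed=0):
    """z-score of JC_BB against mean and std from random permutations."""
    rng = np.random.default_rng(seed)
    b = np.asarray(is_black, dtype=float).ravel()
    sims = np.array([join_count_bb(rng.permutation(b), W) for _ in range(n_perm)])
    return (join_count_bb(b, W) - sims.mean()) / sims.std(ddof=1)

W = rook_weights(5, 5)
checker_black = np.indices((5, 5)).sum(axis=0) % 2 == 1  # W B W B W pattern
block_black = np.tile([1, 1, 1, 0, 0], (5, 1)) == 1      # B B B W W pattern

jc_checker = join_count_bb(checker_black, W)
jc_block = join_count_bb(block_black, W)
expected_block = 0.5 * W.sum() * block_black.mean() ** 2  # 1/2 * sum(w) * P_B^2
z_block = permutation_z(block_black, W)
print(jc_checker, jc_block, expected_block, z_block)
```

The block pattern has far more BB joins than expected under independence, so its z-score is positive; the checkerboard has none at all.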

Markov Random Field
• Problem: How to model the joint distribution of a field?
• Brook's Lemma:
  • The joint distribution can be determined by the conditional distributions
  • $p(y_1, y_2, \dots, y_n) \Longleftarrow p(y_i \mid y_j, j \ne i)$
  • However, the conditional distribution above is too complex!
• Markov property:
  • $p(y_i \mid y_j, j \ne i) \equiv p(y_i \mid y_j, j \in N(i))$
  • The conditional distribution of the observation at a location depends only on its neighbors

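Spatial Autoregressive Model (SAR)
• $Y = \rho W Y + X\beta + \epsilon$
  • $\rho W Y$: autoregressive term
  • $X\beta$: covariates
  • $\epsilon$: independent noise

[Diagram: $Y = \rho \times W \times Y + X \times \beta + \epsilon$ drawn as matrix blocks]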
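Because $Y$ appears on both sides of the SAR equation, simulating from it requires solving for $Y$: $Y = (I - \rho W)^{-1}(X\beta + \epsilon)$. A minimal sketch, assuming a row-standardized chain neighborhood and made-up parameter values:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6

# Assumed neighborhood: a chain of 6 units, row-standardized.
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
W = W / W.sum(axis=1, keepdims=True)

# Made-up covariates, coefficients, and noise for illustration.
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 0.5])
rho = 0.4
eps = rng.normal(scale=0.1, size=n)

# Reduced form of the SAR model: Y = (I - rho W)^{-1} (X beta + eps).
Y = np.linalg.solve(np.eye(n) - rho * W, X @ beta + eps)

# The simulated Y satisfies the structural equation Y = rho W Y + X beta + eps.
residual = Y - (rho * W @ Y + X @ beta + eps)
print(np.abs(residual).max())
```

With a row-standardized $W$ and $|\rho| < 1$, the matrix $I - \rho W$ is invertible, so the reduced form is well defined.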

Part III: Spatial Point Process
• Spatial point process: spatial point event data
  • $\{s_1, s_2, s_3, \dots, s_n\}$, where the $s_i$ are event locations, fixed event type

[Figure: 1854 Broad Street cholera outbreak (solid black rectangles show victims)]

Spatial Point Process
• Examples
  • Crime event locations
  • Disease event locations
• Questions to answer
  • Do points tend to cluster or de-cluster? (K-function statistics)
  • Is point intensity homogeneous, or is there a hotspot? (Spatial scan statistics)

[Figure: shootings, Chicago 2010. Source: http://assets.dnainfo.com]

[Figure: embedded slide "Clustering" from "What's Special About Spatial Data Mining?", showing complete spatial randomness (CSR), cluster, and decluster point patterns (Figures 9 to 11: inputs, classical clustering, spatial clustering), followed by a CSR vs. hotspot comparison]

Homogeneous Poisson Point Process
• Complete spatial randomness (CSR)
• Intensity parameter $\lambda$; $N(A)$ is the number of points in an area $A$
• For any $A$, $N(A)$ follows Poisson($\lambda |A|$), where $|A|$ is the size of $A$
• For any disjoint $A_1$ and $A_2$, $N(A_1)$ and $N(A_2)$ are independent
• In any area $A$, conditioned on $N(A) = n$, these $n$ points are independent and uniformly distributed in $A$

What can we say about CSR?
• The intensity of points is the same everywhere
• Once $N(A)$ is realized, point locations are independent
• CSR is often used as a null distribution
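The two defining properties give a direct recipe for simulating CSR in a rectangle: draw the count from Poisson($\lambda |A|$), then place that many points uniformly. A minimal sketch; the function name `simulate_csr` and the parameter values are assumptions.

```python
import numpy as np

def simulate_csr(lam, width, height, rng):
    """Homogeneous Poisson point process on [0, width] x [0, height]."""
    n = rng.poisson(lam * width * height)   # N(A) ~ Poisson(lambda * |A|)
    xs = rng.uniform(0.0, width, size=n)    # given n, locations are uniform
    ys = rng.uniform(0.0, height, size=n)
    return np.column_stack([xs, ys])

rng = np.random.default_rng(42)
points = simulate_csr(lam=50.0, width=1.0, height=1.0, rng=rng)

# Repeating the simulation: the count has mean and variance close to lambda * |A|.
counts = [len(simulate_csr(50.0, 1.0, 1.0, rng)) for _ in range(2000)]
print(points.shape, np.mean(counts), np.var(counts))
```

Simulations like this are what make CSR usable as a null distribution: a test statistic computed on observed data can be compared against its values on many simulated patterns.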

Ripley's K Function
• Test if points tend to cluster with each other (micro)
• Hypothesis testing
  • H0: homogeneous Poisson point process (independent)
  • H1: points tend to cluster with each other
• Test statistic:
  • $K(d) = \lambda^{-1} E(\text{number of points within radius } d \text{ of a point})$
  • $\hat{K}(d) = \hat{\lambda}^{-1} \sum_i \sum_{j \ne i} I(d_{ij} \le d) / n$
  • Under H0, $K(d) = \pi d^2$

[Figure: CSR, clustering, and declustering point patterns, with an example K function plot showing one curve for each pattern]
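The estimator $\hat{K}(d)$ can be sketched in a few lines. This version takes $\hat{\lambda} = n / |A|$ and applies no edge correction, which is a simplifying assumption (points near the boundary see fewer neighbors, so the estimate is biased slightly downward). The function name `ripley_k` is illustrative.

```python
import numpy as np

def ripley_k(points, d, area):
    """K_hat(d) = (1 / lambda_hat) * sum_{i != j} 1(d_ij <= d) / n, no edge correction."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # exclude pairs with i == j
    lam_hat = n / area
    return (dist <= d).sum() / (lam_hat * n)

# Two points 0.1 apart in a unit-area window.
k_small = ripley_k([[0.0, 0.0], [0.1, 0.0]], d=0.05, area=1.0)
k_large = ripley_k([[0.0, 0.0], [0.1, 0.0]], d=0.2, area=1.0)

# Under CSR the estimate should be close to pi * d^2.
rng = np.random.default_rng(1)
csr = rng.uniform(0.0, 1.0, size=(500, 2))
k_csr = ripley_k(csr, d=0.05, area=1.0)
print(k_small, k_large, k_csr, np.pi * 0.05 ** 2)
```

Values of $\hat{K}(d)$ well above $\pi d^2$ suggest clustering at scale $d$, and values well below suggest declustering.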

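Ripley's Cross K Function
• Test if two types of events tend to cluster together
• H0: event types $i$ and $j$ are independent
• H1: event types $i$ and $j$ tend to cluster together
• Test statistic:
  • $K_{ij}(d) = \lambda_j^{-1} E(\text{number of points of type } j \text{ within } d \text{ of a point of type } i)$
  • $\hat{K}_{ij}(d) = \lambda_j^{-1} \sum_i \sum_j I(d_{ij} \le d) / n_i = (\lambda_i \lambda_j A)^{-1} \sum_i \sum_j I(d_{ij} \le d)$, where the sums run over events of type $i$ and events of type $j$
  • Under H0, $K_{ij}(d) = \pi d^2$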

Illustration of Cross-Correlation
• Illustration of cross K-function for example data

[Figure 6: Cross K-function for example data. Title: "Cross-K function of pairs of spatial features"; x-axis: distance h; y-axis: cross-K function; curves: y = pi*h^2, o and *, x and +, * and x, * and +]

Cross-Correlation
• Cross K-function definition
  • $K_{ij}(h) = \lambda_j^{-1} E[\text{number of type } j \text{ events within distance } h \text{ of a randomly chosen type } i \text{ event}]$
  • Cross K-function of some pair of spatial feature types
• Example
  • Which pairs are frequently co-located?
  • Statistical significance

[Figure 5: Example data (o and *; x and +). Title: "Co-location Patterns - Sample Data"; axes: X and Y]
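A cross K estimate follows the same pattern as the single-type K function, counting type $j$ events around each type $i$ event. The sketch below takes $\hat{\lambda} = n / |A|$ for each type and applies no edge correction; the function name `cross_k` and the synthetic co-located data are assumptions.

```python
import numpy as np

def cross_k(points_i, points_j, d, area):
    """K_ij_hat(d) = area * (number of (i, j) pairs within d) / (n_i * n_j)."""
    pts_i = np.asarray(points_i, dtype=float)
    pts_j = np.asarray(points_j, dtype=float)
    dist = np.linalg.norm(pts_i[:, None, :] - pts_j[None, :, :], axis=2)
    return area * (dist <= d).sum() / (len(pts_i) * len(pts_j))

# One type-i event with one type-j event nearby and one far away.
k_toy = cross_k([[0.0, 0.0]], [[0.1, 0.0], [0.5, 0.5]], d=0.2, area=1.0)

# Synthetic co-location: every type-j event sits next to a type-i event.
rng = np.random.default_rng(3)
type_i = rng.uniform(0.0, 1.0, size=(100, 2))
type_j = type_i + rng.normal(scale=0.01, size=(100, 2))
d = 0.05
k_coloc = cross_k(type_i, type_j, d, area=1.0)
print(k_toy, k_coloc, np.pi * d ** 2)
```

For co-located types the estimate sits above the $\pi d^2$ reference curve, which is how the frequently co-located pairs stand out in a cross K-function plot.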

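Spatial Scan Statistics
• Test if point intensity is homogeneous everywhere
• Hypothesis testing (assume Poisson point process)
  • H0: homogeneous intensity, $\lambda_{in} = \lambda_{out}$ for window $W$
  • H1: inhomogeneous intensity, $\lambda_{in} > \lambda_{out}$ for window $W$
• Test statistic (likelihood ratio)
  • $LR = \frac{\max \text{likelihood}(data \mid H_1)}{\max \text{likelihood}(data \mid H_0)} = \frac{\sup_{W, \lambda_{in} > \lambda_{out}} L(data; W, \lambda_{in}, \lambda_{out})}{\sup_{\lambda_{in} = \lambda_{out}} L(data; W, \lambda_{in}, \lambda_{out})}$
  • $LR = \sup_W \left( \frac{n_W}{E_W} \right)^{n_W} \left( \frac{n_{W'}}{E_{W'}} \right)^{n_{W'}} I(\cdot)$
    • $n_W$: observed # in $W$; $E_W$: expected # in $W$
    • $n_{W'}$: observed # out of $W$; $E_{W'}$: expected # out of $W$
• Significance
  • P-value, Monte Carlo simulation

[Figure: CSR vs. hotspot point patterns]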

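Spatial Scan Statistics: An Example
• Input: 13 points
• Compute test statistic:
  • $LR = \sup_W \left( \frac{n_W}{E_W} \right)^{n_W} \left( \frac{n_{W'}}{E_{W'}} \right)^{n_{W'}} I(\cdot)$
  • $n_W = 7$ (observed # in $W$)
  • $n_{W'} = 6$ (observed # out of $W$)
  • $E_W = \frac{13}{16} \times 1$ (expected # in $W$)
  • $E_{W'} = \frac{13}{16} \times 15$ (expected # out of $W$)
  • $I(\cdot) = 1$ (density in $W$ is higher than out)
  • $LR = \left( \frac{7}{13/16} \right)^7 \left( \frac{6}{13 \times 15/16} \right)^6 \approx 50158$ (the figure 56115.15 is obtained when the expected counts are first rounded to 0.8 and 12.18)
• Monte Carlo simulation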

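An Example

Scanning window $Z$, outside window $Z'$.

$$LR = \frac{\sup_{Z \in \mathcal{Z}, p > q} L(Z, p, q)}{\sup_{p = q} L(Z, p, q)} = \sup_{Z \in \mathcal{Z}} \left( \frac{n_Z}{B_Z} \right)^{n_Z} \left( \frac{n_{Z'}}{B_{Z'}} \right)^{n_{Z'}} I(\cdot)$$

$n_Z = 7$ and $n_{Z'} = 6$, $|Z| = 1$, $|Z'| = 15$, $B_Z = 13/16 \times 1 \approx 0.8$, $B_{Z'} = 13/16 \times 15 \approx 12.2$, so $LR = 56115.15$.

[Figure: illustrative example showing window $Z$ and outside region $Z'$]

Zhe Jiang (University of Alabama), Group Seminar Slides, September 22, 2016

• Enumerating window $W$ with size 1
• Study area size 4*4 = 16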
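The worked example can be reproduced with a small script. The sketch scans all unit-size windows of a 4x4 grid and estimates a Monte Carlo p-value under the null of uniformly scattered points. The grid of counts is hypothetical (the slide only gives 7 points inside the window and 6 outside), and the function names are illustrative.

```python
import numpy as np

def likelihood_ratio(n_in, n_out, e_in, e_out):
    """(n_in/e_in)^n_in * (n_out/e_out)^n_out * I(window denser than outside)."""
    if not (n_in / e_in) > (n_out / e_out):
        return 0.0  # indicator I(.) = 0
    return (n_in / e_in) ** n_in * (n_out / e_out) ** n_out

def scan_cells(counts):
    """Supremum of the likelihood ratio over all single-cell windows."""
    counts = np.asarray(counts).ravel()
    total, n_cells = counts.sum(), counts.size
    e_in = total / n_cells        # expected count in a unit window
    e_out = total - e_in          # expected count outside it
    return max(likelihood_ratio(c, total - c, e_in, e_out) for c in counts)

def monte_carlo_p(counts, n_sim=999, seed=0):
    """P-value: rank of the observed statistic among null simulations."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts).ravel()
    observed = scan_cells(counts)
    p_cell = np.full(counts.size, 1.0 / counts.size)
    sims = [scan_cells(rng.multinomial(counts.sum(), p_cell)) for _ in range(n_sim)]
    return (1 + sum(s >= observed for s in sims)) / (n_sim + 1)

# Hypothetical 4x4 grid: 13 points in total, 7 of them in one cell.
counts = np.array([[7, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])

lr_exact = likelihood_ratio(7, 6, 13 / 16, 13 * 15 / 16)
lr_rounded = likelihood_ratio(7, 6, 0.8, 12.18)
lr_scan = scan_cells(counts)
p_value = monte_carlo_p(counts)
print(lr_exact, lr_rounded, lr_scan, p_value)
```

The likelihood ratio is sensitive to rounding of the expected counts because they are raised to the 6th and 7th powers, so the unrounded fractions and the rounded values 0.8 and 12.18 give noticeably different numbers.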
