Statistics - Zhejiangzhejiang.cs.ua.edu/.../SpatialBigData/slides/SpatialStatistics.pdf · What is...
What is Spatial Statistics?
• Statistics
  • The study of collection, analysis, interpretation of data
  • Descriptive vs. inferential
• Spatial statistics
  • Statistics for spatial data (point, line, polygon, raster)
  • Variables indexed in 2D or 3D, random locations
  • Unique properties
    • Non-i.i.d.
    • Spatial autocorrelation
    • Isotropy vs. anisotropy
    • Stationarity vs. non-stationarity
Categories of Spatial Statistics
• Geostatistics – point reference data
  • $\{Y(s) : s \in D\}$, $D$ is $r$-dimensional Euclidean space
  • $Y$ is random, $s$ is fixed
• Lattice statistics – areal data
  • $\{Y(s) : s \in D\}$, $D$ is a tessellation of Euclidean space
  • $Y$ is random, $s$ is fixed
• Spatial point process – spatial point event data
  • $\{s_1, s_2, s_3, \ldots, s_n\}$, the $s_i$'s are point locations
  • $s$ is random
[Figures: U.S. river stream gauge observations (Source: USGS.gov); Election result by county in 2016 (Source: http://brilliantmaps.com); Shooting, Chicago 2010 (Source: http://assets.dnainfo.com)]
Part I: Geostatistics
• Point reference data
  • A stochastic process: $\{Y(s) : s \in D\}$, $D$ is $r$-dimensional Euclidean space
  • Example: [Figures: U.S. river stream gauge observations; Measures at mining sites]
• What is Geostatistics used for?
  • Exploratory data analysis
  • Spatial interpolation
Applications
• Estimate precipitation based on records at a set of weather stations
• Infer ground water level based on sensor readings of a set of gauges
• Predict mineral resources based on samples at a limited number of sites
[Figure: U.S. river stream gauge observations]
Spatial Stationarity
• $\{Y(s) : s \in D\}$, $D$ is $r$-dimensional Euclidean space
• $Y(s)$ is strictly stationary when
  • Distribution unchanged when locations shifted
  • For any $n \ge 1$, any $n$ locations $\{s_1, s_2, s_3, \ldots, s_n\}$, and any $h \in R^r$
  • $(Y(s_1), Y(s_2), \ldots, Y(s_n))$ and $(Y(s_1 + h), Y(s_2 + h), \ldots, Y(s_n + h))$ have the same distribution
  • Often too strong, not realistic!
• $Y(s)$ is weakly stationary when
  • Mean, (co)variance unchanged when locations shifted
  • $E[Y(s)] = \mu(s) \equiv \mu$ (constant mean)
  • $Cov(Y(s), Y(s + h)) = C(h)$ for all $h \in R^r$
  • Covariance across any two locations is simply a function of $h$!
[Figure: illustrative example, source: http://azvoleff.com/]
Variogram
• Tobler’s first law of geography:
  • “Everything is related to everything else, but near things are more related than distant things.”
• How does the “difference” (“irrelevance”) of two observations increase with distance?
• How to measure the range of spatial “relatedness”?
• $Y(s)$ is intrinsically stationary when
  • $E[Y(s)] = \mu(s) \equiv \mu$ (constant mean)
  • $E[Y(s + h) - Y(s)]^2 \equiv 2r(h)$ for any $s$, $h$
  • $2r(h)$ is called the variogram
  • $r(h)$ is called the semi-variogram
  • “Difference” across two locations only depends on $h$!
• $Y(s)$ is isotropic if $r(h) \equiv r(|h|)$
  • With isotropy, can plot a curve for $r(|h|)$!
  • But is this assumption always valid?
[Figures: Avg. annual precip. (wrcc.dri.edu), with a location $s$ and an offset $h$ marked; illustration of first law of geography, source: http://azvoleff.com/]
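The semi-variogram $r(|h|)$ defined on this slide can be estimated from data by averaging half squared differences over pairs of locations at similar distances. The sketch below assumes isotropy and uses the classical method-of-moments estimator, which the slide does not spell out; the function name `empirical_semivariogram` and the `bin_edges` argument are illustrative choices, not part of the original material.

```python
import numpy as np

def empirical_semivariogram(coords, values, bin_edges):
    """Mean of 0.5 * (y_i - y_j)^2 over pairs whose distance falls in each bin.

    coords: (n, 2) locations; values: (n,) observations;
    bin_edges: increasing distance cut points (an illustrative user choice).
    Bins without any pair are returned as NaN.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)          # each unordered pair once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    half_sq_diff = 0.5 * (values[i] - values[j]) ** 2
    which_bin = np.digitize(dist, bin_edges) - 1      # bin index of each pair
    n_bins = len(bin_edges) - 1
    r_hat = np.full(n_bins, np.nan)
    for b in range(n_bins):
        mask = which_bin == b
        if mask.any():
            r_hat[b] = half_sq_diff[mask].mean()
    return r_hat

# Toy data: four points on a line whose values grow with position.
coords = [[0, 0], [1, 0], [2, 0], [3, 0]]
values = [0.0, 1.0, 2.0, 3.0]
r_hat = empirical_semivariogram(coords, values, bin_edges=[0.5, 1.5, 2.5, 3.5])
print(r_hat)
```

Plotting `r_hat` against the bin centers gives the kind of curve the slide refers to; where it levels off suggests the range of spatial dependency.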
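Variogram Plot
• Models of semi-variogram
  • Linear: $r(d) = \begin{cases} \tau^2 + \sigma^2 d & \text{if } d > 0 \\ 0 & \text{otherwise} \end{cases}$
  • Spherical: $r(d) = \begin{cases} \tau^2 + \sigma^2 & \text{if } d > 1/\phi \\ \tau^2 + \sigma^2 \left[ \frac{3\phi d - (\phi d)^3}{2} \right] & \text{if } 0 < d \le 1/\phi \\ 0 & \text{otherwise} \end{cases}$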
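The two parametric models can be written as short functions. This is a sketch that assumes the standard textbook forms (a linear model $\tau^2 + \sigma^2 d$, and a spherical model with a cubic term that reaches the sill $\tau^2 + \sigma^2$ at $d = 1/\phi$); the function names are illustrative.

```python
import numpy as np

def linear_semivariogram(d, tau2, sigma2):
    """tau2 + sigma2 * d for d > 0, and 0 at d = 0 (tau2 is the nugget)."""
    d = np.asarray(d, dtype=float)
    return np.where(d > 0, tau2 + sigma2 * d, 0.0)

def spherical_semivariogram(d, tau2, sigma2, phi):
    """Spherical model: rises until d = 1/phi, then stays at the sill tau2 + sigma2."""
    d = np.asarray(d, dtype=float)
    rising = tau2 + sigma2 * (1.5 * phi * d - 0.5 * (phi * d) ** 3)
    value = np.where(d > 1.0 / phi, tau2 + sigma2, rising)
    return np.where(d > 0, value, 0.0)

d = np.array([0.0, 1.0, 2.0, 5.0])
print(linear_semivariogram(d, tau2=1.0, sigma2=3.0))
print(spherical_semivariogram(d, tau2=0.5, sigma2=2.0, phi=0.5))
```

Fitting $\tau^2$, $\sigma^2$, and $\phi$ to an empirical semi-variogram (for example by least squares) is a separate step not shown here.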
Variogram Example
[Figure: diameter at breast height on trees]
See R example!
Variogram vs. Covariogram
• $E[Y(s + h) - Y(s)]^2 = 2r(h)$
• $E[Y(s + h) - Y(s)]^2 = Var(Y(s + h)) + Var(Y(s)) - 2Cov(Y(s + h), Y(s)) = 2C(0) - 2C(h)$
• Thus, $r(h) = C(0) - C(h)$: $r(h)$ and $C(h)$ are related!
• Covariance $C(h)$ is a function of $h$
Ordinary Kriging
• Problem:
  • Given observations at locations $\{y(s_1), y(s_2), \ldots, y(s_n)\}$,
  • How to predict $y(s_0)$?
• Assumptions:
  • Weak (intrinsic) stationarity
  • Known covariance $C(h)$
  • Unknown constant $E(Y) \equiv \mu$
  • Linear estimation: $\hat{y}(s_0) = \sum_{i=1}^{n} l_i y(s_i)$
• Approach:
  • Minimize expected square loss! $E\left[\left(y(s_0) - \sum_{i=1}^{n} l_i y(s_i)\right)^2\right]$
Ordinary Kriging
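Minimize: $E\left[\left(y(s_0) - \sum_{i=1}^{n} l_i y(s_i)\right)^2\right]$
Unbiasedness constraint: $E\left[y(s_0) - \sum_{i=1}^{n} l_i y(s_i)\right] = 0$. Since $E(y(s_0)) \equiv E(y(s_i)) \equiv \mu$, this gives $1 - \sum_{i=1}^{n} l_i = 0$.
$E\left[\left(y(s_0) - \sum_{i=1}^{n} l_i y(s_i)\right)^2\right]$
$= Var(y(s_0)) - 2Cov\left(y(s_0), \sum_{i=1}^{n} l_i y(s_i)\right) + \sum_{i=1}^{n}\sum_{j=1}^{n} l_i l_j Cov(y(s_i), y(s_j))$
$= C_{0,0} - 2\sum_{i=1}^{n} l_i C_{0,i} + \sum_{i=1}^{n}\sum_{j=1}^{n} l_i l_j C_{i,j}$
$= C_{0,0} - 2C_{0,*}^T l + l^T C l$
subject to $\mathbf{1}^T l - 1 = 0$
Constrained convex optimization problem! Using Lagrangian multiplier, optimal solution:
$l^* = C^{-1}\left[C_{0*} - \frac{\mathbf{1}^T C^{-1} C_{0*} - 1}{\mathbf{1}^T C^{-1} \mathbf{1}} \mathbf{1}\right]$, $\hat{y}(s_0) = l^{*T} Y$
Remember $Cov(Y(s), Y(s + h)) \equiv C(h)$; the covariogram $C(h)$ can be estimated from data!
Notation: $C_{i,j} \equiv Cov(Y(s_i), Y(s_j))$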
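The closed-form weights can be evaluated directly with linear solves. The sketch below assumes a known isotropic exponential covariogram `exp_cov` (a hypothetical choice made only for illustration; the slides leave $C(h)$ generic) and returns both the prediction and the weight vector $l^*$.

```python
import numpy as np

def exp_cov(h, sigma2=1.0, phi=1.0):
    """Hypothetical covariogram C(h) = sigma2 * exp(-phi * h), assumed known."""
    return sigma2 * np.exp(-phi * h)

def ordinary_kriging(coords, y, s0, cov=exp_cov):
    coords = np.asarray(coords, dtype=float)
    y = np.asarray(y, dtype=float)
    s0 = np.asarray(s0, dtype=float)
    # C[i, j] = C(|s_i - s_j|) and c0[i] = C(|s_0 - s_i|)
    C = cov(np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2))
    c0 = cov(np.linalg.norm(coords - s0, axis=1))
    ones = np.ones(len(y))
    Cinv_c0 = np.linalg.solve(C, c0)
    Cinv_1 = np.linalg.solve(C, ones)
    # Lagrange multiplier term that enforces sum(weights) == 1
    lam = (ones @ Cinv_c0 - 1.0) / (ones @ Cinv_1)
    weights = Cinv_c0 - lam * Cinv_1
    return weights @ y, weights

coords = [[0, 0], [1, 0], [0, 1], [1, 1]]
y = [1.0, 2.0, 3.0, 4.0]
pred, weights = ordinary_kriging(coords, y, s0=[0.5, 0.5])
print(pred, weights.sum())
```

Because the weights sum to one, the unknown constant mean $\mu$ cancels out of the prediction error, which is why it never has to be estimated.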
Universal Kriging
• Problem:
  • Given observations $y = (y(s_1), \ldots, y(s_n))$, covariates $\{x(s_1), \ldots, x(s_n)\}$, $x(s_0)$, predict $y(s_0)$
• Assumptions:
  • $y(s_i) = x(s_i)^T \beta + \epsilon_i$, $Y = X\beta + \epsilon$
  • $\epsilon \sim N(0, \Sigma)$, where $\Sigma = \sigma^2 H(\phi) + \tau^2 I$
• Estimator: $\hat{y}(s_0) = h(y)$
  • How to find optimal $h(y)$?
  • Minimize expected square loss! $E\left[(y(s_0) - h(y))^2\right]$
  • Optimal prediction: $\hat{y}(s_0) = h(y) = E[y(s_0) | y]$
[Diagram: $Y = X \times \beta + \epsilon$]
Universal Kriging
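$Y = X\beta + \epsilon$
[Diagram: $Y = X \times \beta + \epsilon$]
$\epsilon \sim N(0, \Sigma)$, where $\Sigma = \sigma^2 H(\phi) + \tau^2 I$
Assuming a Gaussian process, i.e., any set of observations $Y$ follows a Gaussian distribution:
$\begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} \sim N\left(\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Omega_{11} & \Omega_{12} \\ \Omega_{21} & \Omega_{22} \end{pmatrix}\right)$
$E(Y_1 | Y_2) = \mu_1 + \Omega_{12}\Omega_{22}^{-1}(Y_2 - \mu_2)$
$Var(Y_1 | Y_2) = \Omega_{11} - \Omega_{12}\Omega_{22}^{-1}\Omega_{21}$
Optimal prediction: $\hat{y}(s_0) = h(y) = E[y(s_0) | y]$
$Y_1 = y(s_0)$, $Y_2 = y = (y(s_1), \ldots, y(s_n))^T$
$\mu_1 = x(s_0)^T\beta$, $\mu_2 = (x(s_1)^T\beta, \ldots, x(s_n)^T\beta)^T = X\beta$
$E[y(s_0) | y] = x(s_0)^T\beta + C_{0*}^T \Sigma^{-1}(y - X\beta)$
Annotations on the formula: $\hat{\beta}$ (for $\beta$); $C(h) + \tau^2 I$ (for $\Sigma$)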
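The conditional-mean predictor can be sketched with a plug-in estimate of $\beta$. The slides do not say how $\hat{\beta}$ is obtained; the example below assumes the generalized least squares estimate $\hat{\beta} = (X^T\Sigma^{-1}X)^{-1}X^T\Sigma^{-1}y$, an exponential correlation for $H(\phi)$, and made-up toy data, all of which are illustrative assumptions.

```python
import numpy as np

def universal_kriging(X, y, x0, Sigma, c0):
    """Plug-in predictor x0' b + c0' Sigma^{-1} (y - X b), with b the GLS estimate."""
    Sinv_X = np.linalg.solve(Sigma, X)
    Sinv_y = np.linalg.solve(Sigma, y)
    beta_hat = np.linalg.solve(X.T @ Sinv_X, X.T @ Sinv_y)   # GLS (assumed here)
    resid = y - X @ beta_hat
    pred = x0 @ beta_hat + c0 @ np.linalg.solve(Sigma, resid)
    return pred, beta_hat

# Toy setup: intercept plus the x-coordinate as the only covariate.
coords = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]], dtype=float)
s0 = np.array([0.5, 0.5])
X = np.column_stack([np.ones(len(coords)), coords[:, 0]])
x0 = np.array([1.0, s0[0]])
beta_true = np.array([2.0, -1.0])
y = X @ beta_true                      # noise-free, so the fit should be exact

dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
Sigma = 1.0 * np.exp(-dist) + 0.1 * np.eye(len(coords))     # sigma^2 H(phi) + tau^2 I
c0 = 1.0 * np.exp(-np.linalg.norm(coords - s0, axis=1))     # Cov(y(s0), y(s_i))

pred, beta_hat = universal_kriging(X, y, x0, Sigma, c0)
print(pred, beta_hat)
```

With noise-free data the residuals vanish and the prediction reduces to the regression term $x(s_0)^T\hat{\beta}$; with real data the second term adds a spatially weighted correction from nearby residuals.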
Review Questions:
• For each of the following statements, True or False?
  1. In Geostatistics, strict stationarity is often assumed.
  2. Variogram can help select a distance threshold for spatial neighborhood (range of spatial dependency).
  3. Kriging assumes weak or intrinsic stationarity.
  4. Ordinary Kriging assumes weak or intrinsic stationarity.
Part II: Lattice Statistics
• Areal data model
  • A tessellation of continuous space into (regular or irregular) cells
  • Mapping each unit to a non-spatial attribute value
• Lattice Statistics:
  • $\{Y(s) : s \in D\}$, $D$ is a set of cells in areal data
• What is Lattice Statistics used for?
  • Explore spatial patterns in areal maps
  • Model areal maps for interpretation
[Figure: Election result by county in 2016. Source: http://brilliantmaps.com]
Q1: Is the map spatially stationary?
Q2: Is there strong autocorrelation globally and locally?
Q3: How to model and interpret coefficients that impact results?
W-Matrix
• Spatial neighborhood matrix
  • $W_{ij} > 0$ when $i$ and $j$ are neighbors
  • $W_{ij} = 0$ when $i$ and $j$ are not neighbors
• Example
[Figure: an areal data with 8 units (numbered 1 to 8), shown with a rook neighborhood and with a queen neighborhood]
A rook neighborhood:
0 1 0 1 1 0 0 0
1 0 1 0 1 0 0 0
0 1 0 0 0 1 0 0
1 0 0 0 1 0 1 0
1 1 0 1 0 0 0 1
0 0 1 0 1 0 0 1
0 0 0 1 0 0 0 1
0 0 0 0 1 1 1 0
A queen neighborhood:
0 1 0 1 1 0 0 0
1 0 1 0 1 1 0 0
0 1 0 0 1 1 0 0
1 0 0 0 1 0 1 1
1 1 1 1 0 1 1 1
0 1 1 0 1 0 0 1
0 0 0 1 1 0 0 1
0 0 0 1 1 0 1 0
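For a regular grid the two neighborhood definitions are easy to generate. The sketch below is for a rectangular grid with cells numbered row by row, which is a simplifying assumption; the 8-unit map on the slide is irregular, so its matrices are not reproduced by this code. The name `grid_weights` is illustrative.

```python
import numpy as np

def grid_weights(n_rows, n_cols, queen=False):
    """Binary W for a regular grid; cell (r, c) has index r * n_cols + c.

    Rook: cells sharing an edge. Queen: cells sharing an edge or a corner.
    """
    n = n_rows * n_cols
    W = np.zeros((n, n), dtype=int)
    for r in range(n_rows):
        for c in range(n_cols):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue                  # a cell is not its own neighbor
                    if not queen and dr != 0 and dc != 0:
                        continue                  # rook ignores diagonal contacts
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < n_rows and 0 <= cc < n_cols:
                        W[r * n_cols + c, rr * n_cols + cc] = 1
    return W

W_rook = grid_weights(3, 3)
W_queen = grid_weights(3, 3, queen=True)
print(W_rook[4], W_queen[4])   # neighbors of the center cell
```

Dividing each row of `W` by its row sum gives the row-standardized form that many lattice statistics assume.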
Spatial Autocorrelation
• Measures the level of global spatial association
• Moran’s I:
  • $I = \frac{n \sum_i \sum_j w_{ij}(Y_i - \bar{Y})(Y_j - \bar{Y})}{\left(\sum_{i \ne j} w_{ij}\right) \sum_i (Y_i - \bar{Y})^2}$, where $i$ and $j$ are locations.
  • $I \in [-1, 1]$, high value shows strong spatial association
• Example with rook neighborhood:
0 1 0 1 0
1 0 1 0 1
0 1 0 1 0
1 0 1 0 1
0 1 0 1 0
$I \approx -1$
1 1 1 0 0
1 1 1 0 0
1 1 1 0 0
1 1 1 0 0
1 1 1 0 0
$I \approx 1$
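Moran's I for the two example grids can be checked numerically. This is a minimal sketch with a binary rook matrix built for a regular grid; `rook_weights` and `morans_i` are illustrative names.

```python
import numpy as np

def rook_weights(n_rows, n_cols):
    """Binary rook W for a regular grid; cell (r, c) has index r * n_cols + c."""
    n = n_rows * n_cols
    W = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            k = r * n_cols + c
            if c + 1 < n_cols:
                W[k, k + 1] = W[k + 1, k] = 1            # right neighbor
            if r + 1 < n_rows:
                W[k, k + n_cols] = W[k + n_cols, k] = 1  # lower neighbor
    return W

def morans_i(y, W):
    y = np.asarray(y, dtype=float).ravel()
    dev = y - y.mean()
    return len(y) * (dev @ W @ dev) / (W.sum() * (dev @ dev))

W = rook_weights(5, 5)
checker = np.indices((5, 5)).sum(axis=0) % 2      # the 0/1 checkerboard
blocks = np.tile([1, 1, 1, 0, 0], (5, 1))         # three columns of 1, two of 0
print(morans_i(checker, W), morans_i(blocks, W))
```

On these inputs the checkerboard gives exactly -1, and the block pattern gives about 0.73: strongly positive, though below the rounded value shown on the slide.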
Spatial Autocorrelation
• Geary’s C
  • $C = \frac{(n - 1) \sum_i \sum_j w_{ij}(Y_i - Y_j)^2}{2\left(\sum_{i \ne j} w_{ij}\right) \sum_i (Y_i - \bar{Y})^2}$, where $i$ and $j$ are locations.
  • $C \ge 0$, low values show strong spatial association
• Example with rook neighborhood:
0 1 0 1 0
1 0 1 0 1
0 1 0 1 0
1 0 1 0 1
0 1 0 1 0
$C = ?$
1 1 1 0 0
1 1 1 0 0
1 1 1 0 0
1 1 1 0 0
1 1 1 0 0
$C = ?$
Which one has higher C?
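The question can be answered by computing Geary's C for both grids. As a sketch under the same assumptions as the formula (binary rook weights on a regular 5 by 5 grid), with illustrative function names:

```python
import numpy as np

def rook_weights(n_rows, n_cols):
    """Binary rook W for a regular grid; cell (r, c) has index r * n_cols + c."""
    n = n_rows * n_cols
    W = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            k = r * n_cols + c
            if c + 1 < n_cols:
                W[k, k + 1] = W[k + 1, k] = 1
            if r + 1 < n_rows:
                W[k, k + n_cols] = W[k + n_cols, k] = 1
    return W

def gearys_c(y, W):
    y = np.asarray(y, dtype=float).ravel()
    dev = y - y.mean()
    sq_diff = (y[:, None] - y[None, :]) ** 2      # (Y_i - Y_j)^2 for all pairs
    return (len(y) - 1) * (W * sq_diff).sum() / (2 * W.sum() * (dev @ dev))

W = rook_weights(5, 5)
checker = np.indices((5, 5)).sum(axis=0) % 2
blocks = np.tile([1, 1, 1, 0, 0], (5, 1))
c_checker, c_blocks = gearys_c(checker, W), gearys_c(blocks, W)
print(c_checker, c_blocks)
```

The checkerboard has the higher C (about 1.92, every neighbor pair differs), while the block pattern gives 0.25, consistent with low C indicating strong positive association.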
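Local Spatial Autocorrelation
• Local indicator of spatial association (LISA)
  • When data is not homogeneous, local behaviors may differ from global behavior (outliers)
• Local Moran’s I:
  • $I_i = \frac{Y_i - \bar{Y}}{m_2} \sum_j w_{ij}(Y_j - \bar{Y})$ where $m_2 = \frac{\sum_i (Y_i - \bar{Y})^2}{n}$
  • $I = \frac{1}{n}\sum_i I_i$ (for a row-standardized $W$, i.e., $\sum_i \sum_j w_{ij} = n$)
• Local Geary’s C
  • $C_i = \frac{1}{m_2}\sum_j w_{ij}(Y_i - Y_j)^2$
  • $C \propto \sum_i C_i$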
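The relation between local and global Moran's I can be verified on a small example. The sketch assumes a row-standardized $W$ (each row sums to one), under which the global statistic is the plain average of the local ones; the ring-shaped neighborhood and the data values are made up for illustration.

```python
import numpy as np

def ring_weights(n):
    """Row-standardized W for n units on a ring: two neighbors, weight 0.5 each."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = 0.5
        W[i, (i + 1) % n] = 0.5
    return W

def local_morans_i(y, W):
    y = np.asarray(y, dtype=float)
    dev = y - y.mean()
    m2 = (dev @ dev) / len(y)
    return dev * (W @ dev) / m2        # one value I_i per location

def global_morans_i(y, W):
    y = np.asarray(y, dtype=float)
    dev = y - y.mean()
    return len(y) * (dev @ W @ dev) / (W.sum() * (dev @ dev))

W = ring_weights(6)
y = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0])
local_i = local_morans_i(y, W)
print(local_i, local_i.mean(), global_morans_i(y, W))
```

Locations whose `local_i` has the opposite sign from the global value are candidates for local outliers.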
Spatial Autocorrelation for Nominal Data
• Black-Black Join Count
W B W B W
B W B W B
W B W B W
B W B W B
W B W B W

B B B W W
B B B W W
B B B W W
B B B W W
B B B W W
Suppose $n$ locations, $n_W$ white, $n_B$ black.
$P_W = \frac{n_W}{n}$, $P_B = \frac{n_B}{n}$
$JC_{BB} = \frac{1}{2}\sum_i \sum_j w_{ij} I(y_i = B, y_j = B)$
Test: $\frac{JC_{BB} - E(JC_{BB})}{\sigma}$, with $E(JC_{BB}) = \frac{1}{2}\sum_i \sum_j w_{ij} P_B P_B$ and $Var(JC_{BB}) = \sigma^2$, assuming Gaussian distribution
$E(JC_{BB})$ and $Var(JC_{BB})$ can also be generated from random permutation!
H0: B and W are independently distributed, $JC_{BB}$ is asymptotically normally distributed
H1: B and W are not independent, BB tends to cluster.
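The join count and its permutation-based reference distribution can be sketched as follows. The code assumes binary rook weights on a regular 5 by 5 grid and a one-sided permutation p-value; `join_count_bb`, `permutation_pvalue`, the number of permutations, and the seed are illustrative choices.

```python
import numpy as np

def rook_weights(n_rows, n_cols):
    """Binary rook W for a regular grid; cell (r, c) has index r * n_cols + c."""
    n = n_rows * n_cols
    W = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            k = r * n_cols + c
            if c + 1 < n_cols:
                W[k, k + 1] = W[k + 1, k] = 1
            if r + 1 < n_rows:
                W[k, k + n_cols] = W[k + n_cols, k] = 1
    return W

def join_count_bb(labels, W):
    """Return (JC_BB, E(JC_BB)) using the formulas on the slide."""
    is_b = (np.asarray(labels).ravel() == "B").astype(float)
    jc = 0.5 * (is_b @ W @ is_b)
    p_b = is_b.mean()
    return jc, 0.5 * W.sum() * p_b * p_b

def permutation_pvalue(labels, W, n_perm=999, seed=0):
    """Share of random relabelings with at least as many BB joins as observed."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels).ravel()
    observed, _ = join_count_bb(labels, W)
    sims = [join_count_bb(rng.permutation(labels), W)[0] for _ in range(n_perm)]
    return (1 + sum(s >= observed for s in sims)) / (n_perm + 1)

W = rook_weights(5, 5)
checker = np.where(np.indices((5, 5)).sum(axis=0) % 2 == 1, "B", "W")
blocks = np.tile(list("BBBWW"), (5, 1))
print(join_count_bb(checker, W), join_count_bb(blocks, W))
print(permutation_pvalue(blocks, W))
```

The block pattern has 22 BB joins against an expected 14.4, while the checkerboard has none, so only the former points toward clustering of B.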
Markov Random Field
• Problem: How to model joint distribution of a field?
• Brook’s Lemma:
  • Joint distribution can be determined by conditional distribution
  • $p(y_1, y_2, \ldots, y_n) \Longleftarrow p(y_i | y_j, j \ne i)$
  • However, conditional distribution above is too complex!
• Markov property:
  • $p(y_i | y_j, j \ne i) \equiv p(y_i | y_j, j \in N(i))$
  • Conditional distribution of observation at a location only depends on its neighbors
Spatial Autoregressive Model (SAR)
• $Y = \rho W Y + X\beta + \epsilon$
  • $\rho W Y$: autoregressive term; $X\beta$: covariates; $\epsilon$: independent noise
[Diagram: $Y = \rho \times W \times Y + X \times \beta + \epsilon$]
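Solving the SAR equation for $Y$ gives $Y = (I - \rho W)^{-1}(X\beta + \epsilon)$, which is how data from the model can be simulated. The sketch below assumes a row-standardized ring neighborhood, Gaussian noise, and $|\rho| < 1$ so that $I - \rho W$ is invertible; all names and parameter values are illustrative.

```python
import numpy as np

def ring_weights(n):
    """Row-standardized W for n units on a ring (illustrative neighborhood)."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = 0.5
        W[i, (i + 1) % n] = 0.5
    return W

def simulate_sar(W, X, beta, rho, sigma, rng):
    """Draw eps ~ N(0, sigma^2 I) and solve (I - rho W) Y = X beta + eps."""
    n = W.shape[0]
    eps = rng.normal(0.0, sigma, size=n)
    Y = np.linalg.solve(np.eye(n) - rho * W, X @ beta + eps)
    return Y, eps

rng = np.random.default_rng(0)
W = ring_weights(6)
X = np.column_stack([np.ones(6), np.arange(6.0)])
beta = np.array([1.0, 0.5])
rho = 0.5
Y, eps = simulate_sar(W, X, beta, rho, sigma=1.0, rng=rng)
print(Y)
```

Estimating $\rho$ and $\beta$ from observed $Y$ (for example by maximum likelihood) is a separate problem that this simulation does not address.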
Part III: Spatial Point Process
• Spatial point process – spatial point event data
  • $\{s_1, s_2, s_3, \ldots, s_n\}$, the $s_i$'s are event locations, fixed event type
[Figure: 1854 Broad Street cholera outbreak (solid black rectangles show victims)]
Spatial Point Process
• Example
  • Crime event locations
  • Disease event locations
• Questions to answer
  • Do points tend to cluster or de-cluster? (K-function statistics)
  • Is point intensity homogeneous, or is there a hotspot? (Spatial scan statistics)
[Figure: Shooting, Chicago 2010. Source: http://assets.dnainfo.com]
[Figure: point patterns under complete spatial randomness (CSR), clustering, and declustering, from the slide “What’s Special About Spatial Data Mining?” (Figure 9: inputs; Figure 10: classical clustering; Figure 11: spatial clustering); CSR vs. hotspot]
Homogeneous Poisson Point Process
• Complete spatial randomness (CSR)
  • Intensity parameter $\lambda$, $N(A)$ as number of points in an area $A$
  • For any $A$, $N(A)$ follows Poisson($\lambda |A|$), where $|A|$ is the size of $A$
  • For any disjoint $A_1$ and $A_2$, $N(A_1)$ and $N(A_2)$ are independent
  • In any area $A$, conditioned on $N(A) = n$, these $n$ points are independent and uniformly distributed in $A$
What can we say about CSR?
• The intensity of points is the same everywhere
• Once $N(A)$ is realized, point locations are independent
• CSR is often used as a null distribution
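The two defining properties translate directly into a simulation recipe: draw the count from a Poisson distribution, then place that many points uniformly. A minimal sketch for a rectangular study area (the rectangle, the function name, and the parameter values are assumptions for illustration):

```python
import numpy as np

def simulate_csr(lam, width, height, rng):
    """Homogeneous Poisson process on [0, width] x [0, height] with intensity lam."""
    n = rng.poisson(lam * width * height)      # N(A) ~ Poisson(lam * |A|)
    xs = rng.uniform(0.0, width, size=n)       # given n, locations are uniform
    ys = rng.uniform(0.0, height, size=n)
    return np.column_stack([xs, ys])

rng = np.random.default_rng(0)
points = simulate_csr(lam=50.0, width=2.0, height=1.0, rng=rng)
print(len(points))
```

Repeating this simulation many times is one way to build the null distribution of a test statistic under CSR.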
Ripley’s K Function
• Test if points tend to cluster with each other (micro)
• Hypothesis testing
  • H0: homogeneous Poisson point process (independent)
  • H1: points tend to cluster with each other
• Test statistic:
  • $K(d) = \lambda^{-1} E(\text{\# of points within radius } d \text{ of a point})$
  • $\hat{K}(d) = \lambda^{-1} \sum_i \sum_{j \ne i} I(d_{ij} \le d) / n$
  • Under H0, $K(d) = \pi d^2$
[Figure: example of K function plot for CSR, clustering, and declustering patterns]
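The estimator $\hat{K}(d)$ amounts to counting ordered pairs of distinct points within distance $d$. The sketch below estimates $\lambda$ as $n / |A|$ and applies no edge correction, both of which are simplifying assumptions; `ripley_k` is an illustrative name.

```python
import numpy as np

def ripley_k(points, d, area):
    """Naive K-hat(d): ordered pairs (i, j), i != j, with d_ij <= d, over lam * n."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)            # exclude each point paired with itself
    lam_hat = n / area                        # intensity estimated from the data
    return (dist <= d).sum() / (lam_hat * n)

# Four corners of the unit square: each corner has two neighbors within 1.1.
corners = [[0, 0], [1, 0], [0, 1], [1, 1]]
k_hat = ripley_k(corners, d=1.1, area=1.0)
print(k_hat, np.pi * 1.1 ** 2)
```

Comparing `k_hat` with $\pi d^2$ over a range of $d$ gives the K function plot: values above the CSR curve suggest clustering, values below suggest declustering.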
Ripley’s Cross K Function
• Test if two types of events tend to cluster together
  • H0: event types $i$ and $j$ are independent
  • H1: event types $i$ and $j$ tend to cluster together
• Test statistic:
  • $K_{ij}(d) = \lambda_j^{-1} E(\text{\# of points of type } j \text{ within } d \text{ of a point of type } i)$
  • $\hat{K}_{ij}(d) = \lambda_j^{-1} \sum_i \sum_j I(d_{ij} \le d) / n_i = (\lambda_i \lambda_j A)^{-1} \sum_i \sum_j I(d_{ij} \le d)$
  • Under H0, $K_{ij}(d) = \pi d^2$
[Figure 6: Cross K-function for example data, from “What’s Special About Spatial Data Mining?”. Axes: distance h vs. cross-K function; curves: y = pi*h^2, o and *, x and +, * and x, * and +]
Cross K-Function Definition
• $K_{ij}(h) = \lambda_j^{-1} E[\text{number of type } j \text{ events within distance } h \text{ of a randomly chosen type } i \text{ event}]$
• Cross K-function of some pair of spatial feature types
• Example
  – Which pairs are frequently co-located?
  – Statistical significance
[Figure 5: Example data (o and * ; x and +), co-location patterns sample data, X and Y axes from 0 to 80]
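The cross K estimate counts pairs made of one type-$i$ point and one type-$j$ point within distance $d$. This sketch estimates the intensities as $n_i / |A|$ and $n_j / |A|$ and ignores edge effects, which are assumptions for illustration; the toy coordinates are made up.

```python
import numpy as np

def cross_k(points_i, points_j, d, area):
    """Naive cross K-hat(d) = (lam_i * lam_j * |A|)^-1 * #{(i, j): d_ij <= d}."""
    points_i = np.asarray(points_i, dtype=float)
    points_j = np.asarray(points_j, dtype=float)
    dist = np.linalg.norm(points_i[:, None, :] - points_j[None, :, :], axis=2)
    lam_i = len(points_i) / area
    lam_j = len(points_j) / area
    return (dist <= d).sum() / (lam_i * lam_j * area)

# Two type-i events, each with one type-j event 0.1 away, plus one distant type-j event.
type_i = [[0.0, 0.0], [1.0, 1.0]]
type_j = [[0.0, 0.1], [1.0, 0.9], [0.5, 0.5]]
k_ij = cross_k(type_i, type_j, d=0.2, area=1.0)
print(k_ij, np.pi * 0.2 ** 2)
```

Here the estimate (1/3) is well above $\pi d^2 \approx 0.13$, the kind of gap that suggests the two event types are co-located at that distance.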
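Spatial Scan Statistics
• Test if point intensity is homogeneous everywhere
• Hypothesis testing (assume Poisson point process)
  • H0: homogeneous intensity $\lambda_{in} = \lambda_{out}$ for window $W$
  • H1: inhomogeneous intensity $\lambda_{in} > \lambda_{out}$ for window $W$
• Test statistic (likelihood ratio)
  • $LR = \frac{\text{Max. Likelihood}(Data | H_1)}{\text{Max. Likelihood}(Data | H_0)} = \frac{\sup_{W, \lambda_{in} > \lambda_{out}} L(Data; W, \lambda_{in}, \lambda_{out})}{\sup_{\lambda_{in} = \lambda_{out}} L(Data; W, \lambda_{in}, \lambda_{out})}$
  • $LR = \sup_W \left(\frac{n_W}{E_W}\right)^{n_W} \left(\frac{n_{W'}}{E_{W'}}\right)^{n_{W'}} I(\cdot)$
  • $n_W$: observed # in W, $E_W$: expected # in W
  • $n_{W'}$: observed # out of W, $E_{W'}$: expected # out of W
• Significance
  • P-value, Monte Carlo simulation
[Figure: CSR vs. hotspot]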
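Spatial Scan Statistics: An Example
• Input: 13 points
• Compute test statistic:
  • $LR = \sup_W \left(\frac{n_W}{E_W}\right)^{n_W} \left(\frac{n_{W'}}{E_{W'}}\right)^{n_{W'}} I(\cdot)$
  • $n_W = 7$ (observed # in W)
  • $n_{W'} = 6$ (observed # out of W)
  • $E_W = \frac{13}{16} \times 1$ (expected # in W)
  • $E_{W'} = \frac{13}{16} \times 15$ (expected # out of W)
  • $I(\cdot) = 1$ (density in W is higher than out)
  • $LR = \left(\frac{7}{13/16}\right)^{7} \left(\frac{6}{13 \times 15/16}\right)^{6} = 56115.15$ when the expected counts are rounded to $E_W \approx 0.8$ and $E_{W'} \approx 12.18$; with unrounded expected counts the product is about 50158
• Monte Carlo simulation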
An Example
Scanning window $Z$, outside window as $Z'$
$LR = \frac{\sup_{Z \in \mathcal{Z}, p > q} L(Z, p, q)}{\sup_{p = q} L(Z, p, q)} = \sup_{Z \in \mathcal{Z}} \left(\frac{n_Z}{B_Z}\right)^{n_Z} \left(\frac{n_{Z'}}{B_{Z'}}\right)^{n_{Z'}} I(\cdot)$
$n_Z = 7$ and $n_{Z'} = 6$, $|Z| = 1$, $|Z'| = 15$, $B_Z = 13/16 \times 1 = 0.8$, $B_{Z'} = 13/16 \times 15 = 12.2$, so $LR = 56115.15$
[Figure: illustrative example with scanning window Z and outside region Z’]
Zhe Jiang (University of Alabama), Group Seminar Slides, September 22, 2016
Enumerating window W with size 1
Study area size 4 × 4 = 16
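The arithmetic of the example can be reproduced in a few lines. The sketch evaluates the likelihood ratio for one candidate window, with expected counts proportional to area under H0; the function name is illustrative, and the supremum over all windows and the Monte Carlo significance step are not shown. With exact expected counts (13/16 and 13 × 15/16) the ratio comes out near 5.0 × 10^4; the figure 56115.15 quoted on the slide is reproduced when the expected counts are rounded to 0.8 and 12.18.

```python
def scan_likelihood_ratio(n_in, n_out, area_in, area_out):
    """(n_W / E_W)^n_W * (n_W' / E_W')^n_W' * I(density inside > density outside)."""
    n_total = n_in + n_out
    total_area = area_in + area_out
    e_in = n_total * area_in / total_area      # expected # in W under H0
    e_out = n_total * area_out / total_area    # expected # out of W under H0
    indicator = 1.0 if n_in / e_in > n_out / e_out else 0.0
    return (n_in / e_in) ** n_in * (n_out / e_out) ** n_out * indicator

lr_exact = scan_likelihood_ratio(n_in=7, n_out=6, area_in=1, area_out=15)
lr_rounded = (7 / 0.8) ** 7 * (6 / 12.18) ** 6   # rounded expected counts
print(lr_exact, lr_rounded)
```

In a full scan, this ratio would be computed for every candidate window (here, each of the 16 unit cells) and the largest value compared against the same maximum computed on data simulated under CSR.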