Spatial Association: Defining the relationship between two variables


Method Depends On Data Type

• The statistical/spatial analysis method is a function of the measurement level and the spatial data model.

 | Nominal | Ordinal | Interval/Ratio
Nominal | Chi-sq | Chi-sq; median by nominal class | Mean by nominal class; K-S test
Ordinal | | Rank correlation coefficient | Rank correlation coefficient; mean of the ordinal class
Interval/Ratio | | | Covariance; cross correlation; correlation

Chi-Square

• Chi-square can be used to compare:
– Area A to Area B
– Area A to Line
– Area A to Point

• The hypothesis is always the same:
– HO: The distribution of observations across Area A is equal to the Expected Distribution, where the Expected Distribution is usually CSR (complete spatial randomness).
– Put another way: does the distribution of Area A explain the distribution of the observations? Assume that random observations would be distributed proportionally across Area A.
– HA: Not equal to the Expected Distribution, indicating a potential first-order effect.

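Chi-Square

$$X^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$

where O_i is the observed value and E_i is the expected value for class i.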

Chi-Square

• Advantages:
– Easy to compute and interpret
– Non-parametric, distribution neutral
– "Easy" to determine expected values: proportional to area
– Can be applied to nominal (count) data

• Disadvantages:
– Results are influenced by the scale of the observations; use other indices
– Ideal for points, more problematic with areas and lines
– Influenced by zone systems (arbitrary areas)

Cramer V Statistic

• Cramer's coefficient is a measure of association that ranges from 0 to 1. A coefficient of 0 indicates that the calculated chi-square is 0, i.e., the observed frequencies are all equal to the expected frequencies. This means that there is perfect independence between the rows and columns: the column variable provides no information about the row variable. A coefficient of 1 indicates that the calculated chi-square is the highest possible chi-square value, n(L-1), where L is the smaller of the number of rows or columns; this indicates a perfect relationship between the rows and columns: the column variable provides perfect information about the row variable.

• A V greater than 0.7 is strong, between 0.4 and 0.7 is moderate, and between 0.2 and 0.4 is weak.

• V equals the square root of chi-square divided by sample size, n, times m, which is the smaller of (rows - 1) or (columns - 1):

– V = SQRT(X2/nm).

Chi-Square

• Calculate the chi-square statistic and Cramer's coefficient for the following data. Test for significance at the 0.05 level. The table value for a chi-square statistic with 4 degrees of freedom at the 0.05 level is 9.488. The calculated value of 3.28 lies between the right-tail probabilities of 0.7 and 0.5, so it is not significant at the 0.05 level.

• V = [3.28 / (16 × 1)]^(1/2)

Veg Type | Veg Area (sq km) | Fraction of Area | Observed Fire Area (sq km) | Expected Fire Area (sq km) | Chi-sq
A | 1000 | 0.25 | 0.2 | 2 | 1.62
B | 1000 | 0.25 | 3.5 | 2 | 1.125
C | 800 | 0.2 | 2.3 | 1.6 | 0.3063
D | 1000 | 0.25 | 1.9 | 2 | 0.005
E | 200 | 0.05 | 0.1 | 0.4 | 0.225
Total | 4000 | 1 | 8 | 8 | 3.2813

V = 0.453
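The table can be reproduced with a short Python sketch. Expected fire area is taken as proportional to vegetation area, as the table does. The values n = 16 and m = 1 are copied from the slide's V formula rather than derived here, and the variable names are illustrative only.

```python
import math

# Vegetation classes from the table: (vegetation area in sq km, observed fire area in sq km)
veg = {
    "A": (1000, 0.2),
    "B": (1000, 3.5),
    "C": (800, 2.3),
    "D": (1000, 1.9),
    "E": (200, 0.1),
}

total_area = sum(area for area, _ in veg.values())
total_fire = sum(obs for _, obs in veg.values())

chi_sq = 0.0
for name, (area, observed) in veg.items():
    # Expected fire area is proportional to the class's share of the total area
    expected = total_fire * area / total_area
    term = (observed - expected) ** 2 / expected
    chi_sq += term
    print(f"{name}: expected = {expected:.2f}, chi-sq term = {term:.4f}")

# n and m as used in the slide's formula V = [3.28 / (16 * 1)]^(1/2)
n, m = 16, 1
v = math.sqrt(chi_sq / (n * m))

critical = 9.488  # table value quoted in the slide: 4 degrees of freedom, 0.05 level
print(f"chi-square = {chi_sq:.4f}, V = {v:.3f}")
print("significant at 0.05" if chi_sq > critical else "not significant at 0.05")
```

The per-class terms should match the Chi-sq column, and the comparison against 9.488 is the significance test the slide asks for.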

Chi-Square

• Chi-square is sensitive to scale: the bigger the numbers, the larger the chi-square.

• V will normalize for scale. Here, although the chi-square is very high, the association may still be only moderately strong, with a V of 0.453 (note vegetation types A and B).

Veg Type | Veg Area (ha) | Fraction of Area | Observed Fire Area (ha) | Expected Fire Area (ha) | Chi-sq
A | 100000 | 0.25 | 20 | 200 | 162
B | 100000 | 0.25 | 350 | 200 | 112.5
C | 80000 | 0.2 | 230 | 160 | 30.625
D | 100000 | 0.25 | 190 | 200 | 0.5
E | 20000 | 0.05 | 10 | 40 | 22.5
Total | 400000 | 1 | 800 | 800 | 328.13

V = 0.453
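The scale effect can be checked numerically with the sketch below, which runs the same fire data in square kilometres and in hectares. One assumption is labelled in the code: n is taken as the observed total plus the expected total, which reproduces the n = 16 used in the square-kilometre example. The function names are hypothetical helpers, not part of any library.

```python
import math

def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def cramers_v(chi_sq, n, m=1):
    """V = sqrt(X2 / (n * m))."""
    return math.sqrt(chi_sq / (n * m))

fractions = [0.25, 0.25, 0.2, 0.25, 0.05]   # fraction of area per vegetation type
obs_km2 = [0.2, 3.5, 2.3, 1.9, 0.1]         # observed fire area, sq km
obs_ha = [o * 100 for o in obs_km2]         # same data in hectares (1 sq km = 100 ha)

results = {}
for label, obs in (("sq km", obs_km2), ("ha", obs_ha)):
    total = sum(obs)
    exp = [total * f for f in fractions]
    x2 = chi_square(obs, exp)
    # Assumption: n = observed total + expected total (gives 16 for the sq km data)
    n = total + sum(exp)
    results[label] = (x2, cramers_v(x2, n))
    print(f"{label}: chi-square = {x2:.3f}, V = {results[label][1]:.3f}")
```

Chi-square grows by the same factor of 100 as the units, while V stays the same, which is the point of normalizing.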

Kolmogorov-Smirnov Test

• Compare the Observed CDF to the Expected CDF
– HO: Observed EQ Expected
– HA: Observed NE Expected, an indication of a first-order effect

• Expected can be any distribution, usually CSR.

• Advantages:
– Ideal for comparing points to fields, more problematic with areas and lines
– Nonparametric, distribution neutral
– Easy to compute and interpret

• Disadvantages:
– How to compute the Expected CDF?
– Use a set of random points with the same sample size as the observed. If the sample is "small", your random points may not appear random.
– Create a CDF for the population using all measurements or a large sample. In this case you are sampling the environment, and you are asking whether the sample points are randomly located across the environment.

K-S Test

• Archeology sites vs. distance from wadis
• Random n = 250
• Sites n = 84
• P=0.01; Dmax = 0.21
• P=0.05; Dmax = 0.17
• P=0.10; Dmax = 0.15

• A Dmax = 0.12 is below all three critical values, indicating the distance distributions may be the same

[Figure: CDF versus Distance to Wadi for Random points and Sites. Dmax = 0.98 - 0.86 = 0.12]
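A minimal Python sketch of the K-S comparison follows. The slide does not say how its critical Dmax values were obtained. The sketch assumes the common large-sample two-sample approximation, c(alpha) * sqrt((n1 + n2) / (n1 * n2)) with c(alpha) = sqrt(-0.5 * ln(alpha / 2)), which happens to reproduce the listed values for n = 250 and n = 84. The function names are illustrative.

```python
import math
from bisect import bisect_right

def ecdf(sample):
    """Return the empirical CDF of a sample as a function of x."""
    data = sorted(sample)
    n = len(data)
    return lambda x: bisect_right(data, x) / n

def d_max(sample_a, sample_b):
    """Largest vertical gap between the two empirical CDFs."""
    fa, fb = ecdf(sample_a), ecdf(sample_b)
    return max(abs(fa(x) - fb(x)) for x in set(sample_a) | set(sample_b))

def critical_d(alpha, n1, n2):
    """Assumed large-sample two-sample critical value (not stated in the slide)."""
    c = math.sqrt(-0.5 * math.log(alpha / 2))
    return c * math.sqrt((n1 + n2) / (n1 * n2))

# Critical values for the random (n = 250) and site (n = 84) samples
for alpha in (0.01, 0.05, 0.10):
    print(f"P = {alpha:.2f}: critical Dmax = {critical_d(alpha, 250, 84):.2f}")

# Observed Dmax read from the figure
observed_d = 0.98 - 0.86
print("reject HO at 0.10" if observed_d > critical_d(0.10, 250, 84) else "cannot reject HO at 0.10")
```

With real data, `d_max(site_distances, random_distances)` would replace the value read off the figure; both arguments are plain lists of distances.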

Analysis of Environmental Justice

Point in Polygon Analysis

Erie Chi-Squared

Expected
Class | Poverty AREA | Poverty SITES | Minority AREA | Minority SITES
TOTAL | 2784133584 | 162 | 2784133584 | 162
Low | 2743262089 | 159.6218 | 2668104627 | 155.248639
Medium | 37710244.45 | 2.194241 | 20151280.46 | 1.172539799
High | 3161250.755 | 0.183943 | 95877675.83 | 5.578821209

Observed
Class | Poverty | Minority
TOTAL | 162 | 162
Low | 131 | 132
Medium | 24 | 7
High | 7 | 23

CHI-Squared Statistic
Class | Poverty | Minority
Low | 5.132182846 | 3.481506946
Medium | 216.6996098 | 28.96216607
High | 252.570429 | 54.40172021
SUM | 474.4022216 | 86.84539322

CHIINV = 5.991476357

V = 0.86 (Poverty)    V = 0.37 (Minority)
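The Erie tables can be rebuilt from the class areas and observed site counts alone, as sketched below. The expected number of sites in each class is the total site count times that class's share of the area. The critical value is computed from the closed form -2 ln(0.05), which holds only for 2 degrees of freedom and is used here in place of a spreadsheet CHIINV call. Dictionary and variable names are illustrative.

```python
import math

# Class areas and observed site counts from the Erie tables
data = {
    "Poverty": {
        "area": {"Low": 2743262089, "Medium": 37710244.45, "High": 3161250.755},
        "observed": {"Low": 131, "Medium": 24, "High": 7},
    },
    "Minority": {
        "area": {"Low": 2668104627, "Medium": 20151280.46, "High": 95877675.83},
        "observed": {"Low": 132, "Medium": 7, "High": 23},
    },
}

# For 2 degrees of freedom the chi-square right tail is exp(-x / 2),
# so the 0.05 critical value is -2 * ln(0.05)
critical = -2 * math.log(0.05)

chi_sq = {}
for variable, tables in data.items():
    total_area = sum(tables["area"].values())
    n_sites = sum(tables["observed"].values())
    total = 0.0
    for cls, area in tables["area"].items():
        expected = n_sites * area / total_area   # sites proportional to area
        total += (tables["observed"][cls] - expected) ** 2 / expected
    chi_sq[variable] = total
    verdict = "reject HO" if total > critical else "cannot reject HO"
    print(f"{variable}: chi-square = {total:.2f} (critical {critical:.3f}): {verdict}")
```

Both statistics are far above the critical value, so in both cases the sites are not distributed in proportion to area.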

Interpreting Chi Square

• Zero indicates no relationship
• Large numbers indicate a stronger relationship
• Or, a table of significance can be consulted to determine if the specific value is statistically significant

• The fact that we have shown that there is a correlation between variables does NOT mean that we have found out anything about WHY this is so. In our analysis we might state our assumptions as to why this is so, but we would need to perform other analyses to show causation.

Spatial Correspondence of Areal Distributions

• Quadrat and nearest-neighbor analysis deal with a single distribution of points

• Often, we want to compare the distributions of two or more variables

• The coefficient of areal correspondence and chi-square statistics perform these tasks

Coefficient of Areal Correspondence

• Simple measure of the extent to which two distributions correspond to one another
– Compare wheat farming to areas of minimal rainfall
• Based on the approach of overlay analysis

Overlay Analysis

• Two distributions of interest are mapped at the same scale and the outline of one is overlaid with the other

Coefficient of Areal Correspondence

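• CAC is the ratio between the area of the region where the two distributions overlap and the total area of the region covered by the individual distributions

$$\mathrm{CAC} = \frac{\text{Area C}}{\text{Area A} + \text{Area B} + \text{Area C}}$$

where Area C is the overlap, and Area A and Area B are the parts covered by only one of the two distributions.

Examples:

$$\frac{0}{1 + 1 + 0} = 0 \qquad \frac{0.4}{0.6 + 0.6 + 0.4} = 0.25 \qquad \frac{1}{1} = 1$$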
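The three worked values on this slide can be checked with a small Python sketch. It reads the formula as overlap area divided by the sum of Area A, Area B and Area C, with A and B taken as the parts covered by only one distribution. That reading is an assumption chosen because it reproduces the 0, 0.25 and 1 results; the function name `cac` is illustrative.

```python
def cac(area_a, area_b, area_c):
    """Coefficient of areal correspondence.

    area_a, area_b: area covered by only the first / only the second distribution
    area_c: area where the two distributions overlap
    """
    total = area_a + area_b + area_c
    if total == 0:
        raise ValueError("the two distributions cover no area")
    return area_c / total

print(cac(1, 1, 0))        # no overlap
print(cac(0.6, 0.6, 0.4))  # partial overlap
print(cac(0, 0, 1))        # the two distributions coincide
```

In a GIS workflow the three areas would come from an overlay (intersection and difference) of the two mapped distributions.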

Result of CAC

• Where there is no correspondence, CAC is equal to 0
• Where there is total correspondence, CAC is equal to 1
• CAC provides a simple measure of the extent of spatial association between two distributions, but it cannot provide any information about the statistical significance of the relationship

Resemblance Matrix

• Proposed by Court (1970)

• Advantages over CAC
– Limits are -1 to +1, with a perfect negative correspondence given a value of -1
– Sampling distribution is roughly normal, so you can test for statistical significance

$$\frac{\text{Sum of Like Areas} - \text{Sum of Unlike Areas}}{\text{Total Area}}$$
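A minimal sketch of the resemblance calculation, reading the formula as (sum of like areas minus sum of unlike areas) divided by total area. The minus sign is inferred from the stated limits of -1 to +1. The slide does not define how like and unlike areas are measured, so the function takes the three quantities as given; its name and parameters are hypothetical.

```python
def resemblance(sum_like, sum_unlike, total_area):
    """(sum of like areas - sum of unlike areas) / total area."""
    if total_area <= 0:
        raise ValueError("total area must be positive")
    return (sum_like - sum_unlike) / total_area

print(resemblance(10, 0, 10))   # all areas alike
print(resemblance(0, 10, 10))   # all areas unlike
print(resemblance(5, 5, 10))    # like and unlike areas balance
```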
