Spatial Association Defining the relationship between two variables.
martin-foster
Method Depends on Data Type
• The statistical/spatial analysis method is a function of the measurement level and the spatial data model.
– Nominal × Nominal: Chi-sq
– Nominal × Ordinal: Chi-sq; median by nominal class
– Nominal × Interval/Ratio: Mean by nominal class; K-S test
– Ordinal × Ordinal: Rank correlation coefficient
– Ordinal × Interval/Ratio: Rank correlation coefficient; mean of the ordinal class
– Interval/Ratio × Interval/Ratio: Co-variance; cross correlation; correlation
Chi-Square
• Chi-square can be used to compare:
– Area A to Area B
– Area A to Line
– Area A to Point
• The hypothesis is always the same:
– H0: The distribution of observations across Area A is equal to the expected distribution, where the expected distribution is usually CSR (complete spatial randomness).
• In other words, does the distribution of Area A explain the distribution of the observations? Under H0, random observations are assumed to be distributed across Area A in proportion to area.
– HA: The distribution is not equal to the expected distribution, indicating a potential first-order effect.
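Chi-Square
$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$
where O_i and E_i are the observed and expected values for class i, and the sum runs over the n classes.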
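The formula can be sketched in a few lines of Python. This is a minimal illustration, assuming the CSR expectation described above (each class receives a share of the observations proportional to its share of the total area); the function names and the three-zone data are hypothetical.

```python
def expected_from_areas(areas, total_observed):
    """Expected counts under CSR: each class gets a share of the
    observations proportional to its share of the total area."""
    total_area = sum(areas)
    return [a * total_observed / total_area for a in areas]


def chi_square(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical example: 30 points falling in three zones of unequal area.
areas = [50.0, 30.0, 20.0]
observed = [10, 12, 8]
expected = expected_from_areas(areas, sum(observed))
print(round(chi_square(observed, expected), 4))  # prints 3.3333
```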
Chi-Square
• Advantages:
– Easy to compute and interpret
– Non-parametric, distribution neutral
– “Easy” to determine expected values: proportional to area
– Can be applied to nominal (count) data
• Disadvantages:
– Results are influenced by the scale of the observations; use other indices (e.g., Cramér's V)
– Ideal for points, more problematic with areas and lines
– Influenced by zone systems (arbitrary areas)
Cramér's V Statistic
• Cramér's coefficient is a measure of association that ranges from 0 to 1. A value of 0 indicates that the calculated chi-square is 0, i.e., the observed frequencies are all equal to the expected frequencies. This means there is perfect independence between the rows and columns: the column variable provides no information about the row variable. A value of 1 indicates that the calculated chi-square is the highest possible value, n(L - 1), where L is the smaller of the number of rows and columns; this indicates a perfect relationship between the rows and columns: the column variable provides perfect information about the row variable.
• A V greater than 0.7 is strong, between 0.4 and 0.7 is moderate, and between 0.2 and 0.4 is weak.
• V equals the square root of chi-square divided by the sample size, n, times m, where m is the smaller of (rows - 1) or (columns - 1):
$$V = \sqrt{\frac{\chi^2}{n\,m}}$$
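A minimal sketch of the V formula together with the verbal scale from this slide. The parameter names `n_rows` and `n_cols` are illustrative, and how values falling exactly on 0.2, 0.4, or 0.7 are labelled is an assumption, since the slide gives only ranges.

```python
import math


def cramers_v(chi2, n, n_rows, n_cols):
    """Cramer's V = sqrt(chi2 / (n * m)), with m = min(rows - 1, cols - 1)."""
    m = min(n_rows - 1, n_cols - 1)
    if n <= 0 or m <= 0:
        raise ValueError("need n > 0 and at least a 2 x 2 table")
    return math.sqrt(chi2 / (n * m))


def strength(v):
    """Label V using the slide's thresholds (boundary handling is assumed)."""
    if v > 0.7:
        return "strong"
    if v >= 0.4:
        return "moderate"
    if v >= 0.2:
        return "weak"
    return "below 0.2 (not classified on the slide)"


# The vegetation/fire example: chi-square 3.28, n = 16, a 5 x 2 table (m = 1).
v = cramers_v(3.28, 16, n_rows=5, n_cols=2)
print(round(v, 3), strength(v))
```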
Chi-Square
• Calculate the chi-square statistic and Cramér's coefficient for the following data. Test for significance at the 0.05 level. The table value for a chi-square statistic with 4 degrees of freedom (5 vegetation types - 1) at the 0.05 level is 9.488. The calculated value, 3.28, falls between right-tail probabilities of 0.7 and 0.5, well below 9.488, so H0 cannot be rejected at the 0.05 level.
$$V = \left[\frac{3.28}{16 \times 1}\right]^{1/2}$$
Veg Type | Veg Area (sq km) | Fraction of Area | Observed Fire Area (sq km) | Expected Fire Area (sq km) | Chi-sq
A | 1000 | 0.25 | 0.2 | 2 | 1.62
B | 1000 | 0.25 | 3.5 | 2 | 1.125
C | 800 | 0.2 | 2.3 | 1.6 | 0.3063
D | 1000 | 0.25 | 1.9 | 2 | 0.005
E | 200 | 0.05 | 0.1 | 0.4 | 0.225
Total | 4000 | 1 | 8 | 8 | 3.2813
V = 0.453
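The worked example can be reproduced as a check. This sketch copies the table values, computes area-proportional expectations, and uses the closed-form right-tail probability of a chi-square variable with an even number of degrees of freedom to confirm that 3.28 with 4 df lies between 0.5 and 0.7. The helper name `chi2_sf_even_df` is hypothetical, and n = 16, m = 1 are taken from the slide as given.

```python
import math

# Vegetation and fire areas from the sq km table above.
veg_area = {"A": 1000, "B": 1000, "C": 800, "D": 1000, "E": 200}
fire_observed = {"A": 0.2, "B": 3.5, "C": 2.3, "D": 1.9, "E": 0.1}

total_area = sum(veg_area.values())
total_fire = sum(fire_observed.values())

chi2 = 0.0
for veg, area in veg_area.items():
    expected = area / total_area * total_fire  # CSR: fire share = area share
    contribution = (fire_observed[veg] - expected) ** 2 / expected
    chi2 += contribution
    print(f"{veg}: expected = {expected:.2f}, chi-sq = {contribution:.4f}")


def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable with even df:
    exp(-x/2) * sum over k < df/2 of (x/2)**k / k!."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


df = len(veg_area) - 1           # 5 vegetation types -> 4 degrees of freedom
p_value = chi2_sf_even_df(chi2, df)
v = math.sqrt(chi2 / (16 * 1))   # n = 16 and m = 1, as used on the slide
print(f"chi2 = {chi2:.4f}, df = {df}, p = {p_value:.3f}, "
      f"significant at 0.05: {chi2 > 9.488}, V = {v:.3f}")
```

The right-tail probability comes out near 0.51, consistent with the slide's "between 0.7 and 0.5".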
Chi-Square
• Chi-square is sensitive to scale: the bigger the numbers, the larger the chi-square.
• V normalizes for scale. Below, the same data are expressed in hectares instead of square kilometres. The chi-square is 100 times larger (328.13, far above the 9.488 table value), yet V is unchanged at 0.453, so the results may still be only moderately strong (note vegetation types A and B).
Veg Type | Veg Area (ha) | Fraction of Area | Observed Fire Area (ha) | Expected Fire Area (ha) | Chi-sq
A | 100000 | 0.25 | 20 | 200 | 162
B | 100000 | 0.25 | 350 | 200 | 112.5
C | 80000 | 0.2 | 230 | 160 | 30.625
D | 100000 | 0.25 | 190 | 200 | 0.5
E | 20000 | 0.05 | 10 | 40 | 22.5
Total | 400000 | 1 | 800 | 800 | 328.13
V = 0.453
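The scale effect can be shown directly: multiplying every observation by 100 (sq km to ha) multiplies each (O - E)^2 / E term by 100, while V stays fixed as long as n is scaled by the same factor. In this sketch n = 16 comes from the earlier slide; n = 1600 for the hectare table is inferred because it reproduces the reported V = 0.453. The function name is hypothetical.

```python
import math

fractions = [0.25, 0.25, 0.20, 0.25, 0.05]   # share of total vegetation area
fire_sq_km = [0.2, 3.5, 2.3, 1.9, 0.1]       # observed fire area, sq km
fire_ha = [x * 100 for x in fire_sq_km]      # the same data in hectares


def chi2_and_v(observed, fractions, n, m=1):
    """Chi-square against area-proportional expectations, plus Cramer's V."""
    total = sum(observed)
    expected = [f * total for f in fractions]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, math.sqrt(chi2 / (n * m))


for label, data, n in [("sq km", fire_sq_km, 16), ("ha", fire_ha, 1600)]:
    chi2, v = chi2_and_v(data, fractions, n)
    print(f"{label}: chi2 = {chi2:.2f}, V = {v:.3f}")
```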
Kolmogorov-Smirnov Test
• Compare the observed CDF to the expected CDF
– H0: Observed equals expected
– HA: Observed does not equal expected, indicating a first-order effect
• The expected distribution can be any distribution, usually CSR.
• Advantages:
– Ideal for comparing points to fields, more problematic with areas and lines
– Nonparametric, distribution neutral
– Easy to compute and interpret
• Disadvantages:
– How should the expected CDF be computed?
– One option is to generate random points with the same sample size as the observed. If the sample is “small”, your random points may not appear random.
– Another option is to create a CDF for the population using all measurements or a large sample. In this case you are sampling the environment and asking whether the sample points are randomly located across the environment.
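To make the CDF comparison concrete, here is a minimal two-sample sketch in plain Python: Dmax is the largest absolute gap between the empirical CDFs of two samples, such as distances measured from sites and from random points. The function names and sample values are hypothetical; `scipy.stats.ks_2samp` computes the same two-sided statistic along with a p-value.

```python
from bisect import bisect_right


def ecdf(sorted_values, x):
    """Empirical CDF: fraction of the (sorted) sample that is <= x."""
    return bisect_right(sorted_values, x) / len(sorted_values)


def ks_dmax(sample_a, sample_b):
    """Two-sample K-S statistic: the largest absolute difference between
    the two empirical CDFs. Both are step functions that change only at
    observed values, so checking every observed value finds the maximum."""
    a, b = sorted(sample_a), sorted(sample_b)
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in a + b)


# Hypothetical distances to the nearest wadi.
sites = [50, 120, 130, 400]
random_points = [100, 300, 600, 900, 1100]
print(ks_dmax(sites, random_points))  # prints 0.6
```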
K-S Test
• Archeology sites vs. distance from wadis
• Random n = 250
• Sites n = 84
• P = 0.01: Dmax = 0.21
• P = 0.05: Dmax = 0.17
• P = 0.10: Dmax = 0.15
• The observed Dmax = 0.12 is below all three critical values, so H0 cannot be rejected: the distance distributions may be the same.
[Figure: CDF of distance to wadi for Random points and Sites; Dmax = 0.98 - 0.86 = 0.12]
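The critical Dmax values on this slide are consistent with the standard large-sample approximation for the two-sample K-S test, D_crit = c(α) · sqrt((n + m) / (n · m)) with c(α) = sqrt(-ln(α/2) / 2). The slide does not say how its values were obtained, so this sketch assumes that approximation; with n = 250 and m = 84 it reproduces 0.21, 0.17, and 0.15.

```python
import math


def ks_critical_d(alpha, n, m):
    """Approximate two-sample K-S critical value (large samples):
    c(alpha) * sqrt((n + m) / (n * m)), c(alpha) = sqrt(-ln(alpha / 2) / 2)."""
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)
    return c_alpha * math.sqrt((n + m) / (n * m))


n_random, n_sites = 250, 84
observed_dmax = 0.12
for alpha in (0.01, 0.05, 0.10):
    d_crit = ks_critical_d(alpha, n_random, n_sites)
    print(f"P = {alpha:.2f}: Dmax = {d_crit:.2f}, reject H0: {observed_dmax > d_crit}")
```

Because 0.12 is smaller than every critical value, H0 is not rejected at any of the three levels.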
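Analysis of Environmental Justice
Point in Polygon Analysis
Erie: chi-squared test of 162 sites by poverty class and by minority class (expected sites proportional to class area)

Poverty
Class | Area | Expected Sites | Observed Sites | Chi-sq
Low | 2743262089 | 159.6218 | 131 | 5.132182846
Medium | 37710244.45 | 2.194241 | 24 | 216.6996098
High | 3161250.755 | 0.183943 | 7 | 252.570429
Total | 2784133584 | 162 | 162 | 474.4022216

Minority
Class | Area | Expected Sites | Observed Sites | Chi-sq
Low | 2668104627 | 155.248639 | 132 | 3.481506946
Medium | 20151280.46 | 1.172539799 | 7 | 28.96216607
High | 95877675.83 | 5.578821209 | 23 | 54.40172021
Total | 2784133584 | 162 | 162 | 86.84539322

Critical value (CHIINV, 0.05 level, 2 df) = 5.991476357
Chi-squared statistic: poverty = 474.40, minority = 86.85; both exceed the critical value, so H0 is rejected in both cases.
Cramér's V: poverty = 0.86, minority = 0.37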
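A sketch that recomputes the two chi-square statistics from the class areas and observed site counts in the tables, assuming (as the expected columns imply) that expected sites are proportional to class area. The function name is hypothetical, and the sketch does not recompute V.

```python
def chi_square_from_areas(areas, observed):
    """Chi-square with expected counts proportional to class area (CSR)."""
    total_area = sum(areas)
    total_obs = sum(observed)
    expected = [a * total_obs / total_area for a in areas]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Low, Medium, High classes, copied from the tables above.
poverty_area = [2743262089, 37710244.45, 3161250.755]
poverty_sites = [131, 24, 7]
minority_area = [2668104627, 20151280.46, 95877675.83]
minority_sites = [132, 7, 23]

CRITICAL_2DF_005 = 5.991476357  # the slide's CHIINV value: 3 classes -> 2 df

for name, areas, sites in [("Poverty", poverty_area, poverty_sites),
                           ("Minority", minority_area, minority_sites)]:
    chi2 = chi_square_from_areas(areas, sites)
    print(f"{name}: chi2 = {chi2:.2f}, reject H0 at 0.05: {chi2 > CRITICAL_2DF_005}")
```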
Interpreting Chi-Square
• Zero indicates no relationship.
• Larger values indicate a stronger departure from the expected distribution, although chi-square also grows with the scale of the data.
• Alternatively, a table of significance can be consulted to determine whether the specific value is statistically significant.
• Showing that there is an association between variables does NOT tell us anything about WHY it exists. In our analysis we might state our assumptions about why, but we would need to perform other analyses to show causation.
Spatial Correspondence of Areal Distributions
• Quadrat and nearest-neighbor analysis deal with a single distribution of points.
• Often, we want to compare the distributions of two or more variables.
• The coefficient of areal correspondence and chi-square statistics perform these tasks.
Coefficient of Areal Correspondence
• Simple measure of the extent to which two distributions correspond to one another
– e.g., compare wheat farming to areas of minimal rainfall
• Based on the approach of overlay analysis
Overlay Analysis
• The two distributions of interest are mapped at the same scale, and the outline of one is overlaid on the other.
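Coefficient of Areal Correspondence
• CAC is the ratio between the area where the two distributions overlap and the total area covered by either distribution:
$$CAC = \frac{\text{Area}_C}{\text{Area}_A + \text{Area}_B + \text{Area}_C}$$
where Area C is the overlap, and Area A and Area B are the areas covered by only one of the two distributions.
• Examples:
$$\frac{0}{1 + 1 + 0} = 0 \qquad \frac{0.4}{0.6 + 0.6 + 0.4} = 0.25 \qquad \frac{1}{0 + 0 + 1} = 1$$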
Result of CAC
• Where there is no correspondence, CAC is equal to 0.
• Where there is total correspondence, CAC is equal to 1.
• CAC provides a simple measure of the extent of spatial association between two distributions, but it cannot provide any information about the statistical significance of the relationship.
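The slides describe CAC with overlaid map outlines; the sketch below swaps in a raster approximation, assuming two presence/absence grids of the same shape with equal-area cells. Overlap cells play the role of Area C and cells covered by either distribution form the denominator. The function and variable names are hypothetical.

```python
def cac(grid_a, grid_b):
    """Coefficient of Areal Correspondence for two presence/absence rasters
    with equal-area cells: overlap cells / cells covered by either one."""
    if len(grid_a) != len(grid_b) or any(len(ra) != len(rb) for ra, rb in zip(grid_a, grid_b)):
        raise ValueError("grids must have the same shape")
    overlap = union = 0
    for row_a, row_b in zip(grid_a, grid_b):
        for a, b in zip(row_a, row_b):
            if a and b:
                overlap += 1   # Area C: both distributions present
            if a or b:
                union += 1     # Area A + Area B + Area C
    if union == 0:
        raise ValueError("neither distribution is present; CAC is undefined")
    return overlap / union


# Hypothetical one-row rasters: 3 cells only in A, 3 only in B, 2 in both,
# the same 3:3:2 proportions as the 0.4 / (0.6 + 0.6 + 0.4) example.
wheat = [[1, 1, 1, 1, 1, 0, 0, 0]]
low_rainfall = [[0, 0, 0, 1, 1, 1, 1, 1]]
print(cac(wheat, low_rainfall))  # prints 0.25
```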
Resemblance Matrix
• Proposed by Court (1970)
• Advantages over CAC:
– Limits are -1 to +1, with a perfect negative correspondence given a value of -1
– Sampling distribution is roughly normal, so you can test for statistical significance
$$\text{Resemblance} = \frac{\sum \text{Like Areas} - \sum \text{Unlike Areas}}{\text{Total Area}}$$
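A raster sketch of the resemblance formula, under two stated assumptions: the study region is a pair of same-shape presence/absence grids with equal-area cells, and "like" cells are those where both distributions are present or both are absent, while "unlike" cells have exactly one present. Under that reading, identical maps give +1 and complementary maps give -1, matching the limits above. The function name is hypothetical.

```python
def resemblance(grid_a, grid_b):
    """(sum of like areas - sum of unlike areas) / total area on rasters.
    ASSUMPTION: 'like' = both present or both absent; 'unlike' = one present."""
    if len(grid_a) != len(grid_b) or any(len(ra) != len(rb) for ra, rb in zip(grid_a, grid_b)):
        raise ValueError("grids must have the same shape")
    like = unlike = 0
    for row_a, row_b in zip(grid_a, grid_b):
        for a, b in zip(row_a, row_b):
            if bool(a) == bool(b):
                like += 1
            else:
                unlike += 1
    total = like + unlike
    if total == 0:
        raise ValueError("empty grids")
    return (like - unlike) / total


a = [[1, 1, 1, 1, 1, 0, 0, 0, 0, 0]]
b = [[0, 0, 0, 1, 1, 1, 1, 1, 0, 0]]
print(resemblance(a, b))  # 4 like cells, 6 unlike cells: (4 - 6) / 10 = -0.2
```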