Sa Presentation 20070917111 Thomas

Vector based spatial analysis Nikolaos Spyropoulos and Thomas K. Andersen Institute of Geography

Transcript of Sa Presentation 20070917111 Thomas

Page 1: Sa Presentation 20070917111 Thomas

Vector based spatial analysis

Nikolaos Spyropoulos and Thomas K. Andersen, Institute of Geography

Page 2: Sa Presentation 20070917111 Thomas

The ESRI Guide to GIS Analysis, Mitchell 2005

• Chapter 4, Identifying Clusters

• Chapter 5, Analyzing Geographic Relationships

Page 3: Sa Presentation 20070917111 Thomas

Chapter 4, Identifying Clusters

Page 4: Sa Presentation 20070917111 Thomas

Identifying Clusters

Why identify clusters?

• Get an understanding of the location pattern in an area

• Compare these patterns with other features to identify possible contributing factors

• Take action based on the identified clusters

Figures: Clusters of burglaries; Income and emergency calls

Page 5: Sa Presentation 20070917111 Thomas

Using statistics to identify clusters

Conclusions can be drawn by looking at a map (e.g. where a cluster is); statistics make it possible to test and validate those conclusions.

With statistics, each event is counted as a unique occurrence, which is hard to see on a map.

Page 6: Sa Presentation 20070917111 Thomas

Time period of data

The time period of the data can vary a lot, from current conditions to long time periods.

- For vacant parcels you need a snapshot of the current condition; for crimes or earthquakes, a time period has to be defined.

Vacant houses: now

Crimes: 6 months

Earthquakes: 100 years

Therefore: the time period differs between phenomena and has to be defined.

Page 7: Sa Presentation 20070917111 Thomas

Distance within clusters

Clusters are usually defined using Euclidean distance, though travel time or cost can also be used.

- Clusters of burglaries can depend on the driving time between the crimes. Because Euclidean distance doesn’t take barriers (such as a river) into account, two locations can appear very close even though the travel time between them is long.
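A minimal sketch of the river example, using made-up coordinates: two burglaries on opposite banks are close in Euclidean terms, but the route over the only bridge is much longer. The coordinates, the bridge position and the helper names are assumptions for illustration only.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two (x, y) points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def path_length(points):
    """Length of a route given as an ordered list of (x, y) points."""
    return sum(euclidean(a, b) for a, b in zip(points, points[1:]))

# Hypothetical coordinates in km: a river runs along y = 0
# and the only bridge crosses it at x = 4.
burglary_a = (0.0, 0.3)
burglary_b = (0.0, -0.3)
bridge = (4.0, 0.0)

straight = euclidean(burglary_a, burglary_b)
via_bridge = path_length([burglary_a, bridge, burglary_b])
print(f"Euclidean: {straight:.2f} km, via bridge: {via_bridge:.2f} km")
```

Whether the two burglaries belong to the same cluster then depends on which of the two distances is compared with the cluster distance.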

Page 8: Sa Presentation 20070917111 Thomas

Identifying clusters - methods

Two methods for identifying clusters:

1. Finding clusters of features

when features are found in close proximity

2. Finding clusters of similar value

when groups of high and low values are found together (“hot and cold spots”)

Page 9: Sa Presentation 20070917111 Thomas

Finding clusters of features

Page 10: Sa Presentation 20070917111 Thomas

Nearest neighbour hierarchical clustering (1)

“One method for finding clusters is to specify the distance features can be from each other, in order to be part of a cluster, and the minimum number of features that make up a cluster.”

(Mitchell 2005:152)

Clusters with a specified number of features within a specified distance

Page 11: Sa Presentation 20070917111 Thomas

Nearest neighbour hierarchical clustering (2)

The method is hierarchical because the routine goes on to group the clusters into larger clusters, showing several geographic scales (e.g. neighbourhood and citywide for crimes).

Figures: Clusters at small scale (neighbourhood); Clusters at bigger scale (citywide)

Page 12: Sa Presentation 20070917111 Thomas

Nearest neighbour hierarchical clustering (3)

How nearest neighbour hierarchical clustering works:

A probability level is specified and used to calculate the distance within which features are considered part of a cluster.

The probability level defines a confidence interval around the mean random distance. If the distance between features is greater than the high end of the interval, the features are further apart than you would expect by chance. For clustering the opposite end is of interest: the low end of the interval.

The confidence interval is calculated from the mean distance that would occur between points in a random distribution (the “mean random distance”).

-- See pages 155 and 156 for the calculation
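A simplified sketch of the first clustering level, not the book's exact procedure. It assumes the commonly used nearest-neighbour formulas: mean random distance 0.5 * sqrt(A / N) and standard error 0.26136 / sqrt(N^2 / A). The one-tailed value t = 1.645, the linking rule (chains of points closer than the threshold) and the sample points are all assumptions for illustration; check the formulas against the pages cited above.

```python
import math

def mean_random_distance(area, n):
    """Expected nearest-neighbour distance for n random points in area."""
    return 0.5 * math.sqrt(area / n)

def threshold_distance(area, n, t=1.645):
    """Low end of the confidence interval around the mean random distance."""
    standard_error = 0.26136 / math.sqrt(n * n / area)
    return mean_random_distance(area, n) - t * standard_error

def first_order_clusters(points, max_dist, min_size):
    """Group points linked by chains of neighbours within max_dist.

    Groups with fewer than min_size points are discarded.
    """
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        stack = [unvisited.pop()]
        group = []
        while stack:
            i = stack.pop()
            group.append(i)
            near = [j for j in unvisited
                    if math.dist(points[i], points[j]) <= max_dist]
            for j in near:
                unvisited.remove(j)
            stack.extend(near)
        if len(group) >= min_size:
            clusters.append(sorted(group))
    return clusters

# Hypothetical incidents in a 10 x 10 study area (area = 100).
points = [(1.0, 1.0), (1.2, 1.1), (0.9, 1.3),
          (5.0, 5.0), (8.0, 2.0), (3.0, 8.0), (9.0, 9.0), (6.0, 1.0)]
area = 100.0
threshold = threshold_distance(area, len(points))
clusters = first_order_clusters(points, threshold, min_size=3)
print(round(threshold, 3), clusters)
```

Repeating the same grouping on the centres of the resulting clusters would give the next level of the hierarchy.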

Page 13: Sa Presentation 20070917111 Thomas

Finding clusters of similar value

Page 14: Sa Presentation 20070917111 Thomas

Finding clusters of similar values

The GIS looks at the attribute values of each feature and its neighbours, as well as the proximity of the features.

It then calculates the degree to which nearby features have similar values for a given attribute.

Figures: Percent age 65 or over; Percentages of seniors similar to their neighbours (blue less similar, red more similar)

Page 15: Sa Presentation 20070917111 Thomas

Identifying clusters of similar values (1)

Where high values are surrounded by high values or low values are surrounded by low values, the features are similar

Page 16: Sa Presentation 20070917111 Thomas

Identifying clusters of similar values (2)

A statistic is calculated for each feature. It is then possible to map the features based on this value, to see the locations of features of similar value

Page 17: Sa Presentation 20070917111 Thomas

Moran’s Ii (1)

A method to identify similar values

Emphasizes how features differ from the values in the study area as a whole

Compares the value of each feature in a pair to the mean value for all features in the study area (local variation: the method looks at what’s happening right around each feature)

-- For the calculation, see page 167

Page 18: Sa Presentation 20070917111 Thomas

Moran’s Ii (2)

The value for Moran’s Ii depends on the difference in attribute values, the number of neighbours with similar values, and the magnitude of the attribute data

• A high positive value for Ii indicates that the feature is surrounded by features with similar values, either high or low.

• A negative value indicates that the feature is surrounded by features with dissimilar values.
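The sign behaviour described above can be sketched with one common formulation of the local statistic: the deviation of a feature from the mean, divided by the variance, times the weighted sum of its neighbours' deviations. This may differ in detail from the version on the page cited earlier. The sample values, the binary weights for units in a row and the function names are assumptions for illustration.

```python
def chain_weights(n):
    """Binary weights for n units in a row: 1 for adjacent units, else 0."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]

def local_morans_i(values, weights):
    """Local Moran's Ii for every feature, given a weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    variance = sum(d * d for d in dev) / n
    return [
        dev[i] / variance
        * sum(weights[i][j] * dev[j] for j in range(n) if j != i)
        for i in range(n)
    ]

# Hypothetical percentages of seniors in six neighbouring tracts.
seniors = [20, 22, 21, 5, 6, 4]
local_i = local_morans_i(seniors, chain_weights(len(seniors)))
for value, ii in zip(seniors, local_i):
    print(value, round(ii, 3))
```

Tracts in the middle of the high group or the low group get a positive Ii; the tract with value 5, which sits next to a tract with value 21, gets a slightly negative one.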

Page 19: Sa Presentation 20070917111 Thomas

Gi statistic

Identifying concentrations (clusters) of high and low values within a distance

Compares neighbouring features within a specified distance

Two versions:

1. Gi statistic

2. Gi*

Page 20: Sa Presentation 20070917111 Thomas

Version 1 - Gi statistic

Used to find out what is going on around a feature or cell, without taking the value of the target feature itself into account

- Used to analyse how a phenomenon disperses in an area. Gi has been used to track the spread of AIDS in the counties of the San Francisco area, making it possible to see the increase over time and distance.

Page 21: Sa Presentation 20070917111 Thomas

Version 2 - Gi*

The value of the target feature is included. Used to find hot or cold spots.

A distance (search radius) is defined

This distance is based on knowledge of the features and their behaviour. Example: how far are people willing to travel to go to a certain store? (Euclidean distance, travel time etc.)
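A minimal sketch of the difference between the two versions, using the simple ratio form of the statistic: the sum of values within the search radius divided by the sum of all values, excluding the target feature for Gi and including it for Gi*. Software usually reports a standardised (z-score) version instead. The points, values and radius are hypothetical.

```python
import math

def g_statistics(points, values, radius):
    """Return (Gi, Gi*) lists using binary weights within radius.

    Values are assumed to be positive counts or amounts.
    """
    total = sum(values)
    gi, gi_star = [], []
    for i, p in enumerate(points):
        # Sum of the values of the other features inside the search radius.
        near = sum(v for j, (q, v) in enumerate(zip(points, values))
                   if j != i and math.dist(p, q) <= radius)
        gi.append(near / (total - values[i]))        # target excluded
        gi_star.append((near + values[i]) / total)   # target included
    return gi, gi_star

# Hypothetical store locations (km) and weekly customer counts.
points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
values = [8, 9, 7, 1, 2]
gi, gi_star = g_statistics(points, values, radius=1.5)
for p, a, b in zip(points, gi, gi_star):
    print(p, round(a, 3), round(b, 3))
```

The features in the group with high values get a much larger Gi* than the two isolated features with low values, which is how hot and cold spots show up.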

Page 22: Sa Presentation 20070917111 Thomas

Chapter 5, Analyzing Geographic Relationships

Page 23: Sa Presentation 20070917111 Thomas

Analyzing Geographic Relationships

Why Analyze Geographic Relationships?

Analysis of feature distributions.

Analysis of relationships between features.

• Understanding what is going on in a place.

• Predict where something is likely to occur.

• Examine why things occur where they do.

Page 24: Sa Presentation 20070917111 Thomas

Why Analyze Geographic Relationships?

Understanding what is going on in a place.

Example: Analysis of accidents in relation to speed limits on highways

Page 25: Sa Presentation 20070917111 Thomas

Why Analyze Geographic Relationships?

Predicting where something is likely to occur.

Example: Analysis of landforms in order to identify artifact locations.

Page 26: Sa Presentation 20070917111 Thomas

Why Analyze Geographic Relationships?

Examine why things occur where they do.

Example: Improvement of newborns’ health.

Page 27: Sa Presentation 20070917111 Thomas

Using Statistics to Analyze Relationships

• When we look for relationships we form an opinion about things based on personal knowledge of phenomena or visual analysis of the map.

• Statistics allow us to verify those relationships and measure how strong they are.

• The idea behind using statistics is to see to what extent the value of one attribute changes when another changes, i.e. to measure the relationship between two or more maps representing the variables (to analyze the relationship between two attributes).

Page 28: Sa Presentation 20070917111 Thomas

Assigning Variables to Geography

• Variables from different layers must be associated with the same geographic unit.

If this is not the case:

i) Different cell sizes: Ratio

ii) Different set of features: Combine features

iii) Points representing different categories of features: Sum features to area

iv) Combine two or more sets of features: Raster

Example: Emergency calls and population data.

Page 29: Sa Presentation 20070917111 Thomas

Using Statistics to Analyze Geographic Relationships

Two statistical assumptions:

• Each value is equally likely to occur in the sample

• The value of one observation doesn’t affect another value

In geography:

• Attribute values vary across a region

• Regional trends influence attribute values

Page 30: Sa Presentation 20070917111 Thomas

Using Statistics to Analyze Geographic Relationships

• Nearby features are more similar than distant ones

Spatial autocorrelation

Violation of the independence of observations

Smaller units tend to be more similar than bigger ones.

Page 31: Sa Presentation 20070917111 Thomas

Using Statistics to Analyze Geographic Relationships

Identifying relationships vs. analyzing processes

Identifying relationships:

• Asking for relationships between (x, y)

• Measure the extent of variation

• Take actions

Analyzing processes:

• Main variables that drive a process

• Predict values of a variable

• Understand

Page 32: Sa Presentation 20070917111 Thomas

Identifying Geographic Relationships

How much two attributes vary together.

• Direct relationship (positive correlation)

• Inverse relationship (negative correlation)

If a relationship is suspected, then:

measure the relationship, confirm it, and measure its direction and strength

Page 33: Sa Presentation 20070917111 Thomas

Methods for Identifying Geographic Relationships

• Pearson’s Correlation Coefficient
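A minimal sketch of Pearson's coefficient, computed as the covariance of the two attributes divided by the product of their standard deviations. The income and emergency-call figures are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data per area: median income (thousands) and emergency calls.
income = [20, 30, 40, 50, 60]
calls = [50, 42, 35, 30, 20]
r = pearson_r(income, calls)
print(round(r, 3))
```

A value close to -1 indicates a strong inverse relationship, a value close to +1 a strong direct relationship, and a value near 0 no linear relationship.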

Page 34: Sa Presentation 20070917111 Thomas

Methods for Identifying Geographic Relationships

• Spearman’s Rank Correlation Coefficient

measures the extent to which two lists of ranked values correspond
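A minimal sketch: the values are replaced by their ranks (tied values share the average rank) and the correlation is then computed on the ranks. The function names and sample data are illustrative.

```python
import math

def average_ranks(values):
    """Rank values from 1 upwards; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[pos]]):
            end += 1
        shared_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared_rank
        pos = end + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's coefficient: Pearson's correlation of the two rank lists."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sxx = sum((a - mean_x) ** 2 for a in rx)
    syy = sum((b - mean_y) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)

# A monotonic but non-linear relationship still has perfectly matching ranks.
rho = spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])
print(rho)
```

Because only the ranks matter, the coefficient is less sensitive to outliers and to non-linear but monotonic relationships than Pearson's coefficient.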

Page 35: Sa Presentation 20070917111 Thomas

Identifying Geographic Relationships

What correlation coefficient doesn’t measure

• Results of a correlation cannot be transferred from one scale to another, e.g. from a county to the nation.

• Doesn’t measure causation (whether X causes Y).

• Correlation doesn’t explain why there is a relationship.

• Doesn’t measure the form of the relationship, just the dispersion around a straight line.

Page 36: Sa Presentation 20070917111 Thomas

Analyzing Geographic Processes

We analyze geographic processes in order to predict that something will occur.

Steps

1. Develop a theory as to what is driving the process

2. Analyze the relationships between various attributes of your data (build a model)

Page 37: Sa Presentation 20070917111 Thomas

Analyzing Geographic Processes

Linear Regression Analysis

• Plot variables on a chart.

• Find the line that best fits the data points (ordinary least squares method)
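A minimal sketch of ordinary least squares for one independent variable: the slope and intercept are chosen so that the sum of squared vertical distances between the points and the line is as small as possible. The data are made up.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line y = a + b * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical observations of an independent (x) and dependent (y) variable.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
intercept, slope = fit_line(xs, ys)
print(round(intercept, 3), round(slope, 3))
```

The fitted line always passes through the point formed by the two means.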

Page 38: Sa Presentation 20070917111 Thomas

Analyzing Geographic Processes

Ordinary Least Squares

Example from Wikipedia

Page 39: Sa Presentation 20070917111 Thomas

Analyzing Geographic Processes

Interpreting the results of regression analysis

We can see how well our model works by comparing the variance in the predicted values to the variance in the observed values.

• Perfect fit (all points on the line): R2 = 1

• In any other case R2 < 1, meaning the fit is not perfect

Calculate residuals (differences between predicted & observed values)
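A minimal sketch of both quantities. R2 is computed here as one minus the residual sum of squares divided by the total sum of squares, which for a least squares line with an intercept equals the ratio of predicted to observed variance. Taking each residual as observed minus predicted is an assumption about the sign convention, and the numbers are made up.

```python
def residuals(observed, predicted):
    """Observed minus predicted value for each observation."""
    return [o - p for o, p in zip(observed, predicted)]

def r_squared(observed, predicted):
    """Share of the variation in the observed values explained by the model."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum(e * e for e in residuals(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

# Hypothetical observed values and the values predicted by y = 0.05 + 1.99 * x
# for x = 1..5.
observed = [2.1, 3.9, 6.2, 7.8, 10.1]
predicted = [0.05 + 1.99 * x for x in [1, 2, 3, 4, 5]]
r2 = r_squared(observed, predicted)
print([round(e, 2) for e in residuals(observed, predicted)], round(r2, 4))
```

Mapping the residuals shows where the model over-predicts and under-predicts.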

Page 40: Sa Presentation 20070917111 Thomas

Using More Than One Independent Variable

Most geographic processes aren’t controlled by a single variable

New Regression Analysis Equation

R2 in multivariate regression describes the proportion of the variation in y explained by the combination of independent variables.
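The usual form of the multiple regression equation, given here in standard notation (the symbols are an assumption, not taken from the slide):

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + \varepsilon
```

Here y is the dependent variable, x_1 to x_n are the independent variables, \beta_0 is the intercept, \beta_1 to \beta_n are the regression coefficients, and \varepsilon is the residual.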

Page 41: Sa Presentation 20070917111 Thomas

Using More Than One Independent Variable

Identifying the key variables

Analysis

Test the significance of each variable (t-test)

Goal

Page 42: Sa Presentation 20070917111 Thomas

Factors Influencing the Regression Analysis Results

Least squares regression analysis is effective only if the following are true:

1. There is a linear relationship between Y and X.

2. Residuals have a mean of 0.

3. Residuals have a constant variance.

4. Residuals are randomly arranged along the regression line.

5. Residuals are normally distributed.

6. Independent variables are not highly correlated.

Page 43: Sa Presentation 20070917111 Thomas

Regression Analysis & Geographic Data

For geographic data, misspecification can result from many sources.

It can occur when:

• Data are analyzed at the wrong scale for the process

• Variables are missing

Page 44: Sa Presentation 20070917111 Thomas

Dealing With Regional Variation

Geographically Weighted Regression (GWR)

• Allows model coefficients to vary regionally.

• A regression is run for each location rather than for the study area as a whole.

Example: Per capita income.

Page 45: Sa Presentation 20070917111 Thomas

Dealing with Local Trends

Methods to address local trends.

• Resampling

• Spatial filtering (removes spatial autocorrelation)

Page 46: Sa Presentation 20070917111 Thomas

Running A Linear Regression Analysis With Geographic Data.

1. Determine what you are trying to predict.

2. Identify the key independent variables.

3. Examine the distribution of your variables.

4. Run the ordinary least squares regression.

5. Examine the coefficients for each independent variable.

6. Examine the residuals.

• Test for spatial autocorrelation

• Look for missing variables

• Plot y-values against residuals

• Create a frequency curve
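The test for spatial autocorrelation in step 6 can be sketched with the global Moran's I statistic applied to the residuals. The binary weights for units in a row and the two sets of residuals are assumptions for illustration; a full test would also assess the significance of the value.

```python
def chain_weights(n):
    """Binary weights for n units in a row: 1 for adjacent units, else 0."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]

def morans_i(values, weights):
    """Global Moran's I for a list of values and a weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    weight_sum = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / weight_sum) * cross / sum(d * d for d in dev)

weights = chain_weights(6)
# Hypothetical residuals: over-prediction at one end, under-prediction at the other.
trend_residuals = [2.0, 1.5, 1.0, -1.0, -1.5, -2.0]
# Hypothetical residuals that alternate in sign between neighbours.
alternating_residuals = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
i_trend = morans_i(trend_residuals, weights)
i_alternating = morans_i(alternating_residuals, weights)
print(round(i_trend, 3), round(i_alternating, 3))
```

A clearly positive value for the residuals suggests that nearby areas are mis-predicted in the same direction, which often points to a missing variable or a regional trend.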