Spatial data mining

Spatial Data Mining. Presented by: Rajkumar Jain, M.Tech (C.S.E.), 1st year (2nd sem)

Transcript of Spatial data mining

Page 1: Spatial data mining

Spatial Data Mining

Presented by: Rajkumar Jain, M.Tech (C.S.E.), 1st year (2nd sem)

Page 2: Spatial data mining

Overview
• What is spatial data?
• What makes spatial data mining different?
• Spatial data mining tasks
• Spatial data properties
• Clustering analysis
• Trend analysis
• Future scope

Page 3: Spatial data mining

What is Spatial Data?
• Objects of types:
– points
– lines
– polygons
– etc.
• Used in/for:
– GIS: Geographic Information Systems
– GPS: Global Positioning System
– Environmental studies
– etc.

Page 4: Spatial data mining

Introduction
• Spatial data mining is the process of discovering interesting, useful, non-trivial patterns from large spatial datasets.
– E.g. determining hotspots: unusual locations.
• Spatial data mining tasks
– Characteristic rules.
– Discriminant rules, e.g. comparing the price ranges of different geographical areas.
– Association rules: associate a non-spatial attribute with a spatial attribute, or one spatial attribute with another.
– Clustering rules: group similar objects; this also helps detect outliers, which is useful for finding suspicious patterns, e.g. grouping crime locations.

Page 5: Spatial data mining

– Classification rules: determine whether a spatial entity belongs to a particular class, or into how many classes the entities are divided, e.g. classifying remotely sensed images based on spectrum and GIS data.
– Trend detection: a trend is a temporal pattern in time-series data. A spatial trend is a regular change of a non-spatial attribute across the neighbours of a spatial data object.
• Properties of spatial data
– Spatial autocorrelation
– Spatial heterogeneity
– Implicit spatial relations

Page 6: Spatial data mining

Heterogeneity of Spatial Data
• Autocorrelation.
• Patterns usually have to be defined in the spatial attribute subspace and not in the complete attribute space.
• Longitude and latitude (or other coordinate systems) are the glue that links different data collections together.
• People are used to maps in GIS; therefore, data mining results have to be summarized on top of maps.

Page 7: Spatial data mining

• Patterns refer not only to points, but also to lines, polygons, or other higher-order geometrical objects.

Page 8: Spatial data mining

Autocorrelation
• Items in traditional data are assumed to be independent of each other,
– whereas properties of locations on a map are often "auto-correlated".
• First law of geography [Tobler]:
– Everything is related to everything else, but near things are more related than distant things.
– People with similar backgrounds tend to live in the same area.

Page 9: Spatial data mining

– Economies of nearby regions tend to be similar.

– Changes in temperature occur gradually over space.
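One common way to put a number on autocorrelation is Moran's I, a statistic the slides do not mention; it is swapped in here purely as an illustration. The sketch below assumes a small hypothetical data set of five locations in a row, with a binary weight of 1 between adjacent locations. Values near +1 suggest that neighbours are similar, values near -1 that they alternate.

```python
def morans_i(values, weights):
    """Global Moran's I of `values` under the spatial weight matrix `weights`."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations for every pair of locations
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)


def chain_weights(n):
    """Hypothetical neighbourhood: locations i and j are neighbours if |i - j| == 1."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


smooth = [10, 11, 12, 13, 14]   # changes gradually over space
jagged = [10, 14, 10, 14, 10]   # neighbours are as different as possible
W = chain_weights(5)
print(morans_i(smooth, W), morans_i(jagged, W))
```

The gradually changing series gives a positive value and the alternating series a negative one, which matches the intuition behind Tobler's law.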


Page 10: Spatial data mining

Spatial Relations
• Spatial databases do not store spatial relations explicitly
– Additional functionality is required to compute them
• Three types of spatial relations are specified by the OGC reference model
– Distance relations
  • Euclidean distance between two spatial features
– Direction relations
  • Ordering of spatial features in space
– Topological relations
  • Characterise the type of intersection between spatial features

Page 11: Spatial data mining

Distance relations
• If dist is a distance function and c is some real number, three relations are possible:
1. dist(A,B) > c,
2. dist(A,B) < c, and
3. dist(A,B) = c
[Figure: features A and B shown in three arrangements]
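The three cases can be sketched in a few lines. This is a minimal illustration assuming A and B are point features given as (x, y) tuples and dist is the Euclidean distance; the function names and the tolerance parameter are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two point features given as (x, y) tuples."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def distance_relation(a, b, c, tol=1e-9):
    """Classify the pair (A, B) against the threshold c."""
    d = dist(a, b)
    if math.isclose(d, c, abs_tol=tol):   # treat near-equality as equality
        return "dist(A,B) = c"
    return "dist(A,B) > c" if d > c else "dist(A,B) < c"


A, B = (0, 0), (3, 4)                     # hypothetical features, 5 units apart
for c in (4, 5, 6):
    print(c, distance_relation(A, B, c))
```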

Page 12: Spatial data mining

Direction relations
• If the directions of B and C are required with respect to A:
• Define a representative point, rep(A)
• rep(A) defines the origin of a virtual coordinate system
• The quadrants and half planes define the direction relations
• B can have two values {northeast, east}
• Exact direction relation is northeast
[Figure: features A, B and C with rep(A) as origin; C north A, B northeast A]
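A rough sketch of the idea, under two assumptions that are not in the slides: the representative point is taken as the mean of a feature's vertices, and the other feature is also reduced to its representative point, so only one exact direction is returned. All coordinates are hypothetical.

```python
def rep(points):
    """Representative point: here simply the mean of the vertices (an assumption)."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def direction(a_points, b_points):
    """Direction of feature B with respect to feature A, with rep(A) as the origin."""
    ax, ay = rep(a_points)
    bx, by = rep(b_points)
    dx, dy = bx - ax, by - ay
    ns = "north" if dy > 0 else "south" if dy < 0 else ""
    ew = "east" if dx > 0 else "west" if dx < 0 else ""
    return (ns + ew) or "same position"


A = [(0, 0), (2, 0), (2, 2), (0, 2)]      # rep(A) = (1, 1)
B = [(4, 3), (6, 3), (6, 5), (4, 5)]      # lies up and to the right of A
C = [(1, 5)]                              # directly above rep(A)
print("B", direction(A, B), "A")
print("C", direction(A, C), "A")
```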

Page 13: Spatial data mining

Topological Relations
• Topological relations describe how geometries intersect spatially.
• Simple geometry types
– Point, 0-dimension
– Line, 1-dimension
– Polygon, 2-dimension
• Each geometry is represented in terms of
– boundary (B): geometry of the lower dimension
– interior (I): points of the geometry when the boundary is removed
– exterior (E): points not in the interior or boundary

Page 14: Spatial data mining

DE-9IM
• Topological relations are defined using any one of the following models
– 4IM, four-intersection model (only B and I considered)
– 9IM, nine-intersection model (B, I, and E)
– DE-9IM, dimensionally extended nine-intersection model
• Dim is the dimension function
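The matrix itself does not survive in the transcript. Using the slide's own symbols (I, B, E for interior, boundary, exterior, and Dim for the dimension function), the standard DE-9IM of two geometries A and B can be written as below. Each entry is the dimension of the intersection: 0, 1 or 2, or -1 when the intersection is empty.

```latex
\[
\mathrm{DE\text{-}9IM}(A, B) =
\begin{bmatrix}
\mathrm{Dim}(I(A) \cap I(B)) & \mathrm{Dim}(I(A) \cap B(B)) & \mathrm{Dim}(I(A) \cap E(B)) \\
\mathrm{Dim}(B(A) \cap I(B)) & \mathrm{Dim}(B(A) \cap B(B)) & \mathrm{Dim}(B(A) \cap E(B)) \\
\mathrm{Dim}(E(A) \cap I(B)) & \mathrm{Dim}(E(A) \cap B(B)) & \mathrm{Dim}(E(A) \cap E(B))
\end{bmatrix}
\]
```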

Page 15: Spatial data mining

Example
• Consider two polygons
– A: POLYGON ((10 10, 15 0, 25 0, 30 10, 25 20, 15 20, 10 10))
– B: POLYGON ((20 10, 30 0, 40 10, 30 20, 20 10))

Page 16: Spatial data mining

9-Intersection Matrix of example geometries

[Figure: 3 × 3 grid showing the intersections of I(A), B(A), E(A) (rows) with I(B), B(B), E(B) (columns)]

Page 17: Spatial data mining

DE-9IM for the example geometries

        I(B)  B(B)  E(B)
I(A)     2     1     2
B(A)     1     0     1
E(A)     2     1     2

Page 18: Spatial data mining

Relationships using DE-9IM
• Different geometries may give rise to different numbers in the DE-9IM
• For a specific type of relationship we are only interested in certain values in certain positions
– That is, we are interested in patterns in the matrix rather than actual values
• Actual values are replaced by wild cards
– T: value is "true", non-empty, any dimension >= 0
– F: value is "false", empty, dimension < 0
– *: don't care what the value is
– 0: value is exactly zero
– 1: value is exactly one
– 2: value is exactly two

A overlaps B

        I(B)  B(B)  E(B)
I(A)     T     *     T
B(A)     *     *     *
E(A)     T     *     *
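Pattern matching of this kind is easy to sketch. The example below assumes the matrix is flattened row by row into a 9-character string, so the example geometries give "212101212" and the overlaps pattern above becomes "T*T***T**". The function name and the second pattern, used only as a contrasting case, are illustrative.

```python
def matches(de9im, pattern):
    """Check a 9-character DE-9IM string against a pattern of T, F, *, 0, 1, 2."""
    if len(de9im) != 9 or len(pattern) != 9:
        raise ValueError("both strings must have exactly 9 characters")
    for value, wanted in zip(de9im, pattern):
        if wanted == "*":            # don't care
            continue
        if wanted == "T":            # any non-empty intersection
            if value not in "012":
                return False
        elif value != wanted:        # F, 0, 1 and 2 must match exactly
            return False
    return True


EXAMPLE = "212101212"                # rows I(A), B(A), E(A) of the example matrix
OVERLAPS = "T*T***T**"               # pattern from the "A overlaps B" table
DISJOINT = "FF*FF****"               # contrasting pattern: no shared points
print(matches(EXAMPLE, OVERLAPS), matches(EXAMPLE, DISJOINT))
```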

Page 19: Spatial data mining

Cluster analysis
• Cluster analysis divides data into meaningful or useful groups (clusters). It is very useful in spatial databases. For example, grouping feature vectors into clusters can be used to create thematic maps, which are useful in geographic information systems.
• CLUSTERING METHODS FOR SPATIAL DATA MINING
1. Partitioning Around Medoids (PAM): PAM is similar to the k-means algorithm. Like k-means, PAM divides a data set into groups, but it is based on medoids (actual objects of the data set), whereas k-means is based on centroids. Using medoids reduces the dissimilarity of objects within a cluster. PAM first selects the medoids and then assigns each object to its nearest medoid, which forms the clusters.
• Let i be an object and $v_i$ the cluster it belongs to. Then i is at least as near to its own medoid $m_{v_i}$ as to any other medoid $m_w$: $d(i, m_{v_i}) \le d(i, m_w)$ for all $w = 1, 2, \dots, k$.
• The k representative objects should minimize the objective function, which is the sum of the dissimilarities of all objects to their nearest medoid: $\text{Objective function} = \sum_i d(i, m_{v_i})$
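A simplified sketch of the medoid idea follows. It is not the full PAM algorithm: it assumes Euclidean distance as the dissimilarity, starts from the first k objects instead of a proper BUILD phase, and accepts any swap that lowers the objective function. The data points are hypothetical.

```python
import math
from itertools import product


def total_cost(points, medoids):
    """Objective function: sum of distances of all objects to their nearest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)


def pam(points, k):
    """Greedy medoid swapping; medoids are stored as indices into `points`."""
    medoids = list(range(k))                  # naive initialisation (assumption)
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for slot, candidate in product(range(k), range(len(points))):
            if candidate in medoids:
                continue
            trial = medoids[:slot] + [candidate] + medoids[slot + 1:]
            cost = total_cost(points, trial)
            if cost < best - 1e-12:           # keep only strict improvements
                medoids, best, improved = trial, cost, True
    # Assign every object to its nearest medoid
    labels = [min(medoids, key=lambda m: math.dist(p, points[m])) for p in points]
    return medoids, labels, best


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
medoids, labels, cost = pam(points, k=2)
print(medoids, labels, cost)
```

Each label is the index of the medoid the object is assigned to, so objects sharing a label form one cluster.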

Page 20: Spatial data mining

2. Clustering Large Applications (CLARA)
• Compared to PAM, CLARA can deal with much larger data sets. Like PAM, CLARA finds objects that are centrally located in the clusters. The main problem with PAM is that it needs the entire dissimilarity matrix at once, so for n objects the space complexity of PAM is $O(n^2)$. CLARA avoids this problem: it accepts only the actual measurements (i.e. an n × p data matrix) and works on samples of the data.
• CLARA assigns objects to clusters in the following way:
• BUILD-step: Select k "centrally located" objects to be used as initial medoids, chosen so that the average distance between the objects and their medoids is as small as possible. This forms the initial clusters.
• SWAP-step: Try to decrease the average distance between the objects and the medoids by replacing representative objects. Finally, each object that does not belong to the sample is assigned to its nearest medoid.
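The sampling idea can be sketched as follows. This is an illustration under several assumptions rather than a reference implementation: Euclidean distance, a greedy swap search standing in for full PAM on each sample, and hypothetical defaults of 5 samples of size 40 + 2k. Only the sample is searched, but each candidate set of medoids is scored on the whole data set.

```python
import math
import random


def avg_cost(points, medoids):
    """Average distance from every object to its nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points) / len(points)


def pam_on_sample(sample, k):
    """Greedy swap search on a small sample (a stand-in for full PAM)."""
    medoids = sample[:k]
    best = avg_cost(sample, medoids)
    improved = True
    while improved:
        improved = False
        for slot in range(k):
            for candidate in sample:
                if candidate in medoids:
                    continue
                trial = medoids[:slot] + [candidate] + medoids[slot + 1:]
                cost = avg_cost(sample, trial)
                if cost < best - 1e-12:
                    medoids, best, improved = trial, cost, True
    return medoids


def clara(points, k, n_samples=5, sample_size=None, seed=0):
    rng = random.Random(seed)
    size = min(len(points), sample_size or 40 + 2 * k)
    best_medoids, best_cost = None, float("inf")
    for _ in range(n_samples):
        sample = rng.sample(points, size)
        medoids = pam_on_sample(sample, k)
        cost = avg_cost(points, medoids)      # score on the full data set
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    # Objects outside the sample are assigned to their nearest medoid as well
    labels = [min(range(k), key=lambda j: math.dist(p, best_medoids[j]))
              for p in points]
    return best_medoids, labels, best_cost


# Hypothetical data: two well-separated 5 x 5 grids of points
grid = [(x, y) for x in range(5) for y in range(5)]
data = grid + [(x + 100, y + 100) for x, y in grid]
medoids, labels, cost = clara(data, k=2)
print(medoids, cost)
```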

Page 21: Spatial data mining

Trend analysis
• Spatial trend: a regular change of one or more non-spatial attributes when moving away from a given object. E.g. when we move eastward from the Cyber Tower, the rent of residential houses decreases at a rate of approximately 5% per km.
• Such a trend is identified using neighborhood paths starting from a location O: regression analysis is performed on the attribute values of the objects along a neighborhood path to describe the regularity of change.
• There are two algorithms: one determines global trends and the other local trends.
• Global trend: considering all objects on all paths starting from O, the values of the specified attribute in general tend to increase or decrease with increasing distance.

Page 22: Spatial data mining

• Local trend: considers a single path starting from an object O and having a certain trend. Trends on different paths may differ, e.g. some may be positive while others are negative.
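The regression step can be illustrated with an ordinary least-squares fit of the attribute against distance from O. The rents below are hypothetical numbers chosen to mimic the rental example; the function name is illustrative.

```python
def linear_trend(distances, values):
    """Least-squares slope, intercept and correlation of values against distance."""
    n = len(distances)
    mx = sum(distances) / n
    my = sum(values) / n
    sxx = sum((x - mx) ** 2 for x in distances)
    syy = sum((y - my) ** 2 for y in values)
    sxy = sum((x - mx) * (y - my) for x, y in zip(distances, values))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


# Hypothetical rents (arbitrary units) at 0..4 km east of the start object O
km = [0, 1, 2, 3, 4]
rent = [1000, 950, 900, 850, 800]
slope, intercept, r = linear_trend(km, rent)
print(slope, intercept, r)
```

A slope of about -50 on a starting rent of 1000 corresponds to a drop of 5% of the starting value per km, and a correlation close to -1 indicates a very regular decrease.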

Page 23: Spatial data mining

Spatial trend detection
• Let g be a neighborhood graph, O an object in g, and a a non-spatial attribute whose pattern of change we want to detect while moving away from O in the neighborhood graph.
• Let there be a filter that indicates which subset of the neighbors is taken into consideration.
• Let min_conf be a real number.
• Let min_length and max_length be natural numbers; the length of a neighborhood path must lie between these two values.
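How these parameters might fit together is sketched below for local trends. Several points are assumptions, not statements from the slides: path length is counted in edges, min_conf is compared with the absolute correlation between attribute value and distance from O, and the filter (here the hypothetical `nb_filter`) only admits neighbours that lie farther from O than the current object. The graph, positions and attribute values are made up.

```python
import math


def correlation(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def local_trends(g, pos, attr, o, nb_filter, min_conf, min_length, max_length):
    """Return (path, r) for every neighbourhood path from o with |r| >= min_conf."""
    found = []

    def extend(path):
        edges = len(path) - 1
        if min_length <= edges <= max_length:
            xs = [math.dist(pos[o], pos[v]) for v in path]   # distance from O
            ys = [attr[v] for v in path]                     # attribute values
            r = correlation(xs, ys)
            if abs(r) >= min_conf:
                found.append((list(path), r))
        if edges < max_length:
            for nxt in g[path[-1]]:
                if nxt not in path and nb_filter(o, path[-1], nxt):
                    extend(path + [nxt])

    extend([o])
    return found


# Hypothetical neighbourhood graph: five objects in a row, 1 unit apart
g = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
pos = {i: (i, 0) for i in range(5)}
attr = {0: 100, 1: 95, 2: 90, 3: 85, 4: 80}


def away(o, cur, nxt):
    """Filter: only follow neighbours that are farther from O."""
    return math.dist(pos[o], pos[nxt]) > math.dist(pos[o], pos[cur])


trends = local_trends(g, pos, attr, 0, away, min_conf=0.9, min_length=2, max_length=4)
for path, r in trends:
    print(path, round(r, 3))
```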

Page 24: Spatial data mining

Architecture of Spatial Data Mining

[Figure: architecture diagram. Components: human-computer interaction system; spatial data mining system; discoverable knowledge; data related to problem; knowledge base management system; spatial database; spatial database management system; domain knowledge database]

Page 25: Spatial data mining

Examples of Spatial Patterns
• 1854 Asiatic cholera outbreak in London.
– A water pump identified as the source.
• Crime hotspots for planning police patrol routes.
• Effects on weather in the US caused by unusual warming of the Pacific Ocean (El Niño).

Page 26: Spatial data mining

Future scope
• Data mining in spatial object-oriented databases: how can the object-oriented approach be used to design a spatial database? An object-oriented database may be a better choice for handling spatial data than traditional relational or extended relational models. For example, rectangles, polygons, and more complex spatial objects can be modeled naturally in an object-oriented database.
• Parallel data mining can be used, because processing spatial data takes a lot of computational time.

Page 27: Spatial data mining

Thank you
