DATA STRUCTURES USED IN SPATIAL DATA MINING

Page 1: DATA STRUCTURES USED IN SPATIAL DATA MINING

DATA STRUCTURES USED IN SPATIAL DATA MINING

Page 2: DATA STRUCTURES USED IN SPATIAL DATA MINING

What is spatial data?

Spatial data can broadly be defined as data that covers multidimensional points, lines, rectangles, polygons, cubes, and other geometric objects. A spatial object occupies a certain amount of space, called its spatial extent, which is characterized by location and boundary.

Uses

• Geographic information systems (GIS)
• CAD/CAM
• Multimedia applications
  - Content-based image retrieval
  - Fingerprint matching
  - MRI (digitized medical images)

Page 3: DATA STRUCTURES USED IN SPATIAL DATA MINING

Features of spatial data

Specific features of spatial data are rich data types, implicit spatial relationships among the variables, observations that are not independent, and spatial autocorrelation among the features.

Spatial data has two distinct types of attributes: spatial attributes and non-spatial attributes. Spatial attributes define the spatial location and extent of spatial objects.

Page 4: DATA STRUCTURES USED IN SPATIAL DATA MINING

Types of spatial data

Region data: Region data has a spatial extent with a location and a boundary. It is basically a geometric approximation of an actual object.

Point data: Point data consists of a collection of points in a multidimensional space. A point does not cover any area of space.

Page 5: DATA STRUCTURES USED IN SPATIAL DATA MINING

What is spatial data mining?

Spatial data mining is defined as the non-trivial search for interesting and unexpected spatial patterns in spatial databases.

Spatial data mining is needed to gain a new understanding of geographic processes and to answer critical questions such as "How is the health of planet Earth?" and "What are the effects of human activity on the environment and ecology?"

Page 6: DATA STRUCTURES USED IN SPATIAL DATA MINING

Spatial data in GIS

A geographic information system (GIS) is any system for capturing, storing, analyzing, and managing data and associated attributes that are spatially referenced to Earth.

There are two broad methods used to store data in a GIS: raster and vector. In a GIS, geographical features are often expressed as vectors, by treating those features as geometric shapes such as points, chains, and polygons.

Page 7: DATA STRUCTURES USED IN SPATIAL DATA MINING

Spatial data structures used in GIS

To handle spatial data efficiently, as required in computer-aided design and geo-data applications, a database system needs an index mechanism that helps it retrieve data items quickly according to their spatial locations.

• Quadtree
• k-d tree
• R-tree
• R+-tree
• R*-tree

Page 8: DATA STRUCTURES USED IN SPATIAL DATA MINING

Quadtrees

A quadtree is used to index two-dimensional space.

Each node of a quadtree is associated with a rectangular region of space.

The top node is associated with the entire target space.

Each internal node splits its region into four disjoint subspaces along the two axes.

Each of these subspaces is split recursively until there is at most one object inside each of them.
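The recursive splitting rule can be illustrated with a minimal sketch of a point quadtree. It is only one possible reading of the slide: the class name QuadNode, the half-open region convention, and the choice to ignore duplicate points are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class QuadNode:
    # Region covered by this node: x0 <= x < x1 and y0 <= y < y1 (assumed convention).
    x0: float
    y0: float
    x1: float
    y1: float
    point: Optional[Point] = None                  # a leaf holds at most one object
    children: List["QuadNode"] = field(default_factory=list)

    def _child_for(self, p: Point) -> "QuadNode":
        # Pick the quadrant of p relative to the centre of this region.
        mx, my = (self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2
        index = (1 if p[0] >= mx else 0) + (2 if p[1] >= my else 0)
        return self.children[index]

    def insert(self, p: Point) -> None:
        if self.children:                          # internal node: descend
            self._child_for(p).insert(p)
        elif self.point is None:                   # empty leaf: store the point
            self.point = p
        elif self.point == p:                      # duplicate: ignored in this sketch
            return
        else:                                      # occupied leaf: split into four
            old, self.point = self.point, None
            mx, my = (self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2
            self.children = [
                QuadNode(self.x0, self.y0, mx, my),    # index 0: low x, low y
                QuadNode(mx, self.y0, self.x1, my),    # index 1: high x, low y
                QuadNode(self.x0, my, mx, self.y1),    # index 2: low x, high y
                QuadNode(mx, my, self.x1, self.y1),    # index 3: high x, high y
            ]
            self._child_for(old).insert(old)
            self._child_for(p).insert(p)

    def points(self) -> List[Point]:
        # Collect every stored point in the subtree.
        if self.children:
            return [q for c in self.children for q in c.points()]
        return [] if self.point is None else [self.point]

    def depth(self) -> int:
        return 1 + max((c.depth() for c in self.children), default=0)
```

Inserting points that lie close together forces deeper splits in that part of the space, while sparse regions stay as large, shallow cells.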

Page 9: DATA STRUCTURES USED IN SPATIAL DATA MINING

Figure: Division of space by a quadtree

Page 10: DATA STRUCTURES USED IN SPATIAL DATA MINING

k-d trees

A k-d tree partitions the space into two subspaces according to one of the coordinates of the splitting point.

Let level(nod) be the length of the path from the root to the node nod, and suppose the axes are numbered from 0 to k − 1. At level level(nod), every node splits the space according to coordinate number (level(nod) mod k).

The partitioning is done along one dimension at the node at the top level of the tree, along another dimension in nodes at the next level, and so on, cycling through the dimensions.
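The cycling rule (level(nod) mod k) can be sketched as follows. Choosing the median point as the splitting point is an assumption of this example, since the slides do not say how the splitting point is picked; the names KDNode, build_kdtree, and contains are likewise illustrative.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


class KDNode:
    def __init__(self, point: Point, axis: int,
                 left: Optional["KDNode"], right: Optional["KDNode"]) -> None:
        self.point = point    # splitting point stored at this node
        self.axis = axis      # coordinate number used for the split
        self.left = left      # points with a smaller coordinate on `axis`
        self.right = right    # points with an equal or larger coordinate


def build_kdtree(points: Sequence[Point], depth: int = 0) -> Optional[KDNode]:
    if not points:
        return None
    k = len(points[0])
    axis = depth % k                              # level(nod) mod k
    pts: List[Point] = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2                           # median split (assumed policy)
    # Move to the first point sharing the median coordinate so that the left
    # subtree is strictly smaller on this axis.
    while mid > 0 and pts[mid - 1][axis] == pts[mid][axis]:
        mid -= 1
    return KDNode(pts[mid], axis,
                  build_kdtree(pts[:mid], depth + 1),
                  build_kdtree(pts[mid + 1:], depth + 1))


def contains(node: Optional[KDNode], target: Point) -> bool:
    # Follow a single root-to-leaf path, comparing one coordinate per level.
    while node is not None:
        if node.point == target:
            return True
        if target[node.axis] < node.point[node.axis]:
            node = node.left
        else:
            node = node.right
    return False
```

For two-dimensional points the root splits on x, its children split on y, their children split on x again, and so on.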

Page 11: DATA STRUCTURES USED IN SPATIAL DATA MINING

Figure: Division of space by a k-d tree

Page 12: DATA STRUCTURES USED IN SPATIAL DATA MINING

R-trees

An R-tree is a balanced tree structure with the index records stored in the leaf nodes.

The structure is completely dynamic, with no need for periodic restructuring.

Let M be the maximum number of entries in one node and let m ≤ M/2 be a parameter. Then m specifies the minimum number of entries allowed in a node other than the root.

Page 13: DATA STRUCTURES USED IN SPATIAL DATA MINING

R-trees (continued)

Every non-leaf node has between m and M children unless it is the root.

The root node has at least two children unless it is a leaf.

For each index record (I, tuple-id) in a leaf node, I is the smallest rectangle that spatially contains the n-dimensional data object.

For each entry (I, child-ptr) in a non-leaf node, I is the smallest rectangle that spatially contains the rectangles in the child node.
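The role of the rectangle I in both kinds of entries can be sketched with a small, hand-built two-level tree and a window query. This is only an illustration of the bounding-rectangle idea: insertion, node splitting, and the m/M occupancy rules are left out, and the names Leaf, Inner, mbr, and search are assumptions of this example.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple, Union

Rect = Tuple[float, float, float, float]   # (xmin, ymin, xmax, ymax)


def mbr(rects: Sequence[Rect]) -> Rect:
    # Smallest rectangle that contains all the given rectangles.
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Leaf:
    entries: List[Tuple[Rect, int]]        # index records (I, tuple-id)

    @property
    def rect(self) -> Rect:
        return mbr([r for r, _ in self.entries])


@dataclass
class Inner:
    children: List[Union["Inner", Leaf]]   # each child plays the role of child-ptr

    @property
    def rect(self) -> Rect:
        return mbr([c.rect for c in self.children])


def search(node: Union[Inner, Leaf], query: Rect) -> List[int]:
    # Return the tuple-ids whose rectangles intersect the query window.
    if isinstance(node, Leaf):
        return [tid for r, tid in node.entries if intersects(r, query)]
    hits: List[int] = []
    for child in node.children:
        if intersects(child.rect, query):  # prune subtrees that cannot match
            hits.extend(search(child, query))
    return hits
```

Because sibling rectangles in an R-tree may overlap, a query can descend into more than one child of the same node.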

Page 14: DATA STRUCTURES USED IN SPATIAL DATA MINING

Figure: Division of space by R-trees

Page 15: DATA STRUCTURES USED IN SPATIAL DATA MINING

R+-tree

The R+-tree is an extension of the R-tree in which the bounding rectangles of nodes at the same level do not overlap. This reduces the number of branches that must be searched, and hence the search time, at the cost of increased space consumption.

Data objects may be split, so that different parts of one object can be stored in several nodes at the same tree level.
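The idea of splitting one object across non-overlapping regions can be sketched by clipping its rectangle against each region. This shows the splitting idea only, not a full R+-tree; the region names and the helpers clip and split_object are hypothetical.

```python
from typing import Dict, Optional, Tuple

Rect = Tuple[float, float, float, float]   # (xmin, ymin, xmax, ymax)


def clip(rect: Rect, region: Rect) -> Optional[Rect]:
    # Part of `rect` that lies inside `region`, or None if they share no area.
    xmin, ymin = max(rect[0], region[0]), max(rect[1], region[1])
    xmax, ymax = min(rect[2], region[2]), min(rect[3], region[3])
    if xmin >= xmax or ymin >= ymax:
        return None
    return (xmin, ymin, xmax, ymax)


def split_object(rect: Rect, regions: Dict[str, Rect]) -> Dict[str, Rect]:
    # Map each disjoint region to the piece of the object that falls inside it.
    pieces: Dict[str, Rect] = {}
    for name, region in regions.items():
        piece = clip(rect, region)
        if piece is not None:
            pieces[name] = piece
    return pieces


# Two disjoint regions that together cover the square from (0, 0) to (10, 10).
regions = {"left": (0, 0, 5, 10), "right": (5, 0, 10, 10)}
pieces = split_object((3, 2, 8, 4), regions)
# The object straddles the boundary x = 5, so it yields one piece per region.
```

An object that crosses a region boundary is therefore referenced from more than one node, which is where the extra space consumption comes from.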

Page 16: DATA STRUCTURES USED IN SPATIAL DATA MINING

R+-tree (continued)

The root has at least two children unless it is a leaf.

All leaves are at the same level.

There is no constraint on the minimum number of entries in each node.

Page 17: DATA STRUCTURES USED IN SPATIAL DATA MINING

Figure: Division of space by an R+-tree

Page 18: DATA STRUCTURES USED IN SPATIAL DATA MINING

R*-tree

The R*-tree is a modification of the R-tree. The R-tree tries to minimize the area of the bounding rectangles of all nodes of the tree, whereas the R*-tree combines several criteria:

• The area covered by a bounding rectangle.
• The margin of a rectangle: minimizing the margin of a bounding rectangle favors square-like shapes.
• The overlap between rectangles: minimizing the overlap between rectangles decreases the number of paths that must be searched.
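The three criteria can be made concrete with a short sketch for two-dimensional rectangles. Here the margin is taken to be the perimeter, and the helper names area, margin, and overlap are assumptions of this example rather than part of any library.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]   # (xmin, ymin, xmax, ymax)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def margin(r: Rect) -> float:
    # Perimeter of the rectangle: the sum of the lengths of its four edges.
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))


def overlap(a: Rect, b: Rect) -> float:
    # Area shared by two rectangles, or 0 if they do not overlap.
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


square = (0, 0, 4, 4)
strip = (0, 0, 16, 1)
# Both rectangles have the same area, but the square has the smaller margin,
# which is why minimizing the margin favors square-like bounding rectangles.
```

Comparing the two rectangles shows why area alone is not enough: a long, thin strip covers the same area as the square but is crossed by many more query windows.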

Page 19: DATA STRUCTURES USED IN SPATIAL DATA MINING

Conclusion

New techniques are needed for spatial data mining (SDM) because of spatial autocorrelation and the continuity of space. The indexing structures discussed above are very useful for spatial data represented in vector space. For metric spaces, the M-tree, vp-tree, and mvp-tree are used. The main aim of all these indexing structures is to minimize disk accesses.

Page 20: DATA STRUCTURES USED IN SPATIAL DATA MINING

References

• http://en.wikipedia.org/wiki/Quadtree
• http://www.cs.umd.edu/~hjs/rtrees/index.html
• Spatial datamining.pdf
• http://www.dbminer.com
• R+-tree.pdf
• Data structure for spatial data mining21.pdf

Page 21: DATA STRUCTURES USED IN SPATIAL DATA MINING

THANK YOU

Page 22: DATA STRUCTURES USED IN SPATIAL DATA MINING

QUERIES?