1
Lecture 5:
GIS Data Management
GE 118: INTRODUCTION TO GIS
Engr. Meriam M. Santillan
Caraga State University
2
File Structures
(File-based datasets)
Simple list
Ordered sequential files
Indexed files
3
Simple List
Simplest file structure
Unordered/unstructured
Arrangement is by whichever comes first
4
Ordered Sequential Files
Simple lists that are arranged according to
some order (e.g., alphabetical order)
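A minimal sketch of why ordering matters, assuming a small in-memory list of place names chosen only for illustration. An unordered simple list has to be scanned record by record, while an ordered sequential file can be searched by repeatedly halving the range. The helper names linear_search and binary_search are hypothetical.

```python
from bisect import bisect_left

# Illustrative records in arrival order (a "simple list").
simple_list = ["Surigao", "Butuan", "Bislig", "Tandag", "Cabadbaran"]

# The same records arranged alphabetically (an "ordered sequential file").
ordered_file = sorted(simple_list)


def linear_search(records, target):
    """Scan every record in turn; return its position or -1."""
    for position, record in enumerate(records):
        if record == target:
            return position
    return -1


def binary_search(sorted_records, target):
    """Halve the search range at each step; the input must be sorted."""
    position = bisect_left(sorted_records, target)
    if position < len(sorted_records) and sorted_records[position] == target:
        return position
    return -1
```

The linear scan works on either list; the binary search is only valid on the ordered one.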
5
Indexed Files
An index to the directory is needed for more
efficient searches involving finding entries
given certain criteria
Can be developed as direct files or inverted
files
6
Direct Indexed Files
Records are used to provide access to other
pertinent information
7
Inverted (Indirect) Indexed Files
Index is based on possible search criteria,
not on the entities themselves
Attributes are the primary search criteria and
the entities rely on them for selection
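As a rough sketch of an index built on search criteria rather than on the entities, the example below maps each attribute value to the set of entities that have it. The parcel identifiers, attribute names, and the function build_inverted_index are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical entities (parcels) with their attributes.
parcels = {
    "P1": {"land_use": "residential", "soil": "clay"},
    "P2": {"land_use": "commercial", "soil": "loam"},
    "P3": {"land_use": "residential", "soil": "loam"},
}


def build_inverted_index(entities):
    """Map each (attribute, value) pair to the set of entity ids having it."""
    index = defaultdict(set)
    for entity_id, attributes in entities.items():
        for name, value in attributes.items():
            index[(name, value)].add(entity_id)
    return index


inverted = build_inverted_index(parcels)

# Selection by attributes: intersect the entity sets of two criteria.
residential_on_loam = (
    inverted[("land_use", "residential")] & inverted[("soil", "loam")]
)
```

Each query starts from the attribute values and arrives at the entities, without scanning the parcel records themselves.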
8
Database
An integrated set of data on a particular
subject
Collection of interrelated data stored
together with controlled redundancy to
serve one or more applications in an
optimal fashion
Requires more elaborate structure
called a database structure or
database management system
9
Significance of Database
Most GIS activities consist of storing entity and
attribute data so that we can retrieve any
combination of these objects.
Each graphical feature must be stored explicitly with
its attributes so that their combined search becomes
faster.
10
Advantages of Database over
File-based datasets
Collecting data at a single location reduces
redundancy and duplication
Lower maintenance cost due to better organization
and decreased data duplication
Multiple applications can use the same data and can
evolve separately over time
11
Advantages of Database over
File-based datasets
User knowledge can be transferred between applications more easily because database remains constant
Facilitated data sharing, with a corporate view provided to data managers and users
Security and standards for data and data access can be established and enforced
12
Database Management System
A software application designed to organize the efficient and effective storage of, and access to, data
A suite of software programs designed to store, retrieve and manipulate data within a database
13
Types of Database Structure
1. Hierarchical Data Structures
2. Network Systems
3. Relational Database Structures
14
Hierarchical Data Structure
‘one-to-many’ or ‘parent-child’ relationship
Implies that each element has a direct relationship
to a number of symbolic children
Each child is capable of having the same direct
relationship with its own offspring, and so on.
15
Hierarchical Data Structure
16
Hierarchical Data Structure
Advantages:
Simple and straightforward data access since parent
and children are directly linked
Easy to search since structure is well defined
Relatively easy to expand by adding new branches
and formulating new decision rules
17
Hierarchical Data Structure
Disadvantages:
Confined to queries along one branch only
Difficult restructuring to allow other possible search
criteria
Creates large index files
Redundant entries for searching
18
Network Systems
‘many-to-many’ relationship
Each data item can be linked directly to any
other item in the database using pointers,
without the parent-child relationship.
19
Network Systems
20
Network Systems
Advantages:
Less rigid compared to hierarchical structure
Can handle many-to-many relationships
Allows much greater flexibility
Reduced redundancy of data
21
Network Systems
Disadvantages:
In very complex GIS, the number of pointers can become large, thus requiring a lot of storage space
Linkages between data must still be explicitly defined using pointers
Numerous possible linkages can become extremely tangled, resulting in confusion and incorrect linkages
Not recommended for novice users
22
Relational Database
Management Systems
(RDBMS)
Data are stored as ordered records or rows of attribute values called tuples
Tuples that share the same attributes are grouped into two-dimensional tables called relations
Each column represents data for a single attribute for the entire dataset
23
Relational Database
Management Systems
(RDBMS)
Primary key – a column (or set of columns) whose values
uniquely identify each row and serve as the search criterion
Foreign key – a column in a second table that holds
primary key values of the first table, linking the two
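The key relationship can be sketched with Python's built-in sqlite3 module. The owner and parcel tables, their columns, and the sample values are hypothetical. SQLite only enforces foreign keys when the foreign_keys pragma is switched on, which is done here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default

# owner_id is the primary key of the first table.
conn.execute(
    "CREATE TABLE owner (owner_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
# parcel.owner_id is a foreign key pointing at owner.owner_id.
conn.execute(
    """CREATE TABLE parcel (
           parcel_id INTEGER PRIMARY KEY,
           area_sqm  REAL,
           owner_id  INTEGER REFERENCES owner(owner_id)
       )"""
)
conn.execute("INSERT INTO owner VALUES (1, 'Cruz')")
conn.execute("INSERT INTO parcel VALUES (10, 250.0, 1)")

# A parcel that refers to a non-existent owner should be refused.
try:
    conn.execute("INSERT INTO parcel VALUES (11, 300.0, 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

In this sketch the second insert fails because no owner row has owner_id 99.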
24
Relational Database
Management Systems
(RDBMS)
Normal forms – set of rules to indicate the
forms that the tables should take
1. First Normal Form
2. Second Normal Form
3. Third Normal Form
25
First Normal Form
Table must contain columns and
rows
Because the columns are to be
used as search keys, there should
only be a single value in each row
location
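A small sketch of the single-value rule, assuming a hypothetical table in which one cell lists several crops. The function to_first_normal_form is illustrative and simply emits one row per value.

```python
# A hypothetical table that breaks the rule: "crops" holds several values.
unnormalized = [
    {"parcel_id": 1, "crops": "rice, corn"},
    {"parcel_id": 2, "crops": "coconut"},
]


def to_first_normal_form(rows):
    """Emit one row per crop so every cell holds a single value."""
    normalized = []
    for row in rows:
        for crop in row["crops"].split(","):
            normalized.append(
                {"parcel_id": row["parcel_id"], "crop": crop.strip()}
            )
    return normalized


first_nf = to_first_normal_form(unnormalized)
```

After the split, a search for a single crop value can match a cell exactly instead of parsing a list inside it.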
26
Second Normal Form
Requires that every column that is
not a primary key be totally
dependent on the primary key
Simplifies the tables
Reduces redundancy by imposing the
restriction that each column be only
searchable using the primary key
27
Third Normal Form
States that columns that are not primary keys must “depend” on the primary key, whereas the primary key does not depend on the nonprimary key
Primary key must be used to find other columns
But the other columns are not needed to search for values in the primary key column
Idea is to reduce redundancy
28
Relational Database
Management Systems
(RDBMS)
Advantages:
Allow us to collect data in reasonably simple tables, keeping organization also simple
Capable of doing relational joins, as long as there is at least one column common to the tables to be joined
Allows greatest flexibility, both in design and querying
29
Data Storage in a DBMS
Object classes/layers are stored in database tables
Each layer is stored as a single database table in a database management system
Rows contain objects, while columns contain attributes/properties of the objects
30
Data Storage in a DBMS
Geographic database tables have a geometry column (or shape column), which non-geographic tables don’t have
31
Basic Database
Functions/Operations
Join
Tables are joined together using common row/column
values or keys
After joining two or more tables, a new table is created
which contains all the values of the joined tables
Database tables can be joined together to create new
relations, or views of the database.
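The join described above can be sketched with sqlite3, assuming two hypothetical tables that share a zone_code column. The join result is exposed as a view named parcel_zone, which is an illustrative name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE parcel (parcel_id INTEGER PRIMARY KEY, zone_code TEXT);
    CREATE TABLE zone (zone_code TEXT PRIMARY KEY, description TEXT);
    INSERT INTO parcel VALUES (1, 'R1'), (2, 'C1'), (3, 'R1');
    INSERT INTO zone VALUES ('R1', 'Residential'), ('C1', 'Commercial');

    -- Join on the common column and keep the result as a view.
    CREATE VIEW parcel_zone AS
        SELECT p.parcel_id, p.zone_code, z.description
        FROM parcel AS p
        JOIN zone AS z ON z.zone_code = p.zone_code;
    """
)

rows = conn.execute(
    "SELECT parcel_id, description FROM parcel_zone ORDER BY parcel_id"
).fetchall()
```

The view combines columns from both tables while the two source tables remain unchanged.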
32
Basic Database
Functions/Operations
Link
Tables are linked using common row/column values or
keys
Unlike in joining, linking tables does not result in a new
table. The original tables are retained but accessing one
enables the user to also access a table linked to it
33
Database Design
Involves three stages: conceptual, logical,
and physical
Involves six practical steps (see Figure)
34
Stages of Database Design
Conceptual Model: User View, Objects and Relationships, Geographic Representation
Logical Model: Geographic Database Types, Geographic Database Structure
Physical Model: Database Schema
35
Conceptual Model
Steps involved are:
1. Model the user’s view
2. Define objects and their relationships
3. Select geographic representation
36
Model the User’s View
Identifying organizational functions, determining data requirements of these functions, organizing data into groups for data management
May be presented using a report with tables
37
Define Objects and Their
Relationships
Specification of object types/classes and
functions, and their relationships
May be presented using diagrams
38
Select Geographic
Representation
Choosing between the types of discrete objects (point, line, or polygon) or field to represent the data
Selection has a critical impact on the database use
Although it is possible to switch between representations later on, doing so can be computationally expensive and often leads to information loss
39
Logical Model
Steps involved are:
1. Match to geographic database types
2. Organize geographic database structure
40
Match to Geographic Database
Types
Matching of object types to be studied to
specific data types supported by the GIS
41
Organize Geographic Database
Structure
Defining topological associations, specifying
rules and relationships, and assigning
coordinate systems
42
Physical Model
Step involved is:
1. Define database schema
Definition of the actual physical database
schema that will hold the database data values
Usually created using the DBMS software’s data
definition language (e.g., SQL)
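A schema definition might look like the following sketch, which runs an SQL CREATE TABLE statement through sqlite3. The road table and its columns are hypothetical. Storing the geometry as a well-known text (WKT) string is an assumption made to keep the example self-contained; a spatial DBMS would normally provide a dedicated geometry type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: one table for one layer, with a geometry column.
conn.execute(
    """
    CREATE TABLE road (
        road_id  INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        surface  TEXT,
        geometry TEXT NOT NULL  -- WKT string; an assumption of this sketch
    )
    """
)

# One row per object; the columns hold its attributes.
conn.execute(
    "INSERT INTO road VALUES (?, ?, ?, ?)",
    (1, "Hypothetical Road", "asphalt", "LINESTRING (0 0, 10 5)"),
)

# List the column names that the schema now defines.
columns = [row[1] for row in conn.execute("PRAGMA table_info(road)")]
```

The columns list reflects the schema exactly as it was declared in the CREATE TABLE statement.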
43
Database
Organization/Structuring
Necessary for efficient query, analysis, and
mapping
44
Structuring Techniques
1. Topologic Creation
2. Indexing
45
Topologic Creation
Can be created for vector data using either batch or interactive techniques
Batch Topology – for CAD, survey, simple feature, and other unstructured vector data; an iterative process
Interactive Topology – performed dynamically at the time objects are added to the database
46
Indexing
Can help speed up certain types of queries
Three main indexing methods in GIS are grid indexes, quadtrees, and R-trees.
Database index – a special representation of information about objects that improves searching
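Of the three methods named above, the grid index is the simplest to sketch. The example assumes point features, a fixed cell size of 10 units, and made-up coordinates. The function names cell_of, build_grid_index, and query_box are hypothetical.

```python
from collections import defaultdict

CELL_SIZE = 10.0  # hypothetical cell width, in map units

# Made-up point features: id -> (x, y).
points = {"A": (2.0, 3.0), "B": (12.5, 4.0), "C": (14.0, 18.0), "D": (3.5, 8.0)}


def cell_of(x, y):
    """Return the (column, row) of the grid cell containing a coordinate."""
    return (int(x // CELL_SIZE), int(y // CELL_SIZE))


def build_grid_index(features):
    """Group feature ids by the grid cell they fall in."""
    grid = defaultdict(list)
    for feature_id, (x, y) in features.items():
        grid[cell_of(x, y)].append(feature_id)
    return grid


def query_box(grid, features, xmin, ymin, xmax, ymax):
    """Check only the cells overlapping the box, then test exact coordinates."""
    hits = []
    col0, row0 = cell_of(xmin, ymin)
    col1, row1 = cell_of(xmax, ymax)
    for col in range(col0, col1 + 1):
        for row in range(row0, row1 + 1):
            for feature_id in grid.get((col, row), []):
                x, y = features[feature_id]
                if xmin <= x <= xmax and ymin <= y <= ymax:
                    hits.append(feature_id)
    return sorted(hits)


grid = build_grid_index(points)
```

A box query then touches only the candidate cells instead of every feature in the layer.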
47
Thank you!