Unit4 Database
Wang,X., ENGO351
Unit 5: Data Management
Unit 4: Data Management
http://www.datagovernance.com/
-
Data management: Outline
Definition of a database
Database management systems
Conceptual data modelling
The entity-relationship model
Logical data modelling
The relational database model
The standard query language (SQL)
-
Database
A database is a logical, structured collection of data about
things, their attributes and their relationships to each other
A database is a collection of related data.
Within a GIS, these things and relationships have a
spatial component
-
Spatial Data Examples
Examples of non-spatial data
Names, UCID, grades of a student
Examples of Spatial data
Census Data
NASA satellite imagery - terabytes of data per day
Weather and Climate Data
Rivers, Farms
Medical Imaging
Exercise: Identify spatial and non-spatial data items in
A phone book
A cookbook with recipes
-
Why use a database?
Data Redundancy and Inconsistency - The same piece of information may be duplicated in several different files.
Difficulty in Accessing Data: Conventional file processing environments do not allow data to be retrieved in a convenient and efficient manner.
Data Isolation: Data is scattered in a number of different files. Linking these data to provide new information may require a substantial amount of programming to answer each specific query.
Concurrent Access Problems: Multiple users attempting to access the same file may lead to problems.
Security Problems: Certain users should not be allowed to access certain data sets, nor be permitted to edit data that they do not own.
Integrity Problems: Some data values should satisfy certain consistency constraints.
-
Database management system
A database management system (DBMS) is a collection
of software programs that facilitates the processes of
defining, constructing, and manipulating the database for
various applications
A DBMS provides tools for data input, search, retrieval,
manipulation, and output
Most commercial GIS include database management
tools for local databases
e.g. the ArcGIS personal geodatabase is stored in a Microsoft Access file
-
Data model
A data model is a description or view of the real world and
data modeling is the process that formalizes the description
or view at different levels of data abstraction
The real-world is made up of complex spatial objects and
phenomena
A data model tends to be tailored to a specific application or
problem context
Different users may have different data models
-
Data models and levels of abstraction
Data modeling involves three steps corresponding to
decreasing levels of abstraction:
conceptual modeling
logical modeling
physical modeling
-
Conceptual model (1)
A conceptual model represents the user's perception of the real world
Data abstraction is strictly limited to the description of the information content of the user's view of the real world, without any concern for computer implementation
(Brown, 1997)
-
Conceptual model (2)
The function of a conceptual data model is to provide the necessary language to describe how we naturally conceptualize data organization
Although conceptual data modeling is part of the database design process, conceptual models are database-independent
A conceptual model provides a basic reference for users who need to understand the structure of the data in the system
-
Conceptual model (3)
A conceptual data model provides a way to communicate between users, designers and computers
(Worboys, 2004)
-
The entity-relationship model
The entity-relationship (E-R) data model is one of the most commonly-used conceptual data models for GIS data modelling
It is based on the concepts of:
entities
attributes
relationships
to represent real-world features, their properties and their relationships
-
The entity-relationship diagram
An entity-relationship diagram (E-R diagram) can be used to express the features and properties of an E-R model
(Lo and Yeung, 2007)
-
Entity
An entity is a thing or object in the real world that is distinguishable from all other objects
each person in a classroom is an entity
An entity has an independent existence
physical existence: a person, a car, a forest
conceptual existence: a company, a job, an agenda
Entities can be thought of as nouns.
Examples: a computer, an employee, a song, a mathematical theorem
Entities are represented as rectangles.
-
Relationships between entities
A relationship captures how two or more entities are related to one another, e.g. the relationship "performs" between the entities artist and song
Relationships can be thought of as verbs, linking two or more nouns
Relationships are represented as diamonds, connected by lines to each of the entities in the relationship.
-
Attributes
Entities and relationships can have attributes, e.g. an employee entity might have a SIN attribute
e.g. the entity City has the attribute types name, population, density
The attributes are associated with each occurrence of the entity type
Attributes are represented as ellipses connected to their
owning entity sets by a line.
-
Relationship types (1)
Relationship types are subdivided into three categories:
one-to-one (1:1)
many-to-one (N:1)
many-to-many (M:N)
Participation constraint: an entity occurrence may only exist if it participates in a relationship. Participation can be total (mandatory) or partial (optional)
e.g. one director is the manager of at most one cinema (a double line means participation is mandatory)
(Worboys, 2004)
Double line means B cannot exist without A. A can exist without B.
-
Relationship types (2)
In a many-to-one relationship, many occurrences of one entity type may have a relationship with at most one occurrence of another entity type, e.g. many cinemas are located in a town
(Worboys, 2004)
-
Relationship types (3)
In a many-to-many relationship, many occurrences of one entity type may have a relationship with many occurrences of another entity type, e.g. many roads connect many cities
(Worboys, 2004, p. 58)
-
Process of creating E-R model
1. Start with textual description
2. Identify entities (nouns)
3. Tabulate entities
4. Determine relationships
5. Determine cardinality ratio/participation constraints
6. Determine attributes & key attributes
-
An Example: World Database
Conceptual Model
3 Entities: Country, City, River
2 Relationships: capital-of, originates-in
Attributes listed in the figure
-
Logical model (1)
A logical data model represents the real world by means of diagrams, lists, and tables designed to reflect the recording of the data in terms of some formal language
Logical models are software dependent; they must be expressed in terms of the language of a specific database management system (e.g. the relational model)
-
Logical model (2)
This figure illustrates the logical model that was created from the conceptual model previously shown
(Brown, 1997)
-
The relational database model (1)
The relational database model,
developed by Codd in the early 1970s, is
the most widely used logical database
model in the computer industry
It represents the database as a collection
of tables (simple files) called relations.
-
The relational database model (2)
(Aronoff, 1993)
-
The relational database model (3)
Each table in the
database
represents an entity
type identified in
the data modeling
process
(Brown, 1997)
-
Column and tuple
In a particular table,
each column represents an attribute
each row, called a tuple (or record), represents a collection of
data values associated to the occurrence of an entity
(Lo and Yeung, 2007)
-
Relation
A relation can be simply thought of as a table of data
A relation is made up of a set of tuples
There can be any number of tuples in a relation
Within a relation, the logical order of the tuples is not
important
-
Domain
A domain is a set of attribute values associated to each
column of a relation
A domain is identified by an attribute name
There are occasions when the values of some attributes
within a particular tuple are unknown or missing
a special value, called null (which is not zero), is assigned
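The point that null is not zero can be checked with a small sketch. It assumes Python's built-in sqlite3 module as a stand-in DBMS; the table city and its rows are hypothetical placeholders, not data from the course.

```python
import sqlite3

# In-memory database; table and rows are hypothetical placeholders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO city VALUES (?, ?)",
    [("A", 0), ("B", None), ("C", 500)],  # Python None is stored as SQL NULL
)

# A population of zero is a known value and matches "= 0".
zero_rows = conn.execute(
    "SELECT name FROM city WHERE population = 0").fetchall()

# An unknown population is found with IS NULL, not with "= 0".
null_rows = conn.execute(
    "SELECT name FROM city WHERE population IS NULL").fetchall()

# Comparing with "= NULL" never matches, because null means "unknown".
eq_null_rows = conn.execute(
    "SELECT name FROM city WHERE population = NULL").fetchall()

print(zero_rows, null_rows, eq_null_rows)
```

City A (population 0) and city B (population unknown) are returned by different conditions, which is why null must be tested with IS NULL.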
-
Primary and foreign keys (1)
Tables, in a relational database, are connected to each
other using keys
A primary key represents one or more attributes (columns
in the table) whose values can uniquely identify a record in
a table
Its counterpart in another table for the purpose of linkage
is called a foreign key
http://www.datagovernance.com/
-
Primary and foreign keys (2)
Thus, a key common to two tables is used to establish connections between corresponding records in the tables
-
Primary and foreign keys (3)
Keys are a simple
way to connect
tables within a
relational database
-
Primary key
Attribute values of the primary key allow users to identify
individual tuples uniquely
e.g. SIN, an identification number
The primary key must not contain null values (Why?)
What if no single attribute in a relation can uniquely identify a tuple?
use one or more additional attributes to form a compound key
create an additional attribute to hold a unique identifier for each
occurrence of an entity
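The two options above (a compound key, or an added identifier attribute) can be sketched as follows. This is only an illustration, assuming Python's built-in sqlite3 module; the tables enrolment and person, their columns, and the second course code are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Option 1: compound key. Neither student_id nor course_code is unique
# on its own, but the pair identifies one tuple.
conn.execute("""
    CREATE TABLE enrolment (
        student_id  INTEGER NOT NULL,
        course_code TEXT    NOT NULL,
        grade       TEXT,
        PRIMARY KEY (student_id, course_code)
    )""")
conn.execute("INSERT INTO enrolment VALUES (1, 'ENGO351', 'A')")
conn.execute("INSERT INTO enrolment VALUES (1, 'ENGO999', 'B')")  # hypothetical code

# A second tuple with the same key pair is rejected.
try:
    conn.execute("INSERT INTO enrolment VALUES (1, 'ENGO351', 'C')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# A null in a key column is rejected (NOT NULL is declared explicitly).
try:
    conn.execute("INSERT INTO enrolment VALUES (NULL, 'ENGO351', 'C')")
    null_rejected = False
except sqlite3.IntegrityError:
    null_rejected = True

# Option 2: an extra attribute holding a generated unique identifier.
conn.execute("CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO person (name) VALUES ('John Smith')")
conn.execute("INSERT INTO person (name) VALUES ('John Smith')")  # same name, new id
person_ids = [row[0] for row in
              conn.execute("SELECT person_id FROM person ORDER BY person_id")]

print(duplicate_rejected, null_rejected, person_ids)
```

A null key value would leave a tuple with no usable identity, which is one way to answer the "Why?" above.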
-
Foreign key
A foreign key is an attribute in a relation that is a primary
key in another relation
The identical values of the primary and the foreign keys
make it possible to logically link different tuples in different
relations
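A minimal sketch of a foreign key linking two relations, assuming Python's built-in sqlite3 module. The simplified country and city tables are hypothetical, and the PRAGMA line is specific to SQLite, which only enforces foreign keys when asked to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK checks

# country.name is the primary key; city.country is the foreign key.
conn.execute("CREATE TABLE country (name TEXT PRIMARY KEY NOT NULL)")
conn.execute("""
    CREATE TABLE city (
        name    TEXT PRIMARY KEY NOT NULL,
        country TEXT NOT NULL REFERENCES country(name)
    )""")

conn.execute("INSERT INTO country VALUES ('Canada')")
conn.execute("INSERT INTO city VALUES ('Ottawa', 'Canada')")

# A city that points at a country which does not exist is rejected.
try:
    conn.execute("INSERT INTO city VALUES ('Atlantis', 'Nowhere')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Identical key values let the two tuples be linked logically.
linked = conn.execute("""
    SELECT city.name, country.name
    FROM city, country
    WHERE city.country = country.name""").fetchall()

print(orphan_rejected, linked)
```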
-
Join and Relate
Two common operations for linking tables in a relational
database are join and relate.
A join operation brings together two tables by using a
key that is common to both tables.
A relate operation temporarily connects two tables but
keeps the tables physically separate.
-
Figure 8.10 (Chang, 2012)
Primary key and foreign key provide the linkage to join the table on the
right to the feature attribute table on the left.
-
Figure 8.11 (Chang, 2012)
This example of a many-to-one relationship in the SSURGO database
relates three tree species in cotreestomng to the same soil component
in component.
-
Figure 8.12 (Chang, 2012)
This example of a one-to-many relationship in the SSURGO database
relates one soil map unit in mapunit to two soil components in
component.
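A join over a one-to-many link such as mapunit to component can be sketched as below. It assumes Python's built-in sqlite3 module; the column names and rows are simplified, illustrative stand-ins and not the real SSURGO schema or data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One map unit (the "one" side).
    CREATE TABLE mapunit (
        mukey  INTEGER PRIMARY KEY,
        muname TEXT
    );
    -- Two components pointing at that map unit (the "many" side).
    CREATE TABLE component (
        cokey    INTEGER PRIMARY KEY,
        mukey    INTEGER REFERENCES mapunit(mukey),
        compname TEXT
    );
    INSERT INTO mapunit VALUES (1, 'Map unit 1');
    INSERT INTO component VALUES (11, 1, 'Component a'), (12, 1, 'Component b');
""")

# The join brings the tables together on the common key; the single
# map unit row is repeated once for each matching component.
joined = conn.execute("""
    SELECT m.muname, c.compname
    FROM mapunit m
    JOIN component c ON c.mukey = m.mukey
    ORDER BY c.cokey""").fetchall()

print(joined)
```

The query result is a combined table, while mapunit and component themselves stay stored separately.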
-
Normalisation
Normalising is the splitting of a database into multiple tables.
Two main reasons for normalising a database:
it prevents unnecessary duplication of data, thus conserving time and disk space and, in some cases, preventing errors
it makes it easier to extract exactly the information required from the database
Each normal form builds on the normal form before it, e.g. if a table is in 3-NF it is also in 2-NF
-
Normal Forms
1-NF: attributes are atomic or single valued.
2-NF: All non-prime attributes are fully dependent on the primary key; no non-prime attribute is functionally determined by a subset of the key
3-NF: No non-prime attribute is functionally determined by another non-prime attribute; all non-prime attributes are directly dependent on the primary key.
Example: A company runs many projects. Employees work on particular tasks on different projects. How do we invoice the time
each employee allocates to their different tasks on each project?
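One possible normalised design for the invoicing example is sketched below, assuming Python's built-in sqlite3 module. The table names, columns, rates and hours are all hypothetical; the slide does not prescribe a particular schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Facts about an employee depend only on emp_id.
    CREATE TABLE employee (
        emp_id      INTEGER PRIMARY KEY,
        name        TEXT    NOT NULL,
        hourly_rate INTEGER NOT NULL
    );
    -- Facts about a project depend only on proj_id.
    CREATE TABLE project (
        proj_id INTEGER PRIMARY KEY,
        title   TEXT NOT NULL
    );
    -- Hours depend on the whole compound key (employee, project, task).
    CREATE TABLE work_log (
        emp_id  INTEGER NOT NULL REFERENCES employee(emp_id),
        proj_id INTEGER NOT NULL REFERENCES project(proj_id),
        task    TEXT    NOT NULL,
        hours   INTEGER NOT NULL,
        PRIMARY KEY (emp_id, proj_id, task)
    );
    INSERT INTO employee VALUES (1, 'Employee A', 50), (2, 'Employee B', 40);
    INSERT INTO project  VALUES (10, 'Project X'), (20, 'Project Y');
    INSERT INTO work_log VALUES
        (1, 10, 'survey', 3), (1, 10, 'mapping', 2),
        (1, 20, 'survey', 4), (2, 10, 'mapping', 5);
""")

# Invoice amount per project and employee: hours times hourly rate.
invoice = conn.execute("""
    SELECT p.title, e.name, SUM(w.hours * e.hourly_rate) AS amount
    FROM work_log w
    JOIN employee e ON e.emp_id = w.emp_id
    JOIN project  p ON p.proj_id = w.proj_id
    GROUP BY p.title, e.name
    ORDER BY p.title, e.name""").fetchall()

print(invoice)
```

Because the hourly rate is stored once per employee rather than on every work_log row, a rate change is made in one place only.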
-
Guidelines for relational database design
1. Create E-R model
2. Produce table for each entity
3. For 1:1 relationships,
add foreign key to either participating entity
For 1:M relationships,
add foreign key to entity on "Many" side of relationship
4. For N:M relationships, create a new table
5. Check design is "normalised"
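Guideline 4 (a new table for an N:M relationship) can be sketched with the "many roads connect many cities" example. This assumes Python's built-in sqlite3 module; the table connects and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE city (city_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE road (road_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- The N:M relationship becomes its own table holding two foreign keys.
    CREATE TABLE connects (
        road_id INTEGER NOT NULL REFERENCES road(road_id),
        city_id INTEGER NOT NULL REFERENCES city(city_id),
        PRIMARY KEY (road_id, city_id)
    );
    INSERT INTO city VALUES (1, 'City A'), (2, 'City B'), (3, 'City C');
    INSERT INTO road VALUES (1, 'Road 1'), (2, 'Road 2');
    INSERT INTO connects VALUES (1, 1), (1, 2), (2, 2), (2, 3);
""")

# One road connects many cities ...
cities_on_road_1 = [row[0] for row in conn.execute("""
    SELECT c.name FROM connects x
    JOIN city c ON c.city_id = x.city_id
    WHERE x.road_id = 1 ORDER BY c.name""")]

# ... and one city is connected by many roads.
roads_through_city_b = [row[0] for row in conn.execute("""
    SELECT r.name FROM connects x
    JOIN road r ON r.road_id = x.road_id
    WHERE x.city_id = 2 ORDER BY r.name""")]

print(cities_on_road_1, roads_through_city_b)
```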
-
Advantages of the relational model
A relational database is simple and flexible
Each table in the database can be prepared, maintained,
and edited separately from other tables
The tables can remain separate until a query or an
analysis requires that attribute data from different tables be
linked together
-
Physical model
A physical data model describes the physical storage of the data in the computer by record formats and access paths
It is hardware dependent and is concerned primarily with the implementation details of a database
It is intended for the system programmer and database manager, not the general user
(Lo and Yeung, 2007)
-
(Lo and Yeung, 2007)
-
SQL
SQL (Structured Query Language) is a data query
language designed for relational databases
SQL was developed by IBM in the 1970s, and many
commercial database management systems such as
Oracle, DB2, Access, and Microsoft SQL Server have since
adopted the query language
This is an English-like language that consists of a set of
powerful and flexible commands for the manipulation of the
data in the relational tables
-
Relational algebra
SQL provides a set of relational
operators and commands that can be combined to query a
database. These operators and commands are based on what is
called relational algebra
-
Relational operators (1)
With SQL, six comparison operators can be used:
equal to (=)
not equal to (<>)
less than (<)
greater than (>)
less than or equal to (<=)
greater than or equal to (>=)
-
Relational operators (2)
With SQL, three boolean operators can be used:
NOT
AND
OR
(Chang, 2006)
The shaded area represents:
the complement of A (NOT)
the union of A and B (OR)
the intersection of A and B (AND)
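The three boolean operators map directly onto WHERE conditions. The sketch below assumes Python's built-in sqlite3 module; the parcel table, and the choice of condition A (zone is 'res') and condition B (area above 200), are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parcel (id INTEGER PRIMARY KEY, zone TEXT, area INTEGER)")
conn.executemany(
    "INSERT INTO parcel VALUES (?, ?, ?)",
    [(1, "res", 100), (2, "res", 300), (3, "com", 300), (4, "com", 50)],
)

def ids(where):
    """Return the ids of parcels matching a WHERE condition, in id order."""
    sql = "SELECT id FROM parcel WHERE " + where + " ORDER BY id"
    return [row[0] for row in conn.execute(sql)]

# A: zone = 'res'      B: area > 200
a_and_b = ids("zone = 'res' AND area > 200")  # intersection of A and B
a_or_b = ids("zone = 'res' OR area > 200")    # union of A and B
not_a = ids("NOT zone = 'res'")               # complement of A

print(a_and_b, a_or_b, not_a)
```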
-
SQL commands (1)
SQL is used to create and query the database using a set
of SQL commands. e.g.
the create command creates databases and tables
the select command selects rows that have been inserted into the
tables
By using SQL commands, the user needs only to specify
the tables, columns, and row qualifiers to retrieve any data
item in the entire database
the users do not need to know the technical details of how the data are stored
-
SQL commands (2)
SQL commands are used to perform two main functions:
to define the database structure (data definition)
e.g., CREATE
to insert, modify, and retrieve data from the database (data
manipulation)
e.g. SELECT
-
CREATE command
To create a database:
CREATE DATABASE database_name
To create a table in a database:
CREATE TABLE table_name (column_name1 data_type,
column_name2 data_type, ....... )
CREATE TABLE ADDRESS_BOOK (
NAME CHAR (30),
COMPANY CHAR (20),
"E-MAIL" CHAR (25)
);
(the column name E-MAIL contains a hyphen, so it must be written in double quotes)
ADDRESS_BOOK
NAME COMPANY E-MAIL
-
INSERT and DELETE
To insert new rows into a table:
INSERT INTO table_name VALUES (value1, value2,....)
INSERT INTO River (Name, Origin, Length) VALUES ('Mississippi', 'USA', 6000);
To delete rows in a table:
DELETE FROM table_name WHERE column_name =
some_value
DELETE FROM City
WHERE Country = 'Canada';
NAME COMPANY E-MAIL
John Smith Travelcity [email protected]
INSERT INTO ADDRESS_BOOK (NAME, COMPANY, "E-MAIL")
VALUES ('John Smith', 'Travelcity', '[email protected]');
DELETE FROM ADDRESS_BOOK WHERE COMPANY = 'Travelcity';
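The ADDRESS_BOOK statements can be run end to end as a sketch, assuming Python's built-in sqlite3 module as the DBMS. The e-mail value is a placeholder, since the original address is not available in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: the hyphenated column name needs double quotes.
conn.execute("""
    CREATE TABLE ADDRESS_BOOK (
        NAME     CHAR(30),
        COMPANY  CHAR(20),
        "E-MAIL" CHAR(25)
    )""")

# Data manipulation: insert one row (text values are passed as parameters).
conn.execute(
    'INSERT INTO ADDRESS_BOOK (NAME, COMPANY, "E-MAIL") VALUES (?, ?, ?)',
    ("John Smith", "Travelcity", "john.smith@example.com"),  # placeholder e-mail
)
after_insert = conn.execute("SELECT * FROM ADDRESS_BOOK").fetchall()

# Delete every row whose COMPANY matches the qualifier.
conn.execute("DELETE FROM ADDRESS_BOOK WHERE COMPANY = 'Travelcity'")
after_delete = conn.execute("SELECT * FROM ADDRESS_BOOK").fetchall()

print(after_insert, after_delete)
```

Text values are quoted in SQL ('Travelcity'); without quotes the DBMS would read them as column names.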
-
SELECT command
The select command extracts data items in specified
rows of a table. It returns a new table that has a subset of
tuples of the original.
SELECT * FROM table_name
SELECT column_name(s) FROM table_name
SELECT column_name(s) FROM table_name(s) WHERE conditions
GROUP BY column_name
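The SELECT forms listed above, including GROUP BY, can be tried on a toy table. The sketch assumes Python's built-in sqlite3 module; the City rows are made-up placeholders, not the course's World database values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE City (Name TEXT, Country TEXT, Pop INTEGER)")
conn.executemany(
    "INSERT INTO City VALUES (?, ?, ?)",
    [("City A1", "Country A", 5),
     ("City A2", "Country A", 3),
     ("City B1", "Country B", 2)],
)

# SELECT * returns every column of every row.
all_rows = conn.execute("SELECT * FROM City").fetchall()

# Named columns plus a WHERE condition return a subset of the table.
big_cities = conn.execute(
    "SELECT Name FROM City WHERE Pop > 2 ORDER BY Name").fetchall()

# GROUP BY collapses rows that share a value into one summary row.
per_country = conn.execute("""
    SELECT Country, COUNT(*), SUM(Pop)
    FROM City
    GROUP BY Country
    ORDER BY Country""").fetchall()

print(len(all_rows), big_cities, per_country)
```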
-
World database data tables
-
SELECT Example 1.
Simplest Query has SELECT and FROM clauses
Query: List all the cities and the country they belong to.
SELECT Name, Country
FROM CITY
Result
-
SELECT Example 2.
Commonly 3 clauses (SELECT, FROM, WHERE) are used
Query: List the names of the capital cities in the CITY table.
SELECT Name
FROM CITY
WHERE CAPITAL = 'Y'
To return all columns instead:
SELECT *
FROM CITY
WHERE CAPITAL = 'Y'
Result
-
SELECT Example 3
Query: List the attributes of countries in the Country
relation where the life-expectancy is less than seventy
years.
SELECT Co.Name, Co."Life-Exp"
FROM Country Co
WHERE Co."Life-Exp" < 70
-
Multi-table Query Examples
Query: List the capital cities and populations of
countries whose GDP exceeds one trillion dollars.
Note: Tables City and Country are joined by matching City.Country = Country.Name. The threshold 1000.0 implies that GDP is stored in billions of dollars.
SELECT Ci.Name, Co.Pop
FROM City Ci, Country Co
WHERE Ci.Country = Co.Name
AND Co.GDP > 1000.0
AND Ci.Capital = 'Y'
-
Multi-table Query Example
Query: What is the name and population of the capital city
in the country where the St. Lawrence River originates?
SELECT Ci.Name, Ci.Pop
FROM City Ci, Country Co, River R
WHERE R.Origin = Co.Name
AND Co.Name = Ci.Country
AND R.Name = 'St. Lawrence'
AND Ci.Capital = 'Y'
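The single-table and multi-table queries can be exercised together on a tiny stand-in for the World database. This is a sketch assuming Python's built-in sqlite3 module; the column layout is a guess at the tables in the figure, and every row (including which country the river originates in) is a made-up placeholder, not real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Country (Name TEXT PRIMARY KEY, Pop INTEGER,
                          GDP REAL, "Life-Exp" REAL);
    CREATE TABLE City (Name TEXT PRIMARY KEY, Country TEXT,
                       Pop INTEGER, Capital TEXT);
    CREATE TABLE River (Name TEXT PRIMARY KEY, Origin TEXT, Length INTEGER);

    -- Placeholder rows only.
    INSERT INTO Country VALUES ('Country A', 30, 1500.0, 78.0),
                               ('Country B', 10, 400.0, 65.0);
    INSERT INTO City VALUES ('City A1', 'Country A', 5, 'Y'),
                            ('City A2', 'Country A', 3, 'N'),
                            ('City B1', 'Country B', 2, 'Y');
    INSERT INTO River VALUES ('St. Lawrence', 'Country A', 1000),
                             ('River 2', 'Country B', 500);
""")

# Countries with a life expectancy below seventy years.
low_life_exp = conn.execute("""
    SELECT Co.Name, Co."Life-Exp"
    FROM Country Co
    WHERE Co."Life-Exp" < 70""").fetchall()

# Capitals and populations of countries with GDP above 1000.0.
rich_capitals = conn.execute("""
    SELECT Ci.Name, Co.Pop
    FROM City Ci, Country Co
    WHERE Ci.Country = Co.Name
      AND Co.GDP > 1000.0
      AND Ci.Capital = 'Y'""").fetchall()

# Capital of the country where the St. Lawrence originates (three tables).
river_capital = conn.execute("""
    SELECT Ci.Name, Ci.Pop
    FROM City Ci, Country Co, River R
    WHERE R.Origin = Co.Name
      AND Co.Name = Ci.Country
      AND R.Name = 'St. Lawrence'
      AND Ci.Capital = 'Y'""").fetchall()

print(low_life_exp, rich_capitals, river_capital)
```

In each multi-table query the WHERE clause does two jobs: the key-matching conditions join the tables, and the remaining conditions filter the joined rows.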