Chapter-3: Data Base Development NPTEL Web Course (12th Aug. 2011)

Course Title: Geo-informatics in Transportation Engineering Course Co-ordinator: Dr. Ashish Verma, IISc Bangalore

Chapter 3

Data Base Development

Key words: DBMS, RDBMS, E-R diagrams, database design.

3.1 Data Base Management System

A database management system (DBMS), sometimes just called a database manager, is a

program that lets one or more computer users create and access data in a database. The

DBMS manages user requests (and requests from other programs) so that users and other

programs are free from having to understand where the data is physically located on storage

media, and in a multi-user system, who else may also be accessing the data. In handling user

requests, the DBMS ensures the integrity of the data (that is, making sure it continues to be

accessible and is consistently organized as intended) and security (making sure only those

with access privileges can access the data). In other words, a DBMS is a software

package that supports the use of integrated collections of data records and files known as

databases. It allows different user application programs to easily access the same database,

and lets users and other software store and retrieve data in a structured way.

3.2 Types of Data Base Management Systems

There are three main types of Database Management Systems (DBMS) and these types are

based upon their management of database structures. In other words, the types of DBMS are

entirely dependent upon how the database is structured by that particular DBMS. The types

of DBMS are:

3.2.1 Hierarchical DBMS: "A DBMS is said to be hierarchical if the relationships among

data in the database are established in such a way that one data item is present as the

subordinate of another one". Here subordinate means that items have 'parent-child'

relationships among them. Direct relationships exist between any two records that are stored

consecutively. The DBMS uses the "tree" data structure to organize the database.

Navigation runs only from parent to child; no backward movement is allowed in a hierarchical

database. Hierarchical systems such as IBM's IMS are among the oldest DBMSs and are rarely

used for new applications nowadays.
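
The parent-child structure described above can be sketched as a small tree. This is only an illustration in Python, not the storage format of any particular hierarchical DBMS; the Record class and the sample records are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Record:
    """A record in a hierarchical database: one parent, any number of children."""
    name: str
    children: List["Record"] = field(default_factory=list)

    def add_child(self, name: str) -> "Record":
        child = Record(name)
        self.children.append(child)
        return child


def walk(record: Record, depth: int = 0) -> Iterator[str]:
    """Visit records top-down, from parent to child only."""
    yield "  " * depth + record.name
    for child in record.children:
        yield from walk(child, depth + 1)


# Hypothetical sample data: a state owns districts, a district owns roads.
root = Record("State")
district = root.add_child("District 1")
district.add_child("Road A")
district.add_child("Road B")
root.add_child("District 2")

for line in walk(root):
    print(line)
```

Because a child holds no reference to its parent, reaching "Road A" always requires starting from the root, which mirrors the one-way navigation of the hierarchical model.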

3.2.2 Network DBMS: "A DBMS is said to be a Network DBMS if the relationships among

data in the database are of type many-to-many". These many-to-many relationships

appear in the form of a network. The structure of a network database is therefore extremely

complicated, because in these many-to-many relationships one record can be used as

a key of the entire database. A network database is structured in the form of a graph, which is

also a data structure. Although the structure of such a DBMS is highly complicated, it

has only two basic elements: records, and sets that designate the many-to-many relationships.

High-level languages such as Pascal, COBOL and FORTRAN were mainly used to

implement the record and set structures.

3.2.3 Relational DBMS: "A DBMS is said to be a Relational DBMS or RDBMS if the

database relationships are treated in the form of a table". A table composed of rows

and columns is used to organize the database and its structure; conceptually it is a

two-dimensional array. A number of RDBMSs are available; among the most popular

are Oracle, Sybase, Ingres, Informix, Microsoft SQL Server and Microsoft Access.

The most typical DBMS is a relational database management system (RDBMS). A standard

user and program interface is the Structured Query Language (SQL). A newer kind of DBMS

is the Object-Oriented Database Management System (ODBMS).
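
As a minimal sketch of a relational table queried through SQL, the following uses Python's built-in sqlite3 module with an in-memory database. The table name, column names and rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE road (road_id INTEGER PRIMARY KEY, name TEXT, lanes INTEGER)"
)
conn.executemany(
    "INSERT INTO road (road_id, name, lanes) VALUES (?, ?, ?)",
    [(1, "Oak Avenue", 2), (2, "Ring Road", 4), (3, "Mill Lane", 1)],
)

# SQL states what is wanted; the DBMS decides how to locate the rows.
wide_roads = conn.execute(
    "SELECT name FROM road WHERE lanes >= 2 ORDER BY name"
).fetchall()
print(wide_roads)
```

Each row of the table is one record and each column one attribute; the query never refers to where the data are physically stored.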

Examples of DBMS:

A DBMS can be thought of as a file manager that manages data in databases rather than files

in file systems. In IBM's mainframe operating systems, the nonrelational data managers were

(and are, because these legacy application systems are still used) known as access methods.

A DBMS is usually an inherent part of a database product. On PCs, Microsoft Access is a

popular example of a single- or small-group user DBMS. Microsoft's SQL Server is an

example of a DBMS that serves database requests from multiple (client) users. Other popular

DBMSs (these are all RDBMSs, by the way) are IBM's DB2, Oracle's line of database

management products, and Sybase's products. IBM's Information Management System (IMS)

was one of the first DBMSs. A DBMS may be used by or combined with transaction

managers, such as IBM's Customer Information Control System (CICS).

3.3 Database Domains and Transaction

Table 3.1 highlights the multifaceted nature of transportation data (Fletcher 1987).

First, transportation entities have obvious physical descriptions but can also have logical

relationships with other transportation entities. Second, entities exist both in the real world

and in the database or virtual world. The relationships between the physical and logical

realms are often one-to-many, creating database design complexities.

Table 3.1: GIS-T modelling transformations

           Logical                       Physical

Real       Legal definitions             Actual facilities
           - Route                       - Highways
           - State trunk network         - Roads
           - County trunk network        - Interchanges
           - Street network              - Intersections
           - Political boundary          - Transit terminals, stops

Virtual    Data structures               Data values
           - Networks                    - Lines
           - Chains                      - Points
           - Links                       - Polylines
           - Nodes                       - Polygons
           - Lattices                    - Attributes

The real/physical mode corresponds to transportation facilities as constructed and used in the

real world (e.g., physical facilities such as highways, intersections and interchanges). Also in

the real world are real/logical or legally defined transportation entities such as state and

federal routes. The relationships between real/physical entities and real/logical entities are

often one-to-many. These one-to-many relationships occur in both directions. For example,

two state routes may share the same physical highway. Conversely, a state route can (and

often will) traverse several physical streets in an urban area.
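
A one-to-many relationship that holds in both directions is usually stored as a separate linking table. The sketch below, using Python's sqlite3 module, is one possible layout; the table names and the route and highway identifiers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE route   (route_id   TEXT PRIMARY KEY);  -- real/logical entity
    CREATE TABLE highway (highway_id TEXT PRIMARY KEY);  -- real/physical entity
    -- One row per (route, highway) pairing, so either side may repeat.
    CREATE TABLE route_highway (
        route_id   TEXT NOT NULL REFERENCES route(route_id),
        highway_id TEXT NOT NULL REFERENCES highway(highway_id),
        PRIMARY KEY (route_id, highway_id)
    );
""")
conn.executemany("INSERT INTO route VALUES (?)", [("SR-1",), ("SR-2",)])
conn.executemany(
    "INSERT INTO highway VALUES (?)", [("H-10",), ("Main St",), ("Oak St",)]
)
conn.executemany(
    "INSERT INTO route_highway VALUES (?, ?)",
    [("SR-1", "H-10"), ("SR-2", "H-10"), ("SR-1", "Main St"), ("SR-1", "Oak St")],
)


def routes_on(highway_id):
    """All legal routes that share one physical highway."""
    rows = conn.execute(
        "SELECT route_id FROM route_highway WHERE highway_id = ? ORDER BY route_id",
        (highway_id,),
    )
    return [r for (r,) in rows]


def highways_of(route_id):
    """All physical facilities traversed by one legal route."""
    rows = conn.execute(
        "SELECT highway_id FROM route_highway WHERE route_id = ? ORDER BY highway_id",
        (route_id,),
    )
    return [h for (h,) in rows]


print(routes_on("H-10"))
print(highways_of("SR-1"))
```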



Virtual/logical entities correspond to data structures such as nodes, links, networks and

polygons. Virtual/physical entities correspond to geometric and attribute data associated with

the transportation entity. This latter data is often the information displayed graphically by the

GIS. A one-to-many relationship occurs when two or more network links correspond to the

same graphical line when displaying the network at a given scale (e.g., displaying a two-way

street represented logically by two directed arcs as a single cartographic line at small map

scales). Also, several cartographic lines can represent one link (e.g., displaying modal-

specific flow in a network link).

3.4 RDBMS and Entity Relationship (ER) diagram

An RDBMS may be defined as a DBMS in which data is stored in the form of tables and the

relationship among the data is also stored in the form of tables. It is a system for storing and

working with large databases. Instead of records being stored in some sort of linked list of

free-form records, tables of fixed-length records are used. A linked-list system would be very

inefficient when storing "sparse" databases where some of the data for any one record could

be left empty. The relational model solved this by splitting the data into a series of

normalized tables, with optional elements being moved out of the main table to where they

would take up room only if needed.

For instance, a common use of a database system is to track information about users, their

name, login information, various addresses and phone numbers. In the navigational approach

all of these data would be placed in a single record, and unused items would simply not be

placed in the database. In the relational approach, the data would be normalized into a user

table, an address table and a phone number table (for instance). Records would be created in

these optional tables only if the address or phone numbers were actually provided.
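
The user, address and phone number example can be sketched with Python's sqlite3 module. The table and column names are hypothetical, and the login name is assumed to be unique so that it can link the tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users     (login TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE addresses (login TEXT NOT NULL REFERENCES users(login), address TEXT);
    CREATE TABLE phones    (login TEXT NOT NULL REFERENCES users(login), number TEXT);
""")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("asha", "Asha Rao"), ("ravi", "Ravi Kumar")],
)
# Rows exist in the optional tables only where data were actually provided.
conn.execute("INSERT INTO addresses VALUES (?, ?)", ("asha", "12 Oak Avenue"))
conn.executemany(
    "INSERT INTO phones VALUES (?, ?)",
    [("asha", "555-0100"), ("asha", "555-0101")],
)


def phones_for(login):
    """Find a user's phone numbers by searching the optional table for the key."""
    rows = conn.execute(
        "SELECT number FROM phones WHERE login = ? ORDER BY number", (login,)
    )
    return [number for (number,) in rows]


print(phones_for("asha"))
print(phones_for("ravi"))
```

The user "ravi" occupies no space in the address or phone tables, which is the saving that normalization provides for sparse data.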

Linking the information back together is the key to this system. In the relational model, some

bit of information is used as a "key", uniquely defining a particular record (Figure-3.1). When

information is being collected about a user, information stored in the optional (or related)

tables would be found by searching for this key. For instance, if the login name of a user is

unique, addresses and phone numbers for that user would be recorded with the login name as

its key. The E-R model treats data as consisting of entities and relationships among entities.

In addition, entities exhibit properties or attributes. Although many conceptual data

modelling techniques exist, by far the most common are entity-relationship (E-R) models.

These are also referred to as E-R diagrams. As the term diagram implies, the techniques are

predominantly graphical devices. This allows easy communication of fundamental data

properties to users.

Entities exist in the real world; they can be physical (e.g., a person, a house, or a

project) or conceptual (e.g., a job or a project). An entity type is an aggregation of

particular entities with the same attributes.

[...] respect to the database. A relationship describes the set of associations among different entity

types.

Figure 3.2 summarizes a simplified graphical notation set for E-R modelling (also refer to

figure 3.3 for example E/R diagram). A box labelled with the entity name represents the

entity type. Each entity type can have several attributes. E/R diagrams indicate the number

of entities involved in the relationship by the graphical notations at each end of the line

connecting the entity types. The number of entities involved in a relationship can be exactly

one, one or more (i.e., never zero), zero or one, or zero or more. We can also qualify the

relationships by specifying the types of relationship. Two methods are available. One method

uses a diamond shaped polygon containing a label describing the relationship types. This

method allows us to list attributes associated with relationship types. The second, simpler

method labels the line connecting the two entity types with the relation type. This does not

allow attributes to be associated with the relationship type. E/R diagrams can carry much greater

detail when a more complete graphical notation set is used.

Figure-3.1: Related Table of RDBMS



Figure 3.3 provides a simple example of an E-R diagram. The diagram illustrates the

entities Land, Parcel and Street Address, presumably with the intention of building a cadastral

database. The E-R diagram captures the following relationships:

1. A Parcel may have one Land.

2. A Land must have one Parcel.

3. A Parcel may have zero or more Street Addresses, for example, over time or if a

parcel is a corner lot.

4. A Street Address may relate to one Parcel.

Figure 3.2: Simplified E-R diagram notation



Figure 3.3: E-R diagram Example

These entities and relationships lead to the following implications:

1. Not all parcels include land; for example, a condominium may not include

ownership of land. This is why street address is related to parcel rather than to

land.

2. A parcel record must exist before a land record can be attached to it.

3. Not all parcels have a Street address. This allows a parcel to be created before

assigning an address.

4. Multiple street addresses may be assigned to a single parcel; for example, multiple

buildings on a single parcel may have different addresses.

As this example illustrates, an E-R diagram can summarize a large amount of information in

a clear and easily communicated manner.
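
The cardinalities of the cadastral example can be expressed as table constraints. The following sketch uses Python's sqlite3 module; the table and column names are hypothetical, and this is one possible mapping of the diagram rather than a prescribed one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE parcel (parcel_id INTEGER PRIMARY KEY);
    -- Land must have exactly one parcel (NOT NULL); a parcel has at most one land (UNIQUE).
    CREATE TABLE land (
        land_id   INTEGER PRIMARY KEY,
        parcel_id INTEGER NOT NULL UNIQUE REFERENCES parcel(parcel_id)
    );
    -- A street address may relate to one parcel, so parcel_id is allowed to be NULL.
    CREATE TABLE street_address (
        address_id INTEGER PRIMARY KEY,
        address    TEXT NOT NULL,
        parcel_id  INTEGER REFERENCES parcel(parcel_id)
    );
""")
conn.executemany("INSERT INTO parcel (parcel_id) VALUES (?)", [(1,), (2,)])
conn.execute("INSERT INTO land (land_id, parcel_id) VALUES (1, 1)")
# Parcel 1 is a corner lot with two addresses; parcel 2 has no land and no address.
conn.executemany(
    "INSERT INTO street_address (address, parcel_id) VALUES (?, ?)",
    [("1 Oak Avenue", 1), ("2 Elm Street", 1)],
)


def addresses_of(parcel_id):
    rows = conn.execute(
        "SELECT address FROM street_address WHERE parcel_id = ? ORDER BY address",
        (parcel_id,),
    )
    return [a for (a,) in rows]


# A land record cannot be attached to a parcel that does not exist yet.
try:
    conn.execute("INSERT INTO land (land_id, parcel_id) VALUES (2, 99)")
    orphan_land_rejected = False
except sqlite3.IntegrityError:
    orphan_land_rejected = True

print(addresses_of(1), addresses_of(2), orphan_land_rejected)
```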

3.4.1 Node-Arc Model of Transportation Networks and Relational Data Base

A network is a type of graph, a mathematical structure that represents relationships among

entities. Rather than abstract relationships, a network represents interaction or movement between

point locations. Nodes are point locations where flow originates, terminates or relays, while

arcs are the conduits for flow between nodes. Arcs connect nodes; they can represent

physical conduits (e.g., a road segment) or logical relationships (e.g., airline service between

two cities). Arcs are directed or undirected. If the arc is directed, the node ordering indicates

the flow direction. An important difference between a network and a graph is that a network

can accommodate weights associated with each arc. Each arc has a weight that represents the

cost incurred by one unit of flow when traversing the arc. In the basic "node-arc"

representation of a transportation network, we deal exclusively with directed networks (that

is, a network consisting of directed arcs) since transportation systems typically have

important directional flow properties (e.g., one-way streets, differences in directional travel

times depending on the time-of-day).
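
A directed, weighted network can be sketched in Python as a list of arcs. The node labels and costs below are hypothetical; the point is that each direction of travel is its own arc with its own weight.

```python
from typing import Dict, List, Tuple

# Each directed arc is (from_node, to_node, cost); all values are hypothetical.
arcs: List[Tuple[str, str, float]] = [
    ("A", "B", 4.0),
    ("B", "A", 6.0),  # the reverse direction is a separate arc with its own cost
    ("B", "C", 3.0),  # one-way street: there is no arc from C back to B
]

# Index the arcs by their ordered node pair; the ordering gives the flow direction.
cost: Dict[Tuple[str, str], float] = {(u, v): w for u, v, w in arcs}


def path_cost(nodes: List[str]) -> float:
    """Sum arc costs along a node sequence; raises KeyError if an arc is missing."""
    return sum(cost[(u, v)] for u, v in zip(nodes, nodes[1:]))


print(path_cost(["A", "B", "C"]))
print(path_cost(["B", "A"]))
```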



The most common logical data model used to support the node-arc representation is the

relational model. Figure 3.4 provides a simple example network and Figure 3.5 provides the

normalized relational structure for this network.

Figure 3.4: A Sample Network

Figure 3.5: Normalized Relation for Sample Network
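
A normalized relational structure for a node-arc network typically has one relation for nodes and one for arcs. The sketch below uses Python's sqlite3 module; since the figures are not reproduced here, the schema, identifiers, coordinates and costs are assumptions made for illustration and do not reproduce Figures 3.4 and 3.5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (node_id INTEGER PRIMARY KEY, x REAL, y REAL);
    -- Each arc row refers to its two end nodes instead of repeating their data.
    CREATE TABLE arc (
        arc_id    INTEGER PRIMARY KEY,
        from_node INTEGER NOT NULL REFERENCES node(node_id),
        to_node   INTEGER NOT NULL REFERENCES node(node_id),
        cost      REAL NOT NULL
    );
""")
conn.executemany(
    "INSERT INTO node VALUES (?, ?, ?)",
    [(1, 0.0, 0.0), (2, 1.0, 0.0), (3, 1.0, 1.0)],
)
conn.executemany(
    "INSERT INTO arc VALUES (?, ?, ?, ?)",
    [(10, 1, 2, 4.0), (11, 2, 1, 6.0), (12, 2, 3, 3.0)],
)


def outgoing(node_id):
    """Arcs leaving a node, as (to_node, cost) pairs."""
    return conn.execute(
        "SELECT to_node, cost FROM arc WHERE from_node = ? ORDER BY to_node",
        (node_id,),
    ).fetchall()


print(outgoing(2))
```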



Other, ancillary relations include turn tables and reference address tables. Turn tables are

relations for storing data on expanded intersection representations. The turn table contains a

tuple corresponding to each direction of travel through an intersection. An additional field

maintains the travel cost associated with that direction of travel (or perhaps a pointer to a

flow cost function). A reserved character (such as a negative number) can indicate a turn

restriction. Similar to the expanded intersection representation in the formal node-arc model,

the turn table strategy is effective but not efficient. Turn tables require adding twelve tuples

to the database for each four-way intersection in the street network (three movements from

each of the four approaches). The total can be quite large for a

detailed urban street network.
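
The count of twelve tuples can be checked with a short Python sketch. It assumes a four-way intersection with U-turns excluded; the field layout and the use of -1.0 as the reserved restriction value are illustrative choices.

```python
from itertools import permutations

RESTRICTED = -1.0  # reserved negative cost marking a prohibited turn (assumed convention)


def turn_table(intersection, approaches, default_cost=0.0):
    """One tuple per movement: (intersection, from_arc, to_arc, cost)."""
    return [
        (intersection, a, b, default_cost) for a, b in permutations(approaches, 2)
    ]


def restrict(turns, from_arc, to_arc):
    """Return a copy of the table with one movement marked as prohibited."""
    return [
        (n, a, b, RESTRICTED if (a, b) == (from_arc, to_arc) else c)
        for n, a, b, c in turns
    ]


turns = turn_table("N1", ["north", "east", "south", "west"])
turns = restrict(turns, "north", "east")
allowed = [t for t in turns if t[3] >= 0]

print(len(turns), len(allowed))
```

Four approaches with three possible exits each give 4 x 3 = 12 tuples, so a network with thousands of intersections needs tens of thousands of turn records.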

We often want to include information on address locations within the network. This is useful

for address matching within the network, i.e., georeferencing entities (such as home

addresses, businesses) based on their street address. To maintain address information, we can

store the address range for each arc, as well as other information such

as the side of the arc to which the address range applies and a parity field indicating whether the

address numbers on each side are always even or always odd. The street name corresponding to the

arc often must be partitioned into the following fields: i) a street prefix (e.g., "North"); ii)

street name (e.g., "Oak"); iii) street type (e.g., "Avenue"), and; iv) street suffix (e.g., "East").
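
Address matching against such fields can be sketched as follows. The AddressRange class, its field names and the sample ranges are hypothetical, and the street prefix and suffix fields are omitted to keep the example short.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class AddressRange:
    arc_id: int
    street_name: str
    street_type: str
    side: str    # "L" or "R" side of the arc
    low: int     # lowest house number on this side
    high: int    # highest house number on this side
    parity: str  # "E" if numbers are always even, "O" if always odd

    def contains(self, house_number: int) -> bool:
        wanted = 0 if self.parity == "E" else 1
        return self.low <= house_number <= self.high and house_number % 2 == wanted


# Hypothetical data: one arc of Oak Avenue, even numbers left, odd numbers right.
ranges = [
    AddressRange(10, "Oak", "Avenue", "L", 100, 198, "E"),
    AddressRange(10, "Oak", "Avenue", "R", 101, 199, "O"),
]


def match(house_number: int, street_name: str, street_type: str) -> Optional[Tuple[int, str]]:
    """Return (arc_id, side) for the first matching address range, else None."""
    for r in ranges:
        same_street = (r.street_name, r.street_type) == (street_name, street_type)
        if same_street and r.contains(house_number):
            return r.arc_id, r.side
    return None


print(match(124, "Oak", "Avenue"))
print(match(157, "Oak", "Avenue"))
print(match(250, "Oak", "Avenue"))
```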

3.5 Database design

In early implementations of DBMS, data processing departments continued to design

database applications using methods they had used with conventional files. Therefore, data

were not integrated and redundant data existed. However, the design methodology improved

over the years and database design was divided into various steps. The first step is the

development of the conceptual data model; the next step is the design of the logical model.

The conceptual data model is a model of the entities employed in the functional operation of

the enterprise and is usually represented using tools such as an entity-relation chart or an

entity chart. These tools permit a visual data model of the enterprise being developed. This

model provides a vehicle for discussing the functional requirements of the enterprise. By

understanding this model, semantic inconsistencies can be detected and corrected and the

model simplified. After identification of the entities and the subject databases, the location in

which data are stored is determined. Data may be partitioned so they are stored in the location



where they are used, centralized in one location, or replicated in some or even many

locations. Each form of distributed databases has its own benefits and limitations. The final

configuration depends on the use of data within the enterprise, availability requirements, and

the cost which management is willing to pay for implementation. After the location where

data will reside is established, functions can be organized by system to provide a blueprint for

application development. The conceptual database design describes the data relationships of

the enterprise independent of any database management system; but the logical database

design then maps the conceptual design into a logical design for a specific model.

During the logical database design, the entities and attributes that support the functions of the

enterprise are analyzed. The affinity matrix is a meaningful tool to create a mathematical

index of the affinity of one attribute to another.
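
The text does not say how the affinity index is computed. One simple possibility, shown here purely as an assumption, is to count how many functions of the enterprise use a pair of attributes together. The function names and the attribute sets below are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical usage: which attributes each enterprise function reads together.
usage = {
    "maintenance scheduling": {"tranfeatID", "beginmilelog", "endmilelog"},
    "crash analysis": {"tranfeatID", "directionID"},
    "route inventory": {"tranfeatID", "beginmilelog", "endmilelog", "directionID"},
}

# Assumed affinity index: number of functions that use both attributes of a pair.
affinity = Counter()
for attributes in usage.values():
    for a, b in combinations(sorted(attributes), 2):
        affinity[(a, b)] += 1

for pair, count in sorted(affinity.items()):
    print(pair, count)
```

Pairs with a high count are candidates for storage in the same table, while pairs that are never used together can be separated.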

Tables 3.2 and 3.3 show an example from a research study (Butler and Dueker 2001) where

the attempt is to create primary transportation feature-attribute tables for a complete

multimodal transportation facility. The transportation features to be included in the sample

database design are roadways, airport, runways, waterways, railroads, and intersections. The

primary key of each table is underlined. The primary keys of data tables are designed to store

history through the inclusion of a time stamp (entry date and entry time). A field is also

provided to record the name of the person who made the entry. The transportation feature

table includes a data item called the extkeyID, which is an external key identifier to link this

table with an external data table. For instance, the extkeyID could contain the waterwayID for

linking a water-based transportation feature record to the Waterway Table. The value of the

tranfeattypeID will determine the feature table to which the identifier in extkeyID is related.

This approach allows full normalization of the database using look-up tables and

simplification of the naming process. The tables are described in the following sections.
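
The way tranfeattypeID selects the table to which extkeyID refers can be sketched with plain Python dictionaries. The type codes, table contents and attribute names are hypothetical; only the field names tranfeatID, tranfeattypeID and extkeyID come from the text.

```python
# Hypothetical look-up: feature type code -> name of the external feature table.
FEATURE_TYPE_TABLE = {1: "roadway", 2: "waterway"}

# Hypothetical external tables, each keyed by its own identifier.
TABLES = {
    "roadway": {501: {"surface": "asphalt"}},
    "waterway": {77: {"depth_m": 4.5}},
}

# Rows of the transportation feature table (only the fields needed here).
features = [
    {"tranfeatID": 1, "tranfeattypeID": 1, "extkeyID": 501},
    {"tranfeatID": 2, "tranfeattypeID": 2, "extkeyID": 77},
]


def external_record(feature):
    """Resolve extkeyID in the table chosen by the feature's type."""
    table = FEATURE_TYPE_TABLE[feature["tranfeattypeID"]]
    return table, TABLES[table][feature["extkeyID"]]


print(external_record(features[0]))
print(external_record(features[1]))
```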

Being centered on physical transportation features, the design treats utilizing modes as

events. For example, a transit route would be a traversal across one or more transportation

feature segments (raillines and/or roadways), with each segment defined as a linear event on

a transportation feature. A useful design for traversals and other elements of a complete GIS-

T database is provided in the previous work of Dueker and Butler. Time stamps are provided

to support temporal applications, such as the evaluation of traffic accidents based on the



[...] identifiers, are provided as a managerial means of tracking changes. The sample tables

illustrate the nature of attribute fields that can be included. No intent should be inferred from

the absence of a particular attribute. Where attributes are justified by their potential

usefulness, tables and fields have been included that offer benefits in implementing the

design. For example, we included look-up tables for such defined domain variables as

designator and direction. Not all tables need to be utilized; many are included here to

illustrate the multimodal flexibility of the proposal. Field names have been selected for their

mnemonic value, but are not otherwise critical.

The Transportation Feature Table contains the data needed to describe each feature in the

transportation network. There will be one record for each physical facility on the base map.

The table uses tranfeatID, plus the date and time the record was created, as the primary key to

identify each record. The descriptive data include the beginning and ending milepoints, a

standard name, a separate external key (usually that of the data source), and the direction of

travel. The design assumes that all included transportation features will be of the linear type.

Nonlinear features may be referenced to adjacent linear features. For example, an airport

terminal may be tied to a point on the accessing roadway.

Table 3.2 provides additional details regarding each included data item. In this example,

the jurisdiction domain consists of counties in a single state, so a county line represents a

forced end to each linear feature.

Table 3.2: Transportation Feature Table

Data Item        Meaning                                               External Key

entrydate        The date that the record was created

entrytime        The time that the record was created

enteredby        The user identification (ID) of the person
                 creating the record

tranfeatID       The unique numeric identifier for a
                 transportation feature

tranfeattypeID   The unique identifier for a transportation            Yes
                 feature type

stateID          The unique identifier for the record containing       Yes
                 the USPS state code

countyID         The unique identifier for the record containing       Yes
                 the FIPS county code

designatorID     The unique identifier of the type of agency           Yes
                 defining the feature

primaryID        The unique primary route identifier created by        Yes
                 the designator (all related features will carry
                 this same primary identifier)

secondaryID      The unique secondary route identifier created by      Yes
                 the designator (realignments, ramps, and service
                 roads will carry a secondary identifier [...]
                 indicates the existing mainline)

beginmilelog     The milelog measure for the f[...]

endmilelog       The milelog measure for the [...]

directionID      The unique identifier of a direction code

The fields tranfeattypeID, stateID, countyID, designatorID, primaryID, and secondaryID

would be combined to create a single public key for accessing the data without knowledge of

the internal key (tranfeatID). Table 3.3 gives the primary transportation feature-attribute

table.
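
Combining the six fields into a single public key can be sketched as a tuple that is looked up to obtain the internal tranfeatID. The field names follow Table 3.2; the PublicKey type, the index and all values are hypothetical.

```python
from typing import Dict, NamedTuple, Optional


class PublicKey(NamedTuple):
    """The six fields that together identify a feature without tranfeatID."""
    tranfeattypeID: int
    stateID: int
    countyID: int
    designatorID: int
    primaryID: str
    secondaryID: str


# Hypothetical index from public key to internal key (tranfeatID).
public_index: Dict[PublicKey, int] = {
    PublicKey(1, 12, 34, 2, "0026", ""): 1001,    # mainline
    PublicKey(1, 12, 34, 2, "0026", "R1"): 1002,  # a ramp on the same route
}


def find_tranfeatID(key: PublicKey) -> Optional[int]:
    """Return the internal identifier for a public key, or None if unknown."""
    return public_index.get(key)


print(find_tranfeatID(PublicKey(1, 12, 34, 2, "0026", "R1")))
```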



Table 3.3: The primary transportation feature-attribute table



3.5.1 Conceptual Data Model for Integrating Transportation and Spatial Data

Chalasani and Axhausen (2005) developed a conceptual data model to facilitate

understanding of interactions between transport and spatial data. The model contained four

sections: Travel survey data, spatial data, transport data (functional), and transport data

(infrastructure). A logi added to simplify the interactions in the

model.

3.5.1.1 Transport Data (Infrastructure)

Transport infrastructure data contains information about the prevailing infrastructure, i.e. the

static characteristics of the transportation network, represented as a set of links and nodes,

important junctions, public transport stops, etc. The transport network database consists of

two data files, namely links and nodes. A simple ER diagram that represents the transport

network data with two entities is shown in Figure 3.6.

Figure 3.6: ER Diagram for Transportation Network Data.

3.5.1.2 Transport Data (Functional)

Transport functional data carries information about dynamic characteristics of the prevailing

transportation system. Several methods such as traffic volume counts, cordon counts, moving

types: network operational characteristics, such as traffic movements at intersections,

direction of traffic, etc., and public transport operational parameters, such as routes,

schedules, frequencies, etc. A simple ER diagram for functional based transport data is shown

in Figure 3.7.



Figure 3.7: ER Diagram for Transport Data

3.5.1.3 Transport Survey Data

Traditionally travel survey data is trip based. Each data file contains information on a distinct

type of object, such as households, persons, vehicles, journeys, trips, stages, etc. An entity-

relationship diagram for a typical trip-based travel survey is shown in Figure 3.8.

3.5.1.4 Spatial Data

Spatial data in the present context is limited to geo-referenced information, i.e. geographic data

and geo-data. The following spatial data sets were used in the development of entity-

relationship diagrams for spatial data:

-data