Chapter-3: Data Base Development NPTEL Web Course (12th Aug. 2011)
Course Title: Geo-informatics in Transportation Engineering Course Co-ordinator: Dr. Ashish Verma, IISc Bangalore
Chapter 3
Data Base Development
Key words: DBMS, RDBMS, E-R diagrams, Database design.
3.1 Data Base Management System
A database management system (DBMS), sometimes just called a database manager, is a
program that lets one or more computer users create and access data in a database. The
DBMS manages user requests (and requests from other programs) so that users and other
programs are free from having to understand where the data is physically located on storage
media, and in a multi-user system, who else may also be accessing the data. In handling user
requests, the DBMS ensures the integrity of the data (that is, making sure it continues to be
accessible and is consistently organized as intended) and security (making sure only those
with access privileges can access the data). In other words, a DBMS is a software
package that supports the use of integrated collections of data records and files known as
databases. It allows different user application programs to access the same database easily,
and it allows users and other software to store and retrieve data in a structured way.
3.2 Types of Data Base Management Systems
There are three main types of Database Management Systems (DBMS) and these types are
based upon their management of database structures. In other words, the types of DBMS are
entirely dependent upon how the database is structured by that particular DBMS. The types
of DBMS are:
3.2.1 Hierarchical DBMS: "A DBMS is said to be hierarchical if the relationships among
data in the database are established in such a way that one data item is present as the
subordinate of another one". Here subordinate means that items have 'parent-child'
relationships among them. Direct relationships exist between any two records that are stored
consecutively. The DBMS uses the "tree" data structure to organize the database.
No backward movement is possible or allowed in a hierarchical database. Older hierarchical
systems, such as IBM's Information Management System (IMS), are rarely used for new
applications nowadays.
3.2.2 Network DBMS: "A DBMS is said to be a Network DBMS if the relationships among
data in the database are of type many-to-many". Many-to-many relationships take the form
of a network. The structure of a network database is therefore extremely complicated,
because of these many-to-many relationships in which one record can be used as a key to
the entire database. A network database is structured in the form of a graph, which is
also a data structure. Although the structure of such a DBMS is highly complicated, it
has two basic elements, records and sets, which are used to designate many-to-many
relationships. High-level languages such as Pascal, COBOL and FORTRAN were mainly used to
implement the record and set structures.
3.2.3 Relational DBMS: "A DBMS is said to be a Relational DBMS or RDBMS if the
database relationships are treated in the form of a table". A table composed of rows
and columns is used to organize the database and its structure; it is essentially a
two-dimensional array in the computer memory. A number of RDBMSs are available; among the
most popular are Oracle, Sybase, Ingres, Informix, Microsoft SQL Server and Microsoft Access.
The most typical DBMS is a relational database management system (RDBMS). A standard
user and program interface is the Structured Query Language (SQL). A newer kind of DBMS
is the Object-Oriented Database Management System (ODBMS).
Examples of DBMS:
A DBMS can be thought of as a file manager that manages data in databases rather than files
in file systems. In IBM's mainframe operating systems, the nonrelational data managers were
(and are, because these legacy application systems are still used) known as access methods.
A DBMS is usually an inherent part of a database product. On PCs, Microsoft Access is a
popular example of a single- or small-group user DBMS. Microsoft's SQL Server is an
example of a DBMS that serves database requests from multiple (client) users. Other popular
DBMSs (these are all RDBMSs, by the way) are IBM's DB2, Oracle's line of database
management products, and Sybase's products. IBM's Information Management System (IMS)
was one of the first DBMSs. A DBMS may be used by or combined with transaction
managers, such as IBM's Customer Information Control System (CICS).
3.3 Database Domains and Transaction
Table 3.1 highlights the multifaceted nature of transportation data (Fletcher 1987).
First, transportation entities have obvious physical descriptions but can also have logical
relationships with other transportation entities. Second, entities exist both in the real world
and in the database or virtual world. The relationships between the physical and logical
realms are often one-to-many, creating database design complexities.
Table 3.1: GIS-T modelling transformations

          Logical                      Physical
Real      Legal definitions            Actual facilities
          - Route                      - Highways
          - State trunk network        - Roads
          - County trunk network       - Interchanges
          - Street network             - Intersections
          - Political boundary         - Transit terminals, stops
Virtual   Data structures              Data values
          - Networks                   - Lines
          - Chains                     - Points
          - Links                      - Polylines
          - Nodes                      - Polygons
          - Lattices                   - Attributes

The real/physical mode corresponds to transportation facilities as constructed and used in the
real world (e.g., physical facilities such as highways, intersections and interchanges). Also in
the real world are real/logical or legally defined transportation entities such as state and
federal routes. The relationships between real/physical entities and real/logical entities are
often one-to-many. These one-to-many relationships occur in both directions. For example,
two state routes may share the same physical highway. Conversely, a state route can (and
often will) traverse several physical streets in an urban area.
Virtual/logical entities correspond to data structures such as nodes, links, networks and
polygons. Virtual/physical entities correspond to geometric and attribute data associated with
the transportation entity. This latter data is often the information displayed graphically by the
GIS. A one-to-many relationship occurs when two or more network links correspond to the
same graphical line when displaying the network at a given scale (e.g., displaying a two-way
street represented logically by two directed arcs as a single cartographic line at small map
scales). Also, several cartographic lines can represent one link (e.g., displaying
modal-specific flow in a network link).
3.4 RDBMS and Entity Relationship (ER) diagram
An RDBMS may be defined as a DBMS in which data is stored in the form of tables and the
relationship among the data is also stored in the form of tables. It is a system for storing and
working with large databases. Instead of records being stored in some sort of linked list of
free-form records, tables of fixed-length records are used. A linked-list system would be very
inefficient when storing "sparse" databases where some of the data for any one record could
be left empty. The relational model solved this by splitting the data into a series of
normalized tables, with optional elements being moved out of the main table to where they
would take up room only if needed.
For instance, a common use of a database system is to track information about users, their
name, login information, various addresses and phone numbers. In the navigational approach
all of these data would be placed in a single record, and unused items would simply not be
placed in the database. In the relational approach, the data would be normalized into a user
table, an address table and a phone number table (for instance). Records would be created in
these optional tables only if the address or phone numbers were actually provided.
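The normalized layout described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration only: the table names (users, addresses, phones), the column names, and the sample rows are assumptions made for the example and do not come from the course material. The login name serves as the value that links the optional tables back to the main table.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    login TEXT PRIMARY KEY,      -- unique login name
    name  TEXT NOT NULL
);
CREATE TABLE addresses (         -- optional: rows exist only if an address was given
    login   TEXT NOT NULL REFERENCES users(login),
    address TEXT NOT NULL
);
CREATE TABLE phones (            -- optional: rows exist only if a number was given
    login  TEXT NOT NULL REFERENCES users(login),
    number TEXT NOT NULL
);
""")

# Two users; only one of them supplied a phone number, nobody an address.
conn.execute("INSERT INTO users VALUES ('asha', 'Asha Rao')")
conn.execute("INSERT INTO users VALUES ('ravi', 'Ravi Kumar')")
conn.execute("INSERT INTO phones VALUES ('asha', '555-0100')")


def phones_for(login):
    """Return all phone numbers recorded for one login name."""
    rows = conn.execute(
        "SELECT number FROM phones WHERE login = ? ORDER BY number", (login,)
    )
    return [row[0] for row in rows]
```

A user without a phone number simply has no row in the phones table, so no space is reserved for the missing value.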
Linking the information back together is the key to this system. In the relational model, some
bit of information is used as a "key", uniquely defining a particular record (Figure-3.1). When
information is being collected about a user, information stored in the optional (or related)
tables would be found by searching for this key. For instance, if the login name of a user is
unique, addresses and phone numbers for that user would be recorded with the login name as
its key. The E-R model treats data as consisting of entities and relationships among entities.
In addition, entities exhibit properties or attributes. Although many conceptual data
modelling techniques exist, by far the most common are entity-relationship (E-R) models.
These are also referred to as E-R diagrams. As the term diagram implies, the techniques are
predominantly graphical devices. This allows easy communication of fundamental data
properties to users.
In the real world, entities can be physical (e.g., a person or a house) or conceptual
(e.g., a job or a project). An entity type is an aggregation of particular entities with
the same attributes.
respect to the database. A relationship describes the set of associations among different entity
types.
Figure 3.2 summarizes a simplified graphical notation set for E-R modelling (also refer to
Figure 3.3 for an example E-R diagram). A box labelled with the entity name represents the
entity type. Each entity type can have several attributes. E-R diagrams indicate the number
of entities involved in the relationship by the graphical notations at each end of the line
connecting the entity types. The number of entities involved in a relationship can be exactly
one, one or more (i.e., never zero), zero or one, or zero or more. We can also qualify the
relationships by specifying the types of relationship. Two methods are available. One method
uses a diamond-shaped polygon containing a label describing the relationship type. This
method allows us to list attributes associated with relationship types. The second, simpler
method labels the line connecting the two entity types with the relationship type. This does
not allow attributes to be associated with the relationship type. E-R diagrams can carry much
greater detail when a more complete graphical notation set is used.
Figure-3.1: Related Table of RDBMS
Figure 3.3 provides a simple example of an E-R diagram. The E-R diagram illustrates the
entities Land, Parcel and Street address, presumably with the intention of building a cadastral
database. The E-R diagram captures the following relationships:
1. A Parcel may have one Land.
2. A Land must have one Parcel.
3. A Parcel may have zero or more Street addresses, for example, over time or if a
parcel is a corner lot.
4. A Street address may relate to one Parcel.
Figure 3.2: Simplified E-R diagram notation
Figure 3.3: E-R diagram Example
These entities and relationships lead to the following implications:
1. Not all parcels include land; for example, a condominium may not include
ownership of land. This is why street address is related to parcel rather than to
land.
2. A parcel record must exist before a land record can be attached to it.
3. Not all parcels have a street address. This allows a parcel to be created before
an address is assigned.
4. Multiple street addresses may be assigned to a single parcel; for example, multiple
buildings on a single parcel may have different addresses.
As this example illustrates, an E-R diagram can summarize a large amount of information in
a clear and easily communicated manner.
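One possible way to carry the cardinalities of the Parcel, Land and Street address example into a relational schema is sketched below with Python's sqlite3 module. The table names, column names and sample rows are hypothetical; the sketch assumes that "must have one Parcel" maps to a NOT NULL foreign key, that "may have one Land" maps to a UNIQUE constraint on that key, and that "may relate to one Parcel" maps to a nullable foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
PRAGMA foreign_keys = ON;        -- SQLite enforces foreign keys only when enabled
CREATE TABLE parcel (
    parcel_id INTEGER PRIMARY KEY
);
CREATE TABLE land (
    land_id   INTEGER PRIMARY KEY,
    -- NOT NULL: a land record must have a parcel.
    -- UNIQUE:   a parcel has at most one land record.
    parcel_id INTEGER NOT NULL UNIQUE REFERENCES parcel(parcel_id)
);
CREATE TABLE street_address (
    address_id INTEGER PRIMARY KEY,
    address    TEXT NOT NULL,
    -- Nullable: an address may relate to one parcel; a parcel may have many.
    parcel_id  INTEGER REFERENCES parcel(parcel_id)
);
""")

conn.execute("INSERT INTO parcel VALUES (1)")   # parcel with land (corner lot)
conn.execute("INSERT INTO parcel VALUES (2)")   # parcel with no land and no address yet
conn.execute("INSERT INTO land VALUES (10, 1)")
conn.execute("INSERT INTO street_address VALUES (100, '1 Oak Avenue', 1)")
conn.execute("INSERT INTO street_address VALUES (101, '2 Elm Street', 1)")


def try_insert_land(land_id, parcel_id):
    """Attempt to add a land record; return False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO land VALUES (?, ?)", (land_id, parcel_id))
        return True
    except sqlite3.IntegrityError:
        return False
```

With this schema, a land record pointing at a non-existent parcel is rejected, which mirrors implication 2 above, while parcel 2 can exist without land or an address, as in implications 1 and 3.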
3.4.1 Node-Arc Model of Transportation Networks and Relational Data Base
A network is a type of graph, a mathematical structure that represents relationships among
entities. Rather than relationships, a network represents interaction or movement between
point locations. Nodes are point locations where flow originates, terminates or relays while
arcs are the conduits for flow between nodes. Arcs connect nodes; these can represent
physical conduits (e.g., a road segment) or logical relationships (e.g., airline service between
two cities). Arcs are directed or undirected. If the arc is directed, the node ordering indicates
the flow direction. An important difference between a network and a graph is that a network
can accommodate weights associated with each arc. Each arc has a weight that represents the
cost incurred by one unit of flow when traversing the arc. In the basic "node-arc"
representation of a transportation network, we deal exclusively with directed networks (that
is, a network consisting of directed arcs) since transportation systems typically have
important directional flow properties (e.g., one-way streets, differences in directional travel
times depending on the time-of-day).
The most common logical data model used to support the node-arc representation is the
relational model. Figure 3.4 provides a simple example network and Figure 3.5 provides the
normalized relational structure for this network.
Figure 3.4: A Sample Network
Figure 3.5: Normalized Relation for Sample Network
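As a rough illustration of how a directed node-arc network can be held in relational tables, the sketch below uses Python's sqlite3 module. The three-node network, the table names (node, arc) and the column names are assumptions made for this example; they are not the contents of Figures 3.4 and 3.5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (
    node_id INTEGER PRIMARY KEY,
    x REAL,
    y REAL
);
CREATE TABLE arc (
    arc_id    INTEGER PRIMARY KEY,
    from_node INTEGER NOT NULL REFERENCES node(node_id),  -- flow leaves here
    to_node   INTEGER NOT NULL REFERENCES node(node_id),  -- flow arrives here
    cost      REAL NOT NULL      -- weight: cost of one unit of flow on the arc
);
""")

conn.executemany(
    "INSERT INTO node VALUES (?, ?, ?)",
    [(1, 0.0, 0.0), (2, 1.0, 0.0), (3, 1.0, 1.0)],
)
conn.executemany(
    "INSERT INTO arc VALUES (?, ?, ?, ?)",
    [
        (1, 1, 2, 4.0),   # two-way street: one directed arc per direction,
        (2, 2, 1, 5.0),   # each with its own travel cost
        (3, 2, 3, 2.0),   # one-way street: a single directed arc
    ],
)


def outgoing(node_id):
    """Return (to_node, cost) pairs for arcs leaving the given node."""
    rows = conn.execute(
        "SELECT to_node, cost FROM arc WHERE from_node = ? ORDER BY to_node",
        (node_id,),
    )
    return rows.fetchall()
```

Storing each direction as its own tuple lets directional properties, such as one-way restrictions or different travel times, be recorded per arc.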
Other, ancillary relations include turn tables and reference address tables. Turn tables are
relations for storing data on expanded intersection representations. The turn table contains a
tuple corresponding to each direction of travel through an intersection. An additional field
maintains the travel cost associated with that direction of travel (or perhaps a pointer to a
flow cost function). A reserved character (such as a negative number) can indicate a turn
restriction. Similar to the expanded intersection representation in the formal node-arc model,
the turn table strategy is effective but not efficient. Turn tables require adding twelve tuples
to the database for each four-way intersection in the street network (four approaches with
three possible turns each, excluding U-turns). The total can be quite large for a
detailed urban street network.
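The turn table idea can be sketched in plain Python as follows. The sketch assumes a four-leg intersection with U-turns excluded, which yields twelve tuples; the node number, the leg labels and the cost values are hypothetical, and a negative cost is used as the reserved value marking a turn restriction.

```python
from itertools import permutations


def build_turn_table(node_id, legs, default_cost=0.0):
    """Create one tuple per (from_leg, to_leg) pair at a node, excluding U-turns."""
    return {(node_id, a, b): default_cost for a, b in permutations(legs, 2)}


def turn_cost(table, node_id, from_leg, to_leg):
    """Return the turn cost, or None if the turn is restricted (negative cost)."""
    cost = table[(node_id, from_leg, to_leg)]
    return None if cost < 0 else cost


# Hypothetical intersection at node 7 with four approaches.
turns = build_turn_table(7, ["N", "E", "S", "W"])
turns[(7, "N", "E")] = -1.0    # reserved negative value: turn not allowed
turns[(7, "S", "W")] = 15.0    # turn penalty in arbitrary cost units
```

Because every intersection adds its own set of tuples, the table grows linearly with the number of intersections, which is the storage overhead noted above.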
We often want to include information on address locations within the network. This is useful
for address matching within the network, i.e., georeferencing entities (such as home
addresses, businesses) based on their street address. To maintain address information, we can
store the address range associated with each arc, as well as other information such
as the side of the arc to which the address range applies and a parity field that indicates
whether the address numbers on each side are always even or always odd. The street name
corresponding to the
arc often must be partitioned into the following fields: i) a street prefix (e.g., "North"); ii)
street name (e.g., "Oak"); iii) street type (e.g., "Avenue"), and; iv) street suffix (e.g., "East").
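A minimal sketch of address matching against such a reference address table is given below. The record layout (the AddressRange class), the field names, the side codes "L" and "R", the parity codes "E" and "O", and the sample ranges are all assumptions for illustration; real reference files use their own layouts.

```python
from dataclasses import dataclass


@dataclass
class AddressRange:
    arc_id: int
    prefix: str        # e.g. "North"
    name: str          # e.g. "Oak"
    street_type: str   # e.g. "Avenue"
    suffix: str        # e.g. "East", or "" if absent
    side: str          # "L" or "R" side of the arc
    low: int           # lowest address number on this side
    high: int          # highest address number on this side
    parity: str        # "E" if numbers are even, "O" if odd


def match(ranges, number, name, street_type, prefix="", suffix=""):
    """Return (arc_id, side) for the range containing the address, else None."""
    wanted = "E" if number % 2 == 0 else "O"
    for r in ranges:
        same_street = (r.prefix, r.name, r.street_type, r.suffix) == (
            prefix, name, street_type, suffix)
        if same_street and r.low <= number <= r.high and r.parity == wanted:
            return r.arc_id, r.side
    return None


# Hypothetical arc 12: even numbers on the left side, odd numbers on the right.
ranges = [
    AddressRange(12, "North", "Oak", "Avenue", "", "L", 100, 198, "E"),
    AddressRange(12, "North", "Oak", "Avenue", "", "R", 101, 199, "O"),
]
```

Splitting the street name into separate fields lets the comparison ignore formatting differences in how the full name was originally written.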
3.5 Database design
In early implementations of DBMS, data processing departments continued to design
database applications using methods they had used with conventional files. Therefore, data
were not integrated and redundant data existed. However, the design methodology improved
over the years and database design was divided into various steps. The first step is the
development of the conceptual data model, and the next step is the design of the logical model.
The conceptual data model is a model of the entities employed in the functional operation of
the enterprise and is usually represented using tools such as an entity-relation chart or an
entity chart. These tools permit a visual data model of the enterprise being developed. This
model provides a vehicle for discussing the functional requirements of the enterprise. By
understanding this model, semantic inconsistencies can be detected and corrected and the
model simplified. After identification of the entities and the subject databases, the location in
which data are stored is determined. Data may be partitioned so they are stored in the location
where they are used, centralized in one location, or replicated in some or even many
locations. Each form of distributed database has its own benefits and limitations. The final
configuration depends on the use of data within the enterprise, availability requirements, and
the cost which management is willing to pay for implementation. After the location where
data will reside is established, functions can be organized by system to provide a blueprint for
application development. The conceptual database design describes the data relationships of
the enterprise independent of any database management system; but the logical database
design then maps the conceptual design into a logical design for a specific model.
During the logical database design, the entities and attributes that support the functions of the
enterprise are analyzed. The affinity matrix is a meaningful tool to create a mathematical
index of the affinity of one attribute to another.
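The text does not define how the affinity index is computed. One common definition, assumed here, counts how often two attributes are used together: the affinity of a pair is the sum of the access frequencies of all functions (or queries) that use both attributes. The usage list and frequencies below are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def affinity_matrix(usage):
    """Sum, for each attribute pair, the frequencies of functions using both.

    usage is a list of (set_of_attributes, frequency) entries.
    The result maps alphabetically ordered pairs (a, b) to their affinity.
    """
    aff = defaultdict(int)
    for attrs, freq in usage:
        for a, b in combinations(sorted(attrs), 2):
            aff[(a, b)] += freq
    return dict(aff)


# Hypothetical functions of the enterprise and how often each one runs.
usage = [
    ({"tranfeatID", "beginmilelog", "endmilelog"}, 30),
    ({"tranfeatID", "countyID"}, 10),
    ({"beginmilelog", "endmilelog"}, 5),
]
aff = affinity_matrix(usage)
```

Under this assumed definition, attribute pairs with a high value are candidates for being stored in the same table.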
Tables 3.2 and 3.3 show an example from a research study (Butler and Dueker 2001) that
attempts to create primary transportation feature-attribute tables for a complete
multimodal transportation facility. The transportation features to be included in the sample
database design are roadways, airport, runways, waterways, railroads, and intersections. The
primary key of each table is underlined. The primary keys of data tables are designed to store
history through the inclusion of a time stamp (entry date and entry time). A field is also
provided to record the name of the person who made the entry. The transportation feature
table includes a data item called the extkeyID, which is an external key identifier to link this
table with an external data table. For instance, the extkeyID could contain the waterwayID for
linking a water-based transportation feature record to the Waterway Table. The value of the
tranfeattypeID will determine the feature table to which the identifier in extkeyID is related.
This approach allows full normalization of the database using look-up tables and
simplification of the naming process. The tables are described in the following sections.
Being centered on physical transportation features, the design treats utilizing modes as
events. For example, a transit route would be a traversal across one or more transportation
feature segments (rail lines and/or roadways), with each segment defined as a linear event on
a transportation feature. A useful design for traversals and other elements of a complete GIS-
T database is provided in the previous work of Dueker and Butler. Time stamps are provided
to support temporal applications, such as the evaluation of traffic accidents based on the
identifiers, are provided as a managerial means of tracking changes. The sample tables
illustrate the nature of attribute fields that can be included. No intent should be inferred from
the absence of a particular attribute. Where attributes are justified by their potential
usefulness, tables and fields have been included that offer benefits in implementing the
design. For example, we included look-up tables for such defined domain variables as
designator and direction. Not all tables need to be utilized; many are included here to
illustrate the multimodal flexibility of the proposal. Field names have been selected for their
mnemonic value, but are not otherwise critical.
The Transportation Feature Table contains the data needed to describe each feature in the
transportation network. There will be one record for each physical facility on the base map.
The table uses tranfeatID, plus the date and time the record was created, as the primary key to
identify each record. The descriptive data include the beginning and ending milepoints, a
standard name, a separate external key (usually that of the data source), and the direction of
travel. The design assumes that all included transportation features will be of the linear type.
Nonlinear features may be referenced to adjacent linear features. For example, an airport
terminal may be tied to a point on the accessing roadway.
Table 3.2 provides additional details regarding each included data item. In this example,
the jurisdiction domain consists of counties in a single state, so a county line represents a
forced end to each linear feature.
Table 3.2: Transportation Feature Table

Data Item        Meaning                                               External Key
entrydate        The date that the record was created
entrytime        The time that the record was created
enteredby        The user identification (ID) of the person
                 creating the record
tranfeatID       The unique numeric identifier for a
                 transportation feature
tranfeattypeID   The unique identifier for a transportation            Yes
                 feature type
stateID          The unique identifier for the record containing       Yes
                 the USPS state code
countyID         The unique identifier for the record containing       Yes
                 the FIPS county code
designatorID     The unique identifier of the type of agency           Yes
                 defining the feature
primaryID        The unique primary route identifier created by        Yes
                 the designator (all related features will carry
                 this same primary identifier)
secondaryID      The unique secondary route identifier created by      Yes
                 the designator (realignments, ramps, and service
                 roads will carry a secondary identifier that
                 indicates the existing mainline)
beginmilelog     The milelog measure for the beginning of the
                 feature
endmilelog       The milelog measure for the end of the feature
directionID      The unique identifier of a direction code

The fields tranfeattypeID, stateID, countyID, designatorID, primaryID, and secondaryID
would be combined to create a single public key for accessing the data without knowledge of
the internal key (tranfeatID). Table 3.3 gives the primary transportation feature-attribute
table.
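The key structure described above can be sketched with Python's sqlite3 module. The column names follow Table 3.2, but the column types, the index name public_key, the table name and the sample rows are assumptions for illustration. The primary key combines tranfeatID with the entry date and time so that history is retained, and the six public-key fields are indexed together so that a record can be found without knowing tranfeatID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transportation_feature (
    tranfeatID     INTEGER NOT NULL,
    entrydate      TEXT    NOT NULL,   -- ISO date, part of the primary key
    entrytime      TEXT    NOT NULL,   -- ISO time, part of the primary key
    enteredby      TEXT,
    tranfeattypeID INTEGER,
    stateID        INTEGER,
    countyID       INTEGER,
    designatorID   INTEGER,
    primaryID      TEXT,
    secondaryID    TEXT,
    beginmilelog   REAL,
    endmilelog     REAL,
    directionID    INTEGER,
    PRIMARY KEY (tranfeatID, entrydate, entrytime)
);
CREATE INDEX public_key ON transportation_feature
    (tranfeattypeID, stateID, countyID, designatorID, primaryID, secondaryID);
""")

# Two versions of the same hypothetical feature; the second corrects endmilelog.
rows = [
    (1, "2001-03-01", "09:00:00", "jdoe", 1, 41, 51, 2, "26", "0", 0.0, 12.4, 1),
    (1, "2001-06-15", "14:30:00", "jdoe", 1, 41, 51, 2, "26", "0", 0.0, 12.9, 1),
]
conn.executemany(
    "INSERT INTO transportation_feature VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?)", rows
)


def current_record(public_key):
    """Return (tranfeatID, endmilelog) of the latest entry for a public key."""
    return conn.execute(
        """SELECT tranfeatID, endmilelog FROM transportation_feature
           WHERE tranfeattypeID = ? AND stateID = ? AND countyID = ?
             AND designatorID = ? AND primaryID = ? AND secondaryID = ?
           ORDER BY entrydate DESC, entrytime DESC LIMIT 1""",
        public_key,
    ).fetchone()
```

Because the time stamp is part of the primary key, a correction is stored as a new row rather than overwriting the old one, and the older version remains available for temporal queries.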
Table 3.3: The primary transportation feature-attribute table
3.5.1 Conceptual Data Model for Integrating Transportation and Spatial Data
Chalasani and Axhausen (2005) developed a conceptual data model to facilitate
understanding of interactions between transport and spatial data. The model contained four
sections: Travel survey data, spatial data, transport data (functional), and transport data
(infrastructure). A logi added to simplify the interactions in the
model.
3.5.1.1 Transport Data (Infrastructure)
Transport infrastructure data contains information about the prevailing infrastructure, i.e. the
static characteristics of the transportation network, represented as a set of links and nodes,
important junctions, public transport stops, etc. The transport network database consists of
two data files, namely links and nodes. A simple ER diagram that represents the transport
network data with two entities is shown in Figure 3.6.
Figure 3.6: ER Diagram for Transportation Network Data.
3.5.1.2 Transport Data (Functional)
Transport functional data carries information about dynamic characteristics of the prevailing
transportation system. Several methods such as traffic volume counts, cordon counts, moving
types: network operational characteristics, such as traffic movements at intersections,
direction of traffic, etc., and public transport operational parameters, such as routes,
schedules, frequencies, etc. A simple ER diagram for functional based transport data is shown
in Figure 3.7.
Figure 3.7: ER Diagram for Transport Data
3.5.1.3 Transport Survey Data
Traditionally travel survey data is trip based. Each data file contains information on a distinct
type of object, such as households, persons, vehicles, journeys, trips, stages, etc. An entity-
relationship diagram for a typical trip-based travel survey is shown in Figure 3.8.
3.5.1.4 Spatial Data
Spatial data in the present context is limited to geo-referenced information, i.e., geographic data
and geo-data. The following spatial data sets were used in the development of entity-
relationship diagrams for spatial data:
-data