Database Management Systems


  • Chapter-3: Data Base Development NPTEL Web Course (12th Aug. 2011)

    Course Title: Geo-informatics in Transportation Engineering Course Co-ordinator: Dr. Ashish Verma, IISc Bangalore


    Chapter 3

    Data Base Development

    Key words: DBMS, RDBMS, E-R diagrams, Database design.

    3.1 Data Base Management System

    A database management system (DBMS), sometimes just called a database manager, is a
    program that lets one or more computer users create and access data in a database. The
    DBMS manages user requests (and requests from other programs) so that users and other
    programs do not need to know where the data is physically located on storage media or,
    in a multi-user system, who else may be accessing the data. In handling user requests,
    the DBMS ensures the integrity of the data (that is, it makes sure the data remains
    accessible and consistently organized as intended) and its security (only those with
    access privileges can access the data). In other words, a DBMS is a software package
    that manages integrated collections of data records and files known as databases. It
    lets different application programs access the same database, and it lets users and
    other software store and retrieve data in a structured way.

    3.2 Types of Data Base Management Systems

    There are three main types of Database Management Systems (DBMS) and these types are

    based upon their management of database structures. In other words, the types of DBMS are

    entirely dependent upon how the database is structured by that particular DBMS. The types

    of DBMS are:

    3.2.1 Hierarchical DBMS: "A DBMS is said to be hierarchical if the relationships among
    data in the database are established in such a way that one data item is present as the
    subordinate of another one". Here subordinate means that items have 'parent-child'
    relationships among them. Direct relationships exist between any two records that are
    stored consecutively. The DBMS uses the "tree" data structure to organize the database.
    No backward movement is possible or allowed in a hierarchical database: records are
    reached by navigating from parent to child. Hierarchical systems, such as IBM's
    Information Management System (IMS), are among the oldest DBMSs and are rarely chosen
    for new applications nowadays.
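    The parent-child organization described above can be sketched as a small tree. This is
    only an illustration: the Record class, the find_path helper and the sample data are
    hypothetical names introduced here, not part of any particular hierarchical DBMS.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Record:
    """A record that owns its child records; it stores no link back to its parent."""
    name: str
    children: List["Record"] = field(default_factory=list)

    def add_child(self, name: str) -> "Record":
        child = Record(name)
        self.children.append(child)
        return child


def find_path(root: Record, target: str) -> Optional[List[str]]:
    """Search top-down from the root; return the names from root to target, or None."""
    if root.name == target:
        return [root.name]
    for child in root.children:
        sub_path = find_path(child, target)
        if sub_path is not None:
            return [root.name] + sub_path
    return None


# Hypothetical data: a department owns employees, and an employee owns skills.
root = Record("Department: Planning")
employee = root.add_child("Employee: Rao")
employee.add_child("Skill: GIS")

print(find_path(root, "Skill: GIS"))
```

    Every access starts at the root and moves downward, which is the sense in which no
    backward movement is available in this model.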


    3.2.2 Network DBMS: "A DBMS is said to be a Network DBMS if the relationships among
    data in the database are of type many-to-many". Many-to-many relationships take the
    form of a network, so a network database is structured as a graph, which is also a data
    structure. Because one record can take part in many relationships, the structure of a
    network database can become extremely complicated. Even so, it has only two basic
    elements: records, and sets that designate the relationships among records. Network
    databases were accessed mainly from high-level languages such as Pascal, COBOL and
    FORTRAN, which were used to implement the record and set structures.

    3.2.3 Relational DBMS: "A DBMS is said to be a Relational DBMS or RDBMS if the
    database relationships are treated in the form of a table". A table composed of rows and
    columns is used to organize the database and its structure; conceptually it is a
    two-dimensional array. A number of RDBMSs are available; among the most popular are
    Oracle, Sybase, Ingres, Informix, Microsoft SQL Server and Microsoft Access. The RDBMS
    is the most typical kind of DBMS. Its standard user and program interface is the
    Structured Query Language (SQL). A newer kind of DBMS is the Object-Oriented Database
    Management System (ODBMS).
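    As a minimal sketch of a relational table queried through SQL, the example below uses
    Python's built-in sqlite3 module with an in-memory database. The road table, its columns
    and its rows are assumptions made up for illustration.

```python
import sqlite3

# An in-memory SQLite database stands in for any RDBMS in this sketch.
conn = sqlite3.connect(":memory:")

# A table is a set of rows that all share the same columns.
conn.execute(
    "CREATE TABLE road ("
    " road_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " lanes INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO road (road_id, name, lanes) VALUES (?, ?, ?)",
    [(1, "Ring Road", 6), (2, "Oak Avenue", 2), (3, "Station Road", 4)],
)

# SQL states what is wanted; the DBMS decides how to locate the rows.
wide_roads = conn.execute(
    "SELECT name FROM road WHERE lanes >= ? ORDER BY name", (4,)
).fetchall()
print(wide_roads)
```

    The query names only the table, the columns and a condition; it says nothing about
    where or how the rows are physically stored.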

    Examples of DBMS:

    A DBMS can be thought of as a file manager that manages data in databases rather than files
    in file systems. In IBM's mainframe operating systems, the non-relational data managers were
    (and are, because these legacy application systems are still used) known as access methods.

    A DBMS is usually an inherent part of a database product. On PCs, Microsoft Access is a

    popular example of a single- or small-group user DBMS. Microsoft's SQL Server is an

    example of a DBMS that serves database requests from multiple (client) users. Other popular

    DBMSs (these are all RDBMSs, by the way) are IBM's DB2, Oracle's line of database

    management products, and Sybase's products. IBM's Information Management System (IMS)

    was one of the first DBMSs. A DBMS may be used by or combined with transaction

    managers, such as IBM's Customer Information Control System (CICS).


    3.3 Database Domains and Transaction

    Table 3.1 highlights the multifaceted nature of transportation data (Fletcher 1987). First,
    transportation entities have obvious physical descriptions but can also have logical
    relationships with other transportation entities. Second, entities exist both in the real
    world and in the database, or virtual, world. The relationships between the physical and
    logical realms are often one-to-many, which complicates database design.

    Table 3.1: GIS-T modelling transformations

                Logical                       Physical

    Real        Legal definitions             Actual facilities
                - Route                       - Highways
                - State trunk network         - Roads
                - County trunk network        - Interchanges
                - Street network              - Intersections
                - Political boundary          - Transit terminals, stops

    Virtual     Data structures               Data values
                - Networks                    - Lines
                - Chains                      - Points
                - Links                       - Polylines
                - Nodes                       - Polygons
                - Lattices                    - Attributes

    The real/physical mode corresponds to transportation facilities as constructed and used in the
    real world (e.g., physical facilities such as highways, intersections and interchanges). Also in
    the real world are real/logical, or legally defined, transportation entities such as state and
    federal routes. The relationships between real/physical entities and real/logical entities are
    often one-to-many, and they occur in both directions. For example, two state routes may share
    the same physical highway. Conversely, a state route can (and often will) traverse several
    physical streets in an urban area.


    Virtual/logical entities correspond to data structures such as nodes, links, networks and

    polygons. Virtual/physical entities correspond to geometric and attribute data associated with

    the transportation entity. This latter data is often the information displayed graphically by the

    GIS. A one-to-many relationship occurs when two or more network links correspond to the

    same graphical line when displaying the network at a given scale (e.g., displaying a two-way

    street represented logically by two directed arcs as a single cartographic line at small map

    scales). Also, several cartographic lines can represent one link (e.g., displaying modal-

    specific flow in a network link).

    3.4 RDBMS and Entity Relationship (ER) diagram

    An RDBMS may be defined as a DBMS in which data is stored in the form of tables and the

    relationship among the data is also stored in the form of tables. It is a system for storing and

    working with large databases. Instead of records being stored in some sort of linked list of

    free-form records, tables of fixed-length records are used. A linked-list system would be very

    inefficient when storing "sparse" databases where some of the data for any one record could

    be left empty. The relational model solved this by splitting the data into a series of

    normalized tables, with optional elements being moved out of the main table to where they

    would take up room only if needed.

    For instance, a common use of a database system is to track information about users, their

    name, login information, various addresses and phone numbers. In the navigational approach

    all of these data would be placed in a single record, and unused items would simply not be

    placed in the database. In the relational approach, the data would be normalized into a user

    table, an address table and a phone number table (for instance). Records would be created in

    these optional tables only if the address or phone numbers were actually provided.
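    The user, address and phone number example can be sketched with three tables. The table
    names (app_user, address, phone), the column names and the sample rows are assumptions,
    as is the choice of the login name as the value that links the tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE app_user (login TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE address  (login TEXT NOT NULL REFERENCES app_user(login),
                           address TEXT NOT NULL);
    CREATE TABLE phone    (login TEXT NOT NULL REFERENCES app_user(login),
                           number TEXT NOT NULL);
    """
)

# Two users: one supplies an address and two phone numbers, the other supplies nothing.
conn.executemany(
    "INSERT INTO app_user (login, name) VALUES (?, ?)",
    [("asha", "Asha"), ("ravi", "Ravi")],
)
conn.execute("INSERT INTO address (login, address) VALUES (?, ?)", ("asha", "12 Oak Avenue"))
conn.executemany(
    "INSERT INTO phone (login, number) VALUES (?, ?)",
    [("asha", "555-0100"), ("asha", "555-0101")],
)


def contact_details(login):
    """Collect the optional rows for one user by searching on the login value."""
    addresses = [row[0] for row in conn.execute(
        "SELECT address FROM address WHERE login = ? ORDER BY address", (login,))]
    phones = [row[0] for row in conn.execute(
        "SELECT number FROM phone WHERE login = ? ORDER BY number", (login,))]
    return {"addresses": addresses, "phones": phones}


print(contact_details("asha"))
print(contact_details("ravi"))
```

    The second user occupies one row in app_user and no rows at all in the optional tables,
    which is the space saving the normalized layout is meant to provide.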

    Linking the information back together is the key to this system. In the relational model, some
    bit of information is used as a "key" that uniquely identifies a particular record (Figure-3.1).
    When information is being collected about a user, information stored in the optional (or
    related) tables is found by searching for this key. For instance, if the login name of a user
    is unique, addresses and phone numbers for that user are recorded with the login name as their
    key. The E-R model treats data as consisting of entities and relationships among entities. In
    addition, entities exhibit properties, or attributes. Although many conceptual data modelling
    techniques exist, by far the most common are entity-relationship (E-R) models.


    These are also referred to as E-R diagrams. As the term diagram implies, the techniques are
    predominantly graphical devices. This allows easy communication of fundamental data
    properties to users.

    The E-R model treats data as consisting of entities and relationships among entities. In the
    real world, entities can be physical (e.g., a person, a house, or a project) or conceptual
    (e.g., a job or a project). An entity type is an aggregation of particular entities with the
    same attributes.

    respect to the database. A relationship describes the set of associations among different entity

    types.

    Figure 3.2 summarizes a simplified graphical notation set for E-R modelling (see Figure 3.3
    for an example E-R diagram). A box labelled with the entity name represents the entity type.
    Each entity type can have several attributes. E-R diagrams indicate the number of entities
    involved in a relationship by the graphical notation at each end of the line connecting the
    entity types. The number of entities involved in a relationship can be exactly one, one or
    more (i.e., never zero), zero or one, or zero or more. We can also qualify a relationship
    by specifying its type. Two methods are available. One method uses a diamond-shaped polygon
    containing a label that describes the relationship type; this method allows us to list
    attributes associated with the relationship type. The second, simpler method labels the line
    connecting the two entity types with the relationship type; it does not allow attributes to
    be associated with the relationship type. E-R diagrams can carry much greater detail than
    this simplified notation set shows.

    Figure-3.1: Related Table of RDBMS


    Figure 3.3 provides a simple example of an E-R diagram. The E-R diagram illustrates the
    entities Land, Parcel and Street address, presumably with the intention of building a
    cadastral database. The E-R diagram captures the following relationships:

    1. A Parcel may have one Land.

    2. A Land must have one Parcel.

    3. A Parcel may have zero or more Street addresses, for example, over time or if the
    parcel is a corner lot.

    4. A Street address may relate to one Parcel.

    Figure 3.2: Simplified E-R diagram notation


    Figure 3.3: E-R diagram Example

    These entities and relationships lead to the following implications:

    1. Not all parcels include land; for example, a condominium may not include

    ownership of land. This is why street address is related to parcel rather than to

    land.

    2. A parcel record must exist before a land record can be attached to it.

    3. Not all parcels have a Street address. This allows a parcel to be created before

    assigning an address.

    4. Multiple street addresses may be assigned to a single parcel; for example, multiple
    buildings on a single parcel may have different addresses.

    As this example illustrates, an E-R diagram can summarize a large amount of information in

    a clear and easily communicated manner.
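    One possible way to carry the Parcel, Land and Street address rules into relational tables
    is sketched below with sqlite3. The table and column names are assumptions; the constraints
    are one reading of the cardinalities listed above, not a schema taken from the figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE parcel (parcel_id INTEGER PRIMARY KEY);

    -- Land must have one Parcel (NOT NULL); a Parcel has at most one Land (UNIQUE).
    CREATE TABLE land (
        land_id   INTEGER PRIMARY KEY,
        parcel_id INTEGER NOT NULL UNIQUE REFERENCES parcel(parcel_id)
    );

    -- A Street address may relate to one Parcel, so the reference is optional.
    CREATE TABLE street_address (
        address_id INTEGER PRIMARY KEY,
        label      TEXT NOT NULL,
        parcel_id  INTEGER REFERENCES parcel(parcel_id)
    );
    """
)

conn.execute("INSERT INTO parcel (parcel_id) VALUES (1)")  # e.g. a condominium, no land
conn.execute("INSERT INTO parcel (parcel_id) VALUES (2)")  # e.g. a corner lot
conn.execute("INSERT INTO land (land_id, parcel_id) VALUES (10, 2)")
conn.executemany(
    "INSERT INTO street_address (label, parcel_id) VALUES (?, ?)",
    [("1 Oak Avenue", 2), ("2 Elm Street", 2)],
)


def land_without_parcel_is_rejected():
    """Try to attach land to a parcel that does not exist."""
    try:
        conn.execute("INSERT INTO land (land_id, parcel_id) VALUES (11, 99)")
    except sqlite3.IntegrityError:
        return True
    return False


print(land_without_parcel_is_rejected())
```

    Parcel 1 has neither land nor an address, parcel 2 has one land record and two addresses,
    and a land record cannot be stored until its parcel exists.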

    3.4.1 Node-Arc Model of Transportation Networks and Relational Data Base

    A network is a type of graph, a mathematical structure that represents relationships among
    entities. In a network, those relationships are interaction or movement between point
    locations. Nodes are point locations where flow originates, terminates or relays, while
    arcs are the conduits for flow between nodes. Arcs connect nodes; they can represent
    physical conduits (e.g., a road segment) or logical relationships (e.g., airline service
    between two cities). Arcs are directed or undirected. If the arc is directed, the node ordering indicates

    the flow direction. An important difference between a network and a graph is that a network

    can accommodate weights associated with each arc. Each arc has a weight that represents the

    cost incurred by one unit of flow when traversing the arc. In the basic "node-arc"

    representation of a transportation network, we deal exclusively with directed networks (that

    is, a network consisting of directed arcs) since transportation systems typically have

    important directional flow properties (e.g., one-way streets, differences in directional travel

    times depending on the time-of-day).
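    A directed, weighted node-arc network can be sketched as a list of arcs, each with a
    from-node, a to-node and a cost. The Arc class, the node labels and the cost values are
    hypothetical and chosen only to show directionality.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Arc:
    from_node: str
    to_node: str
    cost: float  # cost incurred by one unit of flow traversing the arc


# Hypothetical network: A <-> B is two-way with different costs, B -> C is one-way.
arcs = [
    Arc("A", "B", 2.0),
    Arc("B", "A", 3.5),
    Arc("B", "C", 1.0),
]


def outgoing(arcs: List[Arc], node: str) -> List[Arc]:
    """Arcs along which flow can leave the given node."""
    return [arc for arc in arcs if arc.from_node == node]


def path_cost(arcs: List[Arc], path: List[str]) -> float:
    """Sum the arc costs along a node sequence; fail if a directed arc is missing."""
    cost_of = {(arc.from_node, arc.to_node): arc.cost for arc in arcs}
    total = 0.0
    for u, v in zip(path, path[1:]):
        if (u, v) not in cost_of:
            raise ValueError(f"no directed arc {u} -> {v}")
        total += cost_of[(u, v)]
    return total


print(path_cost(arcs, ["A", "B", "C"]))
```

    Because the node order carries the direction, the one-way street is expressed simply by
    leaving out the C to B arc.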


    The most common logical data model used to support the node-arc representation is the

    relational model. Figure 3.4 provides a simple example network and Figure 3.5 provides the

    normalized relational structure for this network.

    Figure 3.4: A Sample Network

    Figure 3.5: Normalized Relation for Sample Network


    Other, ancillary relations include turn tables and reference address tables. Turn tables are
    relations for storing data on expanded intersection representations. The turn table contains a
    tuple for each direction of travel through an intersection. An additional field maintains the
    travel cost associated with that direction of travel (or perhaps a pointer to a flow cost
    function). A reserved value (such as a negative number) can indicate a turn restriction. Like
    the expanded intersection representation in the formal node-arc model, the turn table strategy
    is effective but not efficient. Turn tables require adding twelve tuples to the database for
    each four-way intersection in the street network (three possible movements from each of the
    four approaches, not counting U-turns). The total can be quite large for a detailed urban
    street network.
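    The count of twelve tuples can be checked with a short sketch. It assumes a four-leg
    intersection, excludes U-turns, and uses a negative cost as the reserved value for a
    restricted turn; the function names and the tuple layout are illustrative only.

```python
from itertools import permutations

RESTRICTED = -1.0  # assumed reserved value marking a prohibited turn


def build_turn_table(node_id, legs, default_cost=0.0):
    """One tuple per (from_leg, to_leg) movement through the intersection, no U-turns."""
    return [(node_id, from_leg, to_leg, default_cost)
            for from_leg, to_leg in permutations(legs, 2)]


def restrict(table, from_leg, to_leg):
    """Return a copy of the table with one movement marked as restricted."""
    return [
        (node, f, t, RESTRICTED if (f, t) == (from_leg, to_leg) else cost)
        for node, f, t, cost in table
    ]


# Hypothetical intersection 7 with north, east, south and west legs.
turn_table = build_turn_table(7, ["N", "E", "S", "W"])
turn_table = restrict(turn_table, "N", "E")

allowed = [row for row in turn_table if row[3] >= 0]
print(len(turn_table), len(allowed))
```

    Four approaches with three exits each give the twelve tuples, and the cost is multiplied
    by the number of intersections in the network.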

    We often want to include information on address locations within the network. This is useful
    for address matching within the network, i.e., georeferencing entities (such as home
    addresses and businesses) based on their street address. To maintain address information, we
    can store the address range for each arc, as well as other information such as the side of
    the arc to which the address range applies and a parity field that indicates whether the
    address numbers on each side are always even or always odd. The street name corresponding to
    the arc often must be partitioned into the following fields: i) a street prefix (e.g.,
    "North"); ii) street name (e.g., "Oak"); iii) street type (e.g., "Avenue"); and iv) street
    suffix (e.g., "East").
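    Address matching against arc address ranges might look like the sketch below. The
    AddressRange and StreetName classes, the side and parity codes, and the sample ranges
    are assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class StreetName:
    prefix: str       # e.g. "North"
    name: str         # e.g. "Oak"
    street_type: str  # e.g. "Avenue"
    suffix: str       # e.g. "East"


@dataclass(frozen=True)
class AddressRange:
    arc_id: int
    side: str    # "L" or "R" side of the arc (assumed coding)
    low: int
    high: int
    parity: str  # "E" = even numbers only, "O" = odd numbers only (assumed coding)

    def contains(self, number: int) -> bool:
        wanted_remainder = 0 if self.parity == "E" else 1
        return self.low <= number <= self.high and number % 2 == wanted_remainder


oak = StreetName(prefix="North", name="Oak", street_type="Avenue", suffix="East")

# Hypothetical ranges for two consecutive arcs of the same street.
ranges = [
    AddressRange(arc_id=1, side="L", low=100, high=198, parity="E"),
    AddressRange(arc_id=1, side="R", low=101, high=199, parity="O"),
    AddressRange(arc_id=2, side="L", low=200, high=298, parity="E"),
]


def match(number: int, ranges: List[AddressRange]) -> Optional[Tuple[int, str]]:
    """Return (arc_id, side) of the first range containing the house number."""
    for address_range in ranges:
        if address_range.contains(number):
            return (address_range.arc_id, address_range.side)
    return None


print(match(151, ranges))
```

    A house number that falls in no stored range, such as an odd number on arc 2 here,
    simply fails to match.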

    3.5 Database design

    In early implementations of DBMS, data processing departments continued to design
    database applications using the methods they had used with conventional files. As a result,
    data were not integrated and redundant data existed. The design methodology improved over
    the years, however, and database design was divided into steps. The first step is the
    development of the conceptual data model; the next is the design of the logical model.

    The conceptual data model is a model of the entities employed in the functional operation of

    the enterprise and is usually represented using tools such as an entity-relation chart or an

    entity chart. These tools permit a visual data model of the enterprise being developed. This

    model provides a vehicle for discussing the functional requirements of the enterprise. By

    understanding this model, semantic inconsistencies can be detected and corrected and the

    model simplified. After identification of the entities and the subject databases, the location in

    which data are stored is determined. Data may be partitioned so they are stored in the location


    where they are used, centralized in one location, or replicated in some or even many
    locations. Each form of distributed database has its own benefits and limitations. The final

    configuration depends on the use of data within the enterprise, availability requirements, and

    the cost which management is willing to pay for implementation. After the location where

    data will reside is established, functions can be organized by system to provide a blueprint for

    application development. The conceptual database design describes the data relationships of

    the enterprise independent of any database management system; but the logical database

    design then maps the conceptual design into a logical design for a specific model.

    During the logical database design, the entities and attributes that support the functions of the

    enterprise are analyzed. The affinity matrix is a meaningful tool to create a mathematical

    index of the affinity of one attribute to another.
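    The text does not define the affinity index, so the sketch below assumes one simple
    definition: the affinity of two attributes is the number of enterprise functions that
    use both. The function names and their attribute sets are hypothetical.

```python
from itertools import combinations

# Hypothetical functions of the enterprise and the attributes each one uses.
usage = {
    "maintain_pavement": {"tranfeatID", "beginmilelog", "endmilelog"},
    "report_by_county": {"tranfeatID", "countyID"},
    "locate_event": {"tranfeatID", "beginmilelog", "directionID"},
}


def affinity_matrix(usage):
    """Count, for every attribute pair, how many functions use both attributes."""
    attributes = sorted(set().union(*usage.values()))
    matrix = {a: {b: 0 for b in attributes} for a in attributes}
    for used in usage.values():
        for attribute in used:
            matrix[attribute][attribute] += 1  # diagonal: total uses of the attribute
        for a, b in combinations(sorted(used), 2):
            matrix[a][b] += 1
            matrix[b][a] += 1
    return matrix


affinity = affinity_matrix(usage)
print(affinity["tranfeatID"]["beginmilelog"])
```

    Attributes with high mutual affinity are candidates for being kept in the same table,
    while pairs with zero affinity are never needed together by these functions.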

    Tables 3.2 and 3.3 show an example from a research study (Butler and Dueker 2001) where

    the attempt is to create primary transportation feature-attribute tables for a complete

    multimodal transportation facility. The transportation features to be included in the sample

    database design are roadways, airport, runways, waterways, railroads, and intersections. The

    primary key of each table is underlined. The primary keys of data tables are designed to store

    history through the inclusion of a time stamp (entry date and entry time). A field is also

    provided to record the name of the person who made the entry. The transportation feature

    table includes a data item called the extkeyID, which is an external key identifier to link this

    table with an external data table. For instance, the extkeyID could contain the waterwayID for

    linking a water-based transportation feature record to the Waterway Table. The value of the

    tranfeattypeID will determine the feature table to which the identifier in extkeyID is related.

    This approach allows full normalization of the database using look-up tables and
    simplification of the naming process. The tables are described in the following sections.
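    The way tranfeattypeID selects the table that extkeyID points into can be sketched with
    plain dictionaries. The type codes, table names and sample records are assumptions and
    are not taken from the study.

```python
# Assumed look-up from tranfeattypeID to the feature table that holds the external record.
FEATURE_TABLE_BY_TYPE = {1: "roadway", 2: "waterway", 3: "railroad"}

# Hypothetical external feature tables, each keyed by its own identifier.
TABLES = {
    "roadway": {501: {"name": "State Route 9"}},
    "waterway": {77: {"name": "Canal A"}},
    "railroad": {},
}


def resolve_external(feature):
    """Use tranfeattypeID to pick the table, then extkeyID to pick the record."""
    table_name = FEATURE_TABLE_BY_TYPE[feature["tranfeattypeID"]]
    record = TABLES[table_name].get(feature["extkeyID"])
    return table_name, record


water_feature = {"tranfeatID": 1, "tranfeattypeID": 2, "extkeyID": 77}
print(resolve_external(water_feature))
```

    The same extkeyID value means different things under different feature types, so the
    two fields always have to be read together.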

    Being centered on physical transportation features, the design treats utilizing modes as

    events. For example, a transit route would be a traversal across one or more transportation

    feature segments (raillines and/or roadways), with each segment defined as a linear event on

    a transportation feature. A useful design for traversals and other elements of a complete GIS-

    T database is provided in the previous work of Dueker and Butler. Time stamps are provided

    to support temporal applications, such as the evaluation of traffic accidents based on the


    identifiers, are provided as a managerial means of tracking changes. The sample tables

    illustrate the nature of attribute fields that can be included. No intent should be inferred from

    the absence of a particular attribute. Where attributes are justified by their potential

    usefulness, tables and fields have been included that offer benefits in implementing the

    design. For example, we included look-up tables for such defined domain variables as

    designator and direction. Not all tables need to be utilized; many are included here to

    illustrate the multimodal flexibility of the proposal. Field names have been selected for their

    mnemonic value, but are not otherwise critical.

    The Transportation Feature Table contains the data needed to describe each feature in the

    transportation network. There will be one record for each physical facility on the base map.

    The table uses tranfeatID, plus the date and time the record was created, as the primary key to

    identify each record. The descriptive data include the beginning and ending milepoints, a

    standard name, a separate external key (usually that of the data source), and the direction of

    travel. The design assumes that all included transportation features will be of the linear type.

    Nonlinear features may be referenced to adjacent linear features. For example, an airport

    terminal may be tied to a point on the accessing roadway.

    Table 3.2 provides additional details regarding each included data item. In this example,

    the jurisdiction domain consists of counties in a single state, so a county line represents a

    forced end to each linear feature.

    Table 3.2: Transportation Feature Table

    Data Item       Meaning                                             External Key

    entrydate       The date that the record was created
    entrytime       The time that the record was created
    enteredby       The user identification (ID) of the person
                    creating the record
    tranfeatID      The unique numeric identifier for a
                    transportation feature
    tranfeattypeID  The unique identifier for a transportation          Yes
                    feature type
    stateID         The unique identifier for the record containing     Yes
                    the USPS state code
    countyID        The unique identifier for the record containing     Yes
                    the FIPS county code
    designatorID    The unique identifier of the type of agency         Yes
                    defining the feature
    primaryID       The unique primary route identifier created by      Yes
                    the designator (all related features will carry
                    this same primary identifier)
    secondaryID     The unique secondary route identifier created by    Yes
                    the designator (realignments, ramps, and service
                    roads will carry a secondary identifier that
                    indicates the existing mainline)
    beginmilelog    The milelog measure for the [...]
    endmilelog      The milelog measure for the [...]
    directionID     The unique identifier of a direction code

    The fields tranfeattypeID, stateID, countyID, designatorID, primaryID, and secondaryID

    would be combined to create a single public key for accessing the data without knowledge of

    the internal key (tranfeatID). Table 3.3 gives the primary transportation feature-attribute

    table.
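    The two keys described for the Transportation Feature Table can be sketched in sqlite3:
    the internal primary key (tranfeatID plus entry date and time) keeps history, while the
    six-field public key finds a feature without knowing tranfeatID. Column types, sample
    values and the current_record helper are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE transportation_feature (
        tranfeatID     INTEGER NOT NULL,
        entrydate      TEXT    NOT NULL,
        entrytime      TEXT    NOT NULL,
        enteredby      TEXT    NOT NULL,
        tranfeattypeID INTEGER NOT NULL,
        stateID        INTEGER NOT NULL,
        countyID       INTEGER NOT NULL,
        designatorID   INTEGER NOT NULL,
        primaryID      TEXT    NOT NULL,
        secondaryID    TEXT    NOT NULL,
        beginmilelog   REAL,
        endmilelog     REAL,
        directionID    INTEGER,
        PRIMARY KEY (tranfeatID, entrydate, entrytime)
    )
    """
)

# Two versions of the same feature: the time stamp in the key preserves the older one.
rows = [
    (1, "2001-03-01", "09:00:00", "user1", 1, 55, 25, 2, "0009", "", 0.0, 4.2, 1),
    (1, "2001-06-15", "14:30:00", "user2", 1, 55, 25, 2, "0009", "", 0.0, 4.6, 1),
]
conn.executemany(
    "INSERT INTO transportation_feature VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    rows,
)

PUBLIC_KEY_FILTER = (
    "tranfeattypeID = ? AND stateID = ? AND countyID = ? "
    "AND designatorID = ? AND primaryID = ? AND secondaryID = ?"
)


def current_record(public_key):
    """Return (tranfeatID, endmilelog) of the newest version matching the public key."""
    return conn.execute(
        "SELECT tranfeatID, endmilelog FROM transportation_feature WHERE "
        + PUBLIC_KEY_FILTER
        + " ORDER BY entrydate DESC, entrytime DESC LIMIT 1",
        public_key,
    ).fetchone()


print(current_record((1, 55, 25, 2, "0009", "")))
```

    Storing dates and times as ISO-formatted text lets plain string ordering pick the most
    recent version; a production schema would likely use the DBMS's own date types.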


    Table 3.3: The primary transportation feature-attribute table


    3.5.1 Conceptual Data Model for Integrating Transportation and Spatial Data

    Chalasani and Axhausen (2005) developed a conceptual data model to facilitate

    understanding of interactions between transport and spatial data. The model contained four

    sections: Travel survey data, spatial data, transport data (functional), and transport data

    (infrastructure). A logi added to simplify the interactions in the

    model.

    3.5.1.1 Transport Data (Infrastructure)

    Transport infrastructure data contains information about the prevailing infrastructure, i.e. the

    static characteristics of the transportation network, represented as a set of links and nodes,

    important junctions, public transport stops, etc. The transport network database consists of

    two data files, namely links and nodes. A simple ER diagram that represents the transport

    network data with two entities is shown in Figure 3.6.

    Figure 3.6: ER Diagram for Transportation Network Data.

    3.5.1.2 Transport Data (Functional)

    Transport functional data carries information about dynamic characteristics of the prevailing

    transportation system. Several methods such as traffic volume counts, cordon counts, moving

    types: network operational characteristics, such as traffic movements at intersections,

    direction of traffic, etc., and public transport operational parameters, such as routes,

    schedules, frequencies, etc. A simple ER diagram for functional based transport data is shown

    in Figure 3.7.


    Figure 3.7: ER Diagram for Transport Data

    3.5.1.3 Transport Survey Data

    Traditionally travel survey data is trip based. Each data file contains information on a distinct

    type of object, such as households, persons, vehicles, journeys, trips, stages, etc. An entity-

    relationship diagram for a typical trip-based travel survey is shown in Figure 3.8.

    3.5.1.4 Spatial Data

    Spatial data in the present context is limited to geo-referenced information, i.e. geographic data

    and geo-data. The following spatial data sets were used in the development of entity-

    relationship diagrams for spatial data:

    -data