Lecture 7 and 8 Databases and Information Management

download Lecture 7 and 8 Databases and Information Management

of 33

Transcript of Lecture 7 and 8 Databases and Information Management

  • 8/12/2019 Lecture 7 and 8 Databases and Information Management

    1/33

  • 8/12/2019 Lecture 7 and 8 Databases and Information Management

    2/33

    The need for a database

    solution

    Organisations have to collate, process and

    output vast amounts of data and information

    Manual or multiple system options not viablewhen dealing with possibly millions of

    customers in a global market

    Need for accessible informationtouch of a

    button/key Need for up-to-date information

    Need for information security

  • 8/12/2019 Lecture 7 and 8 Databases and Information Management

    3/33

    The need for structure Our ability to use data improves when we

    introduce more structure in how data is stored. The need to go through all the data to reach a

    specific item or customer is now not arequirement.

    The more the structure, the easier to sift or minethrough large amounts of data.

    Think of a structure as an address. The moreprecise the address we have, the easier it is to find

    the associated data. Similarly the structure of the data tells the system

    where to look for data and avoid unnecessaryeffort.

  • 8/12/2019 Lecture 7 and 8 Databases and Information Management

    4/33

    Structure example Ever wondered why an Internet search can

    return the addresses of home pages thatcontain a word so quickly.

    The answer is because the addresses arestored under a list of words arrangedalphabetically.

    When you enter a word, the computer jumpsto the appropriate row of the Table without

    any processing of information in other rows. Thus, it is possible to peruse very large

    databases in a fraction of a second.

  • 8/12/2019 Lecture 7 and 8 Databases and Information Management

    5/33

    Database models and

    structures

    There are four main types of databases:

    flat

    relational

    hierarchical

    distributed

  • 8/12/2019 Lecture 7 and 8 Databases and Information Management

    6/33

    Flat structure databases Flat data models have the least amount of

    structure. They typically take the form of onelarge table, where the first row is the list of thevariables and subsequent rows are data

    Each case has a row of data

    Many statisticians use flat models

    Advantages

    Most software include free access to flat datafiles. For a small number of cases, flatdatabases do a reasonably fast job

  • 8/12/2019 Lecture 7 and 8 Databases and Information Management

    7/33

    Flat structure databases

    Disadvantages

    Flat databases waste computer storageas it is required to keep information onitems that logically cannot be available.

    For example, if information onpostcodes/zip codes is required, flat files

    require us to enter missing information forzip codes in foreign countries.

  • 8/12/2019 Lecture 7 and 8 Databases and Information Management

    8/33

    Flat structure databases

    Hierarchical and structural databasesavoid this problem by defining different

    tables for classes of countries in whichpost/zip code is not available.

    Thus flat files keep large sparse data full ofmissing information.

    When the size of database is large, searchthrough the data takes a long time

  • 8/12/2019 Lecture 7 and 8 Databases and Information Management

    9/33

    Flat structure databases Flat databases are not conducive to complicated

    search queries that divide the database further.

    For example, in a flat file about students it wouldbe difficult to find all students that live in a certainpost/zip code that have received a grade C.

    Such a query will require repeated pass throughthe data.

    First pass may identify all students who live in a

    post/zip code, second pass may identify allstudents who have a grade C and the third passmay find students in both groups

  • 8/12/2019 Lecture 7 and 8 Databases and Information Management

    10/33

    Flat structure databases

    Such multiple passes through data areinefficient and take a long time:

    Since every simple ifstatement (ifpost/zipcode is equal to SE1 then ...) takes afraction of a secondeach subsequentsearch accumulates more time

    Efficient search methods are important forlarge databases

  • 8/12/2019 Lecture 7 and 8 Databases and Information Management

    11/33

    Flat structure database

    example

    Student ID Name Mid termgrade

    Finalgrade

    Address Post/zipcode

    ... .....

    4561 Alison Safaie B A 13 Manor Park SE1 ... ... ..

    7878 Mike Smith C B 24 Curzon Street NW14 ... ... ..

    8954 SherifMohamed

    A C 21a High Street NW14 ... ... ..

    Table 1: Example of a flat file

  • 8/12/2019 Lecture 7 and 8 Databases and Information Management

    12/33

  • 8/12/2019 Lecture 7 and 8 Databases and Information Management

    13/33

    Relational databases This information in essence says that the patient ID

    number, name and medical record numberbelong to the same person.

    To effectively store both information items and therelationships among these items, relational data iskept in table formats.

    The first row of the table shows the name of thevariables and subsequent rows are data. All itemsin the same row usually belong to the same case

    and are related. One column of the table istreated as the key to the table. Numbers orcharacters in this column corresponds to items inother tables. Thus it is possible to move from onetable to another.

  • 8/12/2019 Lecture 7 and 8 Databases and Information Management

    14/33

    Relational databases

    StudentID

    Keycolumn

    Name Mid-term

    Final

    4561 AlisonGhadiri

    B A

    7878 Mike Smith C B

    8954 SherifMohamed

    A C

    Table 2: Table for "Students grades"

    StudentID

    Keycolumn

    Address Post/zip code

    4561 13 Manor Park SE1

    7878 24 CurzonStreet

    NW14

    8954 21a HighStreet NW14

    Table 3: "Students' contact information

    ExampleHere is an example of a Table ofgrades for the example introducedunder flat files

    This is an example of an additional tablefor contact information:

  • 8/12/2019 Lecture 7 and 8 Databases and Information Management

    15/33

    Relational database queries

    When a query is made, the relationaldatabase searches through its tables to find

    the answer. Often the answer involves information pooled

    together from different tables.

    For example, a query for names of peoplewith grade of A, can be answered from the

    table of grades. The query for grades ofpeople in post/zip code area NW14 must beanswered from both tables.

  • 8/12/2019 Lecture 7 and 8 Databases and Information Management

    16/33

  • 8/12/2019 Lecture 7 and 8 Databases and Information Management

    17/33

    Hierarchical databases Hierarchical database models are one type

    of relational data models based on a parent-

    child relationship Hierarchical database models resemble al

    tree structure

    Example

    The file directory on your desktop is an

    example of a hierarchical database system. A folder may contains other folders which may

    contain other files, which contain data.

  • 8/12/2019 Lecture 7 and 8 Databases and Information Management

    18/33

    Hierarchical databases

    advantages

    In hierarchical models, children inherit therelationships and characteristics of theirparents.

    An operation on the parent affects thechildren; if you tell the computer to do anoperation on a folder, the operation affectsall the folders/children and files that itcontains.

    For example, if you delete the top folder, youwould delete all of the folders and files itcontains. This feature saves time for theperson maintaining the files.

  • 8/12/2019 Lecture 7 and 8 Databases and Information Management

    19/33

    Distributed databases Most databases physically reside in one

    place. They may be backed up to another placebut the elements of the database are notmaintained in different places.

    In a distributed database, data is kept in differentsettings and on different computers. One orseveral central computers maintain indexes towhere the data is. Using the address of the data

    computers can then communicate and find theinformation needed.

    Distributed databases need not only addresses forwhere the data is but they also need an audit trailof who has updated data or retrieved it.

  • 8/12/2019 Lecture 7 and 8 Databases and Information Management

    20/33

    Distributed databases Audit data is needed in order to pinpoint errors in

    the system and in order to understand whereconfidentiality of the system breaks down.

    When a computer requests data from another, anaudit trail is created by storing who sent the datawhere and when.

    When this computer passes the data to another,the information needs to be updated in theoriginal computer.

    As the number of computers receiving the dataincreases the task of auditing becomes moredifficult.

  • 8/12/2019 Lecture 7 and 8 Databases and Information Management

    21/33

    Distributed database example

    A good example of a distributed database isWorld Wide Web pages. These pages of data arekept on a different computers, often referred to asWeb servers.

    The address of each file is the location addressyou enter when you want to see the page. Thislocation address is an index to the Web pages.

    Centralised computers keep the beginning of

    these addresses, called domains. Subsequent detailed addresses are kept at the

    Web servers.

  • 8/12/2019 Lecture 7 and 8 Databases and Information Management

    22/33

    Distributed database searches

    When searching a distributed database, two stepsmust occur.

    First a program, sometimes called crawler, mustindex the content of the databases, then anotherprogram, often called a search engine, wouldsearch the index for your request.

    When a match is found, the index is used to findthe address of the information. This address is

    provided to the engine that assembles a list foryou.

    When you click on the items identified by thesearch engine, you use the address to retrieve thedata items it has found.

  • 8/12/2019 Lecture 7 and 8 Databases and Information Management

    23/33

    Database considerations for

    an organisation

    In terms of database management anorganisation has to consider the following:

    How to store their datawhichmodel/approach

    Data security

    Cost of a DBMS (database management

    system) initial and maintenanceCentralised or de-centralised approach

  • 8/12/2019 Lecture 7 and 8 Databases and Information Management

    24/33

    Data considerations for an

    organisation What suits there needs, size and works with their

    business processes and functions How is the data going to be stored on a single

    DBMS or distributed across a number of databases(centralised v decentralised approach)

    How to protect the data from internal andexternal threatshackers, viruses, natural disasters,incompetence of end users

    Cost of design, implementation, management of

    the systemneed for a database manager oradministrator(s)

    Decentralised or centralised approach (see nextslides)

  • 8/12/2019 Lecture 7 and 8 Databases and Information Management

    25/33

    Decentralised or centralised?

    Decentralised databases exchange files andtherefore may exchange corrupted files or

    viruses that may affect the entiresystem. Security of these databases aredifficult to maintain.

    In decentralised databases the type of datato be exchanged, the process of addressingthe data, and the protocol for updating thedata must be agreed upon ahead of timeand plans must be in place for updating theprocess.

  • 8/12/2019 Lecture 7 and 8 Databases and Information Management

    26/33

  • 8/12/2019 Lecture 7 and 8 Databases and Information Management

    27/33

  • 8/12/2019 Lecture 7 and 8 Databases and Information Management

    28/33

    Data-less information systems A data-less information system contains no

    centralised data and relies on distributeddatabases.

    The Data-less Information System relies on rapidcommunication to construct the data at the pointof need for the data.

    A data-less information system also does notrequire any additional hardware; instead it relieson the hardware of existing organisations and

    institutions. e.g. Wavenet relying on weather and climate

    systems around the world

  • 8/12/2019 Lecture 7 and 8 Databases and Information Management

    29/33

    Components of a data-less IS

    There are three components to a data-less information system:

    Decoder

    Communicator

    Analysis

  • 8/12/2019 Lecture 7 and 8 Databases and Information Management

    30/33

    Decoder This software resides in the database of

    participating organisations. It reads andanalyses the structure of the other systems

    records. It uses known characteristics of the data,

    national and international standard of data,and the supplied format of the data todecipher the format and structure of thedata.

    When it is finished understanding the structureof the data, it sends a message to other unitsacknowledging participation

  • 8/12/2019 Lecture 7 and 8 Databases and Information Management

    31/33

    Communicator This software resides in the database of participating

    organisations systems. I

    It verifies any consent that is required (if required e.g.

    patient medical records), the nature of data to becollected, and the systems suggestion about whereto look for the data.

    This software contacts all sites that have a reportingdecoder and collects the necessary information,including the possibility of different possibleinterpretation of the same data at different sites. The

    data collected by the software is for one time useand would be erased after the use.

  • 8/12/2019 Lecture 7 and 8 Databases and Information Management

    32/33

  • 8/12/2019 Lecture 7 and 8 Databases and Information Management

    33/33