ICS_2415_ADBS_Sess05_10_AdvTopics

 Kyanganda S. ICS 2415 Advanced Dbase Systems: Database Security, Integrity and Recovery

Transcript of ICS_2415_ADBS_Sess05_10_AdvTopics

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    Database Security, Integrity

    and Recovery

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    Database Security

    and Integrity

    Definitions

    Threats to security

    Threats to integrity

    Resolution of Problems

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    Database Security

    SECURITY: Protecting the database from unauthorised users

    Ensures that users are allowed to do the things they are trying to do

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    Database Security

    INTEGRITY: Protecting the database from authorised users

    Ensures that what users are trying to do is correct

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    Database Security

    TYPES OF SYSTEM FAILURES

    1. HARDWARE: disk, CPU, network

    2. SOFTWARE: system, database, program

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    Database Security

    Important security features include:

    Views

    Authorisation & controls

    User defined procedures

    Encryption procedures

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    Authorisation Rules

    An example: a person who can supply a particular password may be authorised to read any record, but cannot modify any of those records.

    Authorisation Table for subjects i.e. Salesperson

              Customer Records   Order Records
    Read      Y                  Y
    Insert    Y                  Y
    Modify    Y                  N
    Delete    N                  N

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    Authorisation Rules

    Authorisation Table for Objects i.e. Order Records

                Salesperson   Order Entry   Accounting
    Password    (Zahra)       (Maina)       (Shirin)
    Read        Y             Y             Y
    Insert      N             Y             N
    Modify      N             Y             Y
    Delete      N             N             Y
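
An authorisation table like the one for Order Records can be treated as a simple lookup. The following is a minimal sketch only: the Y/N values are taken from the slide, while the dictionary layout and the function name `is_authorised` are assumptions made for illustration, not how any particular DBMS stores its rules.

```python
# Hypothetical in-memory authorisation table for the object "Order Records".
# The Y/N values come from the slide; the data structure is an assumption.
ORDER_RECORDS_AUTH = {
    "Salesperson": {"Read": True, "Insert": False, "Modify": False, "Delete": False},
    "Order Entry": {"Read": True, "Insert": True, "Modify": True, "Delete": False},
    "Accounting": {"Read": True, "Insert": False, "Modify": True, "Delete": True},
}


def is_authorised(subject: str, action: str) -> bool:
    """Return True only if the table explicitly grants the action (default deny)."""
    return ORDER_RECORDS_AUTH.get(subject, {}).get(action, False)


print(is_authorised("Order Entry", "Insert"))   # allowed by the table
print(is_authorised("Salesperson", "Delete"))   # not allowed by the table
```

Unknown subjects or actions fall through to "deny", which is the safer default for this kind of check.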

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    Database Integrity

    CONSTRAINTS: Can be classed in 3 different ways:

    1. Business constraints

    2. Entity constraints

    3. Referential constraints

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    Database Integrity

    BUSINESS CONSTRAINTS

    A value in one column may be constrained by the value of another, or by some calculation or formula.

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    Database Integrity

    ENTITY CONSTRAINTS: Individual columns of a table may be constrained, e.g. not null

    REFERENTIAL CONSTRAINTS: Sometimes referred to as key constraints, e.g. Table 2 depends on Table 1

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    Database Integrity

    create table account_dets(
      acc_id char(6) primary key,
      acc_custid char(6) references customer(cust_id),
      acc_odraft number(4) check (acc_odraft
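
The three kinds of constraint can be tried out with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: the slide's statement is cut off after `check (acc_odraft`, so the condition used here (`acc_odraft <= 1000`) is invented purely for illustration, as are the sample rows and the helper `try_insert`. SQLite is used instead of Oracle, so the column types are only loosely enforced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE customer (cust_id CHAR(6) PRIMARY KEY)")
conn.execute("""
    CREATE TABLE account_dets (
        acc_id     CHAR(6) PRIMARY KEY,                   -- entity constraint
        acc_custid CHAR(6) REFERENCES customer(cust_id),  -- referential constraint
        acc_odraft NUMBER(4) CHECK (acc_odraft <= 1000)   -- assumed business rule
    )
""")
conn.execute("INSERT INTO customer VALUES ('C00001')")
conn.execute("INSERT INTO account_dets VALUES ('A00001', 'C00001', 500)")


def try_insert(row):
    """Attempt an insert and report whether the constraints accepted it."""
    try:
        conn.execute("INSERT INTO account_dets VALUES (?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


print(try_insert(("A00002", "C99999", 100)))   # unknown customer
print(try_insert(("A00003", "C00001", 5000)))  # breaks the assumed check
```

Because the rules live in the table definition, every application that inserts into `account_dets` is subject to them without any extra code.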

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    Database Integrity

    BENEFITS OF USING CONSTRAINTS

    Guaranteed integrity and consistency

    Defined as part of table definition

    Applies across all applications

    Cannot be circumvented

    Application development productivity

    Requires no special programming

    Easy to specify and maintain(reduced coding)

    Defined once only

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    Database Integrity

    CONCURRENCY CONTROL WHAT IS IT?

    The co-ordination of simultaneous requests, for the same data, from multiple users

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    Database Integrity

    CONCURRENCY CONTROL WHY IS IT IMPORTANT?

    Simultaneous execution of transactions over a shared database may create several data integrity and consistency problems

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    Database Integrity

    Janet                      Time   John

    1. Read balance (1000)      |
                                |     1. Read balance (1000)
    2. Withdraw 200 (800)       |
                                |     2. Withdraw 300 (700)
    3. Write balance (800)      |
                                |     3. Write balance (700)
                                v
    Final balance: 700

    ERROR: Janet's update is lost; after both withdrawals the balance should be 500
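
The schedule above can be replayed step by step. This is a small sketch, not a real concurrent program: the two function names are invented for illustration, and the interleaving is hard-coded so the result is the same on every run.

```python
def run_interleaved(balance):
    """Replay the slide's schedule: both users read before either writes."""
    janet_read = balance          # Janet: 1. read balance
    john_read = balance           # John:  1. read balance
    janet_new = janet_read - 200  # Janet: 2. withdraw 200
    john_new = john_read - 300    # John:  2. withdraw 300
    balance = janet_new           # Janet: 3. write balance
    balance = john_new            # John:  3. write balance, overwriting Janet's update
    return balance


def run_serial(balance):
    """Same withdrawals, but each read-modify-write finishes before the next starts."""
    for amount in (200, 300):
        current = balance
        balance = current - amount
    return balance


print(run_interleaved(1000))  # 700: Janet's withdrawal is lost
print(run_serial(1000))       # 500: the correct final balance
```

Concurrency control aims to make interleaved execution give the same result as some serial order.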

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    Database Integrity

    The three main integrity problems are:

    Lost updates

    Uncommitted data

    Inconsistent retrievals

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    Database Integrity

    LOCKING

    Two kinds of Locks:

    1. Shared Locks (allow read-only access)

    2. Exclusive Locks (prevent other users from reading or updating the record)

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    Database Integrity

    User 1                  Time   User 2

    1. Lock record X         |
                             |     1. Lock record Y
    2. Request record Y      |
                             |     2. Request record X
    (Wait for Y)             v     (Wait for X)

    DEADLOCK
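
One common way to detect this situation is to look for a cycle in a "wait-for" graph. The sketch below is a simplified illustration and not a description of any specific DBMS: it assumes each user waits for at most one other user, and the names `find_deadlock`, `holders` and `requests` are invented for the example.

```python
def find_deadlock(waits_for):
    """Return the users forming a cycle in the wait-for graph, or None.

    waits_for maps each waiting user to the user holding the lock it wants.
    Simplification: each user waits for at most one other user.
    """
    for start in waits_for:
        seen = [start]
        current = waits_for.get(start)
        while current is not None and current not in seen:
            seen.append(current)
            current = waits_for.get(current)
        if current == start:
            return seen
    return None


# Lock table and outstanding requests from the slide.
holders = {"X": "User 1", "Y": "User 2"}
requests = {"User 1": "Y", "User 2": "X"}
waits_for = {user: holders[record] for user, record in requests.items()}

print(find_deadlock(waits_for))  # ['User 1', 'User 2']
```

Once a cycle is found, one of the users in it has to be chosen as a victim and rolled back so the other can proceed.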

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    Database Recovery

    The process of restoring the database to a correct state in the event of a failure, e.g.

    System Crashes

    Media Failures

    Application Software Errors

    Natural Physical Disasters

    Carelessness

    Sabotage

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    Basic Recovery Facilities

    Backup Facilities

    Journaling Facilities

    Checkpoint facilities

    Recovery Facilities

    Database Recovery

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    Transactions

    Basic unit of recovery

    Properties of Transaction (ACID): Atomicity, Consistency, Isolation, Durability

    Purpose of recovery manager is to enforce Atomicity and Durability
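
Atomicity can be seen in a few lines using Python's built-in `sqlite3` module. This is only a sketch: the `account` table, the balances and the `transfer` function are assumptions invented for the example. The second update fails a check constraint, and the rollback also undoes the first update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(amount):
    """Move money from A to B as one transaction; undo everything on failure."""
    try:
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = 'B'", (amount,))
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = 'A'", (amount,))
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        conn.rollback()  # the credit to B is undone as well
        return False


def balances():
    """Return the current balances as a dict keyed by account id."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


print(transfer(500), balances())  # fails: A cannot go negative, nothing changes
print(transfer(30), balances())   # succeeds: both updates are applied together
```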

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    Staff Salary

    Update Example

    Read operations:

    1. Find the address of the disk block that contains the record with primary key x

    2. Transfer the block into a DB buffer in main memory

    3. Copy the salary data from the DB buffer into the variable salary

    Write operations:

    1-2. As steps 1 and 2 above

    3. Copy the salary data from the variable salary into the DB buffer

    4. Write the DB buffer back to disk

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    Storing Data

    [Diagram: the database buffer in main memory and the database on secondary storage. Buffer contents are flushed to secondary storage, making them permanent, on commit or when the buffer is full.]

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    [Diagram: Database (State 1) -> Update Trans1 -> Database (State 2) -> Update Trans2 -> Database (State 3) -> Update Trans3 -> Database (State 4); Database Backup holds Database (State 2)]

    Database Update Procedures

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    DBMS provides a mechanism for taking backup copies of the database and log file at regular intervals.

    A dump or copy or backup file contains all or part of the database

    Backups can be taken without having to stop the system

    Back-up Facilities

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    REDO LOGS: This is the main logging file. The file contains two different types of logging records:

    AFTER IMAGES

    BEFORE IMAGES

    Journal Facilities

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    REDO LOGS - AFTER IMAGES: Whenever any column of any row in any table in the database is changed, the new values are written not only to the database but also to the redo log. The complete row is written to the log. If a row is deleted, a notification is also written to the redo log. After images are used in roll forward recovery.

    Journal Facilities

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    REDO LOGS - BEFORE IMAGES: Before a row is updated, the data is copied to the redo log. It is not a simple copy from the database, because a separate area of the database maintains the immediate pre-update version of each row updated in the database. This extra area is called the ROLLBACK SEGMENT. The redo log takes before-image copies from the rollback segment in the database.

    Journal Facilities

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    Sample Log File

    Tid  Time   Operation   Object         Before Image  After Image  pPtr  nPtr
    T1   10:12  START                                                 0     2
    T1   10:13  UPDATE      TENANT NO21    (old value)   (new value)  1     8
    T2   10:14  START                                                 0     4
    T2   10:16  INSERT      TENANT NO37                  (new value)  3     5
    T2   10:17  DELETE      TENANT NO9     (old value)                4     6
    T2   10:17  UPDATE      PROPERTY PG16  (old value)   (new value)  5     9
    T1   10:18  COMMIT                                                2     0
         10:19  CHECKPOINT  T2

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    Duplicate Databases

    Rollback Recovery

    Rollforward Recovery

    Reprocessing Transactions

    Types of Recovery

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    Requires 2 copies of the database

    Advantages

    Fast Recovery (seconds)

    Good for disk failures

    Disadvantages

    No protection against power failure

    Expensive

    Duplicate Databases

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    Changes made to the database are undone (Backward Recovery)

    Rollback enables the updating to be undone to a predetermined point in the database processing that provides a consistent database state.

    Rollback Recovery

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    [Diagram: ROLLBACK applies the before images to the database (with changes) to produce the database (without changes)]

    Rollback Recovery
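
Rollback with before images can be sketched as follows. This is an illustrative simplification: the record layout, the placeholder values such as `"old-21"` and the function name `rollback` are assumptions, with object names borrowed from the sample log file. A before image of `None` stands for "the object did not exist", i.e. the operation was an insert.

```python
# Hypothetical log records: (transaction id, object, before image, after image).
# None as a before image means the object did not exist (an INSERT);
# None as an after image means it was removed (a DELETE).
log = [
    ("T1", "TENANT NO21", "old-21", "new-21"),
    ("T2", "TENANT NO37", None, "new-37"),
    ("T2", "TENANT NO9", "old-9", None),
    ("T2", "PROPERTY PG16", "old-16", "new-16"),
]


def rollback(db, log, tid):
    """Undo transaction tid by applying its before images in reverse order."""
    for rec_tid, obj, before, _after in reversed(log):
        if rec_tid != tid:
            continue
        if before is None:
            db.pop(obj, None)  # undo an insert
        else:
            db[obj] = before   # undo an update or a delete
    return db


# Database state after all four logged operations have been applied.
db = {"TENANT NO21": "new-21", "TENANT NO37": "new-37", "PROPERTY PG16": "new-16"}
rollback(db, log, "T2")
print(db)
```

Only T2's changes are undone; T1's update to TENANT NO21 is left in place.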

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    This recovery technique updates an out-of-date database up to the current processing position.

    If the data is inconsistent then the database may need to rollback to the previous consistent state.

    Roll Forward Recovery

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    [Diagram: ROLL FORWARD applies the after images to the database (without changes) to produce the database (with changes)]

    Roll Forward Recovery
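
Roll forward can be sketched in the same spirit: start from a backup copy and re-apply after images in log order. The record layout, the placeholder values and the function name `roll_forward` are assumptions for illustration, and this sketch only redoes transactions that are known to have committed.

```python
def roll_forward(backup, log, committed):
    """Redo committed transactions by applying their after images in log order."""
    db = dict(backup)  # work on a copy of the restored backup
    for tid, obj, after in log:
        if tid not in committed:
            continue
        if after is None:
            db.pop(obj, None)  # the logged operation was a delete
        else:
            db[obj] = after    # the logged operation was an insert or update
    return db


# Hypothetical backup and log: (transaction id, object, after image).
backup = {"TENANT NO21": "old-21", "TENANT NO9": "old-9"}
log = [
    ("T1", "TENANT NO21", "new-21"),
    ("T2", "TENANT NO37", "new-37"),
    ("T2", "TENANT NO9", None),
]

print(roll_forward(backup, log, committed={"T1"}))
```

Here T2 had not committed, so its insert and delete are not re-applied.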

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    Similar to Forward Recovery

    Uses update transactions instead of after images

    ADVANTAGES: Simple

    DISADVANTAGES: Slow

    Reprocessing Transactions

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    Problem                               Recovery Procedure

    Storage medium destruction            *Duplicate database
                                          Forward recovery
                                          Reprocess transactions

    Transaction error or system failure   *Backward recovery
                                          Forward recovery or reprocess transactions - bring forward to just before termination

    Incorrect data                        *Backward recovery
                                          Reprocess transactions (excluding those from the update that created incorrect data)

    Database Recovery Procedures

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    Summary

    This lecture has looked at security and recovery procedures

    Ensuring that these two are administered correctly cuts out the majority of problems with database administration

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    Further Reading

    Security: Connolly & Begg, chapter 19

    Concurrency Control: Connolly & Begg, chapter 20?

    Integrity and Recovery: Connolly & Begg, chapters 18 and 19?

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 44

    Advanced Database Security

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 45

    Contents

    Definitions

    Countermeasures

    Security Controls

    Data Protection and Privacy

    Statistical Databases

    Web Database Security Issues and Solutions

    SQL Injection

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 46

    Database Security Definition

    Definition (revisited): The protection of the database against intentional or unintentional threats using computer-based or non-computer-based controls

    Areas in which to reduce risk:

    theft and fraud

    loss of confidentiality

    loss of privacy

    loss of integrity

    loss of availability

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 47

    Countermeasures

    Ways to reduce risk

    Include Computer Based Controls

    Non-computer Based Controls

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 48

    Computer Based Controls

    Security of a DBMS is only as good as the OS

    Computer based security controls available:

    authorization and authentication

    views

    backup and recovery

    Integrity

    Encryption within database and data transport

    RAID for fault tolerance

    associated procedures e.g. backup, auditing, testing, upgrading, virus checking

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 49

    Non-computer based Controls

    Include:

    Security policy and contingency plan

    personnel controls

    secure positioning of equipment

    escrow agreements

    maintenance agreements

    physical access controls, both internal and external

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 50

    Data Security

    Two (original) broad approaches to data security:

    Discretionary access control

    a given user has different access rights (privileges) on different objects

    flexible, but limited to which rights users can have on an object

    privileges can be passed on at the user's discretion

    Mandatory access control

    each data object is labelled with a certain classification level

    each user is given a certain clearance level

    rigid, hierarchic

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 51

    Role Based Access Control

    A role is a specific function within an organisation

    Authorizations are granted to roles instead of users

    Users are made members of roles

    Privileges can not be passed on to other users

    Simplifies authorization management

    Supported in SQL

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 52

    System R Authorization Model

    One of the first authorization models for RDBMSs, developed as part of the System R RDBMS

    Based on the concept of protection objects: tables and views

    Access modes:

    SELECT

    INSERT

    DELETE

    UPDATE

    Not all applicable for views

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 53

    System R Authorization Model

    Users can give access to other users, and remove it, through use of GRANT and REVOKE

    Revocation (REVOKE) is recursive

    System R has a closed world policy: if there is no authorization then access is denied

    However, authorization can be granted later

    Negative authorization: denials are expressed explicitly, and denials take precedence
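
The idea of recursive revocation can be sketched with a small grant graph. This is a deliberately simplified model and an assumption on several points: it tracks a single privilege on a single object, it ignores the grant timestamps that a full treatment would need, and the function names `holders` and `revoke` are invented for the example.

```python
def holders(owner, grants):
    """Users who hold the privilege: the owner plus anyone reachable via grants."""
    held = {owner}
    changed = True
    while changed:
        changed = False
        for grantor, grantee in grants:
            if grantor in held and grantee not in held:
                held.add(grantee)
                changed = True
    return held


def revoke(owner, grants, grantor, grantee):
    """Remove one grant, then drop every grant whose grantor no longer holds the privilege."""
    remaining = set(grants) - {(grantor, grantee)}
    held = holders(owner, remaining)
    return {(g, e) for g, e in remaining if g in held}


# A owns the table; A granted to B, B granted to C, C granted to D.
chain = {("A", "B"), ("B", "C"), ("C", "D")}
print(revoke("A", chain, "A", "B"))  # set(): the whole chain is withdrawn

# If C also received the privilege directly from A, C and D keep it.
with_second_path = chain | {("A", "C")}
print(sorted(revoke("A", with_second_path, "A", "B")))
```

The closed world policy shows up in `holders`: a user who is not reachable from the owner through some grant simply has no access.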

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 54

    SQL Facilities

    SQL supports discretionary access control using view mechanism and authorization system

    e.g.

    CREATE VIEW S_NINE_TO_FIVE AS
    SELECT S.S#, S.SNAME, S.STATUS, S.CITY
    FROM S
    WHERE to_char(SYSDATE, 'HH24:MI:SS') >= '09:00:00'
    AND to_char(SYSDATE, 'HH24:MI:SS')

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 55

    Oracle Virtual Private Databases

    Fine-grained access control based on tuple-level access

    Uses dynamic query modification

    Users are given a specific policy

    The policy returns a specific WHERE clause that is added to the query

    SELECT * FROM prop_for_rent

    Becomes

    SELECT * FROM prop_for_rent WHERE prop_type = 'F'

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 56

    Data Protection and Privacy

    Privacy: concerns the right of an individual not to have personal information collected, stored and disclosed either willfully or indiscriminately

    Data Protection Act: the protection of personal data from unlawful acquisition, storage and disclosure, and the provision of the necessary safeguards to avoid the destruction or corruption of the legitimate data held

    New Freedom of Information Act

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 57

    Statistical Databases

    A database that permits queries that derive aggregated information (e.g. sums, averages) but not queries that derive individual information

    Tracking: it is possible to make inferences from legal queries to deduce answers to illegal ones

    SELECT COUNT(*) FROM STATS X WHERE X.SEX = 'M' AND X.OCCUPATION = 'Programmer'

    SELECT SUM(X.SALARY) FROM STATS X WHERE X.SEX = 'M' AND X.OCCUPATION = 'Programmer'
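
The inference can be reproduced with Python's built-in `sqlite3` module. This is a sketch with invented data: the rows in `STATS`, the names and salaries, and the helper `guarded_sum` are all assumptions. When the count is 1, the "aggregate" sum is one person's salary; the helper shows one possible guard, which is to refuse queries over too few rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STATS (NAME TEXT, SEX TEXT, OCCUPATION TEXT, SALARY INTEGER)")
conn.executemany(
    "INSERT INTO STATS VALUES (?, ?, ?, ?)",
    [
        ("Alf", "M", "Programmer", 50000),
        ("Bea", "F", "Programmer", 52000),
        ("Cal", "M", "Lawyer", 70000),
        ("Dee", "F", "Lawyer", 71000),
    ],
)

# Fixed predicate from the slide (a constant, not user input).
WHERE = "WHERE X.SEX = 'M' AND X.OCCUPATION = 'Programmer'"

count = conn.execute(f"SELECT COUNT(*) FROM STATS X {WHERE}").fetchone()[0]
total = conn.execute(f"SELECT SUM(X.SALARY) FROM STATS X {WHERE}").fetchone()[0]
print(count, total)  # count is 1, so the sum is an individual's salary


def guarded_sum(where, min_rows=2):
    """Answer the SUM query only if it covers at least min_rows rows."""
    n = conn.execute(f"SELECT COUNT(*) FROM STATS X {where}").fetchone()[0]
    if n < min_rows:
        return None
    return conn.execute(f"SELECT SUM(X.SALARY) FROM STATS X {where}").fetchone()[0]


print(guarded_sum(WHERE))                # refused: only one matching row
print(guarded_sum("WHERE X.SEX = 'M'"))  # answered: two matching rows
```

A size threshold alone does not stop a determined attacker, since differences between larger legal queries can still isolate one individual.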

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 58

    Statistical Databases

    Various strategies can be used to minimize problems:

    prevent queries from operating on only a few database entries

    swap attribute values among tuples

    randomly add in additional entries

    use only a random sample

    maintain history of query results and reject queries that use a high number of records identical to previous queries

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 59

    Web Database Security Issues

    Internet is an open network: traffic can easily be monitored, e.g. credit card numbers

    Challenge is to ensure that information conforms to: privacy, integrity, authenticity, non-fabrication, non-repudiation

    Information also needs to be protected on the web server

    Also need to protect from executable content

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 60

    Web Database Security Solutions

    Various methods can be used:

    proxy servers: improve performance and filter requests

    firewalls: prevent unauthorised access to/from a private network

    digital certificates: electronic message attachments to verify that the user is authentic

    Kerberos: centralised security server for all data and resources on the network

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 61

    Web Database Security Solutions

    Secure Sockets Layer and Secure HTTP

    SSL - secure connection between client and server

    S-HTTP - individual messages transmitted securely

    Secure Electronic Transactions: certificates which split transactions so that only relevant information is provided to each user

    Java - Java Virtual Machine (JVM)

    class loader - checks applications do not violate system integrity by checking class hierarchies

    bytecode verifier - verifies that code will not crash or violate system integrity

    ActiveX: uses digital signatures; the user is responsible for security

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 62

    SQL Injection

    A technique used to take advantage of non-validated input vulnerabilities to pass SQL commands through a Web application for execution by a backend database [1]

    Can chain SQL commands

    Embed SQL commands in a string

    Ability to execute arbitrary SQL queries

    1 http://imperva.com/application_defense_center/glossary/sql_injection.html

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 63

    SQL Injection: Example 1

    Form asking for username and password

    Original Query:

    SQLQuery = "SELECT count(*) FROM users
    WHERE username = '" + $username + "'
    AND password = '" + $password + "';"

    Specify username and password = ' OR '1' = '1

    SELECT count(*) FROM users WHERE
    username = '' OR '1' = '1' AND password = '' OR '1' = '1';

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 64

    SQL Injection : Example 2

    SQLQuery = "SELECT * FROM staff WHERE staff_no = " + $name + ";"

    Enter staff_no: 100 OR 1 = 1

    Will give the query:

    SELECT * FROM staff WHERE staff_no = 100 OR 1 = 1;

    Even worse:

    Enter staff_no: 100; DROP TABLE staff; SELECT * FROM sys.user_tables

    Enter staff_no: 100 UNION SELECT Username, Password FROM Users

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 65

    SQL Injection : Remedies

    Can include:

    Strip quotation marks and other spurious characters from strings

    Use stored procedures

    Limit field lengths or even don't allow text entries

    Restrict UNION
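
The login attack, and a widely used complement to the remedies listed above, can be shown with Python's built-in `sqlite3` module. This is a sketch under assumptions: the `users` row, the function names and the exact attack string `' OR '1' = '1` are chosen for illustration. Bound parameters (the `?` placeholders) are not on the slide's list; they are shown here because they keep user input out of the SQL text altogether.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('zahra', 's3cret')")  # hypothetical account


def login_vulnerable(username, password):
    """Builds the query by string concatenation, as in Example 1."""
    query = (
        "SELECT count(*) FROM users WHERE username = '" + username
        + "' AND password = '" + password + "'"
    )
    return conn.execute(query).fetchone()[0] > 0


def login_parameterised(username, password):
    """Passes the inputs as bound parameters, so they are treated as data only."""
    query = "SELECT count(*) FROM users WHERE username = ? AND password = ?"
    return conn.execute(query, (username, password)).fetchone()[0] > 0


attack = "' OR '1' = '1"
print(login_vulnerable(attack, attack))        # True: the injected OR makes the test pass
print(login_parameterised(attack, attack))     # False: no such username/password pair
print(login_parameterised("zahra", "s3cret"))  # True: genuine credentials still work
```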

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 66

    Summary

    Have looked at a number of issues and solutions for database security

    e.g. access controls, SQL features, etc.

    Web security is an important problem

    Need to consider security of data transmission, the data server and the clients

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 67

    Further Reading

    Connolly and Begg, chapter 19

    Date (7th edition), chapter 17

    both Connolly and Date have general introductions to security concepts, with mention of some advanced features

    Bertino and Sandhu: Database Security Concepts, Approaches and Challenges, IEEE Transactions on Dependable and Secure Computing, Vol. 2, No. 1, 2005

    Oracle 8i Virtual Private Database White Paper: http://www.oracle.com/technology/deploy/security/oracle8i/pdf/vpd_wp6.pdf

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 68

    Client/Server, Distributed and

    Internet Databases

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 69

    Client/Server Databases

    Web Databases

    Distributed Databases

    Contents

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 70

    In a file-server architecture, each client must run a copy of the DBMS

    A better solution is to have a central database server which performs all database commands sent to it from client PCs.

    Application programs on each client PC can then concentrate on user interface functions.

    Database recovery, security and concurrency control are managed centrally on the server.

    Client/Server Architecture

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 71

    DATABASE SERVER

    The SERVER portion of the client/server database system which provides processing and shared access functions.

    Client/Server Architecture

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 72

    CLIENT

    Manages the user interface (controls the PC screen, interprets data sent to it by the server and displays the results of database queries)

    The client forms queries in a specified language (usually SQL) to retrieve data from the database. This query process is usually transparent to the user.

    Client/Server Architecture

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 73

    CLIENT/SERVER ADVANTAGES

    Allows companies to harness the benefits of microcomputer technology such as low cost.

    Processing can be performed close to the source of the data - more speed.

    Allows the use of GUI interfaces that are commonly available on PCs and workstations.

    Paves the way for truly open systems.

    Client/Server Architecture

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 74

    CLIENT/SERVER DESIGN ISSUES

    The server must be upgradeable to allow for the growth in clients.

    Gateway software is normally required for accessing databases held on a mainframe.

    The server must have capabilities for backup, recovery, security and UPS.

    Client/Server Architecture

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 75

    CLIENT/SERVER DESIGN ISSUES

    Can be complex and so require specialised and expensive tools such as database servers and APIs.

    A lack of comprehensive standards.

    Front-end GUI software often requires expensive client workstations.

    Client/Server Architecture

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 76

    Traditional Client-Server Architecture

    Traditional database systems are based on a two-tier client-server architecture

    Fat clients

    Client: user interface, main business and data processing logic

    Database Server: server-side validation, database access

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 77

    Web Architecture

    Need for enterprise scalability causes problems which can be solved by a three-tier architecture

    Thin clients

    Client: user interface

    Application Server: business logic, data processing logic

    Database Server: server-side validation, database access

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 78

    Web as a Database Platform

    Advantages

    DBMS advantages

    E.g. transactions, concurrency, synchronisation, security, integrity

    Simplicity

    HTML is a simple markup language, however with new scripting languages this simplicity is being lost

    Platform independence

    Web clients are mostly platform independent

    Graphical User Interface

    Users prefer a GUI to a text based application

    Standardization

    HTML is a de facto standard

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 79

    Advantages (cont.)

    Cross-platform support: Users on all types of computer can access a machine with a web browser

    Transparent network access: Access solely by URL

    Scalable deployment: Applications upgraded on server only

    Innovation: Organisations can provide new services and reach new customers

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 80

    Web as a Database Platform

    Disadvantages

    Reliability: Internet is a slow and unreliable communication medium; no guarantee of delivery

    Security: Data is accessible on the web; user authentication and secure data transmissions are critical

    Cost: A report from Forrester Research claims that maintaining a commercial web site costs $200 to $3.4 million

    Scalability: Unreliable and potentially very large peak loads; needs highly scalable server architectures

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 81

    Disadvantages (cont.)

    Limited HTML Functionality: Need to extend HTML with scripting languages; adds a performance overhead

    Statelessness: No concept of a database connection

    Bandwidth: Internet is slow! 1.5 Mbps compared to 10-100 Mbps

    Performance: Many scripting languages are interpreted languages

    Immaturity of development tools: This is improving!

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 82

    Web Database Approaches

    Traditional web pages are normally static

    To run queries, need to be able to produce dynamic HTML pages

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 83

    Client Side vs. Server Side

    To access the database and process information from it, executable content is needed

    It acts as a gateway between the Web and the database server

    This can run at either of two locations:

    Client Side

    Server Side

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 84

    Web Database Approaches

    Approaches include:

    CGI - Common Gateway Interface

    HTTP Cookies - allow the machine to store information, e.g. user authentication

    JavaScript - code which runs on the client machine

    PHP - Hypertext Preprocessor

    Active Server Pages - MS Access dynamic forms

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 85

    Database Connectivity

    Client Side, 2 approaches:

    Extend the browser using scripts, add-ons or applets, e.g. plug-ins, JavaScript, ActiveX, Java applets

    Link browser to other (external) applications, e.g. legacy systems

    Server Side, 2 approaches:

    Embed scripts within web page source, e.g. PHP, Java servlets

    Create programs which are executed when accessed by client, e.g. CGI

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 86

    Client Side

    Advantages:

    Distribution of processing

    Feedback speed

    Web-page functionality

    Disadvantages:

    Platform/environment dependent

    Security and integrity

    Download time

    Programming limitations

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 87

    Server Side

    Advantages:

    Platform/browser independent

    Security and integrity

    Download time

    Programming limitations direct access to database

    Disadvantages:

    Lack of debugging tools

    Lack of direct control over user interface

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 88

    DECENTRALIZED DATABASE

    Stored on computers at multiple locations.

    Computers are not interconnected by a network.

    Users at the various sites cannot share data.

    DISTRIBUTED DATABASE

    Spread physically across computers in multiple locations that are connected by a data communications link.

    Distributed Databases

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 89

    Geographical Distribution: Several databases run under the control of different CPUs at a variety of different locations.

    Platform Distribution: Databases exist on diverse hardware platforms, and are 'brought together' by the distributed database manager.

    Architectural Distribution: Different database architectures exist together, e.g. an object-oriented database communicating with a relational database

    Distribution Types

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 90

    Distributed Database Requirements: For a distributed database to qualify as such, a fundamental principle must be adhered to:

    To the user, a distributed database should look exactly like a non-distributed system

    Local Autonomy: All operational controls and data maintenance at a site are controlled only by that site.

    Date's Rules

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 91

    No Reliance On A Central Site: This follows on from the first objective and is self-explanatory

    Continuous Operation: A distributed approach leads to greater reliability and availability. The database should still be able to function, even if one of its sites is unavailable.

    Date's Rules

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 92

    Distributed Transaction Management: Transaction processing is the key to the successful usage of distributed databases.

    Must cater for two core aspects of transaction management, i.e. recovery control and concurrency.

    Location Independence: Otherwise known as Transparency.

    Date's Rules

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 93

    Fragmentation Independence:

    Horizontal Partitioning: different rows from the same table are stored at different sites.

    Vertical Partitioning: different columns from the same table are maintained at different sites.

    Replication Independence: Replication occurs when a stored relation can be represented by many distinct copies (replicas), stored at many sites. As with fragmentation, users must not be aware that the data is replicated.

    Date's Rules

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 94

    Distributed Query Processing: Queries may retrieve information from several sites. Therefore distributed queries must be optimised.

    Date's Rules

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 95

    Hardware Independence: Presenting a 'single-image' system to the end user regardless of platform.

    Operating System Independence: Same as above, but based upon software.

    Network Independence: Support for a disparate variety of communication networks.

    DBMS Independence: Achieving heterogeneity between different database management systems via a common interface, i.e. the SQL language.

    Date's Rules

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 96

    ADVANTAGES

    Increased reliability and availability

    Encourages local ownership of data

    Modular growth

    Lower communication costs

    Faster response

    Distributed Databases

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 97

    DISADVANTAGES

    Software complexity and cost

    Processing overhead

    Data integrity

    Slow response

    Distributed Databases

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 98

    HOW SHOULD A DATABASE BE DISTRIBUTED ?

    Four basic strategies

    1. Data replication

    2. Horizontal partitioning

    3. Vertical partitioning

    4. Combinations of the above

    Distributed Databases

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 99

    Separate copy of the database stored at the different sites.

    Preferred for systems where:

    Most transactions are read only

    Data is relatively static, for example timetables or catalogues.

    Data Replication

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 100

    Advantages

    Reliability - If one site fails another copy of the data can be found at a second site.

    Fast response - Each site has a full copy of the data therefore queries can be processed locally.

    Data Replication Advantages

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 101

    Horizontal Partitioning: The base table is split horizontally into several different tables at different sites.

    Selected rows from a table are put into tables at different sites.
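    A minimal sketch of horizontal partitioning, assuming rows are assigned to sites by the value of one column. The table contents, the `city` routing column and the helper name `horizontal_partition` are illustrative assumptions, not part of the slides.

```python
# Hypothetical base table: each row is a dict; the "city" column decides the site.
branch_rows = [
    {"branch_no": "B003", "city": "Glasgow"},
    {"branch_no": "B005", "city": "London"},
    {"branch_no": "B007", "city": "Glasgow"},
]


def horizontal_partition(rows, site_of):
    """Group whole rows into per-site fragments using the site_of function."""
    fragments = {}
    for row in rows:
        fragments.setdefault(site_of(row), []).append(row)
    return fragments


fragments = horizontal_partition(branch_rows, lambda row: row["city"])

# The base table can be rebuilt as the union of its fragments.
reassembled = [row for site_rows in fragments.values() for row in site_rows]
```

    Each fragment keeps every column but only some of the rows, so a site that mostly queries its own city can answer those queries locally.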

    Distributed databases

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 102

    Advantages

    Efficiency - Data items are stored where they are most often used, away from other applications.

    Optimisation - Data optimised for local use

    Security - Only relevant data is available

    Distributed databases

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 103

    Disadvantages

    Inconsistent access speed - When data from several different partitions are required, access speed can vary significantly.

    Backup vulnerability

    Distributed databases

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 104

    Vertical PARTITIONING

    Some of the columns in a table are projected into a table at one of the sites and other columns are projected into a table at another site. The same advantages and disadvantages of horizontal partitioning apply.
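    A minimal sketch of vertical partitioning, assuming the primary key is repeated in every fragment so that the original rows can be rebuilt by a join. That assumption, the sample salary values and the helper names are illustrative and are not stated in the slides.

```python
# Hypothetical staff table; staff_no is assumed to be the primary key.
staff_rows = [
    {"staff_no": "SL21", "name": "John White", "salary": 30000},
    {"staff_no": "SG37", "name": "Ann Beech", "salary": 12000},
]


def vertical_partition(rows, key, columns):
    """Project the key plus the chosen columns into a fragment."""
    return [{key: row[key], **{c: row[c] for c in columns}} for row in rows]


def rejoin(left, right, key):
    """Rebuild full rows by joining two fragments on the shared key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]


site_a = vertical_partition(staff_rows, "staff_no", ["name"])    # e.g. an HR site
site_b = vertical_partition(staff_rows, "staff_no", ["salary"])  # e.g. a payroll site
rebuilt = rejoin(site_a, site_b, "staff_no")
```

    Any query that needs both `name` and `salary` has to contact both sites, which is one reason access speed can vary.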

    Distributed databases

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 105

    Combinations

    To complicate matters even further it is possible to have a strategy which is a combination of all the above. Some data stored centrally, some distributed both horizontally and vertically. It could be a real challenge (or a nightmare).

    Distributed databases

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 106

    DISTRIBUTED DBMS

    Determine the location from which data is to be retrieved.

    Translate requests from different nodes.

    Provide functions such as security, recovery, concurrency and optimisation.

    Distributed databases

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 107

    DISTRIBUTED DBMS

    IT SHOULD ALSO OFFER:

    Location transparency

    Replication transparency

    Failure transparency

    Concurrency transparency

    Commit protocol

    Distributed databases

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 108

    Further Reading

    Distributed Databases Connolly and Begg, chapter 22

    Web Databases Connolly and Begg, chapter 29

    Sections 29.1 to 29.3

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 109

    Object-Oriented Databases

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 110

    Contents

    Complex Applications

    RDBMS Weaknesses

    Next Generation Data Models

    Object-Oriented Databases

    Further Reading

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 111

    Relational DBMS Suitability

    Relational DBMS are suitable for certain types of applications:

    simple data types, e.g. dates, strings

    large number of instances, e.g. students, employees

    well defined relationships between data, e.g. student, course relationships and use of joins

    short transactions, e.g. simple queries

    Most successful for business applications

    On-line transaction processing

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 112

    Complex Applications

    RDBMS are inadequate for applications including:

    CAD, CAM

    CASE

    Office Information Systems

    Multimedia systems

    GIS

    Science and medicine

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 113

    Complex Applications

    CAD, CAM: complex objects; graphics; a large number of types but few instances of each type; hierarchical; design not static

    CASE: software development lifecycle; co-operative engineering; concurrent sharing of design code/documentation

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 114

    Complex Applications

    Office Information and Multimedia Systems:

    e-mail support

    documentation

    SGML documents

    Geographic Information Systems:

    spatial and temporal information, e.g. satellite/survey photos, maps

    pattern recognition

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 115

    RDBMS Weaknesses

    Poor representation of real world entities: normalisation leads to entities that don't closely match the real world; joins are costly

    Semantic overloading: all data held as relations; no mechanism for differentiation between entities and relationships

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 116

    RDBMS Weaknesses

    Poor support for integrity and enterprise constraints: relational systems are good at supporting referential, entity and simple business constraints

    not good for more complex enterprise constraints

    Homogeneous data structure: data pushed into rows and columns

    not all real world data can be organised in this way

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 117

    RDBMS Weaknesses

    Limited operations: SQL does not allow new operations to be defined, e.g. select age from person;

    Difficulty handling recursive queries, e.g. find all ancestors

    Impedance mismatch: need to embed SQL to get computational completeness; data types in SQL and programming language don't match

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 118

    RDBMS Weaknesses

    Concurrency, schema changes and poor navigational access:

    no support for long duration transactions

    difficult to change schema, e.g. add columns to a table

    RDBMS based on content-based access, not navigational

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 119

    Data Models

    [Figure: history of data models. 1st Generation: Hierarchical, Network. 2nd Generation: Relational. Followed by Entity-Relationship and Semantic models. 3rd Generation: Object-Relational, Object-Oriented.]

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 120

    Object-Oriented Databases

    Overview and Origins

    OODB Strategies

    OO Database System Manifesto

    Advantages and Disadvantages

    Object Database Standard OQL

    JDO

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 121

    OO Databases Overview

    Object-Oriented Database e.g. ObjectStore, Objectivity, Jasmine, POET

    based on object-oriented programming techniques

    Information is represented in the form of objects

    Objects: "A uniquely identifiable entity that contains both the attributes that describe the state of a real-world object and the actions associated with it" (Connolly & Begg, 2005)

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 122

    OO Databases Overview

    include concepts such as user extensible type system, complex objects, encapsulation, inheritance, polymorphism, dynamic binding, object identity

    ODMG standard being devised to define data model and query language

    standard also defines interoperability between ODMG compliant systems

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 123

    Origins of OO Databases

    Traditional Database Systems: persistence, sharing, transactions, concurrency control, recovery control, security, integrity, querying

    Semantic Data Models: generalisation, aggregation, navigational querying

    Object-Oriented Programming: object identity, encapsulation, inheritance, types and classes, methods, complex objects, polymorphism, extensibility

    Special Requirements: versioning, schema evolution

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 124

    OODBMS Development

    Strategies

    Various approaches:

    Extend an existing OO-PL with database capabilities

    Provide extensible OO DBMS libraries

    Embed OO database language constructs in a conventional host language

    Extend an existing database language with OO capabilities

    Develop a novel data model/data language

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 125

    Object Oriented DB System

    Manifesto

    Developed by Atkinson et al., 1989

    Devised 13 mandatory features for an OODBMS based on two criteria:

    should be an OO system

    should be a DBMS

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 126

    Object-Oriented DB System

    Manifesto

    OO Criteria:

    1. Complex objects must be supported

    2. Object identity must be supported

    3. Encapsulation must be supported

    4. Types or classes must be supported

    5. Types or classes must be able to inherit from their ancestors

    6. Dynamic binding must be supported

    7. The DML must be computationally complete

    8. The set of data types must be extensible

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 127

    Object-Oriented DB System

    Manifesto

    DBMS Criteria:

    9. Data persistence must be provided

    10. The DBMS must be capable of managing very large databases

    11. The DBMS must support concurrent users

    12. The DBMS must be capable of recovery from hardware and software failures

    13. The DBMS must provide a simple way of querying data

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 128

    OODB Advantages

    Enriched modelling capabilities

    Extensibility

    Removal of impedance mismatch

    More expressive query language

    Support for schema evolution

    Support for long duration transactions

    Applicability to advanced database applications

    Improved performance

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 129

    OODB Disadvantages

    Lack of universal data model

    Lack of experience

    Lack of standards

    Query optimisation compromises encapsulation

    Locking at object level may impact performance

    Complexity

    Lack of support for views

    Lack of support for security

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 130

    And most importantly

    What about integrity?

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 131

    Object Database Standard

    Standard for Object-Oriented Data Model proposed by Object Data Management Group

    ODMG Object model is a superset of OMG object model

    Consists of:

    Object model (OM)

    Object Definition Language (ODL)

    Object Interchange Format (OIF)

    Object Query Language (OQL)

    Language bindings: C++, Smalltalk, Java

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 132

    Object Model

    Basic constructs: objects and literals, both characterised by types

    Objects: attributes, relationships, operations

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 133

    Types

    Interface Definition: defines abstract behaviour of object type, e.g. interface Employee{..};

    Class: defines abstract behaviour and abstract state; extended interface with information for ODMS schema definition; objects are class instances, e.g. class Person{..};

    Literal: abstract state of a literal type, e.g. struct Complex {float re; float im;};

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 134

    Types (cont)

    Inheritance: applied to both interfaces and classes; inheritance of behaviour between object types

    Extends: applied to object types only; inheritance of state and behaviour

    Extent: set of all instances of a class

    extension must have a unique key

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 135

    Objects

    Instances of a class

    Have a unique object identifier, which remains for the lifetime of the object

    Names equivalent to global variables

    Lifetime can be transient or persistent

    type and lifetime are independent

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 136

    Objects

    Collections: Set, Bag, List, Array, Dictionary (sequenced key-value pairs)

    Structured objects: Date, Interval, Time, Timestamp

    Literals: Atomic, Collection, Structured

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 137

    Example ODL Schema

    class Student
    (extent students)
    {
        attribute short id;
        attribute string name;
        attribute string address;
        attribute date dob;
        relationship set<Module> takes
            inverse Module::takenby;
        short age();
    };

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 138

    Example ODL Schema

    class Module
    (extent modules)
    {
        attribute string title;
        attribute short semester;
        relationship set<Student> takenby
            inverse Student::takes;
    };

    class Postgrad extends Student
    (extent postgrads)
    {
        attribute string thesis_title;
    };

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 139

    Object Interchange Format

    Used to dump/load current state of ODBMS to/from a set of files

    e.g.
    Sarah Person{Name "Sarah",
                 PersonAddress{Street "Willow Lane",
                               City "Durham",
                               Phone{CountryCode 44,
                                     AreaCode 191,
                                     PersonCode 1234}}}

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 140

    Object Query Language

    Similar to SQL92

    extensions: complex objects, object identity, path expressions, polymorphism, operation invocation, late binding

    e.g.
    select distinct x.age
    from Persons x
    where x.name = "Pat";

    Return literal of type set<struct>:
    select distinct struct(a : x.age, s : x.sex)
    from Persons x
    where x.name = "Pat";

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 141

    OQL Examples

    Path Expressions
    select c.address
    from Persons p, p.children c
    where p.address.street = "Main Street"
    and count(p.children) >= 2
    and c.address.city != p.address.city;

    Methods
    select max(select c.age from p.children c)
    from Persons p
    where p.name = "Paul";

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 142

    OQL Polymorphism Examples

    Late Binding
    select p.activities
    from Persons p;

    Class Indication
    select ((Student)p).grade
    from Persons p
    where "course of study" in p.activities;

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 143

    Java Data Objects

    ODMG disbanded in 2001

    ODMG Java Data Binding superseded by JDO:

    Provides transparent persistence

    Scales from embedded to enterprise

    Integrates with EJB and J2EE

    Is being widely adopted in the database industry

    More information at www.odmg.org

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 144

    Further Reading

    Connolly & Begg, chapters 25, 26 and 27

    Date 7th ed., chapter on Object-Oriented databases

    Atkinson et al, Object-Oriented Database System Manifesto, Proc. 1st Intl Conference on DOOD, Japan, 1989.

    Cattell et al., The Object Data Standard: ODMG 3.0

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    Object-Relational Databases

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    Contents

    Background

    Extensions to Relational Model

    Database World

    Advantages and Disadvantages of ORDBMS

    Third Generation Databases

    Postgres

    Oracle

    SQL3 and SQL:2003

    Comparison of OO/OR Models

    Further Reading

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    Extensions to Relational Model

    Advanced Emerging Database Applications use: user extensible type system, encapsulation, inheritance, polymorphism, dynamic binding, complex objects, object identity

    Extend relational model with OO features:

    Extended Relational DBMS

    Object-Relational DBMS

    Universal Server

    Standard based on SQL - SQL3 (started 1991!)

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    The Database World

    Stonebraker proposed a four quadrant view of the database world:

    Relational DBMS     | Object-Relational DBMS
    File systems        | Object-Oriented DBMS

    Vertical axis: search capabilities/multi-user support

    Horizontal axis: data complexity/extensibility

    However distinction between OQL and SQL is becoming less clear

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    Object-Relational Advantages

    Weaknesses of RDBMS given last time

    Reuse and Sharing: extending the DBMS server to perform standard functionality centrally

    functionality shared by all applications, e.g. spatial data types

    Evolutionary rather than revolutionary: SQL3 upwardly compatible with current SQL standard

    Current standard SQL:2003

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    Object-Relational Disadvantages

    Complexity and Associated Increased Costs: simplicity and purity of relational model is lost

    majority of applications do not achieve optimal performance

    Semantic gap between object-oriented and relational: OO applications not as data centric as Relational

    Objectives of Initial SQL standard were to minimise user effort and be easy to learn

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    3rd Generation Database Manifesto

    Manifesto developed by Stonebraker et al. (1990)

    1. A third generation DBMS must have a rich type system

    2. Inheritance is a good idea

    3. Functions, including database procedures and methods and encapsulation, are a good idea

    4. Unique identifiers for records should be assigned by the DBMS only if a user-defined primary key is not available

    5. Rules (triggers, constraints) will become a major feature in future systems. They should not be associated with a specific function or collection

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    3rd Generation DBMS

    6. Essentially, all programmatic access to a database should be through a non-procedural, high-level access language

    7. There should be at least two ways to specify collections, one using enumeration of members, and one using the query language to specify membership

    8. Updateable views are essential

    9. Performance indicators have almost nothing to do with data models and must not appear in them

    10. 3rd generation DBMS must be accessible from multiple high-level languages

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    3rd Generation DBMS

    11. Persistent forms of a high-level language, for a variety of high-level languages, are a good idea. They will all be supported on top of a single DBMS by compiler extensions and a complex runtime system

    12. For better or worse, SQL is intergalactic dataspeak.

    13. Queries and their resulting answers must be the lowest level of communication between a client and a server

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    3rd Generation DBMS

    Atkinson, OODB Manifesto, 1989

    Stonebraker et al. devised 3rd Generation DB System Manifesto in 1990

    Darwen and Date published a 3rd Manifesto in 1995 in defense of the relational data model:

    certain OO features are desirable, but should be orthogonal to the relational model

    Relational model needs no extension, no correction, no subsumption, no perversion

    SQL is a perversion of the model

    define a language D, but with a front-end layer that allows SQL to be used

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    SQL3 (aka SQL99, and SQL:1999!)

    The ANSI/ISO SQL3 standard includes new features including:

    row and reference type constructors

    user defined types (UDTs), which can participate in supertype/subtype relationships

    user defined procedures, functions and operators

    type constructors for collection types: arrays, sets, lists, multisets

    support for large objects: BLOBs and CLOBs

    Superseded by SQL:2003

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    SQL:2003

    Row types: a data type that can represent types of rows in tables

    e.g.
    CREATE TABLE branch(
        bno VARCHAR(3),
        address ROW(
            street VARCHAR(25),
            town VARCHAR(15),
            pcode ROW(city_id VARCHAR(4),
                      subpart VARCHAR(4))));

    INSERT INTO branch
    VALUES('B5', ('22 Deer Rd', 'Sidcup', ('SW1', '4EH')));

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    SQL:2003

    User Defined Types (UDT)

    abstract data types

    2 types, distinct and structured

    structured types consist of one or more attribute and routine definitions

    CREATE TYPE age_type AS INTEGER FINAL;

    CREATE TYPE person_type AS (
        PRIVATE
            date_of_birth DATE CHECK(date_of_birth > DATE '1990-01-01');
        PUBLIC
            fname VARCHAR(15) NOT NULL,
            lname VARCHAR(15) NOT NULL,
            FUNCTION get_age (P person_type) RETURNS age_type
                RETURN /* code to calc age */
            END;
        ...
    END) NOT FINAL;

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    SQL:2003

    User defined routines (UDR)

    may be defined as part of a UDT or as part of a schema

    can be a procedure, function or method

    can be written in SQL or in an external programming language

    Polymorphism uses a generalised object model, i.e.

    No two functions in the same schema allowed to have same signature (no. of arguments, same data types, same return type)

    No two procedures allowed to have same name and number of parameters

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    SQL:2003

    Subtypes/supertypes: multiple inheritance is not supported

    substitutability: when an instance of a supertype is expected, an instance of the subtype can be used in its place

    Tables: a UDT instance can only persist if stored as a column in a table

    can use table inheritance

    Completely independent from UDT facility

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    SQL:2003

    Querying uses SQL92 syntax with extensions to handle objects

    e.g.

    SELECT s.lname, s.get_age

    FROM staff s

    WHERE s.is_manager;

    SELECT p.lname, p.address

    FROM person p

    WHERE p.get_age > 65;

    SELECT p.lname, p.address

    FROM ONLY (person) p

    WHERE p.get_age > 65;

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    SQL:2003

    Reference Types and OID: system generated, type REF

    Reference types can be used to define relationships between row types

    reference types uniquely identify rows

    allows rows to be shared across tables

    complex joins can be replaced by simple path expressions

    reference types do not provide referential integrity

    Collection types: ARRAYs, LISTs, SETs, MULTISETs

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    SQL:2003

    Persistent Stored Modules (SQL/PSM) SQL:2003 now computationally complete

    New statements added: blocks

    Assignment

    IF .. THEN .. ELSE .. END IF, and CASE

    REPEAT BLOCKS

    CALL and RETURN for invoking procedures

    Condition handling

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    SQL:2003

    Triggers: an SQL statement that is automatically executed by the DBMS as a side effect of a modification to a table

    Triggering events include insertion, deletion and update of rows in a table

    Useful for:

    Verifying input data

    Maintaining complex integrity constraints

    Alerts
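    The trigger uses listed above can be sketched with SQLite through Python's standard sqlite3 module. This is a hedged illustration only: SQLite's trigger dialect differs in detail from SQL:2003, and the table names, trigger names and the negative-salary rule are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staff (staff_no TEXT PRIMARY KEY, salary INTEGER);
CREATE TABLE salary_audit (staff_no TEXT, old_salary INTEGER, new_salary INTEGER);

-- Verifying input data: refuse rows with a negative salary.
CREATE TRIGGER reject_negative_salary
BEFORE INSERT ON staff
WHEN NEW.salary < 0
BEGIN
    SELECT RAISE(ABORT, 'salary must not be negative');
END;

-- Alerting/auditing: record every salary change as a side effect.
CREATE TRIGGER log_salary_change
AFTER UPDATE OF salary ON staff
BEGIN
    INSERT INTO salary_audit VALUES (OLD.staff_no, OLD.salary, NEW.salary);
END;
""")

conn.execute("INSERT INTO staff VALUES ('SG37', 12000)")
conn.execute("UPDATE staff SET salary = 13000 WHERE staff_no = 'SG37'")
audit = conn.execute("SELECT * FROM salary_audit").fetchall()

try:
    conn.execute("INSERT INTO staff VALUES ('SG99', -1)")
    rejected = False
except sqlite3.DatabaseError:
    # The BEFORE INSERT trigger aborted the statement.
    rejected = True
```

    Neither the UPDATE nor the INSERT mentions the triggers; the DBMS fires them automatically when the triggering event occurs.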

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    Oracle 8

    An object-relational extension to Oracle 7

    Object types can be used to create object tables with object identifiers: attributes; methods (normally written in PL/SQL)

    Does not support object hierarchies; Oracle 9 does support object hierarchies

    New types: VARRAYs and nested tables; REFs; LOBs

    Oracle 9i, 10g: further updates to Oracle 8

    Examples in tutorial booklet

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    Comparison of ORDBMS v OODBMS (taken from Connolly and Begg)

    Feature                         ORDBMS                       OODBMS
    OID                             Supported (REF type)         Supported
    Encapsulation                   Supported (UDT)              Broken for queries
    Inheritance                     Supported                    Supported
    Polymorphism                    Supported                    Supported (OOPL)
    Complex Objects                 Supported (UDT)              Supported
    Relationships                   Strongly supported           Supported
    Create/Access persistent data   Supported, not transparent   Supported, degree of transparency differs
    Ad hoc query facility           Strong support               Supported in ODMG2
    Navigation                      Supported (REF type)         Strong support
    Integrity Constraints           Strongly supported           No
    Object server/page server       Object server                Either
    Schema evolution                Limited support              Varying support
    ACID transactions               Strong support               Supported
    Recovery                        Strong support               Varying support
    Adv. trans models               No support                   Varying support
    Security, Integrity, Views      Strong support               Limited support

  • Kyanganda S. ICS 2415 Advanced Dbase Systems

    Further Reading

    Connolly and Begg, chapter 28 a very good discussion

    Stonebraker, Object-Relational DBMSs: The Next Great Wave, 1996.

    Third Manifesto www.thirdmanifesto.com

    OR and OO manifestos are available from citeseer http://citeseer.ist.psu.edu/

    Dietrich and Urban Advanced Course in DB Systems chapter 8 covers SQL:2003

    Oracle Object-relational Tutorial from module

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 167

    Database Performance

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 168

    Contents

    Database Performance

    Denormalisation

    Indexes

    Clustering

    Query Optimisation

    Benchmarking: Wisconsin, TPC-C, OO7, BUCKY

    Summary

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 169

    Database Performance

    Good query performance is necessary to achieve acceptable performance of an RDBMS

    Various ways in which this can be achieved:

    De-normalisation of data to reduce joins

    Creating indexes on frequently retrieved attributes

    Clustering tables to reduce the number of disk reads

    Automatic optimisation of queries

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 170

    Normalisation

    Normalisation improves the logical database design and prevents anomalies BUT

    More tables mean more joins

    Joining > 3 tables is likely to be slow

    De-normalisation reverses normalisation for efficiency purposes

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 171

    Database Performance

    Example:

    Branch(BranchNo, street, city, postcode, mgrstaffno)

    Could also be:

    Branch(BranchNo, street, postcode, mgrstaffno)

    Postcode(Postcode,city)
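    The two Branch designs above can be compared with a small sketch using Python's standard sqlite3 module. The column types, the table name `BranchDenorm` and the two sample rows are assumptions for illustration; the point is only that the normalised design needs a join to recover the city, while the de-normalised design does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalised design: city is reached through the Postcode table.
CREATE TABLE Postcode (postcode TEXT PRIMARY KEY, city TEXT);
CREATE TABLE Branch (
    branchno TEXT PRIMARY KEY, street TEXT,
    postcode TEXT REFERENCES Postcode(postcode), mgrstaffno TEXT);

-- De-normalised design: city is stored redundantly in the branch row.
CREATE TABLE BranchDenorm (
    branchno TEXT PRIMARY KEY, street TEXT, city TEXT,
    postcode TEXT, mgrstaffno TEXT);

INSERT INTO Postcode VALUES ('SW1 4EH', 'London'), ('G11 9QX', 'Glasgow');
INSERT INTO Branch VALUES ('B005', '22 Deer St', 'SW1 4EH', 'SL21'),
                          ('B003', '163 Main St', 'G11 9QX', 'SG5');
INSERT INTO BranchDenorm VALUES
    ('B005', '22 Deer St', 'London', 'SW1 4EH', 'SL21'),
    ('B003', '163 Main St', 'Glasgow', 'G11 9QX', 'SG5');
""")

# Normalised: one join is needed to list each branch with its city.
normalised = conn.execute(
    "SELECT b.branchno, p.city FROM Branch b "
    "JOIN Postcode p ON b.postcode = p.postcode ORDER BY b.branchno").fetchall()

# De-normalised: the same answer comes from a single table.
denormalised = conn.execute(
    "SELECT branchno, city FROM BranchDenorm ORDER BY branchno").fetchall()
```

    The price of the simpler query is that a change of city for a postcode must now be applied to every matching branch row.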

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 172

    De-normalisation

    Advantages:

    Minimises need for joins

    Reduces number of foreign keys in relations

    Reduces number of indexes

    Saves storage space

    Reduces number of relations

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 173

    De-normalisation

    Disadvantages:

    Speeds up retrievals, but may slow down updates

    Increases application complexity

    Relation size can increase

    Sacrifices flexibility

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 174

    Indexes

    INDEXES

    An index is a table or some other data structure that is used to determine the location of a row within a table that satisfies some condition.

    Indexes may be defined on both primary and non key attributes.

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 175

    Indexes

    Oracle allows faster access on any named table by using an index.

    each row within a table is given a unique value or rowid.

    each rowid can be held in an index.

    an index can be created at any time.

    any column within a table can be indexed.

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 176

    When to create an Index?

    Before any input of data for Unique index

    After data input for Non-unique index

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 177

    Creating Indexes

    HOW DO YOU CREATE AN INDEX?

    EXAMPLE:

    (a) CREATE INDEX TENIDX ON TENANT(SURNAME);

    (b) CREATE UNIQUE INDEX TENIDX ON TENANT(SURNAME);
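    Statement (a) can be tried out with Python's standard sqlite3 module. This is a sketch under assumptions: it uses SQLite rather than Oracle, and the `TENANT_NO` column and the generated surnames are invented for the example. SQLite's EXPLAIN QUERY PLAN is used to check whether the optimiser chooses the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TENANT (TENANT_NO INTEGER PRIMARY KEY, SURNAME TEXT)")
conn.executemany(
    "INSERT INTO TENANT (SURNAME) VALUES (?)",
    [("Name%d" % i,) for i in range(1000)],
)

# Statement (a) from the slide: a non-unique index on SURNAME.
conn.execute("CREATE INDEX TENIDX ON TENANT(SURNAME)")

# Ask SQLite how it would run a lookup on the indexed column.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM TENANT WHERE SURNAME = 'Name42'"
).fetchall()
plan_text = " ".join(row[-1] for row in plan)  # last column holds the description

matches = conn.execute(
    "SELECT SURNAME FROM TENANT WHERE SURNAME = 'Name42'").fetchall()
```

    The query itself is unchanged by the index; only the access path reported in `plan_text` differs.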

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 178

    Index Guidelines

    GUIDELINES FOR USE OF INDEXES

    > 200 rows in a table

    a column is frequently used in a where clause

    specific columns are frequently used as join columns

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 179

    Indexes

    POINTS TO WATCH

    avoid if possible > 3 indexes on any one table

    avoid indexing a column with too few distinct values, for example male/female

    avoid indexing a column with too many distinct values

    avoid if > 15% of rows will be retrieved

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 180

    Clusters

    A disk is arranged in blocks

    Blocks are retrieved as a whole and buffered

    Disk Access time is slow compared with Memory access

    Gains can be made if the number of block transfers can be reduced

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 181

    Database Performance

    CLUSTERING

    clusters physically arrange the data on disk so that frequently retrieved info is stored together

    allows 2 or more tables to be stored in the same physical block

    can greatly reduce access time for join operations

    can also reduce storage space requirements

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 182

    Database Performance

    CLUSTER DEFINITION

    clustering is transparent to the user

    no queries have to be modified

    no applications need to be changed

    tables are queried in the same way whether clustered or not

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 183

    Database Performance

    DECIDING WHERE TO USE CLUSTERS

    Each table can only reside in 1 cluster

    At least one attribute in the cluster must be NOT NULL

    Consider the query transactions in the system

    How often is the query submitted?

    How time critical is the query?

    What's the amount of data retrieved?

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 184

    Clustering Tables

    Branch Table                                 | Staff Table
    Street       City     Postcode  Branch No    | Staff No  First Name  Last Name  Position    DOB  Salary
    22 Deer St   London   SW1 4EH   B005         | SL21      John        White      Manager          310000
                                                 | SL41      Julie       Lee        Assistant        9000
    163 Main St  Glasgow  G11 9QX   B003         | SG37      Ann         Beech      Assistant        12000
                                                 | SG14      David       Ford       Supervisor       18000
                                                 | SG5       Susan       Brand      Manager          24000

    Tables Clustered on Common Column

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 185

    Database Performance

    CLUSTERING EXERCISE

    [Figure: the STOCK table linked to WAREHOUSE and PRODUCT, labelled with the counts 1000 and 3]

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 186

    Database Performance

    To speed up access time to data in these three tables (WAREHOUSE, PRODUCT, STOCK) it is necessary to cluster either STOCK around WAREHOUSE, or STOCK around PRODUCT.

    How do we decide which will be the most efficient?

    For the purpose of this exercise we will assume that each block can hold 100 records.

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 187

    Database Performance

    If STOCK is clustered around PRODUCT

    No of products = 1000. There will be 1 record for each PRODUCT in each WAREHOUSE. Therefore each product would have 3 records

    Each block would contain 100/3 products, i.e. 33 products. There would therefore be a 1 in 3 chance of accessing a particular stock item by reading one block of data.

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 188

    Database Performance

    If STOCK is clustered around WAREHOUSE

    No of warehouses = _____. There will be ____ record for each item of STOCK in each warehouse. Therefore each warehouse would have ______ records. The records for each warehouse would have to be stored across ______ blocks.

    Access would therefore be faster if STOCK is clustered around the product.
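    The block arithmetic behind this exercise can be checked with a short calculation. It assumes 3 warehouses (inferred from the statement that each product has 3 stock records), 1000 products, and 100 records per block as stated; the variable names are illustrative.

```python
import math

RECORDS_PER_BLOCK = 100  # stated assumption of the exercise
PRODUCTS = 1000
WAREHOUSES = 3           # inferred: one stock record per product per warehouse

stock_records = PRODUCTS * WAREHOUSES

# Clustered around PRODUCT: the records for one product sit together.
records_per_product = WAREHOUSES
blocks_per_product = math.ceil(records_per_product / RECORDS_PER_BLOCK)
products_per_block = RECORDS_PER_BLOCK // records_per_product

# Clustered around WAREHOUSE: the records for one warehouse sit together.
records_per_warehouse = PRODUCTS
blocks_per_warehouse = math.ceil(records_per_warehouse / RECORDS_PER_BLOCK)
```

    Under these assumptions all stock records for one product fit in a single block, whereas the stock records for one warehouse are spread over several blocks.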

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 189

    Database Performance

    [Figure: an SQL query (Select * from ...;) is passed to the DBMS, where SQL OPTIMISATION takes place, before the DATA FILES are accessed]

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 190

    Query Optimisation

    Automatic query optimisation can dramatically improve query execution time

    e.g. Consider the simple SQL query

    select s.student_no, s.student_name, c.course_name

    from student s, course c

    where s.course_id = c.course_id

    and s.age > 25;

    This query is executed more efficiently if the selections and projections are performed before the join

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 191

    Example

    1000 students of which only 100 are over the age of 25, and there are 50 courses

    Alternative 1: Join first

    read the 1000 students, read all courses 1000 times (once for each student), construct an intermediate table of 1000 records (which may be too large to fit in memory)

    restrict the result to those over the age of 25 (100 rows at most)

    project the result over the required attributes

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 192

    Example

    Alternative 2: Restrict first

    read 1000 tuples but restrict to those over the age of 25, returning an intermediate table of only 100 rows - which has a much better potential of being storable in main memory

    join the result with the course table, again returning an intermediate table of only 100 rows

    project the result over the required attributes

    Obviously this version is BETTER! Could be improved further by doing the projection before the join.
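
    The two alternatives can be compared with a small Python sketch. It assumes a naive nested-loop join and made-up data that matches the counts in the example (1000 students, 100 of them over 25, 50 courses); the function and field names are illustrative only.

```python
import random

def make_data(n_students=1000, n_over_25=100, n_courses=50, seed=0):
    """Build hypothetical student and course tables matching the example's counts."""
    rng = random.Random(seed)
    courses = [{"course_id": c, "course_name": f"Course {c}"} for c in range(n_courses)]
    students = [
        {
            "student_no": i,
            "student_name": f"Student {i}",
            "age": 30 if i < n_over_25 else 20,  # the first n_over_25 students are over 25
            "course_id": rng.randrange(n_courses),
        }
        for i in range(n_students)
    ]
    return students, courses

def nested_loop_join(students, courses):
    """Naive join: scan every course for every student."""
    return [{**s, **c} for s in students for c in courses
            if s["course_id"] == c["course_id"]]

def project(rows):
    """Keep only the attributes named in the select list."""
    return [(r["student_no"], r["student_name"], r["course_name"]) for r in rows]

def join_first(students, courses):
    joined = nested_loop_join(students, courses)          # intermediate table: all students
    restricted = [r for r in joined if r["age"] > 25]
    return project(restricted), len(joined)

def restrict_first(students, courses):
    restricted = [s for s in students if s["age"] > 25]   # intermediate table: over-25s only
    joined = nested_loop_join(restricted, courses)
    return project(joined), len(restricted)

students, courses = make_data()
result_a, size_a = join_first(students, courses)
result_b, size_b = restrict_first(students, courses)
print(size_a, size_b, result_a == result_b)  # prints: 1000 100 True
```

    Both orders return the same rows; only the size of the intermediate table differs.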

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 193

    Query Processing Stages

    Four stages in query processing

    1. Cast the query into internal form

      normally tree based (relational algebra)

    2. Convert to canonical form

    3. Choose candidate low-level procedures

      using indexes, clustering, etc.

    4. Generate query plans and choose and run the optimal query

      based on cost formulas and database statistics

      Rule or cost based in Oracle

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 194

    Query Cast into Internal Form

    S C

    JOIN over course_id

    RESTRICT where age > 25

    PROJECT over student_no,

    RESULT

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 195

    Canonical Form

    Canonical form: given a set Q of queries, and a notion of equivalence between two queries q1 and q2 in set Q, a subset C of Q is a set of canonical forms for Q if and only if every query q in Q is equivalent to exactly one query c in C.

    The query c is the canonical form of the query q

    Uses expression transformation rules

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 196

    Expression Transformation Rules

    Examples (not complete)

    (A WHERE p1) WHERE p2 == A WHERE p1 and p2

    (A PROJECT x,y) PROJECT y == A PROJECT y

    (A UNION B) PROJECT x == (A PROJECT x) UNION (B PROJECT x)

    (A JOIN B) PROJECT x == (A PROJECT x1) JOIN (B PROJECT x2)

    A JOIN B == B JOIN A

    (A JOIN B) JOIN C == A JOIN (B JOIN C)

    (A JOIN B) PROJECT x == A PROJECT x, where x is FK from B to A
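
    Several of these rules can be checked on small relations. The sketch below is an assumption-laden illustration, not a DBMS implementation: relations are Python sets of tuples, JOIN is taken to be a natural join, and the relations A, A2, B, C and predicates p1, p2 are invented sample data.

```python
def row(**attrs):
    """One tuple of a relation, stored as sorted (attribute, value) pairs so it is hashable."""
    return tuple(sorted(attrs.items()))

def where(rel, pred):
    """Restriction: keep the tuples satisfying pred."""
    return {t for t in rel if pred(dict(t))}

def project(rel, attrs):
    """Projection with duplicate elimination (the result is a set)."""
    return {tuple((k, v) for k, v in t if k in attrs) for t in rel}

def join(a, b):
    """Natural join on whatever attributes the two relations share."""
    out = set()
    for ta in a:
        da = dict(ta)
        for tb in b:
            db = dict(tb)
            if all(da[k] == db[k] for k in da.keys() & db.keys()):
                out.add(row(**{**da, **db}))
    return out

# Hypothetical sample relations
A = {row(student_no=1, age=30, course_id=10),
     row(student_no=2, age=22, course_id=10),
     row(student_no=3, age=41, course_id=20)}
A2 = {row(student_no=4, age=19, course_id=20)}
B = {row(course_id=10, course_name="Databases"),
     row(course_id=20, course_name="Networks")}
C = {row(course_name="Databases", dept="ICS"),
     row(course_name="Networks", dept="EE")}

p1 = lambda t: t["age"] > 25
p2 = lambda t: t["course_id"] == 20

checks = {
    "combine restrictions":
        where(where(A, p1), p2) == where(A, lambda t: p1(t) and p2(t)),
    "collapse projections":
        project(project(A, {"age", "course_id"}), {"course_id"}) == project(A, {"course_id"}),
    "project over union":
        project(A | A2, {"course_id"}) == project(A, {"course_id"}) | project(A2, {"course_id"}),
    "join commutes":
        join(A, B) == join(B, A),
    "join associates":
        join(join(A, B), C) == join(A, join(B, C)),
}
for name, ok in checks.items():
    print(name, ok)
```

    Checking on one data set does not prove a rule, but it is a quick way to catch a transformation that is not an equivalence.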

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 197

    Choose Candidate Low-Level Procedures

    How to execute the query represented by that converted form

    Take into consideration:

    Indexes

    Other physical access paths

    Distribution of data values

    Clustering

    Specify as a series of low-level operations

    Each low level operation has a set of predefined implementation procedures

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 198

    Generate Query Plans/Choose Cheapest

    Each query plan from stage 3 will have a cost formula generated from the cost formula for each low-level procedure

    Oracle supports:

    Rule Based

      Rank queries according to algebra operations

      15 rules in Oracle

    Cost Based

      The query ranked best by the rules may not in fact be optimal due to the cost of operating the query, e.g. join order

      Need to gather statistics
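
    A minimal sketch of cost-based choice, for the restriction age > 25 on the student table. The three candidate procedures, their cost formulas (counted in block reads) and the statistics are simplified assumptions made up for illustration; they are not Oracle's formulas. The 100 records per block figure is borrowed from the clustering exercise earlier in these notes.

```python
import math

# Hypothetical statistics the optimiser would have gathered
stats = {
    "rows": 1000,            # students in the table
    "rows_per_block": 100,   # records per block
    "matching_rows": 100,    # students over 25
    "index_height": 2,       # assumed levels in an index on age
}

def full_scan(s):
    """Read every block of the table."""
    return math.ceil(s["rows"] / s["rows_per_block"])

def unclustered_index(s):
    """Descend the index, then assume one block read per matching row."""
    return s["index_height"] + s["matching_rows"]

def clustered_index(s):
    """Descend the index, then read only the contiguous blocks holding the matches."""
    return s["index_height"] + math.ceil(s["matching_rows"] / s["rows_per_block"])

plans = {
    "full table scan": full_scan,
    "unclustered index on age": unclustered_index,
    "clustered index on age": clustered_index,
}

costs = {name: formula(stats) for name, formula in plans.items()}
best = min(costs, key=costs.get)
print(costs)
print("cheapest plan:", best)
```

    With these numbers the unclustered index is worse than a full scan, which is the kind of result a rule-based ranking cannot detect without statistics.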

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 199

    Database Performance

    OPTIMIZING PERFORMANCE

    Performance can be regarded as a balancing act between:

    access performance

    update performance

    ease of use/modification

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 200

    Benchmarking

    Software and systems development projects include performance evaluation work, but it is sometimes not sufficient to prevent major performance problems

    Benchmarking is a useful tool which can be used at the prototyping stage to improve performance of the DBMS application

    There are many benchmarks available

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 201

    Database Benchmarking

    A tool for comparing the performance of DBMSs

      summarises relative performance in a single figure

    Usually measured in transactions per second (tps) with a cost measure in terms of system cost over 5 years

    Two principal uses of benchmarks:

      providing comparative performance measures

      a source of data and queries that represent experimental approximations to other problems

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 202

    Data Generation Approaches

    Artificial

      generation of data with entirely artificial properties

      designed to investigate particular aspects of a system, e.g. join

      e.g. Wisconsin

    Synthetic Workload

      produce a simplified version of an application

      use synthetically generated data with similar properties to real system, e.g. a banking application

      e.g. Transaction Processing Performance Council (TPC)

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 203

    Wisconsin Benchmark

    First systematic benchmark definition

      compares particular features of DBMS rather than a simple overall performance metric

    A single-user series of tests, comprising:

    selections and projections with varying selectivities on clustered, indexed and non-indexed attributes

    joins of varying selectivities

    aggregate functions (e.g. min, max, sum)

    updates/deletions involving key/non-key attributes

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 204

    Wisconsin Benchmark

    Straightforward to implement

    Scalable, e.g. parallel architectures

    Useful readily understandable results

    Lack of highly skewed attribute distribution

    Simple join queries

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 205

    TPC-C

    Measures performance of a typical order entry application

    from initiation at a terminal until response arrives back from server

    benchmark encompasses time taken by server, network and other system components

    terminals emulated using a negative-exponential transaction arrival distribution
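
    A negative-exponential arrival distribution means the gaps between successive transactions are exponentially distributed. A minimal sketch using Python's standard library follows; the 12-second mean gap and the function name are arbitrary assumptions for illustration, not values from the TPC-C specification.

```python
import random

def arrival_times(mean_gap_seconds, count, seed=42):
    """Generate arrival timestamps whose gaps follow a negative-exponential distribution."""
    rng = random.Random(seed)
    t = 0.0
    times = []
    for _ in range(count):
        # expovariate takes the rate (1 / mean) as its argument
        t += rng.expovariate(1.0 / mean_gap_seconds)
        times.append(t)
    return times

times = arrival_times(mean_gap_seconds=12.0, count=10000)
gaps = [b - a for a, b in zip([0.0] + times, times)]
mean_gap = sum(gaps) / len(gaps)
print(f"{len(times)} arrivals, observed mean gap {mean_gap:.2f} s")
```

    Short gaps are common and long gaps are rare, so an emulated terminal produces bursts of work rather than evenly spaced requests.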

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 206

    TPC-C Schema

    Taken from TPC-C Standard Specification: available at www.tpc.org

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 207

    TPC-C

    5 transactions covering:

      A new order

      A payment

      An order status inquiry

      A delivery

      A stock level inquiry

    10 terminals at each warehouse

      All 5 transactions available at each terminal

      The mix is set to produce an equal number of New-Order and Payment transactions, and one Delivery transaction, one Order-Status transaction, and one Stock-Level transaction for every ten New-Order transactions

    Metric

      number of New-Order transactions executed per minute
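
    The mix described above can be turned into proportions with a few lines of arithmetic. This sketch only restates the ratios given on this slide; the helper name new_orders_per_minute and the sample run figures are hypothetical.

```python
from fractions import Fraction

# Ratios as stated above: equal New-Order and Payment, and one each of the
# other three transactions for every ten New-Order transactions.
ratio = {"New-Order": 10, "Payment": 10, "Delivery": 1, "Order-Status": 1, "Stock-Level": 1}
total = sum(ratio.values())
mix = {name: Fraction(n, total) for name, n in ratio.items()}

for name, share in mix.items():
    print(f"{name:13s} {float(share):6.1%}")

def new_orders_per_minute(new_orders_completed, elapsed_minutes):
    """The metric: New-Order transactions executed per minute."""
    return new_orders_completed / elapsed_minutes

# Hypothetical run: 6000 New-Order transactions completed in 10 minutes
print(new_orders_per_minute(6000, 10))
```

    Under these ratios New-Order and Payment each make up 10 of every 23 transactions, so fewer than half of the transactions processed count towards the metric.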

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 208

    Other TPC Benchmarks

    TPC-H: ad-hoc decision support environments

    TPC-App: application server and web services benchmark

    Further info on www.tpc.org

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 209

    OO7

    A benchmark for Object Database Systems

    Examines the performance characteristics of different types of retrieval/traversal, object creation/deletion and updates and query processor

    A number of sample implementations are provided

    Based on a complex parts hierarchy

    Further info ftp.cs.wisc.edu

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 210

    OO7 Tests

    Test 1: Raw traversal speed, traversal with updates, operations on manuals

    Tests with/without full cache

    Test 2: Exact matches, range searches, path lookup, scan, make, join

    Test 3: Insert/update a group of composite parts

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 211

    BUCKY

    An Object-Relational Benchmark

    Objective

      To test the key features that add the "object" to object-relational database systems, as defined by Stonebraker

      Inheritance, Complex Objects, ADTs

      Not triggers

    Based on a university database schema

    Exists as an object-relational and a relational schema

      can compare performance tradeoffs between using object aspects of DBMS compared to purely relational

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 212

    BUCKY Schema

    Taken from: The BUCKY Object-Relational Benchmark, Carey et al.

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 213

    BUCKY Queries

    Aim is to test various object queries, involving:

    1. row types with inheritance

    2. inter-object references

    3. set-valued attributes

    4. methods of row objects

    5. ADT attributes and their methods

    Two BUCKY performance metrics:

      O-R Efficiency Index, for comparing O-R and relational implementations

      O-R Power Rating, for comparing O-R systems

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 214

    Summary

    Database application performance can be improved by:

    Indexes

    Clustering

    De-normalisation

    Query Optimisation

    But it is important at the design stage to ensure that the application is designed with optimal performance in mind

    Benchmarking is a tool which can help with this

    It's a black art!

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 215

    Further Reading

    Connolly and Begg, chapters 18, 21. Also information in OODB chapter on benchmarking.

    Date: chapter on Query Optimisation.

    Bitton, DeWitt and Turbyfill, Benchmarking Database Systems: A Systematic Approach, Proc. 9th VLDB, 1983.

    Carey et al., The BUCKY Object-Relational Benchmark, http://www.cs.wisc.edu/~naughton/bucky.html

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 216

    Data Warehousing and Data Mining

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 217

    Contents

    Data Warehousing

    OLAP

    Data Mining

    Further Reading

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 218

    Data Warehousing

    OLTP (online transaction processing) systems

      range in size from megabytes to terabytes

      high transaction throughput

    Decision makers require access to all data

      Historical and current

    'A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision-making process' (Inmon 1993)

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 219

    Benefits

    Potential high returns on investment

      90% of companies in 1996 reported return on investment (over 3 years) of > 40%

    Competitive advantage

      Data can reveal previously unknown, unavailable and untapped information

    Increased productivity of corporate decision-makers

      Integration allows more substantive, accurate and consistent analysis

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 220

    Comparison

    OLTP                                        | Data Warehouse
    Holds current data                          | Holds historic data
    Stores detailed data                        | Detailed, lightly/highly summarised data
    Data is dynamic                             | Data largely static
    Repetitive processing                       | Ad hoc querying, unstructured and heuristic processing
    High transaction throughput                 | Medium-low level transaction throughput
    Predictable usage patterns                  | Unpredictable usage patterns
    Transaction driven                          | Analysis driven
    Application oriented                        | Subject oriented
    Supports day-to-day decisions               | Strategic decisions
    Large number of clerical/operational users  | Lower number of managerial users

    Source: Connolly and Begg p1153

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 221

    Typical Architecture

    [Figure: typical data warehouse architecture. Data sources: mainframe operational n/w, h/w data; departmental RDBMS data; private data; external data. Components: load manager, warehouse manager, query manager. DBMS contents: meta-data, highly summarized data, lightly summarized data, detailed data, with archive/backup. End-user access tools: reporting, query, application development and EIS tools; OLAP tools; data-mining tools.]

    Source: Connolly and Begg p1157

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 222

    Data Warehouses

    Types of Data:

    Detailed

    Summarised

    Meta-data

    Archive/Back-up

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 223

    Information Flows

    [Figure: information flows in a data warehouse. Operational data sources 1 to n feed the load manager; the warehouse manager and query manager surround the DBMS (meta-data, highly summarized data, lightly summarized data, detailed data), with archive/backup and the end-user tools (reporting, query, application development and EIS tools; OLAP tools; data-mining tools). Flows shown: inflow, upflow, downflow, outflow, meta-flow.]

    Source: Connolly and Begg p1162

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 224

    Information Flow Processes

    Five primary information flows:

    Inflow - extraction, cleansing and loading of data from source systems into warehouse

    Upflow - adding value to data in warehouse through summarizing, packaging and distributing data

    Downflow - archiving and backing up data in warehouse

    Outflow - making data available to end users

    Metaflow - managing the metadata

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 225

    Problems of Data Warehousing

    1. Underestimation of resources for data loading

    2. Hidden problems with source systems

    3. Required data not captured

    4. Increased end-user demands

    5. Data homogenization

    6. High demand for resources

    7. Data ownership

    8. High maintenance

    9. Long duration projects

    10. Complexity of integration

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 226

    Data Warehouse Design

    Data must be designed to allow ad-hoc queries to be answered within acceptable performance constraints

    Queries usually require access to factual data generated by business transactions

      e.g. find the average number of properties rented out with a monthly rent greater than 700 at each branch office over the last six months

    Uses Dimensionality Modelling

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 227

    Dimensionality Modelling

    Similar to E-R modelling but with constraints:

      composed of one fact table with a composite primary key

      dimension tables have a simple primary key which corresponds exactly to one foreign key in the fact table

      uses surrogate keys based on integer values

    Can efficiently and easily support ad-hoc end-user queries

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 228

    Star Schemas

    The most common dimensional model

    A fact table surrounded by dimension tables

    Fact tables:

      contain an FK for each dimension table

      large relative to dimension tables

      read-only

    Dimension tables:

      reference data

      query performance speeded up by denormalising into a single dimension table
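
    A minimal star schema can be tried out with Python's built-in sqlite3 module. The table names, columns and rows below are invented for illustration (they are not the Connolly and Begg example), and the query is a simplified version of the rental question posed earlier: how many rentals above 700 per branch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: simple integer surrogate keys
CREATE TABLE dim_branch   (branch_key   INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE dim_property (property_key INTEGER PRIMARY KEY, property_type TEXT);
CREATE TABLE dim_time     (time_key     INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

-- Fact table: composite primary key made up of one FK per dimension
CREATE TABLE fact_rental (
    branch_key   INTEGER REFERENCES dim_branch(branch_key),
    property_key INTEGER REFERENCES dim_property(property_key),
    time_key     INTEGER REFERENCES dim_time(time_key),
    monthly_rent REAL,
    PRIMARY KEY (branch_key, property_key, time_key)
);
""")

conn.executemany("INSERT INTO dim_branch VALUES (?, ?)",
                 [(1, "Nairobi"), (2, "Mombasa")])
conn.executemany("INSERT INTO dim_property VALUES (?, ?)",
                 [(1, "Flat"), (2, "Flat"), (3, "House"), (4, "House")])
conn.executemany("INSERT INTO dim_time VALUES (?, ?, ?)",
                 [(1, 2024, 1), (2, 2024, 2)])
conn.executemany("INSERT INTO fact_rental VALUES (?, ?, ?, ?)", [
    (1, 1, 1, 800.0), (1, 2, 1, 650.0), (1, 1, 2, 800.0),
    (2, 3, 1, 900.0), (2, 4, 2, 500.0),
])

# Ad-hoc query: rentals with a monthly rent above 700, per branch
rows = conn.execute("""
    SELECT b.city, COUNT(*) AS rentals_over_700
    FROM fact_rental f
    JOIN dim_branch b ON f.branch_key = b.branch_key
    WHERE f.monthly_rent > 700
    GROUP BY b.city
    ORDER BY b.city
""").fetchall()
print(rows)  # [('Mombasa', 1), ('Nairobi', 2)]
```

    Every ad-hoc query follows the same shape: restrict and aggregate the fact table, joining only the dimension tables whose attributes are needed.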

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 229

    E-R Model Example

    Source: Connolly and Begg

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 230

    Star Schema Example

    Source: Connolly and Begg

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 231

    Other Schemas

    Snowflake schemas

      variant of star schema

      each dimension can have its own dimensions

    Starflake schemas

      hybrid structure

    contains mixture of (denormalised) star and (normalised) snowflake schemas

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 232

    OLAP

    Online Analytical Processing

      dynamic synthesis, analysis and consolidation of large volumes of multi-dimensional data

      normally implemented using specialized multi-dimensional DBMS

      a method of visualising and manipulating data with many inter-relationships

    Support common analytical operations such as:

    consolidation

    drill-down

    slicing and dicing
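
    The three operations can be sketched on a tiny cube held in a Python dictionary. The dimensions (city, property type, quarter), the figures and the function names are made up for illustration; a real OLAP server would use specialised storage rather than a dictionary.

```python
from collections import defaultdict

DIMS = ("city", "type", "quarter")

# Hypothetical cube: (city, type, quarter) -> number of rentals
cube = {
    ("Nairobi", "Flat", "Q1"): 10,
    ("Nairobi", "Flat", "Q2"): 12,
    ("Nairobi", "House", "Q1"): 7,
    ("Mombasa", "Flat", "Q1"): 5,
    ("Mombasa", "House", "Q2"): 9,
}

def consolidate(cube, keep):
    """Roll up: aggregate away every dimension not listed in keep."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for key, value in cube.items():
        out[tuple(key[i] for i in idx)] += value
    return dict(out)

def slice_cube(cube, dim, value):
    """Slice: fix a single value on one dimension."""
    i = DIMS.index(dim)
    return {k: v for k, v in cube.items() if k[i] == value}

def dice(cube, **allowed):
    """Dice: keep the sub-cube whose dimension values fall in the allowed sets."""
    return {k: v for k, v in cube.items()
            if all(k[DIMS.index(d)] in values for d, values in allowed.items())}

by_city = consolidate(cube, ["city"])                      # consolidation
by_city_quarter = consolidate(cube, ["city", "quarter"])   # drill-down: one level more detail
q1_only = slice_cube(cube, "quarter", "Q1")
nairobi_flats = dice(cube, city={"Nairobi"}, type={"Flat"})

print(by_city)
print(by_city_quarter)
print(q1_only)
print(nairobi_flats)
```

    Drill-down is shown here as consolidation to a finer level: it reverses a roll-up by re-introducing a dimension that had been aggregated away.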

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 233

    Codd's OLAP Rules

    1. Multi-dimensional conceptual view

    2. Transparency

    3. Accessibility

    4. Consistent reporting performance

    5. Client-server architecture

    6. Generic dimensionality

    7. Dynamic sparse matrix handling

    8. Multi-user support

    9. Unrestricted cross-dimensional operations

    10. Intuitive data manipulation

    11. Flexible reporting

    12. Unlimited dimensions and aggregation levels

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 234

    OLAP Tools

    Categorised according to architecture of underlying database

    Multi-dimensional OLAP

      data typically aggregated and stored according to predicted usage

      use array technology

    Relational OLAP

      use of relational meta-data layer with enhanced SQL

    Managed Query Environment

      deliver data direct from DBMS or MOLAP server to desktop in form of a datacube

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 235

    MOLAP

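    [Figure: MOLAP architecture. The RDB server loads data into the MOLAP server (database/application logic layer); the presentation layer sends a request to the MOLAP server and receives a result.]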

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 236

    ROLAP

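    [Figure: ROLAP architecture. The presentation layer sends a request to the ROLAP server (application logic layer) and receives a result; the ROLAP server sends SQL to the RDB server (database layer) and receives a result.]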

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 237

    MQE

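    [Figure: MQE architecture. End-user tools send SQL to the RDB server, or a request to the MOLAP server, and receive a result; the RDB server loads data into the MOLAP server.]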

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 238

    Data Mining

    The process of extracting valid, previously unknown, comprehensible and actionable information from large databases and using it to make crucial business decisions (Simoudis, 1996)

    focus is to reveal information which is hidden or unexpected

    patterns and relationships are identified by examining the underlying rules and features of the data

    work from data up

    require large volumes of data

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 239

    Example Data Mining Applications

    Retail/Marketing:

    Identifying buying patterns of customers

    Finding associations among customer demographic characteristics

    Predicting response to mailing campaigns

    Market basket analysis

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 240

    Example Data Mining Applications

    Banking:

    Detecting patterns of fraudulent credit card use

    Identifying loyal customers

    Predicting customers likely to change their credit card affiliation

    Determining credit card spending by customer groups

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 241

    Data Mining Techniques

    Four main techniques:

    Predictive Modelling

    Database Segmentation

    Link Analysis

    Deviation Detection

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 242

    Data Mining Techniques

    Predictive Modelling

      using observations to form a model of the important characteristics of some phenomenon

    Techniques:

      Classification

      Value Prediction

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 243

    Classification Example - Tree Induction

    Customer renting property > 2 years?

      No: Rent property

      Yes: Customer age > 25 years?

        No: Rent property

        Yes: Buy property

    Source: Connolly and Begg
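
    Once induced, a tree like this is applied as a chain of tests. The sketch below hand-codes one reading of the tree on this slide (renting for more than 2 years and aged over 25 leads to "Buy property", everything else to "Rent property"); the function name and the sample customers are hypothetical, and no induction algorithm is shown.

```python
def predict(years_renting, age):
    """Classify a customer by walking the decision tree from the root."""
    if years_renting > 2:          # root test: renting property > 2 years?
        if age > 25:               # second test: customer age > 25 years?
            return "Buy property"
        return "Rent property"
    return "Rent property"

# Hypothetical customers: (years renting, age)
for customer in [(3, 30), (3, 20), (1, 40)]:
    print(customer, "->", predict(*customer))
```

    Tree induction is the step that learns the tests and thresholds from historical records; applying the finished tree is cheap by comparison.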

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 244

    Data Mining Techniques

    Database Segmentation:

      to partition a database into an unknown number of segments (or clusters) of records which share a number of properties

    Techniques:

      Demographic clustering

      Neural clustering

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 245

    Segmentation: Scatterplot Example

    Source: Connolly and Begg

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 246

    Data Mining Techniques

    Link Analysis

      establish associations between individual records (or sets of records) in a database

      e.g. when a customer rents property for more than two years and is more than 25 years old, then in 40% of cases, the customer will buy the property

    Techniques:

    Association discovery

    Sequential pattern discovery

    Similar time sequence discovery
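
    The "40% of cases" figure in the rule above is what association discovery calls the confidence of the rule. A minimal sketch of how support and confidence are computed follows; the ten customer records are invented so that the rule comes out at 40%, and the function name is hypothetical.

```python
# Hypothetical customer records: (years renting, age, bought the property?)
customers = [
    (3, 30, True), (4, 40, True), (5, 28, False), (3, 35, False), (6, 50, False),
    (1, 30, False), (3, 22, False), (1, 20, True), (2, 45, False), (0, 19, False),
]

def rule_stats(records, antecedent, consequent):
    """Return (support, confidence) for the rule: antecedent => consequent."""
    matching = [r for r in records if antecedent(r)]
    both = [r for r in matching if consequent(r)]
    support = len(both) / len(records)
    confidence = len(both) / len(matching) if matching else 0.0
    return support, confidence

support, confidence = rule_stats(
    customers,
    antecedent=lambda r: r[0] > 2 and r[1] > 25,   # rents > 2 years and age > 25
    consequent=lambda r: r[2],                     # customer buys the property
)
print(f"support {support:.0%}, confidence {confidence:.0%}")
```

    Confidence says how often the rule holds when its condition is met; support says how much of the whole database the rule covers.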

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 247

    Data Mining Techniques

    Deviation Detection

      identify outliers, something which deviates from some known expectation or norm

    Statistics

    Visualisation
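
    One simple statistical approach is to flag values that lie far from the mean in units of standard deviation. This is only one of many possible tests, chosen here for brevity; the rent figures and the threshold of 2 are arbitrary assumptions.

```python
import statistics

def outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []                      # all values identical: nothing deviates
    return [v for v in values if abs(v - mean) / sd > threshold]

# Hypothetical monthly rents, with one value far from the norm
rents = [650, 700, 720, 680, 710, 690, 705, 2500]
print(outliers(rents))  # [2500]
```

    An extreme value inflates the mean and standard deviation it is judged against, so this test can miss outliers in small samples; it is meant as an illustration rather than a recommended detector.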

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 248

    Deviation Detection: Visualisation Example

    Source: Connolly and Begg

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 249

    Mining and Warehousing

    Data mining needs single, separate, clean, integrated, self-consistent data source

    Data warehouse well equipped:

    populated with clean, consistent data

    contains multiple sources

    utilises query capabilities

    capability to go back to data source

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 250

    Further Reading

    Connolly and Begg, chapters 31 to 34.

    W H Inmon, Building the Data Warehouse, New York, Wiley and Sons, 1993.

    Beynon-Davies P, Database Systems (2nd ed), Macmillan Press, 2000, ch 34, 35 & 36.

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 251

    Interoperability and XML

  • Kyanganda S. ICS 2415 Advanced Dbase Systems 252

    Objectives

    To investigate issues surrounding interoperability

    To gain a basic understanding of XML and its developments related to database systems

    To gain a basic understanding of the use of XML towards achieving interoperability

  • Kyanganda S. ICS 2415