Course Contents - DCU School of Computingmhughes/Lectures/Wk1_3Up.pdf · Course Contents 1....



Notes for CA218 Introduction to Databases

Dr. Martin Crane

Course Contents

1. Introduction, Database Basics
2. Database Overview
3. Low-Level Stuff: Data Storage
4. Entity-Relational (ER) Data Modelling
5. The Relational Data Model
6. ER to Relational Modelling
7. Structured Query Language (SQL)
8. Views
9. Normalization

Assessment

• Continuous Assessment:
– Lab Exam on SQL (probably in Week 9)
– 1 hour long
– Worth 25% of the total module marks

• Written Exam
– 2 hours long
– Answer 4 questions: Q1, Q2, Q3 and one of Q4, Q5

CA218 Introduction to Databases Notes

(c) Martin Crane 2011


Course Material

• My Notes:
– See webpage: www.computing.dcu.ie/~mcrane/CA218New.html

• Course Textbook
– Fundamentals of Database Systems by Elmasri & Navathe

• The Web

Chapter 1: Intro, Database Basics

• Basics of Information Systems & Databases

Information Systems
• “A computer-based information system retrieves info from its database in response to a user’s query”
– ‘computer-based’: manual (i.e. handwritten) vs automatic (remote sensor beams data to computer) vs non-automatic (clerk enters data)
– ‘retrieves’: Retrieve/Store/Modify/Delete, always the 4 DML commands
– ‘info’: computerised info can be:
  • Structured numeric/alpha/free text
  • Voice
  • Image
  • Others (biological data, MP4 video… anything that is searchable)
– ‘database’: repository which is ‘big’ & ‘organised’
  • Should be easy to retrieve data
  • Keep integrity of data (15.999 <> 16)
  • Security, storage requirements
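The four operations above (Retrieve/Store/Modify/Delete) correspond to the SQL statements SELECT, INSERT, UPDATE and DELETE. A minimal sketch, assuming Python's built-in sqlite3 module as the host; the `student` table and the renamed value are made up for illustration and are not from the notes:

```python
import sqlite3

def dml_demo():
    """Run the four DML operations against a throwaway in-memory database."""
    conn = sqlite3.connect(":memory:")
    # DDL (not DML): define a hypothetical table to work on.
    conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")

    # Store
    conn.execute("INSERT INTO student (id, name) VALUES (?, ?)", (1234, "Mary Smith"))
    conn.execute("INSERT INTO student (id, name) VALUES (?, ?)", (1579, "John Tell"))

    # Modify
    conn.execute("UPDATE student SET name = ? WHERE id = ?", ("Mary Jones", 1234))

    # Delete
    conn.execute("DELETE FROM student WHERE id = ?", (1579,))

    # Retrieve
    rows = conn.execute("SELECT id, name FROM student ORDER BY id").fetchall()
    conn.close()
    return rows

remaining = dml_demo()
print(remaining)
```

Only the modified row for id 1234 is left at the end, since the other row was deleted.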


Information Systems (cont’d)
– ‘user’s query’ differs according to wishes of user:
  1. Create a report
  2. Quality assurance
  3. Decision-making
  4. Study trends
  5. Answer a question
  6. Perform calculations
  7. Summarize data
– Characteristics of ‘user’s query’:
  • Precise (5, above) or vague info needed (3, 4, 7 above)
  • Can be expressed precisely or vaguely (“like”)
  • Interactive or batch execution/retrieval
  • Seeking specific info or aggregate (i.e. derived, like max/min/avg)
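The query characteristics above can be sketched in SQL: an exact condition for specific info, LIKE for a vaguely expressed condition, and MAX/MIN/AVG for aggregate info. This assumes Python's sqlite3 module; the `result` table and the marks in it are invented purely for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE result (student TEXT, module TEXT, mark INTEGER)")
conn.executemany(
    "INSERT INTO result VALUES (?, ?, ?)",
    [("Mary Smith", "CA218", 72), ("John Tell", "CA218", 55), ("Mary Smith", "CA208", 64)],
)

# Precise query seeking one specific piece of info.
specific = conn.execute(
    "SELECT mark FROM result WHERE student = ? AND module = ?", ("John Tell", "CA218")
).fetchone()

# Vaguely expressed condition: LIKE matches a pattern rather than an exact value.
vague = conn.execute(
    "SELECT DISTINCT student FROM result WHERE student LIKE ? ORDER BY student", ("Mary%",)
).fetchall()

# Aggregate (derived) info: max/min/avg over a group of rows.
aggregate = conn.execute(
    "SELECT MAX(mark), MIN(mark), AVG(mark) FROM result WHERE module = ?", ("CA218",)
).fetchone()
conn.close()

print(specific, vague, aggregate)
```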

Basic Database Definitions
• Database
  – An object or mechanism used to store info or data
  – Users should be able to store data in an organised manner
  – Examples: phone book, mobile contacts (the web isn’t a DB!)
• Data:
  – Collection of one or more bits of information
• DBMS (DataBase Management System)
  – Together with the Database, forms a Database System that provides logic to ensure & reinforce necessary standards on the data
  – => is a set of rules that are part of the DB software & dictate logically how the data is stored, treated & accessed
• Schema of a DB system:
  – Structure described in a formal language supported by the DBMS

Basic Database Definitions (cont’d)
• DB Catalog:
  – Complete description of DB structure & constraints
  – Contains data on structure of each file, type/storage format of each data item & constraints on data
  – Info stored in catalog = metadata or data dictionary
• DB View:
  – Subset of DB or virtual data derived from DB but not stored
• Program-Data Independence:
  – Changes in file structure don’t require changes in all programs accessing it
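To make the catalog idea concrete: most DBMSs let you query the metadata like ordinary data. The sketch below uses SQLite (via Python's sqlite3) as a stand-in, where the catalog is the `sqlite_master` table and per-column metadata comes from `PRAGMA table_info`; other DBMSs expose their catalogs differently, and the `course` table and `course_titles` view are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course (code TEXT PRIMARY KEY, title TEXT NOT NULL)")
# A view is stored only as a definition in the catalog, not as data.
conn.execute("CREATE VIEW course_titles AS SELECT title FROM course")

# The catalog is itself queryable: list every table and view it describes.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master "
    "WHERE type IN ('table', 'view') ORDER BY name"
).fetchall()

# Metadata about each column: name, declared type, NOT NULL flag, key membership.
columns = [
    (name, col_type, bool(notnull), bool(pk))
    for _cid, name, col_type, notnull, _default, pk
    in conn.execute("PRAGMA table_info(course)")
]
conn.close()

print(catalog)
print(columns)
```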


Information Systems & DBMS
• So where does a DBMS fit in?
– Interactive query (i.e. ad hoc or canned) or host query (i.e. SQL in Java, the ‘host lang’)
– Facilitates unambiguous statement (data meets requirements)
– Facilitates precise query (i.e. comes back with single piece of data)
– Retrieved data is specifically stored or aggregated (in SQL ‘views’)
– Query is Boolean combination of predicates (i.e. relation btw entities)
– Exact matching of conditions (i.e. select … where in SQL)
– Provides a formal schema (structure of data not expected to change)
– DBMS also provides:
  • Security (i.e. no unauthorised access)
  • Data independence (see above)
  • Persistence (i.e. complex DTs can be stored on termination)
  • Concurrency (i.e. updates by users visible to multiple users simultaneously)
  • Recovery & backup (i.e. transfer of copied files from one location to another with operations on them)
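A host query with a Boolean combination of exact-match predicates might look like the sketch below. Python stands in for the host language here (the notes mention Java), and the `student` table with its `school` and `year` columns is an assumption made for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, school TEXT, year INTEGER)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [
        (1234, "Mary Smith", "Computing", 2),
        (1579, "John Tell", "Computing", 1),
        (2468, "Pete Jones", "Physics", 2),
    ],
)

# Predicates combined with AND / OR / NOT; each is an exact-match test on a column.
query = """
    SELECT id FROM student
    WHERE (school = ? AND year = ?) OR NOT (school = ?)
    ORDER BY id
"""
matches = [row[0] for row in conn.execute(query, ("Computing", 2, "Computing"))]
conn.close()

print(matches)
```

The `?` placeholders keep the SQL text fixed while the host program supplies the values, which is the usual way to embed queries in application code.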

Chapter 2: Database Overview

• The Database Management System (DBMS)
– Contents
– Software & Users
– Three Level Architecture
– Data models

What Makes Up a DBMS?

[Figure: End Users and Application Programs, the Database Management System (DBMS), and the Database]

DBMS stores, maintains and provides access to data


DBMS Data
• Essentially all the data required by the DBMS to enforce structure & constraints on the data
• Has to allow for DBMS running on a range of platforms
• Can be single/multi-user, with shared access, and integrity of data must be maintained
• Users are concerned with overlapping subsets of total data (i.e. data perceived by different users in different ways)
• Example: DCU Data, consists of:
– Students have views consisting of ….
– Library has views consisting of ….
– Finance has views consisting of ….
• Inherent feature of DBMS data is that it is shared.

DBMS Software
• DBMS is an application program sitting between user(s) & data
• DBMS handles all interactions between the two
• DBMS shields users from each other & from unauthorised access

[Figure: Users, DBMS, Operating System, Data; a path that bypasses the DBMS is marked “Not allowed!”]

DBMS Users: Actors
• DBAs are the DB System Managers & responsible for
– DB itself AND DBMS & related
– Also responsible for:
  • Authorising access to DB
  • Co-ordinating & monitoring its use
  • Updates
  • Breaches of security and slow response time
• DB Designers are responsible for
– Identifying data to be stored in DB & choosing appropriate structures to represent & store data
– Talking to prospective users, understanding their requirements & designing accordingly
• System Analysts & Application Programmers
– Typically write C, Java with embedded SQL commands
– Run online or batch
– Programs are precompiled, allowing for dynamic use of DBMS at runtime & more sophisticated interaction with DB


DBMS Users: Actors (cont’d)
• End Users: use an interactive query language (e.g. SQL), maybe working in a bullet-proof, controlled environment or using a command line interface (e.g. ORACLE)
• End Users can be classified as:
– Casual/Occasional
  • Only occasionally use DB but maybe need different info each time
  • Usually Middle/High-Level managers, using High-Level Language
– Naïve/Canned Transactions
  • Constant querying of DB
  • Typically Bank Tellers, Reservation Staff for airlines etc.
– Sophisticated
  • Need to thoroughly familiarise themselves with DBMS facilities
  • Typically engineers, scientists with complex requirements
– Stand-alone
  • Maintain personal DBs using COTS s/w with menus or GUIs
  • Example is user of tax package for their own personal purposes.

DBMS Users: Workers Behind Scene
• Typically associated with design, development, operation of DBMS s/w & system environment
• DBMS Designers & Developers
  – Design/implement DBMS modules/interfaces as a s/w package
  – Modules to implement catalog, query language, interface processors, data access & security
  – Must interface with other system s/w (e.g. OS, compilers)
• Tool Developers
  – Optional s/w packages to facilitate DB system design & use
  – For DB design, performance monitoring, GUIs, simulation
• Operators & Maintenance Personnel
  – Responsible for actual running/maintenance of h/w & s/w environment for the DB system

DBMS Data Example: Entities & R’ships
• DBMS used by any reasonably self-contained organisation, from a single individual to a large corporation, wanting to manage a large volume of information.
• Example: Dublin City University
– Students, Lecturers, Courses, Books, Schools, Faculties, Lectures
– All are entities or distinguishable objects in the real world
– Also have relationships between real world entities:
  • Students make up Faculties
  • Schools have Students
  • Students attend Lectures given-by Lecturers
  • Lectures are part of Courses
  • Students borrow Books
  • Lecturers recommend Books
  • Courses can be composed of other Courses


DBMS Data Ex: Entities & R’ships cont’d
• Features of real-world relationships:
– They are bi-directional (i.e. can be put two ways)
– Most are binary (i.e. involve only 2 entities), some are ternary (i.e. involve 3 participating entities):
  • Binary: Lecturer Lectures Student
  • Ternary: Lecturer Recommends Textbook, Course
– Entity types may be linked in more than one way:
  • 1:1  Man Marries Woman
  • 1:m  Lecturer Teaches Students
  • n:m  Student Enrols on Courses

More Formal Definitions….
• View
  – A user’s perspective of the DB that may be a subset of DB or contain virtual data derived from DB files but not stored
  – Example: students, library, finance have different views of DB
• Entity:
  – A real world object or concept (e.g. employee or project)
• Attribute:
  – A property of interest, further describing an entity (e.g. name, project)
• Relationship:
  – R’ship between 2 or more entities represents an interaction between them
  – Example: works-on between employee and project

Why use a DBMS?
• Logical organisation gives a clear picture & helps programmers develop application programs faster
• Handles the low-level file maintenance.
• Centralises info; this is a good thing because:
– Redundancy is avoided
– Inconsistency is avoided (don’t have to alter in multiple places)
– Data is shared
– Standards are enforced (e.g. naming standards)
– Security is applied
– Integrity is maintained (has special meaning for DBs, e.g. Ref Integrity, below)
– Requirements are balanced for different users
• Yields data independence (see above; Logical D.I. and Physical D.I.) where data organisation is not built into application programs, e.g.:
– Representation of numeric data (BCD, floating point etc.)
– Units for numeric data (S.I., Imperial units)
– Data coding (text or MP3 etc.)
– Stored record & stored file structure (hash/index file, see below)
• DBA can change access structures during mid-life of DBMS without users knowing, except with respect to performance.


Three Level Architecture of DBMS
• Functional organisation
• Does not cover many DBMS functions: concurrency, backup, security etc.
• Data Independence:
– Logical: changes to conceptual level possible without affecting external
– Physical: changes to internal level possible without affecting conceptual
– Mapping between layers makes this possible

[Figure: the three levels, top to bottom]
• External Level (View A, View B, View C):
– User view
– Def’d by user/app programmer consulting with DBA
• Mapping supplied by DBMS (Logical Data Independence)
• Conceptual Level (Conceptual View):
– Def’d by DBA
• Mapping supplied by DBMS/OS (Physical Data Independence)
• Internal Level (Internal View):
– Def’d by DBA for optimisation

3 Level Architecture cont’d: External Level
• Users use a language incorporating a data sublanguage for DB with:
– Data Definition Language (DDL) for implementation of External Schema
– Data Manipulation Language (DML) for data retrieval/insertion/deletion etc.
• Individual user’s view is an external view: multiple occurrences of multiple types of external records.
• Each view describes the part the individual user is interested in, hiding the rest.
• Views are defined by an External Schema defined in DDL by the DBA (e.g. the CREATE VIEW command in SQL)
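As a sketch of external views and logical data independence: two hypothetical views (`library_view`, `finance_view`) each expose part of one `student` table, and a later change to the underlying table leaves the view unchanged. Table, columns and values are all assumptions for illustration; SQLite via Python's sqlite3 is used as a stand-in DBMS:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, "
    "fees_owed REAL, books_on_loan INTEGER)"
)
conn.execute("INSERT INTO student VALUES (1234, 'Mary Smith', 250.0, 3)")

# External schema: each view shows one user group its part and hides the rest.
conn.execute("CREATE VIEW library_view AS SELECT id, name, books_on_loan FROM student")
conn.execute("CREATE VIEW finance_view AS SELECT id, name, fees_owed FROM student")

library_cols = [d[0] for d in conn.execute("SELECT * FROM library_view").description]
finance_rows = conn.execute("SELECT * FROM finance_view").fetchall()

# Conceptual-level change: a new column is added to the base table...
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")

# ...but the library's external view is unaffected.
library_cols_after = [d[0] for d in conn.execute("SELECT * FROM library_view").description]
conn.close()

print(library_cols, finance_rows, library_cols_after)
```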

3 Level Architecture cont’d: Conceptual Level
• Describes the structure of the whole DB for the end users above.
• Hides the details of data storage (i.e. shows just what data are stored & relationships between the data)
• May be different to or similar to external views (i.e. may require joining of two tables)
• Presents the data as is: multiple occurrences of multiple types of conceptual records
• Conceptual schema
– Defined by DBA using conceptual DDL.
– Includes security & integrity constraints not present in external levels.
– Compiled by DBMS & stored in data dictionary (aka MetaData Repository, containing names & descriptions of tables and fields in DB)
• No more than a union of individual external schemas with Security & Integrity added on.


3 Level Architecture cont’d: Internal Level
• Describes the physical storage structure of the DB.
• Managed by the Operating System under direction of DBMS
• Defines types of stored records, indices, how fields are stored etc. (i.e. how data is stored at internal level)
• Defined using an (occasionally used) internal DDL (i.e. subset of SQL DDL, e.g. CREATE INDEX in SQL)
• Programs accessing this layer are dangerous as they bypass the security and integrity checks applied at the levels above
• Mappings exist between the three different levels of the 3LA and the DBA is responsible for correct mapping between the levels, e.g.
– Changes to internal level don’t require changes to conceptual, just to the mapping (e.g. index change)
– Changes to conceptual level map onto the external level.
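The index-change case can be sketched as follows: the same query text returns the same rows before an index exists, after it is created, and after it is dropped again, so only performance can differ. This assumes SQLite via Python's sqlite3; the `book` table, the index name and the generated ISBN strings are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (isbn TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO book VALUES (?, ?)",
    [(f"isbn-{i:04d}", f"Title {i}") for i in range(1000)],
)

# The application's query never mentions how the data is accessed.
query = "SELECT title FROM book WHERE isbn = ?"
before = conn.execute(query, ("isbn-0042",)).fetchall()

# Internal-level change: add an access structure.
conn.execute("CREATE INDEX idx_book_isbn ON book (isbn)")
with_index = conn.execute(query, ("isbn-0042",)).fetchall()

# Internal-level change again: remove it.
conn.execute("DROP INDEX idx_book_isbn")
after_drop = conn.execute(query, ("isbn-0042",)).fetchall()
conn.close()

print(before, with_index, after_drop)
```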

What Makes Up a DBMS? Alternate View
• DBMS Components:
1. Query Compiler: handles high-level queries entered interactively; does parsing, analysing, interpreting of queries (to DB access code) & generates calls to …
2. Run-Time Database Processor: handles runtime access to DB (i.e. optimal path) in the form of retrieval/update on DB, access through …
3. Stored Data Manager: DB & DB catalog are stored on disk & access to them is handled by the OS. A higher-level stored data manager controls access to DBMS info stored on disk (DB & DB catalog).
4. Precompiler: extracts DML from application programs in host language for sending to …
5. DML Compiler: generates object code for DB access

• Misc
– DDL Compiler: called by DBA, processes schema definitions in DDL for storage of metadata in catalog.

What Makes Up a DBMS? Alternate View (cont’d)

[Figure: DBMS component diagram, not recoverable from the source]


What Makes Up a DBMS? Simplified View

[Figure: simplified view of the DBMS; only the component numbers 1, 2 and 3 survive from the diagram]

Role of DBA
• Essential part of any DBMS is the role played by the DBA.
• Has overall control of DBMS.
• Decides on info content & logical, conceptual DB design schema.
• Decides on storage structures (index, hash) & access using DDL.
• Liaises with users & helps them design external schemas using DDL.
• Defines security & integrity checks.
• Defines backup & recovery strategies.
• Monitors performance of DBMS, responds to changing requirements using load, dump (record of DB for backup) & statistical analysis routines.
• Important source of info for DBA is system catalog/data dictionary:
– Contains data about data
– Descriptions of other objects rather than “raw” data
– Includes schemas and mappings
– Data dictionary contains result of compiling DDL, giving a set of tables, & can be queried like a DB

Data Models
• Model
– Used to hide superfluous details while highlighting details relevant to the applications at hand
• => Data Model is a mechanism to do this for DB applications
– i.e. entities of interest & their relationships in the DB
– Allows conceptualization of association of entities & their attributes
– Differ in ways of representing associations among entities & attributes
– Main ones we look at: Hierarchical, Network (& ER) & Relational models
– Almost all non-relational models have been extended to have relational front ends.
– ER representation can be mapped into Relational DB.
– All this will become clearer later when we look at ER, Relational in more depth.


Various Data Models… Flat File
• Flat File (1950s)
• Single, sequential 2D array of data
• + OK for storing data (small lists, etc.)
• - Search: v. inefficient (look at all entries)
• - No concurrency; problems: crashes, redundancies

Flat File Example

Host Computer files: Students, Staff, Courses

Name        ID
Mary Smith  1234
John Tell   1579

Name        ID
Pete Jones  2468
Jay Simms   6842

Course      ID
Databases   CA218
Logic       CA208
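Why flat-file search is inefficient can be seen in a small sketch: with no index, a lookup has to examine records one by one. The comma-separated layout and the function name `find_by_id` are assumptions for illustration; the two records are taken from the example above:

```python
import csv
import io

# A flat file as a single sequential 2D array of data (CSV layout assumed).
FLAT_FILE = "Name,ID\nMary Smith,1234\nJohn Tell,1579\n"

def find_by_id(flat_text, wanted_id):
    """Scan record by record; return (record, number of records examined)."""
    examined = 0
    for record in csv.DictReader(io.StringIO(flat_text)):
        examined += 1
        if record["ID"] == wanted_id:
            return record, examined
    # Not found: every entry had to be looked at.
    return None, examined

hit, hit_cost = find_by_id(FLAT_FILE, "1579")
miss, miss_cost = find_by_id(FLAT_FILE, "9999")
print(hit, hit_cost, miss, miss_cost)
```

The cost of a failed search grows with the file size, since every entry is read before giving up.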

Various Data Models… Hierarchical
• From late 1960s
• Parent-child r’ship
• Only one 1:N r’ship between 2 types of data per DB
• Works OK for, e.g., Table of Contents type data
• - Duplication of data (e.g. Courses stored twice)
• - Changes (e.g. to Courses) made multiple times
• - Navigation has to go from root down
• - Deletion (e.g. of a parent record) causes problems

Hierarchical DB Example

[Figure: hierarchy with nodes School, Staff, Students, Courses (appearing twice), Grades, Test]
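The duplication and root-down navigation problems can be sketched with a small tree. The parent-child edges below (Courses stored under both Staff and Students) are an assumed layout, since the original diagram's edges are not recoverable, and the replacement code `CA209` is made up:

```python
def make_node(name, children=(), data=None):
    """One record type in the hierarchy; each node has exactly one parent."""
    return {"name": name, "children": list(children), "data": list(data or [])}

def build_school():
    # Courses is stored once under each parent that needs it (duplication).
    return make_node("School", [
        make_node("Staff", [make_node("Courses", data=["CA218", "CA208"])]),
        make_node("Students", [make_node("Courses", data=["CA218", "CA208"])]),
    ])

def find_paths(node, target, prefix=()):
    """Walk down from the root, collecting every path that ends at `target`."""
    path = prefix + (node["name"],)
    found = [path] if node["name"] == target else []
    for child in node["children"]:
        found.extend(find_paths(child, target, path))
    return found

def rename_course(node, old, new):
    """Apply a change to every stored copy; return how many copies were touched."""
    touched = 0
    if node["name"] == "Courses" and old in node["data"]:
        node["data"] = [new if code == old else code for code in node["data"]]
        touched += 1
    for child in node["children"]:
        touched += rename_course(child, old, new)
    return touched

school = build_school()
course_paths = find_paths(school, "Courses")
copies_changed = rename_course(school, "CA208", "CA209")
print(course_paths, copies_changed)
```

Every access starts at School, and one logical change has to be applied to each stored copy of Courses.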

Various Data Models… Network
• From early 1970s
• Different to Hierarchical
– Supports N:M r’ships
– Navigation can start anywhere
– But needs familiarity with structure for changing, navigating & optimising the DB
• Network + Hierarchical evolved into ER

Network DB Example

[Figure: network with nodes School, Staff, Students, Courses, Grades, Test]


Various Data Models… Relational
– Developed by E. F. Codd (IBM, 1970)
– Different to Hierarchical, Network DBs
  • Aims to “protect” users from DB structure
  • Data independence means data location is unimportant
  • Data stored in Relations/Tables (with columns & rows) for data independence
  • => data & structural independence (unlike Hierarchical & Network)
  • Rows in table identified through keys (e.g. ISBN, PPSN)
  • + Tabular view means conceptual simplicity, => easier to design, implement, manage & use.
  • + Ad hoc query capability based on SQL
  • - Ease of use, management => increased h/w & system s/w overhead

Relational DB Example

[Figure: tables School, Staff, Students, Courses, Tests, Enrollment]
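A rough sketch of three of the tables in the relational example, showing rows identified through keys and an ad hoc SQL query. Column names and the enrolment rows are assumptions; student and course values are borrowed from the flat-file example. SQLite via Python's sqlite3 stands in for the DBMS:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE courses  (code TEXT PRIMARY KEY, title TEXT NOT NULL);
    -- Enrollment links the other two tables through their keys (n:m).
    CREATE TABLE enrollment (
        student_id  INTEGER REFERENCES students(id),
        course_code TEXT    REFERENCES courses(code),
        PRIMARY KEY (student_id, course_code)
    );
    INSERT INTO students VALUES (1234, 'Mary Smith'), (1579, 'John Tell');
    INSERT INTO courses  VALUES ('CA218', 'Databases'), ('CA208', 'Logic');
    INSERT INTO enrollment VALUES (1234, 'CA218'), (1234, 'CA208'), (1579, 'CA218');
""")

# Ad hoc query: no navigation path is fixed in advance, tables are joined on keys.
ca218_students = [row[0] for row in conn.execute(
    """
    SELECT s.name FROM students s
    JOIN enrollment e ON e.student_id = s.id
    WHERE e.course_code = ?
    ORDER BY s.name
    """,
    ("CA218",),
)]

# Keys identify rows uniquely, so a second row with the same key is rejected.
try:
    conn.execute("INSERT INTO students VALUES (1234, 'Someone Else')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
conn.close()

print(ca218_students, duplicate_rejected)
```

Unlike the hierarchical layout, each course is stored once and shared through the Enrollment table.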
