Notes for CA218: Introduction to Databases
Dr. Martin Crane
Course Contents
1. Introduction, Database Basics
2. Database Overview
3. Low-Level Stuff: Data Storage
4. Entity-Relational (ER) Data Modelling
5. The Relational Data Model
6. ER to Relational Modelling
7. Structured Query Language (SQL)
8. Views
9. Normalization
Assessment
• Continuous Assessment:
  – Lab Exam on SQL (probably in Week 9)
  – 1 hour long
  – Worth 25% of the total module marks
• Written Exam
  – 2 hours long
  – Answer 4 questions: Q1, Q2, Q3 and one of Q4 or Q5
CA218 Introduction to Databases Notes
(c) Martin Crane 2011
Course Material
• My Notes:
  – See webpage: www.computing.dcu.ie/~mcrane/CA218New.html
• Course Textbook
  – Fundamentals of Database Systems by Elmasri & Navathe
• The Web
Chapter 1: Intro, Database Basics
• Basics of Information Systems & Databases
Information Systems
• “A computer-based information system retrieves info from its database in response to a user’s query”
  – ‘computer-based’: manual (i.e. handwritten) vs automatic (remote sensor beams data to computer) vs non-automatic (clerk enters data)
  – ‘retrieves’: Retrieve/Store/Modify/Delete; these are always the 4 DML commands
  – ‘info’: computerised info can be:
    • Structured numeric/alpha/free text
    • Voice
    • Image
    • Others (biological data, MP4 video, … anything that is searchable)
  – ‘database’: a repository which is ‘big’ & ‘organised’
    • Should be easy to retrieve data
    • Keep integrity of data (15.999 <> 16)
    • Security, storage requirements
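The four operations named under ‘retrieves’ (Retrieve/Store/Modify/Delete) can be tried out in a few lines. This is only a sketch: it assumes Python's built-in sqlite3 module as a stand-in DBMS, and the table name `student` and its columns are invented for illustration.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")

# Store (INSERT)
conn.execute("INSERT INTO student (id, name) VALUES (?, ?)", (1234, "Mary Smith"))
conn.execute("INSERT INTO student (id, name) VALUES (?, ?)", (1579, "John Tell"))

# Retrieve (SELECT)
rows_after_insert = conn.execute(
    "SELECT id, name FROM student ORDER BY id"
).fetchall()

# Modify (UPDATE)
conn.execute("UPDATE student SET name = ? WHERE id = ?", ("Mary Jones", 1234))

# Delete (DELETE)
conn.execute("DELETE FROM student WHERE id = ?", (1579,))

rows_final = conn.execute("SELECT id, name FROM student ORDER BY id").fetchall()
print(rows_after_insert)
print(rows_final)
```

Each DML command maps onto exactly one SQL statement; the `?` placeholders let the host program pass values in safely.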
Information Systems (cont’d)
  – ‘user’s query’ differs according to the wishes of the user:
    1. Create a report
    2. Quality assurance
    3. Decision-making
    4. Study trends
    5. Answer a question
    6. Perform calculations
    7. Summarize data
  – Characteristics of a ‘user’s query’:
    • Precise (5 above) or vague info needed (3, 4, 7 above)
    • Can be expressed precisely or vaguely (“like”)
    • Interactive or batch execution/retrieval
    • Seeking specific info or aggregate info (i.e. derived, like max/min/avg)
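The difference between a precise query, a vaguely expressed one (“like”) and an aggregate one can be sketched as follows. The `result` table and the marks in it are made-up sample data, and sqlite3 is assumed purely as a convenient way to run SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table of exam results (invented sample data).
conn.execute("CREATE TABLE result (student TEXT, module TEXT, mark INTEGER)")
conn.executemany(
    "INSERT INTO result VALUES (?, ?, ?)",
    [
        ("Mary Smith", "CA218", 72),
        ("John Tell", "CA218", 58),
        ("Mary Smith", "CA208", 65),
    ],
)

# Precise query: asks for one specific stored value.
precise = conn.execute(
    "SELECT mark FROM result WHERE student = ? AND module = ?",
    ("John Tell", "CA218"),
).fetchone()

# Vaguely expressed query: pattern matching with LIKE.
vague = conn.execute(
    "SELECT DISTINCT student FROM result WHERE student LIKE ? ORDER BY student",
    ("Mary%",),
).fetchall()

# Aggregate query: values derived from many rows rather than stored.
aggregate = conn.execute(
    "SELECT MAX(mark), MIN(mark), AVG(mark) FROM result WHERE module = ?",
    ("CA218",),
).fetchone()

print(precise, vague, aggregate)
```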
Basic Database Definitions
• Database
  – An object or mechanism used to store info or data
  – Users should be able to store data in an organised manner
  – Examples: phone book, mobile contacts (the web isn’t a DB!)
• Data:
  – Collection of one or more bits of information
• DBMS (DataBase Management System)
  – Together with the Database forms a Database System that provides logic to ensure & reinforce necessary standards on the data
  – => is a set of rules that are part of the DB software & dictate logically how the data is stored, treated & accessed
• Schema of a DB system:
  – Structure described in a formal language supported by the DBMS
Basic Database Definitions (cont’d)
• DB Catalog:
  – Complete description of DB structure & constraints
  – Contains data on the structure of each file, the type/storage format of each data item & constraints on the data
  – Info stored in catalog = metadata or data dictionary
• DB View:
  – Subset of DB or virtual data derived from DB but not stored
• Program-Data Independence:
  – Changes in file structure don’t require changes in all programs accessing it
Information Systems & DBMS
• So where does a DBMS fit in?
  – Interactive query (i.e. ad hoc or canned) or host query (i.e. SQL in Java, the ‘host lang’)
  – Facilitates unambiguous statement (data meets requirements)
  – Facilitates precise query (i.e. comes back with a single piece of data)
  – Retrieved data is specifically stored or aggregated (in SQL ‘views’)
  – Query is a Boolean combination of predicates (i.e. relations btw entities)
  – Exact matching of conditions (i.e. select … where in SQL)
  – Provides a formal schema (structure of data not expected to change)
  – DBMS also provides:
    • Security (i.e. no unauthorised access)
    • Data independence (see above)
    • Persistence (i.e. complex data types can be stored on termination)
    • Concurrency (i.e. updates by users visible to multiple users simultaneously)
    • “Recovery & backup” (i.e. transfer of copied files from one location to another with operations on them)
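A ‘host query’ whose WHERE clause is a Boolean combination of predicates might look like the sketch below. Python plays the role of the host language here (the notes mention Java); the function `find_students`, the table layout and the rows are all hypothetical.

```python
import sqlite3


def find_students(conn, school, min_year):
    """Host-language wrapper around a query with AND/OR predicates."""
    sql = (
        "SELECT name FROM student "
        "WHERE school = ? AND (year >= ? OR repeating = 1) "
        "ORDER BY name"
    )
    return [row[0] for row in conn.execute(sql, (school, min_year))]


conn = sqlite3.connect(":memory:")
# Illustrative schema and data, not taken from the notes.
conn.execute(
    "CREATE TABLE student (name TEXT, school TEXT, year INTEGER, repeating INTEGER)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [
        ("Mary Smith", "Computing", 2, 0),
        ("John Tell", "Computing", 1, 1),
        ("Pete Jones", "Computing", 1, 0),
        ("Jay Simms", "Physics", 3, 0),
    ],
)

matches = find_students(conn, "Computing", 2)
print(matches)
```

Only rows that match the conditions exactly are returned: a row either satisfies the Boolean expression or it does not.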
Chapter 2: Database Overview
• The Database Management System (DBMS)
  – Contents
  – Software & Users
  – Three Level Architecture
  – Data models
What Makes Up a DBMS?
[Figure: End Users and Application Programs; Database Management System (DBMS); Database]
• DBMS stores, maintains and provides access to data
DBMS Data
• Essentially all the data required by the DBMS to enforce structure & constraints on the data
• Has to allow for the DBMS running on a range of platforms
• Can be single/multi-user, with shared access, and integrity of data must be maintained
• Users are concerned with overlapping subsets of the total data (i.e. data perceived by different users in different ways)
• Example: DCU Data, consists of:
  – Students have views consisting of ….
  – Library has views consisting of ….
  – Finance has views consisting of ….
• An inherent feature of DBMS data is that it is shared.
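The idea of different user groups seeing overlapping subsets of the same shared data can be sketched with SQL views. The notes leave the contents of each view unspecified, so every column below (`fees_due`, `books_on_loan`, etc.) is an invented assumption, as are the view names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One shared table; all column names are hypothetical.
    CREATE TABLE student (
        id INTEGER PRIMARY KEY,
        name TEXT,
        course TEXT,
        fees_due REAL,
        books_on_loan INTEGER
    );
    INSERT INTO student VALUES (1234, 'Mary Smith', 'CA218', 0.0, 2);
    INSERT INTO student VALUES (1579, 'John Tell', 'CA208', 250.0, 0);

    -- Each group gets its own overlapping subset of the same data.
    CREATE VIEW library_view AS SELECT id, name, books_on_loan FROM student;
    CREATE VIEW finance_view AS SELECT id, name, fees_due FROM student;
    """
)

library_cols = [d[0] for d in conn.execute("SELECT * FROM library_view").description]
finance_cols = [d[0] for d in conn.execute("SELECT * FROM finance_view").description]
owing = conn.execute("SELECT name FROM finance_view WHERE fees_due > 0").fetchall()

print(library_cols, finance_cols, owing)
```

Both views share `id` and `name` (the overlap), but the library never sees fees and finance never sees loans.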
DBMS Software
• DBMS is an application program sitting btw user(s) & data
[Figure: Users, DBMS, Operating System, Data; direct access from Users to Data is marked “Not allowed!”]
• DBMS handles all interactions between the two
• DBMS shields users from each other & from unauthorised access
DBMS Users: Actors
• DBAs are the DB System Managers & are responsible for:
  – The DB itself AND the DBMS & related
  – Also responsible for:
    • Authorising access to DB
    • Co-ordinating & monitoring its use
    • Updates
    • Breaches of security and slow response time
• DB Designers are responsible for:
  – Identifying data to be stored in DB & choosing appropriate structures to represent & store data
  – Talking to prospective users, understanding their requirements & designing accordingly
• System Analysts & Application Programmers
  – Typically write C, Java with embedded SQL commands
  – Programs run online or batch
  – Programs are precompiled, allowing for dynamic use of the DBMS at runtime & more sophisticated interaction with DB
DBMS Users: Actors (cont’d)
• End Users: use an interactive query language (e.g. SQL), maybe working in a bullet-proof, controlled environment or using a command line interface (e.g. ORACLE)
• End Users can be classified as:
  – Casual/Occasional
    • Only occasionally use DB but may need different info each time
    • Usually Middle/High-Level managers, using a High-Level Language
  – Naïve/Canned Transactions
    • Constant querying of DB
    • Typically Bank Tellers, Reservation Staff for airlines etc.
  – Sophisticated
    • Need to thoroughly familiarise themselves with DBMS facilities
    • Typically engineers, scientists with complex requirements
  – Stand-alone
    • Maintain personal DBs using COTS s/w with menus or GUIs
    • Example is a user of a tax package for their own personal purposes.
DBMS Users: Workers Behind Scene
• Typically associated with design, development & operation of DBMS s/w & system environment
• DBMS Designers & Developers
  – Design/implement DBMS modules/interfaces as a s/w package
  – Modules to implement catalog, query language, interface processors, data access & security
  – Must interface with other system s/w (e.g. OS, compilers)
• Tool Developers
  – Optional s/w packages to facilitate DB system design & use
  – For DB design, performance monitoring, GUIs, simulation
• Operators & Maintenance Personnel
  – Responsible for actual running/maintenance of h/w & s/w environment for the DB system
DBMS Data Example: Entities & R’ships
• DBMS used by any reasonably self-contained organisation, from a single individual to a large corporation wanting to manage a large volume of information.
• Example: Dublin City University
  – Students, Lecturers, Courses, Books, Schools, Faculties, Lectures
  – All are entities or distinguishable objects in the real world
  – Also have relationships between real world entities:
    • Students make up Faculties
    • Schools have Students
    • Students attend Lectures given-by Lecturers
    • Lectures are part of Courses
    • Students borrow Books
    • Lecturers recommend Books
    • Courses can be composed of other Courses
DBMS Data Ex: Entities & R’ships cont’d
• Features of real-world relationships:
  – They are bi-directional (i.e. can be put two ways)
  – Most are binary (i.e. involve only 2 entities), some are ternary (i.e. involve 3 participating entities):
    [Figure: Lecturer Recommends Course, with Textbook (ternary); Lecturer Lectures Student (binary)]
  – Entity types may be linked in more than one way:
    • 1:1  Man Marries Woman
    • 1:m  Lecturer Teaches Students
    • n:m  Student Enrols on Courses
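One common way to hold the three cardinalities above in tables is sketched here; it is not the only possible mapping, and the module covers ER-to-relational mapping properly later. All table and column names, and the rows, are illustrative assumptions. A 1:m link becomes a foreign key on the ‘many’ side, an n:m link becomes a separate junction table, and a 1:1 link can be enforced with UNIQUE columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript(
    """
    CREATE TABLE lecturer (id INTEGER PRIMARY KEY, name TEXT);

    -- 1:m  each student row points at exactly one lecturer
    CREATE TABLE student (
        id INTEGER PRIMARY KEY,
        name TEXT,
        lecturer_id INTEGER NOT NULL REFERENCES lecturer(id)
    );

    CREATE TABLE course (code TEXT PRIMARY KEY, title TEXT);

    -- n:m  junction table: one row per (student, course) pair
    CREATE TABLE enrolment (
        student_id INTEGER REFERENCES student(id),
        course_code TEXT REFERENCES course(code),
        PRIMARY KEY (student_id, course_code)
    );

    -- 1:1  each person may appear at most once on either side
    CREATE TABLE marriage (man TEXT UNIQUE NOT NULL, woman TEXT UNIQUE NOT NULL);

    INSERT INTO lecturer VALUES (1, 'Pete Jones');
    INSERT INTO student VALUES (1234, 'Mary Smith', 1), (1579, 'John Tell', 1);
    INSERT INTO course VALUES ('CA218', 'Databases'), ('CA208', 'Logic');
    INSERT INTO enrolment VALUES (1234, 'CA218'), (1234, 'CA208'), (1579, 'CA218');
    INSERT INTO marriage VALUES ('Adam', 'Beth');
    """
)

students_of_lecturer = conn.execute(
    "SELECT COUNT(*) FROM student WHERE lecturer_id = 1"
).fetchone()[0]
courses_of_mary = [
    r[0]
    for r in conn.execute(
        "SELECT course_code FROM enrolment WHERE student_id = 1234 ORDER BY course_code"
    )
]
students_on_ca218 = [
    r[0]
    for r in conn.execute(
        "SELECT student_id FROM enrolment WHERE course_code = 'CA218' ORDER BY student_id"
    )
]

# The UNIQUE constraint rejects a second marriage for the same man.
try:
    conn.execute("INSERT INTO marriage VALUES ('Adam', 'Cara')")
    second_marriage_allowed = True
except sqlite3.IntegrityError:
    second_marriage_allowed = False
```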
More Formal Definitions….
• View
  – A user’s perspective of the DB that may be a subset of the DB or contain virtual data derived from DB files but not stored
  – Example: students, library, finance have different views of the DB
• Entity:
  – A real world object or concept (e.g. employee or project)
• Attribute:
  – A property of interest, further describing an entity (e.g. name, project)
• Relationship:
  – R’ship between 2 or more entities represents an interaction btw them
  – Example: works-on between employee and project
Why use a DBMS?
• Logical org’n gives a clear picture & helps programmers develop application programs faster
• Handles the low-level file maintenance.
• Centralises info; this is a good thing because:
  – Redundancy is avoided
  – Inconsistency is avoided (don’t have to alter in multiple places)
  – Data is shared
  – Standards are enforced (e.g. naming standards)
  – Security is applied
  – Integrity is maintained (has special meaning for DBs, e.g. Ref Integrity, below)
  – Requirements are balanced for different users
• Yields data independence (see above), where data org’n is not built into application programs, e.g.:
  – Representation of numeric data (BCD, floating point etc.)
  – Units for numeric data (S.I., Imperial units)
  – Data coding (text or MP3 etc.)
  – Stored record & stored file structure (hash/index file, see below)
• DBA can change access structures during mid-life of DBMS without users knowing, except with respect to performance.
• Two kinds: Logical D.I. & Physical D.I.
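The point that the DBA can change access structures without users noticing (except in performance) can be sketched by adding an index and re-running the same query. The `book` table, the index name `book_isbn_idx` and the generated rows are assumptions for illustration; SQLite's EXPLAIN QUERY PLAN is used only to peek at the access path, and its wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (isbn TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO book VALUES (?, ?)",
    [(f"isbn-{i:04d}", f"Title {i}") for i in range(1000)],
)

# The application's query text never changes.
QUERY = "SELECT title FROM book WHERE isbn = ?"


def run_query():
    return conn.execute(QUERY, ("isbn-0042",)).fetchall()


def query_plan():
    # The last column of each EXPLAIN QUERY PLAN row describes the access path.
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("isbn-0042",))
    return " ".join(str(row[-1]) for row in rows)


before, plan_before = run_query(), query_plan()

# Change at the internal level only: add an access structure.
conn.execute("CREATE INDEX book_isbn_idx ON book (isbn)")

after, plan_after = run_query(), query_plan()

print(before == after)
print(plan_before)
print(plan_after)
```

The result is identical before and after; only the plan (and hence the speed) differs.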
Three Level Architecture of DBMS
• Functional organisation
• Does not cover many DBMS functions: concurrency, backup, security etc.
• Data Independence:
  – Logical: changes to conceptual level possible without affecting external
  – Physical: changes to internal level possible without affecting conceptual
  – Mapping between layers makes this possible
[Figure: the three levels, top to bottom]
  – External Level (View A, View B, View C): user view; def’d by user/app programmer consulting with DBA
  – Mapping supplied by DBMS: Logical Data Independence
  – Conceptual Level (Conceptual View): def’d by DBA
  – Mapping supplied by DBMS/OS: Physical Data Independence
  – Internal Level (Internal View): def’d by DBA for optimisation
3 Level Architecture cont’d: External Level
• Users use a language incorporating a data sublanguage for the DB with:
  – Data Definition Language (DDL) for implementation of External Schema
  – Data Manipulation Language (DML) for data retrieval/insertion/deletion etc.
• Individual user’s view is an external view: multiple occurrences of multiple types of external records.
• Each view describes the part the individual user is interested in, hiding the rest.
• Views are defined by an External Schema defined in DDL by the DBA (e.g. the CREATE VIEW command in SQL)
3 Level Architecture cont’d: Conceptual Level
• Describes the structure of the whole DB for the end users above.
• Hides the details of data storage (i.e. shows just what data are stored & relationships between the data)
• May be different to or similar to external views (i.e. may require joining of two tables)
• Presents the data as is: multiple occurrences of multiple types of conceptual records
• Conceptual schema
  – Defined by DBA using conceptual DDL.
  – Includes security & integrity constraints not present in external levels.
  – Compiled by DBMS & stored in data dictionary (aka MetaData Repository, containing names & descriptions of tables and fields in DB)
• No more than a union of individual external schemas with Security & Integrity added on.
3 Level Architecture cont’d: Internal Level
• Describes the physical storage structure of the DB.
• Managed by the Operating System under direction of DBMS
• Defines types of stored records, indices, how fields are stored etc. (i.e. how data is stored at internal level)
• Defined using an (occasionally used) internal DDL (i.e. subset of SQL DDL, e.g. CREATE INDEX in SQL)
• Programs accessing this layer directly are dangerous as they bypass the security and integrity checks of the conceptual level
• Mappings exist between the three different levels of the 3LA and the DBA is responsible for correct mapping between the levels, e.g.
  – Changes to internal level don’t require changes to conceptual, just to the mapping (e.g. index change)
  – Changes to conceptual level map onto the external level.
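The role of the mapping between the external and conceptual levels can be sketched with a view that is redefined when the underlying tables are restructured. Everything named here (`student_directory`, `student_v2`, `school`) is hypothetical; the point is only that the application's query text stays the same while the mapping changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, school TEXT);
    INSERT INTO student VALUES (1234, 'Mary Smith', 'Computing');

    -- External schema: what the application is allowed to see.
    CREATE VIEW student_directory AS SELECT id, name, school FROM student;
    """
)

# The application only ever issues this query against the view.
APP_QUERY = "SELECT name, school FROM student_directory WHERE id = ?"
before = conn.execute(APP_QUERY, (1234,)).fetchone()

# Conceptual-level change: schools move into their own table.
conn.executescript(
    """
    DROP VIEW student_directory;
    CREATE TABLE school (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO school VALUES (1, 'Computing');
    CREATE TABLE student_v2 (
        id INTEGER PRIMARY KEY,
        name TEXT,
        school_id INTEGER REFERENCES school(id)
    );
    INSERT INTO student_v2
        SELECT s.id, s.name, sc.id
        FROM student s JOIN school sc ON sc.name = s.school;
    DROP TABLE student;

    -- Only the mapping (the view definition) is rewritten.
    CREATE VIEW student_directory AS
        SELECT s.id AS id, s.name AS name, sc.name AS school
        FROM student_v2 s JOIN school sc ON sc.id = s.school_id;
    """
)

after = conn.execute(APP_QUERY, (1234,)).fetchone()
print(before, after)
```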
What Makes Up a DBMS? Alternate View
• DBMS Components:
  1. Query Compiler: handles high-level queries entered interactively; does parsing, analysing, interpreting of queries (to DB access code) & generates calls to …
  2. Run-Time Database Processor: handles runtime access to DB (i.e. optimal path) in form of retrieval/update on DB, access through …
  3. Stored Data Manager: DB & DB catalog are stored on disk & access to them is handled by OS. A higher-level stored data manager controls access to DBMS info stored on disk (DB & DB catalog).
  4. Precompiler: extracts DML from application programs in host language for sending to …
  5. DML Compiler: generates object code for DB access
• Misc
  – DDL Compiler: called by DBA, processes schema defs in DDL for storage of metadata in catalog.
What Makes Up a DBMS? Simplified View
[Figure: simplified view of the DBMS components; only the numeric labels survive]
Role of DBA
• Essential part of any DBMS is the role played by the DBA.
• Has overall control of DBMS.
• Decides on info content & logical, conceptual DB design schema.
• Decides on storage structures (index, hash) & access using DDL.
• Liaises with users & helps them design external schemas using DDL.
• Defines security & integrity checks.
• Defines backup & recovery strategies.
• Monitors performance of DBMS, responds to changing requirements using load, dump (record of DB for backup) & statistical analysis routines.
• Important source of info for DBA is system catalog/data dictionary:
  – Contains data about data
  – Descriptions of other objects rather than “raw” data
  – Includes schemas and mappings
  – Data dictionary contains result of compiling DDL, giving a set of tables, & can be queried like a DB
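That the data dictionary “can be queried like a DB” can be seen in most systems. The sketch below uses SQLite as one concrete case: its catalog is a table called `sqlite_master`, and `PRAGMA table_info` lists a table's columns. Other DBMSs name their catalog tables differently, and the `course` schema is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Running DDL is what populates the catalog.
conn.executescript(
    """
    CREATE TABLE course (code TEXT PRIMARY KEY, title TEXT NOT NULL);
    CREATE INDEX course_title_idx ON course (title);
    CREATE VIEW course_titles AS SELECT title FROM course;
    """
)

# The catalog is itself a table and is queried with ordinary SQL.
# Names starting with "sqlite" are internal objects and are filtered out.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master "
    "WHERE name NOT LIKE 'sqlite%' ORDER BY type, name"
).fetchall()

# Column-level metadata: (name, declared type) for each column of course.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(course)")]

print(catalog)
print(columns)
```

Note that the catalog rows describe tables, indexes and views, i.e. other objects, rather than “raw” data.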
Data Models
• Model
  – Used to hide superfluous details while highlighting details relevant to the applications at hand
• => Data Model is a mechanism to do this for DB applications
  – i.e. entities of interest & their relationships in the DB
  – Allows conceptualization of association of entities & their attributes
  – Differ in ways of representing associations among entities & attributes
  – Main ones we look at: Hierarchical, Network (& ER) & Relational models
  – Almost all non-relational models have been extended to have relational front ends.
  – ER representation can be mapped into Relational DB.
  – All this will become clearer later when we look at ER and Relational in more depth.
Various Data Models… Flat File
• Flat File (1950s)
• Single, sequential 2D array of data
• + OK for storing data (small lists, etc.)
• - Search: v. inefficient (look at all entries)
• - No concurrency; probs: crashes, redundancies

Flat File Example
[Figure: Host Computer holding three files: Students, Staff, Courses]

Students
Name        ID
Mary Smith  1234
John Tell   1579

Staff
Name        ID
Pete Jones  2468
Jay Simms   6842

Courses
Course     ID
Databases  CA218
Logic      CA208
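Why searching a flat file is inefficient can be sketched as follows: with no index, the only option is to read records one by one until a match turns up. The CSV layout and the helper `find_by_id` are assumptions made for the example; the two records are the Name/ID rows from the flat file example.

```python
import csv
import io

# A flat file: one sequential 2D array of data (CSV layout is an assumption).
FLAT_FILE = """name,id
Mary Smith,1234
John Tell,1579
"""


def find_by_id(text, wanted_id):
    """Linear scan over the file; returns (record, rows_examined)."""
    examined = 0
    for row in csv.DictReader(io.StringIO(text)):
        examined += 1
        if row["id"] == wanted_id:
            return row, examined
    return None, examined  # not found: every entry had to be looked at


record, rows_examined = find_by_id(FLAT_FILE, "1579")
missing, rows_examined_for_missing = find_by_id(FLAT_FILE, "9999")
print(record["name"], rows_examined, missing, rows_examined_for_missing)
```

In the worst case (last record, or no match) every entry in the file is examined, so the cost grows in step with the file size.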
Various Data Models… Hierarchical
• From late 1960s
• Parent-child r’ship
• Only one 1:N r’ship btw. 2 types of data per DB
• Works OK for, e.g., Table of Contents type data
• - Duplication of data (e.g. Courses stored twice)
• - Changes (e.g. to Courses) made multiple times
• - Navigation has to go from root down
• - Deletion (e.g. of a parent record) causes probs

Hierarchical DB Example
[Figure: hierarchy with nodes School, Staff, Students, Courses (appearing twice), Grades, Test]
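The duplication problem in the hierarchical model can be sketched with a nested Python dictionary standing in for the tree. The exact shape of the tree is an assumption (the notes' figure only names the nodes), and `rename_course` is a hypothetical helper; the point is that one logical change means visiting every copy, always starting from the root.

```python
# Assumed tree: School at the root, Courses stored under both Staff and Students.
school = {
    "staff": {
        "Pete Jones": {"courses": [{"id": "CA218", "title": "Databases"}]},
    },
    "students": {
        "Mary Smith": {"courses": [{"id": "CA218", "title": "Databases"}]},
    },
}


def rename_course(tree, course_id, new_title):
    """Walk down from the root and update every stored copy of the course."""
    changed = 0
    for branch in tree.values():          # staff, students
        for person in branch.values():    # each parent record
            for course in person["courses"]:
                if course["id"] == course_id:
                    course["title"] = new_title
                    changed += 1
    return changed


copies_changed = rename_course(school, "CA218", "Intro to Databases")
print(copies_changed)
```

Forgetting one of the copies would leave the data inconsistent, and deleting a parent record (say, the only student on a course) would take its copy of the course with it.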
Various Data Models… Network
• From early 1970s
• Different to Hierarchical
  – Supports N:M r’ships
  – Navigation can start anywhere
  – But needs familiarity with the structure for changing, navigating & optimising the DB
• Network + Hierarchical evolved into ER

Network DB Example
[Figure: network with nodes School, Staff, Students, Courses, Grades, Test]
Various Data Models… Relational
• Developed by E.F. Codd (IBM, 1970)
• Different to Hierarchical, Network DBs
  – Aims to “protect” users from DB structure
  – Data independence means data location is unimportant
  – Data storage in Relations/Tables (with columns & rows) for data independence
  – => data & structural independence (unlike Hierarchical & Network)
  – Rows in a table are identified through keys (e.g. ISBN, PPSN)
• + Tabular view means conceptual simplicity, => easier to design, implement, manage & use.
• + Ad hoc query capability based on SQL
• - Ease of use, mgmt => increased h/w & system s/w overhead

Relational DB Example
[Figure: tables School, Staff, Students, Courses, Enrollment, Tests]
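A small part of the relational example can be sketched with three of the tables named in the figure (Students, Courses, Enrollment). The column names and rows are assumptions. Each course is stored once and referred to by its key, so one UPDATE is seen by every ad hoc query that joins through Enrollment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Rows are identified through keys; column names are illustrative.
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE courses (id TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE enrollment (
        student_id INTEGER REFERENCES students(id),
        course_id TEXT REFERENCES courses(id),
        PRIMARY KEY (student_id, course_id)
    );
    INSERT INTO students VALUES (1234, 'Mary Smith'), (1579, 'John Tell');
    INSERT INTO courses VALUES ('CA218', 'Databases'), ('CA208', 'Logic');
    INSERT INTO enrollment VALUES (1234, 'CA218'), (1579, 'CA218'), (1579, 'CA208');
    """
)

# The course title lives in exactly one row, so one change is enough.
conn.execute("UPDATE courses SET title = 'Intro to Databases' WHERE id = 'CA218'")

# Ad hoc query: who is enrolled on CA218, and what is it called now?
ad_hoc = conn.execute(
    """
    SELECT s.name, c.title
    FROM students s
    JOIN enrollment e ON e.student_id = s.id
    JOIN courses c ON c.id = e.course_id
    WHERE c.id = 'CA218'
    ORDER BY s.name
    """
).fetchall()
print(ad_hoc)
```

The query says nothing about where or how the rows are stored, and it can start from any table.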