CSCI 3140 Module 1 – Background
description
Transcript of CSCI 3140 Module 1 – Background
CSCI 3140Module 1 – Background
(Based on Chapters 1 – 13 of
Database Systems by Connolly and Begg)
Theodore Chiasson
Dalhousie University
Definitions
• Database- A collection of related data
• Database Management System (DBMS)– Software that manages and controls access to the database
File-based systems
• Early attempt to computerize manual filing systems• Limitations of file-based systems include:
– Separation and isolation of data• Application programs often access many files
– Duplication of data• Data needed in several departments will be entered into each
department’s system– Data dependence
• Application programs need to be updated if file formats are changed– Incompatibility of files
• Different application development environments produce different file formats
– Fixed queries/proliferation of application programs• Application programmer needs to intervene for ad hoc queries
Two underlying problems with file-based systems:
1) The definition of data is embedded within the application programs rather than being stored separately and independently
2) There is no control over access and manipulation of data beyond that imposed by the application programs
Definitions (revisited)
• Database- A shared collection of logically related data, and a description of
this data, designed to meet the information needs of an organization
• Database Management System (DBMS)– A software systems that allows user to define, create, maintain,
and control access to the database.
Important database terms
• System catalog, data dictionary, or meta-data– The ‘data about the data’– The part of the database that makes it self-describing
• Entity– A distinct object (person, place, thing, concept or event) that is
represented in the database
• Attribute– A property describing some aspect of an entity
• Relationship– An association between entities
Facilities provided by a DBMS:
• Data Definition Language (DDL)– Used to specify the data types and structures and the constraints on the
data to be stored in the database
• Data Manipulation Language (DML)– Used to insert, update, delete, and retrieve data from the database
• Structured Query Language (SQL)– De facto standard language for relational DBMSs.
• Controlled access to the database– Security system
– Integrity system
– Concurrency control system
– Recovery control system
– User-accessible catalog
• External level– The users’ view of the database– Describes the part of the database that is relevant to each user– External schemas, or subschemas
• Conceptual level– The community view of the database– Describes what data is stored in the database and the relationships
among the data– Conceptual schema
• Internal level– The physical representation of the database on the computer– Describes how the data is stored in the database– Internal schema
ANSI-SPARC three-level architecture
ANSI-SPARC three-level architecture
View 1
InternalSchema
ConceptualSchema
View 2 View nExternallevel
Conceptuallevel
Internallevel
Physical dataorganization
User 1 User 2 User n
The Relational Model
• All data is logically structured in relations (tables)• Each relation has a name and is made up of attributes (columns) of data• Each tuple (row) contains one value per attribute
Relation: a table with columns and rows (also called a file)
Attribute: a named column of a relation (also called a field)
Tuple: a row of a relation (also called a record)
Domain: the set of allowable values for one or more attributes
Degree: the number of attributes in a relation (unary, binary, ternary, n-ary)
Cardinality: the number of tuples in a relation
Relational database: a collection of normalized relations with distinct relation names
Mathematical review
• Let A and B be sets, where A = {2,4} and B = {1,3,5).• The CARTESIAN PRODUCT of these two sets, written A X B, is the set of
all ordered pairs such that the first element is an element of A and the second element is an element of B, which gives us the set
{(2,1),(2,3),(2,5),(4,1),(4,3),(4,5)}• Any subset of this set is termed a relation• For example, R = {(2,1),(4,1)} is a relation. In this case, R can be specified as all those ordered pairs with
the second element equal to 1, or R = {(x,y) | x E A, y E B, and y = 1} We could also define a relation S as S = {(x,y) | x E A, y E B, and x = 2y} which, given set A and B above, would yield S = {(2,1)}
Database Relations
• Relation Schema – a named relation defined by a set of attribute and domain name pairs
• Let A1, A2, … An be attributes with domains D1, D2, ... Dn. Then the set {A1:D1, A2: D2, … An: Dn} is a relation schema. Thus, a relation R is a set of n-tuples:
(A1:d1, A2,d2, … An:dn) such that d1E D1, d2E D2, … dnE Dn
Tuples are normally written without the column names, yielding (d1,d2, … dn) A relation instance is a set of tuples, such as {(6050, Theo, 317)} {(ID:6050, NAME: Theo, Office: 317)}
• Relational database schema – a set of relation schemas, each with a distinct name– Also called a relational schema
Relational keys
• Superkey– An attribute, or set of attributes, that uniquely identifies a tuple within a
relation
• Candidate key– A superkey such that no proper subset is a superkey within the relation
• Primary key– The candidate key that is selected to uniquely identify tuples with the
relation
• Foreign key– An attribute, or set of attributes, within one relation that matches the
candidate key of some (possibly the same) relation
Relational Integrity
• Domain constraints– Restrict the allowable set of values for the attributes of a relation
• Null– Represents a value for an attribute that is currently unknown or is not
applicable for this tuple
• Entity integrity– In a base relation, no attribute of a primary key may be null
• Referential integrity– If a foreign key exists in a relation, either the foreign key value must
match a candidate key value of some tuple in its home relation or the foreign key value must be wholly null
• Enterprise constraints– Additional rules specified by the users or database administrators of a
database
Views
• Base Relation– A named relation corresponding to an entity in the conceptual schema,
whose tuples are physically stored in the database
• View– The dynamic results of one or more relational operations operating on
the base relations to produce another relation.
– A view is a virtual relation that does not necessarily exist in the database but can be produced upon request by a particular user, at the time of the request
– Not all views are updatable.
Relation Algebra
• A theoretical language with operations that work on one or more relations to define another relation without changing the original relation(s)
• Five fundamental operations:– Selection
– Projection
– Cartesian product
– Union
– Set difference
• Three additional operations (can be derived from the five)– Join
– Intersection
– Division
Unary operations
• Selection operation– Input is a relation and a predicate. The predicate is constructed from
boolean expressions involving attributes of the relation using logical operators ^ (AND), v (OR), and ~ (NOT). Output is a new relation that consists of only those tuples in the original relation for which the predicate evaluates to TRUE.
– Example: σsalary > 100000(Staff)– Result: A new relation with the same attributes as the Staff relation, and
with tuples corresponding to staff with a salary greater than 100,000.– The degree of the new relation is the same as the degree of the original
relation.
• Projection operation– Input is a relation and a set of attributes of the relation. Output is a new
relation with only those attributes listed in the input set. The degree of the new relation is the number of attributes in the input set, and the cardinality of the new relation is the same as the cardinality of the original relation.
– Π(staffNo, salary)(Staff)
• Union (R U S)– Input is two union-compatible relations R and S, output is a new relation
containing all of the tuples of R and S
– Union-compatibility is defined as having the same number of attributes with each pair of corresponding attributes having the same domain
• Set Difference (R – S)– Yields a new relation consisting of tuples that are in relation R that are
not in relation S. R and S must be union-compatible.
• Intersection (R S)– Yields a new relation consisting of tuples that are in both relation R and
relation S. Equivalent to R – (R – S)
• Cartesian Product (R x S)– Yields a new relation that is the concatenation of every tuple of R with
every tuple of S
Set Operations
Join Operations
• Theta join ( R FS)
– Yields a relation with all tuples of the Cartesian Product of R and S that satisfy the predicate F
• Equijoin ( R FS)
– A Theta join in which the only predicate is equal (=)
• Natural join ( R S)– Equijoin over all common attributes of R and S. One occurrence of each
common attribute is eliminated from the result.
• Outer join ( R S)– Tuples from R which do not have matching values in the common
attributes of S are also included in the join, with missing values in the second relation set to null.
• Semijoin ( R FS)
– Contains the tuples of R that participate in a join of R with S
Division operation (R – S)
• The division operation defines a relation over the attributes C that consists of the set of tuples from R that match the combination of every tuple is S, where C is the set of attributes that are in R but are not in S.
Relational Calculus
• Based on a branch of symbolic logic called predicate calculus– A predicate is a truth-valued function with arguments
– A proposition is a predicate with values filled in for the arguments
– If P is a predicate, the set of all x such that P is true for x is expressed as
{x | P(x)}
Tuple Relational Calculus
• Tuple relational calculus– Find tuples for which a predicate is true– Tuple variables `range over’ a named relation– Specify a range of a tuple variabe S as the Staff relation using the
notationStaff(S)
– To express the query `find all tuples of S such that F(S) is true’, we write{S | F(S)}
F is a formula, or well-formed formula (wff)– Quantifiers tell how many instances the predicate applies to
• Universal quantifier `for all’• Existential quantifier `there exists’
– Variables qualified with a quantifier are bound variables, those not qualified are free variables
– All free variables are to the left of the vertical bar in a wff
General form of an expression in Tuple Relational Calculus
{ S1.a1, S2.a2, …, Sn.an | F(S1, S2, …, Sm)} m ≥ n
Where S1, S2, …, Sm are tuple variables, each ai is an attribute of the relation over which Si ranges, and F is a formula
A wff is made up of atoms, where atoms can be:
- R(Si) where Si is a tuple variable and R is a relation
- Si.a1 θ Sj.a2 where Si and Sj are tuple variables, a1 is an attribute of the relation over which Si ranges, a2 is an attributes of the relation over which Sj ranges, and θ is a comparison operator (<,≤,>,≥,=,≠); the attributes a1 and a2 must have domains whose members can be compared by theta
- Si.a1 θ c where where Si is a tuple variable, a1 is an attribute of the relation over which Si ranges, c is a constant from the domain of attribute a1, and θ is one of the comparison operators.
Domain Relational Calculus
• Domain relational calculus– Variables take their values from the domains of attributes rather than the tuples
of relations– Expressions have the general form:
{d1, d2, …, dn | F(d1, d2, …, dm)} m >= n
Where d1, d2, …, dn, …, dm represent domain variables and F(d1, d2, …, dm) represents a formula composed of atoms, where each atom has one of the following forms:
- R(d1, d2, …, dn) where R is a relation of degree n and each di is a domain variable
- di θ dj ,where di and dj are domain variables and θ is a comparison operator
- di θ c,where di is a domain variable, c is a constant from the domain of d i, and θ is a comparison operator
We recursively build up formulae from atoms where - an atom is a formula, - the conjunction or disjunction of two formulae is a formula, the negation of a
formula is a formula- Existential and universal quantifiers can be applied
Entity-Relationship Modeling
• Entity type– A group of objects with the same properties, which are identified by the
enterprise as having an independent existence
• Entity occurance– A uniquely identifiable object of an entity type
• Each entity type in an E-R diagram is represented as a rectangle labeled with the name of the entity
Staff PropertyForRent
Entity-Relationship Modeling
• Relationship type– A set of meaningful associations among entity types
• Relationship occurance– A uniquely identifiable association, which includes one occurrence from
each participating entity type
• Each relationship type in an E-R diagram is represented as a line connecting the associated entity types, labeled with the name of the relationship
Staff BranchHas
Entity-Relationship Modeling
• Degree of a relationship type– The number of participating entity types in a relationship
• Binary: A branch has staff
• Ternary: Staff registers a client at a branch
• Quaternary: A solicitor arranges a bid for a buyer through a bank
Staff BranchHas
Staff Branch
Client
Registers
Buyer Bank
Bid
Arranges
Solicitor
Entity-Relationship Modeling
• Recursive relationship– A relationship type where the same entity type participates more than
once in different roles• Example: Staff (supervisor) supervises staff (Supervisee)
• Example of entities associated through two distinct relationships– Manager manages branch office
– Branch office has member of staff
Staff
Supervises
Supervisee
Supervisor
Staff Branch
Has
Manages
Member of staff
Manager
Branch
Branch
Entity-Relationship Modeling
• Attribute– A property of an entity or relationship type
• Attribute domain– The set of allowable values for one or more attributes
• Simple Attribute– An attribute composed of a single component with an independent existence
• Composite Attribute– An attribute composed of multiple components, each with an independent
existence• Examples
– Address (could be broken into street, city, postal code)– Name (could be broken into FirstName, MiddleInitial, LastName)
• Single-valued Attribute– An attribute that holds a single value for each occurrence of an entity type
• Multi-valued Attribute– An attribute that holds multiple values for each occurance of an entity type
• Example: Each branch may have several phone numbers
• Derived Attribute– An attribute that represents a value that is derivable from the value of a related
attribute or set of attributes, not necessarily in the same entity type
Entity-Relationship Modeling
• Candidate Key– The minimal set of attributes that uniquely identifies each occurance of
an entity type
• Primary Key– The candidate key that is selected to uniquely identify each occurrence
of an entity type
• Composite Key– A candidate key that consists of two or more attributes
Entity-Relationship Modeling
• Diagrammatic representation of attributes in ER diagram
Staff Branch
Has
Manages
staffNo {PK}namepositionsalary/totalStaff
branchNo {PK}address street city postalCodetelNo[1..3]
Derivedattribute
Multi-valuedattribute
Compositeattribute
Primary key
Area forattribute
list
Entity-Relationship Modeling
• Strong entity type– An entity type that is not existence-dependent on some other entity type
• Weak entity type– An entity type that is existence-dependent on some other entity type
• Attributes on relationships– Attributes can be attached to relationships as well as entities
– Diagrammatic representation:
Newspaper PropertyForRentAdvertises
newspaperName propertyNo
dateAdvertcost
Entity-Relationship Modeling
• Multiplicity– The number of possible occurrences of an entity type that may relate to
a single occurrence of an associated entity type through a particular relationship
• One-to-one (1:1)• One-to-many (1:*)• Many-to-many (*:*)
– Cardinality• Describes the maximum number of possible relationship occurrences for an
entity participating in a given relationship
– Participation• Determines whether all or only some entity occurrences participate in a
relationship
Entity-Relationship Modeling
• Specialization/Generalization– Superclass
• An entity type that includes one or more distinct subgroupings of its occurrences
– Subclass• A distinct subgrouping of occurrences of an entity type
– (name of subclass) IS-A (name of superclass)• Manager IS-A Staff• Secretary IS-A Staff
– Specialization• The process of maximizing the differences between members of an entity by
identifying their distinguishing characteristics
– Generalization• The process of minimizing the differences between entites by identifying
their common characteristics
Entity-Relationship Modeling
• Diagrammatic representation of specialization/generalization
StaffBranchHas
staffNo {PK}namepositionsalary
branchNo {PK}address street city postalCode
1..1 1..*
SalesPersonnel
salesAreacarAllowance
Manager
mgrStartDatebonus
Secretary
typingSpeed
Manages1..1
1..1{Optional, And}
Normalization
• Unnormalized form– A table that contains one or more repeating groups
Artist Album
Supertramp Crime of the Century
Even in the quietest moments
Breakfast in America
Sting Dream of the Blue Turtles
Nothing like the Sun
Normalization
• First normal form (1NF)– A relation in which the intersection of each row and column contains one
and only one value
Artist Album
Supertramp Crime of the Century
Supertramp Even in the quietest moments
Supertramp Breakfast in America
Sting Dream of the Blue Turtles
Sting Nothing like the Sun
Normalization
• Functional dependency– Describes the relationship between attributes in a relation– If A and B are attributes of a relation R, B is functionally dependent on A
(denoted A -> B) if each value of A is associated with exactly one value of B.
• (A and B may each consist of one or more attributes)
• Determinant– The attribute or set of attributes on the left-hand-side of the arrow of a
functional dependency
• Full functional dependency– Indicates that if A and B are attributes in a relation, B is fully functionally
dependent on A if B is functionally dependent on A, but not on any proper subset of A
• Second normal form (2NF)– A relation that is in first normal form and every non-primary-key attribute
is fully functionally dependent on every candidate key
Normalization
• Transitive dependency– A condition where A, B, and C are attributes of a relation such that if
A->B and B->C , then C is transitively dependent on A via B (provided that A is not fully functionally dependent on B or C)
• Third normal form (3NF)– A relation that is in first and second normal form, and in which no
non-primary-key attribute is transitively dependent on any candidate key
Normalization
• Boyce-Codd normal form (BCFN)– A relation that is in BCNF if and only if every determinant is a
candidate key
• Example: ClientInterview(clientNo, interviewDate, interviewTime, staffNo, roomNo)
clientNo, interviewDate -> interviewTime, staffNo, roomNo (Primary key) staffNo, interviewDate, interviewTime -> clientNo (Candidate key)
roomNo, interviewDate, interviewTime -> staffNo, clientNo (Candidate key) staffNo, interviewDate -> roomNo
Split into two relations that are in BCNF:
Interview(clientNo, interviewDate, interviewTime, staffNo) StaffRoom(staffNo, interviewDate, roomNo)
Normalization
• Multivalued dependency– Represents a dependency between attributes in a relation, such that for
each value of A there is a set of values for B and a set of values for C. However, the set of values for B and C are independent of each other.
• Trivial multivalued dependency– If A->> B is a multivalued dependency in relation R, it is considered
trivial if either (a) B is a subset of A or (b) A U B = R
• Fourth normal form (4NF)– A relation that is in BCNF and contains no nontrivial multivalued
dependenciesExample:BranchStaffOwner(branchNo, sName, oName) BCNF
branchNo ->> sName 4NFbranchNo ->> oName 4NF
Normalization
• Lossless-join dependency– A property of decomposition, which ensures that no spurious tuples are
generated when relations are reunited through a natural join operation
• Fifth normal form (5NF)– A relation has no join dependencies
Example:
Chrysler, Dodge, Jeep dealership carries cars from all three manufacturers. If the dealership decides to offer trucks for sale, trucks from all manufacturers that make trucks must be offerred.
Manufacturer(manufacturerNo, vehicleType)
Dealer(dealerNo, dealerName)
VehicleTypes(vehicleTypeNo, vehicleType, vehicleTypeDescription)
VehicleTypesOffered(manufacturerNo, dealerNo, vehicleTypeNo)
Normalization
• Unnormalized form– A table that contains one or more repeating groups
• First normal form (1NF)– A relation in which the intersection of each row and column contains one and
only one value
• Second normal form (2NF)– A relation that is in first normal form and every non-primary-key attribute is fully
functionally dependent on every candidate key
• Third normal form (3NF)– A relation that is in first and second normal form, and in which no non-primary-
key attribute is transitively dependent any candidate key
• Boyce-Codd normal form (BCNF)– A relation is in BCNF, if and only if, every determinant is a candidate key
• Fourth normal form (4NF)– A relation is in Boyce-Codd normal form and contains no nontrivial multi-valued
dependencies
• Fifth normal form (5NF)– A relation has no join dependencies