Week 2-data models
garapatiavinash -
1
Today’s Class
Data Models
Relational Model
2
Data Models
Data Model: A set of concepts to describe the structure of a database, and certain constraints that the database should obey.
Data Model Operations: Operations for specifying database retrievals and updates by referring to the concepts of the data model. Operations on the data model may include basic operations and user-defined operations.
3
Categories of data models
Conceptual (high-level, semantic) data models: Provide concepts that are close to the way many users perceive data. (Also called entity-based or object-based data models.)
Physical (low-level, internal) data models: Provide concepts that describe details of how data is stored in the computer.
Implementation (representational) data models: Provide concepts that fall between the above two, balancing user views with some computer storage details.
4
Schemas versus Instances
• Database Schema: The description of a database. Includes descriptions of the database structure and the constraints that should hold on the database.
• Schema Diagram: A diagrammatic display of (some aspects of) a database schema.
• Schema Construct: A component of the schema or an object within the schema, e.g., STUDENT, COURSE.
• Database Instance: The actual data stored in a database at a particular moment in time. Also called database state (or occurrence).
5
Database Schema Vs. Database State
• Database State: Refers to the content of a database at a moment in time.
• Initial Database State: Refers to the database when it is loaded
• Valid State: A state that satisfies the structure and constraints of the database.
• Distinction: The database schema changes very infrequently. The database state changes every time the database is updated.
• Schema is also called intension, whereas state is called extension.
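The distinction between schema (intension) and state (extension) can be sketched with Python's built-in sqlite3 module. This is only an illustration: the student table and its two columns are assumptions chosen for the example, not part of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Defining the schema gives the empty state: structure exists, no data yet.
conn.execute(
    "CREATE TABLE student (name TEXT NOT NULL, student_id TEXT PRIMARY KEY)"
)

def schema(c):
    """Return the stored table definitions (the intension)."""
    return [row[0] for row in
            c.execute("SELECT sql FROM sqlite_master WHERE type = 'table'")]

def state(c):
    """Return the current rows (the extension) in a stable order."""
    return sorted(c.execute("SELECT name, student_id FROM student"))

schema_before = schema(conn)
empty_state = state(conn)

# Loading data gives the initial state; every later update gives a new state.
conn.execute("INSERT INTO student VALUES ('Chan Kin Ho', '99223367')")
initial_state = state(conn)
conn.execute("INSERT INTO student VALUES ('Alvin Lam', '96520934')")
later_state = state(conn)

schema_after = schema(conn)
```

The state changes with each insert while the schema text stays the same.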
6
[Figure: database state diagram. Defining the schema gives the empty state; loading gives the initial state; each update gives a new state; every valid state satisfies the database schema.]
7
Importance of Data Models
Data models: representations, usually graphical, of complex real-world data structures
Facilitate interaction among the designer, the applications programmer and the end user
End-users have different views and needs for data
Data model organizes data for various users
8
Data Model Basic Building Blocks
Entity: anything about which data will be collected/stored
Attribute: a characteristic of an entity
Relationship: describes an association among entities
• One-to-one (1:1) relationship
• One-to-many (1:M) relationship
• Many-to-many (M:N or M:M) relationship
Constraint: a restriction placed on the data
9
History of Data Models
Relational Model: proposed in 1970 by E.F. Codd (IBM), first commercial system in 1981-82. Now in several commercial products (DB2, ORACLE, SQL Server, SYBASE, INFORMIX).
Network Model: the first one to be implemented by Honeywell in 1964-65 (IDS System). Adopted heavily due to the support by CODASYL (CODASYL - DBTG report of 1971). Later implemented in a large variety of systems - IDMS (Cullinet - now CA), DMS 1100 (Unisys), IMAGE (H.P.), VAX -DBMS (Digital Equipment Corp.).
Hierarchical Data Model: implemented in a joint effort by IBM and North American Rockwell around 1965. Resulted in the IMS family of systems. The most popular model. Other system based on this model: System 2k (SAS inc.)
10
History of Data Models
Object-oriented Data Model(s): several models have been proposed for implementing in a database system. One set comprises models of persistent O-O Programming Languages such as C++ (e.g., in OBJECTSTORE or VERSANT), and Smalltalk (e.g., in GEMSTONE). Additionally, systems like O2, ORION (at MCC - then ITASCA), IRIS (at H.P.- used in Open OODB).
Object-Relational Models: Most recent trend. Started with Informix Universal Server. Exemplified in systems such as Oracle 10g, DB2, and SQL Server.
11
Hierarchical Database Model
Logically represented by an upside down tree
Each parent can have many children
Each child has only one parent
12
Hierarchical Database Model Advantages
Conceptual simplicity
Database security and integrity
Data independence
Efficiency
Disadvantages
Complex implementation
Difficult to manage and lack of standards
Lacks structural independence
Applications programming and use complexity
Implementation limitations
13
Hierarchical Data Model
• ADVANTAGES:
• Hierarchical Model is simple to construct and operate on
• Corresponds to a number of natural hierarchically organized domains - e.g., assemblies in manufacturing, personnel organization in companies
• Language is simple; uses constructs like GET, GET UNIQUE, GET NEXT, GET NEXT WITHIN PARENT etc.
• DISADVANTAGES:
• Navigational and procedural nature of processing
• Database is visualized as a linear arrangement of records
• Little scope for "query optimization"
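A minimal sketch of the points above, assuming a toy personnel hierarchy (the record names are invented for illustration). Each child has exactly one parent, and a preorder walk produces the "linear arrangement of records" that navigational access steps through. The helper names are hypothetical and are not IMS commands.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional

@dataclass(eq=False)
class Record:
    name: str
    parent: Optional["Record"] = None          # each child has only one parent
    children: List["Record"] = field(default_factory=list)

    def add_child(self, name: str) -> "Record":
        child = Record(name, parent=self)
        self.children.append(child)
        return child

def preorder(root: Record) -> Iterator[Record]:
    """Visit a record, then each of its subtrees from left to right."""
    yield root
    for child in root.children:
        yield from preorder(child)

company = Record("Company")
sales = company.add_child("Sales")
research = company.add_child("Research")
sales.add_child("Lopez")
research.add_child("Chan")
research.add_child("Lam")

# The database seen as a linear arrangement of records.
linear_order = [r.name for r in preorder(company)]

# Stepping through the children of one parent, in the spirit of
# GET NEXT WITHIN PARENT.
within_research = [r.name for r in research.children]
```

Because the program decides the order in which records are visited, the system has little room to choose a better access plan on its own.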
14
Network Database Model
Each record can have multiple parents
Composed of sets
Each set has owner record and member record
Member may have several owners
15
Network Database Model
Advantages
Conceptual simplicity
Handles more relationship types
Data access flexibility
Promotes database integrity
Data independence
Conformance to standards
Disadvantages
System complexity
Lack of structural independence
16
Network Data Model
• ADVANTAGES:
• Network Model is able to model complex relationships and represents semantics of add/delete on the relationships.
• Can handle most situations for modeling using record types and relationship types.
• Language is navigational; uses constructs like FIND, FIND member, FIND owner, FIND NEXT within set, GET etc. Programmers can do optimal navigation through the database.
• DISADVANTAGES:
• Navigational and procedural nature of processing
• Database contains a complex array of pointers that thread through a set of records.
• Little scope for automated "query optimization"
17
Network Model
Data are represented by collections of records, similar to an entity in the E-R model.
Records and their fields are represented as record types:
type customer = record
    customer-name: string;
    customer-street: string;
    customer-city: string;
end
type account = record
    account-number: integer;
    balance: integer;
end
Relationships among data are represented by links, similar to a restricted (binary) form of an E-R relationship.
Restrictions on links depend on whether the relationship is many-to-many, many-to-one, or one-to-one.
18
Data-Structure Diagrams
o Schema representing the design of a network database.
o A data-structure diagram consists of two basic components:
o Boxes, which correspond to record types.
o Lines, which correspond to links.
o Specifies the overall logical structure of the database.
o For every E-R diagram, there is a corresponding data-structure diagram.
19 20
Network Model Data Structure
A record type may be both an owner and a member of two set types.
21
The DBTG CODASYL Model
o All links are treated as many-to-one relationships.
o To model many-to-many relationships, a record type is defined to represent the relationship and two links are used.
o Create a new record type Rlink (referred to as a dummy record type).
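The dummy-record technique can be sketched in Python, assuming the customer and account record types from the earlier slide. The customer names, account numbers and helper functions are invented for illustration. Each Rlink record has exactly one customer owner and one account owner, so both links stay many-to-one while together they represent a many-to-many relationship.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Customer:
    customer_name: str

@dataclass(frozen=True)
class Account:
    account_number: int
    balance: int

@dataclass(frozen=True)
class Rlink:
    """Dummy record: a member of one customer set and of one account set."""
    customer: Customer
    account: Account

hayes = Customer("Hayes")
johnson = Customer("Johnson")
a101 = Account(101, 500)
a215 = Account(215, 700)

# Johnson holds two accounts, and account 101 has two holders.
rlinks = [Rlink(hayes, a101), Rlink(johnson, a101), Rlink(johnson, a215)]

def accounts_of(customer):
    """Follow the customer-to-Rlink link, then the Rlink-to-account link."""
    return [l.account.account_number for l in rlinks if l.customer == customer]

def owners_of(account):
    """Follow the account-to-Rlink link, then the Rlink-to-customer link."""
    return [l.customer.customer_name for l in rlinks if l.account == account]
```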
22
23
Hierarchical Model
o A hierarchical database consists of a collection of records which are connected to one another through links.
o A record is a collection of fields, each of which contains only one data value.
o A link is an association between precisely two records.
o The hierarchical model differs from the network model in that the records are organized as collections of trees rather than as arbitrary graphs.
24
Tree-Structure Diagrams
The schema for a hierarchical database consists of
boxes, which correspond to record types
lines, which correspond to links
Record types are organized in the form of a rooted tree.
No cycles in the underlying graph.
Relationships formed in the graph must be such that only one-to-many or one-to-one relationships exist between a parent and a child.
25
General Structure
A parent may have an arrow pointing to a child, but a child must have an arrow pointing to its parent.
26
Database schema is represented as a collection of tree-structure diagrams.
single instance of a database tree
The root of this tree is a dummy node
The children of that node are actual instances of the appropriate record type
27
Single Relationships
If the relationship depositor is one to one, then the link depositor has two arrows.
Only one-to-many and one-to-one relationships can be directly represented in the hierarchical model.
28
The Relational Model
29
Introduction
Proposed by Edgar F. Codd (1923-2003) in the early seventies [Turing Award, 1981].
Most of the modern DBMS are relational.
Simple and elegant model with a mathematical basis.
Led to the development of a theory of data dependencies and database design.
Relational algebra operations play a crucial role in query optimization and execution.
Laid the foundation for the development of tuple relational calculus and then the database standard SQL.
30
Basic Concepts
• Entities and relationships are stored in tables
• Relationships are captured by including the key of one table in another
• Languages for manipulating the tables
• All popular DBMSs today are based on the relational data model (or an extension of it, e.g., the object-relational data model)
31
Why is it so good?
• Simplicity: everybody knows how to manipulate tables
• Tables are simple enough that solutions to complicated problems such as concurrency control and query optimization can be obtained
• It has a theoretical basis for the study of database design problems
• Tables are logical concepts; physically, tables can be stored in different ways, which supports data independence
32
Terminology
• Relation: table; denoted by R(A1, A2, ..., An), where R is the relation name and (A1, A2, ..., An) is the relation schema of R
• Attribute: column; denoted by Ai
• Tuple: row
• Attribute value: value stored in a table cell
• Domain: legal type and range of values of an attribute; denoted by dom(Ai)
– Attribute: Age Domain: [0-100]
– Attribute: EmpName Domain: 50 alphabetic chars
– Attribute: Salary Domain: non-negative integer
• Ideally, a domain can be defined in terms of another domain; e.g., the domain of EmpName is PersonName. This is NOT allowed in most basic DBMSs.
• However, more recent DBMSs with object-relational extensions, such as Oracle 10g, allow this.
33
Relational Database: Definitions
Relational database: a set of relations
Relation: made up of 2 parts:
Instance: a table, with rows and columns. #Rows = cardinality, #fields = degree / arity.
Schema: specifies name of relation, plus name and type of each column.
• e.g. Students(sid: string, name: string, login: string, age: integer, gpa: real).
Can think of a relation as a set of rows or tuples (i.e., all rows are distinct).
34
An Example Relation
Relation Name/Table Name: STUDENT
Attributes/Columns (collectively as a schema): Name, Student-id, Age, CGPA
Name             Student-id  Age  CGPA
Chan Kin Ho      99223367    23   8.19
Lam Wai Kin      96882145    17   10.00
Man Ko Yee       96452165    22   8.75
Lee Chin Cheung  96154292    16   10.00
Alvin Lam        96520934    15   9.65
Cardinality = 5, degree = 4, all rows distinct
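As a rough sketch, the STUDENT relation above can be held as a Python set of named tuples, which makes the "set of distinct rows" reading concrete. The Python field names are an assumed transliteration of the column headings.

```python
from typing import NamedTuple

class Student(NamedTuple):
    name: str
    student_id: str
    age: int
    cgpa: float

# A relation instance is a set of tuples: no order, no duplicates.
student = {
    Student("Chan Kin Ho", "99223367", 23, 8.19),
    Student("Lam Wai Kin", "96882145", 17, 10.00),
    Student("Man Ko Yee", "96452165", 22, 8.75),
    Student("Lee Chin Cheung", "96154292", 16, 10.00),
    Student("Alvin Lam", "96520934", 15, 9.65),
}

# Adding a tuple that is already present leaves the set unchanged.
student.add(Student("Alvin Lam", "96520934", 15, 9.65))

cardinality = len(student)        # number of rows
degree = len(Student._fields)     # number of attributes
```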
35
Another Relation Example
enrollment (studentName, rollNumber, courseNo, sectionNo)
enrollment
36
Relational Model
Sets
collections of items of the same type
no order
no duplicates
Mappings
domain → range
1:many
many:1
1:1
many:many
37
Relational Model Concepts
Relational Model of data is based on the concept of RELATION
A Relation is a mathematical concept based on the idea of SETS
The strength of the relational approach to data management comes from the formal foundation provided by the theory of relations
38
Relational Model Concepts
The model was first proposed by Dr. E.F. Codd of IBM in 1970 in the following paper: "A Relational Model of Data for Large Shared Data Banks," Communications of the ACM, June 1970.
The above paper caused a major revolution in the field of Database management and earned Codd the coveted ACM Turing Award in 1981
39
Relational Model Concepts
The relational model represents the database as a collection of relations
Each relation resembles a table of values
When a relation is thought of as a table of values, each row in the table represents a collection of related data values
40
Some Formal Definitions
• A relation is denoted by: R(A1, A2, ..., An)
– STUDENT(Name, Student-id, Age, CGPA)
• Degree of a relation: the number of attributes n in the
relation.
• Tuple t of R(A1, A2, ..., An): An ordered set of values
<v1,v2,...,vn> where each vi is an element of dom(Ai).
• Relation instance r(R): A set of tuples in R
r(R) = {t1, t2, ..., tm}, or alternatively
r(R) ⊆ dom(A1) × dom(A2) × ... × dom(An)
41
Domain
A Domain D is a set of atomic values.
Atomic means that each value in the domain is indivisible as far as the relational model is concerned
It means that if we split an atomic value, the value itself becomes meaningless. For example:
SSN
Local_phone_number
Names
Employee_ages
42
Domains & Data Types
Smallest semantic unit of data: individual part number, individual supplier number, individual city name, etc.
Atomic values or scalar values
Domain is a named set of atomic values
Pool of legal values
Example: supplier number, an integer in [0, 10000]
43
Relation and Cartesian Product
• A relation is any subset of the Cartesian product of domains of values
• Example: Let Dom(Name) = { Lee, Cheung }
Dom(Grade) = { A, B, C }
Then the Cartesian product of the domains is
Dom(Name) × Dom(Grade) = { <Lee, A>, <Lee, B>, <Lee, C>, <Cheung, A>, <Cheung, B>, <Cheung, C> }
• A relation StudentGrade(Name, Grade) can be defined as any subset of the Cartesian product Dom(Name) × Dom(Grade)
r(StudentGrade) = { <Lee, A>, <Cheung, C> } ⊆ Dom(Name) × Dom(Grade)
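The same example can be checked in a few lines of Python using itertools.product from the standard library. The variable names are chosen for this sketch only.

```python
from itertools import product

dom_name = {"Lee", "Cheung"}
dom_grade = {"A", "B", "C"}

# Cartesian product of the two domains: every possible (Name, Grade) pair.
cartesian = set(product(dom_name, dom_grade))

# One relation instance over StudentGrade(Name, Grade).
student_grade = {("Lee", "A"), ("Cheung", "C")}

# A relation is any subset of the Cartesian product.
is_relation = student_grade <= cartesian

# A pair that uses a value outside Dom(Grade) is not in the product.
not_a_relation = {("Lee", "D")} <= cartesian
```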
44
Characteristics of Relations
• Tuples in a relation are not considered to be ordered, even
though they appear to be in a tabular form. (Recall that a
relation is a set of tuples.)
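• Ordering of attributes in a relation schema R is significant under the ordered-tuple definition above, since each value vi is matched to attribute Ai by position.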
• Values in a tuple: All values are considered atomic.
(Recall that a domain is a set of atomic values.) A special
null value is used to represent values that are unknown or
inapplicable to certain tuples.
45
Identical Relations
46
Relational Model Notation
An attribute A can be qualified with the relation name R to which it belongs by using the dot notation R.A
For example, STUDENT.Name or STUDENT.Age
47
Relational Model Notation
We refer to component values of a tuple t by:
• t[Ai] or t.Ai
• This is the value vi of attribute Ai for tuple t
• Both t[Ai, Aj, Ak] and t.(Ai, Aj, Ak) refer to the subtuple of values <vi, vj, vk> from t for the listed attributes of R
For example: consider a tuple t = <“Barbara Benson”, “533-69-1238”, “839-8461”, “7384 Fontana Lane”, NULL, 19, 3.25> from the STUDENT relation in Figure 5.1
We have t[Name] = <“Barbara Benson”> and t.(Ssn, Gpa, Age) = <“533-69-1238”, 3.25, 19>
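A small sketch of this notation in Python. The attribute names Name, Ssn, Gpa and Age come from the slide; Home_phone, Address and Office_phone are assumed names for the remaining three positions, and the helper function is hypothetical.

```python
# Attribute order of the STUDENT schema (three of the names are assumed).
attributes = ("Name", "Ssn", "Home_phone", "Address", "Office_phone", "Age", "Gpa")

# None stands in for the NULL value in the tuple.
t = ("Barbara Benson", "533-69-1238", "839-8461",
     "7384 Fontana Lane", None, 19, 3.25)

def component(t, attributes, *names):
    """Return the subtuple of t for the named attributes, in the order given."""
    position = {a: i for i, a in enumerate(attributes)}
    return tuple(t[position[n]] for n in names)

name_value = component(t, attributes, "Name")
subtuple = component(t, attributes, "Ssn", "Gpa", "Age")
```

The subtuple follows the order in which the attributes are listed in the request, not their order in the schema.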
48
Domain Constraints
The value of each attribute A must be an atomic value from dom(A)
The data types associated with domains typically include standard numeric data types for integers and real numbers, characters, Booleans, fixed-length strings, time, date, money, or some special data types
Domain-constrained comparisons
Select …..
From P, SP
Where P.P# = SP.P#
Select …..
From P, SP
Where P.weight = SP.qty
Both are valid queries in SQL, but the second one makes no sense!
49
Key Constraints
A relation is defined as a set of tuples
By definition, all elements of a set are distinct
This means that no two tuples can have the same combination of values for all their attributes
Superkey: a set of attributes such that no two distinct tuples in any state r of R have the same values for those attributes
Every relation has at least one default superkey: the set of all its attributes
50
Key Constraints
A superkey can have redundant attributes, so a more useful concept is that of a KEY which has no redundancy
A key satisfies two constraints:
Two distinct tuples in any state of the relation cannot have identical values for the attributes in the key
It is a minimal superkey
51
Key Constraints
For example, consider STUDENT relation
The attribute set {IDNO} is a key of STUDENT because no two students can have the same value for IDNO
Any set of attributes that includes IDNO – for example {IDNO, Name, Age} – is a superkey
52
Key Constraints
In general, a relation schema may have more than one key; in this case, each of the keys is called a candidate key
Example: Consider the CAR relation schema:
CAR(State, Reg#, SerialNo, Make, Model, Year)
CAR has two keys:
• Key1 = {State, Reg#}
• Key2 = {SerialNo}
Both are also superkeys of CAR
{SerialNo, Make} is a superkey but not a key.
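A rough sketch of the superkey and minimality tests, run against one hypothetical CAR state (the rows are invented for illustration). A check like this can only show that a given state does not contradict a key. Whether an attribute set really is a key depends on the meaning of the attributes across all valid states.

```python
from itertools import combinations

ATTRS = ("State", "Reg#", "SerialNo", "Make", "Model", "Year")

# Hypothetical CAR state, not taken from the slides.
cars = [
    ("TX", "ABC-739", "A69352", "Ford", "Mustang", 2002),
    ("TX", "TVP-347", "B43696", "Oldsmobile", "Cutlass", 2005),
    ("FL", "ABC-739", "X83554", "Oldsmobile", "Delta", 2001),
    ("CA", "MPO-22", "C43742", "Ford", "Mustang", 2002),
]

def is_superkey(rows, attrs, key):
    """True if no two rows agree on all attributes in key."""
    idx = [attrs.index(a) for a in key]
    projected = [tuple(r[i] for i in idx) for r in rows]
    return len(set(projected)) == len(projected)

def is_key(rows, attrs, key):
    """True if key is a superkey and no proper non-empty subset is one."""
    if not is_superkey(rows, attrs, key):
        return False
    return not any(
        is_superkey(rows, attrs, subset)
        for size in range(1, len(key))
        for subset in combinations(key, size)
    )

key1_ok = is_key(cars, ATTRS, ("State", "Reg#"))
key2_ok = is_key(cars, ATTRS, ("SerialNo",))
redundant_is_superkey = is_superkey(cars, ATTRS, ("SerialNo", "Make"))
redundant_is_key = is_key(cars, ATTRS, ("SerialNo", "Make"))
```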
53
Keys
Let K ⊆ R (i.e., K is a set of attributes which is a subset of the schema of R)
K is a superkey of R if K can identify a unique tuple in a given relation r(R)
Customer(CusNo, Name, Address, …), where customers have unique customer numbers and unique names.
Possible superkeys: {CusNo}, {CusNo, Name}, {CusNo, Name, Address}, plus many others
• K is a candidate key if K is a minimal superkey
• Every relation is guaranteed to (must) have at least one key. Why?
54
Key(Candidate key)
A key can not be determined from any particular instance data
it is an intrinsic property of a schema
it can only be determined from the meaning of attributes
A relation can have more than one key.
Superkey: A set of attributes that contains any key as a subset. A key can also be defined as a minimal superkey
Primary Key: One of the candidate keys chosen for indexing purposes ( More details later…)
55
Key Constraints
If a relation has several candidate keys, one is chosen arbitrarily to be the primary key.
Example: Consider the CAR relation schema:
CAR(State, Reg#, SerialNo, Make, Model, Year)
We chose SerialNo as the primary key
The primary key value is used to uniquely identify each tuple in a relation
Provides the tuple identity
Also used to reference the tuple from another tuple
General rule: Choose as primary key the smallest of the candidate keys (in terms of size)
Not always applicable: choice is sometimes subjective
56
CAR table with two candidate keys –LicenseNumber chosen as Primary Key
57
COMPANY Database Schema
58
Key Constraints and Constraints on NULL values
Another constraint on attributes specifies whether NULL values are or are not permitted
For example, if every STUDENT tuple must have a valid, non-NULL value for the Name attribute, then Name of STUDENT is constrained to be NOT NULL
59
Entity Integrity
Entity Integrity:
The primary key attributes PK of each relation schema R in S cannot have null values in any tuple of r(R).
• This is because primary key values are used to identify the individual tuples.
• t[PK] ≠ null for any tuple t in r(R)
• If PK has several attributes, null is not allowed in any of these attributes
Note: Other attributes of R may be constrained to disallow null values, even though they are not members of the primary key.
60
Referential Integrity Constraint
Referential Integrity Constraint is specified between two relations and is used to maintain the consistency among tuples in the two relations
Informally, the constraint says: a tuple in one relation that refers to another relation must refer to an existing tuple in that other relation
For example, the Dno in EMPLOYEE gives the number of the department for which each employee works; this number must match a Dnumber value in DEPARTMENT
61
Referential Integrity Constraint
Tuples in the referencing relation R1 have attributes FK (called foreign key attributes) that reference the primary key attributes PK of the referenced relation R2.
A tuple t1 in R1 is said to reference a tuple t2 in R2 if t1[FK] = t2[PK].
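The reference condition t1[FK] = t2[PK] can be sketched as a plain Python check. The rows below are a toy state invented for illustration, and treating a NULL (None) foreign key as acceptable is an assumption of this sketch.

```python
def dangling_references(r1, fk, r2, pk):
    """Return the tuples of r1 whose non-null fk value matches no pk value in r2."""
    existing = {t2[pk] for t2 in r2}
    return [t1 for t1 in r1 if t1[fk] is not None and t1[fk] not in existing]

# Referenced relation R2 with primary key Dnumber.
department = [{"Dnumber": 1}, {"Dnumber": 4}, {"Dnumber": 5}]

# Referencing relation R1 with foreign key Dno.
employee = [
    {"Ssn": "999887777", "Dno": 4},
    {"Ssn": "333445555", "Dno": 5},
    {"Ssn": "667788999", "Dno": 7},     # no department 7 in this toy state
    {"Ssn": "123456789", "Dno": None},  # null foreign key, references nothing
]

violations = dangling_references(employee, "Dno", department, "Dnumber")
```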
62
Displaying a relational database schema and its constraints
Each relation schema can be displayed as a row of attribute names
The name of the relation is written above the attribute names
The primary key attribute (or attributes) will be underlined
A foreign key (referential integrity) constraint is displayed as a directed arc (arrow) from the foreign key attributes to the referenced table
Can also point to the primary key of the referenced relation for clarity
Next slide shows the COMPANY relational schema diagram
63
Referential Integrity Constraints for COMPANY database
64
65
Other Types of Constraints
Semantic Integrity Constraints:
based on application semantics and cannot be expressed by the model per se
Example: “the max. no. of hours per employee for all projects he or she works on is 56 hrs per week”
A constraint specification language may have to be used to express these
66
Modification and Updates
In this section, we concentrate on the database Updates and Modification
There are three basic operations: Insert, Delete, and Modify
Insert is used to insert a new tuple or tuples in a relation
Delete is used to delete tuples
Update (or Modify) is used to change the values of some attributes
67
Modification and Updates
Insert: insert a new tuple by specifying values for all of its attributes
Delete: delete a tuple by giving the relation name and the key of the tuple
Modify: modify a value by giving the relation name, the key of the target tuple, and the attribute to modify
68
Possible violations for each operation
INSERT may violate any of the constraints:
Domain constraint:
• if one of the attribute values provided for the new tuple is not of the specified attribute domain
Key constraint:
• if the value of a key attribute in the new tuple already exists in another tuple in the relation
Referential integrity:
• if a foreign key value in the new tuple references a primary key value that does not exist in the referenced relation
Entity integrity:
• if the primary key value is null in the new tuple
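The four kinds of INSERT violation can be reproduced with Python's sqlite3 module. This is a sketch under assumptions: the EMPLOYEE table is cut down to three columns, a CHECK clause stands in for a domain constraint, and foreign key enforcement is switched on explicitly because SQLite leaves it off by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE department (dnumber INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE employee (
        ssn    TEXT PRIMARY KEY NOT NULL,               -- key + entity integrity
        salary INTEGER CHECK (salary >= 0),             -- stand-in domain rule
        dno    INTEGER REFERENCES department(dnumber)   -- referential integrity
    )""")
conn.execute("INSERT INTO department VALUES (4)")
conn.execute("INSERT INTO employee VALUES ('999887777', 25000, 4)")

def insert_employee(ssn, salary, dno):
    """Try the insert and report either 'accepted' or the constraint error text."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", (ssn, salary, dno))
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return str(exc)

results = {
    "domain": insert_employee("111111111", -5, 4),         # salary outside domain
    "key": insert_employee("999887777", 28000, 4),         # ssn already present
    "referential": insert_employee("667788999", 28000, 7), # no department 7
    "entity": insert_employee(None, 28000, 4),             # null primary key
    "valid": insert_employee("667788999", 28000, 4),
}
```

Each rejected insert leaves the table unchanged, so the final valid insert still succeeds.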
69
Insert Example
Insert <'Cecilia', 'F', 'Kolonsky', NULL, '1960-04-05', '6357 Windy lane, Kate, TX', F, 28000, NULL, 4> into EMPLOYEE
Insert <'Cecilia', 'F', 'Kolonsky', 999887777, '1960-04-05', '6357 Windy lane, Kate, TX', F, 28000, NULL, 4> into EMPLOYEE
Insert <'Cecilia', 'F', 'Kolonsky', 667788999, '1960-04-05', '6357 Windy lane, Kate, TX', F, 28000, NULL, 7> into EMPLOYEE
70
Possible violations for each operation
DELETE may violate only referential integrity:
If the primary key value of the tuple being deleted is referenced from other tuples in the database
• Can be remedied by several actions: RESTRICT, CASCADE, SET NULL
• RESTRICT option: reject the deletion
• CASCADE option: propagate the deletion by also deleting the referencing tuples
• SET NULL option: set the foreign keys of the referencing tuples to NULL
One of the above options must be specified during database design for each foreign key constraint
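The three remedies correspond to the ON DELETE clause of a foreign key declaration in SQL. The following is a sketch using Python's sqlite3 module with a cut-down DEPARTMENT and EMPLOYEE pair. The table layout and the helper function are assumptions made for the example.

```python
import sqlite3

def delete_department(action):
    """Delete department 4 under the given ON DELETE action and report the result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.execute("CREATE TABLE department (dnumber INTEGER PRIMARY KEY)")
    conn.execute(f"""
        CREATE TABLE employee (
            ssn TEXT PRIMARY KEY NOT NULL,
            dno INTEGER REFERENCES department(dnumber) ON DELETE {action}
        )""")
    conn.execute("INSERT INTO department VALUES (4)")
    conn.execute("INSERT INTO employee VALUES ('999887777', 4)")
    try:
        conn.execute("DELETE FROM department WHERE dnumber = 4")
        outcome = "deleted"
    except sqlite3.IntegrityError:
        outcome = "rejected"
    employees = conn.execute("SELECT ssn, dno FROM employee").fetchall()
    conn.close()
    return outcome, employees

restrict_result = delete_department("RESTRICT")
cascade_result = delete_department("CASCADE")
set_null_result = delete_department("SET NULL")
```

RESTRICT keeps both tuples, CASCADE removes the referencing employee as well, and SET NULL keeps the employee with a null Dno.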
71
Delete Example
Delete the EMPLOYEE tuple with Ssn='999887777'
Delete the EMPLOYEE tuple with Ssn='333445555'
Delete the WORKS_ON tuple with Essn='999887777' and Pno=10
72
Possible violations for each operation
UPDATE may violate domain constraint and NOT NULL constraint on an attribute being modified
Any of the other constraints may also be violated, depending on the attribute being updated:
Updating the primary key (PK):
• Similar to a DELETE followed by an INSERT
• Need to specify similar options to DELETE
Updating a foreign key (FK):
• May violate referential integrity
Updating an ordinary attribute (neither PK nor FK):
• Can only violate domain constraints
73
Update Example
Update the salary of the EMPLOYEE tuple with Ssn='999887777' to 2800
Update the Dno of the EMPLOYEE tuple with Ssn='999887777' to 1
Update the Dno of the EMPLOYEE tuple with Ssn='999887777' to 7
Update the Ssn of the EMPLOYEE tuple with Ssn='999887777' to '987654321'
74
Summary
In relational systems, the DB is perceived by the user as relations & nothing else
Relations are only logical structures
At the physical level, the system is free to store the data in any way it likes (using sequential files, indexing, hashing…), provided it can map stored representations to relations
75
Relational Systems
Consider the relations:
Dept(dept#, dname, budget)
D1  MKTNG  10M
D2  DEV    12M
D3  RES    5M
Emp(emp#, ename, dept#, salary)
E1  LOPEZ  D1  40K
There is a connection between tuples E1 & D1. The connection is represented, not by a pointer, but by the occurrence of value D1 in E1.
In non-relational systems, such information is typically represented by some kind of pointer that is visible to the user.
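A short sketch of what "connection by value" means, using the Dept and Emp tuples above as plain Python tuples. The positional indexes are an assumption of the sketch. The two relations are linked only by comparing the dept# values, with no stored pointer from one tuple to the other.

```python
# Dept(dept#, dname, budget)
dept = {
    ("D1", "MKTNG", "10M"),
    ("D2", "DEV", "12M"),
    ("D3", "RES", "5M"),
}

# Emp(emp#, ename, dept#, salary)
emp = {
    ("E1", "LOPEZ", "D1", "40K"),
}

# Match each employee to its department by comparing values:
# Emp.dept# is at position 2, Dept.dept# is at position 0.
emp_with_dept = [(e, d) for e in emp for d in dept if e[2] == d[0]]
```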
76
Relational Systems
In relational systems, there are no pointers at the logical level
Pointers may exist at the physical level
Physical storage details are concealed from the user in relational systems
77
Properties of Relations
There are no duplicate tuples
• Body of a relation is a mathematical set
Tuples are unordered, top to bottom
• Body of a relation is a mathematical set
• No such thing as fifth tuple, next tuple ..
• No concept of positional addressing
Attributes are unordered, left to right
• Heading of a relation is a mathematical set
• No concept of positional addressing
All attribute values are atomic
• Normalized (1st Normal Form)
78
Types of Relations
Base Relations
• The original (given) relations
Derived Relations
• Relations obtained from base relations
Views
• “Virtual” derived relation
• Only definition is stored in the catalog
• Definition executed at run-time
Snapshots
• “Real” derived relation
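The difference between a view and a snapshot can be sketched with Python's sqlite3 module, assuming a base relation modelled on the Dept example with the budget stored as a number of millions. The view stores only its definition and is re-evaluated on each query. The snapshot is approximated here with CREATE TABLE ... AS SELECT, which copies the rows once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Base relation.
conn.execute("CREATE TABLE dept (dept_no TEXT PRIMARY KEY, dname TEXT, budget_m INTEGER)")
conn.executemany(
    "INSERT INTO dept VALUES (?, ?, ?)",
    [("D1", "MKTNG", 10), ("D2", "DEV", 12), ("D3", "RES", 5)],
)

# View: a virtual derived relation, only the definition is stored.
conn.execute("CREATE VIEW big_dept AS SELECT dept_no FROM dept WHERE budget_m >= 10")

# Snapshot: a real derived relation, the rows are materialized now.
conn.execute(
    "CREATE TABLE big_dept_snapshot AS SELECT dept_no FROM dept WHERE budget_m >= 10"
)

# Change the base relation after both were created.
conn.execute("UPDATE dept SET budget_m = 15 WHERE dept_no = 'D3'")

view_rows = sorted(r[0] for r in conn.execute("SELECT dept_no FROM big_dept"))
snapshot_rows = sorted(r[0] for r in conn.execute("SELECT dept_no FROM big_dept_snapshot"))
```

The view picks up D3 after the update, while the snapshot still holds the rows from the moment it was taken.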
Query Result• Unnamed derived relation