Database Entity Relations

  • 8/13/2019 Database Entity Relations

    Exam Summary

    Average: 74.5

    Max: 93.5

    Min: 36.5

    Std Dev: 11.6

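    [Figure: scatter plot of each student's difference from the mean on Exam 1 (vertical axis, -20 to 20) against the difference from the mean on Exam 2 (horizontal axis, -40 to 30). Annotated regions: "Good on Exam 1, worse on Exam 2" and "Bad on Exam 1, good on Exam 2".]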

    Exam Details

    Section: Low-High / Total (Average)

    1: External Sorting 10-24 / 24 (18)

    2: Query Execution 17-43 / 44 (36)

    3: Query Optim. 6.5-29 / 32 (20)

    Section 1: External Sorting

    For all questions in this section, assume that you have 96 pages in the buffer pool, and you wish to sort EMP by Ename.

    Using the general external sort, how many sorting passes are required? (6pts)

    #passes = 1 + Ceil(log_(B-1)(N/B))

    = 1 + Ceil(log_95(10,000/96))

    = 1 + 2

    = 3

    What is the I/O cost of doing this sort? (6pts)

    I/O Cost = 2N * (# passes)

    = 20,000 * 3

    = 60,000
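
    The pass count and I/O cost above can be checked with a short sketch. It assumes N = 10,000 pages for EMP and B = 96 buffer pages, as in the worked answer; the function names are illustrative only. The loop counts merge passes with integer arithmetic instead of evaluating the logarithm, which avoids floating-point rounding.

```python
import math

def external_sort_passes(num_pages: int, buffer_pages: int) -> int:
    """Passes of general external merge sort: pass 0 plus (B-1)-way merges."""
    runs = math.ceil(num_pages / buffer_pages)  # sorted runs left after pass 0
    fan_in = buffer_pages - 1                   # B-1 input buffers, 1 output buffer
    passes = 1
    while runs > 1:
        runs = math.ceil(runs / fan_in)         # each merge pass shrinks the run count
        passes += 1
    return passes

def external_sort_io(num_pages: int, buffer_pages: int) -> int:
    """Every pass reads and writes each page once: 2N per pass."""
    return 2 * num_pages * external_sort_passes(num_pages, buffer_pages)

print(external_sort_passes(10_000, 96))  # 3
print(external_sort_io(10_000, 96))      # 60000
```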

    Section 1 (cont)

    Database systems sometimes use blocked I/O because reading a block of contiguous pages is more efficient than doing a separate I/O for each page. Assume that in our computer system, it is much more efficient to read and write blocks of 32 pages at a time, so all reads and writes from files must be in blocks of 32 pages (and if a file has less than 32 pages, it is padded with blank pages). Consider doing external sort with blocked I/O, which gets faster I/O at the expense of more sorting passes. If the database must read and write 32 pages at a time, how many sorting passes are required? Show either the formula you use, or the temp file sizes after each pass. (4pts)

    With 96 buffer pages and 32-page blocks there is room for only three block buffers (two for input, one for output), so every merge is 2-way.

    Pass 0: 105 runs of 96 pages
    Pass 1: 2-way merge of 96 + 96 -> result is 53 runs of 192 pages
    Pass 2: 2-way merge of 192 + 192 -> result is 27 runs of 384 pages
    Pass 3: -> result is 14 runs of 768 pages
    Pass 4: -> result is 7 runs of 1536 pages
    Pass 5: -> result is 4 runs of 3072 pages
    Pass 6: -> result is 2 runs of 6144 pages
    Pass 7: -> result is 1 run of 10,000 pages
    8 passes total
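
    The run counts in this answer can be reproduced with a small simulation. This is a sketch under the question's assumptions (10,000 pages, 96 buffer pages, 32-page blocks); the function name is illustrative.

```python
import math

def blocked_sort_run_counts(num_pages: int, buffer_pages: int, block_pages: int) -> list[int]:
    """Number of sorted runs after each pass when all I/O is done in whole blocks."""
    fan_in = buffer_pages // block_pages - 1   # one block buffer is reserved for output
    if fan_in < 2:
        raise ValueError("need at least two input block buffers to merge")
    runs = math.ceil(num_pages / buffer_pages)  # pass 0 still fills the whole buffer pool
    counts = [runs]
    while runs > 1:
        runs = math.ceil(runs / fan_in)
        counts.append(runs)
    return counts

counts = blocked_sort_run_counts(10_000, 96, 32)
print(counts)       # [105, 53, 27, 14, 7, 4, 2, 1]
print(len(counts))  # 8 passes, counting pass 0
```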

    Section 1 (cont)

    Blocked I/O is used because reading a 32-page block is faster than 32 separate 1-page I/Os. For each sorting pass, you still must read and write every page in the file, but instead of doing 10,000 1-page I/Os, you do (10,000/32) block I/Os. Assume that in our system, we can read a 32-page block in the time it would normally take to do 8 single-page I/Os. Is the blocked I/O sort faster or slower than a regular sort from question 2? By approximately what ratio? (4pts)
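
    One way to approach this comparison is to measure time in units of one single-page I/O. The sketch below assumes the 3-pass regular sort and the 8-pass blocked sort from the earlier answers, and rounds 10,000/32 up to 313 blocks because of padding. It is an illustrative calculation under those assumptions, not the graded solution, and the function name is made up for the example.

```python
import math

def sort_time_units(num_pages: int, passes: int, pages_per_io: int = 1, units_per_io: int = 1) -> int:
    """Time for an external sort, in units of one single-page I/O."""
    ios_per_scan = math.ceil(num_pages / pages_per_io)  # I/Os to read (or write) the file once
    return 2 * ios_per_scan * passes * units_per_io     # each pass reads and writes the file

regular = sort_time_units(10_000, passes=3)
blocked = sort_time_units(10_000, passes=8, pages_per_io=32, units_per_io=8)

print(regular)  # 60000
print(blocked)  # 40064
print(round(regular / blocked, 2))
```

    Under these assumptions the blocked sort comes out faster, by a factor of roughly 1.5, even though it needs more passes.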

    Assume that instead of a heap file, the records from EMP are stored in a clustered B-Tree index, whose key is Ename, using alternative 1 (i.e., the full data records are stored in the leaves of the tree). The B-tree has depth 3. Assuming the B-Tree already exists, what is the approximate I/O cost to use the B-tree to get the records in sorted order? (4pts)

    Section 2: Relational Operators

    Consider the operation: σ_(EID < 5000)(EMP)

    What is the I/O cost of this operation if there is no index? (4pts)

    Lecture 15 slide 9: no index -> sequential scan, N I/Os, meaning 10,000 I/Os

    What is the reduction factor? (4pts)

    5000/100,000 = 0.05

    Assume that instead of a heap file, the records from EMP are stored in a clustered B-Tree index, whose key is Ename, using alternative 1 (i.e., the full data records are stored in the leaves of the tree). The B-tree has depth 3. Assuming the B-Tree already exists, what is the I/O cost of this selection operation? (4pts)

    Due to a copy-paste error on my part, the B-Tree index is *not* useful for the query. I gave credit either way, whether people tried to use the index as an index, or for a sequential scan.

    In considering the cost of using the index, either as an index or for a scan, full credit required knowing that B-Trees have an overhead of approximately 50%, i.e., there are 50% more leaves than the number of pages in a heap file.
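
    The numbers in this section can be put side by side in a short sketch. It assumes a 10,000-page heap file and 100,000 employees, as used in the answers above, and treats the 50% B-Tree overhead as a simple multiplier on the leaf count; the function names are illustrative.

```python
def heap_scan_cost(heap_pages: int) -> int:
    """No index: a sequential scan reads every page once."""
    return heap_pages

def btree_leaf_scan_cost(heap_pages: int, overhead: float = 0.5) -> int:
    """Scanning the leaves of an alternative-1 B-Tree: about 50% more pages than the heap file."""
    return round(heap_pages * (1 + overhead))

def reduction_factor(matching_tuples: int, total_tuples: int) -> float:
    """Fraction of tuples expected to satisfy the predicate."""
    return matching_tuples / total_tuples

print(heap_scan_cost(10_000))            # 10000
print(btree_leaf_scan_cost(10_000))      # 15000
print(reduction_factor(5_000, 100_000))  # 0.05
```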

    Relational Operators (cont)

    Consider the query: select distinct Ename from EMP order by Ename

    If you had 96 pages of memory, what algorithm would you use to execute this query and why? (4pts)

    Order by suggests sort-merge to remove dups.

    If, instead, the query were: select distinct Ename from EMP and you had 101 pages of memory, what algorithm would you use and why? (4pts)

    Could use hashing to remove dups, which would have a similar I/O cost but be more CPU efficient.
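
    The difference between the two approaches can be seen in an in-memory sketch. It ignores paging entirely and only illustrates the logic: sorting brings duplicates next to each other and produces ordered output, while hashing finds duplicates without ordering anything. The function names and sample names are made up for the example.

```python
def distinct_by_sorting(names: list[str]) -> list[str]:
    """Sort, then drop adjacent duplicates; output is ordered (fits ORDER BY)."""
    result: list[str] = []
    for name in sorted(names):
        if not result or result[-1] != name:
            result.append(name)
    return result

def distinct_by_hashing(names: list[str]) -> list[str]:
    """Keep a hash set of values seen so far; output order is arrival order."""
    seen: set[str] = set()
    result: list[str] = []
    for name in names:
        if name not in seen:
            seen.add(name)
            result.append(name)
    return result

sample = ["Smith", "Jones", "Smith", "Adams"]
print(distinct_by_sorting(sample))  # ['Adams', 'Jones', 'Smith']
print(distinct_by_hashing(sample))  # ['Smith', 'Jones', 'Adams']
```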

    Relational Operators (cont)

    Consider the join: In_Dept ⋈ Dept

    Assuming 100 pages of memory, what is the I/O cost of this join using Blocked Nested Loops? (4pts)

    Lec 15, slide 27: Blocked nested loops: M + Ceil(M/(B-2)) * N
    550 + Ceil(550/98) * 20 = 550 + 6 * 20 = 670

    What is the I/O cost of this using Index Nested Loops, with an unclustered Hash index on Dept.DID? Assume an average cost of 1.2 I/Os to get to the right hash bucket. (4pts)

    Lec 15, slide 27: index nested loops: M + #tuples in M * index cost
    550 + 110,000 * (1.2 + 1) = 550 + 242,000 = 242,550

    Consider the join: Dept ⋈ In_Dept. Assuming 100 pages of memory, what is the I/O cost of this using Sort-Merge-Join, optimized so that the last merge pass is combined with the join if possible? (4pts)

    Best: 1 pass over In_Dept, to make 6 runs, sort Dept in memory, merge. Cost 550*3 + 20 = 1670
    By the book: 3(M + N) = 3*570 = 1710

    Assuming 100 pages of memory, what is the I/O cost of this using regular (not hybrid) hash join? (4pts)

    Best: partition In_Dept (2 * 550), hash Dept in memory, match. Cost 550 * 3 + 20 = 1670
    By the book: 3(M + N) = 1710
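
    The join costs on this slide follow from a few formulas, sketched below. The sketch assumes M = 550 pages and 110,000 tuples for In_Dept, N = 20 pages for Dept, and B = 100 buffer pages, which are the values used in the answers; the function names are illustrative.

```python
import math

def block_nested_loops_cost(m: int, n: int, b: int) -> int:
    """M + ceil(M / (B-2)) * N: scan the outer once, the inner once per outer block."""
    return m + math.ceil(m / (b - 2)) * n

def index_nested_loops_cost(m: int, outer_tuples: int, probe_cost: float) -> int:
    """M + (#tuples in outer) * (cost of one index probe plus record fetch)."""
    return m + round(outer_tuples * probe_cost)

def three_pass_cost(m: int, n: int) -> int:
    """'By the book' cost shared by sort-merge join and hash join: 3(M + N)."""
    return 3 * (m + n)

def small_inner_in_memory_cost(m: int, n: int) -> int:
    """Best case from the slide: 3 passes over the large input, 1 read of the small one."""
    return 3 * m + n

print(block_nested_loops_cost(550, 20, 100))          # 670
print(index_nested_loops_cost(550, 110_000, 1.2 + 1)) # 242550
print(three_pass_cost(550, 20))                       # 1710
print(small_inner_in_memory_cost(550, 20))            # 1670
```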

    Relational Operators (cont)

    Consider the join: ( (EID

    Section 3: Query Optimization

    There are several parts of an SQL query that can cause a column (or columns) to be considered an "interesting order", meaning that an ordered input can lower the cost of evaluating that part of the query. Briefly describe three of these, and for each one briefly explain why. (6 pts)

    Order By, Group By, Joins, Distinct, some aggregations (e.g. Max, Min)

    Write the reduction factor that would be used by a System R style query optimizer for each of the following predicates (6pts)

    Building.BID > 150
    BID ranges from 1 to 200, so this would be (200 - 150)/(200 - 1), about 1/4.

    EMP.Ename = 'Joe'
    We don't know about distribution of names, so 1/10.

    IN_DEPT.EID = 003
    Since there are 100,000 employees, this is 1/100,000
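
    The three estimates follow the usual System R rules of thumb, sketched below: 1/NKeys for an equality when the number of distinct values is known, a default of 1/10 when it is not, and (High - value)/(High - Low) for a range predicate. The function names are illustrative, and the statistics are the ones stated in the answers.

```python
from typing import Optional

def rf_equals_constant(distinct_values: Optional[int] = None) -> float:
    """column = value: 1/NKeys if known, otherwise the default guess of 1/10."""
    return 1 / distinct_values if distinct_values else 1 / 10

def rf_greater_than(value: float, low: float, high: float) -> float:
    """column > value: fraction of the [low, high] range that lies above value."""
    return (high - value) / (high - low)

print(rf_greater_than(150, low=1, high=200))  # about 0.25
print(rf_equals_constant())                   # 0.1
print(rf_equals_constant(100_000))            # 1e-05
```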

    Given the following query, where X is the join operator:

    π(EMP.Ename) σ(Dept.Budget > 500000) (Emp X In_Dept X Dept)

    Mark whether each of the following queries are equivalent (True/False).

    __T___ π(EMP.Ename) σ(Dept.Budget > 500000) (Emp X Dept X In_Dept)
    __T___ π(EMP.Ename) σ(Dept.Budget > 500000) (Dept X In_Dept X Emp)
    __T___ π(EMP.Ename) (Emp X In_Dept X σ(Dept.Budget > 500000)(Dept))
    __F___ σ(Dept.Budget > 500000) (π(EMP.Ename)(Emp) X In_Dept X Dept)

    Query Optimization (cont)

    Given the schema on the first page of the exam, assume the following:

    There are unclustered hash indexes on both Emp.EID and Dept.DID.
    There are clustered B-Tree indexes (alt 1: data records stored in the B-Tree leaves) on both Emp.EID and Dept.DID.
    There is an unclustered B-Tree index on (Emp.Salary, Emp.EID).
    You can assume that B-Tree indexes have heights of 3, and the cost for getting to the data record using a hash index is 2.

    What is the best access plan for the following query? (4pts)

    SELECT E.eid, E.Salary
    FROM Emp E
    WHERE E.Salary > 100000

    Very best is unclustered B-tree, index-only plan

    Draw all possible join orders considered by a System-R style query optimizer for the following query. (4pts)

    SELECT E.eid, D.dname
    FROM Emp E, In_Dept I, Dept D
    WHERE E.sal = 64000 AND D.budget > 500000 AND
          E.eid = I.eid AND I.did = D.did

    ((E X I) X D)
    ((I X E) X D)
    ((D X I) X E)
    ((I X D) X E)
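
    The four join orders can be derived mechanically: a System R style optimizer considers left-deep plans and avoids cross products, so each table added must share a join predicate with a table already in the plan. The sketch below encodes the two join predicates of the query as edges; the function and variable names are illustrative.

```python
from itertools import permutations

# Join predicates from the WHERE clause: E.eid = I.eid and I.did = D.did.
JOIN_EDGES = {frozenset(("E", "I")), frozenset(("I", "D"))}

def left_deep_orders(tables, edges):
    """All left-deep join orders in which no step is a cross product."""
    orders = []
    for perm in permutations(tables):
        joined = {perm[0]}
        valid = True
        for table in perm[1:]:
            # The next table must join with something already in the plan.
            if not any(frozenset((table, other)) in edges for other in joined):
                valid = False
                break
            joined.add(table)
        if valid:
            orders.append(perm)
    return orders

for order in left_deep_orders(("E", "I", "D"), JOIN_EDGES):
    print("(({} X {}) X {})".format(*order))
```

    Orders that start with E and D together, such as (E X D) X I, are dropped because E and D share no join predicate.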

    Today and Thursday: The ER Model

    A different data model from Relational

    Most commonly used for database design

    Today: Details of the ER Model

    Thursday: Translating ER Schemas to Relational

    Databases Model the Real World

    Data Model translates real world things into structures computers can store

    Many models: Relational, E-R, O-O, Network, Hierarchical, etc.

    Relational: Rows & Columns

    Keys & Foreign Keys to link Relations

    Students

    sid    name   login       age  gpa
    53666  Jones  jones@cs    18   3.4
    53688  Smith  smith@eecs  18   3.2
    53650  Smith  smith@math  19   3.8

    Enrolled

    sid    cid          grade
    53666  Carnatic101  C
    53666  Reggae203    B
    53650  Topology112  A
    53666  History105   B

    Database Design

    The process of modelling things in the real world into elements of a data model.

    I.e., describing things in the real world using a data model.

    E.g., describing students and enrollments using various tables with key/foreign key relationships

    The Relational model is not the only model in use

    A Problem with the Relational Model

    With complicated schemas, it may be hard for a person to understand the structure from the data definition.

    CREATE TABLE Students
      (sid CHAR(20),
       name CHAR(20),
       login CHAR(10),
       age INTEGER,
       gpa FLOAT)

    CREATE TABLE Enrolled
      (sid CHAR(20),
       cid CHAR(20),
       grade CHAR(2))

    Students

    sid    name   login       age  gpa
    53666  Jones  jones@cs    18   3.4
    53688  Smith  smith@eecs  18   3.2
    53650  Smith  smith@math  19   3.8

    Enrolled

    cid          grade  sid
    Carnatic101  C      53666
    Reggae203    B      53666
    Topology112  A      53650
    History105   B      53666
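
    To make the key/foreign key link between these two tables concrete, here is a minimal sketch using Python's built-in sqlite3 module. The PRIMARY KEY and REFERENCES clauses are additions for illustration; the slide's CREATE TABLE statements do not declare them. The helper names are hypothetical.

```python
import sqlite3

def build_demo_db() -> sqlite3.Connection:
    """In-memory database with the Students and Enrolled rows from the slide."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
    conn.execute(
        "CREATE TABLE Students(sid CHAR(20) PRIMARY KEY, name CHAR(20), "
        "login CHAR(10), age INTEGER, gpa FLOAT)"
    )
    conn.execute(
        "CREATE TABLE Enrolled(sid CHAR(20) REFERENCES Students(sid), "
        "cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid, cid))"
    )
    conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?, ?)", [
        ("53666", "Jones", "jones@cs", 18, 3.4),
        ("53688", "Smith", "smith@eecs", 18, 3.2),
        ("53650", "Smith", "smith@math", 19, 3.8),
    ])
    conn.executemany("INSERT INTO Enrolled VALUES (?, ?, ?)", [
        ("53666", "Carnatic101", "C"),
        ("53666", "Reggae203", "B"),
        ("53650", "Topology112", "A"),
        ("53666", "History105", "B"),
    ])
    return conn

def courses_for(conn: sqlite3.Connection, name: str) -> list[str]:
    """Follow the foreign key from Enrolled back to Students by joining on sid."""
    rows = conn.execute(
        "SELECT E.cid FROM Students S JOIN Enrolled E ON S.sid = E.sid "
        "WHERE S.name = ? ORDER BY E.cid",
        (name,),
    )
    return [row[0] for row in rows]

conn = build_demo_db()
print(courses_for(conn, "Jones"))  # ['Carnatic101', 'History105', 'Reggae203']
```

    With the foreign key declared, inserting an Enrolled row whose sid has no matching student raises sqlite3.IntegrityError.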

    One Solution: The E-R Model

    Instead of relations, it has: Entities and Relationships

    These are described with diagrams; both the structure and the notation are more obvious to humans

    A visual language for describing schemas

    [ER diagram: entity set Students (ssn, name, lot) connected through relationship set Enrolled_in (since) to entity set Courses (did, dname, budget).]

    Steps in Database Design

    Requirements Analysis: user needs; what must database do?

    Conceptual Design: high level descr (often done w/ER model)

    Logical Design: translate ER into DBMS data model

    Schema Refinement (in 2 weeks): consistency, normalization

    Physical Design (discussed already): indexes, disk layout

    Security Design: who accesses what, and how

    Conceptual Design

    Define enterprise entities and relationships

    What information about entities and relationships should be in database?

    What are the integrity constraints or business rules that hold?

    A database "schema" in the ER Model is represented pictorially (ER diagrams).

    Can map an ER diagram into a relational schema.

    ER Model Basics

    Entity:

    Real-world thing, distinguishable from other objects.

    Entity described by set of attributes.

    Entity Set: A collection of similar entities. E.g., all employees.

    All entities in an entity set have the same set of attributes. (Until we consider hierarchies, anyway!)

    Each entity set has a key (underlined).

    Each attribute has a domain.

    [ER diagram: entity set Employees with attributes ssn (the key), name, and lot.]
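
    As a rough analogy in code, an entity can be modelled as a record with typed attributes and an entity set as a collection indexed by its key. This is only a sketch: the class names, the choice of ssn as the key, and the sample values are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Employee:
    """One entity; the annotations play the role of attribute domains."""
    ssn: str   # key attribute
    name: str
    lot: int

class EmployeeSet:
    """An entity set: similar entities, each identified by its key."""
    def __init__(self) -> None:
        self._by_key: dict[str, Employee] = {}

    def add(self, employee: Employee) -> None:
        if employee.ssn in self._by_key:
            raise ValueError(f"duplicate key: {employee.ssn}")
        self._by_key[employee.ssn] = employee

    def __len__(self) -> int:
        return len(self._by_key)

employees = EmployeeSet()
employees.add(Employee("123-22-3666", "Attishoo", 48))
employees.add(Employee("231-31-5368", "Smiley", 22))
print(len(employees))  # 2
```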

    ER Model Basics (Contd.)

    Relationship: Association among two or more entities. E.g., Attishoo works in Pharmacy department.

    Relationships can have their own attributes.

    Relationship Set: Collection of similar relationships.

    An n-ary relationship set R relates n entity sets E1 ... En; each relationship in R involves entities e1 ∈ E1, ..., en ∈ En

    [ER diagram: Employees (ssn, name, lot) connected through relationship set Works_In (since) to Departments (did, dname, budget).]
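
    A relationship set can be pictured as a set of key tuples, one key per participating entity set, with any descriptive attributes attached to the tuple. The sketch below uses the Attishoo/Pharmacy example; the ssn, did, and since values are made up for illustration.

```python
employees = {"123-22-3666": "Attishoo"}   # ssn -> name
departments = {51: "Pharmacy"}            # did -> dname

# Relationship set Works_In: (ssn, did) -> descriptive attributes of the relationship.
works_in: dict[tuple[str, int], dict[str, int]] = {}

def add_works_in(ssn: str, did: int, since: int) -> None:
    """Record that an existing employee works in an existing department."""
    if ssn not in employees or did not in departments:
        raise KeyError("both participants must already exist in their entity sets")
    works_in[(ssn, did)] = {"since": since}

add_works_in("123-22-3666", 51, since=1991)
print(works_in)  # {('123-22-3666', 51): {'since': 1991}}
```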

    ER Model Basics (Cont.)

    Same entity set can participate in different relationship sets, or in different roles in the same set.

    [ER diagram: Employees (ssn, name, lot) participates twice in Reports_To, in the roles supervisor and subordinate; Employees also participates in Works_In (since) with Departments (did, dname, budget).]

    Key Constraints

    An employee can work in many departments; a dept can have many employees.

    In contrast, each dept has at most one manager, according to the key constraint on Manages.

    1-to-1, 1-to-Many, Many-to-Many

    [ER diagram: Employees (ssn, name, lot) linked to Departments (did, dname, budget) through Works_In (since) and through Manages (since), with a key constraint on Manages.]
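
    A key constraint can be checked by counting how often each entity appears in the constrained position of the relationship set. The sketch below stores relationships as (ssn, did) pairs; the function name and the sample data are made up for illustration.

```python
from collections import Counter

def key_constraint_violations(relationship, position: int) -> set:
    """Entities that appear more than once in the given position of the tuples."""
    counts = Counter(pair[position] for pair in relationship)
    return {entity for entity, count in counts.items() if count > 1}

# Pairs are (ssn, did); position 1 is the department.
manages = {("123-22-3666", 51), ("231-31-5368", 56)}
works_in = {("123-22-3666", 51), ("123-22-3666", 56), ("231-31-5368", 51)}

print(key_constraint_violations(manages, 1))   # set(): each dept has at most one manager
print(key_constraint_violations(works_in, 1))  # {51}: fine for Works_In, which is many-to-many
```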

    Participation Constraints

    Does every employee work in a department?

    If so, this is a participation constraint: the participation of Employees in Works_In is said to be total (vs. partial).

    What if every department has an employee working in it?

    Basically means "at least one"

    [ER diagram: Employees (ssn, name, lot) and Departments (did, dname, budget) linked by Works_In (since) and Manages (since). Annotation where a key constraint and total participation combine: "Means: exactly one".]
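
    Total participation can be checked by looking for entities that never appear in the relationship set. This is a sketch with made-up keys; the function name is illustrative.

```python
def non_participants(entity_keys, relationship, position: int) -> set:
    """Entities missing from the given position of every tuple in the relationship set."""
    participating = {pair[position] for pair in relationship}
    return set(entity_keys) - participating

employee_ssns = {"A", "B", "C"}
department_ids = {51, 56}
works_in = {("A", 51), ("B", 51)}  # (ssn, did)

print(non_participants(employee_ssns, works_in, 0))   # {'C'}: Employees is only partial here
print(non_participants(department_ids, works_in, 1))  # {56}: dept 56 has no employee
```

    Participation is total exactly when the returned set is empty.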

    Weak Entities

    A weak entity can be identified uniquely only by considering the primary key of another (owner) entity.

    Owner entity set and weak entity set must participate in a one-to-many relationship set (one owner, many weak entities).

    Weak entity set must have total participation in this identifying relationship set.

    [ER diagram: Employees (ssn, name, lot) connected through identifying relationship set Policy (cost) to weak entity set Dependents (pname, age).]

    Weak entities have only a partial key (dashed underline)
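
    In code, a weak entity's full key is the owner's key combined with its own partial key. The sketch below keys dependents by (owner ssn, pname), so two employees can each have a dependent with the same name; all names and values are made up for illustration.

```python
employees = {"123-22-3666": "Attishoo", "231-31-5368": "Smiley"}  # ssn -> name

# Weak entity set Dependents: full key = (owner's ssn, partial key pname).
dependents: dict[tuple[str, str], dict[str, int]] = {}

def add_dependent(owner_ssn: str, pname: str, age: int) -> None:
    """A dependent must have an owner (total participation) and a unique pname per owner."""
    if owner_ssn not in employees:
        raise KeyError("a weak entity cannot exist without its owner")
    key = (owner_ssn, pname)
    if key in dependents:
        raise ValueError(f"duplicate dependent {pname!r} for {owner_ssn}")
    dependents[key] = {"age": age}

add_dependent("123-22-3666", "Joe", 7)
add_dependent("231-31-5368", "Joe", 12)  # same pname, different owner: allowed
print(len(dependents))  # 2
```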

    Binary vs. Ternary Relationships

    If each policy is owned by just 1 employee:

    Bad design

    [ER diagram: ternary relationship set Covers relating Employees (ssn, name, lot), Policies (policyid, cost), and Dependents (pname, age).]

    Key constraint on Policies would mean policy can only cover 1 dependent!

    Better design

    [ER diagram: Employees (ssn, name, lot) connected to Policies (policyid, cost) through Purchaser; Policies connected to Dependents (pname, age) through Beneficiary.]

    Think through all the constraints in the 2nd diagram!

    Binary vs. Ternary Relationships (Contd.)

    Previous example illustrated case when two binary relationships were better than one ternary relationship.

    Opposite example: a ternary relation Contracts relates entity sets Parts, Departments and Suppliers, and has descriptive attribute qty. No combination of binary relationships is an adequate substitute.

    Binary vs. Ternary Relationships (Contd.)

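    S can-supply P, D needs P, and D deals-with S does not imply that D has agreed to buy P from S.

    How do we record qty?

    [ER diagrams: ternary relationship set Contract (qty) relating Suppliers, Departments, and Parts, VS. three binary relationship sets: can-supply (Suppliers, Parts), needs (Departments, Parts), and deals-with (Departments, Suppliers).]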
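
    A small counterexample shows why the three binary relationships are not an adequate substitute. The sketch below starts from a made-up set of contracts, projects it onto the three binary relationship sets, and then lists every (supplier, part, department) triple those binary facts would allow.

```python
from itertools import product

# Hypothetical contracts: (supplier, part, department).
contracts = {("S1", "P1", "D2"), ("S1", "P2", "D1"), ("S2", "P1", "D1")}

# The three binary relationship sets obtained by projecting the contracts.
can_supply = {(s, p) for s, p, d in contracts}
needs = {(d, p) for s, p, d in contracts}
deals_with = {(d, s) for s, p, d in contracts}

suppliers = {s for s, _, _ in contracts}
parts = {p for _, p, _ in contracts}
depts = {d for _, _, d in contracts}

# Every triple that is consistent with all three binary relationship sets.
implied = {
    (s, p, d)
    for s, p, d in product(suppliers, parts, depts)
    if (s, p) in can_supply and (d, p) in needs and (d, s) in deals_with
}

print(sorted(implied - contracts))  # [('S1', 'P1', 'D1')]
```

    S1 can supply P1, D1 needs P1, and D1 deals with S1, yet no such contract exists. The binary sets also have no single place to store qty, which belongs to the whole triple.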

    Summary so far

    Entities and Entity Set (boxes)

    Relationships and Relationship sets (diamonds)

    binary

    n-ary

    Key constraints (1-1, 1-M, M-M, arrows on 1 side)

    Participation constraints (bold for Total)

    Weak entities - require strong entity for key

    Next, a couple more advanced concepts