21.entities - courses.cs.washington.edu


Page 1: 21.entities - courses.cs.washington.edu

CSE 344, JULY 30TH

DB DESIGN (CH 4)

Page 2: 21.entities - courses.cs.washington.edu

ADMINISTRIVIA

• HW6 due next Thursday
  • uses Spark API rather than MapReduce (a bit higher level)
  • be sure to shut down your AWS cluster when not in use

• Still grading midterms...
  • will hand back on Wednesday

Page 3: 21.entities - courses.cs.washington.edu

REVIEW

Horizontally partitioned data:
• allows for easy scale-up
• OLTP is easy!
  • but OLAP becomes harder...
• new problem for query execution: needed data may not be local
  • fix by reshuffling: put required data onto the same machine

Page 4: 21.entities - courses.cs.washington.edu

EXAMPLE 1

[Figure panels: Query; Single Node Plan; Multi-Node Plan]

Page 5: 21.entities - courses.cs.washington.edu

REVIEW

MapReduce
• lower-level API
• handles networking, I/O, failure, and straggler issues
• main piece provided by the API is shuffle
  • read → map → shuffle → reduce → write
  • fits perfectly with what we need above
• one job contains one shuffle
  • complex queries become multiple jobs

Page 6: 21.entities - courses.cs.washington.edu

PARALLEL QUERY PLANS

• Split into phases at each reshuffle
• Do as much work as possible within each phase
  • selections never require a reshuffle
  • group by & join may require it
    • (consider what data is currently partitioned by)
• In general, an n-phase plan can be implemented in (n-1) MapReduce jobs
  • (since it has n-1 shuffles)

Page 7: 21.entities - courses.cs.washington.edu

EXAMPLE 1

[Figure panels: Query; Single Node Plan; Multi-Node Plan]

MapReduce
• saw this before in three parts: left side, right side, join
• only 1 reshuffle (for the join), so only 1 job
• mapper does first two parts: input is Item i or Order o
  • left side: output (i.item, (‘Item’, i))
  • right side: output (o.item, (‘Order’, o))
• reducer: input is (item, [(table_name, value)])
  • split list into two: items and orders
  • output all combinations (two nested loops)
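The mapper and reducer described on this slide can be simulated on one machine. The following is only a sketch in plain Python, not the MapReduce API itself: the `shuffle` function stands in for what the framework would do, and the sample records and their `price` and `qty` fields are made up for illustration.

```python
from collections import defaultdict
from itertools import product

def mapper(table_name, record):
    # Tag each record with its source table and key it by the join attribute.
    yield record["item"], (table_name, record)

def shuffle(pairs):
    # Stand-in for the framework's shuffle: group all values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(item, tagged_values):
    # Split the list into items and orders, then output all combinations.
    items = [v for t, v in tagged_values if t == "Item"]
    orders = [v for t, v in tagged_values if t == "Order"]
    for i, o in product(items, orders):
        yield item, i, o

# Hypothetical sample data; the field names are assumptions.
item_rows = [{"item": "gizmo", "price": 10}, {"item": "camera", "price": 20}]
order_rows = [{"item": "gizmo", "qty": 2}, {"item": "gizmo", "qty": 5}]

mapped = [kv for r in item_rows for kv in mapper("Item", r)]
mapped += [kv for r in order_rows for kv in mapper("Order", r)]
joined = [row for key, vals in shuffle(mapped).items()
          for row in reducer(key, vals)]
```

Here `joined` holds one row per matching (item, order) pair; the camera has no orders, so it produces nothing.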

Page 8: 21.entities - courses.cs.washington.edu

EXAMPLE 2

Schema:
  Drug(spec VARCHAR(255), compatibility INT)
  Person(name VARCHAR(100) PK, compatibility INT)

Query:
  SELECT P.name, count(D.spec)
  FROM Person AS P, Drug AS D
  WHERE P.compatibility = D.compatibility
  GROUP BY P.name;

Logical Plan:
  γ name, count(spec) (Person ⋈ Drug)

Page 9: 21.entities - courses.cs.washington.edu

EXAMPLE 2

Logical Plan:
  γ name, count(spec) (Person ⋈ Drug)

Possible Reshuffles
• join: get tuples with same “compatibility” on same machine
• group by: get tuples with same “name” on same machine

Page 10: 21.entities - courses.cs.washington.edu

EXAMPLE 2

Logical Plan:
  γ name, count(spec) (Person ⋈ Drug)

Assume
• Drug is block partitioned
• Person is hash partitioned by “compatibility”

Physical Plan
• join: reshuffle Drug by compatibility (not needed for Person)
• group-by: no reshuffle!
  • name is a PK so only one tuple with that name
  • join can produce multiple tuples but all are on same machine with the Person tuple having that name
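The physical plan above can be checked with a small single-process simulation. This is a sketch under assumptions: the cluster size, the modulo partitioning function, and the sample tuples are all invented for illustration. Drug is reshuffled once by compatibility, after which each simulated node joins and groups using only its own data.

```python
from collections import Counter, defaultdict

NUM_NODES = 3  # assumed cluster size

def node_for(compatibility):
    # Assumed partitioning function on the join attribute.
    return compatibility % NUM_NODES

# Hypothetical data: Person(name PK, compatibility), Drug(spec, compatibility)
persons = [("ann", 1), ("bob", 2), ("cat", 1), ("dan", 4)]
drugs = [("d1", 1), ("d2", 1), ("d3", 2), ("d4", 4), ("d5", 7)]

# Person is already hash partitioned by compatibility.
person_parts = defaultdict(list)
for name, c in persons:
    person_parts[node_for(c)].append((name, c))

# The only reshuffle: send each Drug tuple to the node for its compatibility.
drug_parts = defaultdict(list)
for spec, c in drugs:
    drug_parts[node_for(c)].append((spec, c))

# Each node joins and groups locally; there is no second reshuffle.
result = {}
for node in range(NUM_NODES):
    counts = Counter()
    for name, pc in person_parts[node]:
        for spec, dc in drug_parts[node]:
            if pc == dc:
                counts[name] += 1
    for name, n in counts.items():
        # name is a key, so no other node can have produced this group.
        assert name not in result
        result[name] = n
```

The inner `assert` never fires: every joined tuple for a given name sits on the node that holds that person's single tuple, which is why the group-by needs no reshuffle.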

Page 11: 21.entities - courses.cs.washington.edu

DB DESIGN

Page 12: 21.entities - courses.cs.washington.edu

DATABASE DESIGN

What it is:
Starting from scratch, design the database schema: relations, attributes, keys, foreign keys, constraints, etc.

Why it’s hard:
The database will be in operation for a very long time (years). Updating the schema while in production is very expensive:
• computer time
• engineering time

Page 13: 21.entities - courses.cs.washington.edu

DATABASE DESIGN

Consider issues such as:
• What entities to model
• How entities are related
• What constraints exist in the domain

Several formalisms exist
• We discuss E/R diagrams
  • frequently used to communicate schemas in industry
• UML, model-driven architecture

Reading: Sec. 4.1-4.6

Page 14: 21.entities - courses.cs.washington.edu

DATABASE DESIGN PROCESS

Conceptual Model:
[E/R diagram: company, makes, product; attribute labels: name, price, name, address]

Relational Model:
Tables + constraints
And also functional dependencies

Normalization:
Eliminates anomalies

Conceptual Schema

Physical Schema:
Physical storage details

Page 15: 21.entities - courses.cs.washington.edu

ENTITY / RELATIONSHIP DIAGRAMS

Entity set = a class
• An entity = an object

Attribute

Relationship

[E/R diagram: Product, makes, Company; attribute label: city]

Page 16: 21.entities - courses.cs.washington.edu

[E/R diagram: entity sets Person, Company, Product; relationships buys, makes, employs; attribute labels: name, CEO, price, address, name, ssn, address, name]

Page 17: 21.entities - courses.cs.washington.edu

KEYS IN E/R DIAGRAMS

Every entity set must have a key

[E/R diagram: Product with attributes name and price]

Page 18: 21.entities - courses.cs.washington.edu

MULTIPLICITY OF E/R RELATIONS

one-one:
many-one:
many-many:

[Figure: for each case, a mapping drawn between the elements 1, 2, 3 and a, b, c, d]

Page 19: 21.entities - courses.cs.washington.edu

[E/R diagram: entity sets Person, Company, Product; relationships buys, makes, employs; attribute labels: name, CEO, price, address, name, ssn, address, name]

What does this say?

Page 20: 21.entities - courses.cs.washington.edu

ATTRIBUTES ON RELATIONSHIPS

[E/R diagram: Person, Buys, Product; attribute labels: name, price, address, date, name]

What does this say?

Page 21: 21.entities - courses.cs.washington.edu

MULTI-WAY RELATIONSHIPS

How do we model a purchase relationship between buyers, products and stores?

[E/R diagram: relationship Purchase (attribute: date) connecting Product, Person, and Store]

Can still model as a mathematical set (How?)

As a set of triples ⊆ Person × Product × Store
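The "set of triples" view translates directly into code. The entity names below are made up for illustration; the sketch only shows that a three-way relationship is a subset of the Cartesian product of the three entity sets.

```python
from itertools import product

# Hypothetical entity sets, each entity identified by its key value.
persons = {"joe", "sue"}
products = {"gizmo", "camera"}
stores = {"wiz", "ritz"}

# The Purchase relationship: a set of (person, product, store) triples.
purchase = {
    ("joe", "gizmo", "wiz"),
    ("joe", "camera", "ritz"),
    ("sue", "gizmo", "ritz"),
}

# All possible triples: Person x Product x Store.
all_triples = set(product(persons, products, stores))

# A valid relationship instance is a subset of the Cartesian product.
is_valid_relationship = purchase <= all_triples
```

An attribute on the relationship, such as date, could be carried by mapping each triple to a value (for example with a dict keyed by the triple).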

Page 22: 21.entities - courses.cs.washington.edu

ARROWS IN MULTIWAY RELATIONSHIPS

[E/R diagram: relationship Purchase (attribute: date) connecting Product, Person, and Store, with an arrow pointing to Store]

Q: What does the arrow mean?

A: Any person buys a given product from at most one store

[Fine print: Arrow pointing to E means that if we select one entity from each of the other entity sets in the relationship, those entities are related to at most one entity in E]
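The fine print amounts to a functional constraint: (person, product) determines store. A minimal check of that constraint over a set of triples might look like the following; the function name and the sample triples are illustrative, not part of the lecture.

```python
def respects_arrow_to_store(purchase):
    """True if every (person, product) pair is related to at most one store."""
    store_for = {}
    for person, product, store in purchase:
        # setdefault records the first store seen for this pair and
        # returns the recorded store on later encounters.
        if store_for.setdefault((person, product), store) != store:
            return False
    return True

# Hypothetical relationship instances.
ok = {("joe", "gizmo", "wiz"), ("joe", "camera", "ritz"), ("sue", "gizmo", "ritz")}
bad = ok | {("joe", "gizmo", "ritz")}  # joe buys gizmo from two stores
```

Calling `respects_arrow_to_store(ok)` succeeds, while `bad` violates the arrow because joe and gizmo are related to both wiz and ritz.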

Page 23: 21.entities - courses.cs.washington.edu

ARROWS IN MULTIWAY RELATIONSHIPS

[E/R diagram: relationship Purchase (attribute: date) connecting Product, Person, and Store, with arrows pointing to Store and to Product]

Q: What does the arrow mean?

A: Any person buys a given product from at most one store AND every store sells to every person at most one product

Page 24: 21.entities - courses.cs.washington.edu

CONVERTING MULTI-WAY RELATIONSHIPS TO BINARY

[E/R diagram: Purchase (attribute: date) linked to Person, Store, and Product through the binary relationships BuyerOf, StoreOf, and ProductOf]

Arrows go in which direction?

Page 25: 21.entities - courses.cs.washington.edu

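CONVERTING MULTI-WAY RELATIONSHIPS TO BINARY

[E/R diagram: Purchase (attribute: date) linked to Person, Store, and Product through the binary relationships BuyerOf, StoreOf, and ProductOf, now with the arrows drawn in]

Make sure you understand why!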

Page 26: 21.entities - courses.cs.washington.edu

DESIGN PRINCIPLES: WHAT’S WRONG?

[E/R diagrams: Product, Purchase, Person; Country, President, Person]

Moral: Be faithful to the specifications of the application!

Page 27: 21.entities - courses.cs.washington.edu

DESIGN PRINCIPLES: WHAT’S WRONG?

[E/R diagram: Purchase connecting Product and Store; attribute labels: date, personName, personAddr]

Moral: pick the right kind of entities.

Page 28: 21.entities - courses.cs.washington.edu

DESIGN PRINCIPLES: WHAT’S WRONG?

[E/R diagram: Purchase connecting Product, Person, Store, and Dates (attribute: date)]

Moral: don’t complicate life more than it already is.

Page 29: 21.entities - courses.cs.washington.edu

FROM E/R DIAGRAMS TO RELATIONAL SCHEMA

Entity set → relation
Relationship → relation

Page 30: 21.entities - courses.cs.washington.edu

ENTITY SET TO RELATION

[E/R diagram: entity set Product with attributes prod-ID, category, price]

Product(prod-ID, category, price)

prod-ID   category  price
Gizmo55   Camera    99.99
Pokemn19  Toy       29.99

Page 31: 21.entities - courses.cs.washington.edu

N-N RELATIONSHIPS TO RELATIONS

[E/R diagram: Orders (attributes: prod-ID, cust-ID, date) and Shipping-Co (attributes: name, address) connected by the relationship Shipment (attribute: date)]

How to represent Shipment in relations?

Page 32: 21.entities - courses.cs.washington.edu

N-N RELATIONSHIPS TO RELATIONS

[E/R diagram: Orders (attributes: prod-ID, cust-ID, date) and Shipping-Co (attributes: name, address) connected by the relationship Shipment (attribute: date)]

Orders(prod-ID, cust-ID, date)
Shipment(prod-ID, cust-ID, name, date)
Shipping-Co(name, address)

Shipment:
prod-ID  cust-ID  name   date
Gizmo55  Joe12    UPS    4/10/2011
Gizmo55  Joe12    FEDEX  4/9/2011
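The three relations from this slide can be tried out in SQLite through Python's sqlite3 module. This is a sketch with several assumptions: hyphens in names are replaced by underscores, the choice of (prod_id, cust_id) as the key of Orders is a guess, and the order date and company addresses are placeholder values. The two Shipment rows are the ones shown on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Orders (
    prod_id TEXT, cust_id TEXT, date TEXT,
    PRIMARY KEY (prod_id, cust_id)          -- assumed key
);
CREATE TABLE ShippingCo (
    name TEXT PRIMARY KEY, address TEXT
);
-- A many-many relationship becomes its own relation whose key
-- combines the keys of both participating entity sets.
CREATE TABLE Shipment (
    prod_id TEXT, cust_id TEXT, name TEXT, date TEXT,
    PRIMARY KEY (prod_id, cust_id, name),
    FOREIGN KEY (prod_id, cust_id) REFERENCES Orders(prod_id, cust_id),
    FOREIGN KEY (name) REFERENCES ShippingCo(name)
);
""")

# Placeholder order date and addresses (not from the slide).
conn.execute("INSERT INTO Orders VALUES ('Gizmo55', 'Joe12', '4/8/2011')")
conn.executemany("INSERT INTO ShippingCo VALUES (?, ?)",
                 [("UPS", "addr1"), ("FEDEX", "addr2")])
conn.executemany("INSERT INTO Shipment VALUES (?, ?, ?, ?)", [
    ("Gizmo55", "Joe12", "UPS", "4/10/2011"),
    ("Gizmo55", "Joe12", "FEDEX", "4/9/2011"),
])

shippers = [r[0] for r in conn.execute(
    "SELECT name FROM Shipment "
    "WHERE prod_id = 'Gizmo55' AND cust_id = 'Joe12' ORDER BY name")]
```

Because the relationship is many-many, the same order can appear in Shipment with two different shipping companies.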

Page 33: 21.entities - courses.cs.washington.edu

N-1 RELATIONSHIPS TO RELATIONS

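[E/R diagram: Orders (attributes: prod-ID, cust-ID, date) and Shipping-Co (attributes: name, address) connected by the relationship Shipment (attribute: date), now many-one from Orders to Shipping-Co]

How to represent Shipment in relations?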

Page 34: 21.entities - courses.cs.washington.edu

N-1 RELATIONSHIPS TO RELATIONS

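[E/R diagram: Orders (attributes: prod-ID, cust-ID, date) and Shipping-Co (attributes: name, address) connected by the relationship Shipment (attribute: date), many-one from Orders to Shipping-Co]

Orders(prod-ID, cust-ID, date1, name, date2)
Shipping-Co(name, address)

Remember: no separate relation for a many-one relationship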
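For the many-one case, the relationship's attributes are folded into Orders. The sketch below uses sqlite3 with assumed column names (`ship_name` for the company name, underscores instead of hyphens), an assumed key for Orders, and placeholder data. It shows that a single order row can no longer be associated with two shipping companies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE ShippingCo (name TEXT PRIMARY KEY, address TEXT);
-- The many-one Shipment relationship is folded into Orders:
-- ship_name is the shipping company, date2 the shipment date.
CREATE TABLE Orders (
    prod_id TEXT, cust_id TEXT, date1 TEXT,
    ship_name TEXT REFERENCES ShippingCo(name),
    date2 TEXT,
    PRIMARY KEY (prod_id, cust_id)          -- assumed key
);
INSERT INTO ShippingCo VALUES ('UPS', 'addr1'), ('FEDEX', 'addr2');
INSERT INTO Orders VALUES ('Gizmo55', 'Joe12', '4/8/2011', 'UPS', '4/10/2011');
""")

try:
    # A second shipper for the same order would repeat the key of Orders.
    conn.execute("INSERT INTO Orders VALUES "
                 "('Gizmo55', 'Joe12', '4/8/2011', 'FEDEX', '4/9/2011')")
    second_shipper_allowed = True
except sqlite3.IntegrityError:
    second_shipper_allowed = False
```

The second insert is rejected, which is exactly the "at most one shipping company per order" meaning of the many-one relationship.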

Page 35: 21.entities - courses.cs.washington.edu

MULTI-WAY RELATIONSHIPS TO RELATIONS

[E/R diagram: relationship Purchase connecting Product (attributes: prod-ID, price), Person (attributes: ssn, name), and Store (attributes: name, address)]

Purchase(prod-ID, ssn, name)

Page 36: 21.entities - courses.cs.washington.edu

MODELING SUBCLASSES

Some objects in a class may be special
• define a new class
• better: define a subclass

[Figure: Products, containing Software products and Educational products]

So we define subclasses in E/R

Page 37: 21.entities - courses.cs.washington.edu

MODELING SUBCLASSES

[E/R diagram: Product (attributes: name, category, price) with two isa subclasses: Software Product (attribute: platforms) and Educational Product (attribute: Age Group)]

Page 38: 21.entities - courses.cs.washington.edu

MODELING SUBCLASSES

[E/R diagram: Product (attributes: name, category, price) with two isa subclasses: Software Product (attribute: platforms) and Educational Product (attribute: Age Group)]

Product
Name    Price  Category
Gizmo   99     gadget
Camera  49     photo
Toy     39     gadget

Sw.Product
Name   platforms
Gizmo  unix

Ed.Product
Name   Age Group
Gizmo  toddler
Toy    retired

Other ways to convert are possible...

Is this representation subclassing in the Java sense?
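The three tables on this slide can be loaded into SQLite to see how a full view of each product is reassembled. This is a sketch: the table and column names are adapted from the slide (`SwProduct`, `EdProduct`, `age_group`), and the key and foreign-key declarations are assumptions about how the conversion would typically be declared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (name TEXT PRIMARY KEY, price INT, category TEXT);
-- Each subclass table reuses the parent's key and adds its own attributes.
CREATE TABLE SwProduct (name TEXT PRIMARY KEY REFERENCES Product(name),
                        platforms TEXT);
CREATE TABLE EdProduct (name TEXT PRIMARY KEY REFERENCES Product(name),
                        age_group TEXT);
INSERT INTO Product VALUES ('Gizmo', 99, 'gadget'),
                           ('Camera', 49, 'photo'),
                           ('Toy', 39, 'gadget');
INSERT INTO SwProduct VALUES ('Gizmo', 'unix');
INSERT INTO EdProduct VALUES ('Gizmo', 'toddler'), ('Toy', 'retired');
""")

# Outer joins keep products that belong to no subclass (Camera).
rows = conn.execute("""
    SELECT P.name, P.price, S.platforms, E.age_group
    FROM Product P
    LEFT JOIN SwProduct S ON S.name = P.name
    LEFT JOIN EdProduct E ON E.name = P.name
    ORDER BY P.name
""").fetchall()
```

In the slide's data, Gizmo appears in both subclass tables and Camera in neither, which is worth keeping in mind when answering the question about Java-style subclassing.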

Page 39: 21.entities - courses.cs.washington.edu

MODELING UNION TYPES WITH SUBCLASSES

[E/R diagram: entity sets FurniturePiece, Person, Company]

Say: each piece of furniture is owned either by a person or by a company

Page 40: 21.entities - courses.cs.washington.edu

MODELING UNION TYPES WITH SUBCLASSES

Say: each piece of furniture is owned either by a person or by a company

Solution 1. Acceptable but imperfect (What’s wrong?)

[E/R diagram: FurniturePiece connected to Person by ownedByPerson and to Company by ownedByComp.]

Page 41: 21.entities - courses.cs.washington.edu

MODELING UNION TYPES WITH SUBCLASSES

Solution 2: better, more laborious

[E/R diagram: FurniturePiece connected to Owner by ownedBy; Person isa Owner and Company isa Owner]

Page 42: 21.entities - courses.cs.washington.edu

WEAK ENTITY SETS

Entity sets are weak when their key comes from other classes to which they are related.

[E/R diagram: Team (attributes: sport, number) connected by affiliation to University (attribute: name)]

Team(sport, number, universityName)
University(name)
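A weak entity set shows up in SQL as a table whose key includes a foreign key. The sketch below uses sqlite3; which attributes of Team form the key is not recoverable from this transcript, so the three-column primary key is an assumption, and the university and sport values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE University (name TEXT PRIMARY KEY);
-- Team borrows part of its key from University.
CREATE TABLE Team (
    sport TEXT,
    number INT,
    universityName TEXT REFERENCES University(name),
    PRIMARY KEY (sport, number, universityName)   -- assumed key
);
INSERT INTO University VALUES ('UW'), ('WSU');
-- Same sport and number at two universities: distinct teams.
INSERT INTO Team VALUES ('soccer', 1, 'UW'), ('soccer', 1, 'WSU');
""")

team_count = conn.execute("SELECT count(*) FROM Team").fetchone()[0]

try:
    # A team cannot exist without its university.
    conn.execute("INSERT INTO Team VALUES ('soccer', 1, 'MIT')")
    orphan_allowed = True
except sqlite3.IntegrityError:
    orphan_allowed = False
```

The university name is what tells the two soccer teams apart, and a team that refers to an unknown university is rejected.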

Page 43: 21.entities - courses.cs.washington.edu

WHAT ARE THE KEYS OF R ?

[E/R diagram; labels: R, A, B, S, T, V, Q, U, W, V, Z, C, D, E, G, K, H, F, L]

Page 44: 21.entities - courses.cs.washington.edu

INTEGRITY CONSTRAINTS MOTIVATION

ICs help prevent entry of incorrect information

How? DBMS enforces integrity constraints
• Allows only legal database instances (i.e., those that satisfy all constraints) to exist
• Ensures that all necessary checks are always performed and avoids duplicating the verification logic in each application

An integrity constraint is a condition specified on a database schema that restricts the data that can be stored in an instance of the database.

Page 45: 21.entities - courses.cs.washington.edu

CONSTRAINTS IN E/R DIAGRAMS

Finding constraints is part of the modeling process. Commonly used constraints:

Keys: social security number uniquely identifies a person.

Single-value constraints: a person can have only one biological father.

Referential integrity constraints: if you work for a company, it must exist in the database.

Other constraints: people’s ages are between 0 and 120

Page 46: 21.entities - courses.cs.washington.edu

KEYS IN E/R DIAGRAMS

Underline the key attributes:

[E/R diagrams: Person (attributes: address, name, ssn); Product (attributes: name, category, price)]

No formal way to specify multiple keys in E/R diagrams

Page 47: 21.entities - courses.cs.washington.edu

SINGLE VALUE CONSTRAINTS

[Figure: two drawings of the makes relationship, compared side by side (one vs. the other)]

Page 48: 21.entities - courses.cs.washington.edu

REFERENTIAL INTEGRITY CONSTRAINTS

[E/R diagram: Product, makes, Company (first version)]
Each product made by at most one company.
Some products made by no company

[E/R diagram: Product, makes, Company (second version)]
Each product made by exactly one company.

Page 49: 21.entities - courses.cs.washington.edu

OTHER CONSTRAINTS

[E/R diagram: Company, makes, Product, with the annotation "< 100" on the relationship]

Q: What does this mean?
A: A Company entity cannot be connected by the relationship to more than 99 Product entities

Page 50: 21.entities - courses.cs.washington.edu

CONSTRAINTS IN SQL

Constraints in SQL, from simplest to most complex:
• Keys, foreign keys
• Attribute-level constraints
• Tuple-level constraints
• Global constraints: assertions

The more complex the constraint, the harder it is to check and to enforce

Page 51: 21.entities - courses.cs.washington.edu

KEY CONSTRAINTS

Product(name, category)

CREATE TABLE Product (
  name CHAR(30) PRIMARY KEY,
  category VARCHAR(20))

OR:

CREATE TABLE Product (
  name CHAR(30),
  category VARCHAR(20),
  PRIMARY KEY (name))

Page 52: 21.entities - courses.cs.washington.edu

KEYS WITH MULTIPLE ATTRIBUTES

Product(name, category, price)

CREATE TABLE Product (
  name CHAR(30),
  category VARCHAR(20),
  price INT,
  PRIMARY KEY (name, category))

Name    Category  Price
Gizmo   Gadget    10
Camera  Photo     20
Gizmo   Photo     30
Gizmo   Gadget    40
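The four rows listed on this slide include two with the same (name, category) pair. Running the slide's CREATE TABLE in SQLite through Python's sqlite3 module shows how a DBMS treats them; SQLite is used here only as a convenient stand-in, and other systems report the violation with their own error types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Product (
    name CHAR(30),
    category VARCHAR(20),
    price INT,
    PRIMARY KEY (name, category)
)""")

# The rows from the slide, in order.
rows = [("Gizmo", "Gadget", 10), ("Camera", "Photo", 20),
        ("Gizmo", "Photo", 30), ("Gizmo", "Gadget", 40)]

accepted, rejected = [], []
for row in rows:
    try:
        conn.execute("INSERT INTO Product VALUES (?, ?, ?)", row)
        accepted.append(row)
    except sqlite3.IntegrityError:
        # The (name, category) pair already exists.
        rejected.append(row)
```

Repeating a name alone (Gizmo with category Photo) is fine; only the repeated pair (Gizmo, Gadget) is rejected.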

Page 53: 21.entities - courses.cs.washington.edu

OTHER KEYS

CREATE TABLE Product (
  productID CHAR(10),
  name CHAR(30),
  category VARCHAR(20),
  price INT,
  PRIMARY KEY (productID),
  UNIQUE (name, category))

There is at most one PRIMARY KEY; there can be many UNIQUE

Page 54: 21.entities - courses.cs.washington.edu

FOREIGN KEY CONSTRAINTS

CREATE TABLE Purchase (
  prodName CHAR(30) REFERENCES Product(name),
  date DATETIME)

Referential integrity constraints:
• prodName is a foreign key to Product(name)
• name must be a key in Product
• May write just Product if name is PK

Page 55: 21.entities - courses.cs.washington.edu

FOREIGN KEY CONSTRAINTS

Example with multi-attribute primary key

CREATE TABLE Purchase (
  prodName CHAR(30),
  category VARCHAR(20),
  date DATETIME,
  FOREIGN KEY (prodName, category)
    REFERENCES Product(name, category))

(name, category) must be a KEY in Product

Page 56: 21.entities - courses.cs.washington.edu

WHAT HAPPENS WHEN DATA CHANGES?

Product
Name      Category
Gizmo     gadget
Camera    Photo
OneClick  Photo

Purchase
ProdName  Store
Gizmo     Wiz
Camera    Ritz
Camera    Wiz

Types of updates:
• In Purchase: insert/update
• In Product: delete/update

Page 57: 21.entities - courses.cs.washington.edu

WHAT HAPPENS WHEN DATA CHANGES?

SQL has four policies for maintaining referential integrity:
• NO ACTION: reject violating modifications (default)
• CASCADE: after a delete/update, do the same delete/update on the referencing tuples
• SET NULL: set foreign-key field to NULL
• SET DEFAULT: set foreign-key field to default value
  • the default needs to be declared with the column, e.g., CREATE TABLE Product (pid INT DEFAULT 42)

Page 58: 21.entities - courses.cs.washington.edu

MAINTAINING REFERENTIAL INTEGRITY

CREATE TABLE Purchase (
  prodName CHAR(30),
  category VARCHAR(20),
  date DATETIME,
  FOREIGN KEY (prodName, category)
    REFERENCES Product(name, category)
    ON UPDATE CASCADE
    ON DELETE SET NULL)

Product
Name      Category
Gizmo     gadget
Camera    Photo
OneClick  Photo

Purchase
ProdName   Category
Gizmo      Gizmo
Snap       Camera
EasyShoot  Camera
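The effect of ON UPDATE CASCADE and ON DELETE SET NULL can be observed in SQLite via Python's sqlite3 module. This is a sketch: the purchase dates and the new name 'Gadget2' are made-up values, and SQLite requires foreign-key enforcement to be switched on per connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE Product (
    name CHAR(30), category VARCHAR(20),
    PRIMARY KEY (name, category)
);
CREATE TABLE Purchase (
    prodName CHAR(30), category VARCHAR(20), date DATETIME,
    FOREIGN KEY (prodName, category) REFERENCES Product(name, category)
        ON UPDATE CASCADE
        ON DELETE SET NULL
);
INSERT INTO Product VALUES ('Gizmo', 'gadget'), ('Camera', 'Photo');
-- Hypothetical purchase dates.
INSERT INTO Purchase VALUES ('Gizmo', 'gadget', '2011-04-09'),
                            ('Camera', 'Photo', '2011-04-10');
""")

# Updating the referenced key cascades to Purchase.
conn.execute("UPDATE Product SET name = 'Gadget2' WHERE name = 'Gizmo'")
# Deleting the referenced row sets the foreign-key fields to NULL.
conn.execute("DELETE FROM Product WHERE name = 'Camera'")

purchases = conn.execute(
    "SELECT prodName, category FROM Purchase ORDER BY date").fetchall()
```

After both statements, the first purchase follows the renamed product and the second keeps its row but loses its reference.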

Page 59: 21.entities - courses.cs.washington.edu

CONSTRAINTS ON ATTRIBUTES AND TUPLES

Constraints on attributes:
• NOT NULL -- obvious meaning...
• CHECK condition -- any condition!

Constraints on tuples:
• CHECK condition

Page 60: 21.entities - courses.cs.washington.edu

CONSTRAINTS ON ATTRIBUTES AND TUPLES

CREATE TABLE R (
  A int NOT NULL,
  B int CHECK (B > 50 and B < 100),
  C varchar(20),
  D int,
  CHECK (C >= 'd' or D > 0))
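To see which tuples the constraints on R admit, the definition can be run in SQLite through Python's sqlite3 module. The `try_insert` helper and the test tuples are illustrative; string comparison here follows SQLite's default binary ordering, which other systems' collations may not share.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE R (
    A int NOT NULL,
    B int CHECK (B > 50 and B < 100),
    C varchar(20),
    D int,
    CHECK (C >= 'd' or D > 0)
)""")

def try_insert(row):
    """Return True if the row satisfies every constraint on R."""
    try:
        conn.execute("INSERT INTO R VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

results = {
    "valid": try_insert((1, 75, "dog", 0)),              # 'dog' >= 'd'
    "null_A": try_insert((None, 75, "dog", 0)),           # violates NOT NULL
    "B_out_of_range": try_insert((1, 100, "dog", 0)),     # violates B < 100
    "tuple_check_fails": try_insert((1, 75, "cat", 0)),   # 'cat' < 'd', D = 0
    "tuple_check_via_D": try_insert((1, 75, "cat", 5)),   # D > 0 rescues it
}
```

The last two cases show why the final CHECK is a tuple-level constraint: whether a value of C is acceptable depends on D in the same row.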

Page 61: 21.entities - courses.cs.washington.edu

CONSTRAINTS ON ATTRIBUTES AND TUPLES

CREATE TABLE Product (
  productID CHAR(10),
  name CHAR(30),
  category VARCHAR(20),
  price INT CHECK (price > 0),
  PRIMARY KEY (productID),
  UNIQUE (name, category))

Page 62: 21.entities - courses.cs.washington.edu

CONSTRAINTS ON ATTRIBUTES AND TUPLES

What does this constraint do?

CREATE TABLE Purchase (
  prodName CHAR(30)
    CHECK (prodName IN (SELECT Product.name FROM Product)),
  date DATETIME NOT NULL)

What is the difference from Foreign-Key?

Page 63: 21.entities - courses.cs.washington.edu

GENERAL ASSERTIONS

CREATE ASSERTION myAssert CHECK (
  NOT EXISTS (
    SELECT Product.name
    FROM Product, Purchase
    WHERE Product.name = Purchase.prodName
    GROUP BY Product.name
    HAVING count(*) > 200))

But most DBMSs do not implement assertions, because it is hard to support them efficiently. Instead, they provide triggers.
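As a rough illustration of the trigger alternative, the following sketch approximates the assertion with a SQLite trigger, driven from Python's sqlite3 module. The trigger name, error message, and simplified table definitions are assumptions, and the trigger only guards inserts into Purchase; a fuller version would also need to handle updates.

```python
import sqlite3

LIMIT = 200  # threshold taken from the assertion above

conn = sqlite3.connect(":memory:")
conn.executescript(f"""
CREATE TABLE Product (name CHAR(30) PRIMARY KEY);
CREATE TABLE Purchase (prodName CHAR(30) REFERENCES Product(name));
-- Reject an insert that would push a product past LIMIT purchases.
CREATE TRIGGER purchase_limit BEFORE INSERT ON Purchase
WHEN (SELECT count(*) FROM Purchase
      WHERE prodName = NEW.prodName) >= {LIMIT}
BEGIN
    SELECT RAISE(ABORT, 'too many purchases for this product');
END;
INSERT INTO Product VALUES ('Gizmo');
""")

# Exactly LIMIT purchases are allowed...
conn.executemany("INSERT INTO Purchase VALUES (?)", [("Gizmo",)] * LIMIT)

# ...but one more is rejected by the trigger.
try:
    conn.execute("INSERT INTO Purchase VALUES ('Gizmo')")
    extra_allowed = True
except sqlite3.DatabaseError:
    extra_allowed = False

count = conn.execute("SELECT count(*) FROM Purchase").fetchone()[0]
```

Unlike the declarative assertion, the trigger states when the check runs, which is part of why triggers are easier for a DBMS to support efficiently.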