
September 23, 2015 Sam Siewert

CS317 File and Database Systems

Lecture 5, Part-2 – ORDBMS http://www.ibmbigdatahub.com/video/ibm-big-data-minute-drowning-petabytes

SQL Theory and Standards

DBMS Design (Connolly-Begg Chapter 10)

Part-2 Development Lifecycle


For Discussion… Big Data – Velocity, volume, variety, veracity [2014]

1. Daily – 2.5 quintillion bytes (2,500,000,000,000,000,000), or about 2 Exabytes, or 46,566,128 50GB Blu-Ray Discs, IBM Estimate

2. Annually – 7.5 billion in global population produce/consume about 2.25 unique Blu-Rays per person per Year, or 23 DVDs (assuming even distribution – unlikely)

3. Annually – If produced/consumed by US population alone – 53 Blu-Rays per Year or 564 DVDs per person

4. Data in Total is 40 trillion gigabytes or 800 billion Blu-Rays, for just over 100 (unique) Blu-Rays per person globally

5. Data by Powers of 10 and 2 – 2^64 bytes is 16 Exabytes of Addressable Data [PC limit]

6. Max Data Velocity is 100 Gbps, the Fastest Ethernet [8b/10b – 10 billion bytes per second]

7. How much is Truly Unique Data vs. Duplicated

8. What is the Quality (Veracity) of this Data?
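The estimates in this list can be checked with a few lines of arithmetic. The sketch below is only a sanity check, under assumptions that are not on the slide: a "50GB" disc is counted as 50 × 2^30 bytes for the daily figure, decimal gigabytes are used for the total, and the US population is taken as 320 million.

```python
# Sanity check of the slide's Big Data arithmetic.
# Assumptions (not from the slide): binary disc size for the daily figure,
# decimal gigabytes for the total, and a US population of 320 million.
DAILY_BYTES = 2_500_000_000_000_000_000   # 2.5 quintillion bytes per day
DISC_BYTES_BINARY = 50 * 2**30            # "50GB" Blu-ray read as 50 GiB
WORLD_POPULATION = 7_500_000_000
US_POPULATION = 320_000_000               # assumed value

discs_per_day = DAILY_BYTES // DISC_BYTES_BINARY
discs_per_year = discs_per_day * 365
per_person_world = discs_per_year / WORLD_POPULATION
per_person_us = discs_per_year / US_POPULATION

TOTAL_GIGABYTES = 40 * 10**12             # 40 trillion gigabytes in total
total_discs = TOTAL_GIGABYTES // 50       # 50 GB (decimal) per disc
total_per_person = total_discs / WORLD_POPULATION

addressable_exbibytes = 2**64 // 2**60    # 64-bit address space in EiB

print(discs_per_day)
print(round(per_person_world, 2), round(per_person_us, 1))
print(total_discs, round(total_per_person, 1))
print(addressable_exbibytes)
```

The per-person results land close to the slide's rounded figures; the small differences come from rounding and from the binary versus decimal reading of "GB".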


Big Data Volume and Velocity Can Be Estimated as Shown
– Disk drives shipped and in use
– Online data only, or removable and archive media as well?
– Bit-rot (media eventually fails, limited storage lifetime)

Variety, Depends on Level of Data Duplication
– Enterprise Storage System Deduplication
– E.g. EMC Deduplication
– Internet Archive [petabytes] and Wayback machine, http://www.loc.gov/about/general-information/ [traditional volumes], Stanford Digital Repository, National Archives, National A/V Conservation

Veracity, perhaps Most Challenging Part
– Is the Data Correct – Not Corrupted
– Is it Valid – From a Known, Trusted Source, Corresponding to Metadata Description
– Has the Data Been Processed and if so, How?
– Is it Raw Data (from a sensor, user, other)?
– Veracity is difficult – E.g. http://berkeleyearth.org/about-data-set


Quiz #2

Let’s Go Over it …


Quiz #2 Average was 68.3, Std. Deviation was 17.5 – Primarily Need to Study Book More

Quiz #1 – 81.5, 8.5 (Ideal) – Mostly from In-Class Notes

Let’s Go Over Solutions Now with Book Citations

Solutions Provide References Back to the Book – Posted on Canvas as Well


Quiz #2 - Review


Equi-join is a specific type of Theta-Join where the Predicate tests for EQUIVALENCE ONLY
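A minimal sketch of that relationship, using plain Python lists of dictionaries as stand-in relations. The function names and the sample Staff/Branch rows are hypothetical, chosen only for illustration.

```python
from itertools import product

def theta_join(r, s, predicate):
    """Keep every pair from the Cartesian product r x s that satisfies predicate."""
    return [{**a, **b} for a, b in product(r, s) if predicate(a, b)]

def equi_join(r, s, attr_r, attr_s):
    """An equi-join is a theta-join whose predicate tests equality only."""
    return theta_join(r, s, lambda a, b: a[attr_r] == b[attr_s])

# Hypothetical sample relations (attribute names differ to avoid clashes).
staff = [
    {"staffNo": "S1", "branchNo": "B1"},
    {"staffNo": "S2", "branchNo": "B2"},
]
branch = [
    {"bNo": "B1", "city": "London"},
    {"bNo": "B3", "city": "Paris"},
]

equal_rows = equi_join(staff, branch, "branchNo", "bNo")
less_than_rows = theta_join(staff, branch, lambda a, b: a["branchNo"] < b["bNo"])

print(equal_rows)
print(len(less_than_rows))
```

The second call uses "<" as the predicate, which is still a theta-join but no longer an equi-join.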

Review BOOK citations for Correct Answer Carefully before Next Quiz and Exam

Quiz #2 - Review


See p. 119, 132, 1) Selection [Restriction], 2) Projection [Projection], 3) Union [Join – Specific Union], 4) Set Difference [Codd Omits], 5) Cartesian Product [Permutation]

Encouraged! See Class Notes and Example of TC,RA, and Use of DISTINCT


Required [Except Intersection]

Pearson Education © 2014 9

intersection can be composed as R – (R – S)
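This identity is easy to confirm with Python sets standing in for union-compatible relations. The tuples below are made-up sample data.

```python
# Two union-compatible relations modeled as sets of tuples (sample data).
R = {(1, "a"), (2, "b"), (3, "c")}
S = {(2, "b"), (3, "c"), (4, "d")}

# Intersection composed from set difference only: R - (R - S).
composed = R - (R - S)

# Python's built-in intersection, for comparison.
builtin = R & S

print(sorted(composed))
print(composed == builtin)
```

R – S removes everything R shares with S, so subtracting that result from R leaves exactly the shared tuples.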

Nice to Have! - Relational Algebra Operations – Composed from Required

Pearson Education © 2014 10

Quiz #2 - Review



PK, FK EQUIVALENCE

Book Says that EQUIVALENCE for Equi-Join is a Predicate that Uses “=” – p. 126 (bottom)

This is Simplistic, especially for Multi-table Joins and PKs formed from more than One Attribute

E.g. if (X == Y) Can in Fact Involve a Complex Comparison
– E.g. if X is a vector = [1, 1, 3] and Y is a vector, then EQUIVALENCE requires Comparison of Each Component
– if ((X[0] == Y[0]) && (X[1] == Y[1]) && (X[2] == Y[2]))

Likewise, Consider Simple Tuples of FirstName, LastName, DoB [PK=FirstName, LastName] and Another Relation [FK=FirstName, LastName] with Street Address, City, Zipcode
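A sketch of a join on a two-attribute key, where "equality" means comparing every key component. The rows and the helper name keys_equal are hypothetical and serve only to illustrate the point.

```python
# Hypothetical sample rows; names and addresses are illustrative only.
# people:    (FirstName, LastName, DoB)           PK = (FirstName, LastName)
# addresses: (FirstName, LastName, Street, City, Zipcode)  FK = (FirstName, LastName)
people = [
    ("Ada", "Lovelace", "1815-12-10"),
    ("Alan", "Turing", "1912-06-23"),
]
addresses = [
    ("Ada", "Lovelace", "1 Example St", "London", "00001"),
    ("Grace", "Hopper", "2 Sample Ave", "Arlington", "00002"),
]

def keys_equal(x, y):
    """Component-by-component test: X[0]==Y[0] and X[1]==Y[1] and so on."""
    return len(x) == len(y) and all(a == b for a, b in zip(x, y))

# Equi-join on the composite key: keep person columns, append address columns.
joined = [
    p + a[2:]
    for p in people
    for a in addresses
    if keys_equal(p[:2], a[:2])
]

print(joined)
print(keys_equal((1, 1, 3), (1, 1, 3)), keys_equal((1, 1, 3), (1, 1, 4)))
```

In SQL the same comparison is written as one equality per key column joined with AND in the join predicate.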

Join Cheat Sheet http://www.codeproject.com/KB/database/Visual_SQL_Joins/Visual_SQL_JOINS_orig.jpg


JOINS You Must Know

MySQL Join Support – Inner, Cross, Left, Right, Outer, Natural, Multi-table with Predicates (Theta and Equi-Join)
– Cross-Join [p. 171, Matches Theory p. 126]
– Theta-Join [p. 170 – 3 Table Join]
– Equi-Join [p. 168-169]
– Natural-Join (Rarely Used, but Matches Theory on p. 127)
– Inner-Join (Not in Book! But, Common in MySQL)
– Alternative Form – Nested Queries [p. 164]
– Other Joins You are Not Responsible For (Less Useful)
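To compare several of these join forms side by side, here is a small sketch that runs them against an in-memory database. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made so the example is self-contained); the Staff and Branch tables and their rows are invented sample data, and the SQL shown is limited to forms both systems accept.

```python
import sqlite3

# SQLite in memory stands in for MySQL here; tables and rows are sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Branch (branchNo TEXT PRIMARY KEY, city TEXT);
CREATE TABLE Staff  (staffNo TEXT PRIMARY KEY, lName TEXT, branchNo TEXT);
INSERT INTO Branch VALUES ('B1', 'London'), ('B2', 'Glasgow');
INSERT INTO Staff  VALUES ('S1', 'White', 'B1'), ('S2', 'Beech', 'B1'), ('S3', 'Lee', NULL);
""")

queries = {
    # Cartesian product: every Staff row paired with every Branch row.
    "cross": "SELECT COUNT(*) FROM Staff CROSS JOIN Branch",
    # Equi-join written as an inner join with an equality predicate.
    "inner": "SELECT COUNT(*) FROM Staff s INNER JOIN Branch b ON s.branchNo = b.branchNo",
    # Natural join: equality on the commonly named column (branchNo).
    "natural": "SELECT COUNT(*) FROM Staff NATURAL JOIN Branch",
    # Left outer join: unmatched Staff rows are kept with NULL Branch columns.
    "left": "SELECT COUNT(*) FROM Staff s LEFT JOIN Branch b ON s.branchNo = b.branchNo",
    # Theta-join: any comparison operator, here '<'.
    "theta": "SELECT COUNT(*) FROM Staff s JOIN Branch b ON s.branchNo < b.branchNo",
    # Alternative form: nested query instead of a join.
    "nested": "SELECT COUNT(*) FROM Staff WHERE branchNo IN (SELECT branchNo FROM Branch)",
}

counts = {name: conn.execute(sql).fetchone()[0] for name, sql in queries.items()}
for name, count in counts.items():
    print(name, count)
```

The staff member with a NULL branchNo appears only in the cross and left join results, which is the practical difference between inner and outer joins.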


Connolly-Begg Chapter 9

ORDBMS Extensions to SQL (SQL:2011)

Part -2


Unstructured Data

BLOBs – Binary Large Objects
– Images
– Digital Video and Audio – Digital Media
– Binary Data (Documents and Code), Perhaps Proprietary
– http://mercury.pr.erau.edu/~siewerts/extra/images/example-images/Moose-to-Skeleton.png
– http://mercury.pr.erau.edu/~siewerts/extra/images/example-images/Sled-Dogs.jpg
– http://mercury.pr.erau.edu/~siewerts/extra/images/example-images/korean-air-profile.jpg

CLOBs – Character Large Objects
– Log files and Traces (IT)
– Transaction Logs
– XML, HTML, XDS, etc. [Web documents typically via HTTP, HTTPS]
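A short sketch of storing both kinds of large object in a table. It uses Python's sqlite3 module purely as a convenient stand-in (an assumption; the course uses MySQL), and the Media table, its columns, and the byte and text payloads are hypothetical placeholders for a real image file and log.

```python
import sqlite3

# In-memory SQLite database as a stand-in; table and column names are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Media (name TEXT PRIMARY KEY, image BLOB, trace CLOB)")

image_bytes = bytes(range(256))              # stand-in for the bytes of an image file
trace_text = "10:00 start\n10:01 commit\n"   # stand-in for a log or trace

# Binary data is bound as bytes, character data as str.
conn.execute("INSERT INTO Media VALUES (?, ?, ?)", ("sample", image_bytes, trace_text))

image, trace, image_type, trace_type = conn.execute(
    "SELECT image, trace, typeof(image), typeof(trace) FROM Media WHERE name = ?",
    ("sample",),
).fetchone()

print(image_type, len(image))
print(trace_type, trace.splitlines())
```

The database treats the BLOB as opaque bytes, while the character object can still be searched or split as text.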


OO Concepts – “Real World”

OOA – Object Oriented Analysis
– Define Class Hierarchies (Abstract Classes with Attributes) and Interfaces (Public, Private) and Methods (Operations)
– Inheritance and Multiple Inheritance

OOD – OO Design
– Encapsulation of Methods with Data (Attributes) for Abstract and Derived Classes
– Instantiation and Use of Objects [Use Cases]

OOP – Object Oriented Programming (Java, C++, …)
– Programming Language – Direct Implementation of OOD
– Implementation of Re-useable OO Code Libraries
– Boost - http://www.boost.org/, OpenCV [C++ version], Many More … in other OOPLs


Classes Useful in Real World

E.g. Biology – Kingdom, Phylum, Class, Order, Family, Genus, Species [Multiple Inheritance Examples], Proven Use

Parts – Components compose Sub-system(s) compose System(s) compose System of Systems

Supports Re-Use of Objects Instantiated from Class Hierarchy

Multiple Inheritance – Odd?

Can be Abstract, Derived and Concrete
– E.g. Mathematical, Data Structures, Image Processing
– Organization of Information (Classes in Web Ontology Language, OWL)
– Simulation of Physical Systems – Most Often Software Libraries


http://en.wikipedia.org/wiki/Platypus#mediaviewer/File:Wild_Platypus_4.jpg

https://www.youtube.com/watch?v=kDay5OWDPn4#t=26

Quick Review of OO [not just C++]

Encapsulation of Data and Methods in an Instantiated Object

Objects are Instances from a Class Hierarchy
– Classes Define Encapsulated Data and Methods
– Virtual Functions can Be Refined; Pure Virtual Functions Defined in Abstract Classes must be Refined
– Can Inherit Data and Methods from Parent Classes
– Can In Fact Have Multiple Inheritance
– Instantiated Objects Call Dynamically Bound Methods [Determined at Runtime]

Enables Semantic Overload [Can be Done without OO too]
– Overloaded Functions (Methods), Resolved by Type Signatures or Subtype/Sub-class
– Overloaded Operators (E.g. math operators work not only on integers and real numbers, but also vectors, matrices, and complex numbers)
– Derived Data Types from Base types

Polymorphism
– Parametric – Re-useable Templates (E.g. Ada and Java Generic, C++ Template)
– Functional Semantic Overloading
– Dynamic or Subtype or Subclass Polymorphism using Late Binding

OOPLs – Smalltalk to more current Java, C++, Ada95, … CLOS
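A brief sketch of abstract classes and late binding, written in Python rather than C++ (the review is "not just C++"). The Shape, Square, and Circle classes are hypothetical examples; Python's abstractmethod plays the role of a pure virtual function here.

```python
from abc import ABC, abstractmethod
import math

class Shape(ABC):
    """Abstract class: area() is the analogue of a pure virtual function."""

    @abstractmethod
    def area(self):
        ...

    def describe(self):
        # self.area() is bound at runtime to the subclass implementation.
        return f"{type(self).__name__} with area {self.area():.2f}"

class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2

class Circle(Shape):
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2

# An abstract class cannot be instantiated until area() is refined.
try:
    Shape()
    abstract_instantiable = True
except TypeError:
    abstract_instantiable = False

shapes = [Square(2), Circle(1)]
descriptions = [s.describe() for s in shapes]
print(abstract_instantiable)
print(descriptions)
```

The same describe() call produces different results for each object because the method that runs is chosen from the object's class at runtime.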

Operator and Function Overloading

What is Required to Be OO? Common Consensus is – Encapsulation, Class Hierarchy, Polymorphism (Parametric & Subtype or Subclass with Late Binding), Inheritance

Operator Overloading Not Required (E.g. Java Frowns Upon, No Support)

Some PLs have OO Features, but not All

http://en.wikipedia.org/wiki/Operator_overloading
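For languages that do support it, operator overloading lets "+" and "==" work on user-defined types. The sketch below uses Python, where special methods such as __add__ and __eq__ provide this; the Vector class is a hypothetical example.

```python
class Vector:
    """Minimal vector type that overloads + and == (illustrative only)."""

    def __init__(self, *components):
        self.components = tuple(components)

    def __add__(self, other):
        # Component-wise addition; both vectors must have the same length.
        if len(self.components) != len(other.components):
            raise ValueError("vectors must have the same number of components")
        return Vector(*(a + b for a, b in zip(self.components, other.components)))

    def __eq__(self, other):
        # Equality compares every component.
        return isinstance(other, Vector) and self.components == other.components

    def __repr__(self):
        return f"Vector{self.components}"

total = Vector(1, 1, 3) + Vector(1, 0, 0)
print(total)
print(total == Vector(2, 1, 3))
```

The same "+" symbol still adds integers and floats; the meaning is resolved from the operand types.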

Storing Objects in Relational Databases

One approach to achieving persistence with an OOPL is to use an RDBMS as the underlying storage engine.
– O2 – merged with Informix and acquired by IBM
– ObjectStore - http://www.objectstore.com/
– Objectivity - http://www.objectivity.com/products/objectivitydb
– Versant - http://www.actian.com/products/operational-databases/

Requires mapping class instances (i.e. objects) to one or more tuples distributed over one or more relations.

To handle class hierarchy, have two basic tasks to perform:
(1) design relations to represent class hierarchy;
(2) design how objects will be accessed.

Pearson Education © 2009 21

Storing Objects in Relational Databases

Pearson Education © 2009 22

Mapping Classes to Relations

Number of strategies for mapping classes to relations, although each results in a loss of semantic information.

(1) Map each class or subclass to a relation:
Staff (staffNo, fName, lName, position, sex, DOB, salary)
Manager (staffNo, bonus, mgrStartDate)
SalesPersonnel (staffNo, salesArea, carAllowance)
Secretary (staffNo, typingSpeed)

Pearson Education © 2009 23

Mapping Classes to Relations

(2) Map each subclass to a relation:
Manager (staffNo, fName, lName, position, sex, DOB, salary, bonus, mgrStartDate)
SalesPersonnel (staffNo, fName, lName, position, sex, DOB, salary, salesArea, carAllowance)
Secretary (staffNo, fName, lName, position, sex, DOB, salary, typingSpeed)

(3) Map the hierarchy to a single relation:
Staff (staffNo, fName, lName, position, sex, DOB, salary, bonus, mgrStartDate, salesArea, carAllowance, typeFlag, typingSpeed)
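As a rough illustration of strategy (1), the sketch below stores a Manager object as one tuple in Staff plus one tuple in Manager and reassembles it with a join. It is a simplification under stated assumptions: only a subset of the attributes is kept, Python's sqlite3 module stands in for the RDBMS, and the helper names save_manager and load_manager are hypothetical.

```python
import sqlite3
from dataclasses import dataclass

# Trimmed class hierarchy (subset of the attributes listed on the slide).
@dataclass
class Staff:
    staffNo: str
    fName: str
    lName: str
    salary: int

@dataclass
class Manager(Staff):
    bonus: int
    mgrStartDate: str

# Strategy (1): one relation per class, linked by the shared key staffNo.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, fName TEXT, lName TEXT, salary INTEGER);
CREATE TABLE Manager (staffNo TEXT PRIMARY KEY REFERENCES Staff(staffNo),
                      bonus INTEGER, mgrStartDate TEXT);
""")

def save_manager(m):
    """Split one object into a Staff tuple and a Manager tuple."""
    conn.execute("INSERT INTO Staff VALUES (?, ?, ?, ?)",
                 (m.staffNo, m.fName, m.lName, m.salary))
    conn.execute("INSERT INTO Manager VALUES (?, ?, ?)",
                 (m.staffNo, m.bonus, m.mgrStartDate))

def load_manager(staff_no):
    """Rebuild the object by joining the two relations on staffNo."""
    row = conn.execute(
        "SELECT s.staffNo, s.fName, s.lName, s.salary, m.bonus, m.mgrStartDate "
        "FROM Staff s JOIN Manager m ON s.staffNo = m.staffNo "
        "WHERE s.staffNo = ?",
        (staff_no,),
    ).fetchone()
    return Manager(*row) if row else None

original = Manager("S1", "Pat", "Example", 30000, 2000, "2015-09-23")
save_manager(original)
restored = load_manager("S1")
print(restored == original)
```

The tables record the attribute values but not the fact that Manager is a subclass of Staff; that knowledge lives only in the application code, which is the loss of semantic information these strategies share.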

Pearson Education © 2009 24

ORDBMSs

RDBMSs currently dominant database technology with estimated sales of US$24 billion in 2011, expected to grow to US$37 billion by 2016.

Vendors of RDBMSs conscious of threat and promise of OODBMS.

Agree that RDBMSs not currently suited to advanced database applications, and added functionality is required.

Reject claim that extended RDBMSs will not provide sufficient functionality or will be too slow to cope adequately with new complexity.

Can remedy shortcomings of relational model by extending model with OO features.

Pearson Education © 2014 25

ORDBMSs - Features

OO features being added include:
– user-extensible types,
– encapsulation,
– inheritance,
– polymorphism,
– dynamic binding of methods,
– complex objects including non-1NF objects,
– object identity.

Pearson Education © 2014 26

ORDBMSs - Features

However, no single extended relational model.

All models:
– share basic relational tables and query language,
– all have some concept of ‘object’,
– some can store methods (or procedures or triggers).

Some analysts predict ORDBMS will have 50% larger share of market than RDBMS.

Pearson Education © 2014 27

Stonebraker’s View

Pearson Education © 2014 28

Advantages of ORDBMSs

Resolves many of known weaknesses of RDBMS.

Reuse and sharing:
– reuse comes from ability to extend server to perform standard functionality centrally;
– gives rise to increased productivity both for developer and end-user.

Preserves significant body of knowledge and experience gone into developing relational applications.

Pearson Education © 2014 29

Disadvantages of ORDBMSs

Complexity.

Increased costs.

Proponents of relational approach believe simplicity and purity of relational model are lost.

Some believe RDBMS is being extended for what will be a minority of applications.

OO purists not attracted by extensions either.

SQL now extremely complex.

Pearson Education © 2014 30

SQL:2011 - New OO Features

– Type constructors for row types and reference types.
– User-defined types (distinct types and structured types) that can participate in supertype/subtype relationships.
– User-defined procedures, functions, methods, and operators.
– Type constructors for collection types (arrays, sets, lists, and multisets).
– Support for large objects – BLOBs and CLOBs.
– Recursion.
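Of the features listed, recursion is the easiest to try out. The sketch below runs a recursive query (WITH RECURSIVE) that finds everyone who reports, directly or indirectly, to a given manager. It uses Python's sqlite3 module as a stand-in engine (an assumption), and the managerNo column and the rows are invented for the example; the other features on the list are not shown.

```python
import sqlite3

# In-memory stand-in database; managerNo is a hypothetical column for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, managerNo TEXT);
INSERT INTO Staff VALUES
    ('S1', NULL), ('S2', 'S1'), ('S3', 'S2'), ('S4', 'S1'), ('S5', NULL);
""")

# Recursive query: start with direct reports, then repeatedly add their reports.
reports = conn.execute("""
WITH RECURSIVE reports(staffNo, depth) AS (
    SELECT staffNo, 1 FROM Staff WHERE managerNo = ?
    UNION ALL
    SELECT s.staffNo, r.depth + 1
    FROM Staff s JOIN reports r ON s.managerNo = r.staffNo
)
SELECT staffNo, depth FROM reports ORDER BY staffNo
""", ("S1",)).fetchall()

print(reports)
```

Without recursion, a query of this kind needs one self-join per level of the hierarchy, so the depth has to be known in advance.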

Pearson Education © 2014 31