Welcome to CO 572: Advanced Databases
Holger Pirk
Holger Pirk Welcome to CO 572: Advanced Databases 1 / 41
Purpose of this Lecture
Figuring stuff out
What you know
  - This should mostly be revision (tell me if it is not)
What we're trying to achieve
What I will expect
What you can expect
Disclaimer
He's new at this
Not an excuse for gross incompetence
An example
What resource bounds this query's performance?
[Figure 9: Single Threaded Performance. Panels: (a) Desktop, (b) Workstation, (c) Server, (d) High-End Server. x-axis: Qualifying Tuples/Pivot (0 to 100); y-axis: Wallclock time in s.]
[Figure 10: Multi Threaded Performance. Panels: (a) Desktop, (b) Workstation, (c) Server, (d) High-End Server. x-axis: Qualifying Tuples/Pivot (0 to 100); y-axis: Wallclock time in s.]
Legend: Vectorized Partition & Merge, Partition & Merge, Vectorized, Original, Parallel, Scanning, Predicated, Predicated in Register, Refined Partition & Merge, Vectorized Refined Partition & Merge
An example
Option A
CPU!
Option B
Memory!
Option C
Disk!
Option D
What?
Another example
What is this workload's isolation level?
Effect of long read-only transactions
Workload:
  - Short txns: 10R + 2W
  - Long txns: R 10% of rows
24 threads in total
  - X threads running short txns
  - 24-X threads running long txns
Traditional locking: update performance collapses
Multiversioning: update performance per thread unaffected
Paul Larson, Nov 2013
Another example
Option A
Repeatable Read!
Option B
Serializable!
Option C
Read Committed!
Option D
I have no idea!
Takeaways
Bottom line
You (probably) don't know everything
I certainly don't
But I know a lot of people who do → ASK!!!
If no one knows, it is research (see me!)
What is a Database Management System?
Well,
as it says, a
Database
Management
System
What is a Database?
Merriam Webster
Database: A usually large collection of organized data
Data: Factual information
In Context
Pretty much any structured collection of data points/data items
  - A relational table
  - A set in your favorite programming language
  - A vector in your favorite programming language
  - A graph
  - A stack of index cards
What is Management?
Merriam Webster
the conducting or supervising of something
to handle or direct with a degree of skill
to work upon or try to alter for a purpose
judicious use of means to accomplish an end
In Context
Provide everything you need to manage your data
  - while taking advantage of degrees of freedom
Usually prescribe an external interface: data model, protocol & semantic guarantees
  - Relations, Documents/Trees, Graphs, Arrays
Results matter, internal organization is a degree of freedom
  - Storage, Processing, Resilience, etc. ← This is our focus
What is a System?
Merriam Webster
a regularly interacting or interdependent group of items forming a unified whole
In Context
Often made up from components,
  - that interact
  - to achieve a greater goal
Usually applicable to many situations (i.e., generic)
How is that different from a well-designed application?
  - The goal is domain-agnostic
  - Components serve a data-management purpose, not a domain purpose
So, then what is a Database Management System?
The Greater Goal
[Figure: The role of a DBMS in a software application. Boxes shown: User Interface, Application Logic, Data(base) Management System, CPU, RAM, Disk.]
So, then what is a Database Management System?
Functionality of a DBMS
Storage
Query Processing
Transaction Processing
External Interface
(Access Control)
So, then what is a Database Management System?
Transaction Processing
Getting data into the database
Storage
Organizing data for the purpose
Query Processing
Retrieving data from storage
Perform a modest amount of computation
External Interface
The best are standardized
Often using some kind of query language (like SQL)
What is a Database Management System not?
A runtime for your applications (though people have tried)
  - some support user-defined functions
  - some even have embedded webservers, middleware, ...
    - a horrible idea
A place to store intermediate state
A filesystem
Why DBMSs do not make good filesystems
Option A
DBMSs are not good at direct lookups
Option B
If you limit degrees of freedom, performance suffers
Option C
DBMSs do not scale
Option D
They do not?
What is a data management system?
Components
What Database Management Systems exist?
| Domain | Closed Source | Open Source |
| --- | --- | --- |
| Relational | Oracle DBMS, IBM DB2, Microsoft SQL Server | PostgreSQL, MySQL, SQLite |
| OLAP | VectorWise, Vertica, Snowflake, Impala, Redshift | MonetDB |
| OLTP | VoltDB | Silo, H-Store |
| Graphs | Virtuoso | Neo4J |
| Trees/Documents | DocumentDB, IMS (1966!) | CouchDB |
What is a database management application?
Not a system
The boundary is blurry
Not generic
  - Domain-specific
  - Hard to generalize
  - Often contains domain-specific tricks
Here is a proposed spectrum
Yelp
A mobile app for geo-services
A library to manage unordered collections of tagged coordinates
A spatial data management library
A relational database
A block storage system
(Relational) Database Management Systems
Relations are sets of tuples
Places

| Name (PK) | Longitude | Latitude | Stars |
| --- | --- | --- | --- |
| Yummy Burgers | 37.9 | 21.1 | 4.1 |
| Great Coffee Inc. | 36.1 | 18.9 | NULL |
| The Ice Cream Shop | NULL | NULL | 4 |

Reviews

| User | Place Name (FK) | Comment |
| --- | --- | --- |
| Holger | Yummy Burgers | Just awful, never again |
| Peter | The Ice Cream Shop | Quite alright |
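To make the two relations concrete, here is a minimal sketch using Python's built-in sqlite3 module. The choice of SQLite, the lowercase table names, and the column names (`user_name`, `place_name`) are assumptions made for illustration; the slides do not prescribe them.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request

conn.executescript("""
    CREATE TABLE places (
        name      TEXT PRIMARY KEY,
        longitude REAL,
        latitude  REAL,
        stars     REAL
    );
    CREATE TABLE reviews (
        user_name  TEXT,
        place_name TEXT REFERENCES places (name),
        comment    TEXT
    );
""")

# None is stored as SQL NULL.
conn.executemany(
    "INSERT INTO places VALUES (?, ?, ?, ?)",
    [
        ("Yummy Burgers", 37.9, 21.1, 4.1),
        ("Great Coffee Inc.", 36.1, 18.9, None),
        ("The Ice Cream Shop", None, None, 4),
    ],
)
conn.executemany(
    "INSERT INTO reviews VALUES (?, ?, ?)",
    [
        ("Holger", "Yummy Burgers", "Just awful, never again"),
        ("Peter", "The Ice Cream Shop", "Quite alright"),
    ],
)

# Follow the foreign key from each review to the place it refers to.
joined = conn.execute("""
    SELECT r.user_name, p.name, p.stars
    FROM reviews AS r JOIN places AS p ON r.place_name = p.name
    ORDER BY r.user_name
""").fetchall()

# NULL is not a value: it has to be tested with IS NULL, not with "=".
unrated = [name for (name,) in conn.execute(
    "SELECT name FROM places WHERE stars IS NULL")]
```

The join only works because every `place_name` in `reviews` matches a primary key in `places`, which is what the FK annotation in the table header expresses.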
Normalization
Denormalization leads to redundancy
Why is redundancy bad?
  - ...
  - ...
  - ...
The goal of normalization
Eliminate redundancy
1NF: No complex (non-atomic) attributes
2NF: 1NF, and no non-key attribute depends on only part of a candidate key
3NF: 2NF, and no non-key attribute depends transitively on a candidate key
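One commonly cited cost of redundancy is the update anomaly. The following is a minimal sketch in plain Python, not a DBMS; the second review row (Peter reviewing Yummy Burgers) is made up for illustration and does not come from the slides.

```python
# Denormalized: every review row repeats the star rating of its place.
denormalized = [
    {"user": "Holger", "place": "Yummy Burgers", "stars": 4.1,
     "comment": "Just awful, never again"},
    {"user": "Peter", "place": "Yummy Burgers", "stars": 4.1,
     "comment": "Quite alright"},  # hypothetical row
]

# An update that touches only one copy leaves the data contradicting itself.
denormalized[0]["stars"] = 3.9
ratings_seen = {row["stars"] for row in denormalized
                if row["place"] == "Yummy Burgers"}
inconsistent = len(ratings_seen) > 1

# Normalized: the rating is stored once; reviews refer to the place by key.
places = {"Yummy Burgers": {"stars": 4.1}}
reviews = [
    ("Holger", "Yummy Burgers", "Just awful, never again"),
    ("Peter", "Yummy Burgers", "Quite alright"),  # hypothetical row
]

# The same update now has exactly one place to go.
places["Yummy Burgers"]["stars"] = 3.9
normalized_ratings = {places[place]["stars"] for _, place, _ in reviews}
```

In the normalized layout every review sees the same rating after the update, because there is only one copy to change.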
Schemas
The definition of the attributes of the tuples in your relations (duh)
create table review (stars int, comment varchar(1024), user int);
But also integrity constraints (uniqueness, keys, foreign keys)
alter table review add
foreign key (user) references user (name);
Some people distinguish external and internal schema
The idea of internal schemas is misleadingly simplistic
  - there may not even be a schema
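A schema with an integrity constraint can be tried out with Python's built-in sqlite3 module. This is a sketch under stated assumptions. SQLite cannot add a foreign key via `alter table`, so the constraint is declared inside `create table`. The table is named `users` and the column `user_name` (both hypothetical), because `user` is a reserved word in some systems. The referencing column is given the same type as the referenced key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite

conn.execute("CREATE TABLE users (name VARCHAR(64) PRIMARY KEY)")
conn.execute("""
    CREATE TABLE review (
        stars     INT,
        comment   VARCHAR(1024),
        user_name VARCHAR(64),
        FOREIGN KEY (user_name) REFERENCES users (name)
    )
""")

conn.execute("INSERT INTO users VALUES ('Holger')")
conn.execute("INSERT INTO review VALUES (1, 'Just awful, never again', 'Holger')")

# A review by a user that does not exist violates the foreign key.
try:
    conn.execute("INSERT INTO review VALUES (5, 'Great', 'Nobody')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

review_count = conn.execute("SELECT COUNT(*) FROM review").fetchone()[0]
```

The DBMS, not the application, rejects the dangling reference, so only the valid review is stored.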
Internal Storage
Many degrees of freedom
Data could be stored
  - Column-wise, Row-wise, Hybrid
  - In trees, graphs, etc.
Directly in indices
Normalized or denormalized
Compressed
On disk, in memory or on GPUs
On a remote machine or even in the cloud
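As a rough illustration of the first degree of freedom, here is the same relation held row-wise and column-wise in plain Python. This is a sketch of the idea only and does not show how any particular DBMS lays out data; the helper names are made up.

```python
ATTRIBUTES = ("name", "longitude", "latitude", "stars")

# Row-wise: one tuple per record, all attributes of a record stored together.
rows = [
    ("Yummy Burgers", 37.9, 21.1, 4.1),
    ("Great Coffee Inc.", 36.1, 18.9, None),
    ("The Ice Cream Shop", None, None, 4.0),
]

# Column-wise: one list per attribute, values of one attribute stored together.
columns = {attr: list(values) for attr, values in zip(ATTRIBUTES, zip(*rows))}


def stars_from_rows(rows):
    """Reading one attribute row-wise touches every complete record."""
    return [record[3] for record in rows if record[3] is not None]


def stars_from_columns(columns):
    """Reading one attribute column-wise touches only that attribute's list."""
    return [s for s in columns["stars"] if s is not None]


def record_from_columns(columns, i):
    """Reassembling a full record column-wise needs one access per attribute."""
    return tuple(columns[attr][i] for attr in ATTRIBUTES)


same_result = stars_from_rows(rows) == stars_from_columns(columns)
second_record = record_from_columns(columns, 1)
```

Both layouts answer the same queries. They differ in how much unrelated data a scan over one attribute has to touch, and in how much work it takes to reassemble a full record.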
Transactions
Isolated: Run as if you were alone on the system
Atomic: Run completely or not at all
Consistent: The interesting thing here is that there may be inconsistency in between
Durable: After the transaction commits, even a power outage won't undo the transaction
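Atomicity and the temporary inconsistency in between can be observed with Python's built-in sqlite3 module. The `account` table, the `transfer` helper, and the balances are hypothetical and serve only as a sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account (
        owner   TEXT PRIMARY KEY,
        balance INT CHECK (balance >= 0)
    )
""")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money in one transaction; return True if it committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # After this statement the money exists twice: the state is
            # temporarily inconsistent inside the transaction.
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE owner = ?",
                (amount, dst))
            # This statement fails the CHECK constraint on overdraft.
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE owner = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


overdraft_committed = transfer(conn, "alice", "bob", 150)
after_failure = dict(conn.execute("SELECT owner, balance FROM account"))

valid_committed = transfer(conn, "alice", "bob", 40)
after_success = dict(conn.execute("SELECT owner, balance FROM account"))
```

The failed transfer leaves no trace: the credit to `bob` that was already applied is rolled back together with the rejected debit.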
Question
Why are databases a bad place for intermediate state?
Why does any of this matter?
DBMSs make lots of money
We are talking about a 50 billion dollar market (in 2017)
  - That is only the pure sales volume of relational DBMSs
  - add administration, tuning, application development, ...
They are also an important part of our lives
Online shops
Banks
Online content
Your phone (SQLite probably has billions of running instances)
. . .
Why would I care?
Why do I need to know?
If you ever happen to work on a DBMS (not very likely)
To make you a kick-ass DBA (somewhat more likely)
To apply data management techniques outside the field (extremely likely)
  - Bragging rights if you're the guy to implement a radix-partitioned in-memory hash-join
  - Some of this actually comes up if you interview at Google, Facebook, Microsoft, etc.
What does this course look like?
Some admin stuff
Course is taught in two halves
  - First half on single-node databases (taught by yours truly)
  - Second half on distributed databases (taught by Peter McBrien)
Register for this course (at level 2)!!!
  - Bad things will happen if you don't
  - Ask your cohort admin person how to do it
Things I will expect
Basic knowledge of algorithms & data structures
  - Arrays, linked lists, trees, heaps, ...
  - Sorting, binary searching, graph/tree traversal
Some basic computer architecture knowledge
  - Main memory, disks, caches, multicore, ...
Programming in C++
  - Programming for one of the two coursework assignments
Honesty when preparing coursework
  - Don't try me!
Coursework
There will be three assignments
  - Using a database (with me)
  - Database internals (also with me)
  - Distributed data processing (with Peter McBrien)
  - This is feedback
Exam at the end
  - What is relevant is what is discussed in class
Books
Fundamentals of Database Systems
Ramez Elmasri, Shamkant Navathe. Sixth edition, Pearson new international edition, Pearson
Database Systems: The Complete Book
Hector Garcia-Molina, Jeffrey D. Ullman, Jennifer Widom. 2nd ed., Pearson Education
Database Systems: A Practical Approach to Design, Implementation, and Management
Thomas M. Connolly, Carolyn E. Begg. Sixth edition, Global edition, Pearson Education Limited
Outline
Object-Relational mapping
Data Storage
Querying relational data
Join formulation and evaluation
Query planning and optimization
Processing models
Secondary storage
This was it
See you on Friday for a class on Object-Relational Mapping (thank me later!)
Remember
Register for this course
  - at Level 2!!!