CS 440 Database Management Systems
RDBMS Architecture and Data Storage
Announcements
• Normal form and FD practice session on Feb 4th in class.
• Assignment 1 due on Feb 7th
– Submission through TEACH.
• Project progress report due on Feb 11th
– 1 to 2 page status report
– Submission through TEACH
Database Implementation
Figure: stages of database implementation: User Requirements, Conceptual Design (Entity Relationship (ER) Model), Schema (Relational Model), and Physical Storage (Files and Indexes); the diagram also labels SQL and Data.
The big advantage of RDBMS
• It separates the logical level (schema) from the physical level (implementation).
• Physical data independence
– Users do not worry about how their data is stored and processed on the physical devices.
– It is all SQL!
– Their queries work over (almost) all RDBMS deployments.
Issues in logical level
• Data models
– Relational, XML, …
• Query language
• Data quality
– Normalization
• Usability
• …
Issues in physical level
• Processor: 100 – 1000 MIPS
• Main memory: 1 μs – 1 ns
• Secondary storage: higher capacity and durability
• Disk random access: seek time + rotational latency + transfer time
– Seek time: 4 ms – 15 ms!
– Rotational latency: 2 ms – 7 ms!
– Transfer rate: around 1000 Mb/sec
– Read and write in blocks.
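As a rough worked example, the random-access formula can be evaluated with the ranges listed above. This sketch assumes a hypothetical 8 KB block and reads "Mb" as megabits (10^6 bits); real drives differ.

```python
# Rough cost of ONE random block read on a magnetic disk:
#   seek time + rotational latency + transfer time.
# Ranges come from the slide; BLOCK_BYTES is an assumed value.
SEEK_MS = (4.0, 15.0)           # seek time range (ms)
ROTATION_MS = (2.0, 7.0)        # rotational latency range (ms)
TRANSFER_MBIT_PER_S = 1000.0    # about 1000 Mb/sec (megabits per second)
BLOCK_BYTES = 8 * 1024          # hypothetical 8 KB block


def transfer_ms(nbytes: int) -> float:
    """Time to move nbytes at the transfer rate, in milliseconds."""
    bits = nbytes * 8
    return bits / (TRANSFER_MBIT_PER_S * 1_000_000) * 1000.0


def random_access_ms(seek_ms: float, rotation_ms: float, nbytes: int) -> float:
    """Seek + rotational latency + transfer for one random read."""
    return seek_ms + rotation_ms + transfer_ms(nbytes)


best = random_access_ms(SEEK_MS[0], ROTATION_MS[0], BLOCK_BYTES)
worst = random_access_ms(SEEK_MS[1], ROTATION_MS[1], BLOCK_BYTES)
print(f"transfer only: {transfer_ms(BLOCK_BYTES):.3f} ms")       # 0.066 ms
print(f"one random block read: {best:.2f} ms to {worst:.2f} ms")  # 6.07 ms to 22.07 ms
```

Under these assumptions the transfer itself is only about 1% of the total or less; nearly all of the time goes to moving the head and waiting for the platter, which is why disks are read and written in whole blocks.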
Storage capacity versus access time
Figure: typical capacity (bytes, 10^3 to 10^15) versus access time (sec, 10^-9 to 10^3) for cache, electronic main memory, electronic secondary storage, magnetic and optical disks, online tape, nearline tape and optical disks, and offline tape. From Gray & Reuter, updated in 2002.
Storage cost versus access time
Figure: storage cost (dollars/MB, 10^-4 to 10^4) versus access time (sec, 10^-9 to 10^3) for cache, electronic main memory, electronic secondary storage, magnetic and optical disks, online tape, nearline tape and optical disks, and offline tape. From Gray & Reuter.
Gloomy future: Moore’s law
• Processor speed and maximum storage capacity grow exponentially over time, while the cost of storage per byte falls exponentially.
• But storage (main and secondary) access time improves much more slowly.
• This is why managing and analyzing big data is hard.
Issues in physical level
Three things are important in database systems: performance, performance, and performance! (Bruce Lindsay, co-creator of System R)
Issues in physical level
• Other things also matter
– Reliability when it comes to transactions.
– …
• But performance is still a big deal.
Is it easy to achieve good performance?
• Let’s build an RDBMS.
• It supports core SQL.
• No stored procedures in this version!
Storing Data
• Store each relation in an ASCII file:
Person (SSN, Name, Age)
person.txt:
111222333 - John - 24
444222111 - Charles - 43
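A minimal Python sketch of this naive text-file storage, using the " - " separator from person.txt. The function names are hypothetical; the point is that every value comes back as a string, so the system has to convert types on every access.

```python
import tempfile
from pathlib import Path

SEP = " - "  # field separator used in person.txt


def write_relation(path: Path, rows: list[tuple]) -> None:
    """Store a relation as text: one tuple per line, fields joined by SEP."""
    lines = (SEP.join(str(value) for value in row) + "\n" for row in rows)
    path.write_text("".join(lines))


def read_relation(path: Path) -> list[list[str]]:
    """Read every tuple back; all fields come back as strings."""
    return [line.split(SEP) for line in path.read_text().splitlines()]


with tempfile.TemporaryDirectory() as tmp:
    person = Path(tmp) / "person.txt"
    write_relation(person, [("111222333", "John", 24), ("444222111", "Charles", 43)])
    rows = read_relation(person)

print(rows)  # [['111222333', 'John', '24'], ['444222111', 'Charles', '43']]
```

Note that the ages are now the strings '24' and '43'; nothing in the file says they were integers.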
Storing Data
• Store schema information in a catalogue relation:
Catalogue (AttrName, Type, RelName, Position)
catalogue.txt:
SSN - String - Person - 1
Name - String - Person - 2
Age - Integer - Person - 3
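The catalogue can be read like any other relation and then used to recover typed tuples. A sketch, assuming the catalogue.txt layout above with plain hyphens as separators and a hypothetical mapping from the type names String and Integer to Python types:

```python
# Catalogue rows: AttrName - Type - RelName - Position (as in catalogue.txt).
CATALOGUE_TXT = """\
SSN - String - Person - 1
Name - String - Person - 2
Age - Integer - Person - 3
"""

# Assumed mapping from catalogue type names to Python conversions.
CASTS = {"String": str, "Integer": int}


def load_catalogue(text: str) -> dict[str, list[tuple[str, str]]]:
    """Return {relation: [(attr_name, type_name), ...]} ordered by Position."""
    by_rel: dict[str, list[tuple[int, str, str]]] = {}
    for line in text.splitlines():
        attr, type_name, rel, pos = line.split(" - ")
        by_rel.setdefault(rel, []).append((int(pos), attr, type_name))
    return {rel: [(a, t) for _, a, t in sorted(cols)] for rel, cols in by_rel.items()}


def decode(catalogue: dict, rel: str, fields: list[str]) -> tuple:
    """Convert the raw string fields of one tuple into typed values."""
    return tuple(CASTS[t](v) for (_, t), v in zip(catalogue[rel], fields))


catalogue = load_catalogue(CATALOGUE_TXT)
print(catalogue["Person"])  # [('SSN', 'String'), ('Name', 'String'), ('Age', 'Integer')]
print(decode(catalogue, "Person", ["111222333", "John", "24"]))  # ('111222333', 'John', 24)
```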
SQL Support
• SQL compiler
– Like any other computer language compiler.
SELECT SSN FROM Person;
Result:
SSN
111222333
444222111
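A real SQL compiler has a full lexer, parser, and semantic checker. As a far smaller illustration, this hypothetical sketch recognizes only queries of the form SELECT a, b FROM R; with a regular expression, then projects the requested columns from in-memory Person tuples.

```python
import re

# Toy recognizer for exactly one form: SELECT col[, col...] FROM Relation;
SELECT_RE = re.compile(r"^\s*SELECT\s+(.+?)\s+FROM\s+(\w+)\s*;\s*$", re.IGNORECASE)

PERSON_SCHEMA = ["SSN", "Name", "Age"]
PERSON_ROWS = [("111222333", "John", 24), ("444222111", "Charles", 43)]


def parse_select(sql: str) -> tuple[list[str], str]:
    """Return (column names, relation name) or raise on anything else."""
    match = SELECT_RE.match(sql)
    if match is None:
        raise ValueError(f"unsupported query: {sql!r}")
    columns = [c.strip() for c in match.group(1).split(",")]
    return columns, match.group(2)


def project(rows: list[tuple], schema: list[str], columns: list[str]) -> list[tuple]:
    """Keep only the requested columns, in the requested order."""
    positions = [schema.index(c) for c in columns]
    return [tuple(row[p] for p in positions) for row in rows]


columns, relation = parse_select("SELECT SSN FROM Person;")
print(relation, project(PERSON_ROWS, PERSON_SCHEMA, columns))
# Person [('111222333',), ('444222111',)]
```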
Query Execution: Selection
1. Find the selection attribute position from the catalogue.
2. Scan the file that contains the relation.
3. Show the tuples that satisfy the condition.
SELECT * FROM Person WHERE SSN = 111222333;
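The three steps map directly onto a full scan. A sketch over the text format above, with io.StringIO standing in for person.txt and a hypothetical POSITIONS dictionary standing in for the catalogue; values are compared as strings because the file stores no types.

```python
import io
from typing import Iterator, TextIO

PERSON_TXT = "111222333 - John - 24\n444222111 - Charles - 43\n"

# Hypothetical in-memory copy of the catalogue: (relation, attribute) -> Position.
POSITIONS = {("Person", "SSN"): 1, ("Person", "Name"): 2, ("Person", "Age"): 3}


def select_eq(f: TextIO, rel: str, attr: str, value: str) -> Iterator[list[str]]:
    """Full-scan selection: WHERE attr = value."""
    pos = POSITIONS[(rel, attr)] - 1            # 1. look up the attribute position
    for line in f:                              # 2. scan the whole file
        fields = line.rstrip("\n").split(" - ")
        if fields[pos] == value:                # 3. emit tuples that satisfy the condition
            yield fields


result = list(select_eq(io.StringIO(PERSON_TXT), "Person", "SSN", "111222333"))
print(result)  # [['111222333', 'John', '24']]
```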
Query Execution: Join
1. Read the catalogue to find the info on join attributes.
2. Read the first relation; for each tuple:
a. Read the second relation; for each tuple:
b. Assemble the join tuple.
c. Output it if it satisfies the condition.
SELECT * FROM Person, PersonAddr WHERE Person.SSN = PersonAddr.SSN and Person.SSN = 111222333;
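The algorithm above is a tuple-at-a-time nested-loop join. A sketch with in-memory lists; the PersonAddr(SSN, Addr) contents are hypothetical, since the slides give no data for that relation.

```python
from typing import Callable, Iterator

Row = tuple


def nested_loop_join(outer: list[Row], inner: list[Row], outer_pos: int,
                     inner_pos: int,
                     extra: Callable[[Row], bool] = lambda t: True) -> Iterator[Row]:
    """Tuple-at-a-time nested-loop equi-join with an optional extra condition."""
    for t in outer:                      # read the first relation
        for u in inner:                  # a. re-read the second relation for every t
            joined = t + u               # b. assemble the join tuple
            if t[outer_pos] == u[inner_pos] and extra(joined):
                yield joined             # c. output it if it satisfies the condition


person = [("111222333", "John", 24), ("444222111", "Charles", 43)]
# Hypothetical PersonAddr(SSN, Addr) contents.
person_addr = [("111222333", "12 Elm St"), ("444222111", "7 Oak Ave")]

result = list(nested_loop_join(person, person_addr, 0, 0,
                               extra=lambda row: row[0] == "111222333"))
print(result)  # [('111222333', 'John', 24, '111222333', '12 Elm St')]
```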
Performance Issues: Storing Data
• Update John to Sheldon
– Rewrite the whole file: very slow
– Type conversion: slow
• Delete the tuple with SSN of 111222333.
Person (SSN, Name, Age)
person.txt:
111222333 - John - 24
444222111 - Charles - 43
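Why this update is expensive: "Sheldon" is three bytes longer than "John", so every byte after the field has to shift. A sketch over the bytes of person.txt held in memory (the helper name is hypothetical); for a real file the shifted tail must be rewritten on disk.

```python
PERSON_TXT = b"111222333 - John - 24\n444222111 - Charles - 43\n"


def update_in_text_file(data: bytes, old: bytes, new: bytes) -> tuple[bytes, int]:
    """Replace the first occurrence of a field value.

    Returns the new contents and how many trailing bytes had to move
    because the field changed length (0 if it can be overwritten in place).
    """
    start = data.index(old)
    end = start + len(old)
    moved = 0 if len(new) == len(old) else len(data) - end
    return data[:start] + new + data[end:], moved


updated, moved = update_in_text_file(PERSON_TXT, b"John", b"Sheldon")
print(len(PERSON_TXT), len(updated), moved)  # 47 50 31
```

With only two tuples, 31 bytes move; with millions of tuples after the edited one, almost the whole file is rewritten.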
Performance Issues: Selection
• We have to scan the whole relation to select some tuples: very slow
• We can use an index to find the tuples much faster.
SELECT * FROM Person WHERE SSN = 111222333;
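An index trades extra storage (and maintenance on every update) for lookups that avoid a full scan. The sketch uses an in-memory Python dictionary (a hash table) as a stand-in; on-disk RDBMS indexes are usually B+-trees or hash structures laid out in blocks, which this does not model.

```python
from collections import defaultdict

person = [("111222333", "John", 24), ("444222111", "Charles", 43)]


def build_index(rows: list[tuple], pos: int) -> dict:
    """Map each key value to the positions (record ids) of matching rows."""
    index = defaultdict(list)
    for rid, row in enumerate(rows):
        index[row[pos]].append(rid)
    return dict(index)


ssn_index = build_index(person, 0)   # built once, reused by many queries
matches = [person[rid] for rid in ssn_index.get("111222333", [])]
print(matches)  # [('111222333', 'John', 24)]
```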
Performance Issues: Selection
• Read tuples one by one
– Much faster if we read a whole bunch of them together: caching
SELECT * FROM Person WHERE SSN = 111222333;
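A toy model of the difference, with a hypothetical CountingDevice that charges one I/O per read request regardless of how many records it returns (a simplification in the spirit of the seek-dominated disk costs):

```python
class CountingDevice:
    """Stand-in for a disk: counts how many read requests are issued."""

    def __init__(self, records: list) -> None:
        self.records = records
        self.reads = 0

    def read(self, start: int, count: int) -> list:
        self.reads += 1
        return self.records[start:start + count]


def scan(device: CountingDevice, batch: int) -> list:
    """Read all records, `batch` records per request."""
    out = []
    for start in range(0, len(device.records), batch):
        out.extend(device.read(start, batch))
    return out


for batch in (1, 100):
    dev = CountingDevice(list(range(1_000)))
    assert scan(dev, batch) == dev.records
    print(batch, dev.reads)
# 1 1000
# 100 10
```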
Performance Issues: Join
• Quadratic I/O access
– Very slow for large relations
SELECT * FROM Person, PersonAddr WHERE Person.SSN = PersonAddr.SSN and Person.SSN = 111222333;
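Counting reads makes "quadratic" concrete. This sketch assumes every tuple read is a separate I/O (no blocks, no caching), which matches the naive design above:

```python
def nested_loop_tuple_reads(n_outer: int, n_inner: int) -> int:
    """Tuples read by a tuple-at-a-time nested-loop join:
    the outer relation once, the inner relation once per outer tuple."""
    return n_outer + n_outer * n_inner


for n in (1_000, 2_000, 10_000):
    print(n, nested_loop_tuple_reads(n, n))
# 1000 1001000
# 2000 4002000
# 10000 100010000
```

Doubling both relations roughly quadruples the work.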
Performance Issues: Query Execution
• Two ways of executing the query
– First join, then select
– First select, then join: much faster
• Query (execution) optimization.
SELECT * FROM Person, PersonAddr WHERE Person.SSN = PersonAddr.SSN
and Person.SSN = 111222333;
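Both plans return the same rows; they differ in how many tuple pairs the join examines. A sketch on hypothetical data (1,000 people, one address each) that counts only the join-condition tests inside the nested loop:

```python
def join_then_select(person, addr, ssn):
    """Plan 1: nested-loop join of everything, then filter on SSN."""
    pairs = 0
    joined = []
    for p in person:
        for a in addr:
            pairs += 1                      # one join-condition test
            if p[0] == a[0]:
                joined.append(p + a)
    return [t for t in joined if t[0] == ssn], pairs


def select_then_join(person, addr, ssn):
    """Plan 2: filter Person on SSN first, then join only the survivors."""
    pairs = 0
    result = []
    selected = [p for p in person if p[0] == ssn]
    for p in selected:
        for a in addr:
            pairs += 1                      # one join-condition test
            if p[0] == a[0]:
                result.append(p + a)
    return result, pairs


person = [(str(i), f"name{i}", 20 + i % 50) for i in range(1_000)]
addr = [(str(i), f"{i} Main St") for i in range(1_000)]

r1, cost1 = join_then_select(person, addr, "42")
r2, cost2 = select_then_join(person, addr, "42")
print(r1 == r2, cost1, cost2)  # True 1000000 1000
```

Choosing the cheaper of equivalent plans automatically is the job of the query optimizer.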
Reliability
• Update the name in Person
– Power outage: is the operation done?
– Disk crash
UPDATE Person SET Name = 'Smith' WHERE Person.SSN = 111222333;
Probably not that many people download our RDBMS
• Let’s redesign the components of our RDBMS
Database Implementation
Figure: stages of database implementation: User Requirements, Conceptual Design (Entity Relationship (ER) Model), Schema (Relational Model), and Physical Storage (Files and Indexes); this version of the diagram labels Data storage.
Random access versus sequential access
• Disk random access: seek time + rotational latency + transfer time.
• Disk sequential access: reading blocks next to each other
– No seek time or rotational latency after the first block
– Much faster than random access
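A back-of-the-envelope comparison for reading 1,000 blocks. The sketch assumes average values inside the ranges given earlier (9 ms seek, 4 ms rotational latency) and a hypothetical 8 KB block at 1000 Mb/sec; it ignores track-to-track movement during the sequential read.

```python
# Assumed average figures, chosen within the slide's ranges.
AVG_SEEK_MS = 9.0
AVG_ROTATION_MS = 4.0
TRANSFER_MS_PER_BLOCK = 0.065536   # 8 KB at 1000 Mb/sec (assumed block size)


def random_read_ms(n_blocks: int) -> float:
    """Every block pays seek + rotation + transfer."""
    return n_blocks * (AVG_SEEK_MS + AVG_ROTATION_MS + TRANSFER_MS_PER_BLOCK)


def sequential_read_ms(n_blocks: int) -> float:
    """One seek + rotation to reach the first block, then only transfers."""
    return AVG_SEEK_MS + AVG_ROTATION_MS + n_blocks * TRANSFER_MS_PER_BLOCK


n = 1_000
print(f"random: {random_read_ms(n):.1f} ms, sequential: {sequential_read_ms(n):.1f} ms")
# random: 13065.5 ms, sequential: 78.5 ms
```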
Units of data on physical device
• Fields: data items
• Records
• Blocks
• Files
Fields
• Fixed size
– Integer, Boolean, …
• Variable length
– Varchar, …
– Null terminated
– Size at the beginning of the string
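The three field encodings in a short sketch using Python's struct module. The widths (a 4-byte integer, a 2-byte length prefix) and the UTF-8 encoding are assumptions for illustration.

```python
import struct


def encode_int(value: int) -> bytes:
    """Fixed-size field: a 32-bit little-endian integer is always 4 bytes."""
    return struct.pack("<i", value)


def encode_null_terminated(s: str) -> bytes:
    """Variable-length field ended by a 0 byte (the value may not contain 0)."""
    return s.encode("utf-8") + b"\x00"


def encode_length_prefixed(s: str) -> bytes:
    """Variable-length field with its size (2 bytes) stored first."""
    data = s.encode("utf-8")
    return struct.pack("<H", len(data)) + data


def decode_null_terminated(buf: bytes, offset: int) -> tuple[str, int]:
    """Scan forward to the terminator; return (value, offset after field)."""
    end = buf.index(b"\x00", offset)
    return buf[offset:end].decode("utf-8"), end + 1


def decode_length_prefixed(buf: bytes, offset: int) -> tuple[str, int]:
    """Read the size, then jump straight to the end of the field."""
    (size,) = struct.unpack_from("<H", buf, offset)
    start = offset + 2
    return buf[start:start + size].decode("utf-8"), start + size


print(encode_int(24), encode_null_terminated("John"), encode_length_prefixed("John"))
# b'\x18\x00\x00\x00' b'John\x00' b'\x04\x00John'
```

The length prefix lets a reader skip a field without scanning its bytes and allows any byte value, including 0, inside the string.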
Records: Sets of Fields
• Schema
– Number of fields, types of fields, order, …
• Fixed format and length
– Record holds only the data items
• Variable format and length
– Record holds fields and their size, type, … information
• Range of formats in between
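A fixed format and length record in a sketch: the schema alone says where each field sits, so the record stores only data and record i starts at offset i × record size. The field widths below are assumed for illustration.

```python
import struct

# Assumed widths: SSN 9 bytes, Name 20 bytes, Age as a 32-bit int.
# "<" disables alignment padding, so the record is exactly 33 bytes.
PERSON_FORMAT = struct.Struct("<9s20si")
RECORD_SIZE = PERSON_FORMAT.size


def pack_person(ssn: str, name: str, age: int) -> bytes:
    # struct pads short byte strings with zeros up to the declared width
    # (and silently truncates longer ones).
    return PERSON_FORMAT.pack(ssn.encode(), name.encode(), age)


def unpack_person(buf: bytes, i: int) -> tuple[str, str, int]:
    """Fetch record i directly: its offset is simply i * RECORD_SIZE."""
    ssn, name, age = PERSON_FORMAT.unpack_from(buf, i * RECORD_SIZE)
    return ssn.decode(), name.rstrip(b"\x00").decode(), age


file_bytes = pack_person("111222333", "John", 24) + pack_person("444222111", "Charles", 43)
print(RECORD_SIZE, unpack_person(file_bytes, 1))  # 33 ('444222111', 'Charles', 43)
```

Because every record has the same length, changing John to Sheldon overwrites the 20-byte Name field in place; the price is wasted padding and a hard limit on name length.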
Record Header
• Pointer to the record schema (record type)
• Record size
• Timestamp
• Other info …
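A record header in a sketch; the fields follow the list above, but the exact layout (two 16-bit fields and a 32-bit timestamp in seconds) is a hypothetical choice, not a standard one.

```python
import struct
import time
from typing import Optional

# Hypothetical header: schema id (uint16), total record size (uint16),
# timestamp in whole seconds (uint32); 8 bytes with no padding.
HEADER = struct.Struct("<HHI")


def build_record(schema_id: int, body: bytes, timestamp: Optional[int] = None) -> bytes:
    """Prefix body with a header: schema pointer, record size, timestamp."""
    ts = int(time.time()) if timestamp is None else timestamp
    return HEADER.pack(schema_id, HEADER.size + len(body), ts) + body


def read_header(record: bytes) -> tuple[int, int, int]:
    """Return (schema_id, record_size, timestamp)."""
    return HEADER.unpack_from(record, 0)


rec = build_record(schema_id=1, body=b"111222333John", timestamp=1_700_000_000)
print(read_header(rec), rec[HEADER.size:])
# (1, 21, 1700000000) b'111222333John'
```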
Blocks
• Collection of records
• Reduces the number of I/O accesses
• Different from OS blocks
– Why should an RDBMS manage its own blocks?
– It knows the access pattern better than the OS.
• Separating records in a block
– Fixed-size records: no worry!
– Markers between records
– Keep record size information in the records or in the block header.
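One of the options above, record size kept in the records, in a sketch: each record is stored as a 2-byte length followed by its bytes, and records are packed into blocks without crossing a block boundary. The tiny BLOCK_SIZE is an assumption so the example spills into two blocks; blocks are left unpadded for brevity.

```python
import struct

BLOCK_SIZE = 32  # unrealistically small (assumed) block size


def pack_blocks(records: list[bytes]) -> list[bytes]:
    """Pack variable-length records into blocks, one length prefix per record."""
    blocks, current = [], bytearray()
    for rec in records:
        entry = struct.pack("<H", len(rec)) + rec
        if len(entry) > BLOCK_SIZE:
            raise ValueError("record does not fit in one block")
        if len(current) + len(entry) > BLOCK_SIZE:   # block full: start a new one
            blocks.append(bytes(current))
            current = bytearray()
        current += entry
    if current:
        blocks.append(bytes(current))
    return blocks


def unpack_block(block: bytes) -> list[bytes]:
    """Walk the length prefixes to split a block back into records."""
    records, offset = [], 0
    while offset < len(block):
        (size,) = struct.unpack_from("<H", block, offset)
        records.append(block[offset + 2: offset + 2 + size])
        offset += 2 + size
    return records


recs = [b"111222333 John 24", b"444222111 Charles 43"]
blocks = pack_blocks(recs)
print(len(blocks), [unpack_block(b) for b in blocks])
# 2 [[b'111222333 John 24'], [b'444222111 Charles 43']]
```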
Spanned versus unspanned
• Unspanned
– Each record belongs to exactly one block
• Spanned
– Records may be stored across multiple blocks
– Saves space
– The only way to deal with large records and fields: BLOBs, images, …
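The space saving is easy to quantify. A sketch with assumed sizes (1,000 records of 300 bytes in 1,024-byte blocks) that ignores the per-fragment headers a real spanned layout needs:

```python
import math


def blocks_unspanned(n_records: int, record_size: int, block_size: int) -> int:
    """Each record must fit wholly inside one block."""
    if record_size > block_size:
        raise ValueError("record larger than a block: only spanning can store it")
    per_block = block_size // record_size
    return math.ceil(n_records / per_block)


def blocks_spanned(n_records: int, record_size: int, block_size: int) -> int:
    """Records may continue into the next block, so no block space is left unused."""
    return math.ceil(n_records * record_size / block_size)


print(blocks_unspanned(1_000, 300, 1_024), blocks_spanned(1_000, 300, 1_024))  # 334 293
```

Unspanned, each block fits three records and wastes 124 bytes; a 5,000-byte record could not be stored at all without spanning.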
Heap versus Sorted Files
• Heap files
– There is no order in the file.
– New blocks (records) are inserted at the end of the file.
• Sorted files
– Order blocks (and records) based on some key.
– Blocks are physically contiguous or linked to the next blocks.
Average Cost of Data Operations
• Insertion
– Heap files are more efficient.
– Overflow areas for sorted files.
• Search for a record
– Sorted files are more efficient.
• Search for a range of records
– Sorted files are more efficient.
• Deletion
– Heap files are more efficient
– Although we find the record faster in the sorted file.
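Counting probes for a single-key search shows why sorted files win on lookups. A sketch on hypothetical data that counts records examined; in a file, each probe would cost up to one block read.

```python
from typing import Optional


def heap_lookup(records: list[tuple], key) -> tuple[Optional[tuple], int]:
    """Heap file: no usable order, so scan until the key turns up."""
    for probes, rec in enumerate(records, start=1):
        if rec[0] == key:
            return rec, probes
    return None, len(records)


def sorted_lookup(records: list[tuple], key) -> tuple[Optional[tuple], int]:
    """Sorted file: binary search on the key, halving the range each probe."""
    lo, hi, probes = 0, len(records) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if records[mid][0] == key:
            return records[mid], probes
        if records[mid][0] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, probes


records = [(k, f"name{k}") for k in range(1_024)]   # hypothetical, sorted by key
print(heap_lookup(records, 700)[1], sorted_lookup(records, 700)[1])  # 701 10
```

The flip side is insertion: the heap file simply appends, while the sorted file must find the right position and shift records or use an overflow area.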
Indirection
• The address of a record on the disk
• Physical address
– Device ID, Cylinder #, Track #, …
• Map logical addresses to physical addresses
– Flexible in moving records for insertion and deletion
– Costly lookup
– Many options in between, tradeoff
Figure: a map table with columns Rec ID (logical address) and Physical Address, pointing to the physical address on disk.
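One level of indirection in a sketch: a table from logical record id to physical address. The class names and the offset field are hypothetical; the point is that moving a record updates one table entry, while every access pays an extra lookup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalAddress:
    device_id: int
    cylinder: int
    track: int
    offset: int   # position within the track (assumed field)


class LogicalAddressMap:
    """Logical record id -> physical address."""

    def __init__(self) -> None:
        self._table: dict[int, PhysicalAddress] = {}

    def place(self, rec_id: int, addr: PhysicalAddress) -> None:
        self._table[rec_id] = addr

    def lookup(self, rec_id: int) -> PhysicalAddress:
        # The extra step every access pays: the "costly lookup".
        return self._table[rec_id]

    def move(self, rec_id: int, new_addr: PhysicalAddress) -> None:
        # Only this entry changes; anything that refers to rec_id stays valid.
        self._table[rec_id] = new_addr


addresses = LogicalAddressMap()
addresses.place(7, PhysicalAddress(0, 12, 3, 512))
addresses.move(7, PhysicalAddress(0, 40, 1, 0))
print(addresses.lookup(7))
# PhysicalAddress(device_id=0, cylinder=40, track=1, offset=0)
```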
Block Header
• Data about block
• File, relation, DB IDs
• Block ID and type
• Record directory
• Pointer to free space
• Timestamp
• Other info …
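Two of these header fields, the record directory and the pointer to free space, are the core of what is often called a slotted page. A simplified sketch: the header lives in Python attributes rather than being serialized into the block's bytes, and records simply fill the block from the front.

```python
class SlottedBlock:
    """A block whose header keeps a record directory and a free-space pointer."""

    def __init__(self, block_id: int, size: int = 4096) -> None:
        self.block_id = block_id
        self.data = bytearray(size)
        self.directory: list[tuple[int, int]] = []   # slot -> (offset, length)
        self.free_ptr = 0                            # start of free space

    def insert(self, record: bytes) -> int:
        """Copy record into free space and return its slot number."""
        end = self.free_ptr + len(record)
        if end > len(self.data):
            raise ValueError("block is full")
        self.data[self.free_ptr:end] = record
        self.directory.append((self.free_ptr, len(record)))
        self.free_ptr = end
        return len(self.directory) - 1

    def get(self, slot: int) -> bytes:
        offset, length = self.directory[slot]
        return bytes(self.data[offset:offset + length])


block = SlottedBlock(block_id=1, size=64)
s0 = block.insert(b"111222333|John|24")
s1 = block.insert(b"444222111|Charles|43")
print(s1, block.get(s1), block.free_ptr)  # 1 b'444222111|Charles|43' 37
```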
Row and Column Stores
• We have talked about row stores
– All fields of a record are stored together.
SSN1 Name1 Age1 Salary1
SSN2 Name2 Age2 Salary2
SSN3 Name3 Age3 Salary3
Row and Column Stores
• We can store the fields in columns.
– We can store SSNs implicitly.
SSN1 Name1  SSN2 Name2  SSN3 Name3
SSN1 Age1  SSN2 Age2  SSN3 Age3
SSN1 Salary1  SSN2 Salary2  SSN3 Salary3
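Storing SSNs implicitly means relying on position: entry i of every column belongs to record i. A sketch with hypothetical Person(SSN, Name, Age, Salary) data kept in Python lists:

```python
# Hypothetical Person(SSN, Name, Age, Salary) rows in row-store order.
rows = [
    ("111222333", "John", 24, 5000),
    ("444222111", "Charles", 43, 7000),
    ("555666777", "Amy", 31, 6000),
]
columns = ["SSN", "Name", "Age", "Salary"]


def to_column_store(rows: list[tuple], names: list[str]) -> dict[str, list]:
    """One list per attribute; position i in every list belongs to record i."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def reconstruct(store: dict[str, list], i: int) -> tuple:
    """Rebuild record i by reading position i from every column."""
    return tuple(col[i] for col in store.values())


store = to_column_store(rows, columns)
print(store["Salary"])        # [5000, 7000, 6000]
print(reconstruct(store, 1))  # ('444222111', 'Charles', 43, 7000)
```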
Row versus column store
• Column store
– Compact storage
– Faster reads for data analysis and mining operations
• Row store
– Faster writes
– Faster reads for record access (OLTP)
• Further reading
– Mike Stonebraker et al., “C-Store: A Column-oriented DBMS”, VLDB’05.
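A rough sense of why analysis queries favor column stores: count the bytes scanned for something like SELECT AVG(Salary) FROM Person. The fixed field widths are assumed, and the sketch ignores compression, which usually helps column stores further.

```python
# Assumed fixed widths in bytes for Person(SSN, Name, Age, Salary).
WIDTHS = {"SSN": 9, "Name": 20, "Age": 4, "Salary": 8}


def bytes_scanned_row_store(n_rows: int) -> int:
    """A row store reads whole records even if the query needs one field."""
    return n_rows * sum(WIDTHS.values())


def bytes_scanned_column_store(n_rows: int, needed: list[str]) -> int:
    """A column store reads only the columns the query touches."""
    return n_rows * sum(WIDTHS[c] for c in needed)


n = 1_000_000
print(bytes_scanned_row_store(n), bytes_scanned_column_store(n, ["Salary"]))
# 41000000 8000000
```

For fetching one complete record the comparison reverses: the row store reads one contiguous 41-byte record, while the column store must touch four separate columns.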