CSC 261/461 –Database Systems Lecture 16 · • Chapter 16 (Disk Storage, File Structure and...
CSC 261/461 – Database Systems, Lecture 16
Spring 2018
The IO Model & External Sorting
Today’s Lecture
• Chapter 16 (Disk Storage, File Structure and Hashing)
• Chapter 17 (Indexing)
These chapters cover a lot of detail, and it is not possible to cover everything in class.
So please study as much as you can.
Simplified Database System Environment
What you will learn about in this section
1. Storage and memory model
2. Buffer
1. THE BUFFER
High-level: Disk vs. Main Memory
• Disk:
– Slow
• Sequential access (although sequential reads are fast)
– Durable
• We will assume that once on disk, data is safe!
– Cheap
High-level: Disk vs. Main Memory
• Random Access Memory (RAM) or Main Memory:
– Fast
• Random access, byte addressable
• ~10x faster for sequential access
• ~100,000x faster for random access!
– Volatile
• Data can be lost if e.g. crash occurs, power goes out, etc!
– Expensive
• For $100, get 16GB of RAM vs. 2TB of disk!
High-level: Disk vs. Main Memory
• Keep in mind the tradeoffs here as motivation for the mechanisms we introduce
– Main memory: fast but limited capacity, volatile
– vs.
– Disk: slow but large capacity, durable
How do we use both effectively while ensuring certain critical guarantees?
Hardware Description of Disk Devices
• Information is stored on a disk surface in concentric circles called tracks
• Tracks with the same diameter on the various surfaces form a cylinder
• Tracks are divided into sectors
• The OS divides a track into equal-sized disk blocks (pages)
– One page = one or more sectors
A Simplified Filesystem Model
• For us, a page is a fixed-sized array of memory
– One (or more) disk blocks
– Interface:
• write to an entry (called a slot) or set to “None”
• And a file is a variable-length list of pages
– Interface: create / open / close; next_page(); etc.
[Figure: a file on disk drawn as a list of pages; each page shown holds the slots 1, 0, 3]
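The page and file interfaces described on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the class names `Page` and `File`, the `append_page` helper, and the use of integers as slot values are hypothetical and do not come from the slides or the textbook.

```python
from typing import List, Optional


class Page:
    """A fixed-size array of slots; an empty slot holds None."""

    def __init__(self, num_slots: int) -> None:
        self.slots: List[Optional[int]] = [None] * num_slots

    def write(self, slot: int, value: Optional[int]) -> None:
        # Write a value to a slot, or pass None to clear the slot.
        self.slots[slot] = value


class File:
    """A variable-length list of pages with a simple read cursor."""

    def __init__(self) -> None:
        self.pages: List[Page] = []
        self._cursor = 0

    def append_page(self, page: Page) -> None:
        # Hypothetical helper: grow the file by one page.
        self.pages.append(page)

    def open(self) -> None:
        # Reset the cursor to the first page.
        self._cursor = 0

    def next_page(self) -> Optional[Page]:
        # Return the next page, or None when the file is exhausted.
        if self._cursor >= len(self.pages):
            return None
        page = self.pages[self._cursor]
        self._cursor += 1
        return page


# Two pages that each hold the slots 1, 0, 3.
f = File()
for _ in range(2):
    p = Page(3)
    for i, v in enumerate([1, 0, 3]):
        p.write(i, v)
    f.append_page(p)
f.open()
first = f.next_page()
```

Keeping the page size fixed while letting the file grow page by page is what allows the buffer, introduced next, to move data in uniform units.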
The Buffer
[Figure: a buffer in main memory, sitting above the disk]
• Transfer of data between main memory and disk takes place in units of disk blocks.
• The hardware address of a block is a combination of a cylinder number, track number, and block number.
• A buffer is a region of physical memory used to store a single block.
• Sometimes, several contiguous blocks can be copied into a cluster
– In this lecture: We will mostly not distinguish between a buffer and a cluster.
• Key idea: Reading / writing to disk is slow: need to cache data!
The (Simplified) Buffer
• In this class: We’ll consider a buffer located in main memory that operates over pages and files:
• Read(page): Read page from disk -> buffer if not already in buffer
– Processes can then read from / write to the page in the buffer
• Flush(page): Evict page from buffer & write to disk
• Release(page): Evict page from buffer without writing to disk
[Figure: a page holding 1,0,3 is read from disk into the buffer, where a process changes it to 1,2,3]
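The three buffer operations can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the `Buffer` class, the dictionary standing in for the disk, and the page id `"p1"` are hypothetical, and a real DBMS buffer works on fixed-size frames rather than Python lists.

```python
from typing import Dict, List

# Hypothetical stand-in for the disk: page id -> page contents.
disk: Dict[str, List[int]] = {"p1": [1, 0, 3]}


class Buffer:
    def __init__(self, disk: Dict[str, List[int]]) -> None:
        self.disk = disk
        self.pages: Dict[str, List[int]] = {}
        self.disk_reads = 0
        self.disk_writes = 0

    def read(self, page_id: str) -> List[int]:
        # Copy the page from disk only if it is not already buffered.
        if page_id not in self.pages:
            self.pages[page_id] = list(self.disk[page_id])
            self.disk_reads += 1
        return self.pages[page_id]

    def flush(self, page_id: str) -> None:
        # Evict the page and write its contents back to disk.
        self.disk[page_id] = self.pages.pop(page_id)
        self.disk_writes += 1

    def release(self, page_id: str) -> None:
        # Evict the page without writing; in-buffer changes are lost.
        self.pages.pop(page_id)


buf = Buffer(disk)
page = buf.read("p1")
page[1] = 2            # a process modifies the buffered copy: 1,0,3 -> 1,2,3
buf.read("p1")         # already buffered, so no second disk read
buf.flush("p1")        # disk now holds 1,2,3

buf.read("p1")[0] = 9  # another change ...
buf.release("p1")      # ... is discarded; disk still holds 1,2,3
```

The counters make the point of the slide visible: repeated reads of a buffered page cost no disk IO, and only a flush makes a change durable.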
Managing Disk: The DBMS Buffer
• Database maintains its own buffer
– Why? The OS already does this…
– DB knows more about access patterns.
– Recovery and logging require ability to flush to disk.
The Buffer Manager
• A buffer manager handles supporting operations for the buffer:
– Primarily, handles & executes the “replacement policy”
• i.e. finds a page in buffer to flush/release if buffer is full and a new page needs to be read in
– DBMSs typically implement their own buffer management routines
Use of Two Buffers
[Figure not recovered; only the buffer label “B” survives]
Buffer Replacement Strategies
• Least recently used (LRU)
• Clock policy
• First-in-first-out (FIFO)
• Refer to Section 16.3.2 for details
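As one concrete case, the LRU policy can be sketched with Python's `collections.OrderedDict`. This is a minimal illustration, not the textbook's implementation: the class name `LRUBufferPool`, the `access` method, and the integer page ids are assumptions, and the sketch tracks only which pages are buffered, not their contents.

```python
from collections import OrderedDict
from typing import List


class LRUBufferPool:
    """Tracks which page ids are buffered; evicts the least recently used."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.frames: "OrderedDict[int, None]" = OrderedDict()
        self.evicted: List[int] = []

    def access(self, page_id: int) -> bool:
        """Return True on a buffer hit, False on a miss."""
        if page_id in self.frames:
            self.frames.move_to_end(page_id)  # mark as most recently used
            return True
        if len(self.frames) >= self.capacity:
            victim, _ = self.frames.popitem(last=False)  # least recently used
            self.evicted.append(victim)
        self.frames[page_id] = None
        return False


pool = LRUBufferPool(capacity=3)
hits = [pool.access(p) for p in [1, 2, 3, 1, 4, 2]]
# Pages 2 and then 3 are evicted; page 1 survives because it was re-used.
```

Under FIFO the first eviction would instead be page 1, the page that entered the buffer first, regardless of its later re-use.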
Records and Files
• Data is usually stored in the form of records
• Each record consists of a collection of related data values or items.
– Records usually describe entities
File Types
• Unordered Records (Heap Files)
• Ordered Records (Sorted Files)
Heap Files
• Insertion (of a record):
– Very efficient.
– Last disk block is copied into a buffer
– New record is added
– Block is rewritten back to disk
• Searching:
– Linear search
• Deletion:
– Rewrite the block after deleting the record (leaving empty space in it), or
– Use a deletion marker
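The heap-file operations can be sketched as follows. This is a simplified model under assumptions: the `HeapFile` class, the block capacity of 2 records, and the use of `None` as the deletion marker are all hypothetical choices made for illustration.

```python
from typing import List, Optional, Tuple

BLOCK_CAPACITY = 2  # hypothetical number of records per block
DELETED = None      # hypothetical deletion marker


class HeapFile:
    def __init__(self) -> None:
        self.blocks: List[List[Optional[int]]] = []

    def insert(self, record: int) -> None:
        # Only the last block is touched: append there, or start a new block.
        if not self.blocks or len(self.blocks[-1]) == BLOCK_CAPACITY:
            self.blocks.append([])
        self.blocks[-1].append(record)

    def search(self, record: int) -> Tuple[Optional[int], int]:
        """Linear search; returns (block index or None, blocks accessed)."""
        for accessed, block in enumerate(self.blocks, start=1):
            if record in block:
                return accessed - 1, accessed
        return None, len(self.blocks)

    def delete(self, record: int) -> bool:
        # Replace the record with a deletion marker instead of compacting.
        idx, _ = self.search(record)
        if idx is None:
            return False
        block = self.blocks[idx]
        block[block.index(record)] = DELETED
        return True


hf = HeapFile()
for r in [6, 2, 8, 4, 3]:
    hf.insert(r)
```

Insertion touches one block, while an unsuccessful search has to read every block. In this sketch marked slots are never reused, so a real system would periodically reorganize the file to reclaim that space.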
Sorted Files
• Physically sort the records of a file
– Based on the values of one of the fields (the ordering field)
– Ordered and sequential file
• Searching:
– Can perform Binary Search.
• Insertion and Deletion:
– Expensive
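Binary search on a sorted file works at the granularity of blocks, because a block is the unit read from disk. The sketch below counts block accesses; the function name `binary_search_blocks` and the layout of 8 blocks with 2 records each are assumptions for illustration, and every block is assumed to be non-empty.

```python
from typing import List, Tuple


def binary_search_blocks(blocks: List[List[int]], key: int) -> Tuple[bool, int]:
    """Binary search over sorted blocks; returns (found, blocks accessed)."""
    lo, hi = 0, len(blocks) - 1
    accessed = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        block = blocks[mid]  # one block access
        accessed += 1
        if key < block[0]:
            hi = mid - 1
        elif key > block[-1]:
            lo = mid + 1
        else:
            return key in block, accessed
    return False, accessed


# 8 blocks of 2 records each, sorted on the ordering field.
blocks = [[2 * i, 2 * i + 1] for i in range(8)]
found, accessed = binary_search_blocks(blocks, 13)
```

Here the key 13 sits in the seventh block: binary search reaches it in 3 block accesses, where a linear scan from the start would read 7 blocks.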
Average Access Times for a File of b Blocks under Basic File Organizations
2. EXTERNAL MERGE & SORT
Challenge: Merging Big Files with Small Memory
How do we efficiently merge two sorted files when both are much larger than our main memory buffer?
External Merge Algorithm
• Input: 2 sorted lists of length M and N (in pages)
• Output: 1 sorted list of length M + N
• Required: At least 3 Buffer Pages
• IOs: 2(M+N)
STOP! Think about the solution before you proceed!
The idea is the same as the merge step in Merge sort.
Recap: Merge Sort
A: 6 2 8 4 3 7 5 1
Merge-Sort(A, 0, 7), divide
  Merge-Sort(A, 0, 3), divide
    Merge-Sort(A, 0, 1), divide
      Merge-Sort(A, 0, 0), base case, return
      Merge-Sort(A, 1, 1), base case, return
      Merge(A, 0, 0, 1): 6 2 -> 2 6
    Merge-Sort(A, 0, 1), return
    Merge-Sort(A, 2, 3), divide
      Merge-Sort(A, 2, 2), base case, return
      Merge-Sort(A, 3, 3), base case, return
      Merge(A, 2, 2, 3): 8 4 -> 4 8
    Merge-Sort(A, 2, 3), return
    Merge(A, 0, 1, 3): 2 6 4 8 -> 2 4 6 8
  Merge-Sort(A, 0, 3), return
  Merge-Sort(A, 4, 7): same steps on 3 7 5 1, ending with Merge(A, 4, 5, 7) -> 1 3 5 7
  Merge-Sort(A, 4, 7), return
  Merge(A, 0, 3, 7): 2 4 6 8 1 3 5 7 -> 1 2 3 4 5 6 7 8
Merge-Sort(A, 0, 7), done!
CSC172, Spring 2018
Merge-Sort: Merge
[Figure: the sorted first part A[left..middle] and the sorted second part, ending at A[right], are merged into one sorted range of A]
Merge-Sort: Merge Example
Temporary arrays L: 2 3 7 8 and R: 1 4 5 6 are merged back into A (i indexes L, j indexes R, k indexes A):
k=0: i=0, j=0, A[0] = 1 (from R)
k=1: i=0, j=1, A[1] = 2 (from L)
k=2: i=1, j=1, A[2] = 3 (from L)
k=3: i=2, j=1, A[3] = 4 (from R)
k=4: i=2, j=2, A[4] = 5 (from R)
k=5: i=2, j=3, A[5] = 6 (from R)
k=6: i=2, j=4, A[6] = 7 (from L)
k=7: i=3, j=4, A[7] = 8 (from L)
Done: i=4, j=4, k=8, A: 1 2 3 4 5 6 7 8
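The recap can be condensed into a short in-memory implementation. This is a sketch that follows the slides' call signatures, Merge-Sort(A, left, right) and Merge(A, left, middle, right); the choice of `(left + right) // 2` as the midpoint is an assumption, although it reproduces the calls shown in the trace, such as Merge(A, 0, 1, 3) and Merge(A, 4, 5, 7).

```python
from typing import List


def merge(a: List[int], left: int, middle: int, right: int) -> None:
    """Merge sorted a[left..middle] and a[middle+1..right] in place."""
    l_part = a[left:middle + 1]       # temporary array L
    r_part = a[middle + 1:right + 1]  # temporary array R
    i = j = 0
    k = left
    while i < len(l_part) and j < len(r_part):
        # Compare only the current minimum of each part.
        if l_part[i] <= r_part[j]:
            a[k] = l_part[i]
            i += 1
        else:
            a[k] = r_part[j]
            j += 1
        k += 1
    # Copy whatever remains in the part that is not yet exhausted.
    a[k:right + 1] = l_part[i:] + r_part[j:]


def merge_sort(a: List[int], left: int, right: int) -> None:
    if left >= right:  # base case: zero or one element
        return
    middle = (left + right) // 2
    merge_sort(a, left, middle)
    merge_sort(a, middle + 1, right)
    merge(a, left, middle, right)


A = [6, 2, 8, 4, 3, 7, 5, 1]
merge_sort(A, 0, 7)

# The merge example: L = 2 3 7 8 and R = 1 4 5 6 stored side by side.
B = [2, 3, 7, 8, 1, 4, 5, 6]
merge(B, 0, 3, 7)
```

The merge loop only ever looks at the front of each part, which is the property the external merge algorithm exploits to work with one page per input.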
Key (Simple) Idea
To find an element that is no larger than all elements in two lists, one only needs to compare minimum elements from each list.
If: $A_1 \le A_2 \le \cdots \le A_N$ and $B_1 \le B_2 \le \cdots \le B_M$
Then: $\min(A_1, B_1) \le A_i$ and $\min(A_1, B_1) \le B_j$
for $i = 1, \ldots, N$ and $j = 1, \ldots, M$
External Merge Algorithm
Input: Two sorted files. Output: One merged sorted file.
[Figure walkthrough: the main-memory buffer holds one page per input file plus one output page]
F1 on disk: 1,5 | 7,11 | 20,31
F2 on disk: 2,22 | 23,24 | 25,30
1. Read the first page of each file into the buffer: 1,5 (F1) and 2,22 (F2).
2. Move the two smallest values into the output page: the output page holds 1,2 and the input pages hold 5 and 22.
3. Write the output page 1,2 to disk.
4. The buffer now holds only 5 and 22. This is all the algorithm “sees”… Which file to load a page from next?
5. We know that F2 only contains values ≥ 22… so we should load from F1!
6. Read 7,11 from F1; the output page becomes 5,7 and is written to disk, leaving 11 and 22 in the buffer.
And so on…
We can merge lists of arbitrary length with only 3 buffer pages.
If the lists have M and N pages, then Cost: 2(M+N) IOs
Each page is read once, written once
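The algorithm and its IO count can be checked with a small simulation. This is a sketch under stated assumptions: files are modeled as Python lists of pages, `PAGE_SIZE = 2` matches the two-value pages in the example, the function name `external_merge` is hypothetical, and an input page is refilled only once it is empty, which is a slight simplification of the order shown in the walkthrough.

```python
from typing import List, Tuple

PAGE_SIZE = 2  # values per page, as in the example files


def external_merge(
    f1: List[List[int]], f2: List[List[int]]
) -> Tuple[List[List[int]], int]:
    """Merge two sorted paged files using one input page per file
    plus one output page. Returns (output pages, IO count)."""
    files = [f1, f2]
    next_page = [0, 0]                  # next page to read from each file
    inputs: List[List[int]] = [[], []]  # the two input buffer pages
    output: List[int] = []              # the output buffer page
    out_file: List[List[int]] = []
    ios = 0

    while True:
        # Refill an empty input page if its file still has pages on disk.
        for f in (0, 1):
            if not inputs[f] and next_page[f] < len(files[f]):
                inputs[f] = list(files[f][next_page[f]])
                next_page[f] += 1
                ios += 1  # one read
        live = [f for f in (0, 1) if inputs[f]]
        if not live:
            break
        # Compare only the minimum element of each input page.
        src = min(live, key=lambda f: inputs[f][0])
        output.append(inputs[src].pop(0))
        if len(output) == PAGE_SIZE:
            out_file.append(output)  # write the full output page
            output = []
            ios += 1  # one write

    if output:  # write a final, partially filled page
        out_file.append(output)
        ios += 1
    return out_file, ios


F1 = [[1, 5], [7, 11], [20, 31]]
F2 = [[2, 22], [23, 24], [25, 30]]
merged, ios = external_merge(F1, F2)
```

With M = N = 3 pages the simulation performs 6 reads and 6 writes, which agrees with the 2(M+N) = 12 IOs stated above, and at no point does it hold more than three pages in memory.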
Acknowledgement
• Some of the slides in this presentation are taken from the slides provided by the authors.
• Many of these slides are taken from the cs145 course offered by Stanford University.