CSC 261/461 Database Systems Lecture 16 Spring 2018


The IO Model & External Sorting


Today’s Lecture

• Chapter 16 (Disk Storage, File Structure and Hashing)
• Chapter 17 (Indexing)

These chapters cover a lot of details, and it's not possible to cover everything in class.

So please study them on your own as much as you can.


Simplified Database System Environment


What you will learn about in this section

1. Storage and memory model

2. Buffer


1. THE BUFFER


High-level: Disk vs. Main Memory

• Disk:
  – Slow
    • Sequential access (although sequential reads are fast)
  – Durable
    • We will assume that once on disk, data is safe!
  – Cheap

• Random Access Memory (RAM) or Main Memory:
  – Fast
    • Random access, byte addressable
    • ~10x faster for sequential access
    • ~100,000x faster for random access!
  – Volatile
    • Data can be lost if, e.g., a crash occurs or the power goes out!
  – Expensive
    • For $100, get 16GB of RAM vs. 2TB of disk!

• Keep in mind the tradeoffs here as motivation for the mechanisms we introduce:
  – Main memory: fast but limited capacity, volatile
  vs.
  – Disk: slow but large capacity, durable

How do we effectively utilize both while ensuring certain critical guarantees?


Hardware Description of Disk Devices

• Information is stored on a disk surface in concentric circles (tracks)

• Tracks with the same diameter on the various surfaces form a cylinder

• Tracks are divided into sectors
  – The OS divides a track into equal-sized disk blocks (pages)
  – One page = one or more sectors


A Simplified Filesystem Model

• For us, a page is a fixed-size array of memory
  – One (or more) disk blocks
  – Interface:
    • write to an entry (called a slot) or set it to “None”

• And a file is a variable-length list of pages
  – Interface: create / open / close; next_page(); etc.

[Figure: a file on disk made up of pages, e.g. a page holding the values 1,0,3]
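The page/file model can be sketched in a few lines of Python. This is a hypothetical illustration, not an interface defined in the lecture: the class names `Page` and `File`, the `num_slots` parameter, `append_page`, and the in-memory list standing in for disk are all assumptions; "create" is modeled as the constructor, and only `next_page()` mirrors a name from the slide.

```python
from typing import Any, List, Optional


class Page:
    """A fixed-size array of slots; each slot holds a value or None (empty)."""

    def __init__(self, num_slots: int) -> None:
        self.slots: List[Optional[Any]] = [None] * num_slots

    def write(self, slot: int, value: Optional[Any]) -> None:
        # Writing None clears the slot, matching "set to None" on the slide.
        self.slots[slot] = value


class File:
    """A variable-length list of pages, scanned in order with next_page()."""

    def __init__(self, name: str) -> None:  # "create"
        self.name = name
        self.pages: List[Page] = []
        self._cursor = 0
        self.is_open = False

    def open(self) -> None:
        self.is_open = True
        self._cursor = 0  # start scanning from the first page

    def close(self) -> None:
        self.is_open = False

    def append_page(self, page: Page) -> None:
        self.pages.append(page)

    def next_page(self) -> Optional[Page]:
        # Return the next page, or None once the file is exhausted.
        if not self.is_open:
            raise RuntimeError("file is not open")
        if self._cursor >= len(self.pages):
            return None
        page = self.pages[self._cursor]
        self._cursor += 1
        return page
```

For example, writing 1, 0, 3 into a 3-slot page, appending it to a file, and calling `open()` then `next_page()` returns that page; a second `next_page()` returns None.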


The Buffer

[Figure: Disk, Main Memory, and a Buffer in main memory]

• Transfer of data between main memory and disk takes place in units of disk blocks.

• The hardware address of a block is a combination of a cylinder number, track number, and block number.

• A buffer is a region of physical memory used to store a single block.

• Sometimes, several contiguous blocks can be copied into a cluster.
  – In this lecture: We will mostly not distinguish between a buffer and a cluster.

• Key idea: Reading / writing to disk is slow, so we need to cache data!

The (Simplified) Buffer

[Figure: a Buffer in Main Memory; a page holding 1,0,3 is read from Disk into the buffer, and its middle slot is changed to 2 there, giving 1,2,3]

• In this class: We'll consider a buffer located in main memory that operates over pages and files:

• Read(page): Read page from disk -> buffer if not already in buffer
  – Processes can then read from / write to the page in the buffer

• Flush(page): Evict page from buffer & write to disk

• Release(page): Evict page from buffer without writing to disk
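The three buffer operations can be sketched as follows. This is a minimal sketch under stated assumptions: a Python dict stands in for disk (page id → list of slot values), buffer capacity is ignored, and the names `SimpleBuffer`, `frames`, and `io_count` are hypothetical.

```python
from typing import Dict, List


class SimpleBuffer:
    """Hypothetical buffer over a dict that stands in for disk."""

    def __init__(self, disk: Dict[int, List[int]]) -> None:
        self.disk = disk                        # page_id -> page contents "on disk"
        self.frames: Dict[int, List[int]] = {}  # pages currently in the buffer
        self.io_count = 0                       # disk reads + writes performed

    def read(self, page_id: int) -> List[int]:
        # Read page from disk -> buffer only if it is not already buffered.
        if page_id not in self.frames:
            # Copy, so changes made in memory do not reach "disk" until a flush.
            self.frames[page_id] = list(self.disk[page_id])
            self.io_count += 1
        return self.frames[page_id]

    def flush(self, page_id: int) -> None:
        # Evict the page and write its (possibly modified) contents to disk.
        self.disk[page_id] = self.frames.pop(page_id)
        self.io_count += 1

    def release(self, page_id: int) -> None:
        # Evict the page without writing; any in-memory changes are lost.
        del self.frames[page_id]
```

Mirroring the slide figure: reading a page holding [1, 0, 3], setting its middle slot to 2, and calling `flush` leaves [1, 2, 3] on disk, while calling `release` instead would leave the disk copy at [1, 0, 3].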

Managing Disk: The DBMS Buffer

• The database maintains its own buffer
  – Why? The OS already does this…
  – The DB knows more about its access patterns.
  – Recovery and logging require the ability to flush to disk.


The Buffer Manager

• A buffer manager handles supporting operations for the buffer:
  – Primarily, it handles and executes the “replacement policy”
    • i.e., it finds a page in the buffer to flush/release if the buffer is full and a new page needs to be read in
  – DBMSs typically implement their own buffer management routines


Use of Two Buffers

[Figure: two buffers used when reading blocks from disk]


Buffer Replacement Strategies

• Least recently used (LRU)

• Clock policy

• First-in-first-out (FIFO)

• Refer to Section 16.3.2 for details
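Of the three strategies, LRU is the simplest to sketch. The pool below is hypothetical (`LRUBufferPool`, `load_page`, and `evicted` are illustrative names); it relies on the standard `collections.OrderedDict`, whose `move_to_end` and `popitem(last=False)` methods keep entries in least- to most-recently-used order. A real buffer manager would also flush a dirty victim before evicting it, which this sketch only notes in a comment.

```python
from collections import OrderedDict
from typing import Callable, List


class LRUBufferPool:
    """Fixed number of frames; evicts the least recently used page when full."""

    def __init__(self, capacity: int, load_page: Callable[[int], List[int]]) -> None:
        self.capacity = capacity
        self.load_page = load_page  # hypothetical callback that reads a page from disk
        self.frames: "OrderedDict[int, List[int]]" = OrderedDict()
        self.evicted: List[int] = []  # eviction order, kept for illustration

    def get(self, page_id: int) -> List[int]:
        if page_id in self.frames:
            # Hit: mark the page as most recently used by moving it to the end.
            self.frames.move_to_end(page_id)
            return self.frames[page_id]
        if len(self.frames) >= self.capacity:
            # Miss with a full pool: the first entry is the least recently used.
            victim, _ = self.frames.popitem(last=False)
            self.evicted.append(victim)  # a real manager would flush it if dirty
        page = self.load_page(page_id)
        self.frames[page_id] = page
        return page
```

With two frames, accessing pages 1, 2, 1, 3 evicts page 2, because page 1 was touched more recently.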


Records and Files

• Data is usually stored in the form of records

• Each record consists of a collection of related data values or items.
  – Records usually describe entities


File Types

• Unordered Records (Heap Files)

• Ordered Records (Sorted Files)


Heap Files

• Insertion (of a record):
  – Very efficient
  – The last disk block is copied into a buffer
  – The new record is added
  – The block is rewritten back to disk

• Searching:
  – Linear search

• Deletion:
  – Delete the record and rewrite its block (leaving an empty slot), or
  – Use a deletion marker
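The heap-file operations above can be sketched with blocks modeled as Python lists. This is a hypothetical model: `HeapFile`, the blocking factor `RECORDS_PER_BLOCK = 2`, and the `DELETED` sentinel are assumptions for illustration, and "copy the block into a buffer / rewrite it" is represented simply by modifying the last list.

```python
from typing import Any, Callable, List

RECORDS_PER_BLOCK = 2  # assumed blocking factor, kept small for illustration
DELETED = object()     # deletion marker: the slot is kept but holds no record


class HeapFile:
    """Unordered file: a list of blocks, each a list of records (hypothetical model)."""

    def __init__(self) -> None:
        self.blocks: List[List[Any]] = []

    def insert(self, record: Any) -> None:
        # Only the last block is involved: copy it to a buffer, add, rewrite it.
        if not self.blocks or len(self.blocks[-1]) == RECORDS_PER_BLOCK:
            self.blocks.append([])  # last block is full, so start a new one
        self.blocks[-1].append(record)

    def search(self, predicate: Callable[[Any], bool]) -> Any:
        # Linear search: scan every block in order until a match is found.
        for block in self.blocks:
            for record in block:
                if record is not DELETED and predicate(record):
                    return record
        return None

    def delete(self, predicate: Callable[[Any], bool]) -> bool:
        # Use a deletion marker instead of compacting the block.
        for block in self.blocks:
            for i, record in enumerate(block):
                if record is not DELETED and predicate(record):
                    block[i] = DELETED
                    return True
        return False
```

The design choice to mark rather than compact keeps deletion to a single block rewrite; the cost is wasted space that a periodic reorganization would reclaim.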


Sorted Files

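• Physically sort the records of a file
  – Based on the values of one of the fields (the ordering field)
  – Gives an ordered, sequential file

• Searching:
  – Can perform binary search on the ordering field

• Insertion and Deletion:
  – Expensive, since the records must stay in order (other records may have to be moved)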
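Binary search on a sorted file works at the block level: each probe reads one block and compares the key against that block's first and last ordering-field values, so a lookup needs about log2(b) block reads instead of the roughly b/2 a linear scan averages. The sketch below assumes records are `(key, payload)` tuples and that no block is empty; `binary_search_blocks` is a hypothetical name.

```python
from typing import List, Optional, Tuple

Record = Tuple[int, str]  # (ordering field, payload): assumed record shape


def binary_search_blocks(blocks: List[List[Record]], key: int) -> Tuple[Optional[Record], int]:
    """Find a record by its ordering field in a file sorted on that field.

    Returns (record or None, number of blocks read). Assumes no empty blocks.
    """
    lo, hi = 0, len(blocks) - 1
    blocks_read = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        block = blocks[mid]  # one block access
        blocks_read += 1
        if key < block[0][0]:
            hi = mid - 1     # key can only be in an earlier block
        elif key > block[-1][0]:
            lo = mid + 1     # key can only be in a later block
        else:
            # Key falls in this block's range: scan it in the buffer (no extra IO).
            for record in block:
                if record[0] == key:
                    return record, blocks_read
            return None, blocks_read
    return None, blocks_read
```

For a file of 16 blocks, any lookup probes at most 5 blocks (floor(log2 16) + 1).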


Average Access Times for a File of b Blocks under Basic File Organizations


2. EXTERNAL MERGE & SORT


Challenge: Merging Big Files with Small Memory

How do we efficiently merge two sorted files when both are much larger than our main memory buffer?


External Merge Algorithm

• Input: 2 sorted lists of length M and N (in pages)

• Output: 1 sorted list of length M + N

• Required: At least 3 buffer pages

• IOs: 2(M+N)

STOP! Think about the solution before you proceed! The idea is the same as the merge step in merge sort.

Recap: Merge Sort

Merge-Sort(A, 0, 7) on A: 6 2 8 4 3 7 5 1

Merge-Sort(A, 0, 7), divide: 6 2 8 4 | 3 7 5 1
  Merge-Sort(A, 0, 3), divide: 6 2 | 8 4
    Merge-Sort(A, 0, 1), divide: 6 | 2
      Merge-Sort(A, 0, 0), base case, return
      Merge-Sort(A, 1, 1), base case, return
      Merge(A, 0, 0, 1) → A: 2 6 8 4 3 7 5 1
    Merge-Sort(A, 0, 1), return
    Merge-Sort(A, 2, 3), divide: 8 | 4
      Merge-Sort(A, 2, 2), base case, return
      Merge-Sort(A, 3, 3), base case, return
      Merge(A, 2, 2, 3) → A: 2 6 4 8 3 7 5 1
    Merge-Sort(A, 2, 3), return
    Merge(A, 0, 1, 3) → A: 2 4 6 8 3 7 5 1
  Merge-Sort(A, 0, 3), return
  Merge-Sort(A, 4, 7), same steps on the second half, ending with Merge(A, 4, 5, 7) → A: 2 4 6 8 1 3 5 7
  Merge-Sort(A, 4, 7), return
  Merge(A, 0, 3, 7) → A: 1 2 3 4 5 6 7 8
Merge-Sort(A, 0, 7), done!

Merge-Sort: Merge

A[left..middle] (sorted first part) and A[middle+1..right] (sorted second part) are merged into one sorted A[left..right].

(Merge sort slides: CSC172, Spring 2018)

Merge-Sort: Merge Example

Temporary arrays: L: 2 3 7 8 and R: 1 4 5 6. At each step, the smaller of L[i] and R[j] is copied into A[k]:

  k=0: i=0, j=0 → A[0] = 1 (from R)
  k=1: i=0, j=1 → A[1] = 2 (from L)
  k=2: i=1, j=1 → A[2] = 3 (from L)
  k=3: i=2, j=1 → A[3] = 4 (from R)
  k=4: i=2, j=2 → A[4] = 5 (from R)
  k=5: i=2, j=3 → A[5] = 6 (from R)
  k=6: i=2, j=4 → A[6] = 7 (from L; R is exhausted)
  k=7: i=3, j=4 → A[7] = 8 (from L)
  k=8: i=4, j=4 → done, A: 1 2 3 4 5 6 7 8
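The recursion and the merge step walked through above can be sketched in Python, keeping the slides' inclusive-index signatures Merge-Sort(A, left, right) and Merge(A, left, middle, right). This is one standard way to write top-down merge sort, not necessarily the exact code used in the course; the snake_case names are adaptations.

```python
from typing import List


def merge(A: List[int], left: int, middle: int, right: int) -> None:
    """Merge sorted A[left..middle] and A[middle+1..right] back into A."""
    L = A[left:middle + 1]       # temporary copy of the first sorted part
    R = A[middle + 1:right + 1]  # temporary copy of the second sorted part
    i = j = 0
    k = left
    while i < len(L) and j < len(R):
        # Copy the smaller front element (ties go to L, keeping the sort stable).
        if L[i] <= R[j]:
            A[k] = L[i]
            i += 1
        else:
            A[k] = R[j]
            j += 1
        k += 1
    # One side is exhausted; copy whatever remains of the other.
    while i < len(L):
        A[k] = L[i]
        i += 1
        k += 1
    while j < len(R):
        A[k] = R[j]
        j += 1
        k += 1


def merge_sort(A: List[int], left: int, right: int) -> None:
    """Sort A[left..right] in place (inclusive bounds, as in Merge-Sort(A, 0, 7))."""
    if left >= right:  # base case: zero or one element
        return
    middle = (left + right) // 2
    merge_sort(A, left, middle)       # divide: sort the first half
    merge_sort(A, middle + 1, right)  # sort the second half
    merge(A, left, middle, right)     # combine the two sorted halves
```

With `middle = (left + right) // 2`, `merge_sort(A, 0, 7)` on 6 2 8 4 3 7 5 1 makes the same calls as the trace: Merge(A, 0, 0, 1), Merge(A, 2, 2, 3), Merge(A, 0, 1, 3), Merge(A, 4, 5, 7), and finally Merge(A, 0, 3, 7).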


Key (Simple) Idea

To find an element that is no larger than all elements in two sorted lists, one only needs to compare the minimum elements of the two lists.

If: $a_1 \le a_2 \le \cdots \le a_N$ and $b_1 \le b_2 \le \cdots \le b_M$

Then: $\min(a_1, b_1) \le a_i$ and $\min(a_1, b_1) \le b_j$

for $i = 1, \dots, N$ and $j = 1, \dots, M$

External Merge Algorithm

Input: Two sorted files (two values per page in this example)
  F1: [1,5] [7,11] [20,31]
  F2: [2,22] [23,24] [25,30]
Output: One merged sorted file

• The first page of each file is read from disk into the buffer in main memory: [1,5] from F1 and [2,22] from F2.
• Each time, the smaller front value is moved to the output page: first 1, then 2. The full output page [1,2] is written to disk.
• The buffer now holds 5 (from F1) and 22 (from F2). This is all the algorithm “sees”… Which file to load a page from next?
• We know that F2 only contains values ≥ 22… so we should load from F1! F1's next page, [7,11], is read into the buffer.
• 5 and 7 fill the next output page, [5,7]; the buffer now holds 11 (from F1) and 22 (from F2).
• And so on…

We can merge lists of arbitrary length with only 3 buffer pages.

If the lists have M and N pages, then Cost: 2(M+N) IOs

Each page is read once and written once.
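The external merge can be sketched with exactly three buffer pages: one input page per file and one output page. This is a minimal sketch under stated assumptions: a "file" is a Python list of pages (each a sorted list of ints), reading a page is modeled by taking a reference to it, and `InputBuffer` and `external_merge` are hypothetical names. A file's next page is loaded only when its buffered page is used up, which captures the slide's reasoning: the other file's remaining values are all at least its buffered head, so the next smallest value can only come from the file whose page just emptied.

```python
from typing import List, Optional, Tuple

Page = List[int]


class InputBuffer:
    """One buffer page that streams through a sorted file, one page at a time."""

    def __init__(self, file: List[Page], stats: dict) -> None:
        self.file = file
        self.stats = stats
        self.next_page = 0  # index of the next page to read from "disk"
        self.page: Page = []
        self.pos = 0        # cursor inside the buffered page
        self._refill()

    def _refill(self) -> None:
        # When the buffered page is used up, read this file's next page (one read IO).
        while self.pos >= len(self.page) and self.next_page < len(self.file):
            self.page = self.file[self.next_page]
            self.next_page += 1
            self.pos = 0
            self.stats["reads"] += 1

    def head(self) -> Optional[int]:
        return self.page[self.pos] if self.pos < len(self.page) else None

    def pop(self) -> int:
        value = self.page[self.pos]
        self.pos += 1
        self._refill()
        return value


def external_merge(f1: List[Page], f2: List[Page], page_size: int) -> Tuple[List[Page], int]:
    """Merge two sorted files using 3 buffer pages; return (merged file, total IOs)."""
    stats = {"reads": 0, "writes": 0}
    a, b = InputBuffer(f1, stats), InputBuffer(f2, stats)  # buffer pages 1 and 2
    out: List[Page] = []
    out_buf: Page = []                                      # buffer page 3 (output)
    while a.head() is not None or b.head() is not None:
        ha, hb = a.head(), b.head()
        # Key idea: the smaller of the two buffered heads is the smallest remaining value.
        if hb is None or (ha is not None and ha <= hb):
            out_buf.append(a.pop())
        else:
            out_buf.append(b.pop())
        if len(out_buf) == page_size:  # output page full: write it to disk
            out.append(out_buf)
            stats["writes"] += 1
            out_buf = []
    if out_buf:                        # write the final, partially filled page
        out.append(out_buf)
        stats["writes"] += 1
    return out, stats["reads"] + stats["writes"]
```

On the slide's example (F1 = [1,5] [7,11] [20,31], F2 = [2,22] [23,24] [25,30], two values per page), this produces six output pages and 12 IOs (6 reads + 6 writes), which matches 2(M+N) with M = N = 3. The 2(M+N) count assumes full pages, so that the output also has M+N pages.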


Acknowledgement

• Some of the slides in this presentation are taken from the slides provided by the authors.

• Many of these slides are taken from the cs145 course offered by Stanford University.