Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main...

57
Lecture 11: External Sorting Lecture 11

Transcript of Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main...

Page 1: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

Lecture11:ExternalSorting

Lecture11

Page 2: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

Announcements

1. MidtermReview:ThisFriday!

2. ProjectPart#2isout.ImplementCLOCK!

3. MidtermMaterial:EverythinguptoBuffermanagement.1. Today’slectureisnofairgame.2. Donotforgettogoovertheactivitiesaswell!

4. OurusualWednesdayupdate….

2

Lecture11

Page 3: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

Lecture11:ExternalSorting

Lecture11

Page 4: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

Whatyouwilllearnaboutinthissection

1. ExternalMerge(ofsortedfiles)

2. ExternalMerge- Sort

4

Lecture11

Page 5: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

1.ExternalMerge

5

Lecture11

Page 6: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

Challenge:MergingBigFileswithSmallMemory

Howdoweefficientlymergetwosortedfileswhenbotharemuchlargerthanourmainmemorybuffer?

Lecture11>Section1>ExternalMerge

Page 7: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

ExternalMergeAlgorithm

• Input:2sorted listsoflengthMandN

• Output: 1sortedlistoflengthM+N

• Required:Atleast3BufferPages

• IOs:2(M+N)

Lecture11>Section1>ExternalMerge

Page 8: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

Key(Simple)IdeaTofindanelementthatisnolargerthanallelementsintwolists,one

onlyneedstocompareminimumelementsfromeachlist.

Lecture11>Section1>ExternalMerge

If:𝐴" ≤ 𝐴$ ≤ ⋯ ≤ 𝐴&𝐵" ≤ 𝐵$ ≤ ⋯ ≤ 𝐵(

Then:𝑀𝑖𝑛(𝐴", 𝐵") ≤ 𝐴/𝑀𝑖𝑛(𝐴", 𝐵") ≤ 𝐵0

fori=1….Nandj=1….M

Page 9: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

ExternalMergeAlgorithm

Lecture11>Section1>ExternalMerge

7,11 20,31

23,24 25,30

Input:Twosortedfiles

Output:Onemergedsortedfile

Disk

MainMemory

Buffer1,5

2,22

F1

F2

Page 10: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

Lecture11>Section1>ExternalMerge

7,11 20,31

23,24 25,30

Disk

MainMemory

Buffer

1,5 2,22Input:Twosortedfiles

Output:Onemergedsortedfile

F1

F2

ExternalMergeAlgorithm

Page 11: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

Lecture11>Section1>ExternalMerge

7,11 20,31

23,24 25,30

Disk

MainMemory

Buffer

5 22 1,2Input:Twosortedfiles

Output:Onemergedsortedfile

F1

F2

ExternalMergeAlgorithm

Page 12: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

Lecture11>Section1>ExternalMerge

7,11 20,31

23,24 25,30

Disk

MainMemory

Buffer

5 22

1,2

Input:Twosortedfiles

Output:Onemergedsortedfile

F1

F2

ExternalMergeAlgorithm

Page 13: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

Lecture11>Section1>ExternalMerge

20,31

23,24 25,30

Disk

MainMemory

Buffer

522

1,2

Thisisallthealgorithm“sees”…Whichfiletoloadapagefromnext?

Input:Twosortedfiles

Output:Onemergedsortedfile

F1

F2

7,11

ExternalMergeAlgorithm

Page 14: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

Lecture11>Section1>ExternalMerge

20,31

23,24 25,30

Disk

MainMemory

Buffer

522

1,2

WeknowthatF2 onlycontainsvalues≥ 22…soweshouldloadfromF1!

Input:Twosortedfiles

Output:Onemergedsortedfile

F1

F2

7,11

ExternalMergeAlgorithm

Page 15: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

Lecture11>Section1>ExternalMerge

20,31

23,24 25,30

Disk

MainMemory

Buffer

522

1,2

Input:Twosortedfiles

Output:Onemergedsortedfile

F1

F27,11

ExternalMergeAlgorithm

Page 16: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

Lecture11>Section1>ExternalMerge

20,31

23,24 25,30

Disk

MainMemory

Buffer

5,722

1,2

Input:Twosortedfiles

Output:Onemergedsortedfile

F1

F211

ExternalMergeAlgorithm

Page 17: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

Lecture11>Section1>ExternalMerge

20,31

23,24 25,30

Disk

MainMemory

Buffer

5,7

22

1,2

Input:Twosortedfiles

Output:Onemergedsortedfile

F1

F211

ExternalMergeAlgorithm

Page 18: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

Lecture11>Section1>ExternalMerge

23,24 25,30

Disk

MainMemory

Buffer

5,7

22

1,2

Input:Twosortedfiles

Output:Onemergedsortedfile

F1

F211

20,31

Andsoon…

ExternalMergeAlgorithm

Page 19: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

Wecanmerge2listsofarbitrarylengthwithonly 3bufferpages.

IflistsofsizeMandN,thenCost: 2(M+N)IOs

Eachpageisreadonce,writtenonce

WithB+1bufferpages,canmergeBlists.How?

Lecture11>Section1>ExternalMerge

Page 20: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

2.ExternalMergeSort

20

Lecture11>Section2

Page 21: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

Whatyouwilllearnaboutinthissection

1. Externalmergesort(2-waysort)

2. Externalmergesortonlargerfiles

3. Optimizationsforsorting

21

Lecture11>Section2

Page 22: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

ExternalMergeAlgorithm

• Supposewewanttomergetwosorted filesbothmuchlargerthanmainmemory(i.e.thebuffer)

•Wecanusetheexternalmergealgorithm tomergefilesofarbitrarylength in2*(N+M)IOoperationswithonly3bufferpages!

Ourfirstexampleofan“IOaware”algorithm/costmodel

Lecture11>Section2>ExternalMergeSort

Page 23: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

WhyareSortAlgorithmsImportant?

• DatarequestedfromDBinsortedorderisextremelycommon• e.g.,findstudentsinincreasing GPA order

•Whynotjustusequicksortinmainmemory??• Whataboutifweneedtosort1TBofdatawith1GBofRAM…

Aclassicproblemincomputerscience!

Lecture11>Section2>ExternalMergeSort

Page 24: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

Morereasonstosort…

• Sortingusefulforeliminatingduplicatecopiesinacollectionofrecords(Why?)

• SortingisfirststepinbulkloadingB+treeindex.

• Sort-merge joinalgorithminvolvessorting

Comingup…

Comingup…

Lecture11>Section2>ExternalMergeSort

Page 25: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

Dopeoplecare?

Sortbenchmarkbearshisname

http://sortbenchmark.org

Lecture11>Section2>ExternalMergeSort

Page 26: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

ExternalMergeSort

Lecture11>Section2>ExternalMergeSort

Page 27: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

Sohowdowesortbigfiles?

1. Splitintochunkssmallenoughtosortinmemory(“runs”)

2. Merge pairs(orgroups)ofrunsusingtheexternalmergealgorithm

3. Keepmerging theresultingruns(eachtime=a“pass”)untilleftwithonesortedfile!

Lecture11>Section2>ExternalMergeSort

Page 28: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

ExternalMergeSortAlgorithm(2-waysort)

27,24 3,1

Example:• 3Bufferpages• 6-pagefile

Disk MainMemory

Buffer

18,22F

33,12 55,3144,10

1. Splitintochunkssmallenoughtosortinmemory

Lecture11>Section2>ExternalMergeSort

Orangefile=unsorted

Page 29: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

ExternalMergeSortAlgorithm(2-waysort)

27,24 3,1

Disk MainMemory

Buffer

18,22

F1

F2

33,12 55,3144,10

1. Splitintochunkssmallenoughtosortinmemory

Example:• 3Bufferpages• 6-pagefile

Lecture11>Section2>ExternalMergeSort

Orangefile=unsorted

Page 30: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

ExternalMergeSortAlgorithm(2-waysort)

27,24 3,1

Disk MainMemory

Buffer

18,22

F1

F233,12 55,3144,10

1. Splitintochunkssmallenoughtosortinmemory

Example:• 3Bufferpages• 6-pagefile

Lecture11>Section2>ExternalMergeSort

Orangefile=unsorted

Page 31: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

ExternalMergeSortAlgorithm(2-waysort)

27,24 3,1

Disk MainMemory

Buffer

18,22

F1

F231,33 44,5510,12

Example:• 3Bufferpages• 6-pagefile

1. Splitintochunkssmallenoughtosortinmemory

Lecture11>Section2>ExternalMergeSort

Orangefile=unsorted

Page 32: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

ExternalMergeSortAlgorithm(2-waysort)

Disk MainMemory

BufferF1

F2

31,33 44,5510,12

AndsimilarlyforF2

27,24 3,118,2218,22 24,271,3

1. Splitintochunkssmallenoughtosortinmemory

Example:• 3Bufferpages• 6-pagefile

Lecture11>Section2>ExternalMergeSort

Eachsortedfileisacalledarun

Page 33: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

ExternalMergeSortAlgorithm(2-waysort)

Disk MainMemory

BufferF1

F2

2.Nowjustruntheexternalmerge algorithm&we’redone!

31,33 44,5510,12

18,22 24,271,3

Example:• 3Bufferpages• 6-pagefile

Lecture11>Section2>ExternalMergeSort

Page 34: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

CalculatingIOCost

For3bufferpages,6pagefile:

1. Splitintotwo3-pagefiles andsortinmemory1. =1R+1Wforeachfile=2*(3+3)=12IOoperations

2. Merge eachpairofsortedchunksusingtheexternalmergealgorithm1. =2*(3+3)=12IOoperations

3. Totalcost=24IO

Lecture11>Section2>ExternalMergeSort

Page 35: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

RunningExternalMergeSortonLargerFiles

Disk

31,33 44,5510,12

18,43 24,2745,38

Assumewestillonlyhave3 bufferpages(Buffernotpictured)

31,33 47,5510,12

18,22 23,2041,3

31,33 39,5542,46

18,23 24,271,3

48,33 44,4010,12

18,22 24,2716,31

Lecture11>Section2>ExternalMergeSort:Largerfiles

Page 36: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

RunningExternalMergeSortonLargerFiles

Disk

31,33 44,5510,12

18,43 24,2745,38

31,33 47,5510,12

18,22 23,2041,3

31,33 39,5542,46

18,23 24,271,3

48,33 44,4010,12

18,22 24,2716,31

1.Splitintofilessmallenoughtosortinbuffer…

Assumewestillonlyhave3 bufferpages(Buffernotpictured)

Lecture11>Section2>ExternalMergeSort:Largerfiles

Page 37: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

RunningExternalMergeSortonLargerFiles

Disk

31,33 44,5510,12

27,38 43,4518,24

31,33 47,5510,12

20,22 23,413,18

39,42 46,5531,33

18,23 24,271,3

33,40 44,4810,12

22,24 27,3116,18

1.Splitintofilessmallenoughtosortinbuffer…andsort

Assumewestillonlyhave3 bufferpages(Buffernotpictured)

Lecture11>Section2>ExternalMergeSort:Largerfiles

Calleachofthesesortedfilesarun

Page 38: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

RunningExternalMergeSortonLargerFiles

Disk

31,33 44,5510,12

27,38 43,4518,24

31,33 47,5510,12

20,22 23,413,18

39,42 46,5531,33

18,23 24,271,3

33,40 44,4810,12

22,24 27,3116,18

2.Nowmergepairsof(sorted)files…theresultingfileswillbesorted!

Disk

18,24 27,3110,12

43,44 45,5533,38

12,18 20,223,10

33,41 47,5523,31

18,23 24,271,3

39,42 46,5531,33

16,18 22,2410,12

33,40 44,4827,31

Assumewestillonlyhave3 bufferpages(Buffernotpictured)

Lecture11>Section2>ExternalMergeSort:Largerfiles

Page 39: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

RunningExternalMergeSortonLargerFiles

Disk

31,33 44,5510,12

27,38 43,4518,24

31,33 47,5510,12

20,22 23,413,18

39,42 46,5531,33

18,23 24,271,3

33,40 44,4810,12

22,24 27,3116,18

3.Andrepeat…

Disk

18,24 27,3110,12

43,44 45,5533,38

12,18 20,223,10

33,41 47,5523,31

18,23 24,271,3

39,42 46,5531,33

16,18 22,2410,12

33,40 44,4827,31

Disk

10,12 12,183,10

22,23 24,2718,20

33,33 38,4131,31

45,47 55,5543,44

10,12 16,181,3

23,24 24,2718,22

31,33 33,3927,31

44,46 48,5540,42

Assumewestillonlyhave3 bufferpages(Buffernotpictured)

Lecture11>Section2>ExternalMergeSort:Largerfiles

Calleachofthesestepsapass

Page 40: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

RunningExternalMergeSortonLargerFiles

Disk

31,33 44,5510,12

27,38 43,4518,24

31,33 47,5510,12

20,22 23,413,18

39,42 46,5531,33

18,23 24,271,3

33,40 44,4810,12

22,24 27,3116,18

4.Andrepeat!

Disk

18,24 27,3110,12

43,44 45,5533,38

12,18 20,223,10

33,41 47,5523,31

18,23 24,271,3

39,42 46,5531,33

16,18 22,2410,12

33,40 44,4827,31

Disk

10,12 12,183,10

22,23 24,2718,20

33,33 38,4131,31

45,47 55,5543,44

10,12 16,181,3

23,24 24,2718,22

31,33 33,3927,31

44,46 48,5540,42

Disk

3,10 10,101,3

12,16 18,1812,12

20,22 22,2318,18

24,24 27,2723,24

31,31 31,3327,31

33,38 39,4033,33

43,44 44,4541,42

48,55 55,5546,47

Lecture11>Section2>ExternalMergeSort:Largerfiles

Page 41: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

Simplified3-pageBufferVersionAssumeforsimplicitythatwesplitanN-pagefileintoNsingle-pagerunsandsortthese;then:

• Firstpass:MergeN/2pairsofrunseach oflength1page

• Secondpass:MergeN/4pairsofrunseachoflength2pages

• Ingeneral,forN pages,wedo 𝒍𝒐𝒈𝟐 𝑵 passes• +1fortheinitialsplit&sort

• Eachpassinvolvesreadingin&writingoutallthepages=2NIO

Unsortedinputfile

Split&sort

Merge

Merge

Sorted!

à 2N*( 𝒍𝒐𝒈𝟐 𝑵 +1)totalIOcost!

Lecture11>Section2>ExternalMergeSort:Largerfiles

Page 42: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

UsingB+1bufferpagestoreduce#ofpasses

SupposewehaveB+1bufferpagesnow;wecan:

1. Increaselengthofinitialruns.SortB+1atatime!Atthebeginning,wecansplittheNpagesintorunsoflengthB+1andsorttheseinmemory

Lecture11>Section2>Optimizationsforsorting

2𝑁( log$ 𝑁 + 1)

IOCost:

Startingwithrunsoflength1

2𝑁( log$𝑵

𝑩 + 𝟏 + 1)

StartingwithrunsoflengthB+1

Page 43: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

UsingB+1bufferpagestoreduce#ofpasses

SupposewehaveB+1bufferpagesnow;wecan:

2.PerformaB-waymerge.Oneachpass,wecanmergegroupsofBrunsatatime(vs.mergingpairsofruns)!

Lecture11>Section2>Optimizationsforsorting

IOCost:

2𝑁( log$ 𝑁 + 1) 2𝑁( log$𝑵

𝑩 + 𝟏 + 1)

Startingwithrunsoflength1

StartingwithrunsoflengthB+1

2𝑁( log@𝑵

𝑩 + 𝟏 + 1)

PerformingB-waymerges

Page 44: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

Repacking

Lecture11>Section2>Optimizationsforsorting

Page 45: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

Repackingforevenlongerinitialruns

• WithB+1bufferpages,wecannowstartwithB+1-lengthinitialruns(anduseB-waymerges)toget2𝑁( log@

𝑵𝑩A𝟏

+ 1) IOcost…

• Canwereducethiscostmorebygettingevenlongerinitialruns?

• Userepacking- producelongerinitialrunsby“merging”inbufferaswesortatinitialstage

Lecture11>Section2>Optimizationsforsorting

Page 46: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

Repacking Example:3pagebuffer

• Startwithunsortedsingleinputfile,andload2pages

57,24 3,98

DiskMainMemory

Buffer18,22F1

10,33 44,5531,12

Lecture11>Section2>Optimizationsforsorting

F2

Page 47: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

Repacking Example:3pagebuffer

• Taketheminimumtwovalues,andputinoutputpage

57,24 3,98

DiskMainMemory

Buffer18,22F1

10,33

44,55

31,12

Lecture11>Section2>Optimizationsforsorting

F2 31 33 10,12

m=12

Alsokeeptrackofmax(last)valueincurrentrun…

Page 48: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

Repacking Example:3pagebuffer

• Next,repack

57,24 3,98

DiskMainMemory

BufferF1

33

Lecture11>Section2>Optimizationsforsorting

F2 31 31,3310,12

m=1244,55

18,22

Page 49: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

Repacking Example:3pagebuffer

• Next,repack,thenloadanotherpageandcontinue!

57,24 3,98

DiskMainMemory

BufferF1

Lecture11>Section2>Optimizationsforsorting

F2 31,3310,12

m=1244,55

m=33

18,22

Page 50: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

Repacking Example:3pagebuffer

• Now,however,thesmallestvaluesarelessthanthelargest(last)inthesortedrun…

3,98

DiskMainMemory

BufferF1

Lecture11>Section2>Optimizationsforsorting

F2 31,3310,12

m=33

18,2218,22

Wecallthesevaluesfrozen becausewecan’taddthemtothisrun…

44,55

57,24

Page 51: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

Repacking Example:3pagebuffer

• Now,however,thesmallestvaluesarelessthanthelargest(last)inthesortedrun…

DiskMainMemory

BufferF1

Lecture11>Section2>Optimizationsforsorting

F2 31,3310,12

m=55

44,55 57,24 18,22

3,98

Page 52: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

Repacking Example:3pagebuffer

• Now,however,thesmallestvaluesarelessthanthelargest(last)inthesortedrun…

DiskMainMemory

BufferF1

Lecture11>Section2>Optimizationsforsorting

F2 31,3310,12

m=55

44,55 57,24 18,22 3,98

Page 53: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

Repacking Example:3pagebuffer

• Now,however,thesmallestvaluesarelessthanthelargest(last)inthesortedrun…

DiskMainMemory

BufferF1

Lecture11>Section2>Optimizationsforsorting

F2 31,3310,12

m=55

44,55 3,24 18,22 57,98

Page 54: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

Repacking Example:3pagebuffer• Onceallbufferpageshaveafrozenvalue,orinputfileisempty,startnewrunwiththefrozenvalues

DiskMainMemory

BufferF1

Lecture11>Section2>Optimizationsforsorting

F2 31,3310,12

m=0

44,55 3,24 18,22

57,98

F3

Page 55: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

Repacking Example:3pagebuffer• Onceallbufferpageshaveafrozenvalue,orinputfileisempty,startnewrunwiththefrozenvalues

DiskMainMemory

BufferF1

Lecture11>Section2>Optimizationsforsorting

F2 31,3310,12

m=0

44,55

57,98

F3

3,18 22,24

Page 56: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

Repacking

• Notethat,forbufferwithB+1pages:• Ifinputfileissortedà nothingisfrozenà wegetasingle run!• Ifinputfileisreversesorted(worstcase)à everythingisfrozenà wegetrunsoflengthB+1

• Ingeneral,withrepackingwedonoworse thanwithoutit!

• Whatifthefileisalreadysorted?

• Engineer’sapproximation:runswillhave~2(B+1)length

~2𝑁( log@𝑵

𝟐(𝑩 + 𝟏) + 1)

Lecture11>Section2>Optimizationsforsorting

Page 57: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3

Summary

• BasicsofIOandbuffermanagement.

• WeintroducedtheIOcostmodelusingsorting.• SawhowtodomergeswithfewIOs,• Worksbetterthanmain-memorysortalgorithms.

• Describedafewoptimizationsforsorting

Lecture11