Parallel Sorting (liacs.leidenuniv.nl/~wijshoffhag/PPI2017_2018/Lecture_10.pdf)



A jungle


Illustration

https://www.youtube.com/watch?v=kPRA0W1kECg


(Sequential) Sorting

• Bubble Sort, Insertion Sort
  – O(n^2)

• Merge Sort, Heap Sort, QuickSort
  – O(n log n)
  – QuickSort best on average

• Optimal parallel time complexity
  – O(n log n)/P
  – If P = n then O(log n)


Insertion Sort

Insertion_Sort (A)
    for i from 1 to |A| - 1
        j = i
        while j > 0 and A[j-1] > A[j]
            swap A[j] and A[j-1]
            j = j - 1
    Return ( A )

Inherently sequential, so hard to parallelize!
→ Only through pipelining can speedup be realized


Pipelined Insertion Sort

• T_pipelined = 2n with n processors
• Worst-case sequential time = (n-1)(n-2)/2 = n^2/2 - 3n/2 + 1
• So the maximal speedup is (n^2/2 - 3n/2 + 1)/(2n) ≈ n/4 - 3/4
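
One way to picture the pipeline is a line of n cells, each keeping the smallest key it has seen and handing the larger one to its neighbour on the next time step. The Python sketch below simulates that scheme sequentially. The function name and the rule that one key enters per step are assumptions made for illustration, not taken from the slides. For n keys the simulation needs 2n - 1 steps, in line with T_pipelined ≈ 2n.

```python
def pipelined_insertion_sort(values):
    """Simulate a linear pipeline of n cells (hypothetical model: one key enters per step)."""
    n = len(values)
    held = [None] * n        # key kept by each cell
    arriving = [None] * n    # key arriving at each cell in the current step
    pending = list(values)
    steps = 0
    while pending or any(v is not None for v in arriving):
        if pending:
            arriving[0] = pending.pop(0)     # feed the next input key to cell 0
        passed_on = [None] * n
        for cell in range(n):                # conceptually, all cells act at once
            key = arriving[cell]
            if key is None:
                continue
            if held[cell] is None:
                held[cell] = key
            else:
                small, large = min(held[cell], key), max(held[cell], key)
                held[cell] = small
                passed_on[cell + 1] = large  # reaches the neighbour next step
        arriving = passed_on
        steps += 1
    return held, steps


result, steps = pipelined_insertion_sort([5, 2, 4, 1, 3])
print(result, steps)  # [1, 2, 3, 4, 5] 9
```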


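Parallel Merge Sort

Merge_Sort (A)
    n = |A|
    if n <= 1 then return A
    halfway = floor(n/2)
    DO IN PARALLEL
        A[1]…A[halfway] = Merge_Sort (A[1]…A[halfway])
        A[halfway+1]…A[n] = Merge_Sort (A[halfway+1]…A[n])
    j = 1; current = 1
    for i from 1 to halfway
        while j ≤ n-halfway and A[halfway + j] < A[i]
            X[current] = A[halfway + j]
            j = j + 1; current = current + 1
        X[current] = A[i]
        current = current + 1
    while j ≤ n-halfway          # copy what is left of the second half
        X[current] = A[halfway + j]
        j = j + 1; current = current + 1
    Return ( X )

[Figure: array A with the positions i, halfway, halfway + j and n marked]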


In a picture


Notes Merge Sort

• Collects sorted list onto one processor, merging as items come together

• Maps well to tree structure, sorting locally on leaves, then merging up the tree

• As items approach root of tree, processors are dropped, limiting parallelism

• O(n) if P = n: the merges up the tree cost 1 + 2 + 4 + … + n/2 + n = n(1 + 1/2 + 1/4 + …) ≈ 2n
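
The tree structure described in these notes can be sketched in Python as follows: sort p chunks locally (the leaves), then merge neighbouring runs level by level until one run is left (the root). The name `tree_merge_sort`, the parameter `p` and the use of a thread pool are assumptions for illustration. In CPython, threads mainly show the structure and are not expected to give a real speedup for this workload.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def tree_merge_sort(a, p=4):
    """Sort p chunks locally, then merge pairwise up a tree (illustrative sketch)."""
    if not a:
        return []
    chunk = -(-len(a) // p)                       # ceil(len(a) / p)
    leaves = [a[i:i + chunk] for i in range(0, len(a), chunk)]
    with ThreadPoolExecutor(max_workers=p) as pool:
        runs = list(pool.map(sorted, leaves))     # local sort on the leaves
        while len(runs) > 1:
            pairs = [(runs[i], runs[i + 1]) for i in range(0, len(runs) - 1, 2)]
            # Each level uses about half as many workers as the one below it.
            merged = list(pool.map(lambda lr: list(heapq.merge(lr[0], lr[1])), pairs))
            if len(runs) % 2:                     # an unpaired run moves up unchanged
                merged.append(runs[-1])
            runs = merged
    return runs[0]


print(tree_merge_sort([9, 3, 7, 1, 8, 2, 5], p=3))  # [1, 2, 3, 5, 7, 8, 9]
```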


Parallel QuickSort

QuickSort (A)
    if |A| <= 1 then return A
    i = rand_int (|A|)
    p = A[i]
    DO IN PARALLEL
        L = QuickSort({a ∈ A | a < p})
        E = {a ∈ A | a = p}
        G = QuickSort({a ∈ A | a > p})
    Return ( L || E || G )
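
A minimal Python sketch of this scheme is given below. The `depth` parameter, which limits how many levels of the recursion spawn threads, is an assumption added here to keep the number of threads bounded; it is not part of the pseudocode above.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def quicksort(a, depth=2):
    """Three-way QuickSort; the top `depth` levels sort L and G in parallel threads."""
    if len(a) <= 1:
        return list(a)
    p = random.choice(a)                      # random pivot
    less = [x for x in a if x < p]
    equal = [x for x in a if x == p]
    greater = [x for x in a if x > p]
    if depth > 0:
        # DO IN PARALLEL: each call owns its small pool, so no task waits on a full pool.
        with ThreadPoolExecutor(max_workers=2) as pool:
            f_less = pool.submit(quicksort, less, depth - 1)
            f_greater = pool.submit(quicksort, greater, depth - 1)
            return f_less.result() + equal + f_greater.result()
    return quicksort(less, 0) + equal + quicksort(greater, 0)


print(quicksort([3, 6, 1, 6, 2, 9, 1]))  # [1, 1, 2, 3, 6, 6, 9]
```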


If we assume that the pivots are chosen such that L and G are about equal in size, then

Sequential: T(n) = 2T(n/2) + O(n) = O(n log n)
In fact, it can be proven that with randomly chosen pivots this bound holds in expectation!

For parallel execution the choice of i is crucial for load balance. Even more importantly, we would like to choose multiple pivots (p-1) at the same time, so that each time we get p partitions which can be executed in parallel.


P partitions

• For a given p (number of partitions) and s (oversampling rate), first select at random p*s candidate pivots

    for i from 1 to p*s
        Cand[i] = A[rand_int (|A|)]

• Sort the list of candidate pivots: Cand[i]
• Choose Cand[s], Cand[2*s], …, Cand[(p-1)*s] as the p-1 pivots
• Find a good value for the oversampling rate: s > 1,
  → s should not lead to very long sorting times
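
The pivot selection above can be sketched in Python as follows. The names `choose_pivots` and `sample_sort` and the default values for `p` and `s` are assumptions for illustration. Each bucket is sorted with the built-in `sorted` here; in a parallel setting each bucket would go to its own processor.

```python
import bisect
import random


def choose_pivots(a, p, s, rng=random):
    """Pick p-1 pivots from p*s sorted random candidates (oversampling rate s)."""
    cand = sorted(rng.choice(a) for _ in range(p * s))
    # Cand[s], Cand[2*s], ..., Cand[(p-1)*s] in the 1-based notation of the slide
    return [cand[k * s - 1] for k in range(1, p)]


def sample_sort(a, p=4, s=8, rng=random):
    if len(a) <= 1:
        return list(a)
    pivots = choose_pivots(a, p, s, rng)
    buckets = [[] for _ in range(p)]
    for x in a:
        buckets[bisect.bisect_right(pivots, x)].append(x)   # bucket index 0..p-1
    # The p buckets are independent, so they could be sorted in parallel.
    return [x for bucket in buckets for x in sorted(bucket)]


data = [random.randrange(1000) for _ in range(50)]
print(sample_sort(data) == sorted(data))  # True
```

A larger s tends to give more evenly sized buckets, at the price of sorting a longer candidate list.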


Parallel Radix Sort

Instead of comparing values: COMPARE DIGITS

Radix_Sort (A, b)   # Assume binary representations of keys
    for i from 0 to b-1
        FLAGS = { (a>>i) mod 2 | a ∈ A }
        NOTFLAGS = { 1-f | f ∈ FLAGS }
        R_0 = SCAN (NOTFLAGS)       # SCAN = prefix sum over all smaller indices
        s_0 = SUM (NOTFLAGS)
        R_1 = SCAN (FLAGS)
        R = { if FLAGS[j] == 0
                then R_0[j]
                else R_1[j] + s_0 | j ∈ [0…|A|-1] }
        A = A sorted by R
    Return ( A )

(a>>i) mod 2: right shift i times, so e.g. 01101>>2 mod 2 = 00011 mod 2 = 1

So (a>>i) mod 2 equals the (i+1)th rightmost bit of a
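
A direct Python transcription of Radix_Sort is sketched below. It assumes non-negative integer keys smaller than 2^b. SCAN is taken to be an exclusive prefix sum, and the helper name `exclusive_scan` is introduced here for illustration. The scans run sequentially in this sketch; they are the part a parallel machine would take over.

```python
from itertools import accumulate


def exclusive_scan(xs):
    """Entry j holds the sum of all entries with index smaller than j."""
    return list(accumulate(xs, initial=0))[:-1]


def radix_sort(a, b):
    """LSD radix sort on b-bit non-negative integers, one bit per pass."""
    a = list(a)
    for i in range(b):
        flags = [(x >> i) & 1 for x in a]
        notflags = [1 - f for f in flags]
        r0 = exclusive_scan(notflags)         # target positions of keys with bit 0
        s0 = sum(notflags)                    # number of keys with bit 0
        r1 = exclusive_scan(flags)            # offsets of keys with bit 1
        rank = [r0[j] if flags[j] == 0 else r1[j] + s0 for j in range(len(a))]
        out = [None] * len(a)
        for j, x in enumerate(a):             # "A sorted by R": scatter by rank
            out[rank[j]] = x
        a = out
    return a


print(radix_sort([5, 3, 7, 0, 2, 6, 1, 4], b=3))  # [0, 1, 2, 3, 4, 5, 6, 7]
```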


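LSD/MSD Radix Sort

Instead of (a>>i) mod 2

one can also implement Radix Sort with: (a<<i) div 2^(b-1)
(with keys held in b-bit words, so bits shifted out on the left are discarded)

The first implementation is called least significant digit Radix Sort or LSD Radix Sort.
The latter one is MSD Radix Sort.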
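
The two bit selections can be compared with a small Python sketch. Python integers are unbounded, so the b-bit word is emulated with an explicit `% 2**b`; the function names are assumptions for illustration.

```python
def lsd_bit(a, i):
    """(a >> i) mod 2: the (i+1)th bit counted from the right."""
    return (a >> i) % 2


def msd_bit(a, i, b):
    """(a << i) div 2^(b-1) in a b-bit word: the (i+1)th bit counted from the left."""
    return ((a << i) % (1 << b)) // (1 << (b - 1))


key, b = 0b01101, 5
print([lsd_bit(key, i) for i in range(b)])     # [1, 0, 1, 1, 0]
print([msd_bit(key, i, b) for i in range(b)])  # [0, 1, 1, 0, 1]
```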


Notes Radix Sort

• Sequential time complexity: T(n) = O(b·n): b iterations, each iteration O(n)
• Note that b ≈ log n, so a total of O(n log n)
• Instead of single digits, a block of r digits can be taken each time, resulting in b/r iterations


Illustration (LSD Radix Sort)


Sorting of each selected digit in Radix Sort, with Prefix Sum Based Sorting

Each element i of the prefix sum array holds the SUM of all elements whose index is smaller than i


What is the relationship with sorting?

• All bits which are equal to 0 are flagged with a 1
• Compute Prefix Sum of this flag array
• Store all flagged (1) entries of x[k] in the location indicated by the prefix sum


Second stage

• All bits which are equal to 1 are flagged with a 1
• Compute Prefix Sum of this flag array
• Store all flagged (1) entries of x[k] in the next locations indicated by the prefix sum
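
The two stages can be traced on a small array with the Python sketch below. The function name `split_by_bit` and the sample keys are assumptions for illustration. For x = [5, 2, 7, 4, 1] and bit 0, the first stage flags 2 and 4 (prefix sums 0 and 1), and the second stage places 5, 7 and 1 in the next locations 2, 3 and 4.

```python
from itertools import accumulate


def split_by_bit(x, i):
    """Stable split of x on bit i: keys with bit 0 first, then keys with bit 1."""
    bits = [(v >> i) & 1 for v in x]
    out = [None] * len(x)
    base = 0                                   # first free location for this stage
    for wanted in (0, 1):                      # first stage: zeros, second stage: ones
        flags = [1 if bit == wanted else 0 for bit in bits]
        prefix = list(accumulate(flags, initial=0))[:-1]   # sum over smaller indices
        for k, v in enumerate(x):
            if flags[k]:
                out[base + prefix[k]] = v
        base += sum(flags)
    return out


print(split_by_bit([5, 2, 7, 4, 1], 0))  # [2, 4, 5, 7, 1]
```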


What about parallel execution?

• Computationally the sorting algorithm is reduced to computing the prefix sum arrays for each bit ranking.

• However, computing these prefix sum arrays seems to be inherently sequential. Or not?


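Parallel Execution of Prefix Sums

Prefix_Sum (X)   # X an array of n bit flags
    for index from 0 to ceil(log n) - 1
        DO IN PARALLEL for all k      # all reads use the values of the previous iteration
            if k >= 2^index then
                X[k] = X[k] + X[k-2^index]
    X >> 1   # Shift all entries to the right, so X[0] becomes 0
    Return ( X )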
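
The doubling scheme can be simulated in Python as sketched below. Building a new list in every round stands in for the parallel step, in which every X[k] reads values from the previous round. The function name is an assumption for illustration.

```python
def parallel_prefix_sum(x):
    """Simulate the log-step prefix sum; returns the shifted (exclusive) result."""
    x = list(x)
    n = len(x)
    stride = 1                                 # 2^index
    while stride < n:
        # One parallel round: the comprehension only reads the old list.
        x = [x[k] + x[k - stride] if k >= stride else x[k] for k in range(n)]
        stride *= 2
    return ([0] + x[:-1]) if n else []         # X >> 1: shift all entries to the right


print(parallel_prefix_sum([1, 0, 1, 1, 0, 1]))  # [0, 1, 1, 2, 3, 3]
```

For n entries the loop runs ceil(log2 n) rounds, each of which would take constant time with n processors.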


Illustration of parallel Prefix Sums


Improving Cache Performance

• The parallel prefix sum algorithm requires the whole array to be fetched at each iteration
• Bad cache performance
• Through tiling techniques the X array can be cut into slices (tiles)
• Once every number of iterations: re-tile!
• A CUDA implementation of the overall algorithm can be found at
  https://github.com/debdattabasu/amp-radix-sort

[Figure: array X cut into tiles handled by processors P1, P2 and P3, with stride 2^index]


Bitonic Sorting

Based on bitonic sequences:

A[1], A[2], …, A[n-1], A[n] is bitonic iff there is a j and k such that
• A[1]…A[j] is monotonic increasing,
• A[j]…A[k] is monotonic decreasing,
• A[k]…A[n], A[1] is monotonic increasing (note the wrap-around to A[1]!)

OR vice versa


A “better” definition of Bitonic Sequence

A bitonic sequence is a sequence with
A[1] <= A[2] <= … <= A[k] >= … >= A[n-1] >= A[n]

for some k (1 <= k <= n), or a circular shift of such a sequence.
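
This definition can be tested mechanically, as the Python sketch below suggests. It relies on an equivalent characterisation that is not stated on the slide: walking once around the sequence as a cycle, the direction (up or down) changes at most twice. The function name is an assumption for illustration.

```python
def is_bitonic(a):
    """True if a is an up-then-down sequence or a circular shift of one."""
    n = len(a)
    signs = []
    for i in range(n):
        d = a[(i + 1) % n] - a[i]              # includes the wrap-around step
        if d != 0:                             # equal neighbours do not change direction
            signs.append(d > 0)
    # signs[i - 1] with i = 0 compares the first step with the last one (cyclic).
    changes = sum(signs[i] != signs[i - 1] for i in range(len(signs)))
    return changes <= 2


print(is_bitonic([1, 3, 5, 4, 2]))  # True
print(is_bitonic([4, 2, 1, 3, 5]))  # True, a circular shift of the sequence above
print(is_bitonic([1, 3, 2, 4, 1]))  # False, two peaks
```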


In a picture

Bitonic:

Not Bitonic

If rotated: Two Peaks


A[1] >= A[2] >= … >= A[k] <= … <= A[n-1] <= A[n] leads to the same definition


Bitonic “Merge”

Bitonic_Merge (A)   # A is a bitonic sequence
    n = |A|
    if n == 1 then return A
    half_n = floor(n/2)
    for i from 1 to half_n
        c[i] = min(A[i], A[i+half_n])
        d[i] = max(A[i], A[i+half_n])
    DO IN PARALLEL
        c = Bitonic_Merge (c[1]…c[half_n])
        d = Bitonic_Merge (d[1]…d[half_n])
    Return ( c || d )


Notes Bitonic Merge

• Each c and d sequence is a bitonic sequence again

• For all i and j: c[i] <= d[j]
• At the end we have sorted bitonic sequences of length 1, hence a sorted sequence
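
A single min/max step of the merge can be checked on a small example with the Python sketch below. The function name `half_step` and the sample sequence are assumptions for illustration. On this input every element of c is at most every element of d, and both halves are again bitonic.

```python
def half_step(a):
    """One compare step of the bitonic merge on a bitonic sequence of even length."""
    half = len(a) // 2
    c = [min(a[i], a[i + half]) for i in range(half)]
    d = [max(a[i], a[i + half]) for i in range(half)]
    return c, d


c, d = half_step([1, 4, 6, 8, 7, 5, 3, 2])
print(c, d)              # [1, 4, 3, 2] [7, 5, 6, 8]
print(max(c) <= min(d))  # True
```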


Bitonic Merge always yields bitonic sequences


Bitonic Merge Network


Bitonic Merge Network (2)


Bitonic Merge Network (3)


Parallel Bitonic Sort

Bitonic_Sort (A)
    n = |A|
    if n == 1 then return A
    for i from 0 to log(n)
        DO IN PARALLEL for all k = m·2^i, k < n
            Bitonic_Merge (A[k]…A[k+2^i-1]) *
    Return ( A )

* For odd values of m, interchange min and max


Notes Bitonic Sort

• Each iteration creates longer and longer bitonic sequences

• In the last iteration the whole sequence is bitonic and the final bitonic merge creates a sorted list
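
The whole procedure can be sketched in Python as below, assuming the input length is a power of two. The `ascending` flag plays the role of "interchange min and max" for odd-numbered blocks; it and the function names are assumptions for illustration. The blocks of one iteration are independent and could be merged in parallel; here they are processed in a plain loop.

```python
def bitonic_merge(a, ascending=True):
    """Sort a bitonic sequence (length a power of two) in the given direction."""
    n = len(a)
    if n <= 1:
        return list(a)
    half = n // 2
    c = [min(a[i], a[i + half]) for i in range(half)]
    d = [max(a[i], a[i + half]) for i in range(half)]
    if not ascending:
        c, d = d, c                            # interchange min and max
    return bitonic_merge(c, ascending) + bitonic_merge(d, ascending)


def bitonic_sort(a):
    a = list(a)
    n = len(a)
    assert n & (n - 1) == 0, "length must be a power of two"
    size = 2
    while size <= n:
        # Even blocks are merged ascending and odd blocks descending, so each
        # neighbouring pair of blocks forms a bitonic sequence for the next round.
        for m, k in enumerate(range(0, n, size)):
            a[k:k + size] = bitonic_merge(a[k:k + size], ascending=(m % 2 == 0))
        size *= 2
    return a


print(bitonic_sort([3, 7, 4, 8, 6, 2, 1, 5]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```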


Bitonic Sort Network

four bitonic lists of length 2 constituting 2 bitonic lists of length 4

2 Bitonic Merge Networks

4 Bitonic Merge Networks


Why alternating max/min?

Note that at the start of each Bitonic Merge Network we have two bitonic sequences which together constitute one bitonic sequence!

If one of these sequences is (monotonic) increasing and the other is (monotonic) decreasing, then this is always the case. If both are increasing or both are decreasing, this is not necessarily the case, e.g. the increasing sequences 1, 3 and 2, 4 give 1, 3, 2, 4, which is not bitonic.


Notes Bitonic Sort Network

• Assume n = 2^k
• The bitonic merge stages have 1, 2, 3, …, k steps each, so time to sort is

  T(n) = 1 + 2 + … + k = k(k+1)/2 = O(k^2) = O(log^2 n)

• Each step requires n/2 processors, so the total number of processors is O((n/2) log^2 n)

• The network can handle multiple pipelined lists, producing a sorted list each time step
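
The counts in these notes can be reproduced with a few lines of Python. The function name `network_size` is an assumption for illustration; "comparators" stands for the n/2 processors that each step of the network needs.

```python
def network_size(n):
    """Steps (depth) and total comparators of a bitonic sort network for n = 2^k inputs."""
    k = n.bit_length() - 1
    assert n > 0 and 1 << k == n, "n must be a power of two"
    steps = sum(range(1, k + 1))               # 1 + 2 + ... + k = k(k+1)/2
    comparators = steps * (n // 2)             # n/2 per step
    return steps, comparators


print(network_size(8))   # (6, 24)
print(network_size(16))  # (10, 80)
```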