CSC401 – Analysis of Algorithms Lecture Notes 8 Comparison-Based Sorting
Transcript of CSC401 – Analysis of Algorithms Lecture Notes 8 Comparison-Based Sorting
Comparison-Based Sorting
Objectives
– Introduce different known sorting algorithms
– Analyze the running time of diverse sorting algorithms
– Derive the lower bound on the running time of comparison-based sorting
Divide-and-Conquer
Divide-and-conquer is a general algorithm design paradigm:
– Divide: divide the input data S in two disjoint subsets S1 and S2
– Recur: solve the subproblems associated with S1 and S2
– Conquer: combine the solutions for S1 and S2 into a solution for S
The base cases for the recursion are subproblems of size 0 or 1
Merge-sort is a sorting algorithm based on the divide-and-conquer paradigm
Like heap-sort
– It uses a comparator
– It has O(n log n) running time
Unlike heap-sort
– It does not use an auxiliary priority queue
– It accesses data in a sequential manner (suitable to sort data on a disk)
Merge-Sort
Merge-sort on an input sequence S with n elements consists of three steps:
– Divide: partition S into two sequences S1 and S2 of about n/2 elements each
– Recur: recursively sort S1 and S2
– Conquer: merge S1 and S2 into a unique sorted sequence
Algorithm mergeSort(S, C)
  Input: sequence S with n elements, comparator C
  Output: sequence S sorted according to C
  if S.size() > 1
    (S1, S2) ← partition(S, n/2)
    mergeSort(S1, C)
    mergeSort(S2, C)
    S ← merge(S1, S2)
Merging Two Sorted Sequences
The conquer step of merge-sort consists of merging two sorted sequences A and B into a sorted sequence S containing the union of the elements of A and B
Merging two sorted sequences, each with n/2 elements and implemented by means of a doubly linked list, takes O(n) time
Algorithm merge(A, B)
  Input: sequences A and B with n/2 elements each
  Output: sorted sequence of A ∪ B
  S ← empty sequence
  while ¬A.isEmpty() ∧ ¬B.isEmpty()
    if A.first().element() < B.first().element()
      S.insertLast(A.remove(A.first()))
    else
      S.insertLast(B.remove(B.first()))
  while ¬A.isEmpty()
    S.insertLast(A.remove(A.first()))
  while ¬B.isEmpty()
    S.insertLast(B.remove(B.first()))
  return S
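The two pseudocode fragments above translate almost line for line into Python. The sketch below is mine (the names merge and merge_sort are my own, and Python lists stand in for the sequence ADT):

```python
def merge(a, b):
    """Merge two sorted lists into one sorted list in O(len(a) + len(b)) time."""
    s = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            s.append(a[i]); i += 1
        else:
            s.append(b[j]); j += 1
    s.extend(a[i:])  # at most one of these two
    s.extend(b[j:])  # leftovers is non-empty
    return s

def merge_sort(s):
    """Divide-and-conquer sort; O(n log n) time."""
    if len(s) <= 1:            # base case: subproblem of size 0 or 1
        return list(s)
    mid = len(s) // 2          # Divide
    return merge(merge_sort(s[:mid]),   # Recur on each half,
                 merge_sort(s[mid:]))   # then Conquer by merging

print(merge_sort([7, 2, 9, 4, 3, 8, 6, 1]))  # [1, 2, 3, 4, 6, 7, 8, 9]
```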
Merge-Sort Tree
An execution of merge-sort is depicted by a binary tree
– each node represents a recursive call of merge-sort and stores
  the unsorted sequence before the execution and its partition
  the sorted sequence at the end of the execution
– the root is the initial call
– the leaves are calls on subsequences of size 0 or 1
7 2 9 4 → 2 4 7 9
7 2 → 2 7    9 4 → 4 9
7 → 7    2 → 2    9 → 9    4 → 4
Execution Example
7 2 9 4 3 8 6 1 → 1 2 3 4 6 7 8 9
7 2 9 4 → 2 4 7 9    3 8 6 1 → 1 3 6 8
7 2 → 2 7    9 4 → 4 9    3 8 → 3 8    6 1 → 1 6
7 → 7    2 → 2    9 → 9    4 → 4    3 → 3    8 → 8    6 → 6    1 → 1
Analysis of Merge-Sort
The height h of the merge-sort tree is O(log n)
– at each recursive call we divide the sequence in half
The overall amount of work done at the nodes of depth i is O(n)
– we partition and merge 2^i sequences of size n/2^i
– we make 2^(i+1) recursive calls
Thus, the total running time of merge-sort is O(n log n)
depth   #seqs   size
0       1       n
1       2       n/2
i       2^i     n/2^i
…       …       …
Set Operations
We represent a set by the sorted sequence of its elements
By specializing the auxiliary methods, the generic merge algorithm can be used to perform basic set operations:
– union
– intersection
– subtraction
The running time of an operation on sets A and B should be at most O(nA + nB)
Set union:
– aIsLess(a, S): S.insertLast(a)
– bIsLess(b, S): S.insertLast(b)
– bothAreEqual(a, b, S): S.insertLast(a)
Set intersection:
– aIsLess(a, S): { do nothing }
– bIsLess(b, S): { do nothing }
– bothAreEqual(a, b, S): S.insertLast(a)
Storing a Set in a List
We can implement a set with a list
Elements are stored sorted according to some canonical ordering
The space used is O(n)
[Figure: a linked list whose nodes store the set elements in sorted order]
Generic Merging
Generalized merge of two sorted lists A and B
Template method genericMerge
Auxiliary methods
– aIsLess
– bIsLess
– bothAreEqual
Runs in O(nA + nB) time provided the auxiliary methods run in O(1) time
Algorithm genericMerge(A, B)
  S ← empty sequence
  while ¬A.isEmpty() ∧ ¬B.isEmpty()
    a ← A.first().element(); b ← B.first().element()
    if a < b
      aIsLess(a, S); A.remove(A.first())
    else if b < a
      bIsLess(b, S); B.remove(B.first())
    else { a = b }
      bothAreEqual(a, b, S)
      A.remove(A.first()); B.remove(B.first())
  while ¬A.isEmpty()
    a ← A.first().element(); aIsLess(a, S); A.remove(A.first())
  while ¬B.isEmpty()
    b ← B.first().element(); bIsLess(b, S); B.remove(B.first())
  return S
Using Generic Merge for Set Operations
Any of the set operations can be implemented using a generic merge
For example:
– For intersection: only copy elements that are duplicated in both lists
– For union: copy every element from both lists except for the duplicates
All methods run in linear time.
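As a concrete sketch of this idea, the template method and the two specializations can be written in Python as follows (the callback-passing style and all names are my own rendering of the slide's aIsLess / bIsLess / bothAreEqual hooks):

```python
def generic_merge(a, b, a_is_less, b_is_less, both_are_equal):
    """Template merge of sorted lists a and b in O(len(a) + len(b)) time.
    The three callbacks decide what gets appended to the output s."""
    s = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            a_is_less(a[i], s); i += 1
        elif b[j] < a[i]:
            b_is_less(b[j], s); j += 1
        else:                                   # a[i] == b[j]
            both_are_equal(a[i], b[j], s); i += 1; j += 1
    while i < len(a):
        a_is_less(a[i], s); i += 1
    while j < len(b):
        b_is_less(b[j], s); j += 1
    return s

append = lambda x, s: s.append(x)   # copy the element to the output
skip = lambda x, s: None            # { do nothing }

def union(a, b):
    # copy every element, but duplicates only once
    return generic_merge(a, b, append, append, lambda x, y, s: s.append(x))

def intersection(a, b):
    # copy only elements that appear in both lists
    return generic_merge(a, b, skip, skip, lambda x, y, s: s.append(x))

print(union([1, 3, 5], [2, 3, 6]))         # [1, 2, 3, 5, 6]
print(intersection([1, 3, 5], [2, 3, 6]))  # [3]
```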
Quick-Sort
Quick-sort is a randomized sorting algorithm based on the divide-and-conquer paradigm:
– Divide: pick a random element x (called pivot) and partition S into
  L: elements less than x
  E: elements equal to x
  G: elements greater than x
– Recur: sort L and G
– Conquer: join L, E and G
Partition
We partition an input sequence as follows:
– We remove, in turn, each element y from S, and
– We insert y into L, E or G, depending on the result of the comparison with the pivot x
Each insertion and removal is at the beginning or at the end of a sequence, and hence takes O(1) time
Thus, the partition step of quick-sort takes O(n) time
Algorithm partition(S, p)
  Input: sequence S, position p of pivot
  Output: subsequences L, E, G of the elements of S less than, equal to, or greater than the pivot, resp.
  L, E, G ← empty sequences
  x ← S.remove(p)
  while ¬S.isEmpty()
    y ← S.remove(S.first())
    if y < x
      L.insertLast(y)
    else if y = x
      E.insertLast(y)
    else { y > x }
      G.insertLast(y)
  return L, E, G
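A minimal Python rendering of randomized quick-sort built on this three-way partition (a sketch, not in-place; list comprehensions play the role of the L, E, G sequences):

```python
import random

def quick_sort(s):
    """Randomized quick-sort; expected O(n log n) time."""
    if len(s) <= 1:                      # base case
        return list(s)
    x = random.choice(s)                 # Divide: pick a random pivot
    l = [y for y in s if y < x]          # L: less than the pivot
    e = [y for y in s if y == x]         # E: equal to the pivot
    g = [y for y in s if y > x]          # G: greater than the pivot
    return quick_sort(l) + e + quick_sort(g)  # Recur on L, G; Conquer by joining

print(quick_sort([7, 4, 9, 6, 2]))  # [2, 4, 6, 7, 9]
```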
Quick-Sort Tree
An execution of quick-sort is depicted by a binary tree
– Each node represents a recursive call of quick-sort and stores
  the unsorted sequence before the execution and its pivot
  the sorted sequence at the end of the execution
– The root is the initial call
– The leaves are calls on subsequences of size 0 or 1
7 4 9 6 2 → 2 4 6 7 9
4 2 → 2 4    7 9 → 7 9
2 → 2    9 → 9
Execution Example
7 2 9 4 3 7 6 1 → 1 2 3 4 6 7 7 9
2 4 3 1 → 1 2 3 4    7 9 7 → 7 7 9
1 → 1    4 3 → 3 4    9 → 9
Worst-case Running Time
The worst case for quick-sort occurs when the pivot is the unique minimum or maximum element
One of L and G has size n − 1 and the other has size 0
The running time is proportional to the sum
n + (n − 1) + … + 2 + 1
Thus, the worst-case running time of quick-sort is O(n^2)
depth   time
0       n
1       n − 1
…       …
n − 1   1
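The quadratic behavior is easy to observe. This hypothetical experiment uses a deterministic variant that always picks the first element as pivot, so an already-sorted input triggers the worst case (here each element is charged two comparisons per partition, one per scan; the exact constant depends on the implementation):

```python
def quick_sort_count(s):
    """Quick-sort with the FIRST element as pivot; returns (sorted, comparisons).
    On sorted input the split is always 0 vs n-1, giving O(n^2) comparisons."""
    if len(s) <= 1:
        return list(s), 0
    x = s[0]
    l = [y for y in s[1:] if y < x]      # L: less than the pivot
    g = [y for y in s[1:] if y >= x]     # E ∪ G: the rest
    c = 2 * (len(s) - 1)                 # comparisons made by the two scans
    sl, cl = quick_sort_count(l)
    sg, cg = quick_sort_count(g)
    return sl + [x] + sg, c + cl + cg

_, comps = quick_sort_count(list(range(100)))
print(comps)  # 2 * (99 + 98 + ... + 1) = 9900, i.e. proportional to n^2
```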
Expected Running Time
Consider a recursive call of quick-sort on a sequence of size s
– Good call: the sizes of L and G are each less than 3s/4
– Bad call: one of L and G has size greater than 3s/4
A call is good with probability 1/2
– 1/2 of the possible pivots cause good calls: of the pivot ranks 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16, the middle half (5–12) are good pivots and the outer quarters (1–4 and 13–16) are bad pivots
Expected Running Time, Part 2
Probabilistic Fact: The expected number of coin tosses required in order to get k heads is 2k
For a node of depth i, we expect
– i/2 ancestors are good calls
– The size of the input sequence for the current call is at most (3/4)^(i/2) n
[Figure: quick-sort tree with expected height O(log n) and O(n) time per level, for total expected time O(n log n)]
Therefore, we have
– For a node of depth 2 log_{4/3} n, the expected input size is one
– The expected height of the quick-sort tree is O(log n)
The amount of work done at the nodes of the same depth is O(n)
Thus, the expected running time of quick-sort is O(n log n)
In-Place Quick-Sort
Quick-sort can be implemented to run in-place
In the partition step, we use replace operations to rearrange the elements of the input sequence such that
– the elements less than the pivot have rank less than h
– the elements equal to the pivot have rank between h and k
– the elements greater than the pivot have rank greater than k
Algorithm inPlaceQuickSort(S, l, r)
  Input: sequence S, ranks l and r
  Output: sequence S with the elements of rank between l and r rearranged in increasing order
  if l ≥ r return
  i ← a random integer between l and r
  x ← S.elemAtRank(i)
  (h, k) ← inPlacePartition(x)
  inPlaceQuickSort(S, l, h − 1)
  inPlaceQuickSort(S, k + 1, r)
The recursive calls consider
– elements with rank less than h
– elements with rank greater than k
In-Place Partitioning
Perform the partition using two indices to split S into L and E ∪ G (a similar method can split E ∪ G into E and G).
Repeat until j and k cross:
– Scan j to the right until finding an element ≥ x.
– Scan k to the left until finding an element < x.
– Swap elements at indices j and k
3 2 5 1 0 7 3 5 9 2 7 9 8 9 7 6 9    (pivot x = 6; j scans right from the left end, k scans left from the right end, and out-of-place pairs are swapped until j and k cross)
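A Python sketch of this scheme, with the simplification the slide allows: it splits into < pivot and ≥ pivot only (L versus E ∪ G) rather than three parts, and the exact index handling is my own:

```python
import random

def in_place_quick_sort(s, l, r):
    """Sort s[l..r] in place using the two-index partition scan."""
    if l >= r:
        return
    i = random.randint(l, r)
    s[i], s[r] = s[r], s[i]          # move the random pivot out of the way
    x = s[r]
    j, k = l, r - 1
    while j <= k:
        while j <= k and s[j] < x:   # scan j right to an element >= x
            j += 1
        while j <= k and s[k] >= x:  # scan k left to an element < x
            k -= 1
        if j < k:
            s[j], s[k] = s[k], s[j]  # swap the out-of-place pair
    s[j], s[r] = s[r], s[j]          # put the pivot at its final rank j
    in_place_quick_sort(s, l, j - 1) # recur on the elements before the pivot
    in_place_quick_sort(s, j + 1, r) # recur on the elements after the pivot

data = [3, 2, 5, 1, 0, 7, 3, 5, 9, 2, 7, 9, 8, 9, 7, 6, 9]
in_place_quick_sort(data, 0, len(data) - 1)
print(data)  # the sequence from the slide, now sorted
```

Only O(log n) extra space is used for the recursion stack, and no auxiliary L, E, G sequences are allocated.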
Summary of Sorting Algorithms

Algorithm        Time                  Notes
selection-sort   O(n^2)                in-place; slow (good for small inputs)
insertion-sort   O(n^2)                in-place; slow (good for small inputs)
heap-sort        O(n log n)            in-place; fast (good for large inputs)
merge-sort       O(n log n)            sequential data access; fast (good for huge inputs)
quick-sort       O(n log n) expected   in-place, randomized; fastest (good for large inputs)
Comparison-Based Sorting
Many sorting algorithms are comparison-based.
– They sort by making comparisons between pairs of objects
– Examples: bubble-sort, selection-sort, insertion-sort, heap-sort, merge-sort, quick-sort, ...
Let us therefore derive a lower bound on the running time of any algorithm that uses comparisons to sort n elements, x1, x2, …, xn.
Each comparison "Is xi < xj?" has exactly two outcomes: yes or no.
Counting Comparisons
Let us just count comparisons then.
Each possible run of the algorithm corresponds to a root-to-leaf path in a decision tree
[Figure: decision tree whose internal nodes are comparisons such as xi < xj?, xa < xb?, xc < xd?, …]
Decision Tree Height
The height of this decision tree is a lower bound on the running time
Every possible input permutation must lead to a separate leaf output.
– If not, some input …4…5… would have the same output ordering as …5…4…, which would be wrong.
Since there are n! = 1·2·…·n leaves, the height is at least log(n!)
[Figure: decision tree with n! leaves; its minimum height (time) is log(n!)]
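The bound log(n!) is easy to check numerically; for instance, any comparison-based sort of 8 elements needs at least ⌈log2 8!⌉ = 16 comparisons in the worst case:

```python
import math

# A decision tree with n! leaves has height at least ceil(log2(n!)),
# so that many comparisons are needed in the worst case.
for n in (4, 8, 16):
    lower = math.ceil(math.log2(math.factorial(n)))
    print(f"n={n}: at least {lower} comparisons "
          f"(compare n*log2(n) = {n * math.log2(n):.0f})")
```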
The Lower Bound
Any comparison-based sorting algorithm takes at least log(n!) time
Therefore, any such algorithm takes time at least
log2(n!) ≥ log2((n/2)^(n/2)) = (n/2) log2(n/2)
That is, any comparison-based sorting algorithm must run in Ω(n log n) time.