Mergesort and Quicksort - IT Uitu.dk/people/pagh/ads11/03-Sorting.pdf · The sorting algorithm is a...
Transcript of Mergesort and Quicksort - IT Uitu.dk/people/pagh/ads11/03-Sorting.pdf · The sorting algorithm is a...
Algorithms and Data Structures, Fall 2011
Sorting
Rasmus Pagh
Based on slides by Kevin Wayne, PrincetonAlgorithms, 4th Edition · Robert Sedgewick and Kevin Wayne · Copyright © 2002–2011
Monday, September 12, 11
Ex. Student records in a university.
Sort. Rearrange array of N items into ascending order.
2
Sorting problem
item
key
Chen 3 A 991-878-4944 308 Blair
Rohde 2 A 232-343-5555 343 Forbes
Gazsi 4 B 766-093-9873 101 Brown
Furia 1 A 766-093-9873 101 Brown
Kanaga 3 B 898-122-9643 22 Brown
Andrews 3 A 664-480-0023 097 Little
Battle 4 C 874-088-1212 121 Whitman
Andrews 3 A 664-480-0023 097 Little
Battle 4 C 874-088-1212 121 Whitman
Chen 3 A 991-878-4944 308 Blair
Furia 1 A 766-093-9873 101 Brown
Gazsi 4 B 766-093-9873 101 Brown
Kanaga 3 B 898-122-9643 22 Brown
Rohde 2 A 232-343-5555 343 Forbes
Monday, September 12, 11
Application: Find matching records in a database.
Fact:Sorting is one of the most important subroutines for handling large data sets.
3
Sorting problem
Rohde 232-343-5555
Andrews 664-480-0023
Gazsi 766-093-9873
Furia 766-093-9873
Battle 874-088-1212
Kanaga 898-122-9643
Chen 991-878-4944
232-343-5555 Sep 7, 2011 4:13 PM
664-480-0023 Sep 7, 2011 4:17 AM
664-480-0023 Sep 7, 2011 3:59 AM
766-093-9873 Sep 7, 2011 4:19 AM
898-122-9643 Sep 7, 2011 3:51 AM
898-122-9643 Sep 7, 2011 4:01 AM
991-878-4944 Sep 7, 2011 3:47 AM
Monday, September 12, 11
Problem session
Consider the following problem:
Given a list of appointments (start time, end time), find all pairs of overlapping appointments.
Argue that this can be solved easily by sorting!
What is the time spent, not including the sorting step?
4
Monday, September 12, 11
Recursive formulations of sorting
Selection sortssort(A,i..j) {if i<j { Locate the position k of the smallest item in A[i..j]; Exchange A[k] and A[i]; ssort(A,i+1,j) }}
Insertion sortisort(A,i..j) {if i<j { isort(A,i,j-1); Locate the position k of A[j] in the sorted array A[i..j-1]; Move A[j] to position k and shift previous entries in A[k..j-1] one position to the right }}
5
Monday, September 12, 11
Iterative implementations
6
int N = a.length; for (int i = 0; i < N; i++) { int min = i; for (int j = i+1; j < N; j++) if (a[j] < a[min]) min = j; exch(a, i, min); }
int N = a.length; for (int i = 0; i < N; i++) for (int j = i; j > 0; j--) if (a[j] < a[j-1]) exch(a, j, j-1); else break;
Selection sort
Insertion sort
exchange array elements
Animations: www.sorting-algorithms.com
Monday, September 12, 11
Time analysis
7
int N = a.length; for (int i = 0; i < N; i++) { int min = i; for (int j = i+1; j < N; j++) if (a[j] < a[min]) min = j; exch(a, i, min); }
int N = a.length; for (int i = 0; i < N; i++) for (int j = i; j > 0; j--) if (a[j] < a[j-1]) exch(a, j, j-1); else break;
Selection sort
Insertion sort
N-1, N-2,N-3,...,1 iterations
1,2,3,...,N iterations (at most)
~N2/2 comparisons
~N2/2 comparisons (at most)
Monday, September 12, 11
8
Callbacks
Goal. Sort any type of data.
Q. How can sort() know to compare data of type Double, String, and java.io.File without any information about the type of an item's key?
Callbacks = reference to executable code.
• Client passes array of objects to sort() function.
• The sort() function calls back object's compareTo() method as needed.
Implementing callbacks.
• Java: interfaces.
• C: function pointers.
• C++: class-type functors.
• C#: delegates.
• Python, Perl, ML, Javascript: first-class functions.
Monday, September 12, 11
Callbacks: roadmap
9
sort implementation
client object implementation
import java.io.File;public class FileSorter{ public static void main(String[] args) { File directory = new File(args[0]); File[] files = directory.listFiles(); Insertion.sort(files); for (int i = 0; i < files.length; i++) StdOut.println(files[i].getName()); }}
key point: no reference to File
public static void sort(Comparable[] a){ int N = a.length; for (int i = 0; i < N; i++) for (int j = i; j > 0; j--) if (a[j].compareTo(a[j-1]) < 0) exch(a, j, j-1); else break;}
public class Fileimplements Comparable<File> { ... public int compareTo(File b) { ... return -1; ... return +1; ... return 0; }}
Comparable interface (built in to Java)
public interface Comparable<Item>{ public int compareTo(Item that);}
Monday, September 12, 11
Implement compareTo() so that v.compareTo(w)
• Returns a negative integer, zero, or positive integerif v is less than, equal to, or greater than w.
• Throws an exception if incompatible types or either is null.
Built-in comparable types. String, Double, Integer, Date, File, ...User-defined comparable types. Implement the Comparable interface.
10
Comparable API
less than (return -1) equal to (return 0) greater than (return +1)
v
w v
wv w
Monday, September 12, 11
Java 1.5 API
11
public interface Comparable<T>This interface imposes a total ordering on the objects of each class that implements it. This ordering is referred to as the class's natural ordering, and the class's compareTo method is referred to as its natural comparison method.
Lists (and arrays) of objects that implement this interface can be sorted automatically by Collections.sort (and Arrays.sort). Objects that implement this interface can be used as keys in a sorted map or elements in a sorted set, without the need to specify a comparator.
Total order:
• Reflexive: v = v.
• Antisymmetric: if v < w, then w > v; if v = w, then w = v.
• Transitive: if v ≤ w and w ≤ x, then v ≤ x.
Monday, September 12, 11
Date data type. Simplified version of java.util.Date.
public class Date implements Comparable<Date>{ private final int month, day, year;
public Date(int m, int d, int y) { month = m; day = d; year = y; }
public int compareTo(Date that) { if (this.year < that.year ) return -1; if (this.year > that.year ) return +1; if (this.month < that.month) return -1; if (this.month > that.month) return +1; if (this.day < that.day ) return -1; if (this.day > that.day ) return +1; return 0; }}
12
Implementing the Comparable interface, an example
only compare dates
to other dates
Monday, September 12, 11
Helper functions. Refer to data through compares and exchanges.
Less. Is item v less than w ?
Exchange. Swap item in array a[] at index i with the one at index j.
13
Two useful sorting abstractions
private static boolean less(Comparable v, Comparable w){ return v.compareTo(w) < 0; }
private static void exch(Comparable[] a, int i, int j){ Comparable swap = a[i]; a[i] = a[j]; a[j] = swap;}
Monday, September 12, 11
Recursive formulations of sorting, cont.
Mergesortmsort(A,i..j) {if i<j { m=(i+j)/2; msort(A,i..m); msort(A,m+1..j); merge(i..m,m+1..j) }}
Quicksortqsort(i..j) {if i<j { m=partition(i..j); qsort(A,i..m); qsort(A,m+1..j)}
14
where work happens;
assuming A[i..m] and A[m+1..j] are sorted, returns sorted list
where work happens;
rearranges A such that all keys in A[1..m] are smaller than all keys in A[m+1..j]
Monday, September 12, 11
Recursive formulations of sorting, cont.
Mergesortmsort(A,i..j) {if i<j { m=(i+j)/2; msort(A,i..m); msort(A,m+1..j); merge(i..m,m+1..j) }}
Quicksortqsort(i..j) {if i<j { m=partition(i..j); qsort(A,i..m); qsort(A,m+1..j)}
14
where work happens;
assuming A[i..m] and A[m+1..j] are sorted, returns sorted list
where work happens;
rearranges A such that all keys in A[1..m] are smaller than all keys in A[m+1..j]
Problem session: Argue that these procedures indeed sort any array A.
Monday, September 12, 11
Sorting routines in Java
15
Examples from the Java 1.5 documentation: • public static void sort(long[] a)
Sorts the specified array of longs into ascending numerical order. The sorting algorithm is a tuned quicksort, ... This algorithm offers n*log(n) performance on many data sets that cause other quicksorts to degrade to quadratic performance.
• public static void sort(Object[] a)
This sort is guaranteed to be stable: equal elements will not be reordered as a result of the sort.The sorting algorithm is a modified mergesort (in which the merge is omitted if the highest element in the low sublist is less than the lowest element in the high sublist). This algorithm offers guaranteed n*log(n) performance.
Monday, September 12, 11
16
Merging: Java implementation
A G L O R H I M S T
A G H I L M
i j
k
lo himid
aux[]
a[]
private static void merge(Comparable[] a, Comparable[] aux, int lo, int mid, int hi){ assert isSorted(a, lo, mid); // precondition: a[lo..mid] sorted assert isSorted(a, mid+1, hi); // precondition: a[mid+1..hi] sorted
for (int k = lo; k <= hi; k++) aux[k] = a[k];
int i = lo, j = mid+1; for (int k = lo; k <= hi; k++) { if (i > mid) a[k] = aux[j++]; else if (j > hi) a[k] = aux[i++]; else if (less(aux[j], aux[i])) a[k] = aux[j++]; else a[k] = aux[i++]; }
assert isSorted(a, lo, hi); // postcondition: a[lo..hi] sorted}
copy
merge
Monday, September 12, 11
17
Mergesort: empirical analysis
Running time estimates:
• Laptop executes 108 compares/second.
• Supercomputer executes 1012 compares/second.
Bottom line. Good algorithms are better than supercomputers.
insertion sort (Ninsertion sort (Ninsertion sort (N2) mergesort (N log N)mergesort (N log N)mergesort (N log N)
computer thousand million billion thousand million billion
home instant 2.8 hours 317 years instant 1 second 18 min
super instant 1 second 1 week instant instant instant
Monday, September 12, 11
Proposition. If D (N) satisfies D (N) = 2 D (N / 2) + N for N > 1, with D (1) = 0,then D (N) = N lg N.Pf 1. [assuming N is a power of 2]
18
Divide-and-conquer recurrence: proof by picture
D (N)
D (N / 2)D (N / 2)
D (N / 4)D (N / 4)D (N / 4) D (N / 4)
D (2) D (2) D (2) D (2) D (2) D (2) D (2)
N
D (N / 2k)
2 (N/2)
2k (N/2k)
N/2 (2)
...
lg N
N lg N
= N
= N
= N
= N
...
D (2)
4 (N/4) = N
Monday, September 12, 11
Proposition. If D (N) satisfies D (N) = 2 D (N / 2) + N for N > 1, with D (1) = 0,then D (N) = N lg N. Pf 2. [assuming N is a power of 2]
19
Divide-and-conquer recurrence: proof by expansion
D(N) = 2 D(N/2) + N
D(N) / N = 2 D(N/2) / N + 1
= D(N/2) / (N/2) + 1
= D(N/4) / (N/4) + 1 + 1
= D(N/8) / (N/8) + 1 + 1 + 1
. . .
= D(N/N) / (N/N) + 1 + 1 + ... + 1
= lg N
given
divide both sides by N
algebra
apply to first term
apply to first term again
stop applying, D(1) = 0
Monday, September 12, 11
20
Quicksort t-shirt
Monday, September 12, 11
21
Quicksort: Java code for partitioning
private static int partition(Comparable[] a, int lo, int hi){ int i = lo, j = hi+1; while (true) { while (less(a[++i], a[lo])) if (i == hi) break;
while (less(a[lo], a[--j])) if (j == lo) break; if (i >= j) break; exch(a, i, j); }
exch(a, lo, j); return j;}
swap with partitioning item
check if pointers cross
find item on right to swap
find item on left to swap
swap
return index of item now known to be in place
i
! v" v
j
v
v
lo hi
lo hi
v
! v" v
j
before
during
after
Quicksort partitioning overview
i
! v" v
j
v
v
lo hi
lo hi
v
! v" v
j
before
during
after
Quicksort partitioning overview
i
! v" v
j
v
v
lo hi
lo hi
v
! v" v
j
before
during
after
Quicksort partitioning overview
Monday, September 12, 11
Quicksort analysis sketch (different from book, assuming no duplicates)
• Time is dominated by comparisons involving pivot elements.
• Pivot is chosen at random: When partitioning subarray of m elements around x, then with prob. ≥ 1/2, x ends up in a partition with at most ¾m elements. Pf. Imagine a sorted subarray:
•Diagram of how x’s subarray size may evolve, with probabilities:
•Observation: Number of horizontal steps is at most C lg N, for some constant C, with high prob. Number of pivot comparisons of x is ≤ (C+2) lg N.
22
<1/2 <1/2 <1/2
<1/2 <1/2
<1/2
≥1/2
≥1/2
≥1/2
x x x x
x x x
x x
x
xbad pivot elements
Monday, September 12, 11
23
Computational complexity. Framework to study efficiency of algorithmsfor solving a particular problem X.
Model of computation. Allowable operations.Cost model. Operation count(s).Upper bound. Cost guarantee provided by some algorithm for X.Lower bound. Proven limit on cost guarantee of all algorithms for X.Optimal algorithm. Algorithm with best possible cost guarantee for X.
Example: sorting.
• Model of computation: decision tree.
• Cost model: # compares.
• Upper bound: ~ N lg N from mergesort.
• Lower bound: ?
• Optimal algorithm: ?
lower bound ~ upper bound
can access information
only through compares
(e.g., our Java sorting framework)
Complexity of sorting
Monday, September 12, 11
24
Decision tree (for 3 distinct items a, b, and c)
Monday, September 12, 11
24
Decision tree (for 3 distinct items a, b, and c)
a < b
yes no
code between compares
(e.g., sequence of exchanges)
Monday, September 12, 11
24
Decision tree (for 3 distinct items a, b, and c)
b < c
yes no
a < b
yes no
code between compares
(e.g., sequence of exchanges)
Monday, September 12, 11
24
Decision tree (for 3 distinct items a, b, and c)
b < c
yes no
a b c
a < b
yes no
code between compares
(e.g., sequence of exchanges)
Monday, September 12, 11
24
Decision tree (for 3 distinct items a, b, and c)
b < c
yes no
a < c
yes no
a b c
a < b
yes no
code between compares
(e.g., sequence of exchanges)
Monday, September 12, 11
24
Decision tree (for 3 distinct items a, b, and c)
b < c
yes no
a < c
yes no
a c b
a b c
a < b
yes no
code between compares
(e.g., sequence of exchanges)
Monday, September 12, 11
24
Decision tree (for 3 distinct items a, b, and c)
b < c
yes no
a < c
yes no
a c b c a b
a b c
a < b
yes no
code between compares
(e.g., sequence of exchanges)
Monday, September 12, 11
24
Decision tree (for 3 distinct items a, b, and c)
b < c
yes no
a < c
yes no
a < c
yes no
a c b c a b
a b c
a < b
yes no
code between compares
(e.g., sequence of exchanges)
Monday, September 12, 11
24
Decision tree (for 3 distinct items a, b, and c)
b < c
yes no
a < c
yes no
a < c
yes no
a c b c a b
b a ca b c
a < b
yes no
code between compares
(e.g., sequence of exchanges)
Monday, September 12, 11
24
Decision tree (for 3 distinct items a, b, and c)
b < c
yes no
a < c
yes no
a < c
yes no
a c b c a b
b a ca b c b < c
yes no
a < b
yes no
code between compares
(e.g., sequence of exchanges)
Monday, September 12, 11
24
Decision tree (for 3 distinct items a, b, and c)
b < c
yes no
a < c
yes no
a < c
yes no
a c b c a b
b a ca b c b < c
yes no
b c a
a < b
yes no
code between compares
(e.g., sequence of exchanges)
Monday, September 12, 11
24
Decision tree (for 3 distinct items a, b, and c)
b < c
yes no
a < c
yes no
a < c
yes no
a c b c a b
b a ca b c b < c
yes no
b c a c b a
a < b
yes no
code between compares
(e.g., sequence of exchanges)
Monday, September 12, 11
24
Decision tree (for 3 distinct items a, b, and c)
b < c
yes no
a < c
yes no
a < c
yes no
a c b c a b
b a ca b c b < c
yes no
b c a c b a
height of tree =
worst-case number
of compares
a < b
yes no
code between compares
(e.g., sequence of exchanges)
Monday, September 12, 11
24
Decision tree (for 3 distinct items a, b, and c)
b < c
yes no
a < c
yes no
a < c
yes no
a c b c a b
b a ca b c b < c
yes no
b c a c b a
height of tree =
worst-case number
of compares
a < b
yes no
code between compares
(e.g., sequence of exchanges)
(at least) one leaf for each possible ordering
Monday, September 12, 11
25
Compare-based lower bound for sorting
Proposition. Any compare-based sorting algorithm must use at leastlg ( N ! ) ~ N lg N compares in the worst-case.
Pf.
• Assume array consists of N distinct values a1 through aN.
• Worst case dictated by height h of decision tree.
• Binary tree of height h has at most 2 h leaves.
• N ! different orderings ⇒ at least N ! leaves.
at least N! leaves no more than 2h leaves
h
Monday, September 12, 11
26
Compare-based lower bound for sorting
Proposition. Any compare-based sorting algorithm must use at leastlg ( N ! ) ~ N lg N compares in the worst-case.
Pf.
• Assume array consists of N distinct values a1 through aN.
• Worst case dictated by height h of decision tree.
• Binary tree of height h has at most 2 h leaves.
• N ! different orderings ⇒ at least N ! leaves.
2 h ≥ # leaves ≥ N !
⇒ h ≥ lg ( N ! ) ~ N lg N
Stirling's formula
Monday, September 12, 11
27
Complexity of sorting
Model of computation. Allowable operations.Cost model. Operation count(s).Upper bound. Cost guarantee provided by some algorithm for X.Lower bound. Proven limit on cost guarantee of all algorithms for X.Optimal algorithm. Algorithm with best possible cost guarantee for X.
Example: sorting.
• Model of computation: decision tree.
• Cost model: # compares.
• Upper bound: ~ N lg N from mergesort.
• Lower bound: ~ N lg N.
• Optimal algorithm: mergesort.
First goal of algorithm design: optimal algorithms.
Monday, September 12, 11
Scandinavians behind the asymptotically fastest sorting algorithms
The optimality of time ~N lg N rests on the assumption that we sort by comparisons only. What if we allow “all kinds of algorithmic tricks”?
28
Mikkel Thorup has shown how to sort n integers in expected time
Solution is complicated – not a practical improvement. A practical O(n log log n) algorithm by Andersson et al. is known.
It is not known if the above is best possible, asymptotically.
Mikkel Thorup
Arne Andersson
Monday, September 12, 11
Course goals
29
• This lecture was particularly relevant for three course goals: – Clearly explain the mechanics of computations
and data structures involving manipulation of references, nested loops and recursion, specified in natural language, in abstract pseudocode or in concrete programming language (Java).
– Assess scalability of a given single-threaded software application, using asymptotic analysis.
– Design algorithms for ad hoc problems by using and combining known algorithms and data structures.
Monday, September 12, 11
Next week
We will, in a sense, combine our techniques for list representation and sorting, to obtain efficiently sorted lists.
The search tree data structures that we will study are a core ingredient in modern data management systems.
30
Monday, September 12, 11