Mergesort and Quicksort - IT Uitu.dk/people/pagh/ads11/03-Sorting.pdf · The sorting algorithm is a...

44
Algorithms and Data Structures, Fall 2011 Sorting Rasmus Pagh Based on slides by Kevin Wayne, Princeton Algorithms, 4 th Edition · Robert Sedgewick and Kevin Wayne · Copyright © 2002–2011 Monday, September 12, 11

Transcript of Mergesort and Quicksort - IT Uitu.dk/people/pagh/ads11/03-Sorting.pdf · The sorting algorithm is a...

Page 1: Mergesort and Quicksort - IT Uitu.dk/people/pagh/ads11/03-Sorting.pdf · The sorting algorithm is a tuned quicksort, ... This algorithm offers n*log(n) performance on many data sets

Algorithms and Data Structures, Fall 2011

Sorting

Rasmus Pagh

Based on slides by Kevin Wayne, PrincetonAlgorithms, 4th Edition · Robert Sedgewick and Kevin Wayne · Copyright © 2002–2011

Monday, September 12, 11

Page 2: Mergesort and Quicksort - IT Uitu.dk/people/pagh/ads11/03-Sorting.pdf · The sorting algorithm is a tuned quicksort, ... This algorithm offers n*log(n) performance on many data sets

Ex. Student records in a university.

Sort. Rearrange array of N items into ascending order.

2

Sorting problem

item

key

Chen 3 A 991-878-4944 308 Blair

Rohde 2 A 232-343-5555 343 Forbes

Gazsi 4 B 766-093-9873 101 Brown

Furia 1 A 766-093-9873 101 Brown

Kanaga 3 B 898-122-9643 22 Brown

Andrews 3 A 664-480-0023 097 Little

Battle 4 C 874-088-1212 121 Whitman

Andrews 3 A 664-480-0023 097 Little

Battle 4 C 874-088-1212 121 Whitman

Chen 3 A 991-878-4944 308 Blair

Furia 1 A 766-093-9873 101 Brown

Gazsi 4 B 766-093-9873 101 Brown

Kanaga 3 B 898-122-9643 22 Brown

Rohde 2 A 232-343-5555 343 Forbes

Monday, September 12, 11

Page 3: Mergesort and Quicksort - IT Uitu.dk/people/pagh/ads11/03-Sorting.pdf · The sorting algorithm is a tuned quicksort, ... This algorithm offers n*log(n) performance on many data sets

Application: Find matching records in a database.

Fact:Sorting is one of the most important subroutines for handling large data sets.

3

Sorting problem

Rohde 232-343-5555

Andrews 664-480-0023

Gazsi 766-093-9873

Furia 766-093-9873

Battle 874-088-1212

Kanaga 898-122-9643

Chen 991-878-4944

232-343-5555 Sep 7, 2011 4:13 PM

664-480-0023 Sep 7, 2011 4:17 AM

664-480-0023 Sep 7, 2011 3:59 AM

766-093-9873 Sep 7, 2011 4:19 AM

898-122-9643 Sep 7, 2011 3:51 AM

898-122-9643 Sep 7, 2011 4:01 AM

991-878-4944 Sep 7, 2011 3:47 AM

Monday, September 12, 11

Page 4: Mergesort and Quicksort - IT Uitu.dk/people/pagh/ads11/03-Sorting.pdf · The sorting algorithm is a tuned quicksort, ... This algorithm offers n*log(n) performance on many data sets

Problem session

Consider the following problem:

Given a list of appointments (start time, end time), find all pairs of overlapping appointments.

Argue that this can be solved easily by sorting!

What is the time spent, not including the sorting step?

4

Monday, September 12, 11

Page 5: Mergesort and Quicksort - IT Uitu.dk/people/pagh/ads11/03-Sorting.pdf · The sorting algorithm is a tuned quicksort, ... This algorithm offers n*log(n) performance on many data sets

Recursive formulations of sorting

Selection sortssort(A,i..j) {if i<j { Locate the position k of the smallest item in A[i..j]; Exchange A[k] and A[i]; ssort(A,i+1,j) }}

Insertion sortisort(A,i..j) {if i<j { isort(A,i,j-1); Locate the position k of A[j] in the sorted array A[i..j-1]; Move A[j] to position k and shift previous entries in A[k..j-1] one position to the right }}

5

Monday, September 12, 11

Page 6: Mergesort and Quicksort - IT Uitu.dk/people/pagh/ads11/03-Sorting.pdf · The sorting algorithm is a tuned quicksort, ... This algorithm offers n*log(n) performance on many data sets

Iterative implementations

6

int N = a.length; for (int i = 0; i < N; i++) { int min = i; for (int j = i+1; j < N; j++) if (a[j] < a[min]) min = j; exch(a, i, min); }

int N = a.length; for (int i = 0; i < N; i++) for (int j = i; j > 0; j--) if (a[j] < a[j-1]) exch(a, j, j-1); else break;

Selection sort

Insertion sort

exchange array elements

Animations: www.sorting-algorithms.com

Monday, September 12, 11

Page 7: Mergesort and Quicksort - IT Uitu.dk/people/pagh/ads11/03-Sorting.pdf · The sorting algorithm is a tuned quicksort, ... This algorithm offers n*log(n) performance on many data sets

Time analysis

7

int N = a.length; for (int i = 0; i < N; i++) { int min = i; for (int j = i+1; j < N; j++) if (a[j] < a[min]) min = j; exch(a, i, min); }

int N = a.length; for (int i = 0; i < N; i++) for (int j = i; j > 0; j--) if (a[j] < a[j-1]) exch(a, j, j-1); else break;

Selection sort

Insertion sort

N-1, N-2,N-3,...,1 iterations

1,2,3,...,N iterations (at most)

~N2/2 comparisons

~N2/2 comparisons (at most)

Monday, September 12, 11

Page 8: Mergesort and Quicksort - IT Uitu.dk/people/pagh/ads11/03-Sorting.pdf · The sorting algorithm is a tuned quicksort, ... This algorithm offers n*log(n) performance on many data sets

8

Callbacks

Goal. Sort any type of data.

Q. How can sort() know to compare data of type Double, String, and java.io.File without any information about the type of an item's key?

Callbacks = reference to executable code.

• Client passes array of objects to sort() function.

• The sort() function calls back object's compareTo() method as needed.

Implementing callbacks.

• Java: interfaces.

• C: function pointers.

• C++: class-type functors.

• C#: delegates.

• Python, Perl, ML, Javascript: first-class functions.

Monday, September 12, 11

Page 9: Mergesort and Quicksort - IT Uitu.dk/people/pagh/ads11/03-Sorting.pdf · The sorting algorithm is a tuned quicksort, ... This algorithm offers n*log(n) performance on many data sets

Callbacks: roadmap

9

sort implementation

client object implementation

import java.io.File;public class FileSorter{ public static void main(String[] args) { File directory = new File(args[0]); File[] files = directory.listFiles(); Insertion.sort(files); for (int i = 0; i < files.length; i++) StdOut.println(files[i].getName()); }}

key point: no reference to File

public static void sort(Comparable[] a){ int N = a.length; for (int i = 0; i < N; i++) for (int j = i; j > 0; j--) if (a[j].compareTo(a[j-1]) < 0) exch(a, j, j-1); else break;}

public class Fileimplements Comparable<File> { ... public int compareTo(File b) { ... return -1; ... return +1; ... return 0; }}

Comparable interface (built in to Java)

public interface Comparable<Item>{ public int compareTo(Item that);}

Monday, September 12, 11

Page 10: Mergesort and Quicksort - IT Uitu.dk/people/pagh/ads11/03-Sorting.pdf · The sorting algorithm is a tuned quicksort, ... This algorithm offers n*log(n) performance on many data sets

Implement compareTo() so that v.compareTo(w)

• Returns a negative integer, zero, or positive integerif v is less than, equal to, or greater than w.

• Throws an exception if incompatible types or either is null.

Built-in comparable types. String, Double, Integer, Date, File, ...User-defined comparable types. Implement the Comparable interface.

10

Comparable API

less than (return -1) equal to (return 0) greater than (return +1)

v

w v

wv w

Monday, September 12, 11

Page 11: Mergesort and Quicksort - IT Uitu.dk/people/pagh/ads11/03-Sorting.pdf · The sorting algorithm is a tuned quicksort, ... This algorithm offers n*log(n) performance on many data sets

Java 1.5 API

11

public interface Comparable<T>This interface imposes a total ordering on the objects of each class that implements it. This ordering is referred to as the class's natural ordering, and the class's compareTo method is referred to as its natural comparison method.

Lists (and arrays) of objects that implement this interface can be sorted automatically by Collections.sort (and Arrays.sort). Objects that implement this interface can be used as keys in a sorted map or elements in a sorted set, without the need to specify a comparator.

Total order:

• Reflexive: v = v.

• Antisymmetric: if v < w, then w > v; if v = w, then w = v.

• Transitive: if v ≤ w and w ≤ x, then v ≤ x.

Monday, September 12, 11

Page 12: Mergesort and Quicksort - IT Uitu.dk/people/pagh/ads11/03-Sorting.pdf · The sorting algorithm is a tuned quicksort, ... This algorithm offers n*log(n) performance on many data sets

Date data type. Simplified version of java.util.Date.

public class Date implements Comparable<Date>{ private final int month, day, year;

public Date(int m, int d, int y) { month = m; day = d; year = y; }

public int compareTo(Date that) { if (this.year < that.year ) return -1; if (this.year > that.year ) return +1; if (this.month < that.month) return -1; if (this.month > that.month) return +1; if (this.day < that.day ) return -1; if (this.day > that.day ) return +1; return 0; }}

12

Implementing the Comparable interface, an example

only compare dates

to other dates

Monday, September 12, 11

Page 13: Mergesort and Quicksort - IT Uitu.dk/people/pagh/ads11/03-Sorting.pdf · The sorting algorithm is a tuned quicksort, ... This algorithm offers n*log(n) performance on many data sets

Helper functions. Refer to data through compares and exchanges.

Less. Is item v less than w ?

Exchange. Swap item in array a[] at index i with the one at index j.

13

Two useful sorting abstractions

private static boolean less(Comparable v, Comparable w){ return v.compareTo(w) < 0; }

private static void exch(Comparable[] a, int i, int j){ Comparable swap = a[i]; a[i] = a[j]; a[j] = swap;}

Monday, September 12, 11

Page 14: Mergesort and Quicksort - IT Uitu.dk/people/pagh/ads11/03-Sorting.pdf · The sorting algorithm is a tuned quicksort, ... This algorithm offers n*log(n) performance on many data sets

Recursive formulations of sorting, cont.

Mergesortmsort(A,i..j) {if i<j { m=(i+j)/2; msort(A,i..m); msort(A,m+1..j); merge(i..m,m+1..j) }}

Quicksortqsort(i..j) {if i<j { m=partition(i..j); qsort(A,i..m); qsort(A,m+1..j)}

14

where work happens;

assuming A[i..m] and A[m+1..j] are sorted, returns sorted list

where work happens;

rearranges A such that all keys in A[1..m] are smaller than all keys in A[m+1..j]

Monday, September 12, 11

Page 15: Mergesort and Quicksort - IT Uitu.dk/people/pagh/ads11/03-Sorting.pdf · The sorting algorithm is a tuned quicksort, ... This algorithm offers n*log(n) performance on many data sets

Recursive formulations of sorting, cont.

Mergesortmsort(A,i..j) {if i<j { m=(i+j)/2; msort(A,i..m); msort(A,m+1..j); merge(i..m,m+1..j) }}

Quicksortqsort(i..j) {if i<j { m=partition(i..j); qsort(A,i..m); qsort(A,m+1..j)}

14

where work happens;

assuming A[i..m] and A[m+1..j] are sorted, returns sorted list

where work happens;

rearranges A such that all keys in A[1..m] are smaller than all keys in A[m+1..j]

Problem session: Argue that these procedures indeed sort any array A.

Monday, September 12, 11

Page 16: Mergesort and Quicksort - IT Uitu.dk/people/pagh/ads11/03-Sorting.pdf · The sorting algorithm is a tuned quicksort, ... This algorithm offers n*log(n) performance on many data sets

Sorting routines in Java

15

Examples from the Java 1.5 documentation: •  public static void sort(long[] a)

Sorts the specified array of longs into ascending numerical order. The sorting algorithm is a tuned quicksort, ... This algorithm offers n*log(n) performance on many data sets that cause other quicksorts to degrade to quadratic performance.

•  public static void sort(Object[] a)

This sort is guaranteed to be stable: equal elements will not be reordered as a result of the sort.The sorting algorithm is a modified mergesort (in which the merge is omitted if the highest element in the low sublist is less than the lowest element in the high sublist). This algorithm offers guaranteed n*log(n) performance.

Monday, September 12, 11

Page 17: Mergesort and Quicksort - IT Uitu.dk/people/pagh/ads11/03-Sorting.pdf · The sorting algorithm is a tuned quicksort, ... This algorithm offers n*log(n) performance on many data sets

16

Merging: Java implementation

A G L O R H I M S T

A G H I L M

i j

k

lo himid

aux[]

a[]

private static void merge(Comparable[] a, Comparable[] aux, int lo, int mid, int hi){ assert isSorted(a, lo, mid); // precondition: a[lo..mid] sorted assert isSorted(a, mid+1, hi); // precondition: a[mid+1..hi] sorted

for (int k = lo; k <= hi; k++) aux[k] = a[k];

int i = lo, j = mid+1; for (int k = lo; k <= hi; k++) { if (i > mid) a[k] = aux[j++]; else if (j > hi) a[k] = aux[i++]; else if (less(aux[j], aux[i])) a[k] = aux[j++]; else a[k] = aux[i++]; }

assert isSorted(a, lo, hi); // postcondition: a[lo..hi] sorted}

copy

merge

Monday, September 12, 11

Page 18: Mergesort and Quicksort - IT Uitu.dk/people/pagh/ads11/03-Sorting.pdf · The sorting algorithm is a tuned quicksort, ... This algorithm offers n*log(n) performance on many data sets

17

Mergesort: empirical analysis

Running time estimates:

• Laptop executes 108 compares/second.

• Supercomputer executes 1012 compares/second.

Bottom line. Good algorithms are better than supercomputers.

insertion sort (Ninsertion sort (Ninsertion sort (N2) mergesort (N log N)mergesort (N log N)mergesort (N log N)

computer thousand million billion thousand million billion

home instant 2.8 hours 317 years instant 1 second 18 min

super instant 1 second 1 week instant instant instant

Monday, September 12, 11

Page 19: Mergesort and Quicksort - IT Uitu.dk/people/pagh/ads11/03-Sorting.pdf · The sorting algorithm is a tuned quicksort, ... This algorithm offers n*log(n) performance on many data sets

Proposition. If D (N) satisfies D (N) = 2 D (N / 2) + N for N > 1, with D (1) = 0,then D (N) = N lg N.Pf 1. [assuming N is a power of 2]

18

Divide-and-conquer recurrence: proof by picture

D (N)

D (N / 2)D (N / 2)

D (N / 4)D (N / 4)D (N / 4) D (N / 4)

D (2) D (2) D (2) D (2) D (2) D (2) D (2)

N

D (N / 2k)

2 (N/2)

2k (N/2k)

N/2 (2)

...

lg N

N lg N

= N

= N

= N

= N

...

D (2)

4 (N/4) = N

Monday, September 12, 11

Page 20: Mergesort and Quicksort - IT Uitu.dk/people/pagh/ads11/03-Sorting.pdf · The sorting algorithm is a tuned quicksort, ... This algorithm offers n*log(n) performance on many data sets

Proposition. If D (N) satisfies D (N) = 2 D (N / 2) + N for N > 1, with D (1) = 0,then D (N) = N lg N. Pf 2. [assuming N is a power of 2]

19

Divide-and-conquer recurrence: proof by expansion

D(N) = 2 D(N/2) + N

D(N) / N = 2 D(N/2) / N + 1

= D(N/2) / (N/2) + 1

= D(N/4) / (N/4) + 1 + 1

= D(N/8) / (N/8) + 1 + 1 + 1

. . .

= D(N/N) / (N/N) + 1 + 1 + ... + 1

= lg N

given

divide both sides by N

algebra

apply to first term

apply to first term again

stop applying, D(1) = 0

Monday, September 12, 11

Page 21: Mergesort and Quicksort - IT Uitu.dk/people/pagh/ads11/03-Sorting.pdf · The sorting algorithm is a tuned quicksort, ... This algorithm offers n*log(n) performance on many data sets

20

Quicksort t-shirt

Monday, September 12, 11

Page 22: Mergesort and Quicksort - IT Uitu.dk/people/pagh/ads11/03-Sorting.pdf · The sorting algorithm is a tuned quicksort, ... This algorithm offers n*log(n) performance on many data sets

21

Quicksort: Java code for partitioning

private static int partition(Comparable[] a, int lo, int hi){ int i = lo, j = hi+1; while (true) { while (less(a[++i], a[lo])) if (i == hi) break;

while (less(a[lo], a[--j])) if (j == lo) break; if (i >= j) break; exch(a, i, j); }

exch(a, lo, j); return j;}

swap with partitioning item

check if pointers cross

find item on right to swap

find item on left to swap

swap

return index of item now known to be in place

i

! v" v

j

v

v

lo hi

lo hi

v

! v" v

j

before

during

after

Quicksort partitioning overview

i

! v" v

j

v

v

lo hi

lo hi

v

! v" v

j

before

during

after

Quicksort partitioning overview

i

! v" v

j

v

v

lo hi

lo hi

v

! v" v

j

before

during

after

Quicksort partitioning overview

Monday, September 12, 11

Page 23: Mergesort and Quicksort - IT Uitu.dk/people/pagh/ads11/03-Sorting.pdf · The sorting algorithm is a tuned quicksort, ... This algorithm offers n*log(n) performance on many data sets

Quicksort analysis sketch (different from book, assuming no duplicates)

• Time is dominated by comparisons involving pivot elements.

• Pivot is chosen at random: When partitioning subarray of m elements around x, then with prob. ≥ 1/2, x ends up in a partition with at most ¾m elements. Pf. Imagine a sorted subarray:

•Diagram of how x’s subarray size may evolve, with probabilities:

•Observation: Number of horizontal steps is at most C lg N, for some constant C, with high prob. Number of pivot comparisons of x is ≤ (C+2) lg N.

22

<1/2 <1/2 <1/2

<1/2 <1/2

<1/2

≥1/2

≥1/2

≥1/2

x x x x

x x x

x x

x

xbad pivot elements

Monday, September 12, 11

Page 24: Mergesort and Quicksort - IT Uitu.dk/people/pagh/ads11/03-Sorting.pdf · The sorting algorithm is a tuned quicksort, ... This algorithm offers n*log(n) performance on many data sets

23

Computational complexity. Framework to study efficiency of algorithmsfor solving a particular problem X.

Model of computation. Allowable operations.Cost model. Operation count(s).Upper bound. Cost guarantee provided by some algorithm for X.Lower bound. Proven limit on cost guarantee of all algorithms for X.Optimal algorithm. Algorithm with best possible cost guarantee for X.

Example: sorting.

• Model of computation: decision tree.

• Cost model: # compares.

• Upper bound: ~ N lg N from mergesort.

• Lower bound: ?

• Optimal algorithm: ?

lower bound ~ upper bound

can access information

only through compares

(e.g., our Java sorting framework)

Complexity of sorting

Monday, September 12, 11

Page 25: Mergesort and Quicksort - IT Uitu.dk/people/pagh/ads11/03-Sorting.pdf · The sorting algorithm is a tuned quicksort, ... This algorithm offers n*log(n) performance on many data sets

24

Decision tree (for 3 distinct items a, b, and c)

Monday, September 12, 11

Page 26: Mergesort and Quicksort - IT Uitu.dk/people/pagh/ads11/03-Sorting.pdf · The sorting algorithm is a tuned quicksort, ... This algorithm offers n*log(n) performance on many data sets

24

Decision tree (for 3 distinct items a, b, and c)

a < b

yes no

code between compares

(e.g., sequence of exchanges)

Monday, September 12, 11

Page 27: Mergesort and Quicksort - IT Uitu.dk/people/pagh/ads11/03-Sorting.pdf · The sorting algorithm is a tuned quicksort, ... This algorithm offers n*log(n) performance on many data sets

24

Decision tree (for 3 distinct items a, b, and c)

b < c

yes no

a < b

yes no

code between compares

(e.g., sequence of exchanges)

Monday, September 12, 11

Page 28: Mergesort and Quicksort - IT Uitu.dk/people/pagh/ads11/03-Sorting.pdf · The sorting algorithm is a tuned quicksort, ... This algorithm offers n*log(n) performance on many data sets

24

Decision tree (for 3 distinct items a, b, and c)

b < c

yes no

a b c

a < b

yes no

code between compares

(e.g., sequence of exchanges)

Monday, September 12, 11

Page 29: Mergesort and Quicksort - IT Uitu.dk/people/pagh/ads11/03-Sorting.pdf · The sorting algorithm is a tuned quicksort, ... This algorithm offers n*log(n) performance on many data sets

24

Decision tree (for 3 distinct items a, b, and c)

b < c

yes no

a < c

yes no

a b c

a < b

yes no

code between compares

(e.g., sequence of exchanges)

Monday, September 12, 11

Page 30: Mergesort and Quicksort - IT Uitu.dk/people/pagh/ads11/03-Sorting.pdf · The sorting algorithm is a tuned quicksort, ... This algorithm offers n*log(n) performance on many data sets

24

Decision tree (for 3 distinct items a, b, and c)

b < c

yes no

a < c

yes no

a c b

a b c

a < b

yes no

code between compares

(e.g., sequence of exchanges)

Monday, September 12, 11

Page 31: Mergesort and Quicksort - IT Uitu.dk/people/pagh/ads11/03-Sorting.pdf · The sorting algorithm is a tuned quicksort, ... This algorithm offers n*log(n) performance on many data sets

24

Decision tree (for 3 distinct items a, b, and c)

b < c

yes no

a < c

yes no

a c b c a b

a b c

a < b

yes no

code between compares

(e.g., sequence of exchanges)

Monday, September 12, 11

Page 32: Mergesort and Quicksort - IT Uitu.dk/people/pagh/ads11/03-Sorting.pdf · The sorting algorithm is a tuned quicksort, ... This algorithm offers n*log(n) performance on many data sets

24

Decision tree (for 3 distinct items a, b, and c)

b < c

yes no

a < c

yes no

a < c

yes no

a c b c a b

a b c

a < b

yes no

code between compares

(e.g., sequence of exchanges)

Monday, September 12, 11

Page 33: Mergesort and Quicksort - IT Uitu.dk/people/pagh/ads11/03-Sorting.pdf · The sorting algorithm is a tuned quicksort, ... This algorithm offers n*log(n) performance on many data sets

24

Decision tree (for 3 distinct items a, b, and c)

b < c

yes no

a < c

yes no

a < c

yes no

a c b c a b

b a ca b c

a < b

yes no

code between compares

(e.g., sequence of exchanges)

Monday, September 12, 11

Page 34: Mergesort and Quicksort - IT Uitu.dk/people/pagh/ads11/03-Sorting.pdf · The sorting algorithm is a tuned quicksort, ... This algorithm offers n*log(n) performance on many data sets

24

Decision tree (for 3 distinct items a, b, and c)

b < c

yes no

a < c

yes no

a < c

yes no

a c b c a b

b a ca b c b < c

yes no

a < b

yes no

code between compares

(e.g., sequence of exchanges)

Monday, September 12, 11

Page 35: Mergesort and Quicksort - IT Uitu.dk/people/pagh/ads11/03-Sorting.pdf · The sorting algorithm is a tuned quicksort, ... This algorithm offers n*log(n) performance on many data sets

24

Decision tree (for 3 distinct items a, b, and c)

b < c

yes no

a < c

yes no

a < c

yes no

a c b c a b

b a ca b c b < c

yes no

b c a

a < b

yes no

code between compares

(e.g., sequence of exchanges)

Monday, September 12, 11

Page 36: Mergesort and Quicksort - IT Uitu.dk/people/pagh/ads11/03-Sorting.pdf · The sorting algorithm is a tuned quicksort, ... This algorithm offers n*log(n) performance on many data sets

24

Decision tree (for 3 distinct items a, b, and c)

b < c

yes no

a < c

yes no

a < c

yes no

a c b c a b

b a ca b c b < c

yes no

b c a c b a

a < b

yes no

code between compares

(e.g., sequence of exchanges)

Monday, September 12, 11

Page 37: Mergesort and Quicksort - IT Uitu.dk/people/pagh/ads11/03-Sorting.pdf · The sorting algorithm is a tuned quicksort, ... This algorithm offers n*log(n) performance on many data sets

24

Decision tree (for 3 distinct items a, b, and c)

b < c

yes no

a < c

yes no

a < c

yes no

a c b c a b

b a ca b c b < c

yes no

b c a c b a

height of tree =

worst-case number

of compares

a < b

yes no

code between compares

(e.g., sequence of exchanges)

Monday, September 12, 11

Page 38: Mergesort and Quicksort - IT Uitu.dk/people/pagh/ads11/03-Sorting.pdf · The sorting algorithm is a tuned quicksort, ... This algorithm offers n*log(n) performance on many data sets

24

Decision tree (for 3 distinct items a, b, and c)

b < c

yes no

a < c

yes no

a < c

yes no

a c b c a b

b a ca b c b < c

yes no

b c a c b a

height of tree =

worst-case number

of compares

a < b

yes no

code between compares

(e.g., sequence of exchanges)

(at least) one leaf for each possible ordering

Monday, September 12, 11

Page 39: Mergesort and Quicksort - IT Uitu.dk/people/pagh/ads11/03-Sorting.pdf · The sorting algorithm is a tuned quicksort, ... This algorithm offers n*log(n) performance on many data sets

25

Compare-based lower bound for sorting

Proposition. Any compare-based sorting algorithm must use at leastlg ( N ! ) ~ N lg N compares in the worst-case.

Pf.

• Assume array consists of N distinct values a1 through aN.

• Worst case dictated by height h of decision tree.

• Binary tree of height h has at most 2 h leaves.

• N ! different orderings ⇒ at least N ! leaves.

at least N! leaves no more than 2h leaves

h

Monday, September 12, 11

Page 40: Mergesort and Quicksort - IT Uitu.dk/people/pagh/ads11/03-Sorting.pdf · The sorting algorithm is a tuned quicksort, ... This algorithm offers n*log(n) performance on many data sets

26

Compare-based lower bound for sorting

Proposition. Any compare-based sorting algorithm must use at leastlg ( N ! ) ~ N lg N compares in the worst-case.

Pf.

• Assume array consists of N distinct values a1 through aN.

• Worst case dictated by height h of decision tree.

• Binary tree of height h has at most 2 h leaves.

• N ! different orderings ⇒ at least N ! leaves.

2 h ≥ # leaves ≥ N !

⇒ h ≥ lg ( N ! ) ~ N lg N

Stirling's formula

Monday, September 12, 11

Page 41: Mergesort and Quicksort - IT Uitu.dk/people/pagh/ads11/03-Sorting.pdf · The sorting algorithm is a tuned quicksort, ... This algorithm offers n*log(n) performance on many data sets

27

Complexity of sorting

Model of computation. Allowable operations.Cost model. Operation count(s).Upper bound. Cost guarantee provided by some algorithm for X.Lower bound. Proven limit on cost guarantee of all algorithms for X.Optimal algorithm. Algorithm with best possible cost guarantee for X.

Example: sorting.

• Model of computation: decision tree.

• Cost model: # compares.

• Upper bound: ~ N lg N from mergesort.

• Lower bound: ~ N lg N.

• Optimal algorithm: mergesort.

First goal of algorithm design: optimal algorithms.

Monday, September 12, 11

Page 42: Mergesort and Quicksort - IT Uitu.dk/people/pagh/ads11/03-Sorting.pdf · The sorting algorithm is a tuned quicksort, ... This algorithm offers n*log(n) performance on many data sets

Scandinavians behind the asymptotically fastest sorting algorithms

The optimality of time ~N lg N rests on the assumption that we sort by comparisons only. What if we allow “all kinds of algorithmic tricks”?

28

Mikkel Thorup has shown how to sort n integers in expected time

Solution is complicated – not a practical improvement. A practical O(n log log n) algorithm by Andersson et al. is known.

It is not known if the above is best possible, asymptotically.

Mikkel Thorup

Arne Andersson

Monday, September 12, 11

Page 43: Mergesort and Quicksort - IT Uitu.dk/people/pagh/ads11/03-Sorting.pdf · The sorting algorithm is a tuned quicksort, ... This algorithm offers n*log(n) performance on many data sets

Course goals

29

• This lecture was particularly relevant for three course goals: – Clearly explain the mechanics of computations

and data structures involving manipulation of references, nested loops and recursion, specified in natural language, in abstract pseudocode or in concrete programming language (Java).

– Assess scalability of a given single-threaded software application, using asymptotic analysis.

– Design algorithms for ad hoc problems by using and combining known algorithms and data structures.

Monday, September 12, 11

Page 44: Mergesort and Quicksort - IT Uitu.dk/people/pagh/ads11/03-Sorting.pdf · The sorting algorithm is a tuned quicksort, ... This algorithm offers n*log(n) performance on many data sets

Next week

We will, in a sense, combine our techniques for list representation and sorting, to obtain efficiently sorted lists.

The search tree data structures that we will study are a core ingredient in modern data management systems.

30

Monday, September 12, 11