Mergesort, Analysis of Algorithms

22
Mergesort, Analysis of Algorithms Jon von Neumann and ENIAC (1945)

description

Mergesort, Analysis of Algorithms. Jon von Neumann and ENIAC (1945). Why Does It Matter?. Run time (nanoseconds). 1.3 N 3. 10 N 2. 47 N log 2 N. 48 N. Time to solve a problem of size. 1000. 1.3 seconds. 10 msec. 0.4 msec. 0.048 msec. 10,000. 22 minutes. 1 second. 6 msec. - PowerPoint PPT Presentation

Transcript of Mergesort, Analysis of Algorithms

Page 1: Mergesort, Analysis of Algorithms

Mergesort, Analysis of Algorithms

Jon von Neumann and ENIAC (1945)

Page 2: Mergesort, Analysis of Algorithms

2

Why Does It Matter?

1000

Time tosolve a

problemof size

10,000

100,000

million

10 million

1.3 seconds

22 minutes

15 days

41 years

41 millennia

920

3,600

14,000

41,000

1,000

Run time(nanoseconds) 1.3 N3

secondMax sizeproblemsolvedin one

minute

hour

day

10 msec

1 second

1.7 minutes

2.8 hours

1.7 weeks

10,000

77,000

600,000

2.9 million

100

10 N2

0.4 msec

6 msec

78 msec

0.94 seconds

11 seconds

1 million

49 million

2.4 trillion

50 trillion

10+

47 N log2N

0.048 msec

0.48 msec

4.8 msec

48 msec

0.48 seconds

21 million

1.3 billion

76 trillion

1,800 trillion

10

48 N

N multiplied by 10,time multiplied by

Page 3: Mergesort, Analysis of Algorithms

3

Orders of Magnitude

10-10

Meters PerSecond

10-8

10-6

10-4

10-2

1

102

1.2 in / decade

ImperialUnits

1 ft / year

3.4 in / day

1.2 ft / hour

2 ft / minute

2.2 mi / hour

220 mi / hour

Continental drift

Example

Hair growing

Glacier

Gastro-intestinal tract

Ant

Human walk

Propeller airplane

104

106

108

370 mi / min

620 mi / sec

62,000 mi / sec

Space shuttle

Earth in galactic orbit

1/3 speed of light

1

Seconds

102

103

104

105

106

107

108

109

1010

1 second

Equivalent

1.7 minutes

17 minutes

2.8 hours

1.1 days

1.6 weeks

3.8 months

3.1 years

3.1 decades

3.1 centuries

forever

1021 age ofuniverse

210 thousand

220 million

230 billion

. . .

10 10 seconds

Powersof 2

Page 4: Mergesort, Analysis of Algorithms

4

Impact of Better Algorithms

Example 1: N-body-simulation. Simulate gravitational interactions among N bodies.

– physicists want N = # atoms in universe Brute force method: N2 steps. Appel (1981). N log N steps, enables new research.

Example 2: Discrete Fourier Transform (DFT). Breaks down waveforms (sound) into periodic components.

– foundation of signal processing– CD players, JPEG, analyzing astronomical data, etc.

Grade school method: N2 steps. Runge-König (1924), Cooley-Tukey (1965).

FFT algorithm: N log N steps, enables new technology.

Page 5: Mergesort, Analysis of Algorithms

5

Mergesort

Mergesort (divide-and-conquer) Divide array into two halves.

A L G O R I T H M S

divideA L G O R I T H M S

Page 6: Mergesort, Analysis of Algorithms

6

Mergesort

Mergesort (divide-and-conquer) Divide array into two halves. Recursively sort each half.

sort

A L G O R I T H M S

divideA L G O R I T H M S

A G L O R H I M S T

Page 7: Mergesort, Analysis of Algorithms

7

Mergesort

Mergesort (divide-and-conquer) Divide array into two halves. Recursively sort each half. Merge two halves to make sorted whole.

merge

sort

A L G O R I T H M S

divideA L G O R I T H M S

A G L O R H I M S T

A G H I L M O R S T

Page 8: Mergesort, Analysis of Algorithms

8

Mergesort Analysis

How long does mergesort take? Bottleneck = merging (and copying).

– merging two files of size N/2 requires N comparisons T(N) = comparisons to mergesort N elements.

– to make analysis cleaner, assume N is a power of 2

Claim. T(N) = N log2 N.

Note: same number of comparisons for ANY file.– even already sorted

We'll prove several different ways to illustrate standard techniques.

otherwise)2/(2

1 if0)T(

merginghalves both sorting

NNT

NN

Page 9: Mergesort, Analysis of Algorithms

9

Proof by Picture of Recursion Tree

T(N)

T(N/2)T(N/2)

T(N/4)T(N/4)T(N/4) T(N/4)

T(2) T(2) T(2) T(2) T(2) T(2) T(2) T(2)

N

T(N / 2k)

2(N/2)

4(N/4)

2k (N / 2k)

N/2 (2)

. . .

. . .log2N

N log2N

otherwise)2/(2

1 if0)T(

merginghalves both sorting

NNT

NN

Page 10: Mergesort, Analysis of Algorithms

10

Proof by Telescoping

Claim. T(N) = N log2 N (when N is a power of 2).

Proof. For N > 1:

otherwise)2/(2

1 if0)T(

merginghalves both sorting

NNT

NN

N

NN

NNT

N

NT

N

NT

N

NT

N

NT

N

2

log

log

11/

)/(

114/

)4/(

12/

)2/(

1)2/(2)(

2

Page 11: Mergesort, Analysis of Algorithms

11

Mathematical Induction

Mathematical induction. Powerful and general proof technique in discrete mathematics. To prove a theorem true for all integers k 0:

– Base case: prove it to be true for N = 0.– Induction hypothesis: assuming it is true for arbitrary N– Induction step: show it is true for N + 1

Claim: 0 + 1 + 2 + 3 + . . . + N = N(N+1) / 2 for all N 0.

Proof: (by mathematical induction) Base case (N = 0).

– 0 = 0(0+1) / 2. Induction hypothesis: assume 0 + 1 + 2 + . . . + N = N(N+1) / 2 Induction step: 0 + 1 + . . . + N + N + 1 = (0 + 1 + . . . + N) + N+1

= N (N+1) /2 + N+1= (N+2)(N+1) / 2

Page 12: Mergesort, Analysis of Algorithms

12

Proof by Induction

Claim. T(N) = N log2 N (when N is a power of 2).

Proof. (by induction on N) Base case: N = 1. Inductive hypothesis: T(N) = N log2 N.

Goal: show that T(2N) = 2N log2 (2N).

)2(log2

21)2(log2

2log2

2)(2)2(

2

2

2

NN

NNN

NNN

NNTNT

otherwise)2/(2

1 if0)T(

merginghalves both sorting

NNT

NN

Page 13: Mergesort, Analysis of Algorithms

13

Proof by Induction

What if N is not a power of 2?

T(N) satisfies following recurrence.

Claim. T(N) N log2 N.

Proof. See supplemental slides.

otherwise2/2/

1 if0

)T(

merginghalfright solvehalfleft solve

NNTNT

N

N

Page 14: Mergesort, Analysis of Algorithms

14

Computational Complexity

Framework to study efficiency of algorithms. Example = sorting.

MACHINE MODEL = count fundamental operations.– count number of comparisons

UPPER BOUND = algorithm to solve the problem (worst-case).

– N log2 N from mergesort

LOWER BOUND = proof that no algorithm can do better.

– N log2 N - N log2 e

OPTIMAL ALGORITHM: lower bound ~ upper bound.– mergesort

Page 15: Mergesort, Analysis of Algorithms

15

Decision Tree

printa1, a2, a3

a1 < a2

YES NO

a2 < a3

YES NO

a2 < a3

YES NO

a1 < a3

YES NO

a1 < a3

YES NO

printa1, a3, a2

printa3, a1, a2

printa2, a1, a3

printa2, a3, a1

printa3, a2, a1

Page 16: Mergesort, Analysis of Algorithms

16

Comparison Based Sorting Lower Bound

Theorem. Any comparison based sorting algorithm must use(N log2N) comparisons.

Proof. Worst case dictated by tree height h. N! different orderings. One (or more) leaves corresponding to each ordering. Binary tree with N! leaves must have height

Food for thought. What if we don't use comparisons? Stay tuned for radix sort.

eNNN

eN

NhN

22

2

2

loglog

)/(log

)!(log

Stirling's formula

Page 17: Mergesort, Analysis of Algorithms

Extra Slides

Page 18: Mergesort, Analysis of Algorithms

18

Proof by Induction

Claim. T(N) N log2 N.

Proof. (by induction on N) Base case: N = 1. Define n1 = N / 2 , n2 = N / 2. Induction step: assume true for 1, 2, . . . , N – 1.

NN

NNN

NnN

Nnnnn

Nnnnn

NnTnTNT

2

2

22

222221

222121

21

log

)1log(

log

loglog

loglog

)()()(

1loglog

2/2

2/

222

log

2

2

Nn

Nn

N

otherwise2/2/

1 if0

)T(

merginghalfright solvehalfleft solve

NNTNT

N

N

Page 19: Mergesort, Analysis of Algorithms

19

Implementing Mergesort

Item aux[MAXN];

void mergesort(Item a[], int left, int right) { int mid = (right + left) / 2; if (right <= left) return; mergesort(a, left, mid); mergesort(a, mid + 1, right); merge(a, left, mid, right);}

mergesort (see Sedgewick Program 8.3)

uses scratch array

Page 20: Mergesort, Analysis of Algorithms

20

Implementing Mergesort

void merge(Item a[], int left, int mid, int right) { int i, j, k; for (i = mid+1; i > left; i--) aux[i-1] = a[i-1]; for (j = mid; j < right; j++) aux[right+mid-j] = a[j+1];

for (k = left; k <= right; k++) if (ITEMless(aux[i], aux[j])) a[k] = aux[i++]; else a[k] = aux[j--];}

merge (see Sedgewick Program 8.2)

copy to temporary array

merge two sorted sequences

Page 21: Mergesort, Analysis of Algorithms

21

Profiling Mergesort Empirically

void merge(Item a[], int left, int mid, int right) <999>{ int i, j, k; for (<999>i = mid+1; <6043>i > left; <5044>i--) <5044>aux[i-1] = a[i-1]; for (<999>j = mid; <5931>j < right; <4932>j++) <4932>aux[right+mid-j] = a[j+1]; for (<999>k = left; <10975>k <= right; <9976>k++) if (<9976>ITEMless(aux[i], aux[j])) <4543>a[k] = aux[i++]; else <5433>a[k] = aux[j--];<999>}

void mergesort(Item a[], int left, int right) <1999>{ int mid = <1999>(right + left) / 2; if (<1999>right <= left) return<1000>; <999>mergesort(a, aux, left, mid); <999>mergesort(a, aux, mid+1, right); <999>merge(a, aux, left, mid, right);<1999>}

Mergesort prof.out

Striking feature:All numbers SMALL!

# comparisonsTheory ~ N log2 N = 9,966

Actual = 9,976

Page 22: Mergesort, Analysis of Algorithms

22

Sorting Analysis Summary

Running time estimates: Home pc executes 108 comparisons/second. Supercomputer executes 1012 comparisons/second.

Lesson 1: good algorithms are better than supercomputers.

Lesson 2: great algorithms are better than good ones.

computer

home

super

thousand

instant

instant

million

2.8 hours

1 second

billion

317 years

1.6 weeks

Insertion Sort (N2)

thousand

instant

instant

million

1 sec

instant

billion

18 min

instant

Mergesort (N log N)

thousand

instant

instant

million

0.3 sec

instant

billion

6 min

instant

Quicksort (N log N)