Chapter 7
Analysis of Algorithms
© 2006 Pearson Education Inc., Upper Saddle River, NJ. All rights reserved.
Overview
7.1: Shows how to time programs directly, and points out some problems with this approach.
7.2: Introduces powerful mathematical tools that simplify our reasoning about the efficiency of algorithms.
7.3: Makes the connections between this math and actual algorithms.
7.4 and 7.5: Consider whether to analyze the average or the worst-case running time.
Timing
To know which of two methods is faster, the most obvious approach is to time them.
– System.currentTimeMillis() returns the number of milliseconds since midnight, January 1, 1970, Greenwich Mean Time.
Program output:
A modern computer is so fast that, on either of these data structures, the get() method takes less than one millisecond to run.
Results Vary
– Variation is beyond our control.
– The get() method from ArrayList (refer to p138) is roughly twice as fast as the one from LinkedList.
Timing
If we get element 50 instead of element 5:
– In an ArrayList, the get() method jumps right to the array element in question.
– A LinkedList must traverse all the previous list nodes to find the one we want.
– A timing experiment provides empirical evidence.
– We can also use formal, mathematical tools to make statements about the efficiency of an algorithm.
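A minimal timing experiment along these lines can be sketched in Java as follows. The list size, index, and repetition count are illustrative choices, and exact times will vary by machine and run:

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class TimingDemo {

    // Build a list of the Integers 0 .. n-1 backed by an array.
    static List<Integer> makeArrayList(int n) {
        List<Integer> list = new ArrayList<Integer>();
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return list;
    }

    // Build the same list as a linked structure.
    static List<Integer> makeLinkedList(int n) {
        List<Integer> list = new LinkedList<Integer>();
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return list;
    }

    // Time reps calls to get(index), in milliseconds.
    static long timeGets(List<Integer> list, int index, int reps) {
        long start = System.currentTimeMillis();
        for (int r = 0; r < reps; r++) {
            list.get(index);
        }
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        int n = 100;
        // A single get() takes under a millisecond, so we repeat it
        // many times to obtain a measurable duration.
        System.out.println("ArrayList:  "
                + timeGets(makeArrayList(n), 50, 1000000) + " ms");
        System.out.println("LinkedList: "
                + timeGets(makeLinkedList(n), 50, 1000000) + " ms");
    }
}
```

Fetching element 50 forces the LinkedList to traverse 50 nodes per call, while the ArrayList indexes its array directly, which is what makes the gap visible.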
Asymptotic Analysis
To estimate the running time or other resource requirements (such as memory space) of an algorithm, we simplify the analysis in a way that keeps us thinking about the most important aspect: the growth rate of the running time.
This is called “asymptotic algorithm analysis.”
Assume method A takes (10n² − 5) milliseconds to process n elements, and method B takes (100n + 200) milliseconds. The differences for small values of n are relatively insignificant; indeed, for small n, method A looks better.
Asymptotic Analysis
What really concerns us is the asymptotic behavior of the running-time functions:
What happens as n becomes very large?
Asymptotic Analysis
Theorem: 10n² − 5 > 100n + 200 for any value of n ≥ 12.
Proof: for any n ≥ 12:
10n² − 5 ≥ 10·12n − 5
         = 120n − 5
         = 100n + 20n − 5
         ≥ 100n + 20·12 − 5
         = 100n + 240 − 5
         > 100n + 200
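The theorem can also be checked empirically. A small sketch that spot-checks values of n (this illustrates the claim but of course does not replace the proof):

```java
public class BoundCheck {

    // The two running-time expressions from the text, in milliseconds.
    static long methodA(long n) { return 10 * n * n - 5; }
    static long methodB(long n) { return 100 * n + 200; }

    // True exactly when 10n^2 - 5 > 100n + 200.
    static boolean aSlowerThanB(long n) {
        return methodA(n) > methodB(n);
    }

    public static void main(String[] args) {
        // Method A looks better for small n ...
        for (long n = 1; n < 12; n++) {
            System.out.println("n=" + n + " A slower? " + aSlowerThanB(n));
        }
        // ... but is slower for every n >= 12, as the theorem states.
        for (long n = 12; n <= 10000; n++) {
            if (!aSlowerThanB(n)) {
                System.out.println("Counterexample at n=" + n);
            }
        }
    }
}
```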
Asymptotic Analysis
The inequalities in the first and fourth lines follow from the assumption that n ≥ 12; the remaining steps are arithmetic.
Actual running times will depend on a number of factors.
We make statements about the speed of an algorithm in general, rather than about a particular implementation of it.
– This way, we don't have to redo our analysis if we change programming languages or buy a faster computer.
Asymptotic Notation
To keep our running-time expressions general, we allow them to contain unspecified constants.
– Algorithm A takes an² + b milliseconds
– Algorithm B takes cn + d milliseconds
where n is the number of elements to be processed, and a, b, c, and d are unspecified constants that depend on factors such as the speed of the hardware.
Asymptotic Notation
Monotonically nondecreasing (never decreasing)
– A function f is monotonically nondecreasing if f(n + 1) ≥ f(n) for all n.
– An algorithm whose running-time function did not fit into this category would be fairly strange.
– For example, let f(n) = 3n + 1. If n = 4, then f(n) = 3·4 + 1 = 13 and f(n + 1) = f(5) = 3·5 + 1 = 16, so f(n + 1) ≥ f(n) here.
– We call such a function f(n) monotonically nondecreasing.
Asymptotic Notation
Upper Bound: the upper or highest growth rate that the algorithm can have; it is the upper bound for the growth rate, expressed as an equation.
If the upper bound for an algorithm's growth rate is f(n), then we write that the algorithm is in O(f(n)) in the worst case.
The symbol O is pronounced “big-oh.” For example, if n² grows as fast as T(n) for the worst-case input, we say the algorithm is “in O(n²) in the worst case.”
Asymptotic Notation
Definition 7.1: T(n) is in the set O(f(n)) if there exist two positive constants c and n₀ such that |T(n)| ≤ c|f(n)| for all n > n₀.
For example:
– Consider the sequential search algorithm for finding a specified value in an array.
– If visiting and testing one value in the array requires cₛ steps, where cₛ is a positive number,
– then in the average case T(n) = cₛn/2. For all values of n > 1, |T(n)| = |cₛn/2| ≤ cₛ|n|.
– Therefore, by the definition, T(n) is in O(n) for n₀ = 1 and c = cₛ.
– The time complexity (in the worst case) of the sequential search algorithm is O(n).
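The sequential search under discussion can be sketched as follows (a minimal array version, not the book's listing):

```java
public class SequentialSearch {

    // Return the index of target in a, or -1 if it is absent.
    // Each array element is visited and tested at most once, so the
    // worst case (target absent) takes time in Θ(n).
    static int indexOf(int[] a, int target) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] == target) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] data = {4, 8, 15, 16, 23, 42};
        System.out.println(indexOf(data, 16)); // found at index 3
        System.out.println(indexOf(data, 99)); // absent: -1
    }
}
```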
Asymptotic Notation
Lower Bound: describes the least amount of a resource that an algorithm needs for some class of input (the worst-, average-, or best-case input) of size n.
It is the lower bound for the growth rate, expressed as an equation.
The lower bound for an algorithm is denoted by the symbol Ω (pronounced “big-omega”).
Asymptotic Notation
Definition 7.2: T(n) is in the set Ω(f(n)) if there exist two positive constants c and n₀ such that |T(n)| ≥ c|f(n)| for all n > n₀.
For example:
Assume T(n) = c₁n² + c₂n for c₁, c₂ > 0.
Then |c₁n² + c₂n| ≥ |c₁n²| = c₁|n²| for all n > 1.
So |T(n)| ≥ c|n²| for c = c₁ and n₀ = 1.
Therefore, by the definition, T(n) is in Ω(n²).
Asymptotic Notation
When the upper and lower bounds are the same, we indicate this by using Θ (big-Theta) notation. An algorithm is said to be Θ(f(n)) if it is in O(f(n)) and in Ω(f(n)).
Since the sequential search algorithm is both in O(n) and in Ω(n) in the average case, we say it is Θ(n) in the average case.
Orders: functions can be classified into orders. Θ(f) is a tight bound:
– Θ(f) is pronounced “the order of f” or “order f”
– n² is one of the functions in Θ(n²)
– Of the orders considered here, Θ(2ⁿ) is the largest and Θ(1) is the smallest.
For sufficiently large n, a function is asymptotically larger than any function in a lower order.
For example: any function in Θ(n log n) is asymptotically larger than any function in Θ(n).
Asymptotic Notation
Figure 7-5 (p189) Some common orders of functions.
Asymptotic Notation
Multiplying or dividing a function by a positive constant doesn't change its order.
– 3n² and 0.2n² are both in Θ(n²)
A function's order is not changed by adding or subtracting a function in a lower order.
– 2ⁿ − n + log n is in Θ(2ⁿ)
Problem: what is the order of f(n) = 5n³ + 3n² − 4n + 11? Answer: Θ(n³)
Asymptotic Notation
The column frobbing algorithm is in Θ(n²); the row zorching algorithm is in Θ(n).
Θ(n) is a lower order, so we should choose row zorching.
Figure 7-6 (p189) Running-time expressions for two fictitious algorithms
Asymptotic Notation
Synchronized absquatulation takes time in Θ(n²).
This is better than the Θ(n³) dynamic inversion algorithm.
Figure 7-7 (p190) Running-time expressions for other two fictitious algorithms
Asymptotic Notation
n! is at least 2ⁿ/2:
If n is a positive integer and n > 2, then
n! = n · (n−1) · (n−2) · (n−3) · … · 2 · 1
   > 2 · 2 · 2 · … · 2 · 1
   = 2ⁿ⁻¹
   = 2ⁿ/2
Asymptotic Notation
Factorial decortication takes time at least in Θ(2ⁿ).
Cubic flensing is of order Θ(n³), so we should choose cubic flensing, because n! grows at least as fast as 2ⁿ, which grows faster than n³.
If Θ(f) is the set of functions that grow like f, then Ω(f) is the set of functions that grow like f or much more quickly.
We proved above that n! is in Ω(2ⁿ). Ω gives a lower bound.
Asymptotic Notation
Figure 7-9 (p191) order relations between two functions f and g. The symbol ∈ is read “is a member of (the set)”.
Asymptotic Notation
A function in O(g) might actually be larger than g. For example, 3n² ∈ O(n²), but 3n² is larger than n².
f ∈ O(g) if and only if, for some constant c > 0, there is some n₀ ≥ 0 such that f(n) ≤ cg(n) for any n ≥ n₀.
– n₀ is a threshold, so that only large values of n are considered (e.g., in Figure 7-3, n₀ was 12).
f ∈ Ω(g) if and only if, for some constant c > 0, there is some n₀ ≥ 0 such that f(n) ≥ cg(n) for any n ≥ n₀.
f ∈ Θ(g) if and only if f ∈ O(g) and f ∈ Ω(g).
– Showing that f ∈ O(g) is called finding an asymptotic upper bound on f.
– Showing that f ∈ Ω(g) is called finding an asymptotic lower bound.
– Showing that f ∈ Θ(g) is called finding an asymptotically tight bound.
Asymptotic Notation
For k > 1, the orders Θ(kⁿ) are called exponential orders.
For k > 0, the orders Θ(nᵏ) are called polynomial orders.
Counting Steps
When we analyze an algorithm, we're aiming for the order of the running time.
– Write the algorithm down in precise English or in any programming language, such as Java.
– Determine how many steps are accomplished by each line and how many times the line is executed. The time used by the line is the product of these two expressions.
– The total running time for the algorithm is the sum of the time required for each line. The order of this expression is the same as the order of the most time-consuming line.
Counting Steps
Size of the list: n
– Each of the following counts as a single step:
 – Accessing or setting a variable or field, including an element of an array.
 – Addition, subtraction, multiplication, division, and other arithmetic operators.
 – Finding the length of an array or String.
 – Comparisons using ==, <, etc.
 – Any fixed number of single steps, such as two additions and a variable assignment.
– Operators count as single steps. This is not necessarily true of methods in the Java library.
Counting Steps
Tally starts at 0 and ends up equal to n, so there must be n passes through the loop.
– Running time for size() is linear: Θ(n).
Line: 1   2   3   4        5    6    8
Time: c + c + c + c(n+1) + cn + cn + c = 3cn + 5c
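The size() method under analysis is not reproduced in this transcript; for a linked list it is presumably along these lines (the Node class and field names here are assumptions, not the book's listing):

```java
public class CountingSize {

    // A minimal singly linked node; the book's class may differ.
    static class Node {
        int item;
        Node next;
        Node(int item, Node next) { this.item = item; this.next = next; }
    }

    // Count the nodes in the chain starting at front.
    // The loop test runs n + 1 times and the body runs n times,
    // giving a total on the order of 3cn + 5c steps: Θ(n).
    static int size(Node front) {
        int tally = 0;
        for (Node node = front; node != null; node = node.next) {
            tally++;
        }
        return tally;
    }

    public static void main(String[] args) {
        Node list = new Node(1, new Node(2, new Node(3, null)));
        System.out.println(size(list)); // 3
        System.out.println(size(null)); // 0
    }
}
```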
Counting Steps
r: the number of ranks. s: the number of suits. Time complexity (in line 9): Θ((r + 1)s) = Θ(rs)
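The Deck constructor presumably fills the deck with a doubly nested loop over suits and ranks, roughly like this (class and card representation here are illustrative assumptions, not the book's listing):

```java
public class DeckDemo {

    // Build every rank/suit combination. The outer loop runs s times
    // and the inner loop r times per pass, so construction takes
    // time in Θ(rs).
    static String[] buildDeck(int ranks, int suits) {
        String[] cards = new String[ranks * suits];
        int k = 0;
        for (int suit = 0; suit < suits; suit++) {
            for (int rank = 1; rank <= ranks; rank++) {
                cards[k++] = "rank " + rank + " of suit " + suit;
            }
        }
        return cards;
    }

    public static void main(String[] args) {
        String[] deck = buildDeck(13, 4);
        System.out.println(deck.length); // 52
    }
}
```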
Counting Steps
Deck constructor is the most expensive step in the algorithm.
Counting Steps
• Adds up only the numbers for which j ≤ i.
• The total number of passes through the inner loop is
  1 + 2 + … + n = n(n + 1)/2, which is in Θ(n²).
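A loop of this triangular shape can be sketched as follows; counting the passes through the inner loop confirms the n(n + 1)/2 total:

```java
public class TriangularCount {

    // Visit only the pairs (i, j) with j <= i, counting inner passes.
    static long innerPasses(int n) {
        long passes = 0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= i; j++) {
                passes++;            // one pass through the inner loop
            }
        }
        return passes;
    }

    public static void main(String[] args) {
        int n = 100;
        // 1 + 2 + ... + n = n(n + 1)/2, which is in Θ(n²).
        System.out.println(innerPasses(n));          // 5050
        System.out.println((long) n * (n + 1) / 2);  // 5050
    }
}
```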
Counting Steps
A single for loop typically takes time in Θ(n).
A doubly nested for loop typically takes time in Θ(n²).
Counting Steps
Do not overgeneralize the result about loops.
– An enhanced for loop generally runs at most n times, where n is the number of elements in the data structure being traversed.
– A loop may run fewer than n times if it is stopped early by a return or break statement, or if it deals with more than one element on each pass.
Best, Worst, and Average Cases
For example: use the sequential search algorithm to find the 9 of diamonds in a deck of poker cards.
If the first card is the 9 of diamonds, then we only check once to find the card we want. This is the “best case” for the algorithm.
If the last card is the 9 of diamonds, then we have to check 52 times to find the card we want. This is the “worst case” for the algorithm.
On average, the algorithm checks about 52/2 = 26 times to find a card in the deck. We call this the “average case” running time for the algorithm.
If there are n cards in a deck, then the best case is still 1, the worst case is n, and the average case is about n/2.
Best, Worst, and Average Cases
Difficult to analyze because of the if statement, but the average case should be about size/2.
Best, Worst, and Average Cases
We have to decide which kind of analysis we're doing.
Best-case analysis
– Tells us how fast the program runs if we get really lucky about the data. For contains(), the best case is Θ(1).
Best, Worst, and Average Cases
Worst-case analysis
– For contains(), this means assuming that target is not in the ArrayList, giving a running time of Θ(n).
Average-case analysis
– Requires that we make some assumption about what the “average” data set looks like.
– The average running time lies between the two extremes:
Best case ≤ average case ≤ worst case
Best, Worst, and Average Cases
We must always be careful to choose our events so that they are:
– Exhaustive: at least one of them will occur.
– Mutually exclusive: no more than one of them will occur.
With n possible events, the probability of each event occurring is 1/n, assuming that they are equally likely.
Best, Worst, and Average Cases
In the contains() method, if target is at index
– 0, there is 1 pass through the loop
– 1, there are 2 passes
– …
– n − 1, there are n passes
Hence, the average running time for contains() is (1 + 2 + … + n)/n = (n + 1)/2 passes, which is in Θ(n).
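We can check this average directly with a pass-counting version of sequential search (a sketch; the real contains() in the library is more general):

```java
public class AveragePasses {

    static int passes; // loop passes in the most recent search

    // Sequential search, counting passes through the loop.
    static boolean contains(int[] data, int target) {
        passes = 0;
        for (int i = 0; i < data.length; i++) {
            passes++;
            if (data[i] == target) {
                return true;
            }
        }
        return false;
    }

    // Average passes when target is equally likely to be at any index.
    static double averagePasses(int n) {
        int[] data = new int[n];
        for (int i = 0; i < n; i++) {
            data[i] = i;
        }
        long total = 0;
        for (int target = 0; target < n; target++) {
            contains(data, target);
            total += passes;
        }
        return (double) total / n;  // (1 + 2 + ... + n)/n = (n + 1)/2
    }

    public static void main(String[] args) {
        System.out.println(averagePasses(51)); // (51 + 1)/2 = 26.0
    }
}
```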
Amortized Analysis
Average-case analysis
– How much time does this algorithm take on a typical run?
Worst-case analysis
– How much time does this algorithm take on the worst possible run?
Amortized analysis
– If this algorithm is run several times, what is the average time per run, given the worst possible sequence of runs?
Amortized Analysis
Amortized analysis does not have to make any assumptions about what a “typical” run looks like.
– Often, the amortized running time is the same as the worst-case running time.
– For some algorithms, it is not possible for the worst run to occur many times in a row.
Amortized Analysis
We start with a new, empty ArrayList and invoke add() n times. If we start at capacity 1 and double every time the ArrayList fills up, we need to stretch the ArrayList every time its size exceeds a power of 2.
Amortized Analysis
The total time spent stretching over this sequence of runs is therefore:
1 + 2 + 4 + … + 2ᵗ⁻¹
Amortized Analysis
Assuming there are t terms, this geometric sum is 2ᵗ − 1, which is less than 2n.
Dividing by the n invocations of add(), the amortized time per operation is constant.
Amortized Analysis
What if, instead of doubling the capacity of the ArrayList, we increase the capacity by 1 each time?
Amortized Analysis
Now the total copying time for n operations is 1 + 2 + … + (n − 1) = (n − 1)n/2.
Dividing this by n operations, (n − 1)n/(2n) = (n − 1)/2, we find that the amortized time per operation is now linear.
Amortized analysis tells us that we should prefer the version of stretch() that doubles the capacity of the ArrayList, because constant time is better than linear time.
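The two stretching strategies can be compared by counting element copies directly (a sketch of the growth policies discussed here, not the library's actual implementation):

```java
public class AmortizedDemo {

    // Total element copies performed while adding n items to an array
    // that starts at capacity 1 and doubles each time it fills up.
    static long copiesWithDoubling(int n) {
        int capacity = 1;
        int size = 0;
        long copies = 0;
        for (int i = 0; i < n; i++) {
            if (size == capacity) {
                copies += size;   // stretch: copy every element over
                capacity *= 2;
            }
            size++;
        }
        return copies;
    }

    // Same count when the capacity grows by only 1 each time.
    static long copiesWithIncrement(int n) {
        int capacity = 1;
        int size = 0;
        long copies = 0;
        for (int i = 0; i < n; i++) {
            if (size == capacity) {
                copies += size;   // stretch: copy every element over
                capacity += 1;
            }
            size++;
        }
        return copies;
    }

    public static void main(String[] args) {
        int n = 1000;
        System.out.println(copiesWithDoubling(n));   // stays under 2n
        System.out.println(copiesWithIncrement(n));  // (n - 1)n/2
    }
}
```

For n = 1000, doubling performs 1 + 2 + 4 + … + 512 = 1023 copies (under 2n = 2000), while growing by 1 performs 1 + 2 + … + 999 = 499500 copies.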
Summary
Analysis of algorithms gives us mathematical tools for comparing the efficiency of different algorithms.
We are interested in asymptotic behavior:
– What happens to the running times as the size of the data becomes arbitrarily large?
Summary
The running time of a method can be expressed as a function, and functions can be classified into orders.
– A function's order is unchanged if it is multiplied by a constant factor or if a lower-order function is added.
– For example: O(3n) = O(n) and O(n² + n) = O(n²).
– Things like faster hardware do not affect the order of an algorithm's running-time function.
Summary
To find the running time of an algorithm, we determine the running time for each line.
– The number of simple (constant-time) steps the line accomplishes, multiplied by the number of times the line is run.
Best case ≤ average case ≤ amortized ≤ worst case
– Worst-case analysis is the most common.
– Average-case analysis is also useful, but requires some assumption about what “average” data looks like.
– Amortized analysis is appropriate when the worst case (such as stretching an array) cannot happen on every run.
Chapter 7 Self-Study Homework
Pages: 191-202 Do Exercises: 7.2, 7.6, 7.10, 7.12, 7.15.