Chapter 7
Analysis of Algorithms
© 2006 Pearson Education Inc., Upper Saddle River, NJ. All rights reserved.
Overview
7.1: Shows how to time programs directly, and points out some problems with this approach.
7.2: Introduces powerful mathematical tools that simplify our reasoning about the efficiency of algorithms.
7.3: Makes the connections between this math and actual algorithms.
7.4 and 7.5: Consider whether to analyze the average or the worst-case running time.
Timing
To know which of two methods is faster, the most obvious approach is to time them.
– System.currentTimeMillis() returns the number of milliseconds since midnight, January 1, 1970, Greenwich Mean Time.
Program output:
A modern computer is so fast that, on either of these data structures, the get() method takes less than one millisecond to run.
Results Vary
– Variation is beyond our control.
– The get() method from ArrayList (refer to p138) is roughly twice as fast as the one from LinkedList.
Timing
If we get element 50 instead of element 5:
– In an ArrayList, the get() method jumps right to the array element in question.
– A LinkedList must traverse all the previous list nodes to find the one we want.
– A timing experiment provides empirical evidence.
– We can also use formal, mathematical tools to make statements about the efficiency of an algorithm.
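A minimal timing experiment along these lines can be sketched in Java as follows. The list size, index, and repetition count are illustrative choices, and exact times will vary by machine and run:

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class TimingDemo {

    // Build a list of the Integers 0 .. n-1 backed by an array.
    static List<Integer> makeArrayList(int n) {
        List<Integer> list = new ArrayList<Integer>();
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return list;
    }

    // Build the same list as a linked structure.
    static List<Integer> makeLinkedList(int n) {
        List<Integer> list = new LinkedList<Integer>();
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return list;
    }

    // Time reps calls to get(index), in milliseconds.
    static long timeGets(List<Integer> list, int index, int reps) {
        long start = System.currentTimeMillis();
        for (int r = 0; r < reps; r++) {
            list.get(index);
        }
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        int n = 100;
        // A single get() takes under a millisecond, so we repeat it
        // many times to obtain a measurable duration.
        System.out.println("ArrayList:  "
                + timeGets(makeArrayList(n), 50, 1000000) + " ms");
        System.out.println("LinkedList: "
                + timeGets(makeLinkedList(n), 50, 1000000) + " ms");
    }
}
```

Fetching element 50 forces the LinkedList to traverse 50 nodes per call, while the ArrayList indexes its array directly, which is what makes the gap visible.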
Asymptotic Analysis
To estimate the running time or other resource requirements (such as memory space) of an algorithm, we simplify the analysis in a way that keeps us thinking about the most important aspect: the growth rate of the running time.
This is called “asymptotic algorithm analysis.”
Assume method A takes (10n² − 5) milliseconds to process n elements, and method B takes (100n + 200) milliseconds. The differences for small values of n are relatively insignificant; indeed, for small n, method A looks better.
Asymptotic Analysis
What really concerns us is the asymptotic behavior of the running-time functions:
What happens as n becomes very large?
Asymptotic Analysis
Theorem: 10n² − 5 > 100n + 200 for any value of n ≥ 12.
Proof: for any n ≥ 12:
10n² − 5 ≥ 10·12n − 5
         = 120n − 5
         = 100n + 20n − 5
         ≥ 100n + 20·12 − 5
         = 100n + 240 − 5
         > 100n + 200
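The theorem can also be checked empirically. A small sketch that spot-checks values of n (this illustrates the claim but of course does not replace the proof):

```java
public class BoundCheck {

    // The two running-time expressions from the text, in milliseconds.
    static long methodA(long n) { return 10 * n * n - 5; }
    static long methodB(long n) { return 100 * n + 200; }

    // True exactly when 10n^2 - 5 > 100n + 200.
    static boolean aSlowerThanB(long n) {
        return methodA(n) > methodB(n);
    }

    public static void main(String[] args) {
        // Method A looks better for small n ...
        for (long n = 1; n < 12; n++) {
            System.out.println("n=" + n + " A slower? " + aSlowerThanB(n));
        }
        // ... but is slower for every n >= 12, as the theorem states.
        for (long n = 12; n <= 10000; n++) {
            if (!aSlowerThanB(n)) {
                System.out.println("Counterexample at n=" + n);
            }
        }
    }
}
```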
Asymptotic Analysis
The inequalities in the first and fourth lines follow from the assumption that n ≥ 12; the remaining steps are arithmetic.
Actual running times will depend on a number of factors.
We make statements about the speed of an algorithm in general, rather than about a particular implementation of it.
– This way, we don't have to redo our analysis if we change programming languages or buy a faster computer.
Asymptotic Notation
To keep our running-time expressions general, we allow them to contain unspecified constants.
– Algorithm A takes an² + b milliseconds
– Algorithm B takes cn + d milliseconds
where n is the number of elements to be processed, and a, b, c, and d are unspecified constants that depend on factors such as the speed of the hardware.
Asymptotic Notation
Monotonically nondecreasing (never decreasing)
– A function f is monotonically nondecreasing if f(n + 1) ≥ f(n) for all n.
– An algorithm whose running-time function did not fit into this category would be fairly strange.
– For example, let f(n) = 3n + 1. If n = 4, then f(n) = 3·4 + 1 = 13 and f(n + 1) = f(5) = 3·5 + 1 = 16, so f(n + 1) ≥ f(n) here.
– We call such a function f(n) monotonically nondecreasing.
Asymptotic Notation
Upper Bound: the upper or highest growth rate that the algorithm can have; it is the upper bound for the growth rate, expressed as an equation.
If the upper bound for an algorithm's growth rate is f(n), then we write that the algorithm is in O(f(n)) in the worst case.
The symbol O is pronounced “big-oh.” For example, if n² grows as fast as T(n) for the worst-case input, we say the algorithm is “in O(n²) in the worst case.”
Asymptotic Notation
Definition 7.1: T(n) is in the set O(f(n)) if there exist two positive constants c and n₀ such that |T(n)| ≤ c|f(n)| for all n > n₀.
For example:
– Consider the sequential search algorithm for finding a specified value in an array.
– If visiting and testing one value in the array requires cₛ steps, where cₛ is a positive number,
– then in the average case T(n) = cₛn/2. For all values of n > 1, |T(n)| = |cₛn/2| ≤ cₛ|n|.
– Therefore, by the definition, T(n) is in O(n) for n₀ = 1 and c = cₛ.
– The time complexity (in the worst case) of the sequential search algorithm is O(n).
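The sequential search under discussion can be sketched as follows (a minimal array version, not the book's listing):

```java
public class SequentialSearch {

    // Return the index of target in a, or -1 if it is absent.
    // Each array element is visited and tested at most once, so the
    // worst case (target absent) takes time in Θ(n).
    static int indexOf(int[] a, int target) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] == target) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] data = {4, 8, 15, 16, 23, 42};
        System.out.println(indexOf(data, 16)); // found at index 3
        System.out.println(indexOf(data, 99)); // absent: -1
    }
}
```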
Asymptotic Notation
Lower Bound: describes the least amount of a resource that an algorithm needs for some class of input (the worst-, average-, or best-case input) of size n.
It is the lower bound for the growth rate, expressed as an equation.
The lower bound for an algorithm is denoted by the symbol Ω (pronounced “big-omega”).
Asymptotic Notation
Definition 7.2: T(n) is in the set Ω(f(n)) if there exist two positive constants c and n₀ such that |T(n)| ≥ c|f(n)| for all n > n₀.
For example:
Assume T(n) = c₁n² + c₂n for c₁, c₂ > 0.
Then |c₁n² + c₂n| ≥ |c₁n²| = c₁|n²| for all n > 1.
So |T(n)| ≥ c|n²| for c = c₁ and n₀ = 1.
Therefore, by the definition, T(n) is in Ω(n²).
Asymptotic Notation
When the upper and lower bounds are the same, we indicate this by using Θ (big-Theta) notation. An algorithm is said to be Θ(f(n)) if it is in O(f(n)) and in Ω(f(n)).
Since the sequential search algorithm is both in O(n) and in Ω(n) in the average case, we say it is Θ(n) in the average case.
Orders: functions can be classified into orders. Θ(f) is a tight bound:
– Θ(f) is pronounced “the order of f” or “order f”
– n² is one of the functions in Θ(n²)
– Of the orders considered here, Θ(2ⁿ) is the largest and Θ(1) is the smallest.
For sufficiently large n, a function is asymptotically larger than any function in a lower order.
For example: any function in Θ(n log n) is asymptotically larger than any function in Θ(n).
Asymptotic Notation
Figure 7-5 (p189) Some common orders of functions.
Asymptotic Notation
Multiplying or dividing a function by a positive constant doesn't change its order.
– 3n² and 0.2n² are both in Θ(n²)
A function's order is not changed by adding or subtracting a function in a lower order.
– 2ⁿ − n + log n is in Θ(2ⁿ)
Problem: what is the order of f(n) = 5n³ + 3n² − 4n + 11? Answer: Θ(n³)
Asymptotic Notation
The column frobbing algorithm is in Θ(n²); the row zorching algorithm is in Θ(n).
Θ(n) is a lower order, so we should choose row zorching.
Figure 7-6 (p189) Running-time expressions for two fictitious algorithms
Asymptotic Notation
Synchronized absquatulation takes time in Θ(n²).
This is better than the Θ(n³) dynamic inversion algorithm.
Figure 7-7 (p190) Running-time expressions for other two fictitious algorithms
Asymptotic Notation
n! is at least 2ⁿ/2:
If n is a positive integer and n > 2, then
n! = n · (n−1) · (n−2) · (n−3) · … · 2 · 1
   > 2 · 2 · 2 · … · 2 · 1
   = 2ⁿ⁻¹
   = 2ⁿ/2
Asymptotic Notation
Factorial decortication takes time at least in Θ(2ⁿ).
Cubic flensing is of order Θ(n³), so we should choose cubic flensing, because n! grows at least as fast as 2ⁿ, which grows faster than n³.
If Θ(f) is the set of functions that grow like f, then Ω(f) is the set of functions that grow like f or much more quickly.
We proved above that n! is in Ω(2ⁿ). Ω gives a lower bound.
Asymptotic Notation
Figure 7-9 (p191) order relations between two functions f and g. The symbol ∈ is read “is a member of (the set)”.
Asymptotic Notation
A function in O(g) might actually be larger than g. For example, 3n² ∈ O(n²), but 3n² is larger than n².
f ∈ O(g) if and only if, for some constant c > 0, there is some n₀ ≥ 0 such that f(n) ≤ cg(n) for any n ≥ n₀.
– n₀ is a threshold, so that only large values of n are considered (e.g., in Figure 7-3, n₀ was 12).
f ∈ Ω(g) if and only if, for some constant c > 0, there is some n₀ ≥ 0 such that f(n) ≥ cg(n) for any n ≥ n₀.
f ∈ Θ(g) if and only if f ∈ O(g) and f ∈ Ω(g).
– Showing that f ∈ O(g) is called finding an asymptotic upper bound on f.
– Showing that f ∈ Ω(g) is called finding an asymptotic lower bound.
– Showing that f ∈ Θ(g) is called finding an asymptotically tight bound.
Asymptotic Notation
For k > 1, the orders Θ(kⁿ) are called exponential orders.
For k > 0, the orders Θ(nᵏ) are called polynomial orders.
Counting Steps
When we analyze an algorithm, we're aiming for the order of the running time.
– Write the algorithm down in precise English or in any programming language, such as Java.
– Determine how many steps are accomplished by each line and how many times the line is executed. The time used by the line is the product of these two expressions.
– The total running time for the algorithm is the sum of the time required for each line. The order of this expression is the same as the order of the most time-consuming line.
Counting Steps
Size of the list: n
– Each of the following counts as a single step:
 – Accessing or setting a variable or field, including an element of an array.
 – Addition, subtraction, multiplication, division, and other arithmetic operators.
 – Finding the length of an array or String.
 – Comparisons using ==, <, etc.
 – Any fixed number of single steps, such as two additions and a variable assignment.
– Operators count as single steps. This is not necessarily true of methods in the Java library.
Counting Steps
Tally starts at 0 and ends up equal to n, so there must be n passes through the loop.
– Running time for size() is linear: Θ(n).
Line: 1   2   3   4        5    6    8
Time: c + c + c + c(n+1) + cn + cn + c = 3cn + 5c
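The size() method under analysis is not reproduced in this transcript; for a linked list it is presumably along these lines (the Node class and field names here are assumptions, not the book's listing):

```java
public class CountingSize {

    // A minimal singly linked node; the book's class may differ.
    static class Node {
        int item;
        Node next;
        Node(int item, Node next) { this.item = item; this.next = next; }
    }

    // Count the nodes in the chain starting at front.
    // The loop test runs n + 1 times and the body runs n times,
    // giving a total on the order of 3cn + 5c steps: Θ(n).
    static int size(Node front) {
        int tally = 0;
        for (Node node = front; node != null; node = node.next) {
            tally++;
        }
        return tally;
    }

    public static void main(String[] args) {
        Node list = new Node(1, new Node(2, new Node(3, null)));
        System.out.println(size(list)); // 3
        System.out.println(size(null)); // 0
    }
}
```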
Counting Steps
r: the number of ranks. s: the number of suits. Time complexity (in line 9): Θ((r + 1)s) = Θ(rs)
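The Deck constructor presumably fills the deck with a doubly nested loop over suits and ranks, roughly like this (class and card representation here are illustrative assumptions, not the book's listing):

```java
public class DeckDemo {

    // Build every rank/suit combination. The outer loop runs s times
    // and the inner loop r times per pass, so construction takes
    // time in Θ(rs).
    static String[] buildDeck(int ranks, int suits) {
        String[] cards = new String[ranks * suits];
        int k = 0;
        for (int suit = 0; suit < suits; suit++) {
            for (int rank = 1; rank <= ranks; rank++) {
                cards[k++] = "rank " + rank + " of suit " + suit;
            }
        }
        return cards;
    }

    public static void main(String[] args) {
        String[] deck = buildDeck(13, 4);
        System.out.println(deck.length); // 52
    }
}
```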
Counting Steps
Deck constructor is the most expensive step in the algorithm.
Counting Steps
• Adds up only the numbers for which j ≤ i.
• The total number of passes through the inner loop is
  1 + 2 + … + n = n(n + 1)/2, which is in Θ(n²).
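A loop of this triangular shape can be sketched as follows; counting the passes through the inner loop confirms the n(n + 1)/2 total:

```java
public class TriangularCount {

    // Visit only the pairs (i, j) with j <= i, counting inner passes.
    static long innerPasses(int n) {
        long passes = 0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= i; j++) {
                passes++;            // one pass through the inner loop
            }
        }
        return passes;
    }

    public static void main(String[] args) {
        int n = 100;
        // 1 + 2 + ... + n = n(n + 1)/2, which is in Θ(n²).
        System.out.println(innerPasses(n));          // 5050
        System.out.println((long) n * (n + 1) / 2);  // 5050
    }
}
```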
Counting Steps
A single for loop typically takes time in Θ(n).
A doubly nested for loop typically takes time in Θ(n²).
Counting Steps
Do not overgeneralize the result about loops.
– An enhanced for loop generally runs at most n times, where n is the number of elements in the data structure being traversed.
– A loop may run fewer than n times if it is stopped early by a return or break statement, or if it deals with more than one element on each pass.
Best, Worst, and Average Cases
For example: use the sequential search algorithm to find the 9 of diamonds in a deck of poker cards.
If the first card is the 9 of diamonds, then we only check once to find the card we want. This is the “best case” for the algorithm.
If the last card is the 9 of diamonds, then we have to check 52 times to find the card we want. This is the “worst case” for the algorithm.
On average, the algorithm checks about 52/2 = 26 times to find a card in the deck. We call this the “average case” running time for the algorithm.
If there are n cards in a deck, then the best case is still 1, the worst case is n, and the average case is about n/2.
Best, Worst, and Average Cases
Difficult to analyze because of the if statement, but the average case should be about size/2.
Best, Worst, and Average Cases
We have to decide which kind of analysis we're doing.
Best-case analysis
– Tells us how fast the program runs if we get really lucky about the data. For contains(), the best case is Θ(1).
Best, Worst, and Average Cases
Worst-case analysis
– For contains(), this means assuming that target is not in the ArrayList, giving a running time of Θ(n).
Average-case analysis
– Requires that we make some assumption about what the “average” data set looks like.
– The average running time lies between the two extremes:
Best case ≤ average case ≤ worst case
Best, Worst, and Average Cases
We must always be careful to choose our events so that they are:
– Exhaustive: at least one of them will occur.
– Mutually exclusive: no more than one of them will occur.
With n possible events, the probability of each event occurring is 1/n, assuming that they are equally likely.
Best, Worst, and Average Cases
In the contains() method, if target is at index
– 0, there is 1 pass through the loop
– 1, there are 2 passes
– …
– n − 1, there are n passes
Hence, the average running time for contains() is (1 + 2 + … + n)/n = (n + 1)/2 passes, which is in Θ(n).
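We can check this average directly with a pass-counting version of sequential search (a sketch; the real contains() in the library is more general):

```java
public class AveragePasses {

    static int passes; // loop passes in the most recent search

    // Sequential search, counting passes through the loop.
    static boolean contains(int[] data, int target) {
        passes = 0;
        for (int i = 0; i < data.length; i++) {
            passes++;
            if (data[i] == target) {
                return true;
            }
        }
        return false;
    }

    // Average passes when target is equally likely to be at any index.
    static double averagePasses(int n) {
        int[] data = new int[n];
        for (int i = 0; i < n; i++) {
            data[i] = i;
        }
        long total = 0;
        for (int target = 0; target < n; target++) {
            contains(data, target);
            total += passes;
        }
        return (double) total / n;  // (1 + 2 + ... + n)/n = (n + 1)/2
    }

    public static void main(String[] args) {
        System.out.println(averagePasses(51)); // (51 + 1)/2 = 26.0
    }
}
```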
Amortized Analysis
Average-case analysis
– How much time does this algorithm take on a typical run?
Worst-case analysis
– How much time does this algorithm take on the worst possible run?
Amortized analysis
– If this algorithm is run several times, what is the average time per run, given the worst possible sequence of runs?
Amortized Analysis
Amortized analysis does not have to make any assumptions about what a “typical” run looks like.
– Often, the amortized running time is the same as the worst-case running time.
– For some algorithms, it is not possible for the worst run to occur many times in a row.
Amortized Analysis
We start with a new, empty ArrayList and invoke add() n times. If we start at capacity 1 and double every time the ArrayList fills up, we need to stretch the ArrayList every time its size exceeds a power of 2.
Amortized Analysis
The total time spent stretching over this sequence of runs is therefore:
1 + 2 + 4 + … + 2ᵗ⁻¹
Amortized Analysis
Assuming there are t terms, this geometric sum is 2ᵗ − 1, which is less than 2n.
Dividing by the n invocations of add(), the amortized time per operation is constant.
Amortized Analysis
What if, instead of doubling the capacity of the ArrayList, we increase the capacity by 1 each time?
Amortized Analysis
Now the total copying time for n operations is 1 + 2 + … + (n − 1) = (n − 1)n/2.
Dividing this by n operations, (n − 1)n/(2n) = (n − 1)/2, we find that the amortized time per operation is now linear.
Amortized analysis tells us that we should prefer the version of stretch() that doubles the capacity of the ArrayList, because constant time is better than linear time.
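The two stretching strategies can be compared by counting element copies directly (a sketch of the growth policies discussed here, not the library's actual implementation):

```java
public class AmortizedDemo {

    // Total element copies performed while adding n items to an array
    // that starts at capacity 1 and doubles each time it fills up.
    static long copiesWithDoubling(int n) {
        int capacity = 1;
        int size = 0;
        long copies = 0;
        for (int i = 0; i < n; i++) {
            if (size == capacity) {
                copies += size;   // stretch: copy every element over
                capacity *= 2;
            }
            size++;
        }
        return copies;
    }

    // Same count when the capacity grows by only 1 each time.
    static long copiesWithIncrement(int n) {
        int capacity = 1;
        int size = 0;
        long copies = 0;
        for (int i = 0; i < n; i++) {
            if (size == capacity) {
                copies += size;   // stretch: copy every element over
                capacity += 1;
            }
            size++;
        }
        return copies;
    }

    public static void main(String[] args) {
        int n = 1000;
        System.out.println(copiesWithDoubling(n));   // stays under 2n
        System.out.println(copiesWithIncrement(n));  // (n - 1)n/2
    }
}
```

For n = 1000, doubling performs 1 + 2 + 4 + … + 512 = 1023 copies (under 2n = 2000), while growing by 1 performs 1 + 2 + … + 999 = 499500 copies.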
Summary
Analysis of algorithms gives us mathematical tools for comparing the efficiency of different algorithms.
We are interested in asymptotic behavior:
– What happens to the running times as the size of the data becomes arbitrarily large?
Summary
The running time of a method can be expressed as a function, and functions can be classified into orders.
– A function's order is unchanged if it is multiplied by a constant factor or if a lower-order function is added.
– For example: O(3n) = O(n) and O(n² + n) = O(n²).
– Things like faster hardware do not affect the order of an algorithm's running-time function.
Summary
To find the running time of an algorithm, we determine the running time for each line.
– The number of simple (constant-time) steps the line accomplishes, multiplied by the number of times the line is run.
Best case ≤ average case ≤ amortized ≤ worst case
– Worst-case analysis is the most common.
– Average-case analysis is also useful, but requires some assumption about what “average” data looks like.
– Amortized analysis is appropriate when the worst case (such as stretching an array) cannot happen on every run.
Chapter 7 Self-Study Homework
Pages: 191-202 Do Exercises: 7.2, 7.6, 7.10, 7.12, 7.15.