CSC231 Data Structures
Analysis of Algorithms

Devon M. Simmonds
University of North Carolina, Wilmington
Outline

Analysis
Running time
Worst-case and average-case analysis
Asymptotic notation: Big-O, etc.
Program = Data Structures + Algorithms
Computer Science = solving problems using computer programs

Expressing good solutions to complex software-related problems requires a good grasp of algorithms and data structures.
What are algorithms?

Instructions for accessing and manipulating our data structures:
order of access
how data should be modified
efficiency of access
etc.
Properties of Algorithms

Finite input.
Finite output.
Correctness: a correct algorithm halts with the correct output for every input.
Efficiency/complexity: how many resources the algorithm requires (time, space, bandwidth).
Algorithm Analysis

We only analyze correct algorithms:
an incorrect algorithm might not halt at all on some input instances
an incorrect algorithm might halt with other than the desired answer

Analyzing an algorithm involves predicting the resources that the algorithm requires:
time - computational time (usually the most important)
space - memory
bandwidth
Running Time?

The running time of an algorithm on a particular input is the number of primitive operations executed.
Algorithm Analysis...

Factors affecting the running time:
the computer
the compiler
the algorithm used
the input to the algorithm

The content of the input affects the running time; typically, the input size (the number of items in the input) is the main consideration.
E.g., for the sorting problem: the number of items to be sorted.
E.g., for multiplying two matrices together: the total number of elements in the two matrices.

Machine model assumed: instructions are executed one after another, with no concurrent operations (not parallel computers).
Example: Calculate the sum of cubes, 1³ + 2³ + ... + N³

def sum(N):
    partialSum = 0             # line 1: 1 unit
    for i in range(1, N+1):    # line 2: 2N + 2 units
        partialSum += i*i*i    # line 3: 4N units
    return partialSum          # line 4: 1 unit

def main():
    print(sum(3))              # line 5: 1 unit

main()

Lines 1, 4 and 5 count for one unit each.
Line 3: executed N times, each time four units (2 mult, 1 add, 1 asmt), for 4N in total.
Line 2: 1 for initialization, N+1 for all the tests, N for all the increments, for a total of 2N + 2.
Total cost of sum: 6N + 4.
Worst- / average- / best-case

Worst-case running time of an algorithm:
the longest running time for any input of size n
an upper bound on the running time for any input: a guarantee that the algorithm will never take longer
the worst case can occur fairly often, e.g. searching a database for a particular piece of information that is at the end of the list

Best-case running time:
e.g. searching a database for a particular piece of information that is at the beginning of the list

Average-case running time:
may be difficult to define what "average" means
Why we do not use Experimental Studies

Write a program implementing the algorithm.
Run the program with inputs of varying size and composition.
Use a method like time.clock() to get an accurate measure of the actual running time.
Plot the results.

[Plot: running time (ms) vs. input size]
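For concreteness, a minimal sketch of such an experiment (not from the slides). It uses time.perf_counter() rather than time.clock(), which was removed in Python 3.8; the helper name time_algorithm is hypothetical.

import time

def time_algorithm(func, make_input, sizes):
    """Run func on an input of each size and print the elapsed wall-clock time."""
    for n in sizes:
        data = make_input(n)
        start = time.perf_counter()
        func(data)
        elapsed_ms = (time.perf_counter() - start) * 1000
        print(f"n = {n:>8}: {elapsed_ms:.3f} ms")

# Example: time Python's built-in sum on lists of increasing size.
time_algorithm(sum, lambda n: list(range(n)), [10**3, 10**4, 10**5, 10**6])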
Limitations of Experiments

It is necessary to implement the algorithm, which may be difficult.
Results may not be indicative of the running time on other inputs not included in the experiment.
In order to compare two algorithms, the same hardware and software environments must be used.
Many factors may affect the results: poor memory use (i.e. cache misses, page faults) can give misleading results; platform dependence.
Theoretical Analysis

Uses a high-level description of the algorithm instead of an implementation.
Characterizes running time as a function of the input size, n.
Takes into account all possible inputs.
Allows us to evaluate the speed of an algorithm independent of the hardware/software environment.
Pseudocode

High-level description of an algorithm.
More structured than English prose.
Less detailed than a program.
Preferred notation for describing algorithms.
Hides many program design issues.

Example: find the max element of an array.

Algorithm arrayMax(A, n)
    Input: array A of n integers
    Output: maximum element of A
    currentMax ← A[0]
    for i ← 1 to n − 1 do
        if A[i] > currentMax then
            currentMax ← A[i]
    return currentMax
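For reference, a direct Python rendering of this pseudocode. The snake_case name array_max is mine; everything else follows the slide.

def array_max(A, n):
    """Return the maximum element of A[0:n], mirroring arrayMax."""
    current_max = A[0]
    for i in range(1, n):          # i ← 1 to n − 1
        if A[i] > current_max:
            current_max = A[i]
    return current_max

print(array_max([3, 1, 4, 1, 5], 5))   # 5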
Primitive Operations

Basic computations performed by an algorithm.
Identifiable in pseudocode.
Largely independent from the programming language.
Assumed to take a constant amount of time in the RAM model.

Examples:
evaluating an expression
assigning a value to a variable
indexing into an array
calling a method
returning from a method
The Random Access Machine (RAM) Model

A CPU.
A potentially unbounded bank of memory cells, each of which can hold an arbitrary number or character.
Memory cells are numbered (0, 1, 2, ...) and accessing any cell in memory takes unit time.
Instructions are executed one after another; no concurrent operations.
Counting Primitive Operations

By inspecting the pseudocode/program, we can determine the maximum number of primitive operations executed by an algorithm, as a function of the input size.

Algorithm arrayMax(A, n)                      # operations
    currentMax ← A[0]                         2
    for (i ← 1; i <= n − 1; i++)              loop 2n times
        if A[i] > currentMax then             2 ops (index, comp)
            currentMax ← A[i]                 2 ops (index, asmt)
    return currentMax                         1

4n + 1 ≤ Total ≤ 6n − 1
Counting Primitive Operations...

Algorithm arrayMax(A, n)                      # operations
    currentMax ← A[0]                         2
    for (i ← 1; i <= n − 1; i++)              loop 2n times
        if A[i] > currentMax then             2 ops (index, comp)
            currentMax ← A[i]                 2 ops (index, asmt)
    return currentMax                         1

First line = 2 ops, last line = 1 op.
Second line = 2n ops (init = 1, incr = n − 1, cond = n).
The if statement is either 2 ops or 4 ops, executed (n − 1) times.

2 + 2n + 2(n − 1) + 1 ≤ Total ≤ 2 + 2n + 4(n − 1) + 1
4n + 1 ≤ Total ≤ 6n − 1
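A small instrumented sketch (not from the slides) that reproduces this accounting in Python; the counter placement mirrors the slide's bookkeeping, and the function name is hypothetical.

def array_max_op_count(A):
    """arrayMax with a counter that mirrors the slide's primitive-op accounting."""
    n = len(A)
    ops = 2                        # currentMax ← A[0]: index + assignment
    current_max = A[0]
    ops += 1                       # loop initialization: i ← 1
    for i in range(1, n):
        ops += 1                   # loop test i <= n − 1 (true)
        ops += 2                   # A[i] > currentMax: index + comparison
        if A[i] > current_max:
            current_max = A[i]
            ops += 2               # currentMax ← A[i]: index + assignment
        ops += 1                   # loop increment i++
    ops += 1                       # final loop test (false)
    ops += 1                       # return
    return current_max, ops

print(array_max_op_count([5, 4, 3, 2]))   # best case:  4n + 1 = 17 ops
print(array_max_op_count([2, 3, 4, 5]))   # worst case: 6n − 1 = 23 ops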
Estimating Running Time

Algorithm arrayMax executes 6n − 1 primitive operations in the worst case.
Define:
a = time taken by the fastest primitive operation
b = time taken by the slowest primitive operation

Let T(n) be the worst-case time of arrayMax. Then T(n) ≤ b(6n − 1).
Hence, the running time T(n) is bounded by a linear function.
The actual running time can be between a(4n + 1) and b(6n − 1).
Rules for Counting Primitive Operations

Rule 1: loops
running time of a loop ≤ #iterations × running time of the statements in the loop

Algorithm arrayMax(A, n)                      # operations
    currentMax ← A[0]                         2
    for i ← 1 to n − 1 do                     loop n − 1 times
        if A[i] > currentMax then             2 ops (index, comp)
            currentMax ← A[i]                 2 ops (index, asmt)
        { increment counter i }               2 ops (cond, incr)?
    return currentMax                         1 op?

loop ≤ 4(n − 1)
Rules for Counting Primitive Operations

Rule 2: nested loops
Evaluate nested loops inside out.
running time of nested loops ≤ product of the sizes of the loops × running time of the statements

for i ← 1 to n − 1 do                         loop n − 1 times
    for j ← 1 to m − 1 do                     loop m − 1 times
        if A[i] > currentMax then             2 ops (index, comp)
            currentMax ← A[i]                 2 ops (index, asmt)

worst-case running time = 4(n − 1)(m − 1)
Rules for Counting Primitive Operations

Rule 3: consecutive statements
Add consecutive statements.

for i ← 1 to n − 1 do                         loop n − 1 times
    for j ← 1 to m − 1 do                     loop m − 1 times
        A[i] = currentMax                     2 ops (index, asmt)
        currentMax++                          1 op (incr)

The consecutive statements cost (2 + 1) per iteration, so:
worst-case running time = (n − 1)(m − 1)(2 + 1)
Rules for Counting Primitive Operations

Rule 4: if/else statements
running time ≤ running time of the condition + the larger of the two branches

for i ← 1 to n − 1 do                         loop n − 1 times
    for j ← 1 to m − 1 do                     loop m − 1 times
        if (A[i] > currentMax && i > n) then  4 ops (index, 3 comps)
            currentMax ← A[i]                 2 ops (index, asmt)
        else
            currentMax ← A[i] + 2             3 ops (index, add, asmt)

worst-case running time ≤ (4 + 3)(n − 1)(m − 1)
Rules for Counting Primitive Operations: Summary

Running time of a loop ≤ #iterations × running time of the statements in the loop.
Evaluate nested loops inside out: running time of nested loops ≤ product of the sizes of the loops × running time of the statements.
Add consecutive statements.
Running time of an if-statement ≤ running time of the condition + the larger of the two branches.
Running-time of algorithms

Bounds are for algorithms, rather than programs: programs are just implementations of an algorithm, and almost always the details of the program do not affect the bounds.
Bounds are for algorithms, rather than problems: a problem can be solved with several algorithms, some more efficient than others.
Growth Rate of Running Time

How does the running time of function T(n) change as the input size increases?
Changing the hardware/software environment affects T(n) by a constant factor, but does not alter the growth rate of T(n).
Growth Rates

Growth rates of functions:
linear: n
quadratic: n²
cubic: n³

In a log-log chart, the slope of the line corresponds to the growth rate of the function.

[Log-log chart: T(n) vs. n for the linear, quadratic, and cubic functions]
Constant Factors

The growth rate is not affected by constant factors or lower-order terms.

Examples:
10²n + 10⁵ is a linear function
10⁵n² + 10⁸n is a quadratic function

[Log-log chart: T(n) vs. n; each quadratic/linear pair of lines has the same slope]
Asymptotic Algorithm Analysis

Analysis for input sizes large enough to make only the order of growth of the running time relevant.
The asymptotic analysis of an algorithm determines the running time in big-Oh notation.
To perform the asymptotic analysis:
we find the worst-case number of primitive operations executed as a function of the input size
we express this function with big-Oh notation
Asymptotic Notation

big-O: f(n) is O(g(n)) if f(n) is asymptotically less than or equal to g(n)
big-Omega: f(n) is Ω(g(n)) if f(n) is asymptotically greater than or equal to g(n)
big-Theta: f(n) is Θ(g(n)) if f(n) is asymptotically equal to g(n)
little-o: f(n) is o(g(n)) if f(n) is asymptotically strictly less than g(n)
little-omega: f(n) is ω(g(n)) if f(n) is asymptotically strictly greater than g(n)
Asymptotic notation: Big-O

f(N) = O(g(N)) iff there are positive constants c and n0 such that
f(N) ≤ c·g(N) when N ≥ n0.

The growth rate of f(N) is less than or equal to the growth rate of g(N):
g(N) is an upper bound on f(N)
f(N) grows no faster than g(N) for "large" N
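A quick numeric spot-check of this definition (not a proof, and not from the slides). The constants c = 8 and n0 = 11 for f(N) = 7N² + 10N + 3 against g(N) = N² are one valid choice; the helper name is hypothetical.

def witnesses_big_o(f, g, c, n0, upto=10_000):
    """Spot-check f(N) <= c * g(N) for all N in the sample range n0..upto."""
    return all(f(N) <= c * g(N) for N in range(n0, upto + 1))

f = lambda N: 7 * N**2 + 10 * N + 3
g = lambda N: N**2

print(witnesses_big_o(f, g, c=8, n0=11))   # True: 7N² + 10N + 3 <= 8N² for N >= 11
print(witnesses_big_o(f, g, c=8, n0=5))    # False: the inequality fails for small N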
Big-O: example

Let f(N) = 2N². Then:
f(N) = O(N⁴)
f(N) = O(N³)
f(N) = O(N²) (best answer; asymptotically tight)

O(N²) reads "order N-squared" or "Big-O N-squared".
Big-O: more examples

N²/2 − 3N = O(N²)
1 + 4N = O(N)
7N² + 10N + 3 = O(N²) = O(N³)
log₁₀ N = log₂ N / log₂ 10 = O(log₂ N) = O(log N)
sin N = O(1); 10 = O(1); 10¹⁰ = O(1)
log N + N = O(?)
log^k N = O(N) for any constant k
N = O(2^N), but 2^N is not O(N)
2^(10N) is not O(2^N)

∑_{i=1}^{N} i² ≤ N·N² = N³ = O(N³)
∑_{i=1}^{N} i ≤ N·N = N² = O(N²)
Big-O and Growth Rate

The big-Oh notation gives an upper bound on the growth rate of a function.
The statement "f(n) is O(g(n))" means that the growth rate of f(n) is no more than the growth rate of g(n).
We can use the big-Oh notation to rank functions according to their growth rate:

                     f(n) is O(g(n))    g(n) is O(f(n))
g(n) grows more      Yes                No
f(n) grows more      No                 Yes
Same growth          Yes                Yes
Big-O Rules

If f(n) is a polynomial of degree d, then f(n) is O(n^d), i.e.:
1. drop lower-order terms
2. drop constant factors

Use the smallest possible class of functions:
say "2n is O(n)" instead of "2n is O(n²)"
Use the simplest expression of the class:
say "3n + 5 is O(n)" instead of "3n + 5 is O(3n)"
Some rules

Ignore the lower-order terms and the coefficient of the highest-order term.
No need to specify the base of a logarithm: changing the base from one constant to another changes the value of the logarithm by only a constant factor.
If T1(N) = O(f(N)) and T2(N) = O(g(N)), then
T1(N) + T2(N) = max(O(f(N)), O(g(N)))
T1(N) * T2(N) = O(f(N) * g(N))
For example, O(N) + O(N²) = O(N²).
Some typical growth rates

c        constant time
log n    logarithmic
log² n   log-squared
n        linear
n log n
n²       quadratic
n³       cubic
a^n      exponential
Math you need to Review

Summations
Logarithms and exponents
Proof techniques: proof by contradiction, proof by induction
Math Review: logarithmic functions

log_b x = a  iff  b^a = x
log_b(xy) = log_b x + log_b y
log_b(x/y) = log_b x − log_b y
log_b x^a = a·log_b x
log_b a = log_m a / log_m b
d(log_e x)/dx = 1/x
Math you need to Review...

Properties of exponentials:
a^(b+c) = a^b · a^c
a^(bc) = (a^b)^c
a^b / a^c = a^(b−c)
b = a^(log_a b)
b^c = a^(c·log_a b)

Properties of logarithms:
log_b(xy) = log_b x + log_b y
log_b(x/y) = log_b x − log_b y
log_b x^a = a·log_b x
log_b a = log_x a / log_x b
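A quick numeric spot-check of a few of these identities with Python's math module (the sample values are arbitrary):

import math

a, b, c, x, y = 2.0, 3.0, 4.0, 5.0, 7.0

# b^c = a^(c * log_a b)
print(math.isclose(b**c, a**(c * math.log(b, a))))                        # True

# log_b(xy) = log_b x + log_b y
print(math.isclose(math.log(x*y, b), math.log(x, b) + math.log(y, b)))    # True

# Change of base: log_b a = log_x a / log_x b
print(math.isclose(math.log(a, b), math.log(a, x) / math.log(b, x)))      # True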
Computing Prefix Averages

Asymptotic analysis with two algorithms for prefix averages.
The i-th prefix average of an array X is the average of the first (i + 1) elements of X:
A[i] = (X[0] + X[1] + ... + X[i]) / (i + 1)
Computing the array A of prefix averages of another array X has applications to financial analysis.

[Bar chart: arrays X and A for indices 1 through 7]
A Prefix Averages Algorithm

Algorithm prefixAverages1(X, n)                       appx. #operations
    Input: array X of n integers
    Output: array A of prefix averages of X
    A ← new array of n integers                       n
    for i ← 0 to n − 1 do                             n
        s ← X[0]                                      n
        for j ← 1 to i do                             1 + 2 + ... + (n − 1)
            s ← s + X[j]                              1 + 2 + ... + (n − 1)
        A[i] ← s / (i + 1)                            n
    return A                                          1

Algorithm prefixAverages1 runs in O(?) time.
One Way To Think About This

You go through the outer loop n times.
Each time through the loop you have to perform an assignment, and then another loop.
The inner loop first does no iterations, then 1, then 2, etc.
Each iteration of the inner loop costs a constant amount.
So the real cost of this is like 1 + 2 + 3 + ... + (n − 1) = n(n − 1)/2, which is O(n²).
Another Prefix Averages Algorithm

Algorithm prefixAverages2(X, n)                       #operations
    Input: array X of n integers
    Output: array A of prefix averages of X
    A ← new array of n integers                       n
    s ← 0                                             1
    for i ← 0 to n − 1 do                             n
        s ← s + X[i]                                  n
        A[i] ← s / (i + 1)                            n
    return A                                          1

Algorithm prefixAverages2 runs in O(?) time.
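For comparison, runnable Python sketches of both algorithms; they are direct translations of the pseudocode, with snake_case names of my choosing.

def prefix_averages1(X):
    """Quadratic version: recompute each prefix sum from scratch."""
    n = len(X)
    A = [0.0] * n
    for i in range(n):
        s = X[0]
        for j in range(1, i + 1):
            s += X[j]
        A[i] = s / (i + 1)
    return A

def prefix_averages2(X):
    """Linear version: carry the running sum s across iterations."""
    n = len(X)
    A = [0.0] * n
    s = 0
    for i in range(n):
        s += X[i]
        A[i] = s / (i + 1)
    return A

X = [10, 20, 30, 40]
print(prefix_averages1(X))   # [10.0, 15.0, 20.0, 25.0]
print(prefix_averages2(X))   # same result, in O(n) time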
Summary

big-O: f(n) is O(g(n)) if f(n) is asymptotically less than or equal to g(n)
big-Omega: f(n) is Ω(g(n)) if f(n) is asymptotically greater than or equal to g(n)
big-Theta: f(n) is Θ(g(n)) if f(n) is asymptotically equal to g(n)
little-o: f(n) is o(g(n)) if f(n) is asymptotically strictly less than g(n)
little-omega: f(n) is ω(g(n)) if f(n) is asymptotically strictly greater than g(n)
______________________
Devon M. Simmonds
Computer Science Department
University of North Carolina Wilmington
_____________________________________________________________

Questions?

Reading from course text:
Relatives of Big-Oh

big-Omega: f(n) is Ω(g(n)) if there is a constant c > 0 and an integer constant n0 ≥ 1 such that f(n) ≥ c·g(n) for n ≥ n0
big-Theta: f(n) is Θ(g(n)) if there are constants c′ > 0 and c″ > 0 and an integer constant n0 ≥ 1 such that c′·g(n) ≤ f(n) ≤ c″·g(n) for n ≥ n0
little-o: f(n) is o(g(n)) if, for any constant c > 0, there is an integer constant n0 ≥ 0 such that f(n) ≤ c·g(n) for n ≥ n0
little-omega: f(n) is ω(g(n)) if, for any constant c > 0, there is an integer constant n0 ≥ 0 such that f(n) ≥ c·g(n) for n ≥ n0
Devon M. Simmonds
University of North Carolina, Wilmington

TIME: Tuesday/Thursday 11-11:50am in 1012 & Thursday 3:30-5:10pm in 2006.
Office hours: TR 1-2pm or by appointment.
Office location: CI2046.
Email: simmondsd[@]uncw.edu

Course Overview