CS 231: Algorithmic Problem Solving
Naomi Nishimura

Module 5
Date of this version: November 22, 2019

WARNING: Drafts of slides are made available prior to lecture for your convenience. After lecture, slides will be updated to reflect material taught. Check the date on this page to make sure you have the correct, updated version.

WARNING: Slides do not include all class material; if you have missed a lecture, make sure to find out from a classmate what material was presented verbally or on the board.
CS 231 Module 5 1 / 39
Matrix-chain multiplication
Matrix-chain multiplication
Input: A sequence (chain) of matrices M0, . . . , Mn−1, where Mi has dimension di × di+1
Output: A parenthesization that results in the smallest number of multiplications of pairs of values
Possible parenthesizations: (M0M1)M2 and M0(M1M2)
Example: M0 has dimensions 2 × 10, M1 has dimensions 10 × 30, and M2 has dimensions 30 × 3.
CS 231 Module 5 Matrix-chain multiplication 2 / 39
Approach 3: Divide-and-conquer
What if we knew that the last multiplication in an optimal solution was (M0 . . . Mk)(Mk+1 . . . Mn−1)?
Then the total cost would be:
• The optimal cost of forming A by multiplying M0 through Mk .
• The optimal cost of forming B by multiplying Mk+1 through Mn−1.
• The cost of multiplying A and B.
CS 231 Module 5 Matrix-chain multiplication 3 / 39
Divide-and-conquer, continued
Issues to resolve:
1. How do we handle the fact that we don’t know k?
2. How do we determine the optimal costs of forming A and B?
Ideas:
1. Try all possible values of k .
2. Use the same algorithm for each instance.
Problems:
• We end up trying all possibilities.
• We recompute some pieces more than once.
CS 231 Module 5 Matrix-chain multiplication 4 / 39
Paradigm: Dynamic programming

Features:
• Define a solution to an instance of a problem in terms of solutions to smaller instances formed from the original instance.
• Solve each instance and store solutions in a table.
• Choose an order such that solutions can be found in the table when they are needed.
• Extract the solution to the original instance from the table.

Works when:
• The Optimal Substructure Property holds.
• Instances can be ordered so that smaller instances are solved before their solutions are needed to solve bigger instances.

History:
• Popularized by Richard Bellman through work started in 1955
• “Programming” means the use of a tabular solution method
CS 231 Module 5 Paradigm: Dynamic programming 5 / 39
Optimal substructure and dynamic programming
Dynamic programming, like the greedy approach, takes advantage of optimal substructure.

A dynamic programming algorithm may end up trying various options to figure out which way of splitting up the problem is the best.
CS 231 Module 5 Paradigm: Dynamic programming 6 / 39
Dynamic programming recipe
Recipe for dynamic programming:
1. Determine how the optimal solution can be formed from optimal solutions to smaller instances.
2. Determine what information should be stored in each table entry.
3. Determine the shape of the table or tables needed to store the solutions to the smaller instances.
4. Determine the base cases.
5. Choose an order of evaluation.
6. Extract the solution from the table.
CS 231 Module 5 Paradigm: Dynamic programming 7 / 39
Using dynamic programming for matrix-chain multiplication

Step 1: Determine how the optimal solution can be formed from optimal solutions to smaller instances.

Starting with a generic optimal solution to the original instance:
• Show that the Optimal Substructure Property holds.
• Consider all the possible ways it may contain optimal solutions to smaller instances.
• Write a formula that encompasses all the possibilities.
CS 231 Module 5 Paradigm: Dynamic programming 8 / 39
A generic optimal solution to matrix-chain multiplication
In an optimal solution, the very last multiplication will consist of multiplying a pair of matrices A and B such that for some value of k:
• A is formed by multiplying M0 through Mk
• B is formed by multiplying Mk+1 through Mn−1

We have already executed the first two steps in the Optimal Substructure recipe. We still need to show that if any piece O′ of O is not an optimal solution for a smaller instance I′ of I, then O is not an optimal solution for I.
CS 231 Module 5 Paradigm: Dynamic programming 9 / 39
Determining a formula
Note: We don’t know the value of k, other than that it must be in the range from 0 to n−2.

Suppose we consider the optimal costs of multiplying M0 through Mk and Mk+1 through Mn−1 for all values of k.

One value (or more) will give us the optimal values.

The wrong choice of k will result in a value that is bigger than the optimal value.

If we take the minimum value obtained for any choice of k, we will obtain the correct value for the optimal solution.
CS 231 Module 5 Paradigm: Dynamic programming 10 / 39
Defining smaller instances
To be able to repeat the process for each smaller instance, we need to be able to break up each smaller instance in all possible ways.

The smaller pieces may not start at M0 or end at Mn−1.

We define m[i, j] to be the smallest cost to multiply Mi through Mj, where
m[i, j] = min over i ≤ k < j of { m[i, k] + m[k+1, j] + di · dk+1 · dj+1 }.

The matrix formed by multiplying Mi through Mk will have dimensions di × dk+1, and the matrix formed by multiplying Mk+1 through Mj will have dimensions dk+1 × dj+1.
CS 231 Module 5 Paradigm: Dynamic programming 11 / 39
Matrix chain multiplication
Step 2: Determine what information should be stored in each table entry.
We store each value m[i, j].

Step 3: Determine the shape of the table or tables needed to store the solutions to the smaller instances.
A triangle (or a grid).

Step 4: Determine the base cases.
When m[i, j] requires no multiplications, the cost is 0. Thus, m[i, i] = 0 for all values of i.

Step 5: Choose an order of evaluation.
When we calculate m[i, j], we want to be able to look up the values of m[i, k] and m[k+1, j] in the table.
m[i, j] = min_k { m[i, k] + m[k+1, j] + di · dk+1 · dj+1 }
To ensure that the “smaller” problems have been stored before they are used, we view the “size” of m[i, j] as j − i and fill in entries in nondecreasing order of size.
CS 231 Module 5 Paradigm: Dynamic programming 12 / 39
How to implement the filling of a table
To write an algorithm that fills in table entries, typically one loop is used for each dimension in the table.

For a one-dimensional table or list, the single loop will typically iterate over indices.

For a two-dimensional table:
• If filled row by row, typically the outer loop will iterate over row indices and the inner loop will iterate over column indices.
• If filled column by column, typically the outer loop will iterate over column indices and the inner loop will iterate over row indices.
• If filled diagonal by diagonal, typically the outer loop will iterate over diagonals and the inner loop will iterate either over row or column indices.
CS 231 Module 5 Paradigm: Dynamic programming 13 / 39
Filling diagonal by diagonal
Information to determine:
• Number of diagonals
• Starting and ending number of columns (or rows)
• For each diagonal and column (or row), the row (or column) of the entry in the given diagonal
CS 231 Module 5 Paradigm: Dynamic programming 14 / 39
Filling in the table for matrix chain multiplication
Information to determine:
• Number of diagonals: fill in the table by increasing value of j − i. One diagonal for each value of j − i, for a total of n.
• Starting and ending number of columns (or rows): values range from 0 to n−1.
• For each diagonal and column (or row), the row (or column) of the entry in the given diagonal: for diagonal d = j − i and column j, use row j − d.
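The three answers above can be sketched as a short loop (my own variable names, with n = 4 as in the running example) that lists the entries in the order they will be filled:

```python
n = 4
order = []                      # entries (i, j) in fill order
for d in range(1, n):           # one diagonal per value of d = j - i
    for j in range(d, n):       # columns d .. n-1 on this diagonal
        i = j - d               # row determined by diagonal and column
        order.append((i, j))
print(order)
# [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3), (0, 3)]
```

Every entry m[i, j] with i < j appears exactly once, and every entry on a shorter diagonal appears before any entry on a longer one.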
CS 231 Module 5 Paradigm: Dynamic programming 15 / 39
Details of order of evaluation
Order in which entries are filled (n = 4):

        j=0  j=1  j=2  j=3
  i=0    1    5    8   10
  i=1         2    6    9
  i=2              3    7
  i=3                   4

Fill in increasing order of j − i (diagonals).
Within each diagonal, work in increasing order of j.
The first diagonal has i = j; those costs are all 0.
The solution is at the top right corner.
CS 231 Module 5 Paradigm: Dynamic programming 16 / 39
Choosing what to store
Step 6: Extract the solution from the table.
Table entry [0, n−1] will contain the minimum cost, but how do we figure out the actual sequence of multiplications?

Go back to Step 2 and also store which k is used to obtain the minimum value.

Once the table is full, use values stored in entry [0, n−1] to trace back through entries to extract ordering information.
CS 231 Module 5 Paradigm: Dynamic programming 17 / 39
Filling in diagonal j− i = 1
In each entry, store:
• minimum value, and
• which value k results in that value.
Recall: m[i, j] = min_k { m[i, k] + m[k+1, j] + di · dk+1 · dj+1 }

d0 = 2, d1 = 10, d2 = 30, d3 = 3, d4 = 1

        j=0  j=1        j=2        j=3
  i=0    0   600 (k=0)
  i=1         0         900 (k=1)
  i=2                    0          90 (k=2)
  i=3                               0

m[0,1] = m[0,0] + m[1,1] + d0·d1·d2 = 600
m[1,2] = m[1,1] + m[2,2] + d1·d2·d3 = 900
m[2,3] = m[2,2] + m[3,3] + d2·d3·d4 = 90
CS 231 Module 5 Paradigm: Dynamic programming 18 / 39
Filling in diagonal j− i = 2
Recall: m[i, j] = min_k { m[i, k] + m[k+1, j] + di · dk+1 · dj+1 }
d0 = 2, d1 = 10, d2 = 30, d3 = 3, d4 = 1

        j=0  j=1        j=2        j=3
  i=0    0   600 (k=0)  780 (k=1)
  i=1         0         900 (k=1)  390 (k=1)
  i=2                    0          90 (k=2)
  i=3                               0

Try values k = 0 and k = 1 for m[0,2]:
m[0,2] = min { m[0,0] + m[1,2] + d0·d1·d3 = 960,
               m[0,1] + m[2,2] + d0·d2·d3 = 780 } = 780

Try values k = 1 and k = 2 for m[1,3]:
m[1,3] = min { m[1,1] + m[2,3] + d1·d2·d4 = 390,
               m[1,2] + m[3,3] + d1·d3·d4 = 930 } = 390
CS 231 Module 5 Paradigm: Dynamic programming 19 / 39
Filling in diagonal j − i = 3

Try values k = 0, k = 1, and k = 2 for m[0,3]:
m[0,3] = min { m[0,0] + m[1,3] + d0·d1·d4 = 410,
               m[0,1] + m[2,3] + d0·d2·d4 = 750,
               m[0,2] + m[3,3] + d0·d3·d4 = 786 } = 410

        j=0  j=1        j=2        j=3
  i=0    0   600 (k=0)  780 (k=1)  410 (k=0)
  i=1         0         900 (k=1)  390 (k=1)
  i=2                    0          90 (k=2)
  i=3                               0
m[0,3] uses k = 0 to split M0 and M1 through M3
m[1,3] uses k = 1 to split M1 and M2 through M3
Solution: M0(M1(M2M3))
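The worked example can be reproduced with a short bottom-up implementation (a sketch; function and variable names are my own, not from the course code):

```python
def matrix_chain(d):
    """d has length n+1; matrix Mi has dimensions d[i] x d[i+1].
    Returns (m, s): m[i][j] is the minimum cost for Mi..Mj,
    s[i][j] the split k that achieves it."""
    n = len(d) - 1
    m = [[0] * n for _ in range(n)]
    s = [[0] * n for _ in range(n)]
    for size in range(1, n):               # diagonal: size = j - i
        for i in range(n - size):
            j = i + size
            best = None
            for k in range(i, j):          # try every last split
                cost = m[i][k] + m[k + 1][j] + d[i] * d[k + 1] * d[j + 1]
                if best is None or cost < best:
                    best, s[i][j] = cost, k
            m[i][j] = best
    return m, s

def parenthesize(s, i, j):
    """Trace back through the stored splits to rebuild the parenthesization."""
    if i == j:
        return "M%d" % i
    k = s[i][j]
    return "(%s%s)" % (parenthesize(s, i, k), parenthesize(s, k + 1, j))

m, s = matrix_chain([2, 10, 30, 3, 1])
print(m[0][3], parenthesize(s, 0, 3))   # 410 (M0(M1(M2M3)))
```

The traceback starting from entry [0, n−1] mirrors the slides: k = 0 splits off M0, then k = 1 inside m[1,3] splits off M1, giving M0(M1(M2M3)).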
CS 231 Module 5 Paradigm: Dynamic programming 20 / 39
Assessing a dynamic programming algorithm
Recipe for assessing an algorithm:
1. Determine the (worst-case) running time.
2. Ensure that the algorithm produces the correct output for any instance.

For dynamic programming algorithms:
1. Typically, the cost is dominated by the product of the number of table entries and the cost of filling one entry. The cost of extracting a solution is usually no greater than the cost of filling the table.
2. Correctness is proved by showing that the problem has optimal substructure and that the order of evaluation ensures that the algorithm finds the correct smaller instances to check.

The space required is typically the number of table entries.
CS 231 Module 5 Analysis 21 / 39
Expressing the solution in terms of other solutions
Common methods based on types of inputs:
Set: Determine an order on the elements. Smaller instances are defined using smaller subsets of the elements. Bigger instances are formed by adding elements one at a time.

Sequence or Grid: Define a problem in terms of the position or positions in the sequence. Smaller instances are at earlier positions. Bigger instances are at later positions.

Tree: Define a problem on a subtree. Smaller instances are on smaller subtrees. Bigger instances are on bigger subtrees.
CS 231 Module 5 General approaches to dynamic programming 22 / 39
Determining what information should be stored in eachtable entry
Common types of information stored:
Decision problem: True or False (solutions to smaller instances)
Evaluation problem: Values (solutions to smaller instances)
Search or constructive problem: Information indicating which smaller instances led to the optimal solution

Because the cost of the algorithm will depend on the amount of information stored in each entry (the cost of calculating it as well as the cost of reading it), often full solutions are not stored.

Instead, they can be reconstructed in asymptotically the same amount of time required to fill the table.
CS 231 Module 5 General approaches to dynamic programming 23 / 39
Common table shapes
1D: Typically used when each problem has only one variable.

2 1D: Typically used when each problem has two variables, but bigger instances depend only on one-smaller values of one variable. Examples: the input naturally has two variables (e.g. grids or graphs) or there are two inputs.

2D: Typically used when each problem has two variables, and bigger instances depend on more than just one-smaller values of one variable. Sometimes a grid is used, and sometimes a triangle if, for example, M[i,j] = M[j,i] or one of the two is not defined.

2 2D: Typically used in situations like those for 2 1D tables, but this time for problems with three variables.

3D or higher: Typically used when problems have three or more variables.

For a tree, often nodes are ordered from leaves to root.
CS 231 Module 5 General approaches to dynamic programming 24 / 39
Longest common subsequence
A subsequence of a string x0x1 . . . xn−1 is any string xi1xi2 . . . xij such that 0 ≤ i1 < i2 < · · · < ij ≤ n−1.

Example: The string "abcde" is a subsequence of the string "000a0b0000cd0000e".
Longest common subsequence
Input: A string X of length m and a string Y of length n
Output: A string Z that is a subsequence of both X and Y and of maximum length
Example:
X = a r m a g e d d o n
Y = a a r d v a r k s
CS 231 Module 5 Longest common subsequence 25 / 39
Recipe Step 1 for LCS

Suppose Z is an optimal solution. Where in X and Y can the last symbol in Z be found?
• Matching the last symbols in both X and Y.
• Matching a non-last symbol in X.
• Matching a non-last symbol in Y.
CS 231 Module 5 Longest common subsequence 26 / 39
Defining smaller instances
CS 231 Module 5 Longest common subsequence 27 / 39
Recipe Step 2 for LCS
We’ll use Python slice notation to represent smaller strings:
• Z[:k] is the first k symbols (positions 0 to k−1)
• Z[k:] is all but the first k symbols (positions k and on)
• Z[j:k] is symbols j to k−1
Suppose Z[:k] is an LCS of X[:i] and Y[:j].

Claim
Case 1: xi−1 = yj−1 ⇒ Z[:k−1] is an LCS of X[:i−1] and Y[:j−1].
Case 2: xi−1 ≠ yj−1 and xi−1 ≠ zk−1 ⇒ Z[:k] is an LCS of X[:i−1] and Y[:j].
Case 3: xi−1 ≠ yj−1 and yj−1 ≠ zk−1 ⇒ Z[:k] is an LCS of X[:i] and Y[:j−1].

For C[i, j] the length of the LCS of X[:i] and Y[:j]:
C[i, j] = C[i−1, j−1] + 1 if xi−1 = yj−1
C[i, j] = max{ C[i−1, j], C[i, j−1] } otherwise
CS 231 Module 5 Longest common subsequence 28 / 39
Steps 2 through 5

Step 2: Determine what information should be stored in each table entry.
Store the length of the subsequence as well as whether the best value of C[i, j] was found using C[i−1, j−1], C[i−1, j], or C[i, j−1].

Step 3: Determine the shape of the table or tables needed to store the solutions to the smaller instances.
A grid of dimensions (m+1) × (n+1).

Step 4: Determine the base cases.
Use value 0 to denote the empty string, so C[0,0] = C[0, j] = C[i, 0] = 0.

Step 5: Choose an order of evaluation.
C[i, j] = C[i−1, j−1] + 1 if xi−1 = yj−1
C[i, j] = max{ C[i−1, j], C[i, j−1] } otherwise
We need to know C[i−1, j−1], C[i−1, j], and C[i, j−1] before C[i, j].
Use increasing values of i and then increasing values of j (hence row by row, and within each row, column by column).
CS 231 Module 5 Longest common subsequence 29 / 39
Completion and analysis
Step 6: Extract the solution from the table.
C[m, n] contains the length of the longest common subsequence of the inputs.

To extract the sequence, store information on which of the three values led to the best answer.
Analysis:
Space: O(mn)
Time: O(mn)
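The recurrence and traceback can be sketched in a few lines (my own names; the course's sample code may differ). The table is filled row by row, and the traceback follows the "which value won" decisions backwards from C[m][n]:

```python
def lcs(X, Y):
    """Bottom-up LCS: C[i][j] = length of an LCS of X[:i] and Y[:j]."""
    m, n = len(X), len(Y)
    C = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):              # row by row ...
        for j in range(1, n + 1):          # ... then column by column
            if X[i - 1] == Y[j - 1]:
                C[i][j] = C[i - 1][j - 1] + 1
            else:
                C[i][j] = max(C[i - 1][j], C[i][j - 1])
    # trace back from C[m][n], collecting matched characters
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if X[i - 1] == Y[j - 1]:
            out.append(X[i - 1]); i -= 1; j -= 1
        elif C[i - 1][j] >= C[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))

print(lcs("abcde", "000a0b0000cd0000e"))    # abcde
print(len(lcs("armageddon", "aardvarks")))  # 3
```

Here the decisions are re-derived from C during the traceback rather than stored in a second grid; storing them explicitly (as the slides suggest) trades space for a simpler walk.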
CS 231 Module 5 Longest common subsequence 30 / 39
Filling in the table for LCS
[Table diagram: entries numbered in the order in which they are filled]

Information to determine:
• Number of diagonals
• Starting and ending number of columns
• For each diagonal and column, the row of the entry in the given diagonal
CS 231 Module 5 Longest common subsequence 31 / 39
Completing the implementation
Possible implementation choices:
• Use a Grid or two to store values and to keep track of which gave a best solution (plus the character if a match is found).
• Extract the solution by working backwards from C[m, n] and following “arrows” to previous best solutions, adding characters when found.

See sample code on the website for calculating just the value and also for calculating the value and extracting the subsequence.
CS 231 Module 5 Longest common subsequence 32 / 39
All-pairs cheapest paths
All-pairs cheapest paths
Input: A graph G with non-negative edge weights
Output: The least-cost paths between each pair of vertices in G
Note: Other methods needed when there can be negative cycles.
Naïve method: run Dijkstra’s algorithm n times, once from each vertex.
Floyd-Warshall dynamic programming algorithm
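A sketch of Floyd–Warshall (assuming an adjacency-matrix input with math.inf for missing edges; names are my own): after iteration k, dist[i][j] is the cheapest i-to-j cost using only vertices 0 . . . k as intermediates, so the smaller instances are "paths with fewer allowed intermediates".

```python
import math

def floyd_warshall(w):
    """w: n x n matrix of edge weights (math.inf if no edge, 0 on the diagonal).
    Returns the matrix of cheapest path costs between all pairs."""
    n = len(w)
    dist = [row[:] for row in w]        # base case: no intermediates allowed
    for k in range(n):                  # now allow vertex k as an intermediate
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

INF = math.inf
w = [[0, 3, INF],
     [INF, 0, 2],
     [1, INF, 0]]
print(floyd_warshall(w))   # [[0, 3, 5], [3, 0, 2], [1, 4, 0]]
```

The three nested loops give O(n^3) time over O(n^2) table entries, matching the "entries × cost per entry" analysis pattern from earlier.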
CS 231 Module 5 All-pairs cheapest paths 33 / 39
Knapsack (or 0-1 knapsack)
Note: In this version, each entire object is either included in or excluded from the knapsack.
Knapsack
Input: A set of n types of objects, where object i has positive integer weight wi and positive integer value vi, and an integer weight bound W
Output: A subset of objects with total weight at most W and the maximum total value
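The slides stop at the problem statement; as a sketch (my own names, using the standard include-or-exclude recurrence rather than anything given here), the table T[i][w] holds the best value achievable using only the first i objects with capacity w:

```python
def knapsack(weights, values, W):
    """0-1 knapsack: T[i][w] = best value using the first i objects, capacity w."""
    n = len(weights)
    T = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for w in range(W + 1):
            T[i][w] = T[i - 1][w]                 # exclude object i-1
            if weights[i - 1] <= w:               # or include it, if it fits
                T[i][w] = max(T[i][w],
                              T[i - 1][w - weights[i - 1]] + values[i - 1])
    return T[n][W]

print(knapsack([2, 3, 4], [3, 4, 6], 5))   # 7: take the objects of weight 2 and 3
```

This is a "2 1D"-style table in the earlier classification: row i depends only on row i−1, so two 1D rows suffice if the full table is not needed for traceback.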
CS 231 Module 5 Knapsack 34 / 39
Options for dynamic programming on graphs
Idea:
• Remove a set of one or more vertices to break a graph into smaller graphs
• Solve instances on smaller graphs that can be combined to solve the original instance on a larger graph
• Solutions to additional problems on smaller graphs may also be needed

Challenges in splitting up a graph:
• Using a large set of vertices to break up the graph may make it expensive to account for solutions that involve vertices in the set
• Breaking a graph into a large number of smaller graphs may make it expensive to combine the information in the smaller graphs
CS 231 Module 5 Graphs 35 / 39
Using dynamic programming on assignments and exams
In many of the examples seen in lectures, the biggest challenge is figuring out how to express an optimal solution in terms of optimal solutions to smaller instances.

It is not expected that on your own you would be able to devise expressions for problems as challenging as, for example, all-pairs cheapest paths.

For assignments and exams, you can expect either to be provided with the expression or to be asked a question in which finding such an expression is relatively simple.

You will be expected to understand the rest of the steps in the recipe to be able to figure those out on your own.
CS 231 Module 5 Summary 36 / 39
Comparing paradigms
Aspect              Exhaustive  Greedy  D-and-C  DP
Feasible solutions  All         Some    Some     Some
Applicability       Wide        Narrow  Medium   Medium
Speed               Slow        Fast    Medium   Medium
Divide-and-conquer:
• assumes you know which problems are needed
• works top down
• may repeat smaller instances
Dynamic programming:
• does not assume you know which smaller instances are needed
• works bottom up (a top-down version, memoization, exists)
• does not repeat smaller instances
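The top-down memoized variant mentioned above can be sketched for the matrix-chain example from earlier slides (a sketch using functools.lru_cache; names are my own): the recurrence is unchanged, but each subproblem is computed on first demand and cached instead of being scheduled bottom up.

```python
from functools import lru_cache

def matrix_chain_memo(d):
    """Top-down matrix-chain cost: same recurrence as the bottom-up table,
    but each m(i, j) is computed once, when first needed, and cached."""
    @lru_cache(maxsize=None)
    def m(i, j):
        if i == j:
            return 0
        return min(m(i, k) + m(k + 1, j) + d[i] * d[k + 1] * d[j + 1]
                   for k in range(i, j))
    return m(0, len(d) - 2)

print(matrix_chain_memo((2, 10, 30, 3, 1)))   # 410
```

Unlike the plain divide-and-conquer version, no subchain cost is ever recomputed; unlike the bottom-up version, no order of evaluation has to be chosen by hand.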
CS 231 Module 5 Summary 37 / 39
Module summary
Topics covered:
• Matrix-chain multiplication
• Paradigm: Dynamic programming
• Dynamic programming recipe
• Solving matrix-chain multiplication
• Dynamic programming analysis
• General approaches to dynamic programming
• Longest common subsequence
• All-pairs cheapest paths
• Knapsack
• Finding the height of a tree
• Comparing paradigms
CS 231 Module 5 Summary 38 / 39