CSCI 256
Data Structures and Algorithm Analysis
Lecture 14
Some slides by Kevin Wayne, copyright 2005 Pearson Addison-Wesley, all rights reserved; some by Iker Gondra
Elements of DP
• From an engineering perspective, when should we look for a DP solution to a problem?
– Optimal substructure: The first step in solving an optimization problem by DP is to characterize the structure of an optimal solution. A problem exhibits optimal substructure if an optimal solution to the problem contains within it optimal solutions to subproblems
– Overlapping subproblems: The space of subproblems must be “small” in the sense that a recursive algorithm for the problem solves the same subproblems over and over again, rather than always generating new subproblems. Typically, the total number of distinct subproblems is polynomial in the input size. DP algorithms take advantage of this by solving each subproblem once and storing the solution in a table
Least Squares
• Least squares
– Foundational problem in statistics and numerical analysis
– Given n points in the plane: (x1, y1), (x2, y2), . . . , (xn, yn)
– Find a line y = ax + b that minimizes the sum of the squared errors: SSE = Σi (yi − axi − b)²
– Solution: calculus shows the minimum error is achieved when
a = (n Σi xiyi − (Σi xi)(Σi yi)) / (n Σi xi² − (Σi xi)²)   and   b = (Σi yi − a Σi xi) / n
[Figure: the n points and the least squares line y = ax + b]
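A minimal sketch of this closed-form fit in Python (the name fit_line and its shape are illustrative, not from the slides):

    def fit_line(pts):
        """Closed-form least squares: return (a, b, sse) for the line y = ax + b."""
        n = len(pts)
        sx  = sum(x for x, _ in pts)
        sy  = sum(y for _, y in pts)
        sxx = sum(x * x for x, _ in pts)
        sxy = sum(x * y for x, y in pts)
        denom = n * sxx - sx * sx
        a = (n * sxy - sx * sy) / denom if denom != 0 else 0.0  # single point: slope 0
        b = (sy - a * sx) / n
        sse = sum((y - a * x - b) ** 2 for x, y in pts)
        return a, b, sse

For example, fit_line([(0, 0), (1, 1), (2, 2)]) returns slope 1, intercept 0, and zero error.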
Segmented Least Squares
• Segmented least squares (first attempt)
– Points lie roughly on a sequence of several line segments
– Given n points in the plane (x1, y1), (x2, y2), . . . , (xn, yn) with x1 < x2 < ... < xn, find a sequence of lines that minimizes the total SSE, which we could call the error
[Figure: points lying near a sequence of line segments]
Segmented Least Squares: How many line segments should we choose?
• To optimize, we want to assign a greater penalty for a larger number of segments as well as for the error (the squared deviations of the points from their corresponding lines). The penalty of a partition is the sum of:
– the number of segments into which we partition the points, times a given multiplier c
– for each segment, the error value of the optimal line through that segment
• This problem is a partitioning problem.
• This is an important problem in data mining and statistics known as change detection: given a sequence of data points, identify a few points in the sequence at which a discrete change occurs (in this case a change from one linear approximation to another)
Segmented Least Squares
• Goal in the Segmented Least Squares problem: find a partition of minimum penalty
Optimal interpolation with two segments
• Give an equation for the error of the optimal fit (having minimal least squares error) through p1, …, pn with two line segments. Let Ei,j be the least squares error for the optimal line through pi, . . . , pj
(Done in class: find the i, 1 ≤ i < n, that minimizes E1,i + Ei+1,n)
Optimal interpolation with three segments
• Give an equation for the error of the optimal fit (having minimal least squares error) through p1, …, pn with three line segments. Let Ei,j be the least squares error for the optimal line through pi, . . . , pj
Need to find the i and j, 1 ≤ i < j < n, which minimize E1,i + Ei+1,j + Ej+1,n
(Note we haven’t included a penalty term accounting for the number of segments; the brute-force sketch below omits it as well)
Can we do this recursively?
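A brute-force sketch of that three-segment search, assuming the illustrative fit_line helper from earlier (no segment penalty yet, matching the note above):

    def three_segment_error(pts):
        """Minimize E[1,i] + E[i+1,j] + E[j+1,n] over breakpoints i < j (0-indexed here)."""
        n = len(pts)
        def E(i, j):                       # least squares error through pts[i..j]
            return fit_line(pts[i:j + 1])[2]
        best = float("inf")
        for i in range(n - 2):             # last point of the first segment
            for j in range(i + 1, n - 1):  # last point of the second segment
                best = min(best, E(0, i) + E(i + 1, j) + E(j + 1, n - 1))
        return best

The nested loops show why fixing the segment count in advance scales poorly, which motivates the recursive (DP) formulation developed next.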
Segmented Least Squares
• Segmented least squares
– Points lie roughly on a sequence of several line segments
– Given n points in the plane (x1, y1), (x2, y2), . . . , (xn, yn) with x1 < x2 < ... < xn, find a sequence of lines that minimizes f(x)
• Question: What's a reasonable choice for f(x) to balance accuracy (goodness of fit) and parsimony (number of lines)?
Segmented Least Squares
• Segmented least squares
– Points lie roughly on a sequence of several line segments
– Given n points in the plane (x1, y1), (x2, y2), . . . , (xn, yn) with x1 < x2 < ... < xn, find a sequence of lines that minimizes
• the sum of the sums of the squared errors E in each segment
• the number of lines L
• Tradeoff function: E + cL, for some constant c > 0
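The tradeoff is easy to state in code. A sketch of the penalty E + cL for a given partition, again using the hypothetical fit_line helper (partition_penalty and its breaks convention are my own):

    def partition_penalty(pts, breaks, c):
        """Penalty E + c*L for a partition of pts into segments.

        breaks lists the index of the last point of each segment and must
        end at len(pts) - 1, e.g. breaks = [3, 9] means pts[0..3], pts[4..9].
        """
        penalty, start = 0.0, 0
        for end in breaks:
            penalty += fit_line(pts[start:end + 1])[2]  # E for this segment
            start = end + 1
        return penalty + c * len(breaks)                # plus c per line segment

The goal is to find, among all partitions, the one minimizing this quantity.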
Optimal substructure property
Optimal solution with k line segments extends an optimal solution of k-1 line segments on a smaller problem
DP: Multiway Choice
• Notation– OPT[j] = minimum cost for points p1, p2 , . . . , pj
– Ei,j = minimum sum of squares for points pi, pi+1 , . . . , pj
• Give a recursive definition for OPT[j]
• To compute OPT[j]:
– Last segment uses points pi, pi+1, . . . , pj for some i
– Cost = Ei,j + c + OPT[i−1]
– Which i?
• OPT[j] = min 1 ≤ i ≤ j (Ei,j + c + OPT[i−1]), with OPT[0] = 0
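A direct top-down rendering of this recurrence with memoization, as a sketch (points 1-indexed to match the notation; fit_line is the illustrative helper from earlier):

    from functools import lru_cache

    def segmented_ls_cost(pts, c):
        """OPT[j] = min over 1 <= i <= j of (E[i,j] + c + OPT[i-1]); OPT[0] = 0."""
        def E(i, j):                       # E[i,j]: error of best line through p_i..p_j
            return fit_line(pts[i - 1:j])[2]
        @lru_cache(maxsize=None)
        def OPT(j):
            if j == 0:
                return 0.0
            return min(E(i, j) + c + OPT(i - 1) for i in range(1, j + 1))
        return OPT(len(pts))

The bottom-up algorithm on the next slide computes the same table iteratively.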
Segmented Least Squares: Algorithm
INPUT: n, p1,…,pn, c

Segmented-Least-Squares() {
   OPT[0] = 0
   for j = 1 to n
      for i = 1 to j
         compute the least squares error Ei,j for the segment pi,…, pj
      endfor
   endfor
   for j = 1 to n
      OPT[j] = min 1 ≤ i ≤ j (Ei,j + c + OPT[i−1])
   endfor
   return OPT[n]
}

• Total running time: O(n³); can be improved to O(n²) by pre-computing various statistics
• Computing Ei,j for O(n²) pairs, O(n) per pair using the previous formula
– this gives O(n³) to compute all Ei,j pairs
• Following this, the algorithm has n iterations for values j = 1,…,n; for each value of j we have to compute the minimum in the recurrence to fill the array entry OPT[j]; this takes O(n) for each j
– This part gives O(n²)
Remark: there is an exercise in the text which shows how to reduce the total running time from O(n³) to O(n²)
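One way that reduction can go, sketched in Python under my own naming: prefix sums of x, y, x², xy, and y² make each Ei,j an O(1) computation, so filling the whole table and running the DP both take O(n²):

    def segmented_least_squares(points, c):
        """Bottom-up DP. Returns (OPT[n], choice), where choice[j] records the i
        that starts the last segment ending at j (points are 1-indexed)."""
        n = len(points)
        xs = [0.0] + [x for x, _ in points]
        ys = [0.0] + [y for _, y in points]

        # Prefix sums: any segment statistic becomes a difference of two entries.
        px = [0.0] * (n + 1); py = [0.0] * (n + 1)
        pxx = [0.0] * (n + 1); pxy = [0.0] * (n + 1); pyy = [0.0] * (n + 1)
        for k in range(1, n + 1):
            px[k]  = px[k - 1]  + xs[k]
            py[k]  = py[k - 1]  + ys[k]
            pxx[k] = pxx[k - 1] + xs[k] * xs[k]
            pxy[k] = pxy[k - 1] + xs[k] * ys[k]
            pyy[k] = pyy[k - 1] + ys[k] * ys[k]

        def E(i, j):
            """Least squares error for points p_i..p_j in O(1)."""
            m = j - i + 1
            sx, sy = px[j] - px[i - 1], py[j] - py[i - 1]
            sxx, sxy = pxx[j] - pxx[i - 1], pxy[j] - pxy[i - 1]
            syy = pyy[j] - pyy[i - 1]
            denom = m * sxx - sx * sx
            a = (m * sxy - sx * sy) / denom if denom != 0 else 0.0
            b = (sy - a * sx) / m
            # SSE expanded algebraically so no second pass over the points is needed
            return (syy + a * a * sxx + m * b * b
                    - 2 * a * sxy - 2 * b * sy + 2 * a * b * sx)

        OPT = [0.0] * (n + 1)
        choice = [0] * (n + 1)
        for j in range(1, n + 1):
            OPT[j], choice[j] = min(
                (E(i, j) + c + OPT[i - 1], i) for i in range(1, j + 1))
        return OPT[n], choice

The choice array here is exactly the auxiliary array described on the next slide.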
Determining the solution
• When OPT[j] is computed, record the value of i that minimized the sum
• Store this value in an auxiliary array
• Use it to reconstruct the solution
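A sketch of the traceback, assuming the choice array from the implementation above:

    def recover_segments(choice, n):
        """Return the segments [(i, j), ...] (1-indexed) in left-to-right order."""
        segments, j = [], n
        while j > 0:
            i = choice[j]          # the i that minimized E[i,j] + c + OPT[i-1]
            segments.append((i, j))
            j = i - 1              # continue with the prefix p_1 .. p_(i-1)
        return segments[::-1]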