CSCI 256
Data Structures and Algorithm Analysis
Lecture 14
Some slides by Kevin Wayne, copyright 2005 Pearson Addison-Wesley, all rights reserved; some by Iker Gondra
Elements of DP
• From an engineering perspective, when should we look for a DP solution to a problem?
– Optimal substructure: The first step in solving an optimization problem by DP is to characterize the structure of an optimal solution. A problem exhibits optimal substructure if an optimal solution to the problem contains within it optimal solutions to subproblems
– Overlapping subproblems: The space of subproblems must be “small” in the sense that a recursive algorithm for the problem solves the same subproblems over and over again, rather than always generating new subproblems. Typically, the total number of distinct subproblems is polynomial in the input size. DP algorithms take advantage of this by solving each subproblem once and storing the solution in a table
Least Squares
• Least squares
– Foundational problem in statistics and numerical analysis
– Given n points in the plane: (x1, y1), (x2, y2), . . . , (xn, yn)
– Find a line y = ax + b that minimizes the sum of the squared errors:
   SSE = Σi (yi − a xi − b)2
– Solution: calculus; the minimum error is achieved when
   a = (n Σi xi yi − (Σi xi)(Σi yi)) / (n Σi xi2 − (Σi xi)2),  b = (Σi yi − a Σi xi) / n
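The closed-form solution can be checked with a small sketch (the function name and test points are illustrative, not from the slides):

```python
def fit_line(points):
    """Return (a, b) for the line y = ax + b minimizing the sum of squared errors."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    # Closed-form least-squares coefficients from the calculus solution
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

# Points exactly on y = 2x + 1 should recover a = 2, b = 1
print(fit_line([(0, 1), (1, 3), (2, 5)]))
```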
Segmented Least Squares
• Segmented least squares (first attempt)
– Points lie roughly on a sequence of several line segments
– Given n points in the plane (x1, y1), (x2, y2), . . . , (xn, yn) with x1 < x2 < ... < xn, find a sequence of lines that minimizes the sum of squared errors (SSE), which we call the error
Segmented Least Squares -- How many line segments should we choose?
• To optimize, we want to assign a greater penalty for a larger number of segments as well as for the error (the squared deviations of the points from their corresponding lines). The penalty of a partition is the sum of:
– the number of segments into which we partition the points, times a given multiplier c
– for each segment, the error value of the optimal line through that segment
• This problem is a partitioning problem
• This is an important problem in data mining and statistics known as change detection: given a sequence of data points, identify a few points in the sequence at which a discrete change occurs (in this case a change from one linear approximation to another)
Segmented Least Squares
• Goal in the Segmented Least Squares problem: find a partition of minimal penalty
Optimal interpolation with two segments
• Give an equation for the error of the optimal line (having minimal least squares error) through p1,…,pn with two line segments. Let Ei,j be the least squares error for the optimal line through pi, . . . , pj
(DONE IN CLASS)
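(By analogy with the three-segment case on the next slide, the answer worked in class is presumably: find the i with 1 ≤ i < n that minimizes E1,i + Ei+1,n.)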
Optimal interpolation with three segments
• Give an equation for the error of the optimal line (having minimal least squares error) through p1,…,pn with three line segments. Let Ei,j be the least squares error for the optimal line through pi, . . . , pj
Need to find i and j with 1 ≤ i < j < n which minimize E1,i + Ei+1,j + Ej+1,n
(Note we haven’t included a penalty term accounting for the number of segments)
Can we do this recursively?
Segmented Least Squares
• Segmented least squares
– Points lie roughly on a sequence of several line segments
– Given n points in the plane (x1, y1), (x2, y2), . . . , (xn, yn) with x1 < x2 < ... < xn, find a sequence of lines that minimizes some cost function f(x)
• Question: What's a reasonable choice for f(x) to balance accuracy and parsimony?
Segmented Least Squares
• Segmented least squares
– Points lie roughly on a sequence of several line segments
– Given n points in the plane (x1, y1), (x2, y2), . . . , (xn, yn) with x1 < x2 < ... < xn, find a sequence of lines that minimizes
• the sum E of the sums of the squared errors in each segment
• the number of lines L
• Tradeoff function: E + cL, for some constant c > 0
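As a quick sanity check of the tradeoff function, here is a sketch that evaluates E + cL for a given partition (all names and the example points are illustrative, not from the slides):

```python
def segment_error(points):
    """Least-squares error of the best single line through the points."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    denom = n * sxx - sx * sx
    if denom == 0:  # single point (or repeated x): take error 0 by convention here
        return 0.0
    a = (n * sxy - sx * sy) / denom
    b = (sy - a * sx) / n
    return sum((y - a * x - b) ** 2 for x, y in points)

def penalty(segments, c):
    """E + cL: total squared error plus c per segment."""
    return sum(segment_error(s) for s in segments) + c * len(segments)

# Two groups of collinear points: E = 0, so the penalty is c * 2
segs = [[(0, 0), (1, 1), (2, 2)], [(3, 10), (4, 9), (5, 8)]]
print(penalty(segs, c=1.0))  # 2.0
```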
Optimal substructure property
Optimal solution with k line segments extends an optimal solution of k-1 line segments on a smaller problem
DP: Multiway Choice
• Notation
– OPT[j] = minimum cost for points p1, p2, . . . , pj
– Ei,j = minimum sum of squares for points pi, pi+1, . . . , pj
• Give a recursive definition for OPT[j]
• To compute OPT[j]:
– The last segment uses points pi, pi+1, . . . , pj for some i
– Cost = Ei,j + c + OPT[i-1]
– Which i?
• OPT[j] = min over 1 ≤ i ≤ j of (Ei,j + c + OPT[i-1]), with OPT[0] = 0
Segmented Least Squares: Algorithm
(The running time can be improved to O(n2) by pre-computing various statistics.)
INPUT: n, p1,…,pn, c

Segmented-Least-Squares() {
   OPT[0] = 0
   for j = 1 to n
      for i = 1 to j
         compute the least squares error Ei,j for the segment pi,…,pj
      endfor
   endfor
   for j = 1 to n
      OPT[j] = min over 1 ≤ i ≤ j of (Ei,j + c + OPT[i-1])
   endfor
   return OPT[n]
}
• Total running time: O(n3)
• Computing Ei,j for O(n2) pairs, at O(n) per pair using the previous formula
– this gives O(n3) to compute all Ei,j values
• Following this, the algorithm has n iterations for values j = 1,…,n; for each value of j we compute the minimum in the recurrence to fill the array entry OPT[j], which takes O(n) per j
– this part gives O(n2)
Remark – there is an exercise in the text which shows how to reduce the total running time from O(n3) to O(n2)
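The pseudocode above can be sketched in Python as follows; this is the straightforward O(n3) version (function names are illustrative, and the example points are made up; it assumes the points are sorted by x):

```python
def segmented_least_squares(points, c):
    """Return the minimum penalty E + cL for the given points."""
    n = len(points)

    def seg_error(i, j):
        """Least-squares error of the best line through points[i..j] (0-indexed, inclusive)."""
        pts = points[i:j + 1]
        m = len(pts)
        sx = sum(x for x, _ in pts)
        sy = sum(y for _, y in pts)
        sxx = sum(x * x for x, _ in pts)
        sxy = sum(x * y for x, y in pts)
        denom = m * sxx - sx * sx
        if denom == 0:  # empty or single-point segment: error 0
            return 0.0
        a = (m * sxy - sx * sy) / denom
        b = (sy - a * sx) / m
        return sum((y - a * x - b) ** 2 for x, y in pts)

    # E[i][j] = error of the best single line through points i..j
    E = [[seg_error(i, j) for j in range(n)] for i in range(n)]

    # OPT[j] = minimum cost for the first j points; OPT[0] = 0
    opt = [0.0] * (n + 1)
    for j in range(1, n + 1):
        # Last segment is points i..j (1-indexed), for the best choice of i
        opt[j] = min(E[i - 1][j - 1] + c + opt[i - 1] for i in range(1, j + 1))
    return opt[n]

# Points on two clean lines: each segment fits exactly, so the answer is 2c
pts = [(0, 0), (1, 1), (2, 2), (3, 10), (4, 9), (5, 8)]
print(segmented_least_squares(pts, c=1.0))
```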
Determining the solution
• When OPT[j] is computed, record the value of i that minimized the sum
• Store this value in an auxiliary array
• Use it to reconstruct the solution
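A sketch of the reconstruction step: here `best` is a hypothetical auxiliary array in which best[j] is the i chosen when OPT[j] was computed (the array contents below are made up for illustration):

```python
def reconstruct(best, n):
    """Return the segments as (i, j) index pairs, 1-indexed, left to right."""
    segments = []
    j = n
    while j > 0:
        i = best[j]          # the last segment covers points i..j
        segments.append((i, j))
        j = i - 1            # recurse on the remaining prefix p1..p(i-1)
    return list(reversed(segments))

# Example: with n = 6, the last segment starts at point 4 and the one before at point 1
print(reconstruct([None, 1, 1, 1, 4, 4, 4], 6))  # [(1, 3), (4, 6)]
```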