CSCI 256 Data Structures and Algorithm Analysis Lecture 14 Some slides by Kevin Wayne copyright 2005, Pearson Addison Wesley all rights reserved, and some by Iker Gondra


Elements of DP

• From an engineering perspective, when should we look for a DP solution to a problem?
  – Optimal substructure: The first step in solving an optimization problem by DP is to characterize the structure of an optimal solution. A problem exhibits optimal substructure if an optimal solution to the problem contains within it optimal solutions to subproblems.
  – Overlapping subproblems: The space of subproblems must be “small” in the sense that a recursive algorithm for the problem solves the same subproblems over and over again, rather than always generating new subproblems. Typically, the total number of distinct subproblems is polynomial in the input size. DP algorithms take advantage of this by solving each subproblem once and storing the solution in a table.
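A small sketch of why overlapping subproblems matter. It uses a generic prefix recurrence f(j) = min over 1 ≤ i ≤ j of (cost(i, j) + f(i-1)), with f(0) = 0, which has the same shape as the segmented least squares recurrence in this lecture; `cost` is a hypothetical placeholder, not part of the slides. Plain recursion re-solves the same f(i) many times (2^n calls in total), while storing each f(j) in a table solves each of the n subproblems once.

```python
def cost(i, j):
    # Hypothetical placeholder for a segment cost such as E_{i,j} + c.
    return 1.0 + (j - i) ** 2


def naive(n):
    """Plain recursion on f(j) = min_{1<=i<=j} (cost(i, j) + f(i - 1)), f(0) = 0.

    Counts every call, including calls for the base case f(0).
    """
    evaluations = 0

    def f(j):
        nonlocal evaluations
        evaluations += 1
        if j == 0:
            return 0.0
        return min(cost(i, j) + f(i - 1) for i in range(1, j + 1))

    return f(n), evaluations


def memoized(n):
    """Same recurrence, but each f(j) is computed once and stored in a table."""
    table = {0: 0.0}  # base case f(0) = 0 is stored up front
    evaluations = 0

    def f(j):
        nonlocal evaluations
        if j in table:  # reuse a stored subproblem solution
            return table[j]
        evaluations += 1
        table[j] = min(cost(i, j) + f(i - 1) for i in range(1, j + 1))
        return table[j]

    return f(n), evaluations


print(naive(10))     # same value as below, 2**10 = 1024 calls
print(memoized(10))  # same value, only 10 subproblems actually computed
```

The value is identical in both versions; only the amount of repeated work differs.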

Least Squares

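• Least squares
  – Foundational problem in statistics and numerical analysis
  – Given n points in the plane: (x1, y1), (x2, y2), . . . , (xn, yn)
  – Find a line y = ax + b that minimizes the sum of the squared error:
    $SSE = \sum_{i=1}^{n} (y_i - a x_i - b)^2$
  – Solution: by calculus, the minimum error is achieved when
    $a = \frac{n \sum_i x_i y_i - (\sum_i x_i)(\sum_i y_i)}{n \sum_i x_i^2 - (\sum_i x_i)^2}, \qquad b = \frac{\sum_i y_i - a \sum_i x_i}{n}$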

[Figure: data points plotted in the x-y plane]
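The closed-form solution can be evaluated directly. A minimal Python sketch, assuming the points are given as (x, y) tuples with at least two distinct x values; `fit_line` is a name introduced here for illustration.

```python
def fit_line(points):
    """Return (a, b, sse) for the least squares line y = a*x + b.

    points: list of (x, y) pairs with at least two distinct x values.
    """
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    # Closed-form minimizers of SSE (set both partial derivatives to zero).
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    sse = sum((y - a * x - b) ** 2 for x, y in points)
    return a, b, sse


print(fit_line([(0, 1), (1, 3), (2, 5)]))  # points on y = 2x + 1: (2.0, 1.0, 0.0)
```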

Is the least squares solution sensible?

[Figure: data points plotted in the x-y plane]

Segmented Least Squares

• Segmented least squares (first attempt)
  – Points lie roughly on a sequence of several line segments
  – Given n points in the plane (x1, y1), (x2, y2), . . . , (xn, yn) with x1 < x2 < ... < xn, find a sequence of lines that minimizes the sum of the squared errors (SSE), which we call the error

[Figure: data points in the x-y plane lying roughly on several line segments]

Segmented Least Squares -- How many line segments should we choose??

• We cannot minimize the error alone: with enough segments (for example, splitting the points into segments of at most two points each) the error drops to 0. So we assign a penalty both to a larger number of segments and to the error, that is, the squared deviations of the points from their corresponding lines. The penalty of a partition is the sum of:
  – the number of segments into which we partition the points, times a given multiplier c
  – for each segment, the error value of the optimal line through that segment
• This problem is a partitioning problem.
• This is an important problem in data mining and statistics known as change detection: given a sequence of data points, identify a few points in the sequence at which a discrete change occurs (in this case, a change from one linear approximation to another).
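The penalty of a given partition can be computed directly from this definition. A minimal Python sketch, assuming each segment is passed as its own list of (x, y) tuples with distinct x values; `segment_error` and `partition_penalty` are illustrative names introduced here.

```python
def segment_error(pts):
    """Least squares error of the best line through pts, a list of (x, y) pairs."""
    n = len(pts)
    if n < 2:
        return 0.0  # a single point is fit exactly by some line
    sx = sum(x for x, _ in pts)
    sy = sum(y for _, y in pts)
    sxx = sum(x * x for x, _ in pts)
    sxy = sum(x * y for x, y in pts)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return sum((y - a * x - b) ** 2 for x, y in pts)


def partition_penalty(segments, c):
    """c * (number of segments) + sum of each segment's least squares error."""
    return c * len(segments) + sum(segment_error(seg) for seg in segments)


tent = [(0, 0), (1, 1), (2, 2), (3, 2), (4, 1), (5, 0)]
print(partition_penalty([tent], 1.0))                # one segment: c + 4.0 = 5.0
print(partition_penalty([tent[:3], tent[3:]], 1.0))  # two exact segments: 2 * c = 2.0
```

With c = 1 the two-segment partition is cheaper; with a large enough c (for example c = 10) the single segment wins, which is exactly the tradeoff the multiplier controls.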

Segmented Least Squares

• Goal of the segmented least squares problem: find a partition of minimal penalty

What is the optimal linear interpolation with two line segments?

Optimal interpolation with two segments

• Give an equation for the error of the optimal fit (having minimal least squares error) through p1, …, pn with two line segments. Let Ei,j be the least squares error for the optimal line through pi, . . . , pj.

(DONE IN CLASS)
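The in-class answer is not recorded in these notes. One way to write it, following the same pattern as the three-segment case, is to let the first segment cover p1, …, pi and the second cover pi+1, …, pn, and minimize over the split index i (still with no penalty for the number of segments). The symbol E^{(2)} is notation introduced here only for this formula.

```latex
E^{(2)} = \min_{1 \le i < n} \left( E_{1,i} + E_{i+1,n} \right)
```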

What is the optimal linear interpolation with three line segments?

Optimal interpolation with three segments

• Give an equation for the error of the optimal fit (having minimal least squares error) through p1, …, pn with three line segments. Let Ei,j be the least squares error for the optimal line through pi, . . . , pj.

Need to find i and j, with $1 \le i < j < n$, which minimize $\left( E_{j+1,n} + E_{i+1,j} + E_{1,i} \right)$

(Note we haven’t included a penalty term accounting for the number of segments)
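A brute-force sketch of the three-segment formula, trying every pair of split indices 1 ≤ i < j < n. It assumes the points are (x, y) tuples sorted by x with distinct x values; `segment_error` and `best_three_segments` are illustrative names introduced here. It checks O(n²) pairs, and each error costs O(n) to compute, so it hints at why a recursive formulation is worth looking for.

```python
def segment_error(pts):
    """Least squares error of the best line through pts, a list of (x, y) pairs."""
    n = len(pts)
    if n < 2:
        return 0.0  # a single point is fit exactly
    sx = sum(x for x, _ in pts)
    sy = sum(y for _, y in pts)
    sxx = sum(x * x for x, _ in pts)
    sxy = sum(x * y for x, y in pts)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return sum((y - a * x - b) ** 2 for x, y in pts)


def best_three_segments(points):
    """Return (error, i, j) minimizing E_{1,i} + E_{i+1,j} + E_{j+1,n}.

    i and j are 1-based as in the slides; Python slices are 0-based.
    """
    n = len(points)
    best = None
    for i in range(1, n - 1):          # last point of segment 1
        for j in range(i + 1, n):      # last point of segment 2
            err = (segment_error(points[:i])       # E_{1,i}
                   + segment_error(points[i:j])    # E_{i+1,j}
                   + segment_error(points[j:]))    # E_{j+1,n}
            if best is None or err < best[0]:
                best = (err, i, j)
    return best


three = [(0, 0), (1, 1), (2, 2), (3, 5), (4, 5), (5, 5), (6, 0), (7, -5), (8, -10)]
print(best_three_segments(three))  # error is (numerically) zero for these points
```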

Can we do this recursively?

What is the optimal linear interpolation with n line segments?

Segmented Least Squares

• Segmented least squares
  – Points lie roughly on a sequence of several line segments
  – Given n points in the plane (x1, y1), (x2, y2), . . . , (xn, yn) with x1 < x2 < ... < xn, find a sequence of lines that minimizes f(x)

• Question: What's a reasonable choice for f(x) to balance accuracy (goodness of fit) and parsimony (number of lines)?

[Figure: data points in the x-y plane lying roughly on several line segments]

Segmented Least Squares

• Segmented least squares
  – Points lie roughly on a sequence of several line segments
  – Given n points in the plane (x1, y1), (x2, y2), . . . , (xn, yn) with x1 < x2 < ... < xn, find a sequence of lines that minimizes:
    • the sum E of the sums of the squared errors in each segment
    • the number of lines L
    • Tradeoff function: E + cL, for some constant c > 0

[Figure: data points in the x-y plane lying roughly on several line segments]

Optimal substructure property

An optimal solution with k line segments extends an optimal solution with k-1 line segments on a smaller problem (a prefix p1, …, pi of the points)
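The optimal substructure claim turns into a recurrence for a fixed number of segments: the best m-segment fit of p1, …, pj is the best (m-1)-segment fit of some prefix p1, …, pi plus the error of one last segment pi+1, …, pj. A minimal Python sketch, assuming points are (x, y) tuples sorted by x with distinct x values and k ≤ n; `best_k_segments`, `segment_error`, and the table `best[m][j]` are illustrative names introduced here, and there is no penalty term yet.

```python
import math


def segment_error(pts):
    """Least squares error of the best line through pts, a list of (x, y) pairs."""
    n = len(pts)
    if n < 2:
        return 0.0  # a single point is fit exactly
    sx = sum(x for x, _ in pts)
    sy = sum(y for _, y in pts)
    sxx = sum(x * x for x, _ in pts)
    sxy = sum(x * y for x, y in pts)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return sum((y - a * x - b) ** 2 for x, y in pts)


def best_k_segments(points, k):
    """Least total error covering p_1..p_n with exactly k segments.

    best[m][j] = min over m-1 <= i < j of best[m-1][i] + E_{i+1,j}.
    """
    n = len(points)
    # E[i][j] for 1-based i <= j: error of one line through p_i..p_j.
    E = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(i, n + 1):
            E[i][j] = segment_error(points[i - 1:j])
    best = [[math.inf] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0  # zero segments cover only the empty prefix
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            best[m][j] = min(best[m - 1][i] + E[i + 1][j] for i in range(m - 1, j))
    return best[k][n]


three = [(0, 0), (1, 1), (2, 2), (3, 5), (4, 5), (5, 5), (6, 0), (7, -5), (8, -10)]
print(best_k_segments(three, 1), best_k_segments(three, 3))
```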

DP: Multiway Choice

• Notation
  – OPT[j] = minimum cost for points p1, p2, . . . , pj
  – Ei,j = minimum sum of squares for points pi, pi+1, . . . , pj
• Give a recursive definition for OPT[j]


• To compute OPT[j]:
  – The last segment uses points pi, pi+1, . . . , pj for some i.
  – Cost = Ei,j + c + OPT[i-1].
  – Which i?

• $\mathrm{OPT}[j] = \min_{1 \le i \le j} \left( E_{i,j} + c + \mathrm{OPT}[i-1] \right)$, with $\mathrm{OPT}[0] = 0$

Segmented Least Squares: Algorithm

(The running time can be improved to $O(n^2)$ by pre-computing various statistics.)

INPUT: n, p1, …, pn, c

Segmented-Least-Squares() {
   Opt[0] = 0
   for j = 1 to n
      for i = 1 to j
         compute the least squares error Eij for the segment pi, …, pj
      endfor
   endfor
   for j = 1 to n
      Opt[j] = min over 1 ≤ i ≤ j of (Eij + c + Opt[i-1])
   endfor
   return Opt[n]
}
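A minimal Python sketch of the pseudocode, assuming the points are (x, y) tuples sorted by x with distinct x values; `segment_error` and `segmented_least_squares` are illustrative names introduced here. Like the pseudocode, it computes every Eij from scratch, so it runs in O(n³) time.

```python
def segment_error(pts):
    """Least squares error of the best line through pts, a list of (x, y) pairs."""
    n = len(pts)
    if n < 2:
        return 0.0  # a single point is fit exactly
    sx = sum(x for x, _ in pts)
    sy = sum(y for _, y in pts)
    sxx = sum(x * x for x, _ in pts)
    sxy = sum(x * y for x, y in pts)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return sum((y - a * x - b) ** 2 for x, y in pts)


def segmented_least_squares(points, c):
    """Return Opt[n], where Opt[j] = min_{1<=i<=j} (E_{i,j} + c + Opt[i-1])."""
    n = len(points)
    # E[i][j] (1-based) = least squares error of one line through p_i..p_j.
    E = [[0.0] * (n + 1) for _ in range(n + 1)]
    for j in range(1, n + 1):
        for i in range(1, j + 1):
            E[i][j] = segment_error(points[i - 1:j])
    opt = [0.0] * (n + 1)  # opt[0] = 0: no points, no cost
    for j in range(1, n + 1):
        opt[j] = min(E[i][j] + c + opt[i - 1] for i in range(1, j + 1))
    return opt[n]


tent = [(0, 0), (1, 1), (2, 2), (3, 2), (4, 1), (5, 0)]
print(segmented_least_squares(tent, 1.0))   # two exact segments: 2 * c = 2.0
print(segmented_least_squares(tent, 10.0))  # one segment: 4.0 + c = 14.0
```

Raising c makes each extra segment more expensive, so the optimum switches from two segments to one.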

• Total running time: $O(n^3)$
• Computing Ei,j for $O(n^2)$ pairs, $O(n)$ per pair using the previous formula
  – This gives $O(n^3)$ to compute all Ei,j values
• Following this, the algorithm has n iterations, for j = 1, …, n; for each value of j it computes the minimum in the recurrence to fill the array entry Opt[j], which takes $O(n)$ time per j
  – This part gives $O(n^2)$

Remark – there is an exercise in the text that shows how to reduce the total running time from $O(n^3)$ to $O(n^2)$
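The slides do not say which statistics to pre-compute. One plausible choice, an assumption here rather than the text's stated solution, is prefix sums of x, y, x², xy and y². Then each Ei,j takes O(1) time, all O(n²) values take O(n²) time, and the DP loop is already O(n²). The sketch uses the fact that at the least squares optimum the residuals sum to zero and are orthogonal to x, so SSE = Σy² - aΣxy - bΣy. The names `prefix_sums` and `error_from_sums` are hypothetical; `segment_error` is included only as a direct O(n) reference for comparison.

```python
def segment_error(pts):
    """Direct O(len(pts)) least squares error, used here as a reference."""
    n = len(pts)
    if n < 2:
        return 0.0
    sx = sum(x for x, _ in pts)
    sy = sum(y for _, y in pts)
    sxx = sum(x * x for x, _ in pts)
    sxy = sum(x * y for x, y in pts)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return sum((y - a * x - b) ** 2 for x, y in pts)


def prefix_sums(points):
    """P[k] = (Sx, Sy, Sxx, Sxy, Syy) summed over the first k points."""
    P = [(0.0, 0.0, 0.0, 0.0, 0.0)]
    for x, y in points:
        sx, sy, sxx, sxy, syy = P[-1]
        P.append((sx + x, sy + y, sxx + x * x, sxy + x * y, syy + y * y))
    return P


def error_from_sums(P, i, j):
    """E_{i,j} for 1-based i <= j in O(1) time, from differences of prefix sums."""
    n = j - i + 1
    if n < 2:
        return 0.0
    sx, sy, sxx, sxy, syy = (hi - lo for hi, lo in zip(P[j], P[i - 1]))
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, syy - a * sxy - b * sy)


tent = [(0, 0), (1, 1), (2, 2), (3, 2), (4, 1), (5, 0)]
P = prefix_sums(tent)
print(error_from_sums(P, 1, 6), segment_error(tent))  # both 4.0 (up to rounding)
```

This formula can lose precision through cancellation when coordinates are large, so the direct computation may be preferable in numerically sensitive settings.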

Determining the solution

• When Opt[j] is computed, record the value of i that minimized the sum
• Store this value in an auxiliary array
• Use it to reconstruct the solution

Determining the solution

Find-Segments(j)
   If j = 0 then
      Output nothing
   Else
      Get the i that minimizes Ei,j + c + Opt[i-1] (the value recorded in the auxiliary array)
      Output the segment {pi, …, pj} and the result of Find-Segments(i-1)
   Endif
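A minimal Python sketch that records the minimizing i in an auxiliary array and then follows Find-Segments, written as a loop instead of recursion. It assumes points are (x, y) tuples sorted by x with distinct x values; all function names are illustrative, and ties in the minimum are broken toward the smaller i (a choice made here, not specified by the slides).

```python
def segment_error(pts):
    """Least squares error of the best line through pts, a list of (x, y) pairs."""
    n = len(pts)
    if n < 2:
        return 0.0  # a single point is fit exactly
    sx = sum(x for x, _ in pts)
    sy = sum(y for _, y in pts)
    sxx = sum(x * x for x, _ in pts)
    sxy = sum(x * y for x, y in pts)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return sum((y - a * x - b) ** 2 for x, y in pts)


def segmented_least_squares_with_segments(points, c):
    """Return (Opt[n], segments), each segment a 1-based (i, j) index pair."""
    n = len(points)
    E = [[0.0] * (n + 1) for _ in range(n + 1)]
    for j in range(1, n + 1):
        for i in range(1, j + 1):
            E[i][j] = segment_error(points[i - 1:j])
    opt = [0.0] * (n + 1)
    arg = [0] * (n + 1)  # auxiliary array: the minimizing i for each j
    for j in range(1, n + 1):
        # Tuples compare by cost first, then by i (smaller i wins ties).
        opt[j], arg[j] = min((E[i][j] + c + opt[i - 1], i) for i in range(1, j + 1))
    return opt[n], find_segments(arg, n)


def find_segments(arg, j):
    """Find-Segments(j) as a loop: peel off the last segment p_i..p_j repeatedly."""
    segments = []
    while j > 0:
        i = arg[j]
        segments.append((i, j))
        j = i - 1
    segments.reverse()  # report segments from left to right
    return segments


tent = [(0, 0), (1, 1), (2, 2), (3, 2), (4, 1), (5, 0)]
print(segmented_least_squares_with_segments(tent, 1.0))   # segments (1, 3) and (4, 6)
print(segmented_least_squares_with_segments(tent, 10.0))  # a single segment (1, 6)
```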