Solutions of RDBMS


  • 7/21/2019 Solutions of RDBMS


    Introduction to Algorithms September 24, 2004

    Massachusetts Institute of Technology 6.046J/18.410J

    Professors Piotr Indyk and Charles E. Leiserson Handout 7

    Problem Set 1 Solutions

    Exercise 1-1. Do Exercise 2.3-7 on page 37 in CLRS.

    Solution:

    The following algorithm solves the problem:

    1. Sort the elements in $S$ using mergesort.

    2. Remove the last element from $S$. Let $y$ be the value of the removed element.

    3. If $S$ is nonempty, look for $z = x - y$ in $S$ using binary search.

    4. If $S$ contains such an element $z$, then STOP, since we have found $y$ and $z$ such that $x = y + z$. Otherwise, repeat Step 2.

    5. If $S$ is empty, then no two elements in $S$ sum to $x$.

    Notice that when we consider an element $y_i$ of $S$ during the $i$th iteration, we don't need to look at the elements that have already been considered in previous iterations. Suppose there exists $y_j \in S$ such that $x = y_i + y_j$. If $j < i$, i.e., if $y_j$ has been reached prior to $y_i$, then we would have found $y_i$ when we were searching for $x - y_j$ during the $j$th iteration, and the algorithm would have terminated then.

    Step 1 takes $\Theta(n \lg n)$ time. Step 2 takes $O(1)$ time. Step 3 requires at most $\lg n$ time. Steps 2-4 are repeated at most $n$ times. Thus, the total running time of this algorithm is $\Theta(n \lg n)$. We can do a more precise analysis if we notice that Step 3 actually requires $\Theta(\lg(n - i))$ time at the $i$th iteration. However, if we evaluate $\sum_{i=1}^{n-1} \lg(n - i)$, we get $\lg((n-1)!)$, which is $\Theta(n \lg n)$. So the total running time is still $\Theta(n \lg n)$.
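    The five steps above can be sketched in Python as follows. This is only an illustration: the function name `has_pair_with_sum` is hypothetical, and Python's built-in `sorted` (Timsort) stands in for mergesort, which keeps the same $\Theta(n \lg n)$ sorting bound.

```python
from bisect import bisect_left


def has_pair_with_sum(values, x):
    """Return a pair (y, z) taken from two distinct positions with y + z == x, or None."""
    s = sorted(values)            # Step 1: sort (Timsort here, mergesort in the text)
    while s:
        y = s.pop()               # Step 2: remove the last (largest) element
        z = x - y
        k = bisect_left(s, z)     # Step 3: binary search among the remaining elements
        if k < len(s) and s[k] == z:
            return (y, z)         # Step 4: found y and z with x = y + z
    return None                   # Step 5: S is empty, so no pair sums to x


print(has_pair_with_sum([8, 1, 5, 3], 9))   # (8, 1)
```

    Because each removed element is never searched again, the search range shrinks by one on every iteration, matching the $\lg(n - i)$ refinement in the analysis.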

    Exercise 1-2. Do Exercise 3.1-3 on page 50 in CLRS.

    Exercise 1-3. Do Exercise 3.2-6 on page 57 in CLRS.

    Exercise 1-4. Do Problem 3-2 on page 58 of CLRS.

    Problem 1-1. Properties of Asymptotic Notation

    Prove or disprove each of the following properties related to asymptotic notation. In each of the

    following assume that f, g, and h are asymptotically nonnegative functions.


    (a) $f(n) = O(g(n))$ and $g(n) = O(f(n))$ implies that $f(n) = \Theta(g(n))$.

    Solution:

    This statement is true.

    Since $f(n) = O(g(n))$, there exist an $n_0$ and a $c$ such that for all $n \ge n_0$, $f(n) \le c\,g(n)$. Similarly, since $g(n) = O(f(n))$, there exist an $n_0'$ and a $c'$ such that for all $n \ge n_0'$, $g(n) \le c' f(n)$. Therefore, for all $n \ge \max(n_0, n_0')$,

    $$\frac{1}{c'}\, g(n) \le f(n) \le c\, g(n).$$

    Hence, $f(n) = \Theta(g(n))$.

    (b) $f(n) + g(n) = \Theta(\max(f(n), g(n)))$.

    Solution:

    This statement is true.

    For all $n \ge 1$, $f(n) \le \max(f(n), g(n))$ and $g(n) \le \max(f(n), g(n))$. Therefore:

    $$f(n) + g(n) \le \max(f(n), g(n)) + \max(f(n), g(n)) \le 2\max(f(n), g(n))$$

    and so $f(n) + g(n) = O(\max(f(n), g(n)))$. Additionally, for each $n$, either $f(n) \ge \max(f(n), g(n))$ or else $g(n) \ge \max(f(n), g(n))$. Therefore, for all $n \ge 1$, $f(n) + g(n) \ge \max(f(n), g(n))$ and so $f(n) + g(n) = \Omega(\max(f(n), g(n)))$. Thus, $f(n) + g(n) = \Theta(\max(f(n), g(n)))$.

    (c) Transitivity: $f(n) = O(g(n))$ and $g(n) = O(h(n))$ implies that $f(n) = O(h(n))$.

    Solution:

    This statement is true.

    Since $f(n) = O(g(n))$, there exist an $n_0$ and a $c$ such that for all $n \ge n_0$, $f(n) \le c\,g(n)$. Similarly, since $g(n) = O(h(n))$, there exist an $n_0'$ and a $c'$ such that for all $n \ge n_0'$, $g(n) \le c' h(n)$. Therefore, for all $n \ge \max(n_0, n_0')$, $f(n) \le c\,c'\,h(n)$. Hence, $f(n) = O(h(n))$.

    (d) $f(n) = O(g(n))$ implies that $h(f(n)) = O(h(g(n)))$.

    Solution:

    This statement is false.

    We disprove this statement by giving a counter-example. Let $f(n) = 3n$, $g(n) = n$, and $h(n) = 2^n$. Then $f(n) = O(g(n))$, while $h(f(n)) = 8^n$ and $h(g(n)) = 2^n$. Since $8^n$ is not $O(2^n)$, this choice of $f$, $g$, and $h$ is a counter-example which disproves the statement.


    (e) $f(n) + o(f(n)) = \Theta(f(n))$.

    Solution:

    This statement is true.

    Let $h(n) = o(f(n))$. We prove that $f(n) + h(n) = \Theta(f(n))$. Since for all $n \ge 1$, $f(n) + h(n) \ge f(n)$, we have $f(n) + h(n) = \Omega(f(n))$. Since $h(n) = o(f(n))$, there exists an $n_0$ such that for all $n > n_0$, $h(n) \le f(n)$. Therefore, for all $n > n_0$, $f(n) + h(n) \le 2f(n)$ and so $f(n) + h(n) = O(f(n))$. Thus, $f(n) + h(n) = \Theta(f(n))$.

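    (f) $f(n) \ne o(g(n))$ and $g(n) \ne o(f(n))$ implies $f(n) = \Theta(g(n))$.

    Solution:

    This statement is false.

    We disprove this statement by giving a counter-example. Consider $f(n) = 1 + \cos(\pi n)$ and $g(n) = 1 - \cos(\pi n)$.

    For all even values of $n$, $f(n) = 2$ and $g(n) = 0$, and there does not exist a $c_1$ for which $f(n) \le c_1 g(n)$. Thus, $f(n)$ is not $o(g(n))$, because if there does not exist a $c_1$ for which $f(n) \le c_1 g(n)$, then it cannot be the case that for any $c_1 > 0$ and sufficiently large $n$, $f(n) < c_1 g(n)$.

    For all odd values of $n$, $f(n) = 0$ and $g(n) = 2$, and there does not exist a $c$ for which $g(n) \le c f(n)$. By the same reasoning, $g(n)$ is not $o(f(n))$. Moreover, there cannot exist a $c_2 > 0$ for which $c_2 g(n) \le f(n)$, because we could set $c = 1/c_2$ if such a $c_2$ existed.

    We have shown that there do not exist constants $c_1 > 0$ and $c_2 > 0$ such that $c_2 g(n) \le f(n) \le c_1 g(n)$. Thus, $f(n)$ is not $\Theta(g(n))$.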

    Problem 1-2. Computing Fibonacci Numbers

    The Fibonacci numbers are defined on page 56 of CLRS as

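    $$F_0 = 0,$$
    $$F_1 = 1,$$
    $$F_n = F_{n-1} + F_{n-2} \quad \text{for } n \ge 2.$$

    In Exercise 1-3 of this problem set, you showed that the $n$th Fibonacci number is

    $$F_n = \frac{\phi^n - \hat{\phi}^n}{\sqrt{5}},$$

    where $\phi$ is the golden ratio and $\hat{\phi}$ is its conjugate.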


    A fellow 6.046 student comes to you with the following simple recursive algorithm for computing

    the nth Fibonacci number.

    FIB(n)
        if n = 0
            then return 0
        elseif n = 1
            then return 1
        return FIB(n − 1) + FIB(n − 2)

    This algorithm is correct, since it directly implements the definition of the Fibonacci numbers.

    Let's analyze its running time. Let $T(n)$ be the worst-case running time of FIB(n).¹

    (a) Give a recurrence for $T(n)$, and use the substitution method to show that $T(n) = O(F_n)$.

    Solution: The recurrence is $T(n) = T(n-1) + T(n-2) + 1$. We use the substitution method, inducting on $n$. Our induction hypothesis is $T(n) \le cF_n - b$.

    To prove the inductive step:

    $$T(n) \le cF_{n-1} + cF_{n-2} - b - b + 1 \le cF_n - 2b + 1.$$

    Therefore, $T(n) \le cF_n - b$ provided that $b \ge 1$, since then $-2b + 1 \le -b$. We choose $b = 2$ and $c = 10$.

    For the base case consider $n \in \{0, 1\}$ and note the running time is no more than $10 - 2 = 8$.

    (b) Similarly, show that $T(n) = \Omega(F_n)$, and hence, that $T(n) = \Theta(F_n)$.

    Solution: Again the recurrence is $T(n) = T(n-1) + T(n-2) + 1$. We use the substitution method, inducting on $n$. Our induction hypothesis is $T(n) \ge F_n$.

    To prove the inductive step:

    $$T(n) \ge F_{n-1} + F_{n-2} + 1 = F_n + 1.$$

    Therefore, $T(n) \ge F_n$. For the base case consider $n \in \{0, 1\}$ and note the running time is no less than 1.

    ¹ In this problem, please assume that all operations take unit time. In reality, the time it takes to add two numbers depends on the number of bits in the numbers being added (more precisely, on the number of memory words). However, for the purpose of this problem, the approximation of unit time addition will suffice.
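    As an informal check of the bounds in parts (a) and (b), the following Python sketch counts the number of invocations made by the naive recursion. It assumes each invocation costs one unit, so the count obeys $T(n) = T(n-1) + T(n-2) + 1$ with $T(0) = T(1) = 1$; the name `fib_calls` is hypothetical. The loop starts at $n = 1$ because $F_0 = 0$.

```python
def fib_calls(n):
    """Return (F_n, number of FIB invocations) for the naive recursion."""
    if n == 0:
        return 0, 1
    if n == 1:
        return 1, 1
    a, calls_a = fib_calls(n - 1)
    b, calls_b = fib_calls(n - 2)
    return a + b, calls_a + calls_b + 1


for n in range(1, 20):
    f_n, calls = fib_calls(n)
    # Lower bound from part (b) and upper bound from part (a) with c = 10, b = 2.
    assert f_n <= calls <= 10 * f_n - 2
```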


    Professor Grigori Potemkin has recently published an improved algorithm for computing the $n$th Fibonacci number which uses a cleverly constructed loop to get rid of one of the recursive calls.

    Professor Potemkin has staked his reputation on this new algorithm, and his tenure committee has

    asked you to review his algorithm.

    FIB′(n)
        if n = 0
            then return 0
        elseif n = 1
            then return 1
        sum ← 1
        for k ← 1 to n − 2
            do sum ← sum + FIB′(k)
        return sum

    Since it is not at all clear that this algorithm actually computes the $n$th Fibonacci number, let's prove that the algorithm is correct. We'll prove this by induction over $n$, using a loop invariant in the inductive step of the proof.
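    Before proving correctness, it can help to test the claim empirically. The sketch below transcribes Professor Potemkin's algorithm into Python and compares it with a straightforward iterative computation for small $n$. The names `fib_prime` and `fib_iter` are hypothetical, and agreement on small inputs is only evidence, not a proof.

```python
def fib_prime(n):
    """Transcription of Potemkin's algorithm: one recursive call inside a loop."""
    if n == 0:
        return 0
    if n == 1:
        return 1
    total = 1
    for k in range(1, n - 1):      # k = 1, ..., n - 2
        total += fib_prime(k)
    return total


def fib_iter(n):
    """Reference implementation straight from the definition."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


assert all(fib_prime(n) == fib_iter(n) for n in range(15))
```

    The agreement reflects the identity $F_n = 1 + \sum_{k=1}^{n-2} F_k$ for $n \ge 2$.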

    (c) State the induction hypothesis and the base case of your correctness proof.

    Solution: To prove the algorithm is correct, we are inducting on $n$. Our induction hypothesis is that for all n


    (e) Use your loop invariant to complete the inductive step of your correctness proof.

    Solution: To complete the inductive step of our correctness proof, we must show that if FIB′(n) returns $F_n$ for all n


    Solution:

    We can use this idea to recursively multiply polynomials of degree $n - 1$, where $n$ is a power of 2, as follows:

    Let $p(x)$ and $q(x)$ be polynomials of degree $n - 1$, and divide each into the upper $n/2$ and lower $n/2$ terms:

    $$p(x) = a(x)\,x^{n/2} + b(x),$$
    $$q(x) = c(x)\,x^{n/2} + d(x),$$

    where $a(x)$, $b(x)$, $c(x)$, and $d(x)$ are polynomials of degree $n/2 - 1$. The polynomial product is then

    $$p(x)\,q(x) = (a(x)\,x^{n/2} + b(x))(c(x)\,x^{n/2} + d(x)) = a(x)c(x)\,x^{n} + (a(x)d(x) + b(x)c(x))\,x^{n/2} + b(x)d(x).$$

    The four polynomial products $a(x)c(x)$, $a(x)d(x)$, $b(x)c(x)$, and $b(x)d(x)$ are computed recursively.

    (b) Give and solve a recurrence for the worst-case running time of your algorithm.

    Solution:

    Since we can perform the dividing and combining of polynomials in time $\Theta(n)$, recursive polynomial multiplication gives us a running time of

    $$T(n) = 4T(n/2) + \Theta(n) = \Theta(n^2).$$

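    (c) Show how to multiply two linear polynomials $A(x) = a_1 x + a_0$ and $B(x) = b_1 x + b_0$ using only three coefficient multiplications.

    Solution:

    Writing the two polynomials as $ax + b$ and $cx + d$ (that is, $a = a_1$, $b = a_0$, $c = b_1$, $d = b_0$), we can use the following 3 multiplications:

    $$m_1 = (a + b)(c + d) = ac + ad + bc + bd,$$
    $$m_2 = ac,$$
    $$m_3 = bd,$$

    so the polynomial product is

    $$(ax + b)(cx + d) = m_2 x^2 + (m_1 - m_2 - m_3)\,x + m_3.$$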


    (d) Give a divide-and-conquer algorithm for multiplying two polynomials of degree-bound $n$ based on your formula from part (c).

    Solution:

    The algorithm is the same as in part (a), except for the fact that we need only compute three products of polynomials of degree $n/2$ to get the polynomial product.

    (e) Give and solve a recurrence for the worst-case running time of your algorithm.

    Solution:

    Similar to part (b):

    $$T(n) = 3T(n/2) + \Theta(n) = \Theta(n^{\lg 3}) \approx \Theta(n^{1.585}).$$
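    The three-multiplication recursion from parts (c) and (d) can be sketched in Python as below. This is an illustrative sketch, not the handout's own code: polynomials are assumed to be coefficient lists with the lowest degree first, both inputs are assumed to have the same length $n$ (a power of 2), and the helper names `poly_add`, `poly_sub`, and `poly_mul` are hypothetical.

```python
def poly_add(p, q):
    """Add coefficient lists (lowest degree first), padding the shorter one with zeros."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]


def poly_sub(p, q):
    """Subtract q from p coefficient-wise."""
    return poly_add(p, [-c for c in q])


def poly_mul(p, q):
    """Multiply two polynomials with n coefficients each, n a power of 2."""
    n = len(p)
    if n == 1:
        return [p[0] * q[0]]
    h = n // 2
    b, a = p[:h], p[h:]                        # p(x) = a(x) x^(n/2) + b(x)
    d, c = q[:h], q[h:]                        # q(x) = c(x) x^(n/2) + d(x)
    m2 = poly_mul(a, c)
    m3 = poly_mul(b, d)
    m1 = poly_mul(poly_add(a, b), poly_add(c, d))
    mid = poly_sub(poly_sub(m1, m2), m3)       # equals ad + bc
    result = [0] * (2 * n - 1)
    for i, coeff in enumerate(m3):
        result[i] += coeff                     # b d
    for i, coeff in enumerate(mid):
        result[i + h] += coeff                 # (ad + bc) x^(n/2)
    for i, coeff in enumerate(m2):
        result[i + n] += coeff                 # a c x^n
    return result


print(poly_mul([1, 2], [3, 4]))   # (1 + 2x)(3 + 4x) -> [3, 10, 8]
```

    Only three recursive calls are made per level, which is what yields the recurrence $T(n) = 3T(n/2) + \Theta(n)$.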

    Alternative solution. Instead of breaking a polynomial $p(x)$ into two smaller polynomials $a(x)$ and $b(x)$ such that $p(x) = a(x)\,x^{n/2} + b(x)$, as we did above, we could do the following:

    Collect all the even powers of $p(x)$ and substitute $y = x^2$ to create the polynomial $a(y)$. Then collect all the odd powers of $p(x)$, factor out $x$, and substitute $y = x^2$ to create the second polynomial $b(y)$. Then we can see that

    $$p(x) = a(y) + x \cdot b(y).$$

    Both $a(y)$ and $b(y)$ are polynomials of (roughly) half the original size and degree, and we can proceed with our multiplications in a way analogous to what was done above.

    Notice that, at each level $k$, we need to compute $y_k = y_{k-1}^2$ (where $y_0 = x$), which takes time $\Theta(1)$ per level and does not affect the asymptotic running time.


    Introduction to Algorithms October 1, 2004

    Massachusetts Institute of Technology 6.046J/18.410J

    Professors Piotr Indyk and Charles E. Leiserson Handout 9

    Problem Set 2 Solutions

    Reading: Chapters 5-9, excluding 5.4 and 8.4

    Both exercises and problems should be solved, but only the problems should be turned in.

    Exercises are intended to help you master the course material. Even though you should not turn in

    the exercise solutions, you are responsible for material covered in the exercises.

    Mark the top of each sheet with your name, the course number, the problem number, your

    recitation section, the date and the names of any students with whom you collaborated.

    You will often be called upon to give an algorithm to solve a certain problem. Your write-up

    should take the form of a short essay. A topic paragraph should summarize the problem you are

    solving and what your results are. The body of the essay should provide the following:

    1. A description of the algorithm in English and, if helpful, pseudo-code.

    2. At least one worked example or diagram to show more precisely how your algorithm works.

    3. A proof (or indication) of the correctness of the algorithm.

    4. An analysis of the running time of the algorithm.

    Remember, your goal is to communicate. Full credit will be given only to correct algorithms which are described clearly. Convoluted and obtuse descriptions will receive low marks.

    Exercise 2-1. Do Exercise 5.2-4 on page 98 in CLRS.

    Exercise 2-2. Do Exercise 8.2-3 on page 170 in CLRS.

    Problem 2-1. Randomized Evaluation of Polynomials

    In this problem, we consider testing the equivalence of two polynomials in a finite field.

    A field is a set of elements for which there are addition and multiplication operations that satisfy commutativity, associativity, and distributivity. Each element in a field must have an additive and multiplicative identity, as well as an additive and multiplicative inverse. Examples of fields include the real and rational numbers.

    A finite field has a finite number of elements. In this problem, we consider the field of integers modulo $p$. That is, we consider two integers $a$ and $b$ to be equal if and only if they have the same remainder when divided by $p$, in which case we write $a \equiv b \bmod p$. This field, which we denote as $\mathbb{Z}/p$, has $p$ elements, $\{0, \ldots, p - 1\}$.


    Consider a polynomial in the field Z/p:

    $$a(x) = \sum_{i=0}^{n} a_i x^i \bmod p \qquad (1)$$

    A root or zero of a polynomial is a value of $x$ for which $a(x) = 0$. The following theorem describes the number of zeros for a polynomial of degree $n$.

    Theorem 1. A polynomial $a(x)$ of degree $n$ has at most $n$ distinct zeros.

    Polly the Parrot is a very smart bird that likes to play math games. Today, Polly is thinking of a

    polynomial $a(x)$ over the field $\mathbb{Z}/p$. Though Polly will not tell you the coefficients of $a(x)$, she will happily evaluate $a(x)$ for any $x$ of your choosing. She challenges you to figure out whether or not $a$ is equivalent to zero (that is, whether $\forall x \in \{0, \ldots, p - 1\} : a(x) \equiv 0 \bmod p$).

    Throughout this problem, assume that a(x) has degree n, where n


    The problem thus becomes: if $a$ is not equivalent to zero, choose $k$ such that the probability that all $k$ queries evaluate to zero is no more than 1%. Let $\epsilon$ denote the margin of error in the general case ($\epsilon = 1\%$ in this part), and let $Q_i$ be a random variable indicating the result of the $i$th query. The constraint is as follows:

    $$\Pr[Q_1 = 0 \text{ and } Q_2 = 0 \text{ and } \ldots \text{ and } Q_k = 0] = \Pr[Q_1 = 0]\,\Pr[Q_2 = 0] \cdots \Pr[Q_k = 0] \le (n/p)^k \le \epsilon$$

    The first step follows from the fact that all of the queries are independent. The second step utilizes the bound from Part (a). Solving for $k$, we have:

    $$(n/p)^k \le \epsilon$$
    $$k \lg(n/p) \le \lg \epsilon$$
    $$k \ge \lg \epsilon / \lg(n/p)$$

    The last step above utilizes the assumption that $n < p$, so that $\lg(n/p)$ is negative and dividing by it reverses the inequality.


    Returns whether a(x) ≡ b(x) · c(x) (mod p) for all x ∈ {0, ..., p − 1}.
    Correct with probability at least 1 − ε.

    EQUIV(a[0 . . n], b[0 . . n/2], c[0 . . n/2], p, ε)
    1  k ← ⌈lg ε / lg(n/p)⌉
    2  for i ← 1 to k
    3      do x ← RANDOM(0, p − 1)
    4         a′ ← a(x)
    5         b′ ← b(x)
    6         c′ ← c(x)
    7         if a′ − b′ · c′ ≢ 0 (mod p)
    8            then return false
    9  return true

    Correctness. For a given value of $x$, $a(x) = b(x) \cdot c(x)$ if and only if $a(x) - b(x) \cdot c(x) = 0$. Thus, $a(x)$ is equivalent to $b(x) \cdot c(x)$ if and only if $a(x) - b(x) \cdot c(x)$ is equivalent to zero. Our solution to Part (b) shows how to determine with probability at least $1 - \epsilon$ whether or not a given polynomial is equivalent to zero. Using this same procedure, we test whether or not $a(x) - b(x) \cdot c(x)$ is equivalent to zero, thereby determining whether or not $a(x)$ is equivalent to $b(x) \cdot c(x)$.

    Running time. We count the number of arithmetic operations in EQUIV. In this class,

    we assume that steps 1 and 7 are $\Theta(1)$, as they are arithmetic operations on scalar values; also, we assume that the call to RANDOM on line 3 is $\Theta(1)$. Each polynomial evaluation on lines 4-6 is $\Theta(n)$, as an $n$-degree polynomial can be evaluated in $\Theta(n)$ time using Horner's rule. The loop runs $k = \lg \epsilon / \lg(n/p)$ times, performing $\Theta(n)$ work on each iteration. The total runtime is thus $T(n) = \Theta(n \lg \epsilon / \lg(n/p))$.
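    A minimal Python sketch of EQUIV follows, under stated assumptions: polynomials are coefficient lists with the lowest degree first, `eps` is the allowed error margin with $0 < \mathit{eps} < 1$, the degree $n$ of `a` satisfies $0 < n < p$, and the function names `horner` and `equiv` are hypothetical. Python's `random` module stands in for RANDOM.

```python
import math
import random


def horner(coeffs, x, p):
    """Evaluate sum(coeffs[i] * x**i) mod p with Horner's rule, one multiply-add per coefficient."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc


def equiv(a, b, c, p, eps, rng=random):
    """Probabilistically test whether a(x) == b(x) * c(x) (mod p) for every x in Z/p."""
    n = len(a) - 1
    k = math.ceil(math.log(eps) / math.log(n / p))    # line 1 of EQUIV
    for _ in range(k):
        x = rng.randrange(p)                          # uniform in {0, ..., p - 1}
        if (horner(a, x, p) - horner(b, x, p) * horner(c, x, p)) % p != 0:
            return False                              # a witness: definitely not equivalent
    return True                                       # no witness found in k independent queries
```

    A `False` answer is always correct, since it exhibits a point where the two sides differ; only a `True` answer carries the error probability bounded above.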

    Consider Potemkin's proposal, with a runtime $P(n) = \Theta(n^{\lg 3})$, and let us evaluate the conditions under which $T(n) = O(P(n))$. Note that, in the runtime of our algorithm, we cannot consider $p$ to be fixed as $n$ grows, since we require that $n < p$. Suppose that $p = cn$ for some constant $c > 1$ and that $\epsilon$ is a fixed constant. Then $\lg \epsilon / \lg(n/p) = \lg \epsilon / \lg(1/c) = \Theta(1)$. Thus, the loop executes $\Theta(1)$ times and the algorithm has a running time of $\Theta(n)$, which is asymptotically faster than Potemkin's proposal.

    On the other hand, if $p = n + 1$ while $\epsilon$ remains fixed, then $\lg \epsilon / \lg(n/p) = \lg \epsilon / \lg(n/(n+1)) = \lg \epsilon / (\lg n - \lg(n+1))$. Intuitively, one can see that this is $\Theta(n)$ because $\frac{d}{dn}(\lg n) = \Theta(1/n)$, which means that $\lg(n+1) - \lg n$ is approximately $1/n$ up to a constant factor, and $\lg \epsilon / (\lg n - \lg(n+1)) \approx -\lg \epsilon / (1/n) = \Theta(n)$. Thus, the loop executes $\Theta(n)$ times and the algorithm has a running time of $\Theta(n^2)$, which is asymptotically slower than Potemkin's proposal.

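    We can prove more rigorously that $\lg \epsilon / (\lg n - \lg(n+1)) = \Theta(n)$ by appealing to the following identity [CLRS, p. 53]:

    $$\lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n = e$$

    Then, using the definition of limit, there exist positive constants $c_1$, $c_2$, and $n_0$ (with $c_1 e > 1$, so that the logarithms below are positive) such that for all $n > n_0$:

    $$c_1 e \le \left(1 + \frac{1}{n}\right)^n \le c_2 e$$
    $$\ln(c_1 e) \le n \ln\left(1 + \frac{1}{n}\right) \le \ln(c_2 e) \quad \text{[take natural log]}$$
    $$\ln(c_1 e) \le n \ln\left(\frac{n+1}{n}\right) \le \ln(c_2 e) \quad \text{[simplify]}$$
    $$\ln(c_1 e) \le n(\ln(n+1) - \ln n) \le \ln(c_2 e) \quad \text{[simplify]}$$
    $$\ln(c_1 e)/n \le \ln(n+1) - \ln n \le \ln(c_2 e)/n \quad \text{[divide by } n\text{]}$$
    $$n/\ln(c_1 e) \ge 1/(\ln(n+1) - \ln n) \ge n/\ln(c_2 e) \quad \text{[take inverse]}$$
    $$-n/\ln(c_1 e) \le 1/(\ln n - \ln(n+1)) \le -n/\ln(c_2 e) \quad \text{[simplify]}$$

    Thus, $1/(\ln n - \ln(n+1))$ is bounded above and below by a negative constant factor times $n$. Since $\lg \epsilon$ is itself a negative constant, adjusting the constants gives $\lg \epsilon / (\lg n - \lg(n+1)) = \Theta(n)$.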

    Finally, we point out the desirable property that the algorithm is logarithmic in $1/\epsilon$ for fixed values of $p$ and $n$. Decreasing the error margin by a given factor results in only an additive increase in the runtime.
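    The two regimes discussed above can be illustrated numerically. The sketch below computes the number of queries for an error margin of 1%, first with $p = 2n$ and then with $p = n + 1$. The helper name `num_queries` is hypothetical, and the result is the smallest suitable integer only up to floating-point rounding.

```python
import math


def num_queries(n, p, eps):
    """Number of queries k with (n/p)**k <= eps, assuming 0 < n < p and 0 < eps < 1."""
    return math.ceil(math.log(eps) / math.log(n / p))


for n in (10, 100, 1000, 10000):
    # Columns: n, queries when p = 2n (constant), queries when p = n + 1 (grows with n).
    print(n, num_queries(n, 2 * n, 0.01), num_queries(n, n + 1, 0.01))
```

    The middle column stays constant while the last column grows roughly linearly in $n$. Halving the error margin with $p = 2n$ adds a single query, which reflects the logarithmic dependence on the error margin.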

    Problem 2-2. Distributed Median

    Alice has an array A[1..n], and Bob has an array B[1..n]. All elements in A and B are distinct.

    Alice and Bob are interested in finding the median element of their combined arrays. That is, they want to determine which element $m$ satisfies the following property:

    $$|\{i \in [1, n] : A[i] \le m\}| + |\{i \in [1, n] : B[i] \le m\}| = n \qquad (3)$$

    This equation says that there are a total of $n$ elements in both $A$ and $B$ that are less than or equal to $m$. Note that $m$ might be drawn from either $A$ or $B$.

    Because Alice and Bob live in different cities, they have limited communication bandwidth. They

    can send each other one integer at a time, where the value either falls within $\{0, \ldots, n\}$ or is drawn


    from the original A or B arrays. Each numeric transmission counts as a communication between

    Alice and Bob. One goal of this problem is to minimize the number of communications needed to

    compute the median.

    (a) Give a deterministic algorithm for computing the combined median of A and B. Your

    algorithm should run in $O(n \log n)$ time and use $O(\log n)$ communications. (Hint: consider sorting.)

    Solution:

    The algorithm works as follows. Alice and Bob begin by sorting their arrays using

    a deterministic $\Theta(n \log n)$ algorithm such as HeapSort or MergeSort. Then, Alice assumes the role of the master and Bob the role of the slave. Alice considers an element $A[i]$ and sends $n - i$ to Bob, who returns two elements: $B[n - i]$ and $B[n - i + 1]$. Because $A$ is sorted, $A[i]$ is the combined median if and only if there are exactly $n - i$ elements in $B$ that are less than $A[i]$. Because $B$ is sorted, this condition is reduced to checking whether or not $B[n - i] < A[i] < B[n - i + 1]$.


    ALICE(A[1 . . n])
    1  HEAPSORT(A)
    2  median ← MASTER(A)
    3  if median = NIL
    4     then median ← SLAVE(A)
    5  return median

    BOB(B[1 . . n])
    1  HEAPSORT(B)
    2  median ← SLAVE(B)
    3  if median = NIL
    4     then median ← MASTER(B)
    5  return median

    MASTER(M[1 . . n])
    1  lower ← 1
    2  upper ← n
    3  median ← NIL
    4  while lower ≤ upper and median = NIL
    5      do i ← lower + ⌊(upper − lower)/2⌋
    6         send n − i
    7         receive b1
    8         receive b2
    9         cur ← M[i]
    10        if b1

    SLAVE(S[1 . . n])
    1  while true
    2      do receive j
    3         if j = DONE
    4            then receive median
    5                 return median


    As before, Alice and Bob begin by sorting their arrays using a deterministic $\Theta(n \log n)$ algorithm such as HeapSort or MergeSort. Then, Alice assumes the role of the master and Bob the role of the slave. When Alice sends a value $A[i]$ to Bob, Bob returns the number of elements, $\mathrm{count}(A[i])$, in his array that are less than $A[i]$. Because $A$ is sorted, the element $A[i]$ is the combined median if and only if $i + \mathrm{count}(A[i]) = n$. Alice checks this condition and returns $A[i]$ as the median if the condition holds. If the condition fails, then she proceeds to do a binary search within her array. The search is on $i$, with an initial range of $[1, n]$. On each step, she descends into the top half of the range if $i + \mathrm{count}(A[i]) < n$ and into the bottom half if $i + \mathrm{count}(A[i]) > n$. Because the quantity $i + \mathrm{count}(A[i])$ is a monotonic function of $i$, the search terminates with $i + \mathrm{count}(A[i]) = n$ if the combined median is stored within $A$.

    If the combined median is not held in $A$, then Alice's binary search terminates after $1 + \lfloor \lg n \rfloor$ steps and returns a value of NIL. In this case, Alice and Bob swap roles, with Bob becoming the master and Alice the slave. The procedure is repeated, and this time the binary search returns the combined median because it must be stored within $B$.

    For clarity, pseudocode for this algorithm is given below.

    ALICE(A[1 . . n])
    1  HEAPSORT(A)
    2  median ← MASTER(A)
    3  if median = NIL
    4     then median ← SLAVE(A)
    5  return median

    BOB(B[1 . . n])
    1  HEAPSORT(B)
    2  median ← SLAVE(B)
    3  if median = NIL
    4     then median ← MASTER(B)
    5  return median

    MASTER(M[1 . . n])
    1  lower ← 1
    2  upper ← n
    3  median ← NIL
    4  while lower ≤ upper and median = NIL
    5      do i ← lower + ⌊(upper − lower)/2⌋
    6         send M[i]
    7         receive count
    8         if i + count = n
    9            then median ← M[i]
    10        elseif i + count

    SLAVE(S[1 . . n])
    1  while true
    2      do receive val
    3         if val = DONE
    4            then receive median
    5                 return median
    6         else send |{i ∈ [1, n] : S[i] ≤ val}|


    Running Time. All but three statements are $\Theta(1)$ time. Both Alice and Bob call HeapSort, which is $\Theta(n \lg n)$. Line 6 of SLAVE counts how many elements in $S$ are at most val. This can be implemented in $\Theta(n)$ time with a brute-force comparison or, because the array is sorted, in $\Theta(\lg n)$ time using a binary search. The last statements of interest are lines 6-7 of MASTER, which wait for one iteration of SLAVE. Since the slave executes $\Theta(\lg n)$ operations between a receive and send statement, lines 6-7 of MASTER are also $\Theta(\lg n)$.

    It remains to account for the loops. The loop in MASTER is performing a binary search, which (as we saw in lecture) requires $\Theta(\lg n)$ iterations. Each iteration does $\Theta(\lg n)$ work, so the total running time for the loop is $\Theta(\lg^2 n)$. The loop in SLAVE terminates when it receives a DONE value, which happens exactly when the loop in MASTER terminates; thus, SLAVE is also $\Theta(\lg^2 n)$. Alice and Bob each execute MASTER, SLAVE and HeapSort; HeapSort dominates, yielding a final running time of $\Theta(n \lg n)$.

    Communication cost. Most of the communication is in the loop of MASTER, in which two items are relayed between Alice and Bob per iteration. Since this loop executes $\Theta(\lg n)$ times, it contributes $\Theta(\lg n)$ communications. The items sent and received at the end of MASTER contribute $\Theta(1)$ communications, leaving the total at $\Theta(\lg n)$.
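    The count-based protocol can be simulated in a single Python process to see both the result and the communication cost. This is a sketch under simplifying assumptions: a function call stands in for each send/receive pair, the final DONE and median exchange is not counted, all elements are assumed distinct, and the names `master_search` and `combined_median` are hypothetical.

```python
from bisect import bisect_right


def master_search(mine, count_in_other):
    """Binary search over the sorted list `mine`; returns (median or None, messages used)."""
    n = len(mine)
    lower, upper, messages = 1, n, 0
    while lower <= upper:
        i = lower + (upper - lower) // 2
        count = count_in_other(mine[i - 1])   # one value sent, one count received
        messages += 2
        if i + count == n:
            return mine[i - 1], messages
        if i + count < n:
            lower = i + 1                     # too few elements at or below mine[i]: go higher
        else:
            upper = i - 1                     # too many: go lower
    return None, messages


def combined_median(a, b):
    """Return (m, messages), where exactly n elements of a and b together are <= m."""
    a, b = sorted(a), sorted(b)
    median, used = master_search(a, lambda v: bisect_right(b, v))
    if median is not None:
        return median, used
    median, more = master_search(b, lambda v: bisect_right(a, v))   # roles swapped
    return median, used + more


print(combined_median([1, 5, 9], [2, 3, 10]))   # (3, 6)
```

    Each lookup on the other side uses a binary search (`bisect_right`), which corresponds to the $\Theta(\lg n)$ implementation of line 6 of SLAVE.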

    (b) Give a randomized algorithm for computing the combined median of A and B. Your

    algorithm should run in expected $O(n)$ time and use expected $O(\log n)$ communications. (Hint: consider RANDOMIZED-SELECT.)

    Solution:

    The algorithm is almost identical to Part (a). As before, Alice starts as the master and conducts a binary search through Bob's elements, looking for $A[i]$ such that $B[n - i] < A[i] < B[n - i + 1]$.


    MASTER(M[1 . . n])
    1  lower ← 1
    2  upper ← n
    3  median ← NIL
    4  while lower ≤ upper and median = NIL
    5      do i ← lower + ⌊(upper − lower)/2⌋
    6         send n − i
    7         send n − upper
    8         send n − lower
    9         receive b1
    10        receive b2
    11        cur ← RANDOMIZED-SELECT(M, lower, upper, i − lower + 1)
    12        if b1


    For the inductive step, assume $I$ is true on the current iteration. Then the call to RANDOMIZED-SELECT will partition around the $i$th smallest element in $M$, because 1) (by the inductive hypothesis) the smallest element in $M[\mathit{lower} \ldots \mathit{upper}]$ is the $\mathit{lower}$th smallest element in $M$, 2) by our call to RANDOMIZED-SELECT, we are selecting for the $(i - \mathit{lower})$th smallest element in $M[\mathit{lower} \ldots \mathit{upper}]$ (we also add 1 to compensate for the 1-based array indexing), and 3) $\mathit{lower} + i - \mathit{lower} = i$. Thus, the hypothesis will be satisfied for the ranges $\{\mathit{lower}, \ldots, i\}$ and $\{i, \ldots, \mathit{upper}\}$ because the PARTITION subroutine of RANDOMIZED-SELECT will place elements on the appropriate side of the pivot $i$. Finally, the inductive hypothesis will hold on the next iteration because we assign either $\mathit{lower}$ or $\mathit{upper}$ to be adjacent to $i$ (but excluding $i$ from the next range).

    Using the invariant, we conclude that the call to RANDOMIZED-SELECT on line 11 of MASTER returns the $i$th smallest element in $M$, which is equivalent to the expression $M[i]$ from Part (a).

    It remains to show the equivalent property for lines 10 and 11 of SLAVE. This is done using the same loop invariant, but translating $j = n - i$, $\mathit{bottom} = n - \mathit{upper}$, and $\mathit{top} = n - \mathit{lower}$ across the call between MASTER and SLAVE. In this way, we can show that lines 10 and 11 of SLAVE return the $(n - i)$th and $(n - i + 1)$th smallest elements of $S$, respectively.

    We have shown that our changes from Part (a) preserve the behavior of the algorithm,

    and thus the algorithm remains correct.

    Running Time. We can write a recurrence to model the running time of the main loop in MASTER. Let $m = \mathit{upper} - \mathit{lower}$. On each iteration, $m$ decreases to at most $m/2$ and RANDOMIZED-SELECT runs three times (once in MASTER, twice in SLAVE) over a segment of size $m$, with expected running time $\Theta(m)$. Thus $E[T(m)] = E[T(m/2)] + \Theta(m)$, and by Case 3 of the Master Theorem, $E[T(m)] = \Theta(m)$. Finally, noting that $m = n$ at the beginning of the procedure, we have that the expected running time is $\Theta(n)$.

    Communication cost. The communication cost is identical to Part (a), as the loop in MASTER still executes $\Theta(\lg n)$ iterations and sends $\Theta(1)$ items on each iteration. Thus, the total number of communications is $\Theta(\lg n)$. (Note that this algorithm gives

    a deterministic bound on the number of communications.)
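    For reference, a minimal Python sketch of a RANDOMIZED-SELECT-style routine is given below. It is an illustrative variant rather than the CLRS pseudocode verbatim: it is iterative, uses 0-based inclusive bounds with a 1-based rank `k`, partitions with a Lomuto scheme, and assumes distinct elements. The property the solution relies on is that the selected element ends up in its sorted position within the segment, with smaller elements on its left and larger ones on its right.

```python
import random


def randomized_select(arr, lo, hi, k, rng=random):
    """Return the k-th smallest (1-based) element of arr[lo..hi], rearranging arr in place."""
    while True:
        if lo == hi:
            return arr[lo]
        pivot_index = rng.randint(lo, hi)               # random pivot, bounds inclusive
        arr[pivot_index], arr[hi] = arr[hi], arr[pivot_index]
        pivot, store = arr[hi], lo
        for j in range(lo, hi):                         # Lomuto partition around the pivot
            if arr[j] < pivot:
                arr[store], arr[j] = arr[j], arr[store]
                store += 1
        arr[store], arr[hi] = arr[hi], arr[store]
        rank = store - lo + 1                           # rank of the pivot within arr[lo..hi]
        if k == rank:
            return arr[store]
        if k < rank:
            hi = store - 1                              # continue in the left part
        else:
            k -= rank                                   # continue in the right part
            lo = store + 1


data = [7, 2, 9, 4, 1]
print(randomized_select(data, 0, len(data) - 1, 3))   # 4
```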

    Problem 2-3. American Gladiator

    You are consulting for a game show in which $n$ contestants are pitted against $n$ gladiators in order to see which contestants are the best. The game show aims to rank the contestants in order of strength;

    this is done via a series of 1-on-1 matches between contestants and gladiators. If the contestant is

    stronger than the gladiator, then the contestant wins the match; otherwise, the gladiator wins the


    match. If the contestant and gladiator have equal strength, then they are perfect equals and a

    tie is declared. We assume that each contestant is the perfect equal of exactly one gladiator, and

    each gladiator is the perfect equal of exactly one contestant. However, as the gladiators sometimes

    change from one show to another, we do not know the ordering of strength among the gladiators.

    The game show currently uses a round-robin format in which $\Theta(n^2)$ matches are played and contestants are ranked according to their number of victories. Since few contestants can happily endure $\Theta(n)$ gladiator confrontations, you are called in to optimize the procedure.

    (a) Give a randomized algorithm for ranking the contestants. Using your algorithm, the expected number of matches should be $O(n \log n)$.

    Solution:

    The problem statement does not describe exactly how the contestants and gladiators are specified, so we first need to come up with a reasonable representation for the input. Let's assume the contestants and gladiators are provided to us in two arrays $C[1 \ldots n]$ and $G[1 \ldots n]$, where we are allowed to compare elements across, but not within, these two arrays.

    We use a divide-and-conquer algorithm very similar to randomized quicksort. The algorithm first performs a partition operation as follows: pick a random contestant $C[i]$. Using this contestant, rearrange the array of gladiators into three groups of elements: first the gladiators weaker than $C[i]$, then the gladiator that is the perfect equal of $C[i]$, and finally the gladiators stronger than $C[i]$. Next, using the gladiator that is the perfect equal of $C[i]$, we perform a similar partition of the array of contestants. This pair of partitioning operations can easily be implemented in $\Theta(n)$ time, and it leaves the contestants and gladiators nicely partitioned so that the pivot contestant and gladiator are aligned with each other and all other contestants and gladiators are on the correct side of these pivots: weaker contestants and gladiators precede the pivots, and stronger contestants and gladiators follow the pivots. Our algorithm then finishes by recursively applying itself to the subarrays to the left and right of the pivot position to sort these remaining contestants and gladiators. We can assume by induction on $n$ that these recursive calls will properly sort the remaining contestants.

    To analyse the running time of our algorithm, we can use the same analysis as that

    of randomized quicksort. We are performing a partition operation in $\Theta(n)$ time that splits our problem into two subproblems whose sizes are randomly distributed exactly as would be the subproblems resulting from a partition in randomized quicksort. Therefore, applying the analysis from quicksort, the expected running time of our algorithm is $\Theta(n \log n)$.
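    The paired partitioning can be sketched in Python as follows. The sketch assumes strengths are represented as numbers purely so that a match can be simulated; the algorithm itself only ever calls `match` on one contestant and one gladiator, never on two contestants or two gladiators. All function names are hypothetical.

```python
import random


def match(contestant, gladiator):
    """Simulate one match: negative if the contestant is weaker, 0 for a tie, positive if stronger."""
    return (contestant > gladiator) - (contestant < gladiator)


def partition(items, pivot, compare):
    """Split items into (weaker, equal, stronger) relative to pivot, one comparison per item."""
    weaker, equal, stronger = [], [], []
    for item in items:
        outcome = compare(item, pivot)
        (weaker if outcome < 0 else stronger if outcome > 0 else equal).append(item)
    return weaker, equal, stronger


def rank(contestants, gladiators, rng=random):
    """Return both groups sorted by increasing strength, using only cross-group matches."""
    if len(contestants) <= 1:
        return list(contestants), list(gladiators)
    pivot_c = rng.choice(contestants)
    # Partition the gladiators around the randomly chosen contestant.
    weaker_g, equal_g, stronger_g = partition(gladiators, pivot_c, lambda g, c: -match(c, g))
    pivot_g = equal_g[0]                      # the perfect equal of pivot_c
    # Partition the contestants around that gladiator.
    weaker_c, _, stronger_c = partition(contestants, pivot_g, match)
    left_c, left_g = rank(weaker_c, weaker_g, rng)
    right_c, right_g = rank(stronger_c, stronger_g, rng)
    return left_c + [pivot_c] + right_c, left_g + [pivot_g] + right_g


print(rank([3, 1, 2], [2, 3, 1]))   # ([1, 2, 3], [1, 2, 3])
```

    Each level of the recursion plays about $2n$ matches in total, mirroring the $\Theta(n)$ partition step in the analysis above.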

    Interesting side note: Although devising an efficient randomized algorithm for this problem is not too difficult, it appears to be very difficult to come up with a deterministic algorithm with running time better than the trivial bound of $O(n^2)$. This remained an open research question until the mid-to-late 90s, when a very complicated deterministic algorithm with $\Theta(n \log n)$ running time was finally discovered. This problem provides a striking example of how randomization can help simplify the task of algorithm design.

    (b) Prove that any algorithm that solves part (a) must use $\Omega(n \log n)$ matches in the worst case. That is, you need to show a lower bound for any deterministic algorithm solving

    this problem.

    Solution:

    Let's use a proof based on decision trees, as we did for comparison-based sorting. Note that we can model any algorithm for sorting contestants and gladiators as a decision tree. The tree will be a ternary tree, since every comparison has three possible outcomes: weaker, equal, or stronger. The height of such a tree corresponds to the worst-case number of comparisons made by the algorithm it represents, which in turn is a lower bound on the running time of that algorithm. We therefore want a lower bound of $\Omega(n \log n)$ on the height, $H$, of any decision tree that solves part (a). To begin with, note that the number of leaves $L$ in any ternary tree must satisfy

    $$L \le 3^H.$$

    Next, consider the following class of inputs. Let the input array of gladiators $G$ be fixed and consist of $n$ gladiators sorted in order of increasing strength, and consider one potential input for every permutation of the contestants. Our algorithm must in this case essentially sort the array of contestants. In our decision tree, if two different inputs of this type were mapped to the same leaf node, our algorithm would attempt to apply to both of these the same permutation of contestants, and it follows that the algorithm could not compute a ranking correctly for both of these inputs. Therefore, we must map every one of these $n!$ different inputs to a distinct leaf node, so

    $$L \ge n!$$

    $$3^H \ge n!$$

    $$H \ge \log_3 n!$$

    $$H = \Omega(n \log n) \quad \text{[using Stirling's approximation]}$$
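    The last step can also be checked without Stirling's formula. The derivation below is only a sketch, and it swaps in the elementary bound $n! \ge (n/2)^{n/2}$ (the largest $n/2$ factors of $n!$ are each at least $n/2$). This is not the argument the solution cites, but it reaches the same conclusion.

```latex
\begin{align*}
H \;\ge\; \log_3 n!
  \;\ge\; \log_3 \left(\frac{n}{2}\right)^{n/2}
  \;=\; \frac{n}{2}\,\log_3 \frac{n}{2}
  \;=\; \Omega(n \log n).
\end{align*}
```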


    Introduction to Algorithms October 22, 2004

    Massachusetts Institute of Technology 6.046J/18.410J

    Professors Piotr Indyk and Charles E. Leiserson Handout 9

    Problem Set 3 Solutions

    Reading: Chapters 12.1-12.4, 13, 18.1-18.3

    Both exercises and problems should be solved, but only the problems should be turned in.

    Exercises are intended to help you master the course material. Even though you should not turn in

    the exercise solutions, you are responsible for material covered in the exercises.

    Mark the top of each sheet with your name, the course number, the problem number, your

    recitation section, the date and the names of any students with whom you collaborated.

    Three-hole punch your paper on submissions.

    You will often be called upon to give an algorithm to solve a certain problem. Your write-up

    should take the form of a short essay. A topic paragraph should summarize the problem you are

    solving and what your results are. The body of the essay should provide the following:

    1. A description of the algorithm in English and, if helpful, pseudo-code.

    2. At least one worked example or diagram to show more precisely how your algorithm works.

    3. A proof (or indication) of the correctness of the algorithm.

    4. An analysis of the running time of the algorithm.

    Remember, your goal is to communicate. Full credit will be given only to correct algorithms

    which are described clearly. Convoluted and obtuse descriptions will receive low marks.

    Exercise 3-1. Do Exercise 12.1-2 on page 256 in CLRS.

    Exercise 3-2. Do Exercise 12.2-1 on page 259 in CLRS.

    Exercise 3-3. Do Exercise 12.3-3 on page 264 in CLRS.

    Exercise 3-4. Do Exercise 13.2-1 on page 278 in CLRS.

    Problem 3-1. Packing Boxes

    The computer science department makes a move to a new building offering the faculty and graduate

    students boxes, crates and other containers. Prof. Potemkin, afraid of his questionable tenure case,

    spends all of his time doing research and absentmindedly forgets about the move until the last

    minute. His secretary advises him to use the only remaining boxes, which have capacity exactly

    1 kg. His belongings consist of $n$ books that weigh between 0 and 1 kilograms. He wants to

    minimize the total number of used boxes.


    Prof. Potemkin realizes that this packing problem is NP-hard, which means that the research

    community has not yet found a polynomial time algorithm [1] that solves this problem exactly.

    He thinks of the heuristic approach called BEST-PACK:

    1. Take the books in the order in which they appear on his shelves.

    2. For each book, scan the boxes in increasing order of the remaining capacity and place the book in the first box in which it fits.

    (a) Describe a data structure that supports efficient implementation of BEST-PACK. Show

    how to use your data structure to get that implementation.

    Solution: BEST-PACK can be implemented using any data structure that supports the following three operations:

    1. Insert(x), where x is an element and key[x] is a number

    2. Delete(x)

    3. Successor(k), which reports the element x with the smallest key such that key[x] ≥ k

    There are several ways to obtain such a data structure. For example, one can use red-black trees or 2-3 trees. Because they are balanced, they support Insert, Delete and Successor operations in $O(\log n)$ time. Even though the Successor operation was not explained for 2-3 trees, it can be implemented by modifying search.

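    Our implementation is as follows: We use the remaining capacity of the boxes as the key in the binary tree. Suppose that the elements weigh $w_1, \ldots, w_n$. Then, for a given book with weight $w_i$, if there are no boxes that are already used and whose remaining capacity is at least $w_i$ (i.e., $w_i$ has no successor), then we assign $w_i$ to a new box and insert that box with key $1 - w_i$. Otherwise, we place the book in the successor box, delete that box from the tree, and reinsert it with its key reduced by $w_i$.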
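    The packing loop can be sketched in C as follows. This is a minimal sketch under stated assumptions, not the handout's implementation: weights are integers in grams (so a box holds `BOX_CAPACITY` = 1000), and a sorted array stands in for the balanced search tree, so each update costs $O(n)$ rather than $O(\log n)$. The names `best_pack`, `successor`, `insert_sorted` and `delete_at` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define BOX_CAPACITY 1000   /* 1 kg expressed in grams (assumed unit) */
#define MAX_BOXES    1024   /* arbitrary limit, so no dynamic allocation */

/* Index of the first entry of the ascending array cap[0..m-1] that is >= w,
 * or m if there is none. This plays the role of Successor(w). */
static size_t successor(const int cap[], size_t m, int w) {
    size_t lo = 0, hi = m;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cap[mid] < w) lo = mid + 1; else hi = mid;
    }
    return lo;
}

/* Insert key into the ascending array cap[0..m-1]; returns the new size. */
static size_t insert_sorted(int cap[], size_t m, int key) {
    size_t pos = successor(cap, m, key);
    for (size_t k = m; k > pos; k--) cap[k] = cap[k - 1];
    cap[pos] = key;
    return m + 1;
}

/* Delete the entry at index pos; returns the new size. */
static size_t delete_at(int cap[], size_t m, size_t pos) {
    for (size_t k = pos; k + 1 < m; k++) cap[k] = cap[k + 1];
    return m - 1;
}

/* Packs the books in the given order and returns the number of boxes used. */
size_t best_pack(const int weight[], size_t n) {
    int cap[MAX_BOXES];     /* remaining capacities of used boxes, ascending */
    size_t m = 0;           /* number of used boxes */
    for (size_t i = 0; i < n; i++) {
        assert(weight[i] >= 0 && weight[i] <= BOX_CAPACITY);
        size_t pos = successor(cap, m, weight[i]);
        int remaining;
        if (pos < m) {                      /* tightest box that still fits */
            remaining = cap[pos] - weight[i];
            m = delete_at(cap, m, pos);
        } else {                            /* no used box fits: open a new one */
            assert(m < MAX_BOXES);
            remaining = BOX_CAPACITY - weight[i];
        }
        m = insert_sorted(cap, m, remaining);
    }
    return m;
}
```

    Replacing the array helpers with a balanced tree's Insert, Delete and Successor would leave `best_pack` unchanged in structure.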

    (b) Analyze the running time of your implementation.

    Solution: The BEST-PACK implementation performs $O(n)$ operations on the data structure, which implies that the total running time is $O(n \log n)$.

    [1] That is, an algorithm with running time $O(n^k)$ for some fixed $k$.


    Soon, Prof. Potemkin comes up with another heuristic WORST-PACK, which is as follows:

    1. Take the books in the order in which they appear on his shelves.

    2. For each book, find a partially used box which has the maximum remaining capacity. If possible, place the book in that box. Otherwise, put the book into a new box.

    (c) Describe a data structure that supports an efficient implementation of WORST-PACK.

    Show how to use your data structure to get that implementation.

    Solution: WORST-PACK can be implemented using any priority queue data structure. We learned in recitation that a heap implements each priority queue operation in $O(\log n)$ time. You can also use a balanced search tree to implement these operations.

    Our implementation is as follows: Pick a book. Delete the maximum from the priority queue. If the capacity is at least the weight of the book, insert the book and reduce the capacity of the box. Reinsert the box in the priority queue. Otherwise, reinsert the box unchanged, pick a new box, insert the book, and add the new box to the priority queue.
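    A heap-based version of this loop might look like the sketch below. It rests on assumptions that are not in the handout: weights are integers in grams with `BOX_CAPACITY` = 1000, and instead of deleting and reinserting the maximum, the root key is decreased in place and sifted down, which has the same effect. The names `worst_pack`, `sift_down` and `sift_up` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define BOX_CAPACITY 1000   /* 1 kg expressed in grams (assumed unit) */
#define MAX_BOXES    1024   /* arbitrary limit, so no dynamic allocation */

/* Restore max-heap order in heap[0..m-1] starting from index i downward. */
static void sift_down(int heap[], size_t m, size_t i) {
    for (;;) {
        size_t largest = i, l = 2 * i + 1, r = 2 * i + 2;
        if (l < m && heap[l] > heap[largest]) largest = l;
        if (r < m && heap[r] > heap[largest]) largest = r;
        if (largest == i) return;
        int tmp = heap[i]; heap[i] = heap[largest]; heap[largest] = tmp;
        i = largest;
    }
}

/* Restore max-heap order starting from index i upward. */
static void sift_up(int heap[], size_t i) {
    while (i > 0) {
        size_t parent = (i - 1) / 2;
        if (heap[parent] >= heap[i]) return;
        int tmp = heap[i]; heap[i] = heap[parent]; heap[parent] = tmp;
        i = parent;
    }
}

/* Packs the books in the given order and returns the number of boxes used. */
size_t worst_pack(const int weight[], size_t n) {
    int heap[MAX_BOXES];    /* remaining capacities of used boxes, max-heap */
    size_t m = 0;           /* number of used boxes */
    for (size_t i = 0; i < n; i++) {
        assert(weight[i] >= 0 && weight[i] <= BOX_CAPACITY);
        if (m > 0 && heap[0] >= weight[i]) {
            heap[0] -= weight[i];           /* emptiest box takes the book */
            sift_down(heap, m, 0);
        } else {
            assert(m < MAX_BOXES);
            heap[m] = BOX_CAPACITY - weight[i];   /* open a new box */
            sift_up(heap, m);
            m++;
        }
    }
    return m;
}
```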

    (d) Analyze the running time of your implementation.

    Solution: Our implementation performs $O(n)$ operations. This means that the total running time is $O(n \log n)$.


    Problem 3-2. AVL Trees

    An AVL tree is a binary search tree with one additional structural constraint: For any of its internal nodes, the height difference between its left and right subtree is at most one. We call this property balance. Remember that the height is the maximum length of a path to the root.

    For example, the following binary search tree is an AVL tree:

            5
           / \
          3   7
         / \
        2   4

    Balanced AVL Tree

    Nevertheless, if you insert 1, the tree becomes unbalanced.

    In this case, we can rebalance the tree by doing a simple operation, called a rotation, as follows:

              5                             3
             / \                           / \
            3   7        Rotation         2   5
           / \          ---------->      /   / \
          2   4                         1   4   7
         /
        1

          Unbalanced                    Balanced

    See CLRS, p. 278 for the formal definition of rotations.
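    The rotation shown above can be sketched in C as follows. This is a minimal illustration, not code from the handout: the `Node` type and the names `rotate_right`, `height` and `build_example` are hypothetical, and heights are counted in edges so that a single node has height 0.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node {
    int key;
    struct Node *left, *right;
} Node;

/* Height in edges: an empty tree has height -1, a single node height 0. */
int height(const Node *t) {
    if (t == NULL) return -1;
    int hl = height(t->left), hr = height(t->right);
    return 1 + (hl > hr ? hl : hr);
}

/* Right rotation about y, which must have a left child.
 * Returns the new root of the rotated subtree. */
Node *rotate_right(Node *y) {
    assert(y != NULL && y->left != NULL);
    Node *x = y->left;
    y->left = x->right;     /* x's right subtree moves under y */
    x->right = y;           /* y becomes x's right child */
    return x;
}

/* Builds the unbalanced example (5; 3 7; 2 4; 1) in caller-supplied storage
 * and returns its root. */
Node *build_example(Node pool[6]) {
    static const int keys[6] = {5, 3, 7, 2, 4, 1};
    for (int i = 0; i < 6; i++) {
        pool[i].key = keys[i];
        pool[i].left = pool[i].right = NULL;
    }
    pool[0].left = &pool[1];  pool[0].right = &pool[2];   /* 5 -> 3, 7 */
    pool[1].left = &pool[3];  pool[1].right = &pool[4];   /* 3 -> 2, 4 */
    pool[3].left = &pool[5];                              /* 2 -> 1    */
    return &pool[0];
}
```

    Rotating the example about its root moves 3 to the top, keeps 2 (with child 1) on the left, and hangs 4 and 7 under 5 on the right.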

    (a) If we insert a new element into an AVL tree of height 4, is one rotation sufficient to

    re-establish balance? Justify your answer.

    Solution: No, one rotation is not always sufficient to re-establish balance. For example, consider the insertion of the shaded node in the following AVL tree:


    Though the original tree was balanced, more than one rotation is needed to restore

    balance following the insertion. This can be seen by an exhaustive enumeration of the

    rotation possibilities.

    The problem asks for a tree of height 4, so we can extend the above example into a

    larger tree:

    (b) Denote the minimum number of nodes of an AVL tree of height $h$ by $M(h)$. A tree of height 0 has one node, so $M(0) = 1$. What is $M(1)$? Give a recurrence for $M(h)$. Show that $M(h)$ is at least $F_h$, where $F_h$ is the $h$th Fibonacci number.

    Solution: $M(1) = 2$. For $h \ge 2$, the tree will consist of a root plus two subtrees. Since the tree is of height $h$, one of the subtrees must be of height $h-1$. The minimum number of nodes in this subtree is $M(h-1)$. Since the height of the subtrees can differ by at most 1, the minimum number of nodes in the other subtree is $M(h-2)$. Thus the total number of nodes is $M(h) = M(h-1) + M(h-2) + 1$.

    Note that $M(h)$ is remarkably similar to the Fibonacci numbers and that the recursion holds for the worst-case AVL trees, which are called Fibonacci trees. It is easy to show by induction that $M(h) = F(h+3) - 1$. Note that, as shown in Problem Set 1, $F(h) \ge \frac{\phi^h}{\sqrt{5}} - 1$ where $\phi = \frac{1+\sqrt{5}}{2}$. This implies that $M(h) \ge \frac{\phi^{h+3}}{\sqrt{5}} - 2$.
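    The recurrence and the Fibonacci identity are easy to check numerically. The sketch below assumes the convention $F(0) = 0$, $F(1) = 1$; the function names `fib` and `min_avl_nodes` are hypothetical, and the range limits only guard against 64-bit overflow.

```c
#include <assert.h>

/* h-th Fibonacci number with F(0) = 0 and F(1) = 1 (assumed convention). */
unsigned long long fib(unsigned h) {
    assert(h <= 93);                        /* F(93) still fits in 64 bits */
    unsigned long long a = 0, b = 1;
    for (unsigned k = 0; k < h; k++) {
        unsigned long long t = a + b;
        a = b;
        b = t;
    }
    return a;
}

/* M(h) computed directly from M(h) = M(h-1) + M(h-2) + 1,
 * with M(0) = 1 and M(1) = 2. */
unsigned long long min_avl_nodes(unsigned h) {
    assert(h <= 90);
    unsigned long long prev = 1, cur = 2;   /* M(0), M(1) */
    if (h == 0) return prev;
    for (unsigned k = 2; k <= h; k++) {
        unsigned long long next = cur + prev + 1;
        prev = cur;
        cur = next;
    }
    return cur;
}
```

    For example, `min_avl_nodes(2)` is 4, matching $F(5) - 1 = 5 - 1$.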

    (c) Denote by $n$ the number of nodes in an AVL tree. Note that $n \ge M(h)$. Give an upper bound for the height of an AVL tree as a function of $n$.

    Solution: We know that $n \ge M(h) \ge \frac{\phi^{h+3}}{\sqrt{5}} - 2$. Therefore, solving for $h$, we get that $h$ is $O(\lg n)$.
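    Solving the inequality for $h$ explicitly gives the following, as a sketch that uses only the bound $n \ge \phi^{h+3}/\sqrt{5} - 2$ from part (b), with $\phi = (1+\sqrt{5})/2$:

```latex
\begin{align*}
n \ge \frac{\phi^{h+3}}{\sqrt{5}} - 2
  &\iff \phi^{h+3} \le \sqrt{5}\,(n + 2) \\
  &\iff h \le \log_\phi\!\bigl(\sqrt{5}\,(n + 2)\bigr) - 3
   \;=\; \frac{\lg\bigl(\sqrt{5}\,(n + 2)\bigr)}{\lg \phi} - 3
   \;=\; O(\lg n).
\end{align*}
```

    Since $1/\lg\phi \approx 1.44$, the height is at most about $1.44 \lg n$ plus a constant.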


    Introduction to Algorithms October 24, 2004

    Massachusetts Institute of Technology 6.046J/18.410J

    Professors Piotr Indyk and Charles E. Leiserson Handout 18

    Problem Set 4 Solutions

    Reading: Chapters 17, 21.1-21.3


    Exercise 4-1. The Ski Rental Problem

    A father decides to start taking his young daughter to go skiing once a week. The daughter may lose interest in the enterprise of skiing at any moment, so the $k$th week of skiing may be the last, for any $k$. Note that $k$ is unknown.

    The father now has to decide how to procure skis for his daughter for every weekly session (until she quits). One can buy skis at a one-time cost of $B$ dollars, or rent skis at a weekly cost of $R$ dollars. (Note that one can buy skis at any time; e.g., rent for two weeks, then buy.)

    Give a 2-competitive algorithm for this problem; that is, give an online algorithm that incurs a total cost of at most twice the offline optimal (i.e., the optimal scheme if $k$ is known).

    Problem 4-1. Queues as Stacks

    Suppose we had code lying around that implemented a stack, and we now wanted to implement a queue. One way to do this is to use two stacks S1 and S2. To insert into our queue, we push into stack S1. To remove from our queue we first check if S2 is empty, and if so, we dump S1 into S2 (that is, we pop each element from S1 and push it immediately onto S2). Then we pop from S2.

    For instance, if we execute INSERT(a), INSERT(b), DELETE(), the results are:

                    S1 = []       S2 = []
        INSERT(a)   S1 = [a]      S2 = []
        INSERT(b)   S1 = [b a]    S2 = []
        DELETE()    S1 = []       S2 = [a b]    dump
                    S1 = []       S2 = [b]      pop (returns a)

    Suppose each push and pop costs 1 unit of work, so that performing a dump when S1 has $n$ elements costs $2n$ units (since we do $n$ pushes and $n$ pops).

    (a) Suppose that (starting from an empty queue) we do 3 insertions, then 2 removals, then 3 more insertions, and then 2 more removals. What is the total cost of these 10 operations, and how many elements are in each stack at the end?

    Solution: The total work is 3 + (6 + 2) + 3 + (1 + 6 + 1) = 22. At the end, S1 has 0 elements, and S2 has 2.
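    The count in part (a) can be reproduced with a small simulation. The sketch below is an illustration, not code from the handout: the `TwoStackQueue` type and its functions are hypothetical names, the stacks are fixed-size arrays, and `cost` tallies one unit per push and per pop as in the problem statement.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_MAX 1024      /* arbitrary limit, so no dynamic allocation */

typedef struct {
    int s1[QUEUE_MAX], s2[QUEUE_MAX];   /* the two stacks */
    size_t n1, n2;                      /* current stack sizes */
    unsigned long cost;                 /* total pushes plus pops so far */
} TwoStackQueue;

void queue_init(TwoStackQueue *q) {
    q->n1 = q->n2 = 0;
    q->cost = 0;
}

/* INSERT: one push onto S1. */
void queue_insert(TwoStackQueue *q, int x) {
    assert(q->n1 < QUEUE_MAX);
    q->s1[q->n1++] = x;
    q->cost += 1;
}

/* DELETE: dump S1 into S2 if S2 is empty, then pop from S2. */
int queue_remove(TwoStackQueue *q) {
    if (q->n2 == 0) {
        while (q->n1 > 0) {
            q->s2[q->n2++] = q->s1[--q->n1];
            q->cost += 2;               /* one pop from S1, one push onto S2 */
        }
    }
    assert(q->n2 > 0);                  /* removing from an empty queue is an error */
    q->cost += 1;                       /* the final pop from S2 */
    return q->s2[--q->n2];
}
```

    Running 3 insertions, 2 removals, 3 insertions and 2 removals through this sketch gives a total cost of 22 and leaves S1 empty with two elements in S2.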

    (b) If a total of $n$ insertions and $n$ removals are done in some order, how large might the running time of one of the operations be (give an exact, non-asymptotic answer)? Give a sequence of operations that induces this behavior, and indicate which operation has the running time you specified.

    Solution: An insertion always takes 1 unit, so our worst-case cost must be caused by a removal. No more than $n$ elements can ever be in S1, and no fewer than 0 elements can be in S2. Therefore the worst-case cost is $2n + 1$: $2n$ units to dump, and one extra to pop from S2. This bound is tight, as seen by the following sequence: perform $n$ insertions, then $n$ removals. The first removal will cause a dump of $n$ elements plus a pop, for $2n + 1$ work.

    (c) Suppose we perform an arbitrary sequence of insertions and removals, starting from an empty queue. What is the amortized cost of each operation? Give as tight (i.e., non-asymptotic) an upper bound as you can. Use the accounting method to prove your answer. That is, charge $x for insertion and $y for deletion. What are x and y? Prove your answer.

    Solution: The tightest amortized upper bounds are 3 units per insertion, and 1 unit per

    removal. We will prove this 2 ways (using the accounting and potential methods; the

    aggregate method seems too weak to employ elegantly in this case). (We would also

    accept valid proofs of 4 units per insertion and 0 per removal, although this answer is

    looser than the one we give here.)


    Here is an analysis using the accounting method: with every insertion we pay $3: $1 is used to push onto S1, and the remaining $2 remain attached to the element just inserted. Therefore every element in S1 has $2 attached to it. With every removal we pay $1, which will (eventually) be used to pop the desired element off of S2. Before that, however, we may need to dump S1 into S2; this involves popping each element off of S1 and pushing it onto S2. We can pay for these pairs of operations with the $2 attached to each element in S1.

    (d) Now we'll analyze the structure using the potential method. For a queue $Q$ implemented as stacks S1 and S2, consider the potential function

    $$\Phi(Q) = \text{number of elements in stack } S_1.$$

    Use this potential function to analyze the amortized cost of insert and delete operations.

    Solution: Let $|S_1^i|$ denote the number of elements in $S_1$ after the $i$th operation. Then the potential function on our structure $Q_i$ (the state of the queue after the $i$th operation) is defined to be $\Phi(Q_i) = 2|S_1^i|$. Note that $|S_1^i| \ge 0$ at all times, so $\Phi(Q_i) \ge 0$. Also, $|S_1^0| = 0$ initially, so $\Phi(Q_0) = 0$ as desired.

    Now we compute the amortized costs: for an insertion, we have $|S_1^{i+1}| = |S_1^i| + 1$, and the actual cost $c_i = 1$, so

    $$\hat{c}_i = c_i + \Phi(Q_{i+1}) - \Phi(Q_i) = 1 + 2(|S_1^i| + 1) - 2|S_1^i| = 3.$$

    For a removal, we have two cases. First, when there is no dump from $S_1$ to $S_2$, the actual cost is 1, and $|S_1^{i+1}| = |S_1^i|$. Therefore $\hat{c}_i = 1$. When there is a dump, the actual cost is $2|S_1^i| + 1$, and we have $|S_1^{i+1}| = 0$. Therefore we get

    $$\hat{c}_i = (2|S_1^i| + 1) + 0 - 2|S_1^i| = 1$$

    as desired.

    Problem 4-2. David Digs Donuts

    Your TA David has two loves in life: (1) roaming around Massachusetts on his forest-green Cannondale R300 road bike, and (2) eating Boston Kreme donuts. One Sunday afternoon, he is biking along Main Street in Acton, and suddenly turns the corner onto Mass Ave. (Yes, that Mass Ave.) His growling stomach announces that it is time for a donut. Because Mass Ave has so many donut shops along it, David decides to find a shop somewhere along that street. He faces two obstacles in his quest to satisfy his hunger: first, he does not know whether the nearest donut shop is to his left or to his right (or how far away the nearest shop is); and second, when he goes riding his contact lenses dry out dramatically, blurring his vision, and he can't see a donut shop until he is directly in front of it.

    You may assume that all donut shops are at an integral distance (in feet) from the starting location.


    (a) Give an efficient (deterministic) algorithm for David to locate a donut shop on Mass Ave as quickly as possible. Your algorithm will be online in the sense that the location of the nearest donut shop is unknown until you actually find the shop. The algorithm should be $O(1)$-competitive: if the nearest donut shop is distance $d$ away from David's starting point, the total distance that David has to bike before he gets his donut should be $O(d)$. (The optimal offline algorithm would require David to bike only distance $d$.)

    Solution: WLOG, let's call the two directions of Mass Ave east and west.

    1. Check for a shop at the origin.

    2. i := 0.

    3. direction := east.

    4. Repeat the following until a donut is found:

        (a) Bike $2^i$ units in direction direction. If you pass a donut shop, stop and eat.

        (b) Bike $2^i$ units back to the origin.

        (c) i := i + 1.

        (d) direction := the opposite of direction.

    Notice that you are back at the origin after every iteration of the loop.
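    The loop above can be simulated to see how far David rides for a given shop position. The following C sketch is an illustration under stated assumptions: the function name `doubling_search` is hypothetical, only the nearest shop is modeled, its position is a signed number of feet (positive east, negative west), and the range check exists only to avoid integer overflow.

```c
#include <assert.h>

/* Total distance biked before reaching a shop at signed position `shop`
 * (positive = east, negative = west, 0 = at the origin). */
long doubling_search(long shop) {
    assert(shop >= -1000000L && shop <= 1000000L);
    long total = 0;
    long reach = 1;         /* 2^i for the current iteration i */
    int direction = +1;     /* +1 = east, -1 = west; start heading east */

    if (shop == 0) return 0;            /* step 1: shop at the origin */
    for (;;) {
        long target = direction * shop; /* distance ahead; negative if behind */
        if (target > 0 && target <= reach)
            return total + target;      /* shop passed on the way out: stop */
        total += 2 * reach;             /* ride out 2^i and back to the origin */
        reach *= 2;                     /* i := i + 1 */
        direction = -direction;         /* reverse direction */
    }
}
```

    For instance, a shop 1 foot east is reached after 1 foot of riding, a shop 1 foot west after 3 feet, and a shop 3 feet east after 9 feet. For every distance $d$ tried in testing, the total stays within $9d$.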

    Suppose that the nearest donut shop is $d$ feet away from the origin. Let $k$ be such that $2^k$


    iteration. Thus the expected travel distance, averaged over the two cases, is at most

    $$(5d + 9d)/2 = 7d.$$


    Introduction to Algorithms October 31, 2004

    Massachusetts Institute of Technology 6.046J/18.410J

    Professors Piotr Indyk and Charles E. Leiserson Handout 21

    Problem Set 5 Solutions

    Reading: Chapters 15, 16


    Exercise 4-1. Do Exercise 15.2-1 on page 338 in CLRS.

    Exercise 4-2. Do exercise 15.3-4 on page 350 in CLRS.

    Exercise 4-3. Do exercise 15.4-4 on page 356 in CLRS and show how to reconstruct the actual

    longest common subsequence.

    Exercise 4-4. Do exercise 16.1-3 on page 379 in CLRS.

    Exercise 4-5. Do exercise 16.3-2 on page 392 in CLRS.

    Problem 4-1. Typesetting

    In this problem you will write a program (real code that runs!!!) to solve the following typesetting

    problem. Because of the trouble you may encounter while programming, we advise you to

    START THIS PROBLEM AS SOON AS POSSIBLE.


    You have an input text consisting of a sequence of $n$ words of lengths $\ell_1, \ell_2, \ldots, \ell_n$, where the length of a word is the number of characters it contains. Your printer can only print with its built-in Courier 10-point fixed-width font set that allows a maximum of $M$ characters per line. (Assume that $\ell_i \le M$ for all $i = 1, \ldots, n$.) When printing words $i$ and $i+1$ on the same line, one space character (blank) must be printed between the two words. Thus, if words $i$ through $j$ are printed on a line, the number of extra space characters at the end of the line (that is, after word $j$) is $M - j + i - \sum_{k=i}^{j} \ell_k$.

    To produce nice-looking output, the heuristic of setting the cost to the square of the number of extra space characters at the end of the line has empirically shown itself to be effective. To avoid the unnecessary penalty for extra spaces on the last line, however, the cost of the last line is 0. In other words, the cost linecost(i, j) for printing words $i$ through $j$ on a line is given by

    $$\mathrm{linecost}(i, j) = \begin{cases} \infty & \text{if words } i \text{ through } j \text{ do not fit into a line,} \\ 0 & \text{if } j = n \text{ (i.e. last line),} \\ \left( M - j + i - \sum_{k=i}^{j} \ell_k \right)^2 & \text{otherwise.} \end{cases}$$

    The total cost for typesetting a paragraph is the sum over all lines in the paragraph of the cost of each line. An optimal solution is an arrangement of the $n$ words into lines in such a way that the total cost is minimized.

    (a) Argue that this problem exhibits optimal substructure.

    Solution: First, notice that linecost(i, j) is defined to be $\infty$ if the words $i$ through $j$ do not fit on a line, to guarantee that no lines in the optimal solution overflow. (This relies on the assumption that the length of each word is not more than $M$.) Second, notice that linecost(i, j) is defined to be 0 when $j = n$, where $n$ is the total number of words; only the actual last line has zero cost, not the recursive last lines of subproblems, which, since they are not the last line overall, have the same cost formula as any other line.

    Consider an optimal solution of printing words 1 through $n$. Let $i$ be the index of the first word printed on the last line of this solution. Then the typesetting of words $1, \ldots, i-1$ must be optimal. Otherwise, we could paste in an optimal typesetting of these words and improve the total cost of the solution, a contradiction. Please notice that the same cut-and-paste argument can be applied if we take $i$ to be the index of the first word printed on the $k$th line, where $2 \le k \le n$. Therefore this problem displays optimal substructure.

    (b) Define recursively the value of an optimal solution.

    Solution: Let $c(j)$ be the optimal cost of printing words 1 through $j$. From part (a), we see that given the optimal $i$ (i.e., the index of the first word printed on the last line of an optimal solution), we have $c(j) = c(i-1) + \mathrm{linecost}(i, j)$. But since we do not know what $i$ is optimal, we need to consider every possible $i$, so our recursive definition of the optimal cost is

    $$c(j) = \min_{1 \le i \le j} \{\, c(i-1) + \mathrm{linecost}(i, j) \,\}.$$

    To accommodate this recursive definition, we define $c(0) = 0$.
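    The recurrence translates almost directly into a bottom-up table. The C sketch below is a minimal illustration under stated assumptions, not the program that accompanies this handout: `typeset_cost` is a hypothetical name, word lengths arrive in a 0-indexed array `len` (so `len[k-1]` is the length of word $k$), `LONG_MAX` stands in for $\infty$, and the early exit from the inner loop is a choice of this sketch.

```c
#include <assert.h>
#include <limits.h>

#define MAX_WORDS 1024      /* arbitrary limit, so no dynamic allocation */

/* Cost of printing words i..j (1-indexed) on one line.
 * prefix[k] is the total length of words 1..k. */
static long linecost(const long prefix[], int n, int M, int i, int j) {
    long extra = M - (j - i) - (prefix[j] - prefix[i - 1]);
    if (extra < 0) return LONG_MAX;     /* words do not fit on a line */
    if (j == n) return 0;               /* last line is free */
    return extra * extra;
}

/* Returns c(n), the minimum total cost of typesetting all n words. */
long typeset_cost(const int len[], int n, int M) {
    long prefix[MAX_WORDS + 1], c[MAX_WORDS + 1];
    assert(n >= 0 && n <= MAX_WORDS);

    prefix[0] = 0;
    for (int k = 1; k <= n; k++) {
        assert(len[k - 1] >= 1 && len[k - 1] <= M);   /* each word fits alone */
        prefix[k] = prefix[k - 1] + len[k - 1];
    }

    c[0] = 0;                           /* base case */
    for (int j = 1; j <= n; j++) {
        c[j] = LONG_MAX;
        for (int i = j; i >= 1; i--) {  /* i = first word on the last line */
            long lc = linecost(prefix, n, M, i, j);
            if (lc == LONG_MAX) break;  /* a smaller i only makes the line longer */
            if (c[i - 1] + lc < c[j]) c[j] = c[i - 1] + lc;
        }
    }
    return c[n];
}
```

    With word lengths 3, 2, 2, 5 and $M = 6$, the sketch returns 10: the lines are word 1 alone (cost 9), words 2 and 3 together (cost 1), and word 4 on the free last line.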

    (c) Describe an efficient algorithm to compute the cost of an optimal solution.

    Solution: We calculate the values of an array for $c$ from index 1 to $n$, which can be done efficiently since each $c(k)$ for $1 \le k$


    (d) requires 5 parts: you should turn in the code you have written, and the output of your program on the two input samples using two values of $M$ (the maximum number of characters per line), namely $M = 72$ and $M = 40$, on each input sample.

    Sample 1 is from A Capsule History of Typesetting by Brown, R.J. Sample 2 is from Out of Their Minds, by Shasha, Lazere. Remember that collaboration, as usual, is allowed to solve problems, but you must write your program by yourself.

    /* NOTE: This is an implementation of the O(nM) algorithm. */

    /* DISCLAIMER: No effort has been made to streamline memory */

    /* management or micro-optimize performance. */

    /* standard header files */

    #include <stdio.h>

    #include <stdlib.h>

    #include <string.h>

    /* arbitrary data size limits, so no dynamic allocation needed */

    #define WORD_NUM 1024 /* arbitrary max for number of input words */

    #define WORD_LENGTH 32 /* arbitrary max for length of input words */

    #define LINE_LENGTH 80 /* arbitrary max for length of output lines */

    /* macros */

    #define max(A, B) ((A) > (B) ? (A) : (B))

    /* global array of words */

    char words[WORD_NUM+1][WORD_LENGTH]; /* array for input words */

    int auxL[WORD_NUM+1]; /* auxiliary array for computing lengths

    of lines - MM */

    /* function prototypes */

    long linecost(int n, int M, int i, int j);

    long dynamic_typeset(int n, int M, int p[]);

    /* main expects two arguments: the input file name and M */

    int main (int argc, char *argv[]) {

    FILE *ifile; /* input file */

    int p[WORD_NUM]; /* array of how to get min costs */

    char lines[WORD_NUM+1][LINE_LENGTH]; /* buffer for output lines */

    int M; /* output line length */


    int n; /* number of input words */

    char read_word[WORD_LENGTH]; /* for use during reading */

    int i, j, k, l; /* aux vars used during construction of solution */

    /* verify arguments */

    if(argc != 3) /* verify number of arguments */

    exit(1);

    if(!(ifile = fopen(argv[1], "r"))) /* open input file */

    exit(2);

    if(!sscanf(argv[2], "%d", &M)) /* get length of output line */

    exit(3);

    /* read input words */

    n = 1;

    while(!feof(ifile)) {

    if(1 == fscanf(ifile, "%s", read_word)) { /* assumes input word fits */

    strcpy(words[n++], read_word);

    if(n == WORD_NUM)

    break; /* no more room for words */

    }

    }

    n--;

    /* fill in auxiliary array of word lengths */

    auxL[0] = 0;

    for(k = 1; k


    /* ... and construct next line */

    }

    while(j != 0); /* just finished first line */

    for(i = l; i > 0; i--) /* output lines in right order */

    printf("%d:[%d]\t%s\n", l-i+1, (int)strlen(lines[i])-1, lines[i]); /* strlen returns size_t; cast to match %d */

    }

    /**** algorithmic part *****/

    /* returns min cost and a min solution in p[] */

    long dynamic_typeset(int n, int M, int p[]) {

    int i, j;

    /* need an extra space for c[0], so c is indexed from 1 to n, */

    /* instead of from 0 to n-1 (like p) */

    long c[WORD_NUM+1];

    c[0] = 0; /* base case */

    for(j = 1; j


    }

    Solutions:

    sample1 72

    COST = 160

    1:[67] The first practical mechanized type casting machine was invented in

    2:[69] 1884 by Ottmar Mergenthaler. His invention was called the "Linotype".

    3:[72] It produced solid lines of text cast from rows of matrices. Each matrice

    4:[70] was a block of metal -- usually brass -- into which an impression of a

    5:[69] letter had been engraved or stamped. The line-composing operation was

    6:[72] done by means of a keyboard similar to a typewriter. A later development

    7:[64] in line composition was the "Teletypewriter". It was invented in

    8:[70] 1913. This machine could be attached directly to a Linotype or similar

    9:[66] machines to control composition by means of a perforated tape. The

    10:[70] tape was punched on a separate keyboard unit. A tape-reader translated

    11:[70] the punched code into electrical signals that could be sent by wire to

    12:[71] tape-punching units in many cities simultaneously. The first major news

    13:[56] event to make use of the Teletypewriter was World War I.

    sample1 40

    COST = 360

    1:[35] The first practical mechanized type

    2:[36] casting machine was invented in 1884

    3:[37] by Ottmar Mergenthaler. His invention

    4:[38] was called the "Linotype". It produced

    5:[37] solid lines of text cast from rows of

    6:[37] matrices. Each matrice was a block of

    7:[36] metal -- usually brass -- into which

    8:[34] an impression of a letter had been

    9:[39] engraved or stamped. The line-composing

    10:[32] operation was done by means of a

    11:[35] keyboard similar to a typewriter. A

    12:[37] later development in line composition

    13:[32] was the "Teletypewriter". It was

    14:[36] invented in 1913. This machine could

    15:[37] be attached directly to a Linotype or

    16:[39] similar machines to control composition

    17:[39] by means of a perforated tape. The tape

    18:[40] was punched on a separate keyboard unit.

    19:[36] A tape-reader translated the punched

    20:[39] code into electrical signals that could

    21:[38] be sent by wire to tape-punching units

    22:[40] in many cities simultaneously. The first

    23:[35] major news event to make use of the

    24:[31] Teletypewriter was World War I.

    sample2 72

    COST = 229

    1:[65] Throughout his life, Knuth had been intrigued by the mechanics of

    2:[70] printing and graphics. As a boy at Wisconsin summer camp in the 1940s,

    3:[71] he wrote a guide to plants and illustrated the flowers with a stylus on


    4:[69] the blue ditto paper that was commonly used in printing at that time.

    5:[71] In college, he recalls admiring the typeface used in his math texbooks.

    6:[71] But he was content to leave the mechanics of designing and setting type

    7:[72] to the experts. "I never thought I would have any control over printing.

    8:[71] Printing was done by typographers, hot lead, scary stuff. Then in 1977,

    9:[71] I learned about new printing machines that print characters made out of

    10:[69] zeros and ones, just bits, no lead. Suddenly, printing was a computer

    11:[71] science problem. I couldn't resist the challenge of developing computer

    12:[66] tools using the new technology with which to write my next books."

    13:[67] Knuth designed and implemented TeX, a computer language for digital

    14:[67] typography. He explored the field of typography with characteristic

    15:[68] thoroughness. For example, he wrote a paper called "The letter S" in

    16:[67] which he dissects the mathematical shape of that letter through the

    17:[67] ages, and explains his several day effort to find the equation that

    18:[33] yields the most pleasing outline.

    sample2 40

    COST = 413

    1:[35] Throughout his life, Knuth had been

    2:[38] intrigued by the mechanics of printing

    3:[35] and graphics. As a boy at Wisconsin

    4:[36] summer camp in the 1940s, he wrote a

    5:[35] guide to plants and illustrated the

    6:[39] flowers with a stylus on the blue ditto

    7:[40] paper that was commonly used in printing

    8:[36] at that time. In college, he recalls

    9:[38] admiring the typeface used in his math

    10:[37] texbooks. But he was content to leave

    11:[38] the mechanics of designing and setting

    12:[37] type to the experts. "I never thought

    13:[39] I would have any control over printing.

    14:[34] Printing was done by typographers,

    15:[36] hot lead, scary stuff. Then in 1977,

    16:[37] I learned about new printing machines

    17:[33] that print characters made out of

    18:[35] zeros and ones, just bits, no lead.

    19:[33] Suddenly, printing was a computer

    20:[38] science problem. I couldn't resist the

    21:[38] challenge of developing computer tools

    22:[38] using the new technology with which to

    23:[36] write my next books." Knuth designed

    24:[40] and implemented TeX, a computer language

    25:[39] for digital typography. He explored the

    26:[39] field of typography with characteristic

    27:[37] thoroughness. For example, he wrote a

    28:[39] paper called "The letter S" in which he

    29:[39] dissects the mathematical shape of that

    30:[37] letter through the ages, and explains

    31:[34] his several day effort to find the

    32:[38] equation that yields the most pleasing

    33:[8] outline.

    Here is what Sample 1 should look like when typeset with M = 50. Feel free to use this output to debug your code.

    The first practical mechanized type casting

    machine was invented in 1884 by Ottmar

    Mergenthaler. His invention was called the

    "Linotype". It produced solid lines of text

    cast from rows of matrices. Each matrice was a

    block of metal -- usually brass -- into which

    an impression of a letter had been engraved or

    stamped. The line-composing operation was done

    by means of a keyboard similar to a typewriter.

    A later development in line composition was

    the "Teletypewriter". It was invented in

    1913. This machine could be attached directly

    to a Linotype or similar machines to control

    composition by means of a perforated tape. The

    tape was punched on a separate keyboard unit.

    A tape-reader translated the punched code into

    electrical signals that could be sent by wire to

    tape-punching units in many cities simultaneously.

    The first major news event to make use of the

    Teletypewriter was World War I.

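    (e) Suppose now that the cost of a line is defined as the number of extra spaces. That is, when words i through j are put into a line, the cost of that line is

    $$
    \mathrm{linecost}(i, j) =
    \begin{cases}
    \infty & \text{if words } i \text{ through } j \text{ do not fit into a line,} \\
    0 & \text{if } j = n \text{ (i.e., last line),} \\
    M - j + i - \sum_{k=i}^{j} l_k & \text{otherwise;}
    \end{cases}
    $$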

    and that the total cost is still the sum over all lines in the paragraph of the cost of each

    line. Describe an efficient algorithm that finds an optimal solution in this case.

    Solution: We use a straightforward greedy algorithm, which puts as many words as

    possible on each line before going to the next line. Such an algorithm runs in linear time.

    Now we show that any optimal solution has the same cost as the solution obtained by

    this greedy algorithm. Consider some optimal solution. If this solution is the same as

    the greedy solution, then we are done. If it is different, then there is some line i which has enough space left over for the first word of the next line. In this case, we move

    the first word of line i + 1 to the end of line i. This does not change the total cost, since if the length of the word moved is l, then the reduction to the cost of line i will

    be l + 1, for the word and the space before it, and the increase of the cost of line i + 1 will also be l + 1, for the word and the space after it. (If the moved word was the only word on line i + 1, then by moving it to the previous line the total cost is reduced, a contradiction to the supposition that we have an optimal solution.) As long as there

    are lines with enough extra space, we can keep moving the first words of the next lines

    back without changing the total cost. When there are no longer any such lines, we will

    have changed our optimal solution into the greedy solution without affecting the total

    cost. Therefore, the greedy solution is an optimal solution.
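
The greedy rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `greedy_lines` and `extra_space_cost` are hypothetical, words are separated by exactly one space, and the last line is charged no cost, as in the definition of linecost for part (e).

```python
def greedy_lines(words, M):
    """Pack as many words as possible on each line of width M."""
    lines, current, used = [], [], 0
    for w in words:
        if len(w) > M:
            raise ValueError(f"word {w!r} is longer than the line width {M}")
        # A word needs one separating space unless it starts the line.
        needed = len(w) if not current else used + 1 + len(w)
        if needed > M:
            lines.append(current)          # close the full line
            current, used = [w], len(w)    # the word starts a new line
        else:
            current.append(w)
            used = needed
    if current:
        lines.append(current)
    return lines


def extra_space_cost(lines, M):
    """Sum of unused columns over every line except the last."""
    return sum(M - (sum(map(len, ln)) + len(ln) - 1) for ln in lines[:-1])
```

Each word is examined once, which matches the linear running time claimed for the greedy algorithm.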

    Problem 4-2. Manhattan Channel Routing

    A problem that arises during the design of integrated-circuit chips is to hook components together

    with wires. In this problem, we'll investigate a simple such problem.

    In Manhattan routing, wires run on one of two layers of an integrated circuit: vertical wires run on layer 1, and horizontal wires run on layer 2. The height h is the number of horizontal tracks used. Wherever a horizontal wire needs to be connected to a vertical wire, a via connects them.

    Figure 1 illustrates several pins (electrical terminals) that are connected in this fashion. As can be

    seen in the figure, all wires run on an underlying grid, and all the pins are collinear.

    In our problem, the goal is to connect up a given set of pairs of pins using the minimum number

    of horizontal tracks. For example, the number of horizontal tracks used in the routing channel of

    Figure 1 is 3, but fewer might be sufficient.

    Let $L = \{(p_1, q_1), (p_2, q_2), \ldots, (p_n, q_n)\}$ be a list of pairs of pins, where no pin appears more than once. The problem is to find the fewest number of horizontal tracks to connect each pair. For example, the routing problem corresponding to Figure 1 can be specified as the set {(1, 3), (2, 5), (4, 6), (8, 9)}.

    (a) What is the minimum number of horizontal tracks needed to solve the routing problem

    in Figure 1?

    Solution: You can verify that the wire connecting pins 4 and 6 could be at the same

    height as the wire connecting pins 1 and 3, making the number of horizontal tracks

    needed 2. Note that this is the minimum possible. Otherwise the wire connecting pins

    2 and 5 and the wire connecting pins 1 and 3 would be on the same track, violating the

    problem specifications.

    (b) Give an efficient algorithm to solve a given routing problem having n pairs of pins using the minimum possible number of horizontal tracks. As always, argue correctness

    (your algorithm indeed minimizes the number of horizontal tracks), and analyze the

    running time.

    Algorithm description

    The following algorithm routes pin pairs greedily into available horizontal tracks in

    order of the smaller pin of a pair.

    [Figure 1 omitted: a routing channel over pins 1 through 9, drawn with h = 3.]

    Figure 1: Pins are shown as circles. Vertical wires are shown as solid. Horizontal wires are dashed.

    Vias are shown as squares.

    1. Go through L and, if $p_i > q_i$, swap them. For each pair, we call the smaller pin the start and the larger pin the end. Sort the start and end pins of the pairs in increasing

    order. The resulting list contains 2n values.

    2. Place all available horizontal tracks in a stack S.

    3. Go through the list in sorted order.

       - If the current pin is a start pin, pop the first available horizontal track from S and route the pair in that horizontal track. If S is empty, then it is not possible to route all of the pin pairs using the given number of horizontal tracks. Report an error in this case.

       - If the current pin is an end pin, look up in which horizontal track its pair has been routed and push that horizontal track back onto S.

    Correctness

    Suppose that the algorithm terminates with a routing requiring m horizontal tracks. Let k denote the first pair of pins routed in the mth horizontal track. Let $s_k$ denote the start pin and $f_k$ denote the end pin of this pair. Let $f_l$ be the earliest end pin

    appearing in the sorted list after $s_k$. Necessarily, $f_l > s_k$. The routing currently occupying each of the $m - 1$ horizontal tracks already in use starts before $s_k$. Each of these routings terminates at or after $f_l$. Thus there are m routings that all span the interval $[s_k, f_l]$, i.e., any routing must use at least m horizontal tracks. Thus the routing returned by the algorithm is optimal.

    The above argument shows that given an infinite supply of horizontal tracks, our algorithm

    will always produce a routing that uses the fewest number of horizontal tracks.

    Thus, if the algorithm terminates with an error, it means that the given number of horizontal

    tracks is less than the number of horizontal tracks in an optimal routing, and

    hence it is impossible to route all the pin pairs.

    Analysis

    This algorithm runs in O(n lg n) time because it is necessary to sort 2n items (which can be accomplished using heapsort or mergesort). Notice that scanning through the list

    and assigning horizontal tracks takes O(1) time per connection, for a total of O(2n) = O(n) time.

    (This is also known as the interval-graph coloring problem. We can create an interval

    graph whose vertices are the given pairs of pins and whose edges connect incompatible

    pairs of pins. The smallest number of colors required to color every vertex so that

    no two adjacent vertices are given the same color corresponds to finding the fewest

    number of horizontal tracks needed to connect all of the pairs of pins.)
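
As a rough sketch of the three steps of the algorithm, the Python below sorts the 2n pin events and keeps the free tracks on a stack. The function name `route`, the 0-based track numbers, and the default of n available tracks are assumptions made for the illustration, not part of the handout.

```python
def route(pairs, num_tracks=None):
    """Return (tracks_used, track assigned to each pair, in input order)."""
    # Step 1: normalize each pair so that the start pin is the smaller one.
    pairs = [(min(p, q), max(p, q)) for p, q in pairs]
    if num_tracks is None:
        num_tracks = len(pairs)            # n tracks always suffice
    # Step 2: stack of free tracks, with track 0 on top.
    free = list(range(num_tracks - 1, -1, -1))
    events = []
    for idx, (start, end) in enumerate(pairs):
        events.append((start, 0, idx))     # 0 marks a start pin
        events.append((end, 1, idx))       # 1 marks an end pin
    events.sort()                          # pins are distinct, so no ties
    track_of, used = {}, 0
    # Step 3: scan the pins in sorted order.
    for _, is_end, idx in events:
        if is_end:
            free.append(track_of[idx])     # the track becomes available again
        else:
            if not free:
                raise ValueError("not enough horizontal tracks")
            track_of[idx] = free.pop()
            used = max(used, track_of[idx] + 1)
    return used, [track_of[i] for i in range(len(pairs))]
```

On the instance of Figure 1, `route([(1, 3), (2, 5), (4, 6), (8, 9)])` uses two tracks, in agreement with part (a).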

    Introduction to Algorithms November 15, 2004

    Massachusetts Institute of Technology 6.046J/18.410J

    Professors Piotr Indyk and Charles E. Leiserson Handout 26

    Problem Set 6 Solutions

    Reading: Chapters 22, 24, and 25.

    Both exercises and problems should be solved, but only the problems should be turned in. Exercises are intended to help you master the course material. Even though you should not turn in

    the exercise solutions, you are responsible for material covered in the exercises.

    Mark the top of each sheet with your name, the course number, the problem number, your

    recitation section, the date and the names of any students with whom you collaborated.

    You will often be called upon to give an algorithm to solve a certain problem. Your write-up

    should take the form of a short essay. A topic paragraph should summarize the problem you are

    solving and what your results are. The body of the essay should provide the following:

    1. A description of the algorithm in English and, if helpful, pseudo-code.

    2. At least one worked example or diagram to show more precisely how your algorithm works.

    3. A proof (or indication) of the correctness of the algorithm.

    4. An analysis of the running time of the algorithm.

    Remember, your goal is to communicate. Full credit will be given only to correct algorithms

    which are described clearly. Convoluted and obtuse descriptions will receive low marks.

    Exercise 6-1. Do Exercise 22.2-5 on page 539 in CLRS.

    Exercise 6-2. Do Exercise 22.4-3 on page 552 in CLRS.

    Exercise 6-3. Do Exercise 22.5-7 on page 557 in CLRS.

    Exercise 6-4. Do Exercise 24.1-3 on page 591 in CLRS.

    Exercise 6-5. Do Exercise 24.3-2 on page 600 in CLRS.

    Exercise 6-6. Do Exercise 24.4-8 on page 606 in CLRS.

    Exercise 6-7. Do Exercise 25.2-6 on page 635 in CLRS.

    Exercise 6-8. Do Exercise 25.3-5 on page 640 in CLRS.

    Problem 6-1. Truckin'

    Professor Almanac is consulting for a trucking company. Highways are modeled as a directed

    graph G = (V, E) in which vertices represent cities and edges represent roads. The company is planning new routes from San Diego (vertex s) to Toledo (vertex t).

    (a) It is very costly when a shipment is delayed en route. The company has calculated the

    probability $p(e) \in [0, 1]$ that a given road $e \in E$ will close without warning. Give an

    efficient algorithm for finding a route with the minimum probability of encountering

    a closed road. You should assume that all road closings are independent.

    Solution:

    To simplify the solution, we use the probability $q(e) = 1 - p(e)$ that a road will be open. Further, we remove from the graph roads with $p(e) = 1$, as they are guaranteed to be closed and will never be included in a meaningful solution. (Following this

    transformation, we can use depth first search to ensure that some path from s to t

    has a positive probability of being open.) By eliminating $p(e) = 1$, we now have

    $0 < q(e) \le 1$ for all edges $e \in E$. It is important to have eliminated the possibility of $q(e) = 0$, because we will be taking the logarithm of this quantity later.

    Because the road closings are independent, the probability that a given path will be

    open is the product of the probabilities of the edges being open. That is, for each path

    $r = \langle e_1, e_2, \ldots, e_n \rangle$, the probability $Q(r)$ of the path being open is:

    $$Q(r) = \prod_{i=1}^{n} q(e_i)$$

    Our goal is to find the path r, beginning at s and ending at t, that maximizes Q(r). Taking the negative logarithm of both sides yields:

    $$-\lg Q(r) = -\lg \prod_{i=1}^{n} q(e_i) = \sum_{i=1}^{n} -\lg q(e_i)$$

    Using $w(e_i)$ to denote the quantity $-\lg q(e_i)$, this becomes:

    $$-\lg Q(r) = \sum_{i=1}^{n} w(e_i)$$

    The right hand side is a sum of edge weights w(e) along the path from s to t. Because $0 < q(e) \le 1$, every weight w(e) is nonnegative, so we can minimize this quantity using Dijkstra's algorithm for single-source shortest paths.

    Doing so will yield the path r that minimizes $-\lg Q(r)$, thereby maximizing Q(r).

    This path will have the maximum probability of being open, and thus the minimum

    probability of being closed.

    The running time is O(E + V lg V). We only spend $\Theta(E)$ to remove edges with $p(e) = 1$ and to update the edge weights; Dijkstra's algorithm dominates with a runtime of O(E + V lg V).
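
The reduction to shortest paths can be sketched as follows. This is a minimal illustration, not the handout's code: the name `most_reliable_path` and the `(u, v, p)` edge format are assumptions, and Python's `heapq` is a binary heap, so this version runs in O(E lg V) rather than the O(E + V lg V) bound quoted above for a Fibonacci heap.

```python
import heapq
import math

def most_reliable_path(edges, s, t):
    """edges: iterable of (u, v, p), p = probability that road u -> v closes.
    Returns (probability that the best path is open, path), or (0.0, None)."""
    adj = {}
    for u, v, p in edges:
        if p < 1.0:                        # drop roads that are certainly closed
            adj.setdefault(u, []).append((v, -math.log2(1.0 - p)))
    dist = {s: 0.0}                        # dist[v] = smallest -lg Q found so far
    parent = {s: None}
    heap = [(0.0, s)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == t:
            break
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    if t not in dist:
        return 0.0, None
    path, node = [], t
    while node is not None:                # walk the parent pointers back to s
        path.append(node)
        node = parent[node]
    return 2.0 ** (-dist[t]), path[::-1]
```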

    Alternate Solution:

    It is also possible to modify Dijkstra's algorithm to directly compute the path with the highest probability of being open. As above, let $q(e) = 1 - p(e)$ denote the probability that a given road $e \in E$ will be open, and let Q(r) denote the probability that a given path r will be open. For a given vertex v, let o[v] denote the maximum value of Q(r) over all paths r from s to v. Then, make the following modifications to Dijkstra's

    algorithm:

    1. Change INITIALIZE-SINGLE-SOURCE to assign o[s] = 1 for the source vertex and o[v] = 0 for all other vertices:

        INITIALIZE-SINGLE-SOURCE(G, s)
            for each vertex v ∈ V[G]
                do o[v] ← 0
                   π[v] ← NIL
            o[s] ← 1

    That is, we can reach the source vertex with probability 1, and the probability of reaching all other vertices will increase monotonically from 0 using RELAX.

    2. Instead of EXTRACT-MIN, use EXTRACT-MAX (and a supporting data structure)

    to see which vertex to visit. That is, first explore paths with the highest probability

    of being open.

    3. Rewrite the RELAX step as follows:

        RELAX(u, v, q)
            if o[v] < o[u] · q(e), where e = (u, v)
                then o[v] ← o[u] · q(e)
                     π[v] ← u

    That is, if a vertex v can be reached with a higher probability than before along

    the edge under consideration, then increase the probability o[v]. Because the probabilities of roads being open are independent, the probability of a path being

    open is the product of the probabilities of each edge being open.

    The argument for correctness parallels that of Dijkstra's algorithm, as presented in

    lecture. The proof relies on the following properties:

    1. Optimal substructure. A sub-path of a path with the highest probability of being

    open must also be a path with the highest probability of being open. Otherwise we

    could increase the overall probability of being open by increasing the probability

    along this sub-path (cut-and-paste).

    2. Triangle inequality. Let $\delta(u, v)$ denote the highest probability of a path from u to v being open. Then for all $u, v, x \in V$, we have $\delta(u, v) \ge \delta(u, x) \cdot \delta(x, v)$. Otherwise $\delta(u, v)$ could be increased if we chose the path through x.

    3. Well-definedness of shortest paths. Since $q(e) \in [0, 1]$, the probability of a path being open can only decrease as extra edges are added to a path. Since we are

    exploring the path with the highest probability of being open, this ensures that

    there are no analogues of negative-weight cycles.

    Properties (1) and (2) are true whenever an associative operator is used to combine

    edge weights into a path weight. In Dijkstra's shortest path algorithm, the operator is

    addition; here it is multiplication.

    The running time is O(E + V lg V). We spend $\Theta(E)$ to calculate $q(e) = 1 - p(e)$, and Dijkstra's algorithm runs in O(E + V lg V).
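
The three modifications of the alternate solution can be illustrated with the sketch below. The name `max_open_probability` and the `(u, v, q)` edge format are assumptions; EXTRACT-MAX is simulated by storing negated keys in Python's `heapq` min-heap, which gives O(E lg V) rather than the Fibonacci-heap bound.

```python
import heapq

def max_open_probability(edges, s):
    """edges: iterable of (u, v, q), q = probability that road u -> v is open.
    Returns a dict o with o[v] = highest open probability over s -> v paths;
    vertices that cannot be reached are absent (their value would be 0)."""
    adj = {}
    for u, v, q in edges:
        adj.setdefault(u, []).append((v, q))
    o = {s: 1.0}                           # the source is reached with probability 1
    heap = [(-1.0, s)]                     # negated keys emulate EXTRACT-MAX
    done = set()
    while heap:
        _, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, q in adj.get(u, []):
            cand = o[u] * q                # RELAX multiplies instead of adding
            if cand > o.get(v, 0.0):
                o[v] = cand
                heapq.heappush(heap, (-cand, v))
    return o
```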

    (b) Many highways are off-limits for trucks that weigh more than a given threshold. For a

    given highway $e \in E$, let $w(e) \in \mathbb{R}^+$ denote the weight limit and let $l(e) \in \mathbb{R}^+$ denote the highway's length. Give an efficient algorithm that calculates: 1) the heaviest truck

    that can be sent from s to t, and 2) the shortest path this truck can take.

    Solution:

    First, we modify Dijkstra's algorithm to find the heaviest truck that can be sent from

    s to t. The weight limit w(e) is used as the edge weight for e. There are three modifications to the algorithm:

    1. In INITIALIZE-SINGLE-SOURCE, assign a value of $\infty$ to the source vertex and a value of 0 to all other vertices.

    2. Instead of EXTRACT-MIN, use EXTRACT-MAX (and a supporting data structure)

    to see which vertex to visit. That is, first explore those paths which support the

    heaviest trucks.

    3. In the RELAX step, use min in place of addition. That is, maintain the minimum

    weight limit encountered on a given path instead of the total path length from the

    source.

    As in Part (a), the proof of correctness follows that of Dijkstra's algorithm. Since the

    min operator is associative, the optimal paths exhibit optimal substructure and support the triangle inequality. There are no analogues of negative-weight cycles because the

    weight supported by a path can only decrease as the path becomes longer (and we are

    searching for the heaviest weight possible).

    Given the weight of the heaviest truck that can pass from s to t, we can find the shortest

    path as follows. Simply remove all edges from the graph whose weight limit is less than the weight of the heaviest truck. Then, run Dijkstra's algorithm (unmodified), using the lengths l(e) as edge weights, to find the shortest

    path.

    The overall runtime of our algorithms is that of Dijkstra's algorithm: O(E + V lg V).
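
The two phases of part (b) might look like the sketch below. The names `heaviest_truck` and `shortest_route_for` and the `(u, v, limit, length)` edge format are assumptions made for the example, and `heapq` is again used in place of a Fibonacci heap.

```python
import heapq
import math

def heaviest_truck(edges, s, t):
    """Largest weight that some s -> t path supports (0 if t is unreachable)."""
    adj = {}
    for u, v, limit, _ in edges:
        adj.setdefault(u, []).append((v, limit))
    cap = {s: math.inf}                    # the source supports any weight
    heap = [(-math.inf, s)]                # negated keys emulate EXTRACT-MAX
    done = set()
    while heap:
        _, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, limit in adj.get(u, []):
            cand = min(cap[u], limit)      # RELAX uses min instead of addition
            if cand > cap.get(v, 0):
                cap[v] = cand
                heapq.heappush(heap, (-cand, v))
    return cap.get(t, 0)


def shortest_route_for(edges, s, t, weight):
    """Unmodified Dijkstra on lengths, using only edges with limit >= weight."""
    adj = {}
    for u, v, limit, length in edges:
        if limit >= weight:
            adj.setdefault(u, []).append((v, length))
    dist = {s: 0}
    heap = [(0, s)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, length in adj.get(u, []):
            if d + length < dist.get(v, math.inf):
                dist[v] = d + length
                heapq.heappush(heap, (d + length, v))
    return dist.get(t, math.inf)
```

Calling `heaviest_truck` first and passing its result as `weight` to `shortest_route_for` reproduces the two steps in the order the solution gives them.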

    (c) Consider a variant of (b) in which trucks must make strictly eastward progress with

    each city they visit. Adjust your algorithm to exploit this property and analyze the

    runtime.

    Solution:

    Remove from the graph any edges that do not make eastward progress. Because

    we always go eastward, there are no cycles in this graph. Thus, we can use DAG-SHORTEST-PATHS

    to solve the problem in $\Theta(V + E)$ time. We need only modify the INITIALIZE-SINGLE-SOURCE and RELAX procedures as in (b).
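
A possible sketch of the acyclic variant follows. It assumes the non-eastward edges have already been removed; the name `heaviest_truck_dag` and the `(u, v, limit)` edge format are hypothetical, and Kahn's algorithm is used here as one standard way to obtain the topological order that DAG-SHORTEST-PATHS requires.

```python
import math
from collections import deque

def heaviest_truck_dag(vertices, edges, s, t):
    """vertices: iterable of cities; edges: (u, v, limit) triples that all point
    strictly eastward. Returns the largest weight some s -> t path supports
    (0 if t is unreachable from s)."""
    adj = {v: [] for v in vertices}
    indeg = {v: 0 for v in adj}
    for u, v, limit in edges:
        adj[u].append((v, limit))
        indeg[v] += 1
    # Kahn's algorithm produces a topological order in O(V + E).
    queue = deque(v for v in adj if indeg[v] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    cap = {v: 0 for v in adj}              # INITIALIZE as in part (b)
    cap[s] = math.inf
    for u in order:                        # relax every edge once, in order
        for v, limit in adj[u]:
            cap[v] = max(cap[v], min(cap[u], limit))
    return cap[t]
```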

    Problem 6-2. Constructing Construction Schedules

    Consider a set of n jobs to be completed during the construction of a new office building. For

    each $i \in \{1, 2, \ldots, n\}$, a schedule assigns a time $x_i \ge 0$ for job i to be started. There are some constraints on the schedule:

    1. For each $i, j \in \{1, 2, \ldots, n\}$, we denote by $A[i, j]$ the minimum latency from the start of job i to the start of job j. For example, since it takes a day for concrete to dry, construction

    of the walls must begin at least one day after pouring the foundation. The constraint on the

    schedule is:

    $$\forall i, j \in \{1, 2, \ldots, n\} : x_i + A[i, j] \le x_j \tag{1}$$

    If there is no minimum latency between jobs i and j, then $A[i, j] = -\infty$.

    2. For each $i, j \in \{1, 2, \ldots, n\}$, we denote by $B[i, j]$ the maximum latency from the start of job i to the start of job j. For example, weatherproofing must be added no later than one

    week after an exterior wall is erected. The constraint on the schedule is:

    $$\forall i, j \in \{1, 2, \ldots, n\} : x_i + B[i, j] \ge x_j \tag{2}$$

    If there is no maximum latency between jobs i and j, then $B[i, j] = \infty$.

    (a) Show how to model the latency constraints as a set of linear difference equations. That

    is, given A[1 . . n, 1 . . n] and B[1 . . n, 1 . . n], construct a matrix C[1 . . n, 1 . . n] such that the following constraints are equivalent to Equations (1) and (2):

    $$\forall i, j \in \{1, 2, \ldots, n\} : x_i - x_j \le C[i, j] \tag{3}$$

    Solution:

    Re-arranging Equation (1) yields:

    $$\forall i, j \in \{1, 2, \ldots, n\} : x_i - x_j \le -A[i, j] \tag{4}$$

    Re-arranging Equation (2) yields:

    $$\forall i, j \in \{1, 2, \ldots, n\} : x_i - x_j \ge -B[i, j]$$

    $$\forall i, j \in \{1, 2, \ldots, n\} : x_j - x_i \le B[i, j]$$

    $$\forall i, j \in \{1, 2, \ldots, n\} : x_i - x_j \le B[j, i] \tag{5}$$

    Equations (4) and (5) are equivalent to Equation (3) if we set:

    $$\forall i, j \in \{1, 2, \ldots, n\} : C[i, j] = \min(-A[i, j], B[j, i])$$
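
As a small illustration of this construction, the sketch below builds C from A and B. It assumes 0-based nested lists and uses `-math.inf` and `math.inf` for missing latencies; the function name `build_constraint_matrix` is hypothetical.

```python
import math

def build_constraint_matrix(A, B):
    """A[i][j]: minimum latency from job i to job j (-inf if there is none).
    B[i][j]: maximum latency from job i to job j (+inf if there is none).
    Returns C with C[i][j] = min(-A[i][j], B[j][i])."""
    n = len(A)
    return [[min(-A[i][j], B[j][i]) for j in range(n)] for i in range(n)]
```

For two jobs where job 1 must start at least 1 and at most 7 time units after job 0, the result constrains x_0 - x_1 <= -1 and x_1 - x_0 <= 7, and leaves the diagonal entries unconstrained.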

    (b) Show that the Bellman-Ford algorithm, when run on the constraint graph corresponding

    to Equation (3), minimizes the quantity $(\max\{x_i\} - \min\{x_i\})$ subject to Equation (3) and the constraint $x_i \le 0$ for all $x_i$.

    Solution:

    Recall that the Bellman-Ford algorithm operates on a graph in which each constraint

    $x_j - x_i \le C[i, j]$ is translated to an edge from vertex $v_i$ to vertex $v_j$ with weight $w_{ij} = C[i, j]$. Also, an extra vertex s is added with a 0-weight edge from s to each

    vertex $v \in V$. If Bellman-Ford detects a negative-weight cycle in this graph, then

    the constraints are unsatisfiable. We thus focus on the case in which there are no

    negative-weight cycles. The proof takes the form of a lemma and a theorem.
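
The use of Bellman-Ford on a constraint graph can be sketched as below. The sketch follows the edge direction stated in the paragraph above and takes constraints as generic triples; the name `solve_difference_constraints`, the 0-based indices, and the `(i, j, c)` format are assumptions, and n is assumed to be at least 1.

```python
def solve_difference_constraints(n, constraints):
    """constraints: list of (i, j, c) meaning x[j] - x[i] <= c, 0 <= i, j < n.
    Returns a list x that satisfies every constraint with all x[k] <= 0,
    or None if the constraints are unsatisfiable."""
    # Starting every x[k] at 0 plays the role of the extra source vertex s
    # with a 0-weight edge to each vertex.
    x = [0] * n
    for _ in range(n):
        changed = False
        for i, j, c in constraints:
            if x[i] + c < x[j]:            # relax the edge v_i -> v_j
                x[j] = x[i] + c
                changed = True
        if not changed:
            return x
    return None    # still relaxing after n passes: negative-weight cycle
```

Stopping as soon as a pass changes nothing is an optimization over the textbook loop; it does not affect the values returned.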

    Lemma 1 When Bellman-Ford is run on the constraint graph, max{xi} = 0.

    Proof. Let $p = \langle s, v_1, \ldots, v_k \rangle$ be the shortest path from vertex s to vertex $v_k$ as reported by Bellman-Ford when run over the constraint graph. By the optimal substructure

    of shortest paths, $\langle s, v_1 \rangle$ must be the shortest path from s to $v_1$. By construction, the edge from s to $v_1$ has a weight of 0. Thus $x_1 = \delta(s, v_1) = w(s, v_1) = 0$. In

    combination with the constraint $x_i \le 0$ for all $x_i$, this implies that $\max_i x_i = 0$.

    Theorem 2 When Bellman-Ford is run on the constraint graph, it minimizes the quantity $(\max\{x_i\} - \min\{x_i\})$.

    Proof. Since $\max\{x_i\} = 0$ (by Lemma 1), it suffices to show that Bellman-Ford maximizes $\min\{x_i\}$. Let $x_k = \min\{x_i\}$ in the solution produced by Bellman-Ford, and consider the shortest path $p = \langle v_0, v_1, \ldots, v_k \rangle$ from $s = v_0$ to $v_k$. The weight of this path is

    $$w(p) = \sum_{i=0}^{k-1} w_{i(i+1)} = w(v_0, v_1) + \sum_{i=1}^{k-1} C[i, i+1] = \sum_{i=1}^{k-1} C[i, i+1].$$

    The path corresponds to the following set of constraints:

    $$
    \begin{aligned}
    x_1 - x_0 &\le 0 \\
    x_2 - x_1 &\le C[1, 2] \\
    x_3 - x_2 &\le C[2, 3] \\
    &\;\;\vdots \\
    x_k - x_{k-1} &\le C[k-1, k]
    \end{aligned}
    $$
