Solutions of RDBMS
7/21/2019 Solutions of RDBMS
Introduction to Algorithms September 24, 2004
Massachusetts Institute of Technology 6.046J/18.410J
Professors Piotr Indyk and Charles E. Leiserson Handout 7
Problem Set 1 Solutions
Exercise 1-1. Do Exercise 2.3-7 on page 37 in CLRS.
Solution:
The following algorithm solves the problem:
1. Sort the elements in S using merge sort.
2. Remove the last element from S. Let y be the value of the removed element.
3. If S is nonempty, look for z = x − y in S using binary search.
4. If S contains such an element z, then STOP, since we have found y and z such that x = y + z. Otherwise, repeat Step 2.
5. If S is empty, then no two elements in S sum to x.
Notice that when we consider an element yᵢ of S during the ith iteration, we don't need to look at the elements that have already been considered in previous iterations. Suppose there exists yⱼ ∈ S such that x = yᵢ + yⱼ. If j < i, i.e., if yⱼ has been reached prior to yᵢ, then we would have found yᵢ when we were searching for x − yⱼ during the jth iteration, and the algorithm would have terminated then.
Step 1 takes Θ(n lg n) time. Step 2 takes O(1) time. Step 3 requires at most lg n time. Steps 2–4 are repeated at most n times. Thus, the total running time of this algorithm is Θ(n lg n). We can do a more precise analysis if we notice that Step 3 actually requires Θ(lg(n − i)) time at the ith iteration. However, if we evaluate
$$\sum_{i=1}^{n-1} \lg(n-i) = \lg((n-1)!),$$
we get a quantity that is Θ(n lg n). So the total running time is still Θ(n lg n).
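The five steps above can be sketched in Python as follows. This is an illustrative sketch rather than the handout's code: the function name `has_pair_with_sum` is hypothetical, Python's built-in `sorted` (an O(n lg n) sort) stands in for merge sort, and instead of physically deleting elements from S the sketch shrinks an index bound `hi`.

```python
from bisect import bisect_left


def has_pair_with_sum(s, x):
    """Return a pair (y, z) of elements of s with y + z == x, or None."""
    a = sorted(s)                      # Step 1: sort (built-in sort instead of merge sort)
    hi = len(a)                        # a[0:hi] holds the elements not yet removed
    while hi > 1:
        y = a[hi - 1]                  # Step 2: remove the last (largest) element
        hi -= 1
        z = x - y
        k = bisect_left(a, z, 0, hi)   # Step 3: binary search a[0:hi] for z
        if k < hi and a[k] == z:
            return (y, z)              # Step 4: found y + z == x
    return None                        # Step 5: no two elements sum to x


print(has_pair_with_sum([8, 1, 5, 3], 9))
```

On the sample input the sketch removes 8 first, finds 1 with a single binary search, and prints `(8, 1)`.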
Exercise 1-2. Do Exercise 3.1-3 on page 50 in CLRS.
Exercise 1-3. Do Exercise 3.2-6 on page 57 in CLRS.
Exercise 1-4. Do Problem 3-2 on page 58 of CLRS.
Problem 1-1. Properties of Asymptotic Notation
Prove or disprove each of the following properties related to asymptotic notation. In each of the
following assume that f, g, and h are asymptotically nonnegative functions.
(a) f(n) = O(g(n)) and g(n) = O(f(n)) implies that f(n) = Θ(g(n)).
Solution:
This statement is true.
Since f(n) = O(g(n)), there exist n₀ and c > 0 such that for all n ≥ n₀, f(n) ≤ c·g(n). Similarly, since g(n) = O(f(n)), there exist n₀′ and c′ > 0 such that for all n ≥ n₀′, g(n) ≤ c′·f(n). Therefore, for all n ≥ max(n₀, n₀′),
$$\frac{1}{c'}\,g(n) \le f(n) \le c\,g(n).$$
Hence, f(n) = Θ(g(n)).
(b) f(n)+g(n)=(max(f(n), g(n))).
Solution:
This Statement is True.
For all n1, f(n)max(f(n), g(n))and g(n)max(f(n), g(n)). Therefore:
f(n)+g(n)max(f(n), g(n))+max(f(n), g(n))2max(f(n), g(n))
and so f(n)+g(n) =O(max(f(n), g(n))). Additionally, for each n, either f(n)max(f(n), g(n))or else g(n)max(f(n), g(n)). Therefore, for all n1, f(n)+g(n)max(f(n), g(n))and so f(n)+g(n)=(max(f(n), g(n))). Thus, f(n)+g(n)=(max(f(n), g(n))).
(c) Transitivity: f(n) = O(g(n)) and g(n) = O(h(n)) implies that f(n) = O(h(n)).
Solution:
This statement is true.
Since f(n) = O(g(n)), there exist n₀ and c > 0 such that for all n ≥ n₀, f(n) ≤ c·g(n). Similarly, since g(n) = O(h(n)), there exist n₀′ and c′ > 0 such that for all n ≥ n₀′, g(n) ≤ c′·h(n). Therefore, for all n ≥ max(n₀, n₀′), f(n) ≤ c·c′·h(n). Hence, f(n) = O(h(n)).
(d) f(n) = O(g(n)) implies that h(f(n)) = O(h(g(n))).
Solution:
This statement is false.
We disprove this statement by giving a counterexample. Let f(n) = 3n, g(n) = n, and h(n) = 2ⁿ. Then f(n) = O(g(n)), but h(f(n)) = 8ⁿ and h(g(n)) = 2ⁿ. Since 8ⁿ is not O(2ⁿ) (the ratio 8ⁿ/2ⁿ = 4ⁿ is unbounded), this choice of f, g, and h is a counterexample that disproves the statement.
(e) f(n) + o(f(n)) = Θ(f(n)).
Solution:
This statement is true.
Let h(n) = o(f(n)). We prove that f(n) + h(n) = Θ(f(n)). By the definition of o-notation, h(n) ≥ 0 for all sufficiently large n, so f(n) + h(n) ≥ f(n) for those n, and therefore f(n) + h(n) = Ω(f(n)). Since h(n) = o(f(n)), there exists an n₀ such that for all n > n₀, h(n) ≤ f(n). Therefore, for all n > n₀, f(n) + h(n) ≤ 2f(n), and so f(n) + h(n) = O(f(n)). Thus, f(n) + h(n) = Θ(f(n)).
(f) f(n) ≠ o(g(n)) and g(n) ≠ o(f(n)) implies f(n) = Θ(g(n)).
Solution:
This statement is false.
We disprove this statement by giving a counterexample. Consider f(n) = 1 + cos(πn) and g(n) = 1 − cos(πn). For all even values of n, f(n) = 2 and g(n) = 0, and there does not exist a c₁ for which f(n) ≤ c₁·g(n). Thus, f(n) is not o(g(n)), because if there does not exist a c₁ for which f(n) ≤ c₁·g(n), then it cannot be the case that for any c₁ > 0 and sufficiently large n, f(n) < c₁·g(n).
For all odd values of n, f(n) = 0 and g(n) = 2, and there does not exist a c for which g(n) ≤ c·f(n). By the same reasoning, g(n) is not o(f(n)). Moreover, there does not exist a c₂ > 0 for which c₂·g(n) ≤ f(n), because we could set c = 1/c₂ if such a c₂ existed.
We have shown that there do not exist constants c₁ > 0 and c₂ > 0 such that c₂·g(n) ≤ f(n) ≤ c₁·g(n). Thus, f(n) is not Θ(g(n)).
Problem 1-2. Computing Fibonacci Numbers
The Fibonacci numbers are defined on page 56 of CLRS as
$$F_0 = 0, \qquad F_1 = 1, \qquad F_n = F_{n-1} + F_{n-2} \text{ for } n \ge 2.$$
In Exercise 1-3 of this problem set, you showed that the nth Fibonacci number is
$$F_n = \frac{\phi^n - \hat{\phi}^n}{\sqrt{5}},$$
where φ is the golden ratio and φ̂ is its conjugate.
A fellow 6.046 student comes to you with the following simple recursive algorithm for computing
the nth Fibonacci number.
FIB(n)
1 if n = 0
2   then return 0
3 elseif n = 1
4   then return 1
5 return FIB(n − 1) + FIB(n − 2)
This algorithm is correct, since it directly implements the definition of the Fibonacci numbers.
Let's analyze its running time. Let T(n) be the worst-case running time of FIB(n).¹
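(a) Give a recurrence for T(n), and use the substitution method to show that T(n) = O(Fₙ).
Solution: The recurrence is T(n) = T(n − 1) + T(n − 2) + 1. We use the substitution method, inducting on n. Our induction hypothesis is T(n) ≤ cFₙ − b.
To prove the inductive step:
$$T(n) \le (cF_{n-1} - b) + (cF_{n-2} - b) + 1 = cF_n - 2b + 1.$$
Therefore, T(n) ≤ cFₙ − b provided that b ≥ 1. We choose b = 2 and c = 10.
For the base cases, consider n ∈ {1, 2} and note that the running time is no more than 10 · 1 − 2 = 8 (for example, charging one unit per call gives T(1) = 1 and T(2) = 3). The case n = 0 cannot serve as a base case, because F₀ = 0 makes cF₀ − b negative; this does not affect the asymptotic bound.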
(b) Similarly, show that T(n) = Ω(Fₙ), and hence, that T(n) = Θ(Fₙ).
Solution: Again the recurrence is T(n) = T(n − 1) + T(n − 2) + 1. We use the substitution method, inducting on n. Our induction hypothesis is T(n) ≥ Fₙ.
To prove the inductive step:
$$T(n) \ge F_{n-1} + F_{n-2} + 1 = F_n + 1 \ge F_n.$$
Therefore, T(n) ≥ Fₙ. For the base case, consider n ∈ {0, 1} and note that the running time is no less than 1 ≥ Fₙ.
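A small Python experiment, not part of the handout, makes both bounds concrete. It assumes each call to FIB costs exactly one unit, so T(0) = T(1) = 1 and T(n) = T(n − 1) + T(n − 2) + 1; under that model T(n) works out to 2Fₙ₊₁ − 1, which lies between Fₙ and 10Fₙ − 2 for every n ≥ 1. The helper names `fib_naive`, `fib_iter`, and `naive_cost` are hypothetical.

```python
def fib_naive(n, counter):
    """Direct transcription of FIB(n); counter[0] is incremented once per call,
    modeling the unit cost charged per invocation in the recurrence."""
    counter[0] += 1
    if n == 0:
        return 0
    if n == 1:
        return 1
    return fib_naive(n - 1, counter) + fib_naive(n - 2, counter)


def fib_iter(n):
    """Linear-time reference value of F_n."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def naive_cost(n):
    """T(n) under the model T(0) = T(1) = 1, T(n) = T(n-1) + T(n-2) + 1."""
    counter = [0]
    fib_naive(n, counter)
    return counter[0]


for n in range(1, 21):
    f, t = fib_iter(n), naive_cost(n)
    assert f <= t <= 10 * f - 2   # the bounds proved in parts (a) and (b)
```

The loop starts at n = 1 because, as noted in part (a), the upper bound cFₙ − b is negative at n = 0.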
¹ In this problem, please assume that all operations take unit time. In reality, the time it takes to add two numbers depends on the number of bits in the numbers being added (more precisely, on the number of memory words). However, for the purpose of this problem, the approximation of unit-time addition will suffice.
Professor Grigori Potemkin has recently published an improved algorithm for computing the nth Fibonacci number which uses a cleverly constructed loop to get rid of one of the recursive calls.
Professor Potemkin has staked his reputation on this new algorithm, and his tenure committee has
asked you to review his algorithm.
FIB′(n)
1 if n = 0
2   then return 0
3 elseif n = 1
4   then return 1
5 sum ← 1
6 for k ← 1 to n − 2
7   do sum ← sum + FIB′(k)
8 return sum
Since it is not at all clear that this algorithm actually computes the nth Fibonacci number, let's prove that the algorithm is correct. We'll prove this by induction over n, using a loop invariant in the inductive step of the proof.
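Before the proof, a quick empirical check can build confidence. The Python sketch below transcribes Professor Potemkin's algorithm (the name `fib_prime` is hypothetical) and compares it with a linear-time reference. The two agree because of the identity F₁ + F₂ + ⋯ + Fₘ = Fₘ₊₂ − 1, which gives 1 + (F₁ + ⋯ + Fₙ₋₂) = Fₙ. A passing check is evidence, not a proof; the induction is still needed.

```python
def fib_prime(n):
    """Transcription of Professor Potemkin's loop-based algorithm."""
    if n == 0:
        return 0
    if n == 1:
        return 1
    total = 1                      # sum <- 1
    for k in range(1, n - 1):      # k = 1, ..., n - 2
        total += fib_prime(k)      # sum <- sum + FIB'(k)
    return total


def fib(n):
    """Linear-time reference value of F_n."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


print([fib_prime(n) for n in range(10)])
print(all(fib_prime(n) == fib(n) for n in range(20)))
```

The first line prints `[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]`, and the second prints `True`.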
(c) State the induction hypothesis and the base case of your correctness proof.
Solution: To prove the algorithm is correct, we induct on n. Our induction hypothesis is that for all n
(e) Use your loop invariant to complete the inductive step of your correctness proof.
Solution: To complete the inductive step of our correctness proof, we must show that
if FIB′(n) returns Fₙ for all n
Solution:
We can use this idea to recursively multiply polynomials of degree n − 1, where n is a power of 2, as follows: Let p(x) and q(x) be polynomials of degree n − 1, and divide each into the upper n/2 and lower n/2 terms:
$$p(x) = a(x)x^{n/2} + b(x),$$
$$q(x) = c(x)x^{n/2} + d(x),$$
where a(x), b(x), c(x), and d(x) are polynomials of degree n/2 − 1. The polynomial product is then
$$p(x)q(x) = (a(x)x^{n/2} + b(x))(c(x)x^{n/2} + d(x)) = a(x)c(x)x^{n} + (a(x)d(x) + b(x)c(x))x^{n/2} + b(x)d(x).$$
The four polynomial products a(x)c(x), a(x)d(x), b(x)c(x), and b(x)d(x) are computed recursively.
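The four-product recursion can be sketched in Python on coefficient lists stored lowest degree first, a representation assumed here because the handout does not fix one. The sketch assumes both inputs have the same length n, a power of 2, and returns the 2n − 1 coefficients of the product; `poly_mul_4` is a hypothetical name.

```python
def poly_mul_4(p, q):
    """Multiply polynomials given as coefficient lists (lowest degree first)
    using four recursive products. Assumes len(p) == len(q) == n, n a power
    of 2; returns the 2n - 1 coefficients of p(x) * q(x)."""
    n = len(p)
    if n == 1:
        return [p[0] * q[0]]
    h = n // 2
    b, a = p[:h], p[h:]          # p(x) = a(x) x^h + b(x)
    d, c = q[:h], q[h:]          # q(x) = c(x) x^h + d(x)
    ac = poly_mul_4(a, c)
    ad = poly_mul_4(a, d)
    bc = poly_mul_4(b, c)
    bd = poly_mul_4(b, d)
    result = [0] * (2 * n - 1)
    for i in range(2 * h - 1):   # each sub-product has 2h - 1 = n - 1 terms
        result[i] += bd[i]                  # b(x) d(x)
        result[i + h] += ad[i] + bc[i]      # (a d + b c) x^h
        result[i + n] += ac[i]              # a(x) c(x) x^n
    return result


print(poly_mul_4([1, 2], [3, 4]))   # (1 + 2x)(3 + 4x)
```

This prints `[3, 10, 8]`, i.e., 3 + 10x + 8x². The splitting and the combining loop are both linear, which is the Θ(n) term in the recurrence.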
(b) Give and solve a recurrence for the worst-case running time of your algorithm.
Solution:
Since we can perform the dividing and combining of polynomials in Θ(n) time, recursive polynomial multiplication gives us a running time of
$$T(n) = 4T(n/2) + \Theta(n) = \Theta(n^2).$$
(c) Show how to multiply two linear polynomials A(x) = a₁x + a₀ and B(x) = b₁x + b₀ using only three coefficient multiplications.
Solution:
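Write A(x) = ax + b and B(x) = cx + d (that is, a = a₁, b = a₀, c = b₁, and d = b₀). We can use the following 3 multiplications:
$$m_1 = (a + b)(c + d) = ac + ad + bc + bd,$$
$$m_2 = ac,$$
$$m_3 = bd,$$
so the polynomial product is
$$(ax + b)(cx + d) = m_2 x^2 + (m_1 - m_2 - m_3)x + m_3.$$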
(d) Give a divide-and-conquer algorithm for multiplying two polynomials of degree-bound n based on your formula from part (c).
Solution:
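The algorithm is the same as in part (a), except that we need only compute three products of polynomials of degree-bound n/2 to get the polynomial product: a(x)c(x), b(x)d(x), and (a(x) + b(x))(c(x) + d(x)). As in part (c), the middle term a(x)d(x) + b(x)c(x) is obtained by subtracting the first two products from the third.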
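Applying part (c)'s identity to the polynomial halves gives the technique commonly known as Karatsuba multiplication. The Python sketch below keeps the same assumptions as a four-product version would (coefficient lists lowest degree first, equal lengths that are a power of 2); `poly_mul_3` is a hypothetical name.

```python
def poly_mul_3(p, q):
    """Multiply coefficient lists (lowest degree first) with three recursive
    products per level. Assumes len(p) == len(q) == n, n a power of 2;
    returns the 2n - 1 coefficients of p(x) * q(x)."""
    n = len(p)
    if n == 1:
        return [p[0] * q[0]]
    h = n // 2
    b, a = p[:h], p[h:]                            # p(x) = a(x) x^h + b(x)
    d, c = q[:h], q[h:]                            # q(x) = c(x) x^h + d(x)
    m2 = poly_mul_3(a, c)                          # a(x) c(x)
    m3 = poly_mul_3(b, d)                          # b(x) d(x)
    m1 = poly_mul_3([u + v for u, v in zip(a, b)],
                    [u + v for u, v in zip(c, d)])  # (a + b)(c + d)
    result = [0] * (2 * n - 1)
    for i in range(2 * h - 1):
        result[i] += m3[i]
        result[i + h] += m1[i] - m2[i] - m3[i]     # a d + b c, as in part (c)
        result[i + n] += m2[i]
    return result


print(poly_mul_3([1, 2], [3, 4]))   # (1 + 2x)(3 + 4x)
```

This also prints `[3, 10, 8]`; the only change from the four-product scheme is that the middle term costs one recursive call instead of two, which is where the 3 in T(n) = 3T(n/2) + Θ(n) comes from.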
(e) Give and solve a recurrence for the worst-case running time of your algorithm.
Solution:
Similar to part (b):
$$T(n) = 3T(n/2) + \Theta(n) = \Theta(n^{\lg 3}) = O(n^{1.585}).$$
Alternative solution: Instead of breaking a polynomial p(x) into two smaller polynomials a(x) and b(x) such that p(x) = a(x) + x^(n/2)·b(x), as we did above, we could do the following:
Collect all the even powers of p(x) and substitute y = x² to create the polynomial a(y). Then collect all the odd powers of p(x), factor out x, and substitute y = x² to create the second polynomial b(y). Then we can see that
$$p(x) = a(y) + x\,b(y).$$
Both a(y) and b(y) are polynomials of (roughly) half the original size and degree, and we can proceed with our multiplications in a way analogous to what was done above.
Notice that, at each level k, we need to compute yₖ = yₖ₋₁² (where y₀ = x), which takes Θ(1) time per level and does not affect the asymptotic running time.
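The even/odd split can be illustrated with a short Python sketch on coefficient lists stored lowest degree first (an assumed representation; the names `even_odd_split` and `evaluate` are hypothetical). Slicing with a stride of 2 collects the even-power and odd-power coefficients, and evaluating both halves at y = x² reconstructs p(x).

```python
def even_odd_split(p):
    """Split coefficients of p (lowest degree first) into a and b with
    p(x) = a(x**2) + x * b(x**2): a gets the even powers, b the odd ones."""
    return p[0::2], p[1::2]


def evaluate(coeffs, x):
    """Horner's rule: sum(coeffs[i] * x**i)."""
    result = 0
    for coef in reversed(coeffs):
        result = result * x + coef
    return result


p = [1, 2, 3, 4, 5]            # 1 + 2x + 3x^2 + 4x^3 + 5x^4
a, b = even_odd_split(p)       # a(y) = 1 + 3y + 5y^2, b(y) = 2 + 4y
print(a, b)
```

This prints `[1, 3, 5] [2, 4]`, and for any x, `evaluate(p, x)` equals `evaluate(a, x * x) + x * evaluate(b, x * x)`.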
Introduction to Algorithms October 1, 2004
Massachusetts Institute of Technology 6.046J/18.410J
Professors Piotr Indyk and Charles E. Leiserson Handout 9
Problem Set 2 Solutions
Reading: Chapters 5-9, excluding 5.4 and 8.4
Both exercises and problems should be solved, but only the problems should be turned in.
Exercises are intended to help you master the course material. Even though you should not turn in
the exercise solutions, you are responsible for material covered in the exercises.
Mark the top of each sheet with your name, the course number, the problem number, your
recitation section, the date and the names of any students with whom you collaborated.
You will often be called upon to give an algorithm to solve a certain problem. Your write-up
should take the form of a short essay. A topic paragraph should summarize the problem you are
solving and what your results are. The body of the essay should provide the following:
1. A description of the algorithm in English and, if helpful, pseudo-code.
2. At least one worked example or diagram to show more precisely how your algorithm works.
3. A proof (or indication) of the correctness of the algorithm.
4. An analysis of the running time of the algorithm.
Remember, your goal is to communicate. Full credit will be given only to correct algorithms
which are described clearly. Convoluted and obtuse descriptions will receive low marks.
Exercise 2-1. Do Exercise 5.2-4 on page 98 in CLRS.
Exercise 2-2. Do Exercise 8.2-3 on page 170 in CLRS.
Problem 2-1. Randomized Evaluation of Polynomials
In this problem, we consider testing the equivalence of two polynomials in a finite field.
A field is a set of elements for which there are addition and multiplication operations that satisfy commutativity, associativity, and distributivity. A field must have an additive and a multiplicative identity, every element must have an additive inverse, and every nonzero element must have a multiplicative inverse. Examples of fields include the real and rational numbers.
A finite field has a finite number of elements. In this problem, we consider the field of integers modulo a prime p. That is, we consider two integers a and b to be equal if and only if they have the same remainder when divided by p, in which case we write a ≡ b mod p. This field, which we denote as Z/p, has p elements, {0, . . . , p − 1}.
Consider a polynomial in the field Z/p:
$$a(x) = \sum_{i=0}^{n} a_i x^i \bmod p \qquad (1)$$
A root or zero of a polynomial is a value of x for which a(x) = 0. The following theorem describes the number of zeros for a polynomial of degree n.
Theorem 1 A polynomial a(x) of degree n has at most n distinct zeros.
Polly the Parrot is a very smart bird that likes to play math games. Today, Polly is thinking of a
polynomial a(x) over the field Z/p. Though Polly will not tell you the coefficients of a(x), she will happily evaluate a(x) for any x of your choosing. She challenges you to figure out whether or not a is equivalent to zero (that is, whether a(x) ≡ 0 mod p for all x ∈ {0, . . . , p − 1}).
Throughout this problem, assume that a(x) has degree n, where n
The problem thus becomes: if a is not equivalent to zero, choose k such that the probability that all k queries evaluate to zero is no more than 1%. Let ε denote the margin of error in the general case (ε = 1% in this part), and let Qᵢ be a random variable indicating the result of the ith query. The constraint is as follows:
$$\Pr[Q_1 = 0 \text{ and } Q_2 = 0 \text{ and } \ldots \text{ and } Q_k = 0] = \Pr[Q_1 = 0]\,\Pr[Q_2 = 0]\cdots\Pr[Q_k = 0] \le (n/p)^k$$
The first step follows from the fact that all of the queries are independent. The second step utilizes the bound from Part (a). Solving for k, we have:
$$(n/p)^k \le \epsilon$$
$$k \lg(n/p) \le \lg \epsilon$$
$$k \ge \lg \epsilon / \lg(n/p)$$
The last step above utilizes the assumption that n
Returns whether a(x) ≡ b(x)·c(x) mod p for all x ∈ {0, . . . , p − 1}
Correct with probability at least 1 − ε
EQUIV(a[0 . . n], b[0 . . n/2], c[0 . . n/2], p, ε)
1 k ← ⌈lg ε / lg(n/p)⌉
2 for i ← 1 to k
3   do x ← RANDOM(0, p − 1)
4     a ← a(x)
5     b ← b(x)
6     c ← c(x)
7     if a − b·c ≢ 0 (mod p)
8       then return false
9 return true
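A minimal Python sketch of EQUIV, assuming p is prime, 1 ≤ n < p, and polynomials are coefficient lists stored lowest degree first. The names `evaluate` and `equiv` and the optional `rng` parameter (for reproducible runs) are hypothetical; the random choice uses Python's standard `random` module, and each evaluation uses Horner's rule as the running-time analysis assumes.

```python
import math
import random


def evaluate(coeffs, x, p):
    """Evaluate sum(coeffs[i] * x**i) mod p by Horner's rule (Theta(n) steps)."""
    result = 0
    for coef in reversed(coeffs):
        result = (result * x + coef) % p
    return result


def equiv(a, b, c, p, eps, rng=random):
    """Randomized test of a(x) == b(x) * c(x) (mod p) for every x in Z/p.

    Assumes p is prime and 1 <= n < p, where n = len(a) - 1. A true identity
    is always accepted; a false one is accepted with probability at most eps.
    """
    n = len(a) - 1
    k = max(1, math.ceil(math.log2(eps) / math.log2(n / p)))  # line 1 of EQUIV
    for _ in range(k):
        x = rng.randrange(p)                     # uniform x in {0, ..., p - 1}
        lhs = evaluate(a, x, p)
        rhs = evaluate(b, x, p) * evaluate(c, x, p)
        if (lhs - rhs) % p != 0:
            return False                         # witness found: not equivalent
    return True


print(equiv([6, 5, 1], [2, 1], [3, 1], 1009, 1e-6))   # (2 + x)(3 + x)
```

Because 6 + 5x + x² really equals (2 + x)(3 + x), no query can find a witness and the call prints `True`. The one-sided error is the design point: a `False` answer is always correct, and only a `True` answer carries the ε risk.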
Correctness. For a given value of x, a(x) ≡ b(x)c(x) if and only if a(x) − b(x)c(x) ≡ 0. Thus, a(x) is equivalent to b(x)c(x) if and only if a(x) − b(x)c(x) is equivalent to zero. Our solution to Part (b) shows how to determine with probability at least 1 − ε whether or not a given polynomial is equivalent to zero. Using this same procedure, we test whether or not a(x) − b(x)c(x), which has degree at most n because b and c have degree at most n/2, is equivalent to zero, thereby determining whether or not a(x) is equivalent to b(x)c(x).
Running time. We count the number of arithmetic operations in EQUIV. In this class, we assume that steps 1 and 7 are Θ(1), as they are arithmetic operations on scalar values; also, we assume that the call to RANDOM on line 3 is Θ(1). Each polynomial evaluation on lines 4–6 is Θ(n), as an n-degree polynomial can be evaluated in Θ(n) time using Horner's rule. The loop runs k = ⌈lg ε / lg(n/p)⌉ times, performing Θ(n) work on each iteration. The total running time is thus T(n) = Θ(n lg ε / lg(n/p)).
Consider Potemkin's proposal, with a running time P(n) = Θ(n^(log₂ 3)), and let us evaluate the conditions under which T(n) = O(P(n)). Note that, in the running time of our algorithm, we cannot consider p to be fixed as n grows, since we require that n < p. If p = cn for some constant c > 1 and ε is a fixed constant, then lg ε / lg(n/p) = lg ε / lg(1/c) = Θ(1). Thus, the loop executes Θ(1) times and the algorithm has a running time of Θ(n), which is asymptotically faster than Potemkin's proposal.
On the other hand, if p = n + 1 while ε remains fixed, then lg ε / lg(n/p) = lg ε / lg(n/(n + 1)) = lg ε / (lg n − lg(n + 1)). Intuitively, one can see that this is Θ(n): because d/dn (ln n) = 1/n, we have ln(n + 1) − ln n ≈ 1/n, and since lg ε / (lg n − lg(n + 1)) equals ln ε / (ln n − ln(n + 1)), it is approximately ln ε / (−1/n) = n ln(1/ε) = Θ(n). Thus, the loop executes Θ(n) times and the algorithm has a running time of Θ(n²), which is asymptotically slower than Potemkin's proposal.
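We can prove more rigorously that lg ε / (lg n − lg(n + 1)) = Θ(n) by appealing to the following identity [CLRS, p. 53]:
$$\lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n = e.$$
Then, using the definition of limit, there exist positive constants c₁, c₂, and n₀ (with c₁ > 1/e, so that ln(c₁e) > 0) such that for all n > n₀:
$$c_1 e \le \left(1 + \frac{1}{n}\right)^n \le c_2 e$$
$$\ln(c_1 e) \le n \ln\left(1 + \frac{1}{n}\right) \le \ln(c_2 e) \qquad \text{[take natural log]}$$
$$\ln(c_1 e) \le n \ln\frac{n+1}{n} \le \ln(c_2 e) \qquad \text{[simplify]}$$
$$\ln(c_1 e) \le n\,(\ln(n+1) - \ln n) \le \ln(c_2 e) \qquad \text{[simplify]}$$
$$\frac{\ln(c_1 e)}{n} \le \ln(n+1) - \ln n \le \frac{\ln(c_2 e)}{n} \qquad \text{[divide by } n\text{]}$$
$$\frac{n}{\ln(c_1 e)} \ge \frac{1}{\ln(n+1) - \ln n} \ge \frac{n}{\ln(c_2 e)} \qquad \text{[take inverse]}$$
$$\frac{n}{\ln(c_1 e)} \ge \frac{-1}{\ln n - \ln(n+1)} \ge \frac{n}{\ln(c_2 e)} \qquad \text{[simplify]}$$
Thus, −1/(ln n − ln(n + 1)) = Θ(n), because it is bounded above and below by a constant factor times n. Since lg ε / (lg n − lg(n + 1)) = (−ln ε)·(−1/(ln n − ln(n + 1))) and −ln ε is a positive constant for fixed ε < 1, adjusting the constants gives lg ε / (lg n − lg(n + 1)) = Θ(n).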
Finally, we point out the desirable property that, for fixed values of p and n, the running time grows only logarithmically in 1/ε: decreasing the error margin by a given factor results in only an additive increase in the running time.
Problem 2-2. Distributed Median
Alice has an array A[1..n], and Bob has an array B[1..n]. All elements in A and B are distinct.
Alice and Bob are interested in finding the median element of their combined arrays. That is, they want to determine which element m satisfies the following property:
$$|\{i \in [1, n] : A[i] \le m\}| + |\{i \in [1, n] : B[i] \le m\}| = n \qquad (3)$$
This equation says that there are a total of n elements in both A and B that are less than or equal to m. Note that m might be drawn from either A or B.
Because Alice and Bob live in different cities, they have limited communication bandwidth. They
can send each other one integer at a time, where the value either falls within {0, . . . , n} or is drawn
from the original A or B arrays. Each numeric transmission counts as a communication between
Alice and Bob. One goal of this problem is to minimize the number of communications needed to
compute the median.
(a) Give a deterministic algorithm for computing the combined median of A and B. Your
algorithm should run in O(n log n) time and use O(log n) communications. (Hint: consider sorting.)
Solution:
The algorithm works as follows. Alice and Bob begin by sorting their arrays using a deterministic Θ(n log n) algorithm such as HeapSort or MergeSort. Then, Alice assumes the role of the master and Bob the role of the slave. Alice considers an element A[i] and sends n − i to Bob, who returns two elements: B[n − i] and B[n − i + 1]. Because A is sorted, A[i] is the combined median if and only if there are exactly n − i elements in B that are less than A[i]. Because B is sorted, this condition is reduced to checking whether or not B[n − i]
ALICE(A[1 . . n])
1 HEAPSORT(A)
2 median ← MASTER(A)
3 if median = NIL
4   then median ← SLAVE(A)
5 return median

BOB(B[1 . . n])
1 HEAPSORT(B)
2 median ← SLAVE(B)
3 if median = NIL
4   then median ← MASTER(B)
5 return median

MASTER(M[1 . . n])
1 lower ← 1
2 upper ← n
3 median ← NIL
4 while lower ≤ upper and median = NIL
5   do i ← lower + ⌊(upper − lower)/2⌋
6     send n − i
7     receive b1
8     receive b2
9     cur ← M[i]
10    if b1

SLAVE(S[1 . . n])
1 while true
2   do receive j
3     if j = DONE
4       then receive median
5         return median
As before, Alice and Bob begin by sorting their arrays using a deterministic Θ(n log n) algorithm such as HeapSort or MergeSort. Then, Alice assumes the role of the master and Bob the role of the slave. When Alice sends a value A[i] to Bob, Bob returns the number of elements, count(A[i]), in his array that are less than A[i]. Because A is sorted, the element A[i] is the combined median if and only if i + count(A[i]) = n. Alice checks this condition and returns A[i] as the median if the condition holds. If the condition fails, then she proceeds to do a binary search within her array. The search is on i, with an initial range of [1, n]. On each step, she descends into the top half of the range if i + count(A[i]) < n, and into the bottom half if i + count(A[i]) > n. Because the quantity i + count(A[i]) is a monotonic function of i, the search terminates with i + count(A[i]) = n if the combined median is stored within A.
If the combined median is not held in A, then Alice's binary search terminates after at most ⌊lg n⌋ + 1 steps and returns a value of NIL. In this case, Alice and Bob swap roles, with Bob becoming the master and Alice the slave. The procedure is repeated, and this time the binary search returns the combined median because it must be stored within B.
For clarity, pseudocode for this algorithm is given below.
ALICE(A[1 . . n])
1 HEAPSORT(A)
2 median ← MASTER(A)
3 if median = NIL
4   then median ← SLAVE(A)
5 return median

BOB(B[1 . . n])
1 HEAPSORT(B)
2 median ← SLAVE(B)
3 if median = NIL
4   then median ← MASTER(B)
5 return median

MASTER(M[1 . . n])
1 lower ← 1
2 upper ← n
3 median ← NIL
4 while lower ≤ upper and median = NIL
5   do i ← lower + ⌊(upper − lower)/2⌋
6     send M[i]
7     receive count
8     if i + count = n
9       then median ← M[i]
10    elseif i + count

SLAVE(S[1 . . n])
1 while true
2   do receive val
3     if val = DONE
4       then receive median
5         return median
6       else send |{i ∈ [1, n] : S[i] ≤ val}|
Running Time. All but three statements are Θ(1) time. Both Alice and Bob call HeapSort, which is Θ(n lg n). Line 6 of SLAVE counts how many elements in S are at most val. This can be implemented in Θ(n) time with a brute-force comparison or, because the array is sorted, in Θ(lg n) time using a binary search. The last statements of interest are lines 6–7 of MASTER, which wait for one iteration of SLAVE. Since the slave executes Θ(lg n) operations between a receive and send statement, lines 6–7 of MASTER are also Θ(lg n).
It remains to account for the loops. The loop in MASTER is performing a binary search, which (as we saw in lecture) requires Θ(lg n) iterations. Each iteration does Θ(lg n) work, so the total running time for the loop is Θ(lg² n). The loop in SLAVE terminates when it receives a DONE value, which happens exactly when the loop in MASTER terminates; thus, SLAVE is also Θ(lg² n). Alice and Bob each execute MASTER, SLAVE, and HeapSort; HeapSort dominates, yielding a final running time of Θ(n lg n).
Communication cost. Most of the communication is in the loop of MASTER, in which two items are relayed between Alice and Bob per iteration. Since this loop executes Θ(lg n) times, it contributes Θ(lg n) communications. The items sent and received at the end of MASTER contribute Θ(1) communications, leaving the total at Θ(lg n).
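To see the count-based protocol end to end, here is a single-process Python simulation; the names `combined_median` and `master` are hypothetical, and both parties run in one program rather than communicating. The slave's answer on line 6 of SLAVE is computed by binary search with `bisect_left`, one of the two options mentioned above; because all elements are distinct, counting elements less than val gives the same result as counting elements at most val. The `messages` counter tallies two transmissions per probe (A[i] out, the count back) and ignores the final DONE/median handshake.

```python
from bisect import bisect_left


def combined_median(A, B):
    """Return (median, messages) for equal-length arrays of distinct values,
    where median is the n-th smallest of the 2n combined elements."""
    n = len(A)
    assert len(B) == n
    A, B = sorted(A), sorted(B)
    messages = 0

    def master(M, S):
        nonlocal messages
        lower, upper = 1, n
        while lower <= upper:
            i = lower + (upper - lower) // 2
            val = M[i - 1]                 # 1-based M[i]
            count = bisect_left(S, val)    # slave's reply: |{S[j] < val}|
            messages += 2                  # send val, receive count
            if i + count == n:
                return val
            elif i + count < n:
                lower = i + 1              # too few elements below: go up
            else:
                upper = i - 1              # too many elements below: go down
        return None                        # median is not in M

    median = master(A, B)                  # Alice as master
    if median is None:
        median = master(B, A)              # swap roles: Bob as master
    return median, messages


print(combined_median([4, 5, 6], [1, 2, 3]))
```

Here the median 3 lives in Bob's array, so Alice's search fails after two probes and Bob's succeeds after two more; the call prints `(3, 8)`.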
(b) Give a randomized algorithm for computing the combined median of A and B. Your
algorithm should run in expected O(n) time and use expected O(log n) communications. (Hint: consider RANDOMIZED-SELECT.)
Solution:
The algorithm is almost identical to Part (a). As before, Alice starts as the master and conducts a binary search through Bob's elements, looking for A[i] such that B[n − i]
MASTER(M[1 . . n])
1 lower ← 1
2 upper ← n
3 median ← NIL
4 while lower ≤ upper and median = NIL
5   do i ← lower + ⌊(upper − lower)/2⌋
6     send n − i
7     send n − upper
8     send n − lower
9     receive b1
10    receive b2
11    cur ← RANDOMIZED-SELECT(M, lower, upper, i − lower + 1)
12    if b1
For the inductive step, assume I is true on the current iteration. Then the call to RANDOMIZED-SELECT will partition around the ith smallest element in M, because 1) (by the inductive hypothesis) the smallest element in M[lower . . upper] is the lower-th smallest element in M, 2) by our call to RANDOMIZED-SELECT, we are selecting for the (i − lower)th smallest element in M[lower . . upper] (we also add 1 to compensate for the 1-based array indexing), and 3) lower + (i − lower) = i. Thus, the hypothesis will be satisfied for the ranges {lower, . . . , i} and {i, . . . , upper}, because the PARTITION subroutine of RANDOMIZED-SELECT will place elements on the appropriate side of the pivot i. Finally, the inductive hypothesis will hold on the next iteration because we assign either lower or upper to be adjacent to i (but excluding i from the next range).
Using the invariant, we conclude that the call to RANDOMIZED-SELECT on line 11 of MASTER returns the ith smallest element in M, which is equivalent to the expression M[i] from Part (a).
It remains to show the equivalent property for lines 10 and 11 of SLAVE. This is done using the same loop invariant, but translating j = n − i, bottom = n − upper, and top = n − lower across the call between MASTER and SLAVE. In this way, we can show that lines 10 and 11 of SLAVE return the (n − i)th and (n − i + 1)th smallest elements of S, respectively.
We have shown that our changes from Part (a) preserve the behavior of the algorithm,
and thus the algorithm remains correct.
Running Time. We can write a recurrence to model the running time of the main loop in MASTER. Let m = upper − lower. On each iteration, m decreases to at most m/2, and RANDOMIZED-SELECT runs three times (once in MASTER, twice in SLAVE) over a segment of size m, with expected running time Θ(m). Thus E[T(m)] = E[T(m/2)] + Θ(m), and by Case 3 of the Master Theorem, E[T(m)] = Θ(m). Finally, noting that m = n at the beginning of the procedure, we have that the expected running time is Θ(n).
Communication cost. The communication cost is identical to Part (a), as the loop in MASTER still executes Θ(lg n) iterations and sends Θ(1) items on each iteration. Thus, the total number of communications is Θ(lg n). (Note that this algorithm gives a deterministic bound on the number of communications.)
Problem 2-3. American Gladiator
You are consulting for a game show in which n contestants are pitted against n gladiators in order to see which contestants are the best. The game show aims to rank the contestants in order of strength;
this is done via a series of 1-on-1 matches between contestants and gladiators. If the contestant is
stronger than the gladiator, then the contestant wins the match; otherwise, the gladiator wins the
match. If the contestant and gladiator have equal strength, then they are perfect equals and a
tie is declared. We assume that each contestant is the perfect equal of exactly one gladiator, and
each gladiator is the perfect equal of exactly one contestant. However, as the gladiators sometimes
change from one show to another, we do not know the ordering of strength among the gladiators.
The game show currently uses a round-robin format in which Θ(n²) matches are played and contestants are ranked according to their number of victories. Since few contestants can happily endure
Θ(n) gladiator confrontations, you are called in to optimize the procedure.
(a) Give a randomized algorithm for ranking the contestants. Using your algorithm, the
expected number of matches should be O(n log n).
Solution:
The problem statement does not describe exactly how the contestants and gladiators are specified, so we first need to come up with a reasonable representation for the input. Let's assume the contestants and gladiators are provided to us in two arrays C[1 . . n] and G[1 . . n], where we are allowed to compare elements across, but not within, these two arrays.
We use a divide-and-conquer algorithm very similar to randomized quicksort. The algorithm first performs a partition operation as follows: pick a random contestant C[i]. Using this contestant, rearrange the array of gladiators into three groups of elements: first the gladiators weaker than C[i], then the gladiator that is the perfect equal of C[i], and finally the gladiators stronger than C[i]. Next, using the gladiator that is the perfect equal of C[i], we perform a similar partition of the array of contestants. This pair of partitioning operations can easily be implemented in Θ(n) time, and it leaves the contestants and gladiators nicely partitioned so that the pivot contestant and gladiator are aligned with each other and all other contestants and gladiators are on the correct side of these pivots: weaker contestants and gladiators precede the pivots, and stronger contestants and gladiators follow the pivots. Our algorithm then finishes by recursively applying itself to the subarrays to the left and right of the pivot position to sort these remaining contestants and gladiators. We can assume by induction on n that these recursive calls will properly sort the remaining contestants.
To analyze the running time of our algorithm, we can use the same analysis as that of randomized quicksort. We are performing a partition operation in Θ(n) time that splits our problem into two subproblems whose sizes are randomly distributed exactly as would be the subproblems resulting from a partition in randomized quicksort. Therefore, applying the analysis from quicksort, the expected running time of our algorithm is Θ(n log n).
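A minimal Python sketch of the partition-based ranking, assuming strengths are comparable numbers and that a contestant and a gladiator are perfect equals exactly when their strengths are equal (the function name `rank_contestants` is hypothetical). Python cannot enforce the rule that only cross-array comparisons are allowed, so the code follows it by convention: gladiators are compared only against the pivot contestant, and contestants only against the matching gladiator. Unlike an in-place quicksort, it builds new lists at each level, which keeps the sketch short without changing the Θ(n) cost per partition.

```python
import random


def rank_contestants(contestants, gladiators, rng=random):
    """Return (contestants, gladiators), each sorted weakest to strongest,
    using only contestant-vs-gladiator comparisons."""
    if len(contestants) <= 1:
        return list(contestants), list(gladiators)
    pivot_c = rng.choice(contestants)                   # random pivot contestant
    # Partition the gladiators around the pivot contestant.
    weaker_g = [g for g in gladiators if g < pivot_c]
    stronger_g = [g for g in gladiators if g > pivot_c]
    pivot_g = next(g for g in gladiators if g == pivot_c)   # the perfect equal
    # Partition the contestants around the matching gladiator.
    weaker_c = [c for c in contestants if c < pivot_g]
    stronger_c = [c for c in contestants if c > pivot_g]
    left_c, left_g = rank_contestants(weaker_c, weaker_g, rng)
    right_c, right_g = rank_contestants(stronger_c, stronger_g, rng)
    return left_c + [pivot_c] + right_c, left_g + [pivot_g] + right_g


ranked, matched = rank_contestants([3, 1, 4, 2], [2, 4, 1, 3])
print(ranked, matched)
```

The call prints `[1, 2, 3, 4] [1, 2, 3, 4]`: every contestant ends up aligned with its perfect-equal gladiator, regardless of which pivots the random choices select.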
Interesting side note: Although devising an efficient randomized algorithm for this problem is not too difficult, it appears to be very difficult to come up with a deterministic algorithm with running time better than the trivial bound of O(n²). This remained an open research question until the mid-to-late 1990s, when a very complicated deterministic algorithm with Θ(n log n) running time was finally discovered. This problem provides a striking example of how randomization can help simplify the task of algorithm design.
(b) Prove that any algorithm that solves part (a) must use Ω(n log n) matches in the worst case. That is, you need to show a lower bound for any deterministic algorithm solving
this problem.
Solution:
Let's use a proof based on decision trees, as we did for comparison-based sorting. Note that we can model any algorithm for sorting contestants and gladiators as a decision tree. The tree will be a ternary tree, since every comparison has three possible outcomes: weaker, equal, or stronger. The height of such a tree corresponds to the worst-case number of comparisons made by the algorithm it represents, which in turn is a lower bound on the running time of that algorithm. We therefore want a lower bound of Ω(n log n) on the height, H, of any decision tree that solves part (a). To begin with, note that the number of leaves L in any ternary tree must satisfy
$$L \le 3^H.$$
Next, consider the following class of inputs. Let the input array of gladiators G be fixed and consist of n gladiators sorted in order of increasing strength, and consider one potential input for every permutation of the contestants. Our algorithm must in this case essentially sort the array of contestants. In our decision tree, if two different inputs of this type were mapped to the same leaf node, our algorithm would apply the same permutation of contestants to both of them, and it follows that the algorithm could not compute a ranking correctly for both of these inputs. Therefore, we must map every one of these n! different inputs to a distinct leaf node, so
$$L \ge n!$$
$$3^H \ge n!$$
$$H \ge \log_3 n!$$
$$H = \Omega(n \log n) \qquad \text{[using Stirling's approximation]}$$
Introduction to Algorithms October 22, 2004
Massachusetts Institute of Technology 6.046J/18.410J
Professors Piotr Indyk and Charles E. Leiserson Handout 9
Problem Set 3 Solutions
Reading: Chapters 12.1-12.4, 13, 18.1-18.3
Both exercises and problems should be solved, but only the problems should be turned in.
Exercises are intended to help you master the course material. Even though you should not turn in
the exercise solutions, you are responsible for material covered in the exercises.
Mark the top of each sheet with your name, the course number, the problem number, your
recitation section, the date and the names of any students with whom you collaborated.
Three-hole punch your paper on submissions.
You will often be called upon to give an algorithm to solve a certain problem. Your write-up
should take the form of a short essay. A topic paragraph should summarize the problem you are
solving and what your results are. The body of the essay should provide the following:
1. A description of the algorithm in English and, if helpful, pseudo-code.
2. At least one worked example or diagram to show more precisely how your algorithm works.
3. A proof (or indication) of the correctness of the algorithm.
4. An analysis of the running time of the algorithm.
Remember, your goal is to communicate. Full credit will be given only to correct algorithms
that are described clearly. Convoluted and obtuse descriptions will receive low marks.
Exercise 3-1. Do Exercise 12.1-2 on page 256 in CLRS.
Exercise 3-2. Do Exercise 12.2-1 on page 259 in CLRS.
Exercise 3-3. Do Exercise 12.3-3 on page 264 in CLRS.
Exercise 3-4. Do Exercise 13.2-1 on page 278 in CLRS.
Problem 3-1. Packing Boxes
The computer science department makes a move to a new building offering the faculty and graduate
students boxes, crates and other containers. Prof. Potemkin, afraid of his questionable tenure case,
spends all of his time doing research and absentmindedly forgets about the move until the last
minute. His secretary advises him to use the only remaining boxes, which have capacity exactly
1 kg. His belongings consist of n books that weigh between 0 and 1 kilograms. He wants to
minimize the total number of used boxes.
Prof. Potemkin realizes that this packing problem is NP-hard, which means that the research
community has not yet found a polynomial-time algorithm¹ that solves this problem exactly.
He thinks of the heuristic approach called BEST-PACK:
1. Take the books in the order in which they appear on his shelves.
2. For each book, scan the boxes in increasing order of the remaining capacity and place the
book in the first box in which it fits.
(a) Describe a data structure that supports efficient implementation of BEST-PACK. Show
how to use your data structure to get that implementation.
Solution: BEST-PACK can be implemented using any data structure that supports the
following three operations:
1. Insert(x), where x is an element and key[x] is a number;
2. Delete(x);
3. Successor(k), which reports an element x with the smallest key such that key[x] ≥ k.
There are several ways to obtain such a data structure. For example, one can use red-black trees or 2-3 trees. Because they are balanced, they support the Insert, Delete, and Successor operations in O(log n) time. Even though the Successor operation was not explained for 2-3 trees, it can be implemented by modifying search.
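Our implementation is as follows: We use the remaining capacity of each used box as its
key in the search tree. Suppose that the books weigh $w_1, \ldots, w_n$. For a given book with weight $w_i$, we query Successor($w_i$), which returns the used box with the smallest remaining capacity that is at least $w_i$; this is exactly the first box in which the book fits in BEST-PACK's scan order. If there is no such box, we assign the book to a new box and insert that box with key $1 - w_i$. Otherwise, we delete the returned box, decrease its key by $w_i$, and reinsert it.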
(b) Analyze the running time of your implementation.
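Solution: For each book, the BEST-PACK implementation performs one Successor query followed by at most one Delete and one Insert. That is O(n) operations in total on the data structure. Each operation takes O(log n) time because the tree never holds more than n boxes, so the total running time is O(n log n).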
¹ That is, an algorithm with running time $O(n^k)$ for some fixed k.
Soon, Prof. Potemkin comes up with another heuristic, WORST-PACK, which is as follows:
1. Take the books in the order in which they appear on his shelves.
2. For each book, find a partially used box that has the maximum remaining capacity. If
possible, place the book in that box. Otherwise, put the book into a new box.
(c) Describe a data structure that supports an efficient implementation of WORST-PACK.
Show how to use your data structure to get that implementation.
Solution: WORST-PACK can be implemented using any max-priority queue keyed on the remaining capacity of each used box.
We learned in recitation that a heap supports Insert and Extract-Max in O(log n) time. You can also use a balanced search tree to implement these operations.
Our implementation is as follows: Pick a book. If the priority queue is nonempty, delete the box with the maximum remaining capacity. If that capacity is at least the weight of the book, place the book in the box, reduce
the box's remaining capacity by the book's weight, and reinsert the box into the priority queue. Otherwise, reinsert the box unchanged,
open a new box for the book, and insert the new box with its remaining capacity. (If the queue is empty, open a new box directly.)
(d) Analyze the running time of your implementation.
Solution: Our implementation performs O(n) priority-queue operations, each taking O(log n) time. This means that the total running time is O(n log n).
Problem 3-2. AVL Trees
An AVL tree is a binary search tree with one additional structural constraint: For any of its internal
nodes, the height difference between its left and right subtrees is at most one. We call this property balance. Recall that the height of a tree is the maximum length of a path from the root to a leaf.
For example, the following binary search tree is an AVL tree:
[Figure: Balanced AVL Tree. Root 5 has children 3 and 7; node 3 has children 2 and 4.]
However, if you insert 1, the tree becomes unbalanced.
In this case, we can rebalance the tree by doing a simple operation, called a rotation, as follows:
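[Figure: Rotation. Unbalanced: root 5 has children 3 and 7; node 3 has children 2 and 4; node 2 has left child 1. Balanced, after the rotation: root 3 has children 2 and 5; node 2 has left child 1; node 5 has children 4 and 7.]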
See CLRS, p. 278 for the formal definition of rotations.
(a) If we insert a new element into an AVL tree of height 4, is one rotation sufficient to
re-establish balance? Justify your answer.
Solution: No, one rotation is not always sufficient to re-establish balance. For example, consider the insertion of the shaded node in the following AVL tree:
Though the original tree was balanced, more than one rotation is needed to restore
balance following the insertion. This can be seen by an exhaustive enumeration of the
rotation possibilities.
The problem asks for a tree of height 4, so we can extend the above example into a
larger tree:
(b) Denote the minimum number of nodes of an AVL tree of height h by M(h). A tree of height 0 has one node, so M(0) = 1. What is M(1)? Give a recurrence for M(h). Show that M(h) is at least $F_h$, where $F_h$ is the hth Fibonacci number.
Solution: M(1) = 2. For $h \ge 2$, the tree will consist of a root plus two subtrees. Since the tree is of height h, one of the subtrees must be of height h − 1. The minimum number of nodes in this subtree is M(h − 1). Since the heights of the subtrees can differ by at most 1, the minimum number of nodes in the other subtree is M(h − 2). Thus the total number of nodes is
$M(h) = M(h-1) + M(h-2) + 1$.
Note that M(h) is remarkably similar to the Fibonacci numbers and that the recursion holds for the worst-case AVL trees, which are called Fibonacci trees. It is easy to show by induction that $M(h) = F_{h+3} - 1$. In particular, $M(h) \ge F_h$, since $F_{h+3} \ge F_h + 1$ for all $h \ge 0$. Note that, as shown in Problem Set 1, $F_h \ge \phi^h/\sqrt{5} - 1$, where $\phi = (1+\sqrt{5})/2$. This implies that $M(h) \ge \phi^{h+3}/\sqrt{5} - 2$.
(c) Denote by n the number of nodes in an AVL tree. Note that $n \ge M(h)$. Give an upper bound for the height of an AVL tree as a function of n.
Solution: We know that $n \ge M(h) \ge \phi^{h+3}/\sqrt{5} - 2$. Therefore, solving for h, we get $h \le \log_\phi(\sqrt{5}(n+2)) - 3$,
that is, h is O(lg n).
Introduction to Algorithms October 24, 2004
Massachusetts Institute of Technology 6.046J/18.410J
Professors Piotr Indyk and Charles E. Leiserson Handout 18
Problem Set 4 Solutions
Reading: Chapters 17, 21.1-21.3
Both exercises and problems should be solved, but only the problems should be turned in.
Exercises are intended to help you master the course material. Even though you should not turn in
the exercise solutions, you are responsible for material covered in the exercises.
Mark the top of each sheet with your name, the course number, the problem number, your
recitation section, the date and the names of any students with whom you collaborated.
Three-hole punch your paper on submissions.
You will often be called upon to give an algorithm to solve a certain problem. Your write-up
should take the form of a short essay. A topic paragraph should summarize the problem you are
solving and what your results are. The body of the essay should provide the following:
1. A description of the algorithm in English and, if helpful, pseudo-code.
2. At least one worked example or diagram to show more precisely how your algorithm works.
3. A proof (or indication) of the correctness of the algorithm.
4. An analysis of the running time of the algorithm.
Remember, your goal is to communicate. Full credit will be given only to correct algorithms
that are described clearly. Convoluted and obtuse descriptions will receive low marks.
Exercise 4-1. The Ski Rental Problem
A father decides to start taking his young daughter to go skiing once a week. The daughter may
lose interest in the enterprise of skiing at any moment, so the kth week of skiing may be the last, for any k. Note that k is unknown.
The father now has to decide how to procure skis for his daughter for every weekly session (until
she quits). One can buy skis at a one-time cost of B dollars, or rent skis at a weekly cost of R dollars. (Note that one can buy skis at any time, e.g., rent for two weeks, then buy.)
Give a 2-competitive algorithm for this problem, that is, give an online algorithm that incurs a total cost of at most twice the offline optimal (i.e., the optimal scheme if k is known).
Problem 4-1. Queues as Stacks
Suppose we had code lying around that implemented a stack, and we now wanted to implement a
queue. One way to do this is to use two stacks S1 and S2. To insert into our queue, we push into
stack S1. To remove from our queue we first check if S2 is empty, and if so, we dump S1 into S2 (that is, we pop each element from S1 and push it immediately onto S2). Then we pop from S2.
For instance, if we execute INSERT(a), INSERT(b), DELETE(), the results are:
            S1 = []      S2 = []
INSERT(a)   S1 = [a]     S2 = []
INSERT(b)   S1 = [b a]   S2 = []
DELETE()    S1 = []      S2 = [a b]   dump
            S1 = []      S2 = [b]     pop (returns a)
Suppose each push and pop costs 1 unit of work, so that performing a dump when S1 has n elements costs 2n units (since we do n pushes and n pops).
(a) Suppose that (starting from an empty queue) we do 3 insertions, then 2 removals,
then 3 more insertions, and then 2 more removals. What is the total cost of these 10 operations, and how many elements are in each stack at the end?
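Solution: The total work is 3 + (6 + 2) + 3 + (1 + 6 + 1) = 22. The first 3 insertions cost 3. The first removal dumps 3 elements and pops, and the second removal pops. The next 3 insertions cost 3. The third removal pops the one element left in S2, and the fourth removal dumps 3 elements and pops. At the end, S1 has 0 elements, and S2 has 2.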
(b) If a total of n insertions and n removals are done in some order, how large might the running time of one of the operations be (give an exact, non-asymptotic answer)? Give
a sequence of operations that induces this behavior, and indicate which operation has
the running time you specified.
Solution: An insertion always takes 1 unit, so our worst-case cost must be caused by
a removal. No more than n elements can ever be in S1, and no fewer than 0 elements can be in S2 (a dump happens only when S2 is empty). Therefore the worst-case cost is 2n + 1: 2n units to dump, and one extra to pop from S2. This bound is tight, as seen by the following sequence: perform n insertions, then n removals. The first removal will cause a dump of n elements plus a pop, for 2n + 1 work.
(c) Suppose we perform an arbitrary sequence of insertions and removals, starting from
an empty queue. What is the amortized cost of each operation? Give as tight (i.e.,
non-asymptotic) of an upper bound as you can. Use the accounting method to prove
your answer. That is, charge $x for insertion and $y for deletion. What are x and y? Prove your answer.
Solution: The tightest amortized upper bounds are 3 units per insertion, and 1 unit per
removal. We will prove this 2 ways (using the accounting and potential methods; the
aggregate method seems too weak to employ elegantly in this case). (We would also
accept valid proofs of 4 units per insertion and 0 per removal, although this answer is
looser than the one we give here.)
Here is an analysis using the accounting method: with every insertion we pay $3: $1
is used to push onto S1, and the remaining $2 stay attached to the element just inserted. Therefore every element in S1 has $2 attached to it. With every removal we pay $1, which will (eventually) be used to pop the desired element off of S2. Before that, however, we may need to dump S1 into S2; this involves popping each element off of S1 and pushing it onto S2. We can pay for these pairs of operations with the $2 attached to each element in S1.
(d) Now we'll analyze the structure using the potential method. For a queue Q implemented as stacks S1 and S2, consider the potential function
$\Phi(Q)$ = number of elements in stack S1.
Use this potential function to analyze the amortized cost of insert and delete operations.
Solution: Let $|S_1^{(i)}|$ denote the number of elements in S1 after the ith operation. Then the potential function on our structure $Q_i$ (the state of the queue after the ith operation) is defined to be $\Phi(Q_i) = 2|S_1^{(i)}|$. Note that $|S_1^{(i)}| \ge 0$ at all times, so $\Phi(Q_i) \ge 0$. Also, $|S_1^{(0)}| = 0$ initially, so $\Phi(Q_0) = 0$ as desired.
Now we compute the amortized cost $\hat{c}_{i+1} = c_{i+1} + \Phi(Q_{i+1}) - \Phi(Q_i)$ of the (i+1)st operation. For an insertion, we have $|S_1^{(i+1)}| = |S_1^{(i)}| + 1$ and the actual cost $c_{i+1} = 1$, so
$$\hat{c}_{i+1} = 1 + 2(|S_1^{(i)}| + 1) - 2|S_1^{(i)}| = 3.$$
For a removal, we have two cases. First, when there is no dump from S1 to S2, the actual cost is 1 and $|S_1^{(i+1)}| = |S_1^{(i)}|$, so $\hat{c}_{i+1} = 1$. When there is a dump, the actual cost is $2|S_1^{(i)}| + 1$, and we have $|S_1^{(i+1)}| = 0$. Therefore we get
$$\hat{c}_{i+1} = (2|S_1^{(i)}| + 1) + 0 - 2|S_1^{(i)}| = 1,$$
as desired.
Problem 4-2. David Digs Donuts
Your TA David has two loves in life: (1) roaming around Massachusetts on his forest-green Cannondale R300 road bike, and (2) eating Boston Kreme donuts. One Sunday afternoon, he is biking
along Main Street in Acton, and suddenly turns the corner onto Mass Ave. (Yes, that Mass Ave.) His growling stomach announces that it is time for a donut. Because Mass Ave has so many donut
shops along it, David decides to find a shop somewhere along that street. He faces two obstacles in
his quest to satisfy his hunger: first, he does not know whether the nearest donut shop is to his left
or to his right (or how far away the nearest shop is); and second, when he goes riding his contact
lenses dry out dramatically, blurring his vision, and he can't see a donut shop until he is directly in
front of it.
You may assume that all donut shops are at an integral distance (in feet) from the starting location.
(a) Give an efficient (deterministic) algorithm for David to locate a donut shop on Mass
Ave as quickly as possible. Your algorithm will be online in the sense that the location
of the nearest donut shop is unknown until you actually find the shop. The algorithm
should be O(1)-competitive: if the nearest donut shop is distance d away from David's
starting point, the total distance that David has to bike before he gets his donut should
be O(d). (The optimal offline algorithm would require David to bike only distance d.)
Solution: Without loss of generality, let's call the two directions of Mass Ave east and west.
1. Check for a shop at the origin.
2. i:=0.
3. direction :=east;
4. Repeat the following until a donut is found:
(a) Bike $2^i$ units in direction direction. If you pass a donut shop, stop and eat.
(b) Bike $2^i$ units back to the origin.
(c) i := i + 1.
(d) direction := -direction (switch between east and west).
Notice that you are back at the origin after every iteration of the loop.
Suppose that the nearest donut shop is d feet away from the origin. Let k be such that $2^k$
iteration. Thus the expected travel distance is at most $(5d + 9d)/2 = 7d$.
Introduction to Algorithms October 31, 2004
Massachusetts Institute of Technology 6.046J/18.410J
Professors Piotr Indyk and Charles E. Leiserson Handout 21
Problem Set 5 Solutions
Reading: Chapters 15, 16
Both exercises and problems should be solved, but only the problems should be turned in.
Exercises are intended to help you master the course material. Even though you should not turn in
the exercise solutions, you are responsible for material covered in the exercises.
Mark the top of each sheet with your name, the course number, the problem number, your
recitation section, the date and the names of any students with whom you collaborated.
Three-hole punch your paper on submissions.
You will often be called upon to give an algorithm to solve a certain problem. Your write-up
should take the form of a short essay. A topic paragraph should summarize the problem you are
solving and what your results are. The body of the essay should provide the following:
1. A description of the algorithm in English and, if helpful, pseudo-code.
2. At least one worked example or diagram to show more precisely how your algorithm works.
3. A proof (or indication) of the correctness of the algorithm.
4. An analysis of the running time of the algorithm.
Remember, your goal is to communicate. Full credit will be given only to correct algorithms
that are described clearly. Convoluted and obtuse descriptions will receive low marks.
Exercise 4-1. Do Exercise 15.2-1 on page 338 in CLRS.
Exercise 4-2. Do Exercise 15.3-4 on page 350 in CLRS.
Exercise 4-3. Do Exercise 15.4-4 on page 356 in CLRS and show how to reconstruct the actual
longest common subsequence.
Exercise 4-4. Do Exercise 16.1-3 on page 379 in CLRS.
Exercise 4-5. Do Exercise 16.3-2 on page 392 in CLRS.
Problem 4-1. Typesetting
In this problem you will write a program (real code that runs!!!) to solve the following typesetting
problem. Because of the trouble you may encounter while programming, we advise you to
START THIS PROBLEM AS SOON AS POSSIBLE.
You have an input text consisting of a sequence of n words of lengths $\ell_1, \ell_2, \ldots, \ell_n$, where the length of a word is the number of characters it contains. Your printer can only print with its built-in
Courier 10-point fixed-width font set that allows a maximum of M characters per line. (Assume that $\ell_i \le M$ for all $i = 1, \ldots, n$.) When printing words i and i + 1 on the same line, one space character (blank) must be printed between the two words. Thus, if words i through j are printed on a line, the number of extra space characters at the end of the line, that is, after word j, is $M - j + i - \sum_{k=i}^{j} \ell_k$.
To produce nice-looking output, the heuristic of setting the cost to the square of the number of
extra space characters at the end of the line has empirically shown itself to be effective. To avoid
the unnecessary penalty for extra spaces on the last line, however, the cost of the last line is 0. In other words, the cost linecost(i, j) for printing words i through j on a line is given by
$$
\mathrm{linecost}(i,j) =
\begin{cases}
\infty & \text{if words } i \text{ through } j \text{ do not fit into a line},\\
0 & \text{if } j = n \text{ (i.e., last line)},\\
\left(M - j + i - \sum_{k=i}^{j} \ell_k\right)^2 & \text{otherwise.}
\end{cases}
$$
The total cost for typesetting a paragraph is the sum over all lines in the paragraph of the cost of
each line. An optimal solution is an arrangement of the n words into lines in such a way that the total cost is minimized.
(a) Argue that this problem exhibits optimal substructure.
Solution: First, notice that linecost(i, j) is defined to be $\infty$ if words i through j do not fit on a line, to guarantee that no lines in the optimal solution overflow. (This
relies on the assumption that the length of each word is not more than M.) Second, notice that linecost(i, j) is defined to be 0 when j = n, where n is the total number of words. Only the actual last line has zero cost, not the recursive last lines of subproblems, which, since they are not the last line overall, have the same cost formula as any
other line.
Consider an optimal solution of printing words 1 through n. Let i be the index of the first word printed on the last line of this solution. Then the typesetting of words 1, ..., i − 1 must be optimal. Otherwise, we could paste in an optimal typesetting of these words
and improve the total cost of the solution, a contradiction. Notice that the same
cut-and-paste argument can be applied if we take i to be the index of the first word
printed on the kth line, where $2 \le k \le n$. Therefore this problem displays optimal substructure.
(b) Define recursively the value of an optimal solution.
Solution: Let c(j) be the optimal cost of printing words 1 through j. From part (a), we see that given the optimal i (i.e., the index of the first word printed on the last line of an optimal solution), we have c(j) = c(i − 1) + linecost(i, j). But since we do
not know which i is optimal, we need to consider every possible i, so our recursive definition of the optimal cost is
$$c(j) = \min_{1 \le i \le j} \{ c(i-1) + \mathrm{linecost}(i,j) \}.$$
To accommodate this recursive definition, we define c(0) = 0.
(c) Describe an efficient algorithm to compute the cost of an optimal solution.
Solution: We calculate the values of an array for c from index 1 to n, which can be done efficiently since each c(k) for $1 \le k$
(d) requires 5 parts: you should turn in the code you have written, and the output of your program
on the two input samples using two values of M (the maximum number of characters per line), namely M = 72 and M = 40, on each input sample.
Sample 1 is from A Capsule History of Typesetting by Brown, R.J. Sample 2 is from Out of Their
Minds, by Shasha, Lazere. Remember that collaboration, as usual, is allowed to solve problems,
but you must write your program by yourself.
/* NOTE: This is an implementation of the O(nM) algorithm. */
/* DISCLAIMER: No effort has been made to streamline memory */
/* management or micro-optimize performance. */
/* standard header files */
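#include <stdio.h>   /* FILE, fopen, fscanf, sscanf, printf */
#include <stdlib.h>  /* exit */
#include <string.h>  /* strcpy, strlen */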
/* arbitrary data size limits, so no dynamic allocation needed */
#define WORD_NUM 1024 /* arbitrary max for number of input words */
#define WORD_LENGTH 32 /* arbitrary max for length of input words */
#define LINE_LENGTH 80 /* arbitrary max for length of output lines */
/* macros */
#define max(A, B) ((A) > (B) ? (A) : (B))
/* global array of words */
char words[WORD_NUM+1][WORD_LENGTH]; /* array for input words */
int auxL[WORD_NUM+1]; /* auxiliary array for computing lengths
of lines - MM*/
/* function prototypes */
long linecost(int n, int M, int i, int j);
long dynamic_typeset(int n, int M, int p[]);
/* main expects two arguments: the input file name and M */
int main (int argc, char *argv[]) {
FILE *ifile; /* input file */
int p[WORD_NUM]; /* array of how to get min costs */
char lines[WORD_NUM+1][LINE_LENGTH]; /* buffer for output lines */
int M; /* output line length */
int n; /* number of input words */
char read_word[WORD_LENGTH]; /* for use during reading */
int i, j, k, l; /* aux vars used during construction of solution */
/* verify arguments */
if(argc != 3) /* verify number of arguments */
exit(1);
if(!(ifile = fopen(argv[1], "r"))) /* open input file */
exit(2);
if(!sscanf(argv[2], "%d", &M)) /* get length of output line */
exit(3);
/* read input words */
n = 1;
while(!feof(ifile)) {
if(1 == fscanf(ifile, "%s", read_word)) { /* assumes input word fits in read_word */
strcpy(words[n++], read_word);
if(n == WORD_NUM)
break; /* no more room for words */
}
}
n--;
/* fill in auxiliary array of word lengths */
auxL[0] = 0;
for(k = 1; k
/* ... and construct next line */
}
while(j != 0); /* just finished first line */
for(i = l; i > 0; i--) /* output lines in right order */
printf("%d:[%d]\t%s\n", l-i+1, (int)strlen(lines[i])-1, lines[i]); /* cast: strlen returns size_t */
}
/**** algorithmic part *****/
/* returns min cost and a min solution in p[] */
long dynamic_typeset(int n, int M, int p[]) {
int i, j;
/* need an extra space for c[0], so c is indexed from 1 to n, */
/* instead of from 0 to n-1 (like p) */
long c[WORD_NUM+1];
c[0] = 0; /* base case */
for(j = 1; j
}
Solutions:
sample1 72
COST = 160
1:[67] The first practical mechanized type casting machine was invented in
2:[69] 1884 by Ottmar Mergenthaler. His invention was called the "Linotype".
3:[72] It produced solid lines of text cast from rows of matrices. Each matrice
4:[70] was a block of metal -- usually brass -- into which an impression of a
5:[69] letter had been engraved or stamped. The line-composing operation was
6:[72] done by means of a keyboard similar to a typewriter. A later development
7:[64] in line composition was the "Teletypewriter". It was invented in
8:[70] 1913. This machine could be attached directly to a Linotype or similar
9:[66] machines to control composition by means of a perforated tape. The
10:[70] tape was punched on a separate keyboard unit. A tape-reader translated
11:[70] the punched code into electrical signals that could be sent by wire to
12:[71] tape-punching units in many cities simultaneously. The first major news
13:[56] event to make use of the Teletypewriter was World War I.
sample1 40
COST = 360
1:[35] The first practical mechanized type
2:[36] casting machine was invented in 1884
3:[37] by Ottmar Mergenthaler. His invention
4:[38] was called the "Linotype". It produced
5:[37] solid lines of text cast from rows of
6:[37] matrices. Each matrice was a block of
7:[36] metal -- usually brass -- into which
8:[34] an impression of a letter had been
9:[39] engraved or stamped. The line-composing
10:[32] operation was done by means of a
11:[35] keyboard similar to a typewriter. A
12:[37] later development in line composition
13:[32] was the "Teletypewriter". It was
14:[36] invented in 1913. This machine could
15:[37] be attached directly to a Linotype or
16:[39] similar machines to control composition
17:[39] by means of a perforated tape. The tape
18:[40] was punched on a separate keyboard unit.
19:[36] A tape-reader translated the punched
20:[39] code into electrical signals that could
21:[38] be sent by wire to tape-punching units
22:[40] in many cities simultaneously. The first
23:[35] major news event to make use of the
24:[31] Teletypewriter was World War I.
sample2 72
COST = 229
1:[65] Throughout his life, Knuth had been intrigued by the mechanics of
2:[70] printing and graphics. As a boy at Wisconsin summer camp in the 1940s,
3:[71] he wrote a guide to plants and illustrated the flowers with a stylus on
4:[69] the blue ditto paper that was commonly used in printing at that time.
5:[71] In college, he recalls admiring the typeface used in his math texbooks.
6:[71] But he was content to leave the mechanics of designing and setting type
7:[72] to the experts. "I never thought I would have any control over printing.
8:[71] Printing was done by typographers, hot lead, scary stuff. Then in 1977,
9:[71] I learned about new printing machines that print characters made out of
10:[69] zeros and ones, just bits, no lead. Suddenly, printing was a computer
11:[71] science problem. I couldn't resist the challenge of developing computer
12:[66] tools using the new technology with which to write my next books."
13:[67] Knuth designed and implemented TeX, a computer language for digital
14:[67] typography. He explored the field of typography with characteristic
15:[68] thoroughness. For example, he wrote a paper called "The letter S" in
16:[67] which he dissects the mathematical shape of that letter through the
17:[67] ages, and explains his several day effort to find the equation that
18:[33] yields the most pleasing outline.
sample2 40
COST = 413
1:[35] Throughout his life, Knuth had been
2:[38] intrigued by the mechanics of printing
3:[35] and graphics. As a boy at Wisconsin
4:[36] summer camp in the 1940s, he wrote a
5:[35] guide to plants and illustrated the
6:[39] flowers with a stylus on the blue ditto
7:[40] paper that was commonly used in printing
8:[36] at that time. In college, he recalls
9:[38] admiring the typeface used in his math
10:[37] texbooks. But he was content to leave
11:[38] the mechanics of designing and setting
12:[37] type to the experts. "I never thought
13:[39] I would have any control over printing.
14:[34] Printing was done by typographers,
15:[36] hot lead, scary stuff. Then in 1977,
16:[37] I learned about new printing machines
17:[33] that print characters made out of
18:[35] zeros and ones, just bits, no lead.
19:[33] Suddenly, printing was a computer
20:[38] science problem. I couldn't resist the
21:[38] challenge of developing computer tools
22:[38] using the new technology with which to
23:[36] write my next books." Knuth designed
24:[40] and implemented TeX, a computer language
25:[39] for digital typography. He explored the
26:[39] field of typography with characteristic
27:[37] thoroughness. For example, he wrote a
28:[39] paper called "The letter S" in which he
29:[39] dissects the mathematical shape of that
30:[37] letter through the ages, and explains
31:[34] his several day effort to find the
32:[38] equation that yields the most pleasing
33:[8] outline.
Here is what Sample 1 should look like when typeset with M = 50. Feel free to use this output to debug your code.
The first practical mechanized type casting
machine was invented in 1884 by Ottmar
Mergenthaler. His invention was called the
"Linotype". It produced solid lines of text
cast from rows of matrices. Each matrice was a
block of metal -- usually brass -- into which
an impression of a letter had been engraved or
stamped. The line-composing operation was done
by means of a keyboard similar to a typewriter.
A later development in line composition was
the "Teletypewriter". It was invented in
1913. This machine could be attached directly
to a Linotype or similar machines to control
composition by means of a perforated tape. The
tape was punched on a separate keyboard unit.
A tape-reader translated the punched code into
electrical signals that could be sent by wire to
tape-punching units in many cities simultaneously.
The first major news event to make use of the
Teletypewriter was World War I.
(e) Suppose now that the cost of a line is defined as the number of extra spaces. That is, when words i through j are put into a line, the cost of that line is
$$
\mathrm{linecost}(i,j) =
\begin{cases}
\infty & \text{if words } i \text{ through } j \text{ do not fit into a line},\\
0 & \text{if } j = n \text{ (i.e., last line)},\\
M - j + i - \sum_{k=i}^{j} \ell_k & \text{otherwise};
\end{cases}
$$
and that the total cost is still the sum over all lines in the paragraph of the cost of each
line. Describe an efficient algorithm that finds an optimal solution in this case.
Solution: We use a straightforward greedy algorithm, which puts as many words as
possible on each line before going to the next line. Such an algorithm runs in linear time.
Now we show that any optimal solution has the same cost as the solution obtained by
this greedy algorithm. Consider some optimal solution. If this solution is the same as
the greedy solution, then we are done. If it is different, then there is some line i that has enough space left over for the first word of the next line. In this case, we move
the first word of line i + 1 to the end of line i. This does not change the total cost, since if the length of the word moved is l, then the reduction in the cost of line i will be l + 1, for the word and the space before it, and the increase in the cost of line i + 1 will also be l + 1, for the word and the space after it. (If the moved word was the only word on line i + 1, then by moving it to the previous line the total cost is reduced, a contradiction to the supposition that we have an optimal solution. Likewise, if line i + 1 is the last line, whose cost is always 0, the move strictly reduces the total cost, again a contradiction.) As long as there
are lines with enough extra space, we can keep moving the first words of the next lines
back without changing the total cost. When there are no longer any such lines, we will
have changed our optimal solution into the greedy solution without affecting the total
cost. Therefore, the greedy solution is an optimal solution.
Problem 4-2. Manhattan Channel Routing
A problem that arises during the design of integrated-circuit chips is to hook components together with wires. In this problem, we'll investigate a simple such problem.
In Manhattan routing, wires run on one of two layers of an integrated circuit: vertical wires run on layer 1, and horizontal wires run on layer 2. The height h is the number of horizontal tracks used. Wherever a horizontal wire needs to be connected to a vertical wire, a via connects them. Figure 1 illustrates several pins (electrical terminals) that are connected in this fashion. As can be seen in the figure, all wires run on an underlying grid, and all the pins are collinear.
In our problem, the goal is to connect up a given set of pairs of pins using the minimum number of horizontal tracks. For example, the number of horizontal tracks used in the routing channel of Figure 1 is 3, but fewer might be sufficient.
Let L = {(p_1, q_1), (p_2, q_2), . . . , (p_n, q_n)} be a list of pairs of pins, where no pin appears more than once. The problem is to find the fewest horizontal tracks needed to connect each pair. For example, the routing problem corresponding to Figure 1 can be specified as the set {(1, 3), (2, 5), (4, 6), (8, 9)}.
(a) What is the minimum number of horizontal tracks needed to solve the routing problem
in Figure 1?
Solution: You can verify that the wire connecting pins 4 and 6 could be at the same height as the wire connecting pins 1 and 3, making the number of horizontal tracks needed 2. This is the minimum possible: with a single track, the wire connecting pins 2 and 5 and the wire connecting pins 1 and 3 would share a track even though they overlap between pins 2 and 3, violating the problem specifications.
(b) Give an efficient algorithm to solve a given routing problem having n pairs of pins using the minimum possible number of horizontal tracks. As always, argue correctness
(your algorithm indeed minimizes the number of horizontal tracks), and analyze the
running time.
Algorithm description
The following algorithm routes pin pairs greedily into available horizontal tracks in
order of the smaller pin of a pair.
[Figure 1: a routing channel with pins 1 through 9 and h = 3 horizontal tracks. Pins are shown as circles. Vertical wires are shown as solid. Horizontal wires are dashed. Vias are shown as squares.]
1. For each pair (p_i, q_i) in L, swap the two pins if p_i > q_i, so that p_i is the smaller pin. For each pair, we call the smaller pin the start and the larger pin the end. Sort the start and end pins of all pairs in increasing order; the resulting list contains 2n values.
2. Place all available horizontal tracks in a stack S.
3. Go through the list in sorted order.
   • If the current pin is a start pin, pop the first available horizontal track from S and route the pair in that horizontal track. If S is empty, then it is not possible to route all of the pin pairs using the given number of horizontal tracks; report an error in this case.
   • If the current pin is an end pin, look up the horizontal track in which its pair has been routed and push that horizontal track back onto S.
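A minimal Python sketch of steps 1 to 3 follows, assuming distinct integer pin positions and tracks numbered from 0; `route_pins` is an illustrative name, not one from the original solution. Each pin becomes a (position, is_end, pair) event, so sorting the events produces the 2n-value list of step 1, and a per-pair array answers the end-pin lookup in O(1).

```python
def route_pins(pairs, num_tracks):
    """Greedy track assignment for Manhattan channel routing.

    pairs: list of (p, q) pin pairs; no pin appears twice.
    num_tracks: number of available horizontal tracks, numbered 0..num_tracks-1.
    Returns track, where track[i] is the track assigned to pair i.
    Raises ValueError if the given number of tracks is insufficient.
    """
    events = []                          # (pin position, is_end, pair index)
    for idx, (p, q) in enumerate(pairs):
        start, end = min(p, q), max(p, q)    # step 1: smaller pin is the start
        events.append((start, False, idx))
        events.append((end, True, idx))
    events.sort()                        # distinct pins, so no ties: O(n lg n)

    stack = list(range(num_tracks - 1, -1, -1))  # step 2: track 0 on top
    track = [None] * len(pairs)
    for _, is_end, idx in events:        # step 3: scan in sorted order
        if is_end:
            stack.append(track[idx])     # free the track used by this pair
        else:
            if not stack:
                raise ValueError("not enough horizontal tracks")
            track[idx] = stack.pop()
    return track
```

On the Figure 1 instance with three available tracks, this sketch assigns the pairs to tracks [0, 1, 0, 0], i.e., it uses two tracks, matching part (a).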
Correctness
Suppose that the algorithm terminates with a routing requiring m horizontal tracks. Let k denote the first pair of pins routed in the mth distinct horizontal track to be used, and let s_k and f_k denote its start and end pins. Because freed tracks are pushed back on top of S, a previously unused track is popped only when every track used so far is occupied; hence, when pair k is routed, each of the other m − 1 tracks holds a pair whose start pin precedes s_k and whose end pin has not yet been scanned. Let f_l be the earliest end pin appearing in the sorted list after s_k; necessarily f_l > s_k. Each of these m − 1 routings starts before s_k and ends at or after f_l, and so does pair k. Thus there are m routings that all span [s_k, f_l], so any routing must use at least m horizontal tracks. Therefore, the routing returned by the algorithm is optimal.
The above argument shows that, given an unlimited supply of horizontal tracks, our algorithm always produces a routing that uses the fewest horizontal tracks. Thus, if the algorithm terminates with an error, the given number of horizontal tracks is less than the number of horizontal tracks in an optimal routing, and
hence it is impossible to route all the pin pairs.
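The lower bound in this argument, the largest number of pin-pair intervals that share a common point, can be computed with a simple sweep. The sketch below is illustrative (`max_overlap` is a hypothetical helper) and again assumes distinct pins, so no two events tie.

```python
def max_overlap(pairs):
    """Largest number of pin-pair intervals containing a common point.

    By the correctness argument, no routing can use fewer horizontal
    tracks than this value.
    """
    events = []
    for p, q in pairs:
        events.append((min(p, q), +1))   # interval opens at its start pin
        events.append((max(p, q), -1))   # interval closes at its end pin
    events.sort()
    depth = best = 0
    for _, delta in events:
        depth += delta
        best = max(best, depth)
    return best
```

On the Figure 1 instance it returns 2, the same number of tracks the greedy algorithm uses there.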
Analysis
This algorithm runs in O(n lg n) time because it must sort 2n items (which can be accomplished using heapsort or mergesort). Scanning through the list and assigning horizontal tracks takes O(1) time per pin, for a total of O(2n) = O(n) time.
(This is also known as the interval-graph coloring problem. We can create an interval graph whose vertices are the given pairs of pins and whose edges connect pairs whose intervals overlap. The smallest number of colors required to color every vertex so that no two adjacent vertices receive the same color equals the fewest horizontal tracks needed to connect all of the pairs of pins.)
Introduction to Algorithms November 15, 2004
Massachusetts Institute of Technology 6.046J/18.410J
Professors Piotr Indyk and Charles E. Leiserson Handout 26
Problem Set 6 Solutions
Reading: Chapters 22, 24, and 25.
Both exercises and problems should be solved, but only the problems should be turned in. Exercises are intended to help you master the course material. Even though you should not turn in the exercise solutions, you are responsible for material covered in the exercises.
Mark the top of each sheet with your name, the course number, the problem number, your
recitation section, the date and the names of any students with whom you collaborated.
You will often be called upon to give an algorithm to solve a certain problem. Your write-up
should take the form of a short essay. A topic paragraph should summarize the problem you are
solving and what your results are. The body of the essay should provide the following:
1. A description of the algorithm in English and, if helpful, pseudo-code.
2. At least one worked example or diagram to show more precisely how your algorithm works.
3. A proof (or indication) of the correctness of the algorithm.
4. An analysis of the running time of the algorithm.
Remember, your goal is to communicate. Full credit will be given only to correct algorithms which are described clearly. Convoluted and obtuse descriptions will receive low marks.
Exercise 6-1. Do Exercise 22.2-5 on page 539 in CLRS.
Exercise 6-2. Do Exercise 22.4-3 on page 552 in CLRS.
Exercise 6-3. Do Exercise 22.5-7 on page 557 in CLRS.
Exercise 6-4. Do Exercise 24.1-3 on page 591 in CLRS.
Exercise 6-5. Do Exercise 24.3-2 on page 600 in CLRS.
Exercise 6-6. Do Exercise 24.4-8 on page 606 in CLRS.
Exercise 6-7. Do Exercise 25.2-6 on page 635 in CLRS.
Exercise 6-8. Do Exercise 25.3-5 on page 640 in CLRS.
Problem 6-1. Truckin'
Professor Almanac is consulting for a trucking company. Highways are modeled as a directed
graph G = (V, E) in which vertices represent cities and edges represent roads. The company is planning new routes from San Diego (vertex s) to Toledo (vertex t).
(a) It is very costly when a shipment is delayed en route. The company has calculated the probability p(e) ∈ [0, 1] that a given road e ∈ E will close without warning. Give an efficient algorithm for finding a route with the minimum probability of encountering a closed road. You should assume that all road closings are independent.
Solution:
To simplify the solution, we use the probability q(e) = 1 − p(e) that a road will be open. Further, we remove from the graph roads with p(e) = 1, as they are guaranteed to be closed and will never be included in a meaningful solution. (Following this transformation, we can use depth-first search to ensure that some path from s to t has a positive probability of being open.) By eliminating p(e) = 1, we now have 0 < q(e) ≤ 1 for all edges e ∈ E. It is important to have eliminated the possibility of q(e) = 0, because we will be taking the logarithm of this quantity later.
Because the road closings are independent, the probability that a given path will be
open is the product of the probabilities of the edges being open. That is, for each path
r = ⟨e_1, e_2, . . . , e_n⟩, the probability Q(r) of the path being open is:
$$Q(r) = \prod_{i=1}^{n} q(e_i)$$
Our goal is to find the path r, beginning at s and ending at t, that maximizes Q(r). Taking the negative logarithm of both sides yields:
$$-\lg Q(r) = -\lg \prod_{i=1}^{n} q(e_i) = \sum_{i=1}^{n} \bigl(-\lg q(e_i)\bigr)$$
Using w(e_i) to denote the quantity −lg q(e_i), this becomes:
$$-\lg Q(r) = \sum_{i=1}^{n} w(e_i)$$
The right-hand side is a sum of edge weights w(e) along the path from s to t. Because 0 < q(e) ≤ 1, every weight w(e) = −lg q(e) is nonnegative, so we can minimize this quantity using Dijkstra's algorithm for single-source shortest paths. Doing so will yield the path r that minimizes −lg Q(r), thereby maximizing Q(r). This path will have the maximum probability of being open, and thus the minimum probability of being closed.
The running time is O(E + V lg V) when Dijkstra's algorithm uses a Fibonacci heap. We spend only Θ(E) time to remove edges with p(e) = 1 and to compute the edge weights, so Dijkstra's algorithm dominates with its running time of O(E + V lg V).
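The following Python sketch applies the logarithm transformation, assuming the graph is a dictionary mapping each city to a list of (neighbor, p(e)) pairs; `most_reliable_path` is a hypothetical name. Roads with p(e) = 1 are skipped during relaxation rather than removed up front. It uses Python's `heapq` binary heap with lazy deletion, so it runs in O((V + E) lg V) rather than the Fibonacci-heap bound quoted above.

```python
import heapq
import math


def most_reliable_path(graph, s, t):
    """Find the s-to-t route with the highest probability of staying open.

    graph maps each vertex u to a list of (v, p) pairs, where p = p(e) is the
    probability that road e = (u, v) closes. Roads with p = 1 are skipped, and
    every other road gets the nonnegative weight w(e) = -lg(1 - p(e)).
    Returns (path, probability_open), or (None, 0.0) if t cannot be reached.
    """
    dist = {s: 0.0}          # dist[v] = -lg of the best open-probability so far
    parent = {s: None}
    heap = [(0.0, s)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue         # stale entry left behind by an earlier relaxation
        done.add(u)
        if u == t:
            break
        for v, p in graph.get(u, []):
            if p >= 1.0:
                continue     # this road is certain to close
            w = -math.log2(1.0 - p)
            if d + w < dist.get(v, math.inf):
                dist[v] = d + w
                parent[v] = u
                heapq.heappush(heap, (dist[v], v))
    if t not in done:
        return None, 0.0
    path = []
    v = t
    while v is not None:
        path.append(v)
        v = parent[v]
    path.reverse()
    return path, 2.0 ** -dist[t]   # undo the -lg transformation
```

For instance, with roads s→a (p = 0.5), a→t (p = 0), s→t (p = 0.6), s→b (p = 0.1), and b→t (p = 0.2), the sketch picks s, b, t, which stays open with probability 0.9 · 0.8 = 0.72.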
Alternate Solution:
It is also possible to modify Dijkstra's algorithm to compute directly the path with the highest probability of being open. As above, let q(e) = 1 − p(e) denote the probability that a given road e ∈ E will be open, and let Q(r) denote the probability that a given path r will be open. For a given vertex v, let o[v] denote the maximum value of Q(r) over all paths r from s to v. Then, make the following modifications to Dijkstra's algorithm:
1. Change INITIALIZE-SINGLE-SOURCE to assign o[s] = 1 for the source vertex and o[v] = 0 for all other vertices:
   INITIALIZE-SINGLE-SOURCE(G, s)
     for each vertex v ∈ V[G]
         do o[v] ← 0
            π[v] ← NIL
     o[s] ← 1
   That is, we can reach the source vertex with probability 1, and the probability of reaching all other vertices will increase monotonically from 0 using RELAX.
2. Instead of EXTRACT-MIN, use EXTRACT-MAX (and a supporting data structure) to see which vertex to visit. That is, first explore paths with the highest probability of being open.
3. Rewrite the RELAX step as follows:
   RELAX(u, v, q)
     if o[v] < o[u] · q(e), where e = (u, v)
        then o[v] ← o[u] · q(e)
             π[v] ← u
   That is, if a vertex v can be reached along the edge under consideration with a higher probability than before, then increase the probability o[v]. Because the probabilities of roads being open are independent, the probability of a path being open is the product of the probabilities of each edge being open.
The argument for correctness parallels that of Dijkstra's algorithm, as presented in lecture. The proof relies on the following properties:
1. Optimal substructure. A sub-path of a path with the highest probability of being
open must also be a path with the highest probability of being open. Otherwise we
could increase the overall probability of being open by increasing the probability
along this sub-path (cut-and-paste).
2. Triangle inequality. Let δ(u, v) denote the highest probability of a path from u to v being open. Then for all u, v, x ∈ V, we have δ(u, v) ≥ δ(u, x) · δ(x, v). Otherwise δ(u, v) could be increased if we chose the path through x.
3. Well-definedness of shortest paths. Since q(e) ∈ [0, 1], the probability of a path being open can never increase as extra edges are added to it. Because we are searching for the path with the highest probability of being open, this ensures that there are no analogues of negative-weight cycles.
Properties (1) and (2) hold whenever an associative, order-preserving operator (a ≤ b implies a ⊗ c ≤ b ⊗ c, as for multiplication by values in [0, 1]) is used to combine edge weights into a path weight. In Dijkstra's shortest-path algorithm, the operator is addition; here it is multiplication.
The running time is O(E + V lg V). We spend Θ(E) time to calculate q(e) = 1 − p(e), and Dijkstra's algorithm runs in O(E + V lg V).
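A minimal sketch of the alternate solution, again assuming a dictionary-of-lists graph, here holding q(e) = 1 − p(e) directly; `max_product_dijkstra` is a hypothetical name. Because `heapq` only provides a min-heap, EXTRACT-MAX is simulated by storing −o[v] as the priority.

```python
import heapq


def max_product_dijkstra(graph, s):
    """Alternate solution: Dijkstra with EXTRACT-MAX and multiplicative RELAX.

    graph maps each vertex u to a list of (v, q) pairs, where q = 1 - p(e) is
    the probability that road e = (u, v) is open. Returns (o, parent), where
    o[v] is the highest probability that some s-to-v path is entirely open;
    vertices that can only be reached with probability 0 are omitted.
    """
    o = {s: 1.0}             # INITIALIZE-SINGLE-SOURCE: o[s] = 1, others 0
    parent = {s: None}
    heap = [(-1.0, s)]       # heapq is a min-heap, so store -o[v] as the key
    done = set()
    while heap:
        _, u = heapq.heappop(heap)   # EXTRACT-MAX on o[]
        if u in done:
            continue                 # stale entry
        done.add(u)
        for v, q in graph.get(u, []):
            candidate = o[u] * q     # RELAX: multiply instead of add
            if candidate > o.get(v, 0.0):
                o[v] = candidate
                parent[v] = u
                heapq.heappush(heap, (-candidate, v))
    return o, parent
```

The design choice mirrors the text: only the initialization, the extraction order, and the relaxation operator change, while the rest of Dijkstra's loop is untouched.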
(b) Many highways are off-limits for trucks that weigh more than a given threshold. For a given highway e ∈ E, let w(e) ∈ ℝ⁺ denote the weight limit and let l(e) ∈ ℝ⁺ denote the highway's length. Give an efficient algorithm that calculates: 1) the heaviest truck that can be sent from s to t, and 2) the shortest path this truck can take.
Solution:
First, we modify Dijkstra's algorithm to find the heaviest truck that can be sent from s to t. The weight limit w(e) is used as the edge weight for e. There are three modifications to the algorithm:
1. In INITIALIZE-SINGLE-SOURCE, assign a value of ∞ to the source vertex and a value of 0 to all other vertices.
2. Instead of EXTRACT-MIN, use EXTRACT-MAX (and a supporting data structure) to see which vertex to visit. That is, first explore those paths which support the heaviest trucks.
3. In the RELAX step, use min in place of addition. That is, maintain the minimum weight limit encountered on a given path instead of the total path length from the source.
As in Part (a), the proof of correctness follows that of Dijkstra's algorithm. Since the min operator is associative, the optimal paths exhibit optimal substructure and support the triangle inequality. There are no analogues of negative-weight cycles because the weight supported by a path can never increase as the path becomes longer (and we are searching for the heaviest weight possible).
Given the weight of the heaviest truck that can pass from s to t, we can find the shortest path as follows. Simply remove from the graph all edges whose weight limit is less than the weight of the heaviest truck. Then, run Dijkstra's algorithm (unmodified), using the lengths l(e) as edge weights, to find the shortest path.
The overall running time of our algorithms is that of Dijkstra's algorithm: O(E + V lg V).
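Both phases of part (b) can be sketched as follows, assuming each highway is given as a (neighbor, weight limit w(e), length l(e)) triple in a dictionary of lists; `heaviest_truck_route` is a hypothetical name, and `heapq` stands in for the priority queue, so the bound is O((V + E) lg V) rather than O(E + V lg V).

```python
import heapq
import math


def heaviest_truck_route(graph, s, t):
    """Part (b): heaviest truck from s to t, then its shortest route.

    graph maps u to a list of (v, limit, length) triples for highway (u, v).
    Returns (max_weight, route_length, path); if t is unreachable, returns
    (0, inf, None).
    """
    # Phase 1: Dijkstra with EXTRACT-MAX and min in place of addition.
    cap = {s: math.inf}                  # INITIALIZE: infinity at the source
    heap = [(-math.inf, s)]              # store -cap for a max-priority queue
    done = set()
    while heap:
        _, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, limit, _length in graph.get(u, []):
            candidate = min(cap[u], limit)   # bottleneck along the path
            if candidate > cap.get(v, 0):
                cap[v] = candidate
                heapq.heappush(heap, (-candidate, v))
    W = cap.get(t, 0)
    if W == 0:
        return 0, math.inf, None
    # Phase 2: ordinary Dijkstra on lengths, keeping highways with limit >= W.
    dist, parent = {s: 0}, {s: None}
    heap = [(0, s)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, limit, length in graph.get(u, []):
            if limit >= W and d + length < dist.get(v, math.inf):
                dist[v] = d + length
                parent[v] = u
                heapq.heappush(heap, (dist[v], v))
    path, v = [], t
    while v is not None:
        path.append(v)
        v = parent[v]
    return W, dist[t], path[::-1]
```

In the small example used to test it, a 7-ton truck is the heaviest that can reach t, and the route s, c, t (length 3) beats s, b, t (length 10), while the shorter road through a is excluded by its 5-ton limit.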
(c) Consider a variant of (b) in which trucks must make strictly eastward progress with
each city they visit. Adjust your algorithm to exploit this property and analyze the
runtime.
Solution:
Remove from the graph any edges that do not make eastward progress. Because we always go eastward, there are no cycles in this graph. Thus, we can use DAG-SHORTEST-PATHS to solve the problem in Θ(V + E) time. We need only modify the INITIALIZE-SINGLE-SOURCE and RELAX procedures as in (b).
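For part (c), the sketch below replaces both Dijkstra passes with relaxation in topological order. It assumes each city has a hypothetical `east` coordinate used to test for strictly eastward progress, and it computes the topological order with Kahn's algorithm instead of the DFS-based topological sort that DAG-SHORTEST-PATHS uses in CLRS; both are Θ(V + E). The name `eastward_route` is illustrative.

```python
from collections import deque
import math


def eastward_route(cities, roads, east, s, t):
    """Part (c): both phases of part (b) by DAG relaxation in Theta(V + E).

    cities: vertices; roads: list of (u, v, limit, length) tuples;
    east: dict vertex -> east coordinate (hypothetical input).
    Returns (max_weight, route_length, path) as in part (b).
    """
    cities = list(cities)
    adj = {v: [] for v in cities}
    indeg = {v: 0 for v in cities}
    for u, v, limit, length in roads:
        if east[v] > east[u]:                # keep eastward roads only: a DAG
            adj[u].append((v, limit, length))
            indeg[v] += 1
    # Topological order via Kahn's algorithm.
    order, queue = [], deque(v for v in cities if indeg[v] == 0)
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _, _ in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    # Phase 1: modified INITIALIZE and RELAX (min in place of +, keep the max).
    cap = {v: 0 for v in cities}
    cap[s] = math.inf
    for u in order:
        for v, limit, _ in adj[u]:
            cap[v] = max(cap[v], min(cap[u], limit))
    W = cap[t]
    if W == 0:
        return 0, math.inf, None
    # Phase 2: unmodified DAG shortest paths on roads with limit >= W.
    dist = {v: math.inf for v in cities}
    parent = {v: None for v in cities}
    dist[s] = 0
    for u in order:
        for v, limit, length in adj[u]:
            if limit >= W and dist[u] + length < dist[v]:
                dist[v] = dist[u] + length
                parent[v] = u
    path, v = [t], t
    while v != s:
        v = parent[v]
        path.append(v)
    return W, dist[t], path[::-1]
```

Dropping the westward road b→a is what breaks the cycle between a and b and makes the single topological pass valid.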
Problem 6-2. Constructing Construction Schedules
Consider a set of n jobs to be completed during the construction of a new office building. For each i ∈ {1, 2, . . . , n}, a schedule assigns a time x_i ≥ 0 for job i to be started. There are some constraints on the schedule:
1. For each i, j ∈ {1, 2, . . . , n}, we denote by A[i, j] the minimum latency from the start of job i to the start of job j. For example, since it takes a day for concrete to dry, construction of the walls must begin at least one day after pouring the foundation. The constraint on the schedule is:
$$\forall i, j \in \{1, 2, \ldots, n\} : x_i + A[i, j] \le x_j \tag{1}$$
If there is no minimum latency between jobs i and j, then A[i, j] = −∞.
2. For each i, j ∈ {1, 2, . . . , n}, we denote by B[i, j] the maximum latency from the start of job i to the start of job j. For example, weatherproofing must be added no later than one week after an exterior wall is erected. The constraint on the schedule is:
$$\forall i, j \in \{1, 2, \ldots, n\} : x_i + B[i, j] \ge x_j \tag{2}$$
If there is no maximum latency between jobs i and j, then B[i, j] = ∞.
(a) Show how to model the latency constraints as a set of linear difference equations. That is, given A[1 . . n, 1 . . n] and B[1 . . n, 1 . . n], construct a matrix C[1 . . n, 1 . . n] such that the following constraints are equivalent to Equations (1) and (2):
$$\forall i, j \in \{1, 2, \ldots, n\} : x_i - x_j \le C[i, j] \tag{3}$$
Solution:
Re-arranging Equation (1) yields:
$$\forall i, j \in \{1, 2, \ldots, n\} : x_i - x_j \le -A[i, j] \tag{4}$$
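Re-arranging Equation (2) yields:
$$\begin{aligned} \forall i, j \in \{1, 2, \ldots, n\} &: x_i - x_j \ge -B[i, j] \\ \forall i, j \in \{1, 2, \ldots, n\} &: x_j - x_i \le B[i, j] \\ \forall i, j \in \{1, 2, \ldots, n\} &: x_i - x_j \le B[j, i] \end{aligned} \tag{5}$$
The last line follows by exchanging the names of i and j. Equations (4) and (5) are equivalent to Equation (3) if we set:
$$\forall i, j \in \{1, 2, \ldots, n\} : C[i, j] = \min(-A[i, j], B[j, i])$$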
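Part (a)'s construction can be sketched in a few lines of Python, assuming 0-indexed lists of lists for A and B, with `-math.inf` and `math.inf` marking missing latencies as in the problem statement; `difference_matrix` is an illustrative name.

```python
import math


def difference_matrix(A, B):
    """Build C with C[i][j] = min(-A[i][j], B[j][i]).

    Then x_i - x_j <= C[i][j] encodes both x_i + A[i][j] <= x_j and
    x_i + B[i][j] >= x_j. Missing latencies are A = -inf and B = +inf,
    so an entry C[i][j] = +inf means "no constraint".
    """
    n = len(A)
    return [[min(-A[i][j], B[j][i]) for j in range(n)] for i in range(n)]


# Job 1 must start at least 1 and at most 7 time units after job 0.
inf = math.inf
C = difference_matrix([[-inf, 1], [-inf, -inf]], [[inf, 7], [inf, inf]])
```

Here C[0][1] = −1 encodes x_0 − x_1 ≤ −1 (the minimum latency), and C[1][0] = 7 encodes x_1 − x_0 ≤ 7 (the maximum latency).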
(b) Show that the Bellman-Ford algorithm, when run on the constraint graph corresponding to Equation (3), minimizes the quantity (max{x_i} − min{x_i}) subject to Equation (3) and the constraint x_i ≤ 0 for all x_i.
Solution:
Recall that in a constraint graph, a constraint x_j − x_i ≤ b becomes an edge from vertex v_i to vertex v_j with weight b; thus each constraint x_i − x_j ≤ C[i, j] of Equation (3) becomes an edge from vertex v_j to vertex v_i with weight C[i, j]. Also, an extra vertex s is added with a 0-weight edge from s to each vertex v ∈ V. If Bellman-Ford detects a negative-weight cycle in this graph, then the constraints are unsatisfiable. We thus focus on the case in which there are no negative-weight cycles. The proof takes the form of a lemma and a theorem.
Lemma 1 When Bellman-Ford is run on the constraint graph, max{x_i} = 0.
Proof. Let p = ⟨s, v_1, . . . , v_k⟩ be the shortest path from vertex s to vertex v_k as reported by Bellman-Ford when run over the constraint graph. By the optimal substructure of shortest paths, ⟨s, v_1⟩ must be the shortest path from s to v_1. By construction, the edge from s to v_1 has a weight of 0. Thus x_1 = δ(s, v_1) = w(s, v_1) = 0. In combination with the constraint x_i ≤ 0 for all x_i, this implies that max_i x_i = 0.
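To see Lemma 1 on a small instance, here is a sketch of Bellman-Ford on the constraint graph, assuming C is a 0-indexed list of lists with `math.inf` for absent constraints; `solve_difference_constraints` is a hypothetical name. It follows the edge direction dictated by Equation (3): the constraint x_i − x_j ≤ C[i][j] yields an edge from v_j to v_i.

```python
import math


def solve_difference_constraints(C):
    """Bellman-Ford on the constraint graph for x_i - x_j <= C[i][j].

    Each finite C[i][j] becomes an edge v_j -> v_i of weight C[i][j], and an
    extra source s gets a 0-weight edge to every v_i. Returns the list
    x_i = delta(s, v_i), or None if a negative-weight cycle makes the
    constraints unsatisfiable.
    """
    n = len(C)
    edges = [(j, i, C[i][j]) for i in range(n) for j in range(n)
             if C[i][j] != math.inf]
    s = n                                   # index of the extra source vertex
    edges += [(s, i, 0) for i in range(n)]
    dist = [math.inf] * n + [0]
    for _ in range(n):                      # |V| - 1 = n passes
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    for u, v, w in edges:                   # one extra pass detects cycles
        if dist[u] + w < dist[v]:
            return None
    return dist[:n]
```

With C = [[∞, −1], [7, ∞]] it returns x = [−1, 0]: the maximum is 0, as Lemma 1 states, and the spread of 1 is the smallest that the constraint x_0 − x_1 ≤ −1 allows.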
Theorem 2 When Bellman-Ford is run on the constraint graph, it minimizes the quantity (max{x_i} − min{x_i}).
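Proof. Since max{x_i} = 0 (by Lemma 1), it suffices to show that Bellman-Ford maximizes min{x_i}. Let x_k = min{x_i} in the solution produced by Bellman-Ford, and consider the shortest path p = ⟨v_0, v_1, . . . , v_k⟩ from s = v_0 to v_k. For i ≥ 1, the edge (v_i, v_{i+1}) comes from the constraint x_{i+1} − x_i ≤ C[i + 1, i] of Equation (3), so it has weight C[i + 1, i], and the weight of this path is
$$w(p) = \sum_{i=0}^{k-1} w(v_i, v_{i+1}) = w(v_0, v_1) + \sum_{i=1}^{k-1} C[i+1, i] = \sum_{i=1}^{k-1} C[i+1, i].$$
The path corresponds to the following set of constraints:
$$\begin{aligned} x_1 - x_0 &\le 0 \\ x_2 - x_1 &\le C[2, 1] \\ x_3 - x_2 &\le C[3, 2] \\ &\;\;\vdots \\ x_k - x_{k-1} &\le C[k, k-1] \end{aligned}$$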