Divide-and-Conquer Dr. Yingwu Zhu P65-74, p83-88, p93-96, p170-180.

50
Divide-and-Conquer Divide-and-Conquer Dr. Yingwu Zhu Dr. Yingwu Zhu P65-74, p83-88, p93-96, P65-74, p83-88, p93-96, p170-180 p170-180

Transcript of Divide-and-Conquer Dr. Yingwu Zhu P65-74, p83-88, p93-96, p170-180.

Divide-and-ConquerDivide-and-Conquer

Dr. Yingwu ZhuDr. Yingwu Zhu

P65-74, p83-88, p93-96, p170-180P65-74, p83-88, p93-96, p170-180

Divide-and-ConquerDivide-and-Conquer

The most-well known algorithm design technique:1. Divide instance of problem into two or more

smaller instances• Recursive case

2. Solve smaller instances independently and recursively • When to stop? – base case

3. Obtain solution to original (larger) instance by combining these solutions

Divide-and-Conquer Technique (cont.)Divide-and-Conquer Technique (cont.)

subproblem 2 of size n/2

subproblem 1 of size n/2

a solution to subproblem 1

a solution tothe original problem

a solution to subproblem 2

a problem of size n

Divide

A General TemplateA General Template//S is a large problem with input size of nAlgorithm divide_and_conquer(S) if (S is small enough to handle) solve it //base case: conquer else split S into two (equally-sized) subproblems S1 and S2 divide_and_conquer(S1) divide_and_conquer(S2) combine solutions to S1 and S2 endifEnd

General Divide-and-Conquer Recurrence General Divide-and-Conquer Recurrence

• Recursive algorithms are a natural fit for divide-and-conquer– Distinguish from Dynamic Programming

• Recall in Lecture 1, algorithm efficiency analysis for recursive algorithms– Key: Recurrence Relation– Solve: backward substitution, often cumbersome!

General Divide-and-Conquer RecurrenceGeneral Divide-and-Conquer Recurrence

T(n) = aT(n/b) + f (n) where f(n) (nd), d 0, f(n) accounts for the time spent

on dividing the problem into smaller ones and combining their solutions

Master Theorem: If a < bd, T(n) (nd) If a = bd, T(n) (nd log n)

If a > bd, T(n) (nlog b a ) Note: The same results hold with O instead of .

Examples: T(n) = 4T(n/2) + n T(n) ? T(n) = 4T(n/2) + n2 T(n) ? T(n) = 4T(n/2) + n3 T(n) ?

Divide-and-Conquer ExamplesDivide-and-Conquer Examples

• Sorting: mergesort and quicksortSorting: mergesort and quicksort

• Maximum subarray pronblemMaximum subarray pronblem

• Multiplication of large integersMultiplication of large integers

• Closest-pair problemClosest-pair problem

• Matrix multiplication: Strassen’s algorithm (reading)Matrix multiplication: Strassen’s algorithm (reading)

MergesortMergesort

• Question #1: what makes mergesort distinct from many other sorting algorithms?

internal and external algorithm

• Question #2: How to design mergesort using divide-and-conquer technique?

MergesortMergesort• Split array A[0..n-1] in two about equal halves and

make copies of each half in arrays B and C• Sort arrays B and C recursively

• Q: when to stop?• Merge sorted arrays B and C into array A as

follows:– Repeat the following until no elements remain in one of the arrays:

• compare the first elements in the remaining unprocessed portions of the arrays

• copy the smaller of the two into A, while incrementing the index indicating the unprocessed portion of that array

– Once all elements in one of the arrays are processed, copy the remaining unprocessed elements from the other array into A.

Pseudocode of MergesortPseudocode of Mergesort

Pseudocode of MergePseudocode of Merge

Mergesort

Example

8 3 2 9 7 1 5 4

8 3 2 9 7 1 5 4

8 3 2 9 7 1 5 4

8 3 2 9 7 1 5 4

3 8 2 9 1 7 4 5

2 3 8 9 1 4 5 7

1 2 3 4 5 7 8 9

Analysis of MergesortAnalysis of Mergesort• Time efficiency by recurrence reltation: T(n) = 2T(n/2) + f(n) n-1 comparisons in merge operation for worst case! T(n) = Θ(n log n) • Number of comparisons in the worst case is close to

theoretical minimum for comparison-based sorting: log2 n! ≈ n log2 n - 1.44n (Section 11.2)

• Space requirement: Θ(nlogn) (not in-place)

• Can be implemented without recursion (bottom-up)

Mergsort: A big pictureMergsort: A big picture

• Problem: Assume you want to sort X terabytes (even perabytes) of data using a cluster of M (even thousands) computers.

• Solution?

Similar to Search Engines in some aspects: Similar to Search Engines in some aspects: data partitioning!data partitioning!

Exercises #1Exercises #1

a. Write a pseudocode for a divide-and-conquer algorithm for finding the position of the largest element in an array of n numbers.

b. Set up and solve a recurrence relation for the number of key comparisons made by your algorithm.

c. How does this algorithm compare with the brute-force algorithm for this problem?

QuicksortQuicksort• Select a pivot (partitioning element) – here, the first

element for simplicity!• Rearrange the list so that all the elements in the first s

positions are smaller than or equal to the pivot and all the elements in the remaining n-s positions are larger than the pivot (see next slide for an algorithm)

• Exchange the pivot with the last element in the first (i.e., ) subarray — the pivot is now in its final position

• Sort the two subarrays recursively

p

A[i]p A[i]>p

QuicksortQuicksort

• Basic operation: split/divide– Differ from the divide operation in mergesortmergesort– What is the major difference?

QuicksortQuicksort

• Basic operation: split/divide– Differ from the divide operation in mergesortmergesort– What is the major difference?

• Each split will place the pivot in the right position, and the left sublist << the right sublist

• No explicit merge

Partitioning AlgorithmPartitioning Algorithm

A[i] > pA[j] <= p

Quicksort ExampleQuicksort Example

8, 2, 13, 5, 14, 3, 7

Analysis of QuicksortAnalysis of Quicksort• Best case: T(n) =? • Worst case: T(n) =?

Analysis of QuicksortAnalysis of Quicksort• Best case: split in the middle — Θ(n log n) • Worst case: sorted array! — Θ(n2) • Average case: random arrays — Θ(n log n)

– Assume the split can happen in each position with equal probability! See textbook for details!

• Improvements:– better pivot selection: median-of-three partitioning – switch to insertion sort on small sublists– elimination of recursionThese combination makes 20-25% improvement

• Considered the method of choice for internal sorting of large files (n ≥ 10000)

Randomized Quicksort

• Many regard the randomized quicksort as the sorting algorithm of choice for large inputs

• Random sampling:– Select a randomly-chosen element from the

subarray as the pivot

• Expected running time: O(nlogn) – Chapter 7.4, p180

Randomized Quicksort

randomized_partition(A[l..r])int pos = l + rand() % (r-l+1);swap(A[l], A[pos]);partition(A[l..r]); //call default one

Questions for QuicksortQuestions for Quicksort

• Q1: How to implement the median-of-three rule by reusing the previous implementation of the split operation?

• Q2: How to implement a non-recursive quicksort?

Divide-and-Conquer ExamplesDivide-and-Conquer Examples

• Sorting: mergesort and quicksortSorting: mergesort and quicksort

• Binary search (degenerated) Binary search (degenerated)

• Binary tree traversalsBinary tree traversals

• Multiplication of large integersMultiplication of large integers

• Matrix multiplication: Strassen’s algorithmMatrix multiplication: Strassen’s algorithm

• Closest-pair algorithmClosest-pair algorithm

Maximum Subarray Problem

Problem: given an array of n numbers, find the (a) contiguous subarray whose sum has the largest value.

Application: an unrealistic stock market game, in which you decide when to buy and see a stock, with full knowledge of the past and future. The restriction is that you can perform just one buy followed by a sell. The buy and sell both occur right after the close of the market.

The interpretation of the numbers: each number represents the stock value at closing on any particular day.

Maximum Subarray Problem

Example:

Maximum Subarray Problem

Another Example: buying low and selling high, even with perfect knowledge, is not trivial:

A brute-force solution

• O(n2) Solution• Considering Cn

2 pairs

• Not a pleasant prospect if we are rummaging through long time-series (Who told you it was easy to get rich???), even if you are allowed to post-date your stock options…

A Better Solution: Max Subarray

Transformation: Instead of the daily price, let us consider the daily change: A[i] is the difference between the closing value on day i and that on day i-1.The problem becomes that of finding a contiguous subarray the sum of whose values is maximum.•On a first look this seems even worse: roughly the same number of intervals (one fewer, to be precise), and the requirement to add the values in the subarray rather than just computing a difference: (n3)?

Max Subarray

Max SubarrayHow do we divide?We observe that a maximum contiguous subarray A[i…j] must be located as follows:1.It lies entirely in the left half of the original array: [low…mid];2.It lies entirely in the right half of the original array: [mid+1…high];3.It straddles the midpoint of the original array: i ≤ mid < j.

Max Subarray: Divide & Conquer• The “left” and “right” subproblems are smaller versions of the

original problem, so they are part of the standard Divide & Conquer recursion.

• The “middle” subproblem is not, so we will need to count its cost as part of the “combine” (or “divide”) cost.– The crucial observation (and it may not be entirely obvious) is that we

can find the maximum crossing subarray in time linear in the length of the A[low…high] subarray.

How? A[i,…,j] must be made up of A[i…mid] and A[m+1…j] – so we find the largest A[i…mid] and the largest A[m+1…j] and combine them.

The middle subproblem

Algorithms: Max Subarray

Max Subarray: Algorithm Efficiency

We finally have:

The recurrence has the same form as that for MERGESORT, and thus we should expect it to have the same solution T(n) = (n lg n).

This algorithm is clearly substantially faster than any of the brute-force methods. It required some cleverness, and the programming is a little more complicated – but the payoff is large.

T n 1 if n 1,

2T n /2 n if n 1.

Multiplication of Large Integers Multiplication of Large Integers Consider the problem of multiplying two (large) n-digit integers represented by arrays of their digits such as:

A = 12345678901357986429 B = 87654321284820912836

Multiplication of Large Integers Multiplication of Large Integers Consider the problem of multiplying two (large) n-digit integers represented by arrays of their digits such as:

A = 12345678901357986429 B = 87654321284820912836

The grade-school algorithm:a1 a2 … an

b1 b2 … bn

(d10) d11d12 … d1n

(d20) d21d22 … d2n

… … … … … … … (dn0) dn1dn2 … dnn

Efficiency: n2 one-digit multiplications

Multiplication of Large Integers Multiplication of Large Integers Consider the problem of multiplying two (large) n-digit integers represented by arrays of their digits such as:

A = 12345678901357986429 B = 87654321284820912836

Discussion: How to apply “divide-and-conquer” to this problem?

First CutFirst CutA small example: A B where A = 2135 and B = 4014A = (21·102 + 35), B = (40 ·102 + 14)So, A B = (21 ·102 + 35) (40 ·102 + 14) = 21 40 ·104 + (21 14 + 35 40) ·102 + 35 14

In general, if A = A1A2 and B = B1B2 (where A and B are n-digit, A1, A2, B1, B2 are n/2-digit numbers),A B = A1 B1·10n + (A1 B2 + A2 B1) ·10n/2 + A2 B2

Recurrence for the number of one-digit multiplications T(n): T(n) = 4T(n/2), T(1) = 1Solution: T(n) = n2

Second CutSecond Cut

A B = A1 B1·10n + (A1 B2 + A2 B1) ·10n/2 + A2 B2

The idea is to decrease the number of multiplications from 4 to 3:

(A1 + A2 ) (B1 + B2 ) = A1 B1 + (A1 B2 + A2 B1) + A2 B2,

i.e., (A1 B2 + A2 B1) = (A1 + A2 ) (B1 + B2 ) - A1 B1 - A2 B2, which requires only 3 multiplications at the expense of (4-1) extra add/sub.

Recurrence for the number of multiplications T(n): T(n) = 3T(n/2), T(1) = 1Solution: T(n) = 3log 2n = nlog 23 ≈ n1.585

44

• To multiply two n-digit integers:– Add two ½n digit integers.– Multiply three ½n-digit integers.– Add, subtract, and shift ½n-digit integers to obtain result.

Integer Multiplication

A B CA C

Large-Integer MultiplicationLarge-Integer Multiplication

• Questions: • What if two large numbers have different number

of digits?• What if n is an odd number?

Closest-Pair Problem Closest-Pair Problem

• S is a set of n points Pi=(xi, yi) in the plane

• For simplicity, n is a power of two• Without loss of generality, we assume points

are ordered by their x coordinates• Discussion: How to apply divide-and-conquer?

Closest-Pair Problem by Divide-and-ConquerClosest-Pair Problem by Divide-and-Conquer

Step 1 Divide the points given into two subsets S1 and S2 by a vertical line x = c so that half the points lie to the left or on the

line and half the points lie to the right or on the line.

Closest Pair by Divide-and-Conquer (cont.)Closest Pair by Divide-and-Conquer (cont.)

Step 2 Find recursively the closest pairs for the left and right subsets.

Step 3 Set d = min{d1, d2} We can limit our attention to the points in the symmetric

vertical strip of width 2d as possible closest pair. Let C1 and C2 be the subsets of points in the left subset S1 and of the right subset S2, respectively, that lie in this vertical strip. The points in C1 and C2 are stored in increasing order of their y coordinates, which is maintained by merging during the execution of the next step.

Step 4 For every point P(x,y) in C1, we inspect points in C2 that may be closer to P than d. There can be no more than 6 such points (because d ≤ d2)!

Closest Pair by Divide-and-Conquer: Worst CaseClosest Pair by Divide-and-Conquer: Worst Case

The worst case scenario is depicted below:

Efficiency of the Closest-Pair AlgorithmEfficiency of the Closest-Pair Algorithm

Running time of the algorithm is described by

T(n) = 2T(n/2) + f(n), where f(n) O(n)

By the Master Theorem (with a = 2, b = 2, d = 1) T(n) O(n log n)