C++ Programming: Program Design Including Data Structures, Third Edition


Page 1: C++ Programming:  Program Design Including Data Structures,  Third Edition

C++ Programming: Program Design Including Data Structures, Third Edition

Chapter 19: Searching and Sorting Algorithms

Page 2: C++ Programming:  Program Design Including Data Structures,  Third Edition

Objectives

In this chapter you will:
• Learn the various search algorithms
• Explore how to implement the sequential and binary search algorithms
• Discover how the sequential and binary search algorithms perform
• Become aware of the lower bound on comparison-based search algorithms
• Learn the various sorting algorithms
• Explore how to implement the bubble, selection, insertion, quick, and merge sorting algorithms
• Discover how the sorting algorithms discussed in this chapter perform

Page 3: C++ Programming:  Program Design Including Data Structures,  Third Edition

• The most important operation that can be performed on a list is searching. Using a search algorithm, you can do the following:
− Determine whether a particular item is in the list.
− If the data is specially organized (for example, sorted), find the location in the list where a new item can be inserted.
− Find the location of an item to be deleted.

Page 4: C++ Programming:  Program Design Including Data Structures,  Third Edition

• The searching and sorting algorithms that we describe are generic.
• Because searching and sorting require comparisons of data, the algorithms should work on any type of data that provides appropriate functions to compare data items.
• Data can be organized with the help of an array or a linked list.
• You can create an array of data items or you can use the class unorderedLinkedList to organize data.
• The algorithms that we describe should work on either organization.
• We write function templates to implement a particular algorithm.
• All algorithms described in this chapter, with the exception of the merge sort algorithms, are for array-based lists.
• We show how to use the searching and sorting algorithms on objects of the class unorderedArrayListType.
• We place all the array-based searching and sorting functions in the header file searchSortAlgorithms.h.
• If you need to use a particular searching and/or sorting function designed in this chapter, your program can include this header file and use that function.

Page 5: C++ Programming:  Program Design Including Data Structures,  Third Edition

• Associated with each item in a data set is a special member that uniquely identifies the item in the data set.
• This unique member of the item is called the key of the item.
• The keys of the items in the data set are used in such operations as searching, sorting, inserting, and deleting.
• When analyzing searching and sorting algorithms, the key comparisons refer to comparing the key of the search item with the key of an item in the list.
• The number of key comparisons refers to the number of times the key of the search item (in algorithms such as searching and sorting) is compared with the keys of the items in the list.

Page 6: C++ Programming:  Program Design Including Data Structures,  Third Edition

• Sequential search does not require the list elements to be in any particular order.
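The idea can be sketched as a function template. This is a minimal illustration, not the chapter's own listing: the name seqSearch, its parameter list, and the convention of returning -1 for a missing item are assumptions made here.

```cpp
// Sketch of a sequential search (seqSearch is an illustrative name).
// Returns the index of the first element equal to item, or -1 if the
// item is not in list[0]...list[length - 1].
template <class elemType>
int seqSearch(const elemType list[], int length, const elemType& item)
{
    for (int loc = 0; loc < length; loc++)
        if (list[loc] == item)   // one key comparison per iteration
            return loc;          // stop as soon as the item is found

    return -1;                   // item was compared with every element
}
```

Because the elements are examined strictly left to right, the function works on a list in any order.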

Page 7: C++ Programming:  Program Design Including Data Structures,  Third Edition

• The statements before and after the loop are executed only once, and hence require very little computer time.
• The statements in the for loop are the ones that are repeated several times. For each iteration of the loop, the search item is compared with an element in the list, and a few other statements are executed, including some other comparisons.
• The loop terminates as soon as the search item is found in the list.
• Execution of the other statements in the loop is directly related to the outcome of the key comparison.
• Different programmers might implement the same algorithm differently, although the number of key comparisons would typically be the same.
• The speed of a computer also affects the time an algorithm takes to perform, but it does not affect the number of key comparisons required.
• Therefore, when analyzing a search algorithm, we count the number of key comparisons because this number gives us the most useful information.

Page 8: C++ Programming:  Program Design Including Data Structures,  Third Edition

• Suppose that L is a list of length n.
• If the search item is not in the list, we then compare the search item with every element in the list, making n comparisons. This is an unsuccessful case.
• Suppose that the search item is in the list.
• The number of key comparisons depends on where in the list the search item is located.
• If the search item is the first element of L, we make only one key comparison. This is the best case.
• On the other hand, if the search item is the last element in the list, the algorithm makes n comparisons. This is the worst case.
• To determine the average number of comparisons in the successful case of the sequential search algorithm:
1. Consider all possible cases.
2. Find the number of comparisons for each case.
3. Add the number of comparisons and divide by the number of cases.

Page 9: C++ Programming:  Program Design Including Data Structures,  Third Edition

• If the search item, called the target, is the first element in the list, one comparison is required. If the target is the second element in the list, two comparisons are required. Similarly, if the target is the kth element in the list, k comparisons are required. We assume that the target is equally likely to be any element in the list.

• The following expression gives the average number of comparisons:

$$\frac{1 + 2 + \cdots + n}{n} = \frac{n(n+1)}{2n} = \frac{n+1}{2}$$

• This expression shows that on average, a successful sequential search searches half the list.

• If the list size is 1,000,000, on average, the sequential search makes 500,000 comparisons.

• The sequential search is not efficient for large lists.

Page 10: C++ Programming:  Program Design Including Data Structures,  Third Edition

• Binary search can be applied only to sorted lists
• Uses the “divide and conquer” technique
− Compare search item to middle element
− If the two are equal, the search ends
− If search item is less than middle element, restrict the search to the lower half of the list
− Otherwise search the upper half of the list
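The steps above can be sketched as follows. The function name binarySearch, its signature, and the -1 return value are assumptions for illustration; the list is assumed to be sorted in ascending order.

```cpp
// Sketch of an iterative binary search on a sorted array.
// Returns the index of item, or -1 if item is not in the list.
template <class elemType>
int binarySearch(const elemType list[], int length, const elemType& item)
{
    int first = 0;
    int last = length - 1;

    while (first <= last)
    {
        int mid = first + (last - first) / 2;   // middle of the current sublist

        if (list[mid] == item)                  // first key comparison
            return mid;
        else if (item < list[mid])              // second key comparison
            last = mid - 1;                     // continue in the lower half
        else
            first = mid + 1;                    // continue in the upper half
    }

    return -1;                                  // sublist became empty
}
```

Each pass through the loop makes at most two key comparisons and discards about half of the remaining elements.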

Page 12: C++ Programming:  Program Design Including Data Structures,  Third Edition

Search item = 89

Page 13: C++ Programming:  Program Design Including Data Structures,  Third Edition

Search item = 34

Page 14: C++ Programming:  Program Design Including Data Structures,  Third Edition

Search item = 22

Page 15: C++ Programming:  Program Design Including Data Structures,  Third Edition

Binary Search (continued)

• Every iteration cuts size of search list in half
• If list L has 1000 items
− At most 11 iterations needed to determine if an item x is in list
• Every iteration makes 2 key (item) comparisons
− Binary search makes at most 22 key comparisons to determine if x is in L
• Sequential search makes 500 key comparisons (average) if x is in L for the same size list

Page 16: C++ Programming:  Program Design Including Data Structures,  Third Edition

• Suppose that L is a sorted list of size n.
• Suppose that n is a power of 2, that is, $n = 2^m$, for some nonnegative integer m.
• After each iteration of the for loop, about half the elements are left to search.
• For example, after the first iteration the search sublist is of the size about $n/2 = 2^{m-1}$.
• It is easy to see that the maximum number of iterations of the for loop is about m + 1. Also, $m = \log_2 n$.
• Each iteration makes 2 key comparisons.
• The maximum number of comparisons to determine whether an element x is in L is $2(m + 1) = 2(\log_2 n + 1) = 2\log_2 n + 2$.

Page 17: C++ Programming:  Program Design Including Data Structures,  Third Edition

• Just as a problem is analyzed before writing the algorithm and the computer program, after an algorithm is designed it should also be analyzed.

• There are various ways to design a particular algorithm.

• Certain algorithms take very little computer time to execute, while others take a considerable amount of time.

Page 18: C++ Programming:  Program Design Including Data Structures,  Third Edition

• Lines 1 to 6 each have 1 operation, << or >>.

• Line 7 has 1 operation, >=.

• Either Line 8 or Line 9 executes; each has 1 operation.

• There are 3 operations, <<, in Line 11.

• The total number of operations executed in this code is 6 + 1 + 1 + 3 = 11.

Page 20: C++ Programming:  Program Design Including Data Structures,  Third Edition

• This algorithm has 5 operations (Lines 1 through 5) before the while loop. Similarly, there are 9 or 8 operations after the while loop, depending on whether Line 11 or Line 13 executes.

• Line 5 has 1 operation, and there are 4 operations within the while loop (Lines 6 through 8).

• Lines 5 through 8 therefore have 5 operations. If the while loop executes 10 times, these 5 operations execute 10 times, plus one extra operation is executed at Line 5 to terminate the loop. Therefore, the number of operations executed from Lines 5 through 8 is 5 × 10 + 1 = 51.

• If the while loop executes 10 times, the total number of operations executed is:

5 × 10 + 1 + 5 + 9 or 5 × 10 + 1 + 5 + 8

that is,

5 × 10 + 15 or 5 × 10 + 14

• If the while loop executes n times, the number of operations executed is:

5n + 15 or 5n + 14

• In these expressions, for very large values of n, the term 5n becomes the dominating term and the terms 15 and 14 become negligible.

Page 21: C++ Programming:  Program Design Including Data Structures,  Third Edition

• Table 19-4 shows how certain functions grow as the parameter n grows.

• Suppose that the problem size is doubled.

• If the number of basic operations is a function $f(n) = n^2$, the number of basic operations is quadrupled, because $(2n)^2 = 4n^2$.

• If the number of basic operations is a function $f(n) = 2^n$, then the number of basic operations is squared, because $2^{2n} = (2^n)^2$.

• However, if the number of operations is a function $f(n) = \log_2 n$, the change in the number of basic operations is insignificant, because $\log_2(2n) = \log_2 n + 1$.

Page 30: C++ Programming:  Program Design Including Data Structures,  Third Edition

Sorting a List: Bubble Sort

• Suppose list[0]...list[n - 1] is a list of n elements, indexed 0 to n - 1

• Bubble sort algorithm:

− In a series of n - 1 iterations, compare successive elements, list[index] and list[index + 1]

− If list[index] is greater than list[index + 1], then swap them
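A minimal sketch of these steps follows. The name bubbleSort and the (list, length) call shape appear in this chapter; the loop details and the use of std::swap are assumptions made for illustration.

```cpp
#include <utility>  // std::swap

// Sketch of bubble sort on list[0]...list[length - 1].
// After iteration i, the i largest elements are in their final positions,
// so the inner loop can stop one element earlier each time.
template <class elemType>
void bubbleSort(elemType list[], int length)
{
    for (int iteration = 1; iteration < length; iteration++)       // n - 1 iterations
        for (int index = 0; index < length - iteration; index++)
            if (list[index] > list[index + 1])                     // out of order
                std::swap(list[index], list[index + 1]);
}
```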

Page 35: C++ Programming:  Program Design Including Data Structures,  Third Edition

• Suppose a list L of length n is to be sorted using bubble sort.
• Consider the function bubbleSort.
• This function contains nested for loops.
• The outer loop executes n – 1 times.
• For each iteration of the outer loop, the inner loop executes a certain number of times. Let us consider the first iteration of the outer loop.
• During the first iteration of the outer loop, the number of iterations of the inner loop is n – 1. So there are n – 1 comparisons.
• During the second iteration of the outer loop, the number of iterations of the inner loop is n – 2, and so on. Thus, the total number of comparisons is

$$(n-1) + (n-2) + \cdots + 2 + 1 = \frac{n(n-1)}{2} = O(n^2)$$

Page 36: C++ Programming:  Program Design Including Data Structures,  Third Edition

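• In the worst case, every comparison is followed by a swap, and each swap takes three assignments, so the number of assignments is $3 \cdot \frac{n(n-1)}{2} = O(n^2)$.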

Page 37: C++ Programming:  Program Design Including Data Structures,  Third Edition

template <class elemType>
void unorderedArrayListType<elemType>::sort()
{
    bubbleSort(list, length);
}

Page 38: C++ Programming:  Program Design Including Data Structures,  Third Edition

• Selection sort: rearrange list by selecting an element and moving it to its proper position

• Find the smallest (or largest) element and move it to the beginning (end) of the list

Page 39: C++ Programming:  Program Design Including Data Structures,  Third Edition

Selection Sort (continued)

• On successive passes, locate the smallest item in the list starting from the next element
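The passes described above can be sketched as follows. The function names minLocation, swap, and selectionSort are the ones this chapter refers to; their exact signatures here are assumptions for illustration.

```cpp
// Return the index of the smallest element in list[first]...list[last].
// For a sublist of length k this makes k - 1 key comparisons.
template <class elemType>
int minLocation(const elemType list[], int first, int last)
{
    int minIndex = first;

    for (int loc = first + 1; loc <= last; loc++)
        if (list[loc] < list[minIndex])
            minIndex = loc;

    return minIndex;
}

// Exchange list[first] and list[second] using three item assignments.
template <class elemType>
void swap(elemType list[], int first, int second)
{
    elemType temp = list[first];
    list[first] = list[second];
    list[second] = temp;
}

// Sketch of selection sort: on each pass, move the smallest element of
// the unsorted part list[loc]...list[length - 1] to position loc.
template <class elemType>
void selectionSort(elemType list[], int length)
{
    for (int loc = 0; loc < length - 1; loc++)
    {
        int minIndex = minLocation(list, loc, length - 1);
        swap(list, loc, minIndex);
    }
}
```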

Page 42: C++ Programming:  Program Design Including Data Structures,  Third Edition

• Suppose that a list L of length n is to be sorted using the selection sort algorithm.

• The function swap does three item assignments and is executed n − 1 times.

• The number of item assignments is 3(n − 1) = O(n).

• The key comparisons are made by the function minLocation.

• For a list of length k, the function minLocation makes k − 1 key comparisons. Also, the function minLocation is executed n − 1 times (by the function selectionSort).

• The first time, the function minLocation finds the index of the smallest key item in the entire list and therefore makes n − 1 comparisons.

• The second time, the function minLocation finds the index of the smallest element in the sublist of length n − 1 and makes n − 2 comparisons, and so on.

• The number of key comparisons is as follows:

$$(n-1) + (n-2) + \cdots + 2 + 1 = \frac{n(n-1)}{2} = O(n^2)$$

Page 43: C++ Programming:  Program Design Including Data Structures,  Third Edition

• If n = 1000, the number of key comparisons the selection sort algorithm makes is $\frac{1000 \cdot 999}{2} = 499{,}500$, or roughly 500,000.

Page 44: C++ Programming:  Program Design Including Data Structures,  Third Edition

The insertion sort algorithm sorts the list by moving each element to its proper place.

Page 50: C++ Programming:  Program Design Including Data Structures,  Third Edition

• Let L be a list of length n.
• The for loop executes n – 1 times.
• In the best case, when the list is already sorted, for each iteration of the for loop, the if statement evaluates to false, so there are n – 1 key comparisons.
• In the best case, the number of key comparisons is n – 1 = O(n).
• Let us consider the worst case. In this case, for each iteration of the for loop, the if statement evaluates to true. Moreover, in the worst case, for each iteration of the for loop, the do…while loop executes firstOutOfOrder – 1 times. It follows that in the worst case, the number of key comparisons is:

$$1 + 2 + \cdots + (n-1) = \frac{n(n-1)}{2} = O(n^2)$$
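The for loop, if statement, and do…while loop referred to above can be sketched as follows. The variable name firstOutOfOrder comes from the analysis; the function name insertionSort and the remaining details are assumptions for illustration, not the chapter's own listing.

```cpp
// Sketch of insertion sort on list[0]...list[length - 1].
// list[0]...list[firstOutOfOrder - 1] is always sorted; each iteration
// inserts list[firstOutOfOrder] into its proper place in that part.
template <class elemType>
void insertionSort(elemType list[], int length)
{
    for (int firstOutOfOrder = 1; firstOutOfOrder < length; firstOutOfOrder++)
        if (list[firstOutOfOrder] < list[firstOutOfOrder - 1])  // false if already in place
        {
            elemType temp = list[firstOutOfOrder];
            int location = firstOutOfOrder;

            do
            {
                list[location] = list[location - 1];   // shift larger element right
                location--;
            }
            while (location > 0 && list[location - 1] > temp);

            list[location] = temp;                     // drop the element into the gap
        }
}
```

On an already sorted list the if condition is false every time, which gives the n – 1 comparisons of the best case.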

Page 51: C++ Programming:  Program Design Including Data Structures,  Third Edition

• It can be shown that the average number of key comparisons and the average number of item assignments in an insertion sort algorithm are: $\frac{1}{4}n^2 + O(n) = O(n^2)$.

Page 53: C++ Programming:  Program Design Including Data Structures,  Third Edition

• We can trace the execution of a comparison-based algorithm by using a graph called a comparison tree.

• Let L be a list of n distinct elements, where n > 0.

• For any j and k, where 1 ≤ j ≤ n, 1 ≤ k ≤ n, and j ≠ k, either L[j] < L[k] or L[j] > L[k].

• Because each comparison of the keys has two outcomes, the comparison tree is a binary tree.

• While drawing this figure, we draw each comparison as a circle, called a node.

• The node is labeled as j:k, representing the comparison of L[j] with L[k]. If L[j] < L[k], follow the left branch; otherwise, follow the right branch.

• Figure 19-36 shows the comparison tree for a list of length 3.

• In Figure 19-36, the rectangle, called a leaf, represents the final ordering of the nodes.

Page 55: C++ Programming:  Program Design Including Data Structures,  Third Edition

• We call the top node in the figure the root node.
• The straight line that connects the two nodes is called a branch.
• A sequence of branches from a node, x, to another node, y, is called a path from x to y.
• Associated with each path from the root to a leaf is a unique permutation of the elements of L.
• This uniqueness follows because the sort algorithm only moves the data and makes comparisons.
• For a list of n elements, n > 0, there are n! different permutations. Any one of these n! permutations might be the correct ordering of L. Thus, the comparison tree must have at least n! leaves.
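The leaf count leads to the lower bound on comparison-based sorting. The following is a standard derivation sketched here for illustration; the symbol h (the height of the comparison tree, that is, the worst-case number of comparisons) is introduced for this sketch. A binary tree of height h has at most $2^h$ leaves, so:

```latex
2^{h} \ge n!
\quad\Longrightarrow\quad
h \ge \log_2 (n!)

\log_2 (n!) = \sum_{i=1}^{n} \log_2 i
\;\ge\; \frac{n}{2}\,\log_2 \frac{n}{2}
\;=\; \Omega(n \log_2 n)
```

The middle inequality keeps only the largest n/2 terms of the sum, each of which is at least $\log_2 (n/2)$. Under these assumptions, any sort that works only by comparing keys needs on the order of $n \log_2 n$ comparisons in the worst case.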

Page 56: C++ Programming:  Program Design Including Data Structures,  Third Edition

• The quick sort algorithm uses the divide-and-conquer technique to sort a list.

• The list is partitioned into two sublists, which are then sorted and combined into one list in such a way so that the combined list is sorted.

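• The general algorithm is:
− If the list size is greater than 1, partition the list into two sublists, lowerSublist and upperSublist
− Quick sort lowerSublist
− Quick sort upperSublist
− Combine the sorted lowerSublist and sorted upperSublist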

Page 57: C++ Programming:  Program Design Including Data Structures,  Third Edition

• To partition the list into two sublists, first we choose an element of the list called pivot.

• The pivot is used to divide the list into two sublists: lowerSublist and upperSublist.

• The elements in lowerSublist are smaller than pivot, and the elements in upperSublist are greater than pivot.

Page 58: C++ Programming:  Program Design Including Data Structures,  Third Edition

• There are several ways to determine pivot.

• However, pivot is chosen so that, it is hoped, lowerSublist and upperSublist are of nearly equal size.

• Let us choose the middle element of the list as pivot.

• The partition procedure that we describe partitions this list using pivot as the middle element, in our case 50, as shown in Figure 19-38.

Page 59: C++ Programming:  Program Design Including Data Structures,  Third Edition

The partition algorithm is as follows (we assume that pivot is chosen as the middle element of the list):

1. Determine pivot, and swap pivot with the first element of the list.

Suppose that the index smallIndex points to the last element less than pivot. The index smallIndex is initialized to the first element of the list.

2. For the remaining elements in the list (starting at the second element): If the current element is less than pivot
a. Increment smallIndex.
b. Swap the current element with the array element pointed to by smallIndex.

3. Swap the first element, that is, pivot, with the array element pointed to by smallIndex.
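The three steps can be sketched as follows, together with a recursive driver. The names pivot and smallIndex come from the description above; partitionList, recQuickSort, quickSort, and their signatures are assumptions made for illustration.

```cpp
#include <utility>  // std::swap

// Partition list[first]...list[last] around its middle element and
// return the final position of the pivot.
template <class elemType>
int partitionList(elemType list[], int first, int last)
{
    std::swap(list[first], list[(first + last) / 2]);   // step 1: pivot to the front
    elemType pivot = list[first];
    int smallIndex = first;                              // last element known to be < pivot

    for (int index = first + 1; index <= last; index++)  // step 2
        if (list[index] < pivot)
        {
            smallIndex++;
            std::swap(list[smallIndex], list[index]);
        }

    std::swap(list[first], list[smallIndex]);            // step 3: pivot to its place
    return smallIndex;
}

// Sort list[first]...list[last] by partitioning and recursing on each side.
template <class elemType>
void recQuickSort(elemType list[], int first, int last)
{
    if (first < last)
    {
        int pivotLocation = partitionList(list, first, last);
        recQuickSort(list, first, pivotLocation - 1);    // lower sublist
        recQuickSort(list, pivotLocation + 1, last);     // upper sublist
    }
}

template <class elemType>
void quickSort(elemType list[], int length)
{
    recQuickSort(list, 0, length - 1);
}
```

Because the partition is done in place, no separate combine step is needed: once both sublists are sorted, the whole list is sorted.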

Page 69: C++ Programming:  Program Design Including Data Structures,  Third Edition

• The average-case behavior of a quick sort is $O(n \log_2 n)$. However, the worst-case behavior of a quick sort is $O(n^2)$.
• This section describes the sorting algorithm whose behavior is always $O(n \log_2 n)$.
• Like the quick sort algorithm, the merge sort algorithm uses the divide-and-conquer technique to sort a list.
• A merge sort algorithm also partitions the list into two sublists, sorts the sublists, and then combines the sorted sublists into one sorted list.
• The merge sort and the quick sort algorithms differ in how they partition the list.
• A quick sort first selects an element in the list, called pivot, and then partitions the list so that the elements in one sublist are less than pivot and the elements in the other sublist are greater than or equal to pivot.
• By contrast, a merge sort divides the list into two sublists of nearly equal size.

Page 73: C++ Programming:  Program Design Including Data Structures,  Third Edition

• Every time we advance middle by one node, we advance current by one node.

• After advancing current by one node, if current is not NULL, we again advance current by one node.

• Eventually, current becomes NULL and middle points to the last node of the first sublist.
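The pointer walk described above, together with the merge and recursive steps, can be sketched as follows. The names middle, current, mergeList, and recMergeSort appear in this chapter; the Node type, divideList, the function signatures, and the use of nullptr in place of NULL are assumptions made for illustration.

```cpp
// Hypothetical node type for a singly linked list of ints.
struct Node
{
    int info;
    Node* link;
};

// Split the list starting at first1 into two halves; first2 receives
// the head of the second half.
void divideList(Node* first1, Node*& first2)
{
    if (first1 == nullptr || first1->link == nullptr)
    {
        first2 = nullptr;               // zero or one node: nothing to split
        return;
    }

    Node* middle = first1;
    Node* current = first1->link;
    if (current != nullptr)
        current = current->link;

    while (current != nullptr)
    {
        middle = middle->link;          // middle advances one node
        current = current->link;        // current advances one node...
        if (current != nullptr)
            current = current->link;    // ...and a second one if possible
    }

    first2 = middle->link;              // second sublist starts after middle
    middle->link = nullptr;             // middle is the last node of the first sublist
}

// Merge two sorted lists into one sorted list and return its head.
Node* mergeList(Node* first1, Node* first2)
{
    Node dummy{0, nullptr};             // placeholder in front of the result
    Node* last = &dummy;

    while (first1 != nullptr && first2 != nullptr)
    {
        if (first1->info <= first2->info)
        {
            last->link = first1;
            first1 = first1->link;
        }
        else
        {
            last->link = first2;
            first2 = first2->link;
        }
        last = last->link;
    }

    last->link = (first1 != nullptr) ? first1 : first2;  // append the leftover nodes
    return dummy.link;
}

// Sort the list in place; head is updated to the new first node.
void recMergeSort(Node*& head)
{
    if (head == nullptr || head->link == nullptr)
        return;                         // lists of size 0 or 1 are sorted

    Node* otherHead = nullptr;
    divideList(head, otherHead);
    recMergeSort(head);
    recMergeSort(otherHead);
    head = mergeList(head, otherHead);
}
```

Since current moves two nodes for every one node that middle moves, middle stops at the end of the first half when current runs off the list.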

Page 83: C++ Programming:  Program Design Including Data Structures,  Third Edition

• Suppose that L is a list of n elements, where n > 0.

• Suppose that n is a power of 2, that is, $n = 2^m$ for some nonnegative integer m, so that we can divide the list into two sublists, each of size: $n/2 = 2^m/2 = 2^{m-1}$.

Page 85: C++ Programming:  Program Design Including Data Structures,  Third Edition

• Consider the general case when $n = 2^m$.
• The number of recursion levels is m.
• To merge a sorted list of size s with a sorted list of size t, the maximum number of comparisons is s + t − 1.
• Consider the function mergeList, which merges two sorted lists into a sorted list.
• This is where the actual work (comparisons and assignments) is done.
• The initial call to the function recMergeSort, at level 0, produces two sublists, each of the size n / 2.
• To merge these two lists, after they are sorted, the maximum number of comparisons is $n/2 + n/2 - 1 = n - 1 = O(n)$.

Page 86: C++ Programming:  Program Design Including Data Structures,  Third Edition

• At level 1, we merge two sets of sorted lists, where each sublist is of the size n / 4.

• To merge two sorted sublists, each of the size n / 4, we need at most $n/4 + n/4 - 1 = n/2 - 1$ comparisons.

• At level 1 of the recursion, the number of comparisons is 2(n / 2 – 1) = n – 2 = O(n).

• At level k of the recursion, there are a total of $2^k$ calls to the function mergeList. Each of these calls merges two sublists, each of the size $n / 2^{k+1}$, which requires a maximum of $n / 2^k - 1$ comparisons.

• At level k of the recursion, the maximum number of comparisons is $2^k (n / 2^k - 1) = n - 2^k = O(n)$.

Page 87: C++ Programming:  Program Design Including Data Structures,  Third Edition

• The maximum number of comparisons at each level of the recursion is O(n).

• Because the number of levels of the recursion is m, the maximum number of comparisons made by the merge sort algorithm is O(nm).

• Now $n = 2^m$ implies that $m = \log_2 n$. Hence, the maximum number of comparisons made by the merge sort algorithm is $O(n \log_2 n)$.

• If W(n) denotes the number of key comparisons in the worst case to sort L, then $W(n) = O(n \log_2 n)$.

• Let A(n) denote the number of key comparisons in the average case.

• On average, it can be shown that the number of comparisons for merge sort is given by the following equation: If n is a power of 2, $A(n) = n \log_2 n - 1.25n = O(n \log_2 n)$.

Page 88: C++ Programming:  Program Design Including Data Structures,  Third Edition

• The presidential election for the student council of your local university is about to be held.

• The chair of the election committee wants to computerize the voting and has asked you to write a program to analyze the data and report the winner.

• The university has four major divisions, and each division has several departments.

• For the election, the four divisions are labeled as region 1, region 2, region 3, and region 4.

• Each department in each division handles its own voting and reports the votes received by each candidate to the election committee.

• The voting is reported in the following form:

firstName lastName regionNumber numberOfVotes
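Reading records in this form and totaling them can be sketched as follows. This is only an illustration of the input format: it uses std::map rather than the chapter's candidateType class and sorted-list search, and the names Tally, tallyVotes, and findWinner are hypothetical.

```cpp
#include <map>
#include <sstream>  // std::istream, std::istringstream
#include <string>

// Hypothetical record: votes per region (regions 1..4) and the total
// for one candidate.
struct Tally
{
    int votesByRegion[4] = {0, 0, 0, 0};
    int total = 0;
};

// Read "firstName lastName regionNumber numberOfVotes" records from in
// and accumulate the votes per candidate, keyed by "lastName, firstName".
// Records with a region outside 1..4 are skipped.
std::map<std::string, Tally> tallyVotes(std::istream& in)
{
    std::map<std::string, Tally> results;
    std::string firstName, lastName;
    int region = 0;
    int votes = 0;

    while (in >> firstName >> lastName >> region >> votes)
    {
        if (region < 1 || region > 4)
            continue;

        Tally& tally = results[lastName + ", " + firstName];
        tally.votesByRegion[region - 1] += votes;
        tally.total += votes;
    }

    return results;
}

// Return the key of the candidate with the most votes, or an empty
// string if there are no candidates.
std::string findWinner(const std::map<std::string, Tally>& results)
{
    std::string winner;
    int best = -1;

    for (const auto& entry : results)
        if (entry.second.total > best)
        {
            best = entry.second.total;
            winner = entry.first;
        }

    return winner;
}
```

A caller could pass an open std::ifstream, or a std::istringstream when experimenting with made-up records.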

Page 90: C++ Programming:  Program Design Including Data Structures,  Third Edition

• The input file containing the voting data looks like the following:

• The main component of this program is a candidate. Therefore, first we will design the class candidateType to implement a candidate object.

Page 109: C++ Programming:  Program Design Including Data Structures,  Third Edition

Function printResults