Sorting. We live in a world obsessed with keeping information, and to find it, we must keep it in...

Sorting

• We live in a world obsessed with keeping information, and to find it, we must keep it in some sensible order.

• You learned in the last chapter that, in the worst case, the searching time is proportional to the size of the list:

O(n)

• The only way to reduce the searching time is to keep the list ordered, as in binary search:

O(log₂ n)

Sorting

• Sorting is the process of creating some sensible order.

• Sorting is closely related to searching in that we must sift through an unordered list a number of times looking for a particular element or a particular place to put an element.

Ordered List

• An ordered list is a list in which each entry contains a key, such that the keys are in order.

• That is, if entry i comes before entry j in the list, then the key of entry i is less than or equal to the key of entry j.
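The ordering condition above can be checked directly. The following is a minimal sketch, assuming the keys are held in a std::vector; the function name is_ordered is hypothetical and does not come from the text.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true when every key is less than or equal to the key that
// follows it, which is the ordered-list condition stated above.
template <class Key>
bool is_ordered(const std::vector<Key> &keys)
{
    for (std::size_t i = 1; i < keys.size(); ++i)
        if (keys[i] < keys[i - 1])   // a later key is smaller: out of order
            return false;
    return true;                     // empty and one-entry lists are ordered
}
```

Equal keys are allowed, so a list such as 1 2 2 5 counts as ordered.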

Sorting

• Several years ago, it was estimated that more than half the time on many commercial computers was spent in sorting.

• Because sorting is so important, a great many algorithms have been devised for doing it.

• Knuth deals with about twenty-five sorting methods in Volume 3 of The Art of Computer Programming and claims that they are “only a fraction of the algorithms that have been devised so far.”

Sorting

• Your text describes only a few of them:
– Insertion Sort
– Selection Sort
– Shell Sort
– Divide-and-Conquer Sorting
– Mergesort for Linked Lists
– Quicksort for Contiguous Lists
– Heaps and Heapsort

Evaluate sorting methods

• We will evaluate sorting methods using “Big Oh” notation.

• In searching, the total amount of work done was clearly related to the number of comparisons of keys.

• The same observation is true for sorting algorithms, but sorting algorithms must also move their entries around the list or change pointers.

Required tasks when Sorting

• Compare the target item to other items

• Rearrange unordered items

• The work done depends on:
– number of comparisons
– number of moves

Analysis

• As before, both the worst-case performance and the average performance of a sorting algorithm are of interest.

• To find the average, we shall consider what would happen if the algorithm were run on all possible orderings of the list (with n entries, there are n! such orderings altogether) and take the average of the results.
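For small n, the average described above can be computed by brute force. The sketch below is an illustration only: it assumes integer keys 1..n, uses std::next_permutation to visit all n! orderings, and counts comparisons in a simple insertion sort written for this purpose (count_comparisons and average_comparisons are hypothetical names, and the counting convention is one of several possible).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Insertion sort on a private copy, returning the number of key comparisons.
long count_comparisons(std::vector<int> a)
{
    long comparisons = 0;
    for (std::size_t i = 1; i < a.size(); ++i) {
        int current = a[i];
        std::size_t pos = i;
        while (pos > 0) {
            ++comparisons;                     // compare current with a[pos - 1]
            if (!(a[pos - 1] > current)) break;
            a[pos] = a[pos - 1];               // shift the larger key down
            --pos;
        }
        a[pos] = current;
    }
    return comparisons;
}

// Average number of comparisons over all n! orderings of the keys 1..n.
double average_comparisons(int n)
{
    std::vector<int> keys(n);
    std::iota(keys.begin(), keys.end(), 1);    // 1, 2, ..., n: the first ordering
    long total = 0, orderings = 0;
    do {
        total += count_comparisons(keys);
        ++orderings;
    } while (std::next_permutation(keys.begin(), keys.end()));
    return static_cast<double>(total) / orderings;
}
```

Because n! grows so quickly, this is only practical for very small lists; the analysis later in the section obtains the average without enumerating orderings.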

Sortable Lists

• We shall be particularly concerned with the performance of our sorting algorithms.

• In order to optimize performance of a program for sorting a list, we shall need to take advantage of any special features of the list’s implementation.

• For example, we shall see that some sorting algorithms work very efficiently on contiguous lists, but different implementations and different algorithms are needed to sort linked lists efficiently.

• Hence, to write efficient sorting programs, we shall need access to the private data members of the lists being sorted. Therefore, we shall add sorting functions as methods of our basic List data structures.

• The augmented list structure forms a new ADT that we shall call a Sortable_list.

Class definition of Sortable Lists

• The class definition for a Sortable_list takes the following form.

template <class Record>
class Sortable_list : public List<Record> {
public:
   // Add prototypes for sorting methods here.
private:
   // Add prototypes for auxiliary functions here.
};

• This definition shows that a Sortable_list is a List with extra sorting methods.

• The base list class can be any of the List implementations of Chapter 6.

Record and Key

• We use a template parameter class called Record to stand for entries of the Sortable_list.

• As in Chapter 7, we assume that the class Record has the following properties:

– Every Record has an associated key of type Key.
– A Record can be implicitly converted to the corresponding Key.
– The keys (hence also the records) can be compared under the operations ‘<’, ‘>’, ‘>=’, ‘<=’, ‘==’, and ‘!=’.
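One way to meet these requirements is sketched below. It assumes integer keys; the data member and the constructor are hypothetical and only illustrate the idea. The conversion operator is what lets two records be compared with the built-in integer comparisons.

```cpp
#include <cassert>
#include <string>

typedef int Key;   // assumption for this sketch: keys are plain integers

// Hypothetical record: a key together with some other data.
class Record {
public:
    Record(Key k = 0, const std::string &d = "") : key(k), data(d) {}
    operator Key() const { return key; }   // implicit conversion Record -> Key
private:
    Key key;
    std::string data;   // payload, ignored when records are compared
};
```

With this definition an expression such as a < b, where a and b are Records, converts both operands to Key and compares the keys.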

Instance of Sortable List

• A program for testing our Sortable_list might simply declare:

Sortable_list<int> test_list;

• Here, the client uses the type int to represent both records and their keys.

INSERTION SORT

• The name of this algorithm comes from the fact that as we build an ordered list from an unordered one, we do so by choosing an element from the unordered list and “inserting” it into its correct place in the ordered list.

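Insertion sort example (the entry marked ? is the one being inserted into the sorted entries on its left; the entries after the bar are still unsorted):

3? | 7 8 4 2 1 6 5
3 7? | 8 4 2 1 6 5
3 7 8? | 4 2 1 6 5
3 7 8 4? | 2 1 6 5
3 4 7 8 2? | 1 6 5
2 3 4 7 8 1? | 6 5
1 2 3 4 7 8 6? | 5
1 2 3 4 6 7 8 5? |
1 2 3 4 5 6 7 8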

Algorithm

• Take the first item in the unsorted list
• Insert it into the correct position in the sorted list
• Repeat until the unsorted list is empty

Implementation

• If we wish to design an implementation of an algorithm to do this, we must be more specific, i.e.
– what data structure will be used?
– where does the sorted list begin and end?
– how do we “do” steps 1 & 2 above?

Ordered insertion

• An ordered list is an abstract data type, defined as a list in which each entry has a key, and such that the keys are in order; that is, if entry i comes before entry j in the list, then the key of entry i is less than or equal to the key of entry j.

• For ordered lists, we shall often use two new operations that have no counterparts for other lists, since they use keys rather than positions to locate the entry.

• One operation retrieves an entry with a specified key from the ordered list. Retrieval by key from an ordered list is exactly the same as searching.

• The second operation, ordered insertion, inserts a new entry into an ordered list by using the key in the new entry to determine where in the list to insert it.

• Note that ordered insertion is not uniquely specified if the list already contains an entry with the same key as the new entry, since the new entry could go into more than one position.

Ordered insertion

• We begin with the ordered list shown in part (a) of the figure and wish to insert the new entry hen.

• In contrast to the implementation-independent version of insert from Section 7.3, we shall start comparing keys at the end of the list, rather than at its beginning.

• Hence we first compare the new key hen with the last key ram shown in the coloured box in part (a).

• Since hen comes before ram, we move ram one position down, leaving the empty position shown in part (b).

• We next compare hen with the key pig shown in the coloured box in part (b).

• Again, hen belongs earlier, so we move pig down and compare hen with the key dog shown in the coloured box in part (c).

• Since hen comes after dog, we have found the proper location and can complete the insertion as shown in part (d).
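The walk-through above, comparing from the end and moving larger entries down one position, can be sketched as follows. This is an illustration only, assuming the ordered list is held in a std::vector; ordered_insert is a hypothetical name, not the text's method.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Insert new_entry into an already ordered vector, scanning from the end
// and moving each larger entry one position down, as in the hen example.
template <class Record>
void ordered_insert(std::vector<Record> &list, Record new_entry)
{
    list.push_back(new_entry);                  // make room at the end
    std::size_t position = list.size() - 1;     // index of the empty slot
    while (position > 0 && list[position - 1] > new_entry) {
        list[position] = list[position - 1];    // move the larger entry down
        --position;                             // the empty slot moves up
    }
    list[position] = new_entry;                 // proper location found
}
```

Inserting hen into a list ending in dog, pig, ram moves ram and then pig down one position and stops at dog, matching parts (a) to (d) of the description.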

Sorting by Insertion

• To sort an unordered list, we think of
– removing its entries one at a time and then
– inserting each of them into an initially empty new list,
always keeping the entries in the new list in the proper order according to their keys.

• This method is illustrated in Figure 8.2, which shows the steps needed to sort a list of six words. At each stage, the words that have not yet been inserted into the sorted list are shown in coloured boxes, and the sorted part of the list is shown in white boxes.

Sorting by Insertion

Sorting by Insertion

• In the initial diagram, the first word hen is shown as sorted, since a list of length 1 is automatically ordered.

The main step of contiguous insertion sort

Sorting by Insertion

• The main step required to insert an entry denoted current into the sorted part of the list is shown in Figure 8.3.

• In the method that follows, we assume that the class Sortable_list is based on the contiguous List implementation of Section 6.2.2.

• Both the sorted list and the unsorted list occupy the same List member array, which we recall from Section 6.2.2 is called entry.

• The variable first_unsorted marks the division between the sorted and unsorted parts of this array.

insertion_sort( )

template <class Record>
void Sortable_list<Record> :: insertion_sort( )
/* Post: The entries of the Sortable_list have been rearranged so that the keys in
         all the entries are sorted into increasing order.
   Uses: Methods for the class Record; the contiguous List implementation of
         Chapter 6 */
{
   int first_unsorted;   // position of first unsorted entry
   int position;         // searches sorted part of list
   Record current;       // holds the entry temporarily removed from list

   for (first_unsorted = 1; first_unsorted < count; first_unsorted++)
      if (entry[first_unsorted] < entry[first_unsorted - 1]) {
         position = first_unsorted;
         current = entry[first_unsorted];   // Pull unsorted entry out of the list.
         do {   // Shift all entries until the proper position is found.
            entry[position] = entry[position - 1];
            position--;                     // position is empty.
         } while (position > 0 && entry[position - 1] > current);
         entry[position] = current;
      }
}

insertion_sort( )

• A list with only one entry is automatically sorted, so the loop on first_unsorted starts at 1, that is, with the second entry.

– If it is in the correct position, nothing needs to be done.

– Otherwise, the new entry is pulled out of the list into the variable current, and

– the do ... while loop pushes entries one position down the list until the correct position is found, and finally current is inserted there.

– The case when current belongs in the first position of the list must be detected specially, since in this case there is no entry with a smaller key that would terminate the search. We treat this special case as the first clause in the condition of the do ... while loop, position > 0.

Analysis of Insertion Sort

• We analyze the performance of the contiguous version of the program.

Assumptions:

• We restrict our attention to the case when the list is initially in random order (meaning that all possible orderings of the keys are equally likely).

• When we deal with entry i, how far back must we go to insert it? There are i possible ways to move it:

– not moving it at all,

– moving it one position,

– moving it up to i - 1 positions to the front of the list.

• Given randomness, these are equally likely.

• The probability that it need not be moved is thus 1/i, in which case only one comparison of keys is done, with no moving of entries.

Analysis of Insertion Sort: inserting one entry

• The remaining case, in which entry i must be moved, occurs with probability (i - 1)/i.

• Let us begin by counting the average number of iterations of the do ... while loop.

• Since all of the i - 1 possible positions are equally likely, the average number of iterations is (see p. 647)

$\frac{1 + 2 + \cdots + (i - 1)}{i - 1} = \frac{(i - 1)\, i}{2 (i - 1)} = \frac{i}{2}$

Analysis of Insertion Sort

• One key comparison and one assignment are done for each of these iterations,

• with one more key comparison done outside the loop, along with two assignments of entries.

• Hence, in this second case, entry i requires, on average, i/2 + 1 comparisons and i/2 + 2 assignments.

Analysis of Insertion Sort

• When we combine the two cases with their respective probabilities, we have

$\frac{1}{i} \cdot 1 + \frac{i - 1}{i} \left( \frac{i}{2} + 1 \right) = \frac{2 + (i - 1)(i + 2)}{2i} = \frac{i + 1}{2}$ comparisons

• and

$\frac{1}{i} \cdot 0 + \frac{i - 1}{i} \left( \frac{i}{2} + 2 \right) = \frac{(i - 1)(i + 4)}{2i} = \frac{i + 3}{2} - \frac{2}{i}$ assignments.

Analysis of Insertion Sort: inserting all entries

• We wish to add these numbers from i = 2 to i = n, but to avoid complications in the arithmetic, we first use the big-O notation to approximate each of these expressions by suppressing the terms bounded by a constant; that is, terms that are O(1).

• We thereby obtain i /2 + O(1) for both the number of comparisons and the number of assignments of entries.

• In making this approximation, we are really concentrating on the actions within the main loop and suppressing any concern about operations done outside the loop or variations in the algorithm that change the amount of work only by some bounded amount.

Analysis of Insertion Sort

• To add i/2 + O(1) from i = 2 to i = n, we apply Theorem A.1 on page 647.

• We also note that adding n terms, each of which is O(1), produces a result that is O(n). We thus obtain

$\sum_{i=2}^{n} \left( \tfrac{1}{2} i + O(1) \right) = \tfrac{1}{2} \sum_{i=2}^{n} i + O(n) = \tfrac{1}{4} n^2 + O(n)$

for both the number of comparisons of keys and the number of assignments of entries.

• As n becomes larger, the contributions from the term involving n² become much larger than the remaining terms collected as O(n).

• Hence as the size of the list grows, the time needed by insertion sort grows like the square of this size: O(n²).

Analysis of Insertion Sort

• The worst case for the contiguous version of insertion sort is when the keys are input in reversed order.

• This would require i - 1 comparisons and i + 1 assignments for the i th entry in the list, with n keys being checked, giving a worst-case comparison count of

$\sum_{i=2}^{n} (i - 1) = \tfrac{1}{2} (n - 1)\, n$

Analysis of Insertion Sort: Worst Case

5 4 3 2 1    2 moves
4 5 3 2 1    3 moves
3 4 5 2 1    ...
2 3 4 5 1    n-1 moves    (unsorted)
1 2 3 4 5    n moves      (sorted)

Total moves = 2 + 3 + 4 + ... + (n-1) + n < 1 + 2 + 3 + 4 + ... + (n-1) + n = O(n²)
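The worst-case comparison count ½ (n - 1) n can be checked empirically. The sketch below instruments a contiguous insertion sort, written here on a std::vector of int rather than on the text's List class, to count key comparisons; the function name insertion_sort_comparisons is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Contiguous insertion sort, instrumented to count key comparisons.
// Sorts entry in place and returns the number of comparisons made.
long insertion_sort_comparisons(std::vector<int> &entry)
{
    long comparisons = 0;
    for (std::size_t first_unsorted = 1; first_unsorted < entry.size();
         ++first_unsorted) {
        ++comparisons;                       // the comparison outside the loop
        if (entry[first_unsorted] < entry[first_unsorted - 1]) {
            std::size_t position = first_unsorted;
            int current = entry[first_unsorted];
            do {
                entry[position] = entry[position - 1];
                --position;
                if (position == 0) break;    // front reached: nothing to compare
                ++comparisons;               // compare with the next key down
            } while (entry[position - 1] > current);
            entry[position] = current;
        }
    }
    return comparisons;
}
```

On the reversed list 5 4 3 2 1 this counts 1 + 2 + 3 + 4 = 10 comparisons, which is ½ (n - 1) n for n = 5, while an already sorted list of five keys needs only 4.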

Linked Version of Insertion Sort

• For a linked version of insertion sort, since there is no movement of data, there is no need to start searching at the end of the sorted sublist.

• Instead, we shall traverse the original list, taking one entry at a time and inserting it in the proper position in the sorted list.

• The pointer variable last_sorted will reference the end of the sorted part of the list, and last_sorted->next will reference the first entry that has not yet been inserted into the sorted sublist.

• We shall let first_unsorted also point to this entry and use a pointer current to search the sorted part of the list to find where to insert *first_unsorted. If *first_unsorted belongs before the current head of the list, then we insert it there.

• Otherwise, we move current down the list until first_unsorted->entry <= current->entry and then insert *first_unsorted before *current. To enable insertion before *current we keep a second pointer trailing in lock step one position closer to the head than current.

• A sentinel is an extra entry added to one end of a list to ensure that a loop will terminate without having to include a separate check. Since we have last_sorted->next == first_unsorted, the node *first_unsorted is already in position to serve as a sentinel for the search, and the loop moving current is simplified.

• Finally, let us note that a list with 0 or 1 entry is already sorted, so that we can check these cases separately and thereby avoid trivialities elsewhere. The details appear in the following function and are illustrated in Figure 8.4.

Insertion Sort function

template <class Record>
void Sortable_list<Record> :: insertion_sort( )
/* Post: The entries of the Sortable_list have been rearranged so that the keys in
         all the entries are sorted into nondecreasing order.
   Uses: Methods for the class Record. The linked List implementation of Chapter 6. */
{
   Node<Record> *first_unsorted,  // the first unsorted node to be inserted
                *last_sorted,     // tail of the sorted sublist
                *current,         // used to traverse the sorted sublist
                *trailing;        // one position behind current

   if (head != NULL) {            // Otherwise, the empty list is already sorted.
      last_sorted = head;         // The first node alone makes a sorted sublist.
      while (last_sorted->next != NULL) {
         first_unsorted = last_sorted->next;
         if (first_unsorted->entry < head->entry) {
            // Insert *first_unsorted at the head of the sorted list:
            last_sorted->next = first_unsorted->next;
            first_unsorted->next = head;
            head = first_unsorted;
         }
         else {
            // Search the sorted sublist to insert *first_unsorted:
            trailing = head;
            current = trailing->next;
            while (first_unsorted->entry > current->entry) {
               trailing = current;
               current = trailing->next;
            }
            // *first_unsorted now belongs between *trailing and *current.
            if (first_unsorted == current)
               last_sorted = first_unsorted;   // already in right position
            else {
               last_sorted->next = first_unsorted->next;
               first_unsorted->next = current;
               trailing->next = first_unsorted;
            }
         }
      }
   }
}


Linked Insertion Sort

• With no movement of data, there is no need to search from the end of the sorted sublist, as for the contiguous case.

• Traverse the original list, taking one entry at a time and inserting it in the proper position in the sorted list.

• Pointer last_sorted references the end of the sorted part of the list.

• Pointer first_unsorted == last_sorted->next references the first entry that has not yet been inserted into the sorted sublist.

Linked Insertion Sort

• Pointer current searches the sorted part of the list to find where to insert *first_unsorted.

• If *first_unsorted belongs before the head of the list, then insert it there.

• Otherwise, move current down the list until first_unsorted->entry <= current->entry and then insert *first_unsorted before *current.

• To enable insertion before *current, keep a second pointer trailing in lock step one position closer to the head than current.

Linked Insertion Sort

• A sentinel is an extra entry added to one end of a list to ensure that a loop will terminate without having to include a separate check.

• Since last_sorted->next == first_unsorted, the node *first_unsorted is already in position to serve as a sentinel for the search, and the loop moving current is simplified.

• A list with 0 or 1 entry is already sorted, so by checking these cases separately we avoid trivialities elsewhere.

Sorting Algorithms and Average Case Number of Comparisons

Simple Sorts: O(N²)
– Straight Selection Sort
– Bubble Sort
– Insertion Sort

More Complex Sorts: O(N*log N)
– Quick Sort
– Merge Sort
– Heap Sort

Selection Sort

• We can analyze the performance of function selection_sort in the same way that it is programmed. The main function does nothing except some bookkeeping and calling the subprograms.

• The function swap is called n - 1 times, and each call does 3 assignments of entries, for a total count of 3(n - 1).

• The function max_key is called n - 1 times, with the length of the sublist ranging from n down to 2.

• If t is the number of entries in the part of the list for which it is called, then max_key does exactly t - 1 comparisons of keys to determine the maximum. Hence, altogether, there are (n - 1) + (n - 2) + ... + 1 = ½ n(n - 1) comparisons of keys, which we approximate as ½ n² + O(n).
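The counts above can be reproduced with a short sketch. The text refers to selection_sort, max_key, and swap without showing them here, so the versions below are assumptions written on a std::vector of int: max_key takes an extra counter parameter for illustration, and std::swap stands in for the text's swap.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: position of the largest key in entry[low..high].
// It performs exactly (high - low) key comparisons.
std::size_t max_key(const std::vector<int> &entry, std::size_t low,
                    std::size_t high, long &comparisons)
{
    std::size_t largest = low;
    for (std::size_t current = low + 1; current <= high; ++current) {
        ++comparisons;
        if (entry[largest] < entry[current])
            largest = current;
    }
    return largest;
}

// Selection sort: repeatedly swap the largest remaining key into the last
// unsorted position. Returns the total number of key comparisons.
long selection_sort(std::vector<int> &entry)
{
    long comparisons = 0;
    for (std::size_t position = entry.size(); position > 1; --position) {
        std::size_t largest = max_key(entry, 0, position - 1, comparisons);
        std::swap(entry[largest], entry[position - 1]);   // n - 1 swaps in all
    }
    return comparisons;
}
```

For a list of 8 keys the function reports 7 + 6 + ... + 1 = 28 comparisons, whatever the initial order, in line with ½ n(n - 1).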

Analysis and comparison:

• Selection sort moves the entries very efficiently but does many redundant comparisons.

• In its best case, insertion sort does the minimum number of comparisons, but it is inefficient in moving entries only one position at a time.

• Our goal now is to derive another method that avoids, as much as possible, the problems with both of these.

• Let us start with insertion sort and ask how we can reduce the number of times it moves an entry.

Shell Sort

• The reason why insertion sort can move entries only one position is that it compares only adjacent keys.

• If we were to modify it so that it first compares keys far apart, then it could sort the entries far apart. Afterward, the entries closer together would be sorted, and finally the increment between keys being compared would be reduced to 1, to ensure that the list is completely in order.

• This is the idea implemented in 1959 by D. L. SHELL in the sorting method bearing his name. This method is also sometimes called diminishing-increment sort.

Example of Shell Sort

Shell Sort

• We first sort all names that are at distance 5 from each other (so there will be only two or three names on each such list),

• then re-sort the names using increment 3, and

• finally perform an ordinary insertion sort (increment 1).

• You can see that, even though we make three passes through all the names, the early passes move the names close to their final positions, so that at the final pass (which does an ordinary insertion sort), all the entries are very close to their final positions and the sort goes rapidly.

Shell Sort

• We start with increment == count, where we recall that count represents the size of the List being sorted, and at each pass reduce the increment with the statement:

increment = increment/3 + 1;
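A minimal sketch of Shell sort using this increment rule follows. It assumes a std::vector of int in place of the text's List class; the helper sort_interval, which runs an insertion sort over every increment-th entry, is a name chosen for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Insertion sort restricted to the sublist
// start, start + increment, start + 2*increment, ...
void sort_interval(std::vector<int> &entry, std::size_t start,
                   std::size_t increment)
{
    for (std::size_t first_unsorted = start + increment;
         first_unsorted < entry.size(); first_unsorted += increment) {
        int current = entry[first_unsorted];
        std::size_t position = first_unsorted;
        while (position >= start + increment &&
               entry[position - increment] > current) {
            entry[position] = entry[position - increment];   // move one step of size increment
            position -= increment;
        }
        entry[position] = current;
    }
}

// Shell sort with the diminishing increment rule increment/3 + 1.
void shell_sort(std::vector<int> &entry)
{
    std::size_t increment = entry.size();
    do {
        increment = increment / 3 + 1;
        for (std::size_t start = 0; start < increment; ++start)
            sort_interval(entry, start, increment);
    } while (increment > 1);   // the last pass is an ordinary insertion sort
}
```

For a list of 8 entries the increments used are 3, 2, and 1. The rule always reaches 1, so the final pass guarantees that the list ends up completely sorted.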

Analysis of Shell Sort

• Very large empirical studies have been made of Shell sort, and it appears that the number of moves, when n is large, is in the range of $n^{1.25}$ to $1.6\, n^{1.25}$.

• This constitutes a substantial improvement over insertion sort.

Merge Sort Algorithm

Cut the array in half.

Sort the left half.

Sort the right half.

Merge the two sorted halves into one sorted array.

[first] [middle] [middle + 1] [last]

74 36 . . . 95 75 29 . . . 52

36 74 . . . 95 29 52 . . . 75

// Recursive merge sort algorithm

template <class ItemType>
void MergeSort(ItemType values[], int first, int last)
// Pre:  first <= last
// Post: Array values[first .. last] sorted into ascending order.
{
    if (first < last)   // general case
    {
        int middle = (first + last) / 2;
        MergeSort(values, first, middle);
        MergeSort(values, middle + 1, last);

        // Now merge the two subarrays
        // values[first .. middle] with
        // values[middle + 1 .. last].
        Merge(values, first, middle, middle + 1, last);
    }
}
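The function Merge called above is not shown in these slides. The following is a sketch of one possible version, assuming the four index parameters mean leftFirst, leftLast, rightFirst, and rightLast (these names are hypothetical) and using a temporary std::vector for the merged elements; MergeSort is repeated so that the block is self-contained.

```cpp
#include <cassert>
#include <vector>

// Hypothetical Merge: combines the sorted runs values[leftFirst..leftLast]
// and values[rightFirst..rightLast] into one sorted run.
template <class ItemType>
void Merge(ItemType values[], int leftFirst, int leftLast,
           int rightFirst, int rightLast)
{
    assert(leftLast + 1 == rightFirst);        // the two runs must be adjacent
    std::vector<ItemType> temp;                // temporary storage for the result
    temp.reserve(rightLast - leftFirst + 1);
    int left = leftFirst, right = rightFirst;
    while (left <= leftLast && right <= rightLast) {
        if (values[right] < values[left])
            temp.push_back(values[right++]);
        else
            temp.push_back(values[left++]);    // ties take the left item first
    }
    while (left <= leftLast)   temp.push_back(values[left++]);
    while (right <= rightLast) temp.push_back(values[right++]);

    int k = leftFirst;                         // copy the merged run back
    for (const ItemType &item : temp)
        values[k++] = item;
}

template <class ItemType>
void MergeSort(ItemType values[], int first, int last)
{
    if (first < last) {
        int middle = (first + last) / 2;
        MergeSort(values, first, middle);
        MergeSort(values, middle + 1, last);
        Merge(values, first, middle, middle + 1, last);
    }
}
```

Both the merging loop and the copy back touch each element of the two runs once, so a single call does work proportional to the combined length of the runs.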

Using Merge Sort Algorithm with N = 16

16

8 8

4 4 4 4

2 2 2 2 2 2 2 2

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

Merge Sort of N elements: How many comparisons?

The entire array can be subdivided into halves only log2N times.

Each time it is subdivided, function Merge is called to recombine the halves. Function Merge uses a temporary array to store the merged elements. Merging is O(N) because it compares each element in the subarrays.

Copying elements back from the temporary array to the values array is also O(N).

MERGE SORT IS O(N*log2N).

Figure 11-24

Figure 11-25

Mergesort

Sorting schemes are
– internal: designed for data items stored in main memory
– external: designed for data items stored in secondary memory.

Previous sorting schemes were all internal sorting algorithms:
– they required direct access to list elements (not possible for sequential files)
– they made many passes through the list (not practical for files)

Mergesort can be used both as an internal and an external sort. The basic operation in mergesort is merging, that is, combining two lists that have previously been sorted so that the resulting list is also sorted.

Mergesort

For example:

File1: 15 20 25 35 45 60 65 70
File2: 10 30 40 50 55

Pair by pair, compare the smallest unmerged element in File1, call it x, with the smallest unmerged element in File2, call it y.

If x < y, copy x from File1 to the "merged" file, File3.
Else copy y from File2 to the "merged" file, File3.

File1 and File2 stay as shown above; File3 grows by one element at each step:

File3: 10
File3: 10 15
File3: 10 15 20
File3: 10 15 20 25
File3: 10 15 20 25 30
File3: 10 15 20 25 30 35
File3: 10 15 20 25 30 35 40
File3: 10 15 20 25 30 35 40 45

Etc.

Mergesort

1. Open File1 and File2 for input, File3 for output.

2. Read first element x from File1 and first element y from File2.

3. Repeat the following until the end of either File1 or File2 is reached:
   If x < y
      a. Write x to File3.
      b. Read a new x value from File1.
   Else
      a. Write y to File3.
      b. Read a new y value from File2.

4. If the end of File1 was encountered, copy any remaining elements from File2 into File3.
   Else (the end of File2 was encountered) copy the rest of File1 into File3.
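The four steps above can be sketched with C++ streams. This is an illustration under stated assumptions: the files hold whitespace-separated integers, the function name merge_files is hypothetical, and the caller opens the streams (step 1), for example as std::ifstream and std::ofstream objects or, for testing, as string streams.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>

// Merge two sorted streams of integers into file3, following steps 2 to 4.
void merge_files(std::istream &file1, std::istream &file2, std::ostream &file3)
{
    int x, y;
    bool have_x = static_cast<bool>(file1 >> x);   // step 2: read first elements
    bool have_y = static_cast<bool>(file2 >> y);

    while (have_x && have_y) {                     // step 3
        if (x < y) {
            file3 << x << ' ';
            have_x = static_cast<bool>(file1 >> x);
        } else {
            file3 << y << ' ';
            have_y = static_cast<bool>(file2 >> y);
        }
    }
    // Step 4: one input is exhausted; copy whatever remains of the other,
    // starting with the value already read from it.
    while (have_x) { file3 << x << ' '; have_x = static_cast<bool>(file1 >> x); }
    while (have_y) { file3 << y << ' '; have_y = static_cast<bool>(file2 >> y); }
}
```

Only one element from each input is held in memory at a time, which is why merging suits data kept in sequential files.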
