Sorting Overview / Heapsort


Page 1: Sorting Overview / Heapsort

Sorting Overview / Heapsort

Page 2: Sorting Overview / Heapsort

Sort Routine Features / Definitions:

• Big O analysis is based on number of comparisons.

• A record is a group of related information, such as information about a particular student in a student database. Think of a record as a row in a table. In C++, the struct data type can be used to represent a record, and an array of structs can be used to represent a table.

• In database terminology, each entry in a record is called a field. Think of a field as a column in a table. When sorting a list of records, the field on which the sort order is based is called the key.

Student ID Number   Last Name   First Name   GPA
145848385           Pascal      Grace        3.90
456535452           Babbage     Charles      3.75
793867437           Pascal      Blaise       3.83

Each row is a record; each column is a field.
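The struct-and-array idea described above might look like the following minimal sketch. The names Student, idNumber, lastName, firstName, gpa, and makeTable are illustrative assumptions, not taken from the course text.

```cpp
#include <string>
#include <vector>

// One record: a row of the table. Each member is a field (a column).
// All names here are hypothetical, chosen to match the table headings.
struct Student {
    long idNumber;          // Student ID Number
    std::string lastName;   // Last Name
    std::string firstName;  // First Name
    double gpa;             // GPA
};

// The table: a sequence of records. A plain array of structs works equally well.
std::vector<Student> makeTable() {
    return {
        {145848385, "Pascal",  "Grace",   3.90},
        {456535452, "Babbage", "Charles", 3.75},
        {793867437, "Pascal",  "Blaise",  3.83},
    };
}
```

Sorting this table "by last name" then means using the lastName field as the key.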

Page 3: Sorting Overview / Heapsort

More Sort Routine Features / Definitions:

• Adaptive means that the sort routine's performance is better if the key values to be sorted are nearly sorted to begin with.

• Stable means that the sort routine maintains the original relative order of records with equal keys.

Page 4: Sorting Overview / Heapsort

Sorting on more than one key

• The key on which the overall sort order is based is called the primary key.

• For example, when names are sorted, they are usually sorted by last name. Then records with the same last name are typically sorted by first name. In this case, the last name is the primary key and the first name is the secondary key.

Student ID Number   Last Name   First Name   GPA
456535452           Babbage     Charles      3.75
793867437           Pascal      Blaise       3.83
145848385           Pascal      Grace        3.90

Last Name is the primary key; First Name is the secondary key.

Page 5: Sorting Overview / Heapsort

Sorting on more than one key continued

• To sort records by last name, then by first name within last name, two complete sorts must be performed on the records.

• Although admittedly counterintuitive, the first sort must be based on the secondary key, first name. Any sorting algorithm will do for this first sort; choose one based on the characteristics of the data (how many records, how many fields, whether they are close to being sorted already, etc.).

• After sorting on the secondary key, the table looks like this:

Student ID Number   Last Name   First Name   GPA
793867437           Pascal      Blaise       3.83
456535452           Babbage     Charles      3.75
145848385           Pascal      Grace        3.90

Page 6: Sorting Overview / Heapsort

Sorting on more than one key continued

• After sorting on the secondary key, the next sort must be based on the primary key. In order to maintain the order obtained by the sort on the first names, a stable sort must be used this time.

• In the table below, the first names have already been sorted, so we don't want the sort on the last names to ruin what we accomplished in the first sort. An unstable sort could possibly rearrange the first names, while a stable sort is guaranteed not to rearrange them.

Student ID Number   Last Name   First Name   GPA
793867437           Pascal      Blaise       3.83
456535452           Babbage     Charles      3.75
145848385           Pascal      Grace        3.90

Page 7: Sorting Overview / Heapsort

Sorting on more than one key continued

• After sorting by last name using a stable sort, the table now looks like this:

Student ID Number   Last Name   First Name   GPA
456535452           Babbage     Charles      3.75
793867437           Pascal      Blaise       3.83
145848385           Pascal      Grace        3.90

• Remember, when sorting on multiple keys:

  • One sort must be performed for each key.

  • The sorts must be performed in reverse order of the "importance" of the keys.

  • Each sort after the first must be stable.
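The two-pass procedure can be sketched with the C++ standard library, which offers std::sort (not guaranteed stable) and std::stable_sort (guaranteed stable). The Student struct and the function name sortByName are assumptions made for this illustration.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical record type matching the table headings.
struct Student {
    long idNumber;
    std::string lastName;
    std::string firstName;
    double gpa;
};

// Sort by last name, then by first name within last name,
// using one sort per key in reverse order of importance.
void sortByName(std::vector<Student>& table) {
    // First sort: secondary key. Any algorithm will do here.
    std::sort(table.begin(), table.end(),
              [](const Student& a, const Student& b) {
                  return a.firstName < b.firstName;
              });
    // Second sort: primary key. This one must be stable so that records
    // with equal last names keep their first-name order.
    std::stable_sort(table.begin(), table.end(),
                     [](const Student& a, const Student& b) {
                         return a.lastName < b.lastName;
                     });
}
```

Replacing the second call with an unstable sort could leave the two Pascal records in either order.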

Page 8: Sorting Overview / Heapsort

Sorting Algorithms We've Discussed

Page 9: Sorting Overview / Heapsort

Bubble Sort / Straight Exchange Sort

• O(n^2)

• compares each key with every other key - each pass "bubbles up" the next smallest key to its proper position

• requires n swaps / exchanges to "bubble" a key up n positions

• stable - maintains original order of records with equal keys

• simple modifications are adaptive - behavior approaches O(n) when original data is almost sorted

  • Short Bubble algorithm in text "short circuits" outer loop as soon as all values are in order

  • Bi-directional Bubble alternately bubbles a small value up then a large value down and also "short circuits" outer loop

• generally, the worst performing algorithm of all, but the simplest to code!
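A minimal sketch of a "short circuiting" bubble sort on an array of int keys follows. It is written from the description above, not copied from the text; the name shortBubbleSort and the use of std::vector<int> are assumptions.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Bubble sort that stops as soon as a full pass makes no swap.
void shortBubbleSort(std::vector<int>& a) {
    bool swapped = true;
    // a[0..sorted) already holds the smallest keys in their final positions.
    for (std::size_t sorted = 0; sorted + 1 < a.size() && swapped; ++sorted) {
        swapped = false;
        // Bubble the smallest remaining key up toward index `sorted`.
        for (std::size_t i = a.size() - 1; i > sorted; --i) {
            // Strict < means equal keys are never exchanged, so the sort is stable.
            if (a[i] < a[i - 1]) {
                std::swap(a[i], a[i - 1]);
                swapped = true;
            }
        }
    }
}
```

On data that is already in order, the first pass makes no swap and the outer loop ends after about n comparisons.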

Page 10: Sorting Overview / Heapsort

Straight Insertion Sort

• O(n^2)

• places next key from unsorted portion of array into the desired position among the previously sorted keys

• doesn't compare each key with every other key but constantly "shifts down" groups of previously sorted keys

• stable

• adaptive - behavior approaches O(n) when original data is almost sorted
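As a rough illustration of the "shift down" behavior, here is a minimal insertion sort over int keys. The function name insertionSort is an assumption, and the code is a sketch rather than the version in the text.

```cpp
#include <cstddef>
#include <vector>

// Straight insertion sort: a[0..i) is always sorted.
void insertionSort(std::vector<int>& a) {
    for (std::size_t i = 1; i < a.size(); ++i) {
        int key = a[i];            // next key from the unsorted portion
        std::size_t j = i;
        // Shift larger sorted keys down one position to open a slot.
        // Strict < keeps equal keys in their original order (stable).
        while (j > 0 && key < a[j - 1]) {
            a[j] = a[j - 1];
            --j;
        }
        a[j] = key;
    }
}
```

When the data is almost sorted, the inner loop exits almost immediately for each key, which is where the near-O(n) behavior comes from.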

Page 11: Sorting Overview / Heapsort

Straight Selection Sort

• guaranteed O(n^2)

• locates next smallest key from unsorted portion of array and places it in its proper position

• only O(n) swaps - unlike Bubble Sort, only one swap takes place when a key is moved no matter how far it is moved and sorted keys are never "shifted down" as they are by Insertion Sort

• not stable

• not adaptive, but since comparing is less labor intensive for the computer than data movement (swapping / shifting) the O(n) swaps make it a reasonable choice among the O(n^2) sorts for small lists of large records
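The O(n) swap count can be seen in a small sketch that returns how many swaps it made. The name selectionSort and the returned counter are assumptions added for illustration.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Straight selection sort. Returns the number of swaps performed,
// which is never more than n - 1.
std::size_t selectionSort(std::vector<int>& a) {
    std::size_t swaps = 0;
    for (std::size_t i = 0; i + 1 < a.size(); ++i) {
        // Locate the smallest key in the unsorted portion a[i..n).
        std::size_t minIndex = i;
        for (std::size_t j = i + 1; j < a.size(); ++j) {
            if (a[j] < a[minIndex]) minIndex = j;
        }
        // One swap moves it into place, no matter how far it travels.
        if (minIndex != i) {
            std::swap(a[i], a[minIndex]);
            ++swaps;
        }
    }
    return swaps;
}
```

The long-distance swap is also why the sort is not stable: the key displaced from position i can jump past records with an equal key.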

Page 12: Sorting Overview / Heapsort

Mergesort

• guaranteed O(n log_2 n)

• recursively divides array to be sorted in half, sorts each half and merges the two halves together

• the only algorithm discussed here that is not coded to work "in place" - it requires an additional array, so it is not a good choice for large records if space is a problem (it can be coded in place, but typically is not)

• stable

• not adaptive
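A minimal sketch of the divide, sort, and merge steps is shown below, including the additional array. The names mergeSort, mergeSortRange, mergeHalves, and scratch are assumptions for this illustration.

```cpp
#include <cstddef>
#include <vector>

// Merge the sorted halves a[lo..mid) and a[mid..hi) through the scratch array.
static void mergeHalves(std::vector<int>& a, std::vector<int>& scratch,
                        std::size_t lo, std::size_t mid, std::size_t hi) {
    std::size_t left = lo, right = mid, out = lo;
    while (left < mid && right < hi) {
        // Taking from the left half on ties keeps the sort stable.
        if (a[right] < a[left]) scratch[out++] = a[right++];
        else                    scratch[out++] = a[left++];
    }
    while (left < mid)  scratch[out++] = a[left++];
    while (right < hi)  scratch[out++] = a[right++];
    for (std::size_t i = lo; i < hi; ++i) a[i] = scratch[i];
}

// Recursively sort a[lo..hi).
static void mergeSortRange(std::vector<int>& a, std::vector<int>& scratch,
                           std::size_t lo, std::size_t hi) {
    if (hi - lo < 2) return;                 // zero or one key: already sorted
    std::size_t mid = lo + (hi - lo) / 2;    // divide in half
    mergeSortRange(a, scratch, lo, mid);
    mergeSortRange(a, scratch, mid, hi);
    mergeHalves(a, scratch, lo, mid, hi);
}

void mergeSort(std::vector<int>& a) {
    std::vector<int> scratch(a.size());      // the additional array
    mergeSortRange(a, scratch, 0, a.size());
}
```

The scratch array is allocated once and is as large as the input, which is the space cost noted above.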

Page 13: Sorting Overview / Heapsort

Quicksort / Partition Exchange Sort

• O(n log_2 n) on average (not guaranteed - see the last bullet)

• advanced exchange sort algorithm - selects one key (the pivot) to be placed in its proper final position and partitions the remaining keys so that all keys to the "left" of the pivot are less than or equal to the pivot and all keys to the "right" of the pivot are greater than the pivot, then recursively sorts each partition

• based on idea that it is better to move keys large distances than to move them one position at a time (same theory behind Selection Sort)

• not stable

• not adaptive; in fact behavior approaches O(n^2) with poor choice of split value, or pivot (for nearly sorted data, choosing "leftmost" value as pivot is a poor choice; choosing a random pivot has a high probability of yielding better results)
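The partition step and the random pivot choice can be sketched as follows. This is one common partitioning scheme, not necessarily the one in the text; the names quickSort, quickSortRange, and partitionAroundPivot are assumptions.

```cpp
#include <random>
#include <utility>
#include <vector>

// Partition a[lo..hi] around a randomly chosen pivot and
// return the pivot's final index.
static int partitionAroundPivot(std::vector<int>& a, int lo, int hi,
                                std::mt19937& rng) {
    std::uniform_int_distribution<int> pick(lo, hi);
    std::swap(a[pick(rng)], a[hi]);   // park the pivot at the right end
    const int pivot = a[hi];
    int boundary = lo;                // a[lo..boundary) holds keys <= pivot
    for (int i = lo; i < hi; ++i) {
        if (a[i] <= pivot) {
            std::swap(a[i], a[boundary]);
            ++boundary;
        }
    }
    std::swap(a[boundary], a[hi]);    // pivot lands in its final position
    return boundary;
}

static void quickSortRange(std::vector<int>& a, int lo, int hi,
                           std::mt19937& rng) {
    if (lo >= hi) return;
    const int p = partitionAroundPivot(a, lo, hi, rng);
    quickSortRange(a, lo, p - 1, rng);   // keys <= pivot
    quickSortRange(a, p + 1, hi, rng);   // keys > pivot
}

void quickSort(std::vector<int>& a) {
    std::mt19937 rng(std::random_device{}());
    quickSortRange(a, 0, static_cast<int>(a.size()) - 1, rng);
}
```

With a random pivot, nearly sorted input is no more likely to trigger the O(n^2) case than any other input.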

Page 14: Sorting Overview / Heapsort

Heapsort

• guaranteed O(n log_2 n)

• works by first forming heap out of existing array, then successively swapping top of heap with bottom of heap and using ReHeapDown to reform heap from top to (bottom - 1) until heap has been emptied

• high overhead because of its two O(n log_2 n) phases, but good for large values of n

• not stable

• not adaptive

Page 15: Sorting Overview / Heapsort

To understand Heapsort:

• Must understand how to view and process an array as a binary tree

• Must understand what a heap is

Page 16: Sorting Overview / Heapsort

This array:

index:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
value:  3  7  5 14  1  9 12 15  6 13  2  4 16  8 11 10

is equivalent to this binary tree (shown level by level):

                3
        7               5
    14      1       9       12
  15  6   13  2   4  16   8   11
10

Note that for a "node" at array index N:

• the left child is at index 2N + 1

• the right child is at index 2N + 2
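The index arithmetic is small enough to write down directly. In this sketch the function names are assumptions, and the parent formula is not stated on the slide; it follows from inverting the two child formulas.

```cpp
#include <cstddef>

// Children of the node stored at array index n (0-based indexing).
std::size_t leftChild(std::size_t n)  { return 2 * n + 1; }
std::size_t rightChild(std::size_t n) { return 2 * n + 2; }

// Parent of any node except the root (requires n > 0).
// Integer division makes this work for both left and right children.
std::size_t parent(std::size_t n) { return (n - 1) / 2; }
```

For the array above, the node at index 3 (the 14) has its children at indices 7 and 8 (the 15 and the 6).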

Page 17: Sorting Overview / Heapsort

Heaps

• A heap is a binary tree that meets a shape property:

• A full tree is one in which all leaves are on the same level and every nonleaf node has two children.

• A complete tree is full or at least full to the next-to-last level and the leaves on the last level are as far to the left as possible. A heap must be a complete tree.

• If an "outline" is drawn around a full tree, it looks like:

• If an "outline" is drawn around a complete tree, it looks like:

• A heap also has an order property:

• Each node contains a value greater than or equal to each of its children.
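When the tree is stored in an array with no gaps, the shape property holds automatically, so only the order property needs checking. A minimal sketch, assuming int keys and a hypothetical function name isMaxHeap:

```cpp
#include <cstddef>
#include <vector>

// Returns true if every node's value is >= the values of its children.
bool isMaxHeap(const std::vector<int>& a) {
    // Check each non-root node against its parent at (child - 1) / 2.
    for (std::size_t child = 1; child < a.size(); ++child) {
        const std::size_t parent = (child - 1) / 2;
        if (a[parent] < a[child]) return false;
    }
    return true;
}
```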

Page 18: Sorting Overview / Heapsort

3

7 5

14 1 9 12

15 6 13 2 4 16 8 11

10

This binary tree has the shape property, but not the order property

Note that each leaf satisfies the heap order property, so below the red dashed line we have a heap

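Because of the shape property, the first nonleaf (counting from the bottom of the heap / end of the array) is the parent of the last element. It is located at index position (bottom - 1) / 2, using integer division, where bottom is the index position of the last element in the heap. Here bottom is 15, so the first nonleaf is at index 7 (the 15).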

An operation called ReHeapDown in the text begins with this first nonleaf node and repairs the heap from this point down, then moves back one node repeating the repair operation until the top of the heap is reached.
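The repair of a single subtree might be coded as in the sketch below. The name reHeapDown mirrors the operation named in the text, but this iterative version, its parameter list, and the use of std::vector<int> are assumptions.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Repair the heap rooted at index `root`, assuming both of its subtrees
// are already heaps. Indices greater than `bottom` are not part of the heap.
// Precondition: bottom < a.size().
void reHeapDown(std::vector<int>& a, std::size_t root, std::size_t bottom) {
    while (true) {
        const std::size_t left = 2 * root + 1;
        if (left > bottom) return;                 // root is a leaf
        const std::size_t right = left + 1;
        std::size_t maxChild = left;
        if (right <= bottom && a[right] > a[left]) maxChild = right;
        if (a[root] >= a[maxChild]) return;        // order property holds
        std::swap(a[root], a[maxChild]);
        root = maxChild;                           // keep going further down
    }
}
```

Calling it on the example array with root 5 (the 9) and bottom 15 swaps the 9 with its larger child, the 16.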

Page 19: Sorting Overview / Heapsort

3

7 5

14 1 9 12

15 6 13 2 4 16 8 11

10

A look at ReHeapDown:

Considering the 15 to be the root of a heap, ReHeapDown makes sure that the value 15 is larger than the values of its two children.

If necessary, the root value is swapped with the value of the larger child

In this case, no swap is necessary since 15 > 10 and there is no right child

Page 20: Sorting Overview / Heapsort

3

7 5

14 1 9 12

15 6 13 2 4 16 8 11

10

We now know we have a heap below the red dashed line . . .

. . . so ReHeapDown is now called with the 12 as the root of a heap.

Since 12 > 11 and 12 > 8, no swap is necessary

Page 21: Sorting Overview / Heapsort

3

7 5

14 1 9 12

15 6 13 2 4 16 8 11

10

We now know we have a heap below the red dashed line . . .

. . . so ReHeapDown is now called with the 9 as the root of a heap.

Since 9 > 4 but 16 > 9, the 9 and 16 must be swapped to repair the order property.

Page 22: Sorting Overview / Heapsort

3

7 5

14 1 16 12

15 6 13 2 4 9 8 11

10

We now know we have a heap below the red dashed line . . .

. . . so ReHeapDown is now called with the 1 as the root of a heap.

Since 1 < 13 and 1 < 2, the 1 must be swapped with the 13 (the maximum child) to repair the order property.

Page 23: Sorting Overview / Heapsort

3

7 5

14 13 16 12

15 6 1 2 4 9 8 11

10

We now know we have a heap below the red dashed line . . .

. . . so ReHeapDown is now called with the 14 as the root of a heap.

Since 14 > 6 but 14 < 15, the 14 and 15 must be swapped to repair the order property.

Page 24: Sorting Overview / Heapsort

3

7 5

15 13 16 12

14 6 1 2 4 9 8 11

10

But this time we need to keep going to make sure the swap didn't ruin the order property further down the heap.

Since 14 > 10 the order property is still okay.

Page 25: Sorting Overview / Heapsort

3

7 5

15 13 16 12

14 6 1 2 4 9 8 11

10

We now know we have a heap below the red dashed line . . .

. . . so ReHeapDown is now called with the 5 as the root of a heap.

5 must be swapped with its maximum child, 16

Page 26: Sorting Overview / Heapsort

3

7 16

15 13 5 12

14 6 1 2 4 9 8 11

10

Then we need to compare 5 with its two children . . .

. . . and we see that 5 must be swapped with 9

Page 27: Sorting Overview / Heapsort

3

7 16

15 13 9 12

14 6 1 2 4 5 8 11

10

We now know we have a heap below the red dashed line . . .

. . . so ReHeapDown is now called with the 7 as the root of a heap.

7 must be swapped with its maximum child, 15

Page 28: Sorting Overview / Heapsort

3

15 16

7 13 9 12

14 6 1 2 4 5 8 11

10

Then we need to compare 7 with its two children . . .

. . . and we see that 7 must be swapped with 14

Page 29: Sorting Overview / Heapsort

3

15 16

14 13 9 12

7 6 1 2 4 5 8 11

10

Then we need to swap 7 with its left child 10

Page 30: Sorting Overview / Heapsort

3

15 16

14 13 9 12

10 6 1 2 4 5 8 11

7

We now know we have a heap below the red dashed line . . .

. . . so ReHeapDown is now called with the 3 as the root of a heap.

3 must be swapped with its maximum child, 16

Page 31: Sorting Overview / Heapsort

16

15 3

14 13 9 12

10 6 1 2 4 5 8 11

7

ReHeapDown is not finished: the 3 is now the root of a smaller subtree, so we need to compare 3 with its two children . . .

3 must be swapped with its maximum child, 12

Page 32: Sorting Overview / Heapsort

16

15 12

14 13 9 3

10 6 1 2 4 5 8 11

7

The 3 has moved down another level, so we again need to compare 3 with its two children . . .

3 must be swapped with its maximum child, 11

Page 33: Sorting Overview / Heapsort

16

15 12

14 13 9 11

10 6 1 2 4 5 8 3

7

We now finally have a complete heap!

AND we have just completed the first stage of Heapsort: building the heap!

Note that we considered n/2 nodes and swapped each node at most log_2 n times for O(n log_2 n) behavior. This is a safe upper bound: nodes near the bottom of the heap can move down only a level or two.
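The whole first stage can be sketched as a loop that calls the repair operation on every nonleaf, from the parent of the last element back to the root. The names buildHeap and reHeapDown and the vector-based signatures are assumptions; the starting index (bottom - 1) / 2 is the parent of the last element.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sift a[root] down until neither child is larger (heap ends at `bottom`).
static void reHeapDown(std::vector<int>& a, std::size_t root, std::size_t bottom) {
    for (std::size_t child = 2 * root + 1; child <= bottom; child = 2 * root + 1) {
        if (child < bottom && a[child + 1] > a[child]) ++child;  // larger child
        if (a[root] >= a[child]) return;
        std::swap(a[root], a[child]);
        root = child;
    }
}

// First stage of Heapsort: turn the whole array into a heap.
void buildHeap(std::vector<int>& a) {
    if (a.size() < 2) return;                  // nothing to repair
    const std::size_t bottom = a.size() - 1;
    // Visit the nonleaf nodes from the last one back to the root (index 0).
    for (std::size_t node = (bottom - 1) / 2 + 1; node-- > 0; ) {
        reHeapDown(a, node, bottom);
    }
}
```

Run on the 16-element example array, this sketch produces the same heap as the walkthrough: 16 15 12 14 13 9 11 10 6 1 2 4 5 8 3 7.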

Page 34: Sorting Overview / Heapsort

16

15 12

14 13 9 11

10 6 1 2 4 5 8 3

7

NOW for the second stage of Heapsort!

Note that by swapping the root (the 16) and the bottom of the heap (the 7), we will place the largest value in the array in its proper position.

Page 35: Sorting Overview / Heapsort

7

15 12

14 13 9 11

10 6 1 2 4 5 8 3

16

Now note that below the red line the array is sorted, and the remaining heap portion of the array is above the red line

Also notice that the heap order property has been compromised at the root.

So, what can we do???

ReHeapDown from the root, but be sure to stop at the new bottom (the 3, the last element above the red line)!

Page 36: Sorting Overview / Heapsort

15

14 12

10 13 9 11

7 6 1 2 4 5 8 3

16

The array after ReHeapDown from the 7 at the root. The new bottom is still the 3.

Page 37: Sorting Overview / Heapsort

15

14 12

10 13 9 11

7 6 1 2 4 5 8 3

16

One more pass of the second phase

Swap the root (the 15) and the bottom (the 3)

Page 38: Sorting Overview / Heapsort

3

14 12

10 13 9 11

7 6 1 2 4 5 8 15

16

Below the red line the array is sorted; above the red line is a heap in need of repair

So ReHeapDown from the root (the 3) and, again, be sure to stop at the new bottom (the 8)!

Page 39: Sorting Overview / Heapsort

14

13 12

10 3 9 11

7 6 1 2 4 5 8 15

16

The array after ReHeapDown from the 3 at the root.

Page 40: Sorting Overview / Heapsort

1

2 3

4 5 6 7

8 9 10 11 12 13 14 15

16

After n such root / bottom swaps each followed by ReHeapDown (each with at most log_2 n swaps) . . .

. . . we have completed the second phase of Heapsort, which is O(n log_2 n), and we have a sorted array

So, the first phase of Heapsort, O(n log_2 n), plus the second phase of Heapsort, O(n log_2 n), gives us 2 * O(n log_2 n), which is still O(n log_2 n)
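For comparison, both phases can be expressed with the C++ standard library's heap functions instead of a hand-written ReHeapDown. This is a swapped-in equivalent, not the text's code: std::make_heap builds a max-heap, and each std::pop_heap call moves the root to the end of the given range and restores the heap in the remainder. The function name heapSortWithStd is an assumption.

```cpp
#include <algorithm>
#include <vector>

// Heapsort in two phases using the standard library heap operations.
void heapSortWithStd(std::vector<int>& a) {
    // Phase 1: rearrange the whole array into a max-heap.
    std::make_heap(a.begin(), a.end());

    // Phase 2: repeatedly move the largest remaining key to the end
    // of the shrinking heap [begin, bottom).
    for (auto bottom = a.end(); bottom != a.begin(); --bottom) {
        // Swaps the root with the element at bottom - 1, then repairs
        // the heap in [begin, bottom - 1).
        std::pop_heap(a.begin(), bottom);
    }
}
```

The heap that std::make_heap produces may be laid out differently from the one built in the walkthrough, but the final sorted array is the same. The library also provides std::sort_heap, which performs the second phase in a single call.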

Page 41: Sorting Overview / Heapsort

Sorting Homework

Due Tuesday, December 2

Chapter 10, pp. 669 – 673, problems 1 – 11, 23 – 27

Note: In 1 & 2 when the question asks you to show the array after the 4th iteration of an algorithm, show it after EACH of the first 4 iterations.

Note: Assume ALL questions ask WHY? This means ALL answers require an explanation!!!